[KED-1174] README.md redesign (kedro-org#324)
* Reduced text to add in description of why Kedro exists
* Changed pipeline image
* Moved content to FAQ
yetudada authored Nov 27, 2019
1 parent 02b2c0c commit 67aa3f4
Showing 3 changed files with 61 additions and 80 deletions.
123 changes: 44 additions & 79 deletions README.md
@@ -1,31 +1,27 @@
![Kedro Logo Banner](https://raw.githubusercontent.com/quantumblacklabs/kedro/master/img/kedro_banner.jpg)

`develop` | `master`
----------|---------
[![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop) | [![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/master.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/master)
[![Build status](https://ci.appveyor.com/api/projects/status/2u74p5g8fdc45wwh/branch/develop?svg=true)](https://ci.appveyor.com/project/QuantumBlack/kedro/branch/develop) | [![Build status](https://ci.appveyor.com/api/projects/status/2u74p5g8fdc45wwh/branch/master?svg=true)](https://ci.appveyor.com/project/QuantumBlack/kedro/branch/master)
-----------------

[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python Version](https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue.svg)](https://pypi.org/project/kedro/)
[![PyPI version](https://badge.fury.io/py/kedro.svg)](https://pypi.org/project/kedro/)
[![Documentation](https://readthedocs.org/projects/kedro/badge/?version=latest)](https://kedro.readthedocs.io/)
[![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black)
[![Downloads](https://pepy.tech/badge/kedro)](https://pepy.tech/project/kedro)
| Theme | Status |
|------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Latest Release | [![PyPI version](https://badge.fury.io/py/kedro.svg)](https://pypi.org/project/kedro/) |
| Python Version | [![Python Version](https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue.svg)](https://pypi.org/project/kedro/) |
| `master` Branch Build | [![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/master.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/master) [![Build Status](https://ci.appveyor.com/api/projects/status/2u74p5g8fdc45wwh/branch/master?svg=true)](https://ci.appveyor.com/project/QuantumBlack/kedro/branch/master) |
| `develop` Branch Build | [![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop) [![Build status](https://ci.appveyor.com/api/projects/status/2u74p5g8fdc45wwh/branch/develop?svg=true)](https://ci.appveyor.com/project/QuantumBlack/kedro/branch/develop) |
| Documentation Build | [![Documentation](https://readthedocs.org/projects/kedro/badge/?version=latest)](https://kedro.readthedocs.io/) |
| License | [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) |
| Code Style | [![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black) |

# What is Kedro?

> "The centre of your data pipeline."
## What is Kedro?

Kedro is a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned. We provide a standard approach so that you can:
- spend more time building your data pipeline,
- worry less about how to write production-ready code,
- standardise the way that your team collaborates across your project,
- work more efficiently.
> "The centre of your data pipeline."
Kedro was originally designed by [Aris Valtazanos](https://github.com/arisvqb) and [Nikolaos Tsaousis](https://github.com/tsanikgr) to solve challenges they faced in their project work.
Kedro is a development workflow framework that implements software engineering best-practice for data pipelines with an eye towards productionising machine learning models. We provide a standard approach so that you can:
- Worry less about how to write production-ready code,
- Spend more time building data pipelines that are robust, scalable, deployable, reproducible and versioned,
- And, standardise the way that your team collaborates across your project.

This work was later turned into a product thanks to the following contributors:
[Ivan Danov](https://github.com/idanov), [Dmitrii Deriabin](https://github.com/DmitryDeryabin), [Gordon Wrigley](https://github.com/tolomea), [Yetunde Dada](https://github.com/yetudada), [Nasef Khan](https://github.com/nakhan98), [Kiyohito Kunii](https://github.com/921kiyo), [Nikolaos Kaltsas](https://github.com/nikos-kal), [Meisam Emamjome](https://github.com/misamae), [Peteris Erins](https://github.com/Pet3ris), [Lorena Balan](https://github.com/lorenabalan), [Richard Westenra](https://github.com/richardwestenra) and [Anton Kirilenko](https://github.com/Flid).

## How do I install Kedro?

@@ -35,94 +31,63 @@ This work was later turned into a product thanks to the following contributors:
pip install kedro
```

For more detailed installation instructions, including how to set up Python virtual environments, please visit our [installation guide](https://kedro.readthedocs.io/en/latest/02_getting_started/02_install.html).

## What are the main features of Kedro?

### 1. Project template and coding standards
See more detailed installation instructions, including how to set up Python virtual environments, in our [installation guide](https://kedro.readthedocs.io/en/latest/02_getting_started/02_install.html) and get started with our ["Hello World"](https://kedro.readthedocs.io/en/latest/02_getting_started/04_hello_world.html) example.

- A standard and easy-to-use project template
- Configuration for credentials, logging, data loading and Jupyter Notebooks / Lab
- Test-driven development using `pytest`
- [Sphinx](http://www.sphinx-doc.org/en/master/) integration to produce well-documented code
## Why does Kedro exist?

### 2. Data abstraction and versioning
Kedro is built upon our collective best-practice (and mistakes) from trying to deliver real-world ML applications that have vast amounts of dirty data. We developed Kedro to achieve the following:

- Separation of the _compute_ layer from the _data handling_ layer, including support for different data formats and storage options
- Versioning for your data sets and machine learning models
- **Collaboration** on an analytics codebase when different team members have varied exposure to software engineering best-practice
- Focussing on **maintainable data and ML pipelines** as the standard, instead of a singular activity of deploying models in production
- A way to inspire the creation of **reusable analytics code** so that we never start from scratch when working on a new project
- **Efficient use of time** because we're able to quickly move from experimentation into production

### 3. Modularity and pipeline abstraction

- Support for pure Python functions, `nodes`, to break large chunks of code into small independent sections
- Automatic resolution of dependencies between `nodes`
- Visualise your data pipeline with [Kedro-Viz](https://github.com/quantumblacklabs/kedro-viz), a tool that shows the pipeline structure of Kedro projects
Kedro was originally designed by [Aris Valtazanos](https://github.com/arisvqb) and [Nikolaos Tsaousis](https://github.com/tsanikgr) to solve challenges they faced in their project work. This work was later turned into a product thanks to the following contributors:
[Ivan Danov](https://github.com/idanov), [Dmitrii Deriabin](https://github.com/DmitryDeryabin), [Gordon Wrigley](https://github.com/tolomea), [Yetunde Dada](https://github.com/yetudada), [Nasef Khan](https://github.com/nakhan98), [Kiyohito Kunii](https://github.com/921kiyo), [Nikolaos Kaltsas](https://github.com/nikos-kal), [Meisam Emamjome](https://github.com/misamae), [Peteris Erins](https://github.com/Pet3ris), [Lorena Balan](https://github.com/lorenabalan), [Richard Westenra](https://github.com/richardwestenra) and [Anton Kirilenko](https://github.com/Flid).

*Note:* Read our [FAQs](https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html#how-does-kedro-compare-to-other-projects) to learn how we differ from workflow managers like Airflow and Luigi.
## What are the main features of Kedro?

![Kedro-Viz Pipeline Visualisation](https://raw.githubusercontent.com/quantumblacklabs/kedro/master/img/pipeline_visualisation.png)
*A pipeline visualisation generated using [Kedro-Viz](https://github.com/quantumblacklabs/kedro-viz)*

### 4. Feature extensibility

- A plugin system that injects commands into the Kedro command line interface (CLI)
- List of officially supported plugins:
- [Kedro-Airflow](https://github.com/quantumblacklabs/kedro-airflow), making it easy to prototype your data pipeline in Kedro before deploying to [Airflow](https://github.com/apache/airflow), a workflow scheduler
- [Kedro-Docker](https://github.com/quantumblacklabs/kedro-docker), a tool for packaging and shipping Kedro projects within containers
- Kedro can be deployed locally, on-premise and cloud (AWS, Azure and GCP) servers, or clusters (EMR, Azure HDinsight, GCP and Databricks)
| Feature | What is this? |
|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Project Template | A standard, modifiable and easy-to-use project template based on [Cookiecutter Data Science](https://github.com/drivendata/cookiecutter-data-science/). |
| Data Catalog | A series of lightweight data connectors used for saving and loading data across many different file formats and file systems including local storage, AWS, Azure Blob and GCP _(coming soon)_. The Data Catalog also includes data and model versioning for file-based systems. Used with a Python or YAML API. |
| Pipeline Abstraction | Automatic resolution of dependencies between pure Python functions and data pipeline visualisation using [Kedro-Viz](https://github.com/quantumblacklabs/kedro-viz). |
| The Journal | An ability to reproduce pipeline runs with saved pipeline run results. |
| Coding Standards | Test-driven development using `pytest`, produce well-documented code using [Sphinx](http://www.sphinx-doc.org/en/master/) and make use of the standard Python logging library. |
| Flexible Deployment | Deployment strategies that include the use of Docker with [Kedro-Docker](https://github.com/quantumblacklabs/kedro-docker), conversion of Kedro pipelines into Airflow DAGs with [Kedro-Airflow](https://github.com/quantumblacklabs/kedro-airflow), leveraging a REST API endpoint with Kedro-Server _(coming soon)_ and serving Kedro pipelines as a Python package. Kedro can be deployed locally, on on-premise and cloud (AWS, Azure and GCP) servers, or on clusters (EMR, Azure HDInsight, GCP and Databricks). |
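
To illustrate what "automatic resolution of dependencies between pure Python functions" in the table above can look like, here is a minimal sketch in plain Python. It deliberately does not use Kedro's API: the `Node` tuple, the `resolve_order` and `run` helpers and the dataset names are all hypothetical, and only show the idea of ordering functions by the names of the data they read and write.

```python
from collections import namedtuple

# Hypothetical stand-in for a pipeline node: a pure function plus the names
# of the datasets it reads and the single dataset it writes.
Node = namedtuple("Node", ["func", "inputs", "output"])


def resolve_order(nodes):
    """Return nodes in an order where every input is produced before use."""
    produced_by = {n.output: n for n in nodes}
    ordered, visiting, done = [], set(), set()

    def visit(node):
        if node.output in done:
            return
        if node.output in visiting:
            raise ValueError("circular dependency at " + node.output)
        visiting.add(node.output)
        for name in node.inputs:
            # Inputs that no node produces are treated as raw data
            # supplied by the caller.
            if name in produced_by:
                visit(produced_by[name])
        visiting.discard(node.output)
        done.add(node.output)
        ordered.append(node)

    for n in nodes:
        visit(n)
    return ordered


def run(nodes, data):
    """Execute nodes in dependency order, storing each output by name."""
    data = dict(data)
    for n in resolve_order(nodes):
        data[n.output] = n.func(*(data[name] for name in n.inputs))
    return data


def clean(raw):
    return [x for x in raw if x is not None]


def total(cleaned):
    return sum(cleaned)


# The nodes are declared out of order on purpose; the resolver sorts them.
nodes = [
    Node(total, ["cleaned"], "total"),
    Node(clean, ["raw"], "cleaned"),
]
result = run(nodes, {"raw": [1, None, 2, 3]})
print(result["total"])
```

Because each function only declares what it consumes and produces, the execution order never has to be written by hand, which is the property the Pipeline Abstraction row describes.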

## What are the main Kedro building blocks?

You can find the overview of Kedro architecture [here](https://kedro.readthedocs.io/en/latest/06_resources/02_architecture_overview.html).

## How do I use Kedro?

Our [documentation](https://kedro.readthedocs.io/en/latest/) explains:

- A typical Kedro workflow
- How to set up the project configuration
- Building your first pipeline
- How to use the CLI offered by `kedro_cli.py` (`kedro new`, `kedro run`, ...)
- Best-practice on how to [get started using Kedro](https://kedro.readthedocs.io/en/latest/02_getting_started/01_prerequisites.html)
- A ["Hello World" data and ML pipeline example](https://kedro.readthedocs.io/en/latest/02_getting_started/04_hello_world.html) based on the **Iris dataset**
- A two-hour [Spaceflights tutorial](https://kedro.readthedocs.io/en/latest/03_tutorial/01_workflow.html) that teaches you beginner to intermediate functionality
- How to [use the CLI](https://kedro.readthedocs.io/en/latest/06_resources/03_commands_reference.html) offered by `kedro_cli.py` (`kedro new`, `kedro run`, ...)
- An overview of [Kedro architecture](https://kedro.readthedocs.io/en/latest/06_resources/02_architecture_overview.html)
- [Frequently asked questions (FAQs)](https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html)

> *Note:* The CLI is a convenient way to run `kedro` commands, but you can also invoke the Kedro CLI as a Python module with `python -m kedro`.
Documentation for the latest stable release can be found [here](https://kedro.readthedocs.io/en/latest/). You can also run `kedro docs` from your CLI and open the documentation for your current version of Kedro in a browser.

## How do I find Kedro documentation?

This CLI command will open the documentation for your current version of Kedro in a browser:

```
kedro docs
```

Documentation for the latest stable release can be found [here](https://kedro.readthedocs.io/en/latest/). Check these out first:
> *Note:* The CLI is a convenient way to run `kedro` commands, but you can also invoke the Kedro CLI as a Python module with `python -m kedro`.
- [Getting started](https://kedro.readthedocs.io/en/latest/02_getting_started/01_prerequisites.html)
- [Tutorial](https://kedro.readthedocs.io/en/latest/03_tutorial/01_workflow.html)
- [FAQ](https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html)
*Note:* Read our [FAQs](https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html#how-does-kedro-compare-to-other-projects) to learn how we differ from workflow managers like Airflow and Luigi.

## Can I contribute?

Yes! Want to help build Kedro? Check out our guide to [contributing](https://github.com/quantumblacklabs/kedro/blob/master/CONTRIBUTING.md).

## How do I upgrade Kedro?

We use [Semantic Versioning](http://semver.org/). The best way to safely upgrade is to check our [release notes](https://github.com/quantumblacklabs/kedro/blob/master/RELEASE.md) for any notable breaking changes.

Once Kedro is installed, you can check your version as follows:

```
kedro --version
```

To later upgrade Kedro to a different version, simply run:

```
pip install kedro -U
```

## What licence do you use?

Kedro is licensed under the [Apache 2.0](https://github.com/quantumblacklabs/kedro/blob/master/LICENSE.md) License.


## We're hiring!

Do you want to be part of the team that builds Kedro and [other great products](https://quantumblack.com/labs) at QuantumBlack? If so, you're in luck! QuantumBlack is currently hiring Software Engineers who love using data to drive their decisions. Take a look at [our open positions](https://www.quantumblack.com/careers/current-openings#content) and see if you're a fit.
18 changes: 17 additions & 1 deletion docs/source/06_resources/01_faq.md
@@ -83,7 +83,23 @@ The primary differences to Bonobo ETL and Bubbles are related to the following f

## What version of Python does Kedro use?

Kedro is built for Python 3.5+.
Kedro is built for Python 3.5, 3.6 and 3.7.

## How do I upgrade Kedro?

We use [Semantic Versioning](http://semver.org/). The best way to safely upgrade is to check our [release notes](https://github.com/quantumblacklabs/kedro/blob/master/RELEASE.md) for any notable breaking changes.

Once Kedro is installed, you can check your version as follows:

```
kedro --version
```

To later upgrade Kedro to a different version, simply run:

```
pip install kedro -U
```
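
Because Kedro follows Semantic Versioning, the version numbers themselves hint at whether an upgrade may contain breaking changes. The sketch below is a hypothetical helper (`may_break` and `parse_version` are not part of Kedro or pip) that compares two plain `MAJOR.MINOR.PATCH` strings. It assumes there are no pre-release or build suffixes, and it treats minor bumps in the `0.y.z` range as potentially breaking, since SemVer allows anything to change before 1.0.0. The version strings are examples only, and the check complements the release notes rather than replacing them.

```python
def parse_version(text):
    """Split a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def may_break(current, target):
    """Return True if moving from current to target may break per SemVer."""
    cur, new = parse_version(current), parse_version(target)
    if new[0] != cur[0]:
        return True  # a major version change signals incompatible changes
    if cur[0] == 0 and new[1] != cur[1]:
        return True  # before 1.0.0, SemVer makes no stability guarantee
    return False


# Example version strings, chosen only for illustration.
print(may_break("0.15.4", "0.15.5"))
print(may_break("0.15.4", "0.16.0"))
print(may_break("1.2.0", "1.3.0"))
```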

## What best practice should I follow to avoid leaking confidential data?

Binary file modified img/pipeline_visualisation.png