diff --git a/README.md b/README.md index 76fe77108d..d74524c15e 100644 --- a/README.md +++ b/README.md @@ -1,31 +1,27 @@ ![Kedro Logo Banner](https://raw.githubusercontent.com/quantumblacklabs/kedro/master/img/kedro_banner.jpg) -`develop` | `master` -----------|--------- -[![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop) | [![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/master.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/master) -[![Build status](https://ci.appveyor.com/api/projects/status/2u74p5g8fdc45wwh/branch/develop?svg=true)](https://ci.appveyor.com/project/QuantumBlack/kedro/branch/develop) | [![Build status](https://ci.appveyor.com/api/projects/status/2u74p5g8fdc45wwh/branch/master?svg=true)](https://ci.appveyor.com/project/QuantumBlack/kedro/branch/master) +----------------- -[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) -[![Python Version](https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue.svg)](https://pypi.org/project/kedro/) -[![PyPI version](https://badge.fury.io/py/kedro.svg)](https://pypi.org/project/kedro/) -[![Documentation](https://readthedocs.org/projects/kedro/badge/?version=latest)](https://kedro.readthedocs.io/) -[![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black) -[![Downloads](https://pepy.tech/badge/kedro)](https://pepy.tech/project/kedro) +| Theme | Status | +|------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Latest Release | [![PyPI 
version](https://badge.fury.io/py/kedro.svg)](https://pypi.org/project/kedro/) | +| Python Version | [![Python Version](https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue.svg)](https://pypi.org/project/kedro/) | +| `master` Branch Build | [![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/master.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/master) [![Build Status](https://ci.appveyor.com/api/projects/status/2u74p5g8fdc45wwh/branch/master?svg=true)](https://ci.appveyor.com/project/QuantumBlack/kedro/branch/master) | +| `develop` Branch Build | [![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop) [![Build status](https://ci.appveyor.com/api/projects/status/2u74p5g8fdc45wwh/branch/develop?svg=true)](https://ci.appveyor.com/project/QuantumBlack/kedro/branch/develop) | +| Documentation Build | [![Documentation](https://readthedocs.org/projects/kedro/badge/?version=latest)](https://kedro.readthedocs.io/) | +| License | [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) | +| Code Style | [![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black) | -# What is Kedro? -> "The centre of your data pipeline." +## What is Kedro? -Kedro is a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned. We provide a standard approach so that you can: -- spend more time building your data pipeline, -- worry less about how to write production-ready code, -- standardise the way that your team collaborates across your project, -- work more efficiently. +> "The centre of your data pipeline." 
-Kedro was originally designed by [Aris Valtazanos](https://github.com/arisvqb) and [Nikolaos Tsaousis](https://github.com/tsanikgr) to solve challenges they faced in their project work.
+Kedro is a development workflow framework that implements software engineering best-practice for data pipelines with an eye towards productionising machine learning models. We provide a standard approach so that you can:
+ - Worry less about how to write production-ready code,
+ - Spend more time building data pipelines that are robust, scalable, deployable, reproducible and versioned,
+ - And, standardise the way that your team collaborates across your project.
-This work was later turned into a product thanks to the following contributors:
-[Ivan Danov](https://github.com/idanov), [Dmitrii Deriabin](https://github.com/DmitryDeryabin), [Gordon Wrigley](https://github.com/tolomea), [Yetunde Dada](https://github.com/yetudada), [Nasef Khan](https://github.com/nakhan98), [Kiyohito Kunii](https://github.com/921kiyo), [Nikolaos Kaltsas](https://github.com/nikos-kal), [Meisam Emamjome](https://github.com/misamae), [Peteris Erins](https://github.com/Pet3ris), [Lorena Balan](https://github.com/lorenabalan), [Richard Westenra](https://github.com/richardwestenra) and [Anton Kirilenko](https://github.com/Flid).
 ## How do I install Kedro?
@@ -35,94 +31,63 @@ This work was later turned into a product thanks to the following contributors:
 pip install kedro
 ```
-For more detailed installation instructions, including how to setup Python virtual environments, please visit our [installation guide](https://kedro.readthedocs.io/en/latest/02_getting_started/02_install.html).
-
-## What are the main features of Kedro?
-
-### 1. Project template and coding standards
+See more detailed installation instructions, including how to set up Python virtual environments, in our [installation guide](https://kedro.readthedocs.io/en/latest/02_getting_started/02_install.html) and get started with our ["Hello World"](https://kedro.readthedocs.io/en/latest/02_getting_started/04_hello_world.html) example.
-- A standard and easy-to-use project template
-- Configuration for credentials, logging, data loading and Jupyter Notebooks / Lab
-- Test-driven development using `pytest`
-- [Sphinx](http://www.sphinx-doc.org/en/master/) integration to produce well-documented code
+## Why does Kedro exist?
-### 2. Data abstraction and versioning
+Kedro is built upon our collective best-practice (and mistakes) trying to deliver real-world ML applications that have vast amounts of dirty data. We developed Kedro to achieve the following:
-- Separation of the _compute_ layer from the _data handling_ layer, including support for different data formats and storage options
-- Versioning for your data sets and machine learning models
+ - **Collaboration** on an analytics codebase when different team members have varied exposure to software engineering best-practice
+ - Focussing on **maintainable data and ML pipelines** as the standard, instead of a singular activity of deploying models in production
+ - A way to inspire the creation of **reusable analytics code** so that we never start from scratch when working on a new project
+ - **Efficient use of time** because we're able to quickly move from experimentation into production
-### 3. Modularity and pipeline abstraction
-
-- Support for pure Python functions, `nodes`, to break large chunks of code into small independent sections
-- Automatic resolution of dependencies between `nodes`
-- Visualise your data pipeline with [Kedro-Viz](https://github.com/quantumblacklabs/kedro-viz), a tool that shows the pipeline structure of Kedro projects
+Kedro was originally designed by [Aris Valtazanos](https://github.com/arisvqb) and [Nikolaos Tsaousis](https://github.com/tsanikgr) to solve challenges they faced in their project work. This work was later turned into a product thanks to the following contributors:
+[Ivan Danov](https://github.com/idanov), [Dmitrii Deriabin](https://github.com/DmitryDeryabin), [Gordon Wrigley](https://github.com/tolomea), [Yetunde Dada](https://github.com/yetudada), [Nasef Khan](https://github.com/nakhan98), [Kiyohito Kunii](https://github.com/921kiyo), [Nikolaos Kaltsas](https://github.com/nikos-kal), [Meisam Emamjome](https://github.com/misamae), [Peteris Erins](https://github.com/Pet3ris), [Lorena Balan](https://github.com/lorenabalan), [Richard Westenra](https://github.com/richardwestenra) and [Anton Kirilenko](https://github.com/Flid).
-*Note:* Read our [FAQs](https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html#how-does-kedro-compare-to-other-projects) to learn how we differ from workflow managers like Airflow and Luigi.
+## What are the main features of Kedro?
 ![Kedro-Viz Pipeline Visualisation](https://raw.githubusercontent.com/quantumblacklabs/kedro/master/img/pipeline_visualisation.png)
 *A pipeline visualisation generated using [Kedro-Viz](https://github.com/quantumblacklabs/kedro-viz)*
-### 4. Feature extensibility
-- A plugin system that injects commands into the Kedro command line interface (CLI)
-- List of officially supported plugins:
- - [Kedro-Airflow](https://github.com/quantumblacklabs/kedro-airflow), making it easy to prototype your data pipeline in Kedro before deploying to [Airflow](https://github.com/apache/airflow), a workflow scheduler
- - [Kedro-Docker](https://github.com/quantumblacklabs/kedro-docker), a tool for packaging and shipping Kedro projects within containers
-- Kedro can be deployed locally, on-premise and cloud (AWS, Azure and GCP) servers, or clusters (EMR, Azure HDinsight, GCP and Databricks)
+| Feature | What is this? |
+|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Project Template | A standard, modifiable and easy-to-use project template based on [Cookiecutter Data Science](https://github.com/drivendata/cookiecutter-data-science/). |
+| Data Catalog | A series of lightweight data connectors used for saving and loading data across many different file formats and file systems including local storage, AWS, Azure Blob and GCP _(coming soon)_. The Data Catalog also includes data and model versioning for file-based systems. Used with a Python or YAML API. |
+| Pipeline Abstraction | Automatic resolution of dependencies between pure Python functions and data pipeline visualisation using [Kedro-Viz](https://github.com/quantumblacklabs/kedro-viz). |
+| The Journal | An ability to reproduce pipeline runs with saved pipeline run results. |
+| Coding Standards | Test-driven development using `pytest`, produce well-documented code using [Sphinx](http://www.sphinx-doc.org/en/master/) and make use of the standard Python logging library. |
+| Flexible Deployment | Deployment strategies that include the use of Docker with [Kedro-Docker](https://github.com/quantumblacklabs/kedro-docker), conversion of Kedro pipelines into Airflow DAGs with [Kedro-Airflow](https://github.com/quantumblacklabs/kedro-airflow), leveraging a REST API endpoint with Kedro-Server _(coming soon)_ and serving Kedro pipelines as a Python package. Kedro can be deployed locally, on-premise and cloud (AWS, Azure and GCP) servers, or clusters (EMR, Azure HDinsight, GCP and Databricks) |
-## What are the main Kedro building blocks?
-
-You can find the overview of Kedro architecture [here](https://kedro.readthedocs.io/en/latest/06_resources/02_architecture_overview.html).
 ## How do I use Kedro?
 Our [documentation](https://kedro.readthedocs.io/en/latest/) explains:
-- A typical Kedro workflow
-- How to set up the project configuration
-- Building your first pipeline
-- How to use the CLI offered by `kedro_cli.py` (`kedro new`, `kedro run`, ...)
+- Best-practice on how to [get started using Kedro](https://kedro.readthedocs.io/en/latest/02_getting_started/01_prerequisites.html)
+- A ["Hello World" data and ML pipeline example](https://kedro.readthedocs.io/en/latest/02_getting_started/04_hello_world.html) based on the **Iris dataset**
+- A two-hour [Spaceflights tutorial](https://kedro.readthedocs.io/en/latest/03_tutorial/01_workflow.html) that teaches you beginner to intermediate functionality
+- How to [use the CLI](https://kedro.readthedocs.io/en/latest/06_resources/03_commands_reference.html) offered by `kedro_cli.py` (`kedro new`, `kedro run`, ...)
+- An overview of [Kedro architecture](https://kedro.readthedocs.io/en/latest/06_resources/02_architecture_overview.html)
+- [Frequently asked questions (FAQs)](https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html)
-> *Note:* The CLI is a convenient tool for being able to run `kedro` commands but you can also invoke the Kedro CLI as a Python module with `python -m kedro`
+Documentation for the latest stable release can be found [here](https://kedro.readthedocs.io/en/latest/). You can also run `kedro docs` from your CLI and open the documentation for your current version of Kedro in a browser.
-## How do I find Kedro documentation?
-
-This CLI command will open the documentation for your current version of Kedro in a browser:
-
-```
-kedro docs
-```
-
-Documentation for the latest stable release can be found [here](https://kedro.readthedocs.io/en/latest/). Check these out first:
+> *Note:* The CLI is a convenient tool for being able to run `kedro` commands but you can also invoke the Kedro CLI as a Python module with `python -m kedro`
-- [Getting started](https://kedro.readthedocs.io/en/latest/02_getting_started/01_prerequisites.html)
-- [Tutorial](https://kedro.readthedocs.io/en/latest/03_tutorial/01_workflow.html)
-- [FAQ](https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html)
+*Note:* Read our [FAQs](https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html#how-does-kedro-compare-to-other-projects) to learn how we differ from workflow managers like Airflow and Luigi.
 ## Can I contribute?
 Yes! Want to help build Kedro? Check out our guide to [contributing](https://github.com/quantumblacklabs/kedro/blob/master/CONTRIBUTING.md).
-## How do I upgrade Kedro?
-
-We use [Semantic Versioning](http://semver.org/). The best way to safely upgrade is to check our [release notes](https://github.com/quantumblacklabs/kedro/blob/master/RELEASE.md) for any notable breaking changes.
-
-Once Kedro is installed, you can check your version as follows:
-
-```
-kedro --version
-```
-
-To later upgrade Kedro to a different version, simply run:
-
-```
-pip install kedro -U
-```
 ## What licence do you use?
 Kedro is licensed under the [Apache 2.0](https://github.com/quantumblacklabs/kedro/blob/master/LICENSE.md) License.
+
 ## We're hiring!
 Do you want to be part of the team that builds Kedro and [other great products](https://quantumblack.com/labs) at QuantumBlack? If so, you're in luck! QuantumBlack is currently hiring Software Engineers who love using data to drive their decisions. Take a look at [our open positions](https://www.quantumblack.com/careers/current-openings#content) and see if you're a fit.
diff --git a/docs/source/06_resources/01_faq.md b/docs/source/06_resources/01_faq.md
index 0fb77ec62b..b546924e06 100644
--- a/docs/source/06_resources/01_faq.md
+++ b/docs/source/06_resources/01_faq.md
@@ -83,7 +83,23 @@ The primary differences to Bonobo ETL and Bubbles are related to the following f
 ## What version of Python does Kedro use?
-Kedro is built for Python 3.5+.
+Kedro is built for Python 3.5, 3.6 and 3.7.
+
+## How do I upgrade Kedro?
+
+We use [Semantic Versioning](http://semver.org/). The best way to safely upgrade is to check our [release notes](https://github.com/quantumblacklabs/kedro/blob/master/RELEASE.md) for any notable breaking changes.
+
+Once Kedro is installed, you can check your version as follows:
+
+```
+kedro --version
+```
+
+To later upgrade Kedro to a different version, simply run:
+
+```
+pip install kedro -U
+```
 ## What best practice should I follow to avoid leaking confidential data?
diff --git a/img/pipeline_visualisation.png b/img/pipeline_visualisation.png
index f35eaa1e41..e8338254f5 100644
Binary files a/img/pipeline_visualisation.png and b/img/pipeline_visualisation.png differ