diff --git a/README.md b/README.md
index 39f54a5..b2e5cb6 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,36 @@
-# FRX Challenges
+# Frictionless Data Exchanges (FRX) for data challenges
 
-## The Helm chart
+Welcome to the FRX Challenges project! This is an open source repository that provides key software components for running competitive data science challenges.
+
+FRX stands for **F**rictionless **R**esearch e**X**change, a term introduced in [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865). This project enables communities to **host data challenges with live evaluation**. It is designed to run either on local infrastructure or on cloud-hosted infrastructure.
+
+## Target functionality and goals
+
+This is a young project under active development.
+Below are the core workflows we aim to support:
+
+- **prompts**: Allow organizers to create a website that describes the data challenge and provides instructions for participants.
+- **submissions**: Allow one or more participants to create one or more submissions for evaluation.
+- **evaluators**: Leverage cloud infrastructure to run submissions against standardized environments and datasets, and allow organizers to define their own evaluation scripts, criteria, and metrics.
+- **feedback**: Provide information to participants about how their submissions scored relative to others.
+
+We aim for this project to work **locally** as well as **via cloud infrastructure**, so that users may prototype and make submissions on their own hardware and environments, or via community-hosted infrastructure like **JupyterHub** and **BinderHub**.
+
+## About this project and acknowledgements
+
+This project was built in collaboration with [the HHMI CellMap Segmentation challenge](https://cellmapchallenge.janelia.org/), which funded its original development.
+Our goal is to generalize the infrastructure that enabled this challenge so that other communities can use it with their own datasets and workflows.
+
+It builds heavily upon the Jupyter ecosystem and is designed to interoperate with community-based cloud infrastructure like [JupyterHub](https://jupyterhub.readthedocs.io) and [BinderHub](https://binderhub.readthedocs.io).
+
+It is currently developed and maintained [by 2i2c](https://2i2c.org), a non-profit dedicated to providing communities with interactive computing infrastructure to create and share knowledge.
+
+## Technical details
+
+This repository contains the core software that powers the FRX Challenges platform.
+It consists of a Django application that uses cloud infrastructure as part of its evaluation system.
+
+### Additional Helm chart
 
 The Helm chart lets a user create a reproducible and maintainable deployment of FRX Challenges on a Kubernetes cluster in a cloud environment.
 The
diff --git a/docs/frx.md b/docs/frx.md
new file mode 100644
index 0000000..e7b2bff
--- /dev/null
+++ b/docs/frx.md
@@ -0,0 +1,51 @@
+---
+abbreviations:
+  FRX: Frictionless Research Exchange
+---
+
+# About Frictionless Data Exchanges
+
+This page describes some of the background and inspiration for this project, drawn from [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865).
+
+## The core idea
+
+In [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865), the author describes three key initiatives that together enable a Frictionless Research Exchange (FRX):
+
+> The three initiatives are related but separate; and all three have to come together, and in a particularly strong way, to provide the conditions for the new era. Here they are:
+>
+> **[FR-1: Data]** datafication of everything, with a culture of research data sharing. One can now find datasets publicly available online on a bewildering variety of topics, from chest x-rays to cosmic microwave background measurements to uber routes to geospatial crop identifications.
+>
+> **[FR-2: Re-execution]** research code sharing including the ability to exactly re-execute the same complete workflow by different researchers.
+>
+> **[FR-3: Challenges]** adopting challenge problems as a new paradigm powering scientific research. The paradigm includes: a shared public dataset, a prescribed and quantified task performance metric, a set of enrolled competitors seeking to outperform each other on the task, and a public leaderboard. Thousands of such challenges with millions of entries have now taken place, across many fields.
+>
+> -- [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865)
+
+Together, these three components provide a powerful framework for sharing and accelerating scientific discovery:
+
+> We see a new institution arising spontaneously; let’s call it a Frictionless Research Exchange (FRX).
+> FRX is an exchange, because participants are constantly bringing something (code, data, results), and taking something (code, data, new ideas), from the exchange; and various globally visible resources - task leaderboards, open review referee reports - broadcast information to the whole community about what works and what doesn’t. Of course, this is a very different type of exchange from those involved in financial markets; it involves intellectual engagement, not money. Financial exchanges produce price discovery. Frictionless Research Exchanges produce community critical review.
+>
+> -- [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865)
+
+However, one piece is commonly missing and takes an unnecessary amount of work to put in place: infrastructure that reduces the friction of hosting data challenges.
+
+## Enabling Data Challenges is our goal
+
+Donoho notes that the most common way to fall short of a full FRX is to leave out the challenge component ([FR-3]).
+
+> The most common leave-one-out setting is surely Reproducible Computational Science (RCS) where we combine [FR-1: Data Sharing] and [FR-2: Code Sharing], without [FR-3]. Here there is a scientific question but no underlying challenge problem being considered; we might simply be doing an exploratory data analysis and reporting what we saw, and giving others access to the data and the analysis scripts. RCS lacks the ability to focus attention of a broad audience on optimizing a performance measure.
+>
+> -- [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865)
+
+We believe this is partly because there are no clear tools or standards for enabling this aspect of FRX without a lot of custom work and infrastructure orchestration.
+
+This is the gap that this project aims to fill. The `frx-challenges` project allows a data challenge organizer to enable **[FR-3: Challenges]** by leveraging open datasets ([FR-1]) and computational infrastructure for reproducible execution ([FR-2]).
+
+Enabling all three of these components is a key aspect of realizing Frictionless Data Exchanges:
+
+> Without all three triad legs [FR-1]+[FR-2]+[FR-3], FR is simply blocked.
+>
+> Less clear is what we might be missing without [FR-3 – Challenges]. We would be missing the task definition which formalized a specific research problem and made it an object of study; the competitive element which attracted our attention in the first place; and the performance measurement which crystallized a specific project’s contribution, boiling down an entire research contribution essentially to a single number, which can be reproduced. The quantification of performance – part of practice [FR-3] – makes researchers everywhere interested in reproducing work by others and gives discussion about earlier work clear focus; it enables a community of researchers to care intensely about a single defined performance number, and in discussing how it can be improved.
+>
+> -- [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865)
diff --git a/docs/index.md b/docs/index.md
index 67649a5..691383a 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,18 +1,3 @@
-# FRX Challenges
+```{include} ../README.md
 
-Welcome to the FRX Challenges project! This is an open source repository that provides key software components for running competitive data science challenges.
-
-:::{card} Getting Started
-:link: quickstart.md
-See our quickstart guide to deploying this project.
-:::
-
-## Features
-
-- submissions!
-- evaluators!
-- teams!
-
-## Goals
-
-FRX stands for **F**rictionless **R**eproducibility e**X**change. Inspired by [Donoho, 2023](https://arxiv.org/abs/2310.00865v1).
+```
diff --git a/docs/myst.yml b/docs/myst.yml
index 66d1263..5e51263 100644
--- a/docs/myst.yml
+++ b/docs/myst.yml
@@ -14,6 +14,7 @@ project:
     # Auto-generated by `myst init --write-toc`
     - file: index.md
     - file: quickstart.md
+    - file: frx.md
     - title: Contribute
       children:
         - file: CONTRIBUTING.md
@@ -23,4 +24,5 @@ site:
   options:
     hide_outline: true
     logo: images/logo.png
-    # favicon: favicon.ico
+    logo_text: FRX Data Challenges
+    favicon: images/logo.png
diff --git a/docs/quickstart.md b/docs/quickstart.md
index 28dbe29..222067a 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -5,3 +5,8 @@ description: Get up and running deploying an FRX Challenge.
 ---
 
 This is the quickstart guide to get you started with your first deployment of an FRX Challenge website.
+
+:::{tip} Work in progress!
+
+This guide is under active development; please check back soon for updates.
+:::
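The README's **evaluators** and **feedback** workflows, together with the FR-3 paradigm quoted in `docs/frx.md` (a shared dataset, a prescribed metric, competing entries, and a public leaderboard), can be illustrated with a small Python sketch. This is not the project's implementation. `Submission`, `evaluate`, `leaderboard`, and the mean-absolute-error metric are hypothetical names chosen for the example, and ranking each participant by their single best submission is an assumed policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


# Hypothetical data model: these names are illustrative and are not
# taken from the frx-challenges codebase.
@dataclass(frozen=True)
class Submission:
    participant: str
    predictions: Sequence[float]


# A metric maps (ground truth, predictions) to a single score.
Metric = Callable[[Sequence[float], Sequence[float]], float]


def mean_absolute_error(truth: Sequence[float], preds: Sequence[float]) -> float:
    """Example metric (an assumption for this sketch); lower is better."""
    if len(truth) != len(preds):
        raise ValueError("prediction length does not match the ground truth")
    return sum(abs(t - p) for t, p in zip(truth, preds)) / len(truth)


def evaluate(submission: Submission, truth: Sequence[float], metric: Metric) -> float:
    """Evaluator step: score one submission against the shared dataset."""
    return metric(truth, submission.predictions)


def leaderboard(
    submissions: Sequence[Submission],
    truth: Sequence[float],
    metric: Metric,
    lower_is_better: bool = True,
) -> List[Tuple[str, float]]:
    """Feedback step: rank participants by their best-scoring submission."""
    best: Dict[str, float] = {}
    for sub in submissions:
        score = evaluate(sub, truth, metric)
        previous = best.get(sub.participant)
        improved = previous is None or (
            score < previous if lower_is_better else score > previous
        )
        if improved:
            best[sub.participant] = score
    # Sort best-first; sorted() is stable, so ties keep first-seen order.
    return sorted(best.items(), key=lambda item: item[1], reverse=not lower_is_better)


truth = [1.0, 2.0, 3.0]
submissions = [
    Submission("alice", [1.0, 2.0, 4.0]),
    Submission("bob", [1.5, 2.5, 3.5]),
    Submission("alice", [1.0, 2.0, 3.0]),
]
ranking = leaderboard(submissions, truth, mean_absolute_error)
print(ranking)  # [('alice', 0.0), ('bob', 0.5)]
```

Passing the metric in as a plain callable mirrors the README's goal of letting organizers define their own evaluation criteria. In practice, scoring would presumably happen inside the standardized environments the README describes rather than on in-memory lists.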