Add background and about docs #217

Merged 4 commits on Dec 5, 2024
Changes from 2 commits
33 changes: 31 additions & 2 deletions README.md
@@ -1,6 +1,35 @@
# FRX Challenges
# Frictionless Data Exchanges (FRX) for data challenges

## The Helm chart
Welcome to the FRX Challenges project! This is an open-source repository that provides key software components for running competitive data science challenges.

FRX stands for **F**rictionless **R**eproducibility e**X**change, a concept inspired by [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865). This project enables communities to leverage cloud infrastructure and interactive computing tools to **host data challenges with live computation**.

## Target functionality and goals

This is a young project under active development.
We aim to support the following core workflows:

- **prompts**: Allow organizers to create a website that describes the data challenge and provides instructions for participants.
- **submissions**: Allow participants to create one or more submissions for evaluation.
- **evaluators**: Leverage cloud infrastructure to run submissions against standardized environments and datasets, and allow organizers to define their own evaluation scripts, criteria, and metrics.
- **feedback**: Provide information to participants about how their submissions scored relative to others.
- **teams**: Allow participants to submit and view their results as a team of people.
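
As a rough illustration of how the submission, evaluator, feedback, and team workflows above fit together, here is a minimal sketch in plain Python. It is not the project's actual Django models or API: `Submission`, `accuracy`, and `leaderboard` are hypothetical names, and the metric is just one example of something an organizer might define.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical types for illustration only; not FRX Challenges' real data model.
Metric = Callable[[Sequence[int], Sequence[int]], float]


@dataclass(frozen=True)
class Submission:
    team: str                    # submissions are attributed to a team
    predictions: Sequence[int]   # the team's output for the shared dataset


def accuracy(truth: Sequence[int], predictions: Sequence[int]) -> float:
    """Example organizer-defined metric: fraction of labels that match."""
    if not truth or len(truth) != len(predictions):
        raise ValueError("predictions must match a non-empty ground truth")
    return sum(t == p for t, p in zip(truth, predictions)) / len(truth)


def leaderboard(
    submissions: Sequence[Submission], truth: Sequence[int], metric: Metric
) -> list[tuple[str, float]]:
    """Score every submission and keep each team's best score, highest first."""
    best: dict[str, float] = {}
    for sub in submissions:
        score = metric(truth, sub.predictions)
        best[sub.team] = max(score, best.get(sub.team, score))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    truth = [1, 0, 1, 1]
    subs = [
        Submission("team-a", [1, 0, 0, 0]),
        Submission("team-a", [1, 0, 1, 0]),
        Submission("team-b", [1, 0, 1, 1]),
    ]
    for team, score in leaderboard(subs, truth, accuracy):
        print(f"{team}: {score:.2f}")
```

Passing the metric in as a function mirrors the idea that organizers define their own evaluation criteria, while the ranking step stays the same for every challenge.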

## About this project and acknowledgements

This project was built in collaboration with [the HHMI CellMap Segmentation challenge](https://cellmapchallenge.janelia.org/), which funded its original development.
Our goal is to generalize the infrastructure behind that challenge so that other communities, datasets, and workflows can use it.

It builds heavily upon the Jupyter ecosystem and is designed to be interoperable with community-based cloud infrastructure like [JupyterHub](https://jupyterhub.readthedocs.io) and [BinderHub](https://binderhub.readthedocs.io).

It is currently developed and maintained [by 2i2c](https://2i2c.org), a non-profit dedicated to providing communities with interactive computing infrastructure to create and share knowledge.

## Technical details

This repository contains the core software that powers the FRX Challenges platform.
It consists of a Django application that is meant to leverage cloud infrastructure as part of the evaluation system.

### Additional Helm chart

The Helm chart lets a user create a reproducible and maintainable
deployment of FRX Challenges on a Kubernetes cluster in a cloud environment. The
51 changes: 51 additions & 0 deletions docs/frx.md
@@ -0,0 +1,51 @@
---
abbreviations:
  FRX: Frictionless Research Exchange
---

# About Frictionless Data Exchanges

This page describes some of the background and inspiration for this project as defined in [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865).

## The core idea

In [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865), the author describes three key aspects of a Frictionless Data Exchange (FRX):

> The three initiatives are related but separate; and all three have to come together, and in a particularly strong way, to provide the conditions for the new era. Here they are:
>
> **[FR-1: Data] datafication of everything**, with a culture of research data sharing. One can now find datasets publicly available online on a bewildering variety of topics, from chest x-rays to cosmic microwave background measurements to uber routes to geospatial crop identifications.
>
> **[FR-2: Re-execution]** research code sharing including the ability to exactly re-execute the same complete workflow by different researchers.
>
> **[FR-3: Challenges]** adopting challenge problems as a new paradigm powering scientific research. The paradigm includes: a shared public dataset, a prescribed and quantified task performance metric, a set of enrolled competitors seeking to outperform each other on the task, and a public leaderboard. Thousands of such challenges with millions of entries have now taken place, across many fields.
>
> -- [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865)

Together, these three components provide a powerful framework for sharing and accelerating scientific discovery:

> We see a new institution arising spontaneously; let’s call it a Frictionless Research Exchange (FRX).
> FRX is an exchange, because participants are constantly bringing something (code, data, results), and taking something (code, data, new ideas), from the exchange; and various globally visible resources - task leaderboards, open review referee reports - broadcast information to the whole community about what works and what doesn’t. Of course, this is a very different type of exchange from those involved in financial markets; it involves intellectual engagement, not money. Financial exchanges produce price discovery. Frictionless Research Exchanges produce community critical review.
>
> -- [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865)

However, one of these components is commonly missing, because enabling it requires an unnecessary amount of work.

## Enabling Data Challenges is our goal

Donoho describes how the piece of FRX that is most commonly left out is the Data Challenge ([FR-3]) component.

> The most common leave-one-out setting is surely Reproducible Computational Science (RCS) where we combine [FR-1: Data Sharing] and [FR-2: Code Sharing], without [FR-3]. Here there is a scientific question but no underlying challenge problem being considered; we might simply be doing an exploratory data analysis and reporting what we saw, and giving others access to the data and the analysis scripts. RCS lacks the ability to focus attention of a broad audience on optimizing a performance measure.
>
> -- [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865)

We believe that this is in part because there are no clear tools or standards for enabling this aspect of FRX without a lot of custom work and infrastructure orchestration.

This is the gap that this project aims to fill. The `frx-challenges` project allows a data challenge organizer to enable **[FR-3: Challenges]** by leveraging open datasets ([FR-1]) and computational infrastructure for reproducible execution ([FR-2]).

Enabling all three of these components is a key aspect of realizing Frictionless Data Exchanges:

> Without all three triad legs [FR-1]+[FR-2]+[FR-3], FR is simply blocked.
>
> Less clear is what we might be missing without [FR-3 – Challenges]. We would be missing the task definition which formalized a specific research problem and made it an object of study; the competitive element which attracted our attention in the first place; and the performance measurement which crystallized a specific project’s contribution, boiling down an entire research contribution essentially to a single number, which can be reproduced. The quantification of performance – part of practice [FR-3] – makes researchers everywhere interested in reproducing work by others and gives discussion about earlier work clear focus; it enables a community of researchers to care intensely about a single defined performance number, and in discussing how it can be improved.
>
> -- [Donoho, 2023](https://doi.org/10.48550/arXiv.2310.00865)
19 changes: 2 additions & 17 deletions docs/index.md
@@ -1,18 +1,3 @@
# FRX Challenges
```{include} ../README.md

Welcome to the FRX Challenges project! This is an open source repository that provides key software components for running competitive data science challenges.

:::{card} Getting Started
:link: quickstart.md
See our quickstart guide to deploying this project.
:::

## Features

- submissions!
- evaluators!
- teams!

## Goals

FRX stands for **F**rictionless **R**eproducibility e**X**change. Inspired by [Donoho, 2023](https://arxiv.org/abs/2310.00865v1).
```
4 changes: 3 additions & 1 deletion docs/myst.yml
@@ -14,6 +14,7 @@ project:
    # Auto-generated by `myst init --write-toc`
    - file: index.md
    - file: quickstart.md
    - file: frx.md
    - title: Contribute
      children:
        - file: CONTRIBUTING.md
@@ -23,4 +24,5 @@ site:
  options:
    hide_outline: true
    logo: images/logo.png
    # favicon: favicon.ico
    logo_text: FRX Data Challenges
    favicon: images/logo.png
5 changes: 5 additions & 0 deletions docs/quickstart.md
@@ -5,3 +5,8 @@ description: Get up and running deploying an FRX Challenge.
---

This is the quickstart guide to get you started with your first deployment of an FRX Challenge website.

:::{tip} Work in progress!

This guide is under active development; please check back soon for updates.
:::