README

Data generation

For all challenges during the Ideathon, you are permitted to generate your own datasets for the purposes of developing, testing, and demonstrating your prototype. Simulated data must be completely randomly generated and cannot be based on any real-world datasets. Instead, simulated data should be generated using simple heuristics, for example draws from probability distributions. For all simulated data, you must include a data dictionary that defines all your variables and clearly justifies any choices in data generation. Finally, your data generation process must be simple enough that datasets can be clearly and easily reproduced.
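As a rough illustration of these requirements, the sketch below draws every variable from a simple probability distribution, fixes a random seed so the dataset can be reproduced, and keeps a data dictionary next to the generator. It is written in Python for illustration only (the repository's own example is an R script). The variable names, distributions, and parameter values are all hypothetical and arbitrary, not taken from any real dataset or from the Ideathon organisers.

```python
import random

SEED = 2023  # hypothetical fixed seed so every run produces the same rows

# Data dictionary: one entry per variable, with the generation choice justified.
# All variables and parameter values here are illustrative assumptions.
DATA_DICTIONARY = {
    "participant_id": "Sequential integer starting at 1; carries no meaning.",
    "age_years": "Integer drawn uniformly from 18 to 90; arbitrary adult range.",
    "sleep_hours": "Normal(mean=7, sd=1.5) clipped to [0, 24]; arbitrary values.",
    "reports_low_mood": "Bernoulli(p=0.2); the probability is an arbitrary choice.",
}


def generate_dataset(n_rows, seed=SEED):
    """Return n_rows simulated records as a list of dictionaries."""
    rng = random.Random(seed)  # local generator, independent of global state
    rows = []
    for i in range(1, n_rows + 1):
        # Clip the normal draw so the value stays a plausible number of hours.
        sleep = min(24.0, max(0.0, rng.gauss(7.0, 1.5)))
        rows.append({
            "participant_id": i,
            "age_years": rng.randint(18, 90),
            "sleep_hours": round(sleep, 1),
            "reports_low_mood": rng.random() < 0.2,
        })
    return rows


if __name__ == "__main__":
    for row in generate_dataset(3):
        print(row)
```

Because the generator is seeded, another team can regenerate exactly the same dataset from the script alone, which is why only the script needs to be shared.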

Uploading and sharing simulated data

Any data generation scripts you create for use in your project must be shared with all other teams at the point of creation. To do this, upload your data generation script (not the data itself) to this repository in the appropriate folder, which will be named in the format health_challenge (e.g. mental_health). All scripts in this repository are licensed under the MIT License without exception.

You should name your script challenge_TeamName_DatasetID where:

challenge is a one-word ID for your chosen challenge that can be found in the README of each challenge area folder, as well as in the respective challenge repositories
TeamName is the name of your team in CamelCase
DatasetID is a readable identifier for your dataset in CamelCase (e.g., if you were generating a dataset to predict if bears like sandwiches then you might call this "BearSandwichData")

If you make changes to your script, you are free to overwrite it directly using GitHub's version control, or you could add a new version with a suffix to clearly differentiate versions, e.g. challenge_TeamName_DatasetID_v2.5
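The naming convention can be checked mechanically. The following Python sketch assembles a script name from its parts and rejects values that do not fit the pattern. The function name `build_script_name` is hypothetical, and the sketch assumes that "CamelCase" means a name starting with a capital letter and containing only letters and digits, which this README does not spell out.

```python
import re

# Assumption: CamelCase = starts with an uppercase letter, letters/digits only.
CAMEL_CASE = re.compile(r"[A-Z][A-Za-z0-9]*")
# Assumption: the one-word challenge ID contains only letters and digits.
ONE_WORD = re.compile(r"[A-Za-z0-9]+")


def build_script_name(challenge, team_name, dataset_id, version=None):
    """Return challenge_TeamName_DatasetID, with an optional _v<version> suffix."""
    if not ONE_WORD.fullmatch(challenge):
        raise ValueError(f"challenge must be a single word: {challenge!r}")
    if not CAMEL_CASE.fullmatch(team_name):
        raise ValueError(f"TeamName must be CamelCase: {team_name!r}")
    if not CAMEL_CASE.fullmatch(dataset_id):
        raise ValueError(f"DatasetID must be CamelCase: {dataset_id!r}")
    name = f"{challenge}_{team_name}_{dataset_id}"
    if version is not None:
        name += f"_v{version}"
    return name


if __name__ == "__main__":
    print(build_script_name("bears", "TeamWellcome", "BearSandwichData", "1"))
```

The file extension is left out on purpose, since it depends on the language the script is written in.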

Using other teams' data

You may use data generated from another team's script provided that it is cited as: "data used with thanks from challenge_TeamName_DatasetID (accessed YYYY-MM-DD)" (where YYYY-MM-DD is the date you accessed the script for generation).

Using other teams' data generation scripts

You cannot make changes to other teams' scripts directly but are free to either:

Make Pull Requests that the team can merge or close; or
Create a new script based on another team's script, provided that you cite it as: "data generation process based on challenge_TeamName_DatasetID (accessed YYYY-MM-DD)" (where YYYY-MM-DD is the date you accessed the script for adaptation).
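The two citation formats in this README (one for using another team's data, one for adapting another team's script) differ only in their opening words, so they can be generated from the script name and the access date. The Python sketch below is one way to do that; the function names are hypothetical and the date in the usage example is made up.

```python
from datetime import date


def cite_data(script_name, accessed):
    """Citation for using data generated from another team's script."""
    return f"data used with thanks from {script_name} (accessed {accessed.isoformat()})"


def cite_adapted_script(script_name, accessed):
    """Citation for a new script based on another team's script."""
    return f"data generation process based on {script_name} (accessed {accessed.isoformat()})"


if __name__ == "__main__":
    # Hypothetical access date, used only to show the YYYY-MM-DD formatting.
    accessed_on = date(2023, 1, 2)
    print(cite_data("bears_TeamWellcome_BearSandwichData", accessed_on))
    print(cite_adapted_script("bears_TeamWellcome_BearSandwichData", accessed_on))
```

`date.isoformat()` always returns the date as YYYY-MM-DD, which matches the format the citations require.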

Please make use of Issues and Discussions if you have questions about another team's dataset.

Example

See example_challenge_area/bears_TeamWellcome_BearSandwichData_v1.R for an example of how we expect data generation scripts to look. Note how the path above maps onto the naming convention:

health_challenge_area -> example_challenge_area
challenge -> bears
TeamName -> TeamWellcome
DatasetID -> BearSandwichData
Version -> v1
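The same mapping can be recovered programmatically by splitting the path into its parts. This is a minimal Python sketch; the helper name `parse_script_path` is hypothetical. It assumes that the path includes a file extension and that none of the name components contains an underscore.

```python
from pathlib import PurePosixPath


def parse_script_path(path):
    """Split folder/challenge_TeamName_DatasetID[_version].ext into its parts."""
    p = PurePosixPath(path)
    # p.stem drops only the final extension, so "..._v2.5.R" keeps "v2.5".
    parts = p.stem.split("_")
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected script name: {p.name!r}")
    return {
        "folder": p.parent.name,
        "challenge": parts[0],
        "team_name": parts[1],
        "dataset_id": parts[2],
        "version": parts[3] if len(parts) == 4 else None,
    }


if __name__ == "__main__":
    example = "example_challenge_area/bears_TeamWellcome_BearSandwichData_v1.R"
    for key, value in parse_script_path(example).items():
        print(f"{key} -> {value}")
```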
