Commit

Merge pull request #16 from udel-cbcb/Submodule3_Tutorial2
Submodule3 tutorial2
chenchuming authored Oct 20, 2024
2 parents 4fd6c1d + 2b35b8a commit 68d9f8c
Showing 3 changed files with 318 additions and 140 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -176,3 +176,4 @@ cython_debug/
/Submodule_1/Exercises/*.csv
/Submodule_1/Tutorials/foo.*
/Submodule_3/Tutorials/*.csv
/Submodule_3/Tutorials/*.names
Original file line number Diff line number Diff line change
@@ -6,6 +6,8 @@
"source": [
"# Basic Data Cleaning\n",
"\n",
"Adapted from Jason Brownlee. 2020. [Data Preparation for Machine Learning](https://machinelearningmastery.com/data-preparation-for-machine-learning/).\n",
"\n",
"## Overview\n",
"\n",
"This tutorial covers basic data cleaning techniques using Python and pandas. We'll explore common data quality issues and learn how to address them effectively.\n",
@@ -47,22 +49,46 @@
"metadata": {},
"outputs": [],
"source": [
"# Install required packages\n",
"%pip install numpy\n",
"%pip install pandas\n",
"%pip install requests\n",
"\n",
"# Import necessary libraries\n",
"%pip install requests"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import necessary libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import requests\n",
"import requests"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Define utility functions\n",
"\n",
"Define a helper function for downloading example datasets. \n",
"\n",
"# Define helper function for downloading example datasets.\n",
"# (It is not essential that you understand the following code--it is just for\n",
"# getting the example data.)\n",
"*Note!* It is not essential that you understand the following code. It is just for getting the example data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def download(url, to_file):\n",
" \"\"\"Download content from the given URL and save it to a file.\n",
"\n",
@@ -72,20 +98,8 @@
"\n",
" \"\"\"\n",
" response = requests.get(url, timeout=10)\n",
" Path(to_file).write_bytes(response.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this tutorial, you will learn:\n",
"\n",
"* How to identify and remove column variables that only have a single value.\n",
"* How to identify and consider column variables with very few unique values.\n",
"* How to identify and remove rows that contain duplicate observations.\n",
"\n",
"Adapted from Jason Brownlee. 2020. [Data Preparation for Machine Learning](https://machinelearningmastery.com/data-preparation-for-machine-learning/)."
" Path(to_file).write_bytes(response.content)\n",
" print(f\"downloaded file '{to_file}'\")"
]
},
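
The `download` helper in this hunk writes whatever the server returns, and it fetches the file again on every run. The following is a minimal sketch of one possible variant, not the notebook's code: the name `download_if_missing` and its return value are hypothetical, and it swaps `requests` for the standard library's `urllib.request.urlopen` so that it has no third-party dependency.

```python
from pathlib import Path
from urllib.request import urlopen


def download_if_missing(url, to_file, timeout=10):
    """Download `url` to `to_file` unless the file already exists.

    Hypothetical variant of the notebook's `download` helper. Returns True
    when a download happened and False when the existing file was reused.
    """
    target = Path(to_file)
    if target.exists():
        # Reuse the local copy instead of fetching it again.
        return False
    # urlopen raises urllib.error.HTTPError for 4xx/5xx HTTP responses,
    # so an error page is never written to disk as if it were data.
    with urlopen(url, timeout=timeout) as response:
        target.write_bytes(response.read())
    print(f"downloaded file '{to_file}'")
    return True
```

If the notebook keeps `requests`, calling `response.raise_for_status()` before writing the file would give a similar guard against saving an error response.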
{