This repository is a collection of my projects from Dataquest.io.
The projects below serve as reference notes for me and anyone else who is interested. They contain many comments showing my thought process and what I learned from each project.
Project #1: Exploring US births
Concepts explored: lists, dictionaries, functions, for loops
Functions, methods, and properties used: .read(), open(), .split(), .append(), int()
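As an illustration of the concepts listed for Project #1, here is a minimal sketch of splitting raw text into rows and totalling a column with a dictionary. It is not the project's actual code: the sample text, the column layout, and the helper names `parse_rows` and `births_by_column` are assumptions made for this example, and an inline string stands in for the `open()` / `.read()` step so the snippet runs on its own.

```python
# Hypothetical sample standing in for the contents returned by open(...).read().
# The column layout (births in the last column) is an assumption for this sketch.
raw_text = """year,month,date_of_month,day_of_week,births
1994,1,1,6,8096
1994,1,2,7,7772
1994,2,1,2,10000
"""


def parse_rows(text):
    """Split the text into lines, skip the header, and convert fields to int."""
    lines = text.strip().split("\n")
    rows = []
    for line in lines[1:]:
        int_fields = []
        for value in line.split(","):
            int_fields.append(int(value))
        rows.append(int_fields)
    return rows


def births_by_column(rows, column_index):
    """Sum the births column, grouped by the value found at column_index."""
    totals = {}
    for row in rows:
        key = row[column_index]
        births = row[4]
        if key in totals:
            totals[key] += births
        else:
            totals[key] = births
    return totals


rows = parse_rows(raw_text)
births_per_month = births_by_column(rows, 1)
print(births_per_month)
```

Passing a different `column_index` (for example 3 for the day of the week) reuses the same function for another grouping.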
Project #2: Exploring Gun Deaths in the US
Concepts explored: list comprehension, datetime module, csv module
Functions, methods, and properties used: csv.reader(), .items(), list(), datetime.datetime()
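The tools listed for Project #2 can be combined roughly as follows. This is a sketch under stated assumptions, not the project's code: the rows and column names are made up, and `io.StringIO` stands in for a real file so the example is self-contained.

```python
import csv
import datetime
import io

# Hypothetical rows; the real dataset and its columns are not reproduced here.
sample = io.StringIO(
    "year,month,category\n"
    "2012,01,A\n"
    "2012,01,B\n"
    "2012,02,A\n"
)

# csv.reader() yields one list of strings per row; list() materializes them.
data = list(csv.reader(sample))
header, rows = data[0], data[1:]

# List comprehension: build one datetime per row, fixing the day to 1.
dates = [
    datetime.datetime(year=int(row[0]), month=int(row[1]), day=1)
    for row in rows
]

# Count how many rows fall in each month.
date_counts = {}
for date in dates:
    date_counts[date] = date_counts.get(date, 0) + 1

# .items() iterates over (key, value) pairs of the dictionary.
for date, count in date_counts.items():
    print(date.strftime("%Y-%m"), count)
```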
Project #3: Analyzing Thanksgiving Dinner
Concepts explored: pandas, functions, boolean filtering
Functions, methods, and properties used: .read_csv(), .pivot_table(), .replace(), .describe(), .apply(), .isnull(), .columns, .shape, .head()
Project #4: Visualizing Earnings Based On College Majors
Concepts explored: pandas, matplotlib, histograms, bar charts, scatterplots, scatter matrices
Functions, methods, and properties used: .plot(), scatter_matrix(), .hist(), .iloc[], .head(), .tail(), .describe()
Project #5: Visualizing The Gender Gap In College Degrees
Concepts explored: pandas, matplotlib, histograms, line plots, chart graphics
Functions, methods, and properties used: .savefig(), .text(), .axhline(), .set_yticks(), .tick_params(), .set_title(), .set_ylim(), .set_xlim(), .spines
Project #6: Analyzing NYC High School Data
Concepts explored: pandas, matplotlib.pyplot, correlations, regex, basemap, data analysis, string manipulation
Functions, methods, and properties used: .scatter(), .info(), .tolist(), .groupby(), .agg(), .concat(), .apply(), .strip(), .merge(), .fillna(), .corr()
Project #7: Star Wars Survey
Concepts explored: pandas, matplotlib.pyplot, data cleaning, string manipulation, bar plots
Functions, methods, and properties used: .read_csv(), .columns, .notnull(), .map(), .dtypes, .rename(), .astype(), .mean(), .sum(), .xlabel(), .ylabel()
Project #8: Working with Data Downloads
Concepts explored: pandas, manipulating files with the command line
Project #9: Transforming Data with Python
Concepts explored: pandas, manipulating files with the command line, working with multiple Python scripts, dateutil.parser
Project #10: Analyzing CIA Factbook
Python/SQL concepts explored: python+sqlite3, pandas, SQL queries, SQL subqueries, matplotlib.pyplot, seaborn, histograms
Functions, methods, and properties used: .cursor(), .read_sql_query(), .set_xlabel(), .set_xlim(), .add_subplot(), .figure()
SQL statements used: SELECT, WHERE, FROM, MIN(), MAX(), ORDER BY, AND
Project #11: Preparing data for SQLite
Python/SQL concepts explored: python+sqlite3, pandas, data cleaning, columns manipulation
Functions, methods, and properties used: .str.rstrip(), .str.split(), .connect(), .cursor(), .drop(), .str[], .map(), .value_counts()
SQL statements used: SELECT, FROM, PRAGMA
Project #12: Creating Relations in SQLite
Python/SQL concepts explored: python+sqlite3, pandas, multiple tables, foreign keys, subqueries, populating new tables
Functions, methods, and properties used: .cursor(), .connect(), .execute(), .fetchall(), .executemany()
SQL statements used: PRAGMA, LIMIT, FROM, SELECT, INNER JOIN, DROP, ALTER, VALUES
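The relation-building workflow listed for Project #12 might look roughly like the sketch below. The `authors` and `books` tables and their rows are hypothetical, chosen only to show a foreign key, `.executemany()`, and an `INNER JOIN`; an in-memory database is used so nothing is written to disk.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON;")
cur = conn.cursor()

# Hypothetical parent and child tables linked by a foreign key.
cur.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);")
cur.execute(
    """
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER,
        FOREIGN KEY (author_id) REFERENCES authors(id)
    );
    """
)

# .executemany() runs the same INSERT once per tuple in the list.
cur.executemany(
    "INSERT INTO authors (id, name) VALUES (?, ?);",
    [(1, "Author A"), (2, "Author B")],
)
cur.executemany(
    "INSERT INTO books (title, author_id) VALUES (?, ?);",
    [("Book One", 1), ("Book Two", 1), ("Book Three", 2)],
)
conn.commit()

# Join the child table back to its parent through the foreign key.
cur.execute(
    """
    SELECT books.title, authors.name
    FROM books
    INNER JOIN authors ON books.author_id = authors.id
    ORDER BY books.id
    LIMIT 2;
    """
)
joined = cur.fetchall()
print(joined)
conn.close()
```

With the pragma enabled, inserting a book whose `author_id` has no matching author raises `sqlite3.IntegrityError`, which is the main reason to declare the relation explicitly.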
Project #13: Analyzing Movie Reviews
Concepts explored: pandas, descriptive statistics, numpy, matplotlib, scipy, correlations
Functions and methods used: .sort_values(), sci.linregress(), .hist(), .absolute(), .mean(), .median()
Project #14: Winning Jeopardy
Concepts explored: pandas, matplotlib, data cleaning, string manipulation, chi squared test, regex, try/except
Functions, methods, and properties used: .columns, .lower(), .sub(), .apply(), sum(), .array(), .split(), .shape, .mean(), .iterrows(), .remove(), .add(), .append()
Project #15: Predicting Car Prices
Concepts explored: pandas, matplotlib, data cleaning, feature engineering, k-nearest neighbors, hyperparameter tuning, RMSE
Functions and methods used: .read_csv(), .replace(), .drop(), .astype(), .isnull().sum(), .min(), .max(), .mean(), .permutation(), .reindex(), .iloc[], .fit(), .predict(), mean_squared_error(), .Series(), .sort_values(), .plot(), .legend()
Project #16: Predicting House Sale Prices
Concepts explored: pandas, data cleaning, feature engineering, linear regression, hyperparameter tuning, RMSE, KFold validation
Functions, methods, and properties used: .dtypes, .value_counts(), .drop(), .isnull(), .sum(), .fillna(), .sort_values(), .corr(), .index, .append(), .get_dummies(), .astype(), .predict(), .fit(), KFold(), mean_squared_error()
Project #17: Predicting the Stock Market
Concepts explored: linear regression, mean squared error, categorical features, datetime
Functions, methods, and properties used: .read_csv(), .to_datetime(), .sort_values(), .rolling(), .apply(), .concat(), .get_dummies(), .shift(), datetime(), .fit(), .predict(), mean_squared_error()
Project #18: Predicting Bike Rentals
Concepts explored: pandas, matplotlib, feature engineering, linear regression, decision trees, random forests, MSE
Functions, methods, and properties used: .hist(), .apply(), .corr(), .columns, .drop(), .sample(), .index, .floor(), .fit(), .predict(), mean_squared_error(), .append()
Project #19: Investigating Airplane Accidents
Concepts explored: Big O notation, strings, dictionaries, data parsing, try/except
Functions, methods, and properties used: range(), .append(), .split(), .values(), Counter()
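To illustrate the parsing pattern listed for Project #19, here is a small sketch that splits delimited lines into dictionaries, skips malformed lines with try/except, and tallies a field with `Counter()`. The sample lines, the ` | ` delimiter, the field names, and the `parse` helper are all assumptions for this example rather than the project's real data or code.

```python
from collections import Counter

# Hypothetical records; the fourth entry is deliberately missing a field.
lines = [
    "id | date | state",
    "a1 | 2015-09-17 | CA",
    "a2 | 2015-09-18 | TX",
    "a3 | 2015-09-18",
    "a4 | 2015-09-19 | CA",
]


def parse(lines):
    """Turn each data line into a dict; runs in O(n) for n lines."""
    records = []
    # Start at index 1 to skip the header line.
    for i in range(1, len(lines)):
        parts = lines[i].split(" | ")
        try:
            record = {"id": parts[0], "date": parts[1], "state": parts[2]}
        except IndexError:
            # A line with too few fields is skipped instead of crashing.
            continue
        records.append(record)
    return records


records = parse(lines)
# Counter tallies how often each state appears among the parsed records.
state_counts = Counter(record["state"] for record in records)
print(state_counts.most_common(1))
```

Because each line is visited once and dictionary updates take constant time on average, the whole pass stays linear in the number of lines.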
Project #20: Working with Spark in Jupyter Notebook
Concepts explored: Spark
PySpark methods used: .map(), .flatMap(), .filter(), .count(), .collect(), .take()
Concepts explored: Spark SQL, Spark DataFrames, combining data from multiple files
Methods and functions used: .SQLContext(), .head(), .toPandas(), .show(), .select(), .hist(), .registerTempTable()