sengkchu/dataquest-guided-projects-solutions

Dataquest Guided Projects

This repository is a collection of my projects from Dataquest.io.

The projects below serve as reference notes for myself and anyone else who is interested. They contain extensive comments showing my thought process and what I learned from each project.

Concepts explored: lists, dictionaries, functions, for loops

Functions, methods, and properties used: .read(), open(), .split(), .append(), int()
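
A minimal sketch of the file-parsing pattern these functions support, assuming a small comma-separated file with a header row. The file name `counts.csv`, its columns, and the helper names `load_rows` and `total_by_key` are hypothetical.

```python
import os
import tempfile

def load_rows(path):
    """Read a comma-separated file and return its rows as lists of strings."""
    with open(path) as f:
        text = f.read()
    rows = []
    for line in text.split("\n"):
        if line:  # skip the empty string left by a trailing newline
            rows.append(line.split(","))
    return rows

def total_by_key(rows):
    """Sum the integer second column for each value in the first column."""
    totals = {}
    for key, value in rows[1:]:  # rows[0] is the header
        totals[key] = totals.get(key, 0) + int(value)
    return totals

# Write a tiny hypothetical file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "counts.csv")
with open(path, "w") as f:
    f.write("state,count\nCA,3\nNY,2\nCA,4\n")

rows = load_rows(path)
totals = total_by_key(rows)
print(totals)  # {'CA': 7, 'NY': 2}
```

A dictionary keyed by the first column makes each running total a constant-time lookup instead of a rescan of the list.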

Concepts explored: list comprehension, datetime module, csv module

Functions, methods, and properties used: csv.reader(), .items(), list(), datetime.datetime()
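
A short sketch combining these pieces, assuming a CSV with a `created_at` timestamp column in `month/day/year hour:minute` format. The data is made up and read from an in-memory string rather than a real file; parsing uses `datetime.datetime.strptime()`, the string-parsing constructor on the `datetime.datetime` class.

```python
import csv
import datetime
import io

# Hypothetical CSV content; in a project this would come from open("file.csv").
raw = io.StringIO(
    "id,created_at\n"
    "1,9/15/2016 23:12\n"
    "2,9/20/2016 8:05\n"
    "3,10/1/2016 14:30\n"
)

rows = list(csv.reader(raw))
header, data = rows[0], rows[1:]

# List comprehension: parse the timestamp column into datetime objects.
dates = [datetime.datetime.strptime(row[1], "%m/%d/%Y %H:%M") for row in data]

# Count rows per month, then iterate over the dictionary with .items().
counts = {}
for d in dates:
    key = d.strftime("%m/%Y")
    counts[key] = counts.get(key, 0) + 1

for month, n in counts.items():
    print(month, n)
# 09/2016 2
# 10/2016 1
```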

Concepts explored: pandas, functions, boolean filtering

Functions, methods, and properties used: .read_csv(), .pivot_table(), .replace(), .describe(), .apply(), .isnull(), .columns, .shape, .head()
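
A minimal pandas sketch of boolean filtering, `.pivot_table()`, and `.apply()`. The DataFrame below is a hypothetical stand-in for data loaded with `pd.read_csv()`; the column names and the 40000 threshold are invented for illustration.

```python
import pandas as pd

# Small hypothetical dataset standing in for a CSV loaded with pd.read_csv().
df = pd.DataFrame({
    "major": ["Math", "Math", "Art", "Art", "Art"],
    "gender": ["F", "M", "F", "M", "F"],
    "salary": [50000, 52000, 30000, None, 34000],
})

print(df.shape)           # (5, 3)
print(df.isnull().sum())  # one missing salary

# Boolean filtering: keep only rows with a recorded salary.
known = df[~df["salary"].isnull()]

# pivot_table aggregates salary by major using the mean.
table = known.pivot_table(index="major", values="salary", aggfunc="mean")
print(table)

# .apply() runs a function on each value of a column.
known = known.assign(high_pay=known["salary"].apply(lambda s: s > 40000))
```

Using `.assign()` returns a new DataFrame, which avoids pandas' warning about setting values on a filtered copy.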

Concepts explored: pandas, matplotlib, histograms, bar charts, scatterplots, scatter matrices

Functions, methods, and properties used: .plot(), scatter_matrix(), hist(), .iloc[], .head(), .tail(), .describe()

Concepts explored: pandas, matplotlib, histograms, line plots, chart graphics

Functions, methods, and properties used: .savefig(), .text(), .axhline(), .set_yticks(), .tick_params(), .set_title(), .set_ylim(), .set_xlim(), .spines

Concepts explored: pandas, matplotlib.pyplot, correlations, regex, basemap, data analysis, string manipulation

Functions, methods, and properties used: .scatter(), .info(), .tolist(), .groupby(), .agg(), .concat(), .apply(), .strip(), .merge(), .fillna(), .corr()
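
A sketch of how `.groupby().agg()`, `.merge()`, `.fillna()`, and `.corr()` chain together when combining datasets, assuming two hypothetical tables that share a `school` key. The column names and values are invented, and filling with the column mean is one choice among several.

```python
import pandas as pd

# Two hypothetical tables keyed on a shared "school" column.
scores = pd.DataFrame({
    "school": ["A", "A", "B", "C"],
    "sat": [1200, 1300, 1100, 1000],
})
survey = pd.DataFrame({
    "school": ["A", "B", "C"],
    "safety": [8.0, None, 6.0],
})

# Collapse repeated rows to one per school with groupby().agg().
per_school = scores.groupby("school", as_index=False).agg({"sat": "mean"})

# Left-join the survey data, then fill the missing value with the column mean.
combined = per_school.merge(survey, on="school", how="left")
combined["safety"] = combined["safety"].fillna(combined["safety"].mean())

# Pearson correlation between the two numeric columns.
corr = combined[["sat", "safety"]].corr()
print(corr)
```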

Project #7: Star Wars Survey

Concepts explored: pandas, matplotlib.pyplot, data cleaning, string manipulation, bar plots

Functions, methods, and properties used: .read_csv(), .columns, .notnull(), .map(), .dtypes, .rename(), .astype(), .mean(), .sum(), .xlabel(), .ylabel()
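
A minimal sketch of the survey-cleaning steps named above: renaming a long question column, mapping Yes/No text to booleans, and converting ranking strings to numbers. The column names and answers are hypothetical, not the actual survey data.

```python
import pandas as pd

# Hypothetical survey answers; real survey columns were long question strings.
survey = pd.DataFrame({
    "Have you seen any of the 6 films?": ["Yes", "No", "Yes", None],
    "rank_1": ["3", "1", "2", "6"],
})

# Shorten an unwieldy column name.
survey = survey.rename(columns={"Have you seen any of the 6 films?": "seen_any"})

# Map text answers to booleans; values missing from the dict (here None) become NaN.
yes_no = {"Yes": True, "No": False}
survey["seen_any"] = survey["seen_any"].map(yes_no)
answered = survey[survey["seen_any"].notnull()]

# Rankings were read as strings, so convert them before doing arithmetic.
survey["rank_1"] = survey["rank_1"].astype(float)

print(survey["seen_any"].sum())  # 2 respondents answered "Yes"
print(survey["rank_1"].mean())   # 3.0
```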

Concepts explored: pandas, manipulating files with the command line

Concepts explored: pandas, manipulating files with the command line, working with multiple Python scripts, dateutil.parser

Python/SQL concepts explored: python+sqlite3, pandas, SQL queries, SQL subqueries, matplotlib.pyplot, seaborn, histograms

Functions, methods, and properties used: .cursor(), .read_sql_query(), .set_xlabel(), .set_xlim(), .add_subplot(), .figure()

SQL statements used: SELECT, WHERE, FROM, MIN(), MAX(), ORDER BY, AND
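
A self-contained sketch of these SQL statements run through Python's built-in `sqlite3` module, using an in-memory database with a hypothetical `facts` table. With pandas, `pd.read_sql_query(query, conn)` would return the same rows as a DataFrame.

```python
import sqlite3

# In-memory database with a hypothetical "facts" table, so no file is needed.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
cursor.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("A", 1000, 10), ("B", 50, 5), ("C", 0, 2), ("D", 7000, 70)],
)

# Aggregate functions over the whole table.
cursor.execute("SELECT MIN(population), MAX(population) FROM facts")
extremes = cursor.fetchone()
print(extremes)  # (0, 7000)

# Filter with WHERE ... AND, then sort with ORDER BY.
query = """
    SELECT name, population
    FROM facts
    WHERE population > 0 AND area < 60
    ORDER BY population DESC
"""
rows = cursor.execute(query).fetchall()
print(rows)  # [('A', 1000), ('B', 50)]

# A subquery computes the average first, then the outer query filters on it.
above_avg = cursor.execute(
    "SELECT name FROM facts WHERE population > (SELECT AVG(population) FROM facts)"
).fetchall()
print(above_avg)  # [('D',)]
conn.close()
```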

Python/SQL concepts explored: python+sqlite3, pandas, data cleaning, column manipulation

Functions, methods, and properties used: .str.rstrip(), .str.split(), .connect(), .cursor(), .drop(), .str[], .map(), .value_counts()

SQL statements used: SELECT, FROM, PRAGMA

Python/SQL concepts explored: python+sqlite3, pandas, multiple tables, foreign keys, subqueries, populating new tables

Functions, methods, and properties used: .cursor(), .connect(), .execute(), .fetchall(), .executemany()

SQL statements used: PRAGMA, LIMIT, FROM, SELECT, INNER JOIN, DROP, ALTER, VALUES
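
A sketch of a two-table schema linked by a foreign key, populated with `.executemany()` and queried with `INNER JOIN`. The `teams`/`games` tables are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# A hypothetical lookup table plus a table that references it.
cur.execute("CREATE TABLE teams (team_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    """CREATE TABLE games (
           game_id INTEGER PRIMARY KEY,
           team_id INTEGER,
           score INTEGER,
           FOREIGN KEY (team_id) REFERENCES teams(team_id)
       )"""
)

# executemany() inserts a batch of rows with one parameterized statement.
cur.executemany("INSERT INTO teams VALUES (?, ?)", [(1, "Reds"), (2, "Blues")])
cur.executemany(
    "INSERT INTO games VALUES (?, ?, ?)",
    [(10, 1, 5), (11, 2, 3), (12, 1, 7)],
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in cur.execute("PRAGMA table_info(games)").fetchall()]

# INNER JOIN brings the team name into each game row.
joined = cur.execute(
    """SELECT g.game_id, t.name, g.score
       FROM games AS g
       INNER JOIN teams AS t ON g.team_id = t.team_id
       ORDER BY g.game_id"""
).fetchall()
print(joined)  # [(10, 'Reds', 5), (11, 'Blues', 3), (12, 'Reds', 7)]

# With foreign keys on, a row pointing at a missing team is rejected.
try:
    cur.execute("INSERT INTO games VALUES (13, 99, 1)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
conn.close()
```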

Concepts explored: pandas, descriptive statistics, numpy, matplotlib, scipy, correlations

Functions and methods used: .sort_values(), sci.linregress(), .hist(), .absolute(), .mean(), .median()
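
The project lists SciPy's `linregress()`. To run without SciPy, this sketch computes the same single-predictor least-squares slope, intercept, and Pearson r by hand with the standard library, plus the mean and median and the mean absolute residual. The `x` and `y` values are made up.

```python
import math
import statistics

# Hypothetical paired observations (x = predictor, y = response).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

x_mean, y_mean = statistics.mean(x), statistics.mean(y)
y_median = statistics.median(y)

# Ordinary least squares for a single predictor:
# slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
sxx = sum((xi - x_mean) ** 2 for xi in x)
syy = sum((yi - y_mean) ** 2 for yi in y)
slope = sxy / sxx
intercept = y_mean - slope * x_mean
r_value = sxy / math.sqrt(sxx * syy)  # Pearson correlation

# Mean absolute residual: average distance between the data and the fitted line.
residuals = [abs(yi - (slope * xi + intercept)) for xi, yi in zip(x, y)]
print(slope, intercept, r_value, statistics.mean(residuals))
```

With SciPy installed, `scipy.stats.linregress(x, y)` should report the same slope, intercept, and r value.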

Project #14: Winning Jeopardy

Concepts explored: pandas, matplotlib, data cleaning, string manipulation, chi-squared test, regex, try/except

Functions, methods, and properties used: .columns, .lower(), .sub(), .apply(), sum(), .array(), .split(), .shape, .mean(), .iterrows(), .remove(), .add(), .append()
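
A sketch of two ideas from this project: normalizing question text with `re.sub()` and computing a Pearson chi-squared statistic. The counts and the assumed 50% share of high-value questions are hypothetical, and the helper names `normalize` and `chi_squared` are illustrative.

```python
import re

def normalize(text):
    """Lowercase and strip punctuation so 'Paris!' and 'paris' compare equal."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def chi_squared(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

cleaned = normalize("  The Eiffel   Tower, in PARIS! ")
print(cleaned)  # the eiffel tower in paris

# Hypothetical counts: how often a term appears in high- vs low-value questions,
# compared with what the overall share of high-value questions would predict.
observed = [30, 10]
total = sum(observed)
high_share = 0.5  # assumed share of high-value questions in the whole dataset
expected = [total * high_share, total * (1 - high_share)]
stat = chi_squared(observed, expected)
print(stat)  # 10.0
```

If SciPy is available, `scipy.stats.chisquare(observed, expected)` computes the same statistic along with a p-value.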

Concepts explored: pandas, matplotlib, data cleaning, feature engineering, k-nearest neighbors, hyperparameter tuning, RMSE

Functions and methods used: .read_csv(), .replace(), .drop(), .astype(), .isnull().sum(), .min(), .max(), .mean(), .permutation(), .reindex(), .iloc[], .fit(), .predict(), mean_squared_error(), .Series(), .sort_values(), .plot(), .legend()
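
The `.fit()`, `.predict()`, and `mean_squared_error()` calls above appear to come from scikit-learn. To stay dependency-free, this sketch instead implements k-nearest-neighbors regression and RMSE directly. The normalized feature values and prices are made up, and `knn_predict`/`rmse` are hypothetical helper names.

```python
import math
import random

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training rows closest to `query` (Euclidean)."""
    distances = sorted(
        (math.dist(row, query), target) for row, target in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / k

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical data: one normalized feature per car and its price.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9], [0.15], [0.85]]
y = [10.0, 12.0, 11.0, 30.0, 32.0, 31.0, 11.5, 31.5]

# Shuffle row order before splitting (pandas code would typically use
# np.random.permutation() with .reindex() for this step).
random.seed(1)
order = list(range(len(X)))
random.shuffle(order)
X = [X[i] for i in order]
y = [y[i] for i in order]

split = 6
train_X, test_X = X[:split], X[split:]
train_y, test_y = y[:split], y[split:]
predictions = [knn_predict(train_X, train_y, q, k=2) for q in test_X]
print(rmse(test_y, predictions))
```

Tuning `k` (the hyperparameter) means repeating the last two lines for several values and keeping the one with the lowest RMSE.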

Concepts explored: pandas, data cleaning, feature engineering, linear regression, hyperparameter tuning, RMSE, k-fold cross-validation

Functions, methods, and properties used: .dtypes, .value_counts(), .drop(), .isnull(), sum(), .fillna(), .sort_values(), .corr(), .index, .append(), .get_dummies(), .astype(), .predict(), .fit(), KFold(), mean_squared_error()
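
A dependency-free sketch of k-fold cross-validation for a one-feature least-squares model. Each row is held out exactly once, and the reported score is the mean RMSE across folds. scikit-learn's `KFold` uses contiguous blocks unless `shuffle=True`; for brevity this sketch assigns folds with `i % k` instead. The helper names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature (assumes xs are not all equal)."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def k_fold_rmse(xs, ys, k):
    """Average RMSE over k folds; row i belongs to fold i % k."""
    errors = []
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)
        errors.append(math.sqrt(mse))
    return sum(errors) / k

xs = list(range(10))
exact = k_fold_rmse(xs, [2 * x + 1 for x in xs], k=5)  # perfectly linear data
noisy = k_fold_rmse(xs, [2 * x + 1 + (-1) ** x for x in xs], k=5)
print(exact, noisy)
```

Averaging over folds gives a less noisy error estimate than a single train/test split, which matters when comparing feature sets or hyperparameters.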

Concepts explored: linear regression, mean squared error, categorical features, datetime

Functions, methods, and properties used: .read_csv(), .to_datetime(), .sort_values(), .rolling(), .apply(), .concat(), .get_dummies(), .shift(), datetime(), .fit(), .predict(), mean_squared_error()
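
A minimal sketch of a time-series feature built with `.rolling()` and `.shift()`, assuming hypothetical daily closing prices. Shifting by one day keeps each row's feature limited to earlier prices, which avoids leaking the target into the predictors.

```python
import pandas as pd

# Hypothetical daily closing prices, deliberately out of order.
prices = pd.DataFrame({
    "Date": ["2015-01-05", "2015-01-02", "2015-01-06", "2015-01-07", "2015-01-08"],
    "Close": [102.0, 100.0, 104.0, 106.0, 108.0],
})
prices["Date"] = pd.to_datetime(prices["Date"])
prices = prices.sort_values("Date").reset_index(drop=True)

# Average of the previous 3 closes. shift(1) makes each row use only past days,
# so the feature never includes the price it is meant to predict.
prices["avg_3"] = prices["Close"].rolling(3).mean().shift(1)

# Rows without a full 3-day history have NaN features and are dropped.
model_rows = prices.dropna(subset=["avg_3"])
print(model_rows)
```

Sorting by date before calling `.rolling()` matters because the window follows row order, not timestamps.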

Concepts explored: pandas, matplotlib, feature engineering, linear regression, decision trees, random forests, MSE

Functions, methods, and properties used: .hist(), .apply(), .corr(), .columns, .drop(), .sample(), .index, .floor(), .fit(), .predict(), mean_squared_error(), .append()

Concepts explored: Big O notation, strings, dictionaries, data parsing, try/except

Functions, methods, and properties used: range(), .append(), .split(), .values(), Counter()
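
A small illustration of how Big O notation distinguishes two ways of solving the same problem, detecting duplicate words. The pairwise version is O(n^2). The `Counter` version is O(n) on average because it counts with a hash table. The function names are hypothetical.

```python
from collections import Counter

def has_duplicates_quadratic(items):
    """O(n^2): compare every pair of items."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_linear(items):
    """O(n) on average: count items once with a hash-based Counter."""
    counts = Counter(items)
    return any(c > 1 for c in counts.values())

words = "the quick brown fox jumps over the lazy dog".split()
print(has_duplicates_quadratic(words), has_duplicates_linear(words))  # True True
print(Counter(words).most_common(1))  # [('the', 2)]
```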

Concepts explored: Spark

PySpark methods used: .map(), .flatMap(), .filter(), .count(), .collect(), .take()

Concepts explored: Spark SQL, Spark Dataframes, combining data from multiple files

Methods and functions used: SQLContext(), .head(), .toPandas(), .show(), .select(), .hist(), .registerTempTable()