The purpose of this project is to analyze the impact of weather on restaurant reviews using large datasets of climate data and Yelp reviews. Snowflake, a cloud-native data warehouse, is used to stage the data, which is then migrated and transformed into an Operational Data Store (ODS) schema. From there, the data is migrated to a Data Warehouse schema, where SQL queries are run to examine the relationship between weather and reviews.
- Yelp review dataset in JSON format (link: https://www.yelp.com/dataset/download), consisting of separate datasets:
  - Business information
  - Checkins
  - COVID features
  - Reviews
  - Tips
  - Users
- Historical climate data for San Diego in CSV format (link: https://crt-climate-explorer.nemac.org/):
  - Precipitation
  - Temperature
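Both sources can be read with Python's standard library before they are loaded. The sketch below assumes the Yelp files are newline-delimited JSON (one object per line) and uses a hypothetical precipitation CSV header (`date,precipitation,precipitation_normal` with `YYYYMMDD` dates); the sample rows are invented for illustration, so check field names and formats against the downloaded files. Each review timestamp is reduced to a calendar day so it can later be matched to daily weather.

```python
import csv
import io
import json
from datetime import date, datetime

# Invented sample data: the real Yelp files hold one JSON object per line.
SAMPLE_REVIEWS = """\
{"review_id": "r1", "user_id": "u1", "business_id": "b1", "stars": 4.0, "date": "2018-07-07 22:09:11", "text": "Great patio."}
{"review_id": "r2", "user_id": "u2", "business_id": "b1", "stars": 2.0, "date": "2018-07-08 12:30:00", "text": "Soggy fries."}
"""

# Hypothetical column layout for the precipitation CSV (verify the real header).
SAMPLE_PRECIPITATION = """\
date,precipitation,precipitation_normal
20180707,0.00,0.01
20180708,0.35,0.01
"""


def read_reviews(text: str) -> list[dict]:
    """Parse newline-delimited JSON, keeping only the fields the analysis needs."""
    reviews = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        reviews.append({
            "review_id": record["review_id"],
            "business_id": record["business_id"],
            "stars": float(record["stars"]),
            # Keep only the calendar day so the review can join to daily weather.
            "review_date": datetime.strptime(record["date"], "%Y-%m-%d %H:%M:%S").date(),
        })
    return reviews


def read_precipitation(text: str) -> dict[date, float]:
    """Parse the (hypothetical) climate CSV into a {day: precipitation} mapping."""
    daily = {}
    for row in csv.DictReader(io.StringIO(text)):
        day = datetime.strptime(row["date"], "%Y%m%d").date()
        daily[day] = float(row["precipitation"])
    return daily


reviews = read_reviews(SAMPLE_REVIEWS)
precipitation = read_precipitation(SAMPLE_PRECIPITATION)
for review in reviews:
    # Print each review alongside the precipitation recorded on its day.
    print(review["review_id"], review["stars"], precipitation.get(review["review_date"]))
```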
- Load the locally downloaded files into a Snowflake stage.
- Copy the staged files into the staging schema.
- Transform the data from the staging schema into the ODS schema.
  - Follow the ER diagram data structure.
- Migrate the data from the ODS schema into the final Data Warehouse schema.
- Run queries to uncover relationships between weather and customers' restaurant reviews.
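The steps above can be sketched end to end with Python's built-in `sqlite3` module standing in for Snowflake, using table-name prefixes (`staging_`, `ods_`, `dw_`) in place of separate schemas. This is a hypothetical illustration, not the project's Snowflake code: the table and column names, the sample rows, and the single wet-versus-dry comparison are assumptions, and the full ER diagram structure is not reproduced. In Snowflake itself, loading would go through `PUT` and `COPY INTO`, and raw JSON would typically land in a `VARIANT` column.

```python
import sqlite3

# Hypothetical in-memory stand-in for the Snowflake schemas; table-name
# prefixes (staging_, ods_, dw_) play the role of separate schemas here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1-2. Staging: raw values land as untyped text, as read from the files.
cur.execute("CREATE TABLE staging_review (review_id TEXT, business_id TEXT, stars TEXT, raw_date TEXT)")
cur.execute("CREATE TABLE staging_precipitation (raw_date TEXT, precipitation TEXT)")
cur.executemany("INSERT INTO staging_review VALUES (?, ?, ?, ?)", [
    ("r1", "b1", "4", "2018-07-07 22:09:11"),
    ("r2", "b1", "2", "2018-07-08 12:30:00"),
    ("r3", "b2", "5", "2018-07-07 19:00:00"),
])
cur.executemany("INSERT INTO staging_precipitation VALUES (?, ?)", [
    ("20180707", "0.00"),
    ("20180708", "0.35"),
])

# 3. ODS: cast to proper types and normalise both date formats to YYYY-MM-DD.
cur.executescript("""
CREATE TABLE ods_review AS
SELECT review_id, business_id,
       CAST(stars AS REAL) AS stars,
       date(raw_date) AS review_date
FROM staging_review;

CREATE TABLE ods_precipitation AS
SELECT substr(raw_date, 1, 4) || '-' || substr(raw_date, 5, 2) || '-' || substr(raw_date, 7, 2) AS obs_date,
       CAST(precipitation AS REAL) AS precipitation
FROM staging_precipitation;
""")

# 4. Data warehouse: a fact table pairing each review with that day's weather.
cur.executescript("""
CREATE TABLE dw_fact_review_weather AS
SELECT r.review_id, r.business_id, r.review_date, r.stars, p.precipitation
FROM ods_review AS r
JOIN ods_precipitation AS p ON p.obs_date = r.review_date;
""")

# 5. Analysis: average rating on wet versus dry days.
rows = cur.execute("""
SELECT CASE WHEN precipitation > 0 THEN 'wet' ELSE 'dry' END AS day_type,
       AVG(stars) AS avg_stars,
       COUNT(*) AS n_reviews
FROM dw_fact_review_weather
GROUP BY day_type
ORDER BY day_type
""").fetchall()
print(rows)
```

The inner join drops reviews on days with no weather record; a `LEFT JOIN` would keep them with a `NULL` precipitation value if those days need to be counted.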