Previously, since we had only one year of lottery data, it made sense to co-locate the lottery number and the room information in a single table. Now that we have multiple years of data, this schema is no longer suitable. If we add one column for each year's lottery data, the table will soon become very wide. Moreover, it would be hard to mark a single year's data as an outlier (a 32-bit bitmap might work, but it is probably not a good idea).
I'm thinking of using one table for room information, with a composite primary key consisting of residence hall and room number, and other tables for lottery data, each with a foreign key that references the room table.
I think this needs further discussion. For example, should we assign each dorm an ID as a primary key? Should we store each year's lottery numbers in a single table or spread them across multiple tables identified by year? It might be worth creating views to cater to our analytics needs.
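To make the proposal above concrete, here is a minimal sketch, not a settled design. It uses SQLite through Python's `sqlite3` module only so that it runs anywhere; the MySQL DDL would differ slightly. The table names (`room`, `lottery`), the column names, the single lottery table with a `year` column, the `is_outlier` flag, and the lottery numbers are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
-- Room information, keyed by (residence hall, room number) as proposed.
CREATE TABLE room (
    residence_hall TEXT NOT NULL,
    room_number    TEXT NOT NULL,
    PRIMARY KEY (residence_hall, room_number)
);

-- One row per room per year (assumption: a single table with a year column).
-- is_outlier is a hypothetical per-year flag replacing the bitmap idea.
CREATE TABLE lottery (
    residence_hall TEXT    NOT NULL,
    room_number    TEXT    NOT NULL,
    year           INTEGER NOT NULL,
    lottery_number INTEGER,
    is_outlier     INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (residence_hall, room_number, year),
    FOREIGN KEY (residence_hall, room_number)
        REFERENCES room (residence_hall, room_number)
);
""")

# Made-up sample values, for illustration only.
conn.execute("INSERT INTO room VALUES ('Schapiro', '1203')")
conn.executemany(
    "INSERT INTO lottery (residence_hall, room_number, year, lottery_number) "
    "VALUES (?, ?, ?, ?)",
    [("Schapiro", "1203", 2020, 1500), ("Schapiro", "1203", 2021, 1720)],
)

rows = conn.execute(
    "SELECT year, lottery_number FROM lottery "
    "WHERE residence_hall = 'Schapiro' AND room_number = '1203' ORDER BY year"
).fetchall()
print(rows)
```

With this shape, adding a year adds rows rather than columns, and an outlier can be marked per year. Whether that is preferable to one table per year is still the open question.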
Here are some challenges that we face with the current schema:
Lottery Data Aggregation
a) Lottery data for each year is stored in a separate table with (residence hall, room, room suffix) as the primary key.
b) Lottery data from different years may contain different rooms: e.g., (Schapiro, 1203, 1) may exist in 2021 but not in 2020.
Idea 1: A single SQL query
Our MySQL version (5.27) doesn't support full outer joins; only left and right outer joins are supported, so
it would be hard to write a single SQL query that aggregates the lottery data
it's already hard to account for b) with two years of data, and the query could quickly become unmaintainable as more years are added
too many join operations
This approach seems inefficient and can quickly become unmaintainable.
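For reference, the usual workaround when full outer join is unavailable is to union two one-sided outer joins. The sketch below shows it for two years. It runs on SQLite through Python's `sqlite3` module so it is self-contained, and it uses two left joins with the table order swapped instead of a left and a right join. The table names `lottery_2020` and `lottery_2021`, the column names, and the lottery numbers are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lottery_2020 (
    residence_hall TEXT, room TEXT, room_suffix TEXT, lottery_number INTEGER,
    PRIMARY KEY (residence_hall, room, room_suffix)
);
CREATE TABLE lottery_2021 (
    residence_hall TEXT, room TEXT, room_suffix TEXT, lottery_number INTEGER,
    PRIMARY KEY (residence_hall, room, room_suffix)
);
-- Made-up sample values: (Schapiro, 1203, 1) exists only in 2021.
INSERT INTO lottery_2020 VALUES ('Schapiro', '1201', '1', 900);
INSERT INTO lottery_2021 VALUES ('Schapiro', '1201', '1', 950),
                                ('Schapiro', '1203', '1', 1100);
""")

# Emulated full outer join: rooms from 2020 with their 2021 match (if any),
# unioned with rooms from 2021 with their 2020 match (if any).
query = """
SELECT a.residence_hall, a.room, a.room_suffix,
       a.lottery_number AS n2020, b.lottery_number AS n2021
FROM lottery_2020 AS a
LEFT JOIN lottery_2021 AS b
  ON a.residence_hall = b.residence_hall
 AND a.room = b.room
 AND a.room_suffix = b.room_suffix
UNION
SELECT b.residence_hall, b.room, b.room_suffix,
       a.lottery_number, b.lottery_number
FROM lottery_2021 AS b
LEFT JOIN lottery_2020 AS a
  ON a.residence_hall = b.residence_hall
 AND a.room = b.room
 AND a.room_suffix = b.room_suffix
ORDER BY 1, 2, 3
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Even with two years the query repeats the join condition twice, and each additional year needs more joins and more `UNION` branches, which is the maintainability concern raised above.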
Idea 2: A new Table / View
I will add details later. I'm thinking about:
a table for dorm: assign a room id to each room, add the constraint that (residence hall, room, room suffix) must be unique
a table for lottery data, primary key: room id, a column for each year's lottery data
a table / view for aggregate lottery data
Every year's lottery data go through an ETL process and eventually end up in the lottery data table.
The idea is to make this schema optimal for analytics queries.
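A rough sketch of Idea 2, assuming the three pieces listed above: a room table with a surrogate room id, a lottery table keyed by room id with one column per year, and a view for the aggregate. It runs on SQLite through Python's `sqlite3` module so it is self-contained. All table, column, view, and function names, the `INSERT OR IGNORE` load step (MySQL spells this `INSERT IGNORE`), and the lottery numbers are assumptions made for illustration, not the actual ETL process.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE room (
    room_id        INTEGER PRIMARY KEY,
    residence_hall TEXT NOT NULL,
    room           TEXT NOT NULL,
    room_suffix    TEXT NOT NULL,
    UNIQUE (residence_hall, room, room_suffix)
);
CREATE TABLE lottery (
    room_id INTEGER PRIMARY KEY REFERENCES room (room_id),
    n2020   INTEGER,   -- one column per year, as proposed
    n2021   INTEGER
);
CREATE VIEW lottery_aggregate AS
SELECT r.residence_hall, r.room, r.room_suffix, l.n2020, l.n2021
FROM room AS r
LEFT JOIN lottery AS l ON l.room_id = r.room_id;
""")

# Column names cannot be bound as SQL parameters, so map years to a fixed whitelist.
YEAR_COLUMNS = {2020: "n2020", 2021: "n2021"}


def room_id_for(conn, hall, room, suffix):
    """Return the id for a room, creating the room row if it is new."""
    conn.execute(
        "INSERT OR IGNORE INTO room (residence_hall, room, room_suffix) "
        "VALUES (?, ?, ?)",
        (hall, room, suffix),
    )
    return conn.execute(
        "SELECT room_id FROM room "
        "WHERE residence_hall = ? AND room = ? AND room_suffix = ?",
        (hall, room, suffix),
    ).fetchone()[0]


def load_year(conn, year, records):
    """Hypothetical load step: write one year's numbers into that year's column."""
    column = YEAR_COLUMNS[year]
    for hall, room, suffix, number in records:
        rid = room_id_for(conn, hall, room, suffix)
        conn.execute("INSERT OR IGNORE INTO lottery (room_id) VALUES (?)", (rid,))
        conn.execute(f"UPDATE lottery SET {column} = ? WHERE room_id = ?", (number, rid))


# Made-up sample values, for illustration only.
load_year(conn, 2020, [("Schapiro", "1201", "1", 900)])
load_year(conn, 2021, [("Schapiro", "1201", "1", 950), ("Schapiro", "1203", "1", 1100)])

rows = conn.execute(
    "SELECT * FROM lottery_aggregate ORDER BY residence_hall, room, room_suffix"
).fetchall()
for row in rows:
    print(row)
```

Here a room missing from one year simply has an empty value in that year's column, so challenge b) no longer needs to be handled inside the analytics query. The cost is a schema change for every new year.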