[ post Mar 22 ] revise database schema for storing lottery data #515

yunlanli · 2022-03-17T04:04:01Z

Description

Previously, since we only had one year of lottery data, it made sense to co-locate the lottery number and the room information in a single table. Now that we have multiple years of data, this schema is no longer suitable. If we add one column for each year's lottery data, the table will soon become very wide. Moreover, it would be hard to mark a single year's data as an outlier (a 32-bit bitmap might work, but it's probably not a great idea).

I'm thinking of using one table for storing room information, with a composite primary key consisting of residence hall and room number, and other tables for storing lottery data, each with a foreign key that references the room information table.
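To make the proposal concrete, here is a rough sketch of the room table plus one lottery table that references it. It runs against an in-memory SQLite database through Python's `sqlite3` module purely so it can be tried without our MySQL instance. The table names, column names, the `year` column, and the sample lottery numbers are all assumptions for illustration, not our actual schema or data.

```python
import sqlite3

schema_conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key checks off by default; MySQL/InnoDB enforces them.
schema_conn.execute("PRAGMA foreign_keys = ON")
schema_conn.executescript("""
CREATE TABLE room (
    residence_hall TEXT NOT NULL,
    room_number    TEXT NOT NULL,
    PRIMARY KEY (residence_hall, room_number)  -- composite primary key
);
CREATE TABLE lottery (
    residence_hall TEXT NOT NULL,
    room_number    TEXT NOT NULL,
    year           INTEGER NOT NULL,   -- hypothetical: one row per room per year
    lottery_number INTEGER NOT NULL,
    PRIMARY KEY (residence_hall, room_number, year),
    FOREIGN KEY (residence_hall, room_number)
        REFERENCES room (residence_hall, room_number)
);
""")

# Made-up sample values.
schema_conn.execute("INSERT INTO room VALUES ('Schapiro', '1203')")
schema_conn.execute("INSERT INTO lottery VALUES ('Schapiro', '1203', 2021, 1500)")


def can_insert_lottery_for_unknown_room(conn):
    """Return True if a lottery row for a room missing from `room` is accepted."""
    try:
        conn.execute("INSERT INTO lottery VALUES ('Schapiro', '9999', 2021, 42)")
    except sqlite3.IntegrityError:
        return False
    return True
```

With the foreign key in place, `can_insert_lottery_for_unknown_room(schema_conn)` returns `False`: lottery rows can only point at rooms that exist in the room table.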

I think this needs further discussion. For example, should we assign each dorm an ID as a primary key? Should we store each year's lottery numbers in a single table, or spread them across multiple tables identified by year? It might be worth creating views to cater to our analytics needs.
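As one possible answer to the view question, the sketch below keeps all years in a single long table and exposes a per-room view with one column per year via conditional aggregation. It uses SQLite through Python's `sqlite3` as a stand-in for MySQL; the `lottery` table layout, the `lottery_by_room` view name, and the sample numbers are assumptions for illustration.

```python
import sqlite3

view_conn = sqlite3.connect(":memory:")
view_conn.executescript("""
CREATE TABLE lottery (
    residence_hall TEXT NOT NULL,
    room_number    TEXT NOT NULL,
    year           INTEGER NOT NULL,
    lottery_number INTEGER NOT NULL,
    PRIMARY KEY (residence_hall, room_number, year)
);
-- Made-up sample values: room 1203 only has 2021 data.
INSERT INTO lottery VALUES
    ('Schapiro', '1203', 2021, 1500),
    ('Schapiro', '1204', 2020, 800),
    ('Schapiro', '1204', 2021, 950);

-- Pivot the long table into one row per room with a column per year.
-- A year with no row for that room comes out as NULL.
CREATE VIEW lottery_by_room AS
SELECT residence_hall,
       room_number,
       MAX(CASE WHEN year = 2020 THEN lottery_number END) AS n2020,
       MAX(CASE WHEN year = 2021 THEN lottery_number END) AS n2021
FROM lottery
GROUP BY residence_hall, room_number;
""")

pivot_rows = view_conn.execute(
    "SELECT * FROM lottery_by_room ORDER BY room_number"
).fetchall()
```

The trade-off is that the view definition needs a new `CASE` column every year, but the underlying table never changes shape.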

yunlanli · 2022-04-03T19:52:27Z

Here are some challenges that we face with the current schema:

Lottery Data Aggregation

a) lottery data for each year is stored in a separate table with (residence hall, room, room suffix) as the primary key
b) lottery data from different years may contain different rooms: e.g., (Schapiro, 1203, 1) may exist in 2021 but not in 2020.

Idea 1: A single SQL query

Our MySQL version (5.27) doesn't support FULL OUTER JOIN; only LEFT and RIGHT OUTER JOIN are supported, so:

it would be hard to write a single SQL query that aggregates the lottery data
it's already hard to account for b) with 2 years of data, and the query could quickly become unmaintainable as more years are added
it would need too many join operations

This approach seems inefficient and can quickly become unmaintainable.
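For reference, the usual workaround is to emulate the full outer join with two LEFT JOINs glued together by UNION ALL. The sketch below does this for two years, using SQLite through Python's `sqlite3` as a stand-in for MySQL (the query itself only uses LEFT JOIN and UNION ALL). The table names `lottery_2020` and `lottery_2021`, the column names, and the sample lottery numbers are assumptions for illustration.

```python
import sqlite3

join_conn = sqlite3.connect(":memory:")
join_conn.executescript("""
CREATE TABLE lottery_2020 (
    residence_hall TEXT NOT NULL,
    room           TEXT NOT NULL,
    room_suffix    TEXT NOT NULL,
    lottery_number INTEGER NOT NULL,
    PRIMARY KEY (residence_hall, room, room_suffix)
);
CREATE TABLE lottery_2021 (
    residence_hall TEXT NOT NULL,
    room           TEXT NOT NULL,
    room_suffix    TEXT NOT NULL,
    lottery_number INTEGER NOT NULL,
    PRIMARY KEY (residence_hall, room, room_suffix)
);
-- Made-up sample values: 1203 exists only in 2021, 1205 only in 2020.
INSERT INTO lottery_2020 VALUES
    ('Schapiro', '1204', '1', 800),
    ('Schapiro', '1205', '1', 2100);
INSERT INTO lottery_2021 VALUES
    ('Schapiro', '1203', '1', 1500),
    ('Schapiro', '1204', '1', 950);
""")

FULL_JOIN_EMULATION = """
-- Part 1: every 2020 room, with its 2021 number if there is one.
SELECT a.residence_hall AS residence_hall,
       a.room           AS room,
       a.room_suffix    AS room_suffix,
       a.lottery_number AS n2020,
       b.lottery_number AS n2021
FROM lottery_2020 AS a
LEFT JOIN lottery_2021 AS b
  ON a.residence_hall = b.residence_hall
 AND a.room = b.room
 AND a.room_suffix = b.room_suffix
UNION ALL
-- Part 2: rooms that appear only in 2021.
SELECT b.residence_hall, b.room, b.room_suffix, NULL, b.lottery_number
FROM lottery_2021 AS b
LEFT JOIN lottery_2020 AS a
  ON a.residence_hall = b.residence_hall
 AND a.room = b.room
 AND a.room_suffix = b.room_suffix
WHERE a.residence_hall IS NULL
ORDER BY room
"""

joined_rows = join_conn.execute(FULL_JOIN_EMULATION).fetchall()
```

Even for two years this takes two joins and an anti-join filter. Each additional year adds more branches and more three-column join conditions, which is the maintainability problem described above.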

Idea 2: A new Table / View

Will add details later. I'm thinking about:

a table for dorm: assign a room id to each room, add the constraint that (residence hall, room, room suffix) must be unique
a table for lottery data, primary key: room id, a column for each year's lottery data
a table / view for aggregate lottery data

Every year's lottery data goes through an ETL process and eventually ends up in the lottery data table.
The idea is to make this schema optimal for analytics queries.
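A rough sketch of how these three pieces could fit together, again using SQLite through Python's `sqlite3` as a stand-in for MySQL. Everything named here is hypothetical: the `dorm`, `lottery`, and `lottery_aggregate` names, the `y2020`/`y2021` columns, the `load_year` helper standing in for the ETL step, and the sample numbers. It also assumes "aggregate" means one row per room with all years side by side.

```python
import sqlite3

etl_conn = sqlite3.connect(":memory:")
etl_conn.execute("PRAGMA foreign_keys = ON")
etl_conn.executescript("""
CREATE TABLE dorm (
    room_id        INTEGER PRIMARY KEY,  -- auto-assigned in SQLite; AUTO_INCREMENT in MySQL
    residence_hall TEXT NOT NULL,
    room           TEXT NOT NULL,
    room_suffix    TEXT NOT NULL,
    UNIQUE (residence_hall, room, room_suffix)
);
CREATE TABLE lottery (
    room_id INTEGER PRIMARY KEY REFERENCES dorm (room_id),
    y2020   INTEGER,  -- NULL when the room has no data for that year
    y2021   INTEGER
);
CREATE VIEW lottery_aggregate AS
SELECT d.residence_hall, d.room, d.room_suffix, l.y2020, l.y2021
FROM dorm AS d
LEFT JOIN lottery AS l ON l.room_id = d.room_id;
""")

# Whitelist of year columns so the column name is never taken from input data.
YEAR_COLUMNS = {2020: "y2020", 2021: "y2021"}


def load_year(conn, year, rows):
    """Hypothetical ETL load step: rows are (hall, room, suffix, lottery_number)."""
    column = YEAR_COLUMNS[year]
    for hall, room, suffix, number in rows:
        # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
        conn.execute(
            "INSERT OR IGNORE INTO dorm (residence_hall, room, room_suffix) "
            "VALUES (?, ?, ?)",
            (hall, room, suffix),
        )
        (room_id,) = conn.execute(
            "SELECT room_id FROM dorm "
            "WHERE residence_hall = ? AND room = ? AND room_suffix = ?",
            (hall, room, suffix),
        ).fetchone()
        conn.execute("INSERT OR IGNORE INTO lottery (room_id) VALUES (?)", (room_id,))
        conn.execute(
            f"UPDATE lottery SET {column} = ? WHERE room_id = ?", (number, room_id)
        )


# Made-up sample values.
load_year(etl_conn, 2020, [("Schapiro", "1204", "1", 800)])
load_year(etl_conn, 2021, [("Schapiro", "1203", "1", 1500),
                           ("Schapiro", "1204", "1", 950)])

aggregate_rows = etl_conn.execute(
    "SELECT * FROM lottery_aggregate ORDER BY room"
).fetchall()
```

Because each year is loaded independently, a room that only shows up in one year just ends up with NULL in the other columns, so challenge b) is handled at load time rather than in the query.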

Lottery Data Filter

yunlanli added the help wanted and lottery predictor labels on Mar 17, 2022
