Some DB tables are not dumped when using db.to_csv() #383

Closed
nesnoj opened this issue Nov 17, 2022 · 5 comments · Fixed by #384
Assignees
Labels
🐛 bug Something isn't working

Comments

@nesnoj
Collaborator

nesnoj commented Nov 17, 2022

Hey!

It's great to see the progress you've made this year 👍. Following your instructions in the docs I was able to bulk-download and access the MaStR data. But:

Expected behavior

db.to_csv(None) exports all tables.

Problem

db.to_csv(None) successfully creates a dump but some tables seem to be omitted, e.g. no files are created for locations and gas. So then I tried

>>> db.to_csv("location")
are saved to: /home/nesnoj/.open-MaStR/data/dataversion-2022-11-17

afterwards, but no location file is created.
So I thought it might be a problem if the dir already exists, removed it and explicitly listed all tables:

>>> db.to_csv(['wind', 'solar', 'biomass', 'hydro', 'gsgk', 'combustion', 'nuclear', 'gas', 'storage', 'electricity_consumer', 'location', 'market', 'grid', 'balancing_area', 'permit', 'deleted_units'])

Technology tables: ['wind', 'solar', 'biomass', 'hydro', 'gsgk', 'combustion', 'nuclear', 'storage']

Additional tables: ['electricity_consumer', 'balancing_area']
are saved to: /home/nesnoj/.open-MaStR/data/dataversion-2022-11-17

Same result, the additional tables are missing again.

Investigation

Ok, as the requested table "location" is not included in the "Additional tables: ..." above, I checked these lines in the package to find out where they get lost:

# Determine tables to export
technologies_to_export = []
additional_tables_to_export = []
for table in data:
    if table in TECHNOLOGIES:
        technologies_to_export.append(table)
    elif table in ADDITIONAL_TABLES:
        additional_tables_to_export.append(table)
if technologies_to_export:
    print(f"\nTechnology tables: {technologies_to_export}")
if additional_tables_to_export:
    print(f"\nAdditional tables: {additional_tables_to_export}")

It turns out that "location" is included in data before the loop starts in L316, but not in the constant ADDITIONAL_TABLES taken from constants.py, where only "locations_extended" is listed. However, db.to_csv("locations_extended") is not allowed and raises this message.
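To make the silent drop visible, here is a minimal sketch of the same loop with an extra bucket for names that match neither list. The two constants are stand-ins reconstructed from the output quoted above, not the package's actual definitions, and `classify_tables` is a hypothetical helper name.

```python
# Stand-in constants, reconstructed from the output quoted in this issue.
# The real lists in constants.py may contain more entries.
TECHNOLOGIES = [
    "wind", "solar", "biomass", "hydro",
    "gsgk", "combustion", "nuclear", "storage",
]
ADDITIONAL_TABLES = ["electricity_consumer", "balancing_area", "locations_extended"]


def classify_tables(requested):
    """Split requested names into technology, additional, and unmatched."""
    technologies, additional, unmatched = [], [], []
    for table in requested:
        if table in TECHNOLOGIES:
            technologies.append(table)
        elif table in ADDITIONAL_TABLES:
            additional.append(table)
        else:
            # The original loop has no else branch, so these names vanish.
            unmatched.append(table)
    return technologies, additional, unmatched


requested = [
    "wind", "solar", "biomass", "hydro", "gsgk", "combustion", "nuclear",
    "gas", "storage", "electricity_consumer", "location", "market", "grid",
    "balancing_area", "permit", "deleted_units",
]
technologies, additional, unmatched = classify_tables(requested)
print(f"Technology tables: {technologies}")
print(f"Additional tables: {additional}")
print(f"Not exported: {unmatched}")
```

Under these assumed constants, the names that pass validation but match neither list end up in `unmatched` instead of being dropped without a message.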

I guess this is a bug?

My dirty workaround: comment out the validation here and run db.to_csv("locations_extended"); that works.

Problem 2: CSV encoding

But, haha, the resulting CSV has a wrong encoding (utf-16le):

luke@skywalker:~/.open-MaStR/data$ file -i dataversion-2022-11-17/bnetza_mastr_locations_extended_raw.csv 
dataversion-2022-11-17/bnetza_mastr_locations_extended_raw.csv: application/csv; charset=utf-16le

But you explicitly defined the charset:

if chunk_number == 0:
    df.to_csv(
        csv_file,
        index=True,
        index_label="EinheitMastrNummer",
        encoding="utf-8",
    )
    log.info(
        f"Technology csv {csv_file.split('/')[-1:]} didn't exist and was created."
    )
else:
    df.to_csv(
        csv_file,
        mode="a",
        header=False,
        index=True,
        index_label="EinheitMastrNummer",
        encoding="utf-8",
    )

That might be a different issue?

@nesnoj nesnoj added the 🐛 bug Something isn't working label Nov 17, 2022
@nesnoj
Collaborator Author

nesnoj commented Nov 17, 2022

Update on the encoding: it seems that only the additional tables have utf-16le:

darth@vader:~/.open-MaStR/data$ for f in dataversion-2022-11-17_OLD/*.csv;do file -i $f;done
dataversion-2022-11-17_OLD/bnetza_mastr_balancing_area_raw.csv: application/csv; charset=utf-16le
dataversion-2022-11-17_OLD/bnetza_mastr_biomass_raw.csv: application/csv; charset=utf-8
dataversion-2022-11-17_OLD/bnetza_mastr_combustion_raw.csv: application/csv; charset=utf-8
dataversion-2022-11-17_OLD/bnetza_mastr_electricity_consumer_raw.csv: application/csv; charset=utf-16le
dataversion-2022-11-17_OLD/bnetza_mastr_gsgk_raw.csv: application/csv; charset=utf-8
dataversion-2022-11-17_OLD/bnetza_mastr_hydro_raw.csv: application/csv; charset=utf-8
dataversion-2022-11-17_OLD/bnetza_mastr_nuclear_raw.csv: application/csv; charset=utf-8
dataversion-2022-11-17_OLD/bnetza_mastr_solar_raw.csv: application/csv; charset=utf-8
dataversion-2022-11-17_OLD/bnetza_mastr_storage_raw.csv: application/csv; charset=utf-8
dataversion-2022-11-17_OLD/bnetza_mastr_wind_raw.csv: application/csv; charset=utf-8
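The same check can be approximated from Python, for example in a test. This is a rough sketch using only the standard library; `guess_csv_encoding` is a hypothetical helper, not part of open-MaStR, and it is a heuristic rather than a full replacement for `file -i`.

```python
import codecs
from pathlib import Path


def guess_csv_encoding(path, sample_size=4096):
    """Heuristically guess the encoding from the first bytes of a file."""
    sample = Path(path).read_bytes()[:sample_size]
    # Byte order marks are the most reliable hint.
    # (A UTF-32 LE BOM also starts with these bytes; ignored in this sketch.)
    if sample.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if sample.startswith(codecs.BOM_UTF16_LE):
        return "utf-16le"
    if sample.startswith(codecs.BOM_UTF16_BE):
        return "utf-16be"
    # Mostly-ASCII text stored as UTF-16 without a BOM is full of NUL bytes.
    if b"\x00" in sample:
        return "utf-16 (no BOM, guessed)"
    try:
        # The incremental decoder tolerates a multi-byte character
        # that is cut off at the end of the sample.
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"
```

Calling it on each `*.csv` file in the data directory would give a listing comparable to the one above.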

@nesnoj
Collaborator Author

nesnoj commented Nov 19, 2022

I opened a separate issue for this problem: #385

chrwm added a commit that referenced this issue Nov 21, 2022
* add tqdm for additional table export
* rename second table variable
chrwm added a commit that referenced this issue Nov 21, 2022
* Move method to helpers.py
* Add 'source' parameter to choose mapping
* Add NotImplementedException
chrwm added a commit that referenced this issue Nov 21, 2022
@chrwm
Member

chrwm commented Nov 21, 2022

The problem with the encoding was simply fba5bb1.
The problem exporting the additional tables was that in

for table in data:
    if table in TECHNOLOGIES:
        technologies_to_export.append(table)
    elif table in ADDITIONAL_TABLES:
        additional_tables_to_export.append(table)

table was drawn from the set of data options in BULK_DATA and compared to ADDITIONAL_TABLES whenever the tables to export were not specified in db.to_csv(). See below; method='bulk' is the default.

elif data is None:
    data = BULK_DATA if method == "bulk" else API_DATA

As a result, balancing_area and electricity_consumer were the only matches between BULK_DATA and ADDITIONAL_TABLES, so only these two were added to the list of exported additional tables.

As a solution, I wrote a mapping (64c306e) from the possible data parameters for the bulk download to the corresponding table names in the SQL database, and a case to handle it (f478203).
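For readers following along, the idea of such a mapping could look roughly like the sketch below. It is not the code from the referenced commits: the dictionary name, the function name, and the fallback behaviour are assumptions, and only the pair "location" / "locations_extended" is taken from this thread.

```python
# Hypothetical mapping from bulk `data` names to SQL table names.
# Only "location" -> "locations_extended" is confirmed by this issue;
# the real mapping in the package contains more entries.
BULK_TO_SQL_TABLE = {
    "location": "locations_extended",
    "balancing_area": "balancing_area",
    "electricity_consumer": "electricity_consumer",
}


def to_sql_table_names(data, source="bulk"):
    """Translate user-facing data names into database table names."""
    if source != "bulk":
        # Assumption: other sources are not covered by this sketch.
        raise NotImplementedError(f"No mapping defined for source {source!r}.")
    # Assumption: names without an entry are passed through unchanged.
    return [BULK_TO_SQL_TABLE.get(name, name) for name in data]


print(to_sql_table_names(["wind", "location", "balancing_area"]))
```

With the translation applied before the comparison against ADDITIONAL_TABLES, a requested "location" would be matched as "locations_extended" instead of being skipped.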

@nesnoj
Collaborator Author

nesnoj commented Nov 22, 2022

Thanks @chrwm for these quick fixes!

chrwm added a commit that referenced this issue Nov 22, 2022
chrwm added a commit that referenced this issue Nov 22, 2022
* raise NotImplementedError successfully
* run black
chrwm added a commit that referenced this issue Nov 23, 2022
chrwm added a commit that referenced this issue Nov 23, 2022
@chrwm
Member

chrwm commented Nov 24, 2022

Fixed by #384

@chrwm chrwm closed this as completed Nov 24, 2022