fixtures: prettify output message structure #128

Open
tiborsimko opened this issue Jan 13, 2025 · 0 comments

Current behaviour

When uploading records, the output of the cernopendata fixtures command is not well structured:

record 69734 updated
2025-01-12 16:59:04.555600 This is an index file. Let's check the entries that it has: root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_120000_file_index.json
The file index contains 4 entries.
File index created
2025-01-12 16:59:04.682527 This is an index file. Let's check the entries that it has: root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_130000_file_index.json
The file index contains 1 entries.
File index created
2025-01-12 16:59:04.733111 This is an index file. Let's check the entries that it has: root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_270000_file_index.json
The file index contains 19 entries.
File index created
2025-01-12 16:59:05.040391 This is an index file. Let's check the entries that it has: root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_280000_file_index.json
The file index contains 14 entries.
File index created
2025-01-12 16:59:05.249401 This is an index file. Let's check the entries that it has: root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_70000_file_index.json
The file index contains 8 entries.
File index created
record 69735 updated

The output makes it hard to tell apart the sections and subsections of the record upload process, mixes information-based and time-based formatting, and is not machine-friendly for launcher scripts that may want to parse and analyse the log.

Expected behaviour

It would be nice to polish the output so that the various findings are more readable and appear in predictable places.

Option 1: The output could be structured around records as the main heading, since a record is the main unit of information handled by the upload process. Any further information related to a record could then be presented under it as indented sub-items. For example:

==> Processing record 69735
  -> Detected DOI 10.7483/OPENDATA.LHCB.PAM7.JHT0
  -> Detected direct file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/foo.root
  -> Detected index file with 4 entries root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_120000_file_index.json
  -> Processed index file with 4 entries root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_120000_file_index.json
  -> Detected index file with 1 entries root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_130000_file_index.json
  -> Processed index file with 1 entries root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_130000_file_index.json
  -> Detected index file with 19 entries root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_270000_file_index.json
  -> Processed index file with 19 entries root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_270000_file_index.json
==> Record 69735 updated
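
The two-level layout above could be produced by a pair of small helpers. The following is only a minimal sketch for discussion: the names `heading`, `detail` and `report_record` are hypothetical and do not exist in the cernopendata code base.

```python
import io
import sys


def heading(message, stream=sys.stdout):
    """Print a record-level line such as '==> Processing record 69735'."""
    print(f"==> {message}", file=stream)


def detail(message, stream=sys.stdout):
    """Print an indented sub-item such as '  -> Detected DOI ...'."""
    print(f"  -> {message}", file=stream)


def report_record(recid, findings, action, stream=sys.stdout):
    """Emit one record block: heading, findings, closing status line.

    `findings` is a list of already formatted strings and `action` is
    'created' or 'updated'. All names here are hypothetical.
    """
    heading(f"Processing record {recid}", stream)
    for finding in findings:
        detail(finding, stream)
    heading(f"Record {recid} {action}", stream)


# Write into a buffer so the result can be inspected, then show it.
buffer = io.StringIO()
report_record(
    69735,
    ["Detected DOI 10.7483/OPENDATA.LHCB.PAM7.JHT0"],
    "updated",
    stream=buffer,
)
print(buffer.getvalue(), end="")
```

Keeping the prefixes in two functions means a launcher script only needs to look at the first characters of a line to know whether it starts a new record or belongs to the current one.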

Option 2: Alternatively, if we prefer time-based output formatting, then let's make all output lines use the same time-based logging format, for example:

2025-01-12 16:59:03 Processing record 69735
2025-01-12 16:59:04 Detected index file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_120000_file_index.json
2025-01-12 16:59:04 Detected 4 entries in the index file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_120000_file_index.json
2025-01-12 16:59:04 Processed index file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_120000_file_index.json
2025-01-12 16:59:04 Detected index file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_130000_file_index.json
2025-01-12 16:59:04 Detected 1 entries in the index file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_130000_file_index.json
2025-01-12 16:59:04 Processed index file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_130000_file_index.json
2025-01-12 16:59:06 Record 69735 updated
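
One way to get a uniform timestamp prefix on every line is to route all messages through Python's standard `logging` module with a single formatter. This is a sketch under assumptions: the logger name `fixtures-sketch` and the helper `make_logger` are invented for illustration, and the index file location is replaced by a placeholder.

```python
import io
import logging


def make_logger(stream, name="fixtures-sketch"):
    """Return a logger whose every line starts with 'YYYY-MM-DD HH:MM:SS '."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate lines if called twice
    logger.propagate = False  # keep the root logger out of the picture
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter(
            fmt="%(asctime)s %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        )
    )
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = make_logger(stream)
log.info("Processing record %s", 69735)
log.info("Detected index file %s", "<index file URI>")
log.info("Record %s updated", 69735)
print(stream.getvalue(), end="")
```

Because the format lives in one place, no individual message can fall back to a bare `print` style without a timestamp.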

Note: introduce verbose flag

Moreover, the "internal messages" should ideally be easy to toggle. By default, the output could be very terse so that data curators get a good overview of the main information concerning record uploads, such as:

$ cernopendata fixtures records --mode insert-or-replace -f myfile.json
2025-01-12 16:59:06 Record 123 created
2025-01-12 16:59:06 Record 124 updated
2025-01-12 16:59:07 Record 125 created
2025-01-12 16:59:07 Record 126 updated

When the data curator wishes to know more about the upload process, a verbose flag could switch on the detailed output shown above:

$ cernopendata fixtures records --mode insert-or-replace -f myfile.json -v
2025-01-12 16:59:06 Processing record 123
2025-01-12 16:59:06 Detected DOI ...
2025-01-12 16:59:06 Detected direct file ...
2025-01-12 16:59:06 Detected index file ...
2025-01-12 16:59:06 Record 123 created
2025-01-12 16:59:06 Processing record 124
2025-01-12 16:59:06 Detected DOI ...
2025-01-12 16:59:06 Detected direct file ...
2025-01-12 16:59:06 Detected index file ...
2025-01-12 16:59:06 Record 124 updated
2025-01-12 16:59:07 Processing record 125
2025-01-12 16:59:07 Detected DOI ...
2025-01-12 16:59:07 Detected direct file ...
2025-01-12 16:59:07 Detected index file ...
2025-01-12 16:59:07 Record 125 created
2025-01-12 16:59:07 Processing record 126
2025-01-12 16:59:07 Detected DOI ...
2025-01-12 16:59:07 Detected direct file ...
2025-01-12 16:59:07 Detected index file ...
2025-01-12 16:59:07 Record 126 updated
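
A simple way to implement such a toggle is to emit the per-record status lines at `INFO` level and the internal messages at `DEBUG` level, and let the verbose flag choose the logger threshold. The sketch below assumes this mapping; `configure` and `load_record` are hypothetical names, and how `-v` would be wired into the actual command-line interface is not shown.

```python
import io
import logging


def configure(verbose, stream):
    """Return a logger that shows DEBUG lines only when `verbose` is true."""
    logger = logging.getLogger("fixtures-verbosity-sketch")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(message)s", "%Y-%m-%d %H:%M:%S")
    )
    logger.addHandler(handler)
    return logger


def load_record(logger, recid, action):
    """Stand-in for the real upload step; only the logging calls matter."""
    logger.debug("Processing record %s", recid)   # internal message
    logger.debug("Detected DOI ...")              # internal message
    logger.info("Record %s %s", recid, action)    # always shown


terse, verbose = io.StringIO(), io.StringIO()
load_record(configure(False, terse), 123, "created")
load_record(configure(True, verbose), 123, "created")

print("default:")
print(terse.getvalue(), end="")
print("with -v:")
print(verbose.getvalue(), end="")
```

With this split the default run prints one line per record, while the verbose run prints the same line preceded by the internal messages.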

Note: print final statistics

The fixture loading script could also output some statistics at the end of processing, for the data curator's convenience:

$ cernopendata fixtures records --mode insert-or-replace -f myfile.json -v
...
2025-01-12 16:59:07 [INFO] Record 125 updated
2025-01-12 16:59:07 [INFO] Record 126 updated
2025-01-12 16:59:07 [INFO] Processed 456 records (400 created, 56 updated, 0 error, 0 pending)
2025-01-12 16:59:07 [INFO] Processing took 10 minutes (0.8 records per second)
$ cernopendata fixtures records --mode insert-or-replace -f myfile.json -v
...
2025-01-12 16:59:07 [INFO] Record 125 updated
2025-01-12 16:59:07 [ERROR] Record 126 exception "Duplicate DOI"
2025-01-12 16:59:07 [ERROR] Processed 355 records (300 created, 55 updated, 1 error, 99 pending)
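
The summary line could be built from a small counter object that is updated while records are processed. The following sketch is only illustrative: the class `UploadStats` and its fields are hypothetical, and it assumes, as in the sample lines above, that "processed" means created plus updated and that the summary is reported as an error as soon as one record fails.

```python
from dataclasses import dataclass


@dataclass
class UploadStats:
    """Running counters for one fixture-loading run (hypothetical names)."""

    created: int = 0
    updated: int = 0
    errors: int = 0
    pending: int = 0

    @property
    def processed(self):
        # In the sample lines above, "processed" equals created + updated.
        return self.created + self.updated

    @property
    def level(self):
        # Report the summary as an error as soon as one record failed.
        return "ERROR" if self.errors else "INFO"

    def summary(self):
        return (
            f"[{self.level}] Processed {self.processed} records "
            f"({self.created} created, {self.updated} updated, "
            f"{self.errors} error, {self.pending} pending)"
        )


ok_run = UploadStats(created=400, updated=56)
failed_run = UploadStats(created=300, updated=55, errors=1, pending=99)
print(ok_run.summary())
print(failed_run.summary())
```

Keeping the counters in one object also gives launcher scripts a single, predictable line to parse at the end of the log.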

Just some illustrative possibilities for live discussions.
