Current behaviour
When uploading records, the current output of the cernopendata fixtures command is not well structured:
record 69734 updated
2025-01-12 16:59:04.555600 This is an index file. Let's check the entries that it has: root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_120000_file_index.json
The file index contains 4 entries.
File index created
2025-01-12 16:59:04.682527 This is an index file. Let's check the entries that it has: root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_130000_file_index.json
The file index contains 1 entries.
File index created
2025-01-12 16:59:04.733111 This is an index file. Let's check the entries that it has: root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_270000_file_index.json
The file index contains 19 entries.
File index created
2025-01-12 16:59:05.040391 This is an index file. Let's check the entries that it has: root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_280000_file_index.json
The file index contains 14 entries.
File index created
2025-01-12 16:59:05.249401 This is an index file. Let's check the entries that it has: root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_70000_file_index.json
The file index contains 8 entries.
File index created
record 69735 updated
The output makes it hard to tell apart the sections and subsections of the record upload process, mixes information-based and time-based formatting, and is not machine-friendly for log parsing and analysis by the scripts that launch the command.
Expected behaviour
It would be nice to polish the output cosmetics so that the various findings would be more readable and their location predictable.
Option 1: The output could be structured around records as the main heading, since a record is the main unit of information handled by the upload process. The information related to each record could then be presented under lesser subheadings. For example:
==> Processing record 69735
-> Detected DOI 10.7483/OPENDATA.LHCB.PAM7.JHT0
-> Detected direct file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/foo.root
-> Detected index file with 4 entries root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_120000_file_index.json
-> Processed index file with 4 entries root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_120000_file_index.json
-> Detected index file with 1 entries root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_130000_file_index.json
-> Processed index file with 1 entries root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_130000_file_index.json
-> Detected index file with 19 entries root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_270000_file_index.json
-> Processed index file with 19 entries root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_270000_file_index.json
==> Record 69735 updated
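A minimal Python sketch of how the record-centric layout of Option 1 could be produced. The function name `format_record_report` and its parameters are hypothetical and are not part of the existing cernopendata code; the sketch only shows the heading and subheading structure.

```python
def format_record_report(record_id, events, status):
    """Render one record as a '==>' heading, '->' detail lines, and a closing '==>' line.

    All names here are illustrative assumptions, not existing cernopendata functions.
    """
    lines = [f"==> Processing record {record_id}"]
    # Each detail found while handling the record becomes a lesser subheading.
    lines.extend(f"-> {event}" for event in events)
    lines.append(f"==> Record {record_id} {status}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        format_record_report(
            69735,
            ["Detected DOI 10.7483/OPENDATA.LHCB.PAM7.JHT0"],
            "updated",
        )
    )
```

Collecting the events per record before printing keeps every detail line attached to its record heading, which makes the position of each finding predictable.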
Option 2: Alternatively, if we want time-based output formatting, then let's make all output lines use the same time-based logging format, for example:
2025-01-12 16:59:03 Processing record 69735
2025-01-12 16:59:04 Detected index file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_120000_file_index.json
2025-01-12 16:59:04 Detected 4 entries in the index file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_120000_file_index.json
2025-01-12 16:59:04 Processed index file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_120000_file_index.json
2025-01-12 16:59:04 Detected index file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_130000_file_index.json
2025-01-12 16:59:04 Detected 1 entries in the index file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_130000_file_index.json
2025-01-12 16:59:04 Processed index file root://eospublic.cern.ch//eos/opendata/cms/mc/RunIISummer20UL16NanoAODv9/WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8/NANOAODSIM/106X_mcRun2_asymptotic_v17-v1/file-indexes/CMS_mc_RunIISummer20UL16NanoAODv9_WJetsToLNu_HT-800To1200_TuneCP5_13TeV-madgraphMLM-pythia8_NANOAODSIM_106X_mcRun2_asymptotic_v17-v1_130000_file_index.json
2025-01-12 16:59:06 Record 69735 updated
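One possible way to get the uniform time-based format of Option 2 is Python's standard `logging` module with a single formatter shared by every message. This is a sketch under assumptions: the logger name and the `make_logger` helper are hypothetical, and the actual fixtures code may emit its messages differently.

```python
import logging
import sys


def make_logger(stream=sys.stdout, name="fixtures-sketch"):
    """Return a logger whose every line starts with 'YYYY-MM-DD HH:MM:SS '.

    The logger name and this helper are illustrative assumptions.
    """
    logger = logging.getLogger(name)
    logger.handlers.clear()   # avoid duplicate lines if called twice
    logger.propagate = False  # keep output out of the root logger
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(message)s", datefmt="%Y-%m-%d %H:%M:%S")
    )
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_logger()
    log.info("Processing record %s", 69735)
    log.info("Record %s updated", 69735)
```

Because the timestamp comes from the formatter rather than from each call site, no message can accidentally be printed without it.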
Note: introduce verbose flag
Moreover, the "internal messages" should ideally be easy to toggle. By default, the output could be very terse so that data curators get a good overview of the main information concerning record uploads, such as:
$ cernopendata fixtures records --mode insert-or-replace -f myfile.json
2025-01-12 16:59:06 Record 123 created
2025-01-12 16:59:06 Record 124 updated
2025-01-12 16:59:07 Record 125 created
2025-01-12 16:59:07 Record 126 updated
And, when the data curator wishes to know more details about the upload process, the curator could use a verbose flag to trigger the above verbose output:
$ cernopendata fixtures records --mode insert-or-replace -f myfile.json -v
2025-01-12 16:59:06 Processing record 123
2025-01-12 16:59:06 Detected DOI ...
2025-01-12 16:59:06 Detected direct file ...
2025-01-12 16:59:06 Detected index file ...
2025-01-12 16:59:06 Record 123 created
2025-01-12 16:59:06 Processing record 124
2025-01-12 16:59:06 Detected DOI ...
2025-01-12 16:59:06 Detected direct file ...
2025-01-12 16:59:06 Detected index file ...
2025-01-12 16:59:06 Record 124 updated
2025-01-12 16:59:07 Processing record 125
2025-01-12 16:59:07 Detected DOI ...
2025-01-12 16:59:07 Detected direct file ...
2025-01-12 16:59:07 Detected index file ...
2025-01-12 16:59:07 Record 125 created
2025-01-12 16:59:07 Processing record 126
2025-01-12 16:59:07 Detected DOI ...
2025-01-12 16:59:07 Detected direct file ...
2025-01-12 16:59:07 Detected index file ...
2025-01-12 16:59:07 Record 126 updated
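The terse and verbose behaviour could be sketched by mapping the proposed `-v` flag to a logging level: per-record summaries go out at INFO, internal details at DEBUG. The sketch uses the standard `argparse` and `logging` modules; the program name, logger name, and helper functions are hypothetical, and the real command may be wired up through a different CLI framework.

```python
import argparse
import logging
import sys


def parse_args(argv):
    # "fixtures-sketch" is a stand-in name, not the real command.
    parser = argparse.ArgumentParser(prog="fixtures-sketch")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="also print per-record detail lines")
    return parser.parse_args(argv)


def configure(verbose, stream=sys.stdout):
    """Create a logger that shows DEBUG lines only when verbose is set."""
    logger = logging.getLogger("fixtures-verbosity-sketch")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(message)s", datefmt="%Y-%m-%d %H:%M:%S")
    )
    logger.addHandler(handler)
    return logger


def load_record(logger, record_id, outcome):
    """Hypothetical stand-in for the real record upload step."""
    logger.debug("Processing record %s", record_id)  # shown only with -v
    logger.info("Record %s %s", record_id, outcome)  # always shown


if __name__ == "__main__":
    args = parse_args(sys.argv[1:])
    log = configure(args.verbose)
    load_record(log, 123, "created")
```

With this split, the call sites do not need to check the flag themselves; choosing `debug` or `info` for a message decides whether it belongs to the terse or the verbose view.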
Note: print final statistics
The fixture loading script could also output some statistics at the end of processing, for the data curator's convenience:
$ cernopendata fixtures records --mode insert-or-replace -f myfile.json -v
...
2025-01-12 16:59:07 [INFO] Record 125 updated
2025-01-12 16:59:07 [INFO] Record 126 updated
2025-01-12 16:59:07 [INFO] Processed 456 records (400 created, 56 updated, 0 error, 0 pending)
2025-01-12 16:59:07 [INFO] Processing took 10 minutes (1.3 records per second)

$ cernopendata fixtures records --mode insert-or-replace -f myfile.json -v
...
2025-01-12 16:59:07 [INFO] Record 125 updated
2025-01-12 16:59:07 [ERROR] Record 126 exception "Duplicate DOI"
2025-01-12 16:59:07 [ERROR] Processed 355 records (300 created, 55 updated, 1 error, 99 pending)
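A sketch of how the closing statistics line could be computed, assuming the loader collects one outcome string per attempted record. The `summarize` function and the outcome labels are hypothetical. Following the sample lines above, "Processed" counts created plus updated records, and "pending" is whatever was not attempted.

```python
from collections import Counter


def summarize(outcomes, total):
    """Build the final summary line from per-record outcomes.

    outcomes: one of "created", "updated", "error" per attempted record
              (labels are an assumption for this sketch).
    total:    number of records in the input file.
    Returns (level, message); the level is ERROR if any record failed.
    """
    counts = Counter(outcomes)
    processed = counts["created"] + counts["updated"]
    pending = total - len(outcomes)  # records never attempted
    level = "ERROR" if counts["error"] else "INFO"
    message = (
        f"Processed {processed} records ({counts['created']} created, "
        f"{counts['updated']} updated, {counts['error']} error, {pending} pending)"
    )
    return level, message


if __name__ == "__main__":
    level, message = summarize(["created", "updated", "error"], total=5)
    print(f"[{level}] {message}")
```

Returning the level together with the message lets the caller route a failed run to the error channel without re-counting the outcomes.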
Just some illustrative possibilities for live discussions.