Python script developed for York Digital Journals. Uses the REST API for OJS 3.3.
The script outputs a CSV with four datapoints for each of multiple journals:
- Total published submissions
- Total published issues
- Previous month's abstract views
- Previous month's galley views
The script outputs running totals of published submissions and issues, so the number of new submissions and issues has to be calculated manually, for example by subtracting the previous month's total. This is because YDJ frequently ingests back issues of journals, which would not be identified if the API call specified a date range for these data points.
This readme file is written for librarians and other OJS managers who knew as little coding as I did when I started this project.
The API endpoints and tokens are stored in a CSV file. One row = one journal, with the following column headers:
- jabbr: journal abbreviation (an identifier for each journal)
- subs_endpoint: endpoint for the Submissions call
- issues_endpoint: endpoint for the Issues call
- abstractViews_endpoint: endpoint for the Abstract views call
- galleyViews_endpoint: endpoint for the Galley views call
- token: the API token for each journal
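As an illustration, a keys file with this layout can be read with Python's built-in csv module. Everything in the sample is a placeholder: the example.org URLs, the token, and the file name my_keys.csv are made up, and the real endpoint paths depend on your OJS installation. The script itself may read the file differently (for example, straight into a data frame).

```python
import csv
import io

# Placeholder keys file: one row per journal. The URLs and the token are
# invented for illustration and will not work against a real server.
BASE = "https://example.org/index.php/abc/api/v1"
SAMPLE_KEYS = "\n".join([
    "jabbr,subs_endpoint,issues_endpoint,"
    "abstractViews_endpoint,galleyViews_endpoint,token",
    ",".join([
        "abc",
        BASE + "/submissions",
        BASE + "/issues",
        BASE + "/stats/publications/abstract",
        BASE + "/stats/publications/galley",
        "PLACEHOLDER_TOKEN",
    ]),
])


def read_keys(handle):
    """Return one dictionary per journal, keyed by the column headers."""
    return list(csv.DictReader(handle))


# With a real file this would be: open("my_keys.csv", newline="")
# (the file name is an assumption). A string is used so the example
# runs on its own.
my_keys = read_keys(io.StringIO(SAMPLE_KEYS))
for row in my_keys:
    print(row["jabbr"], row["subs_endpoint"])
```

Each row comes back as a dictionary, so row["token"] gives the token for that journal.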
The script reads the CSV file with endpoints and tokens as my_keys, creates monthLookup for the previous month in YYYY-MM format, and creates four empty lists to store the data extracted from the API calls:
- journal_list: the journal abbreviation
- month_list: the month for the stats pull
- metric_list: the metric to which the value applies
- value_list: the value
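One way to work out the previous month is sketched below. The function name previous_month is hypothetical, and the script may calculate monthLookup differently. The idea is to step back to the first day of the current month, subtract one day, and format what is left as YYYY-MM.

```python
from datetime import date, timedelta


def previous_month(today):
    """Return the month before `today` as a YYYY-MM string."""
    first_of_this_month = today.replace(day=1)
    last_of_previous_month = first_of_this_month - timedelta(days=1)
    return last_of_previous_month.strftime("%Y-%m")


monthLookup = previous_month(date.today())

# Four parallel lists: position i in each list describes one datapoint.
journal_list = []
month_list = []
metric_list = []
value_list = []
```

Going through the last day of the previous month avoids special handling for January, where the previous month falls in the previous year.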
The script then iterates through each journal with a for loop, calling each API in turn and writing the output datapoint to the appropriate list.
The Submissions API call includes the parameter 'status':'3', returning only published submissions. The call returns a JSON response, from which the script reads itemsMax. It writes jabbr, monthLookup, the string "published submissions", and the value of itemsMax to the appropriate lists.
The Issues API call includes the parameter 'isPublished':'true', returning only published issues. The call returns a JSON response, from which the script reads itemsMax. It writes jabbr, monthLookup, the string "published issues", and the value of itemsMax to the appropriate lists.
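The two counts above follow the same pattern: send a GET request with one filter parameter, then read itemsMax from the response. The sketch below uses only the standard library. The function names are hypothetical, and sending the token as a Bearer header is an assumption; some installations may expect it as an apiToken query parameter instead. The sample response is made up.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(endpoint, token, params):
    """Build a GET request with query parameters and the API token.

    Assumption: the token goes in an Authorization header.
    """
    url = endpoint + "?" + urlencode(params)
    return Request(url, headers={"Authorization": "Bearer " + token})


def read_items_max(payload):
    """Pull the total count out of a decoded JSON response."""
    return payload["itemsMax"]


def count_items(endpoint, token, params):
    """Call the API and return itemsMax (needs network access)."""
    with urlopen(build_request(endpoint, token, params)) as response:
        return read_items_max(json.load(response))


# Offline demonstration with a made-up response body.
sample = json.loads('{"itemsMax": 412, "items": []}')
print(read_items_max(sample))  # prints 412
```

For published submissions the parameters would be {"status": "3"}, and for published issues {"isPublished": "true"}.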
The Abstract views API call includes the parameter 'dateStart':'2001-01-01', returning data for all available months. The call returns a JSON array of objects, where each object holds a month and its view data. Python reads this as a list of dictionaries, one dictionary per month. The script uses monthLookup to identify the dictionary for the previous month and extracts value for that month. It writes jabbr, monthLookup, the string "abstract views", and the extracted value to the appropriate lists.
The Galley views API call includes the parameter 'dateStart':'2001-01-01', returning data for all available months. The call returns a JSON array of objects, where each object holds a month and its view data. Python reads this as a list of dictionaries, one dictionary per month. The script uses monthLookup to identify the dictionary for the previous month and extracts value for that month. It writes jabbr, monthLookup, the string "galley views", and the extracted value to the appropriate lists.
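The month lookup used for both view counts can be sketched as a small function. The function name is hypothetical and the sample data is made up. The key name date for the month is an assumption, so check it against a real response from your installation.

```python
def views_for_month(timeline, month, month_key="date", value_key="value"):
    """Return the view count for `month`, or None if the month is absent.

    `timeline` is the decoded JSON: a list of dictionaries, one per month.
    The default key names are assumptions about the response layout.
    """
    for entry in timeline:
        if entry.get(month_key) == month:
            return entry.get(value_key)
    return None


# Made-up response data for illustration.
sample_timeline = [
    {"date": "2024-01", "value": 150},
    {"date": "2024-02", "value": 175},
]
print(views_for_month(sample_timeline, "2024-02"))  # prints 175
```

Returning None for a missing month, rather than raising an error, lets the loop carry on for a journal that has no data yet for the previous month.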
To help with debugging, the script prints the journal abbreviation in the console once it completes the iteration of the loop for that journal.
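The overall shape of the loop might look like the sketch below. It is restructured for brevity and is not the script's code: the four API calls are replaced by hypothetical stand-in functions that return fixed numbers, so the example runs offline.

```python
def collect(my_keys, monthLookup, fetchers):
    """Run every fetcher for every journal and return four parallel lists.

    `fetchers` maps a metric label to a function that takes one row of
    the keys file and returns a number. These functions are hypothetical
    stand-ins for the four API calls.
    """
    journal_list, month_list, metric_list, value_list = [], [], [], []
    for row in my_keys:
        for metric, fetch in fetchers.items():
            # Append to all four lists together so they stay the same length.
            journal_list.append(row["jabbr"])
            month_list.append(monthLookup)
            metric_list.append(metric)
            value_list.append(fetch(row))
        print(row["jabbr"])  # progress marker: this journal is done
    return journal_list, month_list, metric_list, value_list


# Stub fetchers returning fixed numbers instead of calling an API.
stub_fetchers = {
    "published submissions": lambda row: 412,
    "published issues": lambda row: 30,
    "abstract views": lambda row: 175,
    "galley views": lambda row: 90,
}
lists = collect([{"jabbr": "abc"}, {"jabbr": "xyz"}], "2024-02", stub_fetchers)
```

If the run stops with an error, the last abbreviation printed tells you which journal finished most recently, so the problem is with the next one in the keys file.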
The data harvested from the API calls are now stored in four lists of equal length. The script zips these into a long data frame with four headers: journal, month, metric, value. It then pivots the data frame to make it wide and writes the output to a CSV file.
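The script does this step with a data frame. To show what the zip and pivot actually do, here is a sketch using only the standard library; it is an illustration, not the script's code, and the function names and sample numbers are made up. Each group of four values becomes one cell, and each journal and month pair becomes one row.

```python
import csv
import io


def long_to_wide(journal_list, month_list, metric_list, value_list):
    """Turn four parallel lists into one row per (journal, month)."""
    wide = {}
    for journal, month, metric, value in zip(
        journal_list, month_list, metric_list, value_list
    ):
        wide.setdefault((journal, month), {})[metric] = value
    # Sort metric names and rows explicitly so the output order is stable.
    metrics = sorted(set(metric_list))
    rows = []
    for journal, month in sorted(wide):
        row = {"journal": journal, "month": month}
        for metric in metrics:
            row[metric] = wide[(journal, month)].get(metric, "")
        rows.append(row)
    return ["journal", "month"] + metrics, rows


def write_csv(handle, headers, rows):
    """Write the wide rows to an open file (or file-like object)."""
    writer = csv.DictWriter(handle, fieldnames=headers)
    writer.writeheader()
    writer.writerows(rows)


headers, rows = long_to_wide(
    ["xyz", "xyz", "abc", "abc"],
    ["2024-02"] * 4,
    ["galley views", "abstract views", "galley views", "abstract views"],
    [90, 175, 40, 60],
)
out = io.StringIO()
write_csv(out, headers, rows)
print(out.getvalue())
```

With a real file, io.StringIO() would be replaced by open("output.csv", "w", newline=""), where the file name is again a placeholder.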
The output CSV file has one row for each journal with the following headers:
- journal
- month
- abstract views
- galley views
- published issues
- published submissions
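Somewhere along the way (most likely at the pivot step, since data frame libraries such as pandas sort the row and column labels when pivoting) the journal abbreviations and the labels for the four metrics got sorted alphabetically.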