OpenAlex publications CSV #51
from rialto_airflow.utils import invert_dict

config.email = os.environ.get("AIRFLOW_VAR_OPENALEX_EMAIL")
I'm kind of wondering if we should just do this for all the environment variables, instead of using airflow.models.Variable and passing things down.
It seemed harder to set up the tests when using Variable as well.
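For context, here is a minimal sketch of the environment-variable approach discussed in this thread. Airflow resolves `Variable.get("name")` from an environment variable named `AIRFLOW_VAR_NAME` when one is set, so the same value can be read with `os.environ` without importing Airflow. The helper name `env_setting` is hypothetical and is not part of this PR.

```python
import os
from typing import Optional


def env_setting(name: str, default: Optional[str] = None) -> Optional[str]:
    """Read an Airflow-style variable straight from the environment.

    Hypothetical helper (not in this PR): looks up AIRFLOW_VAR_<NAME>
    without importing airflow.models.Variable.
    """
    return os.environ.get(f"AIRFLOW_VAR_{name.upper()}", default)
```

In a test, the value could then be supplied by setting `os.environ["AIRFLOW_VAR_OPENALEX_EMAIL"]` (or with pytest's `monkeypatch.setenv`) before calling `env_setting("openalex_email")`, with no Airflow database or mocking needed.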
writer.writerow(pub)


def publications_from_dois(dois: list, batch_size=75):
I'm just curious: did you arrive at 75 through experimentation to see what was possible, or was it documented?
You can do larger batches, but then the request ends up being too long (over 4096), and I kept getting Bad Request errors when the requests were too large. A batch size of 75 seemed to get us safely below that threshold. I'll add a comment.
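The batching idea can be sketched roughly as follows. This is an illustration only, not the code in this PR: `batched` and `filter_length` are hypothetical names, and joining the DOIs with `|` is an assumption about how the filter value is built.

```python
from typing import Iterator, List


def batched(items: List[str], batch_size: int = 75) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def filter_length(dois: List[str]) -> int:
    """Length of a pipe-joined DOI filter value (assumed format)."""
    return len("|".join(dois))


# Example: 200 placeholder DOIs split into batches of 75, 75 and 50.
dois = [f"10.1234/example.{i}" for i in range(200)]
sizes = [len(batch) for batch in batched(dois)]
```

Checking `filter_length(batch)` for each batch against the limit gives a quick way to see how much headroom a given `batch_size` leaves for unusually long DOIs.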
return pub


FIELDS = [
We should probably move the other global variables to all caps when we have a chance. It reads better, I think.
Resolves #8 by querying OpenAlex by DOI to create a publications CSV.
I followed the model you're using for the Dimensions publications lookup, @edsu, so some of the code will look familiar.