Bulk carbon-dating feature #31

Open
cjer opened this issue Jan 16, 2019 · 3 comments

cjer commented Jan 16, 2019

I am looking to carbon-date a list of tens or hundreds of thousands of URIs.
I am currently running a short (and rather inefficient) script I wrote that invokes main.py once per URI, reading the URIs from a line-separated text file. Am I missing something in this repository, or elsewhere, that already does this or something similar?

Thanks!

@ibnesayeed
Member

In CLI mode, the tool is designed to handle one URI at a time. This keeps the code logic and the response structure simple. We also don't see any performance benefit in having the tool accept multiple URIs or an input file as a parameter: each URI is processed independently and takes a long time, so the time needed to boot the script is negligible in comparison.

@anwala
Member

anwala commented Jan 16, 2019

I think another option is to run it in server mode and then make parallel requests against the server, e.g., with 5 threads, depending on the capabilities of your machine. You may want to check from time to time whether the server is still alive and restart it if it has gone down. I assume you are saving each response to its own file rather than appending everything to a single file, so that you can restart without losing data.
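For illustration, the approach above might look roughly like the following. This is only a sketch under stated assumptions: `run_bulk`, `result_path`, and `http_fetch` are hypothetical helper names, not part of this repository, and the server's host, port, and URL path are deliberately left as a placeholder template that you would fill in from the project's own documentation. Whether the server expects the target URI to be percent-encoded is also an assumption.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlopen


def result_path(out_dir, uri):
    """One output file per URI, named by a hash so any URI is a safe filename."""
    digest = hashlib.sha256(uri.encode("utf-8")).hexdigest()
    return Path(out_dir) / (digest + ".json")


def http_fetch(endpoint_template):
    """Build a fetcher from a URL template containing '{uri}'.

    The template is an assumption: take the real host, port, and path
    from the tool's documentation. Percent-encoding the URI is also assumed.
    """
    def fetch(uri):
        url = endpoint_template.format(uri=quote(uri, safe=""))
        with urlopen(url, timeout=600) as resp:
            return resp.read()
    return fetch


def run_bulk(uris, fetch, out_dir, workers=5):
    """Fetch each URI at most once; return the URIs that failed in this run."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # Drop duplicates and anything that already has a saved result.
    pending = [u for u in dict.fromkeys(uris)
               if not result_path(out_dir, u).exists()]

    def work(uri):
        try:
            body = fetch(uri)
        except Exception:
            return uri  # no file is written, so a rerun retries this URI
        target = result_path(out_dir, uri)
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(body)
        tmp.replace(target)  # rename last, so partial writes never look done
        return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [u for u in pool.map(work, pending) if u is not None]
```

Because finished URIs are skipped on the next call, rerunning `run_bulk` with the same input list after a crash or a server restart only retries what is still missing.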

@ibnesayeed
Member

Parallel processing is possible in both server and one-off modes. However, keep in mind that this is a network-intensive task, not a processor-intensive one. This means that many parallel requests to the various upstream services might trigger rate limiting.
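One way to reduce the risk of being rate limited is to space out the moments at which requests start, regardless of how many worker threads are running. The following is a minimal sketch of that idea, not something the tool provides: `Throttle` and `throttled` are hypothetical names, and a suitable interval is an assumption you would have to tune against the upstream services actually involved.

```python
import threading
import time


class Throttle:
    """Let at most one call start every `min_interval` seconds, across threads."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = 0.0

    def wait(self):
        # Reserve the next free start slot under the lock, then sleep
        # outside it so other threads can reserve their own slots.
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._min_interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)


def throttled(fetch, throttle):
    """Wrap any per-URI function so every call first waits for its slot."""
    def wrapper(uri):
        throttle.wait()
        return fetch(uri)
    return wrapper
```

Wrapping whatever function performs the per-URI request with `throttled(fn, Throttle(2.0))` would, for example, start at most one request every two seconds in total, however many threads call it.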
