https://shodhganga.inflibnet.ac.in/ is the Union Government of India's website for publishing PhD theses in all Indian languages.
It currently hosts more than 5.5 lakh (550,000) theses, all released under the Creative Commons CC BY-NC-SA license.
Write a program to scrape all of the PDF files along with their metadata and upload them to archive.org.
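For the upload step, one possible approach is to map each thesis's Dublin Core metadata onto archive.org item fields before uploading. This is a minimal sketch under several assumptions: the metadata arrives as a dict mapping Dublin Core field names (`title`, `creator`, `subject`, ...) to lists of values, the license is CC BY-NC-SA 4.0 (the exact version should be checked per record), and the `shodhganga-` identifier prefix is a hypothetical naming convention. `ia_identifier` and `ia_metadata` are illustrative names, not an existing API.

```python
import re

# Assumption: CC BY-NC-SA 4.0; verify the licence version stated on each record.
LICENSE_URL = "https://creativecommons.org/licenses/by-nc-sa/4.0/"


def ia_identifier(handle: str) -> str:
    """Turn a DSpace handle such as '10603/12345' into an archive.org-safe identifier.

    The 'shodhganga-' prefix is a hypothetical convention chosen for this sketch.
    """
    # Replace any run of characters archive.org identifiers do not allow with '-'.
    slug = re.sub(r"[^A-Za-z0-9._-]+", "-", handle).strip("-")
    return f"shodhganga-{slug}"


def ia_metadata(dc: dict[str, list[str]]) -> dict[str, object]:
    """Map harvested Dublin Core fields (name -> list of values) to archive.org metadata."""
    metadata: dict[str, object] = {"mediatype": "texts", "licenseurl": LICENSE_URL}
    # Single-valued fields: keep only the first Dublin Core value, skip if absent.
    for key in ("title", "creator", "date", "description"):
        if dc.get(key):
            metadata[key] = dc[key][0]
    # Repeatable fields are kept as lists of values.
    for key in ("subject", "language"):
        if dc.get(key):
            metadata[key] = list(dc[key])
    return metadata
```

With the third-party `internetarchive` package installed and credentials set up via `ia configure`, the resulting dict could then be passed as `internetarchive.upload(identifier, files=[merged_pdf], metadata=md)`.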
Note: each thesis is stored on the site as a multipart PDF, so the parts of each work must be merged into a single file.
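A minimal sketch of the merge step, assuming each part's filename begins with its position in the thesis (for example `01_title.pdf`, `02_certificate.pdf`); that naming pattern is an assumption to verify against the site. It relies on the third-party `pypdf` package, and `part_sort_key` and `merge_parts` are hypothetical helper names.

```python
import re
from pathlib import Path


def part_sort_key(path: Path) -> tuple:
    """Natural sort key: digit runs compare as integers, other text case-insensitively.

    re.split with a capturing group always alternates text, digits, text, ...,
    so tuples from different names only ever compare str with str and int with int.
    """
    return tuple(
        int(tok) if tok.isdigit() else tok.lower()
        for tok in re.split(r"(\d+)", path.name)
    )


def merge_parts(part_paths, output_path) -> None:
    """Concatenate the PDF parts of one thesis, in natural filename order, into one file."""
    # pypdf is a third-party dependency (pip install pypdf); it is imported here
    # so the ordering helper above can be used without it.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for part in sorted((Path(p) for p in part_paths), key=part_sort_key):
        writer.append(str(part))  # append every page of this part
    with open(output_path, "wb") as fh:
        writer.write(fh)
```

A natural sort is used instead of plain string sorting so that unpadded names such as `chapter 10.pdf` still land after `chapter 2.pdf`.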
The site is built on DSpace, open-source software for digital repositories (https://dspace.lyrasis.org/).
Check for any existing web scrapers for DSpace.
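Since DSpace ships with an OAI-PMH interface, harvesting metadata through it may be simpler than scraping HTML pages. The sketch below builds OAI-PMH `ListRecords` requests in the `oai_dc` (Dublin Core) format and parses one response page at a time, returning the resumption token used for paging. The endpoint path `/oai/request` is DSpace's default and is only assumed to be enabled on Shodhganga; `list_records_url` and `parse_list_records` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Assumption: Shodhganga exposes DSpace's default OAI-PMH endpoint at this path.
OAI_BASE = "https://shodhganga.inflibnet.ac.in/oai/request"


def list_records_url(resumption_token=None) -> str:
    """Build a ListRecords URL; per OAI-PMH, a resumption token replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{OAI_BASE}?{urlencode(params)}"


def parse_list_records(xml_data):
    """Parse one ListRecords page (str or bytes).

    Returns ([(oai_identifier, {dc_field: [values]}), ...], next_token_or_None).
    """
    root = ET.fromstring(xml_data)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        fields = {}
        dc_block = record.find("oai:metadata/oai_dc:dc", NS)
        # Deleted records carry a header but no metadata block.
        if dc_block is not None:
            for elem in dc_block:
                # Strip the '{namespace}' prefix to get the bare field name, e.g. 'title'.
                name = elem.tag.split("}", 1)[-1]
                if elem.text and elem.text.strip():
                    fields.setdefault(name, []).append(elem.text.strip())
        records.append((identifier, fields))
    # An empty or missing resumptionToken marks the last page.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, (token or None)
```

Fetching each URL (for example with `urllib.request.urlopen`, pausing between requests) and passing the response bytes to `parse_list_records` until the returned token is `None` would walk the full record set.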
Explore this: https://github.com/thenicekat/Scrapers_BPHC
I'm interested in this! Is this the correct metadata we need?