DergiPark is one of the largest platforms providing electronic hosting for peer-reviewed academic articles in Turkey. In this project, I scraped all articles from DergiPark and parsed the data into 8 main headings. Afterwards, I exported that data to different file formats such as .jsonl (JSON Lines) and .txt (plain text). The complete data set is available in the DergiPark-Data-Set repository. The number of output formats can be extended by customizing the source code.
DergiPark currently hosts over 25,000 academic articles. I extracted them all through web scraping with Python. Web scraping is the process of extracting large amounts of data from a website by fetching its source code and parsing its tags.
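The fetch-and-parse loop described above can be sketched as follows. Note that the URL and the CSS selector here are illustrative assumptions, not DergiPark's actual page structure, which may differ:

```python
# Minimal web-scraping sketch: fetch a page's source and parse its tags.
# The URL and the "h5.card-title" selector are hypothetical examples.
import requests
from bs4 import BeautifulSoup


def parse_article_titles(html):
    """Extract article titles from a page's HTML source."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select("h5.card-title")]


if __name__ == "__main__":
    response = requests.get("https://dergipark.org.tr/en/search")
    for title in parse_article_titles(response.text):
        print(title)
```

The parsing logic is kept separate from the network call, so it can be tested against a static HTML string without hitting the site.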
The extracted data can be used to train AI models or to give meaning to this data in other ways. Because it consists of peer-reviewed academic articles, it is suitable for use in any formal project.
I used Python as the main programming language.
For web scraping, I used the 'BeautifulSoup' and 'requests' modules. In addition, I used 'json', 'os', and 'time' for outputting the data and for the waiting sections between requests.
Download the project as an executable file from Releases and run the DergiPark.exe file.
Clone the project
git clone https://github.com/Alperencode/DergiPark-Project
Go to the project directory
cd DergiPark-Project
Install the required modules
pip install -r requirements.txt
Run the Python file
python main.py
Run main.py in the root directory
python main.py
Example of the scraper working correctly
Screenshot of the JSON Lines data
Screenshot of the txt data
- BeautifulSoup: My other web scraping projects
- Python: My main Python repository
- Algorithm-Solutions: My algorithm problem solutions in Python (LeetCode, HackerRank, CodeWars)