A Python application that processes scientific papers in PDF format and generates detailed scientific analyses using the GPT-4 API.
- Text extraction from PDF files
- Scientific analysis using the GPT-4 API
- Batch PDF processing
- Automatic result reporting
- Error handling and retry mechanism
- Configurable OpenAI model selection
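The retry mechanism listed above could look roughly like the sketch below. This is a minimal illustration assuming exponential backoff. The function name `call_with_retry` and its parameters are hypothetical and not taken from `gpt_paper.py`, whose actual retry logic may differ.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponential backoff.

    Hypothetical helper: the names and defaults here are assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == max_attempts:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. In practice you would likely catch only the API's transient error types rather than every `Exception`.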
- Clone the project:
git clone [repository-url]
- Install required packages:
pip install -r requirements.txt
- Create a .env file in the project root directory:
  - Copy .env.example to .env
  - Add your OpenAI API key
  - Select your preferred model (gpt-4 or gpt-3.5-turbo)
Example .env file:
# OpenAI Configuration
OPENAI_API_KEY=your-api-key-here
# Options: gpt-4, gpt-3.5-turbo
OPENAI_MODEL=gpt-4
You can get your API key from: https://platform.openai.com/api-keys
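As a rough sketch of how these two settings might be read and validated at startup, assuming the .env values have already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`): the `Settings` type, the `load_settings` function, and the `gpt-4` default are illustrative assumptions, not the project's actual code.

```python
import os
from typing import Mapping, NamedTuple, Optional

# The two model options named in this README.
ALLOWED_MODELS = ("gpt-4", "gpt-3.5-turbo")


class Settings(NamedTuple):
    api_key: str
    model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read and validate the OpenAI settings (hypothetical helper)."""
    env = os.environ if env is None else env

    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key or api_key == "your-api-key-here":
        raise ValueError("OPENAI_API_KEY is missing or still set to the placeholder")

    # Drop any trailing inline comment; defaulting to gpt-4 is an assumption.
    model = env.get("OPENAI_MODEL", "gpt-4").split("#", 1)[0].strip()
    if model not in ALLOWED_MODELS:
        raise ValueError(f"OPENAI_MODEL must be one of {ALLOWED_MODELS}, got {model!r}")

    return Settings(api_key=api_key, model=model)
```

Failing early on a missing key or an unknown model avoids discovering the problem only after the first API call.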
- Place your PDF files in the Papers directory:
  - The program will process all PDF files in this directory
  - Files are processed sequentially in alphabetical order
  - Each PDF is analyzed independently
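The discovery step described above can be sketched as follows. This is only an illustration: the `list_pdfs` name is hypothetical, and treating the extension and the sort order as case-insensitive is an assumption about how the program behaves.

```python
from pathlib import Path
from typing import List


def list_pdfs(directory: str = "Papers") -> List[Path]:
    """Return the PDF files in directory, sorted alphabetically by file name."""
    folder = Path(directory)
    pdfs = [
        path
        for path in folder.iterdir()
        # Assumption: match ".pdf" regardless of case, and skip subdirectories.
        if path.is_file() and path.suffix.lower() == ".pdf"
    ]
    # Assumption: alphabetical order ignores case.
    return sorted(pdfs, key=lambda path: path.name.lower())
```

Each returned path would then be handed to the analysis step one at a time, so a failure on one file need not affect the others.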
- Run the program:
python gpt_paper.py
- Check results in the Result directory:
  - For each PDF, a corresponding .txt file is created
  - Output files are named [original_pdf_name]_output.txt
  - Results include detailed scientific analysis in a structured format
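The naming rule above maps directly onto a small helper. The function name `output_path_for` is hypothetical; only the `Result` directory and the `_output.txt` suffix come from this README.

```python
from pathlib import Path


def output_path_for(pdf_path: str, result_dir: str = "Result") -> Path:
    """Map a PDF path to its report path: [original_pdf_name]_output.txt."""
    # Path.stem is the file name without its final extension.
    return Path(result_dir) / f"{Path(pdf_path).stem}_output.txt"
```

For example, `output_path_for("Papers/smith_2020.pdf")` yields the path `Result/smith_2020_output.txt`.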
- Each PDF analysis consumes OpenAI API tokens
- Token consumption depends on:
  - Length of the PDF document
  - Selected GPT model (GPT-4 costs more than GPT-3.5-turbo)
  - Number of API calls made
- Monitor your OpenAI API usage at: https://platform.openai.com/usage
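To get a feel for how document length drives token use before running a batch, a rough estimate can be sketched as below. It assumes the common rule of thumb of about four characters per token for English text, which is only an approximation. The function names and the `max_tokens_per_call` parameter are hypothetical, and whether the program splits long papers into several calls is not stated here.

```python
import math

# Rule-of-thumb assumption for English text; real tokenizers will differ.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate how many tokens a piece of text will use."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_calls(text: str, max_tokens_per_call: int) -> int:
    """Estimate the API calls needed if text is split into fixed-size chunks."""
    if max_tokens_per_call <= 0:
        raise ValueError("max_tokens_per_call must be positive")
    return math.ceil(estimate_tokens(text) / max_tokens_per_call)
```

The estimate covers input text only. Prompt instructions and the generated analysis also consume tokens, so actual usage will be higher.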
- Processing time varies based on:
  - PDF size and complexity
  - Number of files being processed
  - API response time
  - Rate limiting and retry mechanisms
- Start with a small number of PDFs to test the system
- Monitor the console output for processing status
- Use English-language PDFs for best results
- Ensure PDFs are text-searchable (not scanned images)
- Check that your OpenAI account has sufficient credits before processing large batches
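One way to catch scanned, image-only PDFs early is to check how much text the extraction step actually returned. The sketch below assumes you already have one extracted string per page from whichever PDF library the project uses. The function name and the 50-character threshold are arbitrary illustrative choices.

```python
from typing import Iterable, Optional


def looks_text_searchable(
    page_texts: Iterable[Optional[str]],
    min_chars_per_page: int = 50,
) -> bool:
    """Heuristic: True if the pages contain enough extractable text on average."""
    pages = list(page_texts)
    if not pages:
        return False
    # Treat None (no text extracted) as an empty page and ignore whitespace padding.
    total_chars = sum(len((text or "").strip()) for text in pages)
    return total_chars / len(pages) >= min_chars_per_page
```

A file that fails this check could be skipped with a console warning instead of being sent to the API, which saves tokens on input that would produce a poor analysis.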
The analysis report generated for each PDF includes:
- Article Citation
  - Title
  - Authors
  - Journal
  - Volume and Issue
  - Publication Date
  - DOI
  - Publisher
- Research Purpose and Hypothesis
  - Research Topic
  - Hypothesis/Problem Statement
- Participants and Study Area
  - Participant Information
  - Study Area
- Methodology
  - Data Collection Method
  - Tools/Instruments Used
  - Data Analysis Method
- Results
  - Key Findings
  - Statistical Results
- Authors' Recommendations and Discussion
  - Research Success Status
  - Authors' Recommendations and Future Research
- Scientific Contribution and Strengths/Weaknesses
  - Strengths
  - Weaknesses
- Summary and Scientific Evaluation
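Because the report follows a fixed outline, a generated file can be sanity-checked for completeness. The sketch below assumes each top-level section title appears verbatim somewhere in the output text, which may not match the actual output format. `REPORT_SECTIONS` and `missing_sections` are hypothetical names.

```python
from typing import List

# Top-level sections of the analysis report, in the order listed above.
REPORT_SECTIONS = [
    "Article Citation",
    "Research Purpose and Hypothesis",
    "Participants and Study Area",
    "Methodology",
    "Results",
    "Authors' Recommendations and Discussion",
    "Scientific Contribution and Strengths/Weaknesses",
    "Summary and Scientific Evaluation",
]


def missing_sections(report_text: str) -> List[str]:
    """Return the section titles that do not appear in report_text."""
    lowered = report_text.lower()
    # Loose check: a case-insensitive substring match, so a title such as
    # "Results" is also satisfied by a line like "Statistical Results".
    return [title for title in REPORT_SECTIONS if title.lower() not in lowered]
```

An empty list suggests the model covered every section. A non-empty list is a hint that the analysis was truncated or that the response drifted from the expected structure.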
Burak Can KARA Email: [email protected]
This project is licensed under the MIT License - see the LICENSE file for details.