This project is an RPA (Robotic Process Automation) bot designed to automate the extraction of news articles from a selected news website. The bot processes search parameters, retrieves news articles, and outputs structured data into an Excel file, along with downloaded images.
- Automated Data Extraction: Fetches article titles, publication dates, descriptions, and associated images.
- Excel Output: Saves news data into an Excel file, including:
  - Title
  - Date
  - Description
  - Picture filename
  - Search phrase count in title and description
  - Presence of monetary amounts in the content
- Dynamic Parameters: Accepts the following input parameters via Robocloud work items:
  - Search phrase
  - News category/section/topic
  - Time period (current month, or up to the specified number of months)
- Image Handling: Downloads associated images and links them in the Excel file.
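The search phrase count and the monetary-amount flag listed for the Excel output could be computed roughly as follows. This is a minimal sketch, not the bot's actual code: the function names are hypothetical, and the money formats it recognizes ($11.1, $111,111.11, 11 dollars, 11 USD) are an assumption, since this README does not specify them.

```python
import re

# Assumed money formats (not specified in this README):
# $11.1, $111,111.11, 11 dollars, 11 USD
MONEY_PATTERN = re.compile(
    r"\$\d[\d,]*(?:\.\d+)?"                           # $11.1 or $111,111.11
    r"|\b\d[\d,]*(?:\.\d+)?\s?(?:dollars|USD)\b",      # 11 dollars or 11 USD
    re.IGNORECASE,
)


def count_phrase(phrase: str, title: str, description: str) -> int:
    """Count case-insensitive, non-overlapping occurrences of the phrase
    in the title and description combined."""
    if not phrase:
        return 0
    needle = phrase.lower()
    return title.lower().count(needle) + description.lower().count(needle)


def contains_money(text: str) -> bool:
    """Return True if the text contains one of the assumed money formats."""
    return MONEY_PATTERN.search(text) is not None


print(count_phrase("climate", "Climate talks stall", "New climate deal; climate aid"))  # 3
```

Matching is done on plain substrings, so "climate" would also be counted inside a longer word; a word-boundary regex is one alternative if that matters.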
- Open a selected news website (e.g., AP News, Reuters).
- Perform a search using the provided search phrase.
- Filter results by category/topic (if applicable).
- Extract and save data for articles published within the specified time frame.
- Download article images and include their filenames in the output.
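Filtering by time frame comes down to computing the earliest publication date to accept. The sketch below assumes one possible reading of the time period parameter: 0 or 1 means the current month only, and N means the current month plus the N-1 months before it. The function names are hypothetical.

```python
from datetime import date


def earliest_allowed_date(months: int, today: date) -> date:
    """Return the first day of the earliest month in the window.

    Assumption: 0 and 1 both mean "current month only".
    """
    months_back = max(months, 1) - 1
    # Use a zero-based month index so year boundaries are plain arithmetic.
    index = today.year * 12 + (today.month - 1) - months_back
    return date(index // 12, index % 12 + 1, 1)


def is_in_window(published: date, months: int, today: date) -> bool:
    """Return True if the article date falls inside the requested window."""
    return earliest_allowed_date(months, today) <= published <= today


print(earliest_allowed_date(3, date(2024, 1, 10)))  # 2023-11-01
```

Passing `today` explicitly, instead of calling `date.today()` inside the function, keeps the logic easy to test.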
- Python 3.8+
- Robocorp installed and configured
- Necessary Python packages (see requirements.txt):
  - rpaframework
  - openpyxl
  - Other dependencies
- Clone the Repository

      git clone https://github.com/your-repo/news-automation-bot.git
      cd news-automation-bot

- Install Dependencies

      pip install -r requirements.txt

- Configuration
  - Define input parameters via Robocloud work items.
  - Set up Robocloud for deployment and process execution.
- Run the Bot

      python main.py
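Before a run starts, the input parameters from the work item can be validated once so that bad input fails early. The sketch below works on a plain dictionary and does not show how the payload is fetched from Robocloud. The key names `search_phrase`, `category`, and `months` are assumptions, as are the class and function names; this README does not define the payload schema.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class SearchParams:
    """Validated input parameters for one bot run."""
    search_phrase: str
    category: Optional[str]
    months: int


def parse_payload(payload: Mapping[str, Any]) -> SearchParams:
    """Validate a work item payload (hypothetical key names) and apply defaults."""
    phrase = str(payload.get("search_phrase", "")).strip()
    if not phrase:
        raise ValueError("payload must include a non-empty 'search_phrase'")
    # A missing or blank category means "do not filter by category".
    category = str(payload.get("category") or "").strip() or None
    months = int(payload.get("months", 0))
    if months < 0:
        raise ValueError("'months' must be zero or a positive integer")
    return SearchParams(phrase, category, months)


print(parse_payload({"search_phrase": " economy ", "months": "2"}))
```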
The bot generates:
- An Excel file in the /output directory containing the extracted news data.
- Images downloaded to the /output directory, with filenames linked in the Excel file.
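Because image filenames are recorded in the Excel file, they need to be stable and safe to write to disk. One possible approach, sketched here with a hypothetical `image_filename` helper, derives the name from a hash of the image URL and keeps the original extension when it is a known image type. The fallback to `.jpg` is an assumption.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse

KNOWN_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def image_filename(image_url: str) -> str:
    """Build a stable, filesystem-safe filename for an image URL."""
    # Take the extension from the URL path only, ignoring query parameters.
    suffix = PurePosixPath(urlparse(image_url).path).suffix.lower()
    if suffix not in KNOWN_SUFFIXES:
        suffix = ".jpg"  # assumed default when the URL has no usable extension
    digest = hashlib.sha256(image_url.encode("utf-8")).hexdigest()[:16]
    return f"{digest}{suffix}"


print(image_filename("https://example.com/a/b/photo.PNG?w=600"))
```

The same URL always maps to the same filename, so re-running the bot does not create duplicate copies of an image.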
- Code Quality: Ensure PEP8 compliance and use an object-oriented design.
- Resiliency: Implement fault-tolerant architecture (e.g., explicit waits for website interactions).
- Logging: Use structured logging to track bot actions and errors.
- Incorporate creative logging messages to showcase personality and engagement.
- Make the bot intuitive and extensible for future improvements.
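As one way to combine the resiliency and logging goals above, flaky steps can be wrapped in a retry decorator that logs each failed attempt. This is a generic retry-with-delay sketch that complements explicit waits and does not replace them. The `retry` decorator and the `news_bot` logger name are hypothetical, not part of this project's code.

```python
import logging
import time
from functools import wraps

logger = logging.getLogger("news_bot")  # hypothetical logger name


def retry(attempts=3, delay_seconds=1.0, exceptions=(Exception,)):
    """Retry the wrapped function, logging each failure, then re-raise."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as error:
                    logger.warning(
                        "%s failed (attempt %d/%d): %s",
                        func.__name__, attempt, attempts, error,
                    )
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay_seconds)
        return wrapper
    return decorator
```

A step such as opening the search page could then be declared as `@retry(attempts=3, delay_seconds=2)` above its function definition.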
news-automation-bot/
├── main.py              # Entry point for the bot
├── config/
│   └── settings.json    # Configuration file for parameters
├── output/              # Directory for output files
├── logs/                # Directory for logs
├── requirements.txt     # Python dependencies
└── README.md            # Project documentation
Contributions are welcome! Feel free to open issues or submit PRs to improve this bot.
This project is licensed under the MIT License. See the LICENSE file for details.