Manal Shaikh edited this page Oct 27, 2020 · 1 revision

Welcome to the WPExtractor wiki!

About

WPExtractor is a Python-based tool for building datasets for Artificial Intelligence projects. It collects data from WordPress blogs, which can then be used to train bots.

Features

  • Automatically extract all posts from a WordPress website within seconds.
  • Saves the data to a JSON file in the directory for you.
  • Can bypass certain user-agent-based restrictions by sending a custom default user-agent.
  • Easily understandable JSON format to make your life easier :D
  • Responsive developers. Just open an issue, and we'll fix it for you :)

Usage

Downloading posts from a WordPress website:
python main.py -u https://shadowhosting.net/blog

Downloading pages from a WordPress website (note the -p flag):
python main.py -u https://itsfoss.com/wp-json -p

Note - The URLs above are only examples for this wiki. Do not copy content from them unless the license or author specifically allows it.
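
For readers curious about what an extraction like the ones above might involve, here is a minimal sketch. It is not WPExtractor's actual code. It assumes the data comes from the standard WordPress REST API (the /wp-json/wp/v2/posts and /wp-json/wp/v2/pages routes), and every name in it (DEFAULT_USER_AGENT, build_endpoint, fetch_batch, simplify, save_json) is hypothetical, as is the shape of the saved JSON.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical value; the tool's real default user-agent is not documented here.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; example-extractor/0.1)"


def build_endpoint(site_url, pages=False, page=1, per_page=100):
    """Build a WordPress REST API URL for posts (or pages when pages=True)."""
    base = site_url.rstrip("/")
    # Accept URLs given with or without a trailing /wp-json.
    if base.endswith("/wp-json"):
        base = base[: -len("/wp-json")]
    kind = "pages" if pages else "posts"
    query = urlencode({"per_page": per_page, "page": page})
    return f"{base}/wp-json/wp/v2/{kind}?{query}"


def fetch_batch(url, user_agent=DEFAULT_USER_AGENT):
    """Fetch one batch of items, sending a custom User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def simplify(item):
    """Keep a few commonly used fields from a REST API post or page object."""
    return {
        "id": item.get("id"),
        "link": item.get("link"),
        "title": item.get("title", {}).get("rendered", ""),
        "content": item.get("content", {}).get("rendered", ""),
    }


def save_json(items, path):
    """Write the simplified items to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(items, handle, ensure_ascii=False, indent=2)
```

With a site you are permitted to collect from, the pieces could be combined as save_json([simplify(i) for i in fetch_batch(build_endpoint(site_url))], "posts.json"), where site_url is your own value. Passing pages=True to build_endpoint mirrors the role the -p flag appears to play in the commands above.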

License

WPExtractor is licensed under GPL v3.0.
