Developed to identify broken links on the website of the Gilder Lehrman Institute of American History. It can easily be adapted to search other websites by modifying the domain and start URL parameters.
- A simple Python script that can be launched from the terminal.
- --fname: desired name for the output file
- --number: how many pages should be searched
- fname: A CSV of all pages visited and all the links contained in those pages
- broken_fname: All broken links, each recorded with its origin page, destination page, and anchor text
- Python 3.6+
- Scrapy 2.5.0
- find broken images
- find pages with code fragments showing as text