This repository has been archived by the owner on Jan 14, 2025. It is now read-only.


What is this?


This is my cornerstone project for the University of Michigan. It can be found in the course's honors track.

Professor: Dr. Charles 'Chuck' Severance

Class: Python for Everybody

References:

Coursera: Python UM
Dr. Chuck's Website: www.dr-chuck.com
Free Python Materials: Python for Everybody

Websites used for research:
Dr. Chuck's Projects Website
Google.com

Description: This project was made utilizing the files Dr. Chuck provides in his course; spider.py was written by hand.


Utilization:

  1. Install all dependencies listed in the provided lock file.
  2. Run spider.py.
  3. spider.py

    • Requests via the command line:
      - The URL to be spidered
      - Whether to enable the exception list
      - An exception-list text file (example exception: https://www.google.com/search... skips all Google URLs beginning with google.com/search)
      - Whether to save these settings for easy setup
      - On restart, it asks whether you want to use a new URL or provide an updated exception-list text file
    • Crawls the designated URL, adding newly found URLs to a spider.sqlite DB (the DB is created automatically)
    • Crawls the next URL in the SQLite DB
    • Records the HTML (if found), the error code (if provided), and the number of attempts on the site (if it cannot be accessed, up to a maximum of 3 attempts)
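The crawl bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not Dr. Chuck's actual spider.py: the Pages schema, helper names, and column names are assumptions modeled on the description.

```python
import sqlite3

def open_db(path="spider.sqlite"):
    # Auto-creates the DB and a Pages table on first run
    # (schema is an assumption based on the description above)
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS Pages
        (id INTEGER PRIMARY KEY, url TEXT UNIQUE, html TEXT,
         error INTEGER, attempts INTEGER DEFAULT 0, rank REAL)""")
    return conn

def is_excepted(url, exceptions):
    # Skip any URL that starts with an entry from the exception-list file,
    # e.g. "https://www.google.com/search" skips all Google search URLs
    return any(url.startswith(prefix) for prefix in exceptions)

def queue_url(conn, url, exceptions=()):
    # Newly found URLs are stored once; duplicates are ignored
    if not is_excepted(url, exceptions):
        conn.execute("INSERT OR IGNORE INTO Pages (url) VALUES (?)", (url,))
        conn.commit()

def next_unvisited(conn, max_attempts=3):
    # Pick a page whose HTML has not yet been retrieved and that has
    # not exceeded the retry limit
    row = conn.execute(
        "SELECT url FROM Pages WHERE html IS NULL AND attempts < ? "
        "ORDER BY RANDOM() LIMIT 1", (max_attempts,)).fetchone()
    return row[0] if row else None
```

Storing `html IS NULL` as the "unvisited" marker lets the crawler resume from the same DB after a restart, which matches the resumable behavior described above.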
  4. Run sprank.py
  5. sprank.py

    • Requests via the command line:
      - The number of iterations to use when calculating the ranking of the URLs collected so far (a URL must be visited by the crawler, not just collected)
    • Cycles through the visited sites and ranks each one against all other visited sites in the spider.sqlite DB
    • Adds the ranking to the "rank" column
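One iteration of that ranking step can be illustrated with a simplified PageRank pass. This is a generic sketch under assumptions, not the exact algorithm in sprank.py; the link-map input format is invented for the example.

```python
def pagerank(links, iterations=10):
    # links maps each visited URL to the visited URLs it points to
    ranks = {url: 1.0 for url in links}
    for _ in range(iterations):
        new_ranks = {url: 0.0 for url in links}
        for url, outs in links.items():
            # Only count links to pages the crawler actually visited;
            # a page with no such links shares its rank with everyone
            outs = [o for o in outs if o in links] or list(links)
            share = ranks[url] / len(outs)
            for o in outs:
                new_ranks[o] += share
        ranks = new_ranks
    return ranks
```

Each iteration redistributes every page's rank evenly across its outbound links, so more iterations let the ranks converge; the total rank in the system stays constant.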
  6. Run spjson.py
  7. spjson.py

    • Pulls the rankings and URLs from the DB
    • Creates spider.js for force.html to use for the nodes
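A minimal version of that export step might look like the following. The column names, output variable name, and function name are assumptions for illustration; the real spjson.py may differ.

```python
import json
import sqlite3

def export_for_force(db="spider.sqlite", out="spider.js", limit=50):
    # Pull each ranked URL and write it as a JavaScript variable
    # that force.html can load to build its node graph
    conn = sqlite3.connect(db)
    rows = conn.execute(
        "SELECT url, rank FROM Pages WHERE rank IS NOT NULL "
        "ORDER BY rank DESC LIMIT ?", (limit,)).fetchall()
    conn.close()
    nodes = [{"url": url, "rank": rank} for url, rank in rows]
    with open(out, "w") as fh:
        fh.write("spiderJson = " + json.dumps({"nodes": nodes}, indent=2) + ";\n")
```

Writing the data as a `spiderJson = ...;` assignment (rather than plain JSON) lets force.html include it with a simple script tag and read the variable directly.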
  8. Open force.html in a browser or web engine.



Note: If you are doing the same class/project, please crawl the web and make your own graph. The pictures provided above show what the code does; they are not for use in grading, research, etc.