Commit

Merge pull request #99 from cct-datascience/robotstxt
update robots.txt
Aariq authored Sep 30, 2024
2 parents 262ffed + bff5e76 commit 3f9e319
Showing 1 changed file with 14 additions and 1 deletion.
15 changes: 14 additions & 1 deletion robots.txt
@@ -1,4 +1,6 @@
# source: https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/
# sources:
# https://www.cyberciti.biz/web-developer/block-openai-bard-bing-ai-crawler-bots-using-robots-txt-file/
# https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/

# Data from Common Crawl is used to train ChatGPT, Bard, etc.
User-agent: CCBot
@@ -27,12 +29,23 @@ Disallow: /
User-agent: FacebookBot
Disallow: /

# Anthropic AI (Claude)
User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: ClaudeBot
Disallow: /

# ByteDance's bot for gathering LLM training data
User-agent: Bytespider
Disallow: /

User-agent: ImagesiftBot
Disallow: /

# Takes content and re-writes it using genAI
User-agent: PerplexityBot
Disallow: /
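
One way to sanity-check rules like the ones added above is to parse them with Python's standard-library `urllib.robotparser` and ask whether each crawler may fetch the site root. This is only a sketch under stated assumptions: the `ROBOTS_TXT` string is a hand-copied excerpt of the new entries rather than the full file, and the helper name `blocked_agents`, the placeholder URL `https://example.org/`, and the agent `SomeOtherBot` are hypothetical, introduced for illustration.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the entries added in this commit (not the complete robots.txt).
ROBOTS_TXT = """\
User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.org/"):
    """Map each user agent to True if the rules disallow it from fetching `url`.

    `url` is a placeholder; only its path is matched against Disallow rules.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


# "SomeOtherBot" is a made-up agent that the excerpt does not mention.
result = blocked_agents(
    ROBOTS_TXT,
    ["anthropic-ai", "Claude-Web", "ClaudeBot", "Bytespider",
     "ImagesiftBot", "PerplexityBot", "SomeOtherBot"],
)

for agent, blocked in result.items():
    print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

Because the excerpt has no `User-agent: *` entry, an agent that is not listed is reported as allowed. Note also that robots.txt is advisory: a check like this confirms what the file says, not that a given crawler honors it.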
