pvjulien/word-cloud-podcasts-jan-leike

The two transcripts are from AXRP and 80,000 Hours.

I kept both Jan Leike's answers and the interviewers' questions in the transcripts, because I wanted the word cloud to represent the conversations as a whole.

Cleaning

I cleaned the transcripts as follows:

  • Removed cold open
  • Removed punctuation
  • Removed headings
  • Changed to US spelling in 80,000 Hours transcript:
    • generalize
    • organization
  • Converted to lower case (except OpenAI, because why not)
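The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in this repository: the function name `clean_transcript`, the `UK_TO_US` stem mapping, and the assumption that headings are lines starting with `#` are all hypothetical. Removing the cold open is left out because it is most likely a manual step.

```python
import re
import string

# Hypothetical stem mapping covering the two US-spelling changes listed above.
UK_TO_US = {"generalis": "generaliz", "organis": "organiz"}


def clean_transcript(text: str) -> str:
    """Apply the listed cleaning steps to a raw transcript string."""
    # Drop headings (assumption: headings are lines that start with '#').
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("#")]
    text = " ".join(lines)

    # Remove ASCII punctuation; curly quotes and dashes would need extra handling.
    text = text.translate(str.maketrans("", "", string.punctuation))

    # Lower-case everything, then switch to US spelling.
    text = text.lower()
    for uk, us in UK_TO_US.items():
        text = text.replace(uk, us)

    # Restore the one exception to lower-casing.
    text = re.sub(r"\bopenai\b", "OpenAI", text)

    # Collapse runs of whitespace left behind by the steps above.
    return " ".join(text.split())
```

For example, `clean_transcript("# Intro\nOpenAI models generalise, says the Organisation!")` yields `"OpenAI models generalize says the organization"`.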

Stopwords

I used the standard stopwords from the Python library plus those in the file custom_stopwords.xlsx.
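One way to combine the two stopword sources is sketched below. It is an illustration under assumptions: the helper names are hypothetical, the standard list is passed in by the caller (the README does not say which library it comes from), and `load_custom_stopwords` assumes the spreadsheet holds one stopword per row in its first column with no header row. Reading `.xlsx` files with pandas also requires an Excel engine such as openpyxl.

```python
def load_custom_stopwords(path: str = "custom_stopwords.xlsx") -> set[str]:
    """Read custom stopwords from a spreadsheet (layout is an assumption)."""
    import pandas as pd  # imported lazily so the other helpers work without pandas

    # Assumption: first column, no header row, one stopword per cell.
    column = pd.read_excel(path, header=None)[0]
    return {str(word).strip().lower() for word in column.dropna()}


def combine_stopwords(standard, custom) -> set[str]:
    """Merge the standard and custom stopword lists into one lower-cased set."""
    return {w.lower() for w in standard} | {w.lower() for w in custom}


def remove_stopwords(words, stopwords) -> list[str]:
    """Keep only the words that are not stopwords (case-insensitive match)."""
    return [w for w in words if w.lower() not in stopwords]
```

With a standard list of `{"the"}` and a custom list of `{"um"}`, `remove_stopwords(["the", "alignment", "Um", "model"], ...)` keeps `["alignment", "model"]`.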

Result

Top 75 words from the transcripts. See top-words.txt for the list.
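A top-N word list like this can be produced with a plain frequency count. The sketch below is an assumption about the approach, not the repository's actual code; `top_words` is a hypothetical helper, and it expects words that have already been cleaned and filtered for stopwords.

```python
from collections import Counter


def top_words(words, n: int = 75) -> list[tuple[str, int]]:
    """Return the n most frequent words with their counts, most frequent first."""
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sample = ["ai", "alignment", "ai", "model", "alignment", "ai"]
    for word, count in top_words(sample, n=2):
        print(word, count)
```

If the word cloud is drawn with the `wordcloud` package (an assumption), a dictionary built from these counts can be passed to `WordCloud.generate_from_frequencies`.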

a word cloud

About

Code and notes for a word cloud built from podcast transcripts.
