The two transcripts are from AXRP and 80,000 Hours.
I included not only Jan Leike's answers in the transcripts but also the questions, because I wanted to represent the conversations.
I did some cleaning on the transcripts:
- Removed cold open
- Removed punctuation
- Removed headings
- Changed to US spelling in the 80,000 Hours transcript:
  - generalize
  - organization
- Converted to lower case (except OpenAI, because why not)
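
The cleaning steps above that lend themselves to automation could be sketched roughly as follows. This is a minimal illustration, not the code actually used: the function name `clean_transcript` and the `US_SPELLING` mapping are hypothetical, the British source spellings are assumed from the two US words listed, and removing the cold open and headings is left out because that depends on each transcript's layout.

```python
import string

# Assumed British -> US mapping, inferred from the two words listed above.
US_SPELLING = {
    "generalise": "generalize",
    "organisation": "organization",
}


def clean_transcript(text: str) -> str:
    """Strip punctuation, lowercase everything except OpenAI, use US spelling."""
    # Remove ASCII punctuation only; curly quotes and dashes are not covered.
    text = text.translate(str.maketrans("", "", string.punctuation))

    # Lowercase every word except the exact token "OpenAI".
    words = [w if w == "OpenAI" else w.lower() for w in text.split()]

    # Swap whole-word British spellings; inflected forms are not handled.
    words = [US_SPELLING.get(w, w) for w in words]

    return " ".join(words)


if __name__ == "__main__":
    print(clean_transcript("At OpenAI, the Organisation can generalise."))
```

Note that this sketch only matches whole words, so a form such as "generalised" would pass through unchanged unless it is added to the mapping.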
I used the standard stopwords from the Python library, plus those in the file custom_stopwords.xlsx.
Top 75 words from the transcripts. See top-words.txt for the list.
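
For reference, a word ranking like this can be produced with a simple frequency count after filtering out stopwords. The sketch below is one possible approach under stated assumptions, not necessarily how the list was generated: the function name `top_words` is hypothetical, and the stopwords are passed in as a plain set, since how the library stopwords and custom_stopwords.xlsx are loaded is not shown here.

```python
from collections import Counter


def top_words(text: str, stopwords: set[str], n: int = 75) -> list[tuple[str, int]]:
    """Return the n most frequent words in text, skipping stopwords."""
    # Compare in lowercase so a preserved token like "OpenAI" is still
    # filtered if its lowercase form appears in the stopword set.
    counts = Counter(w for w in text.split() if w.lower() not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "alignment is the alignment of models"
    print(top_words(sample, {"is", "the", "of"}, n=2))
```

The function expects text that has already been cleaned (punctuation removed, lowercased), so that splitting on whitespace yields comparable tokens.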