
an experiment in using word2vec to rephrase familiar book titles for the funniez


dumoulma2/ineffable_abracadabra_of_cleaning


The Ineffable Wizardry of Cleaning

ugh, I can't remember the name of that book about cleaning up, so let's make all the variations by using word2vec to find synonyms for the words in the title

and other books too, obviously.
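Making all the variations boils down to a Cartesian product: each word in the title either stays as-is or gets swapped for one of its candidate synonyms. This is a hypothetical sketch, not the repo's code. The `SYNONYMS` table is hand-written (the real project would pull candidates from word2vec), and the example title is only a guess at the cleaning-up book being riffed on.

```python
from itertools import product

# Hypothetical synonym table. In the real project the candidates would come
# from word2vec nearest neighbours, not from a hand-written dict.
SYNONYMS = {
    "magic": ["wizardry", "abracadabra"],
    "tidying": ["cleaning", "decluttering"],
}


def title_variants(title, synonyms):
    """Return every title made by keeping or swapping each word."""
    choices = []
    for word in title.split():
        options = [word]  # the original word is always an option
        for syn in synonyms.get(word.lower(), []):
            # Match the capitalisation of the original title word.
            options.append(syn.capitalize() if word[:1].isupper() else syn)
        choices.append(options)
    # Cartesian product: pick one option per word position.
    return [" ".join(combo) for combo in product(*choices)]


variants = title_variants("The Life-Changing Magic of Tidying Up", SYNONYMS)
```

Here that yields 3 × 3 = 9 titles, including the original and "The Life-Changing Abracadabra of Cleaning Up". The count multiplies per word, so a long title with many candidates per word blows up fast.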

Natural Language?

This uses word2vec, which is some existence-altering wizardry of its own. Here's a tutorial and explanation, along with an interactive word2vec model: http://radimrehurek.com/2014/02/word2vec-tutorial/
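Roughly, the synonym lookup works like this: word2vec maps each word to a vector, and the "synonyms" of a word are the words whose vectors have the highest cosine similarity to it. The vectors below are made-up 3-dimensional toys, not output from a real model; with a trained gensim model the equivalent call would be along the lines of `model.wv.most_similar("magic", topn=2)`.

```python
import math

# Made-up 3-d toy vectors standing in for a trained word2vec model
# (real models use hundreds of dimensions learned from a corpus).
VECTORS = {
    "magic": [0.9, 0.1, 0.0],
    "wizardry": [0.85, 0.2, 0.05],
    "sorcery": [0.7, 0.3, 0.1],
    "cleaning": [0.05, 0.9, 0.1],
    "tidying": [0.1, 0.85, 0.15],
    "art": [0.1, 0.1, 0.95],
}


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(word, vectors, topn=2):
    """Return the topn (word, similarity) pairs closest to `word`, best first."""
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]
```

With these toy numbers, `most_similar("magic", VECTORS)` ranks `wizardry` first and `sorcery` second, while `cleaning`'s nearest neighbour is `tidying`.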

Some ideas I had, which might be inexplicably contained in unreachable code in the source

other options to increase fidelity to the grammatical structure of the input:

  • use an off-the-shelf stemmer and compare the removed portion (the suffix) of the word and of the synonym by Levenshtein distance
  • pre-parse (by hand) into SimpleNLG structures, then synonymize each word (using an API), then use NLG for conjugation
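The stemmer idea in the first bullet might look something like this. It's an assumption-heavy illustration: `split_suffix` is a deliberately naive, hypothetical suffix stripper standing in for a real off-the-shelf stemmer (such as Porter), and `rank_by_suffix` just orders candidate synonyms so that the ones whose stripped-off ending is closest (by Levenshtein distance) to the original word's ending come first.

```python
# Naive stand-in for a real stemmer; checked in order, longest first where it matters.
SUFFIXES = ("ing", "ed", "es", "s", "ly")


def split_suffix(word):
    """Hypothetical helper: split a word into (stem, removed suffix)."""
    for suf in SUFFIXES:
        # Keep at least 3 letters of stem so short words aren't mangled.
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[:-len(suf)], suf
    return word, ""


def levenshtein(a, b):
    """Classic edit distance using a single rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def rank_by_suffix(word, candidates):
    """Order candidates so those whose suffix best matches `word`'s come first."""
    _, suffix = split_suffix(word)
    return sorted(candidates, key=lambda c: levenshtein(suffix, split_suffix(c)[1]))
```

For example, `rank_by_suffix("tidying", ["clean", "cleaned", "cleaning"])` puts `cleaning` first, since its removed suffix `ing` matches exactly.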

Wordnik has an awesome API that might help with going from stem -> conjugated form:

http://developer.wordnik.com/docs.html#!/word/getRelatedWords_get_4

Future work

Themed book titles: "What if Harry Potter were about artists?"

this requires a list of titles ALONG WITH their topics so we can do something like:

Harry Potter and the Chamber of Secrets minus magic plus art, and see what we get.
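"minus magic plus art" is word2vec's vector-arithmetic (analogy) trick, applied word by word. A minimal sketch, assuming made-up toy vectors and a hypothetical `theme_swap` helper. Roughly like gensim's `most_similar(positive=[word, "art"], negative=["magic"])`, it normalizes the input vectors, adds and subtracts them, and returns the nearest remaining word by cosine similarity.

```python
import math

# Made-up toy vectors: axis 0 ~ "magic-ness", axis 1 ~ "art-ness",
# axis 2 separates people (+) from tools (-). Purely illustrative.
VECTORS = {
    "magic": [1.0, 0.0, 0.0],
    "art": [0.0, 1.0, 0.0],
    "wizard": [0.9, 0.0, 0.4],
    "painter": [0.0, 0.9, 0.4],
    "wand": [0.8, 0.0, -0.5],
    "brush": [0.0, 0.8, -0.5],
}


def _unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def theme_swap(word, minus, plus, vectors):
    """Hypothetical helper: nearest word to word - minus + plus, excluding the inputs."""
    w, m, p = (_unit(vectors[k]) for k in (word, minus, plus))
    target = [a - b + c for a, b, c in zip(w, m, p)]
    candidates = [k for k in vectors if k not in {word, minus, plus}]
    return max(candidates, key=lambda k: _cosine(target, vectors[k]))


for word in ("wizard", "wand"):
    print(word, "->", theme_swap(word, "magic", "art", VECTORS))
```

With these toy numbers, `wizard` comes out as `painter` and `wand` as `brush`. Running each word of a real title through this against a trained model is where the topic list mentioned above would come in.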

Corpora

I'm using a corpus of sentences from The New York Times that I, unfortunately, can't share. Here are some alternatives that'd probably work fine:
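Whatever corpus you use, word2vec training generally wants it as a stream of tokenized sentences. gensim's `Word2Vec`, for instance, accepts any iterable of token lists, but it walks that iterable more than once (once to build the vocabulary, then once per epoch), so a one-shot generator won't do. A minimal sketch, assuming a hypothetical plain-text file with one sentence per line and a crude lowercase regex tokenizer:

```python
import re


class SentenceCorpus:
    """Re-iterable stream of tokenized sentences from a one-sentence-per-line file."""

    # Crude tokenizer: runs of lowercase letters, digits and apostrophes.
    TOKEN = re.compile(r"[a-z0-9']+")

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Re-opening the file on every pass lets training iterate repeatedly.
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                tokens = self.TOKEN.findall(line.lower())
                if tokens:  # skip blank lines
                    yield tokens
```

With gensim 4 installed, this could then be passed as something like `Word2Vec(sentences=SentenceCorpus("sentences.txt"), vector_size=100, window=5, min_count=5)`, where the filename and parameter values are placeholders.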

Languages

  • Python 100.0%