Markov's passwords is a program that generates random passwords composed of strings of pronounceable made-up words. It is based on the XKCD comic "Password Strength", and on the observation from the Computerphile episode "How to Choose a Password" that made-up words are even more resistant to dictionary attacks.
First, the model needs to be trained:
> python3 train.py dict.txt dict.model -l 5
Then we can generate passwords using:
> python3 markovs_passwords.py eng_37k.model
guessi_henedn_xychol_benone
ggiero_hlorag_terpol_jetter
xinemi_quitab_vernic_mialle
rapher_terpro_tedlyw_utioni
denbur_dthway_npliab_icketi
The passwords are generated with a Markov chain of letter frequencies built from the words in a dictionary. The program therefore needs a dictionary file as input, and could theoretically work for multiple languages, provided that they are alphabetic and not logographic (like Chinese). See dict/.
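To illustrate the idea, here is a minimal sketch of sampling a made-up word letter by letter from a lookback-1 frequency table; the table and the helper name sample_word are illustrative only, not the program's actual code.

import random

# Illustrative lookback-1 transition table:
# previous letter -> {next letter: probability}.
freqs = {
    "c": {"a": 1.0},
    "a": {"t": 0.67, "l": 0.33},
    "l": {"l": 0.5, "a": 0.5},
    "t": {"e": 1.0},
}

def sample_word(freqs, start, max_len):
    # Repeatedly draw the next letter from the distribution conditioned
    # on the previous letter, stopping at max_len or at a dead end.
    word = start
    while len(word) < max_len:
        choices = freqs.get(word[-1])
        if not choices:
            break
        letters = list(choices)
        weights = [choices[c] for c in letters]
        word += random.choices(letters, weights=weights)[0]
    return word

print(sample_word(freqs, "c", 6))  # e.g. "cate" or "callat"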
Model files are encoded using the msgpack format. They correspond to the following JSON format.
<model> := {"lookback" : <lookback>, "charSet" : <charSet>, "models" : <submodels>}
<charSet> := [<char>, ...]
<submodels> := [<submodel>, ...]
<submodel> := {"lookback" : <lookback>,
"freqs" : {<char> : {<char> : <freq>, ...}, ...}
Here's an example for a model trained on "cat", "call", and "late", with a lookback of 2.
{'lookback': 2,
'charSet': ['e', 't', 'c', 'a', 'l'],
'models': [{'lookback': 1,
'freqs': {'c': {'a': 1.0},
'a': {'t': 0.6666666666666666, 'l': 0.3333333333333333},
'l': {'l': 0.5, 'a': 0.5},
't': {'e': 1.0}}},
{'lookback': 2,
'freqs': {'ca': {'t': 0.5, 'l': 0.5},
'al': {'l': 1.0},
'la': {'t': 1.0},
'at': {'e': 1.0}}}]}
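The lookback-1 frequencies above can be reproduced by counting letter transitions across the training words and normalising each row. A minimal sketch (build_freqs is an illustrative helper, not the program's actual training code):

from collections import Counter, defaultdict

def build_freqs(words, lookback):
    # Count how often each `lookback`-letter context is followed by each
    # letter, then normalise the counts into probabilities.
    counts = defaultdict(Counter)
    for word in words:
        for i in range(lookback, len(word)):
            counts[word[i - lookback:i]][word[i]] += 1
    return {ctx: {ch: n / sum(c.values()) for ch, n in c.items()}
            for ctx, c in counts.items()}

print(build_freqs(["cat", "call", "late"], 1))
# {'c': {'a': 1.0}, 'a': {'t': 0.666..., 'l': 0.333...},
#  'l': {'l': 0.5, 'a': 0.5}, 't': {'e': 1.0}}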
Hidden Markov Model
An HMM implementation can be found in the hidden branch. The HMM needs to be modified to handle words better, as it currently performs worse than the Markov chain model described above.
Contributions are welcome and can be made via pull requests. Low-hanging fruit includes adding dictionaries for more languages. Ideally, a dictionary should contain the 1000-4000 most common words in the language (the exact number depends on the language). This ensures that the produced words are still somewhat recognizable, and aren't perturbed by words that native speakers don't use often enough to recognize.