LDA - Latent Dirichlet Analysis

An Azure-ready implementation of LDA in Python using Scikit.

General Description

Topic Modelling is a statistical method that attempts to identify common topics in large bodies of text.

It is an unsupervised method which means that it doesn't require any labelled data. It does however require a little bit of interpretation of the results. The technique is able to identify word groups that are commonly found together but it is not able to label them itself.

Example output (from a 100,000 customer survey corpus):

'great', 'easy’, 'problem’
'good', 'service', 'customer’
'use’, 'account', 'card’
'products', 'like', 'love’
'website', 'problems', 'order’,
'did', 'phone', 'new’,
'store', 'went’, 'didn’t’,
'food’, 'nice', 'shop’
'online', 'time', 'bought’,
'staff', 'helpful', 'friendly'

It's up to the user to interpret what the topics actually are but hopefully it can be seen that some obvious candidates easily suggest themselves: 'problems using a website', "good customer service", "helpful staff" and so on.

Implementation

The LDA implementation is behind a Flask-based HTTP host. A json representation of the data, as a list of strings, is first written to Azure Blob storage and a request on the webserver made detailing the location of the new data. The server responds immediately with a UUID which references the job and creates a corresponding row in Azure Table storage tracking the job's progress.

When complete the Table row status is updated. Clients wishing to track the job status can simply poll the relevant Table row.

Deployment

Details about deploying a Flask-based web app to Azure can be found here: https://docs.microsoft.com/en-us/azure/app-service/app-service-web-get-started-python

-- end

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt
virtualenv_proxy.py		virtualenv_proxy.py
web.2.7.config		web.2.7.config
web.3.4.config		web.3.4.config

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LDA - Latent Dirichlet Analysis

General Description

Implementation

Deployment

About

Releases

Packages

Languages

tobybrad/lda

Folders and files

Latest commit

History

Repository files navigation

LDA - Latent Dirichlet Analysis

General Description

Implementation

Deployment

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages