markov_engine.py:141: RuntimeWarning: invalid value encountered in true_divide p_values = distance_magnitudes / sums #40

Open
Panics11 opened this issue Aug 21, 2018 · 10 comments

@Panics11

Panics11 commented Aug 21, 2018

This warning appears while the bot is running on Discord and replying to people.
After that, it only answers sometimes.
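For context, the warning in the title is what NumPy emits when an element-wise division computes 0/0, which yields nan. A plausible cause (an assumption; the report only shows the failing line) is that every entry of distance_magnitudes is zero, so sums is zero too. The sketch below reproduces the warning and shows one hedged way to guard against it. The helper name to_probabilities, the 1-D input, and the uniform fallback are illustrative choices, not armchair-expert's actual fix.

```python
import warnings

import numpy as np


def to_probabilities(distance_magnitudes):
    """Normalize non-negative weights into probabilities without 0/0.

    Hypothetical helper: only the names distance_magnitudes and sums come
    from the warning; the 1-D input and uniform fallback are assumptions.
    """
    distance_magnitudes = np.asarray(distance_magnitudes, dtype=float)
    if distance_magnitudes.size == 0:
        return distance_magnitudes
    sums = distance_magnitudes.sum()
    if sums == 0 or not np.isfinite(sums):
        # Dividing by a zero sum would give nan (the "invalid value" warning),
        # so fall back to a uniform distribution that can still be sampled.
        return np.full(distance_magnitudes.shape, 1.0 / distance_magnitudes.size)
    return distance_magnitudes / sums


# Reproduce the warning: all-zero weights divided by their (zero) sum.
zeros = np.zeros(3)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    naive = zeros / zeros.sum()

print(naive)                    # [nan nan nan]
print(to_probabilities(zeros))  # [0.33333333 0.33333333 0.33333333]
```

A nan probability vector cannot be sampled from, which is one way a warning like this could turn into a skipped reply rather than a crash.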

@csvance
Owner

csvance commented Aug 21, 2018

The warning is harmless, but I do need to fix it. The bot responds when it can successfully generate a sentence based on a subject word it selects from your message, using a semi-random process with multiple attempts. Chances are its dataset is still too small and it needs more data. What have you trained it with so far?
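The semi-random process with multiple attempts described here also explains why the bot only answers sometimes: if no attempt produces a sentence, it stays silent. A minimal sketch of that control flow, assuming a hypothetical generate(subject) callable that returns a sentence or None. The function name reply, the max_attempts default, and the way candidate subjects are picked are illustrative assumptions, not the bot's real code.

```python
import random
from typing import Callable, Optional, Set


def reply(message: str,
          known_words: Set[str],
          generate: Callable[[str], Optional[str]],
          max_attempts: int = 5,
          rng: Optional[random.Random] = None) -> Optional[str]:
    """Hypothetical sketch: try several subject words, give up if all fail."""
    rng = rng or random.Random()
    # Candidate subjects are words from the message the model has already seen.
    candidates = [w for w in message.lower().split() if w in known_words]
    if not candidates:
        return None  # nothing to build a sentence around: stay silent
    for _ in range(max_attempts):
        subject = rng.choice(candidates)  # semi-random subject selection
        sentence = generate(subject)
        if sentence:
            return sentence
    return None  # every attempt failed: the bot does not answer


def toy_generate(subject: str) -> Optional[str]:
    # Stand-in generator that only "knows" how to talk about one subject.
    return "python is fun" if subject == "python" else None


print(reply("I love Python", {"python"}, toy_generate))  # python is fun
print(reply("hello there", {"python"}, toy_generate))    # None
```

With a larger dataset, more subject words are known and more attempts succeed, which is consistent with the advice to give the bot more data.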

@Panics11
Author

Panics11 commented Aug 21, 2018

I train it on 3 Discord servers with more than 15k people on them :/
I didn't find a better solution because I don't understand what kind of training data I could give the script..
(my Python is somewhat rusty..)

If you could tell me what kind of data I could feed the bot, I will do it.

EDIT:// I forgot to ask: how does the bot store data from Discord?
I find that the responses make more and more sense, but I couldn't find any database that is growing..

EDIT2:// Actually, the bot runs on a server with 64GB of RAM, so I think it should be possible to feed a lot of training data to it, or am I wrong?

Oh, btw, I hope my English isn't as bad as I think, because I am from Germany and my English isn't the best :/

@csvance
Owner

csvance commented Aug 21, 2018

Important metrics are the number of lines processed and the number of words processed. Currently the bot only trains when restarted, and the sentence structure model only trains once, initially, unless the bot is started with --retrain-structure.

Basically, the normal use case is to train the bot with 10,000-30,000+ lines of text initially. You can either let the bot run for a while and collect them, or import them using a script like scripts/import_text_file.py. Then restart the bot with these arguments: python3 armchair-expert.py --retrain-structure
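Since lines and words processed are the metrics that matter, it can help to check a corpus before importing it. A rough counter, assuming plain text with one message per line and whitespace-separated words (the bot itself may tokenize differently, for example via spaCy). MIN_LINES is simply the low end of the range quoted above.

```python
from typing import Iterable, Tuple

# Lower end of the 10,000-30,000+ line range suggested in this thread.
MIN_LINES = 10_000


def corpus_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (non-empty line count, whitespace-separated word count)."""
    line_count = 0
    word_count = 0
    for line in lines:
        words = line.split()
        if words:  # skip blank lines
            line_count += 1
            word_count += len(words)
    return line_count, word_count


sample = ["hello there", "", "this is a longer line of text"]
n_lines, n_words = corpus_stats(sample)
print(n_lines, n_words)                                        # 2 9
print("enough for initial training:", n_lines >= MIN_LINES)    # enough for initial training: False
```

For a real file, pass the open file object directly, e.g. `with open("training.txt", encoding="utf-8") as f: corpus_stats(f)` (the filename is hypothetical).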

The quality of the output has a lot to do with what you feed the bot. I find that if you feed it only Discord messages, the majority of which are just one or two words, it can cripple the bot's idea of sentence structure, making it constantly return short messages similar to this.
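One way to act on this advice is to drop very short messages before importing them. A hedged sketch: the five-word threshold (MIN_WORDS) and the helper filter_short_lines are hypothetical, and whether scripts/import_text_file.py expects one message per line is an assumption worth checking in the script itself.

```python
from typing import Iterable, List

MIN_WORDS = 5  # hypothetical threshold; tune it for your own data


def filter_short_lines(lines: Iterable[str], min_words: int = MIN_WORDS) -> List[str]:
    """Keep only messages with at least `min_words` whitespace-separated words."""
    kept = []
    for line in lines:
        line = line.strip()
        if len(line.split()) >= min_words:
            kept.append(line)
    return kept


messages = ["lol", "ok thanks", "The build fails because the linker cannot find libfoo."]
print(filter_short_lines(messages))
# ['The build fails because the linker cannot find libfoo.']
```

The trade-off is less data overall in exchange for examples that carry real sentence structure, which fits the note that long text is best.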

Performance-wise, the bot will do fine with 2GB of free RAM. It's CPU-intensive when restarting, learning words, and generating text. If a GPU is available, it is GPU-intensive when training its sentence structure model.

@Panics11
Author

The bot learns from developer Discords, where mostly long texts are written.
So I could feed the bot any basic text?

I start the bot every time with --retrain-structure to make sure that every new entry is processed correctly.

I think the server has more than enough GPU horsepower with 4 Quadro cards inside :D

@csvance
Owner

csvance commented Aug 21, 2018

Long text is best.

It sounds to me like you just need to allow the bot more time to acquire more lines of text, and then trigger its training process.

One other thing: is the learned text in English or another language? By default the bot uses an English spaCy dataset, but other ones are available that should work better for other languages. The bot needs this because it has to know where parts of speech like nouns, verbs, and adjectives appear in the sentence.
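To illustrate why a language-specific model matters: if the bot learns sentence structure as a sequence of part-of-speech tags (a simplifying assumption, not a description of armchair-expert's actual model), then an English and a German sentence with the same grammar reduce to the same sequence, but only if each is tagged by a model for its own language. The tags below are hand-written Universal POS tags (the tag set spaCy exposes as Token.pos_), standing in for what a spaCy model such as en_core_web_sm or de_core_news_sm would produce.

```python
from typing import List, Tuple

# Hand-written (word, tag) pairs; in the bot, a spaCy model would assign tags.
TaggedSentence = List[Tuple[str, str]]


def structure(tagged: TaggedSentence) -> List[str]:
    """Reduce a tagged sentence to its part-of-speech sequence."""
    return [tag for _, tag in tagged]


en = [("the", "DET"), ("bot", "NOUN"), ("learns", "VERB"), ("words", "NOUN")]
de = [("der", "DET"), ("Bot", "NOUN"), ("lernt", "VERB"), ("Wörter", "NOUN")]

print(structure(en))                   # ['DET', 'NOUN', 'VERB', 'NOUN']
print(structure(en) == structure(de))  # True
```

An English model applied to German text would tend to mis-tag words like "lernt", which would feed the structure model the wrong sequences.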

@Panics11
Author

I train the bot in English and German. I installed the German spaCy dataset as well and fed it two small text files as input, one in German and one in English.

I just wanted to ask if I could feed the bot more data to get better learning results.
The bot was started for the first time a couple of days ago and runs continuously apart from a restart once per day.

And I wanted to mention the warning to you because I didn't know if you were aware of it ^,.,^

Another question: is it okay if I use the base model you built as a starting reference to build this into a game I am currently working on? (you will be credited as the developer of the bot, of course)

@csvance
Owner

csvance commented Aug 21, 2018

You are free to do basically whatever you want with it (just follow terms of the MIT License).

Not sure how computationally practical or efficient my model is (I specifically tried to push the boundaries), but it would be interesting to see if it could contribute to your efforts.

@Panics11
Author

One last question: the bot has absolutely no voice interface, or am I missing something?

From what I see, the bot is absolutely solid and would be a great extension for my game AI, which learns from every step a human player takes against her.

Btw, it will be an open-world survival shooter, so it would be funny to implement a chatbot :D

@csvance
Owner

csvance commented Aug 21, 2018

There is no voice interface currently, and none is planned in the short to mid term. I'm not ruling it out long term, though; I just have other development priorities right now.

When I get time, I want to find a way to make training happen without restarting again. The old version's output wasn't nearly as good, but it would learn new words instantly, without a restart.
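One plausible reason a simpler model can learn instantly: a plain word-transition table can be updated in place with each new message, whereas a trained sentence-structure model typically needs a separate training pass. A toy sketch of the former, assuming a simple bigram Markov chain; the class OnlineBigramModel is hypothetical and is neither the old nor the current implementation.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Optional


class OnlineBigramModel:
    """Toy word-level Markov chain that learns from each message immediately."""

    def __init__(self) -> None:
        # transitions[word][next_word] = how often next_word followed word
        self.transitions: Dict[str, Counter] = defaultdict(Counter)

    def learn(self, message: str) -> None:
        # Updating counts is cheap, so this can run on every incoming message.
        words = message.split()
        for current, nxt in zip(words, words[1:]):
            self.transitions[current][nxt] += 1

    def next_word(self, word: str, rng: random.Random) -> Optional[str]:
        followers = self.transitions.get(word)  # .get avoids creating entries
        if not followers:
            return None
        choices = list(followers)
        weights = [followers[w] for w in choices]
        return rng.choices(choices, weights=weights, k=1)[0]


model = OnlineBigramModel()
model.learn("the bot learns new words")
print(model.next_word("learns", random.Random(0)))  # new
model.learn("the bot answers")  # learned immediately, no restart needed
print(sorted(model.transitions["bot"]))             # ['answers', 'learns']
```

The catch, as noted above, is output quality: counts like these capture local word order but not the longer-range sentence structure the newer model learns.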

Good luck with your game, a chatbot in-game does indeed sound like it could be integrated in an interesting or funny way.

@Panics11
Author

It should be trained to be creepy ;)
I just wanted to ask in case I was missing something.. I will fork the repo and let some of my programmers do the voice interface.

Thank you for your time :)

2 participants