Can we use our own data set to train the models and predict our own test set? #14
You can; just make sure the input format remains the same.
@Sshanu Thanks so much for your quick response! Please allow me to ask one more question. Since I am using word embeddings trained on my specific corpus instead of your provided GloVe embeddings, how can I get my embeddings into the same format as the GloVe embedding file in your data folder and use them in your LSTM model? All the best!
I first extracted the words and stored them in a list named vocab, then extracted the word embeddings and stored them in a numpy array. If the 2nd word in vocab is "the", then the 2nd row of the numpy array holds the embedding for "the". I then saved both vocab and the embedding array using pickle. So you can either create a similar array and vocab, or change the code to load your embeddings.
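A minimal sketch of that layout (the pickle filenames here are hypothetical; match whatever names the repo's loading code actually expects):
```python
import pickle

# Hypothetical filenames; use whatever the loading code expects.
with open("vocab.pkl", "rb") as f:
    vocab = pickle.load(f)            # list of words, e.g. ["<unk>", "the", ...]
with open("word_embedding.pkl", "rb") as f:
    word_embedding = pickle.load(f)   # numpy array of shape (len(vocab), dim)

# The invariant: row i of the array is the embedding of vocab[i].
idx = vocab.index("the")
vector = word_embedding[idx]
print(vector.shape)                   # (dim,)
```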
@xinxu1018 That is so informative! Thanks a lot!
@Sshanu It works! Thanks a lot! Can I ask a follow-up question? If I want to classify relations between multi-word terms (in your case the pairs are one-word terms), how should I preprocess the sentences before the dependency-path extraction step? Do you have any suggestions? One approach I am considering is to join the words within a multi-word term with underscores (e.g. "system configuration" becomes "system_configuration"), treat it as a one-word term, and then follow your procedure. Not sure if it will work. Do you have any ideas? Many thanks!
You can try that approach, but its shortcoming is that you won't have a word embedding for the joined multi-word term. An alternative: build the dependency tree, then pick as the entity whichever of the two words sits below the other in the tree; the information about the other word will still be computed by the LSTM-tree. Then, instead of using only the features of the LCA, entity1, and entity2 from the LSTM-tree for relation classification, also use the LSTM-tree features of the other word.
I did not work on relation classification or any related field after this project; it was my first project in NLP, which is why my knowledge of relation classification and extraction is limited.
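A rough sketch of picking the lower of two words from the dependency tree (spaCy is used here purely for illustration, not because the project uses it; the sentence and token indices are made up):
```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The system configuration controls the runtime behaviour.")

# Hypothetical two-word term: tokens 1 and 2 ("system configuration").
w1, w2 = doc[1], doc[2]

# The word whose ancestors include the other word sits lower in the tree;
# use it as the entity and keep the other word's LSTM-tree features too.
if w2 in w1.ancestors:
    entity, other = w1, w2
else:
    entity, other = w2, w1

print(f"entity: {entity.text}, extra-feature word: {other.text}")
```
Here "system" is a compound dependent of "configuration", so "system" would be chosen as the entity and "configuration" would contribute the extra features.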
@Sshanu Thanks a lot! Hope everything goes well for you!
@Sshanu Could you please provide your word_embd_wiki file? I cannot find the embedding file in the data folder you provided. Thanks for your help! Best,
My Google Drive is full; please share a folder with me, and I will upload the word_embed file there.
@Sshanu How can I share a folder with you? What's your address? Sorry, I am new here!
@Sshanu Hi Sshanu, I just shared a Google Drive folder to the email you provided in your GitHub profile. Not sure if I am doing it right! Many thanks!
Oh, please share it with [email protected]
@Sshanu I have shared it to your Gmail. Please check, and many thanks!
@Sshanu Hi Sshanu, got your shared file! You helped me a lot! I am just wondering, do you have the original code that was used to split the embedding file into the vocab and word_embedding arrays? Then I can convert my own trained embeddings into the format your method expects. Could you please share that code? Thanks again!
I don't have that file. If you have trouble generating the exact same file, simply store the embeddings in a numpy array and the vocabulary in a list; my code will work afterwards.
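A sketch of doing that for embeddings stored in the standard GloVe text format (one "word v1 v2 … vd" line per word); the file names below are placeholders, not the repo's actual ones:
```python
import pickle
import numpy as np

# Placeholder path to your own embeddings in GloVe text format.
embedding_path = "my_embeddings.txt"

vocab, vectors = [], []
with open(embedding_path, encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        vocab.append(parts[0])
        vectors.append(np.asarray(parts[1:], dtype=np.float32))

# Row i of the matrix corresponds to vocab[i].
word_embedding = np.stack(vectors)    # shape (len(vocab), dim)

# Placeholder output names; match whatever the training code loads.
with open("vocab.pkl", "wb") as f:
    pickle.dump(vocab, f)
with open("word_embedding.pkl", "wb") as f:
    pickle.dump(word_embedding, f)
```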