custom delft train args #469
base: master
Conversation
@de-code so this should run the training using DeLFT directly from GROBID?
Yes, it would be trained like the Wapiti model. Although I have deferred trying to run it that way, not least because I would then need to create a GPU version of the Docker container to run in the cloud. Since I am just using a single training dataset at present, it's okay to do that manually (I still need to run the GROBID training command to get the converted training data).
This PR works in Grobid and on a submodule 👍
Force-pushed from 37f2831 to cdd63c7
Hi @lfoppiano, I rebased as requested.
BTW, I would recommend always squash-merging PRs.
I wonder, what if we made this an optional method for training the DeLFT model? For me personally, the alternative is to poke at the files in the tmp directory and run the training manually, which is also not ideal.
Just trying to go through some old GitHub notifications. Not sure if it was addressed to me, but there seems to be some agreement that a way to train a DeLFT model without GROBID can be beneficial. For training via GROBID, as you mentioned, JEP is less reliable. Maybe training via the command has the additional advantage that it is more transparent, simpler (given the tool already exists), and could be reproduced separately (via the CLI). I guess for training and evaluation there shouldn't be any noticeable performance difference.
I would parametrize this change via configuration and add a new CLI function in grobid to generate the training data automatically so that we provide multiple ways of training the models. |
The new CLI is done: https://github.com/lfoppiano/grobid-superconductors/blob/master/src/main/java/org/grobid/service/command/PrepareDelftTraining.java Since it's in a (yet) private repository, here is a (not updated) version: https://gist.github.com/lfoppiano/dd28365a83d7aaf9b459ec0ef846ea34
Force-pushed from 9d3bce4 to 3b83725
I was resolving conflicts after the addition of the architecture flag. |
Allows custom args to be passed in to the training operation.

It adds the following optional configuration options:

- `grobid.delft.train.module`: the training module (default is `grobidTagger.py`, as per `DeLFTModel`)
- `grobid.delft.train.args`: additional arguments to be passed to the training module. To keep it simple, the value is split on spaces (which could be improved, should shell parsing be required in the future).

This is also useful for configuring the current training.
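The space-splitting described above can be sketched as follows (the class and method names here are hypothetical illustrations, not the actual PR code): the configured value is trimmed, split on runs of whitespace, and appended to the command list, with no quoting or shell parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrainArgsSplitter {
    // Split the configured args value on whitespace; empty or null yields no extra args.
    public static List<String> splitArgs(String configuredArgs) {
        if (configuredArgs == null || configuredArgs.trim().isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(configuredArgs.trim().split("\\s+")));
    }

    public static void main(String[] args) {
        // Hypothetical base command; the extra args are simply appended.
        List<String> command = new ArrayList<>(
                Arrays.asList("python3", "grobidTagger.py", "header", "train"));
        command.addAll(splitArgs("--architecture BidLSTM_CRF --use-ELMo"));
        System.out.println(command);
    }
}
```

A value containing quoted arguments with embedded spaces would be split incorrectly, which is the limitation the PR description acknowledges.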
The `useELMo` flag didn't actually work for training because the `command` list was immutable.

/cc @kermitt2 @lfoppiano
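A minimal sketch of the immutability issue mentioned above (the names are illustrative, not the actual GROBID code): `Arrays.asList` returns a fixed-size view of its backing array, so appending an optional flag throws `UnsupportedOperationException`; copying into an `ArrayList` makes the command list mutable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommandList {
    // Build a command list that can be extended with optional flags.
    public static List<String> baseCommand() {
        // new ArrayList<>(...) copies into a mutable list;
        // a bare Arrays.asList(...) would be fixed-size.
        return new ArrayList<>(Arrays.asList("python3", "grobidTagger.py"));
    }

    public static void main(String[] args) {
        List<String> fixedSize = Arrays.asList("python3", "grobidTagger.py");
        try {
            fixedSize.add("--use-ELMo"); // throws: Arrays.asList is fixed-size
        } catch (UnsupportedOperationException e) {
            System.out.println("fixed-size list rejected add()");
        }

        List<String> command = baseCommand();
        command.add("--use-ELMo"); // works on the mutable copy
        System.out.println(command);
    }
}
```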