Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to generate LM image and GC via two separate jobs #446

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

NeoLegends
Copy link
Contributor

@NeoLegends NeoLegends commented Aug 21, 2023

Closes #430
Closes #514

Now testing...

@NeoLegends NeoLegends added the enhancement New feature or request label Aug 21, 2023
@NeoLegends NeoLegends self-assigned this Aug 21, 2023
@NeoLegends NeoLegends marked this pull request as draft August 21, 2023 16:34
@NeoLegends NeoLegends force-pushed the feat/separate-lmi-gc-generation branch from 6b40167 to b74c654 Compare August 21, 2023 16:39
recognition/advanced_tree_search.py Outdated Show resolved Hide resolved
recognition/advanced_tree_search.py Show resolved Hide resolved
if separate_lmi_gc_generation:
gc = BuildGlobalCacheJob(crp, extra_config, extra_post_config).out_global_cache

arpa_lms = AdvancedTreeSearchLmImageAndGlobalCacheJob.find_arpa_lms(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe find_arpa_lms can be moved from a classmethod to a standalone method in the lm module?

@NeoLegends NeoLegends requested a review from michelwi August 29, 2023 14:44
@NeoLegends NeoLegends marked this pull request as ready for review August 29, 2023 14:49
@NeoLegends
Copy link
Contributor Author

NeoLegends commented Aug 29, 2023

I just discovered during testing the LM image and GC are set on the post config, not in the normal config. This means splitting the LMGC-Job into two does not change the hash of any existing jobs.

Therefore, I think this flag can be enabled by default, or rather, we can in-line the flag and make it the new default! WDYT?

To finish testing I need to run a decoding w/ the new GC and new LM, but otherwise the stuff here is now tested. This is what it looks like when the flag is set in a pipeline w/ thousands of jobs (note due to the GC not being hashed only a few GC-jobs are even run):

[2023-08-29 17:30:46,918] INFO: Finished updating job states
[2023-08-29 17:30:46,934] INFO: Experiment directory: /u/mgunz/setups/2023-08--subsampling-new      Call: /u/mgunz/src/sisyphus/sis m -r -io
[2023-08-29 17:30:47,002] INFO: runnable: Job<work/i6_core/lm/lm_image/CreateLmImageJob.gNAo29L3jB0Y>
[2023-08-29 17:30:47,002] INFO: runnable: Job<work/i6_core/recognition/advanced_tree_search/BuildGlobalCacheJob.9kZVSb7jUVMk>
[2023-08-29 17:30:47,002] INFO: runnable: Job<work/i6_core/recognition/advanced_tree_search/BuildGlobalCacheJob.VesmsCzSguWG>
[2023-08-29 17:30:47,002] INFO: runnable(3) running(8) waiting(878)

@NeoLegends
Copy link
Contributor Author

Apparently switching the default here changes hashes at AppTek. In case it is possible to live w/ the hash breakage (e.g. because it is in unused parts of the code) I'd like the flag to be on by default so as many folks as possible can profit from the changes here. If the hash breakage is unacceptable we leave it off by default of course.

@NeoLegends NeoLegends changed the title feat: Add option to generate LM image and GC via two separate jobs Add option to generate LM image and GC via two separate jobs Aug 29, 2023
@michelwi
Copy link
Contributor

I just discovered during testing the LM image and GC are set on the post config, not in the normal config. This means splitting the LMGC-Job into two does not change the hash of any existing jobs.

*Any existing search jobs. The graph however is changed as in all of the LMGC Jobs are now removed and separate LM and GC Jobs are added. This is caught in the pipeline.

In case it is possible to live w/ the hash breakage (e.g. because it is in unused parts of the code) I'd like the flag to be on by default so as many folks as possible can profit from the changes here. If the hash breakage is unacceptable we leave it off by default of course.

Unfortunately all parts that are tested in the pipeline are used.

lm/lm_image.py Show resolved Hide resolved
rasr/crp.py Show resolved Hide resolved
recognition/advanced_tree_search.py Show resolved Hide resolved
recognition/advanced_tree_search.py Show resolved Hide resolved
@NeoLegends
Copy link
Contributor Author

I have successfully run recognitions w/ this setup. This is tested now.

recognition/advanced_tree_search.py Outdated Show resolved Hide resolved
recognition/advanced_tree_search.py Outdated Show resolved Hide resolved
@@ -190,6 +173,7 @@ def __init__(
:param lmgc_mem: Memory requirement for the AdvancedTreeSearchLmImageAndGlobalCacheJob
:param lmgc_alias: Alias for the AdvancedTreeSearchLmImageAndGlobalCacheJob
:param lmgc_scorer: Dummy scorer for the AdvancedTreeSearchLmImageAndGlobalCacheJob which is required but unused
:param separate_lm_image_gc_generation: Whether to generate the LM image and the global cache via two separate jobs for a more stable hash. Whether or not this flag is set is not part of the hash, so using separate jobs is the default.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a so NOT using separate jobs is the default?

@NeoLegends
Copy link
Contributor Author

AppTek hash broke after changing only a comment, did the AppTek pipeline change in the meantime?

@Marvin84
Copy link
Contributor

Marvin84 commented Nov 6, 2023

AppTek hash broke after changing only a comment, did the AppTek pipeline change in the meantime?

@curufinwe @michelwi @christophmluscher any idea?

@michelwi
Copy link
Contributor

michelwi commented Nov 8, 2023

AppTek hash broke

We now do require the changes that were introduced in #455 and your PR is so old it does not include them. If you rebase to the current main all should be fine. Thanks.

lm/util.py Show resolved Hide resolved
@Marvin84
Copy link
Contributor

@curufinwe @michelwi can we merge this?

@michelwi
Copy link
Contributor

@curufinwe @michelwi can we merge this?

Since the apptek test passes, I see no objections from our site.

I'll dismiss my old review, but I currently don't have much time to re-review it.. sorry.

@michelwi michelwi dismissed their stale review November 14, 2023 19:11

No time to follow up on it.

import i6_core.rasr as rasr


def _has_image(c: rasr.RasrConfig, pc: rasr.RasrConfig):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would either like to spell config and post_config out or have a small docstring here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @Atticus1806

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @Atticus1806

return res


def find_arpa_lms(lm_config: rasr.RasrConfig, lm_post_config=None) -> List[Tuple[rasr.RasrConfig, rasr.RasrConfig]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the post config also rasr.RasrConfig? Or some other type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def find_arpa_lms(lm_config: rasr.RasrConfig, lm_post_config=None) -> List[Tuple[rasr.RasrConfig, rasr.RasrConfig]]:
def find_arpa_lms(lm_config: rasr.RasrConfig, lm_post_config: Optional[rasr.RasrConfig] = None) -> List[Tuple[rasr.RasrConfig, rasr.RasrConfig]]:

yes, post configs should also be of type RasrConfig
TODO: add Optional import above and check with black for line length :P

@@ -206,6 +190,8 @@ def __init__(
self.config,
self.post_config,
self.lm_gc_job,
self.gc_job,
self.lm_image_jobs,
) = AdvancedTreeSearchJob.create_config(**kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is .create_config only called within this job? Otherwise we need to be careful with changing the returned variables. But I think you caught the cases here and otherwise the change is easy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the AdvancedTreeSearchWithRescoringJob below in this file inherits from AdvancedTreeSearchJob and uses super().create_config. that should be fixed.

if lmgc_alias is not None:
lm_gc.add_alias(lmgc_alias)
lm_gc.rqmt["mem"] = lmgc_mem
def add_lm_config_to_crp(crp, lm_config):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def add_lm_config_to_crp(crp, lm_config):
def add_lm_config_to_crp(crp: rasr.CommonRasrParameters, lm_config: rasr.RasrConfig):

recognition/advanced_tree_search.py Show resolved Hide resolved
config.flf_lattice_tool.network.recognizer.lm,
post_config.flf_lattice_tool.network.recognizer.lm,
)
for i, lm_config in enumerate(arpa_lms):
lm_config[1].image = lm_gc.out_lm_images[i + 1]
for i, (_lm_config, lm_post_config) in enumerate(arpa_lms):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

(i + 1): lm.CreateLmImageJob(
add_lm_config_to_crp(crp, lm_config), extra_config=extra_config, extra_post_config=extra_post_config
)
for i, (lm_config, _lm_post_config) in enumerate(arpa_lms)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for the _. Maybe I am overlooking something in the web view. If its not used you could just fully replace it with _

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the `post_config here is unused, because

  • both config and post_confic are extracted from the crp
  • only the config is modified
  • the crp with the old post_config is still being used

+1 to can be just _ to make clear it is unused

Suggested change
for i, (lm_config, _lm_post_config) in enumerate(arpa_lms)
for i, (lm_config, _) in enumerate(arpa_lms)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the `post_config here is unused, because

  • both config and post_confic are extracted from the crp
  • only the config is modified
  • the crp with the old post_config is still being used

+1 to can be just _ to make clear it is unused

Suggested change
for i, (lm_config, _lm_post_config) in enumerate(arpa_lms)
for i, (lm_config, _) in enumerate(arpa_lms)

recognition/advanced_tree_search.py Show resolved Hide resolved
@Atticus1806
Copy link
Contributor

can we merge this?

With two approves yes :). I just got a few comments then I can approve, but please ask someone else to also review.

Copy link
Contributor

@michelwi michelwi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think many open comments should be addressed before merging.

import i6_core.rasr as rasr


def _has_image(c: rasr.RasrConfig, pc: rasr.RasrConfig):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @Atticus1806

return res


def find_arpa_lms(lm_config: rasr.RasrConfig, lm_post_config=None) -> List[Tuple[rasr.RasrConfig, rasr.RasrConfig]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def find_arpa_lms(lm_config: rasr.RasrConfig, lm_post_config=None) -> List[Tuple[rasr.RasrConfig, rasr.RasrConfig]]:
def find_arpa_lms(lm_config: rasr.RasrConfig, lm_post_config: Optional[rasr.RasrConfig] = None) -> List[Tuple[rasr.RasrConfig, rasr.RasrConfig]]:

yes, post configs should also be of type RasrConfig
TODO: add Optional import above and check with black for line length :P

recognition/advanced_tree_search.py Show resolved Hide resolved
(i + 1): lm.CreateLmImageJob(
add_lm_config_to_crp(crp, lm_config), extra_config=extra_config, extra_post_config=extra_post_config
)
for i, (lm_config, _lm_post_config) in enumerate(arpa_lms)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the `post_config here is unused, because

  • both config and post_confic are extracted from the crp
  • only the config is modified
  • the crp with the old post_config is still being used

+1 to can be just _ to make clear it is unused

Suggested change
for i, (lm_config, _lm_post_config) in enumerate(arpa_lms)
for i, (lm_config, _) in enumerate(arpa_lms)

(i + 1): lm.CreateLmImageJob(
add_lm_config_to_crp(crp, lm_config), extra_config=extra_config, extra_post_config=extra_post_config
)
for i, (lm_config, _lm_post_config) in enumerate(arpa_lms)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the `post_config here is unused, because

  • both config and post_confic are extracted from the crp
  • only the config is modified
  • the crp with the old post_config is still being used

+1 to can be just _ to make clear it is unused

Suggested change
for i, (lm_config, _lm_post_config) in enumerate(arpa_lms)
for i, (lm_config, _) in enumerate(arpa_lms)

recognition/advanced_tree_search.py Show resolved Hide resolved
@@ -190,6 +173,7 @@ def __init__(
:param lmgc_mem: Memory requirement for the AdvancedTreeSearchLmImageAndGlobalCacheJob
:param lmgc_alias: Alias for the AdvancedTreeSearchLmImageAndGlobalCacheJob
:param lmgc_scorer: Dummy scorer for the AdvancedTreeSearchLmImageAndGlobalCacheJob which is required but unused
:param separate_lm_image_gc_generation: Whether to generate the LM image and the global cache via two separate jobs for a more stable hash. Whether or not this flag is set is not part of the hash, so NOT using separate jobs is the default.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:param separate_lm_image_gc_generation: Whether to generate the LM image and the global cache via two separate jobs for a more stable hash. Whether or not this flag is set is not part of the hash, so NOT using separate jobs is the default.
:param separate_lm_image_gc_generation: Whether to generate the LM image and the global cache via two separate
jobs for a more stable hash. This flag is not part of the hash, using a combined job is the default.

break comment into two lines to respect 120 char line length?

if separate_lm_image_gc_generation:
gc_job = BuildGlobalCacheJob(crp, extra_config, extra_post_config)

arpa_lms = lm.find_arpa_lms(crp.language_model_config, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function lm.find_arpa_lms only returns the LMs that do not already have an lm.image - because for those that already have an image we do not need to create a new one.

But the lm.image usually is defined in the post_config. When we here not pass the post config, then all (arpa) LMs are found and returned. Therefore we do extra work here.
And even worse: below in line 404/418 we call the function again, but with the post config. So if the original crp contain a mix of arpa LMs out of which some have already images and others do not, then the items in arpa_lms differ between the calls. And since we use the index to match the image to the LM, the mapping will be off and the wrong image will be assigned.

Suggested change
arpa_lms = lm.find_arpa_lms(crp.language_model_config, None)
arpa_lms = lm.find_arpa_lms(crp.language_model_config, crp.language_model_post_config)


arpa_lms = AdvancedTreeSearchLmImageAndGlobalCacheJob.find_arpa_lms(
arpa_lms = lm.find_arpa_lms(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have called lm.find_arpa_lms above. Maybe move the above call out of the if condition and then reuse the arpa_lms from above here. This would also avoid mismatching items in the list as I outlined above.

@@ -206,6 +190,8 @@ def __init__(
self.config,
self.post_config,
self.lm_gc_job,
self.gc_job,
self.lm_image_jobs,
) = AdvancedTreeSearchJob.create_config(**kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the AdvancedTreeSearchWithRescoringJob below in this file inherits from AdvancedTreeSearchJob and uses super().create_config. that should be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
5 participants