
Serialize Lark.grammar (fixes issue #1472) #1506

Open · wants to merge 1 commit into master

Conversation

NasalDaemon

Fixes #1472


erezsh commented Jan 17, 2025

How much does this change add to the cache size? (in rough terms)

@NasalDaemon (Author)

> How much does this change add to the cache size? (in rough terms)

In my scenario, with a .lark grammar file of 4 KB:

cache without grammar: 63,994 bytes
cache with grammar: 115,970 bytes

an increase of ~52 KB.

As a quick and dirty performance test, I ran my script, which uses Lark to parse a file and produce a C++ file (via Jinja), multiple times under different regimes:

Without caching: 440 ms
With caching, but using Lark.grammar = load_grammar(...): 370 ms
With caching + serialized grammar: 320 ms

This shows a noticeable speed benefit even though the cache has become larger, despite the overhead of Jinja and the rest of my script.
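The trade-off measured above is the classic build-versus-deserialize pattern: rebuilding the grammar from the .lark source is slower than loading a previously serialized copy. A minimal stdlib sketch of that pattern, where the hypothetical `build_grammar` stands in for Lark's grammar analysis (it is not Lark's actual API):

```python
import os
import pickle
import tempfile
import time

def build_grammar(source: str) -> dict:
    # Hypothetical stand-in for Lark's grammar analysis: an expensive
    # build step whose result is picklable.
    time.sleep(0.05)  # simulate parsing/analyzing the .lark source
    return {"rules": source.split(), "version": 1}

def load_or_build(source: str, cache_path: str) -> dict:
    # Fast path: deserialize the previously built grammar from the cache.
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Slow path: rebuild from source, then populate the cache.
    grammar = build_grammar(source)
    with open(cache_path, "wb") as f:
        pickle.dump(grammar, f)
    return grammar

cache_file = os.path.join(tempfile.mkdtemp(), "grammar.cache")

t0 = time.perf_counter()
g1 = load_or_build("start: WORD", cache_file)  # cold run: builds and writes cache
cold = time.perf_counter() - t0

t0 = time.perf_counter()
g2 = load_or_build("start: WORD", cache_file)  # warm run: deserializes only
warm = time.perf_counter() - t0

assert g1 == g2
print(f"cold={cold:.3f}s warm={warm:.3f}s")
```

The warm run skips the expensive build entirely, which is the same reason the serialized-grammar cache wins in the timings above even though it does extra (de)serialization work.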


erezsh commented Jan 18, 2025

How does serializing the grammar make it run faster? It only adds operations, and doesn't remove any.

@NasalDaemon (Author)

> How does serializing the grammar make it run faster? It only adds operations, and doesn't remove any.

According to the above tests, deserialising the grammar from the cache is faster than reproducing it from the source .lark file each time.


erezsh commented Jan 19, 2025

I think you're mistaken.

@NasalDaemon (Author)

> I think you're mistaken.

If you would like to avoid the cost of de/serialising the grammar by default, I can make its serialisation opt-in instead, so that users must specify cacheWithGrammar=True.


erezsh commented Jan 21, 2025

I prefer cache_grammar=False.

Also, consider mentioning this option in the Reconstructor error message.
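The opt-in shape being discussed, where the grammar is only written into the cache when the flag is set so the default cache size is unchanged, can be sketched as follows. All names here (`CachedParser`, `save_cache`, `load_cache`) are illustrative, not Lark's actual internals:

```python
import io
import pickle
from typing import Optional

class CachedParser:
    def __init__(self, tables: dict, grammar: Optional[dict]):
        self.tables = tables
        self.grammar = grammar

    def save_cache(self, f, cache_grammar: bool = False) -> None:
        payload = {"tables": self.tables}
        if cache_grammar:
            payload["grammar"] = self.grammar  # opt-in: larger cache file
        pickle.dump(payload, f)

    @classmethod
    def load_cache(cls, f) -> "CachedParser":
        payload = pickle.load(f)
        # grammar is None when the cache was written without it; a caller
        # that needs it (e.g. a Reconstructor) must then reload the source.
        return cls(payload["tables"], payload.get("grammar"))

p = CachedParser({"lalr": []}, {"rules": ["start"]})

buf = io.BytesIO()
p.save_cache(buf, cache_grammar=True)
buf.seek(0)
assert CachedParser.load_cache(buf).grammar == {"rules": ["start"]}

buf = io.BytesIO()
p.save_cache(buf)  # default: grammar omitted, cache size unchanged
buf.seek(0)
assert CachedParser.load_cache(buf).grammar is None
```

Keeping the flag default-off matches the `cache_grammar=False` preference above: users who never touch the Reconstructor pay nothing extra.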

Successfully merging this pull request may close these issues.

cache option does not work with Reconstructor