lint: handle multiline jinja2 code #6552

wxtim · 2025-01-11T11:25:16Z

Closes #6348
Closes #6507

Check List

I have read CONTRIBUTING.md and added my name as a Code Contributor.
Contains logically grouped changes (else tidy your branch by rebase).
Does not contain off-topic changes (use other PRs for other changes).
Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
Tests are included (or explain why tests are not needed).
Changelog entry included if this is a change that can affect users
Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

wxtim · 2025-01-11T11:25:59Z

@oliver-sanders Please could you have a look at this proof of concept before I bother testing it?

fix linting

oliver-sanders · 2025-01-13T16:46:52Z

I think this is going to be the quickest way to get over the state barrier, but going statefull is a good opportunity to review our approach here before we lay down a pattern which ends up biting us later.

I haven't really given much thought to this so far, I don't have working knowledge of modern linters, a bit of reading would be a good idea...

Mid to long term, it depends how far we want to go with the linter. The initial idea was a quick and dirty line-by-line regex tool, but we thought we would leave it there (we decided it would not be statefull). Now we are clearly at the point where we want to develop this into something more serious.

I think the optimal solution for this sort of problem is something more tokeniser'ey. I.e, a program that reads the file into labeled bits rather than lines, allowing rules to be run on each "bit" in a more structured way than regex would allow by itself.

E.g, here's an example of how a tokeniser'ey type thing might see the file:

>>> for type, content, state in tokeniser(file):
...    print((type, content))
('shebang', '#!Jinja2', <file:flow.cylc, lines:[0], preproc:jinja2, path:>)
('blankline', '', <file:flow.cylc, lines:[1], preproc:jinja2, path:>)
('section', '[scheduling]', <file:flow.cylc, lines:[2], preproc:jinja2, path:[scheduling]>)
('config', '    initial cycle point = 2000', <file:flow.cylc, lines:[3], preproc:jinja2, path:[scheduling]initial cycle point>)
('cycle point', '2000', <file:flow.cylc, lines:[3], preproc:jinja2, path:[scheduling]initial cycle point>)
('section', '    [[graph]]', <file:flow.cylc, lines:[4], preproc:jinja2, path:[scheduling][graph]>)
('config', '        R1 = """\nfoo => bar\nbar => baz\n    """', <file:flow.cylc, lines:[5,6,7,8], path:[scheduling][graph]>)
('graph', 'foo => bar\nbar => baz', <file:flow.cylc, lines:[6,7,8], path:[scheduling][graph]R1>)
('jinja2', '{% if ... \n ... %}', <file:flow.cylc, lines:[9,10], path:[scheduling][graph]>)
('jinja2', '{% include ... %}', <file:flow.cylc, lines:[11], path:?>)

It's similar (but less detailed & more annotated) to what our syntax highlighters see. Note, we are carrying the Cylc config "path" where the information is available allowing us to apply specific rules to particular sections or configs and each entry could represent multiple lines. This would allow us, for example, to apply rules, only to graph sections, and to do clever things, like read a few lines ahead to check if different task configs are compatible.

It might be relatively simple to get a paired-down version of this working, e.g something that has only awareness of "plain" Cylc lines and Jinja2 expressions.

There might also be the option of building a linter into one of the Cylc grammars we have already developed? I wouldn't be surprised if this is becoming common in the LSP era.

I mention these things largely to curtail the scope of what we do now. It might be that the Cylc linter never gets much further than it is now, in which case this is a pretty decent pattern. But it is also possible that we end up developing this much further and the community starts contributing rulesets. So we should be cautious of how far we push this pattern, if it's starting to look like this utill might balloon, we should be ready to change tack rather than bang our heads against the pattern we've got. If there's a pattern out there we could easily adopt, this might be a good opportunity to do so before we start trying to get clever with our line reader. And if we do go ahead with this (and I don't think that would be a bad option), we should be cautious over how far we push it.

hjoliver · 2025-01-15T01:42:32Z

Much as I like the linter (I do!) we should think carefully, and maybe look at how widely it is used, before sinking a lot of effort into making it significantly more general than it is now (I think this gels with what Oliver is saying, but in more general terms). (Priorities vs resources, as ever...)

wxtim · 2025-01-15T07:43:41Z

Much as I like the linter (I do!) we should think carefully, and maybe look at how widely it is used, before sinking a lot of effort into making it significantly more general than it is now (I think this gels with what Oliver is saying, but in more general terms). (Priorities vs resources, as ever...)

We already have validate, and I'm against trying too hard with lint. If that means that eventually there are things we can't deal with, so be it. It's already a goodish way from "a sed script to identify 7 to 8 changes".

wxtim self-assigned this Jan 11, 2025

wxtim requested a review from oliver-sanders January 11, 2025 11:26

wxtim marked this pull request as draft January 11, 2025 11:26

oliver-sanders removed their request for review January 13, 2025 10:59

wxtim force-pushed the lint.state_object branch from 7284e7c to 8335bd2 Compare January 13, 2025 11:00

wxtim marked this pull request as ready for review January 13, 2025 11:01

wxtim force-pushed the lint.state_object branch from 4477ce9 to becc52e Compare January 13, 2025 11:02

wxtim changed the base branch from master to 8.4.x January 13, 2025 11:02

wxtim and others added 2 commits January 13, 2025 11:05

Devise a tidy way to hold more state for Cylc lint

b3c2ab7

fix linting

move LinterState to it's own directory

b784135

wxtim force-pushed the lint.state_object branch from becc52e to 9191eae Compare January 13, 2025 11:05

added tests for linterstate object

7437cfa

wxtim force-pushed the lint.state_object branch from 9191eae to 7437cfa Compare January 13, 2025 11:07

wxtim requested review from oliver-sanders and markgrahamdawson January 13, 2025 11:07

oliver-sanders changed the title ~~Lint.state object~~ lint: handle multiline jinja2 code Jan 13, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lint: handle multiline jinja2 code #6552

lint: handle multiline jinja2 code #6552

wxtim commented Jan 11, 2025 •

edited

Loading

wxtim commented Jan 11, 2025

oliver-sanders commented Jan 13, 2025

hjoliver commented Jan 15, 2025

wxtim commented Jan 15, 2025

lint: handle multiline jinja2 code #6552

Are you sure you want to change the base?

lint: handle multiline jinja2 code #6552

Conversation

wxtim commented Jan 11, 2025 • edited Loading

wxtim commented Jan 11, 2025

oliver-sanders commented Jan 13, 2025

hjoliver commented Jan 15, 2025

wxtim commented Jan 15, 2025

wxtim commented Jan 11, 2025 •

edited

Loading