Remove requirement for test_ prefix for pytest test modules #30315
Conversation
Before: 15202 tests collected in 123.14s (0:02:03). After: 15223 tests collected in 144.51s (0:02:24).
(force-pushed f0888d7 → 8f599fd → 193f626 → 1249d5f)
cc: @ashb - I think you attempted to do this before; I think I found a solution.
(force-pushed 7840f7a → b90ad26 → e30194c)
Seems we have a few errors to handle before we merge that one :)
(force-pushed 8580edf → 72095ba)
Yes. As explained in the PR - we already hit this problem: we missed the fact that those tests were not running in CI, simply because pytest collection rules skip them entirely due to the lack of the `test_` prefix. As a result, our CI does NOT run those tests and cannot catch regressions in them. And again - this already happened with the databricks and smtp tests, so it will inevitably happen again unless we protect against it; this is not a hypothetical problem. We could of course add some pre-commit checks (and @hussein-awala attempted to detect such silently skipped tests in #30306 (comment)), however that was very simplistic and not fool-proof. I think the only way to avoid any such mishaps in the future is to make all files in the "tests" dir pytest-collectable.
(All right, I also fixed the k8s tests to have their own pyproject.toml.)
And BTW - another issue raised with pytest: pytest-dev/pytest#10845
Also @Taragolis - on top of the risks of accidentally skipping tests and causing regressions, one more point: https://docs.pytest.org/en/7.2.x/example/pythoncollection.html#customizing-test-collection explains that you "might" want to collect all Python files, and goes to great lengths explaining how to do it and how to avoid some pitfalls. So changing this is not entirely unusual, given that pytest maintainers decided to document it.
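For reference, the customization described on that pytest docs page boils down to overriding `python_files` in the pytest configuration; in the pyproject.toml form this PR converts to, it would look roughly like this (a sketch, not the exact configuration merged here):

```toml
[tool.pytest.ini_options]
# Collect every Python file, not only those matching test_*.py.
python_files = ["*.py"]
```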
The module name is not the only reason tests may fail to be collected. Other naming conventions can also be a reason - we have also had cases in the past where a test class didn't start with `Test`. We have an additional check that tests exist for the appropriate airflow modules, however it has been broken for a very long time and some of the modules don't follow this naming convention. I don't have any concern about scanning all files if it is safe - I just don't know how it affects collection. Maybe it would be better to just follow the naming conventions (this sample only covers regular airflow parts, not breeze, k8s and charts tests) and check it again:
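The naming-convention knobs being discussed here are pytest's `python_files`, `python_classes` and `python_functions` options. A minimal sketch (values are illustrative defaults, not the elided sample from the comment):

```toml
[tool.pytest.ini_options]
python_files = ["test_*.py"]   # module convention
python_classes = ["Test*"]     # class convention
python_functions = ["test_*"]  # function convention
```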
Yep. But a missing `Test*` class name is different: if a class doesn't have a `Test*` name you (unlike with the module name) won't be able to run the test in your IDE or from the command line at all. That's the big difference between not following the class convention and not following the module convention. I just made an experiment: I brought back "test_*.py" in my PR and ran this:
Also, the tests show up nicely as runnable tests in the IDE (regardless of the pyproject.toml setting). Now, I renamed the test class so that it no longer starts with `Test`:
I even used the colon syntax when I renamed the class:
And neither the test class nor the test methods are runnable via the IDE any more. This is PRECISELY what happened when the smtp tests were added (and maybe @hussein-awala can confirm it).
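For reference, the node-ID ("colon") syntax mentioned above selects a single class or method directly; file and names here are illustrative:

```console
$ pytest tests/utils/test_helpers.py::TestHelpers::test_merge_dicts
```

Once the class no longer matches the configured `python_classes` pattern, pytest reports that no tests were collected for that node ID.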
So this is what I want to avoid. I do not want to enforce standards (that might be a follow-up) - I want to make sure that human mistakes do not cause regressions.
Sure. I absolutely don't argue with that. Even more - I will absolutely support it; feel free to follow up on it. It will require quite a lot of catching up with the modules that do not follow the convention - a lot of manual fixes and renames - but it's an approach we might take in the future. I will certainly be supportive and happily take part in it :). It will likely take weeks or months to complete, though, and quite some communication and multiple people contributing. And I think it is a nice follow-up to the change here; it does not invalidate this one, which is done, green and functional. I believe merging (or not) this change does not change anything about the idea of adding conventions - we can still add them even if we merge this one, I guess? Or do you think this change would be problematic if we later implement the convention you mentioned?
BTW. It's all green - all the problems are nicely solved and it's ready to be reviewed and merged.
I still think this is hacky and uses workarounds. It wouldn't be required if we had approved the PR, but unfortunately this is how things happen - some PRs are more important than others. I'm not happy with these changes because instead of fixing the reason why it initially happened, we are adding new workarounds and hacks (and for months the check that should catch this didn't work), and these hacks and workarounds could potentially break something in the future. But this is only my opinion - I just want to say it rather than change someone's mind. If someone approves, well, why not; CI is green.
Which PR? Why was it not approved? I think it's quite a jump you made from "not approved PR" to "some PRs being more important than others" - implying bad intentions in general and hinting at accusations. I do not like the tone of this. If there is a problem, please raise it (likely privately, if it is a personal problem you are dealing with and do not want to share in public).
I do not think they will - we often apply workarounds and hacks temporarily, and this one is followed up by two issues raised with pytest:
They might lead to removing the need for the workarounds. This is the usual practice here; we do it very often, and I do not see why this time would be different. And if someone makes a change to follow the conventions, we can easily remove the workarounds - they are environmental and do not essentially change the tests we run. I see no reason why we should do this differently than we usually do.
I'd love to merge this one in - as explained, it will prevent accidentally missed tests.
I don't love the workarounds either, but this feels like a situation where we can't let perfection be the enemy of good enough. I think missing tests is a much more critical problem, so it's worth the change.
Though, having some mechanism (maybe just a TODO?) to remember to revisit these workarounds in the future would be nice, so they don't get forgotten.
Although my preference is to detect these uncollected tests in pre-commit (or in CI), the idea of collecting everything except the files listed in collect_ignore is very interesting and can ensure 100% test detection.
Great work! LGTM
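The `collect_ignore` mechanism mentioned above is a plain list defined in a conftest.py. A minimal sketch (the paths are hypothetical, not the real entries from this PR):

```python
# conftest.py sketch: with python_files widened to "*.py", every Python
# file is collected, and files that must never be imported by pytest are
# excluded explicitly (paths are relative to this conftest.py).
import sys

collect_ignore = [
    "dags/subdir_with_broken_imports.py",  # hypothetical entry
    "test_utils/perf",                     # hypothetical entry
]

# Entries can also be appended conditionally, e.g. per Python version:
if sys.version_info < (3, 8):
    collect_ignore.append("providers/needs_newer_python.py")
```

pytest also supports `collect_ignore_glob` for pattern-based exclusion, which scales better when whole directory trees should be skipped.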
Currently we had a requirement to use the `test_` prefix for all our test modules; however, this meant that we could miss some tests in pytest collection when they were placed in a module without the `test_` prefix. This happened already: see apache#30311 and apache#30306. Previous attempts to remove the requirement failed because cassandra tests seemed to be incompatible with pytest collection when we allowed all Python files; there were also a few names that caused pytest collection to fail when selecting all Python files. The move is accompanied by converting the pytest configuration to pyproject.toml. After long investigation, it turned out that the cause of the cassandra failures was pytest's assert rewrite, which collided with the Cython-related type_codes.py definition of types. See datastax/python-driver#1142 for a PR that aims to fix it on the cassandra side; in the meantime we manually patch the type_codes.py file from Cassandra by adding PYTEST_DONT_REWRITE to its docstring when building the CI image for testing. Another error was using the run_module method, which also suffers from assert-rewriting incompatibilities, but this could be fixed by using run_path instead: pytest-dev/pytest#9007. Yet another assert-rewrite problem was that ``test_airflow_context`` also failed with ``--assert=rewrite`` turned on when python_files were not filtered. No solution was found for disabling the rewrite (apparently it happens for the yaml.py file, but adding PYTEST_DONT_REWRITE to yaml.py and related files did not help), so we had to extract the test to a separate test type. Also, tests in docker_tests/kubernetes_tests should be run with the working directory set to their folders, in order to apply the pyproject.toml from their directories (without specifying asyncio mode).
The following changes are applied:
* conversion of pytest configuration from pytest.ini to pyproject.toml (for the main project, docker_tests and breeze)
* added pyproject.toml to breeze, docker_tests and kubernetes_tests to reflect that they need a different pytest configuration (lack of pytest-asyncio, for example)
* automated patching of Cassandra's type_codes.py with the PYTEST_DONT_REWRITE docstring marker
* added pyproject.toml to the docker context in order to include it in the sources of airflow in the image
* changed the working directory for the k8s suite of breeze commands to kubernetes_tests
* removed pytest.ini from docker compose mounting, with automated removal in the entrypoint in case an old image is used
* renamed a few breeze functions to not start with "test_" so that they are not collected as test methods
* removed a few test dags from collection by pytest, by name
* replaced run_module with run_path in the EKS test and test_www
* updated the CI workflow to use docker_tests as the working directory for those tests
* moved perf dags out of the tests/test_utils dir to the dev directory
* added a PlainAsserts test type; the ``test_airflow_context`` test runs in this separate test type when running parallel tests by default
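The CI-image patching step described above (adding PYTEST_DONT_REWRITE to Cassandra's type_codes.py docstring) can be sketched as a small source transformation; the helper name and the exact marker placement are assumptions, not the real build script:

```python
# Sketch: insert the PYTEST_DONT_REWRITE marker into a module's docstring
# so pytest skips assert rewriting for that module (hypothetical helper;
# the real patching happens while building the CI image).
MARKER = "PYTEST_DONT_REWRITE"

def add_dont_rewrite_marker(source: str) -> str:
    """Return `source` with PYTEST_DONT_REWRITE in its module docstring."""
    if MARKER in source:
        return source  # already patched, keep the operation idempotent
    if source.startswith(('"""', "'''")):
        quote = source[:3]
        # Prepend the marker inside the existing module docstring.
        return source.replace(quote, quote + MARKER + "\n", 1)
    # No module docstring yet: add one containing only the marker.
    return f'"""{MARKER}"""\n' + source
```

pytest looks for the PYTEST_DONT_REWRITE string anywhere in the module docstring, so prepending it to the first line is sufficient.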
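The run_module → run_path switch mentioned above relies on the stdlib runpy module; run_path executes a file by filesystem path without going through the import machinery that pytest's assert-rewrite hook plugs into. A minimal sketch (the executed file here is a stand-in, not a real Airflow module):

```python
import os
import runpy
import tempfile

# Create a tiny stand-in script to execute by path.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("VALUE = 40 + 2\n")
    path = f.name

# runpy.run_path(path) instead of runpy.run_module("pkg.module"):
# the file is executed directly, bypassing import hooks entirely.
namespace = runpy.run_path(path)
print(namespace["VALUE"])  # 42
os.unlink(path)
```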
(force-pushed 72095ba → c552e97)
One of the steps in our CI is to collect tests; this prevents parallel tests from even starting when we know test collection will fail (it is a terrible waste of resources to start 20 test jobs, initialize databases etc. when we know it is not needed). This however introduced a single point of delay in the CI process, which, with the recent collection protection implemented in apache#30315, occasionally piled up to more than 5 minutes on our CI machines, especially on public runners. This PR utilises our existing test framework to parallelise test collection (pytest does not have a parallel collection mechanism) - also, for localised PRs it will only run test collection for the test types that are going to be executed, which speeds it up quite a lot. This might be sped up further if we split Provider tests into smaller groups to parallelise them even more.
The cassandra hack added in apache#30315 does not seem to have a chance to go away: neither pytest (pytest-dev/pytest#10844) nor Datastax (datastax/python-driver#1142) want to own the problem for now (though there is a proposal from pytest contributors on how Datastax could refactor their code to avoid it). However, during the discussion an idea popped into my head on how we could come back to the test_* pattern with a far smaller probability of missing tests added to wrongly named files. It seems we can add a fixture that will outright fail tests if they are placed in files not following the test_* pattern. While it would not help if a test itself were wrongly named in the first place, it would definitely help to not add more tests in wrongly named files, because it will be literally impossible to run tests added in a wrong file - even if you manually do `pytest somefile.py` and avoid running collection. I also did a quick check to find cases where the test_* file name convention was already violated, and I found (and renamed) two such files. It seems quite likely that a similar mistake could be made in the future, but with the fixture I added it should be far less likely that someone adds tests in a wrongly named file.
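The fixture described above can be sketched like this; the function names and the exact check are assumptions, not the merged implementation:

```python
# Sketch of a conftest.py guard: fail any test whose module does not
# follow the test_*.py naming convention, so a wrongly named file cannot
# even be run manually with `pytest somefile.py`.
import os
import re

def module_follows_convention(path: str) -> bool:
    """True if the file name matches the test_*.py pattern."""
    return re.fullmatch(r"test_.*\.py", os.path.basename(path)) is not None

# In a real conftest.py this would be wired up as an autouse fixture,
# e.g. (requires pytest; shown as a comment to keep the sketch stdlib-only):
#
# @pytest.fixture(autouse=True)
# def _check_test_module_name(request):
#     if not module_follows_convention(str(request.path)):
#         pytest.fail(f"{request.path} does not start with 'test_'")
```

Because the fixture is autouse, it runs for every test, including tests invoked directly by node ID, which is what makes the wrongly named file impossible to run at all.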