-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Source generator POC #65
Conversation
I've dropped in a simple example of a behavioural test for the source generator. It invokes the Roslyn source-generation system and updates a compilation with the generated sources; this is what happens within the build process without the overhead of actually performing the build. Right now the test just checks the output was produced and the code generated matches a simple string. We could opt to compare normalized syntax trees instead, or do a full compilation and examine or execute the produced assembly. |
A few thoughts from doing this work:
|
@Code-Grump Isn't there a .NET framework mismatch then? If I compile my test assembly to .NET 6 on .NET 8 SDK, will the generator be running in .NET 6 or .NET 8? So with other words: should the generator's .NET version match the target framework's .NET version or the SDK's version? |
@Code-Grump Do you have an example in your mind about what other analyzers might cause a problem here? |
The analyzers run in the SDK's runtime. If you're building using a .NET 8 SDK, I want to believe your analyzer runs on the .NET 8 runtime. If you target .NET 6 but build on the .NET 8 SDK, you will still be on the .NET 8 runtime. This is part of the motivation behind why analyzers must target .NET Standard 2.0.
No concrete examples, but it seems plausible that an IDE could have some Gherkin language support which emits its own syntax errors and would "double-up" on errors in the output. It's worth keeping in mind that analyzers run in just about any .NET context, either as a build-time tool, or during the live editing process, so there are chances of "bumping into" other tooling we may not be immediately aware of. |
Next steps:
|
I don't know whether this is the right time to drop this in, so feel free to ignore, but I would like to change the generator in a way that everything should be in a single method. The current implementation depends (?) on the tear-down events of the test runners and the "ITestRunner" class level instance field, but this causes problems with parallel execution, so the best would be to put everything into a single test method and use a local variable for the testRunner and a try-finally for the tear-down logic. If this causes too much complication right now, feel free to ignore it for the POC. |
@gasparnagy The only complication would be determining the desired output. If you can give me an desired output for an example input, I can drop it onto a test and make it happen. |
@Code-Grump I will make a sample tomorrow. |
I made a hard coded sample. I reviewed and added comments so that it should be easier to understand. It is for MsTest, but I tried to mark the parts that will be different for other test runners. As you will see, I used all type reference with Tested it and it works (*). The feature file was:
The class that we should generate is:
(*) With this approach the very last "AfterFeature" hook is not invoked, but we can work this out in Reqnroll itself. We can ignore it for now. |
I've added an implementation of an MSTest generator using the example provided as a specification. I made a few alterations to target text, but kept it as close as possible, including comments. In this implementation, I focused on creating a strategy for test-framework specific and language-specific classes. After several attempts, this is the strategy I felt worked best:
Some things I learned doing thisTesting a code-generator's output as a string is a bit more painful than I had imagined.
We don't take advantage of MSTest specific details when creating our generated code. When I was looking at implementing my own Roslyn-based Gherkin test generator I was interested in leveraging xunit's similar features. It felt like leaning into the framework was the right thing to do. |
Forgot to mention: I put in a test-generator selector mechanism that looks for test framework assemblies to determine which framework is being used. We could reasonably put all the code generation and framework adaptation into the generator and package Reqnroll as a single NuGet package. |
Thank you, @Code-Grump for the improvements. Sounds promising. What should be the next steps? I think feature-wise the POC is complete. Do you have anything else you would like to experiment with before we start planning how to integrate this to the product?
That was intentional. (The current prod version of Reqnroll does use the fw specific features.) I would like to rely to the fw specific stuff as little as possible for the following reasons:
Of course if we find any benefit of using the test fw infrastructure for handling these, we can revert it, but as long this is only a try-finally, I think we can manage it. |
I'm not happy with the testing model at present. I've experimented with comparing output source to an expected string and interrogating the C# syntax tree more directly. In both cases it felt like a ton of fiddly work to prove the output met a very specific definition of "expected". However, I don't have any other good ideas for a "unit testing" approach that only considers the raw output of the generator. That makes me feel like there's no good unit test for these generators without compiling and executing the generated output. This is something we could do, but it's starts getting closer to the new suite of spec tests that I can see being assembled. So while I'm happy to try and put in some work to build a test of this kind, I'd like to know if it's suitable within the broader scope of the project and worth the effort.
I don't have the depth of knowledge on this kind of integration that you do, however my instinct is that we could hide some of the "magic instance resolving" required for methods like For example, the initialize and destroy methods:
One of the long-standing issue with SpecFlow was its inability to fully parallelise execution. Running test methods in parallel (on separate test class instances) is the default for xunit and MSTest, and NUnit can be made to do so with a pair of attributes. My gut feel is that letting test-framework control the lifecycle of Reqnroll objects affords more guarantees about execution and state, and more confident integration with other test-related libraries than trying to run "alongside" the test-framework lifecycle and looping into its events. The Roslyn generator has to be a clean break from the existing generator and affords the opportunity to re-think how we should integrate with test frameworks. I feel confident I can produce any output that we want to see, so the big questions now should be about what we want to see! |
There is no need to test everything in one go. If you only verify that the code generated is syntactically valid and has some marker stuff such as the correct class names etc. (e.g. by passing it to Roslyn and checking the AST), then that is a valuable fast test on its own (and probably less brittle than a full source text comparison). A full-blown test where you're actually generating and executing the tests generated from a feature file has its own merits, but can be done as an addition on top of the other tests, IMO. Also like you said whitespace has little meaning and can be very frustrating. Unless we're accidentally adding significant(ly more) whitespace it does not matter and we should avoid failing on the test. |
diagnostics.Should().BeEmpty(); | ||
|
||
generatedCompilation.SyntaxTrees.Should().HaveCount(1); | ||
var generatedSyntaxTree = generatedCompilation.SyntaxTrees.Single().Should().BeAssignableTo<CSharpSyntaxTree>().Subject!; | ||
|
||
generatedSyntaxTree.GetDiagnostics().Should().BeEmpty(); | ||
|
||
var generatedSyntaxRoot = (CompilationUnitSyntax)generatedSyntaxTree.GetRoot(); | ||
generatedSyntaxRoot.ChildNodes().Should().HaveCount(1); | ||
|
||
var generatedNamespace = generatedSyntaxRoot.ChildNodes().Single().Should().BeAssignableTo<NamespaceDeclarationSyntax>().Subject; | ||
generatedNamespace.Name.ToString().Should().Be("test"); | ||
|
||
generatedNamespace.Members.Should().HaveCount(1); | ||
var generatedClass = generatedNamespace.Members.Single().Should().BeAssignableTo<ClassDeclarationSyntax>().Subject; | ||
|
||
generatedClass.AttributeLists.Should().HaveCount(1); | ||
var generatedClassAttributes = generatedClass.AttributeLists.Single().Attributes; | ||
|
||
generatedClassAttributes.Should().HaveCount(1); | ||
var generatedClassAttribute = generatedClassAttributes.Single(); | ||
|
||
generatedClassAttribute.Name.ToString().Should().Be("global::Microsoft.VisualStudio.TestTools.UnitTesting.TestClass"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this required to assert:
- There were no diagnostic issues raised during generation
- The generated file contains a class in a "test" namespace
- The class is decorated with the MSTest attribute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wanted to check for other attributes, or the test methods were present with their attributes, it's probably triple the line-count.
I do prefer this to checking an exact string output, but it's significantly harder to read and I think presents an immediate maintenance headache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recently discovered the Subject
/ Which
properties of fluent assertions that can make this code slightly better:
generatedClass.AttributeList.Should().ContainSingle()
.Which.Should().ContainSingle()
.Which.Name.ToString().Should().Be("global::Microsoft.VisualStudio.TestTools.UnitTesting.TestClass");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should().ContainSingle()
does help reduce the bloat. This could just be a case of needing some syntactic sugar to make the statements less hurtful.
var generatedClass = generatedSyntaxRoot.Should().ContainNamespaceDeclaration("test")
.Which.Should().ContainClassDeclaration("Calculator");
generatedClass.Should().HaveAttribute("global::Microsoft.VisualStudio.TestTools.UnitTesting.TestClass");
This itself has no direct relation to the parallel issues and it cannot be replaced by the fw infrastructure unfortunately. Let me briefly explain, because this might be relevant and of course you might be able to spot out some logical errors in this.
The parallel issues that we would like to get rid of (and also enable test-level parallelization) are not caused by these, but mainly by two factors:
|
Regarding my knowledge the MsTest |
The example provided by MSTest includes a |
I am thinking about execution in terms of scopes, analogous to how a DI container scopes resources: if you were to define a singleton at the Test Run scope, that same instance is shared across all child scopes. Information in child scopes is isolated from siblings, but available to descendants. At a scenario scope, instances are unique to that scenario, not shared by anything else. If this mental model is correct, then the trick is making the Reqnroll runtime match those scopes, when some/all of it is tied to the test-framework's lifecycles. In my head, with my knowledge of the three test frameworks supported, that seems straight-forward: they already provide hooks that match the lifetime of these scopes, or whole context-management systems. The only sticky part I sense (because I haven't attempted this in every framework) is propagating our scopes where they need to be - essentially what the "test thread" concept is trying to solve. I think it could be achieved via the test frameworks' own mechanisms. xunit (for instance) provides a particularly good mechanism for supplying context to all test methods in a class, or to all tests in an assembly: https://xunit.net/docs/shared-context I am proposing only a series of thin shims that map these scopes to the test frameworks; execution within the Reqnroll runtime would just continue as normal (assuming my mental model holds up.) |
Basically that could be used to replace the static field we have in
This is how Reqnroll works. We have a container hierarchy, quite much how you described. But there is the "test thread" container in between the test run (called "global") and the feature one. See https://github.com/reqnroll/Reqnroll/blob/main/docs/extend/available-containers.md
Because of the test thread container, I don't see such an easy match. The shared context stuff you mention for xUnit is a pre-defined grouping of tests that are dependent on a resource. But the test thread is an ad-hoc grouping of test executions that happen to run in the same logical thread. Or maybe I got your thoughts wrong. |
You are correct, I think the "Test Thread" container is unique to Reqnroll's architecture, not easily married to any of the test frameworks. Since the advent of Task in .Net we've been encouraged to not think about threads and thread-affinity. I appreciate this has become more of an "executor context", than being specific to a thread. I suppose I'm posing the question: does it make sense to have this container going forward? The easiest path to concurrency is to have a model that is largely ignorant of its execution specifics - either you're in "local" state and can assume no concurrency, or your in shared state and you must assume concurrency. These thoughts might be too radical; I have been experimenting about how to make Gherkin-to-C# kind of thing work in xunit for a while, and would consider affinity to an execution context to be something to generally avoid. I'm prepared to believe it's useful in Reqnroll, but it goes against my instincts and experience. |
I absolutely agree. Maybe the name of the "test thread" concept is not ideal. That can also be seen more as an "execution context" and it is not specific to a particular thread.
I agree. Test automation, however (especially integration testing that BDD is naturally connected with) is in many cases game of trying to find the right compromises. I have not seen many projects where all the dependencies were easily testable. In contrast in many cases we had some dependencies that needed special smart solutions for testing. The test thread concept is just one option to be used in this tricky situations but that should be avoided generally (similarly to the global or the feature context, or static fields). So this is not necessarily contradicting. Whether this is really the responsibility of Reqnroll to give help in this is a valid question. Maybe not, but we have it already and it has been useful in many situations. (E.g. it makes much more sense than the feature container and the before/after feature concept, that is totally senseless in a parallel & random order execution.) I liked your comment on the step-retry plugin about Polly. That is a nice example of accepting the need but not taking the responsibility with Reqnroll. On the other hand I have also seen teams that does not have such a broad overview of the possibilities and they are overwhelmed with the number of different libraries and tools that they need to combine to solve their tasks. It's a tough decision in either way. I think in the current situation, removing it (and handling all related support cases) would be more costly then keeping it, even in long term. So let's keep it for now. Once we have the new gen model and better testing infrastructure we can reopen the next big pain point of parallel execution problems. If we will find that the test thread is part of the root causes, we can think it over once again. In any case, the new gen model will be simpler, cleaner, better tested and better testable then the current solution is (👏), so any changes will be easier to make. Let's go step by step. |
Quite happy to proceed this way. As I said, my perspective has been influenced by trying to build something considerably more minimalist and leveraging test framework features. I appreciate you taking the time to consider my opinions, despite my lack of experience with Reqnroll's inner workings. To take this from a prototype to something shippable, I think there are a couple of important decisions to make, because they're expensive to pivot on later:
And although this is something we should be comfortable to pivot on, it would be good to have a clear target up-front: what does the scope of generator need to include for its initial version? A concrete list of what it has to achieve to be "good enough" would be ideal. |
My thoughts on these questions.
It's a big change, but it seems feasible. It creates a new problem: determining which test-framework is present. I think it would make sense to have a configuration option which allows manual selection of a test-framework, overriding any logic. It makes shipping Reqnroll and installing it into a project significantly simpler than at present. The generator can be shipped in the same NuGet package as the runtime - it has no actual dependencies on test-framework libraries and only requires them to be present in the final compilation. We end up with one package that does all the things required for the vanilla Reqnroll experience.
My gut feel is to want to test the behaviour of the generator: to produce code that invokes the Reqnroll runtime in the expected way which is decorated correctly to run within the specific test-framework. I would create tests to prove:
Some of this will overlap with the integration tests, however these feel like useful unit tests, because discovering a failure in an integration test will be much harder to diagnose.
Using a series of I've tried to break down the composition process into small chunks and I think it's made it pretty good, so I'd be happy to live with this model, however there are templating languages that could be "fast enough" and provide a much better editing experience. If anybody had experience with getting high performance out of a templating engine, I think it should be a consideration.
If we want to use the Reqnroll config files for code-generation, we'll need to make them an input to the compilation and figure out a way to efficiently represent the relevant content within the generator. I'm a little worried about a need for extensibility in this space - ideally we'd read from a config file once and provide the structured data to everything that needs to know about it. This is pretty easy when you can represent the output as an object-graph, but if we need to be able to provide values without knowing their structure, we might have to expose some kind of Alternatively, we could move all generation concerns to the built-in configuration mechanisms: .editorconfig and properties from project files or passed in from the command-line. This requires a lot less code and infrastructure to support, but it does represent a shift from current thinking.
I think this should initially be:
|
My comments:
I would suggest a mixed approach: move all code to the main package as you suggested, but keep the test fw specific packages that would just list the dependencies (Reqnroll and the test fw) and would configure the new infra for the selected the test fw (e.g. by setting an MsBuild prop that we could read out in the generator). (In this way they would be optional, as anyone could set the prop manually as well.) The benefit of this would be that it would make the configuration easier for the simple cases and all the documentation / videos etc that we have on Reqnroll (and SpecFlow) setup would still remain valid.
For verifying that the generated stuff really calls Reqnroll correctly, we already have the SystemTests project (soon finished, but already tests a lot). Those tests will remain the same (as they are end-to-end anyway), so they will server as regression tests as well. So we don't need to worry about that part. What we need to include here is unit tests, so basically from our list the first two:
For these unit tests I would probably make assertions on the dom as you shown earlier, not a string assertion. You are right that the integration tests (in our case the SystemTests) are harder to diagnose, but I see no other option to test the actual behavior, because you need to get the test fw involved, that you can only do with an e2e test.
As this generation will not be open directly to plugins anymore, we have more freedom here. So I would just follow some evolutionary design. As long as "string concatenation" is good, we use it, if we run into troubles with it, we will refactor. It will be our internal stuff anyway. So as you said "live with this model" is fine for me. I don't have much experience with such low-level perf improvements. I don't think a templating engine would improve much and at the end, the code gen was already part of the build process (in a slightly different way) and it was not catastrophically slow, our new solution must be already significantly faster by not using code dom. So I would not deal with perf opt yet.
I don't think we have any extensibility options in reqnroll.json config, so reading it up and store the deserialized version should be fine. Maybe for selecting the test fw we should use MsBuild props and not the reqnroll.config (see above)
I agree to your suggestion, depending on the rollout strategy (see below), we can make it even simpler by not even supporting additional attributes.
I think the best & safest would be if we would be able to make a two-phase rollout: we somehow keep the old generation strategy (for a limited time) but also allow to use the new one. (Maybe we could have the new as the default and keep the old as a backup.) This could be achieved in a way that the If someone would like to go back to the old model, they can manually add the |
The Verify plugin as of now requires the attribute because it depends on an old version of Verify and xunit. If we were to update the dependencies to the latest versions that requirement drops as well. |
Good news. We can upgrade then to the latest Verify once we get closer to roll out of the new gen. |
Thanks for the input, everybody.
I can see a path to making a small test that mocks the Reqnroll runtime so we can verify we make the proper sequence of calls with the right values. We'd just need the generated code to reference a "root" that we can mock during a test.
This would require a small change to the runtime so there was some internal root we could mock, and then small changes to the code to reference the new root. I could add this kind of test to the POC if we want to get an idea of the effort required or if we like the test style, but I'm quite prepared to put this on the backburner if we'd rather lean on the system test for now and focus on getting a 1.0 version done. 😁
If we're happy to, dropping the requirements around custom attributes would considerably shorten the path to a release version; they can always be added later. |
Yes, that could be done indeed. Good point. But that would not involve the test fw so it would only be able to test half of it. And my feeling is that it would require quite a lot of efforts. So I would let that go for now. We will need something similar in my envisioned renovation plan for the "Specs" project, maybe together with that we can re-discuss the idea. |
I've started on a robust implementation based on our POC. I have opened this issue in the Gherkin project that would be useful to have cucumber/gherkin#246 @gasparnagy |
Makes sense. I commented it there. |
Two projects have been added:
The following functions are demonstrated:
Not demonstrated:
Unit testing the generator using Microsoft's Roslyn testing libraries.Created by @Code-Grump related to the discussion: https://github.com/orgs/reqnroll/discussions/11