Writing Submission Tests need total restructuring #464
Comments
I can tell you who wrote it (most probably me, just like the majority of the docs on authoring), and I can also tell you who did not respond to any of the numerous calls for review and opinions when drafts of the docs were announced, and when initial versions of the docs were published.
This sounds almost as if you were reserving all rights for calling authors incompetent for yourself :P There is more than one reason why the document is structured the way it is, and I will be glad to discuss it. When writing the docs I usually tried to make every paragraph justified and supported by recurring issues, so I can say that while probably not always good, every point is justified in some way and it's not just a mash-up of ideas pulled out of thin air. W.r.t. "avoid reference solution", well, it's exactly this: I (like: me, personally) think reference solutions in tests are overused, lead to problems, and that tests which avoid reference solutions and use calculated inputs with known answers are underrepresented. I know it's not always possible to avoid reference solutions, but it's possible more often than not. The section is so high on the list because it potentially may have a big impact on quality: when tests don't rely on a reference solution, a few classes of problems just disappear and become irrelevant.
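To make "calculated inputs with known answers" concrete, here is a minimal sketch in Python. The kata (find the number missing from a shuffled 1..n sequence) and the names `make_missing_number_case` and `run_random_tests` are hypothetical, chosen only for illustration; the point is that the answer is picked first and the input is built around it, so no reference solution is needed.

```python
import random

def make_missing_number_case(rng, n):
    """Build an input whose expected answer is known by construction.

    Hypothetical kata: given the numbers 1..n in random order with one
    removed, return the removed number.
    """
    expected = rng.randint(1, n)  # pick the answer first
    numbers = [x for x in range(1, n + 1) if x != expected]
    rng.shuffle(numbers)  # ordering must not leak the answer
    return numbers, expected

def run_random_tests(user_solution, count=100, seed=None):
    """Run generated cases against a solver without any reference solution."""
    rng = random.Random(seed)
    for _ in range(count):
        numbers, expected = make_missing_number_case(rng, rng.randint(2, 50))
        actual = user_solution(list(numbers))  # the solver gets its own copy
        assert actual == expected, (
            f"input {numbers}: expected {expected}, got {actual}"
        )
```

Whether this is practical depends on the kata: it works when an input can be constructed from a chosen answer, which is exactly the point under dispute in this thread.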
The order here is mostly related to the efforts related to support of affected users. In my experience, problems caused by incorrectly implemented reference solution and problems caused by incorrect handling of potentially mutated inputs are one of the hardest to debug, handle, and help out with. I don't really agree it's just a technical detail, because it leads to significant costs and efforts of supporting users. When solvers face an issue caused by a mutated input or an incorrect reference solution, flow usually is: Affected user: "I have a problem with this kata, it seems to do [something impossible], and assertion message is [some mutated input]" Discouraging reference solutions and prominently pointing out guidelines related to mutation of input is meant to prevent exchanges like the one above.
This paragraph is meant to address kata like chess puzzles or grid puzzles (sudokus, skyscrapers, nonograms) where it's difficult to generate a valid input configuration randomly, and the advice about random order is meant to prevent solving by counting. In retrospect, this indeed does not seem to be very helpful, because even if not by counting, such shuffled tests of fixed inputs can still be worked around by pattern matching.
This point is meant to address a couple of issues, one of them being authors using a mess of golfed code as a reference solution for code golf kata, or authors using an awfully slow reference solution in kata which are meant to accept non-performant, slow user solutions.
I am not exactly bothered by single occurrences of users misinterpreting or not understanding some paragraph. While it would be great to have docs perfectly clear to everyone, as a not native speaker and a person probably strongly zoomed into the existing stuff, I find it very difficult for me to create such docs in a way they would cover every reasoning readers could come up with. I will definitely think of improvements in areas where complaints are reoccurring. Good thing about the docs is that |
I strongly disagree with this. In particular, you seem to have ignored the major reason why random tests are generated the way they are right now, and not how they were generated back in 2013: random tests should cover the entire input space of values, and should sample each relevant category of the input space enough. I linked this specifically because your ill-advised opinion has led a new kata creator to avoid including a reference solution by generating random tests that are extremely simple to reverse-engineer. In the face of an adversary this is as laughable as having no random tests, and is hence more harmful than what it's worth. (Unless you want to defend against this by using In addition, the way random tests are currently done puts the user's own solution head-to-head against a yet-to-be-verified reference solution. This is the fastest way to check the difference between two implementations, and between the kata author's and the user's understanding of the spec. You'd have an astronomically low probability of passing the kata with a solution that behaves differently from the reference one if hundreds or thousands of tests are run against both of them. You just can't pass the kata. I would also like to call a [citation needed]. Please explain how the majority of the katas (beyond the simplest
...Then it's the site's problem for allowing said user to write/translate katas in the first place? A kata author should have a sufficient level of proficiency with programming in general, and with the language itself, before writing a kata. I don't know why you're making input mutation a big deal here; object mutation is such a basic concept that it'd be reasonable to assume someone who is qualified to write katas also knows what it means, and how to deal with it. Putting such emphasis on it is honestly ridiculous: what is the target audience of this page? Users who aren't quite qualified to write katas yet? Then they should probably learn more before writing katas in the first place???
Again, I don't buy it. It sounds like your intended purpose for the article is technical troubleshooting, like an FAQ. However this article definitely isn't intended to be one (just look at where it's placed), is too long to be one (your intended audience would go tl;dr at this wall of text very quickly), and seems to serve your desire to save yourself hassle instead of giving clear reference documentation from which a person who wants to write a kata can absorb the relevant knowledge. If you want a FAQ to point novices to, put it on another page. It doesn't belong on this page.
If this is the point, then the paragraph is grossly obtuse for conveying the point.
The denominator, however, is very bothered by your bliss: please note that the number of times a kata author is confused enough to ask for help (with power users stepping in to provide assistance) is also in the single digits. This would be at least 10% of occurrences, which is very significant. (Also, why the hell did @JohanWiltink like the comment? Something that exactly proves my point literally just happened last month. I did not raise this issue for no reason.)
I liked Hob's comment to show my support for him, his position, and his work. Hobs and I disagree on how to write random tests and random generators sometimes, but I respect his position. You just seem hell-bent on burning to the ground anything you don't like, and its creator, and its immediate surroundings. You might try being a little nicer about the work he did and how it could be improved. |
If you have nothing to add to this topic perhaps you shouldn't leave a comment showing off-topic support anyway (that kind of thing belongs to Discord or somewhere else). I was mentioning you above as a part to this topic, which I don't see any problems with. My motive for bringing up this issue is simple:
And, as far as the page is concerned, I cannot claim that it makes sense or is helpful in its current state. Considering that this is one of the most important pages, quoted by everyone whenever a kata author needs help, it deserves much more scrutiny and effort than whatever it has gotten so far for over a year, so I do not consider my response overblown. (Also, I have no intentions to contribute, as I do not wish to bear responsibility for anything resembling official CW affairs. I just want to solve katas nowadays. If you're not okay with that, 🤷)
I would be more than happy if things were like this, and yes, it would be great if authors followed this guideline. This is also emphasized, for example in the Random Tests section. Problem is, this does not really happen, and I partially blame the practice of using reference solutions for this. Having a reference solution, authors rely too much on a technique of randomly spraying RNG and feeding whatever it generates into the reference solution, then into a user's solution, and comparing for equality. Due to this, I encountered things like:
I do know that it is an indirect effect, and it's not explicitly mentioned in docs, but this is also something what bothers me: when a reference solution is discouraged, I would believe it's more probable authors would come up with more structured tests, with better balance between scenarios, and with better coverage. Because when you have no reference solution, you need to think what outputs you can test for, and how to generate corresponding inputs.
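One way to picture "more structured tests, with better balance between scenarios" is a generator that builds each category of input deliberately and only then shuffles. The sketch below assumes a hypothetical "is this number prime?" kata; `SMALL_PRIMES` and `make_prime_cases` are illustrative names, not anything taken from the docs.

```python
import random

# A small table of known primes; an assumption made for this example only.
SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

def make_prime_cases(rng, per_category=20):
    """Return (input, expected) pairs with an equal share of each category."""
    cases = []
    for _ in range(per_category):
        # Category 1: known primes, expected True.
        cases.append((rng.choice(SMALL_PRIMES), True))
        # Category 2: products of two primes, composite by construction.
        cases.append((rng.choice(SMALL_PRIMES) * rng.choice(SMALL_PRIMES), False))
        # Category 3: edge values below 2, which are never prime.
        cases.append((rng.randint(-10, 1), False))
    rng.shuffle(cases)  # hide the category pattern from the solver
    return cases
```

Uniformly sampling integers and asking a reference solution for the answer would produce mostly composites; building the categories explicitly keeps the share of `True` and `False` cases under the author's control.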
Absolutely not. If it were up to me, I would establish a rule which makes writing tests with undebuggable feedback and ones which do not present inputs on failure a bannable offense ;)
In this specific case, I honestly do not blame the point of the guideline itself, but either the wording and language skills of the writer, or the user who did not put enough consideration into it. I still stand at the opinion that easiness of just slapping a reference solution into tests increases potential for problems of various types which are not there if a reference solution is also not there. I think the user just did not get the point of the guideline. I accept the possibility that one of the reasons could be wording, grammar, or composition of the writing. But if you mean that the very point of avoiding a reference solution where possible is wrong, you still gotta change my mind, and I am open for discussion on this topic.
I make the input mutation such a big deal not because it's a complex concept. I make it such a big deal, because together with another common issue: rounding of floating point values, it makes problems reported by users difficult to diagnose, reproduce, and explain. I disagree that having a reference solution makes it easy to find errors in user solution by comparison: my experience is exactly opposite, and I believe that user solution has 50/50 (figuratively) chance of passing based on how tests are constructed, or whether it repeats the same errors as reference solution does, or maybe does things differently. If a user solution uses different order of operations than a reference solution and gets rejected, then such comparison is worthless. When a user solution mutates input and pulls the rug from under reference's solution feet, such validation is also worthless. Having a reference solution does verify user solution's conformance to a reference solution, but not necessarily its correctness. You seem to very optimistically assume that the reference solution is correct, and that tests are composed in a way which calls both user solution and reference solution in a way they don't affect each other. It's not always true, and when it happens not to be, it induces very expensive support events.
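The failure mode described above can be reproduced in a few lines. This is a sketch under assumptions: the kata ("sum of the list without its largest element") and all function names are hypothetical. It shows a user solution that returns the right answer but mutates its argument, a fragile check that consequently rejects it and reports a mutated input, and a check that isolates both calls with copies.

```python
import copy

def reference_sum_without_max(values):
    """Hypothetical reference solution; does not modify its argument."""
    return sum(values) - max(values)

def mutating_user_solution(values):
    """Returns the right answer, but removes an element from the caller's list."""
    values.remove(max(values))
    return sum(values)

def check_fragile(user_solution, values):
    # Fragile: the reference runs on whatever the user solution left behind,
    # and the list shown in a failure message is the mutated one.
    actual = user_solution(values)
    expected = reference_sum_without_max(values)
    return actual == expected, values

def check_robust(user_solution, values):
    # Robust: keep a pristine copy for messages, compute the expected value
    # first, and give the user solution its own deep copy.
    original = copy.deepcopy(values)
    expected = reference_sum_without_max(copy.deepcopy(values))
    actual = user_solution(copy.deepcopy(values))
    return actual == expected, original
```

With the input `[3, 1, 2]`, `check_fragile` compares the user's 3 against a reference value of 1 computed from the already-shortened list `[1, 2]`, while `check_robust` compares 3 against 3 and still has the original input available for the assertion message.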
This paragraph just makes me think you are not Voile. We both know that authors, at least the ones who would be pointed to the guidelines, are none of this.
Then it's most probably a skill issue. As I said, expressing ideas in English is not easy for me, and, frankly, work on the docs was very exhausting. I did all of this because I hoped that when the guidelines are ready, I (and others) can just post links into discourse when reviewing betas. I was looking for support (i.e. review and opinions) everywhere, and incorporated any useful feedback I got.
The article is not meant to be a FAQ. It's meant to be a series of paragraphs and bullet points used by reviewers to point authors to, whenever an author does something potentially causing problems to users. Actually, it's not exactly meant to be a tutorial, or a read-up. When I was working on the guidelines, I mostly thought of them as a collection of linkable reactions to issues which I encountered while solving, reviewing, and fixing Codewars kata. "Hey, this is wrong, you should do this like that-and-that to avoid potential problems when solving". It is perfectly possible that it reflects my desire, because, uh, I wrote most of it. I am also open to critique and remarks and ideas for improvements, but before I decide to remove anything, I would like to be proven wrong. Note: I do not consider myself the owner of the docs and the only person allowed to introduce changes into them, but, from my experience, not many others bother. So except for three users or so, there might not be many to talk to.
Docs are a community work, and as such, I am not exactly sure they constitute "official CW affairs". Additionally, your feedback can be accounted for in many ways. If you do not want to submit PRs, that's fine. I cannot fathom tho why would it be such a big problem to contact me on Gitter and PM me that "yo dipshit, this paragraph would be better to sound this-and-this". We could then discuss, exchange ideas, explain why we think things are good or bad, and come together to some conclusion. Just complaining without guidance is pointless. |
Both practically and pragmatically speaking, random tests are not written by generating randomized cases of a specific pattern, but limiting totally random cases with specific constraints, because:
Random test quality is a thing, yes. So let's look at what randomized testing framework like QuickCheck does: it generates random cases that obey to constraints you explicitly add to the generator (aka the latter). It does everything right:
While in some cases you can avoid having a reference solution while also having said high quality random test cases, in general it's either not possible, or at least as hard as NP (while the reference solution would be P, because otherwise it'd time out). So I do not see why it is encouraged to not write a reference solution, which is much harder in most cases. It should be a conscious choice to not include a reference solution, not a rule or a serious suggestion. There is also a class of katas that test the behavior of user code (e.g. implement a data structure), or verify that the result obeys specific properties. In these cases you are indeed much more likely to not need a reference solution, but these katas are very rare, and you'll need custom-made testing code anyway, so you're going to put a "don't try this at home" sign regardless. They don't apply to 99% of the kata we see in beta.
In a vacuum this is completely true, but this page is about the beta process, and we're not sending a bunch of novices to perform beta testing. Over 90% of the users who regularly do beta katas are power users who are very proficient. These kinds of problems are supposed to be caught during beta. Telling users to not write random tests powered by a reference solution "because the nasty pitfalls will happen" is akin to burying your head in the sand: you hide the problem but never actually ensure it doesn't happen. This is what leads to all the hidden specs under "gotcha" premises ("you should've read my mind from a mile away!") and issue-ridden approved katas with dubious random tests. Making it fail fast and hard during beta is the point: you need to make sure the kata author's understanding and implementation are aligned with users' understanding and implementation from the kata description and implementation (I've always given a lot of flak to kata authors who consistently and actively violate this for many approved katas), and that potential bugs are discovered with a high chance. Not using a reference solution would do the opposite. (Another elephant in the room: I don't think the bug where initial publish from draft doesn't validate user code against test fixture has been fixed yet. So having a test fixture that isn't inclusive to solutions close to the correct one is a good thing.)
This issue was opened for this purpose. Overall the article feels very hostile (just like how the beta process has been for a long time). There are in fact many more pitfalls to watch out for other than those, but they don't belong on this page: this page is for explaining what submission tests consist of, so when someone asks you "what is a random test" you point them here. All the common pitfalls can go into a separate "Submission Test Pitfalls/Hardening" page. This page currently feels like an infosec page: plastering someone with all the pitfalls before explaining the actual thing (which is very unwelcoming), and giving instructions rather than explaining why things are the way they are (which gives the impression that the kata author is so novice they need explicit herding). It really looks like every time I look at this page, new things are added/discovered that make it look even more hostile, so 🤷
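For the class of katas mentioned above where the test verifies properties of the result, the checking code might look like the following sketch. It assumes a hypothetical kata asking for the prime factors of `n` in any order; `is_prime` and `is_valid_factorization` are illustrative helpers, and the sketch makes no claim about how common such katas are.

```python
import math

def is_prime(k):
    """Trial division; adequate for the small factors used in such tests."""
    if k < 2:
        return False
    return all(k % d for d in range(2, math.isqrt(k) + 1))

def is_valid_factorization(n, factors):
    """Check the properties of an answer instead of comparing to a reference.

    The answer is accepted when the factors multiply back to n and every
    factor is prime; the order of the factors does not matter.
    """
    return math.prod(factors) == n and all(is_prime(f) for f in factors)
```

The checker accepts any correct answer regardless of ordering, which a plain equality comparison against a reference output would not do without extra normalisation.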
(side notes)
In CW's context, the random tests are technically here to forbid hardcoding the answers, actually. But good random tests definitely make for a more qualitative kata, yes. Whatever the way they are written, as long as they are good.
This page (assuming we're still talking about "writing random tests") is more about authoring than about beta testing/reviewing. (back to main topic) About "writing the random tests": could it be helpful to give an example of the thought process (so that no concrete language is involved) to build random inputs/tests on some examples like those hob's shown there? (like, for testing prime numbers, or palindromic strings)
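For the palindromic strings case, the thought process could go: a palindrome is a half, an optional middle character, and the mirrored half, so build it that way; a guaranteed non-palindrome is a palindrome with one mirrored position changed. A concrete sketch in Python (the function names and the lowercase-only alphabet are assumptions for illustration):

```python
import random
import string

def make_palindrome(rng, max_half=8):
    """Build a palindrome by mirroring a random half (expected answer: True)."""
    length = rng.randint(0, max_half)
    half = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    middle = rng.choice(["", rng.choice(string.ascii_lowercase)])
    return half + middle + half[::-1]

def make_non_palindrome(rng, max_half=8):
    """Break a palindrome in exactly one place (expected answer: False)."""
    length = rng.randint(1, max_half)
    half = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    i = rng.randrange(len(half))
    # Replace one character of the copy that gets mirrored with a different letter.
    other = rng.choice([c for c in string.ascii_lowercase if c != half[i]])
    broken = half[:i] + other + half[i + 1:]
    return half + broken[::-1]
```

Mixing both generators in equal numbers and shuffling the results gives random tests with known answers; near-miss inputs such as the one-character break tend to be more useful than fully random strings, which are almost never palindromes.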
I managed to apply my idea of avoiding a reference solution in a couple of kata or translations which required some overhaul, and I noticed some pros and cons of this approach, but my main impression is: it's applicable to more kata than I initially expected. Some approximate examples (not actual generators I used in kata, but kind of illustrate the idea):
I do not say that there are no problems with this approach:
When you say that avoiding a reference solution is difficult, we either consider different kinds of kata, or I am stupid and miss some complexity.
I believe this might be the main point of the discussed matter. The remark states "if possible", and is placed at the top, because I would like it to be a very first possibility to consider, and not because it's absolutely necessary, and a kata will be rejected if it does not conform to this requirement. Guidelines are, well, guidelines, and not absolute requirements, and have their scope of applicability. I do not think anyone would be fighting hard in cases where applying it would be infeasible. I like its prominent position at the top because I'd hope this draws attention to an often overlooked, and at the same time potentially helpful, possibility. I treat a reference solution as a potential hole for bugs and issues (what very often proved to be true), and if reviewers agree that author's reference solution is used reasonably and seems to be correct, then all good. Maybe all what is needed is rewording the section in some way to reduce the impression of how absolutely necessary it is (not). If you have any idea for a better wording, then I would be glad to hear. At the same time, I am not sure it needs to be moved into a less prominent location, because, as I said, I still think that making at least some effort at avoiding a reference solution is a good thing, and that the possibility itself is often missed, while I'd love to see it being at least considered (even if rejected later on) by authors more often.
Now, I honestly admit: I am confuse. Either something has changed over time and I missed it, or we talk about two different beta processes. I do agree that regular reviewers, including you, do their job of weeding out issues very well. At the same time, the beta process is still what it is: prone to misuse, and abuse, and with thresholds too low to prevent bad kata from slipping through. Even if all reviewers do their job perfectly, a bad kata is still three blind upvotes away from approval. I totally share your sympathy towards quality of work of currently active reviewers. But I do not agree about a high quality of reviews in general, and especially at the time when the guidelines were written. Case in point: authors blindly approving translations incoming to their freshly published betas.
Seriously, I fail to see a connection here. I hear how you say that lack of a reference solution makes the quality worse, but I honestly fail to understand this. I'm probably just dumb.
You put it really bluntly, but when I was working on the docs, this kind of an impression (i.e. "authoring is difficult") was kinda one of the goals. I wanted to use it to balance poor support and implementation of the beta process in the system itself: if there are no functions guarding against bad quality, and the functions which are there do not protect from bad quality, let's at least have docs which prevent (also by their form) users from introducing bad quality. The docs were not meant to actively discourage users from authoring, but there might be a glimpse of a premise of discouraging users who are not willing to put an effort into authoring and who do not care about quality. The guidelines are not meant to be the only resource for authors. The practical application of guidelines in particular languages is meant to be illustrated by language-specific authoring guides, currently available for Python, JavaScript, C++, C, Ruby. There are some queued for Haskell, Java, Scala. The articles probably have their own set of issues (which I would be glad to hear), but mainly they are meant as a first line of support for authors, rather than the articles with guidelines themselves.
Maybe the issue just got covered by a heap of new ones and needs refreshing. |
I am wondering how to apply your feedback on this part of docs, and if I understand correctly:
A first step could be to move the "Reference Solution" and "Input mutation" sections to less prominent locations: make them the last on the page, or between "Random Tests" and "Performance Test" - what do you think would be better?
As I tried to argue above, I do not really agree that the paragraphs you mention are not applicable in such a wide scope, and I still think that authors could put more effort into better generation of inputs which would make it possible to avoid reference solution, and, as a result, a couple of classes of potential errors. I agree that the paragraphs could be reworded in some way, to reduce "severity" of the guideline, and make it sound more like a suggestion/advice/reminder. I would need some help with this tho, because I might have difficulties with expressing this clearly in English.
I still think that this guideline is relevant for a remarkable kind of tasks, and I hope it would avoid authors from going "But it's impossible to generate a random chessboard!" etc. I agree that it might be rewritten in some way, to make it clearer that it applies to some specific kind of kata, and potentially reduce stress on shuffling, and add more stress on randomly applying some validation-preserving transformations (rotating, swapping, flipping boards) to some fixed, "base" inputs to make pattern matching harder. But I would still like to keep the guideline in docs. I will prepare a PR with my proposal of changes, and if you think I still got something totally wrong, let me know. I would also appreciate as direct help as possible, regarding hints, ideas, wording, grammar, spelling, and generally everything what would help me (re)writing text in a foreign language. As I mentioned above, work on docs is not easy for me due to the language barrier, and everything gets written really slowly and takes a lot of time. The more help I get, the better and sooner the changes will be. I will post a link to a PR when I create one. |
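The "validation-preserving transformations" idea for fixed base inputs can be sketched as follows, assuming boards are rectangular lists of rows. The helper names are hypothetical. For a completed sudoku grid, rotations and mirror flips keep rows, columns and boxes valid; for puzzles with edge clues (e.g. skyscrapers) the clues would have to be transformed consistently as well, which this sketch does not cover.

```python
import random

def rotate_cw(grid):
    """Rotate a rectangular grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip_horizontal(grid):
    """Mirror each row left to right."""
    return [list(row[::-1]) for row in grid]

def random_variant(rng, base_grid):
    """Apply a random number of rotations and an optional flip to a base input."""
    grid = [list(row) for row in base_grid]  # never touch the base input
    for _ in range(rng.randrange(4)):
        grid = rotate_cw(grid)
    if rng.random() < 0.5:
        grid = flip_horizontal(grid)
    return grid
```

Each base board yields up to eight variants this way, so a solver matching on the literal base inputs no longer passes, although matching on a normalised form of the board remains possible.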
I created a PR #465 but other than reducing the severity of the "do not use a reference solution if possible", I cannot think of anything more. I slightly reworded the "randomized tests" part so the guideline would be more difficult to mis-apply, but in the case pointed out by you I think it's a deliberate attempt at laziness, and I don't think any rule can prevent this :( As usual, I really appreciate all and any feedback. |
I updated some things, but I do not know if in a way that would make you satisfied. |
May I know who wrote Writing Submission Tests, specifically the "Reference Solution" part, and why it is so focused on emphasizing
that they are put at the top of the list? Like, this part:
So said person is saying "we shouldn't have reference solutions powering random tests because you might mess up the reference solution"? Wut? Why doesn't said person just say "we shouldn't have people creating katas because they most likely would mess up something"? This is absurd and it absolutely should not be written like this.
In fact, why do we have "Reference Solution" and "Input Mutation" even before "Fixed Tests" and "Random Tests"? What is more important, having fixed tests and random tests, or defending the tests against input mutation? The former is an absolute requirement, the latter is a technical detail.
"Input Mutation" should definitely go after "Random Tests", and IMO the "Reference Solution" section needs to cut the first two paragraphs, which literally don't apply to 99.99% of the existing katas. They do not help anyone and would instead add to the confusion, like this.
(Also, I would seriously consider removing the "Under some rare circumstances, it is allowed to use so-called randomized tests instead of fully random ones." section as well. It also literally doesn't apply to 99.99% of the existing katas, and writing this paragraph is just opening a hole for people to object on questionable grounds, such as this).