Discuss how to report on alternate algorithms #218
Replies: 23 comments
-
Hello! We do include a label in the standard output which should account for the solution "category". Right now this label is not taken into account but as we progress with multiple solution types we will include that as a dimension in the report. |
Beta Was this translation helpful? Give feedback.
-
I was experimenting with breaking up the sieves into components, and run things as an arbitrary combination of those components. The components in question were:
The parallel stuff was set aside as it was getting overly complicated, but even the storage/algorithm pairings had issues of adding a lot of overhead when trying to componentize things. I still have the partially completed work on a separate branch, but I wasn't sure it was worth trying to build them for the purpose of this repo. At least, not as a part of my existing solution. I'd need to build an entirely new solution to make it work, and I don't know yet how it would perform. It would still feel nice to be able to run it as something like |
Beta Was this translation helpful? Give feedback.
-
I also experimented a bit with running things more in parallel, but it is very difficult to get it to run faster if we're only doing the primes up to 1'000'000. I found this nice site that got me a bit inspired (although my code is very different from the java on that page): http://compoasso.free.fr/primelistweb/page/prime/eratosthene_en.php. So I have a working segmented prime implementation using the 5760of30030 map, which works fine for larger number of primes than the ones below 1'000'000. Doesn't use much memory, but since it needs more operations, it is still slower, since the bitmap for the sieve below 1'000'000 fits in L2 cache. |
Beta Was this translation helpful? Give feedback.
-
I think there is indeed a case to be made to separate the basic comparison of languages (which is what @davepl originally created the repo for) from the comparison between algorithms/alternate solutions (in part across languages, by now). I do not believe reusing the free-form "username_optionaltag" field to make the distinction is the way to go, though. At the moment, my proposal would be to add a category field to the "drag-race" output format, with the following defined values:
Of course, we would need to do some work to update existing In all this, the definition of the "base" algorithm is of course relevant. I would propose to define it as follows:
|
Beta Was this translation helpful? Give feedback.
-
Thanks @rbergen! As a pragmatic suggestion, we could append a new (optional) column to the output for the category label. The rationale here is that the vast majority of existing implementations are just Solutions that have interesting work done on algorithms could simply implement the final column with the label, with some runs like I'm not sure what I agree that the definition of the base algorithm hasn't been buttoned down, but I haven't personally noticed anything too strange implemented yet luckily. I need some clarification on some of the points, though, as I haven't quite understood them. Maybe they'll make sense after some more coffee :)
To be honest, I think it will need some clarification. In my understanding, all solutions that are using bits for storage will need to use bit masks as bits aren't addressable on most hardware. But you might be trying to make a different point that I haven't picked up on :)
I'm not sure what this would permit people to do. I suspect that any solution that exchanges inner/outer loops in some way would be different enough to the original algorithm that it wouldn't be comparable with it, and would need to be labelled appropriately. But I also might have misunderstood what you mean by it. |
Beta Was this translation helpful? Give feedback.
-
That's basically what I was thinking too, although my suggestion would be to make
There are examples of languages where the nature of the language is such that its "natural form" is not faithful, and there are also languages that just don't include the requirements for a faithful implementation. This is true because for a faithful implementation, some form of "true/false" array, some form of class/object/structure, and means to explicitly choose to create a new instance of the latter are required. If the algorithm is the base algorithm but the implementation does not include things like recreation and initialization of the sieve state then it is not "fair" to compare them with other, faithful, implementations. At the same time, I think it is also not fair to put them in the same category as completely different algorithms. If people are confused or unsure, they can always choose
I think I agree with you. In fact, I think this whole point can be dropped, as what it's trying to express is already covered by other points.
It would allow people to run the clearing loop first with 3 as a factor, and then seek for the first (next) factor. I know by experience that for some languages, this makes a real difference for how "language-like" the implementation is, while it actually doesn't change the algorithm, as such. What is missing in my proposed phrasing is the word "inner", as in "Changing the order of the two mentioned inner loops is permissible." |
Beta Was this translation helpful? Give feedback.
-
I agree with what both of you are saying. When I submitted the various algorithms in C I was surprised that faithful just stated "uses sieve of Eratosthenes" and "does not use external dependencies", so they all were included into faithful. As long as you drop the point about the bit masks, which it seems you are already saying, it all makes sense to me. If bit masks are not allowed it is not possible to express smaller than byte storage, unless one concocts code with divisions and modulus just to avoid the "bit masks not allowed" rule. |
Beta Was this translation helpful? Give feedback.
-
I see this as two separate problems, one refers to the output format for the current solutions and the other is coming up with a proper ruleset for classifying the solution types. Problem 1I propose adding another column that can store multiple labels with values. Something like Problem 2Using the proposed format above should remove the single classification problem since we can now support any classification based on tags. |
Beta Was this translation helpful? Give feedback.
-
That's an interesting idea, indeed. A few things cross my mind thinking about it:
|
Beta Was this translation helpful? Give feedback.
-
Maybe we should just switch to JSON :) |
Beta Was this translation helpful? Give feedback.
-
Wow there's been a lot of great discussion! @marghidanu, I'd avoid probably json unless it's really necessary; while we're used to it in the higher-level language world, it's not part of the standard library of many other languages and will require external dependencies. Or you could just hard-code the string I guess, although it'll get pretty ugly potentially. I think the bigger question is whether we're going to benefit from having the flexibility of arbitrary
I don't really have a clear idea of what kind of reporting we want to do, or what attributes we want to extract from it. But if you've got a pretty good use case in mind, it'll help guide us on what sort of keys/etc you'd be happy to support. I think this echoes what @rbergen has said above too. Also consider the option of just having comma-separated Talking to some of the above comments too.
This makes sense :) We'll get the fine-grained reporting we need by having labelling each "run" appropriately rather than marking the whole solution as faithful/otherwise. It's great having different implementations in the same solution as it's a lot easier for readers to understand and compare. I think we're all in agreement on the bit masking as it's essential for bit-level storage. Even C++ does this internally behind
This is an interesting one, but I'm not sure I can think of an easy example. I can imagine fun stuff with functional/recursive code, but that's not my area of expertise. In my head, you'd have to allocate space for the sieve and initialise it to apply Erastothene's sieve algorithm to it. Do you have something in mind you can point me at?
Ah, thanks! I see what you mean, and I definitely agree. It won't have much impact on performance. We'll need to be careful with the phrasing as it's quite hard to convey clearly. Maybe something along the lines of: |
Beta Was this translation helpful? Give feedback.
-
@mike-barber That’s why I suggested storing the tags in an arbitrary format, it’s impossible to think of all dimensions right now. We need something that is extensible. As for the output of the implementations, I can tell you that all output will eventually end up in JSON since I’m writing the report generator :) Because the parser actually will convert any output to JSON for generating the HTML files. |
Beta Was this translation helpful? Give feedback.
-
To me, 'bitmasking' is different from 'bit manipulation'. Bit manipulation is setting a single bit at a time in a way that is specific to a single operation, whereas bitmasking is setting multiple values at a time in a way that can be repeatedly applied (a rubberstamping effect). So bitmasking is a parallelization tool, whereas bit manipulation is merely a way of saving memory. Bit manipulation may use a mask/bitmask (noun) to apply its effect, because the languages and CPU's don't allow direct addressing of bits, but it's different from bitmasking (verb) as an operation concept. It's basically what you describe, but with separate terms describing the two methods, rather than overloading a single term to mean both types of operations (and confusing people). |
Beta Was this translation helpful? Give feedback.
-
Assembly is a prime example, in my view. I've now coded 4 implementations in various incarnations of that language, 3 of which are now in the drag-race branch (the arm64 version is on its way). In my opinion/experience, the "natural flow" of assembly is to just thunder through the algorithm, and not spend coding time/bytes/processing cycles on things like subroutine calls and heap allocations unless absolutely necessary. In fact, for a proper "faithful" comparison, one basically needs to step out to libc function calls. In C and other languages these are very much "native" to the language, but from the perspective of assembly, one could argue they are "external". |
Beta Was this translation helpful? Give feedback.
-
@marghidanu As I now understand what we're discussing, we would end up with a definition of "official" labels, similar to this:
These would then be combined with the existing standardized output format, to create something like this:
Would this be in line with your thinking? |
Beta Was this translation helpful? Give feedback.
-
Thanks, that makes a lot of sense! I wasn't thinking about assembly :) And I see what you mean -- you've basically got ~100 extra lines to deal with the housekeeping to handle the setup and memory allocation. Where the "slightly unfaithful" one just has reserved space in the data section. Nice work, by the way! My understanding is that believe heap allocation is always external in some form or another. At some point, you've still got to do a syscall to get some pages from the OS. I wouldn't worry about that being external too much :) |
Beta Was this translation helpful? Give feedback.
-
Thank you! After tripping over myself a few times at the beginning, I did enjoy coding both the x64 and the arm64 implementations. I did find out that documentation on arm64 assembly is quite a bit behind that on x64. Particularly finding out how to execute a square root with conversions from and to "integer" registers was quite a bigger puzzle than I expected.
Indeed, which is why I had the audacity to still label the implementations using those calls as faithful. :) |
Beta Was this translation helpful? Give feedback.
-
I know what you mean! I had the same experience doing some simd stuff on arm64 before. Risc is fun though :)
I think it is, on the basis that it's doing the same work and allocating memory and so on. More relevant to the general thread, though, I think we're in a good place on the ideas for labels and so on. Thanks for fixing the format in all the solutions, including mine! Happy to help with any other changes needed. The next thorny issue to work out is how to do the benchmark runs outside of the default Azure runners that GitHub provides -- results are all over the place between runs. But I think that'll be a subject for a different thread :) |
Beta Was this translation helpful? Give feedback.
-
RISC, or the arm64 implementation of it, is no-nonsense, clean and consistent. I really do like that. The only instructions I found oddly absent at a conceptual level were push and pop. (I know, the 16-byte SP alignment requirement makes it practically impossible to implement them, but still...)
Well, now that we have a proposal for a structured distinction between implementations, we will at some point have to implement it. I say that in part because @davepl has been very supportive of doing so in recent e-mail communication I've had with him. So, it may not be too long before we indeed need to go over the whole set of solutions, again.
We may be closer to an actual solution for that issue than one might think. It's still work in progress, but in my opinion some good progress on externalizing the benchmark runs is being made. If you have any systems laying around that could act as benchmark runners, I think it's not too soon to start dusting them off. :) |
Beta Was this translation helpful? Give feedback.
-
Awesome! At work, we use self-hosted runners for a bunch of things, so my initial instinct was to do that. I can't reconcile how to make it secure enough on a public repo though: It's a bit scary running code you don't control. Even if you put the runner itself in a Docker container, it still needs network access to pull dependencies etc. I haven't thought of a good way to solve this yet, but definitely happy to help if I can. |
Beta Was this translation helpful? Give feedback.
-
Runners on a public repo are not really an option for us. We are now testing the bit of code that allows any of us to run the benchmark on any machine externally. Additionally, we will provide a way to publish the local results to a web service and gather the data into a web app so anybody could see it. More to come in the following few days. |
Beta Was this translation helpful? Give feedback.
-
The problem with the original definition of "faithful" is that it did not exclude other sieve-algorithms that employed further optimizations. So, those algorithms, like mine, were no longer apples-to-apples comparisons of languages and language features, but rather of algorithmic optimizations. I think both are interesting. To my mind, we should have a category of "direct ports" of the original algorithm to other languages, used for language comparisons - perhaps clarifying "faithful" as using Dave's original algorithm in all respects. And then, more interesting to me are other algorithmic optimizations - still using a sieve, but applying different techniques to optimize cpu or memory. Actually, any algorithm that could produce the list of primes to 1M by any method would be interesting even if not using a sieve (though I don't know if there are more efficient algorithms for producing bulk primes within a range). A tag like "algorithmic" could be applied to these entries. We don't gather statistics like memory use - though I think that could be interesting. |
Beta Was this translation helpful? Give feedback.
-
In line with what was discussed in earlier comments, I think one-dimensional tagging is going to be too limited. I therefore still think that working with key-value pairs is the way to go. For one, from a perspective of comparing algorithms, I think it would be very useful for the benchmark reports to not only indicate that solutions are intended for an algorithmic comparison, but also what algorithm each solution implements. To phrase it this way: you're going to want to make comparisons between (the best performers in) the range of algorithms that are in the set of solutions, as well as compare the various implementations of "8 of 30" with each other, to name one example. In that context, I think that once we start identifying implementations that stick to @davepl's original algorithm as "base", then the exact meaning of faithful vs unfaithful will become somewhat less relevant (but not meaningless). It will then make a statement on the (technical) implementation style, instead of defining the implementation as a whole. |
Beta Was this translation helpful? Give feedback.
-
I think we need to have a discussion to figure out how to deal with reporting for different algorithms, as this adds an extra dimension of fun.
Currently, all of the implementations (except for the C ones) are very similar to the "reference" C++ version, where we consider all odd numbers as potential primes.
The C implementations from @mckoss and @danielspaangberg are really interesting additions, and make reference to this wikipedia section. The same applies to the C# implementation by @Kinematics. Thanks for these! The important thing to note here is that these are algorithmically quite different from all other implementations as they avoid needing to check higher known primes factors besides just 2: variants include 3,5,7, and upwards.
However, this does raise some issues:
1of2
, is the equivalent of the "reference" C++ implementation, so this one is comparable1of2
is part of a more general set of algorithms, I don't think it would make sense to move it to a separate project.In essence, I think we just need to work out what do to on the reporting side. My initial thoughts are along the lines of this:
1of2
8of30
and upI'm not sure if this is the right approach, or how to deal with tagging specific lines in the output as "different algorithm". It'll be great to get everyone's input! I know you're working on reporting currently @marghidanu and it's looking pretty cool so far.
Beta Was this translation helpful? Give feedback.
All reactions