🐛 fix FPU's float-to-signed-integer corner case #943
base: main
fix "random" operand generator
@stnolting if you are okay with being patient for a day I can check the corner case tomorrow (US West Coast time). 99% it's okay, there is just something nagging about why I added the extra 2 conditions :)
@stnolting okay so challenge number 1. When I try and replicate the bug locally I get the following failures. These look like corner case rounding failures somewhere and not necessarily in the hardware btw.
0x3F and 0xBF -> +/- 2^-64 respectively. So the question is why the expectation is +/-1 as a 32-bit int vs 0. Without knowing the rounding mode it will be a bit tricky. For almost any rounding mode this should end up as +/- 0 unless the integer used is a long/64-bit. This is because the "1" will be shifted soooo far away from 0.5 LSB that it shouldn't result in an error. So unless there is a rule in IEEE that it has to be +/- 1 unless the float is exactly 0 (which seems odd) there might be an issue here. I'll spend some time getting the compare stuff to go on my end on the test case. We should also diff our FPU versions as mine is 300/400 lines shorter than yours it seems. We definitely aren't getting the same results. Maybe I missed a commit/messed up a commit?
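To make the "tiny magnitude" argument concrete, here is a minimal Python sketch (not the project's test code) that rounds +/- 2^-64 to an integer under each of the five RISC-V rounding modes. The helper name `round_to_int` and the mapping onto Python's `decimal` rounding constants are assumptions made for illustration only.

```python
from decimal import (Decimal, ROUND_HALF_EVEN, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_CEILING, ROUND_HALF_UP)

# RISC-V rounding-mode mnemonics mapped onto the matching decimal-module modes.
RISCV_ROUNDING = {
    "RNE": ROUND_HALF_EVEN,  # to nearest, ties to even
    "RTZ": ROUND_DOWN,       # towards zero
    "RDN": ROUND_FLOOR,      # towards -infinity
    "RUP": ROUND_CEILING,    # towards +infinity
    "RMM": ROUND_HALF_UP,    # to nearest, ties away from zero
}

def round_to_int(x: float, mode: str) -> int:
    """Round a finite float to an integer under the named rounding mode."""
    # Decimal(float) is exact, so the only rounding is the one requested here.
    return int(Decimal(x).to_integral_value(rounding=RISCV_ROUNDING[mode]))

tiny = 2.0 ** -64
for mode in RISCV_ROUNDING:
    print(mode, round_to_int(tiny, mode), round_to_int(-tiny, mode))
```

In this model only the directed modes move a tiny value away from zero (RUP for the positive input, RDN for the negative one); the nearest modes and RTZ give 0 for both signs.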
The default rounding mode (at least for the hardware) is "round to nearest, ties to even".
That's great! Thank you very much! 👍 I'm on the latest version of the main branch (so no FPU hot-fix) and this is what I get when running a section of @Quma78's code:
With the fix from this branch the last test case (208) also passes.
Interesting, let me check my quick hack test setup. It's a bit worrisome that the softfloat we are using disagrees on what the result should be in the test case, whereas the HW agrees across test cases. This might hint at a different problem. I'll get the latest branch on my local machine and do some diffs today to see what's what. It could be something as simple as the rounding mode being different between the two cases. I agree the "default mode" should be round to nearest even (normal round), but because of RISCV ISA "fun" we need to look at the GCC assembly to figure out what the heck it actually did, as it either sets the mode directly in the opcode, changes the default round mode, or uses the out-of-reset round mode. Did I mention I hate RISCV float rounding modes ;) Anywho, I believe the reason for the exception I added was to deal with a specific rounding mode that likely isn't tested in the simple test you have set up. Hence why I'm a tad cautious :) A note here though is: 0xD0 or (208) is -1.0 * 2^(160-128) or -1.0 * 2^32 which is the sign bit corner case. Which is indeed what that if statement is trying to catch. The exception handling (my shoddy memory here) might be to deal with the case where we are doing a convert to unsigned int. Thinking we should expand the test case with both float to signed and unsigned int, and trigger all the rounding modes as well, to ensure we aren't missing something.
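For cross-checking operand patterns by hand, a small Python sketch that splits a 32-bit pattern into its IEEE 754 single-precision fields and tests whether the value fits a signed 32-bit integer. The helper names and the two example patterns are chosen for illustration and are not taken from the test program.

```python
import struct

def decode_f32(bits: int):
    """Split a 32-bit pattern into (sign, biased exponent, 23-bit mantissa)."""
    return (bits >> 31) & 0x1, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def f32_value(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1

# Illustrative patterns only: -2^31 and +2^31 as single-precision floats.
for bits in (0xCF000000, 0x4F000000):
    sign, exp, man = decode_f32(bits)
    value = f32_value(bits)
    fits = INT32_MIN <= value <= INT32_MAX
    print(f"0x{bits:08X}: sign={sign} exp={exp} (2^{exp - 127}) "
          f"mantissa=0x{man:06X} value={value} fits_int32={fits}")
```

The asymmetry is the point: -2^31 is exactly representable as a signed 32-bit integer while +2^31 is not, so a range check that treats both signs alike will misclassify one of them.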
After a bit more research on math.h we find that the rounding mode of float to int conversions is not round to nearest: Not sure how to fully interpret this, but it seems like we should be setting the rounding mode to round towards zero if we are using rint for float conversions? Or does it only affect casting? It explains why the cast operator experiment is different from rint at least. For rint, reading the documentation: From this it seems rint is using the "current rounding mode". Now reading the manual further we get: As we are converting outside the size of an integer the return value would be: "an implementation-defined value is returned". I did a small dive into rint to see what's going on there. It's mostly straight up assembly, yeah :| There is a double rint(double x) C example where the return value, if the exponent is >51 (remember double), is just x straight up. I'll dive a bit more. Early indications are that we might not actually be looking at a "bug" but a "feature" in math.h. I still need to run the Imperas with softfloat vs the bug to confirm whether it is a bug bug :)
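The cast-versus-rint difference can be mimicked in Python as a rough analogy, assuming the C cast truncates towards zero and rint runs in the default round-to-nearest-even mode. Python's `int()` truncates and its one-argument `round()` uses ties-to-even, so the two hypothetical helpers below stand in for the two C paths; this is not the math.h implementation itself.

```python
def cast_like(x: float) -> int:
    """Mimic a C (int) cast: drop the fraction, i.e. round towards zero."""
    return int(x)

def rint_like(x: float) -> int:
    """Mimic rint() in the default mode: round to nearest, ties to even."""
    return round(x)

for x in (0.5, 1.5, 2.5, 2.7, -0.5, -2.7):
    print(f"{x:5}: cast-like={cast_like(x):2d}  rint-like={rint_like(x):2d}")
```

The two paths agree on some inputs (0.5 and 2.5 give 0 and 2 either way) and differ on others (1.5, 2.7, -2.7), which is why a test that mixes casts and rint needs to pin down which rounding it expects.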
I am not sure if I want to continue with my pre-built toolchains. The X-pack project provides excellent toolchains - so why reinvent the wheel? ;)
Good point. I think we should adjust the intrinsics and use the floating point CSR for configuration. This is so much more flexible.
Holy cricket! Thanks for all your work!
👍
Thanks again! ❤️
And here I thought reinventing the wheel was what engineering was all about ;) Makes sense. Forwarded the x-pack link to my firmware SDK person, might be we just pivot to that vs internal build.
Welcome to FPU :) I knew what I was signing up for, so not too surprised but still learning. I've yet to pull in favors from my friends that are building Sharc and Tensilica DSPs, but might at some point :)
😅
I can only agree. Btw, I have modified the |
rounding mode defined by fcsr "rm" bits
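For reference when reading register dumps, a short Python sketch of the standard `fcsr` layout from the RISC-V F extension: accrued exception flags in bits 4:0 and the dynamic rounding mode `frm` in bits 7:5. The function name `decode_fcsr` is hypothetical and this does not show how the intrinsics themselves read the CSR.

```python
# frm encodings defined by the RISC-V F extension; other values are reserved.
FRM_NAMES = {0b000: "RNE", 0b001: "RTZ", 0b010: "RDN", 0b011: "RUP", 0b100: "RMM"}
# fflags bit positions, bit 0 first: inexact, underflow, overflow,
# divide-by-zero, invalid operation.
FFLAG_NAMES = ["NX", "UF", "OF", "DZ", "NV"]

def decode_fcsr(fcsr: int):
    """Return (rounding-mode name, list of set exception flags)."""
    frm = (fcsr >> 5) & 0b111
    flags = [name for bit, name in enumerate(FFLAG_NAMES) if fcsr & (1 << bit)]
    return FRM_NAMES.get(frm, "reserved"), flags

print(decode_fcsr(0x00))  # out-of-reset value: RNE, no flags
print(decode_fcsr(0x30))  # frm = RTZ, invalid-operation flag set
```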
So I finally got around to running the testcase locally with RVVI comparison to OVP sim from Imperas. A detail about my setup is that I'm likely calling gcc with the zfinx extension enabled. This probably means that rint gets replaced with the zfinx extension for everything but corner case handling.
Notice the 2 failures reported seem to be corner case related. But the Imperas compare reports:
From the reporting you can see the number of compares and mismatches being "0". So the Imperas reference implementation "ovpsim" (https://github.com/riscv-admin/riscv-ovpsim), which uses the Berkeley softfloat reference library (the same as Sail and Spike), reports no failures. Next step is to dig into why we see the failures vs the math.h library from RISCV. Also I should probably try a run where I'm not enabling the zfinx extension in gcc.
But doesn't that mean you are comparing Which hardware are you using? The "default" one (aka from the main branch) or the "fix" from this PR? |
With gcc compiling with Zfinx enabled, or HW float if you will, the compiler will likely utilize the fcvt.w.s instruction and not the emulated function in math.h. Basically the "SW" vs "HW" compare function would always match as it's calling the same RISCV instruction under the hood. The math.h routine likely has an exception handler, which would explain the trigger that causes the 2 failures.
The next step is to deep dive into the RISCV F/Zfinx extension spec and the IEEE 754 float spec to ensure that we are indeed behaving right and that it's rooted in a mistake in math.h based on yet another ancient x86 float "feature".
for now...
So this is the RISC-V spec notes on float to signed integer conversion: From this for FCVT.W.S if the number is out-of-range for a negative input, which I believe is the case, then the resulting integer should be -2^31 or 0x8000_0000. So next step is for me to do a diff. The errors I have seem to be rounding related vs the SW implementation. Need to dig some more, but it might be a mistake in the C library, as I'm comparing against the Berkeley reference implementation. @stnolting did you touch the FPU after my checkin? No blame :) just trying to understand why my implementation is something like 400 lines shorter :)
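As a reference model for the saturation behaviour, a minimal Python sketch of the float-to-integer conversion rules as I read them in the F extension: round first, then clamp to the destination range and raise the invalid flag when the rounded value, an infinity, or a NaN falls outside it. The function `fcvt_model` is hypothetical, is not the FPU's RTL or softfloat, and ignores the inexact flag.

```python
import math
from decimal import Decimal, ROUND_HALF_EVEN

def fcvt_model(x: float, signed: bool = True, rounding=ROUND_HALF_EVEN):
    """Model FCVT.W.S (signed) / FCVT.WU.S (unsigned): return (result, invalid)."""
    lo, hi = (-(2 ** 31), 2 ** 31 - 1) if signed else (0, 2 ** 32 - 1)
    if math.isnan(x):
        return hi, True                      # NaN saturates to the maximum
    if math.isinf(x):
        return (hi if x > 0 else lo), True   # infinities saturate by sign
    # Decimal(float) is exact, so this is the only rounding step.
    rounded = int(Decimal(x).to_integral_value(rounding=rounding))
    if rounded < lo:
        return lo, True                      # out-of-range negative input
    if rounded > hi:
        return hi, True                      # out-of-range positive input
    return rounded, False

for x in (-2.0 ** 31, -2.0 ** 32, 2.0 ** 31, 0.5):
    result, invalid = fcvt_model(x)
    print(f"{x}: 0x{result & 0xFFFFFFFF:08X} invalid={invalid}")
```

Running the same operands with `signed=False` gives the unsigned expectations, which is one way to build the extended signed/unsigned test set suggested earlier in the thread.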
But this is not happening when using the default FPU from the main branch, right?! 🤔
Wait, your local version is shorter? It should be the other way around I think. I just trimmed some trailing spaces and reworked the header. No real rtl code edits from my side.
@stnolting I apologize for the very long delay here. Got stuck in some hairy support case combined with a 2 vacation back to the homeland in Europe :) So I did a diff between the FPU in the tip of trunk and my local one. Besides a number of stylistic differences the one that seems to fall out is: Where I use hex for the mantissa encoding and you have changed it to a binary value. Well, that and I sim with xcelium and not ghdl, but that shouldn't matter I would think. The rest of the diffs are stylistic and license text shortening except for these sections which are tied to the CSR addressing.
Oh no worries! The "stale" label was just a memory for me and by no means a subtle trigger. 😅
I have no idea why I have changed that... Anyway, it should not matter at all.
I think I just moved some of the "coarse" CSR addressing logic out of the FPU. There should be no functional difference.
I hadn't even noticed the stale flag :) This bug has been on my mind for a while now. It's especially frustrating that I cannot recreate it in our ASIC sim environment. I agree the addressing doesn't matter :) It's more a "this is the only major difference I see". Basically the diff tells me the 2 FPUs are essentially the same. As for the hex vs non-hex encoding: the only difference I could see would be (and note I could have counted wrong) that the hex version has 24 "0"s and the non-hex version only has 23 "0"s. Not sure if I remember the width of the mantissa at this stage. I thought it was 24 bits as we haven't post-shifted anything yet. Now my base core is still way old. I think my last sync point was sometime in February, beyond bug-fixes. As this is an FPU-specific issue this shouldn't matter. Next steps will be to try and recreate the environment with ghdl vs xcelium and do a trace dump of the FPU to compare the internal state in the 2 cases. If all else fails it could be a tool issue. Super unlikely though.
So you are comparing your local FPU with the one from the main branch, right?
Please note that the "fix" from #943 does not contain this mantissa = 0 check anymore (as part of the supposed "bug fix" triggered by #942).
👍
That sounds great! So we can discard #943 as this seems to be "just" a toolchain/library issue? 🤔
I'm very curious about the findings from this! I've tried feeding the core with permanent interrupt requests before (some of it is still available in the default processor check program), but I've never taken it to the extreme.
Fixing #942