Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

smhasher test results for funny hash show failures #1

Open
demerphq opened this issue Jan 11, 2018 · 23 comments
Open

smhasher test results for funny hash show failures #1

demerphq opened this issue Jan 11, 2018 · 23 comments

Comments

@demerphq
Copy link

https://github.com/demerphq/smhasher/blob/master/doc/FunnyHash64-2.128.128.64.txt

FunnyHash64-2 128 128 64 FAILED 17 of 187 tests. Avalanche: 17-24, OverAll: 187, Sanity: 4-5, TwoBytes: 59, 61, 63, Zeroes: 166-167, 170

Just a heads up.

@funny-falcon
Copy link
Owner

Thank you for test and reminding.

I've looked inattentive before, and i've seen only avalanche one short keys. But seems like there are more. I'll investigate.

BTW, is non-zero seed used for failed tests? FunnyHash is known to be bad with all-zero seed.

@demerphq
Copy link
Author

demerphq commented Jan 11, 2018 via email

@funny-falcon
Copy link
Owner

FunnyHash is intended for usage with hidden random-like seed. Zero seed is certainly neither hidden nor random-like.

May I send pull request with change of seeding for funny_hash? Like:

*(uint64_t *) out = fh64_string_hash2(key, len, s64[0] ^ 0x12345678, s64[1] ^ 0xabcdef12 );

I'll investigate issues that wouldn't be fixed with this.

@demerphq
Copy link
Author

demerphq commented Jan 11, 2018 via email

@demerphq
Copy link
Author

demerphq commented Jan 11, 2018 via email

@funny-falcon
Copy link
Owner

I don't care for zero seed at all. I'll put pull request for seed xoring as in comment above.

No, that is wrong. when choosing a number from 0 to 2**64-1, 0 is no less random than 0x0123456789ABCDEF.

Yes. But seed is 128bit. And all-zero seed has probability is 2^-128 . It is same probability there is another UUIDv4 matching my UUIDv4.
So, I don't care.

I don't care much for avalanche for short keys. I know how to fix it: just add another one multiplication round. It will slow function for short keys. (That is why it 8 byte keys are good: there are one more multplication round compared to 7 byte).
Especially I don't care much for this avalanche dependency on seed value. Even your test results shows there is always at least 70 bits of seed value that produces good avalanche, and that is more than enough for FunnyHash purposes (ie hash for internal hash table without exposed seed or hashsum value).

But I care for twobytes. I'll certainly will try to understand whats happening.

btw, can you help me to compile benchmark? I do:

$ git clone https://github.com/demerphq/smhasher.git
$ cd smhasher
$ mkdir build
$ cd build
$ cp ../../funny_hash/funny_hash.h .
$ cmake ..
$ # VERSION.h not found. What should I install?
$ make
....
/home/yura/Project/smhasher-demerphq/TAP.cpp:5:21: fatal error: VERSION.h: No such file or directory
 #include "VERSION.h"
                     ^
compilation terminated.

@funny-falcon
Copy link
Owner

btw: did you try https://github.com/funny-falcon/fanom_hash ? I'll check it as well and will send PR for it.

@demerphq
Copy link
Author

demerphq commented Jan 11, 2018 via email

@demerphq
Copy link
Author

demerphq commented Jan 11, 2018 via email

@demerphq
Copy link
Author

demerphq commented Jan 11, 2018 via email

@funny-falcon
Copy link
Owner

That is your call again. I suspect that it means you leak seed state, so
attacking the hash is a lot easier than you think, but I cant prove it
right now.

FunnyHash is only for use-cases where neither seed, nor result hash-sum is exposed (in any way, except timing for insertion/lookup into hash-table) (and seed is strongly random).
For any other use case there is no any promises.
70 bits of seed is more than enough to protect against timing investigations.
If it were avalanche fail on keybits (not on seed), or if there were only 30bits of seed value that produces good avalanche, I'd bother more.

@funny-falcon
Copy link
Owner

Maybe I should give some context on my view of this. Perl used to use
a hash function with the same problem, and with some builds it would
end up seeded as zero. Thus there was a security issue and etc. Even
if it is unlikely under totally correct usage, it is still a design
flaw.

Any function with known seed is vulnurable to bruteforce searching of collisioned keys.
It doesn't matter, have you SHA-1, SHA-2, SHA-3 or BLAKE-2 or SIPHASH.
If you know seed, you easily generate collisioned keys.

That is why I don't bother for known seed. And zero-seed is known seed.

@funny-falcon
Copy link
Owner

If you know seed, you easily generate collisioned keys.

I mean, collisioned into same bucket of hash table.

@funny-falcon
Copy link
Owner

Look at Microsoft Marvin32 hash function. It is awesome. (and patented). But it is also dumb with zero-seed.

@demerphq
Copy link
Author

demerphq commented Jan 11, 2018 via email

@funny-falcon
Copy link
Owner

Marvin32 doesn't do well in the smhasher tests either.

Just try it with pre-xor-ed seed (so it is non-zero).

@demerphq
Copy link
Author

demerphq commented Jan 11, 2018 via email

@demerphq
Copy link
Author

demerphq commented Jan 11, 2018 via email

@funny-falcon
Copy link
Owner

SHA-2 and SHA-3 have fixed seeds. I dont know much about BLAKE-2, but i
think its a SHA-3 candidate, so it would have a fixed seed as well.

Both BLAKE-2 and SHA3 are keyed hash functions, ie they have arbitrary seed.

Nope. Sorry. It took Google level resources and a major research to find
any collisions at all in SHA1, which by the way has a fixed seed.

I didn't mean full hashsum collision. Only collision to same bucket of hash-table. If hash-table has 100_000 buckets, it is enough to examine 10_000*100_000 = 1_000_000_000 keys to find collision chain longer than 50_000, if seed is known. That is just a day of GPU even with newest SHA-3.

That is why it is important to generate secure random seed and keep it secret.

That is why language/db runtime designer have to be sure seed may be all-zero only in strict random case, so attacker cannot rely on it.

@funny-falcon
Copy link
Owner

I've tested funny_hash with xored seed (as I suggest above). It passes all tests except avalanche for short keys. I will send PR for this.

@funny-falcon
Copy link
Owner

Thank you for this issue.

I found way to feed 64bit seed into 128bit state so even no avalanche errors happens (at a cost of halving seed).
And I accept your idea of "preparing state" function. I will add this "prepare" to funny_hash.h and will make PR to your tests for "funny_hash with 64bit seed and 128bit state".

Also I will PR with fanom hash. It is quite fast, and looks to pass tests (after little tweak I will publish soon). Though I believe its complexity makes it less attractive than funny_hash for language implementations.

funny_hash has good property of easy chaining. It has same 128 intermediate state equal to (random/expanded) seed.
fanom_hash has 256bit intermediate state with 128bit initial seed, and it is less convenient to use in language implementations, where state should be passed while deep structure hashed.

@funny-falcon
Copy link
Owner

Damn, fanom_hash didn't pass differential "up-to-3-bit differentials in 256-bit keys -> 64 bit hashes" ... pitty...

@funny-falcon
Copy link
Owner

No, there was just small error. FanomHash passes tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants