Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster Map for Buffer keys #172

Closed
ronag opened this issue Jun 9, 2024 · 4 comments
Closed

Faster Map for Buffer keys #172

ronag opened this issue Jun 9, 2024 · 4 comments

Comments

@ronag
Copy link
Member

ronag commented Jun 9, 2024

A bit of a challenge for those interested. Is it possible to do a javascript/wasm/native implementation of Map that supports Buffer keys and is faster than Map + Buffer.toString.

Here is my initial failing experiment:

import { bench, group, run } from 'mitata'
import xxhash from 'xxhash-wasm'

const hasher = await xxhash()

const values = []
for (let n = 0; n < 100e3; n++) {
  const key = Buffer.from(Math.random().toString(36).repeat(2))
  values.push({
    key,
    hash: hasher.h32Raw(key),
    value: Math.random()
  })
}

class BufferMapString {
  constructor() {
    this.map = new Map()
  }

  set(key, hash, value) {
    this.map.set(key.toString() + '', value)
  }

  get(key, hash) {
    return this.map.get(key.toString() + '')
  }
}

class BufferMap {
  buckets = new Array(0xffff + 1).fill(null).map(() => [])

  set (key, hash, value) {
    this.buckets[hash & 0xffff].push(hash, key, value)
  }

  get (key, hash) {
    const bucket = this.buckets[hash & 0xffff]
    const len = bucket.length
    for (let i = 0; i < len; i += 3) {
      if (bucket[i + 0] === hash && bucket[i + 1].equals(key)) {
        return bucket[i + 2]
      }
    }
  }
}

group('set', () => {
  bench('string', () => {
    const strMap = new BufferMapString()
    for (const { key, hash, value } of values) {
      strMap.set(key, hash, value)
    }
  })

  bench('buffer', () => {
    const bufMap = new BufferMap()
    for (const { key, hash, value } of values) {
      bufMap.set(key, hash, value)
    }
  })
})

group('get', () => {
  const bufMap = new BufferMap()
  const strMap = new BufferMapString()

  for (const { key, hash, value } of values) {
    bufMap.set(Buffer.from(key), hash, value)
    strMap.set(Buffer.from(key), hash, value)
  }

  bench('string', () => {
    for (const { key, hash } of values) {
      strMap.get(key, hash)
    }
  })

  bench('buffer', () => {
    for (const { key, hash } of values) {
      bufMap.get(key, hash)
    }
  })
})

await run()

Note how I even try to give an advantage to BufferMap by moving the overhead of hashing outside of the benchmark.

cpu: Apple M2 Pro
runtime: node v22.2.0 (arm64-darwin)

benchmark      time (avg)             (min … max)       p75       p99      p999
------------------------------------------------- -----------------------------
• set
------------------------------------------------- -----------------------------
string     28'979 µs/iter (22'492 µs … 36'209 µs) 31'362 µs 36'209 µs 36'209 µs
buffer     11'187 µs/iter  (7'465 µs … 31'122 µs) 11'608 µs 31'122 µs 31'122 µs

summary for set
  buffer
   2.59x faster than string

• get
------------------------------------------------- -----------------------------
string      5'025 µs/iter   (4'653 µs … 8'866 µs)  5'044 µs  8'440 µs  8'866 µs
buffer     17'304 µs/iter (13'833 µs … 34'199 µs) 19'076 µs 34'199 µs 34'199 µs

summary for get
  string
   3.44x faster than buffer
@ronag
Copy link
Member Author

ronag commented Jun 9, 2024

@lemire thoughts?

@ronag
Copy link
Member Author

ronag commented Jun 9, 2024

Can we beat it with native bindings?

@lemire
Copy link
Member

lemire commented Jun 9, 2024

The benchmarks should include in the cost of hashing the strings, shouldn't then? It is interesting that your get benchmark is only 3.44x slower, considering that it is a sequential search.

How is Map implemented currently?

@ronag
Copy link
Member Author

ronag commented Jun 9, 2024

The benchmarks should include in the cost of hashing the strings, shouldn't then? It is interesting that your get benchmark is only 3.44x slower, considering that it is a sequential search.

Just trying to give javascript implementation every advantage.

How is Map implemented currently?

Probably some kind of hash table with open addressing.

@ronag ronag closed this as completed Jun 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants