Hopscotch hashing

Hopscotch hashing. Here, H is 4. Gray entries are occupied. In part (a), the item x is added with a hash value of 6. A linear probe finds that entry 13 is empty. Because 13 is more than 4 entries away from 6, the algorithm looks for an earlier entry to swap with 13. The first place to look in is H−1 = 3 entries before, at entry 10. That entry's hop information bit-map indicates that d, the item at entry 11, can be displaced to 13. After displacing d, Entry 11 is still too far from entry 6, so the algorithm examines entry 8. The hop information bit-map indicates that item c at entry 9 can be moved to entry 11. Finally, a is moved to entry 9. Part (b) shows the table state just before adding x.

Hopscotch hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table using open addressing. It is also well suited for implementing a concurrent hash table. Hopscotch hashing was introduced by Maurice Herlihy, Nir Shavit and Moran Tzafrir in 2008.[1] The name is derived from the sequence of hops that characterize the table's insertion algorithm (see Hopscotch for the children's game).

The algorithm uses a single array of n buckets. For each bucket, its neighborhood is a small collection of H consecutive buckets (i.e. ones with indices close to the original hashed bucket). The desired property of the neighborhood is that the cost of finding an item in the buckets of the neighborhood is close to the cost of finding it in the bucket itself (for example, by having buckets in the neighborhood fall within the same cache line). The size of the neighborhood must be sufficient to accommodate a logarithmic number of items in the worst case (i.e. it must accommodate log(n) items), but only a constant number on average. If some bucket's neighborhood is filled, the table is resized.

In hopscotch hashing, as in cuckoo hashing, and unlike in linear probing, a given item will always be inserted-into and found-in the neighborhood of its hashed bucket. In other words, it will always be found either in its original hashed array entry, or in one of the next H−1 neighboring entries. H could, for example, be 32, a common machine word size. The neighborhood is thus a "virtual" bucket that has fixed size and overlaps with the following H−1 buckets. To speed the search, each bucket (array entry) includes a "hop-information" word, an H-bit bitmap that indicates which of the next H−1 entries contain items that hashed to the current entry's virtual bucket. In this way, an item can be found quickly by looking at the word to see which entries belong to the bucket, and then scanning through the constant number of entries (most modern processors support special bit manipulation operations that make the lookup in the "hop-information" bitmap very fast).

Here is how to add item x which was hashed to bucket i:

  1. If the hop-information word for bucket i shows there are already H items in this bucket, the table is full; expand the hash table and try again.
  2. Starting at entry i, use a linear probe to find an empty entry at index j. (If no empty slot exists, the table is full.)
  3. While (ji) mod nH, move the empty slot toward i as follows:
    1. Search the H−1 slots preceding j for an item y whose hash value k is within H−1 of j, i.e. (jk) mod n < H. (This can be done using the hop-information words.)
    2. If no such item y exists within the range, the table is full.
    3. Move y to j, creating a new empty slot closer to i.
    4. Set j to the empty slot vacated by y and repeat.
  4. Store x in slot j and return.

The idea is that hopscotch hashing "moves the empty slot towards the desired bucket". This distinguishes it from linear probing which leaves the empty slot where it was found, possibly far away from the original bucket, or from cuckoo hashing which, in order to create a free bucket, moves an item out of one of the desired buckets in the target arrays, and only then tries to find the displaced item a new place.

To remove an item from the table, one simply removes it from the table entry. If the neighborhood buckets are cache aligned, then one could apply a reorganization operation in which items are moved into the now vacant location in order to improve alignment.

One advantage of hopscotch hashing is that it provides good performance at very high table load factors, even ones exceeding 0.9. Part of this efficiency is due to using a linear probe only to find an empty slot during insertion, not for every lookup as in the original linear probing hash table algorithm. Another advantage is that one can use any hash function, in particular simple ones that are close to universal.

Variants

The paper also introduces several variants of the hopscotch hashing scheme.[1]

An advanced approach uses a pointer scheme to implement the hop information word (in the basic case this is the hop information bit-map). This allows for the hop information word to be of arbitrary (but fixed) size.

While the basic case and the advanced approach are designed to be sequential, there also is a concurrent variant for each of them.

A lock-free variant was introduced by Robert Kelly, Barak A. Pearlmutter and Phil Maguire in 2020.[2]

See also

References

  1. ^ a b Herlihy, Maurice; Shavit, Nir; Tzafrir, Moran (2008). "Hopscotch Hashing" (PDF). DISC '08: Proceedings of the 22nd international symposium on Distributed Computing. Arcachon, France: Springer-Verlag. pp. 350–364. Archived from the original (PDF) on 2022-12-20.
  2. ^ Kelly, Robert; Pearlmutter, Barak A.; Maguire, Phil (2020). "Lock-Free Hopscotch Hashing" (PDF). In Maggs, Bruce (ed.). Symposium on Algorithmic Principles of Computer Systems. pp. 45–59. doi:10.1137/1.9781611976021.4. ISBN 978-1-61197-602-1.