Hash Tables

A Sparse Table ADT

A hash table is one possible implementation of a sparse table, an abstract data type in which we have a reasonable number of key-element data items to store, but the number of possible keys (the domain) is so large that it is not possible or practical to create an array large enough to be directly indexed by each possible key. A hash table instead lets many of the possible keys map to the same location in an array of reasonable size (somewhat larger than the actual number of data items we need to store). The function that does the mapping is called a hash function. Two or more different key-element pairs may be mapped to the same index, an occurrence called a collision. If the array is large enough (see the analysis below) and the hash function distributes keys fairly uniformly, the need to handle collisions will not appreciably slow down access to the table.
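As a rough illustration of the idea, here is a minimal sketch of a sparse table that stores integer keys in a small array and keeps colliding items in per-slot lists (chaining). The class name ChainedHashTable, the table size of 11, and the hash function key % table_size are assumptions chosen for the example, not a prescribed design.

```python
class ChainedHashTable:
    """Sparse table sketch: integer keys, collisions kept in per-slot lists."""

    def __init__(self, table_size=11):
        self.table_size = table_size
        self.slots = [[] for _ in range(table_size)]

    def _index(self, key):
        # Assumed hash function: many possible keys share each index.
        return key % self.table_size

    def put(self, key, element):
        chain = self.slots[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, element)  # replace the element of an existing key
                return
        chain.append((key, element))

    def get(self, key):
        for existing_key, element in self.slots[self._index(key)]:
            if existing_key == key:
                return element
        return None  # key is not in the table


table = ChainedHashTable(table_size=11)
table.put(123456789, "first")
# 123456800 - 123456789 == 11, so both keys map to the same index: a collision.
table.put(123456800, "second")
print(table.get(123456789), table.get(123456800))
```

The domain here is every integer, yet only 11 slots are allocated; the two colliding keys end up in the same list and are told apart by comparing the stored keys.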

Analysis

Let the load factor of the table be L = N / M, where N is the number of data items in the table and M is the table size. With open addressing every data item occupies a slot of the table itself, so L cannot exceed 1; chaining has no such limit. The (theoretical) average numbers of probes required for successful and unsuccessful searches of a hash table using various collision resolution techniques are as follows (where 'ln' stands for the natural logarithm):

Average Number of Probes in Terms of Load Factor L
Technique                                     Successful            Unsuccessful
Chaining                                      1 + L/2               L
Open addressing
  Linear collision resolution                 [1 + 1/(1-L)]/2       [1 + 1/(1-L)^2]/2
  Quadratic collision resolution              1 - ln(1-L) - L/2     1/(1-L) - L - ln(1-L)
  Double hashing (rehashing, random probes)   -ln(1-L)/L            1/(1-L)

Average Number of Probes for a Given Load Factor
Load factor                                     0.10    0.50    0.75    0.80    0.90    0.99    2.00
Successful
  Chaining                                      1.05    1.25    1.38    1.40    1.45    1.50    2.00
  Open addressing
    Linear collision resolution                 1.06    1.5     2.5     3.0     5.5     50.5    --
    Quadratic collision resolution              1.06    1.44    2.01    2.21    2.85    5.11    --
    Double hashing (rehashing, random probes)   1.05    1.39    1.85    2.01    2.56    4.6     --
Unsuccessful
  Chaining                                      0.10    0.50    0.75    0.80    0.90    0.99    2.00
  Open addressing
    Linear collision resolution                 1.12    2.5     8.5     13      50      5000    --
    Quadratic collision resolution              1.12    2.19    4.64    5.81    11.4    103.6   --
    Double hashing (rehashing, random probes)   1.11    2.0     4.0     5.0     10      100     --
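The tabulated values can be recomputed directly from the formulas. The sketch below is one way to do that; the function names are made up for this example, each function returns a (successful, unsuccessful) pair, and the denominator in the unsuccessful linear case is taken to be (1-L) squared.

```python
import math


def chaining(L):
    return 1 + L / 2, L


def linear(L):
    return (1 + 1 / (1 - L)) / 2, (1 + 1 / (1 - L) ** 2) / 2


def quadratic(L):
    return 1 - math.log(1 - L) - L / 2, 1 / (1 - L) - L - math.log(1 - L)


def double_hashing(L):
    return -math.log(1 - L) / L, 1 / (1 - L)


# The open addressing formulas are only defined for 0 < L < 1.
for L in (0.10, 0.50, 0.75, 0.80, 0.90, 0.99):
    for name, formula in (("chaining", chaining), ("linear", linear),
                          ("quadratic", quadratic), ("double", double_hashing)):
        successful, unsuccessful = formula(L)
        print(f"L={L:.2f}  {name:<10} {successful:9.2f} {unsuccessful:9.2f}")
```

Evaluating the functions at a few more load factors between 0.5 and 0.9 shows how quickly linear collision resolution degrades compared with double hashing as the table fills.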


Hash Code Generator

The following form is provided for your convenience in doing exercises involving hashing and double hashing. Enter an integer representing a key value in the first box, another integer representing the size of the hash table (typically a prime number) in the second box, then click the "Hash" button to see the resulting hash code in the third box. This uses the function:

      code = key % modulus

Try entering a series of keys in the first box, pressing the "Hash" button for each one, and writing each key into an array at the index shown in the third box. Use linear collision resolution and see whether you observe clustering.
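The same exercise can be tried in code. This is a minimal sketch, assuming a table of 11 buckets represented as a Python list with None for an empty bucket; the function name insert_linear and the sample keys are chosen for illustration only.

```python
def insert_linear(table, key):
    """Insert key at key % modulus, stepping forward one bucket per collision."""
    modulus = len(table)
    code = key % modulus
    for step in range(modulus):
        index = (code + step) % modulus
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 11  # modulus 11, a prime
for key in (22, 33, 44, 1, 12, 5):
    print(key, "->", insert_linear(table, key))
print(table)
```

Keys 22, 33 and 44 all hash to 0 and fill buckets 0, 1 and 2. Keys 1 and 12 hash to 1 but are pushed past that run into buckets 3 and 4, so the occupied buckets form one contiguous cluster that grows with each collision.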

If you click the "Rehash" button, the value in the "Hash code" box will be updated using the formula:

      (code + ( 8 - key % 8)) % modulus

To experiment with double hashing, enter a series of keys in the first box, clicking the "Hash" button for each one until a collision occurs. Then click the "Rehash" button repeatedly until the "Hash code" box indicates an empty (or "deleted") bucket in the hash table.
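For comparison with the form, the two formulas can be written as small functions and combined into an insertion routine. This is only a sketch: the names hash_code, rehash and insert_double are invented here, empty buckets are modeled as None, and "deleted" markers are not modeled at all.

```python
def hash_code(key, modulus):
    """What the "Hash" button computes."""
    return key % modulus


def rehash(code, key, modulus):
    """What the "Rehash" button computes: step ahead by 8 - key % 8 (1 to 8)."""
    return (code + (8 - key % 8)) % modulus


def insert_double(table, key):
    """Insert key, rehashing until an empty bucket is found."""
    modulus = len(table)
    code = hash_code(key, modulus)
    for _ in range(modulus):
        if table[code] is None:
            table[code] = key
            return code
        code = rehash(code, key, modulus)
    raise RuntimeError("no empty bucket found")


table = [None] * 11  # a prime modulus larger than 8
for key in (22, 33, 44):
    print(key, "->", insert_double(table, key))
```

All three keys hash to 0, but their step sizes differ (33 steps by 7, 44 steps by 4), so they land in buckets 0, 7 and 4 rather than in adjacent buckets. With a prime modulus larger than 8, every step size from 1 to 8 shares no factor with the modulus, so repeated rehashing visits every bucket before returning to the start.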

Enter an integer key:

Enter a modulus:

       

Hash code: