Data structures using c and c++ langsam augenstein tenenbaum pdf

A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found. Ideally, the hash function will assign each key to a unique bucket, but most hash table designs employ an imperfect hash function, which might cause hash collisions, where the hash function generates the same index for more than one key.

Such collisions must be accommodated in some way. In many situations, hash tables turn out to be more efficient than search trees or any other table lookup structure.

For this reason, they are widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets. The hash value is usually reduced to an array index by taking the remainder modulo the array size. When the array size is a power of two, the remainder operation reduces to bit masking, which improves speed but can worsen the problems caused by a poor hash function. A good hash function and implementation algorithm are essential for good hash table performance, but may be difficult to achieve. A basic requirement is that the function should provide a uniform distribution of hash values.
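The difference between reducing a hash by remainder and by masking can be sketched in a few lines. This is only an illustration: the function names index_by_modulo, index_by_mask and is_power_of_two are made up for this example and do not come from any particular library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reduce a hash value to a bucket index with the remainder operation.
// Works for any non-zero table size.
std::size_t index_by_modulo(std::uint64_t hash, std::size_t table_size) {
    assert(table_size != 0);
    return static_cast<std::size_t>(hash % table_size);
}

// True when n has exactly one bit set, i.e. n is a power of two.
bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Reduce a hash value to a bucket index by keeping only its low bits.
// Equivalent to the remainder, but only when table_size is a power of two.
std::size_t index_by_mask(std::uint64_t hash, std::size_t table_size) {
    assert(is_power_of_two(table_size));
    return static_cast<std::size_t>(hash & (table_size - 1));
}
```

Because masking discards every bit above the mask, two hash values such as 0x130 and 0x230, which differ only in their high bits, both land in bucket 0 of a 16-slot table. With a table size of 17 and the remainder operation they land in buckets 15 and 16. This is one way a weak hash function can hurt more when masking is used.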

A non-uniform distribution increases the number of collisions and the cost of resolving them. Uniformity is sometimes difficult to ensure by design, but may be evaluated empirically using statistical tests, e.g., Pearson's chi-squared test for discrete uniform distributions.
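As a rough sketch of such an empirical check, one can hash a sample of keys, count how many fall into each bucket, and compute Pearson's chi-squared statistic against the uniform expectation. The function name chi_squared_uniform is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pearson's chi-squared statistic for observed bucket counts, measured
// against a discrete uniform expectation in which every bucket is expected
// to receive total / number_of_buckets keys.
double chi_squared_uniform(const std::vector<std::size_t>& bucket_counts) {
    assert(!bucket_counts.empty());
    std::size_t total = 0;
    for (std::size_t count : bucket_counts) total += count;
    assert(total > 0);

    const double expected =
        static_cast<double>(total) / static_cast<double>(bucket_counts.size());
    double statistic = 0.0;
    for (std::size_t count : bucket_counts) {
        const double diff = static_cast<double>(count) - expected;
        statistic += diff * diff / expected;  // (observed - expected)^2 / expected
    }
    return statistic;
}
```

A perfectly even spread gives a statistic of 0, and the value grows as the counts become more lopsided. To turn the number into a verdict, it is compared with a chi-squared distribution with one fewer degree of freedom than there are buckets; that lookup is left out of this sketch.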

The distribution needs to be uniform only for table sizes that occur in the application. In particular, if one uses dynamic resizing with exact doubling and halving of the table size s, then the hash function needs to be uniform only when s is a power of two.

Here the index can be computed as some range of bits of the hash function. On the other hand, some hashing algorithms prefer to have s be a prime number. For open addressing schemes, the hash function should also avoid clustering, the mapping of two or more keys to consecutive slots.

Such clustering may cause the lookup cost to skyrocket, even if the load factor is low and collisions are infrequent. The popular multiplicative hash is claimed to have particularly poor clustering behavior.
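The effect of clustering can be illustrated with a minimal linear-probing insert that reports how many slots it had to examine. This is a sketch under simplifying assumptions: keys are non-negative, the caller supplies each key's home slot directly instead of hashing, and the names kEmpty and insert_linear are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr long kEmpty = -1;  // marks a free slot; assumes keys are non-negative

// Insert `key` using linear probing, starting at its home slot and moving
// to the next slot (wrapping around) until a free one is found.
// Returns the number of slots examined, or 0 if the table is full.
std::size_t insert_linear(std::vector<long>& slots, long key, std::size_t home) {
    assert(!slots.empty() && key >= 0);
    for (std::size_t probes = 1; probes <= slots.size(); ++probes) {
        const std::size_t i = (home + probes - 1) % slots.size();
        if (slots[i] == kEmpty) {
            slots[i] = key;
            return probes;
        }
    }
    return 0;
}
```

In a 16-slot table, four keys whose home slots are 4, 5, 6 and 7 each go in after examining a single slot, with no collision at all. A fifth key whose home slot is 4 then has to examine five slots before it finds slot 8 free, even though the table is less than one third full.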

Cryptographic hash functions are believed to provide good hash functions for any table size s, either by modulo reduction or by bit masking. A drawback of cryptographic hashing functions is that they are often slower to compute, which means that in cases where the uniformity for any s is not necessary, a non-cryptographic hashing function might be preferable. If all keys are known ahead of time, a perfect hash function can be used to create a perfect hash table that has no collisions.
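For a small, fixed key set, a perfect hash function can sometimes be found by brute force. The sketch below tries multipliers for a hash of the form ((a * key) mod p) mod n until one maps every key to a distinct slot of a table with exactly one slot per key. The hash family, the prime 101 and the names slot_for and find_perfect_multiplier are assumptions chosen for illustration; practical perfect-hash generators use more sophisticated constructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kPrime = 101;  // assumed: a prime larger than every key

// Slot of `key` in a table of n slots for multiplier a.
std::size_t slot_for(std::uint64_t key, std::uint64_t a, std::size_t n) {
    return static_cast<std::size_t>((a * key) % kPrime % n);
}

// Try multipliers 1 .. kPrime-1 in order and return the first one that sends
// all keys to distinct slots of a table with keys.size() slots.
// Returns 0 if no multiplier in that range works.
std::uint64_t find_perfect_multiplier(const std::vector<std::uint64_t>& keys) {
    assert(!keys.empty());
    const std::size_t n = keys.size();
    for (std::uint64_t a = 1; a < kPrime; ++a) {
        std::vector<bool> used(n, false);
        bool collision_free = true;
        for (std::uint64_t key : keys) {
            const std::size_t s = slot_for(key, a, n);
            if (used[s]) {
                collision_free = false;
                break;
            }
            used[s] = true;
        }
        if (collision_free) return a;
    }
    return 0;
}
```

For the keys 10, 20, 30 and 40 the search settles on the multiplier 5, which places them in slots 2, 0, 1 and 3 of a four-slot table. The search is not guaranteed to succeed for every key set, which is why the function reserves 0 as a "not found" result.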

If minimal perfect hashing is used, every location in the hash table can be used as well. Perfect hashing allows for constant-time lookups in all cases. The expected constant-time property of a hash table assumes that the load factor (the number of entries divided by the number of buckets) is kept below some bound. For a fixed number of buckets, the time for a lookup grows with the number of entries, and therefore the desired constant time is not achieved.
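One common way to keep the load factor bounded is to grow the bucket array whenever an insertion would push it past a threshold. The following is a minimal sketch of a chained set of integers that doubles its bucket count in that case. The class name ChainedSet, the default bound of 0.75 and the use of key % bucket_count as a stand-in hash function are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

class ChainedSet {
public:
    explicit ChainedSet(std::size_t buckets = 8, double max_load = 0.75)
        : buckets_(buckets), max_load_(max_load) {
        assert(buckets > 0 && max_load > 0.0);
    }

    // Insert key; returns false if it was already present.
    bool insert(std::uint64_t key) {
        if (contains(key)) return false;
        // Grow first if adding one more entry would exceed the bound.
        if (static_cast<double>(size_ + 1) / buckets_.size() > max_load_) grow();
        buckets_[index(key, buckets_.size())].push_back(key);
        ++size_;
        return true;
    }

    bool contains(std::uint64_t key) const {
        for (std::uint64_t k : buckets_[index(key, buckets_.size())])
            if (k == key) return true;
        return false;
    }

    double load_factor() const {
        return static_cast<double>(size_) / buckets_.size();
    }
    std::size_t bucket_count() const { return buckets_.size(); }
    std::size_t size() const { return size_; }

private:
    // Placeholder hash: the key itself, reduced modulo the bucket count.
    static std::size_t index(std::uint64_t key, std::size_t n) {
        return static_cast<std::size_t>(key % n);
    }

    // Double the number of buckets and redistribute every stored key.
    void grow() {
        std::vector<std::list<std::uint64_t>> bigger(buckets_.size() * 2);
        for (const auto& chain : buckets_)
            for (std::uint64_t k : chain)
                bigger[index(k, bigger.size())].push_back(k);
        buckets_.swap(bigger);
    }

    std::vector<std::list<std::uint64_t>> buckets_;
    std::size_t size_ = 0;
    double max_load_;
};
```

Starting from 4 buckets and inserting the keys 0 through 99, this sketch ends with 256 buckets and a load factor of about 0.39, so chains stay short regardless of how many entries are added. Without the call to grow(), the same 100 keys would share 4 buckets and every lookup would walk a chain of about 25 entries.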