# Hash Tables

A hash table is a more advanced data structure which typically makes use of one or more other data structure. The general idea is to store the value within a bucket based on the hashing of some key. The simplest example stores values within an array. By applying the hashing algorithm to our key, we get the index to store our value in. Generally speaking, hash tables are used to store a key=>value pair, though in our examples the key will be the value - to keep things straightforward.

## Implementation

function HashTable(size) { this.size = size; this.buckets = new Array(size); this.add = function(value) { var index = this.hash(value); this.buckets[index] = value; }; this.hash = function(value) { var sum = 0; for (var i = 0; i < value.length; ++i) { sum += value[i].charCodeAt() - 97; } return sum % this.size; }; } var hash = new HashTable(3); hash.add('fear'); hash.add('is the'); hash.add('little death');

## Example

Click *step* to add the values to our hash table

## Hash Functions

The hash function plays a pivotal role. Our naive implementation merely sums the ascii value of each character (minus 97). In order to fit within our fixed array, the remainder is taken (via the modulo operator (`%`

)). This last step guarantees that our `hash`

function returns a value between 0 and the size of the array (0 to 2 in our specific case).

## Collisions

Hash tables must deal with collisions - that is, two values returning the same bucket index. This is more common than you might think (see the birthday problem). One approach is for each bucket be its own data structure - say, a linked list.

## Collision Implementation

function HashTable(size) { this.size = size; this.buckets = new Array(size); for(var i = 0; i < size; ++i) { this.buckets[i] = new LinkedList(); } this.add = function(value) { var index = this.hash(value); this.buckets[index].add(value); }; this.hash = function(value) { ... }; } var hash = new HashTable(5); hash.add('i'); hash.add('will'); hash.add('face'); hash.add('my'); hash.add('fear');

## Example

Click *step* to add the values to our hash table

## Characteristics

As a general rule hash algorithms are lightweight and fast. This means that, without collisions, read and write performance is good (and constant). However, as collisions occur, read and write performance become dependent on the underlying collision resolution implementation. Also, if keys are not properly distributed, performance between individual buckets can vary.

## In The Real World

Hash tables are used frequently in modern applications. They are often used to store values per key in order to efficiently do lookups without relying on a linear search. However, many developers believe that reading and writing to a hash table is a constant 0(1) operation (that is, the hash table is collision-free). This is rarely the case. In the worst case, every value might go in the same bucket.

Hashing functions and the collision strategy used by modern languages and libraries are efficient and should be used. The only thing to be careful of is using custom objects as a key without specifying a custom hashing solution.

blog comments powered by Disqus