CLRS Linked Lists

11.2 Hash tables

[Figure 11.3: the universe U of keys, the subset K of actual keys, and keys k1, ..., k8 hashing into the slots of table T.]

Figure 11.3  Collision resolution by chaining. Each hash-table slot T[j] contains a linked list of all the keys whose hash value is j. For example, h(k1) = h(k4) and h(k5) = h(k7) = h(k2). The linked list can be either singly or doubly linked; we show it as doubly linked because deletion is faster that way.
There is one hitch: two keys may hash to the same slot. We call this situation a collision. Fortunately, we have effective techniques for resolving the conflict created by collisions.

Of course, the ideal solution would be to avoid collisions altogether. We might try to achieve this goal by choosing a suitable hash function h. One idea is to make h appear to be "random," thus avoiding collisions or at least minimizing their number. The very term "to hash," evoking images of random mixing and chopping, captures the spirit of this approach. (Of course, a hash function h must be deterministic in that a given input k should always produce the same output h(k).) Because |U| > m, however, there must be at least two keys that have the same hash value; avoiding collisions altogether is therefore impossible. Thus, while a well-designed, "random"-looking hash function can minimize the number of collisions, we still need a method for resolving the collisions that do occur.

The remainder of this section presents the simplest collision resolution technique, called chaining. Section 11.4 introduces an alternative method for resolving collisions, called open addressing.
Collision resolution by chaining
In chaining, we place all the elements that hash to the same slot into the same linked list, as Figure 11.3 shows. Slot j contains a pointer to the head of the list of all stored elements that hash to j; if there are no such elements, slot j contains NIL.
The dictionary operations on a hash table T are easy to implement when collisions are resolved by chaining:

CHAINED-HASH-INSERT(T, x)
1  insert x at the head of list T[h(x.key)]

CHAINED-HASH-SEARCH(T, k)
1  search for an element with key k in list T[h(k)]

CHAINED-HASH-DELETE(T, x)
1  delete x from the list T[h(x.key)]

The worst-case running time for insertion is O(1). The insertion procedure is fast in part because it assumes that the element x being inserted is not already present in the table; if necessary, we can check this assumption (at additional cost) by searching for an element whose key is x.key before we insert. For searching, the worst-case running time is proportional to the length of the list; we shall analyze this operation more closely below. We can delete an element in O(1) time if the lists are doubly linked, as Figure 11.3 depicts. (Note that CHAINED-HASH-DELETE takes as input an element x and not its key k, so that we don't have to search for x first. If the hash table supports deletion, then its linked lists should be doubly linked so that we can delete an item quickly. If the lists were only singly linked, then to delete element x, we would first have to find x in the list T[h(x.key)] so that we could update the next attribute of x's predecessor. With singly linked lists, both deletion and searching would have the same asymptotic running times.)
Analysis of hashing with chaining
How well does hashing with chaining perform? In particular, how long does it take to search for an element with a given key?

Given a hash table T with m slots that stores n elements, we define the load factor α for T as n/m, that is, the average number of elements stored in a chain. Our analysis will be in terms of α, which can be less than, equal to, or greater than 1.

The worst-case behavior of hashing with chaining is terrible: all n keys hash to the same slot, creating a list of length n. The worst-case time for searching is thus Θ(n) plus the time to compute the hash function, no better than if we used one linked list for all the elements. Clearly, we do not use hash tables for their worst-case performance. (Perfect hashing, described in Section 11.5, does provide good worst-case performance when the set of keys is static, however.)

The average-case performance of hashing depends on how well the hash function h distributes the set of keys to be stored among the m slots, on the average.
Section 11.3 discusses these issues, but for now we shall assume that any given element is equally likely to hash into any of the m slots, independently of where any other element has hashed to. We call this the assumption of simple uniform hashing.

For j = 0, 1, ..., m - 1, let us denote the length of the list T[j] by n_j, so that

    n = n_0 + n_1 + ... + n_{m-1},                                  (11.1)

and the expected value of n_j is E[n_j] = α = n/m.

We assume that O(1) time suffices to compute the hash value h(k), so that the time required to search for an element with key k depends linearly on the length n_{h(k)} of the list T[h(k)]. Setting aside the O(1) time required to compute the hash function and to access slot h(k), let us consider the expected number of elements examined by the search algorithm, that is, the number of elements in the list T[h(k)] that the algorithm checks to see whether any have a key equal to k. We shall consider two cases. In the first, the search is unsuccessful: no element in the table has key k. In the second, the search successfully finds an element with key k.
Theorem 11.1
In a hash table in which collisions are resolved by chaining, an unsuccessful search takes average-case time Θ(1 + α), under the assumption of simple uniform hashing.
Proof  Under the assumption of simple uniform hashing, any key k not already stored in the table is equally likely to hash to any of the m slots. The expected time to search unsuccessfully for a key k is the expected time to search to the end of list T[h(k)], which has expected length E[n_{h(k)}] = α. Thus, the expected number of elements examined in an unsuccessful search is α, and the total time required (including the time for computing h(k)) is Θ(1 + α).

The situation for a successful search is slightly different, since each list is not equally likely to be searched. Instead, the probability that a list is searched is proportional to the number of elements it contains. Nonetheless, the expected search time still turns out to be Θ(1 + α).
Theorem 11.2
In a hash table in which collisions are resolved by chaining, a successful search takes average-case time Θ(1 + α), under the assumption of simple uniform hashing.
Proof  We assume that the element being searched for is equally likely to be any of the n elements stored in the table. The number of elements examined during a successful search for an element x is one more than the number of elements that appear before x in x's list. Because new elements are placed at the front of the list, elements before x in the list were all inserted after x was inserted. To find the expected number of elements examined, we take the average, over the n elements x in the table, of 1 plus the expected number of elements added to x's list after x was added to the list. Let x_i denote the i-th element inserted into the table, for i = 1, 2, ..., n, and let k_i = x_i.key. For keys k_i and k_j, we define the indicator random variable X_ij = I{h(k_i) = h(k_j)}. Under the assumption of simple uniform hashing, we have Pr{h(k_i) = h(k_j)} = 1/m, and so by Lemma 5.1, E[X_ij] = 1/m. Thus, the expected number of elements examined in a successful search is

    E[ (1/n) Σ_{i=1}^{n} ( 1 + Σ_{j=i+1}^{n} X_ij ) ]
        = (1/n) Σ_{i=1}^{n} ( 1 + Σ_{j=i+1}^{n} E[X_ij] )    (by linearity of expectation)
        = (1/n) Σ_{i=1}^{n} ( 1 + Σ_{j=i+1}^{n} 1/m )
        = 1 + (1/(nm)) Σ_{i=1}^{n} (n - i)
        = 1 + (1/(nm)) ( Σ_{i=1}^{n} n - Σ_{i=1}^{n} i )
        = 1 + (1/(nm)) ( n^2 - n(n+1)/2 )                    (by equation (A.1))
        = 1 + (n - 1)/(2m)
        = 1 + α/2 - α/(2n).

Thus, the total time required for a successful search (including the time for computing the hash function) is Θ(2 + α/2 - α/(2n)) = Θ(1 + α).
What does this analysis mean? If the number of hash-table slots is at least proportional to the number of elements in the table, we have n = O(m) and, consequently, α = n/m = O(m)/m = O(1). Thus, searching takes constant time on average. Since insertion takes O(1) worst-case time and deletion takes O(1) worst-case time when the lists are doubly linked, we can support all dictionary operations in O(1) time on average.
