1.CSE 373: Data Structures &amp; Algorithms Hash Tables Riley Porter Winter 2017 Winter 2017 CSE373: Data Structures &amp; Algorithms

2.2 Course Logistics HW2 clarification posted on spec: buildQueue replaces elements, not adds them. Weekly Summaries changed to Topic Summaries, first one out on amortized runtime, rest out soon.

3.Review + Motivating Hash Tables insert find delete Unsorted linked-list Unsorted array Sorted linked list Sorted array BST AVL Tree Magic Array? 3 O(1) O(n) O(n) O(1) O(n) O(n) O(n) O(n) O(n) O(n) O( logn ) O(n) Winter 2017 O(n) O(n) O(n) O( logn ) O( logn ) O( logn ) O(1) O(1) O(1) Sufficient “magic”: Use key to compute array index for an item in O (1) time [doable] Have a different index for every item [magic]

4.Motivating Hash Tables Let’s say you are tasked with counting the frequency of integers in a text file. You are guaranteed that only the integers 0 through 100 will occur: For example : 5, 7, 8, 9, 9, 5, 0, 0, 1, 12 Result: 0  2 1  1 5  2 7  1 8  1 9  2 What structure is appropriate? Tree? List? Array? 4 CSE373: Data Structures &amp; Algorithms 2 1 2 1 1 2 0 1 2 3 4 5 6 7 8 9

5.5 Motivating Hash Tables Now what if we want to associate name to phone number? Suppose keys are first, last names how big is the key space? What if we could map a large set of keys to a small amount of space? Like mapping all possible strings to the set of numbers from 1 to 100?

6.6 Hash Functions Maps any key to a number result should be constrained to some range passing in the same key should always give the same result Keys should be distributed over a range very bad if everything hashes to 1! should "look random" How would we write a hash function for String objects?

7.Hash Functions An ideal hash function: Fast to compute “Rarely” hashes two “used” keys to the same index Often impossible in theory but easy in practice Will handle collisions later 7 CSE373: Data Structures &amp; Algorithms 0 … TableSize –1 hash function: index = h(key) hash table key space (e.g., integers, strings)

8.Using hash functions for Hash Tables Aim for constant-time (i.e., O (1)) find , insert , and delete “On average” under some often-reasonable assumptions A hash table is an array of some fixed size Basic idea: 8 CSE373: Data Structures &amp; Algorithms 0 … TableSize –1 hash function: index = h(key) hash table key space (e.g., integers, strings)

9.Example of Hash Table used for Dictionary of phone numbers

10.Hash Tables vs. Balanced Trees In terms of a Dictionary ADT for just insert , find , delete , hash tables and balanced trees are just different data structures Hash tables O (1) on average ( assuming we follow good practices) Balanced trees O ( log n ) worst-case Constant-time is better, right? Yes, but you need “hashing to behave” (must avoid collisions) Yes, but your data will not be stored in a sorted order Yes, but findMin , findMax , predecessor , and successor go from O ( log n ) to O ( n ), printSorted from O ( n ) to O ( n log n ) 10 CSE373: Data Structures &amp; Algorithms

11.Examples of Hash Tables Really, any dictionary or collection that doesn’t need to be sorted. Compiler: All possible identifiers allowed by the language vs. those used in some file of one program Database: All possible student names vs. students enrolled AI: All possible chess-board configurations vs. those considered by the current player 11 CSE373: Data Structures &amp; Algorithms

12.12 Review: Mod Mod is the remainder function To keep hashed values within the size of the table, we will generally do: h(K) = function(K) % TableSize As a reference, useful properties of mod: (a + b) % c = [(a % c) + (b % c)] % c (a b) % c = [(a % c) (b % c)] % c a % c = b % c → (a – b) % c = 0 Show 24 +/* 57 = 4 +/ 7

13.13 Example: Simple Integer Hash Function key space K = integers TableSize = 7 h(K) = K % 7 Insert : 7, 18, 41 0 1 2 3 4 5 6 7 18 41,34 7 18 41

14.14 Designing Hash Functions Often based on modular hashing : h(K) = f(K) % P P is typically the TableSize P is often chosen to be prime: Reduces likelihood of collisions due to patterns in data Is useful for guarantees on certain hashing strategies (as we’ll see) Equivalent objects MUST hash to the same location

15.14 Designing Hash Functions Often based on modular hashing : h(K) = f(K) % P P is typically the TableSize P is often chosen to be prime: Reduces likelihood of collisions due to patterns in data Is useful for guarantees on certain hashing strategies (as we’ll see) Equivalent objects MUST hash to the same location

16.16 Java String’s hashCode The hashCode function inside String objects could look like this: public int hashCode () { int hash = 0; for ( int i = 0; i &lt; this.length (); i ++) { hash = 31 * hash + this.charAt ( i ) ; } return hash; } As with any general hashing function, collisions are possible. Example: " Ea " and "FB" have the same hash value. Early versions of Java examined only the first 16 characters. For some common data this led to poor hash table performance.

17.Hashing your own Objects For objects with several fields, usually best to have most of the “identifying fields” contribute to the hash to avoid collisions Example: public class Person { private String first; private String middle; private String last; private Date birthdate; } An inherent trade-off: hashing-time vs. collision-avoidance Bad idea(?): Use only first name Good idea(?): Use only middle initial? Combination of fields? Admittedly, what-to-hash-with is often unprincipled  17 CSE373: Data Structures &amp; Algorithms

18.Note: Equal Objects MUST hash the same The Java library makes a very important assumption that clients must satisfy… If a.equals (b) , then we require a.hashCode () == b.hashCode () If you ever override equals You need to override hashCode also in a consistent way See CoreJava book, Chapter 5 for other "gotchas" with equals 18

19.Collisions collision : When hash function maps 2 values to same index. h(k) = k % tableSize ; set.add (11); index 0 1 2 3 4 5 6 7 8 9 value 0 0 0 0 0 0 0 0 0 0

20.Collisions collision : When hash function maps 2 values to same index. h(k) = k % tableSize ; set.add (11); set.add (49) ; index 0 1 2 3 4 5 6 7 8 9 value 0 11 0 0 0 0 0 0 0 0

21.Collisions collision : When hash function maps 2 values to same index. h(k) = k % tableSize ; set.add (11); set.add (49); set.add (24); index 0 1 2 3 4 5 6 7 8 9 value 0 11 0 0 0 0 0 0 0 49

22.Collisions collision : When hash function maps 2 values to same index. h(k) = k % tableSize ; set.add (11); set.add (49); set.add (24); set.add (7) ; index 0 1 2 3 4 5 6 7 8 9 value 0 11 0 0 2 4 0 0 0 0 49

23.Collisions collision : When hash function maps 2 values to same index. h(k) = k % tableSize ; set.add (11); set.add (49); set.add (24); set.add (7); set.add (54); index 0 1 2 3 4 5 6 7 8 9 value 0 11 0 0 2 4 0 0 7 0 49

24.Collisions collision : When hash function maps 2 values to same index. h(k) = k % tableSize ; set.add (11); set.add (49); set.add (24); set.add (7); set.add (54); // collides with 24! index 0 1 2 3 4 5 6 7 8 9 value 0 11 0 0 54 0 0 7 0 49

25.Collision resolution Collision : When two keys map to the same location in the hash table We try to avoid it, but number-of-keys exceeds table size So hash tables should support collision resolution Ideas? 25 CSE373: Data Structures &amp; Algorithms

26.Probing probing : Resolving a collision by moving to another index. linear probing : Moves to the next index. set.add (11); set.add (49); set.add (24); set.add (7); set.add (54); // collides with 24; must probe Is this a good approach? variation: quadratic probing moves increasingly far away index 0 1 2 3 4 5 6 7 8 9 value 0 11 0 0 24 54 0 7 0 49

27.Linear Probing Example If h(key) is already full, try (h(key) + 1) % TableSize . If full, try (h(key) + 2) % TableSize . If full, try (h(key) + 3) % TableSize . If full… Example: insert 38, 19, 8, 109, 10 27 0 / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 38 9 /

28.Linear Probing Example 28 0 / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 38 9 19 If h(key) is already full, try (h(key) + 1) % TableSize . If full, try (h(key) + 2) % TableSize . If full, try (h(key) + 3) % TableSize . If full… Example: insert 38, 19, 8, 109, 10

29.Linear Probing Example If h(key) is already full, try (h(key) + 1) % TableSize . If full, try (h(key) + 2) % TableSize . If full, try (h(key) + 3) % TableSize . If full… Example: insert 38, 19, 8, 109, 10 29 0 8 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 38 9 19