HashSet

15 points

A Set is a data structure that is not ordered. When an item is added to a Set, the Set (not the user) decides where to store it. A Set also does not allow duplicates. If an item is present in a Set and the user tries to add an identical item to the Set, the attempt will be rejected. A Set allows for rapid, efficient search to see if a given item is present in the Set. Insertion and deletion of items is also quick with a Set.

One way to implement a Set is with a hash table, and the resulting structure is called a HashSet. Using a hash table, or hashing, involves assigning a number to a particular item. The item is then stored in that position in the Set. For example, if hashing an item leads to the number 2010, then the item would be stored in position 2010 in the Set. It is possible that more than one item will have the hash number 2010; this is called a 'collision'. If 2 or more items hash to 2010, then a list of items with the hash number 2010 is stored at position 2010. If the set has fewer than 2010 elements, modular arithmetic can be used to reduce a hash number so that the item can be stored in the available space. If the Set has only 1000 positions, for example, the calculation 2010 mod 1000 is carried out, yielding 10, and the item could be stored in position 10. Section 16.4 gives examples of hashing.

In this program, the top 250 movies of all time, according to imdb.com as of April 2010, are added to a HashSet. The list of movies actually has 253 entries, because I listed movies #100, #200 and #250 twice. This way, you can test the code you write that prevents duplicates in a Set. You'll use both a homegrown HashSet and the HashSet class in the Java library. The two HashSets should behave similarly: they should both reject duplicates, print all 250 movies, and successfully find and delete a movie. The two HashSet classes are not expected to store the 250 movies in the same buckets. Using a HashSet, the list of movies can be efficiently searched. Insertion and deletion of a movie is also efficient.

 

Starting Point

You can download HashSet2.java, HashSetMain.java, Movie.java and Top250Movies.dat to use as starting points in writing your program. Right-click on the links, choose 'Save Target As' and add the '.java' extension to each file name.  Each Java file contains instructions on writing code.