Download the following file to a location in your home directory tree on a computer in the senior lab:
You may also need the following large file of words in alphabetical order (depending on what your Linux distribution offers):
Create a new Java project in Eclipse — call it BinSearch.
With the new project name highlighted, select the "Import" option from the File menu or by right-clicking the project name. Choose the "Archive file" option, and navigate to where you saved BinSearch.zip. Importing this archive file should create all the files you need in the BinSearch project, which is divided into two packages: intArrayBinSearch and stringArrayBinSearch.
Study the file BinarySearch.java in the intArrayBinSearch package. Note that the search method is a non-recursive helper method that simply calls the recursive method binarySearch with appropriate arguments so that the search for a target integer will start by considering the full range of elements in the array of integers to be searched. Compare it to the algorithm for binary search given in pseudo-code on page 395 of the textbook. Except for the fact that we are searching a simple array of integers rather than an array list of key-value pairs, you should see that the code is almost identical to the pseudo-code. (There are some trivial differences such as variable renaming, the order of parameters, and the way the comparison of values is done).
Now study the file DumbBinarySearch.java in the same package. What is the main difference between the binarySearch method in this class and that in the BinarySearch class? Try hand-executing this algorithm on the array of integers illustrated on page 396 of the textbook. Does it work for find(22)? What happens when you try find(25)? Why do you think I named this class DumbBinarySearch? Did you notice the subtle difference between the last line of the binarySearch method of this class and the corresponding line in the same method of the BinarySearch class? Is this difference necessary or not? Why or why not?
Which binary search algorithm do you think will perform faster? Why?
Study the file SearchTimes.java in the intArrayBinSearch package. Observe how the array of integers is filled with even integers in ascending order in the main method. Unless you change the value of the ORDERED_PROBES constant, the orderedProbes method will be executed when you run this program. Note how it searches for both even and odd integers, so half of its searches should be successful and half unsuccessful. Note that it calculates how much time it takes to perform one million searches using "regular" binary search and to perform another million searches using "dumb" binary search.
Run the SearchTimes program. Run it several times to see how consistent or inconsistent the results are. Do the results confirm your speculation about which algorithm would run faster?
Now look at the files in the stringArrayBinSearch package. The search algorithms in this package are the same as those in the intArrayBinSearch package, but are revised to search an ArrayList of Strings (through the List<String> interface). The WordList class constructs a list of more than 38,000 words. (You may have to revise the DEFAULT_WORD_FILE constant so it corresponds to the location of the list of words provided with your version of Linux. Alternatively, you can specify a filename as a command-line argument when running the program. On Berio, and probably on the computers in the senior computing lab, the words file is found at /usr/share/dict/words.) It also constructs a list of the same size of non-words, by changing one randomly chosen letter in each word to an 'x'. The SearchTimes program searches the word list for each actual word and each non-word, so it should also have an equal number of successful and unsuccessful searches.
Run the SearchTimes program from the stringArrayBinSearch package, several times. What do you notice about the performance of the two binary search algorithms on an array of strings compared with their performance on an array of integers? How do you explain the difference?
The implementation of binary search given in the textbook uses trichotomous comparison — a three-way decision between less-than, equal-to, and greater-than, which requires two comparisons on every probe, but which quits as soon as the target value is found.
An alternative implementation uses dichotomous comparison — it makes only one comparison on each probe, deciding only between less-than and greater-than-or-equal-to. This implementation must continue iterating or recurring until the lower and upper limits meet or cross, and then must check at the end whether the desired value has been found or not.
Draw a picture that illustrates how binary search works. Your picture might resemble a binary search tree, or a decision tree in which each node corresponds to one of the comparisons made by the algorithm. If you draw a binary search tree that corresponds to the order in which array elements are probed by a binary search of an ordered array, the tree will have the same shape for both variants of binary search. If you draw a decision tree, the tree for trichotomous search will have more decision nodes than that for dichotomous search. How often does the "regular" binary search algorithm terminate before reaching the lowest level of the tree? How often does it terminate before reaching the second-lowest level? How often does it terminate at the root? Can you determine a formula that will characterize the performance of "regular" binary search on an array of n elements? Would the formula be different for successful versus unsuccessful search? Can you determine the formula for "dumb" binary search? Is it different for successful and unsuccessful searches?
What is the main difference between searching for integers and searching for strings? How does that explain the difference, if any, in the performance of the two binary search algorithms on integers and on strings?
What is the main advantage of trichotomous binary search as compared with the dichotomous variety? What is the main advantage of dichotomous search? How important is the difference between them?
Copyright © 2005 Jonathan Mohr