Augustana Faculty logo and wordmark


COMPUTING SCIENCE 210
Algorithm Analysis and Data Structures


Programming Assignment 4
Zipf's Law and the Map Interface



Due Date: Wednesday, December 7 (by midnight)

Objectives

Assignment

Write a Java program named WordCount that explores Zipf's Law as it applies to textual data by reading (scanning) text input from the standard input, counting the number of occurrences of each input word, and reporting the 100 top-ranked words to the standard output.

Your program can be based on the WordCount program given on p. 388 of the textbook, but must be revised to use the interfaces and classes of the Java Collections Framework instead of the net.datastructures classes. In particular, you should write your program to the Java Map interface, appropriately parameterized to do word counts.

You should ensure that your program is written to the Map interface by testing it with two implementations of that interface — the HashMap and TreeMap implementations from the Java Collections Framework — by changing only one line (actually, half a line) of your program. Time your program (using the Unix time command) using each implementation; which one gives faster results?

Ensure that your method for getting the word-frequency pairs into the correct ranked order by frequency operates in time O(n log n), not O(n2).

Interfaces and Exceptions

The Javadoc documentation for the following Java interface and collection classes are provided here for your convenience:

The full set of documentation for the Java API's is available at:

Hints

You may need to create a class that implements the parameterized Map.Entry interface, each object containing a word and its associated word count, in order to be able to instantiate an array of word-frequency entries retrieved from the map.

You may also need to implement the Comparator interface (appropriately parameterized) to provide a customized comparator that the Arrays.sort() method can use to sort the entries by frequency.

There are many sources for text input for your program. One example, which is conveniently and freely available in plain text format, is the KJV Bible, from Project Gutenberg:

. The Bible, King James Version — available both in plain text (4.24 MB) and zipped plain text (1.34 MB)

Submission

Please hand in a hardcopy listing of your solution to this assignment (to ensure that the formatting which you carefully prepared is preserved) and submit the source for your solution using the following form. You may use the same form to submit multiple files if your solution involves more than one file. You may also submit a solution multiple times if you wish to correct errors in previous submissions. Your last submissions will be accepted for grading.

A source file (*.java)

Late Submissions

Assignments will be accepted after the due date and time, but late submissions will be assessed a penalty of 1% per hour or portion of an hour.


Web Password

Your assignment submissions are password-protected. These passwords apply only to form data submitted via the Web server. They are separate from (and typically different from) your network password or the password for your Unix account. To change your Web password, press the following button:

Copyright © 2005 Jonathan Mohr