Augustana logo

COMPUTING SCIENCE 370
Programming Languages


Programming Assignment 1 -- Lisp

The K-Nearest-Neighbours Problem



Due Date: Friday, October 15, by midnight.

Objectives

Background

A common problem in pattern recognition is that of classifying a new data item relative to previously classified data items. Typically, each data item is described by a feature vector, a list of values, each of which specifies the measure of some feature of the data item. For example, we might describe a person by listing their age, height, weight, hair colour, and eye colour.

One way to classify a new data item is to compare its feature vector with the feature vectors of all the previously classified data items in order to find the data item that is most similar to the new data item. If a feature vector is considered to be the coordinates of a data point in an n-dimensional Euclidean space, we can compare two feature vectors by calculating the Euclidean distance between them. For example, to calculate the distance between the feature vectors describing two persons, we would sum the squares of the differences between the respective persons' ages, heights, weights, hair colours, and eye colours. In order for this to work well, we would have had to ensure that the values used to encode hair and eye colours actually worked as a distance measure; that is, similar colours should be given values that are close to each other.

Better classification is usually achieved by finding more than one of the nearest neighbours in the feature space of the new data item. This is called the k-nearest-neighbour (KNN) method.

Assignment

Write a set of Lisp functions that find the k nearest neighbours of a given feature vector in a set of feature vectors.

The top-level function must be named knn, and it must accept three parameters: k, the number of nearest neighbours to be found; the feature vector of the new data item; and a list of feature vectors from which the nearest neighbours are to be found. The function should return a list (of length k) of the feature vectors from the list provided as the function's third parameter that are closest, in a Euclidean sense, to the feature vector provided as the second argument to the function.

You should also provide a function named print_knn that formats the list of nearest neighbour feature vectors so that they are displayed one per line, with no leading or trailing parentheses.

In order to make it easier for a human reader to check the output from the knn function, all feature vectors will consist of an alphabetic identifier followed by an ordered sequence of numbers. (Make sure that your distance-calculating function does not try to calculate the difference between two alphabetic identifiers.) For example, the following might be a valid feature vector:

(v 1 8 3 -2 7 -4)

Your implementation is subject to the constraint that it should illustrate pure functional programming; that is, it should not use set, setf, setq, or any other binding primitive. You may use set, etc., for the purposes of testing your functions, but not in any of the functions that form part of your solution to the problem. For example, you could include the following lines at the end of the file containing your Lisp functions:

  ( setq v '(v 1 8 3 -2 7 -4))
   ( setq c '(
	(a 2 -1 6 3 -7 3)
	(b -1 7 -2 1 3 -2)
	(c 0 9 4 0 6 -3)
	(d 6 -1 3 -4 3 1)
	(e 1 6 -1 3 3 -1)
	(f 3 8 5 1 5 -3)
	(g 4 5 2 -1 4 0)
	(h -2 6 5 -8 4 2)))

You could then test your program with the function calls:

  (print_knn (knn 5 v c))

The result would then be:

  E 1 6 -1 3 3 -1
  B -1 7 -2 1 3 -2
  G 4 5 2 -1 4 0
  F 3 8 5 1 5 -3
  C 0 9 4 0 6 -3
 

You may, of course, define any other subsidiary functions that you might find useful. For example, a function that calculates the distance between two feature vectors (ignoring the leading identifier) might be useful. You might also want a function that finds the single nearest neighbour to the target feature vector in a set of feature vectors; you could then solve the KNN problem by finding the nearest neighbour, then finding the nearest neighbour from the set of feature vectors with the first nearest neighbour removed, and so on.


Submission

Please hand in a hardcopy listing of your solution to this assignment (to ensure that the formatting which you carefully prepared is preserved) and submit the source code for your solution using the following form:

A set of LISP functions to solve the k-nearest neighbour problem



Web Password

Your assignment submissions are password-protected. These passwords apply only to form data submitted via the Web server. They are separate from (and typically different from) your network password or the password for your Unix account (if you have one).



Grading

Your solution should display the following characteristics:

  1. Correctness - The program should conform to the specifications for which it was written. It should include correct handling of special cases, error conditions, etc., except that you may assume that input will be provided in the specified format.

  2. Design and Efficiency - The program should be constructed from small, coherent, independent and loosely coupled functions. Each function should access only its own parameters. The control constructs and data structures used should be those appropriate to the problem at hand. The program should not perform unnecessary steps, use extraneous variables, nor implement the algorithm in a contorted or inefficient way.

  3. Style and Documentation - The program should conform to generally accepted principles of style, such as a consistent pattern of indentation, use of meaningful identifiers, generous use of space, etc. Internal documentation should include program and function headers, and in-line comments to clarify the code.

  4. Knowledge of the language - Your program should provide evidence of your familiarity with the principal control constructs, operators, built-in functions, and data structuring facilities of the assigned language.

In the case of Lisp, you should also demonstrate your grasp of the principles of functional (applicative) programming. Comments should be prefixed according to the standard practice for Lisp.

Weightings

The weightings of the various factors for grading will be approximately as follows:

     Correctness 50%
  Design and Efficiency 20%
  Style and Documentation 20%
  Idiomatic use of the Lisp language    10%


Late Submissions

Assignments will be accepted after the due date and time, but late submissions will be assessed a penalty of 1% per hour or portion of an hour.

Copyright © 2004 Jonathan Mohr