Logo of the Augustana Faculty of the University of Alberta

COMPUTING SCIENCE 210
Algorithm Analysis and Data Structures


Heapsort



Some terms pertaining to binary trees:

proper
A binary tree in which each node has either zero or two children; or, equivalently, each internal (non-leaf) node has two children.
full
A binary tree in which every level, except possibly the lowest, contains the maximum possible number of nodes; consequently, all the leaves are on the same level or on two adjacent levels.
Note: This definition differs from that given by Goodrich and Tamassia, who regard full as a synonym for proper.
leftmost
A binary tree in which, on each level, all the internal nodes are to the left of all the external nodes (leaves), and if a node has only one child, it is a left child.
A node v is to the left of node w in tree T if v is neither an ancestor nor a descendant of w and v is encountered before w in an inorder traversal of T.
complete
A binary tree that is full and leftmost.
Note that a complete binary tree will have at most one node with a single (left) child.
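
As an illustration only, the definitions of proper and complete can be checked mechanically. The sketch below assumes a hypothetical Node class (not part of these notes) and tests completeness with a level-order traversal: in a complete tree, no node may appear after the first missing child.

```python
from collections import deque

class Node:
    """Hypothetical binary tree node used only for this illustration."""
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def is_proper(root):
    """True if every node has either zero or two children."""
    if root is None:
        return True
    if (root.left is None) != (root.right is None):
        return False  # exactly one child
    return is_proper(root.left) and is_proper(root.right)

def is_complete(root):
    """True if the tree is full and leftmost (level-order check)."""
    queue = deque([root])
    seen_gap = False
    while queue:
        node = queue.popleft()
        if node is None:
            seen_gap = True          # first missing position
        else:
            if seen_gap:
                return False         # a node appears after a gap
            queue.append(node.left)
            queue.append(node.right)
    return True
```

A root whose only child is a left child is complete but not proper; a root with a leaf on the left and an internal node on the right is proper but not complete.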

Heap-Order Property

A tree in which each node stores a key from a set of keys on which a total order relation is defined satisfies the heap-order property if, for every internal node v, the key stored at v is less than or equal to the key stored at each of its children.

More generally, each node can store a key-value pair, such that the total ordering is defined on the key portion of the tuple stored at each node.

This is the heap-order property for a min-heap. One consequence of this definition is that the root of the tree stores the minimum key of all the keys in the tree (or one of the keys of minimum value); colloquially, the key of minimum value is always at the "top of the heap".

Assuming that the total order relation is defined by a comparator, it is an easy matter to reverse the comparison operation so that the maximal key is stored at the root instead, yielding a max-heap.
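
The property, and the effect of reversing the comparator, can be sketched as a small check. This is an illustration under assumptions that are not in the notes: a tree is written as a nested (key, children) pair, and the parameter le stands in for the comparator.

```python
import operator

def has_heap_order(node, le=operator.le):
    """node is a (key, children) pair; le is the comparator (an assumption).

    Returns True if every node's key is 'le' each of its children's keys.
    """
    key, children = node
    return all(le(key, child[0]) and has_heap_order(child, le)
               for child in children)

min_heap = (1, [(3, [(7, []), (4, [])]), (2, [(5, [])])])
print(has_heap_order(min_heap))               # True: min-heap order holds
print(has_heap_order(min_heap, operator.ge))  # False: not a max-heap
```

Passing operator.ge instead of operator.le is the "reversed comparison" that turns the min-heap condition into the max-heap condition.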

Heap

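A heap is:

  1. a complete binary tree
  2. in which the keys stored at the nodes satisfy the heap-order property.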

One consequence of the first characteristic of a heap — the fact that it is a complete binary tree — is that it can be stored very efficiently in a contiguous representation (i.e., an array). If a heap of size n is stored in elements 1…n of an array, then the left child of an internal node at index i is stored at index 2i and its right child, if it exists, is stored at index 2i+1.

In a deficient language such as C, C++, or Java that does not allow arrays to begin at indices other than 0, a heap of size n is stored in elements 0…n-1 of an array; the left child of an internal node at index i is then stored at index 2i+1 and its right child, if it exists, at index 2i+2. It is, of course, possible to store a heap of size n in an array of size n+1 and to use only elements 1…n; however, if one intends to use a heap as part of an in-place sorting algorithm, as explained below, one may have no choice but to use zero-based indexing in order to avoid having to move all the elements of the array one position to the left at the end of the sort.
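
The index arithmetic for both conventions can be written out as follows. This is a minimal sketch; the function names are illustrative and not from the notes, and the parent formulas are simply the inverses of the child formulas given above.

```python
def children_one_based(i, n):
    """Indices of the children of node i in a heap stored in A[1..n]."""
    return [j for j in (2 * i, 2 * i + 1) if j <= n]

def parent_one_based(i):
    """Index of the parent of node i > 1 (inverse of 2i and 2i+1)."""
    return i // 2

def children_zero_based(i, n):
    """Indices of the children of node i in a heap stored in A[0..n-1]."""
    return [j for j in (2 * i + 1, 2 * i + 2) if j < n]

def parent_zero_based(i):
    """Index of the parent of node i > 0 (inverse of 2i+1 and 2i+2)."""
    return (i - 1) // 2

print(children_one_based(3, 6))    # [6]: node 3 has only a left child
print(children_zero_based(2, 6))   # [5]: the same node, zero-based
```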

A heap is an excellent data structure with which to implement a priority queue. Due to the second characteristic of a heap (the heap-order property), the highest-priority element is always stored at the root of the tree. Because the binary tree on which the heap is represented is complete, the height of a heap of size n is floor( log₂ n ); this means that both operations defined on a priority queue, insert and removeMin, can be implemented to operate in time O(log n), since each moves a key along a single path between the root and a leaf.
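
A minimal sketch of a min-heap priority queue follows, using the 1-based array layout described above (slot 0 is left unused). The class name, the method name remove_min, and the use of a Python list are assumptions made for illustration. Insertion restores heap order by moving the new key up toward the root, and removal by moving the relocated last key down, so each performs at most one swap per level.

```python
class MinHeapPQ:
    """Illustrative priority queue; keys only, 1-based array layout."""

    def __init__(self):
        self.a = [None]              # index 0 unused

    def __len__(self):
        return len(self.a) - 1

    def insert(self, key):
        a = self.a
        a.append(key)                # new last leaf keeps the tree complete
        i = len(a) - 1
        while i > 1 and a[i] < a[i // 2]:   # up-heap: swap with parent
            a[i], a[i // 2] = a[i // 2], a[i]
            i //= 2

    def remove_min(self):
        a = self.a
        if len(a) == 1:
            raise IndexError("remove_min from an empty heap")
        smallest = a[1]
        last = a.pop()               # remove the last leaf
        n = len(a) - 1
        if n >= 1:
            a[1] = last              # move it to the root, then down-heap
            i = 1
            while 2 * i <= n:
                j = 2 * i
                if j < n and a[j + 1] < a[j]:
                    j += 1           # j is the smaller child
                if a[i] <= a[j]:
                    break
                a[i], a[j] = a[j], a[i]
                i = j
        return smallest

pq = MinHeapPQ()
for k in [5, 3, 8, 1]:
    pq.insert(k)
print([pq.remove_min() for _ in range(4)])   # [1, 3, 5, 8]
```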

Heapsort

Given an array of elements to be sorted, it is easy to sort them in place using a contiguous implementation of a heap representing a priority queue. The sort proceeds in two phases:

  1. Transform the array of elements into a max-heap by moving elements of the array as needed to establish the heap property.
  2. Sort the elements by repeatedly extracting the maximum value from the priority queue (the value at the root of the heap), restoring the heap property on the heap that has now shrunk one element in size, and storing the most recently extracted value in the element of the array vacated by the shrinking heap.

Assuming (without loss of generality) that the n values to be sorted are stored in elements 1…n of an array A, the following pseudo-code describes the in-place heapsort algorithm using a recursive procedure downheap to establish the heap property for a subtree. The first phase uses the O(n) bottom-up method of heap construction due to Floyd.


   procedure downheap( int i, int last )
      /*
         Assumes that the heap property holds for all descendants of
         element i, and makes the property hold for element i as well.
      */
      int j = 2 * i   // Set j to the index of the left child of i.

      if j < last then
         if A[j].key < A[j+1].key then  // change to > for a min-heap
            j = j + 1

      if j <= last then
         if A[i].key < A[j].key then    // change to > for a min-heap
            swap( A[j], A[i] )
            downheap( j, last )

   procedure heapsort
      /* 
         First, establish the heap property by bottom-up 
         heap construction. 
      */
      for i = floor( n / 2 ) downto 1 do   // last internal node down to the root
         downheap( i, n )

      /* 
         Next, place the largest element last, the second-largest
         second-last, etc.
      */
      for i = n downto 2 do
         swap( A[1], A[i] )
         downheap( 1, i-1 )   // heap shrinks by 1

Note: The Pascal-like construction

   for i = n downto m do

would be implemented in C, C++, or Java as

   for ( i = n; i >= m; i-- )
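
For comparison, here is a sketch of the same algorithm in Python using zero-based indexing, as discussed earlier, so that the array is sorted in place without any shifting. It is a translation under stated assumptions rather than part of the original notes: elements are compared directly instead of through a .key field, downheap is written iteratively rather than recursively, and the array is passed as a parameter.

```python
def downheap(a, i, last):
    """Sift a[i] down within a[0..last], assuming both subtrees of i
    already satisfy the max-heap property."""
    while True:
        j = 2 * i + 1                      # left child (zero-based)
        if j > last:
            return                         # i is a leaf
        if j < last and a[j] < a[j + 1]:
            j += 1                         # right child is larger
        if a[i] >= a[j]:
            return                         # heap property already holds
        a[i], a[j] = a[j], a[i]
        i = j

def heapsort(a):
    """Sort the list a in place into ascending order."""
    n = len(a)
    # Phase 1: bottom-up heap construction, starting at the last internal node.
    for i in range(n // 2 - 1, -1, -1):
        downheap(a, i, n - 1)
    # Phase 2: move the maximum to the end and shrink the heap by one.
    for i in range(n - 1, 0, -1):
        a[0], a[i] = a[i], a[0]
        downheap(a, 0, i - 1)

data = [5, 2, 9, 1, 5, 6]
heapsort(data)
print(data)   # [1, 2, 5, 5, 6, 9]
```
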
Copyright © 2005 Jonathan Mohr