Monday, January 7, 2008

LECTURE SERIES FOR DATA STRUTURE

Lecture 1
1.Abstract Data Types
2.Introduction
3.Basic Definitions
4.Abstract Data Type




1. Introduction
1.1. Course Description



Handout course description and stack specification. Introduce course, myself (office hours), textbook, marking scheme.
Midterm exam??. Final exam ??

Assignments: There will be approximately 9, most are 1-week, some 2-week.
Lab access??
Tutorials??
TAs. [who are they, how can they be contacted?]
1.2. Programming Prerequisites
This course is not about any particular programming language, both the textbook and the lectures are 99% language independent. However, the assignments must be done in a specific programming language - Turbo C. The assignments assume you are already familiar with:
built-in data types (integer, real, character)
arrays
enumeration types
structures, unions
user-defined types
functions & parameter passing
input/output
control structures: if, switch, for, while




2. Basic Definitions (Chapter 2)
2.1. Data Types and Structured Data Types
Let me begin the course by giving definitions for the terms data type and structured data type.




A data type consists of
a domain (= a set of values)
a set of operations.

Example 1: boolean or logical data type provided by most programming languages.
two values: true, false.
many operations, including: AND, OR, NOT, etc.

Example 2: As a second example, consider the datatype fraction. How can we specify the domain and operations that define fractions? It seems straightforward to name the operations; fractions are numbers so all the normal arithmetic operations apply, such as addition, multiplication, comparison. In addition there might be some fraction-specific operations such as normalizing a fraction by removing common terms from its numerator and denominator - for example, if we normalized 6/9 we'd get 2/3.

But how do we specify the domain for fractions, i.e. the set of possible values for a fraction?
2.2. Structural and Behavioural Definitions
There are two different approaches to specifying a domain: we can give a structural definition, or we can give a behavioural definition. Let us see what these two are like.
Structural Definition of the domain for `Fraction'



The value of of a fraction is made of three parts (or components):
a sign, which is either + or -
a numerator, which may be any non-negative integer
a denominator, which may be any positive integer (not zero, not negative).
This is called a structural definition because it defines the values of type `fraction' by imposing an internal structure on them (they have 3 parts...). The parts themselves have specific types, and there may be further constraints. For example, we could have insisted that a fraction's numerator and denominator have no common divisor (in that case we wouldn't need the normalize operation - 6/9 would not be a fraction by this definition).
Behavioural Definition of the domain for `Fraction'



The alternative approach to defining the set of values for fractions does not impose any internal structure on them. Instead it just adds an operation that creates fractions out of other things, such as
CREATE_FRACTION(N,D)
where N is any integer, D is any non-zero integer.
The values of type fraction are defined to be the values that are produced by this function for any valid combination of inputs.

The parameter names were chosen to suggest its intended behaviour: CREATE_FRACTION(N,D) should return a value representing the fraction N/D (N for numerator, D for denominator).

You are probably thinking, this is crazy, CREATE_FRACTION could be any old random function, how do we guarantee that CREATE_FRACTION(N,D) actually returns the fraction N/D?

The answer is that we have to constrain the behaviour of this function, by relating it to the other operations on fractions. For example, one of the key properties of multiplication is that:
NORMALIZE ((N/D) * (D/N)) = 1/1
This turns into a constraint on CREATE_FRACTION:
NORMALIZE (CREATE_FRACTION(N,D) * CREATE_FRACTION(D,N)) = CREATE_FRACTION(1,1)
So you see CREATE_FRACTION cannot be any old function, its behaviour is highly constrained, because we can write down lots and lots of constraints like this.

And that's the reason I call this sort of definition behavioural, because the definition is strictly in terms of a set of operations and constraints or axioms relating the behaviour of the operations to one another.

In this style of definition, the domain of a data type - the set of permissible values - plays an almost negligible role. Any set of values will do, as long as we have an appropriate set of operations to go along with it.
Which Kind of Definition Should We Use?
In this course, we will mainly use behavioural definitions. Their main advantage is that they are more abstract, imposing absolutely no unnecessary or arbitrary constraints on the data type. This makes them ideal as abstract definitions, which are central to this course. Behavioural definitions are also the best ones to use if you ever need to reason abstractly about a data type: we won't stress this in this course, but in general it is an important reason for using abstract definitions.

We will occasionally use structural definitions. They are somewhat more intuitive, until you get used to the behavioural kind, and they are very common in computer science (e.g. many textbooks use them). Also, because they impose structure on the values they are quite useful in giving ideas about how to implement a particular data type in a computer program.





4. Abstract Data Type
4.1. Alternative Implementations Of Fractions
Returning to our example of the fraction data type, how might we actually implement this datatype in C?



Implementation 1
typedef struct { int numerator,denominator; } fraction;

main()
{
fraction f;
f.numerator = 1;
f.denominator = 2;
...
}
Implementation 2
#define numerator 0
#define denominator 1
typedef int fraction[2];

main()
{
fraction f;
f[numerator] = 1;
f[denominator] = 2;
...
}
These are just 2 of many different possibilities. Obviously, these differences are in some sense extremely trivial - they do not affect the domain of values or meaning of the operations of fractions.
4.2. Substitutivity of Implementations



In a real application we would like to be able to experiment with many different implementations, in order to find the implementation that is most efficient - in terms of memory and speed - for our specific application. And, if our application changes, we would like to have the freedom to change the implementation so that it is the best for the new application.

Equally important, we would like our implementation to give us simple implementations of the operations. It is not always obvious from the outset how to get the simplest implementation, so, again, we need to have freedom to change our implementation.

What is the cost we must pay in order to change implementation? We have to find and change every line of code that depends upon the specific details of the implementation (e.g. available operations, naming conventions, details of syntax - for example, the two implementations of fractions given above differ in how you refer to the components: one uses the dot notation for structures, and the other uses the bracketed index notation for arrays). This can be very costly, expensive, and can run a high risk of introducing bugs.
4.3. Abstract Data Types
We can minimize this cost - and therefore buy as much freedom as possible to change the implementation whenever we like - by minimizing the amount of code that makes use of specific details of the implementation.

This is the idea of an abstract data type. We define the data type - its values and operations - without referring to how it will be implemented. Applications that use the data type are oblivious to the implementation: they only make use of the operations defined abstractly. In this way the application, which might be millions of lines of code, is completely isolated from the implementation of the data type. If we wish to change implementations, all we have to do is re-implement the operations. No matter how big our application is, the cost in changing implementations is the same. In fact often we do not even have to re-implement all the datatype operations, because many of the them will be defined in terms of a small set of basic core operations on the datatype.

Core Fraction Operations



create_fraction(N,D)
Takes 2 integers and returns the fraction N/D.
get_numerator(F)
Takes a fraction and returns its numerator.
get_denominator(F)
Takes a fraction and returns its denominator.
Using these, we can define any other function on fractions such as:
fraction ADD(fraction F1, fraction F2)
{ int n1 = get_numerator(F1);
n2 = get_numerator(F2);
d1 = get_denominator(F1);
d2 = get_denominator(F2);
return create_fraction( n1*d2+n2*d1 , d1*d2 ); }
4.4. Programming With Abstract Data Types
By organizing our program this way - i.e. by using abstract data types - we can change implementations extremely quickly: all we have to do is re-implement three very trivial functions. No matter how large our application is.




In general terms, an abstract data type is a specification of the values and the operations that has 2 properties:
it specifies everything you need to know in order to use the datatype

it makes absolutely no reference to the manner in which the datatype will be implemented.
When we use abstract datatypes, our programs divide into two pieces:
The Application:
The part that uses the abstract datatype.
The Implementation:
The part that implements the abstract datatype.
These two pieces are completely independent. It should be possible to take the implementation developed for one application and use it for a completely different application with no changes.
All your programs in this course must divide like this. T.A.'s may well test the independence of the two parts by substituting one of their own.

If programming in teams, implementers and application-writers can work completely independently once the specification is set.



------------------------------------------------------------------------------------------------------------






Lecture 2
Sorting Algorithms (Section 5.4.5, Chapter 6)
Handout examples of programs to analyze.
General Sorting Strategy (Recursive)
Merge Sort
Insertion Sort
QuickSort
Radix Sort
Sorting Summary
Asymptotic Complexity



1. General Sorting Strategy (Recursive)



A sorting algorithm is an algorithm whose input is any list and whose output is a list (or a change in the given list) that is sorted and has the same elements as the given list.

Most well-known sorting algorithms are based on this general strategy: given a list L
If L has zero or one element, then it is already sorted, so nothing need be done;

Otherwise
divide in L into two smaller lists, L1 and L2.

recursively sort each of the smaller lists. Result: L1 and L2 are now sorted.

Combine L1 and L2. The result is a sorted version of the original list. How do we combine L1 and L2? As just discussed there's only one way to do so and produce a sorted list. We MERGE them.
This strategy works no matter how we do step (1). The pieces can be any size; they could be the same size, or they could be very different sizes; and it does not matter which elements of L we put together or in what order we put them into the pieces. We can divide up and scramble up the elements any way we like, and we will still sort L. In fact, we could even divide L into more than two pieces, we could divide it into 3, or 7, or whatever.

The one thing we must avoid is having one of the pieces equal to L; if this happens we would have an infinite loop. But that is the only thing we have to avoid.

We will now look at three specific versions of this general strategy: Merge Sort, Insertion Sort and Quick Sort.



2. Merge Sort (Section 5.4.4)



Although the general strategy uses MERGE to put the pieces back together, there is a specific version of the general strategy called Merge Sort. The defining feature of Merge Sort is that in step (1), the given list is divided into two equal size pieces (or as near equal as possible).

There are various ways of dividing L into 2 equal pieces, we will illustrate the algorithm by putting the first half of L in one piece and the second half of L in the other piece. We proceed until we're down to singleton or empty lists, which are always sorted, then we work our way back up the recursion by MERGING together the short lists into larger ones (which replace the list that we started with at that level).

The slowest step in this algorithm is the MERGE operation. Earlier we looked at two special cases of MERGE - INSERT and JOIN. How can we change the way we do step (1) - SPLIT L into pieces - so that we can use INSERT or JOIN instead of MERGE?



3. Insertion Sort



In order to use INSERT instead of MERGE it is clear what we have to do. We have to divide L so that one of the pieces is a singleton. This is easy - for example, just separate the head of L from its tail. Knowing that this piece is a singleton, we also can delete the recursive call that sorts it. This leads to the algorithm:
Divide L into two pieces, its HEAD and its TAIL.
recursively sort the TAIL.
INSERT the HEAD into the sorted TAIL.
This is called Insertion Sort.

Example: Given L = [56,35,42,29]:
HEAD = 56. TAIL = [35,42,29].
Recursively sort the TAIL. Result = [29,35,42]. (the details of this step are given below)
INSERT 56 into [29,35,42]. Result = [29,35,42,56].
Step (2) is a recursive call, so it follows the same pattern, except for this computation, L = [35,42,29].
HEAD = 35. TAIL = [42,29].
Recursively sort the TAIL. Result = [29,42].
INSERT 35 into [29,42]. Result = [29,35,42].
So, we have succeeded in replacing the MERGE operation in step (3) with an INSERT operation, which is quite a bit more efficient. However, we have made a major sacrifice in efficiency in order to do this. Merge Sort happens to be one of the fastest sorting algorithms known; its speed is entirely due to the fact that it cuts the given list L into two equal size pieces. Insertion Sort gives up this source of efficiency, and as a result, it is much less efficient for large lists. We will see how to analyze the efficiency of algorithms later in the lecture.


4. Asymptotic Complexity (Big O Analysis) (Chapter 6)



We have spoken about the efficiency of the various sorting algorithms, and it is time now to discuss the way in which the efficiency of sorting algorithms, and algorithms in general, is measured. This is called complexity analysis, and it is a very important and widely-studied subject in computer science. Our short introduction here touches only the surface of the subject: you are encouraged to study it further through textbooks devoted to the subject (there are many) or in-depth university courses.

Now you might say, it's obvious how to measure the efficiency of an algorithm. Just code up the algorithm as a program and run the program on some test data. Measure the execution time, or the memory required, and there you are. If you want to know if one algorithm is more efficient than another, do this for both of them and compare the results.

Can anyone see any problems in measuring efficiency this way?
Two major kinds of problems
CPU time (if speed is what we're interested in) is affected by very many factors, many of which are not related to the algorithm itself:
which computer, how heavily is it loaded
which programming language, which compiler
hidden factors can swamp the ones you're trying to measure:
array-bounds checking
relative speeds: array indexing, record access and pointer following,
overhead for procedure calls
I/O

Which test data? It is true that `constant time' algorithms, by definition, take the same amount of time on all inputs. But most algorithms are not constant time so we cannot measure efficiency as a single number; it must be expressed as a function of the input.
How will we solve these problems?



The way we will get around (1) is to do our measurements analytically, not experimentally. In other words we will analyze the algorithm - or the relevant aspects of a program implementing it - in order to figure out its efficiency and the factors that affect efficiency. Most commonly, we measure one of the following:
the number of additions,multiplications etc.(for numerical algorithms).
the number of comparisons (for searching, sorting)
the number of data moves (assignment statements,maybe also parameters).
the amount of memory required.

The way we will get around (2) is to express efficiency (or whatever we choose to measure in (1) as a function of the input... Efficiency(algorithm A) = a function F of some property of A's input.

We have to decide which property of the input we are going to measure; the best choice is the property that most significantly affects the efficiency-factor we are trying to analyze. We will use the generic term size of the input to refer to the key property that will be used in the analysis. And we'll use the variable N to denote this property. By the way, it is not absolutely necessary to have just one property; we could express efficiency as a function of several properties. But it is most common to use just one, and we'll stick to examples where one is all that is needed.

For example, the time taken to sort a list is invariably a function of the length of the list. The sorting algorithms you meet in first year are typically quadratic functions - TIME(N) = N*N (roughly). MergeSort and QuickSort are faster: for them TIME(N) = N*log(N) (roughly), which is less than quadratic, but greater than linear in the length of the list.




This does not entirely answer problem (2). Consider an algorithm that searches through a list looking for a specific value. Let's assume the value is always somewhere in the list, so the algorithm always succeeds. What are the factors affecting the speed with which this algorithm runs?

Answer: The position where the value is located.


If the algorithm happens to start at the location containing the value, the algorithm will finish very quickly. On the other hand, if the algorithm is `unlucky' it will try all other locations before getting to the right one!

So the function we're trying to draw is really a band: for an input of size N (e.g. length of a list), you get variation between two extremes, called the worst case behaviour and the best case behaviour.

In our searching example: best-case = constant time, worst-case = length of list (or is it one less than length - see below) The average-case is somewhere in between - exactly where depends on things beyond this course.

Now, when we go to compare two algorithms we are comparing two functions. One has to be careful, because functions can cross one another several times - if this happens it means that neither of the algorithms is better than the other on all possible inputs.
To really do a proper job of comparing two algorithms we should work out their efficiency functions exactly, and determine exactly where they cross. In practice, this is asking too much. It is simply too difficult to work out the exact functions analytically. And for most purposes a ballpark estimate will do.

The technical definition of ballpark estimate is asymptotic complexity. With asymptotic complexity all we are interested in is the dominant term, in the limit, of the efficiency function.

For example, suppose we have two algorithms:
A1-efficiency(N) = 29 [constant time]
A2-efficiency(N) = 3+N/2 [linear time]
Draw graph showing that A1(N) > A2(N) when N < 52.

Although for very short lists A2 is more efficient than A1, for all sufficiently large lists A1 is more efficient than A2. This is the idea asymptotic complexity captures.

Asymptotic complexity ignores all low-order terms, and all constant multipliers ... so all linear functions are written O(N), all constant time functions O(1).

The asymptotic complexity of A1 is O(1), of A2 is O(N).

Big-O is read ``order of''

Obviously, asymptotic complexity is only appropriate when you are dealing with big inputs. If, in your application, you know your inputs will be small you will have to do more careful analysis.



For example, the NlogN entry shows that MergeSort would do roughly 896 comparisons in sorting a list of length 128; for the same list, the number of comparisons done by Insertion Sort is given by the N*N entry, 16,192.

As you can see these are massive differences, so reducing the order of complexity is always a huge payoff (unlike most simple coding tricks).

What is wrong with the following: efficiency(N) = O(2N + logN)

Answer: With Big-O notation, we are strictly concerned with the dominant term, low-order terms and constant coefficients are ignored. The statement in the example ought to have been simplified to efficiency(N) = O(N).














4. Graphs (Chapter 10)



Recall that a list was a collection of nodes with two properties
every node has a unique predecessor (except the first node)
every node has a unique successor (except the last node).
In a tree we have (1) but not (2). In a graph we have neither property: any node can have any number of successors and any number of predecessors. In fact, we don't always distinguish between successors and predecessors.

Our general aim in this lecture is a rather brief overview to make you familiar with the terminology and one or two basic algorithms, such as traversal.

A Graph is a collection of nodes - which hold values - and edges (or arcs) which connect two nodes. Usually there is no distinguished `first' or `last' nodes.

Directed Graph: A graph is a directed graph if the edges have a direction, i.e.if they are arrows with a head and a tail.

Undirected Graph: A graph is undirected if the edges are ``two way'' (have no direction).

Weighted Graph: In a weighted graph each arc has a number associated with it, called its weight. For example, an edge's weight might represent the distance between to nodes.

Here is an example of a weighted undirected graph:

Subgraph: a subset of nodes and a subset of the arcs involving those nodes.




Path: a sequence of nodes in which each successive pair is joined by an edge. Path length in an unweighted graph = number of arcs in the path. Path length in a weighted graph = sum of the weights of the edges. e.g. in the graph above, the path ABD has length 7 + 3 = 10.

generalization of path length: the weight of a subgraph of a weighted graph is the sum of the weights of the edges in the subgraph.

Simple Path: a path in which no node occurs twice.
the path ABD in the graph above is a simple path
the path BDCBA is not a simple path

Cycle: a path (with more than one node) that starts and ends at the same node. e.g. the path BDCB is a cycle

Connected Graph: an undirected graph is connected if there exists a path between every pair of nodes. We will normally restrict ourselves to connected graphs.

Adjacent Node: we say that node N2 is adjacent to node N1 if there is an edge from N1 to N2.

Neighbours of a node: NEIGHBOURS (N) = set of nodes adjacent to N.
Operations
Graph Traversal



4.2. Graph Traversal
To traverse a graph is to process every node in the graph exactly once. Because there are many paths leading from one node to another, the hardest part about traversing a graph is making sure that you do not process some node twice. There are two general solutions to this difficulty:
when you first encounter a node, mark it as REACHED. When you visit a node, check if it's marked REACHED; if it is, just ignore it. This is the method our algorithms will use.

when you process a node, delete it from the graph. Deleting the node causes the deletion of all the arcs that lead to the node, so it will be impossible to reach it more than once.
At the heart of our traversal algorithm - and in fact all of our algorithms - is a list of nodes that we have reached but not yet processed. we will call this the READY list.
General Traversal Strategy
Mark all nodes in the graph as NOT REACHED.

pick a starting node. Mark it as REACHED and place it on the READY list.

pick a node on the READY list. Process it. Remove it from READY. Find all its neighbours: those that are NOT REACHED should marked as REACHED and added to READY.

repeat 3 until READY is empty.



Example:
Step 1: A = B = C = D = NOT REACHED.
Step 2: READY = {A}. A = REACHED.
Step 3: Process A. READY = {B,C}. B = C = REACHED.
Step 3: Process C. READY = {B,D}. D = REACHED.
Step 3: Process B. READY = {D}.
Step 3: Process D. READY = {}.
In fact this will traverse only a connected graph. To traverse a graph that might be unconnected, you repeat this whole procedure until all nodes are marked as REACHED.

There are two choice points in this algorithm: in step 2, how do we pick the initial starting node, and in step 3 how do we pick a node from READY?

The answer is, it depends on what you are trying to accomplish. If all you want to do is print out nodes, or count them, or do any other processing that is order-independent, then any selection will do.

The two most common traversal patterns are breadth-first traversal and depth-first traversal.

In breadth-first traversal, READY is a QUEUE, not an arbitrary list. Nodes are processed in the order they are reached (FIFO). This has the effect of processing nodes according to their distance from the initial node. First, the initial node is processed. Then all its neighbours are processed. Then all of the neighbours' neighbours etc.

In depth-first traversal, READY is a STACK; the most recently reached nodes are processed before earlier nodes.




Let us compare the two traversal orders on the following graph:

Initial Steps: READY = [A]. Process A. READY = [B,E]. Process B.

It is at this point that two traversal strategies differ. Breadth-first adds B's neighbours to the back of READY; depth-first adds them to the front:

Breadth First:
READY = [E,C,G].
process E. READY = [C,G,F].
process C. READY = [G,F,D].
process G. READY = [F,D,H].
process F. READY = [D,H].
process D. READY = [H].
process H. READY = [].

Depth First:
READY = [C,G,E].
process C. READY = [D,G,E].
process D. READY = [G,E].
process G. READY = [H,F,E].
process H. READY = [F,E].
process F. READY = [E].
process E. READY = [].
Our discussion of graphs will continue next lecture.



------------------------------------------------------------------------------------------------------------




Lecture 3
Graph Algorithms (Chapter 10)
This lecture will probably not take a full 3 hours. Use the additional time to summarize the course and review for the exam.
In this lecture we conclude our discussion of graphs. First we look at standard algorithms for two very common problems; then we look at two common ways of implementing graphs.
1. Shortest Path Algorithm
2.Spanning Trees
3.Algorithm For Computing a Minimum Spanning Tree
4.How To Implement A Graph


1. Shortest Path Algorithm



Given a weighted graph, and node X, find the shortest path from X to each node in the graph.

Our algorithm is a modified version of our traversal algorithm. The algorithm is originally due to Dijkstra, although the presentation below is phrased in a slightly different manner than the textbook's description of Dijkstra's algorithm.

When we add a node to READY, we will store along with the node some extra information: the path (sequence of nodes) that has just led to the node, and the length of this path.

Thus, each entry on READY will be a triple: , abbreviated .

READY will be a priority queue: the ``priority'' of an entry is its Path Length, the entry with the smallest Path Length is the next one removed.

We need to modify our strategy for marking nodes as REACHED. In the traversal algorithms, we could ignore a node the moment it was added to the READY list. But the very first path we find to a node may not be the shortest path, so we will leave the node marked as NOT REACHED until we are sure we have reached it via the shortest path (starting at node X). As it will turn out, we will know the shortest path to node N the moment an entry for node N is removed from the READY list: it is at this point that will mark N as REACHED.

In fact, the term REACHED is not an appropriate word now; we will instead mark nodes as `processed' or `unprocessed'.

Finally, when we go to add a node to READY we look to see if it is already on READY. If it is, we compare the new triple with the one already there, and keep whichever one has the shorter path length.
shortest path algorithm



X is your starting point. Place on READY.

pick the triple, on READY such that L is minimum.
Process it. (print ``the shortest path from X to N is P of length L'')

Remove it from READY, and mark it as `processed'.

Look up N's neighbours {N1, N2...Nk}. These are connected to N by edges that have weights W1,W2,...Wk. Ignore those that are `processed'.

For each neighbour Ni that is `unprocessed':
create the triple (where P^Ni means add node Ni to the end of path P)

add this triple to READY - if there is already a triple for Ni on READY, only add the new one (and delete the old one) if its path length is shorter.

repeat 2 until READY is empty.




Example: shortest paths from A to all nodes.
Step 1: READY = [].

Step 2: Process A, mark it `processed'. READY = [,]

Step 2: Process C, mark it `processed'. C's unprocessed neighbours are B and D. Make up a triple for each of them: . Add these to READY if the node name is not already there, or if it is there but the new triple has a shorter path length.

In this case, we add the new D triple because D is new, and the new B triple because it has shorter path length than the one already on READY.

READY = [, ]

Step 2: Process B, mark it `processed'. B's sole unprocessed neighbour is D. Create the triple . This is not added to READY because there is already a D-triple there that has shorter path length.

READY = []

Step 2: Process D, mark it `processed'. D has no unprocessed neighbours. READY = [] (i.e. empty).
The result of all this is the output:
the shortest path from A to A is `A' of length 0
the shortest path from A to C is `A-C' of length 2
the shortest path from A to B is `A-C-B' of length 6
the shortest path from A to D is `A-C-D' of length 7


2. Spanning Trees



A spanning tree of a connected graph G is a connected subgraph of G having no cycles. Every connected graph has a spanning tree. For example, the following graph has four distinct spanning trees.
Spanning Trees:
In general a graph will have very many spanning trees. Recall that the weight of a subgraph to be sum of all the edge weights in the subgraph. Different spanning trees have different weights, the minimum spanning tree is the spanning tree with the minimum weight.

Example: weights added to the graph above
spanning tree S1 has weight 3+2+4 = 9
spanning tree S2 has weight 2+4+1 = 7
spanning tree S3 has weight 3+2+1 = 6
spanning tree S4 has weight 3+4+1 = 8
S3 is the minimum spanning tree of this graph.


3. Algorithm For Computing a Minimum Spanning Tree



This algorithm is extremely similar to our shortest path algorithm. In the shortest path algorithm, the key ideas was to pick from READY the node that was closest to the starting point: to do this we needed to store extra information on the READY list along with each node - in particular we need to store the length of the shortest known path to the node from the starting point (we also stored the path itself).

In Minimum spanning tree (MST), the key idea is very similar: pick from READY the node that is closest to the spanning tree we have built so far. To do this we need to store extra information on the READY list along with each node: the length of the shortest known path to the node from any node in the tree built so far. We note that the shortest path to the nearest node (N) will always be a single edge, so to store the `path' leading to N will always involve just one other node (one that already in the MST), so along with N we will just store this node and the weight of the edge connecting them.
Minimum Spanning Tree (MST) algorithm
Pick any node X as your starting point. Place on READY.

pick the triple on READY such that L is minimum.
Process it. (print ``N is connected to P in the MST'')

Remove it from READY, and mark it as `processed'.

Look up N's neighbours {N1, N2...Nk}. These are connected to N by edges that have weights W1,W2,...Wk. Ignore those that are `processed'.

For each neighbour Ni that is `unprocessed':
create the triple

add this triple to READY - if there is already a triple for Ni on READY, only add the new one (and delete the old one) if its path length (weight) is smaller.

repeat 2 until READY is empty.




Example (MST):
Step 1: READY = []

Step 2: Process A, mark it `processed'. READY = [, ]

Step 2: Process C, mark it `processed'. C's unprocessed neighbours are B and D. Make up a triple for each of them: . Add these to READY if the node name is not already there, or if it is there but the new triple has a smaller path length (weight).

In this case, we add the new D triple because D is new, and the new B triple because it has a smaller weight than the one on READY.

READY = [, ]

Step 2: Process B, mark it `processed'. B's sole unprocessed neighbour is D. Create the triple . This is added to READY, and the D-triple on READY is deleted, because the new triple has a smaller path length than the existing one.

Step 2: Process D, mark it `processed'. D has no unprocessed neighbours. READY = [] (i.e. empty).
The output produced by this algorithm is:
C is connected to A in the MST.
B is connected to C in the MST.
D is connected to B in the MST.
It should be clear that if you swapped the weights of the edges connecting D to B and C, then D would be connected to C in the minimum spanning tree.



4. How To Implement A Graph



The most popular and convenient representation of a graph, whether directed or undirected, weighted or unweighted, is an adjacency matrix. The graph is represented by a matrix whose rows and columns are indexed by the nodes. There is an entry in the matrix in row N1, column N2, iff there is an edge from N1 to N2 in the graph. In a weighted graph, the matrix position (N1,N2) would contain the weight of the edge from N1 to N2.
In the adjacency matrix for a directed graph the successors of node N are stored in row N and the predecessors for node N are stored in column N. For example:
Row `B' gives the successors of `B', column `B' gives predecessors of `B'. The fact that row `D' is empty indicates that `D' has no successors; similarly, column `A' being empty indicates `A' has no predecessors.

Certain operations are extremely efficient with an adjacency matrix representation:
adding or deleting an edge
adding or deleting a node (= adding/deleting a row and column)
The drawback of the adjacency matrix is its space efficiency. It requires N*N space, even if the graph actually contains very few edges. We would like a representation that used O(N+E) space.

In fact, we have already discussed the solution to this problem. When a graph has very few - O(N) - edges, almost all the positions in the matrix will be zero (empty). Does anyone recall an efficient implementation of almost-empty matrices we discussed several lectures ago?




A matrix in which most of the positions are zero is called a sparse matrix. We talked about these in our lecture about multi-linked lists. They can be represented very efficiently by having each row be a linked list, and each column a linked list, where the rows and columns share nodes. You only allocate space for the non-empty positions of the matrix. The matrix above would be represented in this way:
The space needed is O(N+E) - there are 2N list headers and one node in the matrix for each edge in the graph (E nodes in the matrix in total). For each edge in the graph there are 2 pointers (next-in-row, next-in-column), a weight, and the names of the nodes the edge connects.

This representation is commonly called the adjacency list representation. Often, for simplicity of implementation, the successor and predecessor lists do not share nodes but have separate, redundant nodes. If the graph is undirected, the predecessor and successor lists will be identical so only one of them is necessary.

The biggest drawback of this representation is that deleting an edge, or even looking up its weight, is O(E) time, not constant-time as it is with an adjacency matrix.

No comments: