What The Heck Is Graph Theory

Before we start -do me a favour. Forget that the word 'graph' usually means a chart with an x and a y axis. Assume that we have an entirely new meaning of the word. Don't sit there thinking "I know what a graph is, I like graphs, any moment we are going to start ploting points on a graph". Wrong, Wrong, Wrong! It may help if you make up a word such as "bazonka" and replace every occurance of the word "graph" with the word "bazonka".

Introduction to Bazonkas (easy)

The mathematical use of the word "set" is fine. It fits with our everyday use of the word. In the Lerch Grossman case we are particularly interested in a set of "blocks" -the chunks of rock that make up the geology in our area of interest. Graph theory parlance has us call the elements of this set vertices. Imagine a second set, this time a set of connections between the vertices, we call these connections edges. It is common to draw vertices as points and edges as line between them. Thats all there is -a graph is a set of things (vertices) and a set (or a collection) of connections between them (edges). Sometimes these edges are defined as an pair of vertices.
graph diagram Graph with set of vertices {A,B,C,D} and set of edges {(A,B),(A,C),(C,B)(C,D)(B,D)}
Notice the use of braces {} to indicate a set.
In the LG algorithm we encounter two graphs. Though the set of vertices is the same (the set of blocks), the set of the edges is different. One represents information about the overburden of each graph (I call it the "slope graph"). This information does not change in the course of the solution. The other graph is a tree (explained below) and changes as the solution proceeds. I call this the "tree graph".

The Graph Zoo

There are many types of graphs rather too many terms to describe them. Moreover not all authors use the same terms in the same way. Mining engineers should not be troubled by this as they are used to dealing with geologists. Important distinctions include:
  1. The null graph -no vertices, no edges.
  2. Multigraphs permit more than one edge between a given pair of vertices. So a graph might Contain vertices {A,B,C} and have edges {(A,B),(A,B),(B,C)}. We don't meet multigraphs in the Lerch Grossman Algorithm. However it illustrates a point about sets. Since in this diagram there are two edges between A and B the pair (A,B) occurs twice. However, the nature of sets is that something is either a member or it isn't. The way that we have written it we have (A,B) being a member twice. We can expect a visit from the Math Police. The get out is to call it a "collection" of edges rather than a "set".
  3. Directed graphs are connected by directed edges. In the Lerch Grossman paper directed edges are called arcs. A directed edge has a sense of direction to it. We draw them with little arrows on. We write the set of edges as an ordered pair. In a regular graph the edge (A,B) is the same as the edge. For mathematicians directed graphs are usually viewed as a special case -they are simple graphs with (if you like) arrows added. For the programmer directed graphs are easier to deal with and you expend effort ensuring that your program knows that an edge (A,B) is the same as one (B,A). Directed graphs are often called 'digraphs' for those wishing to seam familiar.
  4. Connected graphs are graphs in which there is a chain (a sequence of edges (A,B)(B,C)(C,D))... ) joining every pair of vertices. LG use the term 'path' on a directed graph for the same concept but using directed edges. (Watch for other terms from other authors e.g. edge sequences). The issue of connectedness is a little more complicated with digraphs but we don't need it here.
  5. Cyclic vs acyclic graphs. If there are loops in a graph its called cyclic if not its acyclic. Formally a loop is a chain with the same first and last vertex.
  6. Trees a connected acyclic graph is called a tree. If you pick a vertex (at random) in a tree and call it the "root" then you have a "rooted tree". L and G go on to define branches and twigs (yawn). They mean what you'd think, more or less. Rooted trees have two important properties. Even though we are in theory talking about an non-directed graph, every edge now has a special direction which is the direction of the root. In other words you can if you want think of rooted trees as directed graphs. As programmers we say "yes please" to this. The second point, which turns out to be important in the LG algorithm is this. Every vertex, except the root has exactly one edge on the root side. It may have any number (including zero) on the 'leaf' side. So if we have a rooted tree we may be able to represent it as a data structure in which each record has exactly one pointer pointing in the root direction. This turns out to be the case in the LG algorithm -and is a good thing. You read it here first folks!
  7. The list goes on. With only the above properties we can ask a large number of intriguing questions of a graph. For example, going back to Euler in 1763, we can ask whether it is possible to find a circuit that will use all the edges exactly once (an euler circuit). Often, however, we need to add to the basic graph concept. Vertices represent values or edges represent distances. Numbers and sometimes colours are added. The menagerie of graph types and properties is extensive.

Some Diagrams

Multigraphs

graph diagram
Here Euler's Konigsberg bridge problem is shown as a multigraph drawn in red. It is a multigraph because there are more than one edges between certain vertices.(here representing bridges). The background is intended to depict the layout of the the town with bridges where the edges cross the river. Euler showed that (for this particular multigraph problem) residents could not take a walk that would cross each bridge exactly once.

Directed Graph

graph diagramDirected Graph with set of vertices {A,B,C,D} and directed edges {(a,b),(a,c)(b,c),(d,c),(c,a)}

Graph that is not connected

graph diagram Graph that is not connected. Although there is (e.g.) a path from G to C {(G,F)(F,E)(E,C)} there is not one from A to C. Note that {(C,D)(D,E)(E,C)} forms a cycle so this particular graph is not acyclic.

Rooted Tree

graph diagram If you look at the graph drawn in yellow you can see that it is acyclic and connected. Therefore it is a tree. Any vertex can be chosen as a root. I chose E even though the graph is drawn suggesting that it should be A. Every edge has rootwards direction (green arrow).

In the LG algorithm the "slope graph" is directed (i.e. its a digraph). As all the directed edges point to blocks that are higher than themselves there are no cycles. The other graph is a rooted tree. In fact it would not generally be connected -it would be a set of disconnected trees. However, a "dummy root" is introducted. This is a vertex with no corresponding block of rock associated with it -though it is good to imagine it below the rest of the model. Adding this extra block is a shameless dodge that just keeps things tidy. Similarly the tree graph is treated in the literature as an undirected graph -even though the direction of the root is very important. Further as there is the all important value of each block we might call them "weighted graphs" -this just refers to the presence of the additional information associated with each block (vertex.)

Transitive closure

Transitive is an important concept in many aspects of discrete mathamatics, including the LG algorithm. Supose that we have a graph (in this case a digraph) where the vertices represent people and the directed edges imply the relation "is a parent of". If we were interested in the transmission of a genetic disorder we can see that not only can the child receive the disorder so can the grand child, great grand child etc.. We can define a new graph "is an ancestor of" in terms of the original simple one . Sometimes this is refered to as the repeated application of the function (function in the computing sense). Often a reference is made to a computer technique called "recursion".
graph diagram
In this graph you can think of the purple graph as the original. It might represent the function "is a parent of" on the set of 8 people (vertices). The red graph is then created as the "transitive closure " of the red graph -it might represent "is an ancestor of".

In the LG algorithm a digraph (directed graph) is used to express the constraints of rock strength and slope design. I call this the slope graph. Each block is represented as a vertex that has directed edges (or arcs) to its close neighbours on the levels above. The transitive closure of this graph represents the concept of "is in the overburden of". From the above diagram you can see that there is often vastly more edges in the transitive closure of a graph than in the graph itself. So the concept of transitive closure is the defining of a new graph in terms of an old one, often with vastly more edges.

Partial Graphs and Subgraphs

Suppose we make a new graph "graph B" by picking out bits of an existing graph "graph A". There are two special possibilities here:
  1. Suppose that graph B contains some of the vertices of graph a but all of the edges from A that connect those vertices. Here B is called a subgraph. If A consisted of vertices which represented the towns of Canada and edges that represented connecting roads then an example of a subgraph would cover e.g. the cities and roads of Manitoba -some of the towns but all of the roads connecting them.
  2. On the other hand if we keep all of the vertices (often only nominally) but only some of the edges, then the result is called a partial graph. Extending our previous example, a partial graph might represent all the towns of Canada but only roads which are four lanes or more.

Closures of a Graph and Maximum Closure of a Weighted Graph

The LG algorithm is formally defined as and algorithm to find "The maximum closure of a graph". What does this mean? A closure means a "closed subgraph". This generally applied to directed graphs. A closure is a subset of vertices that has no directed edges (arcs) pointing to vertices outside the closure. Consider the following digraph: {(A,B),(B,C),(D,B),(D,E),(F,G),(G,E),(G,H),(H,C)}
graph diagram
If, for example, vertex D is to be part of a closure, then so are B and E because of the directed edges (D,B) and (D,E). But because B is in the closure so is C because of the directed edge (B,C). In finding a closure containing D we are using a similar concept to that introduced with "transitive closure"-repeated application. We are not adding edges, however. It does not follow that G is a member of the closure because the edge (G,E) "points" the wrong way. If we need to have D and F in the closure then it must include the vertex set {D,F,B,E,G,H,C}. Notice that the null graph is always a closure.

If the directed graph in question is weighted (there is a numerical value associated with each vertex) then the maximum closure is the closure with the greatest value obtained by summing all the weights in the closure. Look carefully at the following diagram:
graph diagram
Notice:
  1. A directed graph is shown in red.
  2. The four vertices in the yellow bubbles are a closure.
  3. They are the maximum closure and have the value 2+3-1-1=3. Convince yourself that this is so. i.e. that there is no other closure with a greater value
  4. Working out the maximum closure of even a simple graph is not trivial

The Pit Optimisation Context

Remember that one of the graphs in the LG algorithm is a directed graph, which I call the "slope graph". The simplest case of the slope graph gives each block in the interior of the model 9 edges/arcs pointing at the nearest 9 blocks in the layer above. These are nine blocks that must be mined first, but before they can be mined there are blocks on the layer above the nine that must be mined. The graph whose directed edges indicate "block Y is in the overburden of a block X" is therefore the transitive closure of the slope graph. In this case it defines an overburden resembling and inverted pyramid. Moreover any mine design that proposes slopes which are feasible (as defined by the slope graph) is a closure of the slope graph. That means that for any block X to be mined any block Y pointed to by and directed edge or arc (X,Y) will also have to be mined. We don't want just any closure -we want the one that make the most money. In the pit optimisation process each block has a monetary value -which may be negative. We can therefore regard any graph for which the blocks are regraded as vertices as "weighted" -the weights of the vertices are the values of the blocks. The optimum pit must be geotechnically feasible (therefore it must be a closure) and we want the most lucre we can have -therefore we want the maximum closure. So the L-G algorithm is entitled in the original paper the "Maximum Closure of a Graph".

Last Thoughts About graph theory

Background information only.
Graph theory is an approachable area of mathematics. Many problems are regularly featured as puzzles or under the term "recreational mathematics". Introductory texts deal with problems of organising parties, dances and sporting events. The impression is that graph theorests have lots of fun while calculus instructors spend their time in leaky baths or on ladders forever sliding down walls. Applied graph theory, however, is generally considered part of Operations Research (a nebulous term). Moreover it is not coincidental that graph theory started in earnest at the same time as computers started to appear.

What is it that Graph theory dudes do? When they are not attending parties, dances and sporting events they do three things:
  1. As we see above it is important to have a system for describing graphs and their properties.
  2. As mathematicians are wont to do, they prove things about graphs. So for example there is a simple way to determine whether a graph has an Euler circuit, and this was proven by Euler.
  3. Lastly (and this is where we get interested) they work on algorithms to solve problems that involve graphs.
The remainder of this page is intended for those hoping to program the algorithm.

Graphs and Computer Programming

Strictly for programmers This section is only relevent if you are considering programming the LG algorithm yourself
Since graphs consists of things (vertices) and connecting between things (edges) it is worth reviewing some relevent programming issues. This page is not about computer programming. But it seems worthwhile to review some issues.
  1. Dynamic vs Static Storage. Are you going to use static arrays, dynamic data structures (linked or doubly linked records) or a more modern type of container, such as those provided by C++'s standard template library.
  2. How the edges are represented. Options include true pointers, iterators (pointers on drugs), integers that represent the index of vertex in an array or array like structure, names (substrings) that can then be used to lookup the storage position of the indicated vertex, and using 2d arrays as adjacency matrices.
  3. Maintaining the structure.For example, if we are not dealing with a directed graph we if A is adjacent to B than B is adjacent to A.
  4. Additional data: Depending on the method additional information may be stored abour individual vertices or edges. The might be flagged e.g as belonging to a particular sub set, they may have temporary or permanent values applied to them etc..

Graph Algorithms

This section is only relevent if you are considering programming the LG algorithm yourself
Formally an algorithm is a method for solving a problem that requires a finite number of steps. Just a pieces of string have finite length, proving that a method will work in a finite time does't imply that the finite time is less than, say the age of the universe. Never the less, its better than not knowing that it will work. As a rule the publishing of an algorithm does not involve giving many of the practical details concerning how you should or could program it. The tone is generally "of course we know how to program it ourselves but we don't want to spoil your fun".

Complexity

The LG algorithm is ....er well an algorithm. It is proven to find a solution in a finite number of steps. We don't know how many. The study of how many steps an algorithm requires is called complexity. Complexity is usually a function of "problem size" which in the case of graphs may be the number of vertices and or edges. For example, an algorithm for finding the shortest route between two cities on a given road network might take a number of steps that is proportional to the number of cities and proportional to the square of the number of roads. As far as I am aware nothing has been published on the LG algorithm. Certain problems likely take much longer than others even if they are the same size. In practice it finds solutions reasonably well, but there doesn't seem to be any guarantee that the next model that you try will finish in your lifetime. Note that speed is often obtained at the expense of memory usage.

Representing Graphs in The computer

The first thing that they don't tell you is how to represent a graph in the computer. The requirements of different methods of representation vary, however they include:
  1. Size: is it important how much memory is required? In particular some matrix methods tend to require space that is proportional to the square of the number vertices. This is no problem for say 100 vertices but for a million it becomes impractical.
  2. The ability to visit each vertex (or edge) once. Algorithms tend to have steps that say things like "find a vertex with property P -if one exists". If the vertex data is embedded in a linear structure such as a list or an array this is easy.
  3. Examining for structural features makes different demands on an implementation. For example determining whether a graph is connected or cyclic, or counting the vertices on a branch each have their own requirements.
  4. Creating and destroying the structures. Particularly in the case of the more complex data structures, you must give some thought
Methods include:
  1. Adjacency matrices. If there are n vertices in a graph an adjacency matrix is an n x n matrix containing 1's where vertices are connected by edges (i.e. are adjacent), and 0 otherwise. So the first graph above (the yellow and black one) would be written as:
    
          A B C D
       A  0 1 1 0
       B  1 0 1 1
       C  1 1 0 1
       D  0 1 1 0
        
    A two dimensional array (e.g. of booleans) would clearly work. Notice that the space required increases with the square of the number of vertices. This may not be a problem is the problem sizes are moderate or if a large percentage of the vertices are connected by and edge
  2. Adjacency lists. Here for each vertex a list is made of each adjacent vertex. The yellow graph could be written:
        A:BC
        B:ACD
        C:ABD
        D:BC
    
    This is not quite so simple as we have a list or array of items which have variable length. However it is a good fit with the storage classes available through C++'s standard template library for example.
  3. Data Structures. The most general way to represent graphs is probably through data structures. Records that represent vertices and have a variable number (linked list?) of pointers to other such records.
    graph diagram
    This diagram is supposed to indicate a doubly linked structure. As vertex A is linked by an edge to B, so there is a pointer in A pointing at B and a pointer in B pointing at A. The red line is supposed to represent a pointer locating the structure from the programs point of view. If the graph is not known to be connected you need to arrange multiple such pointers -perhaps a dummy vertex- pointing at least once at each component. It is worth considering that disposing of the memory in a structure such as this is not trivial.
  4. Problem specific methods. IN the LG implementation that I present here our vertices represent blocks of rock which are arranged in a three dimensionally so that the arrangement suggest storing all the data in three dimensional arrays. One of the data structures is a tree. Because it is a tree and because it is possible to work always in a rootwards direction you can get away with a three dimensional array of "pointers" to each blocks rootward neighbour. In fact true pointers are less convenient than remembering the index of the adjacent block in the array.
  5. Rule based representations. On occassion it is possible not to have to represent the edges of the graph at all because they have a regularity that can be described by rules. Suppose

Recursion and Transitive closure

In the following example it is assumed that there is a function called ISAParentOF(A,B) which returns a true value only if A is a parent of B. If there is a set of vertices: a set of people considered {A,B,C...} this function defines a digraph. The following pseudocode defines a second graph on the same set of vertices. The function IsAnAncestor(A,B) returns true if the directed edge (A,B) is in the transitive closure of IsAParentOf. Note the recursive call in bold (the function calls itself). Note that, as written, bad things would happen if IsAParentOF defined a cyclic graph.
  Function Boolean IsAnAncestorOf(A,B)
    {
     if(IsAParentOf(A,B))return true;
     
     for each C such that IsAParent(A,C)  if IsAnAncestor(C,B) return true;

     return false;
    }