Graph algorithms have many useful applications, ranging from finding the shortest route on a street map to efficiently managing computer networks. These algorithms are an essential part of a standard computer science degree curriculum, so I decided to write down and explain the basics to understand the topic better myself. Here's what I'm going to cover in this article:
- (Very) Basic Set Theory
- Properties of Relations (Symmetry)
- Asymptotic Notation
- Understand graph theory
- Build foundations for Understanding Graph Algorithms II: Traversal Algorithms (not published yet)
Let's start from scratch. What is a graph? A graph is basically just a structure which consists of a bunch of nodes, formally called vertices, and a bunch of lines which connect the vertices, called edges.
Here, we have a graph with 4 vertices (, , , and ) and 4 edges (the lines between the vertices). If we were really picky, we'd say that there aren't 4 lines but 4 arrows. But we'll get to that later. We'll refer to this graph again, so let's call it .
Let's try to formalise this a bit more, and define a graph :
Ok, what are we saying here? We have a graph which is defined by a set of vertices and a set of edges. We also have some additional information: is a subset of the Cartesian product . So by this definition, an edge is just an ordered pair of vertices.
In the example above, would be an element of since there is a line between and . Similarly, .
and are sets, so we can write the following:
- is the cardinality ('size') of , i.e. the number of vertices
- is the cardinality of , i.e. the number of edges
We've seen the visualisation of a graph above. Vertices are represented by labelled circles, and edges are visualised using lines to connect the vertices. Note that we have two different kinds of edges:
- directed edge: represented by an arrow, also called arch
- undirected edge: represented by a plain line
We use directed edges when we have an edge going from some vertex to some vertex but not vice versa:
We use undirected edges when we have an edge going from to and another edge going from to :
We have two different types of graphs, directed and undirected graphs. A graph is undirected if, and only if, all of its edges are undirected. Otherwise it's directed. So if a graph has some edge (that is, going from vertex to vertex ), it needs to have an edge too in order to be undirected. And that must be true for all edges .
From a mathematical perspective, it's fairly easy to distinguish between these two types. To this end, we note that is a binary relation as it is a subset of a Cartesian product. Let's also recall the definition of a symmetric relation:
Let be a relation on a set . A relation is symmetric if, and only if, for every , if then .
This is just another way of saying that for all pairs in , there must also be a complementing pair . A graph is undirected if, and only if, is symmetric.
For directed graphs, we have a bit of additional notation. If we pick some random vertex , where , we can write . This denotes the in-degree of , that is, the number of incoming edges into . In a similar manner, we also have , denoting the number of outgoing edges from .
A vertex which has only edges going in but no edges going out is also called a sink node.
A graph is planar if it is possible to draw the graph (formally, to embed the graph in the plane) so that no edges cross each other.
A graph is connected if you start from any vertex and there always exists a path to any other vertex.
This graph is not connected as it has a sink node.
Now that we've covered the basic theory of graphs, we can move on to the next problem: how can we represent graphs in memory? What data structures are suitable?
There are two popular approaches:
- Adjacency matrices
- Adjacency lists
Let's assume we have some graph with vertices. We can name these vertices . Then, the adjacency matrix of is an matrix, that is, a matrix which has rows and columns. Let's call that matrix .
is an entry of , where indicates the row and indicates the column. is if there is an edge which goes from vertex to vertex . Otherwise, is zero.
For our graph , this would be , assuming that the vertices are ordered in lexicographical order:
For instance, if our indexing is zero-based, since is the 3rd vertex and has an edge pointing to itself.
Let's take with its vertices again. We come up with an array with entries, one for each vertex. Each entry contains a list of all vertices to which the respective node has outgoing edges too. So an entry is a list of vertices adjacent to the node for which the entry is.
This adjacency list represents a graph , where
I'll just throw this table at your head; explanations (and a bit of additional information) will follow:
|Efficiency Criterion||Adjacency Matrix||Adjacency List|
|Time to check if is adjacentto (adjacency check)|
|Time to visit all adjacentto (adjacency visits)|
|Time to visit all edges (visit-all)|
Adjacency matrices have a major advantage that they are very easy to implement. We can simply use a two-dimensional array, or even one-dimensional if we use an injective, or ideally bijective, mapping function.
Good for Undirected-Check
It's easy to check whether a graph is undirected or not. If it's an undirected graph, the corresponding adjacency matrix is symmetric around the diagonal. If you don't know why this is the case, I'd advise you to draw an undirected graph and write down the adjacency matrix.
Good for Adjacency-Check
Given that we have direct access to all entries of using array indices, we can check very efficiently if, given a pair of vertices, they are adjacent, i.e. there exists an edge between the vertices.
Index-accesses and simple comparisons happen within constant time, so worst-case running-time: .
We have an matrix, so the implementation will use up space. So rather space-inefficient in comparison to adjacency lists where we don't have all the zeros of .
Slow Adjacency Visits
We do have direct access via indices, so we can directly access the row for vertex . However, we don't know which of the cells of that row are and which are . So we need to go through the entire list which is of length . Worst-case running-time of .
Similar to the prior section, but now we're going through all rows and for each row through all columns. So worst-case running-time of .
In comparison to the adjacency matrix, the adjacency list is a more compact way of representing a graph as we don't have all the zeros of .
The list has entries and within these entries, all vertices are split up. So space-efficiency of .
Good for Adjacency Visits
If we want to visit all vertices adjacent to , we simply index into the entry for , a operation, and go through the entry list which has length .
Worst-case running-time of , pretty good!
Good for Visiting All Vertices
We have a faster asymptotic worst-case running-time for visiting all vertices which are adjacent to a given vertex. We simply direct-access the entry for each of vertices and go along the respective entry list, which has length in total.
Worst-case running-time of .
Slow Adjacency Check (is a neighbour of ?)
Since each of the entries of the adjacency list is an unordered list of neighbours, we need to go through the list until we find (or we don't, then the check evaluates to
In the worst case, we need to go through the entire list. Recall that the length of that list is denoted by . So we have a worst-case running-time of .
Use adjacency lists unless the adjacency check plays a major role in your algorithm.
Stuff that is good to know but may not be used in my article series about graphs.
Recall that is the number of vertices of a graph. Then, the maximum number of edges is . This is easy to understand: if we start with the first vertex , we can have at most edges going out from , including an edge to itself (yup, that's definitely possible). For vertex , we can also have at most edges going out from the vertex. And so on, up until vertex . So the maximum number of edges is , with summands. Hence, at most edges.
Recall that is the number of edges of a graph . If is close to , we say that is a dense graph. You can think of this property like this: a dense graph is a graph which has proportionally many edges, it is well connected.
If is much smaller than , we call a sparse graph.
There is no exact definition how much smaller or how close to should be, these terms are usually used when we're talking asymptotically about a graph, checking whether it hast most or few edges. In that context, we'd usually talk about rather than as an exact number.
Kuratowski's theorem is useful when working with planar graphs. The theorem has the corollary that
We can use this to:
- Check whether a graph is planar given the number of edges and the number of vertices . The graph is planar if the inequality remains true.
- Given the information that a graph is planar, we can use the criterion to obtain a quite tight upper bound for the number of edges. Looking at the inequality, that number will asymptotically be the same as the number of vertices for planar graphs.
- Cryan, Mary. ´Graphs I: graphs, Breadth-first search, Depth-first search.´ Class Lecture, University of Edinburgh, November 14, 2020.
Cormen, Thomas H., Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Third Edition. Cambridge, UNITED STATES: MIT Press, 2009. http://ebookcentral.proquest.com/lib/ed/detail.action?docID=3339142.
- Section 22.1
- Radcliﬀe, Mary. ‘Math 228: Kuratowski’s Theorem’, n.d., 9. https://www.math.cmu.edu/~mradclif/teaching/228F16/Kuratowski.pdf.
- ‘Cartesian Product’. In Wikipedia, 12 November 2020. https://en.wikipedia.org/w/index.php?title=Cartesian_product&oldid=988368067.