Introduction to Analysis of Algorithms

Graph Theory

Graphs

A graph G consists of an ordered pair of sets = (, ) where ≠ ?, and ? (2) = {2-subsets of }.

In other words E consists of unordered pairs of elements of V. We call = () the vertex set, and =

() the edge set of G. In this handout, we consider only graphs in which both the vertex set and edge set

are finite. An edge {, }, typically denoted xy or yx, is said to join its two end vertices x and y, and these

ends are said to be incident with the edge xy. Two vertices are called adjacent if they are joined by an edge,

and two edges are said to be adjacent if they have a common end vertex. A graph will usually be depicted

as a collection of points in the plane (vertices), together with line segments (edges) joining the points.

Two graphs 1 and 2 are said to me isomorphic if there exists a bijection : (1) → (2) such that for

any , ∈ (1), the pair xy is an edge of 1 if and only if the pair ()() is an edge of 2. In other

words, the function must preserve all incidence relations amongst the vertices and edges in 1 . We write

1 ? 2 to mean that 1 and 2 are isomorphic.

Example 2 Let 1 be the graph from the previous example, and define 2 by (2) = {, , , , , },

(2) = {, , , , , , , , , }. 2 can be drawn as

Define a mapping : (1) → (2) by 1 → , 2 → , 3 → , 4 → , 5 → , 6 → . One checks easily

that is an isomorphism.

Isomorphic graphs are indistinguishable as far as graph theory is concerned. In fact, graph theory can be

defined to be the study of those properties of graphs that are preserved by isomorphisms. Thus a graph is

not a picture, in spite of the way we visualize it. Instead, a graph is a combinatorial object consisting of

two abstract sets, together with some incidence data relating those sets.

Notice that our definition of a graph does not allow for the existence of an edge joining a single vertex to

itself (sometimes called a loop), since an edge = {, } must be a 2-element subset of V, and therefore

must have distinct ends.

Neither does our definition allow two distinct edges to join the same pair of vertices (parallel edges), since

the edge set E, being a set of pairs of vertices, cannot contain the same pair twice.

When these types of edges are allowed, we call the resulting structure a multi-graph. Some authors use the

term “graph” to denote what we have designated as a multi-graph. Those authors would call our notion of

graph (i.e. a graph in which loops and parallel edges do not occur) a simple graph.

The degree of a vertex ∈ (), denoted deg (), is the number of edges incident with x, or equivalently,

the number of vertices adjacent to x. Referring to Example 1 above, we see that deg(1) = 2, deg(2) = 5

and deg(6) = 3. The degree sequence of a graph consists of its vertex degrees arranged in increasing order.

The graph in Example 1 has degree sequence (2, 3, 3, 3, 4, 5). Observe that the graph in Example 2 has the

same degree sequence. Clearly if : (1) → (2) is an isomorphism, then deg(()) = deg () for

any ∈ (), and hence isomorphic graphs must have the same degree sequence. This is our first example

of an isomorphism invariant, a graph property that is preserved by isomorphisms.

Observe that

∑ deg ()

∈()

= 2|()|

since each edge, having two distinct ends, contributes 2 to the sum on the left. This is sometimes known as

the Handshake Lemma for it says that the number of hands shaken at a party is exactly twice the number of

handshakes.

Exercise 1 Show that the number vertices of odd degree in any graph must be even. (Hint: suppose G

contains an odd number of odd vertices. Argue that the left hand side of the above equation is then odd,

while the right hand side is clearly even.)

Given any , ∈ (), an x-y walk in G, is a sequence of vertices = 0, 1, 2, … , ?1, = for which

?1 ∈ () for 1 ≤ ≤ . We call x the origin and y the terminus of the walk. Note these need not be

distinct. If = , the walk is said to be closed. The length of the walk is k, the number of edge traversals

performed in going from x to y along the sequence. Since the edges of a graph have no inherent direction,

we do not distinguish between the above sequence and its reversal: = , ?1, … , 2, 1, 0 = . Thus

the designation as to which vertex in a walk is the origin and which is the terminus is arbitrary. If G contains

an x-y walk, we say that y is reachable from x (equivalently x is reachable from y). A special case of a

closed walk is the trivial walk = 0 = of length 0, in which no edges are traversed.

A walk in which no edge is traversed more than once is called a trail, and a trail in which no vertex is visited

more than once (except possibly when origin=terminus) is called a path. Observe that the trivial walk is

both a trail and a path. A non-trivial closed path is called a cycle. Since our graphs have no loops or parallel

edges, the minimum possible length for a cycle is 3.

3

Example 3 Referring again to Example 1 we have:

a cycle of length 3: 2, 5, 6, 2

a cycle of length 6: 1, 2, 3, 6, 5, 4, 1

a 1-6 path of length 5: 1, 4, 2, 5, 3, 6

a 1-6 path of length 2: 1, 2, 6

a 3-1 trail that is not a path: 3, 2, 5, 6, 2, 1

a 3-1 walk that is not a trail: 3, 5, 2, 4, 5, 2, 1

the trivial 1-1 path: 1 (though this is a closed path, it is not a cycle)

Note that the reachability relation could just as well have been defined in terms of paths instead of walks

since, if G contains an x-y walk that is not a path, we can, by eliminating some cycles, replace it by an x-y

path. For instance the 3-1 walk in Example 1 above (3, 5, 2, 4, 5, 2, 1) contains the cycle (5, 2, 4, 5), and

can be replaced by the 3-1 path (3, 5, 2, 1).

The distance from x to y , denoted (, ), is defined as follows. If y is reachable from x, then (, ) is

the length of a shortest x-y path. If y is not reachable from x (i.e. if no x-y path exists), then (, ) is

infinity. The Single Source Shortest Path (SSSP) problem is this: given a distinguished vertex ∈ ()

called the source, determine (, ) for all ∈ (), and for each x that is reachable from s, determine a

shortest s-x path in G. We will learn two efficient algorithms that solve this problem.

A graph G is said to be connected iff y is reachable from x for every pair of vertices , ∈ (). If G is

not connected, it is called disconnected. Examples 1 and 2 above are clearly connected, while the following

graph is disconnected.

Example 4 = {1, 2, 3, 4, 5, 6, 7, 8, 9}, = {12, 15, 25, 26, 56, 37, 38, 78, 49}

A subgraph of a graph G is a graph H in which () ? (), and () ? (). In the above example,

({1, 2, 5}, {12, 15, 25}) is a connected subgraph, while ({2, 3, 6, 7}, {26, 37}) is a disconnected subgraph.

A subgraph H is called a connected component of G if it is (i) connected, and (ii) maximal with respect to

property (i), i.e. any other subgraph of G that properly contains H is disconnected. Observe Example 4 has

three connected components: ({1, 2, 5, 6}, {12, 15, 25, 26, 56}), ({3, 7, 8}, {37, 38, 78}) and ({4, 9}, {49}).

Obviously a graph is connected if and only if it has exactly one connected component.

Trees

A graph is called acyclic (or a forest) if it contains no cycles. A tree is a graph that is both connected and

acyclic. The connected components of an acyclic graph are obviously trees. The following example is a

forest with three connected components.

# of vertices: 8 6 5

# of edges: 7 5 4

Observe that in each tree of this forest, the number of edges is one less that the number of vertices. This

fact holds in general for all trees. The following lemmas demonstrate how the independent properties of

connectedness and acyclicity are related.

Lemma 1 If T is a tree with n vertices and m edges, then = ? 1.

Proof:

This result was proved in the handout on Induction Proofs by induction on n. We prove it here by induction

on m. If = 0 then T can have only one vertex, since T is connected. Thus = 1, and = ? 1,

establishing the base case. Now let > 0 and assume that any tree ′ with fewer than m edges satisfies

|(′)| = |(′)| ? 1. Pick an edge ∈ () and remove it. The resulting graph consists of two trees

1, 2, each having fewer than m edges. Suppose has edges and vertices ( = 1, 2). Then the

induction hypothesis gives = ? 1 ( = 1, 2). Also = 1 + 2 since no vertices were removed.

Therefore = 1 + 2 + 1 = (1 ? 1) + (2 ? 1) + 1 = 1 + 2 ? 1 = ? 1, as required. ■

Lemma 2 If G is an acyclic graph with n vertices, m edges, and k connected components, then = ? .

Proof:

Let the connected components of G (which are necessarily trees) be denoted 1, ?2, ?… ,?. Suppose

has edges and vertices respectively (1 ≤ ≤ ). By Lemma 1 we have = ? 1 (1 ≤ ≤ ).

T

Lemma 3 If G is a connected graph with n vertices and m edges, then ≥ ? 1.

Proof:

Our proof is a generalization of that of Lemma 1, again by induction on m. If = 0, then G, being

connected, can have only one vertex, hence = 1. Therefore ≥ ? 1 reduces to 0 ≥ 0, showing that

the base case is satisfied.

Let > 0, and assume for any connected graph ′ with fewer than m edges that |(′)| ? ≥ ? |(′)| ? 1.

Remove an edge ∈ () and let ? denote the resulting subgraph. We have two cases to consider.

Case 1: is connected. We note that has n vertices and 1 edges, so the induction hypothesis

gives 1 ≥ 1. Certainly then ≥ 1, as was claimed.

Case 2: ? is disconnected. In this case ? consists of two connected components. (**See the claim

and proof below.) Call them 1 and 2, and observe that each component contains fewer than m edges.

Suppose has edges and vertices ( = 1, ?2). The induction hypothesis gives ≥ ? 1 ( =

1, ?2). Also = 1 + 2 since no vertices were removed. Therefore

= 1 + 2 + 1 ≥ (1 ? 1) + (2 ? 1) + 1 = 1 + 2 ? 1 = ? 1

and therefore ≥ ? 1 as required. ■

Claim**: Let G be a connected graph and ∈ (), and suppose that ? is disconnected. (Such an

edge e is called a bridge). Then ? has exactly two connected components.

Proof:

Since ? is disconnected, it has at least two components. We must show that it also has at most two

components. Let e have end vertices u, and v. Let and be the connected components of ? that

contain u and v respectively. Choose ∈ () arbitrarily, and let P be an x-u path in G (note P exists since

G is connected.) Either P includes the edge e, or it does not. If P does not contain e, then P remains intact

after the removal of e, and hence P is an x-u path in ? , whence ∈ . If on the other hand P does

contain the edge e, then e must be the last edge along P from x to u.

In this case is an x-v path in ? , whence ∈ . Since x was arbitrary, every vertex in ?

belongs to either or , and therefore ? has at most two connected components. ■

Lemma 4 If G is a graph with n vertices, m edges, and k connected components, then ≥ ? .

Proof:

Let 1,?2, ?… ,? , be the connected components of G. Let and denote the number of vertices and

edges, respectively, of , for 1 ≤ ≤ . By Lemma 3 we have ≥ ? 1, for 1 ≤ ≤ , and therefore

Lemma 5 Let G be a connected graph with n vertices and m edges. Suppose also that = ? 1. Then

G is acyclic, and hence a tree.

Proof:

Suppose G is connected and = ? 1. Assume, to get a contradiction, that G is not acyclic. Let e be any

edge belonging to any cycle in G. Remove e from G, and denote the resultant graph by ? . Observe

that ? has ? 1 edges and n vertices, respectively. Since e is a cycle edge, its removal does not

disconnect G, and therefore ? is also connected. Lemma 3 above then gives ? 1 ≥ ? 1, whence

≥ . But then = ? 1 gives ? 1 ≥ , a contradiction. Therefore our original assumption was

false, and therefore G is acyclic, as claimed. Being connected, G is also a tree. ■

Lemma 6 Let G be an acyclic graph with n vertices and m edges. Suppose also that = ? 1. Then G

is connected, and hence a tree.

Proof:

Suppose G is acyclic and = ? 1. Let k be the number of connected components of G. By Lemma 2

we have = ? , whence ? 1 = ? , and hence = 1, showing that G is connected, as claimed.

■

Lemma 7 Let G be a connected graph with n vertices and m edges. Suppose also that = . Then G

contains exactly one cycle. (Such a graph is called unicyclic.)

Proof:

G contains at least one cycle since otherwise G is a tree, and hence = 1 (by Lemma 1), contrary to

hypothesis. If G contained two distinct cycles, say 1 and 2, we could find edges 1 ∈ (1) ? (2) and

2 ∈ (2) ? (1). Removing these two edges gives a connected graph = ? 1 ? 2 with |()| ? =

and |()| ? = ? ? 2, contradicting Lemma 3. ■

Consider the following three properties of a graph = (, ) in light of Lemmas 1, 5, and 6:

(i) G is connected,

(ii) G is acyclic

(iii) || = || ? 1.

We see that these properties are logically dependent in the sense that if any two hold, then the third must

also hold. Lemma 1 states that (i) and (ii) together imply (iii), Lemma 5 says that (i) and (iii) imply (ii),

and Lemma 6 says (ii) and (iii) imply (i). The following theorem summarizes these and other facts about

trees.

Theorem 1 (The Treeness Theorem) Let = (, ) be a graph with || = and || = . Then the

following statements are equivalent.

a) G is a tree (i.e. connected and acyclic).

b) G contains a unique x-y path for any , ∈ .

c) G is connected, but if any edge is removed, the resulting graph ? is disconnected.

d) G is connected, and = ? 1.

e) G is acyclic, and = ? 1.

f) G is acyclic, but if any edge is added (joining two non-adjacent vertices), then the resulting graph

- contains a unique cycle.

Note: This is theorem B.2 in Cormen, Leiserson, Rivest, & Stein (p.1085 in 2nd ed., p.1174 in 3rd ed.)

Proof: As mentioned in the preceding paragraph, Lemmas 1, 5, and 6 have already established the

equivalences (a)?(d)?(e).

7

(a)?(b): Suppose G is a tree and let , ∈ (). If = , then the trivial path is the only x-y path in G,

since being a tree, G has no cycles. Suppose ≠ . Since G is connected, there exists at least one - path

in G. Assume, to get a contradiction, that G contains two distinct - paths. Call them 1 and 2. By

traveling along 1 from x to y, then along 2 from y to x, we obtain a closed walk in G that begins and ends

at x. If no vertex (other than x) is repeated in this walk, then we have found a cycle in G. If some vertex is

repeated in this walk, we can obtain a cycle as follows. Travel along 1 from x to the first repeated vertex,

then back to x along 2. Therefore G contains a cycle, contradicting that it is a tree. This contradiction

shows our assumption was false, and hence two different - paths cannot exist in G, which therefore

contains a unique - path.

() ? (): Suppose G contains a unique x-y path for any , ∈ . Then G is certainly connected. Pick

any edge = {, } ∈ and remove it. Since e constitutes a path joining its two ends (x and y) and since

there are no other such paths by hypothesis, the resulting graph ? contains no x-y path, and is therefore

disconnected.

() ? (): Suppose G is connected, and if any edge is removed from G, the resulting graph is disconnected.

Assume, to get a contradiction, that G contains a cycle. Then the removal of any edge on that cycle would

not disconnect G, contrary to hypothesis. It follows that no such cycle can exist, and hence G is acyclic.

Since G is also connected, it is a tree.

() ? (): Suppose G is a tree (i.e. connected and acyclic). By Lemma 1 we also have = ? 1. Pick

any two non-adjacent vertices and join them with a new edge e. The resulting graph + is still connected,

and |( + )| = + 1 = = |( + )|. Lemma 7 now says that + contains exactly one cycle.

() ? (): Suppose G is acyclic, and if a new edge e is inserted between any two non-adjacent vertices,

then the resulting graph + is unicyclic. Pick any , ∈ with ≠ . If x and y are adjacent, then

certainly y is reachable from x. If x and y are non-adjacent, insert a new edge e joining x to y, and call the

resulting cycle C. Observe that the edges of ? constitute an x-y path in G, so in this case also, y is

reachable from x. Since x and y were arbitrary, we conclude that G is connected, and therefore a tree.

This completes the proof of the Treeness Theorem. ■

Directed Graphs

A Directed Graph (or Digraph) = (, ) is a pair of sets, where the vertex set = () is, as before,

finite and non-empty, and the edge set = () ? × , i.e. E consists of ordered pairs of vertices.

The directed edge (x, y) in the above example is said to have origin x and terminus y, and we say that x is

adjacent to y. The origin and terminus of a directed edge are said to be incident with that edge. Two edges

are called adjacent if they have a common end vertex, so for instance (x, y) in the above example is adjacent

to (u, x). The in degree of a vertex is the number of edges having that vertex as terminus, and it’s out degree

is the number of edges having that vertex as origin. The degree of a vertex is the sum if it’s in degree and

out degree. Thus in the above example id() = 1, od() = 2, and deg() = 3. The analog of the

handshake lemma for directed graphs is

As in the undirected case, there is a simple notion of isomorphism for directed graphs. Two digraphs 1

and 2 are said to me isomorphic if there exists a bijection : (1) → (2) such that for any , ∈

(1), the ordered pair (, ?) is a directed edge of 1 if and only if the ordered pair ((), ?()) is a

directed edge of 2. Thus preserves incidence relations and directionality amongst the vertices and edges

of 1. We write 1 ? 2 to mean that 1 and 2 are isomorphic.

A directed path P in a digraph is a finite sequence of vertices P: 0, 1, 2, … , ?1, such that (?1, ) ∈

for all 1 ≤ ≤ . As in the undirected case, we require that all vertices be distinct (except possibly 0

and ), and that no edge be traversed more than once. If it so happens that the initial and terminal vertices

are the same, 0 = 1, the path is called a directed cycle. The length of such a path is k, the number of

edges traversed. If = 0 ≠ = , we call P a directed x-y path. Notice that, unlike the undirected case,

a directed x-y path and a directed y-x path are not the same thing. We say that ∈ () is reachable from

∈ () if G contains a directed x-y path. Observe that for digraphs, the reachability relation is reflexive

(x is reachable from x via the trivial path with no edges), transitive (if y is reachable from x, and z is reachable

from y, then z is reachable from x), but not symmetric (y may be reachable from x without x being reachable

from y).

A digraph G is said to be strongly connected if for all , ?? ∈ (), both x is reachable from y, and y is

reachable from x. Notice that the digraph in Example 6 above is not strongly connected, since for instance,

u is not reachable from y. The following example is strongly connected.

More generally, a subset ? () is said to be strongly connected if for all , ? ∈ , both x is reachable

from y, and y is reachable from x. Furthermore, a subset ? () is said to be a strongly connected

component of the digraph G if it is (i) strongly connected, and (ii) maximal with respect to property (i), i.e.

any other subset of () that properly contains S is not strongly connected. Obviously G is strongly

connected iff it has just one strongly connected component, namely () itself. Going back to the digraph

in Example 6, we see that it has 2 strongly connected components: {, , } and {}.

If we replace each directed edge in a digraph G with an undirected edge, we obtain an (undirected) graph

known as the underlying undirected graph of G. Note that two non-isomorphic digraphs, such as Examples

6 and 7 above, can have the very same underlying graph.

Representations of Graphs and Digraphs

We discuss three methods for representing graphs and digraphs in terms of standard data structures available

in most computer languages. They are called the Incidence Matrix, the Adjacency Matrix, and the

Adjacency List representations respectively. In what follows we suppose = (, ?) to be a graph

(directed or undirected) with || = and || = .

The Incidence Matrix () requires that both the vertex set () and the edge set () be ordered. For

this purpose we suppose that = {1, 2, … , } and = {1, 2, 3, … , }. Then () is an ×

rectangular matrix. Row i corresponds to vertex , for 1 ≤ ≤ . Column j corresponds to edge (1 ≤

≤ ), and contains zeros everywhere except for the two rows corresponding to the ends of . If G is an

undirected graph, these two rows contain 1s. If G is a directed graph, the row corresponding to the origin

of contains ?1, while the row corresponding to the terminus of contains +1. Thus () = () where

in the undirected case: