
Trees


Minimally Connected Graphs

Theorem 1

Let $G$ be a connected simple graph on $n$ vertices. Then the following are equivalent.

(1) $G$ is minimally connected, that is, if we remove any edge of $G$, then the obtained graph $G'$ will not be connected.

(2) $G$ does not contain a cycle.

Definition

A connected simple graph $G$ satisfying either, and therefore both, criteria of Theorem 1 is called a tree.

Proof

(1) $\Rightarrow$ (2) Let us assume that $G$ is minimally connected, but it contains a cycle $C$. Remove the edge $ab$ of $C$, and call the obtained graph $G'$. We claim that $G'$ is still connected. Indeed, let $x$ and $y$ be two vertices in $G$. As $G$ is connected, $G$ contains a path $p$ from $x$ to $y$. If $p$ does not contain the edge $ab$, then it still connects $x$ and $y$ in $G'$. If $p$ contains $ab$, then let us replace $ab$ by the other (longer) arc of $C$ between $a$ and $b$, to get a new walk from $x$ to $y$. As there is a walk from $x$ to $y$ in $G'$, there must also be a path. Therefore, $G'$ is connected, which is a contradiction.

(2) $\Rightarrow$ (1) We prove that the opposite of (1) implies the opposite of (2). That will suffice: if (2) holds, the opposite of (1) cannot hold, as that would imply the opposite of (2); therefore (1) has to hold, and "(2) implies (1)" follows. So let us assume that $G$ is not minimally connected. That means that there is an edge in $G$, say $AB$, so that $G' = G - \{AB\}$ is still connected. Then there is a path $P$ from $B$ to $A$ in $G'$. However, $AB \cup P$ must then be a cycle in $G$, as it is a closed walk that starts in $A$, ends in $A$, and repeats no other vertex. So $G$ contains a cycle.

Corollary

A connected graph $H$ is a tree if and only if for each pair of vertices $(x, y)$, there is exactly one path joining $x$ and $y$.

Theorem 2

All trees on $n$ vertices have $n-1$ edges. Conversely, all connected graphs on $n$ vertices with exactly $n-1$ edges are trees.

Proof

We use induction on $n$. If $n = 1$, the statement is trivially true, as a one-vertex cycle-free graph has no edges. Let us assume that the statement is true for trees on $n$ vertices. Let $T$ be a tree on $n+1$ vertices. Find a leaf $\ell$ in $T$ (the previous lemma ensures the existence of two leaves), then delete $\ell$ and the only edge $e$ adjacent to it from $T$, to get a new tree $T'$. (Note that $T'$ is always a tree, as it is connected and cycle-free.) This new tree $T'$ has $n$ vertices, so by the induction hypothesis, it has $n-1$ edges. But then $T = T' \cup e$ has $n$ edges, and the Theorem is proved.

Proposition

Let $F$ be a forest on $n$ vertices with $k$ connected components. Then $F$ has $n-k$ edges.

Proof

By Theorem 2, the number of vertices exceeds the number of edges by one in each connected component; summing over all $k$ components, the claim follows.
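The proposition can be sanity-checked numerically. Below is a minimal sketch; the six-vertex forest is a made-up example, and `component_count` is an illustrative helper based on union-find:

```python
def component_count(n, edges):
    """Number of connected components of a graph on vertices 0..n-1."""
    parent = list(range(n))          # union-find forest

    def find(x):
        # Walk to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    comps = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                 # merging two components
            parent[ru] = rv
            comps -= 1
    return comps

# A forest on n = 6 vertices with k = 2 components:
# the path 0-1-2 and the star 3-4-5 centered at 4.
forest = [(0, 1), (1, 2), (4, 3), (4, 5)]
# It has n - k = 6 - 2 = 4 edges, as the proposition predicts.
```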

Cayley's Formula

For any positive integer $n$, the number of all trees with vertex set $[n]$ is $A_n = n^{n-2}$.

Proof

Take all $A_n$ trees on $[n]$, and in each of them, choose two vertices, which do not have to be different, and call one of them Start and the other one End. Do this in all possible $n^2$ ways for each tree. Call the $n^2 A_n$ objects obtained this way doubly rooted trees.

We are going to show that the number of doubly rooted trees on $[n]$ is $n^n$ by constructing a bijection from the set of all functions from $[n]$ to $[n]$ to that of doubly rooted trees on $[n]$. This will prove our Theorem, since it implies $n^2 A_n = n^n$, that is, $A_n = n^{n-2}$.

Let $f$ be a function from $[n]$ to $[n]$. Let $C \subseteq [n]$ be the subset of elements $x \in [n]$ which are part of a cycle under the action of $f$, that is, for which there is a positive integer $i$ so that $f^i(x) = x$. Let $C = \{c_1 < c_2 < \cdots < c_k\}$. Now let $d_i = f(c_i)$, and write the integers $d_1, d_2, \ldots, d_k$, in this order, on the nodes of a tree consisting of one line of $k$ vertices. In other words, we write down the elements of $C$ in the order given by the permutation that is the product of the cycles on $C$. Also, we mark $d_1$ by Start and $d_k$ by End.

Finally, if $j \in [n]$ but $j \notin C$, then join the vertex $j$ to the vertex $f(j)$ by an edge. This way we always get a tree. Indeed, we get a connected graph, as the Start-End line is connected to all vertices, and we get a cycle-free graph, as the only cycles created by $f$ involved vertices from $C$, and $C$ corresponds to a single line. The tree is doubly rooted, as the vertices Start and End are marked.

To see that this is a bijection, take a doubly rooted tree on $[n]$. For vertices $j$ not on the Start-End line, define $f(j)$ to be the first neighbor of $j$ on the unique path from $j$ to the Start-End line. For the vertices on the Start-End line, define $f$ so that the image of the $i$th smallest of them is the one that is in the $i$th position from Start.

This shows that there is exactly one function $f : [n] \rightarrow [n]$ corresponding to each doubly rooted tree, and our theorem is proved.
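Cayley's formula can also be verified by brute force for small $n$: by Theorem 2, a graph on $[n]$ with $n-1$ edges is a tree exactly when it is connected, so we can count trees by testing every $(n-1)$-edge subset of $K_n$ for connectivity. A minimal sketch (the function name is illustrative):

```python
from itertools import combinations

def count_labeled_trees(n):
    """Count trees on vertex set {0, ..., n-1} by brute force."""
    all_edges = list(combinations(range(n), 2))   # edges of K_n
    count = 0
    for subset in combinations(all_edges, n - 1):
        # n-1 edges + connectedness <=> tree (Theorem 2).
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        comps = n
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                comps -= 1
        if comps == 1:
            count += 1
    return count

# count_labeled_trees(n) should equal n**(n-2) for small n.
```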


Minimum-weight Spanning Trees

Kruskal's Greedy Algorithm
  1. Create a forest $F$ (a set of trees), where each vertex in the graph is a separate tree
  2. Create a sorted set $S$ containing all the edges in the graph
  3. While $S$ is nonempty and $F$ is not yet spanning
    • Remove an edge with minimum weight from $S$
    • If the removed edge connects two different trees, then add it to the forest $F$, combining two trees into a single tree
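The steps above can be sketched in code. This is an illustrative implementation using union-find to detect when an edge joins two different trees; the four-vertex weighted graph at the end is made-up example data:

```python
def kruskal(n, edges):
    """Return a minimum-weight spanning forest of a graph on vertices 0..n-1.

    edges: list of (weight, u, v) tuples.
    """
    parent = list(range(n))          # union-find: one tree per vertex (step 1)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for w, u, v in sorted(edges):    # step 2/3: process edges by weight
        ru, rv = find(u), find(v)
        if ru != rv:                 # edge connects two different trees
            parent[ru] = rv          # combine them into a single tree
            chosen.append((w, u, v))
    return chosen

# Example: on this graph the algorithm picks the edges of weight 1, 2, 3.
tree = kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)])
```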

Lemma

Let $F$ and $F'$ be two forests on the same vertex set $V$, and let $F$ have fewer edges than $F'$. Then $F'$ has an edge $e$ that can be added to $F$ so that the obtained graph $F \cup e$ is still a forest.

Proof

Let us assume that there is no such edge $e \in F'$. Then adding any edge of $F'$ to $F$ would create a cycle in $F$. So all edges of $F'$ are between two vertices of the same component of $F$, and therefore each component of $F'$ is contained in a component of $F$. Consequently, $F'$ has at least as many components as $F$. This is a contradiction, however, as we know that a forest on $n$ vertices and with $k$ components has $n-k$ edges, so if $F'$ has more edges than $F$, it must have fewer components.

Now we are in a position to prove the main result of this section.

Theorem

The greedy algorithm always finds the minimum-weight spanning tree.

Proof

Again, we use an indirect argument. Assume the greedy algorithm gives us the spanning tree $T$, whereas our graph $G$ has a spanning tree $H$ whose total weight is less than that of $T$. Let $h_1, h_2, \ldots, h_{n-1}$ be the edges of $H$ so that $w(h_1) \leq w(h_2) \leq \cdots \leq w(h_{n-1})$ holds. Similarly, let $t_1, t_2, \ldots, t_{n-1}$ be the edges of $T$ so that $w(t_1) \leq w(t_2) \leq \cdots \leq w(t_{n-1})$ holds.

Let $i$ be the step at which $H$ first "beats" $T$. That is, let $i$ be the smallest integer so that $\sum_{j=1}^{i} w(h_j) < \sum_{j=1}^{i} w(t_j)$. Such an index $i$ exists, as at the end of the entire selection procedure $H$ beats $T$, so there has to be a time when $H$ takes the lead. It is also clear that $i > 1$, as $w(t_1)$ is minimal among all the edge-weights of $G$.

As $i$ is the first index at which $H$ took the lead, the inequality $w(h_i) < w(t_i)$ must hold. Indeed, this is the only way

$$\sum_{j=1}^{i} w(h_j) < \sum_{j=1}^{i} w(t_j)$$

and

$$\sum_{j=1}^{i-1} w(h_j) \geq \sum_{j=1}^{i-1} w(t_j)$$

can both hold. We will deduce a contradiction from this, that is, we will prove that with $w(h_i) < w(t_i)$ holding, the greedy algorithm could not possibly choose $t_i$ at step $i$. Let $T_{i-1}$ be the forest the greedy algorithm produced in $i-1$ steps, that is, the union of the edges $t_1, t_2, \ldots, t_{i-1}$, and let $H_i$ be the forest formed by the edges $h_1, h_2, \ldots, h_i$. Applying the Lemma above to $T_{i-1}$ and $H_i$, we see that there is an edge $h_j$ (for some $j \leq i$) that can be added to $T_{i-1}$ without forming a cycle. However, our definitions show that $w(h_j) \leq w(h_i) < w(t_i)$, so at step $i$, the greedy algorithm could not add $t_i$ to $T_{i-1}$, as $t_i$ did not have minimum weight among the edges that could be added to $T_{i-1}$ without forming a cycle.

This proves by contradiction that no spanning tree $H$ can have a smaller total weight than $T$, the tree obtained by the greedy algorithm.


Graphs and Matrices

Definition

Let $G$ be an undirected graph on $n$ labeled vertices, and define an $n \times n$ matrix $A = A_G$ by setting $A_{i,j}$ equal to the number of edges between vertices $i$ and $j$. Then $A$ is called the adjacency matrix of $G$.

Theorem

Let $G$ be a graph on labeled vertices, let $A$ be its adjacency matrix, and let $k$ be a positive integer. Then the $(i,j)$-entry of $A^k$ is equal to the number of walks from $i$ to $j$ that are of length $k$.

Proof

By induction on $k$. For $k=1$, the statement is true, as a walk of length one is an edge. Now let us assume that the statement is true for $k$, and prove it for $k+1$. Let $z$ be any vertex of $G$. If there are $b_{i,z}$ walks of length $k$ from $i$ to $z$, and there are $a_{z,j}$ walks of length one (in other words, edges) from $z$ to $j$, then there are $b_{i,z} a_{z,j}$ walks of length $k+1$ from $i$ to $j$ whose next-to-last vertex is $z$. Therefore, the number of all walks of length $k+1$ from $i$ to $j$ is

$$c(i,j) = \sum_{z \in G} b_{i,z} a_{z,j}.$$

It follows from the induction hypothesis that the matrix $B$ defined by $B_{i,j} = b_{i,j}$ fulfills $B = A^k$. It is immediate from the definition of the adjacency matrix $A$ of $G$ that $A_{i,j} = a_{i,j}$.

Therefore, it follows from the definition of matrix multiplication that $c(i,j) = \sum_{z \in G} b_{i,z} a_{z,j}$ is in fact the $(i,j)$-entry of $BA = A^{k+1}$ (indeed, it is the scalar product of the $i$th row of $B$ and the $j$th column of $A$), and our claim is proved.
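The theorem is easy to check on a small graph. A minimal sketch on the path with vertices $0, 1, 2$, using plain-Python matrix multiplication (no libraries assumed):

```python
# Adjacency matrix of the path 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][z] * Y[z][j] for z in range(n)) for j in range(n)]
            for i in range(n)]

A2 = matmul(A, A)
# Walks of length 2: from 0 to 0 only 0-1-0 (entry 1); from 1 to 1 both
# 1-0-1 and 1-2-1 (entry 2); from 0 to 2 only 0-1-2 (entry 1).
```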


The Number of Spanning Trees on a Graph

Definition

Let $G$ be a directed graph without loops. Let $\{v_1, v_2, \ldots, v_n\}$ denote the vertices of $G$, and let $\{e_1, e_2, \ldots, e_m\}$ denote the edges of $G$. Then the incidence matrix of $G$ is the $n \times m$ matrix $A$ defined by

  • $a_{i,j} = 1$ if $v_i$ is the starting vertex of $e_j$,
  • $a_{i,j} = -1$ if $v_i$ is the ending vertex of $e_j$, and
  • $a_{i,j} = 0$ otherwise.
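As a small illustration of the definition, here is the incidence matrix of the directed path $1 \to 2 \to 3$ (vertices indexed from 0 in the code):

```python
# Edges e_0 = (0 -> 1) and e_1 = (1 -> 2); n = 3 vertices, m = 2 edges.
edges = [(0, 1), (1, 2)]
n, m = 3, 2

A = [[0] * m for _ in range(n)]
for j, (u, v) in enumerate(edges):
    A[u][j] = 1     # u is the starting vertex of edge j
    A[v][j] = -1    # v is the ending vertex of edge j
# A is the 3 x 2 matrix [[1, 0], [-1, 1], [0, -1]].
```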

Matrix-Tree theorem

Let $U$ be a simple undirected graph. Let $\{v_1, v_2, \ldots, v_n\}$ be the vertices of $U$. Define the $(n-1) \times (n-1)$ matrix $L_0$ by

$$l_{i,j} = \begin{cases} \text{the degree of } v_i & \text{if } i = j, \\ -1 & \text{if } i \neq j, \text{ and } v_i \text{ and } v_j \text{ are connected, and} \\ 0 & \text{otherwise,} \end{cases}$$

where $1 \leq i, j \leq n-1$. Then $U$ has exactly $\det L_0$ spanning trees.

Proof

First, we turn $U$ into a directed graph $G$ by replacing each edge of $U$ by a pair of directed edges, one edge going in each direction.

Let $A_0$ be the incidence matrix of $G$ with the last row removed. We claim that $A_0 A_0^T = 2L_0$. The entry of $A_0 A_0^T$ in position $(i,j)$ is the scalar product of the $i$th and $j$th rows of $A_0$. If $i = j$, then every edge that starts or ends at $v_i$ contributes 1 to this inner product. Therefore, the entry of $A_0 A_0^T$ in position $(i,i)$ is the degree of $v_i$ in $G$, or, in other words, twice the degree of $v_i$ in $U$.

If $i \neq j$, then every edge that starts at $v_i$ and ends at $v_j$, and every edge that starts at $v_j$ and ends at $v_i$, contributes $-1$ to this inner product. Recall that $U$ is simple, so there is either no edge or one edge from $v_i$ to $v_j$ in $G$. So the entry of $A_0 A_0^T$ in position $(i,j)$ is $-2$ if $v_i v_j$ is an edge of $U$, and 0 otherwise. This proves that indeed, $A_0 A_0^T = 2L_0$.

This implies that $2^{n-1} \det L_0 = \det(A_0 A_0^T)$. Note that each spanning tree of $U$ can be turned into $2^{n-1}$ different spanning trees of $G$ by orienting its $n-1$ edges. Therefore, by the Theorem below, $\det(A_0 A_0^T)$ equals the number of spanning trees of $G$, which is $2^{n-1}$ times the number of spanning trees of $U$. Comparing the two equalities, $U$ has exactly $\det L_0$ spanning trees.
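The Matrix-Tree theorem can be tried out on the 4-cycle $C_4$, which has exactly 4 spanning trees (delete any one of its 4 edges). A minimal sketch, with an exact determinant computed by Gaussian elimination over `Fraction` (the helper names are illustrative):

```python
from fractions import Fraction

def det(M):
    """Exact determinant of a square integer matrix via Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    d = Fraction(1)
    for i in range(n):
        p = next((r for r in range(i, n) if M[r][i] != 0), None)
        if p is None:
            return 0                  # zero column: singular matrix
        if p != i:
            M[i], M[p] = M[p], M[i]   # row swap flips the sign
            d = -d
        d *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            M[r] = [a - f * b for a, b in zip(M[r], M[i])]
    return int(d)

# Laplacian of the 4-cycle 0-1-2-3-0, then delete the last row and column.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4
L = [[0] * n for _ in range(n)]
for u, v in edges:
    L[u][u] += 1
    L[v][v] += 1
    L[u][v] -= 1
    L[v][u] -= 1
L0 = [row[:-1] for row in L[:-1]]
# det(L0) should equal 4, the number of spanning trees of C_4.
```

The same helper applied to the triangle $K_3$ gives $\det \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} = 3 = 3^{3-2} \cdot 3^{0} \cdot 3$, wait, simply $3$ spanning trees, matching Cayley's formula.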

Theorem

Let $G$ be a directed graph without loops, and let $A$ be the incidence matrix of $G$. Remove any row from $A$, and let $A_0$ be the remaining matrix. Then the number of spanning trees of $G$ is $\det A_0 A_0^T$.

Proof

Let us assume, without loss of generality, that it was the last row of $A$ that we removed. Let $B$ be an $(n-1) \times (n-1)$ submatrix of $A_0$. (If $m < n-1$, then $G$ cannot be connected, and it has no spanning trees.) We claim that $|\det B| = 1$ if and only if the subgraph $G'$ corresponding to the columns of $B$ is a spanning tree, and $\det B = 0$ otherwise. We prove this claim by induction on $n$.

(a) Let us first assume that there is a vertex $v_i$ (with $i \neq n$) of degree one in $G'$. (The degree of a vertex in an undirected graph is the number of all edges adjacent to that vertex.) Then the $i$th row of $B$ contains exactly one nonzero element, and that element is 1 or $-1$. Expanding $\det B$ by this row, and using the induction hypothesis, the claim follows. Indeed, $G'$ is a spanning tree of $G$ if and only if $G' - v_i$ is a spanning tree of $G - v_i$.

(b) Now let us assume that $G'$ has no vertices of degree one (except possibly $v_n$, the vertex associated to the deleted last row). Then $G'$ is not a spanning tree. Moreover, as $G'$ has $n-1$ edges and is not a spanning tree, there must be a vertex in $G'$ that has degree zero. If this vertex is not $v_n$, then $B$ has a zero row, and $\det B = 0$. If this vertex is $v_n$, then each column of $B$ contains one 1 and one $-1$, as each edge has a head and a tail. Therefore, the sum of all rows of $B$ is 0, so the rows of $B$ are linearly dependent, and $\det B = 0$.

So we have proved that indeed, $|\det B| = 1$ exactly if the subgraph $G'$ corresponding to the columns of $B$ is a spanning tree, and $\det B = 0$ otherwise. The Binet-Cauchy formula, which can be found in most linear algebra textbooks, says that

$$\det A_0 A_0^T = \sum (\det B)^2,$$

where the sum ranges over all $(n-1) \times (n-1)$ submatrices $B$ of $A_0$. However, we have just seen that $(\det B)^2 = 1$ if and only if $B$ corresponds to a spanning tree of $G$, and $(\det B)^2 = 0$ otherwise.

Corollary

The number of spanning trees of $K_n$ is $n^{n-2}$.

Proof

The matrix $L_0$ associated to $K_n$ has the following simple structure:

$$\begin{pmatrix} n-1 & -1 & \cdots & -1 \\ -1 & n-1 & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & n-1 \end{pmatrix}$$

To compute the determinant of this $(n-1) \times (n-1)$ matrix, add all rows to the first row; since each column sums to 1, we get

$$\begin{pmatrix} 1 & 1 & \cdots & 1 \\ -1 & n-1 & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & n-1 \end{pmatrix}$$

Now add the first row to all other rows to get the upper triangular matrix

$$\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 0 & n & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & n \end{pmatrix}$$

This shows that $\det L_0 = 1 \cdot n^{n-2} = n^{n-2}$, as claimed.