Recitation 6
Recitation notes for the week of October 2, 2017.
BFS/DFS Overview
- BFS explores nearest neighbors of node before moving on to the next set of neighbors, while DFS fully explores a certain path as far as possible before backtracking.
- DFS does not compute distance correctly. For more information, consult the support page on trying to use DFS to compute distance. E.g. consider the following (undirected) graph \[G=(\{A,B,C,D\},\{(A,B),(B,C),(C,D),(D,A)\}),\] i.e. this is a cycle with four nodes. To compute the distance from A to D, DFS would either output 1 or 3 depending on which path it takes first (and if we’re looking for minimum distance, 3 would be the wrong answer).
Homework 4, Question 2
- Go over definitions of
admissible
seating,distance compatible
property andfriendship distance
from the homework description. Go over example given in the homework description and examples from the walk through video also. - Remember, you have to separate your submission to have a formalization of the problem (optional if you choose to solve the second part directly), Algorithm idea, Algorithm details, Proof of correctness idea, Proof details, and runtime analysis.
Homework 4, Question 3
We will talk about a problem related to the actual Question 3:
A related Problem
Let $G=(V,E)$ be a connected graph. Recall that the distance between two nodes $u,w\in V$, which we will denote by $d(u,w)$ is the length of the shortest path connecting $u$ and $w$. In this problem we will consider how far apart the maximum and average of all these distances can be.
More formally, define the maximum distance between any two nodes, also called the diameter as defined as \[diam(G)=\max_{u,w\in V} d(u,w).\] Similarly define the average distance to be \[avgd(G)=\frac{\sum_{u\neq w\in V} d(u,w)}{\binom{|V|}{2}}.\]
Argue whether the following claim is true or false. There exists a constant $c\ge 1$ such that for every graph $G$, it is the case that \[\frac{diam(G)}{avgd(G)}\le c.\]
The difference
If we replace graph in the above by a complete binary tree, then we get the actual Question 3. (Recall that a complete binary tree is a tree such that all interior nodes have exactly two children and all leaves are at the same level.)
Proof Idea
The claim is false, so we will disprove it with a counterexample. In the claim, there has to be a fixed constant $c$ such that every possible connected graph satisfies the property. We're going to construct a graph so that for any $c$ value, we can make the $diam$ large enough and the $avgd$ small enough that their ratio will be bigger than $c$. We can do this by constructing a class of connected graphs where the bulk of the vertices are in one big, "fully connected" clump, but there's one arm, a single line of nodes, sticking out of it that is really long,like this:
The nodes in the clump will be will be connected to each other directly by an edge, so they will all be a distance $1$ from each other, which will help keep our $avgd$ down. The arm on the other hand, will be really long, so the vertex at the far end of it will have a really large distance from the vertices in the clump, so that will help keep the $diam$ quantity high. Since the claim would have to be true for every connected graph, we're free to choose the size of the clump and the length of the arm in order to make the $\frac{diam(G)}{avgd(G)}$ ratio larger than any constant $c$.
Proof Details
We'll construct our class of connected graphs as follows: $G = (V,E)$ contains $n$ vertices in two main groups, $n-d$ vertices in a fully connected clump, and $d$ vertices in a long arm, such that each vertex is only connected to its neighbors. The arm is connected to the clump as follows: the vertex closest to the clump is connected every vertex in the clump, but none of the other nodes in the arm is connected to any of the clump nodes.
We can compute the diameter easily; the maximum distance between any two vertices in the graph is between the vertex at the far end of the arm and any of the nodes in the clump. The arm has $d$ nodes in it, and therefore $d-1$ edges along it, and counting the edge to go between the arm node nearest the clump and any of the clump nodes, we get $diam(G) = (d-1)+1= d$.
Computing the $avgd$ is a bit trickier. For simplicity, we're going to approximate the sum in the numerator of the $avgd$ formula, because determining the exact formula would be an unnecessary hassle. Now, since $avgd$ is in the denominator of the fraction in the claim we're trying to disprove, we have to approximate it by overestimating the sum in $avgd$ so that we don't overestimate the diam/avgd ratio and erroneously show it to be larger than $c$. So, we can decompose the pairwise distances into three groups: the distances between the nodes within the clump, the distances between the arm nodes, and the distances between a node in the arm and a node in the clump. The first of these quantities is easily determined; since the clump vertices are connected directly by an edge, all nodes within it are a distance of 1 from each other, and there are ${n-d \choose 2}$ possible pairs of them. However, for simplicity, we'll overestimate the number of pairs as ${n \choose 2}$. For the distances between each node in the arm and every other node, we know that the maximum distance between the node at the far end and any other arm node is $d$, and there are ${d \choose 2}$ pairs, so the sum of these distances is $< {d \choose 2}d$. For the last set of pairwise distances, we observe that for each node in the clump, it is a distance of 1 from the first node in the arm, a distance 2 from the second node in the arm, etc., up to a distance $d$ from the last node in the arm. Since we know that the sum of the integers from 1 to $d$ is $\frac{d(d+1)}{2}$, and that each one of the $n-d$ nodes in the clump has that set of distances to the nodes in the arm, our total for this set of distances is $(n-d)\frac{d(d+1)}{2}$.
So, putting it all together: \[ avgd(G) = \frac{\displaystyle\sum\limits_{\{u,v\} \subseteq V} dist(u,v)}{{n \choose 2}} \] \[ avgd(G) = \frac{ \displaystyle\sum\limits_{\{u,v\} \subseteq Clump} dist(u,v) + \displaystyle\sum\limits_{\{u,v\} \subseteq Arm} dist(u,v) + \displaystyle\sum\limits_{u \in Clump, v \in Arm} dist(u,v)}{{n \choose 2}} \]
The discussion in the paragraph above implies the following:
\[ avgd(G) < \frac{{n \choose 2} + {d \choose 2}d + (n-d)\frac{d(d+1)}{2}}{{n \choose 2}} \]
Recalling that $\binom{k}{2}=\frac{k(k-1)}{2}$, we get that
\begin{align*}
avgd(G)&<~ 1+\frac{d^2(d-1)}{n(n-1)}+\frac{d(n-d)(d+1)}{n(n-1)}\\
&<~1+\frac{d^3}{n(n-1)}+\frac{d(d+1)}{n-1},
\end{align*}
where the second inequality follows from the facts that $d-1$ Now one could just start simplifying the expression above and then show that it is much smaller than $d$. However, we are going to take some "shortcuts," which might be useful for you in future problems. The main idea is to make both the expressions $\frac{d^3}{n(n-1)}$ and $\frac{d(d+1)}{n-1}$ smaller than $1$. This would imply that $avgd(G)<3$ and thus, would lead to
\[\frac{dia(G)}{avgd(G)}>\frac{d}{3}.\]
so if we pick $d=3c$, we would have dis-proved the claim, as desired. So next, we will show how to pick $d$ in terms of $n$ such that
\[\frac{d^3}{n(n-1)}\le 1,~~~~~~~ (1)\]
and
\[\frac{d(d+1)}{n-1}\le 1.~~~~~~~~ (2)\]
To help us guide our choice of $d$, we will make a rough estimation using asymptotic analysis first. Note that the denominators in the expressions in (1) and (2) are $\Theta(n^2)$ and $\Theta(n)$ respectively. Further, if we pick $d$ to be $O(n^{2/3})$, then the LHS in (1) is $O(1)$ while if we pick $d$ to be $O(\sqrt{n})$, then the LHS in (2) will be $O(1)$. Thus, $d$ being $O(\sqrt{n})$ is a good choice. However, we need to pick $d$ more precisely so that we can prove (1) and (2). This step might need some trial and error but we claim the choice
\[d=\frac{\sqrt{n}}{2}-1\]
works. To see why (2) is true with this choice note that
\[d(d+1) < \left(\frac{\sqrt{n}}{2}\right)^2 =\frac{n}{4} < n-1, \]
where the last inequality follows as $n\ge 2$. To see why (1) is true with our choice of $d$, note that
\[d^3 \le n\cdot d^2 < n\cdot\frac{n}{4} < n(n-1), \]
where the first inequality follows from the fact that $d\le n$, the second inequality follows from the argument for (2) above and the last inequality follows for $n\ge 2$. Note that our final choices are $d=3c$ and $n=4(d+1)^2$. In other words, for every choice of natural number $c$ there exists choices for $d$ and $n$ such that $dia(G)/avgd(G)>c$. This completes the proof.
Further Reading
- Make sure you carefully read the policy documents: Syllabus and Homework policies.
- Support page on using DFS to compute distances.