Homework 3
Due by 11:30pm, Tuesday, October 1, 2024.
Make sure you follow all the homework policies.
All submissions should be done via Autolab.
Sample Problem
The Problem
This problem is just to get you thinking about graphs and get more practice with proofs.
A forest with $c$ components is a graph that is the union of $c$ disjoint trees. The figure below shows for an example with $c=3$ and $n=13$ with the three connected components colored blue, read and yellow).
Prove that an $n$-vertex forest with $c$ components has $n-c$ edges.
Note
See this care package page for a proof of the statement above for $c=1$.
Below are two different approaches to this problem (there could be others).
Proof Idea 1
Assume you have an $n$-vertex forest, where $n$ is arbitrary. Then perform induction on $c$, the number of components in the forest.
Proof Details 1
Base Case ($c=1$): Consider some $n$-vertex forest with one component, essentially just a single tree. From statement $3.1$ in the text (and a proof in class), there are $n-1$ edges in this forest. This satisfies the property that there are $n-c$ edges in a forest with $c$ components, since $c=1$ here.
Induction Hypothesis ($c=k-1$): We'll assume the property holds for some $c=k-1$. So we are assuming that for any $n\ge k-1$, an $n$-vertex forest with $k-1$ components contains $n-(k-1)$ edges.
Proof of Induction Step ($c=k$): A forest with $k-1$ components can be created from any one with $k$ components: if the components are to remain trees, and not contain cycles, the addition of one edge must join two components, since the addition of an edge within a component would create a cycle. Given a forest with $k$ components, we add an edge between two disjoint trees. Note that the resulting forest has $k-1$ connected components (each of which is a tree). Thus, by the induction hypothesis, this new forest has $n-(k-1)=n-k+1$ edges. Since we got to the new forest by adding an edge, the original forest (with $k$ trees) must have one less edge, i.e. $n-k$, as desired.
Proof Idea 2
Use the property that each of the $c$ components in the forest contains one fewer edge than it has vertices (since it is a tree). Adding up all of the vertices across the $c$ components will give us $n$, and adding up the edges across the $c$ components will give us the total number for the forest.
Proof Details 2
Suppose we have a forest of $c$ components with a total of $n$ vertices. Further suppose that the $i^{th}$ connected component in the forest has $n_i$ vertices-- note that $\sum_{i=1}^c n_i=n$. Then it has $n_i -1$ edges. Summing over all of the components in the forest, we have: \(\sum_{i=1}^c \left(n_i - 1\right) = \left(\sum_{i=1}^c n_i\right)- c = n-c.\) Thus, an $n$-vertex forest with $c$ components has $n-c$ edges, as desired.
Submission
You will NOT submit this question. This is for you to get into thinking more about proving properties of graphs.
Question 1 (Seating ADFs) [50 points]
The Problem
There is a group of $p$ Ava DuVernay fans (or ADFs for short) who have booked an entire movie theater for a screening of A Wrinkle in Time . Out of the $p(p-1)/2$ possible pairs, $f$ pairs of ADFs are friends. The movie theater is called the Square Theater and has $p$ rows with each row having exactly $p$ chairs in them. (If it helps, you can assume that the $p^2$ chairs are arranged in a "grid"/"matrix" form where each of the $p$ rows has $p$ chairs each and each of the $p$ columns has $p$ chairs each.)
The ADF organization committee has come to you to help them efficiently solve the following seating problem for them. Informally, you want to seat the ADFs so that friends can talk to each other (and "distant" friends remain distant). Next, we make this informal description more formal.
A seating (as you might expect) is an assignment of the $p$ ADFs to the $p^2$ chairs in the theater so that each ADF gets his/her own chair.
A seating is called admissible
if
- for every pair of friends $(b_1,b_2)$, they can
talk
to each other during the movies - some ADF is assigned a seat in the first row and
- the seating satisfies the
distance compatible
property.
The ADFs have been to the Square theater before so they have figured out that two people can talk
to each other if and only if they are either
- seated in the same row or
- are seated in the rows next to them (i.e. either to the row directly in front or the row directly behind. The first and the last row of course only have one row next to them).
A seating has the distance compatible
property if and only if the following holds. For any $b_1$ who is assigned a seat in the first row and any ADF $b_2$, if $b_1$ and $b_2$ have a friendship distance of $d$ (assuming $d$ is defined), then they have to be seated $d$ or more rows apart. (The friendship distance is the natural definition. Let a friendship path between $b_1$ and $b_2$ be the sequence of ADFs $f_0=b_1,f_1,\dots,f_{k-1},f_k=b_2$ (for some $k\ge 0$) such that for every $0\le i\le k-1$, $(f_i,f_{i+1})$ are friends-- the length of such a path is $k$. The friendship distance between $b_1$ and $b_2$ is the shortest length of any friendship path between them. So if $b_1$ is $b_2$'s friend, they have a friendship distance of $1$ and if $b_2$ is a friend of a friend, they have a distance of $2$ and so on. The friendship distance is not defined if there is no friendship path between $b_1$ and $b_2$.)
Note
Note that the above property is not defined for all pairs of ADFs $b_1$ and $b_2$. (Indeed the problem you will be asked to solve is impossible if the distance compatible
property has to hold for all pairs.) Note that the above is defined for only those $b_1$ who are seated in the first row (and every possible $b_2$).
As an example consider the case of $p=3$ and $f=2$ as follows. The ADFs are $A,B,C$ and the $(A,B)$ and $(B,C)$ are the pairs of friends. Then an admissible seating is to assign the "left-most" seat on first row to $A$, the left-most seat in second row to $B$ and the left-most seat on the third row to $C$. This following is also an admissible seating: $B$ is assigned (any seat) in the first row while $A$ and $C$ are assigned any two seats in the second row.
Your final task is to design an $O(f+p^2)$ time algorithm that computes an admissible seating, given as input the set of $p$ ADFs and the set of $f$ pairs of ADFs who are friends. Here are the two parts:
- Part (a): Formalize the problem above in terms of graphs: i.e. write down
- How you would represent the input as a graph $G$;
- How you would define a seating and
- Define what it means for a seating to be admissible in terms of properties of graph $G$.
- Part (b): Using part 1 or otherwise design an $O(f+p^2)$ time algorithm to compute an admissible seating. (Recall that your algorithm should work for any set of $p$ people and any possible set of $f$ friendships.) You should prove the correctness of your algorithm. You also need justify the running time of your algorithm.
Hint
Think how you can use some of the graph exploration algorithms we have seen in class.
Common Mistake
A common mistake in this (and related) problem is to think that both BFS and DFS are completely interchangeable. There are concrete cases where they are not.
Walkthrough video
Here is the walkthrough video from Fall 18, which would work just as well for this year-- just ignore the fact that it talks about Q2 on HW4 instead of Q1 on HW3 (thanks to Dhruv Kumar for the video):
Submission
Submit part (a) and (b) separately
You need to submit two (2) PDF files to Autolab: one for part (a) and one for part (b). While you can assume part (a) as a given for part (b), to get credit for part (a) you have to submit your solution for part (a) separately from part (b).
Make sure you submit the correct PDF to the correct submission link on Autolab. If you do not (e.g. if you submit Q1(a) PDF to Q1(b) or even Q2(a) or Q2(b)), then you will lose ALL points.
We recommend that you typeset your solution but we will accept scans of handwritten solution-- you have to make sure that the scan is legible.
PDF only please
If Autolab cannot display your file, (irrespective of the reason) then you will get a zero (0) on the entire question.
Autolab might not be able to display files in formats other than PDF (e.g. Word cannot be displayed). Note that Autolab will "accept" your submission even if you submit non-PDF file, so it is YOUR responsibility to make sure you submit in the correct format.
Also the file size has to be at most 3MB.
Grading Guidelines
We will follow the usual grading guidelines for non-programming questions. Here is a high level grading rubric specific to part (a) of this problem:
- Formalizing the problem:
10
points.
- Algorithm idea:
9
points. - Algorithm details:
9
points. - Proof of correctness idea:
9
points. - Proof details:
9
points. - Runtime analysis:
4
points. Note that you only need a Big-Oh analysis.
Note
If you do not have separated out and labeled algorithm idea, algorithm details, proof idea, proof details and runtime analysis for part (b), you will get a zero(0) irrespective of the technical correctness of your solution.
Note
Your must explicitly list your sources and collaborators in your submission file before uploading it to Autolab. Note that you can only used one of the five allowed sources. If you have used a source that is not allowed, please do not submit your homework. If you did not consult any source and/or did not collaborate with anyone just say None
.
Question 2 (Listing $4$-cliques) [25 points]
The Problem
A 4-clique
in a graph $G=(V,E)$ is a set of four distinct vertices $\{u,v,w,x\}$
such that $(u,v),(v,w),(w,x),(x,u),(u,w),(v,x)\in E$. (Note that $G$ is undirected.) In this problem you will design
a series of algorithms that given a connected graph $G$ as input, lists all
the $4$-cliques in $G$. (It is fine to list one $4$-clique more than once.) We call this the 4-clique listing problem
(duh!). You can assume that as input you are given $G$ in both the adjacency matrix and
adjacency list format. For this problem you can also assume that $G$ is connected.
Sample Input/Output pair
- Input: Consider the input $G=(\{A,B,C,D,E\}, \{(A,B),(A,C), (A,D), (B,C), (B,D), (C,D), (A,E), (B,E), (C,E), (D,E)\})$: i.e. a graph on $5$ vertices where all possible $\binom{5}{2}=10$ pairs of vertices are connected by an edge.
- Output: The list $\{A,B,C,D\}, \{A,B,C,E\}, \{A,B,D,E\}, \{A,C,D,E\}, \{B,C,D,E\}$.
Below are the two parts of the problem:
- Part (a): Present an $O(n^4)$ time algorithm to solve the $4$-clique listing problem.
Note
The trivial algorithm works. For this part only a correct algorithm idea is needed.
- Part (b): Present an $O(m^2)$ algorithm to solve the $4$-clique listing problem.
Walkthrough video
Here is the walkthrough video-- ignore the part where it says Fall 2023 (by Caleb Levine ):
Submission
Submit part (a) and (b) separately
You need to submit two (2) PDF files to Autolab: one for part (a) and one for part (b). While you can assume part (a) as a given for part (b), to get credit for part (a) you have to submit your solution for part (a) separately from part (b).
Make sure you submit the correct PDF to the correct submission link on Autolab. If you do not (e.g. if you submit Q2(a) PDF to Q2(b) or Q1(a) or Q1(b)), then you will lose ALL points.
We recommend that you typeset your solution but we will accept scans of handwritten solution-- you have to make sure that the scan is legible.
PDF only please
If Autolab cannot display your file, (irrespective of the reason) then you will get a zero (0) on the entire question.
Autolab might not be able to display files in formats other than PDF (e.g. Word cannot be displayed). Note that Autolab will "accept" your submission even if you submit non-PDF file, so it is YOUR responsibility to make sure you submit in the correct format.
Also the file size has to be at most 3MB.
Grading Guidelines
We will follow the usual grading guidelines for non-programming questions. Here is a high level grading rubric specific to part (a) of this problem:
- Algorithm idea:
10
points.
- Algorithm idea:
2
points. - Algorithm details:
3
points. - Proof of correctness idea:
2
points. - Proof details:
3
points. - Runtime analysis:
5
points.
Note
If you do not have separated out and labeled algorithm idea, algorithm details, proof idea, proof details and runtime analysis for part (b), you will get a zero(0) irrespective of the technical correctness of your solution.
Note
Your must explicitly list your sources and collaborators in your submission file before uploading it to Autolab. Note that you can only used one of the five allowed sources. If you have used a source that is not allowed, please do not submit your homework. If you did not consult any source and/or did not collaborate with anyone just say None
.
Question 3 (Programming Assignment) [25 points]
Note
This assignment can be solved in either Java, Python or C++ (you should pick the language you are most comfortable with). Please make sure to look at the supporting documentation and files for the language of your choosing.
The Problem
In this problem we will explore graphs and problems that are relatable to the 6 degrees of separation problem, which claims that everyone in the world is connected by 6 people or less. We will investigate different networks to see the minimum distance between some starting node, s, and all other nodes in the graph. We will be using real-life data sets to test your code.
We are given a starting node $s$ for some undirected graph $G$ with $n$ nodes. The graph is formatted as an adjacency list, meaning that for each node, $u$, we can access all of $u$'s neighbors. The goal is to output all $n$ nodes in $G$ along each $u$'s minimum distance from $s$.
Input
The input file is given with the first line as the starting node and the remainder of the file is an adjacency list for graph $G$ (we assume that the set of vertices is $\{0,1,\dots,n-1\}$). The adjacency list assumes that the current node is the index of the line under consideration. For instance line 0 of the input file (not including the starting node) has the list of all nodes adjacent to node 0.
s <- Starting node (some node between u0 and un-1) u1 u4 u6 <- All nodes that share edges with u0 u3 uu <- All nodes that share edges with u1 u0 <- All nodes that share edges with u2 . . . u0 u4 u2 u7 <- All nodes that share edges with un-1
For example
2 <- Starting node 1 2 <- Node 0 shares edges with nodes 1 & 2 0 3 <- Node 1 shares edges with nodes 0 & 3 0 3 4 5 <- Node 2 shares edges with nodes 0, 3, 4 & 5 1 2 6 7 <- Node 3 shares edges with nodes 1, 2, 6 & 7 2 <- Node 4 shares edges with node 2 2 <- Node 5 shares an edge with node 2 3 8 <- Node 6 shares an edge with nodes 3 & 8 3 8 <- Node 7 shares edges with nodes 3 & 8 6 7 <- Node 8 shares edges with nodes 6 & 7
If we draw the graph using the above input, it will look like this:
Output
The output is every node (given by the index in the array), along with it's minimum distance from the starting node formatted as your language of choice's array-type data structure (see below for the exact details).
[d0 d1 d2... dm] <- Node 0 is distance d0 from the starting node Node 1 is distance d1 from the starting node Node 2 is distance d2 from the starting node . . . Node m is distance dm from the starting node
For example, using the example input from above:
[1, 2, 0, 1, 1, 1, 2, 2, 3] <- Node 0 is distance 1 from node 2 Node 1 is distance 2 from node 2 Node 2 is distance 0 from itself Node 3 is distance 1 from node 2 Node 4 is distance 1 from node 2 Node 5 is distance 1 from node 2 Node 6 is distance 2 from node 2 Node 7 is distance 2 from node 2 Node 8 is distance 3 from node 2
Note
There is no guarantee that the graph has only one connected component. In the situation where a node can not be reached from the starting node, the distance returned by your algorithm for that node should be -1.
Hint
The best possible algorithm for this problems that we are aware of runs in time $O(m + n)$ where $m$ in the total number of edges, and $n$ is the total number of nodes.
Note
Both the input and output parsers in each of the three languages are already written for you. Note that you have to work with the input data structures provided (which will come pre-loaded with the data from an input file).
Addition is the only change you should make
Irrespective of what language you use, you will have to submit just one file. That file will come pre-populated with some stuff in it. You should not change any of those things because if you do you might break what the grader expects and end up with a zero on the question. You should of course add stuff to it (including helper functions and data structures as you see fit).
Directory Structure
├── src │ ├── ub │ ├── cse │ ├── algo │ ├── Driver.java │ ├── Graph.java │ ├── HW3Utility.java │ ├── Solution.java ├── testcases/ │ ├── input1.txt │ ├── input2.txt │ ├── input5.txt ├── outputs/ │ ├── output1.txt │ ├── output2.txt │ └── output5.txt
You are given four coding files: Driver.java
,Graph.java
,HW3Utility.java
, and Solution.java
. Driver.java
takes the input file, parses it with a new instance of HW3Utility
and creates an instance of the class Solution
and calls the outputDistances()
method on it. It then prints the distances (which, if your code is correct, will be the shortest distace from the start node). You only need to update the Solution.java
file.
On top of that, you may write your own helper methods and data structures in it.
The testcases folder has 3 input files and their corresponding output files for your reference. We will use these three input files (and seven others) in our autograding.
Method you need to write:
public int[] outputDistances() { return null; }
The Solution
class has 2 instance variables:
startNode
which is of type int and stores the starting node id.graph
which is of typeHashMap<Integer, ArrayList<Integer>>
where the key is the node id and the value is an arraylist of adjacent nodes.
int
s. In particular, the location i
(for $0\le i <n$) contains the distance of node i
from startNode
. Finally, note that the starting node is distance 0 from itself.
Compiling and executing from command line:
Assuming you're in the same directory level as src
. Run javac src/ub/cse/algo/*.java to compile.
To execute your code on input1.txt
, run java -cp "src" ub.cse.algo.Driver testcases/input1.txt. The output array will be printed to stdout
.
Submission
You only need to submit Solution.java
to Autolab.
Directory Structure
├── Driver.py ├── Graph.py ├── Solution.py ├── Utility.py ├── testcases/ │ ├── input1.txt │ ├── input2.txt │ ├── input5.txt ├── outputs/ │ ├── output1.txt │ ├── output2.txt │ └── output5.txt
You are given four coding files: Driver.py
,Graph.py
,Solution.py
, and Utility.py
. Driver.py
takes the input file, parses it with a new instance of Utility
and creates an instance of the class Solution
and calls the output_distances()
method on it. It then prints the distances (which, if your code is correct, will be the shortest distace from the start node). You only need to update the Solution.py
file.
On top of that, you may write your own helper methods and data structures in it.
The testcases folder has 3 input files and their corresponding output files for your reference. We will use these three input files (and seven others) in our autograding.
Method you need to write:
def output_distances(self): """ :return: the list of minimum distances from each node to the start node """ return [] # Return empty
The Solution
class has 2 instance variables.
start_node
which is the unique id of the start node.graph
which is a dictionary where the key is a unique node id and the value is a list of nodes that share an edge with the key node
Your method is expected to return a list
where the minimum distance from the start node is stored at the index corresponding to every node's unique id. In particular, the location i
(for $0\le i <n$) contains the distance of node i
from start_node
. Finally, note that the starting node is distance 0 from itself.
Executing from command line:
Assuming you're in the same directory level as Driver.py
and you want to run your code on the input1.txt
. Run python Driver.py testcases/input1.txt.
Submission
You only need to submit Solution.py
to Autolab.
Directory Structure
├── Driver.cpp ├── Graph.h ├── HW3Utility.h ├── Solution.cpp ├── Utility.h ├── testcases/ │ ├── input1.txt │ ├── input2.txt │ ├── input5.txt ├── outputs/ │ ├── output1.txt │ ├── output2.txt │ └── output5.txt
You are given five coding files: Driver.cpp
,Graph.h
,HW3Utility.h
,Solution.cpp
, and Utility.h
. Driver.cpp
takes the input file, parses it with a new instance of Utility
and creates an instance of the class Solution
and calls the outputDistances()
method on it. It then prints the distances (which, if your code is correct, will be the shortest distace from the start node). You only need to update the Solution.cpp
file.
On top of that, you may write your own helper methods and data structures in it.
The testcases folder has 3 input files and their corresponding output files for your reference. We will use these three input files (and seven others) in our autograding.
Method you need to write:
vector<int> outputDistances() { return m_outputDistances; }
Note: The output for this function must output a vector
of int
. In particular, the location i
(for $0\le i <n$) contains the distance of node i
from start_node
. Finally, note that the starting node is distance 0 from itself.
The Solution
class has 3 member variables.
graph
which is a map where each key is a unique node id as an int and the value is a corresponding list of adjacent nodes for that key as a vectorstart_node
which is an integer representing the node id of the start nodem_outputDistances
a vector which will have an index for every unique node id where that position will hold the distance of that node from the starting node
Compiling and executing from command line:
Assuming you're in the same directory level as Driver.cpp
. Run g++ -std=c++11 Driver.cpp to compile.
To execute your code on input1.txt
, run ./a.out testcases/input1.txt.
Submission
You only need to submit Solution.cpp
to Autolab.
Grading Guidelines
We will follow the usual grading guidelines for programming questions.
For those of you who are feeling a little ambitious
For the top 3 submissions in the scoreboard in Python, the top 2 submissions in the scoreboard in Java and the top submission in the scoreboard in C++, we are offering 2.5 bonus points. But be warned! You should not be spending too much time on this. We rather you work on Questions 1 and 2 above.