Homework 3

Due by 11:30pm, Tuesday, October 1, 2024.

Make sure you follow all the homework policies.

All submissions should be done via Autolab.

Sample Problem

The Problem

This problem is just to get you thinking about graphs and get more practice with proofs.

A forest with $c$ components is a graph that is the union of $c$ disjoint trees. The figure below shows for an example with $c=3$ and $n=13$ with the three connected components colored blue, read and yellow).

Note that a tree is a forest with $1$ component.

Prove that an $n$-vertex forest with $c$ components has $n-c$ edges.

Note

See this care package page for a proof of the statement above for $c=1$.

Click here for the Solution

Below are two different approaches to this problem (there could be others).

Proof Idea 1

Assume you have an $n$-vertex forest, where $n$ is arbitrary. Then perform induction on $c$, the number of components in the forest.

Proof Details 1

Base Case ($c=1$): Consider some $n$-vertex forest with one component, essentially just a single tree. From statement $3.1$ in the text (and a proof in class), there are $n-1$ edges in this forest. This satisfies the property that there are $n-c$ edges in a forest with $c$ components, since $c=1$ here.

Induction Hypothesis ($c=k-1$): We'll assume the property holds for some $c=k-1$. So we are assuming that for any $n\ge k-1$, an $n$-vertex forest with $k-1$ components contains $n-(k-1)$ edges.

Proof of Induction Step ($c=k$): A forest with $k-1$ components can be created from any one with $k$ components: if the components are to remain trees, and not contain cycles, the addition of one edge must join two components, since the addition of an edge within a component would create a cycle. Given a forest with $k$ components, we add an edge between two disjoint trees. Note that the resulting forest has $k-1$ connected components (each of which is a tree). Thus, by the induction hypothesis, this new forest has $n-(k-1)=n-k+1$ edges. Since we got to the new forest by adding an edge, the original forest (with $k$ trees) must have one less edge, i.e. $n-k$, as desired.

Proof Idea 2

Use the property that each of the $c$ components in the forest contains one fewer edge than it has vertices (since it is a tree). Adding up all of the vertices across the $c$ components will give us $n$, and adding up the edges across the $c$ components will give us the total number for the forest.

Proof Details 2

Suppose we have a forest of $c$ components with a total of $n$ vertices. Further suppose that the $i^{th}$ connected component in the forest has $n_i$ vertices-- note that $\sum_{i=1}^c n_i=n$. Then it has $n_i -1$ edges. Summing over all of the components in the forest, we have: $\sum_{i=1}^c \left(n_i - 1\right) = \left(\sum_{i=1}^c n_i\right)- c = n-c.$ Thus, an $n$-vertex forest with $c$ components has $n-c$ edges, as desired.

Submission

You will NOT submit this question. This is for you to get into thinking more about proving properties of graphs.

Question 1 (Seating ADFs) [50 points]

The Problem

There is a group of $p$ Ava DuVernay fans (or ADFs for short) who have booked an entire movie theater for a screening of A Wrinkle in Time . Out of the $p(p-1)/2$ possible pairs, $f$ pairs of ADFs are friends. The movie theater is called the Square Theater and has $p$ rows with each row having exactly $p$ chairs in them. (If it helps, you can assume that the $p^2$ chairs are arranged in a "grid"/"matrix" form where each of the $p$ rows has $p$ chairs each and each of the $p$ columns has $p$ chairs each.)

The ADF organization committee has come to you to help them efficiently solve the following seating problem for them. Informally, you want to seat the ADFs so that friends can talk to each other (and "distant" friends remain distant). Next, we make this informal description more formal.

A seating (as you might expect) is an assignment of the $p$ ADFs to the $p^2$ chairs in the theater so that each ADF gets his/her own chair.

A seating is called admissible if

for every pair of friends $(b_1,b_2)$, they can talk to each other during the movies
some ADF is assigned a seat in the first row and
the seating satisfies the distance compatible property.

The ADFs have been to the Square theater before so they have figured out that two people can talk to each other if and only if they are either

seated in the same row or
are seated in the rows next to them (i.e. either to the row directly in front or the row directly behind. The first and the last row of course only have one row next to them).

A seating has the distance compatible property if and only if the following holds. For any $b_1$ who is assigned a seat in the first row and any ADF $b_2$, if $b_1$ and $b_2$ have a friendship distance of $d$ (assuming $d$ is defined), then they have to be seated $d$ or more rows apart. (The friendship distance is the natural definition. Let a friendship path between $b_1$ and $b_2$ be the sequence of ADFs $f_0=b_1,f_1,\dots,f_{k-1},f_k=b_2$ (for some $k\ge 0$) such that for every $0\le i\le k-1$, $(f_i,f_{i+1})$ are friends-- the length of such a path is $k$. The friendship distance between $b_1$ and $b_2$ is the shortest length of any friendship path between them. So if $b_1$ is $b_2$'s friend, they have a friendship distance of $1$ and if $b_2$ is a friend of a friend, they have a distance of $2$ and so on. The friendship distance is not defined if there is no friendship path between $b_1$ and $b_2$.)

Note

Note that the above property is not defined for all pairs of ADFs $b_1$ and $b_2$. (Indeed the problem you will be asked to solve is impossible if the distance compatible property has to hold for all pairs.) Note that the above is defined for only those $b_1$ who are seated in the first row (and every possible $b_2$).

As an example consider the case of $p=3$ and $f=2$ as follows. The ADFs are $A,B,C$ and the $(A,B)$ and $(B,C)$ are the pairs of friends. Then an admissible seating is to assign the "left-most" seat on first row to $A$, the left-most seat in second row to $B$ and the left-most seat on the third row to $C$. This following is also an admissible seating: $B$ is assigned (any seat) in the first row while $A$ and $C$ are assigned any two seats in the second row.

Your final task is to design an $O(f+p^2)$ time algorithm that computes an admissible seating, given as input the set of $p$ ADFs and the set of $f$ pairs of ADFs who are friends. Here are the two parts:

Part (a): Formalize the problem above in terms of graphs: i.e. write down
1. How you would represent the input as a graph $G$;
2. How you would define a seating and
3. Define what it means for a seating to be admissible in terms of properties of graph $G$.
Part (b): Using part 1 or otherwise design an $O(f+p^2)$ time algorithm to compute an admissible seating. (Recall that your algorithm should work for any set of $p$ people and any possible set of $f$ friendships.) You should prove the correctness of your algorithm. You also need justify the running time of your algorithm.

Hint

Think how you can use some of the graph exploration algorithms we have seen in class.

Common Mistake

A common mistake in this (and related) problem is to think that both BFS and DFS are completely interchangeable. There are concrete cases where they are not.

Walkthrough video

Here is the walkthrough video from Fall 18, which would work just as well for this year-- just ignore the fact that it talks about Q2 on HW4 instead of Q1 on HW3 (thanks to Dhruv Kumar for the video):

Submission

Submit part (a) and (b) separately

You need to submit two (2) PDF files to Autolab: one for part (a) and one for part (b). While you can assume part (a) as a given for part (b), to get credit for part (a) you have to submit your solution for part (a) separately from part (b).

Make sure you submit the correct PDF to the correct submission link on Autolab. If you do not (e.g. if you submit Q1(a) PDF to Q1(b) or even Q2(a) or Q2(b)), then you will lose ALL points.

We recommend that you typeset your solution but we will accept scans of handwritten solution-- you have to make sure that the scan is legible.

PDF only please

If Autolab cannot display your file, (irrespective of the reason) then you will get a zero (0) on the entire question.

Autolab might not be able to display files in formats other than PDF (e.g. Word cannot be displayed). Note that Autolab will "accept" your submission even if you submit non-PDF file, so it is YOUR responsibility to make sure you submit in the correct format.

Also the file size has to be at most 3MB.

Grading Guidelines

We will follow the usual grading guidelines for non-programming questions. Here is a high level grading rubric specific to part (a) of this problem:

Formalizing the problem: 10 points.

and here is the high level grading rubric for part (b):

Algorithm idea: 9 points.
Algorithm details: 9 points.
Proof of correctness idea: 9 points.
Proof details: 9 points.
Runtime analysis: 4 points. Note that you only need a Big-Oh analysis.

Note

If you do not have separated out and labeled algorithm idea, algorithm details, proof idea, proof details and runtime analysis for part (b), you will get a zero(0) irrespective of the technical correctness of your solution.

Templates

Download LaTeX template. Download Microsoft Word template.

Note

Your must explicitly list your sources and collaborators in your submission file before uploading it to Autolab. Note that you can only used one of the five allowed sources. If you have used a source that is not allowed, please do not submit your homework. If you did not consult any source and/or did not collaborate with anyone just say None.

Question 2 (Listing $4$-cliques) [25 points]

The Problem

A 4-clique in a graph $G=(V,E)$ is a set of four distinct vertices $\{u,v,w,x\}$ such that $(u,v),(v,w),(w,x),(x,u),(u,w),(v,x)\in E$. (Note that $G$ is undirected.) In this problem you will design a series of algorithms that given a connected graph $G$ as input, lists all the $4$-cliques in $G$. (It is fine to list one $4$-clique more than once.) We call this the 4-clique listing problem (duh!). You can assume that as input you are given $G$ in both the adjacency matrix and adjacency list format. For this problem you can also assume that $G$ is connected.

Sample Input/Output pair

Input: Consider the input $G=(\{A,B,C,D,E\}, \{(A,B),(A,C), (A,D), (B,C), (B,D), (C,D), (A,E), (B,E), (C,E), (D,E)\})$: i.e. a graph on $5$ vertices where all possible $\binom{5}{2}=10$ pairs of vertices are connected by an edge.
Output: The list $\{A,B,C,D\}, \{A,B,C,E\}, \{A,B,D,E\}, \{A,C,D,E\}, \{B,C,D,E\}$.

Below are the two parts of the problem:

Part (a): Present an $O(n^4)$ time algorithm to solve the $4$-clique listing problem.

Note

The trivial algorithm works. For this part only a correct algorithm idea is needed.
Part (b): Present an $O(m^2)$ algorithm to solve the $4$-clique listing problem.

Walkthrough video

Here is the walkthrough video-- ignore the part where it says Fall 2023 (by Caleb Levine ):

Submission

Submit part (a) and (b) separately

Make sure you submit the correct PDF to the correct submission link on Autolab. If you do not (e.g. if you submit Q2(a) PDF to Q2(b) or Q1(a) or Q1(b)), then you will lose ALL points.

We recommend that you typeset your solution but we will accept scans of handwritten solution-- you have to make sure that the scan is legible.

PDF only please

If Autolab cannot display your file, (irrespective of the reason) then you will get a zero (0) on the entire question.

Also the file size has to be at most 3MB.

Grading Guidelines

We will follow the usual grading guidelines for non-programming questions. Here is a high level grading rubric specific to part (a) of this problem:

Algorithm idea: 10 points.

and here is the high level grading rubric for part (b):

Algorithm idea: 2 points.
Algorithm details: 3 points.
Proof of correctness idea: 2 points.
Proof details: 3 points.
Runtime analysis: 5 points.

Note

Templates

Download LaTeX template. Download Microsoft Word template.

Note

Question 3 (Programming Assignment) [25 points]

Note

This assignment can be solved in either Java, Python or C++ (you should pick the language you are most comfortable with). Please make sure to look at the supporting documentation and files for the language of your choosing.

The Problem

In this problem we will explore graphs and problems that are relatable to the 6 degrees of separation problem, which claims that everyone in the world is connected by 6 people or less. We will investigate different networks to see the minimum distance between some starting node, s, and all other nodes in the graph. We will be using real-life data sets to test your code.

We are given a starting node $s$ for some undirected graph $G$ with $n$ nodes. The graph is formatted as an adjacency list, meaning that for each node, $u$, we can access all of $u$'s neighbors. The goal is to output all $n$ nodes in $G$ along each $u$'s minimum distance from $s$.

Input

The input file is given with the first line as the starting node and the remainder of the file is an adjacency list for graph $G$ (we assume that the set of vertices is $\{0,1,\dots,n-1\}$). The adjacency list assumes that the current node is the index of the line under consideration. For instance line 0 of the input file (not including the starting node) has the list of all nodes adjacent to node 0.

                s		<- Starting node (some node between u₀ and u_n-1)
                u₁ u₄ u₆		<- All nodes that share edges with u₀
            	u₃ u_u		<- All nodes that share edges with u₁
		u₀		<- All nodes that share edges with u₂
		.
		.
		.
		u₀ u₄ u₂ u₇ 	<- All nodes that share edges with u_n-1

For example

            	2		<- Starting node
	 	1 2		<- Node 0 shares edges with nodes 1 & 2
		0 3    		<- Node 1 shares edges with nodes 0 & 3
	    	0 3 4 5		<- Node 2 shares edges with nodes 0, 3, 4 & 5
		1 2 6 7		<- Node 3 shares edges with nodes 1, 2, 6 & 7
		2		<- Node 4 shares edges with node 2
		2		<- Node 5 shares an edge with node 2
		3 8		<- Node 6 shares an edge with nodes 3 & 8
		3 8		<- Node 7 shares edges with nodes 3 & 8
		6 7		<- Node 8 shares edges with nodes 6 & 7

If we draw the graph using the above input, it will look like this:

Output

The output is every node (given by the index in the array), along with it's minimum distance from the starting node formatted as your language of choice's array-type data structure (see below for the exact details).

            	[d₀ d₁ d₂... d_m]   					<- Node 0 is distance d₀ from the starting node
									   Node 1 is distance d₁ from the starting node
									   Node 2 is distance d₂ from the starting node
									   .
									   .
									   .
									   Node m is distance d_m from the starting node

For example, using the example input from above:

           	[1, 2, 0, 1, 1, 1, 2, 2, 3]			<- Node 0 is distance 1 from node 2
								   Node 1 is distance 2 from node 2
								   Node 2 is distance 0 from itself 
								   Node 3 is distance 1 from node 2 
								   Node 4 is distance 1 from node 2
								   Node 5 is distance 1 from node 2
								   Node 6 is distance 2 from node 2
							 	   Node 7 is distance 2 from node 2
								   Node 8 is distance 3 from node 2

Note

There is no guarantee that the graph has only one connected component. In the situation where a node can not be reached from the starting node, the distance returned by your algorithm for that node should be -1.

Hint

The best possible algorithm for this problems that we are aware of runs in time $O(m + n)$ where $m$ in the total number of edges, and $n$ is the total number of nodes.

Note

Both the input and output parsers in each of the three languages are already written for you.
Note that you have to work with the input data structures provided (which will come pre-loaded with the data from an input file).

Addition is the only change you should make

Irrespective of what language you use, you will have to submit just one file. That file will come pre-populated with some stuff in it. You should not change any of those things because if you do you might break what the grader expects and end up with a zero on the question. You should of course add stuff to it (including helper functions and data structures as you see fit).

Download Java Skeleton Code

Directory Structure

                                    ├── src
                                    │   ├── ub
                                    │       ├── cse
                                    │           ├── algo
                                    │               ├── Driver.java
                                    │               ├── Graph.java
                                    │               ├── HW3Utility.java
                                    │               ├── Solution.java
                                    ├── testcases/
                                    │   ├── input1.txt
                                    │   ├── input2.txt
                                    │   ├── input5.txt
                                    ├── outputs/
                                    │   ├── output1.txt
                                    │   ├── output2.txt
                                    │   └── output5.txt

You are given four coding files: Driver.java,Graph.java,HW3Utility.java, and Solution.java. Driver.java takes the input file, parses it with a new instance of HW3Utility and creates an instance of the class Solution and calls the outputDistances() method on it. It then prints the distances (which, if your code is correct, will be the shortest distace from the start node). You only need to update the Solution.java file.

On top of that, you may write your own helper methods and data structures in it.

The testcases folder has 3 input files and their corresponding output files for your reference. We will use these three input files (and seven others) in our autograding.

Method you need to write:

                                    public int[] outputDistances() {
                                        return null;
                                    }

The Solution class has 2 instance variables:

startNode which is of type int and stores the starting node id.
graph which is of type HashMap<Integer, ArrayList<Integer>> where the key is the node id and the value is an arraylist of adjacent nodes.

The output is an array of ints. In particular, the location i (for $0\le i <n$) contains the distance of node i from startNode. Finally, note that the starting node is distance 0 from itself.

Compiling and executing from command line:

Assuming you're in the same directory level as src. Run javac src/ub/cse/algo/*.java to compile.

To execute your code on input1.txt, run java -cp "src" ub.cse.algo.Driver testcases/input1.txt. The output array will be printed to stdout.

Submission

You only need to submit Solution.java to Autolab.

Download Python Skeleton Code

Directory Structure

                                    ├── Driver.py
                                    ├── Graph.py
                                    ├── Solution.py
                                    ├── Utility.py
                                    ├── testcases/
                                    │   ├── input1.txt
                                    │   ├── input2.txt
                                    │   ├── input5.txt
                                    ├── outputs/
                                    │   ├── output1.txt
                                    │   ├── output2.txt
                                    │   └── output5.txt

You are given four coding files: Driver.py,Graph.py,Solution.py, and Utility.py. Driver.py takes the input file, parses it with a new instance of Utility and creates an instance of the class Solution and calls the output_distances() method on it. It then prints the distances (which, if your code is correct, will be the shortest distace from the start node). You only need to update the Solution.py file.

On top of that, you may write your own helper methods and data structures in it.

The testcases folder has 3 input files and their corresponding output files for your reference. We will use these three input files (and seven others) in our autograding.

Method you need to write:

                                def output_distances(self):
                                    """
                                    :return: the list of minimum distances from each node to the start node
                                    """
                                    return [] # Return empty

The Solution class has 2 instance variables.

start_node which is the unique id of the start node.
graph which is a dictionary where the key is a unique node id and the value is a list of nodes that share an edge with the key node

Your method is expected to return a list where the minimum distance from the start node is stored at the index corresponding to every node's unique id. In particular, the location i (for $0\le i <n$) contains the distance of node i from start_node. Finally, note that the starting node is distance 0 from itself.

Executing from command line:

Assuming you're in the same directory level as Driver.py and you want to run your code on the input1.txt. Run python Driver.py testcases/input1.txt.

Submission

You only need to submit Solution.py to Autolab.

Download C++ Skeleton Code

Directory Structure

                                    ├── Driver.cpp
                                    ├── Graph.h
                                    ├── HW3Utility.h
                                    ├── Solution.cpp
                                    ├── Utility.h
                                    ├── testcases/
                                    │   ├── input1.txt
                                    │   ├── input2.txt
                                    │   ├── input5.txt
                                    ├── outputs/
                                    │   ├── output1.txt
                                    │   ├── output2.txt
                                    │   └── output5.txt

You are given five coding files: Driver.cpp,Graph.h,HW3Utility.h,Solution.cpp, and Utility.h. Driver.cpp takes the input file, parses it with a new instance of Utility and creates an instance of the class Solution and calls the outputDistances() method on it. It then prints the distances (which, if your code is correct, will be the shortest distace from the start node). You only need to update the Solution.cpp file.

On top of that, you may write your own helper methods and data structures in it.

The testcases folder has 3 input files and their corresponding output files for your reference. We will use these three input files (and seven others) in our autograding.

Method you need to write:

                                    vector<int> outputDistances() {
                                        return m_outputDistances;
                                    }

Note: The output for this function must output a vector of int. In particular, the location i (for $0\le i <n$) contains the distance of node i from start_node. Finally, note that the starting node is distance 0 from itself.

The Solution class has 3 member variables.

graph which is a map where each key is a unique node id as an int and the value is a corresponding list of adjacent nodes for that key as a vector
start_node which is an integer representing the node id of the start node
m_outputDistances a vector which will have an index for every unique node id where that position will hold the distance of that node from the starting node

Compiling and executing from command line:

Assuming you're in the same directory level as Driver.cpp. Run g++ -std=c++11 Driver.cpp to compile.

To execute your code on input1.txt, run ./a.out testcases/input1.txt.

Submission

You only need to submit Solution.cpp to Autolab.

Grading Guidelines

We will follow the usual grading guidelines for programming questions.

For those of you who are feeling a little ambitious

For the top 3 submissions in the scoreboard in Python, the top 2 submissions in the scoreboard in Java and the top submission in the scoreboard in C++, we are offering 2.5 bonus points. But be warned! You should not be spending too much time on this. We rather you work on Questions 1 and 2 above.

Homework 3

Sample Problem

The Problem

Note

Proof Idea 1

Proof Details 1

Proof Idea 2

Proof Details 2

Submission

Question 1 (Seating ADFs) [50 points]

The Problem

Note

Hint

Common Mistake

Walkthrough video

Submission

Submit part (a) and (b) separately

PDF only please

Grading Guidelines

Note

Templates

Note

Question 2 (Listing $4$-cliques) [25 points]

The Problem

Sample Input/Output pair

Note

Walkthrough video

Submission

Submit part (a) and (b) separately

PDF only please

Grading Guidelines

Note

Templates

Note

Question 3 (Programming Assignment) [25 points]

Note

The Problem

Input

Output

Note

Hint

Note

Addition is the only change you should make

Directory Structure

Method you need to write:

Compiling and executing from command line:

Submission

Directory Structure

Method you need to write:

Executing from command line:

Submission

Directory Structure

Method you need to write:

Compiling and executing from command line:

Submission

Grading Guidelines

For those of you who are feeling a little ambitious