Using Python 3.9 or newer, create a virtual environment as follows:
python3 -m venv venv
source venv/bin/activate
Next, install PyTorch using the instructions at https://pytorch.org/get-started/locally/. Select pip as the installation method. If you're on Linux and have a CUDA-capable GPU, select the latest CUDA version. This will give you a command like this:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Finally, install this homework's other dependencies:
pip install -r requirements.txt
You can now open the project directory in VS Code. Within VS Code, open the command palette (⌘⇧P), run `Python: Select Interpreter`, and choose the virtual environment you created in the previous steps.
For the best editing experience, install the following VS Code extensions:
- Python (`ms-python.python`)
- Pylance (`ms-python.vscode-pylance`)
- Black Formatter (`ms-python.black-formatter`)
- Ruff (`charliermarsh.ruff`)
Before getting started: if you're new to PyTorch or `einsum`, you can optionally check out this Colab notebook.
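If you'd like a quick taste of `einsum` before opening the notebook, here's a minimal sketch of the kind of operations it expresses (plain `torch.einsum`, nothing homework-specific):

```python
import torch

A = torch.randn(4, 4)
x = torch.randn(4)

# Matrix-vector product: multiply over the shared index j and sum it out.
y = torch.einsum("ij,j->i", A, x)  # shape (4,)

# The ellipsis broadcasts over any leading batch dimensions.
batch_A = torch.randn(8, 4, 4)
batch_x = torch.randn(8, 4)
batch_y = torch.einsum("...ij,...j->...i", batch_A, batch_x)  # shape (8, 4)
```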
This part doesn't involve programming. Just run the script using the provided VS Code launch configuration:
- Navigate to the debugger (⌘⇧D).
- Select `Part 0: Introduction` from the drop-down menu.
- Click the green run button.
Running code directly from a command line (not recommended)
Remember to activate your virtual environment using `source venv/bin/activate` first. Then, run the following command:
python3 -m scripts.0_introduction
In this part, you'll have to fill out the following files:
- `src/geometry.py`
- `src/rendering.py`

Check your work by running `Part 1: Projection`. This will write the rendered images to the directory `outputs/1_projection`.
In this class, camera extrinsics are represented as 4x4 matrices in OpenCV-style camera-to-world format. This means that matrix-vector multiplication between an extrinsics matrix and a point in camera space yields a point in world space.
In this class, camera intrinsics are represented as 3x3 matrices that have been normalized via division by the image height.
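To make both conventions concrete, here is a small sketch (the tensors below are made-up example values, not data from the homework):

```python
import torch

# OpenCV-style camera-to-world extrinsics (4x4). Identity here for simplicity.
extrinsics = torch.eye(4)

# Normalized intrinsics (3x3); focal lengths and principal point are
# hypothetical example values.
intrinsics = torch.tensor([
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.0, 0.0, 1.0],
])

# A homogeneous point in camera space is mapped to world space by
# matrix-vector multiplication with the extrinsics.
point_camera = torch.tensor([0.0, 0.0, 1.0, 1.0])
point_world = extrinsics @ point_camera

# Projection with the intrinsics: apply the matrix, then divide by depth
# to get normalized image coordinates.
uvw = intrinsics @ point_camera[:3]
uv = uvw[:2] / uvw[2]
```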
All of the functions in this homework are annotated with types. These are enforced at runtime using jaxtyping and beartype. If you're not familiar with jaxtyping, you should read the (short) jaxtyping documentation to learn the tensor annotation format.
Hint: These annotations have important implications for broadcasting. If your code does not support the broadcasting implied by the annotations, you will not get full credit. See the note below for more details.
A note on the batch dimension annotations used in the homework:

The annotations `*batch` and `*#batch` are used for functions that can handle inputs with arbitrary batch dimensions. They differ in that `*#batch` states that the batch dimensions can be broadcasted. For example:
```python
import torch
from jaxtyping import Float
from torch import Tensor

def broadcastable(
    a: Float[Tensor, "*#batch 4 4"],
    b: Float[Tensor, "*#batch 4"],
) -> Float[Tensor, "*batch"]:
    ...

# This works, because the shapes (1, 2, 3, 1) and (2, 3, 5) can be broadcasted.
broadcastable(
    torch.randn((1, 2, 3, 1, 4, 4)),  # a
    torch.randn((2, 3, 5, 4)),  # b
)

def not_broadcastable(
    a: Float[Tensor, "*batch 4 4"],
    b: Float[Tensor, "*batch 4"],
) -> Float[Tensor, "*batch"]:
    ...

# This doesn't work, since the shapes (1, 2, 3, 1) and (2, 3, 5) are not exactly the same.
not_broadcastable(
    torch.randn((1, 2, 3, 1, 4, 4)),  # a
    torch.randn((2, 3, 5, 4)),  # b
)
```
All functions in `geometry.py` that have multiple parameters use `*#batch`, meaning that you must fill them out to handle broadcasting correctly. The functions `homogenize_points` and `homogenize_vectors` instead use `*batch`, since broadcasting doesn't apply when there's only one parameter. All functions return `*batch`, which means that outputs should have a fully broadcasted shape. In the example above, the output of `broadcastable` would have shape `(1, 2, 3, 5)`.
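One broadcast-friendly way to satisfy these annotations (a sketch under our own naming, not the required implementation) is to let `einsum`'s ellipsis handle the batch dimensions:

```python
import torch
from jaxtyping import Float
from torch import Tensor

def transform_points_sketch(
    transform: Float[Tensor, "*#batch 4 4"],
    points: Float[Tensor, "*#batch 4"],
) -> Float[Tensor, "*batch 4"]:
    # The ellipsis broadcasts the batch dimensions, so batch shapes
    # (1, 2, 3, 1) and (2, 3, 5) yield an output with batch shape (1, 2, 3, 5).
    return torch.einsum("...ij,...j->...i", transform, points)
```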
We've included some tests that you can use as a sanity check. These tests only test basic functionality—it's up to you to ensure that your code can handle more complex use cases (e.g. batch dimensions, corner cases). Run these tests from the project root directory as follows:
python -m pytest tests
You can also run the tests from inside VS Code's graphical interface. From the command palette (⌘⇧P), run `Testing: Focus on Test Explorer View`.
In this part of the homework, you're given a synthetic computer vision dataset in which the camera format has not been documented. You must convert between this unknown format and the OpenCV format described in part 1. To do so, fill in the functions in `src/puzzle.py`.
Each student will receive a unique dataset with a randomly generated format. Per-student dataset download links will be made available via a Piazza announcement.
Each dataset contains 32 images and a `metadata.json` file. The `metadata.json` file contains two keys:

- `intrinsics`: These intrinsics use the normalized format described in part 1.
- `extrinsics`: These extrinsics use a randomized format described below.
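For reference, reading that file might look like the following sketch (the path is the sample dataset mentioned in part 2's instructions; adapt it to wherever you place your own dataset, and note that it assumes the values are stored as nested lists):

```python
import json
from pathlib import Path

import torch

dataset_path = Path("data/sample_dataset")

with (dataset_path / "metadata.json").open() as f:
    metadata = json.load(f)

intrinsics = torch.tensor(metadata["intrinsics"])  # normalized 3x3 matrices
extrinsics = torch.tensor(metadata["extrinsics"])  # 4x4 matrices in the randomized format
```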
The extrinsics are either in camera-to-world format or world-to-camera format. The axes have been randomized, meaning that the camera look, up, and right vectors could be any of $+x$, $-x$, $+y$, $-y$, $+z$, or $-z$.
The cameras are arranged as described below. Use this information to help you figure out your camera format.
- The camera origins are always exactly 2 units from the origin.
- The world up vector is $+y$, and all cameras have $y \geq 0$.
- All camera look vectors point directly at the origin.
- All camera up vectors are pointed "up" in the world. In other words, the dot product between any camera up vector and $+y$ is positive.
Hint: How might one build a rotation matrix to convert between camera coordinate systems?
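For instance (a generic sketch, not the answer to any particular dataset's format), stacking one coordinate system's basis vectors as rows yields a matrix that re-expresses points in that system:

```python
import torch

# Suppose, hypothetically, that a camera format's right, up, and look vectors
# correspond to +y, +z, and +x of the target coordinate system.
right = torch.tensor([0.0, 1.0, 0.0])
up = torch.tensor([0.0, 0.0, 1.0])
look = torch.tensor([1.0, 0.0, 0.0])

# Rows are the hypothetical format's axes expressed in the target system.
# This is a proper rotation (orthonormal rows, determinant +1) that maps
# target-space coordinates into the hypothetical format's coordinates.
rotation = torch.stack([right, up, look])

point_target = torch.tensor([1.0, 2.0, 3.0])
point_converted = rotation @ point_target
```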
Run the script for part 2 to check your work. If your conversion function works correctly, you should be able to exactly reproduce the images in your dataset. If you want to test your `load_dataset` function in isolation, you can use the dataset located at `data/sample_dataset` with a `convert_dataset` that simply returns its input.
Note that you may use the mesh at `data/stanford_bunny.obj` if you find it helpful, although using it is not required to find the solution.
You may work with other students and use AI tools (e.g., ChatGPT, GitHub Copilot), but you must submit code that you fully understand and have written yourself.
Before submitting, ensure that your code has been formatted using Black and linted using Ruff; otherwise, you will lose points. Also, double-check that you have not changed any of the function signatures in `geometry.py`, `puzzle.py`, or `rendering.py`. Submit your work on Canvas.
Each homework will have a bonus problem that we will use to allocate A+ grades. These problems are completely optional. For this homework, the bonus problem is as follows:
Can you devise a way to automatically solve everyone's puzzles? Create a script that converts a folder of dataset `.zip` files into a folder of dataset `.zip` files in standardized (converted) format. If you attempt this problem, make sure to include your script's location and your general approach in your answer to `explanation_of_problem_solving_process` in `puzzle.py`.
Also, if you manage to do this, please don't spoil everyone else's puzzles! :)
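If you do attempt it, a skeleton for the batch conversion might look like this (every name here is hypothetical; `solve_puzzle` stands in for your own format-detection logic):

```python
import zipfile
from pathlib import Path

# Hypothetical input/output locations.
input_dir = Path("datasets_in")
output_dir = Path("datasets_out")
output_dir.mkdir(exist_ok=True)

for zip_path in sorted(input_dir.glob("*.zip")):
    # Extract each dataset into its own working directory.
    workspace = output_dir / zip_path.stem
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(workspace)

    # solve_puzzle(workspace)  # placeholder: rewrite metadata.json in standardized format

    # Re-zip the converted dataset.
    with zipfile.ZipFile(output_dir / zip_path.name, "w") as archive:
        for file in sorted(workspace.rglob("*")):
            if file.is_file():
                archive.write(file, file.relative_to(workspace))
```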