Complete Lesson 09 on Jupyter Notebooks before beginning this exercise. Watch the video here and respond to the questions on Gradescope.
In this unit we are focused on a few new data structures (such as `set` and `dict`), and some fundamental “cost” differences from the `list` data structure. We discussed how operations on each, such as the `in` operator which tests whether an element is contained in the data structure, have different characteristics in how expensive they are to carry out with the respective data structure. This “expense” can be thought of in two dimensions:
Time Complexity - How long does an algorithm take to complete as the size of its data grows? This is analogous to how much “work”, or how many statements and loop iterations, an algorithm takes to complete.
Space Complexity - How much memory is required for an algorithm or data structure as the size of its data grows? This is a question you will learn more about in the next course in computer science: data structures.
In this exercise, you will develop a benchmark to empirically investigate and compare the time complexity of the same operation on two data structures, as well as two different implementations of the same fundamental algorithm. To benchmark a subject, we will time how long the subject operation takes to complete over a number of experimental runs.
Note on an off-limits construct: We have not discussed a Python concept called a list comprehension in our course. If you do not know what this is, you have nothing to worry about as long as you are completing the assignment using permitted course materials and not utilizing any AI assistance such as ChatGPT. In the unlikely event you know what a list comprehension is from beyond our course, please refrain from using them in this assignment. They’re not necessary. This construct happens to be commonly generated by ChatGPT, though, so if we see its usage we will assume the usage of an unpermitted aid.
0. Setup
Create a new directory in `exercises` named `ex06`.
Inside the `exercises/ex06` directory, create a file named `workbench.py`. Add a docstring and establish an `__author__` variable to be assigned a string with the digits of your PID. This is where you will implement the function skeletons and implementations below.
Additionally, inside this directory, create a new Jupyter Notebook file named `notebook.ipynb`. As mentioned at the start of the exercise, please be sure you have completed Lesson 9, which introduces Jupyter Notebooks, before continuing on from this point.
In your Jupyter notebook, add a Markdown cell that describes: “This notebook investigates the time complexity of the `in` operator for `list` and `set`.”
Following the markdown cell, add a Python code cell with the following contents (which will automatically reload work in `workbench.py` when you are developing these two files):
%reload_ext autoreload
%autoreload 2
print("Autoreload of imported modules enabled. Be sure to save your work in other modules before refreshing cells.")
1. Establishing our benchmark “Subjects”
Back in `workbench.py`, define the following two functions, which will serve as our benchmark “subjects”:
# Benchmark Subjects:

def set_in(needle: float, haystack: set[float]) -> bool:
    """Returns True if haystack contains needle, False otherwise."""
    return needle in haystack

def list_in(needle: float, haystack: list[float]) -> bool:
    """Returns True if haystack contains needle, False otherwise."""
    return needle in haystack
These are intentionally very simple functions that “wrap” around the actual algorithmic operation we are trying to test: Python’s built-in `in` operator for both `list` and `set`.
Our benchmarking goal, which we are working toward in upcoming steps, is to call each of these functions many times over, say 100, with larger and larger data sizes of the `list` and `set` parameters, and compare how “expensive” one is relative to the other.
Check for understanding: do these subjects have the same Callable type?
Almost, right? The second parameter is different, but if we were to treat that second parameter as a generic TypeVar `T`, then they could have the same generic `Callable` type. We’ll return back to this idea…
2. Check the expected behavior of these subjects in the Jupyter Notebook
Back in the Jupyter Notebook file, add an additional Markdown cell with the following (feel encouraged to rewrite this in your own words, to aid understanding, but it is not required):
## Testing our Subjects
We are going to benchmark `list`'s `in` and `set`'s `in` operations, so let's first establish our subject functions are working as expected.
Next, add the following Python code cell beneath the markdown cell:
from workbench import list_in, set_in
assert list_in(1.0, [0.0, 1.0, 2.0])
assert list_in(-1.0, [0.0, 1.0, 2.0]) is False
assert set_in(1.0, {0.0, 1.0, 2.0})
assert set_in(-1.0, {0.0, 1.0, 2.0}) is False
print("Subject function assertions passed!")
Notice here we’re importing the two functions from the `workbench.py` module and adding a few basic assertions. This is a more hand-waving approach to testing than we saw in unit testing, but it achieves something similar: it gives us confidence our functions are correctly defined for some basic inputs. In this notebook, it also gives the reader a sense of what the subject functions are expected to do.
When you evaluate this cell, the assertions should silently pass and the final statement will print. If an assertion fails, you will see an error and should go back and check your work.
3. Establishing Random Data Producer Functions
Before we can benchmark these subject functions, we need some test data to feed into each as an argument. Generally, to test the “average” cost of a subject like this, we come up with randomized input data of a given length. For this part, we are going to provide you with one of the two functions you will need, leaving the implementation of the second function to you.
Before adding these functions to `workbench.py`, be sure to add the following import statement at the top of your `workbench.py` file, beneath your docstring. It imports the `random()` function, which generates a random `float` value each time it is called:
from random import random
After your workbench subject definitions, add the following definitions to `workbench.py`:
# Data Producers

def list_producer(n: int) -> list[float]:
    """Produce a list of random floats of size n."""
    result: list[float] = []
    while len(result) < n:
        result.append(random())
    return result

def set_producer(n: int) -> set[float]:
    """Produce a set of random floats of size n."""
    # TODO: Finish implementing this function
    ...
Notice `list_producer` establishes an empty `list` and then continues appending `random` numbers to it until its size is equal to the parameter `n`. This gives us a function where we give it an `n` and it returns to us a list of length `n` random `float` values: perfect for our benchmarking purposes!
Your task in this part is to complete the implementation of `set_producer` such that it returns a `set` of length `n` random numbers. Remember, to create an empty set use the `set()` constructor.
Check for understanding: although the odds of generating two identical random `float` values in `list_producer` are infinitesimally small, it is still possible. Given this possibility, why shouldn’t you implement `set_producer` by just returning `set(list_producer(n))`?
To the check for understanding question above, the reason is that lists can contain duplicates and sets cannot. Your `set_producer` function would not be able to guarantee the set has `n` elements.
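For intuition, here is a tiny illustrative snippet (the variable name `values` is ours, not part of the exercise) showing how duplicates collapse when a `list` is converted to a `set`:

```python
# Illustration only: a set stores each distinct value once
values: list[float] = [0.5, 0.5, 0.25]
print(len(values))       # 3
print(len(set(values)))  # 2 -- the duplicate 0.5 collapses to a single element
```

So if `list_producer(n)` ever happened to produce a duplicate, `set(list_producer(n))` would silently end up with fewer than `n` elements.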
4. Check the expected behavior of these producers in the Jupyter Notebook
Back in the Jupyter Notebook file, add an additional Markdown cell with the following (feel encouraged to rewrite this in your own words, to aid understanding, but it is not required):
## Testing our Data Producers
The functions `list_producer` and `set_producer` will each produce a `list` and `set`, respectively, with argument `n` random `float` elements.
Next, add the following Python code cell beneath the markdown cell and replace the `TODO` comment with two or more assertions demonstrating the correct implementation of your `set_producer` function.
from workbench import list_producer, set_producer
assert len(list_producer(n=10)) == 10
assert len(list_producer(n=100)) == 100
# TODO: Add two or more assertions to prove correctness of set_producer
print("Producer assertions passed! Sample output:")
print(f"list_producer(n=4) -> {list_producer(n=4)}")
print(f"set_producer(n=4) -> {set_producer(n=4)}")
5. Writing microbenchmark functions
Next, we will write two functions to benchmark each of our subject functions with their respective data producers. The first function, for microbenchmarking `set_in`, is given to you; the second, for `list_in`, is up to you to implement.
The big idea behind a microbenchmark is we will import and call a function that gives us a highly precise clock counter which returns to us an integer time in nanoseconds. A nanosecond is one billionth of a second. This denotes our “start” time. Then we will call the subject function we are microbenchmarking. After it completes, we will once again ask for the current time in nanoseconds and call it our “end” time. By subtracting start time from end time we will have a measure of the number of elapsed nanoseconds that it took to complete the call to our subject function. This gives us its cost in time.
Back in `workbench.py`, add the following `import` statement below your docstring:
from time import perf_counter_ns
The `perf_counter_ns` function, when called, gives us a precise time in `int` nanoseconds. You can see its use in the `set_in` microbenchmark function below:
def microbenchmark_set_in(n: int) -> int:
    """Time (ns) required to check if a set of size n contains a value."""
    needle: float = random()
    haystack: set[float] = set_producer(n)
    start_time: int = perf_counter_ns()
    # Our subject function being microbenchmarked
    set_in(needle, haystack)
    end_time: int = perf_counter_ns()
    return end_time - start_time

def microbenchmark_list_in(n: int) -> int:
    """Time (ns) required to check if a list of size n contains a value."""
    # TODO: Implement a **very** similar microbenchmark for the `list_in` subject
    # Be sure your haystack is the correct _type_ and calls the correct _producer_
6. Testing the Microbenchmark Functions
Now that you have implemented `microbenchmark_list_in`, let’s try utilizing these functions back in the notebook. Add a Markdown cell explaining what we are about to do next:
## Microbenchmarking without Higher-order Functions
Next we will microbenchmark the time, in nanoseconds, each of our subject functions takes to complete using plain-old functions. Later, we will improve upon these two functions.
Following the markdown cell, add a Python code cell:
from workbench import microbenchmark_set_in, microbenchmark_list_in
print(f"set_in(n=100) cost {microbenchmark_set_in(n=100)} nanoseconds")
print(f"list_in(n=100) cost {microbenchmark_list_in(n=100)} nanoseconds")
Try running this cell a few times and observing the results. Each number, individually, will change each run within some reasonable margin. Why? Your computer is also carrying out many, many other tasks, so the speed at which an individual task (such as evaluating a function call) completes is slightly non-deterministic.
Important: Notice that, more often than not, the `set_in` subject is faster than `list_in` with `100` elements. Exactly how many nanoseconds, and the multiplier in speedup, will depend on your computer.
An interesting question we will soon explore more rigorously: how does this performance difference change as the number of elements in our set and list changes? Try tinkering with the `n` argument by an order of magnitude or two to get an intuitive feel for this.
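If you would like to see this run-to-run variance (and the effect of `n`) more concretely, one optional approach is to call the microbenchmarks in a short loop. This snippet is purely illustrative and not required by the exercise:

```python
# Optional: observe run-to-run variance of the two microbenchmarks
from workbench import microbenchmark_set_in, microbenchmark_list_in

for _ in range(5):
    print(microbenchmark_set_in(n=100), microbenchmark_list_in(n=100))
```

Each row prints two slightly different nanosecond measurements, even though the inputs are the same size every time.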
7. Writing a higher-order, generic microbenchmark function
Look back at your function definitions for `microbenchmark_set_in` and `microbenchmark_list_in`. You should see a pattern to their structure, with two differences around generating data and calling our subject function. Seeing patterns like this across two or more functions is a strong indicator you can abstract out a higher-order function in its place. Our goal is to write an additional function, named `microbenchmark`, that is a higher-order function that is given a subject function, a data producer function, and an `n` argument that leads to benchmarking a data set of size `n`.
To help you get started on writing this function, we need to define our `Callable` function types for `Producer` and `Subject`. These will need to be generic types. Why? Notice your producers and subjects have the same shape but are operating on different types: `list[float]` versus `set[float]`. Add the following import to the top of your `workbench.py` file with your other imports:
from typing import Callable, TypeVar
Next, add the following three definitions below your imports:
T = TypeVar("T")
Producer = Callable[[int], T]
Subject = Callable[[float, T], bool]
Check for understanding: Are `set_in` and `list_in` examples of type `Subject`? If so, for each, what is type `T`? Similarly, are `list_producer` and `set_producer` examples of type `Producer`? If so, for each, what is type `T`?
Answers: Yes, `set_in` and `list_in` are valid `Subject` type functions. Their generic type `T` is concretely `set[float]` and `list[float]`, respectively. Similarly, `list_producer` and `set_producer` are valid functions of type `Producer`, with `T` being `list[float]` and `set[float]`, respectively.
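As an optional sanity check (the variable names below are ours, not part of the exercise), you can make this relationship concrete by annotating variables with the generic aliases and assigning the concrete functions to them; a type checker will accept these assignments:

```python
# Illustration only: the concrete functions fit the generic aliases
from workbench import Subject, Producer, set_in, list_producer

example_subject: Subject[set[float]] = set_in          # T is set[float]
example_producer: Producer[list[float]] = list_producer  # T is list[float]
```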
Below your earlier functions, write a new function named `microbenchmark` with the following structure. It is your task to replace the blanks/comments with valid function calls:
def microbenchmark(subject: Subject[T], producer: Producer[T], n: int) -> int:
    """Time (ns) cost of subject with a produced collection of size n."""
    needle: float = random()
    haystack: T = _________  # TODO: Call the Producer Function with n
    start_time: int = perf_counter_ns()
    # TODO: Call the Subject Function
    __________
    end_time: int = perf_counter_ns()
    return end_time - start_time
Notice the relationship between this function and the earlier, simpler microbenchmark functions that were less flexible. Here, we have a single `microbenchmark` function we can reuse with any other `Subject[T]` and `Producer[T]` functions we define, without having to copy/reimplement this same functionality again.
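To make the reuse idea concrete, here is a small hypothetical example (neither `str_producer` nor `str_in` is part of the exercise, and you are not asked to write them) of a completely different `Producer`/`Subject` pair, where `T` is `str`, being timed by the very same `microbenchmark` function once it is completed:

```python
def str_producer(n: int) -> str:
    """Produce a string of n 'a' characters (T is str)."""
    return "a" * n

def str_in(needle: float, haystack: str) -> bool:
    """Returns True if haystack contains the needle's text form."""
    return str(needle) in haystack

# Assuming your completed microbenchmark is in scope, it can time this
# new pair without any changes to its own code:
cost: int = microbenchmark(str_in, str_producer, 1000)
```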
8. Testing your `microbenchmark` function in the Jupyter Notebook
Add the following Markdown cell to your `notebook.ipynb` file:
## Microbenchmarking with Higher-order Function
Now that we have abstracted a single `microbenchmark` function, which can be reused with different subjects and producers, let's try the above examples again and make use of it.
Then, add a Python code cell below the markdown cell with the following tests of your `microbenchmark` function. Your task is to replace the `PRODUCER_FUNCTION` with the correct name of a `Producer` function you imported earlier in your notebook and defined in `workbench.py`, given the context. Similarly, replace `SUBJECT_FUNCTION` with the correct name of a `Subject` function you imported earlier in your notebook and defined in `workbench.py`.
from workbench import microbenchmark
print(f"set_in(n=1000) cost {microbenchmark(set_in, PRODUCER_FUNCTION, 1000)} nanoseconds")
print(f"list_in(n=1000) cost {microbenchmark(SUBJECT_FUNCTION, list_producer, 1000)} nanoseconds")
9. Computing the average cost of a subject function
Next, we would like to more accurately measure the time cost of each subject. To do so, we’d like to measure the average cost of running the subject function many, many (say 1,000) times over. We will give you this function since it uses some interesting concepts related to what we have explored in class. Add the following function to your `workbench.py` file:
def microbenchmark_avg(
    subject: Subject[T], producer: Producer[T], n: int, runs: int
) -> int:
    """Compute the average time cost (ns) of a microbenchmark over # runs."""
    # 1. Define an "internal" helper function for an individual run
    def run_once(n: int) -> int:
        """Given an n, run one microbenchmark of subject/producer."""
        return microbenchmark(subject, producer, n)

    # 2. n_runs is a list where each element is n repeated (length=runs)
    n_runs: list[int] = [n] * runs

    # 3. Use the built-in map function to call run_once on each n
    time_costs: list[int] = list(map(run_once, n_runs))

    # 4. TODO: Compute the average time cost of the runs and return it.
    # Complete your work here:
Let’s break down the important steps here:
- We define an “internal” helper function, `run_once`, which, given an `n`, runs one microbenchmark of the `subject`/`producer` pair. We’ll return back to this function at step 3.
- When you multiply a `list` by an `int`, the list repeats. Thus, `[100] * 3` would evaluate to `[100, 100, 100]`, for example. Here, we’re repeating the `n` (size of our produced data set for benchmark purposes) `runs` number of times. So if `n` is `100` and `runs` is `1000`, `n_runs` would be a `list` that looks like `[100, 100, ..., 100]`, with `100` repeated `1000` times in the resulting `list`.
- We learned the `map` higher-order function in lecture! This line makes use of Python’s built-in `map`. Notice we are mapping the function `run_once` over the `n_runs` list. Thanks to step #2, this means we’re repeatedly calling `run_once` with the given `n` for a total of `runs` times. The resulting time cost of each microbenchmark is stored in the `time_costs` list. Notice that the `run_once` function is defined with a single parameter `n`, in order to make it satisfy the `map` function’s semantics. We are able to read `subject` and `producer` thanks to this function being defined inside of `microbenchmark_avg`’s function definition. (See the short sketch after this list for a toy example of steps 2 and 3.)
- Given the above steps, compute the average time cost by summing the time costs of all runs from step 3 and dividing by the number of microbenchmark `runs` using integer division.
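As promised above, here is a short, self-contained sketch (the `double` function is a toy of our own, not exercise code) of how list repetition and `map` combine, mirroring steps 2 and 3:

```python
# Toy illustration of the two building blocks used above (not exercise code)
def double(x: int) -> int:
    return x * 2

repeated: list[int] = [100] * 3                     # [100, 100, 100]
doubled: list[int] = list(map(double, repeated))    # [200, 200, 200]
print(doubled)
```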
10. Testing your `microbenchmark_avg` function in the notebook
Back in `notebook.ipynb`, add a new Markdown text cell beneath your earlier work and fill in the following:
## Average of Many Microbenchmark Runs
Since each microbenchmark has some variance in duration, we're implementing `microbenchmark_avg`, which has an additional parameter of the number of repeated runs of the microbenchmark. We then take the average of this number of runs to give us a more consistent estimated "average cost" of each subject function.
Then, add a Python code cell with the following snippet and evaluate it:
from workbench import microbenchmark_avg
print(f"set_in(n=1000) avg cost {microbenchmark_avg(set_in, set_producer, n=1000, runs=1000)} nanoseconds")
print(f"list_in(n=1000) avg cost {microbenchmark_avg(list_in, list_producer, n=1000, runs=1000)} nanoseconds")
Notice here we’re choosing a `runs` argument of `1000`, meaning we are running each subject function (`set_in` and `list_in`) 1,000 times over, tracking how long each takes to complete, given an `n` size of `1000` elements.
When you run this cell multiple times over, you should see more consistent (but still varying) results in the number of nanoseconds per operation. Your results should be `int` values, not `float` values. If you are seeing `float` values, be sure you are using integer division when calculating the average!
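If the distinction is unclear, this quick, unrelated example shows that `/` always produces a `float` while `//` (integer division) produces an `int`:

```python
print(10500 / 4)   # 2625.0 -- a float, even though it divides evenly
print(10500 // 4)  # 2625   -- an int
```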
11. Benchmarking for Doubling Sizes of n Items
Phew, you have almost completed this benchmarking task! The last step is a function that will gather your microbenchmarking results over increasing sizes of `n`. We will double the size of `n` for each subsequent microbenchmark, find the average cost, and return a dictionary where the keys are the `n` sizes and the values are the associated average costs (in nanoseconds).
We are providing the skeleton of this function definition. Your task is to replace the two sets of ellipses with calls to the `map` function, given the context. You will use the `list` constructor to convert the `map` results into a `list` format. For an example, look back at `microbenchmark_avg`’s step 3.
The general format of these calls will be:
list(map(TRANSFORM_FUNCTION, INPUT_LIST))
Add the following definition to your `workbench.py` file and follow the instructions above:
def benchmark(
    subject: Subject[T], producer: Producer[T], max_pow_2_n: int, runs: int
) -> dict[int, int]:
    """Compute the avg cost of a microbenchmark over increasing sizes of n.

    Returns a dictionary where the key is `n` size of produced data set and
    value is the cost of microbenchmarking the `subject` with the given `n`.

    The sizes of `n` double for each successive benchmark, such that the test
    ns are: [2**0, 2**1, 2**2, ..., 2**max_pow_2_n]
    """
    # 1. Define an internal helper function to compute powers of 2
    def pow2(exponent: int) -> int:
        return int(2**exponent)

    # 2. Produce a list of exponents: [0, 1, 2, ..., max_pow_2_n]
    exponents: list[int] = list(range(max_pow_2_n + 1))

    # 3. Produce a list of n sizes, which are successive powers of 2
    #    [1, 2, 4, 8, ..., 2 ** max_pow_2_n]
    # TODO: replace the ellipses, hint: use the pow2 function and exponents list
    n_sizes: list[int] = ...

    # 4. Define an internal helper function to microbenchmark given n
    def avg_cost_runner(n: int) -> int:
        return microbenchmark_avg(subject, producer, n, runs)

    # 5. Compute microbenchmark avg time costs for each of our n_sizes
    # TODO: replace the ellipses, hint: use n_sizes and the function of step 4
    n_costs: list[int] = ...

    # 6. Finally, return a dictionary where the keys are n_sizes and
    #    the values are n_costs
    return dict(zip(n_sizes, n_costs))
12. Benchmarking in the Notebook
Now that you have completed the `benchmark` function, let’s bring our analysis home and test each of our subject functions at doubling sizes of data. First, add the following Markdown cell to explain what is going on in our notebook:
## Benchmarking Doubling n Sizes of Data
Let's investigate how expensive our subject functions are as the sizes of their datasets grow. To do so, we've defined an ultimate `benchmark` function that produces a dictionary where the keys are our data size and values are the time cost in nanoseconds.
Then add the following code cell:
from workbench import benchmark
benchmark(set_in, set_producer, max_pow_2_n=14, runs=1000)
Before evaluating this cell, try to conceptualize what this call to `benchmark` is carrying out. What is the largest `n` size the benchmark will test? After evaluating it, you should see a resulting dictionary. The keys are the sizes of `n` and the values are the average cost of the `set_in` subject function given a dataset of that size.
Next, add an additional Python code cell below the previous one you just added. In this code cell, write your own call to `benchmark`, but benchmark your `list_in` subject instead. When evaluated with the same `max_pow_2_n` and `runs` arguments, this benchmark should show `list_in` to have a much different cost as the size of its dataset grows. If you are not seeing this result, check your work!
As the final step of this Exercise, let’s visualize these results on a chart. The usage of data plotting libraries in Python is beyond the scope of COMP110, so we will give you the code to make this possible.
Add the following to a new code cell beneath your last code cell benchmarking `list_in`:
# The following function plots our results
import matplotlib.pyplot as plt

def plot(title: str, plots: list[tuple[str, dict[int, int]]]):
    fig, ax = plt.subplots()
    for plot_data in plots:
        label: str = plot_data[0]
        results: dict[int, float] = plot_data[1]
        ax.plot(list(results.keys()), list(results.values()), label=label)
    ax.ticklabel_format(style='plain')
    ax.legend()
    fig.suptitle(title)
    ax.set_xlabel("Size of Collection (N)")
    ax.set_ylabel("Avg Cost (Nanoseconds)")
    plt.show()

plot(
    "Time Complexity of `in` Operator",
    [
        ("set's `in`", benchmark(set_in, set_producer, max_pow_2_n=14, runs=1000)),
        ("list's `in`", benchmark(list_in, list_producer, max_pow_2_n=14, runs=1000))
    ]
)
If your functions are properly implemented, after evaluating this cell you should see a line chart where the X axis is the size (N) of each collection and the Y axis is the average cost of the test subject.
Finally, you can see visualized before you one of the most fundamental concepts in computer science and this unit in the course: different data structures have different characteristics for their operations. Here you benchmarked the cost of determining whether a list and a set “contained” some “needle” value you were searching for. That is what the `in` operator does. For lists, this operation is expensive because the underlying algorithm moves 1-by-1 through each element in the list until it finds (or does not find) the needle you are searching for. Thus, as the size of the list grows, so grows the cost or amount of work the algorithm needs to complete. This is called linear growth in complexity, as shown in your chart. By comparison, a set (and, by extension, a dictionary’s keys) can test whether it contains some “needle” value in constant time: as additional items are added to the set, the cost of searching for a specific item does not significantly increase. This property is thanks to a BIG IDEA called a hashing algorithm and hash table. You will learn all about this concept, and how a `set` is implemented with hash tables, in COMP210, should you choose to continue your computer science journey.
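For intuition only, the linear scan described above behaves roughly like the following hand-written function (a conceptual sketch of what `list`'s `in` does, not how Python literally implements it):

```python
def linear_contains(needle: float, haystack: list[float]) -> bool:
    """Check each element one-by-one, mirroring the cost profile of list's `in`."""
    for item in haystack:
        if item == needle:
            return True
    return False
```

Every additional element is one more potential comparison, which is exactly the linear growth your chart shows for `list`.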
Make a Backup Checkpoint “Commit”
As you make progress on this exercise, making backups is encouraged.
- Open the Source Control panel (Command Palette: “Show SCM” or click the icon with three circles and lines on the activity panel).
- Notice the files listed under Changes. These are files you’ve made modifications to since your last backup.
- Move your mouse’s cursor over the word Changes and notice the + symbol that appears. Click that plus symbol to add all changes to the next backup. You will now see the files listed under “Staged Changes”.
- If you do not want to backup all changed files, you can select them individually. For this course you’re encouraged to back everything up.
- In the Message box, give a brief description of what you’ve changed and are backing up. This will help you find a specific backup (called a “commit”) if needed. In this case a message such as, “Progress on Exercise” will suffice.
- Press the Check icon to make a Commit (a version) of your work.
- Finally, press the Ellipses icon (…), look for “Pull/Push” submenu, and select “Push to…”, and in the dropdown select your backup repository.
Submitting your Work
We will manually grade the Jupyter Notebook looking to be sure you were able to execute all cells and produce the expected graph of time complexity by the end of this exercise. BE SURE TO SAVE YOUR WORK IN THE .ipynb FILE BEFORE SUBMITTING TO GRADESCOPE! After submitting, view your submission and look at your Jupyter Notebook to be sure you are able to see the results of running your cells and the expected graphs.
To create your submission zip, as usual, type the following command (all on a single line):
python -m tools.submission exercises/ex06
In the file explorer pane, look to find the zip file named “yy.mm.dd-hh.mm-exercises-ex06.zip”. The “yy”, “mm”, “dd”, and so on, are timestamps with the current year, month, day, hour, and minute. If you right click on this file and select “Reveal in File Explorer” on Windows or “Reveal in Finder” on Mac, the zip file’s location on your computer will open. Upload this file to Gradescope to submit your work for this exercise.
Autograding will take a few moments to complete. If there are issues reported, you are encouraged to try and resolve them and resubmit. If for any reason you aren’t receiving full credit and aren’t sure what to try next, come give us a visit in office hours!
Remember: check your ipynb file on Gradescope to be sure it contains the outputs of evaluated cells and charts! This is not autograded and will be manually graded after the deadline.