EX06 - Benchmarking Time Complexity

Complete Lesson 09 on Jupyter Notebooks before beginning this exercise. Watch the video here and respond to the questions on Gradescope.

In this unit we are focused on a few new data structures (such as set and dict) and some fundamental “cost” differences from the list data structure. We discussed how operations on each, such as the in operator, which tests whether an element is contained in the data structure, have different characteristics in how expensive they are to carry out on the respective data structure. This “expense” can be thought of in two dimensions:

  1. Time Complexity - How long does an algorithm take to complete as the size of its data grows? This is analogous to how much “work”, or how many statements and loop iterations, an algorithm takes to complete.

  2. Space Complexity - How much memory is required for an algorithm or data structure as the size of its data grows? This is a question you will learn more about in the next course in computer science: data structures.

In this exercise, you will develop a benchmark to empirically investigate and compare the time complexity of the same operation on two data structures, as well as two different implementations of the same fundamental algorithm. To benchmark a subject, we will time how long the subject operation takes to complete over a number of experimental runs.

Note on an off-limits construct: We have not discussed a Python concept called a list comprehension in our course. If you do not know what this is, you have nothing to worry about as long as you are completing the assignment using permitted course materials and not utilizing any AI assistance such as ChatGPT. In the unlikely event you know what a list comprehension is from beyond our course, please refrain from using one in this assignment. They’re not necessary. This construct happens to be commonly generated by ChatGPT, though, so if we see list comprehensions in your work we will assume the use of an unpermitted aid.

0. Setup

Create a new directory in exercises named ex06.

Inside the exercises/ex06 directory, create a file named workbench.py. Add a docstring and establish an __author__ variable assigned a string with the digits of your PID. This is where you will add the function skeletons and implementations described below.

Additionally, inside this directory, create a new Jupyter Notebook file named notebook.ipynb. As mentioned at the start of the exercise, please be sure you have completed Lesson 9, which introduces Jupyter Notebooks, before continuing on from this point.

In your Jupyter Notebook, add a Markdown cell that says: “This notebook investigates the time complexity of the in operator for list and set.”

Following the markdown cell, add a Python code cell with the following contents (which will automatically reload work in workbench.py when you are developing these two files):

%reload_ext autoreload
%autoreload 2
print("Autoreload of imported modules enabled. Be sure to save your work in other modules before refreshing cells.")

1. Establishing our benchmark “Subjects”

Back in workbench.py define the following two functions which will serve as our benchmark “subjects”:

# Benchmark Subjects:

def set_in(needle: float, haystack: set[float]) -> bool:
    """Returns True if haystack contains needle, false otherwise."""
    return needle in haystack


def list_in(needle: float, haystack: list[float]) -> bool:
    """Returns True if haystack contains needle, false otherwise."""
    return needle in haystack

These are intentionally very simple functions that “wrap” around the actual algorithmic operation we are trying to test: Python’s built-in in operator for both list and set.
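
If you would like to see the in operator on its own, here is a tiny standalone example (separate from the subject functions above) showing it used directly on a list and a set:

print(2.0 in [1.0, 2.0, 3.0])   # True: the list contains 2.0
print(5.0 in {1.0, 2.0, 3.0})   # False: the set does not contain 5.0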

Our benchmarking goal, which we are working toward in upcoming steps, is to call each of these functions many times over, say 100, with larger and larger data sizes of the list and set parameters, and compare how “expensive” one is relative to the other.

Check for understanding: do these subjects have the same Callable type?

Almost, right? The second parameter is different, but if we were to treat that second parameter as a generic TypeVar T, then they could have the same generic Callable type. We’ll return to this idea…

2. Check the expected behavior of these subjects in the Jupyter Notebook

Back in the Jupyter Notebook file, add an additional Markdown cell with the following (feel encouraged to rewrite this in your own words, to aid understanding, but it is not required):

## Testing our Subjects

We are going to benchmark `list`'s `in` and `set`'s `in` operations, so let's first establish our subject functions are working as expected.

Next, add the following Python code cell beneath the markdown cell:

from workbench import list_in, set_in

assert list_in(1.0, [0.0, 1.0, 2.0])
assert list_in(-1.0, [0.0, 1.0, 2.0]) is False
assert set_in(1.0, {0.0, 1.0, 2.0})
assert set_in(-1.0, {0.0, 1.0, 2.0}) is False

print("Subject function assertions passed!")

Notice here we’re importing the two functions from the workbench.py module and adding a few basic assertions. This is a more hand-wavy approach to testing than we saw in unit testing, but it achieves something similar: it gives us confidence our functions are correctly defined for some basic inputs. In this notebook, it also gives the reader a sense of what the subject functions are expected to do.

When you evaluate this cell, the assertions should silently pass and the final statement will print. If an assertion fails, you will see an error and should go back and check your work.
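
If the behavior of assert is new to you, here is a tiny standalone example (unrelated to the exercise files): an assertion over a True expression does nothing, while a False one stops the cell with an AssertionError.

assert 1 + 1 == 2      # passes silently, produces no output
# assert 1 + 1 == 3    # uncommenting this line would raise an AssertionError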

3. Establishing Random Data Producer Functions

Before we can benchmark these subject functions, we need some test data to feed into each as an argument. Generally, to test the “average” cost of a subject like this, we come up with randomized input data of a given length. For this part, we are going to provide you with one of the two functions you will need, leaving the implementation of the second function to you.

Before adding these functions to workbench.py, add the following import statement at the top of your workbench.py file, beneath your docstring. It imports the random() function, which generates a random float value each time it is called:

from random import random

After your workbench subject definitions, add the following definitions to workbench.py:

# Data Producers

def list_producer(n: int) -> list[float]:
    """Produce a list of random floats of size n."""
    result: list[float] = []
    while len(result) < n:
        result.append(random())
    return result


def set_producer(n: int) -> set[float]:
    """Produce a set of random floats of size n."""
    # TODO: Finish implementing this function
    ...

Notice list_producer establishes an empty list and then continues appending random numbers to it until its size is equal to the parameter n. This gives us a function where we give it an n and it returns to us a list of length n random float values: perfect for our benchmarking purposes!

Your task in this part is to complete the implementation of set_producer such that it returns a set of length n random numbers. Remember, to create an empty set use the set() constructor.

Check for understanding: although the odds of generating two identical random float values in list_producer are infinitesimally small, it is still possible. Given this possibility, why shouldn’t you implement set_producer by just returning set(list_producer(n))?

To answer the check for understanding question above: lists can contain duplicates, but sets cannot. If list_producer happened to generate a duplicate value, set(list_producer(n)) would end up with fewer than n elements, so your set_producer function would not be able to guarantee the set has n elements.

4. Check the expected behavior of these producers in the Jupyter Notebook

Back in the Jupyter Notebook file, add an additional Markdown cell with the following (feel encouraged to rewrite this in your own words, to aid understanding, but it is not required):

## Testing our Data Producers

The functions `list_producer` and `set_producer` produce a `list` and a `set`, respectively, each containing `n` random `float` elements.

Next, add the following Python code cell beneath the markdown cell and replace the TODO comment with two or more assertions demonstrating the correct implementation of your set_producer function.

from workbench import list_producer, set_producer

assert len(list_producer(n=10)) == 10
assert len(list_producer(n=100)) == 100

# TODO: Add two or more assertions to prove correctness of set_producer

print("Producer assertions passed! Sample output:")
print(f"list_producer(n=4) -> {list_producer(n=4)}")
print(f"set_producer(n=4) -> {set_producer(n=4)}")

5. Writing microbenchmark functions

Next, we will write two functions to benchmark each of our subject functions with their respective data producers. The first function, for microbenchmarking set_in, is given to you; the second, for list_in, is up to you to implement.

The big idea behind a microbenchmark is that we will import and call a function that gives us a highly precise clock counter, which returns an integer time in nanoseconds. A nanosecond is one billionth of a second. This denotes our “start” time. Then we will call the subject function we are microbenchmarking. After it completes, we will once again ask for the current time in nanoseconds and call it our “end” time. By subtracting the start time from the end time, we will have a measure of the number of elapsed nanoseconds it took to complete the call to our subject function. This gives us its cost in time.

Back in workbench.py, add the following import statement below your docstring:

from time import perf_counter_ns

The perf_counter_ns function, when called, gives us a precise time in int nanoseconds. You can see its use in the set_in microbenchmark function below:

def microbenchmark_set_in(n: int) -> int:
    """Time (ns) required to check if a set of size n contains a value."""
    needle: float = random()
    haystack: set[float] = set_producer(n)
    start_time: int = perf_counter_ns()
    set_in(needle, haystack)  # Our subject function being microbenchmarked
    end_time: int = perf_counter_ns()
    return end_time - start_time


def microbenchmark_list_in(n: int) -> int:
    """Time (ns) required to check if a list of size n contains a value."""
    # TODO: Implement a **very** similar microbenchmark for the `list_in` subject
    # Be sure your haystack is the correct _type_ and calls the correct _producer_

6. Testing the Microbenchmark Functions

Now that you have implemented microbenchmark_list_in, let’s try utilizing these functions back in the notebook. Add a markdown cell explaining what we are about to do next:

## Microbenchmarking without Higher-order Functions

Next we will microbenchmark the time, in nanoseconds, each of our subject functions takes to complete using plain-old functions. Later, we will improve upon these two functions.

Following the markdown cell, add a Python code cell:

from workbench import microbenchmark_set_in, microbenchmark_list_in

print(f"set_in(n=100) cost {microbenchmark_set_in(n=100)} nanoseconds")
print(f"list_in(n=100) cost {microbenchmark_list_in(n=100)} nanoseconds")

Try running this cell a few times and observing the results. Each number, individually, will change from run to run within some reasonable margin. Why? Your computer is also carrying out many, many other tasks, so the speed at which an individual task (such as evaluating a function call) completes is slightly non-deterministic.

Important: Notice that, more often than not, set_in is faster than list_in with 100 elements. The exact number of nanoseconds, and the speedup multiplier, will depend on your computer.

An interesting question we will soon explore more rigorously: how does this performance difference change as the number of elements in our set and list changes? Try tinkering with the n argument by an order of magnitude or two to get an intuitive feel for this.
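
For example, just to illustrate the kind of tinkering meant above (the exact numbers you see will differ from machine to machine and run to run), you might try a larger size in the same cell:

print(f"set_in(n=10000) cost {microbenchmark_set_in(n=10000)} nanoseconds")
print(f"list_in(n=10000) cost {microbenchmark_list_in(n=10000)} nanoseconds")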

7. Writing a higher-order, generic microbenchmark function

Look back at your function definitions for microbenchmark_set_in and microbenchmark_list_in. You should see a pattern to their structure, with two differences: how the data is produced and which subject function is called. Seeing a pattern like this across two or more functions is a strong indicator you can abstract out a higher-order function in its place. Our goal is to write an additional, higher-order function named microbenchmark that is given a subject function, a data producer function, and an n argument, and benchmarks the subject with a data set of size n.

To help you get started on writing this function, we need to define our Callable function types for Producer and Subject. These will need to be generic types. Why? Notice your producers and subjects have the same shape but are operating on different types: list[float] versus set[float]. Add the following import to the top of your workbench.py file with your other imports:

from typing import Callable, TypeVar

Next, add the following three definitions below your imports:

T = TypeVar("T")
Producer = Callable[[int], T]
Subject = Callable[[float, T], bool]

Check for understanding: Are set_in and list_in examples of type Subject? If so, for each, what is type T? Similarly, are list_producer and set_producer examples of type Producer? If so, for each, what is type T?

Answers: Yes, set_in and list_in are valid Subject type functions. Their generic type T is concretely set[float] and list[float], respectively. Similarly, list_producer and set_producer are functions of type Producer, with T being list[float] and set[float], respectively.
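
As a quick illustration of these generic aliases in action (the variable names below are made up purely for demonstration and are not part of the exercise), a type checker would accept assignments like these in workbench.py:

# Demonstration only: binding the generic aliases to concrete types.
example_subject: Subject[set[float]] = set_in             # here T is set[float]
example_producer: Producer[list[float]] = list_producer   # here T is list[float]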

Below your earlier functions, write a new function named microbenchmark with the following structure. It is your task to replace the blanks/comments with valid function calls:

def microbenchmark(subject: Subject[T], producer: Producer[T], n: int) -> int:
    """Time (ns) cost of subject with a produced collection of size n."""
    needle: float = random()
    haystack: T = _________ # TODO: Call the Producer Function with n
    start_time: int = perf_counter_ns()
    __________ # TODO: Call the Subject Function
    end_time: int = perf_counter_ns()
    return end_time - start_time

Notice the relationship between this function and the earlier, simpler microbenchmark functions that were less flexible. Here, we have a single microbenchmark function we can reuse with any other Subject[T] and Producer[T] functions we define without having to copy/reimplement this same functionality again.
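
To see what this reuse could look like, here is a purely hypothetical pair of functions (not part of the exercise; the names tuple_in and tuple_producer are made up for illustration) that the very same microbenchmark function could time without any changes:

# Hypothetical example: the generic microbenchmark function can be reused for
# any matching Subject/Producer pair, such as this made-up pair for tuples.

def tuple_in(needle: float, haystack: tuple[float, ...]) -> bool:
    """Returns True if haystack contains needle, False otherwise."""
    return needle in haystack


def tuple_producer(n: int) -> tuple[float, ...]:
    """Produce a tuple of random floats of size n."""
    result: tuple[float, ...] = ()
    while len(result) < n:
        result = result + (random(),)
    return result


# Usage: microbenchmark(tuple_in, tuple_producer, 1000) would return an int
# cost in nanoseconds, with no additional timing code written.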

8. Testing your microbenchmark function in the Jupyter Notebook

Add the following Markdown cell to your notebook.ipynb file:

## Microbenchmarking with Higher-order Function

Now that we have abstracted a single `microbenchmark` function, which can be reused with different subjects and producers, let's try the above examples again and make use of it.

Then, add a Python code cell below the markdown, with the following tests of your microbenchmark function. Your task is to replace the PRODUCER_FUNCTION with the correct name of a Producer function you imported earlier in your notebook and defined in workbench.py, given the context. Similarly, replace SUBJECT_FUNCTION with the correct name of a Subject function you imported earlier in your notebook and defined in workbench.py.

from workbench import microbenchmark

print(f"set_in(n=1000) cost {microbenchmark(set_in, PRODUCER_FUNCTION, 1000)} nanoseconds")
print(f"list_in(n=1000) cost {microbenchmark(SUBJECT_FUNCTION, list_producer, 1000)} nanoseconds")

9. Computing the average cost of a subject function

Next, we would like to more accurately measure the time cost of each subject. To do so, we’d like to measure the average cost of running the subject function many, many (say 1,000) times over. We will give you this function since it uses some interesting concepts related to what we have explored in class. Add the following function to your workbench.py file:

def microbenchmark_avg(
    subject: Subject[T], producer: Producer[T], n: int, runs: int
) -> int:
    """Compute the average time cost (ns) of a microbenchmark over # runs."""

    # 1. Define an "internal" helper function for an individual run
    def run_once(n: int) -> int:
        """Given an n, run one microbenchmark of subject/producer."""
        return microbenchmark(subject, producer, n)

    # 2. n_runs is a list where each element is n repeated (length=runs)
    n_runs: list[int] = [n] * runs

    # 3. Use the built-in map function to call run_once on each n
    time_costs: list[int] = list(map(run_once, n_runs))

    # 4. TODO: Compute the average time cost of the runs and return it.
    # Complete your work here:

Let’s break down the important steps here:

  1. We’ll return back to this function at step 3.
  2. When you multiply a list by an int, the list repeats. Thus, [100] * 3 would evaluate to [100, 100, 100], for example. Here, we’re repeating the n (size of our produced data set for benchmark purposes) runs number of times. So if n is 100 and runs is 1000, n_runs would be a list that looks like [100, 100, ..., 100] with 100 repeated 1000 times in the resulting list.
  3. We learned the map higher-order function in lecture! This line makes use of Python’s built-in map. Notice we are mapping the function run_once over the n_runs list. Thanks to step #2, this means we’re repeatedly calling run_once with the given n, for a total of runs times. The resulting time cost of each microbenchmark is stored in the time_costs list. Notice that the run_once function is defined with a single parameter n in order to satisfy the map function’s semantics. We are able to read subject and producer thanks to this function being defined inside of microbenchmark_avg’s function definition. (See the small sketch after this list for a standalone example of this pattern.)
  4. Given the above steps, compute the average time cost by summing the time costs of all runs from step 3 and dividing by the number of microbenchmark runs using integer division.
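
Here is a small, standalone sketch of the pattern in steps 1 through 3 (the names scale_all and scale_one are made up for illustration): an inner helper function can read a variable from its enclosing function, and map applies that helper to every element of a list.

def scale_all(values: list[int], factor: int) -> list[int]:
    """Multiply every element of values by factor."""

    def scale_one(value: int) -> int:
        # factor is readable here because scale_one is defined inside scale_all
        return value * factor

    return list(map(scale_one, values))


print(scale_all([100] * 3, 2))  # [100] * 3 is [100, 100, 100], so this prints [200, 200, 200]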

10. Testing your microbenchmark_avg function in the notebook

Back in notebook.ipynb, add a new Markdown text cell beneath your earlier work and fill in the following:

## Average of Many Microbenchmark Runs

Since each microbenchmark has some variance in duration, we're implementing `microbenchmark_avg`, which has an additional parameter of the number of repeated runs of the microbenchmark. We then take the average of this number of runs to give us a more consistent estimated "average cost" of each subject function.

Then, add a Python code cell with the following snippet and evaluate it:

from workbench import microbenchmark_avg

print(f"set_in(n=1000) avg cost {microbenchmark_avg(set_in, set_producer, n=1000, runs=1000)} nanoseconds")
print(f"list_in(n=1000) avg cost {microbenchmark_avg(list_in, list_producer, n=1000, runs=1000)} nanoseconds")

Notice here we’re choosing a runs argument of 1000, meaning we are running each subject function (set_in and list_in) 1,000 times over, tracking how long each run takes to complete, given an n size of 1000 elements.

When you run this cell multiple times over you should see more consistent (but still varying) results in the number of nanoseconds per operation. Your results should be int values, not float values. If you are seeing float values, be sure you are using integer division when calculating the average!
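
As a refresher on that difference (a generic example, not code to add to your files): / always produces a float, while // performs integer division and produces an int.

print(10 / 4)   # 2.5, a float
print(10 // 4)  # 2, an int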

11. Benchmarking for Doubling Sizes of n Items

Phew, you have almost completed this benchmarking task! The last step is a function that will gather your microbenchmarking results over increasing sizes of n. We will double the size of n for each subsequent microbenchmark, find the average cost, and return a dictionary where the keys are the n sizes and the associated values are the average costs (in nanoseconds).

We are providing the skeleton of this function definition. Your task is to replace the two sets of ellipses with calls to the map function, given the context. You will use the list constructor to convert the map results into a list format. For an example, look back at microbenchmark_avg’s step 3.

The general format of these calls will be:

list(map(TRANSFORM_FUNCTION, INPUT_LIST))

Add the following definition to your workbench.py file and follow the instructions above:

def benchmark(
    subject: Subject[T], producer: Producer[T], max_pow_2_n: int, runs: int
) -> dict[int, int]:
    """Compute the avg cost of a microbenchmark over increasing sizes of n.
    Returns a dictionary where the key is `n` size of produced data set and 
    value is the cost of microbenchmarking the `subject` with the given `n`.

    The sizes of `n` double for each successive benchmark, such that the test
    ns are: [2**0, 2**1, 2**2, ..., 2**max_pow_2_n]
    """
    # 1. Define an internal helper function to compute powers of 2
    def pow2(exponent: int) -> int:
        return int(2**exponent)

    # 2. Produce a list of exponents: [0, 1, 2, ..., max_pow_2_n]
    exponents: list[int] = list(range(max_pow_2_n + 1))

    # 3. Produce a list of n sizes, which are successive powers of 2
    # [1, 2, 4, 8, ..., 2 ** max_pow_2_n]
    # TODO: replace the ellipses, hint: use the pow2 function and exponents list
    n_sizes: list[int] = ...

    # 4. Define an internal helper function to microbenchmark given n
    def avg_cost_runner(n: int) -> int:
        return microbenchmark_avg(subject, producer, n, runs)

    # 5. Compute microbenchmark avg time costs for each of our n_sizes
    # TODO: replace the ellipses, hint: use n_sizes and the function of step 4
    n_costs: list[int] = ...

    # 6. Finally, return a dictionary where the keys are n_sizes and
    # the values are n_costs
    return dict(zip(n_sizes, n_costs))
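
If the dict(zip(...)) expression on the final line is unfamiliar, here is a tiny standalone sketch with made-up numbers: zip pairs the two lists element-by-element, and dict turns those pairs into key/value entries.

sizes: list[int] = [1, 2, 4]
costs: list[int] = [310, 305, 512]   # made-up values purely for illustration
print(dict(zip(sizes, costs)))       # {1: 310, 2: 305, 4: 512}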

12. Benchmarking in the Notebook

Now that you have completed the benchmark function, let’s bring our analysis home and test each of our subject functions at doubling sizes of data. First, add the following markdown cell to explain what is going on in our notebook:

## Benchmarking Doubling n Sizes of Data

Let's investigate how expensive our subject functions are as the sizes of their datasets grow. To do so, we've defined an ultimate `benchmark` function that produces a dictionary where the keys are our data size and values are the time cost in nanoseconds.

Then add the following code cell:

from workbench import benchmark

benchmark(set_in, set_producer, max_pow_2_n=14, runs=1000)

Before evaluating this cell, try to conceptualize what this call to benchmark is carrying out. What is the largest n size the benchmark will test? After evaluating it, you should see a resulting dictionary. The keys are the sizes of n and the values are the average cost of the set_in subject function given a dataset of that size.

Next, add an additional Python code cell below the previous one you just added. In this code cell, write your own call to benchmark but benchmark your list_in subject instead. When evaluated with the same max_pow_2_n and runs arguments, this benchmark should prove list_in to have a much different cost as the size of its dataset grows. If you are not seeing this result, check your work!

As the final step of this Exercise, let’s visualize these results on a chart. The usage of data plotting libraries in Python is beyond the scope of COMP110, so we will give you the code to make this possible.

Add the following to a new code cell beneath your last code cell benchmarking list_in:

# The following function plots our results
import matplotlib.pyplot as plt

def plot(title: str, plots: list[tuple[str, dict[int, int]]]) -> None:
    """Plot each (label, results) pair as a line on a shared chart."""
    fig, ax = plt.subplots()
    for plot_data in plots:
        label: str = plot_data[0]
        results: dict[int, int] = plot_data[1]
        # Keys (n sizes) go on the x-axis, values (avg ns costs) on the y-axis
        ax.plot(list(results.keys()), list(results.values()), label=label)
    ax.ticklabel_format(style='plain')
    ax.legend()
    fig.suptitle(title)
    ax.set_xlabel("Size of Collection (N)")
    ax.set_ylabel("Avg Cost (Nanoseconds)")
    plt.show()

plot("Time Complexity of `in` Operator",
  [
    ("set's `in`", benchmark(set_in, set_producer, max_pow_2_n=14, runs=1000)),
    ("list's `in`", benchmark(list_in, list_producer, max_pow_2_n=14, runs=1000))
  ]
)

If your functions are properly implemented, after evaluating this cell you should see a line chart where the X axis is the size (N) of each collection and the Y axis is the average cost of the test subject.

Finally, you can see visualized before you one of the most fundamental concepts in computer science and this unit in the course: different data structures have different characteristics for their operations. Here you benchmarked the cost of determining whether a list and a set “contained” some “needle” value you were searching for. That is what the in operator does.

For lists, this operation is expensive because the underlying algorithm moves 1-by-1 through each element in the list until it finds (or does not find) the needle you are searching for. Thus, as the size of the list grows, so grows the cost or amount of work the algorithm needs to complete. This is called linear growth in complexity, as shown in your chart.

By comparison, a set (and, by extension, a dictionary’s keys) can test whether it contains some “needle” value in constant time: as additional items are added to the set, the cost of searching for a specific item does not significantly increase. This property is thanks to a BIG IDEA called a hashing algorithm and hash table. You will learn all about this concept, and how a set is implemented with hash tables, in COMP210, should you choose to continue your computer science journey.
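
To make the list case concrete, here is a rough sketch of the kind of linear scan that list’s in performs behind the scenes (a simplified mental model, not Python’s actual implementation):

def contains_linear(needle: float, haystack: list[float]) -> bool:
    """Simplified model of list's `in`: check each element one-by-one."""
    i: int = 0
    while i < len(haystack):
        if haystack[i] == needle:
            return True
        i += 1
    return False

Every additional element is one more potential comparison, which is exactly the linear growth your chart shows for list’s in.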

Make a Backup Checkpoint “Commit”

As you make progress on this exercise, making backups is encouraged.

  1. Open the Source Control panel (Command Palette: “Show SCM” or click the icon with three circles and lines on the activity panel).
  2. Notice the files listed under Changes. These are files you’ve made modifications to since your last backup.
  3. Move your mouse’s cursor over the word Changes and notice the + symbol that appears. Click that plus symbol to add all changes to the next backup. You will now see the files listed under “Staged Changes”.
    • If you do not want to backup all changed files, you can select them individually. For this course you’re encouraged to back everything up.
  4. In the Message box, give a brief description of what you’ve changed and are backing up. This will help you find a specific backup (called a “commit”) if needed. In this case a message such as, “Progress on Exercise” will suffice.
  5. Press the Check icon to make a Commit (a version) of your work.
  6. Finally, press the Ellipses icon (…), look for “Pull/Push” submenu, and select “Push to…”, and in the dropdown select your backup repository.

Submitting your Work

We will manually grade the Jupyter Notebook looking to be sure you were able to execute all cells and produce the expected graph of time complexity by the end of this exercise. BE SURE TO SAVE YOUR WORK IN THE .ipynb FILE BEFORE SUBMITTING TO GRADESCOPE! After submitting, view your submission and look at your Jupyter Notebook to be sure you are able to see the results of running your cells and the expected graphs.

To create your submission zip, as usual, type the following command (all on a single line):

python -m tools.submission exercises/ex06

In the file explorer pane, look to find the zip file named “yy.mm.dd-hh.mm-exercises-ex06.zip”. The “yy”, “mm”, “dd”, and so on, are timestamps with the current year, month, day, hour, and minute. If you right click on this file and select “Reveal in File Explorer” on Windows or “Reveal in Finder” on Mac, the zip file’s location on your computer will open. Upload this file to Gradescope to submit your work for this exercise.

Autograding will take a few moments to complete. If there are issues reported, you are encouraged to try and resolve them and resubmit. If for any reason you aren’t receiving full credit and aren’t sure what to try next, come give us a visit in office hours!

Remember: check your ipynb file on Gradescope to be sure it contains the outputs of evaluated cells and charts! This is not autograded and will be manually graded after the deadline.

Contributor(s): Kris Jordan