Algorithms & Data Structures

Problem: Explain what the spacetime complexity of an algorithm on a problem of characteristic input size \(n\) is.

Solution: Time complexity \(T(n)\), is the number of \(\Theta(1)\)-operations that an algorithm requires to produce a solution to a problem of input size \(n\), while space complexity \(S(n)\) is the same but for memory. More precisely, just as the Wigner-Eckart theorem factorizes angular momentum matrix elements of spherical components of tensor operators as a product of purely geometric Clebsch-Gordan coefficients with an operator-specific “reduced matrix element”, the total runtime can be thought of as factorizing into a purely algorithm-dependent factor \(T(n)\) and some hardware-specific factor, and similarly the memory is proportional to \(S(n)\) with hardware-dependent proportionality constant.

Problem: Explain how the \(3\) time complexities \(T_{b}(n),T(n),T_w(n)\) are defined (similar definitions apply to their spatial counterparts \(S_{b}(n),S(n),S_w(n)\)).

Solution: Letting \(\mathcal I_n\) be the set of all valid inputs of size \(n\), one can define a random function \(T(I)\) returning some algorithm’s time complexity on a specific length-\(n\) input \(I\in\mathcal I_n\) (which itself can be thought of as a random variable with some not-necessarily-uniform probability distribution \(p(I)\), e.g. Zipf’s law in word queries). Then the best-case and worst-case time complexities are simply:

\[T_{b}(n):=\min_{I\in\mathcal I_n}T(I)\]

\[T_{w}(n):=\max_{I\in\mathcal I_n}T(I)\]

whereas the average-case time complexity is:

\[T(n):=\langle T(I)\rangle=\sum_{I\in\mathcal I_n}p(I)T(I)\]

Problem: Describe several classic comparison-based sorting algorithms (think of ordering a bunch of people by height), and justify their (asymptotic) best/average/worst-case time complexities and auxiliary space complexities (i.e. space complexities but ignoring the \(\Theta(n)\) memory occupied by the input length-\(n\) array itself).

Solution: First, there are a few sorting algorithms which are inefficient and thus not actually implemented in production:

  • Bogosort works like tossing a deck of cards on the floor and picking them up in a random order. If the result is not sorted, then repeat. Of course, this is not a comparison-based, or even deterministic sorting algorithm. It is just bad, don’t use it. \(T_b(n)\in\Theta(n),T_w(n)\in\Theta(\infty),T(n)\in\Theta(n\times n!)\) while \(S_b(n),S_w(n),S(n)\in\Theta(1)\).
  • Bubble sort works by comparing elements pairwise and swapping if necessary, moving through the array and repeating until there’s no elements left to swap. It has time complexities \(T_b(n)\in\Theta(n)\) (if the array is already sorted) and \(T_w(n),T(n)\in\Theta(n^2)\) (since one may have to compare all \(n(n-1)/2\in\Theta(n^2)\) pairs of elements). Like bogosort, it is an in-place algorithm, so \(S_b(n),S_w(n),S(n)\in\Theta(1)\)).
  • Insertion sort is like sorting cards. It has time complexities \(T_b(n)\in\Theta(n)\) and \(T_w(n),T(n)\in\Theta(n^2)\). It is also an in-place algorithm \(S_b(n),S_w(n),S(n)\in\Theta(1)\).
  • Selection sort divides the array into sorted and unsorted regions, repeatedly finding the min from the unsorted region and swapping it with the first element of the unsorted region. It has time complexities \(T_b(n),T_w(n),T(n)\in\Theta(n^2)\), and operates strictly in-place \(S_b(n),S_w(n),S(n)\in\Theta(1)\).

The sorting algorithms below are characterized by log-linear \(T(n)\in\Omega(n\ln n)\) average-case time complexities and thus are considered to be efficient and commonly used in production:

  • Merge sort uses a divide-and-conquer strategy, recursively halving the array into single-element subarrays and then merging the sorted subarrays back together in order. One can thus write down a recurrence relation \(T(n)=2T(n/2)+\Theta(n)\), and hence apply the master theorem to obtain \(T_b(n),T_w(n),T(n)\in\Theta(n\ln n)\). Because it requires an auxiliary array to hold elements during the merge, merge sort has auxiliary space complexities \(S_b(n),S_w(n),S(n)\in\Theta(n)\).
  • Quick sort works by randomly choosing a pivot, comparing all other elements to the pivot and swapping them left/right of the pivot accordingly, then recursively repeating this on the subarrays split by the pivot (now in the correct position). It has time complexities \(T_w(n)\in\Theta(n^2)\) and \(T_b(n),T(n)\in\Theta(n\ln n)\) (in all \(3\) cases, a factor of \(n\) is due to the \(\Theta(n)\) time required to scan an array of length \(n\) at each level of the recursion tree, while the remaining factor is the recursion tree depth for that particular case). Meanwhile, quick sort has auxiliary space complexities \(S_w(n)\in\Theta(n)\) and \(S_b(n),S(n)\in\Theta(\ln n)\) which are simply the recursion tree depths used to build up a call stack in the respective cases (there seems to be debate about whether quick sort is considered “in-place” or not…)
  • Shell sort works as a generalized and optimized version of insertion sort. It allows the exchange of far-apart elements by sorting pairs of elements separated by a gap that progressively decreases down to \(1\). It has a best-case time complexity of \(T_b(n)\in\Theta(n\ln n)\) (or \(\Theta(n)\) depending on the gap sequence) and a worst-case of \(T_w(n)\in\Theta(n^2)\) using Shell’s original gaps, though modern gap sequences can reduce this to \(\Theta(n^{3/2})\) or \(\Theta(n\ln^2 n)\). Its average-case time complexity \(T(n)\) depends heavily on the chosen gap sequence but typically scales better than quadratic time. Because it modifies the array via localized insertion passes, it is an in-place algorithm with space complexities \(S_b(n),S_w(n),S(n)\in\Theta(1)\).
  • Heap sort works by visualizing the array as a complete binary tree and rearranging it into a max-heap structure. It then repeatedly extracts the maximum element from the root, moves it to the end of the array, and runs a heapify procedure to restore the heap property for the remaining elements. It has time complexities \(T_b(n),T_w(n),T(n)\in\Theta(n\ln n)\) because constructing the initial heap takes \(\Theta(n)\) time, and each of the \(n\) extractions requires a heapify step that takes logarithmic time proportional to the height of the tree. Since it manipulates the array directly by swapping elements, its space complexities are \(S_b(n),S_w(n),S(n)\in\Theta(1)\).
  • Timsort works as a hybrid, stable sorting algorithm that intelligently combines the strengths of merge sort and insertion sort. It operates by scanning the array to find naturally ordered segments called “runs,” extending short runs using insertion sort, and then combining those runs using an optimized merging routine. It achieves a best-case time complexity of \(T_b(n)\in\Theta(n)\) when the data is already sorted or reverse-sorted. Its worst and average-case time complexities are \(T_w(n),T(n)\in\Theta(n\ln n)\) as it scales up to a merge-like structural tree for random data. It requires auxiliary space to execute its merge operations, yielding space complexities of \(S_w(n),S(n)\in\Theta(n)\) and \(S_b(n)\in\Theta(1)\).

    The sorting algorithms below are non-comparison-based sorting algorithms, allowing them to bypass the lower bound of \(\Omega(n\ln n)\) by leveraging specific assumptions about the properties of the input data:
  • Counting sort works by mapping the frequency of each distinct key value into an auxiliary counting array, then calculating the precise starting index of each element in the final sorted array via a prefix sum. It requires the input keys to be integers falling within a known, finite range \([0, k]\). It features time complexities of \(T_b(n),T_w(n),T(n)\in\Theta(n+k)\) because it requires a constant number of linear scans over both the input data and the counting bucket array. Its auxiliary space complexities are \(S_b(n),S_w(n),S(n)\in\Theta(n+k)\) due to the necessity of maintaining the counting array of size \(k\) and a separate output buffer array of size \(n\).
  • Radix sort works by processing integer or string data digit-by-digit, from the least significant digit to the most significant digit (or vice versa). It distributes the elements into buckets based on the current digit’s value, using a stable sorting subroutine—typically counting sort—to ensure the relative order from prior digits is preserved. It has time complexities \(T_b(n),T_w(n),T(n)\in\Theta(d\cdot(n+k))\), where \(d\) is the number of digits required to represent the maximum key, and \(k\) is the radix or base of the number system. Its space complexities are \(S_b(n),S_w(n),S(n)\in\Theta(n+k)\) to accommodate the underlying stable sorting infrastructure.
  • Bucket sort works by distributing elements across a fixed number of buckets that segment a known interval. Each bucket is then sorted individually using a secondary sorting algorithm (like insertion sort) or by recursively applying bucket sort, before the sorted buckets are concatenated sequentially. It has a best and average-case time complexity of \(T_b(n),T(n)\in\Theta(n+k)\) when elements are uniformly distributed across \(k\) buckets, making the individual bucket sizes small and constant. However, if the data heavily clusters into a single bucket, the worst-case time complexity defaults to the secondary sorting algorithm, yielding \(T_w(n)\in\Theta(n^2)\) if insertion sort is utilized. Its space complexities are \(S_b(n),S_w(n),S(n)\in\Theta(n+k)\) to maintain the collection of bucket structures.

Problem: Distinguish between the concepts of abstract data types, data types, and data structures.

Solution: At the highest level, it’s abstract data types (e.g. the Platonic mathematical idea of the real numbers \(\mathbf R\) and the axiomatic operations one can perform with them). Below that, it’s the data type itself (e.g. the messy, IEEE \(754\) double-precision standard concrete implementation of the ADT such as a Python float), and at the lowest level of the hardware/RAM, it’s the data structure describing the geometric arrangement of bits/bytes to implement objects of a data type.

Problem: List several standard data structures, pros and cons, and example applications.

Solution:

  • Static arrays: contiguous block of memory with fixed size \(n\). Access takes \(\Theta(1)\) time but insertion takes \(\Theta(n)\) to shift bits around to make room. Python tuples (an immutable data type) are stored using static array data structures.
  • Dynamic arrays: like static arrays except \(n\) is not fixed. In practice, are implemented via static arrays (i.e. if capacity \(n\) of a static array is exceeded, duplicate to a new static array with \(n\mapsto\lambda n\) for some fixed multiplier \(\lambda\sim 2\)). Python lists (a mutable data type) are stored using dynamic array data structures (not linked lists! unfortunate nomenclature…)
  • Linked lists: unlike in (static/dynamic) arrays where bits are arranged contiguously (and thus the contiguity implicitly determines which bit comes after which), linked lists abandon contiguity in favor of a linear graph-like structure consisting of nodes that store not only the value of the data (as was the case with arrays) but also a pointer (memory address of the next node). Insertion/deletion are \(\Theta(1)\) but access and search are both \(\Theta(n)\) (cannot make use of caching, unlike arrays).
  • Hash tables: essentially a data structure implementation of a dictionary data type. For each key \(k\), one associates some value \(v(k)\) where the function \(v\) is called the hash function and should be chosen to be simultaneously as simple and injective as possible (will need some system for handling inevitable non-injective mappings though, aka collisions). Useful for memoization of recursive function computations (e.g. Fibonacci).
  • Stacks: (strictly an abstract data type, can be implemented using either array or linked list data structures) LIFO (last-in first-out) like a stack of plates, a ctrl\(+z\) undo stack, or a call stack. Push or pop.
  • Queues: (strictly an abstract data type, can be implemented using either array of linked list data structures) FIFO (first-in-first-out) like a queue of people or a print spooler. Enqueue or dequeue.
  • Binary search trees: essentially always start from a root parent node, and obey the BST invariant! Then, search becomes \(\Theta(\ln n)\) (assuming the BST is balanced as in AVL/red-black BSTs).
  • Heaps: whereas BSTs require the values of the left child node \(v_{cl}\), the parent node \(v_p\), and the right child node \(v_{cr}\) to be horizontally stratified \(v_{cl}\leq v_p\leq v_{cr}\), a heap imposes purely vertical stratification (more precisely, a max binary heap demands \(v_p\geq v_{cl},v_{cr}\) whereas a min binary heap demands \(v_p\leq v_{cl},v_{cr}\)). The point of this heap property is that accessing the max or min values takes \(O(1)\) time (in max and min heaps resp.) which for some applications (e.g. emergency room priorities) may be desirable. Heaps are thus always balanced unlike BSTs.
  • Tries:
  • Graphs:
  • Union-Finds:

Problem:

Fundamental Theorem of Classical Computing: The vector space \(\textbf N\) over the binary Galois field \(\textbf Z/2\textbf Z\) admits the countably infinite basis \(\textbf N=\text{span}_{\textbf Z/2\textbf Z}\{2^n:n\textbf N\}\) so that \(\text{dim}_{_{\textbf Z/2\textbf Z}}(\textbf N)=\aleph_0\). Put differently, the binary representation map \(n\in\textbf N\mapsto [n]^{(1,2,4,8,…)}\in \) is

From a physics perspective, one can loosely think of \(\textbf N\cong(\textbf Z/2\textbf Z)^{\oplus\infty}\) as the direct sum of infinitely many copies of the binary “vector space” \(\textbf Z/2\textbf Z\). By contrast, in the field of quantum computing, one instead has a vector space of the form \((\textbf C^2)^{\otimes N}\) for \(N\) qubits…and the multiplicative structure of the tensor product is much richer than the additive structure of the direct sum, hence the interest in quantum computing.

Knowing all this, it follows that any data \(X\) (e.g. an image file, an audio file, etc.) which can simply be “reduced to numbers” by some injection \(f:X\to\textbf N\) is also in principle just reduced to some bit string \((b_k)_{k=0}^\infty\) simply by taking each \(f(x)\in\textbf N\) for \(x\in X\) and writing out the binary representation of \(f(x)\). For instance, when \(X=\{a,b,c,…,x,y,z\}\) is the alphabet, then one possible injection (or encoding) is called the American Standard Code for Information Interchange \(\text{ASCII}:\{a,b,c,…,x,y,z\}\to\textbf N\) which maps for instance \(\text{ASCII}(a):=97\), \(\text{ASCII}(b):=98\), and so forth (strictly speaking, ASCII uses \(7\) bits to represent \(2^7=128\) characters of which \(26\) are the usual English alphabet letters in lower case while another \(26\) are uppercase English letters. There was even an extended ASCII which used \(8\) bits to represent \(2^8=256\) characters, but ultimately that was clearly insufficient for things like the Chinese language and so nowadays Unicode (with UTF-8 encoding, which stands for Unicode Transformation Format – 8 bit, or UTF-16, UTF-32, etc.) tends to be used in lieu of ASCII).

When storing numbers in memory (e.g. RAM, HDD, SSD) a computer can experience overflow error (most computers have \(64\)-bit architecture, so can safely store up to \(2^{64}-1\)), roundoff error (this is especially relevant to floating point arithmetic), and precision errors.

An analog-to-digital converter (ADC) is just an abstraction for any function that maps an analog signal \(V(t)\), \(I(x,y)\), etc. to a digital signal \(\bar V_i,\bar I_{ij}\), etc. More precisely, one can consider an ADC to be a pair \(\text{ADC}=(f_s,\rho)\) where \(f_s\) is the sampling rate of the ADC in samples/second and \(\rho\) is the bit depth/resolution/precision at which the ADC quantizes data in bits/sample, thus the bit rate \(\dot b\) of the ADC is \(\dot b=f_s\rho\) and this is in general distinct from the baud rate \(\dot{Bd}\) of serial communication with the ADC by a factor \(\lambda:=\frac{\dot b}{\dot{Bd}}\geq 1\) which describes the number of bits per baud (see this Stack Overflow Q&A for an idea of the distinction).

For \(V(t)\) an analog signal in the time domain which is bandlimited by some bandwidth \(\Delta f\), the Nyquist-Shannon sampling theorem asserts that in order to avoid aliasing distortions when sampling \(V(t)\), one has to use \(f_s>\Delta f\). Equivalently, if \(f^*=\Delta f/2\) is the largest frequency present in \(V(t)\), then the sampling frequency needs to obey \(f_s>2f^*\). For instance, humans can hear audio up to \(f^*=20\text{ kHz}\), so audio ADCs (a fancy way of saying microphones) sample at \(f_s=48\text{ kHz}\). Cameras are just image ADCs, where now “samples” is replaced by “pixels” and so \(f_s\) might be better called “pixel frequency” (with units of pixels/meter rather than samples/second?) and the use of the RGB color space is fundamentally based on the biology of the human eye and its \(3\) types of cone cells, and conventionally each R, G, B channel has \(256\) levels (or \(1\) byte) of intensity quantization simply because that was empirically found to be sufficient (so the total bit depth of an RGB image is \(\rho=3\) bytes/pixel or \(24\) bits/px.

In practice, such data would likely be further compressed (either via lossless or lossy data compression algorithms). For instance, JPEG (lossy), PNG (lossless), run-length encoding (lossless), etc. for digital/bitmap/raster images, Lempel-Ziv-Welch (LZW) (lossless) compression, Huffman encoding (lossless), byte pair encoding, for text files, and perceptual audio encoding (lossy) which exploits the psychoacoustic quirks of the human auditory system such as auditory masking and high frequency limits.

Computers & Logic Circuits

One abstract paradigm for understanding how a computer works is: input\(\to\)storage + processing\(\to\)output. Input is typically taken from sensors (e.g. keyboards, mouses, touchscreens, microphones, cameras), memory is handled by RAM, storage is handled by HDD/SSD, processing is done by the central processing unit (CPU) (an integrated circuit (IC)) where CPU = control unit + arithmetic logic unit (ALU) (both storage and processing use logic circuits made of many logic gates combined together), and output is a monitor, a speaker, an electric motor, an LED etc. Processing + memory are heavily interdependent, connected by a memory bus.

This excellent YouTube demonstration of implementing standard logic gates (e.g. buffers, NOT gates, AND gates, OR gates, XOR gates, NAND gates, NOR gates, etc.) using standard hardware on a solderless breadboard, notably transistors. So really, when it comes to processing data, a “computer” is an abstraction over “logic circuits” which itself is an abstraction over “logic gates” which itself is an abstraction over “bits” which itself is an abstraction over transistors and physical hardware that one actually touch and feel in the real world.

Our current computer (Microsoft Surface Studio \(2\)) has \(\sigma_{\text{RAM}}=32\text{ GB}\) and \(\sigma_{\text{SSD or C-Drive}}=1\text{ TB}\) (with Microsoft OneDrive providing an additional \(\sigma_{\text{OneDrive}}=1\text{ TB}\) of storage space).

The Internet

A computer network is topologically any strongly connected undirected graph where nodes represent computing devices (e.g. computers, phones, etc.) and edges are communication channels between a pair of computers. Common network topologies include the ring, star, mesh, bus, and tree topologies. Examples of computer networks include local area networks (LAN), wide area networks (WAN), data center networks (DCN), etc. with the Internet being a distributed packet-switched WAN. When designing the architecture of a computer network, one is interested in minimizing the distance (with respect to a suitable metric) that any piece of data \(D\) must travel to get from one computer to another.

At the level of physical hardware, data can be communicated between computers via copper category 5 (CAT5) twisted pair cables adhering to Ethernet standards. Fiber optic cables can also be used with Ethernet standards. Wi-Fi or Bluetooth communicates data via radio waves which suffer attenuation. Regardless, for all \(3\) of these line coding schemes (maps from abstract bits to a digital signal in the real world) we need to thank James Clerk Maxwell.

The informal notion of the “speed” of an internet connection (i.e. one of the communication channels mentioned earlier) between two computing devices \(X, Y\) is made precise by the bit rate \(\dot b_{(X,Y)}\) between the computing devices \(X\) and \(Y\). The bandwidth of that communication channel is then just the maximum bit rate \(\dot b^*_{(X,Y)}\) between \(X\) and \(Y\) (not to be confused with the signal processing notion of the bandwidth \(\Delta f\) of an analog signal). Another important factor is the latency \(\Delta t_{(X,Y)}\) of a given communication channel (i.e. just the delay). Running an Internet speed test for my computer with the measurement lab (M-lab) yields \(\dot b^{\text{downloads}}_{\text{(computer,M-lab)}}=650.7\frac{\text{Mb}}{\text s}\) and \(\dot b^{\text{uploads}}_{\text{(computer,M-lab)}}=732.0\frac{\text{Mb}}{\text s}\) and a latency of \(\Delta t_{\text{(computer,M-lab)}}=4\text{ ms}\)

Just like every house \(H\) has a physical address \(A_H\), in the WAN that we call the Internet, every computing device \(X\) has an Internet Protocol (IP) address \(\text{IP}_X\). When a computing device \(X\) transmits data packets \(D\) across a communication channel to another computing device \(Y\), \(X\) must specify the IP address \(\text{IP}_Y\) of \(Y\) in addition to providing its own IP address \(\text{IP}_X\) so that \(Y\) can reply to it. There are actually \(2\) common IP address protocols, IPv4 (a string of \(4\) bytes, leading to \(2^{32}\) possible IPv4 addresses) and IPv6 (a string of \(8\) hexadecimal numbers each up to \(0xFFFF\) for a total of \(2^{128}\) possible IPv6 addresses). IP addresses of computing devices may also be dynamic meaning that one’s Internet service provider changes it over time \(t\). Or, if one connects to a disjoint Wi-Fi network, then this will usually also mean a different IP address as each Wi-Fi provider (internet service provider) has a range of IP addresses it is allowed to allocate. By contrast, computing devices acting as servers (e.g. Google’s computers) often have static IP addresses (e.g. \(\text{IPv4}_{\text{Google computers}}=74.125.20.113\)) to make it easier for client computing devices to communicate quickly with.

In terms of actually interpreting what the numbers in an IP address mean, it turns out that it doesn’t have to be the case that (say in an IPv4 address) each byte corresponds to some piece of information. Rather, one can decide how one wishes to impose a hierarchy of subnetworks (subnets) on the IP address, that is, how many bits to represent a given piece of information. This is sometimes known as “octet splitting” where “octet” = “byte” in an IPv4 address.

The Domain Name System (DNS) is essentially a map \(\text{DNS}:\{\text{URLs}\}\to\{\text{IP addresses}\}\) and indeed anytime one uses a browser application (e.g. Chrome) to search for a website URL (e.g. www.youtube.com), DNS servers need to first find the IP address \(\text{DNS}\)(www.youtube.com) associated with that URL.

This entry was posted in Blog. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *