Just as the fundamental theorem of single-variable calculus \(\int_{x_1}^{x_2}f'(x)dx=f(x_2)-f(x_1)\) is the key insight on which the entire subject of single-variable calculus rests, there is an analogous sense in which one can consider a fundamental theorem of classical computing to be the key insight on which the entire field of classical computing rests. This is:
Fundamental Theorem of Classical Computing: The vector space \(\textbf N\) over the binary Galois field \(\textbf Z/2\textbf Z\) admits the countably infinite basis \(\textbf N=\text{span}_{\textbf Z/2\textbf Z}\{2^n:n\in\textbf N\}\), so that \(\text{dim}_{_{\textbf Z/2\textbf Z}}(\textbf N)=\aleph_0\). Put differently, the binary representation map \(n\in\textbf N\mapsto [n]^{(1,2,4,8,…)}\in(\textbf Z/2\textbf Z)^{\oplus\infty}\) is a bijection onto the finitely supported binary sequences; every natural number can be written uniquely as a finite sum of distinct powers of \(2\).
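As a minimal sketch of this bijection, the helper names below (`to_bits`, `from_bits`) are my own illustrative choices, not standard library functions:

```python
def to_bits(n: int) -> list[int]:
    """Coefficients of n in the basis (1, 2, 4, 8, ...), least significant first."""
    bits = []
    while n:
        bits.append(n & 1)   # coefficient of 2**len(bits)
        n >>= 1
    return bits

def from_bits(bits: list[int]) -> int:
    """Inverse map: reassemble n from its binary coefficients."""
    return sum(b << k for k, b in enumerate(bits))

assert to_bits(13) == [1, 0, 1, 1]          # 13 = 1 + 4 + 8
assert from_bits(to_bits(97)) == 97          # round trip: the map is a bijection
```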
From a physics perspective, one can loosely think of \(\textbf N\cong(\textbf Z/2\textbf Z)^{\oplus\infty}\) as the direct sum of infinitely many copies of the binary “vector space” \(\textbf Z/2\textbf Z\). By contrast, in the field of quantum computing, one instead has a vector space of the form \((\textbf C^2)^{\otimes N}\) for \(N\) qubits…and the multiplicative structure of the tensor product is much richer than the additive structure of the direct sum, hence the interest in quantum computing.
Knowing all this, it follows that any data \(X\) (e.g. an image file, an audio file, etc.) which can be “reduced to numbers” by some injection \(f:X\to\textbf N\) can in principle also be reduced to a bit string \((b_k)_k\), simply by taking each \(f(x)\in\textbf N\) for \(x\in X\) and writing out the binary representation of \(f(x)\). For instance, when \(X=\{a,b,c,…,x,y,z\}\) is the alphabet, one possible injection (or encoding) is the American Standard Code for Information Interchange \(\text{ASCII}:\{a,b,c,…,x,y,z\}\to\textbf N\), which maps for instance \(\text{ASCII}(a):=97\), \(\text{ASCII}(b):=98\), and so forth. Strictly speaking, ASCII uses \(7\) bits to represent \(2^7=128\) characters, of which \(26\) are the lowercase English letters and another \(26\) are the uppercase English letters. There was even an extended ASCII which used \(8\) bits to represent \(2^8=256\) characters, but ultimately even that was clearly insufficient for scripts like Chinese, and so nowadays Unicode (with the UTF-8 encoding, which stands for Unicode Transformation Format – \(8\)-bit, or UTF-16, UTF-32, etc.) tends to be used in lieu of ASCII.
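These encodings are easy to poke at directly; the snippet below just verifies the stated ASCII code points and illustrates that UTF-8 spends more bytes on non-ASCII characters:

```python
# ASCII: each lowercase letter maps to a number in 97..122 (7 bits suffice).
assert ord('a') == 97 and ord('b') == 98 and ord('z') == 122
assert format(ord('a'), '07b') == '1100001'   # 97 written in 7-bit binary

# Unicode code points are serialized by UTF-8 into 1-4 bytes per character.
assert 'a'.encode('utf-8') == b'\x61'         # 1 byte: coincides with ASCII
assert len('中'.encode('utf-8')) == 3          # a Chinese character needs 3 bytes
```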
When storing numbers in memory (e.g. RAM, HDD, SSD), a computer can experience overflow error (most computers have a \(64\)-bit architecture, so a single machine word can safely store unsigned integers only up to \(2^{64}-1\), or signed integers up to \(2^{63}-1\)), roundoff error (especially relevant to floating point arithmetic, where most real numbers can only be stored approximately), and precision errors.
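Both failure modes can be demonstrated in a few lines. Python's own integers are arbitrary-precision, so the mask below is used to emulate a fixed \(64\)-bit word:

```python
# Overflow: in a 64-bit unsigned word, adding 1 to 2**64 - 1 wraps around to 0.
U64_MAX = 2**64 - 1
assert (U64_MAX + 1) % 2**64 == 0   # the modulus emulates fixed-width hardware

# Roundoff: 0.1 has no finite binary expansion, so floating point arithmetic
# accumulates tiny errors.
assert 0.1 + 0.2 != 0.3
assert abs((0.1 + 0.2) - 0.3) < 1e-15   # but the error is bounded and tiny
```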
An analog-to-digital converter (ADC) is just an abstraction for any function that maps an analog signal \(V(t)\), \(I(x,y)\), etc. to a digital signal \(\bar V_i,\bar I_{ij}\), etc. More precisely, one can consider an ADC to be a pair \(\text{ADC}=(f_s,\rho)\), where \(f_s\) is the sampling rate of the ADC in samples/second and \(\rho\) is the bit depth/resolution/precision at which the ADC quantizes data, in bits/sample. Thus the bit rate \(\dot b\) of the ADC is \(\dot b=f_s\rho\), and this is in general distinct from the baud rate \(\dot{Bd}\) of serial communication with the ADC by a factor \(\lambda:=\frac{\dot b}{\dot{Bd}}\geq 1\) which describes the number of bits per baud (see this Stack Overflow Q&A for an idea of the distinction).
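A quick worked example of \(\dot b = f_s\rho\), using the standard parameters of CD-quality audio (the \(\lambda = 4\) bits/baud at the end is a made-up illustrative value, not a property of CDs):

```python
# CD-quality audio: 44,100 samples/s at 16 bits/sample, in stereo.
f_s = 44_100        # sampling rate (samples/second)
rho = 16            # bit depth (bits/sample)
channels = 2        # stereo doubles the effective bit depth per sampling instant
bit_rate = f_s * rho * channels
assert bit_rate == 1_411_200        # bits/second, i.e. ~1.4 Mb/s

# If a (hypothetical) serial link packs lam = 4 bits into each symbol (baud):
lam = 4
baud_rate = bit_rate / lam
assert baud_rate == 352_800         # symbols/second
```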
For \(V(t)\) an analog signal in the time domain which is bandlimited to some (two-sided) bandwidth \(\Delta f\), the Nyquist-Shannon sampling theorem asserts that in order to avoid aliasing distortions when sampling \(V(t)\), one has to use \(f_s>\Delta f\). Equivalently, if \(f^*=\Delta f/2\) is the largest frequency present in \(V(t)\), then the sampling frequency needs to obey \(f_s>2f^*\). For instance, humans can hear audio up to \(f^*=20\text{ kHz}\), so audio ADCs (fed by microphones) commonly sample at \(f_s=48\text{ kHz}\). Cameras are just image ADCs, where “samples” is replaced by “pixels” and so \(f_s\) might be better called a spatial sampling frequency (with units of pixels/meter rather than samples/second?). The use of the RGB color space is fundamentally based on the biology of the human eye and its \(3\) types of cone cells, and conventionally each R, G, B channel has \(256\) levels (i.e. \(1\) byte) of intensity quantization, simply because that was empirically found to be sufficient (so the total bit depth of an RGB image is \(\rho=3\) bytes/pixel, or \(24\) bits/px).
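Aliasing can be seen numerically: sample a \(3\text{ Hz}\) cosine at only \(f_s=4\text{ Hz}\) (below the Nyquist rate of \(6\text{ Hz}\)), and the samples are indistinguishable from those of a \(1\text{ Hz}\) cosine. The frequencies here are arbitrary illustrative values:

```python
import math

f = 3.0      # true signal frequency (Hz)
f_s = 4.0    # sampling rate (Hz), below the Nyquist rate 2*f = 6 Hz
f_alias = abs(f - f_s * round(f / f_s))   # predicted alias: |3 - 4| = 1 Hz
assert f_alias == 1.0

for k in range(16):
    t = k / f_s   # sampling instants
    # At these instants the 3 Hz and 1 Hz cosines agree exactly.
    assert math.isclose(math.cos(2 * math.pi * f * t),
                        math.cos(2 * math.pi * f_alias * t), abs_tol=1e-9)
```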
In practice, such data would likely be further compressed, via either lossless or lossy data compression algorithms. For instance: JPEG (lossy), PNG (lossless), and run-length encoding (lossless) for digital/bitmap/raster images; Lempel-Ziv-Welch (LZW) compression (lossless), Huffman encoding (lossless), and byte pair encoding for text files; and perceptual audio encoding (lossy), which exploits psychoacoustic quirks of the human auditory system such as auditory masking and high-frequency limits.
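Run-length encoding is the simplest of these and fits in a few lines; this is a sketch of the idea (runs of repeated symbols collapse to (symbol, count) pairs), not any particular codec's format:

```python
def rle_encode(s: str) -> list[tuple[str, int]]:
    """Collapse each run of a repeated symbol into a (symbol, count) pair."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((ch, 1))               # start a new run
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Inverse map: expand each (symbol, count) pair back into a run."""
    return ''.join(ch * n for ch, n in runs)

data = 'aaaabbbcca'
assert rle_encode(data) == [('a', 4), ('b', 3), ('c', 2), ('a', 1)]
assert rle_decode(rle_encode(data)) == data   # lossless: exact round trip
```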
Computers & Logic Circuits
One abstract paradigm for understanding how a computer works is: input\(\to\)storage + processing\(\to\)output. Input is typically taken from sensors (e.g. keyboards, mice, touchscreens, microphones, cameras); memory is handled by RAM; storage is handled by HDD/SSD; processing is done by the central processing unit (CPU) (an integrated circuit (IC)), where CPU = control unit + arithmetic logic unit (ALU) (both memory and processing use logic circuits made of many logic gates combined together); and output goes to a monitor, a speaker, an electric motor, an LED, etc. Processing and memory are heavily interdependent, connected by a memory bus.
This excellent YouTube video demonstrates implementing standard logic gates (e.g. buffers, NOT gates, AND gates, OR gates, XOR gates, NAND gates, NOR gates, etc.) using standard hardware on a solderless breadboard, notably transistors. So really, when it comes to processing data, a “computer” is an abstraction over “logic circuits”, which are an abstraction over “logic gates”, which are an abstraction over “bits”, which are an abstraction over transistors and physical hardware that one can actually touch and feel in the real world.
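One nice fact about that tower of abstractions: NAND alone is functionally complete, so every other gate listed above can be built by composing NAND gates. A sketch, with each gate modeled as a function on bits:

```python
# NAND is functionally complete: all other gates are compositions of it.
def NAND(a: int, b: int) -> int:
    return 1 - (a & b)

def NOT(a: int) -> int:          return NAND(a, a)
def AND(a: int, b: int) -> int:  return NOT(NAND(a, b))
def OR(a: int, b: int) -> int:   return NAND(NOT(a), NOT(b))
def XOR(a: int, b: int) -> int:  return AND(OR(a, b), NAND(a, b))

# Exhaustively check the truth tables against Python's bitwise operators.
for a in (0, 1):
    for b in (0, 1):
        assert AND(a, b) == (a & b)
        assert OR(a, b)  == (a | b)
        assert XOR(a, b) == (a ^ b)
```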
Our current computer (Microsoft Surface Studio \(2\)) has \(\sigma_{\text{RAM}}=32\text{ GB}\) and \(\sigma_{\text{SSD or C-Drive}}=1\text{ TB}\) (with Microsoft OneDrive providing an additional \(\sigma_{\text{OneDrive}}=1\text{ TB}\) of storage space).
The Internet
A computer network is, topologically, any connected undirected graph whose nodes represent computing devices (e.g. computers, phones, etc.) and whose edges are communication channels between pairs of devices. Common network topologies include the ring, star, mesh, bus, and tree topologies. Examples of computer networks include local area networks (LAN), wide area networks (WAN), data center networks (DCN), etc., with the Internet being a distributed packet-switched WAN. When designing the architecture of a computer network, one is interested in minimizing the distance (with respect to a suitable metric) that any piece of data \(D\) must travel to get from one computer to another.
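Taking the metric to be hop count (number of edges traversed), the distance between two devices can be computed by breadth-first search on the adjacency structure. The star topology below is a made-up illustrative network:

```python
from collections import deque

def hops(network: dict[str, list[str]], src: str, dst: str) -> int:
    """Minimum number of edges (hops) between two nodes, via breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for neighbor in network[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    raise ValueError('graph is not connected')

# Hypothetical star topology: every leaf device talks through the hub.
star = {'hub': ['A', 'B', 'C'], 'A': ['hub'], 'B': ['hub'], 'C': ['hub']}
assert hops(star, 'hub', 'B') == 1
assert hops(star, 'A', 'C') == 2    # leaf-to-leaf traffic transits the hub
```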
At the level of physical hardware, data can be communicated between computers via copper category 5 (CAT5) twisted pair cables adhering to Ethernet standards; via fiber optic cables, which can also carry Ethernet; or via Wi-Fi and Bluetooth, which communicate data over radio waves that suffer attenuation. Regardless, each of these \(3\) physical media comes with a line coding scheme (a map from abstract bits to a physical signal in the real world), and for all of them we need to thank James Clerk Maxwell.
The informal notion of the “speed” of an internet connection (i.e. one of the communication channels mentioned earlier) between two computing devices \(X, Y\) is made precise by the bit rate \(\dot b_{(X,Y)}\) between the computing devices \(X\) and \(Y\). The bandwidth of that communication channel is then just the maximum bit rate \(\dot b^*_{(X,Y)}\) between \(X\) and \(Y\) (not to be confused with the signal processing notion of the bandwidth \(\Delta f\) of an analog signal). Another important factor is the latency \(\Delta t_{(X,Y)}\) of a given communication channel (i.e. just the delay). Running an Internet speed test for my computer with the measurement lab (M-lab) yields \(\dot b^{\text{downloads}}_{\text{(computer,M-lab)}}=650.7\frac{\text{Mb}}{\text s}\) and \(\dot b^{\text{uploads}}_{\text{(computer,M-lab)}}=732.0\frac{\text{Mb}}{\text s}\) and a latency of \(\Delta t_{\text{(computer,M-lab)}}=4\text{ ms}\).

Just like every house \(H\) has a physical address \(A_H\), in the WAN that we call the Internet, every computing device \(X\) has an Internet Protocol (IP) address \(\text{IP}_X\). When a computing device \(X\) transmits data packets \(D\) across a communication channel to another computing device \(Y\), \(X\) must specify the IP address \(\text{IP}_Y\) of \(Y\) in addition to providing its own IP address \(\text{IP}_X\) so that \(Y\) can reply to it. There are actually \(2\) common IP address protocols: IPv4 (a string of \(4\) bytes, leading to \(2^{32}\) possible IPv4 addresses) and IPv6 (a string of \(8\) groups of \(16\) bits, each group written in hexadecimal up to \(\text{0xFFFF}\), for a total of \(2^{128}\) possible IPv6 addresses). An IP address may also be dynamic, meaning that one’s Internet service provider changes it over time \(t\). Or, if one connects to a different Wi-Fi network, then this will usually also mean a different IP address, as each Internet service provider has a range of IP addresses it is allowed to allocate. By contrast, computing devices acting as servers (e.g. Google’s computers) often have static IP addresses (e.g. \(\text{IPv4}_{\text{Google computers}}=74.125.20.113\)) to make it easier for client computing devices to communicate with them quickly.
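The “\(4\) bytes = \(32\) bits” and “\(8\) groups of \(16\) bits = \(128\) bits” claims can be checked with the standard library's `ipaddress` module, using the Google address quoted above:

```python
import ipaddress

# An IPv4 address is just 4 bytes, i.e. a 32-bit unsigned integer.
ip = ipaddress.IPv4Address('74.125.20.113')
assert int(ip) == 74 * 2**24 + 125 * 2**16 + 20 * 2**8 + 113
assert ipaddress.IPv4Address(int(ip)) == ip   # bytes <-> integer is a bijection
assert int(ip) < 2**32

# An IPv6 address is 8 groups of 16 bits, i.e. a 128-bit unsigned integer.
# (2001:db8::/32 is the range reserved for documentation examples.)
ip6 = ipaddress.IPv6Address('2001:db8::1')
assert int(ip6) < 2**128
```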
In terms of actually interpreting what the numbers in an IP address mean, it turns out that it doesn’t have to be the case that (say, in an IPv4 address) each byte corresponds to some piece of information. Rather, one can decide how one wishes to impose a hierarchy of subnetworks (subnets) on the address space, that is, how many leading bits to use to name the network versus the remaining bits naming hosts within it. This is sometimes known as “octet splitting”, where “octet” = “byte” in an IPv4 address.
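The `ipaddress` module makes this network/host split concrete; the `/24` and `/20` prefixes below are illustrative choices (the latter deliberately not on an octet boundary), and `192.168.1.0`, `10.0.0.0` are standard private address ranges:

```python
import ipaddress

# A /24 prefix: the first 24 bits name the subnet, the last 8 name the host.
net = ipaddress.IPv4Network('192.168.1.0/24')
assert net.num_addresses == 2**(32 - 24)   # 256 host addresses
assert ipaddress.IPv4Address('192.168.1.42') in net
assert ipaddress.IPv4Address('192.168.2.42') not in net   # different subnet

# The split need not fall on a byte boundary: a /20 leaves 12 host bits.
net20 = ipaddress.IPv4Network('10.0.0.0/20')
assert net20.num_addresses == 2**12
```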
The Domain Name System (DNS) is essentially a map \(\text{DNS}:\{\text{domain names}\}\to\{\text{IP addresses}\}\), and indeed any time one uses a browser application (e.g. Chrome) to navigate to a website URL (e.g. www.youtube.com), DNS servers need to first find the IP address \(\text{DNS}\)(www.youtube.com) associated with that URL’s domain name.
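As a toy model, one can caricature DNS as a plain lookup table (real DNS is a distributed, cached, hierarchical database; the single entry below is a hypothetical pairing for illustration, reusing the Google address quoted earlier):

```python
# Toy DNS: a map {domain names} -> {IP addresses} as a Python dict.
dns: dict[str, str] = {
    'www.youtube.com': '74.125.20.113',   # hypothetical entry for illustration
}

def resolve(domain: str) -> str:
    """Look up the IP address for a domain name, mimicking DNS(domain)."""
    return dns[domain]

assert resolve('www.youtube.com') == '74.125.20.113'
```

In practice, the standard library performs the real lookup: `socket.gethostbyname('www.youtube.com')` queries the system's configured DNS resolver and returns an actual (and location-dependent) IP address string.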