Measure theory and $L^p$ Spaces

I'm on a quest to learn about operator algebras in the hopes of understanding the many interesting ways they have been applied to quantum field theory. This note is the first in a series that will build the essential aspects of the theory from the ground up. I will not prove everything, giving references for the proofs of many lemmas, but I will try to give enough detail that the mathematical underpinnings of the theory are clear.

This post is about $L^p$ spaces. These are Banach spaces of functions — with the $L^{\infty}$ space actually being a Banach algebra — that show up frequently in the theory of general operator algebras. I will introduce the basics of measure theory and the general theory of Lebesgue integration, introduce the $L^p$ spaces, and prove some basic theorems concerning them.

The outline is:

  1. In section 1, I will introduce the basic tools of measure theory: $\sigma$-algebras, measurable functions, and measures.
  2. In section 2, I will discuss some elementary properties of real- and complex-valued measurable functions.
  3. In section 3, I will define the Lebesgue integral on a general measure space and state some important theorems such as the dominated convergence theorem.
  4. In section 4, I will make some comments on the role of sets of measure zero, and how to think about them.
  5. In section 5, I will introduce the $L^p$ spaces and prove that they are Banach spaces. I will also show that $L^2$ is a Hilbert space.
I learned most of this material from Walter Rudin's textbook "Real and Complex Analysis," with a few points being picked up later by reading Ronald Douglas's textbook "Banach Algebra Techniques in Operator Theory."

Prerequisites: Basics of topology and real analysis.

Table of Contents

  1. $\sigma$-algebras, measurability and measures
  2. Real and complex measurable functions
  3. Integration
  4. Sets of measure zero
  5. $L^p$ spaces

1. $\sigma$-algebras, measurability, and measures

Given a set $X$, a measure is a way of assigning volume to certain subsets of $X$. It can be confusing, on first approaching measure theory, to understand why only some subsets of $X$ should be considered measurable. But when we think about how crazy sets can be — completely arbitrary collections of points in $X$ — it shouldn't be so surprising that a consistent theory of measure might only allow us to measure a restricted family of subsets.

So let $\Sigma$ be some collection of subsets of $X$ that we will declare to be "measurable." It will turn out that not just any collection $\Sigma$ can reasonably be called measurable; we will discover by investigation what properties $\Sigma$ ought to have. We begin by considering a function $\mu : \Sigma \to [0, \infty],$ which we will call a measure, and which we will think of as assigning volumes to elements of $\Sigma$.

There are two obvious properties that $\mu$ should have: (i) it should assign measure zero to the empty set, $\mu(\varnothing) = 0,$ and (ii) if $E_1, E_2 \in \Sigma$ are disjoint, then we should have $\mu(E_1 \cup E_2) = \mu(E_1) + \mu(E_2).$ In fact, since we allow the measure of a set to be infinite, we might as well extend property (ii) to sequences of disjoint sets: if $\{E_n\}$ is a sequence in $\Sigma$ of pairwise-disjoint sets, then we should have $\mu(\cup_j E_j) = \sum_j \mu(E_j).$

In order for the preceding paragraph to make sense, we need $\varnothing$ to be contained in $\Sigma$, and we need $\Sigma$ to be closed under countable unions. Another condition that makes sense to impose is that $\mu$ should be able to assign a volume to the full set $X$, whether that volume be finite or infinite; so we should require $X \in \Sigma$. Finally, if we are able to measure the sum of the volume of disjoint sets — $\mu(E_1) + \mu(E_2) = \mu(E_1 \cup E_2)$ — then we really ought to be able to measure the difference in volume of nested sets. That is, if $E_1 \subseteq E_2$ and $E_1, E_2 \in \Sigma$, then we ought to have $\mu(E_2 - E_1) = \mu(E_2) - \mu(E_1).$ So we should impose $E_2 - E_1 \in \Sigma.$

To meet all the criteria of the preceding paragraph, it is sufficient to require that $\Sigma$ (i) contains the empty set, (ii) is closed under complements, and (iii) is closed under countable unions. Any collection of subsets of $X$ satisfying these three properties is called a $\sigma$-algebra on $X$. Once a $\sigma$-algebra has been specified, $X$ is said to be a measurable space and the sets in the $\sigma$-algebra are said to be measurable sets. Standard manipulations in set theory show that conditions (i-iii) imply that $\Sigma$ contains $X = \varnothing^c,$ is closed under countable intersections $\cap_j E_j = (\cup_j E_j^c)^c$, and is closed under set subtractions: $A, B \in \Sigma \Rightarrow A - B = A \cap B^c \in \Sigma.$
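
On a finite set, countable unions reduce to finite unions, so the closure conditions can be checked by brute force. The following is a minimal sketch under that assumption; the function name `generated_sigma_algebra` and the example set are my own choices for illustration. It builds the smallest $\sigma$-algebra containing a given collection by repeatedly adding complements and pairwise unions until nothing new appears.

```python
from itertools import combinations

def generated_sigma_algebra(X, generators):
    """Smallest sigma-algebra on the finite set X that contains `generators`."""
    X = frozenset(X)
    sigma = {frozenset(), X} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        # Close under complements.
        for A in current:
            if X - A not in sigma:
                sigma.add(X - A)
                changed = True
        # Close under pairwise (hence all finite) unions.
        for A, B in combinations(current, 2):
            if A | B not in sigma:
                sigma.add(A | B)
                changed = True
    return sigma

X = {1, 2, 3, 4}
sigma = generated_sigma_algebra(X, [{1}, {2}])

# Closure under intersections and set subtractions follows automatically.
closed = all((A & B) in sigma and (A - B) in sigma for A in sigma for B in sigma)
print(len(sigma), closed)
```

Here the generated $\sigma$-algebra cannot tell the points $3$ and $4$ apart: the set $\{3, 4\}$ is measurable, but $\{3\}$ is not.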

One can easily show that the intersection of $\sigma$-algebras is a $\sigma$-algebra, which lets us define the $\sigma$-algebra "generated by" any collection of subsets of $X$ as the intersection of all $\sigma$-algebras containing that collection. When $X$ is a topological space, it is often convenient to require that all open sets be measurable; the Borel algebra on $X$ is the $\sigma$-algebra generated by open subsets of $X$.

A map from a measurable space $X$ into a topological space $Y$ is said to be a measurable function if preimages of open sets in $Y$ are measurable in $X$; this mimics the definition of a continuous function on a topological space, for which preimages of open sets are open. Note that if $X$ is a topological space, then all continuous maps from $X$ to $Y$ are measurable with respect to the Borel algebra.

We are now ready to define a measure properly. Given a measurable space $X$ with $\sigma$-algebra $\Sigma,$ a measure is a map $\mu : \Sigma \to [0, \infty]$ satisfying $\mu(\varnothing) = 0$ and $\mu(\cup_j E_j) = \sum_j \mu(E_j)$ for pairwise-disjoint measurable sets $E_j.$ From this definition, many interesting properties can be proved; the most important of these, for the moment, is that measures are monotonic: for $A \subseteq B$ both measurable, we have $\mu(B) = \mu(A) + \mu(B - A) \geq \mu(A).$
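
As a concrete illustration, here is a sketch of a weighted counting measure on a four-point set, where every subset is declared measurable. The weights and the name `mu` are hypothetical, chosen only for this example; exact fractions are used so that additivity can be checked without rounding error.

```python
from fractions import Fraction

# Hypothetical point weights; mu(E) is the total weight of the points in E.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(0), "d": Fraction(2)}

def mu(E):
    """Measure of a subset E of the four-point set."""
    return sum((weights[x] for x in E), Fraction(0))

A = {"a", "b"}
B = {"a", "b", "d"}

# Additivity on the disjoint pieces A and B - A, and the resulting monotonicity.
additive = mu(B) == mu(A) + mu(B - A)
monotone = mu(A) <= mu(B)
print(mu(set()), mu(A), mu(B), additive, monotone)
```

Note that the point `"c"` has weight zero, so a nonempty set can have measure zero.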

A measurable space on which a measure has been defined is called a measure space.

2. Real and complex measurable functions

Let $X$ be a measure space with $\sigma$-algebra $\Sigma$ and measure $\mu.$ We will consider maps from $X$ into $\mathbb{R}$ or $\mathbb{C}.$ I won't prove the basic properties of these maps, but I will state them below. (The proofs aren't very instructive, but if you're curious they can be found in chapter 1 of Rudin's book.) They are:
  • If $u : X \to \mathbb{R}$ and $v : X \to \mathbb{R}$ are measurable, then $f(x) = u(x) + i v(x)$ is measurable.
  • If $f : X \to \mathbb{C}$ is measurable, then its real part, its imaginary part, and its magnitude are all measurable.
  • If $f $ and $g$ are measurable functions from $X$ to $\mathbb{R}$ or $\mathbb{C}$, then $fg$ and $f+g$ are measurable.
  • The pointwise supremum and infimum (and hence limit-superior and limit-inferior) of any sequence of measurable functions are measurable.
This last point implies that for any measurable function $f : X \to \mathbb{R},$ the positive and negative parts $f^+ = \max\{f, 0\}$ and $f^{-} = - \min\{f, 0\}$ are measurable. The decomposition of $f$ into positive and negative parts will play an important part in defining integration in the next section.

For any set $E$ in $X$, we define the characteristic function $\chi_E$ to be the function from $X$ to $\mathbb{R}$ that takes the value $1$ on $E$ and $0$ on $E^c.$ Clearly $E$ is a measurable set if and only if $\chi_E$ is a measurable function.

3. Integration

Given a measure space $X$, a simple function is a measurable function whose range has only finitely many values. Any such function can be written as a finite linear combination of characteristic functions of disjoint measurable sets $E_j$:
$$s(x) = \sum_{j=1}^{n} \alpha_j \chi_{E_j}(x).$$
The utility of simple functions is that they can be used to approximate any measurable function $f$ "from below" in the sense that we can find a sequence $|s_1| \leq |s_2| \leq \dots \leq |f|$ for which the sequence $s_n$ converges pointwise to $f.$ We will prove this now for $f : X \to [0, \infty],$ but the same conclusion holds for general $f : X \to \mathbb{C}$ by approximating the real-positive, real-negative, imaginary-positive, and imaginary-negative parts of $f$ separately.

For each $n,$ we will divide the positive real axis $[0, \infty]$ into a finite number of layers; some collection of "small" layers that cover all of the real axis up to $n$, and then one big layer that covers the remaining portion of the real axis from $n$ all the way to infinity. We want the "small" layers to decrease in width with increasing $n.$ To accomplish this, we will divide the interval $[0, n]$ into $n 2^n$ segments each of which has length $2^{-n}.$ For any $x \in X,$ we then approximate $f(x)$ by the bottom of the layer it lies in, defining the function $s_n$ by
$$s_n(x) = \max_{k=0, \dots, n 2^n} \{ k 2^{-n} | k 2^{-n} \leq f(x)\}.$$
The function $s_n$ takes on finitely many values, each of which can be expressed in terms of sets of the form $f^{-1}([\alpha, \infty]).$ These sets are measurable, so we conclude that $s_n$ is a simple function. It is straightforward to check the properties $s_1 \leq \dots \leq f$ and $s_n \to f.$
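
The layer construction is easy to evaluate pointwise. In the sketch below, `s_n(value, n)` is a hypothetical helper that takes the number $f(x)$ and returns $s_n(x)$; it uses the fact that the maximum in the definition is attained at $k = \min(\lfloor 2^n f(x) \rfloor, n 2^n)$.

```python
import math

def s_n(value, n):
    """Largest k * 2**-n with k in {0, ..., n * 2**n} and k * 2**-n <= value."""
    if value == math.inf:
        return float(n)  # everything above n falls in the single big layer
    k = min(math.floor(value * 2**n), n * 2**n)
    return k / 2**n

# Approximate the single value f(x) = pi for n = 1, ..., 10.
approximations = [s_n(math.pi, n) for n in range(1, 11)]
print(approximations)
```

For small $n$ the value $\pi$ sits in the big layer, so $s_n$ just returns $n$; once $n \geq 4$ the approximation is within $2^{-n}$ of $\pi$.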

It is easy to define what we mean by the integral of a simple function with respect to the measure $\mu$: it ought to just be
$$\int_{X} d\mu\, (\sum_j \alpha_j \chi_{E_j}) = \sum_{j} \alpha_j \mu(E_j).$$
Because simple functions can be used to approximate any measurable function from below, it makes sense to define the integral of a general measurable function $f : X \to [0, \infty]$ by approximation:
$$\int_{X} d\mu\, f = \sup \{\int_{X} d\mu\, s | s \leq f, s \text{ simple}\}.$$
We define integration over a measurable subset $A$ by
$$\int_{A} d\mu\, f = \int_{X} d\mu\, \chi_A f.$$
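
To see these definitions in action, here is a sketch on a four-point measure space with hypothetical weights. A simple function is stored as a list of pairs $(\alpha_j, E_j)$ with the $E_j$ pairwise disjoint, and integration over a subset $A$ is implemented by multiplying with $\chi_A$, which amounts to intersecting each $E_j$ with $A$. The names `integrate_simple` and `restrict` are my own.

```python
from fractions import Fraction

# Hypothetical point weights defining the measure.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(0), "d": Fraction(2)}

def mu(E):
    return sum((weights[x] for x in E), Fraction(0))

def integrate_simple(terms):
    """Integral of sum_j alpha_j * chi_{E_j}, given as a list of (alpha_j, E_j)."""
    return sum((alpha * mu(E) for alpha, E in terms), Fraction(0))

def restrict(terms, A):
    """The simple function chi_A * s, obtained by replacing each E_j with E_j & A."""
    return [(alpha, E & A) for alpha, E in terms]

s = [(Fraction(3), {"a", "b"}), (Fraction(5), {"c"}), (Fraction(1), {"d"})]

integral_over_X = integrate_simple(s)                      # 3 * 5/6 + 5 * 0 + 1 * 2
integral_over_A = integrate_simple(restrict(s, {"a", "d"}))  # 3 * 1/2 + 1 * 2
print(integral_over_X, integral_over_A)
```

The value $5$ that $s$ takes on the point `"c"` contributes nothing, because that point has measure zero.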

It is straightforward to check from this definition that the integral of a measurable function has nice, standard properties:
  1. $0 \leq f \leq g$ implies $\int_{X} d\mu\, f \leq \int_{X} d\mu\, g.$
  2. $A \subseteq B$ and $f \geq 0$ implies $\int_{A} d\mu\, f \leq \int_{B} d\mu\, f.$
  3. $c \in [0, \infty)$ and $f \geq 0$ implies $\int_X d\mu\, c f = c \int_X d\mu\, f.$
  4. $\mu(A) = 0$ or $f|_A = 0$ each imply $\int_{A} d\mu\, f = 0.$
A few important properties of integration are slightly less trivial to show; again, proofs can be found in chapter 1 of Rudin's book. They are:
  1. Integration is linear: $f, g \geq 0$ implies $\int_{X} d\mu\, (f + g) = \int_{X} d\mu\, f + \int_{X} d\mu\, g.$
  2. If $f \geq 0$ is a measurable function on $X$, then the map $E \mapsto \int_{E} d\mu\, f$ is a measure on $X$.
So far we have only defined integrals of positive measurable functions. It is easier to do this than in the general case, for the same reason that sums of positive numbers are always insensitive to rearrangement, while general sums can converge to different values if they are rearranged. A general sum can be rearranged only if it converges absolutely; this inspires us to give a notion of what it means for a function to be absolutely integrable.

If $f : X \to \mathbb{C}$ is measurable, then we say $f$ is integrable if $\int_{X} d\mu\, |f|$ is finite. The space of all such functions is denoted $\mathcal{L}^1(\mu).$ The real and imaginary parts of an integrable function are also integrable, as are the positive and negative parts of those. This lets us define the integral of a general function in $\mathcal{L}^1(\mu)$:
$$\int_{X} d\mu\, f = \int_{X} d\mu\, \operatorname{Re}(f)^+ - \int_{X} d\mu\, \operatorname{Re}(f)^- + i \int_{X} d\mu\, \operatorname{Im}(f)^+ - i \int_{X} d\mu\, \operatorname{Im}(f)^-.$$
This definition of complex integration is easily checked to be complex-linear using real-linearity of the positive-real case.

Complex integration also satisfies a version of the triangle inequality. If $f$ is in $\mathcal{L}^1(\mu),$ then we can write its integral in polar form as $\int_{X} d\mu\, f = r e^{i \theta}.$ We then have
$$\left| \int_{X} d\mu\, f \right| = \left| \int_{X} d\mu\, f e^{- i \theta} \right| = \int_{X} d\mu\, \operatorname{Re}(f e^{- i \theta}) \leq \int_{X} d\mu\, |f|.$$
In the second step we have used that we know the integral is real, so we can replace the integrand by its real part. In the third step, we have used that the real part of a number is upper bounded by its norm.

A final important tool in the study of integration over measures is the dominated convergence theorem. I won't prove it; the proof isn't hard, but it requires a few lemmas that I don't think are particularly instructive on their own. The statement is that if $f_n$ is a sequence of complex, measurable functions on $X$ that converge pointwise to a function $f,$ and that satisfy $|f_n| \leq g$ for some $g \in \mathcal{L}^1(\mu)$, then $f_n$ and $f$ are also in $\mathcal{L}^1(\mu)$ and the integral of the sequence converges:
$$\int_{X} d\mu\, f_n \to \int_{X} d\mu\, f.$$
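
A standard example shows why the dominating function $g$ is needed. It assumes Lebesgue measure on $X = [0, 1]$, which has not been constructed in this post. Take
$$f_n = n \chi_{(0, 1/n)}.$$
For every $x \in [0, 1]$ we have $f_n(x) = 0$ as soon as $1/n \leq x$, so $f_n \to 0$ pointwise, but
$$\int_{X} d\mu\, f_n = n \mu((0, 1/n)) = 1 \not\to 0 = \int_{X} d\mu\, 0.$$
This does not contradict the theorem: any $g$ with $|f_n| \leq g$ for all $n$ must satisfy $g \geq n$ on $[1/(n+1), 1/n)$, and summing $n \left( \frac{1}{n} - \frac{1}{n+1} \right) = \frac{1}{n+1}$ over $n$ shows that such a $g$ is not integrable.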

4. Sets of measure zero

One of the really interesting things about measure spaces is that in integration, sets of measure zero just don't matter at all. If $E$ is a set of measure zero, and $f$ and $g$ are measurable functions on $X$ that agree away from $E$, then we have
$$\int_{X} d\mu\, f = \int_{E} d\mu\, f + \int_{X - E} d\mu\, f = 0 + \int_{X - E} d\mu\, f = \int_{E} d\mu\, g + \int_{X- E} d\mu\, g = \int_{X} d\mu\, g.$$
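A partial converse is true as well. Equality of the integrals over $X$ alone is not enough (on a two-point space with counting measure, the function taking the values $1$ and $-1$ has the same integral as the zero function), but if integrable functions $f$ and $g$ satisfy $\int_{A} d\mu\, f = \int_{A} d\mu\, g$ for every measurable set $A$, then $f$ and $g$ must agree "almost everywhere," i.e., there must be a set of measure zero $E$ away from which they agree. By looking at real and imaginary parts, and by restricting to the measurable sets on which $f - g$ is nonnegative or nonpositive, it suffices to show this when $f$ and $g$ are real with $f - g \geq 0$ and $\int_{X} d\mu\, (f - g) = 0.$ In this case, we define the measurable sets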
$$E_n = \{x | f(x) > g(x) + 1/n\}.$$
We have
$$\frac{\mu(E_n)}{n} \leq \int_{E_n} d\mu\, (f - g) \leq \int_{X} d\mu\, (f - g) = 0.$$
This implies that each $E_n$ has measure zero, so their union, which is the set $\{x | f(x) \neq g(x)\},$ also has measure zero.

These observations inspire us to define equivalence classes on $\mathcal{L}^1(\mu),$ where we say two integrable functions $f$ and $g$ are equivalent if they differ by a function that vanishes almost everywhere. The space of equivalence classes is called $L^1(\mu)$; a class $[f] \in L^1(\mu)$ has a well defined integral, and by abuse of notation elements of $L^1(\mu)$ are treated as functions, rather than equivalence classes, since for most purposes the behavior of a function on a set of measure zero is completely irrelevant.

5. $L^p$ spaces

A generalization of $L^1(\mu)$ now presents itself to us. For $f : X \to \mathbb{C}$ measurable and $p \geq 1$, we define the p-norm of $f$ by
$$\lVert f \rVert_p = \left( \int_{X} d\mu\, |f|^p \right)^{1/p}.$$
The term "p-norm" is slightly inaccurate, as $\lVert \cdot \rVert_p$ isn't actually a norm; it isn't positive definite, since it assigns zero to all functions that vanish almost everywhere. We define $\mathcal{L}^p(\mu)$ to be the set of functions on which the p-norm is finite, and define $L^p(\mu)$ to be the quotient of $\mathcal{L}^p(\mu)$ by the space of functions that vanish almost everywhere. We will see that the p-norm is an actual norm on $L^p(\mu).$ To show this, it suffices to show that the p-norm is a seminorm on $\mathcal{L}^p(\mu),$ i.e., that it satisfies all the properties of a norm except for positive definiteness.

Before proceeding to show that the p-norm is a norm on $L^p(\mu)$, we will define a p-norm in the limit $p \to \infty.$ We define the essential supremum of a measurable function $f : X \to [0, \infty]$ to be the smallest number $a$ such that the set $\{x | f(x) > a\}$ has measure zero. Formally, we write
$$\operatorname{ess\,sup}(f) = \inf \{a\in \mathbb{R} | \mu(f^{-1}((a, \infty])) = 0\}.$$
The $\infty$-norm of a complex, measurable function is the essential supremum of its absolute value. We define $\mathcal{L}^\infty(\mu)$ to be the space of measurable functions with finite $\infty$-norm, and $L^\infty(\mu)$ to be the quotient by the set of functions that vanish almost everywhere.
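
On a finite measure space the essential supremum can be computed directly: the set $\{x | h(x) > a\}$ has measure zero exactly when every point where $h$ exceeds $a$ has weight zero, so the infimum is the largest value of $h$ on points of positive weight. The sketch below uses hypothetical weights and function values, and assumes at least one point has positive weight.

```python
# Hypothetical finite measure space: the point "c" has measure zero.
weights = {"a": 1.0, "b": 0.5, "c": 0.0}
f = {"a": 2.0, "b": -3.0, "c": 100.0}

def ess_sup(h, weights):
    """Essential supremum of h; assumes some point has positive weight."""
    return max(h[x] for x in weights if weights[x] > 0)

def inf_norm(f, weights):
    """Infinity-norm: essential supremum of |f|."""
    return ess_sup({x: abs(v) for x, v in f.items()}, weights)

plain_sup = max(abs(v) for v in f.values())
print(inf_norm(f, weights), plain_sup)
```

The ordinary supremum of $|f|$ sees the large value at `"c"`, while the $\infty$-norm ignores it.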

Now, we will observe that all of the p-norms, including the $\infty$-norm, satisfy the triangle inequality $\lVert f + g \rVert_p \leq \lVert f \rVert_p + \lVert g \rVert_p$ and the absolute homogeneity condition $\lVert \alpha f \rVert_p = |\alpha| \lVert f \rVert_p.$ Let's start with the absolute homogeneity condition; for $p < \infty,$ this follows trivially from the definition due to linearity of integration. For $p = \infty$ and $\alpha = 0,$ the conclusion is obvious. For $p=\infty$ and $\alpha \neq 0,$ we have
$$\{x | |\alpha f(x)| > a\} = \{x | |f(x)| > a / |\alpha|\},$$
and thus
$$\lVert \alpha f \rVert_{\infty} = \inf \{a | \mu(\{ x | |f(x)| > a / |\alpha|\}) = 0\} = |\alpha| \inf\{a | \mu(\{ x | |f(x)| > a\}) = 0\} = |\alpha| \lVert f \rVert_{\infty}.$$

The triangle inequality is a little harder. The proof, while not difficult, uses some lemmas about convex functions that are beyond the scope of the present post. I'll omit it here; the proof can be found on the Wikipedia page for the Minkowski inequality. Another important inequality is Hölder's inequality, which says that if $p$ and $q$ are numbers greater than or equal to one satisfying $1/p + 1/q = 1$ (and we say that $p=1$ and $q=\infty$ satisfy this relationship), then for $f \in \mathcal{L}^p(\mu)$ and $g \in \mathcal{L}^q(\mu)$ we have
$$\lVert f g \rVert_1 \leq \lVert f \rVert_p \lVert g \rVert_q.$$
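
Both inequalities can be spot-checked numerically. This is only a sanity check on one example, not a proof: the sketch below uses a hypothetical four-point measure space, where the integral is a weighted sum, and the conjugate exponents $p = 3,$ $q = 3/2.$

```python
def p_norm(f, weights, p):
    """p-norm on a finite measure space: (sum_x w(x) |f(x)|^p)^(1/p)."""
    return sum(w * abs(v) ** p for v, w in zip(f, weights)) ** (1.0 / p)

# Hypothetical weights and complex-valued functions on four points.
weights = [0.5, 1.0, 2.0, 0.25]
f = [1 + 2j, -3.0, 0.5j, 4.0]
g = [2.0, 1 - 1j, -1.5, 0.0]
p, q = 3.0, 1.5  # 1/p + 1/q = 1

fg = [a * b for a, b in zip(f, g)]
f_plus_g = [a + b for a, b in zip(f, g)]

holder_lhs = p_norm(fg, weights, 1)
holder_rhs = p_norm(f, weights, p) * p_norm(g, weights, q)

minkowski_lhs = p_norm(f_plus_g, weights, p)
minkowski_rhs = p_norm(f, weights, p) + p_norm(g, weights, p)

print(holder_lhs <= holder_rhs, minkowski_lhs <= minkowski_rhs)
```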

One important thing to know about $L^p$ spaces is that they are Banach spaces; that is, $L^p(\mu)$ is complete in the p-norm. To see this for finite $p,$ let $f_n \in \mathcal{L}^p(\mu)$ be a Cauchy sequence. Using standard properties of Cauchy sequences, we can define a subsequence $f_{n_j}$ satisfying
$$\lVert f_{n_{j+1}} - f_{n_j} \rVert_p \leq \frac{1}{2^j}.$$
Using the triangle inequality for the $p$-norm, we have
$$\lVert \sum_{j=1}^{k} |f_{n_{j+1}} - f_{n_j}| \rVert_p \leq \sum_{j=1}^{k} \lVert f_{n_{j+1}} - f_{n_j} \rVert_p.$$
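The right-hand side is bounded by $\sum_{j} 2^{-j} = 1$ for every $k.$ The partial sums on the left increase pointwise to $g = \sum_{j=1}^{\infty} |f_{n_{j+1}} - f_{n_j}|,$ so $\lVert g \rVert_p \leq 1$ (this uses the monotone convergence theorem, which lets us exchange an increasing pointwise limit of nonnegative functions with the integral), and hence $g \in \mathcal{L}^p(\mu).$ For this to be true, $g$ must be finite almost everywhere, which means the series $\sum_j (f_{n_{j+1}} - f_{n_j})$ converges absolutely almost everywhere. As such, the limit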
$$f = \lim_{k} f_{n_{k+1}} = f_{n_1} + \sum_{j=1}^{\infty} (f_{n_{j+1}} - f_{n_j})$$
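exists almost everywhere, and we set $f = 0$ on the set of measure zero where it does not. Since $|f - f_{n_k}| \leq g$ almost everywhere and $g^p$ is integrable, the dominated convergence theorem gives $\lVert f - f_{n_k} \rVert_p \to 0$; in particular, $f \in \mathcal{L}^p(\mu).$ So the Cauchy sequence $\{f_n\}$ has a subsequence that converges in $L^p(\mu),$ and a Cauchy sequence with a convergent subsequence converges to the same limit. This shows that $L^p(\mu)$ is complete.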

In the case that $p$ is infinite, let $f_n \in \mathcal{L}^{\infty}(\mu)$ be a Cauchy sequence, and let $E$ be the complement of the union of all sets of the form
$$\{x | |f_n(x) - f_m(x)| > \lVert f_n- f_m \rVert_{\infty} \}.$$
There are countably many such sets, and they all have measure zero, so $E^c$ has measure zero. On $E$, we have
$$|f_n(x) - f_m(x)| \leq \lVert f_n - f_m \rVert_{\infty},$$
so $f_n$ is uniformly Cauchy on $E$ and therefore converges uniformly on $E$ to some function $f,$ which we extend to all of $X$ by setting $f = 0$ on $E^c.$ This means that for any $\epsilon > 0,$ there exists some integer $N$ for which $n \geq N$ implies $|f_n - f| < \epsilon$ on $E$. So the set
$$\{x | |f_n(x) - f(x)| > \epsilon \}$$
is contained in $E^c$ for all $n \geq N.$ This set must have measure zero (since it lies within a set of measure zero), which implies
$$\lVert f_n - f \rVert_{\infty} \leq \epsilon.$$
Taking limits implies that $f_n$ converges to $f$ in $L^{\infty}(\mu).$

The last thing we will show is that $L^2(\mu)$ is not just a Banach space, but a Hilbert space. For $f, g \in L^2(\mu),$ we define the inner product
$$\langle f | g \rangle = \int_{X} d\mu\, \bar{f} g.$$
We are implicitly treating $f$ and $g$ as functions, rather than equivalence classes of functions, but this is fine; the integral in the above equation does not change if either $f$ or $g$ is changed on a set of measure zero. The only important thing is to check that the integral exists in the first place; but this follows from Hölder's inequality, which gives $\lVert \bar{f} g \rVert_1 \leq \lVert \bar{f} \rVert_2 \lVert g \rVert_2 < \infty.$ This inner product clearly induces the $2$-norm, is linear in $g,$ and is positive definite. The only thing we need to check is that it satisfies $\langle f | g \rangle = \overline{\langle g | f \rangle},$ but this follows readily from considering real and imaginary parts of integrals.
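
The properties of the inner product can be checked on a small example. The sketch below again assumes a hypothetical finite measure space, so that the integral becomes a weighted sum; it verifies conjugate symmetry, that $\langle f | f \rangle$ equals the square of the $2$-norm, and the bound $|\langle f | g \rangle| \leq \lVert f \rVert_2 \lVert g \rVert_2$ that follows from Hölder's inequality with $p = q = 2.$

```python
import math

# Hypothetical weights and complex-valued functions on three points.
weights = [0.5, 1.0, 2.0]
f = [1 + 1j, 2.0, -1j]
g = [3.0, 1j, 1 - 2j]

def inner(f, g, weights):
    """<f|g> = sum_x w(x) * conj(f(x)) * g(x); antilinear in f, linear in g."""
    return sum(w * a.conjugate() * b for a, b, w in zip(f, g, weights))

def two_norm(f, weights):
    return math.sqrt(sum(w * abs(a) ** 2 for a, w in zip(f, weights)))

conjugate_symmetric = abs(inner(f, g, weights) - inner(g, f, weights).conjugate()) < 1e-12
induces_norm = abs(inner(f, f, weights) - two_norm(f, weights) ** 2) < 1e-12
bounded = abs(inner(f, g, weights)) <= two_norm(f, weights) * two_norm(g, weights) + 1e-12
print(conjugate_symmetric, induces_norm, bounded)
```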

