
Hamiltonian simulation via the Trotter-Suzuki decomposition

This academic term, some colleagues at Stanford and I are running a journal club on Hamiltonian simulation — the problem of how to use a quantum computer to simulate the time evolution of a physical system. Hamiltonian simulation is a hot topic in research, in part because it's believed that simulating certain systems on quantum computers will allow us to probe aspects of those systems that we don't know how to access with traditional laboratory experiments.

The earliest approach to this problem, and one that is still practically useful for certain applications, makes use of the Trotter decomposition and its generalization the Trotter-Suzuki decomposition. These are algorithms for decomposing a time evolution operator that acts simultaneously on the entire quantum system, into a sequence of time evolution operators that act locally on only a few physical sites at a time. Specifically, given a time-independent Hamiltonian $H = \sum_j h_j,$ we would like to find a way to approximate the time evolution operator $e^{- i H t}$ by some product
$$\prod_{\mu = 1}^{M} e^{- i t_\mu h_{c_\mu}}. \tag{1}$$
That is, we should be able to make some list $\{h_{c_\mu}\}_{\mu=1}^{M}$ of local terms appearing in the Hamiltonian, evolve with each one for a time $t_\mu,$ and in doing so approximate the full time evolution $e^{- i H t}.$

By approximate, we mean that the two operators should be close in the operator norm. The operator norm $\lVert A \rVert$ is given by the largest singular value of $A$, which can be realized as
$$\lVert {A} \rVert^2 = \max_{|\psi\rangle \neq 0} \frac{\langle \psi | A^{\dagger} A | \psi \rangle}{\langle \psi | \psi \rangle}.$$
We say that a Trotter-Suzuki decomposition of the form (1) approximates $e^{- i H t}$ with error at most $\epsilon$ if we can prove
$$\left\lVert \prod_{\mu=1}^{M} e^{- i t_\mu h_{c_\mu}} - e^{- i H t} \right\rVert \leq \epsilon.$$
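For small systems, this error is easy to compute numerically with dense matrices. Here is a minimal sketch (my own illustration, not part of any standard library; the function names are hypothetical) that measures the operator-norm error of a product decomposition of the form (1):

```python
import numpy as np
from scipy.linalg import expm

def op_norm(A):
    """Operator norm = largest singular value of A."""
    return np.linalg.norm(A, 2)

def decomposition_error(h_terms, sequence, t):
    """Operator-norm error of a product decomposition of exp(-i t H).

    h_terms  -- list of local terms h_j as dense matrices, with H = sum_j h_j
    sequence -- list of (c_mu, t_mu) pairs, meaning: apply exp(-i t_mu h_{c_mu})
                from left to right, as in equation (1)
    """
    H = sum(h_terms)
    exact = expm(-1j * t * H)
    approx = np.eye(H.shape[0], dtype=complex)
    for c_mu, t_mu in sequence:
        approx = approx @ expm(-1j * t_mu * h_terms[c_mu])
    return op_norm(approx - exact)
```

Everything below can be checked against this kind of brute-force calculation, at least for Hamiltonians small enough to exponentiate exactly.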

Quantum computers are designed to implement local unitaries on a few qubits at a time. The Trotter-Suzuki decomposition, because it lets us approximate a true Hamiltonian time evolution by a sequence of local unitaries, is an effective approach to simulating Hamiltonians on quantum computers.

In section 1 of this post, I review the Trotter decomposition, which appeared in this paper and is the "first-order" version of a decomposition later improved by Suzuki. In section 2, I review the Suzuki decomposition as presented in this paper. I also comment on the coefficients of higher-order error terms in the Trotter-Suzuki decomposition, and the limitations they place on the practical effectiveness of the Suzuki formula.

Thanks to Shaun Datta and Ani Krishna for spotting errors in this post.

Prerequisites: Only elementary quantum mechanics. Understanding the circuit model of quantum computation is probably helpful for understanding the motivation.

Table of Contents

  1. The Trotter decomposition
  2. The Suzuki decomposition

1. The Trotter decomposition

Suppose we have a Hamiltonian $H$ expressible as a sum of local terms,
$$H = \sum_j h_j,$$
with corresponding time evolution operator
$$U(t) = e^{- i t \sum_j h_j}.$$
If $t$ were small, then we could imagine approximating this operator using Taylor expansions of the exponential. But we can't put that assumption in by hand, because generally we will want to be able to evolve quantum systems for long times. So the trick Trotter came up with is to subdivide the time evolution into $N$ time steps, each of length $\delta = t / N$, and consider the limit as $N$ becomes large. We can write the time evolution operator as
$$U(t) = \left(e^{- i \delta \sum_j h_j} \right)^{N}.$$
The Baker-Campbell-Hausdorff formula tells us that the difference between $e^{- i \delta \sum_j h_j}$ and $\prod_j e^{- i \delta h_j}$ can be written as a power series in $\delta,$ with the first term being of order $\delta^2.$ So, naively, we can write
$$U(t) = \left(\prod_j e^{- i \delta h_j} + O(\delta^2)\right)^N = \left(\prod_j e^{- i \delta h_j}\right)^N + O(N \delta^{2}).$$
This looks pretty good! It tells us that the operator norm difference between $U(t)$ and $\left(\prod e^{- i \delta h_j} \right)^N$ scales like $N \delta^{2}.$ So if we want the total error to be $\epsilon$, then we must choose $N$ to satisfy
$$\epsilon \sim \frac{t^2}{N},$$
or
$$N \sim \frac{t^2}{\epsilon}.$$
So the Trotter formula gives us a way of approximating time evolution to accuracy $\epsilon$, when $\epsilon / t$ is small, by applying a number of local gates that scales like $t^2/\epsilon.$
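As a quick numerical sanity check (my own toy example, not taken from Trotter's paper), the following sketch verifies the $t^2/N$ error scaling for the two-term Hamiltonian $H = X + Z$, whose terms do not commute:

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = X + Z  # two non-commuting local terms

t = 1.0
exact = expm(-1j * t * H)
for N in [10, 20, 40, 80, 160]:
    delta = t / N
    step = expm(-1j * delta * X) @ expm(-1j * delta * Z)  # one first-order Trotter step
    error = np.linalg.norm(np.linalg.matrix_power(step, N) - exact, 2)
    print(f"N = {N:4d}   error = {error:.2e}   error * N / t^2 = {error * N / t**2:.3f}")
    # the last column should settle to a roughly N-independent constant
```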

Of course, there are parameters other than $\epsilon/t$ that might be relevant in a practical computation of the error. We might want to know how the error scales as we increase the total number of local terms in the Hamiltonian, or as we vary the operator norms of the local terms in the Hamiltonian. These are interesting questions, but beyond the scope of the present post. In order to address them, you'd need to use the Zassenhaus formula to bound higher-order terms in the series relating $e^{- i t \sum_j h_j}$ to $\prod_j e^{- i t h_j}.$

2. The Suzuki decomposition

Suzuki's idea for approximating a time evolution operator by a sequence of local gates starts from the same place as the Trotter formula: let's write
$$U(t) = e^{- i t \sum_j h_j} = \left( e^{- i \delta \sum_j h_j} \right)^N,$$
and then approximate $e^{- i \delta \sum_j h_j}$ for small $\delta$ by a sequence of local operators,
$$e^{- i \delta \sum_j h_j} = e^{- i \delta t_{1} h_{c_{1}}} \dots e^{- i \delta t_{M} h_{c_M}} + O(\delta^k), \tag{2}$$
where $k \geq 2$ is an integer that we'd like to make as large as possible to get the best possible approximation.

The Trotter formula tells us we can set $h_{c_j} = h_j$ and $t_{j} = 1$ in equation (2) and get an error term of order $\delta^2.$ But we can certainly imagine there being longer strings of local operators $h_{c_\mu}$ and times $t_\mu$ that give us even better approximations! A priori, the only restriction on the operators $h_{c_\mu}$ and the times $t_\mu$ is that the  order-$\delta$ terms on each side of equation (2) should match, which gives us
$$\sum_j h_j = \sum_{\mu} t_{\mu} h_{c_{\mu}}. \tag{3}$$
If the local terms in the Hamiltonian are all linearly independent, then this is just the statement that on the right-hand side of (2), each local term $h_j$ should be applied for total time $\delta$. You obtain the Trotter formula by doing the whole $\delta$-length time evolution at once for each local term in sequence.

We can think of equation (3) as the unique restriction on equation (2) that constrains the error term to be order $\delta^2$ or higher. We could try to continue this way of thinking, and ask, "what restrictions are placed on $t_\mu$ and $h_{c_{\mu}}$ by requiring the error term to be order $\delta^3$ or higher?" If we were to match the order-$\delta^2$ terms on both sides of equation (2) and simplify, we'd obtain the formula
$$\left(\sum_j h_j\right)^2 = \left(\sum_{\mu} t_{\mu} h_{c_{\mu}}\right)^2 + \sum_{\mu < \nu} t_\mu t_\nu [h_{c_\mu}, h_{c_\nu}],$$
where $\mu < \nu$ refers to the left-to-right ordering of the exponentials in the product. Thanks to equation (3) we know the left side of the equation cancels the first term on the right side, and we obtain the restriction
$$\sum_{\mu < \nu} t_\mu t_\nu [h_{c_\mu}, h_{c_\nu}] = 0. \tag{4}$$
This restriction isn't so bad. But already it doesn't have as nice of a physical interpretation as does (3), and we can see that as we go to higher and higher order we're going to produce more and more nontrivial restrictions on the operators $h_{c_{\mu}}$ and times $t_{\mu}.$
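To make the restriction (4) concrete, here is a small check (again my own illustration) that the plain Trotter ordering violates it, while a symmetrized ordering of the kind that will reappear as the seed $Q_2$ below satisfies it:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def lhs_of_eq4(sequence):
    """sum over mu < nu of t_mu t_nu [h_{c_mu}, h_{c_nu}], for a list of (h, t) pairs
    ordered as in the product of exponentials."""
    total = np.zeros_like(sequence[0][0])
    for mu in range(len(sequence)):
        for nu in range(mu + 1, len(sequence)):
            h_mu, t_mu = sequence[mu]
            h_nu, t_nu = sequence[nu]
            total = total + t_mu * t_nu * (h_mu @ h_nu - h_nu @ h_mu)
    return total

trotter = [(X, 1.0), (Z, 1.0)]              # all of h_1, then all of h_2
symmetric = [(X, 0.5), (Z, 1.0), (X, 0.5)]  # half of h_1, all of h_2, the other half of h_1

print(np.linalg.norm(lhs_of_eq4(trotter), 2))    # nonzero: (4) fails, per-step error is O(delta^2)
print(np.linalg.norm(lhs_of_eq4(symmetric), 2))  # zero: (4) holds, per-step error is O(delta^3)
```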

Suzuki's idea was to use a simple ansatz for a string of exponentials that might work to approximate $e^{- i \delta \sum_j h_j}$ up to order $\delta^k,$ then to write equation (2) as a constraint on the parameters in that ansatz. The ansatz is recursive. To start, assume there is an operator $Q_{k-1}(\delta)$ satisfying
$$e^{- i \delta \sum_j h_j} = Q_{k-1}(\delta) + O(\delta^{k}). \tag{5}$$
We will construct an ansatz for the next-order approximation $Q_k$ by  applying $Q_{k-1}(\delta)$ repeatedly for some sequence of times $p_1 \delta, \dots, p_{\Omega}\delta,$ i.e., we will write
$$Q_k(\delta) = \prod_{\alpha = 1}^{\Omega} Q_{k-1}(p_\alpha \delta).$$
We now write
$$e^{- i \delta \sum_j h_j} = Q_k(\delta) + O(\delta^{k+1}), \tag{6}$$
and ask whether there exist numbers $\{p_{\alpha}\}$ so that this equation holds.
 
As a first observation, we note that for $k \geq 2$ the operators $Q_{k-1}(\delta)$ and $Q_k(\delta)$ must agree at linear order in $\delta,$ which implies the restriction $\sum_{\alpha} p_{\alpha} = 1.$ We then write
$$e^{- i \delta \sum_j h_j} = \prod_{\alpha} e^{- i \delta p_{\alpha} \sum_j h_j},$$
and substitute in equation (5). When we substitute, it will be helpful to write out the order-$\delta^k$ term explicitly, so we have
$$e^{- i \delta p_{\alpha} \sum_j h_j} = Q_{k-1}(p_{\alpha} \delta) + p_{\alpha}^k \delta^{k} A + O(\delta^{k+1}).$$
After substitution, we obtain
$$e^{- i \delta \sum_j h_j} = \prod_{\alpha} \left( Q_{k-1}(p_{\alpha} \delta) + p_{\alpha}^k \delta^k A + O(\delta^{k+1}) \right).$$
Expanding out the product and keeping terms through order $\delta^k$ (at this order each correction $p_{\alpha}^k \delta^k A$ multiplies only the identity piece of the remaining factors, since $Q_{k-1}(p_{\beta} \delta) = 1 + O(\delta)$), we obtain
$$e^{- i \delta \sum_j h_j} = Q_k(\delta) + \sum_{\alpha} p_{\alpha}^k \delta^{k} A + O(\delta^{k+1}).$$
So if we fix the coefficients $p_{\alpha}$ so that they satisfy $\sum_\alpha p_{\alpha}^k = 0$, then all lingering terms at order $\delta^k$ sum to zero, and we are left with equation (6).

This is pretty good, but unfortunately it appears we have a little bit of a problem. For odd $k$, we can certainly find coefficients $p_{\alpha}$ satisfying $\sum_{\alpha} p_{\alpha} = 1$ and $\sum_{\alpha} p_{\alpha}^k = 0$. For example, the coefficients
$$p = p_1 = p_2 = \dots = p_{\Omega-1}, p_{\Omega} = 1 - (\Omega - 1) p$$
satisfy $\sum_{\alpha} p_{\alpha} = 1,$ and the equation $\sum_{\alpha} p_{\alpha}^k = 0$ becomes
$$(\Omega - 1) p^k + (1 - (\Omega - 1) p)^k = 0,$$
which has a real solution $p$ for any odd $k$ and $\Omega > 2.$ (For $\Omega > 2$ the left-hand side is a degree-$k$ polynomial in $p$ with real coefficients; its non-real roots come in complex conjugate pairs, so for odd $k$ at least one root must be real. For $\Omega = 2$ the degree-$k$ terms cancel and there is no solution, so we need at least three factors.) However, for even $k,$ the only real solution to $\sum_{\alpha} p_{\alpha}^k = 0$ is $p_1 = \dots = p_{\Omega} = 0,$ which does not satisfy the other constraint $\sum_{\alpha} p_{\alpha} = 1.$
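For this uniform ansatz the real root for odd $k$ can even be written in closed form: the same manipulation that produces Suzuki's $\Omega = 5$ formula quoted at the end of this section gives (this is my own generalization of that formula) $p = 1/\big((\Omega-1) - (\Omega-1)^{1/k}\big).$ Here is a short sketch checking both constraints numerically:

```python
def uniform_p(k, omega):
    """Real root of (omega-1) p^k + (1 - (omega-1) p)^k = 0 for odd k, omega >= 3."""
    assert k % 2 == 1 and omega > 2
    m = omega - 1
    return 1.0 / (m - m ** (1.0 / k))

for k in [3, 5, 7]:
    for omega in [3, 5]:
        p = uniform_p(k, omega)
        ps = [p] * (omega - 1) + [1 - (omega - 1) * p]
        print(k, omega,
              round(sum(ps), 12),                  # should be 1
              round(sum(q ** k for q in ps), 12))  # should be (numerically) 0
```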

So it appears, for the moment at least, that Suzuki's ansatz has hit a critical problem. It simply cannot be applied to produce an operator $Q_{k}$ for even $k.$ But there is a solution! What if, for odd $k,$ we were able to show
$$e^{- i \delta \sum_j h_j} = Q_{k}(\delta) + O(\delta^{k+2})? \tag{7}$$
Then we could define the even term $Q_{k+1}$ as equal to $Q_k,$ and define the next odd term $Q_{k+2}$ from $Q_{k}$ using Suzuki's ansatz. 

In particular, Suzuki shows that equation (7) holds if $Q_{k}$ has the symmetry property $Q_{k}(\delta) Q_{k}(-\delta) = 1.$ The proof is simple: we write the order-$\delta^{k+1}$ term of the time evolution operator explicitly as
$$e^{- i \delta \sum_j h_j} = Q_{k}(\delta) + A \delta^{k+1} + O(\delta^{k+2}),$$
then expand the identity $1 = e^{i \delta \sum_j h_j} e^{- i \delta \sum_j h_j}$ in $\delta$ to obtain
$$1 = Q_{k}(-\delta) Q_{k}(\delta) + Q_{k}(-\delta) A \delta^{k+1} + A Q_{k}(\delta) (-\delta)^{k+1} + O(\delta^{k+2}) = 1 + Q_{k}(-\delta) A \delta^{k+1} + A Q_{k}(\delta) (-\delta)^{k+1} + O(\delta^{k+2}),$$
where the second equality uses the symmetry property.
The order-$\delta^{k+1}$ term on the right side of this equation must vanish. Since $k$ is odd, we have $(- \delta)^{k+1} = \delta^{k+1}.$ This fact, combined with the expansion $Q_k(\delta) = 1 + O(\delta),$ gives us the vanishing of the order-$\delta^{k+1}$ term as $2 A \delta^{k+1} = 0,$ i.e.,
$$A = 0.$$
So $Q_{k}(\delta)$ does indeed agree with $e^{- i \delta \sum_j h_j}$ up to and including order $\delta^{k+1}.$

So Suzuki's idea is to start with some symmetric approximant $Q_{2}(\delta)$, with
$$e^{- i \delta \sum_j h_j} = Q_2(\delta) + O(\delta^{3}),$$
then to pick some parameters $\{p_{\alpha}\}$ summing to $1$ and with
$$\sum_{\alpha} p_{\alpha}^3 = 0,$$
in such a way that
$$Q_{3}(\delta) = \prod_{\alpha} Q_{2} (p_{\alpha} \delta)$$
will also be symmetric. (In particular, because $Q_2$ is symmetric, we can accomplish this by making the list $\{p_{\alpha}\}_{\alpha=1}^{\Omega}$ symmetric under the substitution $\alpha \mapsto \Omega + 1 - \alpha.$) Because $Q_3$ is symmetric, we may define $Q_4 \equiv Q_3,$ and then define $Q_5$ by choosing a new sequence $p_{\alpha}.$ And so on and so forth, ad infinitum. A good starting choice of $Q_2(\delta)$ is
$$Q_2(\delta) = e^{- i \delta h_1 / 2} \dots e^{- i \delta h_{n-1}/2} e^{- i \delta h_{n}} e^{- i \delta h_{n-1}/2} \dots e^{- i \delta h_1 / 2}.$$
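Here is what this seed looks like numerically (my own sketch; `Q2` below is just the product above written out in code), together with a check of the symmetry property and of the $O(\delta^3)$ error:

```python
import numpy as np
from scipy.linalg import expm

def Q2(h_terms, delta):
    """Symmetric seed: half-steps of h_1 ... h_{n-1}, a full step of h_n,
    then the same half-steps in reverse order."""
    halves = [expm(-1j * delta * h / 2) for h in h_terms[:-1]]
    out = np.eye(h_terms[0].shape[0], dtype=complex)
    for u in halves:
        out = out @ u
    out = out @ expm(-1j * delta * h_terms[-1])
    for u in reversed(halves):
        out = out @ u
    return out

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
h_terms, H = [X, Z], X + Z

for delta in [0.2, 0.1, 0.05]:
    symmetry = np.linalg.norm(Q2(h_terms, delta) @ Q2(h_terms, -delta) - np.eye(2), 2)
    error = np.linalg.norm(Q2(h_terms, delta) - expm(-1j * delta * H), 2)
    print(f"delta = {delta}: symmetry defect = {symmetry:.1e}, error = {error:.2e}")
    # the error should drop by roughly a factor of 8 each time delta is halved
```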
 
By repeating this recursive algorithm, we can produce, for arbitrary integers $k \geq 1$, an operator $Q_{k}(\delta)$ that is made up of a sequence of exponentials of local operators satisfying
$$e^{- i \delta \sum_j h_j} = Q_k(\delta) + O(\delta^{k+1}).$$
The full time evolution operator,
$$(e^{- i \delta \sum_j h_j})^N,$$
is then given by
$$Q_{k}(\delta)^N + O(N \delta^{k+1}).$$
So to get an error of order $\epsilon$, we must choose $N \delta^{k+1} = t^{k+1}/N^{k}$ to be order $\epsilon,$ i.e., we must choose $N$ to be order
$$N \sim \frac{t^{(k+1)/k}}{\epsilon^{1/k}}.$$
For the Trotter formula at $k=1,$ we had to choose $N \sim t^2/\epsilon$; for sufficiently large $k$, this formula gives $N \sim t$! Naively, this looks like an improvement; but while the total number of local gates in the Trotter formula scaled only with $N$, the total number of gates in this formula scales also with the number of iterations $k,$ because each operator $Q_k$ is made up of a number of local gates depending on how many parameters $p_{\alpha}$ we choose at each level of the Suzuki ansatz. If we choose the same number $\Omega$ of parameters at each level, and denote by $L$ the number of gates in our seed approximant $Q_2,$ then we have:
$$\#(Q_4) = \#(Q_3) = \Omega \#(Q_2) = L \Omega,$$
$$\#(Q_6) = \#(Q_5) = \#(Q_3) \Omega = L \Omega^2,$$
and so on, giving the general formula
$$\#(Q_{k}) = L \Omega^{\text{floor}((k-1)/2)}.$$

The actual number of gates required for a Suzuki decomposition therefore goes like
$$N \#(Q_k) \sim \Omega^{\text{floor}((k-1)/2)} \frac{t^{(k+1)/k}}{\epsilon^{1/k}}.$$
For a fixed time $t$ and error threshold $\epsilon,$ the prefactor $\Omega^{\text{floor}((k-1)/2)}$ can make a big difference for large $k.$ So in practice, designing a Trotter-Suzuki algorithm for a particular simulation problem involves choosing a value of $k$ that optimizes between these two penalties: the largeness of $t^{(k+1)/k}/\epsilon^{1/k}$ if $k$ is too small, and the largeness of $\Omega^{\text{floor}((k-1)/2)}$ if $k$ is too big.
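As a rough illustration of this trade-off, here is a toy cost model of my own (the values of $t$, $\epsilon$, $L$, and $\Omega$ are placeholders, and constant factors are ignored) that evaluates the count above for each even order and picks the cheapest:

```python
def total_gates(k, t, eps, L=4, Omega=5):
    """Toy cost model: N * #(Q_k), with N ~ t^{(k+1)/k} / eps^{1/k}
    and #(Q_k) = L * Omega^{floor((k-1)/2)}.  Constant factors ignored."""
    N = t ** ((k + 1) / k) / eps ** (1 / k)
    return N * L * Omega ** ((k - 1) // 2)

t, eps = 100.0, 1e-6
costs = {k: total_gates(k, t, eps) for k in range(2, 13, 2)}  # even orders Q_2, Q_4, ...
for k, cost in costs.items():
    print(f"k = {k:2d}: ~{cost:.2e} gates")
print("cheapest order for this (t, eps):", min(costs, key=costs.get))
```

For these particular (illustrative) values the minimum sits at an intermediate order; pushing $k$ higher only inflates the $\Omega^{\text{floor}((k-1)/2)}$ prefactor.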

One other practical thing to consider is that in $Q_{k}(\delta),$ the actual time interval $\tau$ appearing in any particular local term $e^{- i \tau h}$ is equal to $\delta$ times a product of $\sim k/2$ parameters $p_{\alpha}$ at various levels. So if the parameters $p_{\alpha}$ are bigger than one, then for large $k$ and fixed $\delta$ we might end up needing to evolve local operators for times exponentially long in $k,$ which causes some practical issues. It is therefore useful to choose a recursion scheme so that all parameters $p_{\alpha}$ have magnitude uniformly bounded below one. A particular recursion scheme that Suzuki chooses is to set $\Omega = 5$ at each level, and to set
$$p_1 = p_2 = p_4 = p_5 = p,$$
$$p_3 = 1 - 4 p.$$
This list is symmetric, so it maintains the symmetry properties of $Q_{k}$ for all $k.$ For this list, the equation $\sum_{\alpha} p_{\alpha}^k = 0$ becomes
$$4 p^k + (1 - 4p)^{k} = 0,$$
which for $k$ odd has the real solution
$$p = \frac{1}{4 - 4^{1/k}},$$
which satisfies
$$0.41 \approx (4 - 2^{2/3})^{-1} \geq p > 1/3$$
and therefore
$$-0.66 \approx 1- 4 (4 - 2^{2/3})^{-1} \leq (1 - 4 p) < -1/3$$
for all odd $k \geq 3.$ This scheme successfully bounds $|p_{\alpha}|$ uniformly below one at every level of the recursion.
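Putting the pieces together, here is a sketch of the $\Omega = 5$ recursion described above (my own implementation, again for the toy Hamiltonian $H = X + Z$), which builds $Q_k$ from the symmetric seed and checks that each level gains two orders of accuracy:

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = X + Z

def Q(k, delta):
    """Omega = 5 Suzuki recursion for even k: Q_k is built from Q_{k-2}
    using p = 1 / (4 - 4^{1/(k-1)}), the real root of 4 p^{k-1} + (1 - 4p)^{k-1} = 0."""
    assert k >= 2 and k % 2 == 0
    if k == 2:  # symmetric seed
        return expm(-1j * delta * X / 2) @ expm(-1j * delta * Z) @ expm(-1j * delta * X / 2)
    p = 1.0 / (4.0 - 4.0 ** (1.0 / (k - 1)))
    outer = Q(k - 2, p * delta)
    inner = Q(k - 2, (1 - 4 * p) * delta)
    return outer @ outer @ inner @ outer @ outer

for k in [2, 4, 6]:
    for delta in [0.2, 0.1]:
        error = np.linalg.norm(Q(k, delta) - expm(-1j * delta * H), 2)
        print(f"k = {k}, delta = {delta}: error = {error:.2e}")
    # halving delta should shrink the error by roughly a factor of 2^(k+1)
```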
