
Hamiltonian simulation via the Trotter-Suzuki decomposition

This academic term, some colleagues at Stanford and I are running a journal club on Hamiltonian simulation — the problem of how to use a quantum computer to simulate the time evolution of a physical system. Hamiltonian simulation is a hot topic in research, in part because it's believed that simulating certain systems on quantum computers will allow us to probe aspects of those systems that we don't know how to access with traditional laboratory experiments.

The earliest approach to this problem, and one that is still practically useful for certain applications, makes use of the Trotter decomposition and its generalization the Trotter-Suzuki decomposition. These are algorithms for decomposing a time evolution operator that acts simultaneously on the entire quantum system, into a sequence of time evolution operators that act locally on only a few physical sites at a time. Specifically, given a time-independent Hamiltonian $H = \sum_j h_j,$ we would like to find a way to approximate the time evolution operator $e^{- i H t}$ by some product
$$\prod_{\mu = 1}^{M} e^{- i t_\mu h_{c_\mu}}. \tag{1}$$
That is, we should be able to make some list $\{h_{c_\mu}\}_{\mu=1}^{M}$ of local terms appearing in the Hamiltonian, evolve with each one for a time $t_\mu,$ and in doing so approximate the full time evolution $e^{- i H t}.$

By approximate, we mean that the two operators should be close in the operator norm. The operator norm $\lVert A \rVert$ is given by the largest singular value of $A$, which can be realized as
$$\lVert {A} \rVert^2 = \max_{|\psi\rangle \neq 0} \frac{\langle \psi | A^{\dagger} A | \psi \rangle}{\langle \psi | \psi \rangle}.$$
We say that a Trotter-Suzuki decomposition of the form (1) approximates $e^{- i H t}$ with error at most $\epsilon$ if we can prove
$$\left\lVert \prod_{\mu=1}^{M} e^{- i t_\mu h_{c_\mu}} - e^{- i H t} \right\rVert \leq \epsilon.$$
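For small systems, this error is easy to compute numerically with dense matrices. Here is a minimal sketch (my own illustration, not part of any standard library; the function names are hypothetical) that measures the operator-norm error of a product decomposition of the form (1):

```python
import numpy as np
from scipy.linalg import expm

def op_norm(A):
    """Operator norm = largest singular value of A."""
    return np.linalg.norm(A, 2)

def decomposition_error(h_terms, sequence, t):
    """Operator-norm error of a product decomposition of exp(-i t H).

    h_terms  -- list of local terms h_j as dense matrices, with H = sum_j h_j
    sequence -- list of (c_mu, t_mu) pairs, meaning: apply exp(-i t_mu h_{c_mu})
                from left to right, as in equation (1)
    """
    H = sum(h_terms)
    exact = expm(-1j * t * H)
    approx = np.eye(H.shape[0], dtype=complex)
    for c_mu, t_mu in sequence:
        approx = approx @ expm(-1j * t_mu * h_terms[c_mu])
    return op_norm(approx - exact)
```

Everything below can be checked against this kind of brute-force calculation, at least for Hamiltonians small enough to exponentiate exactly.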

Quantum computers are designed to implement local unitaries on a few qubits at a time. The Trotter-Suzuki decomposition, because it lets us approximate a true Hamiltonian time evolution by a sequence of local unitaries, is an effective approach to simulating Hamiltonians on quantum computers.

In section 1 of this post, I review the Trotter decomposition, which appeared in this paper and is the "first-order" version of a decomposition later improved by Suzuki. In section 2, I review the Suzuki decomposition as presented in this paper. I also comment on the coefficients of higher-order error terms in the Trotter-Suzuki decomposition, and the limitations they place on the practical effectiveness of the Suzuki formula.

Thanks to Shaun Datta and Ani Krishna for spotting errors in this post.

Prerequisites: Only elementary quantum mechanics. Understanding the circuit model of quantum computation is probably helpful for understanding the motivation.

Table of Contents

  1. The Trotter decomposition
  2. The Suzuki decomposition

1. The Trotter decomposition

Suppose we have a Hamiltonian $H$ expressible as a sum of local terms,
$$H = \sum_j h_j,$$
with corresponding time evolution operator
$$U(t) = e^{- i t \sum_j h_j}.$$
If $t$ were small, then we could imagine approximating this operator using Taylor expansions of the exponential. But we can't put that assumption in by hand, because generally we will want to be able to evolve quantum systems for long times. So the trick Trotter came up with is to subdivide the time evolution into $N$ time steps, each of length $\delta = t / N$, and consider the limit as $N$ becomes large. We can write the time evolution operator as
$$U(t) = \left(e^{- i \delta \sum_j h_j} \right)^{N}.$$
The Baker-Campbell-Hausdorff formula tells us that the difference between $e^{- i \delta \sum_j h_j}$ and $\prod_j e^{- i \delta h_j}$ can be written as a power series in $\delta,$ with the first term being of order $\delta^2.$ So, naively, we can write
$$U(t) = \left(\prod_j e^{- i \delta h_j} + O(\delta^2)\right)^N = \left(\prod_j e^{- i \delta h_j}\right)^N + O(N \delta^{2}).$$
This looks pretty good! It tells us that the operator norm difference between $U(t)$ and $\left(\prod e^{- i \delta h_j} \right)^N$ scales like $N \delta^{2}.$ So if we want the total error to be $\epsilon$, then we must choose $N$ to satisfy
$$\epsilon \sim \frac{t^2}{N},$$
or
$$N \sim \frac{t^2}{\epsilon}.$$
So the Trotter formula gives us a way of approximating time evolution to accuracy $\epsilon$, when $\epsilon / t$ is small, by applying a number of local gates that scales like $t^2/\epsilon.$
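As a quick numerical sanity check (my own toy example, not taken from Trotter's paper), the following sketch verifies the $t^2/N$ error scaling for the two-term Hamiltonian $H = X + Z$, whose terms do not commute:

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = X + Z  # two non-commuting local terms

t = 1.0
exact = expm(-1j * t * H)
for N in [10, 20, 40, 80, 160]:
    delta = t / N
    step = expm(-1j * delta * X) @ expm(-1j * delta * Z)  # one first-order Trotter step
    error = np.linalg.norm(np.linalg.matrix_power(step, N) - exact, 2)
    print(f"N = {N:4d}   error = {error:.2e}   error * N / t^2 = {error * N / t**2:.3f}")
    # the last column should settle to a roughly N-independent constant
```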

Of course, there are parameters other than $\epsilon/t$ that might be relevant in a practical computation of the error. We might want to know how the error scales as we increase the total number of local terms in the Hamiltonian, or as we vary the operator norms of the local terms in the Hamiltonian. These are interesting questions, but beyond the scope of the present post. In order to address them, you'd need to use the Zassenhaus formula to bound higher-order terms in the series relating $e^{- i t \sum_j h_j}$ to $\prod_j e^{- i t h_j}.$

2. The Suzuki decomposition

Suzuki's idea for approximating a time evolution operator by a sequence of local gates starts from the same place as the Trotter formula: let's write
$$U(t) = e^{- i t \sum_j h_j} = \left( e^{- i \delta \sum_j h_j} \right)^N,$$
and then approximate $e^{- i \delta \sum_j h_j}$ for small $\delta$ by a sequence of local operators,
$$e^{- i \delta \sum_j h_j} = e^{- i \delta t_{1} h_{c_{1}}} \dots e^{- i \delta t_{M} h_{c_M}} + O(\delta^k), \tag{2}$$
where $k \geq 2$ is an integer that we'd like to make as large as possible to get the best possible approximation.

The Trotter formula tells us we can set $h_{c_j} = h_j$ and $t_{j} = 1$ in equation (2) and get an error term of order $\delta^2.$ But we can certainly imagine there being longer strings of local operators $h_{c_\mu}$ and times $t_\mu$ that give us even better approximations! A priori, the only restriction on the operators $h_{c_\mu}$ and the times $t_\mu$ is that the  order-$\delta$ terms on each side of equation (2) should match, which gives us
$$\sum_j h_j = \sum_{\mu} t_{\mu} h_{c_{\mu}}. \tag{3}$$
If the local terms in the Hamiltonian are all linearly independent, then this is just the statement that on the right-hand side of (2), each local term $h_j$ should be applied for total time $\delta$. You obtain the Trotter formula by doing the whole $\delta$-length time evolution at once for each local term in sequence.

We can think of equation (3) as the unique restriction on equation (2) that constrains the error term to be order $\delta^2$ or higher. We could try to continue this way of thinking, and ask, "what restrictions are placed on $t_\mu$ and $h_{c_{\mu}}$ by requiring the error term to be order $\delta^3$ or higher?" If we were to match the order-$\delta^2$ terms on both sides of equation (2) and simplify, we'd obtain the formula
$$\left(\sum_j h_j\right)^2 = \left(\sum_{\mu} t_{\mu} h_{c_{\mu}}\right)^2 + \sum_{\mu < \nu} t_\mu t_\nu [h_{c_\mu}, h_{c_\nu}],$$
where $\mu < \nu$ refers to the left-to-right ordering of the exponentials in the product. Thanks to equation (3) we know the left side of the equation cancels the first term on the right side, and we obtain the restriction
$$\sum_{\mu < \nu} t_\mu t_\nu [h_{c_\mu}, h_{c_\nu}] = 0. \tag{4}$$
This restriction isn't so bad. But already it doesn't have as nice of a physical interpretation as does (3), and we can see that as we go to higher and higher order we're going to produce more and more nontrivial restrictions on the operators $h_{c_{\mu}}$ and times $t_{\mu}.$
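To make the restriction (4) concrete, here is a small check (again my own illustration) that the plain Trotter ordering violates it, while a symmetrized ordering of the kind that will reappear as the seed $Q_2$ below satisfies it:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def lhs_of_eq4(sequence):
    """sum over mu < nu of t_mu t_nu [h_{c_mu}, h_{c_nu}], for a list of (h, t) pairs
    ordered as in the product of exponentials."""
    total = np.zeros_like(sequence[0][0])
    for mu in range(len(sequence)):
        for nu in range(mu + 1, len(sequence)):
            h_mu, t_mu = sequence[mu]
            h_nu, t_nu = sequence[nu]
            total = total + t_mu * t_nu * (h_mu @ h_nu - h_nu @ h_mu)
    return total

trotter = [(X, 1.0), (Z, 1.0)]              # all of h_1, then all of h_2
symmetric = [(X, 0.5), (Z, 1.0), (X, 0.5)]  # half of h_1, all of h_2, the other half of h_1

print(np.linalg.norm(lhs_of_eq4(trotter), 2))    # nonzero: (4) fails, per-step error is O(delta^2)
print(np.linalg.norm(lhs_of_eq4(symmetric), 2))  # zero: (4) holds, per-step error is O(delta^3)
```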

Suzuki's idea was to use a simple ansatz for a string of exponentials that might work to approximate $e^{- i \delta \sum_j h_j}$ up to order $\delta^k,$ then to write equation (2) as a constraint on the parameters in that ansatz. The ansatz is recursive. To start, assume there is an operator $Q_{k-1}(\delta)$ satisfying
$$e^{- i \delta \sum_j h_j} = Q_{k-1}(\delta) + O(\delta^{k}). \tag{5}$$
We will construct an ansatz for the next-order approximation $Q_k$ by  applying $Q_{k-1}(\delta)$ repeatedly for some sequence of times $p_1 \delta, \dots, p_{\Omega}\delta,$ i.e., we will write
$$Q_k(\delta) = \prod_{\alpha = 1}^{\Omega} Q_{k-1}(p_\alpha \delta).$$
We now write
$$e^{- i \delta \sum_j h_j} = Q_k(\delta) + O(\delta^{k+1}), \tag{6}$$
and ask whether there exist numbers $\{p_{\alpha}\}$ so that this equation holds.
 
As a first observation, we note that for $k \geq 2$ the operators $Q_{k-1}(\delta)$ and $Q_k(\delta)$ must agree at linear order in $\delta,$ which implies the restriction $\sum_{\alpha} p_{\alpha} = 1.$ We then write
$$e^{- i \delta \sum_j h_j} = \prod_{\alpha} e^{- i \delta p_{\alpha} \sum_j h_j},$$
and substitute in equation (5). When we substitute, it will be helpful to write out the order-$\delta^k$ term explicitly, so we have
$$e^{- i \delta p_{\alpha} \sum_j h_j} = Q_{k-1}(p_{\alpha} \delta) + p_{\alpha}^k \delta^{k} A + O(\delta^{k+1}).$$
After substitution, we obtain
$$e^{- i \delta \sum_j h_j} = \prod_{\alpha} \left( Q_{k-1}(p_{\alpha} \delta) + p_{\alpha}^k \delta^k A + O(\delta^{k+1}) \right).$$
Expanding out the product and keeping terms through order $\delta^k$ (at this order each correction $p_{\alpha}^k \delta^k A$ multiplies only the identity piece of the remaining factors, since $Q_{k-1}(p_{\beta} \delta) = 1 + O(\delta)$), we obtain
$$e^{- i \delta \sum_j h_j} = Q_k(\delta) + \sum_{\alpha} p_{\alpha}^k \delta^{k} A + O(\delta^{k+1}).$$
So if we fix the coefficients $p_{\alpha}$ so that they satisfy $\sum_\alpha p_{\alpha}^k = 0$, then all lingering terms at order $\delta^k$ sum to zero, and we are left with equation (6).

This is pretty good, but unfortunately it appears we have a little bit of a problem. For odd $k$, we can certainly find coefficients $p_{\alpha}$ satisfying $\sum_{\alpha} p_{\alpha} = 1$ and $\sum_{\alpha} p_{\alpha}^k = 0$. For example, the coefficients
$$p = p_1 = p_2 = \dots = p_{\Omega-1}, p_{\Omega} = 1 - (\Omega - 1) p$$
satisfy $\sum_{\alpha} p_{\alpha} = 1,$ and the equation $\sum_{\alpha} p_{\alpha}^k = 0$ becomes
$$(\Omega - 1) p^k + (1 - (\Omega - 1) p)^k = 0,$$
which has a real solution $p$ for any odd $k$ and $\Omega > 2.$ (For $\Omega > 2$ the left-hand side is a degree-$k$ polynomial in $p$ with real coefficients; its non-real roots come in complex conjugate pairs, so for odd $k$ at least one root must be real. For $\Omega = 2$ the degree-$k$ terms cancel and there is no solution, so we need at least three factors.) However, for even $k,$ the only real solution to $\sum_{\alpha} p_{\alpha}^k = 0$ is $p_1 = \dots = p_{\Omega} = 0,$ which does not satisfy the other constraint $\sum_{\alpha} p_{\alpha} = 1.$
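For this uniform ansatz the real root for odd $k$ can even be written in closed form: the same manipulation that produces Suzuki's $\Omega = 5$ formula quoted at the end of this section gives (this is my own generalization of that formula) $p = 1/\big((\Omega-1) - (\Omega-1)^{1/k}\big).$ Here is a short sketch checking both constraints numerically:

```python
def uniform_p(k, omega):
    """Real root of (omega-1) p^k + (1 - (omega-1) p)^k = 0 for odd k, omega >= 3."""
    assert k % 2 == 1 and omega > 2
    m = omega - 1
    return 1.0 / (m - m ** (1.0 / k))

for k in [3, 5, 7]:
    for omega in [3, 5]:
        p = uniform_p(k, omega)
        ps = [p] * (omega - 1) + [1 - (omega - 1) * p]
        print(k, omega,
              round(sum(ps), 12),                  # should be 1
              round(sum(q ** k for q in ps), 12))  # should be (numerically) 0
```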

So it appears, for the moment at least, that Suzuki's ansatz has hit a critical problem. It simply cannot be applied to produce an operator $Q_{k}$ for even $k.$ But there is a solution! What if, for odd $k,$ we were able to show
$$e^{- i \delta \sum_j h_j} = Q_{k}(\delta) + O(\delta^{k+2})? \tag{7}$$
Then we could define the even term $Q_{k+1}$ as equal to $Q_k,$ and define the next odd term $Q_{k+2}$ from $Q_{k}$ using Suzuki's ansatz. 

In particular, Suzuki shows that equation (7) holds if $Q_{k}$ has the symmetry property $Q_{k}(\delta) Q_{k}(-\delta) = 1.$ The proof is simple: we write the order-$\delta^{k+1}$ term of the time evolution operator explicitly as
$$e^{- i \delta \sum_j h_j} = Q_{k}(\delta) + A \delta^{k+1} + O(\delta^{k+2}),$$
then expand the identity $1 = e^{i \delta \sum_j h_j} e^{- i \delta \sum_j h_j}$ in $\delta$ to obtain
$$1 = Q_{k}(-\delta) Q_{k}(\delta) + Q_{k}(-\delta) A \delta^{k+1} + A Q_{k}(\delta) (-\delta)^{k+1} + O(\delta^{k+2}) = 1 + Q_{k}(-\delta) A \delta^{k+1} + A Q_{k}(\delta) (-\delta)^{k+1} + O(\delta^{k+2}),$$
where the second equality uses the symmetry property.
The order-$\delta^{k+1}$ term on the right side of this equation must vanish. Since $k$ is odd, we have $(- \delta)^{k+1} = \delta^{k+1}.$ This fact, combined with the expansion $Q_k(\delta) = 1 + O(\delta),$ gives us the vanishing of the order-$\delta^{k+1}$ term as $2 A \delta^{k+1} = 0,$ i.e.,
$$A = 0.$$
So $Q_{k}(\delta)$ does indeed agree with $e^{- i \delta \sum_j h_j}$ up to and including order $\delta^{k+1}.$

So Suzuki's idea is to start with some symmetric approximant $Q_{2}(\delta)$, with
$$e^{- i \delta \sum_j h_j} = Q_2(\delta) + O(\delta^{3}),$$
then to pick some parameters $\{p_{\alpha}\}$ summing to $1$ and with
$$\sum_{\alpha} p_{\alpha}^3 = 0,$$
in such a way that
$$Q_{3}(\delta) = \prod_{\alpha} Q_{2} (p_{\alpha} \delta)$$
will also be symmetric. (In particular, because $Q_2$ is symmetric, we can accomplish this by making the list $\{p_{\alpha}\}_{\alpha=1}^{\Omega}$ symmetric under the substitution $\alpha \mapsto \Omega + 1 - \alpha.$) Because $Q_3$ is symmetric, we may define $Q_4 \equiv Q_3,$ and then define $Q_5$ by choosing a new sequence $p_{\alpha}.$ And so on and so forth, ad infinitum. A good starting choice of $Q_2(\delta)$ is
$$Q_2(\delta) = e^{- i \delta h_1 / 2} \dots e^{- i \delta h_{n-1}/2} e^{- i \delta h_{n}} e^{- i \delta h_{n-1}/2} \dots e^{- i \delta h_1 / 2}.$$
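Here is what this seed looks like numerically (my own sketch; `Q2` below is just the product above written out in code), together with a check of the symmetry property and of the $O(\delta^3)$ error:

```python
import numpy as np
from scipy.linalg import expm

def Q2(h_terms, delta):
    """Symmetric seed: half-steps of h_1 ... h_{n-1}, a full step of h_n,
    then the same half-steps in reverse order."""
    halves = [expm(-1j * delta * h / 2) for h in h_terms[:-1]]
    out = np.eye(h_terms[0].shape[0], dtype=complex)
    for u in halves:
        out = out @ u
    out = out @ expm(-1j * delta * h_terms[-1])
    for u in reversed(halves):
        out = out @ u
    return out

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
h_terms, H = [X, Z], X + Z

for delta in [0.2, 0.1, 0.05]:
    symmetry = np.linalg.norm(Q2(h_terms, delta) @ Q2(h_terms, -delta) - np.eye(2), 2)
    error = np.linalg.norm(Q2(h_terms, delta) - expm(-1j * delta * H), 2)
    print(f"delta = {delta}: symmetry defect = {symmetry:.1e}, error = {error:.2e}")
    # the error should drop by roughly a factor of 8 each time delta is halved
```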
 
By repeating this recursive algorithm, we can produce, for arbitrary integers $k \geq 1$, an operator $Q_{k}(\delta)$ that is made up of a sequence of exponentials of local operators satisfying
$$e^{- i \delta \sum_j h_j} = Q_k(\delta) + O(\delta^{k+1}).$$
The full time evolution operator,
$$(e^{- i \delta \sum_j h_j})^N,$$
is then given by
$$Q_{k}(\delta)^N + O(N \delta^{k+1}).$$
So to get an error of order $\epsilon$, we must choose $N \delta^{k+1} = t^{k+1}/N^{k}$ to be order $\epsilon,$ i.e., we must choose $N$ to be order
$$N \sim \frac{t^{(k+1)/k}}{\epsilon^{1/k}}.$$
For the Trotter formula at $k=1,$ we had to choose $N \sim t^2/\epsilon$; for sufficiently large $k$, this formula gives $N \sim t$! Naively, this looks like an improvement; but while the total number of local gates in the Trotter formula scaled only with $N$, the total number of gates in this formula scales also with the number of iterations $k,$ because each operator $Q_k$ is made up of a number of local gates depending on how many parameters $p_{\alpha}$ we choose at each level of the Suzuki ansatz. If we choose the same number $\Omega$ of parameters at each level, and denote by $L$ the number of gates in our seed approximant $Q_2,$ then we have:
$$\#(Q_4) = \#(Q_3) = \Omega \#(Q_2) = L \Omega,$$
$$\#(Q_6) = \#(Q_5) = \#(Q_3) \Omega = L \Omega^2,$$
and so on, giving the general formula
$$\#(Q_{k}) = L \Omega^{\text{floor}((k-1)/2)}.$$

The actual number of gates required for a Suzuki decomposition therefore goes like
$$N \#(Q_k) \sim \Omega^{\text{floor}((k-1)/2)} \frac{t^{(k+1)/k}}{\epsilon^{1/k}}.$$
For a fixed time $t$ and error threshold $\epsilon,$ the prefactor $\Omega^{\text{floor}((k-1)/2)}$ can make a big difference for large $k.$ So in practice, designing a Trotter-Suzuki algorithm for a particular simulation problem involves choosing a value of $k$ that optimizes between these two penalties: the largeness of $t^{(k+1)/k}/\epsilon^{1/k}$ if $k$ is too small, and the largeness of $\Omega^{\text{floor}((k-1)/2)}$ if $k$ is too big.
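As a rough illustration of this trade-off, here is a toy cost model of my own (the values of $t$, $\epsilon$, $L$, and $\Omega$ are placeholders, and constant factors are ignored) that evaluates the count above for each even order and picks the cheapest:

```python
def total_gates(k, t, eps, L=4, Omega=5):
    """Toy cost model: N * #(Q_k), with N ~ t^{(k+1)/k} / eps^{1/k}
    and #(Q_k) = L * Omega^{floor((k-1)/2)}.  Constant factors ignored."""
    N = t ** ((k + 1) / k) / eps ** (1 / k)
    return N * L * Omega ** ((k - 1) // 2)

t, eps = 100.0, 1e-6
costs = {k: total_gates(k, t, eps) for k in range(2, 13, 2)}  # even orders Q_2, Q_4, ...
for k, cost in costs.items():
    print(f"k = {k:2d}: ~{cost:.2e} gates")
print("cheapest order for this (t, eps):", min(costs, key=costs.get))
```

For these particular (illustrative) values the minimum sits at an intermediate order; pushing $k$ higher only inflates the $\Omega^{\text{floor}((k-1)/2)}$ prefactor.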

One other practical thing to consider is that in $Q_{k}(\delta),$ the actual time interval $\tau$ appearing in any particular local term $e^{- i \tau h}$ is equal to $\delta$ times a product of $\sim k/2$ parameters $p_{\alpha}$ at various levels. So if the parameters $p_{\alpha}$ are bigger than one, then for large $k$ and fixed $\delta$ we might end up needing to evolve local operators for times exponentially long in $k,$ which causes some practical issues. It is therefore useful to choose a recursion scheme so that all parameters $p_{\alpha}$ have magnitude uniformly bounded below one. A particular recursion scheme that Suzuki chooses is to set $\Omega = 5$ at each level, and to set
$$p_1 = p_2 = p_4 = p_5 = p,$$
$$p_3 = 1 - 4 p.$$
This list is symmetric, so it maintains the symmetry properties of $Q_{k}$ for all $k.$ For this list, the equation $\sum_{\alpha} p_{\alpha}^k = 0$ becomes
$$4 p^k + (1 - 4p)^{k} = 0,$$
which for $k$ odd has the real solution
$$p = \frac{1}{4 - 4^{1/k}},$$
which satisfies
$$0.41 \approx (4 - 2^{2/3})^{-1} \geq p > 1/3$$
and therefore
$$-0.66 \approx 1- 4 (4 - 2^{2/3})^{-1} \leq (1 - 4 p) < -1/3$$
for all odd $k \geq 3.$ This scheme successfully bounds $|p_{\alpha}|$ uniformly below one at every level of the recursion.
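Putting the pieces together, here is a sketch of the $\Omega = 5$ recursion described above (my own implementation, again for the toy Hamiltonian $H = X + Z$), which builds $Q_k$ from the symmetric seed and checks that each level gains two orders of accuracy:

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = X + Z

def Q(k, delta):
    """Omega = 5 Suzuki recursion for even k: Q_k is built from Q_{k-2}
    using p = 1 / (4 - 4^{1/(k-1)}), the real root of 4 p^{k-1} + (1 - 4p)^{k-1} = 0."""
    assert k >= 2 and k % 2 == 0
    if k == 2:  # symmetric seed
        return expm(-1j * delta * X / 2) @ expm(-1j * delta * Z) @ expm(-1j * delta * X / 2)
    p = 1.0 / (4.0 - 4.0 ** (1.0 / (k - 1)))
    outer = Q(k - 2, p * delta)
    inner = Q(k - 2, (1 - 4 * p) * delta)
    return outer @ outer @ inner @ outer @ outer

for k in [2, 4, 6]:
    for delta in [0.2, 0.1]:
        error = np.linalg.norm(Q(k, delta) - expm(-1j * delta * H), 2)
        print(f"k = {k}, delta = {delta}: error = {error:.2e}")
    # halving delta should shrink the error by roughly a factor of 2^(k+1)
```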
