Daniel Lidar: “Quantum Information Processing: Are We There Yet?”

MALE SPEAKER: Let’s get started. It’s my pleasure to
introduce Daniel Lidar, who’s our neighbor, so to speak. He’s a professor of physics,
electrical engineering, and chemistry at USC. And actually, he in
some ways paved the way for the Quantum AI lab to exist
because he was brave enough to start a project between
USC and Lockheed Martin to buy the first
D-Wave chip and elevate the study of the D-Wave
chip to academic levels. And Dan is an old hand in
quantum information in general. He’s been doing it for 20 years. He’s well known for his contributions to quantum error correction,
master equations, recently a lot of work in
quantum annealing. And he has his PhD from
Hebrew University in Jerusalem from the Physics
Institute, which gives me some
nostalgic feelings. I’ve worked there, as well. And I’m often joking with Daniel
that this area, the west side of LA area, is called the
Silicon Beach sometimes. And I say hey, if you work hard, maybe we can upgrade it to Niobium Beach. Silicon was yesterday. But yeah, let’s see how far we’ve gotten in this whole area. Maybe the last thing
I should mention is that, by way of
what I say, Daniel is the Director of the Center
for Quantum Information Science and Technology at USC.

DANIEL LIDAR: OK, well,
thanks, [INAUDIBLE]. And thanks, Masoud, for the
invitation to speak here. It’s a real pleasure. It’s always great to come
here for lunch first. It’s better than anywhere else. And it’s also great to see that
this idea that we discussed a long time ago about
looking into the performance of the D-Wave machine
has taken off. And it’s become a rather
popular subject nowadays, although also
somewhat polarizing. And I’ll get to that. But this talk will not
be just about D-Wave. I was asked to provide
a general introduction to quantum computing
in some sense. So I’ve kind of divided
it into two parts, where the first half will
be really easygoing. And the second half will be
a little bit more technical about the actual
results that we’ve obtained using the
D-Wave machine. So the title of my talk
is “Quantum Information Processing– Are We There Yet?” And I’d like to suggest
that quantum information processing is in the
process of arriving. We have useful, commercially
available quantum cryptography. There are in particular
two companies I’m aware of that make
quantum cryptographic devices. And now, quantum cryptography,
in theory, cannot be hacked. But in practice, it has
already been hacked. In fact, there is a
quantum hacking lab led by Vadim
Makarov at Waterloo, who has a whole page dedicated
to the hacking of these two particular
cryptographic devices. So it’s a fascinating
field with lots of promise. But it’s still in its infancy. We have quantum
simulators that are becoming more and more powerful. These simulators are designed
to simulate quantum systems. They are quantum systems that
simulate other quantum systems. And that is a real burgeoning
field, very promising. But at this point, we can still
only do small scale problems. This picture in particular is
from a paper by Chris Monroe’s group where they simulated a
paramagnetic to a ferromagnetic phase transition
using 16 trapped ions. And undoubtedly, these
systems will become larger. But at this point, they’re
still pretty small. Then, we have
quantum optimization, which is what I’ll
talk about mostly. It faces many challenges. It has reached the
largest scale so far of all quantum information
processing devices. But let me not say more
about it right now. And finally,
everybody would like to build a large-scale
universal quantum computer. The gate model is
where this all started. And there, we have
fantastic progress. But we’re still also quite far. And what sets the bar
here is, in particular, the constraints
imposed by the theory of quantum fault tolerance. Namely, we have to achieve
certain gate fidelities. And in some cases, like in the
example of the superconducting qubits produced by
John Martinis’s lab, these gate fidelities
have already exceeded, according
to certain estimates, the threshold for
fault tolerance. But still, we’re quite far from
reaching large-scale systems. So to the question of whether quantum information processing has arrived, I would say the answer is: we’re working on it. So let me, however, now
take a big step back. And in the interest of
pleasing the general audience, I would like to tell you
a little bit about why we think we need quantum
computers in the first place. So why is that an
interesting idea? Well, the first
thing we can say is that quantum computers
are just interesting. As Feynman observed
early on, we can ask what happens when we shrink
the size of components down to the size of
atoms and electrons. And of course, quantum
behavior is then dominant. And so we can ask, what happens? It’s interesting if
you build computers out of quantum components,
quantum bits. Next thing is, of course,
there are some very powerful computational
complexity motivations. There are these speed ups
that we know, or suspect, might be there. In particular, we can break
public key cryptography using Shor’s algorithm. So RSA is out the window. And Shor rules. But we can also– we suspect
that we can simulate quantum mechanics itself
exponentially faster. That goes back to the
simulators that I talked about in the previous slide. We know that we can solve linear
systems exponentially faster. Or I should say again,
we suspect we can. Even PageRank, we suspect, can be computed faster using quantum mechanics than classically. And in some cases,
there are even examples of provable speed ups. The most famous example
is quantum searching. This is Grover’s
algorithm, where we know that– we
actually really know and we can prove that there
is a quadratic speedup. Now, in addition,
there is this argument by inevitability
of why we should be interested in
quantum computing. And that’s the notion
that classical chips are going to hit a wall. So Moore’s Law, the
scaling of the density of transistors as a function
of time is exponential. And while Moore’s Law
is still holding strong, we know that already, at
some level, the party’s over. Dennard scaling, which has to do
essentially with other metrics, is already coming to an end. You see the flattening
of various other metrics, like speed and frequency
and power draw. And the response
to that has been to increase the number of cores. So nowadays, when
you buy a laptop, you certainly have
more than one core. Four, maybe even
eight is possible. So this is a kind of parallelism
that the classical computer industry has introduced. And parallelism is a good idea. We can continue this parallelism
to an extreme, like in Titan here, currently the largest
supercomputer in the US. Soon to be eclipsed,
but a huge machine. Or, instead of going the classical route, we can try to go the quantum route. And that is to exploit
the phenomenon of quantum superpositions in order to
perform quantum information processing. So how does that work? A two minute primer on qubits. So a qubit is a
quantum generalization of a classical bit. It can be 1 or 0. Or it can be in a superposition of two basis states, the 1 and the 0 states. Suppose you have three qubits. Then, in some sense, all eight of these binary combinations are simultaneously represented. And if you had n qubits, you would get two to the n binary states. And in quantum
mechanics, we know that according to the
rules of quantum mechanics, we can parallel process all
two to the n of these states. They can all be made to
evolve simultaneously with one operation. And what’s more,
the coefficients in front of these two to the n states can be complex. In particular, they can be positive or negative. So they can interfere. They can cancel out. And at some level, that is the magic of quantum computing.
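For concreteness, the bookkeeping above can be sketched in a few lines of Python. This is a purely illustrative statevector toy, not anything from the talk: n qubits are described by two to the n complex amplitudes, a single gate updates all of them at once, and the minus sign in the Hadamard is what lets amplitudes cancel.

```python
import math

n = 3
dim = 2 ** n  # 2**n amplitudes describe n qubits

# Uniform superposition: every one of the 2**n bitstrings has equal amplitude.
state = [1 / math.sqrt(dim)] * dim

def apply_hadamard(state, qubit):
    """Apply a Hadamard gate to one qubit of a statevector. Note that this
    single operation touches all 2**n amplitudes at once."""
    out = state[:]
    h = 1 / math.sqrt(2)
    for i in range(len(state)):
        if not (i >> qubit) & 1:       # pair up states differing only in `qubit`
            j = i | (1 << qubit)
            a, b = state[i], state[j]
            out[i] = h * (a + b)
            out[j] = h * (a - b)       # the minus sign lets amplitudes cancel
    return out

state = apply_hadamard(state, 0)
# Destructive interference: every amplitude where qubit 0 equals 1 cancels to 0.
```

Running this, the four amplitudes with qubit 0 equal to 1 interfere away exactly, which is the cancellation effect described above.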
But there’s a big caveat. And that is that measurement, whether it’s intentional or not, will take a superposition and collapse it at random to, let’s say, one of the basis states. And so if you evolve all two
to the n possible answers to a problem in parallel, and
then you make a measurement, you might not get
the one you wanted. You might get a random
one, which is garbage. What’s more, this can also
happen unintentionally. And this is the
process of decoherence, which is the nemesis
of quantum computing. And it’s what everybody in this
business is trying to fight. So what is decoherence? Well, every real
quantum system interacts with an environment, or a bath. And that environment is both
noisy and uncontrollable. And that means that it makes
measurements on your quantum system when you
don’t want it to. So imagine that you put
your quantum computer into a superposition
of two states. And along comes a
photon from somewhere and interacts with your system. So this photon is now
part of the environment. That is like a
measurement, which collapses the superposition. So the consequence is, one
element of your superposition will disappear. And you’re left with just
a single classical state. And that’s very bad. That happens at random. So it turns out, in
fact, there’s a theorem that a sufficiently
decoherent quantum computer can be efficiently simulated
on a classical computer. So too much decoherence
means it’s a no go. And you have to engineer
your system better, or you have to protect it using
tricks from error correction. OK, so how are qubits realized? There are many
different proposals, as many as there are
physical systems that can support two orthogonal
states, if you’d like. Basically, a qubit is any
well-defined quantum two-level system. And of the long
list, I’ve picked a few of the more popular ones. Trapped ions, for example,
where two atomic energy levels represent
the qubit there. Photon polarization: horizontal and vertical polarization could be the two
states of your qubit that can be in superposition. In the domain of
superconductivity, there are many different
proposals– charge, flux, and phase in particular. So here are some
pictures from those. And I’ll talk about the flux
qubits when we get to D-Wave. The two states of an electron,
spin up and spin down. So whether the electron is spinning clockwise or counterclockwise,
if you’d like. Same for a nucleus, and
the list goes on and on. Any two level system is good. AUDIENCE: Does it
matter whether or not they’re degenerate
eigenstates, or that they have the same energy? DANIEL LIDAR: The question
was whether it matters whether they’re
degenerate eigenstates. Degeneracy can be useful. It can help you store
information more robustly. But it’s not necessary. AUDIENCE: If they’re
not degenerate, then they evolve at
different frequencies, right? DANIEL LIDAR: Right, right. But if they’re not
degenerate, then you can also drive a transition
between them more easily. So there are advantages
here or there. So it’s a fact of this
business that in spite of 20 plus years of very
hard experimental work and theoretical work, as well,
we are still at the level where we just have a handful of
qubits in most of our quantum computing devices, quantum
information processing devices. In that sense,
quantum computation still hasn’t arrived. We still can’t really
solve– we can’t really capitalize on the promise
of quantum computing yet. It’s, I believe, an undisputed
fact that you still cannot factor nontrivial numbers using
quantum mechanics or using Shor’s algorithm faster on a
quantum device than you can classically. So in that sense,
we’re working hard. We’re making great progress,
like this chip here, which I already featured
earlier from the Martinis lab. Great gate fidelities,
but the numbers of qubits are still not a point
where we have large scale. So from that perspective,
quantum computation still hasn’t arrived. However, claims to the
contrary have been made. And in particular,
the company D-Wave has, at some points at least–
I don’t know if this is still on any of their web pages, but
it was at some point, the claim that quantum computing has
arrived with the devices that they’ve built. And these devices are not cheap. You can buy yourself one for
around $10 to $15 million. The customers are, so
far, Lockheed Martin, with a machine that we
have installed at USC, and Google, with a
machine installed at NASA Ames, some heavy
duty investors, and a lot of money raised in
venture capital. So of course, this raises
the interesting question of whether quantum
computing, in fact, has arrived with the arrival
of the D-Wave machines. And we, at USC, and people here
at Google and NASA and Lockheed Martin, as well, set
out to try to address this question of whether
the D-Wave machines have now launched the era of
quantum computing. So there are many questions. And basically, I think
these are the key questions. And so if quantum
computing has arrived, then the D-Wave machine
has got to be quantum. And if so, is it
faster than classical? If it’s not faster than
classical, but it’s quantum, then maybe there’s something
wrong, like decoherence. And then maybe we can
do something about that. We can improve it. And ultimately, what do we do with it? What is this computer good for? So let me briefly go
over some answers. And then, I’ll spend
some more time in detail on some of these answers. So first, regarding the
question of quantumness, that is not an easy yes-or-no
type answer that one can give. But at a rough level, let
me say that at this point, I believe the question
of quantumness has been answered
in the affirmative. Various techniques
have been introduced. People have come up
with classical models, and those have been
put to the test, and essentially
rejected successfully. Entanglement has been
measured in D-Wave devices, and multi-qubit
tunneling has also been measured in D-Wave devices. So these are some
pretty strong answers. In particular,
entanglement is something that you can’t
really argue with. If there’s entanglement,
then there’s quantumness. So I would dare to put a
check mark next to this “is it quantum?” question. Is it quantum enough
to give us a speedup? That is a totally
different question. And this has to do with whether
the quantum effects are or are not suppressed by decoherence. And we’ve looked
at that carefully. And it’s a work in progress. Here’s one paper on that topic. And I’ll say a few
words about it later. Now, if it’s not
quantum enough, then we can try to use error
correction tricks. And so again, we
have work on that. And I’ll definitely
spend some time talking about our error
correction results. And the answer there is,
yes, we can certainly improve the device. We don’t know yet whether we
can improve it to the point that we can actually make
it faster than classical. But we can do quite a
bit that’s beneficial using error correction. And finally, what
we can do with it, there are many
different applications. And I actually won’t spend
any time talking about that. Here’s a couple of
pointers to the literature. There are many more. OK, so what’s in the
black box, briefly? So here’s the box. If you open it, you will
find a dilution refrigerator. Inside that, you’ll find some
powerful magnetic shielding. So together, they bring the
system down, in our case at USC, to 17 millikelvin operating
temperature, and one nanoTesla in 3D across the processor. These are very good numbers. If you zoom in more, you’ll find
lots of classical control lines that feed the
signals to the chip. And the chip itself is in here. This is actually the chip,
the little white square there. And the chip consists of
superconducting flux qubits, so called RF SQUIDs. So here’s a close up
on one of the chips. Actually, on one
of the unit cells. So the chip consists
of unit cells. And each unit cell itself
contains eight qubits. And they’re arranged in
this vertical and horizontal pattern. Each qubit is an elongated
loop that carries flux. And this is superconducting
flux, superconducting current. And that current can
flow, if you’d like, clockwise and counterclockwise
simultaneously. And that is the qubit
that we’re using here. So there are eight
qubits per unit cell. And then, if you take these
unit cells and you tile them, you get the entire chip. In this case, this is the
Vesuvius D-Wave Two chip that we have at USC. It contains an eight by
eight array of unit cells. So 8 times 8 times 8,
that gives you 512 qubits. Although not all
512 qubits yielded. And in fact, on the
current chip that we have, 504 of the 512 qubits
are operational. Now, it’s useful to
draw a little diagram to understand the
connectivity of these qubits. So here are the four vertical
and four horizontal qubits. Where they intersect,
you get a coupling. That’s where the qubits
interact and talk to each other. And it’s useful to represent
that in this graphical form, where this is just a
K44 bipartite graph. So now, the circles
are the qubits. Previously, the
qubits were the lines. That’s what they
are in real life. But in this diagram, the
qubits are the circles. And the lines here
in this K44 graph are the interactions
between the qubits. So you see, every
qubit on the left is coupled to every
qubit on the right. But there are no couplings
going down on either side. OK, so how does the
chip itself hook up? Well, this is the
unit cell again. So four qubits on
the left, four qubits on the right with
their couplings. And then, the entire
chip is a tiling of eight by eight unit cells. And the connectivity
among the qubits, the intercell connectivity among
the qubits is depicted here. You can hopefully make out
that qubits on the right column couple to qubits in the right
column of the neighboring unit cell. And qubits in the
left column couple down or up to qubits
in the left column of the corresponding unit cell. So this is the so-called
chimera coupling graph of the D-Wave chip. The degree of this graph is six, at least in the bulk: every qubit is coupled to six other qubits. It’s obviously not a complete graph, which presents some challenges when you try to embed real-life problems into it. Every real-life problem has to be mapped to this particular architecture. But it’s a rich structure. And it’s powerful enough to support hard computational problems.
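Following the verbal description above (complete bipartite K4,4 coupling inside each unit cell, right columns coupling across to the neighboring cell, left columns coupling up and down), one can sketch the Chimera edge list in a few lines. The qubit-numbering scheme here is my own choice for illustration, not D-Wave’s:

```python
def chimera_edges(rows, cols, k=4):
    """Edge list of a Chimera graph: a rows x cols grid of K_{4,4} unit cells.
    Qubit id = ((r * cols + c) * 2 + side) * k + i, with side 0 = left column,
    side 1 = right column, and i the position within the column."""
    def q(r, c, side, i):
        return ((r * cols + c) * 2 + side) * k + i
    edges = []
    for r in range(rows):
        for c in range(cols):
            # Intra-cell: every left-column qubit couples to every right-column qubit.
            for i in range(k):
                for j in range(k):
                    edges.append((q(r, c, 0, i), q(r, c, 1, j)))
            # Left-column qubits couple down to the left column of the cell below.
            if r + 1 < rows:
                for i in range(k):
                    edges.append((q(r, c, 0, i), q(r + 1, c, 0, i)))
            # Right-column qubits couple across to the right column of the next cell.
            if c + 1 < cols:
                for i in range(k):
                    edges.append((q(r, c, 1, i), q(r, c + 1, 1, i)))
    return edges

edges = chimera_edges(8, 8)  # the 8x8 grid of cells in the 512-qubit layout
```

Counting edges of this sketch reproduces the degree-six bulk connectivity mentioned above: four intra-cell neighbors plus two inter-cell neighbors per qubit, with lower degree on the boundary.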
So what is the D-Wave chip, then, beyond this architectural layout? Well, in fact, it’s a special-purpose processor. It implements a type
of quantum computing called quantum annealing. It’s not universal
quantum computing. It’s designed to solve a
specific class of problems, namely optimization problems. And those optimization
problems are ones that are captured by
the classical Ising model. So what’s the
classical Ising model? Very briefly, it goes under many
names depending on the field, but in physics, this is
how we think about it. You have binary spin variables. These are now classical spins. They can be plus or minus 1. We have n of them. And every Ising problem
can be represented in terms of a Hamiltonian,
or an energy function, which is a sum of single spin terms
and a sum of two spin terms. These coefficients, the h’s and
the j’s, the h’s are actually local magnetic fields
that can be applied to the qubits in the
D-Wave chip, if you’d like. And the j’s are the
interactions among the qubits in the D-Wave chip. But here, they are simply the coefficients that define an Ising problem. Once you write down the h’s and the j’s, and you specify the connectivity, you’ve written down an Ising problem. And then, what’s the problem? The problem is to minimize this energy, this Hamiltonian. That is, to find the configuration of the spins– and I’m sorry, I switched notation here, I just realized: the s’s and the sigmas mean the same thing, the same binary variables. The problem is to find the configuration, or the values of these binary variables, which minimizes, given the h’s and the j’s, the value of this H Ising.
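As a hypothetical illustration of these definitions, here is the Ising energy and a brute-force minimizer in Python. The frustrated three-spin triangle (all couplings +1, no fields) shows why finding the minimizing configuration is nontrivial: no assignment satisfies all three couplings at once.

```python
import itertools

def ising_energy(s, h, J):
    """H(s) = sum_i h_i s_i + sum_(i,j) J_ij s_i s_j for spins s_i = +/-1."""
    e = sum(h[i] * s[i] for i in range(len(s)))
    e += sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
    return e

def brute_force_ground_state(h, J):
    """Exhaustively minimize over all 2**n spin configurations (fine for tiny n,
    hopeless at 500 spins, which is the whole point)."""
    n = len(h)
    return min(itertools.product([-1, 1], repeat=n),
               key=lambda s: ising_energy(s, h, J))

# Tiny example: three spins on a triangle with antiferromagnetic couplings.
h = [0.0, 0.0, 0.0]
J = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
s = brute_force_ground_state(h, J)
```

In this frustrated example the best achievable energy is -1 (two couplings satisfied, one violated), while the all-equal configurations sit at energy +3.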
And it turns out that this problem of finding the minimizing configuration is NP-hard. Already if you just limit yourself to setting all the h’s to 0 and all the j’s to plus or minus 1, and you put that on a non-planar graph, it turns out this problem is NP-hard. And the fact it’s NP hard
means that any problem in NP can be mapped to this one with,
at most, polynomial overhead. And so there are many very
interesting optimization problems, like traveling
salesman and satisfiability of Boolean formulas, and machine learning, which is a big deal here at
Google, of course, that can be mapped to minimizing
this Ising Hamiltonian. So it’s a very rich class
of optimization problems that are captured by this model. So how do you solve
the Ising model? How do you actually find the
minimum energy configuration? So here’s one of the workhorses
of classical techniques, classical heuristic techniques,
simulated annealing. And in simulated
annealing, essentially you imagine that there’s
some energy landscape. And what we’re trying to
find is this global minimum. That is, the spin configuration
which minimizes H Ising. And the way that’s done
is by a combination of downward moves
plus upward moves. Downward moves are dictated by
this acceptance formula, where essentially we’re trying to
minimize the energy difference. Sorry, we’re trying to
minimize the energy. So every time we make a move,
we check the energy difference. That’s delta E. And
if the energy is less, then we accept
with probability 1. And if the energy
is more, then we can make a thermal
hop over this barrier with a probability that’s
given by this Boltzmann factor. This is called the Metropolis rule. So essentially, we go downhill. And then, when we’re stuck, we don’t have to be stuck forever. There is some probability that we’ll make a hop over this hill, which is dictated by the energy difference, but also by a parameter which is like a physical temperature. But it’s really just a simulation parameter. This temperature is typically lowered gradually, so that the probability of making a thermal hop goes down. And if you’re lucky, you will find the global minimum in this manner. Lowering the temperature is called annealing, and it turns off these thermal fluctuations that allow you to hop over a barrier. So this technique, simulated annealing, is very powerful and very popular.
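A minimal sketch of the procedure just described, for the Ising energy defined earlier. The linear cooling schedule and all parameter values are my own illustrative choices, not from the talk:

```python
import math
import random

def simulated_annealing(h, J, sweeps=1000, T_hot=5.0, T_cold=0.05, seed=0):
    """Metropolis simulated annealing for an Ising energy
    H(s) = sum_i h_i s_i + sum_(i,j) J_ij s_i s_j, with spins s_i = +/-1."""
    rng = random.Random(seed)
    n = len(h)
    # Adjacency: for each spin, the couplings it participates in.
    nbrs = [[] for _ in range(n)]
    for (i, j), Jij in J.items():
        nbrs[i].append((j, Jij))
        nbrs[j].append((i, Jij))
    s = [rng.choice([-1, 1]) for _ in range(n)]
    for sweep in range(sweeps):
        # Lower the temperature gradually: this is the "annealing".
        T = T_hot + (T_cold - T_hot) * sweep / (sweeps - 1)
        for i in range(n):
            # Energy change if we flip spin i.
            dE = -2 * s[i] * (h[i] + sum(Jij * s[j] for j, Jij in nbrs[i]))
            # Metropolis rule: always accept downhill moves; accept uphill
            # (thermal) hops with probability exp(-dE / T).
            if dE <= 0 or rng.random() < math.exp(-dE / T):
                s[i] = -s[i]
    return s

# Frustrated triangle again: the best achievable energy is -1.
h = [0.0, 0.0, 0.0]
J = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
s = simulated_annealing(h, J)
```

The key design choice is the acceptance test: downhill moves are free, while uphill hops get exponentially less likely as the temperature is lowered, exactly the mechanism described above.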
OK, so now let’s go back to D-Wave after this brief introduction. And let me start to talk
about some of the actual work that we and others
have done in trying to address these questions. So first, regarding
quantumness tests, so let me tell you about
the test that we did, which was published
in this paper last year, which
attempted to distinguish the D-Wave performance
from the performance of classical algorithms, as
well as a simulation of quantum dynamics. So the strategy was to find hard
test problems for the machine. And so these test
problems were designed– this is, by the way, the
chimera graph of the D-Wave One machine, the first of
the machines that we had, with 108 functional
qubits out of 128. So what we did was we
picked random couplings, let’s say plus or minus
1, although we also considered other values, on
all the edges of this chimera graph. Now, if you have an
Ising problem that has random plus minus 1
couplings on this graph– I already mentioned
earlier, this is NP hard because this graph
is non-planar. So these are hard problems. These are not the
kind of problems you can easily
solve in your head. And we picked 1,000
different choices of these random couplings. So every time, we
picked a plus or minus 1 for all these couplings. And we did that 1,000 times. And then, we ran the
machine 1,000 times per problem instance. So we did that using
the D-Wave One. At the same time, we
also tried to solve the same problem, the same
Ising problem, the same 1,000 of these Ising problems, using
classical simulated annealing. A classical model
called spin dynamics, which I’ll describe
momentarily, and also a quantum model, so-called simulated quantum annealing, or quantum Monte Carlo,
which is essentially a description of the
quantum equilibrium state as the D-Wave
machine evolves. Or so we believe. So let me briefly talk about
this classical spin dynamics. So this is one of
the classical models that we compare the
D-Wave machine against. Every qubit is represented
as a classical spin. And we simply solve Newton’s equations of motion for that spin
with a magnetic field that mimics the
D-Wave Hamiltonian. So you have a spin. It evolves according
to an effective field. That’s essentially the same as what the D-Wave qubits feel. All right, so what’s
the idea here? The idea is to look at the
output of the D-Wave machine. If it’s a quantum machine,
then perhaps the output is going to be distinguished
from the output of these classical models. And if the quantum
dynamics model is correct, this quantum Monte
Carlo model, then it will match the output
of the D-Wave machine. So to do that,
here’s what we did. We pick one of the 1,000 specific random instances and perform 1,000 annealing runs. So we ran the machine 1,000 times, just to collect statistics. And if we find the correct ground state, the correct answer, S times out of 1,000 runs, we call S over 1,000 the success probability, P. Then, we repeat that for many instances– that’s the 1,000 different instances. And we plot the histogram of the success probabilities.
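The bookkeeping here is simple enough to sketch; the sample values below are made up for illustration, not data from the experiment:

```python
from collections import Counter

def success_probability(samples, ground_energy, energy):
    """Fraction of annealing runs that returned a true ground state:
    P = S / (number of runs)."""
    hits = sum(1 for s in samples if energy(s) == ground_energy)
    return hits / len(samples)

def histogram(probabilities, n_bins=10):
    """Bin per-instance success probabilities, as in the talk's histograms."""
    counts = Counter(min(int(p * n_bins), n_bins - 1) for p in probabilities)
    return [counts.get(b, 0) for b in range(n_bins)]

# Hypothetical per-instance probabilities: three easy instances, two hard ones.
probs = [0.95, 0.99, 0.9, 0.02, 0.05]
hist = histogram(probs)
```

A bimodal result like this toy one, with mass piled at the low and high ends and little in between, is exactly the shape discussed next.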
OK, so here, this graph is actually from the classical spin dynamics simulation. But it doesn’t matter; it’s the same idea. You have the number of instances that had a given success probability being plotted here. And so for example, if you had a certain success probability of 0.4, you grouped together all the instances of the 1,000 that had success probability 0.4 into this bin. So it’s just a histogram of the number of instances with a given success probability. Now, if you had instances
that had low success probability over here,
we call them hard. And if you had instances that
had high success probability, we call them easy. So over here are
the instances where the ground state
was found always. Over here, you have
instances where the ground state
was never found. They were hard. All right, so now, here are
the experimental results for these histograms. This is from the
D-Wave One experiment. And you see that it’s bimodal. It has a peak at low
success probability. So it finds a lot of
the instances hard. But it also finds a lot
of the instances easy. And most of the intermediate
success probabilities are not populated very much. Here’s the histogram
for the quantum model. And it looks remarkably similar. So in this sense,
the D-Wave result matches the predictions of
quantum model, quantum Monte Carlo. Here is the result from
simulated annealing. And you see that it
doesn’t match at all. So simulated annealing (which, again, is the model that inspired quantum annealing, the hypothesis we wanted to test that the D-Wave machine implements) does not match. Classical simulated annealing does not match the D-Wave experimental results. However, the spin
dynamics model, which was this model that I
talked about on this slide here. Sorry, not responding
as quickly as I’d like. So this model
here, the so called Landau-Lifshitz-Gilbert
model of spins that are evolving according
to the dynamics that is similar to that of
the D-Wave machine. But they’re classical spins. That model also has a
bimodal distribution. And so at least at a
qualitative level, the fact that it’s bimodal as
opposed to unimodal, that is a match for
the D-Wave results. So that’s a little
disappointing, perhaps. So we’re finding that
the D-Wave results are inconsistent with
the thermal annealer, the classical simulated
annealing model. But it’s consistent
with both a quantum model and this other classical
model of spin dynamics. But if you look
carefully, you see that this peak here
is way too big. And so if you look
a little deeper, instead of looking
at histograms, we could look at correlation
plots, which are more detailed. So here, every instance of the
1,000 instances that we ran is plotted on this
correlation plot. And the colors are
just the number of instances with a given
pair of probabilities. So on this axis, it’s
the probabilities of running the D-Wave
a certain– well, there’s a process
called gauge averaging. But you could think of this as
D-Wave compared against D-Wave. And so if D-Wave were
perfectly stable and perfectly reproducible, then
every instance would have the same
probability whether I ran it this way or that way. But that’s not the case. We see there’s some noise. So ideally, all
the instances would have been on this
diagonal, meaning that there’s
perfect correlation. That’s not what we’re seeing. So you see that there’s
some scatter here. Sometimes, we ran the machine, and it produced a probability of 0.4 for a given instance; run it a different time, and it produces a probability of 0.1. So there’s some scatter.
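One simple way to quantify how well two such sets of per-instance success probabilities track each other is an ordinary Pearson correlation; the run values below are made up for illustration, not the experimental data:

```python
import math

def pearson(x, y):
    """Pearson correlation between two lists of per-instance success
    probabilities; 1.0 means the runs agree perfectly, 0 means no relation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-instance probabilities from two independent sets of runs.
run_a = [0.9, 0.1, 0.5, 0.8, 0.05]
run_b = [0.85, 0.15, 0.45, 0.9, 0.1]
r = pearson(run_a, run_b)
```

A high but imperfect correlation like this is what "as noisy as D-Wave against itself" means: the scatter sets a ceiling that no candidate model can be expected to beat.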
OK, so this is the noise that is intrinsic. Now, if we compare D-Wave to simulated quantum annealing, to the quantum model,
we see that it’s noisy. But it’s just about as noisy
as D-Wave against itself. You can’t expect it to be
better than these correlations. So once again, we have
confirmation that the quantum model correlates very
well with D-Wave, about as well as D-Wave
correlates with itself. AUDIENCE: Could
you do [INAUDIBLE]? DANIEL LIDAR: Uh, sure, yeah. And it correlates well. But if you look at simulated
annealing versus D-Wave, where we already saw that the
histograms were a poor match, we now see in more detail that
the correlation plots are also a poor match. So that’s no surprise. We already had a rejection
of simulated annealing, if you’d like. But now, here’s the
spin dynamics model, the one that also has the
bimodal distribution, just like D-Wave. And you see that it actually
correlates very poorly. And you see that huge peak
of low success probability that it had right here. So this, the more
detailed analysis, rules out spin dynamics,
classical spin dynamics, as a good model of
the D-Wave machine. So what we’re left
with is a good match with simulated
quantum annealing. There’s a quantum model
that correlates well with the D-Wave machine. And there are two
classical models that we’ve been able to rule
out using this technique. And of course, you
can say, that’s not the end of the story. It couldn’t be, right? Because there are infinitely
many possible classical models. And that’s true. And in fact, after
we did this work, a paper came out which proposed
a different classical model, which was in a way a hybrid
of classical spin dynamics and simulated annealing
with Monte Carlo updates for the angles
of O(2) rotors, or of classical
spins in the plane. And it turned out that
this model actually was a fantastic match for the
output of the D-Wave machine, not only in terms
of the histogram, but also in terms of
these correlation plots. So the fact that we found a good
match with quantum dynamics, with quantum Monte
Carlo, does not prove that the dynamics of
the D-Wave machine is quantum. It just means that it
agrees with quantum. But in the standard Popperian
way of doing science, we can rule out a hypothesis. But you can never prove
anything experimentally. You can disprove. So we disproved simulated
annealing and classical spin dynamics as candidate models
for the D-Wave machine. We found a match with
quantum Monte Carlo. But now, there’s a classical
model that is also a match. So that forced us
to look even deeper. And we found– I don’t
have the references here, unfortunately– but we have
a couple of other papers where we looked more
deeply into this model. And we found that there are
other aspects that don’t match. So I won’t get into that,
in the interest of time. But it turns out that this
model, too, can be ruled out. OK, so you can play
this game forever. Somebody can now come up with
yet another classical model. And this is a good
way of doing science because these
classical models are– they capture a certain physics. And so by ruling out that this
is the physics that’s going on, we are actually learning
something very valuable. But there are other ways of
addressing this question. Not only do we have
additional quantum models that agree with the D-Wave data. We now also have– and this is
thanks to work by people here, in particular Sergio
Boixo, and the D-Wave team and many others, some of whom
are present in this room, we now have an
affirmative demonstration of the quantumness of
the D-Wave machine. And that is in terms of
entanglement and multi-qubit tunneling. These, especially
entanglement, are very hard to fake classically. The only downside
of these results is that they are for
small numbers of qubits. In particular, the
entanglement experiment is done for up to eight qubits. And in principle, I
believe the experiments can be extended to
larger numbers of qubits. But that’s very challenging. The advantage of the
approach that we pursued here is that it allowed us to address
very large numbers of qubits. More than 100, in fact,
with the D-Wave Two chip. Similar types of
experiments have been done. And we're looking at 500 qubits. OK, so that's all
I wanted to say about the story of
quantumness testing. Again, I believe that we can
put a check mark next to it. And we can move
on to the question of whether the
machine is faster. So this is the
benchmarking problem. Now, this is a busy slide. And I’m going to try to explain
it to you as best I can. It’s from this paper that
was published last year. And it shows you the
performance in terms of time to solution
for the D-Wave machine and simulated annealing as
a function of problem size. So we’re plotting square
root of the number of spins here for a technical
reason that has to do with the tree width
of the chimera graph. It doesn’t matter. So this is log of
time to solution as a function of problem size. The dashed lines are simulated annealing. The solid lines are D-Wave. And the different colors
represent percentiles. So you can think of
that as hardness. The hardest problems are at the highest percentiles: the 99th percentile corresponds to the hardest problems, and the 1st percentile to the easiest. And moreover, you
need to know what the problem parameters were. So we set the local fields to 0, and we set the couplings to plus or minus 1 at random, just like in the previous quantumness testing slides. And now, what we're
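As an aside, an instance from this benchmark class, zero local fields and couplings drawn uniformly from plus or minus 1 on the edges of the hardware graph, is easy to sketch. This is my own minimal illustration, not the actual benchmarking code; a tiny 4-spin ring stands in for the chimera graph, and the brute-force search is only there to show why scaling is the whole question.

```python
import itertools
import random

def random_ising_instance(edges, seed=0):
    """Couplings J_ij drawn uniformly from {-1, +1}; local fields h_i = 0."""
    rng = random.Random(seed)
    return {e: rng.choice([-1, 1]) for e in edges}

def ising_energy(spins, couplings):
    """E = sum_ij J_ij * s_i * s_j (a positive J is an antiferromagnetic bond)."""
    return sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())

def brute_force_ground_energy(n, couplings):
    """Exact minimum over all 2^n spin configurations -- only viable for tiny n,
    which is exactly why the scaling of heuristic solvers matters."""
    best = float("inf")
    for config in itertools.product([-1, 1], repeat=n):
        best = min(best, ising_energy(list(config), couplings))
    return best

# A small example: a 4-spin ring.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
J = random_ising_instance(edges)
print(brute_force_ground_energy(4, J))
```

Depending on the random signs, the 4-spin ring is either frustration-free (ground energy -4) or frustrated (ground energy -2).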
interested in is, what does the time to
solution scale like? So this is log versus
square root or [INAUDIBLE], if you like. So a straight line here means
that time scales exponentially in this variable, root n. And we see that these lines,
they tend to become straight. So there is exponential
scaling, which is consistent with the
hardness of these problems. But the slope is what matters. If there’s an advantage in
using the D-Wave machine, then its slope should be lower. It means it scales better. So what do we actually see? Well, we see that– let’s
compare like colors. So take green for example. Green is the median. So you see that apart from the
behavior for very small problem sizes here, there is a
nice match in the slopes. So it looks like for the median,
D-Wave and simulated annealing, if you want to extrapolate,
they scale very similarly. On the other hand, if you look
at the hard problems– that’s the black line– the reason
the D-Wave line terminates here is because it
actually couldn’t find the solutions here. And I think your
eye will tell you that this slope is somewhat
larger than the slope of the simulated
annealing black line. And in fact, if you
carefully analyze it, you find that that’s the case. So D-Wave scales worse here
than simulated annealing. On the other hand, it
seems to scale better for the easier problems,
the lower percentiles. So we might be
tempted to conclude that we see break-even performance for the median percentiles. We see worse performance
for the harder problems. And we see an advantage for
D-Wave on the easier problems. But we have to be careful
because, in fact, it turns out that these lines are
a little bit too flat. And what’s really
going on apparently is that the
annealing time, which is the time to run a single
cycle on the D-Wave machine, which happens to be 20
microseconds on this D-Wave Two machine, is too long. And that means
that it’s taking it too long to solve
the small problems. Small problems are easy. And if you just wait a long time
before you spit out the answer, well then, you’ll get an
artificially flat slope. So we have to be very
careful in interpreting this as a speed up. And if we go to
harder problems, this is where the j’s are
now chosen as multiples of plus or minus 1/6. We see, first of all,
if we compare these two, you see that the slopes
have all gone up. This is the harder
problems again. And now also, it seems
like the advantage we might have had for
the easier problems, that advantage seems to
have largely dissipated. The slopes now become
kind of similar. So overall, it
doesn’t look like, for these random
Ising problems, we see an advantage using
the D-Wave machine over simulated annealing. And simulated annealing
is not necessarily the best classical
algorithm out there. There are highly
optimized algorithms that are available that
will perform better than simulated annealing. So at this point, at
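For reference, the simulated-annealing baseline in these comparisons is, at its core, just single-spin-flip Metropolis updates under a decreasing temperature. Here is a minimal sketch of the idea, my own illustration with an arbitrary linear temperature schedule, not the optimized implementation used in the study:

```python
import math
import random

def simulated_annealing(n, couplings, sweeps=1000, t_hot=10.0, t_cold=0.1, seed=1):
    """Single-spin-flip Metropolis updates with a linearly decreasing temperature."""
    rng = random.Random(seed)
    spins = [rng.choice([-1, 1]) for _ in range(n)]
    # Precompute each spin's neighbors for O(degree) energy-difference updates.
    neighbors = {i: [] for i in range(n)}
    for (a, b), j in couplings.items():
        neighbors[a].append((b, j))
        neighbors[b].append((a, j))
    for sweep in range(sweeps):
        temp = t_hot + (t_cold - t_hot) * sweep / (sweeps - 1)
        for i in range(n):
            # Energy change from flipping spin i: dE = -2 * s_i * sum_j J_ij * s_j.
            local_field = sum(j * spins[b] for b, j in neighbors[i])
            d_e = -2 * spins[i] * local_field
            if d_e <= 0 or rng.random() < math.exp(-d_e / temp):
                spins[i] = -spins[i]
    return spins

# Tiny example: an antiferromagnetic chain of 4 spins (J = +1 on each bond).
chain = {(0, 1): 1, (1, 2): 1, (2, 3): 1}
result = simulated_annealing(4, chain)
```

In a benchmarking study one would run many independent anneals per instance, record the final energies, and estimate a success probability from them.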
least from this study, we were unable to conclude
that there was a speedup. And there are many reasons
that can come into play as to why there’s no evidence
of a speedup here. One very interesting reason
that was proposed in this paper by Katzgraber et al. is that actually, this set
of problems, the random Ising problems, they might
actually be too easy for simulated annealing. Technically, what
Katzgraber et al. pointed out is that there is
no spin glass phase, which is where the
problems become hard, until the very end in simulated
annealing, until you reduce the temperature all
the way down to zero. Roughly, this means
that these problems are in some sense too easy
for classical algorithms. And that’s why you don’t
expect there to be a speed up. And it’s fair. You have to be very
careful in the way that you choose your
problems if you’re looking for a quantum speedup. If you pick a random
problem and fed it to a perfectly working circuit
model quantum computer, in all likelihood, you
would not see a speed up. We know there’s a
speed up for factoring. That’s a very carefully
crafted problem. Random problems
don’t necessarily have to be amenable
to a speedup. So this is one
potential problem. There are other things
that might be going on. And to me, the most
interesting possibility is that what’s really going on
is that the machine is actually still too noisy. Too decoherent. And this would not be
surprising because we know from very
general considerations that quantum computing
is unlikely to work without error correction. So we have to think about
error correction carefully. This was pointed out from
the very earliest days of quantum computing. There’s always decoherence
and noise going on. I mentioned in the
beginning, when I told you about the random collapse
of the superposition. We know that error
correction is essential. And so is it possible that
this low success probability that we see in the
D-Wave output over here, this peak, which is responsible
for the scaling partly, is it possible that this
is actually correctable? Maybe we can get rid
of these low success probability events by
introducing error correction. That would be very
exciting, if we can do that. So that’s the last thing. The last topic I’d
like to talk about is the topic of
error correction. I dare say, a form
of quantum error correction even for
the D-Wave machine. And the example I’d
like to start with is a very simple one. It’s the case of
antiferromagnetic chains. Let’s see how the D-Wave
machine performs on these. So what’s an
antiferromagnetic chain? It’s simply a chain
of qubits or spins that are coupled with a
positive coupling constant. And the positive
coupling constant means that they
want to anti-align. So they want to flip-flop. And that minimizes the energy
because then the product of z sub i, which is plus or
minus 1, times z sub i plus 1 will be negative. So in this problem, there
are two ground states. The ground state of up,
down, up, down, up down, and the other ground state is
down, up, down, up, down, up. Those are the two ground states
of an antiferromagnetic chain. OK, so this is a
trivial problem. You probably solved
it in your head faster than it took me
to say all these words. Now the question
is, what happens when we run the D-Wave machine
on this kind of problem? How well does it do? And it turns out that actually,
it doesn’t do very well. So here’s the probability
of the correct answer. That is, the correct answer is
finding either one of the two ground states as a function
of the length of the chain. And you see that this
probability drops rather fast. For chains of length 80 and above or so, the probability is just about 0. It's not finding
these ground states. What’s going on? And this is a trivial problem. You might say, well, if it
can’t solve a trivial problem, how is it going to
solve a hard problem? But actually, while this is
a trivial problem for you, it’s not a trivial problem for a
machine like the D-Wave machine because of a phenomenon
known as domain walls. And so let’s say that bit number
three here flips accidentally. There was a thermal
fluctuation that flipped it. Well then, what’s going
to happen is a cascade. Energetically, it’s going
to be favorable for all the other bits to flip as well. And that’s an error. Right here is where
the errors form. This is a kink right here,
and then the rest of this is called a domain wall. So this is the kind
of state that you would get which would correspond
to an error in the output. It’s the wrong answer. It’s not the ground state. It’s in fact an excited state. And you can easily
see that these are– there’s a
combinatorial explosion of these kind of possibilities. So their entropy is very high. OK, so what can we do? We can try to error correct
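The energetics here are easy to check numerically. A minimal sketch, entirely my own illustration: the Ising energy of an antiferromagnetic chain, comparing a true ground state against a state where bit 3 and everything after it has flipped, producing a single kink.

```python
def chain_energy(spins, j=1):
    """Antiferromagnetic Ising chain: E = J * sum_i s_i * s_{i+1}, with J > 0."""
    return j * sum(spins[i] * spins[i + 1] for i in range(len(spins) - 1))

n = 8
ground = [(-1) ** i for i in range(n)]           # up, down, up, down, ...
kinked = ground[:3] + [-s for s in ground[3:]]   # bit 3 onward flipped: a domain wall

print(chain_energy(ground))  # -7: every one of the n-1 bonds is satisfied
print(chain_energy(kinked))  # -5: exactly one broken bond, costing 2 units
```

Note that the kinked state is only slightly excited, yet there is one such state for every position of the kink, which is the entropic proliferation being described.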
this type of process. And the error
correction, I’m going to glue together a bunch
of different concepts here which hopefully will be clear. So the first thing
we’re going to do is we’re going to use
a repetition code. Instead of thinking about the
spins as individual spins, or a qubit as an
individual qubit, let’s encode each
physical qubit into three. So we’re going to represent
an encoded qubit using three physical qubits. So this is going to be an
example of a repetition code. And I’ve drawn two here, because
I want you to imagine this as being part of a unit cell. And then, the next
thing we’re going to do is we’re going to add
a fourth qubit here, let’s say the red one. This is now an entire unit
cell, eight qubits total. And this fourth qubit is going
to be coupled ferromagnetically to the other three. Ferromagnetically
means that it’s going to try to keep
these three in check. So these three are going to have
to be aligned with the fourth. Same is true for the blue. These three blue ones
are going to have to be aligned with the
fourth blue one because of the ferromagnetic coupling. So this is the term
that’s being added, if you like, to the
problem Hamiltonian here. This is the penalty term. And then, we’re going to combine
the problem Hamiltonian, which is now encoded– that’s
why we have bars. The bars represent
these logical qubits. We add these two. And we stick coefficients
in front of them, which we can control. So alpha times the encoded
problem Hamiltonian plus beta times this penalty. And this is our
total new encoded-plus-penalty Ising Hamiltonian. So this is the way we're going
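In symbols, the total Hamiltonian is H = alpha * H-bar_Ising + beta * H_penalty. As a sketch of the bookkeeping, here is my own illustration (the labeling convention is mine, not D-Wave's) of how one logical coupling gets compiled into physical two-body couplings: three copies of the problem coupling, plus ferromagnetic penalty bonds tying each triple of data qubits to its fourth, penalty qubit.

```python
def encode_coupling(i, j, J, alpha, beta):
    """Compile one logical coupling J * Zbar_i * Zbar_j into physical couplings.

    Illustrative labeling: (q, c) is copy c of logical qubit q for c = 0, 1, 2,
    and (q, "p") is that logical qubit's penalty qubit.
    """
    couplings = {}
    # Encoded problem term: copy c of logical qubit i couples to copy c of j.
    for c in range(3):
        couplings[((i, c), (j, c))] = alpha * J
    # Penalty term: each data copy couples ferromagnetically (negative J in
    # this sign convention) to its logical qubit's penalty qubit.
    for q in (i, j):
        for c in range(3):
            couplings[((q, c), (q, "p"))] = -beta
    return couplings

physical = encode_coupling(0, 1, J=1, alpha=0.3, beta=1.0)
print(len(physical))  # 3 problem couplings + 6 penalty couplings = 9
```

A real compiler would of course add each logical qubit's penalty bonds only once, no matter how many logical couplings touch it; this sketch just shows that everything stays two-body.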
to represent the problem. Rather than in its
bare and naked form, we’re going to replace the
original Ising Hamiltonian by this expression here,
which has two advantages. There is a repetition
code, which we can decode. And there’s a
penalty term, which suppresses random flips of these
three data qubits over here. Because they’re forced to
align with the fourth one. So this is the encoded
problem that we’re going to run on
the D-Wave machine. And everything here is feasible
because all we’re doing is, if you look carefully,
you see that we’re just adding sigma z
times sigma z terms. We’re just adding interactions
which are available to us on the machine. And this is a big deal because
a lot of error correction strategies that are available
from the literature in quantum error correction are not
implementable on a machine like the D-Wave machine because
they would require higher-order interactions. That is, three-body or more. Here, everything's
two-body interaction. So this is actually
implementable on the D-Wave machine. And again, the fact that
we have a repetition code allows us, at the end, to
decode by a majority vote. So we just do a majority
vote on this triple here and see which
is the majority, and take that as the
answer to the problem. So what happens when we
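Decoding is as simple as it sounds. A minimal sketch, assuming each logical qubit is read out as a triple of plus or minus 1 values:

```python
def decode_majority(triples):
    """Majority-vote each triple of physical readouts into one logical spin."""
    return [1 if sum(t) > 0 else -1 for t in triples]

# One physical bit-flip per triple is corrected; two are not.
print(decode_majority([(1, 1, -1), (-1, -1, -1)]))  # [1, -1]
```

Since a sum of three plus-or-minus-1 values is always odd, the vote can never tie.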
apply this technique? Now, what you’re seeing here
is the same plot as before, the probability of
a correct ground state of the
antiferromagnetic chain as a function of chain length. And the first curve to notice
is that curve that I showed you on that first slide of the
antiferromagnetic chain. So this is the probability
as a function of chain length without doing anything,
where it dropped very fast. But now, you see that all
these other curves are higher. And they actually correspond
to different levels of doing error correction. And there are too many
curves here to explain. So I’ll just ask you
to focus on two others. This is the benchmark,
the reference. This is what we have
to do better than. Now, if you look at what
I’m calling classical, which is these triangles
pointing to the right, it’s this curve
that starts up here. And then, it goes like that. So what is classical? Classical is when all
you do is just use triples instead of a single qubit. So you encode your qubits
but forget the penalty. That’s just a pure
classical repetition code. So instead of working
with a single qubit, we’re working with triples. And then, we do majority voting. That’s that curve, classical. So of course, that does
better than doing nothing because you’re giving yourself
three chances, in a way, and that’s better. But if we also add the
energy penalty, which gives rise to– it
actually gives rise to a non-commuting
term, and so it’s a quantum effect,
this energy penalty, we get the yellow
data, the diamonds, what I’m calling QAC for
Quantum Annealing Correction. And you see that
while for short chain, it doesn’t work better than
the simple classical strategy, as the chains get long
enough, it starts to win. So this strategy of not
only encoding, but also adding an energy penalty, which
locks the qubits into place, and again, there’s a quantum
effect going on there because of non-commutativity,
this strategy is the best of all the
strategies that we’ve explored. Now, how well does
it actually work? So here, I’m looking at
the probability of success as a function of this problem
energy parameter, alpha. Remember, that’s the coefficient
in front of the encoded Ising Hamiltonian. Think of it as an inverse temperature: this is probability as a function of inverse temperature. So temperature is
high over here. Temperature is low over there. And this is for a fixed
chain of length 86. That’s the longest encoding
chain we could fit on the chip. So these are the
hardest problems. They’re the longest chains. So success probability
as a function of effective temperature: hot temperatures mean the success probability is low, and cold temperatures mean the success probability is high. But the interesting thing is
that the unprotected chain, where we don’t do any
encoding or energy penalties, is down here. And you see that when
this problem energy scale or the temperature
is sufficiently high, the probability drops to zero. Yet, if we do our quantum
annealing correction, we can actually kind
of revive it from zero to some small value. And the revival
happens everywhere. At any scale of the problem
energy, or at any temperature, effective temperature,
we get an improvement by using this quantum
annealing correction. And we always do better than
the simple classical strategy, which are these triangles here. So adding the energy penalty, in
addition to doing the majority vote on encoded qubits, is the
strategy that always does best. And the relative
improvement gets better as you decrease the
problem energy scale or as you increase the
effective temperature. So we win more as the
noise gets higher. And that’s very good. So the last couple
slides, I just want to very briefly
show you what happens when we apply this error
correction technique to not chains, which are
intrinsically uninteresting as an optimization
problem, but what happens when we apply
the error correction technique to hard problems,
actual optimization problems. So this is the chimera
graph of the D-Wave chip after you encode
it using this code. We started from a
degree six graph, and it reduces down to a degree
three graph, as it turns out. It’s not hard to see that. So what we do is very similar
to our benchmarking studies. We pick 1,000 random problem
instances over this graph. We use couplings that are
multiples of plus or minus 1/6. So those are fairly hard
optimization problems. And then, we do the same
thing as we did earlier with benchmarking. We run each of these
instances 1,000 times at the lowest annealing time
available of 20 microseconds, compute the empirical
success probability, infer the time to solution. And now, we’re going
to compare the results of the classical repetition
code to the quantum annealing correction strategy
on these problems. And what we find is
this is effectively the time to solution. Or if you’d like, it’s
actually the number of repetitions required,
how many times you’d have to run the machine to
succeed with probability 99%, again as a function
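The conventional way to turn an empirical per-run success probability p into a "repetitions to reach 99% total success" figure is R = ln(1 - 0.99) / ln(1 - p); I believe this is the standard definition used in these benchmarking studies, and the time to solution is then R times the annealing time. A sketch:

```python
import math

def repetitions_to_99(p):
    """Independent runs needed so that at least one succeeds with
    probability 0.99, given per-run success probability p."""
    if p <= 0:
        return float("inf")   # never succeeds
    if p >= 0.99:
        return 1              # one run already suffices
    return math.ceil(math.log(1 - 0.99) / math.log(1 - p))

print(repetitions_to_99(0.5))   # 7
print(repetitions_to_99(0.01))  # 459
```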
of problem size. And now, what we’re comparing
is not simulated annealing, but rather this classical
error correction strategy. That’s C. That’s the solid
lines to the quantum annealing correction. And what we find is that for
the different percentiles, consistently across the
different percentiles, the quantum annealing
correction strategy reduces the required
number of repetitions. That is, it increases
the success probability of finding the ground state. And the improvement grows as the problems get harder, in two ways: larger problems see a bigger improvement, and so do the higher percentiles. The higher percentile here
is the 95th percentile. The improvement is
relatively larger than for the lowest percentile. So the harder the problems,
the more this strategy is effective. And what’s particularly
important here is that the slope,
if you can associate a slope with this rather
discrete looking curve, it’s clear that the slope
is lowered by the quantum annealing correction strategy. So we’ve actually improved
the scaling of the time to solution using this strategy. And that’s exactly
the kind of effect that we’re looking for,
using error correction to improve scaling. OK, so with that,
let me summarize. There’s a few references here
which address some of the work that I talked about. And I’ve asked a few
questions about D-Wave. Is it a quantum annealer? And the evidence we have
suggests that it probably is in the sense
that it disagrees with all the classical
models that we’ve tried to test it against. And moreover, it
exhibits entanglement, at least for small
numbers of qubits. So that’s good news
for quantum annealing. Does it actually do what
it’s advertised to do? That is, does it actually
solve optimization problems using quantum annealing? Probably yes. It certainly does solve
Ising-like optimization problems. Not with 100%
success probability, for all the reasons
I told you about. Does it do it well enough
to generate a speed up? It’s too early to tell. But it’s likely that
using error correction, we'll be able to
get the performance to improve significantly. In fact, I’ve shown you some
evidence to that effect. And finally, this
is where we started. So has quantum information
processing arrived? And I would say that the
answer is undoubtedly not yet. We’re working on it. And let me thank
all of the people that I had the pleasure
to collaborate with, in no particular
order, actually. So listed here
are all the people that contributed to the work
that I told you about here. Josh Job is a grad
student in my group. Zhihui Wang was a postdoc
who is on her way to NASA. Sergio did some of this
work while he was at USC, and he’s been at
Google for a while now. Troels Ronnow, postdoc of
Mathias Troyer down here, Sergei Isakov as
well, now at Google. Dave Wecker at Microsoft, John
Martinis, who I mentioned, and Kristen Pudenz, who was
a grad student in my group, is now at Lockheed Martin. And she and postdoc
Tameen Albash did the error correction work. And also, let me thank all
the generous sponsors listed at the bottom here, and
you for your attention. [APPLAUSE] MALE SPEAKER: So we’re
a little over time, but someone had some questions. AUDIENCE: Yeah, if I understood
the QAC correction correctly, it seems that you need
four times as many qubits. DANIEL LIDAR: Yes. AUDIENCE: Is that right? So given that qubits are
such a scarce resource, this is not a practical
solution at the moment. Or maybe it is? I don’t know given the scales
of the problem, whatever. It’s not the most
desirable solution given that qubits
are so expensive. Is it possible to use fewer
extra qubits in some way? DANIEL LIDAR: Yeah. So let me start
with the first part. So error correction,
as I understand it, typically involves overhead. There’s always overhead. And most quantum
error correction involves huge overhead. In fact, if you want to go
into fault tolerance using concatenated codes, then
almost all your qubits are doing error
correction for you. And also, most of the
time, what you’re doing is error correction. So a factor of four is
actually not so bad. It’s pretty good. But the real question is, am I
better off using this technique than just running four
copies of the same problem? Because that’s
the same resource. And that’s what I showed you. That was the point of the
comparison of the QAC technique to what I call the C technique. In the C technique, we’re
just using four copies. So we want to compare
apples to apples. Same amount of resources. We’re doing better
than just four copies. Are we doing better
than no copies at all? Well, of course, we’re taking
this hit by a factor of four. So what is fair to
compare is n qubits unencoded within
n encoded qubits. And yes, we’re doing better. AUDIENCE: OK. DANIEL LIDAR: Yeah. Oh, and finally you asked,
is it possible to use fewer? No. We tried it. And you get no improvement. AUDIENCE: So what we really
need to do is then [INAUDIBLE]? DANIEL LIDAR: Yes. And that’s actually a
very viable prospect. The next generation
of the D-Wave chip that’s already operational
has more than 1,000 qubits. The one after that– in fact,
that is part of a larger chip. That’s part of a
2,000 plus qubit chip. So making qubits is
actually– they’re not as scarce as
you might think. A factor of four every year
or two is not unreasonable. AUDIENCE: [INAUDIBLE] So
in the D-Wave machine, you said that the two states are
counterclockwise and clockwise? DANIEL LIDAR: Current. AUDIENCE: Current–
is that right? DANIEL LIDAR: Mm-hmm. AUDIENCE: So what
does the phase mean? You’ve got superposition
with the complex phase. What’s the physical
interpretation? DANIEL LIDAR: Well,
the phase is– instead of thinking
about complex phase, let’s just think
about plus minus 1. That’s rich enough. It turns out, in fact, that all
the power of quantum computing is already in that. So that allows you to
create interference. So you can imagine taking a
state that is ket 0 plus ket 1, and another state that
is ket 0 minus ket 1. If you add those two
up, you interfered away the 1, the ket 1. And so that’s interference. That’s what that buys you. And that’s considered
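That cancellation is just vector addition of amplitudes. A two-line check, my own illustration using plain lists as (unnormalized) state vectors in the {|0>, |1>} basis:

```python
# Amplitude vectors in the {|0>, |1>} basis, unnormalized for clarity.
plus = [1, 1]    # |0> + |1>
minus = [1, -1]  # |0> - |1>

summed = [a + b for a, b in zip(plus, minus)]
print(summed)  # [2, 0]: the |1> amplitudes interfere away; only |0> survives
```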
to be a resource, an indispensable
resource in terms of the power of
quantum computing. AUDIENCE: [INAUDIBLE]. DANIEL LIDAR: Right, right. Another way to say
what Masoud just said is you can think of a
quantum computer involving all candidate answers in
quantum superposition. Again, if you
measure, you’re going to get possibly random garbage. So what you want to
do is amplify out of that sea of
different possibility. You want to amplify
the right one. You do that using interference. AUDIENCE: Does the D-Wave system
have a physical [INAUDIBLE] built into it? DANIEL LIDAR: So the question was whether the D-Wave machine
has built-in error correction. Well, there is a lot of
very clever engineering that goes into it,
which you could call error correction
if you’d like. There’s filtering. There’s compensation, many
engineering style tricks that are– without which, the
machine wouldn’t work at all. So on that level, I
would say that yes, there is error correction. What we’re doing is we’re using
the machine as is in order to introduce yet another level
of error correction, which is inspired by tricks
that we’ve learned from the field of quantum
information and quantum error correction, which allows us to
boost performance even more. Now, there’s
nothing in principle preventing the next design
of the D-Wave machine from incorporating
this way of doing error correction inherently. So this could be a
knob you could turn, or an option you could
choose on the next generation of the D-Wave devices. And I think that
would be a good idea. AUDIENCE: [INAUDIBLE] why aren’t
all the qubits [INAUDIBLE]? DANIEL LIDAR: Yeah. So the question was, why
aren’t all the qubits usable? Why don’t they all yield? Well, a qubit, as you saw, is
actually a macroscopic object. It’s a loop of current. Rather, it’s a metal loop. And they’re not all identical
when they come out of fab. And there’s a lot
of control circuitry that surrounds every qubit. I didn’t show those pictures. But it’s, in fact, an incredibly
complex piece of engineering. And things happen in fab. And so not everything
comes out as well as you would have hoped. In fact, they produce
thousands of chips. And only a few actually
are up to specs. And that’s fine. You can throw away
the great majority of your defective
chips, as long as you find one that’s really good. AUDIENCE: [INAUDIBLE]? DANIEL LIDAR: I’m sorry,
the order of what? AUDIENCE: The order
of [INAUDIBLE]. DANIEL LIDAR: Yeah. So the question was
whether lowering the degree of the graph,
having it as low as six, limits the types of
problems that you can solve. And so yes and no. First, no. This graph is bipartite. And it’s non-planar,
which means that you can put NP hard problems on it. You can embed NP
hard problems on it. So in terms of the
computational complexity, the answer is it doesn’t do
damage from a pure computer science perspective. However, from a
practical perspective, if you try to embed
a real life problem, like let’s say image
processing for machine learning, or traveling
salesman, or protein folding, or what not, then you take a
hit because of this low degree. So typically, the
strategy is to try to map the actual graph to
an effective, complete graph. On the complete graph,
you can embed anything. And finding that mapping
for a particular problem is, itself, hard. And there are various heuristics
that are being used for it. So in practice, the degree
six and certainly degree three that we get on the
encoded graph create problems.

