Volume 1
Michael Loceff
Foothill College
mailto:[email protected]
© 2015 Michael Loceff
All Rights Reserved
This work is licensed under the Creative Commons
Attribution-NonCommercial-NoDerivatives 4.0 International License.
To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc-nd/4.0/.
Contents
0 Introduction 21
0.1 Welcome to Volume One . . . . . . . . . . . . . . . . . . . . . . . . . 21
0.1.1 About this Volume . . . . . . . . . . . . . . . . . . . . . . . . 21
0.1.2 About this Introduction . . . . . . . . . . . . . . . . . . . . . 21
0.2 Bits and Qubits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
0.2.1 More Information than Zero and One . . . . . . . . . . . . . . 22
0.2.2 The Probabilistic Nature of Qubits . . . . . . . . . . . . . . . 22
0.2.3 Quantum Mechanics – The Tool that Tames the Beast . . . . 24
0.2.4 Sneak Peek at the Coefficients α and β . . . . . . . . . . . . . 24
0.3 The Promise of Quantum Computing . . . . . . . . . . . . . . . . . . 25
0.3.1 Early Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
0.3.2 The Role of Computer Scientists . . . . . . . . . . . . . . . . 26
0.4 The Two Sides of Quantum Computer Science . . . . . . . . . . . . . 26
0.4.1 Circuit Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
0.4.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
0.5 Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
0.6 Navigating the Topics . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1 Complex Arithmetic 30
1.1 Complex Numbers for Quantum Computing . . . . . . . . . . . . . . 30
1.2 The Field of Complex Numbers . . . . . . . . . . . . . . . . . . . . . 31
1.2.1 The Real Numbers Just Don’t Cut It . . . . . . . . . . . . . . 31
1.2.2 The Definition of C . . . . . . . . . . . . . . . . . . . . . . . . 31
1.2.3 The Complex Plane . . . . . . . . . . . . . . . . . . . . . . . . 32
1.2.4 Operations on Complex Numbers . . . . . . . . . . . . . . . . 33
1.2.5 C is a Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.3 Exploring the Complex Plane . . . . . . . . . . . . . . . . . . . . . . 35
1.3.1 Complex Numbers as Ordered Pairs of Real Numbers . . . . . 35
1.3.2 Real and Imaginary Axes and Polar Representation . . . . . . 35
1.3.3 Complex Conjugate and Modulus . . . . . . . . . . . . . . . . 36
1.4 Transcendental Functions and Their Identities . . . . . . . . . . . . . 38
1.4.1 The Complex Exponential Function Part 1: Pure Imaginary Case 38
1.4.2 Real sin() and cos() in Terms of the Complex Exponential . . 41
1.4.3 Complex Exponential Part 2: Any Complex Number . . . . . 41
1.4.4 The Complex Trigonometric Functions . . . . . . . . . . . . . 42
1.4.5 Polar Relations Expressed Using the Exponential . . . . . . . 42
1.5 Roots of Unity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
1.5.1 N Distinct Solutions to z^N = 1 . . . . . . . . . . . . . . . . . 44
1.5.2 Euler’s Identity . . . . . . . . . . . . . . . . . . . . . . . . . . 46
1.5.3 Summation Notation . . . . . . . . . . . . . . . . . . . . . . . 47
1.5.4 Summing Roots-of-Unity and the Kronecker Delta . . . . . . . 48
3 Matrices 72
3.1 Matrices in Quantum Computing . . . . . . . . . . . . . . . . . . . . 72
3.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.3 Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.3.1 Row × Column . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.3.2 Definition of Matrix Multiplication . . . . . . . . . . . . . . . 74
3.3.3 Product of a Vector by a Matrix . . . . . . . . . . . . . . . . . 76
3.4 Matrix Transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.5 Matrix Addition and Scalar Multiplication . . . . . . . . . . . . . . . 78
3.6 Identity Matrix and Zero Matrix . . . . . . . . . . . . . . . . . . . . . 78
3.7 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.7.1 Determinant of a 2 × 2 Matrix . . . . . . . . . . . . . . . . . 79
3.7.2 Determinant of a 3 × 3 Matrix . . . . . . . . . . . . . . . . . 80
3.7.3 Determinant of an n × n Matrix . . . . . . . . . . . . . . . . 81
3.7.4 Determinants of Products . . . . . . . . . . . . . . . . . . . . 81
3.8 Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.9 Matrix Equations and Cramer’s Rule . . . . . . . . . . . . . . . . . . 83
3.9.1 Systems of Linear Equations . . . . . . . . . . . . . . . . . . . 83
3.9.2 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4 Hilbert Space 89
4.1 Complex Vector Spaces for Quantum Computing . . . . . . . . . . . 89
4.1.1 The Vector Nature of a Qubit . . . . . . . . . . . . . . . . . . 89
4.1.2 The Complex Nature of a Qubit . . . . . . . . . . . . . . . . . 90
4.2 The Complex Vector Space, Cn . . . . . . . . . . . . . . . . . . . . . 90
4.3 The Complex Inner Product . . . . . . . . . . . . . . . . . . . . . . . 91
4.3.1 Norm and Distance . . . . . . . . . . . . . . . . . . . . . . . . 93
4.3.2 Expansion Coefficients . . . . . . . . . . . . . . . . . . . . . . 94
4.4 Hilbert Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.4.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.4.2 Old Friends and New Acquaintances . . . . . . . . . . . . . . 98
4.4.3 Some Useful Properties of Hilbert Spaces . . . . . . . . . . . . 99
4.5 Rays, Not Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.5.1 Modeling Quantum Systems . . . . . . . . . . . . . . . . . . . 100
4.5.2 0 is not a Quantum State . . . . . . . . . . . . . . . . . . . . 103
4.5.3 Why? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.6 Almost There . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Introduction
0.2 Bits and Qubits
0.2.1 More Information than Zero and One
Classical computing is done with bits, which can be either 0 or 1, period.
Quantum computing happens in the world of quantum bits, called qubits. Until a
qubit, call it |ψ⟩, is directly or indirectly measured, it is in a state that is neither 0
nor 1, but rather a superposition of the two, expressed formally as

   |ψ⟩ = α |0⟩ + β |1⟩ .
The symbol |0⟩ corresponds to the classical "0" value, while |1⟩ is associated with a
classical value of "1." Meanwhile, the symbols α and β stand for numbers that express
how much "0" and how much "1" are present in the qubit.
We’ll make this precise shortly, but the idea that you can have a teaspoon of “0 ”
and a tablespoon of “1 ” contained in a single qubit immediately puts us on alert that
we are no longer in the world of classical computing. This eerie concept becomes all
the more magical when you consider that a qubit exists on a sub-atomic level (as a
photon or the spin state of an electron, for example), orders of magnitude smaller
than the physical embodiment of a single classical bit which requires about a million
atoms (or in research labs as few as 12).
That an infinitely small entity such as a qubit can store so much more information
than a bulky classical bit comes at a price, however.
If 100 classical one-bit memory locations are known to all hold the same value – call
it x until we know what that value is – then they all hold x = 0 or all hold x = 1. If
we measure the first location and find it to be 1, then we will have determined that
all 100 must hold a 1 (because of the assumption that all 100 locations are storing
the exact same value). Likewise, if we measure a 0, we’d know that all 100 locations
contain the value 0. Measuring the other 99 locations would confirm our conclusion.
Everything is logical.
A Quantum Experiment
Qubits are a lot more slippery. Imagine a quantum computer capable of storing
qubits. In this hypothetical we can inspect the contents of any memory location in
our computer by attaching an output meter to that location and reading a result off
the meter.
Let’s try that last experiment in our new quantum computer. We load 100 qubit
memory locations with 100 identically prepared qubits. “Identically prepared” means
that each qubit has the exact same value, call it |ψ⟩. (Never mind that I haven't
explained what the value of a qubit means; it has some meaning, and I'm asking you
to imagine that all 100 have the same value.)
Next, we use our meter to measure the first location. As in the classical case
we discover that the output meter registers either a “0 ” or a “1.” That’s already a
disappointment. We were hoping to get some science-fictiony-looking measurement
from a qubit, especially one with a name like "|ψ⟩." Never mind; we carry on. Say
the location gives us a measurement of “1.”
Summary to this point. We loaded up all 100 locations with the same qubit,
peered into the first location, and saw that it contained an ordinary “1.”
What should we expect if we measure the other 99 locations? Answer: We
have no idea what to expect.
Some of the remaining qubits may show us a “1,” others a “0,” all despite the fact
that the 100 locations initially stored identical qubit values.
This is disquieting. To our further annoyance,
• the measurement will have permanently destroyed the original state we pre-
pared, leaving it in a classical condition of either “0 ” or “1,” no more magical
superposition left in there,
• as already stated we know nothing (well, almost nothing, but that’s for another
day) about the measurement outcomes of the other 99 supposedly identically
prepared locations, and
• most bizarre of all, in certain situations, measuring the state of any one of these
qubits will cause another qubit in a different computer, room, planet or galaxy
to be modified without the benefit of wires, radio waves or time.
In other words, even when a qubit has the precise and definitive value |ψ⟩ = α |0⟩ + β |1⟩,
a measurement of it produces an unpredictable "0" or "1"; only the probabilities of the
two outcomes are fixed.
This is far from the whole story as we’ll learn in our very first lecture, but it gives
you a feel for how the probabilistic nature of the quantum world can be both slippery
and quantitative at the same time. We don’t know what we’ll get when we query a
quantum memory register, but we do know what the probabilities will be.
0.3.2 The Role of Computer Scientists
Quantum computers don’t exist yet. There is production grade hardware that appears
to leverage quantum behavior but does not exhibit the simple qubit processing needs
of the early – or indeed most of the current – quantum algorithms in computer
science. On the other hand, many university rigs possess the “right stuff” for quantum
algorithms, but they are years away from having the stability and/or size to appear
in manufactured form.
The engineers and physicists are doing their part.
The wonderful news for us computer scientists is that we don’t have to wait.
Regardless of what the hardware ultimately looks like, we already know what it
will do. That’s because it is based on the most fundamental, firmly established and –
despite my scary sounding lead-in – surprisingly simple quantum mechanics. We know
what a qubit is, how a quantum logic gate will affect it, and what the consequences of
reading qubit registers are. There is nothing preventing us from designing algorithms
right now.
• Circuit Design. We know what the individual components will be, even if
they don’t exist yet. So we must gain some understanding and proficiency in
the assembly of these parts to produce full circuits.
Classical logic gates are relatively easy to understand. An AND gate, for example,
has a common symbol and straightforward truth table that defines it:
   x   y   x∧y
   0   0    0
   0   1    0
   1   0    0
   1   1    1
You were introduced to logic like this in your first computer science class. After about
20 minutes of practice with various input combinations, you likely absorbed the full
meaning of the AND gate without serious incident.
Quantum
A quantum logic gate requires significant vocabulary and symbolism to even define,
never mind apply. If you promise not to panic, I’ll give you a peek. Of course, you’ll
be trained in all the math and quantum mechanics in this course before we define
such a circuit officially. By then, you’ll be eating quantum logic for breakfast.
We’ll take the example of something called a second order Hadamard gate. We
would start by first considering the second order qubit on which the gate operates.
Such a thing is symbolized using the mysterious notation and a column of numbers,
   |ψ⟩² = (α, β, γ, δ)t .
Next, we would send this qubit through the Hadamard gate using the symbolism
   |ψ⟩²  →  [ H⊗2 ]  →  H⊗2 |ψ⟩²

Although it means little to us at this stage, the diagram shows the qubit |ψ⟩² entering
the Hadamard gate, and another, H⊗2 |ψ⟩², coming out.
Finally, rather than a truth table, we will need a matrix to describe the behavior of
the gate. Its action on our qubit would be the result of matrix multiplication (another
topic we'll cover if you haven't had it),

                         [ 1   1   1   1 ] [ α ]         [ α + β + γ + δ ]
   H⊗2 |ψ⟩²  =  (1/2)    [ 1  −1   1  −1 ] [ β ]  = (1/2) [ α − β + γ − δ ]
                         [ 1   1  −1  −1 ] [ γ ]         [ α + β − γ − δ ]
                         [ 1  −1  −1   1 ] [ δ ]         [ α − β − γ + δ ] .
Again, we see that there is a lot of unlearned symbolism, and not the kind that can be
explained in a few minutes or hours. We’ll need weeks. But the weeks will be packed
with exciting and useful information that you can apply to all areas of engineering
and science, not just quantum computing.
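For readers who like to see numbers, here is a minimal sketch of that matrix-times-column arithmetic. It assumes Python with NumPy (neither is required by the course), and the coefficient values are made up purely for illustration:

```python
import numpy as np

# The second order Hadamard matrix shown above: a 4x4 array of +/-1, scaled by 1/2.
H2 = 0.5 * np.array([[1,  1,  1,  1],
                     [1, -1,  1, -1],
                     [1,  1, -1, -1],
                     [1, -1, -1,  1]])

# A made-up column of four complex coefficients (alpha, beta, gamma, delta).
psi = np.array([0.5, 0.5j, -0.5, 0.5j])

# The gate's action is ordinary matrix multiplication: H2 |psi>.
print(H2 @ psi)
```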
0.4.2 Algorithms
In quantum computing, we first design a small circuit using the components that are
(or will one day become) available to us. An example of such a circuit in diagram
form (with no explanation offered today) is
[Circuit diagram: an upper n-qubit register |0⟩ⁿ (labeled "actual") passes through H⊗n, then the oracle Uf, then a second H⊗n; a lower register |0⟩ⁿ (labeled "conceptual") also feeds Uf. Access points A, B and C mark vertical cuts through the circuit, left to right.]
There are access points, A, B and C, to assist with the analysis of the circuit. When
we study these circuits in a few weeks, we’ll be following the state of a qubit as it
makes its way through each access point.
Deterministic
We run the circuit one time only and measure the output.
• If we read a zero the function is constant.
• If we read any non-zero value, the function is balanced.
This will differ from a corresponding classical algorithm that requires, typically, many
evaluations of the circuit (or in computer language, many “loop passes”).
Probabilistic
Or perhaps our algorithm will be probabilistic, which means that once in a blue moon
it will yield an incorrect answer. The final steps, then, might be:
Once Again: I don’t expect you to know about time complexity or probability
yet. You’ll learn it all here.
Whether deterministic or probabilistic, we will be designing circuits and their
algorithms that can do things faster than their classical cousins.
0.5 Perspective
Quantum computing does not promise to do everything better than classical comput-
ing. In fact, the majority of our processing needs will almost certainly continue to
be met more efficiently with today’s bit-based logic. We are designing new tools for
currently unsolvable problems, not to fix things that are currently unbroken.
1. Self-Selection. The titles of chapters and sections are visible in the click-able
table of contents. You can use them to evaluate whether a set of topics is likely
to be worth your time.
2. Chapter Introductions. The first sentence of some chapters or sections may
qualify them as optional, intended for those who want more coverage of a spe-
cific topic. If any such optional component is needed in the later volumes
accompanying CS 83B or CS 83C, the student will be referred back to it.
3. Tips Found at the Course Site. Students enrolled in CS 83A at Foothill
College will have access to the course web site where weekly modules, discussion
forums and private messages will contain individualized navigation advice.
Chapter 1
Complex Arithmetic
But what are these numbers α and β? If you are careful not to take it too seriously,
you can imagine them to be numbers between 0 and 1, where small values mean less
probable – or a small dose – and larger values mean more probable – or a high dose.
So a particular qubit value, call it |ψ₀⟩, defined to be α |0⟩ + β |1⟩ with α much larger than β,
would mean a large “amount” of the classical bit 0 and a small “amount” of the clas-
sical bit 1. I’m intentionally using pedestrian terminology because it’s going to take
us a few weeks to rigorously define all this. However, I can reveal something imme-
diately: real numbers will not work for α or β. The vagaries of quantum mechanics
require that these numbers be taken from the richer pool of complex numbers.
Our quest to learn quantum computing takes us through the field of quantum
mechanics, and the first step in that effort must always be a mastery of complex
arithmetic. Today we check off that box. And even if you’ve studied it in the past,
our treatment today might include a few surprises – results we’ll be using repeatedly
like Euler’s formula, how to sum complex roots-of-unity, the complex exponential
function and polar forms. So without further ado, let’s get started.
1.2 The Field of Complex Numbers
1.2.1 The Real Numbers Just Don’t Cut It
We can only go so far in studying quantum information theory if we bind ourselves to
using only the real numbers, R. The hint that R is not adequate comes from simple
algebra. We know the solution of
x2 − 1 = 0
is x = ±1. But make the slightest revision,
x2 + 1 = 0,
and we no longer have any solutions in the real numbers. Yet we need such solutions
in physics, engineering and indeed, in every quantitative field from economics to
neurobiology.
The problem is that the real numbers do not constitute an algebraically closed field, a term
that expresses the fact that there are polynomial equations that have no solutions in that number
system. We can force the last equation to have a solution by royal decree: we declare
the number

   i ≡ √−1
to be added to R. It is called an imaginary number. Make sure you understand the
meaning here. We are not computing a square root. We are defining a new number
whose name is i, and proclaiming that it have the property that
   i² = −1 .
"i" is merely shorthand for the black-and-white pattern of scribbles, "√−1." But those
scribbles tell us the essential property of this new number, i.
This gives us a solution to the equation x2 + 1 = 0, but there are infinitely
many other equations that still don’t have solutions. It seems like we would have to
manufacture a new number for every equation that lacks a real solution (a.k.a. a real
zero in math jargon).
All of these numbers are of the form a + bi, with a and b real.
Terminology
When there is no real part (the first term in the above sum) to the complex number,
it is called purely imaginary, or just imaginary; a number like −100i, for example, is
purely imaginary.
If we take all these combinations, we have a new supersized number system, the
complex numbers, defined by

   C ≡ { a + bi | a ∈ R, b ∈ R and i = √−1 } .
Because each complex number corresponds to an ordered pair of real numbers,

   (a, b) ↔ a + bi,

we have a natural way to represent each such number on a plane, whose x-axis is
the real axis (which expresses the value a), and y-axis is the imaginary axis (which
expresses the value b) (figure 1.1). This looks a lot like the real Cartesian plane, R2 ,
but it isn’t. Both pictures are a collection of ordered pairs of real numbers, but C
is richer than R2 when we dig deeper: it has product and quotient operations, both
missing in R2 . (You’ll see.) For now, just be careful to not confuse them.
One usually uses z, c or w to denote complex numbers. Often we write the i
before the real coefficient:
z = x + iy, c = a + ib, w = u + iv
In quantum computing, our complex numbers are usually coefficients of the compu-
tational basis states – a new term for those special symbols |0⟩ and |1⟩ we have been
toying with – in which case we may use Greek letters, α or β, for the complex
numbers.
This notation emphasizes the fact that the complex numbers are scalars of the com-
plex vector space under consideration.
[Note. If terms like vector space or scalar are new to you, fear not. I’m not
officially defining them yet, and we’ll have a full lecture on them. I just want to start
exposing you to some vocabulary early.]
The criterion for two complex numbers to be equal follows the template set by two
points in R2 being equal: both coordinates must be equal. If
z = x + iy and w = u + iv,
then
z=w ⇐⇒ x=u and y = v.
Everything is built on the single rule

   i² = −1 ,

which tells us the answer to i · i. From there, we build up by assuming that the oper-
ations × and + obey the same kinds of laws (commutative, associative, distributive)
as they do for the reals. With these ground rules, we can quickly demonstrate that
the only way to define addition (subtraction) and multiplication is

   (a + ib) ± (c + id) = (a ± c) + i(b ± d)   and
   (a + ib) (c + id) = (ac − bd) + i(ad + bc) .

The rule for division can actually be derived from these,
   (a + ib) / (c + id)  =  [(a + ib) / (c + id)] · [(c − id) / (c − id)]
                        =  [(ac + bd) + i(bc − ad)] / (c² + d²)
                        =  (ac + bd)/(c² + d²) + i (bc − ad)/(c² + d²) ,   where c² + d² ≠ 0 .
A special consequence of this is the oft-cited identity

   1/i = −i .
[Exercise. Prove it.]
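Python's built-in complex type gives a quick way to spot-check both the division rule and the identity 1/i = −i. The snippet is my own illustration, not part of the text:

```python
# Check 1/i = -i.
print(1 / 1j)                          # (-0-1j), i.e. -i

# Check the division rule on (2 + 3i)/(1 - i).
a, b, c, d = 2.0, 3.0, 1.0, -1.0
direct  = complex(a, b) / complex(c, d)
formula = complex((a*c + b*d) / (c*c + d*d),
                  (b*c - a*d) / (c*c + d*d))
print(direct, formula)                 # both print (-0.5+2.5j)
```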
Addition (or subtraction) can be pictured as the vectorial sum of the two complex
numbers (see figure 1.2). However, multiplication is more easily visualized when we
get to polar coordinates, so we’ll wait a few minutes before showing that picture.
[Exercise. For the following complex numbers,

   √2 + 3i ,   1 − i ,   .05 + .002i ,   π + 2i ,   52 ,   −100i ,
(a) square them,
(b) subtract 1 − i from each of them,
(c) multiply each by 1 − i,
(d) divide each by 5,
(e) divide each by i, and
(f) divide any three of them by 1 − i. ]
[Exercise. Explain why it is not necessarily true that the square of a complex
number, z², is positive or zero – or even real, for that matter. If z² is real, does that
mean it must be non-negative?]
1.2.5 C is a Field
A number system that has addition and multiplication replete with the usual prop-
erties is called a field. What we have outlined above is the fact that C is, like R, a
field. When you have a field, you can then create a vector space over that field by
taking n-tuples of numbers from that field. Just as we have real n-dimensional vector
spaces, Rn , we can as easily create n-dimensional vector spaces over C which we call
Cn . (We have a whole lesson devoted to defining real and complex vector spaces.)
A complex number can be viewed as an ordered pair of real numbers,

   x + iy ←→ (x, y) ,

or, in polar form,

   x + iy ←→ r (cos θ + i sin θ) .

Figure 1.3: the connection between Cartesian and polar coordinates of a complex
number
Terminology
Note. Modulus will be discussed more fully in a moment. For now, |z| = r can be
taken as a definition or just terminology with the definition to follow.
If

   z = x + iy ,

then its complex conjugate (or just conjugate) is designated and defined as

   z∗ ≡ x − iy .
Geometrically, this is like reflecting z across the x (real) axis (figure 1.4).
Figure 1.4: conjugation as reflection
Examples
   (35 + 8i)∗ = 35 − 8i
   (2 − √2 i)∗ = 2 + √2 i
   (−3√2 i)∗ = 3√2 i
   (1.5π)∗ = 1.5π
It is easy to show that conjugation distributes across sums and products, i.e.,
   (w z)∗ = w∗ z∗   and
   (w + z)∗ = w∗ + z∗ .
These little factoids will come in handy when we study kets, bras and Hermitian
conjugates in a couple weeks.
[Exercise. Prove both assertions. What about quotients?]
Just as in the case of R2 , the modulus of the complex number z is the length of the line
segment (in the complex plane) from 0 to z, that is,

   |z| ≡ √( Re(z)² + Im(z)² ) = √( x² + y² ) .
Figure 1.5: modulus of a complex number
   |z|² = z z∗ = z∗ z = x² + y²   and   |z| = √( z∗ z ) .
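As a quick numerical sanity check (mine, not the author's), Python's complex type confirms these modulus relations on a sample value:

```python
z = 3 - 4j

print(abs(z) ** 2)                  # 25.0
print((z * z.conjugate()).real)     # 25.0  -> z z* equals |z|^2
print(z.real ** 2 + z.imag ** 2)    # 25.0  -> x^2 + y^2
```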
The term transcendental function has a formal definition, but for our purposes it
means functions like sin x, cos x, ex , sinh x, etc. It’s time to talk about how they are
defined and relate to complex arithmetic.
Recall that the real exponential function can be defined by its power-series
expansion or Taylor series,

   exp(x) = e^x ≡ Σ_{n=0}^{∞} x^n / n! .
This suggests that we can define a complex exponential function that has a similar
expansion, only for a complex z rather than a real x.
We start by defining a new function of a purely imaginary number, iθ, where θ is the
real angle (or arg) of the number,

   exp(iθ) = e^{iθ} ≡ Σ_{n=0}^{∞} (iθ)^n / n! .
(A detail that I am skipping is the proof that this series converges to a complex
number for all real θ. But believe me, it does.)
Let’s expand the sum, but first, an observation about increasing powers of i.
   i⁰ = 1 ,   i¹ = i ,   i² = −1 ,   i³ = −i ,   i⁴ = 1 ,   i⁵ = i ,   . . . ,   i^{n+4} = i^n .
Apply these powers of i to the infinite sum.
   e^{iθ} = 1 + iθ/1! − θ²/2! − iθ³/3! + θ⁴/4!
              + iθ⁵/5! − θ⁶/6! − iθ⁷/7! + θ⁸/8!
              + . . . .
Rearrange the terms so that all the real terms are together and all the imaginary
terms are together,
   e^{iθ} =  ( 1 − θ²/2! + θ⁴/4! − θ⁶/6! + θ⁸/8! − . . . )
           + i ( θ/1! − θ³/3! + θ⁵/5! − θ⁷/7! + . . . ) .
You may recognize the two parenthetical expressions as the Taylor series for cos θ and
sin θ, and we pause to summarize this result of profound and universal importance.
Euler's Formula

   e^{iθ} = cos θ + i sin θ ,

which leads to one of the most necessary and widely used facts in all of physics and
engineering,

   | e^{iθ} | = 1 ,   for real θ .
[Exercise. Prove this last equality without recourse to Euler’s formula, using
exponential identities alone.]
Euler’s formula tells us how to visualize the exponential of a pure imaginary. If we
think of θ as time, then eiθ is a “spec” (if you graph it in the complex plane) traveling
around the unit-circle counter-clockwise at 1 radian-per-second (see Figure 1.6). I
cannot overstate the importance of Euler’s formula and the accompanying picture.
You’ll need it in this course and almost in any other engineering study.
1.4.2 Real sin() and cos() in Terms of the Complex Exponential
Now try this: Plug −θ (minus theta) into Euler's formula, then add (or subtract) the
resulting equation to the original formula. Because of the trigonometric identities –
the so-called oddness and evenness of sin and cos, respectively – you would quickly
discover the first (or second) equality

   cos θ = ( e^{iθ} + e^{−iθ} ) / 2 ,
   sin θ = ( e^{iθ} − e^{−iθ} ) / (2i) .
These appear often in physics and engineering, and we’ll be relying on them later in
the course.
For a general complex number z = x + iy, we define the exponential to be the product

   e^z = e^{x+iy} ≡ e^x e^{iy} = e^x ( cos y + i sin y ) .

We can do this because each factor on the far right is a complex number (the first,
of course, happens to also be real), so we can take their product.
For completeness, we should note (this requires proof, not supplied) that every-
thing we have done leads to the promised Taylor expansion for e^z as a function of a
complex z, namely,

   exp(z) = e^z ≡ Σ_{n=0}^{∞} z^n / n! .
This definition implies, among other things, the correct behavior of exp(z) with regard
to addition of “exponents,” that is
   e^{z+w} = e^z e^w .
Easy Proof of Addition Law of Sines and Cosines
A consequence of all this hard work is an easy way to remember the trigonometric
addition laws. I hope you’re sitting down. Take A, B two real numbers – they can
be angles. From Euler’s formula, using θ = A + B, we get:
Because of the law of exponents we know that the LHS of the last two equations are
equal, so their RHSs must also be equal. Finally, equate the real and imaginary parts:
QED
The same series, applied now to any complex z, define the complex trigonometric functions,

   cos z = 1 − z²/2! + z⁴/4! − z⁶/6! + z⁸/8! − . . .

and

   sin z = z/1! − z³/3! + z⁵/5! − z⁷/7! + . . . .
We also get an Euler-like formula for general complex z, namely,

   e^{iz} = cos z + i sin z .

Recall, too, the Cartesian representation of a complex number,

   z = x + iy .
But from the equivalence of the Cartesian and polar coordinates of a complex number
seen in figure 1.3 and expressed by
x + iy ←→ r (cos θ + i sin θ) ,
we can use the Euler formula on the RHS to obtain the very useful
   z = r e^{iθ} .
The latter version gives us a variety of important identities. If z and w are two
complex numbers expressed in polar form,
   z = r e^{iθ}   and   w = s e^{iφ} ,

then we have

   zw = rs e^{i(θ+φ)}   and   z/w = (r/s) e^{i(θ−φ)} .
Notice how the moduli multiply or divide and the args add or subtract (figure 1.7).
In fact, that last equation is so useful, I'll restate it slightly. For any real number, θ,

   ( r e^{iθ} )∗ = r e^{−iθ} .
[Exercise. Using polar notation, find a short proof that the conjugate of a product
(quotient) is the product (quotient) of the conjugates. (This is an exercise you may
have done above using more ink.)]
A common way to use these relationships is through equivalent identities that put
the emphasis on the modulus and arg, separately,
   |zw| = |z| |w| ,
   |z/w| = |z| / |w| ,
   |z∗| = |z| ,
   arg(zw) = arg z + arg w ,
   arg(z/w) = arg z − arg w ,   and
   arg(z∗) = − arg z .
[Exercise. Verify all of the last dozen or so polar identities using the results of the
earlier sections.]
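Here is a small numeric spot-check of a few of these identities (a sketch of mine, using Python's cmath). One caveat: cmath.phase returns the arg in (−π, π], so for some inputs arg z + arg w can differ from arg(zw) by a multiple of 2π.

```python
import cmath

z = 2 * cmath.exp(1j * 0.4)    # r = 2, theta = 0.4
w = 3 * cmath.exp(1j * 1.1)    # s = 3, phi   = 1.1

print(abs(z * w), abs(z) * abs(w))                           # 6.0  6.0
print(abs(z / w), abs(z) / abs(w))                           # 0.666...  0.666...
print(cmath.phase(z * w), cmath.phase(z) + cmath.phase(w))   # 1.5  1.5
print(cmath.phase(z.conjugate()), -cmath.phase(z))           # -0.4  -0.4
```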
For any positive integer N , define

   ωN ≡ e^{2πi/N} .

A few observations:

• ωN lies somewhere on the unit circle,

• ω1 = e^{2πi} = 1,

• ω2 = e^{πi} = −1,

• ω3 = e^{2πi/3} = −1/2 + (√3/2) i,

• ω4 = e^{πi/2} = i, and

• for N > 4, as N increases (5, 6, 7, etc.), ωN marches clockwise along the upper
half of the unit circle approaching 1 (but never reaching it). For example ω1000
is almost indistinguishable from 1, just above it.
In fact, we should call this the primitive N th root-of-unity to distinguish it from its
siblings which we’ll meet in a moment. Finally, we can see that ωN is a non-real
(when N > 2) solution to the equation
   z^N = 1 .
Often, when we are using the same N and ωN for many pages, we omit the subscript
and use the simpler

   ω ≡ e^{2πi/N} .

Its powers, 1, ω, ω², . . . , ω^{N−1} (see Figure 1.8), are also N th roots-of-unity,
generated by taking powers of the primitive N th root.
[Exercise. Why are they called N th roots-of-unity? Hint: Raise any one of them
to the N th power.]
Figure 1.8: the fifth roots-of-unity
For our purposes, we’ll consider the primitive N th root-of-unity to be the one
that has the smallest arg (angle), namely, e^{2πi/N}, and we'll call all the other powers,
ordinary, non-primitive roots. However, this is not how mathematicians would make
the distinction, and does not match the actual definition which includes some of the
other roots, a subtlety we can safely overlook.
As you can see, all N of the N th roots are the vertices of a regular polygon
inscribed in the unit circle, with one vertex anchored at the real number 1. This is a
fact about roots-of-unity that you’ll want to remember. And now we have met all N
of the distinct solutions to the equation z^N = 1.
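To see the polygon for yourself, it takes only a few lines to generate the N th roots numerically and confirm that each one solves z^N = 1 (my own sketch, using Python's cmath module):

```python
import cmath

N = 5
omega = cmath.exp(2j * cmath.pi / N)      # primitive N-th root of unity
roots = [omega ** k for k in range(N)]    # 1, omega, omega^2, ..., omega^(N-1)

for z in roots:
    print(z, abs(z ** N - 1) < 1e-9)      # each root satisfies z^N = 1
```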
Three special cases are worth committing to memory:

   e^{iπ/2} = i ,
   e^{2πi} = 1 ,   and
   e^{πi} = −1 .
(See Figure 1.9.) That last equation is also known as Euler’s identity (distinct from
Euler’s formula). Sometimes it is written in the form
   e^{πi} + 1 = 0 .
Figure 1.9: Three of the fourth roots-of-unity (find the fourth)
Take a moment to ponder this amazing relationship between the constants, i, e and
π.
The index starting the sum is indicated below the large Greek Sigma, (Σ), and the
final index in the sum is placed above it. For example, if we wanted to start the sum
from a0 and end at an−1 , we would write
   Σ_{k=0}^{n−1} ak .
The limits can be whatever we need them to be; the start of the sum can be anything
we want it to be: 0, 1, −5 or even −∞.
[Exercise. Write out the sums

   1 + 2 + 3 + . . . + 1999 ,   and
   0 + 2 + 4 + . . . + 2N

using Σ notation.]
The N distinct N th roots-of-unity,

   ω, ω², . . . , ω^{N−1}, ω^N = ω⁰ = 1 ,

are precisely the solutions of

   z^N − 1 = 0 ,

an equation which can also be written in the factored form

   (z − 1)(z − ω)(z − ω²) · · · (z − ω^{N−1}) = 0 .
To see this, just plug any of the N th roots into this equation and see what happens.
Any two monic polynomials of the same degree that have exactly the same roots are equal, so

   z^N − 1 = (z − 1)(z − ω)(z − ω²) · · · (z − ω^{N−1}) .

By high school algebra, you can easily verify that the LHS can also be factored to

   z^N − 1 = (z − 1)(z^{N−1} + z^{N−2} + . . . + z² + z + 1) .

We can therefore equate the RHSs of the last two equations (and divide out the
common (z − 1)) to yield

   z^{N−1} + . . . + z + 1 = (z − ω)(z − ω²) · · · (z − ω^{N−1}) .
This equation has a number of easily confirmed consequences that we will use going
forward. We list each one as an exercise.
Exercises
Hint: Prove that, for all l, ω^{l+N} = ω^l , and apply the last result.

Hint: Add (or subtract) an integral multiple of N to (or from) l to bring it into
the interval [−N, N ) and call l′ the new value of l. Argue that ω^{l′(N−2)} = ω^{l(N−2)} ,
so this doesn't change the value of the sum. Finally, apply the last result.
Kronecker Delta
All of these exercises have a common theme: When we add all of the N th roots-
of-unity raised to certain powers, there is a massive cancellation causing the sum
to be zero except for special cases. I'll repeat the last result using the Kronecker delta
symbol, as this will be very important in some upcoming algorithms. First, what is
the "Kronecker delta"? It is simply

   δjm ≡ 1 if j = m, and 0 if j ≠ m.

You will see the Kronecker delta throughout the course (and beyond), starting immediately,
as I rewrite result (d) using it.
   Σ_{k=0}^{N−1} ω^{(j−m)k} = N δjm .
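The cancellation is easy to watch numerically. This check (mine, not the author's, in Python) sums the powers of ω for a matching and a non-matching pair of indices:

```python
import cmath

N = 8
omega = cmath.exp(2j * cmath.pi / N)

def root_sum(j, m):
    """Sum_{k=0}^{N-1} omega^((j-m)k): should be N when j == m, else (about) 0."""
    return sum(omega ** ((j - m) * k) for k in range(N))

print(root_sum(3, 3))   # (8+0j)                  -> N * delta_{jm} with j == m
print(root_sum(3, 5))   # ~(0+0j), tiny roundoff  -> N * delta_{jm} with j != m
```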
Our flight is officially off the ground. We’ve learned or reviewed everything needed
about complex arithmetic to sustain ourselves through the unusual terrain of quantum
computing. There’s a second subject that we will want to master, and that’ll be our
first fly-over. Look out the window, as it should be coming into view now: linear
algebra.
Chapter 2
We pick up the qubit expression |ψ⟩ = α |0⟩ + β |1⟩ again,
where last lecture revealed two of the symbols, α and β, to be complex numbers.
Today, we add a little more information about it.
Every vector lives in a world of other vectors, all of which bear certain similarities.
That world is called a vector space, and two of the similarities that all vectors in any
given vector space share are
• the kinds of ordinary numbers – or scalars – that support the vector space
operations (i.e., does it use real numbers? complex numbers? the tiny set of
integers {0, 1}?), and

• the dimension of the space.
In this lecture we’ll restrict our study to real vector spaces – those whose scalars
are the real numbers. We’ll get to complex vector spaces in a couple days. As for
dimension, we’ll start by focusing on two-dimensional vector spaces, and follow that
by meeting some higher dimensional spaces.
2.2 Vectors and Vector Spaces
R2 , sometimes referred to as Euclidean 2-space, will be our poster child for vector
spaces, and we’ll see that everything we learn about R2 applies equally well to higher
dimensional vector spaces like R3 (three-dimensional), R4 (four-dimensional) or Rn
(n-dimensional, for any positive integer, n).
With that overview, let’s get back down to earth and define the two-dimensional
real vector space R2 .
The Objects
A vector space requires two sets of things to make sense: the scalars and the vectors.
Scalars
A vector space is based on some number system. For example, R2 is built on the
real numbers, R. These are the scalars of the vector space. In math lingo the scalars
are referred to as the underlying field.
Vectors
The other set of objects that constitute a vector space are the vectors. In the case
of R2 they are ordered pairs,
   r = (3, −7)t ,   a = (500, 1 + π)t ,   x̂ = (1, 0)t ,   ŷ = (0, 1)t .
You’ll note that I use boldface to name the vectors. That’s to help distinguish a
vector variable name like r, x or v, from a scalar name, like a, x or α. Also, we will
usually consider vectors to be written as columns, like (3, −7)t, not rows, like (3, −7),
although this varies by author and context.
A more formal and complete description of the vectors in R2 is provided using set
notation,
   R2 ≡ { (x, y)t | x, y ∈ R } .
(See figure 2.1.) This is somewhat incomplete, though, because it only tells what the
vectors are; the rules they must obey come next.
Figure 2.1: A vector in R2
The Rules
Two operations and their properties are required for the above objects.
Vector Addition
We must be able to combine two vectors to produce a third vector,
   v + w ↦ u .
• Zero Vector. Every vector space must have a unique zero vector, denoted by
boldface 0, which has the property that v + 0 = 0 + v = v, for all v in
the space. For R2 the zero vector is (0, 0)t .
Figure 2.2: Vector addition in R2
Figure 2.3: Scalar multiplication in R2
When a vector space has this feature, it provides a way to multiply two vectors in
order to produce a scalar,
   v · w ↦ c .
In R2 this is called the dot product, but in other contexts it may be referred to as an
inner product. There can be a difference between a dot product and an inner product
(in complex vector spaces, for example), so don’t assume the terms are synonymous.
However, for R2 they are, with both defined by
   (x1 , y1 )t · (x2 , y2 )t = x1 x2 + y1 y2 .
Inner products can be defined differently in different vector spaces. However they
are defined, they must obey certain properties to get the title inner or dot product.
I won’t burden you with them all, but one that is very common is a distributive
property.
v · ( w1 + w2 ) = v · w1 + v · w2 .
[Exercise. Look-up and list another property that an inner product must obey.]
[Exercise. Prove that the dot product in R2 , as defined above, obeys the dis-
tributive property.]
When we get to Hilbert spaces, there will be more to say about this.
An inner product, when present in a vector space, endows each vector with a length
(a.k.a. modulus or norm), denoted by either |v| or ||v|| and defined by |v| ≡ √(v · v).
The length (in most situations, including ours) of each vector must be a non-negative
real number, even for the complex vector spaces coming later. So,

   |v| ≥ 0 .
Orthogonality
When the dot product of two vectors, v and w, is zero,

   v · w = 0 ,

we say that they are orthogonal or mutually perpendicular. In the relatively visu-
alizable spaces R2 and R3 , we can imagine the line segments 0 v and 0 w forming
right-angles with one another. (See figure 2.4.)
Examples of Orthogonal Vectors. A well-known orthogonal pair is

   (1, 0)t ,   (0, 1)t .
Another is

   (1, 1)t ,   (−1, 1)t .
[Exercise. Use the definition of dot product to show that each of the two sets of
vectors, listed above, is orthogonal.]
Examples of Non-Orthogonal Vectors. An example of a set that is not
orthogonal is

   (1, 0)t ,   (1, 1)t .
More Vocabulary. If a set of vectors is both orthogonal and each vector in the
set has unit length (norm = 1), it is called orthonormal.
The set

   (1, 0)t ,   (0, 1)t

is orthonormal, for example.
• A length seems like it should always be non-negative, but this won’t be true in
some common vector spaces of first-year physics (hint: special relativity).
These two unexpected situations suggest that we need a term to distinguish the
typical, well-behaved inner products from those that are off-beat or oddball. That
term is positive definiteness.
Positive Definite Property. An inner-product is usually required to be positive
definite, meaning
• v · v ≥ 0, and
• v ≠ 0 ⇒ ||v|| > 0 .
When a proposed inner product fails to meet these conditions, it is often not granted
the status inner (or dot) product but is instead called a pairing. When we come
across a pairing, I’ll call it to your attention, and we can take appropriate action.
[Exercise. Assume
   r = (3, −7)t ,   a = (500, 1 + π)t   and   x̂ = (1, 0)t .
(c) Prove that the only vector in R2 which has zero norm is the zero vector, 0.
(d) For each vector, find an (i ) orthogonal and (ii ) non-orthogonal companion.]
   R3 ≡ { (x, y, z)t | x, y, z ∈ R } .
(This only tells us what the vectors are, but I’m telling you now that the scalars
continue to be R.)
Examples of vectors in R3 are
   (2, −1, 3.5)t ,   (0, 1, 0)t ,   (π/2, 100, −9)t .
It’s harder to graph these objects, but it can be done using three-D sketches (See
figure 2.5.).
[Exercise. Repeat some of the examples and exercises we did for R2 for this
richer vector space. In particular, define vector addition, scalar multiplication, dot
products, etc.]
Given two vectors, v and w, in our vector space,
and two scalars, a, b, we can form the vector u by taking

   u = a v + b w .
Mathematicians call it a linear combination of v and w. Physicists call it a superpo-
sition of the two vectors. Superposition or linear combination, the idea is the same.
We are ”weighting” each of the vectors, v and w, by scalar weights, a and b, respec-
tively, then adding the results. In a sense, the two scalars tell the relative amounts
of each vector that we want in our result. (However, if you lean too heavily on that
metaphor, you will find yourself doing some fast talking when your students ask you
about negative numbers and complex scalars, so don’t take it too far.)
The concept extends to sets containing more than two vectors. Say we have a
finite set of n vectors {vk } and corresponding scalars {ck }. Now a linear combination
would be expressed either long-hand or using summation notation,
   u = c0 v0 + c1 v1 + . . . + cn−1 vn−1 = Σ_{k=0}^{n−1} ck vk .
In R2 , only two vectors are needed to produce – through linear combination – all the
rest. The most famous basis for this space is the standard (or natural or preferred )
basis, which I’ll call A for now,
   A = {x̂, ŷ} = { (1, 0)t , (0, 1)t } .
For example, the vector (15, 3)t can be expressed as the linear combination
   (15, 3)t = 15 x̂ + 3 ŷ .
Note. In the diagram that follows, the vector pictured is not intended to be (15, 3)t .
(See figure 2.6.)
Properties of a Basis
Hang on, though, because we haven’t actually defined a basis yet; we only saw an
example of one. A basis has to have two properties called (i ) linear independence and
(ii ) completeness (or spanning).
Since we can express (3, 2)t as

   (3, 2)t = 3 x̂ + 2 ŷ ,
Since we cannot express (3, −1, 2)t as a linear combination of these two, i.e.,
   (3, −1, 2)t ≠ x (1, 0, 0)t + y (0, 1, 0)t

for any x, y ∈ R, A′′ does not span the set of vectors in R3 . In other words, A′′
is not complete.
[Exercise. Find two vectors, such that if either one (individually) were added to
A′′, the augmented set would span the space.]

[Exercise. Find two vectors, such that, if either one were added to A′′, the
augmented set would still fail to be a spanning set.]
Definition of Basis
To see why an orthonormal set must be linearly independent, suppose to the contrary
that one of its vectors, say v0 , could be written as a combination of the others,

   v0 = Σ_{k=1}^{n−1} ck vk ,

where not all the ck can be 0, since that would imply that v0 = 0, which cannot
be a member of any orthonormal set (remember from earlier?). By orthonormality
v0 · vk = 0 for all k ≠ 0, but of course v0 · v0 = 1, so we get the following chain of
equalities:

   1 = v0 · v0 = v0 · ( Σ_{k=1}^{n−1} ck vk ) = Σ_{k=1}^{n−1} ck (v0 · vk ) = 0 ,
a contradiction. QED
Notice that even if the vectors were merely orthogonal – a weaker condition than
orthonormal – they would still have to be linearly independent.
Alternate Bases
There are many different pairs of vectors in R2 which can be used as a basis. Every
basis has exactly two vectors (by our theorem). Here is an alternate basis for R2 :
   B = {b0 , b1 } = { (1, 1)t , (4, −1)t } .
For example, the vector (15, 3)t can be expressed as the linear combination
   (15, 3)t = (27/5) b0 + (12/5) b1 .
[Exercise. Multiply this out to verify that the coefficients, 27/5 and 12/5 work for
that vector and basis.]
And here is yet a third basis for R2 :
   C = {c0 , c1 } = { (√2/2, √2/2)t , (−√2/2, √2/2)t } .
Figure 2.8: A vector expressed as linear combination of c0 and c1
Expanding Along a Basis. When we want to call attention to the specific basis
we are using to express (i.e., to produce using the wrench of linear combination) a
vector, v, we would say we are expanding v along the so-and-so basis, e.g., ”we are
expanding v along the natural basis,” or ”let’s expand v along the B basis.”
Orthonormal Bases
We are particularly interested in bases whose vectors have unit length and are mutu-
ally perpendicular. Not surprisingly, because of the definition of orthonormal vectors,
above, such bases are called orthonormal bases.
A is orthonormal: x̂ · x̂ = ŷ · ŷ = 1, x̂ · ŷ = 0
B is not orthonormal: b0 · b0 = 2 ≠ 1, b0 · b1 = 3 ≠ 0
C is orthonormal: ([Exercise])
If they are mutually perpendicular but do not have unit length, they are almost as
useful. Such a basis is called an orthogonal basis. If you get a basis to be orthogonal,
your hard work is done; you simply divide each basis vector by its norm in order to
make it orthonormal.
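Checking a proposed basis against this definition is just a handful of dot products; the sketch below (mine, assuming NumPy) tests the B and C bases from this section:

```python
import numpy as np

def is_orthonormal(vectors, tol=1e-12):
    """True if every pair of distinct vectors is orthogonal and each has norm 1."""
    for j, v in enumerate(vectors):
        for k, w in enumerate(vectors):
            target = 1.0 if j == k else 0.0
            if abs(np.dot(v, w) - target) > tol:
                return False
    return True

B = [np.array([1.0, 1.0]), np.array([4.0, -1.0])]
C = [np.array([np.sqrt(2)/2, np.sqrt(2)/2]), np.array([-np.sqrt(2)/2, np.sqrt(2)/2])]

print(is_orthonormal(B))   # False: b0 . b0 = 2 and b0 . b1 = 3
print(is_orthonormal(C))   # True
```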
If we expand a vector v along the natural basis,

   v = vx x̂ + vy ŷ ,

its coordinates relative to the natural basis are vx and vy . In other words, the coordinates
are just weighting factors needed to expand that vector in that given basis.
Vocabulary. Sometimes the term coefficient is used instead of coordinate.
If we have a different basis, like B = {b0 , b1 }, and we expand v along that basis,
v = v0 b0 + v1 b1 ,
then its coordinates (coefficients) relative to the B basis are v0 and v1 .
As you see, the same vector, v, will have different coordinates relative to different
bases. However, the ordered pair that describes the pure object – the vector, (x, y)t ,
itself – does not change.
For some vector spaces, R2 being a prime example, it is easy to confuse the
coefficients with the actual vector. Relative to the natural (standard ) basis, the
coordinates happen to be the same as the numbers in the vector itself, so the vector
and the coordinates, when written in parentheses, are indistinguishable. If we need
to clarify that we are expressing coordinates of a vector, and not the raw vector itself,
we can label the ordered pair appropriately. Such a label would be the name of the
basis being used. It’s easier to do than it is to say. Here is the vector (15, 3)t , first
expanded along the natural basis,
   (15, 3)t = 15 x̂ + 3 ŷ ,

and now shown along with its coordinates in the standard basis,

   (15, 3)t = (15, 3)t_A .
For the non-preferred bases of the previous section, the coordinates for (15, 3), simi-
larly labeled, are written as
   (15, 3)t = (27/5, 12/5)t_B = (9√2, −6√2)t_C .
In physics, it is common to use the term expansion coefficients to mean the coordinates
relative to some basis. This terminology seems to be used especially if the basis
happens to be orthonormal, although you may see the term used even if it isn’t.
For example,

   (15, 3)t = 9√2 c0 + (−6√2) c1 ,

so 9√2 and −6√2 are the "expansion coefficients" of (15, 3)t
along the C basis.
If our basis happens to be orthonormal, there is a special trick we can use to find
the coordinates – or expansion coefficients – relative to that orthonormal basis.
Let’s say we would like to expand v along the C basis, but we only know it in the
natural basis, A . We also know the C basis vectors, c0 and c1 , in the natural basis
A. In other words, we would like, but don’t yet have,
v = α0 c0 + α1 c1 ,
i.e., we don’t know the scalar weights α0 , α1 . An easy way to find the αk is to ”dot”
v with the two C basis vectors. We demonstrate why this works by computing α0 :
   c0 · v = c0 · (α0 c0 + α1 c1)
          = c0 · (α0 c0) + c0 · (α1 c1)
          = α0 (c0 · c0) + α1 (c0 · c1)
          = α0 (1) + α1 (0) = α0
[Exercise. Justify each equality in this last derivation using the axioms of vector
space and assumption of orthonormality of C.]
[Exercise. This trick works almost as well with an orthogonal basis which does
not happen to be orthonormal. We just have to add a step; when computing the
expansion coefficient for the basis vector, ck , we must divide the dot product by |ck |2 .
Prove this and give an example.]
Thus, dotting by c0 produced the expansion coefficient α0 . Likewise, to find α1
just dot the v with c1 .
For the specific vector v = (15, 3)t and the basis C, let’s verify that this actually
works for, say, the 0th expansion coefficient:
   c0 · v = (√2/2, √2/2)t · (15, 3)t = 15√2/2 + 3√2/2 = 9√2   ✓
The reason we could add the check-mark is that this agrees with the expression we
had earlier for the vector v expanded along the C basis.
Remark. We actually did not have to know things in terms of the natural basis
A in order for this to work. If we had known the coordinates of v in some other basis
(it doesn’t even have to be orthonormal), say B, and we also knew coordinates of the
C basis vectors with respect to B, then we could have done the same thing.
[Exercise. If you’re up for it, prove this.]
Now that we know a vector can have different coordinates relative to different bases, we
ask “is the inner product formula that I gave independent of basis?,” i.e., can we
use the coordinates, (dx , dy ) and (ex , ey ) relative to some basis – rather than the
numbers in the raw vector ordered pair – to compute the inner product using the
simple dx ex + dy ey ? In general the answer is no.
[Exercise. Compute the length of the vector (15, 3)t by dotting it with itself. Now
do the same thing, but this time compute using that vector’s coordinates relative to
the three bases A, B and C through use of the imputed formula given above. Do you
get the same answers? Which bases’ coordinates give the right inner-product answer?]
However, when working with orthonormal bases, the answer is yes, one can use
coordinates relative to that basis, instead of the pure vector coordinates, and apply
the simple formula to the coordinates.
[Exercise. Explain the results of the last exercise in light of this new assertion.]
[Exercise. See if you can prove the last assertion.]
Note. There is a way to use non-orthonormal basis coordinates to compute dot
products, but one must resort to a more elaborate matrix multiplication, the details
of which we shall skip (but it’s a nice [Exercise] should you wish to attempt it).
2.4 Subspaces
The set of vectors that are scalar multiples of a single vector, such as
   { (a, .5a)t | a ∈ R } = { a (2, 1)t | a ∈ R } ,
is, itself, a vector space. It can be called a subspace of the larger space R2 . As
an exercise, you can confirm that any two vectors in this set, when added together,
produce a third vector which is also in the set. Same with scalar multiplication. So
the subspace is said to be closed under vector addition and scalar multiplication. In
fact, that’s what we mean by a subspace.
Axioms
We can extend directly from ordered pairs or triples to ordered n-tuples for any
positive integer n:
   Rn ≡ { (x0 , x1 , . . . , xn−1 )t | xk ∈ R, k = 0 to n − 1 } .
Inner Product
If a = (a1 , . . . , an )t and b = (b1 , . . . , bn )t are two vectors in Rn , then

   a · b ≡ Σ_{k=1}^{n} ak bk .
Basis
The natural basis for Rn consists of the n vectors each of which has a single coordinate
equal to 1 and all other coordinates equal to 0.
It is clearly orthonormal (why "clearly?"), and any other basis,

   B = { b0 , b1 , . . . , bn−1 } ,

for Rn will therefore have n vectors. The orthonormal property would be satisfied by
the alternate basis B if and only if

   bk · bj = δkj .
We defined the last symbol, δkj , in our complex arithmetic lesson, but since many of
our readers will be skipping one or more of the early chapters, I'll reprise the definition
here:

   δkj ≡ 1 if k = j, and 0 if k ≠ j.

All the remaining properties and definitions follow exactly as in the case of the
smaller dimensions.
When the basis is orthonormal, we can find the expansion coefficients αk of any vector
v = Σ_k αk bk by dotting v with the basis vectors one-at-a-time. In practical terms, this means "we
dot with bj to get αj ,”
   bj · v = bj · ( Σ_{k=1}^{n} αk bk ) = Σ_{k=1}^{n} bj · (αk bk)
          = Σ_{k=1}^{n} αk (bj · bk) = Σ_{k=1}^{n} αk δjk = αj .
2.6 More Exercises
Prove any of following that interest you:
1. If you have a spanning set of vectors that is not linearly independent, you can
find a subset that is a basis. (Argue why/how.) Give an example that demon-
strates you cannot arbitrarily keep the “correct number” of vectors from the
original spanning set and expect those to be your basis; you have to select your
subset, carefully.
4. Show that the set of vectors in R2 which have length < 1 does not constitute a
vector subspace of R2 .
5. Prove that the set of points on the line y = 3x − 1 does not constitute a vector
subspace of R2 .
And that does it for our overview of vector spaces. The extension of these concepts
to the more exotic sounding vector spaces ahead – complex vector spaces, Hilbert spaces
and spaces over the finite field Z2 – will be very easy if you understand what's in this
section and have done several of the exercises.
Chapter 3
Matrices
Recall the truth table that defines the classical AND operation, which we met in the
introduction:

   x   y   x∧y
   0   0    0
   0   1    0
   1   0    0
   1   1    1
Then, I mentioned that in the quantum world these truth tables get replaced by
something more abstract, called matrices. Like the truth table, a matrix contains
the rules of engagement when a qubit steps into its foyer. The matrix for a quantum
operator that we’ll study later in the quarter is
         [ 1   1   1   1 ]
   (1/2) [ 1  −1   1  −1 ] ,
         [ 1   1  −1  −1 ]
         [ 1  −1  −1   1 ]
which represents a special gate called the second order Hadamard operator. We’ll
meet that officially in a few weeks.
Our job today is to define matrices formally and learn the specific ways in which
they can be manipulated and combined with the vectors we met the previous chapter.
3.2 Definitions
Definition of a Matrix. A matrix is a rectangular array of numbers,
variables or pretty much anything. It has rows and columns. Each
matrix has a particular size expressed as [# rows] × [# columns],
for example, 2 × 2, 3 × 4, 7 × 1, 10 × 10, etc.
3.2.1 Notation
For this lesson I will number starting with 1, not 0, so
• row 1 = the first row = the top row of the matrix, and
• column 1 = the first column = the left column of the matrix.
Therefore, it’s important that we note the order of the product in the definition.
First, we can only multiply the matrices A (n × p) and B (q × m) if p = q. So we will
only define the product AB for two matrices of sizes, A (n × p) and B (p × m). The
size of the product will be n × m. Symbolically,
(n × p) · (p × m) = (n × m) .
Note that the “inner” dimension gets annihilated, leaving the “outer” dimensions to
determine the size of the product.
This is the definition of matrix multiplication in the special case when the first is a
row vector and the second is a column vector. But that definition is used repeatedly
to generate the product of two general matrices, coming up next. Here’s an example
when the vectors happen to have complex numbers in them.
   (1, i, 3, 2 − 3i) (−5, 6, −4i, 8i)t = (1)(−5) + (i)(6) + (3)(−4i) + (2 − 3i)(8i)
                                       = 19 + 10i .
[For Advanced Readers. As you see, this is just the sum of the simple products
of the coordinates, even when the numbers are complex. For those of you already
familiar with complex inner products (a topic we will cover next week), please note
that this is not a complex inner product; we do not take the complex conjugate of either
vector. Even if the matrices are complex numbers, we take the ordinary complex
product of the corresponding elements and add them.]
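NumPy makes the distinction in that note concrete: np.dot forms the plain sum of products used here (no conjugation), while np.vdot conjugates its first argument, which is the complex inner product we will meet later. The example is mine, not the author's:

```python
import numpy as np

row = np.array([1, 1j, 3, 2 - 3j])
col = np.array([-5, 6, -4j, 8j])

print(np.dot(row, col))    # (19+10j): plain row-times-column product, no conjugation
print(np.vdot(row, col))   # a different value: `row` is conjugated first
```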
To build the product of two general matrices out of these products of
1-dimensional matrices, we must define the (kl)th element of the answer matrix to
be the dot product of the kth row of A with the lth column of B. Let's look at it
graphically before we see the formal definition.
We illustrate the computation of the 1-1 element of an answer matrix in Figure 3.1
and the computation of the 2-2 element of the same answer matrix in Figure 3.2.
Figure 3.1: Dot-product of the first row and first column yields element 1-1
Figure 3.2: Dot-product of the second row and second column yields element 2-2
Formally, the product C = AB of an n × p matrix A and a p × m matrix B is the
n × m matrix whose elements are

   Ckl ≡ Σ_{j=1}^{p} Akj Bjl ,

where k = 1, . . . , n and l = 1, . . . , m.
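Written as code, the definition is just three nested loops. The sketch below is my own (indices run from 0, as Python prefers, rather than from 1):

```python
def matmul(A, B):
    """Multiply an n x p matrix A by a p x m matrix B:
    C[k][l] is the dot product of row k of A with column l of B."""
    n, p, m = len(A), len(B), len(B[0])
    assert all(len(row) == p for row in A), "inner dimensions must agree"
    C = [[0] * m for _ in range(n)]
    for k in range(n):
        for l in range(m):
            C[k][l] = sum(A[k][j] * B[j][l] for j in range(p))
    return C

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]
```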
[Exercise. Fill in the rest of the elements in the product matrix, C, above.]
[Exercise. Compute the products

   [ 1   2   1 ] [ 1   2   3 ]
   [−2   0   4 ] [ 3   1   1 ]
   [ 5   5   5 ] [ 0   0   1 ]

and

   [ 1   2   0  −1 ] [ 1   2   3 ]
   [−2  −1   1   4 ] [ 3   1   1 ]
   [ 5   0   5   0 ] [ 0   0   1 ]
                     [ 1   1   1 ] . ]
[Exercise. Compute the first product above in the opposite order. Did you get
the same answer?]
While matrix multiplication may not be commutative, it is associative,
A(BC) = (AB)C,
Some Observations
• Position. You cannot put the vector on the left of the matrix; the dimensions
would no longer be compatible. You can, however, put the transpose of the
vector on the left of the matrix, vt A. That does make sense and it has an
answer (see exercise, below).
[Exercise. Using the v and A from the last exercise, compute the product vt A.]
[Exercise. Using the same A as above, let the vector w ≡ (−1, −2.5i, 1)t and
That was a special case of the more general operation, namely, taking the transpose
of an entire matrix. The transpose operation creates a new matrix whose rows are
the columns of the original matrix (and whose columns are the rows of the original).
More concisely, if A is the name of our original n × m matrix, its transpose, At , is the
m × n matrix defined by
   ( At )kl = ( A )lk .
[Exercise. Make up two matrices, one square and one not square, and show the
transpose of each.]
3.5 Matrix Addition and Scalar Multiplication
Matrices can be added (component-wise) and multiplied by a scalar (apply the scalar
to all nm elements in the matrix). I’m going to let you be the authors of this section
in two short exercises.
[Exercise. Make these definitions precise using a formula and give an example of
each in a 3 × 3 case.]
[Exercise. Show that matrix multiplication is distributive over addition and both
associative and commutative in combination with scalar multiplication, i.e.,
A (c B1 + B2 ) = c (AB1 ) + (AB2 )
= (AB2 ) + c (AB1 ) . ]
Clearly, when you add the zero matrix to another matrix, it does not change anything
in the other matrix. When you multiply the zero matrix by another matrix, including
a vector, it will squash it to all 0s.
( 0 )A = ( 0 ) and ( 0 )v = 0
The identity matrix has the property that when you apply it to (multiply it with)
another matrix or vector, it leaves that matrix or vector unchanged, e.g.,

   [ 1  0  0  0 ] [ 1     2i   0  −1 ]   [ 1     2i   0  −1 ]
   [ 0  1  0  0 ] [ −2    −1   1   4 ] = [ −2    −1   1   4 ]
   [ 0  0  1  0 ] [ 5     0    5   0 ]   [ 5     0    5   0 ]
   [ 0  0  0  1 ] [ 3−i   2   −2   2 ]   [ 3−i   2   −2   2 ]

and, with the identity on the right,

   [ 1     2i   0  −1 ] [ 1  0  0  0 ]   [ 1     2i   0  −1 ]
   [ −2    −1   1   4 ] [ 0  1  0  0 ] = [ −2    −1   1   4 ]
   [ 5     0    5   0 ] [ 0  0  1  0 ]   [ 5     0    5   0 ]
   [ 3−i   2   −2   2 ] [ 0  0  0  1 ]   [ 3−i   2   −2   2 ]
Notice that multiplication by a unit matrix has the same (non) effect whether it
appears on either side of its co-multiplicand. The rule for both matrices and vectors
is
1M = M 1 = M,
1 v = v and
vt 1 = vt .
In words, the unit matrix is the multiplicative identity for matrices.
3.7 Determinants
Associated with every square matrix is a scalar called its determinant. There is very
little physics, math, statistics or any other science that we can do without a working
knowledge of the determinant. Let’s check that box now.
[Exercise. Compute the following determinants:

   | 1   −2 |
   | 3    4 |  =  ?

   | 4       1 + i |
   | 1 − i     2   |  =  ?  ]
(Sorry, I had to use the variable name i, not for the √−1 , but to mean the 3-3 element
of the matrix, since I ran out of reasonable letters.) The latter defines the minors of
a matrix element to be the determinant of the smaller matrix constructed by crossing
out that element’s row and column (See Figure 3.3.)
Compare the last answer with the last 2 × 2 determinant exercise answer. Any
thoughts?]
3.7.3 Determinant of an n × n Matrix
The 3×3 definition tells us to proceed recursively for any square matrix of size n.
We define its determinant as an alternating sum of the first row elements times their
minors,
$$ \det(A) \;=\; |A| \;\equiv\; A_{11}\bigl(\text{minor of } A_{11}\bigr) - A_{12}\bigl(\text{minor of } A_{12}\bigr) + A_{13}\bigl(\text{minor of } A_{13}\bigr) - \cdots \;=\; \sum_{k=1}^{n} (-1)^{k+1} A_{1k}\bigl(\text{minor of } A_{1k}\bigr) \, . $$
I think you know what’s coming. Why row 1 ? No reason at all (except that every
square matrix has a first row). In the definition above we would say that we expanded
the determinant along the first row, but we could have expanded it along any row –
or any column, for that matter – and gotten the same answer.
However, there is one detail that has to be adjusted if we expand along some other row (or column). The expression
$$ (-1)^{k+1} $$
has to be changed if we expand along the jth row (column) rather than the first row (column). The 1 above becomes j,
$$ (-1)^{k+j} \, , $$
giving the formula
$$ \det(A) \;=\; \sum_{k=1}^{n} (-1)^{k+j}\, A_{jk}\,\bigl(\text{minor of } A_{jk}\bigr) \, . $$
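Since we will suggest writing our own determinant calculator a little later in this chapter, here is one minimal recursive sketch in Python (the function name det is mine; it expands along the first row exactly as in the definition above):

    import numpy as np

    def det(A):
        """Determinant by recursive expansion along the first row.
        With 0-based k, the sign (-1)**k matches (-1)**(k+1) of the 1-based formula."""
        A = np.asarray(A)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0
        for k in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)   # cross out row 0, column k
            total += (-1) ** k * A[0, k] * det(minor)
        return total

    M = [[1, 2, 1], [-2, 0, 4], [5, 5, 5]]
    print(det(M))                 # 30
    print(np.linalg.det(M))       # same value, up to round-off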
3.8 Matrix Inverses
Since we have the multiplicative identity (i.e., 1) for matrices we can ask whether,
given an arbitrary square matrix A, the inverse of A can be found. That is, can we
find a B such that AB = BA = 1? The answer is sometimes; not all matrices have inverses. If A does have an inverse, we say that A is invertible or non-singular, and write its inverse as $A^{-1}$. Shown in a couple of different notations, just to encourage flexibility, a matrix inverse must satisfy (and is defined by)
$$ M^{-1} M \;=\; M M^{-1} \;=\; \mathbf{1} \qquad\text{or}\qquad A^{-1} A \;=\; A A^{-1} \;=\; I \, . $$
Here is one of two “little theorems” that we’ll need when we introduce quantum
mechanics.
While we don’t have to prove this, it’s actually fun and easy. In fact, we can get
half way there in one line:
[Exercise. Prove that M non-singular ⇒ $\det(M) \neq 0$ (in a single line). Hint: use the fact about determinants of products.]
The other direction is a consequence of a popular result in matrix equations. Since
one or more of our quantum algorithms will refer to this result, we’ll devote the next
section to it and do an example. Before we leave this section, though, I need to place
into evidence, our second “little theorem.”
Little Inverse Theorem “B”. If $Mv = 0$ for some non-zero vector v, then $\det(M) = 0$.
Proof. Little Theorem A tells us that the hypothesis implies M has no inverse, and the subsequent Big Theorem tells us that this forces $\det(M) = 0$. QED
3.9 Matrix Equations and Cramer’s Rule
3.9.1 Systems of Linear Equations
A system of simultaneous linear equations with n unknowns is a set of equations in the variables $x_1, x_2, \ldots, x_n$, involving only sums of first order (linear) terms of each variable, e.g.,
To solve the system uniquely, one needs at least as many equations as
there are unknowns. So the above system does not have a unique solution (although
you can find some simpler relationships between the variables if you try). Even if you
do have exactly the same number of equations as unknowns, there still may not be a
unique solution since one of the equations might – to cite one possibility – be a mere
multiple of one of the others, adding no new information. Each equation has to add
new information – it must be independent of all the others.
Matrix Equations
We can express this system of equations concisely using the language of matrix mul-
tiplication,
$$
\begin{pmatrix} 3.2 & 4 & -5 & 0 & -1 \\ 1 & 1 & 1 & 1 & 1 \\ -23 & 1 & 0 & 10 & 0 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
\;=\;
\begin{pmatrix} 19 \\ 1 \\ 85 \end{pmatrix} .
$$
If we were able to get two more relationships between the variables, independent of
these three, we would have a complete system represented by a square 5 × 5 matrix
on the LHS. For example,
$$
\begin{pmatrix} 3.2 & 4 & -5 & 0 & -1 \\ 1 & 1 & 1 & 1 & 1 \\ -23 & 1 & 0 & 10 & 0 \\ 2.5 & \pi & .09 & 50 & 1 \\ 2\pi & 0 & .83 & -1 & -17 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
\;=\;
\begin{pmatrix} 19 \\ 1 \\ 85 \\ 0 \\ 4 \end{pmatrix} .
$$
Setting M = the 5 × 5 matrix on the left, v = the vector of unknowns, and c = the
vector of constants on the right, this becomes
Mv = c.
How can we leverage the language of matrices to get a solution? We want to know
what all the xk are. That’s the same as having a vector equation in which v is all
alone on the LHS,
v = d,
and the d on the RHS is a vector of constants. The scent of a solution should be
wafting in the breeze. If M v = c were a scalar equation we would divide by M .
Since it’s a matrix equation there are two differences:
1. Instead of dividing by M , we multiply each side of the equation (on the left) by
M −1 .
2. M −1 may not even exist, so this only works if M is non-singular.
Figure 3.4: The numerator of Cramer’s fraction
We will not prove Cramer’s rule. You can take it on faith, look up a proof, or
have a whack at it yourself.
We can compute the relevant determinants of the 5 × 5 system above with the help
of Mathematica or similar software which computes determinants for us. Or, we can
write our own determinant calculator using very few instructions and recursion, and
do it all ourselves. First, we check the main determinant,
$$ \det(M) \;=\; -167077 \;\neq\; 0 \, . \;\checkmark $$
$$
\begin{aligned}
\det(M_1) &= -635709 \\
\det(M_2) &= 199789 \\
\det(M_3) &= 423127 \\
\det(M_4) &= 21996.3 \\
\det(M_5) &= -176281
\end{aligned}
$$
[Exercise. Compute the other four xk and confirm that any one of the equations
in the original system holds for these five values.]
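If you would rather let software do the grinding (the text suggests Mathematica, but any numerical library works), here is a small numpy sketch of Cramer’s rule for the 5 × 5 system above; the variable names are mine:

    import numpy as np

    M = np.array([[ 3.2,      4,      -5,   0,  -1],
                  [ 1,        1,       1,   1,   1],
                  [-23,       1,       0,  10,   0],
                  [ 2.5,      np.pi, .09,  50,   1],
                  [ 2*np.pi,  0,     .83,  -1, -17]])
    c = np.array([19, 1, 85, 0, 4])

    print(np.linalg.det(M))          # approximately -167077, so M is non-singular

    # Cramer's rule: x_k = det(M_k) / det(M), where M_k is M with its
    # kth column replaced by the constant vector c.
    x = np.empty(5)
    for k in range(5):
        Mk = M.copy()
        Mk[:, k] = c
        x[k] = np.linalg.det(Mk) / np.linalg.det(M)

    print(x)                          # the five unknowns x_1 ... x_5
    print(M @ x - c)                  # numerically zero: the solution checks out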
As noted, Cramer’s rule is not very useful for writing software, but we can apply it
to the problem of finding a matrix inverse, especially when dealing with small, 2 × 2,
matrices by hand. Say we are given such a matrix
$$ M \;=\; \begin{pmatrix} a & b \\ c & d \end{pmatrix} \, , $$
that we have confirmed to be non-singular by computing $ad - bc$. So we know it has an inverse, which we will temporarily write as
$$ M^{-1} \;=\; \begin{pmatrix} e & f \\ g & h \end{pmatrix} \, . $$
The only thing we can say with certainty, at this point, is that
$$ M M^{-1} \;=\; \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} e & f \\ g & h \end{pmatrix} \;=\; \mathbf{1} \;=\; \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \, . $$
Our goal is to solve this matrix equation. We do so in two parts. First, we break this
into an equation which only involves the first column of the purported inverse,
$$ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} e \\ g \end{pmatrix} \;=\; \begin{pmatrix} 1 \\ 0 \end{pmatrix} \, . $$
This is exactly the kind of 2-equation linear system we have already conquered.
Cramer’s rule tells us that it has a solution (since $\det(M) \neq 0$) and the solution is given by
$$ e \;=\; \frac{ \det\begin{pmatrix} 1 & b \\ 0 & d \end{pmatrix} }{ \det M }
\qquad\text{and}\qquad
g \;=\; \frac{ \det\begin{pmatrix} a & 1 \\ c & 0 \end{pmatrix} }{ \det M } \, . $$
The same moves can be used on the second column of $M^{-1}$, to solve
$$ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} f \\ h \end{pmatrix} \;=\; \begin{pmatrix} 0 \\ 1 \end{pmatrix} \, . $$
[Exercise. Write down the corresponding quotients that give f and h.]
Example. We determine the invertibility of M and, if invertible, compute its
inverse, where
$$ M \;=\; \begin{pmatrix} -12 & 1 \\ 15 & 1 \end{pmatrix} \, . $$
The determinant is
$$ \begin{vmatrix} -12 & 1 \\ 15 & 1 \end{vmatrix} \;=\; -12 - 15 \;=\; -27 \;\neq\; 0 \, , $$
which tells us that the inverse does exist. Setting
$$ M^{-1} \;=\; \begin{pmatrix} e & f \\ g & h \end{pmatrix} \, , $$
we solve first for the left column, $\begin{pmatrix} e \\ g \end{pmatrix}$. Following the procedure above,
$$ e \;=\; \frac{ \det\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} }{ \det M } \;=\; \frac{1}{-27} \;=\; -\frac{1}{27} $$
and
$$ g \;=\; \frac{ \det\begin{pmatrix} -12 & 1 \\ 15 & 0 \end{pmatrix} }{ \det M } \;=\; \frac{-15}{-27} \, , $$
which we don’t simplify ... yet. Continuing on to solve for f and h results in the final inverse matrix
$$ M^{-1} \;=\; \begin{pmatrix} -1/27 & 1/27 \\ 15/27 & 12/27 \end{pmatrix} \;=\; \frac{1}{27} \begin{pmatrix} -1 & 1 \\ 15 & 12 \end{pmatrix} \, . $$
[Exercise. Make up your own 2 × 2 matrix, confirm that it is non-singular, then compute its inverse using Cramer’s rule. Check your work.]
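Here is a small Python sketch of the 2 × 2 Cramer’s-rule inverse, applied to the worked example above (the helper name inverse_2x2 is mine):

    import numpy as np

    def inverse_2x2(M):
        """Invert a 2x2 matrix using Cramer's rule (a sketch; raises if singular)."""
        (a, b), (c, d) = M
        det = a * d - b * c
        if det == 0:
            raise ValueError("matrix is singular")
        e = (1 * d - b * 0) / det      # det([[1, b], [0, d]]) / det(M)
        g = (a * 0 - 1 * c) / det      # det([[a, 1], [c, 0]]) / det(M)
        f = (0 * d - b * 1) / det      # det([[0, b], [1, d]]) / det(M)
        h = (a * 1 - 0 * c) / det      # det([[a, 0], [c, 1]]) / det(M)
        return np.array([[e, f], [g, h]])

    M = np.array([[-12., 1.], [15., 1.]])
    Minv = inverse_2x2(M)
    print(Minv)            # 1/27 * [[-1, 1], [15, 12]]
    print(M @ Minv)        # the identity, up to round-off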
Remember this?
$$ M \text{ is non-singular} \;\Longleftrightarrow\; \det(M) \;\neq\; 0 \, . $$
You proved ⇒ in an exercise. Now you can do ⇐ in another exercise.
[Exercise. Using Cramer’s rule as a starting point, prove that
$$ \det(M) \;\neq\; 0 \;\Longrightarrow\; M \text{ is non-singular} \, . \; ] $$
Chapter 4
Hilbert Space
The qubit shown above is a vector having unit length, and the α, β are its coordinates in what we shall see is the vector space’s natural basis $\{\, |0\rangle , |1\rangle \,\}$.
4.1.2 The Complex Nature of a Qubit
We need to know whatever is possible to know about a hypothetical quantum circuit
using paper and pencil so we can predict what the hardware will do before we build
it. I’m happy to report that the mathematical language that works in this situation is
one of the most accurate and well tested in the history of science: quantum mechanics.
It behooves me to report that a real vector space like R2 or R3 doesn’t work. To make
accurate predictions about qubits and the logic circuitry that entangles them, we’ll
need to allow α and β to be complex. Thus, our vector space must be defined over a
complex scalar field.
Besides the straightforward consequences resulting from the use of complex arith-
metic, we’ll have to define an inner product that differs from the simple dot product
of real vector spaces. At that point we’ll be working in what mathematicians call
complex Hilbert space or simply Hilbert space. To make the correspondence between
our math and the real world complete, we’ll have to add one last tweak: we will con-
sider only vectors of unit length. That is, we will need to work on something called
the projective sphere a.k.a projective Hilbert space.
Today we learn how to manipulate vectors in a complex Hilbert space and the
projective sphere within it.
Except for the inner product, everything else works like the real space Rn with
the plot twist that the components are complex. In fact, the natural basis is actually
identical to Rn ’s basis.
[Exercise. Prove that the same natural basis works for both Rn and Cn .]
There are, however, other bases for Cn which have no counterpart in Rn , since
their components can be complex.
[Exercise. Drum up a basis for Cn that has no real counterpart. Then find a
vector in Cn which is not in Rn but whose coordinates relative to this basis are all
real.]
[Exercise. Is Rn ⊆ Cn as a set? As a subspace? Justify your answers.]
we’d have a problem with lengths of vectors which we want to be ≥ 0. Recall that
lengths are defined by dotting a vector with itself (I’ll remind you about the exact
details in a moment). To wit, for a complex a,
$$ \mathbf{a} \cdot \mathbf{a} \;\overset{?}{=}\; \sum_{k=0}^{n-1} a_k^{\,2} $$
is not necessarily real, never mind non-negative (try a vector whose components are
all 1 + i). When it is real, it could still be negative:
$$ \begin{pmatrix} 5i \\ 3 \end{pmatrix} \cdot \begin{pmatrix} 5i \\ 3 \end{pmatrix} \;=\; -25 + 9 \;=\; -16 \, . $$
When defined in this way, some authors prefer the term inner product, reserving
dot product for the real vector space analog (although some authors say complex dot
product, so you have to adapt.)
Notation. An alternative notation for the (complex) inner product is sometimes used,
$$ \langle a, b \rangle \, , $$
although we will usually write it in bra-ket form,
$$ \langle a \,|\, b \rangle \, . $$
Caution #1. Unlike the real dot product, the complex inner product is not symmetric; in general,
$$ \langle a \,|\, b \rangle \;\neq\; \langle b \,|\, a \rangle \, . $$
There is an important relationship between the two, however, namely
$$ \langle a \,|\, b \rangle^{*} \;=\; \langle b \,|\, a \rangle \, . $$
[Exercise. Show that the definition of complex inner product implies the above
result.]
Caution #2. The complex inner product could instead be defined by conjugating the $b_k$s rather than the $a_k$s; this would produce a different result, namely the complex conjugate of our defined inner product. However, physicists – and we – conjugate the left-hand vector’s coordinates because it produces nicer looking formulas.
Example. Let
$$ \mathbf{a} \;=\; \begin{pmatrix} 1+i \\ 3 \end{pmatrix} \qquad\text{and}\qquad \mathbf{b} \;=\; \begin{pmatrix} 1-2i \\ 5i \end{pmatrix} \, . $$
Then
$$ \langle a \,|\, b \rangle \;=\; (1+i)^{*}(1-2i) \;+\; 3^{*}(5i) \;=\; (1-i)(1-2i) + 15i \;=\; -1 + 12i \, . $$
In our linear algebra lesson we learned that inner products are distributive. In the
second position, this means
$$ \langle a \,|\, b + b' \rangle \;=\; \langle a \,|\, b \rangle + \langle a \,|\, b' \rangle \, , $$
and the same is true in the first position. For the record, we should collect two more properties that apply to the all-important complex inner product:
$$ c\,\langle a \,|\, b \rangle \;=\; \langle a \,|\, c\,b \rangle \qquad\text{and}\qquad c\,\langle a \,|\, b \rangle \;=\; \langle c^{*} a \,|\, b \rangle \, . $$
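A quick numerical sanity check of these rules, using numpy and the vectors from the example above (the helper inner is mine; numpy’s built-in np.vdot does the same job):

    import numpy as np

    def inner(a, b):
        """Complex inner product <a|b>: conjugate the left vector's coordinates."""
        a = np.asarray(a, dtype=complex)
        b = np.asarray(b, dtype=complex)
        return np.sum(np.conj(a) * b)        # same as np.vdot(a, b)

    a = np.array([1 + 1j, 3])
    b = np.array([1 - 2j, 5j])

    print(inner(a, b))        # (-1+12j)
    print(inner(b, a))        # (-1-12j): the complex conjugate, not the same value
    print(inner(a, a))        # (11+0j): real and non-negative, as a norm-squared must be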
4.3.1 Norm and Distance
The inner product on Cn confers it with a metric, that is, a way to measure things.
There are two important concepts that emerge:
1. Norm. The norm of a vector, now repurposed to our current complex vector space, is defined by
$$ \|\mathbf{a}\| \;\equiv\; \sqrt{ \sum_{k=0}^{n-1} |a_k|^2 } \;=\; \sqrt{ \sum_{k=0}^{n-1} (a_k)^{*}\, a_k } \, . $$
With the now correct definition of inner product we get the desired behavior when
we compute a norm,
$$ \|\mathbf{a}\|^2 \;\equiv\; \langle a \,|\, a \rangle \;=\; \sum_{k=0}^{n-1} (a_k)^{*}\, a_k \;=\; \sum_{k=0}^{n-1} |a_k|^2 \;\geq\; 0 \, . $$
Since the length (norm or modulus) of a, $\|\mathbf{a}\|$, is the non-negative square root of this value, once again all is well: lengths are real and non-negative.
More Notation. You may see the modulus for a vector written in normal (not
bold) face, as in
$$ a \;=\; \|\mathbf{a}\| \qquad\text{or}\qquad |\mathbf{a}| \;=\; \|\mathbf{a}\| \, . $$
Complex Orthonormality and Linear Independence
We had a little theorem about orthonormal sets in real vector spaces. It’s just as true
for complex vector spaces with the complex inner product.
Theorem. If a set of vectors {ak } is orthonormal, it is necessarily linearly inde-
pendent.
Proof. We’ll assume the theorem is false and arrive at a contradiction. Say {ak }
is an orthonormal collection, yet one of them, a0 , is a linear combination of the others,
i.e.,
$$ \mathbf{a}_0 \;=\; \sum_{k=1}^{n-1} c_k\, \mathbf{a}_k \, . $$
Taking the inner product of both sides with $\mathbf{a}_0$ gives 1 on the left (each vector has unit length) but 0 on the right (the vectors are mutually orthogonal), a contradiction. QED
(We do this easy example because the answer is obvious: the coordinates along A
should match the pure vector components, otherwise we have a problem with our
technique. Let’s see ...)
We seek
$$ \begin{pmatrix} v_0 \\ v_1 \end{pmatrix}_A \, , $$
and dotting v with each natural basis vector gives $v_0 = 1+i$ and $v_1 = 1-i$, so
$$ \begin{pmatrix} 1+i \\ 1-i \end{pmatrix} \;=\; \begin{pmatrix} 1+i \\ 1-i \end{pmatrix}_A \, , $$
as expected (✓). Of course that wasn’t much fun since natural basis vector compo-
nents were real (0 and 1), thus there was nothing to conjugate. Let’s do one with a
little crunch.
Example. In $C^2$ we expand the same $v = \begin{pmatrix} 1+i \\ 1-i \end{pmatrix}$ along the basis B,
$$ B \;\equiv\; \bigl\{ \hat{b}_0 , \hat{b}_1 \bigr\} \;=\; \left\{ \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix} ,\; \begin{pmatrix} i\sqrt{2}/2 \\ -i\sqrt{2}/2 \end{pmatrix} \right\} \, . $$
First, we confirm that this basis is orthonormal (because the dot-product trick only
works for orthonormal bases).
$$ \left\langle \hat{b}_0 \,\middle|\, \hat{b}_1 \right\rangle \;=\; (b_{00})^{*} b_{10} + (b_{01})^{*} b_{11} \;=\; (\sqrt{2}/2)(i\sqrt{2}/2) + (\sqrt{2}/2)(-i\sqrt{2}/2) \;=\; i/2 - i/2 \;=\; 0 \;\checkmark $$
Also,
$$ \left\langle \hat{b}_0 \,\middle|\, \hat{b}_0 \right\rangle \;=\; (b_{00})^{*} b_{00} + (b_{01})^{*} b_{01} \;=\; (\sqrt{2}/2)(\sqrt{2}/2) + (\sqrt{2}/2)(\sqrt{2}/2) \;=\; 1/2 + 1/2 \;=\; 1 \;\checkmark $$
and
$$ \left\langle \hat{b}_1 \,\middle|\, \hat{b}_1 \right\rangle \;=\; (b_{10})^{*} b_{10} + (b_{11})^{*} b_{11} \;=\; (-i\sqrt{2}/2)(i\sqrt{2}/2) + (i\sqrt{2}/2)(-i\sqrt{2}/2) \;=\; 1/2 + 1/2 \;=\; 1 \;\checkmark $$
Next, the coordinates. The first is
$$ v_0 \;=\; \left\langle \hat{b}_0 \,\middle|\, v \right\rangle \;=\; (b_{00})^{*} v_0 + (b_{01})^{*} v_1 \;=\; (\sqrt{2}/2)(1+i) + (\sqrt{2}/2)(1-i) \;=\; \sqrt{2} \, , $$
and
$$ v_1 \;=\; \left\langle \hat{b}_1 \,\middle|\, v \right\rangle \;=\; (b_{10})^{*} v_0 + (b_{11})^{*} v_1 \;=\; (-i\sqrt{2}/2)(1+i) + (i\sqrt{2}/2)(1-i) \;=\; \sqrt{2} \, , $$
so
$$ \begin{pmatrix} 1+i \\ 1-i \end{pmatrix} \;=\; \begin{pmatrix} \sqrt{2} \\ \sqrt{2} \end{pmatrix}_B \, . $$
[Exercise. Work in C2 and use the same v as above to get its coordinates relative
to the basis C,
$$ C \;\equiv\; \{\hat{c}_0 , \hat{c}_1\} \;=\; \left\{ \frac{1}{\sqrt{3}}\begin{pmatrix} 1+i \\ 1 \end{pmatrix} ,\; \frac{1}{\sqrt{15}}\begin{pmatrix} 2+i \\ -3+i \end{pmatrix} \right\} \, . $$
Before starting, demonstrate that this basis is orthonormal.]
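If you want to check your answer to this exercise (or the worked example above it), here is a small numpy sketch of the dotting trick; the helper coords is mine:

    import numpy as np

    def coords(v, basis):
        """Coordinates of v relative to an orthonormal basis, via the dotting trick:
        the k-th coordinate is <b_k | v> (conjugate the basis vector's components)."""
        return np.array([np.vdot(b, v) for b in basis])

    s = np.sqrt(2) / 2
    b0 = np.array([s, s])
    b1 = np.array([1j * s, -1j * s])

    # confirm orthonormality first -- the trick only works for orthonormal bases
    print(np.vdot(b0, b1))                    # 0
    print(np.vdot(b0, b0), np.vdot(b1, b1))   # 1, 1

    v = np.array([1 + 1j, 1 - 1j])
    print(coords(v, [b0, b1]))                # [sqrt(2), sqrt(2)]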
4.4 Hilbert Space
4.4.1 Definitions
A Hilbert Space is a real or complex vector space that
1. has an inner product, and
2. is complete.
You already know what an inner product is. A vector space that has an inner product
is usually called an inner-product space.
Completeness is different from the property that goes by the same name (unfor-
tunately) involving vectors spanning a space. This is not that. Rather, completeness
here means (please grab something stable and try not to hit your head) that any
Cauchy sequence of vectors in the space converges to some vector also in the space.
Unless you remember your advanced calculus, completeness won’t mean much, and
it’s not a criterion that we will ever use, explicitly. However, if you want one more
degree of explanation, read the next paragraph.
The inner product, as we saw, imbues the vector space with a distance function,
dist( · , · ). Once we have a distance function, we can inspect any infinite sequence
of vectors, like { a0 , a1 , a2 , . . . }, and test whether the distance between consecutive
vectors in that sequence, dist($\mathbf{a}_{k+1}, \mathbf{a}_k$), gets small fast (i.e., obeys the Cauchy criterion
– please research the meaning of this phrase elsewhere, if interested). If it does,
{ a0 , a1 , a2 , . . . } is called a Cauchy sequence. The vector space is complete if every
Cauchy sequence of vectors in the space approaches a unique vector, also in the space,
called the limit of the sequence.
Illustration of Completeness. The completeness criterion does not require a
vector space in order that it be satisfied. It requires only a set and a norm (or metric)
which allows us to measure distances. So we can ask about completeness of many sets
that are not even vector spaces. The simplest example happens to not be a vector
space.
Consider the interval [0, 1] in R, which includes both endpoints. This set is complete with respect to the usual norm in R. For example, the sequence $\left\{ 1 - \tfrac{1}{k} \right\}_{k=2}^{\infty}$ is Cauchy and converges to 1, which is, indeed, in [0, 1]. This does not establish that [0, 1] is complete, but it indicates the kind of challenging sequence that might prove a set is not complete. (See Figure 4.1.)
For a counter-example, consider the interval (0, 1) in R which does not include
either endpoint. The same sequence, just discussed, is still in the set, but its limit,
1, is not. So this set is not complete. (See Figure 4.2)
Figure 4.1: The Cauchy sequence $\left\{ 1 - \tfrac{1}{k} \right\}_{k=2}^{\infty}$ has its limit in [0, 1]
Figure 4.2: The Cauchy sequence $\left\{ 1 - \tfrac{1}{k} \right\}_{k=2}^{\infty}$ does not have its limit in (0, 1)
Notation
When I want to emphasize that we’re working in a Hilbert space, I’ll use the letter
H just as I use terms like R3 or Cn to denote real or complex vector spaces. H could
take the form of a C2 or a Cn , and I normally won’t specify the dimension of H until
we get into tensor products.
Thus, we already know two classes of inner-product spaces that are “Hilbert”.
Infinite Dimensional Hilbert Spaces
When physicists use the term Hilbert space, they often mean something slightly more
exotic: function spaces. These are vector spaces whose vectors consist of sufficiently
well-behaved functions, and whose inner product is usually defined in terms of an
integral.
The Space $L^2[a, b]$. For example, all complex-valued functions, f(x), defined over the real interval [a, b] and which are square-integrable, i.e.,
$$ \int_a^b |f(x)|^2 \, dx \;<\; \infty \, , $$
form a vector space. We can define an inner product for any two such functions, f
and g, using
$$ \langle f \,|\, g \rangle \;\equiv\; \int_a^b f(x)^{*}\, g(x) \, dx \, , $$
and this inner-product will give a distance and norm that satisfy the completeness
criterion. Hilbert spaces very much like these are used to model the momentum and
position of sub-atomic particles.
Triangle Inequality
Both the real dot product of Rn and the complex inner product of Cn satisfy the
triangle inequality condition: For any vectors, x, y and z,
dist(x, z) ≤ dist(x, y) + dist(y, z) .
Pictured in R2 , we can see why this is called the triangle inequality. (See Figure 4.3).
[Exercise. Pick three vectors in $C^3$ and verify that the triangle inequality is satisfied. Do this at least twice, once when the three vectors do not all lie on the same complex line, $\{\, \alpha\, \mathbf{x} \mid \alpha \in \mathbf{C} \,\}$, and once when all three do lie on the same line and y is “between” x and z.]
Cauchy-Schwarz Inequality
Figure 4.3: Triangle inequality in a metric space.svg from Wikipedia
$$ \bigl|\, \langle x \,|\, y \rangle \,\bigr|^2 \;\leq\; \|x\|^2 \, \|y\|^2 \, . $$
The LHS is the absolute-value-squared of a complex scalar, while the RHS is the
product of two vector norms (squared), each of which is necessarily a non-negative
real. Therefore, it is an inequality between two non-negative values. In words, it says
that the magnitude of an inner product is never more than the product of the two
component vector magnitudes.
The Cauchy-Schwarz inequality becomes an exact equality if and only if the two
vectors form a linearly dependent set, i.e., one is a scalar multiple of the other.
[Exercise. Pick two vectors in $C^3$ and verify that the Cauchy-Schwarz inequality is satisfied. Do this at least twice, once when the two vectors are linearly independent and once when they are linearly dependent.]
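Both exercises are easy to spot-check numerically. A minimal numpy sketch (random vectors; the names are mine):

    import numpy as np

    rng = np.random.default_rng(0)

    def rand_c3():
        return rng.normal(size=3) + 1j * rng.normal(size=3)

    x, y, z = rand_c3(), rand_c3(), rand_c3()
    dist = lambda u, v: np.linalg.norm(u - v)

    # triangle inequality
    print(dist(x, z) <= dist(x, y) + dist(y, z))          # True

    # Cauchy-Schwarz: |<x|y>|^2 <= ||x||^2 ||y||^2, equality iff linearly dependent
    lhs = abs(np.vdot(x, y)) ** 2
    rhs = np.vdot(x, x).real * np.vdot(y, y).real
    print(lhs <= rhs)                                     # True
    print(np.isclose(abs(np.vdot(x, 2j * x)) ** 2,        # equality when y = 2i x
                     np.vdot(x, x).real * np.vdot(2j * x, 2j * x).real))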
Each possible physical state of our quantum system – a concept we will cover soon
– will be represented by a unit vector (vector of length one) in H. Stated differently,
any two non-zero H-vectors that differ by a complex scalar represent the same state,
so we may as well choose a vector of length one (kvk = 1) that is on the same ray as
either (or both) of those two; we don’t have to distinguish between any of the infinite
number of vectors on that ray.
This equivalence of all vectors on a given ray makes the objects in our mathemat-
ical model not points in H, but rays through the origin of H.
If two complex n-tuples differ by a mere scalar multiple, α, they are to be consid-
ered the same state in our state space, i.e.,
b = αa (as H-space vectors)
⇒
b ∼ a (as a physical state),
to be read “a is equivalent to b,” or “a and b represent the same quantum state.”
This identifies
with
Example. All three of these complex ordered-pairs lie on the same ray in $C^2$, so they all represent the same quantum state modeled by this two-dimensional H:
$$ \mathbf{a} \;=\; \begin{pmatrix} 2 \\ i \end{pmatrix} , \qquad \mathbf{b} \;=\; \begin{pmatrix} -10 \\ -5i \end{pmatrix} \qquad\text{and}\qquad \mathbf{c} \;=\; \begin{pmatrix} \frac{3-i}{5} \\[4pt] \frac{1+3i}{10} \end{pmatrix} . $$
[Exercise. Elaborate.]
Dividing any one of them by its norm will produce a unit vector (vector with
modulus one) which also represents the same ray. Using the simplest representative,
$$ \frac{\mathbf{a}}{\|\mathbf{a}\|} \;=\; \frac{ \begin{pmatrix} 2 \\ i \end{pmatrix} }{ \sqrt{(2)(2) + (i)(-i)} } \;=\; \frac{ \begin{pmatrix} 2 \\ i \end{pmatrix} }{ \sqrt{5} } \;=\; \begin{pmatrix} \frac{2}{\sqrt{5}} \\[4pt] \frac{i}{\sqrt{5}} \end{pmatrix} , $$
often written as
$$ \frac{1}{\sqrt{5}} \begin{pmatrix} 2 \\ i \end{pmatrix} . $$
Figure 4.5: Dividing a vector by its norm yields a unit vector on the same ray
Example (continued). We have seen that $\mathbf{a} = (2, i)^t$ has norm $\sqrt{5}$. A different $C^n$ representative for this H-space vector is $e^{\pi i/6}\, \mathbf{a}$. We can easily see that $e^{\pi i/6}\, \mathbf{a}$ has the same length as $\mathbf{a}$. In words, $\left| e^{\pi i/6} \right| = 1$ (prove it or go back and review your complex arithmetic module), so multiplying by $e^{\pi i/6}$, while changing the $C^n$ vector, will not change its modulus (norm). Thus, that adjustment not only produces a different representative, it does so without changing the norm. Still, it doesn’t hurt to calculate the norm of the product the long way, just for exercise:
$$
e^{\pi i/6}\, \mathbf{a}
\;=\; \begin{pmatrix} 2\, e^{\pi i/6} \\ i\, e^{\pi i/6} \end{pmatrix}
\;=\; \begin{pmatrix} 2\,(\cos \pi/6 + i \sin \pi/6) \\ i\,(\cos \pi/6 + i \sin \pi/6) \end{pmatrix}
\;=\; \begin{pmatrix} 2\cos \pi/6 + 2 i \sin \pi/6 \\ -\sin \pi/6 + i \cos \pi/6 \end{pmatrix}
\;=\; \begin{pmatrix} 2\,\tfrac{\sqrt{3}}{2} + 2 i\,\tfrac{1}{2} \\[4pt] -\tfrac{1}{2} + i\,\tfrac{\sqrt{3}}{2} \end{pmatrix}
\;=\; \begin{pmatrix} \sqrt{3} + i \\[4pt] -\tfrac{1}{2} + i\,\tfrac{\sqrt{3}}{2} \end{pmatrix}
\;=\; \frac{1}{2} \begin{pmatrix} 2\sqrt{3} + 2i \\ -1 + i\sqrt{3} \end{pmatrix}
$$
I did the hard part: I simplified the rotated vector (we call the act of multiplying by $e^{i\theta}$ for real θ a rotation because of the geometric implication, which you can imagine, look up, or simply accept). All that’s left to do in this computation is calculate the norm and see that it is $\sqrt{5}$.
[Exercise. Close the deal.]
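One way to close the deal without the by-hand trigonometry is a quick numerical check (a numpy sketch; the names are mine):

    import numpy as np

    a = np.array([2, 1j])

    norm = np.sqrt(np.vdot(a, a).real)
    print(norm)                          # sqrt(5)

    a_hat = a / norm                     # unit-length representative of the same ray
    print(np.vdot(a_hat, a_hat).real)    # 1.0

    # multiplying by the phase e^{i pi/6} changes the C^n vector but not its norm
    b = np.exp(1j * np.pi / 6) * a
    print(np.sqrt(np.vdot(b, b).real))   # sqrt(5) again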
• The mathematical entity that we get if we take the collection of all rays, {[a]}, as
its “points,” is a new construct with the name: complex projective sphere. This
new entity is, indeed, in one-to-one correspondence with the quantum states,
but ...
• ... the complex projective sphere is not a vector space, so we don’t want to go
too far in attempting to define it; any attempt to make a formal mathematical
entity just so that it corresponds, one-to-one, with the quantum states it models
results in a non-vector space. Among other things, there is no 0-vector in such
a projective sphere, thus, no vector addition.
The Drill. With this in mind, we satisfy ourselves with the following process. It
may lack concreteness until we start working on specific problems, but it should give
you the feel for what’s in store.
4. In all cases, we will take care to apply valid operations to our unit “state vector,”
making sure that we end up with an answer which is also a unit vector on the
projective sphere.
4.5.3 Why?
There is one question (at least) that you should ask and demand be answered.
Why is this called a “projective sphere?” Good question. Since the states
of our quantum system are rays in H, and we would prefer to visualize vectors as
points, not rays, we go back to the underlying Cn and project the entire ray (maybe
collapse would be a better word) onto the surface of an n-dimensional sphere (whose
real dimension is actually 2(n − 1), but never mind that). We are projecting all those
representatives onto a single point on the complex n-sphere. (See Figure 4.5.) Caution: Each point on that sphere still has infinitely many representatives, impossible to picture, due to a potential scalar factor $e^{i\theta}$, for real θ.
None of this is to say that scalar multiples, a.k.a. phase changes, never matter.
When we start combining vectors in H, their relative phase will become important,
and so we shall need to retain individual scalars associated with each component
n-tuple. Don’t be intimidated; we’ll get to that in cautious, deliberate steps.
4.6 Almost There
We have one last math lesson to dance through after which we will be ready to
learn graduate level quantum mechanics (and do so without any prior knowledge of
undergraduate quantum mechanics). This final topic is linear transformations. Rest
up. Then attack it.
Chapter 5
Linear Transformations
Y ,
depending on which basis we are using. For example, there is a basis in which the same logic gate has a different matrix, i.e.,
$$ Y \;=\; \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} . $$
This should disturb you. If a matrix defines a quantum logic gate, and there can be
more than one matrix describing that gate, how can we ever know anything?
There is a more fundamental concept than the matrix of linear algebra, that of
the linear transformation. As you’ll learn today, a linear transformation is the basis-
independent entity that describes a logic gate. While a linear transformation’s matrix
will change depending on which underlying basis we use to construct it, its life-giving
linear transformation remains fixed. You can wear different clothes, but underneath
you’re still you.
or they can move vectors around, keeping them in the same space,
$$ V \;\xrightarrow{\;\;T\;\;}\; V \, . $$
They describe actions that we take on our vectors. They can move a vector by
mapping it onto another vector. They can also be applied to vectors that don’t move
at all. For example, we often want to expand a vector along a basis that is different
from the one originally provided, and linear transformations help us there as well.
$$ T: \; v \;\longmapsto\; w \, . $$
That describes the mapping aspect (Figure 5.1). However, a linear transformation
has the following additional properties that allow us to call the mapping linear :
T (c v) = c T (v) and
T (v1 + v2 ) = T (v1 ) + T (v2 ) .
Here, c is a scalar of the domain vector space and v1 , v2 are domain vectors. These
conditions have to be satisfied for all vectors and scalars.
Besides possibly being different spaces, the domain and range can also have dif-
ferent dimensions. However, today, we’ll just work with linear transformation of a
space into itself.
A linear transformation can be interpreted many different ways. When learning
about them, it’s best to think of them as mappings or positional changes which
convert a vector (having a certain direction and length) into a different vector (having
a different direction and length).
Notation. Sometimes the parentheses are omitted when applying a linear trans-
formation to a vector:
Tv ≡ T (v) .
Here are a few useful linear transformations, some of which refer to Rn or Cn , some
to function spaces not studied much in this course. I’ve included figures for a few
that we’ll meet later today.
v is any vector in an n dimensional vector space on which the linear transformation
acts. x̂k is the kth natural basis vector, vk is the kth coordinate of v in the natural
basis, c is a scalar, and n̂ is any unit vector.
$$
\begin{aligned}
\mathbf{1}(v) &\equiv v  &&\text{(Identity)}\\
\mathbf{0}(v) &\equiv \mathbf{0}  &&\text{(Zero)}\\
S_c(v) &\equiv c\, v  &&\text{(Scale) (Figure 5.2)}\\
P_k(v) &\equiv v_k\, \hat{x}_k  &&\text{(Projection onto } \hat{x}_k \text{) (Figure 5.3)}\\
P_{\hat{n}}(v) &\equiv (v \cdot \hat{n})\, \hat{n}  &&\text{(Projection onto } \hat{n} \text{) (Figure 5.4)}\\
D(\varphi) &\equiv \varphi'  &&\text{(Differentiation)}\\
\left( \textstyle\int^x \right)(f) &\equiv \int^x f(x')\, dx'  &&\text{(Anti-differentiation)}
\end{aligned}
$$
Let’s pick one and demonstrate that it is, indeed, linear. We can prove both
conditions at the same time using the vector cv1 + v2 as a starting point. I’ve
Figure 5.3: Projection onto the direction ẑ, a.k.a. x̂3
Therefore,
$$ T_A(v) \;\equiv\; A\,v $$
is linear. This is the linear transformation, $T_A$, induced by the matrix A. We’ll look at it more closely in a moment.
[Exercise. Prove these two claims about matrix-multiplication.]
[Exercise. Look at these mappings of $C^3$ into itself.
$$
\begin{aligned}
T_1 &: \begin{pmatrix} x \\ y \\ z \end{pmatrix} \longmapsto \begin{pmatrix} 2x \\ 2y \\ 2z \end{pmatrix}
&\qquad
T_2 &: \begin{pmatrix} x \\ y \\ z \end{pmatrix} \longmapsto \begin{pmatrix} \sqrt{2}\, i\, x \\ 3y \\ (4 - 1i)\, z \end{pmatrix}
\\[6pt]
T_3 &: \begin{pmatrix} x \\ y \\ z \end{pmatrix} \longmapsto \begin{pmatrix} x + 2 \\ y + 2i \\ z + \sqrt{2} \end{pmatrix}
&\qquad
T_4 &: \begin{pmatrix} x \\ y \\ z \end{pmatrix} \longmapsto \begin{pmatrix} -y \\ x \\ z \end{pmatrix}
\\[6pt]
T_5 &: \begin{pmatrix} x \\ y \\ z \end{pmatrix} \longmapsto \begin{pmatrix} xy \\ y \\ z^2 \end{pmatrix}
&\qquad
T_6 &: \begin{pmatrix} x \\ y \\ z \end{pmatrix} \longmapsto \begin{pmatrix} 0 \\ xyz \\ 0 \end{pmatrix}
\end{aligned}
$$
Which are linear, which are not? Support each claim with a proof or counter example.]
We now apply T to v and make use of the linearity (the sum could be infinite) to get
$$ T\,v \;=\; T\!\left( \sum_{k=1}^{n} \beta_k\, \mathbf{b}_k \right) \;=\; \sum_{k=1}^{n} \beta_k\, T(\mathbf{b}_k) \, . $$
What does this say? It tells us that if we know what T does to the basis vectors,
we know what it does to all vectors in the space. Let’s say T is some hard-to-
determine function which is actually not known analytically (by formula), but by
experimentation we are able to determine its action on the basis. We can extend that
knowledge to any vector because the last result tells us that the coordinates of the
vector combined with the known values of T on the basis are enough. In short, the
small set of vectors
{T (b1 ) , T (b2 ) , ...}
completely determines T .
The plan
The Details
Figures 5.5 and 5.6 show the result of the rotation on the two basis vectors.
Figure 5.6: Rotation of ŷ counter-clockwise by π/2
therefore, for any v with natural basis coordinates $(v_x, v_y)^t$, linearity allows us to write
$$ R_{\pi/2}(v) \;=\; R_{\pi/2}(v_x \hat{x} + v_y \hat{y}) \;=\; v_x\, R_{\pi/2}(\hat{x}) + v_y\, R_{\pi/2}(\hat{y}) \;=\; v_x\, \hat{y} \;-\; v_y\, \hat{x} \, . $$
From knowledge of the linear transformation on the basis alone, we were able to derive a formula applicable to the entire space. Stated in column vector form,
$$ R_{\pi/2} \begin{pmatrix} x \\ y \end{pmatrix} \;=\; \begin{pmatrix} -y \\ x \end{pmatrix} , $$
and we have our formula for all space (assuming the natural basis coordinates). It’s
that easy.
[Exercise. Develop the formula for a counter-clockwise rotation (again in R2 )
through an arbitrary angle, θ. Show your derivation based on its effect on the natural
basis vectors.]
[Exercise. What is the formula for a rotation, $R_{z,\pi/2}$, about the z-axis in $R^3$ through a 90° angle, counter-clockwise when looking down from the positive z-axis? Show your derivation based on its effect on the natural basis vectors, $\{\hat{x}, \hat{y}, \hat{z}\}$.]
which turns out to be linear. As you see, both A, and therefore, TA , might map
vectors into a different-sized vector space; sometimes m > n, sometimes m < n and
sometimes m = n.
We showed that the action of T on the few vectors in basis A completely determines
its definition on all vectors v. This happened because of linearity,
$$ T\,v \;=\; \sum_{k=1}^{n} \alpha_k\, T(\mathbf{a}_k) \, . $$
Let’s write the sum in a more instructive way as the formal (but not quite legal) “dot
product” of a (row of vectors) with a (column of scalars). Shorthand for the above
sum then becomes
$$ T\,v \;=\; \Bigl( T(\mathbf{a}_1),\; T(\mathbf{a}_2),\; \ldots,\; T(\mathbf{a}_n) \Bigr) \;\text{“}\cdot\text{”}\; \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} . $$
Now we expand each vector $T(\mathbf{a}_k)$ vertically into its coordinates relative to the same basis, A, and we will have a legitimate product,
$$
\begin{pmatrix}
(T a_1)_1 & (T a_2)_1 & \cdots & (T a_n)_1 \\
(T a_1)_2 & (T a_2)_2 & \cdots & (T a_n)_2 \\
\vdots & \vdots & \ddots & \vdots \\
(T a_1)_n & (T a_2)_n & \cdots & (T a_n)_n
\end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} ,
$$
which reveals that T is nothing more than multiplication by a matrix made up of the constants $a_{jk} \equiv (T\,\mathbf{a}_k)_j$.
Executive Summary. To get a matrix, $M_T$, for any linear transformation, T, form a matrix whose columns are T applied to each basis vector. We can then multiply any vector v by the matrix $M_T = \left( a_{jk} \right)$ to get T(v).
Notation
Because this duality between matrices and linear transformations is so tight, we rarely
bother to distinguish one from the other. If we start with a linear transformation, T ,
we just use T as its matrix (and do away with the notation MT ). If we start with a
matrix, A, we just use A as its induced linear transformation, (and do away with the
notation TA ).
Example
We’ve got the formula for a linear transformation that rotates vectors counter-clockwise in $R^2$, namely,
$$ R_{\pi/2} \begin{pmatrix} x \\ y \end{pmatrix} \;=\; \begin{pmatrix} -y \\ x \end{pmatrix} . $$
To compute its matrix relative to the natural basis, $\{\hat{x}, \hat{y}\}$, we form
$$ M_{R_{\pi/2}} \;=\; \Bigl( R_{\pi/2}(\hat{x}) ,\; R_{\pi/2}(\hat{y}) \Bigr) \;=\; \left( R_{\pi/2}\begin{pmatrix} 1 \\ 0 \end{pmatrix} ,\; R_{\pi/2}\begin{pmatrix} 0 \\ 1 \end{pmatrix} \right) \;=\; \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} , $$
as required.
[Exercise. Show that the matrix for the scaling transformation
$$ S_{3i}(v) \;\equiv\; 3i\, v $$
is
$$ M_{S_{3i}} \;=\; \begin{pmatrix} 3i & 0 \\ 0 & 3i \end{pmatrix} , $$
and verify that it works by multiplying this matrix by an arbitrary vector to recover the definition of $S_{3i}$.]
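Here is a small Python sketch that builds both matrices exactly this way — the columns are the transformation applied to the natural basis vectors (the helper matrix_of is mine):

    import numpy as np

    def matrix_of(T, dim):
        """Matrix of a linear transformation T (given as a function on vectors),
        relative to the natural basis: columns are T applied to each basis vector."""
        return np.column_stack([T(e) for e in np.eye(dim)])

    R = matrix_of(lambda v: np.array([-v[1], v[0]]), 2)   # R_{pi/2}: (x, y) -> (-y, x)
    S = matrix_of(lambda v: 3j * v, 2)                    # S_{3i}:   v -> 3i v

    print(R)                     # [[0, -1], [1, 0]]
    print(S)                     # [[3i, 0], [0, 3i]]

    v = np.array([-3., 10.])
    print(R @ v)                 # [-10, -3], same as the formula R_{pi/2}(v) = (-v_y, v_x)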
5.4.3 Dependence of a Matrix on Basis
When we are using an obvious and natural orthonormal basis (like {x̂, ŷ, ẑ} for C3 ),
everything is fairly straightforward. However, there will be times when we want a
different basis than the one we thought, originally, was the preferred or natural. The
rule is very strict, but simple.
In short, don’t mix bases when you are expressing vectors and linear transformations.
And don’t forget what your underlying common basis is, especially if it’s not the
preferred basis.
Remember, there is always some innate definition of our vectors, independent of
basis. Distinct from this, we have the expression of those vectors in a basis.
Say v is a vector. If we want to express it using coordinates along the A basis on
Sunday, then “cabaret” along the B basis all day Monday, that’s perfectly fine. They
are all the same object, and this notation could be employed to express that fact:
v = v|A = v|B
This is confusing for beginners, because they usually start with vectors in $R^2$, which have innate coordinates, $\begin{pmatrix} x \\ y \end{pmatrix}$, that really don’t refer to a basis. As we’ve seen,
y
though, the vector happens to look the same as its expression in the natural basis,
{x̂, ŷ}, so without the extra notation, above, there’s no way to know (nor is there
usually the need to know) whether the author is referring to the vector, or its coordi-
nates in the natural basis. If we ever want to disambiguate the conversation, we will
use the notation, v|A , where A is some basis we have previously described.
The same goes for T ’s matrix. If I want to describe T in a particular basis, I will
say something like
T = T |A = T |B ,
and now I don’t even need to use MT , since the |A or |B implies we are talking about
a matrix.
So the basis-free statement,
w = T (v) ,
can be viewed in one basis as
w|A = T |A · v|A ,
and in another basis as
w|B = T |B · v|B .
Matrix of MT in a Non-Standard Basis
There was nothing special about the preferred basis in this formula; if we had any
basis – even one that was non-orthonormal – the formula would still be
$$ T\big|_B \;=\; \Bigl( T(\mathbf{b}_1)\big|_B ,\; T(\mathbf{b}_2)\big|_B ,\; \ldots ,\; T(\mathbf{b}_n)\big|_B \Bigr) \, . $$
The way to see this most easily is to first note that in any basis B, each basis vector
bk , when expressed in its own B-coordinates, looks exactly like the kth preferred basis
element, i.e.,
$$ \mathbf{b}_k \big|_B \;=\; \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}_B \quad \longleftarrow \; k\text{th element} \, . $$
Example
The transformation
$$ T: \; \begin{pmatrix} x \\ y \end{pmatrix} \;\longmapsto\; \begin{pmatrix} x \\ (3i)\, y \end{pmatrix} $$
in the context of the vector space $C^2$ is going to have the preferred-basis matrix
$$ M_T \;=\; \begin{pmatrix} 1 & 0 \\ 0 & 3i \end{pmatrix} . $$
We compute each $T(\mathbf{c}_k)\big|_C$, k = 1, 2.
First up, $T(\mathbf{c}_1)\big|_C$:
$$ T(\mathbf{c}_1) \;=\; T\begin{pmatrix} 2 \\ 0 \end{pmatrix} \;=\; \begin{pmatrix} 2 \\ 0 \end{pmatrix} . $$
Everything needs to be expressed in the C basis, so we show that for this vector:
$$ \begin{pmatrix} 2 \\ 0 \end{pmatrix} \;=\; 1 \begin{pmatrix} 2 \\ 0 \end{pmatrix} \;+\; 0 \begin{pmatrix} 1 \\ 1 \end{pmatrix} \;=\; \begin{pmatrix} 1 \\ 0 \end{pmatrix}_C \, , $$
so this last column vector will be the first column of our matrix.
Next, $T(\mathbf{c}_2)\big|_C$:
$$ T(\mathbf{c}_2) \;=\; T\begin{pmatrix} 1 \\ 1 \end{pmatrix} \;=\; \begin{pmatrix} 1 \\ 3i \end{pmatrix} . $$
We have to express this, too, in the C basis, a task that requires a modicum of algebra:
$$ \begin{pmatrix} 1 \\ 3i \end{pmatrix} \;=\; \alpha \begin{pmatrix} 2 \\ 0 \end{pmatrix} \;+\; \beta \begin{pmatrix} 1 \\ 1 \end{pmatrix} . $$
Solving gives $\beta = 3i$ and $\alpha = \frac{1-3i}{2}$, so
$$ T(\mathbf{c}_2)\big|_C \;=\; \begin{pmatrix} \dfrac{1-3i}{2} \\[8pt] 3i \end{pmatrix}_C \, . $$
Even though we already know how to get the matrix in the preferred basis, A, let’s analyze it, briefly. Take the scaling transformation $S_{\sqrt{2}/2}$ in $R^2$ whose matrix is
$$ \begin{pmatrix} \sqrt{2}/2 & 0 \\ 0 & \sqrt{2}/2 \end{pmatrix}_A . $$
(You constructed the matrix for an $S_c$ in an earlier exercise today using the scaling factor c = 3i, but just to summarize, we apply $S_{\sqrt{2}/2}$ to the two vectors $(1, 0)^t$ and $(0, 1)^t$, then make the output vectors the columns of the required matrix.) Remember, each column is computed by listing the coordinates of $T(\hat{x})$ and $T(\hat{y})$ in the natural basis. But for either vector, its coordinates can be expressed using the “dotting” trick because we have an orthonormal basis. So,
$$
\begin{pmatrix} \sqrt{2}/2 & 0 \\ 0 & \sqrt{2}/2 \end{pmatrix}_A
\;=\;
\begin{pmatrix}
\hat{x} \cdot \begin{pmatrix} \sqrt{2}/2 & 0 \\ 0 & \sqrt{2}/2 \end{pmatrix} \hat{x}
&
\hat{x} \cdot \begin{pmatrix} \sqrt{2}/2 & 0 \\ 0 & \sqrt{2}/2 \end{pmatrix} \hat{y}
\\[10pt]
\hat{y} \cdot \begin{pmatrix} \sqrt{2}/2 & 0 \\ 0 & \sqrt{2}/2 \end{pmatrix} \hat{x}
&
\hat{y} \cdot \begin{pmatrix} \sqrt{2}/2 & 0 \\ 0 & \sqrt{2}/2 \end{pmatrix} \hat{y}
\end{pmatrix} .
$$
This is true for any T and the preferred basis in R2 . To illustrate, let’s take the upper
left component. It’s given by the dot product
$$ T_{11} \;=\; \left\langle \hat{x} \,\middle|\, T(\hat{x}) \right\rangle \, . $$
Similarly, the other three are given by $T_{12} = \langle \hat{x} \,|\, T(\hat{y})\rangle$, $T_{21} = \langle \hat{y} \,|\, T(\hat{x})\rangle$ and $T_{22} = \langle \hat{y} \,|\, T(\hat{y})\rangle$. In other words,
$$ T\big|_A \;=\; \begin{pmatrix} \langle \hat{x} \,|\, T(\hat{x})\rangle & \langle \hat{x} \,|\, T(\hat{y})\rangle \\[4pt] \langle \hat{y} \,|\, T(\hat{x})\rangle & \langle \hat{y} \,|\, T(\hat{y})\rangle \end{pmatrix} . $$
To make things crystal clear, let’s rename the natural basis vectors
A ≡ { ê1 , ê2 },
But this formula, and the logic that led to it, would work for any orthonormal basis,
not just A, and in any vector space, not just R2 .
Summary. The jkth matrix element for the transformation, T , in an orthonormal
basis,
$$ B \;=\; \{\, \mathbf{b}_1 , \mathbf{b}_2 , \ldots , \mathbf{b}_n \,\} $$
is given by
$$ T_{jk} \;=\; \left\langle \hat{b}_j \,\middle|\, T(\hat{b}_k) \right\rangle_B \, , $$
so, in the two-dimensional case,
$$ T\big|_B \;=\; \begin{pmatrix} \left\langle \hat{b}_1 \,\middle|\, T(\hat{b}_1)\right\rangle & \left\langle \hat{b}_1 \,\middle|\, T(\hat{b}_2)\right\rangle \\[6pt] \left\langle \hat{b}_2 \,\middle|\, T(\hat{b}_1)\right\rangle & \left\langle \hat{b}_2 \,\middle|\, T(\hat{b}_2)\right\rangle \end{pmatrix} . $$
Not only that, but we don’t even have to start with a preferred basis to express our T and B that are used in the formula. As long as T and B are both expressed in the same basis – say some third C – we can use the coordinates and matrix elements relative to C to compute $\left\langle \hat{b}_j \,\middle|\, T(\hat{b}_k) \right\rangle$ and thus give us the matrix of $T\big|_B$.
Happy Note. That third basis, C, doesn’t even have to be orthonormal. As long as B, the desired basis in which we are seeking a new representation, is orthonormal, it all works.
Example 1
in a basis that we encountered in a previous lesson (which was named C then, but
we’ll call B today, to make clear the application of the above formulas),
$$ B \;\equiv\; \{\mathbf{b}_1 , \mathbf{b}_2\} \;=\; \left\{ \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix} ,\; \begin{pmatrix} -\sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix} \right\} . $$
$$
S_{\sqrt{2}/2}\Big|_B
\;=\;
\begin{pmatrix}
\left\langle \hat{b}_1 \,\middle|\, \tfrac{\sqrt{2}}{2}\, \hat{b}_1 \right\rangle & \left\langle \hat{b}_1 \,\middle|\, \tfrac{\sqrt{2}}{2}\, \hat{b}_2 \right\rangle \\[8pt]
\left\langle \hat{b}_2 \,\middle|\, \tfrac{\sqrt{2}}{2}\, \hat{b}_1 \right\rangle & \left\langle \hat{b}_2 \,\middle|\, \tfrac{\sqrt{2}}{2}\, \hat{b}_2 \right\rangle
\end{pmatrix}
\;=\;
\begin{pmatrix}
\left\langle \hat{b}_1 \,\middle|\, \begin{pmatrix} \tfrac12 \\ \tfrac12 \end{pmatrix} \right\rangle & \left\langle \hat{b}_1 \,\middle|\, \begin{pmatrix} -\tfrac12 \\ \tfrac12 \end{pmatrix} \right\rangle \\[12pt]
\left\langle \hat{b}_2 \,\middle|\, \begin{pmatrix} \tfrac12 \\ \tfrac12 \end{pmatrix} \right\rangle & \left\langle \hat{b}_2 \,\middle|\, \begin{pmatrix} -\tfrac12 \\ \tfrac12 \end{pmatrix} \right\rangle
\end{pmatrix}
\;=\;
\begin{pmatrix} \tfrac{\sqrt{2}}{2} & 0 \\[4pt] 0 & \tfrac{\sqrt{2}}{2} \end{pmatrix}_B .
$$
That’s surprising (or not). The matrix is the same in the B basis as it is in the A basis. Two questions come to mind:
1) Does it make sense?
2) Is this going to be true for all orthonormal bases and all transformations?
These are the kinds of questions you have to ask yourself when you are manipulating mathematical symbols in new and unfamiliar territory. I’ll walk you through it.
1) Does it make sense?
Let’s look at the result of
$$ S_{\sqrt{2}/2} \begin{pmatrix} -3 \\ 10 \end{pmatrix} $$
in both bases.
• First we’ll compute it in the A-basis and transform the resulting output vector
to the B-basis.
• When we’ve done that, we’ll start over, but this time first convert the starting
vector and matrix into B-basis coordinates and use those to compute the result,
giving us an answer in terms of the B-basis.
• We’ll compare the two answers to see if they are equal.
By picking a somewhat random-looking vector, (−3, 10)t , and discovering that
both T |A and T |B turn it into equivalent output vectors, we will be satisfied that the
result makes sense; apparently it is true that the matrix for S√2/2 looks the same
when viewed in both bases. But we haven’t tested that, so let’s do it.
A-Basis. In the A basis we already know that the output vector must be
$$ \begin{pmatrix} -3\sqrt{2}/2 \\ 5\sqrt{2} \end{pmatrix}_A \, , $$
because that’s just the application of the transformation to the innate vector, which is the same as applying the matrix to the preferred coordinates. We transform the output vector into B-basis:
$$
\begin{pmatrix} -3\sqrt{2}/2 \\ 5\sqrt{2} \end{pmatrix}_A
\;=\;
\begin{pmatrix}
\mathbf{b}_1 \cdot \begin{pmatrix} -3\sqrt{2}/2 \\ 5\sqrt{2} \end{pmatrix} \\[10pt]
\mathbf{b}_2 \cdot \begin{pmatrix} -3\sqrt{2}/2 \\ 5\sqrt{2} \end{pmatrix}
\end{pmatrix}_B
\;=\;
\begin{pmatrix} -\tfrac32 + 5 \\[4pt] \tfrac32 + 5 \end{pmatrix}_B
\;=\;
\begin{pmatrix} \tfrac72 \\[4pt] \tfrac{13}{2} \end{pmatrix}_B \, .
$$
B-Basis. Now, do it again, but this time convert the input vector, $\begin{pmatrix} -3 \\ 10 \end{pmatrix}$, to B-coordinates, and run it through $S_{\sqrt{2}/2}\big|_B$. First the input vector:
$$
\begin{pmatrix} -3 \\ 10 \end{pmatrix}_A
\;=\;
\begin{pmatrix}
\mathbf{b}_1 \cdot \begin{pmatrix} -3 \\ 10 \end{pmatrix} \\[10pt]
\mathbf{b}_2 \cdot \begin{pmatrix} -3 \\ 10 \end{pmatrix}
\end{pmatrix}_B
\;=\;
\begin{pmatrix} -\tfrac{3\sqrt{2}}{2} + \tfrac{10\sqrt{2}}{2} \\[6pt] \tfrac{3\sqrt{2}}{2} + \tfrac{10\sqrt{2}}{2} \end{pmatrix}_B
\;=\;
\begin{pmatrix} \tfrac{7\sqrt{2}}{2} \\[6pt] \tfrac{13\sqrt{2}}{2} \end{pmatrix}_B \, .
$$
Finally, we apply $S_{\sqrt{2}/2}\big|_B$ to these B coordinates:
$$
S_{\sqrt{2}/2}\Big|_B \begin{pmatrix} \tfrac{7\sqrt{2}}{2} \\[6pt] \tfrac{13\sqrt{2}}{2} \end{pmatrix}_B
\;=\;
\begin{pmatrix} \tfrac{\sqrt{2}}{2} & 0 \\[4pt] 0 & \tfrac{\sqrt{2}}{2} \end{pmatrix}
\begin{pmatrix} \tfrac{7\sqrt{2}}{2} \\[6pt] \tfrac{13\sqrt{2}}{2} \end{pmatrix}_B
\;=\;
\begin{pmatrix} \tfrac72 \\[4pt] \tfrac{13}{2} \end{pmatrix}_B \, .
$$
And ... we get the same answer. Apparently there is no disagreement, and other tests would give the same results, so it seems we made no mistake when we derived the matrix $S_{\sqrt{2}/2}\big|_B$ and got the same answer as $S_{\sqrt{2}/2}\big|_A$.
This is not a proof, but it is easy enough to do (next exercise). A mathematically-minded student might prefer to just do the proof and be done with it, while an applications-oriented person may prefer just to test it on a random vector to see if s/he is on the right track.
[Exercise. Prove that for any v, you get the same result for S√2/2 (v) whether
you use coordinates in the A basis or in the B basis, thus confirming once-and-for-all
that the matrix of this scaling transformation is the same in both bases.]
2) Do we always get the same matrix?
Certainly not – otherwise, why would I have burdened you with a formula for computing the matrix of a linear transformation in an arbitrary basis?
Example 2
$$ P_{\hat{n}}\Big|_A \;=\; \begin{pmatrix} (P_{\hat{n}})_{11} & (P_{\hat{n}})_{12} \\[4pt] (P_{\hat{n}})_{21} & (P_{\hat{n}})_{22} \end{pmatrix} \;=\; \begin{pmatrix} \langle \hat{e}_1 \,|\, P_{\hat{n}}(\hat{e}_1)\rangle & \langle \hat{e}_1 \,|\, P_{\hat{n}}(\hat{e}_2)\rangle \\[4pt] \langle \hat{e}_2 \,|\, P_{\hat{n}}(\hat{e}_1)\rangle & \langle \hat{e}_2 \,|\, P_{\hat{n}}(\hat{e}_2)\rangle \end{pmatrix} . $$
Since
$$ P_{\hat{n}}(\hat{e}_1) \;=\; \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix} \qquad\text{and}\qquad P_{\hat{n}}(\hat{e}_2) \;=\; \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix} $$
(exercise: prove it), then
$$ P_{\hat{n}}\Big|_A \;=\; \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix} $$
(exercise: prove it).
The B Matrix. Now use the B basis.
$$ P_{\hat{n}}\Big|_B \;=\; \begin{pmatrix} \left\langle \hat{b}_1 \,\middle|\, P_{\hat{n}}(\hat{b}_1)\right\rangle & \left\langle \hat{b}_1 \,\middle|\, P_{\hat{n}}(\hat{b}_2)\right\rangle \\[6pt] \left\langle \hat{b}_2 \,\middle|\, P_{\hat{n}}(\hat{b}_1)\right\rangle & \left\langle \hat{b}_2 \,\middle|\, P_{\hat{n}}(\hat{b}_2)\right\rangle \end{pmatrix}_B . $$
Remember, for this to work, both b̂1 and Pn̂ (b̂1 ) have to be expressed in this same
basis, and we almost always express them in the B basis, since that’s how we know
everything. Now,
√
Pn̂ (b̂1 ) = √2/2 and
2/2
0
Pn̂ (b̂2 ) =
0
(exercise: prove it), so
$$ P_{\hat{n}}\Big|_B \;=\; \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}_B $$
(exercise: prove it), a very different matrix than $P_{\hat{n}}\big|_A$.
[Exercise. Using $(-3, 10)^t$ (or a general v) confirm (prove) that the matrices $P_{\hat{n}}\big|_A$ and $P_{\hat{n}}\big|_B$ produce output coordinates of the identical intrinsic output vector, $P_{\hat{n}}\bigl( (-3, 10)^t \bigr)$. Hint: use an argument similar to the one above.]
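If you would like a numerical companion to this exercise, here is a numpy sketch (the helpers proj and matrix_in_basis are mine) that builds P_n̂ in both bases from the ⟨b_j | T(b_k)⟩ recipe and checks that the two matrices describe the same intrinsic output vector:

    import numpy as np

    def proj(n_hat):
        """The projection operator P_n: v -> (v . n) n, as a function on R^2 vectors."""
        return lambda v: np.dot(v, n_hat) * n_hat

    def matrix_in_basis(T, basis):
        """Matrix elements T_jk = <b_j | T(b_k)> for an orthonormal basis."""
        return np.array([[np.vdot(bj, T(bk)) for bk in basis] for bj in basis])

    s = np.sqrt(2) / 2
    P = proj(np.array([s, s]))

    A = [np.array([1., 0.]), np.array([0., 1.])]      # preferred basis
    B = [np.array([s, s]),  np.array([-s, s])]        # the rotated orthonormal basis

    PA = matrix_in_basis(P, A)
    PB = matrix_in_basis(P, B)
    print(PA)        # [[.5, .5], [.5, .5]]
    print(PB)        # [[1, 0], [0, 0]]

    v  = np.array([-3., 10.])
    vB = np.array([np.vdot(b, v) for b in B])          # v in B coordinates
    out_A = PA @ v                                      # answer in A coordinates
    out_B = PB @ vB                                     # answer in B coordinates
    # convert the B-coordinate answer back to A coordinates and compare
    print(out_A, out_B[0] * B[0] + out_B[1] * B[1])     # both equal [3.5, 3.5]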
In other words, we form the transpose of M , then take the complex conjugate of every
element in that matrix.
Examples.
$$ \begin{pmatrix} 1-i & 3 & \sqrt{\pi} \\ 0 & -2.5 & 7+7i \\ 6 & 7i & 88 \end{pmatrix}^{\!\dagger} \;=\; \begin{pmatrix} 1+i & 0 & 6 \\ 3 & -2.5 & -7i \\ \sqrt{\pi} & 7-7i & 88 \end{pmatrix} $$
$$ \begin{pmatrix} 1-i & 3+2i & \sqrt{\pi} & 99 \\ 5 & -2.5 & 7+7i & e^{i} \end{pmatrix}^{\!\dagger} \;=\; \begin{pmatrix} 1+i & 5 \\ 3-2i & -2.5 \\ \sqrt{\pi} & 7-7i \\ 99 & e^{-i} \end{pmatrix} $$
When either m = 1 or n = 1, the matrix is usually viewed as a vector. The adjoint operation turns column vectors into row vectors (while also conjugating them), and vice versa:
$$ \begin{pmatrix} 1+i \\ \sqrt{2} - 2i \end{pmatrix}^{\!\dagger} \;=\; \bigl( 1-i \, , \; \sqrt{2} + 2i \bigr) $$
$$ \bigl( 3+7i \, , \; \sqrt{2} \bigr)^{\dagger} \;=\; \begin{pmatrix} 3-7i \\ \sqrt{2} \end{pmatrix} $$
Because linear transformations always have a matrix associated with them (once we
have established a basis), we can carry the definition of adjoint easily over to linear
transformations.
Given a linear transformation, T, with matrix $M_T$ (in some agreed-upon basis), its adjoint, $T^{\dagger}$, is the linear transformation defined by (multiplication by) the matrix $(M_T)^{\dagger}$. It can be stated in terms of its action on an arbitrary vector,
$$ T^{\dagger}(v) \;\equiv\; (M_T)^{\dagger} \cdot v \, , \qquad \text{for all } v \in V . $$
[Food for Thought. This definition requires that we have a basis established,
otherwise we can’t get a matrix. But is the adjoint of a linear transformation, in this
definition, going to be different for different bases? Hit the CS 83A discussion forums,
please.]
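In code, the adjoint is just “transpose, then conjugate.” A minimal numpy sketch (the helper adjoint is mine) reproducing the first example above:

    import numpy as np

    def adjoint(M):
        """Adjoint (dagger): transpose, then complex-conjugate every element."""
        return np.conj(np.asarray(M)).T          # same as M.conj().T

    M = np.array([[1 - 1j, 3,    np.sqrt(np.pi)],
                  [0,     -2.5,  7 + 7j],
                  [6,      7j,   88]])

    print(adjoint(M))                  # matches the worked example above

    v = np.array([[1 + 1j], [np.sqrt(2) - 2j]])   # a column vector
    print(adjoint(v))                  # the row vector (1-i, sqrt(2)+2i)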
Quantum Logic Gates Are Unitary Operators
$$ v \;\longrightarrow\; \boxed{\,U\,} \;\longrightarrow\; U\,v $$
Quantum computing uses a much richer class of logical operators (or gates) than
classical computing. Rather than the relatively small set of gates: XOR, AND,
NAND, etc., in classical computing, quantum computers have infinitely many different
logic gates that can be applied. On the other hand, the logical operators of quantum
computing are of a special form not required of classical computing; they must be
unitary, i.e., reversible.
$$ \langle\, U v \,|\, U w \,\rangle \;=\; \langle\, v \,|\, w \,\rangle \, . $$
While this is a statement about the preservation of inner products, it has many implications, one of which is that distances are preserved, i.e., for all v, $\| U v \| = \| v \|$.
Caution. This theorem is true, in part, because inner products are positive-
definite, i.e., v 6= 0 ⇒ kvk > 0 (lengths of non-zero vectors are strictly positive).
However, we’ll encounter some very important pseudo inner products that are not
positive definite, and in those cases the three conditions will not be interchangeable.
More on that when we introduce the classical bit and quantum bit, a few lectures hence.
A concrete example that serves as a classic specimen is a rotation, Rθ , in any
Euclidean space, real or complex.
Notation. It is common to use the letter, U , rather than T , for a unitary operator
(in the absence of a more specific designation, like Rθ ).
Because of this theorem, we can use any of the three conditions as the definition
of unitarity, and for practical reasons, we choose the third.
Practical Definition #1. A linear transformation, U , is unitary (a.k.a. a
unitary operator ) if its matrix (in any orthonormal basis) has orthonormal columns
(or equivalently orthonormal rows), i.e., U is unitary ⇐⇒ for any orthonormal basis,
$$ B \;=\; \{ \mathbf{b}_k \}_{k=1}^{n} \, , $$
the column vectors of $U\big|_B$ satisfy
Note that we only need to verify this condition for a single orthonormal basis (exercise,
below).
The Matrix of a Unitary Operator. A Matrix, M , is called unitary if its
adjoint is also its inverse, i.e.,
$$ M^{\dagger} M \;=\; M M^{\dagger} \;=\; \mathbf{1} \, . $$
The same goes for the operator itself: U is unitary exactly when
$$ U^{\dagger} U \;=\; U U^{\dagger} \;=\; \mathbf{1} \, . $$
Dotting the two columns, we get
$$ \begin{pmatrix} \cos\theta \\ -\sin\theta \end{pmatrix} \cdot \begin{pmatrix} \sin\theta \\ \cos\theta \end{pmatrix} \;=\; \cos\theta \sin\theta - \sin\theta \cos\theta \;=\; 0 \, , $$
and dotting a column with itself (we’ll just do the first column to demonstrate) gives
$$ \begin{pmatrix} \cos\theta \\ -\sin\theta \end{pmatrix} \cdot \begin{pmatrix} \cos\theta \\ -\sin\theta \end{pmatrix} \;=\; \cos^2\theta + \sin^2\theta \;=\; 1 \, , $$
Because this is a complex matrix, we have to apply the full inner-product machinery,
which requires that we not forget to take conjugates. The inner product of the two
columns is easy enough,
$$ \left\langle \begin{pmatrix} e^{i\theta} \\ 0 \end{pmatrix} \,\middle|\, \begin{pmatrix} 0 \\ e^{i\varphi} \end{pmatrix} \right\rangle \;=\; e^{-i\theta} \cdot 0 \,+\, 0 \cdot e^{i\varphi} \;=\; 0 \, , $$
but do notice that we had to take the complex conjugate of the first vector – even
though failing to have done so would have been an error that still gave us the right
answer. The inner product of a column with itself (we’ll just show the first column)
gives
iθ iθ
e e
= e−iθ · eiθ + 0 · 0
0 0
= e0 + 0 = 1,
and we have orthonormality. Once again, the complex conjugate was essential in the
computation.
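A quick numerical check of unitarity — adjoint-times-matrix equals the identity, equivalently orthonormal columns — for the two examples above, and for one matrix that fails the test (a numpy sketch; the names are mine):

    import numpy as np

    def is_unitary(M, tol=1e-12):
        """A matrix is unitary when its adjoint is its inverse: M-dagger M = 1."""
        M = np.asarray(M, dtype=complex)
        return np.allclose(M.conj().T @ M, np.eye(M.shape[0]), atol=tol)

    theta, phi = 0.7, 2.1

    R = np.array([[np.cos(theta),  np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])          # rotation
    D = np.diag([np.exp(1j * theta), np.exp(1j * phi)])      # phases on the diagonal
    P = np.array([[0.5, 0.5], [0.5, 0.5]])                   # the projection from earlier

    print(is_unitary(R), is_unitary(D), is_unitary(P))       # True True False

    # unitaries preserve inner products, hence lengths
    v = np.array([1 + 1j, 2 - 3j])
    print(np.linalg.norm(D @ v), np.linalg.norm(v))          # equal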
whose columns are orthogonal (do it), but whose column vectors are not unit length, since
$$ \left\langle \begin{pmatrix} c \\ 0 \end{pmatrix} \,\middle|\, \begin{pmatrix} c \\ 0 \end{pmatrix} \right\rangle \;=\; c^{*} c \,+\, 0 \cdot 0 \;=\; |c|^2 \;\neq\; 1 \, , $$
by construction.
Note. In our projective Hilbert spaces, such transformations don’t really exist,
since we consider all vectors which differ by a scalar multiple to be the same entity
(state). While this example is fine for learning, and true for non-projective Hilbert
spaces, it doesn’t represent a real operator in quantum mechanics.
A Projection Operator. Another example we saw a moment ago is the projec-
tion onto a vector’s 1-dimensional subspace in R2 (although this will be true of such
a projection in Rn , n ≥ 2). That was
$$ P_{\hat{n}}(v) \;\equiv\; (v \cdot \hat{n})\, \hat{n} \, , \qquad\text{where}\quad \hat{n} \;=\; \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix} . $$
We only need look at Pn̂ (v) in either of the two bases for which we computed its
matrices to see that this is not unitary. Those are done above, but you can fill in a
small detail:
[Exercise. Show that both matrices for this projection operator fail to have
orthonormal columns.]
whose four components are, of course, unknown to us. We’ll express them temporarily as
$$ \begin{pmatrix} \alpha & \gamma \\ \beta & \delta \end{pmatrix}_D . $$
3. Computing the matrix in the previous step requires a little calculation: you
cannot use the earlier “trick”
$$ T_{jk} \;=\; \left\langle \hat{b}_j \,\middle|\, T(\hat{b}_k) \right\rangle_B \, , $$
by drawing a picture. Now, get the D-coordinates of this vector by solving the
A-coordinate equation,
$$ \begin{pmatrix} 0 \\ 2 \end{pmatrix} \;=\; \alpha \begin{pmatrix} 2 \\ 0 \end{pmatrix} \;+\; \beta \begin{pmatrix} 1 \\ 1 \end{pmatrix} $$
for α and β.
4. Even though this will immediately tell you that $R_{\pi/2}\big|_D$ is not orthonormal, go on to get the full matrix by solving for the second column, $(\gamma, \delta)^t$, and showing that neither column is normalized and the two columns are not orthogonal.]
Definitions and Examples
$$ M \text{ is Hermitian} \quad\Longleftrightarrow\quad M^{\dagger} \;=\; M \, . $$
[Exercise. Explain why the matrices
$$ \begin{pmatrix} 1+i & 0 & 6 \\ 3 & -2.5 & -7i \\ \sqrt{\pi} & 7-7i & 88 \end{pmatrix} \qquad\text{and}\qquad \begin{pmatrix} 1 & 0 & 3 & 0 \\ 0 & 2 & -1 & 0 \\ 3 & -1 & 3 & 0 \end{pmatrix} $$
are Hermitian.]
The definition of Hermitian operator implies that the basis chosen does not matter:
either all of T ’s matrices in all bases will be Hermitian or none will be. This is a fact,
but we won’t bother proving it.
Hermitian operators play a starring role in quantum mechanics, as we’ll see shortly.
Chapter 6
6.2.2 The Physical System, S
Here’s the set-up. We have a physical system, call it S (that’s a script “S”), and an
apparatus that can measure some property of S . Also, we have it on good authority
that S behaves according to quantum weirdness; 100 years of experimentation has
confirmed certain things about the behavior of S and its measurement outcomes.
Here are some examples – the last few may be unfamiliar to you, but I will elaborate,
shortly.
• The system is a proton and the measurement is the position of the proton.
• The system is a hydrogen atom (one proton + one electron) and the measure-
ment is the potential energy state of the atom.
• The system is an electron and the measurement is the magnitude of the elec-
tron’s spin projected onto the direction n̂.
In each case, we are measuring a real number that our apparatus somehow is
capable of detecting. In practice, the apparatus is usually measuring something related
to our desired quantity, and we follow that with a computation to get the value of
interest (velocity, momentum, energy, z-component of spin, etc.).
The Hilbert spaces that we get if we measure momentum or position are infinite di-
mensional vector spaces, and the corresponding linear combinations become integrals
rather than sums. We prefer to avoid calculus in this course. For spin, however, our
vector spaces are two dimensional – about as simple as they come. Sums work great.
A Reason Computer Scientists Need to Understand Spin
The spin of an electron, which is known as a spin 1/2 particle, has exactly the kind
of property that we can incorporate into our algorithms. It will have classical aspects
that allow it to be viewed as a classical bit (0 or 1), yet it still has quantum-mechanical aspects that enable us to process it while it is in an “in-between” state – a mixture of 0 and 1.
Spin is a property that every electron possesses. Some properties like charge and
mass are the same for all electrons, while others like position and momentum vary
depending on the electron in question and the exact moment at which we measure. The spin – or, to use the more accurate term, spin state – of an electron has aspects of both. There is an overall magnitude associated with an electron’s spin state that does not change. It is represented by the number 1/2, a value shared by all electrons at all times. But then each electron can have its own unique vector orientation that varies from electron-to-electron or moment-to-moment. We’ll sneak up on the true quantum definition of these two aspects of quantum spin in steps, first by thinking of an electron using inaccurate but intuitive imagery, and we’ll make adjustments as we perform experiments that progressively force us to change our attitude.
A Useful Mental Image of Spin
We indulge our desire to apply classical physics and – with an understanding that
it’s not necessarily true – consider the electron to be a rotating, charged body. Such
an assumption would imbue every electron with an intrinsic angular momentum that
we call spin. If you have not studied basic physics, you can imagine a spinning top; it
has a certain mass distribution, spins at a certain rate (frequency) and its rotational
axis has a certain direction. Combining these three things into a single vector, we
end up defining the angular momentum of the top or, in our case, the spin of the
electron. (See Figure 6.1.)
1. first, its quantity of angular momentum (how “heavy” it is combined with how
fast it’s rotating), which I will call S, and
2. second, the orientation (or direction) of its imagined rotational axis, which I
will call n̂S or sometimes just Ŝ.
The first entity, S, is a scalar. The second, n̂S , can be represented by a unit vector
that points in the direction of the rotational axis (where we adjudicate up vs. down
by a “right hand rule,” which I will let you recall from any one of your early math
classes). (See Figure 6.2.)
So, the total spin vector will be written
$$ \mathbf{S} \;=\; \begin{pmatrix} S_x \\ S_y \\ S_z \end{pmatrix} , $$
Figure 6.2: A classical idea for spin: A 3-D direction and a scalar magnitude
and we can break it into the two aspects, its scalar magnitude,
$$ S \;\equiv\; |\mathbf{S}| \;=\; \sqrt{ S_x^{\,2} + S_y^{\,2} + S_z^{\,2} } \, , $$
The constancy of its magnitude leaves the electron’s spin orientation, Ŝ, as the only
spin-related entity that can change from moment-to-moment or electron-to-electron.
Figure 6.3: Polar and azimuthal angles for the (unit) spin direction
You may have noticed that we don’t need all three components nx , ny and nz .
Since n̂S is unit length, the third can be derived from the other two. [Exercise.
How?] A common way to express spin direction using only two real numbers is
through the so-called polar and azimuthal angles, θ and φ (See Figure 6.3).
In fact, Spherical coordinates provide an alternate means of expressing any vector
using these two angles plus the vector’s length, r. In this language a vector can
be written ( r, θ, φ )Sph , rather than the usual (x, y, z). (The subscript “Sph” is
usually not shown if the context makes clear we are using spherical coordinates.) For
example,
$$ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \;=\; \begin{pmatrix} \sqrt{3} \\ .955 \\ \pi/4 \end{pmatrix}_{Sph} . $$
In the language of spherical coordinates the vector n̂S is written ( 1, θ, φ )Sph , where
the first coordinate is always 1 because n̂S has unit length. The two remaining
coordinates are the ones we just defined, the angles depicted in Figure 6.3.
θ and φ will be important alternatives to the Euclidean coordinates (nx , ny , nz ),
especially as we study the Bloch sphere, density matrices and mixed states, topics in
the next quantum computing course, CS 83B.
Figure 6.5: The z-projection of one electron’s spin
individually. Therefore, we’ll ask the physicists to measure only the real valued
component Sz , the projection of S onto the z-axis. They assure us they can do
this without subjecting the electrons to any net forces that would (classically)
modify the z-component of S.
To aid the visualization, we imagine measuring each electron one-at-a-time and
noting the z-component of spin after each “trial.” (See Figure 6.5.)
3. The Classical Expectation. The results are easy to predict classically since the length of the spin is fixed at $\frac{\sqrt{3}}{2}\hbar$, and the electrons are oriented randomly; we expect $S_z$ to vary between $+\frac{\sqrt{3}}{2}\hbar$ and $-\frac{\sqrt{3}}{2}\hbar$. For example, in one extreme case, we could find
etc.
6.4.2 The Actual Results
We get no such cooperation from nature. In fact, we only – and always – get one of two z-values of spin: $S_z = +\hbar/2$ and $S_z = -\hbar/2$. Furthermore, the two readings appear to be somewhat random, occurring with about equal likelihood and no pattern:
$$ -\frac{\hbar}{2}\;\cdots\; +\frac{\hbar}{2}\;\cdots\; +\frac{\hbar}{2}\;\cdots\; +\frac{\hbar}{2}\;\cdots\; -\frac{\hbar}{2}\;\cdots\; +\frac{\hbar}{2}\;\cdots $$
Figure 6.7: The measurements force Sz to “snap” into one of two values.
• Surprise #1. There are infinitely many quantum spin states available for electrons to secretly experience. Yet when we measure the z-component of spin, the uncooperative particle always reports that this value is either $+\frac{\hbar}{2}$ or $-\frac{\hbar}{2}$, each choice occurring with about equal likelihood. We will call the z-component of spin the observable $S_z$ (“observable” indicating that we can measure it) and accept from the vast body of experimental evidence that measuring the observable $S_z$ forces the spin state to collapse such that its $S_z$ “snaps” to either one of the two allowable values, called eigenvalues of the observable $S_z$. We’ll call the $+\frac{\hbar}{2}$ outcome the +z outcome (or just the (+) outcome) and the $-\frac{\hbar}{2}$ outcome the −z (or the (−)) outcome.
• Surprise #2. Even if we can somehow accept the collapse of the infinity of random states into the two measurable ones, we cannot help but wonder why the electron’s projection onto the z-axis is not the entire length of the vector, that is, either straight up at $+\frac{\sqrt{3}}{2}\hbar$ or straight down at $-\frac{\sqrt{3}}{2}\hbar$. The electron stubbornly wants to give us only a fraction of that amount, ≈ 58%. This corresponds to two groups: the “up group,” which forms the angle θ ≈ 55° (0.955 rad) with the positive z-axis, and the “down group,” which forms that same 55° angle with the negative z-axis. The explanation for not being able to get a measurement that has the full length $\frac{\sqrt{3}}{2}\hbar$ is hard to describe without a more complete study of quantum mechanics. Briefly, it is due to the Heisenberg uncertainty principle. If the spin were to collapse to a state that was any closer to the vertical ±z-axis, we would have too much simultaneous knowledge about its x- and y-components (too close to 0) and its z-component (too close to $\pm\frac{\sqrt{3}}{2}\hbar$). (See Figure 6.8.) This would violate Heisenberg, which requires that the combined variation of these observables be larger than a fixed constant. Therefore, $S_z$ must give up some of its claim on the full spin magnitude, $\frac{\sqrt{3}}{2}\hbar$,
Figure 6.8: Near “vertical” spin measurements give illegally accurate knowledge of
Sx , Sy and Sz .
Figure 6.9: Plans for a follow-up to experiment # 1
$|+\rangle$ and $|-\rangle$ States. We give a name to the state of the electrons in the (+) group: we call it the $|+\rangle_z$ state (or simply the $|+\rangle$ state, since we consider the z-axis to be the preferred axis in which to project the spin). We say that the (−) group is in the $|-\rangle_z$ (or just the $|-\rangle$) state. Verbally, these two states are pronounced “plus ket” and “minus ket.”
$\{\, |+\rangle_x , |-\rangle_x \,\}$? We’ll need another experiment.
1. The States. The input electrons are in a specific state, $|+\rangle$, whose z-spins always point (as close as possible to) up. This is in contrast to the first experiment where the electrons were randomly oriented.
Figure 6.12: The input state for experiment # 2
to know whether electrons that start in the z-up state have any x-projection
preference. We direct our physicists to measure only the real valued component
Sx , the projection of S onto the x-axis.
Figure 6.14: Viewed from top left, the classical range of x-projection of spin.
Clinging desperately to those classical ideas which have not been ruled out by the first experiment, we imagine $S_x$ and $S_y$ to be in any relative amounts that complement $|S_z|$, now firmly fixed at $\frac{\hbar}{2}$. This would allow values for those two such that $\sqrt{ |S_x|^2 + |S_y|^2 + \tfrac{1}{4}\hbar^2 } = \tfrac{\sqrt{3}}{2}\hbar$. If true, prior to this second measurement $S_x$ would be smeared over a range of values. (See Figure 6.14.)
6.5.2 The Actual Results
As before, our classical expectations are dashed. We get one of two x-values of spin: $S_x = +\hbar/2$ or $S_x = -\hbar/2$. And again the two readings occur randomly with near equal probability:
$$ +\frac{\hbar}{2}\;\cdots\; +\frac{\hbar}{2}\;\cdots\; -\frac{\hbar}{2}\;\cdots\; +\frac{\hbar}{2}\;\cdots\; -\frac{\hbar}{2}\;\cdots\; -\frac{\hbar}{2}\;\cdots \; . $$
Also, when we subject each output group to further Sx tests, we find that after the
first Sx collapse each group is locked in its own state – as long as we only test Sx .
Figure 6.15: A guess about the states of two groups after experiment #2
6.5.5 Results of the Follow-Up
Here is what we find:
$$ +\frac{\hbar}{2}\;\cdots\; +\frac{\hbar}{2}\;\cdots\; -\frac{\hbar}{2}\;\cdots\; -\frac{\hbar}{2}\;\cdots\; -\frac{\hbar}{2}\;\cdots\; +\frac{\hbar}{2}\;\cdots $$
We are speechless. There are now equal numbers of z-up and z-down spins in a group
of electrons that we initially selected from a purely z-up pool. (See Figure 6.16.)
Furthermore, the physicists assure us that nothing in the second apparatus could have “turned the electrons” vis-à-vis the z-component – there were no net z-direction forces.
Figure 6.17: The destruction of |+i Sz information after measuring Sx
Preview
where c+ and c− are scalar “weights” which express how much |+i and |−i constitute
|−ix . Furthermore, it would make sense that the scalars c+ and c− have equal mag-
nitude if they are to reflect the observed (roughly) equal number of z-up and z-down
spins detected by the third apparatus when testing the |−ix group.
We can push this even further. Because we are going to be working with normalized
vectors (recall the projective sphere in Hilbert space?), it will turn out that their
common magnitude will be 1/√2.
For this particular combination of vector states,
|−ix = (1/√2) |+i − (1/√2) |−i = ( |+i − |−i ) / √2 .
In words (that we shall make precise in a few moments), the |−ix vector can be
expressed as a linear combination of the Sz vectors |+i and |−i. This hints at the
idea that |+i and |−i form a basis for a very simple 2-dimensional Hilbert space, the
foundation of all quantum computing.
S is crumbling before our eyes. In its place, a new model is emerging – that of a two
dimensional vector space whose two basis vectors appear to be the two z-spin states
|+i and |−i which represent a quantum z-up and quantum z-down, respectively. This
is a difficult transition to make, and I’m asking you to accept the concept without
trying too hard to visualize it in your normal three-dimensional world view. Here are
three counter-intuitive ideas, the seeds of which are present in the recent outcomes
of experiment #2 and its follow-up:
1. Rather than electron spin being modeled by classical three dimensional unit
vectors
( Sx , Sy , Sz )t = ( 1 , θ , φ )t (spherical coordinates)
in a real vector space with basis {x̂, ŷ, ẑ}, we are heading toward a model
where spin states are represented by two dimensional unit vectors
( c+ , c− )t
in a complex vector space with basis { |+iz , |−iz }.
2. In contrast to classical spin, where the unit vectors with z-components +1 and
-1 are merely scalar multiples of the same basis vector ẑ,
( 0 , 0 , +1 )t = (+1) ẑ and ( 0 , 0 , −1 )t = (−1) ẑ
we are positing a model in which the two polar opposite z-spin states, |+i = |+iz
and |−i = |−iz , are linearly independent of one another.
3. In even starker contrast to classical spin, where the unit vector x̂ is linearly
independent of ẑ, the experiments seem to suggest that unit vector |−ix can be
formed by taking a linear combination of |+i and |−i, specifically
|−ix = ( |+i − |−i ) / √2 .
But don’t give up on the spherical coordinates θ and φ just yet. They have a role
to play, and when we study expectation values and the Bloch sphere, you’ll see what
that role is. Meanwhile, we have one more experiment to perform.
1. It collapses the electrons into one of two spin states for that observable, one that
produces a reading of up (+) = +~/2, and the other that produces a reading
of down (−) = −~/2.
2. It destroys any information about the other spin axes, or in quantum-ese, about
the other spin observables.
Figure 6.18: A spin direction with polar angle θ from +z, represented by |ψi
Let’s call this rotated state “|ψi,” just so it has a name that is distinct from |+i
and |−i.
If we only rotate by a tiny θ, we have a high dose of |+i and a small dose of |−i in
our rotated state, |ψi. On the other hand, if we rotate by nearly 180◦ (π radians), |ψi
would have mostly |−i and very little |+i in it. Before this lesson ends, we’ll prove
that the right way to express the relationship between θ and the relative amounts of
|+i and |−i contained in |ψi is
|ψi = cos(θ/2) |+i + sin(θ/2) |−i .
By selecting the same (+) group coming out of the first apparatus (but now tilted at
an angle θ) as input into the second apparatus, we have effectively changed our input
states going into the second apparatus from purely |+i to purely |ψi.
We now measure Sz , the spin projected onto the z-axis. The exact features of
what I just described can be stated using the earlier three-bullet format.
1. The States. This time the input electrons are in a specific state, |ψi, whose
z-spin direction forms an angle θ from the z-axis (and for specificity, whose
spherical coordinate for the azimuthal angle, φ, is 0). (See Figure 6.19.)
Figure 6.19: The prepared state for experiment #3, prior to measurement
3. The Classical Expectation. We've been around the block enough to realize
that we shouldn't expect a classical result. If this were a purely classical situation,
the spin magnitude, (√3/2) ~, would lead to Sz = (√3/2) ~ cos(θ). But we already
know the largest Sz ever “reads” is ~/2, so maybe we attenuate that number
by cos θ, and predict (~/2) cos(θ). Those are the only two ideas we have at the
moment. (See Figure 6.20.)
6.6.3 The Actual Results
It is perhaps not surprising that we always read one of two z-values of spin: Sz = +~/2
and Sz = −~/2. The two readings occur somewhat randomly:
+~/2 ··· +~/2 ··· −~/2 ··· +~/2 ··· +~/2 ··· −~/2 ···
However, closer analysis reveals that they are not equally likely. As we try different
θs and tally the results, we get the summary shown in Figure 6.21.
Figure 6.21: Probabilities of measuring |±i from starting state |ψi, θ from +z
In other words,
% outcomes which = +~/2 ≈ cos²(θ/2) · 100%, and
% outcomes which = −~/2 ≈ sin²(θ/2) · 100% .
Notice how nicely this agrees with our discussion of experiment #2. There, we
prepared a |−ix state to go into the final Sz tester. |−ix is intuitively 90◦ from the
z-axis, so in that experiment our θ was 90◦. That would make θ/2 = 45◦, whose cosine
and sine are both √2/2. For these values, the formula above gives a predicted 50%
(i.e., equal) frequency to each outcome, (+) and (−), and that's exactly what we
found when we measured Sz starting with |−ix electrons.
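By the way, if you like to check such arithmetic by machine, here is a tiny numerical sketch (my own aside, in Python with numpy; nothing in the course requires it, and I set ~ = 1 for convenience). It simply evaluates the amplitudes of |ψi = cos(θ/2)|+i + sin(θ/2)|−i and squares them.

import numpy as np

def sz_probabilities(theta):
    """Probabilities of measuring Sz = +1/2 and -1/2 (hbar = 1) for the
    state |psi> = cos(theta/2)|+> + sin(theta/2)|->."""
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    p_up, p_down = np.abs(psi) ** 2      # squared magnitudes of the amplitudes
    return p_up, p_down

print(sz_probabilities(np.pi / 2))        # theta = 90 deg: (0.5, 0.5), as in experiment #2
print(sz_probabilities(np.radians(20)))   # a small tilt: about (0.97, 0.03)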
This also settles a debate that you might have been waging, mentally. Is spin, prior
to an Sz measurement, actually in some combination of the states |+i and |−i? Yes.
Rotating the first apparatus relative to the second apparatus by a particular θ has a
physical impact on the outcomes. Even though the electrons collapse into one of |+i
and |−i after the measurement, there is a difference between |ψi = .45 |+i + .89 |−i
and |ψ 0 i = .71 |+i + .71 |−i: The first produces 20% (+) measurements, and the
second produces 50% (+) measurements.
6.7 Onward to Formalism
In physics the word “formalism” refers to the abstract mathematical notation neces-
sary to scribble predictions about what a physical system such as a quantum computer
will do if we build it. For our purposes, the formalism is what we need to know, and
that’s the content of the next two chapters. They will provide a strict but limited set
of rules that we can use to accurately understand and design quantum algorithms.
Of those chapters, only the next (the second of these three) is necessary for CS
83A, but what you have just read has prepared you with a sound intuition about
the properties and techniques that constitute the formalism of quantum mechanics,
especially in the case of the spin 1/2 model used in quantum computing.
Chapter 7
1. We want you to master the notation used by physicists and computer scientists
to scribble, calculate and analyze quantum algorithms and their associated logic
circuits.
2. We want you to be able to recognize and make practical use of the direct cor-
respondence between the math and the physical quantum circuitry.
By developing this knowledge, you will learn how manipulating symbols on paper
affects the design and analysis of actual algorithms, hardware logic gates and mea-
surements of output registers.
In this lesson, time will not be a variable; the physics and mathematics pertain
to a single instant.
Let’s define (or re-define if you read the last chapter) a physical system S to be some
conceptual or actual apparatus that has a carefully controlled and limited number of
measurable states. It’s implied that there are very few aspects of S that can change
or be measured, since otherwise it would be too chaotic to lend itself to analysis.
Consider two examples. The first is studied early in an undergraduate quantum
mechanics course; we won’t dwell on it. The second forms the basis of quantum
computing, so we’ll be giving it considerable play time.
We build hardware that allows a particle (mass = m) to move freely in one dimension
between two boundaries, say from 0 cm to 5 cm. The particle can’t get out of that 1-
dimensional “box” but is otherwise free to roam around inside (no forces acting). We
build an apparatus to test the particle’s energy. Using elementary quantum mechanics
we discover that the particle’s energy measurement can only attain certain discrete
values, E0 , E1 , E2 , . . . . Furthermore, once we know which energy, Ek , the particle
has, we can form a probability curve that predicts the likelihood of finding the particle
at various locations within the interval [0, 5].
• Silver Atoms Work Well. You might think this is too complex a system for
measuring a single electron's spin. After all, there are 47 electrons, and each has
spin, not to mention a confounding orbital angular momentum from its motion
about the nucleus. However, the orbital angular momentum is net-zero due to
statistically random paths, and the spins of the inner 46 electrons cancel (they
are paired, one up, one down). This leaves only the spin of the outermost
electron #47 to account for the spin of the atom as a whole. (Two other facts
recommend the use of silver atoms. First, we can’t use charged particles since
the so-called Lorentz force would overshadow the subtle spin effects. Second,
silver atoms are heavy enough that their deflection can be calculated based
solely on classical equations.)
• We Prepare a Fixed Initial State. An atom can be prepared in a spin state
associated with any pre-determined direction, n̂, prior to subjecting it to a final
Stern-Gerlach tester. We do this by selecting a |+i electron from a preliminary
Stern-Gerlach Sz tester, then orienting a second tester in an n̂ direction relative
to the original.
• The Measurements and Outcomes. The deflection of the silver atom is detected
as it hits a collector plate at the far end of the last apparatus, giving us the
measurements ±~/2, and therefore the collapsed states |+ix , |−in̂ , etc. The
results correspond precisely with our experiments #1, #2 and #3 discussed
earlier.
Stern-Gerlach is the physical system to keep in mind as you study the math that
follows.
|ψ i,
and is usually referred to as a ket. The Greek letter ψ is typically used to label any
old state. As needed we will be replacing it with specific and individual labels when
we want to differentiate two state vectors, express something known about the vector,
or discuss a famous vector that is universally labeled. Examples we will encounter
include
| a i , | uk i , | + i , and | + iy .
When studying Hilbert spaces, I mentioned that a single physical state corre-
sponds to an infinite number of vectors, all on the same ray, so we typically choose
a “normalized” representative having unit length. It’s the job of quantum physicists
to describe how to match the physical states with normalized vectors and ours as
quantum computer scientists to understand and respect the correspondence.
In this regime, any physical spin state |ψi ∈ S can be expressed as a normalized
vector expanded along this natural basis using
|ψi = ( α , β )t = α |+i + β |−i , where
|α|2 + |β|2 = 1.
The length requirement reflects that physical states reside on the projective sphere of
H.
[Exercise. Demonstrate that { |+i , |−i } is an orthonormal pair. Caution:
Even though this may seem trivial, be sure you are using the complex, not the real,
inner product.]
In the heat of a big quantum computation, basis kets will kill each other off, turning
themselves into the scalars 0 and 1 because the last exercise says that
h+ | +i = h− | −i = 1, and
h+ | −i = h− | +i = 0.
While this doesn’t rise to the level of “trait”, memorize it. Every quantum mechanic
relies on it.
[Exercise. Demonstrate that the set { |+i , |−i } forms a basis (the z-basis) for
H. Hint: Even though only the projective sphere models S , we still have to account
for the entire expanse of H including all the vectors off the unit-sphere if we are going
to make claims about “spanning the space.”]
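If you like to check these inner products numerically (purely optional; this is my own sketch in Python with numpy), remember the caution from the exercise above: the left vector must be conjugated. numpy's vdot does exactly that, while plain dot does not.

import numpy as np

plus  = np.array([1, 0], dtype=complex)    # |+> in the z-basis
minus = np.array([0, 1], dtype=complex)    # |->

print(np.vdot(plus, plus))    # (1+0j)  :  <+|+> = 1
print(np.vdot(plus, minus))   # 0j      :  <+|-> = 0

u = np.array([1, 1j]) / np.sqrt(2)         # a complex unit vector
print(np.vdot(u, u))          # (1+0j)  :  the correct complex inner product
print(np.dot(u, u))           # 0j      :  the wrong, unconjugated product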
|−ix = ( |+i − |−i ) / √2 ,
but now we can make this official (or, if you skipped the last chapter, let this serve
as the definition of two new kets):
|−ix ≡ (1/√2) ( 1 , −1 )t ,
that is, it is the vector in H whose coordinates along the z-basis are as shown. We
may as well define the |+ix vector. It is
|+ix = ( |+i + |−i ) / √2 = (1/√2) ( 1 , 1 )t .
2. Why do spin states have to live on the projective sphere? Why not any point
in H or perhaps the sphere of radius 94022?
I can answer item 1 now (and item 2 further down the page). Obviously, there was
nothing magical about the z-axis or the x-axis. I could have selected any direction
in which to start my experiments at the beginning of the last lesson and then picked
any other axis for the second apparatus. In particular, I might have selected the
same z-axis for the first measurement, but used the y-axis for the second one. Our
interpretation of the results would then have suggested that |−iy contains equal parts
|+i and |−i,
and similarly for |+iy . If we were forced to use real scalars, we would have to pick the
same two scalars ±1/√2 for c± (although we could choose which got the + and which
got the − sign, a meaningless difference). We'd end up with
|±iy ≡ (1/√2) ( 1 , ±1 )t (warning: not true) ,
which would force them to be identical to the vectors |+ix and |−ix (perhaps with
the order of the two vectors reversed). But this can't be true: the y-kets can no more
be identical to the x-kets than either pair can be to the z-kets, and we already know
none of them are identical to the z-kets. (If they were, then repeated testing would never have split
the original |+i into two equal groups, |+ix and |−ix .) So there are just not enough
real numbers to form a third pair of basis vectors in the y-direction, distinct from the
x-basis and the z-basis.
If we allow complex scalars, the problem goes away. We can define
|±iy ≡ (1/√2) ( 1 , ±i )t .
Now all three pairs are totally different orthonormal bases for H, yet each one contains
“equal amounts” of |+i and |−i.
Trait #2’ (Mathematical Version of Operator for an Observable):
Observable A ∈ S ←→ TA : H → H , linear and TA† = TA .
Meanwhile, the least we can do is to confirm that the matrix for Sz is Hermitian.
[Exercise. Prove the matrix for Sz is Hermitian.]
We will now start referring to
i) an observable,
ii) its associated linear operator, and
iii) the matrix for that operator
interchangeably, often using the same symbol for all three.
Tip. This also demonstrates a universal truth. Any observable expressed in its
own basis is always a diagonal matrix with the eigenvalues appearing along that
diagonal. Because the actual observables’ matrices are the Pauli operators with the
factor of ~/2 out front, we can see that the eigenvalues of Sz do, in fact, appear along
that matrix’s diagonal.
when measuring position or momentum, the set of eigenvalues is continuous (non-
enumerable).]
Obviously, we have to learn what an eigenvalue is. Whatever it is, when we
compute the eigenvalues of, say, the matrix Sz , this trait tells us that they must be
±~/2, since those were the values we measured when we tested the observable Sz . If
we had not already done the experiment but knew that the matrix for the observable
was Sz above, Trait #3 would allow us to predict the possible outcomes, something
we will do now.
There are two facts that I will state without proof. (They are easy enough to be
exercises.)
• Diagonality. When the eigenvectors of a matrix, M , form a basis for the vector
space, we call it an eigenbasis for the space. M , expressed as a matrix in its
own eigenbasis, is a diagonal matrix (0 everywhere except for diagonal from
position 1-1 to n-n).
for any complex scalar, a. However, the vector (1, 0)t is an eigenvector, as
(~/2) ( 1  0 ; 0  −1 ) ( 1 , 0 )t = (~/2) ( 1 , 0 )t
demonstrates. It also tells us that ~/2 is the eigenvalue associated with the vector
(1, 0)t , which is exactly what Trait #3 requires.
[Exercise. Show that (0, 1)t and −~/2 form another eigenvector-eigenvalue pair for Sz .]
This confirms that Trait #3 works for Sz ; we have identified the eigenvalues for
Sz and they do, indeed, represent the only measurable values of the observable Sz in
our experiments.
All of this results in a more informative variant of Trait #3, which I’ll call Trait
#3’.
Trait #3’: The only possible measurement outcomes of an observable, A, are the
solutions {ak } to the eigenvector-eigenvalue equation
TA |uk i = ak |uk i .
The values {ak } are always real, and are called the eigenvalues of the observable,
while their corresponding kets, {|uk i} are called the eigenkets. If each eigenvalue has
a unique eigenket associated with it, the observable is called non-degenerate. On the
other hand, if there are two or more different eigenkets that make the equation true
for the same eigenvalue, that eigenvalue is called a degenerate eigenvalue, and the
observable is called degenerate.
You may be wondering why we can say that the eigenvalues of an observable are
always real when we have mentioned that, for a general matrix operator, we can get
complex eigenvalues. This is related to the theoretical definition of an observable
which requires it to be of a special form that always has real eigenvalues.
det (M − λI) = 0.
Mu = au ⇒
Mu = a Iu ⇒
(M − aI) u = 0.
Keeping in mind that eigenvectors are always non-zero, we have shown that the matrix
M − aI maps a non-zero u into 0. But that’s the hypothesis of the Little Inverse
Theorem “B” of our matrix lesson, so we get
det (M − aI) = 0.
which is solved like so:
det( (~/2) ( 0  −i ; i  0 ) − λ ( 1  0 ; 0  1 ) ) = 0
det( −λ  −i~/2 ; i~/2  −λ ) = 0
λ² − ~²/4 = 0
λ = ±~/2 .
Of course, we knew the answer, because we did the experiments (and in fact, the
theoreticians crafted the Sy matrix based on the results of the experimentalists). Now
comes the fun part. We want to figure out the eigenvectors for these eigenvalues. Get
ready to do your first actual quantum mechanical calculation.
Eigenvector for (+~/2). The eigenvector has to satisfy
Sy u = +(~/2) u ,
so we view this as an equation in the unknown coordinates (expressed in the preferred
z-basis where we have been working all along),
(~/2) ( 0  −i ; i  0 ) ( v1 , v2 )t = (~/2) ( v1 , v2 )t .
This reduces to
−i v2 = v1 and
i v1 = v2 .
There are two equations in two unknowns. Wrong. There are four unknowns (each
coordinate, vk , is a complex number, defined by two real numbers). This is somewhat
expected since we know that the solution will be a ray of vectors all differing by a
complex scalar factor. We can solve for any one of the vectors on this ray as a first
step. We do this by guessing that this ray has some non-zero first coordinate (and if
we guess wrong, we would try again, the second time knowing that it must therefore
have a non-zero second coordinate – [Exercise. Why?]). Using this guess, we can pick
v1 = 1, since any non-zero first coordinate can be made to equal 1 by a scalar multiple of
the entire vector. With this we get the complex equations
−i v2 = 1 and
i · 1 = v2 ,
revealing that v2 = i, so
u = ( 1 , i )t ,
which we must (always) normalize by projecting onto the unit (“projective”) sphere.
u/|u| = (1/√2) ( 1 , i )t = ( |+i + i |−i ) / √2 .
The last equality is the expression of u explicitly in terms of the z-basis {|+i , |−i}.
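A quick sanity check of that hand calculation, for those who like to verify such things numerically (my own sketch in Python with numpy, with ~ set to 1):

import numpy as np

hbar = 1.0
Sy = (hbar / 2) * np.array([[0, -1j],
                            [1j,  0]])
u = np.array([1, 1j]) / np.sqrt(2)    # the normalized eigenvector we just found

print(Sy @ u)                         # [0.3536+0j  0+0.3536j]
print((hbar / 2) * u)                 # the same vector:  Sy u = +(hbar/2) u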
Alternate Method. We got lucky in that once we substituted 1 for v1 , we were
able to read off v2 immediately. Sometimes, the equation is messier, and we need to
do a little work. In that case, naming the real and imaginary parts of v2 helps.
v2 = a + bi,
and substituting this into the original equations containing v2 , above, gives
−i (a + b i) = 1
i = (a + b i) ,
or
b − ai = 1
i = a + bi.
Let’s solve the second equation for a, then substitute into the first, as in
a = i − bi ⇒
b − (i − b i) i = 1 ⇒
1 = 1.
What does this mean? It means that we get a very agreeable second equation; the b
disappears resulting in a true identity (a tautology to the logician). We can, therefore
let b be anything. Again, when given a choice, choose 1. So b = 1 and substituting
that into any of the earlier equations gives a = 0. Thus,
v2 = a + bi = 0 + 1·i = i,
the same result we got instantly the first time. We would then go on to normalize u
as before.
Wrong Guess for v1 ? If, however, after substituting for a and solving the first
equation, b disappeared and produced a falsehood (like 0 = 1), then no b would be
suitable. That would mean our original choice of v1 = 1 was not defensible; v1 could
not have been a non-zero value. We would simply change that assumption, set v1 = 0
and go on to solve for v2 (either directly or by solving for a and b to get it). This
time, we would be certain to get a solution. In fact, any time you end up facing a
contradiction (3 = 4) instead of a tautology (7 = 7), then your original guess for v1
has to be changed. Just redefine v1 (if you chose 1, change it to 0) and everything
will work out.
Too Many Solutions? (Optional Reading) In our well-behaved spin-1/2 state
space, each eigenvalue determines a single ray in the state-space, so it only takes one
unit eigenvector to describe it; you might say that the subspace of H spanned by the
eigenvectors corresponding to each eigenvalue is 1-dimensional. All of its eigenvectors
differ by scalar factor. But in other physical systems the eigenvector equation related
to a single eigenvalue may yield too many solutions (even after accounting for the
scalar multiples on the same ray we already know about). In other words, there
may be multiple linearly-independent solutions to the equation for one eigenvalue. If
so, we select an orthonormal set of eigenvectors that correspond to the degenerate
eigenvalue, as follows.
1. First observe (you can prove this as an easy [exercise]) that the set of all
eigenvectors belonging to that eigenvalue form a vector subspace of the state
space.
2. Use basic linear algebra to find any basis for this subspace.
Repeat this for any eigenvalue that is degenerate. You will get an optional exercise
that explains why we want an orthonormal basis for the eigenspace of a degenerate
eigenvalue.
Eigenvector for (−~/2). Now it's your turn.
[Exercise. Find the eigenvector for the negative eigenvalue.]
The eigenvalues and eigenvectors for Sz , Sx , and Sy are:
Sz : +~/2 ←→ ( 1 , 0 )t ,   −~/2 ←→ ( 0 , 1 )t
Sx : +~/2 ←→ (1/√2) ( 1 , 1 )t ,   −~/2 ←→ (1/√2) ( 1 , −1 )t
Sy : +~/2 ←→ (1/√2) ( 1 , i )t ,   −~/2 ←→ (1/√2) ( 1 , −i )t
Expressed explicitly in terms of the z-basis vectors we find
Sz : +~/2 ↔ |+i ,   −~/2 ↔ |−i
Sx : +~/2 ↔ |+ix = ( |+i + |−i ) / √2 ,   −~/2 ↔ |−ix = ( |+i − |−i ) / √2
Sy : +~/2 ↔ |+iy = ( |+i + i |−i ) / √2 ,   −~/2 ↔ |−iy = ( |+i − i |−i ) / √2
We saw the x-kets and y-kets before when we were trying to make sense out of the
50-50 split of a |−ix state into the two states |+iz and |−iz . Now, the expressions re-
emerge as the result of a rigorous calculation of the eigenvectors for the observables Sx
and Sy . Evidently, the eigenvectors of Sx are the same two vectors that you showed
(in an exercise) were an alternative orthonormal basis for H, and likewise for the
eigenvectors of Sy .
Using these expressions along with the distributive property of inner products, it
is easy to show orthonormality relations like
xh+ | +ix = 1, or
xh+ | −ix = 0.
[Exercise. Prove the above two equalities as well as the remaining combinations
that demonstrate that the x-basis and y-basis are each (individually) orthonormal.]
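(If you want the whole table confirmed by machine, numpy's Hermitian eigen-solver will do it in a few lines. This is my own sketch, with ~ = 1; note that eigh lists the eigenvalues in ascending order, so −1/2 appears first.)

import numpy as np

hbar = 1.0
Sz = (hbar / 2) * np.array([[1, 0], [0, -1]], dtype=complex)
Sx = (hbar / 2) * np.array([[0, 1], [1,  0]], dtype=complex)
Sy = (hbar / 2) * np.array([[0, -1j], [1j, 0]])

for name, S in [("Sz", Sz), ("Sx", Sx), ("Sy", Sy)]:
    vals, vecs = np.linalg.eigh(S)         # Hermitian eigen-decomposition
    print(name, "eigenvalues:", vals)      # [-0.5  0.5] every time
    # the columns of vecs are orthonormal eigenvectors (up to a phase)
    print(name, "orthonormal columns:", np.allclose(vecs.conj().T @ vecs, np.eye(2)))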
eigenvectors form three different 2-vector bases for the 2-dimensional H. Each of the
bases is an orthonormal basis.
I’d like to distill two observations and award them collectively the title of trait.
2. whose eigenvectors ua1 , ua2 , . . ., form an orthonormal basis for the state space
H.
I did not call this trait a “postulate,” because it isn’t; you can prove these two
properties based on Trait #2, which connects observables to Hermitian operators.
Although we won’t spend our limited time proving it, if you are interested, try the
next few exercises.
Note. I wrote the trait as if all the eigenvalues were non-degenerate. It is still true,
even for degenerate eigenvalues, although then we would have to label the eigenvectors
more carefully.
[Exercise. Show that a Hermitian operator’s eigenvalues are real.]
[Exercise. Show that eigenvectors corresponding to distinct eigenvalues are or-
thogonal.]
[Exercise. In an optional passage, above, I mentioned that a degenerate eigen-
value determines not a single eigenvector, but a vector subspace of eigenvectors, from
which we can always select an orthonormal basis. Use this fact, combined with the
last exercise to construct a complete orthonormal set of eigenvectors, including those
that are non-degenerate (whose eigenspace is one-dimensional) and those that are
degenerate (whose eigenspace requires multiple vectors to span it).]
I told a small white lie a moment ago when I said that this is totally provable
from the second postulate (Trait #2). There is one detail that I left out of the
second postulate which is needed to prove these observations. You did not possess
the vocabulary to understand it at the time, but now you do. In Trait #2 I said
that the observable had to correspond to a Hilbert-space operator. What I left out
was the following mini-trait, which I’ll call
Trait #2a (Completeness of the Eigenbasis): The eigenvectors of an observ-
able span – and since they are orthogonal, constitute a basis for – the state space.
Furthermore, every measurable quantity of the physical system S corresponds to an
observable, thus its eigenvectors can be chosen as a basis whenever convenient.
If we had an observable whose eigenvectors turned out not to span the state
space, we did a bad job of defining the state space and would have to go back and
figure out a better mathematical Hilbert space to model S . Similarly, if there were
a measurable quantity for which we could not identify a linear operator, we have not
properly modeled S .
How would our familiar z-kets now look in this new basis? You can work this out by starting
with the expressions for the x-kets in terms of |+i and |−i that we already have,
|+ix = ( |+i + |−i ) / √2 and
|−ix = ( |+i − |−i ) / √2 ,
and solving for the z-kets in terms of the x-kets. It turns out that doing so results in
déjà vu,
|+i = ( |+ix + |−ix ) / √2 and
|−i = ( |+ix − |−ix ) / √2 .
It’s a bit of a coincidence, and this symmetry is not quite duplicated when we express
the y-kets in terms of |+ix and |−ix . I’ll do one, and you can do the other as an
exercise.
|+iy in the x-Basis. The approach is to first write down |+iy in terms of |+i and
|−i (already known), then replace those two z-basis kets with their x-representations,
shown above. Here we go.
|+iy = ( |+i + i |−i ) / √2
     = ( ( |+ix + |−ix ) / √2 + i ( |+ix − |−ix ) / √2 ) / √2
     = ( (1 + i) |+ix + (1 − i) |−ix ) / 2
     [multiply by (1 − i)] = ( 2 |+ix − 2i |−ix ) / 2 = |+ix − i |−ix
     [normalize] ≅ ( |+ix − i |−ix ) / √2 .
[Exercise. Show that
|−iy = ( |+ix + i |−ix ) / √2 . ]
[Exercise. Express |±i and |±ix in the y-basis.]
where the syntax
x h+ | ψi
means we are taking the complex inner product of |ψi on the right with the x-basis
ket |+ix on the left. (Don’t forget that we have to take the Hermitian conjugate of
the left vector for complex inner products.)
We are implying by context that the column vector on the RHS is expressed in
x-coordinates since that’s the whole point of the paragraph. But if we want to be
super explicit, we could write it as
|ψi = ( xh+ | ψi , xh− | ψi )t x ,
with the subscript x (with or without a long vertical line, depending on author and
time-of-day) marking the basis.
Showing the same thing in terms of the x-kets explicitly, we get
|ψi = xh+ | ψi |+ix + xh− | ψi |−ix .
Notice that the coordinates of the three vectors, |±ix and |ψi, are expressed in the
preferred z-basis. We can compute inner products in any orthonormal basis, and since
we happen to know everything in the z-basis, why not? Try not to get confused. We
are looking for the coordinates in the x-basis, so we need to “dot” with the x-basis
vectors, but we use z-coordinates of those vectors (and the z-coordinates of state |ψi)
to compute those two scalars.
Example. The (implied z-spin) state vector
( (1 + i)/√6 , −√2/√3 )t
7.8 The Completeness (or Closure) Relation
7.8.1 Orthonormal Bases in Higher Dimensions
Our casual lesson about conversion from the z-basis to the x-basis has brought us to
one of the most computationally useful tools in quantum mechanics. It’s something
that we use when doodling with pen and paper to work out problems and construct
algorithms, so we don’t want to miss the opportunity to establish it formally, right
now. We just saw that
which was true because {|+ix , |−ix } formed an orthonormal basis for our state space.
In systems with higher dimensions we have a more general formula. Where do we
get state spaces that have dimensions higher than two? A spin 1 system – photons
– has 3 dimensions; a spin-3/2 system – delta particle – has 4 dimensions; and later,
when we get into multi-qubit systems, we'll be taking tensor products of our humble
2-dimensional H to form 8-dimensional or larger state spaces. And the state spaces
that model position and momentum are infinite dimensional (but don't let that scare
you – they are actually just as easy to work with – we use integrals instead of
sums).
If we have an n-dimensional state space then we would have an orthonormal basis
for that space, say,
{ |uk i } , k = 1, . . . , n .
The |uk i basis may or may not be a preferred basis – doesn’t matter. Using the
dot-product trick we can always expand any state in that space, say |ψi, along the
basis just like we did for the x-basis in 2-dimensions. Only, now, we have a larger
sum
|ψi = Σk huk | ψi |uk i .
This is a weighted sum of the uk -kets by the scalars huk | ψi. There is no law that
prevents us from placing the scalars on the right side of the vectors, as in
|ψi = Σk |uk i huk | ψi .
Look at what we have. We are subjecting any state vector |ψi to something that
looks like an operator and getting that same state vector back again. In other words,
that fancy looking operator-sum is nothing but an identity operator 1.
Σk |uk i huk | = 1 .
This simple relation, called the completeness or closure relation, can be applied by
inserting the sum into any state equation without changing it, since it is the same as
inserting an identity operator (an identity matrix) into an equation involving vectors.
We’ll use it a little in this course, CS 83A, and a lot in the next courses CS 83B and
CS 83C, here at Foothill College.
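Numerically, the closure relation is nothing more than a sum of outer products adding up to the identity matrix. Here is a small sketch of mine (Python with numpy), using the x-basis kets; it is illustration only, not something the course depends on.

import numpy as np

plus_x  = np.array([1,  1], dtype=complex) / np.sqrt(2)
minus_x = np.array([1, -1], dtype=complex) / np.sqrt(2)

# |u_k><u_k| is the outer product of a ket with its own bra
closure = np.outer(plus_x, plus_x.conj()) + np.outer(minus_x, minus_x.conj())
print(np.allclose(closure, np.eye(2)))     # True: the sum is the identity

psi = np.array([0.6, 0.8j])                # any state vector
print(np.allclose(closure @ psi, psi))     # True: inserting it changes nothing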
[Exercise. Explain how the sum, Σk |uk i huk | , is, in fact, a linear transformation
that can act on a vector |ψi. Hint: After applying it to |ψi and distributing, each
term is just an inner-product (resulting in a scalar) times a vector. Thus, you can
analyze a simple inner product first and later take the sum, invoking the properties
of linearity.]
This is worthy of its own trait.
then the probability that a measurement of A will yield a non-degenerate eigenvalue
ak (associated with the eigenvector |uk i) is |ck |2 .
Vocabulary. The expansion coefficients, ck , for state |ψi, are often referred to as
amplitudes by physicists.
In this language, the probability of obtaining a measurement outcome ak for ob-
servable A is the magnitude-squared of the amplitude ck standing next to the eigen-
vector |uk i associated with the outcome ak .
The complex coordinates of the state determine the statistical outcome of repeated
experimentation. This is about as quantum mechanical a concept as there is. It tells
us the following.
• If a state is a superposition (non-trivial linear combination) of two or more
eigenkets, we cannot know an outcome of a quantum measurement with cer-
tainty.
• This is not a lack of knowledge about the system, but a statement about what
it means to know everything knowable about the system, namely the full de-
scription of the state.
• The coefficient (or amplitude) ck gives you the probability of the outcome ak ,
namely
P( ak )|ψi = c∗k ck = |ck |2 ,
Probability Example 1
while
P( Sx = −~/2 )|+i = (1/√2)∗ (1/√2) = 1/2 .
Notice that the two probabilities add to 1. Is this a happy coincidence? I think not.
The first postulate of QM (our Trait #1) guarantees that we are using unit vectors
to correspond to system states. If we had a non-unit vector that was supposed to
represent that state, we’d need to normalize it first before attempting to compute the
probabilities.
Probability Example 2
The follow-up to Experiment #2 was the most shocking to us, so we should see
how it is predicted by Trait #6. The starting point for this measurement was the
output of the second apparatus, specifically, the −x group: |−ix . We then measured
Sz , so we need to know the coefficients of the state |−ix along the Sz eigenbasis. We
computed this already, and they are contained in
|−ix = ( |+i − |−i ) / √2 .
The arithmetic we just did works exactly the same here, and our predictions are the
same:
P( Sz = +~/2 )|−ix = (1/√2)∗ (1/√2) = 1/2 ,
and
P( Sz = −~/2 )|−ix = (−1/√2)∗ (−1/√2) = 1/2 .
Notice that, despite the negative sign on the second amplitude, the probabilities still come out
non-negative.
[Exercise. Analyze the Sy measurement probabilities of an electron in the state
|−i. Be careful. This time we have a complex number to conjugate.]
Probability Example 3
The probability of detecting a z-UP spin is given by the coefficient (amplitude) of
the |+i, and is
P( Sz = +~/2 )|ψi = ( (1 − i)/√6 ) ( (1 + i)/√6 ) = 2/6 = 1/3 .
The probability of detecting an x-DOWN spin starting with that same |ψi requires
that we project that state along the x-basis. However, since we only care about the x-down state, we can just compute the |−ix
coefficient, which we do using the dot-product trick.
c− = xh− | ψi = (1/√2) ( (1 + i)/√6 ) + (−1/√2) ( −√2/√3 )
   = (1 + i)/√12 + 1/√3
   = (1 + i + 2)/√12 = (3 + i)/√12 .
Now we take the magnitude squared,
|c− |2 = ( (3 − i)/√12 ) ( (3 + i)/√12 ) = 10/12 = 5/6 .
[Exercise. Compute the −z and +x spin probabilities for this |ψi and confirm
that they complement their respective partners that we computed above. Explain
what I mean by “complementing their partners.”]
[Exercise. Compute the +y and −y spin probabilities for this |ψi.]
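(Those two exercises, and the fractions 1/3 and 5/6 above, can be double-checked with a few lines of numpy. The sketch below is mine; vdot conjugates the left vector, which is exactly the dot-product trick we just used.)

import numpy as np

psi     = np.array([(1 + 1j) / np.sqrt(6), -np.sqrt(2 / 3)])
plus_z  = np.array([1, 0], dtype=complex)
minus_x = np.array([1, -1], dtype=complex) / np.sqrt(2)

p_z_up   = abs(np.vdot(plus_z,  psi)) ** 2    # |<+|psi>|^2
p_x_down = abs(np.vdot(minus_x, psi)) ** 2    # |x<-|psi>|^2

print(p_z_up)     # 0.3333... = 1/3
print(p_x_down)   # 0.8333... = 5/6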
7.10.1 Trait #7 (Post-Measurement Collapse)
If the measurement of an observable of system S results in the eigenvalue, ak , then
the system “collapses” into the (an) eigenvector |uk i associated with ak . Further
measurements on this collapsed state yield the eigenvalue ak with 100% certainty.
Vocabulary Review. The eigenvectors of an observable are also known as eigen-
kets.
We’ve been saying all along that the eigenvalue ak might be degenerate. The im-
plication here is that there may be more than one possibility for the collapsed state,
specifically, any of the eigenvectors |u′k i , |u′′k i , |u′′′k i , . . . which correspond to ak . We
won’t encounter this situation immediately, but it will arise later in the course. The
early easy cases will consist of non-degenerate eigenvalues whose probabilities are eas-
ily computed by the amplitudes of their respective unique eigenvectors. Later, when
we get to degenerate eigenvalues, we won’t be sure which of the eigenvectors – cor-
responding to that eigenvalue – represents the state into which the system collapsed.
Yet, even knowing that it collapsed to the small subset of eigenvectors corresponding
to a single eigenvalue (in the degenerate case) is invaluable information that plays
directly into our algorithms.
The impact of Trait #7 is that engineers can prepare special states to act as input
into our quantum hardware logic. This is akin to setting a register’s value in a classical
computer using an assignment statement, prior to beginning further logic.
Example: Preparing a Basis (Eigenvector) State. We did this in Exper-
iment #1. After measuring the observable Sz , we ended up with two groups. By
selecting either one of those groups, we will be getting either the |+i state or the |−i
state, as we wish. Any future testing of Sz will confirm that we stay in those states,
as long as we don’t subject the system to forces that modify it.
Example: Preparing a Superposition State. We did this, too. In our late
stages of tinkering with Experiment #2, we focused on the output of the second
apparatus by choosing the |−ix group for further investigation. Using our state space
vocabulary, we expand the state |−ix in terms of the z-eigenkets,
|−ix = ( |+i − |−i ) / √2 ,
and realize that we have prepared a state which, with respect to the z-spin observable,
is not a basis state but a linear combination – a superposition – of the two z-basis
states (with equally weighted components). This kind of state preparation will be
very important for quantum algorithms, because it represents starting out in a state
which is neither 0 nor 1, but a combination of the two. This allows us to work
with a single state in our quantum processor and get two results for the price of one.
A single qubit and a single machine cycle will simultaneously produce answers for
both 0 and 1.
But, after we have prepared one of these states, how do we go about giving it to
a quantum processor, and what is a quantum processor? That's answered in the first
quantum computing lesson coming up any day now. Today, we carry on with pure
quantum mechanics to acquire the full quiver of q-darts.
1. the possible states |ψi of the system (vectors in our state space),
2. the possible outcomes of a measurement of an observable of that system (the
eigenvalues, ak , of the observable),
3. the eigenvector states (a.k.a. eigenbasis), |uk i, into which the system collapses
after we detect a value, ak , of that observable, and
4. the probabilities associated with each eigenvalue outcome (specifically, |ck |2 ,
which are derived from the amplitudes, ck , of |ψi expanded along the eigenbasis
|uk i).
In words, the system is in a state of probabilities. We can only get certain special
outcomes of measurement. Once we measure, the system collapses into a special state
associated with that special outcome. The probability that this measurement occurs
is predicted by the coefficients of the state’s eigenvector expansion.
h η | ψ i,
The RHS of the inner product, |ψi, is the familiar vector in our state space, or ket
space. Nothing new there. But the LHS, hη|, is to be thought of as a vector from a
new vector space, called the bra-space (mathematicians call it the dual space). The
bra space is constructed by taking the conjugate transpose of the vectors in the ket space,
that is,
|ψi = ( α , β )t −→ hψ| = ( α∗ , β ∗ ) .
Meanwhile, the scalars for the bra space are the same: the complex numbers, C.
Examples
Here are some kets (not necessarily normalized) and their associated bras.
|ψi = ( 1 + i , 2 − 2i )t −→ hψ| = ( 1 − i , 2 + 2i )
|ψi = ( √3/2 , −i )t −→ hψ| = ( √3/2 , i )
|ψi = (1/√2) ( 5i , 0 )t −→ hψ| = (1/√2) ( −5i , 0 )
|ψi = (1/√2) ( 1 , 1 )t −→ hψ| = (1/√2) ( 1 , 1 )
|+i −→ h+|
|−iy −→ yh−|
Hint: It’s probably easier to do this without reading a hint, but if you’re stuck
... write out the LHS as a single column vector and take the conjugate transpose.
Meanwhile the RHS can be constructed by constructing the bras for the two z-basis
vectors (again using coordinates) and combining them. The two efforts should result
in the same vector. ]
Notice that bras are written as row-vectors, which is why we call them conjugate
transposes of kets. The dagger (†) is used to express the fact that a ket and bra bear
this conjugate transpose relationship,
|ψi† = hψ| and hψ|† = |ψi .
This should sound very familiar. Where have we seen conjugate transpose before?
Answer: When we defined the adjoint of a matrix. We even used the same dagger
(†) notation. In fact, you saw an example in which the matrix had only one row (or
one column) – i.e., a vector. (See the lesson on linear transformations.) This is the
same operation: conjugate transpose.
Be careful not to say that a ket is the complex conjugate of a bra. Complex conjuga-
tion is used for scalars, only. Again, we say that a bra is the adjoint of the ket (and
vice versa). If you want to be literal, you can always say conjugate transpose. An
alternate term physicists like to use is Hermitian conjugate, “the bra is the Hermitian
conjugate of the ket.”
Example
Let’s demonstrate that the sum of two bras is also a bra. What does that even mean?
If hψ| and hη| are bras, they must be the adjoints of two kets,
|ψi = ( α , β )t ←→ hψ| = ( α∗ , β ∗ ) and
|ηi = ( γ , δ )t ←→ hη| = ( γ ∗ , δ ∗ ) .
Adding the two row vectors coordinate-wise gives
hη| + hψ| = ( α∗ + γ ∗ , β ∗ + δ ∗ ) ,
i.e., the Hermitian conjugate of the ket ( α + γ , β + δ )t = |ηi + |ψi .
That’s all a bra needs to be: the Hermitian conjugate of some ket. So the sum is a
bra.
There is (at least) one thing we must confirm. As always, when we define anything
in terms of coordinates, we need to be sure that the definition is independent of our
choice of basis (since coordinates arise from some basis). I won’t prove this, but you
may choose to do so as an exercise.
[Exercise. Pick any three axioms of a vector space and prove that the bras in
the bra space obey them.]
[Exercise. Show that the definition of bra space is independent of basis.]
Remain Calm. There is no cause for alarm. Bra space is simply a device that
allows us to manipulate the equations without making mistakes. It gives us the ability to
talk about the LHS and the RHS of an inner product individually and symmetrically,
unattached to the inner product.
Elaborate Example
We will use the bra notation to compute the inner product of the two somewhat
complicated kets,
c |ψi + |ηi on the left and ( d |φi − f |θi ) / g on the right,
where c, d, f and g are some complex scalars. The idea is very simple. We first take
the Hermitian conjugate of the intended left vector by
1. turning all of the kets (in that left vector) into bras,
2. taking the complex conjugate of any scalars (in that left vector) which are
outside a ket,
3. placing that bra to the left of the (unaltered) right ket, and
4. using the distributive property to combine the component kets and bras.
Steps 1 and 2 turn the left ket into the bra
c∗ hψ| + hη| .
Step 3 gives us
( c∗ hψ| + hη| ) ( d |φi − f |θi ) / g ,
and step 4 produces
( c∗ d hψ | φi + d hη | φi − c∗ f hψ | θi − f hη | θi ) / g .
It seems overly complicated, but usually we apply it to simpler combinations and it
is far less cumbersome than turning all the constituent kets into their coordinates,
combining them, taking the conjugates of the left ket and doing the final “dot.”
Simple Example
yh+ | +i = h ( |+i + i |−i ) / √2 | +i
         = ( ( h+| − i h−| ) / √2 ) |+i
         = ( h+ | +i − i h− | +i ) / √2 = 1/√2 .
The first thing we did was to express |+iy in the z-basis without converting it to a
bra. Then, we used the techniques just presented to convert that larger expression
into a bra. From there, it was a matter of distributing the individual kets and bras
and letting them neutralize each other. Normally, we would perform the first two
steps at once, as the next example demonstrates.
yh− | +ix = ( ( h+| + i h−| ) / √2 ) ( ( |+i + |−i ) / √2 )
          = ( h+ | +i + i h− | +i + h+ | −i + i h− | −i ) / 2
          = ( 1 + i ) / 2 .
[Exercise. Compute yh+ | +ix and xh− | +iy .]
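(If you would rather let the computer neutralize the kets and bras, here is my own short numpy version of the last two computations; again, vdot supplies the Hermitian conjugation of the left vector.)

import numpy as np

plus_y  = np.array([1,  1j], dtype=complex) / np.sqrt(2)
minus_y = np.array([1, -1j], dtype=complex) / np.sqrt(2)
plus_z  = np.array([1,  0], dtype=complex)
plus_x  = np.array([1,  1], dtype=complex) / np.sqrt(2)

print(np.vdot(plus_y, plus_z))    # y<+|+>  = 0.7071...  = 1/sqrt(2)
print(np.vdot(minus_y, plus_x))   # y<-|+>x = (0.5+0.5j) = (1+i)/2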
Summary
The bra space is a different vector space from the ket (our state) space. It is, however,
an exact copy (an isomorphism) of the state space in the case of finite dimensions
- all we ever use in quantum computing. You now have enough chops to prove this
easily, so I leave it as an ...
[Exercise. Prove that the adjoint of the ket basis is a basis for bra space. Hint:
Start with any bra. Find the ket from which it came (this step is not always possible in
infinite dimensional Hilbert space, as your physics instructors will tell you). Expand
that ket in any state space basis, then . . . .]
A† operates on bras, but since bras are row-vectors, it has to operate on the right,
not the left:
hψ| A† .
And the symmetry we would like to see is that the “output” hφ| to which A† maps
hψ| is the bra corresponding to the ket A |ψi. That dizzying sentence translated into
symbols is
hφ| = hψ| A† ←→ |φi = A |ψi .
Example. Start with our familiar 2-dimensional state space and consider the
operator,
A = ( i  −i ; 1  0 ) .
Its adjoint is
A† = ( −i  1 ; i  0 ) .
As you can see by comparing the RHS of both calculations, the adjoint of A |ψi is
hψ| A† , in agreement with the claim.
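(To see the claim with concrete numbers, here is a little numpy check of my own: the bra corresponding to the ket A|ψi is the row vector hψ|A†, for any |ψi you care to try.)

import numpy as np

A = np.array([[1j, -1j],
              [1,   0]])
A_dag = A.conj().T                  # the adjoint: conjugate transpose

psi = np.array([0.6, 0.8j])         # an arbitrary ket
ket_out = A @ psi                   # the ket  A|psi>
bra_out = psi.conj() @ A_dag        # the bra  <psi| A-dagger  (a row vector)

print(np.allclose(bra_out, ket_out.conj()))   # True: they are adjoints of each other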
Now let's convert a more elaborate ket expression,
c A |ψi + d h+ | ηi |ηi ,
into its bra counterpart. The rules will guide us, and they work for expressions far
more complex with equal ease.
We state the rules, then we will try them out on this expression. This calls for a
new trait.
7.12.4 Trait #8 (Adjoint Conversion Rules)
• The terms of a sum can be (but don’t have to be) left in the same order.
• (Covered by the above, but stated separately anyway:) Inner products are reversed.
• When done (for readability only), rearrange each product so that the scalars are
on the left of the vectors.
If we apply the adjoint conversion rules, except for the readability step, to the
above combination we get
( c A |ψi + d h+ | ηi |ηi )† = hψ| A† c∗ + hη| hη | +i d∗ ,
which we rearrange to
c∗ hψ| A† + d∗ hη | +i hη| .
You’ll get fast at this with practice. I don’t want to spend any more real estate on
the topic, since we don’t apply it very much in our first course, CS 83A, but here are
a couple exercises that will take care of any lingering urges.
[Exercise. Use the rules to convert the resulting bra of the last example back
into a ket. Confirm that you get the ket we started with.]
[Exercise. Create a wild ket expression consisting of actual literal scalars, matri-
ces and column vectors. Use the rules to convert it to a bra. Then use the same rules
to convert the bra back to a ket. Confirm that you get the ket you started with.]
(We did this very thing by selecting only z-up electrons in a Stern-Gerlach-like ap-
paratus, for example.) We now look at our state expanded along the A eigenbasis,
|ψi = Σk ck |uk i .
How do the amplitudes, ck , and their corresponding probabilities, |ck |2 , make themselves felt by us human experimenters?
The answer to this question starts by taking many repeated measurements of the
observable A on these many identical states |ψi and recording our results.
[Exercise. Explain why we can’t get the same results by repeating the A mea-
surements on a single system S in state |ψi.]
jth measurement of A = mj , and their average m = (1/N) Σj mj .
If we take a large enough N , what do we expect this average to be? This answer
comes from the statistical axiom called the law of large numbers, which says that this
value will approach the expectation value, µ, as N → ∞, that is,
lim m = µ .
N →∞
This is good and wonderful, but I have not yet defined the expectation value µ. Better
do that, fast.
[Note. I should really have labeled m with N , as in mN , to indicate that each
average depends on the number of measurements taken, as we are imagining that we
can do the experiment with larger and larger N . But you understand this without
the extra notation.]
measurement (eigenvalue) ak is given by its kth expansion coefficient along the A
basis (ck ), specifically by its magnitude-squared, |ck |2 , so we define
µ ≡ Σk |ck |2 ak .
In case you don’t see why this has the feeling of an expectation value (something we
might expect from a typical measurement, if we were forced to place a bet), read it
in English:
The first measurable value times the probability of getting that value
plus
the second measurable value times the probability of getting that value
plus
... .
In physics, rather than using the Greek letter µ, the notation for expectation value
focuses attention on the observable we are measuring, A,
hAi|ψi ≡ Σk |ck |2 ak .
Fair Warning. In quantum mechanics, you will often see the expectation value of
the observable, A, written without the subscript, |ψi,
hAi ,
but this doesn’t technically make sense. There is no such thing as an expectation
value for an observable that applies without some assumed state; you must know
which |ψi has been prepared prior to doing the experiment. If this is not obvious to
you, look up at the definition one more time: We don’t have any ck to use in the
formula unless there is a |ψi in the room, because
|ψi = Σk ck |uk i .
When authors suppress the subscript state on the expectation value, it’s usually
because the context strongly implies the state or the state is explicitly described
earlier and applies “until further notice.”
Calculating an expectation value tells us one way we can use the amplitudes.
This, in turn, acts as an approximation of the average, m, of a set of experimental
measurements on multiple systems in the identical state.
7.13.3 Computing Expectation Value
It seems like we should be done with this section. We have a formula for the expecta-
tion value, hAi|ψi , so what else is there to do? It turns out that computing that sum
isn’t always as easy or efficient as computing the value a different way.
That different way is the inner product sandwich hψ | A | ψi (the Expectation Value
Theorem, proved below). We can either apply A to the ket |ψi and then dot the result
with hψ|, or we can first apply A to the bra hψ|, and then dot it with the ket,
hψ| A |ψi .
If you do it this way, be careful not to take the adjoint of A. Just because we apply
an operator to a bra does not mean we have to take its Hermitian conjugate. The
formula says to use A, not A† , regardless of which vector we feed it.
[Exercise. Prove that the two interpretations of hψ | A | ψi are equal by expressing
everything in component form with respect to any basis. Hint: It’s just (a row vector)
× (a matrix) × (a column vector), so multiply it out both ways.]
Proof of the Expectation Value Theorem. This is actually one way to prove
the last exercise nicely. Express everything in the A-basis (i.e., the eigenkets of A
which Trait #4 tells us form a basis for the state space H). We already know that
|ψi = Σk ck |uk i = ( c1 , c2 , c3 , . . . , cn )t A-basis ,
which means (by our adjoint conversion rules)
hψ| = Σk c∗k huk | = ( c∗1 , c∗2 , c∗3 , . . . , c∗n )A-basis .
Finally, what’s A in its own eigenbasis? We know that any basis vector expressed in
that basis’ coordinates has a preferred basis look, (1, 0, 0, 0, . . . )t , (0, 1, 0, 0, . . . )t ,
etc. To that, add the definition of an eigenvector and eigenvalue
Mu = au,
and you will conclude that, in its own eigenbasis, the matrix for A is 0 everywhere
except along its diagonal, which holds the eigenvalues,
A = diag( a1 , a2 , a3 , . . . , an )A-basis .
[Oops. I just gave away the answer to one of today’s exercises. Which one?] We now
have all our players in coordinate form, so
hψ | A | ψi = ( c∗1 , c∗2 , c∗3 , . . . , c∗n ) diag( a1 , a2 , a3 , . . . , an ) ( c1 , c2 , c3 , . . . , cn )t
            = ( c∗1 , c∗2 , c∗3 , . . . , c∗n ) ( a1 c1 , a2 c2 , a3 c3 , . . . , an cn )t
            = Σk |ck |2 ak , as claimed.
This is a great sanity check, since we know from Trait #7 (the fifth postulate of
QM) that Sz will always report a +~/2 with certainty if we start in the state |+i. Let's
confirm it.
h+ | Sz | +i = (1, 0) (~/2) ( 1  0 ; 0  −1 ) ( 1 , 0 )t = +~/2 .
That was painless. Notice that this result is weaker than what we already knew
from Trait #7. This is telling us that the average reading will approach the (+)
eigenvalue in the long run, but in fact every measurement will be (+).
[Exercise. Show that the expectation value h− | Sz | −i = −~/2 .]
Once again, we know the answer should be −~/2, because we're starting in an eigenstate
of Sx , but we will do the computation in the z-basis, which involves a wee bit more
arithmetic and will serve to give us some extra practice.
xh− | Sx | −ix = (1/√2) (1, −1) (~/2) ( 0  1 ; 1  0 ) (1/√2) ( 1 , −1 )t
             = (~/4) (1, −1) ( 0  1 ; 1  0 ) ( 1 , −1 )t
             = (~/4) (1, −1) ( −1 , 1 )t = −~/2 .
This time, it’s not so obvious. However, we can guess. Since, the state
|+i − i |−i
|−iy = √
2
we see that the probability for each outome is 1/2. Over time half will result in an
Sz measurement of + ~2 , and half will give us − ~2 , so the average should be close to 0.
Let’s verify that.
1 ~ 1 0 1 1
h− | Sz | −iy = √ (1, +i) √
y 2 2 0 −1 2 −i
~ 1 0 1
= (1, +i)
4 0 −1 −i
~ 1
= (1, +i) = 0.
4 i
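All three of these averages are just (row vector) × (matrix) × (column vector), so they are easy to check by machine. Here is a sketch of mine in Python with numpy, with ~ = 1:

import numpy as np

hbar = 1.0
Sz = (hbar / 2) * np.array([[1, 0], [0, -1]], dtype=complex)
Sx = (hbar / 2) * np.array([[0, 1], [1,  0]], dtype=complex)

plus    = np.array([1, 0], dtype=complex)
minus_x = np.array([1, -1], dtype=complex) / np.sqrt(2)
minus_y = np.array([1, -1j], dtype=complex) / np.sqrt(2)

def expectation(op, state):
    # <state| op |state>, conjugating the left copy of the state
    return np.vdot(state, op @ state).real

print(expectation(Sz, plus))       #  0.5  = +hbar/2
print(expectation(Sx, minus_x))    # -0.5  = -hbar/2
print(expectation(Sz, minus_y))    #  0.0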
but for now you’re ready to dive into the lectures on single qubit systems and early
algorithms.
The next chapter is a completion of our quantum mechanics primer that covers
the basics of time evolution, that is, it describes the laws by which quantum systems
evolve over time. It isn’t required for CS 83A, but you’ll need it for the later courses
in the sequence. You can skip it if you are so inclined or, if you “opt in” immediately,
it'll provide the final postulates and traits that comprise a complete study of quantum
formalism, including the all-important Schrödinger equation.
Whether you choose to go directly to the chapter on qubits or first learn the
essentials of the time-dependent Schrödinger equation, you're in for a treat. Both
subjects provide a sense of purpose and completion to all the hard work we’ve done
up to this point.
Either way, it’s about time.
Chapter 8
8.2.2 From Classical Hamiltonian to Quantum Hamiltonian
We have to figure out a way to express the energy of a system S using pen and paper
so we can manipulate symbols, work problems and make predictions about how the
system will look at 5 PM if we know how it started out at 8 AM. It sounds like a
daunting task, but the 20th century physicists gave us a conceptually simple recipe
for the process. The first step is to define a quantum operator – a matrix for our
state space – that corresponds to the total energy. We'll call this recipe ...
2. Replace the occurrence of the classical variables on the RHS by their (well
known) quantum operators, and replace the symbol for classical energy, H ,
on the LHS by its quantum symbol H.
in relation to the magnetic field. (You may challenge that I forgot to account for
the rotational kinetic energy, but an electron has no spatial extent, so there is no
classical moment of inertia, and therefore no rotational kinetic energy.) We want to
build a classical energy equation, so we treat spin as a classical 3-dimensional vector
representing the intrinsic angular momentum, (sx , sy , sz )t . A dot product between
this vector and the magnetic field vector, B, expresses this potential energy and yields
the following classical Hamiltonian
H = −γ B · S ,
where γ is a scalar known by the impressive name gyromagnetic ratio whose value is
not relevant at the moment. (We are temporarily viewing the system as if it were
classical in order to achieve step 1 in Trait #10, but please understand that it
already has one foot in the quantum world simply by the inclusion of the scalar γ.
Since scalars don’t affect us, this apparent infraction doesn’t disturb the process.)
Defining the z-Direction. The dot product only cares about the relationship
between two vectors,
v·w = vw cos θ ,
where θ is the angle between them, so we can rotate the pair as a fixed assembly, that
is, preserving angle θ. Therefore, let’s establish a magnetic field (with magnitude B)
pointing in the +z-direction,
B = B ẑ = ( 0 , 0 , B )t ,
and let the spin vector, S, go along for the rotational ride. This does not produce
a unique direction for the spin, but we only care about the polar angle θ, which we
have constrained to remain unchanged. Equivalently, we can define the z-direction
to be wherever our B field points. Either way, we get a very neat simplification.
The classical spin has well-defined real-valued components sx , sy and sz (not
operators yet),
S = ( sx , sy , sz )t ,
and we use this vector to evaluate the dot product, above. Substituting gives
H = −γ ( 0 , 0 , B )t · ( sx , sy , sz )t = −γ B sz .
This completes step 1 in Trait #10, and I can finally show you how easy it is to do
step 2.
8.3.2 A Quantum Hamiltonian
We saw that (by a century of experimentation) the quantum operator corresponding
to the classical z-component of spin is Sz . So we simply replace the classical sz with
our now familiar quantum operator Sz to get the quantum Hamiltonian,
H = −γ B Sz .
or
H |+i = −(γB~/2) |+i and
H |−i = +(γB~/2) |−i .
But this says that H has the same two eigenvectors as Sz , only they are associated
with different eigenvalues,
|+i ←→ −γB~/2 ,
|−i ←→ +γB~/2 .
If we measure the energy of the system, we will get −γB~/2 if the electron collapses into
the |+i state, pointing as close as quantum mechanics allows toward the (+z)-axis,
the direction of the B-field. Meanwhile, if it collapses into the |−i state, we will
get +γB~/2, pointing as far as quantum mechanics allows from the (+z)-axis, opposite the
direction of the B-field.
Does this make sense?
Yes. When a magnetic dipole (which is what electron spin represents, discounting
the gyromagnetic ratio γ) is pointing in the direction of a magnetic field, energy is
minimum. Imagine a compass needle pointing magnetic north. The potential energy
of the compass-Earth system is at its minimum: It takes no energy to maintain that
configuration. However, if the dipole is pointing opposite the direction of the magnetic
field, energy is at its maximum: Imagine the compass needle pointing magnetic south.
Now it takes energy to hold the needle in place. The potential energy of the compass-
Earth system is maximum. The fact that the energy measurement is negative for the
|+i but positive for the |−i faithfully represents physical reality.
Also, note that for a spin-1/2 system, any state vector, |ψi, has the same coor-
dinate expression whether we expand it along the eigenkets of Sz or those of H, as
they are the same two kets, |+i and |−i.
The only allowable (i.e., measurable) energies of a quantum system are the eigenvalues
of the Hamiltonian.
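To make the trait concrete, here is a minimal numerical sketch (mine, not from the text) that diagonalizes our spin-1/2 Hamiltonian with numpy; the values chosen for γ, B and ~ are arbitrary, purely illustrative.

    import numpy as np

    hbar  = 1.0     # work in units where h-bar = 1 (an assumption for illustration)
    gamma = 2.0     # hypothetical gyromagnetic ratio
    B     = 0.5     # hypothetical field strength

    Sz = (hbar / 2) * np.array([[1, 0],
                                [0, -1]], dtype=complex)
    H = -gamma * B * Sz                       # the quantum Hamiltonian of this section

    evals, evecs = np.linalg.eigh(H)          # allowed energies = eigenvalues of H
    print(evals)                              # -> [-0.5  0.5], i.e. -/+ gamma*B*hbar/2
    print(evecs)                              # columns: the eigenkets |+> and |->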
Let’s take a short side-trip to give the crucial postulate that expresses, in full gen-
erality, how any quantum state evolves based on the system’s Hamiltonian operator,
H.
|ψi −→ | ψ(t) i .
Everything we did earlier still holds if we freeze time at some t0 . We would then
evaluate the system at that instant as if it were not time-dependent and we were
working with the fixed state
    |ψi = |ψ(t0 )i = Σk=1..n c0k |uk i ,   where c0k ≡ ck (t0 ) .
To get to time t = t0 (or any future t > 0), though, we need to know the exact
formula for those coefficients, ck (t), so we can plug in t = t0 and produce this fixed
state. That’s where the sixth postulate of quantum mechanics comes in.
This is still a time-dependent equation; we merely have a simplification acknowledging
that t does not appear in the matrix for H.
The “Other” Schrödinger Equation? In case you’re wondering whether
there’s a time-independent Schrödinger equation, the answer is, it depends on whom
you ask. Purists say, not really, but most of us consider the eigenket-eigenvalue
equation of Trait #3’,
TA |uk i = ak |uk i ,
• our state vector, |ψ(t)i, with its two expansion coefficients, the unknown func-
tions c1 (t) and c2 (t),
    | ψ(t) i = c1 (t) |+i + c2 (t) |−i = ( c1 (t) , c2 (t) )tz ,
Let’s compute. Substitute the coordinate functions in for |ψi in the Schrödinger
equation,
    i~ (d/dt) ( c1 (t) , c2 (t) )t = −γB (~/2) [ 1 0 ; 0 −1 ] ( c1 (t) , c2 (t) )t .

This is equivalent to

    ( i~ (d/dt) c1 , i~ (d/dt) c2 )t = ( −γB (~/2) c1 , +γB (~/2) c2 )t ,

or

    (d/dt) c1 = + (γB i / 2) c1   and
    (d/dt) c2 = − (γB i / 2) c2 .
From calculus we know that the solutions to
    dx/dt = k x ,
with constant k, are the family of equations
x(t) = Cekt ,
one solution for each complex constant C. (If you didn’t know that, you can verify it
now by differentiating the last equation.) The constant C is determined by the initial
condition, at time t = 0,
C = x(0) .
Applying this to our two differential equations gives

    c1 (t) = C1 eit(γB/2) ,
    c2 (t) = C2 e−it(γB/2) .

At t = 0 the evolving state must match

    |ψ(0)i = |ψi ,

our starting state. In other words, the initial conditions for our two equations are
c1 (0) = α0 and
c2 (0) = β0 ,
where we are saying α0 and β0 are the two scalar coefficients of the state |ψi at time
t = 0. That gives us the constants
C1 = α0 ,
C2 = β0 ,
the complete formulas for the time-dependent coefficients,
c1 (t) = α0 eit(γB/2) ,
c2 (t) = β0 e−it(γB/2) .
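As a sanity check (mine, not the text's), the closed-form coefficients can be compared against a direct application of exp(−iHt/~) to the starting state; since H is diagonal in the z-basis, the exponential can be formed entry by entry. The constants and the starting amplitudes α0 , β0 below are arbitrary illustrative choices.

    import numpy as np

    hbar, gamma, B = 1.0, 2.0, 0.5            # illustrative values
    H = -gamma * B * (hbar / 2) * np.array([[1, 0], [0, -1]], dtype=complex)

    alpha0, beta0 = 0.6, 0.8j                 # any normalized starting amplitudes
    psi0 = np.array([alpha0, beta0])

    t = 0.7
    U = np.diag(np.exp(-1j * np.diag(H) * t / hbar))    # exp(-iHt/hbar) for diagonal H
    psi_t = U @ psi0                                     # Schrodinger evolution

    c1 = alpha0 * np.exp( 1j * t * gamma * B / 2)        # closed-form c1(t)
    c2 = beta0  * np.exp(-1j * t * gamma * B / 2)        # closed-form c2(t)
    print(np.allclose(psi_t, [c1, c2]))                  # -> True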
We pause to consider how this can be generalized to any situation (in the case of
finite or enumerable eigenvalues). I’ll introduce an odd notation that physicists have
universally adopted, namely that the eigenket associated with eigenvalue a, will be
that same a inside the ket symbol, i.e., |ai .
2. Next, we expanded the initial state, |ψi, along the energy basis,
    |ψi = Σk ck |Ek i ,

and each coefficient then evolves by a simple phase,

    ck (t) = ck e−itEk /~ .
8.5.3 Stationary States
Notice what this implies if our initial state happens to be one of the energy eigenstates.
In our spin-1/2 system that would be either |+i or |−i. Take |ψi = |+i. The result is

    |ψ(t)i = eit(γB/2) |+i = eiφt |+i ,

where we introduce the shorthand φt ≡ γBt/2: the time evolution merely causes a "phase factor" of eiφt to appear. But remember that our state-space does not differentiate between scalar multiples of state vectors, so

    eiφt |+i ∼= |+i .
Its one and only expansion coefficient changes by a factor of eiφt whose square mag-
nitude is 1 regardless of t. This is big enough to call a trait.
An eigenstate of the Hamiltonian operator evolves in such a way that its measurement
outcome does not change; it remains in the same eigenstate.
For this reason, eigenstates are often called stationary states of the system.
Allow this state to evolve for a time t according to the Schrödinger equation,
1. compute the Energy eigenvalues and eigenkets for the system, { Ek ↔ |Ek i} ,
by solving the “time-independent” Schrödinger equation, H |Ek i = Ek |Ek i,
2. expand |ψi along the energy basis: |ψi = Σk ck |Ek i ,

3. attach (as factors) the time-dependent phase factors, e−itEk /~ , to each term, giving |ψ(t)i = Σk ck e−itEk /~ |Ek i ,
4. “dot” this expression with the desired eigenket, |uj i of A, to find its amplitude,
αj (t) = huj | ψ(t)i, and
5. the magnitude-squared of this amplitude, |αj (t)|2 , will be the probability of an
A-measurement producing the eigenvalue aj at time t.
The short version is that for any observable of interest, A, you first solve the sys-
tem’s time-independent Schrödinger equation and use its energy eigenbasis to express
your state, |ψi. Incorporate the time dependence into that expression and “dot” that
with the desired eigenstate of A to get your amplitude and, ultimately, probability.
(I keep using quotes with the verb “dot”, because this is really an inner -product,
rather than a real dot-product, requiring the left vector’s coordinates to be conju-
gated.)
Example
We continue to examine the evolution of an electron that starts in the y-down state,
|−iy . We’ve already done the first three steps of Trait #14 and found that after
time t the state |−iy evolves to
    | (−)t iy = ( |+i − i e−it(γB) |−i ) / √2 .
(I used | (−)t iy , rather than the somewhat confusing |−(t)iy , to designate the state’s
dependence on time.)
That’s the official answer to the question “how does |−iy evolve?”, but to see how
we would use this information, we have to pick an observable we are curious about
and apply Trait #14, steps 4 and 5.
Let’s ask about Sy , the y-projection of spin – specifically the probability of mea-
suring a |+iy at time t. Step 4 says to “dot” the time-evolved state with the vector
|+iy , so the amplitude (step 4) is
c+y = yh + | (−)t iy .
I’ll help you read it: the “left” vector of the inner product is the + ~2 eigenket of the
operator Sy , |+iy , independent of time. The “right” vector of the inner product is
our starting state, |−iy , but evolved to a later time, t.
Because everything is expressed in terms of the z-basis, we have to be sure we
stay in that realm. The z-coordinates of |+iy are obtained from our familiar
    |+iy = ( |+i + i |−i ) / √2 = ( 1/√2 , i/√2 )tz .
I added the subscript z on the RHS to emphasize that we are displaying the vector
|+iy in the z-coordinates, as usual. If we are to use this on the left side of a complex
inner product we have to take the conjugate of all components. This is easy to see in
the coordinate form,
    y h+| = ( 1/√2 , −i/√2 )z ,
but let’s see how we can avoid looking inside the vector by applying our adjoint
conversion rules to the expression defining |+iy to create a bra for this vector. I’ll
give you the result, and you can supply the (very few) details as an ...
[Exercise. Show that

    y h+| = ( |+iy )† = ( h+| − i h−| ) / √2 . ]
Getting back to the computation of the amplitude, c+y , substitute the computed values into the inner product to get

    c+y = ( h+| − i h−| ) ( |+i − i e−it(γB) |−i ) / 2
        = ( h+|+i − e−it(γB) h−|−i ) / 2
        = ( 1 − e−it(γB) ) / 2 .
(The last two equalities made use of the orthonormality of any observable eigenbasis
(Trait #4).)
Finally, step 5 says that the probability of measuring Sy = +~/2 at any time t is

    |c+y |2 = c+y∗ c+y = [ (1 − eiγBt) / 2 ] [ (1 − e−iγBt) / 2 ] ,

where we used the fact (see the complex number lecture) that, for real θ, ( e−iθ )∗ = eiθ .
Making the temporary substitution θ ≡ γBt, this becomes

    |c+y |2 = ( 2 − eiθ − e−iθ ) / 4
            = 1/2 − (1/2) ( eiθ + e−iθ ) / 2
            = 1/2 − (1/2) cos θ .
Undoing the substitution gives us the final result
    P( Sy (t) = +~/2 ) = 1/2 − (1/2) cos (γB t) .
As you can see, the probability of measuring an up-y state oscillates between 0 and 1 sinusoidally over time. Note that this is consistent with our initial state at time t = 0: cos 0 = 1, so the probability of measuring +~/2 is zero; it had to be since we started in state |−iy , and when you are in an eigenstate (|−iy ), the measurement of the observable corresponding to that eigenstate (Sy ) is guaranteed to be the eigenstate's eigenvalue (−~/2). Likewise, if we test precisely at t = π/(γB), we get 1/2 − (1/2)(−1) = 1, a certainty that we will detect +~/2, the |+iy eigenvalue.
We can stop the clock at times between those two extremes to get any probability
we like.
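If you want to check this oscillation numerically, the following sketch (mine, not the author's) walks through Trait #14's steps with numpy, evolving |−iy and "dotting" with |+iy ; the constants are again arbitrary illustrative values.

    import numpy as np

    hbar, gamma, B = 1.0, 2.0, 0.5                          # illustrative constants
    plus  = np.array([1, 0], dtype=complex)                 # |+>
    minus = np.array([0, 1], dtype=complex)                 # |->
    minus_y = (plus - 1j * minus) / np.sqrt(2)              # starting state |->_y
    plus_y  = (plus + 1j * minus) / np.sqrt(2)              # the S_y = +hbar/2 eigenket

    H = -gamma * B * (hbar / 2) * np.diag([1.0, -1.0]).astype(complex)

    for t in [0.0, np.pi / (2 * gamma * B), np.pi / (gamma * B)]:
        U = np.diag(np.exp(-1j * np.diag(H) * t / hbar))    # exp(-iHt/hbar)
        amp  = np.vdot(plus_y, U @ minus_y)                 # step 4: the amplitude
        prob = abs(amp) ** 2                                # step 5: the probability
        print(round(prob, 6), round(0.5 - 0.5 * np.cos(gamma * B * t), 6))  # agree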
[Exercise. What is the probability of measuring Sy (t) = +~/2 at the (chronologi-
cally ordered) times
(a) t = π/(6γB) ,
(b) t = π/(4γB) ,
(c) t = π/(3γB) ,
(d) t = π/(2γB) . ]
[Exercise. Do the same analysis to get the probability that Sy measured at time
t will be −~/2. Confirm that at any time t, the two probabilities add to 1.]
|c1 |2 + |c2 |2 = 1.
We let the state (of any one of these systems, since they are all the same) evolve for a
time t. We have already solved the Schrödinger equation and found that the evolved
state at that time will be
    |ψ(t)i = c1 eit(γB/2) |+i + c2 e−it(γB/2) |−i = ( c1 eit(γB/2) , c2 e−it(γB/2) )t .
This, then, is the state at time t, prior to measurement.
Rewriting |ψ(t)i
It will help to represent this state by an equivalent vector that is a mere unit scalar
multiple of itself. To do that, we first express c1 and c2 in polar form,
    c1 = c eiφ1   and   c2 = s eiφ2 ,
giving the equivalent state
    |ψ(t)i = ( c eiφ1 eit(γB/2) , s eiφ2 e−it(γB/2) )t .
Then we multiply by the unit scalar e−i(φ1+φ2)/2 to get a more balanced equivalent state,

    |ψ(t)i = ( c ei(φ1−φ2)/2 eit(γB/2) , s e−i(φ1−φ2)/2 e−it(γB/2) )t .
Now, we simplify by making the substitutions
ω = γB and
φ0 = φ1 − φ2 ,
to get the simple and balanced Hilbert space representative of our state,
    |ψ(t)i = ( c eiφ0/2 eitω/2 , s e−iφ0/2 e−itω/2 )t = ( c ei(tω+φ0)/2 , s e−i(tω+φ0)/2 )t .
We get a nice simplification by using the notation
    φ(t) ≡ (ωt + φ0) / 2 ,

to express our evolving state very concisely as

    |ψ(t)i = ( c ei φ(t) , s e−i φ(t) )t .
A Convenient Angle
There is one last observation before we start to compute. Since |ψi is normalized,
|c|2 + |s|2 = 1,
the amplitudes c and s have moduli (absolute values) that are consistent with the
sine and cosine of some angle. Furthermore, we can name that angle anything we
like. Call it “θ/2” for reasons that will become clear in about 60 seconds. [Start of
60 seconds.]
We have proclaimed the angle θ to be such that
    c = cos (θ/2)   and
    s = sin (θ/2) ,
which is why I named the amplitudes c and s.
Also, we’re going to run into the two expressions cs and c2 − s2 a little later,
so let’s see if we can write those in terms of our angle θ. The addition law of sines
implies that
    cs = cos (θ/2) sin (θ/2) = (sin θ) / 2 ,
while the addition law of cosines yields
    c2 − s2 = cos2 (θ/2) − sin2 (θ/2) = cos θ .
By letting θ/2 be the common angle that we used to represent c and s (instead of,
say, θ) we ended up with plain old θ on the RHS of these formulas, which is the form
we’ll need. [End of 60 seconds.]
Although we’ll start out using c and s for the moduli of |ψi’s amplitudes, we’ll
eventually want to make these substitutions when the time comes. The angle θ will
have a geometric significance.
|ψ(t)i 3N or 3 million times. The physicists record the measurements producing a
certain number of +z values, a certain number of −z, etc. We ask them to compute
the average of the Sz results – a number between −~/2 and +~/2 – and the same with the Sx and Sy results.
So they’ll have three numbers in the end, mx , my and mz .
But hold on a second. We don’t have to bother the physicists, because when N
is large, we know from the law of large numbers that the average values of each
of the three spin projections are approximated very closely by the expectation values
of the operators. So, let’s compute those instead of wasting a lot of time and money.
We do this for each observable Sz , Sx and Sy , individually, then find a way to combine
the answers. We’ll begin with hSz (t)i.
Trait #9 (the expectation value theorem), tells us that we can compute this using
hψ(t) | Sz | ψ(t)i .
With the help of Trait #8 (the adjoint conversion rules) and keeping in mind that
c and s are real, we find
    hψ(t) | Sz | ψ(t)i
        = ( c e−i φ(t) , s ei φ(t) ) (~/2) [ 1 0 ; 0 −1 ] ( c ei φ(t) , s e−i φ(t) )t
        = ( c e−i φ(t) , s ei φ(t) ) (~/2) ( c ei φ(t) , −s e−i φ(t) )t
        = (~/2) ( c2 − s2 ) = (~/2) cos θ .
We can draw some quick conclusions from this (and subtler ones later).
B. The Expectation Value for the x-spin Observable: hSx (t)i
We compute
hψ(t) | Sx | ψ(t)i .
Using our adjoint conversion rules again, we find
    hψ(t) | Sx | ψ(t)i
        = ( c e−i φ(t) , s ei φ(t) ) (~/2) [ 0 1 ; 1 0 ] ( c ei φ(t) , s e−i φ(t) )t
        = ( c e−i φ(t) , s ei φ(t) ) (~/2) ( s e−i φ(t) , c ei φ(t) )t
        = (~/2) ( cs e−2i φ(t) + cs e2i φ(t) ) .
This is nice, but a slight rearrangement should give you a brilliant idea,
    hψ(t) | Sx | ψ(t)i = cs ~ ( e−2i φ(t) + e2i φ(t) ) / 2 .
Look back at our lesson on complex numbers, especially the consequences of the Euler
formula, and you’ll discover that the fraction simplifies to cos (2φ(t)). Now we have
hψ(t) | Sx | ψ(t)i = cs ~ cos (2φ(t)) ,
which, after undoing our substitutions for cs and φ(t) we set up in the convenient
angle section, looks like
    hψ(t) | Sx | ψ(t)i = cs ~ cos (ωt + φ0 ) = (~/2) sin θ cos (ωt + φ0 ) .
Observe these consequences.
    |±i = ( |+ix ± |−ix ) / √2 ,
so we would expect a roughly equal collapse into the |+ix and |−ix states,
averaging to 0.
• We’ve already established that the two kets |±i are stationary states of H,
so whatever holds at time t = 0, holds for all time.
C. The Expectation Value for the y-spin Observable: hSy (t)i
We compute
hψ(t) | Sy | ψ(t)i ,
    hψ(t) | Sy | ψ(t)i
        = ( c e−i φ(t) , s ei φ(t) ) (~/2) [ 0 −i ; i 0 ] ( c ei φ(t) , s e−i φ(t) )t
        = ( c e−i φ(t) , s ei φ(t) ) (~/2) ( −is e−i φ(t) , ic ei φ(t) )t
        = ~ ( cs e−2i φ(t) − cs e2i φ(t) ) / (2i) .
Rearranging and applying one of our Euler formulas, we find

    hψ(t) | Sy | ψ(t)i = − cs ~ sin (2φ(t)) = − (~/2) sin θ sin (ωt + φ0 ) .
    |+i = ( |+iy + |−iy ) / √2   and
    |−i = ( |+iy − |−iy ) / ( i √2 ) ,
(from a prior exercise).
In contrast, the quantum mechanical spin state-vector lives in a 2-dimensional Hilbert
space, not 3-dimensional real space, so we don’t have simultaneously measurable x, y,
and z-components which we can study. However, we can define a real (and evolving)
3-dimensional vector s(t) to be
    s(t) ≡ ( hSx i|ψ(t)i , hSy i|ψ(t)i , hSz i|ψ(t)i )t .
This s(t) is a true 3-dimensional (time-dependent) vector whose real coordinates are
the three expectation values, hSx i, hSy i and hSz i, at time, t.
In the previous section we showed that
    s(t) = ( (~/2) sin θ cos (ωt + φ0 ) , −(~/2) sin θ sin (ωt + φ0 ) , (~/2) cos θ )t
         = (~/2) ( sin θ cos (ωt + φ0 ) , − sin θ sin (ωt + φ0 ) , cos θ )t .
If this is not speaking to you, drop the factor of ~/2 and set φ(t) = ωt + φ0 . What
we get is the 3-dimensional vector
    s(t) ∝ ( sin θ cos φ(t) , − sin θ sin φ(t) , cos θ )t .
It is a unit vector in R3 whose spherical coordinates are (1, θ, φ(t)), i.e., it has a polar
angle θ and azimuthal angle φ(t). (I don’t use exclamation points, but if I did, I would
use one here.) We are looking at a vector that has a fixed z-coordinate, but whose
x and y-coordinates are in a clockwise circular orbit around the origin (clockwise
because of the y-coordinate’s minus sign – [Exercise]). This is called precession.
Since our B-field was defined to point in the +z direction, we have discovered the
meaning of the vector s(t) = (hSx (t)i , hSy (t)i , hSz (t)i )t .
This is something that recurs with regularity in quantum physics. The quan-
tum state vectors, themselves, do not behave like classical vectors. However, their
expectation values do.
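Here is a short numpy sketch (mine, with arbitrary illustrative values for the constants, θ, and φ0) that computes the three expectation values directly from |ψ(t)i and confirms the precession formula above.

    import numpy as np

    hbar, gamma, B = 1.0, 2.0, 0.5              # illustrative constants
    omega = gamma * B
    Sx = (hbar/2) * np.array([[0, 1], [1, 0]], dtype=complex)
    Sy = (hbar/2) * np.array([[0, -1j], [1j, 0]], dtype=complex)
    Sz = (hbar/2) * np.array([[1, 0], [0, -1]], dtype=complex)

    theta, phi0 = 1.1, 0.3                      # arbitrary polar angle and initial phase
    for t in np.linspace(0.0, 5.0, 6):
        phi = (omega * t + phi0) / 2
        psi = np.array([np.cos(theta/2) * np.exp( 1j*phi),
                        np.sin(theta/2) * np.exp(-1j*phi)])
        s = np.array([np.vdot(psi, S @ psi).real for S in (Sx, Sy, Sz)])
        target = (hbar/2) * np.array([ np.sin(theta)*np.cos(omega*t + phi0),
                                      -np.sin(theta)*np.sin(omega*t + phi0),
                                       np.cos(theta)])
        print(np.allclose(s, target))           # -> True at every t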
within a uniform magnetic field B pointing in the +z direction, the Schrödinger equa-
tion tells us that its expectation value vector, s(t), evolves according to the formula
    s(t) = (~/2) ( sin θ cos (ωt + φ0 ) , − sin θ sin (ωt + φ0 ) , cos θ )t .
• θ is the polar angle that the real vector, s(t), makes with the z-axis in our R3 .
It relates to |ψi in that θ/2 is the angle that expresses the magnitudes c and s
of |ψi’s Hilbert-space coordinates,
    c = cos (θ/2)   and
    s = sin (θ/2) .
θ/2 ranges from 0 (when it defines the up-z state, |+i) to π/2 (when it defines the down-z state, |−i), allowing θ to range from 0 to π in R3 .
• ω is the Larmor frequency, defined by the magnitude of the B-field and the
constant, γ (the gyromagnetic ratio),
ω ≡ γB .
• φ0 is the relative phase of the original amplitudes c1 and c2 ,

    φ0 ≡ φ1 − φ2 .
Chapter 9
The Qubit
Folksy Definition of a Bit
The main take-away is that 0 and 1 are not bits. They are the values or states that
the bit can attain.
We can use the notation
x = 0
to mean that “x is in the state 0,” an observation (or possibly a question) about the
state bit x is in. We also use the same notation to express the imperative, “put x into
the state 0.” The latter is the programmer’s assignment statement.
What about the logical operators like AND or XOR which transform bits to other
bits? We can define those, too, using similarly loose language.
Logical operators are also called logic gates – or just gates – when implemented in circuit diagrams.
Note: We are only considering functions that have a single output bit. If one
wanted to build a logic gate with multiple output bits it could be done by combining
several single-output logic gates, one for each output bit.
Mathematically, x ⊕ y is called “the mod-2 sum of x and y,” language that is used
throughout classical and quantum logic.
While logic gates can take any number of inputs, those that have one or two inputs
are given special names.
A unary operator is a gate that takes a single input bit, and a binary
operator is one that takes two input bits.
As you can see, NOT is a unary operator while XOR is a binary operator.
Truth Tables
Informally, unary and binary operators are often described using truth tables. Here
are the traditional gate symbols and truth tables for NOT and XOR.
For NOT, whose output is written ¬x (or x̄):

    x | ¬x
    --+----
    0 |  1
    1 |  0

For XOR, whose output is written x ⊕ y:

    x  y | x ⊕ y
    -----+------
    0  0 |   0
    0  1 |   1
    1  0 |   1
    1  1 |   0
We could go on like this to define more operators and their corresponding logic gates.
However, it’s a little disorganized and will not help you jump to a qubit, so let’s try
something a little more formal.
Our short formal development of classical logic in this lesson will be restricted to
the study of unary operators. We are learning about a single qubit today which is
analogous to one classical bit and operators that act on only one classical bit, i.e.,
unary operators. It sounds a little boring, I know, but that’s because in classical logic,
unary operators are boring (there are only four). But as you are about to see, in the
quantum world there are infinitely many different unary operators, all useful.
9.3 Classical Computation Models – Formal Approach
The definition of a quantum bit is necessarily abstract, and if I were to define it at
this point, you might not recognize the relationship between it and a classical bit.
To be fair to qubits, we’ll give a formal definition of a classical bit first, using the
same language we will need in the quantum case. This will allow us to establish some
vocabulary in the classical context that will be re-usable in the quantum world and
give us a reference for comparing the two regimes on a level playing field.
We define B = B2 to be the vector space whose scalars come from B and whose vectors
(objects) are ordered pairs of numbers from B. This vector space is so small I can list
its objects on one line,
    B = B2 ≡ { (0, 0)t , (0, 1)t , (1, 0)t , (1, 1)t } .
I’m not going to bother proving that B obeys all the properties of a vector space, and
you don’t have to either. But if you are interested, it’s a fun ...
[Exercise. Show that B obeys the properties of a field (multiplicative inverses,
distributive properties, etc.) and that B obeys the properties of a vector space.]
The strange – but necessary – equality on the lower right is a consequence of this
oddball inner product on B = B2 .
[Exercise. Verify the above moduli.]
[Notation. Sometimes I'll use an ordinary dot for the inner product, (x1 , y1 )t · (x2 , y2 )t , instead of the circle dot, (x1 , y1 )t ⊙ (x2 , y2 )t . When you see a vector on each side of "·" you'll know that we really mean the mod-2 inner product, not mod-2 multiplication.]
Dimension of B
If B is a vector space, what is its dimension, and what is its natural basis? That’s
not hard to guess. The usual suspects will work. It’s a short exercise.
[Exercise. Prove that B is 2-dimensional by showing that
    (1, 0)t   and   (0, 1)t
form an orthonormal basis. Hint: There are only four vectors, so express each in
this basis. As for linear independence and orthonormality, I leave that to you.]
Sounds strange, I know, making a bit equal to an entire vector space. We think of a
bit as capable of holding a 1 or a 0. This is expressed as follows.
A bit, itself, is not committed to any particular value until we say which unit-vector
in B we are assigning it. Since there are only two unit vectors in B, that narrows the
field down to one of two values, which I’ll label as follows:
    [0] ≡ (1, 0)t   and
    [1] ≡ (0, 1)t .
The other two vectors (0, 0)t and (1, 1)t , have length 0 so cannot be normalized (i.e.,
we cannot divide them by their length to form a unit vector).
If the definition feels too abstract, try this out for size. As a programmer, you’re
familiar with the idea of a variable (LVALUE ) which corresponds to a memory lo-
cation. That variable is capable of holding one specific value (the RVALUE ) at any
point in time, although there are many possible values we can assign it. The vector
space B is like that memory location: it is capable of holding one of several different
values but is not committed to any until we make the assignment. Putting a value
into a memory location with an assignment statement like “x = 51;” corresponds to
choosing one of the unit vectors in B to be assigned to the bit. So the values allowed
to be stored in the formal bit (copy B) are any of the unit vectors in B.
Multiple Bits
If we have several bits we have several copies of B, and we can name each with a
variable like x, y or z. We can assign values to these variables with the familiar
syntax
x = [1] ,
y = [1] ,
z = [0] ,
etc.
A classical bit (uncommitted to a value) can also be viewed as a variable linear combination of the two basis vectors in B,

    x = α [0] + β [1] .
Since α and β are scalars of B, they can only be 0 and 1, so the normalization
condition implies exactly one of them is 1 and the other is 0.
Two questions are undoubtedly irritating you.
2. How will this definition help us grasp the qubit of quantum computing?
Keep reading.
Unary Operators
This is pretty abstract, but we can see how it works by looking at the only four logical
unary operators in sight.
• The constant-[0] operator A(x) ≡ [0]. This maps any bit into the 0-bit.
(Don’t forget, in B, the 0-bit is not the 0-vector, it is the unit vector (1, 0)t .)
Using older, informal truth tables, we would describe this operator as follows:
    x | [0]-op
    --+-------
    0 |   0
    1 |   0
In our new formal language, the constant-[0] operator corresponds to the linear
transformation whose matrix is
    [ 1 1 ; 0 0 ] ,
since, for any unit vector (bit value) (α, β)t , we have
    [ 1 1 ; 0 0 ] (α, β)t = ( α ⊕ β , 0 )t = ( 1, 0 )t = [0] ,

where the second-from-last equality is due to the fact that, by the normalization requirement on bits, exactly one of α and β must be 1.
• The constant-[1] operator A(x) ≡ [1]. This maps any bit into the 1-bit.
Informally, it is described by:
    x | [1]-op
    --+-------
    0 |   1
    1 |   1

Formally, it corresponds to the matrix [ 0 0 ; 1 1 ], since, for any unit vector (bit value) (α, β)t , we have

    [ 0 0 ; 1 1 ] (α, β)t = ( 0 , α ⊕ β )t = ( 0, 1 )t = [1] .
• The negation (or NOT) operator A(x) ≡ ¬ x. This maps any bit into its
logical opposite. It corresponds to
    [ 0 1 ; 1 0 ] ,
since, for any x = (α, β)t , we get
    [ 0 1 ; 1 0 ] x = [ 0 1 ; 1 0 ] (α, β)t = ( β, α )t = ¬x .
[Exercise. Using the formula, verify that ¬[0] = [1] and ¬[1] = [0].]
• The identity operator A(x) ≡ 1x = x. This maps any bit into itself and
corresponds to
    [ 1 0 ; 0 1 ] .
[Exercise. Perform the matrix multiplication to confirm that 1[0] = [0] and
1[1] = [1].]
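As a small illustration of the four operators just listed (a sketch of mine, using numpy arrays with mod-2 arithmetic), we can apply each matrix to the two bit values and check which ones lose information.

    import numpy as np

    ops = {
        "const-[0]": np.array([[1, 1], [0, 0]]),
        "const-[1]": np.array([[0, 0], [1, 1]]),
        "NOT":       np.array([[0, 1], [1, 0]]),
        "identity":  np.array([[1, 0], [0, 1]]),
    }
    bits = {"[0]": np.array([1, 0]), "[1]": np.array([0, 1])}

    for name, M in ops.items():
        images = {b: tuple((M @ v) % 2) for b, v in bits.items()}   # mod-2 action
        reversible = images["[0]"] != images["[1]"]                 # distinct outputs?
        print(name, images, "reversible:", reversible)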
Apparently any linear transformation on B other than the four listed above will not
correspond to a logical unary operator. For example, the zero-operator
    [ 0 0 ; 0 0 ]
isn’t listed. This makes sense since it does not map normalized vectors to normalized
vectors. For example,
    [ 0 0 ; 0 0 ] [0] = [ 0 0 ; 0 0 ] (1, 0)t = (0, 0)t = 0 ,
not a unit vector. Another example is the matrix
    [ 1 0 ; 1 0 ] ,
since it maps the bit [1] to the non-normalizable 0,
    [ 1 0 ; 1 0 ] [1] = [ 1 0 ; 1 0 ] (0, 1)t = (0, 0)t = 0 .
Reversible Logic Gates
A reversible operator or logic gate is one that can be undone by applying another op-
erator (which might be the same as the original). In other words, its associated matrix
has an inverse. For example, the constant-[1] operator is not reversible, because it
forces its input to the output state [1],
    [ 0 0 ; 1 1 ] (α, β)t = ( 0, 1 )t ,
and there’s no way to reconstruct (α, β)t reliably from the constant bit [1]; it has
erased information that can never be recovered. Thus, the operator, and its associated
logic gate, is called irreversible.
Of the four logical operators on one classical bit, only two of them are reversible:
the identity, 1, and negation, ¬ . In fact, to reverse them, they can each be reapplied to the output to get back the original input bit value with 100% reliability.
Example. We show that the matrix for ¬ is unitary by looking at its columns.
The first column is
    (0, 1)t ,
which
• is orthogonal to the second column vector, since (0, 1)t · (1, 0)t = 0.
3. their rows (or columns) are orthonormal.
While this is true of the usual (i.e., positive definite) inner products, the pairing in B
causes these conditions to lose sync. All four possible logical operators on bits do
preserve lengths and so meet condition 1; we’ve already shown that they map unit
vectors to unit vectors, and we can also check that they map vectors of length zero
to other vectors of length zero:
[Exercise. Verify that the constant-[1] operator maps the zero-length vectors
(1, 1)t and (0, 0)t to the zero-length (0, 0)t . Do the same for the constant-[0] operator.
Thus both of these non-unitary operators preserve the (oddball) mod-2 length.]
However, the constant-[1] and [0] operators do not preserve inner products, nor do
they have unitary matrices.
[Exercise. Find two vectors in B that have inner product 0, yet whose two images
under constant-[1] operator have inner product 1.]
This unusual situation has the consequence that, of the four linear transformations
which qualify to be logical operators (preserving lengths) in B, only two are reversible
(have unitary matrices).
Notice that we don’t need to restrict ourselves, yet, to the projective sphere. That
will come when we describe the allowed values a bit can take.
Comparison with Classical Logic
The classical bit was a 2-D vector space, B, over the finite scalar field B, and now we
see that a quantum bit is a 2-D vector space, H, over the infinite scalar field C.
Besides the underlying scalar fields being different – the tiny 2-element B vs. the
infinite and rich C – the vectors themselves are worlds apart. There are only four
vectors in B, while H has infinitely many.
However, there is at least one similarity: they are both two dimensional. The
natural basis for B is [0] and [1], while the natural basis for H is |+i and |−i.
The quantum computing world has adopted different terms for many of the established
quantum physics entities. This will be the first of several sections introducing that
new vocabulary.
Symbols for Basis Vectors. To reinforce the connection between the qubit
and the bit, we abandon the vector symbols |+i and |−i and in their places use |0i
and |1i – same vectors, different names. This is true whether we are referring to the
preferred z-basis, or any other orthonormal basis.
Alternate x-Basis Notation. Many authors use the shorter notation for the
x-basis,
|0ix ←→ |+i
|1ix ←→ |−i ,
but I will eschew that for the time being; |+i and |−i already have a z-basis meaning
in ordinary quantum mechanics, and using them for the x-basis too soon will cause
confusion. However, be prepared for me to call |+i and |−i into action as x-basis
CBS, particularly when we need the variable x for another purpose.
Computational Basis States. Instead of using the term eigenbasis, computer
scientists refer to computational basis states (or CBS when I’m in a hurry). For
example, we don’t talk about the eigenbasis of Sz , { |+i , |−i }. Rather, we speak of
the preferred computational basis, { |0i , |1i }. You are welcome to imagine it as being
associated with the observable Sz , but we really don’t care what physical observable
led to this basis. We only care about the Hilbert space H, not the physical system,
S , from which it arose. We don’t even know what kind of physics will be used to
build quantum computers (yet). Whatever physical hardware is used, it will give us
the 2-D Hilbert space H.
Alternate bases like { |0ix , |1ix }, { |0iy , |1iy } or even { |0in̂ , |1in̂ } for some di-
rection n̂, when needed, are also called computational bases, but we usually qualify
them using the term alternate computational basis. We still have the short-hand
terms z-basis, x-basis, etc., which avoid the naming conflict, altogether. These alter-
nate computational bases are still defined by their expansions in the preferred, z-basis
( |1ix = ( |0i − |1i ) / √2 , e.g.), and all the old relationships remain.
In other words, a qubit is an entity – the Hilbert space H – whose value can be any
vector on the projective sphere of that space.
A qubit, itself, is not committed to any particular value until we say which specific
unit-vector in H we are assigning to it.
Normalization
Why do we restrict qubit values to the projective sphere? This is a quantum system,
so states are always normalized vectors. That’s how we defined the state space in our
quantum mechanics lesson. A qubit’s state (or the state representing any quantum
system) has to reflect probabilities which sum to 1. That means the magnitude-
squared of all the amplitudes must sum to 1. Well, that’s the projective sphere. Any
vectors off that sphere cannot claim to reflect reality.
Certainly there will be times when we choose to work with un-normalized vec-
tors, especially in CS 83B, but that will be a computational convenience that must
eventually be corrected by normalizing the answers.
Just as a classical bit was capable of storing a state [0] or [1] (the two unit vectors
in B), qubits can be placed in specific states which are normalized vectors in H.
This time, however, there are infinitely many unit vectors in H: we have the entire
projective sphere from which to choose.
Alternative Definition of Qubit. A "qubit" is a variable superposition
of the two natural basis vectors of H,

    |ψi = α |0i + β |1i ,   with |α|2 + |β|2 = 1 .
I used the word “variable” to call attention to the fact that the qubit stores a value,
but is not the value, itself.
For any real θ, the vector eiθ |ψi still resides on the projective sphere, and since it is a scalar multiple of |ψi, it is a valid representative of the same state or qubit value. We would say "|ψi and eiθ |ψi differ by an overall, or global, phase factor θ," a condition that does not change anything, but can be used to put |ψi into a more workable form.
The alternate/working expressions for bits in the two regimes look similar.
In the classical case, α and β could only be 0 or 1, and they could not be the same
(normalizability), leading to only two possible states for x, namely, [0] or [1]. In the
quantum case, α and β can be any of infinitely many combinations of complex scalars,
leading to an infinite number of distinct values for the state |ψi.
Parallel Processing
The main motivation comes from our Trait #6, the fourth postulate of quantum
mechanics. It tells us that until we measure the state |ψi, it has a probability of
landing in either state, |0i or |1i, the exact details given by the magnitudes of the
complex amplitudes α and β.
As long as we don’t measure |ψi, it is like the Centaur of Greek mythology: part |0i
and part |1i; when we process this beast with quantum logic gates, we will be sending
both alternative binary values through the hardware in a single pass. That will change
it to another normalized state vector (say |φi), which has different amplitudes, and
that can be sent through further logic gates, again retaining the potential to be part
|0i and part |1i, but with different probabilities.
Well, this is not such a great selling point. If qubits are so much better than bits,
why not leave them alone? Worse still, our Trait #7, the fifth postulate of quantum
mechanics, means that even if we don’t want to collapse the qubit into a computational
basis state, once we measure it, we will have done just that. We will lose the exquisite
subtleties of α and β, turning the entire state into a |0i or |1i.
Once you go down this line of thought, you begin to question the entire enterprise.
We can’t get any answers if we don’t test the output states, and if they always collapse,
what good did the amplitudes do? This skepticism is reasonable. For now, I can only
give you some ideas, and ask you to wait to see the examples.
1. We can do a lot of processing “below the surface of the quantum ocean,” ma-
nipulating the quantum states without attempting an information-destroying
measurement that “brings them up for air” until the time is right.
2. We can use Trait #6, the fourth postulate, in reverse: Rather than looking at
amplitudes as a prediction of experimental outcomes, we can view the relative
distribution of several measurement outcomes to guess at the amplitudes of the
output state.
3. By preparing our states and quantum logic carefully, we can “load the dice” so
that the likelihood of getting an information-rich collapsed result will be greatly
enhanced.
The QNOT operator swaps the amplitudes of any state vector. It corresponds to
    [ 0 1 ; 1 0 ] ,
the same matrix that represents the NOT operator, ¬, of classical computing. The
difference here is not in the operator but in the vast quantity of qubits to which we
can apply it. Using |ψi = (α, β)t , as we will for this entire lecture, we find
    X |ψi = [ 0 1 ; 1 0 ] (α, β)t = ( β, α )t .
In the special case of a CBS ket, we find that this does indeed change the state from
|0i to |1i and vice versa.
[Exercise. Using the formula, verify that X |0i = |1i and X |1i = |0i.]
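In code (a sketch of mine, not the text's), the same two facts look like this; α and β are arbitrary normalized amplitudes chosen for illustration.

    import numpy as np

    X = np.array([[0, 1],
                  [1, 0]], dtype=complex)           # QNOT
    ket0 = np.array([1, 0], dtype=complex)
    ket1 = np.array([0, 1], dtype=complex)
    print(np.allclose(X @ ket0, ket1), np.allclose(X @ ket1, ket0))   # True True

    alpha, beta = 0.6, 0.8j                         # illustrative amplitudes
    psi = alpha * ket0 + beta * ket1
    print(X @ psi)                                  # amplitudes swapped: (beta, alpha)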
Notation and Vocabulary
The X operator is sometimes called the bit flip operator, because it “flips” the CBS
coefficients, α ↔ β. In the special case of a pure CBS input, like |0i, it “flips” it to
the other CBS, |1i.
The reason QNOT is usually labeled using the letter X is that, other than the
factor of ~2 , the matrix is the same as the spin-1/2 observable Sx . In fact, you’ll recall
from the quantum mechanics lesson that QNOT is precisely the Pauli spin matrix in
the x-direction,
    X = σx = [ 0 1 ; 1 0 ] .
It’s best not to read anything too deep into this. The matrix that models an ob-
servable is used differently than one that performs a reversible operation on a qubit.
Here, we are swapping amplitudes and therefore negating computational basis states.
That’s the important take-away.
when we want to express an X gate or, less frequently, if we want to be overly explicit, QNOT.
We might show the effect of the gate right on the circuit diagram,
The input state is placed on the left of the gate symbol and the output state on the
right.
Because any linear operator is completely determined by its action on a basis, you will
often see an operator like X defined only on the CBS states, and you are expected to
know that this should be extended to the entire H using linearity. In this case, the
letters x and y are usually used to label a CBS, so
    |xi = |0i or |1i ,
to be distinguished from |ψi, which can take on infinitely many superpositions of
these two basis kets. With this convention, any quantum logic gate can be defined
by its action on the |xi (and sometimes |yi, |zi or |wi, if we need more input kets).
For the X gate, it might look like this
|xi X |¬xi .
This expresses the two possible input states and says that X |0i = |¬0i = |1i, while,
X |1i = |¬1i = |0i. Using alternative notation,
|xi X |xi .
In fact, you’ll often see the mod-2 operator for some CBS logic. If we used that to
define X, it would look like this:
|xi X |1 ⊕ xi .
[Exercise. Why does the last expression result in a logical negation of the CBS?]
You Must Remember This. The operators (⊕, ¬, etc.) used inside the kets
on the variables x and/or y apply only to the binary values 0 or 1 that label the basis
states. They make no sense for general states. We must extend linearly to the rest
of our Hilbert space.
Sample Problem
Given the definition of the bit flip operator, X, in terms of the CBS |xi,
|xi X |xi ,
what is the action of X on an arbitrary state |ψi, and what is the matrix for X?
Expand |ψi along the computational basis and apply X:

    X |ψi = X ( α |0i + β |1i ) = α X |0i + β X |1i = α |1i + β |0i = β |0i + α |1i .
While the problem didn’t ask for it, let’s round out the study by viewing X in terms
of a ket’s CBS coordinates,(α, β)t . For that, we can read it directly off the final
derivation of X |ψi, above, or apply the matrix, which I will do now.
    [ 0 1 ; 1 0 ] (α, β)t = ( β, α )t .
One Last Time: In the literature, most logic gates are defined in terms of |xi,
|yi, etc. This is only the action on the CBS, and it is up to us to fill in the blanks,
get its action on the general |ψi and produce the matrix for the gate.
We’ve seen that there is at least one classical operator, NOT, that has an analog,
QNOT, in the quantum world. Their matrices look the same and they affect the
classical bits, [0] / [1], and corresponding CBS counterparts, |0i / |1i, identically, but
that’s where the parallels end.
What about the other three classical unary operators? You can probably guess
that the quantum identity,
    1 ≡ [ 1 0 ; 0 1 ] ,
exhibits the same similarities and differences as the QNOT did with the NOT. The
matrices are identical and have the same effect on classical/CBS kets, but beyond
that, the operators work in different worlds.
[Exercise. Express the gate (i.e., circuit) definition of 1 in terms of a CBS |xi
and give its action on a general |ψi. Discuss other differences between the classical
and quantum identity.]
That leaves the two constant operators, the [0]-op and the [1]-op. The simple
answer is there are no quantum counterparts for these. The reason gets to the heart
of quantum computation.
The thing that distinguished ¬ and 1 from [0]-op and the [1]-op in the classical case
was unitarity. The first two were unitary and the last two were not. How did those
last two even sneak into the classical operator club? They had the property that they
preserved the lengths of vectors. That’s all an operator requires. But these constant
ops were not reversible which implied that their matrices were not unitary. The
quirkiness of certain operators being length-preserving yet non-unitary was a fluke
of nature caused by the strange mod-2 inner product on B. It allowed a distinction
between length-preservation and unitarity.
In quantum computing, no such distinction exists. The self-same requirement
that operators map unit vectors to other unit vectors in H forces operators to be
unitary. This is because H has a well-behaved (positive definite) inner product. We
saw that with any positive definite inner product, unitarity and length-preservation
are equivalent.
But if you don’t want to be that abstract, you need only “try on” either constant
op for size and see if it fits. Let’s apply an attempted analog of the [0]-op to the unit
vector |0ix . For fun, we’ll do it twice. First try out the (non-unitary) matrix for the
classical [0]-op in the quantum regime. We find
    [ 1 1 ; 0 0 ] ( |0i + |1i ) / √2 = (1/√2) ( [ 1 1 ; 0 0 ] (1, 0)t + [ 1 1 ; 0 0 ] (0, 1)t )
                                     = (1/√2) ( (1, 0)t + (1, 0)t ) = ( √2 , 0 )t ,
not a unit vector. To see a different approach we show it using a putative operator,
A, defined by its CBS action that ignores the CBS input, always answering with a
|0i,
|xi A |0i .
Measurement
Finally, we ask what impact QNOT has on the measurement probabilities of a qubit.
Here is a picture of the circuit with two potential measurement “access points,” A
(before the gate) and B (after ):
|ψi X
A B
Of course, we know from Trait #7 (fifth postulate of QM) that once we measure the
state it collapses into a CBS, so we cannot measure both points on the same “sample”
of our system. If we measure at A, the system will collapse into either |0i or |1i and
we will no longer have |ψi going into X. So the way to interpret this diagram is to
visualize many different copies of the system in the same state. We measure some at
access point A and others at access point B. The math tells us that
    at Point A:  |ψi = α |0i + β |1i ,
    at Point B:  X |ψi = β |0i + α |1i .

If we measure copies of the pre-gate state, |ψi, we will get |0i with probability |α|2 and |1i with probability |β|2 ,
by Trait #6, the fourth QM postulate. However, if we do not measure those states,
but instead send them through X and only then measure them (at B),
|ψi X ,
the same trait applied to the output expression tells us that the probabilities will be swapped: we will now get |0i with probability |β|2 and |1i with probability |α|2 .
Composition of Gates
Recall from the lesson on linear transformation that any unitary matrix, U , satisfies
U †U = UU† = 1,
where U † is the conjugate transpose, a.k.a. adjoint, of U . In other words, its adjoint is
also its inverse.
Because unitarity is required of all quantum gates – in particular QNOT – we
know that
    X †X = XX † = 1 ,

and, since X is its own adjoint,

    X2 = 1 .
X is its own inverse. If we apply X (or any self-adjoint operator) consecutively
without an intervening measurement, we should get our original state back. For
QNOT, this means
|ψi X X |ψi ,
which can be verified either algebraically, by multiplying the vectors and matrices, or
experimentally, by taking lots of sample measurements on identically prepared states.
Algebraically, for example, the state of the qubit at points A, B and C,
|ψi X X
A B C
will be
    ( α, β )t −→ ( β, α )t −→ ( α, β )t .
This leads to the expectation that for all quantum gates – as long as we don’t measure
anything – we can keep sending the output of one into the input of another, and while
they will be transformed, the exquisite detail of the qubits’ amplitudes remain intact.
They may be hidden in algebraic changes caused by the quantum gates, but they will
be retrievable due to the unitarity of our gates. However, make a measurement, and
we will have destroyed that information; measurements are not unitary operations.
We’ve covered the only two classical gates that have quantum alter egos. Let’s go
on to meet some one-bit quantum gates that have no classical counterpart.
The Z operator, whose matrix is [ 1 0 ; 0 −1 ], negates the second amplitude of a state vector, leaving the first unchanged,

    Z |ψi = [ 1 0 ; 0 −1 ] (α, β)t = ( α, −β )t .
Notation and Vocabulary
Z is called a phase flip, because it changes (maximally) the relative phase of the two
amplitudes of |ψi. Why is multiplication by -1 a maximal phase change? Because
−1 = eiπ ,
[Exercise. Show that this agrees with Z’s action on a general |ψi by expanding |ψi
along the computational basis and using this formula while applying linearity.]
Measurement
|ψi Z
A B
yielding the same probabilities, |α|2 and |β|2 , at both access points. Therefore, both
|ψi and Z |ψi will have identical measurement likelihoods; if we have 1000 electrons or photons in spin state |ψi and 1000 in Z |ψi, a measurement of all of them will throw about |α|2 × 1000 into state |0i and |β|2 × 1000 into state |1i.
However, two states that have the same measurement probabilities are not
necessarily the same state.
The relative phase difference between |ψi and Z |ψi can be “felt” the moment we
try to combine (incorporate into a larger expression) either state using superposition.
Mathematically, we can see this by noticing that
    ( |ψi + |ψi ) / 2 = |ψi ,
i.e., we cannot create a new state from a single state, which is inherently linearly
dependent with itself, while a distinct normalized state can be formed by
    ( |ψi + Z |ψi ) / (2α) = ( (α + α) |0i + (β − β) |1i ) / (2α)
                           = 2α |0i / (2α) = |0i .
Unless it so happens that α = 1 and β = 0, we get different results when using |ψi
and Z |ψi in the second position of these two legal superpositions, demonstrating that
they do, indeed, have different physical consequences.
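The following numpy sketch (mine; the amplitudes α and β are illustrative) shows both halves of the argument: identical measurement probabilities, yet different behavior once the states are combined in a superposition.

    import numpy as np

    Z = np.diag([1, -1]).astype(complex)
    alpha, beta = 0.6, 0.8j
    psi = np.array([alpha, beta])

    print(np.abs(psi)**2, np.abs(Z @ psi)**2)       # both -> [0.36 0.64]
    print((psi + psi) / 2)                          # just |psi> again
    print((psi + Z @ psi) / (2 * alpha))            # -> [1, 0], i.e. |0>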
The Y operator is defined by the matrix

    Y = [ 0 −i ; i 0 ] ,

so Y |ψi = ( −iβ , iα )t ∼= ( β , −α )t , that last equality accomplished by multiplying the result by the innocuous unit scalar, i.
Although Y has no official name, we will call it the bit-and-phase flip, because it flips
both the bits and the relative phase, simultaneously.
The symbol Y is used to represent this operator because it is identical to the Pauli
matrix σy .
This may seem less than a compelling justification, but the expression is of the syntactic form

    σ · n̂ ,

where n̂ is a real 3-D unit vector (nx , ny , nz )t , σ is a "vector" of operators,

    σ ≡ ( σx , σy , σz )t ,

and their formal "dot" product, σ · n̂, represents the matrix for the observable Sn̂
(the measurement of spin in the most general direction defined by a unit vector n̂).
This will be developed and used in the next course CS 83B.
Measurement
As a combination of both bit flip (which swaps probabilities) and phase flip (which
does not change probabilities), Y has the same measurement consequence as a simple
QNOT:
|ψi Y
A B
This gives an A-to-B transition
    ( α, β )t −→ ( −iβ , iα )t ,
which causes the probabilities, |α|2 and |β|2 to get swapped at access point B.
We immediately recognize that this “rotates” the z-basis kets onto the x-basis kets,
H of a General State
This is to be extended to an arbitrary state, |ψi, using the obvious rules. Rather than
approach it by expanding |ψi along the z-basis then extending linearly, it’s perhaps
faster to view everything in terms of matrices and column vectors,
    H |ψi = (1/√2) [ 1 1 ; 1 −1 ] (α, β)t = (1/√2) ( α + β , α − β )t ,
which can be grouped
    H |ψi = ( (α + β)/√2 ) |0i + ( (α − β)/√2 ) |1i .
It turns out to be very useful to express H in compact computational basis form. I’ll
give you the answer, and let you prove it for for yourself.
    |xi H ( |0i + (−1)x |1i ) / √2 .
[Exercise. Show that this formula gives the right result on each of the two CBS
kets.]
Measurement
Compared to the previous gates, the Hadamard gate has a more complex and subtle
effect on measurement probabilities. The circuit
|ψi H
A B
P & |1i = 2 −4 3
3/2
P & |1i = 3/4
9.5.6 Phase-Shift Gates, S, T and Rθ
The phase-flip gate, Z, is a special case of a more general (relative) phase shift
operation in which the coefficient of |1i is “shifted” by θ = π radians. There are
two other common shift amounts, π/2 (the S operator) and π/4 (the T operator).
Beyond that we use the most general amount, any θ (the Rθ operator). Of course,
they can all be defined in terms of Rθ , so we’ll define that one first.
The phase shift operator, Rθ , is defined by the matrix
    [ 1 0 ; 0 eiθ ] ,
where θ is any real number. It leaves the coefficient of |0i unchanged and “shifts” (or
“rotates”) |1i’s coefficient by a relative angle θ, whose meaning we will discuss in a
moment. Here’s the effect on a general state:
    Rθ |ψi = [ 1 0 ; 0 eiθ ] (α, β)t = ( α , eiθ β )t .
Vocabulary
S is called the “phase gate,” and in an apparent naming error T is referred to as the
“π/8 gate,” but this is not actually an error as much as it is a change of notation.
Like a state vector, any unitary operator on state space can be multiplied by a unit
scalar with impunity. We’ve seen the reason, but to remind you, state vectors are
rays in state space, all vectors on the ray considered to be the same state. Since
we are also working on the projective sphere, any unit scalar not only represents the
same state, but keeps the vector on the projective sphere.
With that in mind, if we want to see a more balanced version of T , we multiply
it by e−iπ/8 and get the equivalent operator,
    T ∼= [ e−iπ/8 0 ; 0 eiπ/8 ] .
Measurement
These phase shift gates leave the probabilities alone for single qubit systems, just as
the phase flip gate, Z, did. You can parallel the exposition presented for Z in the
current situation to verify this.
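A quick way to convince yourself (a sketch of mine, with an arbitrary state and arbitrary angles): the |0i/|1i probabilities are untouched by Rθ for any θ, hence by S and T in particular.

    import numpy as np

    def R(theta):
        return np.diag([1, np.exp(1j * theta)])     # the phase-shift gate R_theta

    S, T = R(np.pi / 2), R(np.pi / 4)
    psi = np.array([0.6, 0.8j])                     # illustrative normalized state

    for gate in (S, T, R(1.234)):
        print(np.abs(gate @ psi)**2)                # always [0.36 0.64]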
9.6 Putting Unary Gates to Use
9.6.1 Basis Conversion
Every quantum gate is unitary, a fact that has two important consequences, the first
of which we have already noted.
1. Quantum gates are reversible; their adjoints can be used to undo their action.
2. Quantum gates map orthonormal bases to orthonormal bases.

The second item is true using logic from our linear algebra lesson as follows. Unitary
operators preserve inner products. For example, since the natural CBS { |0i , |1i } –
which we can also express using the more general notation { |xi }1x=0 – satisfies the
orthonormality relation

    hx | yi = δxy ,

we have

    hx | U †U | yi = hx | yi = δxy .
(I’ll remind you that the LHS of last equation is nothing but the inner product of
U |yi with U |xi expressed in terms of the adjoint conversion rules introduced in our
lesson on quantum mechanics).
That tells us that { U |0i , U |1i } are orthonormal, and since the dimension of
H = 2 they also span the space. In other words, they form an orthonormal basis,
as claimed.
Let’s call this the basis conversion property of unitary transformations and make
it a theorem.
[Exercise. Prove the theorem for any dimension, N . Hint: Let bk = U (ak )
be the kth vector produced by subjecting ak ∈ A to U . Review the inner product-
preserving property of a unitary operator and apply that to any two vectors bk
and bj in the image of A. What does that say about the full set of vectors B =
U (A)? Finally, what do you know about the number of vectors in the basis for an
N -dimensional vector space?]
QNOT and Bases
When applied to some gates, like QNOT, this is a somewhat trivial observation since
QNOT maps the z-basis to itself:
In other situations, this is a very useful and interesting conversion. For example, if
you look back at its effect on the CBS, the Hadamard gate takes the z-basis to the
x-basis:
and, since every quantum gate’s adjoint is its inverse, and H is self-adjoint (easy
[Exercise]), it works in the reverse direction as well,
[Exercise. Identify the unitary operator that has the effect of converting between
the z-basis and the y-basis.]
An x-Basis QNOT
The QNOT gate (a.k.a. X), swaps the two CBS states, but only relative to the z-basis,
because that’s how we defined it. An easy experiment shows that this is not true of
another basis, such as the x-basis:
    X |1ix = X ( |0i − |1i ) / √2 = ( X |0i − X |1i ) / √2
           = ( |1i − |0i ) / √2 = − |1ix ∼= |1ix .
[Exercise. To what is the final equality due? What is X |0ix ?]
[Exercise. Why is this not a surprise? Hint: Revive your quantum mechanics
knowledge. X is proportional to the matrix for the observable, Sx , whose eigenvectors
are |0ix and |1ix , by definition. What is an eigenvector?]
If we wanted to construct a gate, QNOTx , that does have the desired swapping effect on the x-basis, we could approach it in a number of ways, two of which are outlined below. I'll sketch the first approach, leaving the details to you, then show you the second approach in its full glory.
Brute Force. We assert the desired behavior by declaring it to be true (and
confirming that our guess results in a unitary transformation). Here, that means
stating that QNOTx |0ix = |1ix and QNOTx |1ix = |0ix . Express this as two
equations involving the matrix for QNOTx and the x-basis kets in coordinate form
(everything in z-basis coordinates, of course). You'll get four simultaneous equations
for the four unknown matrix elements of QNOTx . This creates a definition in terms
of the natural CBS. We confirm it's unitary and we've got our gate.
[Exercise. Fill in the details for QNOTx .]
Gate Combination. We know that QNOT (i.e., X) swaps the z-CBS, and H
converts between z-CBS and x-CBS. So we use H to map the x-basis to the z-basis,
apply X to the z-basis and convert the results back to the x-basis:
H X H
and we have our matrix. It’s easy to confirm that the matrix swaps the x-CBS kets
and is identical to the matrix we would get using brute force (I’ll let you check that).
Circuit Identities
The above example has a nice side-effect. By comparing the result with one of our
basic gates, we find that H → X → H is equivalent to Z,
H X H = Z
There are more circuit identities that can be generated, some by looking at the
matrices, and others by thinking about the effects of the constituent gates and con-
firming your guess through matrix multiplication.
Here is one you can verify,
H Z H = X ,
Z Z = X X = Y Y = 1 .
The last pattern is true for any quantum logic gate, U , which is self-adjoint because
then U 2 = U † U = 1, the first equality by “self-adjoint-ness” and the second by
unitarity.
Some operator equivalences are not shown in gate form, but rather using the
algebraic operators. For example
XZ = −i Y
or
XY Z = −ZY X = i1.
That’s because the algebra shows a global phase factor which may appear awkward
in gate form yet is still important if the combination is to be used in a larger circuit.
As you may recall, even though a phase factor may not have observable consequences
on the state alone, if that state is combined with other states prior to measurement,
the global phase factor can turn into a relative phase difference, which does have
observable consequences.
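All of the identities above are two-line checks in numpy; here is a sketch (mine) that verifies them by matrix multiplication, including the global phase factors.

    import numpy as np

    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.diag([1, -1]).astype(complex)
    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    I = np.eye(2)

    print(np.allclose(H @ X @ H, Z))                # H X H = Z
    print(np.allclose(H @ Z @ H, X))                # H Z H = X
    print(np.allclose(X @ X, I), np.allclose(Y @ Y, I), np.allclose(Z @ Z, I))
    print(np.allclose(X @ Z, -1j * Y))              # X Z = -i Y
    print(np.allclose(X @ Y @ Z, 1j * I))           # X Y Z = i 1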
I will finish by reminding you that the algebra and the circuit are read in opposite
order. Thus
XY Z = i1
corresponds to the circuit diagram
Z Y X = i1
This completes the basics of Qubits and their unary operators. There is one final
topic that every quantum computer scientist should know. It is not going to be used
much in this course, but will appear in CS 83B and CS 83C. It belongs in this chapter,
so consider it recommended, but not required.
9.7 The Bloch Sphere
9.7.1 Introduction
Our goal is to find a visual 3-D representation for the qubits in H. To that end, we
will briefly allude to the lecture on quantum mechanics.
If you studied the optional time evolution of a general spin state corresponding
to a special physical system – an electron in constant magnetic field B – you learned
that the expectation value of all three observables formed a real 3-D time-evolving
vector,
    ( hSx i|ψ(t)i , hSy i|ψ(t)i , hSz i|ψ(t)i )t .
to get a balanced Hilbert space representative of our qubit,
    |ψi = ( c ei φ , s e−i φ )t .

Since |ψi is normalized, |c|2 + |s|2 = 1, so c and s can be equated with the sine and cosine of some angle, which we call θ/2, i.e.,

    c = cos (θ/2)   and
    s = sin (θ/2) .

We'll see why we pick θ/2, not θ, next.
    s ≡ ( hXi|ψi , hY i|ψi , hZi|ψi )t .
At the end of the quantum mechanics lecture we essentially computed these expec-
tation values. If you like, go back and plug t = 0 into the formula there. Aside from
the factor of ~2 (caused by the difference between X/Y /Z and Sx /Sy /Sz ), you will get
    s = ( sin θ cos φ , − sin θ sin φ , cos θ )t .
By defining c and s in terms of θ/2, we ended up with expectation values that had the
whole angle, θ, in them. This is a unit vector in R3 whose spherical coordinates are
(1, θ, −φ)t , i.e., it has a polar angle θ and azimuthal angle −φ. It is a point on the
unit sphere.
9.7.4 Definition of the Bloch Sphere
The sphere in R3 defined by
    { n̂ : |n̂| = 1 }
is called the Bloch sphere when the coordinates of each point on the sphere n̂ =
(x, y, z)t are interpreted as the three expectation values hXi, hY i and hZi for some
qubit state, |ψi. Each qubit value, |ψi, in H corresponds to a point n̂ on the Bloch
sphere.
If we use spherical coordinates to represent points on the sphere, then n̂ =
(1, θ, φ)t corresponds to the |ψi = α |0i + β |1i in our Hilbert space H accord-
ing to
    n̂ = (1, θ, φ)tSph ∈ Bloch sphere   ←→   |ψi = ( cos(θ/2) e−i φ , sin(θ/2) ei φ )t ∈ H .
Now we see that a polar angle, θ, of a point on the Bloch sphere gives the magnitudes
of its corresponding qubit coordinates, but not directly; when θ is the polar angle,
θ/2 is used (through sine and cosine) for the qubit coordinate magnitudes.
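If you want to see the correspondence numerically, here is a sketch (mine, using the balanced form with half of the relative phase on each amplitude; the angles are arbitrary, and sign conventions for the azimuthal angle vary by author): the three expectation values always land on the unit sphere.

    import numpy as np

    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.diag([1, -1]).astype(complex)

    theta, phi = 1.1, 0.4                           # arbitrary illustrative angles
    psi = np.array([np.cos(theta/2) * np.exp( 1j*phi/2),
                    np.sin(theta/2) * np.exp(-1j*phi/2)])

    n = np.array([np.vdot(psi, P @ psi).real for P in (X, Y, Z)])
    print(n)                                        # ( sin t cos p, -sin t sin p, cos t )
    print(np.linalg.norm(n))                        # -> 1.0 : a point on the Bloch sphere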
Chapter 10
Tensor Products
V ⊗W
U ,
which have a single input. We need to combine qubits (a process called quantum
entanglement), and to do that we’ll need gates that have, at a minimum, two inputs,
U .
(You may notice that there are also two outputs, an inevitable consequence of uni-
tarity that we’ll discuss in this hour.)
In order to feed two qubits into a binary quantum gate, we need a new tool to
help us calculate, and that tool is the tensor product of the two single qubit state
spaces
H ⊗H.
The concepts of tensors are no harder to master if we define the general tensor product
of any two vector spaces, V and W of dimensions l and m, respectively,
V ⊗W ,
and this approach will serve us well later in the course. We will then apply what we
learn by setting V = W = H.
The tensor product of more than two component spaces like
    V0 ⊗ V1 ⊗ V2 ⊗ · · · ,   or
    H ⊗ H ⊗ H ⊗ · · · ,
presents no difficulty once we have mastered the “order 2 tensors” (product of just
two spaces). When we need a product of more than two spaces, as we shall in a future
lecture, I’ll guide you. For now, let’s learn what it means to form the tensor product
of just two vector spaces.
Items 1 and 2 are easy, and we won’t be overly compulsive about item 3, so it
should not be too painful. We’ll also want to cover the two – normally optional but
for us required – topics,
If you find this sort of abstraction drudgery, think about the fact that tensors are the
requisite pillars of many fields including structural engineering, particle physics and
general relativity. Your attention here will not go unrewarded.
Overview
The new vector space is based on the two vector spaces V (dimension = l) and W
(dimension = m) and is called tensor product of V and W , written V ⊗ W . The new
space will turn out to have dimension = lm, the product of the two component space
dimensions).
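For coordinate vectors, that dimension count lm is easy to see with numpy's Kronecker product, which realizes the separable product v ⊗ w once bases are chosen (a sketch of mine, not the formal construction that follows):

    import numpy as np

    v = np.array([1 + 1j, -6, 3j])        # a vector in C^3 (illustrative entries)
    w = np.array([1 - 1j, 2, 0, 4])       # a vector in C^4 (illustrative entries)

    vw = np.kron(v, w)                    # coordinates of the separable tensor v (x) w
    print(vw.shape)                       # -> (12,)  =  l * m  =  3 * 4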
The Scalars of V ⊗ W
Both V and W must have a common scalar set in order to form their tensor product, and
and that set will be the scalar set for V ⊗ W . For real vector spaces like R2 and R3 ,
the scalars for
R2 ⊗ R3
would then be R. For the Hilbert spaces of quantum physics (and quantum comput-
ing), the scalars are C.
The Vectors in V ⊗ W
Vectors of the tensor product space are formed in two stages. I like to compart-"mental"-ize them as follows:

1. For each pair of component vectors, v ∈ V and w ∈ W , we form the formal product symbol
       "v ⊗ w" .

2. The formal vector products constructed in step 1 produce only a small subset of the tensor product space V ⊗ W . The most general vector is a finite sum of such symbols.
The full space V ⊗ W consists of all finite sums of the form
    Σ_k  v_k ⊗ w_k ,    with  v_k ∈ V  and  w_k ∈ W .
For example, in the case of the complex vector spaces V = C3 and W = C4 , one such typical vector in the "product space" C3 ⊗ C4 would be a sum of three separable products,
    v0 ⊗ w0  +  v1 ⊗ w1  +  v2 ⊗ w2 ,
with each vk ∈ C3 (for instance, v0 = (1 + i, −6, 3i)t ) and each wk ∈ C4 .
Vocabulary
• Product Space. The tensor product of two vector spaces is sometimes referred
to as the product space.
• Tensors. Vectors in the product space are sometimes called tensors, empha-
sizing that they live in the tensor product space of two vector spaces. However,
they are still vectors.
• Tensor Product. We can use the term “tensor product” to mean either the
product space or the individual separable tensors. Thus V ⊗ W is the tensor
product of two spaces, while v ⊗ w is the (separable) tensor product of two
vectors.
Vector Addition
This operation is built into the definition of a tensor; since the general tensor is the sum of separable tensors, adding two of them merely produces another sum, which is automatically a tensor. The twist, if we can call it that, is how we equate those sums which actually represent the same tensor. This is all expressed in the following two bullets.
• For any two tensors,
      ζ  =  Σ_k  v_k ⊗ w_k    and    ζ′  =  Σ_j  v′_j ⊗ w′_j ,
  their sum is just the combined finite sum,
      ζ + ζ′  =  Σ_k  v_k ⊗ w_k  +  Σ_j  v′_j ⊗ w′_j ,
  which simply expresses the fact that a sum of two finite sums is itself a finite sum and therefore agrees with our original definition of a vector object in the product space. The sum may need simplification, but it is a valid object in the product space.
• The tensor product distributes over sums in the component space,
      (v + v′) ⊗ w  =  v ⊗ w  +  v′ ⊗ w    and
      v ⊗ (w + w′)  =  v ⊗ w  +  v ⊗ w′ .
Practically, we only need to understand this as a “distributive property,” but
in theoretical terms it has the effect of producing countless sets of equivalent
vectors. That is, it tells how different formal sums in step 2 of “The Vectors
of V ⊗ W ” might represent the same actual tensor.
Scalar Multiplication
A scalar multiplies a separable tensor by scaling either component (it does not matter which),
    c (v ⊗ w)  =  (cv) ⊗ w  =  v ⊗ (cw) ,
and a general tensor term-by-term.
The Requisite Properties of V ⊗ W
Inner Products in V ⊗ W
On separable tensors, the inner product is defined to be the product of the two component inner products,
    h v ⊗ w | v′ ⊗ w′ i  ≡  h v | v′ i h w | w′ i ,
extended to general tensors by linearity.
[Exercise. Prove that this definition of inner product satisfies all the usual requirements of a dot or inner product. Be sure to cover distributivity and positive definiteness.]
Of all the aspects of tensor products, the one that we will use most frequently is the
preferred basis.
The Tensor Product Basis Theorem. Say V has dimension l, with orthonormal basis
    { vk } ,   k = 0, . . . , l − 1 ,
and W has dimension m, with orthonormal basis
    { wj } ,   j = 0, . . . , m − 1 .
Then the lm separable tensors { vk ⊗ wj } form an orthonormal basis for V ⊗ W .
Proof of Basis Theorem. I’ll guide you through the proof, and you can fill in
the gaps as an exercise if you care to.
Spanning. A basis must span the space. We need to show that any tensor can be expressed as a linear combination of the alleged basis vectors vk ⊗ wj . This is an easy two-parter:

1. Any separable tensor v ⊗ w can be so expressed: expand v along { vk } and w along { wj }, then distribute the tensor product over the two sums.

2. Any tensor is a sum of separable tensors, so item 1 tells us that it, too, can be expressed as a linear combination of vk ⊗ wj . [Exercise. Demonstrate this algebraically.]
Linear Independence and Orthonormality. We rely on a little theorem to which I subjected you twice, first in the linear algebra lecture, then again in the Hilbert space lecture. It said that orthonormality ⇒ linear independence. We now show that the set { vk ⊗ wj } is an orthonormal collection of vectors.
The Dimension of a Tensor Product Space. The dimension of V ⊗ W is lm, the product of the dimensions inherited from V and W .
If V and W each have an inner product (true for any of our vector spaces) this
theorem follows immediately from the orthonormal basis theorem.
[Exercise. Give a one sentence proof of this theorem based on the orthonormal
product basis theorem.]
The theorem is still true, even if the two spaces don’t have inner products, but
we won’t bother with that version.
While we have outlined a rigorous construction for a tensor product space, it is usually good enough for computer scientists to characterize the product space in terms of the tensor basis.
The product space, U = V ⊗ W , consists of tensors, u, expressible as sums over the separable basis
    { vk ⊗ wj  |  k = 0, . . . , (l − 1),  j = 0, . . . , (m − 1) } .
The sums, products and equivalence of tensor expressions are defined by the required distributive and commutative properties listed earlier, but can often be taken as the natural rules one would expect.
While not universal, when we need to list the tensor basis linearly, the most common
convention is to let the left basis index increment slowly and the right increment
quickly. It is “V -major / W -minor format” if you will, an echo of the row-major
(column-minor ) ordering choice of arrays in computer science,
    { v0 ⊗ w0 ,    v0 ⊗ w1 ,    v0 ⊗ w2 ,    . . . ,   v0 ⊗ wm−1 ,
      v1 ⊗ w0 ,    v1 ⊗ w1 ,    v1 ⊗ w2 ,    . . . ,   v1 ⊗ wm−1 ,
      ...
      vl−1 ⊗ w0 ,  vl−1 ⊗ w1 ,  vl−1 ⊗ w2 ,  . . . ,   vl−1 ⊗ wm−1 } .
You might even see these basis tensors labeled using the shorthand like ζkj ,
    { ζ00 ,       ζ01 ,       ζ02 ,       . . . ,   ζ0(m−1) ,
      ζ10 ,       ζ11 ,       ζ12 ,       . . . ,   ζ1(m−1) ,
      ...
      ζ(l−1)0 ,   ζ(l−1)1 ,   ζ(l−1)2 ,   . . . ,   ζ(l−1)(m−1) } .
10.2.2 Tensor Coordinates from Component-Space Coordinates
We have defined everything relating to a tensor product space V ⊗ W . Before we
apply that to qubits, we must make sure we can quickly write down coordinates of
a tensor in the preferred (natural) basis. This starts, as always, with (i) separable
tensors, from which we move to the (ii) basis tensors, and finally graduate to (iii)
general tensors.
We are looking for the coordinates of a pure tensor, expressible as a product of two
component vectors whose preferred coordinates we already know,
    v ⊗ w  =  ( c0 , c1 , . . . , cl−1 )t  ⊗  ( d0 , d1 , . . . , dm−1 )t .
[A Reminder. We are numbering starting with 0, rather than 1, now that we are in computing lessons.]
I’m going to give you the answer immediately, and allow you to skip the explana-
tion if you are in a hurry.
    ( c0 , c1 , . . . , cl−1 )t  ⊗  ( d0 , d1 , . . . , dm−1 )t
        =  (  c0 · ( d0 , . . . , dm−1 )t ,   c1 · ( d0 , . . . , dm−1 )t ,   . . . ,   cl−1 · ( d0 , . . . , dm−1 )t  )
        =  (  c0 d0 ,   c0 d1 ,   . . . ,   c0 dm−1 ,
              c1 d0 ,   c1 d1 ,   . . . ,   c1 dm−1 ,
              . . . ,
              cl−1 d0 ,  cl−1 d1 ,  . . . ,  cl−1 dm−1  )t ,
i.e., the kth block of m natural coordinates is ck times the coordinate column of w.
Example. For (5, −6)t ∈ R2 and (π, 0, 3)t ∈ R3 , their tensor product in R2 ⊗ R3 has natural basis coordinates given by
    ( 5, −6 )t ⊗ ( π, 0, 3 )t  =  ( 5π,  0,  15,  −6π,  0,  −18 )t .
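NumPy's kron function implements exactly this V-major stacking rule, so examples like the one above can be machine-checked. A minimal sketch (mine, not the author's):

    import numpy as np

    v = np.array([5.0, -6.0])             # vector in R^2
    w = np.array([np.pi, 0.0, 3.0])       # vector in R^3

    # np.kron stacks c_k * w for k = 0, 1, ..., which is the V-major
    # (left-factor-major) ordering used in this lecture.
    print(np.kron(v, w))                   # [ 5pi  0  15  -6pi  0  -18 ]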
To see why this recipe works, consider the tensor product of two general vectors in R2 and R3 ,
    ( c0 , c1 )t  ⊗  ( d0 , d1 , d2 )t .
Let’s zoom in on the meaning of each column vector which is, after all, shorthand
notation for the following,
    (  c0 (1, 0)t + c1 (0, 1)t  )  ⊗  (  d0 (1, 0, 0)t + d1 (0, 1, 0)t + d2 (0, 0, 1)t  ) .
Now apply the linearity to equate the above with
      c0 d0  (1, 0)t ⊗ (1, 0, 0)t   +   c0 d1  (1, 0)t ⊗ (0, 1, 0)t   +   c0 d2  (1, 0)t ⊗ (0, 0, 1)t
    + c1 d0  (0, 1)t ⊗ (1, 0, 0)t   +   c1 d1  (0, 1)t ⊗ (0, 1, 0)t   +   c1 d2  (0, 1)t ⊗ (0, 0, 1)t .
Next, identify each of the basis tensor products with their symbolic vk (for R2 ) and
wj (for R3 ) to see it more clearly,
c0 d0 (v0 ⊗ w0 ) + c0 d1 (v0 ⊗ w1 ) + c0 d2 (v0 ⊗ w2 )
+ c1 d0 (v1 ⊗ w0 ) + c1 d1 (v1 ⊗ w1 ) + c1 d2 (v1 ⊗ w2 ) .
The basis tensors are listed in the conventional (“V -major / W -minor format”) al-
lowing us to write down the coordinates of the tensor,
    ( c0 d0 ,  c0 d1 ,  c0 d2 ,  c1 d0 ,  c1 d1 ,  c1 d2 )t ,
as claimed. QED
[Exercise. Replicate the demonstration for the coordinates of a separable tensor
v ⊗ w in any product space V ⊗ W.]
The same is true for our tensor basis, which means that once we embrace that basis, we have lm basis vectors
    ( 1, 0, . . . , 0 )t ,   ( 0, 1, 0, . . . , 0 )t ,   . . . ,   ( 0, . . . , 0, 1 )t      (lm of them),
which don’t make reference to the two component vector spaces or the inherited V -
major / W -minor ordering we decided to use. However, for this to be useful, we need
an implied correspondence between these vectors and the inherited basis
    { v0 ⊗ w0 ,    v0 ⊗ w1 ,    v0 ⊗ w2 ,    . . . ,   v0 ⊗ wm−1 ,
      v1 ⊗ w0 ,    v1 ⊗ w1 ,    v1 ⊗ w2 ,    . . . ,   v1 ⊗ wm−1 ,
      ...
      vl−1 ⊗ w0 ,  vl−1 ⊗ w1 ,  vl−1 ⊗ w2 ,  . . . ,   vl−1 ⊗ wm−1 } .
This is all fine, as long as we remember that those self-referential tensor bases as-
sume some agreed-upon ordering system, and in this course that will be the V -major
ordering.
To illustrate this, say we are working with the basis vector
    ( 0, 0, 0, 0, 1, 0 )t  ∈  R2 ⊗ R3 .
In the rare times when we need to relate this back to our original vector spaces we would count: The 1 is in position 4 (counting from 0), and relative to R2 and R3 this means
    4  =  1 × 3 + 1 ,
i.e., the R2 index is 1 and the R3 index is 1, so this is the basis tensor v1 ⊗ w1 , which we can confirm by multiplying out as we learned in the last section,
    ( 0, 1 )t ⊗ ( 0, 1, 0 )t  =  ( 0·0,  0·1,  0·0,  1·0,  1·1,  1·0 )t  =  ( 0, 0, 0, 0, 1, 0 )t .
This will be easy enough with our 2-dimensional component spaces H, but if you aren't prepared for it, you might find yourself drifting aimlessly when faced with a long column basis tensor, not knowing what to do with it.
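The position-counting just described is ordinary integer division. A tiny Python sketch (the helper names are my own, not the text's) for converting between a flat V-major position and the component indices (k, j):

    # dim V = l, dim W = m; the basis tensor v_k (x) w_j sits at flat position k*m + j.
    def flat_index(k, j, m):
        return k * m + j

    def component_indices(pos, m):
        return divmod(pos, m)          # (k, j)

    # The example from the text: R^2 (x) R^3, so m = 3.
    print(flat_index(1, 1, 3))         # 4  -> the 1 sits in position 4
    print(component_indices(4, 3))     # (1, 1) -> the basis tensor v1 (x) w1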
Natural Coordinates of General Tensors
Because the tensor space has dimension lm, we know that any tensor in the natural
basis will look like
    ( ζ0 ,  ζ1 ,  ζ2 ,  . . . ,  ζlm−2 ,  ζlm−1 )t .
The only thing worth noting here is the correspondence between this and the com-
ponent spaces V and W . If we are lucky enough to have a separable tensor in our
hands, this would have the special form
    (  c0 d0 ,   c0 d1 ,   c0 d2 ,   . . . ,
       c1 d0 ,   c1 d1 ,   c1 d2 ,   . . . ,
       c2 d0 ,   c2 d1 ,   c2 d2 ,   . . . ,
       . . .  )t ,
and we might be able to figure out the component vectors from this. However, in
general, we don’t have separable tensors. All we can say is that this tensor is a linear
combination of the lm basis vectors, and just accept that it has the somewhat random
components, ζi which we might label simply
    ( ζ0 ,  ζ1 ,  ζ2 ,  . . . ,  ζlm−2 ,  ζlm−1 )t ,
or we might label with an eye on our component spaces
    (  ζ00 ,   ζ01 ,   ζ02 ,   . . . ,   ζ0(m−1) ,
       ζ10 ,   ζ11 ,   ζ12 ,   . . . ,   ζ1(m−1) ,
       ζ20 ,   ζ21 ,   ζ22 ,   . . . ,   ζ2(m−1) ,
       . . .  )t ,
with the awareness that these components ζkj may not be products of two factors
ck dj originating in two vectors (c0 , . . . , cl−1 )t and (d0 , . . . , dm−1 )t .
One thing we do know: Any tensor can be written as a weighted-sum of, at most,
lm separable tensors. (If this is not immediately obvious, please review the tensor
product basis theorem.)
Example. Let's compute the coordinates of the non-separable tensor
    ζ  =  ( 3, −6 )t ⊗ ( 1, 2, π )t  +  ( 1, 0 )t ⊗ ( 0, 1, 1 )t .
Expanding each separable term by the recipe above and adding,
    ζ  =  ( 3, 6, 3π, −6, −12, −6π )t  +  ( 0, 1, 1, 0, 0, 0 )t  =  ( 3, 7, 3π + 1, −6, −12, −6π )t .
Tensors as Matrices
If the two component spaces have dimension l and m, we know that the product space
has dimension lm. More recently we’ve been talking about how these lm coordinates
might be organized. Separable or not, a tensor is completely determined by its lm
preferred coefficients, which suggests a rectangle of numbers,
    ζ00        ζ01        ζ02        · · ·   ζ0(m−1)
    ζ10        ζ11        ζ12        · · ·   ζ1(m−1)
     ...
    ζ(l−1)0    ζ(l−1)1    ζ(l−1)2    · · ·   ζ(l−1)(m−1)  ,
which happens to have a more organized structure in the separable case,
    c0 d0      c0 d1      c0 d2      · · ·   c0 dm−1
    c1 d0      c1 d1      c1 d2      · · ·   c1 dm−1
     ...
    cl−1 d0    cl−1 d1    cl−1 d2    · · ·   cl−1 dm−1  .
This matrix is not to be interpreted as a linear transformation of either component space – it is just a vector in the product space. (It does have a meaning as a scalar-valued function, but we'll leave that as a topic for courses in relativity, particle physics
valued function, but we’ll leave that as a topic for courses in relativity, particle physics
or structural engineering.)
Sometimes the lm column vector model serves tensor imagery best, while other
times the l × m matrix model works better. It’s good to be ready to use either one
as the situation demands.
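Since both pictures list the same lm coefficients in V-major order, they are related by a simple reshape. A short NumPy sketch (not from the text) illustrating the two views of a separable tensor:

    import numpy as np

    l, m = 2, 3
    v = np.array([5.0, -6.0])
    w = np.array([np.pi, 0.0, 3.0])

    column = np.kron(v, w)             # the lm-entry column vector model
    matrix = column.reshape(l, m)      # the l x m matrix model (rows indexed by V)

    print(matrix)                      # row k is c_k * (d_0, ..., d_{m-1})
    print(np.allclose(matrix, np.outer(v, w)))   # True: separable tensor = outer product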
If A is a linear operator on V and B is a linear operator on W , then their tensor product, A ⊗ B, is a linear transformation on the product space
V ⊗ W,
A ⊗ B : V ⊗ W −→ V ⊗ W ,
defined by its action on the separable tensors
[A ⊗ B](v ⊗ w) ≡ Av ⊗ Bw ,
and extended to general tensors linearly.
Note 1: We could have defined A ⊗ B first on just the lm basis vectors vk ⊗ wj ,
since they span the space. However, it’s so useful to remember that we can use this
formula on any two component vectors v and w, that I prefer to make this the official
definition.
Note 2: A and/or B need not map their respective vector spaces into themselves.
For example, perhaps A : V 7→ V 0 and B : W 7→ W 0 . Then A⊗B : V ⊗W 7→ V 0 ⊗W 0 .
However, we will usually encounter the simpler case covered by the definition above.
One must verify that this results in a linear transformation (operator) on the
product space by proving that for any ζ, η ∈ V ⊗ W and scalar c,
[A ⊗ B] (ζ + η) = [A ⊗ B]ζ + [A ⊗ B]η and
[A ⊗ B] (cζ) = c [A ⊗ B]ζ .
This is very easy to do as an ...
[Exercise. Prove this by first verifying it on separable tensors then showing that
the extension to general tensors preserves the properties.]
Example. Let A be defined on R2 by
    Av  =  A ( v0 , v1 )t  ≡  ( v0 + 2v1 ,  v1 )t
and B be defined on R3 by
    Bw  ≡  π w  =  ( π w0 ,  π w1 ,  π w2 )t .
On separable tensors, then, A ⊗ B has the effect expressed by
    [A ⊗ B] (v ⊗ w)  =  ( v0 + 2v1 ,  v1 )t  ⊗  ( π w0 ,  π w1 ,  π w2 )t ,
and this is extended linearly to general tensors. To get specific, we apply A ⊗ B to
the tensor
    ζ  =  ( 3, −6 )t ⊗ ( 1, 2, π )t  +  ( 1, 0 )t ⊗ ( 0, 1, 1 )t
to get
    [A ⊗ B] ζ  =  ( 3 + (−12),  −6 )t ⊗ ( π,  2π,  π² )t   +   ( 1 + 0,  0 )t ⊗ ( 0,  π,  π )t
                =  ( −9,  −6 )t ⊗ ( π,  2π,  π² )t   +   ( 1,  0 )t ⊗ ( 0,  π,  π )t ,
We can always forsake the separable components and instead express this as a column
vector in the product space by adding the two separable tensors,
    [A ⊗ B] ζ  =  ( −9, −6 )t ⊗ ( π, 2π, π² )t  +  ( 1, 0 )t ⊗ ( 0, π, π )t
                =  ( −9π,  −18π,  −9π²,  −6π,  −12π,  −6π² )t  +  ( 0,  π,  π,  0,  0,  0 )t
                =  ( −9π,  −17π,  π − 9π²,  −6π,  −12π,  −6π² )t .
The matrix of a separable operator A ⊗ B is built by multiplying each element ajk of the left matrix by the entire matrix on the right:
    ( a00   a01   · · ·          ( b00   b01   · · ·              ( a00 B    a01 B    · · ·
      a10   a11   · · ·      ⊗     b10   b11   · · ·        =       a10 B    a11 B    · · ·
       ...                )         ...                 )            ...                     ) ,
where each ajk B stands for the m × m block obtained by multiplying every entry of B's matrix by ajk .
This works based on a V -major column format for the vectors in the product space.
If we had used a W -major column format, then we would have had to define the
product matrix using a B-major rule rather than the A-major rule given above.
Example. The matrices for the A and B of our last example are given by
    A  =  ( 1  2
            0  1 )
and
    B  =  ( π  0  0
            0  π  0
            0  0  π ) ,
so the A-major block rule gives
    M_A⊗B  =  ( 1·B   2·B          =  ( π  0  0  2π  0   0
                0·B   1·B )             0  π  0  0   2π  0
                                        0  0  π  0   0   2π
                                        0  0  0  π   0   0
                                        0  0  0  0   π   0
                                        0  0  0  0   0   π ) .
Earlier, we applied A ⊗ B directly to the separable form of ζ, whose natural coordinates are
    ζ  =  ( 3,  7,  3π + 1,  −6,  −12,  −6π )t ,
and found that, after reducing the result to a column vector,
    [A ⊗ B] ζ  =  ( −9π,  −17π,  π − 9π²,  −6π,  −12π,  −6π² )t .
The matrix method should reproduce this:
    M_A⊗B · ( 3,  7,  3π + 1,  −6,  −12,  −6π )t  =  ? .
[Exercise. Do the matrix multiplication, fill in the question mark, and see if it
agrees with the column tensor above.]
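If you would like a numeric cross-check of both the hand computation and the block-matrix rule, here is a NumPy sketch (the variable names are mine):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [0.0, 1.0]])                 # Av = (v0 + 2 v1, v1)^t
    B = np.pi * np.eye(3)                      # Bw = pi * w

    v1, w1 = np.array([3.0, -6.0]), np.array([1.0, 2.0, np.pi])
    v2, w2 = np.array([1.0, 0.0]),  np.array([0.0, 1.0, 1.0])

    zeta = np.kron(v1, w1) + np.kron(v2, w2)   # natural coordinates of zeta

    # Method 1: act on each separable piece, then add (the hand computation).
    by_hand = np.kron(A @ v1, B @ w1) + np.kron(A @ v2, B @ w2)

    # Method 2: build the 6x6 block matrix of A (x) B and multiply.
    by_matrix = np.kron(A, B) @ zeta

    print(np.allclose(by_hand, by_matrix))     # True
    print(by_matrix)   # [-9pi, -17pi, pi - 9pi^2, -6pi, -12pi, -6pi^2]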
[Exercise. Show that the lm × lm matrix, Ppq , which has a 1 in position (p, q) and 0 in all other positions, is separable. Hint: You need to find an l × l matrix and an m × m matrix whose tensor product has 1 in the right position and 0s everywhere else. Start by partitioning Ppq into sub-matrices of size m × m. Which sub-matrix does the lonely 1 fall into? Where in that m × m sub-matrix does that lonely 1 fall? ]
[Exercise. Show that the set of all (lm)² matrices
    { Ppq  |  0 ≤ p, q < lm }
spans the full set of lm × lm matrices, and conclude that every operator on the product space is a sum of separable operators.]
10.3.4 Food for Thought
Before we move on to multi-qubit systems, here are a few more things you may wish to ponder.
[Exercise. Is the product of unitary operators unitary in the product space?]
[Exercise. Is the product of Hermitian operators Hermitian in the product
space?]
[Exercise. Is the product of invertible operators invertible in the product space?]
Chapter 11
Tensor algebra allows us to make the leap from the 2-D Hilbert space of one qubit to
a 4-D Hilbert space of two qubits. Once mastered, advancing to n qubits for n > 2
is straightforward. Therefore, we move carefully through the n = 2 case, where the
concepts needed for higher dimensions are easiest to grasp.
11.2 The State Space for Two Qubits
11.2.1 Definition of a Two Quantum Bit (“Bipartite”) System
The reason that we “go tensor” for a two-qubit system is that the two bits may
become entangled (to be defined below). That forces us to treat two bits as if they
were a single state of a larger state space rather than keep them separate.
In other words, two qubits form a single entity – the tensor product space H ⊗ H
– whose value can be any vector (which happens also to be a tensor) on the projective
sphere of that product space.
The two-qubit entity itself is not committed to any particular value until we say
which specific unit-vector in H ⊗ H we are assigning it.
Two qubits are often referred to as a bipartite system. This term is inherited from
physics in which a composite system of two identical particles (thus bi-parti -te) can
be “entangled.”
To distinguish the two otherwise identical component Hilbert spaces I may use
subscripts, A for the left-space and B for the right space,
HA ⊗ HB .
Another notation you might see emphasizes the order of the tensor product, that is,
the number of component spaces – in our current case, two,
H(2) .
In this lesson, we are concerned with order-2 products, with a brief but important
section on order-3 products at the very end.
Finally, note that in the lesson on tensor products, we used the common abstract
names V and W for our two component spaces. In quantum computation the com-
ponent spaces are usually called A and B. For example, whereas in that lecture I
talked about “V -major ordering” for the tensor coordinates, I’ll now refer to “A-major
ordering”.
11.2.2 The Preferred Bipartite CBS
First and foremost, we need to establish symbolism for the computation basis states
(CBS ) of our product space. These states correspond to the two-bits of classical
computing, and they allow us to think of two ordinary bits as being embedded within
the rich continuum of a quantum bipartite state space.
The tensor product of two 2-D vector spaces has dimension 2 × 2 = 4. Its inherited preferred basis vectors are the separable products of the component space basis vectors,
    |0i ⊗ |0i ,    |0i ⊗ |1i ,    |1i ⊗ |0i ,    |1i ⊗ |1i .
These are the CBS of the bipartite system. There are some shorthand alternatives in quantum computing,
    |0i ⊗ |0i  ←→  |0i |0i  ←→  |00i  ←→  |0i2 ,
    |0i ⊗ |1i  ←→  |0i |1i  ←→  |01i  ←→  |1i2 ,
    |1i ⊗ |0i  ←→  |1i |0i  ←→  |10i  ←→  |2i2 ,
    |1i ⊗ |1i  ←→  |1i |1i  ←→  |11i  ←→  |3i2 .
All three of the alternatives that lack the ⊗ symbol are seen frequently in computer science, and we will switch between them freely based on the emphasis that the context requires.
The notation of the first two columns admits the possibility of labeling each of the component kets with the H from whence it came, A or B, as in
    |xiA ⊗ |yiB    or    |xiA |yiB .
I will often omit the subscripts A and B when the context is clear and include them
when I want to emphasize which of the two component spaces the vectors comes from.
The labels are always expendable since the A-space ket is the one on the left and the
B-space ket is the one on the right. I will even include and/or omit them in the same
string of equalities, since it may be clear in certain expressions, but less so in others:
    U ( β |0iA + δ |1iA ) |1iB  =  U ( β |0i |1i + δ |1i |1i )  =  β |0i |0i + δ |0i |1i
                                 =  |0iA ( β |0iB + δ |1iB )
The densest of the notations in the “←→” stack a couple paragraphs back is
the encoded version which expresses the ket as an integer from 0 to 3. We should
reinforce this correspondence at the outset and add the coordinate representation of
each basis ket under the implied A-major ordering of the vectors suggested by their
presentation, above.
    basis ket       |0i |0i          |0i |1i          |1i |0i          |1i |1i
    encoded         |0i2             |1i2             |2i2             |3i2
    coordinates     (1, 0, 0, 0)t    (0, 1, 0, 0)t    (0, 0, 1, 0)t    (0, 0, 0, 1)t
This table introduces an exponent-like notation, |xi2 , which is needed mainly in the
encoded form, since an integer representation for a CBS does not disclose its tensor
order (2 in this case) to the reader, while the other representations clearly imply that
we are looking at two-qubits.
Note that, unlike the CBS symbolism, there is no further alternative notation for a
general separable tensor. In particular, |ψϕi makes no sense.
or super condensed (my own notation),
    |0i2± ,   |1i2± ,   |2i2± ,   |3i2± .
In this last version, I’m using the subscript ± to indicate “x basis” and I’m encoding
the ket labels into decimal integers, 0 through 3, for the four CBS states.
While rare, we could inherit from the z-basis for our A-space, and the x-basis for our
B-space, to create the hybrid
{ |0i |0ix , |0i |1ix , |1i |0ix , |1i |1ix } .
[Exercise. How do we know that this is an orthonormal basis for the product space?]
Notation
Example
• the definition of inner product in the tensor space (the product of component inner-products),
      ‖ |0ix |1iy ‖²  =  h 0x ⊗ 1y | 0x ⊗ 1y i  =  x h0 | 0ix  ·  y h1 | 1iy  =  1 · 1  =  1  ✓ ,
• or the adjoint conversion rules to form the left bra for the inner product,
      ‖ |0ix |1iy ‖²  =  ( y h1|  x h0| ) ( |0ix |1iy )  =  y h1|  x h0 | 0ix  |1iy  =  y h1 | 1iy  =  1  ✓ .
However, neither of these would have been as thorough a check of our arithmetic as
the first approach.
[Exercise. Expand the separable state
    ( √.1 |0i + i √.9 |1i )  ⊗  ( i √.7 |0i + √.3 |1i )
along the natural CBS.]
How the Second Order x-Basis Looks when Expressed in the Natural Basis
Combining the separable form of the x-CBS kets with the expansion of each component ket along the natural basis,
    |+i  =  H |0i  =  ( |0i + |1i ) / √2    and
    |−i  =  H |1i  =  ( |0i − |1i ) / √2 ,
the four x-kets look like this, when expanded along the natural basis:
    |00ix  =  ( |00i + |01i + |10i + |11i ) / 2 ,
    |01ix  =  ( |00i − |01i + |10i − |11i ) / 2 ,
    |10ix  =  ( |00i + |01i − |10i − |11i ) / 2 ,
    |11ix  =  ( |00i − |01i − |10i + |11i ) / 2 .
    When expanded along the z-basis, the x-basis kets have equal numbers of
    + and − terms except for the zeroth CBS ket, |00ix , whose coordinates
    are all positive.
I know it sounds silly when you say it out loud, but believe me, it will be very useful.
The “exponent 2” on the LHS is, as mentioned earlier, a clue to the reader that |ψi
lives in a second-order tensor product space, a detail that might not be clear without
looking at the RHS. In particular, nothing is being “squared.”
11.2.6 Usual Definition of Two Qubits and their Values
This brings us to the common definition of a two-qubit system, which avoids the above formalism: a two-qubit value is a unit vector in H ⊗ H,
    |ψi2  =  α |00i + β |01i + γ |10i + δ |11i ,      |α|² + |β|² + |γ|² + |δ|²  =  1 ,
or
    |ψi2  =  α |0i2 + β |1i2 + γ |2i2 + δ |3i2 .
We may also use alternate notation for scalars, especially when we prepare for higher-order product spaces:
    |ψi2  =  c00 |00i + c01 |01i + c10 |10i + c11 |11i ,
or
    |ψi2  =  c0 |0i2 + c1 |1i2 + c2 |2i2 + c3 |3i2 .
As you can see, with all the vocabulary and skills we have mastered, definitions
can be very short now and still have significant content. For instance, we already
know that some binary quantum operators will be separable and others will not. The
simplest and most common gate, in fact, is not separable.
Binary quantum operators also go by the names two-qubit gates, binary qubit
operators, bipartite operators and various combinations of these.
Complete Description of Binary Quantum Operators
As with unary gates, a complete description covers the gate's circuit symbol, its action on the CBS, its matrix, its behavior on a general state |ψi2 (i.e., one that is not necessarily separable), and its effect on measurement.
The Symbol
Every binary qubit gate has two input lines, one for each input qubit, and two output
lines, one for each output qubit. The label for the unitary transformation associated
with the gate, say U , is placed inside a box connected to its inputs and outputs.
Although the data going into the two input lines can become ”entangled” inside the
gate, we consider the top half of the gate to be a separate register from the lower
half. This can be confusing to new students, as we can’t usually consider each output
line to be independent of its partner the way the picture suggests. More (a lot more)
about this shortly.
Vocabulary. The top input/output lines form an upper A register (or A channel )
while the bottom form a lower B register (or B channel ).
|xi |yi : Action on the CBS
Every operator is defined by its action on the basis, and in our case that’s the com-
putational basis. For binary gates, the symbolism for the general CBS is
    |xi |yi ,      x, y ∈ {0, 1} .
To demonstrate this on our “learning” gate, U , we define its action on the CBS, which
in turn defines the gate:
|xi | ¬yi
U
|yi | ¬x ⊕ yi
It is very important to treat the LHS as a single two-qubit input state, not two
separate single qubits, and likewise with the output. In other words, it is really
saying
U |xi ⊗ |yi ≡ | ¬yi ⊗ | ¬x ⊕ yi
or, using shorter notation,
U |xi |yi ≡ | ¬yi | ¬x ⊕ yi
Furthermore, |xi |yi only represents the four CBS, so we have to extend this linearly
to the entire Hilbert space.
Let’s make this concrete. Taking one of the four CBS, say |10i, the above definition
tells us to substitute 1 → x and 0 → y, to get the gate’s output,
U |1i |0i = | ¬0i | ¬1 ⊕ 0i = |1i |0i .
The Matrix
In our linear transformation lesson, we proved that the matrix MT that represents
an operator T can be written by applying T to each basis vector, ak , and placing the
answer vectors in the columns of the matrix,
    MT  =  (  T (a1 ) ,  T (a2 ) ,  . . . ,  T (an )  ) .
T (v) is then just MT · v. Applying the technique to U and the CBS {|xi |yi} we get
    MU  =  (  U |00i ,  U |01i ,  U |10i ,  U |11i  ) .
Each of these columns must be turned into the coordinate representation – in the
inherited tensor basis – of the four U -values. Let’s compute them. (Spoiler alert: this
was the last exercise):
    U |00i  =  |¬ 0i |¬ 0 ⊕ 0i  =  |1i |1 ⊕ 0i  =  |1i |1i  =  ( 0, 0, 0, 1 )t .
Similarly, we get
    U |01i  =  |¬ 1i |¬ 0 ⊕ 1i  =  |0i |1 ⊕ 1i  =  |0i |0i  =  ( 1, 0, 0, 0 )t ,
    U |10i  =  |¬ 0i |¬ 1 ⊕ 0i  =  |1i |0 ⊕ 0i  =  |1i |0i  =  ( 0, 0, 1, 0 )t ,
    U |11i  =  |¬ 1i |¬ 1 ⊕ 1i  =  |0i |0 ⊕ 1i  =  |0i |1i  =  ( 0, 1, 0, 0 )t ,
giving us the matrix
    U  ∼=  MU  =  ( 0  1  0  0
                    0  0  0  1
                    0  0  1  0
                    1  0  0  0 ) ,
which is, indeed, unitary. Incidentally, not every recipe you might conjure for the four
values U (|xi |yi) will produce a unitary matrix and therefore not yield a reversible –
and thus valid – quantum gate. (We learned last time that non-unitary matrices do
not keep state vectors on the projective sphere and therefore do not correspond to
physically sensible quantum operations.)
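The column-by-column recipe is easy to automate. The following Python sketch (my own illustration, not the author's) builds MU directly from the CBS rule U |xi |yi = | ¬yi | ¬x ⊕ yi and confirms unitarity numerically:

    import numpy as np

    def cbs(x, y):
        """Natural coordinates of the bipartite CBS ket |x>|y> (A-major order)."""
        e = np.zeros(4)
        e[2 * x + y] = 1.0
        return e

    # Column (2x + y) of M_U is U applied to the CBS ket |x>|y>.
    MU = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            out_a, out_b = 1 - y, (1 - x) ^ y        # |not y> |(not x) XOR y>
            MU[:, 2 * x + y] = cbs(out_a, out_b)

    print(MU.astype(int))
    # [[0 1 0 0]
    #  [0 0 0 1]
    #  [0 0 1 0]
    #  [1 0 0 0]]
    print(np.allclose(MU @ MU.T, np.eye(4)))         # True: unitary (real orthogonal)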
[Exercise. Go through the same steps on the putative operator defined by
U |xi |yi ≡ |x ⊕ ¬yi | ¬x ⊕ yi .
Next, we apply U to a general state,
    |ψi2  =  α |00i + β |01i + γ |10i + δ |11i ,
whose coordinates are
    |ψi2  =  ( α, β, γ, δ )t ,
so matrix multiplication gives
    U |ψi2  =  ( 0  1  0  0        ( α          ( β
                 0  0  0  1     ·    β      =     δ
                 0  0  1  0          γ            γ
                 1  0  0  0 )        δ )          α ) .
Measurement
We test the state at two access points, P (before the gate) and Q (after):
    |ψi2  --(P)--[ U ]--(Q)--  U |ψi2 .
Easy Observations by Looking at Expansion Coefficients. Trait #6 (QM’s
fourth postulate) tells us that a measurement of the input state (point P)
|ψi2 = α |0i |0i + β |0i |1i + γ |1i |0i + δ |1i |1i ,
collapses it to |00i with probability |α|2 . Meanwhile, a look at the U |ψi2 ’s amplitudes
(point Q),
U |ψi2 = β |0i |0i + δ |0i |1i + γ |1i |0i + α |1i |1i ,
reveals that measuring the output there will land it on the state |00i with probability
|β|2 . This was the input’s probability of landing on |01i prior to the gate; U has
shifted the probability that a ket will register a “01” on our meter to the probability
that it will register a “00.” In contrast, a glance at |ψi’s pre- and post-U amplitudes
of the CBS |10i tells us that the probability of this state being measured after U is
the same as before: |γ|2 .
Measurement of Separable Output States. By looking at the expansion
coefficients of the general output state, we can usually concoct a simple input state
that produces
a separable
output. For example, taking γ = α = 0 gives a separable
input, ( β |0iA + δ |1iA ) ⊗ |1iB , as well as the following separable output:
    U ( β |0iA + δ |1iA ) |1iB  =  U ( β |0i |1i + δ |1i |1i )  =  β |0i |0i + δ |0i |1i
                                 =  |0iA ( β |0iB + δ |1iB )
Measuring the A-register at the output (point Q) will yield a “0” with certainty
(the coefficient of the separable CBS component, |0i, is 1) yet will tell us nothing
about the B-register, which has a |β|2 probability of yielding a “0” and |δ|2 chance of
yielding a “1”, just as it did before we measured A. Similarly, measuring B at point
Q will collapse that output register into one of the two B-space CBS states (with
the probabilities |β|2 and |δ|2 ) but will not change a subsequent measurement of the
A-register output, still certain to show us a “0”.
(If this seems as though I’m jumping to conclusions, it will be explained formally
when we get the Born rule, below.)
A slightly less trivial separable output state results from the input,
    |ψi2  =  (√2/4) |00i  +  (√6/4) |01i  +  (√2/4) |10i  +  (√6/4) |11i
           =  (  ( |0iA + |1iA ) / √2  )  ⊗  (  ( |0iB + √3 |1iB ) / 2  ) .
(As it happens, this input state is separable, but that’s not required to produce a
separable output state, the topic of this example. I just made it so to add a little
symmetry.)
The output state can be written down instantly by permuting the amplitudes
according to U ’s formula,
    U |ψi2  =  (√6/4) |00i  +  (√6/4) |01i  +  (√2/4) |10i  +  (√2/4) |11i
             =  (  ( √3 |0iA + |1iA ) / 2  )  ⊗  (  ( |0iB + |1iB ) / √2  ) ,
and I have factored it for you, demonstrating the output state’s separability. Mea-
suring either output register at access point Q,
    ( |0i + |1i ) / √2    ---|     |---   ( √3 |0i + |1i ) / 2
                             |  U  |
    ( |0i + √3 |1i ) / 2  ---|     |---   ( |0i + |1i ) / √2
                                                Q
has a non-zero probability of yielding one of the two CBS states for its respective H,
but it won’t affect the measurement of the other output register. For example, mea-
suring the B-qubit-out will land it in |0iB or |1iB with equal probability. Regardless
of which result we get, it will not affect a future measurement of the A-qubit-out which
has a 3/4 chance of measuring “0” and 1/4 chance of showing us a “1.”
Measuring One Register of a Separable State. This is characteristic of
separable states, whether they be input or output. Measuring either register does not
affect the probabilities of the other register. It only collapses the component vector of
the tensor, leaving the other vector un-collapsed.
As a final example before we get into real gates, we look at what happens when we try
to measure a non-separable output state of our general learning circuit just presented.
Consider the input,
    |ψi2  =  (1/√2) |00i  +  (1/√2) |01i  =  |0iA (  ( |0iB + |1iB ) / √2  ) ,
a separable state that we also know under the alias,
|0i |0ix .
Applying U , this time using matrix notation (for fun and practice), yields
    U |ψi2  =  MU · (1/√2) ( 1, 1, 0, 0 )t  =  (1/√2) ( 1, 0, 0, 1 )t
             =  ( |0iA |0iB  +  |1iA |1iB ) / √2 ,
clearly not factorable. Furthermore, unlike the separable output states we have stud-
ied, a measurement of either register forces its partner to collapse.
    |0i                   ---|     |---
                             |  U  |      →    ( |0iA |0iB + |1iA |1iB ) / √2
    ( |0i + |1i ) / √2    ---|     |---
                                                 Q
For example, if we measure B’s output, and find it to be in state |1iB , since the
output ket has only one CBS tensor associated with that |1iB , namely |1iA |1iB , as
we can see from its form
    ( |0i |0i + |1i |1i ) / √2 ,
we are forced to conclude that the A-register must have collapsed into its |1i state. If
this is not clear to you, imagine that the A-register had not collapsed to |1i. It would
then be possible to measure a “0 ” in the A-register. However, such a turn of events
would have landed a |1i in the B-register and |0i in the A-register, a combination
that is patently absent from the output ket’s CBS expansion, above.
Stated another way (if you are still unsure), there is only one bipartite state here,
and if, when expanded along the CBS basis, one of the four CBS kets is missing
from that expansion that CBS ket has a zero probability of being the result of a
measurement collapse. Since |0i |1i is not in the expansion, this state is not accessible
through a measurement. (And by the way, the same goes for |1i |0i.)
Non-Locality
Entangled states are also said to be non-local, meaning that if you are in a room
with only one of the two registers, you do not have full control over what happens
to the data there; an observer of the other register in a different room may measure
his qubit and affect your data even though you have done nothing. Furthermore, if
you measure the data in that register, your efforts are not confined to your room but
extend to the outside world where the other register is located. Likewise, separable
states are considered local, since they do allow full segregation of the actions on
separate registers. Each observer has total control of the destiny of his register, and
his actions don’t affect the other observer.
Partial Collapse
In this last example a measurement and collapse of one register completely determined
the full and unique state of the output. However, often things are subtler. Measuring
one register may have the effect of only partially collapsing its partner. We’ll get to
that when we take up the Born rule.
Everything we’ve done in the last two sections is true as long as we have a consistent
CBS from start to finish. The definition of U , its matrix and the measurements have
all used the same CBS. But funny things happen if we use a different measurement
basis than the one used to define the operator or express its matrix. Look for an
example in a few minutes.
Now let’s do everything again, this time for a famous gate.
The Symbol
The CNOT gate is drawn with a solid dot (the control) on the A line, connected by a vertical wire to the symbol ⊕ (the target) on the B line.
The A-register is often called the control bit, and the B-register the target bit,
    “control bit”  →  •
    “target bit”   →  ⊕ .
The CNOT gate has the following effect on the computational basis states:
    |xi  ---•---  |xi
    |yi  ---⊕---  |x ⊕ yi
When viewed on the tiny set of four CBS tensors, it appears to leave the A-register
unchanged and to negate the B-register qubit or leave it alone, based on whether the
A-register is |1i or |0i:
    |yi  ↦  |yi ,      if x = 0
    |yi  ↦  | ¬ yi ,   if x = 1
The Matrix
We compute the column vectors of the matrix by applying CNOT to the CBS tensors to get
    MCNOT  =  (  CNOT |00i ,  CNOT |01i ,  CNOT |10i ,  CNOT |11i  )
           =  (  |00i ,  |01i ,  |11i ,  |10i  )   =   ( 1  0  0  0
                                                          0  1  0  0
                                                          0  0  0  1
                                                          0  0  1  0 ) .
Applying this to a general state, |ψi2 = ( α, β, γ, δ )t , we get
    CNOT |ψi2  =  ( 1  0  0  0        ( α          ( α
                    0  1  0  0     ·    β      =     β
                    0  0  0  1          γ            δ
                    0  0  1  0 )        δ )          γ ) .
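One way to reconcile the matrix with the verbal description is to assemble CNOT from its two diagonal blocks: the identity acting on B in the |0iA sector and the bit-flip X acting on B in the |1iA sector. A short NumPy sketch (mine; this standard block decomposition is not the text's derivation):

    import numpy as np

    I2 = np.eye(2)
    X  = np.array([[0.0, 1.0],
                   [1.0, 0.0]])
    P0 = np.array([[1.0, 0.0],       # projector onto |0> in the A register
                   [0.0, 0.0]])
    P1 = np.array([[0.0, 0.0],       # projector onto |1> in the A register
                   [0.0, 1.0]])

    # "Leave B alone when A is |0>, flip B when A is |1>":
    CNOT = np.kron(P0, I2) + np.kron(P1, X)
    print(CNOT.astype(int))
    # [[1 0 0 0]
    #  [0 1 0 0]
    #  [0 0 0 1]
    #  [0 0 1 0]]

    psi = np.array([0.1, 0.2, 0.3, 0.4])   # unnormalized stand-in amplitudes (alpha..delta)
    print(CNOT @ psi)                       # [0.1, 0.2, 0.4, 0.3] : gamma and delta swap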
A Meditation. If you are tempted to read on, feeling that you understand
everything we just covered, see how quickly you can answer this:
[Exercise. The CNOT is said to leave the source register unchanged and flip
the target register only if the source register input is |1i. Yet the matrix for CNOT
seems to always swap the last two amplitudes, γ ↔ δ, of any ket. Explain this.]
Caution. If you cannot do the last exercise, you should not continue reading, but
review the last few sections or ask a colleague for assistance until you see the light.
This is an important consequence of what we just covered. It is best that you apply
that knowledge to solve it rather than my blurting out the answer for you.
Measurement
First, we'll consider the amplitudes before and after the application of CNOT (access points P and Q, respectively):
    |ψi2  --(P)--[ CNOT ]--(Q)--  CNOT |ψi2 .
A pre-gate measurement of the input
    |ψi2  =  α |00i + β |01i + γ |10i + δ |11i    (point P)
will yield a “00 ” with probability |α|2 and “01 ” with probability |β|2 . A post-gate
measurement of CNOT |ψi2 (point Q),
CNOT |ψi2 = α |0i |0i + β |0i |1i + δ |1i |0i + γ |1i |1i ,
will yield those first two readings with the same probabilities since their ket’s respec-
tive amplitudes are not changed by the gate. However, the probabilities of getting a
“10 ” vs. a “11 ” reading are swapped. They go from |γ|2 and |δ|2 before the gate to
|δ|2 and |γ|2 , after.
Separable Output States. There’s nothing new to say here, as we have covered
all such states in our learning example. Whenever we have a separable output state,
measuring one register has no effect on the other register. So while a measurement
of A causes it to collapse, B will continue to be in a superposition state until we
measure it (and vice versa).
A separable bipartite state going into the CNOT gate does not usually result in a separable state coming out of it. To see this, consider the separable state
    |ψi2  =  |0ix ⊗ |0i  =  (  ( |0i + |1i ) / √2  ) |0i
going into CNOT:
    ( |0i + |1i ) / √2   ---•---   ?
    |0i                  ---⊕---   ?
When presented with a superposition state into either the A or B register, back
away very slowly from your circuit diagram. Turn, instead, to the linear algebra,
which never lies. The separable state should be resolved to its tensor basis form by
distributing the product over the sums,
    (  ( |0i + |1i ) / √2  ) |0i  =  (1/√2) |00i  +  (1/√2) |10i .
[Exception: If you have a separable operator as well as separable input state, we
don’t need to expand the input state along the CBS, as the definition of separable
operator allows us to apply the component operators individually to the component
vectors. CNOT is not separable, so we have to expand.]
Now, apply CNOT using linearity,
    CNOT (  (1/√2) |00i + (1/√2) |10i  )  =  (1/√2) CNOT (|00i)  +  (1/√2) CNOT (|10i)
                                           =  (1/√2) |00i  +  (1/√2) |11i
                                           =  ( |00i + |11i ) / √2 .
This is the true output of the gate for the presented input. It is not separable as is
obvious by its simplicity; there are only two ways we might factor it: pulling out an
A-ket (a vector in the first H space) or pulling out a B-ket (a vector in the second H
space), and neither works.
Getting back to the circuit diagram, we see there is nothing whatsoever we can
place in the question marks that would make that circuit sensible. Anything we
might try would make it appear as though we had a separable product on the RHS,
which we do not. The best we can do is consolidate the RHS of the gate into a single
bipartite qubit, indeed, an entangled state.
    ( |0i + |1i ) / √2   ---•---
                                    →    ( |00i + |11i ) / √2
    |0i                  ---⊕---
With an entangled output state such as this, measuring one output register causes the
collapse of both registers. We use this property frequently when designing quantum
algorithms.
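Here is the same computation done numerically, together with a quick rank test for separability (a sketch of my own; the reshaping trick is not introduced in the text):

    import numpy as np

    H = np.array([[1.0,  1.0],
                  [1.0, -1.0]]) / np.sqrt(2)
    CNOT = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]], dtype=float)

    ket0 = np.array([1.0, 0.0])
    plus = H @ ket0                                  # (|0> + |1>)/sqrt(2)

    state_in  = np.kron(plus, ket0)                  # separable input
    state_out = CNOT @ state_in

    print(state_out)          # [0.7071, 0, 0, 0.7071] = (|00> + |11>)/sqrt(2)

    # A 2-qubit state is separable iff its 4 amplitudes, reshaped 2x2, have rank 1.
    print(np.linalg.matrix_rank(state_in.reshape(2, 2)))    # 1 (separable)
    print(np.linalg.matrix_rank(state_out.reshape(2, 2)))   # 2 (entangled)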
Individual Measurement of Output Registers. Although we may have an
entangled state at the output of a gate, we are always allowed to measure each
register separately. No one can stop us from doing so; the two registers are distinct
physical entities at separate locations in the computer (or universe). Entanglement
and non-locality mean that the registers are connected to one another. Our intent to
measure one register must be accompanied by the awareness that, when dealing with
an entangled state, doing so will affect the other register’s data.
    |1i                  ---•---   ?
    ( |0i + |1i ) / √2   ---⊕---   ?
I have chosen the A-register input to be |1i for variety. It could have been |0i with
the same (as of yet, undisclosed) outcome. The point is that this time our A-register
is a CBS while the B-register is a superposition. We know from experience to ignore
the circuit diagram and turn to the linear algebra.
    |1i (  ( |0i + |1i ) / √2  )  =  (1/√2) |10i  +  (1/√2) |11i ,
and we apply CNOT
    CNOT (  (1/√2) |10i + (1/√2) |11i  )  =  (1/√2) CNOT (|10i)  +  (1/√2) CNOT (|11i)
                                           =  (1/√2) |11i  +  (1/√2) |10i
                                           =  |1i (  ( |1i + |0i ) / √2  ) .
Aha – separable. That’s because the control-bit (the A-register) is a CBS; it does not
change during the linear application of CNOT so will be conveniently available for
factoring at the end. Therefore, for this input, we are authorized to label the output
registers, individually.
    |1i                  ---•---   |1i
    ( |0i + |1i ) / √2   ---⊕---   ( |0i + |1i ) / √2
The two-qubit output state is unchanged. Not so fast. You have to do an ...
[Exercise. We are told that a |1i going into CNOT's control register means we flip the B-register bit. Yet, the output state of this binary gate is the same as the input state. Explain. Hint: Try the same example with a B-register input of √.3 |0i + √.7 |1i. ]
[Exercise. Compute CNOT of an input tensor |1i |1ix . Does CNOT leave this
state unchanged?]
Summary. A CBS ket going into the control register (A) of a CNOT gate allows
us to preserve the two registers at the output: we do, indeed, get a separable state
out, with the control register output identical to the control register input. This is
true even if a superposition goes into the target register (B). If a superposition goes
into the control register, however, all bets are off (i.e., entanglement emerges at the
output).
The A, or control, register of the CNOT gate is said to be unaffected by the CNOT
gate, although this is overstating the case; it gives the false impression that a separable
bipartite state into CNOT results in a separable state out, which we see is not the
case. Yet, there are at least two ways to interpret this characterization.
1. When a CBS state (of the preferred, z-basis) is presented to CNOT's A-register, the output state is, indeed, separable, with the A-register unchanged.

2. Whether or not the output is separable, the measurement probabilities of the A-register are the same after the gate as they were before it.
We have already demonstrated item 1, so let’s look now at item 2. The general
state, expressed along the natural basis is
CNOT |ψi2 = α |0i |0i + β |0i |1i + δ |1i |0i + γ |1i |1i .
A collapse of the A-register output to |0iA can occur through either of two mutually exclusive CBS outcomes, |00i or |01i (the B-register of one outcome is |0iB and of the other outcome is |1iB ). Therefore, to get the overall probability that A collapses to |0iA , we simply add those two probabilities:
    P ( A-reg output → |0i )  =  P ( CNOT |ψi2 → |00i )  +  P ( CNOT |ψi2 → |01i )
                               =  |α|²  +  |β|² ,
which is exactly the same probability of measuring a 0 on the input, |ψi, prior to
applying CNOT.
[Exercise. We did not compute the probability of measuring a “0 ” on the input, |ψi. Do that to confirm the claim.]
[Exercise. What trait of QM (and postulate) tells us that the individual proba-
bilities are |α|2 and |β|2 ?]
[Exercise. Compute the probabilities of measuring a “1 ” in the A-register both
before and after the CNOT gate. Caution: This doesn’t mean that we would measure
the same prepared state before and after CNOT. Due to the collapse of the state after
any measurement, we must prepare many identical states and measure some before
and others after then examine the outcome frequencies to see how the experimental
probabilities compare.]
Now we come to the example that I promised: measuring in a basis different from the
one used to define and express the matrix for the gate. Let’s present the following
four bipartite states,
    |00ix ,   |01ix ,   |10ix ,   |11ix ,
to the input of CNOT and look at the output (do a measurement) in terms of the x-basis (which consists of those four tensors).
|00ix : This one is easy because the z-coordinates are all the same.
    |00ix  =  |0ix |0ix  =  (  ( |0i + |1i ) / √2  ) (  ( |0i + |1i ) / √2  )  =  (1/2) ( 1, 1, 1, 1 )t .
CNOT applied to |00ix swaps the last two z-coordinates, which are identical,
so it is unchanged.
|01ix : Just repeat, but watch for signs.
    |01ix  =  |0ix |1ix  =  (  ( |0i + |1i ) / √2  ) (  ( |0i − |1i ) / √2  )  =  (1/2) ( 1, −1, 1, −1 )t .
From here it’s easier to expand along the z-basis so we can factor,
    CNOT |01ix  =  (1/2) (  |0i |0i  −  |0i |1i  −  |1i |0i  +  |1i |1i  )
                 =  (1/2) (  |0i |0i  −  |1i |0i  −  |0i |1i  +  |1i |1i  )
                 =  (1/2) (  ( |0i − |1i ) |0i  −  ( |0i − |1i ) |1i  )
                 =  (1/2) (  |0i − |1i  ) (  |0i − |1i  )
                 =  (  ( |0i − |1i ) / √2  ) (  ( |0i − |1i ) / √2  )   =   |1ix |1ix   =   |11ix .
What is this? Looking at it in terms of the x-basis it left the B-register un-
changed at |1ix but flipped the A-register from |0ix to |1ix .
Looking back at the |00ix case, we see that when the B-register held a qubit in
the Sx =“0 ” state, the A-register was unaffected relative to the x-basis.
This looks suspiciously as though the B-register is now the control bit and A is
the target bit and, in fact, a computation of the remaining two cases, |10ix and
|11ix , would bear this out.
demonstrating that, in the x-basis, the B-register is the control and the A is the
target.
[Preview. We are going to revisit this in a circuit later today. For now, we’ll call
it an “upside-down” action of the CNOT gate relative to the x-basis and later see how
to turn it into an actual “upside-down” CNOT gate for the natural CBS kets. When
we do that, we'll call it “C↑NOT,” because it will be controlled from bottom-up in the z-basis. So far, however, we've only produced this bottom-up behavior in the x-basis, so the gate name does not change.]
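The “bottom-up” behavior relative to the x-basis amounts to a one-line matrix identity: conjugating CNOT by H ⊗ H (the z ↔ x basis converter) produces the matrix of a CNOT controlled from the B register. A small numeric check (my own sketch, not part of the text):

    import numpy as np

    H  = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    H2 = np.kron(H, H)                      # H (x) H, its own inverse

    CNOT = np.array([[1, 0, 0, 0],          # control = A (top), target = B (bottom)
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]], dtype=float)

    UPSIDE_DOWN = np.array([[1, 0, 0, 0],   # control = B, target = A
                            [0, 0, 0, 1],
                            [0, 0, 1, 0],
                            [0, 1, 0, 0]], dtype=float)

    # CNOT expressed in x-basis coordinates: (H(x)H) CNOT (H(x)H).
    print(np.allclose(H2 @ CNOT @ H2, UPSIDE_DOWN))    # True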
What About Measurement? I advertised this section as a study of measure-
ment, yet all I did so far was make observations about the separable components of
the output – which was an eye-opener in itself. Still, let’s bring it back to the topic
of measurement.
Take any of the input states, say |10ix . Then the above results say that
    CNOT |10ix  =  |10ix .
To turn this into a statement about measurement probabilities, we “dot” the output
state with the x-basis kets to get the four amplitudes. By orthogonality of CBS,
x h00 | 10ix = 0
x h01 | 10ix = 0
x h10 | 10ix = 1
x h11 | 10ix = 0,
producing the measurement probability of 100% for the state, |10ix . In other words,
for this state – which has a B input of 0 (in x-coordinates) – its output remains 0
with certainty, while A’s 1 (again, x-coordinates) is unchanged, also with certainty.
On the other hand, the input state |11ix gave us an output of |01ix , so the output
amplitudes become
x h00 | 01ix = 0
x h01 | 01ix = 1
x h10 | 01ix = 0
x h11 | 01ix = 0.
Here the input – whose B x-basis component is 1 – turns into an output with the B
x-basis component remaining 1 (with certainty) and an A x-basis input 1 becoming
flipped to 0 at the output (also with certainty).
This demonstrates that statements like “The A-register is left unchanged” or “The A-register is the control qubit” are loosey-goosey terms that must be taken with a grain of salt. They are vague for non-separable states (as we saw, earlier) and
patently false for measurements in alternate CBSs.
11.3.5 The Second-Order Hadamard Gate
We construct the second order Hadamard gate by forming the tensor product of two
first order gates, so we’d better first review that operator.
Recall that the first order Hadamard gate operates on the 2-dimensional Hilbert space,
H, of a single qubit according to its effect on the CBS states
    H |0i  =  |0ix  =  ( |0i + |1i ) / √2    and
    H |1i  =  |1ix  =  ( |0i − |1i ) / √2 ,
which, in an exercise, you showed was equivalent to the CBS formula
    H |xi  =  ( |0i + (−1)^x |1i ) / √2 ,
and that it affects a general qubit state, |ψi = α |0i + β |1i, according to
    H |ψi  =  (  (α + β) / √2  ) |0i  +  (  (α − β) / √2  ) |1i .
Definition. The second order Hadamard gate, also called the two-qubit or binary
Hadamard gate, is the tensor product of two single-qubit Hadamard gates,
H ⊗2 ≡ H ⊗H.
Notation. You will see both H ⊗ H and H ⊗2 when referring to the second order
Hadamard gate. In a circuit diagram, it looks like this
H ⊗2
when we want to condense the input and output pipes into a multi-pipe. However, it
is often drawn as two individual H gates applied in parallel,
H
H
Like any separable operator, its action is determined by the component operators,
    [T1 ⊗ T2 ] (v ⊗ w)  ≡  T1 v ⊗ T2 w .
This forces the action of H ⊗ H on the H(2) computational basis state |0i |0i (for
example) to be
    [H ⊗ H] |0i |0i  =  H |0i  H |0i  =  (  ( |0i + |1i ) / √2  ) (  ( |0i + |1i ) / √2  ) .
Separability of CBS Output. Let’s pause a moment to appreciate that when an
operator in H(2) is a pure product of individual operators, as this one is, CBS states
always map to separable states. We can see this in the last result, and we know it will
happen for the other three CBS states.
CBS Output in z-Basis Form. Separable or not, it’s always good to have
the basis-expansion of the gate output for the four CBS kets. Multiplying it out for
H ⊗2 (|0i |0i), we find
    [H ⊗ H] |0i |0i  =  ( |0i |0i + |0i |1i + |1i |0i + |1i |1i ) / 2  =  (1/2) ( 1, 1, 1, 1 )t .
Doing the same thing for all four CBS kets, we get the identities
    H ⊗2 |00i  =  ( |00i + |01i + |10i + |11i ) / 2 ,
    H ⊗2 |01i  =  ( |00i − |01i + |10i − |11i ) / 2 ,
    H ⊗2 |10i  =  ( |00i + |01i − |10i − |11i ) / 2 ,
    H ⊗2 |11i  =  ( |00i − |01i − |10i + |11i ) / 2 .
Condensed Form #1. There is a single-formula version of these four CBS results,
and it is needed when we move to three or more qubits, so we had better develop it
now. However, it takes a little explanation, so we allow a short side trip.
First, let’s switch to encoded notation (0 ↔ 00, 1 ↔ 01, 2 ↔ 10 and 3 ↔ 11),
and view the above in the equivalent form,
    H ⊗2 |0i2  =  ( |0i2 + |1i2 + |2i2 + |3i2 ) / 2 ,
    H ⊗2 |1i2  =  ( |0i2 − |1i2 + |2i2 − |3i2 ) / 2 ,
    H ⊗2 |2i2  =  ( |0i2 + |1i2 − |2i2 − |3i2 ) / 2 ,
    H ⊗2 |3i2  =  ( |0i2 − |1i2 − |2i2 + |3i2 ) / 2 .
Next, to capture the four sign patterns in a single formula, we write the integers x and y as two-bit strings,
x = x1 x0 and
y = y1 y0 ,
where the RHS represents the two-bit string of the CBS (“00,” “01,” etc.), we define
    x ⊙ y  ≡  x1 · y1  ⊕  x0 · y0 ,
that is, we multiply corresponding bits and take their mod-2 sum. To get overly
explicit, here are a few computed mod-2 dot products:
    x = x1 x0     y = y1 y0     x ⊙ y
    3 = 11        3 = 11        0
    1 = 01        1 = 01        1
    0 = 00        2 = 10        0
    1 = 01        3 = 11        1
    1 = 01        2 = 10        0
With this mod-2 dot product in hand, all four Hadamard identities collapse into the single expression
    H ⊗2 |xi2  =  (1/2)  Σ_{y=0}^{3}  (−1)^(x ⊙ y)  |yi2 .
I'll now confirm that the last expression presented above for H ⊗2 |xi2 , when applied to the particular CBS |3i2 ∼= |1i |1i, gives the right result:
    H ⊗2 |3i2  =  H ⊗2 |11i  =  (1/2) (  (−1)^(3⊙0) |0i2  +  (−1)^(3⊙1) |1i2  +  (−1)^(3⊙2) |2i2  +  (−1)^(3⊙3) |3i2  )
               =  ( |0i2 − |1i2 − |2i2 + |3i2 ) / 2  ✓ .
Most authors typically don't use the “⊙” for the mod-2 dot product, but stick with a simpler “·”, and add some verbiage to the effect that “this is a mod-2 dot product...,” in which case you would see it as
    H ⊗2 |xi2  =  (1/2)  Σ_{y=0}^{3}  (−1)^(x · y)  |yi2 .
In circuit form,
    |xi2  ---[ H ⊗2 ]--->  (1/2)  Σ_{y=0}^{3}  (−1)^(x · y)  |yi2 .
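A few lines of NumPy confirm the condensed formula against the directly computed columns of H ⊗2 (a sketch with my own helper names):

    import numpy as np

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    H2 = np.kron(H, H)

    def dot2(x, y):
        """Mod-2 dot product of the 2-bit strings of x and y."""
        return bin(x & y).count("1") % 2

    for x in range(4):
        direct = H2[:, x]                                    # H^(x)2 applied to |x>^2
        formula = 0.5 * np.array([(-1) ** dot2(x, y) for y in range(4)])
        assert np.allclose(direct, formula)

    print("condensed form #1 verified for x = 0, 1, 2, 3")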
Condensed Form #2. You may have noticed that the separability of the CBS
outputs, combined with the expression we already had for a single-qubit Hadamard,
gives us another way to express H in terms of the CBS. Once again, invoking the
definition of a product operator, we get
    [H ⊗ H] |xi |yi  =  H |xi ⊗ H |yi
                      =  (  ( |0i + (−1)^x |1i ) / √2  )  (  ( |0i + (−1)^y |1i ) / √2  ) ,
which, with much less fanfare than condensed form #1, produces a nice separable
circuit diagram definition for the binary Hadamard:
The Matrix
This time we have two different approaches available. As before we can compute
the column vectors of the matrix by applying H ⊗2 to the CBS tensors. However,
the separability of the operator allows us to use the theory of tensor products to
write down the product matrix based on the two component matrices. The need for
frequent sanity checks wired into our collective computer science mindset induces us
to do both.
Method #1: Tensor Theory.
Using the standard A-major (B-minor) method, a separable operator A ⊗ B can
be immediately written down using our formula
    ( a00   a01   · · ·          ( b00   b01   · · ·              ( a00 B    a01 B    · · ·
      a10   a11   · · ·      ⊗     b10   b11   · · ·        =       a10 B    a11 B    · · ·
       ...                )         ...                 )            ...                     ) .
For A = B = H this gives
    MH ⊗2  =  (1/√2) (  1 · MH      1 · MH          =  (1/2) ( 1   1   1   1
                         1 · MH     −1 · MH )                   1  −1   1  −1
                                                                1   1  −1  −1
                                                                1  −1  −1   1 ) .
Method #2: Applying H ⊗2 to the CBS. As a sanity check, we can also build the matrix column-by-column by applying H ⊗2 to the CBS tensors and placing the answer vectors in the columns of the matrix. Using our four expressions for H ⊗2 |xi |yi presented initially, we learn
    H ⊗2 |00i  =  (1/2) ( |00i + |01i + |10i + |11i )  =  (1/2) ( 1,  1,  1,  1 )t ,
    H ⊗2 |01i  =  (1/2) ( |00i − |01i + |10i − |11i )  =  (1/2) ( 1, −1,  1, −1 )t ,
    H ⊗2 |10i  =  (1/2) ( |00i + |01i − |10i − |11i )  =  (1/2) ( 1,  1, −1, −1 )t ,
    H ⊗2 |11i  =  (1/2) ( |00i − |01i − |10i + |11i )  =  (1/2) ( 1, −1, −1,  1 )t .
Therefore,
    MH ⊗2  =  (  H ⊗2 |00i ,  H ⊗2 |01i ,  H ⊗2 |10i ,  H ⊗2 |11i  )
           =  (1/2) ( 1   1   1   1
                      1  −1   1  −1
                      1   1  −1  −1
                      1  −1  −1   1 ) ,
happily, the same result.
[Exercise. Complete the sanity check by confirming that this is a unitary matrix.]
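Both methods, and the unitarity exercise, can be settled numerically in a few lines (a sketch of mine, not the author's):

    import numpy as np

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

    M1 = np.kron(H, H)                              # Method #1: tensor (block) rule

    # Method #2: columns are H^(x)2 applied to the CBS kets |00>, |01>, |10>, |11>
    M2 = np.column_stack([np.kron(H[:, x], H[:, y]) for x in (0, 1) for y in (0, 1)])

    print(np.allclose(M1, M2))                      # True : happily, the same result
    print(np.allclose(M1 @ M1.conj().T, np.eye(4))) # True : unitary
    print((2 * M1).astype(int))                     # the +/-1 sign pattern from the text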
To a general
    |ψi2  =  ( α, β, γ, δ )t ,
we apply our operator,
    H ⊗2 |ψi2  =  (1/2) ( 1   1   1   1         ( α              ( α + β + γ + δ
                           1  −1   1  −1     ·    β      =  (1/2)  α − β + γ − δ
                           1   1  −1  −1          γ                α + β − γ − δ
                           1  −1  −1   1 )        δ )              α − β − γ + δ ) ,
which shows the result of applying the two-qubit Hadamard to any state.
Measurement
There is no concise phrase we can use to describe how the binary Hadamard affects
measurement probabilities of a general state. We must be content to describe it in
terms of the algebra. For example, testing at point P,
    |ψi2  --(P)--[ H ⊗2 ]--(Q)--  H ⊗2 |ψi2
collapses |ψi2 to |00i with probability |α|² (as usual). Waiting, instead, to take the measurement of H ⊗2 |ψi2 (point Q), would produce a collapse to that same |00i with the probability
    | (α + β + γ + δ) / 2 |²  =  (α + β + γ + δ)∗ (α + β + γ + δ) / 4 ,
Converting Between z-Basis and x-Basis using H ⊗2
H ⊗2 is the transformation that takes the z-CBS to the x-CBS (and since
it is its own inverse, also takes x-CBS back to the z-CBS).
Algebraically, using the alternate |±i notation for the x-basis, the forward direction
is
H ⊗2 |00i = |++i ,
H ⊗2 |01i = |+−i ,
H ⊗2 |10i = |−+i and
H ⊗2 |11i = |−−i ,
and the inverse direction is
H ⊗2 |++i = |00i ,
H ⊗2 |+−i = |01i ,
H ⊗2 |−+i = |10i and
H ⊗2 |−−i = |11i .
11.4.1 Measuring Along the x-Basis
We apply the observation of the last section. When we are ready to measure along
the x-basis, we insert the quantum gate that turns the x-basis into the z-basis, then
measure in our familiar z-basis. For a general circuit represented by a single operator
U we would use
    [circuit: the two-qubit operator U , followed by an H gate on each output line, each line then feeding a measurement meter]
to measure along the x-basis. (The meter symbols imply a natural z-basis measure-
ment, unless otherwise stated.)
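Operationally this means: apply H to each line, then read the meters and square the transformed amplitudes. A short sketch (mine; the input state is an arbitrary stand-in coming out of the main circuit):

    import numpy as np

    H  = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    H2 = np.kron(H, H)

    # Some normalized two-qubit state produced by the main circuit U.
    psi = np.array([0.5, 0.5, 0.5, 0.5])            # this happens to be |00>_x

    # Insert H (x) H, then measure in the z-basis:
    amplitudes = H2 @ psi
    probs = np.abs(amplitudes) ** 2

    for label, p in zip(["00", "01", "10", "11"], probs):
        print(f"P(read {label}) = {p:.3f}")
    # Reads "00" with probability 1 -- i.e., psi was the x-CBS ket |00>_x.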
Say we have some other orthonormal basis,
    C  =  { |ξ0 i , |ξ1 i } ,
of our first order Hilbert space, H(1) . Let’s also assume we have the unary operator,
call it T , that converts from the natural z-CBS to the C-CBS,
T : z-basis −→ C ,
T |0i = |ξ0 i ,
T |1i = |ξ1 i .
Its adjoint converts in the opposite direction,
    T †  :  C  −→  z-basis ,
T † |ξ0 i = |0i ,
T † |ξ1 i = |1i .
Moving up to our second order Hilbert space, H(2) , the operator that converts from
the induced 4-element z-basis to the induced 4-element C-basis is T ⊗ T , while to go
in the reverse direction we would use T † ⊗ T † ,
T † ⊗ T † |ξ00 i = |00i ,
T † ⊗ T † |ξ01 i = |01i
T † ⊗ T † |ξ10 i = |10i ,
T † ⊗ T † |ξ11 i = |11i .
To measure a bipartite output state along the induced C-basis, we just apply T † ⊗ T †
instead of the Hadamard gate H ⊗ H at the end of the circuit:
    [circuit: the two-qubit operator U , followed by a T † gate on each output line, each line then feeding a measurement meter]
Question to Ponder
How do we vocalize the measurement? Are we measuring the two qubits in the C basis
or in the z-basis? It is crucial to our understanding that we tidy up our language.
There are two ways we can say it and they are equivalent, but each is very carefully
worded, so please meditate on them.
1. If we apply the T † gate first, then subsequent measurement will be in the z-basis.
2. If we are talking about the original output registers of U before applying the T †
gates to them, we would say we are “measuring that pair along the C basis.” This
version has built into it the implication that we are “first sending the qubits through T † s, then measuring them in the z-basis.”
If we happen to know that, just after the main circuit but before the final basis-
transforming T † ⊗ T † , we had a C-CBS state, then a final reading that showed the
binary number “x ” (x = 0, 1, 2 or 3) on our meter would imply that the original
bipartite state before the T † s was |ξx i. No collapse takes place when we have a CBS
ket and measure in that CBS basis.
However, if we had some superposition state of x-CBS kets, |ψi2 , at the end of our
main circuit but before the basis-transforming T † ⊗ T † , we have to describe things
probabilistically. Let's express this superposition as
    |ψi2  =  c00 |ξ00 i  +  c01 |ξ01 i  +  c10 |ξ10 i  +  c11 |ξ11 i .
By linearity,
T † ⊗ T † |ψi2
= c00 |00i + c01 |01i + c10 |10i + c11 |11i ,
a result that tells us the probabilities of detecting the four natural CBS kets on our
final meters are the same as the probabilities of our having detected the corresponding
x-CBS kets prior to the final T † ⊗ T † gate. Those probabilities are, of course, |cx |2 ,
for x = 0, 1, 2 or 3.
11.4.3 The Entire Circuit Viewed in Terms of an Alternate
Basis
If we wanted to operate the circuit entirely using the alternate CBS, i.e., giving it
input states defined using alternate CBS coordinates as well as measuring along the
alternate basis, we would first create a circuit (represented here by a single operator
U ) that works in terms of the z-basis, then surround it with T and T † gates,
    [circuit: a T gate on each input line, then the two-qubit operator U , then a T † gate on each output line, each feeding a meter]
If the basis-converting operator is not separable, the same surgery is performed with a single binary operator, S, and its adjoint,
    [circuit: an S gate across both input lines, then U , then an S † gate across both output lines] ,
    S  U  S † .
We’ll be using both separable and non-separable basis conversion operators today
and in future lessons.
11.5 Variations on the Fundamental Binary Qubit
Gates
We’ve seen three examples of two-qubit unitary operators: a general learning example,
the CNOT and the H ⊗2 . There is a tier of binary gates which are derivative of one
or more of those three, and we can place this tier in a kind of “secondary” category
and thereby leverage and/or parallel the good work we’ve already done.
The CNOT is a special case of a more general gate which “places” any unary (one-
qubit) operation in the B-register to be controlled by the qubit in the A-register.
If U is any unary gate, we can form a binary qubit gate called a controlled-U gate
(or U -operator ). Loosely stated, it applies the unary operator on one register’s CBS
conditionally, based on the CBS qubit going into the other register.
The Symbol
The controlled-U gate is drawn like the CNOT gate with CNOT’s ⊕ operation re-
placed by the unary operator, U , that we wish to control:
    [diagrams: a control dot • on the A line connected by a vertical wire to a boxed X on the B line; similarly, a control dot connected to a boxed Rθ ]
The A-register maintains its role of control bit/register, and the B-register the target
register or perhaps more appropriately, target (unary) operator.
“control register” → •
“target operator” → U
There is no accepted name for the operator, but I’ll use the notation CU (i.e.,
CX, C(Rθ ), etc.) in this course.
It is easiest and most informative to give the effect on the CBS when we know which
specific operator, U , we are controlling. Yet, even for a general U we can give formal
expression using the power (exponent) of the matrix U . As with ordinary integer
exponents, U n simply means “multiply U by itself n times” with the usual convention
that U 0 = 1:
    |xi  ---•-----  |xi
    |yi  ---[U]---  U^x |yi
    CU |00i  =  |0i |0i     =  ( 1,  0,  0,    0   )t ,
    CU |01i  =  |0i |1i     =  ( 0,  1,  0,    0   )t ,
    CU |10i  =  |1i U |0i   =  ( 0,  0,  U00 , U10 )t ,
    CU |11i  =  |1i U |1i   =  ( 0,  0,  U01 , U11 )t .
Here the Ujk are the four matrix elements of MU .
[Exercise. Verify this formulation.]
The Matrix
The column vectors of the matrix are given by applying CU to the CBS tensors to get
    MCU  =  (  CU |00i ,  CU |01i ,  CU |10i ,  CU |11i  )
         =  (  |00i ,  |01i ,  |1i U |0i ,  |1i U |1i  )   =   ( 1   0   0     0
                                                                  0   1   0     0
                                                                  0   0   U00   U01
                                                                  0   0   U10   U11 ) ,
[Exercise. Prove that this is a separable operator for some U and not-separable
for others. Hint: What did we say about CNOT in this regard?]
[Exercise. Confirm unitarity.]
Applying this to a general state, |ψi2 = ( α, β, γ, δ )t , we get
    CU |ψi2  =  ( 1   0   0     0         ( α            ( α
                  0   1   0     0      ·    β       =      β
                  0   0   U00   U01         γ              U00 γ + U01 δ
                  0   0   U10   U11 )       δ )            U10 γ + U11 δ )
             =  α |0i |0i  +  β |0i |1i  +  (U00 γ + U01 δ) |1i |0i  +  (U10 γ + U11 δ) |1i |1i .
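For computational experiments it is handy to build the CU matrix directly from the block pattern above. A small Python sketch (the helper name controlled() is my own, not a library function):

    import numpy as np

    def controlled(U):
        """4x4 matrix of the controlled-U gate, control = A register (A-major order)."""
        CU = np.eye(4, dtype=complex)
        CU[2:, 2:] = U                     # U acts on the B register in the |1>_A block
        return CU

    # Example: controlling the phase gate R_theta = diag(1, e^{i theta}).
    theta = np.pi / 4
    R = np.diag([1.0, np.exp(1j * theta)])
    print(np.round(controlled(R), 3))

    # Sanity check: controlling X reproduces the CNOT matrix.
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    print(controlled(X).real.astype(int))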
Measurement
There’s nothing new we have to add here, since measurement probabilities of the
target-register will depend on the specific U being controlled, and we have already
established that the measurement probabilities of the control register are unaffected
(see CNOT discussion).
The Symbol
The controlled-Z symbol is a control dot • on the A line connected by a vertical wire to a boxed Z on the B line. Recall that we had what appeared to be an overly abstract expression for the unary Z-gate operating on a CBS |xi,
    Z |xi  =  (−1)^x |xi ,
but this actually helps us understand the CZ action using a similar expression:
    CZ |xi |yi  =  (−1)^(xy) |xi |yi .
[Exercise. Prove this formula is correct. Hint: Apply the wordy definition of a
controlled Z-gate (“leaves the B-reg alone if ... and applies Z to the B-reg if ...”) to
the four CBS tensors and compare what you get with the formula (−1)xy |xi |yi for
each of the four combinations of x and y.]
To see the four CBS results explicitly,
    CZ |00i  =  ( 1, 0, 0,  0 )t ,
    CZ |01i  =  ( 0, 1, 0,  0 )t ,
    CZ |10i  =  ( 0, 0, 1,  0 )t ,
    CZ |11i  =  ( 0, 0, 0, −1 )t .
The Matrix
The column vectors of the matrix were produced above, so we can write it down instantly:
    CZ  =  ( 1  0  0   0
             0  1  0   0
             0  0  1   0
             0  0  0  −1 ) .
[Exercise. Prove that this (a) is not a separable operator and (b) is unitary.]
Applying this to a general state, |ψi2 = ( α, β, γ, δ )t , we get
    CZ |ψi2  =  ( 1  0  0   0        ( α          ( α
                  0  1  0   0     ·    β      =     β
                  0  0  1   0          γ            γ
                  0  0  0  −1 )        δ )         −δ ) .
310
Measurement

As we noticed with the unary Z-operator, the probabilities of measurement (along the preferred CBS) are not affected by the controlled-Z. However, the state is modified. You can demonstrate this the same way we did for the unary Z: combine |ψ⟩² and CZ|ψ⟩² with a second tensor and show that you (can) produce distinct results.
We could have (and still can) turn any of our controlled gates upside down:

    ──[U]──
    ──•──

Now the B-register takes on the role of control, and the A-register becomes the target:

    "target operator"  → U
    "control register" → •
Let’s refer to this version of a controlled gate using the notation (caution: not
seen outside of this course) (C↑) U
Everything has to be adjusted accordingly if we insist – which we do – on continu-
ing to call the upper register the A-register, producing A-major (B-minor ) matrices
and coordinate representations. For example, the action on the CBS becomes
    |x⟩ ──[U]── Uʸ |x⟩
    |y⟩ ──•──── |y⟩

and the matrix we obtain (make sure you can compute this) is

             ⎡ 1   0    0    0   ⎤
    (C↑) U = ⎢ 0  U00   0   U01  ⎥
             ⎢ 0   0    1    0   ⎥
             ⎣ 0  U10   0   U11  ⎦ .
[Exercise. Prove that the binary quantum gates CZ and (C↑) Z are identical. That is, show that

    ──•────        ──[Z]──
    ──[Z]──   =    ──•──── .

Hint: If they are equal on the CBS, they are equal period.]
The circuit diagram of a separable binary gate, with one unary box sitting on each wire,

    ──[X]──
    ──[H]── ,

provided a visual representation of this locality. It correctly pictures the non-causal separation between the two channels when separable inputs are presented to the gate. This is true for all separable operators like our current X ⊗ H,

    |ψ⟩ ──[X]── X|ψ⟩
    |ϕ⟩ ──[H]── H|ϕ⟩
demonstrating the complete isolation of each channel, but only when separable states
are sent to the gate. If an entangled state goes in, an entangled state will come out.
The channels are still separate, but the input and output must be displayed as unified states,

    |ψ⟩² →  ──[X]──
            ──[H]──  → (X ⊗ H) |ψ⟩² .

A separable operator that acts on only one of the two registers – say U on the A-register with nothing (i.e., the identity) on the B-register – is still a perfectly good binary gate,

    U ⊗ 1 : H_A ⊗ H_B −→ H_A ⊗ H_B .
Let's apply this to a general |ψ⟩², allowing the possibility that this is an entangled state.

    (U ⊗ 1) |ψ⟩² = [U ⊗ 1] ( α|00⟩ + β|01⟩ + γ|10⟩ + δ|11⟩ )
                 = α (U|0⟩)|0⟩ + β (U|0⟩)|1⟩ + γ (U|1⟩)|0⟩ + δ (U|1⟩)|1⟩ .

Replacing U|0⟩ and U|1⟩ with their expansions along the CBS (and noting that the matrix elements, Ujk, are the weights), we get

    (U ⊗ 1) |ψ⟩² = α (U00|0⟩ + U10|1⟩)|0⟩ + β (U00|0⟩ + U10|1⟩)|1⟩
                 + γ (U01|0⟩ + U11|1⟩)|0⟩ + δ (U01|0⟩ + U11|1⟩)|1⟩ .

Now, distribute and collect terms for the four CBS tensor basis, to see that

    (U ⊗ 1) |ψ⟩² = (α U00 + γ U01) |00⟩ + ··· ,

and without even finishing the regrouping we see that the amplitude of |00⟩ can be totally different from its original amplitude, proving that the new entangled state is different, and potentially just as entangled.
[Exercise. Complete the unfinished expression I started for (U ⊗ 1) |ψ⟩², collect terms for the four CBS kets, and combine the square-magnitudes of |0⟩_B to get the probability of B measuring a "0". Do the same for the |1⟩_B to get the probability of B measuring a "1". Might the probabilities of these measurements be affected by applying U to the A-register?]
Vocabulary. This is called performing a local operation on one qubit of an
entangled pair.
The Matrix. It will be handy to have the matrix for the operator U ⊗ 1 at the ready for future algorithms. It can be written down instantly using the rules for separable operator matrices (see the lesson on tensor products),

            ⎡ U00    0    U01    0   ⎤
    U ⊗ 1 = ⎢  0    U00    0    U01  ⎥
            ⎢ U10    0    U11    0   ⎥
            ⎣  0    U10    0    U11  ⎦ .
Example. Alice and Bob each share one qubit of the entangled pair

    |β00⟩ ≡ ( |00⟩ + |11⟩ ) / √2 = (1/√2) (1, 0, 0, 1)ᵗ ,

which you'll notice I have named |β00⟩ for reasons that will be made clear later today when we get to the Bell states. Alice will hold the A-register qubit (on the left of each product) and Bob the B-register qubit, so you may wish to view the state using the notation

    ( |0⟩_A |0⟩_B + |1⟩_A |1⟩_B ) / √2 .

Alice sends her qubit (i.e., the A-register) through a local QNOT operator,

        ⎡ 0  1 ⎤
    X = ⎣ 1  0 ⎦ ,
and Bob does nothing. This describes the full local operator
X ⊗1
applied to the entire bipartite state. We want to know the effect this has on the
total entangled state, so we apply the matrix for X ⊗ 1 to the state. Using our
pre-computed matrix for the general U ⊗ 1 with U = X, we get
    (X ⊗ 1) |β00⟩  =  (X ⊗ 1) (1/√2) (1, 0, 0, 1)ᵗ  =  (1/√2) (0, 1, 1, 0)ᵗ

                   =  ( |01⟩ + |10⟩ ) / √2 .
We could have gotten the same result, perhaps more quickly, by distributing X ⊗ 1
over the superposition and using the identity for separable operators,
[S ⊗ T ] (v ⊗ w) = S(v) ⊗ T (w) .
Either way, the result will be used in our first quantum algorithm (superdense coding),
so it’s worth studying carefully. To help you, here is an exercise.
[Exercise. Using the entangled state |β00⟩ as input, prove that the following local operators applied on Alice's end produce the entangled output states shown:

    • Z ⊗ 1  |β00⟩ = ( |00⟩ − |11⟩ ) / √2 ,

    • iY ⊗ 1 |β00⟩ = ( |01⟩ − |10⟩ ) / √2 .
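Here is a quick numerical check of these claims – a minimal sketch using plain numpy, not any special-purpose quantum software:

    import numpy as np

    I2 = np.eye(2)
    X  = np.array([[0, 1], [1, 0]])
    Z  = np.array([[1, 0], [0, -1]])
    iY = Z @ X                          # iY = ZX, an identity we will use again later

    beta00 = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)

    for name, U in [("X", X), ("Z", Z), ("iY", iY)]:
        out = np.kron(U, I2) @ beta00              # local action on Alice's qubit only
        print(name, "(x) 1 |b00> =", out)
    # X  -> (|01> + |10>)/sqrt(2)
    # Z  -> (|00> - |11>)/sqrt(2)
    # iY -> (|01> - |10>)/sqrt(2)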
11.6.1 The Born Rule for a Two-Qubit System
Take the most general state

    |ψ⟩² = α|00⟩ + β|01⟩ + γ|10⟩ + δ|11⟩ = (α, β, γ, δ)ᵗ ,

and rearrange it so that either the A-kets or B-kets are factored out of common terms. Let's factor the A-kets for this illustration.

    |ψ⟩² = |0⟩_A ( α|0⟩_B + β|1⟩_B )  +  |1⟩_A ( γ|0⟩_B + δ|1⟩_B ) .
(I labeled the state spaces of each ket to reinforce which kets belong to which register,
but position implies this information even without the labels. I will often label a
particular step in a long computation when I feel it helps, leaving the other steps
unlabeled.)
What happens if we measure A and get a "0"? Since there is only one term which matches this state, namely,

    |0⟩ ( α|0⟩ + β|1⟩ ) ,

we are forced to conclude that the B-register is left in the non-normalized state

    α|0⟩ + β|1⟩ .
There are a couple of things that may be irritating you at this point: the B-register state we just wrote down is not normalized, and our measurement postulate, as stated, only covered measuring an entire state, not a single register of an entangled pair.
We are on firm ground, however, because when the postulates of quantum mechanics
are presented in their full generality, both of these concerns are addressed. The
fifth postulate of QM (Trait #7), which addresses post-measurement collapse has
a generalization sometimes called the generalized Born rule. For the present, we’ll
satisfy ourselves with a version that applies only to a bipartite state’s one-register
measurement. We’ll call it the . . .
Trait #15 (Born Rule for Bipartite States): If a bipartite state is factored relative to the A-register,

    |ψ⟩² = |0⟩ ( α|0⟩ + β|1⟩ )  +  |1⟩ ( γ|0⟩ + δ|1⟩ ) ,

a measurement of the A-register will cause the collapse of the B-register according to

    A ↘ 0   ⇒   B ↘ ( α|0⟩ + β|1⟩ ) / √( |α|² + |β|² ) ,

    A ↘ 1   ⇒   B ↘ ( γ|0⟩ + δ|1⟩ ) / √( |γ|² + |δ|² ) .

Note how this handles the non-normality of the state α|0⟩ + β|1⟩: we divide through by the norm √( |α|² + |β|² ). (The same is seen for the alternative state γ|0⟩ + δ|1⟩.)
Vocabulary. We’ll call this simply the “Born Rule.”
Trait #15 has a partner which tells us what happens if we first factor out the
B-ket and measure the B-register.
[Exercise. State the Born Rule when we factor and measure the B -register.]
You should always confirm your understanding of a general rule by trying it out on
simple cases to which you already have an answer.
Example. The state we encountered a moment ago, |00⟩ + |11⟩ (leaving aside its overall normalization, which the Born rule supplies anyway), has

    α = δ = 1   and   β = γ = 0 ,

so if we measure the A-register and find it to be in the state |0⟩_A (by a measurement of "0" on that register), the Born rule tells us that the state remaining in the B-register should be

    ( α|0⟩ + β|1⟩ ) / √( |α|² + |β|² )  =  ( 1·|0⟩ + 0·|1⟩ ) / √( 1² + 0² )  =  |0⟩ ,

exactly as we would expect.

Example. This time take the state

    |ψ⟩² = ( |00⟩ + |01⟩ + |10⟩ − |11⟩ ) / 2

and imagine that we test the B-register and find that it "decided" to collapse to |0⟩_B. To see what state this leaves the A-register in, factor out the B-kets of the original to get an improved view of |ψ⟩²,

    |ψ⟩² = [ ( |0⟩_A + |1⟩_A ) |0⟩_B  +  ( |0⟩_A − |1⟩_A ) |1⟩_B ] / 2 .
Examination of this expression tells us that the A-register corresponding to |0⟩_B is some normalized representation of the vector |0⟩ + |1⟩. Let's see if the Born rule gives us that result. The expression's four scalars are

    α = β = γ = 1/2   and   δ = −1/2 ,

so a B-register collapse to |0⟩_B will, according to the Born rule, leave the A-register in the state

    ( α|0⟩ + γ|1⟩ ) / √( |α|² + |γ|² )  =  ( ½|0⟩ + ½|1⟩ ) / √( (1/2)² + (1/2)² )  =  ( |0⟩ + |1⟩ ) / √2 ,

again, the expected normalized state.
[Exercise. Show that a measurement of B ↘ 1 for the same state results in an A-register collapse to ( |0⟩ − |1⟩ ) / √2 .]
The Born rule gets used in many important quantum algorithms, so there’s no danger
of over-doing our practice. Let’s take the separable gate 1 ⊗ H, whose matrix you
should (by now) be able to write down blindfolded,
    1 ⊗ H = (1/√2) ⎡ 1   1   0   0 ⎤
                   ⎢ 1  −1   0   0 ⎥
                   ⎢ 0   0   1   1 ⎥
                   ⎣ 0   0   1  −1 ⎦ .

Hand it the most general |ψ⟩² = (α, β, γ, δ)ᵗ, which will produce gate output

    (1/√2) ( α + β,  α − β,  γ + δ,  γ − δ )ᵗ .

We measure the B register and get a "1". To see what's left in the A-register, we factor the B-kets at the output,

    (1 ⊗ H) |ψ⟩² = (1/√2) { [ (α + β)|0⟩ + (γ + δ)|1⟩ ] |0⟩  +  [ (α − β)|0⟩ + (γ − δ)|1⟩ ] |1⟩ } .
(At this point, we have to pause to avoid notational confusion. The "α" of the Born rule is actually our current (α − β), with similar unfortunate name conflicts for the other three Born variables, all of which I'm sure you can handle.) The Born rule says that the A-register will collapse to

    [ (α − β)|0⟩ + (γ − δ)|1⟩ ] / √( |α − β|² + |γ − δ|² ) ,
which is as far as we need to go, although it won’t hurt for you to do the ...
[Exercise. Simplify this and show that it is a normal vector.]
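As a sanity check, the Born rule is easy to simulate numerically. The sketch below uses plain numpy (the helper name born_collapse_after_B is mine); it applies 1 ⊗ H to a random two-qubit state and computes the collapsed A-register state for a B-measurement of "1":

    import numpy as np

    H  = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    I2 = np.eye(2)

    def born_collapse_after_B(psi2, b):
        """Given a 2-qubit state (a0b0, a0b1, a1b0, a1b1) and a B-measurement
        result b (0 or 1), return the normalized collapsed A-register state."""
        a = np.array([psi2[0 + b], psi2[2 + b]])   # coefficients of |0>_A and |1>_A
        return a / np.linalg.norm(a)

    psi2 = np.random.randn(4) + 1j * np.random.randn(4)
    psi2 /= np.linalg.norm(psi2)                   # a random normalized |psi>^2

    out = np.kron(I2, H) @ psi2                    # the gate 1 (x) H
    print(born_collapse_after_B(out, 1))           # A-register state after B reads "1"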
In addition, I am not using the superscript to denote a bipartite state |β00⟩², since the pair of subscripts already signals that these are two-qubit states.
The circuit that produces these four states using the standard CBS basis for H(2) as inputs is

    |x⟩ ──[H]──•──
    |y⟩ ────────⊕──    → |βxy⟩ ,

which can be seen as a combination of a unary Hadamard gate with a CNOT gate. We could emphasize that this is a binary gate in its own right by calling it BELL and boxing it,

    |x⟩ ──┤      ├──
          │ BELL │        → |βxy⟩ .
    |y⟩ ──┤      ├──
When studying unary quantum gates, we saw that they take orthonormal bases to orthonormal bases. The same argument – unitarity – proves this to be true in a Hilbert space of any dimension. Consequently, the Bell states form an orthonormal basis for H(2).
The matrix for this gate can be constructed using the various techniques we have
already studied. For example, you can use the standard linear algebra approach of
building columns for the matrix from the four outputs of the gate. For variety, let’s
take a different path. The A-register Hadamard has a plain quantum wire below it,
meaning the B-register is implicitly performing an identity operator at that point.
So we could write the gate using the equivalent symbolism
    ──[H]──•──
    ──[1]──⊕── ,

a visual that demonstrates the application of two known matrices in series, that is,

    BELL = (CNOT)(H ⊗ 1)

         = ⎡ 1  0  0  0 ⎤          ⎡ 1  0   1   0 ⎤
           ⎢ 0  1  0  0 ⎥  (1/√2)  ⎢ 0  1   0   1 ⎥
           ⎢ 0  0  0  1 ⎥          ⎢ 1  0  −1   0 ⎥
           ⎣ 0  0  1  0 ⎦          ⎣ 0  1   0  −1 ⎦

         = (1/√2) ⎡ 1  0   1   0 ⎤
                  ⎢ 0  1   0   1 ⎥
                  ⎢ 0  1   0  −1 ⎥
                  ⎣ 1  0  −1   0 ⎦ .
4. Confirm that the matrix gives the four Bell states when one presents the four
CBS states as inputs. (See below for a hint.)
5. Demonstrate that BELL is not separable. Hint: What do we know about the
matrix of a separable operator? ]
I'll do item 4 for the input state |10⟩, just to get the blood flowing.

    BELL |10⟩ = (1/√2) ⎡ 1  0   1   0 ⎤ ⎛ 0 ⎞            ⎛  1 ⎞
                       ⎢ 0  1   0   1 ⎥ ⎜ 0 ⎟  = (1/√2)  ⎜  0 ⎟
                       ⎢ 0  1   0  −1 ⎥ ⎜ 1 ⎟            ⎜  0 ⎟
                       ⎣ 1  0  −1   0 ⎦ ⎝ 0 ⎠            ⎝ −1 ⎠

              = ( |00⟩ − |11⟩ ) / √2  =  |β10⟩ .  ✓
Four Bell States from One

Earlier today we computed the effect of a few local operators on |β00⟩ and followed it with some exercises. We can now list those results in the language of the EPR pairs.

    1 ⊗ 1  |β00⟩ = |β00⟩
    X ⊗ 1  |β00⟩ = |β01⟩
    Z ⊗ 1  |β00⟩ = |β10⟩
    iY ⊗ 1 |β00⟩ = |β11⟩
This might be a good time to appreciate one of today’s earlier observations: a local
(read “separable”) operation on an entangled state changes the entire state, affecting
both qubits of the entangled pair.
The operator BELL takes natural CBS kets to the four Bell kets, the latter shown to
be an orthonormal basis for H(2) . But that’s exactly what we call a basis transforming
operator. Viewed in this light BELL, like H ⊗2 , can be used when we want to change
our basis. Unlike H ⊗2 , however, BELL is not separable (a recent exercise) and not
its own inverse (to be proven in a few minutes).
Measuring Along the Bell Basis. We saw that to measure along any basis, we find the binary operator, call it S, that takes the z-basis to the other basis and use S† prior to measurement.

    |ψ⟩² →  ──[ S† ]──  → (measure along the z-basis) .

Thus, to measure along the BELL basis (and we will, next lecture), we plug in BELL for S,

    |ψ⟩² →  ──[ BELL† ]──  → (measure along the z-basis) .
And what is BELL†? Using the adjoint conversion rules, and remembering that the order of operators in the circuit is opposite of that in the algebra, we find

    BELL† = [ (CNOT)(H ⊗ 1) ]† = (H ⊗ 1)† (CNOT)† = (H ⊗ 1)(CNOT) ,

the final equality a consequence of the fact that CNOT and H ⊗ 1 are both self-adjoint.
[Exercise. Prove this last claim using the matrices for these two binary gates.]

In other words, we just reverse the order of the two sub-operators that comprise BELL. This makes the circuit diagram for BELL† come out to be

    ──•──[H]──
    ──⊕─────── .

The matrix for BELL† is easy to derive since we just take the transpose (everything's real so no complex conjugation necessary):

    BELL† = (1/√2) ⎡ 1  0   1   0 ⎤ᵗ            ⎡ 1  0   0   1 ⎤
                   ⎢ 0  1   0   1 ⎥   = (1/√2)  ⎢ 0  1   1   0 ⎥
                   ⎢ 0  1   0  −1 ⎥             ⎢ 1  0   0  −1 ⎥
                   ⎣ 1  0  −1   0 ⎦             ⎣ 0  1  −1   0 ⎦ .
Another way to obtain an upside-down CNOT is to surround an ordinary CNOT with Hadamard gates on both registers:

    |x⟩ ──[H]──•──[H]── |x ⊕ y⟩
    |y⟩ ──[H]──⊕──[H]── |y⟩ .

That corresponds to the matrix product

    C↑NOT = (1/2) ⎡ 1   1   1   1 ⎤ ⎡ 1  0  0  0 ⎤ (1/2) ⎡ 1   1   1   1 ⎤
                  ⎢ 1  −1   1  −1 ⎥ ⎢ 0  1  0  0 ⎥       ⎢ 1  −1   1  −1 ⎥
                  ⎢ 1   1  −1  −1 ⎥ ⎢ 0  0  0  1 ⎥       ⎢ 1   1  −1  −1 ⎥
                  ⎣ 1  −1  −1   1 ⎦ ⎣ 0  0  1  0 ⎦       ⎣ 1  −1  −1   1 ⎦

          = (1/4) ⎡ 1   1   1   1 ⎤ ⎡ 1   1   1   1 ⎤          ⎡ 4  0  0  0 ⎤
                  ⎢ 1  −1   1  −1 ⎥ ⎢ 1  −1   1  −1 ⎥  = (1/4) ⎢ 0  0  0  4 ⎥
                  ⎢ 1   1  −1  −1 ⎥ ⎢ 1  −1  −1   1 ⎥          ⎢ 0  0  4  0 ⎥
                  ⎣ 1  −1  −1   1 ⎦ ⎣ 1   1  −1  −1 ⎦          ⎣ 0  4  0  0 ⎦

          = ⎡ 1  0  0  0 ⎤
            ⎢ 0  0  0  1 ⎥
            ⎢ 0  0  1  0 ⎥
            ⎣ 0  1  0  0 ⎦ .
This is an easy matrix to apply to the four CBS kets with the following results:

    C↑NOT : |00⟩ ↦ |00⟩
    C↑NOT : |01⟩ ↦ |11⟩
    C↑NOT : |10⟩ ↦ |10⟩
    C↑NOT : |11⟩ ↦ |01⟩

We can now plainly see that the B-register is controlling the A-register's QNOT operation, as claimed.
We now have two different studies of the upside-down CNOT. The first study con-
cerned the “naked” CNOT and resulted in the observation that, relative to the x-CBS,
the B-register controlled the QNOT (X) operation on the A-register, thus it looks
upside-down if you are an x-basis ket. The second and current study concerns a new
circuit that had the CNOT surrounded by Hadamard gates and, taken as a whole is
a truly upside-down CNOT viewed in the ordinary z-CBS. How do these two studies
compare?
The key to understanding this comes from our recent observation that H ⊗2 can
be viewed as a way to convert between the x-basis and the z-basis (in either direction
since it’s its own inverse).
Thus, we use the first third of the three-part circuit to let H ⊗2 take z-CBS kets
to x-CBS kets. Next, we allow CNOT to act on the x-basis, which we saw from
our earlier study caused the B-register to be the control qubit – because and only
because we are looking at x-basis kets. The output will be x-CBS kets (since we put
x-CBS kets into the central CNOT). Finally, in the last third of the circuit we let
H ⊗2 convert the x-CBS kets back to z-CBS kets.
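The whole story can be checked in a couple of lines of numpy – a sketch under the same conventions as before:

    import numpy as np

    H  = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    H2 = np.kron(H, H)
    CNOT = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]])

    upside_down = H2 @ CNOT @ H2        # Hadamards on both wires, CNOT in the middle
    print(np.round(upside_down).astype(int))
    # [[1 0 0 0]
    #  [0 0 0 1]
    #  [0 0 1 0]
    #  [0 1 0 0]]   -- the CNOT whose B-register is the control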
11.8 More than Two Qubits
We will officially introduce n-qubit systems for n > 2 next week, but we can find ways to use binary qubit gates in circuits that have more than two inputs immediately, as long as we operate on no more than two qubits at a time. This will lead to our first quantum algorithms.

The tensor-product machinery we built for two component spaces extends naturally to three. If W = A ⊗ B ⊗ C, then:

• The dimension of the product space, W, is the product of the three dimensions,

    dim(W) = dim(A) · dim(B) · dim(C) ,

and W has as its basis

    wjkl ≡ aj ⊗ bk ⊗ cl ,

where {aj}, {bk} and {cl} are the bases of the three component spaces.
• The vectors (tensors) in the product space are uniquely expressed as superpositions of these basis tensors so that a typical tensor in W can be written

    w = Σ_{j,k,l}  cjkl ( aj ⊗ bk ⊗ cl ) ,

where the cjkl are the amplitudes of the CBS kets, scalars which we had been naming α, β, γ, etc. in a simpler era.
• A separable operator on the product space is one that arises from three component operators, TA, TB and TC, each defined on its respective component space, A, B and C. This separable tensor operator is defined first by its action on separable order-3 tensors,

    (TA ⊗ TB ⊗ TC)(a ⊗ b ⊗ c) = TA(a) ⊗ TB(b) ⊗ TC(c) ,

and since the basis tensors are of this form, that establishes the action of TA ⊗ TB ⊗ TC on the basis which in turn extends the action to the whole space.

If any of this seems hazy, I encourage you to refer back to the tensor product lecture and fill in details so that they extend to three component spaces.

[Exercise. Replicate the development of an order-2 tensor product space from our past lecture to order-3 using the above definitions as a guide.]

For us, each component space is a copy of the single-qubit state space, and the three-qubit state space is the product

    HA ⊗ HB ⊗ HC .

The order of the tensor product, this time three, can be used to label the state space: H(3).
The shorthand alternatives for a CBS ket are

    |x⟩ |y⟩ |z⟩  =  |xyz⟩  =  |n⟩³ ,   where n is the integer from 0 to 7 whose binary digits are xyz.

The notation of the first two columns admits the possibility of labeling each of the component kets with the H from which it came, A, B or C.

The densest of the notations expresses the CBS ket as an integer from 0 to 7. We reinforce this correspondence and add the coordinate representation of each basis ket: |000⟩ = |0⟩³ = (1, 0, 0, 0, 0, 0, 0, 0)ᵗ, |001⟩ = |1⟩³ = (0, 1, 0, 0, 0, 0, 0, 0)ᵗ, and so on, up to |111⟩ = |7⟩³ = (0, 0, 0, 0, 0, 0, 0, 1)ᵗ.

Note that the "exponent 3" is needed mainly in the encoded form, since an integer representation for a CBS does not disclose its tensor order (3) to the reader, while the other representations clearly reveal that the context is three qubits.
The Channel Labels. We will use the same labeling scheme as before, but more input lines means more labels. For three lines, we would name the registers A, B and C, reading from the top wire down.

As I mentioned in the introduction, the current regime allows three or more inputs as long as we only apply operators to two at a time. Let's look at a circuit that meets that condition.

    |ψ⟩² →  A: ───•──────────
            B: ───⊕────[H]───
    |ϕ⟩  →  C: ─────────[H]───
               P     Q      R

This circuit is receiving an order-3 tensor at its inputs. The first two registers, A and B, get a (potentially) entangled bipartite state |ψ⟩² and the third, C, gets a single qubit, |ϕ⟩. We analyze the circuit at the three access points, P, Q and R.
A Theoretical Approach. We’ll first do an example that is more general than
we normally need, but provides a surefire fallback technique if we are having a hard
time. We’ll give the input states the most general form,
    |ψ⟩² = (α, β, γ, δ)ᵗ    and    |ϕ⟩ = (η, ξ)ᵗ .

Access Point P. The initial tripartite tensor is

    |ψ⟩² |ϕ⟩  =  (α, β, γ, δ)ᵗ ⊗ (η, ξ)ᵗ
              =  ( αη, αξ, βη, βξ, γη, γξ, δη, δξ )ᵗ .
Access Point Q. The first gate is a CNOT applied only to the entangled |ψ⟩², so the overall effect is just

    ( CNOT ⊗ 1 ) ( |ψ⟩² |ϕ⟩ )  =  ( CNOT |ψ⟩² ) ⊗ |ϕ⟩

by the rule for applying a separable operator to a separable state. (Although the first two qubits are entangled, when the input is grouped as a tensor product of H(2) ⊗ H, |ψ⟩² ⊗ |ϕ⟩ is recognized as a separable second-order tensor.)

Applying the CNOT explicitly, we get

    ( CNOT |ψ⟩² ) ⊗ |ϕ⟩  =  ( CNOT (α, β, γ, δ)ᵗ ) ⊗ (η, ξ)ᵗ
                          =  (α, β, δ, γ)ᵗ ⊗ (η, ξ)ᵗ
                          =  ( αη, αξ, βη, βξ, δη, δξ, γη, γξ )ᵗ .
We needed to multiply out the separable product in preparation for the next phase
of the circuit, which appears to operate on the last two registers, B and C.
Access Point R. The final operator is local to the last two registers, and takes the form

    1 ⊗ H⊗2 .

Although it feels as though we might be able to take a short cut, the intermediate tripartite state and final operator are sufficiently complicated to warrant treating it as a fully entangled three-qubit state and just doing the big matrix multiplication. The matrix for 1 ⊗ H⊗2 is block diagonal, with a copy of H⊗2 in each diagonal block,

    1 ⊗ H⊗2 = (1/2) ⎡ 1   1   1   1   0   0   0   0 ⎤
                    ⎢ 1  −1   1  −1   0   0   0   0 ⎥
                    ⎢ 1   1  −1  −1   0   0   0   0 ⎥
                    ⎢ 1  −1  −1   1   0   0   0   0 ⎥
                    ⎢ 0   0   0   0   1   1   1   1 ⎥
                    ⎢ 0   0   0   0   1  −1   1  −1 ⎥
                    ⎢ 0   0   0   0   1   1  −1  −1 ⎥
                    ⎣ 0   0   0   0   1  −1  −1   1 ⎦ ,

so

    [ 1 ⊗ H⊗2 ] ( αη, αξ, βη, βξ, δη, δξ, γη, γξ )ᵗ

        = (1/2) ( αη + αξ + βη + βξ,
                  αη − αξ + βη − βξ,
                  αη + αξ − βη − βξ,
                  αη − αξ − βη + βξ,
                  δη + δξ + γη + γξ,
                  δη − δξ + γη − γξ,
                  δη + δξ − γη − γξ,
                  δη − δξ − γη + γξ )ᵗ .
This isn’t very enlightening because the coefficients are so general. But a concrete
example shows how the process can be streamlined.
A Practical Approach. Usually we have specific and nicely symmetric input
tensors. In this case, let’s pretend we know that
    |ψ⟩² ≡ ( |00⟩ + |11⟩ ) / √2    and    |ϕ⟩ ≡ |1⟩ .

For reference, I'll repeat the circuit for this specific input.

    (|00⟩ + |11⟩)/√2 →  A: ───•──────────
                        B: ───⊕────[H]───
    |1⟩              →  C: ─────────[H]───
                           P     Q      R
Access Point Q. Apply the two-qubit CNOT gate to registers A and B, which has the overall effect of applying CNOT ⊗ 1 to the full tripartite tensor.

    [ CNOT ⊗ 1 ] ( |ψ⟩² |ϕ⟩ )  =  ( CNOT |ψ⟩² ) ⊗ ( 1 |ϕ⟩ )

        =  ( CNOT ( |00⟩ + |11⟩ ) / √2 ) ⊗ |1⟩

        =  ( ( CNOT |00⟩ + CNOT |11⟩ ) / √2 ) ⊗ |1⟩

        =  ( ( |00⟩ + |10⟩ ) / √2 ) ⊗ |1⟩

        =  ( |001⟩ + |101⟩ ) / √2 ,

a state that can be factored to our advantage:

    ( |001⟩ + |101⟩ ) / √2  =  ( ( |0⟩ + |1⟩ ) / √2 ) |01⟩ .

Access Point R. Finally we apply the second two-qubit gate H⊗2 to the B and C registers, which has the overall effect of applying 1 ⊗ H⊗2 to the full tripartite state. The factorization we found makes this an easy separable proposition,

    [ 1 ⊗ H⊗2 ] ( ( |0⟩ + |1⟩ ) / √2 ) ⊗ |01⟩  =  ( ( |0⟩ + |1⟩ ) / √2 ) ⊗ H⊗2 |01⟩ .
Referring back to the second order Hadamard on the two states in question, i.e.,

    H⊗2 |01⟩  =  ( |00⟩ − |01⟩ + |10⟩ − |11⟩ ) / 2 ,

we find that

    [ 1 ⊗ H⊗2 ] ( ( |0⟩ + |1⟩ ) / √2 ) ⊗ |01⟩

        =  ( ( |0⟩ + |1⟩ ) / √2 ) ( ( |00⟩ − |01⟩ + |10⟩ − |11⟩ ) / 2 ) ,

which can be factored into a fully separable

    ( ( |0⟩ + |1⟩ ) / √2 ) ( ( |0⟩ + |1⟩ ) / √2 ) ( ( |0⟩ − |1⟩ ) / √2 ) .

On the other hand, if we had been keen enough to remember that H⊗2 is just the separable H ⊗ H, which has a special effect on a separable state like |01⟩ = |0⟩ |1⟩, we could have gotten to the factored form faster using

    [ 1 ⊗ H⊗2 ] ( ( |0⟩ + |1⟩ ) / √2 ) ⊗ |01⟩  =  ( ( |0⟩ + |1⟩ ) / √2 ) ⊗ ( H ⊗ H ) |0⟩ |1⟩

        =  ( ( |0⟩ + |1⟩ ) / √2 ) ⊗ ( H |0⟩ ) ⊗ ( H |1⟩ )

        =  ( ( |0⟩ + |1⟩ ) / √2 ) ( ( |0⟩ + |1⟩ ) / √2 ) ( ( |0⟩ − |1⟩ ) / √2 ) .
The moral is, don’t worry about picking the wrong approach to a problem. If your
math is sound, you’ll get to the end zone either way.
Double Checking Our Work. Let's pause to see how this compares with the general formula. Relative to the general |ψ⟩² and |ϕ⟩, the specific amplitudes we have in this example are

    α = δ = 1/√2 ,   ξ = 1   and   β = γ = η = 0 .

Substituting these into the general access-point-R result causes it to reduce to

    (1/2) [ (αξ)|000⟩ + (−αξ)|001⟩ + (αξ)|010⟩ + (−αξ)|011⟩
          + (δξ)|100⟩ + (−δξ)|101⟩ + (δξ)|110⟩ + (−δξ)|111⟩ ]

    = (1/2) [ (1/√2)|000⟩ − (1/√2)|001⟩ + (1/√2)|010⟩ − (1/√2)|011⟩
            + (1/√2)|100⟩ − (1/√2)|101⟩ + (1/√2)|110⟩ − (1/√2)|111⟩ ] .

A short multiplication reveals this to be the answer we got for the input ( (|00⟩ + |11⟩)/√2 ) |1⟩, without the general formula.
[Exercise. Verify that this is equal to the directly computed output state. Hint:
If you start with the first state we got, prior to the factorization, there is less to do.]
[Exercise. This had better be a normalized state as we started with unit vectors
and applied unitary gates. Confirm it.]
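If you'd rather let the computer do the bookkeeping, here is a short numpy sketch of the whole three-qubit circuit for the specific inputs above (register order A, B, C as in the text):

    import numpy as np

    H  = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    I2 = np.eye(2)
    CNOT = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]])

    psi2 = np.array([1, 0, 0, 1]) / np.sqrt(2)      # (|00> + |11>)/sqrt(2) in registers A,B
    phi  = np.array([0, 1])                         # |1> in register C

    state_P = np.kron(psi2, phi)                    # access point P
    state_Q = np.kron(CNOT, I2) @ state_P           # CNOT on A,B
    state_R = np.kron(I2, np.kron(H, H)) @ state_Q  # H (x) H on B,C
    print(np.round(state_R * 2 * np.sqrt(2)))       # all eight amplitudes are +-1/(2*sqrt(2))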
The Born rule can be generalized to any dimension and stated in many ways. For
now, let’s state the rule for an order-three Hilbert space with registers A, B and C,
and in a way that favors factoring out the AB-registers.
Trait #15′ (Born Rule for Tripartite States): If we have a tripartite state that can be expressed as the sum of four terms,

    |ϕ⟩³  =  |0⟩²_AB |ψ0⟩_C + |1⟩²_AB |ψ1⟩_C + |2⟩²_AB |ψ2⟩_C + |3⟩²_AB |ψ3⟩_C ,

each of which is the product of a distinct CBS ket for HA ⊗ HB and some general first order (typically un-normalized) ket in the space HC,

    |k⟩²_AB |ψk⟩_C ,

then if we measure the first two registers, thus forcing their collapse into one of the four basis states,

    |0⟩²_AB ,  |1⟩²_AB ,  |2⟩²_AB ,  |3⟩²_AB ,

the C register will be left in a normalized state associated with the measured CBS ket. In other words,

    A ⊗ B ↘ |0⟩²   ⇒   C ↘ |ψ0⟩ / √⟨ψ0 | ψ0⟩ ,
    A ⊗ B ↘ |1⟩²   ⇒   C ↘ |ψ1⟩ / √⟨ψ1 | ψ1⟩ ,
    A ⊗ B ↘ |2⟩²   ⇒   C ↘ |ψ2⟩ / √⟨ψ2 | ψ2⟩   and
    A ⊗ B ↘ |3⟩²   ⇒   C ↘ |ψ3⟩ / √⟨ψ3 | ψ3⟩ .
Note the prime (’) in the tripartite Trait #15’, to distinguish this from the un-
primed Trait #15 for bipartite systems. Also, note that I suppressed the state-space
subscript labels A, B and C which are understood by context.
We’ll use this form of the Born rule for quantum teleportation in our next lecture.
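In code, this version of the rule just says: group the eight amplitudes by the value of the first two registers, then renormalize the surviving pair. A hypothetical helper (my own naming) might look like this:

    import numpy as np

    def born_collapse_after_AB(phi3, k):
        """phi3: length-8 state vector over registers A,B,C (A is the most significant bit).
        k: the two-register measurement result, 0..3.
        Returns the normalized C-register state |psi_k>/||psi_k||."""
        psi_k = phi3[2 * k : 2 * k + 2]          # amplitudes of |k>|0> and |k>|1>
        return psi_k / np.linalg.norm(psi_k)

    # Example: the access-point-R state from the previous section.
    phi3 = np.array([1, -1, 1, -1, 1, -1, 1, -1]) / (2 * np.sqrt(2))
    print(born_collapse_after_AB(phi3, 2))       # -> (|0> - |1>)/sqrt(2)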
Chapter 12
• Superdense Coding
• Quantum Teleportation
• Deutsch’s Algorithm
The first two demonstrate quantum communication possibilities, and the third pro-
vides a learning framework for many quantum algorithms which execute “faster” (in
a sense) than their classical counterparts.
At first glance, a single qubit

    |ψ⟩ = α|0⟩ + β|1⟩

seems like it holds an infinite amount of information. After all, α and β are complex numbers, and even though you can't choose them arbitrarily (|α|² + |β|² must be 1), the mere fact that α can be any complex number whose magnitude is ≤ 1 means it could be an unending sequence of never-repeating digits, like 0.4193980022903 . . . .
If a sender A (an assistant quantum researcher named “Alice”) could pack |ψi with
that α (and compatible β) and send it off in the form of a single photon to a receiver
B (another helper whose name is “Bob”) a few time zones away, A would be sending
an infinite string of digits to B encoded in that one sub-atomic particle.
The problem, of course, arises when B tries to look inside the received state. All he can do is measure it once and only once (Trait #7, the fifth postulate of QM), at which point he gets a "0" or "1" and both α and β are wiped off the face of the Earth. That one measurement tells B very little.
[Exercise. But it does tell him something. What?]
In short, to communicate |α|, A would have to prepare and send an infinite
number of identical states, then B would have to receive, test and record them. Only
then would B know |α| and |β| (although neither α nor β). This is no better than
classical communication.
We have to lower our sights.
We are wondering what information, exactly, A (Alice) can send B (Bob) in the
form of a single qubit. We know it’s not infinite. At the other extreme is the most
modest super-classical capability we could hope for: two classical bits for the price
of one. I think that we need no lecture to affirm the claim that, in order to send a
two-digit binary message, i.e., one of
0 = “00 ” ,
1 = “01 ” ,
2 = “10 ” ,
3 = “11 ” ,
we would have to send more than one classical bit – we’d need two. Can we pack at
least this meager amount of classical information into one qubit with the confidence
that B would be able read the message?
The answer is yes, and the key piece of equipment will be the four entangled Bell states,

    |β00⟩ = ( |00⟩ + |11⟩ ) / √2 ,
    |β01⟩ = ( |01⟩ + |10⟩ ) / √2 ,
    |β10⟩ = ( |00⟩ − |11⟩ ) / √2   and
    |β11⟩ = ( |01⟩ − |10⟩ ) / √2 .
Building the Communication Equipment
A and B prepare the state |β00⟩. (This can be done, for example, by sending a |00⟩ through the BELL gate,

    |0⟩ ──[H]──•──
    |0⟩ ────────⊕──    → |β00⟩ ,

as we learned.) A takes the A register of the entangled state |β00⟩ and B takes the B register. B gets on a plane, placing his qubit in the overhead bin, and travels a few time zones away. This can all be done long before the classical two-bit message is selected by A, but it has to be done. It can even be done by a third party who sends the first qubit of this EPR pair to A and the second to B.
Defense of Your Objection. The sharing of this qubit does not constitute
sending more than one qubit of information (the phase yet to come), since it is
analogous to establishing a radio transmission protocol or message envelope, which
would have to be done even with classical bits. It is part of the equipment that B
and A use to communicate data, not the data itself.
Notation. In the few cases where we need it (and one is coming up), let's build some notation. When a potentially entangled two-qubit state is separated physically into two registers or by two observers, we need a way to talk about each individual qubit. We'll use

    ( |ψ⟩² )_A   for the A register (or A's) qubit, and
    ( |ψ⟩² )_B   for the B register (or B's) qubit.

Note that, unless |ψ⟩² happens to be separable – and |β00⟩ is clearly not – we will be faced with the reality that

    |ψ⟩²  ≠  ( |ψ⟩² )_A ⊗ ( |ψ⟩² )_B .

[Note. This does not mean that the A register and B register can't exist in physically independent locations and be measured or processed independently by different observers. As we learned, one observer can modify or measure either qubit individually. What it does mean is that the two registers are entangled so modifying or measuring one will affect the other. Together they form a single state.]

With this language, the construction and distribution of each half of the entangled |β00⟩ to A and B can be symbolized by

    |β00⟩  →  ( |β00⟩ )_A   goes to A ,
           →  ( |β00⟩ )_B   goes to B .
A Encodes the Message
When A is ready to ship one of the four bit strings to B, she decides – or is informed – which it is to be and takes the following action. She submits her half of the bipartite state to one of four local gates according to the table

    message    A's local gate    equivalent binary gate    resulting state
     "00"           1                   1 ⊗ 1                 |β00⟩
     "01"           X                   X ⊗ 1                 |β01⟩
     "10"           Z                   Z ⊗ 1                 |β10⟩
     "11"           iY                  iY ⊗ 1                |β11⟩

(The "⊗ 1"s in the equivalent binary gate column reflect the fact that B is not touching his half of |β00⟩, which is effectively the identity operation as far as the B register is concerned.)

And how do we know that the far right column is the result of A's local operation? We apply the relevant matrix to |β00⟩ and read off the answer (see section Four Bell States from One in the two qubit lecture).
Compatibility Note. Most authors ask A to apply Z if she wants to encode "01" and X if she wants to encode "10", but doing so results in the state |β10⟩ for "01" and |β01⟩ for "10", not a very nice match-up, and is why I chose to present the algorithm with those two gates swapped. Of course, it really doesn't matter which of the four operators A uses for each encoding, as long as B uses the same correspondence to decode.
If we encapsulate the four possible operators into one symbol, SD (for Super Dense), which takes on the proper operation based on the message to be encoded, A's job is to apply the local circuit,

    ( |β00⟩ )_A  ──[ SD ]──  ( ( SD ⊗ 1 ) |β00⟩ )_A .

Notice that to describe A's half of the output state, we need to first show the full effect of the bipartite operator and only then restrict attention to A's qubit. We cannot express it as a function of A's input, ( |β00⟩ )_A, alone.

The message is now encoded in the bipartite state, but for B to decode it, he needs both qubits. A now sends her qubit to B.
    A  ───── ( ( SD ⊗ 1 ) |β00⟩ )_A  ─────→  B

Now that B holds both halves of the bipartite state, he can measure both qubits to determine which of the four Bell states he has. Once that's done he reads the earlier table from right-to-left to recover the classical two-bit message.
Refresher: Measuring Along the Bell Basis. Since this is the first time we
will have applied it in an algorithm, I’ll summarize one way that B can measure
his entangled state along the Bell basis. When studying two qubit logic we learned
that to measure a bipartite state along a non-standard basis (call it C), we find the
binary operator that takes the z-basis to the other basis, call it S, and use S † prior
to measurement:
    (some C-basis state)  →  ──[ S† ]──  →  (measure along the z-basis) .

For the Bell basis the required operator is BELL, whose adjoint we drew last time,

    BELL† =  ──•──[H]──
             ──⊕─────── .

Adding the measurement symbols (the "meters") along the z-basis, the circuit becomes

    (one of the four BELL states)  →  ──[ BELL† ]──  →  (measure both registers along the z-basis) .
In terms of matrices, B subjects his two-qubit state to the matrix for BELL† (also computed last time),

    BELL† = (1/√2) ⎡ 1  0   0   1 ⎤
                   ⎢ 0  1   1   0 ⎥
                   ⎢ 1  0   0  −1 ⎥
                   ⎣ 0  1  −1   0 ⎦ .
Bob's Action and Conclusion. Post-processing with the BELL† gate turns the four Bell states into four z-CBS kets; if B follows that gate with a z-basis measurement and sees a "01", he will conclude that he had received the Bell state |β01⟩ from A, and likewise for the other states. So his role, after receiving the qubit sent by A, is to

1. apply BELL† to the bipartite state and measure both registers along the z-basis, then

2. read the encoded message according to his results using the table

    measured "xy"    message
        "00"           "00"
        "01"           "01"
        "10"           "10"
        "11"           "11"

In other words, the application of the BELL† gate allowed B to interpret his z-basis measurement reading "xy" as the message, itself.
The following exercise should help crystallize the algorithm.
[Exercise. Assume A wants to send the message "11."

i) Confirm that A's encoding step leaves the shared pair in the state |β11⟩.

ii) Multiply the 4 × 4 matrix for BELL† by the 4 × 1 state vector for |β11⟩ to show that B recovers the message "11."
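Before moving on to the circuit notation, here is an end-to-end numpy sketch of the protocol (my own function names; "measurement" is simulated by reading off which CBS amplitude survives):

    import numpy as np

    I2 = np.eye(2)
    X  = np.array([[0, 1], [1, 0]])
    Z  = np.array([[1, 0], [0, -1]])
    H  = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    CNOT = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,1,0]])

    BELL     = CNOT @ np.kron(H, I2)
    BELL_dag = BELL.conj().T
    beta00   = BELL @ np.array([1, 0, 0, 0])          # the shared EPR pair

    SD = {"00": I2, "01": X, "10": Z, "11": Z @ X}    # Alice's four local encodings (ZX = iY)

    def superdense_send(message):
        encoded = np.kron(SD[message], I2) @ beta00   # Alice acts on her qubit only
        decoded = BELL_dag @ encoded                  # Bob measures along the Bell basis
        return format(int(np.argmax(np.abs(decoded))), "02b")

    for m in ["00", "01", "10", "11"]:
        print(m, "->", superdense_send(m))            # each message is recovered exactly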
We can get a circuit for the overall superdense coding algorithm by adding some new
notation.
Classical Wires. Double lines (=) indicate the transfer of classical bits. We use
them to move one or more ordinary digits within a circuit.
Decisions Based on Classical Bits. We insert a dot symbol, •, into a classical line to indicate a general controlled operation, based on the content of that classical data ([1] means apply an operator, [0] means don't).
Noiseless Transmission Between Communicators. To indicate the (typically long-distance) noiseless transmission of data between the two communicators, we draw a wavy line in the circuit.

The notation tells the story. A uses her two-bit classical message "xy" (traveling on
the double lines) to control (filled circles) which of the four operations (SD = 1, X,
Z or iY ) she will apply to her qubit. After sending her qubit to B, B measures both
qubits along the Bell basis to recover the message “xy” now sitting in the output
registers in natural z-basis form |xi |yi.
[Exercise. Measurement involves collapse and uncertainty. Why is B so certain
that his two measurements will always result in a true reproduction of the message
“xy” sent by A ? Hint: For each of the four possible messages, what bipartite state
is he holding at the moment of measurement?]
This can actually be tightened up. You’ve seen several unary operator identities
in the single qubit lecture, one of which was XZ = −iY . A slight revision of this
(verify as an exercise) is
ZX = iY ,
which enables us to define the elusive SD operation: we place a controlled-X gate and
controlled-Z gate in the A-channel under A ’s supervision. Each gate is controlled
by one of the two classical bits in her message. They work just like a quantum
Controlled-U gate, only simpler: if the classical control bit is 1, the target operation
is applied, if the bit is 0, it is not.
    x ═════════════════•══════
                       ║
    y ═══════•═════════║══════
             ║         ║
    A-reg: (|β00⟩)_A ──[X]───[Z]──┐
                                  ├──[ BELL† ]── meters → |x⟩ |y⟩
    B-reg: (|β00⟩)_B ─────────────┘

For example, if both bits are 1, both gates get applied and result in the desired behavior: "11" ⇒ ZX = iY.
[Exercise. Remind us why the gates X and Z appear reversed in the circuit
relative to the algebraic identity iY = ZX.]
This technique may not seem tremendously applicable considering its unimpressive 2-bit to 1-bit compression, but consider sending a large classical message, even one that is already as densely compressed as classical logic will allow. This is a 2-to-1 improvement over the best classical technique when applied to the output of classical compression. The fact that we have to send lots of entangled Bell states before our message goes out takes nothing away from our ability to send information in half the time (or space) as before.
12.3 Quantum Teleportation

We now turn the tables: rather than using a qubit to carry classical information, A wants to hand an entire quantum state |ψ⟩ = α|0⟩ + β|1⟩ over to B using only classical communication (plus a pre-shared EPR pair). You might ask why she doesn't simply send B the one qubit and be done with it. Why be so indirect and translate the quantum information into classical bits? There are many answers, two of which I think are important.
2. Sending the original qubit rather than two classical bits is somewhat beside the
point. The very fact A can get the infinitely precise data embedded in the
continuous scalars α and β by sending something as crude as an integer from 0
to 3 should come as unexpectedly marvelous news, and we want to know why
and how this can be done.
Caveats
There is the usual caveat. Just because B gets the qubit doesn’t mean he can know
what it is. He can no more examine its basis coefficients than A (or anyone in her
local lab who didn’t already know their values) could. What we are doing here is
getting the qubit over to B’s lab so he can use it on his end for any purpose that A
could have (before the teleportation).
And then there’s the unusual caveat. In the process of executing the teleportation,
A loses her copy of |ψi. We’ll see why as we describe the algorithm.
Recall Trait #15′: if a tripartite state is written as a sum of terms, each pairing a distinct AB-register CBS ket |k⟩²_AB with a (generally un-normalized) C-register ket |ψk⟩_C, then an AB-register measurement along the natural basis will force the corresponding C-register collapse according to

    A ⊗ B ↘ |0⟩²_AB   ⇒   C ↘ |ψ0⟩_C / ‖ |ψ0⟩_C ‖ ,
    A ⊗ B ↘ |1⟩²_AB   ⇒   C ↘ |ψ1⟩_C / ‖ |ψ1⟩_C ‖ ,
    etc.
There are two consequences that will prepare us for understanding quantum tele-
portation as well as anticipating other algorithms that might employ this special
technique.
Consequence #1. The rule works for any orthonormal basis in channels A and B, not just the natural basis. Whichever basis we choose for the first two registers A and B, it is along that basis that we must make our two-qubit measurements. So, if we use the Bell basis, { |βjk⟩ }, then a state in the form

    |ϕ⟩³ = |β00⟩_AB |ψ00⟩_C + |β01⟩_AB |ψ01⟩_C + |β10⟩_AB |ψ10⟩_C + |β11⟩_AB |ψ11⟩_C ,

when measured along that basis, will force the corresponding C-register collapse according to

    A ⊗ B ↘ |β00⟩_AB   ⇒   C ↘ |ψ00⟩_C / ‖ |ψ00⟩_C ‖ ,
    A ⊗ B ↘ |β01⟩_AB   ⇒   C ↘ |ψ01⟩_C / ‖ |ψ01⟩_C ‖ ,
    etc.
This follows from the Trait #7, Post-Measurement Collapse, which tells us that AB
will collapse to one of the four CBS states – regardless of which CBS we use – forcing
C into the state that is glued to its partner in the above expansion.
The division by each ‖ |ψjk⟩ ‖ (or, if you prefer, √⟨ψjk | ψjk⟩) is necessary because the overall tripartite state, |ϕ⟩³, can only be normalized when the |ψjk⟩ have non-unit (in fact < 1) lengths.

[Exercise. We already know that the |βjk⟩ are four normalized CBS kets. Show that if the |ψjk⟩ were normal vectors in HC, then |ϕ⟩³ would not be a normal vector. Hint: Write down ³⟨ϕ | ϕ⟩³ and apply orthonormality of the Bell states.]
Consequence #2. If we know that the four general states, |ψjk⟩, are just four variations of a single known state, we may be able to glean even more specific information about the collapsed C-register. To cite the example needed today, say we know that all four |ψjk⟩ use the same two scalar coordinates, α and β, only in slightly different combinations,

    |ϕ⟩³ = |β00⟩_AB ( α|0⟩_C + β|1⟩_C ) / 2  +  |β01⟩_AB ( β|0⟩_C + α|1⟩_C ) / 2
         + |β10⟩_AB ( α|0⟩_C − β|1⟩_C ) / 2  +  |β11⟩_AB ( −β|0⟩_C + α|1⟩_C ) / 2 .

(Each denominator 2 is needed to produce a normal state |ϕ⟩³; we cannot absorb it into α and β, as those scalars are fixed by the normalized |ψ⟩ to be teleported. However, the Born rule tells us that the collapse of the C-register will get rid of this factor, leaving only one of the four numerators in the C-register.) Such a happy state-of-affairs will allow us to convert any of the four collapsed states in the C-register to the one state,

    α|0⟩ + β|1⟩ ,

by mere application of a simple unary operator. For example, if we find that AB collapses to |β00⟩ (by reading a "00" on our measuring apparatus), then C will have already collapsed to the state α|0⟩ + β|1⟩. Or, if AB collapses to |β11⟩ (meter reads "11"), then we apply the operator iY to C to recover α|0⟩ + β|1⟩, because

    iY ( −β|0⟩ + α|1⟩ )  =  ⎡  0   1 ⎤ ⎛ −β ⎞  =  ⎛ α ⎞ .
                            ⎣ −1   0 ⎦ ⎝  α ⎠     ⎝ β ⎠

You'll refer back to these two facts as we unroll the quantum teleportation algorithm.
12.3.1 The Quantum Teleportation Algorithm
We continue to exploit the EPR pairs which I list again for quick reference:
    |β00⟩ = ( |00⟩ + |11⟩ ) / √2 ,
    |β01⟩ = ( |01⟩ + |10⟩ ) / √2 ,
    |β10⟩ = ( |00⟩ − |11⟩ ) / √2   and
    |β11⟩ = ( |01⟩ − |10⟩ ) / √2 .

A and B prepare – and each get one qubit of – the bipartite state |β00⟩.

Meanwhile, A holds a third qubit, |ψ⟩_C = α|0⟩_C + β|1⟩_C, the state she wishes to teleport. The subscript C indicates that we have a qubit separate from the two entangled qubits already created and distributed to our two "messengers," a qubit which lives in its own space with its own (natural) CBS basis { |0⟩_C , |1⟩_C }.

By tradition for this algorithm, we place the C-channel above the A/B-channels:

    |ψ⟩_C     →  register C
    |β00⟩_AB  →  register A
                 register B
The Plan

A's plan is to first process her two qubits – the C-register holding |ψ⟩ and her A-register half of the EPR pair – through a gate that effects a Bell-basis measurement, entangling them. She then follows that up by taking a measurement of her two qubits, getting two classical bits of information – the outcomes of the two register readings. Finally, she sends the result of that measurement as a classical two-bit message to B (sorry, we have to obey Einstein's speed limit for this part). B will use the two classical bits he receives from Alice to tweak his qubit (already modified by Alice's teleportation) into the desired state, |ψ⟩.
A Expresses the System State in the Bell Basis (No Action Yet)
In the z-basis, all the information about |ψ⟩ is contained in A's C-register. She wants to move that information over to B's B-register. Before she even does anything physical, she can accomplish most of the hard work by just rearranging the tripartite state |ϕ⟩³ in a factored form expanded along a CA Bell-basis rather than a CA z-basis. In other words, we'd like to see

    |ϕ⟩³  ≟  |β00⟩_CA |ψ00⟩_B + |β01⟩_CA |ψ01⟩_B
           + |β10⟩_CA |ψ10⟩_B + |β11⟩_CA |ψ11⟩_B ,

where the |ψjk⟩_B are (for the moment) four unknown B-channel states. We can only arrive at such a CA Bell basis expression if the two channels A and C become entangled, which they are not, initially. We'll get to that.

In our short review of the Born rule, above, I gave you a preview of the actual expression we'll need. This is what we would like/wish/hope for:

    |ϕ⟩³  ≟  |β00⟩_CA ( α|0⟩_B + β|1⟩_B ) / 2  +  |β01⟩_CA ( β|0⟩_B + α|1⟩_B ) / 2
           + |β10⟩_CA ( α|0⟩_B − β|1⟩_B ) / 2  +  |β11⟩_CA ( −β|0⟩_B + α|1⟩_B ) / 2 .
Indeed, if we could accomplish that, then A would only have to measure her two qubits along the Bell basis, forcing a collapse into one of the four Bell states and, by the Born rule, collapsing B's register into the one of his four matching states. A glance at the above expressions reveals that this gets us 99.99% of the way toward placing |ψ⟩ into B's B-register, i.e., manufacturing |ψ⟩_B, a teleported twin to Alice's original |ψ⟩. We'll see how B gets the last .01% of the way there, but first, we prove the validity of the hoped-for expansion.
We begin with the desired expression and reduce it to the expression we know to be our actual starting point, |ϕ⟩³. (Warning: After the first expression, I'll be dropping the state-space subscripts A/B/C and letting position do the job.)

    |β00⟩_CA ( α|0⟩_B + β|1⟩_B ) / 2  +  |β01⟩_CA ( β|0⟩_B + α|1⟩_B ) / 2
        + |β10⟩_CA ( α|0⟩_B − β|1⟩_B ) / 2  +  |β11⟩_CA ( −β|0⟩_B + α|1⟩_B ) / 2

    =  ( (|00⟩ + |11⟩)/√2 ) ( α|0⟩ + β|1⟩ ) / 2  +  ( (|01⟩ + |10⟩)/√2 ) ( β|0⟩ + α|1⟩ ) / 2
     + ( (|00⟩ − |11⟩)/√2 ) ( α|0⟩ − β|1⟩ ) / 2  +  ( (|01⟩ − |10⟩)/√2 ) ( −β|0⟩ + α|1⟩ ) / 2

    =  ( 1 / 2√2 ) [   α|000⟩ + α|110⟩ + β|001⟩ + β|111⟩
                     + β|010⟩ + β|100⟩ + α|011⟩ + α|101⟩
                     + α|000⟩ − α|110⟩ − β|001⟩ + β|111⟩
                     − β|010⟩ + β|100⟩ + α|011⟩ − α|101⟩  ] .

Half the terms cancel and the other half reinforce to give

    ( 1 / 2√2 ) [ 2α|000⟩ + 2β|100⟩ + 2α|011⟩ + 2β|111⟩ ]

        =  (1/√2) ( α|0⟩ + β|1⟩ ) |00⟩  +  (1/√2) ( α|0⟩ + β|1⟩ ) |11⟩

        =  ( α|0⟩ + β|1⟩ ) ( |00⟩ + |11⟩ ) / √2 ,

a happy ending. This was A's original formulation of the tripartite state in terms of the z-basis, so it is indeed the same as the Bell expansion we were hoping for.

Next, we take action to make use of this alternate formulation of our system state. (Remember, we haven't actually done anything yet.)
A Measures the Registers CA Along the Bell Basis

The rearrangement alone looks almost too good to be true: the to-be-teleported |ψ⟩ in Alice's C-register seems to "show up" (in a modified form) in B's B-register without anyone having taken any action – all we did was rearrange the terms. However, this rearrangement is only valid if A intends to measure the AC-register along the BELL basis. Such a measurement, as we have seen, always has two parts.

1. A applies a BELL† gate to her AC-registers (the operator that takes the non-standard basis to the z-basis).

2. A then measures both of her registers along the z-basis.

The first of these two parts, which effects the instantaneous transfer of |ψ⟩ from Alice's end to the B-channel, corresponds to the teleportation step of "the plan." The second part is where A's measurement selects one of the four "near"-|ψ⟩s for B's B-register.

The circuit A needs is

    register-C ──┤        ├──
                 │ BELL†  │
    register-A ──┤        ├──

or more explicitly,

    register-C ──•──[H]──
    register-A ──⊕─────── .
After applying the gate (but before the measurement), the original tripartite state, |ϕ⟩³, will be transformed to

    ( BELL† ⊗ 1 ) |ϕ⟩³
        =  ( BELL† |β00⟩_CA ) ⊗ ( α|0⟩_B + β|1⟩_B ) / 2
         + ( BELL† |β01⟩_CA ) ⊗ ( β|0⟩_B + α|1⟩_B ) / 2
         + ( BELL† |β10⟩_CA ) ⊗ ( α|0⟩_B − β|1⟩_B ) / 2
         + ( BELL† |β11⟩_CA ) ⊗ ( −β|0⟩_B + α|1⟩_B ) / 2

        =  |00⟩_CA ( α|0⟩_B + β|1⟩_B ) / 2  +  |01⟩_CA ( β|0⟩_B + α|1⟩_B ) / 2
         + |10⟩_CA ( α|0⟩_B − β|1⟩_B ) / 2  +  |11⟩_CA ( −β|0⟩_B + α|1⟩_B ) / 2 .
Now when A measures her two qubits along the z-basis, she will actually be measuring
the pre-gate state along Bell basis. After the measurement only one of the four terms
will remain (by collapse) and B will have a near-|ψi left in his B-register.
The circuit that describes A's local Bell basis measurement with B's qubit going along for the ride is

    |ψ⟩          ──•──[H]──(measure)──
    (|β00⟩)_A    ──⊕───────(measure)──
    (|β00⟩)_B    ──────────────────────

In case you hadn't noticed, twice today we've seen something that might seem to be at odds with our previous lessons. I'm talking about a single, entangled, qubit being fed into one channel of a binary quantum gate, like

    |ψ⟩        ──•──[H]──
    (|β00⟩)_A  ──⊕───────

or

    (|β00⟩)_A ──[X]──[Z]──┐
                          ├──[ BELL† ]── meters
    (|β00⟩)_B ────────────┘

Recall the caution from our two-qubit lecture: a binary gate acts on the full bipartite state,

    |ψ⟩² →  ──┤   ├──
              │ U │      → U |ψ⟩² ,
            ──┤   ├──
            P          Q

and we cannot, in general, speak of separate per-register inputs.
However, the new notation that I have provided today, one half of an entangled qubit,

    ( |ψ⟩² )_A   for the A register (or A's) qubit, and
    ( |ψ⟩² )_B   for the B register (or B's) qubit,

allows us to write these symbols as individual inputs into either input of a binary quantum gate without violating the cautionary note. Why? Because earlier, the separate inputs we disallowed were individual components of a separable tensor (when no such separable tensor existed). We were saying that you cannot mentally place a tensor symbol, ⊗, between the two individual inputs. Here, the individual symbols are not elements in the two component spaces, and there is no danger of treating them as separable components of a bipartite state, and no ⊗ is implied.
A now has a two (classical) bit result of her measurement: "xy" = "00", "01", "10" or "11". She sends "xy" to B through a classical channel, which takes time to get there.
B Uses the Received Message to Extract |ψ⟩ from His Qubit

Depending on which of the four results A measured and radioed over, B's qubit has collapsed to one of the four states

    α|0⟩_B + β|1⟩_B ,
    β|0⟩_B + α|1⟩_B ,
    α|0⟩_B − β|1⟩_B ,
    −β|0⟩_B + α|1⟩_B ,

and he applies, respectively, one of the four "fix-it-up" operators 1, X, Z or iY to turn what he holds into the desired α|0⟩ + β|1⟩.

[Exercise.

i) Express the operator BELL† ⊗ 1 as an 8 × 8 matrix with the help of the section "The Matrix of a Separable Operator," in the lesson on tensor products.

ii) Express the state |ϕ⟩³ = ( α|0⟩ + β|1⟩ ) |β00⟩ as an 8 × 1 column vector by multiplying it out (use this initial state description, not the "re-arranged" version that used the Bell basis).

iii) Multiply the 8 × 8 operator matrix by the 8 × 1 state vector to get A's output state (prior to measurement) in column vector form.

v) Factor that last result in such a way that it looks the same as the answer we got when we applied BELL† ⊗ 1 to the "re-arranged" version of |ϕ⟩³. ]
vi) In the case where B receives the classical message “10 ” from A , apply the
corresponding “fix-it-up operator” shown in the table to his collapsed qubit and
thereby prove that he recovers the exact teleported state |ψi = α |0i + β |1i.
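The whole protocol fits in a few lines of numpy. This is only a sketch (my own helper names, with the measurement simulated by picking a CA outcome at random with the Born probabilities):

    import numpy as np

    I2 = np.eye(2); X = np.array([[0, 1], [1, 0]]); Z = np.array([[1, 0], [0, -1]])
    H  = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    CNOT = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,1,0]])
    BELL_dag = (CNOT @ np.kron(H, I2)).conj().T
    FIX = {0: I2, 1: X, 2: Z, 3: Z @ X}                  # Bob's fix-up operators (ZX = iY)

    def teleport(psi):
        beta00 = np.array([1, 0, 0, 1]) / np.sqrt(2)
        phi3 = np.kron(psi, beta00)                      # registers ordered C, A, B
        phi3 = np.kron(BELL_dag, I2) @ phi3              # Alice's BELL-dagger on C,A
        probs = [np.linalg.norm(phi3[2*k:2*k+2])**2 for k in range(4)]
        k = np.random.choice(4, p=probs)                 # Alice's z-basis reading "xy"
        collapsed_B = phi3[2*k:2*k+2] / np.sqrt(probs[k])
        return FIX[k] @ collapsed_B                      # Bob applies 1, X, Z or iY

    psi = np.array([0.6, 0.8j])                          # any normalized alpha, beta
    print(teleport(psi))                                 # reproduces psi exactly

Run it a few times: no matter which of the four readings Alice happens to get, Bob's corrected qubit always equals the original |ψ⟩.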
A Circuit Representation of Quantum Teleportation
Just as we used “SD” to be one of four possible operators in the superdense coding
algorithm, we will use “QT ” to mean one of four operators (1, X, Z, iY ) that B must
apply to his register based on the message he receives from A . With this shorthand,
the quantum teleportation circuit can be expressed as:
    C: |ψ⟩        ──•──[H]──(measure)═══════╗
    A: (|β00⟩)_A  ──⊕───────(measure)═══╗   ║
    B: (|β00⟩)_B  ─────────────────────[ QT ]── |ψ⟩

The circuit says that after taking the measurements (the meter symbols), A "radios" the classical data (double lines and wavy lines) to B who uses it to control (filled circles) which of the four operations he will apply to his qubit.

Once again, we use the identity

    ZX = iY

to replace the single four-way gate QT by a classically controlled X followed by a classically controlled Z:

    C: |ψ⟩        ──•──[H]──(measure)══════════════╗
    A: (|β00⟩)_A  ──⊕───────(measure)═══╗          ║
    B: (|β00⟩)_B  ─────────────────────[X]────────[Z]── |ψ⟩
(Don’t forget that operators are applied from left-to-right in circuits, but right-to-left
in algebra.)
Many authors go a step further and add the initial gate that creates the AB-channel Bell state |β00⟩ from CBS kets:

    |0⟩ ──[H]──•──
    |0⟩ ────────⊕──    → |β00⟩ ,

which leads to

    C: |ψ⟩ ─────────────•──[H]──(measure)══════════════╗
    A: |0⟩ ──[H]──•─────⊕───────(measure)═══╗          ║
    B: |0⟩ ───────⊕────────────────────────[X]────────[Z]── |ψ⟩
           P          Q        R      S     T          U
Observe that the tripartite state

    |ϕ⟩³ = |ψ⟩_C |0⟩_A |0⟩_B
going into the entire circuit is transformed by various gates and measurements along
the way. It continues to exist as a tripartite state to the very end, but you may not
recognize it as such due to the classical wires and transmission of classical information
around access points R and S, seemingly halting the qubit flow to their right. Yet
the full order-3 state lives on. It is simply unnecessary to show the full state beyond
that point, because registers C and A, after collapse, will contain one of the four CBS
kets, |x⟩_C |y⟩_A, for xy = 00, 01, 10 or 11. But those two registers never change after the measurement, and when Bob applies an operator to his local register B, say iY perhaps, he will be implicitly applying the separable operator 1 ⊗ 1 ⊗ iY to the full separable tripartite state.
[Exercise. Using natural coordinates for everything, compute the state of the vector |ϕ⟩³ as it travels through the access points, P-U: |ϕ⟩³_P, |ϕ⟩³_Q, |ϕ⟩³_R, |ϕ⟩³_S, |ϕ⟩³_T and |ϕ⟩³_U. For points S, T and U you will have to know what measurement A reads and sends to B, so do those three points twice, once for a reading of CA = "01" and once for a reading of CA = "11". HINT: Starting with the easy point P, apply transformations carefully to the basis kets using separable notation like (1 ⊗ BELL) or (BELL† ⊗ 1). When you get to post-measurement classical pipes, apply the Born Rule which will select exactly one term in the sum. ]
Why Teleportation Works.

Consider the main steps in the teleportation algorithm. We begin with three channels, the first of which contains all the quantum information we want to teleport, and the last two none of it,

    |ϕ⟩³ = |ψ⟩_C ⊗ |β00⟩_AB .

Once we have the idea to entangle channels A and C by converting to the Bell basis (perhaps driven by the fact that one of the Bell states is in the AB register pair) we end up with a state in the general form,

    |ϕ⟩³ ≅ |β00⟩_CA |ψ00⟩_B + |β01⟩_CA |ψ01⟩_B
         + |β10⟩_CA |ψ10⟩_B + |β11⟩_CA |ψ11⟩_B .
Without even looking at any of the four |ψjk⟩ kets in the B-channel, we are convinced that 100% of the |ψ⟩ information is now sitting inside that register, waiting to be tapped. Why?

The reason is actually quite simple.

Quantum gates – including basis transformations – are always unitary and thus reversible. If the Bell-basis operator had failed to transfer all the |ψ⟩ information into the B-register, then, since none of it is left in the AC-registers (they contain only Bell states), there would be no hope of finding an inverse gate to recover our starting state, which holds the full |ψ⟩. Thus, producing an expression that leaves channels A and C bereft of any trace of |ψ⟩ information must necessarily produce a B-channel that contains it all.
Recall the four classical unary operations – identity, negation, the [0]-op and the [1]-op. The first two are reversible, but no inverse exists for the latter two. Binary gates provide even more examples of irreversible classical operations for which there are no quantum counterparts.
These are specific examples of a general phenomenon that is more easily expressed
in terms of classical Boolean functions.
Boolean Functions. A Boolean function is a function that has one or more
binary digits (0 or 1) as input, and one binary digit as output.
A classical unary gate takes a single classical bit in and produces a single classical
bit out. In the language of functions, it is nothing other than a Boolean function of
one bit, i.e.,

    f : { 0, 1 } −→ { 0, 1 } ,

or, using our symbol B for the set of bits,

    f : B −→ B .

(As we defined it, B had a richer structure than that of a simple set; it had a mod-2 addition operation ⊕ that we will find useful in the definitions and computations to come.)
Reversible Example. To avoid becoming unmoored by abstractions, we revisit the negation operator in the language of Boolean functions. If we define

    f(x) ≡ ¬x,   for x ∈ B ,

its truth table is

    x    f(x)
    0     1
    1     0

and f is clearly reversible (it undoes itself).

Irreversible Example. If, on the other hand, we define the constant function

    g(x) ≡ 1,   for x ∈ B ,

with truth table

    x    g(x)
    0     1
    1     1

we have the quintessential example of an irreversible function.
Boolean functions (classical implied) will now become the subject of study. Com-
puter science seeks to answer questions about such functions or create Boolean func-
tions that do useful things. On the other hand, we will be using quantum circuits
composed of quantum operators to answer questions about these classical Boolean
functions. In other words, we have not abandoned the classical functions (I’ll drop
the modifier "Boolean" for now) in the least. On the contrary: they are the principal players in our narrative.
The language naturally extends to functions of more than one input bit. To keep
things simple, let’s talk about two bits.
A two bit function (classical and Boolean implied) takes two bits in and produces one bit out. In other words,

    f : { (0,0), (0,1), (1,0), (1,1) } −→ { 0, 1 } ,

or in B notation,

    f : B² −→ B .

(Column vs. row is not important here, so I'll use whichever fits better into the written page without the ( )ᵗ baggage.)
Note that we are avoiding the term “binary,” replacing it instead with “two bit”
to avoid confusion arising from the fact that we are using binary digits for every input
slot, whether a unary input or a multi-bit input.
Irreversibility in the Two (or Greater) Bit Case. Since two-input Boolean functions, like all Boolean functions, have a single bit out, they are inherently irreversible; we cannot undo the destruction that results from the loss of one or more bits.

(Necessarily) Irreversible Example. A typical two bit function that, like all two+ bit functions, is necessarily irreversible is the XOR, i.e.,

    f(x, y) ≡ x ⊕ y,   for (x, y)ᵗ ∈ B² ,

with the truth table

    (x, y)    f(x, y)
    (0, 0)       0
    (0, 1)       1
    (1, 0)       1
    (1, 1)       0
12.4.2 The Quantum Oracle of a Boolean Function
Although our quantum algorithms will use quantum gates, they will often have to
incorporate the classical functions that are the center of our investigations. But
how can we do this when all quantum circuits are required to use unitary – and
therefore reversible – gates? There is a well known classical technique for turning an
otherwise irreversible function into one that is reversible. The technique pre-dates
quantum computing, but we’ll look at it only in the quantum context, and if you’re
interested in the classical analog, you can mentally “down-convert” ours by ignoring
its superposition capability and focus only on the CBS inputs.
Suppose we are given a black box that computes some unary function, f(x), even one that may be initially unknown to us. The term black box suggests that we don't know what's on the inside or how it works.

    x ──▶ [ f ] ──▶ f(x)

It can be shown that, using this black box – along with certain fundamental quantum gates – one can build a new gate that

• is unitary (and therefore reversible), yet still embodies the full input-output behavior of f, and

• does so with the same efficiency (technically, the same computational complexity, a term we will define in a later lesson), as the black box f, whose irreversible function we want to reproduce.

We won't describe how this works but, instead, take it as a given and call the new, larger circuit "Uf," the quantum oracle for f. Its action on CBS kets and its circuit diagram are defined by

    Uf |x⟩ |y⟩ = |x⟩ | y ⊕ f(x) ⟩ ,

    |x⟩ ──┤      ├── |x⟩
          │  Uf  │
    |y⟩ ──┤      ├── | y ⊕ f(x) ⟩ .
Example. We compute the matrix for Uf when f(x) = 0, the constant (and irreversible) [0]-op. Starting with the construction of the matrix of any linear transformation and moving on from there,

    Uf = ( Uf |00⟩ , Uf |01⟩ , Uf |10⟩ , Uf |11⟩ )

       = ( |0⟩ |0 ⊕ f(0)⟩ , |0⟩ |1 ⊕ f(0)⟩ , |1⟩ |0 ⊕ f(1)⟩ , |1⟩ |1 ⊕ f(1)⟩ )

       = ( |0⟩ |f(0)⟩ , |0⟩ |¬f(0)⟩ , |1⟩ |f(1)⟩ , |1⟩ |¬f(1)⟩ ) ,

where we use the alternate notation for negation, writing ¬a for the negated bit. So far, everything we did applies to the quantum oracle for any function f, so we'll put a pin in it for future use. Now, going on to apply it to f = [0]-op,

    U[0]-op = ( |0⟩ |0⟩ , |0⟩ |1⟩ , |1⟩ |0⟩ , |1⟩ |1⟩ )

            = ⎡ 1  0  0  0 ⎤
              ⎢ 0  1  0  0 ⎥
              ⎢ 0  0  1  0 ⎥
              ⎣ 0  0  0  1 ⎦ ,

an interesting result in its own right, U[0]-op = 1, but nothing to which we should attribute any deep meaning. Do note, however, that such a nice result makes it self-evident that Uf is not only unitary but its own inverse, as we show next it always will be.
Uf is Always its Own Inverse. We compute on the tensor CBS, and the result will be extensible to the entire H ⊗ H by linearity:

    Uf Uf |xy⟩ = Uf ( |x⟩ | y ⊕ f(x) ⟩ )
              = |x⟩ | y ⊕ f(x) ⊕ f(x) ⟩
              = |x⟩ | y ⊕ 0 ⟩ = |x⟩ |y⟩ = |xy⟩ .    QED

A particularly useful special case is a |0⟩ presented to the target register, which causes the oracle to display f(x) there directly:

    |x⟩ ──┤      ├── |x⟩
          │  Uf  │
    |0⟩ ──┤      ├── | f(x) ⟩ .
Notice that the output of Uf for a CBS is always a separable state,

    |x⟩ ⊗ | y ⊕ f(x) ⟩ ,

although, as we'll see, feeding superpositions into the oracle is another matter.

Everything extends smoothly to more than one-bit gates, so we only need outline the analysis for two-qubits. We are given a black box of a two-input Boolean function, f(x0, x1),

    x0 ──▶ ┌     ┐
           │  f  │ ──▶ f(x0, x1)
    x1 ──▶ └     ┘

This time, we assume that circuit theory enables us to build a three-in, three-out oracle, Uf, defined by

    |x0⟩ ──┤      ├── |x0⟩
    |x1⟩ ──┤  Uf  ├── |x1⟩
    |y⟩  ──┤      ├── | y ⊕ f(x0, x1) ⟩ ,

usually shortened by using the encoded form of the CBS kets, |x⟩², where x ∈ { 0, 1, 2, 3 },

    |x⟩² ──/──┤      ├──/── |x⟩²
              │  Uf  │
    |y⟩  ─────┤      ├───── | y ⊕ f(x) ⟩ .
The key points are the same: Uf is unitary, it is its own inverse, and presenting |0⟩ to the target register causes f(x0, x1) to appear at its output.

[Exercise. Compute the quantum oracle (in matrix form) for the classical AND gate.]
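Numerically, a quantum oracle is just a permutation matrix on the CBS kets, so it is easy to build from any classical Boolean function. The sketch below is my own helper (n is the number of input bits; the bit-encoding of x in the AND lambda is an assumption for illustration):

    import numpy as np

    def make_oracle(f, n):
        """Return the 2^(n+1) x 2^(n+1) matrix U_f with U_f |x>|y> = |x>|y XOR f(x)>."""
        dim = 2 ** (n + 1)
        U = np.zeros((dim, dim))
        for x in range(2 ** n):
            for y in (0, 1):
                col = 2 * x + y                     # input ket |x>|y>
                row = 2 * x + (y ^ f(x))            # output ket |x>|y + f(x)>
                U[row, col] = 1
        return U

    AND = lambda x: (x >> 1) & (x & 1)              # AND of the two input bits of x
    U_and = make_oracle(AND, 2)
    print(np.allclose(U_and @ U_and, np.eye(8)))    # its own inverse: True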
12.5 Deutsch’s Problem
Our first quantum algorithm answers a question about an unknown unary function
f (x). It does not find the exact form of this function, but seeks only to answer a
general question about its character. Specifically, we ask whether the function is
one-to-one (distinct inputs produce distinct outputs) or constant (both inputs are
mapped to the same output.)
Obviously, we can figure this out by evaluating both f (0) and f (1), after which
we would know the answer, not to mention have a complete description of f . But
the point is to see what we can learn about f without doing both evaluations of the
function; we only want to do one evaluation. In a classical world if we only get to
query f once we have to choose between inputs 0 or 1, and getting the output for our
choice will not tell us whether the function is one-to-one or constant.
All the massive machinery we have accumulated in the past weeks can be brought
to bear on this simple problem very neatly to demonstrate how quantum parallelism
will beat classical computing in certain problems. It will set the stage for all quantum
algorithms.
Balanced Function. A balanced function is one that takes on the value 0 for exactly
half of the possible inputs (and therefore 1 on the other half).
Two examples of balanced functions of two inputs are XOR and 1_y : (x, y) ↦ y:

    (x, y)   XOR(x, y)        (x, y)   1_y(x, y)
    (0, 0)      0             (0, 0)      0
    (0, 1)      1             (0, 1)      1
    (1, 0)      1             (1, 0)      0
    (1, 1)      0             (1, 1)      1
Two unbalanced functions of two inputs are AND and the [1]-op:

    (x, y)   AND(x, y)        (x, y)   [1](x, y)
    (0, 0)      0             (0, 0)      1
    (0, 1)      0             (0, 1)      1
    (1, 0)      0             (1, 0)      1
    (1, 1)      1             (1, 1)      1
Constant Functions. Constant functions are functions that always produce the
same output regardless of the input. There are only two constant functions for any
number of inputs: either the [0]-op or the [1]-op. See the truth table for the [1]-op,
above; the truth table for the [0]-op would, of course, have 0s in the right column
instead of 1s.
There are only four unary functions, so the terms balanced and constant might seem heavy-handed here. The two constant functions are obviously the [0]-op and the [1]-op, and the other two are balanced. In fact, the balanced unary functions already have a term that describes them: one-to-one. There's an even simpler term for balanced functions in the unary case: not constant. To see this, let's lay all of our cards “on the table,” pun intended.

    x   1(x)      x   ¬x      x   [0](x)      x   [1](x)
    0    0        0    1      0     0         0     1
    1    1        1    0      1     0         1     1
So exactly two of our unary ops are constant and the other two are balanced = one-
to-one = not constant.
The reason we complicate things by adding the vocabulary constant vs. balanced
is that we will eventually move on to functions of more than one input, and in those
cases,
• not all functions will be either balanced or one-to-one (e.g., the binary AND function is neither), and
• balanced functions will not be one-to-one (e.g., the binary XOR function is balanced but not one-to-one).
Deutsch’s Problem
We are now ready to state Deutsch’s problem using vocabulary that will help when
we go to higher-input functions.
Deutsch’s Problem. Given an unknown unary function that we are told is either
balanced or constant, determine which it is in one query of the quantum oracle,
Uf .
Notice that we are not asking to determine the exact function, just which category
it belongs to. Even so, we cannot do it classically in a single query.
The Circuit
We combine the quantum oracle for f with a few Hadamard gates in a very small
circuit:
    |0⟩ ──H──┤    ├──H──  (measure)
             │ Uf │
    |1⟩ ──H──┤    ├─────  (ignore)
Because there are only four unary functions, the temptation is to simply plug each
one into Uf and confirm our claim. That’s not a bad exercise (which I’ll ask you to
do), but let’s understand how one arrives at this design so we can use the ideas in
other algorithms.
#1: Quantum Parallelism. Sending a CBS ket into the oracle's data register would keep us on the classical plane; sending in a non-trivial superposition of |0⟩ and |1⟩ is what takes us off this classical plane into quantum hyperspace where all the fun happens.
When we send such a non-trivial superposition through the quantum oracle, we are
implicitly processing both z-basis kets – and therefore both classical states, [0] and
[1] – simultaneously. This is the first big idea that fuels quantum computing and
explains how it achieves its speed improvements. (The second big idea is quantum
entanglement, but we’ll feature that one a little later.)
The practical impact of this technique in Deutsch’s algorithm is that we’ll be
sending a perfectly balanced (or maximally mixed ) superposition,
    |0⟩ₓ = (1/√2) |0⟩ + (1/√2) |1⟩ ,
through the data register (the A-channel) of the oracle, Uf .
#2: The Phase Kick-Back Trick. This isn’t quite as generally applicable as
quantum parallelism, but it plays a role in several algorithms including some we’ll
meet later in the course. It goes like this. If we feed the other maximally mixed state,
    |1⟩ₓ = (1/√2) |0⟩ − (1/√2) |1⟩ ,
into the target register (the B-channel) of Uf , we can transfer – or kick-back – 100%
of the information about the unknown function f (x) from the B-register output to
the A-register output.
You’ve actually experienced this idea earlier today when you studied quantum
teleportation. Recall that by merely rearranging the initial configuration of our input
state we were able to effect a seemingly magical transfer of |ψi from one channel
to the other. In the current context, presenting the x-basis ket, |1ix , to the target
register will have a similar effect.
Because we are going to make heavy use of the x-basis kets here and the variable x is
being used as the Boolean input to the function f (x), I am going to call into action
our alternate x-basis notation,
|+i ≡ |0ix and
|−i ≡ |1ix .
Together, the two techniques explain the first part of Deutsch’s circuit (in the dashed-
box),
    |0⟩ ──H──┤    ├──H──  (measure)
             │ Uf │
    |1⟩ ──H──┤    ├─────  (ignore)
We recognize H as the operator that takes z-basis kets to x-basis kets, thus manu-
facturing a |+i (i.e., |0ix ) for the data register input and |−i (i.e., |1ix ) for the target
register input,
    |0⟩ ──H── |+⟩
    |1⟩ ──H── |−⟩ .
In other words, the Hadamard gate converts the two natural basis kets (easy states to prepare) into superposition inputs for the quantum oracle. The top gate sets up quantum parallelism for the circuit, and the bottom one sets up the phase kick-back. For reference, algebraically these two gates perform

    H |0⟩ = ( |0⟩ + |1⟩ ) / √2 = |+⟩   and
    H |1⟩ = ( |0⟩ − |1⟩ ) / √2 = |−⟩ .
The real understanding of how the algorithm works comes by analyzing the kernel of
the circuit, the oracle (in the dashed-box),
    |0⟩ ──H──┤    ├──H──  (measure)
             │ Uf │
    |1⟩ ──H──┤    ├─────  (ignore)
Step 1. CBS Into Both Channels. We creep up slowly on our result by first considering a CBS ket into both registers, a result we know immediately by definition of Uf :

    Uf ( |x⟩ |y⟩ ) = |x⟩ |y ⊕ f(x)⟩ .
Step 2. CBS Into Data and Superposition into Target. We stick with a
CBS |xi going into the data register, but now allow the superposition |−i to go into
the target register. Extend the above linearly,
    Uf ( |x⟩ |−⟩ ) = Uf ( |x⟩ (|0⟩ − |1⟩)/√2 ) = ( Uf |x⟩|0⟩ − Uf |x⟩|1⟩ ) / √2 .
This amounts to
    Uf ( |x⟩ |−⟩ ) = |x⟩ ⊗ { (|0⟩ − |1⟩)/√2 ,  when f(x) = 0
                             (|1⟩ − |0⟩)/√2 ,  when f(x) = 1 }

                   = |x⟩ (−1)^f(x) (|0⟩ − |1⟩)/√2 .
Since it’s a scalar, (−1)f (x) can be moved to the left and be attached to the A-register’s
|xi, a mere rearrangement of the terms,
    Uf ( |x⟩ |−⟩ ) = (−1)^f(x) |x⟩ (|0⟩ − |1⟩)/√2 = (−1)^f(x) |x⟩ |−⟩ ,

and we have successfully (like magic) moved all of the information about f(x) from the B-register to the A-register, where it appears in the exponent of the overall phase factor, (−1)^f(x).
The oracle's part of the circuit processes this intermediate step's data by sending |x⟩ |−⟩ to (−1)^f(x) |x⟩ |−⟩.
Although we have a ways to go, let's pause to summarize what we have accomplished so far.
• The phase (−1)^f(x) now multiplies the A-register ket, which is where we will look in the coming step. Viewed this way, the B-register retains no useful information; just like in teleportation, a rearrangement of the data sometimes creates a perceptual shift of information from one channel to another that we can exploit by measuring along a different basis – something we will do in a moment.
Step 3. Superpositions into Both Registers. Finally, we want the state |+i
to go into the data register so we can process both f (0) and f (1) in a single pass.
The effect is to present the separable |+i ⊗ |−i to the oracle and see what comes out.
Applying linearity to the last result we get
    Uf ( |+⟩ |−⟩ ) = Uf ( (|0⟩ + |1⟩)/√2 ⊗ |−⟩ )
                   = ( Uf |0⟩|−⟩ + Uf |1⟩|−⟩ ) / √2
                   = ( (−1)^f(0) |0⟩|−⟩ + (−1)^f(1) |1⟩|−⟩ ) / √2
                   = ( (−1)^f(0) |0⟩ + (−1)^f(1) |1⟩ ) / √2  ⊗  |−⟩ .
By combining the phase kick-back with quantum parallelism, we’ve managed to get an
expression containing both f (0) and f (1) in the A-register. We now ask the question
that Deutsch posed in the context of this simple expression, “What is the difference between the balanced case (f(0) ≠ f(1)) and the constant case (f(0) = f(1))?” Answer: When constant, the two terms in the numerator have the same sign and when balanced, they have different signs, to wit,

    Uf ( |+⟩ |−⟩ ) = { (±1) (|0⟩ + |1⟩)/√2 ⊗ |−⟩ ,  if f(0) = f(1)
                       (±1) (|0⟩ − |1⟩)/√2 ⊗ |−⟩ ,  if f(0) ≠ f(1) }
We don't care about a possible overall phase factor of (−1) in front of all this since it's a unit scalar in a state space. Dumping it and noticing that the A-register has x-basis kets in both cases, we get the ultimate simplification,

    Uf ( |+⟩ |−⟩ ) = { |+⟩ |−⟩ ,  if f(0) = f(1)
                       |−⟩ |−⟩ ,  if f(0) ≠ f(1) }
the perfect form for an x-basis measurement. Before we do that, note the oracle's overall effect: it takes the input |+⟩|−⟩ to ±|+⟩|−⟩ in the constant case and to ±|−⟩|−⟩ in the balanced case.
Measurement
We only care about the A-register, since the B-register will always collapse to |−i.
The conclusion? An x-basis measurement of the A-register finishes the job and, since measuring along the x-basis is the same as applying H and then measuring along the z-basis, that is exactly what the last pieces of the circuit do:

    |0⟩ ──H──┤    ├──H──  (measure)
             │ Uf │
    |1⟩ ──H──┤    ├─────  (ignore)
We’ve explained the purpose of all the components in the circuit and how each plays
a role in leveraging quantum parallelism and phase kick-back. The result is extremely
easy to state. We run the circuit
    |0⟩ ──H──┤    ├──H──  (measure)
             │ Uf │
    |1⟩ ──H──┤    ├─────  (ignore)
one time only and measure the data register output in the natural basis.
• If we read “0 ”, the function is constant.
• If we read “1 ”, the function is balanced.
This may not seem like a game-changing result – a quantum speed-up of 2× in a problem that is both trivial and without any real-world application – but it demonstrates that there is a difference between quantum computing and classical computing. It also lays the groundwork for the more advanced algorithms to come.
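For readers who want to see the whole circuit run numerically, the following sketch (mine, assuming only numpy; it is not the author's code) simulates Deutsch's circuit for each of the four unary functions and reads out the data register, confirming “constant” exactly when the function is [0] or [1] – essentially the plug-each-one-in exercise suggested above.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
ket = lambda b: np.eye(2)[b]                      # |0> or |1> as a column of amplitudes

def U_f(f):                                       # 4x4 oracle: |x>|y> -> |x>|y XOR f(x)>
    U = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

def deutsch(f):
    state = np.kron(ket(0), ket(1))               # prepare |0>|1>
    state = np.kron(H, H) @ state                 # input Hadamards: |+>|->
    state = U_f(f) @ state                        # one oracle query
    state = np.kron(H, I2) @ state                # final H on the data register only
    p0 = state[0] ** 2 + state[1] ** 2            # probability the data register reads 0
    return "constant" if np.isclose(p0, 1) else "balanced"

for name, f in [("[0]", lambda x: 0), ("[1]", lambda x: 1),
                ("id", lambda x: x), ("not", lambda x: 1 - x)]:
    print(name, "->", deutsch(f))
```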
Chapter 13
We learned that the tensor product of two vector spaces, A and B, having dimensions
dA and dB , respectively, is the product space,
W = A⊗B,
whose vectors (a.k.a. tensors) consist of objects, w, expressible as weighted sums of
the separable basis, i.e.,
    w = Σ_{k=0}^{dA−1} Σ_{j=0}^{dB−1} c_kj ( a_k ⊗ b_j ) ,
where the c_kj are the scalar weights and also serve as the coordinates of w along this basis.
The separable basis tensors appearing in the above linear combination are the
dA dB vectors
    { a_k ⊗ b_j | k = 0, …, dA − 1  and  j = 0, …, dB − 1 } .
The sums, products and equivalence of tensor expressions were defined by the required
distributive and commutative properties, but can often be taken as the natural rules
one would expect.
A separable operator on the product space is one that arises from two component
operators, TA and TB , each defined on its respective component space, A and B. This
separable tensor operator is defined first by its action on separable order-2 tensors,

    ( T_A ⊗ T_B )( a ⊗ b ) ≡ T_A(a) ⊗ T_B(b) ,

and since the basis tensors are of this form, it establishes the action of T_A ⊗ T_B on the basis which, in turn, extends the action to the whole space.
We then went on to study the triple tensor product W = A ⊗ B ⊗ C
in order to acquire the vocabulary needed to present a few of the early quantum
algorithms involving three channels. Here is a summary of that section.
Objects of the Product Space and Induced Basis
Assuming A, B and C have dimensions dA, dB and dC, respectively, the basis for the product space is the set of dA·dB·dC separable tensors

    { a_k ⊗ b_j ⊗ c_l | k = 0, …, dA − 1;  j = 0, …, dB − 1;  l = 0, …, dC − 1 } ,

and a typical tensor w has the expansion

    w = Σ_{k=0}^{dA−1} Σ_{j=0}^{dB−1} Σ_{l=0}^{dC−1} c_kjl ( a_k ⊗ b_j ⊗ c_l ) ,

where the c_kjl are the scalar weights (or coordinates) that define w.
A separable operator on the product space is one that arises from three component
operators, TA , TB and TC , each defined on its respective component space, A, B and
C. This separable tensor operator is defined first by its action on separable order-3 tensors,

    ( T_A ⊗ T_B ⊗ T_C )( a ⊗ b ⊗ c ) ≡ T_A(a) ⊗ T_B(b) ⊗ T_C(c) ,

and since the basis tensors are of this form, that establishes the action of T_A ⊗ T_B ⊗ T_C on the basis which, in turn, extends the action to the whole space.
All of this generalizes to the n-fold tensor product

    W = A₀ ⊗ A₁ ⊗ ⋯ ⊗ A_{n−2} ⊗ A_{n−1} ,

for n ≥ 2. We'll label the dimensions of the component spaces by
    dim(A₀) = d₀ ,
    dim(A₁) = d₁ ,
        ⋮
    dim(A_{n−2}) = d_{n−2}   and
    dim(A_{n−1}) = d_{n−1} .

The dimension of W is then the product d₀ d₁ ⋯ d_{n−1}, which seems really big (and is big in fields like general relativity), but for us each component space is H which has dimension two, so dim(W) will be the – still large but at least palatable – number 2ⁿ.
The vectors – a.k.a. tensors – of the space consist of those w expressible as weighted
sums of the separable basis
    { a_{0,k₀} ⊗ a_{1,k₁} ⊗ a_{2,k₂} ⊗ ⋯ ⊗ a_{(n−1),k_{n−1}} } ,   with k_j = 0, …, d_j − 1 for each j.
If we write this algebraically, the typical w in W has a unique expansion along the
tensor basis weighted by the scalars ck0 k1 ...kn−1 ,
    w = Σ_{k₀, k₁, …, k_{n−1}} c_{k₀ k₁ … k_{n−1}} ( a_{0,k₀} ⊗ a_{1,k₁} ⊗ a_{2,k₂} ⊗ ⋯ ⊗ a_{(n−1),k_{n−1}} ) .
This notation is an order of magnitude more general than we need, but it is good
to have down for reference. We’ll see that the expression takes on a much more
manageable form when we get into the state spaces of quantum computing.
The sums, products and equivalence of tensor expressions have definitions analo-
gous to their lower-order prototypes. You’ll see examples as we go.
Separable Operators in the Product Space
A separable operator on the product space is one that arises from n component op-
erators, T0 , T1 . . ., Tn−1 , each defined on its respective component space, A0 , A1 ,
. . ., An−1 . This separable tensor operator is defined first by its action on separable
order-n tensors
[T0 ⊗ T1 ⊗ · · · ⊗ Tn−1 ] (v0 ⊗ v1 ⊗ · · · ⊗ vn−1 )
≡ T0 (v0 ) ⊗ T1 (v1 ) ⊗ · · · ⊗ Tn−1 (vn−1 ) ,
and since the basis tensors are always separable,
a0k0 ⊗ a1k1 ⊗ · · · ⊗ a(n−1)kn−1 ,
this establishes the action of T0 ⊗ T1 ⊗ · · · ⊗ Tn−1 on the basis,
[T0 ⊗ T1 ⊗ · · · ⊗ Tn−1 ] a0k0 ⊗ a1k1 ⊗ · · · ⊗ a(n−1)kn−1
≡ T0 (a0k0 ) ⊗ T1 (a1k1 ) ⊗ · · · ⊗ Tn−1 (a(n−1)kn−1 ) ,
which, in turn, extends the action to the whole space.
Notation
Sometimes we use the ∏ or ⨂ notation to shorten expressions. In these forms, the product space would be written in one of the two equivalent ways

    W = ⨂_{k=0}^{n−1} A_k = ∏_{k=0}^{n−1} A_k ,

and the action of a separable operator on a separable tensor becomes

    [ ∏_{k=0}^{n−1} T_k ] ( ∏_{k=0}^{n−1} v_k ) = ∏_{k=0}^{n−1} T_k(v_k) .
13.3 n-Qubit Systems
The next step in this lecture is to define the precise state space we need for a quantum
computer that supports n qubits. I won’t back up all the way to two qubits as I did for
the tensor product, but a short recap of three qubits will be a boon to understanding
n qubits.
The three-qubit state space is

    H(3) ≅ H_A ⊗ H_B ⊗ H_C .

The dimension is 2 × 2 × 2 = 8.
The natural three qubit tensor basis is constructed by forming all possible separable
products from the component space basis vectors, and we continue to use our CBS
ket notation. The CBS for H(3) is therefore
    { |0⟩⊗|0⟩⊗|0⟩ , |0⟩⊗|0⟩⊗|1⟩ , |0⟩⊗|1⟩⊗|0⟩ , |0⟩⊗|1⟩⊗|1⟩ ,
      |1⟩⊗|0⟩⊗|0⟩ , |1⟩⊗|0⟩⊗|1⟩ , |1⟩⊗|1⟩⊗|0⟩ , |1⟩⊗|1⟩⊗|1⟩ } .
Shorter notations admit the possibility of labeling each of the component kets with the H from which it came (A, B or C), and the densest of the notations expresses the CBS ket as an integer from 0 to 7, e.g., |5⟩³ = |101⟩. In third order coordinate form, each of these eight tensors is a standard-basis column vector of length 8: a single 1 in the slot given by its integer label and 0s everywhere else.
A typical three-qubit value is a normalized superposition of the eight CBS or, most generally,

    Σ_{k=0}^{7} c_k |k⟩³ ,   where   Σ_{k=0}^{7} |c_k|² = 1.
A third order logic gate is just a unitary operator on H(3); one important special case is the separable product of three first order operators. In this course, we won't be studying third order gates other than the ones that are separable products of first order gates (like H⊗³ discussed in the next section). However, let's meet one – the Toffoli gate – which plays a role in reversible computation and some of our work in the later courses CS 83B and CS 83C.
The Symbol. The Toffoli gate is drawn with a control dot on each of the top two wires and a ⊕ on the third:

    ──●──
    ──●──
    ──⊕──

The A and B registers are the control bits, and the C register is the target bit,

    ──●──   “control bits”
    ──●──
    ──⊕──   “target bit”
At times I’ll use all caps, as in TOFFOLI, to name the gate in order to give it the
status of its simpler cousin, CNOT.
The TOFFOLI gate has the following effect on the computational basis states:
    |x⟩ ──●── |x⟩
    |y⟩ ──●── |y⟩
    |z⟩ ──⊕── |(x ∧ y) ⊕ z⟩
In terms of the eight CBS tensors, it leaves the A and B registers unchanged and
negates the C register qubit or leaves it alone based on whether the AND of the
control bits is “1” or “0”:
    |z⟩ ⟼ { |z⟩ ,    if x ∧ y = 0
             |¬z⟩ ,   if x ∧ y = 1 }
It is a controlled-NOT operator, but the control consists of two bits rather than one.
Remember, not every CBS definition we can drum up will result in a unitary
operator, especially when we start defining the output kets in terms of arbitrary
classical operations. In an exercise during your two qubit lesson you met a bipartite
“gate” which seemed simple enough but turned out not to be unitary. So we must
confirm this property in the next bullet.
The Matrix. We compute the column vectors of the matrix by applying TOFFOLI to the CBS tensors to get

    M_TOFFOLI = ( TOFFOLI |000⟩ , TOFFOLI |001⟩ , ⋯ , TOFFOLI |111⟩ )
              = ( |000⟩ , |001⟩ , |010⟩ , |011⟩ , |100⟩ , |101⟩ , |111⟩ , |110⟩ )
              = [ 1 0 0 0 0 0 0 0
                  0 1 0 0 0 0 0 0
                  0 0 1 0 0 0 0 0
                  0 0 0 1 0 0 0 0
                  0 0 0 0 1 0 0 0
                  0 0 0 0 0 1 0 0
                  0 0 0 0 0 0 0 1
                  0 0 0 0 0 0 1 0 ] ,
which is an identity matrix until we reach the last two rows (columns) where it swaps
those rows (columns). It is unitary.
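The compute-the-columns recipe is easy to automate. This short sketch (an illustration of mine, not code from the text) applies the CBS rule |x⟩|y⟩|z⟩ ↦ |x⟩|y⟩|(x ∧ y) ⊕ z⟩ to build M_TOFFOLI and confirms it is the permutation matrix shown above and that it is unitary.

```python
import numpy as np

M = np.zeros((8, 8))
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            col = 4 * x + 2 * y + z                   # input CBS |x y z>
            row = 4 * x + 2 * y + ((x & y) ^ z)       # target flips iff x AND y = 1
            M[row, col] = 1

assert np.allclose(M @ M.T, np.eye(8))                # unitary (a permutation matrix)
print(M.astype(int))
```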
[Exercise. Prove that this is not a separable operator.]
Behavior on a General State
This is as far as we need to go on the Toffoli gate. Our interest here is in higher order
gates that are separable products of unary gates.
13.3.3 n Qubits
Definition and Notation
Note. As usual, we consider two unit-tensors which differ by a phase factor, eiθ
for real θ, to be the same n qubit value.
We can designate this state space using the notation
    H(n) = H ⊗ H ⊗ ⋯ ⊗ H   (n factors)   = ⨂_{k=0}^{n−1} H .
The natural n qubit tensor basis is constructed by forming all possible separable
products from the component space basis vectors. The CBS for H(n) is therefore
    { |0⟩⊗⋯⊗|0⟩⊗|0⟩⊗|0⟩ , |0⟩⊗⋯⊗|0⟩⊗|0⟩⊗|1⟩ , |0⟩⊗⋯⊗|0⟩⊗|1⟩⊗|0⟩ ,
      |0⟩⊗⋯⊗|0⟩⊗|1⟩⊗|1⟩ , |0⟩⊗⋯⊗|1⟩⊗|0⟩⊗|0⟩ , … ,
      … , |1⟩⊗⋯⊗|1⟩⊗|1⟩⊗|0⟩ , |1⟩⊗⋯⊗|1⟩⊗|1⟩⊗|1⟩ } ,

with the shorthand options |x_{n−1} x_{n−2} ⋯ x₁ x₀⟩ and the encoded-integer form |x⟩ⁿ described below.
Dimension of H(n)
As you can tell by counting, there are 2ⁿ basis tensors in the product space, which makes sense because the dimension of the product space is the product of the dimensions of the component spaces; since dim(H) = 2, dim(H(n)) = 2 × 2 × ⋯ × 2 = 2ⁿ. ✓
For nth order CBS kets we usually label each component ket using the letter x with its corresponding space label, |x_{n−1}⟩ ⊗ |x_{n−2}⟩ ⊗ ⋯ ⊗ |x₁⟩ ⊗ |x₀⟩, or collapse the whole thing into the encoded-integer form

    |x⟩ⁿ ,   x ∈ { 0, 1, 2, 3, …, 2ⁿ − 1 } .

For example, in the fifth order space,

    |0⟩⁵  ↔  |00000⟩ ,
    |1⟩⁵  ↔  |00001⟩ ,
    |2⟩⁵  ↔  |00010⟩ ,
    |8⟩⁵  ↔  |01000⟩ ,
    |23⟩⁵ ↔  |10111⟩ ,

and, in general,

    |x⟩⁵  ↔  |x₄ x₃ x₂ x₁ x₀⟩ .
13.3.4 n Qubit Logic Gates
Quantum logic gates of order n > 3 are nothing more than unitary operators of order
n > 3, which we defined above. There’s no need to say anything further about a
general nth order logic gate. Instead, let’s get right down to the business of describing
the specific example that will pervade the remainder of the course.
Recall that a first order Hadamard applied to a CBS ket can be written

    H |x⟩ = (1/√2) Σ_{y=0}^{1} (−1)^{x⊙y} |y⟩ ,

where “⊙” stands for the mod-2 dot product. It is only a matter of expending more time and graphite to prove that this turns into the higher order version,

    H⊗n |x⟩ⁿ = (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} (−1)^{x⊙y} |y⟩ⁿ .
Vector Notation
We’ll sometimes present the formula using vector dot products. If x and y are con-
sidered to be vectors of 1s and 0s, we would represent them using boldface x and
y,
    x ↔ x = ( x_{n−1}, x_{n−2}, …, x₁, x₀ )ᵀ ,    y ↔ y = ( y_{n−1}, y_{n−2}, …, y₁, y₀ )ᵀ .

When so expressed, the dot product between vector x and vector y is considered the mod-2 dot product,

    x · y ≡ ( x_{n−1} y_{n−1} + ⋯ + x₁ y₁ + x₀ y₀ ) mod 2 .
This results in an equivalent form of the Hadamard gate using vector notation,
    H⊗n |x⟩ⁿ = (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} (−1)^{x · y} |y⟩ⁿ .
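If you would like to see the higher order formula verified rather than proved, the sketch below (my own, with invented helper names) builds H⊗n two ways – by repeated Kronecker products and directly from the entry rule (1/√2)ⁿ(−1)^{x·y}, where the mod-2 dot product of two encoded integers is the parity of their bitwise AND – and checks that they agree.

```python
import numpy as np

def H_tensor(n):
    """H^(x)n built the brute-force way, by repeated Kronecker products."""
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    M = np.array([[1.0]])
    for _ in range(n):
        M = np.kron(M, H)
    return M

def mod2_dot(x, y):
    return bin(x & y).count("1") % 2          # parity of the bitwise AND

n = 4
formula = np.array([[(-1) ** mod2_dot(x, y) for y in range(2 ** n)]
                    for x in range(2 ** n)]) / np.sqrt(2) ** n
assert np.allclose(H_tensor(n), formula)
print("entries of H^(x)n really are (-1)^(x.y) / sqrt(2)^n for n =", n)
```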
We’ll be doing higher level basis conversions frequently, especially between the z-basis
and the x-basis. Let’s review and extend our knowledge.
The induced z-basis for H(n) is

    { |0⟩|0⟩⋯|0⟩|0⟩ , |0⟩|0⟩⋯|0⟩|1⟩ , |0⟩|0⟩⋯|1⟩|0⟩ , … , |1⟩|1⟩⋯|1⟩|1⟩ } ,

and the induced x-basis is

    { |0⟩ₓ|0⟩ₓ⋯|0⟩ₓ|0⟩ₓ , |0⟩ₓ|0⟩ₓ⋯|0⟩ₓ|1⟩ₓ , |0⟩ₓ|0⟩ₓ⋯|1⟩ₓ|0⟩ₓ , … , |1⟩ₓ|1⟩ₓ⋯|1⟩ₓ|1⟩ₓ } .

The induced x-basis CBS can also be written without using the letter “x” as a label,

    { |+⟩|+⟩⋯|+⟩|+⟩ , |+⟩|+⟩⋯|+⟩|−⟩ , |+⟩|+⟩⋯|−⟩|+⟩ , … , |−⟩|−⟩⋯|−⟩|−⟩ } .
Since H converts to-and-from the x and z bases in H, it is easy to confirm that the separable H⊗n converts to-and-from these two bases in H(n).
[Exercise. Do it.]
Notation. In order to make the higher order x-CBS kets of H(n) less confusing
(we need “x” as an encoded integer specifying the CBS state), I’m going to call to
duty some non-standard notation that I introduced in our two qubit lecture: I’ll use
the subscript “±” to indicate a CBS relative to the x-basis:
That is, if y is an integer from 0 to 2n − 1, when you see the ± subscript on the CBS
ket you know that its binary representation is telling us which x-basis (not z-basis)
ket it represents. So, for example, |5⟩³± = |1⟩ₓ|0⟩ₓ|1⟩ₓ = |−⟩|+⟩|−⟩. (Without the subscript “±,” of course, we mean the usual z-basis CBS.) This frees up the variable x for use inside the ket,

    H⊗n |x⟩ⁿ = |x⟩ⁿ±    and    H⊗n |x⟩ⁿ± = |x⟩ⁿ .
In the last section we were looking at the separable form of the CBS for an nth order
Hilbert space. Let’s count for a moment. Whether we have an x-basis, z-basis or any
other basis induced from the n component Hs, there are n factors in the separable
factorization, i.e.,
    |0⟩|1⟩|1⟩ ⋯ |0⟩|1⟩    or    |+⟩|−⟩|−⟩ ⋯ |+⟩|−⟩        (n components each).
But when expanded along any basis these states have 2n components (because the
product space is 2n dimensional). From our linear algebra and tensor product lessons
we recall that a basis vector, bk , expanded along its own basis, B, contains a single 1
and the rest 0s. In coordinate form that looks like
    b_k = ( 0, …, 0, 1, 0, …, 0 )ᵀ    (in B-coordinates, the single 1 in the kth slot).
This column vector is very tall in the current context, whether a z-basis ket,

    |x⟩ⁿ = ( 0, …, 0, 1, 0, …, 0 )ᵀ        (2ⁿ rows of z-coordinates, the 1 in the xth slot),

or, more to the point of this exploration, an x-basis ket expanded along the z-basis,

    |x⟩ⁿ± = ( ?, ?, …, ? )ᵀ                (2ⁿ rows of yet-unknown z-coordinates).
Actually, we know what those ?s are because it is the H ⊗n which turns the z-CBS
into an x-CBS,
    |x⟩ⁿ  ──H⊗n──▶  |x⟩ⁿ± ,

and we have already seen the result of H⊗n applied to any |x⟩ⁿ, namely,

    |x⟩ⁿ± = H⊗n |x⟩ⁿ = (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} (−1)^{x⊙y} |y⟩ⁿ ,

which, when written out, looks something like

    ( |0⟩ⁿ ± |1⟩ⁿ ± |2⟩ⁿ ± |3⟩ⁿ ± ⋯ ± |2ⁿ−1⟩ⁿ ) / (√2)ⁿ .
But we can do better. Not all possible sums and differences will appear in the sum,
so not all possible combinations of +1 and −1 will appear in an x-basis ket's column vector (not counting the scalar factor (1/√2)ⁿ). An x-CBS ket, |x⟩±, will have exactly the same number of +s as −s in its expansion (and +1s, −1s in its coordinate vector) — except for |0⟩±, which has all +s (+1s). How do we know this?
We start by looking at the lowest dimension, H = H(1) , where there were two
easy-to-grasp x-kets in z-basis form,
    |+⟩ = ( |0⟩ + |1⟩ ) / √2    and
    |−⟩ = ( |0⟩ − |1⟩ ) / √2 .
The claim is easily confirmed here with only two kets to check. Stepping up to second order, the x-kets expanded along the z-basis were found to be

    |0⟩²± = ( |0⟩² + |1⟩² + |2⟩² + |3⟩² ) / 2 ,
    |1⟩²± = ( |0⟩² − |1⟩² + |2⟩² − |3⟩² ) / 2 ,
    |2⟩²± = ( |0⟩² + |1⟩² − |2⟩² − |3⟩² ) / 2 ,
    |3⟩²± = ( |0⟩² − |1⟩² − |2⟩² + |3⟩² ) / 2 .

The same pattern persists in every order – this is the little lemma we will lean on later: the expansion of an x-CBS ket along the z-basis is a signed sum of all 2ⁿ z-CBS kets where — except for |0⟩ⁿ±, which has all plus signs — the sum will always have an equal number of +s and −s.
[Caution. This doesn’t mean that every sum with an equal number of positive
and negative coefficients is necessarily an x CBS ket; there are still more ways to
distribute the +s and −s equally than there are CBS kets, so the distribution of the
plus and minus signs has to be even further restricted if the superposition above is to
represent an x-basis ket. But just knowing that all x CBS tensors, when expanded
along the z-basis, are “balanced” in this sense, will help us understand and predict
quantum circuits.]
Proof of Lemma. We already know that the lemma is true for first and second
order state spaces because we are staring directly into the eyes of the two x-bases,
above. But let’s see why the Hadamard operators tell the same story. The matrix for
H ⊗2 , which is used to convert the second order z-basis to an x-basis, is
    H ⊗ H = (1/2) [ 1  1  1  1
                    1 −1  1 −1
                    1  1 −1 −1
                    1 −1 −1  1 ] .
If we forget about the common factor 1/2, it has a first column of +1s, and all its remaining columns have equal numbers of +1s and −1s. If we apply H⊗2 to |0⟩² = (1, 0, 0, 0)ᵗ we get the first column, all +1s. If we apply it to any |x⟩² for x > 0, say |2⟩² = (0, 0, 1, 0)ᵗ, we get one of the other columns, each of which has an equal number of +1s and −1s.
To reproduce this claim for any higher order Hadamard, we just show that the
matrix for H ⊗n (which generates the nth order x-basis) will also have all +1s in the
left column and equal number of +1s and −1s in the other columns. This is done
formally by recursion, but we can get the gist by noting how we extend the claim
from n = 2 to n = 3. By definition,
H ⊗3 = H ⊗ H ⊗ H = H ⊗ H ⊗2 ,
By our technique for calculating tensor product matrices, we know that the matrix
on the right will appear four times in the 8 × 8 product matrix, with the lower right
copy being negated (due to the -1 in the lower right of the smaller left matrix). To
wit,

    H⊗3 = (1/√2)³ [ 1  1  1  1  1  1  1  1
                    1 −1  1 −1  1 −1  1 −1
                    1  1 −1 −1  1  1 −1 −1
                    1 −1 −1  1  1 −1 −1  1
                    1  1  1  1 −1 −1 −1 −1
                    1 −1  1 −1 −1  1 −1  1
                    1  1 −1 −1 −1 −1  1  1
                    1 −1 −1  1 −1  1  1 −1 ] .
Therefore, except for the first column (all +1s), the tensor product’s columns will all
be a doubling (vertical stacking) of the columns of the balanced 4 × 4 (or a negated
4 × 4). Stacking two balanced columns above one another produces a column that is
twice as tall, but still balanced. QED
[Exercise. Give a rigorous proof by showing how one extends an order (n − 1)
Hadamard matrix to an order n Hadamard matrix.]
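A numerical spot-check of the lemma (not a substitute for the recursive proof the exercise asks for) is also easy. The sketch below (mine, for illustration) builds the unnormalized H⊗n for several n and confirms that the first column is all +1s while every other column has equally many +1s and −1s.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]])                   # unnormalized Hadamard; signs are all we need

def H_n(n):
    M = np.array([[1]])
    for _ in range(n):
        M = np.kron(M, H)
    return M

for n in range(1, 7):
    M = H_n(n)
    assert np.all(M[:, 0] == 1)                   # first column: all +1s
    for col in range(1, 2 ** n):
        assert M[:, col].sum() == 0               # every other column: balanced +1s and -1s
print("column-balance lemma confirmed for n = 1..6")
```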
As in the one- and two-qubit cases, we assume we can build a quantum oracle, Uf, for an n-input Boolean function f; its action on the CBS of H(n) ⊗ H is

    |x⟩ⁿ ─/─┤    ├─/─ |x⟩ⁿ
            │ Uf │
    |y⟩  ───┤    ├─── |y ⊕ f(x)⟩ .
• We assume (it does not follow from the definition) that the oracle is of the
same spatial circuit complexity as f (x), i.e., it grows in size at the same rate as
f grows relative to the number of inputs, n. This is usually demonstrated to
be true for common individual functions by manually presenting circuits that
implement oracles for those functions.
The function under study is an n-input (but still one-output) Boolean function,

    f : {0, 1}ⁿ ⟶ {0, 1} ,    f( x_{n−1}, x_{n−2}, …, x₁, x₀ ) ,

which we are told is either balanced or constant, and the problem is to determine which in a single query of Uf.
13.4.1 Deutsch-Jozsa Algorithm
The algorithm consists of building a circuit very similar to Deutsch's circuit and measuring the data register once. Our conclusion about f is the same as in the unary case: if we get a “0 ” the function is constant, if we get anything other than “0 ” the function is balanced. We'll analyze the speed-up after we prove this claim.
The Circuit
We replace the unary Hadamard gates of Deutsch’s circuit with nth order Hadamard
gates to accommodate the wider data register lines, but otherwise, the circuit layout
is organized the same:
    |0⟩ⁿ ─/──H⊗n──/──┤    ├──/──H⊗n──/──  (measure)
                     │ Uf │
    |1⟩  ────H───────┤    ├─────────────  (ignore)
The two ideas are the same as in the unary algorithm:
• setting up quantum parallelism by putting the maximally mixed state |0⟩ⁿ± into Uf 's data register, and
• using the phase kick-back trick by putting |−⟩ into Uf 's target register.
The first part of the Deutsch-Jozsa circuit (in the dashed box) prepares states that are
needed for quantum parallelism and phase kick-back just as the lower-order Deutsch
circuit did,
    |0⟩ⁿ ─/──H⊗n──/──┤    ├──/──H⊗n──/──  (measure)
                     │ Uf │
    |1⟩  ────H───────┤    ├─────────────  (ignore)
The H and H ⊗n operators take z-basis kets to x-basis kets in the first order H, and
the nth order H(n) spaces, respectively, thus manufacturing a |0in± for the data register
input and |−i for the target register input,
|0in H ⊗n |0in±
.
|1i H |−i
387
The top gate sets up quantum parallelism and the bottom sets up the phase kick-back.
For reference, here is the algebra:
    H⊗n |0⟩ⁿ = |0⟩ⁿ± = (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} |y⟩ⁿ    and

    H |1⟩ = ( |0⟩ − |1⟩ ) / √2 = |−⟩ .
Next, we consider the effect of the oracle on these two x-basis inputs (dashed box),
    |0⟩ⁿ ─/──H⊗n──/──┤    ├──/──H⊗n──/──  (measure)
                     │ Uf │
    |1⟩  ────H───────┤    ├─────────────  (ignore)
We’ll do it in stages, as before, to avoid confusion and be sure we don’t make mistakes.
Step 1. CBS Into Both Channels. When a natural CBS ket goes into both
registers, the definition of Uf tells us what comes out:

    Uf ( |x⟩ⁿ |y⟩ ) = |x⟩ⁿ |y ⊕ f(x)⟩ .

Step 2. CBS Into Data and Superposition into Target. Just as in the unary case, we keep a CBS |x⟩ⁿ in the data register, send |−⟩ into the target register, and extend the above linearly.
This amounts to
    Uf ( |x⟩ⁿ |−⟩ ) = |x⟩ⁿ ⊗ { (|0⟩ − |1⟩)/√2 ,  when f(x) = 0
                               (|1⟩ − |0⟩)/√2 ,  when f(x) = 1 }

                    = |x⟩ⁿ (−1)^f(x) (|0⟩ − |1⟩)/√2 .
Step 3. Superpositions into Both Registers. Finally, we send the full output
of H⊗n |0⟩ⁿ, namely

    |0⟩ⁿ± = |+ + ⋯ + +⟩ ,

into the data register so we can process f(x) for all x in a single pass and thereby leverage quantum parallelism. The net effect is to present the separable |0⟩ⁿ± ⊗ |−⟩ to the oracle. Applying linearity to the last result we find

    Uf ( |0⟩ⁿ± |−⟩ ) = Uf ( (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} |y⟩ⁿ ⊗ |−⟩ )
                     = (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} Uf ( |y⟩ⁿ |−⟩ )
                     = (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} (−1)^f(y) |y⟩ⁿ |−⟩
                     = ( (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} (−1)^f(y) |y⟩ⁿ ) ⊗ |−⟩ .
The Final Hadamard Gate
[Warning. The version of the argument I give next is easy enough to follow and will
“prove” the algorithm, but it may leave you with a “huh?” feeling. That’s because
it does not explain how one arrives at the decision to apply the final Hadamard. I’ll
present a more illuminating alternative at the end of this lesson that will be more
satisfying but which requires that you activate a few more little gray cells.]
We are ready to apply the nth order Hadamard gate in the upper right (dashed
box),
    |0⟩ⁿ ─/──H⊗n──/──┤    ├──/──H⊗n──/──  (measure)
                     │ Uf │
    |1⟩  ────H───────┤    ├─────────────  (ignore)
To that end, we consider how it changes the state at access point P into a state at
the final access point Q:
    |0⟩ⁿ ─/──H⊗n──/──┤    ├──/──H⊗n──/──  (measure)
                     │ Uf │
    |1⟩  ────H───────┤    ├─────────────  (ignore)
                           ↑ P           ↑ Q

(P marks the data register just after the oracle, Q the data register after the final H⊗n.)
We need only track the final H⊗n's effect on the data register, since it is the output of the data register we will test. It produces the output
    H⊗n ( (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} (−1)^f(y) |y⟩ⁿ )
        = (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} (−1)^f(y) H⊗n |y⟩ⁿ
        = (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} (−1)^f(y) [ (1/√2)ⁿ Σ_{z=0}^{2ⁿ−1} (−1)^{y⊙z} |z⟩ⁿ ]
        = (1/2ⁿ) Σ_{z=0}^{2ⁿ−1} ( Σ_{y=0}^{2ⁿ−1} (−1)^f(y) (−1)^{y⊙z} ) |z⟩ⁿ ,
                                  └──────────── G(z) ────────────┘
where we have regrouped the sum and defined a scalar function, G(z), of the sum-
mation index z. So, the final output is an expansion along the z-basis,
    data register at access point Q = (1/2ⁿ) Σ_{z=0}^{2ⁿ−1} G(z) |z⟩ⁿ .
We now look only at the coefficient, G(0)/2ⁿ, of the very first CBS ket, |0⟩ⁿ. This will tell us something about the other 2ⁿ − 1 CBS coefficients, G(z), for z > 0. We break it into two cases.

• f is constant. In this case, f(y) is the same for all y, either 0 or 1; call it c. We evaluate the coefficient of |0⟩ⁿ in the expansion, namely G(0)/2ⁿ:

    G(0)/2ⁿ = (1/2ⁿ) Σ_{y=0}^{2ⁿ−1} (−1)^c (−1)^{y⊙0} = (−1)^c (2ⁿ/2ⁿ) = ±1 ,

thereby forcing the coefficients of all other z-basis kets in the expansion to be 0 (why?). So in the constant case we have the CBS ket |0⟩ⁿ at access point Q with certainty and are therefore guaranteed to get a reading of “0 ” if we measure the state.

• f is balanced. This time the coefficient of |0⟩ⁿ in the expansion is

    G(0)/2ⁿ = (1/2ⁿ) Σ_{y=0}^{2ⁿ−1} (−1)^f(y) (−1)^{y⊙0} = (1/2ⁿ) Σ_{y=0}^{2ⁿ−1} (−1)^f(y) ,

which vanishes: a balanced f contributes exactly as many +1s as −1s to the sum. So in the balanced case the amplitude of |0⟩ⁿ at access point Q is 0, and we will never read “0.”
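The two bullet computations can be checked directly from a truth table. The sketch below (mine, not the author's; the function names are invented for illustration) computes the coefficients G(z)/2ⁿ at access point Q for a constant and for a randomly chosen balanced f, and prints the |0⟩ⁿ amplitude in each case.

```python
import numpy as np

def mod2_dot(y, z):
    return bin(y & z).count("1") % 2

def amplitudes_at_Q(f_table):
    """Coefficients G(z)/2^n of |z>^n at access point Q, from f's truth table."""
    N = len(f_table)
    return np.array([sum((-1) ** (f_table[y] + mod2_dot(y, z)) for y in range(N)) / N
                     for z in range(N)])

n = 4
N = 2 ** n
rng = np.random.default_rng(0)

constant = np.zeros(N, dtype=int)                     # a constant truth table
balanced = np.array([1] * (N // 2) + [0] * (N // 2))  # a balanced truth table ...
rng.shuffle(balanced)                                 # ... in random order

print("constant: amplitude of |0> =", amplitudes_at_Q(constant)[0])   # +/- 1
print("balanced: amplitude of |0> =", amplitudes_at_Q(balanced)[0])   # 0
```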
The Deutsch-Jozsa Algorithm in Summary
We’ve explained the purpose of all the components in the circuit and how each plays
a role in leveraging quantum parallelism and phase kick-back. The result is extremely
easy to state. We run the circuit
    |0⟩ⁿ ─/──H⊗n──/──┤    ├──/──H⊗n──/──  (measure)
                     │ Uf │
    |1⟩  ────H───────┤    ├─────────────  (ignore)
one time only and measure the data register output in the natural basis.
• If we read “0 ”, then f is constant.
• If we read “x” for any other x (i.e., x ∈ [1, 2ⁿ − 1]), then f is balanced.
The Classical Time Complexity
In order to know the answer to the Deutsch-Jozsa problem deterministically, i.e., with 100% certainty, we would have to evaluate the function f for more than half of the possible inputs, i.e., at least

    2ⁿ/2 + 1 = 2ⁿ⁻¹ + 1
times. That is, we'd plug just over half of the possible x values into f (say, x = 0, 1, 2, …, 2ⁿ⁻¹), and if they were all the same, we'd know the function must be constant. If any two were distinct, we know it is balanced. Of course, we may get lucky and find that f(0) ≠ f(1), in which case we can declare victory (balanced) very quickly, but we cannot count on that. We could be very unlucky and get the same output for the first 2ⁿ/2 computations, only to know the answer with certainty on the (2ⁿ/2 + 1)st (if it's the same as the others: constant, if not: balanced).
While we have not had our official lecture on time complexity, we can see that as the number of binary inputs, n, grows, the number of required evaluations of f, 2ⁿ⁻¹ + 1, grows exponentially with n. However, when we consider that there are N = 2ⁿ encoded integers that are allowed inputs to f, then as N grows, the number of evaluations of f, N/2 + 1, grows only linearly with N.
The classical problem has a solution which is exponential in n (the number of binary inputs), or linear in N = 2ⁿ (the number of integer inputs).
The Quantum Time Complexity
We have solved the problem with one evaluation of Uf, which is assumed to have the same circuit complexity as f. Now you might say that this is a constant time solution, i.e., it does not grow at all with n, because no matter how large n is, we only need to evaluate Uf once. In that light, the quantum solution is constant-time; it doesn't grow at all with n. We simply measure the output of the data register, x, test it with a single conditional such as if (x > 0), and we're done.
You might argue that we have overstated the case, because in order to detect
the output of the circuit, we have to query all n bits of the data registers to know
whether we get “000 · · · 00” or an integer other than that. No computer can do that
for arbitrarily large n without having an increasingly large circuit or increasingly long
testing algorithm. So in practical terms, this is an evaluation of Uf followed by n
one-bit queries, something that requires n if statements. That grows linearly with
n or, using encoded integer counting (N = 2n ), logarithmically (even better). Either
way, the quantum algorithm has a better time complexity (linear vs. exponential in
n or logarithmic vs. linear in N ) than its classical counterpart. So the speed-up is
real.
But there’s another way to view things that puts the quantum algorithm in an even
more favorable light. Whether quantum or classical, the number of binary registers
to test is the same: n. So we can really ignore that hardware growth when we speak
of time complexity relative to the classical case; the quantum algorithm can be said
to solve the problem in constant time relative to the classical algorithm.
Reminder. I’ll define terms like logarithmic, linear and exponential time com-
plexity in the next lecture.
Either way you look at it, if you require 100% certainty of the solution, we have found
an algorithm that is “faster” than the classical solution.
If, however, we allow a small error possibility for the classical case, as is only fair
since we might expect our quantum circuit to be prone to error (a topic of the next
course), then the classical algorithm grows neither exponentially with n nor linearly
with N , but in fact is a constant time algorithm, just like the Deutsch-Jozsa. I’ll give
you an outline of the reason now, and after our probability lesson, we can make it
rigorous.
Classical Algorithm Admitting a Small Error Probability ε ≪ 1.
Let’s consider the following classical algorithm.
The M-and-Guess Algorithm. Let M be some positive integer (think 20). Given a Boolean function f(x) of x ∈ [0, 2ⁿ − 1] which is either balanced or constant, i.e., one that satisfies the Deutsch-Jozsa hypothesis, we evaluate f(x) M times, each time at a random x ∈ [0, 2ⁿ − 1]. We call each evaluation a “trial.” If we get two different outputs, f(x′) ≠ f(x″), by the time we complete our M trials, we declare victory: f is balanced without a doubt. On the other hand, if we get the same output for all M trials, we declare near victory: We report that f is constant, with a pretty good certainty.
How often will we get the wrong answer using this algorithm?
The only way we can fail is if the function is balanced yet we declare it to be
constant after M trials. That only happens if we are unlucky enough to get M
straight 0s or M straight 1s from a balanced f .
We’ll call that eventuality, the event
S ∧B,
which is a symbolic way to say, “f was Balanced yet (technically AND) all trial outcomes were the Same.”
Since a balanced f means there is a 50-50 chance of getting a 1 or a 0 on any trial,
this unlucky outcome is akin to flipping a fair coin M times and getting either all
heads or all tails. As you can intuit by imagining 20 heads or 20 tails in a sequence of
20 fair coin tosses, this is quite unlikely. We’ll explain it rigorously in the upcoming
lesson on probability, but the answer is that the probability of this event occurring,
designated P(S ∧ B), is

    P(S ∧ B) = 2 × (1/2)^M × (1/2) = 1/2^M .
The factor of 2 out front is due to the fact that the error on a balanced function
can occur two different ways, all 1s or all 0s. The final factor 1/2 is a result of an
assumption – which could be adjusted if not true – that we are getting a constant
function or balanced function with equal likelihood.
So we decide beforehand the error probability we are willing to accept, say some ε ≪ 1, and select M so that

    1/2^M ≤ ε .

This will allow our classical algorithm to complete, with the same tiny error probability ε, in a fixed number of evaluations, M, of the function f, regardless of the number of inputs, n. To give you an idea,

    P(S ∧ B) ≤ { 0.000001 ,   for M = 20
                 9 × 10⁻¹⁶ ,  for M = 50 . }
Since the error probability does not increase with increasing n, the classical algorithm has a constant time solution, meaning that we can solve it with the same time complexity as the quantum Deutsch-Jozsa algorithm. (We will define terms like complexity and constant time precisely very soon, but you get the general idea.) Therefore, no realistic speed-up is gained using quantum computing if we accept a vanishingly small error result.
This does not diminish the importance of the deterministic solution which does
show a massive computational speed increase, but we must always temper our enthu-
siasm with a dose of reality.
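As a preview of the probability lesson, here is a tiny Monte Carlo sketch (my own; the constants are arbitrary) of the M-and-Guess game. It conditions on f being balanced – the extra factor of 1/2 in the formula above is just the assumed prior that f is balanced – and estimates how often all M random evaluations agree.

```python
import numpy as np

rng = np.random.default_rng(1)
M, runs = 10, 1_000_000

# Conditioned on f being balanced, each trial output is a fair coin flip,
# so the event "all M outputs agree" has probability 2 * (1/2)^M.
flips = rng.integers(0, 2, size=(runs, M))
all_same = np.all(flips == flips[:, :1], axis=1)

print("empirical   P(all M agree | balanced):", all_same.mean())
print("theoretical 2 * (1/2)^M              :", 2 * 0.5 ** M)
```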
Here is the promised, more illuminating way to see why the final Hadamard gate belongs in the circuit. We showed that at access point P – the data register output of the oracle, just before the final H⊗n –

    |0⟩ⁿ ─/──H⊗n──/──┤    ├──/──H⊗n──/──  (measure)
                     │ Uf │
    |1⟩  ────H───────┤    ├─────────────  (ignore)
                           ↑ P

the data register was in the state

    (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} (−1)^f(y) |y⟩ⁿ .
1. If f is constant, then all the coefficients, (−1)f (y) , are the same, and we are
looking at |0in± (or possibly -1 times this state, observationally equivalent).
2. If f is balanced, then half the (−1)f (y) are +1 and half are −1. Now, our little
lemma reminds us that this condition suggests – but does not guarantee – that
a balanced state might, at access point P, be an x-CBS state other than |0in± .
The constant case, 1, guarantees that we land in the x-CBS state |0⟩ⁿ±. The balanced case, 2, suggests that we might end up in one of the other x-CBS states, |x⟩ⁿ± for x > 0. Let's pretend that in the balanced case we are lucky enough to land exactly in one of those other CBS states. If so, when we measure at access point P along the x-basis, we will read that state's label, x, with certainty.
This is because measuring any CBS state along its own basis gives, with 100% prob-
ability, the value of that state; that state’s amplitude is 1 and all the rest of the CBS
states’ amplitudes are 0.
The Bad News. Alas, we are not able to assert that all balanced f s will produce
x-CBS kets since there are more ways to distribute the + and − signs equally than
there are x-CBS kets.
The Good News. We do know something that will turn out to be pivotal: a
balanced f will never have the CBS ket, |0in± in its expansion. Let’s prove it.
If we give the data register’s state at access point P the name |βin ,
    |β⟩ⁿ = (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} (−1)^f(y) |y⟩ⁿ ,
Furthermore, its |0in± coefficient is given by the dot-with-the-basis-ket trick (all coef-
ficients are real, so we can use a simple dot-product),
    ⟨ 0ⁿ± | βⁿ ⟩ = (1/√2)ⁿ ( 1, 1, …, 1 ) · (1/√2)ⁿ ( ±1, ±1, …, ±1 )ᵀ .
Aside from the scalar factors (1/√2)ⁿ, the left vector has all 1s, while the right vector has half +1s and half −1s, i.e., their dot product is 0: we are assured that there is no presence of the 0th CBS ket |0⟩ⁿ± in the expansion of a balanced f. ✓
We have shown that the amplitude of the data register's |0⟩ⁿ± is 0 whenever f is balanced, and we already knew that its amplitude is 1 whenever f is constant, so measuring at access point P along the x-basis will distinguish the two cases with certainty: a reading of “0 ” means constant, and anything else means balanced.
Measurement
The x-basis measurement we seek is nothing more than a z-basis measurement after
applying the nth order x ↔ z basis transforming unitary H ⊗n . This explains the
final nth order Hadamard gate in the upper right (dashed box),
    |0⟩ⁿ ─/──H⊗n──/──┤    ├──/──H⊗n──/──  (measure)
                     │ Uf │
    |1⟩  ────H───────┤    ├─────────────  (ignore)
A z-basis measurement at the circuit's output will then produce either
• |0⟩ⁿ, which corresponds to the pre-H⊗n state |0⟩ⁿ±, and therefore indicates a constant f, or
• anything else, which corresponds to a pre-H⊗n state that did not contain even “trace amounts” of |0⟩ⁿ± before the final gate and therefore indicates a balanced f.
The argument led to the same conclusion but forced us to think about the direct
output of the oracle in terms of the x-basis, thereby guiding the decision to apply the
final Hadamard gate. Not only that, we get a free algorithm out of it, and I’ll let you
guess what it is.
[Exercise. While not all balanced functions lead to x CBS kets at access point P,
several do. Describe them in words or formulas.]
Let’s call the collection of functions in the last exercise Bx .
[Exercise. How many functions are in the set Bx ?]
[Exercise. If you are told that an unknown function is in the set Bx , formulate
an algorithm using the Deutsch-Jozsa circuit that will, in a single evaluation of Uf ,
determine the entire truth table of the unknown function.]
[Exercise. How many evaluations of the unknown function f would be needed
to do this deterministically using a classical approach?]
[Exercise. If you were to allow for a non-deterministic outcome classically, would
you be able to get a constant time solution (one whose number of evaluations of f
would be independent of n for a fixed error, ε)?]
[Exercise. After attempting this problem, read the next section (Bernstein-
Vazirani) and compare your results and algorithm with that seemingly distinct prob-
lem. Are the two truly different problems and algorithms or is there a relationship
between them?]
The Bernstein-Vazirani problem concerns another n-input (one-output) Boolean function,

    f : {0, 1}ⁿ ⟶ {0, 1} ,    f( x_{n−1}, x_{n−2}, …, x₁, x₀ ) ,

this time promised to be of the form

    f(x) = a ⊙ x ,

the mod-2 dot product of x with a fixed but unknown n-bit string a. The problem is to determine a.
13.5.1 The Bernstein-Vazirani Algorithm
The algorithm uses the same circuit with the same inputs and the same single data register measurement as Deutsch-Jozsa. However, this time, instead of asking whether we see a “0” or a non-“0” at the output, we look at the full output: its value will be our desired unknown, a.
The Circuit
    |0⟩ⁿ ─/──H⊗n──/──┤    ├──/──H⊗n──/──  (measure)
                     │ Uf │
    |1⟩  ────H───────┤    ├─────────────  (ignore)
The |0i going into the top register provides the quantum parallelism and the |1i
into the bottom offers a phase kick-back that transfers information about f from the
target output to the data output.
Same as Deutsch-Jozsa. The first part of the circuit prepares states that are needed
for quantum parallelism and phase kick-back,
    |0⟩ⁿ ─/──H⊗n──/──┤    ├──/──H⊗n──/──  (measure)
                     │ Uf │
    |1⟩  ────H───────┤    ├─────────────  (ignore)

and the oracle then does to |0⟩ⁿ± ⊗ |−⟩ exactly what it did in Deutsch-Jozsa, namely,

    Uf ( |0⟩ⁿ± |−⟩ ) = ( (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} (−1)^f(y) |y⟩ⁿ ) |−⟩ ,
so, since f(y) = a ⊙ y, at access point P (the oracle's data register output, just before the final H⊗n)

    |0⟩ⁿ ─/──H⊗n──/──┤    ├──/──H⊗n──/──  (measure)
                     │ Uf │
    |1⟩  ────H───────┤    ├─────────────  (ignore)
                           ↑ P           ↑ Q

the data register holds the ket

    (1/√2)ⁿ Σ_{y=0}^{2ⁿ−1} (−1)^{a⊙y} |y⟩ⁿ .
Applying the final H⊗n and regrouping exactly as we did for Deutsch-Jozsa, the data register at access point Q is (1/2ⁿ) Σ_{z=0}^{2ⁿ−1} G(z) |z⟩ⁿ, where now G(z) = Σ_{y=0}^{2ⁿ−1} (−1)^{a⊙y} (−1)^{y⊙z}. We consider two cases.

• z = a. Here

    G(a) = Σ_{y=0}^{2ⁿ−1} (−1)^{a⊙y} (−1)^{y⊙a} = Σ_{y=0}^{2ⁿ−1} 1 = 2ⁿ ,

so the amplitude of |a⟩ⁿ at access point Q is G(a)/2ⁿ = 1.
• z ≠ a. We don't even have to sweat the computation of the amplitudes for the other kets, because once we know that |a⟩ⁿ has amplitude 1, the others have to be 0. (Why?)
We have shown that at access point Q, the CBS state |ai is sitting in the data register.
Since it is a CBS state, it won’t collapse to anything other than what it already is,
and we are guaranteed to get a reading of “a,” our sought-after n-bit binary number.
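The claim that all the amplitude piles up on |a⟩ⁿ is another easy numerical check. This sketch (mine, purely illustrative) picks a hidden a, builds the truth table f(x) = a ⊙ x, and computes the coefficients G(z)/2ⁿ at access point Q.

```python
import numpy as np

def mod2_dot(x, y):
    return bin(x & y).count("1") % 2

n, a = 5, 0b10110                                   # hidden 5-bit string a = 22
N = 2 ** n
f = [mod2_dot(a, x) for x in range(N)]              # the promised form f(x) = a . x (mod 2)

G_over_N = [sum((-1) ** (f[y] + mod2_dot(y, z)) for y in range(N)) / N
            for z in range(N)]

print("amplitude at z = a        :", G_over_N[a])                              # 1.0
print("largest amplitude elsewhere:",
      max(abs(G_over_N[z]) for z in range(N) if z != a))                       # 0.0
```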
Time Complexity
Because the quantum circuit evaluates Uf only once, this is a constant time solution.
What about the classical solution?
Deterministic. Classically we would need a full n evaluations of f in order to
get all n coordinates of a. That is, we would use the input values

    e_k ≡ ( 0, …, 0, 1, 0, …, 0 )ᵀ ,    the single 1 in the kth slot,  k = 0, …, n − 1 ,

for which f(e_k) = a ⊙ e_k = a_k, the kth bit of a.
After n passes we would have all n coordinates of a and be done. Thus, the classical
algorithm grows linearly with the number of inputs n. This kind of growth is called
linear growth or linear time complexity as it requires longer to process more inputs,
but if you double the number of inputs, it only requires twice as much time. This
is not as bad as the exponential growth of the classical deterministic Deutsch-Jozsa
algorithm.
Alternatively, we can measure the classical deterministic solution to the current problem in terms of the encoded integer size, N = 2ⁿ. In that case the classical algorithm is logarithmic in N, which doesn't sound as bad as linear, even though this is just a different accounting system.
Non-Deterministic. What if, classically, we evaluate f a fixed number of times,
M , and allow for some error, ε close to 0? Can we succeed if M is independent of
the number of inputs, n? No. In fact, even if we allowed M to grow with n by taking M = n − 1, we would still be forced to guess at the last coordinate. This would produce a 50% error, since the last coordinate could be 1 or 0 with equal probability. We can't even make a good guess (small error ε close to 0) if we skip a measly one of the n evaluations, never mind skipping the many evaluations that would be foisted on us if we let M be constant and watched n grow far beyond M.
So in practical terms, the classical solution is not constant time, and we have
a clear separation between quantum and classical solutions to the question. This
is a stronger result than quantum computing provided for the Deutsch-Jozsa problem where, once we allowed a small error, there was no real difference between the quantum and classical solutions.
Consider a state |φ⟩ⁿ⁺ᵐ of the product space A ⊗ B = H(n) ⊗ H(m), written as a sum in which every term carries an A-space CBS ket,

    |φ⟩ⁿ⁺ᵐ = Σ_{k=0}^{2ⁿ−1} |k⟩ⁿ_A |ψ_k⟩ᵐ_B

(the |ψ_k⟩ not necessarily normalized). In this special form, notice that each term is a separable product of a distinct CBS ket from A and some general state from B, i.e., the kth term is |k⟩ⁿ_A |ψ_k⟩ᵐ_B.
We know, by QM Trait #7 (post-measurement collapse), that when the A-register is measured, the state of the component space A = H(n) must collapse to one of the CBS states, call it |k₀⟩ⁿ_A. The generalized Born rule assures us that this will force the component space B = H(m) to collapse to the matching (normalized) state, |ψ_{k₀}⟩ᵐ_B / √⟨ψ_{k₀}|ψ_{k₀}⟩.
Discussion
The assumption of this rule is that the component spaces A and B are in an entangled
state which can be expanded as a sum, all terms of which have A basis factors. Well,
any state in A ⊗ B can be expressed this way; all we have to do is express
it along the full 2n+m product basis kets, then collect terms having like A-basis kets
and factor out the common ket in each term. So the assumption isn’t so much about
the state |ϕin+m as it is about how the state is written.
The next part reminds us that when an observer of the state space A takes a
measurement along the natural basis, her only possible outcomes are one of the 2ⁿ basis kets, |0⟩ⁿ, |1⟩ⁿ, …, |2ⁿ−1⟩ⁿ,
so only one term in the original sum survives. That term tells us what a B-state
space observer now has before him:
    A ↘ |0⟩ⁿ   ⟹   B ↘ |ψ₀⟩ᵐ / √⟨ψ₀|ψ₀⟩ ,
    A ↘ |1⟩ⁿ   ⟹   B ↘ |ψ₁⟩ᵐ / √⟨ψ₁|ψ₁⟩ ,
    A ↘ |2⟩ⁿ   ⟹   B ↘ |ψ₂⟩ᵐ / √⟨ψ₂|ψ₂⟩ ,

and, in general,

    A ↘ |k⟩ⁿ   ⟹   B ↘ |ψ_k⟩ᵐ / √⟨ψ_k|ψ_k⟩ ,    for k = 0, 1, …, 2ⁿ − 1 ,

where “↘” means collapses to.
This does not tell us what a B-state observer would measure, however, since the state he is left with, call it

    |ψ_{k₀}⟩ᵐ / √⟨ψ_{k₀}|ψ_{k₀}⟩ ,

is generally not a B-basis ket but some superposition of them. Note, too, that we could just as well have expanded |φ⟩ⁿ⁺ᵐ in such a way that each term in the sum had a B-space CBS ket, |k⟩ᵐ_B, and the A-space partners were general states, |ψ_k⟩ⁿ_A.
For those questions we need to add a little more math to our diet, so we take a small-but-interesting side trip next time.
Chapter 14
Probability Theory
Last time, our analysis of the classical M-and-Guess algorithm came down to the probability of the event S ∧ B – all M trial outcomes were the Same, yet f was Balanced – for an algorithm whose core was a short loop of random evaluations of f :
... (description of loop) ...
Estimating the probabilities of this algorithm will require more than intuition; it will
require a few probability laws and formulas.
Today we’ll cover those laws and formulas. We’ll use them to solidify our earlier
classical estimations, and we’ll have them at-the-ready for the upcoming probabilistic
quantum algorithms.
14.2.1 Events
An event is something that happens, happened, will happen or might happen.
Events are described using English, Mandarin, Russian or some other natural
language. They are not numbers, but descriptions. There is no such thing as the
event “8.” There is the event that Salim rolls an 8 at dice, or the event that Han missed the 8 PM train.
We will often use a script letter like A , B, C , . . . to designate events, as in
• “... Let E be the event that two sample measurements of our quantum circuit
are equal ...”,
• “... Let I be the event that all the vectors we select at random are linearly
independent ...”, or
• “... Let C be the event that the number given to us is relatively prime to 100
...”.
14.2.2 Probabilities
Probabilities are the numeric likelihoods that certain events will occur. They are
always numbers between 0 and 1, inclusive. If the probability of an event is 0, it cannot occur; if it is 1, it will occur with 100% certainty; if it is .7319, it will occur 73.19% of the time; and so on. We express the probabilities of events using P( ) notation, as in P(E) for the probability that the event E occurs.
A fair coin is easy to build from a single qubit: the state (|0⟩ + |1⟩)/√2 gives a 50-50 chance of reading 0 or 1. If we prepare this state by applying a Hadamard gate, H, to the basis state |0⟩, our
“coin” is waiting at the output gate. In other words, the coin is H |0i:
    |0⟩ ──H── ( |0⟩ + |1⟩ ) / √2 .
[Exercise. We recognize this state under an alias; it also goes by the name |+i.
Recall why.]
Measuring the output state H |0i is our actual “toss.” It causes the state to collapse
to either |0i or |1i, which we would experience by seeing a “0 ” or “1 ” on our meter.
    ( |0⟩ + |1⟩ ) / √2   ↘   |0⟩ or |1⟩ .

Here, “↘” means collapses to.
That’s equivalent to getting a heads or tails. Moreover, the probability of getting
either one of these outcomes (the eigenvalues) is determined by the amplitudes of their respective CBS kets (the eigenvectors). Since both amplitudes are 1/√2, the probabilities are

    P( measuring 0 ) = | 1/√2 |² = 1/2 ,   and
    P( measuring 1 ) = | 1/√2 |² = 1/2 .
So we have a perfectly good coin. Suddenly learning probability theory seems a lot
more appealing, and as a bonus we’ll be doing a little quantum computing along the
way.
[Exercise. We could have used H |1i as our coin. Explain why.]
[Exercise. The above presupposes that we will measure the output state along
the z-basis, {|0i , |1i}. What happens if, instead, we measure the same state along
the x-basis, {|+i , |−i}. Can we use this method as a fair coin flip?]
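If you want to toss this coin in software, the amplitudes of H|0⟩ are all you need: square them to get the Born-rule probabilities and sample. A minimal sketch (mine, assuming numpy):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
coin = H @ np.array([1.0, 0.0])                     # the state H|0> = (|0> + |1>)/sqrt(2)
probs = coin ** 2                                    # Born rule: squared (real) amplitudes

rng = np.random.default_rng()
tosses = rng.choice([0, 1], size=10, p=probs)        # ten independent "measurements"
print(tosses, "  P(0) =", probs[0], " P(1) =", probs[1])
```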
14.4.1 Outcomes
Every experiment or set of measurements associated with our quantum algorithms
will consist of a set of all possible outcomes. An outcome is the most basic result we
can imagine.
If the experiment is to prepare exactly one quantum coin, i.e., the state H|0⟩, and then measure it, there are two possible outcomes: “the measurement produced 0” and “the measurement produced 1.” Usually, though, we simply say that the possible outcomes are 0 and 1.
Now consider a bigger experiment: prepare and measure ten quantum coins,

    #0:  |0⟩ ──H── ( |0⟩ + |1⟩ ) / √2
    #1:  |0⟩ ──H── ( |0⟩ + |1⟩ ) / √2
      ⋮
    #9:  |0⟩ ──H── ( |0⟩ + |1⟩ ) / √2

One possible choice of outcomes is the total number of 0s measured:

    − “Total 0s = 0”
    − “Total 0s = 1”
    − “Total 0s = 2”
      ⋮
    − “Total 0s = 10”
Again, we could abbreviate this by saying, “the possible outcomes are 0-10.”
One problem with this definition of “outcome” is that some are more likely than
others. It is usually beneficial to define outcomes so that they are all equally –
or nearly equally – likely. So, we change our outcomes to be the many ten-tuples consisting of the individual results of the ten measurements, e.g., (0, 1, 1, 0, 0, 0, 1, 0, 1, 1). There are now a lot more outcomes (2¹⁰ = 1024), but they each have the same likelihood of happening. (If you don't believe me, list the eight outcomes for three coins and start flipping.) A shorter way to describe the second breakdown of outcomes is

    ( x₀, x₁, x₂, x₃, x₄, x₅, x₆, x₇, x₈, x₉ ) ,    where each x_k ∈ {0, 1} .
[Exercise. Explain why the two ways we defined the outcomes of the ten qubit
coin toss are both legal.]
14.4.3 An Incorrect Attempt at Defining Outcomes
It's natural to believe that organizing the ten qubit coin toss by the individual qubit results (such as “the 4th qubit measured a 0”) gives a reasonable partition of the experiment. After all, there are ten individual circuits. Here is the (unsuccessful) attempt at using that as our outcome set.
“The outcomes are to be the individual measurement results
Z3 ≡ measurement of qubit # 3 is 0 ,
Z8 ≡ measurement of qubit # 8 is 0 ,
O5 ≡ measurement of qubit # 5 is 1 ,
O0 ≡ measurement of qubit # 0 is 1 ,
and so on.”
(Z would mean that the event detects a Zero, while O, script-O, means the event
detects a One. Meanwhile, the subscript indicates which of the ten measurements we
are describing.)
There are ten measurements, each one can be either zero or one, and the above
organization produces 20 alleged outcomes. However, this does not satisfy the two
requirements of “outcome.”
[Exercise. Explain why?]
14.4.4 Events
Definitions
Please recognize that outcomes are not events. A set containing an outcome is an
event (a simple one, to be precise).
Describing Events
Events can be described either by the actual sets, using set notation, or an English
(or French or Vietnamese) sentence.
Examples of simple event descriptions for our ten qubit coin toss experiment are
· { (0, 0, 1, 1, 1, 0, 0, 0, 1, 1) }
· { (1, 1, 1, 1, 1, 1, 0, 0, 1, 0) }
· { (0, 0, 0, 0, 0, 0, 0, 0, 0, 0) }
· “The first five qubits measure 0 and the last five measure 1.”
Examples of compound event descriptions for our ten qubit coin toss experiment are
· “The first four flips are all the same.” (call it F )
· “The first and last flips are different.” (call it E )
· “The sum of the 1s measured is odd.” (call it O)
[Exercise. Describe five simple events, and five compound events. Use some
set notation and some natural English descriptions. You can use set notation that
leverages formulas rather than listing all the members, individually.]
The set of all possible outcomes of an experiment is called its sample space, Ω. In our ten qubit coin toss, Ω is the set of all ordered ten-tuples consisting of 0s and 1s,

    Ω = { (x₀, x₁, x₂, …, x₉) | x_k ∈ {0, 1} } .
14.4.6 Set Operations
Unions
One way to express Ω is as a compound event consisting of the set union of simple
events. Let’s do that for the ten qubit coin flip using big-∪ notation, which is just
like summation notation, Σ, only for unions,
    Ω ≡ ⋃_{x_k ∈ {0,1}} { (x₀, x₁, x₂, …, x₉) } .
Example. We would like to represent the event, F , that the first four quantum
coin flips in our ten qubit experiment are all the same. One expression would be
    F = ⋃_{w, x_k ∈ {0,1}} { (w, w, w, w, x₀, x₁, …, x₅) } .
Example. We would like to represent the event, F′, that the first four quantum
coin flips in our ten qubit experiment are all the same, but the first five are not all
the same. One expression would be
    F′ = ⋃_{w, x_k ∈ {0,1}} { (w, w, w, w, w ⊕ 1, x₀, x₁, …, x₄) } .
Intersections
We can use intersections to describe sets, too. For example, if we wanted an event
A in our ten qubit coin flip in which both a) the first four are the same and also b)
the first and last are different, we might express that using
A = F ∩E ,
where F and E were defined by me (and you) a moment ago.
Of course, we could also try representing that event directly, but once we have a
few compound events defined, we often leverage them to produce new ones.
[Exercise. Use intersections and the events already defined above to describe the event, R, in our ten qubit coin flip in which both a) the Hamming weight has absolute value ≥ 5 and b) the sum of the 1s is odd.]
Differences
What if we wanted to discuss the ten qubit coin flip event, D, which had odd sums,
but whose first four flips are not equal ? We could leverage our definitions of O and
F and use difference notation, “−” or “\”, like so:

    D = O − F    or    D = O \ F .
That notation instructs the reader to start with O, then remove all the events that
satisfy (or are ∈) F .
When we start with the entire sample space Ω, and subtract an event, S ,
Ω−S ,
we use a special term and notation: the complement of S , written in several ways,
depending on the author or context,

    S′ ,    S̄ ,    Sᶜ ,   or   ¬S .
All are usually read “not S ”, and the last reprises the logical negation operator, ¬.
One useful view of a ten qubit coin flip outcome is as a vector with ten components. An outcome is already in a vector-like format so it's not hard to see the correspondence. For example, the outcome (0, 1, 0, …, 1) corresponds to the column vector

    ( 0, 1, 0, …, 1 )ᵀ .
This seems pretty natural, but as usual, I’ll complicate matters by introducing the
addition of two such vectors. While adding vectors is nothing new, the concept
doesn’t seem to have much meaning when you think of them as outcomes. (What
does it mean to “add two outcomes of an experiment?”) Let’s not dwell on that for
the moment but proceed with the definition. We add vectors in this space by taking
their component-wise mod-2 sum ⊕ or, equivalently, their component-wise XOR,
    ( 0, 1, 0, …, 1 )ᵀ + ( 1, 0, 1, …, 1 )ᵀ ≡ ( 0⊕1, 1⊕0, 0⊕1, …, 1⊕1 )ᵀ = ( 1, 1, 1, …, 0 )ᵀ .
To make this a vector space, I’d have to tell you its scalars, (just the two numbers
0 and 1), operations on the scalars (simple multiplication, “·” and mod-2 addition,
“⊕”), etc. Once the details were filled in we would have the “ten dimensional vectors
mod-2,” or (Z2 )10 . The fancy name expresses the fact that the vectors have 10
components (the superscript 10 in (Z2 )10 ), in which each component comes from the
set {0, 1} (the subscript 2 in (Z2 )10 ). You might recall from our formal treatment of
classical bits that we had a two dimensional mod-2 vector space, B = B2 , which in
this new notation is (Z2 )2 .
We can create mod-2 vectors of any dimension, of course, like the five-dimensional
(Z2 )5 or more general n-dimensional (Z2 )n . The number of components 10, 5 or n,
tells you the dimension of the vector space.
A second view of a ten qubit coin flip is that of a ten bit integer from 0 to 1023,
constructed by concatenating all the results,
x0 x1 x2 · · · x9 .
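Both views are painless to code. The sketch below (my own illustration) adds two ten qubit outcomes as vectors in (Z₂)¹⁰ using component-wise XOR, and also reads an outcome as the integer obtained by concatenating x₀x₁⋯x₉.

```python
import numpy as np

u = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 1])        # one ten qubit outcome
v = np.array([1, 0, 1, 1, 0, 0, 1, 1, 0, 1])        # another

mod2_sum = (u + v) % 2                               # component-wise XOR: addition in (Z2)^10
as_integer = int("".join(map(str, u)), 2)            # concatenate x0 x1 ... x9, read as binary

print("u + v (mod 2) =", mod2_sum)
print("u as an integer in [0, 1023]:", as_integer)
```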
Examples of the correspondence between these last two views: the outcome (0, 0, …, 0, 1) reads as the integer 1, while (1, 0, …, 0, 0) reads as 512. Some events are most naturally described in one view or the other:
• All outcomes which are relatively prime (a.k.a. coprime) to a given number. (Uses the integer interpretation.)
• All outcomes that are linearly independent of the set (event) containing the
three vectors
(0, . . . 0, 0, 1), (0, . . . 0, 1, 0) , (0, . . . 1, 0, 0)
The last two are of particular importance when we consider quantum period-finding.
Linear Independence. In a vector space, a set of vectors {v0 , . . . , vn−1 }
is linearly independent if you cannot form a non-trivial linear combi-
nation of them which produces the zero vector, i.e.,
c0 v0 + c1 v1 + . . . + cn−1 vn−1 = 0
=⇒
all ck = 0.
Another way to say this is that no vector in the set can be expressed as a linear-
combination of the others.
[Exercise. Prove that the last statement is equivalent to the definition.]
[Exercise. Show that the zero vector can never be a member of a linearly inde-
pendent set.]
[Exercise. Show that a singleton (a set consisting of any single non-zero vector)
is a linearly independent set.]
The span of a set of vectors is the (usually larger) set consisting of all vectors that
can be constructed by taking linear combinations of the original set.
The set, S, whose span we take does not have to be a linearly independent set. If it is not, then it means
we can omit one or more of its vectors without reducing its span.
[Exercise. Prove it.]
When we say that a vector, w, is in the span of S, we mean that w can be written
as a linear combination of the vk s in S. Again, this does not require that the original
{vk } be linearly independent.
[Exercise. Make this last definition explicit using formulas.]
When a vector, w, is not in the span of S, adding w to S will increase S’s span.
Abstract Example
[Exercise. For a mod-2 vector space, how many vectors are in the span of the
empty set, ∅? How many are in the span of {0}? How many are in the span of a set
consisting of a single (specific) non-zero vector? How many are in the span of a set
of two (specific) linearly independent vectors? Bonus: How many in the span of
a set of m (specific) linearly independent vectors? Hint: If you’re stuck, the next
examples will help sort things out.]
Concrete Example
A set of two mod-2 vectors, {v0, v1} (of any dimension, say 10), is linearly independent in the mod-2 sense when
• neither vector is zero (0), and
• the two vectors are distinct, i.e., v0 ≠ v1.
[Exercise. Prove that this follows from the definition of linear independence using
the observation that the only scalars available are 0 and 1.]
Finding a Third Independent Vector Relative to an Independent Set of
Two
Consider the two 5-dimensional vectors (1, 0, 0, 1, 1) and (1, 1, 1, 1, 0). This set is linearly independent as we know from the above observations. If we wanted to add a third vector, w, to this set such that the augmented set of three vectors,
    (1, 0, 0, 1, 1), (1, 1, 1, 1, 0), w ,
would also be linearly independent, what would w have to look like? It is easiest
to describe the illegal vectors, i.e., those w which are linearly dependent on the two,
then make sure we avoid those. For w to be linearly dependent on these two, it would
have to be in their span. We already computed the span and found it to be the four
vectors
(0, 0, 0, 0, 0) ,
(1, 0, 0, 1, 1) ,
(1, 1, 1, 1, 0) and
(0, 1, 1, 0, 1) .
Thus, a vector w is linearly independent of the original two exactly when it is not in
that set,
    w ∈ E̅ ,   where E ≡ { 0, (1, 0, 0, 1, 1), (1, 1, 1, 1, 0), (0, 1, 1, 0, 1) } .
(I used the over-bar, E̅, rather than the E^c notation to denote the complement, since the c tends to get lost in this situation.)
How many vectors is this? Count all the vectors in the space (2^5 = 32) and
subtract the four that we know to be linearly dependent. That makes 32 − 4 = 28
such w independent of the original two.
Discussion. This analysis didn’t depend on which two vectors were in the original
linearly independent set. If we had started with any two distinct non-zero vectors,
they would be independent and there would be 28 ways to extend them to a set of
three independent vectors. Furthermore, the only role played by the dimension 5 was
that we subtracted the 4 from 2^5 = 32 to get 28. If we had been working in the 7-dimensional (Z2)^7 and asked the same question starting with two specific vectors, we would have arrived at the conclusion that there were 2^7 − 4 = 128 − 4 = 124 vectors independent of the first two. If we started with two linearly independent 10-dimensional vectors, we would have gotten 2^10 − 4 = 1024 − 4 = 1020 choices. And if we
started in the space of 3-dimensional vectors, (Z2)^3, there would be 2^3 − 4 = 8 − 4 = 4
independent w from which to choose.
[Exercise. If we had two independent vectors in the 2-dimensional (Z2 )2 , how
many ways would there have been to select a third vector linearly independent of the first two?]
Finding a Third Vector that is Not in the Span of Two Random Vectors
Let’s describe an event that is more general than the one in the last example.
Notation. Since we are selecting vectors in a specific sequence, we’ll use the
notation
    ( x, y )
to represent the event where x is the first pick and y is the second pick. (To count
accurately, we must consider order, which is why we don’t use braces: “{” or “}.”)
Similarly, the selection of the third w after the first two could be represented by
    ( x, y, w ) .
We want to count the outcomes in the event, E, that the third pick, w, is not in the span of the first two picks. There are two major cases:
1. x and y are linearly independent.
2. x and y are not linearly independent. This case contains three sub-cases: (i) both are 0, (ii) exactly one is 0, and (iii) they are equal and non-zero.
Let’s do the larger case 2 first, then come back and finish up what we started
earlier to handle case 1.
Harder Case 2. For x and y not linearly independent, we count each sub-case
as follows.
(i) There is only one configuration of x and y in this sub-case, namely ( 0, 0 ). In such a situation, the only thing we require of w is that it not be 0. There are 32 − 1 = 31 such w. Therefore, there are 31 simple events, ( 0, 0, w ), in this
case.
(ii) In this sub-case there are 31 configurations of the form ( 0, y ) with y ≠ 0 and 31 of the form ( x, 0 ) with x ≠ 0. That's a total of 62 ways that random choices of x and y
can lead us to this sub-case. Meanwhile, for each such configuration there are
32 − 2 = 30 ways for w to be different from x and y. Putting it together, there
are 62 · 30 = 1860 simple events in this sub-case.
(iii) There are 31 configurations of x = y ≠ 0 in this sub-case, namely ( x, x ) with x ≠ 0.
Meanwhile, for each such configuration any w that is neither 0 nor x will work.
There are 32 − 2 = 30 such w. Putting it together, there are 31 · 30 = 930
simple events in this sub-case.
Summarizing, the number of events with x and y not linearly independent and w not
in their span is 31 + 1860 + 930 = 2821.
Easier Case 1. We get into this situation with ( x, y ), where x and y are distinct and non-zero. That means the
first choice can’t be 0 so there are 31 possibilities for x, and the second one can’t be
0 or x, so there are 30 choices left for y. That’s 31 · 30 ways to get into this major
case. Meanwhile, there are 28 ws not in the span for each of those individual outcomes
(result of last section), providing the linearly-independent case with 31·30·28 = 26040
simple events.
Combining Both Cases. We add the two major cases to get 2821 + 26040 =
28861 outcomes in event E . If you are thinking of this as three five qubit coin flip
outcomes – 15 individual flips, total – there are 28861 ways in which the third group
of five will be linearly independent of the first two.
Is this likely to happen? How many possible flips are there in the sample space
Ω? The answer is that there are 2^15 (or, if you prefer, 32^3) = 32768. (The latter expression comes from 3 five qubit events, each event coming from a set of 2^5 = 32
outcomes.) That means 28861 of the 32768 possible outcomes will result in the third
outcome not being in the span of the first two. A simple division tells us that this
will happen 88% of the time.
A third vector selected at random from (Z2 )5 is far more likely to be in-
dependent of any two previously selected vectors than it is to be in their
span.
This was a seat-of-the-pants calculation. We’d better get some methodology so
we don’t have to work so hard every time we need to count.
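If you'd rather let a machine do the counting, a brute-force loop over all of (Z2)^5 – with each vector stored as a 5-bit integer – reproduces the 28861 figure. This is only a sanity-check sketch; it uses the fact that the span of {x, y} is just {0, x, y, x ⊕ y}.

#include <iostream>

int main() {
    // Count ordered triples (x, y, w) of 5-bit vectors with w outside span{x, y}.
    int count = 0;
    for (int x = 0; x < 32; ++x)
        for (int y = 0; y < 32; ++y)
            for (int w = 0; w < 32; ++w) {
                // span{x, y} = { 0, x, y, x^y } (entries may coincide)
                bool inSpan = (w == 0) || (w == x) || (w == y) || (w == (x ^ y));
                if (!inSpan) ++count;
            }
    std::cout << count << " of " << 32 * 32 * 32 << std::endl;  // 28861 of 32768
    return 0;
}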
14.7 Fundamental Probability Theory
14.7.1 The Axioms
Consider the set of all possible events of an experiment,
    Events ≡ { E } .
A probability measure is a function from events to non-negative real numbers,
    P : Events −→ R≥0 ,
    P(E) ≥ 0 ,
which assigns the full sample space probability one,
    P(Ω) = 1 ,
and is additive over mutually exclusive events: P(E ∪ F) = P(E) + P(F) whenever E ∩ F = ∅.
[Exercise. Derive two immediate consequences of the axioms:
    · P(∅) = 0.
    · If E ⊆ F, then P(E) ≤ P(F). ]
It may not always be obvious how to assign probabilities to events even when they
are simple events. However, when the sample space is finite and all simple events are
equiprobable, we can always do it. We just count and divide.
Caution. There is no way to prove that simple events are equiprobable. This
is something we deduce by experiment. For example, the probability of a coin flip
coming up tails (or the z-spin measurement of an electron in state |0i_x being "1") is
said to be .5, but we don’t know that it is for sure. We conclude it to be so by doing
lots of experiments.
If we write |E| for the number of outcomes in an event,
    |E| ≡ # outcomes ∈ E ,
then
    P(E) = |E| / |Ω| .
This is not a definition, but a consequence of the axioms plus the assumption that simple events are finite and equiprobable.
Example 1
In the ten qubit coin flip, consider the event, F , in which the first four results are
all equal. We compute the probability by counting how many simple events (or,
equivalently, how many outcomes) meet that criterion. What we know about these
events is that the first four are the same, so they are either all 0 or all 1.
· All 0. The number of events in this case is the number of ten-tuples of the
form,
(0, 0, 0, 0, x0 , x1 , . . . , x5 ) ,
which we can see is the same as the number of integers of the form x0 x1 x2 x3 x4 x5 .
That's 000000 through 111111, i.e., 0 through 63, which is 64 outcomes.
· All 1. The number of events in this case is the number of ten-tuples of the
form,
    (1, 1, 1, 1, x_0, x_1, . . . , x_5) ,
again 64 of them. So the number of outcomes in this event is 64 + 64 = 128. Meanwhile the sample space has
1024 outcomes, giving
    P(F) = 128/1024 = 1/8 = .125 .
Example 2
Next consider the event, F', defined earlier: the first four flips are the same, but the first five are not all the same. There are two kinds of outcomes.
· Four 0s followed by a 1.
    (0, 0, 0, 0, 1, x_0, x_1, . . . , x_4) ,
· Four 1s followed by a 0.
(1, 1, 1, 1, 0, x0 , x1 , . . . , x4 ) ,
Again, we look at the number of possibilities for the “free range” bits x0 , . . . , x4 ,
which is 32 for each of the two categories, making the number of outcomes in this
event 32 + 32 = 64, so the probability becomes
    P(F') = 64/1024 = 1/16 = .0625 .
Example 3
We do a five qubit coin flip twice. That is, we measure five quantum states once,
producing a mod-2 vector, x = (x0 , x1 , . . . , x4 ), then repeat, getting a second vector,
y = (y0 , y1 , . . . , y4 ). It’s like doing one ten qubit coin flip, but we are organizing
things naturally into two equal parts. Instead of the outcomes being single vectors
with ten mod-2 components, outcomes are pairs of vectors, each member of the pair
having five mod-2 components,
    Ω ≡ { ( (x_0, x_1, . . . , x_4), (y_0, y_1, . . . , y_4) )  |  x_k, y_k ∈ {0, 1} } .
Consider the event, I , in which the two vectors form a linearly independent set.
We’ve already discussed the exact conditions for a set of two mod-2 vectors to be
linearly independent:
    · neither vector is zero, and
    · the two vectors are distinct.
That makes 31 vectors x ≠ 0, each supporting 30 linearly independent ys. That's
31 × 30 = 930 outcomes in I . |Ω| continues to be 1024, so
    P(I) = |I| / |Ω| = 930/1024 ≈ .908 .
As you can see, it is very likely that two five qubit coin flips will produce a linearly
independent set; it happens > 90% of the time.
[Exercise. Revise the above example so that we take three, rather than two, five
qubit coin flips. Now the sample space is all triples of these five-tuples, (x, y, w).
What is the probability of the event, T , that all three “flip-tuples” are linearly
independent? Hint: We already covered the more lenient case in which x and y were
allowed to be any two vectors and w was not in their span. Repeat that analysis but
exclude the cases where x and y formed a linearly dependent set. ]
[Exercise. Make up three interesting event descriptions in this experiment and
compute the probability of each.]
What happens when the events are not mutually exclusive? A simple diagram in the case of two events tells the story. If we were to simply add the probabilities of two overlapping events, we would count their intersection twice, so we must subtract it once:
    P(E ∪ F) = P(E) + P(F) − P(E ∩ F) .
When there are more than two sets, the intersections involve more combinations and
get harder to write out. But the concept is the same, and all we need to know is that
    P( ∪_{k=0}^{n−1} E_k ) = Σ_{k=0}^{n−1} P(E_k) − P(various intersections) .
The reason this is always enough information is that we will be using the formula to
bound the probability from above, so the equation, as vague as it is, clearly implies
    P( ∪_{k=0}^{n−1} E_k ) ≤ Σ_{k=0}^{n−1} P(E_k) .
Very often we will want to know the probability of some event, E , under the as-
sumption that another event, F , is true. For example, we might want to know the
probability that three quantum coin flips are linearly independent under the assump-
tion that the first two are (known to be) linearly independent. The notation for the
event “E given F ” is
    E | F ,
and its probability is written
    P( E | F ) .
This is something we can count using common sense. Start with our formula for an
event in a finite sample space of equiprobable simple events (always the setting for
us),
    P(E) ≡ |E| / |Ω| .
Next, think about what it means to say “under the assumption that another event,
F , is true.” It means that our sample space, Ω, suddenly shrinks to F , and in that
smaller sample space, we are interested in the probability of the event E ∩ F , so
    P( E | F ) = |E ∩ F| / |F| ,   whenever F ≠ ∅.
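As a quick illustration with numbers we have already computed: in the five qubit setting, let F be the event that the first two flips, x and y, form a linearly independent pair, and E the event that all three flips do. Every outcome of E lies in F, there are 31 · 30 · 32 outcomes in F (the third flip w is unconstrained), and 31 · 30 · 28 of them also lie in E, so
    P( E | F ) = |E ∩ F| / |F| = (31 · 30 · 28) / (31 · 30 · 32) = 28/32 = .875 .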
Bayes’ Law
Study this last expression until it makes sense. Once you have it, divide the top
and bottom by the size of our original sample space, |Ω|, to produce the equivalent
identity,
    P( E | F ) = P(E ∩ F) / P(F) ,   whenever P(F) > 0.
This is often taken to be the definition of conditional probability, but we can view
it as a natural consequence of the meaning of the phrase “E given F .” It is also a
simplified form of Bayes’ law, and I will often refer to this as Bayes’ law (or rule or
formula), since this simple version is all we will ever need.
Intuitively, two events are statistically independent when knowledge that one of them occurred does not change the probability of the other,
    P( E | F ) = P(E) ,
    P( F | E ) = P(F) .
Multiplying either condition through by Bayes' law turns it into the symmetric statement
    P(E ∩ F) = P(E) P(F) ,
which, by the way, is true even for the degenerate case, F = ∅. This is the official definition of statistically independent events, and working backwards you would derive the intuitive meaning that we started with. In words, two events are independent ⇔ the probability of their intersection equals the product of their probabilities.
The idea carries over to any number of events, although the notation becomes thorny.
It’s easier to first say it in words, then show the formula. In words,
n events are independent
⇔
“the probability of the intersection [of any subset of the n events] is
equal to the product of the probabilities [of events in that subset].”
⇔
    P( E_{k_0} ∩ E_{k_1} ∩ · · · ∩ E_{k_l} ) = P(E_{k_0}) P(E_{k_1}) · · · P(E_{k_l}) ,
for every choice of indices 1 ≤ k_0 < k_1 < · · · < k_l ≤ n.
Events
    (E ∩ F)^c = E^c ∪ F^c
    E ∩ (F ∪ G) = (E ∩ F) ∪ (E ∩ G)
Probabilities
    P(E) = P(E ∩ F) + P(E ∩ F^c)
    P(E ∩ F) = P( E | F ) P(F)
Example 1
In our ten qubit coin flip, what is the probability that the 3rd, 6th and 9th flips are
identical?
We’ll call the event E . It is the union of two disjoint events, the first requiring
that all three flips be 0, Z , and the second requiring that all three be 1, O,
E = Z ∪ O, with Z ∩ O = ∅
=⇒
P (E ) = P (Z ) + P (O) .
There is no mathematical difference between Z and O, so we compute P () of either
one, say O. It is the intersection of the three statistically independent events, namely
that the 3rd, 6th and 9th flips, individually, come up 1, which we call O3 , O6 and
O9 , respectively. The probability of each is, of course, .5, so we get,
    P(O) = P(O_3 ∩ O_6 ∩ O_9) = P(O_3) P(O_6) P(O_9) = .5 · .5 · .5 = .125 .
Plugging into the sum, we get
P (E ) = P (Z ) + P (O) = .125 + .125 = .25 .
Example 2
We do the five qubit coin flip five times. We examine the probability of the event I5
defined as “the five five-tuples are linearly independent.” Our idea is to write P (I5 )
as a product of conditional probabilities. Let
    I_j ≡ event that the vector outcomes # 0, # 1, . . . , # (j − 1) are linearly independent.
Our goal is to compute the probability of I5 .
(It will now be convenient to use the over-bar notation E to denote complement.)
Combining the basic identity,
    P(I_5) = P(I_5 ∩ I_4) + P(I_5 ∩ Ī_4)
(exercise: why?), with the observation that
    P(I_5 ∩ Ī_4) = 0
(exercise: why?), we can write
    P(I_5) = P(I_5 ∩ I_4) = P( I_5 | I_4 ) P(I_4)
(exercise: why?).
Now apply that same process to the right-most factor repeatedly, and you end up
with
    P(I_5) = P( I_5 | I_4 ) P( I_4 | I_3 ) P( I_3 | I_2 ) P( I_2 | I_1 ) P(I_1)
           = ∏_{j=1}^{5} P( I_j | I_{j−1} ) ,
where the j = 1 term contains the curious event, I_0. This corresponds to no coin flip = no vector being selected or tested. It's different from the zero vector = (0, 0, 0, 0, 0)^t,
which is an actual flip possibility; I0 is no flip at all, i.e., the empty set, ∅. But we
know that ∅ is linearly independent always because it vacuously satisfies the condition
that one cannot produce the zero vector as a linear combination of vectors from the
set – since there are no vectors to combine. So P (I0 ) = 1. Thus, the last factor in
the product is just
P I1 I0 = P I1 ,
but it’s cleaner to include the conditional probability (LHS of above) in all factors
when using product notation.
[Note. We’ll be computing the individual factors in Simon’s algorithm. This is
as far as we need to take it today.]
    P(E) = P(E ∧ F) + P(E ∧ ¬F) .
14.9.3 Bayes’ Law.
Bayes’ law (or rule or formula) in its simple, special case, can be expressed as
    P( E | F ) = P(E ∧ F) / P(F) ,   whenever P(F) > 0.
14.10.1 Sampling with Replacement
The way the algorithm is stated, we are admitting the possibility of randomly selecting
the same x more than once during our M trials. This has two consequences in the
balanced case – the only case that could lead to error:
The algorithm as stated uses a “sampling with replacement” technique and is the
version that I summarized in the original presentation. We’ll dispatch that rigorously
first and move on to a smarter algorithm in the section that follows.
Rather than computing the probability of a wrong guess,
    P(S ∧ B) ,
directly, we consider the case in which we are given a balanced function – the only way we could fail. The probability to be computed in that case is expressed by
    P( S | B ) .
So we first do our counting under the assumption that we have a balanced function.
In this context, we are not asking about a “wrong guess,” since under this assumption
there is no guessing going on; we know we have a balanced function. Rather, we are
computing the probability that, given a balanced function, we get M identical results
in our M trials.
Another way to approach this preliminary computation is to declare our sample
space, Ω, to be all experimental outcomes based on a balanced function. Shining the
light on this interpretation, we get to leave out the phrase “given that f is balanced ”
throughout this section. Our choice of sample space implies it.
Notation for Outcomes Yielding a One
Let O_k be the event that we observe f(x) = 1 on the kth trial. (Script O stands for "One.") Let O (no index) be the event that we get f(x) = 1 for all M trials.
Likewise, let Z_k be the event that we observe f(x) = 0 on the kth trial. (Script Z stands for "Zero.") Finally, let Z (no index) be the event that we get f(x) = 0 for all M trials.
The event S means all outcomes were the Same, i.e., we got either all 1s or all 0s.
S is the disjoint union,
S = O ∨ Z (disjoint),
so
    P(S) = P(O ∨ Z) = P(O) + P(Z)
         = 2 P(O) = 2 P( ∩_{k=0}^{M−1} O_k ) = 2 ∏_{k=0}^{M−1} P(O_k) .
The first line uses the mutual exclusivity of O and Z , while the last line relies on
the statistical independence of the {O_k}. Plugging in 1/2 for the terms in the product, we find that
    P(S) = 2 ∏_{k=0}^{M−1} (1/2) = 2 (1/2)^M = 1/2^{M−1} .
14.10.3 Completion of Sampling with Replacement for Un-
known f
We might be tempted to say that (1/2)^{M−1} is the probability of failure in our M-and-guess algorithm, but that would be rash. We computed it under the assumption of
a balanced f . To see the error clearly, imagine that our function provider gives us a
balanced function only 1% of the time. For the other 99% of the functions, when we
guess “constant” we will be doing so on a constant function and will be correct; our
chances of being wrong in this case are diminished to
    .01 × 1/2^{M−1} .
To see how this comes out of our theory, we give the correct expression for a wrong
guess. Let W be the event consisting of a wrong guess. It happens when both f is
balanced (B) AND we get M identical results (S ),
W = S ∧B.
    P(W) = P(S ∧ B) = P( S | B ) P(B) ,
and now understand that the probability of being wrong is attenuated by P (B),
which is usually taken to be 1/2 due to fair sampling. But understand that the world
is not always fair, and we may be given more of one kind of function than the other.
Using the usual value for P (B), this yields the proper probability of guessing wrong,
    P(W) = (1/2^{M−1}) (1/2) = 1/2^M .
There are many different versions of how one samples the function in a classical
Deutsch-Jozsa solution, so other mechanisms might yield a different estimate. How-
ever, they are all used as upper bounds and all give the same end result: classical
methods can get the right answer with exponentially small error in constant time.
guarantee this, but you can surmise that it will only add a constant time penalty –
something independent of n – to the algorithm.)
[Exercise. Write an algorithm that produces M distinct x values at random.
You can assume a random number generator that returns many more than M distinct
values (with possibly some repeats), since a typical random number generator will
return at least 32,768 different ints, while M is on the order of 20 or 50. Hint: It’s
okay if your algorithm depends on M , since M is not a function of n. It could even
be on the order of M 2 or M 3 , so long as it does not rely on n.]
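One straightforward way to attack the exercise (certainly not the only one, and it deserves your own attempt first) is simple rejection: keep drawing and throw away repeats. A C++ sketch, with names of my own choosing:

#include <cstdlib>
#include <set>
#include <vector>

// Return M distinct pseudo-random values drawn from 0 .. rangeMax - 1.
// Assumes rangeMax is much larger than M, so rejections are rare.
std::vector<int> distinctSamples(int M, int rangeMax) {
    std::set<int> seen;                      // a set silently rejects duplicates
    while ((int) seen.size() < M)
        seen.insert(std::rand() % rangeMax);
    return std::vector<int>(seen.begin(), seen.end());
}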
Intuitively, this should reduce the error because every time we remove another x
whose f(x) = 1 from our pot, there are fewer xs capable of leading to that 1, while the full complement of 2^n/2 xs that give f(x) = 0 is still present. Thus, chances of getting f(x) = 1 on future draws should diminish with each trial from the original value of 1/2. We prove this by doing a careful count.
As before, we will do the heavy lifting assuming the sample space Ω = B, and when
we’re done we can just multiply by 1/2 (which assumes equally likely reception of
balanced vs. constant functions).
M = 2 Samples
The {Ok } are no longer independent. To see how to deal with that, we’ll look at
M = 2 which only contains the two events O0 and O1 .
The first trial is the easiest. Clearly,
    P(O_0) = 1/2 .
Incorporating the second trial requires conditional probability. We compute with care. After the first trial consumes an x with f(x) = 1, there are 2^n − 1 xs left to draw from, of which 2^{n−1} − 1 still give f(x) = 1. Our interest is P(O), and
    P(O) = P( O_1 | O_0 ) P(O_0) = ( (2^{n−1} − 1) / (2^n − 1) ) · (1/2) .
Let's write the 1/2 in a more complicated – but also more suggestive – way, to give
    P(O) = P(O_1 ∧ O_0) = ( (2^{n−1} − 1) / (2^n − 1) ) · ( 2^{n−1} / 2^n ) .
M = 3 Samples
You may be able to guess what will happen next. Here we go.
    P(O) = P(O_2 ∧ O_1 ∧ O_0) = P( O_2 | [O_1 ∧ O_0] ) P(O_1 ∧ O_0) .
We already know the second factor from the M = 2 case. As for the first factor: after two trials that each produced a 1, there are now two fewer xs left that would cause f(x) = 1, and the total number of xs from which to choose now stands at 2^n − 2, so
    P( O_2 | [O_1 ∧ O_0] ) = ( 2^n/2 − 2 ) / ( 2^n − 2 ) = ( 2^{n−1} − 2 ) / ( 2^n − 2 ) .
Substituting into our expression for P (O) we get
    P(O) = P(O_2 ∧ O_1 ∧ O_0) = ( (2^{n−1} − 2) / (2^n − 2) ) ( (2^{n−1} − 1) / (2^n − 1) ) ( 2^{n−1} / 2^n ) .
Continuing in this way for general M,
    P(O) = P(O_{M−1} ∧ · · · ∧ O_2 ∧ O_1 ∧ O_0)
         = ( (2^{n−1} − (M−1)) / (2^n − (M−1)) ) · · · ( (2^{n−1} − 2) / (2^n − 2) ) ( (2^{n−1} − 1) / (2^n − 1) ) ( 2^{n−1} / 2^n ) .
This covers the eventuality of getting all 1s when f is balanced, and if we include the alternate way to get unlucky, all 0s, we have
    P( S without replacement ) = 2 ∏_{k=0}^{M−1} (2^{n−1} − k) / (2^n − k) .
Completion of Analysis for Unknown f
We already know what to do. If we believe we’ll be getting about equal numbers of
balanced and constant functions, we multiply the last result by 1/2,
    P( W without replacement ) = (1/2) × 2 ∏_{k=0}^{M−1} (2^{n−1} − k) / (2^n − k)
                               = ∏_{k=0}^{M−1} (2^{n−1} − k) / (2^n − k) .
The probability of guessing wrong in the “with replacement” algorithm (under equal
likelihood of getting the two types of functions) was
    P( W with replacement ) = ∏_{k=0}^{M−1} (1/2) ,
so we want to confirm our suspicion that we have improved our chances of guessing
correctly. We compare the current case with this past case and ask
    ∏_{k=0}^{M−1} (2^{n−1} − k) / (2^n − k)   <, =, or >   ∏_{k=0}^{M−1} (1/2)  ?
Intuitively we already guessed that it must be less, but we can now confirm this with
hard figures. We simply prove that each term in the left product is less than (or in
one case, equal to) each term in the right product.
We want to show that
    (2^{n−1} − k) / (2^n − k) ≤ 1/2 .
Now is the time we realize that the old grade school fraction test they taught us from
our childhood, usually called “cross-multiplication,”
    a/b ≤ c/d   ⇔   ad ≤ cb ,
actually has some use. We apply it by asking
    2 (2^{n−1} − k) ≤ 2^n − k ?
    2^n − 2k ≤ 2^n − k ?
    −2k ≤ −k ?
The answer is yes. In fact, except for k = 0 where both sides are equal, the LHS is
strictly less than the RHS. Thus, the product and therefore the probability of error is
actually smaller in the “without replacement” algorithm. The constant time result of
the earlier algorithm therefore guaranteed a constant time result here, but we should
have fewer wrong guesses now.
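A few lines of code make the comparison tangible by evaluating both products for a specific n and M. The sizes chosen below are arbitrary examples, and the variable names are mine:

#include <cmath>
#include <iostream>

int main() {
    int n = 10, M = 20;                           // sample sizes, for illustration
    double half  = std::pow(2.0, n - 1);          // 2^(n-1)
    double whole = std::pow(2.0, n);              // 2^n
    double withRepl = 1.0, withoutRepl = 1.0;
    for (int k = 0; k < M; ++k) {
        withRepl    *= 0.5;                       // each factor is exactly 1/2
        withoutRepl *= (half - k) / (whole - k);  // each factor is <= 1/2
    }
    std::cout << "P(W) with replacement    = " << withRepl    << std::endl;
    std::cout << "P(W) without replacement = " << withoutRepl << std::endl;
    return 0;
}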
14.11 A Condition for Constant Time Complexity
in Non-Deterministic Algorithms
This section will turn out to be critically important to our analysis of some quantum
algorithms, ahead. I’m going to define a few terms here before our official coverage,
but they’re easy and we’ll give these new terms full air time in future lectures.
Example 1
A might be an algorithm to sort the data and N is the number of data records to be
sorted.
Example 2
14.11.3 Looping Algorithms
Often in quantum computing, we have an algorithm that repeats an identical mea-
surement (test, experiment) in a loop, and that measurement can be categorized in
one of two ways: success (S ) or failure (F ). Assume that a measurement (test, ex-
periment) only need succeed one time in any of the loop passes to end the algorithm
with a declaration of victory: total success. Only if it fails after all loop passes is the
algorithm considered to have failed. Finally, the events S and F for any one loop
pass are usually statistically independent of the outcomes on previous loop passes, a
condition we will assume is met.
We'll say that A is a looping algorithm. Suppose there is a fixed p, independent of the problem size N, such that every loop pass succeeds with probability
    P(S) ≥ p > 0 .
Then A can be turned into an algorithm that succeeds with any prescribed error tolerance in constant time.
Proof. Let S_k be the event of success in the kth loop pass. The hypothesis is that
    P(S_k) ≥ p > 0   for every k.
We are allowing the algorithm to have an error with probability ε. Pick T such that
    (1 − p)^T < ε ,
a condition we can guarantee for large enough T since (1 − p) < 1. Note that p being
independent of the size N implies that T is also independent of N . After having
established T , we repeat A’s loop T times. The event of failure of our algorithm at
the completion of all T loop passes, which we’ll call Ftot , can be expressed in terms
of the individual loop pass failures,
Ftot = (¬S1 ) ∧ (¬S2 ) ∧ · · · ∧ (¬ST ) .
Since the events are statistically independent, we can convert this to a probability
using a simple product,
    P(F_tot) = P(¬S_1) P(¬S_2) · · · P(¬S_T) ≤ (1 − p)^T < ε .
We have shown that we can get A to succeed with failure probability < ε if we allow
A’s loop to proceed a fixed number of times, T , independent of its size, N . This is
the definition of constant time complexity. QED
To solve for T explicitly, we turn our condition on the integer T into an equality on
a real number t,
(1 − p)t = ε,
solve for t by taking the log_{1−p} of both sides,
    t = log_{1−p}(ε) ,
then pick any integer T > t. Of course, taking a log having a non-standard base like
1 − p, which is some real number between 0 and 1, is not usually a calculator-friendly
proposition; calculators, not to mention programming language math APIs, tend to
give us the option of only log2 or log10 . No problem, because ...
[Exercise. Show that
    log_A x = log x / log A ,
where logs on the RHS are both base 2, both base 10, or both any other base for that
matter.]
Using the exercise to make the condition on t a little more palatable,
    t = log(ε) / log(1 − p) ,
and combining that with the need for an integer T > t, we offer a single formula for
T,
log (ε)
T = + 1,
log (1 − p)
where bxc is notation for the floor of x, or the greatest integer ≤ x.
Examples with Two Different ps for Success
It’s important to realize that we don’t care whether the event of success in each loop
pass, S , is highly probable or highly improbable. We only care that it is bounded
away from 0 by a fixed amount, p, independent of the size of the algorithm, N . Two
examples should crystallize this. In both examples, we assume that we would like to
assure an error probability less than ε = .000001 = 10−6 – that’s one in a million.
How many times do we have to loop?
Example 1. P (S ) = .002 (Very Improbable). We’ll use log10 , since my
calculator likes that.
    T = ⌊ log(10^{−6}) / log(.998) ⌋ + 1 = ⌊ −6 / (−.00086945871262889) ⌋ + 1
      = ⌊ 6900.8452188 ⌋ + 1 = 6900 + 1 = 6901 .
Example 2. P(S) = .266. This time
    T = ⌊ log(10^{−6}) / log(.734) ⌋ + 1 = ⌊ −6 / (−0.134303940083929467) ⌋ + 1
      = ⌊ 44.67478762 ⌋ + 1 = 44 + 1 = 45 .
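The formula for T drops directly into code, and any logarithm base works because the base cancels in the quotient. A small sketch that reproduces both examples (the function name is mine):

#include <cmath>
#include <iostream>

// Smallest T with (1 - p)^T < eps, via T = floor( log(eps) / log(1 - p) ) + 1.
int loopPasses(double p, double eps) {
    return (int) std::floor(std::log(eps) / std::log(1.0 - p)) + 1;
}

int main() {
    std::cout << loopPasses(0.002, 1e-6) << std::endl;   // prints 6901
    std::cout << loopPasses(0.266, 1e-6) << std::endl;   // prints 45
    return 0;
}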
Chapter 15
Computational Complexity
complexities we are about to study, but they do give you a taste of their consequences.
• The algorithm does not depend on the size of the data set, N . It appears to
terminate in a fixed running time (C seconds) no matter how large N is. Such
an algorithm is said to have constant time complexity (or be a constant-time
algorithm).
• The algorithm takes C seconds to process N data items. We double N and the
running time seems to double – it takes 2C seconds to process 2N items. If we
apply it to 8N data items, the running time seems to take 8C seconds. Here,
the algorithm is said to exhibit linear time complexity (or be a linear algorithm).
• The algorithm takes C seconds to process N data items. We double N and
now the running time seems to quadruple – it takes C (22 ) = 4C seconds to
process 2N items. If we apply it to 8N data items the running time seems to
take C (82 ) = 64C seconds. Now, the algorithm will likely have quadratic time
complexity (or be a quadratic algorithm).
• The algorithm takes C seconds to process N data items. We double N and
now the running time seems to increase by a factor of 23 – it takes C (23 ) = 8C
seconds to process 2N items. If we apply it to 8N data items the running time
seems to take C (83 ) = 512C seconds. Now, the algorithm will likely have cubic
time complexity.
The previous examples – constant, linear, quadratic – all fall into the general
category of polynomial time complexity which includes growth rates limited by some
fixed power of N (N 2 for quadratic, N 3 for cubic, N 5 for quintic, etc.).
Sometimes we can’t find a p such that N p reflects the growth in time as the data grows.
We need a different functional form. Examples include logarithmic and exponential
growth. I won’t give an example of the former here – we’ll define it rigorously in the
next section. But here’s what exponential growth feels like.
(This last example doesn’t describe every exponential growth algorithm, by the
way, but an algorithm satisfying this for some C > 1 would likely be exponen-
tial.)
• Our interest will usually be in relative speed-up of quantum over classical meth-
ods. For that, we will be using a hardware black box that does the bulk of
the computation. We will be asking the question, "how much time does the quantum algorithm save us over the classical algorithm when we use the same – or spatially equivalent – black boxes in both regimes?"
• Even when we take space into consideration, the circuitry in our algorithms for
this course will grow linearly at worst, (often logarithmically) and we have much
bigger fish to fry. We’re trying to take a very expensive exponential algorithm
classically and find a polynomial algorithm using quantum computation. There-
fore, the linear or logarithmic growth of the hardware will be overpowered by
the time cost in both cases and is thus ignorable.
For example the circuit we used for both Deutsch-Josza and Bernstein-Vazirani,
    |0i^n --/-- H^⊗n --[ Uf ]-- H^⊗n --/--
    |1i   ------ H -----[ Uf ]--- (ignore)
had n + 1 inputs (and outputs), so it grows linearly with n (and only logarithmically
with the encoded N = 2n ). Furthermore, since this is true for both classical and
quantum algorithms, such growth can be ignored when we compare the two regimes.
15.2.3 Notation
To kick things off, we establish the following symbolism for the time taken by an
algorithm to deal with a problem of size N (where you can continue to think of N as
the amount of data, while keeping in mind it might be the number of inputs or the
size of a number to be factored).
TQ (N ) ≡ time required by algorithm Q to process N elements.
We now explore ways to quantify TQ (N ).
if (found)
    cout << x << " found at position " << k << endl;
If x is in location myArray[0], the algorithm terminates instantly independent of the
array size: constant time. If it is in the last location, myArray[N-1] (or not in the
list at all), the algorithm will take N − 1 steps to complete, a time that increases
linearly with N .
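For completeness, here is a minimal sketch of the search loop being discussed; the array contents are filler of my own, but the structure – scan myArray until x is found or the list is exhausted – is all the example needs.

#include <iostream>
using namespace std;

int main() {
    const int N = 8;
    int myArray[N] = { 4, 7, 1, 9, 3, 8, 6, 2 };
    int x = 9;                              // the value we are searching for

    bool found = false;
    int k;
    for (k = 0; k < N; ++k) {               // up to N passes in the worst case
        if (myArray[k] == x) { found = true; break; }
    }

    if (found)
        cout << x << " found at position " << k << endl;
    return 0;
}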
If we can’t even adjudicate the speed of an algorithm for a single data set, how
do we categorize it in terms of all data sets of a fixed size? We do so by asking three
or four types of more nuanced questions. The most important category of question is
“what happens in the worst case?” This kind of time complexity is called big-O.
To measure big-O time complexity, we stack the cards against ourselves by con-
structing the worst possible data set that our algorithm could encounter. In the
search example above, that would be the case in which x was in that last position
searched. This clears up the ambiguity about where we might find it and tells us that
the big-O complexity is going to be linear. But wait – we haven’t officially defined
what it means to be linear.
15.3.2 Definition of Big-O
Let’s say we have an ordinary function of N , call it f (N ). f (N ) could be anything.
One example is
f (N ) = N 2 + 3N + 75 .
Another might be
f (N ) = N log N + 2 .
We wish to compare the growth rate of TQ (N ) with the function f (N ). We say that
TQ (N ) = O (f (N ))
⇐⇒
TQ grows no faster than f (N ) .
We are giving our timing on algorithm Q an upper bound using the function f (N ).
In words, we say that “the timing for algorithm Q is big-O of f ”. But we still have a
loose end; the wording ”grows no faster than” is not very rigorous, so let’s clear that
up.
TQ (N ) = O (f (N ))
⇐⇒
there exist positive constants n0 and c such that
TQ (N ) ≤ c |f (N )| , for all N ≥ n0 .
This means that while TQ (N ) might start out being much greater than c|f (N )| for
small N , eventually it “improves” as N increases to the extent that TQ (N ) will become
and stay ≤ c |f (N )| for all N once we get past N = n0 .
Note. Since our comparison function, f (x), will always be non-negative, I will
drop the absolute value signs in many of the descriptions going forward.
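For instance, take the first sample function above, f(N) = N² + 3N + 75, and suppose TQ(N) happened to equal it exactly. The constants c = 2 and n0 = 11 witness the definition: for N ≥ 11 we have 3N + 75 ≤ 11N ≤ N², so
    TQ(N) = N² + 3N + 75 ≤ 2 N² ,   for all N ≥ 11,
and therefore TQ(N) = O(N²); the lower-order terms and the constant stop mattering once N is large enough.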
Descriptions of Growth Functions
    1                                   constant
    log N                               logarithmic
    log² N                              log-squared
    N                                   linear
    N log N                             (no special term)
    N²                                  quadratic
    N³                                  cubic
    N^k   (k ≥ 0, integer)              polynomial
    N^k log^l N   (k, l ≥ 0, integer)   (also) polynomial
    2^N                                 exponential
    N!                                  factorial
Theorem. For any constant c > 0, TQ(N) = O( c f(N) ) if and only if TQ(N) = O( f(N) ). The theorem says we can ignore constant factors and use the simplest version of the function possible, i.e., O(N²) vs. O(3.5 N²).
[Exercise. Prove it. Hint. It’s easy.]
For Polynomial big-O complexity, Ignore all but the Highest Power Term
We only need monomials, never binomials or beyond, when declaring a big-O. For
example if TQ is O(N 4 + N 2 + N + 1) it’s more concisely O(N 4 ). Here’s why.
Theorem. If
    TQ = O( Σ_{j=0}^{k} a_j N^j ) ,
then
    TQ = O( N^k ) .
Pick a positive number a greater than all the coefficients a_j, which will allow us to write
    c Σ_{j=0}^{k} a_j N^j ≤ c a Σ_{j=0}^{k} N^j .
Since N^j ≤ N^k for every j ≤ k (once N ≥ 1), the right-hand side is at most c a (k + 1) N^k, so setting c' ≡ c a (k + 1) gives
    TQ(N) ≤ c' N^k ,   for all N ≥ n0 ,
which is exactly the statement TQ = O(N^k).
15.4 Ω Growth
15.4.1 Definition of Ω
Sometimes we want the opposite relationship between algorithm Q, and a function f .
We want to demonstrate that Q has worse (or at least no better) performance than
f when the data set on which it operates grows. We say that
TQ (N ) = Ω (f (N ))
⇐⇒
TQ grows at least as fast as f (N ) .
We are giving our timing on algorithm Q a lower bound using the function f (N ). In
words, we say that “the timing for algorithm Q is omega of f ”. Quantitatively,
TQ (N ) = Ω (f (N ))
⇐⇒
there exist positive constants n0 and c such that
TQ (N ) ≥ c |f (N )| , for all N ≥ n0 .
This means that while TQ (N ) might start out being much smaller than c|f (N )| for
small N , eventually it will “degrade” as N increases to the extent that TQ (N ) will
become and stay ≥ c |f (N )| for all N once we get to N = n0 .
There are theorems similar to those we proved for big-O complexity that would apply to Ω time complexity, but they are not critical for our needs, so I'll leave them as an exercise.
[Exercise. State and prove lemmas and theorems analogous to the ones we proved
for big-O, but applicable to Ω growth.]
15.5 Θ Growth
Next, we express the fact that an algorithm Q is said to grow at exactly (a term
which is not universally used because it could be misinterpreted) the same rate as
some mathematical expression f (N ) using the notation
TQ (N ) = Θ (f (N )) .
We mean that TQ grows neither faster nor slower than f(N). In words, we say "the timing for algorithm Q is theta of f(N)". Officially,
TQ (N ) = Θ (f (N ))
⇐⇒
both
TQ (N ) = O (f (N )) and TQ (N ) = Ω (f (N ))
Ideally, this is what we want to know about an algorithm. Sometimes, when pro-
grammers informally say an algorithm is big-O of N or N log N , they really mean
that it is theta of N or N log N, because they have actually narrowed down the growth rate to being precisely linear or N log N. Conversely, if a programmer says an al-
gorithm is linear or logarithmic or N log N , we don’t know what they mean without
qualification by one of the categories, usually big-O or Θ.
15.8 Wrap-Up
This section was a necessarily brief and incomplete study of time complexity because
we only needed the most fundamental aspects of big-O and Θ growth for the most
obvious and easy-to-state classes: exponential growth, polynomial growth (and a few
cases of N log N growth). When we need them, the analysis we do should be self-
explanatory, especially with this short section of definitions on which you can fall
back.
Chapter 16
Recall that a one-qubit Hilbert space, H, consists of the 2-D complex vector space with
basis vectors |0i and |1i, making its typical state a superposition (always normalized,
of course) of those two vectors, as in
    |ψi = α |0i + β |1i ,   |α|² + |β|² = 1 .
Multi-qubit computers operate in a tensor product of such spaces, one for each qubit.
This product is a 2n -dimensional Hilbert space, which I sometimes label with the
subscript (n) for clarity, as in H(n) . We can think of it as
    H(n) = H ⊗ H ⊗ · · · ⊗ H   (n factors)   = ⊗_{k=0}^{n−1} H .
The computational basis states, or CBS, of this product space are the 2^n vectors of the form
    |xi^n = |x_{n−1}i ⊗ |x_{n−2}i ⊗ |x_{n−3}i ⊗ · · · ⊗ |x_0i = ⊗_{k=0}^{n−1} |x_k i ,
where each |xk i is either |0i or |1i, i.e., a CBS of the kth one qubit space. We index in
decreasing order from xn−1 to x0 because we’ll want the right-most bit to correspond
to the least significant bit of the binary number xn−1 . . . x1 x0 .
Two other common notations we'll need are the decimal integer (encoded) form, x, and its binary representation,
    |xi^n ,   x ∈ {0, 1, 2, 3, . . . , 2^n − 1} ,   or
    |x_{n−1} x_{n−2} . . . x_3 x_2 x_1 x_0 i ,   x_k ∈ {0, 1} .
For example, with n = 3,
    |0i^3 ←→ |000i ,
    |1i^3 ←→ |001i ,
    |2i^3 ←→ |010i ,
    |3i^3 ←→ |011i ,
    |4i^3 ←→ |100i ,
and, in general,
    |xi^3 ←→ |x_2 x_1 x_0 i .
The 2-dimensional H = H(1) and its 2^n-dimensional products H(n) are models we
use for quantum computing. However, the problems that arise naturally in math and
computer science are based on simpler number systems. Let’s have a look at two such
systems and show that they are equivalent to the CBS of our Hilbert space(s), H.
16.2.2 The Second Environment: The Finite Group (Z2)^n
Simple Integers
Start with the familiar set of all integers under ordinary addition,
    Z ≡ { . . . , −2, −1, 0, 1, 2, . . . } ,   "+" is ordinary addition,
where I have explicitly stated the operation of interest, namely ordinary addition.
(We’re not particularly interested in multiplication at this time.) This is sometimes
called the group of integers, and as a set we know it’s infinite, stretching toward ±∞
in the two directions.
Another group that you may not have encountered in a prior course is the finite group
ZN , consisting of only the N integers from 0 to N − 1,
ZN ≡ { 0, 1 , 2, . . . N − 1}, “ + ” is (+ mod N ),
and this time we are using addition modulo N as the principal operation; if x+y ≥ N ,
we bring it back into the set by taking its remainder after dividing by N :
x + y (mod N ) ≡ (x + y) % N.
Negation is defined by
    −x (mod N) ≡ (N − x).
Subtraction is defined using the above two definitions, as you would expect,
x − y (mod N ) ≡ (x + −y) % N.
To make this concrete, we consider the group Z15 .
Example Operations mod-15
7 + 2 = 9
7 + 10 = 2
14 + 14 = 13
− 1 = 14
−7 = 8
10 = −5
14 − 2 = 12
2 − 8 = 9
4 − 4 = 0
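If you want to check these examples (or generate more), mod-N arithmetic is one line per operation in most languages. A tiny C++ sketch, with helper names of my own choosing:

#include <iostream>

const int N = 15;

int addModN(int x, int y) { return (x + y) % N; }           // x + y (mod N)
int negModN(int x)        { return (N - x) % N; }           // -x    (mod N)
int subModN(int x, int y) { return (x + negModN(y)) % N; }  // x - y (mod N)

int main() {
    std::cout << addModN(14, 14) << std::endl;   // 13
    std::cout << negModN(7)      << std::endl;   // 8
    std::cout << subModN(2, 8)   << std::endl;   // 9
    return 0;
}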
While ZN is important in its own right (and we’ll be using it in the future), an
important special case you’ll want to embrace is ZN for N = 2. The above definition
of ZN works for Z2 , but when N = 2 some special notation kicks in:
Z2 ≡ { 0, 1 }, “ ⊕ ” is (+ mod 2) .
A few consequences of mod-2 addition are
0 ⊕ 1 = 1
1 ⊕ 0 = 1
0 ⊕ 0 = 0
1 ⊕ 1 = 0
−1 = 1
    0 − 1 = 0 ⊕ 1 = 1
Of course ⊕ is nothing other than the familiar XOR operation, although in this
context, we get subtraction and negative mod-2 numbers defined, as well. Also, while
we should be consistent and call subtraction ⊖, the last example shows that we can,
and usually do, use the ordinary “−” operator, even in mod-2 arithmetic. If there is
the potential for confusion, we would tag on the parenthetical “ (mod 2).”
Connection Between Z2 and H
Z2 H(1)
0 ↔ |0i
1 ↔ |1i
I hasten to add that this connection does not go beyond the 1:1 correspondence listed
above and, in particular, does not extend to the mod-2 addition in Z2 vs. the vector
addition in H; those are totally separate and possess no similarities. Also, only the
basis states in H are part of this correspondence; the general state, |ψi = α |0i + β |1i
has no place in the analogy. As tenuous as it may seem, this connection will help us
in the up-coming analysis.
Second Environment Completed: (Z2)^n with ⊕ Arithmetic
As a set, (Z2)^n is simply the n-tuples that have either 0 or 1 as their coordinates, that is,
    (Z2)^n ≡ { (x_{n−1}, x_{n−2}, . . . , x_2, x_1, x_0)^t  |  each x_k = 0 or 1 } .
Notice that we label the 0th coordinate on the far right, or bottom, and the (n − 1)st
coordinate on the far left, or top. This facilitates the association of these vectors with
binary number representations (coming soon) in which the LSB is on the right, and
the MSB is on the left (as in binary 1000 = 8, while 0001 = 1).
The additive operation stems from the “Z2 ” in its name: it’s the component-wise
mod-2 addition, ⊕ or, equivalently XOR, e.g.,
    (0, 1, 0, 1)^t ⊕ (1, 0, 1, 1)^t = (0⊕1, 1⊕0, 0⊕1, 1⊕1)^t = (1, 1, 1, 0)^t .
(Z2)^n is a Vector Space
I've already started calling the objects in (Z2)^n vectors, and this truly is an official designation. Just as R^n is an n-dimensional vector space over the reals, and H(n) is a 2^n-dimensional vector space over the complex numbers, (Z2)^n is a vector space over
Z2 . The natural question arises, what does this even mean?
You know certain things about all vector spaces – they are closed under vector addition, they contain a zero vector, every vector has an additive inverse, and scalars multiply vectors in a way that distributes over addition. All this is true of (Z2)^n, although the details will be defined as they crop up. For now,
we only care about the vector notation and ⊕ addition. That is, unless you want to
do this ...
[Exercise. Describe all of the above and anything else that needs to be confirmed
to authorize us to call (Z2 )n a vector space.]
Caution. For general N , (ZN )n is not a vector space. The enlightened among
you can help the uninitiated understand this in your forums, but it is not something
we will need. What kind of N will lead to a vector space?
Recall. Once again, think back to the vector space that we called B = B2 . It was
the formal structure that we used to define classical bits. Using the more conventional
language of this lesson, B is the four-element, 2-dimensional vector space (Z2 )2 .
Connection Between (Z2)^n and H(n)
Let’s punch up our previous analogy, bringing it into higher dimensions. We relate
the n-component vectors, (Z2 )n , and the multi-qubit space, H = H(n) . As before,
the dimension of H(n), 2^n, equals the size of the group (Z2)^n, also 2^n. The vectors in (Z2)^n correspond nicely to the CBS in H(n):
    (Z2)^n                              H(n)
    (0, 0, . . . , 0, 0, 0)^t     ↔     |00 · · · 000i
    (0, 0, . . . , 0, 0, 1)^t     ↔     |00 · · · 001i
    (0, 0, . . . , 0, 1, 0)^t     ↔     |00 · · · 010i
    (0, 0, . . . , 0, 1, 1)^t     ↔     |00 · · · 011i
    (0, 0, . . . , 1, 0, 0)^t     ↔     |00 · · · 100i
            ...                              ...
    (x_{n−1}, . . . , x_2, x_1, x_0)^t   ↔   |x_{n−1} · · · x_2 x_1 x_0 i
Again, there is no connection between the respective addition operations, and the
correspondence does not include superposition states of H(n) . Still, the basis states in
H(n) line up with the vectors in (Z2 )n , and that’s the important thing to remember.
16.2.3 The Third Environment: The Group (Z_2^n , ⊕)
Recall the definition of the finite group
    ZN ≡ { 0, 1, 2, . . . , N − 1 } .
Now we’re going to make two modifications to this. First, we’ll restrict the size, N ,
to powers of 2, i.e., N = 2^n, for some n,
    Z_2^n ≡ { 0, 1, 2, . . . , 2^n − 1 } ,
The second change will be to the addition operation. It will be neither normal addition
nor mod-N addition. Instead, we define x + y using the bit-wise ⊕ operator.
If x = xn−1 xn−2 · · · x2 x1 x0 ,
and y = yn−1 yn−2 · · · y2 y1 y0 ,
then x ⊕ y ≡ (xn−1 ⊕ yn−1 ) · · · (x2 ⊕ y2 )(x1 ⊕ y1 )(x0 ⊕ y0 ) .
Note that the RHS of the last line is not a product, but the binary representation
using its base-2 digits (e.g., 11010001101 ). Another way to say it is
    x ⊕ y ≡ Σ_{k=0}^{n−1} (x_k ⊕ y_k) 2^k .
To eliminate any confusion between this group and the same set under ordinary mod-N (mod-2^n) addition, let's call ours by its full name,
    (Z_2^n , ⊕) .
Examples of addition in (Z2n , ⊕) are
1⊕1 = 0,
1⊕2 = 3,
1⊕5 = 4,
4⊕5 = 1,
5 ⊕ 11 = 14,
13 ⊕ 13 = 0, and
15 ⊕ 3 = 12.
Note that for any x ∈ (Z2n , ⊕),
x⊕x = 0, so
x = −x,
i.e., x is its own additive inverse, under bit-wise ⊕.
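On a computer, the ⊕ of (Z_2^n, ⊕) is nothing more than the bitwise XOR of the two encoded integers, so the examples above can each be checked in a single expression. A quick C++ sketch:

#include <iostream>

int main() {
    // Bit-wise mod-2 addition on encoded integers is the ^ operator.
    std::cout << (5 ^ 11)  << std::endl;   // 14
    std::cout << (15 ^ 3)  << std::endl;   // 12
    std::cout << (13 ^ 13) << std::endl;   // 0  (every x is its own inverse)
    return 0;
}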
Connection Between (Z2)^n and (Z_2^n , ⊕)
(Z_2^n, ⊕) and (Z2)^n are fundamentally two ways to express the same group – they are isomorphic in group terminology. This is symbolized by
    (Z_2^n , ⊕) ≅ (Z2)^n ,
under the set and operator association
    x = (x_{n−1} x_{n−2} · · · x_1 x_0)   ←→   x = (x_{n−1}, x_{n−2}, . . . , x_1, x_0)^t ,
    x ⊕ y   ←→   x ⊕ y .
[Exercise. For those of you fixating on the vector space aspect of (Z2 )n , you may
as well satisfy your curiosity by writing down why this makes Z2n a vector space over
Z2 , one that is isomorphic to (effectively the same as) (Z2 )n .]
16.2.4 Interchangeable Notation of H(n), (Z2)^n and (Z_2^n , ⊕)
In practical terms, the relationship between the above three environments allows us to use the bit-vector
    (1, 1, 0, 1, 0)^t ,
the binary string
    11010 ,
and the (encoded) integer
    26
interchangeably, at will. One way we’ll take advantage of this is by using plain int
notation in our kets. For n = 5, for example, we might write any of the four equivalent
expressions,
|26i
|11010i
|1i |1i |0i |1i |0i
|1i ⊗ |1i ⊗ |0i ⊗ |1i ⊗ |0i ,
usually the first or second. Also, we may add notation to designate the number of
qubits under consideration, as in |26i^5. We can even build product states this way, as in
    |26i^5 = |3i^2 ⊗ |2i^3 .
Hazard
Why are these last two equivalent? We must be careful not to confuse the encoded
CBS notation as if it gave coordinates of CBS kets in the natural basis – it does not.
In other words, |3i^2 expressed in natural tensor coordinates is not
    (1, 1)^t .
a) |3i2 is a 4-dimensional vector and requires four, not two, coordinates to express
it, and
b) |3i2 is a CBS, and any CBS ket expressed in its own (natural) basis can have
only a single 1 coordinate, the balance of the column consisting of 0 coordinates.
Therefore, to answer this last question “why are the last two expressions equivalent?”
we must first express all vectors in terms of natural coordinates. That would produce
three column vectors (for |3i2 , |2i3 and |26i5 ) in which all had a single 1 in its
respective column, and only then could we compute the product of two of them,
demonstrating that it was equal to the third.
To head off another possible source of confusion, we must understand why the
tensor product dimension of the two component vectors is not 2 × 3 = 6 contradicting
a possibly (and incorrectly) hoped-for result of 5. Well, the dimensions of these spaces
are not 2, 3, 5 or even 6. Remember that “exponent” to the upper-right of the ket
designates the order of the Hilbert space. Meanwhile, the dimension of each space
is (2)^order, so these dimensions are actually 2^2, 2^3 and 2^5. Now we can see that the
product space dimension, 32, equals the product of the two component dimensions,
4 × 8, as it should.
Most importantly, if x and y are two elements in Z_2^n, we may take their mod-2 sum inside a ket,
    |x ⊕ yi ,
which means that we are first forming x ⊕ y, as defined in (Z_2^n, ⊕), and then using that n-bit answer to signify the CBS associated with it, e.g.,
    |1 ⊕ 5i = |4i ,
    |5 ⊕ 11i = |14i ,   or, designating a qubit size,
    |15 ⊕ 3i^4 = |12i^4   and
    |21 ⊕ 21i^6 = |0i^6 .
Example of Different Notations Applied to Hadamard
Recall that the nth order Hadamard operator's definition, usually given in encoded binary form, is
    H^⊗n |xi^n = (1/√(2^n)) Σ_{y=0}^{2^n − 1} (−1)^{x·y} |yi^n ,
where x·y is the mod-2 dot product based on the individual binary digits in the base-2 representation of x and y,
    x·y ≡ (x_{n−1} y_{n−1}) ⊕ (x_{n−2} y_{n−2}) ⊕ · · · ⊕ (x_0 y_0) .
If we instead label the CBS with the corresponding vectors x, y of (Z2)^n, the same formula reads
    H^⊗n |xi = (1/√(2^n)) Σ_y (−1)^{x·y} |yi ,
where the dot product between vector x and vector y is also assumed to be the mod-2 dot product.
This demonstrates the carefree change of notation often encountered in this and other quantum computing presentations.
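In code, the mod-2 dot product of two encoded integers is just the parity of the population count of their bitwise AND, which makes the Hadamard sign (−1)^{x·y} easy to tabulate. A small C++ sketch, with helper names of my own:

#include <bitset>
#include <iostream>

// Mod-2 dot product of the binary digits of x and y.
int dotMod2(unsigned x, unsigned y) {
    return std::bitset<32>(x & y).count() % 2;
}

// The sign (-1)^(x.y) appearing in the n-qubit Hadamard formula.
int hadamardSign(unsigned x, unsigned y) {
    return dotMod2(x, y) ? -1 : +1;
}

int main() {
    std::cout << hadamardSign(6, 3) << std::endl;   // 6&3 = 2: one 1-bit   -> -1
    std::cout << hadamardSign(5, 2) << std::endl;   // 5&2 = 0: zero 1-bits -> +1
    return 0;
}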
We have completed our review and study of the different notation and language
used for CBS. As we move forward, we’ll want to add more vocabulary that straddles
these three mathematical systems, most notably, periodicity, but that’s best deferred
until we get to the algorithms which require it.
Chapter 17
Quantum Oracles
    |0i^n -- H^⊗n --[ Uf ]-- H^⊗n --
    |0i^n -----------[ Uf ]----------
                     (the Uf box is the quantum oracle)
• extend our input size to cover any dimension for each of the two oracle’s chan-
nels,
• define relativized and absolute time complexity, two different ways of measuring
a quantum algorithm’s improvement over a classical algorithm.
The last item relies on an understanding of the oracle’s time complexity, which is why
it is included in this chapter.
We’ll continue to use “Uf ” to represent the oracle for a Boolean function, f . Even
as we widen the input channels today, Uf will still have two general inputs, an upper,
A or data register and a lower, B or target register.
At the top of the page I’ve included a circuit that solves Simon’s problem (coming
soon). It reminds us how the oracle relates to the surrounding gates and contains a
wider (n qubit) input to the target than we’ve seen up to now.
    |xi ---| Uf |--- |xi
    |yi ---|    |--- |y ⊕ f(x)i
In terms of the effect that the oracle has on the CBS |xi |yi, which we know to be
shorthand for |xi ⊗ |yi, the oracle can be described as
    |xi |yi   −−Uf−→   |xi |y ⊕ f(x)i .
There are a number of things to establish at the start, some review, others new.
0⊕0 = 0
0⊕1 = 1
1⊕0 = 1
1⊕1 = 0
3. The separable state |xi |yi coming in from the left represents a very special and
restricted input: a CBS. Whenever any gate is defined in terms of CBS, we
must remember to use linearity and extend the definition to the entire Hilbert
space. We do this by expanding a general ket, such as a 4-dimensional |ψi2 ,
along the CBS,
    |ψi^2 = c_0 |0i |0i + c_1 |0i |1i + c_2 |1i |0i + c_3 |1i |1i
          = c_0 |0i^2 + c_1 |1i^2 + c_2 |2i^2 + c_3 |3i^2 ,
reading off the output for each of the CBS kets from our oracle description, and
combining those using the complex amplitudes, ck .
4. When a CBS is presented to the input of an oracle like Uf , the output happens
to be a separable state (something not true for general unitary gates as we saw
with the BELL operator). In this case, the separable output is |xi |y ⊕ f (x)i.
Considering the last bullet, we can’t expect such a nice separable product when
we present the oracle with some non-basis state, |ψi2 , at its inputs. Take care
not to make the mistake of using the above template directly on non-CBS inputs.
Oracles are often called black boxes, because we computer scientists don’t care how
the physicists and engineers build them or what’s inside. However, when we specify
an oracle using any definition (above being only one such example), we have to check
that certain criteria are met.
1. The definition of Uf ’s action on the CBS inputs as described above must result
in unitarity. Should you come across a putative oracle with a slightly off-beat
definition, a quick check of unitarity might be in order – a so called “sanity
check.”
2. The above circuit is for two one-qubit inputs (that’s a total of 4-dimensions for
our input and output states) based on a function, f , that has one bit in and
one bit out. After studying this easy case, we’ll have to extend the definition
and confirm unitarity for
• multi-qubit input registers taking CBS of the form |xin and |yim , and
• an f with domain and range larger than the set {0, 1}.
3. The function that we specify needs to be easy to compute in the complexity
sense. A quantum circuit won’t likely help us if a computationally hard function
is inside an oracle. While we may be solving hard problems, we need to find
easy functions on which to build our circuits. This means the functions have to
be computable in polynomial time.
4. Even if the function is easy, the quantum oracle still may be impractical to build
in the near future of quantum computing.
17.2.2 A Two-Qubit Oracle’s Action on the CBS
The nice thing about studying single bit functions, f , and their two-qubit oracles, Uf ,
is that we don’t have to work in abstracts. There are so few options, we can compute
each one according to the definitions. The results often reveal patterns that will hold
in the more general cases.
Notation Reminder – If a is a single binary digit, a̅ = ¬a is its logical negation (AKA the bit-flip or not operation),
    a̅ (or ¬a) ≡ { 0, if a = 1 ;   1, if a = 0 } .
If f ≡ 1, a constant function, we can list all the possible outputs in a four-line table:
    |0i^2 = |0i |0i  ↦  |0i |1i = |1i^2
    |1i^2 = |0i |1i  ↦  |0i |0i = |0i^2
    |2i^2 = |1i |0i  ↦  |1i |1i = |3i^2
    |3i^2 = |1i |1i  ↦  |1i |0i = |2i^2
(Remember, | i^2 does not mean "ket squared", but is an indicator that this is an n-fold tensor product state, where n = 2.)
It’s always helpful to write down the matrix for any linear operator. At the very least
it will usually reveal whether or not the operator is unitary – even though we suspect
without looking at the matrix that Uf is unitary since it is real and is its own inverse.
However, self-invertibility does not always unitarity make, so it’s safest to confirm
unitarity by looking at the matrix.
This is a transformation from 4-dimensions to 4-dimensions, so we need to express
the 4-dimensional basis kets as coordinates. Let’s review the connection between the
four CBS kets of H(2) and their natural basis coordinates:
    component:    |0i |0i         |0i |1i         |1i |0i         |1i |1i
    encoded:      |0i^2           |1i^2           |2i^2           |3i^2
    coordinates:  (1, 0, 0, 0)^t  (0, 1, 0, 0)^t  (0, 0, 1, 0)^t  (0, 0, 0, 1)^t
To obtain the matrix, express each Uf (|xi |yi) as a column vector for each tensor
CBS, |ki^2, k = 0, . . . , 3:
    ( Uf |0i^2   Uf |1i^2   Uf |2i^2   Uf |3i^2 )  =  ( |1i^2   |0i^2   |3i^2   |2i^2 )

           0 1 0 0
        =  1 0 0 0
           0 0 0 1
           0 0 1 0 .
Aha. These rows (or columns) are clearly orthonormal, so the matrix is unitary
meaning the operator is unitary.
It will be useful to express this matrix in terms of the 2 × 2 Pauli matrix, σx ,
associated with the X gate.
    Uf =  ( σx  0  )       ( σx      )
          ( 0   σx )   =   (      σx )
(in the second form the zero blocks are left blank).
This form reveals a pattern that you may find useful going forward. Whenever a
square matrix M can be broken down into smaller unitary matrices along its diagonal
(0s assumed elsewhere), M will be unitary.
[Exercise. Prove the last statement.]
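The same bookkeeping is easy to automate: build the 4 × 4 matrix of Uf column by column from the rule |xi|yi ↦ |xi|y ⊕ f(x)i and then check that the columns are orthonormal. A C++ sketch of that check (the function names are mine, and f can be swapped for any of the four one-bit functions):

#include <iostream>

int f(int x) { return 1; }                   // the constant function f = 1

int main() {
    // Column k of Uf is the image of the CBS |k> = |x>|y>, with x = k/2, y = k%2.
    int U[4][4] = {};
    for (int k = 0; k < 4; ++k) {
        int x = k / 2, y = k % 2;
        int image = 2 * x + (y ^ f(x));      // encoded |x>|y XOR f(x)>
        U[image][k] = 1;
    }

    // Unitarity check for a real 0/1 matrix: columns must be orthonormal.
    bool unitary = true;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            int dot = 0;
            for (int r = 0; r < 4; ++r) dot += U[r][i] * U[r][j];
            if (dot != (i == j ? 1 : 0)) unitary = false;
        }
    std::cout << (unitary ? "unitary" : "not unitary") << std::endl;
    return 0;
}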
Now, let f ≡ x, the identity function. We repeat the process of the first case study
and list all the possible outputs. For reference, here are the key mappings again:
    |xi |0i   −−Uf−→   |xi |f(x)i ,
    |xi |1i   −−Uf−→   |xi |¬f(x)i .
Now, the table for f ≡ x:
    |0i^2 = |0i |0i  ↦  |0i |0i = |0i^2
    |1i^2 = |0i |1i  ↦  |0i |1i = |1i^2
    |2i^2 = |1i |0i  ↦  |1i |1i = |3i^2
    |3i^2 = |1i |1i  ↦  |1i |0i = |2i^2
17.3 Integers Mod-2 Review
17.3.1 The Classical f at the Heart of an Oracle
We will be building oracles based on classical Boolean functions that might have many
inputs and one output,
    f : { 0, 1 }^n → { 0, 1 } ,
or, more generally, many inputs and many outputs,
    f : { 0, 1 }^n → { 0, 1 }^m .
    Z_2^n = { 0, 1, 2, 3, . . . , 2^n − 1 }
Binary Vector Notation: (Z2)^n
The Natural Basis for H(n)
Seen below, we can use either encoded or binary form to write these basis kets.
    |0i^n  ←→  |0 · · · 000i ,
    |1i^n  ←→  |0 · · · 001i ,
    |2i^n  ←→  |0 · · · 010i ,
    |3i^n  ←→  |0 · · · 011i ,
    |4i^n  ←→  |0 · · · 100i ,
        ...
    |2^n − 1i^n  ←→  |1 · · · 111i .
We also need to give meaning to the ket
    |y ⊕ f(x)i
when y and f(x) are more than just labels for the CBS of the 2-dimensional H, 0 and 1. Of course, when they are 0 and 1, we know what to do:
|1 ⊕ 0i = |1i or
|1 ⊕ 1i = |0i .
But when we are in a higher dimensional Hilbert space that comes about by studying
a function sending Z_2^n into Z_2^m, we'll remember the above correspondence. For example,
    |5 ⊕ 11i^4 = |14i^4 .
oracles.
17.4 Intermediate Oracle: f is a Boolean Function of Multiple Input Bits
17.4.1 Circuit
A Wider Data Register
    |xi^n ---| Uf |--- |xi^n
    |yi   ---|    |--- |y ⊕ f(x)i
The only changes from the simplest oracle are that i) the data register input, x, is now commensurate with the input of f (namely, an n-bit word), and ii) the y to be "⊕-ed" with f(x) is commensurate with the output value of f (namely, a single binary bit).
The analysis of the simplest oracle taught us that Uf was expressible as a unitary
4 × 4 matrix with cute, little unitary 2 × 2 matrices (either 1 or σx ) along its diagonal.
It will turn out this intermediate-level oracle is nothing more than many copies of
those same 2 × 2s affixed to a longer diagonal, one traversing a 2n+1 × 2n+1 matrix.
470
17.4.3 Analyzing Uf for x = 0
For the moment, we won’t commit to a specific f , but we will restrict our attention
to the input x = 0.
Uf
|0in |0i 7−→ |0in |f (0)i ,
Uf
|0in |1i 7−→ |0in |f (0)i .
f (0) can be either 0 or 1, and y can be either 0 or 1, giving four possible combinations:
This is a little different from our previous table. Rather than completely determining
a 4 × 4 matrix for a particular f , it gives us two possible 2 × 2 sub-matrices depending
on the value of f (0).
1. When f (0) = 0, the first two columns of the matrix for Uf will be (see upper
two rows of table):
Uf |0in+1 Uf |1in+1 · · ·
= |0in+1 |1in+1 · · ·
1 0
0 1
= 0 0 ?
.. ..
. .
0 0
1
?
=
0 ?
2. When f (0) = 1, the first two columns of the matrix for Uf will be (see lower
471
two rows of table):
Uf |0in+1 Uf |1in+1 · · ·
= |1in+1 |0in+1 · · ·
0 1
1 0
= 0 0 ?
.. ..
. .
0 0
σx
?
=
0 ?
472
We computed the first two columns of the matrix for Uf before, and now we
compute two columns of Uf further to the right.
[In Case You Were Wondering. Why did the single value x = 0 produce two
columns in the matrix? Because there were two possible values for y, 0 and 1, which
gave rise to two different basis kets |0in |0i and |0in |1i. It was those two kets that
we subjected to Uf to produce the first two columns of the matrix. Same thing here,
except now the two basis kets that correspond to the fixed x under consideration are
|xin |0i and |xin |1i, and they will produce matrix columns 2x and 2x + 1.]
1. When f (x) = 0, columns 2x and 2x + 1 of the matrix for Uf will be (see upper
two rows of table):
· · · Uf (|xin |0i) Uf (|xin |1i) · · ·
· · · |xin |0i |xin |1i · · ·
2x 2x+1
z }| {
0 0
.. ..
. .
)
1 0
2x
=
0 1
2x+1
.. ..
. .
0 0
2x 2x+1
z }| {
0
2x
=
1
2x+1
0
2. When f (0) = 1, the first two columns of the matrix for Uf will be (see lower
473
two rows of table):
· · · Uf (|xin |0i) Uf (|xin |1i) · · ·
· · · |xin |1i |xin |0i · · ·
2x 2x+1
z }| {
0 0
.. ..
. .
)
0 1
2x
=
1 0
2x+1
.. ..
. .
0 0
2x 2x+1
z }| {
0
2x
σx
=
2x+1
0
As you can see, for any x the two columns starting with column 2x contain all 0s
away from the diagonal and are either the 2 × 2 identity 1 or the Pauli matrix σx on
the diagonal, giving the matrix the overall form
[1 or σx ]
[1 or σx ]
0
... .
0 [1 or σx ]
[1 or σx ]
474
17.5 Advanced Oracle: f is a Multi-Valued func-
tion of a Multiple Input Bits
17.5.1 Circuit and Vocabulary
A Wider Target Register
|xin |xin
Uf
m
|yi |y ⊕ f (x)im
which arises when the function under study is an f that maps Z2n → Z2m . In this
case it only makes sense to have an m-qubit B register, as pictured above, otherwise
the sum inside the bottom right output, |y ⊕ f (x)i would be ill-defined.
Sometimes m = n
Often, for multi-valued f , we can arrange things so that m = n and, in that case the
circuit will be
|xin |xin
Uf
n
|yi |y ⊕ f (x)in
|xin |xin
Uf
2
|yi |y ⊕ f (x)i2
This gives a total of 2n+2 possible input CBS going into the system, making the
dimension of the overall tensor product space, H(n+2) , = 2n+2 . The matrix for Uf
will, therefore, have size (2n+2 × 2n+2 ).
475
sub-matrix compared to the 2 × 2 sub-matrix we got when studying a fixed x in the
intermediate case.
The maneuver that we have to apply here, which was not needed before, is the
application of the ⊕ operator between y and f (x) on a bit-by-bit basis. For m = 2
(and using the notation f (x)k to mean the kth digit of the number f (x)), we get
The second line is the definition of ⊕ in Z22 , and the (· · ·)(· · ·) inside the ket is not
multiplication but the binary expansion of the number y1 y0 ⊕ f (x)1 f (x)0 . Thus, our
four combinations of the 2-bit number y with the fixed value f (x) become
Uf
|xin |0i |0i 7−→ |xin |0 ⊕ f (x)1 i |0 ⊕ f (x)0 i ,
Uf
|xin |0i |1i 7−→ |xin |0 ⊕ f (x)1 i |1 ⊕ f (x)0 i ,
Uf
|xin |1i |0i 7−→ |xin |1 ⊕ f (x)1 i |0 ⊕ f (x)0 i , and
Uf
|xin |1i |1i 7−→ |xi |1 ⊕ f (x)1 i |1 ⊕ f (x)0 i .
Applying 0⊕a = a and 1⊕a = a to the RHS of these equalities produces the identities
that will drive the math,
Uf
|xin |0i |0i 7−→ |xin |f (x)1 i |f (x)0 i ,
Uf
|xin |0i |1i 7−→ |xin |f (x)1 i |f (x)0 i ,
Uf
|xin |1i |0i 7−→ |xin |f (x)1 i |f (x)0 i , and
Uf
|xin |1i |1i 7−→ |xin |f (x)1 i |f (x)0 i .
Of course, for a general f , there are now four possible values for f (x), and we
have to combine those with the four possible values for y, yielding a whopping 16
combinations for this fixed x, as the following table reveals.
476
|xin |yi2 Uf |xin |yi2
f (x)
= · · · |xin |0i |0i |xin |0i |1i |xin |1i |0i |xin |1i |1i · · · ,
477
which translates to the matrix
4x → 4x+3
z }| {
0 0 0 0
.. .. .. ..
. . . .
1 0 0 0
4x
0 1 0 0
−→
0 0 1 0
4x+3
0 0 0 1
.. .. .. ..
. . . .
0 0 0 0
4x→4x+3
z }| {
0
4x →
1
4x+3
0
4x → 4x+3
z }| {
0
4x
1 0
−→
0 1
4x+3
0
478
2. Let’s skip to the case f (x) = 3, and use the table to calculate columns 4x
through 4x + 3 of the matrix for Uf (see bottommost four rows of table):
= · · · |xin |1i |1i |xin |1i |0i |xin |0i |1i |xin |0i |0i · · · ,
4x → 4x+3
z }| {
0 0 0 0
.. .. .. ..
. . . .
0 0 0 1
4x
0 0 1 0
−→
0 1 0 0
4x+3
1 0 0 0
.. .. .. ..
. . . .
0 0 0 0
4x → 4x+3
z }| {
0
4x
0 σx
−→
σx 0
4x+3
0
479
3. Exercise: When f (x) = 1, show that columns 4x → 4x + 3 of Uf are
4x → 4x+3
z }| {
0
4x
σx 0
−→
0 σx
4x+3
0
4x → 4x+3
z }| {
0
4x
0 1
−→
1 0
4x+3
0
Summary of Case m = 2
We’ve covered all four possible values for f (x) and done so for arbitrary x = 0, . . . , 2n −
1. What we have discovered is that
1 σx 1 σx
, , , and ;
1 σx 1 σx
480
• since, for each x, its 4 × 4 unitary matrix is in rows 4x → 4x + 3, with zeros
above and below, it follows (exercise) that there can only be zeros to the left
and right on those rows.
Once again, even though we expected Uf to be unitary, its matrix can be seen
to exhibit this property on its own merit, and the only non-zero elements are on
4 × 4 sub-matrices that lie along the diagonal. This characterizes all oracles for
f : Z2n → Z22 .
• Every column is a CBS ket since it is the separable product of CBS kets.
• All CBS kets are normal vectors (all coordinates are 0 except one, which is 1).
• If two columns were the identical, Uf would map two different CBS kets to the
same CBS ket.
• Any matrix that maps two different CBS kets to the same CBS ket cannot be
invertible.
• Since Uf is its own inverse, it is invertible, so by last two bullets, all columns
are distinct unit vector.
• The inner product of these columns with themselves is 1 and with other columns
is 0: the matrix has orthonormal columns. QED
Of course, we learn more by describing the form of the matrices, so the activities
of this chapter have value beyond proving unitarity.
This is where we should apply our intuition and extrapolate the m = 2 case to all
m. Each time m increases by 1, the number of columns in Uf controlled by a single
input value x doubles. For m = 3, we would have 8 × 8 unitary matrices along the
diagonal, each built from appropriate combinations of σx and 1. For m = 4, we would
have unitary 16 × 16 matrices along the diagonal. And when m = n, we would have
481
n n
2n × 2n unitary sub-matrices along the diagonal of a really large 22 × 22 matrix
for Uf . Our explicit demonstrations in the m = 1 and 2 cases can be duplicated to
any m with no theoretical difficulty. We’d only be dealing with lengthier tables and
larger sub-matrices. So if the number of output bits of f is m, for m > 2, the results
are the same: Uf is unitary and it consists of all 0s except near the diagonal where
sub-matrices are built from combinations of σx and 1.
482
17.6 The Complexity of a Quantum Algorithm Rel-
ative to the Oracle
In this course, we are keeping things as simple as possible while attempting to provide
the key ideas in quantum computation. To that end, we’ll make only one key classi-
fication of oracles used in algorithms. You will undoubtedly explore a more rigorous
and theoretical classification in your advanced studies.
The above constructions all demonstrated that we can take an arbitrary function, f ,
and, in theory, represent it as a reversible gate associated with a unitary matrix. If
you look at the construction, though, you’ll see that any function which requires a full
table of 2n values to represent it (if there is no clear analytical short-cut we can use
to compute it) will likewise end up with a similarly complicated oracle. The oracle
would need an exponentially large number of gates (relative to the number of binary
inputs, n). An example will come to us in the form of Simon’s algorithm where we
have a Z2n -periodic function (notation to be defined) and seek to learn its period.
However, there are functions which we know have polynomial complexity and can
be realized with a correspondingly simple oracle. An example of this kind of oracle
is that which appears in Shor’s factoring algorithm (not necessarily Shor’s period-
finding algorithm). In a factoring algorithm, we know a lot about the function that
we are trying to crack and can analyze its specific form, proving it to be O(n3 ). Its
oracle will also be O(n3 ).
483
Chapter 18
484
In the problem at hand, we are given a function and asked to find its period. How-
ever, the function is not a typical mapping of real or complex numbers, and the period
is not the thing that you studied in your calculus or trigonometry classes. Therefore,
a short review of periodicity, and its different meanings in distinct environments, is
in order.
18.2 Periodicity
We’ll first establish the meaning of periodicity in a typical mathematical context –
the kind you may have seen in past calculus or trig course. Then we’ll define it in a
more exotic mod-2 environment Z2n ∼ n
= (Z2 ) .
f (x + a) = f (x) ,
and 2π is the smallest positive number with this property, so a = 2π is its period. 4π
and 12π satisfy the equality, but they’re not as small as 2π, so they’re not periods.
18.2.2 Z2 )n Periodicity
(Z
Let’s change things up a bit. We define a different sort of periodicity which respects
not ordinary addition, but mod-2 addition.
485
Z2 )n periodic if there exists an a ∈ (Z2 )n , a 6= 0, such that,
is called (Z
18.2.3 Z 2n Periodicity
f is Z 2n periodic if there exists an a ∈ Z2n , a 6= 0, such that,
Z2 )n Periodicity
Examples of (Z
We implied that the range of f could be practically any set, S, but for the next few
examples, let’s consider functions that map Z2n into itself,
f : Z2n −→ Z2n
486
Collapsing One Bit – Periodic
A simple (Z2 )n periodic function is one that preserves all the bits of x except for one,
say the kth, and turns that one bit into a constant (either 0 or 1).
Let n = 5, k = 2, and the constant, 1. This is a “2nd bit collapse-to-1,” alge-
braically
x4
x3
f (x) ≡ 1 = x4 x3 1 x1 x0
x1
x0
If n = 5, k = 4, and the constant was 0, we’d have a “4th bit collapse-to-0,”
0
x3
g(x) ≡ x2 = 0 x3 x1 x1 x0 .
x1
x0
Let’s show why the first is (Z2 )5 periodic with period 4, and this will tell us why all
the others of its ilk are periodic (with possibly a different period), as well.
Notation - Denote the bit-flip operation on the kth bit, xk , to mean
(
0, if xk = 1,
xk ≡
1, if xk = 0.
[Exercise. Demonstrate that you can effect a bit-flip on the kth bit of x using x⊕2k .]
I claim that the “2nd-bit collapse-to-1” function, f , is (Z2 )n periodic with period
a = 4. We must show
f (x) = f (y) ⇔ y = x ⊕ 4.
• ⇐: Assume y = x ⊕ 4. That means (in vector notation)
x4 y4 x4
x3 y 3 x3
x = x2 and y = y 2 = x2 .
x1 y 1 x1
x0 y0 x0
Then,
x4 x4
x3 x3
1 ,
f (x) = but, also 1 ,
f (y) =
x1 x1
x0 x0
proving f (x) = f (y).
487
• ⇒ : Assume f (x) = f (y) for x 6= y. Since f does not modify any bits other
than bit 2, we conclude yk = xk , except for (possibly) bit k = 2. But since
x 6= y, some bit must be different between them, so it has to be bit-2. That is,
y4 x4
y 3 x3
y = y 2
= x2 = x ⊕ 4,
y 1 x1
y0 x0
showing that y = x ⊕ 4. QED
Of course, we could have collapsed the 2nd bit to 0, or used any other bit, and gotten
the same result. A single bit collapse is (Z2 )n periodic with period 2k , where k is the
bit being collapsed.
Note - With single bit collapses, there are always exactly two numbers from the
domain Z2n that map into the same range number in Z2n . With the example, f , just
discussed,
)
00100
7−→ 00100 ,
00000
)
10101
7−→ 10101 ,
10001
)
11111
7−→ 11111 ,
11011
and so on.
A collapse of two or more bits can never be periodic, as we now show. For illustration,
let’s stick with n = 5, but consider a simultaneous collapse-to-1 of both the 2nd and
0th bit,
x4
x3
f (x) ≡ 1 = x4 x3 1 x1 1
x1
1
In this situation, for any x in the domain of f , you can find three others that map to
the same f (x). For example,
00000 = 0
00001 = 1
7−→ 00101 = 5 .
00100 = 4
00101 = 5
488
If there were some period, a, for this function, it would have to work for the first two
listed above, meaning, f (0) = f (1). For that to be true, we’d need 1 = 0 ⊕ a, which
forces a = 1. But a = 1 won’t work when you consider the first and third x, above:
f (0) = f (4), yet 4 6= 0 ⊕ 1.
As you can see, this comes about because there are too many xs that get mapped
to the same f (x).
Let’s summarize: bit collapsing gives us a periodic function only if we collapse
exactly one bit (any bit) to a constant (either 0 or 1). However, a function which is
a “multi-bit-collapser” (preserving the rest) can never be periodic.
So, what are the other periodic functions in the Z2n milieu?
n-To-One
(In this small section we will set up a concept that you’ll find useful in many future
quantum algorithms: the partitioning of dom(f ) into subsets.)
We now characterize all Z2n periodic functions.
Let’s say you have one in your hands, call it f , and you even know its period, a.
Buy three large trash bins at your local hardware store. Label one R, one Q and the
third, S (for S ource pool). Dump all x ∈ dom(f ) (which is Z2n ) into the source pool,
489
S. We’re going to be moving numbers from the source, S, into R and Q according to
this plan:
1. Pick any x ∈ S = Z2n . Call it r0 (0, because it’s our first pick).
3. Toss r0 into bin R and its partner, q0 , (which you may have to dig around in S
to find) into bin Q.
5. Pick a new x from what’s left of S. Call it r1 (1, because it’s our second pick).
7. Toss r1 into bin R and its partner, q1 , into bin Q. S is further reduced by two
and is now S = S − {r0 , q0 , r1 , q1 }.
8. Repeat this activity, each pass moving one value from bin S into bin R and its
partner into bin Q until we have none of the original domain numbers left in S.
9. When we’re done, half of the xs from dom(f ) will have ended up in R and the
other half in Q. (However, since we chose the first of each pair at random, this
was not a unique division of dom(f ), but that’s not important.)
Here is the picture, when we’re done with the above process:
Z2n = R ∪ Q
= { . . . , rk , . . .} ∪ { . . . , qk , . . . }
f (rk ) = f (qk )
1. Every periodic function in the Z2n sense is 2-to-1. If a function is not 2-to-1, it
doesn’t have a prayer of being periodic.
490
2. We have a way to produce arbitrary Z2n periodic functions. Pick a number
you want to be the period and call it a. Start picking rk numbers from Z2n
at random, each pick followed immediately by the computation of its partner,
qk = rk ⊕ a. Assign any image value you like to these two numbers. (The
assigned image values don’t even have to come from Z2n – they can be the
chickens in your yard or the contacts in your phone list. All that matters is
that once you use one image value for an rk − qk pair, you don’t assign it to any
future pair.) This will define a Z2n periodic function with period a.
Pay particular attention to bullet 2. If we get a single pair that map to the same
f (x) = f (y), we will have our a. This will be used in the classical (although not
quantum) analysis.
491
18.4 Simon’s Quantum Circuit Overview and the
Master Plan
18.4.1 The Circuit
A bird’s eye view of the total circuit will give us an idea of what’s ahead.
|0in / H ⊗n / / H ⊗n /
Uf
|0in / /
You see a familiar pattern. There are two multi-dimensional registers, the upper
(which I will call the A register, the data register or even the top line, at my whim),
and the lower (which I will call the B register, target register or bottom line, corre-
spondingly.)
This is almost identical to the circuits of our recent algorithms, with the following
changes:
• The target channel is “hatched,” reflecting that it has n component lines rather
than one.
• We are sending a |0in into the bottom instead of a |1i (or even |1in ).
In fact, that third bullet concerning measuring the bottom register will turn out to be
conceptual rather than actual. We could measure it, and it would cause the desired
collapse of the upper register, however our analysis will reveal that we really don’t
have to. Nevertheless, we will keep it in the circuit to facilitate our understanding,
label it as “conceptual,” and then abandon the measurement in the end when we are
certain it has no practical value.
Here is the picture I’ll be using for the remainder of the lesson.
|0in H ⊗n H ⊗n
Uf
| {z }
(actual)
|0in
| {z }
(conceptual)
[Note: I am suppressing the hatched quantum wires to produce a cleaner circuit.
Since every channel is has n lines built-into it and we clearly see the kets and operators
labeled with the “exponent” n, the hatched wires no longer serve a purpose.]
492
18.4.2 The Plan
We will prepare a couple CBS kets for input to our circuit, this time both will be |0in .
The data channel (top) will first encounter a multi-dimensional Hadamard gate to
create a familiar superposition at the top. This sets up quantum parallelism which we
found to be pivotal in past algorithms. The target channel’s |0in will be sent directly
into the oracle without pre-processing. This is the first time we will have started with
a |0i rather than a |1i in this channel, a hint that we’re not going to get a phase
kick-back today. Instead, the generalized Born rule, (QM Trait #15”) will turn out
to be our best friend.
[Preview: When we expect to achieve our goals by applying the Born rule to a
superposition, the oracle’s target register should normally be fed a |0in rather than
a |1in .]
After the oracle, both registers will become entangled.
At that point, we conceptually test the B register output. This causes a collapse
of both the top and bottom lines’ states (from the Born rule), enabling us to know
something about the A register. We’ll analyze the A register’s output – which re-
sulted from this conceptual B register measurement – and discover that it has very
special properties. Post processing the A register output by a second “re-organizing”
Hadamard gate will seal the deal.
In the end, we may as well have measured the A register to begin with, since
quantum entanglement authorizes the collapse using either line, and the A register is
what we really care about.
Strategy
Our strategy will be to “load the dice” by creating a quantum circuit that spits out
measured states which are “orthogonal” to the period a, i.e., z · a = 0 mod-2. (This is
not a true orthogonality, as we’ll see, but everyone uses the term and so shall we. We’ll
discover that states which are orthogonal to a can often include the “vector” a, itself,
another reason for the extra care with which we analyze our resulting “orthogonal”
states.)
That sounds paradoxical; after all, we are looking for a, so why search for states
orthogonal to it? Sometimes in quantum computing, it’s easier to back-into the
desired solution by sneaking up on it indirectly, and this turns out to be the case in
Simon’s problem. You can try to think of ways to get a more directly, and if you
find an approach that works with better computational complexity, you may have
discovered a new quantum algorithm. Let us know.
Because the states orthogonal to a are so much more likely than those that are not,
we will quickly get a linearly independent set of n − 1 equations with n − 1 unknowns,
namely, a·wk = 0, for k = 0, . . . , n−2. We then augment this system instantly (using
a direct classical technique) with an nth linearly independent equation, at which point
we can solve the full, non-degenerate n × n system for a using fast and well-known
493
techniques.
|0in H ⊗n H ⊗n
Uf
| {z }
(actual)
|0in
| {z }
(conceptual)
A B C
|0in H ⊗n H ⊗n
Uf
|0in
Hadamard, H ⊗n , in H(n)
494
It never hurts to review the general definition of a gate like H ⊗n . For any CBS |xin ,
the 2n -dimensional Hadamard gate is expressed in encoded form using the formula,
n 2Xn −1
n 1
H ⊗n |xi = √ (−1)x y |yin ,
2 y=0
where is the mod-2 dot product. Today, I’ll be using the alternate vector notation,
n 2X n −1
n 1
⊗n
H |xi = √ (−1)x · y |yin ,
2 y=0
where the dot product between vector x and vector y is also assumed to be the mod-2
dot product. In the circuit, we have
n −1
n 2X
|xin
H ⊗n √1
2
(−1)x · y |yin ,
y=0
or, returning to the usual computational basis notation, |xin , for the summation index
is
n −1
n 2X
n
|0i H ⊗n √1
2
|xin .
x=0
You’ll recognize the output state of this Hadamard operator as the nth order x-basis
CBS ket, |0in± . It reminds us that not only do Hadamard gates provide quantum
parallelism but double as a z ↔ x basis conversion operator.
|0in H ⊗n H ⊗n
Uf
n
|0i
| {z }
Quantum Oracle
495
Due to the increased B channel width, we had better review the precise definition of
the higher dimensional oracle. It’s based on CBS kets going in,
|xin |xin
Uf
n
|yi |y ⊕ f (x)in
Uf
|xin |yin 7−→ |xin |y ⊕ f (x)in ,
and from there we extend to general input states, linearly. We actually constructed
the matrix of this oracle and proved it to be unitary in our lesson on quantum oracles.
Today, we need only consider the case of y = 0:
|xin |xin
Uf
n
|0i |f (x)in
Uf
|xin |0in 7−→ |xin |f (x)in
In Words
We are
1. taking the B register CBS input |yin , which is |0in in this case, and extracting
the integer representation of y, namely 0,
2. applying f to the integer x (of the A register CBS |xin ) to form f (x),
4. forming the mod-2 sum of these two integers, 0 ⊕ f (x), which, of course is f (x),
and
5. using the result to define the output of the oracle’s B register, |f (x)in .
Just to remove any lingering doubts, assume n = 5, x = 18, and f (18) = 7. Then
the above process yields
1. |0i5 −→ 0
f (18)=7
2. |18i5 −→ 18 −−−−−−−−→ 7
496
3. 0 ⊕ 7 = 00000 ⊕ 00111 = 00111 = 7,
4. 7 −→ |7i5
Uf
5. |18i5 ⊗ |0i5 −−−−−−→ |18i5 ⊗ |7i5
This is a weighted sum of separable √ products. (The weights are the same for each
n
separable product in the sum: (1/ 2) .) That sum is not, as a whole, separable,
which makes it impossible to visualize directly on the circuit diagram unless we com-
bine the two outputs into a single, entangled, output register. However we do have
an interpretation that relates to the original circuit.
|f (x)in
• The output state is a superposition of separable terms |xin √
( 2)n
. But this is
exactly the kind of sum the generalized Born rule needs,
497
√ n
• Each of the 2n orthogonal terms in the superposition has amplitude 1/ 2 ,
so the probability that
a√measurement by A will collapse the superposition to
n 2
any one of them is 1/ 2 = 1/2n .
[Exercise. Why are the terms orthogonal? Hint: inner product of tensors.]
n −1
2P
[Exercise. Look at the sum in our specific situation: |xin |f (x)in . QM
x=0
Trait #6 (Probability of Outcomes) assumes we start with an expansion along some
computational basis, but this expansion only has 2n terms, so can’t be a basis for
A ⊗√B which has 2n · 2n = 22n basis vectors. Why can we claim that the scalars
n
1/ 2 are amplitudes of collapse to one of these states? Hint: While the kets
in the sum do not comprise a full basis, they are all distinct CBS kets. (Are they
distinct? each f (x) appears twice in the sum. But that doesn’t destroy the linear
independence of the tensor CBS kets in that sum, because . . . .) Therefore, we can
add the missing CBS kets into the sum as long as we accompany them by 0 scalar
weights. Now we can apply Trait #6.]
n −1
2P
[Exercise. Look at the the more general sum in the Born rule: |xinA |ψx im
B .
x=0
QM Trait #6 (Probability of Outcomes) assumes we start with an expansion along
some computational basis, but this expansion only has 2n terms, so can’t be a basis
A ⊗ B which has 2n · 2m = 2n+m basis vectors. Why can we claim that the scalars
for √
n
1/ 2 are amplitudes of collapse to one of these states? Hint: Force this to be
an expansion along the tensor CBS by expanding each |ψx im B along the B-basis then
distributing the |xinA s. Now we meet the criterion of Trait #6.]
Z2n = R ∪ Q
= { · · · , x, · · · } ∪ { · · · , x ⊕ a, · · · } .
Cosets
498
18.6.5 Rewriting the Output of the Oracle’s B Register
Our original expression for the oracle’s complete entangled output was
2n −1
n X
1
√ |xin |f (x)in ,
2
x=0
but our new partition of the domain will give us a propitious way to rewrite this.
Each element x ∈ R has a unique partner in Q satisfying
x
f
7−→ f (x) .
x ⊕ a x∈R
Using this fact, we only need to sum the B register output over R (half as big as Z2n )
and include both x and x ⊕ a in each term,
2n −1
n X n X
1 n n 1
√ |xi |f (x)i = √ |xin + |x ⊕ ain |f (x)in
2 2
x=0 x∈R
n−1 X n
|xi + |x ⊕ ain
1
= √ √ |f (x)in
2 2
x∈R
√
I moved one of the factors of 1/ 2 into the sum so we could see
1. how the new sum consists of normalized states (length 1), and
2. the common amplitude remaining on the outside, nicely produces a normalized
state overall, since now there are half as many states in the sum, but each state
has twice the probability as before.
The last rearrangement of the sum had a fascinating consequence. While the terms
still consist of separable products from the A and B channels, now it is the B channel
that has basis CBS kets, and the A channel that does not.
How can we see this? Each |f (x)in was always a CBS state – f (x) is an integer
from 0 to 2n − 1 and so corresponds to a CBS ket – but the original sum was plagued
by its appearing twice (destroying potential linear independence of the terms), so we
couldn’t view the sum as an expansion along the B-basis. After consolidating all
the pre-image pairs, we now have only one |f (x)in term for each x in the sum: we
have factored the expansion along the B-basis. In the process, the A-factors became
superpositions of A CBS kets, now mere |ψx inA s in the general population. This
reverses the roles of A and B and allows us to talk about measuring B along the
CBS, with that measurement selecting one of the non-CBS A factors.
499
The upshot is that we can apply the Born rule in reverse; we’ll be measuring the B
register and forcing the A register to collapse into one of its “binomial” superpositions.
Let’s do it. But first, we should give recognition to a reusable design policy.
This all worked because we chose to send the CBS |0in into the oracle’s B register.
Any other CBS into that channel would not have created the nice terms |xin |f (x)in
of the oracle’s entangled output. After factoring out terms that had common |f (x)in
components in the B register, we were in a position to collapse along the B-basis and
pick out the attached sum in the A register.
Remember this. It’s a classic trick that can be tried when we want to select a
small subset of A register terms from the large, perfectly mixed superposition in that
register. It will typically lead to a probabilistic outcome that won’t necessarily settle
the algorithm in a single evaluation of the oracle, but we expect it to give a valuable
result that can be combined with a few more evaluations (loop passes) of the oracle.
This is the lesson we learn today that will apply next time when we study Shor’s
period-finding algorithm.
In contrast, when we were looking for a deterministic solution in algorithms like
Deutsch-Jozsa and Bernstein-Vazirani, we fed a |1i into the B register and used the
phase kick-back to give us an answer in a single evaluation.
|1i into oracle’s B register −→ phase kick-back ,
|0in into oracle’s B register −→ Born rule .
Conceptual
Each B register measurement of “f (x)” will be attached to not one, but two, input A
register states. Thus, measuring B first, while collapsing A, actually produces merely
500
a superposition in that register, not a single, unique x from the domain. It narrows
things down considerably, but not completely,
n−1 X |xin + |x ⊕ ain
√1
2
√ |f (x)in
x∈R
2
|x0 in + |x0 ⊕ ain
& √ |f (x0 )in
2
Here, & means collapses to.
Well that’s good, great and wonderful, but if after measuring the post-oracle B regis-
ter, we were to measure line A, it would collapse to one of two states, |x0 i or |x0 + ai,
but we wouldn’t know which nor would we know its unsuccessful companion (the one
to which the state didn’t collapse). There seems to be no usable information here.
As a result we don’t measure A ... yet.
Let’s name the collapsed – but unmeasured – superposition state in the A register
|ψx0 in , since it is determined by the measurement “f (x0 )” of the collapsed B register,
Guiding Principle: Narrow the Field. We stand back and remember this
stage of the analysis for future use. Although a conceptual measurement of B does not
produce an individual CBS ket in register A, it does result in a significant narrowing
of the field. This is how the big remaining quantum algorithms in this course will
work.
|0in H ⊗n H ⊗n
Uf
|0in
[Apology. I can’t offer a simple reason why anyone should be able to “intuit” a
Hadamard as the post-oracle operator we need. Unlike Deutsch-Jozsa, today we
are not measuring along the x-basis, our motivation for the final H ⊗n back then.
However, there is a small technical theorem about√a Hadamard applied to a binomial
superposition of CBSs of the form (|xin + |yin ) / 2 which is relevant, and perhaps
this inspired Simon and his compadres.]
501
Continuing under the assumption that we measure an f (x0 ) at the B register out-
put, thus collapsing both registers, we go on to work with the resulting superposition
|ψx0 i in the A register. Let’s track its progress as we apply the Hadamard gate to it.
As with all quantum gates, H ⊗n is linear so moves past the sum, and we get
and
2n −1
n X
1
H ⊗n |x0 ⊕ ain = √ (−1)y · (x0 ⊕ a) |yin
2
y=0
(−1)y · (x0 ⊕ a) .
1. The mod-2 dot product distributes over the mod-2 sum ⊕, and
2. (−1)p ⊕ q = (−1)p (−1)q , for p and q in Z2n . (Danger: while it’s true for a
base of -1, it is not for a general complex base c).
[Exercise. Prove the second identity. One idea: Break it into two cases, p ⊕ q
even and p ⊕ q odd, then consider each one separately.]
[Exercise. Find a complex c, for which (c)p ⊕ q 6= (c)p (c)q . Hint: Use the
simplest case, n = 1.]
Combining both facts, we get
502
so
H ⊗n |x0 in + H ⊗n |x0 ⊕ ain
√
2
n −1
n 2X
√1
2
(−1)y · x0 1 + (−1)y · a |yin
y=0
= √
2
2n −1
n+1 X
1
= √ (−1)y · x0 1 + (−1)y · a |yin .
2
y=0
so we can omit all those 0 terms which correspond to y · a = 1 (mod 2), leaving
n n n−1 X
⊗n |x0 i + |x0 ⊕ ai 1
H √ = √ (−1)y · x0 |yin .
2 2 y·a = 0
(mod 2)
n−1
Note that the sum is now over only 2 , or exactly half, of the original 2n CBSs.
[Exercise. Show that, for any fixed number a ∈ Z2n (or its equivalent vector
a ∈ (Z2 )n ) the set of all x with x a = 0 (or x with x · a = 0) is exactly half the
numbers (vectors) in the set.]
Avoid confusion. Don’t forget that these dot products (like y · a) are mod-2 dot
products of vectors in (Z2 )n . This has nothing to do with the Hilbert space inner
product nhy | ain , an operation on quantum states.
503
When we talk about orthogonality of vectors or numbers at this stage of the
analysis, we mean the mod-2 dot product, not the state space inner product. Indeed,
there is nothing new or interesting about Hilbert space orthogonality at this juncture:
CBS kets always form an orthonormal set, so each one has inner product = 1 with
itself and inner product = 0 with all the rest. However, that fundamental fact doesn’t
help us here. The mod-2 dot product does.
If n = 4, and
0 a3
1 a2
a = 5 =
0 = a1
,
1 a0
then
y | y · a = 0 (mod 2)
a3
a3
0 1
= a1
∪ a1
,
0 1
a1 ,a3 ∈ {0,1} a1 ,a3 ∈ {0,1}
which consists of eight of the original 24 = sixteen (Z2 )4 vectors associated with the
CBSs. √Therefore, there
√ are 2n−1√= 23 = 8 terms in the sum, exactly normalized by
the (1/ 2)n−1 = (1/ 2)3 = (1/ 8) out front.
Note: In this case, a is in the set of mod-2 vectors which are orthogonal to it.
[Exercise. Characterize all a which are orthogonal to themselves, and explain
why this includes half of all states in our total set.]
Returning to our derivation, we know that once we measure the B register and
collapse the states into those associated with a specific |f (x0 )i, the A register can be
post-processed with a Hadamard get to give
n−1
|2n−1 |
|x0 in + |x0 ⊕ ain
X
1
√ H ⊗n √ (−1)y · x0 |yin .
2 2
y·a = 0
(mod 2)
All of the vectors in the final A register superposition are orthogonal to a, so we can
now safely measure that mixed state and get some great information:
504
We don’t know which y0 among the 2n−1 ys we will measure – that depends on the
whimsy of the collapse, and they’re all equally likely. However, we just showed that
they’re all orthogonal to a, including y0 .
Warning(s): y = 0 Possible
This is one possible snag. We might get a 0 when we test the A register. It is a possible
outcome, since 0 = 0 is mod-2 orthogonal to everything. The probabilities are low –
1/2n – and you can test for it and throw it back if that happens. We’ll account for
this in our probabilistic analysis, further down. While we’re at it, remember that a,
itself, might get measured, but that’s okay. We won’t know it’s a, and the fact
that it might be won’t change a thing that follows.
n−1 X n
|xi + |x ⊕ ain
1 ⊗n ⊗n
n
= √ H ⊗1 √ |f (x)i
2 2
x∈R
n−1 X ⊗n
H |xin + H ⊗n |x ⊕ ain
1
= √ √ |f (x)in .
2 2
x∈R
We can now cite the – still valid – result that led to expressing the Hadamard fraction
505
as a sum of CBS kets satisfying y · a = 0 (mod 2), so the last expression is
n−1 X n−1 X
1 1 n
y·x
|f (x)in
= √ √ (−1) |yi
2 2 y·a = 0
x∈R
(mod 2)
2n−2
1 X X
= √ |yin (−1)y · x |f (x)in .
2 y·a = 0 x∈R
(mod 2)
While our double sum has more overall terms than before, they are all confined to
those y which are (mod-2) orthogonal to a. In fact, we don’t have to apply the Born
rule this time, because all that we’re claiming is an A register collapse to one of the
2n−1 CBS kets |yin which we get compliments of quantum mechanics: third postulate
+ post measurement collapse.
Therefore, when we measure only the A register of this larger superposition, the
collapsed state is still some y orthogonal to a.
Because
P I often use the variable y for general CBS states, |yi, or summation variables,
, I’m going to switch to the variable z for the measured orthogonal output state,
y
as in |zi , z ⊥a. We’ll then have a mental cue for the rest of the lecture, where z
will always be a mod-2 vector orthogonal to a. With this last tweak, our final circuit
result is
506
sampling the circuit several more times. How long will it take us to be relatively
certain we have n − 1 independent vectors? We explore this question next.
I will call the set of all vectors orthogonal to a either a⊥ (if using vector notation)
or a⊥ (if using Z2n notation). It is pronounced “a-perp.”
[Exercise. Working with the vector space (Z2 )n , show that a⊥ is a vector sub-
space. (For our purposes, it’s enough to show that it is closed under the ⊕ operation).]
[Exercise. Show that those vectors which are not orthogonal to a do not form a
subspace.]
If n = 4, and
0 a3
1 a2
0 = a1
a = 5 = ,
1 a0
suppose that the circuit produced the three vectors
0 0 0
1 0 1
, and
0 1 1
1 0 1
507
after three circuit invocations. While all three are orthogonal to a, they do not form
a linearly-independent set. In this case, 3 = n − 1 was not adequate. Furthermore, a
fourth or fifth might not even be enough (if, say, we got some repeats of these three).
Therefore, we must perform this process m times, m ≥ n − 1, until we have n − 1
linearly-independent vectors. (We don’t have to say that they must be orthogonal to
a, since the circuit construction already guarantees this.) How large must m be, and
can we ever be sure we will succeed?
1. we only have to run the circuit a polynomial number of times (in n) to get
n − 1 linearly independent zs which are orthogonal to a with arbitrarily good
confidence, and
2. the various classical tasks – like checking that a set of vectors is linearly inde-
pendent and solving a series of n equations – are all of polynomial complexity
in n, individually.
We’ll do all that in the following sections, but right now let’s see the algorithm:
• Initialize a set W to the empty set. W will eventually contain a growing number
of (Z2 )n vectors,
508
∗ if j = n − 2, we have n − 1 linearly-independent vectors in W
and are done; break the loop.
– If it is not independent (which includes special the case z = 0, even
when W is still empty), then continue to the next pass of the loop.
• If the above loop ended naturally (i.e., not from the break ) after n + T full
passes, we failed.
• Otherwise, we succeeded. Add an nth vector, wn−1 , which is linearly indepen-
dent to this set (and therefore not orthogonal to a, by a previous exercise), done
easily using a simple classical observation, demonstrated below. This produces
a system of n independent equations satisfying
(
0, k = 0, . . . , n − 2
wk · a =
1, k = n − 1
Note: By supplying the nth vector (a fast, easy addition of cost O(n)), we get a full
system requiring no extra quantum samples and guaranteeing that our system yields
a, unequivocally.
We will run our circuit n + T = O(n) times. There will be some classical algorithms
that need to be tacked on, producing an overall growth rate of about O(n3 ) or O(n4 ),
depending on the cleverness of the overall design. But let’s back-up a moment.
I’ve only mentioned time complexity. What about circuit, a.k.a. spatial, complex-
ity?
There is a circuit that must be built, and we can tell by looking at the number
of inputs and outputs to the circuit (2n each) that the spatial complexity is O(n).
I mentioned during our oracle lesson that we can ignore the internal design of the
oracle because it, like its associated f , may be arbitrarily complex; all we are doing in
these algorithms is getting a relativized speed-up. But even setting aside the oracle’s
internals, we can’t ignore the O(n) spatial input/output size leading to/from the
oracle as well as the circuit as a whole.
Therefore, if I operate the circuit in an algorithmic “while loop” of n + T = O(n)
passes, the time complexity of the quantum circuit (not counting the classical tools
to be added) is O(n). Meanwhile, each pass of the circuit uses O(n) wires and gates,
giving a more honest O(n2 ) growth to the { algorithm + circuit } representing (only)
the quantum portion of the algorithm.
So, why do I (and others) describe the complexity of the quantum portion of the
algorithm to be O(n) and not O(n2 )?
509
We actually touched on these reasons when describing why we tend to ignore
hardware growth for these relatively simple circuits. I’ll reprise the two reasons given
at that time.
So, be aware of the true overall complexity, but understand that it’s usually un-
necessary to multiply by a linear circuit complexity when doing computational ac-
counting.
510
we would (sadly) discover that, for many of the coefficients, ck ,
ck 6= v · wk .
This is not a big deal, since we will not need the trick, but it’s a good mental exercise
to acknowledge naturally occurring vector spaces that give rise to non-positive defi-
nite pairings and be careful not to use those pairing as if they possessed our usual
properties.
Example #1
Take n = 4 and
0
1
a =
0 .
We may end up with a basis that uses the 3-dimensional subspace, a⊥ , generated by
the three vectors
1 0 0
0 0 1
, , = a ,
0 1 0
0 0 1
These four vectors form a possible outcome of our algorithm when applied to (Z2 )4
with a period of a = ( 0, 1, 0, 1 )t , and you can confirm the odd claims I made above.
Example #2
511
and the basis consisting of the three vectors orthogonal to a,
1 1 1
1 ,
0 1
, = a
0 0 1
0 1 1
512
• Step 1: Form a matrix and look at the column vectors. We begin by
stacking the m + T vectors atop one another, forming the matrix,
z0 z00 z01 ··· z0(m−1)
z1
z10
z11 ··· z1(m−1)
z2
=
z20 z21 · · · z2(m−1) .
.. .. .. . . ..
. . . . .
zm+T −1 z(m+T −1)0 z(m+T −1)1 · · · z(m+T −1)(m−1)
The number of independent rows is the row rank of this matrix, and by elemen-
tary linear algebra, the row rank = column rank. So, let’s change our perspective
and think of this matrix as set of m column vectors, each of dimension m + T .
We would be done if we could show that all m of column vectors
z00 z01 z0(m−1)
z10 z11 z1(m−1)
z20 z21 z2(m−1)
, ,...,
.. .. ..
. . .
z(m+T −1)0 z(m+T −1)1 z(m+T −1)(m−1)
↑ ↑ ↑
≡ c0 , c1 , ··· , cm−1
were linearly independent with probability > 1 − (1/2)T +1 . (That would mean
the column rank was m.)
[This row rank = column rank trick has other applications in quantum comput-
ing, and, in particular, will be used when we study Neumark’s construction for
“orthogonalizing” a set of general measurements in the next course. Neumark’s
construction is a conceptual first step towards noisy-system analysis.]
Our goal is to compute the probability of I (m). Combining the basic identity,
P I (m) = P I (m) ∧ I (m − 1)
+ P I (m) ∧ ¬ I (m − 1) ,
513
with the observation that
P I (m) ∧ ¬ I (m − 1)
= 0
we can write
P I (m) = P I (m) ∧ I (m − 1)
= P I (m) I (m − 1) P I (m − 1) ,
P I (m)
= P I (m) I (m − 1) P I (m − 1) I (m − 2)
Ym
P I (j) I (j − 1) .
=
j=1
(The j = 1 term might look a little strange, because it refers to the undefined
I (0), but it makes sense if we view the event I (0), i.e., that in a set of no
vectors they’re all linearly-independent, as vacuously true. Therefore, we see
that I (0) can be said to have probability 1, so the j = 1 term reduces to
P ( I (1) ), without a conditional, in agreement with the line above.)
uct.
P I (j) I (j − 1) = 1 − P ¬ I (j) I (j − 1)
514
always have a single 1 coordinate sitting in a column of 0s. Looked at this way,
how many distinct vectors can be formed out of various sums of the original
j − 1?]
The probability we seek is, by definition of probability, just the ratio
m+T −j+1
2j−1 1
= ,
2m+T 2
so
m+T −j+1
1
P I (j) I (j − 1)
= 1 − ,
2
for j = 1, 2, . . . , m.
Sanity Check. Now is a good time for the computer scientist’s first line of
defense when coming across a messy formula: the sanity check.
Does this make sense for j = 1, the case of a single vector c0 ? The formula tells
us that the chances of getting a single, linearly-independent vector is
m+T −1+1 m+T
1 1
P I (1)
= 1 − = 1 − .
2 2
Wait, shouldn’t the first vector be 100% certain? No, we might get unlucky and
pick the 0-vector with probability 1/2m+T , which is exactly what the formula
predicts.
That was too easy, so let’s do one more. What about j = 2? The first vector,
c0 , spans a space of two vectors (as do all non-zero single vectors in (Z2 )m ).
The chances of picking a second vector from this set would be 2/(size of the
space), which is 2/2m+T = 1/(2m+T −1 ). The formula predicts that we will get
a second independent vector, not in the span of c0 with probability
m+T −2+1 m+T −1
1 1
P I (2) I (1)
= 1 − = 1 − ,
2 2
exactly the complement of the probability that c1 just happening to get pulled
from the set of two vectors spanned by c0 , as computed.
So, spot testing supports the correctness of the derived formula.
• Step 4: Plug the expression for the jth factor (step 3) back into the
full probability formula for all m vectors (step 2).
In step 2, we decomposed the probability of “success”, as a product,
m
Y
P I (m) P I (j) I (j − 1) .
=
j=1
515
In step 3, we computed the value for each term,
m+T −j+1
1
P I (j) I (j − 1)
= 1 − .
2
Combining the two, we get
m m+T −j+1 !
Y 1
P I (m)
= 1 − ,
j=1
2
Proof by Induction.
– Case p = 1:
1 − α1 ≥ 1 − α1 . X
– Consider any p > 1 and assume the claim is true for p − 1. Then
p p−1
Y Y
(1 − αi ) = (1 − αp ) (1 − αi )
i=1 i=1
p−1
!
X
≥ (1 − αp ) 1 − αi
i=1
p−1 p−1
X X
= 1− αi − αp + αp αi
i=1 i=1
p
X
> 1− αi . X QED
i=1
516
• Step 6: Apply the lemma to the conclusion of step 4 to finish off the
proof. Using m for the p of the lemma, we obtain
m T +i !
Y 1
P I (m)
= 1 −
i=1
2
m T +i
X 1
≥ 1 −
i=1
2
T "X m i
#
1 1
= 1 − .
2 i=1
2
But that big bracketed sum on the RHS is a bunch of distinct and positive
powers of 1/2, which can never add up to more than 1 (think binary floating
point numbers like .101011 or .00111 or .1111111), so that sum is < 1, i.e.,
" m #
X 1 i
< 1 , so
i=1
2
T "X m i
# T
1 1 1
< , and we get
2 i=1
2 2
T "X m i
# T
1 1 1
1 − > 1 − .
2 i=1
2 2
This proves that the column vectors, cj , are linearly-independent with proba-
bility greater than 1 − 1/2T , and therefore the row vectors, our zk also have
at least m linearly independent vectors among them (row rank = column rank,
remember?), with that same lower-bound probability. QED
517
18.10.4 Producing n − 1 Linearly-Independent wk in Poly-
nomial Time – Argument 2
This argument is seen frequently and is more straightforward than our preferred one,
but it gives a weaker result in absolute terms. That is, it gives an unjustifiably
conservative projection for the number of samples required to achieve n − 1 linearly
independent vectors. Of course, this would not affect the performance of an actual
algorithm, since all we are doing in these proofs is showing that we’ll get linear
independence fast. An actual quantum circuit would be indifferent to how quickly
we think it should give us n − 1 independent vectors; it would reveal them in a time
frame set by the laws of nature, not what we proved or didn’t prove. Still, it’s nice to
predict the convergence to linear independence accurately, which this version doesn’t
do quite as well as the first. Due to its simplicity and prevalence in the literature, I
include it.
Here are the steps. We’ll refer back to the first proof when we need a result that
was already proven there.
518
Using the exact argument from argument 1, step 2 , we conclude,
m
Y
P I (m) P I (j) I (j − 1) .
=
j=1
(If interested, see argument 1, step 2 to account for the fact that I (0)
can be said to have probability 1, implying that the j = 1 term reduces to
P ( I (1) ), without a conditional.)
uct.
P I (j) I (j − 1) = 1 − P ¬ I (j) I (j − 1)
Sanity Check. Does this make sense for j = 1, the case of a single vector z0 ?
The formula tells us that the chances of getting a single, linearly-independent
vector is
m−1+1 m
1 1
P I (1)
= 1 − = 1 − .
2 2
Wait, shouldn’t the first vector be 100% certain? No, we might get unlucky
and pick the 0-vector with probability 1/2m , which is exactly what the formula
predicts.
That was too easy, so let’s do one more. What about j = 2? The first vector,
z0 , spans a space of two vectors (as do all non-zero single vectors in (Z2 )m ).
The chances of picking a second vector from this set would be 2/(size of the
519
space), which is 2/2m = 1/(2m−1 ). The formula predicts that we will get a
second independent vector, not in the span of z0 with probability
m−2+1 m−1
1 1
P I (2) I (1)
= 1 − = 1 − ,
2 2
exactly the complement of the probability that z1 just happening to get pulled
from the set of two vectors spanned by z0 , as computed.
So, spot testing supports the formula’s claim.
• Step 3: Plug the expression for the jth factor (step 2) back into the
full probability formula for all m vectors (step 1).
In step 1, we decomposed the probability of “success”, as a product,
m
Y
P I (m) P I (j) I (j − 1) .
=
j=1
From here one could just quote a result from the theory of mathematical q-series,
namely that this infinite product is about .28879. As an alternative, there are
520
some elementary proofs that involve taking the natural log of the product,
splitting it into a finite sum plus and infinite error sum, then estimating the
error. We’ll accept the result without further ado, which implies
m i !
Y 1
P I (m)
= 1 −
i=1
2
> .25
521
For example, for n = 10, T = 10, the second proof predicts that 10 × 9 = 90
samples would produce at least one of the 10 sets to be linearly independent with
probability greater than 1 − (3/4)10 ≈ .943686. In contrast, the first proof would only
ask for 9 + 10 = 19 samples to get confidence > .999023. That’s greater confidence
with fewer samples.
1. The test for mod-2 linear independence in (Z2 )n that our iterative process has
used throughout.
For those of you who will be skipping the details in the next sections, I’ll reveal
the results now:
1. The test for mod-2 linear independence is handled using mod-2 Gaussian elim-
ination which we will show to be O(n3 ).
3. The two classical algorithms will be applied in series, so we only need to count
the larger of the two, O(n3 ). Together, they are applied once for each quantum
sample, already computed to be O(n), resulting in a nested count of O(n4 ).
522
4. We’ll tweak the classical tools by integrating them into Simon’s algorithm so
that their combined cost is only O(n2 ), resulting in a final count of O(n3 ).
5x + y + 2z + w = 7
x − y + z + 2w = 10
x + 2y − 3z + 7w = −3
As the example shows, there’s no requirement that the system have the same number
of equations as unknowns; the fewer equations, the less you will know about the solu-
tions. (Instead of the solution being a unique vector like (x, y, z, w)t = (3, 0, −7, 2)t ,
it might be a relation between the components, like ( α, 4α, −.5α, 3α )t , with α free
to roam over R). Nevertheless, we can apply our techniques to any sized system.
We break it into the two parts,
• Gaussian elimination, which produces a matrix with 0s in the lower left triangle,
and
• back substitution, which uses that matrix to solve the system of equations as
best we can, meaning that if there are not enough equations, we might only get
relations between unknowns, rather than unique numbers.
523
18.12.2 Gaussian Elimination
Gaussian Elimination for Decimal Matrices
Gaussian Elimination seeks to change the matrix for the equation into (depending
on who you read), either
• reduced row echelon form, which is echelon form with the additional re-
quirement that the first non-zero element in each row be 1, e.g.,
1 1 0 5/3
0 1 1/3 2/3 .
0 0 1 −7/5
In our case, where all the values are integers mod-2 (just 0 and 1), the two are actually
equivalent: all non-zero values are 1, automatically.
Reduced or not, row echelon forms have some important properties that we will need.
Let’s first list them, then have a peek at how one uses Gaussian elimination (GE), to
convert any matrix to an echelon form.
• Geometrically, the diagonal, under which all the elements must be 0, is clear in
a square matrix:
• ∗ ∗ ··· ∗
0 • ∗ ··· ∗
0 0 • ··· ∗
.. .. .. ..
. . . .
0 0 ··· 0 •
When the the matrix is not square, the diagonal is geometrically visualized
524
relative to the upper left (position (0, 0)):
• ∗ ∗ ··· ∗
0 • ∗ ··· ∗
0 0 • ··· ∗
• ∗ ∗ ··· ∗ ··· ∗
.
. .. .. .. 0 • ∗ ··· ∗ · · · ∗
. . . .
0 0 • ··· ∗
· · · ∗
,
0 0 ··· 0 •
.. .. .. .. ..
0 ···
. . . . .· · · ∗
0 0 0
.
.. .. .. .. 0 0 ··· 0 • ··· ∗
. . .
0 0 ··· 0 0
In any case, a diagonal element is one that sits on position (k, k), for some k.
• The first non-zero element on row k is to the right of the first non-zero element
of row k−1, but it might be two or more positions to the right. I’ll use a reduced
form which has the special value, 1, occupying the first non-zero element in a
row to demonstrate this. Note the extra 0s, underlined, that come about as a
result of some row being “pushed to the right” in this way.
1 ∗ ∗ ∗ ∗ ∗ ··· ∗
0 1 ∗ ∗ ∗ ∗ · · · ∗
0 0 0 0 1 ∗ · · · ∗
0 0 0 0 0 1 · · · ∗
.. .. .. .. .. .. . . ..
. . . . . . . .
0 0 0 0 0 0 · · · [0/1/∗]
• All non-zero row vectors in the matrix are, collectively, a linear independent
set.
[Exercise. Prove it.]
• If we know that there are no all-zero row vectors in the echelon form, then the
number of rows ≤ number of columns.
[Exercise. Prove it.]
When using GE to solve systems of equations, we have to be careful that the equations
that the reduced echelon form represents are equivalent to the original equations, and
to that end we have to modify the RHS column vector, e.g., (7, 10, −3)t of our
example, as we act on the matrix on the LHS. We thus start the festivities by placing
525
the RHS constant vector in the same “house” as the LHS matrix, but in a “room” of
its own,
5 1 2 1 7
1 −1 1 2 10 .
1 2 −3 7 −3
We will modify both the matrix and the vector at the same time, eventually resulting
in the row-echelon form,
3 3 0 5 −13
0 6 2 4 − 832 ,
15
0 0 5 −7 − 17 5
0 0 1 − 75 − 17 25
There are only three legal operations that we need to consider when performing GE,
[Exercise. Prove that these operations produce equations that have the identical
solution(s) as the original equations.]
526
For example,
5 1 2 1 7 −5×2nd row
5 1 2 1 7
1 −1 1 2 10 −−−−−−−−→ −5 5 −5 −10 −50
1 2 −3 7 −3 1 2 −3 7 −3
5 1 2
add 1st to 2nd
1 7
−−−−−−−−→ 0 6 −3 −9 −43
1 2 −3 7 −3
swap 1st and 3rd 1 2 −3 7 −3
−−−−−−−−−−−−→ 0 6 −3 −9 −43
5 1 2 1 7
add −5×1st to 3rd
1 2 −3 7 −3
−−−−−−−−−−−−→ 0 6 −3 −9 −43
0 −9 17 −34 22
add 32 ×2nd to 3rd 1 2 −3 7 −3
−−−−−−−−−−−−→ 0 6 −3 −9 −43 ,
0 0 252
− 95
2
− 85
2
etc. (These particular operations may not lead to the echelon forms, above; they’re
just illustrations of the three rules.)
GE is firmly established in the literature, so for those among you who are interested,
I’ll prescribe web search to dig up the exact sequence of operations needed to produce
a row reduced echelon form. The simplest algorithms with no short-cuts use O(n3 )
operations, where an “operation” is either addition or multiplication, and n is the
larger of the matrix’s two dimensions. Some special techniques can improve that, but
it is always worse than O(n2 ), so we’ll be satisfied with the simpler O(n3 ).
To that result we must incorporate the cost of each multiplication and addition
operation. For GE, multiplication could involve increasingly large numbers and, if
incorporated into the full accounting, would change the complexity to slightly better
3 2
than O n (log m) , where m is the absolute value of the largest integer involved.
Addition is less costly and done in series with the multiplications so does not erode
performance further.
For mod-2 arithmetic, however, we can express the complexity without the extra vari-
able, m. Our matrices consist of only 0s and 1s, so the “multiplication” in operations
2 and 3 reduce to either the identity (1 × a row) or producing a row of 0s (0 × a
row). Therefore, we ignore the multiplicative cost completely. Likewise, the addition
527
operation that is counted in the general GE algorithm is between matrix elements,
each of which might be arbitrarily large. But for us, all matrix elements are either 0
or 1, so our addition is a constant time XOR between bits.
All this means that in the mod-2 milieu, each of our ≈ n3 GE operations requires
some constant-time XORs (for the additions) and constant time if -statements (to
account for the multiplication by 0 or 1). Evidently, the a total mod-2 GE cost is
untarnished at O(n3 ).
With some fancy footwork in a special section, below, we’ll improve it to O(n2 ),
but we don’t really care about the exact figure once we have it down this far. All we
require is polynomial time to make Simon a success story. However, we are exercising
our ability to evaluate algorithms, historical or future, so this continues to be an
exercise worthy of our efforts.
It is easiest – and for us, enough – to explain back substitution in the case when the
number of linearly independent equations equals the number of unknowns, n. Let’s
say we begin with the system of equations in matrix form,
c00 c01 ... c0(n−1) x0 b0
c10 c11 ... c1(n−1)
x1
b1
= .. ,
.. .. . . .
. .
.
. . . . . .
c(n−1)0 c(n−1)1 . . . c(n−1)(n−1) xn−1 bn−1
assumed to be of maximal rank, n – all rows (or columns) are linearly independent.
This would result is a reduced row echelon form
1 c001 c002 c003 . . . c00(n−2) c00(n−1) b00
0 1 c012 c013 . . . c01(n−2) c01(n−1) b01
0 0
1 c023 . . . c02(n−2) c02(n−1) b02
0 0
1 . . . c03(n−2) c03(n−1) b03
,
. . . .. .. .. .. ..
.. .. .. . . . . .
0 0 0 0 ... 1 c0(n−2)(n−1) b0n−2
0 0 0 0 ... 0 1 b0n−1
where the $c'_{kj}$ and $b'_k$ are not the original constants in the equation, but the ones obtained after applying GE. In reduced echelon form, we see that the $(n-1)$st unknown, $x_{n-1}$, can be read off immediately, as
\[
x_{n-1} \;=\; b'_{n-1}\,.
\]
From here, we "substitute back" into the $(n-2)$nd equation to get
\[
x_{n-2} \;+\; b'_{n-1}\, c'_{(n-2)(n-1)} \;=\; b'_{n-2}\,,
\]
which can be solved for $x_{n-2}$ (one equation, one unknown). Once solved, we substitute these numbers into the equation above, getting another equation with one unknown. This continues until every row displays the answer for its corresponding $x_k$, and the system is solved.
The bottom row has no operations: it is already solved. The second-from-bottom has
one multiplication and one addition. The third-from-bottom has two multiplications
and two additions. Continuing in this manner and adding things up, we get
\[
1 + 2 + 3 + \cdots + (n-1) \;=\; \frac{(n-1)\,n}{2}
\]
additions and the same number of multiplications, producing an overall complexity of $O(n^2)$ operations. As noted, the time complexity of the addition and multiplication algorithms would degrade by a factor of $(\log m)^2$, $m$ being the largest number to be multiplied, making the overall "bit" complexity $O\big(n^2 (\log m)^2\big)$.
For mod-2 systems, we have no actual multiplication (the $b'_k$ are either 1 or 0) and the additions are single-bit mod-2 additions and therefore constant time. Thus, each of the approximately $n(n+1)/2$ ($= O(n^2)$) addition operations in back substitution uses a constant-time addition, leaving the total cost of back substitution un-degraded at $O(n^2)$.
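Here is a correspondingly small sketch of mod-2 back substitution on an upper-triangular system (again my own illustration, with hypothetical names); each step is an AND and an XOR, both constant time.

```python
def back_substitute_mod2(C, b):
    """Solve C x = b (mod 2) for an upper-triangular C with 1s on the diagonal."""
    n = len(b)
    x = [0] * n
    for k in range(n - 1, -1, -1):            # bottom row upward
        acc = b[k]
        for j in range(k + 1, n):             # substitute the already-known unknowns
            acc ^= C[k][j] & x[j]             # "multiply" is AND, "add" is XOR
        x[k] = acc
    return x

C = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
b = [1, 0, 1]
print(back_substitute_mod2(C, b))             # -> [0, 1, 1]
```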
They will be used in series, so we only need to count the larger of the two, $O(n^3)$, and these algorithms are applied once for each quantum sample, already computed to be $O(n)$, resulting in a nested count of $O(n^4)$.
2. Observations. Notice that m < (n − 1), since the vectors in W are indepen-
dent, by assumption, and if m were equal to n − 1, we would already have a
maximally independent set of vectors known to be orthogonal to a and would
have stopped sampling the circuit. That means that there are at least two more
columns than there are rows: the full space is n-dimensional, and we have n − 2
or fewer linearly independent vectors so far.
As a consequence, one or more rows (the second, in the above example) skips
to the right more than one position relative to the row above it and/or the final
row in the matrix has its leading 1 in column n − 3 or greater.
GE applied to those vectors. All that matters is that the span of the new
rows is the same as the span of W plus z, which GE ensures.] We have
increased our set of linearly independent vectors by one.
• If z is not linearly independent of W, the last row will contain all 0s.
Recover the original W (or, if you like, replace it with the new reduced
matrix row vectors, leaving off the final 0 row – the two sets will span the
same space and be linearly independent). You are ready to grab another z
based on the outer-loop inside which this linear-independence test resides.
[Exercise. Explain why all the claims in this step are true.]
4. Once n − 1 vectors populate the set W = {w0 , . . . , wn−2 }, the process is
complete. We call the associated row-reduced matrix W,
\[
W \;=\; \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_{n-2} \end{pmatrix},
\]
and W satisfies the matrix-vector product equation
\[
W \cdot a \;=\; 0,
\]
where a is our unknown period and 0 is the $(n-1)$-dimensional 0-vector in $(\mathbb{Z}_2)^{n-1}$. This condenses our system of $n-1$ equations, $a \cdot w_k = 0$, into a single matrix relation.
1. Starting from the top row, w0 , look for the last (lowest) wk which has its leading
1 in column k (i.e., it has its leading 1 on W ’s diagonal, but wk+1 , directly below
it, has a 0 in its (k + 1)st position).
• If such a $w_k$ can be found, define
\[
w_{n-1} \;\equiv\; 2^{\,n-2-k} \;\longleftrightarrow\;
\big(\,0,\, 0,\, \ldots,\, 0,\, \underset{\substack{\uparrow\\ (k+1)\text{st position from left}}}{1},\, 0,\, \ldots,\, 0\,\big)
\]
and place this new $w_{n-1}$ directly below $w_k$, pushing all the vectors in the old rows $k+1$ and greater down to accommodate the insertion. Call the augmented matrix $W'$.
Before:
\[
W \;=\;
\begin{pmatrix}
1 & w_{01} & w_{02} & w_{03} & w_{04} & w_{05} \\
0 & 1 & w_{12} & w_{13} & w_{14} & w_{15} \\
0 & 0 & 1 & w_{23} & w_{24} & w_{25} \\
0 & 0 & 0 & 0 & 1 & w_{35} \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\qquad (\text{third row } = w_k)
\]
After:
\[
W' \;\equiv\;
\begin{pmatrix}
1 & w_{01} & w_{02} & w_{03} & w_{04} & w_{05} \\
0 & 1 & w_{12} & w_{13} & w_{14} & w_{15} \\
0 & 0 & 1 & w_{23} & w_{24} & w_{25} \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & w_{35} \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\qquad (\text{fourth row } = \text{the new } w_{n-1})
\]
• If no such $w_k$ exists, the bottom row has the form
\[
w_{n-2} \;=\; (0,\, 0,\, \ldots,\, 0,\, 1,\, *)\,.
\]
Define
\[
w_{n-1} \;\equiv\; 2^0 \;=\; 1 \;\longleftrightarrow\; (0,\, 0,\, \ldots,\, 0,\, 0,\, 1)\,,
\]
and place this new $w_{n-1}$ after $w_{n-2}$, the last old row, making $w_{n-1}$ the new bottom row of $W$. Call the augmented matrix $W'$.
Before:
\[
W \;=\;
\begin{pmatrix}
1 & w_{01} & w_{02} & w_{03} & w_{04} & w_{05} \\
0 & 1 & w_{12} & w_{13} & w_{14} & w_{15} \\
0 & 0 & 1 & w_{23} & w_{24} & w_{25} \\
0 & 0 & 0 & 1 & w_{34} & w_{35} \\
0 & 0 & 0 & 0 & 1 & w_{45}
\end{pmatrix}
\qquad (\text{bottom row } = w_{n-2})
\]
After:
\[
W' \;\equiv\;
\begin{pmatrix}
1 & w_{01} & w_{02} & w_{03} & w_{04} & w_{05} \\
0 & 1 & w_{12} & w_{13} & w_{14} & w_{15} \\
0 & 0 & 1 & w_{23} & w_{24} & w_{25} \\
0 & 0 & 0 & 1 & w_{34} & w_{35} \\
0 & 0 & 0 & 0 & 1 & w_{45} \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\qquad (\text{bottom row } = \text{the new } w_{n-1})
\]
W · a = 0,
insert a 1 into the position corresponding to the new row in W . Push any 0s
down, as needed, to accommodate this 1. It will now be an n-dimensional vector
corresponding to $2^k$ for some $k$.
\[
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ \vdots \\ 0 \end{pmatrix}
\;\longrightarrow\;
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}
\]
That will produce a full set of n linearly independent vectors for $(\mathbb{Z}_2)^n$ in reduced echelon form, which we call $W'$.
The above process consists of a small number of O(n) operations, each occurring in
series with each other and with the loops that come before. Therefore, its O(n) adds
nothing to the previous complexity, which now stands at O(n4 ).
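A compact sketch of that completion step (the function name and data layout are my own, and it assumes W holds n − 1 echelon-form rows with exactly one "missing" pivot column, as described above):

```python
def complete_to_full_rank(W):
    """Insert one standard-basis row (and a 1 on the RHS) at the single missing pivot column.

    Assumes W holds n-1 independent rows, each of length n, already in echelon form."""
    n = len(W[0])
    leads = [row.index(1) for row in W]               # column of each leading 1
    missing = next(c for c in range(n) if c not in leads)
    unit_row = [0] * n
    unit_row[missing] = 1
    W_full = W[:missing] + [unit_row] + W[missing:]   # slide the lower rows down
    rhs = [0] * n
    rhs[missing] = 1                                  # the matching 1 in the 0-vector
    return W_full, rhs

W = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [0, 0, 0, 1]]        # leading 1s in columns 0, 1, 3; column 2 is the skip
W_full, rhs = complete_to_full_rank(W)
print(W_full)             # the unit row (0, 0, 1, 0) lands in the third position
print(rhs)                # -> [0, 0, 1, 0]
```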
18.13.3 Using Back-Substitution to Close the Deal
We are, metaphorically, 99% of the way to finding the period of f . We want to solve
\[
W \cdot a \;=\;
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \\ 0 \end{pmatrix}.
\]
This is an $O(n^2)$ activity done in series with the loops that come before. Therefore, its $O(n^2)$ adds nothing to the previous complexity, which still stands at $O(n^4)$.
3. Use a classical algorithm to determine whether or not z is linearly dependent
on the vectors in W.
The entire block can be replaced by the following, which not only tests for linear
independence, but keeps W’s associated matrix, W , in reduced echelon form and does
so leveraging the special simplicity afforded by mod-2 arithmetic.
The algorithm is more readable if we carry an example along. Assume that, to
date, W is represented by
\[
\begin{pmatrix}
1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1
\end{pmatrix}
\]
3. Loop (new inner loop) until either z has been added to W or z = 0 is produced.
• If a row vector $w_k$ with the same non-zero MSB is found, replace
\[
z \;\leftarrow\; z \oplus w_k\,,
\]
z ← z ⊕ w1 = (000001101)]
[Exercise. Finish this example, repeating the loop as many times as needed, and
decide whether the original z produces an augmented W or turns z into 0.]
[Exercise. Try the loop with z = 000001100.]
[Exercise. Try the loop with z = 010001100.]
In the proposed loop, we are considering whether or not to add some z to our set, W.
• Say we do find a $w_k$ whose MSB matches the MSB of z. $z \oplus w_k$ is, by definition, in the span of W ∪ {z}. Since z is orthogonal to a, so, too, is the new $z = z \oplus w_k$, but the new z has more 0s, bringing us one loop pass closer to either adding it to W (the above case) or arriving at a termination condition in which we ultimately produce z = 0, which is never independent of any set, so neither were the previous zs that got us to this point (including the original z produced by the circuit).
[Exercise. Fill in the details, making sure to explain why z and z ⊕ wk are
either a) both linearly independent of W or b) both not.]
This step has a new inner-loop, relative to the outer quantum sampling loop, which
has, at most, n − 1 passes (we move the MSB to the right each time). Inside the loop
we apply some O(n) operations in series with each other. Therefore, the total cost of
the Step 3 loop is O(n2 ), which includes all arithmetic.
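The step-3 loop translates almost line-for-line into code. The following sketch is my own paraphrase (the names `msb` and `try_add` are hypothetical): it keeps W as a map from leading-1 position to row, which is exactly the echelon bookkeeping described above.

```python
def msb(v):
    """Index of the left-most 1, or None if v is the zero vector."""
    for i, bit in enumerate(v):
        if bit:
            return i
    return None

def try_add(W, z):
    """Reduce z against W; add it if it is independent.  Returns True if added."""
    z = list(z)
    while True:
        lead = msb(z)
        if lead is None:
            return False                         # z hit 0: it was dependent all along
        if lead not in W:
            W[lead] = z                          # fresh leading position: independent
            return True
        z = [a ^ b for a, b in zip(z, W[lead])]  # z <- z XOR w_k; the MSB moves right

W = {}                                            # leading-1 position -> row
for z in ([1,0,1,1,0], [0,1,1,0,1], [1,1,0,1,1], [0,0,1,0,1]):
    print(z, "added" if try_add(W, z) else "dependent")
```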
Summary
We are not applying GE all at once to a single matrix. Rather, we are doing an O(n2 )
operation after each quantum sample that keeps the accumulated W set in eternal
echelon-reduced form. So, it’s a custom O(n2 ) algorithm nested within the quantum
O(n) loop, giving an outer complexity of O(n3 ).
The take-away is that we have already produced the echelon form as part of the
linear-independence tests, so we are positioned to solve the system using only back-
substitution, O(n2 ).
Cost of the mod-2 Test for Independence
As already noted, the step-3 loop is repeated at most n − 1 times, since we push
the MSB of z to the right at least one position each pass. Inside this loop, we have
O(n) operations, all applied in series. The nesting in step 3, therefore, produces an
O(n2 ) complexity. Step 3 is within the outer quantum loop, O(n), which brings the
outer-most complexity to O(n3 ), so far . . . .
We saw that we could solve the system using only back-substitution, O(n2 ). This is
done outside the entire quantum loop.
The total cost, quantum + classical, using our fancy footwork is, therefore, O(n3 ),
relativized to the oracle.
18.15.2 Classical Probabilistic Cost
However, what if we were satisfied to find a with a pre-determined probability of error, ε? This means that we could choose an ε as small as we like and know that if we took m = m(ε, n) samples, we would get the right answer, a, with probability at least 1 − ε.
The functional dependence m = m(ε, n) is just a way to express that we are allowed to let the number of samples, m, depend on both how small an error we want and also how big the domain of f is. For example, if we could show that $m = 2^{1/\varepsilon}$ worked, then since that is not dependent on n, the complexity would be constant time. On the other hand, if we could only prove that an $m = n^4\, 2^{1/\varepsilon}$ worked, then the algorithm would be $O(n^4)$.
The functional dependence we care about does not involve ε, only n, so really, we are interested in m = m(n).
What we will show is that even if we let m be a particular function that grows exponentially in n, we won't succeed. That's not to say every exponentially increasing sample size which is a function of n would fail – we already know that if we chose $2^{n-1} + 1$ we will succeed with certainty. But we'll see that some smaller exponential function of n, specifically $m = 2^{n/4}$, will not work, and if that won't work then no polynomial dependence on n, which necessarily grows more slowly than $m = 2^{n/4}$, has a chance, either.
For the moment, we won’t concern ourselves with whether or not m is some function
of ε or n. Instead, let’s compute an upper bound on the probability of getting a
repeat when sampling a classical oracle m times. That is, we’ll get the probability
as a function of m, alone. Afterwards, we can stand back and see what kind of
dependence m would require on n in order that the deck be stacked in our favor.
We are looking for the probability that at least two samples f (xi ), f (xj ) are equal
when choosing m distinct inputs, {x0 , x1 , . . . , xm−1 }. We’ll call that event E . The
more specific event that some pair of inputs, xi and xj , yield equal f (x)s, will be
referred to as Eij . Since Eij and Eji are the same event, we only have to list it once,
so we only consider the cases where i < j. Clearly,
\[
E \;=\; \bigcup_{\substack{i,\,j\,=\,0 \\ i<j}}^{m-1} E_{ij}\,.
\]
The probability of a union is the sum of the probabilities of the individual events
minus the probabilities of the various intersections of the individual events,
\[
P(E) \;=\; \sum_{\substack{i,\,j\,=\,0 \\ i<j}}^{m-1} P(E_{ij})
\;-\; \sum_{\substack{\text{various} \\ \text{combinations}}} P(E_{ij} \wedge E_{kl} \cdots)
\;\;\le\;\; \sum_{\substack{i,\,j\,=\,0 \\ i<j}}^{m-1} P(E_{ij})\,.
\]
The number of unordered pairs, $\{i, j\}$, $i \ne j$, when taken from m things is (look up "n choose k" if you have never seen this)
\[
\binom{m}{2} \;=\; \frac{m!}{2!\,(m-2)!} \;=\; \frac{m\,(m-1)}{2}\,.
\]
This is exactly the number of events $E_{ij}$ that we are counting, since our condition, $0 \le i < j \le m-1$, is in 1-to-1 correspondence with the set of unordered pairs $\{i, j\}$, with i and j between 0 and m − 1, inclusive, and $i \ne j$.
Meanwhile, the probability that an individual pair produces the same f value is just the probability that we choose the second one, $x_j$, in such a way that it happens to be exactly $x_i \oplus a$. Since we're intentionally not going to pick $x_i$ a second time, this leaves $2^n - 1$ choices, of which only one is $x_i \oplus a$, so that gives
\[
P(E_{ij}) \;=\; \frac{1}{2^n - 1}\,.
\]
Therefore, we've computed the number of elements in the sum, $m(m-1)/2$, as well as the probability of each element in the sum, $1/(2^n - 1)$, so we plug back into our inequality to get
\[
P(E) \;\le\; \frac{m\,(m-1)}{2} \cdot \frac{1}{2^n - 1}\,.
\]
[Exercise. We know that when we sample $m = 2^{n-1} + 1$ times, we are certain to get a duplicate. As a sanity check, make sure that plugging this value of m into the derived inequality gives an upper bound that is no less than one. Any value ≥ 1 will be consistent with our result.]
This is the first formula we sought. We now go on to see what this implies about
how m would need to depend on n to give a decent chance of obtaining a.
To get our feet wet, let’s imagine the pipe dream that we can use an m that is
independent of n. The bound we proved,
\[
P(E) \;\le\; \frac{m\,(m-1)}{2} \cdot \frac{1}{2^n - 1}\,,
\]
tells us that any such m (say an integer $m > 1/\varepsilon^{1000000}$) is going to give an exponentially small probability of a repeat as $n \to \infty$. So that settles that, at least.
An Upper Bound for Getting Repeats in $m = 2^{n/4}$ Samples
But, what about allowing m to be a really large proportion of our domain, like $m = 2^{n/4}$?
That's an exponential function of n, so it really grows fast. Now we're at least being realistic in our hopes. Still, they are dashed:
\[
\begin{aligned}
P(E) \;\le\; \frac{m\,(m-1)}{2} \cdot \frac{1}{2^n - 1}
&\;=\; \frac{2^{n/4}\,(2^{n/4} - 1)}{2} \cdot \frac{1}{2^n - 1}
\;=\; \frac{2^{n/2} - 2^{n/4}}{2} \cdot \frac{1}{2^n - 1} \\[4pt]
&\;<\; \frac{2^{n/2}}{2} \cdot \frac{1}{2^n - 1}
\;=\; \frac{1}{2} \cdot \frac{2^{n/2}}{2^n - 1}
\;<\; \frac{1}{2} \cdot \frac{2^{n/2}}{2^n - 2^{n/2}} \\[4pt]
&\;=\; \frac{1}{2} \cdot \frac{2^{n/2}}{2^{n/2}\,(2^{n/2} - 1)}
\;=\; \frac{1}{2} \cdot \frac{1}{2^{n/2} - 1}
\;<\; \frac{1}{2^{n/2} - 1}\,,
\]
a probability that still approaches 0 (exponentially fast – unnecessary salt in the
wound) as n → ∞. Although we are allowing ourselves to sample the function, f , at
m = m(n) distinct inputs, where m(n) grows exponentially in n, we cannot guarantee
a reasonable probability of getting a repeat f (x) for all n. The probability always
shrinks to 0 when n gets large. This is exponential time complexity.
This problem is hard, even probabilistically, in classical terms. Simon’s algorithm
gives a (relativized) exponential speed-up over classical methods.
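If you want to watch the bound collapse numerically, a few lines suffice (my own illustration of the inequality above, nothing more):

```python
for n in (8, 16, 32, 64, 128):
    m = 2 ** (n // 4)
    bound = m * (m - 1) / 2 / (2 ** n - 1)       # m(m-1)/2 * 1/(2^n - 1)
    print(f"n = {n:3d}   m = {m:<12d}   bound on P(E) = {bound:.3e}")
```

Even with m growing exponentially, the printed bound heads to 0 as n grows, exactly as the algebra predicts.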
Chapter 19
If you do decide to invest time in the early material, you will be incidentally adding
knowledge applicable to many fields besides quantum computing. You will also have
some additional insight into the QFT .
In this first lesson we introduce the concepts frequency and period and show how
a periodic function can be built using only sines and cosines (real Fourier series) or
exponentials (complex Fourier series). The basic concepts here make learning the
Fourier transform (next chapter) very intuitive.
and 2π is the smallest positive number with this property, so a = 2π is its period. 4π
and 12π satisfy the equality, but they’re not as small as 2π, so they’re not periods.
Its graph manifests this periodicity in the form of repetition (Figure 19.1).
I said that the domain could be “nearly” all real numbers, because it’s fine if the
function blows-up or is undefined on some isolated points. A good example is the
tangent function, y = tan x, whose period is half that of sin x but is undefined for
±π/2, ±3π/2, etc. (Figure 19.2).
Figure 19.1: Graph of the quintessential periodic function y = sin x
(period = 2π)
Figure 19.2: The function y = tan x blows-up at isolated points but is still periodic
(with period π)
[Note the notation [−1, 3) for the “half-open” interval that contains the left end-
point, −1, but not the right endpoint, 3. We could have included 3 and not included
−1, included both or neither. It all depends on our particular goal and function.
Half-open intervals, closed on the left and open on the right, are the most useful to
us.]
A subtly different concept is that of compact support. A function might be defined
on a relatively large set, say all (or most) real numbers, but happen to be zero outside
a bounded interval. In this case we prefer to say that it has compact support.
The previous example, extended to all $\mathbb{R}$, but set to zero outside $[-1, 3)$, is
\[
f(x) \;=\;
\begin{cases}
x^2, & \text{if } x \in [-1, 3) \\
0, & \text{otherwise.}
\end{cases}
\]
Terminology
The support of the function is the closure of the domain where $f \ne 0$. In the last
function, although f is non-zero only for [−1, 0) ∪ (0, 3), we include the two points 0
and 3 in its support since they are part of the closure of the set where f is non-zero.
(I realize that I have not defined closure, and I won’t do so rigorously. For us, closure
means adding back any points which are “right next” to places where f is non-zero,
like 0 and 3 in the last example.)
Figure 19.4: Graph of a function defined everywhere, but whose support is [−1, 3],
the closure of [−1, 0) ∪ (0, 3)
know about the function elsewhere, since the rest of the graph of the function is just
a repeated clone of what we see on this small part.
Likewise, if we had a (non-periodic) function with bounded domain, say [a, b], we
could throw away b to make it a half-open interval [a, b) (we don’t care about f at one
point, anyway). We then convert that to an induced periodic function by insisting
that f (x) = f (x + T ), for T ≡ b − a. This defines f everywhere off that interval, and
the expanded function agrees with f on its original domain, but is now periodic with
period T = b − a.
As a result of this duality between periodic functions and functions with bounded
domain, I will be interchanging the terms periodic and bounded domain at will over
the next few sections, choosing whichever one best fits the context at hand.
Figure 19.6: A function with bounded domain that can be expressed as a Fourier
series (support width = 2π)
Until further notice we confine ourselves to functions with domain ⊆ R and range
⊆ R.
Any “well-behaved” function of the real numbers that is either periodic (See Fig-
ure 19.5), or has bounded domain (See Figure 19.6), can be expressed as a sum of
sines and cosines. This is true for any period or support width, T, but we normally simplify things by taking T = 2π:
\[
f(x) \;=\; \frac{a_0}{2} \;+\; \sum_{n=1}^{\infty} a_n \cos nx \;+\; \sum_{n=1}^{\infty} b_n \sin nx\,.
\]
The sum on the RHS of this equation is called the Fourier Series of the function f.
The functions of x (that is, {sin nx}n , {cos nx}n and the constant function 1/2) that
appear in the sum are sometimes called the Fourier basis functions.
Study this carefully for a moment. There is a constant term out front (a0 /2),
which simply shifts the function’s graph up or down, vertically. Then, there are two
infinite sums involving cos nx and sin nx. Each term in those sums has a coefficient
– some real number an or bn – in front of it. In thirty seconds we’ll see what all that
means.
[The term well-behaved could take us all week to explore, but every function
that you are likely to think of, that we will need, or that comes up in physics and
engineering, is almost certainly going to be well-behaved.]
• When the higher -n coefficients like a50 , b90 or b1000 are large in magnitude, the
function has lots of high frequency characteristics (busy squiggling) going on.
• The coefficients, {an } and {bn } are often called the weights or amplitudes of
their respective basis functions (in front of which they stand). If |an | is large,
Figure 19.7: A low frequency (n = 1 : sin x) and high frequency (n = 20 : sin 20x)
basis function in the Fourier series
there's a "lot of" cos nx needed in the recipe to prepare a meal of f(x) (same goes for $|b_n|$ and sin nx). Each coefficient adds just the right amount of weight of its corresponding sinusoid to build f.
• As mentioned, the functions {sin nx} and {cos nx} are sometimes called the
Fourier basis functions, at other times the normal modes, and in some contexts
the Fourier eigenfunctions. Whatever we call them, they represent the individ-
ual ingredients used to build the original f out of trigonometric objects, and
the weights instruct the chef how much of each function to add to the recipe:
“a pinch of cos 3x, a quart of sin 5x, three tablespoons of sin 17x”, etc.
19.3.3 Example of a Fourier Series
To bring all this into focus, we look at the Fourier series of a function that is about as simple as you can imagine,
\[
f(x) \;=\; x, \qquad x \in [-\pi, \pi).
\]
f ’s graph on the fundamental interval [−π, π) is shown in Figure 19.8. When viewed
as a periodic function, f will appear as is in Figure 19.9. Either way, we represent it
as the following Fourier series
\[
x \;=\; \sum_{n=1}^{\infty} (-1)^{n+1}\, \frac{2}{n}\, \sin nx\,.
\]
Figure 19.9: f (x) = x as a periodic function with fundamental interval [−π, π):
The expression of such a simple function, x, as the infinite sum of sines should
strike you as somewhat odd. Why do we even bother? There are times when we
need the Fourier expansion of even a simple function like f (x) = x in order to fully
analyze it. Or perhaps we wish to build a circuit to generate a signal like f (x) = x
electronically in a signal generator. Circuits naturally have sinusoids at their disposal,
constructed by squeezing, stretching and amplifying the 60 Hz signal coming from the
AC outlet in the wall. However, they don't have an f(x) = x signal, so that one must be built from sinusoids, and the Fourier series provides the blueprint for the circuit.
Why is the Fourier series of f(x) = x so complicated? First, f has a jump
discontinuity – a sharp edge – so it takes a lot of high frequency components to shape
it. If you look at the periodic sawtooth shape of f (x) = x, you’ll see these sharp
points. Second, f ’s graph is linear (but not constant). Crafting a non-horizontal
straight line from curvy sines and cosines requires considerable tinkering.
Finite Approximations
While the Fourier sum is exact (for well-behaved f ) the vagaries of hardware require
that we merely approximate it by taking only a partial sum that ends at some finite
n = N < ∞. For our f under consideration, the first 25 coefficients of the sines are
shown in Figure 19.10 and graphed in Figure 19.11.
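If you would like to reproduce those partial sums yourself, a few lines of Python will do it (a sketch of mine, not the source of the text's figures):

```python
import math

def partial_sum(x, N):
    """S_N(x) = sum_{n=1}^{N} (-1)^(n+1) * (2/n) * sin(n x)."""
    return sum((-1) ** (n + 1) * (2.0 / n) * math.sin(n * x) for n in range(1, N + 1))

x = 1.0                                   # a point inside (-pi, pi)
for N in (3, 50, 1000):
    print(f"N = {N:4d}   S_N(1.0) = {partial_sum(x, N):.6f}   (f(1.0) = 1.0)")
```

The printed values creep toward 1.0 as N grows, slowly near the sharp "teeth" and quickly in the middle of the interval.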
The Spectrum
Collectively, the Fourier coefficients (or their graph) are called the spectrum of f. It is a possibly infinite list (or graph) of the weights of the various frequencies contained in f.
Viewed in this way, the coefficients, themselves, represent a new function, F (n).
The Fourier mechanism is a kind of operator, FS, applied to f (x) to get a new
function, F (n), which is also called the spectrum.
\[
F(n) \;=\; FS\,[f(x)] \;\longleftrightarrow\; \{a_n,\, b_n\}
\]
The catch is, this new function, F , is only defined on the non-negative integers. In
fact, if you look closely, it’s really two separate functions of integers, a(n) = an and
b(n) = bn . But that’s okay – we only want to get comfortable with the idea that
the Fourier “operator” takes one function, f (x), domain R, and produces another
function, its spectrum F (n), domain Z≥0 .
\[
\begin{aligned}
f &: \mathbb{R} \longrightarrow \mathbb{R} \\
F &: \mathbb{Z}_{\ge 0} \longrightarrow \mathbb{R} \\
f &\overset{FS}{\longmapsto} F
\end{aligned}
\]
The way to produce the Fourier coefficients, {an , bn } of a function, f , is through these
easy formulas (that I won’t derive),
\[
\begin{aligned}
a_0 &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\, dx\,, & n &= 0\,, \\[4pt]
a_n &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos nx\, dx\,, & n &> 0\,, \text{ and} \\[4pt]
b_n &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx\, dx\,, & n &> 0\,.
\end{aligned}
\]
They work for functions which have period 2π or bounded domain [−π, π). For some other period T, we would need to multiply or divide by T/(2π) in the right places (check on-line or see if you can derive the general formula).
Using these formulas, you can do lots of exercises, computing the Fourier series of
various functions restricted to the interval [−π, π).
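As a head start on those exercises, here is a rough numerical sketch (mine) that approximates the coefficient integrals with a simple Riemann sum and checks them against the known series for f(x) = x:

```python
import math

def fourier_coeffs(f, n, samples=100_000):
    """Riemann-sum approximation of a_n and b_n on [-pi, pi)."""
    dx = 2 * math.pi / samples
    xs = [-math.pi + k * dx for k in range(samples)]
    a_n = sum(f(x) * math.cos(n * x) for x in xs) * dx / math.pi
    b_n = sum(f(x) * math.sin(n * x) for x in xs) * dx / math.pi
    return a_n, b_n

for n in range(1, 6):
    a_n, b_n = fourier_coeffs(lambda x: x, n)
    expected = (-1) ** (n + 1) * 2 / n
    print(f"n={n}:  a_n ~ {a_n:+.4f}   b_n ~ {b_n:+.4f}   (series says b_n = {expected:+.4f})")
```

The $a_n$ come out essentially zero (f(x) = x is odd), and the $b_n$ match the series coefficients above.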
In practice, we can’t build circuits or algorithms that will generate an infinite
sum of frequencies, but it’s easy enough to stop after any finite number of terms.
Figure 19.12 shows what we get if we stop the sum after three terms.
It’s not very impressive, but remember, we are using only three sines/cosines to
approximate a diagonal line. Not bad, when you think of it that way. Let’s take the
first 50 terms and see what we get (Figure 19.13).
Now we understand how Fourier series work. We can see the close approximation to
the straight line near the middle of the domain and also recognize the high frequency
Figure 19.12: Fourier partial sum of f (x) = x to n = 3
effort to get at the sharp “teeth” at each end. Taking it out to 1000 terms, Figure 19.14
shows a stage which we might find acceptable in some applications.
Figure 19.14: Fourier partial sum of f (x) = x to n = 1000
always produces a function of x which is periodic on the entire real line, even if we
started with (and only care about) a function, f (x) with bounded domain. The RHS
of this “equation” matches the original f over its original domain, but the domain
of the RHS may be larger. To illustrate this, if we were modeling the function
f (x) = x2 , restricted to [−π, π), the Fourier series would converge on the entire real
line, R, beyond the original domain (See Figure 19.15).
Figure 19.15: f (x) has bounded domain, but its Fourier expansion is periodic.
The way to think about and deal with this is to simply ignore the infinite number of
periods magnanimously afforded by the Fourier series expression (as a function of x)
and only take the one period that lies above f ’s original, bounded domain.
In contrast, if we defined a function which is defined over all R but had compact
support, [−π, π], it would not have a Fourier series; no single weighted sum of sinusoids
could build this function, because we can’t reconstruct the flat f (x) = 0 regions on
the left and right with a single (even infinite) sum. We can break it up into three
regions, and deal with each separately, but that’s a different story.
19.4 The Complex Fourier Series of a Periodic Function
19.4.1 Definitions
We continue to study real functions of a real variable, f (x), which are either periodic
or have a bounded domain. We still want to express them as a weighted sum of special
“pure” frequency functions. I remind you, also, that we are restricting our attention
to functions with period (or domain length) 2π, but our results will apply to functions
having any period T if we tweak them using factors of T or 1/T in the right places.
To convert from sines and cosines to complex numbers, one formula should come to mind: Euler's formula,
\[
e^{i\theta} \;=\; \cos\theta \;+\; i \sin\theta\,.
\]
Now, let n runneth negative to form our Complex Fourier Series of the (same) function f,
\[
f(x) \;=\; \sum_{n=-\infty}^{\infty} c_n\, e^{inx}\,,
\]
where
\[
c_n \;\equiv\;
\begin{cases}
\tfrac{1}{2}\,(a_n - i\, b_n)\,, & n > 0 \\[4pt]
\tfrac{1}{2}\,(a_{-n} + i\, b_{-n})\,, & n < 0 \\[4pt]
\tfrac{1}{2}\, a_0\,, & n = 0\,.
\end{cases}
\]
The complex Fourier series is a cleaner sum than the real Fourier expansion which
uses sinusoids, and it allows us to deal with all the coefficients at once when doing
computations. The price we pay is that the coefficients, cn , are generally complex
(not to mention the exponentials, themselves). However, even when they are all
complex, the sum is still real. We have been – and continue to be – interested in
real-valued functions of a real variable. The fact that we are using complex functions
and coefficients to construct a real-valued function does not change our focus.
Make a mental note of the following observations by confirming, visually, that they’re
true.
while,
\[
c(n) \;=\; \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, e^{-inx}\, dx\,.
\]
• If we know a function, f (x), we compute a specific spectrum value, cn , by
multiplying each f (x) by the complex basis function e−inx and integrating x
from −π to π. If we know a spectrum, {cn }, we compute a specific functional
value, f (x), by multiplying each cn by the complex basis functions einx and
summing n from −∞ to ∞.
• The correspondence between functions over R and Z,
f (x) ←→ c(n)
is, conceptually, its own inverse. You do (very roughly) the same thing to get the
spectrum from the function as you do to build the function from its spectrum.
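A quick numerical illustration of both directions (my own sketch, using a simple Riemann sum for the integral): compute a batch of $c_n$ from a test function, then rebuild the function from them and confirm the result is essentially real.

```python
import cmath, math

def c(n, f, samples=20_000):
    """Riemann-sum approximation of c_n = (1/2pi) * integral of f(x) e^{-inx} over [-pi, pi)."""
    dx = 2 * math.pi / samples
    return sum(f(-math.pi + k * dx) * cmath.exp(-1j * n * (-math.pi + k * dx))
               for k in range(samples)) * dx / (2 * math.pi)

f = lambda x: x * x                       # a simple test function on [-pi, pi)
coeffs = {n: c(n, f) for n in range(-10, 11)}

x = 0.7
rebuilt = sum(coeffs[n] * cmath.exp(1j * n * x) for n in coeffs)
print(f"f(0.7) = {f(x):.4f}   partial complex series = {rebuilt.real:.4f}"
      f"   (imaginary part ~ {abs(rebuilt.imag):.1e})")
```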
Example
19.5 Periods and Frequencies
19.5.1 The Frequency of any Periodic Function
This short section will be very useful in motivating the approach to Shor’s period-
finding algorithm. In order to use it, I’ll temporarily need the letter f to mean
frequency (in keeping with the classical scientific literature), so we’re going to call our
periodic function under study g(x).
We’ve been studying periodic functions, g(x) of real x which have periods T = 2π,
and we have shown how to express them as a sum of either real sines and cosines,
\[
g(x) \;=\; \frac{1}{2}\, a_0 \;+\; \sum_{n=1}^{\infty} a_n \cos nx \;+\; \sum_{n=1}^{\infty} b_n \sin nx\,,
\]
or complex exponentials,
\[
g(x) \;=\; \sum_{n=-\infty}^{\infty} c_n\, e^{inx}\,.
\]
Each term in these sums has a certain frequency: the n. You may have gotten the
impression that the term “frequency” only applies to functions of the form sin (nx),
cos (nx) or $e^{inx}$. If so, I'd like to disabuse you of that notion (for which my presen-
tation was partly responsible). In fact any periodic function has a frequency, even
those which are somewhat arbitrary looking.
We’ll relax the requirement that our periodic functions have period T = 2π. That
was merely a convenience to make its Fourier sum take on a standard form. For the
moment, we don’t care about Fourier series.
We will define frequency twice, first in the usual way and then using a common
alternative.
• The frequency f tells you how many periods of g(x) fit into any unit interval
(like [0, 1) or [−1/2, 1/2)).
• When the period is on the small side, like 1/10, the frequency will be large (10).
In this case we’d see 10 repeated patterns if we graphed g(x) between −1/2 and
1/2. (See Figure 19.16)
Figure 19.16: f = 10 produces ten copies of the period in [−.5, .5)
• When the period is larger, like 10, the frequency will be small (1/10). In this
case we’d see only a small portion of the graph of g(x) between −1/2 and 1/2,
missing the full view of its curves and gyrations, only apparent if we were to
graph it over a much larger interval containing at least one full period, say −5 to +5 or 0 to 10. (See Figure 19.17)
Figure 19.17: f = .1 only reveals one tenth of period in [−.5, .5)
Angular frequency, usually denoted by the letter ω, is therefore defined to be
\[
\omega \;\equiv\; \frac{2\pi}{T}\,.
\]
The relationship between angular frequency and ordinary frequency is
ω = 2π f .
The relationship between period and angular frequency has the same interpretation
and form as that for ordinary frequency with the qualitatively unimportant change
that the number 1 now becomes 2π. In particular, if you know the function’s angular
frequency, you know its period, courtesy of
ω·T = 2π .
Chapter 20
In this lesson we learn how any reasonable function (not just periodic ones covered
in the previous chapter) can be built using exponential functions, each having a
specific frequency. The sums used in Fourier series turn into integrals this time
around.
Ironically, we abandon the integrals after today and go back to sums since the QFT is a "finite" entity. However, the continuous Fourier transform of this lesson is the correct foundation for the QFT, so if you can devote the time, it is good reading.
support. Can any (well-enough-behaved-but-non-periodic) function be expressed as a sum of basis functions like $e^{inx}$? There are two reasons to be hopeful.
2. The wild, fast squiggliness or the slow, smooth curvature of a graph are not
exclusive to periodic functions; they are part of any function. We should be
able to build them out of sines and cosines (or exponentials) whether they
appear in functions of compact support (even if unbounded domain) or general
functions over the real line.
\[
c(n) \;=\; \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\, e^{-inx}\, dx\,.
\]
• While not required, we make the formulas symmetric by replacing the normalization constant, $1/(2\pi)$, by $\sqrt{1/(2\pi)}$, and thus spreading that constant over both expressions.
Notation
If we look at our original motivation for the FT , we see that FT −1 (F ) is actually
built-into that formula. Substitute F (s) in for what we originally called it, c(s), and
we have the explicit form of the inverse Fourier transform,
\[
f(x) \;=\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F(s)\, e^{isx}\, ds\,.
\]
The noteworthy items are:
\[
\hat{f} \;=\; \mathcal{F}(f)\,, \quad\text{or}\quad \tilde{f} \;=\; \mathcal{F}(f)\,.
\]
\[
f(-x) \;=\; f(x)\,. \qquad\qquad f(-x) \;=\; -f(x)\,.
\]
Figure 20.1: cos x is even
The domains of both f and F are, in our context, always assumed to be real (x
and s are both real). It is only the values of the functions f and F that we were
concerned about.
Finally, you might notice that there was nothing in any of our definitions that
required the original f be real-valued. In fact, as long as the domain is R, the range
can be ⊆ C. We’ll be using this fact implicitly as we consider the symmetric aspects
of the spatial (or time) “domain” (i.e., where f lives) and the frequency “domain”
(where F lives). In other words, we can start out with a complex-valued f and end up
with another complex-valued F , or perhaps even a real-valued F . All combinations
are welcome.
now “free-range” over all of R, like a hyper-chicken in a billion acre ranch, it might
go anywhere. We have to be careful that the Fourier integral converges. One oft cited
sufficient condition is that f (x) be absolutely integrable, i.e.,
\[
\int_{-\infty}^{\infty} |f(x)|\, dx \;<\; \infty\,.
\]
As you can see, simple functions, like f(x) = x or $f(x) = x^2 - x^3 + 1$, don't pass
the absolute-integrability test. We need functions that tend to zero strongly at both
±∞, like a pulse or wavelet that peters-out at both sides. Some pictures to help you
visualize the graphs of square integrable functions are seen in figure 20.3. The main
characteristic is that they peter-out towards ±∞.
(The functions that I am claiming possess a Fourier transform are the $f_k(x) = \psi_k^2(x)$, since these have already been squared, and are ready to be declared absolutely integrable, a stronger requirement than square-integrable.)
We plug it directly into the definition:
\[
\begin{aligned}
F(s) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-isx}\, dx
\;=\; \frac{1}{\sqrt{2\pi}} \int_{-.5}^{+.5} e^{-isx}\, dx \\[4pt]
&= \frac{1}{-is\,\sqrt{2\pi}}\; e^{-isx} \,\Big|_{-.5}^{+.5}
\;=\; \frac{1}{-is\,\sqrt{2\pi}} \left( e^{-i(.5s)} - e^{i(.5s)} \right) \\[4pt]
&= \sqrt{\frac{2}{\pi}} \cdot \frac{1}{s} \cdot \frac{e^{i(.5s)} - e^{-i(.5s)}}{2i}
\;=\; \sqrt{\frac{2}{\pi}}\; \frac{\sin .5s}{s}\,.
\end{aligned}
\]
That wasn’t so bad. You can see both functions’ graphs in Figure 20.4. They demon-
strate that while f is restricted to a compact support, F requires the entire real line for
its full definition. We’ll see that this is no accident, and has profound consequences.
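If you'd like to check the closed form numerically, here is a small sketch (mine) that approximates the integral directly and compares it with $\sqrt{2/\pi}\,\sin(.5s)/s$:

```python
import cmath, math

def F_numeric(s, samples=20_000):
    """Midpoint-rule approximation of (1/sqrt(2 pi)) * integral of e^{-isx} over [-.5, .5]."""
    dx = 1.0 / samples
    total = sum(cmath.exp(-1j * s * (-0.5 + (k + 0.5) * dx)) for k in range(samples))
    return total * dx / math.sqrt(2 * math.pi)

def F_closed(s):
    return math.sqrt(2 / math.pi) * math.sin(0.5 * s) / s

for s in (0.5, 2.0, 7.3):
    print(f"s = {s}:  numeric = {F_numeric(s).real:+.6f}   closed form = {F_closed(s):+.6f}")
```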
N is the height of its peak at the origin, and σ is called the standard deviation which
conveys what percentage of the total area under f falls between ±kσ, for k = 1, 2, 3,
or any multiple we like. When k = 3, 99.7% of the area is covered. (Figure 20.5
demonstrates this.)
Its Fourier transform is
\[
F(s) \;=\; \frac{N}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/(2\sigma^2)}\, e^{isx}\, dx\,,
\]
which can be computed by completing the square and using a polar coordinate trick (look it up – it's fun). The result is
\[
F(s) \;=\; N\,\sigma\; e^{-s^2 \sigma^2/2}\,,
\]
which, if you look carefully, is another Gaussian, but now with a different height
and standard deviation. The standard deviation is now 1/σ, rather than σ. Loosely
speaking, if the “spread” of f is wide, the “spread” of F is narrow, and vice versa.
and
\[
\int_{-\infty}^{\infty} \delta(x)\, dx \;=\; 1\,.
\]
(Since δ is 0 away from the origin, the limits of integration can be chosen to be any
interval that contains it.)
The delta function is also known as the “Dirac delta function,” after the physicist,
Paul Dirac, who introduced it into the literature. It can’t be graphed, exactly, since
it requires information not visible on a page, but it has a graphic representation as
shown in figure 20.6.
Figure 20.6: Graph of the Dirac delta function
Figure 20.7: Sequence of box functions that approximate the delta function with
increasing accuracy
And if we’re not too bothered by the imprecision of informality, we accept the defi-
nition
\[
\delta(x) \;=\; \lim_{n\to\infty} \delta_n(x)\,.
\]
In fact, the converging family of functions {δn (x)} serves a dual purpose. In a com-
puter we would select an N large enough to provide some desired level of accuracy and
use δN (x) as an approximation for δ(x), thus creating a true function (no infinities
involved) which can be used for computations, yet still has the properties we require.
20.4.3 The Delta Function as a Limit of Exponentials
While simple, the previous definition lacks “smoothness.” Any function that has unit
area and is essentially zero away from the origin will work. A smoother family of
Gaussian functions, indexed by a real parameter α,
\[
\delta_\alpha(x) \;=\; \sqrt{\frac{\alpha}{\pi}}\; e^{-\alpha x^2}\,,
\]
Figure 20.8: Sequence of smooth functions that approximate the delta function with
increasing accuracy
Using the integration tricks mentioned earlier, you can confirm that this has the
integration properties needed.
You can prove this by doing the integration on the approximating sequence $\{\delta_n(x)\}$ and taking the limit.
This sifting property is useful in its own right, but it also gives another way to
express the delta function,
\[
\delta(x) \;=\; \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{isx}\, ds\,.
\]
It looks weird, I know, but you have all the tools to prove it. Here are the steps:
3. Set x0 = 0.
The integral looks very much like an expression of the delta function. (Compare it to the last of our many definitions of δ(x), about half-a-page up.) If the exponent did not have that minus sign, the integral would be exactly $2\pi\,\delta(s)$, making $? = \sqrt{2\pi}\,\delta(s)$. That suggests that we use integration by substitution, setting $x' = -x$, and wind up with an integral that does match this last expression of the delta function. That would work:
[Exercise. Try it.]
For an interesting alternative, let's simply guess that the minus sign in the exponent doesn't matter, implying the answer would still be $\sqrt{2\pi}\,\delta(s)$. We then test
our hypothesis by taking $\mathcal{F}^{-1}\big[\sqrt{2\pi}\,\delta(s)\big]$ and confirming that it gives us back f(x) = 1. Watch.
\[
\begin{aligned}
\mathcal{F}^{-1}\Big[\sqrt{2\pi}\,\delta(s)\Big](x)
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \sqrt{2\pi}\,\delta(s)\, e^{isx}\, ds \\[4pt]
&= \int_{-\infty}^{\infty} \delta(s)\, e^{isx}\, ds \\[4pt]
&= \int_{-\infty}^{\infty} \delta(s - 0)\, e^{isx}\, ds \;=\; e^{i\cdot 0\cdot x} \;=\; 1\,. \;\checkmark
\end{aligned}
\]
The last line comes about by applying the sifting property of δ(s) to the function $f(s) = e^{isx}$ at the point $x_0 = 0$.
We have a Fourier transform pair
\[
1 \;\overset{\mathcal{F}}{\longleftrightarrow}\; \sqrt{2\pi}\;\delta(s)\,,
\]
and we could have started with a δ(x) in the spatial domain which would have given a
constant in the frequency domain. (The delta function apparently has equal amounts
of all frequencies.)
Also, our intuition turned out to be correct: ignoring the minus sign in the exponent of e did not change the resulting integral, even when we left the integration boundaries alone; the delta function can be expressed with a negative sign in the exponent,
\[
\delta(x) \;=\; \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-isx}\, ds\,.
\]
20.5.2 Example 4: A Cosine
Next we try f (x) = cos x. This is done by first solving Euler’s formula for cosine,
then using the definition of δ(x),
\[
\begin{aligned}
\mathcal{F}[\cos x](s)
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \cos(x)\, e^{-isx}\, dx \\[4pt]
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{e^{ix} + e^{-ix}}{2}\; e^{-isx}\, dx \\[4pt]
&= \frac{1}{2\sqrt{2\pi}} \left[ \int_{-\infty}^{\infty} e^{ix(1-s)}\, dx \;+\; \int_{-\infty}^{\infty} e^{-ix(1+s)}\, dx \right] \\[4pt]
&= \frac{1}{2\sqrt{2\pi}} \Big[\, 2\pi\, \delta(1-s) \;+\; 2\pi\, \delta(1+s) \,\Big] \\[4pt]
&= \sqrt{\frac{\pi}{2}}\; \Big[\, \delta(1-s) \;+\; \delta(1+s) \,\Big]\,.
\end{aligned}
\]
Notice something interesting: this is the first time we see a Fourier transform that
is not real (never mind that δ(s) isn’t even a function ... we’re inured to that by
now). What do you remember from the last few sections that would have predicted
this result?
Does the FT of sin x (or cos x) make sense? It should. The spectrum of sin x
needs only one frequency to represent it: |s| = 1. (We get two, of course, because
there's an impulse at s = ±1, but there's only one magnitude.) If we had done
sin ax instead, we would have seen the impulses appear at s = ±a, instead of ±1,
which agrees with our intuition that sin ax requires only one frequency to represent
it. The Fourier coefficients (weights) are zero everywhere except for s = ±a.
(Use this reasoning to explain why a constant function has an FT = one impulse
at 0.)
As an exercise, you can throw constants into any of the functions whose FT s we
computed, above. For example, try doing A sin (2πnx).
Since we’ll be calculating probabilities of quantum states and probabilities are the
amplitudes’ absolute-values-squared, this says that both f (x) and f (x + α) have
Fourier transforms which possess the same absolute-values and therefore the same
probabilities. We’ll use this in Shor’s quantum period-finding algorithm.
The reason this theorem is so important in quantum mechanics is that our func-
tions are amplitudes of quantum states, and their squared absolute values are the
probability densities of those states. Since we want any state to sum (integrate) to 1
over all space (“the particle must be somewhere”), we need to know that the Fourier
transform does not disturb that property. Plancherel’s theorem assures us of this
fact.
20.6.3 Convolution
One last property that is heavily used in engineering, signal processing, math and
physics is the convolution theorem.
A convolution is a binary operator on two functions that produces a third function.
Say we have an input signal (maybe an image), f , and a filter function g that we
want to apply to f . Think of g as anything you want to “do” to the signal. Do you
want to reduce the salt-and-pepper noise in the image? There’s a g for that. Do you
want to make the image high contrast? There’s a g for that. How about looking at
only vertical edges (which a robot would care about when slewing its arms). There’s
another g for that. The filter g is applied to the signal f to get the output signal
which we denote f ∗ g and call the convolution of f and g: The convolution of f
and g, written f ∗ g, is the function defined by
\[
[f * g](x) \;\equiv\; \int_{-\infty}^{\infty} f(\xi)\, g(x - \xi)\, d\xi\,.
\]
The simplest filter to imagine is one that smooths out the rough edges (“noise”). This
is done by replacing f (x) with a function h(x) which, at each x, is the average over
some interval containing x, say ±2 from x. With this idea, h(10) would be
\[
h(10) \;=\; K \int_{8}^{12} f(\xi)\, d\xi\,,
\]
(Here K is some normalizing constant like 1/(sample interval).) If f had lots of change from any x to its close neighbors (noise), |f(10) − f(10.1)| could be quite large. But |h(10) − h(10.1)| will be small, since the two numbers are integrals over almost the same interval around x = 10. This is sometimes called a running average and is used to track financial markets by filtering out the moment-to-moment or day-to-day noise. Well, this is nothing more than a convolution of f and g, where g(x) = K for |x| ≤ (# days to avg.), and 0 everywhere else (K often chosen to be 1/(# days to avg.)).
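Here is a toy version of that running average in Python (my own example; the window size and data are made up) to make the smoothing concrete:

```python
import random

def running_average(signal, window=5):
    """Replace each sample by the average of its neighborhood (a crude smoothing filter)."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

random.seed(0)
noisy = [0.1 * i + random.uniform(-0.5, 0.5) for i in range(20)]   # a trend plus noise
smooth = running_average(noisy)
for a, b in zip(noisy[:5], smooth[:5]):
    print(f"noisy {a:+.3f}   smoothed {b:+.3f}")
```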
the period is T = 2π/n, making the angular frequency
\[
\omega \;=\; \frac{2\pi}{T} \;=\; n\,.
\]
Well this is how we first introduced the idea of frequency informally at the start of
this lecture: I just declared the n in sin nx to be a frequency and showed pictures
suggesting that sines with large n squiggled more than those with small n.
The period-frequency relationship is demonstrated when we look at the Fourier
transform of two sinusoids, one with period 2π and another with period 2π/3. First,
sin x and its spectrum are shown in figure 20.13 where it is clear that the frequency
domain has only two non-zero frequencies at ±1.
Next, sin 3x and its spectrum appear in figure 20.14 where the spectrum still has only
two non-zero frequencies, but this time they appear at ±3.
Notice that in both cases, when we consider ω to be the absolute value of the delta spike's position, we confirm the formula,
\[
\omega\, T \;=\; 2\pi\,,
\]
reminding us that if we know the period we know the (angular) frequency, and vice
versa.
20.8 Applications
20.8.1 What’s the FT Used For?
The list of applications of the FT is impressive. It ranges from cleaning up noisy audio
signals to applying special filters in digital photography. It’s used in communications,
circuit-building and numerical approximation. In wave and quantum physics, the
Fourier transform is an alternate way to view a quantum state of a system.
In one example scenario, we seek to design an algorithm that will target certain
frequencies of a picture, audio or signal, f . Rather than work directly on f , where it
is unclear where the frequencies actually are, we take F = F (f ). Now, we can isolate,
remove, or enhance the exact frequencies of F(s) that interest us. After modifying F to our satisfaction, we recover $f = \mathcal{F}^{-1}(F)$.
A second application, dearer to quantum physicists, is the representation of states
of a quantum system. We might model a system such as an electron zipping through a
known magnetic field (but it could be a laser scattering off a crystal, the spin states of
two entangled particles, or some other physical or theoretical apparatus). We prepare
the system in a specific state – the electron is given a particular energy as it enters
the field at time t = 0. This state is expressed by a wavefunction Ψ. If we did a
good job modeling the system, Ψ tells us everything we could possibly know about
the system at a specific time.
Now Ψ is essentially a vector in a Hilbert space, and like all vectors, it can be
expressed in different bases depending on our interest.
Both ψ(x) and ϕ(p) tell the exact same story, but from different points of view.
Again, like any vector’s coordinates, we can transform the coordinates to a different
basis using a simple linear transformation. It so happens that the way one transforms
the position representation to the momentum representation in quantum mechanics
is by using the Fourier transform
ϕ(p) = F (ψ(x)) .
In other words, moving between position space and momentum space is accomplished
by applying the Fourier transform or its inverse.
20.8.2 The Uncertainty Principle
In quantum mechanics, every student studies a Gaussian wave-packet, which has the
same form as that of our example, but with an interpretation: f = ψ is a parti-
cle’s position-state, with ψ(x) being an amplitude. We have seen that the magnitude
squared of the amplitudes tell us the probability that the particle has a certain posi-
tion. So, if ψ represents a wave-packet in position space, then |ψ(x)|2 reveals relative
likelihoods that the particle is at position x. Meanwhile ϕ = F (ψ), as we learned a
moment ago, is the same state in terms of momentum. If we want the probabilities
for momentum, we would graph ϕ(s).
(Figures 20.15 and 20.16 show two different wave-packet Gaussians after taking their absolute value-squared, and likewise for their Fourier transforms.)
Figure 20.16: A more localized Gaussian with σ 2 = 1/7 and its Fourier transform
The second pair represents an initial narrower spread of its position state compared
to the first. The uncertainty of its position has become smaller. But notice what
happened to its Fourier transform, which is the probability density for momentum.
Its uncertainty has become larger (a wider spread). As we set up the experiment to
pin down its position, any measurement of its momentum will be less certain. You are
looking at the actual mathematical expression of the Heisenberg uncertainty principle
in the specific case where the two observables are position and momentum.
20.9 Summary
This is a mere fraction of the important techniques and properties of the Fourier
transform, and I’d like to dig deeper, but we’re on a mission. If you’re interested, I
recommend researching the sampling theorem and Nyquist frequency.
And with that, we wrap up our overview of the classical Fourier series and Fourier
transform. Our next step in the ladder to QFT is the discrete Fourier transform.
Chapter 21
In this lesson we turn the integrals of the last chapter back into nice finite
sums. This is allowed because the functions we really care about will be defined on a
finite set of integers, a property shared by the important functions that are the target
of Shor’s quantum period-finding and factoring algorithms.
After defining the discrete Fourier transform, or DFT , we’ll study a computa-
tional improvement called the fast Fourier transform, or FFT . Even though the
DFT is what we really need in order to understand the use of the QFT in our
algorithms, the development of the recursion relations in the FFT will be needed in
the next chapter when we design the quantum circuitry that implements the QFT .
21.2 Motivation and Definitions
21.2.1 Functions Mapping ZN → C
Sometimes Continuous Domains Don’t Work
1. Computers, having a finite capacity, can't store data for continuous functions like $0.5x^3 - 1$, $\cos Nx$ or $Ae^{2\pi inx}$. Instead, such functions must be sampled at discrete points, producing arrays of numbers that only approximate the original function. We are obliged to process these arrays, not the original functions.
2. Most of the data we analyze don’t have a function or formula behind them.
They originate as discrete arrays of numbers, making a function of a continuous
variable inapplicable.
Functions as Vectors
Although defined only on a finite set of integers, such functions may still take real or complex values,
\[
f : \mathbb{Z}_N \;\longrightarrow\;
\begin{cases}
\mathbb{R}\,, \text{ or} \\
\mathbb{C}\,,
\end{cases}
\]
and we can convey all the information about a particular function using an array or
vector,
\[
f \;\longleftrightarrow\;
\begin{pmatrix} f(0) \\ f(1) \\ \vdots \\ f(N-1) \end{pmatrix}
\;\longleftrightarrow\;
\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{N-1} \end{pmatrix}.
\]
Since N -dimensional quantum systems are the stuff of this course, we’ll need the
vectors to be complex: Quantum mechanics requires complex scalars in order to ac-
curately model physical systems and create relative phase differences of superposition
states. Although our initial vector coordinates might be real (in fact, they may come
from the tiny set {0, 1}), we’ll still want to think of them as living in C. Indeed, our
operators and gates will convert such coordinates into complex numbers.
We start by reviving our notation for the complex primitive Nth root of unity,
\[
\omega_N \;\equiv\; e^{2\pi i/N}.
\]
(I say “the,” primitive N th root because in this course I only consider this one number
to hold that title. It removes ambiguity and simplifies the discussion.) When clear
from the context, we’ll suppress the subscript N and simply use ω,
ω ≡ ωN .
From the primitive Nth root, we generate all N of the roots (including 1, itself):
\[
1,\;\; \omega,\;\; \omega^2,\;\; \omega^3,\;\; \cdots,\;\; \omega^{N-1}
\]
or
\[
1,\;\; e^{2\pi i/N},\;\; e^{4\pi i/N},\;\; e^{6\pi i/N},\;\; \cdots,\;\; e^{(N-1)2\pi i/N}.
\]
Figure 21.3: Primitive 5th root of 1 (the thick radius) and the four other 5th roots it
generates
Let’s look again at the way the FT maps mostly continuous functions to other mostly
continuous functions. The FT , also written F , was defined as a map between func-
tions,
\[
F \;=\; \mathcal{F}(f)\,, \qquad f \;\overset{\mathcal{F}}{\longrightarrow}\; F,
\]
which produced F from f using the formula
\[
F(s) \;=\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-isx}\, dx\,.
\]
It’s easy to forget what the FT is and think of it merely as the above formula, so I’ll
pester you by emphasizing the reason for this definition: we wanted to express f (x)
as a weighted-“sum” of frequencies, s, the weights being F (s),
\[
f(x) \;=\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F(s)\, e^{isx}\, ds\,.
\]
To define the DFT , we start with the FT and make the following adjustments.
• The integrals will become sums from 0 to N − 1. Rather than evaluating f at
real continuous x, we will be taking the N discrete values of the vector (fk ) and
producing a complex spectrum vector (Fj ) that also has N components.
• The factor of $1/\sqrt{2\pi}$ will be replaced by a normalizing factor $1/\sqrt{N}$. This is a choice, just like the specific flavor of FT we used (e.g., the factor of $1/\sqrt{2\pi}$, which some authors omit). In quantum computing the choice is driven by our need for all vectors to live on the projective sphere in Hilbert space and therefore be normalized.
\[
\varphi_s(x) \;=\; e^{-isx}\,.
\]
Let's rewrite the spectrum, F, using the symbolism of this s-parameter family, $\varphi_s(x)$, in place of $e^{-isx}$,
\[
F(s) \;=\; \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, \varphi_s(x)\, dx\,.
\]
In the discrete case, we want functions of the index k, each of whose constant
frequency is parametrized by the discrete parameter j. The N th roots of unity
provide the ideal surrogate for the continuously parametrized ϕs . To make the
analogy to the FT as true as possible, I will take the negative of all the roots’
exponents (which produces the same N roots of unity, but in a different order),
and define
\[
\phi_j(k) \;=\; \omega_N^{-jk} \;=\; \omega^{-jk}\,.
\]
The N functions $\{\phi_j\}_{j=0}^{N-1}$ replace the infinite $\{\varphi_s\}_{s\in\mathbb{R}}$. In other words, the continuous basis functions of x, $e^{-isx}$, parametrized by s, become N vectors,
\[
v_j \;=\;
\begin{pmatrix} v_{j0} \\ v_{j1} \\ \vdots \\ v_{jk} \\ \vdots \\ v_{j(N-1)} \end{pmatrix}
\;=\;
\begin{pmatrix} \omega^{-j\cdot 0} \\ \omega^{-j\cdot 1} \\ \vdots \\ \omega^{-jk} \\ \vdots \\ \omega^{-j(N-1)} \end{pmatrix},
\qquad j = 0, 1, \ldots, N-1,
\]
where k is the coordinate index and j is the parameter that labels each vector.
Official Definition of DFT
And so we find ourselves compelled to define the discrete Fourier transform, or DFT, as follows: for a vector $(f_k)_{k=0}^{N-1}$, the Nth order DFT is the vector $(F_j)$ whose components are
\[
F_j \;=\; DFT[f]_j \;\equiv\; \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} f_k\, \omega^{-jk}\,, \qquad j = 0, 1, \ldots, N-1\,.
\]
If we need to emphasize the order of the DFT , we could use a parenthesized exponent,
DFT (N ) . However, when the common dimension of the input and output vectors is
understood from the context, we usually omit exponent N as well as the phrase “N th
order” and just refer to it as the DFT .
Notation
Common notational options that we’ll use for the DFT include
F = DFT (f ) = DFT [f ] .
The jth coordinate of the output can be also expressed in various equivalent ways,
The last two lack surrounding parentheses or brackets, but the subscript j still applies
to the entire output vector, F .
Just as with the FT , the DFT has many variants all essentially equivalent but
producing slightly different constants or minus signs in the results. In one case the forward DFT has no factor of $1/\sqrt{2\pi}$, but the reverse DFT contains a full $1/(2\pi)$. In
another, the exponential is positive. To make things more confusing, a third version
has a positive exponent of ω, but ω is defined to have the minus sign built-into it, so
the overall definition is actually the same as ours. Be ready to see deviations as you
perambulate the literature.
Inverse DFT
Our expectation is that {Fj } so defined will provide the weighting factors needed to
make an expansion of f as a weighted sum of the frequencies ω j ,
\[
f_k \;=\; \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} F_j\, \omega^{kj}\,,
\]
From exercise (d ) of the section Roots of Unity (at the end of our complex arithmetic
lecture) the sum in parentheses collapses to N δkm , so the double sum becomes
\[
\frac{1}{N} \sum_{m=0}^{N-1} f_m\, \big(N\, \delta_{km}\big) \;=\; \frac{N}{N}\, f_k \;=\; f_k\,. \quad QED
\]
21.3 Matrix Representation of the DFT
If we let $\zeta \equiv \omega^{-1}$ (and remember, we are using the shorthand $\omega = \omega_N = e^{2\pi i/N}$), the following matrix, W, encodes the DFT in a convenient way. Define W as
\[
W \;=\; \frac{1}{\sqrt{N}}
\begin{pmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & \zeta & \zeta^2 & \zeta^3 & \cdots & \zeta^{N-1} \\
1 & \zeta^2 & \zeta^4 & \zeta^6 & \cdots & \zeta^{2(N-1)} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & \zeta^{j} & \zeta^{2j} & \zeta^{3j} & \cdots & \zeta^{(N-1)j} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & \zeta^{N-1} & \zeta^{2(N-1)} & \zeta^{3(N-1)} & \cdots & \zeta^{(N-1)(N-1)}
\end{pmatrix}.
\]
Now, we can express DFT of the vector (fk ) as
DFT [f ] = W · (fk ) .
To see this, just write it out long hand.
\[
\frac{1}{\sqrt{N}}
\begin{pmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & \zeta & \zeta^2 & \zeta^3 & \cdots & \zeta^{N-1} \\
1 & \zeta^2 & \zeta^4 & \zeta^6 & \cdots & \zeta^{2(N-1)} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & \zeta^{j} & \zeta^{2j} & \zeta^{3j} & \cdots & \zeta^{(N-1)j} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & \zeta^{N-1} & \zeta^{2(N-1)} & \zeta^{3(N-1)} & \cdots & \zeta^{(N-1)(N-1)}
\end{pmatrix}
\begin{pmatrix}
f_0 \\ f_1 \\ f_2 \\ \vdots \\ f_k \\ \vdots \\ f_{N-1}
\end{pmatrix}
\]
The jth component of this product is the sum
\[
\frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} f_k\, \zeta^{jk} \;=\; \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} f_k\, \omega^{-jk}\,,
\]
which is just DFT [f ]j .
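A direct transcription of this definition into code (a sketch of mine, using $\zeta = e^{-2\pi i/N}$ exactly as above):

```python
import cmath, math

def dft(f):
    """DFT[f]_j = (1/sqrt(N)) * sum_k f_k * zeta^(jk), with zeta = omega^(-1)."""
    N = len(f)
    zeta = cmath.exp(-2j * math.pi / N)
    return [sum(f[k] * zeta ** (j * k) for k in range(N)) / math.sqrt(N)
            for j in range(N)]

f = [1, 0, 0, 0, 1, 0, 0, 0]          # N = 8, period T = 4, so frequency N/T = 2
for j, Fj in enumerate(dft(f)):
    print(f"F_{j} = {Fj.real:+.3f} {Fj.imag:+.3f}i")
```

For this little test vector, the non-zero components land only at j = 0, 2, 4, 6 – multiples of its frequency – previewing the period-detection discussion below.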
The convolution theorem for continuous FT holds in the discrete case, where we still
have a way to compute the (this time discrete) convolution by using DFT :
\[
f * g \;=\; \sqrt{2\pi}\;\, DFT^{-1}\big[\, DFT(f) \cdot DFT(g) \,\big]
\]
As you can see, it’s the same idea: a spatial or time translation in one domain (shifting
the index by −l) corresponds to a phase shift (multiplication by a root of unity) in
the other domain.
tells us that we have ∼ N complex terms to add and multiply for each resulting
component Fj in the spectrum. Since the spectrum consists of N coordinates, that’s
N 2 complex operations, each of which consists of two or three real operations, so it’s
still O (N 2 ) in terms of real operations. Furthermore, we often use fixed point (fixed
precision) arithmetic in computers, making the real sums and products independent
of the size, N , of the vector. Thus, the DFT is O (N 2 ), period.
You might argue that as N increases, so will the precision needed by each floating
point multiplication or addition, which would require that we incorporate the number
of digits of precision, m, into the growth estimate, causing it to be approximately
O (N 2 m2 ). However, we’ll stick with O (N 2 ) and call this the time complexity relative
to the arithmetic operations, i.e., “above” them. This is simpler, often correct and
will give us a fair basis for comparison with the up-coming fast Fourier transform
and quantum Fourier transform.
less than N ) – then the function would exhibit periodicity relative to the domain.
That, in turn, implies it has an associated frequency, f . In the continuous cases, T
and f are related by one of two common relationships, either
Tf = 1
or
Tf = 2π .
One could actually make the constant on the RHS different from 1 or 2π, although
it’s rarely done. But in the discrete case we do use a different constant, namely, N .
We still have the periodicity condition that
f (k + T ) = f (k) ,
for all k (when both k and k + T are in the domain), but now the frequency is defined
by
Tf = N.
you would see no dominant frequency, which agrees with the fact that the function is
not periodic. (See figure 21.4)
( .1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 ,
.1 , 0, 0, .25 , .15 , .3 , 0, .1 )
A very pure periodic vector akin to the pure "basis functions" of the continuous cases, like sin nx, cos nx and $e^{ins}$, would be one like
( 0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0 ) ,
and this one has a DFT in which all the non-zero frequencies in the spectrum have
the same amplitudes, as seen in figure 21.6.
For all these examples I used N = 128, and for the two periodic cases, the period
was T = 8 (look at the vectors), which would make the frequency f = N/8 = 128/8 =
16. You can see that all of the non-zero amplitudes in the spectrum (i.e., the DFT )
are multiples of 16: 16, 32, 48, etc. (There is a "phantom spike" at 128, but that's
beyond the end of the vector and merely an artifact of the graphing software, not
really part of the spectrum.)

Figure 21.6: The spectrum of a very pure periodic vector
The thing to note here is that by looking at the spectrum we can see multiples of
the frequency in the form cf , for c = 1, 2, 3, . . . , 7 and from that, deduce f and from
f , get T .
21.6.2 Cost
A slightly sticky requirement is that it only operates on vectors which have exactly
N = 2^n components, but there are easy work-arounds. One is to simply pad a
deficient f with enough 0s to bring it up to the next power-of-two. For example, we
might upgrade a 5-vector to an 8-vector like so:
    ( f0 , f1 , f2 , f3 , f4 )    −→    ( f0 , f1 , f2 , f3 , f4 , 0, 0, 0 )
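In code, that padding work-around is one short helper (a sketch of mine, not a method of the text's classes):

#include <complex>
#include <vector>

// pad a signal with 0s up to the next power of two, e.g., a 5-vector -> 8-vector
std::vector< std::complex<double> > padToPowerOfTwo( const std::vector< std::complex<double> > &f )
{
   int n = 1;
   while ( n < (int) f.size() )
      n <<= 1;                                        // next power of two >= size
   std::vector< std::complex<double> > padded( f );
   padded.resize( n, std::complex<double>( 0, 0 ) );  // append the needed 0s
   return padded;
}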
Therefore, until further notice, we assume that N = 2^n , for some n ≥ 0.
In terms of coordinates,

    f_k^{even}  =  f_{2k}        and        f_k^{odd}  =  f_{2k+1} ,
21.7.2 N th Order DFT in Terms of Two (N/2)th Order DFT s
Now, ω = ω_N was our primitive N th root of unity, so ω² is an (N/2)th root of unity
(e.g., (⁸√a)² = ⁴√a ). If we rewrite the final sum using ω′ for ω², N′ for N/2 and
labeling the ω outside the odd sum as an N th root, things look very interesting:

    DFT [f ]_j  =  (1/√2) [ (1/√N′) Σ_{k=0}^{N′−1} f_k^{even} (ω′)^{−jk}  +  ω_N^{−j} (1/√N′) Σ_{k=0}^{N′−1} f_k^{odd} (ω′)^{−jk} ]
We recognize each sum on the RHS as an order N/2 DFT , so let’s go ahead and
label them as such, using an exponent label to help identify the orders. We get
    DFT^{(N)} [f ]_j  =  (1/√2) ( DFT^{(N/2)} [f^{even}]_j  +  ω_N^{−j} · DFT^{(N/2)} [f^{odd}]_j )
Since the j on the LHS can go from 0 → (N − 1), while both DFT s on the RHS are
only size N/2, we have to remind ourselves that we consider all these functions to be
periodic when convenient.
Start of Side-Trip
Let’s clarify that last statement by following a short sequence of maneu-
vers. First turn an N/2 dimensional vector, call it g, into a “periodic
vector” that is N – or even infinite – dimensional by assigning values to
the “excess” coordinates based on the original N/2 with the help of an
old trick,
g( p + N/2 ) ≡ g(p) .
End of Side-Trip
The upshot is that the j on the RHS can always be taken modulo the size of the
vectors on the RHS, N/2. We’ll add that detail for utter clarity:
    DFT^{(N)} [f ]_j  =  (1/√2) ( DFT^{(N/2)} [f^{even}]_{(j mod N/2)}  +  ω_N^{−j} · DFT^{(N/2)} [f^{odd}]_{(j mod N/2)} )
Finally, let’s clear the smoke using shorthand like F (N ) = DFT (N ) (f ), FE = F even
and FO = F odd . The final form is due to Danielson and Lanczos, and dates back to
1942.
21.7.3 Danielson-Lanczos Recursion Relation
    [ F^{(N)} ]_j  =  (1/√2) ( [ F_E^{(N/2)} ]_{(j mod N/2)}  +  ω_N^{−j} · [ F_O^{(N/2)} ]_{(j mod N/2)} )
We have reduced the computation of a size N DFT to that of two size (N/2) DFT s
(and a constant time multiplication and addition). Because we can do this all the
way down to a size 1 DFT (which is just the identity operation – check it out), we
are able to compute Fj in log N iterations, each one a small, fixed number of complex
additions and multiplications.
1. The cost of partitioning f into f^{even} and f^{odd} , unfortunately, requires running
through the full array at each recursion level, so that's a deal breaker.
2. We can fix the above by passing the array “in-place” and just adding a couple
parameters, start and gap, down each recursion level. This obviates the need
to partition the arrays, but, each time we compute a single output value, we still
end up accessing and adding all N of the original elements (do a little example
to compute F3 for an 8-element f ).
3. Recursion has its costs, as any computer science student well knows, and there
are many internal expenses that can ruin our efficiency even if we manage to
fix the above two items, yet refuse to abandon recursion.
In fact, it took some 20 years before someone (Tukey and Cooley get the credit)
figured out how to leverage this recursion relation to break the O(N²) barrier.
21.7.4 Code Samples: Recursive Algorithm
Before we present a non-recursive algorithm based on Tukey and Cooley's work, we
should confirm that the recursive algorithm actually works and has the expected time
complexity of O(N²).
We'll use an object-oriented (OOP) solution written in C++ whose main utility
class will be FftUtil. A single FftUtil object will have its own FFT size, provided
either at instantiation (by a constructor ) or during mid-life (by an instance method
setSize()). Once a size is established all the roots of unity are stored inside the
object so that subsequent computations using that same object do not require root
calculations. We'll have the obvious accessors, mutators and computational methods
in the class, as demonstrated by a simple client, shown next. Because many of the
aspects of the class are independent of the details of the computational algorithm, we
can use this class for the upcoming non-recursive, N log N , solution.
Not all the capabilities of the class will be demonstrated, but you'll get the idea.
First, the client's view.
// client - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
int main ()
{
   const int FFT_SIZE = 8;
   bool const TIMING = false ;
   Complex * a = new Complex [ FFT_SIZE ];
   for ( int k = 0; k < FFT_SIZE; k++ )        // test signal { 0, .1, ... , .7 }
      a [ k ] = Complex ( k / 10.0 , 0 );      // ( assumes the Complex ( re , im ) ctor )
   FftUtil fft ( FFT_SIZE );                   // size fixed at instantiation; roots stored
   fft . setInSig ( a );                       // load the input signal
   fft . calcFftRecursive ();                  // compute the DFT recursively
   cout << fft . toStringOut ();               // display the spectrum
   delete [] a;
}
The client is invoking setInSig() and toStringOut() to transfer the signals between
it and the object and also calling calcFftRecursive() to do the actual DFT
computation. The client also uses a simple eight-element input signal for testing,

    { f_k }_{k=0}^{7}   ≡   { 0, .1, .2, .3, .4, .5, .6, .7 } .
Although the class proves to be only a slow O(N²) solution, we should not shrug off
its details; as computer scientists, we need to have reliable benchmarks for future
comparison and proof-of-correctness coding runs. Thus, we want to look inside class
FftUtil and then test it.
The publicly exposed calcFftRecursive() called by the client leans on a private
helper not seen by the client: calcFftRecursive(). First the definition of the public
member method:
// public method that client uses to request FFT computation
// assumes signal is loaded into private array inSig []
bool FftUtil :: calcFftRecursive ()
{
   int k ;
   return true ;
}
Here’s the private recursive helper which you should compare with the Danielson-
Lanczos relation. It is a direct implementation which emerges naturally from that
formula.
// private recursive method that does the work
// a: array start location
// gap : interval between consec a [] elements ( so we can pass array in - place )
// size : size of a [] in current iteration
// rootPos : index in the ( member ) roots [] array where current omega stored
// j: output index FFT we are computing
// reverse : true if doing an inverse FFT
rootPosNxtLevel , gapNxtLev , reverse ) ;
odd = fftFHC ( n , a + gap , roots , j % sizeNxtLevel , sizeNxtLevel ,
rootPosNxtLevel , gapNxtLev , reverse ) ;
// put the even and odd results together to produce return for current call
// ( inverse FFT wants positive exponent , forward wants negative )
if ( reverse )
arrayPos = ( j * rootPos ) % n ; // j * omega
else
arrayPos = ( j * ( n - rootPos ) ) % n ; // -j * omega
return retVal ;
}
I'll let you analyze the code to show that it does exhibit O(N²) big-O timing, but we
will verify it using benchmarks in the next few lines.
The output confirms that we are getting the correct values.
/* = = = = = = = = = = = = = = = = = = = = = sample run = = = = = = = = = = = = = = = = = = = = =
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = */
And we loop through different input-array sizes to see how the time complexity shakes
out:
FFT size 1024 Recursive FFT : 0.015 seconds .
FFT size 2048 Recursive FFT : 0.066 seconds .
FFT size 4096 Recursive FFT : 0.259 seconds .
FFT size 8192 Recursive FFT : 1.014 seconds .
FFT size 16384 Recursive FFT : 4.076 seconds .
The pattern is unmistakable: doubling the array size causes the time to grow four-
fold. This is classic N² time complexity. Besides the growth rate, the absolute times
(four seconds for a modest-sized signal) are unacceptable.
by previewing the very short high-level FFT code that invokes them.
if ( ! copyInSigToAuxSigBitRev () )
return false ;
if ( ! normalize () )
return false ;
return true ;
}
21.8.2 Bit-Reversal
The big break in our quest for speed comes when we recognize that the recursive
algorithm leads – at its deepest nested level – to many tiny order-one arrays, and
that happens after log N method calls. This is the end of the recursion at which point
we compute each of these order-one DFT s, manually. It’s the infamous “escape valve”
of recursion. But computing the DFT of those size-one arrays is trivial:

    DFT^{(1)} [f ]_0  =  (1/√1) f_0 ω^0  =  f_0 ,

that is, the DFT of any single element array is itself (apply the definition). So we
don't really have to go all the way down to that level – there's nothing to do there.
(That is, the size-one DFT s are already done, even if they are in a mixed up order
in our input signal.) Instead we can halt recursion when we have size two arrays, at
which point we compute the order two DFT s explicitly. Take a look:
    [ F^{(2)}_{EEOE...OE} ]_j  =  (1/√2) ( f_p + (−1)^{−j} · f_q ) ,

for some p and q, gives us the jth component of the size two DFT s. F^{(2)}_{EEOE...OE}
represents one of the many order two DFT s that result from the recursion relation
by taking increasingly smaller even and odd sub-arrays in our recursive descent from
size N down to size two.
The Plan: Knowing that the first order DFT s are just the original input signal's
array elements (whose exact positions are unclear at the moment), our plan is to work
not from the top down, recursively, but from the bottom with the original array
values, and build-up from there. In other words, instead of recursing down from size
N to size one, we iterate up from size one to size N . To do that we need to get the
original signal, {fk }, in the right order in preparation for this rebuilding process.
So our first task is to re-order the input array so that every pair fp and fq that
we want to combine to get a size two DFT end up next to one another. While at
it, we’ll make sure that once they are computed, all size two pairs which need to
be combined to get the fourth order DFT s will also be adjacent, and so on. This
reordering is called bit-reversal. The reason for that name will be apparent shortly.
Let's start with an input signal of size 8 = 2³ that we wish to transform and define
it such that it'll be easy to track:

    { f_k }_{k=0}^{7}   ≡   { 0, .1, .2, .3, .4, .5, .6, .7 } .
We start at the top and, using the Danielson-Lanczos recursion relation, see how the
original f decomposes into two four-element even-odd sets, then four two-element
even-odd sets, and finally eight singleton sets. Figure 21.7 shows how we separate the
original eight-element f^{(8)} into f_E^{(4)} and f_O^{(4)} , each of length 4.
Figure 21.7: Going from an 8-element array to two 4-element arrays (one even and
one odd)
I'll select one of these, f_E^{(4)} , for further processing (figure 21.8).
Figure 21.8: Decomposing the even 4-element array to two 2-element arrays (one even
and one odd)
This time, for variety, we'll recurse on the odd sub-array, f_{EO}^{(2)} (figure 21.9).
Remember, our goal is to re-order the input array to reposition pairs such as these
next to one-another. The last picture told us that we want to see f2 repositioned so
it is next to f6 (figure 21.10).
Figure 21.10: Final position of f2 should be next to f6 .
Once this happens, we are at the bottom of recursion. These one-element arrays
are their own DFT s, and as such, are neither even nor odd. Each singleton is some
F_0 (more accurately, [F_{EOE}]_0 and [F_{EOO}]_0 ).
Aside: Recall that we needed to take j (mod N/2) at each level, but when N/2
is 1, anything mod 1 is 0, which is why we are correct in calling these F0 for different
one-element F arrays (figure 21.11).
If we had taken a different path, we’d have gotten a different pair. Doing it for
all possibilities would reveal that we would want the original eight values paired as
follows:
f2 ←→ f6
f3 ←→ f7
f0 ←→ f4
f1 ←→ f5
Now, we want more than just adjacent pairs; we’d like the two-element DFT s
that they generate to also be next to one another. Each of these pairs has to be
positioned properly with respect to the rest. Now is the time for us to stand on the
shoulders of the giants who came before and write down the full ordering we seek.
This is shown in figure 21.12. (Confirm that the above pairs are adjacent).
What you are looking at in figure 21.12 is the bit-reversal arrangement. It’s so
named because in order to get f6 , say, into its correct position for transform building,
we “reverse the bits” of the integer index 6 (relative to the size of the overall transform,
8, which is three bits). It’s easier to see than say: 6 = 110 reversed is 011 = 3, and
indeed you will find the original f6 = .6 ends up in position 3 of the bit-reversed
array.

Figure 21.12: Bit-reversal reorders an eight-element array

On the other hand, 5 = 101 is its own bit-reversed index, so it should – and
does – not change array positions.
The Code. The bit reversal procedure uses a managing method, allocateBitRev()
which calls a helper reverseOneInt(). First, a look at the methods:
// called once for a given FFT size and not repeated for that size
// produces array bitRev [] used to load input signal
void FftUtil :: allocateBitRev ()
{
   int k ;

   if ( bitRev != NULL )
      delete [] bitRev ;                 // bitRev is an array, so delete []

   // the simple O(N) driver loop described below: fill the table by
   // calling the helper once for each index
   bitRev = new int [ fftSize ];
   for ( k = 0; k < fftSize; k++ )
      bitRev [ k ] = reverseOneInt ( k );

   return ;
}
// inVal and retVal are array locations , and the size of the array is fftSize
int FftUtil :: reverseOneInt ( int inVal )
{
   int retVal = 0 , logSize ;

   for ( logSize = fftSize >> 1; logSize > 0; logSize >>= 1 )
   {
      retVal <<= 1;               // make room for the next bit ...
      retVal |= ( inVal & 1 );    // ... then bring in the low bit of inVal
      inVal >>= 1;
   }
   return retVal ;
}
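As a quick sanity check of that helper's logic, here is a standalone version (my own, outside the class) that prints the permutation for an 8-element FFT; the output, 0 4 2 6 1 5 3 7, matches the bit-reversed ordering of figure 21.12.

#include <iostream>

// standalone mirror of the reverseOneInt() loop, for checking by hand
int reverseBits( int inVal, int fftSize )
{
   int retVal = 0;
   for ( int mask = fftSize >> 1; mask > 0; mask >>= 1 )
   {
      retVal <<= 1;              // make room for the next bit ...
      retVal |= ( inVal & 1 );   // ... then bring in the low bit of inVal
      inVal >>= 1;
   }
   return retVal;
}

int main()
{
   for ( int k = 0; k < 8; k++ )
      std::cout << reverseBits( k, 8 ) << " ";   // prints: 0 4 2 6 1 5 3 7
   std::cout << std::endl;
   return 0;
}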
Time Complexity
The driver method has a simple loop of size N making it O(N ). In that loop, it calls
the helper, which careful inspection reveals to be O(log N ): the loop in that helper is
managed by the statement logSize >>= 1, which halves logSize each pass, an action
that always means logarithmic growth. Since this is a nesting of two loops the full
complexity is O(N log N ).
Maybe Constant Time? This is done in series with the second phase of FFT
rebuilding so this complexity and that of the next phase do not multiply; we will take
the slower of the two. But the story gets better. Bit reversal need only be done once
for any size N and can be skipped when new FFT s of the same size are computed.
It prepares a static array that is independent of the input signal. In that sense, it is
really a constant time operation for a given FFT of order N .
Either way you look at it, this preparation code won't affect the final complexity
since we're about to see that the next phase is also O(N log N ) and the normalization
phase is only O(N ), making the full algorithm O(N log N ) whether or not we count
bit reversal.
Figure 21.13:   [ F_{EO}^{(2)} ]_j  =  [ F_{EOE}^{(1)} ]_{j (mod 1)}  +  (−1)^{−j} [ F_{EOO}^{(1)} ]_{j (mod 1)}
two-element DFT s, we repeat the process at the next level: we build the four-element
arrays from these two-element arrays. Figure 21.14 shows the process on one of the
two four-element arrays, F_E^{(4)} . This time, we are using the 4th root of unity, i, as the
multiplier of the odd term.
Figure 21.14:   [ F_E^{(4)} ]_j  =  [ F_{EE}^{(2)} ]_{j (mod 2)}  +  (i)^{−j} [ F_{EO}^{(2)} ]_{j (mod 2)}
After the four-element F_E^{(4)} we compute its companion four-element DFT , F_O^{(4)} ,
then go to the next and final level, the computation of the full output array F^{(8)} . If
the original signal were length 16, 32 or greater, we’d need to keep iterating this outer
loop, building higher levels until we reached the full size of the input (and output)
array.
The implementation will contain loops that make it easier to “count” the big-O of
this part, so first have a casual look at the class methods.
The Code. The private method, combineEvenOdd(), which does all the hard
work, uses an iterative approach that mirrors the diagrams. Before looking at the
full definition, it helps to first imagine processing the pairs, only, which turns those
singleton values into order-two DFT s. Recall that for that first effort, we use −1 as
the root of unity and either add or subtract,
    [ F_{EO}^{(2)} ]_j  =  [ F_{EOE}^{(1)} ]_{j (mod 1)}  +  (−1)^{j} [ F_{EOO}^{(1)} ]_{j (mod 1)}        for j = 0, 1,
which replaces those eight one-element DFT s with four two-element DFT s. The
code to do that isn’t too bad.
Two-Element DFT s from Singletons:
// this computes the DFT of length 2 from the DFTs of length 1 using
// F0 = f0 + f1
// F1 = f0 - f1
// and does so , pairwise ( after bit - reversal re - ordering ) .
// It represents the first iteration of the loop , but has the concepts
// of the recursive FFT formula .
// the roots [0] , ... , roots [ fftSize -1] are the nth roots in normal order
// which implies that -1 would be found at position fftSize /2
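The loop body itself is abridged here, so the following is only a minimal sketch of that first pass, assuming the bit-reversed signal already sits in an array aux[] of Complex values (the array name is mine, and any 1/√2 normalization is deferred to normalize()):

// first pass: replace each adjacent (bit-reversed) pair by its 2-point DFT
//    F0 = f0 + f1
//    F1 = f0 - f1          ( -1 is the square root of unity )
for ( int p = 0; p < fftSize; p += 2 )
{
   Complex f0 = aux[ p ];
   Complex f1 = aux[ p + 1 ];
   aux[ p ]     = f0 + f1;
   aux[ p + 1 ] = f0 - f1;
}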
If we were to apply only this code, we would be replacing the adjacent pairs (after
bit-reversal) by their order-two DFT s. For an input signal

    { f_k }_{k=0}^{7}   ≡   { 0, .1, .2, .3, .4, .5, .6, .7 } ,

the above loop would replace these by the order-two DFT s to produce

    • j < 2   →   j < 4
I’ll let you write out the second iteration of the code that will combine DFT s of
length two and produce DFT s of length four. After doing that exercise, one can see
that the literals 1, 2, 4, etc. should be turned into a variable, groupsize, over which
we loop (by surrounding the above code in an outer groupsize-loop). The result
would be the final method.
Private Workhorse method combineEvenOdd():
// private non - recursive method builds FFT from 2 - elements up
// reverse : true if doing an inverse FFT
// rather than be clever , copy array back to aux - input for next pass
if ( ( groupSize << 1) < fftSize )
copyOutSigToAuxSig () ;
}
return true ;
}
Time Complexity
At first glance it might look like a triple nested loop, leading to some horrific cubic
performance. Upon closer examination we are relieved to find that

• the two inner loops together constitute only N loop passes, and

• the outer groupsize-loop doubles groupsize each pass, so it runs only log N times,

giving the expected O(N log N ).
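To make that count visible, here is the loop skeleton of the pass structure (a sketch of mine, not the class's literal code):

// skeleton of combineEvenOdd(), just to expose the loop counts: the outer
// loop runs log2(N) passes (groupSize = 2, 4, ..., N); for each pass the two
// inner loops together touch every one of the N slots once, giving O(N log N)
for ( int groupSize = 2; groupSize <= fftSize; groupSize <<= 1 )
{
   for ( int start = 0; start < fftSize; start += groupSize )    // N/groupSize groups
   {
      for ( int j = 0; j < groupSize / 2; j++ )                  // groupSize/2 pairs each
      {
         // combine the j-th "even" and "odd" entries of this group using the
         // appropriate root of unity, exactly as in the Danielson-Lanczos relation
      }
   }
}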
21.8.4 Normalization
To complete the trio, we have to write the normalize() method. It’s a simple linear
complexity loop not adding to the growth of the algorithm.
The Normalize Method
bool FftUtil :: normalize ()
{
   double factor ;
   int k ;
   if ( outSig == NULL )
      return false ;
   // a sketch of the elided remainder: scale each output bin by 1/sqrt(N)
   factor = 1.0 / sqrt ( (double) fftSize ) ;
   for ( k = 0; k < fftSize; k++ )
      outSig [ k ] = outSig [ k ] * factor ;
   return true ;
}
It is slightly slower than linear, the difference being the expected factor of log N
(although we couldn't tell that detail from the above times). Not only is this evidence
of the N log N time complexity, but it is orders of magnitude faster than the recursive
algorithm (4.076 seconds vs. .003 seconds for a 16k array). We finally have our true
FFT .
This gives us more (far more) than enough classical background in order to tackle
the quantum Fourier transform.
Chapter 22
22.2 Definitions

22.2.1 From C^{2^n} to H(n)

We know that the discrete Fourier transform of order N, or DFT (its order usually
implied by context), is a special operator that takes an N -dimensional complex vector
f = (f_k) to another complex vector f̃ = (f̃_j). In symbols,

    DFT :  C^N −→ C^N ,
    DFT (f )  ↦−→  f̃ ,

or, if we want to emphasize the coordinates,

    DFT [ (f_k) ]  ↦−→  (f̃_j) .
When we moved to the fast Fourier transform the only change (other than the im-
plementation details) was that we required N = 2^n be a power-of-2. Symbolically,

    FFT :  C^{2^n} −→ C^{2^n} ,
    FFT [ (f_k) ]  ↦−→  (f̃_j) .

We maintain this restriction on N and continue to take n to be log N (base 2 always
implied).
We'll build the definition of the QFT atop our firm foundation of the DFT ,
so we start with

    DFT :  C^{2^n} −→ C^{2^n} ,

a 2^n th order mapping of C^{2^n} to itself and use that to define the 2^n th order QFT
acting on an nth order Hilbert space,

    QFT :  H(n) −→ H(n) .
[Order. The word "order," when describing the DFT = DFT^{(N)} , means N , the
dimension of the underlying space, while the same word, "order," when applied to a
tensor product space H(n) is n, the number of component, single-qubit spaces in the
product. The two "orders" are not the same: N = 2ⁿ or, equivalently, n = log N .]
In each case, we have to confirm that U is both linear and unitary (although in
methods 1 and 3, we get linearity for free since it is built into those definitions).
Also, we only have to use one technique, since the other two can be surmised using
linear algebra. However, if we use a different method than someone else to define the
same operator, we had better check that our definition and theirs are equivalent.
22.2.3 Review of Hadamard Operator
Let’s make sure we understand these concepts before defining the QFT by reprising
an example from our past, the Hadamard operator.
N th Order Hadamard
Meanwhile, for the n-fold Hadamard, we didn’t need to define it: we just derived it
using the laws of tensor products. We found (using method 1), that
    H^{⊗n} |x⟩ⁿ  =  (1/√2ⁿ) Σ_{y=0}^{2ⁿ−1} (−1)^{x·y} |y⟩ⁿ .
22.2.4 Defining the QFT
As long as we’re careful to check that our definition provides a linear and unitary
transformation, we are free to use a state’s coordinates to define it. Consider a general
state |ψ⟩ⁿ and its preferred basis amplitudes (a.k.a., coordinates or coefficients),

    |ψ⟩ⁿ  ←→  (c_x)_{x=0}^{N−1}  =  ( c_0 , c_1 , . . . , c_{N−1} )ᵀ ,        N = 2ⁿ .
We describe how the order-N QFT = QFT^{(N)} acts on the 2ⁿ coefficients, and this
will define the QFT for any qubit in H(n) .

    If      |ψ⟩ⁿ  ←→  (c_x)_{x=0}^{N−1} ,
    then    QFT^{(N)} |ψ⟩ⁿ  ←→  (c̃_y)_{y=0}^{N−1} .
In words, starting from |ψ⟩ⁿ, we form the vector of its amplitudes, (c_x); we treat (c_x)
like an ordinary complex vector of size 2ⁿ; we take its DFT^{(N)} to get another vector
(c̃_y); we declare the coefficients { c̃_y } to be the amplitudes of our desired output state,
QFT^{(N)} |ψ⟩ⁿ. The end.
If we need it, we’ll display the QFT ’s order in the superscript with the notation
QFT (N ) . Quantum computer scientists don’t usually specify the order in diagrams,
so we’ll often go with a plain “QFT ” and remember that it operates on an order-n
Hilbert space having dimension N = 2n .
We can really feel the Fourier transform concept when we view the states as complex
vectors |ψ⟩ⁿ = c = (c_x) of size 2ⁿ to which we subject the standard DFT ,

    QFT |ψ⟩ⁿ  =  Σ_{y=0}^{N−1} [ DFT (c) ]_y |y⟩ⁿ .
The definition assumes the reader can compute a DFT , so we'd better unwind our
definition by expressing the QFT explicitly. The yth coordinate of the output QFT
is produced using

    [ QFT |ψ⟩ⁿ ]_y  =  c̃_y  =  (1/√N) Σ_{x=0}^{N−1} c_x ω^{yx} ,

where ω = ω_N is the primitive N th root of unity. The yth coordinate can also be
obtained using the dot-with-the-basis-vector trick,

    [ QFT |ψ⟩ⁿ ]_y  =  ⁿ⟨y| QFT |ψ⟩ⁿ ,
so you might see this notation, rather than the subscript, by physics-oriented authors.
For example, it could appear in the definition – or even computation – of the yth
coordinate of the QFT ,

    ⁿ⟨y| QFT |ψ⟩ⁿ  =  (1/√N) Σ_{x=0}^{N−1} c_x ω^{yx} .
We'll stick with the subscript notation, [ QFT |ψ⟩ⁿ ]_y , for now.
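As a tiny worked example (mine, not the text's), take n = 1, so N = 2 and ω = ω₂ = −1. The formula gives

    [ QFT |ψ⟩¹ ]_0  =  ( c_0 + c_1 ) / √2 ,        [ QFT |ψ⟩¹ ]_1  =  ( c_0 − c_1 ) / √2 ,

which is just the Hadamard matrix acting on the amplitude vector – a fact we will meet again shortly.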
Hold the phone. Doesn’t the DFT have a negative root exponent, ω −jk ? The way I
defined it, yes. As I already said, there are two schools of thought regarding forward
vs. reverse Fourier transforms and DFT s. I really prefer the negative exponent for the
forward transform because it arises naturally when decomposing a function into its
frequencies. But there is only one school when defining the QFT , and it has a positive
root exponent in the forward direction (and negative in the reverse). Therefore, I have
to switch conventions.
[If you need more specifics, here are three options. (i ) You can go back and define
the forward DFT using positive exponent from the start ... OR ... (ii ) You can
consider the appearance of “DFT ” in the above expressions as motivational but rely
only on the explicit formulas for the formal definition of the QFT without anxiety
about the exponent’s sign difference ... OR ... (iii ) You can preserve our original DFT
but modify the definition of QFT by replacing “DFT ” with “DFT −1 ” everywhere
in this section.]
Anyway, we won’t be referring to the DFT , only the QFT – effective immediately
– so the discrepancy starts and ends here.
Full Definition of QFT using Method 2
You should also be ready to see the full output vector, QFT |ψ⟩ⁿ, expanded along
the CBS,

    if      |ψ⟩ⁿ  =  Σ_{x=0}^{N−1} c_x |x⟩ⁿ

    then    QFT |ψ⟩ⁿ  ≡  (1/√N) Σ_{y=0}^{N−1} Σ_{x=0}^{N−1} c_x ω^{yx} |y⟩ⁿ .
It's more common in quantum computing to take the CBS route, and in that case
the definition of the QFT would be

    QFT |x⟩ⁿ  ≡  (1/√N) Σ_{y=0}^{N−1} ω^{yx} |y⟩ⁿ ,

from which we would get the definition for arbitrary states by applying linearity to
their CBS expansion. Our task at hand is to make sure the two definitions agree and
give a linear, unitary operator.
Step 1) Agreement on CBS. We'll show that the coefficient definition, QFT_ours ,
agrees with the typical CBS definition, QFT_cbs , on the CBS. Consider the CBS |x⟩ⁿ.
It is a tall 2ⁿ-component vector with a 1 in position x:

    |x⟩ⁿ  =  ( 0, 0, . . . , 1, . . . , 0, 0 )ᵀ        ←− 1 in the xth coordinate

          =  Σ_{k=0}^{N−1} c_k |k⟩ⁿ  =  Σ_{k=0}^{N−1} δ_{kx} |k⟩ⁿ .
Now apply our definition to this ket and see where it leads:

    QFT_ours |x⟩ⁿ  =  (1/√N) Σ_{y=0}^{N−1} Σ_{k=0}^{N−1} c_k ω^{yk} |y⟩ⁿ

                   =  (1/√N) Σ_{y=0}^{N−1} Σ_{k=0}^{N−1} δ_{kx} ω^{yk} |y⟩ⁿ

                   =  (1/√N) Σ_{y=0}^{N−1} ω^{yx} |y⟩ⁿ

                   =  QFT_cbs |x⟩ⁿ .        QED
Step 2) Linearity. We must also show that QFT_ours is linear. Once we do that
we'll know that both it and QFT_cbs (linear by construction) not only agree on the
CBS but are both linear, forcing the two to be equivalent over the entire H(n) .
The definition of QFT_ours in its expanded form is

    QFT |ψ⟩ⁿ  ≡  (1/√N) Σ_{y=0}^{N−1} Σ_{x=0}^{N−1} c_x ω^{yx} |y⟩ ,
so

    QFT_ours |ψ⟩ⁿ  =  ( c̃_0 , c̃_1 , c̃_2 , . . . )ᵀ

                   =  (1/√N) ( Σ_x c_x ω^{0·x} ,  Σ_x c_x ω^{1·x} ,  Σ_x c_x ω^{2·x} ,  . . . )ᵀ

                   =  (1/√N)  ⎡ 1   1    1    · · · ⎤ ⎡ c_0 ⎤
                              ⎢ 1   ω    ω²   · · · ⎥ ⎢ c_1 ⎥
                              ⎢ 1   ω²   ω⁴   · · · ⎥ ⎢ c_2 ⎥ .
                              ⎢ 1   ω³   ω⁶   · · · ⎥ ⎢  ⋮  ⎥
                              ⎣ ⋮    ⋮    ⋮    ⋱   ⎦ ⎣     ⎦
Whenever an operator’s coordinates are transformed by matrix multiplication, the
operator is linear. QED
Step 3) Unitarity. Now that we have the matrix for the QFT ,

    M_QFT  ≡  (1/√N)  ⎡ 1   1    1    · · · ⎤
                      ⎢ 1   ω    ω²   · · · ⎥
                      ⎢ 1   ω²   ω⁴   · · · ⎥ ,
                      ⎢ 1   ω³   ω⁶   · · · ⎥
                      ⎣ ⋮    ⋮    ⋮    ⋱   ⎦

we can use M_QFT to confirm that QFT is unitary: if the matrix is unitary, so is
the operator. (Caution. This is only true when the basis in which the matrix is
expressed is orthonormal, which { |x⟩ⁿ } is.) We need to show that

    (M_QFT)† M_QFT  =  1 ,
so let's take the dot product of row x of (M_QFT)† with column y of M_QFT :

    (1/√N) ( 1, (ω^x)*, (ω^{2x})*, . . . ) · (1/√N) ( 1, ω^y, ω^{2y}, . . . )ᵀ

        =  (1/N) ( 1, ω^{−x}, ω^{−2x}, . . . ) · ( 1, ω^y, ω^{2y}, . . . )ᵀ

        =  (1/N) Σ_{k=0}^{N−1} ω^{k(y−x)}  =  (1/N) ( δ_{xy} N )  =  δ_{xy} .        QED
(The second-from-last identity is from Exercise D (roots-of-unity section) in the
early lesson on complex arithmetic.)
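If you want numerical reassurance, a few lines of C++ (my own sketch, independent of the text's classes) confirm (M_QFT)† M_QFT = 1 for, say, N = 4:

#include <algorithm>
#include <cmath>
#include <complex>
#include <iostream>

int main()
{
   const int N = 4;
   const double PI = 3.14159265358979323846;
   std::complex<double> M[ N ][ N ];

   // M_QFT with the positive-exponent convention: M[y][x] = omega^(yx) / sqrt(N)
   for ( int y = 0; y < N; y++ )
      for ( int x = 0; x < N; x++ )
         M[ y ][ x ] = std::polar( 1.0 / sqrt( (double) N ), 2.0 * PI * y * x / N );

   // check that (M^dagger M)[x][y] = delta_xy, up to round-off
   double worst = 0;
   for ( int x = 0; x < N; x++ )
      for ( int y = 0; y < N; y++ )
      {
         std::complex<double> dot( 0, 0 );
         for ( int k = 0; k < N; k++ )
            dot += std::conj( M[ k ][ x ] ) * M[ k ][ y ];
         double target = ( x == y ) ? 1.0 : 0.0;
         worst = std::max( worst, std::abs( dot - std::complex<double>( target, 0 ) ) );
      }
   std::cout << "largest deviation from the identity: " << worst << std::endl;   // ~ 1e-16
   return 0;
}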
when plugged into the definition of QFT results in a quantum translation invariance

    QFT |x − z⟩ⁿ  =  ω^{zx} (1/√N) Σ_{y=0}^{N−1} ω^{yx} |y⟩ⁿ  =  ω^{zx} QFT |x⟩ⁿ .
We lose the minus exponent of ω because the QFT uses a positive exponent in its
forward direction.
    H^{⊗n} |x⟩ⁿ  =  (1/√2ⁿ) Σ_{y=0}^{2ⁿ−1} (−1)^{x·y} |y⟩ ,

where x · y is the mod-2 dot product based on the individual binary digits in
the base-2 representation of x and y. Now −1 = ω₂ (e^{2πi/2} = e^{πi} = −1, ✓), so let's
replace the −1, above, with its symbol as the square root of unity,

    H^{⊗n} |x⟩ⁿ  =  (1/√2ⁿ) Σ_{y=0}^{2ⁿ−1} ω₂^{x·y} |y⟩ⁿ .
N -Dimensional QFT
• The QFT uses a primitive 2ⁿth root of unity, while H^{⊗n} uses a square root of
unity.

• The exponent of the root-of-unity is an ordinary integer product for QFT , but
is a mod-2 dot product for H^{⊗n} .
A useful factoid is that when N = 2 (n = 1) both operators are the same. Look:

    QFT^{(2)} |x⟩  =  (1/√2) Σ_{y=0}^{1} (−1)^{yx} |y⟩

                  =  (1/√2) ( (−1)^{0·x} |0⟩ + (−1)^{1·x} |1⟩ )

                  =  ( |0⟩ + |1⟩ ) / √2 ,    x = 0
                     ( |0⟩ − |1⟩ ) / √2 ,    x = 1
We’ll refer to this as the quantum Fourier basis or, once we are firmly back in purely
quantum territory, simply as the Fourier basis or frequency basis. It will be needed
in Shor’s period-finding algorithm.
The Take-Away
22.4.1 Notation
We'll review and establish some notation applicable to an N = 2ⁿ-dimensional H(n) .
The CBS kets in an N -dimensional H(n) are officially tensor products of the indi-
vidual CBSs of the 2-dimensional qubit spaces,

    |x⟩ⁿ  =  |x_{n−1}⟩ ⊗ |x_{n−2}⟩ ⊗ |x_{n−3}⟩ ⊗ . . . ⊗ |x_0⟩  =  ⊗_{k=0}^{n−1} |x_k⟩ ,
We'll also use the natural consequence of this notation,

    x  =  Σ_{k=0}^{n−1} x_k 2^k .
    √N  QFT^{(N)} |x⟩ⁿ  =  Σ_{y=0}^{N−1} ω^{xy} |y⟩ⁿ

                        =  Σ_{y=0}^{N−1} ω^{x Σ_{k=0}^{n−1} y_k 2^k} |y_{n−1} . . . y_1 y_0⟩

                        =  Σ_{y=0}^{N−1} ( ∏_{k=0}^{n−1} ω^{x y_k 2^k} ) |y_{n−1} . . . y_1 y_0⟩
(I displayed the order, N , explicitly in the LHS's "QFT^{(N)} ," something that will
come in handy in about two screens.) To keep the equations from overwhelming us,
let's symbolize the inside product by

    π_{xy}  ≡  ∏_{k=0}^{n−1} ω^{x y_k 2^k} ,
Separating a DFT into even and odd sub-arrays led to the FFT algorithm, and we
try that here in the hope of a similar profit.
    √8  QFT |x⟩³
The least significant bit, y_0 , of all terms in the y-even group is always 0, so for terms
in this group,

    y_0 = 0
    ω^{x y_0 2^0}  =  ω^{x·0·1}  =  1 ,        so for even y we get

    π_{xy}  =  ∏_{k=0}^{2} ω^{x y_k 2^k}  =  ∏_{k=1}^{2} ω^{x y_k 2^k} .
k=0 k=1
Evidently, the πxy in the y-even group can start the product at k = 1 rather than
k = 0 since the k = 0 factor is 1. We rewrite the even sum with this new knowledge:
P
y-even group
= πx·0 |000i + πx·2 |010i + πx·4 |100i + πx·6 |110i
= πx·0 |00i + πx·2 |01i + πx·4 |10i + πx·6 |11i |0i
7 2
!
k
X Y
= ω xyk 2 |y2 y1 i |0i
y=0 k=1
y even
Now that |y_0⟩ = |0⟩ has been factored from the sum we can run through the even y
more efficiently by

    • halving the y-sum from Σ_{y even}^{7} −→ Σ_{all y}^{3} ,

    • replacing 2^k −→ 2^{k+1} .
(Take a little time to see why these adjustments make sense.) Applying the bullets
gives

    Σ_{y-even group}  =  Σ_{y=0}^{3} ( ∏_{k=0}^{1} ω^{x y_k 2^{k+1}} ) |y_1 y_0⟩ |0⟩

                      =  Σ_{y=0}^{3} ( ∏_{k=0}^{1} (ω²)^{x y_k 2^k} ) |y_1 y_0⟩ |0⟩ .
There's one final reduction to be made. While we successfully halved the size of y
inside the kets from its original 0 → 7 range to the smaller interval 0 → 3, the x in
the exponent still roams free in the original set. How do we get it to live in the same
smaller world as y? The key lurks here:

    (ω²)^{x y_k 2^k}

The even sub-array rearrangement precipitated a 4th root of unity ω² rather than
the original 8th root ω. This enables us to replace any x > 3 with x − 4, bringing it
back into the 0 → 3 range without affecting the computed values. To see why, do the
following short exercise.

[Exercise. For 4 ≤ x ≤ 7 write x = 4 + p, where 0 ≤ p ≤ 3. Plug 4 + p in for x
in the above exponent and simplify, leveraging the fact that ω = ⁸√1.]
The bottom line is that we can replace x with (x mod 4) and the equality still
holds true,

    Σ_{y-even group}  =  Σ_{y=0}^{3} ( ∏_{k=0}^{1} (ω²)^{(x mod 4) y_k 2^k} ) |y_1 y_0⟩ |0⟩ ,

and we are encouraged to see the order-N/2 QFT staring us in the face,

    Σ_{y-even group}  =  √(N/2)  QFT^{(N/2)} | x mod (N/2) ⟩^{(n−1)} |0⟩ .
Let's make this as concise as possible by using x̃ to mean x mod (N/2).

    Σ_{y-even group}  =  √(N/2)  QFT^{(N/2)} | x̃ ⟩^{(n−1)} |0⟩ .
We've expressed the even group as a QFT whose order is N/2, half the original N .
The scent of recursion (and success) is in the air. Now, let’s take a stab at the odd
group.
The least significant bit, y_0 , of all terms in the y-odd group is always 1, so for
terms in this group,

    y_0 = 1
    ω^{x y_0 2^0}  =  ω^{x·1·1}  =  ω^x ,        so for odd y we get

    π_{xy}  =  ∏_{k=0}^{2} ω^{x y_k 2^k}  =  ω^x ∏_{k=1}^{2} ω^{x y_k 2^k} .
We separated the factor ω^x from the rest so that we could start the product at k = 1
to align our analysis with the y-even group, above. We rewrite the odd sum using
this adjustment:

    Σ_{y-odd group}  =  π_{x·1} |001⟩ + π_{x·3} |011⟩ + π_{x·5} |101⟩ + π_{x·7} |111⟩

                     =  ( π_{x·1} |00⟩ + π_{x·3} |01⟩ + π_{x·5} |10⟩ + π_{x·7} |11⟩ ) |1⟩

                     =  ω^x Σ_{y=0, y odd}^{7} ( ∏_{k=1}^{2} ω^{x y_k 2^k} ) |y_2 y_1⟩ |1⟩
Now that |y_0⟩ = |1⟩ and ω^x have both been factored from the sum, we run through
the odd y by

    • halving the y-sum from Σ_{y odd}^{7} −→ Σ_{all y}^{3} ,

    • replacing 2^k −→ 2^{k+1} ,
and we follow it by the same replacement, x → (x mod 4), that worked for the y-even
group (and works here, too):

    Σ_{y-odd group}  =  ω^x Σ_{y=0}^{3} ( ∏_{k=0}^{1} (ω²)^{(x mod 4) y_k 2^k} ) |y_1 y_0⟩ |1⟩ .
Once again, we are thrilled to see an (N/2)-order QFT emerge from the fray,

    Σ_{y-odd group}  =  ω^x √(N/2)  QFT^{(N/2)} | x̃ ⟩^{(n−1)} |1⟩ .
The binomial |0i + ω x |1i had to end up on the right of the sum because we were
peeling off the least-significant |0i and |1i in the even-odd analysis; tensor products
are not commutative. This detail leads to a slightly annoying but easily handled
wrinkle in the end. You’ll see.
Dividing out the √N , using 2ⁿ for N and rearranging, we get an even clearer
picture.
If we apply the same math to the lower-order QFT on the RHS and plug the
result into the last equation, using x̃̃ for x̃ mod (N/2), we find

    QFT^{(2ⁿ)} |x⟩ⁿ  =  QFT^{(2ⁿ⁻²)} | x̃̃ ⟩^{n−2}  ( ( |0⟩ + ω_{2^{n−1}}^x |1⟩ ) / √2 )  ( ( |0⟩ + ω_{2^n}^x |1⟩ ) / √2 ) .
Now let recursion off its leash. Each iteration pulls a factor of |0⟩ + ω_{2^k}^x |1⟩ out
(and to the right) of the lower-dimensional QFT^{(2^k)} until we get to QFT^{(2)} , which
would be the final factor on the left,

    QFT^{(2ⁿ)} |x⟩ⁿ  =  ∏_{k=1}^{n}  ( |0⟩ + ω_{2^k}^x |1⟩ ) / √2 .
First, admire the disappearance of those pesky x̃ factors, so any anxiety about x
mod N/k is now lifted. Next, note that the RHS is written in terms of different
roots-of-unity, ω₂ , ω₄ , . . . , ω_{2ⁿ} = ω. However, they can all be written as powers of
ω = ω_N ,

    ω_{2^k}  =  ω^{2^{n−k}} .

For example, when k = 1 we have ω₂ = (−1), which is ω^{2^{n−1}} . Using this, we write
the N th order QFT in terms of the N th root-of-unity ω,

    QFT^{(2ⁿ)} |x⟩ⁿ  =  ∏_{k=1}^{n}  ( |0⟩ + ω^{2^{n−k} x} |1⟩ ) / √2 .
Products between kets are tensor products. Each factor in the final product expressing
QFT^{(2ⁿ)} is a single superposition qubit consisting of a mixture of |0⟩ and |1⟩ in its
own component space. Meanwhile, there are n of those small superpositions creating
the final tensor product. It is sometimes written

    QFT^{(2ⁿ)} |x⟩ⁿ  =  ⊗_{k=0}^{n−1}  ( |0⟩ + ω_{2^{n−k}}^x |1⟩ ) / √2 .
I won't use this notation, since it scares people, and when you multiply kets by
kets everyone knows that tensors are implied. So, I'll use the ∏ notation for all
products, and you can infuse the tensor interpretation mentally when you see that
the components are all qubits.
However, it’s still worth remarking that this is a tensor product of kets from the
individual 2-dimensional H spaces (of which there are n) and as such results in a
separable state in the N -dimensional H(n) . This is a special way – different from the
expansion along the CBS – to express a state in this high dimensional Hilbert space.
But you should not be left with the impression that we were entitled to find a factored
representation. Most states in H(n) cannot be factored – they’re not separable. The
result we derived is that when taking the QFT of a CBS we happily end up with a
separable state.
The factored representation and the CBS expansion each give different information
about the output state, and it may not always be obvious how the coefficients or
factors of the two relate (without doing the math).
A simple example is the equivalence of a factored representation and the CBS
expansion of the following |ψ⟩² in a two qubit system (n = 2, N = 4):

    |ψ⟩²  =  |0⟩|0⟩ / √2  +  |0⟩|1⟩ / √2  =  |0⟩ ( |0⟩ + |1⟩ ) / √2
Here we have both the CBS definition of QFT |x⟩² and the separable view.
In the N -dimensional case, the two different forms can be shown side-by-side,

    QFT^{(2ⁿ)} |x⟩ⁿ  =  ∏_{k=1}^{n}  ( |0⟩ + ω^{2^{n−k} x} |1⟩ ) / √2  =  (1/√2ⁿ) Σ_{y=0}^{2ⁿ−1} ω^{xy} |y⟩ .
Of course, there can only be (at most) n factors in the separable factorization, while
there will be up to 2ⁿ terms in the CBS expansion.
The reason I bring this up is that the (separable) factorization is more relevant
to the QFT than it was to the FFT because we are basing our quantum work on
the supposition that there will be quantum gates in the near future. These gates
are unitary operators applied to the input CBS qubit-by-qubit, which is essentially a
tensor product construction.
Let’s see how we can construct an actual QFT circuit from such unitary operators.
Good things come to those who calmly examine each factor, separately. Work from
left (most-significant output qubit) to right (least-significant output qubit).
We already know that this is H |x0 i but we’ll want to re-derive that fact in a way
that can be used as a template for the other two factors. ω is an 8th root of unity,
so the coefficient of |1i in the numerator can be derived from
    ω^{4x}  =  ω^{4(4x_2 + 2x_1 + x_0)}  =  ω^{16x_2} ω^{8x_1} ω^{4x_0}  =  1 · 1 · ω^{4x_0}  =  (−1)^{x_0} ,
which means

    ( |0⟩ + ω^{4x} |1⟩ ) / √2  =  ( |0⟩ + (−1)^{x_0} |1⟩ ) / √2

                               =  ( |0⟩ + |1⟩ ) / √2 ,    x_0 = 0
                                  ( |0⟩ − |1⟩ ) / √2 ,    x_0 = 1

                               =  H |x_0⟩ .
This was the most-significant qubit factor of the output ket (the one on the far left of
the product). Let's refer to the output ket as |x̃⟩ and its most significant separable
factor (the one at the far left of our product) as |x̃_2⟩. We can then rewrite the last
equation as

    |x̃_2⟩  =  H |x_0⟩ .
[Don't be lulled into thinking this is a computational basis element, though. Unlike
the input state, |x⟩ = |x_2⟩ |x_1⟩ |x_0⟩, which is a product of CBS, and therefore, itself a
tensor CBS, the output, |x̃⟩, while a product of states, to be sure, is not comprised of
factors which are CBS in their 2-D homes. Therefore, the product, |x̃⟩, is not a CBS
in the 2ⁿ-dimensional product space.]
Summary: By expressing x as powers-of-2 in the most-significant output factor,
|x̃_2⟩, we were able to watch the higher powers dissolve because they turned ω into 1.
That left only the lowest power of ω, namely ω⁴ = (−1), which, in turn, produced a
"Hadamard effect" on the least significant bit of the input ket, |x_0⟩. We'll do this for
the other two factors with the sober acceptance that each time, fewer high powers
will disappear.
But first, let’s stand back and admire our handiwork. We have an actual circuit
element that generates the most significant separable factor for QFT^{(8)} |x⟩:

    |x_0⟩ ──[ H ]── |x̃_2⟩    ✓
First, wow. Second, do you remember that I said pulling the least-significant kets
toward the right during factorization would introduce a small wrinkle? You’re looking
at it. The output state's most-significant factor, |x̃_2⟩, is derived from the input state's
least significant ket, |x0 i. Make a mental note that this will necessitate a reversal of
the kets once we have the output of the circuit; following the input line for qubit |x0 i
leads not to the output ket’s 0th factor, as we might have hoped, but rather to the
output ket’s (n − 1)st factor.
Again, we remain aware that ω is an 8th root of unity, making the coefficient of |1i
in the numerator of the middle factor
which means

    ( |0⟩ + ω^{2x} |1⟩ ) / √2  =  ( |0⟩ + (−1)^{x_1} (i)^{x_0} |1⟩ ) / √2

                               =  H |x_1⟩ ,                                    x_0 = 0
                                  ( |0⟩ + (−1)^{x_1} (i)^{x_0} |1⟩ ) / √2 ,    x_0 = 1
• The good news is that if x_0 = 0, then the factor i^{x_0} becomes 1 and we are left
with a Hadamard operator applied to the middle input ket, |x_1⟩.
Fixing the bad news of item 1 is where I need you to focus all your attention and
patience, as it is the key to everything and takes only a few more neurons. Let’s go
ahead and take H |x1 i, regardless of whether x0 is 0 or 1. If x0 was 0, we guessed
right, but if it was 1, what do we have to do to patch things up? Not much, it turns
out.
Let’s compare the actual state we computed (wrong, if x0 was 1) with the one we
wanted (right, no matter what) and see how they differ. Writing them in coordinate
form will do us a world of good.
How do we transform

    ( 1 , (−1)^{x_1} )ᵀ    ↦−→    ( 1 , (−1)^{x_1} · i )ᵀ  ?

Answer: multiply by

    R_1  ≡  [ 1  0 ; 0  i ] :

    [ 1  0 ; 0  i ] ( 1 , (−1)^{x_1} )ᵀ  =  ( 1 , (−1)^{x_1} · i )ᵀ .

Now we have the more pleasant formula for the second factor,

    ( |0⟩ + ω^{2x} |1⟩ ) / √2  =  H |x_1⟩ ,        x_0 = 0
                                  R_1 H |x_1⟩ ,    x_0 = 1
1. We apply H to the two least significant kets, |x0 i and |x1 i, unconditionally, since
they will always be used in the computation of the final two most-significant
factors of QFT |xi.
2. We conditionally apply another operator, R1 , to the result of H |x1 i in the
eventuality that x0 = 1.
3. Although we apply all this to the two least significant input kets, |x_1⟩ |x_0⟩,
what we get is the most-significant portion of the output state's factorization,
|x̃_2⟩ |x̃_1⟩ (not the least-significant, so we must be prepared to do some swapping
before the day is done).
    |x_0⟩ ──•── . . .
Let's add the remaining components one-at-a-time. First, we want to apply the
unconditional Hadamard gate to |x_1⟩. As our formulas indicate, this is done before R_1
(R_1 H |x_1⟩ is applied right-to-left). Adding this element, we get:

    |x_1⟩ ──[ H ]──[ R_1 ]── |x̃_1⟩    ✓
    |x_0⟩ ──────────• ────── . . .

This computes our final value for the |x̃_1⟩ factor.
We have yet to add back in the Hadamard applied to |x0 i. Here the order is
important. We have to make sure we do this after x0 is used to control x1 ’s R1 gate.
Were we to apply H to x0 before using it to control R1 , x0 would no longer be there –
it would have been replaced by the Hadamard superposition. So we place the H-gate
to the right of the control vertex:

    |x_1⟩ ──[ H ]──[ R_1 ]───────── |x̃_1⟩    ✓
    |x_0⟩ ────────── • ────[ H ]── |x̃_2⟩    ✓
That completes the circuit element for the two most-significant separable output
factors. We can now get back to analyzing the logic of our instructional n = 3 case
and see how we can incorporate the last of our three factors.
which means

    ( |0⟩ + ω^x |1⟩ ) / √2  =  ( |0⟩ + (−1)^{x_2} (i)^{x_1} (ω)^{x_0} |1⟩ ) / √2

                            =  ( |0⟩ + (−1)^{x_2} (i)^{x_1} |1⟩ ) / √2 ,              x_0 = 0
                               ( |0⟩ + (−1)^{x_2} (i)^{x_1} (ω)^{x_0} |1⟩ ) / √2 ,    x_0 = 1
This time, while the output factor does not reduce to something as simple as H |x_2⟩
in any case, when x_0 = 0 it does look like the expression we had for the middle
factor, except applied here to |x_2⟩ |x_1⟩ rather than the |x_1⟩ |x_0⟩ of the middle factor.
In other words, when x_0 = 0 this least significant factor reduces to

    ( |0⟩ + (−1)^{x_2} (i)^{x_1} |1⟩ ) / √2 ,
while the middle factor was in its entirety

    ( |0⟩ + (−1)^{x_1} (i)^{x_0} |1⟩ ) / √2 .
This suggests that, if x_0 = 0, we apply the same exact logic to |x_2⟩ |x_1⟩ that we used
for |x_1⟩ |x_0⟩ in the middle case. That logic would be (if x_0 = 0)

    ( |0⟩ + ω^x |1⟩ ) / √2  =  H |x_2⟩ ,        x_1 = 0
                               R_1 H |x_2⟩ ,    x_1 = 1

Therefore, in the special case where x_0 = 0, the circuit that works for |x̃_0⟩ looks like
the one that worked for |x̃_1⟩, applied, this time, to qubits 1 and 2:

    |x_2⟩ ──[ H ]──[ R_1 ]── . . .
    |x_1⟩ ────────── • ───── . . .
To patch this up, we have to adjust for the case in which x_0 = 1. The state we just
generated with this circuit was

    (1/√2) ( 1 , (−1)^{x_2} · (i)^{x_1} )ᵀ

... but if x_0 = 1, we really wanted:

    (1/√2) ( 1 , (−1)^{x_2} · (i)^{x_1} · (ω)^{x_0} )ᵀ
How do we transform

    ( 1 , (−1)^{x_2} · (i)^{x_1} )ᵀ    ↦−→    ( 1 , (−1)^{x_2} · (i)^{x_1} · (ω)^{x_0} )ᵀ  ?

Answer: multiply by

    R_2  ≡  [ 1  0 ; 0  ω ] :

    [ 1  0 ; 0  ω ] ( 1 , (−1)^{x_2} · (i)^{x_1} )ᵀ  =  ( 1 , (−1)^{x_2} · (i)^{x_1} · (ω)^{x_0} )ᵀ .
(Remember, this is in the case when x_0 = 1, so ω = ω^{x_0}.) This gives us the complete
formula for the rightmost output factor, |x̃_0⟩,

    ( |0⟩ + ω^x |1⟩ ) / √2  =  H |x_2⟩ ,            x_0 = 0, x_1 = 0
                               R_1 H |x_2⟩ ,        x_0 = 0, x_1 = 1
                               R_2 H |x_2⟩ ,        x_0 = 1, x_1 = 0
                               R_2 R_1 H |x_2⟩ ,    x_0 = 1, x_1 = 1
In words, we took the tentative circuit that we designed for |x̃_0⟩ under the assumption
that x0 = 0 but tagged on a correction if x0 = 1. That amounts to our newly
introduced operator R2 controlled by x0 , so that the result would be further multiplied
by R2 .
The Circuit Element for |x̃_0⟩, in Isolation
As long as we only consider |x̃_0⟩, we can easily use this new information to patch
up the most recent x_0 = 0 case. We simply add an |x_0⟩ at the bottom of the picture,
and use it to control an R_2 -gate, applied to the end of the |x_2⟩ → |x̃_0⟩ assembly line.

    |x_2⟩ ──[ H ]──[ R_1 ]──[ R_2 ]── |x̃_0⟩    ✓
    |x_1⟩ ────────── • ────────────── . . .
    |x_0⟩ ──────────────────── • ──── . . .
In the previous section, we obtained the exact result for the least-significant output
factor, |x̃_0⟩.

    |x_2⟩ ──[ H ]──[ R_1 ]──[ R_2 ]── |x̃_0⟩    ✓
    |x_1⟩ ────────── • ────────────── . . .
    |x_0⟩ ──────────────────── • ──── . . .

In the section prior, we derived the circuit for the output's two most significant factors,
|x̃_1⟩ and |x̃_2⟩:

    |x_1⟩ ──[ H ]──[ R_1 ]───────── |x̃_1⟩    ✓
    |x_0⟩ ────────── • ────[ H ]── |x̃_2⟩    ✓
All that's left to do is combine them. The precaution we take is to defer applying any
operator to an input ket until after that ket has been used to control any R-gates
needed by its siblings. That suggests that we place the |x̃_1⟩ |x̃_2⟩ circuit elements to
the right of the |x̃_0⟩ circuit element, and so we do.

    |x_2⟩ ──[ H ]──[ R_1 ]──[ R_2 ]──────────────────────── |x̃_0⟩    ✓
    |x_1⟩ ────────── • ─────────────[ H ]──[ R_1 ]───────── |x̃_1⟩    ✓
    |x_0⟩ ──────────────────── • ──────────── • ───[ H ]─── |x̃_2⟩    ✓
Prior to celebration, we have to symbolize the somewhat trivial circuitry for re-
ordering the output. While trivial, it has a linear (in n = log N ) cost, but it adds
nothing to the time complexity, as we'll see.

    |x_2⟩ ──[ H ]──[ R_1 ]──[ R_2 ]────────────────────────×── |x̃_2⟩    ✓
    |x_1⟩ ────────── • ─────────────[ H ]──[ R_1 ]─────────│── |x̃_1⟩    ✓
    |x_0⟩ ──────────────────── • ──────────── • ───[ H ]───×── |x̃_0⟩    ✓
You are looking at the complete QFT circuit for a 3-qubit system.
Before we leave this case, let's make one notational observation. We defined R_1
to be the matrix that "patched-up" the |x̃_1⟩ factor, and R_2 to be the matrix that
"patched-up" the |x̃_0⟩ factor. Let's look at those gates along with the only other gate
we needed, H:

    R_2  ≡  [ 1  0 ; 0  ω ]        R_1  ≡  [ 1  0 ; 0  i ]

    H  ≡  (1/√2) [ 1  1 ; 1  −1 ]
The lower right-hand element of each matrix is a root-of-unity, so let's see all three
matrices again, this time with that lower right element expressed as a power of ω:

    R_2  ≡  [ 1  0 ; 0  ω ]        R_1  ≡  [ 1  0 ; 0  ω² ]

    H  ≡  (1/√2) [ 1  1 ; 1  ω⁴ ]
This paves the way to generalizing to a QFT of any size.
    |x_2⟩ ──[ H ]──[ R_1 ]──[ R_2 ]
    |x_1⟩ ────────── • ─────────────[ H ]──[ R_1 ]
    |x_0⟩ ──────────────────── • ──────────── • ───[ H ]
[Exercise. Go through the steps that got us this circuit, but add a fourth qubit to
get the n = 4 (QFT (16) ) circuit.]
It doesn’t take too much imagination to guess what the circuit would be for any
n:
    |x_{n−1}⟩ ──[ H ]──[ R_1 ]──[ R_2 ]── · · · ──[ R_{n−1} ]
    |x_{n−2}⟩ ─────────── • ──── · · · ───[ H ]──[ R_1 ]── · · · ──[ R_{n−2} ]
    |x_{n−3}⟩ ──────────────────── • ──── · · · ─── • ──── · · ·
        ⋮                                    ⋮
    |x_0⟩ ──── · · · ──── • ──── · · · ──── • ──── · · · ───[ H ]
That’s good and wonderful, but we have not defined Rk for k > 2 yet. However, the
final observation of the n = 3 case study suggested that it should be
    R_k  ≡  [ 1  0 ; 0  ω^{2^{n−k−1}} ] .
You can verify this by analyzing it formally, but it’s easiest to just look at the extreme
cases. No matter what n is, we want R1 ’s lower-right element = i, and Rn−1 ’s to be
ω (compare our n = 3 case study directly above), and you can verify that for k = 1
and k = n − 1, that’s indeed what we get.
Well, we have defined and designed the QFT circuit out of small, unitary gates.
Since you’ll be using QFT in a number of circuit designs, you need to be able to cite
its computational complexity, which we do now.
The circuit complexity for this is O(n²). Adding on a circuit that reverses the order
can only add an additional O(n) gates, but in series, not nested, so that does not
affect the circuit complexity. Therefore, we are left with a computational factor of
O(n²) = O(log² N ).
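One way to see the O(n²) gate count is to enumerate the gates classically, wire by wire; this little sketch (mine) prints the gate list for the circuit above and reports n(n + 1)/2 gates:

#include <iostream>

int main()
{
   const int n = 3;                                // 3 qubits reproduces the circuit above
   int gateCount = 0;

   for ( int wire = n - 1; wire >= 0; wire-- )     // |x_{n-1}> down to |x_0>
   {
      std::cout << "H on |x_" << wire << ">\n";
      gateCount++;
      for ( int k = 1; k <= wire; k++ )            // R_1, R_2, ... controlled from below
      {
         std::cout << "R_" << k << " on |x_" << wire
                   << ">, controlled by |x_" << ( wire - k ) << ">\n";
         gateCount++;
      }
   }
   std::cout << "total gates: " << gateCount << std::endl;   // n(n+1)/2, i.e., O(n^2)
   return 0;
}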
You might be tempted to compare this with the O(N log N ) performance of the FFT ,
but that’s not really an apples-to-apples comparison, for several reasons:
1. Our circuit computes QFT |x⟩ⁿ for only one of the N = 2ⁿ basis states. We'd
have to account for the algorithm time required to repeat N passes through
the circuit which (simplistically) brings it to O(N log² N ). While this can be
improved, the point remains: our result above would have to be multiplied by
something to account for all N output basis states.

2. If we were thinking of using the QFT to compute the DFT , we'd need to
calculate the N complex amplitudes, { c̃_k }, from the inputs { c_k }. They don't
appear in our analysis because we implicitly considered the special N amplitudes
{ δ_{xy} }_{y=0}^{N−1} that define the CBS. Fixing this feels like an O(N ) proposition.
3. Even if we could repair the above with clever redesign, we have the biggest
obstacle: the output coefficients – which hold the DFT information – are am-
plitudes. Measuring the output state collapses them, destroying their quantum
superposition.
Although we cannot (yet) use the QFT to directly compute a DFT with growth
smaller than the FFT , we can still use it to our advantage in quantum circuits, as
we will soon discover.
[Accounting for Precision. You may worry that increasingly precise matrix
multiplies will be needed in the 2 × 2 unitary matrices as n increases. This is a valid
concern. Fixed precision will only get us so far before our ability to generate and com-
pute with increasingly higher roots-of-unity is tapped out. So the constant time
unitary gates are relative to, or "above," the primitive complex (or real) multiplica-
tions and additions. We would have to make some design choices to either limit n
to a maximum usable size or else account for these primitive arithmetic operations
in the circuit complexity. We'll take the first option: our n will remain below some
maximum n_0 , and we build our circuitry and algorithm to handle adequate pre-
cision for that n_0 and all n ≤ n_0 . This isn't so hard to do in our current problem
since n never gets too big: n = log2 N , where N is the true size of our problem, so we
won’t be needing arbitrarily large ns in practice. If that doesn’t work for us in some
future problem, we can toss in the extra complexity factors and they will usually still
produce satisfactory polynomial big-Os for problems that are classically exponential.]
Further Improvements
I’ll finish by mentioning, without protracted analysis, a couple ways the above
circuits can be simplified and/or accelerated:
• If we are willing to destroy the quantum states and measure the output qubits
immediately after performing the QFT (something we are willing to do in most
of our algorithms), then the two-qubit (controlled-Rk ) gates can be replaced
with 1-qubit gates. This is based on the idea that, rather than construct a
controlled-Rk gate, we instead measure the controlling qubit first and then apply
Rk based on the outcome of that measurement. This sounds suspicious, I know:
we’re still doing a conditional application of a gate. However, a controlled-Rk
does not destroy the controlling qubit and contains all the conditional logic
inside the quantum gate, whereas measuring a qubit and then applying a 1-
qubit gate based on its outcome, moves the controlling aspect from inside the
quantum gate to the outer classical logic. It is much easier to build stable
conditioned one-qubit gates than two-qubit controlled gates. Do note, however,
that this does not improve the computational complexity.
Chapter 23
Shor’s Algorithm
2. Shor's algorithm for factoring not only makes use of the period-finding algo-
rithm, but also provides an oracle for a specific function that is polynomial-time,
making the entire {circuit + algorithm} an absolute exponential speed-up over
the classical version.
Shor’s period-finding algorithm seeks to find the period, a, of a periodic function f (k)
whose domain is a finite set of integers, i.e., k ∈ ZM = {0, 1, 2, . . . , M − 1}.
Typically M can be very large; we can even consider the domain to be all of Z,
although the ZM case will be shown to be an equivalent finitization that makes the
problem tractable. I’ll state this explicitly in a couple screens. Here, we want to get
a general picture of the problem and the plan.
First, for this to even make sense a has to be less than M so that such periodicity
is “visible” by looking at the M domain points available to us.
Second, this kind of periodicity is more “traditional” than Simon’s, since here we
use ordinary addition to describe the period a via f (x) = f (x + a) rather than the
exotic mod-2 periodicity of Simon, in which f (x) = f (x ⊕ a).
We’ll see that the circuit and analysis have the same general framework as that of
Simon’s algorithm with the main difference being a post-oracle QFT gate rather
than a post-oracle Hadamard gate. What’s the big idea behind this change? At the
highest level of analysis it is quite simple even if the details are thorny.
If we put a maximally mixed superposition into the oracle as we have done for all
previous algorithms (and will do here) then a post-oracle measurement in the standard
basis should give all possible values of f with equal probabilities. That won’t do. We
have to measure along a basis that will be biased toward giving information. For
Simon, that happened to be the x-basis, thus the use of a final Hadamard operator.
What is the right basis in which to measure when we are hoping to discover an integer
function's period?
Our lectures on classical Fourier series and transforms had many ideas and conse-
quences of which I’d like to revive two.
1. Period, T , and frequency, f (not the function, f (x), but its frequency) are
related by
    T · f  =  constant,

where the constant is usually 1 or 2π for continuous functions and the vector
size, M , for discrete functions.
2. If a discrete function is periodic, its spectrum DFT (f ) will have values which
are mostly small or zero except at domain points that are multiples of the
frequency (See Figure 23.1).
Figure 23.1: The spectrum of a vector with period 8 and frequency 16 = 128/8
In very broad – and slightly inaccurate – terms, this suggests we query the spectrum of
our function, f (x), ascertain its fundamental frequency, m, (the first non-zero spike)
and from it get the period, a = M/m.
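Classically, that "read the period off the spectrum" step is nothing more than the following sketch (my own illustration, where spectrum[] holds the magnitudes of the DFT of the M-point signal):

#include <vector>

// find the first dominant spike m > 0 (the fundamental frequency) and
// report the period a = M/m, as described above
int periodFromSpectrum( const std::vector<double> &spectrum, double threshold )
{
   const int M = (int) spectrum.size();
   for ( int m = 1; m < M; m++ )
      if ( spectrum[ m ] > threshold )
         return M / m;            // a = M / m
   return 0;                      // no visible periodicity
}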
But what does it mean to “query the frequency?” That’s code for “take a post-
oracle measurement in the Fourier basis.” We learned that measuring along a non-
preferred basis is actually applying the operator that converts the preferred basis to
the alternate basis, and for frequencies of periodic functions, that gate is none other
than the QFT .
Well this sounds easy, and while it may motivate the use of the QFT , figuring
out how to use it and what to test – that will consume the next two weeks.
Another thing we saw in our Fourier lectures was that we get the cleanest, easiest-to-
analyze spectrum of a periodic function when we start with a pure periodic function
in the spatial (or time) domain, and then apply the transform. In the continuous case
that was a sinusoid or exponential, e.g., sin 3x (Figure 23.2).
In the discrete case it was a function that was zero everywhere except for a single
k < a and its multiples, like
( 0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0,
0, 0, 0, .25 , 0, 0, 0, 0 ) .
Such overtly periodic vectors have DFT s in which all the non-zero frequencies in the
spectrum have the same amplitudes as shown in Figure 23.3.
Figure 23.3: The spectrum of a purely periodic vector with period 8 and frequency
16 = 128/8
We will process our original function so that it produces a purely periodic cousin with
the same period by
1. putting a maximally mixed state into the oracle's A register to enable quantum
parallelism, and

2. measuring the oracle's B (output) register, which leaves in the A register only
those domain values that share the measured f value.
This will leave a “pure” periodic vector in the A register which we can send through
a post-processing QFT gate and, finally, measure. We may have to do this more
than once, thereby producing several measurements. To extract m, and thus a, from
the measurements we will apply some beautiful mathematics.
After first defining the kind of periodicity that Shor’s work addresses, we begin the
final leg of our journey that will take us through some quantum and classical terrain.
When we’re done, you will have completed the first phase in your study of quantum
computation and will be ready to move on to more advanced topics.
The math that accompanies Shor’s algorithms is significant; it spans areas as
diverse as Fourier analysis, number theory, complex arithmetic and trigonometry.
We have covered each of these subjects completely. If you find yourself stuck on some
detail, please search the table of contents in this volume for a pointer to the relevant
section.
A function defined on Z,
f: Z −→ S, S ⊂ Z,
is called periodic injective if there exists an integer a > 0 (called the
period), such that
for all x ≠ y in Z, we have
f (x) = f (y) ⇐⇒ y = x + ka, some integer k.
The term “injective” will be discussed shortly. Because of the if and only if (⇔) in
the definition, we don’t need to say “smallest” or “unique” for a. Those conditions
follow naturally.
Caution: Some authors might call f "a-periodic" to make the period visible,
omitting the reference to injectivity. Others might just call f "periodic" and let you
fend for yourselves.
In theory, S, the range of f , can be any set: integers, sheep or neutrinos. It’s the
periodicity that matters, not what the functional values are. Still, we will typically
consider the range to be a subset of Z, most notably, Z_{2^r} (for some positive integer
r). This will make our circuit analysis clearer.
23.2.2 Functions of the Group Z_M
A little consideration of the previous definition should convince you that any periodic
injective function with period a can be confined to a finite subset of Z which contains
the interval [0, a). To “feel” f ’s periodicity, though, we’d want M to contain at least
a few “copies of a” inside it, i.e., we would like M > 3a or M > 1000a. It helps if
we assume that we do know such an M , even if we don’t know the period, a, and let
f be defined on ZM , rather than the larger Z. The definition of periodic injective in
this setting would be as follows.
A function defined on Z_M,

    f : Z_M ⟶ S,   S ⊂ Z_M,

is called periodic injective, with period a, under the same if-and-only-if condition as before. One caution: the relation
    “y = x + ka”
uses ordinary addition, not mod-M addition. We are not saying that we can let
x + ka wrap around M back to 0, 1, 2, . . . and find more numbers that are part of
the f (x) = f (y) club. For example, the function
    f(x) = x % 8
(defined on Z_M for a suitable M) is periodic injective with period a = 8, as long as we use ordinary addition and do not wrap around M.
Why the word “injective”? At first glance the name seems wrong: a periodic f clearly sends multiple domain points to the same image value. Where is there 1-to-1-ness? It
derives from the following fact which is a direct consequence of the definition:
The if-and-only-if (⇔) condition in the definition of periodic injective implies
that, when restricted to the set [0, a) = { 0, 1, 2, . . . , a − 1 }, f is 1-to-1. The same is true of any set of ≤ a consecutive integers in the domain.
[Exercise. Prove it.]
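As a small aid to intuition, here is a brute-force sketch in Python (the helper name is_periodic_injective is mine, not the text's) that checks the if-and-only-if condition on a finite window of the domain, using the f(x) = x % 8 example above.

```python
from itertools import combinations

def is_periodic_injective(f, domain, a):
    """Check: f(x) == f(y) exactly when x and y differ by a multiple of a."""
    for x, y in combinations(domain, 2):
        if (f(x) == f(y)) != ((y - x) % a == 0):
            return False
    return True

f = lambda x: x % 8                    # the example from the text
domain = range(64)                     # a finite window holding several copies of the period
print(is_periodic_injective(f, domain, 8))    # True
print(is_periodic_injective(f, domain, 4))    # False: 4 is not the period
# Restricted to any a consecutive integers, f is 1-to-1:
print(len({f(x) for x in range(10, 18)}) == 8)   # True
```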
This is the twist I promised had some kinship to Simon’s periodic functions. Recall
that they were also 1-to-1 on each of their two disjoint cosets R and Q in that conver-
sation. The property is required in our treatment of Shor’s period-finding as well; we
must be able to partition f ’s domain into disjoint sets on which f is 1-to-1. This is
not to say that the property is necessary in order for Shor’s algorithm to work (there
may be more general results that work for vanilla flavored periodicity). However, the
majority of historical quantum period-finding proofs make use of injective periodic-
ity, whether they call it that, or not. For factoring and encryption-breaking, that
condition is always met.
While we're comparing Simon to Shor, let's discuss a difference. In Simon's case, if we found even one pair of elements, x′ ≠ y′ with f(x′) = f(y′), then we knew a. However, the same cannot be said of Shor's problem. All we would know in the current case is that x′ and y′ differ by a multiple of a, but we would know neither a nor the multiple.
The M in the problem statement is not about the periodicity (a describes that) as
much as it is about how big the problem is; it gives us a bound on a. It is M which
is used to measure the computational complexity of the algorithm; how does the
{algorithm + circuit} grow as M gets larger?
Relativized vs. Absolute Speed-Up
I feel compelled to say this again before we hit-the-road. Our goal is to produce a
quantum algorithm that completes in polynomial time when its classical counterpart
has exponential complexity. The time complexity of Shor’s period-finding algorithm
will end up being limited by both the circuits/algorithms we build around our oracle
Uf , as well as f , itself. So if f cannot be computed in polynomial time, at least
by some quantum circuit, we won’t end up with an easy algorithm. We will prove
all the circuits and algorithms around Uf to be of polynomial complexity, specifically O(log³ M). We will also show (in a future lesson) that the f needed for RSA encryption-breaking is O(log⁴ M), so the factoring problem is quantum-easy even though it is classically hard.
The point is that there are two problems: period-finding, which has quantum
relativized speed-up and factoring, which has quantum absolute speed-up. Our job is
to make sure that the quantum machinery around Uf is “polynomial,” so that when
f , itself, is polynomial (as is the case in factoring) we end up with a problem that is
absolutely easy in the quantum realm.
In the development of the algorithm it will also help to add the very weak assumption
a < M/2,
i.e., the periodicity cycles at least twice in the interval [0, M − 1].
Figure 23.6: We add the weak assumption that two (or more) a-intervals fit into [0, M)
Figure 23.7: Typical application provides many a-intervals in [0, M )
Figure 23.8: Our proof will also work for only one a interval in [0, M )
We choose n to be the unique positive integer satisfying

    2ⁿ⁻¹ < M² ≤ 2ⁿ .

We'll use the integer interval [0, 2ⁿ − 1] as our official domain for f, and we'll let N be the actual power-of-2,

    N ≡ 2ⁿ .

Since M > 2a, we are guaranteed that [0, N − 1] will contain at least as many intervals of size a within it as [0, M − 1] does.
[You’re worried about how to define f beyond the original domain limit M ? Stop
worrying. It’s not our job to define f , just discover its period. We know that f is
periodic with period a, even though we don’t know a yet. That means its definition
can be extended to all of Z. So we can take any size domain we want. Stated another
way, we assume our oracle can compute f (x) for any x.]
The reason for bracketing M² like this only becomes apparent as the plot unfolds. Don't be intimidated into believing anyone could predict we would need these exact bounds so early in the game. (See Figure 23.9: N = 2ⁿ is chosen so that (N/2, N] brackets M².) Surely even the pioneers got to the end of the
derivation and noticed that these limits would be needed, then came back and added
them up front. That’s exactly how you will do it when you write up the quantum
algorithms that you discover.
The bracketing of M², when written in terms of N, looks like

    N/2 < M² ≤ N .
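As a quick sanity check, here is a small Python sketch (the function name shor_sizes is mine) that picks n and N from a known bound M, and, if we secretly know the period a, also reports the integer quotient m = N // a.

```python
def shor_sizes(M, a=None):
    """Pick n so that 2**(n-1) < M**2 <= 2**n, and set N = 2**n.
    If the (secretly known) period a is supplied, also report m = N // a."""
    n = (M * M - 1).bit_length()          # smallest n with 2**n >= M**2
    N = 2 ** n
    assert 2 ** (n - 1) < M * M <= N
    return (n, N) if a is None else (n, N, N // a)

print(shor_sizes(25))           # (10, 1024): 512 < 625 <= 1024
print(shor_sizes(25, a=10))     # (10, 1024, 102)
```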
Also, without loss of generality, we assume that the range of f is a subset of Z_{2ʳ} for some sufficiently large r > 0, i.e.,

    ran(f) ⊆ [0, 2ʳ − 1] , also written 0 ≤ f(x) < 2ʳ .
Let's be certain that using N for the time complexity is the same as using M. Taking the log (base 2) of the inequality that brackets M², we get

    log(N/2) < log M² ≤ log N , or
    log N − log 2 < 2 log M ≤ log N .

The big-O of every expression in this equation will kill the constants and weaken < to ≤, producing

    O(log N) ≤ O(log M) ≤ O(log N) ,
and since the far left and far right are equal, they both equal the middle, i.e.,

    O(log N) = O(log M) .

Therefore, the growth rates of any fixed power of these two are also equal,

    O(logᵖ N) = O(logᵖ M) .

Thus, bracketing M² between N/2 and N allows us to use N to compute the complexity and later replace it with M. Specifically, we'll eventually compute a big-O of log³ N for Shor's algorithm, implying a complexity of log³ M.
Two kinds of integer expressions will appear inside our kets; let's prepare for both.

• |y ⊕ f(x)⟩ʳ. We're familiar with the mod-2 sum and its use inside a ket, especially when we are expressing Uf's target register. The only very minor adjustment we'll need to make arises from the oracle's B channel being r qubits wide (where 2ʳ is f's range size) instead of the same n qubits of the oracle's A channel (where 2ⁿ is f's domain size). We'll be careful when we come to that.

• |x + ja⟩ⁿ. As for ordinary addition inside the kets, this will come about when we partition the domain into mutually exclusive “cosets,” a process that I'll describe shortly. The main thing to be aware of is that the sum must not extend beyond the dimension of the Hilbert space in which the ket lives, namely 2ⁿ. That's necessary since an integer x inside a ket |x⟩ⁿ represents a CBS state, and there are only 2ⁿ of those, |0⟩ⁿ, . . . , |2ⁿ − 1⟩ⁿ. We'll be sure to obey that rule, too.
Okay, I’ve burdened you with eye protection, seat belts and other safety equip-
ment, and I know you’re bursting to start building something. Let’s begin.
23.4 Shor’s Quantum Circuit Overview and the
Master Plan
23.4.1 The Circuit
The total circuit looks very much like Simon’s.
[Circuit diagram: the A register, |0⟩ⁿ, passes through H⊗ⁿ, then the oracle Uf, then QFT(N), and is measured (the actual measurement); the B register, |0⟩ʳ, passes through Uf and is measured only conceptually.]
[Note: As with Simon, I suppressed the hatching of the quantum wires so as to
produce a cleaner looking circuit. The A channel has n lines, and the B channel has
r lines, as evinced by the kets and operators which are labeled with the “exponents”
n, N and r.]
There are two multi-dimensional registers, the upper A register, and the lower
B register. A side-by-side comparison of Shor’s and Simon’s circuits reveals two
differences:
Target Channel. The bottom line forwards its |0ir directly on to the quantum
oracle’s B register, a move that (we saw with Simon) anticipates an application of
the generalized Born rule.
At that point, we conceptually test the B register output, causing a collapse of
both registers (Born rule). We’ll analyze what’s left in the collapsed A register’s
output, (with the help of a “re-organizing,” QFT gate). We’ll find that only a very
small and special set of measurement results are likely. And like Simon’s algorithm,
we may need more than one sampling of the circuit to get an adequate collection of
useful outputs on the A-line, but it’ll come very quickly due to the probabilities.
Strategy
Up to now, I’ve been comparing Shor to Simon. There’s an irony, though, when we
come to trying to understand the application of the final post-oracle, pre-measurement
gate. It was quite difficult to give a simple reason why a final Hadamard gate did
the trick for Simon’s algorithm (I only alluded to a technical lemma back then.) But
the need for a final QFT , as we have already seen, is quite easy to understand:
we want the period, so we measure in the Fourier basis to get the fundamental fre-
quency, m. Measuring in the Fourier basis means applying a z basis-to-Fourier basis
transformation, i.e., a QFT . m gets us a and we go home early.
One wrinkle rears its head when we look at the spectrum of a periodic function,
even one that is pure in the sense described above. While the likely (or in some cases
only) measurement possibilities may be limited to a small subset {cm}_{c=0}^{a−1}, where m = N/a is the frequency associated with the period a, we don't know which cm we will measure; there are a of them and they are all about equally likely. You'll see why
we should expect to get lucky.
Figure 23.10: Eight highly probable measurement results, cm, for N = 128 and a = 8
So, while we’ll know we have measured a multiple cm of the frequency, we won’t know
which multiple. As it happens, if we are lucky enough to get a multiple c that has the
bonus feature of being relatively prime (coprime) to a, we be able to use it to find a.
A second wrinkle is that despite what I’ve led you to believe through my pictures,
the likely measurements aren’t exact multiples cm of the frequency, m. Instead they
will be a values yc, for c = 0, 1, · · · , (a − 1), which are very close to cm. We'll have to find out how to lock on to the nearby cm associated with our measured yc. Still, when we do, a c coprime to a will be the most desirable multiplier, the one that will lead to a.
Two Forks
As we proceed, we’ll get to a fork in the road. If we take the right fork, we’ll find an
easy option. However the left fork will require much more detailed math. That’s the
hard option. In the easy case the spectrum measurement will yield an exact multiple,
cm, of m that I spoke of above. The harder, general case, will give us only a yc close
to cm. Then we’ll have to earn our money and use some math to hop from the yc on
which we landed to the nearby cm that we really want.
That’s the plan. It may not be a perfect plan, but I think that’s what I like about
it.
[Circuit diagram, repeated with access points labeled A (post-Hadamard), B (post-oracle) and C (post-QFT): the A register, |0⟩ⁿ, passes through H⊗ⁿ, Uf, then QFT(N); the B register, |0⟩ʳ, passes through Uf and receives only the conceptual measurement.]
Since many of the sections are identical to what we’ve done earlier, the analysis is
also the same. However, I’ll repeat the discussion of those common parts to keep this
lecture somewhat self-contained.
The H⊗ⁿ gate puts the A register into a maximally mixed state, enabling the oracle to act on f(x) for all possible x, simultaneously.
[Circuit diagram repeated; we now focus on the initial H⊗ⁿ stage acting on the A register.]

Hadamard, H⊗ⁿ, in H(n)
Even though we are only going to apply the 2ⁿ-dimensional Hadamard gate to the simple input |0⟩ⁿ, let's review the effect it has on any CBS |x⟩ⁿ:

    |x⟩ⁿ   ⟶ H⊗ⁿ ⟶   (1/√(2ⁿ)) Σ_{y=0}^{2ⁿ−1} (−1)^{x·y} |y⟩ⁿ ,
where the dot product between vector x and vector y is the mod-2 dot product. When applied to |0⟩ⁿ, this reduces to

    |0⟩ⁿ   ⟶ H⊗ⁿ ⟶   (1/√(2ⁿ)) Σ_{y=0}^{2ⁿ−1} |y⟩ⁿ ,

or, returning to the usual computational basis notation |x⟩ⁿ for the summation,

    |0⟩ⁿ   ⟶ H⊗ⁿ ⟶   (1/√(2ⁿ)) Σ_{x=0}^{2ⁿ−1} |x⟩ⁿ .
The output state of this Hadamard operator is the nth order x-basis CBS ket, |+⟩ⁿ = |0⟩ⁿ_±, reminding us that Hadamard gates provide both quantum parallelism and a z ↔ x basis conversion.
[Circuit diagram repeated; we now focus on the quantum oracle Uf.]
The only difference between this and Simon’s oracle is the width of the oracle’s B
register. Today, it is r qubits wide, where r will be (typically) smaller than the n of
the A register.
[Oracle diagram: the A register CBS |x⟩ⁿ passes through Uf unchanged, while the B register input |0⟩ʳ emerges as |0 ⊕ f(x)⟩ʳ = |f(x)⟩ʳ.] In other words,

    Uf : |x⟩ⁿ |0⟩ʳ ⟼ |x⟩ⁿ |f(x)⟩ʳ .

By linearity, the oracle applied to the full Hadamard output produces

    (1/√(2ⁿ)) Σ_{x=0}^{2ⁿ−1} |x⟩ⁿ |0⟩ʳ   ⟼   (1/√(2ⁿ)) Σ_{x=0}^{2ⁿ−1} |x⟩ⁿ |f(x)⟩ʳ ,

which we see is a weighted sum of separable products, all weights being equal to (1/√2)ⁿ. The headlines are these:
• The output is a superposition of the separable terms |x⟩ⁿ |f(x)⟩ʳ / (√2)ⁿ, the kind of sum the generalized Born rule needs,

    |φ⟩ⁿ⁺ʳ = |0⟩ⁿ_A |ψ₀⟩ʳ_B + |1⟩ⁿ_A |ψ₁⟩ʳ_B + · · · + |2ⁿ − 1⟩ⁿ_A |ψ_{2ⁿ−1}⟩ʳ_B .
Our analysis of the oracle's output will split into two cases:
1. an instructional/easy case that has some unrealistic constraints on the period, a, and
2. the general case, free of those constraints.
The easy case will give us the general framework that we will re-use in the general
analysis. Don’t skip it. The general case requires that you have your mind already
primed with the key steps from the easy case.
Basic Notation
We express the fact that one integer, c, divides another integer, a, evenly (i.e., with remainder 0) using the notation

    c | a .

We write Z≥0 for the set of non-negative integers, and below we will often consider pairs

    a, b ∈ Z≥0 , a > b,
23.9 First Fork: Easy Case (a | N)

We now consider the special case in which a is a divisor of N = 2ⁿ (in symbols, a | N). This implies a = 2ˡ for some l. (See Figure 23.11: the easy case covers a | N, exactly.) Immediately, we recognize that there's really no need for
a quantum algorithm in this situation because we can test for periodicity using classical means by simply trying 2ˡ for l = 1, 2, 3, . . . , (n − 1), which constitutes O(log N) trials. In the case of factoring, to which we'll apply period-finding in a later lecture, we'll see that each trial requires a computation of f(x) = yˣ (mod N) ∈ O(log⁴ N), y some constant (to be revealed later). So the classical approach is O(log N) relative to the oracle, and O(log⁵ N) absolute including the oracle, all without the help of a quantum circuit. However, the quantum algorithm in this easy case lays the foundation for the difficult case that follows, so we will develop it now and confirm that QC can match the O(log⁵ N) classical complexity (in fact, we'll do a little better).
We start by decomposing the domain into intervals of length a:

    [0, N − 1] = [0, ma − 1]
= [0, a − 1] ∪ [a, 2a − 1] ∪ [2a, 3a − 1] · · · ∪ [(m − 1)a, ma − 1]
= R ∪ R + a ∪ R + 2a ··· ∪ R + (m − 1)a,
where
R ≡ [0, a − 1] = {0, 1, 2, ... , a − 1},
a = period of f, and
m = N/a is the number of times a divides N = 2ⁿ .
Definition of Coset. R + ja is called the jth coset of R.
We rewrite this decomposition relative to a typical element, x, in the base coset R.
    [0, N − 1] = {0, 1, . . . , x, . . . , a − 1} ∪ {a, 1 + a, . . . , x + a, . . . , 2a − 1}
                 ∪ {2a, 1 + 2a, . . . , x + 2a, . . . , 3a − 1} ∪ · · ·
                 · · · ∪ {(m − 1)a, 1 + (m − 1)a, . . . , x + (m − 1)a, . . . , ma − 1}

               = ⋃_{j=0}^{m−1} { x + ja | x ∈ [0, a) }

               = ⋃_{j=0}^{m−1} { x + ja }_{x=0}^{a−1} .
Using this fact (and keeping in mind that N = 2ⁿ) we only need to sum over the a elements in R and include all the x + ja siblings in each term's A register factor,
    (1/√N) Σ_{x=0}^{N−1} |x⟩ⁿ |f(x)⟩ʳ

      = (1/√N) Σ_{x=0}^{a−1} ( |x⟩ⁿ + |x + a⟩ⁿ + |x + 2a⟩ⁿ + · · · + |x + (m−1)a⟩ⁿ ) |f(x)⟩ʳ

      = (1/√N) Σ_{x=0}^{a−1} Σ_{j=0}^{m−1} |x + ja⟩ⁿ |f(x)⟩ʳ

      = √(m/N) Σ_{x=0}^{a−1} ( (1/√m) Σ_{j=0}^{m−1} |x + ja⟩ⁿ ) |f(x)⟩ʳ .
I moved a factor of 1/√m to the right of the outer sum so we could see that each parenthesized inner sum is a normalized state.
[Circuit diagram repeated; we now focus on the conceptual measurement of the B register.]
As the last sum demonstrated, each B register measurement of f (x) will be attached
to not one, but m, input A register states. Thus, measuring B first, while collapsing
A, merely produces a superposition of m states in that register, not a single, unique
x from the domain. It narrows things down, but not enough to measure,
    √(m/N) Σ_{x=0}^{a−1} ( (1/√m) Σ_{j=0}^{m−1} |x + ja⟩ⁿ ) |f(x)⟩ʳ

      ⇝   ( (1/√m) Σ_{j=0}^{m−1} |x₀ + ja⟩ⁿ ) |f(x₀)⟩ʳ .

Here, “⇝” means collapses to.
Stand back and you’ll see that we’ve accomplished one of the goals of our introductory
“key idea” section. The conceptual measurement of the B register leaves an overall
state in the A register in which all of the amplitudes are zero except for m that have equal amplitude 1/√m. Furthermore, those non-zero terms are spaced at intervals of
a in the N -dimensional vector: this is a “pure” periodic vector with the same period
a as our function f . We have produced a vector whose DFT is akin to that shown
in Figure 23.13. All non-zero amplitudes of the DFT are multiples of the frequency
m, i.e., of the form cm, c = 0, 1, . . . , (a − 1). (Due to an artifact of the graphing
software the 0 frequency appears after the array at phantom position N = 128.)
Figure 23.13: The spectrum of a purely periodic vector with period 8 and frequency
16 = 128/8
This strongly suggests that we apply the QFT to the A register in order to produce
a state that looks like Figure 23.13.
The Details
We’ll get our ideal result if we can produce an A register measurement, cm, where c
is coprime to a. The following two thoughts will guide us.
• The shift property of the QFT will turn the sum x+ja into a product involving
a root-of-unity,
    QFT^(N) |x + x₀⟩ⁿ = (1/√N) Σ_{y=0}^{N−1} ω^{x₀y} ω^{xy} |y⟩ⁿ .
[Circuit diagram repeated; we now focus on the final QFT(N) stage.]
The QFT , being a linear operator, distributes over sums, so it passes right through
the Σ,
    (1/√m) Σ_{j=0}^{m−1} |x₀ + ja⟩ⁿ   ⟶ QFT^(N) ⟶   (1/√m) Σ_{j=0}^{m−1} QFT^(N) |x₀ + ja⟩ⁿ ,
so the QFT of the entire collapsed superposition is

    QFT^(N) [ (1/√m) Σ_{j=0}^{m−1} |x₀ + ja⟩ⁿ ]

      = (1/√m) Σ_{j=0}^{m−1} (1/√N) Σ_{y=0}^{N−1} ω^{x₀y} ω^{jay} |y⟩ⁿ

      = (1/√(mN)) Σ_{y=0}^{N−1} Σ_{j=0}^{m−1} ω^{x₀y} ω^{jay} |y⟩ⁿ

      = (1/√(mN)) Σ_{y=0}^{N−1} ω^{x₀y} ( Σ_{j=0}^{m−1} ω^{jay} ) |y⟩ⁿ .
We can measure it at any time, and we next look at what the probabilities say we
will see when we do.
Foregoing the B Register Measurement. Although we analyzed this under
the assumption of a B measurement, an A channel measurement really doesn’t care
about a “conceptual” B channel measurement. The reasoning is the same as in
Simon’s algorithm. If we don’t measure B first, the oracle’s output must continue to
carry the full entangled summation
    √(m/N) Σ_{x=0}^{a−1} ( (1/√m) Σ_{j=0}^{m−1} |x + ja⟩ⁿ ) |f(x)⟩ʳ

through the final [ QFT^(N) ⊗ 1^{⊗r} ]. This would add an extra outer-nested sum,

    (1/√a) Σ_{x∈[0,a)} ( ⋯ ) |f(x)⟩ʳ ,
to our “Summary” expression, above, making it the full oracle output, not just that of the A register. Even leaving B unmeasured, the algebraic simplification we get below will still take place inside the big parentheses above for each x, and the probabilities won't be affected. (Also, note that an A register collapse to one specific |x₀ + ja⟩ⁿ will implicitly select a unique |f(x₀)⟩ʳ in the B register.) With this overview, try carrying this complete sum through the next section if you'd like to see its (non) effect on the outcome.
23.9.5 Computation of Final Measurement Probabilities (Easy Case)
We are now in an excellent position to analyze this final A register superposition and
see much of it disappear as a result of some of the properties of roots-of-unities that
we covered in a past lecture. After that, we can analyze the probabilities which will
lead to the algorithm. We proceed in five steps that will
3. prove that a random selection from [0, a − 1] will be coprime-to-a 50% of the
time,
After (conceptual) measurement/collapse of the B register to state |f(x₀)⟩ʳ, the post-QFT A register was left in the state

    (1/√m) Σ_{j=0}^{m−1} |x₀ + ja⟩ⁿ   ⟶ QFT^(N) ⟶   (1/√(mN)) Σ_{y=0}^{N−1} ω^{x₀y} ( Σ_{j=0}^{m−1} ω^{jay} ) |y⟩ⁿ .
We look at the inner sum in parentheses in a moment. First, let's recap some facts about ω. Here, ω ≡ ω_N is the primitive Nth root of unity, so

    ω^N = 1 .

Since

    m = N/a ,

we conclude

    1 = ω^N = ω^{am} = (ω^a)^m ,

so that ω^a = ω_m
is the primitive mth root of unity. Using ω_m in place of ω^a in the above sum produces a form that we have seen before (lecture Complex Arithmetic for Quantum Computing, section Roots of Unity, exercise (d)),
    Σ_{j=0}^{m−1} ω^{jay} = Σ_{j=0}^{m−1} ω_m^{jy} = { m, if y ≡ 0 (mod m) ;   0, if y ≢ 0 (mod m) } .
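Here is a quick numerical sketch (plain Python, values chosen to match the running example N = 128, a = 8) verifying the roots-of-unity sum identity for two multiples of m and one non-multiple.

```python
import cmath

N, a = 128, 8
m = N // a
omega = cmath.exp(2j * cmath.pi / N)     # primitive Nth root of unity

for y in (16, 32, 37):                   # 16 and 32 are multiples of m = 16; 37 is not
    s = sum(omega ** (j * a * y) for j in range(m))
    print(y, round(abs(s), 6))           # prints 16.0, 16.0, then ~0.0
```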
This causes a vast quantity of the terms in the QFT output (the double sum) to disappear; only 1-in-m survives:

    (1/√m) Σ_{j=0}^{m−1} |x₀ + ja⟩ⁿ   ⟶ QFT^(N) ⟶   √(m/N) Σ_{y ≡ 0 (mod m)} ω^{x₀y} |y⟩ⁿ .
Now,

    y ≡ 0 (mod m)   ⇔   y = cm, for some c = 0, 1, 2, ... , a − 1 .

This defines the special set of size a which is certain to contain our measured y:

    C = {cm}_{c=0}^{a−1} .
In terms of C, the post-QFT state is

    (1/√m) Σ_{j=0}^{m−1} |x₀ + ja⟩ⁿ   ⟶ QFT^(N) ⟶   (1/√a) Σ_{c=0}^{a−1} ω^{x₀cm} |cm⟩ⁿ ,

so each of the a basis states |cm⟩ⁿ will be measured with probability 1/a.
Example
Consider a function that has period 8 = 2³ defined on a domain of size 128 = 2⁷. Our problem variables for this function become

    n = 7,
    N = 2ⁿ = 128,
    a = 2³ = 8, and
    m = N/a = 2⁷/2³ = 16.
Let's say that we measured the B register and got the value f(x₀) corresponding to x₀ = 3. According to the above analysis the full pre-measurement superposition,

    (1/√128) Σ_{x=0}^{127} |x⟩⁷ |f(x)⟩ʳ
      = (1/√128) Σ_{x=0}^{7} ( |x⟩⁷ + |x + 8⟩⁷ + |x + 16⟩⁷ + · · · + |x + 120⟩⁷ ) |f(x)⟩ʳ ,

collapses in the A register to the superposition |ψ₃⟩⁷ of the 16 kets |3 + 8j⟩⁷, j = 0, . . . , 15, each with amplitude 1/4 = .25.
We are interested in learning the period but can’t really look at the individual co-
ordinates of this vector, it being a superposition which will collapse unpredictably.
But the analysis tells us that its QFT will produce a vector with only eight non-zero
amplitudes,
    QFT^(128) |ψ₃⟩⁷ = √(m/N) Σ_{y ≡ 0 (mod m)} ω^{x₀y} |y⟩⁷ = √(1/8) Σ_{y ≡ 0 (mod 16)} ω^{x₀y} |y⟩⁷ ,
each corresponding to a y which is a multiple of m = 16 having an amplitude-squared
(i.e., probability of collapse) = 1/8 = .125, all other probabilities being zero. Graphing
the square amplitudes of the QFT confirms this nicely. (See Figure 23.14.) So we do
find that the only possible measurements in the frequency domain lie in the special
set
C = { 0, 16, 2(16), 3(16), . . . , 7(16) } .
Match this with the graph to verify the spikes are at positions 16, 32, 48, etc. (Due
to an artifact of the graphing software the 0 frequency appears after the array at
phantom position 128.) If we can use this set to glean the frequency m = 16, we will
be able to determine the period a = N/m = 128/16 = 8.
Measuring along the frequency basis means applying the basis-transforming QFT
after the oracle and explains its presence in the circuit.
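The whole easy-case example can be checked numerically. The sketch below (assuming numpy is available; the variable names are mine) builds the collapsed A-register state and takes a DFT whose phase conventions differ from the book's QFT but whose probabilities agree: the spikes land exactly at the multiples of 16, each with probability .125.

```python
import numpy as np

n, N, a, x0 = 7, 128, 8, 3
m = N // a                                    # 16

# Collapsed A-register state: amplitude 1/sqrt(m) at x0, x0 + a, ..., x0 + (m-1)a.
psi = np.zeros(N, dtype=complex)
psi[x0::a] = 1 / np.sqrt(m)

spectrum = np.fft.fft(psi) / np.sqrt(N)       # unitary normalization
probs = np.abs(spectrum) ** 2

peaks = np.nonzero(probs > 1e-12)[0]
print(peaks)                                  # [  0  16  32  48  64  80  96 112]
print(np.round(probs[peaks], 3))              # eight entries of 0.125
```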
23.9.8 Step III: Prove that a Random Selection from [0, a−1]
will be Coprime-to-a 50% of the Time
Specifically, we prove that the probability of a randomly selected c ∈ [0, a − 1] being
coprime to a is 1/2 (in the easy case).
Recall that, in this easy case, N = 2ⁿ and a | N, so a = 2ʳ must also be a power-of-two.
    P(c ◦ a) = (# coprimes to a in [0, a − 1]) / a
             = (# odds < 2ʳ) / 2ʳ ,   since a = 2ʳ
             = 1/2 .   QED
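A two-line numerical confirmation (plain Python, using math.gcd): for a power-of-two a, exactly half of the residues in [0, a − 1] – the odd ones – are coprime to a.

```python
from math import gcd

for r in range(1, 6):
    a = 2 ** r
    coprimes = sum(1 for c in range(a) if gcd(c, a) == 1)
    print(a, coprimes / a)        # always 0.5: exactly the odd numbers below 2**r
```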
In the next step, we show that we can expect to measure one of the “good” outputs in constant time, namely those cm corresponding to c ◦ a (i.e., c coprime to a). This
will enable us to find m (details shortly), and once we have m we get the period a
instantly from a = N/m.
In Step I, we proved that the likelihood of measuring one of the special C =
{yc } = {cm} was 100%. Then, in Steps II and III, we demonstrated that
We combine all this with the help of a little probability theory. The derivation may
seem overly formal given the simplicity of the probabilities in this easy case, but it
sets the stage for the difficult case where we will certainly need the formalism.
First we reprise some notation and introduce some new:

    C ≡ {cm}_{c=0}^{a−1} ,
    B ≡ { y_b | y_b = bm ∈ C and b ◦ a } .

(Note: B ⊆ C.)
We can reproduce the effect of the CTC theorem without invoking it, and for practice
with probability theory let’s go ahead and do it that way, too.
If we run the circuit T times, obtaining T samples,

    mc₁, mc₂, . . . , mc_T ,

the probability that none of the associated c were coprime to a is

    P( ¬(c₁ ◦ a) ∧ ¬(c₂ ◦ a) ∧ · · · ∧ ¬(c_T ◦ a) ) = (1/2)^T ,

a constant for fixed T, independent of N. This means that the probability of at least one mc_k having c_k ◦ a is

    P(at least one c ◦ a) = 1 − (1/2)^T .
Conclusion

By repeating the circuit a fixed number of times, T, we will obtain at least one measurement mc_k with c_k ◦ a – that is, y_{c_k} ∈ B – with high probability. In a moment, we'll see why it's
so important to achieve this coprime condition.
Example: If we measure the output of our quantum circuit repeatedly, insisting on yc ↔ c ◦ a with P > .999999, or error tolerance ε = 10⁻⁶, we would need

    T = ⌊ log(10⁻⁶) / log(.5) ⌋ + 1 = ⌊ −6 / (−.30103) ⌋ + 1 = 19 + 1 = 20

measurements.
We can instruct our algorithm to cycle 100 times, to get an even better confidence,
but if the data determined that only eight measurements were needed to find the
desired cm, then it would return with a successful cm after eight passes. Our hyper-
conservative estimate costs us nothing.
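The T-versus-ε arithmetic is worth a two-line sketch (Python; the helper name passes_needed is mine), reproducing the T = 20 of the example.

```python
import math

def passes_needed(p_success, eps):
    """Smallest T with (1 - p_success)**T < eps."""
    return math.floor(math.log(eps) / math.log(1 - p_success)) + 1

print(passes_needed(0.5, 1e-6))    # 20, matching the example above
```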
The Euclidean Algorithm (EA) is a famous technique from number theory that takes two input integers P, Q ∈ Z≥0 with P > Q and produces the greatest common divisor of P and Q, also written gcd(P, Q). This is the largest integer that divides both P and Q evenly (without remainder). EA(P, Q) will produce gcd(P, Q) in O(log³ P) time. We will be applying EA to N and cm to get m′ = gcd(N, cm), so the algorithm will produce m′ with time complexity O(log³ N) = O(log³ M).

We'll assume that EA(N, cm) delivers as promised and go into its inner workings in a later chapter. EA is a crucial classical step in Shor's algorithm, as we see next.
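Here is a minimal EA sketch (plain Python) together with the easy-case use we are about to make of it: gcd(N, cm) recovers m exactly when c is coprime to a.

```python
def ea(P, Q):
    """Euclidean Algorithm: gcd of two non-negative integers, P > Q."""
    while Q:
        P, Q = Q, P % Q
    return P

# Easy-case numbers: a = 8, m = 16, N = a*m = 128.
N, a, m = 128, 8, 16
for c in range(a):
    m_prime = ea(N, c * m)
    print(c, m_prime, m_prime == m)   # m_prime == m exactly when gcd(c, a) == 1
```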
We proved that we will measure yc ∈ B with near certainty after some predetermined
number, T of measurements. Why does such a yc solve Shor’s period-finding problem
(in the easy case)?
Here is our circuit, after measuring cm:

[Circuit diagram, with the A register measurement having produced the value cm.]
So the first thing we do is use EA to produce m′ = gcd(N, cm). Now, if cm ∈ B then c ◦ a, and

    c ◦ a   ⇒   m′ ≡ gcd(N, cm) = gcd(am, cm) = m .
• Select an integer, T, that reflects an acceptable failure rate based on any known aspects of the period. E.g., for a failure tolerance of .000001, we might choose T = 20.

• Loop up to T times: run the circuit, measure the A register to get a value y (of the form cm), use EA to compute m′ = gcd(N, y) and the candidate period a′ = N/m′, then test the candidate (for example, by checking that f(0) = f(a′)). If the test succeeds, break out of the loop.
• If the above loop ended naturally (i.e., not from the break ) after T full passes,
we failed. Otherwise, we have found a.
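A classical mock-up of this loop looks roughly as follows (Python sketch; the random sampler is a stand-in for the quantum circuit, since in the easy case the analysis says an A-register measurement returns a uniformly random multiple cm; the function names and the f(0)-test are my own illustrative choices).

```python
import random
from math import gcd

def easy_case_period(f, N, a_true, T=20):
    """Classical post-processing loop for the easy case (a | N).  The quantum
    circuit is replaced by a sampler returning a uniformly random multiple
    c*m of the frequency m = N // a_true."""
    m = N // a_true
    for _ in range(T):
        y = random.randrange(a_true) * m        # simulated A-register measurement of cm
        m_prime = gcd(N, y)                     # EA(N, cm); note gcd(N, 0) == N
        a_candidate = N // m_prime
        if f(0) == f(a_candidate):              # candidate survives only if it is the period
            return a_candidate
    return None                                 # all T passes failed (probability 2**-T)

f = lambda x: x % 8
print(easy_case_period(f, 128, 8))              # 8, with overwhelming probability
```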
Computational Complexity (Easy Case)
We have already seen that O(log(N )) = O(log(M )) as well as any powers of the logs,
so I will use M in the following.
• The four non-oracle components, above, are done in series, not in nested loops, so the overall relativized complexity will be the worst case among them, O(log³(M)).
• In the case of the factoring needed for RSA encryption breaking (order-finding), the actual oracle is O(log⁴(M)) or better, so the absolute complexity in that case will be O(log⁴(M)).
So the entire Shor circuit for an f ∈ O(log⁴(M)) (true for RSA/order-finding) would have an absolute complexity of O(log⁴(M)). Notice that, while not an exponential speed-up over the O(log⁵(M)) easy-case classical algorithm presented, it is “faster” by a factor of log N. That's due to the fact that the quantum circuit requires a constant number of samplings, while the classical function must be sampled O(log N) times. To be
fair, though, the classical algorithm was deterministic, and the quantum algorithm,
probabilistic. You can debate whether or not this counts in your forums.
This completes the easy-case fork-in-the-road. It contains all the waypoints that we will need for the general case without the subtle and tricky math. You are now well positioned to understand those additional details, so ... onward.

23.10 Second Fork: General Case (a ∤ N)
Figure 23.15: There is (possibly) a remainder for N/a, called the “excess”
Figure 23.16: [0, N ) is the union of distinct cosets of size a, except for last
    [0, N − 1] = [0, a − 1] ∪ [a, 2a − 1] ∪ · · · ∪ [(m − 1)a, ma − 1] ∪ [ma, N − 1]     (the last interval is the “excess”)
               = R ∪ (R + a) ∪ · · · ∪ (R + (m − 1)a) ∪ (R + ma)                         (the last coset is only partial),
where R and m are as before, but now we had to “slap on” the partial coset, {x + ma}, to account for the possible overflow.
We have to be careful about counting the family members of each element x ∈ R, i.e.,
those x + ja who map to the same f (x) by periodicity. We sometimes have a member
in the last, partial, coset, and sometimes not. If x is among the first few integers of
R, i.e., ∈ [0, N − ma), then there will be m + 1 partners (including x) among its kin.
However, if x is among the latter integers of R, i.e., ∈ [N − ma, a), then there will be
only m partners (including x) among its kin.
We'll use m̃ to denote either m or m + 1, depending on x:

    m̃ = m + 1, for the “first few” x in [0, a − 1], i.e., x ∈ [0, N − ma);
    m̃ = m,     for the “remaining” x in [0, a − 1], i.e., x ∈ [N − ma, a).
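A tiny Python sketch (function name m_tilde is mine) computes m̃ for each x by simply counting how many domain points are congruent to x mod a, reproducing the split above.

```python
def m_tilde(x, N, a):
    """Number of domain points in [0, N) that are congruent to x (mod a)."""
    m = N // a
    return m + 1 if x < N - m * a else m

N, a = 128, 10                                 # m = 12, excess = 128 - 120 = 8
print([m_tilde(x, N, a) for x in range(a)])    # [13]*8 + [12]*2
```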
Figure 23.18: If 0 ≤ x < N − ma, a full m + 1 numbers in ZN map to f (x)
and our new partition of the domain gives us a nice way to rewrite this. First, note that, for x ∈ R, the elements

    x, x + a, x + 2a, . . . , x + ja, . . . , x + (m − 1)a, and sometimes x + ma,

all map under f to the same value, f(x).
The factor of 1/√m̃ inside the sum normalizes each term in the outer sum. However, the common amplitude remaining on the outside is harder to symbolize in a formula, which is why I used “≈” to describe it. (m̃ doesn't even make good sense outside the sum, but it gives us an idea of what the normalization factor is.) It turns out that we don't care about its exact value. It will be some number between √(m/N) and √((m+1)/N), the precise value being whatever is needed to normalize the overall state.
[Circuit diagram repeated; again we consider the conceptual measurement of the B register.]
Each B register measurement of f(x) will be attached to not one, but m̃, input A register states. The generalized Born rule tells us that measuring B will cause the collapse of A into a superposition of m̃ CBS states, narrowing things down considerably.
    √(m̃/N) Σ_{x=0}^{a−1} ( (1/√m̃) Σ_{j=0}^{m̃−1} |x + ja⟩ⁿ ) |f(x)⟩ʳ

      ⇝   ( (1/√m̃) Σ_{j=0}^{m̃−1} |x₀ + ja⟩ⁿ ) |f(x₀)⟩ʳ .

Here, “⇝” means collapses to.
If after measuring the post-oracle B register we were to go on to measure the A register, it would collapse, giving us a reading of one of the m̃ values, x₀ + ja, but that value would not get us any closer to knowing a, so as with the easy case, we don't measure A yet. Instead, we name the collapsed – but unmeasured – superposition state in the A register |ψ_{x₀}⟩ⁿ, since it is determined by the measurement “f(x₀)” of the collapsed B register,

    |ψ_{x₀}⟩ⁿ ≡ (1/√m̃) Σ_{j=0}^{m̃−1} |x₀ + ja⟩ⁿ .
As before, we don't actually need the B register measurement to have occurred. There is no harm in pretending we measured and collapsed to a |f(x₀)⟩ʳ first. (See the Easy Case or our lesson Simon's Algorithm for longer explanations of
first. (See the Easy Case or our lesson Simon’s Algorithm for longer explanations of
the same argument.)
The conceptual measurement of the B register leaves an overall state in the A register in which all the amplitudes are zero except for m̃ of them, which have amplitude 1/√m̃.
Furthermore, those non-zero terms are spaced at intervals of a in the N -dimensional
vector: this is a “pure” periodic vector with the same period a as our function f .
In contrast to the easy case, however, m̃ is sometimes m and sometimes m + 1,
never mind that a is not a perfect power-of-2 or that it doesn’t divide into N evenly.
All this imperfection destroys the arguments made in the easy case.
A look at the DFT of the vector whose components match the amplitudes of a typical collapsed state left in the A register (see Figure 23.20) confirms that a frequency domain measurement of QFT^(N) |ψ_{x₀}⟩ⁿ no longer assures us of seeing one of the a numbers cm₀, c = 0, . . . , a − 1, with m₀ the true – and typically non-integer – frequency N/a. As for our integer quotient m ≡ ⌊N/a⌋ close to the actual frequency N/a, it's possible that none of the frequency domain points cm have high amplitudes.
Figure 23.20: The spectrum of a purely periodic vector with period 10 and frequency
12.8 = 128/10
The Big Picture
It still seems to be a good idea to apply a QFT to the A register in order to produce
a state that looks like Figure 23.20.
We'll show that there are only a likely A register measurement results, yc, c = 0, 1, . . . , (a − 1). And even if a given yc ≠ cm exactly, it will at least lead us to a cm. In fact, we can expect the resulting c to be relatively prime to a with good probability, a very desirable outcome. Here are the general steps.
• We will show that each yc is very close to – in fact, uniquely selects – a point of
the form cm. We’ll describe a fast (polynomial time) algorithm that takes any
of the likely measured yc to its unique partner cm.
• We’ll prove that we can expect to get a yc (and thus cm) with c coprime to
the period a in constant time. Such a c will unlock the near-frequency, m, and
therefore the period, a.
[Circuit diagram repeated; we return to the final QFT(N) stage.]
    (1/√m̃) Σ_{j=0}^{m̃−1} |x₀ + ja⟩ⁿ   ⟶ QFT^(N) ⟶   (1/√m̃) Σ_{j=0}^{m̃−1} QFT^(N) |x₀ + ja⟩ⁿ ,
and the QFT of each individual ket is

    QFT^(N) |x₀ + ja⟩ⁿ = (1/√N) Σ_{y=0}^{N−1} ω^{(x₀ + ja)y} |y⟩ⁿ
                       = (1/√N) Σ_{y=0}^{N−1} ω^{x₀y} ω^{jay} |y⟩ⁿ ,
In this expression, the normalizing factor 1/√(m̃N) is precise. That's in contrast to the pre-collapsed state, in which we had an approximate factor outside the full sum. The B register measurement “picked out” one specific x₀, which had a definite m̃ associated with it. Whether it was m or m + 1 doesn't matter. It is one of the two, and that value is used throughout this expression.
Summary. This is the preferred organization of our superposition state prior
to sampling the final A register output:
    (1/√(m̃N)) Σ_{y=0}^{N−1} ( Σ_{j=0}^{m̃−1} ω^{x₀y} ω^{jay} ) |y⟩ⁿ .
The next several sections explore what the probabilities say we will see when we
measure this state. And while we analyzed it under the assumption of a prior B
measurement, the upper (A) channel measurement won’t care about that conceptual
measurement, as we’ll see. We continue the analysis as if we had measured the B
channel.
23.10.5 Computation of Final Measurement Probabilities (General Case)
This general case, which I scared you into thinking would be a mathematical horror
story, has been a relative cakewalk so far. About all we had to do was replace the firm m with the slippery m̃, and everything went through without incident. That's
about to change.
In the easy case, we were able to make the majority of terms in our sum vanish (all but 1-in-m). Let's review how we did that. We noted that ω ≡ ω_N, the primitive Nth root, so

    ω^N = 1 .

Since N = ma in that case, this meant

    ω^{ma} = 1 ,

and we realized that this implied that ω^a was a primitive mth root of unity. From there we were able to get massive cancellation due to the facts we developed about sums of roots-of-unity.
The problem, now, is that we cannot replace N with ma. We have an m̃, but even resolving that to m or m + 1 won't work, because neither am nor a(m + 1) equals N (by the general-case hypothesis, a ∤ N). So we'll never be able to manufacture an mth root
of unity at this point and cannot watch those big sums dissolve before our eyes. So
sad.
We can still get what we need, though, and have fun with math, so let’s rise to
the challenge.
As with the easy case our job is to analyze the final (post QFT ) A register
superposition. While none of the terms will politely disappear the way they did in
the easy case, we will find that certain y states will be much more likely than others,
and this will be our savior.
Computing the final measurement probabilities will require the following five steps.
23.10.6 STEP I: Identify (Without Proof) a Special Set of a Elements, C = {yc}_{c=0}^{a−1}, of High Measurement Likelihood
In this step, we will merely describe the subset of y that we want to measure. In the
next step, we’ll provide the proof.
In the easy case we measured y, which had the special form
y = cm, c = 0, 1, 2, ... , a − 1,
with 100% certainty in a single measurement. From there we tested whether the c was coprime to a (which it was with high probability), and so on. This time we can't be 100% sure of anything even after post-processing with the QFT, but that's normal for quantum algorithms – we often have to “work the numbers” and be satisfied to get what we want in constant or polynomial time. I claim that in the general case we will measure an equally small subset of y, again a in all, that we label

    y = yc , c = 0, 1, 2, ... , a − 1 ,

with probability P = 1 − ε in O(1) measurements. (Whenever you see me use an ε, it represents a positive number which is as small as we want, say .000000001, for
example.) The following construction will lead us to these special a values.
Consider the (very) long line [0, aN − 1], and around each of the a points cN, c = 0, 1, . . . , a − 1, place a half-open interval of width a (Figure 23.22). Each interval contains exactly one integral multiple, y·a, of a in it. We'll label the multiplier that gets us into the cth interval yc. (y₀ is easily seen to be 0.)

    0·a ∈ [−a/2, +a/2), · · · , yc·a ∈ [cN − a/2, cN + a/2), · · · , y_{a−1}·a ∈ [(a−1)N − a/2, (a−1)N + a/2) .
Figure 23.22: Half-open intervals of width a around each point cN
Example 1
In this first of two examples we illustrate the selection of yc for an actual periodic
function defined on integers [0, N − 1], where N = 2⁵ = 32, and the period a = 3. (Remember, it doesn't matter what the values f(k) are, so long as we're told it is periodic.)
• c = 0: The center of the interval is 0. The only multiple of 3 in [−1.5, +1.5) is 0, so y₀ = 0.
• c = 1: The center of the interval is N = 32. The only multiple of 3 in [30.5, 33.5) is 33 = 11 · 3, so y₁ = 11.
• c = 2: The center of the interval is 2N = 2 · 32 = 64. We seek y₂ such that y₂ · 3 ∈ [62.5, 65.5); the only such multiple is 63 = 21 · 3, so y₂ = 21.
• Etc.
Example 2
In this second example, we test the claim that the yc so described really do have high
relative measurement likelihood for an actual periodic function.
Consider a function that has period 10 defined on a domain of size 128 = 2⁷. Our problem variables for this function become

    n = 7,
    N = 2ⁿ = 128,
    a = 10, and
    m = ⌊N/a⌋ = ⌊128/10⌋ = 12.
Let's say that we measured the B register and got the value f(x₀), corresponding to x₀ = 3. For this x,

    m̃ = m + 1 = 13 ,

since 3 < N − ma = 128 − 120 = 8, so x₀'s family includes a member of the partial coset. The full pre-measurement superposition can be regrouped as

    (1/√128) Σ_{x=0}^{9} ( |x⟩⁷ + |x + 10⟩⁷ + |x + 20⟩⁷ + · · · + |x + 120⟩⁷ ) |f(x)⟩ʳ ,

where the final ket, |x + 120⟩⁷, is present only “sometimes” (for x < 8). After the B measurement yielding f(3), the A register collapses to |ψ₃⟩⁷, the 128-dimensional vector
( 0, 0, 0, .27735 , 0, 0, 0, 0, 0, 0,
0, 0, 0, .27735 , 0, 0, 0, 0, 0, 0,
0, 0, 0, .27735 , 0, 0, 0, 0, 0, 0,
0, 0, 0, .27735 , 0, 0, 0, 0, 0, 0,
0, 0, 0, .27735 , 0, 0, 0, 0, 0, 0,
0, 0, 0, .27735 , 0, 0, 0, 0, 0, 0,
0, 0, 0, .27735 , 0, 0, 0, 0, 0, 0,
0, 0, 0, .27735 , 0, 0, 0, 0, 0, 0,
0, 0, 0, .27735 , 0, 0, 0, 0, 0, 0,
0, 0, 0, .27735 , 0, 0, 0, 0, 0, 0,
0, 0, 0, .27735 , 0, 0, 0, 0, 0, 0,
0, 0, 0, .27735 , 0, 0, 0, 0, 0, 0,
0, 0, 0, .27735 , 0, 0, 0, 0 ) .
We are interested in learning f ’s period and, like the easy case, the way to get at it is
by looking at this state vector’s spectrum, so we take the QFT . Now, unlike the easy
case, this vector’s QFT will not create lots of 0 amplitudes; generally all N = 128 of
them will be non-zero. That’s because the resulting sum
    QFT^(128) |ψ₃⟩⁷ = (1/√(m̃N)) Σ_{y=0}^{N−1} ( Σ_{j=0}^{m̃−1} ω^{x₀y} ω^{jay} ) |y⟩⁷
did not admit any cancellations or simplification. Instead, the above claim – which we will prove in the next section – is that for x₀ = 3 only a = 10 of them will be likely: the special {yc}, for c = 0, 1, 2, . . . , 9, we described in our last analysis. Let's
put our money where our mouth is and at least show this to be true for the one
function under consideration.
We take three yc as examples: y4 , y5 and y6 . We’ll do this in two stages. First
we identify the three yc values. Next, we graph the probabilities of QFT^(128) |ψ₃⟩⁷ around those three values to see how they compare with nearby y values.
Stage 1. Compute y4 , y5 and y6
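Plugging c = 4, 5, 6 into the defining intervals gives y₄ = 51, y₅ = 64 and y₆ = 77. The sketch below (assuming numpy is available; the function name y_c is mine, and numpy's FFT differs from the book's QFT only in phase conventions, which do not affect probabilities) recomputes these values and compares their measurement probabilities with nearby y's, previewing Stage 2.

```python
import math
import numpy as np

N, a, x0 = 128, 10, 3
m_t = 13                                   # m-tilde for x0 = 3

def y_c(c, N, a):
    """The unique multiplier with y_c * a in [c*N - a/2, c*N + a/2)."""
    return math.ceil((c * N - a / 2) / a)

print([y_c(c, N, a) for c in (4, 5, 6)])   # [51, 64, 77]

# Collapsed state |psi_3>: amplitude 1/sqrt(13) at 3, 13, ..., 123.
psi = np.zeros(N, dtype=complex)
psi[x0::a] = 1 / np.sqrt(m_t)
probs = np.abs(np.fft.fft(psi) / np.sqrt(N)) ** 2

for y in (51, 64, 77):                     # each y_c towers over its neighbors
    print(y, round(probs[y], 4), "vs", round(probs[y - 3], 6), round(probs[y + 3], 6))
```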
Stage 2. Look at the Graph of |QFT^(128) |ψ₃⟩⁷|² Around These Three y

Here is a portion of the graph of the QFT's absolute value squared, showing the probabilities of measuring over 35 different y values in the frequency domain. It exhibits in a dramatic way how much more likely the yc are to be detected than are the non-yc values. (See Figure 23.24.) Even though the non-yc have non-zero amplitudes, they are dwarfed by those of the three yc.
To assist our analysis we'll define a set of ŷc associated with the desired yc such that the ŷc all live in the interval [−a/2, +a/2) (Figure 23.25). We do this by subtracting cN from each yc·a to “bring it” into the base interval [−a/2, +a/2). More precisely,

    ŷc ≡ yc·a − cN ∈ [−a/2, +a/2) , c = 0, 1, 2, ..., a − 1 ,

which can be stated using modular arithmetic by

    ŷc = yc·a (mod N) .
Figure 23.25: ybc all fall in the interval [−a/2, a/2)
This is the vocabulary we will need. Next, we’ll take a lengthy – but leisurely –
mathematical cruise to prove our claim of high probability measurement for the set
C.
23.10.7 STEP II: Prove that the Values in C = {yc}_{c=0}^{a−1} Have High Measurement Likelihood
We gave a convincing argument that the a distinct frequency domain values {yc }
constructed in the last section do produce highly likely measurement results, but we
didn’t even try to prove it. It’s time to do that now. This is the messiest math of the
lecture. I’ll try to make it clear by offering pictures and gradual steps.
When we last checked, the (conceptual) measurement/collapse of the B register to state |f(x₀)⟩ʳ left the post-QFT A register in the state

    (1/√(m̃N)) Σ_{y=0}^{N−1} ( Σ_{j=0}^{m̃−1} ω^{x₀y} ω^{jay} ) |y⟩ⁿ .
The probability of measuring the state |y⟩ is the amplitude's magnitude squared:

    P(measurement yields y) = (1/(m̃N)) |ω^{x₀y}|² | Σ_{j=0}^{m̃−1} ω^{jay} |²
                            = (1/(m̃N)) | Σ_{j=0}^{m̃−1} ω^{jay} |² .
Letting

    µ ≡ ω^{ay} ,
the summation factor on the right (prior to magnitude-squaring) is

    Σ_{j=0}^{m̃−1} µʲ = 1 + µ + µ² + · · · + µ^{m̃−1} = (µ^{m̃} − 1)/(µ − 1) .
Having served its purpose, µ can now be jettisoned, and

    (µ^{m̃} − 1)/(µ − 1) = (ω^{aym̃} − 1)/(ω^{ay} − 1) = (e^{(2πi/N)·ay·m̃} − 1)/(e^{(2πi/N)·ay} − 1) = (e^{iθ_y m̃} − 1)/(e^{iθ_y} − 1) ,

where we are defining the “angle”

    θ_y ≡ 2πay/N .
In other words, the probability of measuring any y is

    P(measurement yields y) = (1/(m̃N)) | (e^{iθ_y m̃} − 1)/(e^{iθ_y} − 1) |² .
The next several screens are filled with the math to estimate the magnitude of the fraction

    | (e^{iθ_y m̃} − 1)/(e^{iθ_y} − 1) | .

A General Bound for |e^{iφ} − 1|

An Upper Bound for |e^{iφ} − 1|

In the complex plane, |e^{iφ} − 1| is the length of the chord from 1 to e^{iφ}, and this is always ≤ the arc length from 1, counterclockwise, to e^{iφ}, all on the unit circle. But the arc length is, by definition, the (absolute value of the) angle, φ, itself, so:

    |e^{iφ} − 1| ≤ |φ| .
Figure 23.26: The chord is shorter than the arc length
A Lower Bound for |e^{iφ} − 1|

Writing e^{iφ} − 1 = −2 sin(φ/2) [ sin(φ/2) − i cos(φ/2) ] and noting that

    | sin(φ/2) − i cos(φ/2) | = 1 ,

we have shown that

    |e^{iφ} − 1| = 2 | sin(φ/2) | .
Figure 23.27: |sin(x/2)| lies above |x/π| in the interval ( −π, π )
By simple calculus – solving φ/π = sin(φ/2) for φ, or noting where the graph of the sine curve and the line intersect – we conclude that when φ is in the interval [−π, π], we can bound 2|sin(φ/2)| from below,

    2|φ|/π ≤ 2 |sin(φ/2)| , for φ ∈ [−π, π] .

This gives us a lower bound for |e^{iφ} − 1|:

    2|φ|/π ≤ |e^{iφ} − 1| , for φ ∈ [−π, π] .

Combining both bounds when φ is in the interval [−π, π], we have bracketed the expression under study,

    2|φ|/π ≤ |e^{iφ} − 1| ≤ |φ| , for φ ∈ [−π, π] .
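A quick numerical spot-check of this bracketing (plain Python, a few sample angles in [−π, π]):

```python
import cmath, math

def chord(phi):
    return abs(cmath.exp(1j * phi) - 1)

for phi in (-3.0, -1.5, 0.5, 2.0, 3.1):              # a few angles in [-pi, pi]
    lower, upper = 2 * abs(phi) / math.pi, abs(phi)
    assert lower - 1e-12 <= chord(phi) <= upper + 1e-12
    print(round(lower, 3), round(chord(phi), 3), round(upper, 3))
```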
We now apply these bounds to the fraction above, especially when y = yc. We prefer to work with the ŷc, which are in an a-sized interval around 0:

    ŷc = yc·a − cN ∈ [−a/2, +a/2) , c = 0, 1, 2, ..., a − 1 .
So, let's first convert the exponentials in the numerator and denominator using the above relation between those special {yc} (which we hope to measure with high probability) and their corresponding mod-N equivalents, the {ŷc}, close to the origin. This allows us to rewrite the absolute value of the amplitudes we wish to estimate as

    | (e^{iθ_y m̃} − 1)/(e^{iθ_y} − 1) | = | (e^{i θ_{ŷc} m̃ / a} − 1)/(e^{i θ_{ŷc} / a} − 1) | ,   for y = yc .
We want a lower bound for this magnitude. This way we can see that the likelihood of measuring these relatively few ys is high. To that end, we bound the denominator and numerator separately.

The denominator is easy. We derived an upper bound for all angles, φ, so:

    | e^{i θ_{ŷc} / a} − 1 | ≤ |θ_{ŷc}| / a .
The numerator is

    | e^{i θ_{ŷc} m̃ / a} − 1 | = | e^{i (2πŷc / N) m̃} − 1 | .

Remember that m̃ is sometimes m and sometimes m + 1.

Sub-Case 1: m̃ is m
Multiplying the defining inequality −a/2 ≤ ŷc < a/2 by 2πm/N, and remembering that ma < N in the general case, we get

    −π < ŷc · (2πm/N) < π .

This allows us to invoke the lower bound we derived when φ ∈ [−π, π], namely

    | e^{i 2πmŷc / N} − 1 | ≥ (2/π) | 2πmŷc / N | .
Re-applying the notation

    θ_y ≡ 2πay/N ,

we write it as

    | e^{i 2πmŷc / N} − 1 | ≥ (2/π) (m/a) |θ_{ŷc}| .
Combining the bounds for the numerator and denominator (in the case where m̃ = m), we end up with

    | (e^{i θ_{ŷc} m̃ / a} − 1)/(e^{i θ_{ŷc} / a} − 1) | ≥ [ (2/π)(m/a)|θ_{ŷc}| ] / [ |θ_{ŷc}|/a ] = 2m/π .
Sub-Case 2: m̃ is m + 1 (Deferred)

That only worked for m̃ = m; the bounds argument we presented won't hold up when m̃ = m + 1, but we'll deal with that kink in a moment. Let's pretend the above bound works for all m̃, both m and m + 1, and finish computing the probabilities under this white lie. Then we'll come back and repair the argument to also include m̃ = m + 1.

If the above bounds worked for all m̃, we could show that this leads to a desired lower bound for our probabilities of getting a desired y ∈ C in constant time using the following argument.
For any y, we said

    P(measurement yields y) = (1/(m̃N)) | (e^{iθ_y m̃} − 1)/(e^{iθ_y} − 1) |² ,

but for y = yc, one of the (a) special ys that lie in the neighborhoods of the cNs, we can substitute our new-found lower bound for the magnitude of the fraction. (Remember, we are allowing that the bound holds for all m̃, even though we only proved it for m̃ = m), so

    P(measurement yields yc) ≥ (1/(m̃N)) (4m̃²/π²) ≥ (m/N)(4/π²) .
The last inequality holds because some of the m̃ are m + 1 and some are m. Now, there are a such yc, namely, y₀, y₁, · · · , y_{a−1}, each describing a distinct, mutually exclusive measurement outcome (we can't get both y₃₂ and y₈₁). Thus, the probability of getting any one of them is the sum of the individual probabilities. Since those individual probabilities are all bounded below by the same constant, we can multiply it by a to get the collective lower bound,

    P(measurement yields one of the yc) ≥ (am/N)(4/π²) .
[Exercise. Make this last statement precise.]
In this hard case, allowing a ∤ N, we defined m to be the unique integer satisfying ma ≤ N < (m + 1)a, or, quoting only the latter inequality,

    (m + 1)a > N .

It's now time to harvest that “weak additional assumption” requiring at least two periods of a to fit into M,

    a < M/2 ,

which also implies

    a < N/2 .

We combine those to conclude

    am/N > 1/2 .

[Exercise. Do it.]

We can plug that result into our probability estimate to get

    P(measurement yields one of the yc) > 2/π² .
Note-to-file. If a ≪ M < N, as is often the case, then

    am/N ≈ 1 ,

and the above lower bound improves to

    P(measurement yields one of the yc) ≥ 4/π² − ε ,

where ε → 0 as m → ∞.
Our last lower bound for the “p of success,” 2/π² > 0, was independent of M (or N or a), so by repeating the random measurement a fixed number, T, of times, we can assure we measure one of those yc with any desired level of confidence. This follows from the CTC theorem for looping algorithms (end of probability lesson), but for a derivation that does not rely on any theorems, compute directly,

    P(T samples, none of them a yc) ≤ (1 − 2/π²)^T .

The last product can be made arbitrarily small by making T large enough, independent of N, M, a, etc. This would prove the claim of Step II if we could only use the bound we got for m̃ = m. But we can't. So we must soldier on ...
So far, we're okay; we had not yet made any assumption about the particular choice of m̃. To bound this probability, we went on to get an estimate for the fraction

    | (e^{i θ_{ŷc} m̃ / a} − 1)/(e^{i θ_{ŷc} / a} − 1) | .
The denominator's upper bound worked for any θ, so no change is needed there. But the numerator's lower bound has to be recomputed, this time under the harsher assumption that m̃ = m + 1.

Earlier, we showed that 2πmŷc / N was confined to the interval (−π, π), which gave us our desired result. Now, however, we replace m with m + 1, and we'll see that 2π(m + 1)ŷc / N won't be restricted to (−π, π). What, exactly, are its limits?
Start, as before, with

    −a/2 ≤ ŷc < a/2 ,

but we need to multiply by a different constant this time,

    −π · a(m + 1)/N ≤ 2π(m + 1) ŷc / N < π · a(m + 1)/N .

By our working assumption, a/N < 1/2 (which continues to reap benefits), we can assert that

    a(m + 1)/N = am/N + a/N < 1 + 1/2 ,

giving us the new absolute bounds

    −(3/2) π < 2π(m + 1) ŷc / N < (3/2) π .
The next step would have been to apply the general result we found for any φ ∈ [−π, π], namely that

    2|φ|/π ≤ 2 |sin(φ/2)| , for φ ∈ [−π, π] .

But now our “φ” = 2πŷc(m + 1)/N lives in the enlarged interval [−(3/2)π, (3/2)π], so that general result no longer applies. Sigh. We have to go back and get a new general result for this larger interval. It turns out that the old bound merely needs to be multiplied by a constant:

    K · 2|φ|/π ≤ 2 |sin(φ/2)| , for φ ∈ [−(3/2)π, (3/2)π] ,

where K is the constant

    K = 2 sin(3π/4) / 3 ≈ .4714 .
Figure 23.28: |sin(x/2)| lies above |Kx/π| in the interval ( −1.5π, 1.5π )
Again, this can be done using calculus and solving Kφ/π = sin(φ/2) for K. Visually, you can see where the graphs of the sine and the line intersect, which confirms the assertion. Summarizing the general result in the expanded interval,

    |e^{iφ} − 1| = 2 |sin(φ/2)| ≥ (2K/π) |φ| , for φ ∈ [−(3/2)π, (3/2)π] .

This gives us the actual bound we seek in the case m̃ = m + 1, namely

    | e^{i 2π(m+1)ŷc / N} − 1 | ≥ (2K/π) | 2π(m + 1)ŷc / N | .
Re-applying the notation

    θ_y ≡ 2πay/N ,

we get

    | e^{i 2π(m+1)ŷc / N} − 1 | ≥ (2K/π) ((m + 1)/a) |θ_{ŷc}| .
Combining the bounds for the numerator and denominator (in the case where m̃ = m + 1),

    | (e^{i θ_{ŷc} m̃ / a} − 1)/(e^{i θ_{ŷc} / a} − 1) | ≥ [ (2K/π)((m + 1)/a)|θ_{ŷc}| ] / [ |θ_{ŷc}|/a ] = 2K(m + 1)/π = 2K m̃/π .
Since K < 1, this new bound is weaker than the 2m/π we obtained when m̃ = m, so we can use the new, weaker, bound to cover both cases. For all m̃, both m and
e both m and
m + 1, we have
iθ m/a
e ybc e − 1 2K m
eiθybc /a − 1 ≥
e
,
π
for
2 sin 3π
4
K = ≈ .4714 .
3
We can reproduce the final part of the m̃ = m argument, incorporating K into the inequalities,

    P(measurement yields yc) ≥ (1/(m̃N)) (4K²m̃²/π²) ≥ (m/N)(4K²/π²) .
The last inequality still acknowledges that some of the m̃ are m + 1 and some are m. Again, there are a such yc, namely, y₀, y₁, · · · , y_{a−1}, so the probability of getting any one of them is (≥) a times this number,

    P(measurement yields one of the yc) ≥ (am/N)(4K²/π²) > (1/2)(4K²/π²) = 2K²/π² ≈ .04503 .
[Exercise.] Compute K when m = 100.
Note-to-file. If a ≪ M < N (i.e., m gets very large), as is often the case, two things happen. First,

    (m + 1)/N ≈ m/N ,

so we can use our first estimate, m̃ = m, for measuring a yc, even in the general case. That estimate gave us

    P(measurement yields one of the yc) > 2/π² .

Second, our earlier “Note-to-file” suggested that, for m̃ = m,

    P(measurement yields one of the yc) > 4/π² − ε ,

where ε → 0 as m → ∞.

Putting these two observations together, we conclude that when a ≪ N, the lower bound for any measurement is ≈ 4/π². (“Any” means it doesn't matter whether the state to which the second register collapsed, |f(x₀)⟩ʳ, is associated with an x₀ for which there were m or m + 1 mod-a equivalents in [0, N − 1].)
So we have both a hard bound assuming worst-case scenarios (the period, a, cycles no more than twice in M) and the more likely scenario, a ≪ M. Symbolically,

    P(measurement yields one of the yc)  ≥  2K²/π² ≈ .045  (worst case),   ≈ 4/π² ≈ .405  (typically).

In the worst case, the smaller lower bound doesn't change a thing; it's still a constant probability bounded away from zero, independent of N, M, a, etc., and it still gives us a constant number of passes, T, for detecting one of our special yc with arbitrarily high confidence. This, again, follows from the CTC Theorem for looping algorithms, or you can simply apply probability theory directly.
In practice, a ≪ M, so we can use .40528 as our constant “p of success” bounded away from 0. If we are satisfied with an ε = 10⁻⁶ chance of failure, the CTC theorem tells us that the number of passes of our circuit would be

    T = ⌊ log(10⁻⁶) / log(.59472) ⌋ + 1 = ⌊ −6 / (−.2256875) ⌋ + 1 = 26 + 1 = 27 .

If we sampled y 27 times, our chance of not measuring at least one yc would be less than one in a million.
This completes the proof of the fact that we will measure one of the a yc in constant
time. Our next task is to demonstrate that by measuring relatively few of these yc we
will be able to determine the period. This will be broken into small, bite-sized steps.
We now know that we’ll measure one of those yc with very good probability if we
sample enough times (O(1) complexity). But, what’s our real goal here? We’d like to
get back to the results of the easy case where we found a number c that was coprime
to a in constant time and used that to compute a. In the general case, however, c
is merely an index of the yc , not a multiple cm, of m. What can we hope to know
about the c which just indexes the set {yc }? You’d be surprised.
In this step, we demonstrate that each of these special, likely-measured yc values is
bound tightly to the fraction c/a in a special way: yc /N will turn out to be extremely
(and uniquely) close to c/a. This, in itself, should feel a little like magic: somehow
the index of the likely-measured set of ys shows up in the numerator of a fraction
that is close to yc /N . Let’s pull back the curtain.
Do you remember those (relatively small) half-open intervals of width a around
the points cN,

    [−a/2, +a/2), [N − a/2, N + a/2), · · · , [cN − a/2, cN + a/2), · · · , [(a−1)N − a/2, (a−1)N + a/2) ,

each of which contained exactly one integral multiple, yc·a, of a in it?

    0·a ∈ [−a/2, +a/2), · · · , yc·a ∈ [cN − a/2, cN + a/2), · · · , y_{a−1}·a ∈ [(a−1)N − a/2, (a−1)N + a/2) .
Let's use the relationship of those yc with their host intervals. Start with the obvious,

    cN − a/2 ≤ yc·a < cN + a/2 ,

then divide by aN,

    c/a − 1/(2N) ≤ yc/N < c/a + 1/(2N) .

In other words, yc/N is within 1/(2N) of c/a, or symbolically

    | c/a − yc/N | ≤ 1/(2N) .
Figure 23.30: N = 2ⁿ chosen so (N/2, N] brackets M²
Compare that with the distances between consecutive c/a values (which I will prove in a moment),

    | c/a − (c + 1)/a | ≥ 1/M² ,

and we can conclude that each of our special yc, when divided by N, is closer to c/a than it is to any other fraction with denominator a. Thus, when we measure one of these yc (which we already showed we can do in constant time with arbitrarily good probability) we will be picking out the rational number c/a. In fact, we will show something stronger, but first, we fill in the gap and demonstrate why consecutive fractions c/a and (c + 1)/a differ by at least 1/M².
(Indeed, any two distinct fractions with denominators ≤ M, say n/d ≠ n′/d′, satisfy |n/d − n′/d′| = |nd′ − n′d|/(dd′) ≥ 1/(dd′) ≥ 1/M², which covers consecutive c/a and (c + 1)/a in particular.)
Conclusion: Of all fractions n/d with denominator d ≤ M, c/a is the only one that lies in the neighborhood of “radius” 1/(2M²) around yc/N. Thus yc/N strongly selects c/a and vice versa.
[Interesting Observation. We showed that

    yc/N is uniquely close to c/a .

But multiply both sides by N to get the equivalent conclusion,

    yc is uniquely close to c·(N/a) .

What is N/a? It is freq, the exact frequency of our function f relative to the interval [0, N − 1]. (In general, it's a real number between m and m + 1.) In other words, each likely-measured yc is uniquely close to an integer multiple of the function's exact frequency,

    yc ≈ c · freq .

Our math doesn't require that we recognize this fact, but it does provide a nice parallel with the easy case, in which our measured {yc} were exact multiples, {cm}, of the true integer frequency, freq = m.]
The continued fractions algorithm (CFA) will be developed in an optional lesson next
week. In short, it takes a real number x as input and produces a sequence of (reduced)
fractions, {nk /dk } that approach (get closer and closer to) x. We will be applying
CFA to x = yc /N , a rational number itself, but we still want these other fractional
approximations because among them we’ll find one, n/d, which is identical to our
sought-after c/a.
Caution: There is no guarantee that the c ↔ yc satisfies c ◦ a. So when CFA tells
us that it has found the reduced fraction n/d = c/a, we will not be able to conclude
that n = c and d = a. We will deal with that wrinkle in Step V.
The plan is to list a few results about CFA, then use those facts to show that
it will produce (return) the unique a-denominator fraction, c/a, closest to yc /N , in
O(log3 M ) operations. (Reminder: M is the known upper-bound we were given for
a).
Here are the properties of CFA that we’ll need. Some of them are only true when
x is rational, but for us, that’s the case.
1. During execution, CFA consists of a loop that produces a fraction in reduced
form, nk /dk , at the end of the kth iteration. nk /dk is called the kth convergent
for x.
2. For any real number, x, the convergents approach x (thus justifying their name), that is,

       lim_{k→∞} n_k/d_k = x .
3. For rational x, the above limit is finite, i.e., there will be a K < ∞, with
nK /dK = x, exactly, and no more fractions are produced for k > K.
4. In our version of CFA, we will supply a requested degree of accuracy ε, and ask
CFA to return n/d, the first fraction it generates which is within ε of x.
Depending on ε and x, CFA either returns n/d = nK /dK = x, exactly, as its
final convergent, or returns an ε-approximation n/d ≠ x, but within ε of it.
5. The denominators {dk } are strictly increasing and, for rational x, all ≤ the
denominator of x (whether or not x was given to us in reduced form).
6. If a fraction n/d differs from x by less than 1/(2d²) then n/d will appear in the list of convergents for x. Symbolically, if

       | n/d − x | < 1/(2d²) ,

   then

       n/d = n_{k₀}/d_{k₀} ∈ { n_k/d_k }_{k=0}^{K}   (the convergents for x) .
7. When x = p/q is a rational number, CFA will complete in O(log³ q). (Sharper
bounds exist, but this is enough for our purposes, and is easy to explain.)
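Here is a compact sketch of CFA built from exactly these properties (Python; the function names are mine, and the example values M = 25, a = 10, N = 1024, c = 3, yc = 307 are hypothetical numbers chosen to satisfy the bracketing N/2 < M² ≤ N and a < M/2).

```python
from fractions import Fraction

def convergents(p, q):
    """Successive convergents n_k/d_k of the rational number p/q."""
    cf = []
    while q:                      # continued-fraction coefficients of p/q
        cf.append(p // q)
        p, q = q, p % q
    n, n_prev = cf[0], 1
    d, d_prev = 1, 0
    yield Fraction(n, d)
    for coeff in cf[1:]:
        n, n_prev = coeff * n + n_prev, n
        d, d_prev = coeff * d + d_prev, d
        yield Fraction(n, d)

def cfa(p, q, eps):
    """Return the first convergent of p/q lying within eps of p/q (bullet 4)."""
    x = Fraction(p, q)
    for frac in convergents(p, q):
        if abs(frac - x) < eps:
            return frac

M, N, y = 25, 1024, 307
print(list(convergents(y, N)))              # 0, 1/3, 2/7, 3/10, 152/507, 307/1024
print(cfa(y, N, Fraction(1, 2 * M * M)))    # 3/10 -- the sought-after c/a
```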
We want to apply bullet 6 to the fraction c/a and the input x = yc/N. The hypothesis requires that c/a differ from x by less than 1/(2a²); but in Step III we showed

    | c/a − yc/N | < 1/(2M²) ,

so, since a ≤ M,

    | c/a − yc/N | < 1/(2a²) .
As this is the hypothesis of bullet 6, c/a must appear among the convergents of yc/N. Since it is within 1/(2M²) of yc/N, we know that CFA will terminate when it reaches c/a, if not before.

We now show that CFA cannot terminate before its loop produces c/a. If CFA returned a convergent n_k/d_k that preceded c/a, we would have

    | x − n_k/d_k | ≤ 1/(2M²)

by bullet 4. But since the d_k are strictly increasing (bullet 5), and we are saying that the algorithm terminated before getting to c/a, then

    d_k < a .

That would give us a second fraction, n_k/d_k, with denominator d_k < M within 1/(2M²) of yc/N, a title uniquely held by c/a (from Step III). Therefore, when we give CFA the inputs x = yc/N and ε = 1/(2M²), it must produce and return c/a. QED
CFA is O(log³ M)

By bullet 7, CFA has time complexity O(log³ N), which we have already established to be equivalent to O(log³ M).
What we can say is that, from Steps III and IV, each of the measured {yc}s leads – in constant time, with any predetermined confidence, ε – to a partner fraction in {c/a} with the help of some O(log³ M) logic provided by CFA.

In this step, we demonstrate that not only do we measure some yc (constant time) and get a partner, c/a (O(log³ M)), but that we can even expect to get a special subset B ⊆ {yc} ↔ {c/a} in constant time, namely those yc corresponding to c ◦ a (i.e., c coprime to a). This will enable us to extract the period a from c/a.
We do it all in three steps, the first two of which correspond to the missing conditions we enjoyed in the easy case:

• Stated loosely, in our quantum circuit, the "difference" between measuring the least likely y_c and the most likely y_c is a fixed ratio independent of a. (This corresponds to the equi-probabilities of the easy case.)

• The probability that a randomly selected c is coprime to a (c ◦ a) is greater than 60%.

• Finally, we combine the two bullets to bound P(B) below by a constant.
There are clearly two measurement-dependent parameters that will affect this probability: m̃, which is either m or m + 1, depending on the collapsed state of the second register, and θ_y, which depends on the measured y. When a ≪ M the probabilities are very close to being uniform, but to avoid hand-waving, let's go with our worst-case scenario – the assumption that we added to Shor's hypothesis in order to get hard estimates for all our bounds: M > 2a, but not necessarily any larger.
When we computed our lower bound on the probability of getting a y_c, we used an inequality that contained an angle φ under the assumption that φ was restricted to an interval wider than [−π, π], specifically [−(3/2)π, (3/2)π]. The inequality we found
applicable was
   K · (2|φ|/π) ≤ 2 |sin(φ/2)| ,   for φ ∈ [ −(3/2)π, (3/2)π ] ,

Figure 23.31: |sin(x/2)| lies above |Lx/π| in the interval ( −.4714π, .4714π )

   L · (2|φ|/π) ≤ 2 |sin(φ/2)| ,   for φ ∈ [ −π/2, π/2 ] ,
Upper Bound for Numerator
The numerator is easy. We derived an upper bound for all angles, φ, so:
   | e^{i θ_{ŷ_c} m̃ / a} − 1 | ≤ |θ_{ŷ_c}| m̃ / a .

The denominator is

   | e^{i θ_{ŷ_c} / a} − 1 | = | e^{i (2π ŷ_c / N)} − 1 | ,
Applying Both Bounds
Finally,

   P(measurement yields y_c) ≤ (1/(m̃N)) · (4L²m̃²/π²) ≤ ((m + 1)/N) · (4L²/π²) .

The last inequality still acknowledges that some of the m̃ are m + 1 and some are m. This time, we are only interested in the probabilities for each individual y_c, not getting one of the set {y_c}, so instead of summing all a of the y_c's, we combine this, as is, with the earlier one, which I repeat here,

   P(measurement yields y_c) ≥ (1/(m̃N)) · (4K²m̃²/π²) ≥ (m/N) · (4K²/π²) ,

to get

   P(least-likely y_c) / P(most-likely y_c) ≥ mK² / ((m + 1)L²) .

Our assumption has been that a ≤ M/2, so m = (integer quotient of) N/a is > 2 (usually much greater). This, and the estimates L ≈ 1.4142 and K ≈ .4714, result in a ratio, independent of a, M or N, between the probability of measuring the least likely y_c and that of measuring the most likely y_c:

   P(least-likely y_c) / P(most-likely y_c) ≥ mK² / ((m + 1)L²) ≥ 2K² / (3L²) ≈ .072 .   QED
This covers the bullet about the ratio of the least likely and the most likely yc .
Note-to-file. If a ≪ M < N (i.e., m gets very large), as is often the case, both K and L will be close to 1 (review the derivations and previous notes-to-file). This means all y_c are roughly equi-probable. Also m/(m + 1) ≈ 1. Taken together, the ratio of the least likely to the most likely is approximately 1.
Summarizing, we have a hard minimum for the worst case scenario (only two intervals of size a fit into [0, M)) as well as an expected minimum for the more realistic one (a ≪ M). That is,

   P(least-likely y_c) / P(most-likely y_c)  ≥  { .072,   worst case: 3a > M
                                                { 1 − ε,  typically .
Proof of Second Bullet
We will show that the probability of randomly selecting a number c ◦ a is > 60%. First, we express this as a product of terms, each of which is the probability that a specific prime, p, fails to divide both c and a. (We write p | c to mean p divides evenly into c.)
Note that a is an unknown: no one a is favored over any other a, a priori. Also, c is selected at random, by hypothesis of this bullet. So both a and c are considered selected at random with respect to any particular prime, p. We use this fact in the derivation.
   P(c ◦ a) = P( there is no prime that divides both c and a )
            = P( ¬(2|c ∧ 2|a) ∧ ¬(3|c ∧ 3|a) ∧ ¬(5|c ∧ 5|a) ∧ . . .
                  . . . ∧ ¬(p_k|c ∧ p_k|a) ∧ . . . ) ,   where p_k = kth prime .

   P(c ◦ a) = P( ⋀_{p prime}^{finite} ¬(p|c ∧ p|a) ) = ∏_{p prime}^{finite} P( ¬(p|c ∧ p|a) )

Since ¬(p|c ∧ p|a) is true for p > a or p > c, the probabilities for those higher primes, p, are all 1, which is why the product is finite.
Next, we compute these individual probabilities. For a fixed prime, p, the probability that it divides an arbitrary non-negative c chosen randomly from all non-negative integers is actually independent of c,

   P( p|c ) = 1/p .

[Exercise. Justify this.] This is also true for p|a, so

   P( p|c ∧ p|a ) = 1/p²   and

   P( ¬(p|c ∧ p|a) ) = 1 − 1/p² .
Remember, both a and c can be considered randomly selected relative to a fixed
prime, p. If, instead, we restrict ourselves to non-negative values chosen randomly
from a finite set, such as c < a (or a < M ), then the probability that p|c is actually
< 1/p. To see this, consider the worst cases, say a < p (P = 0) or p ≤ a < 2p
(P = 1/a < 1/p). So the above equalities become bounds which work even better for
us,

   P( p|c ) < 1/p ,

   P( p|c ∧ p|a ) < 1/p²   and

   P( ¬(p|c ∧ p|a) ) ≥ 1 − 1/p² .

Finally, we plug this result back into the full product, to get

   P(c ◦ a) = ∏_{p prime}^{finite} P( ¬(p|c ∧ p|a) ) ≥ ∏_{p prime}^{∞} ( 1 − 1/p² ) = 1/ζ(2) ,

where ζ(s) is the most famous function you never heard of, the Riemann zeta function, whose Euler product form is

   ζ(s) = ∏_{p prime} ( 1 − 1/p^s )^{−1} .

The value of ζ(2) has to be handed-off to the mathematical annals, and we'll simply quote the result,

   1/ζ(2) = 6/π² ≈ .607 .
That proves our second bullet, which is all we will need, but notice what it implies. Since

   P( ¬(c ◦ a) ) < .393 ,

the probability of failing to get a coprime c in every one of T independent selections is < (.393)^T. This can be made arbitrarily small by choosing a large enough T, so we can expect to get a c ◦ a with arbitrarily high probability after T random samples from the set [0, a − 1].
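The ≈ .607 figure is easy to check numerically. Below is a minimal Monte Carlo sketch in Python (the sampling range and trial count are arbitrary assumptions) that, following the text, treats both integers as randomly selected and estimates the probability that they are coprime.

```python
import random
from math import gcd, pi

random.seed(0)
TRIALS = 200_000
UPPER = 10**6                         # assumed sampling range, purely illustrative

hits = sum(gcd(random.randrange(1, UPPER), random.randrange(1, UPPER)) == 1
           for _ in range(TRIALS))

print(hits / TRIALS)                  # should land near .607
print(6 / pi**2)                      # 1/zeta(2) = 0.6079271018540267
```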
Combine Both Bullets to Arrive at Step V
Define

   C ≡ { y_c }_{c=0}^{a−1}   and

   B ≡ { y_b | y_b ∈ C and b ◦ a } .

(Note: B ⊆ C.)
We would like a lower bound on the probability of measuring a y_c which also has the property that its associated c is coprime to a. In symbols, we would like to show:

Claim: P(B) ≥ some constant independent of a, M, N, etc.

Proof:
Let y_{B min} be a least likely element of B, and let y_{C min} and y_{C max} be a least likely and a most likely element of C, respectively.
If there is more than one y that produces a minimum or maximum probability, choose any one. Then,

   ( Σ_{b∈B} P(y_b) ) / ( Σ_{c∈C} P(y_c) )  ≥  ( |B| P(y_{B min}) ) / ( |C| P(y_{C max}) )
                                             ≥  ( |B| P(y_{C min}) ) / ( |C| P(y_{C max}) )
                                             ≥  q × .072 ,

where q ≡ |B| / |C|, so
   P(B) ≥ q × .072 × P(C) .

From the proof of the second bullet of this step, q > .607, and from Step II, P(C) > .04503, so

   P(B) ≥ .607 × .072 × .04503 ≈ .002 .
This is independent of a, M, N, etc. and allows us to apply the CTC theorem for looping algorithms to aver that after a fixed number, T, of applications of the quantum circuit we will produce a y_c with c ◦ a within any desired error tolerance. We'll compute T in a moment.
Remember that we used worst-case bounds above. As we demonstrated, normally the ratio bounded by .072 is actually very close to 1, and P(C) > .40528, so we can expect a better constant lower bound:

   P(B) ≥ .607 × 1 × .40528 ≈ .266 .
Conclusion of Step V
How many passes, T, do we need to get an error tolerance of, say, ε = 10⁻⁶ (one in a million)? It depends on the number of times our period, a, fits into the interval [0, N). Under the worst case assumption that we formally required – only two – we would need a much larger number of whacks at the circuit than in a typical problem that fits hundreds of periods in the interval. Let's see the difference.
The "p of success" bounded away from 0 for a single pass of our algorithm's loop in the CTC theorem, along with the error tolerance ε, gives us the number of required passes via the formula provided by the theorem,

   T = log(ε) / log(1 − p) + 1 .
Worst Case (P(B) ≈ .002): We solved this near the end of the probability lesson and we found

   T = log(10⁻⁶) / log(.998) + 1 = 6901 ,

or, more briefly and conservatively, 7000 loop passes.

Typical Case (P(B) ≈ .266): This was also an example in the earlier chapter,

   T = log(10⁻⁶) / log(.734) + 1 = 45 ,

or, rounding up, 50 loop passes.
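The two cases are easy to reproduce; here is a minimal Python sketch of the CTC pass-count formula applied to the per-pass probabilities quoted in this lesson.

```python
import math

def passes_needed(p_success: float, tolerance: float) -> float:
    """CTC loop-count formula T = log(eps) / log(1 - p) + 1 from the text."""
    return math.log(tolerance) / math.log(1.0 - p_success) + 1

print(passes_needed(0.002, 1e-6))   # about 6.9e3 -> the text's "roughly 7000 passes"
print(passes_needed(0.266, 1e-6))   # about 46   -> the text's "rounding up, 50 passes"
```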
It’s important to remember that the data’s actual probability doesn’t care
about us. We can instruct our algorithm to cycle 7000 times, but if the data
determined that only 15 loop passes were needed to find the desired c/a, then it
would return with a successful c/a after 15 passes. Our hyper-conservative estimate
costs us nothing.
On the other hand, if we are worried that a < N/2 is too risky, and want to allow for only one period of a fitting into M or N, the math works. For example, we could require merely a < .999N and still get constant time bounds for all probabilities. Just repeat the analysis replacing 1/2 with .999 to find the more conservative bounds. You would still find that the proofs all worked, albeit with P(B) > a constant much smaller than .002.
In the final step of the previous section we proved that we will measure y_c ∈ B with near certainty after some predetermined number, T, of measurements. Why does such a y_c solve Shor's period-finding problem? CFA returns n/d = c/a close to y_c/N, but the actual numerators and denominators do not have to match: n/d is reduced, but c/a may not be for general c. However, we are safe when y_c ∈ B, for then c ◦ a, and we are assured that c/a is a reduced fraction. In that case, we can assert that the numerators and denominators match; for the denominators, that means d = a and we are done.
This happens if we measure y ∈ B, and this is why finding such a y will give us our period and therefore solve Shor's problem. (Wait. Might we end up getting d = a even if we happened to measure some y ∉ B? In other words, is the if in the last sentence if and only if or just if? Well, there are two ways to deal with this question. You could chase down the logic to see whether a y ∉ B used to send x = y/N to the CFA could possibly give us d = a, but why bother? I prefer the second method to dispatch the question: our algorithm will test d = a after every measurement, so if (possible or not) we found a for the wrong reason, we still found a.)
Shor’s Period-Finding Algorithm, General Case
After some hard work we have assembled all the results. Two things need to go right
to be certain we measured a y ∈ B:
1. We measure y ∈ C = {yc }.
2. the associated c satisfies c ◦ a.
We could fail in either or both, but the test of overall success is that the fraction, n/d, returned by CFA( y_c/N, 1/(2M²) ) has the property that d = a. And we can test that in a single evaluation of our given function f. (Remember f? It was given to us in Shor's hypothesis, and we used it to design/control U_f.)
Testing whether, say, f(1 + d) = f(1) is enough to know whether d is the period, a.
The complexity of this test may or may not be polynomial time, depending on f. For RSA encryption-breaking the f in question is O(log⁴(M)), as we will prove, so for that application we will have absolute speed-up over classical solutions. In general, this step confines our quantum period-finding to only relativized speed-up.
So, the short description of the algorithm is this:
· Run the circuit producing yc s, use those to get (n/d)s, stop when we
confirm f (1 + d) = f (1), report that d = a (our period) and declare
victory.
· In the unlikely event that we run out of time (exceed our established
T ), we admit defeat.
• Select an integer T that reflects an acceptable failure rate based on any known aspects of the period. (E.g., for a failure tolerance of .000001, we might choose T = 7000 if we expect only 2 periods, a, to fit into [0, M − 1) or T = 45 if we know a ≪ M. If we are happy with failure at .001, then these values would be adjusted downward.)

• Repeat the following loop at most T times (a classical-control sketch follows this list).

  1. Run the quantum circuit and measure, producing some y.
  2. Compute n/d = CFA( y/N, 1/(2M²) ).
  3. If f(1 + d) = f(1), then d = a, our period: break out of the loop.

• If the above loop ended naturally (i.e., not from the break) after T full passes, we failed. Otherwise, we have found a.
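Here is a minimal Python sketch of that classical control. run_circuit_and_measure is a hypothetical stand-in for one pass of the quantum circuit (returning a measured y ∈ [0, N)), f is the given Z_M-periodic function, and Fraction.limit_denominator stands in for CFA( y/N, 1/(2M²) ); under the Step IV hypotheses it returns the same reduced fraction.

```python
from fractions import Fraction

def find_period(f, run_circuit_and_measure, M, N, T):
    """Classical post-processing loop for Shor's period-finding (sketch).

    f                       -- the Z_M-periodic function from Shor's hypothesis
    run_circuit_and_measure -- stand-in for one pass of the quantum circuit
    M                       -- known upper bound on the period a
    N                       -- power of 2 with N/2 < M^2 <= N
    T                       -- maximum number of passes (from the CTC theorem)
    """
    for _ in range(T):
        y = run_circuit_and_measure()                        # measure some y
        d = Fraction(y, N).limit_denominator(M).denominator  # CFA's n/d; keep d
        if f(1 + d) == f(1):                                 # one evaluation of f
            return d                                         # success: d = a
    return None                                              # T passes used up: defeat
```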
Computational Complexity (General Case)
• The four non-oracle components, above, are done in series, not in nested loops, so the overall relativized complexity will be the worst among them, O(log³(M)).

So the entire Shor circuit for an f ∈ O(log⁴(M)) (true for RSA/order-finding) would have an absolute complexity of O(log⁴(M)).
Chapter 24
• EA(P, Q) ∈ O(log³ X), where X is the larger of the two integers passed to it (although sharper/subtler bounds exist), and
24.2.2 Long Division
A long division algorithm (LDA) for integers A and B, both > 0, produces A ÷ B in
the form of quotient, q and remainder, r satisfying
A = qB + r.
The big-O time complexity of LDA(A, B) in its simplest form is O(log² X), where X is the larger of {A, B}. If you research this, you'll find it given as O(N²), where N is the number of digits in the larger of {A, B}, but that makes N = log₁₀ X, and log₁₀ has the same complexity as log₂.
General Idea
We want gcd(P, Q), where P > Q > 0. Long division gives

   P = qQ + r.

Notice that either r = 0, in which case Q|P and we are done (gcd(P, Q) = Q), or else we have two new integers to work with: Q and r, where, now, Q > r. We re-apply long division, this time to the inputs Q and r to get Q ÷ r, and again, examine the new remainder (call it r′),

   Q = q′ r + r′ .

Like before, if r′ = 0 we are done (gcd(P, Q) = r), and if not, we keep going. This continues until we get a remainder, r̃ = 0, at which point the gcd is the integer "standing next to" (being multiplied by) its corresponding quotient, q̃, in the most recent long division (just as Q was "standing next to" q in the initial division, or r was "standing next to" q′ in the second division).
Let's add some indexing to our "general idea" before we give the complete algorithm. Define

   r_0 ≡ P,
   r_1 ≡ Q,

and the successive divisions become

   r_0 = q_0 · r_1 + r_2 ,
   r_1 = q_1 · r_2 + r_3 ,
   etc.

The algorithm, then, is:

• Initialize

   r_0 = P
   r_1 = Q

• Loop over k = 0, 1, 2, . . . , each pass long-dividing r_k by r_{k+1},

   r_k = q_k · r_{k+1} + r_{k+2} ,

  – until r_{k+2} = 0.

• Return r_{k+1} as gcd(P, Q).
Example

   r_0 = 285
   r_1 = 126

   r_0 = q_0 · r_1 + r_2
   285 = 2 · 126 + 33        r_2 ≠ 0, so compute r_1 ÷ r_2

   r_1 = q_1 · r_2 + r_3
   126 = 3 · 33 + 27         r_3 ≠ 0, so compute r_2 ÷ r_3

   r_2 = q_2 · r_3 + r_4
   33 = 1 · 27 + 6           r_4 ≠ 0, so compute r_3 ÷ r_4

   r_3 = q_3 · r_4 + r_5
   27 = 4 · 6 + 3            r_5 ≠ 0, so compute r_4 ÷ r_5

   r_4 = q_4 · r_5 + r_6
   6 = 2 · 3 + 0             r_6 = 0, so return gcd = r_5

   gcd = 3
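A minimal Python sketch of the loop just illustrated, run on the same inputs 285 and 126:

```python
def euclidean_gcd(P: int, Q: int) -> int:
    """Euclidean Algorithm: repeated long division until the remainder is 0."""
    r, r_next = P, Q
    while r_next != 0:
        q, r_new = divmod(r, r_next)   # one long division: r = q * r_next + r_new
        r, r_next = r_next, r_new
    return r                           # the last nonzero remainder

print(euclidean_gcd(285, 126))         # 3, matching the worked example
```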
Consider a single division step,

   P = qQ + r .

Notice that

   r < P/2 .

To see this, break it into two cases.

• case i) Q ≤ P/2. Then r < Q ≤ P/2.

• case ii) Q > P/2. In this case q = 1 and r = P − Q < P − P/2 = P/2.
Applying the same argument to the general step,

   r_k = q_k · r_{k+1} + r_{k+2} ,

gives

   r_{k+2} < r_k / 2 .
In other words, every two divisions we have reduced r_k by half, forcing the even-indexed r's to become 0 in at most 2 log P iterations. Therefore, the number of steps is O(2 log P) = O(log P). (Incidentally, the same argument also works for Q, since Q = r_1, spawning the odd-indexed r's. So, whichever is smaller, P or Q, gives a tighter bound on the number of steps. However, we don't need that level of subtlety.) We have shown that the EA main loop's complexity is O(log X), where X is the larger (or either) of P, Q.
Next, we note that each loop iteration calls LDA(r_k, r_{k+1}), which is O(log² r_k), since r_k > r_{k+1}. For all k, r_k < X (the larger of P and Q), so LDA(r_k, r_{k+1}) is also in the more conservative class, O(log² X).
Combining the last two observations, we conclude that the overall complexity of EA(P, Q) is the product of the complexities of the loop, O(log X), and the LDA within the loop, O(log² X), X being the larger of P and Q. Symbolically,

   EA(P, Q) ∈ O(log³ X),   X = the larger of {P, Q}.
The headlines are:
A Rational Example
An Irrational Example
CFA Using the EA. If

   x = P/Q ,

the {a_k} in its continued fraction expansion are exactly the unique {q_k} of the Euclidean Algorithm, EA(P, Q), for finding the gcd(P, Q).

Theorem. The time complexity for computing all the a_k for a rational x is O(log³ X), where X is the larger of {P, Q}.

Proof: Since the {a_k} of continued fractions are just the {q_k} of the EA, and we have proved that the EA ∈ O(log³ X), where X is the larger of P and Q, the conclusion follows. QED
The CF Method
1. a_k ← ⌊x⌋
2. frac ← x − ⌊x⌋
3. If frac = 0, break from loop; we've found x.
4. x ← 1/frac
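A minimal Python version of this loop, run on the rational 285/126 from the Euclidean Algorithm example (using exact Fraction arithmetic is an implementation choice, not something the text prescribes):

```python
from fractions import Fraction
from math import floor

def continued_fraction_terms(x: Fraction, max_terms: int = 50) -> list[int]:
    """The CF method: peel off floor(x), invert the fractional part, repeat."""
    terms = []
    for _ in range(max_terms):
        a_k = floor(x)
        terms.append(a_k)
        frac = x - a_k
        if frac == 0:          # rational input: the expansion terminates
            break
        x = 1 / frac
    return terms

print(continued_fraction_terms(Fraction(285, 126)))
# [2, 3, 1, 4, 2] -- exactly the quotients q_k of EA(285, 126)
```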
24.3.4 Convergents of a Continued Fraction
Algorithmic Definition of Convergents
Rather than using the unwieldy continued fractions directly, we’ll work with the much
more tractable “simple fractions” which derive from them.
We could have defined the kth convergent of x when we first introduced continued
fractions. They are just the partially completed constructs that we see when writing
out (or attempting to write out) the full continued fraction. Explicitly, the first few
convergents are
   n_0/d_0 = a_0 ,

   n_1/d_1 = a_0 + 1/a_1 ,

   n_2/d_2 = a_0 + 1/(a_1 + 1/a_2) ,

and generally,

   n_k/d_k = a_0 + 1/( a_1 + 1/( . . . + 1/( a_{k−1} + 1/a_k ))) .
Example
For the rational x = 285/126, whose continued fraction and {ak } we computed earlier,
you can verify that the convergents are
   n_0/d_0 = 2/1 ,
   n_1/d_1 = 7/3 ,
   n_2/d_2 = 9/4 ,
   n_3/d_3 = 43/19 ,   and
   n_4/d_4 = 95/42 .
Notice that the final convergent is our original x in reduced form. This is very impor-
tant for our purposes. Figures 24.1 and 24.2 show two graphs of these convergents at
different zoom levels.
1. The convergents “converge” very fast – every two convergents are much closer
together than the previous two, and
2. The convergents bounce back-and-forth around the target, x, alternating less-
than-x values and greater-than-x values.
Before making this precise, let’s see if it holds for a second example.
Example
Figure 24.4: Second view of convergents
24.3.5 An Algorithm for Computing the Convergents {n_k/d_k}

• Specify a termination condition (e.g., a maximum number of loop passes or a convergent within a specified ε of the target x, etc.).

• Invoke the CF method, given earlier, that produces the entire sequence {a_k} to the desired accuracy. (Note: The CF method can be merged into this algorithm by combining loops carefully.)

• Initialize

   n_0 ← a_0 ,   d_0 ← 1
   n_1 ← a_1 a_0 + 1 ,   d_1 ← a_1

• Loop over k starting at k = 2 and iterating until k = K, the final index of the sequence {a_k} returned by the CF method.

  1. n_k ← a_k n_{k−1} + n_{k−2}
  2. d_k ← a_k d_{k−1} + d_{k−2}

(See the sketch below for this recurrence in code.)
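A minimal Python sketch of that recurrence, fed the terms [2, 3, 1, 4, 2] computed earlier for x = 285/126; it reproduces the convergents 2/1, 7/3, 9/4, 43/19, 95/42 listed in the example above.

```python
from fractions import Fraction

def convergents(a: list[int]) -> list[Fraction]:
    """Build the convergents n_k/d_k from the continued-fraction terms a_k.
    Assumes at least two terms."""
    n = [a[0], a[1] * a[0] + 1]          # n_0, n_1
    d = [1, a[1]]                        # d_0, d_1
    for k in range(2, len(a)):
        n.append(a[k] * n[k - 1] + n[k - 2])
        d.append(a[k] * d[k - 1] + d[k - 2])
    return [Fraction(nk, dk) for nk, dk in zip(n, d)]

print(convergents([2, 3, 1, 4, 2]))
# [Fraction(2, 1), Fraction(7, 3), Fraction(9, 4), Fraction(43, 19), Fraction(95, 42)]
```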
2. For rational x, the above limit is finite, i.e., there will be a K < ∞, with n_K/d_K = x exactly, and no more fractions are produced for k > K.

4. Each n_k/d_k (if not the final convergent, which is exactly x) differs from x by no more than 1/(d_k d_{k+1}),

   | x − n_k/d_k | ≤ 1/(d_k d_{k+1}) .
5. For k > 0, n_k/d_k is the best approximation to x of all fractions with denominator ≤ d_k,

   | x − n_k/d_k | ≤ | x − n/d | ,   for all d ≤ d_k .
7. The denominators {d_k} are strictly increasing and, if x is rational, are all ≤ the denominator of x (whether or not x was given to us in reduced form).

• If a fraction n/d differs from x by less than 1/(2d²) then n/d will appear in the list of convergents for x. Symbolically: If

   | n/d − x | < 1/(2d²) ,

   then

   n/d = n_{k′}/d_{k′} ∈ { n_k/d_k }_{k=0}^{K}   (the convergents for x) .
1. To our previous algorithm for generating the convergents, pass x along with the terminating condition that it stop looping when it detects that | x − n_k/d_k | ≤ ε.

2. Return n_k/d_k.
Chapter 25
[Variable Name – Most number theory publications use the letter N for the large number to be factored. I have been using N to be the power-of-2 larger than M², where M is our bound on the period a and the number we want to factor, so I will use M for the arbitrary integer in this section.]
We will want to narrow down the kind of M that we’re willing to look at, and there
are two cases which can be tested and factored by ordinary computers efficiently,
1. M even, and

2. M = p^k, k > 1, a power of some prime.
The test “M even? ” entails a simple examination of the least significant bit (0 or 1),
trivially fast. Meanwhile, there are easy classical methods that determine whether
M = pk for some prime p and produce such a p in the process (thus providing a
divisor of M ).
In fact, we can dispose of a larger class of M: those M for which M = q^k, k > 1, for any integer q < M, prime or not, and produce q in the process – all using classical machinery. If we detected that case, it would provide a factor q, do so without requiring Shor's quantum circuit, and cover the more restrictive condition 2, in which q is a prime.
So why does the second condition only seek to eliminate the case in which M is
a power of some prime p before embarking on our quantum algorithm rather than
using classical methods to test and bypass the larger class of M that are powers of
any integer, q? First, eliminating only those M that are powers of a single prime
is all that the quantum algorithm actually requires. So once we have disposed of
that possibility, we are authorized to move on to Shor’s quantum algorithm. Second,
knowing we can move on after confirming M ≠ p^k, for p prime, gives us options.
• We can ask the number theorists to provide a very fast answer to the question "is M = p^k, p prime?", and let the quantum algorithm scoop up the remaining cases (which include M = q^k, q not prime).

• Alternatively, we can apply a fast classical method to search for a q (prime or not) that satisfies M = q^k, thus avoiding the quantum algorithm in a larger class of M.
One of the above two paths may be faster than the other in any particular {hardware
+ software} implementation, so knowing that we can go either way gives us choices.
Now let's outline why either of the two tests

   M = q^k ,   k > 1,   or
   M = p^k ,   k > 1,   p prime,

can be dispatched classically.
Any such power k would have to satisfy k < log₂ M since p > 2 (we've eliminated M even). Therefore, for every k < log₂ M we compute the integral part

   q = ⌊ M^{1/k} ⌋
(something for which fast algorithms exist) and test whether q^k = M. If it does, q is our divisor, and we have covered the case M = q^k, k > 1, for any integer q without resorting to quantum computation. The time complexity is polynomial fast because

• the outer loop has only log₂ M passes (one for each k),

• even a slow, brute force method to compute q = ⌊ M^{1/k} ⌋ has a polynomial big-Oh, and

• testing whether q^k = M takes fewer than log₂ M multiplications, again polynomial.

[Exercise. Design an algorithm that implements these bullets and derive its big-Oh.]
But if one wanted to also know whether the produced q in the above process was prime, one could use an algorithm like AKS (do a search), which has been shown to run in time polynomial in the number of digits – better than O( (#digits in M)⁸ ) = O( log⁸ M ).
This approach was based on first finding a q with M = q^k, then going on to
determine whether q was prime. That’s not efficient, and I presented it only because
the components are easy, off-the-shelf results that can be combined to prove the
classical solution is polynomial fast. In practice we would seek a solution that tests
whether M is a power of a prime directly, using some approach that was faster than
testing whether it is a power of a general integer, q.
The reason we quickly dispose of these two cases is that the reduction of factoring
to period-finding, described next, will not work for either one. However, we now
understand why we can be comfortable assuming M is neither even nor a power of a
single prime and can proceed based on that supposition.
The sufficient condition we will exploit is this: we can produce a factor of M whenever we can find an x satisfying

   x² = 1 (mod M),   with   x ≠ ±1 (mod M) .
Proof

   x² = 1 (mod M)
   ⇒ x² − 1 = 0 (mod M)
   ⇔ M | (x² − 1)
   ⇔ M | (x − 1)(x + 1) .

That can only happen if M has a factor, p > 1, in common with one or both of (x − 1) and (x + 1), i.e.,

   p | M and p | (x − 1) ,   or
   p | M and p | (x + 1) .

Whichever of the above two cases is true (and we just proved at least one must be), we have produced a q – namely gcd(M, x − 1) or gcd(M, x + 1) – with q | M. QED
The time complexity of gcd(M, k), M ≥ k, is shown in another lecture to be O(log³ M), so once we have x, getting q is "fast."
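A tiny illustration in Python (the numbers M = 21 and x = 8 are assumptions chosen for the example, not values from the text): 8² = 64 ≡ 1 (mod 21) while 8 ≢ ±1 (mod 21), and the two gcd's pull out the factors.

```python
from math import gcd

M, x = 21, 8                    # illustrative: x*x % M == 1 and x is not ±1 (mod M)
assert (x * x) % M == 1 and x % M not in (1, M - 1)

print(gcd(M, x - 1))            # 7 -- a nontrivial divisor of M
print(gcd(M, x + 1))            # 3 -- the complementary divisor
```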
Before we find x, we take a short diversion to describe something called order-finding.
• If y ◦ M we go on to find the "order" of y, defined next, which leads to the rest of the algorithm.

We therefore assume that y ◦ M, since if it were not we would have lucked upon the first bullet and factored M – that would be a third easy case we might encounter.
While it may not be obvious, finding the "order" of y in Z_M will be the key. In this section we define "order" and learn how we compute it with the help of Shor's quantum period-finding; the final section will explain how doing so factors M.
   y^b = 1 (mod M) .

For each pair, take k′ to be the larger of the two, and write this last equality as
We just argued that there are infinitely many k > 1 (with potentially a different b > 0 for each k) for which the above holds. There must be a smallest b that satisfies this among all the pairs. (Once you find one pair, take the k and b for that pair. Keep looking for other pairs with different k's and smaller b's. You can't do this indefinitely, since eventually you'd reach b = 0. It doesn't matter how long this takes – we only need the existence of such a b, not to produce it, physically.) Assume this last equality represents that smallest b > 0 for any k which makes it true. That means, there exists a pair with
Factoring,

   y^k (1 − y^b) = 0 (mod M)
   ⇒ M | y^k (1 − y^b)
   ⇒ M | (1 − y^b) .
The last step holds because y^k ◦ M, since we are working inside the major case in which we were unlucky enough to pick a y ◦ M. We're done (proving that an order b of y exists) because the final equality means

   y^b = 1 (mod M) .
Define

   a ≡ b + 1,

which implies (plug into above to see why)

   y^a = y (mod M),   a minimal .

That means the function

   f(x) = y^x (mod M) ,

built upon our randomly selected y, is periodic with period a. Furthermore, it is Z_M-periodic, which implies the extra condition

   f(x′) ≠ f(x)   whenever   |x − x′| < a .
(Review the definition of b and its minimality to verify this extra condition.)
We’ll also be using the fact that the period, a, is less than M . Here’s why we can
assume so. The order of y, b = a − 1, has to divide M , because the order of every
element in a finite group divides evenly into the size of the group, M in this case (see
elementary group theory, if you’d like to research it). So, either b = M or b ≤ M/2.
In the former case, a = M − 1, and we dispose of that possibility instantly by testing whether M − 1 is the period of f(x) = y^x (mod M) by evaluating it for any x and x + (M − 1). That only leaves the case b ≤ M/2 ⇒ a < M.
Enter Quantum Computing – We have a Z_M-periodic function, f(x) = y^x (mod M), with unknown period, a < M. This is the hypothesis of Shor's quantum algorithm, which we have already proved can be applied in log³ M time. This is exactly where we would use our quantum computer in the course of factoring M.
We have picked a y ∈ Z_M − {0, 1} at random, defined a function based on that y, and found its period, a. That gave us the order of y in Z_M. The next, and final, step is to demonstrate how we use the order, a, to factor M.
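For intuition only, here is a brute-force Python sketch of the quantity a just defined – the smallest a > 1 with y^a ≡ y (mod M). The values M = 21, y = 2 are assumptions for illustration, and this classical loop takes time proportional to a (exponential in log M), which is precisely why Shor's quantum period-finding is the tool of choice.

```python
def smallest_a(y: int, M: int) -> int:
    """Smallest a > 1 with y**a = y (mod M), by brute force (y coprime to M)."""
    power = y % M
    a = 1
    while True:
        power = (power * y) % M      # now power = y^(a+1) mod M
        a += 1
        if power == y % M:
            return a

print(smallest_a(2, 21))             # 7, since 2^7 = 128 = 6*21 + 2 = 2 (mod 21)
```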
and do so efficiently (in polynomial time), we will get our factor q of M. We did manage to leverage quantum period-finding to efficiently get the order of a randomly selected y ∈ Z_M, so our job is to use that order, a, without using excessive further computation, to factor M. There are three cases to consider:
Claim.

   x ≡ y^{a/2}

satisfies the sufficient condition.

Proof
That's half the sufficient condition. The other half is x ≠ ±1 (mod M). Our assumption in this case is that

   x ≠ −1 (mod M) ,

so if we can also show that

   x ≠ +1 (mod M) ,

we'll have shown that this x satisfies our sufficient condition. Proceed by contradiction. What would happen if

   x = +1 (mod M) ?

Then

   y^{a/2} = 1 (mod M) .

If a = 2, then

   y² = y (mod M),   i.e.,
   y = 1 (mod M),
which contradicts that y ∈ Z_M − {0, 1} (1 is not in that set). So we are forced to conclude that a > 2, which gives

   a/2 > 1,   so   a/2 + 1 < a/2 + a/2 = a.

But now we have an a′ = (a/2) + 1 < a that satisfies

   y^{a′} = y^{(a/2)+1} = y (mod M) .

That contradicts the minimality of a, the smallest integer > 1 satisfying y^a = y (mod M), by construction. QED
That dispatches the first case; we have found an x which satisfies the sufficient
condition needed to find a factor, q, of M .
I combine these two cases because, while they are both possible, we rely on results from number theory (one being the Chinese Remainder Theorem) which tell us that the probability of both cases, taken together, is never more than .5, i.e.,

   P( case 2 ∨ case 3 ) ≤ 1/2 .

This result is independent of M. That means that if we repeatedly pick y at random, T times, the chances that we are unlucky enough to get case 2 or case 3 in all T trials is

   P( ⋀_{k=1}^{T} (case 2 ∨ case 3) ) ≤ ∏_{k=1}^{T} 1/2 = 1/2^T .
25.7 The Complexity Analysis
y ∈ Z_M and we also saw that a < M, so we only need to consider computing y^x for both y, x < M. Since M < M² ≤ N = 2^n, we can express both x and y as a sum of powers-of-2 with, at most, n = log N terms. Let's do that for x:

   x = Σ_{k=0}^{n−1} x_k 2^k ,

where the x_k are x's base-2 digits. So (all products taken mod-M)

   y^x = y^{Σ_k x_k 2^k} = ∏_{k=0}^{n−1} y^{x_k 2^k} .
However long it takes us to compute the general factor, y^{x_k 2^k}, we need to repeat it n times, and when those n factors are computed, we multiply them together using (n − 1) multiplications. There are two parts, taken in series, to this computation:

   1. n × [ complexity of y^{x_k 2^k} (mod M) ]

   2. (n − 1) × [ complexity of one mod-M multiplication ]

The slower (not the product) of the two will determine the overall complexity of y^x (mod M).
Preliminary Note. The computational complexity of integer multiplication is O(log² X), where X is the larger of the two numbers. For us, each product is bounded by N > M, so integer multiplication costs us, at most, O(log² N).
Observe that

   y^{x_k 2^k} = ( y^{2^k} )^{x_k} ,

and in the process of computing y^{2^k} we would end up computing y^{2^{k−1}}, so, in order to avoid repeated calculations from factor-to-factor, we first compute an array of the n factors

   { y^{2^k} }_{k=0}^{n−1} = { 1, y, y², y⁴, y⁸, . . . , y^{2^{n−1}} } ,   all mod-M.
Starting with the second element, y, each element in this array is the square of the one before. That's a total of n − 2 multiplications (we get 1 and y for free). Thus, producing the entire array costs O(log N) multiplications, with each multiplication (by the above note) ∈ O(log² N). That's a total complexity of O(log³ N). This array is computed once and then used for every factor in the product

   ∏_{k=0}^{n−1} ( y^{2^k} )^{x_k} .
To complete the computation of each of the n factors, we raise one of our pre-computed y^{2^k} to the x_k power. That is x_k multiplications for each factor. Wait a minute – x_k is a binary digit, either 0 or 1, so this is nothing other than a choice; for each k we tag on an if-statement to finish off the computation for that factor. Therefore, the computation of each factor remains O(log³ N).
There are n factors to compute, so this tags on another log N magnitude to the bunch, producing a final cost for step 1 of O(log⁴ N). However, we have not done the big Π product yet . . . .
The evaluation of all n factors, O(log⁴ N), is computed in series with the final product, O(log³ N), not nested, so the slower of the two, O(log⁴ N), determines the full complexity of the oracle. Note that this was a lazy and coarse computation, utilizing simple multiplication algorithms and a straightforward build of the function f(x) = y^x (mod M), and we can certainly do a little better.
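Here is a minimal Python sketch of the oracle arithmetic just described: precompute the mod-M squares y^{2^k}, then multiply in exactly those selected by the binary digits of x. (In practice one would simply call Python's built-in pow(y, x, M); the explicit loop only mirrors the text's accounting.)

```python
def mod_exp(y: int, x: int, M: int) -> int:
    """Compute y**x (mod M) by repeated squaring, as in the oracle analysis above."""
    n = x.bit_length()                     # number of base-2 digits of x
    squares = [y % M]                      # squares[k] = y^(2^k) mod M
    for _ in range(1, n):
        squares.append((squares[-1] * squares[-1]) % M)

    result = 1
    for k in range(n):
        if (x >> k) & 1:                   # the binary digit x_k acts as an if-statement
            result = (result * squares[k]) % M
    return result

# Spot check against the built-in three-argument pow (illustrative y, x, M):
print(mod_exp(7, 123, 1000), pow(7, 123, 1000))   # both print 343
```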
As we demonstrated when covering Shor's algorithm, the relationship between M and N (N/2 < M² ≤ N) implies that this is equal to O(log⁴ M). The remaining (non-oracle) components are:

• H^{⊗n} : O(log M),

• QFT : O(log² M), and

• classical post-measurement processing (EA/CF) : O(log³ M).

The bottleneck is the oracle, at O(log⁴ M), which we will use as our polynomial time complexity, proving the absolute speed-up of quantum factoring. We acknowledge the existence of faster oracles than the construction provided above, improving the overall algorithm accordingly.
List of Figures
2.1 A vector in R² . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.2 Vector addition in R² . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.3 Scalar multiplication in R² . . . . . . . . . . . . . . . . . . . . . . . . 55
2.4 Orthogonal vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.5 A vector in R³ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.6 A vector expressed as linear combination of x̂ and ŷ . . . . . . . . . . 61
2.7 A vector expressed as linear combination of b0 and b1 . . . . . . . . . 64
2.8 A vector expressed as linear combination of c0 and c1 . . . . . . . . . 65
3.1 Dot-product of the first row and first column yields element 1-1 . . . 75
3.2 Dot-product of the second row and second column yields element 2-2 75
3.3 Minor of a matrix element . . . . . . . . . . . . . . . . . . . . . . . . 80
3.4 The numerator of Cramer’s fraction . . . . . . . . . . . . . . . . . . . 85
4.1 The Cauchy sequence {1 − 1/k}_{k=2}^∞ has its limit in [0, 1] . . . . . 98
4.2 The Cauchy sequence {1 − 1/k}_{k=2}^∞ does not have its limit in (0, 1) 98
4.3 Triangle inequality in a metric space.svg from Wikipedia . . . . . . . 100
4.4 A “3-D” quantum state is a ray in its underlying H = C³ . . . . . . 101
4.5 Dividing a vector by its norm yields a unit vector on the same ray . 102
19.2 The function y = tan x blows-up at isolated points but is still periodic
(with period π) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544
19.3 Graph of a function defined only for x ∈ [−1, 3) . . . . . . . . . . . . 544
19.4 Graph of a function defined everywhere, but whose support is [−1, 3],
the closure of [−1, 0) ∪ (0, 3) . . . . . . . . . . . . . . . . . . . . . . 545
19.5 A periodic function that can be expressed as a Fourier series . . . . . 546
19.6 A function with bounded domain that can be expressed as a Fourier
series (support width = 2π) . . . . . . . . . . . . . . . . . . . . . . . 546
19.7 A low frequency (n = 1 : sin x) and high frequency (n = 20 : sin 20x)
basis function in the Fourier series . . . . . . . . . . . . . . . . . . . . 548
19.8 f (x) = x, defined only on bounded domain [−π, π) . . . . . . . . . . 549
19.9 f (x) = x as a periodic function with fundamental interval [−π, π): . 549
19.10 First 25 Fourier coefficients of f(x) = x . . . . . . . . . . . . . . . . . 550
19.11 Graph of the Fourier coefficients of f(x) = x . . . . . . . . . . . . . . 550
19.12 Fourier partial sum of f(x) = x to n = 3 . . . . . . . . . . . . . . . . 552
19.13 Fourier partial sum of f(x) = x to n = 50 . . . . . . . . . . . . . . . 552
19.14 Fourier partial sum of f(x) = x to n = 1000 . . . . . . . . . . . . . . 553
19.15 f(x) has bounded domain, but its Fourier expansion is periodic. . . . 553
19.16 f = 10 produces ten copies of the period in [−.5, .5) . . . . . . . . . . 558
19.17 f = .1 only reveals one tenth of the period in [−.5, .5) . . . . . . . . . 558
20.14 sin(3x) and its spectrum . . . . . . . . . . . . . . . . . . . . . . . . . 576
20.15 A normalized Gaussian with σ² = 3 and its Fourier transform . . . . 578
20.16 A more localized Gaussian with σ² = 1/7 and its Fourier transform . 578
23.1 The spectrum of a vector with period 8 and frequency 16 = 128/8 . . 640
23.2 sin(3x) and its spectrum . . . . . . . . . . . . . . . . . . . . . . . . . 640
23.3 The spectrum of a purely periodic vector with period 8 and frequency
16 = 128/8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641
23.4 Graph of two periods of a periodic injective function . . . . . . . . . . 642
23.5 Example of a periodic function that is not periodic injective . . . . . 644
23.6 We add the weak assumption that 2(+) a-intervals fit into [0, M ) . . 645
23.7 Typical application provides many a-intervals in [0, M ) . . . . . . . . 646
23.8 Our proof will also work for only one a interval in [0, M ) . . . . . . . 646
23.9 N = 2^n chosen so (N/2, N] bracket M² . . . . . . . . . . . . . . . . 647
23.10 Eight highly probable measurement results, cm, for N = 128 and a = 8 650
23.11 Easy case covers a | N, exactly . . . . . . . . . . . . . . . . . . . . . 655
23.12 [0, N) is the union of distinct cosets of size a . . . . . . . . . . . . . . 655
23.13 The spectrum of a purely periodic vector with period 8 and frequency
      16 = 128/8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658
23.14 Eight probabilities, .125, of measuring a multiple of m = 16 . . . . . . 664
23.15 There is (possibly) a remainder for N/a, called the “excess” . . . . . 670
23.16 [0, N) is the union of distinct cosets of size a, except for the last . . . 670
23.17 The final coset may have size < a . . . . . . . . . . . . . . . . . . . . 671
23.18 If 0 ≤ x < N − ma, a full m + 1 numbers in ZN map to f(x) . . . . . 672
23.19 If N − ma ≤ x < a, only m numbers in ZN map to f(x) . . . . . . . 672
23.20 The spectrum of a purely periodic vector with period 10 and frequency
      12.8 = 128/10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674
23.21 A very long line consisting of a copies of N = 2^n . . . . . . . . . . . 678
23.22 Half-open intervals of width a around each point cN . . . . . . . . . 679
23.23 Exactly one integral multiple of a falls in each interval . . . . . . . . 679
23.24 Probabilities of measuring y4 = 51, y5 = 64 or y6 = 77 are dominant. . 682
23.25 ŷc all fall in the interval [−a/2, a/2) . . . . . . . . . . . . . . . . . . . 683
23.26 The chord is shorter than the arc length . . . . . . . . . . . . . . . . 685
23.27 |sin(x/2)| lies above |x/π| in the interval ( −π, π ) . . . . . . . . . . . 686
23.28 |sin(x/2)| lies above |Kx/π| in the interval ( −1.5π, 1.5π ) . . . . . . 692
23.29 Exactly one integral multiple of a falls in each interval . . . . . . . . 695
23.30 N = 2^n chosen so (N/2, N] bracket M² . . . . . . . . . . . . . . . . 696
23.31 |sin(x/2)| lies above |Lx/π| in the interval ( −.4714π, .4714π ) . . . . 701
List of Tables