In my previous post, we saw that the states you can measure form a basis for the vectors which represent states you can prepare. A prepared state may contain multiple basis elements, so if you measure such a state, you have some probability of measuring any value whose basis element makes a nonzero contribution to the state. If we represent a state as a column vector like \(\begin{bmatrix} \frac35 \\ 0 \\ \frac45 \end{bmatrix}\), we would expect some probability of measuring the value associated with the top number in the column (0.36, as it turns out), no probability of measuring the value associated with the middle number, and a larger probability (0.64) of measuring the value associated with the bottom number.
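This bookkeeping is easy to check numerically. Here is a minimal sketch (using numpy, which the post itself does not use): the probability of each outcome is just the square of the corresponding entry of the normalized column vector.

```python
import numpy as np

# The prepared state from the text, already normalized: (3/5)^2 + 0 + (4/5)^2 = 1.
psi = np.array([3/5, 0.0, 4/5])

# Probability of each measurable value: the square of each component.
probs = psi**2
# probs is close to [0.36, 0.0, 0.64]
```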

We can define a new basis by measuring something else. If you measure one thing and then measure it again, you tend to get the same answer both times. Once you measure a state, you have projected it onto the single basis element that corresponds to whatever value you measured. As an example with numbers, if you start out with the (normalized) state \(\ket\psi\doteq\frac{\sqrt{2}}{10}\begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix}\) and measure an object with curviness 2 wiggle (you expect this to happen half the time), then you may now consider that object to be in the state \(\ket{\text{2 wiggle}}\doteq\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\), and you expect another measurement to give you 2 wiggle again. But why restrict yourself to one basis?
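Measurement collapse can be simulated directly. This sketch (numpy assumed; the sampling scheme is my own illustration, not from the post) draws an outcome with the squared-component probabilities and then replaces the state with the matching basis element:

```python
import numpy as np

rng = np.random.default_rng(0)

# Normalized state from the text: (sqrt(2)/10) * [3, 4, 5].
psi = (np.sqrt(2) / 10) * np.array([3.0, 4.0, 5.0])

probs = psi**2                 # about [0.18, 0.32, 0.5] for 0, 1, 2 wiggle
outcome = rng.choice(3, p=probs)

# Collapse: the post-measurement state is the basis element we measured.
psi = np.zeros(3)
psi[outcome] = 1.0

# A second curviness measurement now returns `outcome` with certainty.
assert (psi**2)[outcome] == 1.0
```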

Suppose there is some second property “bounciness” of my object above which takes the three values “0 flop”, “1 flop”, and “2 flop”. Suppose that an object measured to be in a “0 flop” state is always subsequently measured to be in a “0 wiggle” state and vice versa, but if you measure a “1 flop” state, you are then equally likely to measure 1 wiggle or 2 wiggle afterward. If we keep using the curviness basis above, this is compatible with \(\ket{\text{1 flop}}\doteq\frac{\sqrt{2}}{2}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}\). But bounciness should also be a good orthogonal basis! Because the “0 flop” and “0 wiggle” states seem interchangeable, we suppose \(\ket{\text{0 flop}}\doteq\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\). We are then forced to expect that the “2 flop” state will be represented, up to a constant, as \(\ket{\text{2 flop}}\doteq\frac{\sqrt{2}}{2}\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}\). Why? We need to create a state which does not “contain” any of the “0 flop” or “1 flop” states. Recall that we figure out how much of one state is in another state with the projection operator, which is represented by “braket” notation. If \(\ket\psi\) is represented by the vector \(\psi\) and \(\ket\phi\) is represented by the vector \(\phi\), then the projection of \(\ket\psi\) onto \(\ket\phi\) is \(\bra\phi\psi\rangle=\phi^\dagger\psi\), where the \(\dagger\) sign means transpose the column vector into a row vector and turn all of the numbers into their complex conjugates (which are just the numbers themselves if the numbers are real). We can confirm that the vector I wrote for \(\ket{\text{2 flop}}\) fulfills the necessary requirement of projecting to zero on every other observable state: \(\bra{\text{2 flop}}\text{0 flop}\rangle=\frac{\sqrt2}2\begin{bmatrix} 0 & 1 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}=0\) and \(\bra{\text{2 flop}}\text{1 flop}\rangle=\frac12\begin{bmatrix} 0 & 1 & -1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}=0\).
From this representation, we expect a “2 flop” state to have even odds of measuring 1 wiggle or 2 wiggle, just like the “1 flop” state.
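We can also let the computer confirm that the three flop states form an orthonormal basis. A quick sketch (numpy assumed): stacking the three column vectors into a matrix \(B\), the product \(B^\dagger B\) should be the identity.

```python
import numpy as np

s2 = np.sqrt(2) / 2
flop0 = np.array([1.0, 0.0, 0.0])
flop1 = s2 * np.array([0.0, 1.0, 1.0])
flop2 = s2 * np.array([0.0, 1.0, -1.0])

# Columns of B are the flop basis vectors; B^dagger B = I means every
# distinct pair projects to zero and each state projects to one on itself.
B = np.column_stack([flop0, flop1, flop2])
assert np.allclose(B.T @ B, np.eye(3))
```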

A corollary of this is that the order of measurement matters. Measuring a thing shoves it into a basis state, but two different measurements might not use the same basis. If you measure property A followed by property B, you get a state which will always be measured to have a set value of B for future measurements of B. If you then measure A, you will have some probability of getting any of several values of A, assuming A and B do not share a basis. However, if you measure property B before you measure property A, future measurements of A will always return one set value, rather than the spread of values you would see had you measured A followed by B. This is a trivial example, but quantum mechanics textbooks make a big deal out of it, because prior to the development of quantum mechanics, it was taken for granted that you could define a state such that you knew everything it was possible to know about the state. Statistical mechanics allowed macrostates such as “bottle of hydrogen at 3 atmospheres and 296 kelvin” which were compatible with many microstates, but in principle you could pin down one of the microstates exactly.
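To make this concrete with the wiggle and flop bases from earlier: the sketch below (numpy assumed; the `measure` helper is my own illustration) prepares a “1 wiggle” state, measures bounciness first, and shows that a later curviness measurement is no longer certain.

```python
import numpy as np

s2 = np.sqrt(2) / 2
wiggle = np.eye(3)  # columns: |0 wiggle>, |1 wiggle>, |2 wiggle>
flop = np.column_stack(
    [[1.0, 0.0, 0.0], s2 * np.array([0.0, 1.0, 1.0]), s2 * np.array([0.0, 1.0, -1.0])]
)

def measure(psi, basis, rng):
    """Sample an outcome with squared-projection weights, then collapse."""
    probs = (basis.T @ psi) ** 2
    probs /= probs.sum()
    k = rng.choice(len(probs), p=probs)
    return k, basis[:, k]

rng = np.random.default_rng(1)
outcomes = set()
for _ in range(200):
    psi = wiggle[:, 1]                # prepared as |1 wiggle>
    _, psi = measure(psi, flop, rng)  # measuring bounciness first...
    k, _ = measure(psi, wiggle, rng)  # ...spreads out the curviness result
    outcomes.add(k)

# Without the intervening flop measurement, every trial would return
# 1 wiggle; with it, both 1 wiggle and 2 wiggle show up.
```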

It so happens that the universe does not actually work that way. The canonical example is that you cannot know both a particle’s momentum and position along a particular direction to arbitrary accuracy. If you shoot a bunch of electrons down a straight narrow pipe toward a hole in front of a screen, the ones which make it to the end of the pipe will have very little momentum in the sideways direction, because they need to have a momentum consistent with moving straight down the pipe. If the hole is small, you “measure” the sideways location of the electrons which make it through the hole very precisely, because they have to go through the hole. For some electrons which make it to the screen, there will be no straight line you can draw from the electron source through the hole to the screen. An electron which goes through the hole has a sideways location tightly constrained by the width of the hole, so its sideways momentum can no longer be held near zero the way it had to be for the electron to make it down the pipe, and there is a significant probability of finding that it has veered off at some angle when you measure it on the other side of the hole. In the language of linear algebra, a position basis element is made up of many momentum basis elements and vice versa.

In the previous post, we also saw how square matrices represent operators which change states into other states. One important type of operator in quantum mechanics is the observable. An observable operator leaves states which represent a single measurable value unchanged except for scaling, but will change states that do not represent a single measurable value. When applied to a state which represents a single observation, it returns the original state multiplied by whatever value you would have observed had you measured that state. In the language of linear algebra, an observable operator is the operator whose eigenvectors represent states which are always measured to have the same value and whose eigenvalues are those values. In the basis we have above, we can represent the curviness operator as \(\hat c\doteq\begin{bmatrix} \text{0 wiggle} & 0 & 0 \\ 0 & \text{1 wiggle} & 0 \\ 0 & 0 & \text{2 wiggle} \end{bmatrix}\). Notice that because we’re in the wiggle basis, the matrix is diagonal. We clearly see that if we multiply this matrix by \(\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\), \(\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\), and \(\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\), then we get \(\begin{bmatrix} \text{0 wiggle} \\ 0 \\ 0 \end{bmatrix}\), \(\begin{bmatrix} 0 \\ \text{1 wiggle} \\ 0 \end{bmatrix}\), and \(\begin{bmatrix} 0 \\ 0 \\ \text{2 wiggle} \end{bmatrix}\) respectively.
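In code, the curviness observable is just a diagonal matrix with the measured values as eigenvalues. A sketch (numpy assumed; the units of wiggle are left as a comment):

```python
import numpy as np

# Curviness observable in the wiggle basis; entries carry units of wiggle.
c_hat = np.diag([0.0, 1.0, 2.0])

# Applying the observable to each basis state returns that state
# multiplied by its measured value: c_hat |k wiggle> = k |k wiggle>.
for k in range(3):
    e_k = np.eye(3)[:, k]
    assert np.allclose(c_hat @ e_k, k * e_k)
```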

We get more interesting operators when we attempt to build observables for things outside of the basis we’re working with. The bounciness matrix looks like \(\hat b\doteq\text{(flop)}*\begin{bmatrix} 0 & 0 & 0 \\ 0 & \frac32 & -\frac12 \\ 0 & -\frac12 & \frac32 \end{bmatrix}\). Note that I have factored out the units and put them in front of the matrix. We can confirm that the eigenvalues of this matrix give us the correct measurements from above.
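Since \(\hat b\) is real and symmetric (hermitian), numpy can diagonalize it for us. This quick sketch (not in the original post) recovers the three bounciness values; the eigenvalue equations below do the same check by hand.

```python
import numpy as np

# Bounciness observable with the units of flop factored out.
b_hat = np.array([[0.0,  0.0,  0.0],
                  [0.0,  1.5, -0.5],
                  [0.0, -0.5,  1.5]])

vals, vecs = np.linalg.eigh(b_hat)  # eigh is for hermitian matrices
# vals is approximately [0, 1, 2] -- the measurable bounciness values
```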

\[\hat b \ket{\text{0 flop}}=\text{(flop)}*\begin{bmatrix} 0 & 0 & 0 \\ 0 & \frac32 & -\frac12 \\ 0 & -\frac12 & \frac32 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}=0\doteq\text{0 flop}\ket{\text{0 flop}}\]

\[\hat b \ket{\text{1 flop}}=\frac{\sqrt2}{2}\text{(flop)}*\begin{bmatrix} 0 & 0 & 0 \\ 0 & \frac32 & -\frac12 \\ 0 & -\frac12 & \frac32 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}=\text{(1 flop)}\frac{\sqrt2}2\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}\doteq\text{1 flop}\ket{\text{1 flop}}\]

\[\hat b \ket{\text{2 flop}}=\frac{\sqrt2}{2}\text{(flop)}*\begin{bmatrix} 0 & 0 & 0 \\ 0 & \frac32 & -\frac12 \\ 0 & -\frac12 & \frac32 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}=\text{(2 flop)}\frac{\sqrt2}2\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}\doteq\text{2 flop}\ket{\text{2 flop}}\]

I’m frustrated with the explanations of quantum mechanics which I received prior to graduate school. I don’t think there was any reason I couldn’t have understood quantum mechanics in high school, although I might have understood it better after taking undergraduate linear algebra. This post is an attempt to release some of my frustration. This post will make more sense if you have some theoretical knowledge of linear algebra, but all you really need to understand the math is to know how to do matrix multiplication.

The mechanics of quantum mechanics are pretty boring: they are just linear algebra. We only ever get one value when we measure some quantity, so we use the possible results of measuring a system as a basis for a vector space in which vectors are states that the system could be in, and we’re guaranteed that the basis is orthogonal (roughly, that none of our basis vectors contain any of the other basis vectors). In order to predict what we will measure given a non-basis vector corresponding to a state, we project the given state onto the basis element which corresponds to a possible measured value and square the magnitude of the projection (which is just a number) to get a weight that tells us the probability of measuring the value.

Let’s turn this into math. Suppose objects can have a property “curviness” which takes one of the three numerical values with units of “0 wiggle”, “1 wiggle”, or “2 wiggle”. It is a property of vector spaces that any vector space over a given field with the same dimension is isomorphic to any other, so I might as well make my life easier by representing the vector space of reality using the more typical vector space of column matrices with 3 elements to match the number of possible measurements in our system. Let us define a basis such that the states “measure 0 wiggles”, “measure 1 wiggle”, and “measure 2 wiggles” are represented by the vectors \(\ket{0 \text{ wiggle} } \doteq \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\), \(\ket{\text{1 wiggle}} \doteq \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\), and \(\ket{\text{2 wiggle}}\doteq\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\) respectively. In this notation, \(\ket{ \text{[state]} }\) is called a ket, and it represents a vector corresponding to a measured state. I use the \(\doteq\) symbol to emphasize that the column matrix of numbers is a non-unique representation of that state which is useful for calculation but not “equal” to the vector.
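As a sketch of this bookkeeping in code (numpy assumed; the variable names are mine), the three kets become the standard basis columns:

```python
import numpy as np

# Column-matrix representations of the three curviness kets.
ket_0w = np.array([1.0, 0.0, 0.0])  # |0 wiggle>
ket_1w = np.array([0.0, 1.0, 0.0])  # |1 wiggle>
ket_2w = np.array([0.0, 0.0, 1.0])  # |2 wiggle>

# Orthogonality: no basis state contains any of the others.
assert ket_0w @ ket_1w == 0.0
assert ket_0w @ ket_2w == 0.0
assert ket_1w @ ket_2w == 0.0
```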

You can find the relative odds of measuring each of our basis states by inspection. Simply pick out the number corresponding to each state, square it, and compare the squares. For example, given a state \(\ket\psi\doteq\begin{bmatrix} 3 \\ -4 \\ 5 \end{bmatrix}\), we have odds of \(3^2:(-4)^2:5^2=9:16:25\) of measuring [0 wiggle] to [1 wiggle] to [2 wiggle]. Equivalently, given the state represented by \(\ket{\psi}\), the probability of measuring 0 wiggles is 0.18, the probability of measuring 1 wiggle is 0.32, and the probability of measuring 2 wiggles is 0.5. This is a special case of finding the probability of measuring an arbitrary state given an arbitrary other state. We can generalize as follows. Suppose you have a state represented by the column matrix \(V\) and you want to know the probability of measuring a state represented by the column matrix \(P\). The probability is \(\frac{P^\dagger V V^\dagger P}{P^\dagger P V^\dagger V}\), where the \(\dagger\) (dagger) technically represents a Hermitian conjugate, but if you have real numbers it’s just a matrix transpose. Note that this means that the single-column matrices we’ve been using to represent states are “daggered” into single-row matrices, which collapse the column matrices down to single numbers by matrix multiplication. We said that the probability of measuring \(\ket{\text{2 wiggle}}\) given state \(\ket\psi\) was 0.5. We can show that here:

\[\frac{\begin{bmatrix} 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ -4 \\ 5 \end{bmatrix}\begin{bmatrix} 3 & -4 & 5 \end{bmatrix}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}}{\begin{bmatrix} 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\begin{bmatrix} 3 & -4 & 5 \end{bmatrix}\begin{bmatrix} 3 \\ -4 \\ 5 \end{bmatrix}}=\frac{5*5}{1*(9+16+25)}=0.5\]

Multiplying a state by a constant will not change the state. The odds will not change at all if you are given \(\begin{bmatrix} 6 \\ -8 \\ 10 \end{bmatrix}\) instead of \(\begin{bmatrix} 3 \\ -4 \\ 5 \end{bmatrix}\). If you were instead given the state represented by \(\begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix}\), it is not the same state, because there is no single number you can multiply by to get from one to the other. Even though the odds of measuring each thing you can measure are the same, the two states might evolve to different new states under the same time evolution.

We have defined a basis for a vector space, but to do physics, we need to manipulate those states. We generally change states with linear operators, which are represented by square matrices when we represent our vector space of observable states with column matrices. For example, we can define an operator to perform the action “take a system with curviness 1 wiggle and turn it into a system with curviness 0 wiggle, but don’t do anything if the initial system has curviness 0 wiggle or 2 wiggle” and represent it with the matrix \(\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\). You should verify that this matrix actually does what I say it does. Here is the matrix turning a “1 wiggle” state into a “0 wiggle” state, for example: \(\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}=\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\)

We can now apply this operator to states which are not basis states and see what happens. We find that the state \(\begin{bmatrix} 3 \\ -4 \\ 5 \end{bmatrix}\) is taken to the state \(\begin{bmatrix} -1 \\ 0 \\ 5 \end{bmatrix}\) and the state \(\begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix}\) is taken to the state \(\begin{bmatrix} 7 \\ 0 \\ 5 \end{bmatrix}\). We see that these two states lead to different odds of measuring each basis state after being transformed, even though the initial states had the same odds of measuring each state.
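This is quick to verify numerically (a sketch, numpy assumed):

```python
import numpy as np

# The "turn 1 wiggle into 0 wiggle" operator from above.
M = np.array([[1, 1, 0],
              [0, 0, 0],
              [0, 0, 1]])

out_a = M @ np.array([3, -4, 5])  # [-1, 0, 5]
out_b = M @ np.array([3, 4, 5])   # [7, 0, 5]

# Same initial odds (9:16:25), different odds after the operator:
# 1:0:25 versus 49:0:25.
```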

Once you get intuition for a finite basis corresponding to a finite number of measurable states, you can consider an infinite basis. Sometimes it is countably infinite, such as the number of particles of a certain type in a system. Sometimes it has a higher cardinality, such as the basis “position of a point particle”, which can take values corresponding to real numbers over a continuous interval. In either case, it becomes impossible to represent operators or vectors as finite matrices, although physicists will still use the phrase “matrix elements” to describe how an operator sends one basis state to another basis state. The probability equation I wrote above doesn’t make much sense in the case of an infinite basis, but you can still think of probabilities as the squared modulus of the projection of one state onto another state. The way that we write the projection of state \(\ket\psi\) onto state \(\ket\phi\) is with “braket” notation, where the state that you project onto is written as the “bra” \(\bra{\text{[state]}}\), and the total projection is the “braket” \(\bra{\phi}\psi\rangle\). It’s not important for my purposes, but moving from bra space to ket space involves taking all of the complex numbers associated with the ket to their complex conjugates. If this were a textbook, I would spend a page showing that \(\bra{\phi}\psi\rangle\) is the complex conjugate of \(\bra\psi\phi\rangle\), but I will instead just tell you that it is. There’s a lot of annoying calculus you have to do to get meaningful numbers out of states with infinite continuous bases, but I’m hoping that I can use the abstraction of braket notation to avoid it. For now, a braket is a complex number which tells you how much of one state is inside of another; pairs of states whose braket has a smaller magnitude are less similar than pairs whose braket has a larger magnitude.
In order for the math to work out to anything meaningful, physicists put a lot of effort into “normalizing” states so that they give you one when they project onto themselves: \(\bra\psi\psi\rangle=1\). If you don’t do this, you can make the braket number arbitrarily large or small by multiplying by a complex number that doesn’t actually change the state. If you do this, you benefit from an elegant general probability rule: \(\text{Prob}(\psi \text{ given } \phi)=\bra\phi\psi\rangle\bra\psi\phi\rangle=\|\bra\psi\phi\rangle\|^2\).
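Putting the probability rule into code makes the normalization point concrete. A sketch (numpy assumed; `prob` is a name I made up) that divides out both norms, matching the \(\frac{P^\dagger V V^\dagger P}{P^\dagger P V^\dagger V}\) formula used earlier:

```python
import numpy as np

def prob(phi, psi):
    """Probability of measuring state phi given state psi.

    Works even for unnormalized states, because both norms are divided
    out. np.vdot conjugates its first argument, matching the dagger.
    """
    overlap = np.vdot(phi, psi)  # <phi|psi>
    return abs(overlap) ** 2 / abs(np.vdot(phi, phi) * np.vdot(psi, psi))

psi = np.array([3.0, -4.0, 5.0])        # unnormalized is fine
two_wiggle = np.array([0.0, 0.0, 1.0])
# prob(two_wiggle, psi) is 0.5, matching the worked example above
```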
