Vectors vs. Complex Numbers: Intro to Complex Differentiation

An example of a linear transformation, the lens through which we’ll be looking at complex numbers and vectors alike.
Credit: All graphs in this post were created with Desmos

Why don’t we just use vectors instead? This question gets thrown out all the time when discussing the applications of complex numbers, usually because of a mix of unfamiliarity and bad marketing on \( \mathbb{C}'s \) part with the whole “imaginary” thing. Of course, a fanboy of complex numbers like me might launch the opposite question: complex numbers reveal algebraic symmetries that can be applied to solve real equations and real problems, and they come with a built-in language for describing 2D space. So why bother using vectors? To settle the difference, let’s lay out the ground rules for both systems’ geometries and see how a small difference slowly grows until the two systems come to a head in the infinitesimal landscape of differentiation.

Quaternions: A Complex Past

Did you know that vectors first came from complex numbers?

It’s true. We’ve covered in earlier posts how complex numbers were the missing piece of our algebraic number system, and their properties as an extension of \( \mathbb{R} \) were laid out and then used to derive results such as the complex plane and the geometric representation of i as a rotation. But mathematicians, especially William Rowan Hamilton starting in the 1830s, wanted to take this geometry a step further by generalizing the representation to describe 3D space. Paradoxically though, after years of frustrated searching, Hamilton realized in 1843 that the only system that preserved the algebra of \( \mathbb{C} \) required 4 coefficients, 1 real and 3 “imaginary”: in a sense, a 4-dimensional number system.

This system, which he dubbed the quaternions, fulfilled all his 3D needs and more from an analytic perspective (see 3Blue1Brown and Ben Eater’s beautiful interactive walkthrough for more info), but it was pretty dissatisfying: it carried a lot of extra 4D baggage that wasn’t needed for its intended 3D purpose. Even worse, the construction wasn’t scalable: the next extension, the octonions, required 8 dimensions and 7 imaginaries and started shedding even more properties, like associativity. But what was causing this need for an extra dimension?

Vector Algebra

The trouble was with multiplication and division. For quaternions, a useful definition that kept them closed as an algebraic system forced Hamilton to strip them of commutative multiplication, and for octonions even associativity (the freedom to regroup a product without changing the result) had to go for the definition to hold. Let’s look at the way he defined quaternions and the multiplication of his new complex coefficients I, J, & K:

\( \textbf{Q}=a+b\textbf{I}+c\textbf{J}+d\textbf{K} \)

\( \textbf{I}\textbf{J}=\textbf{K}=-\textbf{J}\textbf{I} \)

\( \textbf{J}\textbf{K}=\textbf{I}=-\textbf{K}\textbf{J} \)

\( \textbf{K}\textbf{I}=\textbf{J}=-\textbf{I}\textbf{K} \)

If that lettering and those relationships look familiar, that’s not a coincidence: if you know your vector algebra, they’re the relationships of unit vectors under the “cross product.” The cross product is in fact defined directly from quaternions: treat the two vectors as quaternions with no real part, multiply them, and keep only the imaginary part of the result. Fittingly, this product is only defined in 3D and 7D, just like the quaternions and octonions!
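If you’d like to see that connection concretely, here’s a minimal NumPy sketch (the helper name quat_mul is my own, and the example vectors are arbitrary): the imaginary part of the product of two “pure” quaternions is exactly their cross product, while the real part is minus their dot product.

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of quaternions given as (real, i, j, k) arrays."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,   # real part
        a1*b2 + b1*a2 + c1*d2 - d1*c2,   # I component
        a1*c2 - b1*d2 + c1*a2 + d1*b2,   # J component
        a1*d2 + b1*c2 - c1*b2 + d1*a2,   # K component
    ])

u, v = np.array([1.0, 2.0, 3.0]), np.array([-4.0, 0.5, 2.0])
product = quat_mul(np.concatenate(([0.0], u)), np.concatenate(([0.0], v)))

# Dropping the real component before multiplying recovers the cross product...
assert np.allclose(product[1:], np.cross(u, v))
# ...and the leftover real part is the negative dot product.
assert np.allclose(product[0], -np.dot(u, v))
```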

This is exactly where vectors came from! To avoid that multiplicative burden, their pioneers dropped the whole operation and in the process carved out a new algebraic structure we call a “vector space,” which broadly classifies any system satisfying the fundamental properties of addition and scalar multiplication but doesn’t require a vector product. This technically still includes the complex numbers (explaining the structural similarities the two share), but it also allows looser systems like vectors to thrive.

\( \mathbb{V} + \mathbb{V} \to \mathbb{V}, \quad \mathbb{R} \times \mathbb{V} \to \mathbb{V} \)

\( V(a+b)=V(a)+V(b): \ (a_1+b_1,\ a_2+b_2)=(a_1, a_2)+(b_1, b_2) \)

\( V(ka)=kV(a): (ka_1,ka_2)=k(a_1,a_2) \)
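As a quick sanity check, here’s what those two rules look like componentwise in NumPy (the vectors and the scalar are arbitrary choices of mine):

```python
import numpy as np

a, b = np.array([3.0, -1.0]), np.array([0.5, 4.0])
k = 2.5

# Addition happens component by component and returns another 2D vector...
assert np.allclose(a + b, [a[0] + b[0], a[1] + b[1]])
# ...and so does scaling by a real number.
assert np.allclose(k * a, [k * a[0], k * a[1]])
```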

Unlike more intricate systems like quaternions, it’s easy to see how vectors represent these higher dimensions: each component is in a different direction and completely independent of the others, no different from a coordinate.

What’s more, this description is independent of a particular set of axes, or basis. Operations like addition and the scalar dot product are unchanged by the choice of axes, whereas the complex numbers, described as a vector space, are tied to the particular basis \( \{1, i\} \). These two details make vectors ideal for describing mappings in higher dimensions.

The Link To Linear Algebra

With even fewer rules than the already vast complex numbers, the possible transformations of vectors are staggering. But if we still require the outputs of such mappings to follow the rules of a vector space (vector addition and scalar multiplication), we bring the degrees of freedom down to a manageable level. The transformations that satisfy this preservation rule are called “linear transformations,” and we can define them below.

\( M(I+J)=M(I)+M(J), M(cV)=cM(V) \)

Hopefully, what we wrote earlier about the “function” V makes this definition and the conclusion that the results form a vector space reasonable.
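For the computationally inclined, here’s a small sketch of that definition in NumPy: a randomized spot-check (not a proof) that a matrix map obeys both rules while an ordinary nonlinear function does not. The helper name is_linear is my own.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_linear(mapping, trials=100):
    """Spot-check M(u + v) = M(u) + M(v) and M(c*v) = c*M(v) on random inputs."""
    for _ in range(trials):
        u, v = rng.normal(size=2), rng.normal(size=2)
        c = rng.normal()
        additive = np.allclose(mapping(u + v), mapping(u) + mapping(v))
        homogeneous = np.allclose(mapping(c * v), c * mapping(v))
        if not (additive and homogeneous):
            return False
    return True

M = np.array([[1.0, 2.0], [2.0, 0.0]])
print(is_linear(lambda v: M @ v))        # True: matrix maps preserve both rules
print(is_linear(lambda v: v**2 + 1.0))   # False: squaring breaks them
```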

If you’re familiar with linear algebra already then you can probably skip the rest of this section. The key takeaway is this: so far our goal of representing geometry in higher dimensions has directly, almost inevitably led us to this key definition, and now we’ll be uncovering some of its consequences.

Exploring Linear Transformations

Geometrically, our algebraic definition means that the image of any vector that can be expressed as a combination of I and J can be constructed if we know where those two basis vectors are mapped: if a vector could be expressed as a certain sum of I’s and J’s before the mapping, then its image can be expressed as the exact same sum of the new I and J.

\( M(aI+bJ)=aM(I)+bM(J) \): the same vector sum of “a” I vectors and “b” J vectors.

An awesome consequence of this basis-vector decomposition is what it does to parallel lines. Lines that point in the same direction and differ only by a translation map to lines that point in the same (new) direction and differ by the image of that translation. Extending these vectors arbitrarily with our second rule of constant multiplication, we find that a set of evenly spaced parallel lines (each separated by the same translation vector) is always mapped to another set of evenly spaced parallel lines. This property, combined with the fact that the origin is always fixed (can you see why, using our decomposition method?), lets us view the transformation as a uniform re-gridding of the plane, and it’s what gives linear transformations their distinctive “flavor”:

Red = new coordinates of I
Blue = new coordinates of J
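Here’s a small numerical illustration of that parallel-line claim, assuming NumPy and using the same example matrix that appears in the next section (I → (1,2), J → (2,0)); the line direction and spacing vector are arbitrary choices of mine.

```python
import numpy as np

M = np.array([[1.0, 2.0], [2.0, 0.0]])   # columns: images of I and J
direction = np.array([1.0, 1.0])          # shared direction of the parallel lines
offset = np.array([0.0, 1.0])             # translation between neighboring lines

t = np.linspace(-2, 2, 5)
line0 = np.outer(t, direction)                  # points on the base line
line3 = np.outer(t, direction) + 3 * offset     # the line three translations away

# After the mapping, corresponding points still differ by the same (mapped)
# translation, so the image lines stay parallel and evenly spaced.
assert np.allclose(line3 @ M.T - line0 @ M.T, 3 * (M @ offset))
```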

Mappings with Matrices

Since we can describe any linear transformation with just two vectors, mathematicians developed “matrices” as the primary syntax. Don’t let the jumble of numbers fool you: this box really just tells us where each basis vector goes, and we can read out their end coordinates straight down the columns. In the example shown above, our pink basis vector “I” goes to (1,2) and our blue basis vector “J” goes to (2,0), so the matrix describing this new plane is below:

\begin{bmatrix} 1 & 2\\ 2 & 0 \end{bmatrix}

Computing the effect of this mapping is really just working out how we’d describe the position of the new vector in our coordinate system, and it’s just as simple: using our rule for scaling basis vectors after the mapping, we can substitute in the columns of the matrix as follows:

\( M(aI+bJ)=aM(I)+bM(J) \)

\begin{bmatrix} 1 & 2\\ 2 & 0 \end{bmatrix} \begin{bmatrix} a\\b \end{bmatrix} =a\begin{bmatrix} 1\\2 \end{bmatrix}+b\begin{bmatrix} 2\\0 \end{bmatrix}=\begin{bmatrix} a+2b\\2a+0b \end{bmatrix}

Hold on, isn’t this a form of vector-vector multiplication? Not exactly. A matrix records where each basis vector goes, but it isn’t a vector living in the space it acts on. Instead, think of it as an operator, like an integral, that takes a vector input and yields a vector output after stretching each component by a different amount.
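Here’s that column-substitution rule as a short NumPy check (the numbers a = 3, b = −1 are arbitrary): recombining the columns by hand gives the same answer as the built-in matrix-vector product.

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [2.0, 0.0]])               # columns: where I and J land
a, b = 3.0, -1.0

by_columns = a * M[:, 0] + b * M[:, 1]   # a*M(I) + b*M(J), done by hand
by_product = M @ np.array([a, b])        # the usual matrix-vector product

assert np.allclose(by_columns, by_product)
print(by_product)                        # [a + 2b, 2a + 0b] = [1., 6.]
```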

Describing Rotations (Definitely not foreshadowing):

We’ve been talking about stretching and squishing straight lines, but remember that this combination of horizontal and vertical scaling can also cause rotations. So let’s finish up our crash course on linear transformations by working out how we would describe simple rotations with vectors; this means no squishing or stretching of space, just moving both basis vectors along a circle.

We know our standard basis vectors \( (1,0), (0,1) \) are perpendicular and equal in length, so the resulting grid is our familiar square one. To make sure our mapping preserves these squares, we need to make sure that their images also meet at right angles and have equal lengths. By solving for when these two conditions hold (give it a try: the most direct method is to write the equations for equal magnitude and 2D perpendicularity for the image vectors \( (a,b) \) and \( (c,d) \), then solve the system), we find the following solution:

\( I \to (a,b), J\to (-b,a) \).

M_R=\begin{bmatrix} a & -b\\ b & a \end{bmatrix}

Since a and b can be as large as we want right now, this will also cause an expansion of the plane, so to keep magnitudes fixed we can also require that \( \sqrt{a^2+b^2}=1 \). And we’re done! With the basics of vector geometry added to our toolkit, let’s finally move on to their calculus.
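As a quick sketch of why that constraint does the job, here the rotation matrix is built from an arbitrary angle (so that \( a^2+b^2=1 \) automatically) and checked against the two conditions we solved for, using NumPy:

```python
import numpy as np

theta = 0.7                        # any angle; a = cos(theta), b = sin(theta)
a, b = np.cos(theta), np.sin(theta)
R = np.array([[a, -b],
              [b,  a]])            # a^2 + b^2 = 1, so no expansion of the plane

I, J = np.array([1.0, 0.0]), np.array([0.0, 1.0])
new_I, new_J = R @ I, R @ J

assert np.isclose(np.linalg.norm(new_I), 1.0)   # lengths are preserved
assert np.isclose(np.linalg.norm(new_J), 1.0)
assert np.isclose(new_I @ new_J, 0.0)           # the images stay perpendicular
```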

Defining Differentiation

What does the derivative of a vector-valued function—having a separate function for each coordinate— mean?

f(x,y)=(\frac{1}{xy}, \: \ln(x^2+y^2))

Not exactly the prettiest, and, more importantly, it requires a 4D graph to show the entire function of 2 inputs and 2 outputs! Using the tangent line to describe this function’s rate of change kind of loses its visual appeal beyond 3 dimensions, so let’s fall back on our basic definition of the derivative as relating change in one variable to change in another. With single-valued functions, this is where the whole idea of “local linearity” when describing differentiable functions comes from: at infinitesimal ranges, the mapping approaches a linear relationship and we can describe it with a single number.

But we don’t have to stay confined to graphs of functions when talking about the notion of a derivative. Lines are defined by two variables having a proportional relationship: the input is scaled by some factor to match the output. Instead of modeling our single-valued derivative as a slope, a more flexible mental picture could be to describe its effect on a tiny segment of the number line. The image below showcases an example with a derivative of 10: think of the original series of points as lying on some number line of inputs, all of which are pulled out by the stretch factor until they find their new positions on some new line of outputs.

Stretching of the number line corresponding to a derivative of 10.

All this talk of linearity and tangent lines may have been ringing a bell, and hopefully this idea of the derivative as a linear stretch factor is making the comparison even more striking. Could the derivative’s local effect be a linear transformation? Well, what are the first two rules we learn about operations with derivatives?

Sum and Difference Rule: \( \frac{d }{dx}[u(x)+v(x)]=\frac{d }{dx}[u(x)]+\frac{d }{dx}[v(x)] \).

Constant Multiple Rule: \( \frac{d }{dx}[Cf(x)]=C\frac{d }{dx}[f(x)] \)

That’s right: long before entering the complex domain, we’ve been working with the derivative as a linear operator without even knowing it! This is what we rigorously mean when we say the derivative uses “local linearity”: every differentiable function is locally described by a linear transformation that maps one variable to another after a given stretch!
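A quick numerical sketch of that “stretch factor” picture (the function and the point are my own toy choices; the derivative at 5 happens to be 10, the same factor as in the figure above): a tiny interval of inputs gets stretched by roughly \( f'(x) \).

```python
f = lambda x: x**2          # any differentiable function; here f'(5) = 10
x0, h = 5.0, 1e-6

# The tiny interval [x0, x0 + h] of inputs maps to [f(x0), f(x0 + h)];
# the ratio of the new length to the old approaches the derivative.
stretch = (f(x0 + h) - f(x0)) / h
print(stretch)              # ~10.000001: the local stretch factor near x = 5
```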

The Generalized Derivative

From here it’s completely natural to extend this linear-transformation derivative to higher dimensions, starting with a double-output, or 2D function (this is 2D in the sense of mapping a whole plane instead of just a line of points, not in the functional sense of 1 axis for inputs and 1 for outputs).

Two dimensions, 2 scaling factors. This “simple” transformation doesn’t include rotation.

As we mentioned before, our tangent line model falls apart when there are multiple variables changing at any given moment. Luckily, each basis vector only requires tracking change along its own direction (the x-axis for I and the y-axis for J) and can therefore be described by a simpler operation called the partial derivative:

\( \partial_xf=\frac{\partial}{\partial x}f=\lim_{h\to 0}\frac{f(x+h,y)-f(x,y)}{h} \)

There’s a lot of interesting algebra to go into here, but right now we can just use the symbol and its meaning to write the corresponding matrix of our infinitesimal linear transformation: given 2 functions u and v, where u yields the horizontal coordinate in our new plane and v yields the vertical, our new basis vectors are basically the degree to which a small shift in the old x and y directions changes u and v:

\( I_{map}=(\partial_xu,\ \partial_xv), \quad J_{map}=(\partial_yu,\ \partial_yv) \)

J(x,y)=\begin{bmatrix} \partial_xu & \partial_yu\\ \partial_xv & \partial_yv \end{bmatrix}

This is the 2D derivative: a measure of how much a function scales each coordinate of its 2D input. And it’s readily extendable to higher dimensions by adding a partial derivative for each new axis. This general matrix is known as the Jacobian:

J(x_1,\dots,x_n)=\begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \frac{\partial y_n}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_n} \end{bmatrix}
The Jacobian matrix
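To make the Jacobian concrete, here’s a finite-difference approximation of it for the earlier example \( f(x,y)=(\frac{1}{xy}, \ln(x^2+y^2)) \). The helper numerical_jacobian is my own name, not a library function, and central differences are just one reasonable choice.

```python
import numpy as np

def f(p):
    x, y = p
    return np.array([1.0 / (x * y), np.log(x**2 + y**2)])

def numerical_jacobian(func, p, h=1e-6):
    """Approximate each partial derivative with a central difference."""
    p = np.asarray(p, dtype=float)
    cols = []
    for k in range(p.size):
        step = np.zeros_like(p)
        step[k] = h
        cols.append((func(p + step) - func(p - step)) / (2 * h))
    return np.column_stack(cols)   # column k holds the partials with respect to x_k

print(numerical_jacobian(f, [1.0, 2.0]))
# Analytic Jacobian [[-1/(x^2 y), -1/(x y^2)], [2x/(x^2+y^2), 2y/(x^2+y^2)]]
# evaluates at (1, 2) to [[-0.5, -0.25], [0.4, 0.8]], matching the printout.
```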

One thing to remember is that just because the two scaling factors are independent doesn’t mean they always act in perpendicular directions. When a matrix includes a vertical component in its mapping of I or a horizontal component in its mapping of J, the new basis vectors no longer have to remain perpendicular, and the result can include shears and rotations such as the bizarre mapping below.

Whew, that last one is a lot to digest. But things could be much worse; linear transformations’ effect on parallel lines means that they always map parallelograms to parallelograms. And any real differentiable function, in any dimension, can be reduced to the powerful language of vector spaces at an infinitesimal level.

Just to clear up any confusion, this is NOT implying that all differentiable vector-valued functions result in linear transformations. Remember, \( x, e^x, \) and \( \tan{\sqrt{x}} \) all exhibit linear behavior if you look at an arbitrarily small interval, but these linear effects can add up to very non-linear graphs. The same is true for local linearity in higher dimensions.

With complex numbers, though, things aren’t so simple…they’re actually simpler.

A Complex Problem

Ironically, the added restrictions of the “complex” numbers actually make their geometric transformations much simpler and more limited. Besides being a vector space, the complex numbers have a well-defined multiplication and division purely in terms of elements of \( \mathbb{C} \) (\( \mathbb{C} \) refers to the set of all complex numbers). Of course they do: the whole reason they were invented was to solve equations involving operations that weren’t closed over the set of real numbers!

To see the effects of this decision, let’s go back to the geometry we want both systems to describe, zooming in on a general linear transformation and looking at its effect on an origin-centered circle of vectors. Now how would we describe this result with complex multiplication?

The best window into the geometric effects of complex multiplication is Euler’s formula: it splits the process into an expansion by a real scale factor and a rotation by an angle in the complex exponential.

\( r_1e^{i\theta_1}\cdot r_2e^{i\theta_2}=r_1r_2e^{i(\theta_1+\theta_2)} \).

Right away we run into a problem: \( r_1r_2 \) is the only scale factor for this transformation. There’s no way to stretch complex numbers by different amounts in different directions, which makes transformations like those above impossible to represent with complex multiplication.

But why can’t we just define a matrix-like operation for complex numbers? Sure you can. But then it would no longer be complex multiplication: the matrix structure is a new algebraic entity that doesn’t satisfy the field properties the set \( \mathbb{C} \) enjoys, like commutative multiplication and division (yep, just like with quaternions, matrix multiplication isn’t commutative).

This means that our complex transformations through multiplication only really have two parts: a single expansion and a single rotation. If we think back to the matrix we derived for pure rotations and remove the requirement that I and J must have an ending length of 1 we can actually represent our complex multiplication as a class of linear transformations!

re^{i\theta}=a+bi\ \to\ \begin{bmatrix} a & -b\\ b & a \end{bmatrix}, \qquad \text{i.e., a general } \begin{bmatrix} a & b\\ c & d \end{bmatrix} \text{ only qualifies when } a=d,\ b=-c
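A minimal NumPy check of that correspondence (the particular numbers w and z are arbitrary): multiplying by a complex number w = a + bi moves a point exactly the way the matrix above does.

```python
import numpy as np

w = 2.0 + 1.5j                       # the complex number we multiply by
a, b = w.real, w.imag
M_w = np.array([[a, -b],
                [b,  a]])            # the only matrix shape complex multiplication allows

z = -0.5 + 3.0j
as_vector = M_w @ np.array([z.real, z.imag])   # treat z as the point (x, y)
as_complex = w * z                             # ordinary complex multiplication

assert np.allclose(as_vector, [as_complex.real, as_complex.imag])
print(as_vector)                     # [-5.5, 5.25], the coordinates of w*z
```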

Complex Calculus

Knowing this fundamental limitation of complex numbers, let’s look at how it applies to complex differentiation. Differentiability is defined by its linearity just like with vectors and real numbers, but now it can only be represented by a single complex number and its limited group of linear transformations.

We can translate this to our Jacobian derivative to understand this problem algebraically. What if I were to give you the following matrix to represent the complex derivative of some function? Remember, complex differentiation still involves local linear transformations, so borrowing this tool from our vectors isn’t breaking any rules:

J(x,y)=\begin{bmatrix} 2x & 3x^2\\ x^4 & -x \end{bmatrix}

Well, that looks awful to deal with. But hold on… it’s not just awful, it’s impossible! Glancing back at the matrix we used earlier to represent the most general linear transformation that complex multiplication can produce, these entries just don’t fit the relationships that matrix requires.

\( 2x=a, 3x^2=b, x^4=c, -x=d \)

\( 2x\ne -x, \quad 3x^2\ne -x^4 \)

The Cauchy-Riemann Equations

Since these relationships don’t always hold for this matrix, it can’t represent a complex derivative (no complaints there!). But using these rules, we can work out a set of equations for the partial derivatives that always will:

J(x,y)=\begin{bmatrix} \partial_xu & \partial_yu\\ \partial_xv & \partial_yv \end{bmatrix}

\( \partial_xu=a, \quad \partial_yu=b, \quad \partial_xv=c, \quad \partial_yv=d \)

\( \partial_xu=\partial_yv, \qquad \partial_xv=-\partial_yu \)

This pair of equations is so fundamental to complex analysis that it was named after two great mathematicians (both of whom were pioneers with complex numbers): the Cauchy-Riemann equations give us an amazingly simple test for whether a function is complex-differentiable, which mathematicians call analytic. Just compute the partial derivatives of u & v, which in complex terms are the real (horizontal) and imaginary (vertical) parts of the complex function, and check that they satisfy the two equations above.
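Here’s a small finite-difference sketch of that test (the helper name cauchy_riemann_holds and the sample point are my own choices): \( f(z)=z^2 \) passes at a generic point, while \( f(z)=\bar{z} \) fails.

```python
def cauchy_riemann_holds(f, x, y, h=1e-6, tol=1e-4):
    """Finite-difference check of du/dx = dv/dy and dv/dx = -du/dy at (x, y)."""
    u = lambda x, y: f(complex(x, y)).real   # real (horizontal) part
    v = lambda x, y: f(complex(x, y)).imag   # imaginary (vertical) part
    du_dx = (u(x + h, y) - u(x - h, y)) / (2 * h)
    du_dy = (u(x, y + h) - u(x, y - h)) / (2 * h)
    dv_dx = (v(x + h, y) - v(x - h, y)) / (2 * h)
    dv_dy = (v(x, y + h) - v(x, y - h)) / (2 * h)
    return abs(du_dx - dv_dy) < tol and abs(dv_dx + du_dy) < tol

print(cauchy_riemann_holds(lambda z: z**2, 1.3, -0.4))          # True: analytic
print(cauchy_riemann_holds(lambda z: z.conjugate(), 1.3, -0.4)) # False: not analytic
```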

Whew. With just the added rules of multiplication, we found that complex numbers and vectors took on entirely different rules and abilities: one an all-purpose tool for transformations in any dimension, the other a rich number system that describes algebraic structures with a simple but powerful geometric flair. To me, that emergent complexity is one of the coolest parts of math. Next time we’ll go into the weeds of analyticity and the endless possibilities this differentiability allows analytic functions to achieve. See you then!