
The Calculus of Several Variables

Robert C. Rogers

September 29, 2011


It is now known to science that there are many more dimensions than the classical four. Scientists say that these don’t normally impinge on the world because the extra dimensions are very small and curve in on themselves, and that since reality is fractal most of it is tucked inside itself. This means either that the universe is more full of wonders than we can hope to understand or, more probably, that scientists make things up as they go along.

Terry Pratchett


Contents

1 Introduction 1

I Precalculus of Several Variables 5

2 Vectors, Points, Norm, and Dot Product 6
3 Angles and Projections 14
4 Matrix Algebra 19
5 Systems of Linear Equations and Gaussian Elimination 27
6 Determinants 38
7 The Cross Product and Triple Product in R3 47
8 Lines and Planes 55
9 Functions, Limits, and Continuity 60
10 Functions from R to Rn 70
11 Functions from Rn to R 76
12 Functions from Rn to Rm 81
  12.1 Vector Fields 81
  12.2 Parameterized Surfaces 84
13 Curvilinear Coordinates 86
  13.1 Polar Coordinates in R2 86
  13.2 Cylindrical Coordinates in R3 88
  13.3 Spherical Coordinates in R3 90

II Differential Calculus of Several Variables 93

14 Introduction to Differential Calculus 94
15 Derivatives of Functions from R to Rn 96
16 Derivatives of Functions from Rn to R 101
  16.1 Partial Derivatives 101
  16.2 Higher Order Partial Derivatives 102
  16.3 The Chain Rule for Partial Derivatives 105
17 Derivatives of Functions from Rn to Rm 113
  17.1 Partial Derivatives 113
  17.2 The Total Derivative Matrix 114
  17.3 The Chain Rule for Mappings 118
18 Gradient, Divergence, and Curl 123
  18.1 The Gradient 123
  18.2 The Divergence 127
  18.3 The Curl 128
19 Differential Operators in Curvilinear Coordinates 133
  19.1 Differential Operators in Polar Coordinates 133
  19.2 Differential Operators in Cylindrical Coordinates 137
  19.3 Differential Operators in Spherical Coordinates 139
20 Differentiation Rules 142
  20.1 Linearity 142
  20.2 Product Rules 142
  20.3 Second Derivative Rules 143
21 Eigenvalues 146
22 Quadratic Approximation and Taylor’s Theorem 157
  22.1 Quadratic Approximation of Real-Valued Functions 157
  22.2 Taylor’s Theorem 161
23 Max-Min Problems 165
  23.1 First Derivative Test 168
  23.2 Second Derivative Test 170
  23.3 Lagrange Multipliers 175
24 Nonlinear Systems of Equations 181
  24.1 The Inverse Function Theorem 181
  24.2 The Implicit Function Theorem 184

III Integral Calculus of Several Variables 190

25 Introduction to Integral Calculus 191
26 Riemann Volume in Rn 195
27 Integrals Over Volumes in Rn 199
  27.1 Basic Definitions and Properties 199
  27.2 Basic Properties of the Integral 201
  27.3 Integrals Over Rectangular Regions 203
  27.4 Integrals Over General Regions in R2 205
  27.5 Change of Order of Integration in R2 208
  27.6 Integrals over Regions in R3 211
28 The Change of Variables Formula 217
29 Hausdorff Dimension and Measure 231
30 Integrals over Curves 235
  30.1 The Length of a Curve 235
  30.2 Integrals of Scalar Fields Along Curves 237
  30.3 Integrals of Vector Fields Along Paths 239
31 Integrals Over Surfaces 244
  31.1 Regular Regions and Boundary Orientation 244
  31.2 Parameterized Regular Surfaces and Normals 245
  31.3 Oriented Surfaces with Corners 250
  31.4 Surface Area 254
  31.5 Scalar Surface Integrals 256
  31.6 Surface Flux Integrals 257
  31.7 Generalized (n−1)-Dimensional Surfaces 260

IV The Fundamental Theorems of Vector Calculus 263

32 Introduction to the Fundamental Theorem of Calculus 264
33 Green’s Theorem in the Plane 266
34 Fundamental Theorem of Gradients 274
35 Stokes’ Theorem 277
36 The Divergence Theorem 282
37 Integration by Parts 288
38 Conservative vector fields 291


Chapter 1

Introduction

This book is about the calculus of functions whose domain or range or both are vector-valued rather than real-valued. Of course, this subject is much too big to be covered completely in a single book. The full scope of the topic contains at least all of ordinary differential equations, partial differential equations, and differential geometry. The physical applications include thermodynamics, fluid mechanics, elasticity, electromagnetism, and cosmology. Since a comprehensive treatment is so ambitious, and since few undergraduates devote more than a semester to direct study of this subject, this book focuses on a much more limited goal. The book will try to develop a series of definitions and results that are parallel to those in an elementary course in the calculus of functions of a single variable.

Consider the following “syllabus” for an elementary calculus course.

1. Precalculus

• The arithmetic and algebra of real numbers.

• The geometry of lines in the plane: slopes, intercepts, intersections, angles, trigonometry.

• The concept of a function whose domain and range are both real numbers and whose graphs are curves in the plane.

• The concepts of limit and continuity.

2. The Derivative

• The definition of the derivative as the limit of the slopes of secant lines of a function.

• The interpretation of the derivative as the slope of the tangent line.

• The characterization of the tangent line as the “best linear approximation” of a differentiable function.

• The development of various differentiation rules for products, composites, and other combinations of functions.


• The calculation of higher order derivatives and their geometric interpretation.

• The application of the derivative to max/min problems.

3. The Integral

• The calculation of the area under a curve as the limit of a Riemann sum of the area of rectangles

• The proof that for a continuous function (and a large class of simple discontinuous functions) the calculation of area is independent of the choice of partitioning strategy.

4. The Fundamental Theorem of Calculus

• The “fundamental theorem of calculus” - demonstration that the derivative and integral are “inverse operations”

• The calculation of integrals using antiderivatives

• Derivation of “integration by substitution” formulas from the fundamental theorem and the chain rule.

• Derivation of “integration by parts” from the fundamental theorem and the product rule.

Now, this might be an unusual way to present calculus to someone learning it for the first time, but it is at least a reasonable way to think of the subject in review. We will use it as a framework for our study of the calculus of several variables. This will help us to see some of the interconnections between what can seem like a huge body of loosely related definitions and theorems.¹

While our structure is parallel to the calculus of functions of a single variable, there are important differences.

1. Precalculus

• The arithmetic and algebra of real numbers is replaced by linear algebra of vectors and matrices.

• The geometry of the plane is replaced by geometry in Rn.

• Graphs in the plane are now graphs in higher dimensions (and may be difficult to visualize).

2. The Derivative

• Differential calculus for functions whose domain is one-dimensional turns out to be very similar to elementary calculus no matter how large the dimension of the range.

¹In fact, the interconnections are even richer than this development indicates. It is important not to get the impression that this is the whole story. It is simply a place to start. Nonetheless, it is a good starting point and will provide a structure firm enough to build on.


• For functions with a higher-dimensional domain, there are many ways to think of “the derivative.”

3. The Integral

• We will consider several types of domains over which we will integrate functions: curves, surfaces, oddly shaped regions in space.

4. The Fundamental Theorem of Calculus

• We will find a whole hierarchy of generalizations of the fundamental theorem.

Our general procedure will be to follow the path of an elementary calculus course and focus on what changes and what stays the same as we change the domain and range of the functions we consider.

Remark 1.1 (On notation). A wise man once said that, “The more important a mathematical subject is, the more versions of notation will be used for that subject.” If the converse of that statement is true, vector calculus must be extremely important. There are many notational schemes for vector calculus.

They are used by different groups of mathematicians and in different application areas. There is no real hope that their use will be standardized in the near future. This text will use a variety of notations and will use different notations in different contexts. I will try to be clear about this, but learning how to read and interpret the different notations should be an important goal for students of this material.

Remark 1.2 (On prerequisites). Readers are assumed to be familiar with the following subjects.

• Basic notions of algebra and very elementary set theory.

• Integral and differential calculus of a single variable.

• Linear algebra including solution of systems of linear equations, matrix manipulation, eigenvalues and eigenvectors, and elementary vector space concepts such as basis and dimension.

• Elementary ordinary differential equations.

• Elementary calculations on real-valued functions of two or three variables such as partial differentiation, integration, and basic graphing.

Of course, a number of these subjects are reviewed extensively, and I am mindful of the fact that one of the most important goals of any course is to help the student to finally understand the material that was covered in the previous course. This study of vector calculus is a great opportunity to gain proficiency and greater insight into the subjects listed above.


Remark 1.3 (On proofs). This text is intended for use by mathematicians and other scientists and engineers. While the primary focus will be on the calculation of various quantities related to the subject, some effort will be made to provide a rigorous background for those calculations, particularly in those cases where the proofs reveal underlying structure. Indeed, many of the calculations in this subject can seem like nothing more than complicated recipes if one doesn’t make an attempt to understand the theory behind them. On the other hand, this subject is full of places where the proofs of general theorems are technical nightmares that reveal little (at least to me), and this type of proof will be avoided.

Remark 1.4 (On reading this book). My intention in writing this book is to provide a fairly terse treatment of the subject that can realistically be read cover-to-cover in the course of a busy semester. I’ve tried to cut down on extraneous detail. However, while most of the exposition is directly aimed at solving the problems directly posed in the text, there are a number of discussions that are intended to give the reader a glimpse into subjects that will open up in later courses and texts. (Presenting a student with interesting ideas that he or she won’t quite understand is another important goal of any course.) Many of these ideas are presented in the problems. I encourage students to read even those problems that they have not been assigned as homework.


Part I

Precalculus of Several Variables


Chapter 2

Vectors, Points, Norm, and Dot Product

In this part of the book we study material analogous to that studied in a typical “precalculus” course. While these courses cover some topics like functions, limits, and continuity that are closely tied to the study of calculus, the most important part of such a course is probably the broader topic of algebra. That is true in this course as well, but with an added complication. Since we will be dealing with multidimensional objects – vectors – we spend a great deal of time discussing linear algebra. We cover only relatively elementary aspects of this subject, and the reader is assumed to be somewhat familiar with them.

Definition 2.1. We define a vector v ∈ Rn to be an n-tuple of real numbers

v = (v1, v2, . . . , vn),

and refer to the numbers vi, i = 1, . . . , n, as the components of the vector.

We define two operations on the set of vectors: scalar multiplication

cv = c(v1, v2, . . . , vn) = (cv1, cv2, . . . , cvn)

for any real number c ∈ R and vector v ∈ Rn, and vector addition

v + w = (v1, v2, . . . , vn) + (w1, w2, . . . , wn) = (v1 + w1, v2 + w2, . . . , vn + wn)

for any pair of vectors v ∈ Rn and w ∈ Rn.


Definition 2.2. If we have a collection of vectors {v1, v2, . . . , vk} and scalars {c1, c2, . . . , ck}, we refer to

c1v1 + c2v2 + · · · + ckvk

as a linear combination of the vectors {v1, v2, . . . , vk}.

Remark 2.3. We typically use boldface, lowercase, Latin letters to represent abstract vectors in Rn. Another fairly common notation represents a vector by a generic component with a “free index,” a subscript (usually i, j, or k) assumed to range over the values from 1 to n. In this scheme, the vector v would be denoted by vi, the vector x by xi, etc.

Remark 2.4. At this point we make no distinction between vectors displayed as columns or rows. In most cases, the choice of visual display is merely a matter of convenience. Of course, when we involve vectors in matrix multiplication the distinction will be important, and we adopt a standard in that context.

Definition 2.5. We say that two vectors are parallel if one is a scalar multiple of the other. That is, x is parallel to y if there exists c ∈ R such that

x = cy.

Remark 2.6. At this juncture, we have given the space of vectors Rn only an algebraic structure. We can add a geometric structure by choosing an origin and a set of n perpendicular Cartesian axes for n-dimensional geometric space.

With these choices made, every point X can be represented uniquely by its Cartesian coordinates (x1, x2, . . . , xn). We then associate with every ordered pair of points X = (x1, x2, . . . , xn) and Y = (y1, y2, . . . , yn) the vector

\overrightarrow{XY} = (y1 − x1, y2 − x2, . . . , yn − xn).

We think of this vector as a directed line segment or arrow pointing from the tail at X to the head at Y. Note that a vector can be moved by “parallel transport” so that its tail is anywhere in space. For example, the vector v = (1, 1) can be represented as a line segment with its tail at X = (3, 4) and head at Y = (4, 5), or with tail at X′ = (−5, 7) and head at Y′ = (−4, 8).

This geometric structure makes vector addition and subtraction quite interesting. Figure 2.1 presents a parallelogram with sides formed by the vectors x and y. The diagonals of the parallelogram represent the sum and difference of these vectors.


Figure 2.1: This parallelogram has sides formed by the vectors x and y. The diagonals of the parallelogram represent the sum and difference of the vectors. The sum can be obtained graphically by placing the tail of y at the head of x (or vice versa). The difference of two vectors is a directed line segment connecting the heads of the vectors. Note that the “graphic” sum of x and y − x is y.

Definition 2.7. We define a set of vectors ei ∈ Rn, 1 ≤ i ≤ n, called the standard basis. These have the form

e1 = (1, 0, 0, . . . , 0),
e2 = (0, 1, 0, . . . , 0),
...
en = (0, 0, 0, . . . , 1).

In component form, these vectors can be written

(ei)j = δij = { 1 if i = j, 0 if i ≠ j }.

Here we have defined δij, which is called the Kronecker delta function.

In the special case of R3 it is common to denote the standard basis by

i = (1, 0, 0),
j = (0, 1, 0),
k = (0, 0, 1).

Remark 2.8. Note that any vector v = (v1, v2, . . . , vn) ∈ Rn can be written as a linear combination of the standard basis vectors:

v = Σ_{i=1}^{n} vi ei.

Definition 2.9. We define the (Euclidean) norm of a vector x ∈ Rn to be

‖x‖ = √( x1² + x2² + · · · + xn² ) = √( Σ_{i=1}^{n} xi² ).

A vector e is called a unit vector if ‖e‖ = 1.

The distance between points X and Y (corresponding to the vectors x and y) is given by

‖\overrightarrow{XY}‖ = ‖y − x‖.

The dot product (or inner product) of two vectors x ∈ Rn and y ∈ Rn is given by

x · y = x1y1 + x2y2 + · · · + xnyn = Σ_{i=1}^{n} xi yi.
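As a quick numerical illustration (a minimal Python/numpy sketch with arbitrarily chosen vectors, not part of the formal development), the norm, dot product, and distance can be computed directly from the formulas above.

import numpy as np

x = np.array([2.0, 5.0, -1.0])
y = np.array([4.0, 0.0, 8.0])

norm_x = np.sqrt(np.sum(x**2))        # Euclidean norm from the definition
print(norm_x, np.linalg.norm(x))      # agrees with numpy's built-in norm

dot_xy = np.sum(x * y)                # dot product from the definition
print(dot_xy, np.dot(x, y))           # agrees with numpy's built-in dot product

print(np.linalg.norm(y - x))          # distance between the points X and Y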

Remark 2.10. Note that for any nonzero vector v we can find a unit vector e parallel to that vector by defining

e = v / ‖v‖.

In doing this we say we have normalized v.

Remark 2.11. The standard basis vectors have an important relation to the dot product:

vi = v · ei.

Thus, for any vector v,

v = Σ_{i=1}^{n} (v · ei) ei.

Let us now note a few important properties of the dot product.


Theorem 2.12. For all x, y, w ∈ Rn and every c ∈ R we have the following.

1. (x + y) · w = x · w + y · w.

2. c(x · y) = (cx) · y = x · (cy).

3. x · y = y · x.

4. x · x ≥ 0.

5. x · x = 0 if and only if x = 0 = (0, 0, . . . , 0).

These are easy to prove directly from the formula for the dot product, and we leave the proof to the reader. (See Problem 2.8.)

Of course, there is an obvious relation between the norm and the dot product:

‖x‖ = √(x · x).    (2.1)

However, we now prove a more subtle and interesting relationship.

Theorem 2.13 (Cauchy-Schwartz inequality). For all x, y ∈ Rn,

|x · y| ≤ ‖x‖‖y‖.

Proof. For any real number z ∈ R we compute

0 ≤ ‖x − zy‖²
  = (x − zy) · (x − zy)
  = x · (x − zy) − z y · (x − zy)
  = x · x − z x · y − z y · x + z² y · y
  = ‖x‖² − 2z(x · y) + z²‖y‖².

We note that the quantity on the final line is a quadratic polynomial in the variable z. (It has the form az² + bz + c.) Since the polynomial is never negative, its discriminant (b² − 4ac) must not be positive (or else there would be two distinct real roots of the polynomial). Thus,

(2x · y)² − 4‖x‖²‖y‖² ≤ 0,

or

(x · y)² ≤ ‖x‖²‖y‖².

Taking the square root of both sides and using the fact that |a| = √(a²) for any real number gives us the Cauchy-Schwartz inequality.
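The inequality is also easy to test numerically. The sketch below (Python/numpy, with randomly generated vectors; purely illustrative and not part of the proof) checks |x · y| ≤ ‖x‖‖y‖ for a batch of random vectors in R5.

import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=5)
    y = rng.normal(size=5)
    # Cauchy-Schwartz: |x.y| <= ||x|| ||y|| (allowing for floating-point rounding)
    assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12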

We now note that the norm has the following important properties


Theorem 2.14. For all x, y ∈ Rn and every c ∈ R we have the following.

1. ‖x‖ ≥ 0.

2. ‖x‖ = 0 if and only if x = 0 = (0, 0, . . . , 0).

3. ‖cx‖ = |c|‖x‖.

4. ‖x + y‖ ≤ ‖x‖ + ‖y‖ (the triangle inequality).

Proof. One can prove the first three properties directly from the formula for the norm. These are left to the reader in Problem 2.9. To prove the triangle inequality we use the Cauchy-Schwartz inequality and note that

‖x + y‖² = (x + y) · (x + y)
         = x · (x + y) + y · (x + y)
         = x · x + x · y + y · x + y · y
         = ‖x‖² + 2 x · y + ‖y‖²
         ≤ ‖x‖² + 2|x · y| + ‖y‖²
         ≤ ‖x‖² + 2‖x‖‖y‖ + ‖y‖²
         = (‖x‖ + ‖y‖)².

Taking the square root of both sides gives us the result.

Remark 2.15. While some of the proofs above have relied heavily on the specific formulas for the norm and dot product, these theorems hold for more abstract norms and inner products. (See Problem 2.10.) Such concepts are useful in working with (for instance) spaces of functions in partial differential equations where a common “inner product” between two functions defined on the domain Ω is given by the formula

⟨f, g⟩ = ∫_Ω f(x) g(x) dx.

We will not be working with general inner products in this course, but it is worth noting that the concepts of the dot product and norm can be extended to more general objects and that these extensions are very useful in applications.

Problems

Problem 2.1. Let x = (2, 5, −1), y = (4, 0, 8), and z = (1, −6, 7).

(a) Compute x + y.

(b) Compute z − x.

(c) Compute 5x.

(d) Compute 3z + 6y.

(e) Compute 4x − 2y + 3z.


Problem 2.2. Let x = (1, 3, 1), y = (2, −1, −3), and z = (5, 1, −2).

(a) Compute x + y.

(b) Compute z − x.

(c) Compute −3x.

(d) Compute 4z − 2y.

(e) Compute x + 4y − 5z.

Problem 2.3. For the following two-dimensional vectors, create a graph that represents x, y, −x, −y, x − y, and y − x.

(a) x = (2, 1), y = (−1, 4).

(b) x = (0, −3), y = (3, 4).

(c) x = (4, 2), y = (−5, 6).

Problem 2.4. For the points X and Y below compute the vectors \overrightarrow{XY} and \overrightarrow{YX}.

(a) X = (4, 2, 6), Y = (−2, 3, 1).

(b) X = (0, 1, −4), Y = (3, 6, 9).

(c) X = (5, 0, 5), Y = (1, 2, 1).

Problem 2.5. Let x = (1, −2, 0) and z = (−1, −4, 3).

(a) Compute ‖x‖.

(b) Compute ‖z‖ − ‖x‖.

(c) Compute ‖z − x‖.

(d) Compute x · z.

(e) Compute ((x · z)/‖z‖²) z.

Problem 2.6. Let x = (2, 0, 1) and y = (1, −3, 2).

(a) Compute ‖x‖.

(b) Compute ‖y‖ − ‖x‖.

(c) Compute ‖y − x‖.

(d) Compute x · y.

(e) Compute ((x · y)/‖y‖²) y.

Problem 2.7. Use graphs of “generic” vectors x, y, and x + y in the plane to explain how the triangle inequality gets its name. Show geometrically the case where equality is obtained.

Problem 2.8. Use the formula for the dot product of vectors in Rn to prove Theorem 2.12.

Problem 2.9. Use the formula for the norm of a vector in Rn to prove the first three parts of Theorem 2.14.


Problem 2.10. Instead of using the formula for the norm of a vector in Rn, use (2.1) and the properties of the dot product given in Theorem 2.12 to prove the first three parts of Theorem 2.14. (Note that the proofs of the Cauchy-Schwartz inequality and the triangle inequality depended only on Theorem 2.12, not the specific formulas for the norm or dot product.)

Problem 2.11. Show that

‖x + y‖² = ‖x‖² + ‖y‖²

if and only if

x · y = 0.

Problem 2.12. (a) Prove that if x · y = 0 for every y ∈ Rn then x = 0.

(b) Prove that if u · y = v · y for every y ∈ Rn then u = v.

Problem 2.13. The idea of a norm can be generalized beyond the particular case of the Euclidean norm defined above. In more general treatments, any function on a vector space satisfying the four properties of Theorem 2.14 is referred to as a norm. Show that the following two functions on Rn satisfy the four properties and are therefore norms.

‖x‖_1 = |x1| + |x2| + · · · + |xn|.

‖x‖_∞ = max_{i=1,2,...,n} |xi|.

Problem 2.14. In R2 graph the three sets

S1 = {x = (x, y) ∈ R2 | ‖x‖ ≤ 1},
S2 = {x = (x, y) ∈ R2 | ‖x‖_1 ≤ 1},
S3 = {x = (x, y) ∈ R2 | ‖x‖_∞ ≤ 1}.

Here ‖ · ‖ is the Euclidean norm and ‖ · ‖_1 and ‖ · ‖_∞ are defined in Problem 2.13.

Problem 2.15. Show that there are constants c1, C1, c∞, and C∞ such that for every x ∈ Rn

c1‖x‖_1 ≤ ‖x‖ ≤ C1‖x‖_1,    c∞‖x‖_∞ ≤ ‖x‖ ≤ C∞‖x‖_∞.

We say that pairs of norms satisfying this type of relationship are equivalent.


Chapter 3

Angles and Projections

While “angle” is a natural concept in R2 or R3, it is much harder to visualize in higher dimensions. In Problem 3.5, the reader is asked to use the law of cosines from trigonometry to show that if x and y are in the plane (R2) then

x · y = ‖x‖‖y‖ cos θ.

In light of this, we define the angle between two general vectors in Rn by extending the formula above in the following way. We note that if x and y are both nonzero then the Cauchy-Schwartz inequality gives us

|x · y| / (‖x‖‖y‖) ≤ 1,

or

−1 ≤ (x · y)/(‖x‖‖y‖) ≤ 1.

This tells us that (x · y)/(‖x‖‖y‖) is in the domain of the inverse cosine function, so we define

θ = cos⁻¹( (x · y)/(‖x‖‖y‖) ) ∈ [0, π]

to be the angle between x and y. This gives us

cos θ = (x · y)/(‖x‖‖y‖).

We state this definition formally and generalize the concept of perpendicular vectors in the following.


Definition 3.1. For any two nonzero vectors x and y in Rn we define the angle θ between the two vectors by

θ = cos⁻¹( (x · y)/(‖x‖‖y‖) ) ∈ [0, π].

We say that x and y are orthogonal if x · y = 0. A set of vectors {v1, v2, . . . , vk} is said to be an orthogonal set if

vi · vj = 0 if i ≠ j.

We say that a set of vectors {w1, w2, . . . , wk} is orthonormal if it is orthogonal and each vector in the set is a unit vector. That is,

wi · wj = δij = { 1 if i = j, 0 if i ≠ j }.

Example 3.2. The standard basis e1, . . . , en is an example of an orthonormal set.

Example 3.3. The set

{(1, 1), (1, −1)}

is an orthogonal set in R2. The set

{(1/√2, 1/√2), (1/√2, −1/√2)}

is an orthonormal set in R2.
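The angle formula above translates directly into a short computation. Here is a minimal Python/numpy sketch (illustrative only; the sample vectors are borrowed from Problem 3.1(c), and the clipping step is simply a guard against floating-point rounding).

import numpy as np

def angle(x, y):
    # Angle in [0, pi] between nonzero vectors x and y.
    c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    c = np.clip(c, -1.0, 1.0)     # keep the argument inside the domain of arccos
    return np.arccos(c)

x = np.array([-1.0, 0.0, 1.0])
y = np.array([5.0, 1.0, 0.0])
print(angle(x, y))                # angle in radians
print(np.degrees(angle(x, y)))    # the same angle in degrees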

The following computation is often useful.

Definition 3.4. Let y ∈ Rn be nonzero. For any vector x ∈ Rn we define the orthogonal projection of x onto y by

py(x) = ((x · y)/‖y‖²) y.

The projection has the following properties. (See Figure 3.1.)

Lemma 3.5. For any y ≠ 0 and x in Rn we have

1. py(x) is parallel to y,

2. py(x) is orthogonal to x − py(x).

The first assertion follows directly from the definition of parallel vectors.

The second can be shown by direct computation and is left to the reader. (See Problem 3.8.)


Figure 3.1: Orthogonal projection.

Example 3.6. Let x = (1, 2, −1) and y = (4, 0, 3). Then x · y = 1 and ‖y‖ = 5, so

py(x) = (1/25)(4, 0, 3).

Note that

x − py(x) = (21/25, 2, −28/25),

and that (since py(x) and y are parallel)

py(x) · (x − py(x)) = y · (x − py(x)) = 0.
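The computation in Example 3.6 can be verified numerically. The following Python/numpy sketch (illustrative only) computes py(x) from the definition and checks the orthogonality property of Lemma 3.5.

import numpy as np

def proj(x, y):
    # Orthogonal projection of x onto the nonzero vector y.
    return (np.dot(x, y) / np.dot(y, y)) * y

x = np.array([1.0, 2.0, -1.0])
y = np.array([4.0, 0.0, 3.0])
p = proj(x, y)

print(p)                  # should equal (1/25)*(4, 0, 3)
print(np.dot(y, x - p))   # essentially 0: x - p is orthogonal to y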

Problems

Problem 3.1. Compute the angle between the following pairs of vectors.

(a) x = (−1, 0, 1, 1), y = (2, 2, 1, 0).

(b) x = (3, 0, −1, 0, 1), y = (−1, 1, 2, 1, 0).

(c) x = (−1, 0, 1), y = (5, 1, 0).

Problem 3.2. Let x = (1, −2, 0), y = (−3, 0, 1), and z = (−1, −4, 3).

(a) Compute py(x).

(b) Compute px(y).

(c) Compute py(z).

(d) Compute pz(x).


Problem 3.3. Determine whether each of the following is an orthogonal set.

(a) (0, 0, 0, 1), (1, 1, 0, 0), (0, 0, 1, 1).

(b) (0, 0, −1, 1), (−1, 1, −1, −1), (2, 0, 1, 1).

(c) (−1, 0, 0, −1), (−1, 1, 0, 1), (0, 0, 1, 0).

Problem 3.4. Determine whether each of the following is an orthonormal set.

(a) (1/3, 2/3, 0, 2/3), (2/3, 1/3, 0, −2/3), (2/3, −2/3, 1/3, 1/3).

(b) (1/√2, 0, 1/√2), (1/√3, 1/√3, 1/√3), (0, 0, 1).

(c) (1/√2, −1/√2, 0), (1/√6, 1/√6, 2/√6), (1/√3, 1/√3, −1/√3).

Problem 3.5. If x and y are any two vectors in the plane, and θ is the (smallest) angle between them, the law of cosines¹ from trigonometry says

‖x − y‖² = ‖x‖² + ‖y‖² − 2‖x‖‖y‖ cos θ.

Use this to derive the identity

x · y = ‖x‖‖y‖ cos θ.

¹Note that the law of cosines reduces to the Pythagorean theorem if θ = π/2.


Problem 3.6. Suppose {w1, w2, . . . , wn} is an orthonormal set. Suppose that for some constants c1, c2, . . . , cn we have

x = c1w1 + c2w2 + · · · + cnwn.

Show that for any i = 1, 2, . . . , n,

x · wi = ci.

Problem 3.7. Let {w1, w2, . . . , wk} be an orthonormal set in Rn and let x ∈ Rn. Show that

Σ_{i=1}^{k} (x · wi)² ≤ ‖x‖².

Hint: use the fact that

0 ≤ ‖x − Σ_{i=1}^{k} (x · wi) wi‖².

Problem 3.8. Show that for any vectors y ≠ 0 and x in Rn the projection py(x) is orthogonal to x − py(x).

Problem 3.9. Show that for any x and nonzero y in Rn,

py(py(x)) = py(x).

That is, the projection operator applied twice is the same as the projection operator applied once.


Chapter 4

Matrix Algebra

In this section we define the most basic notations and computations involving matrices.

Definition 4.1. An m×n (read “m by n”) matrix A is a rectangular array of mn numbers arranged in m rows and n columns:

A = [ a11  a12  · · ·  a1n
      a21  a22  · · ·  a2n
      ...
      am1  am2  · · ·  amn ].

We call

ai1  ai2  · · ·  ain

the ith row of A (1 ≤ i ≤ m), and we call

a1j
a2j
...
amj

the jth column of A (1 ≤ j ≤ n). We call the number aij in the ith row and the jth column the ijth entry of the matrix A. The terms element and component are also used instead of “entry.”

An abstract matrix A is often denoted by a typical entry

aij

with two free indices: i (assumed to range from 1 to m) and j (assumed to range from 1 to n).


Remark 4.2. The entries in matrices are assumed to be real numbers in this text. Complex entries are considered in more complete treatments and will be mentioned briefly in our treatment of eigenvalues.

Definition 4.3. As with vectors in Rn, we can define scalar multiplication of any number c with an m×n matrix A:

cA = [ ca11  ca12  · · ·  ca1n
       ca21  ca22  · · ·  ca2n
       ...
       cam1  cam2  · · ·  camn ].

The ijth entry of cA is given by c aij.

We can also define matrix addition provided the matrices have the same number of rows and the same number of columns. As with vector addition, we simply add corresponding entries:

A + B = [ a11+b11  a12+b12  · · ·  a1n+b1n
          a21+b21  a22+b22  · · ·  a2n+b2n
          ...
          am1+bm1  am2+bm2  · · ·  amn+bmn ].

The ijth entry of A + B is given by aij + bij.

Remark 4.4. Matrix addition is clearly commutative as defined, i.e.,

A + B = B + A.

We define scalar multiplication to be commutative as well:

Ac = cA.

Example 4.5. Let

A = [  3  −2  4
      −1   5  7 ],

B = [ 0  1  −3
      2  8  −9 ],

C = [ 6   0
      4  −7 ].

Then

A + B = [ 3+0   −2+1  4−3
          −1+2   5+8  7−9 ]  =  [ 3  −1   1
                                  1  13  −2 ],

and

2C = [ 2(6)  2(0)
       2(4)  2(−7) ]  =  [ 12    0
                            8  −14 ].

The sumA+C is not well defined since the dimensions of the matrices do not match.

Definition 4.6. If A is an m×p matrix and B is a p×n matrix, then the matrix product AB is an m×n matrix C whose ijth entry is given by

cij = Σ_{k=1}^{p} aik bkj.

Remark 4.7. We note that the ijth entry of AB is computed by taking the ith row of A and the jth column of B (which we have required to be the same length (p)). We multiply them term by term and add the products (as we do in taking the dot product of two vectors):

cij = ai1 b1j + ai2 b2j + · · · + aip bpj.


Example 4.8.

[ 6   0     [ 0  1  −3     =  [ 6(0)+0(2)   6(1)+0(8)   6(−3)+0(−9)
  4  −7 ]     2  8  −9 ]        4(0)−7(2)   4(1)−7(8)   4(−3)−7(−9) ]

                           =  [   0     6   −18
                                −14   −52    51 ].
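The arithmetic in Example 4.8 can be checked with a few lines of Python/numpy (shown only as a sanity check; the @ operator performs matrix multiplication).

import numpy as np

C = np.array([[6.0,  0.0],
              [4.0, -7.0]])
B = np.array([[0.0, 1.0, -3.0],
              [2.0, 8.0, -9.0]])

print(C @ B)   # expected: [[0, 6, -18], [-14, -52, 51]]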

Remark 4.9. Matrix multiplication is associative. That is,

A(BC) = (AB)C

whenever the dimensions of the matrices match appropriately. This is easiest to demonstrate using component notation. We use the associative law for multiplication of numbers and the fact that finite sums can be taken in any order (the commutative law of addition) to get the following:

Σ_{j=1}^{n} aij ( Σ_{k=1}^{m} bjk ckl ) = Σ_{j=1}^{n} Σ_{k=1}^{m} aij bjk ckl = Σ_{k=1}^{m} ( Σ_{j=1}^{n} aij bjk ) ckl.
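One can also spot-check associativity numerically for particular matrices; the sketch below (Python/numpy, random matrices of compatible sizes, purely illustrative) confirms A(BC) = (AB)C up to floating-point rounding.

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 5))

print(np.allclose(A @ (B @ C), (A @ B) @ C))   # True (up to rounding)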

Remark 4.10. Matrix multiplication is not commutative. To compute AB we must match the number of columns of the left matrix A with the number of rows of the right matrix B. The matrix BA . . .

• might not be defined at all,

• might be defined but of a different size than AB, or

• might have the same size as AB but have different entries.

We will see examples of this in Problem 4.2 below.

Remark 4.11. When a vector x ∈ Rn is being used in matrix multiplication, we will always regard it as a column vector, or n×1 matrix:

x = [ x1
      x2
      ...
      xn ].

Definition 4.12. If A is an n×n matrix, the elements aii, i = 1, . . . , n, are called the diagonal elements of the matrix. A is said to be diagonal if aij = 0 whenever i ≠ j.


Definition 4.13. The n×n diagonal matrix I with all diagonal elements aii = 1, i = 1, . . . , n, is called the n×n identity matrix:

I = [ 1  0  · · ·  0
      0  1  · · ·  0
      ...
      0  0  · · ·  1 ].

The entries of the identity can be represented by the Kronecker delta function, δij.

The identity matrix is a multiplicative identity. For any m×n matrix B we have

IB = BI = B.

Note that here we are using the same symbol to denote the m×m and the n×n identity in the first and second instances respectively. This ambiguity in our notation rarely causes problems. In component form, this equation can be written

Σ_{k=1}^{m} δik bkj = Σ_{k=1}^{n} bik δkj = bij.

Definition 4.14. We say that an n×n matrix A is invertible if there exists a matrix A⁻¹ such that

AA⁻¹ = A⁻¹A = I.

Example 4.15. Suppose that

A = [ a  b
      c  d ]

and ad − bc ≠ 0. Then one can check directly that

A⁻¹ = (1/(ad − bc)) [  d  −b
                      −c   a ].
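The 2×2 inverse formula is easy to confirm numerically; here is a brief Python/numpy sketch with an arbitrarily chosen invertible matrix (illustrative only).

import numpy as np

a, b, c, d = 2.0, 3.0, -4.0, 1.0             # ad - bc = 14, so A is invertible
A = np.array([[a, b],
              [c, d]])
A_inv = (1.0 / (a * d - b * c)) * np.array([[ d, -b],
                                            [-c,  a]])

print(A @ A_inv)                             # approximately the identity
print(np.allclose(A_inv, np.linalg.inv(A)))  # agrees with numpy's inverse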

Lemma 4.16. A matrix has at most one inverse.

The proof of this is left to the reader in Problem 4.3.

Lemma 4.17. Suppose A and B are invertible n×n matrices. Then

(AB)⁻¹ = B⁻¹A⁻¹.


The proof of this is left to the reader in Problem 4.4.

Definition 4.18. The transpose of an m×n matrix A is the n×m matrix Aᵀ obtained by using the rows of A as the columns of Aᵀ. That is, the ijth entry of Aᵀ is the jith entry of A:

(Aᵀ)ij = aji.

We say that a matrix is symmetric if A = Aᵀ and skew if −A = Aᵀ.

Example 4.19.

[ 0  1  −3      ᵀ     [  0   2
  2  8  −9 ]       =     1   8
                        −3  −9 ].

The matrix

[  2  −1   0
  −1   2  −1
   0  −1   2 ]

is symmetric. The matrix

[  0   2   5
  −2   0  −4
  −5   4   0 ]

is skew.

The next lemma follows immediately from the definition.

Lemma 4.20. For any matrices A and B and scalar c we have

1. (Aᵀ)ᵀ = A.

2. (A + B)ᵀ = Aᵀ + Bᵀ if A and B are both m×n.

3. (cA)ᵀ = c(Aᵀ).

4. (AB)ᵀ = BᵀAᵀ if A is m×p and B is p×n.

5. (A⁻¹)ᵀ = (Aᵀ)⁻¹ if A is an invertible n×n matrix.

The proof of this is left to the reader in Problem 4.5. We also note the following.

Lemma 4.21. If A is an m×n matrix, x ∈ Rm, and y ∈ Rn, then

x · (Ay) = (Aᵀx) · y.


Proof. If we look at this equation in component form we see that it follows directly from the definition of multiplication by the transpose and the associative and commutative laws for multiplication of numbers:

Σ_{i=1}^{m} xi ( Σ_{j=1}^{n} aij yj ) = Σ_{i=1}^{m} Σ_{j=1}^{n} xi aij yj = Σ_{j=1}^{n} ( Σ_{i=1}^{m} xi aij ) yj.
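Lemma 4.21 can also be spot-checked numerically; the following Python/numpy lines (random data, illustrative only) verify x · (Ay) = (Aᵀx) · y for one choice of sizes.

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 4))   # m = 3, n = 4
x = rng.normal(size=3)        # x in R^m
y = rng.normal(size=4)        # y in R^n

print(np.isclose(x @ (A @ y), (A.T @ x) @ y))   # True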

Definition 4.22. An n×n matrix Q is orthogonal if

QQᵀ = QᵀQ = I.

That is, if Qᵀ = Q⁻¹.

Example 4.23. The 2×2 matrix

[ cos θ  −sin θ
  sin θ   cos θ ]

is orthogonal since

[ cos θ  −sin θ     [  cos θ  sin θ         [  cos θ  sin θ     [ cos θ  −sin θ
  sin θ   cos θ ]     −sin θ  cos θ ]   =     −sin θ  cos θ ]     sin θ   cos θ ]

   =  [ cos²θ + sin²θ        0
              0        cos²θ + sin²θ ]   =  I.
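A quick numerical confirmation of Example 4.23 (a Python/numpy sketch with an arbitrary angle; illustrative only):

import numpy as np

theta = 0.7                                    # any angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q @ Q.T, np.eye(2)))         # True: Q Q^T = I
print(np.allclose(Q.T, np.linalg.inv(Q)))      # True: Q^T = Q^{-1}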

Problems

Problem 4.1. Let

A = [  2  3
      −4  1 ],

B = [ 0   7
      1  −5 ],

C = [  6   1
       7  −8
      −2   4 ],

D = [ 9  3  0
      0  4  7 ].

(a) Compute 2A.

(b) Compute 4A − 2B.

(c) Compute C − 3Dᵀ.

(d) Compute 2Cᵀ + 5D.


Problem 4.2. Let

A = [  2  3
      −4  1 ],

B = [ 0   7
      1  −5 ],

C = [  6   1
       7  −8
      −2   4 ],

D = [ 9  3  0
      0  4  7 ].

(a) Compute AB.

(b) Compute BA.

(c) Compute CD.

(d) Compute DC.

Problem 4.3. Show that the inverse of an n×n matrix A is unique. That is, show that if

AB = BA = AC = CA = I,

then B = C.

Problem 4.4. Show that if A and B are invertible n×n matrices then

(AB)⁻¹ = B⁻¹A⁻¹.

Problem 4.5. Prove Lemma 4.20.

Problem 4.6. Show that every n×n matrix A can be written uniquely as the sum of a symmetric matrix E and a skew matrix W. Hint: If A = E + W then Aᵀ = ? We refer to E as the “symmetric part” of A and W as the “skew part.”

Problem 4.7. While we don’t use it in this text, there is a natural extension of the dot product for n×n matrices:

⟨A, B⟩ = Σ_{i=1}^{n} Σ_{j=1}^{n} aij bij.

Show that if A is symmetric and B is skew then ⟨A, B⟩ = 0.

Problem 4.8. Let A be any n×n matrix and let E be its symmetric part. Show that for any x ∈ Rn we have

xᵀAx = xᵀEx.


Chapter 5

Systems of Linear Equations and Gaussian Elimination

One of the most basic problems in linear algebra is the solution of a system of m linear equations in n unknown variables. In this section we give a quick review of the method of Gaussian elimination for solving these systems.

The following is a generic linear system.

a11x1 + a12x2 + · · · + a1nxn = b1,
a21x1 + a22x2 + · · · + a2nxn = b2,
...
am1x1 + am2x2 + · · · + amnxn = bm.

Here, we assume that aij, i = 1, . . . , m, j = 1, . . . , n, and bi, i = 1, . . . , m, are known constants. We call the constants aij the coefficients of the system. The constants bi are sometimes referred to as the data of the system. The n variables xj, j = 1, . . . , n, are called the unknowns of the system. Any ordered n-tuple (x1, x2, . . . , xn) ∈ Rn that satisfies each of the m equations in the system simultaneously is a solution of the system.

We note that the generic system above can be written in terms of matrix multiplication:

[ a11  a12  · · ·  a1n     [ x1         [ b1
  a21  a22  · · ·  a2n       x2           b2
  ...                        ...    =     ...
  am1  am2  · · ·  amn ]     xn ]         bm ],

or

Ax = b.


Here A is the m×n coefficient matrix, x ∈ Rn is the vector of unknowns, and b ∈ Rm is the data vector.

It is worth considering the very simple case where n = m = 1. Our equation reduces to

ax = b

where a, x, and b are all real numbers. (We think of a and b as given; x is unknown.) The only alternatives for solutions of this equation are as follows.

• If a ≠ 0 then the equation has the unique solution x = b/a.

• If a = 0 then there are two possibilities.

  – If b = 0 then the equation 0 · x = 0 is satisfied by any x ∈ R.

  – If b ≠ 0 then there is no solution.

We will see these three alternatives reflected in our subsequent results, but we can get at least some information about an important special case immediately.

We call a system of m equations in n unknowns (or the equivalent matrix equation Ax = b) homogeneous if bi = 0, i = 1, . . . , m (or equivalently, b = 0). We note that every homogeneous system has at least one solution, the trivial solution xj = 0, j = 1, . . . , n (x = 0).

More generally, a systematic development of the method of Gaussian elimination (which we won’t attempt in this quick review) reveals an important result.

Theorem 5.1. For any linear system of m linear equations in n unknowns, exactly one of the three alternatives holds.

1. The system has a unique solution (x1, . . . , xn).

2. The system has an infinite family of solutions.

3. The system has no solution.

The following examples of the three alternatives are simple enough to solve by inspection or by solving the first equation for one variable and substituting that into the second. The reader should do so and verify the following.

Example 5.2. The system

2x1 − 3x2 = 1,
4x1 + 5x2 = 13,

has only one solution: (x1, x2) = (2, 1).


Example 5.3. The system

x1 − x2 = 5,
2x1 − 2x2 = 10,

(which is really two copies of the “same” equation) has an infinite collection of solutions of the form (x1, x2) = (5 + s, s) where s is any real number.

Example 5.4. The system

x1 − x2 = 1,
2x1 − 2x2 = 7,

has no solutions.

A systematic development of Gaussian elimination:

• shows that the three alternatives are the only possibilities,

• tells us which of the alternatives fits a given system, and

• allows us to compute any solutions that exist.

As noted above, we will not attempt such a development, but we provide enough detail that readers can convince themselves of the first assertion and do the computations described in the second and third.

Returning to the general problem of Gaussian elimination, we have an abbreviated way of representing the generic matrix equation with a single m×(n+1) augmented matrix obtained by using the data vector as an additional column of the coefficient matrix:

[ a11  a12  · · ·  a1n  b1
  a21  a22  · · ·  a2n  b2
  ...
  am1  am2  · · ·  amn  bm ].

The augmented matrix represents the corresponding system of equations in either its m scalar equation or single matrix equation form.

Gaussian elimination involves manipulating an augmented matrix using the following operations.

Definition 5.5. We call the following elementary row operations of a matrix:

1. Multiplying any row by a nonzero constant,

2. Interchanging any two rows,

3. Adding a multiple of any row to another (distinct) row.


Example 5.6. Multiplying the second row of

[  0   3  −1
   2  −6  10
  −3   7   9 ]

by 1/2 yields

[  0   3  −1
   1  −3   5
  −3   7   9 ].

Interchanging the first two rows of this matrix yields

[  1  −3   5
   0   3  −1
  −3   7   9 ].

Adding three times the first row to the third of this matrix yields

[ 1  −3   5
  0   3  −1
  0  −2  24 ].

Multiplying the third row of this matrix by −1/2 yields

[ 1  −3    5
  0   3   −1
  0   1  −12 ].

Interchanging the second and third rows yields

[ 1  −3    5
  0   1  −12
  0   3   −1 ].

Adding −3 times the second row of this matrix to the third yields

[ 1  −3    5
  0   1  −12
  0   0   35 ].

Dividing the final row by 35 yields

[ 1  −3    5
  0   1  −12
  0   0    1 ].

Elementary row operations have an important property: they don’t change the solution set of the equivalent systems.


Theorem 5.7. Suppose the matrix B is obtained from the matrix A by a sequence of elementary row operations. Then the linear system represented by the matrix B has exactly the same set of solutions as the system represented by the matrix A.

The proof is left to the reader. We note that it is obvious that interchanging equations or multiplying both sides of an equation by a nonzero constant (which are equivalent to the first two operations) doesn’t change the set of solutions.

It is less obvious that the third type of operation doesn’t change the set. Is it easier to show that the operation doesn’t destroy solutions or doesn’t create new solutions? Can you “undo” a row operation of the third type by doing another row operation?

Example 5.8. If we interpret the matrices in Example 5.6 as augmented matrices, the first matrix represents the system

3x2 = −1,
2x1 − 6x2 = 10,
−3x1 + 7x2 = 9,

while the final matrix represents the system

x1 − 3x2 = 5,
x2 = −12,
0 = 1.

According to our theorem these systems have exactly the same solution set.

While it is not that hard to see that the first system has no solutions, the conclusion is immediate for the second system. That is because we have used the elementary row operations to reduce the matrix to a particularly convenient form which we now describe.

Gaussian elimination is the process of using a sequence of elementary row operations to reduce an augmented matrix to a standard form called reduced row echelon form, from which it is easy to read the set of solutions. The form has the following properties.

1. Every row is either a row of zeros or has a one as its first nonzero entry (a “leading one”).

2. Any row of zeros lies below all nonzero rows.

3. Any column containing a leading one contains no other nonzero entries.

4. The leading one in any row must lie to the left of any leading one in the rows below it.


Example 5.9. The matrix

[ 1  0  0  3
  0  1  0  2
  0  0  1  7
  0  0  0  0
  0  0  0  0 ]

is in reduced row echelon form. If we interpret the matrix as an augmented matrix corresponding to a system of five equations in three unknowns, it is equivalent to the matrix equation

[ 1  0  0      [ x1        [ 3
  0  1  0        x2          2
  0  0  1        x3 ]   =    7
  0  0  0                    0
  0  0  0 ]                  0 ],

or the system of five scalar equations

x1 = 3,  x2 = 2,  x3 = 7,  0 = 0,  0 = 0.

Of course, the unique solution is (x1, x2, x3) = (3, 2, 7).

Example 5.10. Again, the matrix

[ 1  0   2  5
  0  1  −4  6
  0  0   0  0
  0  0   0  0
  0  0   0  0 ]

is in reduced row echelon form. If we again interpret the matrix as an augmented matrix corresponding to a system of five equations in three unknowns, it is equivalent to the matrix equation

[ 1  0   2      [ x1        [ 5
  0  1  −4        x2          6
  0  0   0        x3 ]   =    0
  0  0   0                    0
  0  0   0 ]                  0 ],

or the system of five scalar equations

x1 + 2x3 = 5,  x2 − 4x3 = 6,  0 = 0,  0 = 0,  0 = 0.

This system has an infinite family of solutions and there are many ways of describing them. The most convenient is to allow the variables corresponding to columns without a leading one to take on an arbitrary value and solve for the variables corresponding to columns with a leading one in terms of these. In this situation we note that the third column has no leading one,¹ so we take x3 = s where s is any real number and solve for x1 and x2 to get the solution set

(x1, x2, x3) = (5 − 2s, 6 + 4s, s) = (5, 6, 0) + s(−2, 4, 1),  s ∈ R.

Example 5.11. Finally, the matrix

[ 1  0  0  −1
  0  1  0   0
  0  0  1   8
  0  0  0   1
  0  0  0   0 ]

is in reduced row echelon form. If we interpret the matrix as an augmented matrix corresponding to a system of five equations in three unknowns, it is equivalent to the matrix equation

[ 1  0  0      [ x1        [ −1
  0  1  0        x2           0
  0  0  1        x3 ]   =     8
  0  0  0                     1
  0  0  0 ]                   0 ],

or the system of five scalar equations

x1 = −1,  x2 = 0,  x3 = 8,  0 = 1,  0 = 0.

It is clear from the fourth equation that there is no solution to this system.

¹Neither does the fourth column, but it is the “data column” and does not correspond to an unknown variable.


Problem 5.1 gives a number of examples of matrices in reduced row echelon form and asks the reader to give the set of solutions. There is an intermediate form called row echelon form. In this form, columns are allowed to have nonzero entries above the leading ones (though still not below). From this form it is easy to determine which of the three solution alternatives hold. Problem 5.2 gives a number of examples of matrices in this form. The reader is asked to determine the alternative by inspection and then determine all solutions where they exist.

Since this is a review, we will not give an elaborate algorithm for using elementary row operations to reduce an arbitrary matrix to reduced row echelon form. (More information on this is given in the references.) We will content ourselves with the following simple examples.

Example 5.12. The system in Example 5.2 can be represented by the augmented matrix

[ 2  −3   1
  4   5  13 ].

In order to “clear out” the first column, let us add −2 times the first row to the second to get

[ 2  −3   1
  0  11  11 ].

We now divide the second row by 11 to get

[ 2  −3  1
  0   1  1 ].

Adding 3 times the second row to the first gives

[ 2  0  4
  0  1  1 ].

Finally, dividing the first row by 2 puts the matrix in reduced row echelon form:

[ 1  0  2
  0  1  1 ].

This is equivalent to the system

x1 = 2,  x2 = 1,

which describes the unique solution.

Note that we chose the order of our row operations in order to avoid introducing fractions. A computer computation would take a more systematic approach and treat all coefficients as floating point numbers, but our approach makes sense for hand computations.
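For comparison with the hand computation above, a computer algebra system can carry out the reduction directly. The sketch below uses Python's sympy library (an assumption of this illustration, not something used in the text) to compute the reduced row echelon form of the augmented matrix from Example 5.12, and numpy to solve the same system in floating point.

import numpy as np
import sympy as sp

# Reduced row echelon form of the augmented matrix [A | b].
aug = sp.Matrix([[2, -3, 1],
                 [4,  5, 13]])
rref_matrix, pivot_columns = aug.rref()
print(rref_matrix)             # Matrix([[1, 0, 2], [0, 1, 1]])

# Floating-point solution of A x = b.
A = np.array([[2.0, -3.0],
              [4.0,  5.0]])
b = np.array([1.0, 13.0])
print(np.linalg.solve(A, b))   # approximately [2., 1.]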


Example 5.13. The system in Example 5.3 can be represented by the augmented matrix

[ 1  −1   5
  2  −2  10 ].

Taking −2 times the first row of this matrix and adding it to the second in order to clear out the first column yields

[ 1  −1  5
  0   0  0 ].

This is already in reduced row echelon form and is equivalent to the equation

x1 − x2 = 5.

Since the second column has no leading one, we let the corresponding variable x2 take on an arbitrary value x2 = s ∈ R and solve the system for those variables whose column contains a leading one. (In this case, x1.) Our solutions can be represented as

x1 = s + 5,  x2 = s,

for any s ∈ R.

Example 5.14. The system in Example 5.4 can be represented by the augmented matrix

[ 1  −1  1
  2  −2  7 ].

Taking −2 times the first row of this matrix and adding it to the second in order to clear out the first column yields

[ 1  −1  1
  0   0  5 ].

Without even row reducing further, we can see that the second row represents the equation

0 = 5.

Therefore, this system can have no solutions.

Problems

Problem 5.1. The following matrices in reduced row echelon form represent augmented matrices of systems of linear equations. Find all solutions of the systems.


(a)

[ 1  0  0  −2
  0  1  0   3
  0  0  1   5
  0  0  0   0
  0  0  0   0 ].

(b)

[ 1  4  0  0  −3  −2
  0  0  1  0   2   3
  0  0  0  1   2   5
  0  0  0  0   0   0 ].

(c)

[ 1  0  0  −2  0
  0  1  0   4  0
  0  0  1   7  0
  0  0  0   0  1 ].

Problem 5.2. The following matrices in row echelon form represent augmented matrices of systems of linear equations. Determine by inspection which of the three alternatives hold: a unique solution, an infinite family of solutions, or no solution. Find all solutions of the systems that have them.

(a)

[ 1  2  2  −2   2
  0  1  5   3  −2
  0  0  1   7   6
  0  0  0   0   1 ].

(b)

[ 1  4  0  2  −3  −2
  0  1  1  4   2   3
  0  0  0  1   2   5
  0  0  0  0   1   2 ].

(c)

[ 1  1  4  −2
  0  1  3   3
  0  0  1   5 ].

Problem 5.3. The following matrices represent augmented matrices of systems of linear equations. Find all solutions of the systems.

(a)

[ 3   1   1  2
  4   0   2  2
  1  −3  −2  3 ].

(b)

[ 3  −6   1   6  0
  2  −4  −3  −7  0 ].
