
1 VECTOR SPACES AND SUBSPACES

What is a vector? Many are familiar with the concept of a vector as:

• Something which has magnitude and direction.

• An ordered pair or triple.

• A description of quantities such as force, velocity and acceleration.

Such vectors belong to the foundation vector space, Rn, on which all vector spaces are modelled. The properties of general vector spaces are based on the properties of Rn; it is therefore helpful to consider briefly the nature of Rn.

1.1 The Vector Space Rn

Definitions

• If n is a positive integer, then an ordered n-tuple is a sequence of n real numbers (a1, a2, . . . , an). The set of all ordered n-tuples is called n-space and is denoted by Rn.

When n = 1, each ordered n-tuple consists of one real number, and so R may be viewed as the set of real numbers. Take n = 2 and one has the set of all 2-tuples, more commonly known as ordered pairs. This set has the geometrical interpretation of describing all points and directed line segments in the Cartesian x−y plane. The vector space R3, likewise, is the set of ordered triples, which describe all points and directed line segments in 3-D space.

In the study of 3-space, the symbol (a1, a2, a3) has two different geometric interpretations: it can be interpreted as a point, in which case a1, a2 and a3 are the coordinates, or it can be interpreted as a vector, in which case a1, a2 and a3 are the components. It follows, therefore, that an ordered n-tuple (a1, a2, . . . , an) can be viewed as a "generalized point" or a "generalized vector"; the distinction is mathematically unimportant. Thus, we can describe the 5-tuple (1, 2, 3, 4, 5) either as a point or a vector in R5.

Definitions

• Two vectors u = (u1, u2, . . . , un) and v = (v1, v2, . . . , vn) in Rn are called equal if

u1 = v1, u2 = v2, . . . , un = vn

• The sum u + v is defined by

u + v = (u1 + v1, u2 + v2, . . . , un + vn)

• Let k be any scalar; then the scalar multiple ku is defined by

ku = (ku1, ku2, . . . , kun)

• These two operations of addition and scalar multiplication are called the standard operations on Rn.

• The zero vector in Rn is denoted by 0 and is defined to be the vector

0 = (0, 0, . . . , 0)

• The negative (or additive inverse) of u is denoted by −u and is defined by

−u = (−u1, −u2, . . . , −un)

• The difference of vectors in Rn is defined by

v − u = v + (−u)

The most important arithmetic properties of addition and scalar multiplication of vectors in Rn are listed in the following theorem. This theorem enables us to manipulate vectors in Rn without expressing the vectors in terms of components.


Theorem 1.1. If u = (u1, u2, . . . , un), v = (v1, v2, . . . , vn), and w = (w1, w2, . . . , wn) are vectors in Rn and k and l are scalars, then:

1. u + v = v + u

2. u + (v + w) = (u + v) + w

3. u + 0 = 0 + u = u

4. u + (−u) = 0; that is, u − u = 0

5. k(lu) = (kl)u

6. k(u + v) = ku + kv

7. (k + l)u = ku + lu

8. 1u = u
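Since each identity is a componentwise statement about real numbers, it can be spot-checked numerically. Below is a minimal sketch using NumPy; the vectors and scalars are arbitrary illustrative choices, and the check verifies instances only, it does not prove the theorem:

```python
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.standard_normal((3, 5))   # three random vectors in R^5
k, l = 2.0, -3.0                        # two scalars

# Spot-check the identities of Theorem 1.1 componentwise.
assert np.allclose(u + v, v + u)                   # 1
assert np.allclose(u + (v + w), (u + v) + w)       # 2
assert np.allclose(u + (-u), np.zeros(5))          # 4
assert np.allclose(k * (l * u), (k * l) * u)       # 5
assert np.allclose(k * (u + v), k * u + k * v)     # 6
assert np.allclose((k + l) * u, k * u + l * u)     # 7
assert np.allclose(1 * u, u)                       # 8
print("all identities hold for this sample")
```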

1.2 Generalized Vector Spaces

The time has now come to generalize the concept of a vector. In this section a set of axioms is stated which, if satisfied by a class of objects, entitles those objects to be called "vectors". The axioms were chosen by abstracting the most important properties of vectors in Rn (Theorem 1.1); as a consequence, vectors in Rn automatically satisfy these axioms. Thus the new concept of a vector includes many new kinds of vector without excluding the "common vector". The new types of vectors include, among other things, various kinds of matrices and functions.

Definition

A vector space V over a field F is a nonempty set on which two operations are defined, addition and scalar multiplication. Addition is a rule for associating with each pair of objects u and v in V an object u + v, and scalar multiplication is a rule for associating with each scalar k ∈ F and each object u in V an object ku, such that


1. If u, v ∈ V, then u + v ∈ V.

2. If u ∈ V and k ∈ F, then ku ∈ V.

3. u + v = v + u

4. u + (v + w) = (u + v) + w

5. There is an object 0 in V, called a zero vector for V, such that u + 0 = 0 + u = u for all u in V.

6. For each u in V, there is an object −u in V, called the additive inverse of u, such that u + (−u) = −u + u = 0.

7. k(lu) = (kl)u

8. k(u + v) = ku + kv

9. (k + l)u = ku + lu

10. 1u = u

Remark The elements of the underlying field F are called scalars and the elements of the vector space are called vectors. Note also that we often restrict our attention to the case when F = R or C.

Examples of Vector Spaces

A wide variety of vector spaces are possible under the above definition, as illustrated by the following examples. In each example we specify a nonempty set of objects V. We must then define two operations, addition and scalar multiplication, and as an exercise we will demonstrate that all the axioms are satisfied, hence entitling V, with the specified operations, to be called a vector space.

1. The set of all n-tuples with entries in the field F, denoted Fn (especially note Rn and Cn).


2. The set of all m×n matrices with entries from the field F, denoted Mm×n(F).

3. The set of all real-valued functions defined on the real line (−∞,∞).

4. The set of polynomials with coefficients from the field F, denoted P(F).

5. (Counterexample) Let V = R2 and define addition and scalar multiplication operations as follows: if u = (u1, u2) and v = (v1, v2), then define

u + v = (u1 + v1, u2 + v2)

and if k is any real number, then define

ku = (ku1, 0).

Addition here is standard, and axioms 1 through 9 can all be verified; however, axiom 10 fails, since 1u = (u1, 0) ≠ u whenever u2 ≠ 0. So V with these operations is not a vector space. A few of the axioms are spot-checked in the sketch below.
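A minimal sketch of that check in plain Python (the test values are arbitrary):

```python
def add(u, v):
    """Standard addition on R^2."""
    return (u[0] + v[0], u[1] + v[1])

def smul(k, u):
    """The nonstandard scalar multiple ku = (k*u1, 0)."""
    return (k * u[0], 0.0)

u, v = (1.0, 2.0), (3.0, 4.0)
k, l = 2.0, 5.0

print(smul(k, add(u, v)) == add(smul(k, u), smul(k, v)))  # axiom 8: True
print(smul(k + l, u) == add(smul(k, u), smul(l, u)))      # axiom 9: True
print(smul(1.0, u) == u)   # axiom 10: False, since (1.0, 0.0) != (1.0, 2.0)
```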

1.2.1 Some Properties of Vectors

It is important to realise that the following results hold for all vector spaces. They provide a useful set of vector properties.

Theorem 1.2. If u, v, w ∈ V (a vector space) are such that u + w = v + w, then u = v.

Corollary 1.1. The zero vector and the additive inverse vector (for each vector) are unique.

Theorem 1.3. Let V be a vector space over the field F, u ∈ V, and k ∈ F. Then the following statements are true:

(a) 0u = 0

(b) k0 = 0

(c) (−k)u = −(ku) = k(−u)

(d) If ku = 0, then k = 0 or u = 0.


1.2.2 Quiz True or false?

(a) Every vector space contains a zero vector.

(b) A vector space may have more than one zero vector.

(c) In any vector space, au = bu implies a = b.

(d) In any vector space, au = av implies u = v.

1.3 Subspaces

It is possible for one vector space to be contained within a larger vector space. This section will look closely at this important concept.

Definitions

• A subset W of a vector space V is called a subspace of V if W is itself a vector space under the addition and scalar multiplication defined on V.

In general, all ten vector space axioms must be verified to show that a set W with addition and scalar multiplication forms a vector space. However, if W is part of a larger set V that is already known to be a vector space, then certain axioms need not be verified for W because they are inherited from V. For example, there is no need to check that u + v = v + u (axiom 3) for W, because this holds for all vectors in V and consequently holds for all vectors in W. Likewise, axioms 4, 7, 8, 9 and 10 are inherited by W from V. Thus to show that W is a subspace of a vector space V (and hence that W is a vector space), only axioms 1, 2, 5 and 6 need to be verified. The following theorem reduces this list even further by showing that even axioms 5 and 6 can be dispensed with.

Theorem 1.4. If W is a set of one or more vectors from a vector space V, then W is a subspace of V if and only if the following conditions hold.

(a) If u and v are vectors in W, then u + v is in W.


(b) If k is any scalar and u is any vector in W, then ku is in W.

Proof. If W is a subspace of V, then all the vector space axioms are satisfied; in particular, axioms 1 and 2 hold. These are precisely conditions (a) and (b).

Conversely, assume conditions (a) and (b) hold. Since these conditions are vector space axioms 1 and 2, it only remains to be shown that W satisfies the remaining eight axioms. Axioms 3, 4, 7, 8, 9 and 10 are automatically satisfied by the vectors in W since they are satisfied by all vectors in V. Therefore, to complete the proof, we need only verify that Axioms 5 and 6 are satisfied by vectors in W.

Let u be any vector in W. By condition (b), ku is in W for every scalar k. Setting k = 0, it follows from Theorem 1.3 that 0u = 0 is in W, and setting k = −1, it follows that (−1)u = −u is in W.

Remarks

• Note that a consequence of (b) is that 0 is an element of W.

• A set W of one or more vectors from a vector space V is said to be closed under addition if condition (a) in Theorem 1.4 holds, and closed under scalar multiplication if condition (b) holds. Thus, Theorem 1.4 states that W is a subspace of V if and only if W is closed under addition and closed under scalar multiplication.

Examples of Subspaces

1. A plane through the origin of R3 forms a subspace of R3. This is evident geometrically as follows: Let W be any plane through the origin and let u and v be any vectors in W other than the zero vector. Then u + v must lie in W because it is the diagonal of the parallelogram determined by u and v, and ku must lie in W for any scalar k because ku lies on a line through u. Thus, W is closed under addition and scalar multiplication, so it is a subspace of R3.


2. A line through the origin of R3 is also a subspace of R3. It is evident geometrically that the sum of two vectors on this line also lies on the line and that a scalar multiple of a vector on the line is on the line as well. Thus, W is closed under addition and scalar multiplication, so it is a subspace of R3.

3. Let n be a positive integer, and let W consist of all functions expressible in the form

p(x) = a0 + a1x + · · · + anx^n

where a0, . . . , an belong to some field F. Thus, W consists of the zero function together with all polynomials in F of degree n or less. The set W is a subspace of P(F) (example 4 in the list of vector spaces above), and if F = R it is also a subspace of the vector space of all real-valued functions (example 3 in that list).

To see this, let p and q be the polynomials

p(x) = a0 + a1x + · · · + anx^n

and

q(x) = b0 + b1x + · · · + bnx^n

Then

(p + q)(x) = p(x) + q(x) = (a0 + b0) + (a1 + b1)x + · · · + (an + bn)x^n

and

(kp)(x) = kp(x) = (ka0) + (ka1)x + · · · + (kan)x^n

These functions have the form given above, so p + q and kp lie in W. This vector space W is denoted Pn(F).

4. The transpose A^T of an m×n matrix A is the n×m matrix obtained from A by interchanging rows and columns. A symmetric matrix is a square matrix A such that A^T = A. The set of all symmetric matrices in Mn×n(F) is a subspace of Mn×n(F).


5. The trace of an n×n matrix A, denoted tr(A), is the sum of the diagonal entries of A. The set of n×n matrices having trace equal to zero is a subspace of Mn×n(F).

1.3.1 Operations on Vector Spaces

Definitions

• The addition of two subsets U and V of a vector space is defined by:

U + V = {u + v | u ∈ U, v ∈ V}

• The intersection ∩ of two subsets U and V of a vector space is defined by:

U ∩ V = {w | w ∈ U and w ∈ V}

• A vector space W is called the direct sum of U and V, denoted U ⊕ V, if U and V are subspaces of W with U ∩ V = {0} and U + V = W.

The following theorem shows how we can form a new subspace from other ones.

Theorem 1.5. Any intersection or sum of subspaces of a vector space V is also a subspace of V.

1.3.2 Quiz True or false?

(a) If V is a vector space and W is a subset of V that is also a vector space, then W is a subspace of V.

(b) The empty set is a subspace of every vector space.

(c) If V is a vector space other than the zero vector space, then V contains a subspace W such that W ≠ V.

(d) The intersection of any two subsets of V is a subspace of V.

(e) Any union of subspaces of a vector space V is a subspace of V.


1.4 Linear Combinations of Vectors and Systems of Linear Equations

Consider m linear equations in n variables:

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
...
am1x1 + am2x2 + · · · + amnxn = bm

Write this in matrix form: Ax = b.

A = [aij] is the m×n coefficient matrix, x = (x1, . . . , xn)^T is the column vector of unknowns, and b = (b1, . . . , bm)^T is the column vector of right-hand sides.

Note: aij, bj ∈ R or C.

1.4.1 Gaussian Elimination

To solve Ax = b, write the augmented matrix [A | b].

1. Find the left-most non-zero column, say column j.

2. Interchange top row with another row if necessary, so top element of column j is non-zero. (The pivot.)

3. Subtract multiples of row 1 from all other rows so all entries in column j below the top are then 0.

4. Cover top row; repeat 1 above on rest of rows.

Continue until all rows are covered, or until only rows of zeros remain.


The result is a triangular system, easily solved by back substitution: solve the last equation first, then the second-last equation, and so on.

1.4.2 Example

Use Gaussian elimination to solve:

x3 − x4 = 2
−9x1 − 2x2 + 6x3 − 12x4 = −7
3x1 + x2 − 2x3 + 4x4 = 2
2x3 = 6
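The algorithm translates directly into code. The following is a sketch for the square, nonsingular case, applied to this example; choosing the largest-magnitude pivot is a standard numerical refinement beyond the plain swap described in step 2:

```python
import numpy as np

def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination and back substitution.
    A sketch assuming A is square and nonsingular."""
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])  # [A | b]
    n = len(b)
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))      # pivot row for column j
        M[[j, p]] = M[[p, j]]                    # row swap (operation 1)
        for i in range(j + 1, n):                # zero out entries below pivot
            M[i] -= (M[i, j] / M[j, j]) * M[j]   # row replacement (operation 2)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):               # back substitution, last row first
        x[i] = (M[i, -1] - M[i, i+1:n] @ x[i+1:]) / M[i, i]
    return x

A = np.array([[ 0,  0, 1,  -1],
              [-9, -2, 6, -12],
              [ 3,  1, -2,  4],
              [ 0,  0, 2,   0]])
b = np.array([2, -7, 2, 6])
print(gauss_solve(A, b))   # expected by hand: (5/3, -1, 3, 1)
```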

1.4.3 Definition (row echelon form)

A matrix is in row echelon form (r.e.f.) if each row after the first starts with more zeros than the previous row (or else rows at bottom of matrix are all zeros).

The Gauss algorithm converts any matrix to one in row echelon form. The two corresponding systems are equivalent; that is, they have the same solution set.

1.4.4 Elementary row operations

1. ri ↔ rj: swap rows i and j.

2. ri → ri − c rj: replace row i with (row i minus c times row j).

3. ri → c ri: replace row i with c times row i, where c ≠ 0.

The Gauss algorithm uses only operations 1 and 2.

1.4.5 Possible solutions for Ax = b

Consider the r.e.f. of [A | b]. Then we have three possibilities:


(1) Exactly one solution; here the r.e.f. gives each variable a single value, so the number of variables, n, equals the number of non-zero rows in the r.e.f.

(2) No solution; this occurs when one row of the r.e.f. is (0 0 . . . 0 | d) with d ≠ 0. We cannot solve 0x1 + 0x2 + · · · + 0xn = d if d ≠ 0, since it says 0 = d. In this case the system is said to be inconsistent.

(3) Infinitely many solutions; here the number of non-zero rows of the r.e.f. is less than the number of variables.

Note that a homogeneous system has b = 0, i.e., all zero right-hand sides. Then we always have at least the trivial solution, xi = 0, 1 ≤ i ≤ n.

1.4.6 Examples

1. x1 + x2 − x3 = 0
   2x1 − x2 = 0
   4x1 + x2 − 2x3 = 0

2. x2 − 2x3 + 4x4 = 2
   2x2 − 3x3 + 7x4 = 6
   x3 − x4 = 2

1.4.7 Different right hand sides

To solve Ax = bj for j = 1, . . . , r, i.e., for r different right-hand sides bj:

Form one big augmented matrix [A | b1 b2 . . . br] and find its r.e.f. [U | b′1 b′2 . . . b′r]. So U will be a r.e.f. corresponding to A. Then solve each of the systems Ux = b′j, j = 1, 2, . . . , r, by back substitution.


1.4.8 Special case: finding A^{-1} (if it exists)

If A is n×n and it has an inverse, then solving Ax = ej (where ej is the n×1 column with 1 in the jth place and 0 elsewhere) gives the jth column of A^{-1}.

So we find the r.e.f. of [A | e1 e2 . . . en], i.e., determine the r.e.f. of [A | I], where I is the n×n identity matrix.

Once we have found the r.e.f. of [A | I] to be [U | ∗], we then use row operations to convert it to [I | D], so that D = A^{-1}.

If the last row of U is all zeros, A has no inverse.

Note that if A and C are square, AC = I implies CA = I, and conversely.

If such a matrix C exists, it is unique. We write C = A^{-1}, and we say A is non-singular or invertible.

1.4.9 Example

Does

A = [ 1 −1  4 ]
    [ 1  0 −2 ]
    [ 2 −2 10 ]

have an inverse? If so, find it.
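A sketch of the [A | I] → [I | D] procedure in NumPy, applied to this matrix; the inverse in the final comment was computed by hand and can be verified by multiplying back:

```python
import numpy as np

def inverse_by_row_reduction(A):
    """Row-reduce [A | I] to [I | D]; then D = A^{-1}.
    A sketch assuming A is square and nonsingular."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # augmented matrix [A | I]
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))       # choose a non-zero pivot
        M[[j, p]] = M[[p, j]]                     # row swap (operation 1)
        M[j] /= M[j, j]                           # scale pivot row (operation 3)
        for i in range(n):
            if i != j:
                M[i] -= M[i, j] * M[j]            # clear the rest of column j
    return M[:, n:]

A = np.array([[1, -1, 4], [1, 0, -2], [2, -2, 10]])
print(inverse_by_row_reduction(A))
# expected by hand: [[-2, 1, 1], [-7, 1, 3], [-1, 0, 0.5]]
```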

1.4.10 Linear combinations

Definitions

• A vector w is called a linear combination of the vectors v1, v2, . . . , vr if it can be expressed in the form

w = k1v1 + k2v2 + · · · + krvr

where k1, k2, . . . , kr are scalars.

Example


1. Consider the vectors u = (1, 2, −1) and v = (6, 4, 2) in R3. Show that w = (9, 2, 7) is a linear combination of u and v, and that w′ = (4, −1, 8) is not a linear combination of u and v.
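Deciding whether w is a linear combination of u and v amounts to solving a linear system with u and v as columns. A sketch of that check (the helper name in_span is illustrative):

```python
import numpy as np

def in_span(vectors, w):
    """Return coefficients k with sum_i k_i v_i = w if they exist, else None.
    Uses least squares and checks the residual; a sketch, not robust
    to ill-conditioning."""
    V = np.column_stack(vectors)                  # columns are the spanning vectors
    k, *_ = np.linalg.lstsq(V, w, rcond=None)     # best-fit coefficients
    return k if np.allclose(V @ k, w) else None

u = np.array([1, 2, -1])
v = np.array([6, 4, 2])
print(in_span([u, v], np.array([9, 2, 7])))    # [-3.  2.], so w = -3u + 2v
print(in_span([u, v], np.array([4, -1, 8])))   # None: w' is not in span{u, v}
```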

1.4.11 Spanning

If v1, v2, . . . , vr are vectors in a vector space V, then generally some vectors in V may be linear combinations of v1, v2, . . . , vr and others may not. The following theorem shows that if a set W is constructed consisting of all those vectors that are expressible as linear combinations of v1, v2, . . . , vr, then W forms a subspace of V.

Theorem 1.6. If v1, v2, . . . , vr are vectors in a vector space V, then:

(a) The set W of all linear combinations of v1, v2, . . . , vr is a subspace of V.

(b) W is the smallest subspace of V that contains v1, v2, . . . , vr, in the sense that every other subspace of V that contains v1, v2, . . . , vr must contain W.

Proof. (a) To show that W is a subspace of V, it must be proven that it is closed under addition and scalar multiplication. There is at least one vector in W, namely 0, since 0 = 0v1 + 0v2 + · · · + 0vr. If u and v are vectors in W, then

u = c1v1 + c2v2 + · · · + crvr

and

v = k1v1 + k2v2 + · · · + krvr

where c1, c2, . . . , cr, k1, k2, . . . , kr are scalars. Therefore

u + v = (c1 + k1)v1 + (c2 + k2)v2 + · · · + (cr + kr)vr

and, for any scalar k,

ku = (kc1)v1 + (kc2)v2 + · · · + (kcr)vr

Thus, u + v and ku are linear combinations of v1, v2, . . . , vr and consequently lie in W. Therefore, W is closed under addition and scalar multiplication.


(b) Each vector vi is a linear combination of v1, v2, . . . , vr, since we can write

vi = 0v1 + 0v2 + · · · + 1vi + · · · + 0vr

Therefore, the subspace W contains each of the vectors v1, v2, . . . , vr. Let W′ be any other subspace that contains v1, v2, . . . , vr. Since W′ is closed under addition and scalar multiplication, it must contain all linear combinations of v1, v2, . . . , vr. Thus W′ contains each vector of W.

Definitions

• If S = {v1, v2, . . . , vr} is a set of vectors in a vector space V, then the subspace W of V consisting of all linear combinations of the vectors in S is called the space spanned by v1, v2, . . . , vr, and the vectors v1, v2, . . . , vr are said to span W. To indicate that W is the space spanned by the vectors in the set S = {v1, v2, . . . , vr}, the following notation is used:

W = span(S) or W = span{v1, v2, . . . , vr}

Examples The polynomials 1, x, x^2, . . . , x^n span the vector space Pn defined previously, since each polynomial p in Pn can be written as

p = a0 + a1x + · · · + anx^n

which is a linear combination of 1, x, x^2, . . . , x^n. This can be denoted by writing

Pn = span{1, x, x^2, . . . , x^n}

Spanning sets are not unique. For example, any two noncollinear vectors that lie in the x−y plane will span the x−y plane. Also, any nonzero vector on a line will span the same line.


Theorem 1.7. Let S = {v1, v2, . . . , vr} and S′ = {w1, w2, . . . , wk} be two sets of vectors in a vector space V. Then

span(S) = span(S′)

if and only if each vector in S is a linear combination of those in S′ and (conversely) each vector in S′ is a linear combination of those in S.

Proof. If each vector in S is a linear combination of those in S′, then

span(S) ⊆ span(S′)

and if each vector in S′ is a linear combination of those in S, then

span(S′) ⊆ span(S)

and therefore

span(S) = span(S′).

Conversely, if some vector vi in S satisfies

vi ≠ a1w1 + a2w2 + · · · + akwk

for all possible a1, a2, . . . , ak, then

vi ∈ span(S) but vi ∉ span(S′)

and therefore

span(S) ≠ span(S′),

and vice versa.

1.4.12 Quiz True or false?

(a) 0 is a linear combination of any non-empty set of vectors.

(b) If S ⊆ V (V a vector space), then span(S) equals the intersection of all subspaces of V that contain S.


1.5 Linear Independence

In the previous section it was stated that a set of vectors S spans a given vector space V if every vector in V is expressible as a linear combination of the vectors in S. In general, it is possible that there may be more than one way to express a vector in V as a linear combination of vectors in a spanning set. This section will focus on the conditions under which each vector in V is expressible as a unique linear combination of the spanning vectors. Spanning sets with this property play a fundamental role in the study of vector spaces.

Definitions If S = {v1, v2, . . . , vr} is a nonempty set of vectors, then the vector equation

k1v1 + k2v2 + · · · + krvr = 0

has at least one solution, namely

k1 = 0, k2 = 0, . . . , kr = 0

If this is the only solution, then S is called a linearly independent set. If there are other solutions, then S is called a linearly dependent set.

Examples

1. If v1 = (2, −1, 0, 3), v2 = (1, 2, 5, −1) and v3 = (7, −1, 5, 8), then the set of vectors S = {v1, v2, v3} is linearly dependent, since 3v1 + v2 − v3 = 0.

2. The polynomials

p1 = 1 − x, p2 = 5 + 3x − 2x^2, p3 = 1 + 3x − x^2

form a linearly dependent set in P2, since 3p1 − p2 + 2p3 = 0.

3. Consider the vectors i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1) in R3. In terms of components, the vector equation

k1i + k2j + k3k = 0

becomes

k1(1, 0, 0) + k2(0, 1, 0) + k3(0, 0, 1) = (0, 0, 0)

or equivalently,

(k1, k2, k3) = (0, 0, 0)

Thus the set S = {i, j, k} is linearly independent. A similar argument shows that the n standard basis vectors form a linearly independent set in Rn.

4. In M2×3(R), the set consisting of the three matrices

[  1 −3  2 ]   [ −3  7  4 ]   [ −2  3 11 ]
[ −4  0  5 ] , [  6 −2 −7 ] , [ −1 −3  2 ]

is linearly dependent, since

5 [  1 −3  2 ] + 3 [ −3  7  4 ] − 2 [ −2  3 11 ] = [ 0 0 0 ]
  [ −4  0  5 ]     [  6 −2 −7 ]     [ −1 −3  2 ]   [ 0 0 0 ]

The following two theorems follow quite simply from the definitions of linear independence and linear dependence.

Theorem 1.8. A set S with two or more vectors is:

(a) Linearly dependent if and only if at least one of the vectors in S is expressible as a linear combination of the other vectors in S.

(b) Linearly independent if and only if no vector in S is expressible as a linear combination of the other vectors in S.

Example

1. Recall that the vectors

v1 = (2,−1,0,3), v2 = (1,2,5,−1), v3 = (7,−1,5,8)


were linearly dependent because

3v1 + v2 − v3 = 0.

It is obvious from this equation that

v1 = −(1/3)v2 + (1/3)v3, v2 = −3v1 + v3, v3 = 3v1 + v2

Theorem 1.9. (a) A finite set of vectors that contains the zero vector is linearly dependent.

(b) A set with exactly two vectors is linearly independent if and only if neither vector is a scalar multiple of the other.


2 BASIS AND DIMENSION

A line is thought of as 1-dimensional, a plane as 2-dimensional, and surrounding space as 3-dimensional. This section will attempt to make this intuitive notion of dimension precise and extend it to general vector spaces.

2.1 Coordinate systems of General Vector Spaces

A line is thought of as 1-dimensional because every point on that line can be specified by 1 coordinate. In the same way, a plane is thought of as 2-dimensional because every point on that plane can be specified by 2 coordinates, and so on. What defines such a coordinate system? The most common way of defining a coordinate system is the use of coordinate axes; in the case of the plane, the x and y axes are used most frequently. But there is also a way of specifying the coordinate system with vectors.

This can be done by replacing each axis with a vector of length one that points in the positive direction of the axis. In the case of the x−y plane, the x and y axes are replaced by the well-known unit vectors i and j respectively. Let O be the origin of the system and P be any point in the plane. The point P can be specified by the vector OP. Every vector OP can be written as a linear combination of i and j:

OP = ai + bj

The coordinates of P, corresponding to this coordinate system, are (a, b).

Informally stated, vectors such as i and j that specify a coordinate system are called "basis vectors" for that system. Although in the preceding discussion our basis vectors were chosen to be of unit length and mutually perpendicular, this is not essential; all that is required is that linear combinations of the chosen vectors are capable of specifying all points in the plane. In our example this only requires that the two vectors are not collinear. Different basis vectors do, however, change the coordinates of a point, as the following example demonstrates.


Example Let S = {i, j}, U = {i, 2j} and V = {i + j, j} be three sets of basis vectors. Let P be the point i + 2j. The coordinates of P relative to each set of basis vectors are:

S → (1, 2), U → (1, 1), V → (1, 1)

The following definition makes the preceding ideas more precise and enables the extension of a coordinate system to general vector spaces.

Definition

• If V is any vector space and S = {v1, v2, . . . , vn} is a set of vectors in V, then S is called a basis for V if the following two conditions hold:

(a) S is linearly independent

(b) S spans V

A basis is the vector space generalization of a coordinate system in 2-space and 3-space. The following theorem will aid in understanding how this is so.

Theorem 2.1. If S = {v1, v2, . . . , vn} is a basis for a vector space V, then every vector v in V can be expressed in the form v = c1v1 + c2v2 + · · · + cnvn in exactly one way.

Proof. Since S spans V, it follows from the definition of a spanning set that every vector in V is expressible as a linear combination of the vectors in S. To see that there is only one way to express a vector as a linear combination of the vectors in S, suppose that some vector v can be written as

v = c1v1 + c2v2 + · · · + cnvn

and also as

v = k1v1 + k2v2 + · · · + knvn


Subtracting the second equation from the first gives

0 = (c1 − k1)v1 + (c2 − k2)v2 + · · · + (cn − kn)vn

Since the right side of this equation is a linear combination of vectors in S, the linear independence of S implies that

(c1 − k1) = 0, (c2 − k2) = 0, . . . , (cn − kn) = 0

That is,

c1 = k1, c2 = k2, . . . , cn = kn

Thus the two expressions for v are the same.

Definitions

• If S = {v1, v2, . . . , vn} is a basis for a vector space V, and

v = c1v1 + c2v2 + · · · + cnvn

is the expression for a vector v in terms of the basis S, then the scalars c1, c2, . . . , cn are called the coordinates of v relative to the basis S. The vector (c1, c2, . . . , cn) in Fn constructed from these coordinates is called the coordinate vector of v relative to S; it is denoted by

[v]S = (c1, c2, . . . , cn)

• If v = [v]S for every v in V, then S is called the standard basis.

Remark It should be noted that coordinate vectors depend not only on the basis S but also on the order in which the basis vectors are written; a change in the order of the basis vectors results in a corresponding change of order for the entries in the coordinate vectors.

Examples


1. In example 3 of Section 1.5 it was shown that if

i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)

then S = {i, j, k} is a linearly independent set in R3. This set also spans R3, since any vector v = (a, b, c) can be written as

v = (a, b, c) = a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1) = ai + bj + ck

Thus, S is a basis for R3; it is in fact the standard basis for R3. Looking at the coefficients of i, j and k above, it follows that the coordinates of v relative to the standard basis are a, b and c, so

[v]S = (a, b, c)

and so we have

[v]S = v.
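For V = Rn, finding [v]S amounts to solving the linear system B c = v, where the basis vectors form the columns of B. A sketch, reusing the bases U and V from the example in Section 2.1:

```python
import numpy as np

def coordinates(basis, v):
    """Coordinate vector of v relative to an ordered basis of R^n:
    solve B c = v, where the basis vectors form the columns of B."""
    B = np.column_stack(basis)
    return np.linalg.solve(B, v)

# The point P = i + 2j from the earlier example, in two bases:
v = np.array([1.0, 2.0])
print(coordinates([np.array([1.0, 0.0]), np.array([0.0, 2.0])], v))  # U: [1. 1.]
print(coordinates([np.array([1.0, 1.0]), np.array([0.0, 1.0])], v))  # V: [1. 1.]
```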

2.2 Dimension of General Vector Spaces

Definition

• A nonzero vector space V is called finite-dimensional if it contains a finite set of vectors {v1,v2, . . . ,vn} that forms a basis. If no such set exists, V is called infinite-dimensional. In addition, the zero vector space is regarded as finite-dimensional.

Examples

• The vector spaces Fn and Pn are both finite-dimensional.

• The vector space of all real-valued functions defined on (−∞, ∞) is infinite-dimensional.

Theorem 2.2. If V is a finite-dimensional vector space and {v1,v2, . . . ,vn} is any basis, then:


(a) Every set with more than n vectors is linearly dependent.

(b) No set with fewer than n vectors spans V.

Proof. (a) Let S′ = {w1, w2, . . . , wm} be any set of m vectors in V, where m > n. It must be shown that S′ is linearly dependent. Since S = {v1, v2, . . . , vn} is a basis for V, each wi can be expressed as a linear combination of the vectors in S, say:

w1 = a11v1 + a21v2 + · · · + an1vn
w2 = a12v1 + a22v2 + · · · + an2vn
...
wm = a1mv1 + a2mv2 + · · · + anmvn

To show that S′ is linearly dependent, scalars k1, k2, . . . , km must be found, not all zero, such that

k1w1 + k2w2 + · · · + kmwm = 0

Combining the above two systems of equations gives

(k1a11 + k2a12 + · · · + kma1m)v1
+ (k1a21 + k2a22 + · · · + kma2m)v2
...
+ (k1an1 + k2an2 + · · · + kmanm)vn = 0

Thus, from the linear independence of S, the problem of proving that S′ is a linearly dependent set reduces to showing that there are scalars k1, k2, . . . , km, not all zero, that satisfy

a11k1 + a12k2 + · · · + a1mkm = 0
a21k1 + a22k2 + · · · + a2mkm = 0
...
an1k1 + an2k2 + · · · + anmkm = 0

As the system is homogeneous and there are more unknowns than equations (m > n), there are infinitely many solutions; in other words, there are non-trivial solutions with k1, k2, . . . , km not all zero.

(b) Let S′ = {w1, w2, . . . , wm} be any set of m vectors in V, where m < n. It must be shown that S′ does not span V. The proof is by contradiction: assume S′ spans V. This will lead to a contradiction of the linear independence of the basis S = {v1, v2, . . . , vn} of V.

If S′ spans V, then every vector in V is a linear combination of the vectors in S′. In particular, each basis vector vi is a linear combination of the vectors in S′, say

v1 = a11w1 + a21w2 + · · · + am1wm
v2 = a12w1 + a22w2 + · · · + am2wm
...
vn = a1nw1 + a2nw2 + · · · + amnwm

To obtain the contradiction it will be shown that there exist scalars k1, k2, . . . , kn, not all zero, such that

k1v1 + k2v2 + · · · + knvn = 0

Observe the similarity of the above two systems to those given in the proof of (a); they are identical except that the w's and the v's, and the m's and n's, have been interchanged. Thus the above system again reduces, in the same way, to the problem of finding k1, k2, . . . , kn, not all zero, that satisfy

a11k1 + a12k2 + · · · + a1nkn = 0
a21k1 + a22k2 + · · · + a2nkn = 0
...
am1k1 + am2k2 + · · · + amnkn = 0

As the system is homogeneous and there are more unknowns than equations (n > m), there exist non-trivial solutions with k1, k2, . . . , kn not all zero. Hence the contradiction.

The last theorem essentially states the following. Let S be a set of n vectors which forms a basis for the vector space V. Let S′ be another set of vectors in V consisting of m vectors. If m is greater than n, S′ cannot form a basis for V, as the vectors in S′ cannot be linearly independent. If m is less than n, S′ cannot form a basis for V because it does not span V. Thus, Theorem 2.2 leads directly into one of the most important theorems in linear algebra.

Theorem 2.3. All bases for a finite-dimensional vector space have the same number of vectors.

And thus the concept of dimension is almost complete. All that is needed is a definition.

Definition

• The dimension of a finite-dimensional vector space V, denoted by dim(V), is defined to be the number of vectors in a basis for V. In addition, the zero vector space has dimension zero.

Examples

1. The dimensions of some common vector spaces are given below:

dim(Fn) = n, dim(Pn) = n + 1, dim(Mm×n(F)) = mn


2. Determine a basis (and hence the dimension) for the solution space of the homogeneous system:

2x1 + 2x2 − x3 + x5 = 0

−x1 − x2 + 2x3 − 3x4 + x5 = 0

x1 + x2 − 2x3 − x5 = 0

x3 + x4 + x5 = 0
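One way to check the answer numerically is to compute a basis for the null space of the coefficient matrix; scipy.linalg.null_space returns an orthonormal basis, so its columns will generally differ from a hand-computed basis by a change of basis, but the dimension must agree. A sketch:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[ 2,  2, -1,  0,  1],
              [-1, -1,  2, -3,  1],
              [ 1,  1, -2,  0, -1],
              [ 0,  0,  1,  1,  1]], dtype=float)

N = null_space(A)              # columns span the solution space
print(N.shape[1])              # 2: the solution space has dimension 2
print(np.allclose(A @ N, 0))   # True: every basis vector solves the system
```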

2.3 Related Theorems

The remaining part of this section states theorems which illustrate the subtle relationships among the concepts of spanning, linear independence, basis and dimension. In many ways these theorems form the building blocks of other results in linear algebra.

Theorem 2.4. Plus/Minus Theorem. Let S be a nonempty set of vectors in a vector space V.

(a) If S is a linearly independent set, and if v is a vector in V that is outside span(S), then the set S ∪ {v} that results from inserting v is still linearly independent.

(b) If v is a vector in S that is expressible as a linear combination of other vectors in S, and if S − {v} denotes the set obtained by removing v from S, then S and S − {v} span the same space; that is,

span(S) = span(S − {v})

A proof will not be included, but the theorem can be visualised in R3 as follows.

(a) Consider two linearly independent vectors in R3. These two vectors span a plane. If you add a third vector to them that is not in the plane, then the three vectors are still linearly independent and they span all of R3.


(b) Consider three pairwise noncollinear vectors in a plane that form a set S. The set S spans the plane. If any one of the vectors is removed from S to give S′, it is clear that S′ still spans the plane. That is, span(S) = span(S′).

Theorem 2.5. If V is an n-dimensional vector space and if S is a set in V with exactly n vectors, then S is a basis for V if either S spans V or S is linearly independent.

Proof. Assume that S has exactly n vectors and spans V. To prove that S is a basis, it must be shown that S is a linearly independent set. If this is not so, then some vector v in S is a linear combination of the remaining vectors. If this vector is removed from S, then it follows from Theorem 2.4(b) that the remaining set of n − 1 vectors still spans V. But this is impossible, since it follows from Theorem 2.2(b) that no set with fewer than n vectors can span an n-dimensional vector space. Thus, S is linearly independent.

Now assume S has exactly n vectors and is a linearly independent set. To prove that S is a basis, it must be shown that S spans V. If this is not so, then there is some vector v in V that is not in span(S). If this vector is inserted in S, then it follows from Theorem 2.4(a) that this set of n + 1 vectors is still linearly independent. But this is impossible, because it follows from Theorem 2.2(a) that no set with more than n vectors in an n-dimensional vector space can be linearly independent. Thus S spans V.

Examples

• v1 = (−3, 8) and v2 = (1, 1) form a basis for R2, because R2 has dimension two and v1 and v2 are linearly independent.

Theorem 2.6. Let S be a finite set of vectors in a finite-dimensional vector space V.

(a) If S spans V but is not a basis for V, then S can be reduced to a basis for V by removing appropriate vectors from S.


(b) If S is a linearly independent set that is not already a basis for V, then S can be enlarged to a basis for V by inserting appropriate vectors into S.

Proof. (a) The proof is constructive and is called the left to right algorithm.

Let vc1 be the first nonzero vector in the set S. Choose the next vector in the list which is not a linear combination of vc1 and call it vc2. Find the next vector in the list which is not a linear combination of vc1 and vc2 and call it vc3. Continue in this way until the number of vectors chosen equals dim(V).

(b) This proof is also constructive.

Let V be a vector space. Begin with u1, u2, . . . , ur, which form a linearly independent family in V. Let v1, v2, . . . , vn be a basis for V. Now it is necessary and important that r < n. To extend the basis, simply apply the left to right algorithm to the set (note that this set spans V because it contains a basis within it)

u1, u2, . . . , ur, v1, v2, . . . , vn

This will select a basis for V that commences with u1, u2, . . . , ur.

Theorem 2.7. If W is a subspace of a finite-dimensional vector space V, then dim(W) ≤ dim(V); moreover, if dim(W) = dim(V), then W = V.

Proof. Let S = {w1, w2, . . . , wm} be a basis for W. Either S is also a basis for V or it is not. If it is, then dim(W) = dim(V) = m. If it is not, then by the previous theorem, vectors can be added to the linearly independent set S to make it into a basis for V, so dim(W) < dim(V). Thus, dim(W) ≤ dim(V) in all cases. If dim(W) = dim(V), then S is a set of m linearly independent vectors in the m-dimensional vector space V; hence by Theorem 2.5, S is a basis for V. Therefore W = V.


2.3.1 Quiz True or false?

(a) The zero vector space has no basis.

(b) Every vector space that is spanned by a finite set has a basis.

(c) Every vector space has a finite basis.

(d) A vector space cannot have more than one basis.

(e) If a vector space has a finite basis, then the number of vectors in every basis is the same.

(f) Suppose that V is a finite-dimensional vector space, S1 is a linearly independent subset of V, and S2 is a subset of V that spans V. Then S1 cannot contain more vectors than S2.

(g) If S spans the vector space V, then every vector in V can be written as a linear combination of vectors in S in only one way.

(h) Every subspace of a finite-dimensional vector space is finite-dimensional.

(i) If V is an n-dimensional vector space, then V has exactly one subspace with dimension 0 and one with dimension n.

(j) If V is an n-dimensional vector space, and if S is a subset of V with n vectors, then S is linearly independent if and only if S spans V.


3 INNER PRODUCT SPACES AND ORTHONORMAL BASES

In many applications of vector spaces, we are concerned with the notion of measurement. In this section we introduce the idea of length through the structure of inner product spaces. We only consider F = R or C.

Definition

Let V be a vector space over F. We define an inner product ⟨·,·⟩ on V to be a function that assigns a scalar ⟨u, v⟩ ∈ F to every ordered pair of vectors u, v ∈ V, such that the following properties hold for all u, v, w ∈ V and α ∈ F:

(a) ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩

(b) ⟨αu, v⟩ = α⟨u, v⟩

(c) ⟨u, v⟩ = the complex conjugate of ⟨v, u⟩ (for F = R this is simply ⟨u, v⟩ = ⟨v, u⟩)

(d) ⟨u, u⟩ > 0 if u ≠ 0.

The main example is when V = Fn. In this case we often use the notation ⟨u, v⟩ ≡ u · v, which is determined by

u · v = u1v̄1 + u2v̄2 + · · · + unv̄n

where u = (u1, u2, . . . , un), v = (v1, v2, . . . , vn), and the bar denotes complex conjugation (it may be omitted when F = R).

Definitions

• A vector space V over F endowed with a specific inner product is called an inner product space. If F=R then V is said to be a real inner product space, whereas ifF=C we call V a complex inner product space.

• The norm (or length, or magnitude) of a vector u is given by kuk=p hu,ui.

(32)

• Two vectorsu,vin an inner product space are said to beorthogonalifhu,vi= 0.

• If u and v are orthogonal vectors and both u and v have a magnitude of one (with respect toh,i), then u and v are said to beorthonormal.

• A set of vectors in an inner product space is called an orthogonal set if all pairs of distinct vectors in the set are orthogonal. An orthogonal set in which each vector has a magnitude of one is called an orthonormal set.
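For Fn these definitions are direct to compute. A sketch; np.vdot conjugates its first argument, so the arguments are swapped to match conjugation in the second slot of u · v:

```python
import numpy as np

def inner(u, v):
    """Inner product on F^n: sum_i u_i * conj(v_i)."""
    return np.vdot(v, u)

def norm(u):
    return np.sqrt(inner(u, u).real)

u = np.array([1.0, -2.0, 2.0])
v = np.array([2.0, 1.0, 0.0])
print(inner(u, v))         # 0.0: u and v are orthogonal
print(norm(u))             # 3.0
print(norm(u / norm(u)))   # 1.0: normalizing gives a unit vector
```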

The following additional properties follow easily from the axioms:

Theorem 3.1. Let V be an inner product space, x, y, z ∈ V and c ∈ F.

(a) ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩.

(b) ⟨x, cy⟩ = c̄⟨x, y⟩.

(c) ⟨x, 0⟩ = ⟨0, x⟩ = 0.

(d) ⟨x, x⟩ = 0 if and only if x = 0.

(e) If ⟨x, y⟩ = ⟨x, z⟩ for all x ∈ V, then y = z.

Proof. Parts (a) to (d) are left as exercises.

(e) By parts (a) and (b), ⟨x, y − z⟩ = 0 for all x ∈ V. Since this is true for all x, it is true for x = y − z; thus ⟨y − z, y − z⟩ = 0. By (d) this implies that y = z.

Now that the groundwork has been laid, the following theorem can be stated. The proof of this result is extremely important, since it makes use of an algorithm, or method, for converting an arbitrary basis into an orthonormal basis.

Theorem 3.2. Every non-zero finite-dimensional inner product space V has an orthonormal basis.

Proof. Let {u1, u2, . . . , um} be any basis for V. It suffices to show that V has an orthogonal basis, since the vectors in the orthogonal basis can be normalized to produce an orthonormal basis for V. The following sequence of steps will produce an orthogonal basis {v1, v2, . . . , vm} for V.


Step 1 Let v1 = u1.

Step 2 Obtain a vector v2 that is orthogonal to v1 by computing the component of u2 that is orthogonal to the space W1 spanned by v1. This can be done using the formula

v2 = u2 − (⟨u2, v1⟩ / ⟨v1, v1⟩) v1

Of course, if v2 = 0, then v2 is not a basis vector. But this cannot happen, since it would then follow from the preceding formula for v2 that

u2 = (⟨u2, v1⟩ / ⟨v1, v1⟩) v1 = (⟨u2, v1⟩ / ⟨u1, u1⟩) u1

which says that u2 is a multiple of u1, contradicting the linear independence of the basis {u1, u2, . . . , um}.

Step 3 To construct a vector v3 that is orthogonal to both v1 and v2, compute the component of u3 orthogonal to the space W2 spanned by v1 and v2, using the formula

v3 = u3 − (⟨u3, v1⟩ / ⟨v1, v1⟩) v1 − (⟨u3, v2⟩ / ⟨v2, v2⟩) v2

As in Step 2, the linear independence of {u1, u2, . . . , um} ensures that v3 ≠ 0. The remaining details are left as an exercise.

Step 4 To determine a vector v4 that is orthogonal to v1, v2 and v3, compute the component of u4 orthogonal to the space W3 spanned by v1, v2 and v3, using the formula

v4 = u4 − (⟨u4, v1⟩ / ⟨v1, v1⟩) v1 − (⟨u4, v2⟩ / ⟨v2, v2⟩) v2 − (⟨u4, v3⟩ / ⟨v3, v3⟩) v3

Continuing in this way, an orthogonal set of vectors {v1, v2, . . . , vm} is obtained after m steps. Since V is an m-dimensional vector space and every orthogonal set of nonzero vectors is linearly independent, the set {v1, v2, . . . , vm} is an orthogonal basis for V.


The preceding step-by-step construction for converting an arbitrary basis into an orthogonal basis is called the Gram-Schmidt process.

Examples: THE GRAM-SCHMIDT PROCESS

1. Consider the vector space R3 with the Euclidean inner product. Apply the Gram-Schmidt process to transform the basis vectors u1 = (1, 1, 1), u2 = (0, 1, 1), u3 = (0, 0, 1) into an orthogonal basis {v1, v2, v3}; then normalize the orthogonal basis vectors to obtain an orthonormal basis {q1, q2, q3}.

Step 1

v1 = u1 = (1, 1, 1)

Step 2

v2 = u2 − (u2 · v1 / v1 · v1) v1 = (0, 1, 1) − (2/3)(1, 1, 1) = (−2/3, 1/3, 1/3)

Step 3

v3 = u3 − (u3 · v1 / v1 · v1) v1 − (u3 · v2 / v2 · v2) v2
   = (0, 0, 1) − (1/3)(1, 1, 1) − ((1/3)/(2/3))(−2/3, 1/3, 1/3)
   = (0, −1/2, 1/2)

Thus,

v1 = (1, 1, 1), v2 = (−2/3, 1/3, 1/3), v3 = (0, −1/2, 1/2)

form an orthogonal basis for R3. The norms of these vectors are

‖v1‖ = √3, ‖v2‖ = √6/3, ‖v3‖ = 1/√2

so an orthonormal basis for R3 is

q1 = v1/‖v1‖ = (1/√3, 1/√3, 1/√3), q2 = v2/‖v2‖ = (−2/√6, 1/√6, 1/√6), q3 = v3/‖v3‖ = (0, −1/√2, 1/√2)
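The steps above translate into a short routine. A sketch for Rn with the Euclidean inner product, reproducing the example's orthonormal basis:

```python
import numpy as np

def gram_schmidt(basis):
    """Convert a list of linearly independent vectors into an
    orthonormal list, following the steps above."""
    vs = []
    for u in basis:
        v = u.astype(float)
        for w in vs:                       # subtract the components of u
            v -= (u @ w) / (w @ w) * w     # along the earlier v's
        vs.append(v)                       # v is orthogonal to all of vs
    return [v / np.linalg.norm(v) for v in vs]   # normalize at the end

q1, q2, q3 = gram_schmidt([np.array([1, 1, 1]),
                           np.array([0, 1, 1]),
                           np.array([0, 0, 1])])
print(q2)                               # approx. (-2/sqrt(6), 1/sqrt(6), 1/sqrt(6))
print(np.dot(q1, q2), np.dot(q1, q3))   # approx. 0, 0
```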

The Gram-Schmidt process with subsequent normalization not only converts an arbitrary basis {u1, u2, . . . , un} into an orthonormal basis {q1, q2, . . . , qn}, but it does so in such a way that for k ≥ 2 the following relationships hold:

• {q1, q2, . . . , qk} is an orthonormal basis for the space spanned by {u1, . . . , uk}.

• qk is orthogonal to {u1, u2, . . . , uk−1}.

The proofs are omitted, but these facts should become evident after some thoughtful examination of the proof of Theorem 3.2.

3.1 Quiz

True or false?

• An inner product is a scalar-valued function on the set of ordered pairs of vectors.

• An inner product space must be over the field of real or complex numbers.

• An inner product is linear in both components.

• If x, y and z are vectors in an inner product space such that ⟨x, y⟩ = ⟨x, z⟩, then y = z.

• If ⟨x, y⟩ = 0 for all x in an inner product space, then y = 0.


4 LINEAR TRANSFORMATIONS AND MATRICES

Definitions

• Let V, W be vector spaces over a field F. A function that maps V into W, T : V → W, is called a linear transformation from V to W if, for all vectors u and v in V and all scalars c ∈ F,

(a) T(u + v) = T(u) + T(v)

(b) T(cu) = cT(u)

• In the special case where V = W, the linear transformation T : V → V is called a linear operator on V.

• Let A be an m×n matrix and let T : Fn → Fm be the linear transformation defined by T(x) = Ax for all x ∈ Fn. Then as a matter of notational convention it is said that T is the linear transformation TA.

4.0.1 Basic Properties of Linear Transformations

Theorem 4.1. If T : V → W is a linear transformation, then:

(a) T(0) = 0

(b) T is linear if and only if T(av + w) = aT(v) + T(w) for all v, w in V and a ∈ F.

(c) T(v − w) = T(v) − T(w) for all v and w in V.

Part (a) of the above theorem states that a linear transformation maps 0 into 0. This property is useful for identifying transformations that are not linear. Part (b) is usually used to show that a transformation is linear.

Examples


1. TA is a linear transformation. Let A be an m×n matrix and let TA : Fn → Fm be the transformation defined by TA(x) = Ax for all x ∈ Fn. Let u, v ∈ Fn; then

TA(λu + v) = A(λu + v) = λAu + Av = λTA(u) + TA(v)

and thus TA is a linear transformation.

2. If I is the n×n identity matrix, then for every vector x in Fn,

TI(x) = Ix = x

so multiplication by I maps every vector in Fn into itself. TI is called the identity operator on Fn.

3. Let A, B and X be n×n matrices. Then Y = AX − XB is also n×n. Let V = Mn×n(F) be the vector space of all n×n matrices. Then Y = AX − XB defines a transformation T : V → V. The transformation is linear since

T(λX1 + X2) = A(λX1 + X2) − (λX1 + X2)B
            = λAX1 + AX2 − λX1B − X2B
            = λ(AX1 − X1B) + AX2 − X2B
            = λT(X1) + T(X2)

Theorem 4.2. If T : Fn → Fm is a linear transformation, then there exists an m×n matrix A such that T = TA.

Example

1. Find the 2×2 matrix A such that T = TA has the property that

T((1, 1)^T) = (1, 3)^T and T((2, 1)^T) = (0, 1)^T


4.1 Geometric Transformations in R2

This section consists of various transformations of the form TA that have a geometrical interpretation. Such transformations form the building blocks for understanding linear transformations.

Examples of Geometric Transformations

• Operators on R2 and R3 that map each vector into its symmetric image about some line or plane are called reflection operators. Such operators are of the form TA and are thus linear. There are three main reflections in R2, summarised below. Considering the transformation from the coordinates (x, y) to (w1, w2), the properties of the operator are as follows.

1. Reflection about the y-axis: The equations for this transformation are

w1 = −x
w2 = y

The standard matrix for the transformation is clearly

A = [ −1 0 ]
    [  0 1 ]

To demonstrate the reflection, consider the example below.

Let x = (1, 2)^T; therefore TA(x) = Ax = (−1, 2)^T.

2. Reflection about the x-axis: The equations for this transformation are

w1 = x
w2 = −y

The standard matrix for the transformation is clearly

A = [ 1  0 ]
    [ 0 −1 ]

To demonstrate the reflection, consider the example below.

Let x = (1, 2)^T; therefore TA(x) = Ax = (1, −2)^T.

3. Reflection about the line y = x: The equations for this transformation are

w1 = y
w2 = x

The standard matrix for the transformation is clearly

A = [ 0 1 ]
    [ 1 0 ]

To demonstrate the reflection, consider the example below.

Let x = (1, 2)^T; therefore TA(x) = Ax = (2, 1)^T.

• Operators on R2 and R3 that map each vector into its orthogonal projection on a line or plane through the origin are called orthogonal projection operators. Such operators are of the form TA and are thus linear. There are two main projections in R2, summarised below. Considering the transformation from the coordinates (x, y) to (w1, w2), the properties of the operator are as follows.

1. Orthogonal projection onto the x-axis: The equations for this transformation are

w1 = x
w2 = 0

The standard matrix for the transformation is clearly

A = [ 1 0 ]
    [ 0 0 ]

To demonstrate the projection, consider the example below.

Let x = (1, 2)^T; therefore TA(x) = Ax = (1, 0)^T.

2. Orthogonal projection onto the y-axis: The equations for this transformation are

w1 = 0
w2 = y

The standard matrix for the transformation is clearly

A = [ 0 0 ]
    [ 0 1 ]
