\title{
Matrix Differentiation
}
\section{Introduction}
Throughout this presentation I have chosen to use a symbolic matrix notation. This choice was not made lightly. I am a strong advocate of index notation, when appropriate. For example, index notation greatly simplifies the presentation and manipulation of differential geometry. As a rule-of-thumb, if your work is going to primarily involve differentiation with respect to the spatial coordinates, then index notation is almost surely the appropriate choice.
In the present case, however, I will be manipulating large systems of equations in which the matrix calculus is relatively simple while the matrix algebra and matrix arithmetic are messy and more involved. Thus, I have chosen to use symbolic notation.
\section{Notation and Nomenclature}
Definition 1 Let $a_{i j} \in \Re, i=1,2, \ldots, m, j=1,2, \ldots, n$. Then the ordered rectangular array
$$
\mathbf{A}=\left[\begin{array}{cccc}
a_{11} & a_{12} & \cdots & a_{1 n} \\
a_{21} & a_{22} & \cdots & a_{2 n} \\
\vdots & \vdots & & \vdots \\
a_{m 1} & a_{m 2} & \cdots & a_{m n}
\end{array}\right]
$$
is said to be a real matrix of dimension $m \times n$.
When writing a matrix I will occasionally write down its typical element as well as its dimension. Thus,
$$
\mathbf{A}=\left[a_{i j}\right], \quad i=1,2, \ldots, m ; \; j=1,2, \ldots, n
$$
denotes a matrix with $m$ rows and $n$ columns, whose typical element is $a_{i j}$. Note that the first subscript locates the row in which the typical element lies, while the second subscript locates the column. For example, $a_{j k}$ denotes the element lying in the $j$th row and $k$th column of the matrix $\mathbf{A}$.
Definition 2 A vector is a matrix with only one column. Thus, all vectors are inherently column vectors.
\section{Convention 1}
Multi-column matrices are denoted by boldface uppercase letters: for example, $\mathbf{A}, \mathbf{B}, \mathbf{X}$. Vectors (single-column matrices) are denoted by boldfaced lowercase letters: for example, $\mathbf{a}, \mathbf{b}, \mathbf{x}$. I will attempt to use letters from the beginning of the alphabet to designate known matrices, and letters from the end of the alphabet for unknown or variable matrices.
\section{Convention 2}
When it is useful to explicitly attach the matrix dimensions to the symbolic notation, I will use an underscript. For example, $\underset{m \times n}{\mathbf{A}}$, indicates a known, multi-column matrix with $m$ rows and $n$ columns.
A superscript ${ }^{\top}$ denotes the matrix transpose operation; for example, $\mathbf{A}^{\top}$ denotes the transpose of $\mathbf{A}$. Similarly, if $\mathbf{A}$ has an inverse it will be denoted by $\mathbf{A}^{-1}$. The determinant of $\mathbf{A}$ will be denoted by either $|\mathbf{A}|$ or $\operatorname{det}(\mathbf{A})$. Similarly, the rank of a matrix $\mathbf{A}$ is denoted by $\operatorname{rank}(\mathbf{A})$. An identity matrix will be denoted by $\mathbf{I}$, and $\mathbf{0}$ will denote a null matrix.
\section{Matrix Multiplication}
Definition 3 Let $\mathbf{A}$ be $m \times n$, and $\mathbf{B}$ be $n \times p$, and let the product $\mathbf{A B}$ be
$$
\mathbf{C}=\mathbf{A B}
$$
then $\mathbf{C}$ is an $m \times p$ matrix, with element $(i, j)$ given by
$$
c_{i j}=\sum_{k=1}^{n} a_{i k} b_{k j}
$$
for all $i=1,2, \ldots, m, \quad j=1,2, \ldots, p$.
Proposition 1 Let $\mathbf{A}$ be $m \times n$, and $\mathbf{x}$ be $n \times 1$, then the typical element of the product
$$
\mathbf{z}=\mathbf{A x}
$$
is given by
$$
z_{i}=\sum_{k=1}^{n} a_{i k} x_{k}
$$
for all $i=1,2, \ldots, m$. Similarly, let $\mathbf{y}$ be $m \times 1$, then the typical element of the product
$$
\mathbf{z}^{\top}=\mathbf{y}^{\top} \mathbf{A}
$$
is given by
$$
z_{i}=\sum_{k=1}^{m} a_{k i} y_{k}
$$
for all $i=1,2, \ldots, n$. Finally, the scalar resulting from the product
$$
\alpha=\mathbf{y}^{\top} \mathbf{A} \mathbf{x}
$$
is given by
$$
\alpha=\sum_{j=1}^{m} \sum_{k=1}^{n} a_{j k} y_{j} x_{k}
$$
Proof: These are merely direct applications of Definition 3. q.e.d.

Proposition 2 Let $\mathbf{A}$ be $m \times n$, and $\mathbf{B}$ be $n \times p$, and let the product $\mathbf{A B}$ be
$$
\mathbf{C}=\mathbf{A B}
$$
then
$$
\mathbf{C}^{\top}=\mathbf{B}^{\top} \mathbf{A}^{\top}
$$
Proof: The typical element of $\mathbf{C}$ is given by
$$
c_{i j}=\sum_{k=1}^{n} a_{i k} b_{k j}
$$
By definition, the typical element of $\mathbf{C}^{\top}$, say $d_{i j}$, is given by
$$
d_{i j}=c_{j i}=\sum_{k=1}^{n} a_{j k} b_{k i}
$$
Hence,
$$
\mathbf{C}^{\top}=\mathbf{B}^{\top} \mathbf{A}^{\top}
$$
q.e.d.
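As a quick numerical illustration (my own addition, not part of the original development), the following NumPy sketch checks the element formulas of Proposition 1 and the transpose rule of Proposition 2; the dimensions and random test data are arbitrary assumptions.

```python
# Illustrative check of Propositions 1 and 2 (not from the notes): compare the
# element-by-element sums against NumPy's built-in matrix products.
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 3, 4, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))
x = rng.standard_normal(n)
y = rng.standard_normal(m)

# z_i = sum_k a_ik x_k  (Proposition 1)
z = np.array([sum(A[i, k] * x[k] for k in range(n)) for i in range(m)])
assert np.allclose(z, A @ x)

# alpha = sum_j sum_k a_jk y_j x_k  (Proposition 1)
alpha = sum(A[j, k] * y[j] * x[k] for j in range(m) for k in range(n))
assert np.isclose(alpha, y @ A @ x)

# (AB)^T = B^T A^T  (Proposition 2)
assert np.allclose((A @ B).T, B.T @ A.T)
```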
Proposition 3 Let $\mathbf{A}$ and $\mathbf{B}$ be $n \times n$ invertible matrices. Let the product $\mathbf{A B}$ be given by
$$
\mathbf{C}=\mathbf{A B}
$$
then
$$
\mathbf{C}^{-1}=\mathbf{B}^{-1} \mathbf{A}^{-1}
$$
Proof:
$$
\mathbf{C} \mathbf{B}^{-1} \mathbf{A}^{-1}=\mathbf{A} \mathbf{B} \mathbf{B}^{-1} \mathbf{A}^{-1}=\mathbf{I}
$$
q.e.d.
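A one-line NumPy check of Proposition 3 (an illustration of mine, not from the notes); the random test matrices are an assumption, with a shifted diagonal to keep them invertible.

```python
# Illustrative check of Proposition 3: (AB)^-1 = B^-1 A^-1.
import numpy as np

rng = np.random.default_rng(1)
n = 4
# Adding n*I to a random matrix makes it safely invertible for this test (an assumption).
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, n)) + n * np.eye(n)
assert np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A))
```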
\section{Partitioned Matrices}
Frequently, I will find it convenient to deal with partitioned matrices ${ }^{1}$. Such a representation, and the manipulation of this representation, are two of the relative advantages of the symbolic matrix notation.
Definition 4 Let $\mathbf{A}$ be $m \times n$ and write
$$
\mathbf{A}=\left[\begin{array}{ll}
\mathbf{B} & \mathbf{C} \\
\mathbf{D} & \mathbf{E}
\end{array}\right]
$$
where $\mathbf{B}$ is $m_{1} \times n_{1}, \mathbf{E}$ is $m_{2} \times n_{2}, \mathbf{C}$ is $m_{1} \times n_{2}, \mathbf{D}$ is $m_{2} \times n_{1}, m_{1}+m_{2}=m$, and $n_{1}+n_{2}=n$. The above is said to be a partition of the matrix $\mathbf{A}$.
${ }^{1}$ Much of the material in this section is extracted directly from Dhrymes (1978, Section 2.7).

Proposition 4 Let $\mathbf{A}$ be a square, nonsingular matrix of order $m$. Partition $\mathbf{A}$ as
$$
\mathbf{A}=\left[\begin{array}{ll}
\mathbf{A}_{11} & \mathbf{A}_{12} \\
\mathbf{A}_{21} & \mathbf{A}_{22}
\end{array}\right]
$$
so that $\mathbf{A}_{11}$ is a nonsingular matrix of order $m_{1}$, $\mathbf{A}_{22}$ is a nonsingular matrix of order $m_{2}$, and $m_{1}+m_{2}=m$. Then
$$
\mathbf{A}^{-1}=\left[\begin{array}{cc}
\left(\mathbf{A}_{11}-\mathbf{A}_{12} \mathbf{A}_{22}^{-1} \mathbf{A}_{21}\right)^{-1} & -\mathbf{A}_{11}^{-1} \mathbf{A}_{12}\left(\mathbf{A}_{22}-\mathbf{A}_{21} \mathbf{A}_{11}^{-1} \mathbf{A}_{12}\right)^{-1} \\
-\mathbf{A}_{22}^{-1} \mathbf{A}_{21}\left(\mathbf{A}_{11}-\mathbf{A}_{12} \mathbf{A}_{22}^{-1} \mathbf{A}_{21}\right)^{-1} & \left(\mathbf{A}_{22}-\mathbf{A}_{21} \mathbf{A}_{11}^{-1} \mathbf{A}_{12}\right)^{-1}
\end{array}\right]
$$
Proof: Direct multiplication of the proposed $\mathbf{A}^{-1}$ and $\mathbf{A}$ yields
$$
\mathbf{A}^{-1} \mathbf{A}=\mathbf{I}
$$
q.e.d.
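The following NumPy sketch (my own illustration) verifies the partitioned-inverse formula of Proposition 4 on a random test matrix; the block sizes and the diagonal shift used to guarantee nonsingular blocks are assumptions.

```python
# Numerical check of the partitioned inverse in Proposition 4 (illustration only).
import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng(2)
m1, m2 = 3, 2
m = m1 + m2
A = rng.standard_normal((m, m)) + m * np.eye(m)   # nonsingular, with nonsingular blocks
A11, A12 = A[:m1, :m1], A[:m1, m1:]
A21, A22 = A[m1:, :m1], A[m1:, m1:]

S11 = inv(A11 - A12 @ inv(A22) @ A21)             # (A11 - A12 A22^-1 A21)^-1
S22 = inv(A22 - A21 @ inv(A11) @ A12)             # (A22 - A21 A11^-1 A12)^-1
A_inv = np.block([[S11,                   -inv(A11) @ A12 @ S22],
                  [-inv(A22) @ A21 @ S11,  S22                 ]])
assert np.allclose(A_inv, inv(A))
```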
\section{Matrix Differentiation}
In the following discussion I will differentiate matrix quantities with respect to the elements of the referenced matrices. Although no new concept is required to carry out such operations, the element-by-element calculations involve cumbersome manipulations and, thus, it is useful to derive the necessary results and have them readily available.${ }^{2}$

${ }^{2}$ Much of the material in this section is extracted directly from Dhrymes (1978, Section 4.3). The interested reader is directed to this worthy reference to find additional results.
\section{Convention 3}
Let
$$
\mathbf{y}=\psi(\mathbf{x})
$$
where $\mathbf{y}$ is an $m$-element vector, and $\mathbf{x}$ is an $n$-element vector. The symbol
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\left[\begin{array}{cccc}
\frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{2}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}} \\
\frac{\partial y_{2}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{2}} & \cdots & \frac{\partial y_{2}}{\partial x_{n}} \\
\vdots & \vdots & & \vdots \\
\frac{\partial y_{m}}{\partial x_{1}} & \frac{\partial y_{m}}{\partial x_{2}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}\right]
$$
will denote the $m \times n$ matrix of first-order partial derivatives of the transformation from $\mathbf{x}$ to $\mathbf{y}$. Such a matrix is called the Jacobian matrix of the transformation $\psi()$.
Notice that if $\mathbf{x}$ is actually a scalar in Convention 3 then the resulting Jacobian matrix is an $m \times 1$ matrix; that is, a single column (a vector). On the other hand, if $\mathbf{y}$ is actually a scalar in Convention 3 then the resulting Jacobian matrix is a $1 \times n$ matrix; that is, a single row (the transpose of a vector).
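Convention 3 can also be realized numerically. The routine below is a minimal central-difference sketch of my own (the helper name, example map, and step size are assumptions, not part of the notes); it builds the $m \times n$ Jacobian with row $i$ holding the partials of $y_{i}$.

```python
# A finite-difference routine that realizes Convention 3 numerically.
import numpy as np

def jacobian_fd(psi, x, h=1e-6):
    """Central-difference approximation of the Jacobian matrix of Convention 3."""
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(psi(x))
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.atleast_1d(psi(x + e)) - np.atleast_1d(psi(x - e))) / (2 * h)
    return J

# Example map: psi(x) = (x1*x2, x1^2) has Jacobian [[x2, x1], [2*x1, 0]].
x = np.array([1.5, -2.0])
print(jacobian_fd(lambda v: np.array([v[0] * v[1], v[0] ** 2]), x))
```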
Proposition 5 Let
$$
\mathbf{y}=\mathbf{A x}
$$
where $\mathbf{y}$ is $m \times 1$, $\mathbf{x}$ is $n \times 1$, $\mathbf{A}$ is $m \times n$, and $\mathbf{A}$ does not depend on $\mathbf{x}$, then
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\mathbf{A}
$$
Proof: Since the $i$ th element of $\mathbf{y}$ is given by
$$
y_{i}=\sum_{k=1}^{n} a_{i k} x_{k}
$$
it follows that
$$
\frac{\partial y_{i}}{\partial x_{j}}=a_{i j}
$$
for all $i=1,2, \ldots, m, \quad j=1,2, \ldots, n$. Hence
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\mathbf{A}
$$
q.e.d.
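A finite-difference check of Proposition 5 (my own illustration): the numerically estimated Jacobian of $\mathbf{y}=\mathbf{A x}$ should reproduce $\mathbf{A}$. The test matrix, vector, and step size are assumptions.

```python
# Finite-difference check of Proposition 5: for y = A x, the Jacobian dy/dx is A.
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

h = 1e-6
J = np.zeros((m, n))
for j in range(n):
    e = np.zeros(n)
    e[j] = h
    J[:, j] = (A @ (x + e) - A @ (x - e)) / (2 * h)   # column j of dy/dx
assert np.allclose(J, A, atol=1e-6)
```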
Proposition 6 Let
$$
\mathbf{y}=\mathbf{A x}
$$
where $\mathbf{y}$ is $m \times 1$, $\mathbf{x}$ is $n \times 1$, $\mathbf{A}$ is $m \times n$, and $\mathbf{A}$ does not depend on $\mathbf{x}$, as in Proposition 5. Suppose that $\mathbf{x}$ is a function of the vector $\mathbf{z}$, while $\mathbf{A}$ is independent of $\mathbf{z}$. Then
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{z}}=\mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Proof: Since the $i$ th element of $\mathbf{y}$ is given by
$$
y_{i}=\sum_{k=1}^{n} a_{i k} x_{k}
$$
for all $i=1,2, \ldots, m$, it follows that
$$
\frac{\partial y_{i}}{\partial z_{j}}=\sum_{k=1}^{n} a_{i k} \frac{\partial x_{k}}{\partial z_{j}}
$$
but the right hand side of the above is simply element $(i, j)$ of $\mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}$. Hence
$$
\frac{\partial \mathbf{y}}{\partial \mathbf{z}}=\frac{\partial \mathbf{y}}{\partial \mathbf{x}} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}=\mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
q.e.d.
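The chain rule of Proposition 6 can be checked numerically as well (a sketch of mine; the map $\mathbf{x}(\mathbf{z})$ below is an arbitrary smooth example, not taken from the notes).

```python
# Check of Proposition 6: with y = A x(z), dy/dz = A dx/dz.
import numpy as np

rng = np.random.default_rng(4)
m, n, q = 3, 4, 2
A = rng.standard_normal((m, n))
z = rng.standard_normal(q)

def x_of_z(z):
    return np.array([z[0] * z[1], np.sin(z[0]), z[1] ** 2, z[0] - z[1]])

def jac(f, z, h=1e-6):
    """Central-difference Jacobian, rows indexed by outputs (Convention 3)."""
    f0 = np.atleast_1d(f(z))
    J = np.zeros((f0.size, z.size))
    for j in range(z.size):
        e = np.zeros_like(z)
        e[j] = h
        J[:, j] = (np.atleast_1d(f(z + e)) - np.atleast_1d(f(z - e))) / (2 * h)
    return J

lhs = jac(lambda z_: A @ x_of_z(z_), z)   # dy/dz
rhs = A @ jac(x_of_z, z)                  # A dx/dz
assert np.allclose(lhs, rhs, atol=1e-5)
```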
Proposition 7 Let the scalar $\alpha$ be defined by
$$
\alpha=\mathbf{y}^{\top} \mathbf{A x}
$$
where $\mathbf{y}$ is $m \times 1$, $\mathbf{x}$ is $n \times 1$, $\mathbf{A}$ is $m \times n$, and $\mathbf{A}$ is independent of $\mathbf{x}$ and $\mathbf{y}$, then
$$
\frac{\partial \alpha}{\partial \mathbf{x}}=\mathbf{y}^{\top} \mathbf{A}
$$
and
$$
\frac{\partial \alpha}{\partial \mathbf{y}}=\mathbf{x}^{\top} \mathbf{A}^{\top}
$$
Proof: Define
$$
\mathbf{w}^{\top}=\mathbf{y}^{\top} \mathbf{A}
$$
and note that
$$
\alpha=\mathbf{w}^{\top} \mathbf{x}
$$
Hence, by Proposition 5 we have that
$$
\frac{\partial \alpha}{\partial \mathbf{x}}=\mathbf{w}^{\top}=\mathbf{y}^{\top} \mathbf{A}
$$
which is the first result. Since $\alpha$ is a scalar, we can write
$$
\alpha=\alpha^{\top}=\mathbf{x}^{\top} \mathbf{A}^{\top} \mathbf{y}
$$
and applying Proposition 5 as before we obtain
$$
\frac{\partial \alpha}{\partial \mathbf{y}}=\mathbf{x}^{\top} \mathbf{A}^{\top}
$$
q.e.d.
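A finite-difference check of Proposition 7 (illustration only; the helper and random test data are assumptions): the numerical gradients of $\alpha=\mathbf{y}^{\top} \mathbf{A x}$ should match $\mathbf{y}^{\top} \mathbf{A}$ and $\mathbf{x}^{\top} \mathbf{A}^{\top}$.

```python
# Check of Proposition 7: d(alpha)/dx = y^T A and d(alpha)/dy = x^T A^T.
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
y = rng.standard_normal(m)

def grad(f, v, h=1e-6):
    """Row vector of partial derivatives of the scalar f with respect to v."""
    g = np.zeros(v.size)
    for j in range(v.size):
        e = np.zeros_like(v)
        e[j] = h
        g[j] = (f(v + e) - f(v - e)) / (2 * h)
    return g

assert np.allclose(grad(lambda x_: y @ A @ x_, x), y @ A, atol=1e-5)
assert np.allclose(grad(lambda y_: y_ @ A @ x, y), x @ A.T, atol=1e-5)
```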
Proposition 8 For the special case in which the scalar $\alpha$ is given by the quadratic form
$$
\alpha=\mathbf{x}^{\top} \mathbf{A x}
$$
where $\mathbf{x}$ is $n \times 1$, $\mathbf{A}$ is $n \times n$, and $\mathbf{A}$ does not depend on $\mathbf{x}$, then
$$
\frac{\partial \alpha}{\partial \mathbf{x}}=\mathbf{x}^{\top}\left(\mathbf{A}+\mathbf{A}^{\top}\right)
$$
Proof: By definition
$$
\alpha=\sum_{j=1}^{n} \sum_{i=1}^{n} a_{i j} x_{i} x_{j}
$$
Differentiating with respect to the $k$ th element of $\mathbf{x}$ we have
$$
\frac{\partial \alpha}{\partial x_{k}}=\sum_{j=1}^{n} a_{k j} x_{j}+\sum_{i=1}^{n} a_{i k} x_{i}
$$
for all $k=1,2, \ldots, n$, and consequently,
$$
\frac{\partial \alpha}{\partial \mathbf{x}}=\mathbf{x}^{\top} \mathbf{A}^{\top}+\mathbf{x}^{\top} \mathbf{A}=\mathbf{x}^{\top}\left(\mathbf{A}^{\top}+\mathbf{A}\right)
$$
q.e.d.

Proposition 9 For the special case where $\mathbf{A}$ is a symmetric matrix and
$$
\alpha=\mathbf{x}^{\top} \mathbf{A x}
$$
where $\mathbf{x}$ is $n \times 1$, $\mathbf{A}$ is $n \times n$, and $\mathbf{A}$ does not depend on $\mathbf{x}$, then
$$
\frac{\partial \alpha}{\partial \mathbf{x}}=2 \mathbf{x}^{\top} \mathbf{A}
$$
Proof: This is an obvious application of Proposition 8. q.e.d.
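A numerical sketch of my own covering Propositions 8 and 9 (random test data and step size are assumptions): the gradient of the quadratic form is $\mathbf{x}^{\top}(\mathbf{A}+\mathbf{A}^{\top})$, which collapses to $2 \mathbf{x}^{\top} \mathbf{A}$ when $\mathbf{A}$ is symmetric.

```python
# Check of Propositions 8 and 9: gradients of the quadratic form x^T A x.
import numpy as np

rng = np.random.default_rng(6)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

def grad(f, v, h=1e-6):
    g = np.zeros(v.size)
    for j in range(v.size):
        e = np.zeros_like(v)
        e[j] = h
        g[j] = (f(v + e) - f(v - e)) / (2 * h)
    return g

assert np.allclose(grad(lambda x_: x_ @ A @ x_, x), x @ (A + A.T), atol=1e-5)

S = A + A.T                         # a symmetric test matrix for Proposition 9
assert np.allclose(grad(lambda x_: x_ @ S @ x_, x), 2 * x @ S, atol=1e-5)
```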
Proposition 10 Let the scalar $\alpha$ be defined by
$$
\alpha=\mathbf{y}^{\top} \mathbf{x}
$$
where $\mathbf{y}$ is $n \times 1$, $\mathbf{x}$ is $n \times 1$, and both $\mathbf{y}$ and $\mathbf{x}$ are functions of the vector $\mathbf{z}$. Then
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=\mathbf{x}^{\top} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\mathbf{y}^{\top} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Proof: We have
$$
\alpha=\sum_{j=1}^{n} x_{j} y_{j}
$$
Differentiating with respect to the $k$ th element of $\mathbf{z}$ we have
$$
\frac{\partial \alpha}{\partial z_{k}}=\sum_{j=1}^{n}\left(x_{j} \frac{\partial y_{j}}{\partial z_{k}}+y_{j} \frac{\partial x_{j}}{\partial z_{k}}\right)
$$
for all $k=1,2, \ldots, n$, and consequently,
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=\frac{\partial \alpha}{\partial \mathbf{y}} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\frac{\partial \alpha}{\partial \mathbf{x}} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}=\mathbf{x}^{\top} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\mathbf{y}^{\top} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
q.e.d.
Proposition 11 Let the scalar $\alpha$ be defined by
$$
\alpha=\mathbf{x}^{\top} \mathbf{x}
$$
where $\mathbf{x}$ is $n \times 1$, and $\mathbf{x}$ is a function of the vector $\mathbf{z}$. Then
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=2 \mathbf{x}^{\top} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Proof: This is an obvious application of Proposition 10. q.e.d.
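A sketch of mine checking Propositions 10 and 11 numerically; the maps $\mathbf{x}(\mathbf{z})$ and $\mathbf{y}(\mathbf{z})$ below are arbitrary smooth examples chosen for the test, not part of the notes.

```python
# Check of Propositions 10 and 11: product rule for y^T x and the special case x^T x.
import numpy as np

rng = np.random.default_rng(7)
z = rng.standard_normal(2)

def x_of_z(z):
    return np.array([z[0] ** 2, z[1], z[0] * z[1]])

def y_of_z(z):
    return np.array([np.sin(z[0]), z[0] + z[1], z[1] ** 3])

def jac(f, z, h=1e-6):
    f0 = np.atleast_1d(f(z))
    J = np.zeros((f0.size, z.size))
    for j in range(z.size):
        e = np.zeros_like(z)
        e[j] = h
        J[:, j] = (np.atleast_1d(f(z + e)) - np.atleast_1d(f(z - e))) / (2 * h)
    return J

x, y = x_of_z(z), y_of_z(z)
# d(y^T x)/dz = x^T dy/dz + y^T dx/dz   (Proposition 10)
assert np.allclose(jac(lambda z_: y_of_z(z_) @ x_of_z(z_), z),
                   x @ jac(y_of_z, z) + y @ jac(x_of_z, z), atol=1e-5)
# d(x^T x)/dz = 2 x^T dx/dz             (Proposition 11)
assert np.allclose(jac(lambda z_: x_of_z(z_) @ x_of_z(z_), z),
                   2 * x @ jac(x_of_z, z), atol=1e-5)
```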
Proposition 12 Let the scalar $\alpha$ be defined by
$$
\alpha=\mathbf{y}^{\top} \mathbf{A x}
$$
where $\mathbf{y}$ is $m \times 1$, $\mathbf{x}$ is $n \times 1$, $\mathbf{A}$ is $m \times n$, and both $\mathbf{y}$ and $\mathbf{x}$ are functions of the vector $\mathbf{z}$, while $\mathbf{A}$ does not depend on $\mathbf{z}$. Then
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=\mathbf{x}^{\top} \mathbf{A}^{\top} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\mathbf{y}^{\top} \mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Proof: Define
$$
\mathbf{w}^{\top}=\mathbf{y}^{\top} \mathbf{A}
$$
and note that
$$
\alpha=\mathbf{w}^{\top} \mathbf{x}
$$
Applying Proposition 10 we have
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=\mathbf{x}^{\top} \frac{\partial \mathbf{w}}{\partial \mathbf{z}}+\mathbf{w}^{\top} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Substituting back in for $\mathbf{w}$ we arrive at
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=\frac{\partial \alpha}{\partial \mathbf{y}} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\frac{\partial \alpha}{\partial \mathbf{x}} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}=\mathbf{x}^{\top} \mathbf{A}^{\top} \frac{\partial \mathbf{y}}{\partial \mathbf{z}}+\mathbf{y}^{\top} \mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
q.e.d.
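A numerical check of Proposition 12 (my own sketch; the maps $\mathbf{x}(\mathbf{z})$ and $\mathbf{y}(\mathbf{z})$ and the random $\mathbf{A}$ are assumed test examples).

```python
# Check of Proposition 12: d(y^T A x)/dz = x^T A^T dy/dz + y^T A dx/dz.
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((3, 2))
z = rng.standard_normal(2)

def x_of_z(z):
    return np.array([z[0] * z[1], z[0] ** 2])

def y_of_z(z):
    return np.array([np.cos(z[1]), z[0], z[0] + z[1] ** 2])

def jac(f, z, h=1e-6):
    f0 = np.atleast_1d(f(z))
    J = np.zeros((f0.size, z.size))
    for j in range(z.size):
        e = np.zeros_like(z)
        e[j] = h
        J[:, j] = (np.atleast_1d(f(z + e)) - np.atleast_1d(f(z - e))) / (2 * h)
    return J

x, y = x_of_z(z), y_of_z(z)
lhs = jac(lambda z_: y_of_z(z_) @ A @ x_of_z(z_), z)
rhs = x @ A.T @ jac(y_of_z, z) + y @ A @ jac(x_of_z, z)
assert np.allclose(lhs, rhs, atol=1e-5)
```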
Proposition 13 Let the scalar $\alpha$ be defined by the quadratic form
$$
\alpha=\mathbf{x}^{\top} \mathbf{A x}
$$
where $\mathbf{x}$ is $n \times 1$, $\mathbf{A}$ is $n \times n$, and $\mathbf{x}$ is a function of the vector $\mathbf{z}$, while $\mathbf{A}$ does not depend on $\mathbf{z}$. Then
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=\mathbf{x}^{\top}\left(\mathbf{A}+\mathbf{A}^{\top}\right) \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Proof: This is an obvious application of Proposition 12. q.e.d.
Proposition 14 For the special case where $\mathbf{A}$ is a symmetric matrix and
$$
\alpha=\mathbf{x}^{\top} \mathbf{A} \mathbf{x}
$$
where $\mathbf{x}$ is $n \times 1$, $\mathbf{A}$ is $n \times n$, and $\mathbf{x}$ is a function of the vector $\mathbf{z}$, while $\mathbf{A}$ does not depend on $\mathbf{z}$. Then
$$
\frac{\partial \alpha}{\partial \mathbf{z}}=2 \mathbf{x}^{\top} \mathbf{A} \frac{\partial \mathbf{x}}{\partial \mathbf{z}}
$$
Proof: This is an obvious application of Proposition 13. q.e.d.
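A combined check of Propositions 13 and 14 (illustration only; the map $\mathbf{x}(\mathbf{z})$ and the random test matrix are assumptions).

```python
# Check of Propositions 13 and 14: d(x^T A x)/dz = x^T (A + A^T) dx/dz,
# which becomes 2 x^T A dx/dz when A is symmetric.
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((3, 3))
S = A + A.T                                   # symmetric case for Proposition 14
z = rng.standard_normal(2)

def x_of_z(z):                                # an assumed smooth map z -> x
    return np.array([z[0] ** 2, z[0] * z[1], np.sin(z[1])])

def jac(f, z, h=1e-6):
    f0 = np.atleast_1d(f(z))
    J = np.zeros((f0.size, z.size))
    for j in range(z.size):
        e = np.zeros_like(z)
        e[j] = h
        J[:, j] = (np.atleast_1d(f(z + e)) - np.atleast_1d(f(z - e))) / (2 * h)
    return J

x, Jx = x_of_z(z), jac(x_of_z, z)
assert np.allclose(jac(lambda z_: x_of_z(z_) @ A @ x_of_z(z_), z),
                   x @ (A + A.T) @ Jx, atol=1e-5)
assert np.allclose(jac(lambda z_: x_of_z(z_) @ S @ x_of_z(z_), z),
                   2 * x @ S @ Jx, atol=1e-5)
```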
Definition 5 Let $\mathbf{A}$ be an $m \times n$ matrix whose elements are functions of the scalar parameter $\alpha$. Then the derivative of the matrix $\mathbf{A}$ with respect to the scalar parameter $\alpha$ is the $m \times n$ matrix of element-by-element derivatives:
$$
\frac{\partial \mathbf{A}}{\partial \alpha}=\left[\begin{array}{cccc}
\frac{\partial a_{11}}{\partial \alpha} & \frac{\partial a_{12}}{\partial \alpha} & \ldots & \frac{\partial a_{1 n}}{\partial \alpha} \\
\frac{\partial a_{21}}{\partial \alpha} & \frac{\partial a_{22}}{\partial \alpha} & \ldots & \frac{\partial a_{2 n}}{\partial \alpha} \\
\vdots & \vdots & & \vdots \\
\frac{\partial a_{m 1}}{\partial \alpha} & \frac{\partial a_{m 2}}{\partial \alpha} & \ldots & \frac{\partial a_{m n}}{\partial \alpha}
\end{array}\right]
$$
Proposition 15 Let $\mathbf{A}$ be a nonsingular, $m \times m$ matrix whose elements are functions of the scalar parameter $\alpha$. Then
$$
\frac{\partial \mathbf{A}^{-1}}{\partial \alpha}=-\mathbf{A}^{-1} \frac{\partial \mathbf{A}}{\partial \alpha} \mathbf{A}^{-1}
$$
Proof: Start with the definition of the inverse
$$
\mathbf{A}^{-1} \mathbf{A}=\mathbf{I}
$$
and differentiate, yielding
$$
\mathbf{A}^{-1} \frac{\partial \mathbf{A}}{\partial \alpha}+\frac{\partial \mathbf{A}^{-1}}{\partial \alpha} \mathbf{A}=\mathbf{0}
$$
rearranging the terms yields
$$
\frac{\partial \mathbf{A}^{-1}}{\partial \alpha}=-\mathbf{A}^{-1} \frac{\partial \mathbf{A}}{\partial \alpha} \mathbf{A}^{-1}
$$
q.e.d.
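Finally, a numerical sketch of my own for Proposition 15; the matrix family $\mathbf{A}(\alpha)$ below is an arbitrary smooth, nonsingular example assumed for the test.

```python
# Check of Proposition 15: d(A^-1)/d(alpha) = -A^-1 (dA/d(alpha)) A^-1.
import numpy as np
from numpy.linalg import inv

def A_of(alpha):                               # an assumed smooth matrix family
    return np.array([[2.0 + alpha,  np.sin(alpha)],
                     [alpha ** 2,   3.0 + np.cos(alpha)]])

def dA_of(alpha):                              # its exact element-by-element derivative
    return np.array([[1.0,          np.cos(alpha)],
                     [2.0 * alpha, -np.sin(alpha)]])

alpha, h = 0.7, 1e-6
lhs = (inv(A_of(alpha + h)) - inv(A_of(alpha - h))) / (2 * h)   # numerical d(A^-1)/d(alpha)
rhs = -inv(A_of(alpha)) @ dA_of(alpha) @ inv(A_of(alpha))
assert np.allclose(lhs, rhs, atol=1e-5)
```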
\section{References}
- Dhrymes, Phoebus J., 1978, Mathematics for Econometrics, Springer-Verlag, New York, 136 pp.
- Golub, Gene H., and Charles F. Van Loan, 1983, Matrix Computations, Johns Hopkins University Press, Baltimore, Maryland, 476 pp.
- Graybill, Franklin A., 1983, Matrices with Applications in Statistics, 2nd Edition, Wadsworth International Group, Belmont, California, 461 pp.