8. Multivariate Calculus#

Multivariate calculus is the branch of calculus that deals with functions of more than one variable. It is essential for analyzing systems in higher dimensions, such as in optimization, machine learning, and physics. In this section, we will cover key topics such as partial derivatives, gradient vectors, and optimization.

8.1 Functions of Several Variables#

A multivariable function is a function that takes two or more variables as input. For example, a function f(x,y) in two variables x and y is written as:
[
f: \mathbb{R}^2 \to \mathbb{R}, \quad f(x, y)
]
Similarly, a function in three variables is written as f(x,y,z).

8.2 Partial Derivatives#

A partial derivative is the derivative of a multivariable function with respect to one variable, while keeping the other variables constant. The partial derivative of f(x,y) with respect to x is denoted by:
[
\frac{\partial f}{\partial x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x}
]
This gives the rate of change of f with respect to x, holding y constant.

Example:
Let f(x, y) = x² + y².
- The partial derivative with respect to x is:
[
\frac{\partial f}{\partial x} = 2x
]
- The partial derivative with respect to y is:
[
\frac{\partial f}{\partial y} = 2y
]
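As a quick symbolic check of these partial derivatives (a minimal sketch, assuming SymPy is available):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2

# Differentiate with respect to one variable, treating the other as a constant.
df_dx = sp.diff(f, x)   # 2*x
df_dy = sp.diff(f, y)   # 2*y
print(df_dx, df_dy)
```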

8.3 The Gradient Vector#

The gradient of a function f(x₁, x₂, …, xₙ) is a vector of its partial derivatives. It points in the direction of the greatest rate of increase of the function. The gradient is denoted by ∇f or grad(f), and is defined as:
[
\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right)
]
In two dimensions, the gradient is a vector of the partial derivatives with respect to x and y:
[
\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)
]

Example:
For the function f(x, y) = x² + y², the gradient is:
[
\nabla f(x, y) = \left( 2x, 2y \right)
]
This gradient points away from the origin, and its magnitude increases with distance from the origin.
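The gradient can also be approximated numerically with central differences; the sketch below (assuming NumPy; the helper `numerical_gradient` is an illustrative name, not from the text) checks the result for f(x, y) = x² + y²:

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + y**2

def numerical_gradient(func, p, h=1e-6):
    """Central-difference approximation of the gradient of func at point p."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h
        grad[i] = (func(p + step) - func(p - step)) / (2 * h)
    return grad

print(numerical_gradient(f, [1.0, 2.0]))   # approximately (2, 4) = (2x, 2y) at (1, 2)
```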

8.4 Directional Derivative#

The directional derivative of a function f at a point (x₀, y₀) in the direction of a unit vector v = (v₁, v₂) is the rate at which f changes as you move from (x₀, y₀) in the direction of v. It is given by the dot product of the gradient and the direction vector:
[
D_{\mathbf{v}} f(x_0, y_0) = \nabla f(x_0, y_0) \cdot \mathbf{v}
]

Example:
Let f(x, y) = x² + y², and compute the directional derivative at (1, 2) in the direction of v = (3, 4):
1. Compute the gradient: ∇f(x, y) = (2x, 2y).
At (1, 2), ∇f(1, 2) = (2, 4).
2. Normalize the direction vector v = (3, 4):
[
\mathbf{v} = \frac{(3, 4)}{\sqrt{3^2 + 4^2}} = \frac{(3, 4)}{5}
]
3. Compute the directional derivative:
[
D_{\mathbf{v}} f(1, 2) = (2, 4) \cdot \left( \frac{3}{5}, \frac{4}{5} \right) = \frac{6}{5} + \frac{16}{5} = \frac{22}{5}
]
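The same computation can be carried out numerically (a small sketch, assuming NumPy):

```python
import numpy as np

grad = np.array([2.0, 4.0])        # gradient of f(x, y) = x^2 + y^2 at (1, 2)
v = np.array([3.0, 4.0])
v_unit = v / np.linalg.norm(v)     # normalize to a unit vector

print(grad @ v_unit)               # 4.4, i.e. 22/5
```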

8.5 Higher-Order Partial Derivatives#

In multivariable calculus, we can also compute higher-order partial derivatives, which involve differentiating a function more than once with respect to one or more variables.

  • The mixed second-order partial derivative of f(x, y) with respect to x and y is denoted as ∂²f/∂x∂y.
  • The second-order partial derivative with respect to the same variable is denoted as ∂²f/∂x² or ∂²f/∂y².

Example:
Let f(x, y) = x²y + y³.
- The first-order partial derivative with respect to x is:
[
\frac{\partial f}{\partial x} = 2xy
]
- The second-order partial derivative with respect to x and y is:
[
\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial y}\left(2xy\right) = 2x
]
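A short SymPy check of these derivatives; it also shows that the mixed partials agree regardless of the order of differentiation (Schwarz's theorem, discussed again in the Hessian section):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + y**3

fx  = sp.diff(f, x)       # 2*x*y
fxy = sp.diff(f, x, y)    # 2*x   (differentiate in x, then in y)
fyx = sp.diff(f, y, x)    # 2*x   (same result with the order reversed)
print(fx, fxy, fyx)
```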

8.6 Multivariate Taylor Series#

The Taylor series for a function of multiple variables is an expansion of the function around a point a = (a₁, a₂, …, aₙ). It generalizes the one-variable Taylor series to multiple variables.

The second-order Taylor expansion of f(x,y) around the point (a,b) is given by:
[
f(x, y) \approx f(a, b) + \frac{\partial f}{\partial x}(a, b)(x - a) + \frac{\partial f}{\partial y}(a, b)(y - b) + \frac{1}{2} \left( \frac{\partial^2 f}{\partial x^2}(a, b)(x - a)^2 + 2 \frac{\partial^2 f}{\partial x \partial y}(a, b)(x - a)(y - b) + \frac{\partial^2 f}{\partial y^2}(a, b)(y - b)^2 \right)
]

8.7 Optimization in Multivariate Calculus#

In optimization, we use partial derivatives and the gradient to find the maximum or minimum of a multivariable function. For an unconstrained optimization problem, we find the critical points by setting the gradient equal to zero:
[
\nabla f(x, y) = 0
]
These critical points can then be classified using the second derivative test.

  • If ∇²f(x, y) (the Hessian matrix) is positive definite, the point is a local minimum.
  • If ∇²f(x, y) is negative definite, the point is a local maximum.
  • If ∇²f(x, y) is indefinite, the point is a saddle point.
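The sketch below illustrates this classification, assuming SymPy and NumPy are available; the function f(x, y) = x⁴ + y² − 2xy is a hypothetical example chosen for illustration, not taken from the text:

```python
import numpy as np
import sympy as sp

x, y = sp.symbols('x y')
f = x**4 + y**2 - 2*x*y        # hypothetical example function

# Critical points: solve grad f = 0.
grad = [sp.diff(f, v) for v in (x, y)]
critical_points = sp.solve(grad, [x, y], dict=True)

H = sp.hessian(f, (x, y))
for pt in critical_points:
    Hp = np.array(H.subs(pt).tolist(), dtype=float)
    eigvals = np.linalg.eigvalsh(Hp)
    if np.all(eigvals > 0):
        kind = "local minimum"
    elif np.all(eigvals < 0):
        kind = "local maximum"
    elif np.any(eigvals > 0) and np.any(eigvals < 0):
        kind = "saddle point"
    else:
        kind = "inconclusive (singular Hessian)"
    print(pt, eigvals, kind)
```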

Practice Problems#

  1. Find the partial derivatives of the function f(x, y) = x³y² + 2xy.
  2. Compute the gradient of f(x, y, z) = x² + y² + z².
  3. Find the second-order partial derivatives of f(x, y) = e^(xy).
  4. Compute the directional derivative of f(x, y) = x² + y² at the point (1, 1) in the direction of the vector v = (3, 4).
  5. Use the second derivative test to classify the critical points of f(x, y) = x² + y² − 4xy.

Would you like solutions to these problems, or should we move on to Vector Calculus?


9. Vector Calculus#

Vector calculus is a branch of calculus that deals with vector fields and the differential operations applied to them. It is fundamental in fields like electromagnetism, fluid dynamics, and optimization, especially in machine learning where it helps with understanding gradients and optimization algorithms.

9.1 Vector Fields#

A vector field assigns a vector to each point in a space. For example, in two dimensions, a vector field F is a function F: ℝ² → ℝ², where for each point (x, y), it gives a vector F(x, y) = (F₁(x, y), F₂(x, y)).

In three dimensions, the vector field would be F(x, y, z) = (F₁(x, y, z), F₂(x, y, z), F₃(x, y, z)).

9.2 Gradient, Divergence, and Curl#

There are several fundamental differential operations used in vector calculus: gradient, divergence, and curl. The gradient acts on a scalar field, while divergence and curl act on vector fields.


9.2.1 Gradient of a Scalar Field#

The gradient of a scalar field f(x, y, z) gives a vector field that points in the direction of the greatest rate of increase of f. It is denoted as ∇f (read "del f") and is defined as:
[
\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right)
]

The gradient of a scalar field points in the direction of the steepest ascent and has a magnitude equal to the rate of increase in that direction.

Example:
Let f(x, y) = x² + y².
- The gradient is:
[
\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (2x, 2y)
]
This shows that the direction of steepest ascent is along the vector (2x,2y).


9.2.2 Divergence of a Vector Field#

The divergence of a vector field F = (F₁, F₂, F₃) is a scalar field that measures the rate at which "stuff" is expanding out of a point. The divergence is denoted by ∇·F, and is defined as:
[
\nabla \cdot \mathbf{F} = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}
]
If ∇·F > 0, it indicates that the vector field is "expanding" or "diverging" at that point. If ∇·F < 0, the field is "converging."

Example:
For the vector field F(x, y) = (x², y²):
[
\nabla \cdot \mathbf{F} = \frac{\partial x^2}{\partial x} + \frac{\partial y^2}{\partial y} = 2x + 2y
]
Thus, the divergence of this vector field is 2x+2y.
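A quick symbolic check using SymPy's vector module (a minimal sketch; the planar field is treated as a 3D field with zero z-component):

```python
from sympy.vector import CoordSys3D, divergence

N = CoordSys3D('N')
F = N.x**2 * N.i + N.y**2 * N.j    # F = (x^2, y^2, 0)
print(divergence(F))               # 2*N.x + 2*N.y, i.e. 2x + 2y
```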


9.2.3 Curl of a Vector Field#

The curl of a vector field F = (F₁, F₂, F₃) is a vector field that measures the rotation or swirling of the field around a point. The curl is denoted by ∇×F, and is defined as:
[
\nabla \times \mathbf{F} = \left( \frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}, \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}, \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right)
]
If the curl is zero, it means the vector field has no rotation at that point.

Example:
For the vector field F(x, y, z) = (0, 0, x² + y²):
[
\nabla \times \mathbf{F} = \left( \frac{\partial}{\partial y}(x^2 + y^2) - 0, \; 0 - \frac{\partial}{\partial x}(x^2 + y^2), \; 0 - 0 \right)
]
This results in:
[
\nabla \times \mathbf{F} = (2y, -2x, 0)
]
The curl is nonzero away from the z-axis and lies in the xy-plane, indicating that the field rotates locally about axes in that plane.
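The same curl can be checked symbolically (a minimal sketch using SymPy's vector module):

```python
from sympy.vector import CoordSys3D, curl

N = CoordSys3D('N')
F = (N.x**2 + N.y**2) * N.k        # F = (0, 0, x^2 + y^2)
print(curl(F))                     # 2*N.y*N.i + (-2*N.x)*N.j, i.e. (2y, -2x, 0)
```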


9.3 Line Integrals#

A line integral of a vector field F along a curve C is the integral of the field along the curve. It represents the work done by the field along the path. The line integral is given by:
[
\int_C \mathbf{F} \cdot d\mathbf{r}
]
Where dr is the differential element of the curve C, and F · dr is the dot product of the vector field with the tangent vector to the curve.

Example:
Let F = (x, y) and C be the straight line from (0, 0) to (1, 1). Parametrizing C as r(t) = (t, t) for t in [0, 1], we have dr = (1, 1) dt, so:
[
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_0^1 (t + t) \, dt = \int_0^1 2t \, dt = 1
]
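A numerical version of the same computation (a sketch assuming NumPy), using the parametrization r(t) = (t, t):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)
r = np.stack([t, t], axis=1)       # points on the curve C
F = r.copy()                       # F(x, y) = (x, y) evaluated along the curve
dr_dt = np.array([1.0, 1.0])       # tangent vector r'(t)

integrand = F @ dr_dt              # F(r(t)) . r'(t) = 2t
dt = t[1] - t[0]
approx = np.sum(0.5 * (integrand[:-1] + integrand[1:])) * dt   # trapezoidal rule
print(approx)                      # close to the exact value 1
```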


9.4 Surface Integrals#

A surface integral of a vector field over a surface S measures the flux of the vector field through the surface. It is given by:
[
\int_S \mathbf{F} \cdot d\mathbf{A}
]
Where dA is the differential element of the surface area. The flux through a surface depends on the vector field and the orientation of the surface.


9.5 Fundamental Theorems of Vector Calculus#

9.5.1 Gauss's Divergence Theorem#

Gauss's Divergence Theorem states that the flux of a vector field through a closed surface is equal to the volume integral of the divergence of the field over the region enclosed by the surface:
[
\int_S \mathbf{F} \cdot d\mathbf{A} = \int_V (\nabla \cdot \mathbf{F}) \, dV
]
Where S is the closed surface and V is the volume inside the surface.
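As an illustration (distinct from the practice problems below), the theorem can be checked symbolically for the sample field F = (x², y², z²) over the unit cube, assuming SymPy:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
F = (x**2, y**2, z**2)             # sample field, chosen for illustration

# Volume integral of the divergence over the unit cube [0, 1]^3.
div_F = sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)
volume_integral = sp.integrate(div_F, (x, 0, 1), (y, 0, 1), (z, 0, 1))

# Total outward flux through the six faces of the cube.
flux = (
    sp.integrate(F[0].subs(x, 1) - F[0].subs(x, 0), (y, 0, 1), (z, 0, 1))
    + sp.integrate(F[1].subs(y, 1) - F[1].subs(y, 0), (x, 0, 1), (z, 0, 1))
    + sp.integrate(F[2].subs(z, 1) - F[2].subs(z, 0), (x, 0, 1), (y, 0, 1))
)
print(volume_integral, flux)       # both equal 3
```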

9.5.2 Stokes' Theorem#

Stokes' Theorem relates the surface integral of the curl of a vector field over a surface S to the line integral of the vector field along the boundary curve ∂S of the surface:
[
\int_S (\nabla \times \mathbf{F}) \cdot d\mathbf{A} = \int_{\partial S} \mathbf{F} \cdot d\mathbf{r}
]
Where ∂S is the boundary curve of the surface S.


Practice Problems#

  1. Compute the gradient of the function f(x, y, z) = x² + y² + z².
  2. Find the divergence of the vector field F(x, y, z) = (x, y, z).
  3. Compute the curl of the vector field F(x, y, z) = (yz, xz, xy).
  4. Calculate the line integral of F(x, y) = (x², y²) along the curve C defined by y = x from (0, 0) to (1, 1).
  5. Verify Gauss's Divergence Theorem for a simple vector field F(x, y, z) = (x, y, z) over a cube with edges from (0, 0, 0) to (1, 1, 1).

Would you like solutions to these problems, or should we move on to Jacobian and Hessian?


10. Jacobian and Hessian#

The Jacobian and Hessian are important concepts in multivariable calculus that are used extensively in optimization, machine learning, and the analysis of multivariable functions. They allow us to understand the behavior of functions with respect to several variables, and they play a key role in optimization algorithms like Newton’s method.

10.1 Jacobian Matrix#

The Jacobian matrix is a matrix of all first-order partial derivatives of a vector-valued function. It generalizes the derivative of a scalar-valued function to vector-valued functions. The Jacobian matrix provides insight into how the output of a function changes with respect to each of the inputs.

For a vector-valued function F(x) = (F₁(x₁, x₂, …, xₙ), F₂(x₁, x₂, …, xₙ), …, Fₘ(x₁, x₂, …, xₙ)), the Jacobian is an m × n matrix defined as:
[
J(\mathbf{F}) = \begin{pmatrix} \frac{\partial F_1}{\partial x_1} & \frac{\partial F_1}{\partial x_2} & \cdots & \frac{\partial F_1}{\partial x_n} \\ \frac{\partial F_2}{\partial x_1} & \frac{\partial F_2}{\partial x_2} & \cdots & \frac{\partial F_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial F_m}{\partial x_1} & \frac{\partial F_m}{\partial x_2} & \cdots & \frac{\partial F_m}{\partial x_n} \end{pmatrix}
]
Where each element ∂Fᵢ/∂xⱼ is the partial derivative of the i-th output with respect to the j-th input.

Example:
Consider a function F(x, y) = (x²y, e^(xy)). The Jacobian matrix is:
[
J(\mathbf{F}) = \begin{pmatrix} \frac{\partial}{\partial x}(x^2 y) & \frac{\partial}{\partial y}(x^2 y) \\ \frac{\partial}{\partial x}(e^{xy}) & \frac{\partial}{\partial y}(e^{xy}) \end{pmatrix} = \begin{pmatrix} 2xy & x^2 \\ y e^{xy} & x e^{xy} \end{pmatrix}
]
This Jacobian matrix tells us how the output of the function F changes with respect to changes in x and y.
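SymPy can compute this Jacobian directly (a minimal sketch, assuming SymPy is available):

```python
import sympy as sp

x, y = sp.symbols('x y')
F = sp.Matrix([x**2 * y, sp.exp(x * y)])

J = F.jacobian([x, y])
print(J)   # Matrix([[2*x*y, x**2], [y*exp(x*y), x*exp(x*y)]])
```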


10.2 Determinant of the Jacobian (Jacobian Determinant)#

The determinant of the Jacobian is often used to measure how a function changes locally with respect to its inputs. In the case of a function mapping ℝⁿ → ℝⁿ, the Jacobian determinant gives the scaling factor by which the function locally stretches or compresses volumes.

  • If the Jacobian determinant is positive, the function preserves orientation.
  • If the Jacobian determinant is negative, the function reverses orientation.
  • If the Jacobian determinant is zero, the function locally collapses to a lower-dimensional subspace.

For example, if you have a function f(x, y) = (x² + y², xy), the Jacobian matrix is:
[
J(f) = \begin{pmatrix} 2x & 2y \\ y & x \end{pmatrix}
]
The determinant of this matrix is:
[
\text{det}(J(f)) = 2x \cdot x - 2y \cdot y = 2x^2 - 2y^2
]
This tells you how the function locally scales space at any point (x,y).
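The determinant above can be verified the same way (a minimal sketch, assuming SymPy):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Matrix([x**2 + y**2, x * y])

J = f.jacobian([x, y])             # Matrix([[2*x, 2*y], [y, x]])
print(sp.simplify(J.det()))        # 2*x**2 - 2*y**2
```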


10.3 Hessian Matrix#

The Hessian matrix is a square matrix of second-order partial derivatives of a scalar-valued function. It provides information about the curvature of the function, which is essential for optimization problems. The Hessian matrix is symmetric if the function has continuous second partial derivatives (Schwarz's theorem).

For a scalar-valued function f(x₁, x₂, …, xₙ), the Hessian is an n × n matrix defined as:
[
H(f) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}
]
The element ∂²f/∂xᵢ∂xⱼ is the second partial derivative of f with respect to the i-th and j-th variables.

Example:
Consider the function f(x, y) = x³ + y³ − 3xy. The Hessian matrix is:
[
H(f) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} = \begin{pmatrix} 6x & -3 \\ -3 & 6y \end{pmatrix}
]
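SymPy provides a `hessian` helper that reproduces this matrix (a minimal check):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 + y**3 - 3*x*y

H = sp.hessian(f, (x, y))
print(H)   # Matrix([[6*x, -3], [-3, 6*y]])
```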


10.4 Eigenvalues and Eigenvectors of the Hessian#

The eigenvalues and eigenvectors of the Hessian matrix help to classify critical points in optimization problems.

  • If the Hessian matrix is positive definite (all eigenvalues are positive), the function has a local minimum at that point.
  • If the Hessian is negative definite (all eigenvalues are negative), the function has a local maximum.
  • If the Hessian has both positive and negative eigenvalues, the function has a saddle point at that location.

10.5 Applications in Optimization#

  • Newton's Method: The Jacobian and Hessian are essential in Newton’s method for optimization. Newton’s method iteratively refines guesses for the minimum (or maximum) by using both first and second derivatives:
    [
    \mathbf{x}_{k+1} = \mathbf{x}_k - H^{-1} \nabla f(\mathbf{x}_k)
    ]
    Where ∇f is the gradient and H is the Hessian matrix, both evaluated at the current iterate xₖ (see the sketch after this list).

  • Optimization of Multivariable Functions: The Jacobian and Hessian are used in algorithms like gradient descent and in constrained optimization to analyze the function’s behavior around the point of interest.
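A bare-bones sketch of the Newton update above, assuming NumPy; the quadratic test function and the helper name `newton_minimize` are illustrative choices, not taken from the text:

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-8, max_iter=50):
    """Newton's method: x_{k+1} = x_k - H(x_k)^{-1} grad(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(x), grad(x))   # solve H * step = grad
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Illustrative objective: f(x, y) = (x - 1)^2 + 2*(y + 2)^2, minimum at (1, -2).
grad = lambda p: np.array([2 * (p[0] - 1), 4 * (p[1] + 2)])
hess = lambda p: np.array([[2.0, 0.0], [0.0, 4.0]])
print(newton_minimize(grad, hess, [5.0, 5.0]))     # -> [ 1. -2.]
```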


Practice Problems#

  1. Compute the Jacobian matrix for the function F(x, y) = (x² + y², x + y).
  2. Find the determinant of the Jacobian for f(x, y) = (x² + y², xy).
  3. Calculate the Hessian matrix of f(x, y) = 3x² + 2xy + y².
  4. For the function f(x, y) = x³ + y³ − 3xy, find the eigenvalues and eigenvectors of the Hessian matrix and classify the critical points.
  5. Given f(x, y) = x² + y² − 2xy, use the Hessian to classify the critical point at (1, 1).

Would you like solutions for these problems, or should we proceed to Multivariate Taylor Series?


11. Multivariate Taylor Series#

The Taylor series is a powerful tool used to approximate functions near a specific point using polynomials. In the case of multivariable functions, the multivariate Taylor series expansion provides an approximation of the function around a point x₀ based on its derivatives at that point.

11.1 Taylor Series for Scalar-Valued Functions#

For a scalar-valued function f: ℝⁿ → ℝ, the Taylor series expansion of f at a point x₀ = (x₀₁, x₀₂, …, x₀ₙ) is given by:
[
f(\mathbf{x}) \approx f(\mathbf{x_0}) + \nabla f(\mathbf{x_0}) \cdot (\mathbf{x} - \mathbf{x_0}) + \frac{1}{2} (\mathbf{x} - \mathbf{x_0})^T H(\mathbf{x_0}) (\mathbf{x} - \mathbf{x_0}) + \dots
]
Where:
- ∇f(x₀) is the gradient of f at x₀,
- H(x₀) is the Hessian matrix of second partial derivatives at x₀.

The first term is the value of the function at x₀, the second term involves the gradient (the first-order approximation), and the third term involves the Hessian (the second-order approximation).

11.2 Taylor Expansion in 2 Variables#

For a function of two variables f(x, y), the Taylor expansion at (x₀, y₀) is:
[
f(x, y) \approx f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0) (x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0) (y - y_0) + \frac{1}{2} \left[ \frac{\partial^2 f}{\partial x^2}(x_0, y_0) (x - x_0)^2 + 2 \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) (x - x_0)(y - y_0) + \frac{\partial^2 f}{\partial y^2}(x_0, y_0) (y - y_0)^2 \right] + \dots
]
This expansion uses both first- and second-order partial derivatives of the function at the point (x₀, y₀).

11.3 General Form of Multivariate Taylor Series#

For a function f(x) of n variables, the multivariate Taylor series is:
[
f(\mathbf{x}) = f(\mathbf{x_0}) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(\mathbf{x_0}) (x_i - x_{0i}) + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x_0}) (x_i - x_{0i})(x_j - x_{0j}) + \dots
]
Where x = (x₁, x₂, …, xₙ) and x₀ = (x₀₁, x₀₂, …, x₀ₙ). The first sum is for the gradient (first derivatives), and the second sum is for the Hessian (second derivatives).


11.4 Example of Multivariate Taylor Series Expansion#

Let’s look at an example to see how to apply the multivariate Taylor series:

Example 1:
Consider the function f(x, y) = x² + y², which we want to expand around the point (x₀, y₀) = (1, 1).

  1. Step 1: Compute the function value at (x₀, y₀):
    [
    f(1, 1) = 1^2 + 1^2 = 2
    ]

  2. Step 2: Compute the first derivatives (gradient):
    [
    \frac{\partial f}{\partial x} = 2x, \quad \frac{\partial f}{\partial y} = 2y
    ]
    At (x₀, y₀) = (1, 1), the gradient is:
    [
    \nabla f(1, 1) = (2, 2)
    ]

  3. Step 3: Compute the second derivatives (Hessian):
    [
    \frac{\partial^2 f}{\partial x^2} = 2, \quad \frac{\partial^2 f}{\partial y^2} = 2, \quad \frac{\partial^2 f}{\partial x \partial y} = 0
    ]
    So, the Hessian is:
    [
    H(f) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}
    ]

  4. Step 4: Write the Taylor series expansion:
    [
    f(x, y) \approx f(1, 1) + \nabla f(1, 1) \cdot (x - 1, y - 1) + \frac{1}{2} (x - 1, y - 1) H(f) (x - 1, y - 1)^T
    ]
    Substituting the values:
    [
    f(x, y) \approx 2 + (2, 2) \cdot (x - 1, y - 1) + \frac{1}{2} (x - 1, y - 1) \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} (x - 1, y - 1)^T
    ]
    Expanding the terms:
    [
    f(x, y) \approx 2 + 2(x - 1) + 2(y - 1) + (x - 1)^2 + (y - 1)^2
    ]
    This is the second-order Taylor expansion of f(x,y) around (1,1).
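The expansion above can be checked symbolically (a minimal sketch assuming SymPy); since f is quadratic, the second-order expansion reproduces f exactly:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2
a, b = 1, 1

grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)]).subs({x: a, y: b})
H = sp.hessian(f, (x, y)).subs({x: a, y: b})
d = sp.Matrix([x - a, y - b])

taylor2 = f.subs({x: a, y: b}) + (grad.T * d)[0] + sp.Rational(1, 2) * (d.T * H * d)[0]
print(sp.expand(taylor2))   # x**2 + y**2, matching the expansion above
```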


11.5 Application of Multivariate Taylor Series in Optimization#

In optimization, Taylor series expansions are used to approximate the behavior of functions near a given point. For example, when finding a local minimum or maximum, we often linearize the function (first-order approximation) using the gradient, and sometimes use second-order approximations (Hessian) to determine the nature of the critical points (whether they are minima, maxima, or saddle points).

Example: Newton’s Method in Optimization

Newton’s method uses the second-order Taylor series expansion to find critical points. Given an objective function f(x), the update rule is:
[
\mathbf{x}_{k+1} = \mathbf{x}_k - H^{-1}(\mathbf{x}_k) \nabla f(\mathbf{x}_k)
]
Where ∇f(xₖ) is the gradient and H(xₖ) is the Hessian matrix at xₖ.


Practice Problems#

  1. Compute the second-order Taylor expansion of f(x, y) = x²y + y²x around the point (x₀, y₀) = (1, 1).
  2. For the function f(x, y) = e^(x+y), compute the second-order Taylor series expansion around (x₀, y₀) = (0, 0).
  3. Find the critical points of the function f(x, y) = x³ + y³ − 3xy using the first and second derivatives, and classify them using the Hessian matrix.

Would you like solutions for these problems, or should we proceed to the next topic: Gradient Descent?