8. Multivariate Calculus#
Multivariate calculus is the branch of calculus that deals with functions of more than one variable. It is essential for analyzing systems in higher dimensions, such as in optimization, machine learning, and physics. In this section, we will cover key topics such as partial derivatives, gradient vectors, and optimization.
8.1 Functions of Several Variables#
A multivariable function is a function that takes two or more variables as input. For example, a function of two variables is written as

$$
f: \mathbb{R}^2 \to \mathbb{R}, \quad f(x, y)
$$

Similarly, a function of three variables is written as $f: \mathbb{R}^3 \to \mathbb{R}$, $f(x, y, z)$.
8.2 Partial Derivatives#
A partial derivative is the derivative of a multivariable function with respect to one variable, while keeping the other variables constant. The partial derivative of $f(x, y)$ with respect to $x$ is defined as:

$$
\frac{\partial f}{\partial x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x}
$$

This gives the rate of change of $f$ in the $x$-direction while $y$ is held constant.
Example:
Let $f(x, y) = x^2 + y^2$.

- The partial derivative with respect to $x$ is:

$$
\frac{\partial f}{\partial x} = 2x
$$

- The partial derivative with respect to $y$ is:

$$
\frac{\partial f}{\partial y} = 2y
$$
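To make the example concrete, here is a minimal SymPy sketch that recomputes these partial derivatives symbolically (assuming SymPy is available; the variable names are only illustrative):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2

df_dx = sp.diff(f, x)  # partial derivative with respect to x
df_dy = sp.diff(f, y)  # partial derivative with respect to y

print(df_dx, df_dy)    # 2*x 2*y
```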
8.3 The Gradient Vector#
The gradient of a function $f(x_1, x_2, \dots, x_n)$ is the vector of its partial derivatives:

$$
\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right)
$$

In two dimensions, the gradient is the vector of the partial derivatives with respect to $x$ and $y$:

$$
\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)
$$
Example:
For the function $f(x, y) = x^2 + y^2$, the gradient is:

$$
\nabla f(x, y) = \left( 2x, 2y \right)
$$
This gradient points away from the origin, and its magnitude increases with distance from the origin.
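As a sanity check on the gradient formula, the following sketch approximates $\nabla f$ with central finite differences and compares it to $(2x, 2y)$; the helper `numerical_gradient` is a name introduced here for illustration, not a library function:

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + y**2

def numerical_gradient(func, p, h=1e-6):
    """Central-difference approximation of the gradient of func at p."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = h
        grad[i] = (func(p + step) - func(p - step)) / (2 * h)
    return grad

print(numerical_gradient(f, [1.0, 2.0]))  # approximately [2. 4.], i.e. (2x, 2y) at (1, 2)
```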
8.4 Directional Derivative#
The directional derivative of a function $f$ at a point $(x_0, y_0)$ in the direction of a unit vector $\mathbf{v}$ measures the rate of change of $f$ in that direction. It is the dot product of the gradient with $\mathbf{v}$:

$$
D_{\mathbf{v}} f(x_0, y_0) = \nabla f(x_0, y_0) \cdot \mathbf{v}
$$
Example:
Let $f(x, y) = x^2 + y^2$. We compute the directional derivative at the point $(1, 2)$ in the direction of the vector $(3, 4)$.

1. Compute the gradient: $\nabla f(x, y) = (2x, 2y)$.
At $(1, 2)$, the gradient is $\nabla f(1, 2) = (2, 4)$.

2. Normalize the direction vector $(3, 4)$:

$$
\mathbf{v} = \frac{(3, 4)}{\sqrt{3^2 + 4^2}} = \frac{(3, 4)}{5}
$$
3. Compute the directional derivative:
$$
D_{\mathbf{v}} f(1, 2) = (2, 4) \cdot \left( \frac{3}{5}, \frac{4}{5} \right) = \frac{6}{5} + \frac{16}{5} = \frac{22}{5}
$$
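A quick numerical check of this result (a small NumPy sketch; the value should come out to $22/5 = 4.4$):

```python
import numpy as np

grad = np.array([2.0, 4.0])        # gradient of f(x, y) = x^2 + y^2 at (1, 2)
v = np.array([3.0, 4.0])
v_unit = v / np.linalg.norm(v)     # normalize the direction vector

print(grad @ v_unit)               # 4.4, i.e. 22/5
```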
8.5 Higher-Order Partial Derivatives#
In multivariable calculus, we can also compute higher-order partial derivatives, which involve differentiating a function more than once with respect to one or more variables.
- The second-order mixed partial derivative of $f$ with respect to $x$ and $y$ is denoted $\frac{\partial^2 f}{\partial x \partial y}$.
- The second-order partial derivative with respect to the same variable twice is denoted $\frac{\partial^2 f}{\partial x^2}$ or $\frac{\partial^2 f}{\partial y^2}$.
Example:
Let $f(x, y) = x^2 y + y^3$.

- The first-order partial derivative with respect to $x$ is:

$$
\frac{\partial f}{\partial x} = 2xy
$$

- The second-order mixed partial derivative, obtained by differentiating $\frac{\partial f}{\partial x} = 2xy$ with respect to $y$, is:

$$
\frac{\partial^2 f}{\partial y \partial x} = 2x
$$

(For comparison, $\frac{\partial f}{\partial y} = x^2 + 3y^2$, and differentiating it with respect to $x$ gives the same mixed partial, $\frac{\partial^2 f}{\partial x \partial y} = 2x$, as guaranteed by the symmetry of second derivatives.)
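The following SymPy sketch reproduces these derivatives symbolically (assuming SymPy is installed):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + y**3

print(sp.diff(f, x))      # 2*x*y : first-order partial in x
print(sp.diff(f, x, y))   # 2*x   : mixed second-order partial
print(sp.diff(f, y, 2))   # 6*y   : second-order partial in y
```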
8.6 Multivariate Taylor Series#
The Taylor series for a function of multiple variables is an expansion of the function around a point $(a, b)$ in terms of its partial derivatives at that point.

The second-order Taylor expansion of $f(x, y)$ around $(a, b)$ is:

$$
f(x, y) \approx f(a, b) + \frac{\partial f}{\partial x}(a, b)(x - a) + \frac{\partial f}{\partial y}(a, b)(y - b) + \frac{1}{2} \left( \frac{\partial^2 f}{\partial x^2}(a, b)(x - a)^2 + 2 \frac{\partial^2 f}{\partial x \partial y}(a, b)(x - a)(y - b) + \frac{\partial^2 f}{\partial y^2}(a, b)(y - b)^2 \right)
$$
8.7 Optimization in Multivariate Calculus#
In optimization, we use partial derivatives and the gradient to find the maximum or minimum of a multivariable function. For an unconstrained optimization problem, we find the critical points by setting the gradient equal to zero:
$$
\nabla f(x, y) = 0
$$
These critical points can then be classified using the second derivative test.
- If $H$ (the Hessian matrix) is positive definite at the critical point, the point is a local minimum.
- If $H$ is negative definite, the point is a local maximum.
- If $H$ is indefinite, the point is a saddle point.
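As a minimal sketch of this recipe, the code below solves $\nabla f = 0$ symbolically and inspects the Hessian at each critical point; the test function $f(x, y) = x^2 + y^2$ is an illustrative assumption, chosen because its unique critical point is a minimum:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**2 + y**2

grad = [sp.diff(f, v) for v in (x, y)]
critical_points = sp.solve(grad, (x, y), dict=True)  # solve grad f = 0
H = sp.hessian(f, (x, y))

for cp in critical_points:
    eigvals = list(H.subs(cp).eigenvals())           # eigenvalues of the Hessian at cp
    print(cp, eigvals)  # {x: 0, y: 0} [2] -> positive definite: local minimum
```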
Practice Problems#
- Find the partial derivatives of a given function $f(x, y)$.
- Compute the gradient of $f(x, y)$.
- Find the second-order partial derivatives of $f(x, y)$.
- Compute the directional derivative of $f(x, y)$ at a given point in the direction of a given vector.
- Use the second derivative test to classify the critical points of $f(x, y)$.
9. Vector Calculus#
Vector calculus is a branch of calculus that deals with vector fields and the differential operations applied to them. It is fundamental in fields like electromagnetism, fluid dynamics, and optimization, especially in machine learning where it helps with understanding gradients and optimization algorithms.
9.1 Vector Fields#
A vector field assigns a vector to each point in a space. For example, in two dimensions, a vector field $\mathbf{F}(x, y) = \left( F_1(x, y), F_2(x, y) \right)$ assigns a two-dimensional vector to each point $(x, y)$.
In three dimensions, the vector field is written as $\mathbf{F}(x, y, z) = \left( F_1(x, y, z), F_2(x, y, z), F_3(x, y, z) \right)$.
9.2 Gradient, Divergence, and Curl#
There are several fundamental differential operations used in vector calculus: gradient, divergence, and curl. Each is associated with specific types of vector fields.
9.2.1 Gradient of a Scalar Field#
The gradient of a scalar field $f(x, y, z)$ is the vector field of its partial derivatives:

$$
\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right)
$$
The gradient of a scalar field points in the direction of the steepest ascent and has a magnitude equal to the rate of increase in that direction.
Example:
Let $f(x, y) = x^2 + y^2$.

- The gradient is:

$$
\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (2x, 2y)
$$

This shows that the direction of steepest ascent at $(x, y)$ is along the vector $(2x, 2y)$, i.e., radially away from the origin.
9.2.2 Divergence of a Vector Field#
The divergence of a vector field $\mathbf{F} = (F_1, F_2, F_3)$ is a scalar field that measures the net rate at which the field flows out of a point:

$$
\nabla \cdot \mathbf{F} = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}
$$

If $\nabla \cdot \mathbf{F} > 0$ at a point, the point acts as a source; if $\nabla \cdot \mathbf{F} < 0$, it acts as a sink.
Example:
For the vector field $\mathbf{F}(x, y) = (x^2, y^2)$, the divergence is:

$$
\nabla \cdot \mathbf{F} = \frac{\partial x^2}{\partial x} + \frac{\partial y^2}{\partial y} = 2x + 2y
$$

Thus, the divergence of this vector field is $2x + 2y$.
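A one-line symbolic check of this divergence (SymPy sketch):

```python
import sympy as sp

x, y = sp.symbols('x y')
F = (x**2, y**2)                             # the vector field from the example

print(sp.diff(F[0], x) + sp.diff(F[1], y))   # 2*x + 2*y
```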
9.2.3 Curl of a Vector Field#
The curl of a vector field $\mathbf{F} = (F_1, F_2, F_3)$ is a vector field that measures the local rotation of $\mathbf{F}$:

$$
\nabla \times \mathbf{F} = \left( \frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}, \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}, \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right)
$$

If the curl is zero, the vector field has no rotation at that point.
Example:
For the vector field $\mathbf{F}(x, y, z) = (0, 0, x^2 + y^2)$:

$$
\nabla \times \mathbf{F} = \left( \frac{\partial}{\partial y}(x^2 + y^2) - 0, \; 0 - \frac{\partial}{\partial x}(x^2 + y^2), \; 0 - 0 \right)
$$

This results in:

$$
\nabla \times \mathbf{F} = (2y, -2x, 0)
$$

The curl is nonzero except on the $z$-axis, so the field rotates locally; the axis of rotation at each point lies in the $xy$-plane.
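The component formula for the curl can be checked symbolically; the sketch below uses SymPy and the field from the example, with the third component carrying $x^2 + y^2$:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
F1, F2, F3 = sp.Integer(0), sp.Integer(0), x**2 + y**2   # F = (0, 0, x^2 + y^2)

curl = (sp.diff(F3, y) - sp.diff(F2, z),
        sp.diff(F1, z) - sp.diff(F3, x),
        sp.diff(F2, x) - sp.diff(F1, y))
print(curl)   # (2*y, -2*x, 0)
```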
9.3 Line Integrals#
A line integral of a vector field $\mathbf{F}$ along a curve $C$ is defined as:

$$
\int_C \mathbf{F} \cdot d\mathbf{r}
$$

Where $d\mathbf{r}$ is an infinitesimal displacement vector along the curve.
Example:
Let $\mathbf{F}(x, y) = (x + y, 0)$ and let $C$ be the line segment $y = x$ from $(0, 0)$ to $(1, 1)$. Only the first component of $\mathbf{F}$ contributes, so

$$
\int_C \mathbf{F} \cdot d\mathbf{r} = \int_0^1 (x + y) \, dx
$$

This can be computed by parametrizing the line as $x = t$, $y = t$ for $t \in [0, 1]$, which gives $\int_0^1 2t \, dt = 1$.
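Here is a small numerical sketch of this line integral under the same assumptions ($\mathbf{F}(x, y) = (x + y, 0)$ along $r(t) = (t, t)$); it approximates $\int_0^1 \mathbf{F}(r(t)) \cdot r'(t)\, dt$ with the trapezoid rule and should return a value close to 1:

```python
import numpy as np

def F(x, y):
    return np.array([x + y, 0.0])

t = np.linspace(0.0, 1.0, 1001)
dr_dt = np.array([1.0, 1.0])                       # derivative of r(t) = (t, t)
integrand = np.array([F(ti, ti) @ dr_dt for ti in t])

# Trapezoid rule for the 1D integral over t in [0, 1]
integral = np.sum((integrand[:-1] + integrand[1:]) / 2 * np.diff(t))
print(integral)                                    # approximately 1.0
```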
9.4 Surface Integrals#
A surface integral of a vector field over a surface $S$ is defined as:

$$
\int_S \mathbf{F} \cdot d\mathbf{A}
$$

Where $d\mathbf{A}$ is the outward-pointing vector area element of the surface. The surface integral measures the flux of $\mathbf{F}$ through $S$.
9.5 Fundamental Theorems of Vector Calculus#
9.5.1 Gauss's Divergence Theorem#
Gauss's Divergence Theorem states that the flux of a vector field through a closed surface is equal to the volume integral of the divergence of the field over the region enclosed by the surface:
$$
\int_S \mathbf{F} \cdot d\mathbf{A} = \int_V (\nabla \cdot \mathbf{F}) \, dV
$$

Where $S$ is the closed surface bounding the volume $V$, and $d\mathbf{A}$ points outward.
9.5.2 Stokes' Theorem#
Stokes' Theorem relates the surface integral of the curl of a vector field over a surface $S$ to the line integral of the field around the boundary curve of that surface:

$$
\int_S (\nabla \times \mathbf{F}) \cdot d\mathbf{A} = \int_{\partial S} \mathbf{F} \cdot d\mathbf{r}
$$

Where $\partial S$ is the boundary curve of the surface $S$, traversed consistently with the orientation of $S$.
Practice Problems#
- Compute the gradient of a given scalar function.
- Find the divergence of a given vector field.
- Compute the curl of a given vector field.
- Calculate the line integral of a vector field along a given curve between two points.
- Verify Gauss's Divergence Theorem for a simple vector field over a cube.
10. Jacobian and Hessian#
The Jacobian and Hessian are important concepts in multivariable calculus that are used extensively in optimization, machine learning, and the analysis of multivariable functions. They allow us to understand the behavior of functions with respect to several variables, and they play a key role in optimization algorithms like Newton’s method.
10.1 Jacobian Matrix#
The Jacobian matrix is a matrix of all first-order partial derivatives of a vector-valued function. It generalizes the derivative of a scalar-valued function to vector-valued functions. The Jacobian matrix provides insight into how the output of a function changes with respect to each of the inputs.
For a vector-valued function $\mathbf{F}: \mathbb{R}^n \to \mathbb{R}^m$ with components $F_1, \dots, F_m$, the Jacobian matrix is the $m \times n$ matrix of partial derivatives:

$$
J(\mathbf{F}) =
\begin{pmatrix}
\frac{\partial F_1}{\partial x_1} & \cdots & \frac{\partial F_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial F_m}{\partial x_1} & \cdots & \frac{\partial F_m}{\partial x_n}
\end{pmatrix}
$$

Where each element is $J_{ij} = \frac{\partial F_i}{\partial x_j}$.
Example:
Consider the function $\mathbf{F}(x, y) = (x^2 + y^2, \; xy)$. Its Jacobian is:

$$
J(\mathbf{F}) =
\begin{pmatrix}
\frac{\partial F_1}{\partial x} & \frac{\partial F_1}{\partial y} \\
\frac{\partial F_2}{\partial x} & \frac{\partial F_2}{\partial y}
\end{pmatrix}
=
\begin{pmatrix}
2x & 2y \\
y & x
\end{pmatrix}
$$

This Jacobian matrix tells us how each output of the function $\mathbf{F}$ changes with respect to small changes in the inputs $x$ and $y$.
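SymPy can compute this Jacobian directly via `Matrix.jacobian`, which is a convenient check of the hand calculation above (a short sketch):

```python
import sympy as sp

x, y = sp.symbols('x y')
F = sp.Matrix([x**2 + y**2, x * y])   # the vector-valued function from the example

print(F.jacobian([x, y]))             # Matrix([[2*x, 2*y], [y, x]])
```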
10.2 Determinant of the Jacobian (Jacobian Determinant)#
The determinant of the Jacobian is often used to measure how a function changes locally with respect to its inputs. In the case of a function mapping $\mathbb{R}^n \to \mathbb{R}^n$, the Jacobian determinant describes how the function locally scales volumes (areas in two dimensions) and whether it preserves orientation.
- If the Jacobian determinant is positive, the function preserves orientation.
- If the Jacobian determinant is negative, the function reverses orientation.
- If the Jacobian determinant is zero, the function locally collapses to a lower-dimensional subspace.
For example, for the function $f(x, y) = (x^2 + y^2, \; xy)$ above, the Jacobian is:

$$
J(f) =
\begin{pmatrix}
2x & 2y \\
y & x
\end{pmatrix}
$$

The determinant of this matrix is:

$$
\text{det}(J(f)) = 2x \cdot x - 2y \cdot y = 2x^2 - 2y^2
$$

This tells you how the function locally scales area at any point $(x, y)$.
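Continuing the same sketch, the determinant and its sign at a few sample points illustrate the orientation rules above (the sample points are arbitrary choices):

```python
import sympy as sp

x, y = sp.symbols('x y')
J = sp.Matrix([[2*x, 2*y], [y, x]])

det_J = sp.simplify(J.det())
print(det_J)                                      # 2*x**2 - 2*y**2
for px, py in [(2, 1), (1, 2), (1, 1)]:
    print((px, py), det_J.subs({x: px, y: py}))   # 6 (preserves), -6 (reverses), 0 (collapses)
```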
10.3 Hessian Matrix#
The Hessian matrix is a square matrix of second-order partial derivatives of a scalar-valued function. It provides information about the curvature of the function, which is essential for optimization problems. The Hessian matrix is symmetric if the function has continuous second partial derivatives (Schwarz's theorem).
For a scalar-valued function $f(x_1, \dots, x_n)$, the Hessian is the $n \times n$ matrix:

$$
H(f) =
\begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}
$$

The element in row $i$ and column $j$ is $H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$.
Example:
Consider the function $f(x, y) = x^2 + y^2$. Its Hessian is:

$$
H(f) =
\begin{pmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\
\frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2}
\end{pmatrix}
=
\begin{pmatrix}
2 & 0 \\
0 & 2
\end{pmatrix}
$$
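The same matrix can be obtained with SymPy's `hessian` helper (a short symbolic sketch):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2

print(sp.hessian(f, (x, y)))   # Matrix([[2, 0], [0, 2]])
```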
10.4 Eigenvalues and Eigenvectors of the Hessian#
The eigenvalues and eigenvectors of the Hessian matrix help to classify critical points in optimization problems.
- If the Hessian matrix is positive definite (all eigenvalues are positive), the function has a local minimum at that point.
- If the Hessian is negative definite (all eigenvalues are negative), the function has a local maximum.
- If the Hessian has both positive and negative eigenvalues, the function has a saddle point at that location.
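The sketch below turns these rules into a small NumPy helper; `classify_critical_point` is a name introduced here for illustration, not a library function, and it assumes the Hessian has already been evaluated at the critical point:

```python
import numpy as np

def classify_critical_point(hessian):
    """Classify a critical point from the eigenvalues of its (symmetric) Hessian."""
    eigvals = np.linalg.eigvalsh(hessian)
    if np.all(eigvals > 0):
        return "local minimum"
    if np.all(eigvals < 0):
        return "local maximum"
    if np.any(eigvals > 0) and np.any(eigvals < 0):
        return "saddle point"
    return "inconclusive (some eigenvalues are zero)"

print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 2.0]])))    # local minimum
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))   # saddle point
```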
10.5 Applications in Optimization#
- Newton's Method: Newton's method for optimization iteratively refines guesses for the minimum (or maximum) using both first derivatives (the gradient) and second derivatives (the Hessian):

$$
\mathbf{x}_{k+1} = \mathbf{x}_k - H^{-1}(\mathbf{x}_k) \nabla f(\mathbf{x}_k)
$$

  Where $\nabla f(\mathbf{x}_k)$ is the gradient and $H(\mathbf{x}_k)$ is the Hessian matrix at the current iterate; a minimal sketch follows this list.

- Optimization of Multivariable Functions: The Jacobian and Hessian are used in algorithms like gradient descent and in constrained optimization to analyze the function's behavior around the point of interest.
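Below is the minimal Newton-step sketch referenced in the list above. The quadratic objective $f(x, y) = x^2 + y^2$ is an illustrative assumption; because it is exactly quadratic, a single Newton step from any starting point reaches the minimizer $(0, 0)$:

```python
import numpy as np

def grad_f(p):
    x, y = p
    return np.array([2 * x, 2 * y])

def hess_f(p):
    return np.array([[2.0, 0.0], [0.0, 2.0]])

x_k = np.array([3.0, -1.5])                              # initial guess
for _ in range(5):
    # Newton update: x_{k+1} = x_k - H^{-1} grad f(x_k), computed via a linear solve
    x_k = x_k - np.linalg.solve(hess_f(x_k), grad_f(x_k))

print(x_k)                                               # [0. 0.]
```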
Practice Problems#
- Compute the Jacobian matrix of a given vector-valued function $\mathbf{F}(x, y)$.
- Find the determinant of the Jacobian of a given map from $\mathbb{R}^2$ to $\mathbb{R}^2$.
- Calculate the Hessian matrix of a given scalar function $f(x, y)$.
- For a given function, find the eigenvalues and eigenvectors of its Hessian matrix and classify the critical points.
- Given a function, use the Hessian to classify its critical point at a specified location.
11. Multivariate Taylor Series#
The Taylor series is a powerful tool used to approximate functions near a specific point using polynomials. In the case of multivariable functions, the multivariate Taylor series expansion provides an approximation of the function around a point $\mathbf{x_0}$ in terms of the function's partial derivatives at that point.
11.1 Taylor Series for Scalar-Valued Functions#
For a scalar-valued function $f(\mathbf{x})$ of several variables, the Taylor series expansion around a point $\mathbf{x_0}$ is:

$$
f(\mathbf{x}) \approx f(\mathbf{x_0}) + \nabla f(\mathbf{x_0}) \cdot (\mathbf{x} - \mathbf{x_0}) + \frac{1}{2} (\mathbf{x} - \mathbf{x_0})^T H(\mathbf{x_0}) (\mathbf{x} - \mathbf{x_0}) + \dots
$$

Where:

- $\nabla f(\mathbf{x_0})$ is the gradient of $f$ evaluated at $\mathbf{x_0}$.
- $H(\mathbf{x_0})$ is the Hessian matrix of $f$ evaluated at $\mathbf{x_0}$.

The first term is the value of the function at $\mathbf{x_0}$, the second term is the first-order (linear) correction, and the third term is the second-order (quadratic) correction.
11.2 Taylor Expansion in 2 Variables#
For a function of two variables $f(x, y)$, the second-order Taylor expansion around a point $(x_0, y_0)$ is:

$$
f(x, y) \approx f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0) (x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0) (y - y_0) + \frac{1}{2} \left[ \frac{\partial^2 f}{\partial x^2}(x_0, y_0) (x - x_0)^2 + 2 \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) (x - x_0)(y - y_0) + \frac{\partial^2 f}{\partial y^2}(x_0, y_0) (y - y_0)^2 \right] + \dots
$$

This expansion uses both first- and second-order partial derivatives of the function at the point $(x_0, y_0)$.
11.3 General Form of Multivariate Taylor Series#
For a function $f: \mathbb{R}^n \to \mathbb{R}$, the Taylor series expansion around a point $\mathbf{x_0} = (x_{01}, \dots, x_{0n})$ is:

$$
f(\mathbf{x}) = f(\mathbf{x_0}) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(\mathbf{x_0}) (x_i - x_{0i}) + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x_0}) (x_i - x_{0i})(x_j - x_{0j}) + \dots
$$

Where the sums run over all $n$ variables and the omitted terms involve third- and higher-order partial derivatives.
11.4 Example of Multivariate Taylor Series Expansion#
Let’s look at an example to see how to apply the multivariate Taylor series:
Example 1:
Consider the function $f(x, y) = x^2 + y^2$, expanded around the point $(1, 1)$.

- Step 1: Compute the function value at $(1, 1)$:

$$
f(1, 1) = 1^2 + 1^2 = 2
$$

- Step 2: Compute the first derivatives (gradient):

$$
\frac{\partial f}{\partial x} = 2x, \quad \frac{\partial f}{\partial y} = 2y
$$

At $(1, 1)$, the gradient is:

$$
\nabla f(1, 1) = (2, 2)
$$

- Step 3: Compute the second derivatives (Hessian):

$$
\frac{\partial^2 f}{\partial x^2} = 2, \quad \frac{\partial^2 f}{\partial y^2} = 2, \quad \frac{\partial^2 f}{\partial x \partial y} = 0
$$

So, the Hessian is:

$$
H(f) =
\begin{pmatrix}
2 & 0 \\
0 & 2
\end{pmatrix}
$$

- Step 4: Write the Taylor series expansion:

$$
f(x, y) \approx f(1, 1) + \nabla f(1, 1) \cdot (x - 1, y - 1) + \frac{1}{2} (x - 1, y - 1) \, H(f) \, (x - 1, y - 1)^T
$$

Substituting the values:

$$
f(x, y) \approx 2 + (2, 2) \cdot (x - 1, y - 1) + \frac{1}{2} (x - 1, y - 1) \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} (x - 1, y - 1)^T
$$

Expanding the terms:

$$
f(x, y) \approx 2 + 2(x - 1) + 2(y - 1) + (x - 1)^2 + (y - 1)^2
$$

This is the second-order Taylor expansion of $f(x, y) = x^2 + y^2$ around $(1, 1)$. Because $f$ is itself a quadratic polynomial, this second-order expansion is exact.
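A quick numerical comparison of the function and its second-order expansion (an illustrative sketch; the two agree exactly here because $f$ is quadratic):

```python
def f(x, y):
    return x**2 + y**2

def taylor2(x, y):
    # Second-order Taylor polynomial of f around (1, 1), derived above
    return 2 + 2 * (x - 1) + 2 * (y - 1) + (x - 1)**2 + (y - 1)**2

for point in [(1.0, 1.0), (1.2, 0.9), (2.0, -1.0)]:
    print(point, f(*point), taylor2(*point))   # the two values coincide
```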
11.5 Application of Multivariate Taylor Series in Optimization#
In optimization, Taylor series expansions are used to approximate the behavior of functions near a given point. For example, when finding a local minimum or maximum, we often linearize the function (first-order approximation) using the gradient, and sometimes use second-order approximations (Hessian) to determine the nature of the critical points (whether they are minima, maxima, or saddle points).
Example: Newton’s Method in Optimization
Newton’s method uses the second-order Taylor series expansion to find critical points. Given an objective function
[
\mathbf{x}_{k+1} = \mathbf{x}_k - H^{-1}(\mathbf{x}_k) \nabla f(\mathbf{x}_k)
]
Where
Practice Problems#
- Compute the second-order Taylor expansion of a given function $f(x, y)$ around a specified point.
- Repeat the expansion for a second function around a different point.
- Find the critical points of a given function using its first and second derivatives, and classify them using the Hessian matrix.