Introduction

  • Definition (Gradient Flow in Linear Space)

    X is a linear space, and $F:X \to \mathbb{R}$ is smooth. Gradient flow (or steepest descent curve) is a smooth curve $X:\mathbb{R}\to X$such that
    $$
    x'(t)=-\nabla F(x(t)).
    $$

  1. Existence and uniqueness of the solution

    Since many PDEs are in the form of a gradient flow, the analysis can be applied to them.

    Example

    For $X=L^2(\mathbb{R}^n)$, a Hilbert space, and for Dirichlet energy $F(u)=\frac{1}{2}\int|\nabla u(x)|^2dx$, the Heat Equation $\partial_t u=\nabla^2u$ is a gradient flow problem.

  2. Numerical methods and their convergence

    Since gradient flow gradually minimizes $F(x)$, so many optimization methods are related to it, e.g. gradient descent, proximal descent methods, mirror descent.

  3. Generalization to the gradient flow on general metric space

    • The need of viewing PDEs are gradient flows on general metric space, thus wider applicability.

      Example

      • PDEs in the continuity equation form $\partial_t-\nabla\cdot(\rho v)=0$, where $v=\nabla\frac{\delta F}{\delta \rho}$, can be cast as a gradient flow on the space of probabilities with Wasserstein distance.
      • Heat Equation can also be viewed as gradient flow in the Wasserstein space.
    • The need of minimizing functionals on metric space.

      Example

      Optimization w.r.t. probability distributions, e.g. $\min_q KL(q||p)$. Optimization without parameterization is possible! (e.g. Stein Variational Gradient Descent)

Gradient Flow in the Euclidean Space

Variants of Gradient Flow in the Euclidean Space

Variant 1: $F$ is a convex and unnecessarily differentiable:
$$
\left\{\begin{aligned}x'(t)&\in -\partial F(x(t)), \quad a.e. t>0,\\x(0)&=x_0,\end{aligned}\right.
$$
where $x$ is an absolutely continuous curve, and
$$
\partial F(x)=\left\{ p\in\mathbb{R}^n:\forall y \in \mathbb{R}^n, F(y)\geq F(x)+p \cdot (y-x) \right\}.
$$

Theorem

Any two solutions $x_1,x_2$ of the above problem with different initial conditions satisfy $|x_1(t)-x_2(t)| \leq |x_1(0)-x_2(0)|$.

Corollary

For a given initial condition, the above problem has one unique solution.

Variant 2: $F$ is semi-convex ($\lambda$ convex)

Definition $\lambda$-convex function

$F$ is $\lambda$-convex $(\lambda\in \mathbb{R})$ if $F(x)-\frac{\lambda}{2}|x|^2$ is convex.

$$
\left\{\begin{aligned}x'(t)&\in -\partial F(x(t)), \quad a.e. t>0,\\x(0)&=x_0,\end{aligned}\right.
$$

where $x$ is an absolutely continuous curve, and
$$
\partial F(x)=\left\{ p\in\mathbb{R}^n:\forall y \in \mathbb{R}^n, F(y)\geq F(x)+p \cdot (y-x)+\frac{\lambda}{2}|y-x|^2 \right\}.
$$

Theorem

Any two solutions $x_1,x_2$ of the above problem with different initial conditions satisfy $|x_1(t)-x_2(t)|\leq e^{-\lambda t}|x_1(0)-x_2(0)|$.

Corollary

  • For a given initial condition, the above problem has one unique solution.
  • If $\lambda>0$ (strong convex), $F$ has a unique minimizer $x^*$. $x(t)\equiv x^*$ is a solution, so for any solution $x(t)$, $|x(t)-x^*|\leq e^{-\lambda t}|x(0)-x^*|$.

Approximating Curves

Definition (MMS)

Minimizing Movement Scheme (MMS): for a fixed small time step $\tau$, define a sequence $\left\{x_k^\tau\right\}_k$by
$$
x_{k+1}^{\tau}\in arg \min_x F(x)+\frac{|x-x_k^\tau|}{2\tau}.
$$

Importance:

  • Practical numerical method for approximating the curve.
  • Easier generalization to metric space, than $x'=-\nabla F(x)$ itself.

Properties:

  • Existence of solution for mild F(e.g. Lipschitz and lower bounded by $C_1-C_2|x|^2$).
  • $\frac{x_{k+1}^\tau-x_k^\tau}{\tau}\in -\partial F(x_{k+1}^\tau)$: implicit Euler scheme (more stable but hard than explicit one: gradient descent)

Convergence:

  • Define $v_{k+1}^\tau\triangleq\frac{x_{k+1}^\tau-x_k^\tau}{\tau}$, and $v^\tau(t)=v_{k+1}^\tau, t\in\left(k\tau,(k+1)\tau\right]$.

    Define two kinds of interpolations:

    1. $x^\tau(t)=x_k^\tau, t\in\left(k\tau,(k+1)\tau\right]$;
    2. $\tilde{x}^\tau(t)=x_k^\tau+(t-k\tau)v_{k+1}\tau, t\in\left(k\tau,(k+1)\tau\right]$

    $\tilde{x}^\tau$ is continuous and $(\tilde{x}^\tau)'=v^\tau$;

    $x^\tau$ is not continuous, but $v^\tau(t)\in -\partial F(x^\tau(t))$.

Theorem

If $F(x_0)<+\infty$ and $\inf F>-\infty$, then up to a subsequence $\tau_j\to 0$, both $\tilde{x}^{\tau_j}$ and $x^{\tau_j}$ converge uniformly to a same curve $x\in H^1(\mathbb{R}^n)$ and $v^{\tau_j}$ weakly converges in $L^2(\mathbb{R};\mathbb{R}^2)$ to a vector function $v$ s.t. $x'=v$ and

  1. $v(t)\in\partial F(x(t))$ a.e., if $F$ is $\lambda$-convex;
  2. $v(t)=-\nabla F(x(t)). \forall t$. if $F$ is $C^1$.

Details:

  1. $L^p$ space

    • For a measure space $(S,\sum,\mu)$, first define $\mathcal{L}(S;\mathbb{R}^n)\triangleq\left\{f:S\to \mathbb{R}^n\big|\int_S|f|^pd\mu < \infty\right\}$. It is a linear space.

    • Define $L^p(S;\mathbb{R}^n)\triangleq\mathcal{L}(S;\mathbb{R}^n)/\left\{f|f=0\quad \mu-a.e.\right\}$ to be a quotient space (i.e. treat all functions that are equal $\mu$-a.e.) as one same element in $L^p$).

      Define $\|f\|_p\triangleq\left(\int_S|f|^pd\mu\right)^{1/p}$, then for $1\leq p\leq\infty$ it is a Banach space.

    • Only $L^2(S;\mathbb{R}^n)$ can be a Hilbert space, with inner product $\left<f,g\right>_{L^2(S;\mathbb{R}^n)}\triangleq\int_Sfg\;d\mu$.

    • $L^p(S)\triangleq L^p(S;\mathbb{R})$.

  2. Weak convergence in a Hilbert space $\mathcal{H}$:

    • $x_n\in\mathcal{H}, n\geq 1, x\in\mathcal{H},x_n\rightharpoonup x$ is define as :

      $\forall f\in\mathcal{H'},f(x_n)\to f(x).\Longleftrightarrow \forall y \in\mathcal{H},\left<x_n,y\right>_{\mathcal{H}}\to\left<x,y\right>_{\mathcal{H}}$.

    • $x_n\to x\Longrightarrow x_n\rightharpoonup x$.

      $x_n\rightharpoonup x,\|x_n\|\to\|x\|\Longrightarrow x_n \to x$.

      if $\dim(\mathbb{H})\leq\infty, x_n\rightharpoonup x\Longleftrightarrow x_n\to x$.

  3. $H_k(\Omega)$ space $(\Omega \subset \mathbb{R}^n)$

    • Weak derivative. For $\mu \in C^k(\Omega)$ and $\phi \in C_c^{\infty}(\Omega)$ ($\cdot_c$ for compact support),
      $$
      \int_\Omega uD^{\alpha}\phi dx=(-1)^{|\alpha|}\int_{\Omega}\phi D^{\alpha}udx,(Integarl by parts)
      $$
      where $D^\alpha=\partial_{x_1}^{\alpha_1}\cdots\partial_{x_n}^{\alpha_n}$, and $|\alpha|=\sum_{i=1}^n\alpha_i$ is fixed as $k$. So define the weak $\alpha$-th partial derivative of $u$ as $v$:
      $$
      \int_\Omega uD^\alpha\phi dx=(-1)^\alpha\int_\Omega\phi vdx,\forall\phi\in C_{c}^{\infty}(\Omega).
      $$
      If it exists, it is uniquely defined a.e.

    • Sobolev space $W^{k,p}(\Omega)$ for $k \in \mathbb{N}$ and $p\in[1,\infty]$:
      $$
      W^{k,p}(\Omega)=\left\{u\in L^P(\Omega):D^\alpha u\in L^p(\Omega),\forall |\alpha|\leq k\right\},
      $$
      with norm:
      $$
      \|u\|_{w^{k,p}(\Omega)}=\left\{\begin{aligned}&\left(\sum_{|\alpha|\leq k}\|D^\alpha u\|_{L^p(\Omega)}^p\right)^{\frac{1}{p}},\quad 1\leq p<+\infty,\\&\max_{|\alpha|\leq k}\|D^\alpha u\|_{L^\infty(\Omega)},\quad p=+\infty.\end{aligned}\right.
      $$
      $W^{k,p}(\Omega)$ is a Banach space.

    • $H^k(\Omega)\triangleq W^{k,2}(\Omega)$. They are Hilbert space.

  4. Up to a subsequence

    There exists a sequence $\tau_j\to 0$ s.t. $\tilde{x}^{\tau_j}$ uniformly converge and $v^{\tau_j}$ weakly converge.

Characterizing Properties

Motivation

  • $x'=-\nabla F(x)$ (or $x'\in -\partial F(x)$) is hard to generalize to metric space! There is nothing but distance in metric space, so $\nabla F(x)$ or $\partial F(x)$ cannot be defined! (different from manifold)
  • Use two properties of gradient flow that can characterize it and can be generalized to metric space.

Two charactering properties of gradient flow in $\mathbb{R}^d$:

  • Energy Dissipation Equality (EDE) for $F\in C^1(\Omega)$, $\Omega\subset \mathbb{R}^n$:
    $$
    F(x(x))-F(x(t))=\int_s^t\left(\frac{1}{2}|x'(r)|^2+\frac{1}{2}|\nabla F(x(r))|^2\right)dr,\forall 0\leq s<t\leq 1
    $$
    is equivalent to $x'=-\nabla F(x)$. Note it is equivalent even for $"\geq"$(i.e $"\geq"\Longleftrightarrow "="$).

  • Evolution Variational Inequality (EVI) for $\lambda$-convex function $F$:
    $$
    \frac{d}{dt}\frac{1}{2}|x(t)-y|^2\leq F(y)-F(x(t))-\frac{\lambda}{2}|x(t)-y|^2, \forall y\in X
    $$
    is equivalent to $x'(t)\in-\partial F(x(t))$.

    • Sometimes also denoted as $EVI_\lambda$.
    • It is important for establishing the uniqueness and stability of gradient
      flow.