# Calculus in Banach Spaces: Fréchet Derivative.

The study of linear equations, known as linear algebra in the finite-dimensional case and as functional analysis in the general case, has furnished powerful and beautiful techniques to prove the well-posedness of a large and diverse range of problems, such as the weak variational problem discussed in the last part. Results like the Lax-Milgram theorem, the Fredholm alternative and spectral theory are used every day to find, characterize and understand the solutions of linear problems. However, it is not possible (in general) to translate the conditions imposed by linear functional analysis to a nonlinear setting.

As discussed in the previous part, it is reasonable to expect that, in order to obtain good properties of the solutions of nonlinear equations, it is necessary to impose some regularity conditions on the function that defines the equation. In this chapter, in order to take advantage of the strength of linear structures, we develop differential calculus in the general setting of normed vector spaces.

Remark 1 For the sake of clarity we present these results in normed linear spaces, even though a more general approach only requires the structure of Fréchet spaces (complete metrizable locally convex spaces).

We already know from classical differential calculus that if a real-valued function ${f:G\subset {\mathbb R} \rightarrow {\mathbb R}}$ is differentiable on an open set ${G}$, then for any point ${x\in G}$ we can approximate ${f}$ near ${x}$ by an affine function in the following way:

$\displaystyle f(y)\simeq f(x)+f'(x)(y-x) \ \ \ \ \ (1)$

for ${y}$ close enough to ${x}$. Thanks to this approximation, the local behavior of ${f}$ can be understood (up to a small error) by analyzing the linear map ${f'(x):{\mathbb R}\rightarrow {\mathbb R}}$, ${h\mapsto f'(x)h}$. Before defining linear approximations for functions defined on normed linear spaces (NLS), we need a rigorous notion of the error of the approximation.

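Definition 1 (Small o notation) Let ${g:G\subset {\mathbb R} \rightarrow {\mathbb R}}$, where ${G}$ is open and ${0\in G}$.

We say that ${g(h)=o(h)}$ (as ${h\rightarrow 0}$) if ${\lim\limits_{h\rightarrow 0} \frac{g(h)}{h}=0}$. Analogously, for a function ${g}$ defined in a neighborhood of ${0}$ in a NLS ${X}$, we write ${g(h)=o(|| h||_X)}$ if ${\lim\limits_{h\rightarrow 0} \frac{g(h)}{|| h||_X}=0}$.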

Throughout the rest of this section let ${X}$, ${Y}$ and ${Z}$ be normed linear spaces (NLS) and let ${G}$ be an open subset of ${X}$.

Definition 2 (Fréchet derivative) Let ${f:G\subset X \rightarrow Y}$ and ${x\in G}$.

We say that ${f}$ is (Fréchet) differentiable at ${x}$ if there exists a bounded linear operator ${T\in B(X,Y)}$ such that:

$\displaystyle || f(x+h)-f(x)-Th||_Y=o(|| h||_X ) \ \ \ \ \ (2)$

Or equivalently:

$\displaystyle f(x+h)-f(x)-Th=R(x,h),\hspace{2mm} \text{with} \, \lim\limits_{h\rightarrow 0}\frac{|| R(x,h)||_Y}{|| h||_X}=0. \ \ \ \ \ (3)$
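In finite dimensions the Fréchet derivative reduces to the Jacobian matrix, which lets us watch definition (3) at work numerically. The following Python sketch is only an illustration under assumptions of our own: the map ${f:{\mathbb R}^2\rightarrow{\mathbb R}^2}$, its hand-computed Jacobian, the base point and the direction are hypothetical choices, and the Euclidean norm is used on both spaces. It prints the ratio ${|| R(x,h)||/|| h||}$ for shrinking ${h}$.

```python
import math

# Hypothetical example map f : R^2 -> R^2, chosen only for illustration.
def f(x):
    x1, x2 = x
    return (x1 * x1 * x2, math.sin(x1) + x2 ** 3)

# Hand-computed Jacobian of f: in finite dimensions Df(x) is this matrix.
def jacobian(x):
    x1, x2 = x
    return ((2 * x1 * x2, x1 * x1),
            (math.cos(x1), 3 * x2 * x2))

def norm(v):
    # Euclidean norm on R^2.
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(x, h):
    """Return ||f(x+h) - f(x) - Df(x)h|| / ||h||, i.e. ||R(x,h)|| / ||h|| in (3)."""
    fx = f(x)
    fxh = f((x[0] + h[0], x[1] + h[1]))
    T = jacobian(x)
    Th = (T[0][0] * h[0] + T[0][1] * h[1],
          T[1][0] * h[0] + T[1][1] * h[1])
    R = (fxh[0] - fx[0] - Th[0], fxh[1] - fx[1] - Th[1])
    return norm(R) / norm(h)

if __name__ == "__main__":
    x = (1.0, 2.0)
    d = (0.6, -0.8)  # a unit direction
    for t in (1e-1, 1e-2, 1e-3, 1e-4):
        print(t, remainder_ratio(x, (t * d[0], t * d[1])))
```

The printed ratios shrink roughly in proportion to ${|| h||}$, as expected for a smooth map. Sampling a single direction cannot prove differentiability, since (2) asks for the limit over all ${h\rightarrow 0}$; the code only makes the definition concrete.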

Some basic properties of the derivative in NLS are summarized in the following proposition.

Proposition 3 (Properties of the derivative) Let ${g,f:G\subset X\rightarrow Y}$ and let ${\alpha,\beta \in {\mathbb R}}$:

1. Uniqueness. If ${f}$ is differentiable at ${x\in G}$, then there exists a unique ${T\in B(X,Y)}$ that satisfies (2). In this case we denote ${Df(x):=T}$.
2. Locally Lipschitz. If ${f}$ is differentiable at ${x\in G}$, then given ${\varepsilon>0}$ there exists ${\delta>0}$ such that for all ${h\in X}$ with ${|| h||_X<\delta}$ we have:

$\displaystyle || f(x+h)-f(x)||_Y\leq (|| Df(x)||_{B(X,Y)}+\varepsilon)|| h||_X \ \ \ \ \ (4)$

3. Chain rule. Suppose that ${f}$ is differentiable at ${x\in G}$ and let ${\gamma:U\rightarrow Z}$, where ${U}$ is an open subset of ${Y}$ such that ${f(x)\in U}$. If ${\gamma}$ is differentiable at ${f(x)}$, then ${\gamma \circ f}$ is differentiable at ${x}$ and its derivative is given by:

$\displaystyle D(\gamma\circ f)(x)=D\gamma(f(x))\circ Df(x) \ \ \ \ \ (5)$

4. Linearity. If ${f}$ and ${g}$ are differentiable at ${x\in G}$ then ${\alpha f+\beta g}$ is differentiable at ${x}$ and its derivative is given by:

$\displaystyle D(\alpha f+\beta g)(x)=\alpha D f(x)+\beta D g(x) \ \ \ \ \ (6)$
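A finite-dimensional sanity check of the chain rule (5): the sketch below, with hypothetical maps ${f:{\mathbb R}\rightarrow{\mathbb R}^2}$, ${f(s)=(\cos s,s^2)}$, and ${\gamma:{\mathbb R}^2\rightarrow{\mathbb R}}$, ${\gamma(y)=y_1y_2+y_2}$, chosen only for illustration, composes the two derivatives as a row vector times a column vector and compares the result with a central finite difference of ${\gamma\circ f}$.

```python
import math

def f(s):
    # Hypothetical f : R -> R^2
    return (math.cos(s), s * s)

def Df(s):
    # Derivative of f at s, a linear map R -> R^2, stored as a column vector
    return (-math.sin(s), 2 * s)

def gamma(y):
    # Hypothetical gamma : R^2 -> R
    return y[0] * y[1] + y[1]

def Dgamma(y):
    # Derivative of gamma at y, a linear map R^2 -> R, stored as a row vector
    return (y[1], y[0] + 1)

def chain_rule_derivative(s):
    """D(gamma o f)(s) = Dgamma(f(s)) o Df(s), formula (5)."""
    row = Dgamma(f(s))
    col = Df(s)
    return row[0] * col[0] + row[1] * col[1]

def central_difference(s, eps=1e-6):
    """Finite-difference approximation of the derivative of gamma o f at s."""
    return (gamma(f(s + eps)) - gamma(f(s - eps))) / (2 * eps)

if __name__ == "__main__":
    for s in (0.3, 1.0, 2.5):
        print(s, chain_rule_derivative(s), central_difference(s))
```

Here ${\gamma\circ f(s)=s^2\cos s+s^2}$, so both columns should approximate ${-s^2\sin s+2s(\cos s+1)}$.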

Proof:

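1. Let us suppose that ${T,L\in B(X,Y)}$ both satisfy (2), and let us fix ${h}$ in the unit sphere of ${X}$. For ${t>0}$ small enough that ${x+th\in G}$, the linearity of ${T}$ and ${L}$ and the triangle inequality give:

$\displaystyle || Th-Lh||_Y=\frac{|| T(th)-L(th)||_Y}{|| th||_X}$

$\displaystyle \leq \frac{1}{|| th||_X}\left( || f(x+th)-f(x)-T(th)||_Y+|| f(x+th)-f(x)-L(th)||_Y\right)$

By (2), the right-hand side goes to zero as ${t\rightarrow 0^+}$, so ${Th=Lh}$ for every ${h}$ in the unit sphere. Hence ${|| T-L||_{B(X,Y)}=\sup_{|| h||_X=1}|| Th-Lh||_Y=0}$, that is, ${T=L}$.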

2. Let ${\varepsilon>0}$. By the definition of the derivative there exists ${\delta>0}$ such that for ${0<|| h||_X<\delta}$ we have ${x+h\in G}$ and:

$\displaystyle \frac{|| f(x+h)-f(x)||_Y}{|| h||_X}\leq \frac{|| f(x+h)-f(x)-Df(x)h||_Y}{|| h||_X}+|| Df(x)||_{B(X,Y)}$

$\displaystyle \leq \varepsilon +|| Df(x)||_{B(X,Y)}$

where the first inequality follows from the triangle inequality and the bound ${|| Df(x)h||_Y\leq || Df(x)||_{B(X,Y)}|| h||_X}$, and the second one from (2). For ${h=0}$, (4) holds trivially.

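3. Let ${h\in X}$ be small enough that ${x+h\in G}$ and ${f(x+h)\in U}$ (this is possible by the continuity of ${f}$ at ${x}$, which follows from (4)). Since ${\gamma}$ is differentiable at ${f(x)}$, applying definition (3) with the increment ${f(x+h)-f(x)}$ we have:

$\displaystyle \gamma(f(x+h))-\gamma(f(x))=D\gamma(f(x))(f(x+h)-f(x))+R_\gamma(f(x),f(x+h)-f(x))$

Using the differentiability of ${f}$ at ${x}$, that is, ${f(x+h)-f(x)=Df(x)h+R_f(x,h)}$, we get from this:

$\displaystyle \gamma(f(x+h))-\gamma(f(x))=D\gamma(f(x))(Df(x)h)+D\gamma(f(x))(R_f(x,h))+R_\gamma(f(x),f(x+h)-f(x))$

The first term of the right-hand side is the bounded linear operator ${D\gamma(f(x))\circ Df(x)}$ applied to ${h}$, so it is enough to prove that the last two terms are ${o(|| h||_X)}$, i.e. that they effectively constitute the remainder of our definition. The first of these two terms behaves nicely by the boundedness of ${D\gamma(f(x))}$:

$\displaystyle \frac{|| D\gamma(f(x))(R_f(x,h))||_Z}{|| h||_X}\leq || D\gamma(f(x))||_{B(Y,Z)}\frac{|| R_f(x,h)||_Y}{|| h||_X}\rightarrow 0, \hspace{2mm} \text{as}\, h\rightarrow 0.$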

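For the second term a standard ${\varepsilon}$-${\delta}$ argument is necessary, since ${f(x+h)-f(x)}$ can be zero for nonzero values of ${h}$, so we cannot simply divide by its norm.

Let ${\varepsilon>0}$. If ${f(x+h)-f(x)=0}$ for some ${h}$, then, since ${R_\gamma(f(x),0)=0}$, automatically ${\frac{|| R_\gamma(f(x),f(x+h)-f(x))||_Z}{|| h||_X}=0<\varepsilon}$. Otherwise, by the differentiability of ${\gamma}$ at ${f(x)}$, there exists ${\delta_1>0}$ such that for ${0<|| f(x+h)-f(x)||_Y<\delta_1}$ we have:

$\displaystyle \frac{|| R_\gamma(f(x),f(x+h)-f(x))||_Z}{|| f(x+h)-f(x)||_Y}<\frac{\varepsilon}{|| Df(x)||_{B(X,Y)}+1}$

Using the local Lipschitz property (4) with ${\varepsilon=1}$, we know that there exists ${\delta_2>0}$ such that for ${|| h||_X<\delta_2}$ we have:

$\displaystyle || f(x+h)-f(x)||_Y\leq (|| Df(x)||_{B(X,Y)}+1)|| h||_X$

If we take ${0<\delta\leq\delta_2}$ such that ${(|| Df(x)||_{B(X,Y)}+1)\delta<\delta_1}$, we finally get, for any ${0<|| h||_X<\delta}$ with ${f(x+h)\neq f(x)}$:

$\displaystyle \frac{|| R_\gamma(f(x),f(x+h)-f(x))||_Z}{|| h||_X}=\frac{|| R_\gamma(f(x),f(x+h)-f(x))||_Z}{|| f(x+h)-f(x)||_Y}\frac{|| f(x+h)-f(x)||_Y}{|| h||_X}$

$\displaystyle< \frac{\varepsilon}{|| Df(x)||_{B(X,Y)}+1}\,(|| Df(x)||_{B(X,Y)}+1)=\varepsilon$

This shows that the last two terms are ${o(|| h||_X)}$, which proves (5).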

4. Exercise. $\Box$

Definition 4 Let ${f:G\subset X\rightarrow Y}$. We say that ${f\in C^1(G)}$ if ${f}$ is differentiable at each point of ${G}$ and the map ${Df:G\rightarrow B(X,Y)}$ is continuous.

Inductively, for ${k>1}$ we say that ${f\in C^k(G)}$ if ${f\in C^{k-1}(G)}$ and ${D^{k-1}f\in C^{1}(G)}$, and analogously to the real case we define ${C^{\infty}(G):=\bigcap_{k\in {\mathbb N}} C^k(G)}$.

Remark 2 Let ${n\in {\mathbb N}}$ and let ${X_1,\dots, X_n}$ be NLS. Let ${B(X_1\times\dots \times X_n,Y)}$ be the set of bounded multilinear maps ${M:X_1\times\dots \times X_n\rightarrow Y}$, endowed with the usual operator norm (see Brezis’s book). In the case ${X_1=\dots =X_n=X}$ let us denote this set by ${B^n(X,Y)}$.

It is easy to check (exercise!) that for ${n,k\in {\mathbb N}}$ the vector spaces ${B^n(X,B^k(X,Y))}$ and ${B^{n+k}(X,Y)}$ are isometrically isomorphic, and also that, for ${X\neq\{0\}}$, ${B^{n}(X,Y)}$ is a Banach space if and only if ${Y}$ is a Banach space.

Under this consideration we can make the following identification:

${ f : X\rightarrow Y}$

${ Df : X\rightarrow B(X,Y)}$

${ D^2f : X\rightarrow B(X,B(X,Y))=B^2(X,Y)}$

${ \vdots}$

${ D^nf : X\rightarrow B(X,B(X,\cdots B(X,Y)\cdots))=B^n(X,Y)}$
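To make the identification ${B(X,B(X,Y))=B^2(X,Y)}$ concrete, the following sketch (an illustration with the hypothetical function ${f:{\mathbb R}^2\rightarrow{\mathbb R}}$, ${f(x)=x_1^2x_2+x_2^3}$) builds ${D^2f(x)}$ in two ways: as a curried map ${h\mapsto (k\mapsto \cdot)}$, obtained by differentiating ${x\mapsto Df(x)k}$ along ${h}$ with a central difference, and as the bilinear form ${(h,k)\mapsto h^{T}H(x)k}$ given by the Hessian matrix.

```python
# Hypothetical f : R^2 -> R, f(x) = x1^2 x2 + x2^3 (kept for reference;
# the code below works with its derivatives).
def f(x):
    return x[0] ** 2 * x[1] + x[1] ** 3

def grad(x):
    x1, x2 = x
    return (2 * x1 * x2, x1 * x1 + 3 * x2 * x2)

def Df(x):
    """Df(x) as an element of B(R^2, R): the linear map k -> grad f(x) . k."""
    g = grad(x)
    return lambda k: g[0] * k[0] + g[1] * k[1]

def D2f_curried(x, eps=1e-5):
    """D^2 f(x) as an element of B(R^2, B(R^2, R)).

    h is sent to the linear map k -> d/ds [Df(x + s h)(k)] at s = 0,
    approximated by a central difference.
    """
    def along(h):
        def linear_map(k):
            xp = (x[0] + eps * h[0], x[1] + eps * h[1])
            xm = (x[0] - eps * h[0], x[1] - eps * h[1])
            return (Df(xp)(k) - Df(xm)(k)) / (2 * eps)
        return linear_map
    return along

def D2f_bilinear(x):
    """The same object as an element of B^2(R^2, R): (h, k) -> h^T H(x) k."""
    x1, x2 = x
    H = ((2 * x2, 2 * x1), (2 * x1, 6 * x2))
    return lambda h, k: sum(h[i] * H[i][j] * k[j] for i in range(2) for j in range(2))

if __name__ == "__main__":
    x, h, k = (1.0, 2.0), (0.3, -1.2), (2.0, 0.5)
    print(D2f_curried(x)(h)(k), D2f_bilinear(x)(h, k))
```

Since ${Df}$ is a quadratic polynomial in this example, the central difference is exact up to rounding, so both views of ${D^2f(x)}$ agree to many digits.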

Example 1 Fix ${y_0\in Y}$ and consider the constant function ${f:G\subset X\rightarrow Y}$ defined by ${f(x)=y_0}$ for all ${x\in G}$. For any ${x\in G}$ and ${h\in X}$ small enough we get ${f(x+h)-f(x)=0}$, so (2) holds with ${T=0}$, i.e. ${Df(x)=0}$ for all ${x\in G}$. Since ${Df}$ is again constant, the same argument shows that all higher derivatives vanish; therefore ${f\in C^{\infty}(G)}$.

Example 2 Let ${T\in B(X,Y)}$. In this case, for any ${x,h\in X}$ we get ${T(x+h)-Tx-Th=0}$, so it follows from the definition that ${DT(x)=T}$, i.e. ${DT(x)(h)=Th}$. Since the derivative ${DT:X\rightarrow B(X,Y)}$ is a constant function, Example 1 gives ${D^2T=0}$, and therefore ${T\in C^{\infty}(X)}$.

Example 3 Let ${b\in B^2(X,Y)}$ and let us define ${f(x):=b(x,x)}$. From the bilinearity of ${b}$ it is clear that:

$\displaystyle f(x+h)-f(x)=b(x+h,x+h)-b(x,x)=b(x,h)+b(h,x)+b(h,h)$

Since ${b(.,x)+b(x,.)\in B(X,Y)}$ and ${|| b(h,h)||_Y\leq || b||_{B^2(X,Y)}|| h||_X^2=o(|| h||_X)}$, it follows that ${Df(x)(h)=b(h,x)+b(x,h)}$.

The map ${Df:X\rightarrow B(X,Y)}$, ${x\mapsto b(.,x)+b(x,.)}$, is bounded and linear, so by the last example it belongs to ${C^{\infty}(X)}$; therefore ${f\in C^{\infty}(X)}$.
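Example 3 can be checked numerically on ${X={\mathbb R}^2}$, ${Y={\mathbb R}}$. In the sketch below the matrix ${A}$ defining ${b(x,y)=x^{T}Ay}$ is a hypothetical choice (deliberately non-symmetric, so that ${b(h,x)}$ and ${b(x,h)}$ differ), and the Frobenius norm of ${A}$ is used as a convenient upper bound for ${|| b||_{B^2(X,Y)}}$ with Euclidean norms. The remainder ${f(x+h)-f(x)-Df(x)h}$ should coincide with ${b(h,h)}$ up to rounding.

```python
import math

# Hypothetical bounded bilinear map on R^2: b(x, y) = x^T A y.
A = ((1.0, 2.0), (-3.0, 0.5))  # deliberately non-symmetric

def b(x, y):
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))

def f(x):
    return b(x, x)

def Df(x, h):
    # Formula from Example 3: Df(x)(h) = b(h, x) + b(x, h)
    return b(h, x) + b(x, h)

def remainder(x, h):
    """f(x+h) - f(x) - Df(x)(h); by bilinearity this equals b(h, h)."""
    return f((x[0] + h[0], x[1] + h[1])) - f(x) - Df(x, h)

def frobenius_bound():
    # |b(h, h)| <= ||A||_2 ||h||^2 <= ||A||_F ||h||^2
    return math.sqrt(sum(a * a for row in A for a in row))

if __name__ == "__main__":
    x = (0.7, -1.1)
    for t in (1e-1, 1e-2, 1e-3):
        h = (t, 2 * t)
        print(t, remainder(x, h), b(h, h))
```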

This example can be generalized significantly to get a result that will be useful later in this section.

Proposition 1 Let ${M\in B(X_1\times X_2\times \dots \times X_n,Y)}$ with ${n\in {\mathbb N}}$ and let ${f_i:U\subset Z\rightarrow X_i}$ for ${i=1,\dots ,n}$, where ${U}$ is an open subset of ${Z}$, let us define the function:

${J:U\rightarrow Y}$

${z\mapsto M(f_1(z),\cdots, f_n(z))}$

Let ${k\in {\mathbb N}}$. If ${f_i\in C^k(U)}$ for ${i=1,\dots ,n}$, then ${J\in C^{k}(U)}$ and for ${z\in U}$ and ${h\in Z}$ we have:

$\displaystyle DJ(z)h=\sum_{i=1}^{n} M(f_1(z),\cdots, f_{i-1}(z), Df_i(z)h, f_{i+1}(z), \cdots , f_{n}(z)).$

Proof: Let us define the operator ${T_z\in B(Z,Y)}$ by

$\displaystyle T_z(h):=\sum_{i=1}^{n} M(f_1(z),\cdots, f_{i-1}(z), Df_i(z)h, f_{i+1}(z), \cdots , f_{n}(z)).$

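Let ${z\in U}$ and ${h\in Z}$ such that ${z+h\in U}$, and write ${\Delta_i:=f_i(z+h)-f_i(z)}$. Expanding ${J(z+h)=M(f_1(z)+\Delta_1,\cdots,f_n(z)+\Delta_n)}$ by multilinearity and grouping the terms in a suitable way we get:

$\displaystyle J(z+h)-J(z)-T_z(h)=\sum_{i=1}^{n} M(f_1(z),\cdots, f_{i-1}(z), f_i(z+h)-f_i(z)- Df_i(z)h, f_{i+1}(z), \cdots , f_{n}(z))+Q(z,h),$

where ${Q(z,h)}$ is the sum of the terms of the expansion in which at least two arguments are increments ${\Delta_j}$.

On the other hand, since ${M}$ is bounded, for any ${i=1,\dots, n}$ we get

$\displaystyle \Vert M(f_1(z),\cdots, f_{i-1}(z), f_i(z+h)-f_i(z)- Df_i(z)h, f_{i+1}(z), \cdots , f_{n}(z))\Vert_{Y}$

$\displaystyle \leq C\Vert f_i(z+h)-f_i(z)- Df_i(z)h\Vert_{X_i}=o(\Vert h\Vert_Z),$

with ${C=\Vert M\Vert\prod_{j\neq i}\Vert f_j(z)\Vert_{X_j}}$. Moreover, each term of ${Q(z,h)}$ is bounded by ${\Vert M\Vert}$ times a product of norms containing at least two factors ${\Vert \Delta_j\Vert_{X_j}}$, and by the local Lipschitz property (4) we have ${\Vert \Delta_j\Vert_{X_j}\leq(\Vert Df_j(z)\Vert+1)\Vert h\Vert_Z}$ for ${h}$ small; hence ${\Vert Q(z,h)\Vert_Y=O(\Vert h\Vert_Z^2)=o(\Vert h\Vert_Z)}$.

This proves the differentiability of ${J}$ and the formula for ${DJ}$. The rest of the proof follows by induction, noticing that the derivative of ${J}$ is a sum of terms with the same structure as ${J}$ (exercise!). $\Box$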
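A small finite-dimensional instance of the Proposition, again only a sketch with hypothetical data: ${n=2}$, ${Z={\mathbb R}}$, ${X_1=X_2={\mathbb R}^2}$, ${M}$ the Euclidean inner product, ${f_1(z)=(z,z^2)}$ and ${f_2(z)=(\sin z,1)}$, so that ${J(z)=z\sin z+z^2}$. The function `DJ` implements the formula of the Proposition term by term, replacing one ${f_i(z)}$ at a time by ${Df_i(z)h}$.

```python
import math

# Hypothetical bounded bilinear map M : R^2 x R^2 -> R (Euclidean inner product).
def M(a, b):
    return a[0] * b[0] + a[1] * b[1]

def f1(z):
    return (z, z * z)

def Df1(z):
    return (1.0, 2.0 * z)      # Df1(z)h = h * (1, 2z)

def f2(z):
    return (math.sin(z), 1.0)

def Df2(z):
    return (math.cos(z), 0.0)  # Df2(z)h = h * (cos z, 0)

def J(z):
    return M(f1(z), f2(z))

def DJ(z, h):
    """Sum over i of M with the i-th argument replaced by Df_i(z)h."""
    d1 = tuple(h * c for c in Df1(z))
    d2 = tuple(h * c for c in Df2(z))
    return M(d1, f2(z)) + M(f1(z), d2)

if __name__ == "__main__":
    for z in (-1.0, 0.5, 2.0):
        eps = 1e-6
        print(z, DJ(z, 1.0), (J(z + eps) - J(z - eps)) / (2 * eps))
```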

As a simple consequence of this result and of the chain rule, ${C^k}$ regularity is preserved under composition.

Corollary 9 (Of the Chain Rule) Let ${\gamma:U\subset Y\rightarrow Z}$ and ${f:G\subset X\rightarrow Y}$ be functions such that ${f\in C^k(G)}$ and ${\gamma\in C^k(U)}$, where ${U}$ is an open subset of ${Y}$ such that ${f(G)\subset U}$. Then ${\gamma\circ f:G\rightarrow Z}$ satisfies ${\gamma \circ f\in C^k(G)}$.

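Proof: The proof proceeds by induction. For ${k=1}$, the chain rule gives the differentiability of ${\gamma\circ f}$ together with ${D(\gamma\circ f)(x)=D\gamma(f(x))\circ Df(x)}$; this expression depends continuously on ${x}$, since ${f}$, ${Df}$ and ${D\gamma}$ are continuous and ${|| A\circ B||\leq || A||\,|| B||}$ for bounded operators.

For the inductive step, let us suppose that the result holds for ${C^{k-1}}$-functions. If ${f\in C^k(G)}$ and ${\gamma\in C^k(U)}$, their derivatives satisfy ${Df\in C^{k-1}(G)}$ and ${D\gamma\in C^{k-1}(U)}$; moreover, we trivially have ${f\in C^{k-1}(G)}$. Applying the inductive hypothesis to ${D\gamma}$ and ${f}$ we get that ${D\gamma \circ f\in C^{k-1}(G)}$, and by the chain rule we can write the derivative of the composition as:

$\displaystyle D(\gamma \circ f)(x)=\phi (D\gamma(f(x)),Df(x))$

where ${\phi:B(Y,Z)\times B(X,Y)\rightarrow B(X,Z) }$ is the bilinear map given by composition, i.e. ${\phi(A,B)=A\circ B}$ for ${A\in B(Y,Z)}$ and ${B\in B(X,Y)}$; it is bounded since ${|| A\circ B||_{B(X,Z)}\leq || A||_{B(Y,Z)}|| B||_{B(X,Y)}}$.

Finally, applying the last proposition with ${M=\phi}$, ${f_1=D\gamma\circ f}$ and ${f_2=Df}$, both in ${C^{k-1}(G)}$, we get ${D(\gamma\circ f)\in C^{k-1}(G)}$, that is, ${\gamma\circ f\in C^k(G)}$. $\Box$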

Example 4 Let ${\Omega}$ be an open subset of ${{\mathbb R}^N}$ and let us consider the function:

${J :L^n(\Omega)\rightarrow {\mathbb R}}$

${ u \mapsto \int_{\Omega} u^n}$

with ${n\in {\mathbb N}}$. Applying the binomial theorem we get:

$\displaystyle J(u+h)-J(u)=\int_{\Omega}\left(\sum_{k=0}^{n}\binom{n}{k}u^{n-k}h^k -u^n\right)=n\int_{\Omega} u^{n-1}h+\int_{\Omega}\sum_{k=2}^{n}\binom{n}{k}u^{n-k}h^k$

Clearly the first term of the right-hand side is linear in ${h}$. On the other hand, by Hölder’s inequality (with exponents ${\frac{n}{n-k}}$ and ${\frac{n}{k}}$ for ${1\leq k<n}$; the case ${k=n}$ is immediate) we get ${\Bigg|\int_{\Omega} u^{n-k}h^k\Bigg|\leq ||u||_{L^n}^{n-k} || h||_{L^n}^{k}}$. For ${k=1}$ this shows the continuity of the first term of the right-hand side, and for ${k\geq 2}$ it shows that the second term is bounded by ${\sum_{k=2}^{n}\binom{n}{k}||u||_{L^n}^{n-k}|| h||_{L^n}^{k}=o(|| h||_{L^n})}$.

Hence ${DJ(u)(h)=n\int_{\Omega} u^{n-1}h}$.
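The computation of Example 4 can be mirrored on a grid. The sketch below assumes ${\Omega=(0,1)}$, approximates integrals with the midpoint rule, and uses the hypothetical data ${u(x)=x}$, ${h(x)=\sin(\pi x)}$ and ${n=3}$. It compares the discrete ${DJ(u)(h)}$ with the exact value ${3\int_0^1x^2\sin(\pi x)\,dx=3(\pi^2-4)/\pi^3}$ and prints the remainder along ${\varepsilon h}$ divided by ${\varepsilon}$, which differs from division by ${|| \varepsilon h||_{L^n}}$ only by the constant factor ${|| h||_{L^n}}$.

```python
import math

# Assumption: Omega = (0, 1), integrals approximated by the midpoint rule.
N_GRID = 2000
DX = 1.0 / N_GRID
GRID = [(i + 0.5) * DX for i in range(N_GRID)]

def integrate(values):
    return sum(values) * DX

def J(u, n):
    """Discretized J(u) = int_Omega u^n."""
    return integrate([ui ** n for ui in u])

def DJ(u, h, n):
    """Discretized DJ(u)(h) = n * int_Omega u^(n-1) h."""
    return n * integrate([ui ** (n - 1) * hi for ui, hi in zip(u, h)])

def remainder_ratio(u, h, n, eps):
    """|J(u + eps h) - J(u) - eps DJ(u)(h)| / eps."""
    u_shift = [ui + eps * hi for ui, hi in zip(u, h)]
    return abs(J(u_shift, n) - J(u, n) - eps * DJ(u, h, n)) / eps

if __name__ == "__main__":
    u = GRID[:]                                # u(x) = x
    h = [math.sin(math.pi * x) for x in GRID]  # h(x) = sin(pi x)
    print(DJ(u, h, 3), 3 * (math.pi ** 2 - 4) / math.pi ** 3)
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, remainder_ratio(u, h, 3, eps))
```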

In order to compute the derivative of this functional when ${n}$ is replaced by an arbitrary real number ${p>1}$ (with ${|u|^p}$ in place of ${u^n}$), and to compute the derivatives of some other functionals, it is necessary to introduce some operators that will be quite important in the rest of these notes.

Definition 5 (Carathéodory functions and Nemytskii operators) Let ${f:\Omega\times {\mathbb R}\rightarrow {\mathbb R}}$, with ${\Omega}$ an open subset of ${{\mathbb R}^N}$, be such that:

1. ${f(x,.)}$ is continuous for almost every ${x\in \Omega}$;
2. ${f(.,t)}$ is measurable for all ${t\in {\mathbb R}}$.

A function ${f}$ with these two properties is called a Carathéodory function.

Under these hypotheses let us define the Nemytskii operator associated to ${f}$ by:

$\displaystyle \mathcal{N}_f(u)(x)=f(x,u(x))$

for any ${u\in \mathcal{M}}$, where ${\mathcal{M}}$ denotes the set of Lebesgue measurable functions on ${\Omega}$.
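Computationally, a Nemytskii operator simply applies ${f}$ pointwise. The following sketch represents functions on ${\Omega}$ as Python callables (an assumption made for illustration; measurability plays no role in code) and uses the hypothetical Carathéodory function ${f(x,t)=xt^2+\sin t}$. It also shows that ${\mathcal{N}_f}$ is in general nonlinear, even though it is defined pointwise.

```python
import math
from typing import Callable

RealFn = Callable[[float], float]
CaratheodoryFn = Callable[[float, float], float]

def nemytskii(f: CaratheodoryFn) -> Callable[[RealFn], RealFn]:
    """Return the operator N_f : u -> (x -> f(x, u(x)))."""
    def apply(u: RealFn) -> RealFn:
        return lambda x: f(x, u(x))
    return apply

# Hypothetical Caratheodory function: continuous in t, measurable (here continuous) in x.
def f(x: float, t: float) -> float:
    return x * t * t + math.sin(t)

N_f = nemytskii(f)
u: RealFn = lambda x: 2.0 * x

if __name__ == "__main__":
    print(N_f(u)(0.5))                 # f(0.5, 1.0) = 0.5 + sin(1.0)
    two_u: RealFn = lambda x: 2.0 * u(x)
    print(N_f(two_u)(0.5), 2.0 * N_f(u)(0.5))  # differ: N_f is not linear
```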

The main properties of the Nemytskii operators are summarized in the following lemma (for a further analysis of the Nemytskii operators see Krasnoselskii’s book):

Lemma 6 Let ${f:\Omega\times {\mathbb R}\rightarrow {\mathbb R}}$ be a Carathéodory function, with ${\Omega}$ an open subset of ${{\mathbb R}^N}$. Then the Nemytskii operator ${\mathcal{N}_f}$ satisfies:

1. ${\mathcal{N}_f:\mathcal{M}\rightarrow \mathcal{M}}$ is well defined.
2. Vainberg’s lemma. If there exist a constant ${a>0}$ and a function ${b\in L^q(\Omega)}$ with ${1\leq q<\infty}$ such that:

$\displaystyle |f(x,t)|\leq a|t|^{\frac{p}{q}}+|b(x)|$

for some ${1\leq p<\infty}$, almost every ${x\in\Omega}$ and every ${t\in{\mathbb R}}$, then:

${ \mathcal{N}_f :L^p(\Omega)\rightarrow L^q(\Omega) }$

${ u \mapsto f(.,u(.))}$

is a well-defined, bounded and continuous operator.

Proof:

1. Let ${S=\sum_{i=1}^{m}\alpha_i \chi_{A_i}}$ be a simple function in ${\mathcal{M}}$, where ${A_1,\dots,A_m}$ are pairwise disjoint measurable sets whose union is ${\Omega}$ (some ${\alpha_i}$ may be zero). Then ${\mathcal{N}_f(S)=f(.,S(.))=\sum_{i=1}^{m}f(.,\alpha_i) \chi_{A_i}}$, which is also measurable since each ${f(.,\alpha_i)}$ is. In the general case consider any function ${u\in \mathcal{M}}$. Since every measurable function is the pointwise limit of a sequence of simple functions, we can find a sequence of simple functions ${\{s_n\}_{n\in {\mathbb N}}}$ that converges pointwise to ${u}$ in ${\Omega}$. This fact, together with the continuity of ${f}$ in its second argument, implies:

$\displaystyle \lim\limits_{n\rightarrow \infty} f(x,s_n(x))=f(x,u(x)), \qquad \text{a.e. in}\, \Omega.$

Since the almost everywhere limit of measurable functions is also measurable, the result follows.

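2. Given ${u\in L^p(\Omega)}$, we have ${a|u|^{\frac{p}{q}}\in L^q(\Omega)}$, since ${\int_\Omega (a|u|^{\frac{p}{q}})^q=a^q\int_\Omega |u|^p<\infty}$, and ${b\in L^q(\Omega)}$; hence ${\mathcal{N}_f(u)\in L^q(\Omega)}$. Moreover, by Minkowski’s inequality we get:

$\displaystyle || \mathcal{N}_f(u)||_{L^q(\Omega)}\leq a|| u||_{L^p(\Omega)}^{\frac{p}{q}}+|| b ||_{L^q(\Omega)}$

so ${\mathcal{N}_f}$ maps bounded sets of ${L^p(\Omega)}$ into bounded sets of ${L^q(\Omega)}$.

To prove the continuity we use the trick of showing that every subsequence has a further subsequence converging to the desired limit. Consider any sequence of functions ${\{u_n\}_{n\in {\mathbb N}}}$ converging to ${u}$ in ${L^p(\Omega)}$ and extract any subsequence ${\{u_{n_k}\}_{k\in {\mathbb N}}}$. Since this subsequence still converges to ${u}$ in ${L^p(\Omega)}$, there exist a further subsequence ${\{u_{n_{k_j}}\}_{j\in {\mathbb N}}}$ and a function ${g\in L^p(\Omega)}$ such that ${u_{n_{k_j}}\rightarrow u}$ a.e. in ${\Omega}$ and ${|u_{n_{k_j}}|\leq |g|}$ a.e. in ${\Omega}$ (see Jones’s book). By the continuity of ${f}$ in its second argument (as in the last proof), ${\mathcal{N}_f(u_{n_{k_j}})\rightarrow \mathcal{N}_f(u)}$ a.e. in ${\Omega}$, and

$\displaystyle |\mathcal{N}_f(u_{n_{k_j}})-\mathcal{N}_f(u)|^q\leq \left(a|g|^{\frac{p}{q}}+a|u|^{\frac{p}{q}}+2|b|\right)^q\in L^1(\Omega).$

Hence Lebesgue’s dominated convergence theorem gives ${\mathcal{N}_f(u_{n_{k_j}})\rightarrow \mathcal{N}_f(u)}$ in ${L^q(\Omega)}$. Since every subsequence of ${\{\mathcal{N}_f(u_n)\}}$ has a further subsequence converging to ${\mathcal{N}_f(u)}$, the whole sequence converges to ${\mathcal{N}_f(u)}$ in ${L^q(\Omega)}$, which proves the continuity.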

$\Box$
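The bound from Vainberg’s lemma can be observed on a grid. The sketch below assumes ${\Omega=(0,1)}$ with a midpoint discretization, ${p=3}$, ${q=2}$, ${a=2}$, ${b(x)=x}$, and the hypothetical function ${f(x,t)=2|t|^{3/2}\cos t+x}$, which satisfies the growth condition of the lemma. Because the discrete norms are genuine weighted ${\ell^p}$ norms, the inequality ${|| \mathcal{N}_f(u)||_{L^q}\leq a|| u||_{L^p}^{p/q}+|| b||_{L^q}}$ also holds exactly for the discretized quantities.

```python
import math

# Assumptions: Omega = (0, 1) with a midpoint grid; p = 3, q = 2, a = 2, b(x) = x.
N_GRID = 1000
DX = 1.0 / N_GRID
GRID = [(i + 0.5) * DX for i in range(N_GRID)]
P, Q, A = 3.0, 2.0, 2.0

def b(x):
    return x

def f(x, t):
    # Satisfies |f(x, t)| <= A |t|^(P/Q) + |b(x)|, the growth condition of the lemma.
    return A * abs(t) ** (P / Q) * math.cos(t) + b(x)

def lp_norm(values, p):
    """Discrete L^p(0, 1) norm with midpoint weights."""
    return (sum(abs(v) ** p for v in values) * DX) ** (1.0 / p)

def vainberg_check(u):
    """Return (||N_f(u)||_{L^q}, a ||u||_{L^p}^{p/q} + ||b||_{L^q}) on the grid."""
    values_u = [u(x) for x in GRID]
    lhs = lp_norm([f(x, ux) for x, ux in zip(GRID, values_u)], Q)
    rhs = A * lp_norm(values_u, P) ** (P / Q) + lp_norm([b(x) for x in GRID], Q)
    return lhs, rhs

if __name__ == "__main__":
    for u in (lambda x: 5 * x - 2, lambda x: math.exp(3 * x)):
        print(vainberg_check(u))
```

This only illustrates the boundedness estimate; continuity of ${\mathcal{N}_f}$ is not something a single evaluation can show.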

The generalization of these properties to Carathéodory functions of the form ${f:\Omega\times {\mathbb R}^n\rightarrow {\mathbb R}}$ follows, mutatis mutandis, the same proof scheme and is left as an exercise.

Now, equipped with this result, we can easily generalize Example 4.

Example 5 Let ${\Omega}$ be an open subset of ${{\mathbb R}^N}$ and let ${p>1}$ be a real number. Let us consider the function:

${J :L^p(\Omega)\rightarrow {\mathbb R}}$

${u \mapsto \int_{\Omega} |u|^p}$

In this case it is clear that the function ${F(t)=|t|^p}$ is differentiable with derivative ${f(t)=pt|t|^{p-2}}$ (with ${f(0)=0}$).

Under these considerations, applying the fundamental theorem of calculus, Fubini’s theorem and Hölder’s inequality, we get:

$\displaystyle \Bigg|\int_{\Omega} (|u+h|^p-|u|^p-pu|u|^{p-2}h)\Bigg| \leq\int_{\Omega}\int_{0}^1 |f(u+th)-f(u)||h|\,dt\,dx$

$\displaystyle \leq \int_{0}^1 || f(u+th)-f(u)||_{L^q(\Omega)} || h||_{L^p(\Omega)}\,dt$

With ${q=\frac{p}{p-1}}$ (the conjugate exponent of ${p}$).

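Since ${f}$ satisfies ${|f(t)|=p|t|^{p-1}=p|t|^{\frac{p}{q}}}$, Vainberg’s lemma (with ${a=p}$ and ${b=0}$) shows that the Nemytskii operator ${\mathcal{N}_f:L^p(\Omega)\rightarrow L^q(\Omega)}$ is continuous. Hence, for each ${t\in[0,1]}$, ${|| f(u+th)-f(u)||_{L^q(\Omega)}\rightarrow 0}$ as ${|| h||_{L^p(\Omega)}\rightarrow 0}$, and for ${|| h||_{L^p(\Omega)}\leq 1}$ this quantity is bounded by ${p(|| u||_{L^p(\Omega)}+1)^{p-1}+p|| u||_{L^p(\Omega)}^{p-1}}$. Therefore, using Lebesgue’s dominated convergence theorem in the variable ${t}$, we get:

$\displaystyle \frac{\Bigg|\int_{\Omega} (|u+h|^p-|u|^p-pu|u|^{p-2}h)\Bigg|}{|| h||_{L^p(\Omega)}} \leq \int_{0}^1 || f(u+th)-f(u)||_{L^q(\Omega)}\,dt\rightarrow 0$

as ${|| h||_{L^p(\Omega)}\rightarrow 0}$.

Hence, since ${h\mapsto p\int_{\Omega} u|u|^{p-2}h}$ is bounded on ${L^p(\Omega)}$ (because ${f(u)\in L^q(\Omega)}$), ${DJ(u)(h)=p\int_{\Omega} u|u|^{p-2}h}$.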
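The grid experiment of Example 4 can be repeated for a real exponent ${p>1}$. In this sketch (assumptions: ${\Omega=(0,1)}$, midpoint rule, the hypothetical data ${u(x)=x-0.3}$, which changes sign, and ${h(x)=\cos(3x)}$), the derivative ${f(t)=pt|t|^{p-2}}$ is evaluated as ${p\,\mathrm{sign}(t)|t|^{p-1}}$, which avoids evaluating ${|t|^{p-2}}$ at ${t=0}$ when ${p<2}$.

```python
import math

# Assumptions: Omega = (0, 1), midpoint grid, u(x) = x - 0.3, h(x) = cos(3x).
N_GRID = 2000
DX = 1.0 / N_GRID
GRID = [(i + 0.5) * DX for i in range(N_GRID)]

def integrate(values):
    return sum(values) * DX

def J(u, p):
    """Discretized J(u) = int_Omega |u|^p."""
    return integrate([abs(ui) ** p for ui in u])

def f(t, p):
    # f(t) = p t |t|^(p-2), written as p*sign(t)*|t|^(p-1) so that f(0) = 0 for every p > 1
    return p * math.copysign(abs(t) ** (p - 1), t)

def DJ(u, h, p):
    """Discretized DJ(u)(h) = p int_Omega u |u|^(p-2) h."""
    return integrate([f(ui, p) * hi for ui, hi in zip(u, h)])

def remainder_ratio(u, h, p, eps):
    """|J(u + eps h) - J(u) - eps DJ(u)(h)| / eps."""
    u_shift = [ui + eps * hi for ui, hi in zip(u, h)]
    return abs(J(u_shift, p) - J(u, p) - eps * DJ(u, h, p)) / eps

if __name__ == "__main__":
    u = [x - 0.3 for x in GRID]
    h = [math.cos(3 * x) for x in GRID]
    for p in (1.5, 2.5):
        print(p, [remainder_ratio(u, h, p, e) for e in (1e-1, 1e-2, 1e-3)])
```

Since ${t\mapsto|t|^p}$ is convex, the normalized remainder is monotone in ${\varepsilon}$, so it decreases even for ${p=1.5}$, where ${F(t)=|t|^p}$ is not twice differentiable at ${0}$.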

Remark 3 The last computation can be easily generalized: under suitable conditions on the Carathéodory function ${f}$ it can be shown that the functional

${J :L^p(\Omega)\rightarrow {\mathbb R}}$

${u \mapsto \int_{\Omega} F(x,u(x))\,dx}$

with ${F(x,t):=\int_{0}^{t}f(x,s)ds}$, is differentiable with derivative given by ${DJ(u)(h)=\int_{\Omega} f(.,u)h}$.
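For Remark 3, a quick consistency check of the formula ${DJ(u)(h)=\int_\Omega f(.,u)h}$ on a grid. The sketch assumes ${\Omega=(0,1)}$, the hypothetical choice ${f(x,s)=x\cos s}$ (so ${F(x,t)=x\sin t}$), ${u(x)=e^x}$ and ${h(x)=1-2x}$, and compares the formula with a central difference of ${J}$ along ${h}$. A directional difference only tests the Gateaux derivative in one direction, so this is a plausibility check rather than a proof of Fréchet differentiability.

```python
import math

# Assumptions (hypothetical example): Omega = (0, 1), f(x, s) = x cos(s),
# so F(x, t) = int_0^t f(x, s) ds = x sin(t).
N_GRID = 2000
DX = 1.0 / N_GRID
GRID = [(i + 0.5) * DX for i in range(N_GRID)]

def f(x, s):
    return x * math.cos(s)

def F(x, t):
    return x * math.sin(t)

def J(u):
    """Discretized J(u) = int_Omega F(x, u(x)) dx."""
    return sum(F(x, ux) for x, ux in zip(GRID, u)) * DX

def DJ(u, h):
    """Candidate derivative from Remark 3: int_Omega f(x, u(x)) h(x) dx."""
    return sum(f(x, ux) * hx for x, ux, hx in zip(GRID, u, h)) * DX

def directional_difference(u, h, eps=1e-4):
    """Central difference (J(u + eps h) - J(u - eps h)) / (2 eps)."""
    up = [ux + eps * hx for ux, hx in zip(u, h)]
    um = [ux - eps * hx for ux, hx in zip(u, h)]
    return (J(up) - J(um)) / (2 * eps)

if __name__ == "__main__":
    u = [math.exp(x) for x in GRID]
    h = [1.0 - 2.0 * x for x in GRID]
    print(DJ(u, h), directional_difference(u, h))
```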

Finding general conditions on ${f}$, for bounded and unbounded domains ${\Omega}$, that guarantee this result is left as an exercise.

In the next part we will use this lemma and some other real analysis techniques to compute the derivative of some classical functionals related to problems in differential geometry and in PDE.