\[\chi:M \to K^\ast.\] By trivial character we mean a character such that \(\chi(M)=\{1\}\). We are particularly interested in the linear independence of characters. Functions \(f_i:M \to K\) are called **linearly independent over \(K\)** if whenever \[a_1f_1+\cdots+a_nf_n=0\] with all \(a_i \in K\), we have \(a_i=0\) for all \(i\). \(\def\Tr\operatorname{Tr}\)

In Fourier analysis we are constantly interested in functions like \(f(x)=e^{-inx}\) or \(g(x)= e^{-ixt}\), corresponding to Fourier series (integration on \(\mathbb{R}/2\pi\mathbb{Z}\)) and the Fourier transform respectively. Later, mathematicians realised that everything can be set in a locally compact abelian (LCA) group. For this reason we need to generalise these functions, and the bounded ones coincide with our definition of characters.

Let \(G\) be an LCA group. A map \(\gamma:G \to \mathbb{C}\) is called a *character* if \(|\gamma(x)|=1\) for all \(x \in G\) and \[\gamma(x+y)=\gamma(x)\gamma(y).\] Since \(G\) is automatically a monoid, this coincides with our ordinary definition of a character. The set of *continuous* characters forms a group \(\Gamma\), which is called the *dual group* of \(G\).

If \(G=\mathbb{R}\), solving the equation \(\gamma(x+y)=\gamma(x)\gamma(y)\) for continuous \(\gamma\) (in whatever way one likes) gives \(\gamma(x)=e^{Ax}\) for some \(A \in \mathbb{C}\). But \(|e^{Ax}| \equiv 1\) (or merely boundedness) forces \(A\) to be purely imaginary, say \(A=it\) with \(t \in \mathbb{R}\), so \(\gamma(x)=e^{itx}\). Hence the dual group of \(\mathbb{R}\) is determined by (the speed of) rotation on the unit circle.
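As a quick numerical illustration (a sketch, not a proof), the snippet below checks that \(x \mapsto e^{itx}\) is multiplicative with modulus \(1\), and that an exponent with non-zero real part gives an unbounded function. The helper names `gamma` and `exp_map` and the sample values are made up for this example.

```python
import cmath

def gamma(t: float, x: float) -> complex:
    """Candidate character of (R, +): x -> exp(i t x)."""
    return cmath.exp(1j * t * x)

def exp_map(A: complex, x: float) -> complex:
    """General continuous solution of g(x + y) = g(x) g(y): x -> exp(A x)."""
    return cmath.exp(A * x)

t, x, y = 1.7, 0.4, -2.3  # arbitrary sample values

# Multiplicativity and unit modulus for a purely imaginary exponent.
homomorphism_gap = abs(gamma(t, x + y) - gamma(t, x) * gamma(t, y))
modulus_gap = abs(abs(gamma(t, x)) - 1.0)

# With Re(A) != 0 the modulus is exp(Re(A) * x), which is unbounded in x.
A = 0.5 + 1.7j
growth = abs(exp_map(A, 40.0))

print(homomorphism_gap, modulus_gap, growth)
```

Both gaps should be at the level of floating-point rounding, while `growth` is of the order of \(e^{20}\).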

With this we have our generalised version of the Fourier transform. Let \(G\) be an LCA group and \(f \in L^1(G)\). Then the **Fourier transform** of \(f\) is given by \[\hat{f}(\gamma) = \int_G f(x)\gamma(-x)dx, \quad \gamma \in \Gamma.\] One can check that \(\hat{f}\) is exactly the Gelfand transform of \(f\).
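To see the formula in the simplest possible case, here is a sketch for the finite cyclic group \(\mathbb{Z}/n\mathbb{Z}\) (with the discrete topology it is an LCA group). The assumptions are mine: the Haar measure is taken to be the counting measure, the characters are \(\gamma_k(x)=e^{2\pi i kx/n}\), and the function names are made up. With these choices the formula above is the usual discrete Fourier transform.

```python
import cmath

def character(n: int, k: int):
    """k-th character of Z/nZ: x -> exp(2 pi i k x / n)."""
    return lambda x: cmath.exp(2j * cmath.pi * k * x / n)

def fourier_transform(f, n):
    """hat f(gamma_k) = sum_x f(x) gamma_k(-x), counting measure as Haar measure."""
    return [sum(f[x] * character(n, k)(-x) for x in range(n)) for k in range(n)]

n = 6
f = [1, 2, 0, -1, 3, 0.5]  # an arbitrary function on Z/6Z
f_hat = fourier_transform(f, n)

# Inversion: f(x) = (1/n) sum_k hat f(gamma_k) gamma_k(x).
f_back = [sum(f_hat[k] * character(n, k)(x) for k in range(n)) / n for x in range(n)]

print([round(abs(u - v), 12) for u, v in zip(f, f_back)])
```

The value at the trivial character is just the total sum of \(f\), and the inversion formula recovers \(f\) up to rounding.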

If characters of \(G\) are linearly independent, then they are pairwise distinct, but what about the converse? Dedekind answered this question affirmatively. But his approach is rather complicated: it needed determinants. However, Artin found a neat way to do it:

Theorem (Dedekind-Artin) Let \(M\) be a monoid and \(K\) a field. Let \(\chi_1,\dots,\chi_n\) be distinct characters of \(M\) in \(K^\ast\). Then they are linearly independent over \(K\).

*Proof.* Suppose this is false. Let \(N\) be the smallest integer such that there is a relation \[a_1\chi_1+a_2\chi_2+\cdots+a_N\chi_N = 0\] with distinct \(\chi_i\) and not all \(a_i\) equal to \(0\). By minimality every \(a_i\) is non-zero, and \(N \ge 2\) because a character takes values in \(K^\ast\) and so is never the zero function. Since \(\chi_1 \ne \chi_2\), there is some \(z \in M\) such that \(\chi_1(z) \ne \chi_2(z)\). For all \(x \in M\) we still have \[a_1\chi_1(zx)+\cdots+a_N\chi_N(zx)=0,\] and since the \(\chi_i\) are characters this reads \[a_1\chi_1(z)\chi_1(x)+\cdots+a_N\chi_N(z)\chi_N(x)=0.\] We now have a linear system \[\begin{pmatrix}a_1 & a_2 & \cdots & a_N \\a_1\chi_1(z) & a_2\chi_2(z) & \cdots & a_N\chi_N(z)\end{pmatrix}\begin{pmatrix}\chi_1 \\\chi_2 \\\vdots \\\chi_N\end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}\] If we perform Gaussian elimination once (divide the second row by \(\chi_1(z)\) and subtract the first row), we see \[\begin{pmatrix}a_1 & a_2 & \cdots & a_N \\0 & \left(\frac{\chi_2(z)}{\chi_1(z)}-1\right)a_2 & \cdots & \left(\frac{\chi_N(z)}{\chi_1(z)}-1\right)a_N\end{pmatrix}\begin{pmatrix}\chi_1 \\\chi_2 \\\vdots \\\chi_N\end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}\] But this is to say \[\left(\frac{\chi_2(z)}{\chi_1(z)}-1\right)a_2\chi_2 + \cdots + \left(\frac{\chi_N(z)}{\chi_1(z)}-1\right)a_N\chi_N=0.\] By the choice of \(z\) we have \(\frac{\chi_2(z)}{\chi_1(z)}-1 \ne 0\), and \(a_2 \ne 0\), so the coefficient of \(\chi_2\) is non-zero. Therefore we have found a non-trivial relation among \(N-1\) distinct characters, contradicting the minimality of \(N\). \(\square\)

As an application, we consider an \(n\)-variable equation:

Let \(\alpha_1,\cdots,\alpha_n\) be distinct non-zero elements of a field \(K\). If \(a_1,\cdots,a_n\) are elements of \(K\) such that for all integers \(v \ge 0\) we have \[a_1\alpha_1^v + \cdots + a_n\alpha_n^v = 0\] then \(a_i=0\) for all \(i\).

*Proof.* Consider the \(n\) distinct characters \(\chi_i(v)=\alpha_i^v\) of \(\mathbb{Z}_{\ge 0}\) into \(K^\ast\) and apply the theorem above. \(\square\)
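For a concrete feeling, one can test the statement over \(K=\mathbb{Q}\) with exact arithmetic. The sketch below only looks at the equations for \(v=0,\dots,n-1\) (a Vandermonde system) and computes its rank by row reduction. The function name `rank_of_power_system` and the sample values are hypothetical; full rank means that \(a_1=\cdots=a_n=0\) is the only solution.

```python
from fractions import Fraction

def rank_of_power_system(alphas):
    """Rank of the n x n system sum_i a_i * alpha_i**v = 0, v = 0..n-1, over Q."""
    n = len(alphas)
    rows = [[Fraction(alpha) ** v for alpha in alphas] for v in range(n)]
    rank = 0
    for col in range(n):
        # Find a pivot in this column among the rows not yet used.
        pivot = next((r for r in range(rank, n) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(n):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

rank_full = rank_of_power_system([2, -3, Fraction(1, 2), 5])  # distinct, non-zero
rank_repeated = rank_of_power_system([2, 2, 5])               # alpha_1 = alpha_2

print(rank_full, rank_repeated)
```

When two of the \(\alpha_i\) coincide, the rank drops and non-trivial solutions such as \((1,-1,0)\) appear, which is why distinctness is needed.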

The linear independence of characters gives us a good chance of studying the relation of the field extension and the Galois group.

Hilbert's Theorem 90 (Modern Version) Let \(K/k\) be a Galois extension with Galois group \(G\), then \(H^1(G,K^\ast)=1\) and \(H^1(G,K)=0\). This is to say, the first cohomology group is trivial for both addition and multiplication.

It may look confusing but the classic version is about cyclic extensions (\(K/k\) is cyclic if it is Galois and the Galois group is cyclic).

Hilbert's Theorem 90 (Classic Version, Multiplicative Form) Let \(K/k\) be cyclic of degree \(n\) with Galois group \(G\) generated by \(\sigma\). Then \[\frac{\ker N}{1/\sigma{A}} \cong 1\] where \(A=K^\ast\), \(1/\sigma{A}\) consists of all elements of the form \(\alpha/\sigma(\alpha)\) with \(\alpha \in A\), and \(N(\beta)\) is the norm of \(\beta \in K\) over \(k\). In other words, \(N(\beta)=1\) if and only if \(\beta=\alpha/\sigma(\alpha)\) for some \(\alpha \in K^\ast\).

This corresponds to the statement that \(H^1(G,K^\ast)=1\). On the other hand,

Hilbert's Theorem 90 (Classic Version, Additive Form) Let \(K/k\) be cyclic of degree \(n\) with Galois group \(G\) generated by \(\sigma\). Then \[\frac{\ker \Tr}{(1-\sigma){A}} \cong 0\] where \(A=K\), \((1-\sigma)A\) consists of all elements of the form \((1-\sigma)(\alpha)=\alpha-\sigma(\alpha)\) with \(\alpha \in A\), and \(\Tr(\beta)\) is the trace of \(\beta \in K\) over \(k\).

This corresponds to, of course, the statement that \(H^1(G,K)=0\). Together with the surjectivity of the trace, this asserts an exact sequence \[0 \to k \to K \xrightarrow{1-\sigma} K \xrightarrow{\Tr} k \to 0.\] Before we prove it we recall what group cohomology is. Let \(G\) be a group. We consider the category **\(G\)-mod** of left \(G\)-modules. The set of morphisms between two objects \(A\) and \(B\), for which we write \(\operatorname{Hom}_G(A,B)\), consists of all \(G\)-module homomorphisms from \(A\) to \(B\). The *cohomology groups of \(G\) with coefficients in \(A\)* are the right derived functors of \(\operatorname{Hom}_G(\mathbb{Z},-)\): \[H^\ast (G,A) \cong \operatorname{Ext}^\ast_{\mathbb{Z}[G]}(\mathbb{Z},A).\] It follows that \(H^0(G,A) \cong \operatorname{Hom}_G(\mathbb{Z},A) \cong A^G=\{a \in A: ga=a \text{ for all } g \in G\}\). In particular, if \(G\) is trivial, then \(\operatorname{Hom}_G(\mathbb{Z},-)\) is exact and therefore \(H^\ast(G,A)=0\) whenever \(\ast \ne 0\). We will see what happens when \(G\) is the Galois group of a Galois extension. If the modern version is beyond your reach, you can refer to the classic version. As a side note, the modern version can also be done using Shapiro's lemma.

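*Proof.* We start with the multiplicative form. Let \(\alpha\) be a \(1\)-cocycle in the multiplicative group \(K^\ast\), so that \(\alpha_{\tau\sigma}=\alpha_\tau\cdot\tau\alpha_\sigma\). By the linear independence of characters, there exists some \(\theta \in K\) such that \[\gamma = \sum_{\sigma \in G}\alpha_\sigma\sigma(\theta) \ne 0.\] Then \[\tau\gamma = \sum_{\sigma \in G}\tau(\alpha_\sigma)\tau\sigma(\theta) = \sum_{\sigma \in G}\alpha_\tau^{-1}\alpha_{\tau\sigma}\tau\sigma(\theta) = \alpha_\tau^{-1}\gamma,\] which is to say \(\alpha_\tau = \gamma/\tau\gamma\). Replacing \(\gamma\) with \(\gamma^{-1}\) gives what we want: every cocycle is a coboundary. So much for the multiplicative form.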

For the additive form, take \(\theta \in K \setminus \ker \Tr\), which exists by the linear independence of characters. Given a \(1\)-cocycle \(\alpha\) in the additive group \(K\), we put \[\beta = \frac{1}{\Tr(\theta)}\sum_{\tau \in G}\alpha_\tau \tau(\theta).\] Since a cocycle satisfies \(\alpha_{\sigma\tau}=\alpha_\sigma+\sigma\alpha_\tau\), and \(\Tr(\theta) \in k\) is fixed by \(\sigma\), we get \[\sigma\beta = \frac{1}{\Tr(\theta)}\sum_{\tau \in G}(\alpha_{\sigma\tau}-\alpha_\sigma)\sigma\tau(\theta) = \beta -\alpha_\sigma\] which gives \(\alpha_\sigma = \beta-\sigma\beta\). Replacing \(\beta\) with \(-\beta\) gives what we want. \(\square\)

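The classic version can also be proved directly. *Additive form.* For any element of the form \(\beta-\sigma\beta\), we see \(\Tr(\beta-\sigma\beta)=\sum_{\tau \in G}\tau\beta-\sum_{\tau \in G}\tau\sigma\beta=0\), because \(\tau \mapsto \tau\sigma\) permutes \(G\).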

Conversely, assume \(\Tr(\alpha)=0\). By Artin's lemma, the trace function is not trivial, hence there exists some \(\theta \in K\) such that \(\Tr(\theta)\ne 0\). We take \[\beta = \frac{1}{\Tr(\theta)}[\alpha\theta^\sigma+(\alpha+\sigma\alpha)\theta^{\sigma^2}+\cdots+(\alpha+\sigma\alpha+\cdots+\sigma^{n-2}\alpha)\theta^{\sigma^{n-1}}]\] where for convenience we write \(\sigma\theta=\theta^\sigma\). Therefore \[\beta-\sigma\beta = \frac{1}{\Tr(\theta)}\alpha(\theta+\theta^{\sigma}+\theta^{\sigma^2}+\cdots+\theta^{\sigma^{n-1}})=\alpha\] because the other terms cancel; the term \(\alpha\theta\) comes from \(\sigma\alpha+\cdots+\sigma^{n-1}\alpha=\Tr(\alpha)-\alpha=-\alpha\) and \(\theta^{\sigma^n}=\theta\). \(\square\)

*Multiplicative form.* This can be done in a quite similar setting. For any \(\alpha=\beta/\sigma\beta\), we have \[N(\alpha)=N(\beta)/N(\sigma\beta)=\left(\prod_{\tau \in G}\tau\beta\right)/ \left( \prod_{\tau \in G}\tau\sigma\beta\right)=1.\] Conversely, assume \(N(\alpha)=1\). By Artin's lemma, the following map is not trivial: \[\Lambda=\operatorname{id}+\alpha\sigma+\alpha^{1+\sigma}\sigma^2+\cdots+\alpha^{1+\sigma+\cdots+\sigma^{n-2}}\sigma^{n-1}.\] Pick \(\theta \in K\) such that \(\beta=\Lambda(\theta) \ne 0\). It follows that \[\begin{aligned}\alpha\beta^\sigma &= \alpha(\theta+\alpha\theta^\sigma+\cdots+\alpha^{1+\sigma+\cdots+\sigma^{n-2}}\theta^{\sigma^{n-1}})^\sigma \\&= \alpha(\theta^\sigma+\alpha^\sigma\theta^{\sigma^2}+\cdots+\underbrace{\alpha^{\sigma+\sigma^2+\cdots+\sigma^{n-1}}\theta^{\sigma^n}}_{=\alpha^{-1}\theta}) \\&= \alpha\theta^\sigma+\alpha^{1+\sigma}\theta^{\sigma^2}+\cdots+\alpha^{1+\sigma+\cdots+\sigma^{n-2}}\theta^{\sigma^{n-1}}+\theta \\&=\beta\end{aligned}\] so \(\alpha=\beta/\sigma\beta\), and this is exactly what we want. \(\square\)

Consider the extension \(\mathbb{Q}(i)/\mathbb{Q}\). The Galois group \(G=\{1,\tau\}\) is cyclic and generated by the complex conjugation \(\tau\). Pick any \(a+bi\) with \(N(a+bi)=a^2+b^2=1\), where \(a,b \in \mathbb{Q}\). By Hilbert's Theorem 90 there is some \(r=s+ti \in \mathbb{Q}(i)\) such that \[a+bi = \frac{s+ti}{s-ti}=\frac{s^2-t^2+2sti}{s^2+t^2}= \frac{s^2-t^2}{s^2+t^2}+\frac{2st}{s^2+t^2}i\] If we put \((x,y,z)=(s^2-t^2,2st,s^2+t^2)\), we actually get a Pythagorean triple (if \(s,t\) are fractions, we can multiply them by a common multiple of their denominators so that they become integers; this does not change the quotient). Conversely, to every Pythagorean triple \((x,y,z)\) with \(z \ne 0\) we assign \(\frac{x}{z}+\frac{y}{z}i \in \mathbb{Q}(i)\), which is an element of norm \(1\). Through this we have found all solutions to \(x^2+y^2=z^2\), i.e.

Theorem. Integers \(x,y,z\) satisfy the Diophantine equation \(x^2+y^2=z^2\) if and only if \((x,y,z)\) is proportional to \((m^2-n^2,2mn,m^2+n^2)\) for some integers \(m,n\).
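The argument is constructive enough to run. In the sketch below I make one specific choice that is not forced by the text: in the proof of the multiplicative form I take \(\theta=1\), so that \(\beta=\Lambda(1)=1+\alpha\), which is non-zero as long as \(\alpha \ne -1\). The function names are made up, and exact rational arithmetic is used throughout.

```python
from fractions import Fraction
from math import gcd

def hilbert90_beta(a: Fraction, b: Fraction):
    """Given a + bi of norm 1 with a != -1, return (s, t) with a + bi = (s + ti)/(s - ti).

    Assumed choice: beta = Lambda(theta) with theta = 1, i.e. beta = 1 + alpha.
    """
    assert a * a + b * b == 1 and a != -1
    return 1 + a, b

def triple_from(s: Fraction, t: Fraction):
    """Clear the denominators of s, t and return (m^2 - n^2, 2mn, m^2 + n^2)."""
    common = s.denominator * t.denominator // gcd(s.denominator, t.denominator)  # lcm
    m, n = (s * common).numerator, (t * common).numerator
    return m * m - n * n, 2 * m * n, m * m + n * n

s, t = hilbert90_beta(Fraction(3, 5), Fraction(4, 5))
x, y, z = triple_from(s, t)

print((s, t), (x, y, z))
```

For \(\alpha=\frac{3}{5}+\frac{4}{5}i\) this gives a triple proportional to \((3,4,5)\), as the theorem predicts.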

This can be generalised to all Diophantine equations of the form \(x^2+Axy+By^2=Cz^2\) for some nonzero constant \(C\) and constant \(A,B\) such that the discriminant \(A^2-4B\) is square-free. You can find some discussion here.

The additive form is a good friend of "characteristic \(p\)" things. Artin-Schreier's theorem is a good example of \(p\)-to-the-\(p\).

Theorem (Artin-Schreier) Let \(k\) be a field of characteristic \(p\) and \(K/k\) a Galois extension of degree \(p\). Then there exists \(\alpha \in K\) such that \(K=k(\alpha)\) and \(\alpha\) is a zero of an equation \(X^p-X-a=0\) for some \(a \in k\).

*Proof.* Since \([K:k]=p\) is prime, the Galois group \(G\) of \(K/k\) is cyclic. Moreover \(\Tr(-1)=p(-1)=0\), so we are able to apply the additive form to \(-1\). Let \(\sigma\) be a generator of \(G\); then there exists some \(\alpha \in K\) such that \(-1=\alpha-\sigma\alpha\), i.e. \[\sigma\alpha = \alpha+1.\] Hence \(\sigma(\sigma(\alpha))=\sigma(\alpha+1)=\alpha+1+1\), and by induction we get \[\sigma^i(\alpha) = \alpha+i, \quad i=1,2,\cdots,p\] so \(\alpha\) has \(p\) distinct conjugates. Therefore \([k(\alpha):k] \ge p\). But since \[p=[K:k]=[K:k(\alpha)][k(\alpha):k]\] we can only have \([K:k(\alpha)]=1\), which is to say \(K=k(\alpha)\). In the meantime, \[\sigma(\alpha^p-\alpha)=(\alpha+1)^p-(\alpha+1)=\alpha^p+1^p-\alpha-1 = \alpha^p-\alpha.\] Hence \(\alpha^p - \alpha\) lies in the fixed field of \(\sigma\), which happens to be \(k\). Putting \(a=\alpha^p-\alpha\), our proof is done. \(\square\)

For the case when the characteristic is \(0\) please see here. There is a converse, which deserves a standalone blog post. It says that the polynomial \(f(X)=X^p-X-a\) either has one root in \(k\), in which case all its roots are in \(k\); or it is irreducible, in which case if \(\alpha\) is a root then \(k(\alpha)\) is cyclic of degree \(p\) over \(k\). But I don't know if many people are fans of "characteristic \(p\)" things.
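The smallest playground for this dichotomy is \(k=\mathbb{F}_p\). The sketch below (function names are made up, and \(p=7\) is an arbitrary choice) brute-forces the roots of \(X^p-X-a\) in \(\mathbb{F}_p\), and also checks the divisibility \(p \mid \binom{p}{j}\) for \(0<j<p\) that is behind the step \((\alpha+1)^p=\alpha^p+1\) in the proof.

```python
from math import comb

def artin_schreier_roots(p: int, a: int):
    """Roots in F_p of X^p - X - a, found by brute force."""
    return [x for x in range(p) if (pow(x, p, p) - x - a) % p == 0]

def middle_binomials_vanish(p: int) -> bool:
    """Check p | C(p, j) for 0 < j < p, which gives (u + 1)^p = u^p + 1 in characteristic p."""
    return all(comb(p, j) % p == 0 for j in range(1, p))

p = 7
roots_a0 = artin_schreier_roots(p, 0)  # a = 0: every element of F_p is a root
roots_a3 = artin_schreier_roots(p, 3)  # a != 0: no root in F_p

print(roots_a0, roots_a3, middle_binomials_vanish(p))
```

Since \(x^p=x\) for every \(x \in \mathbb{F}_p\), the polynomial has no root in \(\mathbb{F}_p\) once \(a \ne 0\), so by the dichotomy quoted above it is irreducible over \(\mathbb{F}_p\).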

- Serge Lang, *Algebra, Revised Third Edition*.
- Charles A. Weibel, *An Introduction to Homological Algebra*.
- Noam D. Elkies, *Pythagorean triples and Hilbert’s Theorem 90*. (https://abel.math.harvard.edu/~elkies/Misc/hilbert.pdf)
- Jose Capco, *The Two Artin-Schreier Theorems*. (https://www3.risc.jku.at/publications/download/risc_5477/the_two_artin_schreier_theorems__jcapco.pdf)
- Walter Rudin, *Fourier Analysis on Groups*.

In fact the \(\mathbb{R}^n\) case can be generalised to any locally compact abelian group (see any book on abstract harmonic analysis), because what really matters here is being locally compact and abelian. But for the moment we stick to Euclidean spaces. Note that since \(\mathbb{R}^n\) is \(\sigma\)-compact, all complex Borel measures on it are regular.

To read this post you need to be familiar with some basic properties of Banach algebras, complex Borel measures, and, most importantly, Fubini's theorem.

Let \(M(\mathbb{R}^n)\) denote the space of complex Borel measures on \(\mathbb{R}^n\). The norm on \(M(\mathbb{R}^n)\) is the *total variation*: \[\lVert \mu \rVert = |\mu|(\mathbb{R}^n) = \sup \sum_{i=1}^{\infty}|\mu(E_i)|\] the supremum being taken over all partitions \((E_i)\) of \(\mathbb{R}^n\). The supremum on the right hand side is finite because \(\mu\) is assumed to be complex. This norm makes \(M(\mathbb{R}^n)\) a normed space, but we want to show that it is in fact a Banach space.

Note each measure in \(M(\mathbb{R}^n)\) gives rise to a bounded linear functional \[\begin{aligned}\Phi_\mu:C_0(\mathbb{R}^n) &\to \mathbb{C} \\ f &\mapsto \int_{\mathbb{R}^n}fd\mu.\end{aligned}\] Note we have \(\vert \Phi_\mu(f)\vert = |\int f d\mu| \le \int |f|d|\mu| \le \lVert f \rVert_\infty \lVert \mu \rVert<\infty\). Indeed the norm of \(\Phi_\mu\) is \(\lVert \mu \rVert\).

Conversely, every bounded linear functional \(\Phi\) on \(C_0(\mathbb{R}^n)\) gives rise to a regular Borel measure \(\mu\) such that \(\Phi(f)=\int fd\mu\) and \(\lVert \Phi \rVert = \lVert \mu \rVert\), which is ensured by the Riesz representation theorem. This is to say \[C_0(\mathbb{R}^n)^\ast \cong M(\mathbb{R}^n)\] in the sense of vector space isomorphism and homeomorphism (in fact, isometry). But it is well known that the dual space of a normed vector space is a Banach space, hence \(M(\mathbb{R}^n)\) is Banach, as expected.

A vector space \(V\) over a field \(\mathbb{F}\) is called an algebra if there is an \(\mathbb{F}\)-bilinear map \[B:V \times V \to V.\] It is a Banach algebra if \(V\) itself is a Banach space, the bilinear map is associative, i.e. \(B(x,B(y,z)) = B(B(x,y),z)\), and \[\lVert B(x,y) \rVert \le \lVert x \rVert \lVert y \rVert.\] We show that \(M(\mathbb{R}^n)\) is a Banach algebra by taking \(B(\lambda,\mu)=\lambda \ast \mu\).

The convolution of measures is defined in the style of the convolution of functions, in a natural sense. For any Borel set \(E \subset \mathbb{R}^n\), we can consider the set restricted by addition: \[E_2 = \{(x,y):x+y \in E\} \subset \mathbb{R}^{2n}.\] Then we define the convolution of \(\mu,\lambda \in M(\mathbb{R}^{n})\) through the product measure: \[(\mu \ast \lambda)(E) = (\mu \times \lambda)(E_2).\] It looks natural but we need several routine verifications.

First we need to show that \(E_2\) is Borel. In fact we have \[\chi_{E_2}(x,y) = \chi_E(x+y).\] Since \(E\) is Borel, we see \(\chi_E\) is Borel. Meanwhile \(\varphi(x,y)=x+y\) is continuous hence Borel. Therefore \(\chi_{E_2}\) is Borel as well. It follows that \(E_2\) is a Borel set.

Next, is \(\mu \ast \lambda\) an element of \(M(\mathbb{R}^n)\)? For any Borel set \(E\), the value of \((\mu \ast \lambda)(E)\) is a well-defined complex number, so we only need to verify countable additivity. It shall be shown that \[(\mu \ast \lambda)\left(\bigcup_{k=1}^{\infty}E^k\right)=\sum_{k=1}^{\infty}(\mu \ast \lambda)(E^k)\] where the \(E^k\) are pairwise disjoint. Since the "measure" of \(E\) is defined through \(E_2\), we first show that if \(E\) and \(F\) are disjoint, then so are \(E_2\) and \(F_2\). Indeed, if \((x,y) \in E_2 \cap F_2\), then we have \(x+y \in E \cap F\), which is impossible because \(E \cap F\) is empty. Hence pairwise disjointness is preserved. Putting \(E= \bigcup_{k=1}^{\infty}E^k\), we also need to show that \(E_2 = \bigcup_{k=1}^{\infty}E_2^k\). If \((x,y) \in E_2\), then \(x+y \in E\) lies in one of the \(E^k\), hence \((x,y) \in E_2^k\) for some \(k\). It follows that \(E_2 \subset \bigcup_{k=1}^{\infty}E_2^k\). Conversely, for \((x,y) \in \bigcup_{k=1}^{\infty}E_2^k\), we must have some \(k\) such that \(x+y \in E^k \subset E\), hence \((x,y) \in E_2\), which is to say that \(\bigcup_{k=1}^{\infty}E_2^k \subset E_2\). Therefore \[(\mu \ast \lambda)(E) = (\mu \times \lambda)(E_2) = (\mu \times \lambda)\left( \bigcup_{k=1}^{\infty}E_2^k\right) = \sum_{k=1}^{\infty}(\mu \times \lambda)(E_2^k) = \sum_{k=1}^{\infty}(\mu \ast \lambda)(E^k)\] as desired.

For any \(f \in C_0(\mathbb{R}^n)\), we have a bounded linear functional \[\Phi:f \mapsto \iint f(x+y)d\mu(x)d\lambda(y) = \int fd(\mu \ast \lambda).\] By the Riesz representation theorem, there exists a unique measure \(\nu\) such that \(\Phi(f)=\int fd\nu\), so \(\nu = \mu \ast \lambda\) is uniquely determined by \(\Phi\). However, by Fubini's theorem we have \[\iint f(x+y)d\mu(x)d\lambda(y) = \iint f(x+y)d\lambda(x)d\mu(y)=\int fd(\lambda \ast \mu)\] It follows that \(\lambda \ast \mu = \nu = \mu \ast \lambda\), i.e. this convolution is commutative. Note that for complex measures we always have \(|\mu|(\mathbb{R}^n)<\infty\), so Fubini's theorem is always applicable here.

Next we show that \(\ast\) is associative. It can be carried out by Riesz's theorem. Put \(\nu_1 = \lambda \ast (\mu \ast \gamma)\) and \(\nu_2 = (\lambda \ast \mu) \ast \gamma\). Using Fubini's theorem and the commutativity just proved, it follows that \[\begin{aligned} \int fd\nu_1 &= \iint f(x+y)d\lambda(x)d(\mu \ast \gamma)(y) \\ &= \iiint f(x+y+z)d\lambda(x)d\mu(y)d\gamma(z) \\ &= \iiint f(x+y+z)d\gamma(z)d\lambda(x)d\mu(y) \\ &= \iint f(x+y)d\gamma(x)d(\lambda \ast \mu)(y) \\ &= \int fd(\gamma \ast (\lambda \ast \mu)) \\ &= \int fd\nu_2.\end{aligned}\] Hence \(\nu_1 = \nu_2\), which delivers the associativity of the convolution. To show that \(\ast\) makes \(M(\mathbb{R}^n)\) an algebra, we need to show the distributive law. This follows from the definition of the product measure: writing \(E_2^x=\{y:(x,y) \in E_2\}\) for the section of \(E_2\) at \(x\), we have \[\mu \ast (\lambda_1 + \lambda_2)(E) = \int (\lambda_1 + \lambda_2)(E_{2}^{x})d\mu(x) = \int \lambda_1(E_2^x)d\mu(x) + \int \lambda_2(E_2^x)d\mu(x)\] which is to say \(\mu \ast \lambda_1 + \mu \ast \lambda_2 = \mu \ast (\lambda_1 + \lambda_2)\). Therefore \(M(\mathbb{R}^n)\) is a complex algebra. It remains to show that \(M(\mathbb{R}^n)\) is a Banach algebra. Let \(E^1, E^2, \cdots\) be a partition of \(\mathbb{R}^n\), we see \[\begin{aligned}\sum_{k=1}^{\infty}|\mu \ast \lambda(E^k)| &= \sum_{k=1}^{\infty}|(\mu \times \lambda)(E^k_2)| \\&= \sum_{k=1}^{\infty} \left|\iint \chi_{E_2^k}d\mu d\lambda\right| \\ &\leq \sum_{k=1}^{\infty} \iint \chi_{E_2^k}d|\mu|d|\lambda| \\&\leq |\mu|(\mathbb{R}^n) \cdot |\lambda|(\mathbb{R}^n) \\&= \lVert \mu \rVert \cdot \lVert \lambda \rVert.\end{aligned}\] Taking the supremum over all partitions gives \(\lVert \mu \ast \lambda\rVert \le \lVert \mu \rVert \lVert \lambda \rVert\).

To conclude, \(M(\mathbb{R}^n)\) is a commutative Banach algebra. Even better, this space has a unit which is customarily called the **Dirac measure**. Let \(\delta\) be the measure determined by the evaluation functional \(\Lambda:f \mapsto f(0)\). It follows that \[\begin{aligned}\int f d(\delta \ast \mu) &= \iint f(x+y)d\delta(x)d\mu(y) \\ &= \int f(y)d\mu(y) \end{aligned}\] Hence \(\delta \ast \mu = \mu\) for all \(\mu \in M(\mathbb{R}^n)\). Besides, \(\delta\) has norm \(1\) because it takes the value \(1\) on any Borel subset \(E \subset \mathbb{R}^n\) containing the origin and the value \(0\) on any other Borel set.

A measure \(\mu\) is said to be

- *discrete* if there is a countable set \(E\) such that \(\mu(A)=\mu(A \cap E)\) for all measurable sets \(A\) (in general we say \(\mu\) is concentrated on \(E\));
- *continuous* if \(\mu(A)=0\) whenever \(A\) only contains a single point;
- *absolutely continuous* with respect to \(\lambda\), written \(\mu \ll \lambda\), if \(\lambda(A)=0 \implies \mu(A)=0\).

We now play some games between continuous and discrete measures. First we study the subspace of discrete measures \(M_d(\mathbb{R}^n)\). For sums things are quite straightforward. Suppose \(\mu\) is concentrated on \(A\) and \(\lambda\) is concentrated on \(B\), then \[\begin{aligned}\mu(E) + \lambda(E) &= \mu(E \cap A) + \lambda(E \cap B) \\ &= \mu(E \cap (A \cap (A \cup B))) + \lambda(E \cap (B \cap (A \cup B))) \\ &= \mu(E \cap (A \cup B))+ \lambda(E \cap (A \cup B)).\end{aligned}\] Hence \(\mu+\lambda\) is concentrated on \(A \cup B\).

For convolution things are a little trickier. Suppose \(\mu = \sum_{i=1}^{\infty}a_i\delta_{x_i}\) and \(\lambda=\sum_{i=1}^{\infty}b_i\delta_{y_i}\), where the \(x_i\) and \(y_i\) are distinct points and \(\delta_x\) is the Dirac measure concentrated on \(\{x\}\) (hence \(\delta=\delta_0\)). In other words, \(\mu\) is concentrated on \(A=\{x_i\}_{i=1}^{\infty}\) and \(\lambda\) is concentrated on \(B=\{y_i\}_{i=1}^{\infty}\). We see \[\begin{aligned}(\mu \ast \lambda)(E) &= \iint \chi_E(x+y)d\mu(x)d\lambda(y) \\ &= \int \sum_{i=1}^{\infty}a_i\chi_E(x_i+y)d\lambda(y) \\ &= \sum_{j=1}^{\infty}\sum_{i=1}^{\infty}a_ib_j\chi_E(x_i+y_j) \\ &= \sum_{j=1}^{\infty}\sum_{i=1}^{\infty}a_ib_j\chi_{E \cap (A+B)}(x_i+y_j) \\ &= (\mu \ast \lambda)(E \cap (A+B)).\end{aligned}\] Since \(A+B=\{x_i+y_j\}\) is countable, \(\mu \ast \lambda\) is again discrete. Therefore \(M_d(\mathbb{R}^n)\) forms a subalgebra of \(M(\mathbb{R}^n)\).
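For finitely supported measures on \(\mathbb{R}\) the double sum above is a few lines of code. The following is a minimal sketch under my own conventions: a discrete measure is stored as a dictionary `{point: mass}`, and the helper names `convolve` and `total_variation` are made up. It lets one check commutativity, the unit \(\delta\), and the norm inequality on an example.

```python
from collections import defaultdict

def convolve(mu: dict, lam: dict) -> dict:
    """Convolution of finitely supported measures given as {point: complex mass}."""
    out = defaultdict(complex)
    for x, a in mu.items():
        for y, b in lam.items():
            out[x + y] += a * b  # contributes a_i b_j delta_{x_i + y_j}
    return {pt: mass for pt, mass in out.items() if mass != 0}

def total_variation(mu: dict) -> float:
    """For a discrete measure the total variation norm is the sum of |mass|."""
    return sum(abs(mass) for mass in mu.values())

delta = {0: 1}                     # Dirac measure at the origin
mu = {0: 1 + 2j, 1: -1, 3: 0.5}    # sample measures, chosen arbitrarily
lam = {-1: 2, 1: 1j}

mu_lam = convolve(mu, lam)
lam_mu = convolve(lam, mu)

print(sorted(mu_lam))  # support is contained in A + B
print(total_variation(mu_lam), total_variation(mu) * total_variation(lam))
```

The support of `mu_lam` sits inside \(A+B\), and the norm of the convolution does not exceed the product of the norms.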

Next we focus on the subspace of continuous measures \(M_c(\mathbb{R}^n)\). To begin with we first consider the following identity: \[\begin{aligned}(\mu \ast \lambda)(E) &= \iint \chi_E(x+y)d\mu(x)d\lambda(y) \\ &= \iint \chi_{E-y}(x)d\mu(x)d\lambda(y) \\ &= \int \mu(E-y)d\lambda(y).\end{aligned}\] Suppose \(\mu\) is continuous and \(E\) is a singleton, then \(E-y\) is still a singleton and hence \(\mu(E-y)=0\) for all \(y\), hence \((\mu \ast \lambda)(E)=0\), i.e. \(\mu \ast \lambda\) is still continuous. Therefore the subspace of continuous measures actually forms an ideal.

Next suppose \(\mu \ll m\), where \(m\) is the Lebesgue measure, and \(m(E)=0\). We see \[(\mu \ast \lambda)(E) = \int \mu(E-y)d\lambda(y) = 0\] because \(m(E)=0\) implies \(m(E-y)=0\), and hence \(\mu(E-y)=0\), for all \(y\). Hence the subspace of absolutely continuous measures \(M_{ac}(\mathbb{R}^n)\) also forms an ideal.

Finally we consider the Radon-Nikodym derivatives of absolutely continuous measures (such a derivative exists and is unique almost everywhere, which makes the correspondence below surjective and injective respectively). If \[\mu(E) = \int_E fdm, \quad \lambda(E) = \int_E gdm,\] then \(\mu \ast \lambda\) coincides with \(f \ast g\) in the following sense: \[\begin{aligned}(\mu \ast \lambda)(E) &= \int_{\mathbb{R}^n} \mu(E-t)d\lambda(t) \\ &= \int_{\mathbb{R}^n}\left(\int_{E}f(x-t)dm(x) \right)g(t)dm(t) \\ &= \int_E\int_{\mathbb{R}^n} f(x-t)g(t)dm(t)dm(x) \\ &= \int_E (f \ast g)dm.\end{aligned}\] In other words we have \(d(\mu \ast \lambda) = (f \ast g)dm\). Through this we have established an algebraic isomorphism \(M_{ac}(\mathbb{R}^n) \cong L^1(\mathbb{R}^n,m)\).
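Here is a small numerical sanity check of \(d(\mu \ast \lambda)=(f \ast g)dm\) on \(\mathbb{R}\), under assumptions chosen purely for convenience: \(f=g=\chi_{[0,1]}\), \(E=[0.25,1.5]\), and a midpoint rule for the integrals. All function names are made up for this sketch.

```python
def overlap(lo: float, hi: float) -> float:
    """Lebesgue measure of [lo, hi] intersected with [0, 1]."""
    return max(0.0, min(hi, 1.0) - max(lo, 0.0))

def midpoint(func, lo: float, hi: float, steps: int = 2000) -> float:
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))

def conv_measure(c: float, d: float) -> float:
    """(mu * lambda)([c, d]) as the integral of mu([c, d] - t) over t, with d(lambda) = 1_[0,1] dm."""
    return midpoint(lambda t: overlap(c - t, d - t), 0.0, 1.0)

def conv_density(x: float) -> float:
    """(f * g)(x) = integral of f(x - t) g(t) dm(t) for f = g = 1_[0,1]."""
    return overlap(x - 1.0, x)

lhs = conv_measure(0.25, 1.5)            # (mu * lambda)(E)
rhs = midpoint(conv_density, 0.25, 1.5)  # integral over E of (f * g) dm

print(lhs, rhs)
```

Both numbers should agree with the exact value \(0.84375\), which one can compute by hand from the triangle-shaped density \(f \ast g\).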

\(L^1(\mathbb{R}^n,m)\) could have been a Banach algebra, but the unit is missing. However one can embed it into \(M(\mathbb{R}^n)\) as a subspace of the subalgebra \(M_{L^1}(\mathbb{R}^n)\) which contains all complex Borel measures \(\mu\) satisfying \[d\mu = fdm + \lambda d\delta, \quad \lambda \in \mathbb{C}.\] On the other hand, by the Lebesgue decomposition theorem, every \(\mu \in M(\mathbb{R}^n)\) has a unique decomposition \[\mu = \mu_a + \mu_s\] where \(\mu_a \ll m\) and \(\mu_s \perp m\). With this being said we have a direct sum \[M(\mathbb{R}^n) = L^1(\mathbb{R}^n,m) \oplus M_s(\mathbb{R}^n)\] where \(M_s(\mathbb{R}^n)\) is the subspace of complex measures singular to \(m\). Informally speaking, the Gelfand transform on \(L^1(\mathbb{R}^n,m)\) can be identified with the Fourier transform. Hence to study the Gelfand transform on \(M(\mathbb{R}^n)\) it suffices to work on \(M_s(\mathbb{R}^n)\). This shows the relation between \(L^1\) and \(C_0\).

Let \(G\) be the group of invertible elements of \(M=M(\mathbb{R})\), and let \(G_1\) be the component of \(G\) that contains \(\delta\). Then \(G_1\) is an open normal subgroup of \(G\). Since \(M\) is commutative, \(G_1=\exp(M)\), and \(G/G_1\) contains no nontrivial element of finite order. We will show that \(G/G_1\) is actually uncountable. Pick \(\alpha \in \mathbb{R}\) and assume \(\delta_\alpha \in G_1\); then \(\delta_\alpha = \exp(\mu_\alpha)\) for some \(\mu_\alpha \in M\). Taking the Fourier transform on both sides gives \[\int e^{-ixt}d\delta_\alpha = e^{-i\alpha t} = \int e^{-ixt}d\exp(\mu_\alpha)(x)=e^{\hat{\mu}_\alpha(t)}\] Hence \[-i\alpha t = \hat{\mu}_\alpha(t)+2k\pi{i}\] for some integer \(k\), which does not depend on \(t\) because both sides are continuous in \(t\). Since \(\mu_\alpha\) has finite total variation, \(\hat{\mu}_\alpha\) is bounded, which is only possible if \(\alpha=0\). This is to say \(\delta_\alpha \in G_1 \implies \alpha=0\). Next, if \(\delta_\alpha G_1 = \delta_\beta G_1\), then \(\delta_{\alpha-\beta}=\delta_\alpha \ast \delta_{-\beta} \in G_1\) and hence \(\alpha=\beta\). In other words, each coset \(\lambda G_1 \in G/G_1\) contains at most one Dirac measure. Hence we have obtained an injective map \[\begin{aligned}\Lambda:\mathbb{R} &\to G/G_1, \\ \alpha &\mapsto \delta_\alpha G_1.\end{aligned}\] This is to say, \(G/G_1\) is uncountable.

To begin with we consider a calculus problem that you may have seen in your exam:

Let \(f\) be a *continuous* function on \([0,\infty)\) such that \(\lim_{x \to \infty} f(x)=l\), and let \(a,b>0\). Prove that \[\int_0^\infty \frac{f(ax)-f(bx)}{x}\mathrm{d}x = (f(0)-l)\ln\frac{b}{a}\]

And we "solve" this problem as follows. Put \(g(x)=f(x)-l\), then \(\lim_{x \to \infty}g(x)=0\). Consider the two-variable function \(F(x,y)=-g'(xy)\) and the region \(D=\{(x,y):x \ge 0, a \le y \le b\}\). We have this result: \[\begin{aligned}\iint_D F(x,y)\mathrm{d}x\mathrm{d}y &= \int_0^\infty\mathrm{d}x\int_a^b -g'(xy)\mathrm{d}y \\ &= \int_0^\infty \frac{g(ax)-g(bx)}{x}\mathrm{d}x \\ &= \int_a^b \mathrm{d}y \int_0^\infty -g'(xy)\mathrm{d}x \\ &= \int_a^b \frac{g(0)}{y}\mathrm{d}y \\ &=g(0)\ln\frac{b}{a} \\\end{aligned}\] Substituting \(g(x)\) with \(f(x)-l\) gives exactly what we want, doesn't it? **Well, the more analysis you learn, the more absurd you will realise this proof is.** If you write this in an exam you will get \(0\) marks no matter what. There are two major mistakes:

- Can we change the order of integration? We have no idea. What is certain is that the order cannot always be changed freely, and there are counterexamples.
- Is this function *even* differentiable? We also have no idea. It is *almost certain* that \(f\) is not (the probability that \(f\) is differentiable is \(0\)), see this post to learn why if you have some background in functional analysis.

For a good proof, please turn to math.stackexchange. This is not easy at all.
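The identity itself is fine; only the proof above is not. As a numerical sanity check (which of course proves nothing), the sketch below evaluates the integral for the sample choice \(f(x)=e^{-x}\), \(a=1\), \(b=2\), so that \(f(0)=1\) and \(l=0\). The truncation point, step count and function names are assumptions made for this example.

```python
import math

def frullani_integrand(f, a: float, b: float, x: float) -> float:
    """(f(ax) - f(bx)) / x for x > 0."""
    return (f(a * x) - f(b * x)) / x

def frullani_numeric(f, a: float, b: float, upper: float = 60.0, steps: int = 200_000) -> float:
    """Midpoint rule on (0, upper]; the midpoints never hit x = 0."""
    h = upper / steps
    return h * sum(frullani_integrand(f, a, b, (k + 0.5) * h) for k in range(steps))

f = lambda x: math.exp(-x)   # f(0) = 1 and f(x) -> l = 0 as x -> infinity
a, b = 1.0, 2.0

numeric = frullani_numeric(f, a, b)
predicted = (1.0 - 0.0) * math.log(b / a)   # (f(0) - l) * ln(b / a)

print(numeric, predicted)
```

The tail beyond the truncation point is negligible here because \(e^{-x}\) decays so quickly; for a slowly converging \(f\) this naive approach would need more care.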

The problem is, it is really *unfair* that in some circumstances we have to axe out all properties of differentiation. If you are studying differential equations, and a non-differentiable function pops up, you have no way to go. Sometimes, chances are that you even have *no idea* whether a function is differentiable.

So this post is written. We introduce the concept of (Schwartz) **distribution** (a.k.a. **generalised function**), where differentiation is significantly extended, to obtain the **derivative** in a generalised sense. Roughly speaking, once distributions have been introduced, differentiation can be done with absolute ease.

In fact, physicists had been using distributions long before mathematicians established formal theories. For example, take the \(\delta\) *function* introduced by Dirac, which you may have met when studying the Fourier transform: \[\delta(x) = \begin{cases}\infty &\quad x=0, \\ 0&\quad \text{otherwise} .\end{cases}\] And it is required that \[\int_{-\infty}^{\infty}\delta(x)\mathrm{d}x=1.\] But this does not make any sense in calculus. Von Neumann, in his book on quantum physics, warned against the theory using this function, and dismissed this function as a "fiction". Not so pleasant. He tried with a lot of effort to demonstrate that quantum physics could live without such a "fiction". As you can imagine, this function may have created some bad blood between von Neumann and Dirac.

Laurent Schwartz, however, managed to be a peacemaker. He developed the theory of distributions (which is exactly what we are talking about in this post), and the "fiction" became an easy "fact". Years later, he became the 1950 Fields Medalist (one of the most prestigious awards in mathematics) at the age of 35, with the citation

Developed the theory of distributions, a new notion of generalized function motivated by the Dirac delta-function of theoretical physics. (Source)

As you can see later, thanks to Schwartz, the twisted \(\delta\) function is well-defined and is really plain and elegant. So von Neumann didn't need to be angry later.

By *concept* I mean, I will try to include basic ideas (without many proofs though they can be delivered), so that the serious study of it can be simpler (it can be really tough!). It is not possible that you can solve problems on distributions after reading this post.

There will be two parts. Part one focuses on motivation and what is going on. I will try to make it readable to anyone who has finished calculus or, more ideally, undergraduate analysis and linear algebra, though rigour is not always guaranteed. It would be better if you know some differential equation theory, but that's not a must. If you already have the background to read part 2, then part 1 will be much easier for you and can serve as a good source of intuition and motivation.

If you still need to understand differentiation in single-variable calculus, then you have no need to struggle with generalised differentiation at this early point. It does not help. The requirements from linear algebra are vector spaces, subspaces and linear maps. You should know that integration and differentiation are linear maps. This is a graduate course topic, so it is not realistic to write for a reader who has no idea about calculus and linear algebra.

The second part will be much more advanced, and you are expected to have some background in topological vector spaces (functional analysis). Both parts cannot be considered as a lecture note but they may help you find where you are when you study this concept seriously.

Throughout, we consider real-valued functions on \(\mathbb{R}\). The theory can be generalised to complex-valued functions on \(\mathbb{R}^n\), where partial derivatives come into play, but we are not doing that here. At the end of the day, that extra work is not a big deal.

In calculus, a lot of the functions we study are smooth (for example, \(y=\sin{x}\)), and we write \(C^\infty\) for the space of such functions, as they are *infinitely differentiable*. This is a vector space, and in this vector space differentiation can be done *with absolute ease*. For given \(f \in C^\infty\), we have \(f',f'',\cdots,f^{(k)}\) well defined for all \(k = 1,2,\cdots\). But in vector spaces like \(C^2\), \(C^1\), or even \(C\), differentiation can only be done with caution: we may only have \(f''\) and no \(f^{(3)}\), or even \(f'\) may not exist. We don't *like* this kind of caution. Hence we introduce the concept of **distribution**, also known as **generalised function**. We want a space where we can still do differentiation with absolute ease. We may need to *modify* our definition of differentiation so that it works on every continuous function (but it shall not lose its meaning within \(C^\infty\)). Bearing these in mind, we have several settings or expectations for distributions:

- Every continuous function should be (considered as) a distribution. (So we can take *derivatives* of all continuous functions without too much worry, unlike in the calculus problem at the beginning.)
- The "modified differentiation" should make sure that the "modified derivative" of a distribution is still a distribution. In other words, distributions are "infinitely differentiable" (which makes differential equation theory much easier). In the language of algebra, the "modified derivative" should be an endomorphism.
- The usual formal rules of calculus should hold. For example in the new sense we should still have \((fg)'=f'g+g'f\). (Our modified differentiation should not go too far.)
- Convergence properties should also be available. (Validating this requires more theory so this can only be mentioned in part 2.)

Let's write our desired distribution as \(\mathscr{D}'\), and all continuous functions \(C\). All \(C,C^\infty,\mathscr{D}'\) are considered as real vector spaces and we should have \[C^\infty \subset C \subset \mathscr{D}'\] in the sense of subspaces.

Here is a breakdown of these concepts. You will see terminologies and definitions later.

- A smooth, continuous or, more generally, locally integrable function gives rise to a bounded linear functional. The converse is not guaranteed to be true, but we *pretend* it to be true, so all **bounded linear functionals** give rise to distributions, a.k.a. generalised functions (this name is nice because we *pretend* the converse to be true). Whenever you are asked what a generalised function is, you can say it is a linear map, and sometimes it can be determined by a normal function.
- For these distributions or generalised functions, we modify the derivative with respect to integration by parts. The modified derivative cannot be put down explicitly but we don't care, because integration by parts doesn't give us many problems. Whenever you are asked how the derivative of a non-differentiable function is given, you can say it is given by pretending nothing is wrong in integration by parts.

We now try to understand what we really want from distributions. We start our study through integration, **because differentiation does not work**. Given \(f \in C \subset \mathscr{D}'\), we first need to make sure \(\int f\phi\) is well-defined, for *some* \(\phi\in C^\infty\), because we want to do integration by parts, which involves **some differentiation**, and we may make use of it.

If \(f\) is not even a continuous function, we still need to consider *some* \(\phi\) in the same manner, or our extension would be abrupt.

Let's talk about these \(\phi\) a little bit, with respect to integration by parts. Consider the bump function \[\phi(x) = \begin{cases} \exp(\frac{1}{(x-a)(x-b)}) & \quad a < x < b, \\ 0 &\quad \text{ otherwise. }\end{cases}\] On \((a,b)\), we have \(\phi(x)>0\). On the boundary \(a\) and \(b\) we have \(\phi(x)=0\) but that shouldn't be a problem, because they are the alpha and omega. Points outside \([a,b]\) have no contribution to the value of this function. For some obvious reason we call \([a,b]\) the *closure* of \((a,b)\). In general, given a real-valued function \(f\), we call the closure of the set of points where \(f(x) \ne 0\) the **support** of \(f\). As you can tell, the support of \(\phi\) is \([a,b]\).
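To make the bump function concrete, here is a minimal numerical sketch. The helper name `bump` and the default interval \((0,1)\) are assumptions made for illustration only.

```python
import math

def bump(x, a=0.0, b=1.0):
    """Bump function exp(1/((x-a)(x-b))) on (a, b), extended by zero elsewhere."""
    if a < x < b:
        return math.exp(1.0 / ((x - a) * (x - b)))
    return 0.0

# Strictly positive inside (a, b); zero at the endpoints and outside.
samples = [-0.5, 0.0, 0.25, 0.5, 0.75, 1.0, 1.5]
values = [bump(x) for x in samples]
print(values)
```

The printed values vanish exactly at and beyond the endpoints, which is the numerical shadow of the statement that the support is \([a,b]\).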

If \(\phi\) has unbounded support (the support of a function \(f\) is the closure of the set of points \(x\) where \(f(x) \ne 0\)), then we may need to discuss limits at infinity. But we don't want improper integrals at all. Hence the support of \(\phi\) is always assumed to be a **closed and bounded** subset of \(\mathbb{R}\). It is closed because it is defined to be a closure. These closed and bounded sets are called *compact* sets. If you are not familiar with topology, it is OK at this moment to consider compact sets as bounded closed intervals \([a,b]\).

The test function space \(\mathscr{D}\) is defined to be all \(C^\infty\) functions with compact support. This is indeed a vector space and the verification is a good exercise in both linear algebra and calculus. What about \(\mathscr{D}'\)? Here we demonstrate how things are extended.

For each \(f \in C\) (which contains \(C^\infty\)), we have a functional (a functional is a linear map from a vector space to its base field, here \(\mathbb{R}\). Nothing special, just a different name that has been used by mathematicians for decades!) \[\begin{aligned} \Lambda_f: \mathscr{D} &\to \mathbb{R}, \\ \phi &\mapsto \int f\phi.\end{aligned}\] This functional is **bounded** for all \(\phi \in \mathscr{D}\) because if \(\phi\) has support \(K\), then \[|\Lambda_f(\phi)|=\left|\int_K f\phi\right| \le \left(\int_K |f| \right)\sup_{x \in K}|\phi|.\] A continuous function on a compact set is always bounded, hence the integral on the right hand side is always finite. If it could be infinite, a lot of problems would arise.

In general, a **bounded linear functional** \(\Lambda:\mathscr{D} \to \mathbb{R}\) is called a *distribution*, and these functionals form \(\mathscr{D}'\) exactly. Since every continuous function \(f\) gives rise to a unique bounded functional \(\Lambda_f\), we consider \(C\) as a subspace of \(\mathscr{D}'\). Such a function gives rise to a functional, which is called a distribution. The converse is not generally true, but we *pretend* it to be true (we pretend the functional gives rise to a function anyway), which makes our study easier, hence the name *generalised function* is well-deserved.

The differential operator \(D\) in \(C^\infty\) should be extended into \(\mathscr{D}'\) naturally. There are many ways to extend a linear map. For example the identity map \(i:\mathbb{R} \to \mathbb{R}\) has at least two ways to be extended to \(\mathbb{R}^2\):

- \(I:\mathbb{R}^2 \to \mathbb{R}^2\) by \((x,y) \mapsto (x,y)\).
- \(\pi:\mathbb{R}^2 \to \mathbb{R}\) by \((x,y) \mapsto x\).

The restriction of these two maps on \(\mathbb{R}\) is the same as \(i\).

But if we extend \(D\) in several ways, things would be messy. Originally derivative is defined in the sense of limit, but for a non-differentiable function, we cannot do that. We need an extension that makes most sense: it is by validating **integration by parts**. It seems like we are developing some advanced concepts, but still we need to make use of elementary ones.

For \(f(x)=\sin{x}\) and \(\phi \in \mathscr{D}\), we have \[\Lambda_{f'}(\phi)=\int f'\phi = \int \phi\cos{x} = \underbrace{\phi\sin{x}|_{-\infty}^{\infty}}_{\text{zero}} -\int \phi'\sin x=-\Lambda_f(\phi')\] The derivative of \(f\) is assigned to the derivative of \(\phi\). Again we are using integration by parts. If \(f\) is not assumed to be differentiable, we *pretend* it is, skip the body and jump to the result immediately. For example, \(f(x)=|x|\) is not differentiable, but we do that anyway: \[\int |x|'\phi = -\int |x|\phi'.\] In general for \(f \in C^\infty\), we have (this can be verified by some computation) \[\Lambda_{D^k f}(\phi)=\int D^k f \phi = (-1)^k \int fD^k\phi = (-1)^k \Lambda _f(D^k\phi).\] Differentiation for distributions (on top of \(C^\infty\) functions) should be in the same **shape**, hence we define the \(k\)-th **distribution derivative** of a distribution \(\Lambda\) by \[D^k\Lambda: \phi \mapsto (-1)^{k}\Lambda(D^k\phi).\] Since all \(\phi\) are assumed to be of class \(C^\infty\), there is no problem with this formula and this differentiation is defined for all \(\Lambda\). We don't care about the first order limit on a continuous but not differentiable function. What matters here is the differentiation on test functions.
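The identity \(\int |x|'\phi = -\int |x|\phi'\) can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the test function is the bump on \((-1,2)\), the integrals use a plain midpoint rule, and the names `phi`, `dphi`, `integrate` and `sign` are hypothetical helpers. It compares \(-\int |x|\phi'\) with \(\int \operatorname{sgn}(x)\phi\), where \(\operatorname{sgn}\) is the almost-everywhere derivative of \(|x|\).

```python
import math

A, B = -1.0, 2.0  # the test function is supported on [A, B]; an arbitrary choice

def phi(x):
    """Bump test function on (A, B)."""
    if A < x < B:
        return math.exp(1.0 / ((x - A) * (x - B)))
    return 0.0

def dphi(x):
    """Ordinary derivative of phi, computed by the chain rule."""
    if A < x < B:
        q = (x - A) * (x - B)
        return -phi(x) * (2.0 * x - A - B) / (q * q)
    return 0.0

def integrate(g, lo=A, hi=B, n=30000):
    """Midpoint rule on [lo, hi]; the integrands vanish outside this interval."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def sign(x):
    return (x > 0) - (x < 0)

lhs = -integrate(lambda x: abs(x) * dphi(x))  # (D Lambda_f)(phi) with f = |x|
rhs = integrate(lambda x: sign(x) * phi(x))   # Lambda_g(phi) with g = sgn
print(lhs, rhs)
```

The two numbers agree up to discretisation error, which is what "pretending nothing is wrong in integration by parts" amounts to for \(|x|\).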

Try to recall what you have learnt about integration by parts. We have \[\int uv' = \int (uv)' - \int u'v\] because \[(uv)' = u'v+uv'.\] Therefore, if our generalisation of differentiation (though we do not know how to do yet) pays respect to integration by parts, then we can still work on product rule of differentiation, hence the usual formal rules of calculus would not go too far. If our extension conflicts with integration by parts, then the ordinary meaning of differentiation is damaged.

Let's sum up what has happened. We have obtained an inclusion \[C^\infty \subset C \subset \mathscr{D}'.\] Every distribution is infinitely differentiable because functions in \(\mathscr{D}\) are. If \(f \in C^\infty\), then the \(k\)-th derivative can be understood in both the sense of ordinary differentiation and the sense of distribution because it is given by \[\phi \mapsto (-1)^k\int f \phi^{(k)} = \int f^{(k)}\phi\quad \forall \phi \in \mathscr{D}. \] This is independent of the choice of \(\phi\). If \(h\) is a continuous function such that \(\int h\phi = \int f^{(k)}\phi\) for all \(\phi \in \mathscr{D}\), then \(h=f^{(k)}\).

If \(f\) is merely continuous, still we can write the \(k\)-th derivative as \[\phi \mapsto (-1)^{k} \int f \phi^{(k)} \quad \forall \phi \in \mathscr{D}.\]

At this point, whether \(f\) is differentiable or not is not of our concern. Since \(\phi\) is smooth, the formula above is well-defined. In general we don't even care whether \(f\) is continuous or even integrable, as long as it gives rise to a **bounded** linear functional, which can be guaranteed by being *locally integrable*. A function is locally integrable if \(\int_K |f|<\infty\) for all compact \(K \subset \mathbb{R}\). In particular, \(K\) can be taken to be any bounded closed interval. **As long as \(f\) is locally integrable (for example, differentiable, continuous, or simply bounded), we can assign derivative in the new sense (integration by parts).**

We want something like \((fg)'=f'g+fg'\). To avoid confusion we use \(D\) to denote the derivative on distribution and \(f'\) to denote the derivative in the ordinary sense. This is pretty hard but for a multiplication of a \(C^\infty\) function and a distribution it is not that hard. Suppose \(\Lambda \in \mathscr{D}'\) and \(f \in C^\infty\). We define their 'product' by \[(f\Lambda)(\phi) = \Lambda(f\phi).\] We have another distribution and derivative follows in a natural way: \[\begin{aligned} D(f\Lambda)(\phi) &=-(f\Lambda)(\phi') \\ &= -\Lambda(f\phi') \\\end{aligned}\] Meanwhile \[\begin{aligned}(f'\Lambda+fD\Lambda)(\phi) &= \Lambda(f'\phi)+D\Lambda(f\phi) \\ &= \Lambda(f'\phi)-\Lambda(f'\phi+f\phi') \\ &=-\Lambda(f\phi').\end{aligned}\] Things still work in this aspect.
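As a worked example of this product rule (an illustration, not part of the original argument), take \(f(x)=x\) and \(\Lambda=\delta_0\), the evaluation functional \(\phi \mapsto \phi(0)\). Using only the definitions \((f\Lambda)(\phi)=\Lambda(f\phi)\) and \(D\Lambda(\phi)=-\Lambda(\phi')\), we get \[\begin{aligned} (x\delta_0)(\phi) &= \delta_0(x\phi) = 0\cdot\phi(0) = 0, \\ (xD\delta_0)(\phi) &= D\delta_0(x\phi) = -\delta_0\big((x\phi)'\big) = -\delta_0(\phi+x\phi') = -\phi(0). \end{aligned}\] Hence \(D(x\delta_0)=0\) on one side and \(f'\Lambda+fD\Lambda=\delta_0+xD\delta_0=\delta_0-\delta_0=0\) on the other, so both sides of the product rule agree.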

We haven't verified convergence yet, but that requires much more knowledge of functional analysis, so we don't do that here but in part 2. Fortunately, things go in an intuitive way.

Consider the linear functional on \(\mathscr{D}\) given by \[\delta(\phi)=\phi(0).\] This is bounded and is in fact our rigorous definition of the Dirac \(\delta\) function (Von Neumann can relax then!). It does have the *required property*. Say, if we realise this function as integration (informally) as \[\delta(\phi)=\int \delta\phi=\phi(0) \quad \forall \phi \in \mathscr{D},\] then \(\delta\) can indeed be considered as a *function* whose support is the origin, and whose integral over \(\mathbb{R}\) is \(1\).

The *derivative* of \(\delta\) is well-presented as well. By the definition of the distribution derivative, \(\delta'(\phi)=-\delta(\phi')\), hence we have \[\delta'(\phi)=-\phi'(0).\]
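Since a distribution is just a linear map on test functions, it can be modelled as a callable. The following is a minimal sketch under stated assumptions: `phi`, `derivative`, `delta` and `D` are hypothetical names, the test function is the bump on \((-1,2)\), and \(\phi'\) is approximated by a central difference. It follows the definition \(D\Lambda(\phi)=-\Lambda(\phi')\), so \(D\delta(\phi)=-\phi'(0)\).

```python
import math

def phi(x, a=-1.0, b=2.0):
    """A test function: the bump exp(1/((x-a)(x-b))) on (a, b), zero elsewhere."""
    if a < x < b:
        return math.exp(1.0 / ((x - a) * (x - b)))
    return 0.0

def derivative(f, h=1e-5):
    """Central-difference approximation of f'; adequate for a smooth test function."""
    return lambda x: (f(x + h) - f(x - h)) / (2.0 * h)

def delta(test):
    """Dirac distribution: evaluate the test function at the origin."""
    return test(0.0)

def D(dist):
    """Distribution derivative: (D dist)(test) = -dist(test')."""
    return lambda test: -dist(derivative(test))

print(delta(phi))     # phi(0) = exp(-1/2)
print(D(delta)(phi))  # approximately -phi'(0) = -exp(-1/2)/4
```

For this particular bump, the chain rule gives \(\phi'(0)=\phi(0)/4\), which is the value the second line approximates up to sign.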

So much for part 1. If you don't have much background in functional analysis, then part 2 is not recommended, as you would have no idea what is going on. It is not feasible to make part 2 readable to more people.

Here we provide some basic facts about test functions and distributions, assuming the reader has some background in functional analysis. No proof is delivered, because if I did, this post could be as long as I wanted. I hope by organising facts here I can help you realise what is going on before you drown yourself in the details of a proof. It is recommended to see the table of contents on the right hand side first if you are on PC.

In brief, test functions are smooth functions with compact support. By the **support** of a function we mean the *closure* of the set \(\\{x:f(x) \ne 0\\}\). Let \(K\) be a compact set in \(\mathbb{R}\), then \(\mathscr{D}_K\) denotes the subspace of \(C^\infty\) consisting of functions whose support lies in \(K\). Since a closed subset of a compact set is itself compact, we see all functions in \(\mathscr{D}_K\) have compact support.

Test function space is defined by \[\mathscr{D} := \bigcup_{K \text{ compact}}\mathscr{D}_K.\] And the distribution space \(\mathscr{D}'\) is defined to be the dual space of \(\mathscr{D}\), i.e. the space of *continuous* linear functionals of \(\mathscr{D}\). But if we don't know the topology of \(\mathscr{D}\), we cannot proceed. *Here is how we attempt to establish the norm.*

Consider the norms on \(\mathscr{D}\) defined, for \(\phi \in \mathscr{D}\) and all \(N=0,1,2,\cdots\), by \[\| \phi \|_N = \sup_{x \in \mathbb{R}; n \le N}|D^n\phi(x)|.\] On each \(\mathscr{D}_K\) this induces a local base \[V_N = \left\{ \phi \in \mathscr{D}_K:\|\phi\|_N \le \frac{1}{N} \right\} \quad (N=1,2,3,\cdots).\]

And we get a locally convex metrisable topology on \(\mathscr{D}\).

If this topology made \(\mathscr{D}\) a Banach space, it would be fantastic - a lot of Banach space techniques could be used. However, this topology is too *small* to be complete. One simply needs to consider this sequence: \[\psi_m(x)=\phi(x-1)+\frac{1}{2}\phi(x-2)+\cdots+\frac{1}{m}\phi(x-m)\] where \(\phi \in \mathscr{D}_{[0,1]}\) and \(\phi>0\) on \((0,1)\). This sequence is Cauchy but the limit has no bounded support hence does not lie in \(\mathscr{D}\).
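To see why the sequence is Cauchy, here is the computation spelled out (a sketch using only the norms \(\|\cdot\|_N\) defined above). The translates \(\phi(x-k)\) have supports in \([k,k+1]\), which overlap only at endpoints where \(\phi\) and all its derivatives vanish. Hence for \(m>n\) and every \(N\), \[\|\psi_m-\psi_n\|_N = \left\| \sum_{k=n+1}^{m}\frac{1}{k}\phi(\cdot-k) \right\|_N = \max_{n<k\le m}\frac{1}{k}\|\phi\|_N = \frac{\|\phi\|_N}{n+1} \to 0 \quad (n \to \infty).\] On the other hand, the pointwise limit \(\sum_{k \ge 1}\frac{1}{k}\phi(x-k)\) is strictly positive on every interval \((k,k+1)\), so its support is unbounded.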

This time we do an *enhancement* on the previous topology, which makes \(\mathscr{D}\) a locally convex topological space, which is complete and has the Heine-Borel property (closed and bounded set is compact and vice versa). We still need the topology defined in our first attempt. It is broken into three steps:

- For each compact set \(K\), let \(\tau_K\) denote the subspace topology of \(\mathscr{D}\) defined in attempt 1.
- Let \(\beta\) be the collection of all convex balanced sets \(W \subset \mathscr{D}\) such that \(\mathscr{D}_K \cap W \in \tau_K\) for all compact \(K\). (A set \(W\) is balanced if \(\alpha{W} \subset W\) for all \(|\alpha| \le 1\).)
- The new topology \(\tau\) is defined to be the collection of all unions of sets of the form \(\phi + W\) with \(\phi \in \mathscr{D}\) and \(W \in \beta\).

This is the topology we want, and one can indeed verify that \(\tau\) is a topology, with local base \(\beta\). This topology has the following properties:

- \(\tau\) makes \(\mathscr{D}\) a locally convex topological vector space.
- \(\mathscr{D}\) has the Heine-Borel property.
- In \(\mathscr{D}\), every Cauchy sequence converges.

Locally, **the topology of \(\mathscr{D}_K\) is the same as \(\tau_K\)**. Hence we can still use properties of these norms if we want. In fact, this \(\tau_K\) makes \(\mathscr{D}_K\) a Fréchet space, i.e. a locally convex space whose topology is induced by a complete translation-invariant metric.

We cannot discuss continuity without topology. But still continuity has to be treated carefully. For example the space \(L^p([0,1])\) with \(0<p<1\) is weird: the dual space is trivial, due to its topology: the only two open convex sets are empty set and itself. Fortunately we have the following, which is quite intuitive.

Suppose \(\Lambda\) is a linear mapping of \(\mathscr{D}\) into a locally convex space \(Y\) (which can be \(\mathbb{R}\), \(\mathbb{C}\) or \(\mathscr{D}\) itself). Then the following are equivalent:

- \(\Lambda\) is continuous. (We care about the behaviour of \(\mathscr{D}'\))
- \(\Lambda\) is bounded. (You must have learnt the equivalence of 1 and 2 already)
- \(\phi_i \to 0\) in \(\mathscr{D}\) implies \(\Lambda\phi_i \to 0\) in \(Y\).
- The restriction of \(\Lambda\) to every \(\mathscr{D}_K\) is continuous.

In particular, it follows that the differential operator \(D^n\) is continuous for all \(n\). We also have some knowledge of the behaviour of \(\mathscr{D}'\) now:

If \(\Lambda\) is a linear functional on \(\mathscr{D}\), then the following are equivalent:

- \(\Lambda \in \mathscr{D}'\).
- To every compact set \(K\) there corresponds a nonnegative integer \(N\) and a constant \(C<\infty\) such that the inequality \[|\Lambda\phi| \le C \|\phi\|_N\] holds for every \(\phi \in \mathscr{D}_K\).

Consider the Dirac distribution at \(x\) given by \[\delta_x(\phi)=\phi(x)\quad \phi \in \mathscr{D}.\] This is indeed a distribution. The case when \(x=0\) gives us the Dirac function in physics. Since \[\mathscr{D}_K = \bigcap_{x \in K^c}\ker\delta_x,\] \(\mathscr{D}_K\) is a **closed subspace** of \(\mathscr{D}\). Since \(\mathscr{D}_K\) is also nowhere dense, and there is a countable collection of \(K_i \subset \mathbb{R}\) (for example \(K_i=[-i,i]\)) such that \(\mathscr{D} = \bigcup \mathscr{D}_{K_i}\) (of the first category), and \(\mathscr{D}\) itself is complete, by Baire's Category Theorem, \(\mathscr{D}\) is not metrisable. This is a flaw of the topology of \(\mathscr{D}\), though it is not that troublesome.

We have shown that every \(C^\infty\) function can be considered as a distribution. In general, for a function \(f\) one only needs to require that \(f\) is **locally integrable**, i.e. for every compact set \(K\) we have \[\int_K |f|<\infty.\] If we define \(\Lambda_f:\phi \mapsto \int f\phi\), we see \[|\Lambda_f(\phi)|\le \left( \int_K |f| \right)\sup|\phi|, \quad \phi \in \mathscr{D}_K.\]

In particular, at the very least, all \(L^1\) functions can be considered as distributions.

On the other hand, if \(\mu\) is a positive measure on \(\mathbb{R}\) with \(\mu(K)<\infty\) for all compact \(K\), then \[\Lambda_\mu:\phi \mapsto \int \phi d\mu\] also defines a distribution.

We know the fundamental theorem of calculus in \(L^1\) only holds when the function \(f\) is *absolutely continuous*. The Cantor function \(f\) is differentiable almost everywhere on \([0,1]\) but \[\int_0^1 f'(x)\mathrm{d}x = 0, \quad f(1)-f(0)=1.\] This restriction still makes sense here. Pick \(f\) to be a left-continuous function with bounded variation. Then it can be shown that \[D\Lambda_f = \Lambda_\mu\] where \(\mu([a,b))=f(b)-f(a)\). Hence \(D\Lambda_f=\Lambda_{Df}\) if and only if \(f\) is *absolutely continuous*.
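A standard worked example (added here as an illustration) is the left-continuous Heaviside step \(H(x)=1\) for \(x>0\) and \(H(x)=0\) for \(x \le 0\). For \(\phi \in \mathscr{D}\), \[D\Lambda_H(\phi) = -\Lambda_H(\phi') = -\int_0^\infty \phi'(x)\mathrm{d}x = \phi(0) = \delta_0(\phi),\] because \(\phi\) vanishes for large \(x\). This matches the measure description: \(\mu([a,b))=H(b)-H(a)\) equals \(1\) exactly when \(0 \in [a,b)\), so \(\mu\) is the unit point mass at the origin. The ordinary derivative of \(H\) is \(0\) almost everywhere, so \(\Lambda_{DH}=0 \ne D\Lambda_H\), in agreement with the fact that \(H\) is not absolutely continuous.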

We consider the weak*-topology of \(\mathscr{D}'\) by \[\Lambda_i \to \Lambda: \lim_{i \to \infty}\Lambda_i\phi = \Lambda\phi \quad \forall \phi \in \mathscr{D}.\] Then fortunately this limit operator commutes with the differential operator in a natural way, which may remind you of uniform convergence. In fact, \[\Lambda_i \to \Lambda \implies \Lambda \in \mathscr{D}' \text{ and }D^k\Lambda_i \to D^k\Lambda \quad \forall k=1,2,\cdots.\] To prove this one needs the Banach-Steinhaus theorem. Here concludes our four requirements of distributions.

Convolution plays an important role in Fourier analysis, and here is how to invite distribution to the party.

Normally for two \(L^1\) functions \(f,g\) we define \[(f \ast g)(x)=\int_\mathbb{R}f(y)g(x-y)\mathrm{d}y.\] We can create more symbols to make life easier:

- \(\tau_xu(y)=u(y-x)\).
- \(\check{u}(y)=u(-y)\).

It follows that \(\tau_x\check{u}(y)=\check{u}(y-x)=u(x-y)\). Hence \[(f \ast g)(x) = \int_\mathbb{R} f(y)(\tau_x\check{g})(y)\mathrm{d}y.\] It shows that \(g \mapsto (f \ast g)(x)\) is actually the composition of \(\Lambda_f\), \(\tau_x\) and \(g \mapsto \check{g}\), that is, \((f \ast g)(x)=\Lambda_f(\tau_x\check{g})\). But \(\Lambda_f\) itself can be a distribution, hence we define convolution for a distribution and a smooth function by \[L \ast \phi(x) = L(\tau_x\check{\phi}), \quad L \in \mathscr{D}', \phi \in \mathscr{D}.\] Convolution can be characterised in a natural way. In fact, for any continuous linear map \(T:\mathscr{D} \to C^\infty\), if \[\tau_x T = T\tau_x\] for all \(x\), then there is a unique \(L \in \mathscr{D}'\) such that \[T\phi = L\ast \phi.\] As you can imagine, this setting creates a lot of potential for the Fourier transform.
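The definition \(L \ast \phi(x) = L(\tau_x\check{\phi})\) can be tried out on point evaluations. This is a minimal sketch with hypothetical helper names (`translate`, `reflect`, `convolve`, `delta_at`); a distribution is modelled as a callable on test functions. It illustrates that convolving with \(\delta\) returns \(\phi\) itself, and convolving with \(\delta_p\) translates \(\phi\) by \(p\).

```python
import math

def phi(x):
    """Test function supported on [-1, 1]."""
    if -1.0 < x < 1.0:
        return math.exp(1.0 / ((x + 1.0) * (x - 1.0)))
    return 0.0

def translate(x, u):
    """tau_x u: y -> u(y - x)."""
    return lambda y: u(y - x)

def reflect(u):
    """u-check: y -> u(-y)."""
    return lambda y: u(-y)

def convolve(L, test):
    """(L * test)(x) = L(tau_x test-check), for a distribution L given as a callable."""
    return lambda x: L(translate(x, reflect(test)))

def delta_at(p):
    """Dirac distribution at p: evaluate the test function at p."""
    return lambda test: test(p)

delta = delta_at(0.0)

print(convolve(delta, phi)(0.3), phi(0.3))            # delta * phi = phi
print(convolve(delta_at(0.25), phi)(0.5), phi(0.25))  # (delta_p * phi)(x) = phi(x - p)
```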

- Walter Rudin, *Functional Analysis*, Second Edition. (Part II of the book)
- Peter Lax, *Functional Analysis*. (Appendix B)
- Stanford Encyclopedia of Philosophy Archive (Fall 2018 Edition), Quantum Theory: von Neumann vs. Dirac.

Let us say you are a programmer who has been working in big companies for a decade. How does it feel when you want to help someone who starts studying programming from scratch? You may find it makes no sense that he or she cannot understand that, by copying several lines of code from the book, they have successfully made a programme printing "Hello, world!" on the screen. You know what I am talking about - the curse of knowledge.

When one has successfully learnt some certain skill, they may immediately lose the sense on why other people cannot understand and study. What is the holdup? It becomes increasingly difficult to teach beginners. Blunt simplification does not do the trick all the time.

This is one of the reasons why becoming a good teacher is so hard. Academia superstars may be super awful in teaching, while teaching superstars may have already ceased focusing on academia.

I am not writing this post to be a guru and give some steps on how to lift the curse. In fact I think I am suffering from this as well.

For example, Tien-Yien Li was a famous curse of knowledge lifter. When he did talks, he always tried to start from simple examples (this is adorable of course). When instructing his students, he may ask his students to treat him as a fool, as if he had known nothing. He was indeed a good mathematician and good maths teacher, but I do wonder how practical it is. Can his students do calculus in front of him while assuming he has no idea what is calculus? I have no idea.

Though I am only guessing, I think 'fool' is somewhat of an exaggeration. His students were in a similar field to his, hence it would not be too hard for him to follow his students at all. Of course the way he instructed his students is adorable as well.

A reader once emailed me, suggesting that I should write my posts more simply at certain points. But I declined his suggestion in the end. Am I doing some Serge Lang thing? I have no idea.

In his 1983 book Fundamentals of Diophantine Geometry, he included L. J. Mordell's review of Lang's own book Diophantine Geometry which was ended by

In conclusion, the reader will need no convincing that Lang, as has already been said, is a very learned mathematician, thoroughly familiar with every aspect of the topics he deals with, and their developments. His interesting and valuable historical notes give further evidence of this. Lang assumes that his readers are as knowledgeable as he is, and can grapple with the subject with the same ease that he does. Even if they could, Lang's style is not such as to make matters easy for them. Lang in writing is not a follower of Gauss, whose motto was "pauca sed matura." Further thought and care about his book, before publication, would have been well worth while. Those who can understand the book will be indebted to him for having brought together in one volume the important results contained in it. How much greater thanks would he have earned if the book had been written in such a way that more of it could have been more easily comprehended by a larger class of readers! It is to be hoped that some one will undertake the task of writing such a book.

And he also included his response:

All my books are meant to be understood by readers having the prerequisites for the level at which the books are written.

These prerequisites vary from book to book, depending on the subject matter, my mood, and other aesthetic feelings which I have at the moment of writing.

When I write a standard text in Algebra, I attempt something very different from writing a book which for the first time gives a systematic point of view on the relations of Diophantine equations and the advanced contexts of algebraic geometry. The purpose of the latter is to jazz things up as much as possible. The purpose of the former is to educate someone in the first steps which might eventually culminate in his knowing the jazz too, if his tastes allow him that path. And if his tastes don't, then my blessings to him also. This is known as aesthetic tolerance. But just as a composer of music (be it Bach or the Beatles), I have to take my responsibility as to what I consider to be beautiful, and write my books accordingly, not just with the intent of pleasing one segment of the population. Let pleasure then fall where it may.

With best regards, Serge Lang.

*Refer to this reddit post for a discussion.*

I can say with absolute certainty that my posts are much more detailed than Serge Lang's books. And Lang never tried to lift the curse. But my posts cannot be readable to everyone. Say, my posts on functional analysis are not prepared for middle school students, unless they are ridiculously exceptional and have studied all prerequisites (linear algebra, real analysis, integration theory, topology) by that time. Though I shall never make my posts as terse as Lang's books, it is never my duty to make my posts readable for everyone. So to some extent I fail as well.

If I try to, over-simplification has to be admitted. And it is against my rule. I do not like over-simplification so I try to make sure everything makes sense. But one would not understand unless he or she has certain prerequisites. I may remove some obstacles and show the clues, but that is about it. I can only lift the curse with respect to a certain group of people.

It seems I did not give a thoughtful discussion. But I do hope my inbox gives me a good chance for discussion instead of a chance to spark needless arguments. I did not try to close myself off, and good evidence is that many of my posts can be found on the first page of Google search.

Throughout we consider the polynomial ring \[R=\mathbb{R}[\cos{x},\sin{x}].\] This ring has a lot of non-trivial properties which give us a good chance to study commutative ring theory.

First of all note it is immediate that \[R \cong \mathbb{R}[X,Y]/(X^2+Y^2-1)\] if the map is given by \(X \mapsto \cos x\) and \(Y \mapsto \sin x\). Besides, in \(R\) we have \[\sin^2x=(1-\cos{x})(1+\cos{x})=\sin{x}\cdot\sin{x}\] which is to say that \(R\) is not a factorial ring, although \(\mathbb{R}[X,Y]\) is.

This blog post is inspired by an exercise in Serge Lang's *Algebra*. But when writing this blog post, I found some paywalls. It would be absurd of me to direct a random reader to these paywalls. So it is very likely that I will include as many proofs as possible (when there is an absurd paywall, and chances are I will rework them for readability). But I can't remove the assumption that the reader has finished the whole of Atiyah-MacDonald or an equivalent at the very least. I will add more topics in the future but that is not an easy job.

By Hilbert's basis theorem, \(\mathbb{R}[\cos{x}]\) and therefore \(\mathbb{R}[\cos{x},\sin{x}]\) are Noetherian. Now we are interested in the normality of it. Since \(\mathbb{R}[X,Y]/(X^2+Y^2-1) \cong \mathbb{R}[X][Y]/(Y^2-(1-X^2))\), \(2\) is a unit in \(\mathbb{R}[X]\), and \(1-X^2\) is square-free but not a unit, we are able to apply the following lemma to show that \(R\) is a normal Noetherian ring (integrally closed in its field of fractions). For the definition and properties of normal rings, please refer to the Stacks project.

(Lemma 1) Let \(A\) be a factorial ring with field of fractions \(K\) in which \(2\) is a unit, \(a\) in \(A\) a square-free element (i.e., if \(p\) is a prime element in \(A\), then \(a \not\in p^2A\)) which is not a unit. Then \(A[T]/(T^2-a)\) is normal.

*Proof.* Let \(L=K[T]/(T^2-a)\) and let \(t\) be the image of \(T\) in \(A[T]/(T^2-a)\) and in \(L\). Then it is clear that \(A[t] \cong A[T]/(T^2-a)\) and we can write \(L=K(t)\). Note an element in \(K(t)\) is of degree at most \(1\) in \(t\), which is to say every element in \(L\) can be written uniquely as a sum \(r+st\) where \(r,s \in K\). To prove integral closedness, we need to find the minimal polynomial of \(r+st\).

Next we show that \(A[t]\) is integrally closed. Note \[ \begin{aligned} \left[(r+st)-r\right]^2=(st)^2 &= s^2[T^2+(T^2-a)]\\ &= s^2[a+T^2-a+(T^2-a)]\\ &= as^2 \end{aligned}\] Hence \(f(X)=(X-r)^2-as^2\) sends \(r+st\) to \(0\). For a polynomial of degree \(1\), we can only write \(g(X)=X-(r+st)\) such that \(g(r+st)=0\), which is absurd when \(s \ne 0\) because its constant term does not lie in \(K\). Hence \(f(X)=X^2-2rX+(r^2-as^2)\) is the minimal polynomial of \(r+st\) over \(K\). Since \(A\) is factorial, hence integrally closed, and \(A[t]\) is integral over \(A\), the element \(r+st\) is integral over \(A[t]\) if and only if \(-2r \in A\) and \(r^2-as^2 \in A\). We need to show this implies \(r+st \in A[t]\), and for that it suffices to show that \(r,s \in A\), provided \(-2r \in A\) and \(r^2-as^2 \in A\) when \(s \ne 0\).

Since \(2\) is a unit in \(A\), \(-2r \in A\) clearly implies \(r \in A\), and therefore \(as^2 = r^2-(r^2-as^2) \in A\). It remains to prove that \(s \in A\). For \(s \in K\), we can write \(s=s_1/s_2\) with \(s_1,s_2 \in A\) relatively prime. We shall show that \(s_2\) is always a unit, which implies that \(s \in A\). Write \(as^2=h \in A\), then we have \(as_1^2=hs_2^2\). Assume \(s_2\) is not a unit, then there is a prime \(p\) dividing \(s_2\) as \(A\) is a factorial ring, hence \(as_1^2 = hs_2^2 \in p^2A\). Since \(s_1\) and \(s_2\) are relatively prime, \(p\) does not divide \(s_1\), hence \(a \in p^2A\), a contradiction (we have assumed \(a\) to be square-free. Also, the assumption that \(a\) is not a unit is used here to reach the contradiction). Hence \(s_2\) is a unit and \(s \in A\). The proof is complete. \(\square\)

Of course I shan't be this lazy. It is clear that in the factorial ring \(A=\mathbb{R}[X]\), \(2\) is a unit. By square-free, we mean, if \(p \in A\) is prime, then \(a \not \in p^2A\). For example, in \(\mathbb{Z}\), \(12\) is not square-free because \(12=2^2 \times 3 \in 2^2\mathbb{Z}\) while \(14\) is square-free because \(14=2 \times 7\) and no square appears. And for \(1-X^2\) things are clear because we only have \(1-X^2=(1-X)(1+X)\) - there is no square. We require \(2\) to be a unit because if not the statement becomes much more difficult to prove. We shall return to normality after we study the irreducible elements.
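The integer examples above can be checked mechanically. The following is a minimal sketch; the function name `is_square_free` is hypothetical, and it only treats \(\mathbb{Z}\), not a general factorial ring.

```python
def is_square_free(n):
    """Return True if no square d*d with d >= 2 divides the nonzero integer n."""
    n = abs(n)
    if n == 0:
        raise ValueError("0 is divisible by every square")
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True

print(is_square_free(12))  # 12 = 2^2 * 3
print(is_square_free(14))  # 14 = 2 * 7
```

Testing all squares \(d^2\) rather than only prime squares gives the same answer, since a number divisible by \(d^2\) is divisible by \(p^2\) for any prime \(p\) dividing \(d\).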

To conclude we have got a satisfying result:

(Proposition 1) \(R\) is a normal Noetherian ring.

With help of Fourier transform or elementary trigonometric relations, every polynomial in \(R=\mathbb{R}[\cos{x},\sin{x}]\) can be written in the form \[P(x) = a_0+\sum_{k=1}^{n}(a_k\cos{kx}+b_k\sin{kx})\] where \(a_0,a_k,b_k \in \mathbb{R}\). Define the degree \(\delta(P)\) to be the maximum of integers \(r,s\) where \(a_r,b_s \ne 0\). Then a direct computation shows that \(\delta(PQ)=\delta(P)+\delta(Q)\).
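The degree formula can be experimented with by writing a trigonometric polynomial as \(\sum_k c_k e^{ikx}\). The sketch below is an illustration only: the dictionary representation `{k: c_k}` and the names `mul` and `degree` are assumptions, with \(\cos x = (e^{ix}+e^{-ix})/2\) and \(\sin x = (e^{ix}-e^{-ix})/(2i)\).

```python
from collections import defaultdict

def mul(p, q):
    """Multiply trigonometric polynomials stored as {k: c_k}, meaning sum of c_k e^{ikx}."""
    out = defaultdict(complex)
    for j, a in p.items():
        for k, b in q.items():
            out[j + k] += a * b
    # Drop coefficients that cancelled to zero.
    return {k: c for k, c in out.items() if c != 0}

def degree(p):
    """Largest |k| with a nonzero coefficient (0 for constants)."""
    return max((abs(k) for k in p), default=0)

one_plus_cos = {0: 1 + 0j, 1: 0.5 + 0j, -1: 0.5 + 0j}
one_minus_cos = {0: 1 + 0j, 1: -0.5 + 0j, -1: -0.5 + 0j}
sin = {1: -0.5j, -1: 0.5j}

product = mul(one_plus_cos, sin)
print(degree(one_plus_cos), degree(sin), degree(product))

# Two different factorisations of sin^2 x.
print(mul(one_minus_cos, one_plus_cos) == mul(sin, sin))
```

The coefficients used here are halves and quarters, so the floating point arithmetic is exact and the dictionary comparison is meaningful.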

If \(\delta(P)=0\), then \(P(x)=a_0\) is zero or a unit. If \(\delta(P)=1\) and \(P=P_1P_2\), then \(\delta(P_1)+\delta(P_2)=1\). One of them has to be a unit, hence \(P\) is irreducible. If \(\delta(P)=2\), then \(P\) is reducible because we can solve equations in the expansion of the product \[(a+b\sin{x}+c\cos{x})(a'+b'\sin{x}+c'\cos{x}).\] By induction all polynomials of degree \(\ge 2\) are reducible. Hence irreducible elements are of the form \[a+b\sin{x}+c\cos{x} \quad (b,c) \ne (0,0).\] But since \(R\) is not a UFD, we cannot work on the ideal \((a+b\sin{x}+c\cos{x})\) directly. We need to dive into abstraction for a long time.

We now proceed to another satisfying result.

(Proposition 2) \(R\) is a Dedekind domain.

*Proof.* Throughout, we work on the form \(R \cong \mathbb{R}[X,Y]/(X^2+Y^2-1)\). Since \(\mathbb{R}[X,Y]\) is of Krull dimension \(2\) (see Atiyah-MacDonald exercise 11.7, where a solution is almost given) and \(X^2+Y^2-1\) is irreducible, we have a prime ideal \((X^2+Y^2-1)\), and all prime ideals \(P \subset \mathbb{R}[X,Y]\) strictly containing \((X^2+Y^2-1)\) are maximal. Next, let the canonical map \(\pi:\mathbb{R}[X,Y] \to \mathbb{R}[X,Y]/(X^2+Y^2-1)\) be given. By proposition 1.1 of Atiyah-MacDonald, \(\pi(P)\) are maximal ideals in \(\mathbb{R}[X,Y]/(X^2+Y^2-1)\) provided that \(P \supsetneq (X^2+Y^2-1)\) is prime. If a nonzero ideal \(Q \subset \mathbb{R}[X,Y]/(X^2+Y^2-1)\) is prime, then \(\pi^{-1}(Q)=Q^c\) is also prime, and it contains \((X^2+Y^2-1)\) strictly, which implies that \(Q\) is maximal. Hence \(R\) is of Krull dimension \(1\). By proposition 1, \(R\) is integrally closed, hence it is Dedekind. \(\square\)

Let \(A\) be an integral domain and \(P\) be the set of all prime ideals of height \(1\), i.e. the set of all nonzero prime ideals that contain no nonzero prime ideal other than themselves. Then \(A\) is a Krull domain if

- (KD1) \(A_{\mathfrak{p}}\) is a discrete valuation ring for all \(\mathfrak{p} \in P\).
- (KD2) \(A\) is the intersection of these discrete valuation rings (all considered as subrings of the field of fractions of \(A\)).
- (KD3) Any nonzero element of \(A\) is contained in only a finite number of height \(1\) prime ideals.

To proceed with our study of \(R\), we need a lemma:

(Lemma 2)If \(A\) is a Dedekind domain, then \(A\) is also a Krull domain.

*Proof.* Since \(A\) is Dedekind, every nonzero prime ideal is maximal, so the height \(1\) primes are exactly the nonzero primes, and (KD1) holds because the localisation of a Dedekind domain at a nonzero prime ideal is a discrete valuation ring. Next we prove (KD3). Pick any nonzero \(a \in A\). If \(a\) is a unit, then it is contained in no prime ideal. If not, consider the ideal \((a)=aA\). We have a unique factorisation as a product of prime ideals: \[ (a)= \mathfrak{p}_1^{r_1}\cdots\mathfrak{p}_n^{r_n} \subset \bigcap_{j=1}^{n}\mathfrak{p}_j.\] A prime ideal contains \((a)\) if and only if it contains, hence equals, one of the \(\mathfrak{p}_j\), so \(a\) lies in exactly \(n\) height \(1\) primes. Hence (KD3) is proved.

For (KD2), note first \(A \subset \bigcap_{\mathfrak{p}}A_{\mathfrak{p}}\) because the natural map \(A \to A_{\mathfrak{p}}\) is injective for all \(\mathfrak{p}\). Hence it suffices to prove the reverse inclusion. But elements in \(A_{\mathfrak{p}}\) are of the form \(a/s\). Hence we expect those elements of the form \(b/1\) to be in \(A\). Therefore it suffices to prove that \(b/1 \in (a/1)A_{\mathfrak{p}}\) for all prime \(\mathfrak{p}\) implies \(b \in aA\) for all \(a,b \in A\), \(a,b\ne 0\). Put \[ (a)=\mathfrak{p}_1^{r_1}\cdots\mathfrak{p}_n^{r_n}.\] Then \(\mathfrak{q}_i = \mathfrak{p}_i^{r_i}\) is \(\mathfrak{p}_i\)-primary and we obtain a primary decomposition. Note we in particular have \[ b \in \bigcap_{i=1}^{n}\left(aA_{\mathfrak{p}_i} \cap A \right) = \bigcap_{i=1}^{n}\mathfrak{q}_i = aA\] because each \(\mathfrak{p}_i\) has height \(1\). \(\square\)

Which is to say that

(Proposition 3)\(R\) is a Krull domain.

We know that since \(R\) is Dedekind, its fractional ideals form an abelian group. This gives rise to the ideal class group. By a result of Samuel, we have a shockingly simple fact:

(Proposition 4)The ideal class group \(Cl(R) \cong \mathbb{Z}/2\mathbb{Z}\).

Which can be considered as a corollary to this following statement:

(Samuel)Let \(F\) be a non-degenerate quadratic form in \(k[X_1,X_2,X_3]\). Let \(A_F=k[X_1,X_2,X_3]/(F)\). Then \(Cl(A_F)=\mathbb{Z}/2\mathbb{Z}\) if and only if there is a nontrivial solution to \(F(X_1,X_2,X_3)=0\) in \(k\).

One can find this result via this link, and refer to **study of plane conics**.

With these being said, by theorem 8 of Zaks' paper, one sees that \(R\) is a half-factorial domain (HFD). To be precise, for irreducible elements \(x_1,x_2,\cdots,x_n\) and \(y_1,y_2,\cdots,y_m\) of \(R\), if \(x_1x_2\cdots x_n=y_1y_2\cdots y_m\), then \(m=n\). I may recover the proof here one day, but it would be much more difficult than writing everything you have seen here. This ring \(R\) also shows that an HFD is not necessarily a UFD.

Since \(Cl(R) \cong \mathbb{Z}/2\mathbb{Z}\), for any maximal ideal \(M \subset R\), either it is principal or \(M^2\) is principal. If \(M\) and \(M'\) are two non-principal ideals, then \(MM'\) is principal. Conversely, for any irreducible \(z \in R\), either \((z)\) is maximal or \((z)=MM'\) for some maximal ideals \(M\) and \(M'\), where \(M\) and \(M'\) may coincide. We have given the form of irreducible elements \[z = a+b\sin{x}+c\cos{x},\quad (b,c) \ne (0,0).\] So we are now interested in these \(a,b,c\). We will do some high school trick first. If we put \[\begin{cases}k = \frac{a}{\sqrt{b^2+c^2}} \\b' = \frac{b}{\sqrt{b^2+c^2}} \\c' = \frac{c}{\sqrt{b^2+c^2}}\end{cases}\] then \(z= \sqrt{b^2+c^2}(\sin(x+\alpha)+k)\) where \(b'=\cos\alpha\) and \(c' = \sin\alpha\). Since \(\sqrt{b^2+c^2}\) is a nonzero real number, hence a unit of \(R\), it suffices to study elements of the form \(\sin(x+\alpha)+k\).
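The normalisation above is easy to sanity-check numerically. The following is a minimal sketch (the helper names `amplitude_phase` and `check` are made up for this illustration) that recovers \(\sqrt{b^2+c^2}\), \(\alpha\) and \(k\) from \((a,b,c)\) and compares both sides of the identity on a grid of sample points.

```python
import math

def amplitude_phase(a, b, c):
    """Return (r, alpha, k) with a + b sin x + c cos x = r (sin(x + alpha) + k)."""
    r = math.hypot(b, c)        # sqrt(b^2 + c^2), nonzero since (b, c) != (0, 0)
    alpha = math.atan2(c, b)    # cos(alpha) = b / r and sin(alpha) = c / r
    return r, alpha, a / r

def check(a, b, c, samples=50):
    """Largest gap between the two sides of the identity on a grid in [0, 2 pi)."""
    r, alpha, k = amplitude_phase(a, b, c)
    worst = 0.0
    for i in range(samples):
        x = 2 * math.pi * i / samples
        lhs = a + b * math.sin(x) + c * math.cos(x)
        rhs = r * (math.sin(x + alpha) + k)
        worst = max(worst, abs(lhs - rhs))
    return worst

print(amplitude_phase(1.0, 3.0, -4.0))
print(check(1.0, 3.0, -4.0))
```

The gap returned by `check` should be at the level of floating point rounding for any \((b,c) \ne (0,0)\).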

Define a shift morphism \(h:R \to R\) by \[h(\cos{x})=\cos(x+\alpha), \quad h(\sin{x}) = \sin(x+\alpha), \quad h(t) = t\] This map is clearly an isomorphism. More importantly, since \[h(\sin{x}+k)=\sin(x+\alpha)+k,\] the primary decomposition of \((\sin(x+\alpha)+k)\) and \((\sin{x}+k)\) are of the same form. We are interested in the ring \(R/(\sin{x}+k)\), where it is natural to study the behaviour of \(\cos{x}\). For this reason we consider the substitution morphism \[\begin{aligned}g:\mathbb{R}[X] & \to R \\ X & \mapsto \cos{x}.\end{aligned}\] We first compute the inverse image \(g^{-1}[(\sin{x}+k)]\). It is natural to think about cancelling \(\sin x\) into \(\cos x\). Note \((\sin x + k)( \sin x - k) = (\sin^2x -k^2) = (1- \cos^2x-k^2)\), pick whichever \(P(X) \in (1-k^2-X^2)\), we have \[g(P(X))=P(\cos{x}) = (1-\cos^2x-k^2)Q(\cos{x})=(\sin{x}+k)(\sin{x}-k)Q(\cos{x})\] Hence \((1-k^2-X^2)\subset g^{-1}[(\sin{x}+k)]\). For the converse, note that if nonzero \(P \in g^{-1}[(\sin x + k)]\), we have \(\deg P > 1\) because trigonometric polynomial of the form \(a+b\cos x\) can never be divided by \(\sin x + k\). By Euclidean algorithm, we find \(Q(X)\), \(R(X)\) such that \[P(X)=Q(X)(1-k^2-X^2) + R(X)\] with \(\deg R \le 1.\) But when \(P \in g^{-1}[(\sin x + k)]\), we must have \(R(X)=0\), according to our study of the degree earlier. Hence we must have \(P(X) \in (1-k^2-X^2)\), which is to say \[g^{-1}[(\sin x + k)]= (1-k^2-X^2).\] This induces an isomorphism \[\mathbb{R}[X]/(1-k^2-X^2) \cong R/(k+\sin x).\] And it is much easier to study the ideal \(1-k^2-X^2\). To be precise,

- \(k^2=1 \iff (1-k^2-X^2)=(X)^2 \iff (k+\sin x)=M^2\) for some maximal ideal \(M\), because \((X)\) is a maximal ideal.
- \(k^2<1 \iff (1-k^2-X^2)\) is a product of two distinct maximal ideals \(\iff (k+\sin x)\) is a product of two distinct maximal ideals \(M\) and \(M'\).
- \(k^2>1 \iff (1-k^2-X^2)\) is maximal \(\iff\) \((k+\sin x)\) is maximal.
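The three cases can be summarised in a few lines of code. This is only a sketch of the case distinction (the function name `classify` is hypothetical, and exact rational arithmetic is used so that the boundary case \(k^2=1\) is not blurred by rounding).

```python
from fractions import Fraction

def classify(k):
    """Classify the ideal (k + sin x) of R by comparing k^2 with 1."""
    k2 = Fraction(k) ** 2
    if k2 == 1:
        # 1 - k^2 - X^2 = -X^2, so the ideal is (X)^2
        return "square of a maximal ideal"
    if k2 < 1:
        # 1 - k^2 - X^2 has two distinct real roots
        return "product of two distinct maximal ideals"
    # 1 - k^2 - X^2 has no real root, hence is irreducible over the reals
    return "maximal ideal"

for k in (Fraction(1, 2), 1, -1, 3):
    print(k, "->", classify(k))
```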

Therefore maximal ideals of \(R\) are determined by \(k\), or more precisely the relation between \(a^2\) and \(b^2+c^2\). Moreover, for a maximal ideal \(M\) we have

- If \(M\) is principal, then there exist \(\alpha \in \mathbb{R}\) and \(k\) with \(k^2>1\) such that

\[M = (\sin(x+\alpha) + k)\]

and \(R/M \cong \mathbb{C}\).

- If \(M\) is not principal, then there exists \(\alpha \in \mathbb{R}\) such that

\[M = (\sin(x+\alpha)+1,\cos(x+\alpha)), \quad M^2 = (\sin(x+\alpha)+1).\]

and \(R/M \cong \mathbb{R}\).
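The equality \(M^2=(\sin(x+\alpha)+1)\) can be made plausible by elementary identities. Writing \(s=\sin(x+\alpha)\) and \(c=\cos(x+\alpha)\), we have \((s+1)^2+c^2=2(s+1)\), which puts \(s+1\) into \(M^2\), and \(c^2=(1-s)(1+s)\), which shows that every generator of \(M^2\) is a multiple of \(s+1\). The sketch below (the helper name `identity_gaps` is made up here) merely checks these two identities numerically; it is a sanity check, not a proof.

```python
import math

def identity_gaps(alpha, samples=100):
    """Largest numerical error of the two identities on a grid in [0, 2 pi)."""
    gap_sum, gap_square = 0.0, 0.0
    for i in range(samples):
        x = 2 * math.pi * i / samples
        s, c = math.sin(x + alpha), math.cos(x + alpha)
        # (s + 1)^2 + c^2 = 2 (s + 1)
        gap_sum = max(gap_sum, abs((s + 1) ** 2 + c ** 2 - 2 * (s + 1)))
        # c^2 = (1 - s)(1 + s)
        gap_square = max(gap_square, abs(c ** 2 - (1 - s) * (1 + s)))
    return gap_sum, gap_square

print(identity_gaps(0.7))
```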

- Robert M. Fossum, *The Divisor Class Group of a Krull Domain*.
- M. F. Atiyah, FRS & I. G. MacDonald, *Introduction to Commutative Algebra*.
- Marco Fontana, Salah-Eddine Kabbaj, Sylvia Wiegand, *Commutative Ring Theory and Applications*.
- Hideyuki Matsumura, *Commutative Ring Theory*.
- P. Samuel, *Lectures on Unique Factorization Domains*.
- A. Zaks, *Half Factorial Domains*.

Consider a sequence of real or complex numbers \(\{s_n\}\). If \(s_n \to s\), then \[\pi_n = \frac{s_1+\cdots+s_n}{n} \to s.\]

Here, \(\pi_n\) is called the Cesàro mean of \(\{s_n\}\). The proof is rather simple. Given \(\varepsilon>0\), there exists some \(N>0\) such that \(|s_n-s|<\varepsilon\) for all \(n > N\). Therefore for \(n>N\) we can write \[\begin{aligned} |\pi_n - s| &= \left|\frac{s_1+s_2+\cdots+s_N}{n}+\frac{s_{N+1}+\cdots+s_n}{n}-s\right| \\ &= \left|\frac{(s_1-s)+(s_2-s)+\cdots+(s_N-s)}{n}+\frac{(s_{N+1}-s)+\cdots+(s_n-s)}{n}\right| \\ &\leq \left| \frac{s_1+\cdots+s_N-Ns}{n} \right| + \frac{n-N}{n}\varepsilon\end{aligned}\] The second term is less than \(\varepsilon\). For fixed \(N\), we can pick \(n\) big enough such that \[\left| \frac{s_1+\cdots+s_N-Ns}{n} \right|<\varepsilon.\] Hence \(|\pi_n-s|<2\varepsilon\) for all large \(n\), and \(\pi_n\) converges to \(s\). But the converse is not true in general. For example, if we put \(s_n=(-1)^n\), then it diverges but \(\pi_n \to 0\). If \(\pi_n\) converges, we say \(\{s_n\}\) is Cesàro summable.
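Both phenomena are easy to observe numerically. The following is a small sketch (the function name `cesaro_means` is made up for this illustration) computing the running means of the divergent sequence \(s_n=(-1)^n\) and of a convergent one, \(s_n = 1+1/n\), chosen here only as an example.

```python
def cesaro_means(seq):
    """Return the running means pi_n = (s_1 + ... + s_n) / n of a finite sequence."""
    means, total = [], 0.0
    for n, s in enumerate(seq, start=1):
        total += s
        means.append(total / n)
    return means

# s_n = (-1)^n diverges, yet its running means tend to 0
alternating = [(-1) ** n for n in range(1, 10001)]
means = cesaro_means(alternating)
print(means[-2], means[-1])

# s_n = 1 + 1/n converges to 1, and so do its running means
convergent = cesaro_means([1 + 1 / n for n in range(1, 100001)])
print(convergent[-1])
```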

If we treat \(\pi_n\) as an integration with respect to the counting measure, things become interesting. Why don't we investigate the operator defined to be \[C(f)(x)= \frac{1}{x}\int_0^xf(t)dt.\] In this blog post we investigate this operator in Hilbert space \(L^2(0,\infty)\).

Put \(L^2=L^2(0,\infty)\) relative to Lebesgue measure, and the Cesàro operator \(C\) is defined as follows: \[\begin{aligned}(Cf)(s) = \frac{1}{s}\int_0^sf(t)dt.\end{aligned}\]

From the example above, we shouldn't expect \(C\) to be too normal or well-behaved. But fortunately it is at the very least continuous: due to Hardy's inequality, we have \(\lVert C \rVert = 2\). I organised several proofs of this. But \(C\) is not compact.

Consider a family of functions \(\{\varphi_A\}_{A>0}\) where \[\varphi_A = \sqrt{A}\chi_{(0,1/A]}.\] (I owe Oliver Diaz for this family of functions.) It's not hard to show that \(\lVert \varphi_A \rVert = 1\). If we apply \(C\) on it we see \[(C\varphi_A)(x) = \frac{1}{x}\int_0^x\sqrt{A}\chi_{(0,1/A]}(t)dt = \sqrt{A}\left(\chi_{(0,1/A]}(x)+\frac{1}{Ax}\chi_{(1/A,+\infty)}(x)\right)\] Hence \(\lVert C\varphi_A \rVert^2 = 1+\frac{1}{A}\int_{1/A}^{\infty}\frac{dx}{x^2}=2\), i.e. \(\lVert C\varphi_A \rVert = \sqrt{2}\). Meanwhile for \(B>A\), we have \[\begin{aligned}C(\varphi_B-\varphi_A)(x) &=\left(\sqrt{B}-\sqrt{A} \right)\chi_{(0,1/B]}(x)+\left(\frac{1}{\sqrt{B}x}-\sqrt{A}\right)\chi_{(1/B,1/A]}(x) \\ &+\left(\frac{1}{\sqrt{B}} - \frac{1}{\sqrt{A}} \right)\frac{1}{x}\chi_{(1/A,+\infty)}(x)\end{aligned}\] It follows that \[|C(\varphi_B-\varphi_A)|(x) \geq \left(\frac{1}{\sqrt{A}}-\frac{1}{\sqrt{B}} \right)\frac{1}{x}\chi_{(1/A,\infty)}(x).\] If we compute the norm on the right hand side we get \[\|C(\varphi_B-\varphi_A)\| \geq \left|1-\sqrt{\frac{A}{B}} \right|.\] As a result, if we pick \(f_n=\varphi_{2^n}\), then for any \(m>n\) we get \[\|C(f_m-f_n)\| \geq \left|1 - \sqrt{2^{n-m}} \right| \geq 1-\frac{1}{\sqrt{2}}.\] Therefore, we find a sequence \((f_n)\) on the unit ball such that \((Cf_n)\) has no convergent subsequence, so \(C\) is not compact.
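The separation estimate can be checked numerically. Below is a rough sketch using a plain midpoint rule; the helper names `c_phi`, `midpoint` and `norm_c_diff` are invented for this illustration, and the tail \((1/A,\infty)\) is handled with the substitution \(x=1/u\), which is a choice made here rather than anything from the argument above.

```python
import math

def c_phi(A, x):
    """(C phi_A)(x) for phi_A = sqrt(A) times the indicator of (0, 1/A]."""
    return math.sqrt(A) if x <= 1.0 / A else 1.0 / (math.sqrt(A) * x)

def midpoint(func, lo, hi, steps=20000):
    """Midpoint rule approximation of the integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

def norm_c_diff(A, B):
    """Approximate the L^2(0, infinity) norm of C(phi_B - phi_A)."""
    d = lambda x: (c_phi(B, x) - c_phi(A, x)) ** 2
    cut = 1.0 / min(A, B)
    head = midpoint(d, 0.0, cut)
    # tail: substitute x = 1/u, so dx = du / u^2 and u runs over (0, 1/cut)
    tail = midpoint(lambda u: d(1.0 / u) / u ** 2, 0.0, 1.0 / cut)
    return math.sqrt(head + tail)

for n, m in [(1, 2), (1, 3), (2, 5)]:
    lower = abs(1 - math.sqrt(2.0 ** (n - m)))
    print(n, m, norm_c_diff(2 ** n, 2 ** m), ">=", lower)
```

In each printed row the computed norm should stay above the lower bound, and hence above \(1-1/\sqrt{2}\).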

Also we can find its adjoint operator: \[\begin{aligned}\langle Cf,g \rangle &= \int_0^\infty \left(\frac{1}{s}\int_0^sf(t)dt \right)\overline{g}(s)ds \\ &= \int_0^\infty\left(\int_t^\infty \frac{1}{s}f(t)\overline{g}(s)ds\right)dt \\ &= \int_0^\infty f(t) \left(\int_t^{\infty}\frac{1}{s}\overline{g}(s)ds\right)dt.\end{aligned}\] Hence the adjoint is given by \[(C^\ast g)(t) = \int_t^{\infty}\frac{1}{s}g(s)ds.\] \(C^\ast\) is not compact either. Further, another application of Fubini's theorem shows that \[CC^\ast = C + C^\ast=C^\ast C \implies (I-C)(I-C^\ast)=I=(I-C^\ast)(I-C)\] Hence \(I-C\) is unitary (in particular an isometry) and \(C\) is normal.

In this section we study the spectrum of \(C\) and \(C^\ast\), which will be derived from properties of bilateral shift, which comes from \(\ell^2\) space. For convenience we write \(\mathbb{N}=\mathbb{Z}_{\geq 0}\). This section can also help you understand the connection between \(L^2(0,1)\) and \(L^2(0,\infty)\).

An operator \(U\) on a Hilbert space \(H\) is called a *simple unilateral shift* if \(H\) has an orthonormal basis \(\{e_n\}_{n \in \mathbb{N}}\) such that \(U(e_n)=e_{n+1}\) for all \(n \in \mathbb{N}\). This is nothing but the right-shift operator in the sense of basis. Besides, we call \(U\) a *unilateral shift of multiplicity \(m\)* if \(U\) is a direct sum of \(m\) simple unilateral shifts (note: \(m\) can be any cardinal number, finite or infinite).

If we consider the difference between \(\mathbb{N}\) and \(\mathbb{Z}\), we have the definition of *bilateral shift*. An operator \(W\) on \(K\) is called a *simple bilateral shift* if \(K\) has an orthonormal basis \(\{e_n\}_{n \in \mathbb{Z}}\) such that \(We_{n}=e_{n+1}\) for all \(n \in \mathbb{Z}\). Besides, if we consider the subspace \(H\) which is spanned by \(\{e_n\}_{n \geq 0}\), we see \(W|_H\) is simply a unilateral shift. Before we begin, we investigate some elementary properties of uni/bilateral shifts.

(Proposition 1)A simple unilateral shift \(U\) is an isometry.

*Proof.* Note \((Ue_m,Ue_n)=(e_{m+1},e_{n+1})=\delta_{m+1,n+1}=\delta_{mn}=(e_m,e_n)\). \(\square\)

(Proposition 2)A simple bilateral shift \(W\) is unitary, hence is also an isometry.

*Proof.* Note \((We_m,e_n)=(e_{m+1},e_n)=\delta_{m+1,n}=\delta_{m,n-1}=(e_m,W^{-1}e_n)\), from which it follows that \(W^\ast=W^{-1}\). \(\square\)

Now let the Hilbert space \(K\) and its subspace \(H\) (invariant under \(W\)) be given. Consider the 'orthonormal' operator given by \(Re_n=e_{-(n+1)}\). It follows that \(R\) is a unitary involution and \[Re_0=W^{-1}e_0 \quad RH = H^{\perp} \quad R \circ W = W^{-1} \circ R.\]

With these tools, we are ready for the most important theorems.

\(W=I-C^\ast\) is a simple bilateral shift on \(K=L^2\).

**Step 1 - Obtaining missing subspace, operator and basis**

Here we put \(H=L^2(0,1)\), which can be canonically embedded into \(L^2(0,\infty)\) in the obvious way (consider all \(L^2\) functions vanish outside \((0,1)\)). It is natural to put this, as there are many similarities between \(L^2(0,1)\) and \(L^2(0,\infty)\).

Explicitly, \[(Wf)(x) = f(x) - \int_x^\infty \frac{1}{t}f(t)dt.\] Also we claim the basis to be generated by \(e_0= \chi_{(0,1)}\), that is, we put \(e_n=W^ne_0\). First of all we show that \((W^ne_0)_{n \geq 0}\) is orthonormal. Note as we have proved, \(W^\ast W = (I-C)(I-C^\ast)=I\). Without loss of generality we assume that \(m \geq n\) and therefore \[(e_m,e_n)=(W^me_0,W^ne_0)=((W^\ast)^nW^me_0,e_0)=((W^\ast W)^nW^{m-n}e_0,e_0)=(W^{m-n}e_0,e_0).\] If \(m=n\), then \((e_m,e_n)=(e_0,e_0)=1\). Hence it is reduced to prove that \((W^ke_0,e_0)=0\) for all \(k>0\). First of all we have \[(We_0,e_0)=(e_0,e_0)-(C^\ast e_0,e_0)=1-(C^\ast e_0,e_0)\] meanwhile \[\begin{aligned} (C^\ast e_0,e_0) &= \int_0^1 \left(\int_x^1 \frac{1}{t}dt \right)dx \\ &= \int_0^1(-\ln{x})dx \\ &= (-x\ln{x}+x)|_0^1 = 1\end{aligned}\] Hence \(We_0 \perp e_0\). Suppose now we have \((W^{k-1}e_0,e_0)=0\) for some \(k \geq 2\), then \[\begin{aligned} (W^ke_0,e_0)&=(WW^{k-1}e_0,e_0) \\ &=((I-C^\ast)W^{k-1}e_0,e_0) \\ &= (W^{k-1}e_0,e_0)-(C^\ast W^{k-1}e_0,e_0) \\ &= -(W^{k-1}e_0,C e_0) \\ &= -\int_0^1W^{k-1}e_0(x)\frac{1}{x}\left(\int_0^xdt\right)dx \\ &= -\int_0^1 W^{k-1}e_0(x)\frac{1}{x} \cdot x dx \\ &= -(W^{k-1}e_0,e_0) \\ &= 0. \end{aligned}\] Note \(W^ke_0\) always vanishes when \(x \geq 1\): when we are doing inner product, \([1,\infty)\) is automatically excluded. With these being said, \((W^ne_0)_{n \geq 0}\) forms an orthonormal set. By the Hausdorff Maximality Theorem, it is contained in a maximal orthonormal set. But since \(H=L^2(0,1)\) is separable (if and only if it admits a countable basis), \((W^ke_0)\) forms a basis of \(H\). From now on we write \(\{e_n\}\).

To find the involution \(R\), note first \(W=I-C^\ast\) is already unitary (also, if it is not unitary, then it cannot be a bilateral shift, we have nothing to prove), whose inverse or adjoint is \(W^\ast=I-C\) as we have proved earlier. Hence we have \[Re_0=e_{-1}=(I-C)e_0=\chi_{(0,1)}(x)-\frac{1}{x}\int_0^x\chi_{(0,1)}(t)dt = -\frac{1}{x}\chi_{[1,\infty)}(x)\] But we have no idea what \(R\) is exactly. We need to find it manually (or we have to guess). First of all it shall be guaranteed that \(RH=H^\perp\). Since \(H\) contains all \(L^2\) functions vanishing on \([1,\infty)\), functions in \(RH\) should vanish on \((0,1)\). It is natural to put \(R(f)(x)=g(x)f\left( \frac{1}{x}\right)\) for the time being. \(g\) should be determined by \(e_{-1}\). Note \(e_0\left(\frac{1}{x}\right)=\chi_{[1,\infty)}(x)\) almost everywhere, we shall put \(g(x)=-\frac{1}{x}\). It is then clear that \(Re_0=W^{-1}e_0\) and \(RH=H^\perp\). For the third condition, we need to show that \[W \circ R \circ W = R.\] Note \[\begin{aligned}W \circ R \circ W(f) &= W \circ R \left(f(x)-\int_x^\infty\frac{1}{t}f(t)dt\right) \\ &= W \left(-\frac{1}{x}f\left(\frac{1}{x}\right)+\frac{1}{x}\int_{1/x}^{\infty}\frac{1}{t}f(t)dt \right) \\ &= -\frac{1}{x}f\left(\frac{1}{x}\right)+\underbrace{\frac{1}{x}\int_{1/x}^{\infty}\frac{1}{t}f(t)dt + \int_x^\infty \frac{1}{t^2}f\left(\frac{1}{t}\right)dt - \int_x^\infty \frac{1}{t^2}\int_{1/t}^{\infty}\frac{1}{u}f(u)du\,dt}_{=0 \text{ by Fubini's theorem, similar to proving }CC^\ast=C+C^\ast.} \\ &= R(f).\end{aligned}\]

**Step 2 - With these, \(W\) in step 1 has to be a simple bilateral shift**

This is independent of the spaces chosen. To finish the proof, we need a lemma:

Suppose \(K\) is a Hilbert space, \(H\) is a subspace and \(e_0 \in H\). \(W\) is a unitary operator such that \(W^ne_0 \in H\) for all \(n \geq 0\) and \(\{e_n=W^ne_0\}_{n \geq 0}\) forms an orthonormal basis of \(H\). \(R\) is a unitary involution on \(K\) such that \[Re_0 = W^{-1}e_0 \quad RH=H^\perp \quad R \circ W = W^{-1} \circ R\] then \(W\) is a simple bilateral shift.

Indeed, objects mentioned in step 1 fit in this lemma. To begin with, we write \(e_n=W^ne_0\) for all \(n \in \mathbb{Z}\). Then \(\{e_n\}\) is an orthonormal set because for arbitrary \(m,n \in \mathbb{Z}\), there is a \(j \in \mathbb{Z}\) such that \(m+j,n+j \geq 0\). Therefore \[(e_m,e_n)=(W^je_m,W^je_n)=(W^{m+j}e_0,W^{n+j}e_0)=(e_{m+j},e_{n+j})=\delta_{m+j,n+j}=\delta_{m,n}.\] Since \((e_0,e_1,\cdots)\) spans \(H\), \(RH=H^{\perp}\), we see \((Re_0,Re_1,\cdots)\) spans \(H^{\perp}\). But \[Re_n=RW^ne_0=W^{-n}Re_0=W^{-n-1}e_0=e_{-n-1},\] hence \(\{e_{-1},e_{-2},\cdots\}\) spans \(H^\perp\). By definition of \(W\), it is indeed a bilateral shift. And our proof is done \(\square\)

- Walter Rudin, *Functional Analysis*.
- Arlen Brown, P. R. Halmos, A. L. Shields, *Cesàro operators*.

Throughout we consider the Hilbert space \(L^2=L^2(\mathbb{R})\), the space of all complex-valued functions with real variable such that \(f \in L^2\) if and only if \[\lVert f \rVert_2^2=\int_{-\infty}^{\infty}|f(t)|^2dm(t)<\infty\] where \(m\) denotes the ordinary Lebesgue measure (in fact it's legitimate to consider Riemann integral in this context).

For each \(t \geq 0\), we assign a bounded linear operator \(Q(t)\) such that \[(Q(t)f)(s)=f(s+t).\] This is indeed bounded since we have \(\lVert Q(t)f \rVert_2 = \lVert f \rVert_2\) as the Lebesgue measure is translation-invariant. This is a left translation operator with a single step \(t\).

The inner product in \(L^2\) is defined by \[(f,g)=\int_{-\infty}^{\infty}f(s)\overline{g(s)}dm(s), \quad f,g\in L^2.\] If we apply \(Q(t)\) on \(f\), we see \[\begin{aligned} (Q(t)f,g) &= \int_{-\infty}^{\infty}f(s+t)\overline{g(s)}dm(s) \\ &= \int_{-\infty}^{\infty}f(u)\overline{g(u-t)}dm(u) \quad (u=s+t) \\ &= (f,Q(t)^{\ast}g)\end{aligned}\] where \(Q(t)^\ast\) is the adjoint of \(Q(t)\), which happens to be the right translation operator with a single step \(t\), i.e. \((Q(t)^\ast g)(s)=g(s-t)\). Clearly we have \(Q(t)Q(t)^\ast=Q(t)^\ast Q(t)=I\), which indicates that \(Q(t)\) is unitary. Also we can check in a more manual way: \[(Q(t)f,Q(t)g) = \int_{-\infty}^{\infty}f(s+t)\overline{g(s+t)}dm(s) = \int_{-\infty}^{\infty}f(s+t)\overline{g(s+t)}dm(s+t)=(f,g).\] By operator theory, since \(Q(t)\) is unitary, the spectrum of \(Q(t)\) lies in the unit circle \(S^1\).

Note \(Q(0)=I\) and \[Q(t+u)f(s)=f(s+t+u)=f[(s+t)+u]=Q(u)f(s+t)=Q(t)Q(u)f(s)\] for all \(f \in L^2\), which is to say that \(Q(t+u)=Q(t)Q(u)\). Therefore we say \(\{Q(t)\}\) is a *semigroup*. But what's more important is that it satisfies strong continuity near the origin: \[\lim_{t \to 0}\lVert Q(t)f - f \rVert_2 = 0.\] This is not too hard to verify. It suffices to prove that \[\lim_{t \to 0}\int_{-\infty}^{\infty} |f(s+t)-f(s)|^2dm(s) =0.\] Note \(C_c(\mathbb{R})\) (continuous function with compact support) is dense in \(L^2\), and for \(f \in C_c(\mathbb{R})\), it follows immediately from properties of continuous functions. Next pick \(f \in L^2\). Then for \(\varepsilon>0\) there exists some \(f_1 \in C_c(\mathbb{R})\) such that \(\lVert f-f_1 \rVert_2 < \frac{\varepsilon}{4}\) and \(\lVert f_1(s+t)-f_1(s)\rVert_2<\frac{\varepsilon}{2}\) for \(t\) small enough. If we put \(f_2=f-f_1\) we get \[\begin{aligned} \lVert f(s+t)-f(s) \rVert_2 &\leq \lVert f_1(s+t)-f_1(s) \rVert_2+\lVert f_2(s+t)-f_2(s) \rVert \\ &< \frac{\varepsilon}{2}+2\lVert f_2(s)\rVert < \varepsilon.\end{aligned}\] The limit follows as \(\varepsilon \to 0\).

Recall that the infinitesimal generator of \(Q(t)\) is defined to be \[A=\lim_{t \to 0}\frac{1}{t}[Q(t)-I]\] which is inspired by \(\frac{d}{dt}e^{tA}=A\) (thanks to von Neumann). Note if \(f \in L^2\) is differentiable, then \[Af(s) = \lim_{t \to 0} \frac{f(s+t)-f(s)}{t} = f'(s).\] The infinitesimal generator of \(Q(t)\) being the differentiation operator is quite intuitive. But we need to clarify it in \(L^2\), which is much larger. So what is the domain \(D(A)\)? We don't know yet but we can guess. When talking about differentiation in \(L^p\) space, it makes sense to extend our differentiation to absolute continuity. Also we need to make sure that \(Af \in L^2\), hence we put \[D=\{f\in L^2:f \text{ absolutely continuous, }f' \in L^2\}.\] For every \(f \in D(A)\) and any fixed \(t\) we already have \[\frac{d}{dt}Q(t)f(s)=f'(s+t)=Af(s+t)\] hence \(Af=f'\) for every \(f \in D(A)\) and it follows that \(D(A) \subset D\). In fact, \(A\) is the restriction of the differential operator on \(D(A)\). Conversely, by the Hille-Yosida theorem, we see \(1 \in \rho(A)\) and also one can show that \(1 \in \rho(\frac{d}{dx})\). Therefore \[(I-\frac{d}{dx})D(A)=(I-A)D(A)=L^2.\] But we also have \[D=(I-\frac{d}{dx})^{-1}L^2.\] Thus \[D = \left(I-\frac{d}{dx}\right)^{-1}\left(I-\frac{d}{dx}\right)D(A)=D(A).\] The fact that \((I-\frac{d}{dx})D=L^2\) can be realised by the equation \(f-f'=g\), where the existence of a solution can be proved using the Fourier transform. Note \(\hat{f'}(y)=iy\hat{f}(y)\); with some knowledge of distributions, the result can also be given by \[D(A)=\left\{f\in L^2:\int_{-\infty}^{\infty}|y\hat{f}(y)|^2dy<\infty\right\}.\]

By the Hille-Yosida theorem, the half plane \(\{z:\Re z>0\} \subset \rho(A)\). But we can give a more precise result of it.

Pick any \(f \in D(A)\). It is directly verified that \[(A-\lambda{I})f = f'-\lambda{f}.\] Put \(g=(A-\lambda{I})f\) then \[\hat{g}(y)=iy\hat{f}(y)-\lambda{\hat{f}(y)}.\] Therefore \[\hat{f}(y) = \frac{\hat{g}(y)}{iy-\lambda} \in L^2.\] Conversely, suppose \(h(y)=\frac{\hat{g}(y)}{iy-\lambda} \in L^2\), then \(\hat{g}(y)=iyh(y)-\lambda{h}(y)\). If we take its Fourier inverse, we see \(g \in R(A-\lambda{I})\).

If \(g \in L^2\), then clearly \(\hat{g} \in L^2\). It remains to discuss \(\hat{g}(y)/(iy-\lambda)\). Note \(iy\) is on the imaginary axis, hence if \(\lambda\) is not purely imaginary, then \(\hat{g}(y)/(iy-\lambda) \in L^2\). If \(\lambda\) is purely imaginary however, then we may have \(\hat{g}(y)/(iy-\lambda)\not\in L^2\). For example, we can take \(\hat{g}=\chi_{[s-1,s+1]}\) where \(\lambda = is\). Hence if \(\lambda\) is purely imaginary, \(R(A-{\lambda}I)\) is a proper subspace of \(L^2\). Therefore we conclude: \[\sigma(A)= \{z \in \mathbb{C}:\Re z = 0\}.\] *This is an exercise on W. Rudin's Functional Analysis. You can find related theorems in Chapter 13.*

Guided by research in function theory, operator theorists gave the analogue of quasi-analytic classes. Let \(A\) be an operator in a Banach space \(X\). \(A\) is not necessarily bounded, hence the domain \(D(A)\) is not necessarily the whole space. We say \(x \in X\) is a \(C^\infty\) vector if \(x \in \bigcap_{n \geq 1}D(A^n)\). This is quite intuitive if we consider the differential operator. A \(C^\infty\) vector \(x\) is analytic if the series \[\sum_{n=0}^{\infty}\lVert{A^n x}\rVert\frac{t^n}{n!}\] has a positive radius of convergence. Finally, we say \(x\) is quasi-analytic for \(A\) provided that \[\sum_{n=1}^{\infty}\left(\frac{1}{\lVert A^n x \rVert}\right)^{1/n} = \infty\] or equivalently its nondecreasing majorant. Interestingly, if \(A\) is symmetric, then \(\lVert{A^nx}\rVert\) is log convex.

Based on the density of quasi-analytic vectors, we have an interesting result.

(Theorem)Let \(A\) be a symmetric operator in a Hilbert space \(\mathscr{H}\). If the set of quasi-analytic vectors spans a dense subset, then \(A\) is essentially self-adjoint.

This theorem can be considered as a corollary to the fundamental theorem of quasi-analytic classes, by applying suitable Banach space techniques in lieu.

For a positive sequence \(\{a_n\}\), we see it is the moment sequence of a positive measure \(\mu\), i.e. \(a_n = \int_\mathbb{R}t^n d\mu(t)\), if and only if it is positive definite. But the uniqueness is not guaranteed. Here we have a sufficient condition for this, using the concept of quasi-analytic vector. This is an old theorem (1922), but we prove it using operator theory, which appeared decades later.

(Carleman's condition)Suppose \(\{a_n\}\) is the moment sequence of a positive measure \(\mu\) on \(\mathbb{R}\), then \(\mu\) is uniquely determined provided that \(\sum a_{2n}^{-1/2n}=\infty\).

**Proof.** Consider the Hilbert space \[\mathscr{H}= L^2(\mathbb{R},\mu)\] and the operator \[ A:f(t) \mapsto tf(t).\] It is clear that \(A\) is self-adjoint. We shall work on the constant function \(u(t) \equiv 1 \in \mathscr{H}\). Since \(A^nu = t^n\), we see \(u \in C^\infty\), otherwise \(a_n\) is not defined. On the other hand, we have \[ (A^n u, u) = a_n \implies \lVert A^n u \rVert^2 = (A^n u, A^n u) = (A^{2n} u,u) = a_{2n}.\] But \(a_{2n}^{-1/2n}=\lVert A^n u \rVert^{-1/n}\) and as a result we see \(\sum a_{2n}^{-1/2n}= \sum \lVert A^n u \rVert^{-1/n} = \infty\), hence \(u\) is quasi-analytic. In general, \(t^n = A^n u\) is quasi-analytic for all \(n \geq 0\). Consider the space of polynomials \(\mathcal{P}[t]\) with closure \(\mathscr{H}_1\). It follows from the theorem above that \(A_1 = A|_{\mathcal{P}[t]}\) is essentially self-adjoint in \(\mathscr{H}_1\). Hence \(\mathscr{H}_1\) is invariant under the one-parameter group \(e^{iAs}\). Pick \(y \in \mathcal{P}[t]^{\perp}\), we see \[(y,e^{iAs}u) = \int_\mathbb{R}e^{-ist}y(t)d\mu(t) = 0,\] which implies that \(y = 0\) a.e. [\(\mu\)]. It follows that \(\mathscr{H}_1 = \mathscr{H}\) or equivalently \(\mathcal{P}[t]\) is dense in \(\mathscr{H}\). Suppose now we have another generating measure \(\nu\) of \(\{a_n\}\). With respect to \(\nu\), \(\mathcal{P}[t]\) is still a dense subspace. But the norm on \(\mathcal{P}[t]\) is fixed by \(\{a_n\}\), hence we obtain an isometry between \(\mathcal{P}[t]_\mu\) and \(\mathcal{P}[t]_\nu\), which extends to an isometry between \(L^2(\mathbb{R},\mu)\) and \(L^2(\mathbb{R},\nu)\), which forces \(\mu\) and \(\nu\) to be equal. \(\blacksquare\)
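To see what Carleman's condition looks like on a concrete measure, one can take the standard Gaussian as an example (this choice is an assumption of the sketch, it does not appear in the proof above). Its even moments are \(a_{2n}=(2n)!/(2^n n!)\), so by Stirling's formula \(a_{2n}^{-1/2n}\) behaves like \(\sqrt{e/(2n)}\) and the series diverges. The helper names `carleman_term` and `partial_sum` are made up for this illustration.

```python
import math

def carleman_term(n):
    """a_{2n}^{-1/(2n)} for the standard Gaussian, a_{2n} = (2n)! / (2^n n!)."""
    # work with logarithms to avoid overflow of the factorials
    log_a2n = math.lgamma(2 * n + 1) - n * math.log(2) - math.lgamma(n + 1)
    return math.exp(-log_a2n / (2 * n))

def partial_sum(N):
    """Partial sum of the Carleman series up to n = N."""
    return sum(carleman_term(n) for n in range(1, N + 1))

for N in (10, 100, 1000, 10000):
    print(N, partial_sum(N), carleman_term(N) * math.sqrt(N))
```

The partial sums keep growing roughly like \(\sqrt{N}\), while the last column approaches \(\sqrt{e/2}\).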

There are a lot of nice properties of analytic functions, whose class is denoted by \(C^\omega\). Formally we have the following definition:

If \(f \in C^\omega\) and \(x_0 \in \mathbb{R}\), one can write \[f = a_0+a_1(x-x_0)+a_2(x-x_0)^2+\cdots.\]

Obviously \(f \in C^\infty\) (and hence \(C^\omega \subset C^\infty\)) and alternatively we have the Taylor series converges to \(f\) for any \(x_0 \in \mathbb{R}\): \[T(x) = \sum_{n=0}^{\infty}\frac{D^nf(x_0)}{n!}(x-x_0)^n.\] One interesting thing is, every \(f \in C^\omega\) is uniquely determined by a sequence \(D^0f(x_0), Df(x_0),D^2f(x_0),\cdots\).

Unfortunately, this property is not generally true on \(C^\infty\). For example, we can consider the bump function \(\varphi\) (a simple example can be found on wikipedia). In brief, \(\varphi=0\) for all \(x \in (-\infty,-1] \cup [1,+\infty)\) but \(\varphi>0\) on \((-1,1)\). And more importantly, \(\varphi \in C^\infty\). However, if we take \(f = \varphi\) and \(g = 2\varphi\), then \(f \neq g\), but \(D^nf(-2)=D^ng(-2)=0\) for all \(n \geq 0\). We get a sequence of derivatives of different orders, but this sequence does not determine a unique \(C^\infty\) function.

The term "uniquely determined" can also be described in an alternative way: If \(f \in C^\omega\) and \(D^kf(x_0)=0\) for all \(k \geq 0\), then \(f=0\) everywhere.

So a question comes up naturally: how many functions can be determined by their derivatives of all orders? Does \(C^\omega\) contain all we can get? If not, how can we describe them?

The class of analytic functions is our source of motivation, so it makes sense to dig into its properties to find more. For an analytic function it is natural to consider the restriction of a holomorphic function on the complex plane. Let \(\Omega\) be the set of all \(z=x+iy\) such that \(|y| < \delta\) and suppose \(f \in H(\Omega)\) and \(|f(z)|<\beta\) for all \(z \in \Omega\). By Cauchy's Estimate, we get \[|D^n f(x)| \leq \beta \delta^{-n}n!\quad n \in \mathbb{N},x\in \mathbb{R}.\] Also the restriction of \(f\) on \(\mathbb{R}\) is real-analytic. Here comes the interesting part: \(\beta\) and \(\frac{1}{\delta}\) are determined only by \(f\) and have nothing to do with \(n\), meanwhile \(n!\) is a special sequence that dominates the derivatives of \(f\) to some extent.

This motivates us to define a special class of functions, which is called the class \(C\{M_n\}\).

Let \(\{M_n\}\) be a sequence of positive numbers, we let \(C\{M_n\}\) denote the class of all \(f \in C^\infty\) such that \[\lVert D^nf\rVert_\infty \leq \beta_f B^n_f M_n,\] where \(\lVert \cdot \rVert_\infty\) is the supremum norm defined on \(\mathbb{R}\), and \(\beta_f,B_f\) are constants only determined by \(f\) but not \(n\).

In order to equip \(C\{M_n\}\) with some satisfying algebraic structures, which can simplify our work, we need some restrictions.

Indeed, \(B_f\) plays a much more important role, since we have \[\limsup_{n \to \infty}\left(\frac{\lVert D^n f\rVert_\infty}{M_n}\right)^{1/n} \leq B_f\] while \(\beta_f\) is eliminated to \(1\) in this limit. However, if we eliminate \(\beta_f\) at the beginning, i.e. put \(\beta_f = 1\) for all \(f \in C\{M_n\}\), then when \(n=0\), we have \[\lVert f \rVert_\infty \leq M_0,\] which prevents \(C\{M_n\}\) from being a vector space. For example, if \(\lVert f \rVert_\infty = M_0\), then \(\lVert 2f \rVert_\infty = 2M_0 > M_0\), hence \(2f \not\in C\{M_n\}\). However, if we add \(\beta_f\) no matter what, say \(\lVert f \rVert_\infty \leq \beta_f M_0\), then whenever we do addition and scalar multiplication, there is a different constant with respect to the function, which makes sure that \(C\{M_n\}\) is closed under addition and scalar multiplication, i.e. is a vector space. If we don't add such a constant, our class contains way too few functions.

Further, we have some restriction on the sequence \(\{M_n\}\):

- \(M_0=1\).
- \(M_n^2 \leq M_{n-1}M_{n+1}\) (\(\{\log M_n\}\) is a convex sequence).

As we will see soon, this makes \(C\{M_n\}\) an algebra over \(\mathbb{R}\), where multiplication is defined pointwise.

*Proof.* If \(f,g \in C\{M_n\}\), then we need to show that \(fg \in C\{M_n\}\). We have the product rule for differentiation: \[D^n(fg) = \sum_{j=0}^{n}{n \choose j}(D^jf)(D^{n-j}g).\] Since \(f,g \in C\{M_n\}\), we have \[|D^n(fg)| \leq \sum_{j=0}^{n}{n \choose j}\beta_fB_f^jM_j\beta_gB_g^{n-j}M_{n-j} = \beta_f\beta_g\sum_{j=0}^{n}{n \choose j}B_f^jB_g^{n-j}M_jM_{n-j}.\] Of course we want to eliminate \(M_jM_{n-j}\) to obtain a binomial expansion. To do this we need the convexity of the sequence \(\{\log M_n\}\). Note \(M_n^2 \leq M_{n-1}M_{n+1}\) implies \[\log M_n - \log M_{n-1} \leq \log M_{n+1} - \log M_n.\] As a result, the line segment connecting \((n,\log M_n)\) and \((n-1,\log M_{n-1})\) is steeper and steeper as \(n\) grows. By connecting these points, we actually get a convex function but we will be more rigorous. For \(0 < j < n\), we have \[\begin{aligned}\log M_n - \log M_j &= \sum_{k=j+1}^{n}\left(\log M_k - \log M_{k-1}\right) \\&\geq \sum_{k = j}^{n-1}\left(\log M_{k} - \log M_{k-1}\right) \\&\geq \sum_{k=1}^{n-j}(\log M_k - \log M_{k-1}) \quad\text{(note $\log M_0=0$)} \\&= \log M_{n-j}.\end{aligned}\] Hence \(M_n \geq M_jM_{n-j}\) for \(0<j<n\). It also holds when \(j=0\) or \(j=n\), hence we get \[|D^n(fg)| \leq \beta_f\beta_g\sum_{j=0}^{n}{n \choose j}B_f^jB_g^{n-j}M_jM_{n-j} \leq \beta_f\beta_g\sum_{j=0}^{n}{n \choose j}B_f^jB_g^{n-j}M_n = \beta_f\beta_g(B_f+B_g)^nM_n.\] Hence \(fg \in C\{M_n\}\). The reason why \(C\{M_n\}\) is a vector space has been stated already. \(\square\)
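The key inequality \(M_jM_{n-j} \leq M_n\) can be tested on concrete sequences. The sketch below (function names `is_log_convex` and `is_submultiplicative` are made up for this illustration) checks both the log-convexity condition and its consequence for \(M_n=n!\), and shows a short sequence with \(M_0=1\) where both fail.

```python
import math

def is_log_convex(M):
    """Check M_n^2 <= M_{n-1} M_{n+1} for the interior indices of a finite list M."""
    return all(M[n] ** 2 <= M[n - 1] * M[n + 1] for n in range(1, len(M) - 1))

def is_submultiplicative(M):
    """Check M_j M_{n-j} <= M_n for all 0 <= j <= n < len(M)."""
    return all(M[j] * M[n - j] <= M[n] for n in range(len(M)) for j in range(n + 1))

factorials = [math.factorial(n) for n in range(30)]   # M_n = n!, M_0 = 1
print(is_log_convex(factorials), is_submultiplicative(factorials))

bad = [1, 4, 4]   # M_1^2 = 16 > M_0 M_2 = 4, so not log convex
print(is_log_convex(bad), is_submultiplicative(bad))
```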

This restriction does not hurt the generality. In fact whenever we are given a positive sequence \(\{M_n\}\), we have another sequence \(\{M'_n\}\) satisfying the two restrictions such that \(C\{M_n\}=C\{M'_n\}\).

A class \(C\{M_n\}\) is said to be quasi-analytic if the condition \[f \in C\{M_n\},\quad (D^nf)(0)=0 \] for all \(n \in \mathbb{N}\) implies that \(f = 0\) for all \(x \in \mathbb{R}\).

The reason we try to check whether it's equal to \(0\) everywhere, instead of check whether it is 'uniquely determined' by a sequence of derivative of different order is, this one is much simpler to work with. If a sequence of derivative of different order determines two functions, then their difference is always \(0\).

We have seen that \(C\{n!\}\) contains every function which is the restriction of a holomorphic function in the strip defined by \(|\Im(z)|<\delta\). Conversely, we show that any function in \(C\{n!\}\) defined on the real axis can be extended to a holomorphic function with the same property. As a result, \(C\{n!\}\) is a quasi-analytic class (which contains all bounded functions of \(C^\omega\)). If we only consider functions defined on a closed and bounded interval \([a,b]\), then \(C\{n!\}\) is exactly \(C^\omega\).

Suppose \(f \in C\{n!\}\). First of all we have \[\lVert D^nf \rVert_\infty \leq \beta B^nn!\] for \(n \in \mathbb{N}\). By Taylor's formula, \[f(x) = \sum_{j=0}^{n-1}\frac{D^jf(a)}{j!}(x-a)^j+\frac{1}{(n-1)!}\int_a^x(x-t)^{n-1}D^nf(t)dt.\] The remainder is therefore dominated by \[\frac{n!}{(n-1)!}\beta B^n\left\vert\int_a^x(x-t)^{n-1}dt\right\vert = \beta|B(x-a)|^n.\] If \(|B(x-a)|<1\), then \(\lim_{n \to \infty}|B(x-a)|^n = 0\), and we can safely write the expansion \[f(x) = \sum_{n=0}^{\infty}\frac{D^nf(a)}{n!}(x-a)^n.\] Pick \(0<\delta<\frac{1}{B}\); then we can replace \(x\) in the expansion above with any complex \(z\) such that \(|z-a|<\delta\). This defines a holomorphic function \(F_a\) on \(D(a,\delta)\) (the open disk centred at \(a\) with radius \(\delta\)). If \(x \in D(a,\delta)\) is real, then \(F_a(x)=f(x)\). Therefore \(F_a\) is an analytic continuation of \(f\); together, the \(F_a\) form a holomorphic extension \(F\) of \(f\) to the strip \(|\Im(z)|<\delta\). As a result, for \(z = a+iy\) with \(|y|<\delta\), we have \[|F(z)|=|F_a(z)| = \left\vert\sum_{n=0}^{\infty}\frac{D^nf(a)}{n!}(iy)^n\right\vert \leq \beta \sum_{n=0}^{\infty}(B\delta)^n = \frac{\beta}{1-B\delta}.\] Hence \(F\) is bounded in the strip.

In general, if \(M_n \to \infty\) way too fast (considerably faster than \(n!\), for example \(M_n=(n!)^2\)) as \(n \to \infty\), then \(C\{M_n\}\) fails to be quasi-analytic. There are several equivalent statements on whether \(C\{M_n\}\) is a quasi-analytic class, which are given by the Denjoy-Carleman theorem. Here I collect all the conditions that I have found:

(Denjoy-Carleman theorem) The following conditions are equivalent:

- \(C\{M_n\}\) is not quasi-analytic.
- \(\int_0^\infty \log Q(x)\frac{dx}{1+x^2}<\infty\), where \(Q(x)=\sum_{n=0}^{\infty}\frac{x^n}{M_n}\).
- \(\int_0^\infty \log q(x) \frac{dx}{1+x^2}<\infty\), where \(q(x) = \sup \frac{x^n}{M_n}\).
- \(\sum_{n=1}^{\infty}\left(\frac{1}{M_n}\right)^{1/n}<\infty\).
- \(\sum_{n=1}^{\infty}\frac{M_{n-1}}{M_n}<\infty\)
- \(C\{M_n\}\) contains a nontrivial function with compact support.
- \(\sum_{n=1}^{\infty}\frac{1}{\lambda_n}<\infty\) where \(\lambda_n = \inf_{k \geq n}M_k^{\frac{1}{k}}\).
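To get a feel for condition 5, one can compare partial sums of \(\sum M_{n-1}/M_n\) for two sample sequences. This is only a numerical sketch: the choices \(M_n=n!\) and \(M_n=(n!)^2\) and the function names are assumptions made for illustration, and finitely many partial sums cannot prove convergence or divergence. For \(M_n=n!\) the terms are \(1/n\) (the harmonic series), while for \(M_n=(n!)^2\) they are \(1/n^2\).

```python
from math import factorial

def ratio_partial_sum(M, N):
    """Partial sum of M(n-1)/M(n) for n = 1..N (the series in condition 5)."""
    return sum(M(n - 1) / M(n) for n in range(1, N + 1))

def analytic(n):
    """M_n = n!, the sequence defining C{n!}."""
    return factorial(n)

def squared(n):
    """M_n = (n!)^2, a hypothetical faster-growing sample sequence."""
    return factorial(n) ** 2

for N in (10, 100, 400):
    # The first column keeps growing like log N; the second one levels off.
    print(N, ratio_partial_sum(analytic, N), ratio_partial_sum(squared, N))
```

The growing first column is consistent with \(C\{n!\}\) being quasi-analytic, and the bounded second column with \(C\{(n!)^2\}\) not being so.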

You may find condition 7 ridiculous. In fact, in this condition \(\{M_n\}\) is not required to satisfy the two restrictions. This one is what Denjoy and Carleman found initially. Later, mathematicians found that for a sequence \(\{M_n\}\) we can obtain its convex minorant \(\{M_n'\}\) such that

- \(M_n \geq M_n'\) for all \(n\).
- \(\{\log M_n'\}\) is convex.
- There is a sequence \(0=n_0<n_1<\cdots\) such that \(M_{n_i} = M'_{n_i}\) for all \(i\) and \(\log M'_k\) is linear in \(k\) for \(n_i \leq k \leq n_{i+1}\).

And as you may guess, the convex minorant \(\{M_n'\}\) is what we are using today.

The proof of the Denjoy-Carleman theorem will come out in my next blog post. There is quite a lot of work to do to finish the proof, and it cannot be done within hours. We will be using many results from complex analysis. Also, I will try to cover some extra properties of quasi-analytic classes as well as why the convex minorant is sufficient.

This post is still in progress; it is neither finished nor properly polished. For the coming days there will be new content, until this line is deleted. What I'm planning to add at this moment:

- Transpose is not just about changing indices of its components.
- Norm and topology in vector spaces
- Representing groups using matrices

Since the background of the reader varies a lot, I will try to organise contents depending on topic and required background. For the following section, you are assumed to be familiar with basic abstract algebra terminology, for example, groups, rings and fields.

When learning linear algebra, we were always thinking about real or complex vectors and matrices. This makes sense because \(\mathbb{R}\) and \(\mathbb{C}\) are the closest number **fields** to our real life. But we should not have the stereotype that linear algebra is all about real and complex spaces, or properties of \(\mathbb{R}^n\) and \(\mathbb{C}^n\). Never has there been such a restriction. In fact, \(\mathbb{R}\) and \(\mathbb{C}\) can be replaced with any field \(\mathbb{F}\), and there are vast differences depending on the properties of \(\mathbb{F}\).

There are already some differences between linear algebra over \(\mathbb{R}\) and over \(\mathbb{C}\). Since \(\mathbb{C}\) is algebraically closed, that is, every polynomial of degree \(n \geq 1\) has \(n\) roots counted with multiplicity, dealing with eigenvalues is much 'safer'. For example, we can diagonalise the matrix \[A = \begin{pmatrix}-1& -1 \\ 2 & 1 \end{pmatrix}\] in \(\mathbb{C}\) but not in \(\mathbb{R}\): its characteristic polynomial is \(\lambda^2+1\), whose roots are \(\pm i\).

When \(\mathbb{F}\) above is finite, there are a lot more interesting things. It's not just saying, \(\mathbb{F}\) is a field, and is finite. For example, if \(\mathbb{F}=\mathbb{R}\), we have \[\begin{pmatrix}1&0&2 \\2&3&1 \\1&4&0\end{pmatrix}^{-1}=\begin{pmatrix}-\frac{2}{3} & \frac{4}{3} & -1 \\\frac{1}{6}&-\frac{1}{3}&\frac{1}{2} \\\frac{5}{6}&-\frac{2}{3}&\frac{1}{2}\end{pmatrix}.\] There shouldn't be any problem. However, on the other hand, if \(\mathbb{F}=\mathbb{Z}_5\), we have \[\begin{pmatrix}1&0&2 \\2&3&1 \\1&4&0\end{pmatrix}^{-1}=\begin{pmatrix}1&3&4 \\1&3&3\\0&1&3\end{pmatrix}.\] In applications, when working on applied algebra, it is quite common to meet finite fields. What if we want to solve linear equations over a finite field? That's when linear algebra over finite fields comes in. Realise this before it's too late! By the way, if we work over rings in lieu of fields, we find ourselves in module theory.
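The inverse over \(\mathbb{Z}_5\) can be reproduced by running ordinary Gauss-Jordan elimination with every operation taken modulo \(5\). The sketch below is one possible way to do it, not a reference implementation; the function name `inverse_mod_p` is made up for illustration, and it assumes \(p\) is prime so that every nonzero pivot has a multiplicative inverse.

```python
def inverse_mod_p(A, p):
    """Invert a square matrix over Z_p (p prime) by Gauss-Jordan elimination."""
    n = len(A)
    # Augment A with the identity matrix, reducing entries mod p.
    M = [[a % p for a in row] + [int(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular over Z_p")
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)  # modular inverse (Python 3.8+)
        M[col] = [(x * inv) % p for x in M[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [(x - factor * y) % p for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[1, 0, 2], [2, 3, 1], [1, 4, 0]]
A_inv = inverse_mod_p(A, 5)
print(A_inv)  # [[1, 3, 4], [1, 3, 3], [0, 1, 3]]
```

The only change from the real case is that division by the pivot becomes multiplication by its inverse modulo \(p\).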

The set of all invertible \(n \times n\) matrices forms a multiplicative group (and you should have no problem verifying this). The notation won't go further than \(GL(n)\), \(GL(n,\mathbb{F})\), \(GL_n(\mathbb{F})\) or simply \(GL_n\). The set of all orthogonal matrices, which is also a multiplicative group and written as \(O(n)\), is obviously a subgroup of \(GL(n)\) since for all \(A \in O(n)\) we have \(\det{A} = \pm 1 \neq 0\). \(O(n)\) contains \(SO(n)\) as a subgroup, whose elements have determinant \(1\). One should not confuse \(SO(n)\) with \(SL(n)\), which is the group of all matrices of determinant \(1\). In fact \(SO(n)\) is a proper subset of \(SL(n)\) (for \(n \geq 2\)) and \(SL(n) \cap O(n) = SO(n)\). In general we have \[\begin{gathered}SO(n) \subset SL(n) \subset GL(n), \\SO(n) \subset O(n) \subset GL(n).\end{gathered}\] Now we consider a more detailed group structure between \(GL(n)\) and \(O(n)\). I met the following problem in a differential topology book, where it was about fibres and structure groups. But for now it's simply a linear algebra problem. The crux is finding the 'square root' of a positive definite matrix.

There is a direct product decomposition \[GL(n,\mathbb{R})=O(n) \times \{\text{positive definite symmetric matrices}\}.\]

This decomposition is pretty intuitive. For example if a matrix \(A \in GL(n,\mathbb{R})\) has determinant \(a\), we may be looking for a positive definite matrix of determinant \(|a|\), and another matrix of determinant \(\frac{a}{|a|}\), which is expected to be orthogonal as well. We can consider \(O(n)\) as a rotation of basis (change the direction), and the positive definite symmetric matrix as scaling (change the size). Similar results hold if we change the order of multiplication. It is worth mentioning that by direct product we mean it's up to the order of eigenvalues.

**Proof.** For any invertible matrix \(A\), we see \(A^TA\) is positive definite and symmetric. Therefore there exists some \(P \in O(n)\) such that \[P^T A^TAP = \operatorname{diag}(\lambda_1,\lambda_2,\cdots,\lambda_n).\] We assume that \(\lambda_1\leq \lambda_2 \leq \cdots \leq \lambda_n\) to preserve uniqueness. Note \(\lambda_k>0\) for all \(1 \leq k \leq n\) since \(A^TA\) is positive definite. We write \(\Lambda=\operatorname{diag}(\sqrt{\lambda_1},\sqrt{\lambda_2},\cdots,\sqrt{\lambda_n})\) which gives \[A^TA = P\Lambda^2P^T.\] Define the square root \(B=\sqrt{A^TA}\) by \[B = P\Lambda P^T.\] Then \(B^2=P\Lambda P^T P \Lambda P^T = A^TA\). Note \(B\) is also a positive definite symmetric matrix and is unique for given \(A\). Let \(v_1,v_2,\cdots,v_n\) be orthonormal (hence linearly independent) eigenvectors of \(B\) with respect to \(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \cdots, \sqrt{\lambda_n}\), for instance the columns of \(P\). We first take a look at the following basis: \[e_1=\frac{1}{\sqrt{\lambda_1}}Av_1,e_2=\frac{1}{\sqrt{\lambda_2}}Av_2,\cdots,e_n=\frac{1}{\sqrt{\lambda_n}}Av_n.\] Note \[\left(\frac{1}{\sqrt{\lambda_i}}Av_i\right)^{T}\left(\frac{1}{\sqrt{\lambda_j}}Av_j\right)=\frac{1}{\sqrt{\lambda_i\lambda_j}}v_i^TA^TAv_j=\frac{1}{\sqrt{\lambda_i\lambda_j}}v_i^TB^2v_j=\frac{\sqrt{\lambda_j}}{\sqrt{\lambda_i}}v_i^Tv_j.\] So the value above is \(1\) if \(i = j\) and \(0\) if \(i \neq j\). Thus \(\{e_1,e_2,\cdots,e_n\}\) is a basis since \(A\) is invertible, and by the computation above it is orthonormal.

Then we take \[U = (e_1,e_2,\cdots,e_n)\begin{pmatrix}v_1^T \\v_2^T \\\vdots \\v_n^T\end{pmatrix}.\] We see \[UU^T = (e_1,e_2,\cdots,e_n)\begin{pmatrix}v_1^T \\v_2^T \\\vdots \\v_n^T\end{pmatrix}(v_1,v_2,\cdots,v_n)\begin{pmatrix}e_1^T \\e_2^T \\\vdots \\e_n^T\end{pmatrix} = I = U^TU\] since both \(\{e_1,e_2,\cdots,e_n\}\) and \(\{v_1,v_2,\cdots,v_n\}\) are orthonormal. On the other hand, we need to prove that \(A=UB\). First of all, \[Uv_k =(e_1,e_2,\cdots,e_n)\begin{pmatrix}v_1^T \\v_2^T \\\vdots \\v_n^T\end{pmatrix} v_k = (e_1,e_2,\cdots,e_n)\begin{pmatrix}0\\\vdots \\v_k^Tv_k \\\vdots\end{pmatrix} = e_k.\] (Note we used the fact that \(\{v_k\}\) are orthonormal.) This yields \[UBv_k = U \sqrt{\lambda_k}v_k = \sqrt{\lambda_k}e_k=\frac{\sqrt{\lambda_k}}{\sqrt{\lambda_k}}Av_k=Av_k.\]

Therefore \(A=UB\) holds on a basis, and hence on all of \(\mathbb{R}^n\). This gives the desired conclusion. For any invertible \(n \times n\) matrix \(A\) we have a unique decomposition \[A = UB\] where \(U \in O(n)\) and \(B\) is a positive definite symmetric matrix. \(\square\)
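For \(2 \times 2\) matrices the decomposition can be computed by hand-sized code. The sketch below is not the eigenvector construction used in the proof: it assumes instead the closed-form square root of a \(2 \times 2\) symmetric positive definite matrix \(M\), namely \(\sqrt{M} = (M + sI)/t\) with \(s=\sqrt{\det M}\) and \(t=\sqrt{\operatorname{tr}M+2s}\) (a consequence of the Cayley-Hamilton theorem), and then sets \(U=AB^{-1}\). The function name `polar_2x2` is made up for illustration.

```python
from math import sqrt

def polar_2x2(A):
    """Return (U, B) with A = U B, U orthogonal, B symmetric positive definite."""
    (a, b), (c, d) = A
    # M = A^T A is symmetric positive definite when A is invertible.
    m11, m12, m22 = a * a + c * c, a * b + c * d, b * b + d * d
    s = sqrt(m11 * m22 - m12 * m12)   # sqrt(det M) = |det A|
    t = sqrt(m11 + m22 + 2 * s)       # trace of sqrt(M)
    B = [[(m11 + s) / t, m12 / t], [m12 / t, (m22 + s) / t]]
    # U = A B^{-1}, using the explicit inverse of a 2x2 matrix.
    det_b = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    B_inv = [[B[1][1] / det_b, -B[0][1] / det_b],
             [-B[1][0] / det_b, B[0][0] / det_b]]
    U = [[sum(A[i][k] * B_inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    return U, B

A = [[2.0, 1.0], [0.0, -1.0]]   # det A = -2, so U should have determinant -1
U, B = polar_2x2(A)
print(U)
print(B)
```

Here \(\det U = \det A/|\det A|\) and \(\det B = |\det A|\), matching the intuition of 'direction times size'.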

The basis of a vector space does not come from nowhere. The statement that all vector spaces have a basis is derived from the axiom of choice and the fact that all non-zero elements in a field are invertible. I have written an article proving this already, see here (this is relatively advanced). On the other hand, since elements of a ring are not necessarily invertible, modules over a ring are not equipped with a basis in general.

It is also worth mentioning that a vector space is not necessarily of finite dimension. An infinite dimensional vector space is not some fancy thing. It's quite simple: no basis of it is finite. A basis can be countable or uncountable. And there is a pretty straightforward example: the set of all continuous functions \(f:\mathbb{R} \to \mathbb{R}\).

One of the most important concepts developed in the 20th century is that, when studying a set, one can study the functions defined on it. For example, let's consider \([0,1]\) and \((0,1)\). If we consider the set of all continuous functions on \([0,1]\), which is written as \(C([0,1])\), we see everything is fine. It's fine to define a norm on it, to define a distance on it, and the space is complete with respect to them. However, things are messy on \(C((0,1))\). Defining a norm on it results in abnormal behaviour. If you are interested you can check here.

Now let's consider the unit circle \(S^1\) on the plane. The real continuous functions defined on \(S^1\) can be considered as periodic functions defined on \(\mathbb{R}\). So we may have a lot to do with it. If we are interested in the torus, which is homeomorphic to \(S^1 \times S^1\), how can we study the functions on it? We may consider \(C(S^1) \times C(S^1)\), but as we will show later, there are some problems with that. Anyway, it makes sense to define a 'product' of two vector spaces, which can 'expand' it.

Let's review direct sum and direct product first. For the direct product of \(A\) and \(B\), we ask for an algebraic structure on the Cartesian product \(A \times B\). For example, \((a,b)+(a',b')=(a+a',b+b')\). That is, the operation is defined componentwise. This works fine for groups since for each group there is only one binary operation. But at this point we don't care about scalar multiplication.

There are two types of direct sum, inner and outer. For a vector space \(V\) over a field \(\mathbb{F}\), we consider two (or even more) subspaces \(W\) and \(W'\). We have a 'bigger' subspace generated by adding \(W\) and \(W'\) together, namely \(W+W'\), which contains all elements of the form \(w+w'\) where \(w \in W\) and \(w' \in W'\). The representation is not guaranteed to be unique. That is, for \(z=w+w'\), we may have \(w_1 \in W\) and \(w_1' \in W'\) such that \(z=w_1+w_1'\) but \(w \neq w_1\). This would be weird. Fortunately, the representation is unique if and only if \(W \cap W'\) is trivial. In this case we say the sum of \(W\) and \(W'\) is direct, and write \(W \bigoplus W'\). This is the inner direct sum.

Can we represent the direct sum using an ordered pair? Of course we can. Elements in \(W \bigoplus W'\) can be written in the form \((w,w') \in W \times W'\), and the addition is defined componentwise. That is, \((w,w')+(w_1,w_1')=(w+w_1,w'+w_1')\) (which is in fact \((w+w')+(w_1+w_1')=(w+w_1)+(w'+w_1')\)). It seems that we don't go further than the direct product. However we need to consider the scalar multiplication. For \(\alpha \in \mathbb{F}\), we have \(\alpha(w,w') = (\alpha{w},\alpha{w'})\); this is because \(\alpha(w+w')=\alpha{w}+\alpha{w'}\). We call this the **inner** direct sum because \(W\) and \(W'\) are *inside* \(V\). One may ask, since \(w+w'=w'+w\), why is the pair ordered? The pair records which summand lies in which subspace: the first component is always the element of \(W\) and the second is always the element of \(W'\), regardless of the order in which we write the sum.

Outer direct sum is different. To define this one considers two *arbitrary* vector spaces \(W\) and \(V\) over \(\mathbb{F}\). It is not guaranteed that \(W\) and \(V\) are both subspaces of a bigger vector space. For example it's legit to take \(W\) to be \(\mathbb{R}\) over itself and \(V\) to be all real functions. \(W \bigoplus V\) is defined to be the set of all ordered pairs \((w,v)\) with \(w \in W\) and \(v \in V\). The addition is defined componentwise, and scalar multiplication is defined to be \(\alpha(w,v)=(\alpha{w},\alpha{v})\). One may also write \(w+v\) if context is clear.

When the number of vector spaces is finite, we don't distinguish between direct product and direct sum. When the index is infinite, for example when we consider \(\prod_{i=1}^{\infty}X_i\) and \(\bigoplus_{i=1}^{\infty}X_i\), things are different. To be precise, in the language of category theory, direct product is the *product*, and direct sum is the *coproduct*.

We are not touching the definition but first of all let's imagine what we have for multiplication. Let \(W\) and \(V\) be two vector spaces over \(\mathbb{F}\) and we use \(\cdot\) to be the multiplication for the time being. Law of distribution should hold, that is, we have \(w \cdot v + w' \cdot v = (w+w') \cdot v\) and \(w \cdot v + w \cdot v' = w \cdot (v+v')\). On the other hand, scalar multiplication should be operated on a single component, that is, \(\alpha(w \cdot v)=(\alpha w) \cdot v = w \cdot (\alpha v)\).

It seems illegal to use \(\cdot\) so let's use ordered pairs. Under these laws, we have \[\begin{gathered}(w+w',v)=(w,v)+(w',v), \quad (w,v+v')=(w,v)+(w,v'), \\\alpha(w,v)=(\alpha w,v) = (w,\alpha v).\end{gathered}\] It makes sense to call it 'bilinear'. Fixing one component, we have a linear transform. However, direct sum and direct product do not work here at all. If they worked, we would have \((w,v)+(w',v)=(w+w',v+v)=(w+w',2v)\) instead of \((w+w',v)\). This gives rise to the tensor product: we need a legitimate multiplication of a vector by a vector.

We have got the spirit of the tensor product. A direct product is not OK. There has to be a bilinear operation on it no matter what. For two vector spaces \(V\) and \(W\), we write the tensor product as \(V \bigotimes W\); for \(v \in V\) and \(w \in W\), we denote their tensor product by \(v \otimes w\), which can be considered as an image or value of a bilinear function \(\varphi(\cdot,\cdot):V \times W \to V \bigotimes W\). There are many bilinear maps with domain \(V \times W\). We ask the tensor product to be the essential one.

The **tensor product** \(V \bigotimes W\) of \(V\) and \(W\) is the vector space having the following properties.

- There exists the canonical bilinear map \(\varphi(\cdot,\cdot):V \times W \to V \otimes W\), and we write \(\varphi(v,w) = v \otimes w \in V \bigotimes W\).
- For any bilinear map \(h(\cdot,\cdot):V \times W \to U\), there exists a unique linear map \[\lambda:V \otimes W \to U\] such that \(\lambda(\varphi(v,w)) = h(v,w)\) for all \((v,w) \in V \times W\). This is called the **universal property** of \(V \bigotimes W\).
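The universal property can be made concrete in coordinates. The sketch below assumes the model \(V=\mathbb{F}^m\), \(W=\mathbb{F}^n\) and \(V \bigotimes W \cong \mathbb{F}^{mn}\), where \(v \otimes w\) is the list of all products \(v_iw_j\); the bilinear map is \(h(v,w)=v^TAw\) with \(U=\mathbb{F}\), and all function names are made up for illustration. The induced linear map \(\lambda\) is then just the dot product with the flattened matrix \(A\).

```python
def tensor(v, w):
    """Coordinates of v (x) w in the basis e_i (x) f_j, ordered row-major."""
    return [vi * wj for vi in v for wj in w]

def bilinear(A, v, w):
    """The bilinear map h(v, w) = v^T A w."""
    return sum(v[i] * A[i][j] * w[j]
               for i in range(len(v)) for j in range(len(w)))

def induced_linear(A):
    """The linear map lambda on V (x) W with lambda(v (x) w) = h(v, w)."""
    flat = [a for row in A for a in row]
    return lambda t: sum(c * x for c, x in zip(flat, t))

A = [[1, 2, 3], [4, 5, 6]]
v, v2, w = [2, -1], [0, 5], [1, 0, 3]
lam = induced_linear(A)

# h factors through the tensor product: lambda(v (x) w) = h(v, w).
print(lam(tensor(v, w)), bilinear(A, v, w))

# (x) is bilinear: (v + v2) (x) w = v (x) w + v2 (x) w.
lhs = tensor([a + b for a, b in zip(v, v2)], w)
rhs = [a + b for a, b in zip(tensor(v, w), tensor(v2, w))]
print(lhs == rhs)
```

Note that the tensor product here has dimension \(2 \cdot 3 = 6\), whereas the direct sum of the same spaces has dimension \(2+3=5\).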

It can be easily verified that, if \(V\) and \(W\) have two tensor products, then they are isomorphic (hint: use the universal property). So all tensor products of \(V\) and \(W\) are isomorphic, we only need to pick the obvious one (as long as it exists). But we don't have too much space for it. For further study I recommend the following documents:

- Definition and properties of tensor products. This one involves a considerable amount of explicit calculation and is of elementary approach.
- Tensor products and bases. This one proves the existence in an abstract way.
- Tensor Product as a Universal Object (Category Theory & Module Theory). One of my recent blog posts. The topics here are relatively advanced, and I don't think it's a good idea to use the language of category theory at this early point.

Let \(\mathbb{F}\) be any field (it can be replaced with a commutative ring if you want to), and \(E,F\) be two modules over \(\mathbb{F}\). We will have a glance at the definition of dual space and more importantly, we see what is a transpose. In general we study the bilinear form \[f:E \times F \to \mathbb{F}.\]

Sometimes for simplicity we also write \(f(x,y)=\langle x,y \rangle\). The set of all bilinear forms of \(E \times F\) into \(\mathbb{F}\) will be denoted by \(L^2(E,F;\mathbb{F})\) and you may have seen it earlier.

We define the **kernel** of \(f\) on the left to be \(F^\perp\) and on the right to be \(E^\perp\). Recall that for \(S \subset E\), \(S^\perp\) consists of all \(y\) such that \(f(x,y)=0\) whenever \(x \in S\); similarly, for \(T \subset F\), \(T^\perp\) consists of all \(x\) such that \(f(x,y)=0\) whenever \(y \in T\). Respectively, we say \(f\) is **non-degenerate** on the left/right if the kernel on the left/right is trivial.

One of the simplest examples is the case when \(E=\mathbb{F}^m\) and \(F=\mathbb{F}^n\). We take an \(m \times n\) matrix \(A\) over \(\mathbb{F}\). Define \(f(x,y) = x^T A y\). This is a classic bilinear form. Whether it is non-degenerate on the left or on the right depends on the linear independence of the row vectors and column vectors. \(\def\opn{\operatorname}\)

The bilinear form \(f\) gives rise to a homomorphism of \(E\) to a 'space of essential arrows': \[\varphi_f:E \to \opn{Hom}_\mathbb{F}(F,\mathbb{F})\] given by \[\varphi_f(x)(y) = f(x,y)=\langle x, y \rangle.\] \(\opn{Hom}_\mathbb{F}(F,\mathbb{F})\) contains all linear maps of \(F\) into \(\mathbb{F}\). One can imagine \(\opn{Hom}_\mathbb{F}(F,\mathbb{F})\) to be a set of 'arrows' from \(F\) to \(\mathbb{F}\).

Now let's see what we can do in analysis and topology.

Let's consider all complex polynomials of degree \(\leq 5\). This is a complex vector space and is in fact isomorphic to \(\mathbb{C}^6\) since we have a bijection mapping \(a_0+a_1z+a_2z^2+a_3z^3+a_4z^4+a_5z^5\) to \((a_0,a_1,a_2,a_3,a_4,a_5)^T\). Therefore we can simply use matrices and vectors. We represent differentiation via matrices. This is straightforward work. We pick the natural basis \(\{1,z,z^2,z^3,z^4,z^5\}\) to begin with and write the differentiation as \(\mathscr{D}\). Since \(\def\ms{\mathscr}\) \[\begin{aligned}\ms{D}(1)&=0 &\quad\ms{D}(z)&=1 \\\ms{D}(z^2)&=2z &\quad\ms{D}(z^3)&=3z^2 \\\ms{D}(z^4)&=4z^3 &\quad\ms{D}(z^5)&=5z^4\end{aligned}\]

We get a matrix corresponding to \(\ms{D}\) by \[D=\begin{pmatrix}0&1&0&0&0&0 \\0&0&2&0&0&0 \\0&0&0&3&0&0 \\0&0&0&0&4&0 \\0&0&0&0&0&5 \\0&0&0&0&0&0\end{pmatrix}\] Next we try to obtain the Jordan normal form of \(D\). Since the minimal polynomial of \(D\) is merely \(m(\lambda)=\lambda^6\), we cannot diagonalise it. After some computation we get \[D =SJS^{-1}= \begin{pmatrix}1&0&0&0&0&0 \\0&1&0&0&0&0 \\0&0&\frac{1}{2}&0&0&0 \\0&0&0&\frac{1}{6}&0&0 \\0&0&0&0&\frac{1}{24}&0 \\0&0&0&0&0&\frac{1}{120}\end{pmatrix}\begin{bmatrix}0&1&0&0&0&0 \\0&0&1&0&0&0 \\0&0&0&1&0&0 \\0&0&0&0&1&0 \\0&0&0&0&0&1 \\0&0&0&0&0&0\end{bmatrix}\begin{pmatrix}1&0&0&0&0&0 \\0&1&0&0&0&0 \\0&0&2&0&0&0 \\0&0&0&6&0&0 \\0&0&0&0&24&0 \\0&0&0&0&0&120\end{pmatrix}\] where the matrix \(J\) in the square brackets is our Jordan normal form. This makes sense since if we consider the basis \(\{1,z,\frac{1}{2}z^2,\frac{1}{6}z^3,\frac{1}{24}z^4,\frac{1}{120}z^5\}\), we see under this basis, \[\begin{aligned}\ms{D}(1) &= 0 &\quad \ms{D}(z) &=1 \\\ms{D}(\frac{1}{2}z^2)&= z &\quad \ms{D}(\frac{1}{6}z^3) &= \frac{1}{2}z^2 \\\ms{D}(\frac{1}{24}z^4)&=\frac{1}{6}z^3 &\quad \ms{D}(\frac{1}{120}z^5)&=\frac{1}{24}z^4\end{aligned}\] which coincides with \(J\).

We already know \(\ms{D}^6=0\) but we can also get this by considering \(D^6=SJ^6S^{-1}=0\) since \(J^6=0\). Further, the format of \(S\) should make you realise that we have a hidden \(e\), that is \[e = {\color{red}{1+1+\frac{1}{2}+\frac{1}{6}+\frac{1}{24}+\frac{1}{120}}}+\frac{1}{720}+\cdots,\] and the basis is in fact the first \(6\) terms of the expansion of \(\exp{z}\).

If this cannot fascinate you I don't know what can!
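The factorisation \(D=SJS^{-1}\) and the nilpotency of \(D\) are easy to double-check with exact rational arithmetic. This is only a small sketch using Python's `fractions` module; the helper `matmul` and the variable names are made up for illustration.

```python
from fractions import Fraction
from math import factorial

N = 6  # dimension of the space of polynomials of degree <= 5

def matmul(X, Y):
    """Product of two N x N matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

# Column j of D holds the coordinates of D(z^j) = j z^(j-1).
D = [[Fraction(j) if i + 1 == j else Fraction(0) for j in range(N)]
     for i in range(N)]
# J is the nilpotent Jordan block; S = diag(1/0!, 1/1!, ..., 1/5!).
J = [[Fraction(1) if i + 1 == j else Fraction(0) for j in range(N)]
     for i in range(N)]
S = [[Fraction(1, factorial(i)) if i == j else Fraction(0) for j in range(N)]
     for i in range(N)]
S_inv = [[Fraction(factorial(i)) if i == j else Fraction(0) for j in range(N)]
         for i in range(N)]

print(matmul(matmul(S, J), S_inv) == D)   # D = S J S^{-1}

power = D
for _ in range(4):
    power = matmul(power, D)              # power = D^5 after the loop
D5, D6 = power, matmul(power, D)
print(any(x != 0 for row in D5 for x in row), all(x == 0 for row in D6 for x in row))

# The diagonal of S sums to the sixth partial sum of the series for e.
print(float(sum(S[i][i] for i in range(N))))
```

The last line prints a value close to \(2.7167\), the partial sum highlighted in red above.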

Next we consider an example on infinite dimensional vector spaces. Consider \(E=C_c^\infty(\mathbb{R})\), the infinite dimensional vector space of \(C^\infty\) functions on \(\mathbb{R}\) with compact support, namely, for \(f \in C_c^\infty(\mathbb{R})\), we have \(f \in C^\infty\) and there exists some \(0<K<\infty\) such that \(f(x)=0\) outside \([-K,K]\). Next consider the bilinear form \(E \times E \to \mathbb{R}\) defined by the following inner product: \[\langle f,g \rangle =\int_{-\infty}^{\infty}f(x)g(x)dx.\] Note the differential operator \(\ms{D}:E \to E\) is a linear map of \(E\) into \(E\), so let's find its transpose \(\ms{D}^T\). That is, we need to find the unique linear map \(\ms{D}^T:E \to E\) such that \[\langle \ms{D}f,g\rangle = \langle f,\ms{D}^Tg \rangle.\] This is a simple application of integration by parts: \[\begin{aligned}\langle \ms{D}f,g \rangle &= \int_{-\infty}^{\infty}g(x)df(x) \\ &= f(x)g(x)|_{-\infty}^{\infty} - \int_{-\infty}^{\infty}f(x)dg(x) \\ &=\int_{-\infty}^{\infty}f(x)(-\ms{D})g(x)dx \\ &=\langle f,(-\ms{D})g\rangle\end{aligned}\] where the boundary term vanishes because \(f\) and \(g\) have compact support. Hence the **transpose** of differentiation \(\ms{D}\) is \(-\ms{D}\). So we can say it's skew-symmetric for some obvious reason. But the matrix of \(\ms{D}\) in the space of polynomials of degree \(\leq n\) is not.

(Perron's theorem) Let \(A\) be an \(n \times n\) matrix having all components \(a_{ij}>0\), then it must have a positive eigenvalue \(\lambda_0\), and a unique (up to scaling) corresponding positive eigenvector, i.e., \(x=(x_1,x_2,\cdots,x_n)^T\) such that \(x_i>0\) for all \(i = 1,2,\cdots,n\).

In fact, the positive eigenvalue is the spectral radius of \(A\), which is often written as \(\rho(A)\). I recommend reading the following documents:

- A short proof of Perron's theorem. This mentioned more algebraic properties of \(\rho(A)\).
- The Perron-Frobenius Theorem. This paper mentioned some real life application (modelling growth of a population) and has some exercises to work on.
- Proof of the Frobenius-Perron Theorem. This paper is more elementary-focused.

But here we are using Brouwer's fixed point theorem (you may find an elementary proof on project Euclid). In the following proof, we write \(D_n\) to denote the \(n\)-disk and \(\Delta^n\) to denote the \(n\)-simplex. That is, \[D_n = \{x \in \mathbb{R}^{n}:\lVert x \rVert \leq 1\}, \quad \Delta^n = \left\{(x_1,x_2,\cdots,x_n,x_{n+1}) \in \mathbb{R}^{n+1}:\sum_{i=1}^{n+1}x_i=1, x_i \geq 0\right\}.\] Note \(D_n\) is homeomorphic to \(\Delta^n\). Further we have a lemma:

(Lemma) If \(f:X \to X\) is a continuous function and \(X\) is homeomorphic to \(D_n\), then \(f\) has a fixed point as well.

**Proof of the lemma.** Let \(\varphi\) be the homeomorphism from \(X\) to \(D_n\). Then \(\varphi \circ f \circ \varphi^{-1}:D_n \to D_n\) is continuous and hence has a fixed point according to Brouwer's fixed point theorem. Suppose we have \[\varphi \circ f \circ \varphi^{-1}(y)=y.\] Then \[f \circ \varphi^{-1}(y)=\varphi^{-1}(y)\] and hence \(\varphi^{-1}(y) \in X\) is our fixed point. \(\square\)

Now we are ready to prove Perron's theorem using Brouwer's fixed point theorem.

**Proof of Perron's theorem.** Define \(\sigma(x)=\sum_{i=1}^{n}x_i\) where \(x = (x_1,x_2,\cdots,x_n)^T\). Since it's linear, it's continuous (it's not generally true for infinite dimensional spaces, but it's safe now, and you can see this question on mathstackexchange for a proof). Similarly \(A\) is continuous as well. Also, by definition, a vector \(x\) with nonnegative components lies in \(\Delta^{n-1}\) if and only if \(\sigma(x)=1\). Define a function \(g:\Delta^{n-1} \to \Delta^{n-1}\) by \[g(x)=\frac{Ax}{\sigma(Ax)}.\] We will show that this function is well-defined. Since \(x \in \Delta^{n-1}\), not all components of \(x\) are equal to \(0\), since if so, we get \(x_1+x_2+\cdots+x_n=0\), contradicting the assumption that \(x \in \Delta^{n-1}\). Note we can write down \(Ax\) explicitly (this is an elementary linear algebra thing): \[Ax = \left(\sum_{j=1}^{n}a_{1j}x_j,\sum_{j=1}^{n}a_{2j}x_j,\cdots,\sum_{j=1}^{n}a_{nj}x_j\right)^T.\] Since \(A\) has all components greater than \(0\), and the components of \(x\) are nonnegative and not all zero, we see all components of \(Ax\) are greater than \(0\) as well. Hence \(\sigma(Ax)>0\). On the other hand, \(g(x) \in \Delta^{n-1}\) since its components are positive and \(\sigma(g(x))=\frac{\sigma(Ax)}{\sigma(Ax)}=1\). Since \(A\), \(\sigma\) and \(t \mapsto \frac{1}{t}\) (for \(t>0\)) are continuous, \(g\), being a composition of continuous functions, is continuous.

Since \(\Delta^{n-1}\) is homeomorphic to \(D_{n-1}\), \(g\) has a fixed point according to the lemma. Hence there exists some \(y \in \Delta^{n-1}\) such that \[g(y)=\frac{Ay}{\sigma(Ay)}=y \implies Ay = \sigma(Ay)y.\] As we have already proved, \(\lambda_0=\sigma(Ay)\) is positive. On the other hand, all components of \(y\) are positive since \(y = Ay/\lambda_0\) and all components of \(Ay\) are positive. The proof is completed. \(\square\)
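Brouwer's theorem only tells us that a fixed point of \(g\) exists; it does not say how to find it. For a matrix with positive entries, however, simply iterating \(g\) happens to converge to it (this is the classical power method, a separate fact from the proof above). The sketch below assumes a small sample matrix and a made-up function name `perron_iterate`.

```python
def perron_iterate(A, steps=200):
    """Iterate g(x) = Ax / sigma(Ax) on the simplex, starting from the barycentre."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(steps):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        s = sum(Ax)              # sigma(Ax) > 0 since all entries are positive
        x = [c / s for c in Ax]
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(Ax), x            # (lambda_0, y) with A y close to lambda_0 y

A = [[2.0, 1.0], [1.0, 3.0]]     # sample matrix with positive entries
lam, y = perron_iterate(A)
print(lam, y)
```

For this sample matrix the eigenvalues are \((5 \pm \sqrt{5})/2\), and the iteration settles on the larger one together with an eigenvector whose components are positive and sum to \(1\).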

You are assumed to be familiar with multivariable calculus when reading this subsection since we are discussing it right now. But in general this section is much beyond elementary linear algebra. First of all we are presenting the *ultimate* abstract extension of the usual gradient, curl, and divergence. We simply consider the \(C^\infty\) functions \(\mathbb{R}^3 \to \mathbb{R}^3\). When working on gradient, we consider something like \(\def\pf[#1]{\frac{\partial f}{\partial #1}}\) \[df = \frac{\partial f}{\partial x}dx + \pf[y]dy+\pf[z]dz.\] When working on curl, we consider \[\left(\frac{\partial f_3}{\partial y}-\frac{\partial f_2}{\partial z}\right)dydz - \left(\frac{\partial f_1}{\partial z}-\frac{\partial f_3}{\partial x}\right)dxdz + \left(\frac{\partial f_2}{\partial x}-\frac{\partial f_1}{\partial y}\right)dxdy.\] Finally for divergence we consider \[\left(\frac{\partial f_1}{\partial x}+\frac{\partial f_2}{\partial y}+\frac{\partial f_3}{\partial z}\right)dxdydz.\] They were connected by Green's theorem, Gauss's theorem, Stokes' theorem. But are they abruptly connected for no reason but numerical equality? Fortunately, no. Let's see why.

First of all for convenience we write \((x_1,x_2,x_3)\) instead of \((x,y,z)\). Define \(dx_idx_j=-dx_jdx_i\) for all \(i,j = 1,2,3\). Note this implies that \(dx_idx_i=0\). For \(d\) we have the definition as follows:

- If \(f\) is a \(C^\infty\) function, then \(df = \sum_{i=1}^{3}\pf[x_i]dx_i\).
- If \(\omega\) is of the *form* \(\sum f_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}\), then \(d\omega=\sum df_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}\).

Then gradient, curl and divergence follow naturally. You can verify that the second one is actually equal to \(d(f_1dx+f_2dy+f_3dz)\) and the third one is equal to \(d(f_1dydz-f_2dxdz+f_3dxdy)\). We call \(d\) the exterior differentiation.

Linear algebra is not just for \(\mathbb{R}^3\) space, and neither is exterior differentiation. Let \(\Omega^\ast\) be the algebra over \(\mathbb{R}\) (for algebra over a field, see this), generated by \(dx_1,\dots,dx_n\) with the multiplication defined by an **anti-commutative** multiplication \(dx_idx_j=-dx_jdx_i\) for all \(i,j=1,2,\cdots,n\). As a vector space over \(\mathbb{R}\), \(\Omega^\ast\) is of dimension \(2^n\) with a basis \[1,dx_i,dx_idx_j,dx_idx_jdx_k,\dots,dx_1\dots dx_n\] where \(i<j<k\). Let \(C^\infty\) itself be the vector space of \(C^\infty\) functions on \(\mathbb{R}^n\), and we define the \(C^\infty\) differential *forms* on \(\mathbb{R}^n\) by \[\Omega^\ast(\mathbb{R}^n) = C^\infty \bigotimes_\mathbb{R} \Omega^\ast.\] For simplicity we omit the tensor product symbol \(\otimes\). As a result, for any \(\omega \in \Omega^\ast(\mathbb{R}^n)\), we have \(\omega\) to be a simple \(C^\infty\) function (why don't we call it a \(0\)-form?) or we have \(\omega = \sum f_{i_1\cdots i_q}dx_{i_1}\dots dx_{i_q}\), and we call it a \(q\)-form since each monomial \(dx_{i_1}\dots dx_{i_q}\) has degree \(q\). Also we can define \(\Omega^q(\mathbb{R}^n)\) to be the vector space of \(q\)-forms. Consider the differential operator \(d\) defined by \[d:\Omega^q(\mathbb{R}^n) \to \Omega^{q+1}(\mathbb{R}^n)\]

- If \(f\) is a \(C^\infty\) function, then \(df = \sum_{i=1}^{n}\pf[x_i]dx_i\).
- If \(\omega\) is of the *form* \(\sum f_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}\), then \(d\omega=\sum df_{i_1 \cdots i_q}dx_{i_1}\dots dx_{i_q}\).

This is what we call the exterior differentiation. It's the ultimate abstract extension of gradient, curl and divergence. Your calculus teacher may have warned you, that you cannot deal with \(dx\) independently. So is it safe to work like this? Yes, there is nothing to worry about. We are doing abstraction algebraically.

So many concepts can be understood in a linear algebra way. For example we also have \[\Omega^\ast(\mathbb{R}^n) = \bigoplus_{q=0}^{n}\Omega^q(\mathbb{R}^n).\] In fact Green's theorem, Gauss' theorem and Stokes' theorem have an ultimate abstract extension as well, which is called the general Stokes' theorem:

If \(\omega\) is an \((n-1)\)-form with compact support on an oriented manifold \(M\) of dimension \(n\) and if \(\partial M\) is given the induced orientation, then \[\int_M d\omega = \int_{\partial M}\omega.\]

We are not diving into this theorem but we will conclude this subsection with a glimpse of integration. Recall that the Riemann integral of a differentiable function \(f:\mathbb{R}^n \to \mathbb{R}\) can be written as \[\int_{\mathbb{R}^n}f|dx_1\dots dx_n| = \lim_{\Delta x_i \to 0}\sum f\Delta x_1 \dots \Delta x_n.\] Here we add the absolute value sign to \(dx_1 \dots dx_n\) to emphasise the distinction between the Riemann integral of a function and the integral of a differential form, since order only matters in the latter case. For the latter case, if \(\pi\) is a permutation of \(1,2,\cdots,n\), or we simply say \(\pi \in S_n\), then \[\int_{\mathbb{R}^n} f dx_{\pi(1)}\dots dx_{\pi(n)} = (\operatorname{sgn}\pi) \int_{\mathbb{R}^n} f |dx_{\pi(1)}\dots dx_{\pi(n)}|=(\operatorname{sgn}\pi)\int_{\mathbb{R}^n}f|dx_1\dots dx_n|.\] This definition is natural and obvious. Since \(\operatorname{sgn} \pi\) is equal to the determinant of the matrix representing \(\pi\) (see here), it's natural to consider the determinant. Consider the function \[\begin{aligned}\Pi: \mathbb{R}^n &\to \mathbb{R}^n \\(x_1,x_2,\cdots,x_n)&\mapsto (x_{\pi(1)},x_{\pi(2)},\cdots,x_{\pi(n)})\end{aligned}\] Then \(J(\Pi)=\operatorname{sgn}\pi\). This is quite similar to what we expect from the Jacobian determinant in general, which essentially describes a change of variables. Let \(x_1,x_2,\cdots,x_n\) be the standard coordinates on \(\mathbb{R}^n\) and \(T:\mathbb{R}^n \to \mathbb{R}^n\) be a diffeomorphism. We have new coordinates \(y_1,y_2,\cdots,y_n\) given by \[y_i = \pi_i (T(x_1,x_2,\cdots,x_n))\] where \(\pi_i:(a_1,a_2,\cdots,a_n) \mapsto a_i\) is the \(i\)th projection. Namely \[(y_1,y_2,\cdots,y_n)^T=T(x_1,x_2,\cdots,x_n)=(T_1,T_2,\cdots,T_n)^T,\] written in column vectors.
We now show that \[dy_1\dots dy_n = J(T) dx_1\cdots dx_n.\] First we recall that \(J(T)\) is the determinant of \((\partial T_i / \partial x_j)\), and the determinant of a matrix \((a_{ij})\) is defined by \[\sum_{\sigma}\epsilon(\sigma)a_{1,\sigma(1)}a_{2,\sigma(2)}\cdots a_{n,\sigma(n)},\] where \(\epsilon(\sigma)\) is actually \(\operatorname{sgn}\sigma\) and \(\sigma\) ranges through all permutations of \(1,2,\cdots,n\). We need to match this sum with the product of the \(dy_i\). First of all, we compute \(dy_i\). Note \[\frac{\partial y_i}{\partial x_j} = \frac{\partial T_i}{\partial x_j}.\] Hence \[dy_i = \sum_{j=1}^{n}\frac{\partial T_i}{\partial x_j}dx_j.\] We get, as a result, \[dy_1dy_2\cdots dy_n = \prod_{i=1}^{n}\left(\sum_{j=1}^{n}\frac{\partial T_i}{\partial x_j}dx_j\right).\] After discarding the many terms that vanish, the coefficient that remains is \(J(T)\). You don't have to expand the identity in full. Pick a component \(\frac{\partial T_1}{\partial x_{j_1}}dx_{j_1}\) from \(dy_1\). When we pick another component from \(dy_2\) to multiply with the first one, say \(\frac{\partial T_2}{\partial x_{j_2}}dx_{j_2}\), we must have \(j_1 \neq j_2\), since otherwise \(dx_{j_1}dx_{j_2}=0\) and the term drops out. The rule remains the same (but even stricter) when we pick components from \(dy_3\), \(dy_4\), and so on until \(dy_n\). In the end, \(j_1,j_2,\cdots,j_n\) are pairwise distinct. This corresponds exactly to a permutation of \(1,2,\cdots,n\). 
Hence we get \[dy_1dy_2\cdots dy_n = \sum_\sigma \left(\prod_{i=1}^{n} \frac{\partial T_i}{\partial x_{\sigma (i)}}dx_{\sigma(i)}\right) = \sum_{\sigma}\frac{\partial T_1}{\partial x_{\sigma(1)}}\frac{\partial T_2}{\partial x_{\sigma(2)}}\cdots \frac{\partial T_n}{\partial x_{\sigma(n)}}dx_{\sigma(1)}dx_{\sigma(2)}\cdots dx_{\sigma(n)}.\] On the other hand, \(dx_{\sigma(1)}dx_{\sigma(2)}\cdots dx_{\sigma(n)}=\epsilon(\sigma)dx_1dx_2\cdots dx_n\), and if we put this inside the expansion of \(dy_1dy_2\cdots dy_n\), we get \[\begin{aligned}dy_1dy_2\cdots dy_n &= \sum_{\sigma}\epsilon(\sigma)\frac{\partial T_1}{\partial x_{\sigma(1)}}\frac{\partial T_2}{\partial x_{\sigma(2)}}\cdots \frac{\partial T_n}{\partial x_{\sigma(n)}}dx_1dx_2\cdots dx_n \\&=\left(\sum_{\sigma}\epsilon(\sigma)\frac{\partial T_1}{\partial x_{\sigma(1)}}\frac{\partial T_2}{\partial x_{\sigma(2)}}\cdots \frac{\partial T_n}{\partial x_{\sigma(n)}}\right)dx_1dx_2\cdots dx_n \\&=J(T)dx_1dx_2\cdots dx_n.\end{aligned}\] We answered a calculus question in an algebraic way (and more than that if you review more related concepts in calculus).
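The permutation expansion used in this argument can be tried out numerically. The following is only a minimal sketch: the helper names `sign`, `leibniz_det` and `permutation_matrix`, as well as the sample map \(T(x_1,x_2)=(x_1^2-x_2,\,x_1x_2)\), are assumptions made for illustration and are not part of the text above.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based indices (parity of inversions)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def leibniz_det(a):
    """Determinant as the sum over sigma of eps(sigma) * a[0][sigma(0)] * ... * a[n-1][sigma(n-1)]."""
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for i in range(n):
            term *= a[i][sigma[i]]
        total += term
    return total

def permutation_matrix(perm):
    """Matrix of the coordinate permutation: row i has a single 1 in column perm[i]."""
    n = len(perm)
    return [[1 if j == perm[i] else 0 for j in range(n)] for i in range(n)]

# Jacobian matrix of the hypothetical map T(x1, x2) = (x1**2 - x2, x1 * x2) at the point (3, 2).
# Row i holds (dT_i/dx_1, dT_i/dx_2).
jacobian_at_point = [[6, -1], [2, 3]]
print(leibniz_det(jacobian_at_point))  # 6*3 - (-1)*2 = 20

# For a permutation matrix only one term of the sum survives, and it equals sgn(pi).
for pi in permutations(range(3)):
    print(pi, leibniz_det(permutation_matrix(pi)), sign(pi))
```

The sum over all permutations has \(n!\) terms, so this is meant for reading alongside the derivation and not for computing large determinants.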

There are several ways to define a Dedekind domain, since there are several equivalent characterisations of it. We will start from the one based on the ring of fractions. As a friendly reminder, \(\mb{Z}\) or any principal ideal domain is already a Dedekind domain. In fact a Dedekind domain may be viewed as a generalization of a principal ideal domain.

Let \(\mfk{o}\) be an integral domain (a.k.a. entire ring), and \(K\) be its quotient field. A **Dedekind domain** is an integral domain \(\mfk{o}\) such that the fractional ideals form a group under multiplication. Let's have a breakdown. By a **fractional ideal** \(\mfk{a}\) we mean a nontrivial additive subgroup of \(K\) such that

- \(\mfk{o}\mfk{a}=\mfk{a}\),
- there exists some nonzero element \(c \in \mfk{o}\) such that \(c\mfk{a} \subset \mfk{o}\).

What does the group look like? As you may guess, the unit element is \(\mfk{o}\). For a fractional ideal \(\mfk{a}\), the inverse is another fractional ideal \(\mfk{b}\) such that \(\mfk{ab}=\mfk{ba}=\mfk{o}\). Note we regard \(\mfk{o}\) as a subring of \(K\). For \(a \in \mfk{o}\), we treat it as \(a/1 \in K\). This makes sense because the map \(i:a \mapsto a/1\) is injective. As for the existence of \(c\), you may think of it as the restriction that the 'denominators' are *bounded*. Alternatively, when \(\mfk{o}\) is Noetherian, a fractional ideal is the same thing as a nonzero finitely generated \(\mfk{o}\)-submodule of \(K\). But in this post it is not assumed that you have learned module theory.

Let's take \(\mb{Z}\) as an example. The quotient field of \(\mb{Z}\) is \(\mb{Q}\). Fix a prime \(p\). We have a fractional ideal \(P\) whose elements are exactly those of the form \(\frac{np}{2}\) with \(n \in \mb{Z}\). Then indeed we have \(\mb{Z}P=P\). On the other hand, taking \(2 \in \mb{Z}\), we have \(2P \subset \mb{Z}\). For its inverse we can take the fractional ideal \(Q\) whose elements are of the form \(\frac{2n}{p}\) with \(n \in \mb{Z}\), since \(PQ=\mb{Z}\). As proved in algebraic number theory, the ring of algebraic integers in a number field is a Dedekind domain.
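The example can be checked with exact rational arithmetic. This is a small sketch under the assumption that the example is read with one fixed prime \(p\), so that \(P=\frac{p}{2}\mb{Z}\) and \(Q=\frac{2}{p}\mb{Z}\) are principal fractional ideals determined by their generators; the value `p = 7` and the helper names are arbitrary choices for illustration.

```python
from fractions import Fraction

p = 7                # a fixed prime, chosen arbitrarily for illustration
P = Fraction(p, 2)   # generator of P = (p/2)Z
Q = Fraction(2, p)   # generator of Q = (2/p)Z

def clears_denominator(c, generator):
    """True if c * (generator * Z) lies inside Z, i.e. c * generator is an integer."""
    return (c * generator).denominator == 1

def product_generator(g1, g2):
    """For principal fractional ideals of Z we have (g1 Z)(g2 Z) = (g1 g2) Z."""
    return g1 * g2

print(clears_denominator(2, P))   # the 'bounded denominator' condition for P with c = 2
print(clears_denominator(p, Q))   # the same condition for Q with c = p
print(product_generator(P, Q))    # a generator of PQ; it equals 1 exactly when PQ = Z
```

Every fractional ideal of \(\mb{Z}\) is principal, which is why tracking a single generator is enough here; this shortcut does not carry over to a general Dedekind domain.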

Before we go on we need to clarify the definition of ideal multiplication. Let \(\mfk{a}\) and \(\mfk{b}\) be two ideals. We define \(\mfk{ab}\) to be the set of all sums \[x_1y_1+\cdots+x_ny_n\] where \(x_i \in \mfk{a}\) and \(y_i \in \mfk{b}\). Here the number \(n\) is finite but not fixed. Alternatively we can say \(\mfk{ab}\) consists of all finite sums of products of elements of \(\mfk{a}\) with elements of \(\mfk{b}\).

(Proposition 1) A Dedekind domain \(\mfk{o}\) is Noetherian.

By Noetherian ring we mean that every ideal in a ring is finitely generated. Precisely, we will prove that for every ideal \(\mfk{a} \subset \mfk{o}\) there are \(a_1,a_2,\cdots,a_n \in \mfk{a}\) such that, for every \(r \in \mfk{a}\), we have an expression \[r = c_1a_1 + c_2a_2 + \cdots + c_na_n \qquad c_1,c_2,\cdots,c_n \in \mfk{o}.\] Also note that any ideal \(\mfk{a} \subset \mfk{o}\) can be viewed as a fractional ideal.

**Proof.** Let \(\mfk{a}\) be a nonzero ideal of \(\mfk{o}\) and let \(K\) be the quotient field of \(\mfk{o}\). Since \(\mfk{oa}=\mfk{a}\), we may view \(\mfk{a}\) as a fractional ideal. Since \(\mfk{o}\) is a Dedekind domain, the fractional ideals of \(\mfk{o}\) form a group, so there is a fractional ideal \(\mfk{b}\) such that \(\mfk{ab}=\mfk{ba}=\mfk{o}\). Since \(1 \in \mfk{o}=\mfk{ab}\), there exist \(a_1,a_2,\cdots, a_n \in \mfk{a}\) and \(b_1,b_2,\cdots,b_n \in \mfk{b}\) such that \(\sum_{i = 1 }^{n}a_ib_i=1\). For any \(r \in \mfk{a}\), we have an expression \[r = rb_1a_1+rb_2a_2+\cdots+rb_na_n,\] where each coefficient \(rb_i\) lies in \(\mfk{ab}=\mfk{o}\). On the other hand, any element of the form \(c_1a_1+c_2a_2+\cdots+c_na_n\) with \(c_i \in \mfk{o}\) is, by definition, an element of \(\mfk{a}\). \(\blacksquare\)

From now on, the inverse of a fractional ideal \(\mfk{a}\) will be written as \(\mfk{a}^{-1}\).

(Proposition 2) For ideals \(\mfk{a},\mfk{b} \subset \mfk{o}\), \(\mfk{b}\subset\mfk{a}\) if and only if there exists some ideal \(\mfk{c}\) such that \(\mfk{ac}=\mfk{b}\) (or we simply say \(\mfk{a}|\mfk{b}\)).

**Proof.** If \(\mfk{b}=\mfk{ac}\), simply note that \(\mfk{ac} \subset \mfk{a} \cap \mfk{c} \subset \mfk{a}\). For the converse, suppose that \(\mfk{a} \supset \mfk{b}\). Then \(\mfk{c}=\mfk{a}^{-1}\mfk{b}\) is an ideal of \(\mfk{o}\) since \(\mfk{c}=\mfk{a}^{-1}\mfk{b} \subset \mfk{a}^{-1}\mfk{a}=\mfk{o}\), hence we may write \(\mfk{b}=\mfk{a}\mfk{c}\). \(\blacksquare\)

(Proposition 3) If \(\mfk{a}\) is an ideal of \(\mfk{o}\), then there are prime ideals \(\mfk{p}_1,\mfk{p}_2,\cdots,\mfk{p}_n\) such that \[\mfk{a}=\mfk{p}_1\mfk{p}_2\cdots\mfk{p}_n.\]

**Proof.** For this problem we use a classical technique: contradiction on maximality. Suppose this is not true, and let \(\mfk{A}\) be the set of ideals of \(\mfk{o}\) that cannot be written as a product of prime ideals. By assumption \(\mfk{A}\) is nonempty. Since, as we have proved, \(\mfk{o}\) is Noetherian, we can pick a maximal element \(\mfk{a}\) of \(\mfk{A}\) with respect to inclusion. If \(\mfk{a}\) is a maximal ideal, then since all maximal ideals are prime, \(\mfk{a}\) itself is prime, which is absurd. Otherwise \(\mfk{a}\) is properly contained in a maximal ideal \(\mfk{m}\), and we write \(\mfk{a}=\mfk{m}\mfk{m}^{-1}\mfk{a}\), where \(\mfk{m}^{-1}\mfk{a} \subset \mfk{m}^{-1}\mfk{m}=\mfk{o}\) is an ideal of \(\mfk{o}\). We have \(\mfk{m}^{-1}\mfk{a} \supsetneq \mfk{a}\) since if not, we have \(\mfk{a}=\mfk{ma}\), which implies \(\mfk{m}=\mfk{o}\). By maximality, \(\mfk{m}^{-1}\mfk{a}\not\in\mfk{A}\), hence it can be written as a product of prime ideals. But \(\mfk{m}\) is prime as well, so we have a prime factorization for \(\mfk{a}\), contradicting the definition of \(\mfk{A}\).

Next we show uniqueness up to permutation. If \[\mfk{p}_1\mfk{p}_2\cdots\mfk{p}_k=\mfk{q}_1\mfk{q}_2\cdots\mfk{q}_j,\] then since \(\mfk{q}_1\mfk{q}_2\cdots\mfk{q}_j=\mfk{p}_1\mfk{p}_2\cdots\mfk{p}_k\subset\mfk{p}_1\) and \(\mfk{p}_1\) is prime, some \(\mfk{q}_i\) is contained in \(\mfk{p}_1\), and after renumbering we may assume that \(\mfk{q}_1 \subset \mfk{p}_1\). By proposition 2 we have \(\mfk{q}_1=\mfk{p}_1\mfk{r}_1\) for some ideal \(\mfk{r}_1\). However we also have \(\mfk{q}_1 \subset \mfk{r}_1\). Since \(\mfk{q}_1\) is prime and contains \(\mfk{p}_1\mfk{r}_1\), we either have \(\mfk{q}_1 \supset \mfk{p}_1\) or \(\mfk{q}_1 \supset \mfk{r}_1\). In the former case we get \(\mfk{p}_1=\mfk{q}_1\), and we finish the proof by continuing inductively. In the latter case we have \(\mfk{r}_1=\mfk{q}_1=\mfk{p}_1\mfk{q}_1\), which shows that \(\mfk{p}_1=\mfk{o}\), which is impossible. \(\blacksquare\)

(Proposition 4) Every nontrivial prime ideal \(\mfk{p}\) is maximal.

**Proof.** Let \(\mfk{m}\) be a maximal ideal containing \(\mfk{p}\). By proposition 2 we have some \(\mfk{c}\) such that \(\mfk{p}=\mfk{mc}\). If \(\mfk{m} \neq \mfk{p}\), then \(\mfk{c} \neq \mfk{o}\), and we may write \(\mfk{c}=\mfk{p}_1\cdots\mfk{p}_n\), hence \(\mfk{p}=\mfk{m}\mfk{p}_1\cdots\mfk{p}_n\), which is a prime factorisation, contradicting the fact that \(\mfk{p}\) has a unique prime factorisation, which is \(\mfk{p}\) itself. Hence any maximal ideal containing \(\mfk{p}\) is \(\mfk{p}\) itself. \(\blacksquare\)

(Proposition 5) Suppose the Dedekind domain \(\mfk{o}\) only contains one prime (and maximal) ideal \(\mfk{p}\), let \(t \in \mfk{p}\) and \(t \not\in \mfk{p}^2\), then \(\mfk{p}\) is generated by \(t\).

**Proof.** Let \(\mfk{t}\) be the ideal generated by \(t\). By proposition 3 we have a factorisation \[\mfk{t}=\mfk{p}^n\] for some \(n\) since \(\mfk{o}\) contains only one prime ideal. If \(n \geq 3\), writing \(\mfk{p}^n=\mfk{p}^2\mfk{p}^{n-2}\) and applying proposition 2, we see \(\mfk{p}^2 \supset \mfk{p}^n\). But this is impossible, since then \(t \in \mfk{p}^n \subset \mfk{p}^2\), contradicting our assumption. Hence \(0<n<3\). But if \(n=2\) we have \(t \in \mfk{p}^2\), which is also not possible. So \(\mfk{t}=\mfk{p}\) provided that such \(t\) exists.

For the existence of \(t\), note if not, then for all \(t \in \mfk{p}\) we have \(t \in \mfk{p}^2\), hence \(\mfk{p} \subset \mfk{p}^2\). On the other hand we already have \(\mfk{p}^2 = \mfk{p}\mfk{p}\), which implies that \(\mfk{p}^2 \subset \mfk{p}\) (proposition 2), hence \(\mfk{p}^2=\mfk{p}\), contradicting proposition 3. Hence such \(t\) exists and our proof is finished. \(\blacksquare\)

In fact there is another equivalent definition of Dedekind domain:

A domain \(\mfk{o}\) is Dedekind if and only if

- \(\mfk{o}\) is Noetherian.
- \(\mfk{o}\) is integrally closed.
- \(\mfk{o}\) has Krull dimension \(1\) (i.e. every non-zero prime ideal is maximal).

This is equivalent to saying that the fractional ideals form a group, and it is frequently used by mathematicians as well. But we need some more advanced techniques to establish the equivalence. Presumably there will be a post about this in the future.

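There are several ways to prove Hardy's inequality \(\lrVert[F]_p \leq q\lrVert[f]_p\), where \(1<p<\infty\), \(\frac{1}{p}+\frac{1}{q}=1\), \(f \in L^p((0,\infty))\) and \(F(x)=\frac{1}{x}\int_0^x f(t)dt\). I think there are good reasons to write the proofs down thoroughly, since that may be why you found this page. Maybe you are burnt out because it was *left as an exercise*. You are assumed to have enough knowledge of Lebesgue measure and integration.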

Let \(S_1,S_2 \subset \mathbb{R}\) be two measurable sets and suppose \(F:S_1 \times S_2 \to \mathbb{R}\) is measurable. Then \[\left[\int_{S_2} \left\vert\int_{S_1}F(x,y)dx \right\vert^pdy\right]^{\frac{1}{p}} \leq \int_{S_1} \left[\int_{S_2} |F(x,y)|^p dy\right]^{\frac{1}{p}}dx.\] A proof can be found here by turning to Example A9. You may need to replace all measures with Lebesgue measure \(m\).

Now let's get into it. The integrand here is \(G(x,t)=\frac{f(t)}{x}\). After the substitution \(t=ux\) the inner integral becomes \(\int_0^1 f(ux)du\), and we apply the inequality above to the function \((u,x) \mapsto f(ux)\): \[\begin{aligned} \lrVert[F]_p &= \left[\int_0^\infty \left\vert \int_0^x \frac{f(t)}{x}dt \right\vert^p dx\right]^{\frac{1}{p}} \\ &= \left[\int_0^\infty \left\vert \int_0^1 f(ux)du \right\vert^p dx\right]^{\frac{1}{p}} \\ &\leq \int_0^1 \left[\int_0^\infty |f(ux)|^pdx\right]^{\frac{1}{p}}du \\ &= \int_0^1 \left[\int_0^\infty |f(ux)|^pudx\right]^{\frac{1}{p}}u^{-\frac{1}{p}}du \\ &= \lrVert[f]_p \int_0^1 u^{-\frac{1}{p}}du \\ &=q\lrVert[f]_p.\end{aligned}\] Note we have used a change of variables twice and the inequality once.

I have no idea how people came up with this solution. Take \(xF(x)=\int_0^x f(t)t^{u}t^{-u}dt\) where \(0<u<1-\frac{1}{p}\). Hölder's inequality gives us \[\begin{aligned}xF(x) &= \int_0^x f(t)t^ut^{-u}dt \\ &\leq \left[\int_0^x t^{-uq}dt\right]^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}} \\ &=\left(\frac{1}{1-uq}x^{1-uq}\right)^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}}\end{aligned}\] Hence \[\begin{aligned}F(x)^p & \leq \frac{1}{x^p}\left\{\left(\frac{1}{1-uq}x^{1-uq}\right)^{\frac{1}{q}}\left[\int_0^xf(t)^pt^{up}dt\right]^{\frac{1}{p}}\right\}^{p} \\&= \left(\frac{1}{1-uq}\right)^{\frac{p}{q}}x^{\frac{p}{q}(1-uq)-p}\int_0^x f(t)^pt^{up}dt \\&= \left(\frac{1}{1-uq}\right)^{p-1}x^{-up-1}\int_0^x f(t)^pt^{up}dt\end{aligned}\]

Note we have used the fact that \(\frac{1}{p}+\frac{1}{q}=1 \implies p+q=pq\) and \(\frac{p}{q}=p-1\). Fubini's theorem gives us the final answer: \[\begin{aligned}\int_0^\infty F(x)^pdx &\leq \int_0^\infty\left[\left(\frac{1}{1-uq}\right)^{p-1}x^{-up-1}\int_0^x f(t)^pt^{up}dt\right]dx \\&=\left(\frac{1}{1-uq}\right)^{p-1}\int_0^\infty dx\int_0^x f(t)^pt^{up}x^{-up-1}dt \\&=\left(\frac{1}{1-uq}\right)^{p-1}\int_0^\infty dt\int_t^\infty f(t)^pt^{up}x^{-up-1}dx \\&=\left(\frac{1}{1-uq}\right)^{p-1}\frac{1}{up}\int_0^\infty f(t)^pdt.\end{aligned}\] It remains to find the minimum of \(\varphi(u) = \left(\frac{1}{1-uq}\right)^{p-1}\frac{1}{up}\). This is an elementary calculus problem. By taking its derivative, we see when \(u=\frac{1}{pq}<1-\frac{1}{p}\) it attains its minimum \(\left(\frac{p}{p-1}\right)^p=q^p\). Hence we get \[\int_0^\infty F(x)^pdx \leq q^p\int_0^\infty f(t)^pdt,\] which is exactly what we want. Note the constant \(q\) cannot be replaced with a smaller one. We simply proved the case when \(f \geq 0\). For the general case, one simply needs to take absolute value.
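The "elementary calculus problem" for \(\varphi\) can be sanity-checked numerically. The sketch below is an illustration only: the function names `phi` and `grid_minimum`, the grid search, and the sample exponent \(p=3\) are all assumptions introduced here, and a grid search is of course not a proof.

```python
def phi(u, p):
    """phi(u) = (1 / (1 - u q))**(p - 1) / (u p), with q the conjugate exponent of p."""
    q = p / (p - 1)
    return (1.0 / (1.0 - u * q)) ** (p - 1) / (u * p)

def grid_minimum(p, steps=100000):
    """Search the open interval 0 < u < 1 - 1/p on a uniform grid for the smallest phi(u)."""
    q = p / (p - 1)
    upper = 1.0 / q  # equals 1 - 1/p
    best_u, best_val = None, float("inf")
    for k in range(1, steps):
        u = upper * k / steps
        val = phi(u, p)
        if val < best_val:
            best_u, best_val = u, val
    return best_u, best_val

p = 3.0
q = p / (p - 1)
u_star, value = grid_minimum(p)
print(u_star, 1 / (p * q))   # numerical minimiser next to the claimed u = 1/(pq)
print(value, q ** p)         # numerical minimum next to the claimed value q^p
```

For \(p=2\) everything can be done by hand: \(q=2\), \(u=\frac{1}{4}\) and \(\varphi\left(\frac{1}{4}\right)=\frac{2}{1/2}=4=q^p\).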

This approach makes use of properties of \(L^p\) spaces. Still we assume that \(f \geq 0\) but we also assume \(f \in C_c((0,\infty))\), that is, \(f\) is continuous and has compact support. Hence \(F\) is differentiable in this situation. Integration by parts gives \[\int_0^\infty F^p(x)dx=xF(x)^p\vert_0^\infty- \int_0^\infty xdF^p = -p\int_0^\infty xF^{p-1}(x)F'(x)dx.\] Note since \(f\) has compact support, there is some \([a,b]\) such that \(f(x) >0\) only if \(0 < a \leq x \leq b < \infty\), and hence \(xF(x)^p\vert_0^\infty=0\). Next it is natural to take a look at \(F'(x)\). Note we have \[F'(x) = \frac{f(x)}{x}-\frac{\int_0^x f(t)dt}{x^2},\] hence \(xF'(x)=f(x)-F(x)\). A substitution gives us \[\int_0^\infty F^p(x)dx = -p\int_0^\infty F^{p-1}(x)[f(x)-F(x)]dx,\] which is equivalent to saying \[\int_0^\infty F^p(x)dx = \frac{p}{p-1}\int_0^\infty F^{p-1}(x)f(x)dx.\] Hölder's inequality gives us \[\begin{aligned}\int_0^\infty F^{p-1}(x)f(x)dx &\leq \left[\int_0^\infty F^{(p-1)q}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty f(x)^pdx\right]^{\frac{1}{p}} \\&=\left[\int_0^\infty F^{p}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty f(x)^pdx\right]^{\frac{1}{p}}.\end{aligned}\] Together with the identity above we get \[\int_0^\infty F^p(x)dx \leq q\left[\int_0^\infty F^{p}(x)dx\right]^{\frac{1}{q}}\left[\int_0^\infty f(x)^pdx\right]^{\frac{1}{p}}\] which is exactly what we want, since \(1-\frac{1}{q}=\frac{1}{p}\) and all we need to do is divide both sides by \(\left[\int_0^\infty F^pdx\right]^{1/q}\). So what's next? Note \(C_c((0,\infty))\) is dense in \(L^p((0,\infty))\). For any \(f \in L^p((0,\infty))\), we can take a sequence of functions \(f_n \in C_c((0,\infty))\) such that \(f_n \to f\) with respect to the \(L^p\)-norm. Taking \(F=\frac{1}{x}\int_0^x f(t)dt\) and \(F_n = \frac{1}{x}\int_0^x f_n(t)dt\), we need to show that \(F_n \to F\) pointwise, so that we can use Fatou's lemma. For \(\varepsilon>0\), there exists some \(N\) such that \(\lrVert[f_n-f]_p < \varepsilon\) for all \(n > N\). Thus, for such \(n\), \[\begin{aligned}|F_n(x)-F(x)| &= \frac{1}{x}\left\vert \int_0^x f_n(t)dt - \int_0^x f(t)dt \right\vert \\ &\leq \frac{1}{x} \int_0^x |f_n(t)-f(t)|dt \\ &\leq \frac{1}{x} \left[\int_0^x|f_n(t)-f(t)|^pdt\right]^{\frac{1}{p}}\left[\int_0^x 1^qdt\right]^{\frac{1}{q}} \\ &=\frac{1}{x^{1/p}}\left[\int_0^x|f_n(t)-f(t)|^pdt\right]^{\frac{1}{p}} \\ &\leq \frac{1}{x^{1/p}}\lrVert[f_n-f]_p <\frac{\varepsilon}{x^{1/p}}.\end{aligned}\] Hence \(F_n \to F\) pointwise, which also implies that \(|F_n|^p \to |F|^p\) pointwise. For \(|F_n|\) we have \[\begin{aligned}\int_0^\infty |F_n(x)|^pdx &= \int_0^\infty \left\vert\frac{1}{x}\int_0^x f_n(t)dt\right\vert^p dx \\&\leq \int_0^\infty \left[\frac{1}{x}\int_0^x |f_n(t)|dt\right]^{p}dx \\&\leq q^p\int_0^\infty |f_n(t)|^pdt.\end{aligned}\] Note the last inequality follows since we have already proved it for nonnegative functions in \(C_c((0,\infty))\), and \(|f_n|\) is such a function. By Fatou's lemma, we have \[\begin{aligned}\int_0^\infty |F(x)|^pdx &= \int_0^\infty \lim_{n \to \infty}|F_n(x)|^pdx \\&\leq \liminf_{n \to \infty} \int_0^\infty |F_n(x)|^pdx \\&\leq \liminf_{n \to \infty}q^p\int_0^\infty |f_n(x)|^pdx \\&=q^p\int_0^\infty |f(x)|^pdx.\end{aligned}\]

We quite often see the direct sum or direct product of groups, modules and vector spaces. Indeed, for modules over a ring \(R\), the direct product is the **product** in the category of \(R\)-modules. On the other hand, the direct sum is a **coproduct** in the category of \(R\)-modules.

But what about tensor products? It is some different kind of *product* but how? Is it related to direct product? How do we write a tensor product down? We need to solve this question but it is not a good idea to dig into numeric works.

From now on, let \(R\) be a commutative ring, and \(M_1,\cdots,M_n\) are \(R\)-modules. Mainly we work on \(M_1\) and \(M_2\), i.e. \(M_1 \times M_2\) and \(M_1 \otimes M_2\). For \(n\)-multilinear one, simply replace \(M_1\times M_2\) with \(M_1 \times M_2 \times \cdots \times M_n\) and \(M_1 \otimes M_2\) with \(M_1 \otimes \cdots \otimes M_n\). The only difference is the change of symbols.

The bilinear maps on \(M_1 \times M_2\) determine a category, say \(BL(M_1 \times M_2)\), or we simply write \(BL\). For an object \((f,E)\) in this category we have \(f: M_1 \times M_2 \to E\) as a bilinear map and \(E\) as an \(R\)-module of course. For two objects \((f,E)\) and \((g,F)\), we define a morphism from \((f,E)\) to \((g,F)\) to be a linear map \(h:E \to F\) making the triangle formed by \(f\), \(g\) and \(h\) commute, that is, \(g = h \circ f\). \(\def\mor{\operatorname{Mor}}\)

This indeed makes \(BL\) a category. If we denote the set of morphisms from \((f,E)\) to \((g,F)\) by \(\mor(f,g)\) (for simplicity we omit \(E\) and \(F\) since they are already determined by \(f\) and \(g\)), we see the composition \[\mor(f,g) \times \mor(g,h) \to \mor(f,h)\] satisfies all axioms for a category:

**CAT 1** Two sets \(\mor(f,g)\) and \(\mor(f',g')\) are disjoint unless \(f=f'\) and \(g=g'\), in which case they are equal. If \(g \neq g'\) but \(f = f'\) for example, for any \(h \in \mor(f,g)\), we have \(g = h \circ f = h \circ f' \neq g'\), hence \(h \notin \mor(f',g')\). Other cases can be verified in the same fashion.

**CAT 2** The existence of identity morphism. For any \((f,E) \in BL\), we simply take the identity map \(i:E \to E\). For \(h \in \mor(f,g)\), we see \(g = h \circ f = h \circ i \circ f\). For \(h' \in \mor(g,f)\), we see \(f = h' \circ g = i \circ h' \circ g\).

**CAT 3** The law of composition is associative when defined.

There we have a category. But what about the tensor product? It is defined to be *initial* (or *universally repelling*) object in this category. Let's denote this object by \((\varphi,M_1 \otimes M_2)\).

For any \((f,E) \in BL\), we have a unique morphism (which is a module homomorphism as well) \(h:(\varphi,M_1 \otimes M_2) \to (f,E)\). For \(x \in M_1\) and \(y \in M_2\), we write \(\varphi(x,y)=x \otimes y\). We call the existence of \(h\) the **universal property** of \((\varphi,M_1 \otimes M_2)\).

The tensor product is unique up to isomorphism. That is, if both \((f,E)\) and \((g,F)\) are tensor products, then \(E \simeq F\) in the sense of module isomorphism. Indeed, let \(h \in \mor(f,g)\) and \(h' \in \mor(g,f)\) be the unique morphisms respectively. We see \(g = h \circ f\), \(f = h' \circ g\), and therefore \[\begin{aligned}g &= h \circ h' \circ g, \\f &= h' \circ h \circ f.\end{aligned}\] By the uniqueness of morphisms out of an initial object, \(h \circ h'\) is the identity of \((g,F)\) and \(h' \circ h\) is the identity of \((f,E)\). This gives \(E \simeq F\).

What do we get so far? For any module that is connected to \(M_1 \times M_2\) by a bilinear map, the tensor product \(M_1 \otimes M_2\) of \(M_1\) and \(M_2\) can always be connected to that module by a unique module homomorphism. What if there is more than one tensor product? Never mind. All tensor products are isomorphic.

But wait, does this definition make sense? Does this product even exist? How can we study the tensor product of two modules if we cannot even write it down? So far we are only working on arrows, and we don't know what is happening inside a module. It is not a good idea to waste our time on 'nonsense'. We can look into it in a natural way. Indeed, if we can find a module satisfying the property we want, then we are done, since this can represent the tensor product under any circumstances. Again, all tensor products of \(M_1\) and \(M_2\) are isomorphic.

Let \(M\) be the free module generated by the set of all tuples \((x_1,x_2)\) where \(x_1 \in M_1\) and \(x_2 \in M_2\), and \(N\) be the submodule generated by elements of the following types, for \(a \in R\): \[\begin{aligned}&(x_1+x_1',x_2)-(x_1,x_2)-(x_1',x_2), \\&(x_1,x_2+x_2')-(x_1,x_2)-(x_1,x_2'), \\&(ax_1,x_2)-a(x_1,x_2), \\&(x_1,ax_2) - a(x_1,x_2).\end{aligned}\] First we have an inclusion map \(\alpha:M_1 \times M_2 \to M\) and the canonical map \(\pi:M \to M/N\). We claim that \((\pi \circ \alpha, M/N)\) is exactly what we want. But before that, we need to explain why we define such an \(N\).

The reason is quite simple: We want to make sure that \(\varphi=\pi \circ \alpha\) is bilinear. For example, we have \(\varphi(x_1+x_1',x_2)=\varphi(x_1,x_2)+\varphi(x_1',x_2)\) due to our construction of \(N\) (other relations follow in the same manner). This can be verified group-theoretically. Note \[\varphi(x_1+x_1',x_2)=(x_1+x_1',x_2)+N \\\varphi(x_1,x_2)+\varphi(x_1',x_2)=(x_1,x_2)+(x_1',x_2)+N\] but \[\varphi(x_1+x_1',x_2)-\varphi(x_1,x_2)-\varphi(x_1',x_2)=(x_1+x_1',x_2)-(x_1,x_2)-(x_1',x_2) +N = 0+N.\] Hence we get the identity we want. For this reason we can write \[\begin{aligned}(x_1+x_1')\otimes x_2 &= x_1 \otimes x_2 + x_1' \otimes x_2, \\x_1 \otimes (x_2 + x_2') &= x_1 \otimes x_2 + x_1 \otimes x_2', \\(ax_1) \otimes x_2 &= a(x_1 \otimes x_2), \\x_1 \otimes (ax_2) &= a(x_1 \otimes x_2).\end{aligned}\] Sometimes to avoid confusion people may also write \(x_1 \otimes_R x_2\) if both \(M_1\) and \(M_2\) are \(R\)-modules. But before that we have to verify that this is indeed the tensor product. To verify this, all we need is the universal property of free modules.

By the universal property of the free module \(M\), for any \((f,E) \in BL\), we have an induced linear map \(f_\ast:M \to E\) with \(f_\ast \circ \alpha = f\). For elements in \(N\), we see \(f_\ast\) takes value \(0\), since \(f\) is bilinear. We finish our work by taking \(h[(x,y)+N] = f_\ast(x,y)\). This is the map induced by \(f_\ast\) on the factor module \(M/N\). It satisfies \(h \circ \varphi = f\), and it is the only linear map that does so, because the elements \(x \otimes y\) generate \(M/N\).

For coprime integers \(m,n>1\), we have \(\def\mb{\mathbb}\) \[\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} = O\] where \(O\) means that the module only contains \(0\) and \(\mb{Z}/m\mb{Z}\) is considered as a module over \(\mb{Z}\) for \(m>1\). This suggests that, the tensor product of two modules is not necessarily 'bigger' than its components. Let's see why this is trivial.

Note that for \(x \in \mb{Z}/m\mb{Z}\) and \(y \in \mb{Z}/n\mb{Z}\), we have \[m(x \otimes y) = (mx) \otimes y = 0 \\n(x \otimes y) = x \otimes(ny) = 0\] since, for example, \(mx = 0\) for \(x \in \mb{Z}/m\mb{Z}\) and \(\varphi(0,y)=0\). If you have trouble understanding why \(\varphi(0,y)=0\), just note that the submodule \(N\) in our construction contains elements generated by \((0x,y)-0(x,y)\) already.

By Bézout's identity, for any \(x \otimes y\), we see there are \(a\) and \(b\) such that \(am+bn=1\), and therefore \[\begin{aligned}x \otimes y &= (am+bn)(x \otimes y) \\ &=am(x \otimes y)+bn (x \otimes y) \\ &= 0.\end{aligned}\] Hence the tensor product is trivial. This example gives us a lot of inspiration. For example, what if \(m\) and \(n\) are not necessarily coprime, say \(\gcd(m,n)=d\)? By Bézout's identity, now with \(am+bn=d\), we still have \[d(x \otimes y) = (am+bn)(x \otimes y) = 0.\] This inspires us to study the connection between \(\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z}\) and \(\mb{Z}/d\mb{Z}\). By the **universal property**, for the bilinear map \(f:\mb{Z}/m\mb{Z} \times \mb{Z}/n\mb{Z} \to \mb{Z}/d\mb{Z}\) defined by \[(a+m\mb{Z},b+n\mb{Z})\mapsto ab+d\mb{Z}\] (there should be no difficulty in verifying that \(f\) is well-defined), there exists a unique morphism \(h:\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} \to \mb{Z}/d\mb{Z}\) such that \[h \circ \varphi(a+m\mb{Z},b+n\mb{Z}) = h((a+m\mb{Z}) \otimes(b+n\mb{Z})) = ab+d\mb{Z}.\] Next we show that it has a natural inverse defined by \[\begin{aligned}g:\mb{Z}/d\mb{Z} &\to \mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} \\a+d\mb{Z} &\mapsto (a+m\mb{Z}) \otimes (1+n\mb{Z}).\end{aligned}\] Taking \(a' = a+kd\), we show that \(g(a+d\mb{Z})=g(a'+d\mb{Z})\), that is, we need to show that \[(a+m\mb{Z})\otimes(1+n\mb{Z}) = (a'+m\mb{Z}) \otimes (1+n\mb{Z}).\] By Bézout's identity, there exist some \(r,s\) such that \(rm+sn=d\). Hence \(a' = a + ksn+krm\), which gives \[\begin{aligned}(a'+m\mb{Z}) \otimes (1+n\mb{Z}) &= (a+ksn+krm+m\mb{Z}) \otimes(1+n\mb{Z}) \\ &= (a+ksn+m\mb{Z}) \otimes (1+n\mb{Z}) \\ &=(a+m\mb{Z}) \otimes(1+n\mb{Z}) + (ksn+m\mb{Z})\otimes(1+n\mb{Z}) \\ &=(a+m\mb{Z}) \otimes (1+n\mb{Z})\end{aligned}\] since \[(ksn+m\mb{Z}) \otimes (1+n\mb{Z}) =n(ks+m\mb{Z}) \otimes (1+n\mb{Z}) = (ks+m\mb{Z}) \otimes(n+n\mb{Z}) = 0.\] So \(g\) is well-defined. Next we show that this is the inverse. 
Firstly \[\begin{aligned}g \circ h((a+m\mb{Z}) \otimes(b+n\mb{Z})) &= g(ab+d\mb{Z})\\ &= (ab+m\mb{Z}) \otimes (1+n\mb{Z}) \\ &=b(a+m\mb{Z}) \otimes(1+n\mb{Z}) \\ &= (a+m\mb{Z}) \otimes (b+n\mb{Z}).\end{aligned}\] Secondly, \[\begin{aligned}h \circ g(a+d\mb{Z}) &= h((a+m\mb{Z}) \otimes(1+n\mb{Z})) \\ &= a+d\mb{Z}.\end{aligned}\] Hence \(g = h^{-1}\) and we can say \[\mb{Z}/m\mb{Z} \otimes \mb{Z} /n\mb{Z} \simeq \mb{Z} /\gcd(m,n)\mb{Z}.\] If \(m,n\) are coprime, then \(\gcd(m,n)=1\), hence \(\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} \simeq \mb{Z}/\mb{Z}\) is trivial. More interestingly, \(\mb{Z}/m\mb{Z}\otimes \mb{Z}/m\mb{Z}=\mb{Z}/m\mb{Z}\). But this elegant identity raised other questions. First of all, \(\gcd(m,n)=\gcd(n,m)\), which implies \[\mb{Z}/m\mb{Z} \otimes \mb{Z}/n\mb{Z} \simeq \mb{Z}/\gcd(m,n)\mb{Z} \simeq \mb{Z}/\gcd(n,m)\mb{Z} \simeq\mb{Z}/n\mb{Z}\otimes\mb{Z}/m\mb{Z}.\] Further, for \(m,n,r >1\), we have \(\gcd(\gcd(m,n),r)=\gcd(m,\gcd(n,r))=\gcd(m,n,r)\), which gives \[(\mb{Z}/m\mb{Z}\otimes\mb{Z}/n\mb{Z})\otimes\mb{Z}/r\mb{Z} \simeq \mb{Z}/\gcd(m,n)\mb{Z}\otimes\mb{Z}/r\mb{Z} \simeq \mb{Z}/\gcd(m,n,r)\mb{Z} \\\mb{Z}/m\mb{Z}\otimes(\mb{Z}/n\mb{Z} \otimes\mb{Z}/r\mb{Z}) \simeq \mb{Z}/m\mb{Z} \otimes\mb{Z}/\gcd(n,r)\mb{Z} \simeq \mb{Z}/\gcd(m,n,r)\mb{Z}\] hence \[(\mb{Z}/m\mb{Z}\otimes\mb{Z}/n\mb{Z})\otimes\mb{Z}/r\mb{Z} \simeq \mb{Z}/m\mb{Z}\otimes(\mb{Z}/n\mb{Z}\otimes\mb{Z}/r\mb{Z}).\] Hence for modules of the form \(\mb{Z}/m\mb{Z}\), we see the tensor product operation is associative and commutative up to isomorphism. Does this hold for all modules? The universal property answers this question affirmatively. From now on we will be keep using the universal property. Make sure that you have got the point already.
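Two ingredients of this computation lend themselves to a brute-force check: the Bézout coefficients \(r,s\) with \(rm+sn=d\), and the fact that \((a+m\mb{Z},b+n\mb{Z}) \mapsto ab+d\mb{Z}\) does not depend on the chosen representatives. The following sketch uses hypothetical helper names (`extended_gcd`, `pairing_is_well_defined`) and only tests a few shifts of each representative, so it illustrates the claim without proving it.

```python
from math import gcd

def extended_gcd(m, n):
    """Return (d, r, s) with r*m + s*n == d == gcd(m, n), via the extended Euclidean algorithm."""
    old_r, r = m, n
    old_a, a = 1, 0
    old_b, b = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_a, a = a, old_a - quotient * a
        old_b, b = b, old_b - quotient * b
    return old_r, old_a, old_b

def pairing_is_well_defined(m, n):
    """Check that (a, b) -> a*b mod d gives the same value on shifted representatives."""
    d = gcd(m, n)
    for a in range(m):
        for b in range(n):
            for i in (-1, 1, 2):        # shift a by multiples of m
                for j in (-1, 1, 2):    # shift b by multiples of n
                    if ((a + i * m) * (b + j * n) - a * b) % d != 0:
                        return False
    return True

m, n = 12, 18
d, r, s = extended_gcd(m, n)
print(d, r, s, r * m + s * n)          # the last number equals d
print(pairing_is_well_defined(m, n))
```

The reason the check succeeds is the one used implicitly in the text: \(d\) divides both \(m\) and \(n\), so every cross term in \((a+im)(b+jn)-ab\) is a multiple of \(d\).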

Let \(M_1,M_2,M_3\) be \(R\)-modules, then there exists a unique isomorphism \[\begin{aligned}(M_1 \otimes M_2) \otimes M_3 &\xrightarrow{\simeq} M_1 \otimes (M_2 \otimes M_3) \\(x \otimes y) \otimes z &\mapsto x \otimes(y \otimes z)\end{aligned}\] for \(x \in M_1\), \(y \in M_2\), \(z \in M_3\).

*Proof.* Consider the map \[\begin{aligned}\lambda_x:M_2 \times M_3 &\to (M_1 \otimes M_2)\otimes M_3 \\ (y,z) &\mapsto (x \otimes y ) \otimes z\end{aligned}\] where \(x \in M_1\). Since \((\cdot\otimes\cdot)\) is bilinear, we see \(\lambda_x\) is bilinear for all \(x \in M_1\). Hence by the universal property there exists a unique map of the tensor product: \[\overline{\lambda}_x:M_2 \otimes M_3 \to (M_1 \otimes M_2) \otimes M_3.\] Next we have the map \[\begin{aligned}\mu_x: M_1 \times (M_2 \otimes M_3) &\to (M_1 \otimes M_2) \otimes M_3 \\(x,y \otimes z) &\mapsto \overline{\lambda}_x(y \otimes z)\end{aligned}\] which is bilinear as well. Again by the universal property we have a unique map \[\overline{\mu}_x: M_1 \otimes (M_2 \otimes M_3) \to (M_1 \otimes M_2) \otimes M_3.\] This is indeed the isomorphism we want. The reverse is obtained by reversing the process. For the bilinear map \[\lambda_x':M_1 \times M_2 \to M_1 \otimes (M_2 \otimes M_3) \] we get a unique map \[\overline{\lambda'}_x: M_1 \otimes M_2 \to M_1 \otimes (M_2 \otimes M_3).\] Then from the bilinear map \[\mu'_x:(M_1 \otimes M_2) \times M_3 \to M_1 \otimes (M_2 \otimes M_3)\] we get the unique map, which is actually the reverse of \(\overline{\mu}_x\): \[\overline{\mu'}_x:(M_1 \otimes M_2) \otimes M_3 \to M_1 \otimes (M_2 \otimes M_3).\] Hence the two tensor products are isomorphic. \(\square\)

Let \(M_1\) and \(M_2\) be \(R\)-modules, then there exists a unique isomorphism \[\begin{aligned}M_1 \otimes M_2 &\xrightarrow{\simeq} M_2 \otimes M_1 \\x_1 \otimes x_2 &\mapsto x_2 \otimes x_1\end{aligned}\] where \(x_1 \in M_1\) and \(x_2 \in M_2\).

*Proof.* The map \[\begin{aligned}\lambda:M_1 \times M_2 &\to M_2 \otimes M_1 \\(x,y) &\mapsto y \otimes x\end{aligned}\] is bilinear and gives us a unique map \[\overline{\lambda}:M_1 \otimes M_2 \to M_2 \otimes M_1\] given by \(x \otimes y \mapsto y \otimes x\). Symmetrically, the map \(\lambda':M_2 \times M_1 \to M_1 \otimes M_2\) gives us a unique map \[\overline{\lambda'}:M_2 \otimes M_1 \to M_1 \otimes M_2\] which is the inverse of \(\overline{\lambda}\). \(\square\)

Therefore, we may view the set of all \(R\)-modules as a commutative semigroup with the binary operation \(\otimes\).

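Consider the commutative diagram whose rows are the canonical bilinear maps \(\varphi:M_1 \times M_2 \to M_1 \otimes M_2\) and \(\varphi':M_1' \times M_2' \to M_1' \otimes M_2'\), whose left vertical arrow is \(f_1 \times f_2\) and whose right vertical arrow is \(T(f_1 \times f_2)\), so that \[T(f_1 \times f_2) \circ \varphi = \varphi' \circ (f_1 \times f_2).\]

Here \(f_i:M_i \to M_i'\) are module homomorphisms. What do we want here? On the left-hand side, we see \(f_1 \times f_2\) sends \((x_1,x_2)\) to \((f_1(x_1),f_2(x_2))\), which is quite natural. The question is, is there a natural map sending \(x_1 \otimes x_2\) to \(f_1(x_1) \otimes f_2(x_2)\)? This is what we want from the right-hand side. We know \(T(f_1 \times f_2)\) exists, since \(\mu = \varphi' \circ (f_1\times f_2)\) is a bilinear map and the universal property applies to it. So for \((x_1,x_2) \in M_1 \times M_2\), we have \(T(f_1 \times f_2)(x_1 \otimes x_2) = \varphi' \circ (f_1 \times f_2)(x_1,x_2) = f_1(x_1) \otimes f_2(x_2)\), as we want.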

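But \(T\) in this diagram has more interesting properties. First of all, if \(M_1 = M_1'\) and \(M_2 = M_2'\), and both \(f_1\) and \(f_2\) are identity maps, then we see \(T(f_1 \times f_2)\) is the identity as well. Next, consider the following chain \[\cdots \to M_1 \times M_2 \xrightarrow{(f_1 \times f_2)}M_1' \times M_2' \xrightarrow{(g_1 \times g_2)}M_1'' \times M_2''\to \cdots.\] We can make it a double chain by applying \(T\) to each arrow, which gives a parallel chain \[\cdots \to M_1 \otimes M_2 \xrightarrow{T(f_1 \times f_2)}M_1' \otimes M_2' \xrightarrow{T(g_1 \times g_2)}M_1'' \otimes M_2''\to \cdots\] connected to the first one by the canonical bilinear maps.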

It is obvious that \((g_1 \circ f_1 \times g_2 \circ f_2)=(g_1 \times g_2) \circ (f_1 \times f_2)\), which also gives \[T(g_1 \times g_2) \circ T(f_1 \times f_2) = T(g_1 \circ f_1 \times g_2 \circ f_2).\] Hence we can say \(T\) is functorial. Sometimes for simplicity we also write \(T(f_1,f_2)\) or simply \(f_1 \otimes f_2\), as it sends \(x_1 \otimes x_2\) to \(f_1(x_1) \otimes f_2(x_2)\). Indeed it can be viewed as a map \[\begin{aligned}T:L(M_1, M_1') \times L(M_2,M_2') &\to L(M_1 \otimes M_2, M_1' \otimes M_2') \\(f_1 \times f_2) &\mapsto f_1 \otimes f_2.\end{aligned}\]

First we recall some background. Suppose \(A\) is a ring with multiplicative identity \(1_A\). A **left module** over \(A\) is an additive abelian group \((M,+)\), together with an operation \(A \times M \to M\) such that \[\begin{aligned}(a+b)x &= ax+bx \\a(x+y) &= ax+ay \\a(bx) &= (ab)x \\1_Ax &= x\end{aligned}\] for \(x,y \in M\) and \(a,b \in A\). As a corollary, we see \((0_A+0_A)x=0_Ax=0_Ax+0_Ax\), which shows \(0_Ax=0_M\) for all \(x \in M\). On the other hand, \(a(x-x)=0_M\), which implies \(a(-x)=-(ax)\). We can also define right \(A\)-modules but we are not discussing them here.

Let \(S\) be a subset of \(M\). We say \(S\) is a **basis** of \(M\) if \(S\) generates \(M\) and \(S\) is linearly independent. That is, for all \(m \in M\), we can pick \(s_1,\cdots,s_n \in S\) and \(a_1,\cdots,a_n \in A\) such that \[m = a_1s_1+a_2s_2+\cdots+a_ns_n,\] and, for any \(s_1,\cdots,s_n \in S\), we have \[a_1s_1+a_2s_2+\cdots+a_ns_n=0_M \implies a_1=a_2=\cdots=a_n=0_A.\] Note this also shows that \(0_M\notin S\) (what happens if \(0_M \in S\)?). We say \(M\) is **free** if it has a basis. The case when \(M\) or \(A\) is trivial is excluded.

If \(A\) is a field, then \(M\) is called a **vector space**, which is no different from the one we learn in linear algebra and functional analysis. Mathematicians in functional analysis may be interested in the dimension of a vector space, for example, whether a vector space is of finite dimension, or whether it has a countable basis. But a basis does not come from nowhere. In fact we can prove that every vector space has a basis, but modules are not so lucky. \(\def\mb{\mathbb}\)

First of all let's consider the cyclic group \(\mb{Z}/n\mb{Z}\) for \(n \geq 2\). If we define \[\begin{aligned}\mb{Z} \times \mb{Z}/n\mb{Z} &\to \mb{Z}/n\mb{Z} \\(m,k+n\mb{Z}) &\mapsto mk+n\mb{Z}\end{aligned}\] which amounts to adding up \(m\) copies of an element, then we get a \(\mb{Z}\)-module, which will be denoted by \(M\). For any \(x=k+n\mb{Z} \in M\), we see \(nx=nk+n\mb{Z}=0_M\). Therefore for **any** nonempty subset \(S \subset M\), if \(x_1,\cdots,x_k \in S\), we have \[nx_1+nx_2+\cdots+nx_k = 0_M\] with the coefficient \(n \neq 0\), so \(S\) is never linearly independent and \(M\) has no basis. In fact this can be generalized further. If \(A\) is a ring but not a field, let \(I\) be a nontrivial proper ideal, then \(A/I\) is an \(A\)-module that has no basis.
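As a quick sanity check of the \(\mb{Z}/n\mb{Z}\) example, the following sketch computes, for each element, the smallest positive integer that kills it. The choice \(n=6\) and the helper name `additive_order` are assumptions made for illustration; the point is only that every singleton already satisfies a nontrivial relation over \(\mb{Z}\).

```python
def additive_order(k, n):
    """Smallest a >= 1 with a*k = 0 in Z/nZ."""
    a = 1
    while (a * k) % n != 0:
        a += 1
    return a

n = 6
# Every element x satisfies a*x = 0 with a != 0, so {x} is never
# linearly independent over Z.
orders = {k: additive_order(k, n) for k in range(n)}
```

Each order divides \(n\), which is another way of saying that \(n\) annihilates the whole module.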

Following \(\mb{Z}/n\mb{Z}\) we also have another example on finite order. Indeed, **any finite abelian group is not free as a module over \(\mb{Z}\).** More generally,

Let \(G\) be an abelian group, and \(G_{tor}\) be its torsion subgroup. If \(G_{tor}\) is non-trivial, then \(G\) cannot be a free module over \(\mb{Z}\).

Next we shall take a look at infinite rings. Let \(F[X]\) be the polynomial ring over a field \(F\) and \(F'[X]\) be the subring of polynomials whose coefficient of \(X\) is \(0\). Then \(F[X]\) is an \(F'[X]\)-module. However it is not free.

Suppose we have a basis \(S\) of \(F[X]\), then we claim that \(|S|>1\). If \(|S|=1\), say \(S=\{P\}\), then \(P\) cannot generate \(F[X]\): if \(P\) is constant then we cannot generate a polynomial containing \(X\) with power \(1\); if \(P\) is not constant, then the constant polynomials cannot be generated. Hence \(S\) contains at least two polynomials, say \(P_1 \neq 0\) and \(P_2 \neq 0\). However, note \(X^2P_1 \in F'[X]\) and \(X^2P_2 \in F'[X]\), which gives the nontrivial relation \[(X^2P_2)P_1-(X^2P_1)P_2=0.\] Hence \(S\) cannot be a basis.

I hope those examples have convinced you that basis is not a universal thing. We are going to prove that every vector space has a basis. More precisely,

Let \(V\) be a nontrivial vector space over a field \(K\). Let \(\Gamma\) be a set of generators of \(V\) over \(K\) and \(S \subset \Gamma\) a linearly independent subset. Then there exists a basis \(B\) of \(V\) such that \(S \subset B \subset \Gamma\).

Note we can always find such \(\Gamma\) and \(S\). In the extreme case, we can pick \(\Gamma=V\) and let \(S\) be a set containing any single non-zero element of \(V\). Note this also shows that we can obtain a basis by expanding any linearly independent set. The proof relies on the fact that every non-zero element in a field is invertible, and also on Zorn's lemma. In fact, the axiom of choice is equivalent to the statement that every vector space has a basis.\(\def\mfk{\mathfrak}\)

*Proof.* Define \[\mfk{T} =\{T \subset \Gamma:S \subset T, \text{ $T$ is linearly independent}\}.\] Then \(\mfk{T}\) is not empty since it contains \(S\). If \(T_1 \subset T_2 \subset \cdots\) is a totally ordered chain in \(\mfk{T}\), then \(T=\bigcup_{i=1}^{\infty}T_i\) is again linearly independent and contains \(S\). To show that \(T\) is linearly independent, note that if \(x_1,x_2,\cdots,x_n \in T\), we can find some \(k_1,\cdots,k_n\) such that \(x_i \in T_{k_i}\) for \(i=1,2,\cdots,n\). If we pick \(k = \max(k_1,\cdots,k_n)\), then \[x_1,x_2,\cdots,x_n \in \bigcup_{i=1}^{n}T_{k_i}=T_k.\] But we already know that \(T_k\) is linearly independent, so \(a_1x_1+\cdots+a_nx_n=0_V\) implies \(a_1=\cdots=a_n=0_K\).

By Zorn's lemma, let \(B\) be a maximal element of \(\mfk{T}\), then \(B\) is linearly independent since it is an element of \(\mfk{T}\). Next we show that \(B\) generates \(V\). Suppose not, then, since \(\Gamma\) generates \(V\), we can pick some \(x \in \Gamma\) that is not generated by \(B\). Define \(B'=B \cup \{x\}\); we see \(B'\) is linearly independent as well, because if we pick \(y_1,y_2,\cdots,y_n \in B\), and if \[\sum_{k=1}^{n}a_ky_k+bx=0_V,\] then if \(b \neq 0\) we have \[x = -\sum_{k=1}^{n}b^{-1}a_ky_k,\] which lies in the span of \(B\), contradicting the assumption that \(x\) is not generated by \(B\). Hence \(b=0_K\), and then \(a_1=\cdots=a_n=0_K\) because \(B\) is linearly independent. We have therefore shown that \(B'\) is a linearly independent set strictly containing \(B\) and contained in \(\Gamma\), contradicting the maximality of \(B\) in \(\mfk{T}\). Hence \(B\) generates \(V\). \(\square\)
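When \(\Gamma\) is finite, Zorn's lemma is not needed and the argument becomes a greedy procedure: walk through \(\Gamma\) and keep every vector that is not yet generated. The sketch below assumes \(K=\mathbb{Q}\) and vectors given as tuples; `rank` and `extend_to_basis` are hypothetical helper names, and the sample data is made up for illustration.

```python
from fractions import Fraction

def rank(vectors):
    """Rank over Q of a list of vectors, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                # Division is where invertibility in a field is used.
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(S, Gamma):
    """Extend the independent list S to a basis B with S <= B <= Gamma."""
    B = list(S)
    for x in Gamma:
        if rank(B + [x]) > rank(B):   # x is not generated by B
            B.append(x)
    return B

Gamma = [(1, 1, 0), (2, 2, 0), (0, 1, 1), (1, 2, 1), (0, 0, 1)]
S = [(1, 2, 1)]
B = extend_to_basis(S, Gamma)
```

The step `rank(B + [x]) > rank(B)` plays the role of "\(x\) is not generated by \(B\)" in the proof; over a ring that is not a field the elimination step would break down.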

In fact the construction of \(\mathbb{Q}\) from \(\mathbb{Z}\) is already an example. For any \(a \in \mathbb{Q}\), we have some \(m,n \in \mathbb{Z}\) with \(n \neq 0\) such that \(a = \frac{m}{n}\). As a matter of notation we may also say an ordered pair \((m,n)\) determines \(a\). Two ordered pairs \((m,n)\) and \((m',n')\) are *equivalent* if and only if \[mn'-m'n=0.\] Here we are only using the ring structure of \(\mathbb{Z}\), so it is natural to ask whether it is possible to generalize this process to all rings. But we are also using the fact that \(\mathbb{Z}\) is an entire ring (or alternatively integral domain, they mean the same thing). However there is a way to generalize it. \(\def\mfk{\mathfrak}\)

(Definition 1) A **multiplicatively closed subset** \(S \subset A\) is a set such that \(1 \in S\) and, if \(x,y \in S\), then \(xy \in S\).

For example, for \(\mathbb{Z}\) we have a multiplicatively closed subset \[\{1,2,4,8,\cdots\} \subset \mathbb{Z}.\] We can also insert \(0\) here but it may produce some bad result. If \(S\) is also an ideal then we must have \(S=A\) so this is not very interesting. However the complement is interesting.

(Proposition 1) Suppose \(A\) is a commutative ring such that \(1 \neq 0\). Let \(S\) be a multiplicatively closed set that does not contain \(0\). Let \(\mfk{p}\) be a maximal element of the set of ideals contained in \(A \setminus S\), then \(\mfk{p}\) is prime.

*Proof.* Recall that \(\mfk{p}\) is prime if for any \(x,y \in A\) such that \(xy \in \mfk{p}\), we have \(x \in \mfk{p}\) or \(y \in \mfk{p}\). But now we fix \(x,y \in \mfk{p}^c\). Note we have a strictly bigger ideal \(\mfk{q}_1=\mfk{p}+Ax\). Since \(\mfk{p}\) is maximal in the ideals contained in \(A \setminus S\), we see \[\mfk{q}_1 \cap S \neq \varnothing.\] Therefore there exist some \(a \in A\) and \(p \in \mfk{p}\) such that \[p+ax \in S.\] Also, \(\mfk{q}_2=\mfk{p}+Ay\) has nontrivial intersection with \(S\) (due to the maximality of \(\mfk{p}\)), there exist some \(a' \in A\) and \(p' \in \mfk{p}\) such that \[p' + a'y \in S.\] Since \(S\) is closed under multiplication, we have \[(p+ax)(p'+a'y) = pp'+p'ax+pa'y+aa'xy \in S.\] But since \(\mfk{p}\) is an ideal, we see \(pp'+p'ax+pa'y \in \mfk{p}\). Therefore we must have \(xy \notin \mfk{p}\) since if not, \((p+ax)(p'+a'y) \in \mfk{p}\), which gives \(\mfk{p} \cap S \neq \varnothing\), and this is impossible. \(\square\)

As a corollary, for an ideal \(\mfk{p} \subset A\), if \(A \setminus \mfk{p}\) is multiplicatively closed, then \(\mfk{p}\) is prime. Conversely, if we are given a prime ideal \(\mfk{p}\), then we also get a multiplicatively closed subset.

(Proposition 2) If \(\mfk{p}\) is a prime ideal of \(A\), then \(S = A \setminus \mfk{p}\) is multiplicatively closed.

*Proof.* First \(1 \in S\) since \(\mfk{p} \neq A\). On the other hand, if \(x,y \in S\) we see \(xy \in S\) since \(\mfk{p}\) is prime. \(\square\)

We define an equivalence relation on \(A \times S\) as follows: \[(a,s) \sim (b,t) \iff \exists u \in S, (at-bs)u=0.\]

(Proposition 3) \(\sim\) is an equivalence relation.

*Proof.* Since \((as-as)1=0\) while \(1 \in S\), we see \((a,s) \sim (a,s)\). For symmetry, note that \[(at-bs)u=0 \implies (bs-at)u=0 \implies (b,t) \sim (a,s).\] Finally, to show that it is transitive, suppose \((a,s) \sim (b,t)\) and \((b,t) \sim (c,u)\). There exist \(v,w \in S\) such that \[(at-bs)v=(bu-ct)w=0.\] This gives \(bsv=atv\) and \(buw = ctw\), which implies \[bsvuw=atvuw=ctwsv \implies (au-cs)tvw =0.\] But \(tvw \in S\) since \(t,v,w \in S\) and \(S\) is multiplicatively closed. Hence \[[(a,s) \sim (b,t)] \land [(b,t) \sim (c,u)] \implies (a,s) \sim (c,u).\] \(\square\)
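The extra factor \(u\) in the definition matters as soon as \(A\) has zero divisors. The following sketch enumerates the equivalence classes for the made-up example \(A=\mathbb{Z}/6\mathbb{Z}\), \(S=\{1,2,4\}\); these choices and the name `equivalent` are assumptions for illustration only.

```python
from itertools import product

n = 6
S = [1, 2, 4]   # closed under multiplication mod 6: 2*2=4, 2*4=2, 4*4=4

def equivalent(p, q):
    """(a, s) ~ (b, t) iff (a*t - b*s)*u = 0 in Z/6Z for some u in S."""
    (a, s), (b, t) = p, q
    return any(((a * t - b * s) * u) % n == 0 for u in S)

# Group all pairs (a, s) into classes; this is valid because ~ is transitive.
classes = []
for p in product(range(n), S):
    for cls in classes:
        if equivalent(p, cls[0]):
            cls.append(p)
            break
    else:
        classes.append([p])

number_of_classes = len(classes)
# 3/1 and 0/1 are identified, although 3*1 - 0*1 is not zero in Z/6Z.
three_is_zero = equivalent((3, 1), (0, 1))
```

In this example the 18 pairs collapse to 3 classes, and \(3/1 = 0/1\) only because \(3 \cdot 2 = 0\) with \(2 \in S\); the naive condition \(at-bs=0\) would miss this.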

Let \(a/s\) denote the equivalence class of \((a,s)\). Let \(S^{-1}A\) denote the set of equivalence classes (it is not a good idea to write \(A/S\) as it may be confused with the notation for a factor group), and we put a ring structure on \(S^{-1}A\) as follows: \[\begin{aligned}(a/s)+(b/t)&=(at+bs)/st, \\(a/s)(b/t)&=ab/st.\end{aligned}\] There is no difference between this one and the one in elementary algebra. But first of all we need to show that \(S^{-1}A\) indeed forms a ring.

(Proposition 4) The addition and multiplication are well defined. Further, \(S^{-1}A\) is a commutative ring with identity.

*Proof.* Suppose \((a,s) \sim (a',s')\) and \((b,t) \sim (b',t')\). We need to show that \[(a/s)+(b/t)=(a'/s')+(b'/t')\] or \[(at+bs)/st = (a't'+b's')/s't'.\] There exist \(u,v \in S\) such that \[(as'-a's)u=0, \quad (bt'-b't)v=0.\] If we multiply the first equation by \(vtt'\) and the second equation by \(uss'\) and add them, we see \[as'uvtt'-a'suvtt'+bt'vuss'-b'tvuss'=[(at)s't'+(bs)s't'-(a't')st-(b's')st]uv=0,\] that is, \([(at+bs)s't'-(a't'+b's')st]uv=0\) with \(uv \in S\), which is exactly what we want.

On the other hand, we need to show that \[ab/st = a'b'/s't'.\] That is, \[\exists y \in S,(abs't'-a'b'st)y=0.\] Again, we have \[(as'-a's)u=(as'-a's)uvbt'=(abs't'-a'bst')uv=0, \\(bt'-b't)v=(bt'-b't)vua's=(a'bst'-a'b'st)uv=0.\] Hence \[(abs't'-a'bst')uv+(a'bst'-a'b'st)uv=(abs't'-a'b'st)uv=0.\] Since \(uv \in S\), we are done.

Next we show that \(S^{-1}A\) has a ring structure. If \(0 \in S\), then \(S^{-1}A\) contains exactly one element \(0/1\) since in this case, all pairs are equivalent: \[(at-bs)0=0.\] We therefore only discuss the case when \(0 \notin S\). First \(0/1\) is the zero element with respect to addition since \[0/1+a/s = (0s+1a)/1s = a/s.\] On the other hand, we have the inverse \(-a/s\): \[-a/s+a/s = (-as+as)/ss=0/ss=0/1.\] \(1/1\) is the unit with respect to multiplication: \[(1/1)(a/s)=1a/1s=a/s.\] Multiplication is associative since \[\begin{aligned}[(a/s)(b/t)](c/u)&=(ab/st)(c/u)=abc/stu, \\(a/s)[(b/t)(c/u)]&=(a/s)(bc/tu)=abc/stu.\end{aligned}\] Multiplication is commutative since \(A\) is commutative: \[(a/s)(b/t)=ab/st=ba/ts=(b/t)(a/s).\] Finally distributivity. \[\begin{aligned}(a/s+b/t)(c/u)&=(c/u)(a/s+b/t)=[(at+bs)/st](c/u)=(act+bcs)/stu, \\(a/s)(c/u)+(b/t)(c/u)&=ac/su+bc/tu=(actu+bcsu)/stu^2=(act+bcs)/stu.\end{aligned}\] Note \(ab/cb=a/c\) since \((abc-abc)1=0\). \(\square\) \(\def\mb{\mathbb}\)

First we consider the case when \(A\) is entire. If \(0 \in S\), then \(S^{-1}A\) is trivial, which is not so interesting. However, provided that \(0 \notin S\), we get some well-behaved result:

(Proposition 5) Let \(A\) be an entire ring, and let \(S\) be a multiplicatively closed subset of \(A\) that does not contain \(0\), then the natural map \[\begin{aligned}\varphi_S: A &\to S^{-1}A \\ x &\mapsto x/1\end{aligned}\] is injective. Therefore it can be considered as a natural inclusion. Further, every element of \(\varphi_S(S)\) is invertible.

*Proof.* Indeed, if \(x/1=0/1\), then there exists \(s \in S\) such that \(xs=0\). Since \(A\) is entire and \(s \neq 0\), we see \(x=0\), hence \(\varphi_S\) is injective. For \(s \in S\), we see \(\varphi_S(s)=s/1\), and \((1/s)\varphi_S(s)=(1/s)(s/1)=s/s=1\). \(\square\)

Note since \(A\) is entire we can also conclude that \(S^{-1}A\) is entire. As a word of warning, for a general ring the homomorphism \(\varphi_S\) is *not* injective in general since, for example, when \(0 \in S\), this map is the zero map.

If we go further, making \(S\) contain all non-zero elements, we have:

(Proposition 6) If \(A\) is entire and \(S\) contains all non-zero elements of \(A\), then \(S^{-1}A\) is a field, called the **quotient field** or the **field of fractions**.

*Proof.* First we need to show that \(S^{-1}A\) is entire. Suppose \((a/s)(b/t)=ab/st =0/1\) but \(a/s \neq 0/1\), we see however \[ab/st=0/1 \implies \exists u \in S, (ab-0)u=0 \implies ab=0.\] Since \(A\) is entire, \(b\) has to be \(0\), which implies \(b/t=0/1\). Second, if \(a/s \neq 0/1\), we see \(a \neq 0\) and therefore is in \(S\), hence we've found the inverse \((a/s)^{-1}=s/a\). \(\square\)

In this case we can identify \(A\) as a subset of \(S^{-1}A\) and write \(a/s=s^{-1}a\).

Let \(A\) be a commutative ring, and let \(S\) be the set of invertible elements of \(A\). If \(u \in S\), then there exists some \(v \in S\) such that \(uv=1\). We see \(1 \in S\) and if \(a,b \in S\), we have \(ab \in S\) since \(ab\) has an inverse as well. This set is frequently denoted by \(A^\ast\), and is called the group of **invertible** elements of \(A\). For example for \(\mb{Z}\) we see \(\mb{Z}^\ast\) consists of \(-1\) and \(1\). If \(A\) is a field, then \(A^\ast\) is the multiplicative group of non-zero elements of \(A\). For example \(\mb{Q}^\ast\) is the set of all rational numbers without \(0\). For \(A^\ast\) we have

If \(A\) is a field, then \((A^\ast)^{-1}A \simeq A\).

*Proof.* Define \[\begin{aligned}\varphi_S:A &\to (A^\ast)^{-1}A \\ x &\mapsto x/1.\end{aligned}\] Then as we have already shown, \(\varphi_S\) is injective. Secondly we show that \(\varphi_S\) is surjective. For any \(a/s \in (A^\ast)^{-1}A\), we see \(as^{-1}/1 = a/s\). Therefore \(\varphi_S(as^{-1})=a/s\) as is shown. \(\square\)

Now let's see a concrete example. If \(A\) is entire, then the polynomial ring \(A[X]\) is entire. If \(K = S^{-1}A\) is the quotient field of \(A\), we can denote the quotient field of \(A[X]\) as \(K(X)\). Elements in \(K(X)\) are naturally called **rational functions**, and can be written as \(f(X)/g(X)\) where \(f,g \in A[X]\). For \(b \in K\), we say a rational function \(f/g\) is **defined** at \(b\) if \(g(b) \neq 0\). Naturally this process can be generalized to polynomials of \(n\) variables.

We say a commutative ring \(A\) is local if it has a unique maximal ideal. Let \(\mfk{p}\) be a prime ideal of \(A\), and \(S = A \setminus \mfk{p}\), then \(A_{\mfk{p}}=S^{-1}A\) is called the **local ring of \(A\) at \(\mfk{p}\)**. Alternatively, we say the process of passing from \(A\) to \(A_\mfk{p}\) is *localization* at \(\mfk{p}\). You will see it makes sense to call it localization:

(Proposition 7) \(A_\mfk{p}\) is local. Precisely, the unique maximal ideal is \[I=\mfk{p}A_\mfk{p}=\{a/s:a \in \mfk{p},s \in S\}.\] Note \(I\) is indeed equal to \(\mfk{p}A_\mfk{p}\).

*Proof.* First we show that \(I\) is an ideal. It is closed under addition since \(a/s+a'/s'=(as'+a's)/ss'\) and \(as'+a's \in \mfk{p}\) whenever \(a,a' \in \mfk{p}\). For \(b/t \in A_\mfk{p}\) and \(a/s \in I\), we see \[(b/t)(a/s)=ba/ts \in I\] since \(a \in \mfk{p}\) implies \(ba \in \mfk{p}\). Next we show that \(I\) is maximal, which is equivalent to showing that \(A_\mfk{p}/I\) is a field. For \(b/t \notin I\), we have \(b \in S\), hence it is legitimate to write \(t/b\). This gives \[(b/t+I)(t/b+I)=1/1+I.\] Hence we have found the inverse.

Finally we show that \(I\) is the unique maximal ideal. Let \(J\) be another maximal ideal. Suppose \(J \neq I\), then we can pick \(m/n \in J \setminus I\). This gives \(m \in S\) since if not \(m \in \mfk{p}\) and then \(m/n \in I\). But for \(n/m \in A_\mfk{p}\) we have \[(m/n)(n/m)=1/1 \in J.\] This forces \(J\) to be \(A_\mfk{p}\) itself, contradicting the assumption that \(J\) is a maximal ideal. Hence \(I\) is unique. \(\square\)

Let \(p\) be a prime number, and we take \(A=\mb{Z}\) and \(\mfk{p}=p\mb{Z}\). We now try to determine what \(A_\mfk{p}\) and \(\mfk{p}A_\mfk{p}\) look like. First \(S = A \setminus \mfk{p}\) is the set of all integers prime to \(p\). Therefore \(A_\mfk{p}\) can be considered as the ring of all rational numbers \(m/n\) where \(n\) is prime to \(p\), and \(\mfk{p}A_\mfk{p}\) can be considered as the set of all rational numbers \(kp/n\) where \(k \in \mb{Z}\) and \(n\) is prime to \(p\).
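This description of \(\mb{Z}_{(p)}\) is easy to test on rational numbers. The sketch below uses Python's `fractions.Fraction`, which always stores a fraction in lowest terms, so membership can be read off from the reduced numerator and denominator. The function names are hypothetical.

```python
from fractions import Fraction

def in_local_ring(q, p):
    """q lies in Z_(p) iff p does not divide the reduced denominator."""
    return Fraction(q).denominator % p != 0

def in_maximal_ideal(q, p):
    """q lies in p*Z_(p) iff additionally p divides the reduced numerator."""
    q = Fraction(q)
    return q.denominator % p != 0 and q.numerator % p == 0

def is_unit(q, p):
    """Units of Z_(p) are exactly the elements outside the maximal ideal."""
    return in_local_ring(q, p) and not in_maximal_ideal(q, p)

p = 3
samples = {
    "5/4": is_unit(Fraction(5, 4), p),            # numerator and denominator prime to 3
    "6/5": in_maximal_ideal(Fraction(6, 5), p),   # 6/5 = 3 * (2/5)
    "1/3": in_local_ring(Fraction(1, 3), p),      # denominator divisible by 3
}
```

The last check matches the proof of Proposition 7: an element outside \(I\) has an inverse that is again in the ring.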

\(\mb{Z}\) is the simplest example of a ring and \(p\mb{Z}\) is the simplest example of a prime ideal. And \(A_\mfk{p}\) in this case shows what localization does: \(A\) is 'expanded' with respect to \(\mfk{p}\). Every member of \(A_\mfk{p}\) is related to \(\mfk{p}\), and the maximal ideal is determined by \(\mfk{p}\).

Let \(k\) be an infinite field. Let \(A=k[x_1,\cdots,x_n]\) where \(x_i\) are independent indeterminates, \(\mfk{p}\) a prime ideal in \(A\). Then \(A_\mfk{p}\) is the ring of all rational functions \(f/g\) where \(g \notin \mfk{p}\). We have already defined rational functions. But we can go further and demonstrate the prototype of the local rings which arise in algebraic geometry. Let \(V\) be the variety defined by \(\mfk{p}\), that is, \[V=\{x=(x_1,x_2,\cdots,x_n) \in k^n:\forall f \in \mfk{p}, f(x)=0\}.\] Then what about \(A_\mfk{p}\)? For \(f/g \in A_\mfk{p}\) we have \(g \notin \mfk{p}\), and therefore \(g(x)\) is not equal to \(0\) at almost all points of \(V\). That is, \(A_\mfk{p}\) can be identified with the ring of all rational functions on \(k^n\) which are defined at *almost all* points of \(V\). We call this the local ring of \(k^n\) **along the variety** \(V\).

Let \(A\) be a ring and \(S^{-1}A\) a ring of fractions, then we shall see that \(\varphi_S:A \to S^{-1}A\) has a universal property.

(Proposition 8) Let \(g:A \to B\) be a ring homomorphism such that \(g(s)\) is invertible in \(B\) for all \(s \in S\), then there exists a unique homomorphism \(h:S^{-1}A \to B\) such that \(g = h \circ \varphi_S\).

*Proof.* For \(a/s \in S^{-1}A\), define \(h(a/s)=g(a)g(s)^{-1}\). It looks immediate but we shall show that this is what we are looking for and is unique.

Firstly we need to show that it is well defined. Suppose \(a/s=a'/s'\), then there exists some \(u \in S\) such that \[(as'-a's)u=0.\] Applying \(g\) on both sides yields \[(g(a)g(s')-g(a')g(s))g(u)=0.\] Since \(g(s)\) is invertible for all \(s \in S\), we can cancel \(g(u)\) and multiply by \(g(s)^{-1}g(s')^{-1}\) to get \[g(a)g(s)^{-1}=g(a')g(s')^{-1}.\] It is a homomorphism since \[\begin{aligned}h[(a/s)(a'/s')]&=g(a)g(a')g(s)^{-1}g(s')^{-1} \\h(a/s)h(a'/s')&=g(a)g(s)^{-1}g(a')g(s')^{-1},\end{aligned}\] and \[\begin{aligned}h(a/s+a'/s')&=h((as'+a's)/ss')=g(as'+a's)g(ss')^{-1} \\h(a/s)+h(a'/s')&=g(a)g(s)^{-1}+g(a')g(s')^{-1};\end{aligned}\] they are equal since \[\begin{aligned}g(as'+a's)g(ss')^{-1}&=g(as')g(ss')^{-1}+g(a's)g(ss')^{-1} \\&=g(a)g(s')g(s)^{-1}g(s')^{-1}+g(a')g(s)g(s)^{-1}g(s')^{-1} \\&=g(a)g(s)^{-1}+g(a')g(s')^{-1}.\end{aligned}\] Next we show that \(g=h \circ \varphi_S\). For \(a \in A\), we have \[h(\varphi_S(a))=h(a/1)=g(a)g(1)^{-1}=g(a).\] Finally we show that \(h\) is unique. Let \(h'\) be a homomorphism satisfying the condition, then for \(a \in A\) we have \[h'(a/1)=h'(\varphi_S(a))=g(a).\] For \(s \in S\), we also have \[h'(1/s)=h'((s/1)^{-1})=h'(\varphi_S(s)^{-1})=h'(\varphi_S(s))^{-1}=g(s)^{-1}.\] Since \(a/s = (a/1)(1/s)\) for all \(a/s \in S^{-1}A\), we get \[h'(a/s)=h'((a/1)(1/s))=g(a)g(s)^{-1}.\] That is, \(h'\) (or \(h\)) is totally determined by \(g\). \(\square\)
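To see the formula \(h(a/s)=g(a)g(s)^{-1}\) at work, here is a small sketch under made-up assumptions: \(A=\mathbb{Z}\), \(S=\{1,2,4,8,\cdots\}\), \(B=\mathbb{Z}/7\mathbb{Z}\) and \(g\) the reduction mod 7, so that \(g(s)\) is invertible for every \(s \in S\). Elements of \(S^{-1}A\) are passed as unreduced pairs `(a, s)` so that well-definedness can be tested directly.

```python
P = 7  # B = Z/7Z; every power of 2 is invertible mod 7

def g(a):
    """The homomorphism g: Z -> Z/7Z."""
    return a % P

def h(a, s):
    """h(a/s) = g(a) * g(s)^(-1); pow(x, -1, P) needs Python 3.8+."""
    return (g(a) * pow(g(s), -1, P)) % P

# Well defined: 3/4 and 6/8 are the same fraction.
well_defined = h(3, 4) == h(6, 8)

# Additive and multiplicative on 3/4 and 5/2:
# 3/4 + 5/2 = 26/8 and (3/4)(5/2) = 15/8.
additive = h(26, 8) == (h(3, 4) + h(5, 2)) % P
multiplicative = h(15, 8) == (h(3, 4) * h(5, 2)) % P

# g = h o phi_S on a few integers.
factors_through = all(h(a, 1) == g(a) for a in range(-10, 11))
```

These are spot checks on a single example, not a proof; the proof above is what guarantees that they succeed for every choice of representatives.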

Let's restate it in the language of category theory (you can skip it if you have no idea what it is now). Let \(\mfk{C}\) be the category whose objects are ring-homomorphisms \[f:A \to B\] such that \(f(s)\) is invertible for all \(s \in S\). Then according to proposition 5, \(\varphi_S\) is an object of \(\mfk{C}\). For two objects \(f:A \to B\) and \(f':A \to B'\), a morphism \(g \in \operatorname{Mor}(f,f')\) is a homomorphism \[g:B \to B'\] such that \(f'=g \circ f\). So here comes the question: what is the position of \(\varphi_S\)?

Let \(\mfk{A}\) be a category. An object \(P\) of \(\mfk{A}\) is called **universally attracting** if there exists a unique morphism of each object of \(\mfk{A}\) into \(P\), and is called **universally repelling** if for every object of \(\mfk{A}\) there exists a unique morphism of \(P\) into this object. Therefore we have the answer for \(\mfk{C}\).

(Proposition 9) \(\varphi_S\) is a universally repelling object in \(\mfk{C}\).

An ideal \(\mfk{o} \subset A\) is said to be **principal** if there exists some \(a \in A\) such that \(Aa = \mfk{o}\). For example for \(\mb{Z}\), the ideal \[\{\cdots,-2,0,2,4,\cdots\}\] is principal and we may write \(2\mb{Z}\). If every ideal of a **commutative** ring \(A\) is principal, we say \(A\) is principal. Further we say \(A\) is a **PID** if \(A\) is also an integral domain (entire). When it comes to rings of fractions, we also have the following proposition:

(Proposition 10) Let \(A\) be a principal ring and \(S\) a multiplicatively closed subset with \(0 \notin S\), then \(S^{-1}A\) is principal as well.

*Proof.* Let \(I \subset S^{-1}A\) be an ideal. If \(a \in S\) where \(a/s \in I\), then we are done since then \((s/a)(a/s) = 1/1 \in I\), which implies \(I=S^{-1}A\) itself, hence we shall assume \(a \notin S\) for all \(a/s \in I\). But for \(a/s \in I\) we also have \((a/s)(s/1)=a/1 \in I\). Therefore \(J=\varphi_S^{-1}(I)\) is not empty. \(J\) is an ideal of \(A\) since for \(a \in A\) and \(b \in J\), we have \(\varphi_S(ab) =ab/1=(a/1)(b/1) \in I\) which implies \(ab \in J\). But since \(A\) is principal, there exists some \(a\) such that \(Aa = J\). We shall discuss the relation between \(S^{-1}A(a/1)\) and \(I\). For any \((c/u)(a/1)=ca/u \in S^{-1}A(a/1)\), clearly we have \(ca/u \in I\), hence \(S^{-1}A(a/1)\subset I\). On the other hand, for \(c/u \in I\), we see \(c/1=(c/u)(u/1) \in I\), hence \(c \in J\), and there exists some \(b \in A\) such that \(c = ba\), which gives \(c/u=ba/u=(b/u)(a/1) \in I\). Hence \(I \subset S^{-1}A(a/1)\), and we have finally proved that \(I = S^{-1}A(a/1)\). \(\square\)

As an immediate corollary, if \(A_\mfk{p}\) is the localization of \(A\) at \(\mfk{p}\), and if \(A\) is principal, then \(A_\mfk{p}\) is principal as well. Next we go through another kind of rings. A ring is called **factorial** (or a **unique factorization ring** or **UFD**) if it is entire and if every non-zero element has a unique factorization into irreducible elements. An element \(a \neq 0\) is called **irreducible** if it is not a unit and whenever \(a=bc\), then either \(b\) or \(c\) is a unit. For all non-zero elements in a factorial ring, we have \[a=u\prod_{i=1}^{r}p_i,\] where \(u\) is a unit (invertible).

In fact, every PID is a UFD. Irreducible elements in a factorial ring are called **prime elements** or simply **primes** (take \(\mathbb{Z}\) and prime numbers as an example). Indeed, if \(A\) is a factorial ring and \(p\) a prime element, then \(Ap\) is a prime ideal. But we are more interested in the ring of fractions of a factorial ring.

(Proposition 11) Let \(A\) be a factorial ring and \(S\) a multiplicatively closed subset with \(0 \notin S\), then \(S^{-1}A\) is factorial.

*Proof.* Pick \(a/s \in S^{-1}A\). Since \(A\) is factorial, we have \(a=up_1 \cdots p_k\) where \(p_i\) are primes and \(u\) is a unit. But we have no idea what the irreducible elements of \(S^{-1}A\) are. Naturally our first candidates are the \(p_i/1\). And we have no need to restrict ourselves to the \(p_i\); we should work on all primes of \(A\). Suppose \(p\) is a prime of \(A\). If \(p \in S\), then \(p/1\) is a unit, not prime. More generally, if \(Ap \cap S \neq \varnothing\), then \(rp \in S\) for some \(r \in A\). But then \[(p/1)(r/rp)=1,\] so again \(p/1\) is a unit, not prime. Finally if \(Ap \cap S = \varnothing\), then \(p/1\) is prime in \(S^{-1}A\). For any \[(a/s)(b/t)=ab/st=p/1,\] we see \(ab=stp \not\in S\). But this also gives \(ab \in Ap\), which is a prime ideal, hence we can assume \(a \in Ap\) and write \(a=rp\) for some \(r \in A\). With this expansion we get \[ab=stp \implies rbp=stp \implies rb=st \implies (r/s)(b/t)=1/1.\] Hence \(b/t\) is a unit, and \(p/1\) is a prime.

Conversely, suppose \(a/s\) is irreducible in \(S^{-1}A\). Since \(A\) is factorial, we may write \(a=u\prod_{i}p_i\). \(a\) cannot be an element of \(S\) since \(a/s\) is not a unit. We write \[a/s=1/s[(u/1)(p_1/1)(p_2/1)\cdots(p_n/1)]\] We see there is some \(v \in A\) such that \(uv=1\) and accordingly \((u/1)(v/1)=uv/1=1/1\), hence \(u/1\) is a unit. We claim that there exists a unique index \(k\) with \(1 \leq k \leq n\) and \(Ap_k \cap S = \varnothing\). If no such index exists, then all \(p_j/1\) are units, and so is \(a/s\), which is impossible. If both \(k\) and \(k'\) satisfy the requirement and \(k \neq k'\), then we can write \(a/s\) as \[a/s = \{1/s[(u/1)(p_1/1)\cdots(p_{k-1}/1)(p_{k+1}/1)\cdots(p_{k'-1}/1)(p_{k'+1}/1)\cdots(p_n/1)](p_k/1)\}(p_{k'}/1).\] Neither the factor in curly brackets nor \(p_{k'}/1\) is a unit, contradicting the fact that \(a/s\) is irreducible. Next we show that \(a/s\) equals \(p_k/1\) up to a unit. For simplicity we write \[b = u\prod_{i=1 \\ i \neq k}^{n} p_i, \quad a = bp_k.\] Note \(a/s = bp_k/s = (b/s)(p_k/1)\). Since \(a/s\) is irreducible and \(p_k/1\) is not a unit, we conclude that \(b/s\) is a unit. We are done with the study of irreducible elements of \(S^{-1}A\): each is of the form \(p/1\) (up to a unit) where \(p\) is prime in \(A\) and \(Ap \cap S = \varnothing\).

Now we are close to the fact that \(S^{-1}A\) is also factorial. For any \(a/s \in S^{-1}A\), we have an expansion \[a/s=1/s[(u/1)(p_1/1)(p_2/1)\cdots(p_n/1)].\] Let \(p'_1,p'_2,\cdots,p'_j\) be those whose generated prime ideal has nontrivial intersection with \(S\), then \(p'_1/1, p'_2/1,\cdots,p'_j/1\) are units of \(S^{-1}A\). Let \(q_1,q_2,\cdots,q_k\) be other \(p_i\)'s, then \(q_1/1,q_2/1,\cdots,q_k/1\) are irreducible in \(S^{-1}A\). This gives \[a/s = [(1/s)(p'_1/1)(p'_2/1)\cdots(p'_j/1)]\prod_{i=1}^{k}(q_i/1).\] Hence \(S^{-1}A\) is factorial as well. \(\square\)

We finish the whole post by a comprehensive proposition:

(Proposition 12) Let \(A\) be a factorial ring and \(p\) a prime element, \(\mfk{p}=Ap\). The localization of \(A\) at \(\mfk{p}\) is principal.

*Proof.* For \(a/s \in S^{-1}A\), we see \(p\) does not divide \(s\) since if \(s = rp\) for some \(r \in A\), then \(s \in \mfk{p}\), contradicting the fact that \(S = A \setminus \mfk{p}\). Since \(A\) is factorial, for \(a \neq 0\) we may write \(a = cp^n\) for some \(n \geq 0\) where \(p\) does not divide \(c\) (which gives \(c \in S\)). Hence \(a/s = (c/s)(p^n/1)\). Note \((c/s)(s/c)=1/1\) and therefore \(c/s\) is a unit. So every non-zero \(a/s \in S^{-1}A\) may be written as \[a/s = u(p^n/1),\] where \(u\) is a unit of \(S^{-1}A\).

Let \(I\) be any non-zero ideal in \(S^{-1}A\), and \[m = \min\{n:u(p^n/1) \in I, u \text{ is a unit }\}.\] Let's discuss the relation between \(S^{-1}A(p^m/1)\) and \(I\). First we see \(S^{-1}A(p^m/1)=S^{-1}A(up^m/1)\) since if \(v\) is the inverse of \(u\), we get \[\begin{aligned}vS^{-1}A(up^m/1)&=S^{-1}A(p^m/1) \subset S^{-1}A(up^m/1), \\S^{-1}A(up^m/1)&=uS^{-1}A(p^m/1)\subset S^{-1}A(p^m/1).\end{aligned}\] Any non-zero element of \(S^{-1}A(up^m/1)\) is of the form \[vup^{m+k}/1=v(p^k/1)up^m/1.\] Since \(up^m/1 \in I\), we see \(vup^{m+k}/1 \in I\) as well, hence \(S^{-1}A(up^m/1) \subset I\). On the other hand, by the minimality of \(m\), any non-zero element of \(I\) is of the form \(wup^{m+n}/1=w(p^n/1)u(p^m/1)\) where \(w\) is a unit and \(n \geq 0\). This shows that \(wup^{m+n}/1 \in S^{-1}A(up^m/1)\). Hence \(S^{-1}A(p^m/1)=S^{-1}A(up^m/1)=I\) as we wanted. \(\square\)
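The decomposition \(a/s=u(p^n/1)\) used in this proof can be computed explicitly in the simplest case \(A=\mathbb{Z}\), \(\mfk{p}=p\mathbb{Z}\). The sketch below assumes that setting; `split_unit` is a hypothetical helper name, and the exponent it returns is the one that determines which ideal \((p^m)\) an element generates.

```python
from fractions import Fraction

def split_unit(q, p):
    """Write a nonzero q in Z_(p) as u * p**n with u a unit of Z_(p)."""
    q = Fraction(q)
    if q == 0 or q.denominator % p == 0:
        raise ValueError("q must be a nonzero element of Z_(p)")
    u, n = q, 0
    while u.numerator % p == 0:   # strip factors of p from the numerator
        u /= p
        n += 1
    return u, n

p = 3
u, n = split_unit(Fraction(18, 5), p)   # 18/5 = (2/5) * 3**2

# The ideal generated by several elements is generated by p**m,
# where m is the smallest exponent that occurs.
generators = [Fraction(18, 5), Fraction(27, 2), Fraction(6, 7)]
m = min(split_unit(x, p)[1] for x in generators)
```

Here `m` is the minimum from the proof, taken over a finite list of generators rather than over all of \(I\).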

Let \(A\) be an abelian group. Let \((e_i)_{i \in I}\) be a family of elements of \(A\). We say that this family is a **basis** for \(A\) if the family is not empty, and if every element of \(A\) has a unique expression as a **linear expression** \[x = \sum_{i \in I} x_i e_i\] where \(x_i \in \mathbb{Z}\) and almost all \(x_i\) are equal to \(0\). This means that the sum is actually finite. An abelian group is said to be **free** if it has a basis. Alternatively, we may write \(A\) as a direct sum by \[A \cong \bigoplus_{i \in I}\mathbb{Z}e_i.\]

Let \(S\) be a set. Say we want to get a group out of this for some reason, so how? It is not a good idea to endow \(S\) with a binary operation beforehand since after all \(S\) is merely a set. We shall **generate** a group out of \(S\) in the **freest** way.

Let \(\mathbb{Z}\langle S \rangle\) be the set of all **maps** \(\varphi:S \to \mathbb{Z}\) such that, for only a **finite** number of \(x \in S\), we have \(\varphi(x) \neq 0\). For simplicity, we denote \(k \cdot x\) to be some \(\varphi_0 \in \mathbb{Z}\langle S \rangle\) such that \(\varphi_0(x)=k\) but \(\varphi_0(y) = 0\) if \(y \neq x\). For any \(\varphi\), we claim that \(\varphi\) has a unique expression \[\varphi=k_1 \cdot x_1 + k_2 \cdot x_2 + \cdots + k_n \cdot x_n.\] One can consider these integers \(k_i\) as the order of \(x_i\), or simply the time that \(x_i\) appears (may be negative). For \(\varphi\in\mathbb{Z}\langle S \rangle\), let \(I=\{x_1,x_2,\cdots,x_n\}\) be the set of elements of \(S\) such that \(\varphi(x_i) \neq 0\). If we denote \(k_i=\varphi(x_i)\), we can show that \(\psi=k_1 \cdot x_1 + k_2 \cdot x_2 + \cdots + k_n \cdot x_n\) is equal to \(\varphi\). For \(x \in I\), we have \(\psi(x)=k\) for some \(k=k_i\neq 0\) by definition of the '\(\cdot\)'; if \(y \notin I\) however, we then have \(\psi(y)=0\). This coincides with \(\varphi\). \(\blacksquare\)

By definition the zero map \(\mathcal{O}=0 \cdot x \in \mathbb{Z}\langle S \rangle\) and therefore we may write any \(\varphi\) by \[\varphi=\sum_{x \in S}k_x\cdot x\] where \(k_x \in \mathbb{Z}\) and can be zero. Suppose now we have two expressions, for example \[\varphi=\sum_{x \in S}k_x \cdot x=\sum_{x \in S}k_x'\cdot x\] Then \[\varphi-\varphi=\mathcal{O}=\sum_{x \in S}(k_x-k'_x)\cdot x\] Suppose \(k_y - k_y' \neq 0\) for some \(y \in S\), then \[\mathcal{O}(y)=k_y-k_y'\neq 0\] which is a contradiction. Therefore the expression is unique. \(\blacksquare\)

This \(\mathbb{Z}\langle S \rangle\) is what we are looking for. It is an additive group (which can be proved immediately) and, what is more important, every element can be expressed as a 'sum' associated with finite number of elements of \(S\). We shall write \(F_{ab}(S)=\mathbb{Z}\langle S \rangle\), and call it the **free abelian group generated by \(S\)**. For elements in \(S\), we say they are **free generators** of \(F_{ab}(S)\). If \(S\) is a finite set, we say \(F_{ab}(S)\) is **finitely generated**.
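One way to make \(\mathbb{Z}\langle S \rangle\) tangible is to store a finitely supported map \(\varphi:S \to \mathbb{Z}\) as a dictionary containing only its non-zero values. This representation and the helper names `scalar` and `add` are assumptions for illustration; uniqueness of the expression then corresponds to equality of dictionaries.

```python
def scalar(k, x):
    """The element k . x: value k at x and 0 elsewhere."""
    return {x: k} if k != 0 else {}

def add(phi, psi):
    """Pointwise sum of two finitely supported maps S -> Z."""
    total = dict(phi)
    for x, k in psi.items():
        total[x] = total.get(x, 0) + k
    # Drop zero coefficients so that each map has exactly one representation.
    return {x: k for x, k in total.items() if k != 0}

phi = add(scalar(2, "a"), scalar(-3, "b"))      # 2.a - 3.b
psi = add(scalar(3, "b"), scalar(1, "c"))       # 3.b + 1.c
neg_phi = {x: -k for x, k in phi.items()}       # additive inverse of phi

total = add(phi, psi)        # 2.a + 1.c
zero = add(phi, neg_phi)     # the zero map, stored as {}
```

Dropping zero coefficients is a design choice that keeps the zero map \(\mathcal{O}\) represented by the empty dictionary only.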

An abelian group is **free** if and only if it is isomorphic to a free abelian group \(F_{ab}(S)\) for some set \(S\).

**Proof.** First we shall show that \(F_{ab}(S)\) is free. For \(x \in S\), we denote \(\varphi = 1 \cdot x\) by \([x]\). Then for any \(k \in \mathbb{Z}\), we have \(k[x]=k \cdot x\) and \(k[x]+k'[y] = k\cdot x + k' \cdot y\). By definition of \(F_{ab}(S)\), any element \(\varphi \in F_{ab}(S)\) has a unique expression \[\varphi = k_1 \cdot x_1 + \cdots + k_n \cdot x_n =k_1[x_1]+\cdots+k_n[x_n]\] Therefore \(F_{ab}(S)\) is free since we have found the basis \(([x])_{x \in S}\).

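Conversely, if \(A\) is free with basis \((e_i)_{i \in I}\), then the map \(F_{ab}(I) \to A\) sending \(\sum_{i \in I}k_i \cdot i\) to \(\sum_{i \in I}k_ie_i\) is a homomorphism; it is surjective because the basis generates \(A\), and injective because the expression of an element in terms of the basis is unique. Our statement is therefore proved. \(\blacksquare\)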

(Proposition 1) If \(A\) is an abelian group, then there is a free abelian group \(F\) which has a subgroup \(H\) such that \(A \cong F/H\).

**Proof.** Let \(S\) be any set containing \(A\). Then we can choose a surjective map \(\gamma: S \to A\), and we have the free abelian group \(F_{ab}(S)\). We also get a unique homomorphism \(\gamma_\ast:F_{ab}(S) \to A\) by \[\begin{aligned}\gamma_\ast:F_{ab}(S) &\to A \\\varphi=\sum_{x \in S}k_x\cdot x &\mapsto \sum_{x \in S}k_x\gamma(x)\end{aligned}\] which is also surjective because \(\gamma\) is. By the first isomorphism theorem, if we set \(H=\ker(\gamma_\ast)\) and \(F_{ab}(S)=F\), then \[F/H \cong A.\] \(\blacksquare\)

(Proposition 2)If \(A\) is finitely generated, then \(F\) can also be chosen to be finitely generated.

**Proof.** Let \(S\) be a generating set of \(A\), and let \(S'\) be a set admitting a surjective map \(\lambda:S' \to S\) (for instance, \(S\) itself or a larger set). Note that if \(S\) is finite, which means \(A\) is finitely generated, then \(S'\) can also be chosen finite. We have maps from \(S\) and \(S'\) into \(F_{ab}(S)\) and \(F_{ab}(S')\) respectively, given by \(f_S(x)=1 \cdot x\) and \(f_{S'}(x')=1 \cdot x'\). Defining \(g=f_{S} \circ \lambda:S' \to F_{ab}(S)\), we get a homomorphism \[\begin{aligned}g_\ast:F_{ab}(S') &\to F_{ab}(S) \\\varphi'=\sum_{x \in S'}k_{x} \cdot x &\mapsto \sum_{x \in S'}k_{x}\cdot g(x)\end{aligned}\] This is the unique homomorphism such that \(g_\ast \circ f_{S'} = g\). As one can verify, it is surjective because \(\lambda\) is. Moreover, the homomorphism \(\gamma_\ast:F_{ab}(S) \to A\) induced by the inclusion \(S \subset A\) is surjective because \(S\) generates \(A\). Therefore \(\gamma_\ast \circ g_\ast\) is surjective, and by the first isomorphism theorem we have \[A \cong F_{ab}(S)/\ker(\gamma_\ast) \cong F_{ab}(S')/\ker(\gamma_\ast \circ g_\ast).\] \(\blacksquare\)

It's worth mentioning separately that we have implicitly proved two statements with commutative diagrams:

(Proposition 3 | Universal property) If \(g:S \to B\) is a mapping of \(S\) into some abelian group \(B\), then there is a unique group-homomorphism \(g_\ast:F_{ab}(S) \to B\) such that \(g_\ast \circ f_S = g\), that is, making the following diagram commutative:

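(Proposition 4) If \(\lambda:S \to S'\) is a mapping of sets, there is a unique homomorphism \(\overline{\lambda}:F_{ab}(S) \to F_{ab}(S')\) such that \(\overline{\lambda} \circ f_S = f_{S'} \circ \lambda\), that is, making the following diagram commutative: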

(In the proof of Proposition 2 we exchanged \(S\) and \(S'\).)
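The universal property can be sketched in code. The example below assumes, for simplicity, that the target group \(B\) is \(\mathbb{Z}\) and that elements of \(F_{ab}(S)\) are stored as dicts of nonzero coefficients; `extend` and `add` are hypothetical helper names.

```python
def add(phi, psi):
    """Pointwise sum in F_ab(S), dropping zero coefficients."""
    out = dict(phi)
    for x, k in psi.items():
        out[x] = out.get(x, 0) + k
    return {x: k for x, k in out.items() if k != 0}


def extend(g):
    """Given g: S -> Z, return the induced homomorphism g_*: F_ab(S) -> Z."""
    def g_star(phi):
        # g_*(sum k_x . x) = sum k_x g(x)
        return sum(k * g(x) for x, k in phi.items())
    return g_star


g = {"a": 5, "b": -1}.__getitem__  # a map on S = {"a", "b"}
g_star = extend(g)

phi = {"a": 2, "b": 3}
psi = {"a": -2, "b": 4}
```

Here `g_star({"a": 1})` agrees with `g("a")`, which is the commutativity \(g_\ast \circ f_S = g\), and `g_star` is additive on sums.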

(The Grothendieck group) Let \(M\) be a commutative monoid written additively. We shall prove that there exists a commutative group \(K(M)\) with a monoid homomorphism \[\gamma:M \to K(M)\] satisfying the following universal property: If \(f:M \to A\) is a homomorphism from \(M\) into an abelian group \(A\), then there exists a unique homomorphism \(f_\gamma:K(M) \to A\) such that \(f=f_\gamma\circ\gamma\). This can be represented by a commutative diagram:

**Proof.** The following commutative diagram describes what we are doing.

Let \(F_{ab}(M)\) be the free abelian group generated by \(M\). For \(x \in M\), we denote \(1 \cdot x \in F_{ab}(M)\) by \([x]\). Let \(B\) be the subgroup of \(F_{ab}(M)\) generated by all elements of the type \[[x+y]-[x]-[y]\] where \(x,y \in M\). We let \(K(M)=F_{ab}(M)/B\). Let \(i: x \mapsto [x]\) and let \(\pi\) be the canonical map \[\pi:F_{ab}(M) \to F_{ab}(M)/B.\] We define \(\gamma= \pi \circ i\) and verify that \(\gamma\) is our desired homomorphism satisfying the universal property. For \(x,y \in M\), we have \(\gamma(x+y)=\pi([x+y])\) and \(\gamma(x)+\gamma(y) = \pi([x])+\pi([y])=\pi([x]+[y])\). However we have \[[x+y]-[x]-[y] \in B,\] which implies that \[\gamma(x)+\gamma(y)=\pi([x]+[y])=\pi([x+y]) = \gamma(x+y).\] Hence \(\gamma\) is a monoid-homomorphism. Finally, the universal property. By Proposition 3, we have a unique homomorphism \(f_\ast:F_{ab}(M) \to A\) such that \(f_\ast \circ i = f\). Since \(f\) is a monoid-homomorphism, \(f_\ast([x+y]-[x]-[y])=f(x+y)-f(x)-f(y)=0\) for every generator of \(B\), so \(B \subset \ker{f_\ast}\). Therefore \(f_\gamma(x+B)=f_\ast (x)\) is well defined and satisfies \(f_\gamma \circ \gamma = f\); it is unique because the elements \(\gamma(x)\) generate \(K(M)\). \(\blacksquare\)

Why such a \(B\)? Note in general \([x+y]\) is not necessarily equal to \([x]+[y]\) in \(F_{ab}(M)\), but we want them to be equal. So instead we create a new **equivalence relation**, by factoring out the subgroup generated by the elements \([x+y]-[x]-[y]\). Therefore in \(K(M)\) we see \([x+y]+B = [x]+[y]+B\), which finally makes \(\gamma\) a homomorphism. We use the same strategy to construct the **tensor product** of two modules later. But at that time we have more than one relation to take care of.

If for all \(x,y,z \in M\), \(x+y=x+z\) implies \(y=z\), then we say \(M\) is a cancellative monoid, or the cancellation law holds in \(M\). Note for the proof above we didn't use any property of cancellation. However we still have an interesting property for cancellation law.

(Theorem)The cancellation law holds in \(M\) if and only if \(\gamma\) is injective.

**Proof.** This proof involves another approach to the Grothendieck group. We consider pairs \((x,y) \in M \times M\). Define \[(x,y) \sim (x',y') \iff \exists \ell \in M, y+x'+\ell=x+y'+\ell.\] Then we get an equivalence relation (try to prove it yourself!). We define addition component-wise, that is, \((x,y)+(x',y')=(x+x',y+y')\), then the equivalence classes of pairs form a group \(A\), where the zero element is \([(0,0)]\) and the inverse of \([(x,y)]\) is \([(y,x)]\). We have a monoid-homomorphism \[f:x \mapsto [(x,0)].\] If the cancellation law holds in \(M\), then \[\begin{aligned}f(x) = f(y) &\implies [(x,0)] = [(y,0)] \\ &\implies 0+y+\ell=x+0+\ell \text{ for some } \ell \in M \\ &\implies x=y.\end{aligned}\] Hence \(f\) is injective. By the universal property of the Grothendieck group, we get a unique homomorphism \(f_\gamma\) such that \(f_\gamma \circ \gamma = f\). If \(\gamma(x)=\gamma(y)\), then \(f(x)=f_\gamma(\gamma(x))=f_\gamma(\gamma(y))=f(y)\), so \(x=y\) since \(f\) is injective. Hence \(\gamma\) is injective.

Conversely, suppose \(\gamma\) is injective and \(x+\ell = y+\ell\) in \(M\). Since \(\gamma\) is a monoid-homomorphism, we get \(\gamma(x)+\gamma(\ell)=\gamma(y)+\gamma(\ell)\) in the group \(K(M)\), where we may subtract \(\gamma(\ell)\) to obtain \(\gamma(x)=\gamma(y)\). Injectivity then gives \(x=y\), so the cancellation law holds in \(M\). \(\blacksquare\)
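A small non-cancellative monoid shows the theorem at work. The sketch below takes, as an assumption for illustration, the monoid \(M=\{0,1\}\) with \(x+y=\max(x,y)\), and uses the pair construction from the proof. The map \(x \mapsto [(x,0)]\) plays the role of \(\gamma\).

```python
from itertools import product

M = [0, 1]


def plus(a, b):
    """Monoid operation on M = {0, 1}: the maximum (0 is the identity)."""
    return max(a, b)


def related(p, q):
    """(x, y) ~ (x', y') iff y + x' + l = x + y' + l for some l in M."""
    (x, y), (x2, y2) = p, q
    return any(plus(plus(y, x2), l) == plus(plus(x, y2), l) for l in M)


# Cancellation law: x + y = x + z implies y = z.
cancellative = all(
    y == z for x, y, z in product(M, repeat=3) if plus(x, y) == plus(x, z)
)

# x -> [(x, 0)] is injective iff the classes of (0, 0) and (1, 0) differ.
gamma_injective = not related((0, 0), (1, 0))
```

Both flags come out false: cancellation fails because \(1+0=1+1\), and the classes of \((0,0)\) and \((1,0)\) coincide by taking \(\ell=1\).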

Our first example is \(\mathbb{N}\). Elements of \(F_{ab}(\mathbb{N})\) are of the form \[\varphi=k_1 \cdot n_1 + k_2 \cdot n_2+\cdots + k_m \cdot n_m.\] The subgroup \(B\) is generated by elements of the form \[\varphi=1\cdot (m+n)-1\cdot m - 1\cdot n\] which we wish to represent \(0\). Indeed, \(K(\mathbb{N}) \simeq \mathbb{Z}\) via the homomorphism \[\begin{aligned}f:K(\mathbb{N}) &\to \mathbb{Z} \\ \sum_{j=1}^{m}k_j \cdot n_j +B &\mapsto \sum_{j=1}^{m}k_j n_j.\end{aligned}\] This is well defined since every generator of \(B\) is sent to \((m+n)-m-n=0\). It is surjective: for \(r \in \mathbb{N}\), we see \(f(1 \cdot r+B)=r\) and \(f((-1) \cdot r+B)=-r\). On the other hand, if \(\sum_{j=1}^{m}k_j \cdot n_j \not\in B\), then its image under \(f\) is not \(0\), so \(f\) is injective.
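The same group can be sketched with the pair construction used in the proof of the theorem above. Since \(\mathbb{N}\) is cancellative, \((x,y) \sim (x',y')\) reduces to \(y+x'=x+y'\), so each class has a canonical representative with one zero entry. The function names below are illustrative only.

```python
def normalize(pair):
    """Canonical representative of the class of (x, y) in K(N)."""
    x, y = pair
    m = min(x, y)
    return (x - m, y - m)


def add_pairs(p, q):
    """Component-wise addition of classes."""
    return normalize((p[0] + q[0], p[1] + q[1]))


def neg(p):
    """Inverse of the class of (x, y) is the class of (y, x)."""
    return normalize((p[1], p[0]))


def gamma(n):
    """The canonical map N -> K(N), n -> [(n, 0)]."""
    return (n, 0)


def to_int(p):
    """The isomorphism K(N) -> Z, [(x, y)] -> x - y."""
    return p[0] - p[1]


difference = add_pairs(gamma(3), neg(gamma(5)))  # plays the role of 3 - 5
```

The pair `difference` is `(0, 2)`, which `to_int` sends to the integer \(-2\): subtraction has been 'granted' to the natural numbers.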

In the first example we 'granted' the natural numbers 'subtraction'. Next we grant division to a multiplicative monoid.

Consider \(M=\mathbb{Z} \setminus \{0\}\) under multiplication. Now for \(F_{ab}(M)\) we write elements in the form \[\varphi={}^{k_1}n_1{}^{k_2}n_2\cdots{}^{k_m}n_m\] which denotes that \(\varphi(n_j)=k_j\) and that \(\varphi\) vanishes elsewhere. Then \(B\) is generated by elements of the form \[\varphi = {}^1(n_1n_2){}^{-1}(n_1){}^{-1}(n_2)\] which we wish to represent \(1\). Then we see \(K(M) \simeq \mathbb{Q} \setminus \{0\}\) if we take the isomorphism \[\begin{aligned}f:K(M) &\to \mathbb{Q} \setminus \{0\} \\\left(\prod_{j=1}^{m}{}^{k_j}n_j\right)B &\mapsto \prod_{j=1}^{m}n_j^{k_j}.\end{aligned}\]
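The map \(f\) can be evaluated exactly with rational arithmetic. This is only a sketch: an element \({}^{k_1}n_1\cdots{}^{k_m}n_m\) is assumed to be stored as a dict `{n_j: k_j}`, and the function name `f` simply mirrors the text.

```python
from fractions import Fraction


def f(phi):
    """Send the formal product of the n_j with exponents k_j to a rational."""
    out = Fraction(1)
    for n, k in phi.items():
        out *= Fraction(n) ** k  # integer exponents keep the result exact
    return out


# A generator of B: (n1 * n2) with exponent 1, n1 and n2 with exponent -1.
relation = {6: 1, 2: -1, 3: -1}
quotient = {-3: 1, 4: -1}
```

The generator of \(B\) is sent to \(1\), as required for \(f\) to be well defined on \(K(M)\), while `quotient` is sent to \(-3/4\).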

Of course this is not the end of the Grothendieck group. But further examples may need a lot of topology background. For example, the topological \(K\)-theory group of a topological space is the Grothendieck group of isomorphism classes of topological vector bundles. But I think it is not a good idea to post these examples at this point.

We begin our study with some elementary Calculus. Take the function \(f(x)=x^2+\frac{e^x}{x^2+1}\) as our example. It should not be a problem to find its tangent line at the point \((0,1)\): by calculating its derivative, we get \(l:x-y+1=0\) as the tangent line.

\(l\) is not a vector space since, in general, it does not pass through the origin. But \(l-\overrightarrow{OA}\), where \(A=(0,1)\), is a vector space. In general, suppose \(P(x,y)\) is a point on the curve determined by \(f\), i.e. \(y=f(x)\), then we obtain a vector space \(l_P-\overrightarrow{OP} \simeq \mathbb{R}\), where \(l_P\) is the tangent line at \(P\). But the action of moving the tangent line to the origin is superfluous, so naturally we consider the tangent line at \(P\) as a vector space **determined** by \(P\). In this case, the induced vector space (tangent line) is always of dimension \(1\).

Now we move to two-variable functions. We have a function \(a(x,y)=x^2+y^2-x-y+xy\) as our example. Some elementary Calculus work gives us the tangent plane of \(z=a(x,y)\) at \(A(1,1,1)\), namely \(S:2x+2y-z=3\), which can be identified with \(\mathbb{R}^2\). Again, this can be considered as a vector space **determined** by \(A\), or roughly speaking it is one if we take \(A\) as the origin. Further we have a basis \((\overrightarrow{AB},\overrightarrow{AC})\). Other vectors on \(S\), for example \(\overrightarrow{AD}\), can be written as a linear combination of \(\overrightarrow{AB}\) and \(\overrightarrow{AC}\). In other words, \(S\) is "spanned" by \((\overrightarrow{AB},\overrightarrow{AC})\).
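The tangent plane can be checked numerically. The sketch below approximates the partial derivatives of \(a\) at \((1,1)\) by central differences; the helper `partials` and the step size `h` are assumptions made for illustration.

```python
def a(x, y):
    return x**2 + y**2 - x - y + x * y


def partials(func, x, y, h=1e-6):
    """Central-difference approximations of the two partial derivatives."""
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return fx, fy


ax, ay = partials(a, 1.0, 1.0)

# Tangent plane at A(1, 1, 1): z = a(1,1) + ax*(x - 1) + ay*(y - 1),
# which rearranges to ax*x + ay*y - z = c with the constant below.
c = ax * 1.0 + ay * 1.0 - a(1.0, 1.0)
```

Up to rounding, `ax` and `ay` are both \(2\) and `c` is \(3\), matching \(2x+2y-z=3\).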

Tangent line and tangent surface play an important role in differentiation. But sometimes we do not have a chance to use it with ease, for example \(S^1:x^2+y^2=1\) cannot be represented by a single-variable function. However the implicit function theorem, which you have already learned in Calculus, gives us a chance to find a satisfying function locally. Here in this post we will try to generalize this concept, trying to find the tangent **space** at some point of a manifold. (The two examples above have already determined two manifolds and two tangent spaces.)

We will introduce the abstract definition of a tangent vector at the beginning. You may think it is way too abstract but actually it is not. Surprisingly, the following definition can simplify our work in the future. But before we go, make sure that you have learned about the Fréchet derivative (along with some functional analysis knowledge).

Let \(M\) be a manifold of class \(C^p\) with \(p \geq 1\) and let \(x\) be a point of \(M\). Let \((U,\varphi)\) be a chart at \(x\) and \(v\) be an element of the vector space \(\mathbf{E}\) where \(\varphi(U)\) lies (for example, if \(M\) is a \(d\)-dimensional manifold, then \(v \in \mathbb{R}^d\)). Next we consider the triple \((U,\varphi,v)\). Suppose \((U,\varphi,v)\) and \((V,\psi,w)\) are two such triples. We say these two triples are **equivalent** if the following identity holds: \[{\color{green}{[}}{\color{red}{(\psi\circ\varphi^{-1})'(}}{\color{purple}{\varphi(x)}}{\color{red}{)}}{\color{green}{]}}(v)=w.\] This identity looks messy so we need to explain how to read it. First we consider the function in red: the derivative of \(\psi\circ\varphi^{-1}\). The derivative of \(\psi\circ\varphi^{-1}\) at the point \(\varphi(x)\) (in purple) is a linear transform, and the transform is enclosed in green brackets. Finally, this linear transform maps \(v\) to \(w\). In short we read: the derivative of \(\psi\circ\varphi^{-1}\) at \(\varphi(x)\) maps \(v\) to \(w\). You may recall that you have met something like \(\psi\circ\varphi^{-1}\) in the definition of a manifold. It may seem unlikely that these 'triples' should be associated with tangent vectors. But before we explain it, we need to make sure that we have indeed defined an equivalence relation.

(Theorem 1) The relation \[(U,\varphi,v) \sim (V,\psi,w) \iff [(\psi\circ\varphi^{-1})'(\varphi(x))](v)=w\] is an equivalence relation.

*Proof.* This will not go further than elementary Calculus, in fact, chain rule:

(Chain rule)If \(f:U \to V\) is differentiable at \(x_0 \in U\), if \(g: V \to W\) is differentiable at \(f(x_0)\), then \(g \circ f\) is differentiable at \(x_0\), and \[(g\circ f)'(x_0)=g'(f(x_0))\circ f'(x_0)\]

- \((U,\varphi,v)\sim(U,\varphi,v)\).

Since \(\varphi\circ\varphi^{-1}=\operatorname{id}\), whose derivative is still the identity everywhere, we have \[[(\varphi\circ\varphi^{-1})'(\varphi(x))](v)=\operatorname{id}(v)=v\]

- If \((U,\varphi,v) \sim (V,\psi,w)\), then \((V,\psi,w)\sim(U,\varphi,v)\).

So now we have \[[(\psi\circ\varphi^{-1})'(\varphi(x))](v)=w.\] To prove that \([(\varphi\circ\psi^{-1})'(\psi(x))]{}(w)=v\), we need some applications of the chain rule.

Note first \[(\psi\circ\varphi^{-1})'(\varphi(x))=\psi'(\varphi^{-1}(\varphi(x)))\circ(\varphi^{-1})'(\varphi(x))=\psi'(x)\circ(\varphi^{-1})'(\varphi(x))\] while \[(\varphi\circ\psi^{-1})'(\psi(x))=\varphi'(x)\circ(\psi^{-1})'(\psi(x)).\] But also by the chain rule, if \(f\) is a diffeomorphism, we have \[(f^{-1}\circ f)'(x)=(f^{-1})'(f(x))\circ f'(x)=\operatorname{id}\] or equivalently \[f'(x)=[(f^{-1})'(f(x))]^{-1}, \quad (f^{-1})'(f(x))=[f'(x)]^{-1}.\]

Therefore \[\begin{aligned}\{(\psi\circ\varphi^{-1})'(\varphi(x))\}^{-1} &=\{\psi'(x)\circ(\varphi^{-1})'(\varphi(x))\}^{-1} \\&=\{(\varphi^{-1})'(\varphi(x))\}^{-1}\circ\{\psi'(x)\}^{-1} \\&=\varphi'(x)\circ(\psi^{-1})'(\psi(x)) \\&=(\varphi\circ\psi^{-1})'(\psi(x))\end{aligned}\] which implies \[(\varphi\circ\psi^{-1})'(\psi(x))(w)=\{(\psi\circ\varphi^{-1})'(\varphi(x))\}^{-1}(w)=v.\]

- If \((U,\varphi,v)\sim(V,\psi,w)\) and \((V,\psi,w)\sim(W,\lambda,z)\), then \((U,\varphi,v)\sim(W,\lambda,z)\).

We are given identities \[[(\psi\circ\varphi^{-1})'(\varphi(x))](v)=w\] and \[[(\lambda\circ\psi^{-1})'(\psi(x))](w)=z.\] By canceling \(w\), we get \[\begin{aligned}z = [(\lambda\circ\psi^{-1})'(\psi(x))] \circ [(\psi\circ\varphi^{-1})'(\varphi(x))] (v)\end{aligned}.\] On the other hand, \[\begin{aligned}(\lambda\circ\varphi^{-1})'(\varphi(x))&=(\lambda\circ\psi^{-1}\circ\psi\circ\varphi^{-1})'(\varphi(x)) \\&=(\lambda\circ\psi^{-1})'(\psi\circ\varphi^{-1}\circ\varphi(x))\circ(\psi\circ\varphi^{-1})'(\varphi(x)) \\&=(\lambda\circ\psi^{-1})'(\psi(x))\circ(\psi\circ\varphi^{-1})'(\varphi(x))\end{aligned}\] which is what we needed. \(\square\)

An **equivalence class** of such triples \((U,\varphi,v)\) is called a **tangent vector** of \(M\) at \(x\). The set of such tangent vectors is called the **tangent space** to \(M\) at \(x\), which is denoted by \(T_x(M)\) or \(T_xM\). But it seems that we have gone too far. Is the triple even a 'vector'? To get a clear view let's see Euclidean submanifolds first.

Suppose \(M\) is a submanifold of \(\mathbb{R}^n\). We say \(z\) is a **tangent vector** of \(M\) at the point \(x\) if there exists a curve \(\alpha\) of class \(C^1\), defined on an open interval \(I\) containing \(t_0\) with \(\alpha(I) \subset M\), such that \(\alpha(t_0)=x\) and \(\alpha'(t_0)=z\). (For convenience we often take \(t_0=0\).)

This definition becomes clear once we check some examples. For the curve \(M: x^2+\frac{e^x}{x^2+1}-y=0\), we can show that \((1,1)^T\) is a tangent vector of \(M\) at \((0,1)\), which agrees with our first example. Taking \[\alpha(t)=\left(t,t^2+\frac{e^t}{t^2+1}\right)\] we get \(\alpha(0)=(0,1)\) and \[\alpha'(t)=\left(1,2t+\frac{e^t(t-1)^2}{(t^2+1)^2}\right)^T.\] Therefore \(\alpha'(0)=(1,1)^T\). \(\square\)
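The velocity of a curve can also be approximated numerically. The sketch below takes the curve \(t \mapsto (t,f(t))\) with \(f(x)=x^2+\frac{e^x}{x^2+1}\) from the first example and estimates its derivative at \(t=0\) by central differences; `velocity` and the step size `h` are hypothetical choices.

```python
import math


def alpha(t):
    """The curve t -> (t, f(t)) lying on the graph of f."""
    return (t, t**2 + math.exp(t) / (t**2 + 1))


def velocity(curve, t, h=1e-6):
    """Component-wise central-difference approximation of curve'(t)."""
    ahead, behind = curve(t + h), curve(t - h)
    return tuple((p - q) / (2 * h) for p, q in zip(ahead, behind))


point = alpha(0.0)
z = velocity(alpha, 0.0)
```

The curve passes through \((0,1)\) at \(t=0\), and `z` is approximately \((1,1)\), the tangent vector found by hand.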

Let \(\mathbf{E}\) and \(\mathbf{F}\) be two Banach spaces and \(U\) an open subset of \(\mathbf{E}\). A \(C^p\) map \(f: U \to \mathbf{F}\) is called an **immersion** at \(x\) if \(f'(x)\) is injective.

For example, if we take \(\mathbf{E}=\mathbf{F}=\mathbb{R}=U\) and \(f(x)=x^2\), then \(f\) is an immersion at every point of \(\mathbb{R}\) except \(0\), since \(f'(0)=0\) is not injective. This may lead you to Sard's theorem.

(Theorem 2) Let \(M\) be a subset of \(\mathbb{R}^n\), then \(M\) is a \(d\)-dimensional \(C^p\) submanifold of \(\mathbb{R}^n\) if and only if for every \(x \in M\) there exists an open neighborhood \(U \subset \mathbb{R}^n\) of \(x\), an open neighborhood \(\Omega \subset \mathbb{R}^d\) of \(0\) and a \(C^p\) map \(g: \Omega \to \mathbb{R}^n\) such that \(g\) is an immersion at \(0\) with \(g(0)=x\), and \(g\) is a homeomorphism between \(\Omega\) and \(M \cap U\) with the topology induced from \(\mathbb{R}^n\).

This follows from the definition of manifold and should not be difficult to prove. But it is not what this blog post should cover. For a proof you can check *Differential Geometry: Manifolds, Curves, and Surfaces* by Marcel Berger and Bernard Gostiaux. The proof is located in section 2.1.

A coordinate system on a \(d\)-dimensional \(C^p\) submanifold \(M\) of \(\mathbb{R}^n\) is a pair \((\Omega,g)\) consisting of an open set \(\Omega \subset \mathbb{R}^d\) and a \(C^p\) function \(g:\Omega \to \mathbb{R}^n\) such that \(g(\Omega)\) is open in \(M\) and \(g\) induces a homeomorphism between \(\Omega\) and \(g(\Omega)\).

For convenience, we say \((\Omega,g)\) is centered at \(x\) if \(g(0)=x\) and \(g\) is an immersion at \(0\). By Theorem 2 it is always possible to find such a coordinate system centered at a given point \(x \in M\). The following theorem will show that we can get an easier approach to tangent vectors.

(Theorem 3) Let \(\mathbf{E}\) and \(\mathbf{F}\) be two finite-dimensional vector spaces, \(U \subset \mathbf{E}\) an open set, \(f:U \to \mathbf{F}\) a \(C^1\) map, \(M\) a submanifold of \(\mathbf{E}\) contained in \(U\) and \(W\) a submanifold of \(\mathbf{F}\) such that \(f(M) \subset W\). Take \(x \in M\) and set \(y=f(x)\). If \(z\) is a tangent vector to \(M\) at \(x\), the image \(f'(x)(z)\) is a tangent vector to \(W\) at \(y=f(x)\).

*Proof.* Since \(z\) is a tangent vector, we see there exists a curve \(\alpha: J \to M\) such that \(\alpha(0)=x\) and \(\alpha'(0)=z\) where \(J\) is an open interval containing \(0\). The function \(\beta = f \circ \alpha: J \to W\) is also a curve satisfying \(\beta(0)=f(\alpha(0))=f(x)\) and \[\beta'(0)=f'(\alpha(0))\alpha'(0)=f'(x)(z),\] which is our desired curve. \(\square\)

We shall show that the equivalence relation makes sense. Suppose \(M\) is a \(d\)-submanifold of \(\mathbb{R}^n\), \(x \in M\) and \(z\) is a tangent vector to \(M\) at \(x\). Let \((\Omega,g)\) be a coordinate system centered at \(x\). Since \(g \in C^p(\Omega;\mathbb{R}^n)\), we see \(g'(0)\) is an \(n \times d\) matrix, and injectivity ensures that \(\operatorname{rank}(g'(0))=d\).

Every open set \(\Omega \subset \mathbb{R}^d\) is a \(d\)-dimensional submanifold of \(\mathbb{R}^d\) (of \(C^p\)). Suppose now \(v \in \mathbb{R}^d\) is a tangent vector to \(\Omega\) at \(0\) (determined by a curve \(\alpha\)), then by Theorem 3, \(g \circ \alpha\) determines a tangent vector to \(M\) at \(x\), which is \(z_x=g'(0)(v)\). Suppose \((\Lambda,h)\) is another coordinate system centered at \(x\). If we want to obtain \(z_x\) as well, we must have \[h'(0)(w)=g'(0)(v),\] which is equivalent to \[w = (h'(0)^{-1} \circ g'(0))(v)=(h^{-1}\circ g)'(0)(v),\] for some \(w \in \mathbb{R}^d\) which is the tangent vector to \(\Lambda\) at \(0 \in \Lambda\). *(The inverse makes sense since we implicitly restricted ourself to \(\mathbb{R}^d\))*

However, we also have two charts given by \((U,\varphi)=(g(\Omega),g^{-1})\) and \((V,\psi) = (h(\Lambda),h^{-1})\), which gives \[(h^{-1} \circ g)'(0)(v)=[(\psi \circ \varphi^{-1})'(\varphi(x))](v)=w\] and this is just our equivalence relation (don't forget that \(g(0)=x\) hence \(g^{-1}(x)=\varphi(x)=0\)!). There we have our reason for the equivalence relation: If \((U,\varphi,v) \sim (V,\psi,w)\), then \((U,\varphi,v)\) and \((V,\psi,w)\) determine the same tangent vector, but we do not have to evaluate it manually. In general, all elements in an equivalence class represent a single vector, so the vector is (algebraically) an equivalence class. This still holds when talking about Banach manifolds since topological properties of Euclidean spaces do not play a role. The generalized proof can be carried out with little difficulty.
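A numerical sketch may help here. It assumes \(M\) is the unit circle, \(x=(1,0)\), and two coordinate systems centered at \(x\): \(g(t)=(\cos t,\sin t)\) and \(h(s)=(\cos 2s,\sin 2s)\), so that \(h^{-1}\circ g(t)=t/2\) near \(0\). These choices, and the helper names, are illustrative and not taken from the text.

```python
import math


def g(t):
    return (math.cos(t), math.sin(t))


def h(s):
    return (math.cos(2 * s), math.sin(2 * s))


def h_inv_g(t):
    """The transition map h^{-1} o g near 0."""
    return t / 2.0


def d_curve(c, t=0.0, eps=1e-6):
    """Central-difference derivative of a curve in R^2."""
    return tuple((p - q) / (2 * eps) for p, q in zip(c(t + eps), c(t - eps)))


def d_scalar(f, t=0.0, eps=1e-6):
    """Central-difference derivative of a real function."""
    return (f(t + eps) - f(t - eps)) / (2 * eps)


v = 3.0
w = d_scalar(h_inv_g) * v                      # w = (h^{-1} o g)'(0)(v)
z_from_g = tuple(c * v for c in d_curve(g))    # g'(0)(v)
z_from_h = tuple(c * w for c in d_curve(h))    # h'(0)(w)
```

Both triples give approximately the vector \((0,3)\) in \(\mathbb{R}^2\), even though \(v=3\) and \(w=1.5\) differ.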

The tangent vectors at \(x \in M\) should form a vector space (based at \(x\)). We do hope so, because otherwise our definition of tangent vector would be incomplete and could not even cover a trivial example (such as those mentioned at the beginning). We shall show that, satisfyingly, the set of tangent vectors to \(M\) at \(x\) (which we write \(T_xM\)) forms a vector space that is toplinearly isomorphic to \(\mathbf{E}\), on which \(M\) is modeled.

(Theorem 4)\(T_xM \simeq \mathbf{E}\). In other words, \(T_xM\) can be given the structure of topological vector space given by the chart.

*Proof.* Let \((U,\varphi)\) be a chart at \(x\). For \(v \in \mathbf{E}\), the triple \((U,\varphi,v)\) represents a tangent vector at \(x\), which we may write as \((\varphi^{-1})'(\varphi(x))(v)\). On the other hand, pick \(\mathbf{w} \in T_xM\), which can be represented by \((V,\psi,w)\). Then \[v=(\varphi\circ\psi^{-1})'(\psi(x))(w)\] is the unique element making \((U,\varphi,v) \sim (V,\psi,w)\), and therefore we get some \(v \in \mathbf{E}\). To conclude, \[\mathbf{E} \xrightarrow[(\varphi^{-1})'(\varphi(x))]{\simeq}T_xM\] which proves our theorem. Note that the resulting vector space structure does not depend on the choice of chart, since the maps \((\psi\circ\varphi^{-1})'(\varphi(x))\) are toplinear isomorphisms. \(\square\)

For many reasons it is not a good idea to identify \(T_xM\) as \(\mathbf{E}\) without mentioning the point \(x\). For example we shouldn't identify the tangent line of a curve as \(x\)-axis. Instead, it would be better to identify or visualize \(T_xM\) as \((x,\mathbf{E})\), that is, a linear space with origin at \(x\).

Now we treat *all* tangent spaces as a vector bundle. Let \(M\) be a manifold of class \(C^p\) with \(p \geq 1\), define the tangent bundle by the disjoint union \[T(M)=\bigsqcup_{x \in M}T_xM.\] This is a vector bundle if we define the projection by \[\begin{aligned}\pi: T(M) &\to M \\ y \in T_xM &\mapsto x\end{aligned}\] and we will verify it soon. First let's see an example. Below is a visualization of the tangent bundle of \(\frac{x^2}{4}+\frac{y^2}{3}=1\), denoted by red lines:

Also we can see \(\pi\) maps points on the blue line to a point on the curve, which is \(B\).

To show that the tangent bundle of a manifold is a vector bundle, we need to verify that it satisfies the three conditions we mentioned in the previous post. Let \((U,\varphi)\) be a chart of \(M\) such that \(\varphi(U)\) is open in \(\mathbf{E}\), then tangent vectors can be represented by \((U,\varphi,v)\). We get a bijection \[\tau_U:\pi^{-1}(U) = T(U) \to U \times \mathbf{E}\] by definition of tangent vectors as equivalence classes. Let \(z_x\) be a tangent vector to \(U\) at \(x\), then there exists some \(v \in \mathbf{E}\) such that \((U,\varphi,v)\) represents \(z_x\). On the other hand, for any \(v \in \mathbf{E}\) and \(x \in U\), \((U,\varphi,v)\) represents some tangent vector at \(x\). Explicitly, \[\tau_{U}(z_x)=(x,v)=(\pi(z_x),[(\varphi^{-1})'(\varphi(\pi(z_x)))]^{-1}(z_x)).\]

Further, the following diagram commutes (which establishes **VB 1**):

For **VB 2** and **VB 3** we need to check different charts. Let \((U_i,\varphi_i)\), \((U_j,\varphi_j)\) be two charts. Define \(\varphi_{ji}=\varphi_j \circ \varphi_i^{-1}\) on \(\varphi_i(U_i \cap U_j)\), and respectively we write \(\tau_{U_i}=\tau_i\) and \(\tau_{U_j}=\tau_j\). Then we get a transition mapping \[\tau_{ji}=\tau_j \circ \tau_i^{-1}:(U_i \cap U_j) \times \mathbf{E} \to (U_i \cap U_j) \times \mathbf{E}.\]

One can verify that \[\tau_{ji}(x,v)=(\varphi_{ji}(x),D\varphi_{ji}(x) \cdot v)\] for \(x \in U_i \cap U_j\) and \(v \in \mathbf{E}\). Since \(D\varphi_{ji} \in C^{p-1}\) and \(D\varphi_{ji}(x)\) is a toplinear isomorphism, we see \[x \mapsto (\tau_j \circ \tau_i^{-1})_x=(\varphi_{ji}(x),D\varphi_{ji}(x)\cdot(\cdot))\] is a morphism, which goes for **VB 3**. It remains to verify **VB 2**. To do this we need a fact from Banach space theory:

If \(f:U \to L(\mathbf{E},\mathbf{F})\) is a \(C^k\)-morphism, then the map of \(U \times \mathbf{E}\) into \(\mathbf{F}\) given by \[(x,v) \mapsto [f(x)](v)\] is a \(C^k\)-morphism.

Here, we have \(f(x)=\tau_{ji}(x,\cdot)\) and to conclude, \(\tau_{ji}\) is a \(C^{p-1}\)-morphism. It is also an isomorphism since it has an inverse \(\tau_{ij}\). Following the definition of manifold, we can conclude that \(T(U)\) has a unique **manifold structure** such that the \(\tau_i\) are morphisms (there will be a formal proof in the next post about any total space of a vector bundle). By **VB 1**, we also have \(\pi=pr \circ \tau_i\), which makes it a morphism as well. On each fiber \(\pi^{-1}(x)\), we can freely transport the topological vector space structure of any \(\mathbf{E}\) such that \(x\) lies in \(U_i\), by means of \(\tau_{ix}\). Since \(f(x)\) is a toplinear isomorphism, the result is independent of the choice of \(U_i\). **VB 2** is therefore established.

In fancier words, we can also say that \(T:M \mapsto T(M)\) is a **functor** from the category of \(C^p\)-manifolds to the category of vector bundles of class \(C^{p-1}\).

If \(f\) is of \(L^p(\mu)\), which means \(\lVert f \rVert_p=\left(\int_X |f|^p d\mu\right)^{1/p}<\infty\), or equivalently \(\int_X |f|^p d\mu<\infty\), then we may say \(|f|^p\) is of \(L^1(\mu)\). In other words, we have a function \[\begin{aligned}\lambda: L^p(\mu) &\to L^1(\mu) \\ f &\mapsto |f|^p.\end{aligned}\] This function does not have to be one to one due to absolute value. But we hope this function to be *fine* enough, at the very least, we hope it is continuous.

Here, \(f \sim g\) means that \(f-g\) equals \(0\) almost everywhere with respect to \(\mu\). It can be easily verified that this is an equivalence relation.

We still use \(\varepsilon-\delta\) argument but it's in a metric space. Suppose \((X,d_1)\) and \((Y,d_2)\) are two metric spaces and \(f:X \to Y\) is a function. We say \(f\) is continuous at \(x_0 \in X\) if for any \(\varepsilon>0\), there exists some \(\delta>0\) such that \(d_2(f(x_0),f(x))<\varepsilon\) whenever \(d_1(x_0,x)<\delta\). Further, we say \(f\) is continuous on \(X\) if \(f\) is continuous at every point \(x \in X\).

For \(1\leq p<\infty\), we already have a metric given by \[d(f,g)=\lVert f-g \rVert_p\] given that \(d(f,g)=0\) if and only if \(f \sim g\). This metric is complete and makes \(L^p\) a Banach space. But for \(0<p<1\) (yes we are going to cover that), things are quite different, and there is one reason: the Minkowski inequality is reversed! In fact, for nonnegative \(f\) and \(g\), we have \[\lVert f+g \rVert_p \geq \lVert f \rVert_p + \lVert g \rVert_p\] for \(0<p<1\). The \(L^p\) space has many weird properties when \(0<p<1\). Precisely,

For \(0<p<1\), \(L^p(\mu)\) is locally convex if and only if \(\mu\) assumes finitely many values. (Proof.)

On the other hand, if for example \(X=[0,1]\) and \(\mu=m\) is the Lebesgue measure, then \(L^p(\mu)\) has *no* open convex subset other than \(\varnothing\) and \(L^p(\mu)\) itself. However,

A topological vector space \(X\) is normable if and only if its origin has a convex bounded neighbourhood. (See Kolmogorov's normability criterion.)

Therefore \(L^p(m)\) is not normable, hence not Banach.
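The reversed Minkowski inequality mentioned above is easy to see in a toy case. The sketch below assumes the counting measure on a two-point space, so that a function is just a pair of numbers; `norm_p` is an illustrative helper.

```python
def norm_p(f, p):
    """(sum |f_k|^p)^(1/p) for the counting measure on a finite set."""
    return sum(abs(v) ** p for v in f) ** (1 / p)


p = 0.5
f = (1.0, 0.0)
g = (0.0, 1.0)
f_plus_g = tuple(a + b for a, b in zip(f, g))

lhs = norm_p(f_plus_g, p)          # (1 + 1)^(1/p) = 4 when p = 1/2
rhs = norm_p(f, p) + norm_p(g, p)  # 1 + 1 = 2
```

Here `lhs` exceeds `rhs`, so the triangle inequality for \(\lVert\cdot\rVert_p\) fails, which is why a different metric is needed when \(0<p<1\).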

We have gone too far. We need a metric that is fine enough.

*In this subsection we always have \(0<p<1\).*

Define \[\Delta(f)=\int_X |f|^p d\mu\] for \(f \in L^p(\mu)\). We will show that we have a metric by \[d(f,g)=\Delta(f-g).\] Fix \(y\geq 0\), and consider the function \[\phi(x)=(x+y)^p-x^p.\] We have \(\phi(0)=y^p\) and \[\phi'(x)=p(x+y)^{p-1}-px^{p-1} \leq px^{p-1}-px^{p-1}=0\] when \(x > 0\), since \(p-1<0\). Hence \(\phi\) is nonincreasing on \([0,\infty)\), so \(\phi(x) \leq \phi(0)\), which means that \[(x+y)^p \leq x^p+y^p.\] Hence for any \(f\), \(g \in L^p\), since \(|f+g| \leq |f|+|g|\), we have \[\Delta(f+g)=\int_X |f+g|^p d\mu \leq \int_X |f|^p d\mu + \int_X |g|^p d\mu=\Delta(f)+\Delta(g).\] This inequality ensures that \[d(f,g)=\Delta(f-g)\] is a metric. It's immediate that \(d(f,g)=d(g,f) \geq 0\) for all \(f\), \(g \in L^p(\mu)\). For the triangle inequality, note that \[d(f,h)+d(g,h)=\Delta(f-h)+\Delta(h-g) \geq \Delta((f-h)+(h-g))=\Delta(f-g)=d(f,g).\] This is translation-invariant as well since \[d(f+h,g+h)=\Delta(f+h-g-h)=\Delta(f-g)=d(f,g).\] Completeness can be verified in the same way as in the case \(p \geq 1\). In fact, this metric makes \(L^p\) a locally bounded F-space.
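As a sanity check, the triangle inequality for \(d\) can be tested on random data. The sketch assumes a finite measure space with three atoms whose masses are listed in `weights`, so integrals become weighted sums; all names and the sample sizes are illustrative.

```python
import random


def delta(f, p, weights):
    """Delta(f) = integral of |f|^p, here a weighted finite sum."""
    return sum(w * abs(v) ** p for v, w in zip(f, weights))


def dist(f, g, p, weights):
    """d(f, g) = Delta(f - g)."""
    return delta([a - b for a, b in zip(f, g)], p, weights)


random.seed(0)
p = 0.5
weights = [0.2, 1.0, 3.5]


def sample():
    return [random.uniform(-5, 5) for _ in weights]


triples = [(sample(), sample(), sample()) for _ in range(1000)]
# Small tolerance to absorb floating-point rounding.
triangle_ok = all(
    dist(f, g, p, weights)
    <= dist(f, h, p, weights) + dist(h, g, p, weights) + 1e-12
    for f, g, h in triples
)
```

A passing check is of course not a proof; it only illustrates the inequality derived above.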

The metric of \(L^1\) is defined by \[d_1(f,g)=\lVert f-g \rVert_1=\int_X |f-g|d\mu.\] We need to find a relation between \(d_p(f,g)\) and \(d_1(\lambda(f),\lambda(g))\), where \(d_p\) is the metric of the corresponding \(L^p\) space.

As we have proved, \[(x+y)^p \leq x^p+y^p.\] Without loss of generality we assume \(x \geq y\) and therefore \[x^p=(x-y+y)^p \leq (x-y)^p+y^p.\] Hence \[x^p-y^p \leq (x-y)^p.\] By interchanging \(x\) and \(y\), we get \[|x^p-y^p| \leq |x-y|^p.\] Replacing \(x\) and \(y\) with \(|f|\) and \(|g|\) where \(f\), \(g \in L^p\), and using \(\lvert |f|-|g| \rvert \leq |f-g|\), we get \[\int_{X}\lvert |f|^p-|g|^p \rvert d\mu \leq \int_X |f-g|^p d\mu.\] But \[\begin{aligned}d_1(\lambda(f),\lambda(g))&=\int_{X}\lvert |f|^p-|g|^p \rvert d\mu, \\ d_p(f,g)&=\Delta(f-g)=\int_X |f-g|^p d\mu,\end{aligned}\] and we therefore have \[d_1(\lambda(f),\lambda(g)) \leq d_p(f,g).\] Hence \(\lambda\) is continuous (and in fact, Lipschitz continuous and uniformly continuous) when \(0<p<1\).
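The Lipschitz estimate \(d_1(\lambda(f),\lambda(g)) \leq d_p(f,g)\) can be illustrated the same way. The sketch again assumes a finite measure space with atom masses `weights`; the helper names are hypothetical.

```python
import random


def lam(f, p):
    """The map lambda: f -> |f|^p, applied pointwise."""
    return [abs(v) ** p for v in f]


def d1(u, v, weights):
    """L^1 distance as a weighted finite sum."""
    return sum(w * abs(a - b) for a, b, w in zip(u, v, weights))


def dp(f, g, p, weights):
    """The metric d_p(f, g) = integral of |f - g|^p for 0 < p < 1."""
    return sum(w * abs(a - b) ** p for a, b, w in zip(f, g, weights))


random.seed(1)
p = 0.3
weights = [0.5, 2.0, 1.5, 4.0]


def sample():
    return [random.uniform(-10, 10) for _ in weights]


pairs = [(sample(), sample()) for _ in range(1000)]
# Small tolerance to absorb floating-point rounding.
lipschitz_ok = all(
    d1(lam(f, p), lam(g, p), weights) <= dp(f, g, p, weights) + 1e-12
    for f, g in pairs
)
```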

It's natural to think about Minkowski's inequality and Hölder's inequality in this case since they are the key tools for estimates in \(L^p\). You need to think about some examples of how to create the condition to use them and get a fine result. In this section, where \(p \geq 1\), we need to prove that \[|x^p-y^p| \leq p|x-y|(x^{p-1}+y^{p-1})\] for \(x\), \(y \geq 0\). This inequality is surprisingly easy to prove however. We will use nothing but the mean value theorem. Without loss of generality we assume that \(x > y \geq 0\) and define \(f(t)=t^p\). Then \[\frac{f(x)-f(y)}{x-y}=f'(\zeta)=p\zeta^{p-1}\] where \(y < \zeta < x\). But since \(p-1 \geq 0\), we see \(\zeta^{p-1} \leq x^{p-1} \leq x^{p-1}+y^{p-1}\). Therefore \[f(x)-f(y)=x^p-y^p=p(x-y)\zeta^{p-1}\leq p(x-y)(x^{p-1}+y^{p-1}).\] For \(x=y\) both sides vanish.

Therefore \[\begin{aligned}d_1(\lambda(f),\lambda(g)) &= \int_X \left||f|^p-|g|^p\right|d\mu \\ &\leq \int_Xp\left||f|-|g|\right|(|f|^{p-1}+|g|^{p-1})d\mu\end{aligned}\] By *Hölder's inequality* (for \(p>1\), with \(1/p+1/q=1\)), we have \[\begin{aligned}\int_X ||f|-|g||(|f|^{p-1}+|g|^{p-1})d\mu & \leq \left[\int_X \left||f|-|g|\right|^pd\mu\right]^{1/p}\left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^qd\mu\right]^{1/q} \\&\leq \left[\int_X \left|f-g\right|^pd\mu\right]^{1/p}\left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^qd\mu\right]^{1/q} \\&=\lVert f-g \rVert_p \left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^qd\mu\right]^{1/q}.\end{aligned}\] By *Minkowski's inequality*, we have \[\left[\int_X\left(|f|^{p-1}+|g|^{p-1}\right)^qd\mu\right]^{1/q} \leq \left[\int_X|f|^{(p-1)q}d\mu\right]^{1/q}+\left[\int_X |g|^{(p-1)q}d\mu\right]^{1/q}.\] Now things are clear. Since \(1/p+1/q=1\), or equivalently \(1/q=(p-1)/p\), we have \((p-1)q=p\). Suppose \(\lVert f \rVert_p\), \(\lVert g \rVert_p \leq R\); then \[\left[\int_X|f|^{(p-1)q}d\mu\right]^{1/q}+\left[\int_X |g|^{(p-1)q}d\mu\right]^{1/q} = \lVert f \rVert_p^{p-1}+\lVert g \rVert_p^{p-1} \leq 2R^{p-1}.\] Combining the inequalities above, we get \[\begin{aligned}d_1(\lambda(f),\lambda(g)) \leq 2pR^{p-1}\lVert f-g \rVert_p =2pR^{p-1}d_p(f,g)\end{aligned}\] hence \(\lambda\) is continuous.
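The final bound \(d_1(\lambda(f),\lambda(g)) \leq 2pR^{p-1}\lVert f-g \rVert_p\) can be spot-checked numerically. The sketch assumes \(p=3\) and a finite measure space with atom masses `weights`, and takes \(R=\max(\lVert f \rVert_p,\lVert g \rVert_p)\); the helper names are illustrative.

```python
import random


def norm_p(f, p, weights):
    """The L^p norm as a weighted finite sum, for p >= 1."""
    return sum(w * abs(v) ** p for v, w in zip(f, weights)) ** (1 / p)


def d1_of_lambda(f, g, p, weights):
    """d_1(lambda(f), lambda(g)) = integral of | |f|^p - |g|^p |."""
    return sum(
        w * abs(abs(a) ** p - abs(b) ** p) for a, b, w in zip(f, g, weights)
    )


random.seed(2)
p = 3.0
weights = [0.2, 1.0, 3.5]


def sample():
    return [random.uniform(-5, 5) for _ in weights]


def bound_holds(f, g):
    R = max(norm_p(f, p, weights), norm_p(g, p, weights))
    diff = [a - b for a, b in zip(f, g)]
    rhs = 2 * p * R ** (p - 1) * norm_p(diff, p, weights)
    # Small tolerance to absorb floating-point rounding.
    return d1_of_lambda(f, g, p, weights) <= rhs + 1e-9


bound_ok = all(bound_holds(sample(), sample()) for _ in range(1000))
```

Unlike the case \(0<p<1\), the constant here depends on \(R\), so the estimate is only locally Lipschitz.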

We have proved that \(\lambda\) is continuous, and when \(0<p<1\), we have seen that \(\lambda\) is Lipschitz continuous. It's natural to think about its differentiability afterwards, but the absolute value function is not even differentiable so we may have no chance. But this is still a fine enough result. For example we have no restriction to \((X,\mathfrak{M},\mu)\) other than the positivity of \(\mu\). Therefore we may take \(\mathbb{R}^n\) as the Lebesgue measure space here, or we can take something else.

It's also interesting how we use elementary Calculus to solve some much more abstract problems.

Direction matters. For example take a look at this picture (by David Gunderman):

The position of the red ball and black ball shows that this triple of balls turns upside down every time they finish one round. This wouldn't happen if this triple were on a normal band, which can be denoted by \(S^1 \times (0,1)\). What would happen if we try to describe their velocity on the Möbius band, both locally and globally? There must be some significant difference from a normal band. If we set some movement pattern for the balls, for example let them run horizontally or in a zig-zag, we expect to get different *sets* of vectors. Those vectors can span some vector spaces as well.

Here and in the following posts, we will try to develop purely formally certain functorial constructions having to do with vector bundles. It may be overly generalized, but we will offer some examples to make it concrete.

Let \(M\) be a manifold (of class \(C^p\), where \(p \geq 0\) and can be set to \(\infty\)) modeled on a Banach space \(\mathbf{E}\). Let \(E\) be another topological space and \(\pi: E \to M\) a surjective \(C^p\)-morphism. A **vector bundle** is a topological construction associated with \(M\) (base space), \(E\) (total space) and \(\pi\) (bundle projection) such that, roughly speaking, \(E\) is locally a product of \(M\) and \(\mathbf{E}\).

We use \(\mathbf{E}\) instead of \(\mathbb{R}^n\) to include the infinite-dimensional cases. We will try to distinguish finite-dimensional and infinite-dimensional Banach spaces here. There are a lot of things to do, since, for example, an infinite-dimensional Banach space has no countable Hamel basis (this can be proved by using the Baire category theorem), while a finite-dimensional one has a finite basis.

Next we will show precisely how \(E\) locally becomes a product space. Let \(\mathfrak{U}=(U_i)_i\) be an open covering of \(M\), and for each \(i\), suppose that we are *given* a mapping \[\tau_i:\pi^{-1}(U_i)\to U_i \times \mathbf{E}\] satisfying the following three conditions.

**VB 1** \(\tau_i\) is a \(C^p\) diffeomorphism making the following diagram commutative, which amounts to the identity \[pr \circ \tau_i = \pi \quad \text{on } \pi^{-1}(U_i),\]

where \(pr: U_i \times \mathbf{E} \to U_i\) is the projection onto the first component: \((x,y) \mapsto x\). By restricting \(\tau_i\) to the fiber over a point \(x \in U_i\), we obtain an isomorphism on each fiber \(\pi^{-1}(x)\): \[\tau_{ix}:\pi^{-1}(x) \xrightarrow{\simeq} \{x\} \times \mathbf{E}\]

**VB 2** For each pair of open sets \(U_i\), \(U_j \in \mathfrak{U}\) and each \(x \in U_i \cap U_j\), the map \[\tau_{jx} \circ \tau_{ix}^{-1}: \mathbf{E} \to \mathbf{E}\] is a toplinear isomorphism (that is, a linear homeomorphism, so it preserves the structure of \(\mathbf{E}\) as a *topological* vector space).

**VB 3** For any two members \(U_i\), \(U_j \in \mathfrak{U}\), we have the following function to be a \(C^p\)-morphism: \[\begin{aligned}\varphi:U_i \cap U_j &\to L(\mathbf{E},\mathbf{E}) \\ x &\mapsto \left(\tau_j\circ \tau_i^{-1}\right)_x\end{aligned}\]

**REMARKS.** As with manifolds, we call the family of pairs \((U_i,\tau_i)_i\) a **trivializing covering** of \(\pi\), and the \((\tau_i)\) its **trivializing maps**. More precisely, for \(x \in U_i\), we say that \(U_i\) or \(\tau_i\) trivializes at \(x\).

Two trivializing *coverings* for \(\pi\) are said to be **VB-equivalent** if taken together they also satisfy the conditions **VB 2** and **VB 3**. It's immediate that **VB-equivalence** is an equivalence relation and we leave the verification to the reader. It is this VB-equivalence *class* of trivializing coverings that determines a structure of **vector bundle** on \(\pi\). With respect to the Banach space \(\mathbf{E}\), we say that the vector bundle has **fiber** \(\mathbf{E}\), or is **modeled on** \(\mathbf{E}\).

Next we give some motivation for each condition. Each pair \((U_i,\tau_i)\) identifies the part of \(E\) lying over \(U_i\) with the product of 'a part of the manifold' and the model space \(\mathbf{E}\), and on the latter we can talk about directions with ease. This is what **VB 1** tells us. But that is far from enough if we want our vectors to behave well: we also want each fiber of the total space \(E\) to carry a linear structure that does not depend on the chosen trivializing map. **VB 2** ensures exactly this: two different trivializing maps induce the same structure of Banach space on a fiber, up to *equivalent* norms. Concretely, for each point \(x \in U_i \cap U_j\), the fiber \(\pi^{-1}(x)\) (the domain of \(\tau_{ix}\)) can be identified with \(\mathbf{E}\) via \(\tau_{ix}\) or via \(\tau_{jx}\), and the two identifications differ by a toplinear isomorphism. Note that \(\pi^{-1}(x) \subset E\), the total space. In fact, **VB 2** has an equivalent alternative:

**VB 2'** On each fiber \(\pi^{-1}(x)\) we are given a structure of Banach space as follows. For \(x \in U_i\), we have a toplinear isomorphism which is in fact the trivializing map: \[\tau_{ix}:\pi^{-1}(x)=E_x \to \mathbf{E}.\] As stated, **VB 2** implies **VB 2'**. Conversely, if **VB 2'** is satisfied, then for open sets \(U_i\), \(U_j \in \mathfrak{U}\), and \(x \in U_i \cap U_j\), the map \(\tau_{jx} \circ \tau_{ix}^{-1}:\mathbf{E} \to \mathbf{E}\) is a toplinear isomorphism. Hence, we can consider **VB 2** or **VB 2'** as a refinement of **VB 1**.

In the finite-dimensional case, one can omit **VB 3** since it is implied by **VB 2**, and we will prove it below.

(Lemma) Let \(\mathbf{E}\) and \(\mathbf{F}\) be two finite dimensional Banach spaces. Let \(U\) be open in some Banach space. Let \[f:U \times \mathbf{E} \to \mathbf{F}\] be a \(C^p\)-morphism such that for each \(x \in U\), the map \[f_x: \mathbf{E} \to \mathbf{F}\] given by \(f_x(v)=f(x,v)\) is a linear map. Then the map of \(U\) into \(L(\mathbf{E},\mathbf{F})\) given by \(x \mapsto f_x\) is a \(C^p\)-morphism.

**PROOF.** Since \(L(\mathbf{E},\mathbf{F})=L(\mathbf{E},\mathbf{F_1}) \times L(\mathbf{E},\mathbf{F_2}) \times \cdots \times L(\mathbf{E},\mathbf{F_n})\) where \(\mathbf{F}=\mathbf{F_1} \times \cdots \times \mathbf{F_n}\), and similarly for a decomposition of \(\mathbf{E}\), by induction on the dimensions of \(\mathbf{F}\) and \(\mathbf{E}\) it suffices to assume that \(\mathbf{E}\) and \(\mathbf{F}\) are toplinearly isomorphic to \(\mathbb{R}\). In that case, the function \(f(x,v)\) can be written \(g(x)v\) for some \(g:U \to \mathbb{R}\). Since \(f\) is a morphism in the pair \((x,v)\), it is also a morphism as a function of \(x\) alone for each fixed \(v\). Putting \(v=1\) shows that \(g\) is a morphism, which settles the case where both \(\mathbf{E}\) and \(\mathbf{F}\) have dimension \(1\), and the proof is completed by induction. \(\blacksquare\)

To show that **VB 3** is implied by **VB 2**, put \(\mathbf{E}=\mathbf{F}\) as in the lemma. Note that \(\tau_j \circ \tau_i^{-1}\) maps \((U_i \cap U_j) \times \mathbf{E}\) to itself; composing with the projection onto \(\mathbf{E}\) gives a \(C^p\)-morphism \((U_i \cap U_j) \times \mathbf{E} \to \mathbf{E}\). The set \(U_i \cap U_j\) is open (and may be identified with an open subset of a Banach space through a chart of \(M\)), and for each \(x \in U_i \cap U_j\), the map \((\tau_j \circ \tau_i^{-1})_x=\tau_{jx} \circ \tau_{ix}^{-1}\) is toplinear, hence linear. Then the fact that \(\varphi\) is a morphism follows from the lemma.

Let \(M\) be any \(n\)-dimensional smooth manifold that you are familiar with, then \(pr:M \times \mathbb{R}^n \to M\) is actually a vector bundle. Here the total space is \(M \times \mathbb{R}^n\), the base is \(M\), and the bundle projection \(pr\) is simply the projection onto the first factor. Intuitively, a point of the total space consists of a point \(x \in M\) together with a direction in \(\mathbb{R}^n\), hence a *vector*.

We need to verify the three conditions carefully. Let \((U_i,\varphi_i)_i\) be any atlas of \(M\), and let \(\tau_i\) be the identity map on \(U_i \times \mathbb{R}^n\) (which is naturally of class \(C^p\)). We claim that \((U_i,\tau_i)_i\) satisfies the three conditions, thus we get a vector bundle.

For **VB 1** things are clear: since \(pr^{-1}(U_i)=U_i \times \mathbb{R}^n\), the diagram is commutative (both ways around it are the projection onto \(U_i\)). Each fiber \(pr^{-1}(x)\) is \(\{x\} \times \mathbb{R}^n\), and \(\tau_{jx} \circ \tau_{ix}^{-1}\) is the identity map of \(\mathbb{R}^n\) with its Euclidean topology, which is a toplinear isomorphism, hence **VB 2** is verified. Since \(\mathbb{R}^n\) is finite-dimensional, **VB 3** follows from **VB 2** by the lemma, so we have no need to verify it separately.

First of all, imagine you have embedded a circle into a Möbius band. Now we try to give some formal definition. As with quotient topology, \(S^1\) can be defined as \[S^1=I/\sim_1,\]

where \(I\) is the unit interval and \(0 \sim_1 1\) (identifying two ends). On the other hand, the infinite Möbius band can be defined by \[B= (I \times \mathbb{R})/\sim_2\] where \((0,v) \sim_2 (1,-v)\) for all \(v \in \mathbb{R}\) (not only identifying the two ends of \(I\) but also 'flipping' the vertical line). Then all we need is the natural projection onto the first component: \[\pi:B \to S^1.\] The verification differs little from that of the trivial bundle. The quotient topology follows naturally in this case, but things might be troublesome if we restrict ourselves to subsets of \(\mathbb{R}^n\).
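To see where the Möbius band differs from the trivial bundle, one can write down two trivializing maps explicitly. The sketch below is only an illustration under our own choices: the cover consists of \(U_0\) (the circle minus the glued point) and \(U_1\) (the circle minus the point \(t=1/2\)), and the names `tau0`, `tau1` and `transition` are hypothetical. The transition map \(\tau_{1x} \circ \tau_{0x}^{-1}\) is multiplication by \(+1\) on one component of \(U_0 \cap U_1\) and by \(-1\) on the other, so it is a linear isomorphism of \(\mathbb{R}\) that is locally constant in \(x\).

```python
def tau0(t, v):
    """Trivialization over U0 = S^1 minus the glued point (0 < t < 1)."""
    assert 0 < t < 1
    return (t, v)

def tau1(t, v):
    """Trivialization over U1 = S^1 minus the point t = 1/2.

    Representatives use t in [0, 1] with t != 1/2. The sign flip on the
    right half makes the map respect the gluing (0, v) ~ (1, -v).
    """
    assert 0 <= t <= 1 and t != 0.5
    if t < 0.5:
        return (t, v)
    return (t % 1.0, -v)  # t = 1 is the same base point as t = 0

def transition(t):
    """The fiber map tau1_x o tau0_x^{-1}, written as a 1x1 matrix entry."""
    assert 0 < t < 1 and t != 0.5
    return 1.0 if t < 0.5 else -1.0

# The two representatives of one glued point have the same image under tau1.
print(tau1(0.0, 2.5) == tau1(1.0, -2.5))
# The transition is +1 on the left overlap and -1 on the right overlap.
print(transition(0.25), transition(0.75))
```

No choice of trivializations can make both transition values equal to \(+1\) here, which is one way to express that the band is not the trivial bundle; proving that requires more than this sketch.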

The trivial bundle of the first example is relatively rare in many senses. By \(S^n\) we mean the set in \(\mathbb{R}^{n+1}\) with \[S^n=\{(x_0,x_1,\dots,x_n):x_0^2+x_1^2+\cdots+x_n^2=1\}\] and the tangent bundle can be defined by \[TS^n=\{(\mathbf{x},\mathbf{y}):\langle\mathbf{x},\mathbf{y}\rangle=0\} \subset S^{n} \times\mathbb{R}^{n+1},\] where, of course, \(\mathbf{x} \in S^n\) and \(\mathbf{y} \in \mathbb{R}^{n+1}\). The vector bundle is given by \(pr:TS^n \to S^n\) where \(pr\) is the projection onto the first factor. This total space carries more geometric information than \(M \times \mathbb{R}^n\) in the first example: each point \(\mathbf{x}\) of the manifold is now associated with the *tangent space* \(T_{\mathbf{x}}(S^n)=\{\mathbf{y}:\langle\mathbf{x},\mathbf{y}\rangle=0\}\) at this point.
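The defining condition \(\langle\mathbf{x},\mathbf{y}\rangle=0\) is easy to experiment with. The following sketch, with hypothetical helper names `project_to_tangent` and `in_tangent_bundle`, projects an arbitrary vector of \(\mathbb{R}^{n+1}\) orthogonally onto the fiber over \(\mathbf{x}\) and checks, up to floating-point tolerance, that the fiber is closed under linear combinations, as a fiber of a vector bundle should be.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a nonzero vector to unit length, giving a point of the sphere."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def project_to_tangent(x, y):
    """Orthogonal projection of y onto the fiber {y : <x, y> = 0} over x."""
    c = dot(x, y)
    return [b - c * a for a, b in zip(x, y)]

def in_tangent_bundle(x, y, tol=1e-12):
    """Check <x, x> = 1 and <x, y> = 0 up to a tolerance."""
    return abs(dot(x, x) - 1.0) < tol and abs(dot(x, y)) < tol

x = normalize([1.0, 2.0, 2.0])                 # a point of S^2
t1 = project_to_tangent(x, [1.0, 0.0, 0.0])    # two tangent vectors at x
t2 = project_to_tangent(x, [0.0, 1.0, 0.0])
combo = [2.0 * a - 3.0 * b for a, b in zip(t1, t2)]
print(in_tangent_bundle(x, t1), in_tangent_bundle(x, combo))
```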

More generally, we can define it in any Hilbert space \(H\), for example, \(L^2\) space: \[TS=\{(x,y):\langle x , y \rangle=0\} \subset S \times H\] where \[S=\{x:\langle x , x \rangle = 1\}.\] The projection is natural: \[\begin{aligned}\pi: TM &\to M \\T_x(M) & \mapsto x\end{aligned}\] But we will not cover the verification in this post since it is required to introduce the abstract definition of tangent vectors. This will be done in the following post.

We want to study those 'vectors' associated to some manifold both globally and locally. For example, we may want to describe the tangent line of some curve at some point without heavily using elementary calculus. Also, we may want to describe the vector bundle of a manifold globally: for example, when will we have a trivial one? Can we classify the manifold using the behavior of the bundle? Can we make it a little more abstract, for example, by considering the isomorphism classes of bundles? How does one bundle *transform* to another? But to do this we need a large number of definitions and propositions.

We can define several relations between two norms. Suppose we have a vector space \(X\) and two norms \(\lVert \cdot \rVert_1\) and \(\lVert \cdot \rVert_2\) on it. One says \(\lVert \cdot \rVert_1\) is *weaker* than \(\lVert \cdot \rVert_2\) if there is \(K>0\) such that \(\lVert x \rVert_1 \leq K \lVert x \rVert_2\) for all \(x \in X\). Two norms are *equivalent* if each is weaker than the other (trivially this is an equivalence relation). The idea of stronger and weaker norms is related to the idea of "finer" and "coarser" topologies in the setting of topological spaces.
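A familiar finite-dimensional instance: on \(\mathbb{R}^n\) we have \(\lVert x \rVert_\infty \leq \lVert x \rVert_1 \leq n\lVert x \rVert_\infty\), so the maximum norm and the sum norm are each weaker than the other, hence equivalent. The sketch below spot-checks these constants on random samples; the function names are ours, and a sample check can only refute a candidate constant \(K\), never prove it.

```python
import random

def norm_1(x):
    return sum(abs(t) for t in x)

def norm_inf(x):
    return max(abs(t) for t in x)

def is_weaker_on_samples(norm_a, norm_b, K, samples, tol=1e-12):
    """Check norm_a(x) <= K * norm_b(x) on the samples (a necessary test only)."""
    return all(norm_a(x) <= K * norm_b(x) + tol for x in samples)

rng = random.Random(0)
n = 4
samples = [[rng.uniform(-5, 5) for _ in range(n)] for _ in range(200)]

print(is_weaker_on_samples(norm_inf, norm_1, 1, samples))   # K = 1
print(is_weaker_on_samples(norm_1, norm_inf, n, samples))   # K = n
# K = 1 fails in the other direction, e.g. at the vector (1, 1, 1, 1).
print(is_weaker_on_samples(norm_1, norm_inf, 1, [[1.0] * n]))
```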

So what about convergence of sequences? Unsurprisingly this can be settled with elementary \(\varepsilon\)-\(N\) arguments. Suppose \(\lVert \cdot \rVert_2\) is weaker than \(\lVert \cdot \rVert_1\), say \(\lVert x \rVert_2 \leq K \lVert x \rVert_1\) for all \(x \in X\), and \(\lVert x_n - x \rVert_1 \to 0\) as \(n \to \infty\). Then for every \(\varepsilon>0\) we have \[\lVert x_n - x \rVert_2 \leq K \lVert x_n-x \rVert_1 < K\varepsilon\]

for all large enough \(n\). Hence \(\lVert x_n - x \rVert_2 \to 0\) as well. But what about the converse? We introduce a new relation between norms.

(Definition) Two norms \(\lVert \cdot \rVert_1\) and \(\lVert \cdot \rVert_2\) of a topological vector space are **compatible** if given that \(\lVert x_n - x \rVert_1 \to 0\) and \(\lVert x_n - y \rVert_2 \to 0\) as \(n \to \infty\), we have \(x=y\).

By the uniqueness of limit, we see if two norms are equivalent, then they are compatible. And surprisingly, with the help of the closed graph theorem we will discuss in this post, we have

(Theorem 1) If \(\lVert \cdot \rVert_1\) and \(\lVert \cdot \rVert_2\) are compatible, and both \((X,\lVert\cdot\rVert_1)\) and \((X,\lVert\cdot\rVert_2)\) are Banach, then \(\lVert\cdot\rVert_1\) and \(\lVert\cdot\rVert_2\) are equivalent.

This result looks natural but does not seem easy to prove, since one finds no obvious way to build a bridge between a statement about limits and a general inequality. But before that, we need to introduce some terminology.

(Definition) For \(f:X \to Y\), the **graph** of \(f\) is defined by \[G(f)=\{(x,f(x)) \in X \times Y:x \in X\}.\]

If both \(X\) and \(Y\) are topological spaces, and the topology of \(X \times Y\) is the usual one, that is, the smallest topology that contains all sets \(U \times V\) where \(U\) and \(V\) are open in \(X\) and \(Y\) respectively, and if \(f: X \to Y\) is continuous, it is natural to expect \(G(f)\) to be closed. For example, by taking \(f(x)=x\) and \(X=Y=\mathbb{R}\), one would expect the diagonal line of the plane to be closed.

(Definition) A topological vector space \((X,\tau)\) is an \(F\)-space if \(\tau\) is induced by a complete invariant metric \(d\). Here invariant means that \(d(x+z,y+z)=d(x,y)\) for all \(x,y,z \in X\).

A Banach space is easily verified to be an \(F\)-space by defining \(d(x,y)=\lVert x-y \rVert\).

(Open mapping theorem) See this post

In metric spaces, closed sets are exactly the sequentially closed ones, so for maps between metric spaces (which is all we need later) we have a practical criterion on whether \(G(f)\) is closed.

(Proposition 1) \(G(f)\) is closed if and only if, for any sequence \((x_n)\) such that the limits \[x=\lim_{n \to \infty}x_n \quad \text{ and }\quad y=\lim_{n \to \infty}f(x_n)\] exist, we have \(y=f(x)\).

In this case, we say \(f\) is closed. For continuous functions, things are trivial.

(Proposition 2) If \(X\) and \(Y\) are two topological spaces and \(Y\) is Hausdorff, and \(f:X \to Y\) is continuous, then \(G(f)\) is closed.

*Proof.* Let \(G^c\) be the complement of \(G(f)\) with respect to \(X \times Y\). Fix \((x_0,y_0) \in G^c\), so that \(y_0 \neq f(x_0)\). By the Hausdorff property of \(Y\), there exist open subsets \(U \subset Y\) and \(V \subset Y\) such that \(y_0 \in U\), \(f(x_0) \in V\) and \(U \cap V = \varnothing\). Since \(f\) is continuous, \(W=f^{-1}(V)\) is open in \(X\), and \(x_0 \in W\). We obtain an open neighborhood \(W \times U\) of \((x_0,y_0)\) which has empty intersection with \(G(f)\): if \((x,f(x)) \in W \times U\), then \(f(x) \in V \cap U\), which is impossible. This is to say, every point of \(G^c\) has an open neighborhood contained in \(G^c\), hence is an interior point. Therefore \(G^c\) is open, which is to say that \(G(f)\) is closed. \(\square\)

**REMARKS.** For \(X \times Y=\mathbb{R} \times \mathbb{R}\), we have a simple visualization. For \(\varepsilon>0\), there exists some \(\delta\) such that \(|f(x)-f(x_0)|<\varepsilon\) whenever \(|x-x_0|<\delta\). For \(y_0 \neq f(x_0)\), pick \(\varepsilon\) such that \(0<\varepsilon<\frac{1}{2}|f(x_0)-y_0|\), we have two boxes (\(CDEF\) and \(GHJI\) on the picture), namely \[B_1=\{(x,y):x_0-\delta<x<x_0+\delta,f(x_0)-\varepsilon<y<f(x_0)+\varepsilon\}\] and \[B_2=\{(x,y):x_0-\delta<x<x_0+\delta,y_0-\varepsilon<y<y_0+\varepsilon\}.\] In this case, \(B_2\) will not intersect the graph of \(f\), hence \((x_0,y_0)\) is an interior point of \(G^c\).

The Hausdorff property of \(Y\) is not removable. To see this, since \(X\) has no restriction, it suffices to take \(Y=X\) and look at \(X \times X\). Let \(f\) be the identity map (which is continuous); the graph \[G(f)=\{(x,x):x \in X\}\] is the diagonal. Suppose \(X\) is not Hausdorff. By definition, there exist distinct points \(x\) and \(y\) such that every neighborhood of \(x\) meets every neighborhood of \(y\). Then \((x,y) \in G^c\), but every basic neighborhood \(U \times V\) of \((x,y)\) in \(X \times X\) contains a point \((z,z)\) with \(z \in U \cap V\), so \((x,y)\) is *not* an interior point of \(G^c\). Hence \(G^c\) is not open, and \(G(f)\) is not closed.
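For a finite space the whole product topology can be enumerated, so this failure can be checked by brute force. The sketch below uses the two-point Sierpiński space (open sets \(\varnothing\), \(\{1\}\), \(\{0,1\}\)) as a non-Hausdorff example of our own choosing and compares it with the discrete two-point space; the helper names are hypothetical.

```python
from itertools import product

def product_open_sets(opens):
    """All unions of basic boxes U x V: the product topology of a finite space
    with itself, given the list of its open sets."""
    boxes = [frozenset(product(U, V)) for U in opens for V in opens]
    topology = {frozenset()}
    for box in boxes:
        # Adding every union with the new box closes the family under unions.
        topology |= {t | box for t in topology}
    return topology

def diagonal_is_closed(points, opens):
    """The diagonal is closed exactly when its complement is open."""
    complement = frozenset((x, y) for x in points for y in points if x != y)
    return complement in product_open_sets(opens)

points = [0, 1]
sierpinski = [frozenset(), frozenset({1}), frozenset({0, 1})]
discrete = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]

print(diagonal_is_closed(points, sierpinski))  # non-Hausdorff space
print(diagonal_is_closed(points, discrete))    # Hausdorff space
```

In the Sierpiński space, every open set of the product containing \((0,1)\) also contains \((1,1)\), which is the finite shadow of the argument above.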

Also, in the same spirit (polynomials are continuous and the target is Hausdorff), every affine algebraic variety in \(\mathbb{C}^n\) and \(\mathbb{R}^n\) is closed with respect to the Euclidean topology. Further, we have the Zariski topology \(\mathcal{Z}\), obtained by declaring that, if \(V\) is an affine algebraic variety, then \(V^c \in \mathcal{Z}\). It's worth noting that \(\mathcal{Z}\) is *not* Hausdorff (example?) and is in fact much coarser than the Euclidean topology, although an affine algebraic variety is closed in both the Zariski topology and the Euclidean topology.

Once we have proved the next theorem, we will be able to prove the theorem about compatible norms. We shall assume that both \(X\) and \(Y\) are \(F\)-spaces, since the norm plays no critical role here. This offers greater generality and should not be considered an abuse of abstraction.

(The Closed Graph Theorem) Suppose

- \(X\) and \(Y\) are \(F\)-spaces,
- \(f:X \to Y\) is linear,
- \(G(f)\) is closed in \(X \times Y\).

Then \(f\) is continuous.

In short, the closed graph theorem gives a sufficient condition to claim the continuity of \(f\) (keep in mind, linearity does not imply continuity). If \(f:X \to Y\) is continuous, then \(G(f)\) is closed; if \(G(f)\) is closed and \(f\) is linear, then \(f\) is continuous.

*Proof.* First of all we should make \(X \times Y\) an \(F\)-space by assigning addition, scalar multiplication and metric. Addition and scalar multiplication are defined componentwise in the nature of things: \[\alpha(x_1,y_1)+\beta(x_2,y_2)=(\alpha x_1+\beta x_2,\alpha y_1 + \beta y_2).\] The metric can be defined without extra effort: \[d((x_1,y_1),(x_2,y_2))=d_X(x_1,x_2)+d_Y(y_1,y_2).\] Then it can be verified that \(X \times Y\) is a topological vector space whose topology is induced by the complete translation-invariant metric \(d\), that is, an \(F\)-space. (Potentially the verifications will be added in the future but it's recommended to do it yourself.)

Since \(f\) is linear, the graph \(G(f)\) is a subspace of \(X \times Y\). Next we quote an elementary result in point-set topology: a subset of a complete metric space is closed if and only if it's complete. Since \(G(f)\) is closed, it is complete, and the restriction of \(d\) to \(G(f)\) is still translation-invariant, so \(G(f)\) is an \(F\)-space as well. Let \(p_1: X \times Y \to X\) and \(p_2: X \times Y \to Y\) be the natural projections respectively (for example, \(p_1(x,y)=x\)). Our proof is done by verifying the properties of \(p_1\) and \(p_2\) on \(G(f)\).

*For simplicity one can simply define \(p_1\) on \(G(f)\) instead of the whole space \(X \times Y\), but we make it a global projection on purpose to emphasize the difference between global properties and local properties. One can also write \(p_1|_{G(f)}\) to dodge confusion.*

**Claim 1.** \(p_1\) (with restriction on \(G(f)\)) defines an isomorphism between \(G(f)\) and \(X\).

For \(x \in X\), we see \(p_1(x,f(x)) = x\) (surjectivity). If \(p_1(x,f(x))=0\), we see \(x=0\) and therefore \((x,f(x))=(0,0)\), hence the restriction of \(p_1\) on \(G\) has trivial kernel (injectivity). Further, it's trivial that \(p_1\) is linear.

**Claim 2.** \(p_1\) is continuous on \(G(f)\).

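By the definition of the metric on \(X \times Y\), for any two points \((x_1,y_1)\) and \((x_2,y_2)\) we have \[d_X(p_1(x_1,y_1),p_1(x_2,y_2))=d_X(x_1,x_2) \leq d_X(x_1,x_2)+d_Y(y_1,y_2)=d((x_1,y_1),(x_2,y_2)).\] Hence \(p_1\) is continuous on \(X \times Y\), and in particular on \(G(f)\). Note that we cannot argue that \(f(x_n) \to f(x)\) whenever \(x_n \to x\): that is the continuity of \(f\), which is exactly what we are trying to prove.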

**Claim 3.** \(p_1\) is a homeomorphism with restriction on \(G(f)\).

We already know that \(G(f)\) is an \(F\)-space, and so is \(X\). The image \(p_1(G(f))=X\) is of the second category in \(X\) (by the Baire category theorem, since \(X\) is a complete metric space), and \(p_1\) is continuous and linear on \(G(f)\). By the open mapping theorem, \(p_1\) is an open mapping on \(G(f)\); being also one-to-one by Claim 1, it is a homeomorphism.

**Claim 4.** \(p_2\) is continuous.

This follows in the same way as Claim 2, since \(d_Y(y_1,y_2) \leq d_X(x_1,x_2)+d_Y(y_1,y_2)=d((x_1,y_1),(x_2,y_2))\); no property of \(f\) is needed.

Now things are immediate once one realizes that \(f=p_2 \circ p_1|_{G(f)}^{-1}\), and hence \(f\) is continuous. \(\square\)

Before we go for theorem 1 at the beginning, we drop an application on Hilbert spaces.

Let \(T\) be a bounded operator on the Hilbert space \(L_2([0,1])\) so that if \(\phi \in L_2([0,1])\) is a continuous function so is \(T\phi\). Then the restriction of \(T\) to \(C([0,1])\) is a bounded operator of \(C([0,1])\).

For details please check this.

Now we go for the identification of norms. Define \[\begin{aligned}f:(X,\lVert\cdot\rVert_1) &\to (X,\lVert\cdot\rVert_2) \\ x &\mapsto x\end{aligned}\] i.e. the identity map between two Banach spaces (hence \(F\)-spaces). Then \(f\) is linear. We need to prove that \(G(f)\) is closed. Suppose \((x_n)\) is a sequence such that \[\lim_{n \to \infty}\lVert x_n -x \rVert_1=0 \quad \text{and} \quad \lim_{n \to \infty} \lVert f(x_n)-y \rVert_2=\lim_{n \to \infty}\lVert x_n -y\rVert_2=0\] for some \(x,y \in X\). Since the two norms are compatible, we have \(x=y\), that is, \(y=f(x)\). By Proposition 1, \(G(f)\) is closed. Therefore, by the closed graph theorem, \(f\) is continuous, hence bounded: we have some \(K\) such that \[\lVert x \rVert_2 =\lVert f(x) \rVert_2 \leq K \lVert x \rVert_1.\] By defining \[\begin{aligned}g:(X,\lVert\cdot\rVert_2) &\to (X,\lVert\cdot\rVert_1) \\ x &\mapsto x\end{aligned}\] we see by the same argument that \(g\) is continuous as well, hence we have some \(K'\) such that \[\lVert x \rVert_1 =\lVert g(x) \rVert_1 \leq K'\lVert x \rVert_2.\] Hence each norm is weaker than the other, that is, the two norms are equivalent.

Since there is no strong reason to write more posts on this topic, i.e. the three fundamental theorems of linear functional analysis, I think it's time to make a list of the series. It's been around half a year.

- The Big Three Pt. 1 - Baire Category Theorem Explained
- The Big Three Pt. 2 - The Banach-Steinhaus Theorem
- The Big Three Pt. 3 - The Open Mapping Theorem (Banach Space)
- The Big Three Pt. 4 - The Open Mapping Theorem (F-Space)
- The Big Three Pt. 5 - The Hahn-Banach Theorem (Dominated Extension)
- The Big Three Pt. 6 - Closed Graph Theorem with Applications

- Walter Rudin, *Functional Analysis*
- Peter Lax, *Functional Analysis*
- Jesús Gil de Lamadrid, *Some Simple Applications of the Closed Graph Theorem*