- The FT is a global operator. You don’t always want this full globality. So, it is natural to ask for more local transforms, ie. ones where one coefficient only depends on a small part of your function, not on its entirety, so that, by manipulating the coefficients of your transform, you can make local changes to your function instead of being forced to make global changes. Think about the low-pass filter: it removed the high frequencies/the details, but it also introduced some “ringing” because of these global changes. The wavelet transform is a more localized transform.
- So we want something that is more localized than the FT.
- conditions:
- $f,\psi\in L^2$
- $f$: the image
- $\psi$: encodes how you want to transform
- $a>0$ (scale parameter)
- $b\in\mathbb{R}$ (spatial parameter)
- then, the wavelet transform of $f$ is \(L_{\psi}f(a,b) = \int_\mathbb{R} f(x)\boxed{\frac{1}{\sqrt{a}}\psi\left(\frac{x-b}{a}\right)}dx\)
- like a convolution, see (4.2 (ii) (B))
- we are integrating our image $f$ and we are weighting it by $\frac{1}{\sqrt{a}}\psi$. And what we do with our $\psi$ is: we shift $\psi$ to a certain position $b$, ie. we shift it over the signal to wherever we want to have it, and we scale $\psi$ with $a$ (and normalize with $\frac{1}{\sqrt{a}}$). So this $\frac{1}{\sqrt{a}}$ and $a$ is exactly the change of variables that “zooms in” or “out” to grab some information from your signal.
- Eg. if you would think that $\psi$ is just the characteristic function of $\left[0,1\right]$, then it would give you a mean value on $\left[0,1\right]$, if $a=1$ and $b=0$,
- and if you now change $b$ you would take different mean values because you are shifting this $\psi$
- and with changing this $a$ you would be able to do mean values or some accumulation over smaller or larger intervals depending on how you choose this $a$.
- So this somehow samples … (interrupted by question)
- Q&A: in case $\psi$ does not have compact support then it is still global, not local! So, finding good $\psi$s is a very important question. For now we just know (vaguely) that we have some $\psi$. We will need to find out what suitable $\psi$s are and what they should fulfill.
- But, in general, you can have this image and you go to a certain position of your signal with $b$ and you choose a certain zoom factor $a$ and then you somehow aggregate values in there.
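code sketch (my addition, not from the lecture; the grid, the test signal and the use of the Gaussian-derivative wavelet from (4.7 (i)) are my own assumptions): evaluating (4.1) by a simple Riemann sum, to make the roles of $a$ (zoom) and $b$ (position) concrete:

```python
import numpy as np

# Sketch: evaluate the continuous wavelet transform (4.1) by a Riemann sum.
# psi(t) = t*exp(-t^2/2) is the 1st derivative of the Gaussian (see (4.7 (i))).
x = np.linspace(-20, 20, 4001)
dx = x[1] - x[0]
f = np.exp(-(x - 2)**2)                  # test "image": a bump centered at 2
psi = lambda t: t * np.exp(-t**2 / 2)

def L(a, b):
    """L_psi f(a,b) = int f(x) (1/sqrt(a)) psi((x-b)/a) dx, with a>0, b real."""
    return np.sum(f * psi((x - b) / a) / np.sqrt(a)) * dx

print(L(1.0, 1.0))    # on the flank of the bump: noticeable response
print(L(1.0, 2.0))    # at the symmetric center: ~0 (psi is odd)
print(L(1.0, 15.0))   # far from the bump: ~0 (locality, if psi decays)
```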
4.2 FT vs WT, Representations of $L_{\psi}f$
- (i) $L_{\psi}$ maps $f:\mathbb{R}\to\mathbb{C}$ to $L_{\psi}f:(0,\infty)\times\mathbb{R}\to\mathbb{R}$
- in words: the result of the WT is a 2D function (2-dimensional domain for $L_{\psi}f$)
- recall: the result of the FT is the 1D function, ie. $\mathcal{F}$ maps $f:\mathbb{R}\to\mathbb{C}$ to $\mathcal{F}f: \mathbb{R}\to\mathbb{C}$ (domain and codomain are the same for $f$ and $\mathcal{F}f$)
- 2D: because you can evaluate it for different $a$s and $b$s, not only for different frequencies.
- We increase from a 1D domain to a 2D domain which already implies that this representation (i) somehow has to be overcomplete. So, this has to have redundant information because it cannot be that we need a 2D function to describe a 1D function. The 2D function has much more power to store information.
- (ii) 2 representations of $L_{\psi}f$ (equivalent to (4.1)):
- (A) $L_{\psi}f$ as scalar product: $L_{\psi}f(a,b)=\frac{1}{\sqrt{a}}(f,T_{-b}D_{\frac{1}{a}}\psi)_{L^2}$
- (B) $L_{\psi}f$ as convolution: $L_{\psi}f(a,b)=\frac{1}{\sqrt{a}}(f\ast D_{-\frac{1}{a}}\psi)(b)$
- just a side remark: (B) is related to (2.16): $u(x,\sigma)=(g_{\sqrt{2\sigma}}\ast f)(x)$, where we also filtered with kernels of different sizes, so there is some relation
- depending on the circumstances one representation will be more useful than the other
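code sketch (my addition; grid and test function are assumptions): representation (B) is what you use in practice, because one discrete convolution evaluates $L_\psi f(a,\cdot)$ for all $b$ on the grid at once:

```python
import numpy as np

# Sketch: representation (B) of the WT as a convolution. On a symmetric grid
# of odd length, np.convolve(..., 'same') approximates (f * D_{-1/a}psi)(b).
x = np.linspace(-20, 20, 4001)            # symmetric, odd length, x[2000] = 0
dx = x[1] - x[0]
f = np.exp(-(x - 2)**2)
psi = lambda t: t * np.exp(-t**2 / 2)

a = 0.5
h = psi(-x / a)                           # samples of (D_{-1/a}psi)(x) = psi(-x/a)
L_all_b = np.convolve(f, h, mode='same') * dx / np.sqrt(a)      # rep. (B)

j = 2300                                  # check one b = x[j] against (4.1)
L_direct = np.sum(f * psi((x - x[j]) / a)) * dx / np.sqrt(a)
print(L_all_b[j], L_direct)               # agree up to discretization error
```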
- (iii) Where does $L_{\psi}f$ map? Just a variant of $L^2$ on $\left[0,\infty\right)\times\mathbb{R}$: \(L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2}):=\left\{F:[0,\infty)\times\mathbb{R}\to\mathbb{R}\ \mu-\text{measurable}:\int_\mathbb{R}\int_0^{\infty}\lvert F(a,b)\rvert^2\frac{dadb}{a^2} < \infty\right\}\)
- notation: this $\frac{dadb}{a^2}$ in $L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})$ is not a set into which we map (like eg. in the $L^2(\mathbb{R}^d,\mathbb{C})$ notation, where we map from $\mathbb{R}^d$ to $\mathbb{C}$), but it tells us that we need to change the integration measure $dadb$ by rescaling it with $\frac{1}{a^2}$. (just Berkels’ notation)
- everything except the $a^2$ scaling would be “normal Lebesgue”, in particular, $\int_\mathbb{R}\int_0^{\infty}\lvert F(a,b)\rvert^2 dadb < \infty$ would be “normal Lebesgue”, ie. the square integral is finite
- we weight the integral $\int_\mathbb{R}\int_0^{\infty}\lvert F(a,b)\rvert^2 dadb$ by $\frac{1}{a^2}$: that means for large $a$ the contribution is decreased because we multiply by sth that is small, but for small $a$ it is strongly increased. So, this weight has a singularity at $a=0$ which puts certain requirements on the transform $F$: $\lvert F(a,b)\rvert^2$ has to vanish at $a=0$ (otherwise $\int_\mathbb{R}\int_0^{\infty}\lvert F(a,b)\rvert^2 \frac{dadb}{a^2}$ would not be finite) and even a bit more.
- (iv) equipped with the same equivalence relation used for the Lebesgue spaces (B.3) and scalar product \((F,G)_{L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})}:=\int_\mathbb{R}\int_0^{\infty} F(a,b)G(a,b)\frac{dadb}{a^2}\)
- the scalar product looks like the normal scalar product for $L^2$, but we add the weight $\frac{1}{a^2}$ (which we also used for the new $L^2$ space that we just defined in (iii))
- problem: so far it is unclear why this $a^2$ is sitting there, but in the next proposition we will see why we have it and how it relates to the wavelet transform
- conditions:
- $\psi\in L^2$ s.t. $0<c_\psi:=2\pi\int_{0}^{\infty}\frac{\lvert \hat{\psi}(\omega)\rvert^2}{\omega}d\omega<\infty$ (warning: these are 3 conditions!)
- where $\hat{\psi}$ is the FT of $\psi$
- then,
- (i) $L_\psi:L^2\to L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})$
- this means that the functions that we get from this transform are not in the normal $L^2$ space, but in this “$L^2$ with the different scaling $a^2$”. That is why we need this new space.
- (ii) $L_\psi$ is a linear mapping
- (iii) $(L_{\psi} f,L_{\psi} g)_{L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})}=c_{\psi}(f,g)_{L^2}$ for all $f,g\in L^2$
- like Plancherel formula which said that $(\mathcal{F}f,\mathcal{F}g)_{L^2} = (f,g)_{L^2}$. This is essentially also true here. Except that we do not have the normal scalar product on the lhs, but this new scalar product and there is this extra scaling factor $c_\psi$.
- (iii) says even more than that $L_{\psi}$ is continuous (discussed more precisely in (4.5 (ii))). Because (iii) is a very strong property: it says that $L_{\psi}$ is almost an isometry, except that it changes lengths by the constant factor $\sqrt{c_\psi}$ (cf. (4.5 (i)); see also the numerical sketch after this block).
- proof:
- prove (iii) using
- Plancherel formula (3.27)
- properties of $\mathcal{F}$ for translation, modulation, etc. (3.4)
- Fubini
- $f$ real-valued $\Rightarrow$ $\mathcal{F}f$ hermitian (3.6)
- (iii) now immediately implies (i)
- because (i) claims that we are mapping to this special space and for this special space we need that $\int_\mathbb{R}\int_0^{\infty}\lvert F(a,b)\rvert^2\frac{dadb}{a^2}$ is finite, but if I plug in $f$ or $g$ in (iii) then we see that the integral we are looking for $(L_{\psi} f,L_{\psi} f)_{L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})}$ is the same as $c_{\psi}(f,f)_{L^2}$, ie. the norm squared of $f$, and the norm squared of $f$ is, of course, finite since we are starting with $f\in L^2$.
- $L_{\psi}$ is linear (ii) is clear because the integral in the def. of $L_{\psi}$ is linear in $f$ (see def. of $L_{\psi}$)
- sidenote: if you look closely in (4.3), we did not need that $c_{\psi} > 0$, we only needed $c_{\psi} < \infty$. But we will need $c_{\psi} > 0$ to get the inverse.
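code sketch (my addition; the truncation of the $a$-integral to $[0.05, 50]$ and all grid choices are ad-hoc assumptions): a rough numerical check of (iii) for $f=g$, using the Gaussian-derivative wavelet from (4.7 (i)), for which $c_\psi=\pi$:

```python
import numpy as np

# Rough check of (4.3 (iii)): ||L_psi f||^2 in the weighted space vs c_psi*||f||^2.
x = np.linspace(-30, 30, 3001)
dx = x[1] - x[0]
f = np.exp(-x**2)
psi = lambda t: t * np.exp(-t**2 / 2)     # c_psi = pi for this wavelet

s = np.linspace(np.log(0.05), np.log(50), 400)   # a = e^s (log-spaced scales)
ds = s[1] - s[0]

total = 0.0
for a in np.exp(s):
    L = np.convolve(f, psi(-x / a), mode='same') * dx / np.sqrt(a)
    total += np.sum(L**2) * dx / a * ds   # int |L|^2 db * da/a^2, with da = a*ds

print(total)                              # should be close to ...
print(np.pi * np.sum(f**2) * dx)          # ... c_psi * ||f||_{L^2}^2
```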
4.4 Admissibility Condition, Wavelet
- (i) the condition $0<c_\psi:=2\pi\int_{0}^{\infty}\frac{\lvert \hat{\psi}(\omega)\rvert^2}{\omega}d\omega<\infty$ is called admissibility condition and $c_\psi$ is the admissibility constant
- name “admissibility”: if this condition is not fulfilled, ie. $c_{\psi} = \infty$, then the rhs $c_{\psi}(f,g)_{L^2}$ in (4.3) explodes.
- If $c_{\psi} = 0$, the rhs just degenerates: $f$ would be mapped to norm $0$, although $f$ itself is not $0$.
- (ii) any $\psi\in L^2$ that fulfills the admissibility condition is called wavelet
- outlook:
- $c_\psi<\infty$ gives us the continuity of the WT → (4.5 (ii))
- $c_\psi>0$ gives us the invertibility of the WT → (4.6)
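code sketch (my addition): checking the admissibility condition numerically by quadrature, here for the Gaussian-derivative wavelet whose FT $\hat\psi(\omega)=-i\omega e^{-\frac{\omega^2}{2}}$ is computed in (4.7 (i)):

```python
import numpy as np

# Sketch: c_psi = 2*pi * int_0^inf |psi_hat(w)|^2 / w dw by the trapezoid rule.
# For psi_hat(w) = -1j*w*exp(-w^2/2) the integrand is w*exp(-w^2), exact value pi.
w = np.linspace(1e-8, 20, 200001)
integrand = np.abs(-1j * w * np.exp(-w**2 / 2))**2 / w
c_psi = 2 * np.pi * np.trapz(integrand, w)
print(c_psi, np.pi)        # agree up to quadrature error
```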
4.5 $L_{\psi}$ Almost Isometry, $L_{\psi}$ is Continuous, ($\hat{\psi}(0) = 0\Leftrightarrow$ Mean Value of $\psi$ (not $\hat{\psi}$!) is $0$)
- (i) the scalar product in (4.2 (iv)) induces the norm \(\lVert L_{\psi}f\rVert_{L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})}=\sqrt{(L_{\psi}f,L_{\psi}f)_{L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})}}\stackrel{(4.3)}{=}\sqrt{c_{\psi}(f,f)_{L^2}}=\sqrt{c_{\psi}}\lVert f\rVert_{L^2}\quad\forall f\in L^2\)
- (ii) from (i) we see that (4.3) shows that the WT is continuous if $c_{\psi}<\infty$
- because (i) shows the boundedness of the WT and since the WT is linear (4.3 (ii)) we can conclude from (ex05 problem 2) that the WT is continuous
- (iii) since $\displaystyle\int_0^1\frac{1}{\omega}d\omega=\lim_{a\searrow 0}\int_a^1\frac{1}{\omega}d\omega=\ldots=\infty$ and the integral in the admissibility condition contains an integral of the same structure from $0$ to $1$, it is necessary that $\lvert \hat{\psi}(\omega)\rvert^2$ decays to $0$ quickly enough for $\omega \to 0$ (in a suitable sense) to get $c_\psi < \infty$: if $\lvert \hat{\psi}(\omega)\rvert^2$ stayed above some positive value near $0$, the integral in the admissibility condition would be $\infty$. Ie. the finiteness of the integral in the admissibility condition at $0$ can only come from the fact that $\lvert \hat{\psi}(\omega)\rvert^2$ also vanishes there.
- In case $\hat{\psi}$ is continuous (at least in $0$), $\lvert \hat{\psi}(\omega)\rvert^2 \to 0$ for $\omega \to 0$ implies $\hat{\psi}(0) = 0$ is necessary (but not sufficient) to get $c_\psi < \infty$.
- note: since $\psi\in L^2$ and the FT maps from $L^2$ to $L^2$ it is not guaranteed that $\hat{\psi}$ is continuous.
- phth: In case $\hat{\psi}$ is not continuous in $0$, $\lvert \hat{\psi}(\omega)\rvert^2 \to 0$ for $\omega \to 0$ does not imply $\hat{\psi}(0) = 0$ (eg. think about the function $f$ with $f=1$ for $x=0$ and $f=0$ for $x\neq 0$, then $f$ is not continuous in $0$ and $\lvert f(x)\rvert^2\to 0$ for $x\to 0$, but $f(0)\neq 0$)
- What does $\hat{\psi}(0) = 0$ mean?
- we have discussed this: the $0$ frequency is the only frequency that you can easily interpret: the $0$ frequency encodes somehow the mean: so if the $0$ frequency is $0$ then the mean of your image is $0$ (here: mean of this $\psi$ is $0$).
- In other words, it is necessary (but not sufficient) that the mean value of $\psi$ (not $\hat{\psi}$!) is $0$ for $\psi$ to be a wavelet.
- (iv) The condition $c_\psi > 0$ is necessary to guarantee the invertibility of the transform on its image (4.6):
4.6 Wavelet Inversion Theorem
- conditions:
- $f\in L^2$
- $\psi\in L^2$
- $\psi$ is a wavelet
- ie. $\psi$ fulfills the admissibility condition
- then, \(\text{for all}\ f\in L^2:\ f(x) = \frac{1}{c_\psi}\int_{\mathbb{R}}\int_0^\infty L_{\psi}f(a,b)\frac{1}{\sqrt{a}}\psi\left(\frac{x-b}{a}\right)\frac{dadb}{a^2}\quad\text{for a.e.}\ x\in\mathbb{R}\)
- in words: we can transform back by integrating over $da$ and $db$ (instead of $dx$ as in the WT (4.1)) with the proper scaling $a^2$ and the same integration weight $\frac{1}{\sqrt{a}}\psi\left(\frac{x-b}{a}\right)$ as in the WT (4.1)
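code sketch (my addition; the truncation of the $a$-integral and all grid choices are crude assumptions, so the reconstruction is only approximate): discretizing the inversion formula:

```python
import numpy as np

# Sketch: approximate wavelet inversion (4.6) on a grid. Scales are log-spaced
# and truncated, so we only expect rough recovery of f.
x = np.linspace(-30, 30, 3001)
dx = x[1] - x[0]
f = np.exp(-x**2)
psi = lambda t: t * np.exp(-t**2 / 2)
c_psi = np.pi                                   # admissibility constant of psi

s = np.linspace(np.log(0.05), np.log(50), 400)  # a = e^s
ds = s[1] - s[0]

rec = np.zeros_like(f)
for a in np.exp(s):
    L = np.convolve(f, psi(-x / a), mode='same') * dx / np.sqrt(a)   # analysis
    back = np.convolve(L, psi(x / a), mode='same') * dx  # int L(a,b) psi((x-b)/a) db
    rec += back / (np.sqrt(a) * a) * ds         # weight (1/sqrt(a)) * da/a^2

rec /= c_psi
print(np.max(np.abs(rec - f)))                  # small if the grids are fine enough
```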
- proof:
- Hilbertian adjoint $L_{\psi}^\ast:L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})\to L^2(\mathbb{R})$ of $L_{\psi}$
- a mapping that goes exactly in the other direction than $L_{\psi}$, ie. it is mapping from the weighted $L^2$ space to $L^2$
- idea:
- in finite dimensions the expression $(L_{\psi}f,F)_{L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})}$ would be a scalar product on some $\mathbb{R}^n$ and the linear mapping $L_{\psi}$ would be a matrix, ie. the expression would be like the scalar product $(\underline{A}x,y)_{\mathbb{R}^n}$
- recall (a standard linear algebra trick): what can you do to move the matrix $\underline{A}$ in $(\underline{A}x,y)_{\mathbb{R}^n}$ to the other argument $y$?
- for real $\underline{A}$: $(\underline{A}x,y)_{\mathbb{R}^n}=(x,\underline{A}^{\top} y)_{\mathbb{R}^n}$
- for complex $\underline{A}$: $(\underline{A}x,y)_{\mathbb{R}^n}=(x,\overline{\underline{A}^{\top}} y)_{\mathbb{R}^n}$
- recall: Hermitian matrices $\underline{A}=\overline{\underline{A}^{\top}}$ can be understood as the complex extension of real symmetric matrices $\underline{A}=\underline{A}^{\top}$ (see: Hermitian matrix)
- there are 2 kinds of adjoints, Hilbert space adjoint vs Banach space adjoint
- “Hilbertian” adjoint (usually called conjugate transpose)
- “Dual” or “Banachian” adjoint
- the Hilbertian adjoint just generalizes the concept of transposing or adjoining a matrix to move it from one scalar product argument to the other to linear mappings on Hilbert spaces (so, just think of it as a generalized transpose)
- problem:
- so far we have only shown that it is necessary that the mean of $\psi$ is $0$ (to ensure $\hat{\psi}(0)=0$), but this is not sufficient for $\psi$ to be a wavelet!
4.7 Example Wavelets: 1st and 2nd Derivatives of the Gaussian, Haar Wavelet
4.7 (i) 1st Derivative of the Gaussian
- (i) 1st derivative of the Gaussian: Let $\psi(x):=xe^{-\frac{x^2}{2}}=-\frac{d}{dx}g(x)$, where $g(x):=e^{-\frac{x^2}{2}}$
- idea: to show that this is a wavelet we need to compute $c_\psi$ and this involves computing the FT of $\psi$ (now you can see why I chose the Gaussian first because there we know what the FT is)
- then, \(\hat{\psi}(\omega)=\mathcal{F}\psi(\omega)=\mathcal{F}\left(-\frac{d}{dx}g\right)(\omega)\stackrel{(3.17)}{=}-i\omega\mathcal{F}g(\omega)\stackrel{(3.20)}{=}-i\omega g(\omega)=-i\omega e^{-\frac{\omega^2}{2}}=-i\psi(\omega)\) \(\Rightarrow c_\psi=2\pi\int_0^\infty\frac{\lvert \hat{\psi}(\omega)\rvert^2}{\omega}d\omega=2\pi\int_0^\infty\frac{\lvert -i\omega e^{-\frac{\omega^2}{2}}\rvert^2}{\omega}d\omega=2\pi\int_0^\infty\frac{\omega^2 e^{-\frac{2\omega^2}{2}}}{\omega}d\omega=2\pi\int_0^\infty\omega e^{-\omega^2}d\omega=\pi\int_0^\infty 2\omega e^{-\omega^2}d\omega=\pi\left(-e^{-\omega^2}\right)\bigg\vert_0^\infty=\pi\) \(\Rightarrow \psi\ \text{is a wavelet.}\)
- important: unlike the mean of $\psi$, the mean of $g$ is not $0$, so $g$ cannot be a wavelet itself!
- intuition:
- $g$ is always above the x-axis, ie. the mean (average value of the function) of $g$ must be positive and not equal to $0$!
- $\psi$ is in part below the x-axis.
- $\psi$ is the product of the function $x\mapsto x$ and a Gaussian. This product must be negative for $x<0$ because $x$ is negative there and the Gaussian is positive, ie. the graph of $\psi$ lies in the 3rd quadrant for $x<0$. Since $\psi(0)=0$, the graph goes through the origin. For $x>0$ the graph lies in the 1st quadrant because $x$ and the Gaussian are both positive there.
- With the convolution representation of the wavelet transform, we get \(L_{\psi}f(a,b)=\frac{1}{\sqrt{a}}(f\ast D_{-\frac{1}{a}}\psi)(b)=-\frac{1}{\sqrt{a}}\left(f\ast D_{-\frac{1}{a}}g^\prime\right)(b)=\frac{a}{\sqrt{a}}\left(f\ast (D_{-\frac{1}{a}}g)^\prime\right)(b)=\sqrt{a}\left(f\ast (D_{-\frac{1}{a}}g)\right)^\prime(b)\)
- Similarly to the Canny Edge Detector (2.16), the input $f$ is convolved with a scaled Gaussian for different scales (ie. different kernel sizes).
- in (2.16) we noticed that for different scales we get different edge strengths and we had some nice properties and it is somewhat similar here
- Unlike the Canny Edge Detector (2.16), a derivative is taken after the convolution.
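code sketch (my addition; grids are assumptions): numerically confirming the last identity, ie. that for $\psi=-g^\prime$ the WT is “smooth with a scaled Gaussian, then differentiate in $b$”:

```python
import numpy as np

# Sketch: for psi = -g', L_psi f(a,.) = sqrt(a) * d/db [ f * D_{-1/a} g ](b).
x = np.linspace(-20, 20, 4001)
dx = x[1] - x[0]
f = np.exp(-(x - 2)**2)
g = lambda t: np.exp(-t**2 / 2)
psi = lambda t: t * np.exp(-t**2 / 2)                 # psi = -g'

a = 0.7
L_direct = np.convolve(f, psi(-x / a), mode='same') * dx / np.sqrt(a)

smoothed = np.convolve(f, g(x / a), mode='same') * dx # f * D_{-1/a} g (g is even)
L_canny = np.sqrt(a) * np.gradient(smoothed, dx)      # derivative after smoothing

print(np.max(np.abs(L_direct - L_canny)))             # small up to grid error
```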
4.7 (ii) Mexican Hat
- (ii) 2nd derivative of the Gaussian, aka Mexican hat: Let $\psi(x):=\left(1-x^2\right)e^{-\frac{x^2}{2}}=-\frac{d^2}{dx^2}g(x)$
- one can show (exercise, similar to (i)) that $\hat{\psi}(\omega)=\omega^2 e^{-\frac{\omega^2}{2}}$ and $c_\psi=\pi$. Thus, the Mexican hat is also a wavelet.
- this gives us some hint at what wavelets can look like and what kind of information they possibly extract from our functions:
- these wavelets are not local mean values because the wavelets themselves have mean $0$, but you can think of (i), for instance, as something like a smoothed local derivative. Because, at $x=0$, $\psi$ weights the things to the right of the point $x=0$ positively and the things to the left of the point $x=0$ negatively. Thus, $\psi$ is something like a derivative (cf. derivative filters (2.12)).
- however, (ii) is more like a 2nd derivative because you have a positive value in the center and negative values in the surrounding, but all very smoothed out.
4.7 (iii) Haar Wavelet
- (iii) Haar wavelet: \(\psi(x):=\begin{cases} 1 & 0 \leq x < \frac{1}{2}\\ -1 & \frac{1}{2} \leq x < 1\\ 0 & \text{else} \end{cases}\)
- here it is easier to see what $\psi$ computes: $\psi$ computes the mean over $\left[0,\frac{1}{2}\right]$ minus the mean over $\left[\frac{1}{2},1\right]$, so it is also sth like a derivative of your input
- to show that the Haar wavelet is a wavelet we can
- represent $\psi$ as $\psi=\chi_{\left[0,\frac{1}{2}\right)}-\chi_{\left[\frac{1}{2},1\right)}$ and then we can use the properties of the FT to show that \(\hat{\psi}(\omega)=\frac{1}{\sqrt{2\pi}}ie^{-i\frac{\omega}{2}}\sin\left(\frac{\omega}{4}\right)\text{sinc}\left(\frac{\omega}{4\pi}\right)\)
- this should not surprise us because we know that the FT of $\chi_{\left[-B,B\right]}$ is a cardinal sine and we know that shifts are turned into modulations, and therefore, $\chi_{\left[0,\frac{1}{2}\right)}$ would have to be shifted to $\chi_{\left[-\frac{1}{4},\frac{1}{4}\right)}$ (ie. the support of $\chi$ is shifted to the origin s.t. $\chi$ has the shape $\chi_{\left[-B,B\right]}$) and $\chi_{\left[\frac{1}{2},1\right)}$ as well, and that is encoded in this modulation $e^{-i\frac{\omega}{2}}$ (cf. (3.4))
- this is not completely obvious and you really need to compute this to see that this formula is correct
- based on that we can show $c_\psi=\ln{2}$, thus $\psi$ is a wavelet
- interesting: you do not need to compute this manually because $c_\psi=\ln{2}$ also follows from (4.16) + (4.17 (i))
- outlook: we will show how to generate wavelets with certain properties and with this generation we can also create the Haar wavelet and that is why it will inherit the general properties
- Unlike the derivatives of the Gaussian we considered before (ie. (i) and (ii)) that are very smooth and have non-compact support, the Haar wavelet is discontinuous and has compact support.
- interesting: it is actually quite difficult to find wavelets that are both smooth and have compact support (will be discussed later)
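code sketch (my addition): with the stated $\hat\psi$, $\lvert\hat\psi(\omega)\rvert^2=\frac{1}{2\pi}\frac{16\sin^4(\omega/4)}{\omega^2}$, and substituting $u=\frac{\omega}{4}$ the admissibility integral becomes $\int_0^\infty\frac{\sin^4 u}{u^3}du$; a quick quadrature should reproduce $c_\psi=\ln 2$:

```python
import numpy as np

# Sketch: numeric check of c_psi = ln(2) for the Haar wavelet via the reduced
# integral int_0^inf sin(u)^4 / u^3 du (the tail beyond 2000 is negligible).
u = np.linspace(1e-6, 2000.0, 2000001)
print(np.trapz(np.sin(u)**4 / u**3, u), np.log(2))
```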
- note: unlike what the name suggests, the discrete WT that we are working towards is not fully discrete like the DFT. (However, there are some relations to the FS, so it is more like the FS.)
- problem: One thing that is very apparent is that this WT so far is redundant because we take a 1D function and convert it to a 2D function. Therefore, this 2D function has to have some redundant information because we cannot need a 2D function to describe a 1D function.
- more precisely: The (continuous) wavelet transform of a function $f \in L^2 (\mathbb{R})$ is apparently a redundant representation of $f$ ($f$ is univariate (1D function), $L_\psi f$ is bivariate (2D function)).
- The question arises whether it is sufficient to know $L_\psi f$ on a subset of $\left(0,\infty\right) \times \mathbb{R}$ (ie. perhaps we do not need to compute it for all the scalings and all the shifts, but just for some of them).
- It will turn out that this is indeed the case for certain discrete subsets, but showing this requires several preparations.
- But it is very important to venture into this because in order to practically use the WT we need to figure out for which scalings and for which translations we actually need these computations.
- problem:
- the WT is redundant
- we are now on our way to the DWT that is supposed to only look at the necessary spatial and scaling parameters, so not at all of them, but hopefully just a subset
- to make this possible, the concept of a so-called MRA is introduced; it is the first tool on this way:
4.8 MRA, MSA, Generator
- For $j \in \mathbb{Z}$, let $V_j$ be (vii) a closed subspace of $L^2(\mathbb{R})$.
- Then, $(V_j)_{j\in \mathbb{Z}}$ is called multiresolution analysis (MRA) or multiscale approximation (MSA), if it fulfills the following conditions:
- (i) translational invariance $\forall j, k \in \mathbb{Z} : f \in V_j \Leftrightarrow T_{2^j k} f \in V_j$
- when we fix a certain $j$, we must be able to shift $f$ in $V_j$ while $f$ stays in $V_j$. But we are not allowed to shift arbitrarily; we have to make discrete steps: $k$ is the number of steps that we can take. So we can take arbitrarily many steps (because $k\in\mathbb{Z}$), but the step size is $2^j$, ie. the bigger the $j$, the larger the step. This hints that these $V_j$’s get smaller and smaller (→ (ii)): for $j=0$ we must be able to shift by step size $2^0=1$, for $j=1$ we only need to be able to shift by step size $2$, etc.
- (ii) inclusion $\forall j \in \mathbb{Z} : V_{j+1} \subset V_j$
- ie. the space is getting smaller when I increase $j$
- “if $j_1 > j_2$ the space $V_{j_1}$ is included in $V_{j_2}$”
- with increasing $j$ the spaces can only get smaller
- (iii) scaling $\forall j \in \mathbb{Z} : f \in V_j \Leftrightarrow D_{\frac{1}{2}} f \in V_{j+1}$
- this $D_{\frac{1}{2}}$ just means that everything is spread out
- eg. let's say your $f$ is the characteristic function of the interval $\left[0,1\right]$; then, when you apply $D_{\frac{1}{2}}$, it becomes the characteristic function of $\left[0,2\right]$ (which also relates to the fact that if $j$ gets bigger then these shifts $T_{2^jk}$ get bigger)
- problem: we need some more properties to make this meaningful because so far with (i-iii) we could just say that every $V_j$ is all of $L^2$, then everything here is trivially fulfilled. So, we need to prevent that with the following properties:
- (iv) trivial intersection $\bigcap_{j\in \mathbb{Z}} V_j = \{0\}$
- ie. the intersection of all the $V_j$’s is just $0$ which formalizes that these spaces get smaller and smaller and at some point there is nothing in there any more.
- problem: but (iv) would still be fulfilled if we just say all $V_j$’s are just $0$, then everything is trivially fulfilled. We do not want this, either. Therefore, we require (v):
- (v) completeness (actually, “denseness in $L^2$”) $\overline{\bigcup_{j\in \mathbb{Z}} V_j} = L^2(\mathbb{R})$
- phth: aka “$\bigcup_{j\in \mathbb{Z}} V_j$ is dense in $L^2$”
- see “Prerequisites” → “dense” → def. (i)
- Berkels: in this sense the $V_j$’s must capture everything that is in $L^2$.
- phth: complete in the sense that $L^2$ is complete (Riesz-Fischer theorem) and we must be able to approximate any $L^2$ function arbitrarily well with a sequence of functions in $\bigcup_{j\in \mathbb{Z}} V_j$, where the limit of this sequence need not necessarily be in $\bigcup_{j\in \mathbb{Z}} V_j$ (like we can approximate any element of $\mathbb{R}$ with a sequence in $\mathbb{Q}$)
- (vi) orthonormal basis $\exists\phi \in V_0 : \{T_k \phi : k \in \mathbb{Z}\}$ is a CONS of $V_0$
- $T_k \phi$ means “shift $\phi$ by integers $k$”
- the “shift $\phi$ by integers” links to the translational invariance (i) for $j=0$. There [in (i)] we [require that all $f$ in $V_0$] also need to be able to shift by integers. Thus, if $\phi$ is in $V_0$, then all the $T_k\phi$ must [necessarily!] be in $V_0$ by property (i). But [in (vi)] what we want is that this [set] is a CONS!
- which gives us a very nice way to construct a basis for these kind of spaces
- The function $\phi$ from the orthonormal basis property is called generator or scaling function of the MRA.
- name “generator” comes from (vi) because $\phi$ generates an ONS of $V_0$
- name “scaling function” comes from the fact that $\phi$ fulfills the scaling equation (4.11), ie. $\phi$ can be expressed by a scaled version of itself
- problem: this sounds like a very abstract set of properties where it is not so clear yet why this is good or perhaps not even clear how we could even choose this $V_j$. So let us first start with a simple example of such an MRA:
4.9 Example MRA: “Space of Functions that are Constant on Dyadic Intervals”
- for $j\in\mathbb{Z}$ let \(V_j:=\{f\in L^2 : f\vert_{\left[k2^j,(k+1)2^j\right)}\ \text{is constant for all}\ k\in\mathbb{Z}\}\)
- ie. $V_j$ is the space of square integrable functions that are constant on the dyadic intervals $\left[k2^j , (k + 1)2^j\right)$ (“dyadic” because the interval limits are powers of $2$, related: dyadic logarithm)
- phth: for p.w. constant functions, square integrability means that the sequence of constant values on the dyadic intervals is square-summable; in particular the values must tend to $0$ towards $\pm\infty$ (compact support is sufficient for this, but not necessary)
- then, $(V_j)_{j\in\mathbb{Z}}$ is a MRA.
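code sketch (my addition; grid resolution, domain $[0,1)$ and the test function are assumptions): these $V_j$ are easy to play with on a dyadic grid; the projection onto $V_j$ is block-averaging (this anticipates (4.15 (i))):

```python
import numpy as np

# Sketch of the p.w.c. MRA (4.9): project f onto V_j by averaging f over each
# dyadic interval of length 2^j (anticipating (4.15 (i))).
N = 2**12                                  # samples on [0, 1), grid step 2^-12
x = np.arange(N) / N
f = np.sin(2 * np.pi * x) + (x > 0.5)      # test function with a jump

def P_Vj(f, j):
    block = int(round(2**j * N))           # grid samples per dyadic interval
    means = f.reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)         # piecewise constant representative

for j in (-3, -6, -9):                     # smaller j => finer intervals
    print(j, np.sqrt(np.mean((f - P_Vj(f, j))**2)))   # error shrinks with j
```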
Proof (vii)
- (vii) The $V_j$ are obviously closed subspaces of $L^2(\mathbb{R})$.
- Berkels: If you have a sequence of functions $f_n\in V_j$ for a fixed $j$ that are piecewise constant on the interval $\left[k2^j , (k + 1)2^j\right)$, then their limit $\lim_{n\to\infty}f_n = f$ will also be piecewise constant on the same interval $\left[k2^j , (k + 1)2^j\right)$, ie. $f\in V_j$. There is nothing that could happen to make it [$\lim_{n\to\infty}f_n$] not piecewise constant.
- “sequential closedness” (“Folgen-Abgeschlossenheit”, Bredies p. 17): “(…) for every sequence $( x_n )$ in $U$ with $x_n \to x$, the limit $x$ is also in $U$”
- recall: “closed” = “closed with respect to certain operations” (here: “closed under the limit operation”)
- eg. closed with respect to multiplication means that multiplication does not lead out of the set, ie. the product stays in the set that the factors were in
- “In a complete metric space, a closed set is a set which is closed under the limit operation.”, Wikipedia
- $L^2$ is complete (Riesz-Fischer Theorem)
Proof (i) - (iii)
- (i) Translational invariance, (ii) inclusion and (iii) scaling are obviously fulfilled from the corresponding properties of the dyadic intervals.
- (i) - (iii) are intuitively clear:
- (i) we need to be able to shift by multiples of $2^j$: if say $j=0$, then we need to shift the function by $2^0=1$, or by $2^1=2$, etc. and this just shifts the constant values from one interval to another interval, but they will stay piecewise constant on these intervals, you just shift the values to different intervals
- (ii) if you are piecewise constant on intervals of length $2^{j+1}$, then of course you are also piecewise constant on intervals of length $2^j$ (because the latter is just a weaker property), ie. $V_{j+1}\subset V_j$
- (iii) if you scale by $\frac{1}{2}$ you are supposed to go from $V_j$ to $V_{j+1}$. Let's look at the green function here (in the lecture figure): if we scale it by $\frac{1}{2}$, everything is doubled, ie. this value is mapped to that (…), so you just switched from being constant on intervals of length one to being constant on intervals of length two
Proof (iv)
- (iv) A function that is in the intersection of all $V_j$ has to be constant
- on $\left[0,1\right)$ and on $[−1, 0)$ (for $j=0$).
- on $\left[0,2\right)$ and on $[−2, 0)$ (for $j=1$).
- … etc.
- on $\left[0,\infty\right)$ and on $(−\infty, 0)$ (for $j\to \infty$).
- The only $L^2$-function ($L^2$ means the function must be square-integrable on $\left[0,\infty\right)$ and $(−\infty, 0)$!) that fulfills this is the zero function.
Proof (v)
- (v) The completeness follows from the facts that $C_c(\mathbb{R})$ is dense in $L^2(\mathbb{R})$ (B.8), that continuous functions with compact support (ie. functions in $C_c(\mathbb{R})$) are uniformly continuous, and that uniformly continuous functions can be approximated in the supremum norm by functions constant on dyadic intervals.
- phth: thus, functions constant on dyadic intervals are dense in $L^2$
- Berkels: You need to stack these arguments, ie. first go to compact support, then go to uniform continuity and then you know you just need to make these intervals small enough to have the necessary approximation.
- phth:
- This proves that “$\bigcup_{j\in \mathbb{Z}} V_j$ is dense in $L^2$” (denseness).
- But, $\bigcup_{j\in \mathbb{Z}} V_j$ is NOT complete in the sense that “all Cauchy sequences in $\bigcup_{j\in \mathbb{Z}} V_j$ converge in $\bigcup_{j\in \mathbb{Z}} V_j$” (completeness).
- $L^2$ is complete (Riesz-Fischer Theorem), but this does not necessarily mean that $\bigcup_{j\in \mathbb{Z}} V_j$ is complete (because eg. $\mathbb{Q}$ is dense in $\mathbb{R}$, but $\mathbb{Q}$ is not complete just because $\mathbb{R}$ is complete!).
- But we can approximate any $L^2$ function arbitrarily well with a function in $\bigcup_{j\in \mathbb{Z}} V_j$.
- recall:
- “completeness” = “Cauchy sequences in this set converge to a point in this set”
- ”$\mathbb{Q}$ is dense in $\mathbb{R}$”
- “$L^p$ is complete” (cf. section “Prerequisites”)
- “supremum norm” = “distance between two functions”
- “uniform continuity” def.
Proof (vi)
- (vi) Finally, $\phi = \chi_{\left[0,1\right)}$ is a generator [because]
- (vi.i) The ONS property [of $\{T_k \phi : k \in \mathbb{Z}\}$] is obviously fulfilled and
- Claim: consider $V_0 = \{f \in L^2 : f \vert_{\left[k,k+1\right)}\ \text{is constant for all}\ k \in \mathbb{Z}\}$; then $\phi = \chi_{ \left[0,1\right) } \in V_0$ is a generator, because:
- Berkels:
- So, [for $\phi$ to be a generator] we need that these shifted versions of $T_k\phi$ are a CONS.
- the orthonormal property is obvious because if you shift $\left[0,1\right]$ by an integer, then it will not intersect with the interval $\left[0,1\right]$ any more. So, for different integer shifts $k$ you just have this function [$\chi_{ \left[0,1\right) }$] on $\left[k,k+1\right]$ and those [$\chi_{ \left[0,1\right) }$ and $\chi_{ \left[k,k+1\right) }$] do not intersect.
- ie. the $L^2$ scalar product of $\chi_{ \left[m,m+1\right) }$ and $\chi_{ \left[n,n+1\right) }$ (the integral of the product function) is $0$ for $m \neq n$ and $1$ for $m=n$, which proves the ONS property of the shifted versions $T_k\phi$ (see ONS def.).
- (vi.ii) the completeness [of $\{T_k \phi : k \in \mathbb{Z}\}$] is shown based on the fact that (A) the squared $L^2$-norm of an element of $V_0$ has to be (B) the sum of the squared values of the element on each of the intervals $\left[k,k+1\right)$.
- Berkels:
- (B) is equivalent to the Fcoeffs in (3.42)
- so, if I am supposed to give you the $L^2$-norm (rhs of (3.42)) of the green function in the picture above it is always the squared area under the graph and, since the length is $1$, it is just the squares of the heights. So, this is a series. Based on this, you directly see that the completeness def. (3.42) is fulfilled, where we needed to show that the norm of function squared (rhs of (3.42)) is the same as the sum of the squared Fcoeffs (lhs of (3.42)). Here, the Fcoeffs (lhs of (3.42)) are exactly the interval heights (because we just multiply a function that is constant on the interval $\left[k,k+1\right)$ which gives us then the value in the interval $\left[k,k+1\right)$).
- phth: “(A) = (B)” $\Leftrightarrow$ shifted versions of $T_k\phi$ are a CONS (cf. (3.42))
- phth: consider an element $f \in V_0$, then
- (A) is $\lVert f\rVert^2_{L^2}$
- (B) is $\sum_{k\in \mathbb{Z}}\lvert(f, T_k\chi_{\left[0,1\right)})_{L^2}\rvert^2$, where $(f, T_k\chi_{\left[0,1\right)})_{L^2}$ is the k-th Fcoeff
4.10 Orthogonal Projection $P_{V_j}$, Approximating $f$ with the Nested Spaces $V_j$
- conditions:
- Let $(V_j)_{j\in\mathbb{Z}}$ be a MRA.
- Let $P_{V_j}$ denote the orthogonal projection from $L^2$ to $V_j$.
- Note that the orthogonal projection exists since $L^2$ is a Hilbert space and $V_j$ a nonempty, closed subspace of $L^2$. (Hilbert projection theorem)
- Let $f \in L^2$.
- Then, we have $\lim_{j\to -\infty} P_{V_j} f = f$ and $\lim_{j\to \infty} P_{V_j} f = 0$.
- $\lim_{j\to \infty} P_{V_j} f = 0$ encodes that the $V_j$ get smaller and smaller
- $\lim_{j\to -\infty} P_{V_j} f = f$ means that we can use the nested spaces $V_j$ to approximate $f$, the smaller $j$, the better the approximation.
- So, if we have a very small $j$ ($j\to -\infty$, highly negative), then we get very close to $f$ and if it is very large ($j\to\infty$), then it is very coarse. So, we can decide how well we want to approximate $f$ by choosing this $V_j$.
- we can approximate $f$ with the spaces $V_j$ arbitrarily well if $j$ is “negative enough”
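code sketch (my addition; domain, grid and test function are assumptions): both limits, illustrated with the piecewise constant MRA from (4.9):

```python
import numpy as np

# Sketch of (4.10): with block-averaging as P_{V_j}, ||f - P_{V_j} f|| -> 0 as
# j -> -inf (down to the grid resolution) and ||P_{V_j} f|| -> 0 as j -> +inf.
step = 2.0**-6
x = np.arange(0.0, 1024.0, step)
f = np.exp(-(x - 10)**2)                       # concentrated L2 bump

for j in range(-6, 11, 4):                     # j = -6 is the grid itself
    block = int(round(2**j / step))
    m = np.repeat(f.reshape(-1, block).mean(axis=1), block)
    norm_P = np.sqrt(np.sum(m**2) * step)      # ||P_{V_j} f||
    norm_E = np.sqrt(np.sum((f - m)**2) * step)  # ||f - P_{V_j} f||
    print(j, norm_P, norm_E)
```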
- proof:
- the reason we are looking at this is to get familiar with the consequences of all of the properties in (4.8). So far, it is not clear why we need these properties. Here this gets clear.
- $j\to\infty$: (4.14)
- $j\to -\infty$: show that $\lVert f − P_{V_j} f\rVert_{L^2} = \inf_{u\in V_j} \lVert f − u\rVert_{L^2}$ for all $j \in \mathbb{Z}$ goes to $0$ for $j\to -\infty$
- this is because, by definition, the orthogonal projection $P_{V_j}$ is just the “closest point in the space $V_j$ to a given point $f$”
- use completeness (4.8 (v)) and inclusion (4.8 (ii)) properties
4.11 $\phi_{j,k}$, Scaling Equation, Scaling Function $\phi$
- conditions:
- Let $(V_j)_{j\in\mathbb{Z}}$ be a MRA with generator $\phi$.
- again, this is just some object that fulfills all of the properties in (4.8)
- and we have one example (4.9) that fulfills all of the properties in (4.8), so we know that there are such MRAs, thus it makes sense to look at them
- Let $\phi_{j,k} (x) := 2^{-\frac{j}{2}}\phi(2^{-j}x - k)$ for all $j, k \in \mathbb{Z}$ and $x \in \mathbb{R}$.
- problem: why do we look at $\phi_{j,k}(x)$? → Because $\phi_{j,k}(x)$ is an ONB of $V_j$ → (i)
4.11 (i) $\{\phi_{j,k} : k \in \mathbb{Z}\}$ is an ONB/CONS of $V_j$
- (i) The scaling property (4.8 (iii)) and the orthonormal basis property (4.8 (vi)) together imply that, for all $j \in \mathbb{Z}$, $\{\phi_{j,k} : k \in \mathbb{Z}\}$ is an ONB/CONS of $V_j$ (exercise).
- problem: why is this the case?
- so, in (4.8) we have assumed that if we start with $\phi$ then these $T_k\phi$s are a CONS
- but if we look at the new notation then this $T_k\phi$ is nothing but $\phi_{0,k}$
- $j=0$ eliminates this scaling $2^{-\frac{j}{2}}$ and $2^{-j}$ in $\phi_{j,k}(x)$
- and we just have the translation $(x-k)$ in $\phi_{j,k}(x)$
- the $\phi_{0,k}$s are an ONS of $V_0$
- but then we have this scaling property (4.8 (iii)), ie. that $f$ is in $V_j$ iff the scaled version $D_{\frac{1}{2}}f$ is in $V_{j+1}$ and this we can use to convert the ONS from $V_0$ to the other $V$s
- the scaling $2^{-\frac{j}{2}}$ is exactly chosen such that the normality property is preserved: if you just applied $D_{\frac{1}{2}}$, you would change the $L^2$-norm, but the $2^{-\frac{j}{2}}$ corrects for this (exercise: when you just compute it you will see how this fits together)
- so, from one ONB of $V_0$ we can go to ONBs for all of our $V_j$s
- sidenote: if we have such an MRA, the absolute value of $j$ has no meaning by itself: if we have $(V_j)_{j\in\mathbb{Z}}$, we can shift $j$ by any number and it still runs from $-\infty$ to $\infty$, just that $j=0$ then perhaps means $j=10$; there is no inherent value. The only thing that attaches a specific value to $j$ is that for $V_0$ we have the generator $\phi$. And, since we have the generator at $0$, it is natural that going one level finer ends up in $V_{-1}$. In principle, we could just invert the ordering of the $j$s; nothing would change, except that going up with $j$ (better/finer approximation) and going down with $j$ (worse/coarser approximation) would mean the opposite, ie. whether you call “going finer” “increasing $j$” or “decreasing $j$” does not matter. So, the construction (4.8) links different scales, but it is completely arbitrary whether you associate fine scales with a large $j$ or with a negative $j$. It is just one decision. You can replace $j$ with $-j$ everywhere, if you want, and everything stays consistent.
- problem: in this way we can connect amongst other things $V_0$ and $V_{-1}$ → (ii)
4.11 (ii) The Generator $\phi$ is not only in $V_0$, but also in $V_{-1}$
- (ii) Due to the inclusion property (4.8 (ii)), we have $\phi \in V_0 \subset V_{-1}$, ie. the generator is not only in $V_0$, but also in $V_{-1}$.
4.11 (iii) We can express the Generator $\phi$ in the CONS $\{\phi_{-1,k} : k \in \mathbb{Z}\}$
- (iii) Since $\{\phi_{-1,k} : k \in \mathbb{Z}\}$ is a CONS of $V_{-1}$, we can express the generator $\phi$ in this CONS \(\label{eq:scaling-equation}\phi = \sum_{k\in\mathbb{Z}} h_k \phi_{-1,k} = \sum_{k\in\mathbb{Z}} h_k \sqrt{2}\phi(2 \cdot - k),\) where the dot “$\cdot$” is just a placeholder (not a variable!) and $h_k = (\phi, \phi_{-1,k} )_{L^2}$ for $k \in \mathbb{Z}$ are the Fcoeffs, cf. (3.43) (note that this does not mean that the series is converging pointwise a.e.).
- This equation is called scaling equation and the reason why $\phi$ is also called scaling function (cf. (4.8)).
- note: This also means that we can express $\phi$ with scaled versions of itself. This puts some interesting constraints on this $\phi$.
- phth: this does not imply the pointwise convergence $\phi\color{red}{(x)} = \sum_{k\in\mathbb{Z}} h_k\sqrt{2}\phi(2\color{red}{x} - k)$ for all $x$, but only convergence of the $L^2$-norm of the series (rhs) to the $L^2$-norm of $\phi$ (lhs) (see (3.48))
- watch: lecture 27, beginning of (4.12): long discussion about (3.43) and (3.48), convergence of the series in the $L^2$-norm vs. pointwise convergence
- counterexample: in the figure below: consider the following sequence of functions:
- (black) $1$ on $\left[0,1\right]$, otherwise $0$
- (red) $1$ on $\left[0,\frac{1}{2}\right]$, otherwise $0$
- (blue) $1$ on $\left[\frac{1}{2},1\right]$, otherwise $0$
- (green) $1$ on $\left[0,\frac{1}{4}\right]$, otherwise $0$
- (…) $1$ on $\left[\frac{1}{4},\frac{1}{2}\right]$, otherwise $0$
- (…) $1$ on $\left[\frac{1}{2},\frac{3}{4}\right]$, otherwise $0$
- (…) $1$ on $\left[\frac{3}{4},1\right]$, otherwise $0$
- (…) $1$ on $\left[0,\frac{1}{8}\right]$, otherwise $0$
- etc.
- thus, wherever you are in this sequence, you will always pick up a $1$
- thus, at no point this sequence will converge pointwise to $0$, but the $L^2$-norm converges to $0$ because the intervals get smaller and smaller
- this shows that the $L^2$-convergence does not give you pointwise control, not even on a.e. point
- for the most part this is not so important, and once you go discrete you do not have these kinds of problems, but I stress this here so that you do not read more into equality ($\ref{eq:scaling-equation}$) than it shows. That is the reason why I did not put an $(x)$ in ($\ref{eq:scaling-equation}$), like $\phi\color{red}{(x)} = \sum_{k\in\mathbb{Z}} h_k\sqrt{2}\phi(2\color{red}{x} - k)$, because it would be wrong! (actually, it was wrong in an older version of the lecture notes) It is very easy to overlook, which is why I am stressing it.
- similarly, the FS of an $L^2$ function need not converge pointwise, no matter how you interpret this
- note: we will see later that for “nice generators” and “nice wavelets” only a finite amount of these $h_k$s will be non-zero and then all of these [convergence related] problems are gone because if this sum here in ($\ref{eq:scaling-equation}$) is finite then you do not need to consider convergence because you can compute the finite sum and then you are done - it is just when you have a series (“infinite sum”) then you need to be careful! The wavelets that you use in practice will have only a finite number of $h_k$s that are not zero.
- problem: with (iii) we see a direct connection to the continuous WT:
4.11 (iv) $(f, \phi_{j,k} )_{L^2} = L_{\phi} f (2^j , 2^j k)$
- (iv) For $f \in L^2 (\mathbb{R})$, we have \((f, \phi_{j,k} )_{L^2} = \int_{\mathbb{R}}f(x)2^{-\frac{j}{2}} \phi(2^{-j} x - k) dx = \boxed{\int_{\mathbb{R}}f(x) \frac{1}{\sqrt{2^j}} \phi \left(\frac{x - 2^j k}{2^j}\right) dx} = L_{\phi} f (2^j , 2^j k).\)
- problem: Why did we rewrite $(f, \phi_{j,k} )_{L^2}$ to the boxed formula?
- This should look familiar from (4.1).
- The boxed formula is exactly the continuous WT for certain $a$ and $b$.
- Thus, $(f, \phi_{j,k} )_{L^2}$ is the evaluation of the continuous WT $L_{\phi} f$ for $a = 2^j$ and $b = 2^j k$.
- in particular, we can get all of these necessary coefficients $(f, \phi_{j,k} )_{L^2}$ just by evaluating the WT at certain discrete parameters
- problem: we first have to understand what these $V_j$s are in order to conclude more from this, but we can already see that there is a strong link to the WT by this
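code sketch (my addition; here $\phi=\chi_{[0,1)}$ as in (4.9), and $f$, $j$, $k$ are arbitrary choices): the two sides of (iv) are literally the same integral, which a discretization makes tangible:

```python
import numpy as np

# Sketch: (f, phi_{j,k})_{L^2} = L_phi f(2^j, 2^j k) for phi = indicator of [0,1).
x = np.linspace(0, 64, 2**16, endpoint=False)
dx = x[1] - x[0]
f = np.sin(x) * np.exp(-0.01 * x)
phi = lambda t: ((t >= 0) & (t < 1)).astype(float)

j, k = 2, 3
coeff = np.sum(f * 2.0**(-j / 2) * phi(2.0**-j * x - k)) * dx  # (f, phi_{j,k})
a, b = 2.0**j, 2.0**j * k                                      # a = 4, b = 12
cwt = np.sum(f * phi((x - b) / a) / np.sqrt(a)) * dx           # L_phi f(a, b)
print(coeff, cwt)    # equal (same Riemann sum written two ways)
```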
- problem: to further work with this I will prepare a helping lemma that tells us sth about these $h_k$
4.12 $h_k$s orthogonal to themselves when shifted
- condition:
- Let $h_k$ be the coefficients of the scaling equation of an MRA.
- then \(\sum_{k\in\mathbb{Z}} h_k h_{k+2m} = \delta_{0,m}\ \text{for all}\ m \in \mathbb{Z}\)
- that means, in a certain sense, the scaling equation coeffs are orthogonal to themselves if you shift them by $2m$
- for $m=0$ that means their squared sum is $1$
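code sketch (my addition): checking the lemma for the Haar coefficients $h_0=h_1=\frac{1}{\sqrt{2}}$ (computed later in (4.15 (ii))) and, as a second example, the commonly quoted Daubechies-4 scaling coefficients (taken as an assumption here, they are not derived in these notes):

```python
import numpy as np

# Sketch: verify sum_k h_k h_{k+2m} = delta_{0,m} for two coefficient sequences.
def shift_dot(h, m):
    """sum_k h[k]*h[k+2m] for m >= 0, treating out-of-range entries as 0."""
    return sum(h[k] * h[k + 2 * m] for k in range(len(h) - 2 * m))

haar = [1 / np.sqrt(2), 1 / np.sqrt(2)]
s3 = np.sqrt(3)
db4 = [(1 + s3) / (4 * np.sqrt(2)), (3 + s3) / (4 * np.sqrt(2)),
       (3 - s3) / (4 * np.sqrt(2)), (1 - s3) / (4 * np.sqrt(2))]

for h in (haar, db4):
    print([round(shift_dot(h, m), 12) for m in (0, 1)])   # [1.0, 0.0] each
```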
4.13 Prerequisites: Orthogonal Complement, Direct Sum
- outer/direct sum:
- orthogonal complement:
- “the set $W^{\perp}$ of all vectors in $V$ that are orthogonal to every vector in $W$”
- phth: thus, $W^{\perp}$ always includes the vector $0$ (important for the proof of (4.14))
- For a subspace $U \subset X$, the subspace of all vectors \(U_\perp = \{ y \in X\ \vert\ x \perp y\ \text{for all}\ x \in U \}\) is called the orthogonal complement of $U$. (Bredies)
- properties: Wikipedia
4.13 Approximation Space, Detail Space, Wavelet Space
- conditions:
- Let $(V_j)_{j\in\mathbb{Z}}$ be a MRA.
- Let $W_j$ be such that $V_{j−1} = V_j \oplus W_j$ and $V_j \perp W_j$, ie. $W_j$ is the orthogonal complement of $V_j$ in $V_{j-1}$.
- we know that $V_j$ is contained in $V_{j-1}$ and everything that is missing in $V_j$ is concentrated in $W_j$. In a way that we only pick up the orthogonal complements that are missing. That is a nice separation.
- Then, $V_j$ is called approximation space to the scale $j$
- because we know that the projection of $f$ to $V_j$ approximates $f$ in a sense that if $j$ goes to $-\infty$ we get $f$ back
- and $W_j$ is called detail space or wavelet space to the scale $j$.
- Because you can think of it, when going from $V_j$ to the larger $V_{j-1}$, then we are adding some stuff and the stuff that we add are just “details” [from the “detail space”].
- If the approximation gets better [as $j$ gets smaller], apparently, we have to add things [functions] that were missing, ie. that were not representable on these coarser spaces [$V_j$ with larger $j$], and that motivates the name “detail space”.
- problem: so, all of the $W_j$s are enough to represent all $L^2$ functions. Now, we can iterate this:
4.14 Splitting $V_j$ into Details, Splitting a $L^2$-Function into all its Details in all Scales $j$
4.14 (i) Decomposition of $V_j$ into Details
- The definition of $W_j$ implies by iteration that \(\label{eq:Vj-decomposition}V_j \stackrel{(4.13)}{=} V_{j+1} \oplus W_{j+1} \stackrel{(4.13)}{=} V_{j+2} \oplus W_{j+2} \oplus W_{j+1} = \ldots \stackrel{(4.13)}{=} V_{j+M} \oplus \color{blue}{\bigoplus_{m=j+1}^{j+M} W_m}\) for all $M \geq 1.$
- Berkels: “outer sum” ($\color{blue}{\bigoplus_{m=j+1}^{j+M} W_m}$) also means that the intersection of all of these $W_j$s is $\{0\}$.
- explanation A: [because by def. of $W_j$, $W_j$ contains all vectors that are orthogonal to the vectors in $V_j$, thus, since the vector $0$ is orthogonal to all vectors in all $V_j$s, all $W_j$ must include the vector $0$]
- explanation B: from Lemma below: $V=U\oplus W$ iff the intersection of $U$ and $W$ contains only the vector $0$ (see the “iff” in the Lemma)! In particular, in our case, the intersection of $V_{j+M}$ and all $W_m$s for $j+1\leq m\leq j+M$ must be the set containing only the vector $0$. But this also means that the intersection of all $W_m$s for $j+1\leq m\leq j+M$ must be the set containing at least the vector $0$ (think about a Venn diagram for 3 intersecting sets: intersection area gets larger, when you take away one intersecting set). Later, we will see $V_j = \bigoplus_{m\geq j+1} W_m$ and with this we can say that the intersection of all $W_m$s for $m\geq j+1$ must be the set containing only the vector $0$. This must hold for all $j$. So, it is safe to say “the intersection of all of these $W_j$s is $\{0\}$”, as Berkels said.
- This immediately implies $V_j \supset \bigoplus_{m\geq j+1} W_m$.
- because from ($\ref{eq:Vj-decomposition}$) we see that $V_j$ contains this outer sum (in $\color{blue}{blue}$) for all $M$, and in particular, also the infinite summation $M\to\infty$ of all of these outer sums (in $\color{blue}{blue}$)
- problem: The question is: Can there be something missing in $\bigoplus_{m\geq j+1} W_m$ or does it have to be all of entire $V_j$. The answer is, it is all, but we have to show this:
- Assume there is $v \in V_j \setminus \bigoplus_{m\geq j+1} W_m$. Since \(\label{eq:VjplusM-decomposition}V_j \setminus \left(\bigoplus_{m\geq j+1} W_m\right) \stackrel{(\ref{eq:Vj-decomposition})}{=} \left(V_{j+M} \oplus \left(\color{purple}{\bigoplus_{m=j+1}^{j+M} W_m}\right)\right) \setminus \left(\color{purple}{\bigoplus_{m\geq j+1} W_m}\right) \stackrel{(\text{explanation 1})}{=} V_{j+M} \setminus \left(\bigoplus_{m \geq j+1+M} W_m\right),\) we have, $v \in V_{j+M}$ for all $M \geq 1$.
- explanation 1: intuitively, the last equals sign holds because sets ($\color{purple}{purple}$ in ($\ref{eq:VjplusM-decomposition}$)) are added (“$\oplus$”) to and subtracted (“$\setminus$”) from $V_{j+M}$ and thereby cancel each other ($\color{red}{red}$ in ($\ref{eq:VjplusM-decomposition-explanation}$)) \(\label{eq:VjplusM-decomposition-explanation}V_{j+M}\oplus\{\color{red}{(j+1)+\ldots+(j+M)}\}\setminus\{\color{red}{(j+1)+\ldots+(j+M)}+(j+M+1)+\ldots+(\infty)\}=V_{j+M}\setminus\{(j+M+1)+\ldots+(\infty)\}\)
- due to the inclusion property, this implies $v \in V_M$ for all $M \in \mathbb{Z}$
- due to the trivial intersection property, this implies $v = 0$. This is a contradiction to $v\not\in \bigoplus_{m\geq j+1} W_m \color{blue}{\supset \{0\}}$.
- $\color{blue}{blue}$ part: all of these $W_m$s contain $0$ (recall: orthogonal complement properties: intersection of $V_j$ and $W_j$ is $\{0\}$)
- (i) Thus, we have shown \(V_j = \bigoplus_{m\geq j+1} W_m.\)
- thus, “the $V_j$ can be split into all details contained for these larger $V_j$ spaces”
4.14 (ii) Decomposition of $L^2$ into Details
- (ii) Combined with the completeness, we have \(\color{red}{L^2(\mathbb{R})} \stackrel{(\text{completeness prop. of}\ V_j)}{=} \overline{\bigcup_{j\in\mathbb{Z}} V_j} \stackrel{(\text{i})}{=} \overline{\bigcup_{j\in\mathbb{Z}}\bigoplus_{m\geq j+1} W_m} \color{red}{= \overline{\bigoplus_{m\in\mathbb{Z}} W_m}}\).
- thus, “we can split any $L^2$ function in sums of elements of this $W_m$”
- in other words: this is like “splitting the function in all its details in all the scales”
4.14 (iii) Representing $f$ via $V_j$ and $W_j$
- (iii) Since $V_j \oplus W_j$ is an orthogonal decomposition of $V_{j−1}$, we have \(P_{V_{j-1}} \stackrel{(\text{orth. decomposition})}{=} P_{V_j} + P_{W_j} \Rightarrow P_{W_j} = P_{V_{j-1}} - P_{V_j}.\) Due to \(\label{eq:L2-decomposition}L^2(\mathbb{R}) \stackrel{(\text{ii})}{=} \overline{\color{blue}{\bigoplus_{m\in\mathbb{Z}} W_m}} = \overline{\bigoplus_{m\geq j+1} W_m \oplus \bigoplus_{m\leq j} W_m} \stackrel{(\text{i})}{=} \overline{\color{red}{V_j} \oplus \color{purple}{\bigoplus_{m\leq j} W_m}},\) $f \in L^2(\mathbb{R})$ can be expressed by \(\label{eq:f-projection-to-V-and-W}\boxed{f \stackrel{(\ref{eq:L2-decomposition})}{=} \color{blue}{\sum_{m\in\mathbb{Z}} P_{W_m} f} \stackrel{(\ref{eq:L2-decomposition})}{=} \color{red}{P_{V_j} f} + \color{purple}{\sum_{m\leq j} P_{W_m} f}.}\)
- Bredies: We can now represent each $f \in L^2$ in various ways with the help of the spaces $V_j$ and $W_j$
- explanation 2:
- ($\color{blue}{blue}$ part of ($\ref{eq:f-projection-to-V-and-W}$)) is the orthogonal projection of $f$ to the orthogonal decomposition of $L^2$ in the form of $\overline{\color{blue}{\bigoplus_{m\in\mathbb{Z}} W_m}}$
- ($\color{red}{red}+\color{purple}{purple}$ part of ($\ref{eq:f-projection-to-V-and-W}$)) is the orthogonal projection of $f$ to the orthogonal decomposition of $L^2$ in the form of $\overline{\color{red}{V_j} \oplus \color{purple}{\bigoplus_{m\leq j} W_m}}$
- related: “Decomposition of a vector space into direct sums is not unique.”, Wikipedia
- Bredies (en): These equations justify the name “multiscale analysis”: the spaces $V_j$ allow a systematic approximation of functions on different scales.
- Bredies: Figure 4.12: In the sense of the previous example [ie. the example “piecewise constant multiscale analysis”], one can say, somewhat imprecisely but suggestively: $P_{V_j} u$ is the representation of $u$ “on the scale $V_j$” and contains details of $u$ “up to size $2^j$”, see Figure 4.12.
- phth: the height of $P_{V_j}u$ on the individual dyadic intervals is, according to (4.15 (i)), “the mean of $f$ on the dyadic interval $\left[k2^j , (k + 1)2^j \right)$” (in the case of the piecewise constant multiscale analysis) resp. $\sum_{k\in\mathbb{Z}}(\phi_{j,k},u)_{L^2}\phi_{j,k}$ (in the general case)
Figure 4.12 (Bredies): details of $u$ “up to size $2^j$” (figure not reproduced here)
4.14 (iv) Proof of 4.10 (continued)
- (iv) (iii) implies that $P_{V_j} f \to 0$ for $j \to \infty$ holds (because explanation 3), which we claimed earlier (proof of (4.10)).
- explanation 3: “sandwich”: the last equality in ($\ref{eq:f-projection-to-V-and-W}$) must hold for all $j$; since ($\color{purple}{purple}$) converges to ($\color{blue}{blue}$) for $j\to\infty$, the equality can only hold in the limit if $P_{V_j}f$ vanishes
- problem: Let’s look at our only MRA example again and see what we can get about these projections there:
4.15 Example: Piecewise Constant MRA
4.15 (i) Compute Projection of $f$ to $V_j$
- Let $(V_j)_{j\in\mathbb{Z}}$ be the piecewise constant MRA from (4.9) and let $f \in L^2 (\mathbb{R})$.
- Since (A) the elements of $V_j$ are constant on the intervals $\left[k2^j , (k + 1)2^j \right)$ and (B) $\lVert f − P_{V_j} f\rVert_{L^2} = \inf_{u\in V_j} \lVert f − u\rVert_{L^2}$, we get for $x \in \left[k2^j , (k + 1)2^j \right)$ that (intuitively: boils down to approximating a function on an interval with a single value) \((P_{V_j} f )(x) \stackrel{(\text{p.w.c. MRA})}{=} \color{blue}{2^{−j}}\color{red}{\int^{(k+1)2^j}_{k2^j} f(y)dy} = 2^{−j}\int_{\mathbb{R}} \chi_{\left[k2^j ,(k+1)2^j \right)}(y)f(y)dy\) \(\label{eq:PVj1}(\ldots) = 2^{−\frac{j}{2}}\int_{\mathbb{R}}2^{−\frac{j}{2}}\chi_{\left[0,1\right)}(2^{−j}y − k)f(y)dy = \color{purple}{2^{−\frac{j}{2}}}\int_{\mathbb{R}}\phi_{j,k}(y)f(y)dy = (f, \phi_{j,k} )_{L^2} \color{purple}{\phi_{j,k}(x)}\).
- ($\color{purple}{purple}$) is the value of $\phi_{j,k}(x)$ on the intervals $\left[k2^j , (k + 1)2^j \right)$ (which is the same value, $2^{-\frac{j}{2}}$, on all intervals for a fixed $j$, ie. the value is independent of $k$)
- ($\color{blue}{blue}$ + $\color{red}{red}$) is the mean of $f$ on the dyadic interval $\left[k2^j , (k + 1)2^j \right)$
- recall: $f_{\text{avg}}=\color{blue}{\frac{1}{b-a}}\color{red}{\int_a^b f(x)dx}$
- phth: the area under $\phi_{j,k}$ is always $2^{\frac{j}{2}}$ (namely $2^{-\frac{j}{2}} \cdot 2^j$)
- warning: this does not follow from the ONB property: the ONB property says that the $L^2$-norm of $\phi_{j,k}$ is $1$ for all $j,k\in\mathbb{Z}$ (ie. $(\phi_{j,k},\phi_{j,l})_{L^2}=\delta_{k,l}$), and the $L^2$-norm is the area under the squared function, not the area under the function itself!
- Why is $(P_{V_j} f )(x) \color{red}{\stackrel{(\text{?})}{=}} 2^{−j}\int^{(k+1)2^j}_{k2^j} f(y)dy$?
- because, by definition of $P_{V_j}$, we have $\lVert f − P_{V_j} f\rVert_{L^2} = \inf_{u\in V_j} \lVert f − u\rVert_{L^2}$ (see (B)), so on $\left[k2^j , (k + 1)2^j \right)$, $P_{V_j} f$ has to be the constant that is closest (in the $L^2$-norm) to $f$, ie. the constant $u$ that minimizes $\int_{k2^j}^{(k+1)2^j}\lvert f-u\rvert^2 dy$. Since the mean $f_{\text{avg}}$ minimizes the mean squared error, we get $(P_{V_j} f )(x) = f_{\text{avg}} = 2^{−j}\int^{(k+1)2^j}_{k2^j} f(y)dy$.
- important: This holds only for the p.w.c. MRA (see (A)). For a different MRA, $\inf_{u\in V_j} \lVert f − u\rVert_{L^2}$ is not necessarily minimized by the mean $f_{\text{avg}}$, but by some other $u\in V_j$.
- ($\color{blue}{blue}$) is a normalization factor
- Why exactly $2^{-j}$?
- because $f_{\text{avg}}=\color{blue}{\frac{1}{b-a}}\color{red}{\int_a^b f(x)dx}$, and in our case we have $\color{blue}{\frac{1}{b-a}}=2^{-j}$
- thus, we can compute this projection $P_{V_j}f$ in terms of the generator, ie. in terms of the CONS of $V_j$ given by $\phi_{j,k}$:
- (i) Noting that for any $x \in \mathbb{R}$, we have $\phi_{j,k} (x) = 0$ for all $k \in \mathbb{Z}$ but one, we get \(\label{eq:PVj2}\boxed{P_{V_j} f = \sum_{k\in \mathbb{Z}} (f, \phi_{j,k} )_{L^2} \phi_{j,k}}\).
- $P_{V_j} f$ is the “Projection of $f$ to our approximation space $V_j$”
- note: ($\ref{eq:PVj1}$) holds for a fixed $x$ (and the corresponding $k$), while ($\ref{eq:PVj2}$) holds for all $x$. So, why is ($\ref{eq:PVj2}$) valid?
- When going from ($\ref{eq:PVj1}$) to ($\ref{eq:PVj2}$), we have used that for any $x \in \mathbb{R}$, we have $\phi_{j,k} (x) = 0$ for all $k \in \mathbb{Z}$ but one
- for simplicity, consider a fixed $j'$ first
- Because [ for a fixed $j'$ ] these dyadic intervals do not overlap and this $x'$ is only in exactly one of these dyadic intervals [ say $x' \in \left[k'2^j,(k'+1)2^j\right)$ ]. Thus, [for any $x'$,] the series (the sum over $k\in \mathbb{Z}$) is just a sum with one summand [which is non-zero. This summand is the one with the factor $\phi_{j,k'}(x')$ (because $x' \in \left[k'2^j,(k'+1)2^j\right)$ as assumed above, thus, $\phi_{j,k'}(x') = 2^{-\frac{j}{2}}\chi_{ \left[k'2^j,(k'+1)2^j\right) }(x') = 2^{-\frac{j}{2}} \neq 0$ will always hold).]
- from next lec:
- Berkels: ($\ref{eq:PVj2}$), evaluated at a point, is effectively a sum with only one non-zero summand. [phth: this refers to the summands of the series, not the coefficients: for a fixed $x'$, exactly one factor $\phi_{j,k'}(x')$ is non-zero; the coefficients $(f, \phi_{j,k} )_{L^2}$ themselves can of course be $0$, eg. when $f$ has mean $0$ on the corresponding interval]
The summand that survives at $x'$ is $(f, \phi_{j,k'})_{L^2}\,\phi_{j,k'}(x')$ (ie. the scalar product of $f$ with the scaled $\chi_{ \left[k'2^j,(k'+1)2^j\right) }$, times the value of $\phi_{j,k'}$ at $x'$).
All other summands with $m \neq k'$ vanish because $x'$ is not in $\left[m2^j,(m+1)2^j\right)$, so $\phi_{j,m}(x') = 0$.
- of course, this also follows from the ONB property of these $\phi_{j,k}$, but here we actually computed this how it naturally arises from how we chose the generator
- problem: now, lets compute the $h_k$ for our specific p.w. constant MRA:
4.15 (ii) Compute $h_k$
- Moreover, we have for this MRA that \(h_k = (\phi, \phi_{−1,k} )_{L^2} = \int_0^1 2^{\frac{1}{2}} \phi(2x − k) dx = \int_0^1 \sqrt{2}\chi_{\left[\frac{k}{2},\frac{k+1}{2}\right)} (x) dx = \begin{cases}
\frac{\sqrt{2}}{2} & k \in \{0, 1\}\\
0 &\text{else}
\end{cases} = 2^{−\frac{1}{2}} (\delta_{k,0} + \delta_{k,1} ).\)
4.15 (iii) Compute Scaling Equation
- Thus, in this case, the scaling equation simplifies to \(\phi = \sum_{k\in\mathbb{Z}} h_k \sqrt{2}\phi(2 \cdot -k) = \phi(2\cdot) + \phi(2 \cdot -1).\)
- ie. $\phi$ can be represented by shifted, scaled copies of itself (which is exactly what the scaling equation said, but here we now see specifically which two translations and scalings we need)
- thus, the series becomes a finite sum!
4.15 (iv) Compute $\phi_{j,k}$
- Composing this with $x \mapsto 2^{-j} x - k$, this shows \(\phi(2^{-j} \cdot -k) = \phi(2^{-(j-1)} \cdot -2k) + \phi(2^{-(j-1)} \cdot -(2k + 1)).\) Multiplying both sides with $2^{-\frac{j}{2}}$ shows \(\label{eq:phi-jk}\phi_{j,k} = \frac{1}{\sqrt{2}} (\phi_{j-1,2k} + \phi_{j-1,2k+1} ).\)
- ie. each of these $\phi_{j,k}$ can be represented as a linear combination of two of the $\phi_{j-1,\ldots}$, ie. from one finer level (which we can see in Figure 4.12)
- problem: we have already checked $P_{V_j}$ above (the projection to approximation spaces), now, we check $P_{W_j}$ (the projection to detail spaces)
4.15 (v) Compute Projection of $f$ to $W_j$ (Part 1)
- This allows us to compute the projection to $W_{j+1}$. We get \(P_{W_{j+1}} f \stackrel{(\text{EA})}{=} P_{V_j} f − P_{V_{j+1}} f \stackrel{(\text{EB})}{=} \sum_{k\in\mathbb{Z}} (f, \phi_{j,k} )_{L^2} \phi_{j,k} − \sum_{k\in\mathbb{Z}} (f, \phi_{j+1,k} )_{L^2} \phi_{j+1,k}\)
- EA: for this, let's recall our def. of these $W_j$s: $V_{j-1} = V_j \oplus W_j$ (4.13) implies $P_{V_{j-1}}=P_{V_j}+P_{W_j}$; shifting the index ($j \to j+1$) gives $P_{V_j}=P_{V_{j+1}}+P_{W_{j+1}}$, and solving for the projection to $W_{j+1}$, ie. subtracting $P_{V_{j+1}}$ on both sides, gives the result $P_{V_j}f - P_{V_{j+1}}f$ here
- EB: we computed both $P_{V_j}$ and $P_{V_{j+1}}$ last time (see beginning of (4.15): $P_{V_j}f = \ldots$)
- Splitting the first sum into even and odd indices and using ($\ref{eq:phi-jk}$) for the second sum leads to
4.15 (vi) Compute Projection of $f$ to $W_j$ (Part 2), Wavelet Corresponding to the Generator
- what we introduce now is actually the wavelet that is corresponding to the generator.
- recall:
- if you think about (4.1) where we had the continuous WT, we started with this wavelet $\psi$ that we used to multiply our function $f$ with for the WT and integrate
- but here we started with a generator that is called $\phi$ (not $\psi$) and is not called “wavelet” but “generator”
- and now comes the connection back to the wavelet that we had before. This is what this $\psi$ will be.
- Introducing \(\label{eq:psijk}\boxed{\psi(x) := \phi(2x) - \phi(2x - 1)}\ \text{and}\ \boxed{\psi_{j,k} (x) := 2^{-\frac{j}{2}} \psi(2^{-j}x - k)}\) we get $\psi_{j+1,k} = \frac{1}{\sqrt{2}} (\phi_{j,2k} - \phi_{j,2k+1} )$ and the representation of $P_{W_{j+1}} f$ simplifies to \(\label{eq:PWf}\boxed{P_{W_{j+1}} f = \sum_{k\in\mathbb{Z}} (f, \psi_{j+1,k} )_{L^2} \psi_{j+1,k}}\).
- ($\ref{eq:psijk}$) defines scaled and translated versions of this $\psi$, exactly like we did with $\phi$
- ($\ref{eq:psijk}$) $\psi_{j,k}$ has the same scaling as $\phi_{j,k}$
- This $\psi$ we already know, you just have to recognize it:
- The function $\psi$ is the Haar wavelet we have seen before in (4.7 (iii)).
- Because $\phi(2x)$ is $\chi_{\left[0,\frac{1}{2}\right)}$ and, accordingly, $\phi(2x-1)$ is $\chi_{\left[\frac{1}{2},1\right)}$. This means the function $\psi$ is $1$ on the interval $\left[0,\frac{1}{2}\right)$ and $-1$ on the interval $\left[\frac{1}{2},1\right)$ which is exactly what was the Haar wavelet!
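- a quick numerical sanity check (a sketch under my own setup, not part of the notes): build $\phi$ and $\psi(x)=\phi(2x)-\phi(2x-1)$ on a grid and confirm the $\pm 1$ values and the (ortho)normality of the $\psi_{j,k}$:

```python
import numpy as np

phi = lambda x: ((0 <= x) & (x < 1)).astype(float)   # generator chi_[0,1)
psi = lambda x: phi(2 * x) - phi(2 * x - 1)          # Haar wavelet

assert np.allclose(psi(np.array([0.25, 0.75])), [1.0, -1.0])

def psi_jk(x, j, k):
    return 2 ** (-j / 2) * psi(2 ** (-j) * x - k)

x = np.linspace(-1, 2, 3000, endpoint=False)
dx = x[1] - x[0]
print(np.sum(psi_jk(x, -2, 1) ** 2) * dx)               # L2 norm^2 ~ 1
print(np.sum(psi_jk(x, -2, 1) * psi_jk(x, -2, 2)) * dx) # orthogonal ~ 0
```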
4.15 (vii) $\{\psi_{j,k} : k \in \mathbb{Z} \}$ is an ONB of $W_j$
- Moreover, we have \(\label{eq:orthonormal}(\psi_{j,k} , \psi_{j,m})_{L^2} = \frac{1}{2}((\phi_{j−1,2k} − \phi_{j−1,2k+1}) , (\phi_{j−1,2m} − \phi_{j−1,2m+1}))_{ L^2 } = \frac{1}{2} (\delta_{k,m} + 0 + 0 + \delta_{k,m} ) = \delta_{k,m}\) .
- this means that the $\psi_{j,k}$ that we get are also an ONS
- Combined with the representation of $P_{W_{j+1}}$ we derived above ($\ref{eq:PWf}$) and the fact that $P_{W_{j+1}}$ is the identity on $W_{j+1}$, ie. $P_{W_{j+1}}g=g$ for all $g\in W_{j+1}$ (and for every $j$), this means that $\{\psi_{j,k} : k \in \mathbb{Z} \}$ is an orthonormal basis of $W_j$.
- (i) “orthonormality of ONS” follows from ($\ref{eq:orthonormal}$)
- (ii) “each element $f$ of $W_{j+1}$ can be represented by $P_{W_{j+1}}$ as a projection in this space” follows from the $P_{W_{j+1}}g=g$ argument
- (iii) “With $P_{W_{j+1}}$ we can represent any element $f \in W_{j+1}$ as linear combination of the $\psi_{j+1,k}$s” follows from ($\ref{eq:PWf}$) and (ii) (“completeness of ONS”)
- phth:
- (ii), (iii) must hold for all $j$:
- eg. with an index shift ($j \to j-1$), (iii) becomes:
- “With $P_{W_{j}}$ we can represent any element $f \in W_{j}$ as linear combination of the $\psi_{j,k}$s”
- he confirms this later, too: “the $P_{W_{j+1}}g=g$ argument holds for all $j$, it is not only for $j+1$, but I can insert $j-1$ for $j$. It is somewhat confusing because of all of these $j$ and $j+1$ and we are jumping back and forth between these, but $j$ is arbitrary.”
- phth: what do we need $P_{W_{j+1}}g=g$ for?
- $P_{W_{j+1}}g=g$ holds for all $g\in W_{j+1}$ and shows that $g$ is not changed by the operation $P_{W_{j+1}}$. Hence, ($\ref{eq:PWf}$) must be able to represent (at least) every $g\in W_{j+1}$ as a linear combination. $\Rightarrow$ completeness of the ONS $\{\psi_{j,k}\}$.
- The existence of a wavelet that generates an orthonormal basis for $W_j$ not only holds in the special case here, but more generally.
- so, this general construction that we can start with the generator $\phi$, then use the scaling equation and all these combinations to construct the $\psi$ and then get CONSes for all our $W_j$s, is not linked just to the Haar wavelet, but this is a very general connection.
- phth: one might think that the area under $\psi_{j,k}$ is always $2^{\frac{j}{2}}$
because $\{\psi_{j,k} : k \in \mathbb{Z}\}$ is an ONB/CONS of $W_j$, ie. for a given $j\in\mathbb{Z}$ we have $(\psi_{j,k},\psi_{j,l})_{L^2}=\delta_{k,l}$, which means the $L^2$-norm of $\psi_{j,k}$ must be $1$ for all $j,k\in \mathbb{Z}$. (This reasoning is wrong: an ONB constrains the $L^2$-norm, ie. the area under the squared function, not the area under the function itself. In fact, for the Haar wavelet the signed area $\int_{\mathbb{R}}\psi_{j,k}(x)\,dx$ is $0$, since $\int_{\mathbb{R}}\psi(x)\,dx=0$.)
source: Bredies
- Bredies: The example above shows that for the piecewise constant multiscale analysis there is indeed a wavelet (the Haar wavelet) that yields an orthonormal basis of the wavelet spaces $W_j$. A similar construction also works in general, as the following theorem shows:
4.16 $\psi$ is a Wavelet, $\{\psi_{j,k}\}$ is an ONB/CONS of $W_j$ and $L^2$
- conditions:
- (c1) let $(V_j )_{j\in\mathbb{Z}}$ be a MRA with generator $\phi$
- (c2) let $h_k$ be the coefficients of the scaling equation.
- (c3) let $\psi \in V_{−1}$ be defined by $\psi := \sum_{k\in\mathbb{Z}} (−1)^k h_{1−k} \phi_{−1,k}$.
- this $\psi$ may look surprising, but if we insert the p.w.c. MRA, only $h_0$ and $h_1$ are non-zero and, actually, both are just $\sqrt{2}/2$ (see (4.15 (ii))); then we see that we get exactly the combination of $\chi_{\left[0,\frac{1}{2}\right)}$ (which comes from $k=0$) minus $\chi_{\left[\frac{1}{2},1\right)}$ (which comes from $k=1$), where the minus comes from the $(-1)^k$. So, for the one case that we computed, this $\psi$ is exactly the $\psi$ that we had above, but here it is stated in general. → (4.17 (i))
- Then,
- (i) $\{\psi_{j,k} : k \in \mathbb{Z}\}$ is an orthonormal basis of $W_j$.
- which we showed for our specific generator $\phi$, but it holds for all generators
- (ii) $\{\psi_{j,k} : j, k \in \mathbb{Z}\}$ is an orthonormal basis of $L^2 (\mathbb{R})$.
- We already saw last time that $L^2$ is the closure of the [direct] sum of all the $W_j$s (4.14 (ii)). Thus, if we have (i), then (ii) follows from what we already know.
- phth: by (i) we can represent every $f\in W_j$ with $\{\psi_{j,k} : k \in \mathbb{Z}\}$. Since, by (4.14 (ii)), all the $W_j$ together make up $L^2$, the $\{\psi_{j,k} : k \in \mathbb{Z}\}$ over all $j$ must be able to represent every $f\in L^2$.
- (iii) $\psi$ is a wavelet with $c_\psi = \ln{2}$.
- again, with the same scaling as above
- which is now the promised connection in its generality. For a function to be a wavelet, we needed the corresponding admissibility constant $c_\psi$ to be bigger than $0$ and smaller than $\infty$ (this was for the continuous WT). In this case, we can even tell exactly what this constant is, and it is always the same whenever $\psi$ is constructed from an MRA and its generator. So, in essence, this means that “for any MRA we get a corresponding wavelet”.
- “how to construct a wavelet given any MRA”, works as follows:
- (c1) if we have a MRA $(V_j)_j$ it comes always with a generator $\phi$ (that is part of the def. of a MRA)
- (c3) then we can construct a function $\psi$ from scale shifted versions of the generator, ie. from $\phi_{-1,k}$, by reusing the coeffs from the scaling equation $h_{1-k}$
- (i) if we do so, then this $\psi$ gives us CONSes of all detail spaces $W_j$ for all the $j$ (ie. if we fix the scale $j$ and use all the shifts $k$)
- (ii) if we also use all the scales $j$ and all the shifts $k$, then we even get a CONS of $L^2$. So, this is like, instead of expressing things in terms of the $V_j$, we are expressing in terms of the $W_j$s (the detail spaces)
- (iii) furthermore, the $\psi$ created like this is always a wavelet and we can specify the admissibility constant $c_\psi$ which is $\ln{2}$ in this case, that means “given an MRA we get a wavelet” (this connects the 1st part of this chapter where I just assumed I was given a wavelet, and now we see how we can construct one from an MRA)
- proof:
- we will use techniques similar to the ones from the specific case above, where we did the computation for a very specific MRA (the p.w.c. MRA).
- Let's see if there are any unforeseen things happening along the way
4.17 Example: Wavelet Pairs $\phi,\psi$
4.17 (i) Piecewise Constant MRA, Haar Wavelet
- Let $(V_j )_{j\in\mathbb{Z}}$ be the piecewise constant MRA from (4.9).
- Recalling, $h_k = 2^{-\frac{1}{2}} (\delta_{k,0} +\delta_{k,1} )$, we get \(\psi = \sum_{k\in\mathbb{Z}} (−1)^k h_{1−k} \phi_{−1,k} = (\ldots) = -\chi_{\left[\frac{1}{2},1\right)}+\chi_{\left[0,\frac{1}{2}\right)}\) which is again the Haar wavelet
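- a sketch of (c3) in code (function names are hypothetical): assemble $\psi = \sum_k (-1)^k h_{1-k}\phi_{-1,k}$ pointwise from the scaling coefficients; for the Haar $h_k$ this reproduces $\chi_{\left[0,\frac{1}{2}\right)}-\chi_{\left[\frac{1}{2},1\right)}$:

```python
import numpy as np

phi = lambda x: ((0 <= x) & (x < 1)).astype(float)   # p.w.c. generator

def make_psi(h, x):
    """psi(x) = sum_k (-1)^k h_{1-k} phi_{-1,k}(x),
    with phi_{-1,k}(x) = sqrt(2) * phi(2x - k).
    h: dict mapping k -> h_k (finitely many non-zero entries)."""
    psi = np.zeros_like(x, dtype=float)
    for k in set(1 - m for m in h):                  # all k with h_{1-k} != 0
        psi += (-1) ** k * h[1 - k] * np.sqrt(2) * phi(2 * x - k)
    return psi

h_haar = {0: 2 ** -0.5, 1: 2 ** -0.5}
x = np.linspace(-0.5, 1.5, 2000, endpoint=False)
psi = make_psi(h_haar, x)   # +1 on [0, 1/2), -1 on [1/2, 1): the Haar wavelet
```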
4.17 (ii) Daubechies Wavelets
- Constructing wavelets that both have compact support and have some smoothness is highly non-trivial.
- The discovery of compactly supported, regular and orthogonal wavelets by Ingrid Daubechies in 1988 is one of the reasons why wavelets became very popular in the 1990s.
- The Daubechies wavelets are a family of orthogonal wavelets that are characterized by a maximal number $k$ of vanishing moments, ie. \(\int_{\mathbb{R}}x^l \psi(x) dx = 0\ \text{for}\ l \in \{0, \ldots , k - 1\},\) for a given support.
- Perhaps the most famous wavelet pair $\phi, \psi$ is the Daubechies wavelet with two vanishing moments, denoted by `db2`.
- Interestingly, the Daubechies wavelet with one vanishing moment, `db1`, is equivalent to the Haar wavelet.
- The values of the coefficients $h_k$ are available as tabulated numerical values. For `db2`, the values are also available analytically (see notes).
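- for instance, PyWavelets ships these tabulated $h_k$ (a sketch, assuming the `pywt` package is installed; `dec_lo` is its name for the scaling filter, up to ordering conventions); the vanishing moments can then be checked numerically on the sampled $\psi$:

```python
import numpy as np
import pywt

w = pywt.Wavelet('db2')
print(w.dec_lo)                       # the four non-zero filter coefficients

phi, psi, x = w.wavefun(level=10)     # sampled scaling function and wavelet
dx = x[1] - x[0]
for l in range(2):                    # db2 has two vanishing moments
    print(l, np.sum(x ** l * psi) * dx)   # both ~ 0
```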
4.17 (iii) Symlets, Daubechies’ Least-Asymmetric Wavelets
- A modification of the Daubechies wavelets are the symlets, which are sometimes also called Daubechies’ least-asymmetric wavelets.
- As the name suggests, these wavelets are more symmetric than the Daubechies wavelets and also have vanishing moments.
- For instance, the symlet with four vanishing moments is denoted by `sym4`.
- Note that `db2` and `sym2` are equivalent.
4.18 Discrete Wavelet Transform (DWT)
- (i) Due to (4.16), we know that
- $\psi$ constructed from an MRA is a wavelet and
- $\{\psi_{j,k} : j, k \in \mathbb{Z}\}$ is an orthonormal basis of $L^2 (\mathbb{R})$.
- In particular, we have \(f = \sum_{j,k\in\mathbb{Z}} (f, \psi_{j,k} )_{L^2} \psi_{j,k}.\)
- (ii) Recalling $(f, \psi_{j,k} )_{L^2} = L_\psi f (2^j , 2^j k)$, (i) shows that it is sufficient to know $L_\psi f$ on the discrete set $\{(2^j , 2^j k) : j, k \in \mathbb{Z}\} \subset \left(0, \infty\right) \times \mathbb{R}$ to encode $f$.
- (iii) In the sense of (ii), the coefficients $(f, \psi_{j,k} )_{L^2}$ are the discrete wavelet transform of $f$.
- (iv) The key to a fast wavelet transform is to use that those coefficients can be computed recursively, which is a consequence of the scaling equation and the definition of $\psi$:
- basically, the same as (4.19), but (4.19) is the short notation of (4.18)
- Derivation of the FWT Recursion Formulas (4.19):
- the 1st part is a generalization of the scaling equation: says that the scaling equation can be used for all $j$
- recall: $\phi = \sum_{k\in\mathbb{Z}} h_k \phi_{-1,k}$
- $\phi$ is constructed from the $\phi_{\color{red}{-1},k}$, so $\phi_\color{red}{j}$ can be constructed from the $\phi_{\color{red}{j}-1,k}$
- we have added $j$ in front of the $-1$ index in $\phi = \sum_{k\in\mathbb{Z}} h_k \phi_{\color{red}{-1},k}$
- and it is shifted by $k$ (see $\phi_{j,\color{red}{k}}$)
- recall: $\psi := \sum_{k\in\mathbb{Z}} (−1)^k h_{1−k} \phi_{−1,k}$
- $\psi$ is constructed from the $\phi_{\color{red}{-1},k}$, so $\psi_\color{red}{j}$ can be constructed from the $\phi_{\color{red}{j}-1,k}$
- the 2nd part follows by inserting the expressions for $\phi_{j,k}$ and $\psi_{j,k}$ from the 1st part into the scalar products $(f,\phi_{j,k})$ and $(f,\psi_{j,k})$
- this now gives us a way to go from $j$ to $j+1$
- given the $c$s for one $j$ we can get the $c$s and $d$s for $j$ increased by $1$
4.19 FWT: Compute $c^{j+1}$ and $d^{j+1}$ from $c^j$
- Berkels: the $c^j$ correspond to the projection $P_{V_j} f$ and the $c^{j+1}_k$ correspond to the projection $P_{V_{j+1}} f$ and the $d^{j+1}_k$ correspond to the projection $P_{W_{j+1}} f$
- Berkels: ie. if we know the function on $V_j$ we can split it in $V_{j+1}$ and $W_{j+1}$ (this is like splitting the coarse information from the detail)
- FWT Recursion Formulas:
- The FWT uses these FWT Recursion Formulas to
- compute from $P_{V_j} f$ (given in form of the coefficient vector $c^j$) the coarser projection $P_{V_{j+1}} f$ (as the coefficient vector $c^{j+1}$) and the wavelet component $P_{W_{j+1}} f$ (as the coefficient vector $d^{j+1}$).
- Note that the sums are finite in case the scaling equation coefficient sequence only has finitely many nonzero entries.
- If only few [of the scaling equation coefficients] are nonzero (eg. for the Haar wavelet only $h_0$ and $h_1$ are nonzero, for Daubechies’ wavelets four $h_k$s are nonzero), only few computations have to be done when going from $j$ to $j + 1$.
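- a minimal NumPy sketch of one FWT step, specialized to the Haar case (even-length input assumed; the function name is mine, not from the notes): with $h_0=h_1=2^{-\frac{1}{2}}$ the recursion reduces to pairwise sums and differences:

```python
import numpy as np

s = 2 ** -0.5   # = h_0 = h_1 for the Haar MRA

def fwt_step(c):
    """One Haar FWT step: c^j -> (c^{j+1}, d^{j+1}), len(c) even.
    c^{j+1}_k = (c^j_{2k} + c^j_{2k+1}) / sqrt(2)   (coarse part)
    d^{j+1}_k = (c^j_{2k} - c^j_{2k+1}) / sqrt(2)   (detail part)"""
    c = np.asarray(c, dtype=float)
    return s * (c[0::2] + c[1::2]), s * (c[0::2] - c[1::2])

c1, d1 = fwt_step([4.0, 2.0, 5.0, 5.0])
# c1 ~ [4.24, 7.07] (scaled local means), d1 ~ [1.41, 0] (local differences)
```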
- problem: we still want to convert back to our original input after we have done whatever manipulation we wanted to do on the wavelet coeffs, so we need to be able to invert this. Fortunately, this is also possible in a similar way:
4.20 Invert the FWT: Compute $c^j$ from $c^{j+1}$ and $d^{j+1}$
- invert the (fast) wavelet transform, ie. we need to be able to
- compute $P_{V_j} f$ from the coarser projection $P_{V_{j+1}} f$ and the details $P_{W_{j+1}} f$
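- the corresponding inverse step as a sketch (Haar only, matching `fwt_step` above): interleave sums and differences back:

```python
import numpy as np

s = 2 ** -0.5

def ifwt_step(c_coarse, d):
    """One Haar IFWT step: (c^{j+1}, d^{j+1}) -> c^j.
    c^j_{2k}   = (c^{j+1}_k + d^{j+1}_k) / sqrt(2)
    c^j_{2k+1} = (c^{j+1}_k - d^{j+1}_k) / sqrt(2)"""
    c_coarse = np.asarray(c_coarse, dtype=float)
    d = np.asarray(d, dtype=float)
    c = np.empty(2 * len(c_coarse))
    c[0::2] = s * (c_coarse + d)
    c[1::2] = s * (c_coarse - d)
    return c

# round trip: ifwt_step(*fwt_step(c)) recovers c exactly
```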
4.21 Compact Notation of the FWT and IFWT (Algorithm)
- apply (4.19) $M$ times, ie. go from $j$ to $j+1$ $M$ times
- only the $V_j$s are split further and further by the FWT algorithm, while $W_{j+1}$, $W_{j+2}$, … are not split further (see figure below)
- the IFWT goes in the opposite direction in the figure below, ie. from bottom to top
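- putting the two steps together gives the algorithm as a sketch (reusing `fwt_step`/`ifwt_step` from the sketches above): only the coarse part $c$ is split again, the $d$s are kept:

```python
def fwt(c, M):
    """Decompose c^j into (c^{j+M}, [d^{j+1}, ..., d^{j+M}])."""
    details = []
    for _ in range(M):
        c, d = fwt_step(c)    # split V_j into V_{j+1} and W_{j+1}
        details.append(d)
    return c, details

def ifwt(c, details):
    """Rebuild c^j from c^{j+M} and the detail vectors (bottom to top)."""
    for d in reversed(details):
        c = ifwt_step(c, d)
    return c
```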
4.22 $H$ and $G$ can be expressed as Convolutions
4.23 Coefficients $c^j = ((f, \phi_{j,k} )_{L^2} )_{k\in\mathbb{Z}}$ for one $j \in \mathbb{Z}$, ie. the representation of our signal $f$ on an initial scale $j$
- problem:
- To compute the FWT, we still need the coefficients $c^j = ((f, \phi_{j,k} )_{L^2} )_{k\in\mathbb{Z}}$ for one $j \in\mathbb{Z}$, ie. the representation of our signal $f$ on an initial scale $j$.
- Usually, $f$ is not available analytically, so we cannot just compute $c^j$.
- Thus, we need a way to estimate $c^j$ from available data.
- condition:
- let $C := \int_{\mathbb{R}}\phi(x) dx$
- then, \((f, \phi_{j,k} )_{L^2} = 2^\frac{j}{2} Cf (2^j k)+\mathcal{O}(2^j)\) thus, for $-j$ large enough [so that $\mathcal{O}(2^j)$ is negligible], $2^\frac{j}{2} Cf (2^j k)$ is a good approximation of $(f, \phi_{j,k} )_{L^2}$ .
- it is sufficient, if we can sample our signal $f$ at equidistant points as long as the distance between neighboring sample points is sufficiently small.
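- in code, this estimate is just a scaled sampling (a sketch; `f`, `j`, `ks`, and $C=1$ for the p.w.c. generator are my assumptions):

```python
import numpy as np

def initial_coeffs(f, j, ks, C=1.0):
    """Estimate c^j_k = (f, phi_{j,k})_{L^2} by 2^{j/2} * C * f(2^j k).
    C = integral of phi; C = 1 for the p.w.c. generator chi_[0,1).
    Good approximation when -j is large (fine sampling)."""
    ks = np.asarray(ks)
    return 2 ** (j / 2) * C * f(2.0 ** j * ks)

c = initial_coeffs(np.sin, j=-8, ks=np.arange(2 ** 8))  # samples sin on [0,1)
```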
- Let $N = 2^M$.
- computing a single coefficient $(Hc)_k = (c \ast D_{−1} h)_{2k}$ is $\mathcal{O}(n)$, where $n$ is the number of non-zero scaling coefficients $h_k$
- Both $Hc$ and $Gc$ are (due to the downsampling) vectors of length $N/2$ and thus can be computed with $\mathcal{O}(nN)$
- the FWT for decomposition depth $M$ can be computed with $\mathcal{O}(nN)$, which is even faster than the FFT (with its $\mathcal{O}(N\log N)$), if $n$ is small.
- Similarly, one can show that the IFWT is also $\mathcal{O}(nN )$.
- There are numerous ways to construct a two-dimensional wavelet transform.
- The easiest way is to extend the one-dimensional wavelet transform by using the tensor product.
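- in PyWavelets the tensor-product construction is directly available (a sketch, assuming `pywt` is installed): one 2D decomposition step yields the coarse part plus three detail parts, matching the three detail channels discussed below:

```python
import numpy as np
import pywt

img = np.random.rand(256, 256)              # stand-in for an image
cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')   # coarse + 3 detail parts
# cA: local means; cH, cV, cD: horizontal, vertical, diagonal details,
# each of size 128 x 128 due to the downsampling
```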
- slide demo:
- Berkels: these 3 ($H_j^2$, $S_j^2$, $D_j^2$) contain a $V_j$, so why are they called 2D detail spaces? - These 3 ($H_j^2$, $S_j^2$, $D_j^2$) are just details (like edges), although each contains a $V_j$ factor, because these 3 together form the 2D detail space by our definition: $\mathcal{W}_j^2$ must be the orthogonal complement of $V_j^2$ in $V_{j-1}^2$, ie. $V_{j-1}^2=V_j^2\oplus \mathcal{W}_j^2$; so, comparing this with the equation in Figure 4.24, we can see that $\mathcal{W}_j^2=H_j^2\oplus S_j^2\oplus D_j^2$
- Berkels: If you check the picture on the slides, you can also see that there are no local means left in these $H_j^2$, $S_j^2$, $D_j^2$.
Figure 4.24:
- Berkels: on the lower left it picks more vertical structures, on the top right it picks more horizontal structures and on the diagonal it picks more diagonal structures
- Berkels: at the 2nd level we just get horizontal, vertical and diagonal components at a coarser scale. They contain coarser information because they are already averaged.
- why “averaged”?: the 2nd level is the decomposition of $V_{j+1}$ into $V_{j+2}$ and $W_{j+2}$, and $P_{V_{j+1}}f=f_{\text{avg}}$ in the sense of (4.15 (i)), ie. the image at the top left (a, aa, aaa, etc.) is an “average” (for the p.w.c. MRA). In this sense, $W_{j+2}$ (2nd level) is coarser than $W_{j+1}$ (1st level), because $W_{j+2}$ is a part of the already averaged $V_{j+1}$
- for other wavelets (other MRA) than the Haar wavelet the “average” (top left image) and the details (3 other images) look different
- Berkels: but the 2nd level is still more horizontal, vertical and diagonal details. So, the 2nd level gives you additional information that is not in the 1st level. But it is still as directional as the 1st level.
- Berkels: the scale on everything is very much related to your image resolution, not to what you actually see in there
4.25 JPEG, JPEG2000
- Wavelets have countless applications, both in theoretical and in practical work. Here, we confine ourselves to briefly mentioning one practical application.
- Like the DCT, the FWT (Fast Wavelet Transform) can be used for image compression by only storing the most important wavelet coefficients (a toy sketch follows at the end of this section).
- In JPEG2000, the block-wise DCT is replaced by a wavelet transform, which results in significantly improved compression rates, ie. better image quality using the same amount of data or same image quality using less data.
- unlike the DCT, the WT has no “blocking” artifacts
- While the classical JPEG format is still much more popular in digital photography than JPEG2000, digital cinema uses JPEG2000 as basis for video compression.
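- the toy compression sketch promised above (my own illustration, not the actual JPEG2000 codec: no quantization, no entropy coding; assuming `pywt` is installed): decompose, keep only the largest coefficients, reconstruct:

```python
import numpy as np
import pywt

def toy_compress(img, wavelet='db2', level=4, keep=0.05):
    """Zero out all but the fraction `keep` of largest wavelet coefficients."""
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    arr, slices = pywt.coeffs_to_array(coeffs)        # flatten to one array
    thresh = np.quantile(np.abs(arr), 1 - keep)
    arr[np.abs(arr) < thresh] = 0.0                   # discard small coefficients
    coeffs = pywt.array_to_coeffs(arr, slices, output_format='wavedec2')
    return pywt.waverec2(coeffs, wavelet)
```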