4 The Wavelet Transform

  • The FT is a global operator: every coefficient depends on the entire function. You don’t always want this full globality. So, it is natural to ask for more local transforms, ie. where one coefficient only depends on a small part of your function, not on its entirety, so that, by manipulating the coefficients of your transform, you can make local changes to your function instead of being forced to make global changes. Think about the low-pass filter: it removed the high frequencies/the details, but it also introduced some “ringing” because of these global changes. The wavelet transform is a more localized transform.
  • So we want something that is more localized than the FT.

4.1 The Continuous Wavelet Transform

  • conditions:
    • $f,\psi\in L^2$
      • $f$: the image
      • $\psi$: encodes how you want to transform
    • $a>0$ (scale parameter)
    • $b\in\mathbb{R}$ (spatial parameter)
  • then, the wavelet transform of $f$ is \(L_{\psi}f(a,b) = \int_\mathbb{R} f(x)\boxed{\frac{1}{\sqrt{a}}\psi\left(\frac{x-b}{a}\right)}dx\)
  • like a convolution, see (4.2 (ii) (B))
  • we are integrating our image $f$ against the weight $\frac{1}{\sqrt{a}}\psi$. What we do with our $\psi$ is: we shift $\psi$ to a certain position $b$, ie. we shift it over the signal to wherever we want to look, and we scale $\psi$ with $a$ (and normalize with $\frac{1}{\sqrt{a}}$). This $\frac{1}{\sqrt{a}}$ and $a$ is exactly the change of variables that “zooms in” or “out” to grab some information from your signal at that position and scale.
    • Eg. if $\psi$ were just the characteristic function of $\left[0,1\right]$, then, for $a=1$ and $b=0$, the transform would give you a mean value on $\left[0,1\right]$,
      • and if you now change $b$ you would take different mean values because you are shifting this $\psi$
      • and with changing this $a$ you would be able to do mean values or some accumulation over smaller or larger intervals depending on how you choose this $a$.
      • So this somehow samples … (interrupted by question)
    • Q&A: in case $\psi$ does not have compact support then it is still global, not local! So, finding good $\psi$s is a very important question. For now we just know (vaguely) that we have some $\psi$. We will need to find out what suitable $\psi$s are and what they should fulfill.
    • But, in general, you have this signal, you go to a certain position of it with $b$, you choose a certain zoom factor $a$, and then you aggregate values there (see the numerical sketch below).
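To make (4.1) concrete, here is a minimal numerical sketch in Python/NumPy (the lecture contains no code; the wavelet, namely the Gaussian derivative introduced later in (4.7 (i)), the test signal, and all names are illustrative assumptions of this note). The integral is approximated by a Riemann sum on a grid:

```python
import numpy as np

def cwt_point(f_vals, x, psi, a, b):
    """Riemann-sum approximation of L_psi f(a,b) = int f(x) 1/sqrt(a) psi((x-b)/a) dx, cf. (4.1)."""
    dx = x[1] - x[0]
    return np.sum(f_vals * psi((x - b) / a)) * dx / np.sqrt(a)

psi = lambda t: t * np.exp(-t**2 / 2)   # Gaussian-derivative wavelet, cf. (4.7 (i))
x = np.linspace(-10, 10, 4001)
f_vals = np.tanh(5 * x)                 # smoothed step: its only "detail" sits near x = 0

print(cwt_point(f_vals, x, psi, a=0.5, b=0.0))  # probing at the edge: large response
print(cwt_point(f_vals, x, psi, a=0.5, b=5.0))  # probing a flat region: response near 0
```

The two evaluations illustrate the locality: moving $b$ away from the edge (or choosing a scale $a$ that does not match the feature) makes the coefficient small.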

4.2 FT vs WT, Representations of $L_{\psi}f$

  • (i) $L_{\psi}$ maps $f:\mathbb{R}\to\mathbb{C}$ to $L_{\psi}f:(0,\infty)\times\mathbb{R}\to\mathbb{R}$
    • in words: the result of the WT is a 2D function (2-dimensional domain for $L_{\psi}f$)
      • recall: the result of the FT is the 1D function, ie. $\mathcal{F}$ maps $f:\mathbb{R}\to\mathbb{C}$ to $\mathcal{F}f: \mathbb{R}\to\mathbb{C}$ (domain and codomain are the same for $f$ and $\mathcal{F}f$)
    • 2D: because you can evaluate it for different $a$s and $b$s, not only for different frequencies.
    • We increase from a 1D domain to a 2D domain which already implies that this representation (i) somehow has to be overcomplete. So, this has to have redundant information because it cannot be that we need a 2D function to describe a 1D function. The 2D function has much more power to store information.
  • (ii) 2 representations of $L_{\psi}f$ (equivalent to (4.1)):
    • (A) $L_{\psi}f$ as scalar product: $L_{\psi}f(a,b)=\frac{1}{\sqrt{a}}(f,T_{-b}D_{\frac{1}{a}}\psi)_{L^2}$
    • (B) $L_{\psi}f$ as convolution: $L_{\psi}f(a,b)=\frac{1}{\sqrt{a}}(f\ast D_{-\frac{1}{a}}\psi)(b)$
      • just a side remark: (B) is related to (2.16): $u(x,\sigma)=(g_{\sqrt{2\sigma}}\ast f)(x)$, where we also filtered with kernels of different sizes, so there is some relation
    • depending on the circumstances one representation will be more useful than the other
  • (iii) Where does $L_{\psi}f$ map? Just a variant of $L^2$ on $\left[0,\infty\right)\times\mathbb{R}$: \(L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2}):=\left\{F:[0,\infty)\times\mathbb{R}\to\mathbb{R}\ \mu-\text{measurable}:\int_\mathbb{R}\int_0^{\infty}\lvert F(a,b)\rvert^2\frac{dadb}{a^2} < \infty\right\}\)
    • notation: this $\frac{dadb}{a^2}$ in $L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})$ is not a set into which we map (like eg. in the $L^2(\mathbb{R}^d,\mathbb{C})$ notation, where we map from $\mathbb{R}^d$ to $\mathbb{C}$), but it tells us that we need to change the integration measure $dadb$ by rescaling it with $\frac{1}{a^2}$. (just Berkels’ notation)
    • everything except the $a^2$ scaling would be “normal Lebesgue”, in particular, $\int_\mathbb{R}\int_0^{\infty}\lvert F(a,b)\rvert^2 dadb < \infty$ would be “normal Lebesgue”, ie. the square integral is finite
      • we weight this integral $\int_\mathbb{R}\int_0^{\infty}\lvert F(a,b)\rvert^2 dadb$ by $a^2$: that means for large $a$ this integral will be decreased because we multiply by sth that is small, but for small $a$ this integral will be strongly increased. So, this has a singularity at $a=0$ which puts certain requirements on this transform $F$, so that $\lvert F(a,b)\rvert^2$ has to vanish at $a=0$ (otherwise $\int_\mathbb{R}\int_0^{\infty}\lvert F(a,b)\rvert^2 dadb$ would not be finite) and even a bit more.
  • (iv) equipped with the same equivalence relation used for the Lebesgue spaces (B.3) and scalar product \((F,G)_{L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})}:=\int_\mathbb{R}\int_0^{\infty} F(a,b)G(a,b)\frac{dadb}{a^2}\)
    • the scalar product looks like the normal scalar product for $L^2$, but with the rescaled measure $\frac{dadb}{a^2}$ (which we also required for the new $L^2$ space that we just defined in (iii))
  • problem: so far it is unclear why this $a^2$ is sitting there, but in the next proposition we will see why we have it and how it relates to the wavelet transform

4.3 $L_{\psi}$ maps into $L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})$, $L_{\psi}$ is Linear, Plancherel Formula

  • conditions:
    • $\psi\in L^2$ s.t. $0<c_\psi:=2\pi\int_{0}^{\infty}\frac{\lvert \hat{\psi}(\omega)\rvert^2}{\omega}d\omega<\infty$ (warning: these are 3 conditions!)
      • where $\hat{\psi}$ is the FT of $\psi$
  • then,
    • (i) $L_\psi:L^2\to L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})$
      • this means that the functions that we get from this transform are not in the normal $L^2$ space, but in this “$L^2$ with the different scaling $a^2$”. That is why we need this new space.
    • (ii) $L_\psi$ is a linear mapping
    • (iii) $(L_{\psi} f,L_{\psi} g)_{L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})}=c_{\psi}(f,g)_{L^2}$ for all $f,g\in L^2$
      • like Plancherel formula which said that $(\mathcal{F}f,\mathcal{F}g)_{L^2} = (f,g)_{L^2}$. This is essentially also true here. Except that we do not have the normal scalar product on the lhs, but this new scalar product and there is this extra scaling factor $c_\psi$.
      • (iii) says even more than that $L_{\psi}$ is continuous (discussed more precisely in (4.5 (ii))). Because (iii) is a very strong property. It says that $L_{\psi}$ is almost like an isometry, except that it changes lengths by the constant factor $\sqrt{c_\psi}$.
  • proof:
    • prove (iii) using
      • Plancherel formula (3.27)
      • properties of $\mathcal{F}$ for translation, modulation, etc. (3.4)
      • Fubini
      • $f$ real-valued $\Rightarrow$ $\mathcal{F}f$ hermitian (3.6)
    • (iii) now immediately implies (i)
      • because (i) claims that we are mapping into this special space, and for this special space we need that $\int_\mathbb{R}\int_0^{\infty}\lvert F(a,b)\rvert^2\frac{dadb}{a^2}$ is finite. But if we plug $f=g$ into (iii), the integral we are looking for, $(L_{\psi} f,L_{\psi} f)_{L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})}$, equals $c_{\psi}(f,f)_{L^2}$, ie. $c_\psi$ times the norm squared of $f$, and the norm squared of $f$ is, of course, finite since we start with $f\in L^2$.
    • (ii), the linearity of $L_{\psi}$, is clear because the integral in the def. of $L_{\psi}$ is linear in $f$ (see def. (4.1))
  • sidenote: if you look closely in (4.3), we did not need that $c_{\psi} > 0$, we only needed $c_{\psi} < \infty$. But we will need $c_{\psi} > 0$ to get the inverse.

4.4 Admissibility Condition, Wavelet

  • (i) the condition $0<c_\psi:=2\pi\int_{0}^{\infty}\frac{\lvert \hat{\psi}(\omega)\rvert^2}{\omega}d\omega<\infty$ is called admissibility condition and $c_\psi$ is the admissibility constant
    • name “admissibility” because if this condition is not fulfilled, ie. $c_{\psi} = \infty$, then the right-hand side $c_{\psi}(f,g)_{L^2}$ in (4.3 (iii)) explodes.
    • If $c_{\psi} = 0$, then the norm degenerates: $f$ would be mapped to norm $0$, although $f$ itself is not $0$.
  • (ii) any $\psi\in L^2$ that fulfills the admissibility condition is called wavelet
  • outlook:
    • $c_\psi<\infty$ gives us the continuity of the WT → (4.5 (ii))
    • $c_\psi>0$ gives us the invertibility of the WT → (4.6)

4.5 $L_{\psi}$ Almost Isometry, $L_{\psi}$ is Continuous, ($\hat{\psi}(0) = 0\Leftrightarrow$ Mean Value of $\psi$ (not $\hat{\psi}$!) is $0$)

  • (i) the scalar product in (4.2 (iv)) induces the norm \(\lVert L_{\psi}f\rVert_{L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})}=\sqrt{(L_{\psi}f,L_{\psi}f)_{L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})}}\stackrel{(4.3)}{=}\sqrt{c_{\psi}(f,f)_{L^2}}=\sqrt{c_{\psi}}\lVert f\rVert_{L^2}\quad\forall f\in L^2\)
  • (ii) from (i) we see that (4.3) shows that the WT is continuous if $c_{\psi}<\infty$
    • because (i) shows the boundedness of the WT and since the WT is linear (4.3 (ii)) we can conclude from (ex05 problem 2) that the WT is continuous
  • (iii) since $\displaystyle\int_0^1\frac{1}{\omega}d\omega=\lim_{a\searrow 0}\int_a^1\frac{1}{\omega}d\omega=\ldots=\infty$ and the integral in the admissibility condition contains an integral of this structure (it includes the integral from $0$ to $1$; if $\lvert \hat{\psi}(\omega)\rvert^2$ did not fall below a certain value there, the integral in the admissibility condition would be $\infty$), it is necessary that $\lvert \hat{\psi}(\omega)\rvert^2$ decays to $0$ quickly enough for $\omega \to 0$ in a suitable sense to get $c_\psi < \infty$. Ie. the finiteness of the integral in the admissibility condition at $0$ can only come from the fact that $\lvert \hat{\psi}(\omega)\rvert^2$ also vanishes there.
    • In case $\hat{\psi}$ is continuous (at least in $0$), $\lvert \hat{\psi}(\omega)\rvert^2 \to 0$ for $\omega \to 0$ implies $\hat{\psi}(0) = 0$ is necessary (but not sufficient) to get $c_\psi < \infty$.
      • note: since $\psi\in L^2$ and the FT maps from $L^2$ to $L^2$ it is not guaranteed that $\hat{\psi}$ is continuous.
      • phth: In case $\hat{\psi}$ is not continuous in $0$, $\lvert \hat{\psi}(\omega)\rvert^2 \to 0$ for $\omega \to 0$ does not imply $\hat{\psi}(0) = 0$ (eg. think about the function $f$ with $f=1$ for $x=0$ and $f=0$ for $x\neq 0$, then $f$ is not continuous in $0$ and $\lvert f(x)\rvert^2\to 0$ for $x\to 0$, but $f(0)\neq 0$)
    • What does $\hat{\psi}(0) = 0$ mean?
      • we have discussed this: the $0$ frequency is the only frequency that you can easily interpret: the $0$ frequency encodes somehow the mean: so if the $0$ frequency is $0$ then the mean of your image is $0$ (here: mean of this $\psi$ is $0$).
    • In other words, it is necessary (but not sufficient) that the mean value of $\psi$ (not $\hat{\psi}$!) is $0$ for $\psi$ to be a wavelet.
  • (iv) The condition $c_\psi > 0$ is necessary to guarantee the invertibility of the transform on its image (4.6):

4.6 Wavelet Inversion Theorem

  • conditions:
    • $f\in L^2$
    • $\psi\in L^2$
    • $\psi$ is a wavelet
      • ie. $\psi$ fulfills the admissibility condition
  • then, \(\text{for all}\ f\in L^2:\ f(x) = \frac{1}{c_\psi}\int_{\mathbb{R}}\int_0^\infty L_{\psi}f(a,b)\frac{1}{\sqrt{a}}\psi\left(\frac{x-b}{a}\right)\frac{dadb}{a^2}\quad\text{for a.e.}\ x\in\mathbb{R}\)
  • in words: we can transform back by integrating over $da$ and $db$ (instead of $dx$ as in the WT (4.1)) with the proper scaling $a^2$ and the same integration weight $\frac{1}{\sqrt{a}}\psi\left(\frac{x-b}{a}\right)$ as in the WT (4.1)
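As a sanity check, here is a minimal numerical sketch of this reconstruction formula in Python/NumPy (assumptions of this note: the Gaussian-derivative wavelet from (4.7 (i)) with $c_\psi = \pi$, ad-hoc log-spaced scales and linear positions; the result is only approximate because the $(a,b)$ domain is truncated and discretized):

```python
import numpy as np

psi = lambda t: t * np.exp(-t**2 / 2)   # Gaussian-derivative wavelet, c_psi = pi, cf. (4.7 (i))
c_psi = np.pi

x = np.linspace(-8, 8, 801)
dx = x[1] - x[0]
f = np.exp(-(x - 1)**2)                 # illustrative test signal

a_grid = np.exp(np.linspace(np.log(0.05), np.log(8), 60))  # log-spaced scales
b_grid = np.linspace(-8, 8, 161)
db = b_grid[1] - b_grid[0]

# forward transform L_psi f(a,b) via Riemann sums, cf. (4.1)
W = np.array([[np.sum(f * psi((x - b) / a)) * dx / np.sqrt(a)
               for b in b_grid] for a in a_grid])

# reconstruction (4.6); on the log-spaced grid, da = a * dlog(a)
dlog = np.log(a_grid[1]) - np.log(a_grid[0])
f_rec = np.zeros_like(x)
for i, a in enumerate(a_grid):
    for j, b in enumerate(b_grid):
        f_rec += W[i, j] * psi((x - b) / a) / np.sqrt(a) * (a * dlog) * db / a**2
f_rec /= c_psi

print(np.max(np.abs(f - f_rec)))        # small, but nonzero: truncation/discretization error
```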
  • proof:
    • Hilbertian adjoint $L_{\psi}^\ast:L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})\to L^2(\mathbb{R})$ of $L_{\psi}$
      • a mapping that goes exactly in the other direction than $L_{\psi}$, ie. it is mapping from the weighted $L^2$ space to $L^2$
      • idea:
        • in finite dimensions the expression $(L_{\psi}f,F)_{L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})}$ would be a scalar product on some $\mathbb{R}^n$ and the linear mapping $L_{\psi}$ would be a matrix, ie. the expression would be like the scalar product $(\underline{A}x,y)_{\mathbb{R}^n}$
        • recall (a standard linear algebra trick): what can you do to move the matrix $\underline{A}$ in $(\underline{A}x,y)_{\mathbb{R}^n}$ to the other argument $y$?
          • for real $\underline{A}$: $(\underline{A}x,y)_{\mathbb{R}^n}=(x,\underline{A}^{\top} y)_{\mathbb{R}^n}$
          • for complex $\underline{A}$: $(\underline{A}x,y)_{\mathbb{R}^n}=(x,\overline{\underline{A}^{\top}} y)_{\mathbb{R}^n}$
            • recall: Hermitian matrices $\underline{A}=\overline{\underline{A}^{\top}}$ can be understood as the complex extension of real symmetric matrices $\underline{A}=\underline{A}^{\top}$.
          • there are 2 kinds of adjoints, Hilbert space adjoint vs Banach space adjoint
            • “Hilbertian” adjoint (usually called conjugate transpose)
            • “Dual” or “Banachian” adjoint
        • the Hilbertian adjoint just generalizes the concept of transposing or adjoining a matrix to move it from one scalar product argument to the other to linear mappings on Hilbert spaces (so, just think of it as a generalized transpose)
  • problem:
    • so far we have only shown that it is necessary that the mean of $\psi$ is $0$ (to ensure $\hat{\psi}(0)=0$), but this is not sufficient for $\psi$ to be a wavelet!

4.7 Example Wavelets: 1st and 2nd Derivatives of the Gaussian, Haar Wavelet

4.7 (i) 1st Derivative of the Gaussian

  • (i) 1st derivative of the Gaussian: Let $\psi(x):=xe^{-\frac{x^2}{2}}=-\frac{d}{dx}g(x)$, where $g(x):=e^{-\frac{x^2}{2}}$
    • idea: to show that this is a wavelet we need to compute $c_\psi$ and this involves computing the FT of $\psi$ (now you can see why I chose the Gaussian first because there we know what the FT is)
  • then, \(\hat{\psi}(\omega)=\mathcal{F}\psi(\omega)=\mathcal{F}\left(-\frac{d}{dx}g\right)(\omega)\stackrel{(3.17)}{=}-i\omega\mathcal{F}g(\omega)\stackrel{(3.20)}{=}-i\omega g(\omega)=-i\omega e^{-\frac{\omega^2}{2}}=-i\psi(\omega)\) \(\Rightarrow c_\psi=2\pi\int_0^\infty\frac{\lvert \hat{\psi}(\omega)\rvert^2}{\omega}d\omega=2\pi\int_0^\infty\frac{\lvert -i\omega e^{-\frac{\omega^2}{2}}\rvert^2}{\omega}d\omega=2\pi\int_0^\infty\frac{\omega^2 e^{-\frac{2\omega^2}{2}}}{\omega}d\omega=2\pi\int_0^\infty\omega e^{-\omega^2}d\omega=\pi\int_0^\infty 2\omega e^{-\omega^2}d\omega=\pi\left(-e^{-\omega^2}\right)\bigg\vert_0^\infty=\pi\) \(\Rightarrow \psi\ \text{is a wavelet.}\)
  • important: unlike the mean of $\psi$, the mean of $g$ is not $0$, so $g$ cannot be a wavelet itself!
  • intuition:
    • $g$ is always above the x-axis, ie. the mean (average value of the function) of $g$ must be positive and not equal to $0$!
    • $\psi$ is in part below the x-axis.
      • $\psi$ is the product of the function $x\mapsto x$ and a Gaussian. This product must be negative for $x<0$ because the first factor is negative there, ie. the graph of $\psi$ lies in the 3rd quadrant for $x<0$. Since $\psi(0)=0$, the graph goes through the origin. For $x>0$ the graph lies in the 1st quadrant because both factors are positive there.

[Figure: the Gaussian $g$ and the first-derivative wavelet $\psi=-g^\prime$]

  • With the convolution representation of the wavelet transform, we get \(L_{\psi}f(a,b)=\frac{1}{\sqrt{a}}(f\ast D_{-\frac{1}{a}}\psi)(b)=-\frac{1}{\sqrt{a}}\left(f\ast D_{-\frac{1}{a}}g^\prime\right)(b)=\frac{a}{\sqrt{a}}\left(f\ast (D_{-\frac{1}{a}}g)^\prime\right)(b)=\sqrt{a}\left(f\ast (D_{-\frac{1}{a}}g)\right)^\prime(b)\)
    • Similarly to the Canny Edge Detector (2.16), the input $f$ is convolved with a scaled Gaussian for different scales (ie. different kernel sizes).
      • in (2.16) we noticed that for different scales we get different edge strengths and we had some nice properties and it is somewhat similar here
    • Unlike the Canny Edge Detector (2.16), a derivative is taken after the convolution.
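Here is a small numerical illustration of this identity (a sketch under assumptions of this note: Python/NumPy, a step signal, an arbitrary scale $a=0.8$; the domain boundaries are excluded from the comparison since both sides are truncated differently there):

```python
import numpy as np

g   = lambda t: np.exp(-t**2 / 2)      # Gaussian kernel
psi = lambda t: t * np.exp(-t**2 / 2)  # = -g'

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]
f = (x > 0).astype(float)              # step edge at x = 0
a = 0.8

# left-hand side: direct quadrature of (4.1) for every b on the grid
lhs = np.array([np.sum(f * psi((x - b) / a)) * dx / np.sqrt(a) for b in x])

# right-hand side: smooth f with the scaled Gaussian, then differentiate in b
smoothed = np.convolve(f, g(x / a), mode="same") * dx
rhs = np.sqrt(a) * np.gradient(smoothed, dx)

interior = slice(100, -100)            # ignore boundary truncation effects
print(np.max(np.abs(lhs[interior] - rhs[interior])))  # small discretization error
```

As with the Canny edge detector, the response is concentrated around the edge at $x=0$, with a width governed by the scale $a$.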

4.7 (ii) Mexican Hat

  • (ii) 2nd derivative of the Gaussian, aka Mexican hat: Let $\psi(x):=\left(1-x^2\right)e^{-\frac{x^2}{2}}=-\frac{d^2}{dx^2}g(x)$
    • one can show (exercise, similar to (i)) that $\hat{\psi}(\omega)=\omega^2 e^{-\frac{\omega^2}{2}}$ and $c_\psi=\pi$. Thus, the Mexican hat is also a wavelet.

[Figure: the Mexican hat wavelet]

  • this gives us some hint at what wavelets can look like and what kind of information they possibly extract from our functions:
    • these wavelets are not local mean values because the wavelets themselves have mean $0$, but you can think of (i), for instance, as something like a smoothed local derivative. Because, at $x=0$, $\psi$ weights the things to the right of the point $x=0$ positively and the things to the left of the point $x=0$ negatively. Thus, $\psi$ is something like a derivative (cf. derivative filters (2.12)).
    • however, (ii) is more like a 2nd derivative because you have a positive value in the center and negative values in the surrounding, but all very smoothed out.
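Both admissibility constants can be cross-checked by direct quadrature of the admissibility integral, using the closed-form Fourier transforms stated above (a minimal sketch in Python/NumPy; the grid and cutoff are ad-hoc choices of this note):

```python
import numpy as np

# c_psi = 2*pi * int_0^inf |psi_hat(w)|^2 / w dw, cf. (4.3)/(4.4)
w = np.linspace(1e-6, 40, 400001)
dw = w[1] - w[0]

psi_hat_sq_1 = (w * np.exp(-w**2 / 2))**2     # |FT of 1st derivative of the Gaussian|^2
psi_hat_sq_2 = (w**2 * np.exp(-w**2 / 2))**2  # |FT of the Mexican hat|^2

print(2 * np.pi * np.sum(psi_hat_sq_1 / w) * dw)  # ~ pi, matching (i)
print(2 * np.pi * np.sum(psi_hat_sq_2 / w) * dw)  # ~ pi, matching (ii)
```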

4.7 (iii) Haar Wavelet

  • (iii) Haar wavelet: \(\begin{equation*} \psi(x):= \begin{cases} 1 & 0 \leq x < \frac{1}{2}\\ -1 & \frac{1}{2} \leq x < 1\\ 0 & \text{else} \end{cases} \end{equation*}\)
    • here it is easier to see what $\psi$ computes: $\psi$ computes the mean over $\left[0,\frac{1}{2}\right]$ minus the mean over $\left[\frac{1}{2},1\right]$, so it is also sth like a derivative of your input
    • to show that the Haar wavelet is a wavelet we can
      • represent $\psi$ as $\psi=\chi_{\left[0,\frac{1}{2}\right)}-\chi_{\left[\frac{1}{2},1\right)}$ and then we can use the properties of the FT to show that \(\hat{\psi}(\omega)=\frac{1}{\sqrt{2\pi}}ie^{-i\frac{\omega}{2}}\sin\left(\frac{\omega}{4}\right)\text{sinc}\left(\frac{\omega}{4\pi}\right)\)
        • this should not surprise us because we know that the FT of $\chi_{\left[-B,B\right]}$ is a cardinal sine and we know that shifts are turned into modulations, and therefore, $\chi_{\left[0,\frac{1}{2}\right)}$ would have to be shifted to $\chi_{\left[-\frac{1}{4},\frac{1}{4}\right)}$ (ie. the support of $\chi$ is shifted to the origin s.t. $\chi$ has the shape $\chi_{\left[-B,B\right]}$) and $\chi_{\left[\frac{1}{2},1\right)}$ as well, and that is encoded in this modulation $e^{-i\frac{\omega}{2}}$ (cf. (3.4))
        • this is not completely obvious and you really need to compute this to see that this formula is correct
      • based on that we can show $c_\psi=\ln{2}$, thus $\psi$ is a wavelet
        • interesting: you do not need to compute this manually because $c_\psi=\ln{2}$ also follows from (4.16) + (4.17 (i))
          • outlook: we will show how to generate wavelets with certain properties and with this generation we can also create the Haar wavelet and that is why it will inherit the general properties
    • Unlike the derivatives of the Gaussian we considered before (ie. (i) and (ii)) that are very smooth and have non-compact support, the Haar wavelet is discontinuous and has compact support.
      • interesting: it is actually quite difficult to find wavelets that are both smooth and have compact support (will be discussed later)

[Figure: the Haar wavelet]
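The value $c_\psi=\ln{2}$ can likewise be cross-checked numerically from the closed-form $\hat{\psi}$ above (a sketch in Python/NumPy; note that np.sinc is the normalized sinc $\frac{\sin(\pi x)}{\pi x}$, matching the $\text{sinc}\left(\frac{\omega}{4\pi}\right)$ used here; cutoff and grid are ad-hoc choices of this note):

```python
import numpy as np

w = np.linspace(1e-6, 400, 4_000_001)
dw = w[1] - w[0]

# |psi_hat|^2 for the Haar wavelet from the formula above
psi_hat_sq = (np.sin(w / 4) * np.sinc(w / (4 * np.pi)))**2 / (2 * np.pi)

c_psi = 2 * np.pi * np.sum(psi_hat_sq / w) * dw
print(c_psi, np.log(2))   # both ~ 0.6931 (the tail beyond the cutoff is negligible)
```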

4.2 The Discrete Wavelet Transform

  • note: unlike what the name suggests this discrete WT is not fully discrete like the DFT. (However, there are some relations to the FS, so it is more like the FS.)
  • problem: One thing that is very apparent is that this WT so far is redundant because we take a 1D function and convert it to a 2D function. Therefore, this 2D function has to contain redundant information because we should not need a 2D function to describe a 1D function.
  • more precisely: The (continuous) wavelet transform of a function $f \in L^2 (\mathbb{R})$ is apparently a redundant representation of $f$ ($f$ is univariate (1D function), $L_\psi f$ is bivariate (2D function)).
    • The question arises whether it is sufficient to know $L_\psi f$ on a subset of $\left(0,\infty\right) \times \mathbb{R}$ (ie. perhaps we do not need to compute it for all the scalings and all the shifts, but just for some of them).
      • It will turn out that this is indeed the case for certain discrete subsets, but showing this requires several preparations.
      • But it is very important to venture into this because in order to practically use the WT we need to figure out for which scalings and for which translations we actually need these computations.
  • problem:
    • the WT is redundant
    • we are now on our way to the DWT, which is supposed to only look at the necessary spatial and scaling parameters, so not at all of them, but hopefully just a subset
    • the first tool on this way, making this possible, is the concept of a so-called MRA:

4.8 MRA, MSA, Generator

  • For $j \in \mathbb{Z}$, let $V_j$ be (vii) a closed subspace of $L^2(\mathbb{R})$.
  • Then, $(V_j)_{j\in \mathbb{Z}}$ is called multiresolution analysis (MRA) or multiscale approximation (MSA), if it fulfills the following conditions:
    • (i) translational invariance $\forall j, k \in \mathbb{Z} : f \in V_j \Leftrightarrow T_{2^j k} f \in V_j$
      • when we fix a certain $j$, we must be able to shift $f$ within $V_j$ while $f$ stays in $V_j$. But we are not allowed to shift arbitrarily; we have to make discrete steps: $k$ is the number of steps that we can take. So we can take arbitrarily many steps (because $k$ ranges over $\mathbb{Z}$), but the step size is $2^j$, ie. the bigger the $j$, the larger the step. This hints that these $V_j$’s get smaller and smaller (→ (ii)): for $j=0$ we must be able to shift by step size $2^0=1$, for $j=1$ we only need to be able to shift by step size $2$, etc.
    • (ii) inclusion $\forall j \in \mathbb{Z} : V_{j+1} \subset V_j$
      • ie. the space is getting smaller when I increase $j$
        • “if $j_1 > j_2$ the space $V_{j_1}$ is included in $V_{j_2}$”
        • with increasing $j$ the spaces can only get smaller
    • (iii) scaling $\forall j \in \mathbb{Z} : f \in V_j \Leftrightarrow D_{\frac{1}{2}} f \in V_{j+1}$
      • this $D_{\frac{1}{2}}$ just means that everything is spread out
        • eg. let's say your $f$ is the characteristic function of the interval $\left[0,1\right]$; then, when you apply $D_{\frac{1}{2}}$, it becomes the characteristic function of $\left[0,2\right]$ (which also relates to the fact that if $j$ gets bigger then these shifts $T_{2^jk}$ get bigger)
    • problem: we need some more properties to make this meaningful because so far with (i-iii) we could just say that every $V_j$ is all of $L^2$, then everything here is trivially fulfilled. So, we need to prevent that with the following properties:
    • (iv) trivial intersection $\bigcap_{j\in \mathbb{Z}} V_j = \{0\}$
      • ie. the intersection of all the $V_j$’s is just $0$ which formalizes that these spaces get smaller and smaller and at some point there is nothing in there any more.
    • problem: but (iv) would still be fulfilled if we just say all $V_j$’s are just $0$, then everything is trivially fulfilled. We do not want this, either. Therefore, we require (v):
    • (v) completeness (actually, “denseness in $L^2$”) $\overline{\bigcup_{j\in \mathbb{Z}} V_j} = L^2(\mathbb{R})$
      • phth: aka “$\bigcup_{j\in \mathbb{Z}} V_j$ is dense in $L^2$”
        • see “Prerequisites” → “dense” → def. (i)
      • Berkels: in this sense the $V_j$’s must capture everything that is in $L^2$.
      • phth: complete in the sense that $L^2$ is complete (Riesz-Fischer theorem) and we must be able to approximate any $L^2$ function arbitrarily well with a sequence of functions in $\bigcup_{j\in \mathbb{Z}} V_j$, where the limit of this sequence need not necessarily be in $\bigcup_{j\in \mathbb{Z}} V_j$ (like we can approximate any element of $\mathbb{R}$ with a sequence in $\mathbb{Q}$)
    • (vi) orthonormal basis $\exists\phi \in V_0 : \{T_k \phi : k \in \mathbb{Z}\}$ is a CONS of $V_0$
      • $T_k \phi$ means “shift $\phi$ by integers $k$”
      • the “shift $\phi$ by integers” links to the translational invariance (i) for $j=0$. There [in (i)] we [require that all $f$ in $V_0$] also need to be able to shift by integers. Thus, if $\phi$ is in $V_0$, then all the $T_k\phi$ must [necessarily!] be in $V_0$ by property (i). But [in (vi)] what we want is that this [set] is a CONS!
        • which gives us a very nice way to construct a basis for these kind of spaces
  • The function $\phi$ from the orthonormal basis property is called generator or scaling function of the MRA.
    • name “generator” comes from (vi) because $\phi$ generates an ONS of $V_0$
    • name “scaling function” comes from the fact that $\phi$ fulfills the scaling equation (4.11), ie. $\phi$ can be expressed by a scaled version of itself
  • problem: this sounds like a very abstract set of properties where it is not so clear yet why this is good or perhaps not even clear how we could even choose this $V_j$. So let us first start with a simple example of such an MRA:

4.9 Example MRA: “Space of Functions that are Constant on Dyadic Intervals”

  • for $j\in\mathbb{Z}$ let \(V_j:=\{f\in L^2 : f\vert_{\left[k2^j,(k+1)2^j\right)}\ \text{is constant for all}\ k\in\mathbb{Z}\}\)
    • ie. $V_j$ is the space of square integrable functions that are constant on the dyadic intervals $\left[k2^j , (k + 1)2^j\right)$ (“dyadic” because the interval limits are powers of $2$, related: dyadic logarithm)
      • phth: for p.w. constant functions, square integrability means that the squared constant values, weighted by the interval lengths, are summable over $k$; in particular the values must tend to $0$ towards $\pm\infty$ (compact support is sufficient, but not necessary)

[Figure: piecewise constant functions on dyadic intervals for different $j$]

  • then, $(V_j)_{j\in\mathbb{Z}}$ is a MRA.

Proof (vii)

  • (vii) The $V_j$ are obviously closed subspaces of $L^2(\mathbb{R})$.
    • Berkels: If you have a sequence of functions $f_n\in V_j$ for a fixed $j$ that are piecewise constant on the intervals $\left[k2^j , (k + 1)2^j\right)$, then their limit $\lim_{n\to\infty}f_n = f$ will also be piecewise constant on the same intervals, ie. $f\in V_j$. There is nothing that could happen to make the limit not piecewise constant.
      • “sequential closedness” (Bredies p. 17): “(…) for every sequence $( x_n )$ in $U$ with $x_n \to x$, the limit $x$ is also in $U$”
      • recall: “closed” = “closed under certain operations” (aka “closed under the limit operation”)
        • eg. closed under multiplication means that multiplication does not lead out of the set, ie. the product stays in the set that contained the factors
        • “In a complete metric space, a closed set is a set which is closed under the limit operation.”, Wikipedia
          • $L^2$ is complete (Riesz-Fischer Theorem)

Proof (i) - (iii)

  • (i) Translational invariance, (ii) inclusion and (iii) scaling are obviously fulfilled from the corresponding properties of the dyadic intervals.
    • (i) - (iii) are intuitively clear:
      • (i) we need to be able to shift by multiples of $2^j$: if, say, $j=0$, then we shift the function by $2^0=1$, or by $2^1=2$, etc., and this just moves the constant values from one interval to another interval; they stay piecewise constant on these intervals, you just shift the values to different intervals
      • (ii) if you are piecewise constant on intervals of length $2^{j+1}$, then of course you are also piecewise constant on intervals of length $2^j$ (because the latter is just a weaker property)
      • (iii) if you scale by $\frac{1}{2}$ you are supposed to go from $V_j$ to $V_{j+1}$: look at the green function in the figure above; if we scale it by $\frac{1}{2}$, everything is stretched by a factor of two, so you just switch from being constant on intervals of length one to being constant on intervals of length two

Proof (iv)

  • (iv) A function that is in the intersection of all $V_j$ has to be constant
    • on $\left[0,1\right)$ and on $[−1, 0)$ (for $j=0$).
    • on $\left[0,2\right)$ and on $[−2, 0)$ (for $j=1$).
    • … etc.
    • on $\left[0,\infty\right)$ and on $(−\infty, 0)$ (for $j\to \infty$).
    • The only $L^2$-function (being in $L^2$ means the function must be square integrable on $\left[0,\infty\right)$ and $(−\infty, 0)$!) that fulfills this is the zero function.

Proof (v)

  • (v) The completeness follows from the facts that $C_c(\mathbb{R})$ is dense in $L^2(\mathbb{R})$ (B.8), that compactly supported continuous functions (ie. functions in $C_c(\mathbb{R})$) are uniformly continuous, and that uniformly continuous functions can be approximated in the supremum norm by functions constant on dyadic intervals.
    • phth: thus, functions constant on dyadic intervals are dense in $L^2$
    • Berkels: You need to stack these arguments, ie. first go to compact support, then go to uniform continuity and then you know you just need to make these intervals small enough to have the necessary approximation.
    • phth:
      • This proves that “$\bigcup_{j\in \mathbb{Z}} V_j$ is dense in $L^2$” (denseness).
      • But, $\bigcup_{j\in \mathbb{Z}} V_j$ is NOT complete in the sense that “all Cauchy sequences in $\bigcup_{j\in \mathbb{Z}} V_j$ converge in $\bigcup_{j\in \mathbb{Z}} V_j$” (completeness).
      • $L^2$ is complete (Riesz-Fischer Theorem), but this does not necessarily mean that $\bigcup_{j\in \mathbb{Z}} V_j$ is complete (because eg. $\mathbb{Q}$ is dense in $\mathbb{R}$, but $\mathbb{Q}$ is not complete just because $\mathbb{R}$ is complete!).
      • But we can approximate any $L^2$ function arbitrarily well with a function in $\bigcup_{j\in \mathbb{Z}} V_j$.
    • recall:
      • “completeness” = “Cauchy sequences in this set converge to a point in this set”
      • ”$\mathbb{Q}$ is dense in $\mathbb{R}$”
      • “$L^p$ is complete” (cf. section “Prerequisites”)
      • “supremum norm” = “distance between two functions”
      • “uniform continuity” def.

Proof (vi)

  • (vi) Finally, $\phi = \chi_{\left[0,1\right)}$ is a generator [because]
    • (vi.i) The ONS property [of $\{T_k \phi : k \in \mathbb{Z}\}$] is obviously fulfilled and
      • Claim: consider $V_0 = \{f \in L^2 : f \vert_{\left[k,k+1\right)}\ \text{is constant for all}\ k \in \mathbb{Z}\}$; then $\phi = \chi_{ \left[0,1\right) } \in V_0$ is a generator, because:
      • Berkels:
        • So, [for $\phi$ to be a generator] we need that these shifted versions of $T_k\phi$ are a CONS.
        • the orthonormal property is obvious because if you shift $\left[0,1\right]$ by an integer, it will not intersect the interval $\left[0,1\right]$ any more. So, for different integer shifts $k$ you just have this function [$\chi_{ \left[0,1\right) }$] on $\left[k,k+1\right]$ and those [$\chi_{ \left[0,1\right) }$ and $\chi_{ \left[k,k+1\right) }$] do not intersect.
          • ie. the $L^2$ scalar product of $\chi_{ \left[m,m+1\right) }$ and $\chi_{ \left[n,n+1\right) }$ is $0$ for $m \neq n$ and $1$ for $m=n$, which proves the ONS property of the shifted versions $T_k\phi$ (see ONS def.).
    • (vi.ii) the completeness [of $\{T_k \phi : k \in \mathbb{Z}\}$] is shown based on the fact that (A) the squared $L^2$-norm of an element of $V_0$ has to be (B) the sum of the squared values of the element on each of the intervals $\left[k,k+1\right)$.
      • Berkels:
        • (B) is equivalent to the Fcoeffs in (3.42)
        • so, if I am supposed to give you the squared $L^2$-norm (rhs of (3.42)) of the green function in the picture above, it is the area under the squared graph and, since each interval has length $1$, it is just the sum of the squared heights. So, this is a series. Based on this, you directly see that the completeness def. (3.42) is fulfilled, where we needed to show that the norm of the function squared (rhs of (3.42)) is the same as the sum of the squared Fcoeffs (lhs of (3.42)). Here, the Fcoeffs (lhs of (3.42)) are exactly the interval heights (because we just integrate against a function that is constant on the interval $\left[k,k+1\right)$, which picks up the value on the interval $\left[k,k+1\right)$).
      • phth: “(A) = (B)” $\Leftrightarrow$ shifted versions of $T_k\phi$ are a CONS (cf. (3.42))
      • phth: consider an element $f \in V_0$, then
        • (A) is $\lVert f\rVert^2_{L^2}$
        • (B) is $\sum_{k\in \mathbb{Z}}(f, T_k\chi_{\left[0,1\right)})^2$, where $(f, T_k\chi_{\left[0,1\right)})$ is the $k$-th Fcoeff

4.10 Orthogonal Projection $P_{V_j}$, Approximating $f$ with the Nested Spaces $V_j$

  • conditions:
    • Let $(V_j)_{j\in\mathbb{Z}}$ be a MRA.
    • Let $P_{V_j}$ denote the orthogonal projection from $L^2$ to $V_j$.
      • Note that the orthogonal projection exists since $L^2$ is a Hilbert space and $V_j$ a nonempty, closed subspace of $L^2$. (Hilbert projection theorem)
    • Let $f \in L^2$.
  • Then, we have $\lim_{j\to -\infty} P_{V_j} f = f$ and $\lim_{j\to \infty} P_{V_j} f = 0$.
    • $\lim_{j\to \infty} P_{V_j} f = 0$ encodes that the $V_j$ get smaller and smaller
    • $\lim_{j\to -\infty} P_{V_j} f = f$ means that we can use the nested spaces $V_j$ to approximate $f$, the smaller $j$, the better the approximation.
      • So, if we have a very small $j$ ($j\to -\infty$, highly negative), then we get very close to $f$ and if it is very large ($j\to\infty$), then it is very coarse. So, we can decide how well we want to approximate $f$ by choosing this $V_j$.
      • we can approximate $f$ with the spaces $V_j$ arbitrarily well if $j$ is “negative enough”
  • proof:
    • the reason we are looking at this is to get familiar with the consequences of all of the properties in (4.8). So far, it is not clear why we need these properties. Here this gets clear.
    • $j\to\infty$: (4.14)
    • $j\to -\infty$: show that $\lVert f − P_{V_j} f\rVert_{L^2} = \inf_{u\in V_j} \lVert f − u\rVert_{L^2}$ for all $j \in \mathbb{Z}$ goes to $0$ for $j\to -\infty$
      • this is because, by definition, the orthogonal projection $P_{V_j}$ is just the “closest point in the space $V_j$ to a given point $f$”
      • use completeness (4.8 (v)) and inclusion (4.8 (ii)) properties

[Figure: illustration of (4.10)]

4.11 $\phi_{j,k}$, Scaling Equation, Scaling Function $\phi$

  • conditions:
    • Let $(V_j)_{j\in\mathbb{Z}}$ be a MRA with generator $\phi$.
      • again, this is just some object that fulfills all of the properties in (4.8)
      • and we have one example (4.9) that fulfills all of the properties in (4.8), so we know that there are such MRAs, thus it makes sense to look at them
    • Let $\phi_{j,k} (x) := 2^{-\frac{j}{2}}\phi(2^{-j}x - k)$ for all $j, k \in \mathbb{Z}$ and $x \in \mathbb{R}$.
      • problem: why do we look at $\phi_{j,k}(x)$? → Because $\{\phi_{j,k} : k \in \mathbb{Z}\}$ is an ONB of $V_j$ → (i)

4.11 (i) $\{\phi_{j,k} : k \in \mathbb{Z}\}$ is an ONB/CONS of $V_j$

  • (i) The scaling property (4.8 (iii)) and the orthonormal basis property (4.8 (vi)) together imply that, for all $j \in \mathbb{Z}$, $\{\phi_{j,k} : k \in \mathbb{Z}\}$ is an ONB/CONS of $V_j$ (exercise).
    • problem: why is this the case?
      • so, in (4.8) we have assumed that if we start with $\phi$ then these $T_k\phi$s are a CONS
      • but if we look at the new notation then this $T_k\phi$ is nothing but $\phi_{0,k}$
        • $j=0$ eliminates this scaling $2^{-\frac{j}{2}}$ and $2^{-j}$ in $\phi_{j,k}(x)$
        • and we just have the translation $(x-k)$ in $\phi_{j,k}(x)$
      • the $\phi_{0,k}$s are an ONS of $V_0$
      • but then we have this scaling property (4.8 (iii)), ie. that $f$ is in $V_j$ iff the scaled version $D_{\frac{1}{2}}f$ is in $V_{j+1}$ and this we can use to convert the ONS from $V_0$ to the other $V$s
      • the factor $2^{-\frac{j}{2}}$ is chosen exactly so that the normalization is preserved: if you just applied the dilation $D_{\frac{1}{2}}$, you would change the $L^2$-norm, but the $2^{-\frac{j}{2}}$ corrects for this (exercise: when you just compute it you will see how this fits together)
      • so, from one ONB of $V_0$ we can go to ONBs for all of our $V_j$s
    • sidenote: if we have such an MRA then the absolute value of $j$ has no meaning by itself because if we have $(V_j)_{j\in\mathbb{Z}}$ then we can just shift $j$ by any number and it still goes from $-\infty$ to $\infty$ just that $j=0$ then means perhaps $j=10$, so there is no inherent value. The only thing that puts a specific value to the $j$ is that for $V_0$ we have the generator $\phi$. And, since we have the generator at $0$, it is natural that we just go one level finer and end up with $V_{-1}$. In principle, we could just invert the ordering of the $j$s, it would not change anything, except for that going up with $j$ (better/finer approximation) and going down with $j$ (worse/coarser approximation) would mean the opposite, ie. whether you call “going finer” “increasing $j$” or “decreasing $j$” does not matter. So, the construction (4.8) links different scales, but it is completely arbitrary whether you want to associate fine scales with a large $j$ or with a negative $j$. It is just one decision. You can just replace $j$ with $-j$ everywhere, if you want, and everything stays consistent.
    • problem: in this way we can connect amongst other things $V_0$ and $V_{-1}$ → (ii)

4.11 (ii) The Generator $\phi$ is not only in $V_0$, but also in $V_{-1}$

  • (ii) Due to the inclusion property (4.8 (ii)), we have $\phi \in V_0 \subset V_{-1}$, ie. the generator is not only in $V_0$, but also in $V_{-1}$.

4.11 (iii) We can express the Generator $\phi$ in the CONS $\{\phi_{-1,k} : k \in \mathbb{Z}\}$

  • (iii) Since $\{\phi_{-1,k} : k \in \mathbb{Z}\}$ is a CONS of $V_{-1}$, we can express the generator $\phi$ in this CONS \(\label{eq:scaling-equation}\phi = \sum_{k\in\mathbb{Z}} h_k \phi_{-1,k} = \sum_{k\in\mathbb{Z}} h_k \sqrt{2}\phi(2 \cdot - k),\) where the dot “$\cdot$” is just a placeholder (not a variable!) and $h_k = (\phi, \phi_{-1,k} )_{L^2}$ for $k \in \mathbb{Z}$ are the Fcoeffs, cf. (3.43) (note that this does not mean that the series is converging pointwise a.e.).
    • This equation is called scaling equation and the reason why $\phi$ is also called scaling function (cf. (4.8)).
    • note: This also means that we can express $\phi$ with scaled versions of itself. This puts some interesting constraints on this $\phi$.
    • phth: this does not imply the pointwise convergence $\phi\color{red}{(x)} = \sum_{k\in\mathbb{Z}} h_k\sqrt{2}\phi(2\color{red}{x} - k)$ for all $x$, but only convergence of the $L^2$-norm of the series (rhs) to the $L^2$-norm of $\phi$ (lhs) (see (3.48))
    • watch: lecture 27, beginning of (4.12): long discussion about (3.43) and (3.48), convergence of the series in the $L^2$-norm vs. pointwise convergence
      • counterexample: in the figure below: consider the following sequence of functions:
        • (black) $1$ on $\left[0,1\right]$, otherwise $0$
        • (red) $1$ on $\left[0,\frac{1}{2}\right]$, otherwise $0$
        • (blue) $1$ on $\left[\frac{1}{2},1\right]$, otherwise $0$
        • (green) $1$ on $\left[0,\frac{1}{4}\right]$, otherwise $0$
        • (…) $1$ on $\left[\frac{1}{4},\frac{1}{2}\right]$, otherwise $0$
        • (…) $1$ on $\left[\frac{1}{2},\frac{3}{4}\right]$, otherwise $0$
        • (…) $1$ on $\left[\frac{3}{4},1\right]$, otherwise $0$
        • (…) $1$ on $\left[0,\frac{1}{8}\right]$, otherwise $0$
        • etc.
        • thus, wherever you are in this sequence, you will always pick up a $1$
        • thus, at no point does this sequence converge pointwise to $0$, but the $L^2$-norm converges to $0$ because the intervals get smaller and smaller
        • this shows that the $L^2$-convergence does not give you pointwise control, not even on a.e. point
          • for the most part this is not so important, and when you go discrete you do not have these kinds of problems, but I stress this here so that you do not fall for reading more into equality ($\ref{eq:scaling-equation}$) than it shows - and that is the reason why I did not put an $(x)$ in ($\ref{eq:scaling-equation}$), like $\phi\color{red}{(x)} = \sum_{k\in\mathbb{Z}} h_k\sqrt{2}\phi(2\color{red}{x} - k)$, because it would be wrong! (actually, it was wrong in an older version of the lecture notes) It is very easy to overlook. That is why I am stressing it.
          • similarly, the FS simply does not converge pointwise in general, no matter how you interpret this
      • note: we will see later that for “nice generators” and “nice wavelets” only a finite amount of these $h_k$s will be non-zero and then all of these [convergence related] problems are gone because if this sum here in ($\ref{eq:scaling-equation}$) is finite then you do not need to consider convergence because you can compute the finite sum and then you are done - it is just when you have a series (“infinite sum”) then you need to be careful! The wavelets that you use in practice will have only a finite number of $h_k$s that are not zero.

[Figure: the sequence of characteristic functions from the counterexample]

  • problem: with (iii) we see a direct connection to the continuous WT:

4.11 (iv) $(f, \phi_{j,k} )_{L^2} = L_{\phi} f (2^j , 2^j k)$

  • (iv) For $f \in L^2 (\mathbb{R})$, we have \((f, \phi_{j,k} )_{L^2} = \int_{\mathbb{R}}f(x)2^{-\frac{j}{2}} \phi(2^{-j} x - k) dx = \boxed{\int_{\mathbb{R}}f(x) \frac{1}{\sqrt{2^j}} \phi \left(\frac{x - 2^j k}{2^j}\right) dx} = L_{\phi} f (2^j , 2^j k).\)
    • problem: Why did we rewrite $(f, \phi_{j,k} )_{L^2}$ to the boxed formula?
      • This should look familiar from (4.1).
      • The boxed formula is exactly the continuous WT for certain $a$ and $b$.
    • Thus, $(f, \phi_{j,k} )_{L^2}$ is the evaluation of the continuous WT $L_{\phi} f$ for $a = 2^j$ and $b = 2^j k$.
      • in particular, we can get all of these necessary coefficients $(f, \phi_{j,k} )_{L^2}$ just by evaluating the WT at certain discrete parameters
      • problem: we first have to understand what these $V_j$s are in order to conclude more from this, but we can already see that there is a strong link to the WT by this
  • problem: to further work with this I will prepare a helping lemma that tells us sth about these $h_k$

4.12 $h_k$s orthogonal to themselves when shifted

  • condition:
    • Let $h_k$ be the coefficients of the scaling equation of an MRA.
  • then \(\sum_{k\in\mathbb{Z}} h_k h_{k+2m} = \delta_{0,m}\ \text{for all}\ m \in \mathbb{Z}\)
  • that means, in a certain sense, the scaling equation coeffs are orthogonal to themselves if you shift them by $2m$
  • for $m=0$ that means their squared sum is $1$
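A tiny numerical check of this (Python; it borrows the Haar coefficients $h_k = 2^{-\frac{1}{2}}(\delta_{k,0} + \delta_{k,1})$ that will only be computed in (4.15 (ii)), all other $h_k$ being zero):

```python
import numpy as np

h = np.zeros(8)                  # Haar scaling coefficients, cf. (4.15 (ii))
h[0] = h[1] = 2**-0.5

for m in range(-2, 3):
    s = sum(h[k] * h[k + 2 * m] for k in range(len(h)) if 0 <= k + 2 * m < len(h))
    print(m, s)                  # 1.0 for m = 0, else 0.0
```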

4.13 Prerequisites: Orthogonal Complement, Direct Sum

  • outer/direct sum: $V = U \oplus W$ means that every $v \in V$ can be written uniquely as $v = u + w$ with $u \in U$ and $w \in W$
  • orthogonal complement:
    • “the set $W^{\perp}$ of all vectors in $V$ that are orthogonal to every vector in $W$”
      • phth: thus, $W^{\perp}$ always includes the vector $0$ (important for the proof of (4.14))
    • For a subspace $U \subset X$, the subspace of all vectors \(U_\perp = \{ y \in X\ \vert\ x \perp y\ \text{for all}\ x \in U \}\) is the orthogonal complement of $U$. (Bredies)
    • properties: Wikipedia
      • $U\cap U_{\perp}=\{0\}$

4.13 Approximation Space, Detail Space, Wavelet Space

  • conditions:
    • Let $(V_j)_{j\in\mathbb{Z}}$ be a MRA.
    • Let $W_j$ be such that $V_{j−1} = V_j \oplus W_j$ and $V_j \perp W_j$, ie. $W_j$ is the orthogonal complement of $V_j$ in $V_{j-1}$.
      • we know that $V_j$ is contained in $V_{j-1}$ and everything that is missing in $V_j$ is concentrated in $W_j$. In a way that we only pick up the orthogonal complements that are missing. That is a nice separation.
  • Then, $V_j$ is called approximation space to the scale $j$
    • because we know that the projection of $f$ to $V_j$ approximates $f$ in a sense that if $j$ goes to $-\infty$ we get $f$ back
  • and $W_j$ is called detail space or wavelet space to the scale $j$.
    • Because you can think of it, when going from $V_j$ to the larger $V_{j-1}$, then we are adding some stuff and the stuff that we add are just “details” [from the “detail space”].
    • If the approximation gets better [as $j$ gets smaller], apparently, we have to add things [functions] that were missing, ie. that were not representable on these coarser spaces [$V_j$ with larger $j$], and that motivates the name “detail space”.
  • problem: so, all of the $W_j$s are enough to represent all $L^2$ functions. Now, we can iterate this:

4.14 Splitting $V_j$ into Details, Splitting a $L^2$-Function into all its Details in all Scales $j$

4.14 (i) Decomposition of $V_j$ into Details

  • The definition of $W_j$ implies by iteration that \(\label{eq:Vj-decomposition}V_j \stackrel{(4.13)}{=} V_{j+1} \oplus W_{j+1} \stackrel{(4.13)}{=} V_{j+2} \oplus W_{j+2} \oplus W_{j+1} = \ldots \stackrel{(4.13)}{=} V_{j+M} \oplus \color{blue}{\bigoplus_{m=j+1}^{j+M} W_m}\) for all $M \geq 1.$
    • Berkels: “outer sum” ($\color{blue}{\bigoplus_{m=j+1}^{j+M} W_m}$) also means that the intersection of all of these $W_j$s is $\{0\}$.
      • explanation A: [because by def. of $W_j$, $W_j$ contains all vectors that are orthogonal to the vectors in $V_j$, thus, since the vector $0$ is orthogonal to all vectors in all $V_j$s, all $W_j$ must include the vector $0$]
      • explanation B: from Lemma below: $V=U\oplus W$ iff the intersection of $U$ and $W$ contains only the vector $0$ (see the “iff” in the Lemma)! In particular, in our case, the intersection of $V_{j+M}$ and all $W_m$s for $j+1\leq m\leq j+M$ must be the set containing only the vector $0$. But this also means that the intersection of all $W_m$s for $j+1\leq m\leq j+M$ must be the set containing at least the vector $0$ (think about a Venn diagram for 3 intersecting sets: intersection area gets larger, when you take away one intersecting set). Later, we will see $V_j = \bigoplus_{m\geq j+1} W_m$ and with this we can say that the intersection of all $W_m$s for $m\geq j+1$ must be the set containing only the vector $0$. This must hold for all $j$. So, it is safe to say “the intersection of all of these $W_j$s is $\{0\}$”, as Berkels said.

[Figure: Lemma on direct sums, $V=U\oplus W$ iff $U\cap W=\{0\}$ (cited in explanation B)]

  • This immediately implies $V_j \supset \bigoplus_{m\geq j+1} W_m$.
    • because from ($\ref{eq:Vj-decomposition}$) we see that $V_j$ contains this outer sum (in $\color{blue}{blue}$) for all $M$, and in particular, also the infinite summation $M\to\infty$ of all of these outer sums (in $\color{blue}{blue}$)
  • problem: The question is: can there be something missing in $\bigoplus_{m\geq j+1} W_m$, or is it all of $V_j$? The answer: it is all of $V_j$, but we have to show this:
    • Assume there is $v \in V_j \setminus \bigoplus_{m\geq j+1} W_m$. Since \(\label{eq:VjplusM-decomposition}V_j \setminus \left(\bigoplus_{m\geq j+1} W_m\right) \stackrel{(\ref{eq:Vj-decomposition})}{=} \left(V_{j+M} \oplus \left(\color{purple}{\bigoplus_{m=j+1}^{j+M} W_m}\right)\right) \setminus \left(\color{purple}{\bigoplus_{m\geq j+1} W_m}\right) \stackrel{(\text{explanation 1})}{=} V_{j+M} \setminus \left(\bigoplus_{m \geq j+1+M} W_m\right),\) we have, $v \in V_{j+M}$ for all $M \geq 1$.
      • explanation 1: intuitively, the last equals sign holds because sets ($\color{purple}{purple}$ in ($\ref{eq:VjplusM-decomposition}$)) are added “$\oplus$” to and subtracted “$\setminus$” from $V_{j+M}$ and thereby cancel each other ($\color{red}{red}$ in ($\ref{eq:VjplusM-decomposition-explanation}$)) \(\label{eq:VjplusM-decomposition-explanation}V_{j+M}\oplus\{\color{red}{(j+1)+\ldots+(j+M)}\}\setminus\{\color{red}{(j+1)+\ldots+(j+M)}+(j+M+1)+\ldots+(\infty)\}=V_{j+M}\setminus\{(j+M+1)+\ldots+(\infty)\}\)
    • due to the inclusion property, this implies $v \in V_M$ for all $M \in \mathbb{Z}$
    • due to the trivial intersection property, this implies $v = 0$. This is a contradiction to $v\not\in \bigoplus_{m\geq j+1} W_m \color{blue}{\supset \{0\}}$.
      • $\color{blue}{blue}$ part: all of these $W_m$s contain $0$ (recall: orthogonal complement properties: intersection of $V_j$ and $W_j$ is $\{0\}$)
  • (i) Thus, we have shown \(V_j = \bigoplus_{m\geq j+1} W_m.\)
    • thus, “the $V_j$ can be split into all details contained for these larger $V_j$ spaces

4.14 (ii) Decomposition of $L^2$ into Details

  • (ii) Combined with the completeness, we have \(\color{red}{L^2(\mathbb{R})} \stackrel{(\text{completeness prop. of}\ V_j)}{=} \overline{\bigcup_{j\in\mathbb{Z}} V_j} \stackrel{(\text{i})}{=} \overline{\bigcup_{j\in\mathbb{Z}}\bigoplus_{m\geq j+1} W_m} \color{red}{= \overline{\bigoplus_{m\in\mathbb{Z}} W_m}}\).
    • thus, “we can split any $L^2$ function in sums of elements of this $W_m$
      • in other words: this is like “splitting the function in all its details in all the scales”

4.14 (iii) Representing $f$ via $V_j$ and $W_j$

  • (iii) Since $V_j \oplus W_j$ is an orthogonal decomposition of $V_{j−1}$, we have \(P_{V_{j-1}} \stackrel{(\text{orth. decomposition})}{=} P_{V_j} + P_{W_j} \Rightarrow P_{W_j} = P_{V_{j-1}} - P_{V_j}.\) Due to \(\label{eq:L2-decomposition}L^2(\mathbb{R}) \stackrel{(\text{ii})}{=} \overline{\color{blue}{\bigoplus_{m\in\mathbb{Z}} W_m}} = \overline{\bigoplus_{m\geq j+1} W_m \oplus \bigoplus_{m\leq j} W_m} \stackrel{(\text{i})}{=} \overline{\color{red}{V_j} \oplus \color{purple}{\bigoplus_{m\leq j} W_m}},\) $f \in L^2(\mathbb{R})$ can be expressed by \(\label{eq:f-projection-to-V-and-W}\boxed{f \stackrel{(\ref{eq:L2-decomposition})}{=} \color{blue}{\sum_{m\in\mathbb{Z}} P_{W_m} f} \stackrel{(\ref{eq:L2-decomposition})}{=} \color{red}{P_{V_j} f} + \color{purple}{\sum_{m\leq j} P_{W_m} f}.}\)
    • Bredies: We can now represent every $f \in L^2$ in several different ways with the help of the spaces $V_j$ and $W_j$
    • explanation 2:
      • ($\color{blue}{blue}$ part of ($\ref{eq:f-projection-to-V-and-W}$)) is the orthogonal projection of $f$ to the orthogonal decomposition of $L^2$ in the form of $\overline{\color{blue}{\bigoplus_{m\in\mathbb{Z}} W_m}}$
      • ($\color{red}{red}+\color{purple}{purple}$ part of ($\ref{eq:f-projection-to-V-and-W}$)) is the orthogonal projection of $f$ to the orthogonal decomposition of $L^2$ in the form of $\overline{\color{red}{V_j} \oplus \color{purple}{\bigoplus_{m\leq j} W_m}}$
      • related: “Decomposition of a vector space into direct sums is not unique.”, Wikipedia
    • Bredies: These equations justify the name “multiscale analysis”: the spaces $V_j$ allow a systematic approximation of functions on different scales.
    • Bredies: Abbildung 4.12: in the sense of the previous example [ie. the piecewise constant multiscale analysis], one can say, somewhat imprecisely but suggestively: $P_{V_j} u$ is the representation of $u$ “on the scale $V_j$” and contains details of $u$ “up to size $2^j$”, see Abbildung 4.12.
      • phth: by (4.15 (i)), the height of $P_{V_j}u$ on each dyadic interval is “the mean of $u$ on the dyadic interval $\left[k2^j , (k + 1)2^j \right)$” (in the case of the piecewise constant multiscale analysis) resp. $\sum_{k\in\mathbb{Z}}(\phi_{j,k},u)_{L^2}\phi_{j,k}$ (in the general case)

Abbildung 4.12:
[Figure: $P_{V_j} u$ at different scales $j$ (from Bredies)]
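For the piecewise constant MRA (4.9), the decomposition ($\ref{eq:f-projection-to-V-and-W}$) can be carried out with simple array operations: pairwise means realize $P_{V_{j+1}}$ applied to an element of $V_j$ (cf. (4.15 (i))), and the remainders are the details $P_{W_{j+1}}f$. A minimal sketch in Python/NumPy (the signal, the depth $L$, and the restriction to $[0,1)$ are illustrative choices of this note):

```python
import numpy as np

L = 8
x = (np.arange(2**L) + 0.5) / 2**L      # midpoints of the dyadic intervals in [0,1)
f = np.sin(2 * np.pi * x) + (x > 0.5)   # f represented by its values, an element of V_{-L}

approx = f.copy()
details = []
for _ in range(L):                                # coarsen V_{-L} -> V_{-L+1} -> ... -> V_0
    coarse = 0.5 * (approx[0::2] + approx[1::2])  # block means realize P_{V_{j+1}}
    details.append(approx - np.repeat(coarse, 2)) # P_{W_{j+1}} = P_{V_j} - P_{V_{j+1}}
    approx = coarse

rec = approx                             # coarsest approximation ...
for detail in reversed(details):         # ... plus all details, coarse to fine
    rec = np.repeat(rec, 2) + detail
print(np.max(np.abs(rec - f)))           # 0 up to floating point: f is fully recovered
```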

4.14 (iv) Proof of 4.10 (continued)

  • (iv) (iii) implies that $P_{V_j} f \to 0$ for $j \to \infty$ holds (because explanation 3), which we claimed earlier (proof of (4.10)).
    • explanation 3: “sandwich”: the last equality in ($\ref{eq:f-projection-to-V-and-W}$) must hold for all $j$; since ($\color{blue}{blue}$) and ($\color{purple}{purple}$) become equal for $j\to\infty$, this equality can only hold in the limit if $P_{V_j}f$ vanishes
  • problem: Let’s look at our only MRA example again and see what we can get about these projections there:

4.15 Example: Piecewise Constant MRA

4.15 (i) Compute Projection of $f$ to $V_j$

  • Let $(V_j)_{j\in\mathbb{Z}}$ be the piecewise constant MRA from (4.9) and let $f \in L^2 (\mathbb{R})$.
  • Since (A) the elements of $V_j$ are constant on the intervals $\left[k2^j , (k + 1)2^j \right)$ and (B) $\lVert f − P_{V_j} f\rVert_{L^2} = \inf_{u\in V_j} \lVert f − u\rVert_{L^2}$, we get for $x \in \left[k2^j , (k + 1)2^j \right)$ that (intuitively: boils down to approximating a function on an interval with a single value) \((P_{V_j} f )(x) \stackrel{(\text{p.w.c. MRA})}{=} \color{blue}{2^{−j}}\color{red}{\int^{(k+1)2^j}_{k2^j} f(y)dy} = 2^{−j}\int_{\mathbb{R}} \chi_{\left[k2^j ,(k+1)2^j \right)}(y)f(y)dy\) \(\label{eq:PVj1}(\ldots) = 2^{−\frac{j}{2}}\int_{\mathbb{R}}2^{−\frac{j}{2}}\chi_{\left[0,1\right)}(2^{−j}y − k)f(y)dy = \color{purple}{2^{−\frac{j}{2}}}\int_{\mathbb{R}}\phi_{j,k}(y)f(y)dy = (f, \phi_{j,k} )_{L^2} \color{purple}{\phi_{j,k}(x)}\).
    • ($\color{purple}{purple}$) is the value of $\phi_{j,k}(x)$ on the intervals $\left[k2^j , (k + 1)2^j \right)$ (which is the same value, $2^{-\frac{j}{2}}$, on all intervals for a fixed $j$, ie. the value is independent of $k$)
    • ($\color{blue}{blue}$ + $\color{red}{red}$) is the mean of $f$ on the dyadic interval $\left[k2^j , (k + 1)2^j \right)$
      • recall: $f_{\text{avg}}=\color{blue}{\frac{1}{b-a}}\color{red}{\int_a^b f(x)dx}$
      • phth: the area under $\phi_{j,k}$ is always $2^{\frac{j}{2}}$ (height $2^{-\frac{j}{2}}$ times interval length $2^j$). Note that this does not follow from the ONB/CONS property of $\{\phi_{j,k} : k \in \mathbb{Z}\}$: the ONB property fixes the $L^2$-norm of $\phi_{j,k}$ to $1$, and the $L^2$-norm is the area under the squared function, not the area under the function itself!
      • Why does $(P_{V_j} f )(x) \color{red}{\stackrel{(\text{?})}{=}} 2^{−j}\int^{(k+1)2^j}_{k2^j} f(y)dy$ hold?
        • because, by definition of $P_{V_j}$, we have $\lVert f − P_{V_j} f\rVert_{L^2} = \inf_{u\in V_j} \lVert f − u\rVert_{L^2}$ (see (B)), so $P_{V_j} f$ has to be “the closest (in the $L^2$-norm) function $u\in V_j$ to $f$” on $\left[k2^j , (k + 1)2^j \right)$, ie. $P_{V_j} f$ is the function that minimizes the $L^2$-norm difference between $u$ and $f$ on $\left[k2^j , (k + 1)2^j \right)$ (intuitively: on each interval we approximate $f$ by a single constant value). Since the mean $f_{\text{avg}}$ minimizes $\lVert f − u\rVert_{L^2}$ over the constants (the mean minimizes the mean squared error), we get $(P_{V_j} f )(x) = f_{\text{avg}} = 2^{−j}\int^{(k+1)2^j}_{k2^j} f(y)dy$.
          • important: This holds only for the p.w.c. MRA (see (A)). For a different MRA, $\inf_{u\in V_j} \lVert f − u\rVert_{L^2}$ is not necessarily minimized by the mean $f_{\text{avg}}$, but by some other $u\in V_j$.
    • ($\color{blue}{blue}$) is a normalization factor
      • Why exactly $2^{-j}$?
        • because $f_{\text{avg}}=\color{blue}{\frac{1}{b-a}}\color{red}{\int_a^b f(x)dx}$, and in our case we have $\color{blue}{\frac{1}{b-a}}=2^{-j}$

[Figure: illustration of (4.15 (i))]

  • thus, we can compute this projection $P_{V_j}f$ in terms of the generator, ie. in terms of the CONS of $V_j$ given by $\phi_{j,k}$:
  • (i) Noting that for any $x \in \mathbb{R}$, we have $\phi_{j,k} (x) = 0$ for all $k \in \mathbb{Z}$ but one, we get \(\label{eq:PVj2}\boxed{P_{V_j} f = \sum_{k\in \mathbb{Z}} (f, \phi_{j,k} )_{L^2} \phi_{j,k}}\).
    • $P_{V_j} f$ is the “Projection of $f$ to our approximation space $V_j$”
    • note: ($\ref{eq:PVj1}$) is for $x$ and ($\ref{eq:PVj2}$) is for all $x$. So, why is ($\ref{eq:PVj2}$) valid?
      • When going from ($\ref{eq:PVj1}$) to ($\ref{eq:PVj2}$), we have used that for any $x \in \mathbb{R}$, we have $\phi_{j,k} (x) = 0$ for all $k \in \mathbb{Z}$ but one
    • for simplicity, consider a fixed $j$ first
    • Because [ for a fixed $j$ ] these dyadic intervals do not overlap, any $x'$ lies in exactly one of them [ say $x' \in \left[k'2^j,(k'+1)2^j\right)$ ]. Thus, [for any $x'$,] the series (the sum over $k\in \mathbb{Z}$) is just a sum with one non-zero summand: the one with the factor $\phi_{j,k'}(x')$ (because $x' \in \left[k'2^j,(k'+1)2^j\right)$ as assumed above, thus $\phi_{j,k'}(x') = 2^{-\frac{j}{2}}\chi_{ \left[k'2^j,(k'+1)2^j\right) }(x') = 2^{-\frac{j}{2}} \neq 0$ always holds).
    • from next lec:
      • Berkels: ($\ref{eq:PVj2}$) is a finite sum with only one non-zero coeff. [phth: this must be wrong: Berkels probably means the summands of the series, because the coeffs $(f, \phi_{j,k} )_{L^2}$ themselves are in general not $0$: the scalar product with $\phi_{j,k}$ “samples” $f$ on $\left[k2^j,(k+1)2^j\right)$ for each $k$, and it is $0$ only if the mean of $f$ on this interval is $0$]
      • The one non-zero summand is $(f, \phi_{j,k'})_{L^2}\phi_{j,k'}(x')$ (ie. the scalar product of $f$ with the scaled $\chi_{ \left[k'2^j,(k'+1)2^j\right) }$, times the factor $\phi_{j,k'}(x')$).
      • All other summands with $m \neq k'$ vanish because $x'$ is not in $\left[m2^j,(m+1)2^j\right)$, so $\phi_{j,m}(x') = 0$.
    • of course, this also follows from the ONB property of these $\phi_{j,k}$, but here we actually computed how it naturally arises from how we chose the generator (a quick numerical check follows below)
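  • a quick numerical sketch (my own, not from the lecture; assuming NumPy) that checks the computation above: on each dyadic interval, $P_{V_j}f$ for the p.w.c. MRA is just the local mean of $f$:

```python
import numpy as np

# phi_{j,k}(x) = 2^{-j/2} * chi_[0,1)(2^{-j} x - k)   (p.w.c. MRA from (4.9))
def phi_jk(x, j, k):
    y = 2.0**(-j) * x - k
    return 2.0**(-j / 2) * ((0.0 <= y) & (y < 1.0))

f = lambda x: np.sin(2 * np.pi * x) + x           # some test signal (hypothetical choice)
x = np.linspace(0.0, 4.0, 2**14, endpoint=False)  # fine grid on [0, 4)
dx = x[1] - x[0]

j = -2                                            # dyadic intervals of length 2^j = 1/4
for k in range(3):
    coeff = np.sum(f(x) * phi_jk(x, j, k)) * dx   # ~ (f, phi_{j,k})_{L^2}
    proj_on_interval = coeff * 2.0**(-j / 2)      # value of (f, phi_{j,k}) phi_{j,k} there
    mask = (k * 2.0**j <= x) & (x < (k + 1) * 2.0**j)
    local_mean = f(x[mask]).mean()                # 2^{-j} int_{k 2^j}^{(k+1) 2^j} f(y) dy
    print(k, proj_on_interval, local_mean)        # the two columns agree
```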
  • problem: now, let's compute the $h_k$ for our specific p.w.c. MRA:

4.15 (ii) Compute $h_k$

  • Moreover, we have for this MRA that \(h_k = (\phi, \phi_{−1,k} )_{L^2} = \int_0^1 2^{\frac{1}{2}} \phi(2x − k) dx = \int_0^1 \sqrt{2}\chi_{\left[\frac{k}{2},\frac{k+1}{2}\right)} (x) dx = \begin{cases} \frac{\sqrt{2}}{2} & k \in \{0, 1\}\\ 0 &\text{else} \end{cases} = 2^{−\frac{1}{2}} (\delta_{k,0} + \delta_{k,1} ).\)
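  • a quick numerical check (my own; assuming NumPy) of this computation via a grid approximation of the inner product $(\phi, \phi_{-1,k})_{L^2}$:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 2**16, endpoint=False)        # phi lives on [0, 1)
dx = x[1] - x[0]
phi = lambda t: ((0.0 <= t) & (t < 1.0)).astype(float)  # generator chi_[0,1)

for k in range(-1, 3):
    # h_k = (phi, phi_{-1,k})_{L^2} with phi_{-1,k}(x) = sqrt(2) * phi(2x - k)
    h_k = np.sum(phi(x) * np.sqrt(2.0) * phi(2.0 * x - k)) * dx
    print(k, round(h_k, 4))  # ~0.7071 (= 2^{-1/2}) for k in {0, 1}, ~0 otherwise
```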

4.15 (iii) Compute Scaling Equation

  • Thus, in this case, the scaling equation simplifies to \(\phi = \sum_{k\in\mathbb{Z}} h_k \sqrt{2}\phi(2 \cdot -k) = \phi(2\cdot) + \phi(2 \cdot -1).\)
    • ie. $\phi$ can be represented by shifted, scaled copies of itself (which is exactly what the scaling equation said, but here we now specifically see which two translations and scalings we need)
    • thus, the series becomes a finite sum!

4.15 (iv) Compute $\phi_{j,k}$

  • Composing this with $x \mapsto 2^{-j} x - k$, this shows \(\phi(2^{-j} \cdot -k) = \phi(2^{-(j-1)} \cdot -2k) + \phi(2^{-(j-1)} \cdot -(2k + 1)).\) Multiplying both sides with $2^{-\frac{j}{2}}$ shows \(\label{eq:phi-jk}\phi_{j,k} = \frac{1}{\sqrt{2}} (\phi_{j-1,2k} + \phi_{j-1,2k+1} ).\)
    • ie. each of these $\phi_{j,k}$ we can represent as a linear combination of two of the $\phi_{j-1,\ldots}$, so from one finer level (which we can see in Figure 4.12; a quick numerical check follows below)
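  • a quick pointwise check (my own; assuming NumPy) of ($\ref{eq:phi-jk}$):

```python
import numpy as np

def phi_jk(x, j, k):  # phi_{j,k}(x) = 2^{-j/2} * chi_[0,1)(2^{-j} x - k)
    y = 2.0**(-j) * x - k
    return 2.0**(-j / 2) * ((0.0 <= y) & (y < 1.0))

x = np.linspace(-2.0, 6.0, 10001)
for j, k in [(0, 0), (1, 1), (-1, 2)]:
    lhs = phi_jk(x, j, k)
    rhs = (phi_jk(x, j - 1, 2 * k) + phi_jk(x, j - 1, 2 * k + 1)) / np.sqrt(2.0)
    assert np.allclose(lhs, rhs)  # the two-scale relation holds at every grid point
print("ok")
```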
  • problem: we have already checked $P_{V_j}$ above (the projection to approximation spaces), now, we check $P_{W_j}$ (the projection to detail spaces)

4.15 (v) Compute Projection of $f$ to $W_j$ (Part 1)

  • This allows us to compute the projection to $W_{j+1}$. We get \(P_{W_{j+1}} f \stackrel{(\text{EA})}{=} P_{V_j} f − P_{V_{j+1}} f \stackrel{(\text{EB})}{=} \sum_{k\in\mathbb{Z}} (f, \phi_{j,k} )_{L^2} \phi_{j,k} − \sum_{k\in\mathbb{Z}} (f, \phi_{j+1,k} )_{L^2} \phi_{j+1,k}\)
    • EA: for this, let's recall our def. of these $W_j$s: $V_{j-1} = V_j \oplus W_j$ (4.13) implies that $P_{V_{j-1}}=P_{V_j}+P_{W_j}$, and if I solve for the projection to $W_j$, I have to subtract $P_{V_j}$ on both sides; replacing $j$ by $j+1$ gives this result $P_{V_j}f - P_{V_{j+1}}f$ here
    • EB: we computed both $P_{V_j}$ and $P_{V_{j+1}}$ last time (see beginning of (4.15): $P_{V_j}f = \ldots$)
  • Splitting the first sum into even and odd indices and using ($\ref{eq:phi-jk}$) for the second sum leads to

Screenshot-from-2024-03-29-19-47-41

4.15 (vi) Compute Projection of $f$ to $W_j$ (Part 2), Wavelet Corresponding to the Generator

  • what we introduce now is actually the wavelet that is corresponding to the generator.
    • recall:
      • if you think about (4.1) where we had the continuous WT, we started with this wavelet $\psi$ that we used to multiply our function $f$ with for the WT and integrate
      • but here we started with a generator that is called $\phi$ (not $\psi$) and is not called “wavelet” but “generator”
      • and now comes the connection back to the wavelet that we had before. This is what this $\psi$ will be.
  • Introducing \(\label{eq:psijk}\boxed{\psi(x) := \phi(2x) - \phi(2x - 1)}\ \text{and}\ \boxed{\psi_{j,k} (x) := 2^{-\frac{j}{2}} \psi(2^{-j}x - k)}\) we get $\psi_{j+1,k} = \frac{1}{\sqrt{2}} (\phi_{j,2k} - \phi_{j,2k+1} )$ and the representation of $P_{W_{j+1}} f$ simplifies to \(\label{eq:PWf}\boxed{P_{W_{j+1}} f = \sum_{k\in\mathbb{Z}} (f, \psi_{j+1,k} )_{L^2} \psi_{j+1,k}}\).
    • ($\ref{eq:psijk}$) defines scaled and translated versions of this $\psi$, exactly like we did with $\phi$
    • ($\ref{eq:psijk}$) $\psi_{j,k}$ has the same scaling as $\phi_{j,k}$
  • This $\psi$ we already know, you just have to recognize it:
  • The function $\psi$ is the Haar wavelet we have seen before in (4.7 (iii)).
    • Because $\phi(2x)$ is $\chi_{\left[0,\frac{1}{2}\right)}$ and, accordingly, $\phi(2x-1)$ is $\chi_{\left[\frac{1}{2},1\right)}$. This means the function $\psi$ is $1$ on the interval $\left[0,\frac{1}{2}\right)$ and $-1$ on the interval $\left[\frac{1}{2},1\right)$, which is exactly the Haar wavelet!

4.15 (vii) $\{\psi_{j,k} : k \in \mathbb{Z} \}$ is an ONB of $W_j$

  • Moreover, we have \(\label{eq:orthonormal}(\psi_{j,k} , \psi_{j,m})_{L^2} = \frac{1}{2}((\phi_{j−1,2k} − \phi_{j−1,2k+1}) , (\phi_{j−1,2m} − \phi_{j−1,2m+1}))_{ L^2 } = \frac{1}{2} (\delta_{k,m} + 0 + 0 + \delta_{k,m} ) = \delta_{k,m}\) .
    • this means that the $\psi_{j,k}$ that we get are also an ONS
  • Combined with the representation of $P_{W_{j+1}}$ we derived above ($\ref{eq:PWf}$) and the fact that $P_{W_{j+1}}$ is the identity on $W_{j+1}$ (ie. $P_{W_{j+1}}g=g$ for $g\in W_{j+1}$, for all $j$), this means that $\{\psi_{j,k} : k \in \mathbb{Z} \}$ is an orthonormal basis of $W_j$.
    • (i) “orthonormality of ONS” follows from ($\ref{eq:orthonormal}$)
    • (ii) “each element $f$ of $W_{j+1}$ can be represented by $P_{W_{j+1}}$ as a projection in this space” follows from the $P_{W_{j+1}}g=g$ argument
    • (iii) “With $P_{W_{j+1}}$ we can represent any element $f \in W_{j+1}$ as linear combination of the $\psi_{j+1,k}$s” follows from ($\ref{eq:PWf}$) and (ii) (“completeness of ONS”)
    • phth:
      • (ii), (iii) must hold for all $j$:
        • eg. with an index shift $j \to j-1$, (iii) becomes:
        • “With $P_{W_{j}}$ we can represent any element $f \in W_{j}$ as linear combination of the $\psi_{j,k}$s”
      • he confirms this later: “the $P_{W_{j+1}}g=g$ argument holds for all $j$, it is not only for $j+1$, but I can insert $j-1$ for $j$. It is somewhat confusing because of all of these $j$ and $j+1$ and we are jumping back and forth between these, but $j$ is arbitrary.”
    • phth: why do we need $P_{W_{j+1}}g=g$?
      • $P_{W_{j+1}}g=g$ holds for all $g\in W_{j+1}$ and shows that $g$ is not changed by the operation $P_{W_{j+1}}$. Therefore, ($\ref{eq:PWf}$) must be able to represent (at least) every $g\in W_{j+1}$ as a linear combination. $\Rightarrow$ completeness of the ONS $\{\psi_{j,k}\}$.
  • The existence of a wavelet that generates an orthonormal basis for $W_j$ not only holds in the special case here, but more generally.
    • so, this general construction that we can start with the generator $\phi$, then use the scaling equation and all these combinations to construct the $\psi$ and then get CONSes for all our $W_j$s, is not linked just to the Haar wavelet, but this is a very general connection.
  • phth: the area under $\psi_{j,k}$ is always $2^{\frac{j}{2}}$ because $\{\psi_{j,k} : k \in \mathbb{Z}\}$ is an ONB/CONS of $W_j$, ie. for a given $j\in\mathbb{Z}$ we have $(\psi_{j,k},\psi_{j,l})_{L^2}=\delta_{k,l}$ which means the $L^2$-norm of $\psi_{j,k}$ must be $1$ for all $j,k\in \mathbb{Z}$ (wrong because: ONBs consider the $L^2$-norm and the $L^2$-norm is the area under the squared function, not the area under the function itself! In fact, $\int_{\mathbb{R}}\psi_{j,k}(y)dy=0$ here, since the positive and the negative half of the Haar wavelet cancel; a numerical check follows below.)
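  • a quick numerical check (my own; assuming NumPy) of both statements: the Gram matrix of the $\psi_{j,k}$ is the identity, while the area under each $\psi_{j,k}$ is $0$:

```python
import numpy as np

def psi_jk(x, j, k):
    # Haar wavelet psi = chi_[0,1/2) - chi_[1/2,1), scaled/shifted as in (eq:psijk)
    y = 2.0**(-j) * x - k
    pos = ((0.0 <= y) & (y < 0.5)).astype(float)
    neg = ((0.5 <= y) & (y < 1.0)).astype(float)
    return 2.0**(-j / 2) * (pos - neg)

x = np.linspace(0.0, 8.0, 2**16, endpoint=False)
dx = x[1] - x[0]

j, ks = 1, range(4)  # psi_{1,k} has support [2k, 2k+2), all inside [0, 8)
G = np.array([[np.sum(psi_jk(x, j, k) * psi_jk(x, j, m)) * dx for m in ks] for k in ks])
print(np.round(G, 6))                # ~ identity: (psi_{j,k}, psi_{j,m}) = delta_{k,m}
print(np.sum(psi_jk(x, j, 0)) * dx)  # ~ 0: area under psi_{j,k}, NOT 2^{j/2}
```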

Screenshot-from-2024-03-30-23-15-31
source: Bredies

  • Bredies: The example above shows that for the piecewise constant multiscale analysis there is indeed a wavelet (the Haar wavelet) that yields an orthonormal basis of the wavelet spaces $W_j$. A similar construction also works in general, as the following theorem shows:

4.16 $\psi$ is a Wavelet, $\{\psi_{j,k}\}$ is an ONB/CONS of $W_j$ and $L^2$

  • conditions:
    • (c1) let $(V_j )_{j\in\mathbb{Z}}$ be a MRA with generator $\phi$
    • (c2) let $h_k$ be the coefficients of the scaling equation.
    • (c3) let $\psi \in V_{−1}$ be defined by $\psi := \sum_{k\in\mathbb{Z}} (−1)^k h_{1−k} \phi_{−1,k}$.
      • this $\psi$ may look surprising, but if we insert the p.w.c. MRA we only have $h_0$ and $h_1$ not equal to $0$ and, actually, both are just $\sqrt{2}/2$ (see (4.15 (ii))), and then we see that we get exactly the combination of $\chi_{\left[0,1/2\right)}$ (which comes from $k=0$) minus $\chi_{\left[1/2,1\right)}$ (which comes from $k=1$, where the minus comes from the $(-1)^k$). So, for the one case that we computed, this $\psi$ here is exactly the $\psi$ that we had above, but here it is now stated in general. → (4.17 (i))
  • Then,
    • (i) $\{\psi_{j,k} : k \in \mathbb{Z}\}$ is an orthonormal basis of $W_j$.
      • which we showed for our specific generator $\phi$, but it holds for all generators
    • (ii) $\{\psi_{j,k} : j, k \in \mathbb{Z}\}$ is an orthonormal basis of $L^2 (\mathbb{R})$.
      • We already saw last time that $L^2$ is the closure of the [direct] sum of all the $W_j$s (4.14 (ii)). Thus, if we have (i), then (ii) follows from what we already know.
      • phth: by (i), we can represent every $f\in W_j$ with $\{\psi_{j,k} : k \in \mathbb{Z}\}$. Since, by (4.14 (ii)), all the $W_j$ together form $L^2$, $\{\psi_{j,k} : k \in \mathbb{Z}\}$ over all $j$ must be able to represent every $f\in L^2$.
    • (iii) $\psi$ is a wavelet with $c_\psi = \ln{2}$.
      • again, with the same scaling as above
      • which is now the promised connection in its generality. So, for a function to be a wavelet we needed this corresponding constant $c_\psi$ to be bigger than $0$ and smaller than $\infty$ (this was for the continuous WT). In this case, we can even exactly tell what this constant is for the $\psi$ and this constant is always the same if this $\psi$ is constructed from an MRA and its generator. So, in essence, this means that “for any MRA we get a corresponding wavelet”.
  • “how to construct a wavelet given any MRA”, works as follows:
    • (c1) if we have a MRA $(V_j)_j$, it always comes with a generator $\phi$ (that is part of the def. of a MRA)
    • (c3) then we can construct a function $\psi$ from scaled, shifted versions of the generator, ie. from the $\phi_{-1,k}$, by reusing the coefficients $h_{1-k}$ from the scaling equation
    • (i) if we do so, then this $\psi$ gives us CONSes of all detail spaces $W_j$ for all the $j$ (ie. if we fix the scale $j$ and use all the shifts $k$)
    • (ii) if we also use all the scales $j$ and all the shifts $k$, then we even get a CONS of $L^2$. So, this is like, instead of expressing things in terms of the $V_j$, we are expressing in terms of the $W_j$s (the detail spaces)
    • (iii) furthermore, the $\psi$ created like this is always a wavelet and we can specify the admissibility constant $c_\psi$, which is $\ln{2}$ in this case; that means “given an MRA we get a wavelet” (this connects the 1st part of this chapter, where I just assumed I was given a wavelet, and now we see how we can construct one from an MRA)
  • proof:
    • we will use techniques similar to those from the specific case above, where we did the computation for a very specific MRA (the p.w.c. MRA).
    • Let's see if there are any unforeseen things happening on the way

4.17 Example: Wavelet Pairs $\phi,\psi$

4.17 (i) Piecewise Constant MRA, Haar Wavelet

  • Let $(V_j )_{j\in\mathbb{Z}}$ be the piecewise constant MRA from (4.9).
  • Recalling $h_k = 2^{-\frac{1}{2}} (\delta_{k,0} +\delta_{k,1} )$, we get \(\psi = \sum_{k\in\mathbb{Z}} (−1)^k h_{1−k} \phi_{−1,k} = (\ldots) = -\chi_{\left[\frac{1}{2},1\right)}+\chi_{\left[0,\frac{1}{2}\right)},\) which is again the Haar wavelet.
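  • the same can be verified numerically (my own sketch; assuming NumPy): building $\psi$ from the general formula (4.16 (c3)) with the Haar $h_k$ reproduces $\chi_{\left[0,\frac{1}{2}\right)}-\chi_{\left[\frac{1}{2},1\right)}$:

```python
import numpy as np

# psi = sum_k (-1)^k h_{1-k} phi_{-1,k}  (4.16 (c3)), with h_0 = h_1 = 2^{-1/2} (p.w.c. MRA)
h = {0: 2.0**-0.5, 1: 2.0**-0.5}

def phi_jk(x, j, k):  # phi_{j,k}(x) = 2^{-j/2} * chi_[0,1)(2^{-j} x - k)
    y = 2.0**(-j) * x - k
    return 2.0**(-j / 2) * ((0.0 <= y) & (y < 1.0))

x = np.linspace(-0.5, 1.5, 4001)
psi = sum((-1)**k * h.get(1 - k, 0.0) * phi_jk(x, -1, k) for k in range(-2, 4))

haar = ((0.0 <= x) & (x < 0.5)).astype(float) - ((0.5 <= x) & (x < 1.0)).astype(float)
assert np.allclose(psi, haar)  # the general construction reproduces the Haar wavelet
print("ok")
```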

4.17 (ii) Daubechies Wavelets

  • Constructing wavelets that both have compact support and have some smoothness is highly non-trivial.
  • The discovery of compactly supported, regular and orthogonal wavelets by Ingrid Daubechies in 1988 is one of the reasons why wavelets became very popular in the 90s.
  • The Daubechies wavelets are a family of orthogonal wavelets that are characterized by a maximal number $k$ of vanishing moments, ie. \(\int_{\mathbb{R}}x^l \psi(x) dx = 0\ \text{for}\ l \in \{0, \ldots , k - 1\},\) for a given support.
  • Perhaps the most famous wavelet pair $\phi, \psi$ is the Daubechies wavelet with two vanishing moments, denoted by db2.
  • Interestingly, the Daubechies wavelet with one vanishing moment, db1, is equivalent to the Haar wavelet.
  • The values of the coefficients $h_k$ are available as tabulated numerical values. For db2, the values are also available analytically (see notes).
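  • for illustration (my own sketch, not from the notes): the analytic db2 values are $h = \frac{1}{4\sqrt{2}}\left(1+\sqrt{3},\ 3+\sqrt{3},\ 3-\sqrt{3},\ 1-\sqrt{3}\right)$; assuming the PyWavelets package is available, they can be compared against its tabulated filter (PyWavelets may store the coefficients in reversed order, so we compare sorted lists):

```python
import numpy as np
import pywt  # PyWavelets (assumed installed)

s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))

print(np.sum(h))     # = sqrt(2): the normalization the scaling equation requires
print(np.sum(h**2))  # = 1: unit l^2-norm, consistent with orthonormality

w = pywt.Wavelet('db2')
assert np.allclose(sorted(h), sorted(w.dec_lo))  # same four values (up to ordering)
```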

Screenshot-from-2024-03-30-15-41-55

4.17 (iii) Symlets, Daubechies’ Least-Asymmetric Wavelets

  • A modification of the Daubechies wavelets are the symlets, which are sometimes also called Daubechies’ least-asymmetric wavelets.
  • As the name suggests, these wavelets are more symmetric than the Daubechies wavelets and also have vanishing moments.
  • For instance, the symlet with four vanishing moments is denoted by sym4.
  • Note that db2 and sym2 are equivalent.

4.3 Fast Wavelet Transform

  • (i) Due to (4.16), we know that
    • $\psi$ constructed from an MRA is a wavelet and
    • $\{\psi_{j,k} : j, k \in \mathbb{Z}\}$ is an orthonormal basis of $L^2 (\mathbb{R})$.
    • In particular, we have \(f = \sum_{j,k\in\mathbb{Z}} (f, \psi_{j,k} )_{L^2} \psi_{j,k}.\)
  • (ii) Recalling $(f, \psi_{j,k} )_{L^2} = L_\psi f (2^j , 2^j k)$, (i) shows that it is sufficient to know $L_\psi f$ on the discrete set $\{(2^j , 2^j k) : j, k \in \mathbb{Z}\} \subset \left(0, \infty\right) \times \mathbb{R}$ to encode $f$.
    • indeed, choosing $a=2^j$, $b=2^jk$ in (4.1) gives $\frac{1}{\sqrt{2^j}}\psi\left(\frac{x-2^jk}{2^j}\right)=2^{-\frac{j}{2}}\psi(2^{-j}x-k)=\psi_{j,k}(x)$
  • (iii) In the sense of (ii), the coefficients $(f, \psi_{j,k} )_{L^2}$ are the discrete wavelet transform of $f$.
  • (iv) The key to a fast wavelet transform is to use that those coefficients can be computed recursively, which is a consequence of the scaling equation and the definition of $\psi$:

4.18 FWT Recursion Formulas: Computing the Coefficients $(f, \phi_{j,k} )_{L^2}$ and $(f, \psi_{j,k} )_{L^2}$ from $(f, \phi_{j-1,k} )_{L^2}$

  • basically, the same as (4.19), but (4.19) is the short notation of (4.18)
  • Derivation of the FWT recursion formulas (4.19):
  • the 1st part is a generalization of the scaling equation: it says that the scaling equation can be used for all $j$
    • recall: $\phi = \sum_{k\in\mathbb{Z}} h_k \phi_{-1,k}$
      • $\phi$ is constructed from the $\phi_{\color{red}{-1},k}$, so $\phi_\color{red}{j}$ can be constructed from the $\phi_{\color{red}{j}-1,k}$
      • we have added $j$ in front of the $-1$ index in $\phi = \sum_{k\in\mathbb{Z}} h_k \phi_{\color{red}{-1},k}$
      • and it is shifted by $k$ (see $\phi_{j,\color{red}{k}}$)
    • recall: $\psi := \sum_{k\in\mathbb{Z}} (−1)^k h_{1−k} \phi_{−1,k}$
      • $\psi$ is constructed from the $\phi_{\color{red}{-1},k}$, so $\psi_\color{red}{j}$ can be constructed from the $\phi_{\color{red}{j}-1,k}$
  • the 2nd part follows by inserting the expressions for $\phi_{j,k}$ and $\psi_{j,k}$ from the 1st part into the scalar products $(f,\phi_{j,k})$ and $(f,\psi_{j,k})$
  • this now gives us a way to go from $j$ to $j+1$
    • given the $c$s for one $j$ we can get the $c$s and $d$s for $j$ increased by $1$

Screenshot-from-2024-03-30-16-28-44

4.19 FWT: Compute $c^{j+1}$ and $d^{j+1}$ from $c^j$

  • Berkels: the $c^j_k$ correspond to the projection $P_{V_j} f$, the $c^{j+1}_k$ correspond to the projection $P_{V_{j+1}} f$ and the $d^{j+1}_k$ correspond to the projection $P_{W_{j+1}} f$
    • Berkels: ie. if we know the function on $V_j$ we can split it in $V_{j+1}$ and $W_{j+1}$ (this is like splitting the coarse information from the detail)
  • FWT Recursion Formulas:

Screenshot-from-2024-03-30-16-21-50

  • The FWT uses these FWT Recursion Formulas to
    • compute from $P_{V_j} f$ (given in form of the coefficient vector $c^j$ ) the coarser projection $P_{V_{j+1}} f$ (as the coefficient vector $c^{j+1}$ ) and the wavelet component $P_{W_{j+1}} f$ (as the coefficient vector $d^{j+1}$).
  • Note that the sums are finite in case the scaling equation coefficient sequence only has finitely many nonzero entries.
  • If only few [of the scaling equation coefficients] are nonzero (eg. for the Haar wavelet only $h_0$ and $h_1$ are nonzero; for the Daubechies wavelet db2, four $h_k$s are nonzero), only few computations have to be done when going from $j$ to $j + 1$ (see the sketch below).
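  • a minimal sketch of one decomposition step (my own; assuming NumPy). It implements the recursion in the standard index convention $c^{j+1}_k=\sum_l h_l c^j_{2k+l}$ and $d^{j+1}_k=\sum_l g_l c^j_{2k+l}$ with $g_l=(-1)^l h_{1-l}$ (treat this convention and the periodic boundary handling as assumptions), here for the Haar filter:

```python
import numpy as np

h = {0: 2.0**-0.5, 1: 2.0**-0.5}     # Haar scaling coefficients (4.15 (ii))
g = {0: 2.0**-0.5, 1: -(2.0**-0.5)}  # g_l = (-1)^l h_{1-l}

def fwt_step(c, h, g):
    """One analysis step: c^{j+1}_k = sum_l h_l c_{2k+l}, d^{j+1}_k = sum_l g_l c_{2k+l}."""
    N = len(c)
    c_next = np.zeros(N // 2)
    d_next = np.zeros(N // 2)
    for k in range(N // 2):
        c_next[k] = sum(hl * c[(2 * k + l) % N] for l, hl in h.items())  # periodic indexing
        d_next[k] = sum(gl * c[(2 * k + l) % N] for l, gl in g.items())
    return c_next, d_next

c = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0])  # toy coefficient vector c^j
c1, d1 = fwt_step(c, h, g)
print(c1)  # sqrt(2) * pairwise means:       ~[4.243  7.071  2.828  1.414]
print(d1)  # sqrt(2) * half pairwise diffs.: ~[1.414  0.    -1.414 -1.414]
```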
  • problem: we still want to convert back to our original input after we have done whatever manipulation we wanted to do on the wavelet coeffs, so we need to be able to invert this. Fortunately, this is also possible in a similar way:

4.20 Invert the FWT: Compute $c^j$ from $c^{j+1}$ and $d^{j+1}$

  • invert the (fast) wavelet transform, ie. we need to be able to
    • compute $P_{V_j} f$ from the coarser projection $P_{V_{j+1}} f$ and the details $P_{W_{j+1}} f$
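  • the corresponding synthesis step (my own sketch; same assumed conventions as in the analysis sketch above) recombines $c^{j+1}$ and $d^{j+1}$ via $c^j_m=\sum_k \left(h_{m-2k} c^{j+1}_k + g_{m-2k} d^{j+1}_k\right)$ and exactly recovers the input of the analysis step:

```python
import numpy as np

h = {0: 2.0**-0.5, 1: 2.0**-0.5}     # Haar filters as in the analysis sketch
g = {0: 2.0**-0.5, 1: -(2.0**-0.5)}  # g_l = (-1)^l h_{1-l}

def ifwt_step(c_next, d_next, h, g):
    """One synthesis step: c_m = sum_k (h_{m-2k} c_next_k + g_{m-2k} d_next_k)."""
    N = 2 * len(c_next)
    c = np.zeros(N)
    for m in range(N):
        for k in range(len(c_next)):
            c[m] += h.get(m - 2 * k, 0.0) * c_next[k]
            c[m] += g.get(m - 2 * k, 0.0) * d_next[k]
    return c

# roundtrip with the toy vector from the analysis sketch:
c1 = np.array([6.0, 10.0, 4.0, 2.0]) / np.sqrt(2.0)   # = c^{j+1} from before
d1 = np.array([2.0, 0.0, -2.0, -2.0]) / np.sqrt(2.0)  # = d^{j+1} from before
print(ifwt_step(c1, d1, h, g))  # [4. 2. 5. 5. 1. 3. 0. 2.], the original c^j
```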

4.21 Compact Notation of the FWT and IFWT (Algorithm)

  • apply (4.19) $M$ times, ie. go from $j$ to $j+1$, $M$ times
  • only the $V_j$s are split further and further by the FWT algorithm, while $W_{j+1}$, $W_{j+2}$, … are not split further (see figure below)
  • the IFWT goes in the opposite direction in the figure below, ie. from bottom to top

Screenshot-from-2024-03-31-21-07-00

4.22 $H$ and $G$ can be expressed as Convolutions
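
  • (my own note, spelling out the heading using the $(Hc)_k$ notation from (4.23) below:) writing $(D_{-1}h)_l := h_{-l}$ for the reflected filter, the analysis step is a convolution evaluated at the even indices, \((Hc)_k = \sum_l h_{l-2k}c_l = \sum_l c_l (D_{-1}h)_{2k-l} = (c \ast D_{-1}h)_{2k},\) and analogously $(Gc)_k = (c \ast D_{-1}g)_{2k}$; ie. $H$ and $G$ are “filter, then downsample by a factor of 2” operations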

4.23 Coefficients $c^j = ((f, \phi_{j,k} )_{L^2} )_{k\in\mathbb{Z}}$ for one $j \in \mathbb{Z}$, ie. the representation of our signal $f$ on an initial scale $j$

  • problem:
    • To compute the FWT, we still need the coefficients $c^j = ((f, \phi_{j,k} )_{L^2} )_{k\in\mathbb{Z}}$ for one $j \in\mathbb{Z}$, ie. the representation of our signal $f$ on an initial scale $j$.
    • Usually, $f$ is not available analytically, so we cannot just compute $c^j$.
    • Thus, we need a way to estimate $c^j$ from available data.
  • condition:
    • let $C := \int_{\mathbb{R}}\phi(x) dx$
  • then, \((f, \phi_{j,k} )_{L^2} = 2^\frac{j}{2} Cf (2^j k)+\mathcal{O}(2^j)\) thus, for $-j$ large enough [so that $\mathcal{O}(2^j)$ is negligible], $2^\frac{j}{2} Cf (2^j k)$ is a good approximation of $(f, \phi_{j,k} )_{L^2}$ .
  • it is sufficient if we can sample our signal $f$ at equidistant points, as long as the distance between neighboring sample points is sufficiently small (a small numerical check follows below)
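  • a small numerical check (my own; assuming NumPy) of the sampling approximation for the Haar generator, where $C=\int_{\mathbb{R}}\phi(x)dx=1$:

```python
import numpy as np

f = lambda x: np.sin(x)            # a smooth test signal (hypothetical choice)

for j in (-4, -8, -12):            # finer and finer initial scales
    k = 3
    a, b = k * 2.0**j, (k + 1) * 2.0**j
    n = 10000
    y = a + (np.arange(n) + 0.5) * (b - a) / n   # midpoint rule on [a, b)
    exact = 2.0**(j / 2) * f(y).mean()           # ~ (f, phi_{j,k})_{L^2} = 2^{j/2} * mean
    approx = 2.0**(j / 2) * f(2.0**j * k)        # 2^{j/2} C f(2^j k) with C = 1
    print(j, abs(exact - approx))  # the error term vanishes as -j grows
```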
  • Let $N = 2^M$ [the length of the initial coefficient vector $c^j$].
  • computing a single coefficient $(Hc)_k = (c \ast D_{-1} h)_{2k}$ is $\mathcal{O}(n)$ [where $n$ is the number of nonzero scaling coefficients $h_k$]
  • Both $Hc$ and $Gc$ are (due to the downsampling) vectors of length $N/2$ and thus can be computed with $\mathcal{O}(nN)$
  • the FWT for decomposition depth $M$ can be computed with $\mathcal{O}(nN)$, which is even faster than the FFT ($\mathcal{O}(N\log N)$), if $n$ is small.
  • Similarly, one can show that the IFWT is also $\mathcal{O}(nN)$.

4.24 2D Discrete Wavelet Transform

  • There are numerous ways to construct a two-dimensional wavelet transform.
  • The easiest way is to extend the one-dimensional wavelet transform by using the tensor product.
  • slide demo:
    • Berkels: these 3 ($H_j^2$, $S_j^2$, $D_j^2$) contain a $V_j$, so why are they called 2D detail spaces? - These 3 ($H_j^2$, $S_j^2$, $D_j^2$) are just details (like edges), although they contain a $V_j$, because these 3 together form the 2D detail space: by our definition, $\mathcal{W}_j^2$ must be the orthogonal complement of $V_j^2$, ie. $V_{j-1}^2=V_j^2\oplus \mathcal{W}_j^2$; comparing this with the equation in Figure 4.24, we can see that $\mathcal{W}_j^2=H_j^2\oplus S_j^2\oplus D_j^2$
      • Berkels: If you check the picture on the slides you can also see that there are no local means left in these $H_j^2$, $S_j^2$, $D_j^2$.

Figure 4.24:
Screenshot-from-2024-04-03-13-11-24

  • Berkels: on the lower left it picks more vertical structures, on the top right it picks more horizontal structures and on the diagonal it picks more diagonal structures
  • Berkels: at the 2nd level we just get horizontal, vertical and diagonal components at a coarser scale. They contain coarser information because they are already averaged.
    • why “averaged”?: the 2nd level is the decomposition of $V_{j+1}$ into $V_{j+2}$ and $W_{j+2}$, and $P_{V_{j+1}}f=f_{\text{avg}}$ in the sense of (4.15 (i)), ie. the top-left image (a, aa, aaa, etc.) is an “average” (for the p.w.c. MRA). In this sense, $W_{j+2}$ (2nd level) is coarser than $W_{j+1}$ (1st level), because $W_{j+2}$ is a part of the already “averaged” $V_{j+1}$
      • for other wavelets (other MRA) than the Haar wavelet the “average” (top left image) and the details (3 other images) look different
    • Berkels: but the 2nd level is still more horizontal, vertical and diagonal details. So, the 2nd level gives you additional information that is not in the 1st level. But it is still as directional as the 1st level.
    • Berkels: the scale on everything is very much related to your image resolution, not to what you actually see in there
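  • a minimal sketch of one level of the tensor-product (separable) 2D Haar DWT (my own; assuming NumPy; the band naming is my assumption, conventions differ): apply the 1D analysis step to all rows, then to all columns of both outputs:

```python
import numpy as np

def haar_step_1d(c):
    # one 1D Haar analysis step: (even+odd)/sqrt(2) = coarse, (even-odd)/sqrt(2) = detail
    even, odd = c[0::2], c[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_step_2d(img):
    lo = np.empty((img.shape[0], img.shape[1] // 2)); hi = np.empty_like(lo)
    for i in range(img.shape[0]):                  # 1D step along every row
        lo[i], hi[i] = haar_step_1d(img[i])
    def along_cols(block):                         # 1D step along every column
        l = np.empty((block.shape[0] // 2, block.shape[1])); d = np.empty_like(l)
        for c in range(block.shape[1]):
            l[:, c], d[:, c] = haar_step_1d(block[:, c])
        return l, d
    a, v = along_cols(lo)   # a: 2D approximation; v: one detail band
    hb, d = along_cols(hi)  # hb, d: the two remaining detail bands (cf. Figure 4.24)
    return a, hb, v, d

img = np.arange(16.0).reshape(4, 4)  # toy "image"
a, hb, v, d = haar_step_2d(img)
print(a)  # 2x2 approximation: local means of the 2x2 blocks, scaled by 2
```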

4.25 JPEG, JPEG2000

  • Wavelets have countless applications, both in theoretical and in practical work. Here, we confine ourselves to briefly mentioning one practical application.
  • Like the DCT, the FWT (Fast Wavelet Transform) can be used for image compression by only storing the most important wavelet coefficients.
  • In JPEG2000, the block-wise DCT is replaced by a wavelet transform, which results in significantly improved compression rates, ie. better image quality using the same amount of data or same image quality using less data.
    • unlike the block-wise DCT, the WT has no “blocking” artifacts
  • While the classical JPEG format is still much more popular in digital photography than JPEG2000, digital cinema uses JPEG2000 as basis for video compression.