- The FT is a global operator. You don’t always want this full globality. So, it is natural to ask for more local transforms, ie. where one coefficient only depends on a small part of your function, not on its entirety. That way, by manipulating the coefficients of your transform, you can make local changes to your function instead of being forced to make global changes. Think about the low-pass filter: it removed the high frequencies/the details, but it also introduced some “ringing” because of these global changes. The wavelet transform is a more localized transform.
- So we want something that is more localized than the FT.
- conditions:
- f,\psi\in L^2
- f: the image
- \psi: encodes how you want to transform
- a>0 (scale parameter)
- b\in\mathbb{R} (spatial parameter)
- then, the wavelet transform of f is L_\psi f(a,b)=\int_{\mathbb{R}}f(x)\frac{1}{\sqrt{a}}\psi\left(\frac{x-b}{a}\right)dx
- like a convolution, see (4.2 (ii) (B))
- we are integrating our image f and weighting it by \frac{1}{\sqrt{a}}\psi. And what we do with our \psi is: we shift \psi to a certain position b. So, we have our \psi and we shift it with b over the signal wherever we want to have it, and we scale \psi with a and \sqrt{a}. This \frac{1}{\sqrt{a}} and a is exactly the change of variables that “zooms in” or “out” to grab some information from your signal.
- Eg. if \psi is just the characteristic function of \left[0,1\right], then for a=1 and b=0 the transform gives you a mean value on \left[0,1\right],
- and if you now change b you take different mean values because you are shifting this \psi,
- and by changing this a you can take mean values or some accumulation over smaller or larger intervals, depending on how you choose this a.
- So this somehow samples … (interrupted by question)
- Q&A: in case \psi does not have compact support then it is still global, not local! So, finding good \psi’s is a very important question. For now we just know (vaguely) that we have some \psi. We will need to find out what suitable \psi’s are and what they should fulfill.
- But, in general, you can have this image and you go to a certain position of your signal with b and you choose a certain zoom factor a and then you somehow aggregate values in there.
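- To make the shift-and-zoom mechanics concrete, here is a minimal numerical sketch (not from the lecture; the grid, the test signal f and the helper name `wavelet_transform` are ad-hoc choices) that evaluates (4.1) by a Riemann sum. With \psi=\chi_{\left[0,1\right)} as in the example above, L_\psi f(a,b) reduces to \sqrt{a} times the mean of f on \left[b,b+a\right):

```python
# Riemann-sum approximation of L_psi f(a,b) = int f(x) (1/sqrt(a)) psi((x-b)/a) dx.
import numpy as np

def wavelet_transform(f, psi, a, b, x):
    """Approximate L_psi f(a,b) on the grid x (spacing dx)."""
    dx = x[1] - x[0]
    return np.sum(f(x) * psi((x - b) / a)) / np.sqrt(a) * dx

x = np.arange(-10.0, 10.0, 1e-3)
f = lambda t: np.exp(-t**2)                              # some test signal
psi_box = lambda t: ((0 <= t) & (t < 1)).astype(float)   # psi = chi_[0,1)

# a=1, b=0: the mean of f on [0,1)
print(wavelet_transform(f, psi_box, a=1.0, b=0.0, x=x))
# shifting b moves the window, increasing a widens it (and rescales by sqrt(a))
print(wavelet_transform(f, psi_box, a=4.0, b=-2.0, x=x))
```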
4.2 FT vs WT, Representations of L_\psi f
- (i) L_\psi maps f:\mathbb{R}\to\mathbb{C} to L_\psi f:(0,\infty)\times\mathbb{R}\to\mathbb{R}
- in words: the result of the WT is a 2D function (2-dimensional domain for Lψf)
- recall: the result of the FT is a 1D function, ie. \mathcal{F} maps f:\mathbb{R}\to\mathbb{C} to \mathcal{F}f:\mathbb{R}\to\mathbb{C} (domain and codomain are the same for f and \mathcal{F}f)
- 2D: because you can evaluate it for different a’s and b’s, not only for different frequencies.
- We increase from a 1D domain to a 2D domain, which already implies that this representation (i) somehow has to be overcomplete. So, this has to contain redundant information, because we cannot genuinely need a 2D function to describe a 1D function. The 2D function has much more power to store information.
- (ii) 2 representations of Lψf (equivalent to (4.1)):
- (A) L_\psi f as scalar product: L_\psi f(a,b)=\frac{1}{\sqrt{a}}\left(f,T_{-b}D_{\frac{1}{a}}\psi\right)_{L^2}
- (B) L_\psi f as convolution: L_\psi f(a,b)=\frac{1}{\sqrt{a}}\left(f\ast D_{-\frac{1}{a}}\psi\right)(b)
- just a side remark: (B) is related to (2.16): u(x,\sigma)=(g_{\sqrt{2\sigma}}\ast f)(x), where we also filtered with kernels of different sizes, so there is some relation
- depending on the circumstances one representation will be more useful than the other
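- As a sketch of why representation (B) is useful in practice: for a fixed scale a, L_\psi f(a,\cdot) can be computed for all b at once with one discrete convolution. Everything here (grid, test signal, the use of np.convolve) is an ad-hoc illustration, not the lecture’s implementation:

```python
# For fixed a, L_psi f(a,.) = (1/sqrt(a)) * (f * D_{-1/a} psi), with
# (D_{-1/a} psi)(x) = psi(-x/a). On a grid, np.convolve approximates this
# up to the factor dx (and, with mode="same", up to boundary effects and
# a possible one-sample alignment shift).
import numpy as np

a, dx = 2.0, 1e-3
x = np.arange(-10.0, 10.0, dx)
f = np.exp(-x**2)
psi = lambda t: t * np.exp(-t**2 / 2)    # eg. the wavelet from (4.7 (i))

kernel = psi(-x / a)                     # D_{-1/a} psi sampled on the grid
Lf = np.convolve(f, kernel, mode="same") * dx / np.sqrt(a)
# Lf[i] approximates L_psi f(a, x[i]); one can cross-check it against a
# direct Riemann-sum evaluation of (4.1) at a few positions b.
```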
- (iii) Where does L_\psi f map? Just a variant of L^2 on \left[0,\infty\right)\times\mathbb{R}: L^2\left(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2}\right):=\left\{F:\left[0,\infty\right)\times\mathbb{R}\to\mathbb{R}\ \mu\text{-measurable}:\int_{\mathbb{R}}\int_0^\infty\lvert F(a,b)\rvert^2\frac{dadb}{a^2}<\infty\right\}
- notation: this \frac{dadb}{a^2} in L^2\left(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2}\right) is not a set into which we map (like eg. in the L^2(\mathbb{R}^d,\mathbb{C}) notation, where we map from \mathbb{R}^d to \mathbb{C}), but it tells us that we need to change the integration measure dadb by rescaling it with \frac{1}{a^2}. (just Berkels’ notation)
- everything except the \frac{1}{a^2} scaling would be “normal Lebesgue”, in particular, \int_{\mathbb{R}}\int_0^\infty\lvert F(a,b)\rvert^2 dadb<\infty would be “normal Lebesgue”, ie. the square integral is finite
- we weight the integrand \lvert F(a,b)\rvert^2 by \frac{1}{a^2}: that means for large a the contribution is decreased because we multiply by sth that is small, but for small a the contribution is strongly increased. So, this weight has a singularity at a=0, which puts certain requirements on this transform F: \lvert F(a,b)\rvert^2 has to vanish at a=0 (otherwise \int_{\mathbb{R}}\int_0^\infty\lvert F(a,b)\rvert^2\frac{dadb}{a^2} would not be finite) and even a bit more.
- (iv) equipped with the same equivalence relation used for the Lebesgue spaces (B.3) and the scalar product (F,G)_{L^2\left(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2}\right)}:=\int_{\mathbb{R}}\int_0^\infty F(a,b)G(a,b)\frac{dadb}{a^2}
- the scalar product looks like the normal scalar product for L^2, but we add this weight \frac{1}{a^2} (which we also used for the new L^2 space that we just defined in (iii))
- problem: so far it is unclear why this \frac{1}{a^2} is sitting there, but in the next proposition we will see why we have it and how it relates to the wavelet transform
4.3 Plancherel-like Formula for the WT (L_\psi Maps into the Weighted L^2, Linearity)
- conditions:
- \psi\in L^2 s.t. 0<c_\psi:=2\pi\int_0^\infty\frac{\lvert\hat{\psi}(\omega)\rvert^2}{\omega}d\omega<\infty (warning: these are 3 conditions!)
- then,
- (i) L_\psi:L^2\to L^2\left(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2}\right)
- this means that the functions that we get from this transform are not in the normal L^2 space, but in this “L^2 with the measure \frac{dadb}{a^2}”. That is why we need this new space.
- (ii) Lψ is a linear mapping
- (iii) (L_\psi f,L_\psi g)_{L^2\left(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2}\right)}=c_\psi(f,g)_{L^2} for all f,g\in L^2
- like the Plancherel formula which said that (\mathcal{F}f,\mathcal{F}g)_{L^2}=(f,g)_{L^2}. This is essentially also true here. Except that we do not have the normal scalar product on the lhs, but this new scalar product, and there is this extra scaling factor c_\psi.
- (iii) says even more than that L_\psi is continuous (discussed more precisely in (4.5 (ii))). Because (iii) is a very strong property. It says that L_\psi is almost like an isometry, except that it scales lengths by the constant factor \sqrt{c_\psi} (norms squared by c_\psi).
- proof:
- prove (iii) using
- Plancherel formula (3.27)
- properties of F for translation, modulation, etc. (3.4)
- Fubini
- f real-valued ⇒ Ff hermitian (3.6)
- (iii) now immediately implies (i)
- because (i) claims that we are mapping to this special space, and for this special space we need that \int_{\mathbb{R}}\int_0^\infty\lvert F(a,b)\rvert^2\frac{dadb}{a^2} is finite; but if I plug in f for both arguments in (iii), then we see that the integral we are looking for, (L_\psi f,L_\psi f)_{L^2\left(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2}\right)}, is the same as c_\psi(f,f)_{L^2}, ie. c_\psi times the norm squared of f, and the norm squared of f is, of course, finite since we are starting with f\in L^2.
- L_\psi is linear (ii) is clear because the integral in the def. of L_\psi is linear in f (see def. of L_\psi)
- sidenote: if you look closely at (4.3), we did not need that c_\psi>0, we only needed c_\psi<\infty. But we will need c_\psi>0 to get the inverse.
4.4 Admissibility Condition, Wavelet
- (i) the condition 0<c_\psi:=2\pi\int_0^\infty\frac{\lvert\hat{\psi}(\omega)\rvert^2}{\omega}d\omega<\infty is called admissibility condition and c_\psi is the admissibility constant
- name “admissibility” because if this condition is not fulfilled, ie c_\psi=\infty, then the rhs c_\psi(f,g)_{L^2} in (4.3 (iii)) explodes.
- If c_\psi=0, then c_\psi(f,g)_{L^2} just degenerates: f would be mapped to an element of norm 0, although f itself was not 0.
- (ii) any ψ∈L2 that fulfills the admissibility condition is called wavelet
- outlook:
- c_\psi<\infty gives us the continuity of the WT → (4.5 (ii))
- c_\psi>0 gives us the invertibility of the WT → (4.6)
4.5 L_\psi Almost Isometry, L_\psi is Continuous, (\hat{\psi}(0)=0 \Leftrightarrow Mean Value of \psi (not \hat{\psi}!) is 0)
- (i) the scalar product in (4.2 (iv)) induces the norm \lVert F\rVert_{L^2\left(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2}\right)}:=\sqrt{(F,F)_{L^2\left(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2}\right)}}
- (ii) from (i) we see that (4.3) shows that the WT is continuous if c_{\psi}<\infty
- because (i) shows the boundedness of the WT and since the WT is linear (4.3 (ii)) we can conclude from (ex05 problem 2) that the WT is continuous
- (iii) since \displaystyle\int_0^1\frac{1}{\omega}d\omega=\lim_{a\searrow 0}\int_a^1\frac{1}{\omega}d\omega=\ldots=\infty, it is necessary that \lvert \hat{\psi}(\omega)\rvert^2 decays to 0 quickly enough for \omega \to 0 (ie. \lvert \hat{\psi}(\omega)\rvert^2 \to 0 for \omega \to 0) in a suitable sense to get c_\psi < \infty. (The integral in the admissibility condition has the same structure as this integral here, ie. it includes the integral from 0 to 1. If \lvert \hat{\psi}(\omega)\rvert^2 did not go below a certain value near 0, the integral in the admissibility condition would be \infty. So the finiteness of the integral in the admissibility condition at 0 can only come from the fact that \lvert \hat{\psi}(\omega)\rvert^2 also vanishes there.)
- In case \hat{\psi} is continuous (at least in 0), \lvert \hat{\psi}(\omega)\rvert^2 \to 0 for \omega \to 0 implies \hat{\psi}(0) = 0 is necessary (but not sufficient) to get c_\psi < \infty.
- note: since \psi\in L^2 and the FT maps from L^2 to L^2 it is not guaranteed that \hat{\psi} is continuous.
- phth: In case \hat{\psi} is not continuous in 0, \lvert \hat{\psi}(\omega)\rvert^2 \to 0 for \omega \to 0 does not imply \hat{\psi}(0) = 0 (eg. think about the function f with f=1 for x=0 and f=0 for x\neq 0, then f is not continuous in 0 and \lvert f(x)\rvert^2\to 0 for x\to 0, but f(0)\neq 0)
- What does \hat{\psi}(0) = 0 mean?
- we have discussed this: the 0 frequency is the only frequency that you can easily interpret: the 0 frequency encodes somehow the mean: so if the 0 frequency is 0 then the mean of your image is 0 (here: mean of this \psi is 0).
- In other words, it is necessary (but not sufficient) that the mean value of \psi (not \hat{\psi}!) is 0 for \psi to be a wavelet.
- (iv) The condition c_\psi > 0 is necessary to guarantee the invertibility of the transform on its image (4.6):
4.6 Wavelet Inversion Theorem
- conditions:
- f\in L^2
- \psi\in L^2
- \psi is a wavelet
- ie. \psi fulfills the admissibility condition
- then, \text{for all}\ f\in L^2:\ f(x) = \frac{1}{c_\psi}\int_{\mathbb{R}}\int_0^\infty L_{\psi}f(a,b)\frac{1}{\sqrt{a}}\psi\left(\frac{x-b}{a}\right)\frac{dadb}{a^2}\quad\text{for a.e.}\ x\in\mathbb{R}
- in words: we can transform back by integrating over da and db (instead of dx as in the WT (4.1)) with the proper scaling a^2 and the same integration weight \frac{1}{\sqrt{a}}\psi\left(\frac{x-b}{a}\right) as in the WT (4.1)
- proof:
- Hilbertian adjoint of L_{\psi}: L_{\psi}^\ast:L^2\left(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2}\right)\to L^2(\mathbb{R})
- a mapping that goes exactly in the other direction than L_{\psi}, ie. it is mapping from the weighted L^2 space to L^2
- idea:
- in finite dimensions the expression (L_{\psi}f,F)_{L^2(\left[0,\infty\right)\times\mathbb{R},\frac{dadb}{a^2})} would be a scalar product on some \mathbb{R}^n and the linear mapping L_{\psi} would be a matrix, ie. the expression would be like the scalar product (\underline{A}x,y)_{\mathbb{R}^n}
- recall (a standard linear algebra trick): what can you do to move the matrix \underline{A} in (\underline{A}x,y)_{\mathbb{R}^n} to the other argument y?
- for real \underline{A}: (\underline{A}x,y)_{\mathbb{R}^n}=(x,\underline{A}^{\top} y)_{\mathbb{R}^n}
- for complex \underline{A}: (\underline{A}x,y)_{\mathbb{R}^n}=(x,\overline{\underline{A}^{\top}} y)_{\mathbb{R}^n}
- recall: Hermitian matrices \underline{A}=\overline{\underline{A}^{\top}} can be understood as the complex extension of real symmetric matrices \underline{A}=\underline{A}^{\top} (cf. Hermitian matrix)
- there are 2 kinds of adjoints, Hilbert space adjoint vs Banach space adjoint
- “Hilbertian” adjoint (usually called conjugate transpose)
- “Dual” or “Banachian” adjoint
- the Hilbertian adjoint just generalizes the concept of transposing or adjoining a matrix to move it from one scalar product argument to the other to linear mappings on Hilbert spaces (so, just think of it as a generalized transpose)
- problem:
- so far we have only shown that it is necessary that the mean of \psi is 0 (to ensure \hat{\psi}(0)=0), but this is not sufficient for \psi to be a wavelet!
4.7 Example Wavelets: 1st and 2nd Derivatives of the Gaussian, Haar Wavelet
4.7 (i) 1st Derivative of the Gaussian
- (i) 1st derivative of the Gaussian: Let \psi(x):=xe^{-\frac{x^2}{2}}=-\frac{d}{dx}g(x), where g(x):=e^{-\frac{x^2}{2}}
- idea: to show that this is a wavelet we need to compute c_\psi and this involves computing the FT of \psi (now you can see why I chose the Gaussian first because there we know what the FT is)
- then, \hat{\psi}(\omega)=\mathcal{F}\psi(\omega)=\mathcal{F}\left(-\frac{d}{dx}g\right)(\omega)\stackrel{(3.17)}{=}-i\omega\mathcal{F}g(\omega)\stackrel{(3.20)}{=}-i\omega g(\omega)=-i\omega e^{-\frac{\omega^2}{2}}=-i\psi(\omega) \Rightarrow c_\psi=2\pi\int_0^\infty\frac{\lvert \hat{\psi}(\omega)\rvert^2}{\omega}d\omega=2\pi\int_0^\infty\frac{\lvert -i\omega e^{-\frac{\omega^2}{2}}\rvert^2}{\omega}d\omega=2\pi\int_0^\infty\frac{\omega^2 e^{-\frac{2\omega^2}{2}}}{\omega}d\omega=2\pi\int_0^\infty\omega e^{-\omega^2}d\omega=\pi\int_0^\infty 2\omega e^{-\omega^2}d\omega=\pi\left(-e^{-\omega^2}\right)\bigg\vert_0^\infty=\pi \Rightarrow \psi\ \text{is a wavelet.}
- important: unlike the mean of \psi, the mean of g is not 0, so g cannot be a wavelet itself!
- intuition:
- g is always above the x-axis, ie. the mean (average value of the function) of g must be positive and not equal to 0!
- \psi is in part below the x-axis.
- \psi is a product of the function x and a Gaussian. This product must be negative for x<0 because x is negative for x<0. Ie. the graph of \psi must be in the 3rd quadrant. Since x=0 for x=0 the graph must go through the origin. For x>0 the graph must be in the 1st quadrant because x and the Gaussian are both positive there.
- With the convolution representation of the wavelet transform, we get L_{\psi}f(a,b)=\frac{1}{\sqrt{a}}(f\ast D_{-\frac{1}{a}}\psi)(b)=-\frac{1}{\sqrt{a}}\left(f\ast D_{-\frac{1}{a}}g^\prime\right)(b)=\frac{a}{\sqrt{a}}\left(f\ast (D_{-\frac{1}{a}}g)^\prime\right)(b)=\sqrt{a}\left(f\ast (D_{-\frac{1}{a}}g)\right)^\prime(b)
- Similarly to the Canny Edge Detector (2.16), the input f is convolved with a scaled Gaussian for different scales (ie. different kernel sizes).
- in (2.16) we noticed that for different scales we get different edge strengths and we had some nice properties and it is somewhat similar here
- Unlike the Canny Edge Detector (2.16), a derivative is taken after the convolution.
4.7 (ii) Mexican Hat
- (ii) 2nd derivative of the Gaussian, aka Mexican hat: Let \psi(x):=\left(1-x^2\right)e^{-\frac{x^2}{2}}=-\frac{d^2}{dx^2}g(x)
- one can show (exercise, similar to (i)) that \hat{\psi}(\omega)=\omega^2 e^{-\frac{\omega^2}{2}} and c_\psi=\pi. Thus, the Mexican hat is also a wavelet.
- this gives us some hint at what wavelets can look like and what kind of information they possibly extract from our functions:
- these wavelets are not local mean values because the wavelets themselves have mean 0, but you can think of (i), for instance, as something like a smoothed local derivative. Because, at x=0, \psi weights the things to the right of the point x=0 positively and the things to the left of the point x=0 negatively. Thus, \psi is something like a derivative (cf. derivative filters (2.12)).
- however, (ii) is more like a 2nd derivative because you have a positive value in the center and negative values in the surrounding, but all very smoothed out.
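- A hedged numerical cross-check (my own sketch, assuming scipy is available) of the two admissibility constants computed above, using the analytic Fourier transforms \hat{\psi}(\omega)=-i\omega e^{-\frac{\omega^2}{2}} resp. \hat{\psi}(\omega)=\omega^2 e^{-\frac{\omega^2}{2}}; both integrals should come out as \pi:

```python
# c_psi = 2*pi * int_0^inf |psi_hat(w)|^2 / w dw for the Gaussian-derivative
# wavelets of (4.7 (i)) and (4.7 (ii)).
import numpy as np
from scipy.integrate import quad

# 1st derivative of the Gaussian: |-iw e^{-w^2/2}|^2 / w = w e^{-w^2}
c1 = 2 * np.pi * quad(lambda w: w * np.exp(-w**2), 0, np.inf)[0]
# Mexican hat: |w^2 e^{-w^2/2}|^2 / w = w^3 e^{-w^2}
c2 = 2 * np.pi * quad(lambda w: w**3 * np.exp(-w**2), 0, np.inf)[0]

print(c1, c2, np.pi)   # both approx 3.14159...
```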
4.7 (iii) Haar Wavelet
- (iii) Haar wavelet: \begin{equation*} \psi(x):=
\begin{cases}
1 & 0 \leq x < \frac{1}{2}\\
-1 & \frac{1}{2} \leq x < 1\\
0 & \text{else}
\end{cases}
\end{equation*}
- here it is easier to see what \psi computes: \psi computes the mean over \left[0,\frac{1}{2}\right] minus the mean over \left[\frac{1}{2},1\right], so it is also sth like a derivative of your input
- to show that the Haar wavelet is a wavelet we can
- represent \psi as \psi=\chi_{\left[0,\frac{1}{2}\right)}-\chi_{\left[\frac{1}{2},1\right)} and then we can use the properties of the FT to show that \hat{\psi}(\omega)=\frac{1}{\sqrt{2\pi}}ie^{-i\frac{\omega}{2}}\sin\left(\frac{\omega}{4}\right)\text{sinc}\left(\frac{\omega}{4\pi}\right)
- this should not surprise us because we know that the FT of \chi_{\left[-B,B\right]} is a cardinal sine and we know that shifts are turned into modulations, and therefore, \chi_{\left[0,\frac{1}{2}\right)} would have to be shifted to \chi_{\left[-\frac{1}{4},\frac{1}{4}\right)} (ie. the support of \chi is shifted to the origin s.t. \chi has the shape \chi_{\left[-B,B\right]}) and \chi_{\left[\frac{1}{2},1\right)} as well, and that is encoded in this modulation e^{-i\frac{\omega}{2}} (cf. (3.4))
- this is not completely obvious and you really need to compute this to see that this formula is correct
- based on that we can show c_\psi=\ln{2}, thus \psi is a wavelet
- interesting: you do not need to compute this manually because c_\psi=\ln{2} also follows from (4.16) + (4.17 (i))
- outlook: we will show how to generate wavelets with certain properties and with this generation we can also create the Haar wavelet and that is why it will inherit the general properties
- Unlike the derivatives of the Gaussian we considered before (ie. (i) and (ii)) that are very smooth and have non-compact support, the Haar wavelet is discontinuous and has compact support.
- interesting: it is actually quite difficult to find wavelets that are both smooth and have compact support (will be discussed later)
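- Analogously, a small numerical check (my own sketch, assuming scipy; the oscillatory tail may need a generous subdivision limit) that the Haar wavelet’s admissibility constant is \ln{2}, using \lvert\hat{\psi}(\omega)\rvert^2=\frac{1}{2\pi}\frac{16\sin^4(\omega/4)}{\omega^2} from the formula above:

```python
# c_psi = 2*pi * int_0^inf |psi_hat(w)|^2 / w dw = int_0^inf 16 sin(w/4)^4 / w^3 dw
import numpy as np
from scipy.integrate import quad

c_psi = quad(lambda w: 16 * np.sin(w / 4) ** 4 / w**3, 0, np.inf, limit=500)[0]
print(c_psi, np.log(2))   # both approx 0.6931
```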
- note: unlike what the name suggests, the discrete WT we are working towards is not fully discrete like the DFT. (However, there are some relations to the FS, so it is more like the FS.)
- problem: One thing that is very apparent is that this WT so far is redundant because we take a 1D function and convert it to a 2D function. Therefore, this 2D function has to have some redundant information because we cannot need a 2D function to describe a 1D function.
- more precisely: The (continuous) wavelet transform of a function f \in L^2 (\mathbb{R}) is apparently a redundant representation of f (f is univariate (1D function), L_\psi f is bivariate (2D function)).
- The question arises whether it is sufficient to know L_\psi f on a subset of \left(0,\infty\right) \times \mathbb{R} (ie. perhaps we do not need to compute it for all the scalings and all the shifts, but just for some of them).
- It will turn out that this is indeed the case for certain discrete subsets, but showing this requires several preparations.
- But it is very important to venture into this because in order to practically use the WT we need to figure out for which scalings and for which translations we actually need these computations.
- problem: the first tool on this way is the MRA:
- the WT is redundant
- we are now on our way to the DWT that is supposed to only look at necessary spatial and scaling parameters, so not at all of them, but hopefully just a subset
- to make this possible we introduce the concept of a so-called MRA:
4.8 MRA, MSA, Generator
- For j \in \mathbb{Z}, let V_j be (vii) a closed subspace of L^2(\mathbb{R}).
- Then, (V_j)_{j\in \mathbb{Z}} is called multiresolution analysis (MRA) or multiscale approximation (MSA), if it fulfills the following conditions:
- (i) translational invariance \forall j, k \in \mathbb{Z} : f \in V_j \Leftrightarrow T_{2^j k} f \in V_j
- when we fix a certain j, we must be able to shift f in V_j while f stays in V_j. But we are not allowed to shift arbitrarily; we have to take discrete steps: k is the number of steps that we take. So we can take arbitrarily many steps (because k is in \mathbb{Z}), but the step size is 2^j, ie. the bigger the j, the larger the step. This hints that these V_j’s get smaller and smaller (→ (ii)), because for j=0 we must be able to shift by step size 2^0=1, for j=1 we only need to be able to shift by step size 2, etc.
- (ii) inclusion \forall j \in \mathbb{Z} : V_{j+1} \subset V_j
- ie. the space is getting smaller when I increase j
- “if j_1 > j_2 the space V_{j_1} is included in V_{j_2}”
- with increasing j the spaces can only get smaller
- (iii) scaling \forall j \in \mathbb{Z} : f \in V_j \Leftrightarrow D_{\frac{1}{2}} f \in V_{j+1}
- this D_{\frac{1}{2}} just means that everything is spread out
- eg. let’s say your f is the characteristic function of the interval \left[0,1\right]; then when you apply D_{\frac{1}{2}} it will be the characteristic function of \left[0,2\right] (which also relates to the fact that if j gets bigger then these shifts T_{2^jk} get bigger)
- problem: we need some more properties to make this meaningful because so far with (i-iii) we could just say that every V_j is all of L^2, then everything here is trivially fulfilled. So, we need to prevent that with the following properties:
- (iv) trivial intersection \bigcap_{j\in \mathbb{Z}} V_j = \{0\}
- ie. the intersection of all the V_j’s is just 0 which formalizes that these spaces get smaller and smaller and at some point there is nothing in there any more.
- problem: but (iv) would still be fulfilled if all V_j’s were just \{0\}; then everything is trivially fulfilled. We do not want this, either. Therefore, we require (v):
- (v) completeness (actually, “denseness in L^2”) \overline{\bigcup_{j\in \mathbb{Z}} V_j} = L^2(\mathbb{R})
- phth: aka “\bigcup_{j\in \mathbb{Z}} V_j is dense in L^2”
- see “Prerequisites” → “dense” → def. (i)
- Berkels: in this sense the V_j’s must capture everything that is in L^2.
- phth: complete in the sense that L^2 is complete (Riesz-Fischer theorem) and we must be able to approximate any L^2 function arbitrarily well with a sequence of functions in \bigcup_{j\in \mathbb{Z}} V_j, where the limit of this sequence need not necessarily be in \bigcup_{j\in \mathbb{Z}} V_j (like we can approximate any element of \mathbb{R} with a sequence in \mathbb{Q})
- (vi) orthonormal basis \exists\phi \in V_0 : \{T_k \phi : k \in \mathbb{Z}\} is a CONS of V_0
- T_k \phi means “shift \phi by integers k”
- the “shift \phi by integers” links to the translational invariance (i) for j=0. There [in (i)] we [require that all f in V_0] also need to be able to shift by integers. Thus, if \phi is in V_0, then all the T_k\phi must [necessarily!] be in V_0 by property (i). But [in (vi)] what we want is that this [set] is a CONS!
- which gives us a very nice way to construct a basis for these kind of spaces
- The function \phi from the orthonormal basis property is called generator or scaling function of the MRA.
- name “generator” comes from (vi) because \phi generates an ONS of V_0
- name “scaling function” comes from the fact that \phi fulfills the scaling equation (4.11), ie. \phi can be expressed by a scaled version of itself
- problem: this sounds like a very abstract set of properties where it is not so clear yet why this is good or perhaps not even clear how we could even choose this V_j. So let us first start with a simple example of such an MRA:
4.9 Example MRA: “Space of Functions that are Constant on Dyadic Intervals”
- for j\in\mathbb{Z} let V_j:=\{f\in L^2 : f\vert_{\left[k2^j,(k+1)2^j\right)}\ \text{is constant for all}\ k\in\mathbb{Z}\}
- ie. V_j is the space of square integrable functions that are constant on the dyadic intervals \left[k2^j , (k + 1)2^j\right) (“dyadic” because the interval limits are powers of 2, related: dyadic logarithm)
- phth: square integrability means for p.w. constant functions that the constant values c_k on the dyadic intervals must be square-summable (\lVert f\rVert_{L^2}^2=2^j\sum_{k\in\mathbb{Z}}\lvert c_k\rvert^2<\infty); in particular the values must tend to 0, but compact support is not required
- then, (V_j)_{j\in\mathbb{Z}} is a MRA.
Proof (vii)
- (vii) The V_j are obviously closed subspaces of L^2(\mathbb{R}).
- Berkels: If you have a sequence of functions f_n\in V_j for a fixed j that are piecewise constant on the interval \left[k2^j , (k + 1)2^j\right), then their limit \lim_{n\to\infty}f_n = f will also be piecewise constant on the same interval \left[k2^j , (k + 1)2^j\right), ie. f\in V_j. There is nothing that could happen to make it [\lim_{n\to\infty}f_n] not piecewise constant.
- “sequential closedness” (“Folgen-Abgeschlossenheit”, Bredies p. 17): “(…) for every sequence (x_n) in U with x_n \to x, the limit x is also in U”
- recall: “closed” = “closed with respect to certain operations” (aka “closed under the limit operation”)
- eg. closed under multiplication means that multiplication does not lead out of the set, ie. the product stays in the set that contained the factors
- “In a complete metric space, a closed set is a set which is closed under the limit operation.”, Wikipedia
- L^2 is complete (Riesz-Fischer Theorem)
Proof (i) - (iii)
- (i) Translational invariance, (ii) inclusion and (iii) scaling are obviously fulfilled from the corresponding properties of the dyadic intervals.
- (i) - (iii) are intuitively clear:
- (i) we need to be able to shift by multiples of 2^j: if say j=0, then we need to shift the function by 2^0=1, or by 2^1=2, etc. and this just shifts the constant values from one interval to another interval, but they will stay piecewise constant on these intervals, you just shift the values to different intervals
- (ii) if you are piecewise constant on intervals of length 2^{j+1}, of course this is also piecewise constant on intervals of length 2^j (because the latter is just a weaker property)
- (iii) if you scale by \frac{1}{2} you are supposed to go from V_j to V_{j+1}. Let’s look at the green function here: if we scale that by \frac{1}{2}, that means we double everything, ie. this value will be mapped to that (…), so you just switched from being constant on intervals of length one to being constant on intervals of length two
Proof (iv)
- (iv) A function that is in the intersection of all V_j has to be constant
- on \left[0,1\right) and on \left[-1,0\right) (for j=0).
- on \left[0,2\right) and on \left[-2,0\right) (for j=1).
- … etc.
- on \left[0,\infty\right) and on \left(-\infty,0\right) (for j\to \infty).
- The only L^2-function (L^2 means the function must be square integrable on \left[0,\infty\right) and \left(-\infty,0\right)!) that fulfills this is the zero function.
Proof (v)
- (v) The completeness follows from the facts that C_c(\mathbb{R}) is dense in L^2(\mathbb{R}) (B.8), that continuous functions with compact support (ie. elements of C_c(\mathbb{R})) are uniformly continuous, and that uniformly continuous functions can be approximated in the supremum norm by functions constant on dyadic intervals.
- phth: thus, functions constant on dyadic intervals are dense in L^2
- Berkels: You need to stack these arguments, ie. first go to compact support, then go to uniform continuity and then you know you just need to make these intervals small enough to have the necessary approximation.
- phth:
- This proves that “\bigcup_{j\in \mathbb{Z}} V_j is dense in L^2” (denseness).
- But, \bigcup_{j\in \mathbb{Z}} V_j is NOT complete in the sense that “all Cauchy sequences in \bigcup_{j\in \mathbb{Z}} V_j converge in \bigcup_{j\in \mathbb{Z}} V_j” (completeness).
- L^2 is complete (Riesz-Fischer Theorem), but this does not necessarily mean that \bigcup_{j\in \mathbb{Z}} V_j is complete (because eg. \mathbb{Q} is dense in \mathbb{R}, but \mathbb{Q} is not complete just because \mathbb{R} is complete!).
- But we can approximate any L^2 function arbitrarily well with a function in \bigcup_{j\in \mathbb{Z}} V_j.
- recall:
- “completeness” = “Cauchy sequences in this set converge to a point in this set”
- ”\mathbb{Q} is dense in \mathbb{R}”
- “L^p is complete” (cf. section “Prerequisites”)
- “supremum norm” (used here to measure the distance between two functions)
- “uniform continuity” def.
Proof (vi)
- (vi) Finally, \phi = \chi_{\left[0,1\right)} is a generator [because]
- (vi.i) The ONS property [of \{T_k \phi : k \in \mathbb{Z}\}] is obviously fulfilled and
- Claim: consider V_0 = \{f \in L^2 : f \vert_{\left[k,k+1\right)}\ \text{is constant for all}\ k \in \mathbb{Z}\}; then \phi = \chi_{ \left[0,1\right) } \in V_0 is a generator, because:
- Berkels:
- So, [for \phi to be a generator] we need that these shifted versions of T_k\phi are a CONS.
- the orthonormal property is obvious because if you shift \left[0,1\right) by a nonzero integer, then it will not intersect with the interval \left[0,1\right) any more. So, for different integer shifts k you just have this function [\chi_{ \left[0,1\right) }] on \left[k,k+1\right) and those [\chi_{ \left[0,1\right) } and \chi_{ \left[k,k+1\right) }] do not intersect.
- ie. the L^2-norm of the product function \chi_{ \left[m,m+1\right) } \cdot \chi_{ \left[n,n+1\right) } is 0 for m \neq n and 1 for m=n, which proves the ONS property of the shifted versions T_k\phi (see ONS def.).
- (vi.ii) the completeness [of \{T_k \phi : k \in \mathbb{Z}\}] is shown based on the fact that (A) the squared L^2-norm of an element of V_0 has to be (B) the sum of the squared values of the element on each of the intervals \left[k,k+1\right).
- Berkels:
- (B) is equivalent to the Fcoeffs in (3.42)
- so, if I am supposed to give you the L^2-norm (rhs of (3.42)) of the green function in the picture above: it is the area under the squared graph and, since the intervals have length 1, it is just the sum of the squared heights. So, this is a series. Based on this, you directly see that the completeness def. (3.42) is fulfilled, where we needed to show that the norm of the function squared (rhs of (3.42)) is the same as the sum of the squared Fcoeffs (lhs of (3.42)). Here, the Fcoeffs (lhs of (3.42)) are exactly the interval heights (because we just multiply with a function that is constant on the interval \left[k,k+1\right), which gives us the value on the interval \left[k,k+1\right)).
- phth: “(A) = (B)” \Leftrightarrow shifted versions of T_k\phi are a CONS (cf. (3.42))
- phth: consider an element f \in V_0, then
- (A) is \lVert f\rVert^2_{L^2}
- (B) is \sum_{k\in\mathbb{Z}}(f, T_k\chi_{\left[0,1\right)})_{L^2}^2, where (f, T_k\chi_{\left[0,1\right)})_{L^2} is the k-th Fcoeff
4.10 Orthogonal Projection P_{V_j}, Approximating f with the Nested Spaces V_j
- conditions:
- Let (V_j)_{j\in\mathbb{Z}} be a MRA.
- Let P_{V_j} denote the orthogonal projection from L^2 to V_j.
- Note that the orthogonal projection exists since L^2 is a Hilbert space and V_j a nonempty, closed subspace of L^2. (Hilbert projection theorem)
- Let f \in L^2.
- Then, we have \lim_{j\to -\infty} P_{V_j} f = f and \lim_{j\to \infty} P_{V_j} f = 0.
- \lim_{j\to \infty} P_{V_j} f = 0 encodes that the V_j get smaller and smaller
- \lim_{j\to -\infty} P_{V_j} f = f means that we can use the nested spaces V_j to approximate f, the smaller j, the better the approximation.
- So, if we have a very small j (j\to -\infty, highly negative), then we get very close to f and if it is very large (j\to\infty), then it is very coarse. So, we can decide how well we want to approximate f by choosing this V_j.
- we can approximate f with the spaces V_j arbitrarily well if j is “negative enough”
- proof:
- the reason we are looking at this is to get familiar with the consequences of all of the properties in (4.8). So far, it is not clear why we need these properties. Here this gets clear.
- j\to\infty: (4.14)
- j\to -\infty: show that \lVert f − P_{V_j} f\rVert_{L^2} = \inf_{u\in V_j} \lVert f − u\rVert_{L^2} for all j \in \mathbb{Z} goes to 0 for j\to -\infty
- this is because, by definition, the orthogonal projection P_{V_j} is just the “closest point in the space V_j to a given point f”
- use completeness (4.8 (v)) and inclusion (4.8 (ii)) properties
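- A small numerical illustration (my own sketch; grid and test signal are ad-hoc, and f is truncated to \left[0,4\right) with the pretense that it vanishes elsewhere) of the approximation limit \lim_{j\to -\infty} P_{V_j} f = f for the piecewise constant MRA of (4.9), where P_{V_j} is block-averaging over dyadic intervals, cf. (4.15 (i)); the other limit j\to\infty is hard to see on a bounded window:

```python
# P_{V_j} f for the piecewise constant MRA = mean of f on each dyadic interval
# of length 2^j. The L2 error ||f - P_{V_j} f|| shrinks as j -> -infinity.
import numpy as np

def project_Vj(samples, j, j_grid):
    """Block-average samples (grid spacing 2**j_grid) over blocks of length 2**j."""
    block = 2 ** (j - j_grid)                  # requires j >= j_grid
    means = samples.reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)

j_grid = -8
x = np.arange(0.0, 4.0, 2.0**j_grid)           # f lives on [0,4) here
f = np.sin(2 * np.pi * x) * np.exp(-x)

for j in [1, 0, -2, -5]:
    err = np.sqrt(np.sum((f - project_Vj(f, j, j_grid)) ** 2) * 2.0**j_grid)
    print(j, err)                              # error decreases with j
```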
4.11 \phi_{j,k}, Scaling Equation, Scaling Function \phi
- conditions:
- Let (V_j)_{j\in\mathbb{Z}} be a MRA with generator \phi.
- again, this is just some object that fulfills all of the properties in (4.8)
- and we have one example (4.9) that fulfills all of the properties in (4.8), so we know that there are such MRAs, thus it makes sense to look at them
- Let \phi_{j,k} (x) := 2^{-\frac{j}{2}}\phi(2^{-j}x - k) for all j, k \in \mathbb{Z} and x \in \mathbb{R}.
- problem: why do we look at \phi_{j,k}? → Because \{\phi_{j,k} : k \in \mathbb{Z}\} is an ONB of V_j → (i)
4.11 (i) \{\phi_{j,k} : k \in \mathbb{Z}\} is an ONB/CONS of V_j
- (i) The scaling property (4.8 (iii)) and the orthonormal basis property (4.8 (vi)) together imply that, for all j \in \mathbb{Z}, \{\phi_{j,k} : k \in \mathbb{Z}\} is an ONB/CONS of V_j (exercise).
- problem: why is this the case?
- so, in (4.8) we have assumed that if we start with \phi then these T_k\phis are a CONS
- but if we look at the new notation then this T_k\phi is nothing but \phi_{0,k}
- j=0 eliminates this scaling 2^{-\frac{j}{2}} and 2^{-j} in \phi_{j,k}(x)
- and we just have the translation (x-k) in \phi_{j,k}(x)
- the \phi_{0,k}s are an ONS of V_0
- but then we have this scaling property (4.8 (iii)), ie. that f is in V_j iff the scaled version D_{\frac{1}{2}}f is in V_{j+1} and this we can use to convert the ONS from V_0 to the other Vs
- the scaling 2^{-\frac{j}{2}} is exactly done in such a way that the normality property does not vanish: if you just scale with D_{\frac{1}{2}} you would change the L^2-norm, but the 2^{-\frac{j}{2}} corrects for this (exercise: when you just compute it you will see how this fits together)
- so, from one ONB of V_0 we can go to ONBs for all of our V_js
- sidenote: if we have such an MRA, then the absolute value of j has no meaning by itself: if we have (V_j)_{j\in\mathbb{Z}}, then we can shift j by any number and it still runs from -\infty to \infty, just that j=0 then means perhaps j=10. The only thing that attaches a specific value to j is that for V_0 we have the generator \phi. And, since we have the generator at 0, it is natural that we just go one level finer and end up with V_{-1}. In principle, we could invert the ordering of the j’s and everything would stay consistent; whether you call “going finer” “increasing j” or “decreasing j” does not matter. So, the construction (4.8) links different scales, but it is completely arbitrary whether you associate fine scales with a large j or with a negative j. You can just replace j with -j everywhere, if you want.
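- A quick numerical sanity check (my own sketch, using the Haar generator from (4.9)) that the rescaled translates \phi_{j,k} are indeed orthonormal for a fixed j:

```python
# Gram matrix of phi_{j,k}(x) = 2^{-j/2} phi(2^{-j} x - k) for the Haar
# generator phi = chi_[0,1); it should be (approximately) the identity.
import numpy as np

phi = lambda t: ((0 <= t) & (t < 1)).astype(float)
phi_jk = lambda x, j, k: 2.0 ** (-j / 2) * phi(2.0 ** (-j) * x - k)

dx = 1e-4
x = np.arange(-2.0, 2.0, dx)
j, ks = -2, range(-3, 4)
G = np.array([[np.sum(phi_jk(x, j, k) * phi_jk(x, j, l)) * dx for l in ks]
              for k in ks])
print(np.round(G, 3))   # approx the 7x7 identity matrix
```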
- problem: in this way we can connect amongst other things V_0 and V_{-1} → (ii)
4.11 (ii) The Generator \phi is not only in V_0, but also in V_{-1}
- (ii) Due to the inclusion property (4.8 (ii)), we have \phi \in V_0 \subset V_{-1}, ie. the generator is not only in V_0, but also in V_{-1}.
4.11 (iii) We can express the Generator \phi in the CONS \{\phi_{-1,k} : k \in \mathbb{Z}\}
- (iii) Since \{\phi_{-1,k} : k \in \mathbb{Z}\} is a CONS of V_{-1}, we can express the generator \phi in this CONS \label{eq:scaling-equation}\phi = \sum_{k\in\mathbb{Z}} h_k \phi_{-1,k} = \sum_{k\in\mathbb{Z}} h_k \sqrt{2}\phi(2 \cdot - k), where the dot “\cdot” is just a placeholder (not a variable!) and h_k = (\phi, \phi_{-1,k} )_{L^2} for k \in \mathbb{Z} are the Fcoeffs, cf. (3.43) (note that this does not mean that the series is converging pointwise a.e.).
- This equation is called scaling equation and the reason why \phi is also called scaling function (cf. (4.8)).
- note: This also means that we can express \phi with scaled versions of itself. This puts some interesting constraints on this \phi.
- phth: this does not imply the pointwise convergence \phi\color{red}{(x)} = \sum_{k\in\mathbb{Z}} h_k\sqrt{2}\phi(2\color{red}{x} - k) for all x, but only convergence of the L^2-norm of the series (rhs) to the L^2-norm of \phi (lhs) (see (3.48))
- watch: lecture 27, beginning of (4.12): long discussion about (3.43) and (3.48), convergence of the series in the L^2-norm vs. pointwise convergence
- counterexample: in the figure below: consider the following sequence of functions:
- (black) 1 on \left[0,1\right], otherwise 0
- (red) 1 on \left[0,\frac{1}{2}\right], otherwise 0
- (blue) 1 on \left[\frac{1}{2},1\right], otherwise 0
- (green) 1 on \left[0,\frac{1}{4}\right], otherwise 0
- (…) 1 on \left[\frac{1}{4},\frac{1}{2}\right], otherwise 0
- (…) 1 on \left[\frac{1}{2},\frac{3}{4}\right], otherwise 0
- (…) 1 on \left[\frac{3}{4},1\right], otherwise 0
- (…) 1 on \left[0,\frac{1}{8}\right], otherwise 0
- etc.
- thus, wherever you are in this sequence, you will always pick up a 1
- thus, at no point does this sequence converge pointwise to 0, but the L^2-norm converges to 0 because the intervals get smaller and smaller
- this shows that the L^2-convergence does not give you pointwise control, not even on a.e. point
- for the most part this is not so important, and once you go discrete you do not have these kinds of problems. But I stress this here so that you do not fall for this equality (\ref{eq:scaling-equation}) showing more than it does. That is the reason why I did not put an (x) in (\ref{eq:scaling-equation}), like \phi\color{red}{(x)} = \sum_{k\in\mathbb{Z}} h_k\sqrt{2}\phi(2\color{red}{x} - k), because it would be wrong! (actually, it was wrong in an older version of the lecture notes) It is very easy to overlook. That is why I am stressing it.
- similarly, convergence of the FS in the L^2-norm does not imply pointwise convergence, no matter how you interpret this
- note: we will see later that for “nice generators” and “nice wavelets” only a finite number of these h_k’s is non-zero, and then all of these [convergence related] problems are gone: if the sum in (\ref{eq:scaling-equation}) is finite, you do not need to consider convergence because you can compute the finite sum and then you are done. It is just when you have a series (“infinite sum”) that you need to be careful! The wavelets that you use in practice will have only a finite number of h_k’s that are not zero.
- problem: with (iii) we see a direct connection to the continuous WT:
4.11 (iv) (f, \phi_{j,k} )_{L^2} = L_{\phi} f (2^j , 2^j k)
- (iv) For f \in L^2 (\mathbb{R}), we have (f, \phi_{j,k} )_{L^2} = \int_{\mathbb{R}}f(x)2^{-\frac{j}{2}} \phi(2^{-j} x - k) dx = \boxed{\int_{\mathbb{R}}f(x) \frac{1}{\sqrt{2^j}} \phi \left(\frac{x - 2^j k}{2^j}\right) dx} = L_{\phi} f (2^j , 2^j k).
- problem: Why did we rewrite (f, \phi_{j,k} )_{L^2} to the boxed formula?
- This should look familiar from (4.1).
- The boxed formula is exactly the continuous WT for certain a and b.
- Thus, (f, \phi_{j,k} )_{L^2} is the evaluation of the continuous WT L_{\phi} f for a = 2^j and b = 2^j k.
- in particular, we can get all of these necessary coefficients (f, \phi_{j,k} )_{L^2} just by evaluating the WT at certain discrete parameters
- problem: we first have to understand what these V_js are in order to conclude more from this, but we can already see that there is a strong link to the WT by this
- problem: to further work with this I will prepare a helping lemma that tells us sth about these h_k
4.12 h_ks orthogonal to themselves when shifted
- condition:
- Let h_k be the coefficients of the scaling equation of an MRA.
- then \sum_{k\in\mathbb{Z}} h_k h_{k+2m} = \delta_{0,m}\ \text{for all}\ m \in \mathbb{Z}
- that means, in a certain sense, the scaling equation coeffs are orthogonal to themselves if you shift them by 2m
- for m=0 that means their squared sum is 1
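- a quick sanity check (my own addition, using the Haar coefficients computed later in (4.15 (ii))): there h_0=h_1=\frac{\sqrt{2}}{2} and h_k=0 otherwise, so for m=0 we get \sum_{k}h_k^2=\frac{1}{2}+\frac{1}{2}=1=\delta_{0,0}, and for m\neq 0 the supports of (h_k)_k and (h_{k+2m})_k do not overlap (shift by at least 2), so \sum_{k}h_k h_{k+2m}=0=\delta_{0,m}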
4.13 Prerequisites: Orthogonal Complement, Direct Sum
- outer/direct sum:
- orthogonal complement:
- “the set W^{\perp} of all vectors in V that are orthogonal to every vector in W”
- phth: thus, W^{\perp} always includes the vector 0 (important for the proof of (4.14))
- For a subspace U \subset X, the subspace of all vectors U^{\perp} = \{ y \in X\ \vert\ x \perp y\ \text{for all}\ x \in U \} is the orthogonal complement of U. (Bredies)
- properties: Wikipedia
4.13 Approximation Space, Detail Space, Wavelet Space
- conditions:
- Let (V_j)_{j\in\mathbb{Z}} be a MRA.
- Let W_j be such that V_{j−1} = V_j \oplus W_j and V_j \perp W_j, ie. W_j is the orthogonal complement of V_j in V_{j-1}.
- we know that V_j is contained in V_{j-1} and everything that is missing in V_j is concentrated in W_j. In a way that we only pick up the orthogonal complements that are missing. That is a nice separation.
- Then, V_j is called approximation space to the scale j
- because we know that the projection of f to V_j approximates f in a sense that if j goes to -\infty we get f back
- and W_j is called detail space or wavelet space to the scale j.
- Because you can think of it, when going from V_j to the larger V_{j-1}, then we are adding some stuff and the stuff that we add are just “details” [from the “detail space”].
- If the approximation gets better [as j gets smaller], apparently, we have to add things [functions] that were missing, ie. that were not representable on these coarser spaces [V_j with larger j], and that motivates the name “detail space”.
- problem: so, all of the W_js are enough to represent all L^2 functions. Now, we can iterate this:
4.14 Splitting V_j into Details, Splitting a L^2-Function into all its Details in all Scales j
4.14 (i) Decomposition of V_j into Details
- The definition of W_j implies by iteration that \label{eq:Vj-decomposition}V_j \stackrel{(4.13)}{=} V_{j+1} \oplus W_{j+1} \stackrel{(4.13)}{=} V_{j+2} \oplus W_{j+2} \oplus W_{j+1} = \ldots \stackrel{(4.13)}{=} V_{j+M} \oplus \color{blue}{\bigoplus_{m=j+1}^{j+M} W_m} for all M \geq 1.
- Berkels: “outer sum” (\color{blue}{\bigoplus_{m=j+1}^{j+M} W_m}) also means that the intersection of all of these W_js is \{0\}.
- explanation A: [because by def. of W_j, W_j contains all vectors that are orthogonal to the vectors in V_j, thus, since the vector 0 is orthogonal to all vectors in all V_js, all W_j must include the vector 0]
- explanation B: from Lemma below: V=U\oplus W iff the intersection of U and W contains only the vector 0 (see the “iff” in the Lemma)! In particular, in our case, the intersection of V_{j+M} and all W_ms for j+1\leq m\leq j+M must be the set containing only the vector 0. But this also means that the intersection of all W_ms for j+1\leq m\leq j+M must be the set containing at least the vector 0 (think about a Venn diagram for 3 intersecting sets: intersection area gets larger, when you take away one intersecting set). Later, we will see V_j = \bigoplus_{m\geq j+1} W_m and with this we can say that the intersection of all W_ms for m\geq j+1 must be the set containing only the vector 0. This must hold for all j. So, it is safe to say “the intersection of all of these W_js is \{0\}”, as Berkels said.

[image: the Lemma on direct sums referenced above; source]
- This immediately implies V_j \supset \bigoplus_{m\geq j+1} W_m.
- because from (\ref{eq:Vj-decomposition}) we see that V_j contains this outer sum (in \color{blue}{blue}) for all M, and in particular, also the infinite summation M\to\infty of all of these outer sums (in \color{blue}{blue})
- problem: The question is: Can there be something missing in \bigoplus_{m\geq j+1} W_m or does it have to be all of entire V_j. The answer is, it is all, but we have to show this:
- Assume there is v \in V_j \setminus \bigoplus_{m\geq j+1} W_m. Since \label{eq:VjplusM-decomposition}V_j \setminus \left(\bigoplus_{m\geq j+1} W_m\right) \stackrel{(\ref{eq:Vj-decomposition})}{=} \left(V_{j+M} \oplus \left(\color{purple}{\bigoplus_{m=j+1}^{j+M} W_m}\right)\right) \setminus \left(\color{purple}{\bigoplus_{m\geq j+1} W_m}\right) \stackrel{(\text{explanation 1})}{=} V_{j+M} \setminus \left(\bigoplus_{m \geq j+1+M} W_m\right), we have, v \in V_{j+M} for all M \geq 1.
- explanation 1: intuitively: the last equality holds because sets (\color{purple}{purple} in (\ref{eq:VjplusM-decomposition})) are added to V_{j+M} via “\oplus” and subtracted via “\setminus” and thereby “cancel” each other (\color{red}{red} in (\ref{eq:VjplusM-decomposition-explanation})) \label{eq:VjplusM-decomposition-explanation}V_{j+M}\oplus\{\color{red}{(j+1)+\ldots+(j+M)}\}\setminus\{\color{red}{(j+1)+\ldots+(j+M)}+(j+M+1)+\ldots+(\infty)\}=V_{j+M}\setminus\{(j+M+1)+\ldots+(\infty)\}
- due to the inclusion property, this implies v \in V_M for all M \in \mathbb{Z}
- due to the trivial intersection property, this implies v = 0. This is a contradiction to v\not\in \bigoplus_{m\geq j+1} W_m \color{blue}{\supset \{0\}}.
- \color{blue}{blue} part: all of these W_ms contain 0 (recall: orthogonal complement properties: intersection of V_j and W_j is \{0\})
- (i) Thus, we have shown V_j = \bigoplus_{m\geq j+1} W_m.
- thus, “the V_j can be split into all details contained for these larger V_j spaces”
4.14 (ii) Decomposition of L^2 into Details
- (ii) Combined with the completeness, we have \color{red}{L^2(\mathbb{R})} \stackrel{(\text{completeness prop. of}\ V_j)}{=} \overline{\bigcup_{j\in\mathbb{Z}} V_j} \stackrel{(\text{i})}{=} \overline{\bigcup_{j\in\mathbb{Z}}\bigoplus_{m\geq j+1} W_m} \color{red}{= \overline{\bigoplus_{m\in\mathbb{Z}} W_m}}.
- thus, “we can split any L^2 function in sums of elements of this W_m”
- in other words: this is like “splitting the function in all its details in all the scales”
4.14 (iii) Representing f via V_j and W_j
- (iii) Since V_j \oplus W_j is an orthogonal decomposition of V_{j−1}, we have P_{V_{j-1}} \stackrel{(\text{orth. decomposition})}{=} P_{V_j} + P_{W_j} \Rightarrow P_{W_j} = P_{V_{j-1}} - P_{V_j}. Due to \label{eq:L2-decomposition}L^2(\mathbb{R}) \stackrel{(\text{ii})}{=} \overline{\color{blue}{\bigoplus_{m\in\mathbb{Z}} W_m}} = \overline{\bigoplus_{m\geq j+1} W_m \oplus \bigoplus_{m\leq j} W_m} \stackrel{(\text{i})}{=} \overline{\color{red}{V_j} \oplus \color{purple}{\bigoplus_{m\leq j} W_m}}, f \in L^2(\mathbb{R}) can be expressed by \label{eq:f-projection-to-V-and-W}\boxed{f \stackrel{(\ref{eq:L2-decomposition})}{=} \color{blue}{\sum_{m\in\mathbb{Z}} P_{W_m} f} \stackrel{(\ref{eq:L2-decomposition})}{=} \color{red}{P_{V_j} f} + \color{purple}{\sum_{m\leq j} P_{W_m} f}.}
- Bredies: We can now represent every f \in L^2 in different ways with the help of the spaces V_j and W_j
- explanation 2:
- (\color{blue}{blue} part of (\ref{eq:f-projection-to-V-and-W})) is the orthogonal projection of f to the orthogonal decomposition of L^2 in the form of \overline{\color{blue}{\bigoplus_{m\in\mathbb{Z}} W_m}}
- (\color{red}{red}+\color{purple}{purple} part of (\ref{eq:f-projection-to-V-and-W})) is the orthogonal projection of f to the orthogonal decomposition of L^2 in the form of \overline{\color{red}{V_j} \oplus \color{purple}{\bigoplus_{m\leq j} W_m}}
- related: “Decomposition of a vector space into direct sums is not unique.”, Wikipedia
- Bredies (en): These equations justify the name “multiscale analysis”: the spaces V_j allow a systematic approximation of functions on different scales.
- Bredies: Figure 4.12: In the sense of the previous example [ie. the example “piecewise constant multiscale analysis”], one can say, somewhat imprecisely but suggestively: P_{V_j} u is the representation of u “on the scale V_j” and contains details of u “up to size 2^j”, see Figure 4.12.
- phth: by (4.15 (i)), the height of P_{V_j}u on each dyadic interval is “the mean of u on the dyadic interval \left[k2^j , (k + 1)2^j \right)” (in the case of the piecewise constant multiscale analysis) resp. P_{V_j}u=\sum_{k\in\mathbb{Z}}(\phi_{j,k},u)_{L^2}\phi_{j,k} (in the general case)
Figure 4.12:
4.14 (iv) Proof of 4.10 (continued)
- (iv) (iii) implies that P_{V_j} f \to 0 for j \to \infty holds (because explanation 3), which we claimed earlier (proof of (4.10)).
- explanation 3: “sandwich”: the last equality in (\ref{eq:f-projection-to-V-and-W}) must hold for all j. For j\to\infty the (\color{purple}{purple}) sum tends to the (\color{blue}{blue}) sum, so the equality can only persist in the limit if P_{V_j}f vanishes.
- problem: Let’s look at our only MRA example again and see what we can get about these projections there:
4.15 Example: Piecewise Constant MRA
4.15 (i) Compute Projection of f to V_j
- Let (V_j)_{j\in\mathbb{Z}} be the piecewise constant MRA from (4.9) and let f \in L^2 (\mathbb{R}).
- Since (A) the elements of V_j are constant on the intervals \left[k2^j , (k + 1)2^j \right) and (B) \lVert f − P_{V_j} f\rVert_{L^2} = \inf_{u\in V_j} \lVert f − u\rVert_{L^2}, we get for x \in \left[k2^j , (k + 1)2^j \right) that (intuitively: boils down to approximating a function on an interval with a single value) (P_{V_j} f )(x) \stackrel{(\text{p.w.c. MRA})}{=} \color{blue}{2^{−j}}\color{red}{\int^{(k+1)2^j}_{k2^j} f(y)dy} = 2^{−j}\int_{\mathbb{R}} \chi_{\left[k2^j ,(k+1)2^j \right)}(y)f(y)dy \label{eq:PVj1}(\ldots) = 2^{−\frac{j}{2}}\int_{\mathbb{R}}2^{−\frac{j}{2}}\chi_{\left[0,1\right)}(2^{−j}y − k)f(y)dy = \color{purple}{2^{−\frac{j}{2}}}\int_{\mathbb{R}}\phi_{j,k}(y)f(y)dy = (f, \phi_{j,k} )_{L^2} \color{purple}{\phi_{j,k}(x)}.
- (\color{purple}{purple}) is the value of \phi_{j,k}(x) on the intervals \left[k2^j , (k + 1)2^j \right) (which is the same value, 2^{-\frac{j}{2}}, on all intervals for a fixed j, ie. the value is independent of k)
- (\color{blue}{blue} + \color{red}{red}) is the mean of f on the dyadic interval \left[k2^j , (k + 1)2^j \right)
- recall: f_{\text{avg}}=\color{blue}{\frac{1}{b-a}}\color{red}{\int_a^b f(x)dx}
- phth: the area under \phi_{j,k} is always 2^{\frac{j}{2}}
- direct computation: \int_{\mathbb{R}}\phi_{j,k}(x)dx = 2^{-\frac{j}{2}}\cdot 2^j = 2^{\frac{j}{2}}, since \phi_{j,k} has the constant value 2^{-\frac{j}{2}} on an interval of length 2^j
- warning: this does not follow from the ONB/CONS property (\phi_{j,k},\phi_{j,l})_{L^2}=\delta_{k,l}: ONBs consider the L^2-norm, and the L^2-norm is the area under the squared function, not the area under the function itself!
- Why is (P_{V_j} f )(x) \color{red}{\stackrel{(\text{?})}{=}} 2^{−j}\int^{(k+1)2^j}_{k2^j} f(y)dy?
- because, by definition of P_{V_j}, we have \lVert f − P_{V_j} f\rVert_{L^2} = \inf_{u\in V_j} \lVert f − u\rVert_{L^2} (see (B)), so P_{V_j} f has to be “the closest (in the L^2-norm) function u\in V_j to f” on \left[k2^j , (k + 1)2^j \right), ie. P_{V_j} f is the function that minimizes the L^2-norm difference between u and f on \left[k2^j , (k + 1)2^j \right). Since f_{\text{avg}} minimizes \lVert f − u\rVert_{L^2} over the constants u (the mean minimizes the mean squared error), we get (P_{V_j} f )(x) = f_{\text{avg}} = 2^{−j}\int^{(k+1)2^j}_{k2^j} f(y)dy.
- important: This holds only for the p.w.c. MRA (see (A)). For a different MRA, \inf_{u\in V_j} \lVert f − u\rVert_{L^2} is not necessarily minimized by the mean f_{\text{avg}}, but by some other u\in V_j.
- (\color{blue}{blue}) is a normalization factor
- Warum gerade 2^{-j}?
- because f_{\text{avg}}=\color{blue}{\frac{1}{b-a}}\color{red}{\int_a^b f(x)dx}, and in our case we have \color{blue}{\frac{1}{b-a}}=2^{-j}
- thus, we can compute this projection P_{V_j}f in terms of the generator, ie. in terms of the CONS of V_j given by \phi_{j,k}:
- (i) Noting that for any x \in \mathbb{R}, we have \phi_{j,k} (x) = 0 for all k \in \mathbb{Z} but one, we get \label{eq:PVj2}\boxed{P_{V_j} f = \sum_{k\in \mathbb{Z}} (f, \phi_{j,k} )_{L^2} \phi_{j,k}}.
- P_{V_j} f is the “Projection of f to our approximation space V_j”
- note: (\ref{eq:PVj1}) is for a single x (in a fixed dyadic interval) and (\ref{eq:PVj2}) is for all x. So, why is (\ref{eq:PVj2}) valid?
- When going from (\ref{eq:PVj1}) to (\ref{eq:PVj2}), we have used that for any x \in \mathbb{R}, we have \phi_{j,k} (x) = 0 for all k \in \mathbb{Z} but one
- for simplicity, consider a fixed j' first
- Because [for a fixed j'] these dyadic intervals do not overlap and this x' is only in just one of these dyadic intervals [say x' \in \left[k'2^j,(k'+1)2^j\right)]. Thus, [for any x',] the series (the sum over k\in \mathbb{Z}) is just a sum with one summand [which is non-zero. This summand is the one with the factor \phi_{j,k'}(x') (because x' \in \left[k'2^j,(k'+1)2^j\right) as assumed above, thus, \phi_{j,k'}(x') = 2^{-\frac{j}{2}}\chi_{ \left[k'2^j,(k'+1)2^j\right) }(x') = 2^{-\frac{j}{2}} \neq 0 will always hold).]
- from next lec:
- Berkels: (\ref{eq:PVj2}) is a finite sum with only one non-zero coeff. [phth: strictly, this must mean the summands of the series: for a fixed x only one summand is non-zero, because only one \phi_{j,k}(x) \neq 0; the coefficients (f, \phi_{j,k})_{L^2} themselves can be 0, namely exactly when f has mean 0 on \left[k2^j,(k+1)2^j\right)]
This non-zero summand is the one with the coeff (f, \phi_{j,k'})_{L^2} (ie. the scalar product of f with the scaled \chi_{ \left[k'2^j,(k'+1)2^j\right) }).
The summands for all other m \neq k' vanish because x' is not in \left[m2^j,(m+1)2^j\right), so \phi_{j,m}(x') = 0.
- of course, this also follows from the ONB property of these \phi_{j,k}, but here we actually computed this how it naturally arises from how we chose the generator
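- A hedged numerical check (my own sketch; grid and test signal are ad-hoc) that the boxed CONS formula (\ref{eq:PVj2}) really reproduces the dyadic block means derived above:

```python
# P_{V_j} f = sum_k (f, phi_{j,k}) phi_{j,k} for the p.w.c. MRA should equal
# the mean of f on each dyadic interval [k 2^j, (k+1) 2^j).
import numpy as np

phi = lambda t: ((0 <= t) & (t < 1)).astype(float)
phi_jk = lambda x, j, k: 2.0 ** (-j / 2) * phi(2.0 ** (-j) * x - k)

dx = 2.0**-10
x = np.arange(0.0, 2.0, dx)                   # toy setup: f supported on [0,2)
f = np.sin(3 * x) + 0.5 * x
j = -3                                        # dyadic intervals of length 1/8

Pf = np.zeros_like(f)
for k in range(int(2 * 2.0**-j)):             # the k with [k 2^j,(k+1)2^j) in [0,2)
    coeff = np.sum(f * phi_jk(x, j, k)) * dx  # (f, phi_{j,k})_{L^2}
    Pf += coeff * phi_jk(x, j, k)

block = int(2.0**j / dx)                      # direct block means
means = np.repeat(f.reshape(-1, block).mean(axis=1), block)
print(np.max(np.abs(Pf - means)))             # approx 0
```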
- problem: now, lets compute the h_k for our specific p.w. constant MRA:
4.15 (ii) Compute h_k
- Moreover, we have for this MRA that h_k = (\phi, \phi_{−1,k} )_{L^2} = \int_0^1 2^{\frac{1}{2}} \phi(2x − k) dx = \int_0^1 \sqrt{2}\chi_{\left[\frac{k}{2},\frac{k+1}{2}\right)} (x) dx = \begin{cases}
\frac{\sqrt{2}}{2} & k \in \{0, 1\}\\
0 &\text{else}
\end{cases} = 2^{−\frac{1}{2}} (\delta_{k,0} + \delta_{k,1} ).
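- The same numbers drop out of a direct numerical integration (my own sketch) of h_k = (\phi, \phi_{-1,k})_{L^2} for the Haar generator:

```python
# h_k = int phi(x) * sqrt(2) * phi(2x - k) dx with phi = chi_[0,1);
# expected: sqrt(2)/2 ~ 0.7071 for k in {0, 1}, and 0 otherwise.
import numpy as np

phi = lambda t: ((0 <= t) & (t < 1)).astype(float)
dx = 1e-5
x = np.arange(-2.0, 3.0, dx)
for k in range(-2, 4):
    h_k = np.sum(phi(x) * np.sqrt(2) * phi(2 * x - k)) * dx
    print(k, round(h_k, 4))
```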
4.15 (iii) Compute Scaling Equation
- Thus, in this case, the scaling equation simplifies to \phi = \sum_{k\in\mathbb{Z}} h_k \sqrt{2}\phi(2 \cdot -k) = \phi(2\cdot) + \phi(2 \cdot -1).
- ie. \phi can be represented by shifted, scaled copies of itself (which is exactly what the scaling equation said, but here we now specifically see which two translations and scalings we need)
- thus, the series becomes a finite sum!
4.15 (iv) Compute \phi_{j,k}
- Composing this with x \mapsto 2^{-j} x - k, this shows \phi(2^{-j} \cdot -k) = \phi(2^{-(j-1)} \cdot -2k) + \phi(2^{-(j-1)} \cdot -(2k + 1)). Multiplying both sides with 2^{-\frac{j}{2}} shows \label{eq:phi-jk}\phi_{j,k} = \frac{1}{\sqrt{2}} (\phi_{j-1,2k} + \phi_{j-1,2k+1} ).
- ie. each of these \phi_{j,k} we can represent as linear combination of two of the \phi_{j-1,\ldots}, so from one finer level (which we can see in Abbildung 4.12)
- problem: we have already checked P_{V_j} above (the projection to approximation spaces), now, we check P_{W_j} (the projection to detail spaces)
4.15 (v) Compute Projection of f to W_j (Part 1)
- This allows us to compute the projection to W_{j+1}. We get P_{W_{j+1}} f \stackrel{(\text{EA})}{=} P_{V_j} f − P_{V_{j+1}} f \stackrel{(\text{EB})}{=} \sum_{k\in\mathbb{Z}} (f, \phi_{j,k} )_{L^2} \phi_{j,k} − \sum_{k\in\mathbb{Z}} (f, \phi_{j+1,k} )_{L^2} \phi_{j+1,k}
- EA: for this, let’s recall our def. of these W_j’s: V_{j-1} = V_j \oplus W_j (4.13) implies that P_{V_{j-1}}=P_{V_j}+P_{W_j}, and if I solve for the projection to W_j then I have to subtract P_{V_j} on both sides; with j replaced by j+1 we get this result P_{V_j}f - P_{V_{j+1}}f here
- EB: we computed both P_{V_j} and P_{V_{j+1}} last time (see beginning of (4.15): P_{V_j}f = \ldots)
- Splitting the first sum in even and odd indices and using (\ref{eq:phi-jk}) for the second sum, leads to
4.15 (vi) Compute Projection of f to W_j (Part 2), Wavelet Corresponding to the Generator
- what we introduce now is actually the wavelet that is corresponding to the generator.
- recall:
- if you think about (4.1) where we had the continuous WT, we started with this wavelet \psi that we used to multiply our function f with for the WT and integrate
- but here we started with a generator that is called \phi (not \psi) and is not called “wavelet” but “generator”
- and now comes the connection back to the wavelet that we had before. This is what this \psi will be.
- Introducing \label{eq:psijk}\boxed{\psi(x) := \phi(2x) - \phi(2x - 1)}\ \text{and}\ \boxed{\psi_{j,k} (x) := 2^{-\frac{j}{2}} \psi(2^{-j}x - k)} we get \psi_{j+1,k} = \frac{1}{\sqrt{2}} (\phi_{j,2k} - \phi_{j,2k+1} ) and the representation of P_{W_{j+1}} f simplifies to \label{eq:PWf}\boxed{P_{W_{j+1}} f = \sum_{k\in\mathbb{Z}} (f, \psi_{j+1,k} )_{L^2} \psi_{j+1,k}}.
- (\ref{eq:psijk}) defines scaled and translated versions of this \psi, exactly like we did with \phi
- (\ref{eq:psijk}) \psi_{j,k} has the same scaling as \phi_{j,k}
- This \psi we already know, you just have to recognize it:
- The function \psi is the Haar wavelet we have seen before in (4.7 (iii)).
- Because \phi(2x) is \chi_{\left[0,\frac{1}{2}\right)} and, accordingly, \phi(2x-1) is \chi_{\left[\frac{1}{2},1\right)}. This means the function \psi is 1 on the interval \left[0,\frac{1}{2}\right) and -1 on the interval \left[\frac{1}{2},1\right), which is exactly the Haar wavelet!
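- The two refinement relations \phi_{j,k} = \frac{1}{\sqrt{2}}(\phi_{j-1,2k} + \phi_{j-1,2k+1}) and \psi_{j,k} = \frac{1}{\sqrt{2}}(\phi_{j-1,2k} - \phi_{j-1,2k+1}) give the classic one-step Haar analysis on coefficient level. A minimal sketch (my own; the sample values are arbitrary), showing that the step is an orthogonal change of basis and hence perfectly invertible:

```python
# One analysis step: fine coefficients (f, phi_{j-1,k}) -> approximation
# coefficients (f, phi_{j,k}) in V_j and detail coefficients (f, psi_{j,k}) in W_j.
import numpy as np

c_fine = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0])
c_coarse = (c_fine[0::2] + c_fine[1::2]) / np.sqrt(2)   # (f, phi_{j,k})
d_coarse = (c_fine[0::2] - c_fine[1::2]) / np.sqrt(2)   # (f, psi_{j,k})

# synthesis: invert the orthogonal 2x2 change of basis pairwise
rec = np.empty_like(c_fine)
rec[0::2] = (c_coarse + d_coarse) / np.sqrt(2)
rec[1::2] = (c_coarse - d_coarse) / np.sqrt(2)
print(np.allclose(rec, c_fine))   # True: perfect reconstruction
```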
4.15 (vii) \{\psi_{j,k} : k \in \mathbb{Z} \} is an ONB of W_j
- Moreover, we have \label{eq:orthonormal}(\psi_{j,k} , \psi_{j,m})_{L^2} = \frac{1}{2}((\phi_{j−1,2k} − \phi_{j−1,2k+1}) , (\phi_{j−1,2m} − \phi_{j−1,2m+1}))_{ L^2 } = \frac{1}{2} (\delta_{k,m} + 0 + 0 + \delta_{k,m} ) = \delta_{k,m} .
- this means that the \psi_{j,k} that we get are also an ONS
- Combined with the representation of P_{W_{j+1}} we derived above (\ref{eq:PWf}) and that P_{W_{j+1}}g=g (for g\in W_{j+1} for all j) is the identity on W_{j+1} , this means that \{\psi_{j,k} : k \in \mathbb{Z} \} is an orthonormal basis of W_j .
- (i) “orthonormality of ONS” follows from (\ref{eq:orthonormal})
- (ii) “each element f of W_{j+1} can be represented by P_{W_{j+1}} as a projection in this space” follows from the P_{W_{j+1}}g=g argument
- (iii) “With P_{W_{j+1}} we can represent any element f \in W_{j+1} as linear combination of the \psi_{j+1,k}s” follows from (\ref{eq:PWf}) and (ii) (“completeness of ONS”)
- phth:
- (ii), (iii) must hold for all j:
- eg. with the index shift j \to j-1, (iii) becomes:
- “With P_{W_{j}} we can represent any element f \in W_{j} as linear combination of the \psi_{j,k}s”
- he also confirms this later: “the P_{W_{j+1}}g=g argument holds for all j, it is not only for j+1, but I can insert j-1 for j. It is somewhat confusing because of all of these j and j+1 and we are jumping back and forth between these, but j is arbitrary.”
- phth: what do we need P_{W_{j+1}}g=g for?
- P_{W_{j+1}}g=g holds for all g\in W_{j+1} and shows that g is not changed by the operation P_{W_{j+1}}. Hence (\ref{eq:PWf}) must be able to represent (at least) every g\in W_{j+1} as a linear combination. \Rightarrow completeness of the ONS \{\psi_{j,k}\}.
- The existence of a wavelet that generates an orthonormal basis for W_j not only holds in the special case here, but more generally.
- so this general construction (start with the generator \phi, then use the scaling equation and these combinations to construct \psi, and thereby get CONSes for all our W_js) is not linked just to the Haar wavelet; it is a very general connection.
- phth: for the Haar wavelet, the area under |\psi_{j,k}| is always 2^{\frac{j}{2}}
- note: this does not follow from orthonormality, as one might first think: \{\psi_{j,k} : k \in \mathbb{Z}\} being an ONB/CONS of W_j, ie. (\psi_{j,k},\psi_{j,l})_{L^2}=\delta_{k,l} for a given j\in\mathbb{Z}, only fixes the L^2-norm of \psi_{j,k} to 1 for all j,k\in\mathbb{Z}, and the L^2-norm is the area under the squared function, not the area under the function itself!
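- worked out for the Haar case: |\psi_{j,k}| = 2^{-\frac{j}{2}}\chi_{I_{j,k}} for an interval I_{j,k} of length 2^j, so \int_{\mathbb{R}}|\psi_{j,k}(x)|\,dx = 2^{-\frac{j}{2}}\cdot 2^j = 2^{\frac{j}{2}}, while \int_{\mathbb{R}}|\psi_{j,k}(x)|^2\,dx = 2^{-j}\cdot 2^j = 1 (and the signed area \int_{\mathbb{R}}\psi_{j,k}(x)\,dx is even 0)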

source: Bredies
- Bredies: The example above shows that for the piecewise constant multiscale analysis there is indeed a wavelet (the Haar wavelet) that yields an orthonormal basis of the wavelet spaces W_j. A similar construction also works in general, as the following theorem shows:
4.16 \psi is a Wavelet, \{\psi_{j,k}\} is an ONB/CONS of W_j and L^2
- conditions:
- (c1) let (V_j )_{j\in\mathbb{Z}} be a MRA with generator \phi
- (c2) let h_k be the coefficients of the scaling equation.
- (c3) let \psi \in V_{-1} be defined by \psi := \sum_{k\in\mathbb{Z}} (-1)^k h_{1-k} \phi_{-1,k}.
- this \psi may look surprising, but if we insert the p.w.c. MRA, only h_0 and h_1 are nonzero, and both are just \sqrt{2}/2 (see (4.15 (ii))). Then we see that we get exactly the combination \chi_{\left[0,\frac{1}{2}\right)} (which comes from the k=0 term) minus \chi_{\left[\frac{1}{2},1\right)} (which comes from the k=1 term; the minus comes from the (-1)^k). So, for the one case that we computed, this \psi is exactly the \psi that we had above, but here it is stated in general. → (4.17 (i))
- Then,
- (i) \{\psi_{j,k} : k \in \mathbb{Z}\} is an orthonormal basis of W_j.
- which we showed for our specific generator \phi, but it holds for all generators
- (ii) \{\psi_{j,k} : j, k \in \mathbb{Z}\} is an orthonormal basis of L^2 (\mathbb{R}).
- We already saw last time that L^2 is the closure of the [direct] sum of all the W_js (4.14 (ii)). Thus, if we have (i), then (ii) follows from what we already know.
- phth: by (i) we can represent every f\in W_j with \{\psi_{j,k} : k \in \mathbb{Z}\}. Since by (4.14 (ii)) all the W_j together form L^2, \{\psi_{j,k} : k \in \mathbb{Z}\} over all j must be able to represent every f\in L^2.
- (iii) \psi is a wavelet with c_\psi = \ln{2}.
- again, with the same scaling as above
- which is now the promised connection in its generality. For a function to be a wavelet, we needed the corresponding constant c_\psi to be bigger than 0 and smaller than \infty (this was for the continuous WT). In this case, we can even tell exactly what this constant is for \psi, and this constant is always the same if \psi is constructed from an MRA and its generator. So, in essence, this means that “for any MRA we get a corresponding wavelet”.
- “how to construct a wavelet given any MRA”, works as follows:
- (c1) if we have a MRA (V_j)_j it comes always with a generator \phi (that is part of the def. of a MRA)
- (c3) then we can construct a function \psi from scale shifted versions of the generator, ie. from \phi_{-1,k}, by reusing the coeffs from the scaling equation h_{1-k}
- (i) if we do so, then this \psi gives us CONSes of all detail spaces W_j for all the j (ie. if we fix the scale j and use all the shifts k)
- (ii) if we also use all the scales j and all the shifts k, then we even get a CONS of L^2. So, this is like, instead of expressing things in terms of the V_j, we are expressing in terms of the W_js (the detail spaces)
- (iii) furthermore, the \psi created like this is always a wavelet, and we can specify the admissibility constant c_\psi, which is \ln{2} in this case; that means “given an MRA we get a wavelet” (this connects to the 1st part of this chapter, where I just assumed I was given a wavelet, and now we see how we can construct one from an MRA)
- proof:
- we will use techniques similar to those from the specific case above, where we did the computation for a very specific MRA (the p.w.c. MRA).
- Let's see if there are any unforeseen things happening on the way.
4.17 Example: Wavelet Pairs \phi,\psi
4.17 (i) Piecewise Constant MRA, Haar Wavelet
- Let (V_j )_{j\in\mathbb{Z}} be the piecewise constant MRA from (4.9).
- Recalling h_k = 2^{-\frac{1}{2}} (\delta_{k,0} +\delta_{k,1} ), we get \psi = \sum_{k\in\mathbb{Z}} (-1)^k h_{1-k} \phi_{-1,k} = (\ldots) = -\chi_{\left[\frac{1}{2},1\right)}+\chi_{\left[0,\frac{1}{2}\right)}, which is again the Haar wavelet
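- spelled out (a hedged fill-in of the omitted computation): only the k=0 and k=1 terms survive, so \psi = h_1 \phi_{-1,0} - h_0 \phi_{-1,1} = \frac{\sqrt{2}}{2}\cdot\sqrt{2}\,\chi_{\left[0,\frac{1}{2}\right)} - \frac{\sqrt{2}}{2}\cdot\sqrt{2}\,\chi_{\left[\frac{1}{2},1\right)} = \chi_{\left[0,\frac{1}{2}\right)} - \chi_{\left[\frac{1}{2},1\right)}, using \phi_{-1,k} = \sqrt{2}\,\phi(2\,\cdot - k)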
4.17 (ii) Daubechies Wavelets
- Constructing wavelets that have both compact support and some smoothness is highly non-trivial.
- The discovery of compactly supported, regular and orthogonal wavelets by Ingrid Daubechies in 1988 is one of the reasons why wavelets became very popular in the 1990s.
- The Daubechies wavelets are a family of orthogonal wavelets that are characterized by a maximal number k of vanishing moments, ie. \int_{\mathbb{R}}x^l \psi(x) dx = 0\ \text{for}\ l \in \{0, \ldots , k - 1\}, for a given support.
- Perhaps the most famous wavelet pair \phi, \psi is the Daubechies wavelet with two vanishing moments, denoted by db2.
- Interestingly, the Daubechies wavelet with one vanishing moment, db1, is equivalent to the Haar wavelet.
- The values of the coefficients h_k are available as tabulated numerical values. For db2, the values are also available analytically (see notes)
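- a hedged numerical illustration (assuming the PyWavelets package `pywt`; filter ordering conventions vary between references): tabulate the db2 wavelet and check its two vanishing moments \int_{\mathbb{R}} x^l \psi(x) dx \approx 0 for l = 0, 1:

```python
import numpy as np
import pywt

w = pywt.Wavelet('db2')
print(w.dec_lo)                           # the four nonzero filter taps (tabulated values)

phi, psi, x = w.wavefun(level=12)         # sampled scaling function and wavelet
dx = x[1] - x[0]
for l in range(2):                        # db2 has two vanishing moments
    print(l, np.sum(x ** l * psi) * dx)   # ~0 for l = 0 and l = 1
```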
4.17 (iii) Symlets, Daubechies’ Least-Asymmetric Wavelets
- A modification of the Daubechies wavelets are the symlets, which are sometimes also called Daubechies’ least-asymmetric wavelets.
- As the name suggests, these wavelets are more symmetric than the Daubechies wavelets and also have vanishing moments.
- For instance, the symlet with four vanishing moments is denoted by sym4.
- Note that db2 and sym2 are equivalent.
4.18 Discrete Wavelet Transform (DWT)
- (i) Due to (4.16), we know that
- \psi constructed from an MRA is a wavelet and
- \{\psi_{j,k} : j, k \in \mathbb{Z}\} is an orthonormal basis of L^2 (\mathbb{R}).
- In particular, we have f = \sum_{j,k\in\mathbb{Z}} (f, \psi_{j,k} )_{L^2} \psi_{j,k}.
- (ii) Recalling (f, \psi_{j,k} )_{L^2} = L_\psi f (2^j , 2^j k), (i) shows that it is sufficient to know L_\psi f on the discrete set \{(2^j , 2^j k) : j, k \in \mathbb{Z}\} \subset \left[0, \infty\right) \times \mathbb{R} to encode f.
- (iii) In the sense of (ii), the coefficients (f, \psi_{j,k} )_{L^2} are the discrete wavelet transform of f.
- (iv) The key to a fast wavelet transform is to use that those coefficients can be computed recursively, which is a consequence of the scaling equation and the definition of \psi:
- basically the same as (4.19); (4.19) is just the short notation of (4.18)
- Derivation of the FWT Recursion Formulas (4.19):
- the 1st part is a generalization of the scaling equation: it says that the scaling equation can be used on every level j
- recall: \phi = \sum_{k\in\mathbb{Z}} h_k \phi_{-1,k}
- \phi is constructed from the \phi_{\color{red}{-1},k}, so \phi_\color{red}{j} can be constructed from the \phi_{\color{red}{j}-1,k}
- we have added j in front of the -1 index in \phi = \sum_{k\in\mathbb{Z}} h_k \phi_{\color{red}{-1},k}
- and it is shifted by k (see \phi_{j,\color{red}{k}})
- recall: \psi := \sum_{k\in\mathbb{Z}} (-1)^k h_{1-k} \phi_{-1,k}
- \psi is constructed from the \phi_{\color{red}{-1},k}, so \psi_\color{red}{j} can be constructed from the \phi_{\color{red}{j}-1,k}
- the 2nd part follows by inserting the expressions for \phi_{j,k} and \psi_{j,k} from the 1st part into the scalar products (f,\phi_{j,k})_{L^2} and (f,\psi_{j,k})_{L^2} (a hedged write-up of both parts follows after this list)
- this now gives us a way to go from j to j+1
- given the cs for one j we can get the cs and ds for j increased by 1
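- written out (a hedged reconstruction of what (4.18)/(4.19) state, following the two steps above and consistent with the convention (Hc)_k = (c \ast D_{-1} h)_{2k} used in (4.22)/(4.23) below; assuming real-valued h_l): \phi_{j,k} = \sum_{l\in\mathbb{Z}} h_l\, \phi_{j-1,l+2k} and \psi_{j,k} = \sum_{l\in\mathbb{Z}} (-1)^l h_{1-l}\, \phi_{j-1,l+2k}, hence c^{j+1}_k = (f,\phi_{j+1,k})_{L^2} = \sum_{l\in\mathbb{Z}} h_l\, c^j_{l+2k} and d^{j+1}_k = (f,\psi_{j+1,k})_{L^2} = \sum_{l\in\mathbb{Z}} (-1)^l h_{1-l}\, c^j_{l+2k}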
4.19 FWT: Compute c^{j+1} and d^{j+1} from c^j
- Berkels: the c^j correspond to the projection P_{V_j} f and the c^{j+1}_k correspond to the projection P_{V_{j+1}} f and the d^{j+1}_k correspond to the projection P_{W_{j+1}} f
- Berkels: ie. if we know the function on V_j we can split it in V_{j+1} and W_{j+1} (this is like splitting the coarse information from the detail)
- FWT Recursion Formulas:
- The FWT uses these FWT Recursion Formulas to
- compute from P_{V_j} f (given in form of the coefficient vector c^j ) the coarser projection P_{V_{j+1}} f (as the coefficient vector c^{j+1} ) and the wavelet component P_{W_{j+1}} f (as the coefficient vector d^{j+1}).
- Note that the sums are finite in case the scaling equation coefficient sequence only has finitely many nonzero entries.
- If only few [of the scaling equation coefficients] are nonzero (eg. for the Haar wavelet only h_0 and h_1 are nonzero, for Daubechies’ wavelets four h_ks are nonzero), only few computations have to be done when going from j to j + 1.
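- to make the recursion concrete, here is a minimal numerical sketch of one FWT step for the Haar case (h_0 = h_1 = 1/\sqrt{2}, all other h_k = 0; the array c is a made-up stand-in for finitely many coefficients c^j_k, and the index convention follows the hedged reconstruction after (4.18)):

```python
import numpy as np

# one FWT step for the Haar filter h_0 = h_1 = 1/sqrt(2):
#   c_new[k] = (c[2k] + c[2k+1]) / sqrt(2)   -> local means       (P_{V_{j+1}})
#   d_new[k] = (c[2k] - c[2k+1]) / sqrt(2)   -> local differences (P_{W_{j+1}})
def fwt_step_haar(c):
    even, odd = c[0::2], c[1::2]          # len(c) is assumed to be even
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

c = np.array([4.0, 2.0, 5.0, 7.0])        # made-up coefficients c^j
c1, d1 = fwt_step_haar(c)
print(c1)   # [4.2426..., 8.4852...] = [6, 12] / sqrt(2)
print(d1)   # [1.4142..., -1.4142...] = [2, -2] / sqrt(2)
```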
- problem: we still want to convert back to our original input after we have done whatever manipulation we wanted to do on the wavelet coeffs, so we need to be able to invert this. Fortunately, this is also possible in a similar way:
4.20 Invert the FWT: Compute c^j from c^{j+1} and d^{j+1}
- invert the (fast) wavelet transform, ie. we need to be able to
- compute P_{V_j} f from the coarser projection P_{V_{j+1}} f and the details P_{W_{j+1}} f
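- and a matching sketch of one inverse step for the Haar case (again hedged; the Haar-specific formulas c^j_{2k} = (c^{j+1}_k + d^{j+1}_k)/\sqrt{2} and c^j_{2k+1} = (c^{j+1}_k - d^{j+1}_k)/\sqrt{2} follow from the general reconstruction, and the roundtrip check at the end is the point):

```python
import numpy as np

def ifwt_step_haar(c_new, d_new):
    c = np.empty(2 * len(c_new))
    c[0::2] = (c_new + d_new) / np.sqrt(2.0)   # c^j_{2k}
    c[1::2] = (c_new - d_new) / np.sqrt(2.0)   # c^j_{2k+1}
    return c

# the coefficients produced by one forward Haar step on c^j = [4, 2, 5, 7]
c1 = np.array([6.0, 12.0]) / np.sqrt(2.0)
d1 = np.array([2.0, -2.0]) / np.sqrt(2.0)
print(ifwt_step_haar(c1, d1))                  # [4. 2. 5. 7.] -- the roundtrip recovers c^j
```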
4.21 Compact Notation of the FWT and IFWT (Algorithm)
- apply (4.19) M times, ie. go from j to j+1 M times
- only the V_js are split further and further by the FWT algorithm, while W_{j+1}, W_{j+2}, ... are not split further (see figure below)
- the IFWT goes in the figure below in the opposite direction, ie. from bottom to top
4.22 H and G can be expressed as Convolutions
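- a hedged sketch of the statement (reconstructed to be consistent with the formula (Hc)_k = (c \ast D_{-1} h)_{2k} used in the complexity count below): with g_l := (-1)^l h_{1-l}, the recursion from (4.19) reads (Hc)_k = \sum_{l\in\mathbb{Z}} h_{l-2k}\, c_l = (c \ast D_{-1}h)_{2k} and (Gc)_k = \sum_{l\in\mathbb{Z}} g_{l-2k}\, c_l = (c \ast D_{-1}g)_{2k}, ie. one FWT step is “convolve with the reflected filter, then keep every second entry” (downsampling by a factor of 2)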
4.23 Coefficients c^j = ((f, \phi_{j,k} )_{L^2} )_{k\in\mathbb{Z}} for one j \in \mathbb{Z}, ie. the representation of our signal f on an initial scale j
- problem:
- To compute the FWT, we still need the coefficients c^j = ((f, \phi_{j,k} )_{L^2} )_{k\in\mathbb{Z}} for one j \in\mathbb{Z}, ie. the representation of our signal f on an initial scale j.
- Usually, f is not available analytically, so we cannot just compute c^j.
- Thus, we need a way to estimate c^j from available data.
- condition:
- let C := \int_{\mathbb{R}}\phi(x) dx
- then, (f, \phi_{j,k} )_{L^2} = 2^\frac{j}{2} Cf (2^j k)+\mathcal{O}(2^j); thus, for -j large enough [so that \mathcal{O}(2^j) is negligible], 2^\frac{j}{2} Cf (2^j k) is a good approximation of (f, \phi_{j,k} )_{L^2} (see the sketch below)
- it is sufficient if we can sample our signal f at equidistant points, as long as the distance between neighboring sample points is sufficiently small.
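- sketch of why (a hedged reconstruction; substitute x = 2^j(y+k)): (f,\phi_{j,k})_{L^2} = \int_{\mathbb{R}} f(x)\, 2^{-\frac{j}{2}}\phi(2^{-j}x - k)\,dx = 2^{\frac{j}{2}} \int_{\mathbb{R}} f(2^j(y+k))\,\phi(y)\,dy; for -j large, 2^j is small, so f(2^j(y+k)) \approx f(2^j k) on the support of \phi (for f regular enough), and the leading term is 2^{\frac{j}{2}}\, C f(2^j k)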
- Let N = 2^M [the length of the discrete input signal, with M the maximal decomposition depth].
- computing a single coefficient (Hc)_k = (c \ast D_{-1} h)_{2k} is \mathcal{O}(n) [where n is the number of nonzero scaling coefficients h_k]
- Both Hc and Gc are (due to the downsampling) vectors of length N/2 and thus can be computed with \mathcal{O}(nN )
- the FWT for decomposition depth M can be computed with \mathcal{O}(nN ), which is even faster than the FFT, if n is small.
- Similarly, one can show that the IFWT is also \mathcal{O}(nN ).
- There are numerous ways to construct a two-dimensional wavelet transform.
- The easiest way is to extend the one-dimensional wavelet transform by using the tensor product.
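- a minimal sketch of one level of this tensor-product construction in code (assuming PyWavelets; the random image is just a stand-in for actual image data):

```python
import numpy as np
import pywt

img = np.random.rand(256, 256)                  # stand-in for an actual image
cA, (cH, cV, cD) = pywt.dwt2(img, 'haar')       # one 2D Haar level (tensor product)
print(cA.shape, cH.shape, cV.shape, cD.shape)   # four 128x128 blocks
# cA: local means (the V-part); cH/cV/cD: horizontal/vertical/diagonal details
```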
- slide demo:
- Berkels: these 3 (H_j^2, S_j^2, D_j^2) contain a V_j [as a tensor factor], why are they called 2D detail spaces? - These 3 (H_j^2, S_j^2, D_j^2) are just details (like edges), although they contain a V_j, because these 3 together form the 2D detail space: by our definition, \mathcal{W}_j^2 must be the orthogonal complement of V_j^2, ie. V_{j-1}^2=V_j^2\oplus \mathcal{W}_j^2; comparing this with the equation in Figure 4.24 shows \mathcal{W}_j^2=H_j^2\oplus S_j^2\oplus D_j^2
- Berkels: If you check the picture on the slides, you can also see that there are no local means left in these H_j^2, S_j^2, D_j^2.
Figure 4.24:
- Berkels: on the lower left it picks more vertical structures, on the top right it picks more horizontal structures and on the diagonal it picks more diagonal structures
- Berkels: at the 2nd level we just get horizontal, vertical and diagonal components at a coarser scale. They contain coarser information because they are already averaged.
- why “averaged”?: the 2nd level is the decomposition of V_{j+1} into V_{j+2} and W_{j+2}, and P_{V_{j+1}}f=f_{\text{avg}} in the sense of (4.15 (i)), ie. the image on the top left (a, aa, aaa, etc.) is an “average” (for the p.w.c. MRA). In this sense, W_{j+2} (2nd level) is coarser than W_{j+1} (1st level), because W_{j+2} is a part of the already “averaged” V_{j+1}
- for other wavelets (other MRA) than the Haar wavelet the “average” (top left image) and the details (3 other images) look different
- Berkels: but the 2nd level is still more horizontal, vertical and diagonal details. So, the 2nd level gives you additional information that is not in the 1st level. But it is still as directional as the 1st level.
- Berkels: the scale on everything is very much related to your image resolution, not to what you actually see in there
4.25 JPEG, JPEG2000
- Wavelets have countless applications, both in theoretical and in practical work. Here, we confine ourselves to briefly mentioning one practical application.
- Like the DCT, the FWT (Fast Wavelet Transform) can be used for image compression by only storing the most important wavelet coefficients (a minimal thresholding sketch follows at the end of this section).
- In JPEG2000, the block-wise DCT is replaced by a wavelet transform, which results in significantly improved compression rates, ie. better image quality using the same amount of data or same image quality using less data.
- unlike the DCT, the WT has no “blocking” artifacts
- While the classical JPEG format is still much more popular in digital photography than JPEG2000, digital cinema uses JPEG2000 as basis for video compression.
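- the promised sketch of compression by coefficient thresholding (a minimal illustration, not the actual JPEG2000 codec; assuming PyWavelets, and a random image as stand-in, which compresses poorly -- real images fare much better):

```python
import numpy as np
import pywt

img = np.random.rand(256, 256)                     # stand-in for an image
coeffs = pywt.wavedec2(img, 'db2', level=3)        # multi-level 2D wavelet transform
arr, slices = pywt.coeffs_to_array(coeffs)         # flatten coefficients for thresholding
thresh = np.quantile(np.abs(arr), 0.95)            # keep roughly the largest 5%
arr[np.abs(arr) < thresh] = 0.0                    # "only store the most important coefficients"
rec = pywt.waverec2(pywt.array_to_coeffs(arr, slices, output_format='wavedec2'), 'db2')
rec = rec[:img.shape[0], :img.shape[1]]            # trim possible boundary padding
print(np.linalg.norm(img - rec) / np.linalg.norm(img))  # relative reconstruction error
```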