2.1 Continuous Moving Average Filter, Circular Averaging

Task: Denoising

  • image: $f: \mathbb{R}^d \to \mathbb{R}$
  • image without noise: $f_0: \mathbb{R}^d \to [0,1]$
  • noise: $n: \mathbb{R}^d \to \mathbb{R}$
  • $M_rf(x) = \int^\text{avg}_{B_r(x)}f(y)dy$
    • where $B_r(x)$ is the ball around $x$ with radius $r$ and $\int_A^\text{avg}f(x)dx := \frac{1}{\text{Vol}(A)}\int_A f(x)dx$
    • $B_r(x)$ is like a moving window
    • you can use either a ball in 2-norm (corresponding to filter $M_r$) or in $\infty$-norm (maximum norm) (corresponding to filter $M_r^\infty$)
      • $M_r^\infty$ is easier to compute in the discrete case
  • $M_rf(x) = \int_{\mathbb{R}^d} f(x+y)\left(\frac{1}{| B_r(0) |}\chi_{B_r(0)}(y)\right)dy$ ($M_r$ as a linear filter, $\frac{1}{| B_r(0) |}\chi_{B_r(0)}$ as filter function)
  • slide demo: as the circle gets bigger the image gets more and more blurry (ie. you lose sharp structures, eg. edges), but at the same time the noise is reduced
    • it will soon turn out that “losing edges” is a property of linear filters, it is unavoidable
    • thus, denoising is not the best use case for linear filters
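
A minimal discrete sketch of the two moving averages on a noisy test image (pure numpy; the helper name `moving_average`, the boundary handling, and the test image are made-up illustrations, not from the lecture):

```python
import numpy as np

def moving_average(img, r, norm="inf"):
    """M_r (2-norm ball) or M_r^inf (square window) on a 2D array,
    with constant extension at the boundary."""
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    if norm == "inf":
        mask = np.ones((2 * r + 1, 2 * r + 1), dtype=bool)  # square window (max-norm ball)
    else:
        mask = ys**2 + xs**2 <= r**2                        # discrete disc (2-norm ball)
    padded = np.pad(img.astype(float), r, mode="edge")
    out = np.empty(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            window = padded[i:i + 2 * r + 1, j:j + 2 * r + 1]
            out[i, j] = window[mask].mean()                 # average over B_r(x)
    return out

rng = np.random.default_rng(0)
f0 = np.zeros((64, 64)); f0[:, 32:] = 1.0        # clean image with one vertical edge
f = f0 + 0.2 * rng.standard_normal(f0.shape)     # noisy image
print(np.abs(moving_average(f, 3) - f0).mean())  # noise is reduced, but the edge is blurred
```

Increasing `r` reduces the noise further but blurs the edge more, matching the slide demo.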

2.2 Cross-Correlation, Linear Filter

  • $\left(\psi\star f\right)(x) = \int_{\Omega}\psi(y)f(x+y)dy$
  • linear filter with kernel $\psi$: $M_\psi: f\mapsto(x \mapsto \left(\psi\star f\right)(x))$

Convolution vs. Cross-Correlation

Mnemonic

  • Mnemonic:
    • Correlation is “the simple one”; convolution is a correlation with the reflected version of one of the two functions (which of the two we reflect does not matter, but the reflected one must be the one that we slide from left to right).
      • Depending on which one we reflect, we get either $f\ast g$ or $g\ast f$. But since convolution is commutative, both results are the same.
    • consider $f\star g$ resp. $f\ast g$:
      • Correlation: The first argument $f$ is slid from left to right.
      • Convolution: The second argument $g$ is reflected and then slid from left to right.
      • in the complex case the reflection becomes an adjoint, ie. the convolution slides $\overline{g(-x)}$ instead of $g(-x)$ from left to right
    • Convolution: commutative.
    • Correlation: commuting the arguments corresponds to a reflection about the y-axis (in the real-valued case).

Relationship between Convolution and Correlation

  • (i) For real-valued functions, of a continuous or discrete variable, convolution $( f \ast g )$ differs from cross-correlation $( f \star g )$ only in that either $f(x)$ or $g(x)$ is reflected about the y-axis in convolution; thus the convolution is a cross-correlation of $g(−x)$ and $f(x)$, or $f(−x)$ and $g(x)$.
  • (ii) For complex-valued functions, the cross-correlation operator is the adjoint of the convolution operator, ie. the convolution is a cross-correlation of $\overline{g(−x)}$ and $f(x)$, or $\overline{f(−x)}$ and $g(x)$
    • Proof:
      • “$ f(x) \ast g(x) = \overline{f(−x)} \star g(x)$” (Eq. 1) is stated explicitly there.
      • For the commuted convolution $( g \ast f )$, (Eq. 1) gives $ g(x) \ast f(x) = \overline{g(−x)} \star f(x)$ (Eq. 2).
      • Since the LHS of (Eq. 1) and (Eq. 2) are equal by commutativity, their RHS must be equal as well, ie. “$\overline{f(−x)} \star g(x) = \overline{g(−x)} \star f(x)$”.
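
A quick 1D numerical check of these relations for real-valued sequences (a sketch using numpy's `convolve`/`correlate`; the concrete arrays are arbitrary):

```python
import numpy as np

f = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
g = np.array([1.0, 2.0, 3.0])

conv = np.convolve(f, g, mode="full")                      # f * g
# convolution = cross-correlation with the reflected kernel
assert np.allclose(conv, np.correlate(f, g[::-1], mode="full"))
# convolution is commutative
assert np.allclose(conv, np.convolve(g, f, mode="full"))
# commuting the correlation only reflects the result ("about the y-axis")
assert np.allclose(np.correlate(f, g, mode="full"),
                   np.correlate(g, f, mode="full")[::-1])
print("all identities verified")
```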

[Figure: Comparison-convolution-correlation-svg]

  • in the figure: “The symmetry of $f$ is the reason $f\star g$ and $g\ast f$ (see bottom middle and bottom left of the figure) are identical in this example.”
    • ie. if $f$ were not symmetric, $f\star g$ and $g\ast f$ would differ!

Commutative or Not?

  • convolution: $f\ast g = g\ast f$ (commutativity)
  • correlation: $(f\star g)(x) = (\overline{g}\star \overline{f})(-x) = \overline{(g\star f)(-x)}$
    • in the real-valued case, commuting the arguments is thus just a reflection about the y-axis
    • bec of the latter equality $(\overline{g}\star \overline{f})(-x) = \overline{(g\star f)(-x)}$, this is equivalent to (3.9 (ii))
      • $(\overline{g}\star \overline{f})(-x) = \overline{(g\star f)(-x)}$ holds bec conjugation is distributive over multiplication and addition (the integral is like a sum, so the distributivity must hold for the integral, too); these are standard conjugation properties

2.3 Properties of the cross-correlation

2.3 (i) Boundedness of the norm of the cross-correlation

  • $f\in L^p(\Omega)$ and $g\in L^q(\mathbb{R}^d)$ and $\frac{1}{r}+1 = \frac{1}{p}+\frac{1}{q}$, then
    • (i) existence/finiteness of the integral: $(f\star g)\in L^r(\mathbb{R}^d)$
    • (ii) Young’s convolution inequality for cross-correlation: $\|f\star g\|_{L^r} \leq \|f\|_{L^p}\|g\|_{L^q}$
      • we need this to show (2.7 (i))
    • (iii) $(f\star g)(x) = (g\star f)(-x)$ (reflection about the y-axis)
      • but for convolution: $(f\ast g)(x) = (g\ast f)(x)$
      • for complex-valued functions: $(f\star g)(x) = (\overline{g}\star \overline{f})(-x)$
    • (iv) $\|f\star g\|_{L^r} = \|g\star f\|_{L^r}$

2.3 (ii) Inheritance of Differentiability, Derivatives of the Correlation

To understand $\psi \in C_c^k(\mathbb{R}^d)$:

see section “compact”

  • $\psi\in C_c^k$ and $f\in L^p$, then
    • (i) $(f\star \psi)\in C^k$
      • in words:
        • “the resulting correlation inherits differentiability of its arguments (it is sufficient if one of the arguments is differentiable for the result to be differentiable)”
        • “the result of the correlation is always as smooth as the kernel”
      • we need this to show (2.7 (ii)) “denseness of $C_c^{\infty}$”
    • (ii) $\frac{\partial^\alpha}{\partial x^\alpha}(f\star \psi) = f\star \frac{\partial^\alpha}{\partial x^\alpha}\psi$
      • $\alpha$ is a vector of natural numbers including $0$ ($\partial x^0$ means “not derived in x-direction”, $\partial y^1$ means “derived once in y-direction”, etc.), eg. $\alpha=(2,1,0)$ means $\frac{\partial^3}{\partial x^2\partial y^1\partial z^0}$
      • in words: “you just need to compute the derivative of the kernel”
      • useful if: you need a differentiable (aka “smooth”) approximation of your image, eg. for edge detection (see “In general, a signal $f$ is not smooth, but Proposition (2.3 (ii)) ensures that …” idea in the Canny Edge section)
      • we need this to show (iii) which, in turn, we will need in the Canny Edge Detector section (2.16)
    • (iii) $\frac{\partial^\alpha}{\partial x^\alpha}(\psi\star f) = (-1)^{|\alpha|}\frac{\partial^\alpha}{\partial x^\alpha}\psi\star f$
      • $|\alpha|$ is the sum of components, eg. with the example above $|\alpha|=2+1+0=3$
      • in words: again, “you just need to compute the derivative of the kernel”
      • note: $\psi$ is differentiable, but $f$ is not necessarily differentiable
      • we get (iii) because by (2.3 (i) part (iii)) $\frac{\partial^\alpha}{\partial x^\alpha}(\psi\star f)(x) = \frac{\partial^\alpha}{\partial x^\alpha}(f\star \psi)(-x)$ and, by applying the chain rule to compute $\frac{\partial^\alpha}{\partial x^\alpha}(f\star \psi)(-x)$, we see that the outer derivative of $(f\star \psi)(-x)$ is given by (ii), whereas the inner derivative, $\frac{\partial^\alpha}{\partial x^\alpha}(-x)$, is $(-1)^{|\alpha|}$.
      • we need this
        • in the Canny Edge Detector section (2.16) to show $(\psi\star f)’ = -\psi’\star f$ (note: (ii) would not be enough to show this!)

2.3 (iii) Approximating any Function with a Differentiable Function

  • note: (B.10 (iii)) has the same conditions!
  • conditions:
    • $\psi\in L^1$, $\psi\geq 0$ and $\int_{\mathbb{R}^d}\psi(x)dx=1$
    • $\psi_\epsilon$ a scaled version of $\psi$: for $\epsilon > 0$, $\psi_\epsilon: \mathbb{R}^d \to \mathbb{R}, x \mapsto \frac{1}{\epsilon^d}\psi(\frac{x}{\epsilon})$
      • factor $\frac{1}{\epsilon^d}$ chosen s.t. integral over $\mathbb{R}^d$ stays $1$
      • gets narrower along $x$-axis and higher along $y$-axis as $\epsilon\to 0$
        • $\epsilon$ is the width of $\psi_\epsilon$, ie. low values of $\epsilon$ mean $\psi_\epsilon$ is very concentrated
    • $f\in L^\infty$ (warning: this just means $f$ must be bounded (aka $\infty$-norm is finite), it does not mean that $f$ must be integrable!)
  • (i) if $f$ is continuous in point $x$ then,
    • $\lim_{\epsilon\to 0}{(\psi_\epsilon\star f)(x)} = f(x)$, ie. $(\psi_\epsilon\star f)$ converges to $f$ in point $x\in \mathbb{R}^d$ (pointwise convergence)
      • the narrower and more concentrated $\psi_\epsilon$ is the better $(\psi_\epsilon\star f)$ approximates $f$
      • we need this in the Canny Edge section (2.16) to show that for very weak smoothing $\sigma\to 0$ the filtered image converges to the original image
  • (ii) if $f$ is uniformly continuous then,
    • $(\psi_\epsilon\star f)$ converges to $f$ uniformly on each compact subset of $\mathbb{R}^d$ (uniform convergence)
    • we need this to show (2.7 (i))
  • tool that helps us approximating images $f$
    • mainly useful for proofs
    • the nice thing is $f$ can be any function, we only need boundedness for $f$ (see condition: $f\in L^\infty$)
      • then, the limit on the lhs of (i) gives us a way to approximate this $f$ with the differentiable function $(\psi_\epsilon\star f)$
    • it is often easier to prove s.th. for a differentiable function and then show that it also holds in the limit
      • so, this is a tool to show properties of the limit $f(x)$ on the rhs of (i)
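
A small numerical illustration of (i) (a sketch: the box kernel below plays the role of $\psi$, $d=1$, a Riemann sum replaces the exact integral, and `x0`/`f` are arbitrary choices):

```python
import numpy as np

f = lambda x: np.tanh(5 * x)             # bounded and continuous at x0
psi = lambda x: 0.5 * (np.abs(x) <= 1)   # psi >= 0 with integral 1 (a box kernel)
x0 = 0.3

for eps in [1.0, 0.1, 0.01]:
    y = np.linspace(-eps, eps, 20001)    # support of psi_eps = [-eps, eps]
    psi_eps = psi(y / eps) / eps         # psi_eps(y) = (1/eps) * psi(y/eps), d = 1
    val = np.sum(psi_eps * f(x0 + y)) * (y[1] - y[0])   # approximates (psi_eps ⋆ f)(x0)
    print(eps, abs(val - f(x0)))         # the error shrinks as eps -> 0
```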

2.4 The Difference Quotient Converges to the Derivative

  • conditions:
    • $D\subset \mathbb{R}^d$ be compact and convex
      • $D$ is, again, a large ball (because a ball is also convex)
      • at some point in the proof we have 2 points and we need everything in between the 2 points to be in the set that is why we need this convexity
    • $f\in C^1(D)$
  • then \(\forall i\in\{1,\ldots,d\}\forall\epsilon\in (0,\infty)\exists\delta\in (0,\infty)\forall x\in D\forall h\in (-\delta,\delta)\backslash \{0\} : \bigg\lvert \frac{f(x+he_i)-f(x)}{h} - \partial_if(x)\bigg\rvert <\epsilon,\quad\text{where}\quad x+he_i\in D\)
  • $i$: coordinate direction of the derivative
  • $x+he_i\in D$:
    • technicality: because we must not leave the domain where $f$ is defined
  • in words: for each $\epsilon$ we find a $\delta$ so that our shifts $h$ are smaller than $\delta$ in absolute value. Then the difference between the difference quotient and the derivative is smaller than $\epsilon$
  • proof:
    • (i) $D$ compact (aka “closed and bounded”) $\Rightarrow$ $\partial_if$ uniformly continuous on $D$
    • (ii) $x\in D$, $h$ s.t. for $0<h<\delta$, $x+he_i\in D$
    • (iii) $D$ convex $\Rightarrow$ $g: \left[0,h\right] \to \mathbb{R},\,t\mapsto f(x+te_i)$ is well-defined
      • convex: the affine combination $(1 − t)x + ty$ belongs to $D$ for all $x,y$ in $D$ and $t$ in the interval $\left[0,1\right]$
        • problem: (ii) only says $x\in D$ and $x+he_i\in D$, but what about the values in between, ie. the $x+te_i$ with $t\in\left(0,h\right)$?
          • solution: convexity guarantees that $x+te_i\in D$
        • ie. if $D$ were not convex, some point in $\left[0,h\right]$ might not lie in $D$, so we could not evaluate $f(x+te_i)$ there and $g$ would not be well-defined at that point.
    • (iv) mean value theorem: $\partial_if(x+\xi e_i)=g'(\xi)=\frac{g(h)-g(0)}{h}=\frac{f(x+he_i)-f(x)}{h}$
  • we need this for (2.3 (ii)) proof

2.5 Correlation is Continuous, Edges are Always Blurred

  • conditions:
    • $\frac{1}{p}+\frac{1}{q}=1$ ($q,p$ are dual exponents)
    • $f\in L^p$, $g\in L^q$ (neither of them needs to be continuous!)
  • then $$(f\star g)\in C(\mathbb{R}^d)$$

  • in words: “the correlation is continuous” (although $f$ and $g$ do not have to be continuous!)
  • in images: edges and sudden color changes are typical discontinuities in images. Now, (2.5) tells us:
    • if you apply a linear filter, no matter how clever you design it, the result will be continuous! Thus, edges are always blurred.
      • of course, you can have a high resolution and a small filter and then you will have a very tiny blur. But this proposition shows fundamentally the limits regarding edge preservation of linear filters.
    • this is bad bec in images typically you want to have nice and sharp edges.
    • “Linear filters have their merits, but edge preservation is not one of them.”
      • which is where non-linear filters come in
  • proof:
    • exercise: $\lVert T_hg-g\rVert_{L^q}\to 0$ for $h\to 0$
      • in words: “Shift a function by $h$. Then, compare the shifted version with the original. If the shift $h$ gets small, then the difference (measured in the $L^q$ norm) gets small.”

2.6 Positive Mollifier

  • conditions:
    • $\psi$ as in (2.3 (iii)), but additionally, $\psi\in C_c^\infty$
      • you can think of it as an “averaging kernel”
      • infinitely differentiable + compact support (so $\psi$ can have as many derivatives as we wish and it is nicely localized)
  • then $\psi$ is a positive mollifier
  • this is just a convenient set of properties for kernels, that is why these have a name

2.7 $C_c^\infty$ is dense in $L^p$

  • (i) let $\Omega$ be open, $\psi$ be a mollifier and $1\leq p < \infty$ then \(\forall f\in L^p\,\forall\delta\,\exists\epsilon : \lVert \psi_\epsilon\star f-f\rVert_{L^p}<\delta\)
    • Note: like the $\epsilon$-$\delta$-definition of the $\lim$ symbol, but the $\epsilon$ here is the index of $\psi_\epsilon$
      • phth: ie. “the more concentrated $\psi_\epsilon$, the better the approximation” (like in (2.3(iii.i)))
        • phth: this must be right, because in the next point he says “very concentrated”
    • in words: “if I convolve $f$ with a very concentrated kernel, then I’m very close to the original $f$”
      • this is like a more specific version of what we had in (2.3(iii)): there we had a weaker filter kernel (not necessarily differentiable), but now in (2.7), with this additional differentiability, we get the nice property that we can get arbitrarily close to our $f$.
        • in (2.3(iii)) $(\psi_\epsilon\star f)$ was not necessarily differentiable (the kernel there was only $L^1$), but here it is infinitely differentiable
        • ie. (2.7 (i)) is essentially the special case of (2.3 (iii)) where the kernel is a mollifier
      • why do we want that?:
        • $\psi_\epsilon$ is a mollifier, so it is infinitely often differentiable
        • thus, we can approximate $f$ with a function $\psi_\epsilon \star f$ that is infinitely often differentiable
  • with (i) we can show:

  • (ii) $C_c^\infty$ is dense in $L^p$

  • ie. for any $L^p$ function we can find a $C_c^\infty$ function that is arbitrarily close.
    • and this allows to approximate any $f$ with nicely behaving functions which will be very helpful in the future
  • proof (ii):
    • idea: find ANY $C_c^\infty$ element that gets arbitrarily close to our $f$ (which is an $L^p$ function by assumption). Here, this is $\psi_\epsilon\star g$.
      • (a) show $\lVert \psi_\epsilon\star g-f\rVert_{L^p}<\delta$
      • (b) from (2.3(ii)) $\psi_\epsilon\star g\in C^\infty$
      • (c) show for $\epsilon>0$ small enough $\psi_\epsilon\star g\in C_c^\infty$
    • By (a) and (c) a $C_c^\infty$ function (here: $\psi_\epsilon\star g$) can get arbitrarily close to ANY $L^p$ function (here: $f$)
      • (like a sequence of rational numbers can get arbitrarily close to any real number)
  • we will need this later: and this is actually kind of a preparation for some stuff we will need in the FT chapter, but it is fitting nicely here bec it uses the correlation
  • problem:
    • time to look at the discrete case in order to actually compute sth (bec when you want to implement sth, you surely would not compute these integrals analytically)

2.8 Discrete Filter, Discrete Correlation in 1D, Padding Methods

  • (i) discrete 1D-cross-correlation of $F$ and $W$:
    • (i.a) for a filter with 3 elements: $W=(w_{-1},w_0,w_1)$ and $F=(f_i)_{i\in\mathbb{Z}}\in\mathbb{R}^{\mathbb{Z}}$, then \(\hat{f}_i = \sum_{j=-1}^{1}w_jf_{i+j}\)
      • the formula on the left means that we take the weighted sum of these $f$s with the $w$s
      • a simple discretization of the 1d version of the correlation
      • $(x, y)$ in the continuous correlation integral are now $(i, j)$ in the sum
    • (i.b) for a larger filter: $W=(w_{-a},\ldots,w_a)\in \mathbb{R}^{2a+1}$, where $a\in\mathbb{N}$ is the radius of the filter, the same holds
  • (ii) problem: the filter needs $f_{i+j}$ for all $i\in\{1,\ldots,m\},j\in\{-1,0,1\}$ or $j\in\{-a,\ldots,a\}$, eg. for the pair $(i=1, j=-1)$ we have $i+j=0$, but there is no $f_{i+j}=f_0$
    • intuitively: “the filter will have to look outside the signal / image”
    • solution: different ways to define $f_i$ for $i\in\mathbb{Z}\backslash\{1,\ldots,m\}$ (see the sketch after this list)
      • zero extension ($f_i=0$ outside)
      • constant extension ($f_i=f_1$ on the left, $f_i=f_m$ on the right)
      • extension by mirroring at the boundary (aka reflection)
        • Berkels: “reflection works probably best, looks the most natural; the others have arbitrary jumps at boundaries”
      • periodic extension
        • intuitively: “the same signal / image is simply appended on the left and on the right” (usually not helpful for images)
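
A sketch of (i) together with the extension methods from (ii) (pure numpy; the helper name `correlate1d` is made up, and np.pad's mode names stand in for the four extensions):

```python
import numpy as np

def correlate1d(F, W, mode="reflect"):
    """Discrete 1D cross-correlation (2.8): hat_f_i = sum_j w_j f_{i+j},
    with W = (w_{-a}, ..., w_a) and the signal extended beyond its boundary."""
    a = len(W) // 2
    # np.pad modes standing in for (2.8 (ii)):
    #   "constant" -> zero extension, "edge" -> constant extension,
    #   "reflect"  -> mirroring at the boundary, "wrap" -> periodic extension
    Fp = np.pad(np.asarray(F, dtype=float), a, mode=mode)
    return np.array([np.dot(W, Fp[i:i + 2 * a + 1]) for i in range(len(F))])

F = [1.0, 2.0, 4.0, 8.0, 16.0]
W = np.array([0.25, 0.5, 0.25])          # a small averaging filter (w_{-1}, w_0, w_1)
for m in ["constant", "edge", "reflect", "wrap"]:
    print(m, correlate1d(F, W, mode=m))  # only the values near the boundary differ
```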

2.9 Discrete Correlation in 2D

  • discrete cross-correlation \((W\star F)_{i,j} = \sum_{k=-a}^{a}\sum_{l=-b}^{b}w_{k,l}f_{i+k,j+l}\quad\text{for}\quad(i,j)\in\mathcal{I}\), where $f_{i+k,j+l}$ for $(i+k,j+l)\not\in\mathcal{I}$ is defined by extension (see (2.8(ii))); a direct implementation sketch follows below the notation figure
    • the center of the filter is the element of the filter with $(k=0,l=0)$
  • discrete convolution \((W\ast F)_{i,j} = \sum_{k=-a}^{a}\sum_{l=-b}^{b}w_{k,l}f_{i-k,j-l}\quad\text{for}\quad(i,j)\in\mathcal{I}\), where $f_{i-k,j-l}$ for $(i-k,j-l)\not\in\mathcal{I}$ is defined by extension (see (2.8(ii)))
    • the minus signs in the indices $(i-k,j-l)$ are the only thing that changes wrt the cross-correlation def. (like in the continuous case, where the argument was $x-y$ instead of $x+y$)
  • Notation: see figure below
    • Berkels: I use square brackets to indicate that this is NOT a matrix - a matrix has round brackets
      • (this is different from a matrix bec the 1st index increases from left to right and the 2nd index from bottom to top and the center has index $(0,0)$. This is very different from the usual matrix index notation, where the row number increases from top to bottom and the column number from left to right and $(0,0)$ is on the top left corner!)

[Figure: Screenshot-from-2024-02-23-23-29-31]
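
A direct implementation sketch of the two formulas (pure numpy; note that the filter is stored here as an ordinary array with its center at array index $(a,b)$, ie. the usual row/column indexing rather than the lecture's square-bracket filter notation):

```python
import numpy as np

def correlate2d(F, W, mode="reflect"):
    """(W ⋆ F)_{i,j} = sum_{k,l} w_{k,l} f_{i+k,j+l}, boundary handled by extension."""
    a, b = W.shape[0] // 2, W.shape[1] // 2
    Fp = np.pad(np.asarray(F, dtype=float), ((a, a), (b, b)), mode=mode)
    out = np.empty((Fp.shape[0] - 2 * a, Fp.shape[1] - 2 * b))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(W * Fp[i:i + 2 * a + 1, j:j + 2 * b + 1])
    return out

def convolve2d(F, W, mode="reflect"):
    """(W ∗ F): identical, except that the kernel is reflected (f_{i-k,j-l})."""
    return correlate2d(F, W[::-1, ::-1], mode=mode)
```

For symmetric kernels (W equal to its reflection) the two results coincide.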

  • problem:
    • sometimes it’s helpful to think about these filters in matrix form

2.10 $(W\ast F)$ as a Matrix

  • $(W\ast F)$ is linear in $F$. Thus, filtering with $W$ can be expressed as matrix.
  • see notes
  • Building this matrix $W$ is only useful for theoretical arguments. This is NOT how you would implement this, bec copying all these matrices like this would be a waste of resources. But still, it is good to have seen how you can do this.
    • (for implementation you would avoid building this matrix and just compute the sums directly)
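
A tiny sketch of the idea for a 1D signal of length $m=5$ with zero extension (purely illustrative; the matrix $A$ below is what “filtering with $W$” looks like as a linear map):

```python
import numpy as np

m = 5
w = np.array([1.0, 2.0, 3.0])              # filter weights (w_{-1}, w_0, w_1)

A = np.zeros((m, m))                       # matrix representing F -> W ⋆ F (zero extension)
for i in range(m):
    for j in (-1, 0, 1):
        if 0 <= i + j < m:                 # contributions from outside the signal are 0
            A[i, i + j] = w[j + 1]

F = np.array([1.0, 0.0, 2.0, 0.0, 1.0])
direct = np.array([sum(w[j + 1] * (F[i + j] if 0 <= i + j < m else 0.0)
                       for j in (-1, 0, 1)) for i in range(m)])
assert np.allclose(A @ F, direct)          # matrix-vector product = discrete correlation
print(A)                                   # a banded (here: tridiagonal) matrix
```
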
  • problem:
    • now, lets have a look at some filters you can construct in this way

2.11 Denoising Filters

  • averaging filters have $\sum_{k,l}w_{k,l} = 1$
    • note: the rank filters (2.11 (v)) are non-linear and not defined by weights $w_{k,l}$, so this property does not apply to them

2.11 (i) Mean Value Filters

  • correspond to the moving average over squares $M_r^{\infty}$ that we had at the beginning of this chapter, see (2.1), or to some more refined avgs where eg. the center is weighted more.
    • in (2.1) the filter was $\frac{1}{| B_r(0) |}\chi_{B_r(0)}$
  • (i) continuous case: the moving average over squares $M_r^\infty$, see (2.1)
    • the ball becomes a square when we use the $\infty$-norm
  • (ii) discrete case: eg. $M_3, M_5$ or for arbitrary sizes $M_{a,b}=\frac{1}{ab}\,(1,\ldots,1)\otimes(1,\ldots,1)$, where the first all-ones vector is in $\mathbb{R}^a$ and the second in $\mathbb{R}^b$
    • all ones with scaling factor
    • mean value filters fulfill $\sum_{k,l}w_{k,l}=1$, ie. elements must sum to one bec we want total mass $1$
  • tensor product notation
  • the larger the filter the larger the averaging effect, ie. the more blurry the image gets
  • other mean value filters, eg.
    • (i) trying to average on a discrete circle instead of a discrete square
    • (ii) add a higher weight to the center pixel instead of equal weights everywhere

2.11 (ii) Gaussian Filter

  • idea: by (2.3 (ii)) $\psi\star f$ is as smooth as the $\psi$ $\Rightarrow$ “full smoothing” by choosing sth that is in $C^\infty$ $\Rightarrow g_\sigma\in C^\infty$
  • (i) continuous case: $g_\sigma(x) = \frac{1}{(\sqrt{2\pi}\sigma)^d}e^{-\frac{\lVert x\rVert^2}{2\sigma^2}}$
    • normalization factor (should integrate to $1$, depends on dim)
  • (ii) discrete case: $(\widetilde{G}_\sigma)_{k,l} = \frac{1}{(\sqrt{2\pi}\sigma)^d}e^{-\frac{k^2+l^2}{2\sigma^2}}$ (simply replace the $\lVert x\rVert^2$ in (i) with $k^2+l^2$, here $d=2$)
    • with normalization: $(G_\sigma)=\frac{1}{\sum_{k,l}(\widetilde{G}_\sigma)_{k,l}}(\widetilde{G}_\sigma)$ (simply divide (ii) by the sum of (ii) over all filter indices)
    • assumes grid size 1, otherwise $(k^2+l^2)$ has to be scaled corresponding to the grid size
    • biggest weight in the center like the Gaussian function in d-dim
    • smoothing/blurring (less sharp) is less pronounced than for the mean filter above, but also more isotropic bec filter is more rotationally symmetric
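
A sketch of how the discrete, normalized kernel $G_\sigma$ could be built (the truncation radius of $3\sigma$ is a common choice, not something fixed by the lecture):

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Discrete Gaussian filter G_sigma on a grid of size 1, normalized to sum 1."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))          # truncation radius (assumption: 3 sigma)
    k, l = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    G = np.exp(-(k**2 + l**2) / (2 * sigma**2))   # unnormalized weights (tilde G_sigma)
    return G / G.sum()                            # divide by the sum -> weights sum to 1

G = gaussian_kernel(1.0)
print(G.shape, G.sum())                                # (7, 7) 1.0
print(G.max() == G[G.shape[0] // 2, G.shape[1] // 2])  # biggest weight in the center
```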

2.11 (iii) Binomial Filter

  • an approximation of the Gaussian that can be computed rather easily (without having to evaluate exponential functions)
  • $W=\frac{1}{2}(1,1)$ (the “averager”)
    • recursive definition of $B^n$: To get $B^n$ we correlate $W$ with $(0,B^{n-1},0)$. $B^0:=1$.
      • intuitively:
        • When correlating $W$ with $(0,B^{n-1},0)$ we basically “average each pair” of neighboring values in $(0,B^{n-1},0)$.
        • E.g. when we average each pair in $(0,B^0,0) = (0,1,0)$ we get $(\frac{1}{2},\frac{1}{2}) = B^1$.
  • the resulting $B^n$ are normalized rows of Pascal’s triangle
    • $B^1=\frac{1}{2}(1,1)$, $B^2=\frac{1}{4}(1,2,1)$ (used for the Sobel filter), $B^3=\frac{1}{8}(1,3,3,1)$, $B^4=\frac{1}{16}(1,4,6,4,1)$
    • that is why it is called Binomial filter
  • $B_{a,b} = B^{a-1}\otimes B^{b-1}$
    • (careful: eg. for a $5\times 5$ filter the tensor product $B^4\otimes B^4$ has to be formed, not $B^5\otimes B^5$!)
    • tensor product: like the ordinary matrix product for two $n\times 1$ matrices, where the second (not the first!) matrix is transposed: $\underline{x}^{n\times 1} \otimes \underline{y}^{n\times 1} = \underline{x}^{n\times 1}\, \underline{y}^{1\times n}$
      • that the second and not the first matrix is transposed matters eg. in (2.13)
  • remember: larger filter means larger blurring (but also more denoising)
    • (you could estimate the filter size if you would have the standard deviation of the noise, but for real data you typically do not have that information, so you must try practically which size works best)
  • for large $n$ the Binomial filters are a good approximation of the Gaussian filter
    • but Binomial filters are easier to compute
  • one reason why you would use the Gaussian is when you need the derivative
    • (because the Binomial approximates the Gaussian itself, but if you want to convolve with the derivative of the Gaussian - which we will do next - then you actually need the Gaussian filter.)
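
A sketch of the recursive construction and of the tensor-product filter (pure numpy; `binomial` is a made-up helper name):

```python
import numpy as np

def binomial(n):
    """B^n via the recursion: correlate W = (1,1)/2 with (0, B^{n-1}, 0); B^0 = (1)."""
    B = np.array([1.0])
    for _ in range(n):
        B = np.correlate(np.pad(B, 1), [0.5, 0.5], mode="valid")  # "average each pair"
    return B

print(binomial(2))                           # [0.25 0.5  0.25]
print(binomial(4))                           # [0.0625 0.25 0.375 0.25 0.0625] = (1,4,6,4,1)/16
B_55 = np.outer(binomial(4), binomial(4))    # B_{5,5} = B^4 ⊗ B^4 (note: 4, not 5!)
print(B_55.sum())                            # 1.0 -> an averaging filter
```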

2.11 (iv) Duto Blur

  • overlays an image $F$ with a smoothed version of itself $(G_\sigma\star F)$: \(\lambda F+(1-\lambda)(G_\sigma\star F),\quad\text{for}\quad\lambda\in\left[0,1\right]\)
    • ie. it is a convex combination of the filtered image with the original image
  • strange effect: it looks somewhat sharp but also somewhat blurred because of this convex combination
  • not sure whether it is used somewhere in practice. There are some lenses that have this kind of property. (lec9:) This is just for an artistic effect.

2.11 (v) Median Filter, Maximum and Minimum Filter

  • a non-linear filter (so the edge-blurring result (2.5) for linear filters does not apply, as we will see)
  • mean value filter as solution of the minimization problem \(\label{eq:medianminimizationproblem}(M_{2a+1}\star F)_{i,j}=\text{argmin}_{f\in\mathbb{R}}\sum_{k=-a}^{a}\sum_{l=-a}^{a}\lvert f_{i+k,j+l}-f\rvert^2\)
  • this is just a different interpretation of what the mean filter does
    • if you solve the minimization problem by taking the necessary condition “derivative of double sum=0” and then solve this equation wrt $f$ you get “$f =$ our orig. definition of the mean filter” as the minimizer
  • now think about outliers:
    • for outliers the difference in ($\ref{eq:medianminimizationproblem}$) would be very large ( $f_{i+k,j+l}$ would be a nonsense value which would be far away from the average $f$ )
      • thus, bec of the square $\cdot^2$, these outliers would have a strong effect on the average. If you are familiar with these kinds of minimization problems then you know that using the exponent $1$ instead of the square reduces this strong effect of the outliers. That is what we will do now.
  • problem: the square makes the minimization problem susceptible to outliers in $F$
  • solution: replace the square by the exponent $1$ (absolute differences instead of squared differences)
    • the solution of the resulting minimization problem is the median (exercise)
    • double sum in ($\ref{eq:medianminimizationproblem}$) becomes a single sum (because if we reorder the matrix to a vector, the minimum does not change, so the order does not matter), thus the new minimization problem is simply $\min_{g\in\mathbb{R}}\sum_{i=1}^{n}\lvert g_i-g\rvert$, where the $g_i$ are all the $f_{i+k,j+l}$ in the filter window
  • To summarize:
    • The mean minimizes the sum of squared deviations/errors, whereas the median minimizes the sum of absolute deviations/errors.
  • the median filter of an image $F$ is defined as the median over windows of size $(2a+1)\times(2a+1)$
  • a non-linear filter
  • very suitable to remove outlier pixels, eg. salt-and-pepper noise (ie. when we randomly change pixels to either black or white)
    • whereas with the mean value filter (even with large mean value filters) the salt-and-pepper noise cannot be removed and the image gets very blurry (see slides demo in lecture)
  • of course, the 3x3 median filter may still slightly move edges; also, as a rule of thumb, fewer than half of the values in the window should be corrupted for the median to work
  • if instead of the median, the maximum or minimum is considered the filter is called maximum or minimum filter
    • also non-linear
  • these three filters are known as rank-filters
    • we will revisit these under the name of “morphological filters” later, where we will see why it would be a good idea to take the minimum or maximum and what kind of effect this has
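
A minimal sketch of a rank filter and of the salt-and-pepper experiment from the slide demo (pure numpy; the window handling, noise levels, and helper name are made up):

```python
import numpy as np

def rank_filter(F, a, op=np.median):
    """Median / min / max over (2a+1)x(2a+1) windows, constant extension at the boundary."""
    Fp = np.pad(np.asarray(F, dtype=float), a, mode="edge")
    return np.array([[op(Fp[i:i + 2 * a + 1, j:j + 2 * a + 1])
                      for j in range(F.shape[1])] for i in range(F.shape[0])])

rng = np.random.default_rng(0)
F0 = np.zeros((32, 32)); F0[:, 16:] = 1.0                # clean image with one edge
F = F0.copy()
F[rng.random(F.shape) < 0.05] = 1.0                      # salt ...
F[rng.random(F.shape) < 0.05] = 0.0                      # ... and pepper
med = rank_filter(F, 1, np.median)                       # 3x3 median filter
avg = rank_filter(F, 1, np.mean)                         # 3x3 mean filter, for comparison
print(np.abs(med - F0).mean(), np.abs(avg - F0).mean())  # the median error is much smaller
```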

2.11 (vi) Bilateral Filter, Selective Gaussian Filter (aka Nonlinear Gaussian Filter)

  • see lec notes
  • idea: prevent blurring by taking the gray value distance into account with an adaptive weighting function
    • reinterpret $(\psi\star f)$ as $(\psi\star f)(x)=\int_{\mathbb{R}^d}\psi(y-x)f(y)dy$ (by substitution $\phi(y)=y-x$ in the def. of the correlation)
      • the weight at position $y$ only depends on the distance $y-x$, but it is independent of $f(y)-f(x)$
    • idea: take also into account the difference of the gray values $f(y)-f(x)$, so that you only average similar gray values, \((\psi\star f)(x)=\frac{1}{\mathcal{Z}}\int_{\mathbb{R}^d}\psi_1(y-x)\psi_2(f(y)-f(x))f(y)dy\)
      • thus, image intensity edges are not averaged and the edges remain sharp
        • $\mathcal{Z}:=\int_{\mathbb{R}^d}\psi_1(y-x)\psi_2(f(y)-f(x))dy$ is a normalization factor (again, bec we want total mass 1)
        • the computation of $\mathcal{Z}$ makes this filter more costly than linear filters bec the normalization depends on the input image and has to be computed for every pixel
      • $\psi_1$ is the weight function encoding the spatial closeness
      • $\psi_2$ is the weight function encoding the intensity closeness
  • if the weight functions are Gaussians, then the filter is called Selective Gaussian Filter or Nonlinear Gaussian Filter
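
A sketch of the discrete bilateral filter with Gaussian weight functions (ie. the selective Gaussian filter); the parameter names and default values are made up for illustration:

```python
import numpy as np

def bilateral(F, sigma_s=2.0, sigma_i=0.1, radius=4):
    """Weights = spatial Gaussian psi_1(y-x) * intensity Gaussian psi_2(f(y)-f(x)),
    normalized per pixel; this per-pixel normalization Z is what makes it costly."""
    F = np.asarray(F, dtype=float)
    Fp = np.pad(F, radius, mode="reflect")
    k, l = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    psi1 = np.exp(-(k**2 + l**2) / (2 * sigma_s**2))       # spatial closeness
    out = np.empty_like(F)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            window = Fp[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            psi2 = np.exp(-((window - F[i, j]) ** 2) / (2 * sigma_i**2))  # intensity closeness
            w = psi1 * psi2
            out[i, j] = np.sum(w * window) / np.sum(w)     # division by Z
    return out
```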

[Figure: Screenshot-from-2024-02-24-08-01-39]

  • slide demo:
    • selective Gaussian removes the salt-and-pepper noise AND ALSO preserves all the edges sharply (regardless of the filter size used)
    • thus, this is a really practical filter for denoising
  • unlike for the aforementioned filters, here we can use “zero extension”
    • we could not use zero extension for the aforementioned filters because they do not take the gray value distance into account: the boundary gray values would be averaged with the $0$s outside the image, causing artificial edges and strong blurring at the image boundaries!

2.12 Derivative Filters

  • problem:
    • later we will see that we can use them for edge detection
    • continuous case: later: more sophisticated approach: use the property of the correlation (2.3) that when you convolve with a differentiable kernel you can convolve with the derivative of the kernel to get the derivative of the filtered image.
      • this approach is better because it does not have the “oscillating signal problem” of the derivative filters (see below)
    • discrete case: But, we here first look at finite differences
  • (i) $D_{x_1}^{+}$: forward difference quotient at $i,j$ in direction $x_1$
    • recall, how we defined the coordinate directions in a filter matrix $\underline{W}$ (1st coordinate direction goes from left to right and 2nd direction goes from bottom to top)
      • therefore, we go from $-1$ to $1$ from left to right (ie. along the 1st coordinate direction), or in other words, when the index increases from $i$ to $i+1$
  • (ii) $D_{x_1}^{-}$: backward difference quotient at $i,j$ in direction $x_1$
  • (iii) $D_{x_2}^{+}$, $D_{x_2}^{-}$: the quotients in direction $x_2$
  • (iv) $D_{x_1}^{c}$, $D_{x_2}^{c}$: central difference quotient at $i,j$ in direction $x_1$ and $x_2$
    • this is really just translating the finite differences for the derivatives into the filter notation. So, the exact factors come from applying Taylor to the function so that you get the approximation order that you are looking for. This computation of the factors should be on the last attendance sheet.
  • (v) $D_{x_1}^{2}$, $D_{x_2}^{2}$: higher derivatives at $i,j$ in direction $x_1$ and $x_2$
  • (vi) derivative filters have $\sum_{k,l}w_{k,l} = 0$, whereas averaging filters have $\sum_{k,l}w_{k,l} = 1$
  • (vii) oscillating signal problem:
    • a problem typical for finite differences
    • central diff quot $D_{x_1}^{c} = 0$, but the signal is not constant (eg. for the alternating signal $(\ldots,1,-1,1,-1,\ldots)$ the central difference is $0$ everywhere); see the sketch after this list
      • ie. it is not desirable to have a derivative of $0$ for a signal that is absolutely NOT constant / a signal that is maximally oscillating.
      • why central diff quot?: bec this is what we would usually do for the 1st order derivative, since the central diff is a better approximation than the forward or backward difference!
    • solution: make use of property (2.3), ie. convolve with the derivative of a certain kernel
  • problem:
    • despite these problems with the finite differences you can use them for edge detection:
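
A sketch of the oscillating signal problem from (vii) (the alternating test signal and the interior-only correlation helper are made up for illustration):

```python
import numpy as np

f = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0])  # maximally oscillating signal
D_forward = np.array([0.0, -1.0, 1.0])                 # weights (w_{-1}, w_0, w_1) for f_{i+1} - f_i
D_central = np.array([-0.5, 0.0, 0.5])                 # weights for (f_{i+1} - f_{i-1}) / 2

def correlate_interior(F, W):
    """hat_f_i = sum_j w_j f_{i+j}, evaluated only where no extension is needed."""
    a = len(W) // 2
    return np.array([np.dot(W, F[i - a:i + a + 1]) for i in range(a, len(F) - a)])

print(correlate_interior(f, D_central))   # all zeros, although f is anything but constant
print(correlate_interior(f, D_forward))   # +/-2 everywhere, the forward difference "sees" the oscillation
```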

2.13 Prewitt and Sobel Edge Detector

  • idea:
    • here the idea of Prewitt and Sobel is to combine the derivative in one direction with an averaging in the other direction, so that you get a smoothed approximation of the derivative.
    • you do this for both directions: So, derivative in $x_1$ direction combined with smoothing in $x_2$ direction and vice versa to get an approximation of the gradient
    • then
      • the magnitude of the gradient will be an estimate of where the edges are and
      • the direction of the gradient will be an estimate of the direction of this edge.
    • summarizing:
      • ie the idea is not only use finite differences, but combine it with smoothing (to get rid of these negative effects of finite diff → see oscillating signal problem)
  • slide demo:
    • if you look in $x_1$ direction you will NOT see this edge in $x_2$ direction and vice versa
    • but if you combine these absolute values you get a nice detection of where these edges are
    • and then you can also convert the angle into a color and then show what the direction of your angle is (for this we will need to introduce the HSV space)
  • (i) Prewitt filter: mean value filter $M_3$ for smoothing, $D_{x_1}^c$ for the derivative
    • $D_{x_1}^{Prewitt}=D_{x_1}^c\otimes M_3$ (remember: for the $x_1$ direction, first $D_{x_1}^c$ then $M_3$; for $x_2$ the other way around)
    • use central difference instead of forward difference (bec central diff has a better approximation order)
  • (ii) Sobel filter: $B^2$ for smoothing, $D_{x_1}^c$ for the derivative
    • $D_{x_1}^{Sobel}=D_{x_1}^c\otimes B^2$ (same mnemonic as for Prewitt)
  • remember: square brackets denote filters which have a different indexing than normal matrices
  • (iii) $F$ be a discrete image, then $G_{x_1}=D_{x_1}^{Prewitt}\star F$ and $G_{x_2}=D_{x_2}^{Prewitt}\star F$
    • for visualization we use $T^{\text{norm}}(G_{x_1})$ and $T^{\text{norm}}(G_{x_2})$, where $T^{norm}$ maps the gradients to $\left[0,1\right]$ to get the gray values
    • slide demo:
      • transition from white to black → maximal positive gradient $= +1$, black edge in $G_{x_1}$ and $G_{x_2}$ visualization
      • transition from black to white → maximal negative gradient $= -1$, white edge in $G_{x_1}$ and $G_{x_2}$ visualization
      • no transition / constant color → zero gradient $= 0$, gray area in $G_{x_1}$ and $G_{x_2}$ visualization
  • to estimate edges we need to combine these two directions bec otherwise we would overlook certain directions of edges if we would look in one direction only (as we have seen in a slide demo before).
  • (iv) then $T^{\text{norm}}(\sqrt{G_{x_1}^2+G_{x_2}^2})$ is the gradient magnitude
    • this is a simple edge detector (“simple” compared to the Canny detector)
    • “square and root pixelwise”: ie. the discrete image $\underline{F}$ is a matrix, both directional derivatives $\underline{G_{x_1}}$ and $\underline{G_{x_2}}$ give us matrices of the same size as $\underline{F}$; for these two matrices we square the entries, add them entrywise, and take the square root entrywise
    • again, $T^{\text{norm}}$ is applied on the result of the root, so that the grad magnitude is mapped to $\left[0,1\right]$ to get the gray values
    • the vector that contains $\underline{G_{x_1}}$ and $\underline{G_{x_2}}$ at every pixel gives us a gradient estimate at every pixel. The gradient magnitude is then an estimate for edges.
  • (v) $\theta = \text{arctan2}(G_{x_1},G_{x_2})$ is the gradient direction (direction of “steepest ascent”)
    • intuitively: the angle of this $(G_{x_1},G_{x_2})$ vector with the x-axis
    • 2-argument arctan: the normal $\text{arctan}$ only covers half of the unit circle and $\text{arctan2}$ fixes this problem
      • if you are in the right half plane $(a>0)$ you have the standard $\text{arctan}$ formula; the standard $\text{arctan}$ only returns angles in $(-\frac{\pi}{2},\frac{\pi}{2})$, whereas $\text{arctan2}$ covers the full range $(-\pi,\pi]$
    • $\text{arctan2}$ is often used in computer science
  • (vi) all of this can be done with the Sobel kernels as well
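
A sketch of the Prewitt/Sobel construction and of (iv)/(v) (pure numpy; the kernels are built with ordinary array axes rather than the lecture's filter indexing, and numpy's `arctan2(y, x)` argument order is used, which may differ from the notation above):

```python
import numpy as np

D_c = np.array([-0.5, 0.0, 0.5])        # central difference weights (w_{-1}, w_0, w_1)
M3  = np.array([1.0, 1.0, 1.0]) / 3.0   # mean value filter
B2  = np.array([0.25, 0.5, 0.25])       # binomial filter B^2

# derivative along one axis combined with smoothing along the other axis
prewitt_x = np.outer(M3, D_c)           # derivative along axis 1, smoothing along axis 0
sobel_x   = np.outer(B2, D_c)
prewitt_y, sobel_y = prewitt_x.T, sobel_x.T
print(8 * sobel_x)                      # the classic [[-1,0,1],[-2,0,2],[-1,0,1]] Sobel-type kernel

def edge_strength(G_x, G_y):
    """Pixelwise gradient magnitude and direction from the two filtered images."""
    rho = np.sqrt(G_x**2 + G_y**2)      # edge estimate
    theta = np.arctan2(G_y, G_x)        # angle with the x-axis, values in (-pi, pi]
    return rho, theta
```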

2.14 HSV Space, Visualization of the Gradient Direction $\theta$

  • problem: How to visualize $\theta$?
    • $V=\left[-\pi,\pi\right]$ is not really an interval, but must rather be interpreted as a unit circle
    • we need sth that takes into account that $-\pi$ and $\pi$ are the same (otherwise we would have artificial jumps from black to white when the angle changes from $-\pi$ to $\pi$ although these jumps would have no meaning then. Therefore, we want to avoid such artificial discontinuities.)
  • slide demo:
    • naive visualization: $T^{norm}(\theta)$ has an artificial discontinuity (jump from black to white), even though there is a smooth transition of the angle from $-\pi$ to $\pi$.
      • thus, we need HSV
  • visualize the gradient direction $\theta$ by $H=\theta$
  • visualize the gradient magnitude by
    • option 1: $S=V=1$
    • option 2: $S=G, V=1$
    • option 3: $S=G, V=G$
  • HSV cylinder: 3 axes like for RGB, but H axis goes in a circle:
    • S: white to colorful
    • V: dark to bright
    • H: color: red -> green -> blue -> red (in a circle)
      • this is the behaviour we want for our angle: $-\pi$ and $\pi$ are both red
  • to display we must convert HSV to RGB
    • Berkels: “not a very nice mapping, all these transitions from one color to the next are linear and we need to check in which of these transitions we are to then do this explicit linear interpolation”
  • slide demo:
    • option 1: $(H,S,V) = (\theta, 1, 1)$ ie. on top of HSV cylinder, outer boundary (always bright + colorful)
      • problem: completely removes the magnitude and just shows direction of gradient (→ it’s red where there is no gradient !)
      • solution: use the other 2 channels to visualize the gradient magnitude
    • option 2: $(H,S,V) = (\theta, G, 1)$ ie when $\text{grad}=0$ use white, and when $\text{grad}>0$ then more colorful ($\text{grad}$ controls how close we are to the boundary of the cylinder/how far away from the center of the cylinder)
    • option 3: $(H,S,V) = (\theta, G, G)$ ie transition from black to colorful (instead of white to colorful as in 2.)
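
A sketch of the three visualization options (assuming matplotlib is available; `hsv_to_rgb` expects H, S, V in $[0,1]$, so $\theta$ is rescaled from $[-\pi,\pi]$, which keeps $-\pi$ and $\pi$ on the same hue):

```python
import numpy as np
from matplotlib.colors import hsv_to_rgb

def direction_to_rgb(theta, G=None, option=1):
    """Map gradient direction theta (and, for options 2/3, magnitude G in [0,1]) to RGB via HSV."""
    H = (theta + np.pi) / (2 * np.pi)   # [-pi, pi] -> [0, 1]; hue is periodic, so no jump
    ones = np.ones_like(H)
    if option == 1:
        S, V = ones, ones               # direction only (red even where the gradient is 0)
    elif option == 2:
        S, V = G, ones                  # white -> colorful as the magnitude grows
    else:
        S, V = G, G                     # black -> colorful as the magnitude grows
    return hsv_to_rgb(np.stack([H, S, V], axis=-1))
```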

2.15 CNNs

  • linear filters can be used as feature extractors, eg. in CNNs

2.16 Canny Edge Detector

  • problem:
    • Problem with Prewitt and Sobel detector: there is no “knob” to configure how many edges you want or to see how strong the edges have to be.
  • idea:
    • step 1: smooth the signal (otherwise local maxima cannot be determined)
      • In general, a signal $f$ is not smooth, but (2.3 (ii)) ensures that $\psi \star f \in C^2$ and $(\psi\star f)' = -\psi' \star f$ for a suitable kernel $\psi$. Assuming some regularity of $f$ (eg. (2.3 (iii))), with $\psi_\sigma(x) = \frac{1}{\sigma} \psi( \frac{x}{\sigma} )$ we have $\lim_{\sigma\to 0} (\psi_\sigma \star f )(x) = f(x)$.
    • step 2: find a suitable kernel $\psi$
      • The Canny Edge Detector will give us a way to select how sensitive we want to be to the edges and also with some sane properties, eg.
        • when you want fewer edges there will never be new edges and
        • existing edges will either vanish or stay at their position (ie. they will also not move to new positions).

[Figure: Screenshot-from-2024-02-25-02-37-14]

  • 2nd condition: “if we increase the intensity of the smoothing $\sigma$ the edge stays where it is (aka it should not move)”
  • 1st and 3rd condition specify what happens left and right of the edges

[Figure: Screenshot-from-2024-02-25-02-55-21]

  • slide demo:
    • graphs of $f$ (black), $f'$ (green), $f''$ (red):
      • black graph: image
      • zero crossings of red graph are where the edges are
    • look at left edge:
      • the only thing we are allowed to do is to decrease the slope locally at the edge (“decrease” because, as mentioned below, we just want the edges to decrease as we increase $\sigma$, and “edge decreases” = “slope decreases”); this means:
        • left of left edge: we are only allowed to increase values of $f$
        • right of left edge: we are only allowed to decrease values of $f$
    • the 1st and 3rd condition encode exactly this idea
      • $\partial^2_xu > 0$ detects whether we are left of the left edge
        • where $\partial_{\sigma}u > 0$ only allows to increase $f$
      • $\partial^2_xu < 0$ detects whether we are right of the left edge
        • where $\partial_{\sigma}u < 0$ only allows to decrease $f$
    • the situation at the right edge is the opposite, but the 3 conditions apply there, too
  • (i) for $d=1$: solutions of the heat equation $\partial_x^2u=\partial_\sigma u$ (a PDE), with $u(x,\sigma)$ fulfill these 3 properties
    • for $d>1$: using the Laplace operator $\Delta_x$: $\Delta_xu = \sum_{i=1}^{d}\partial^2_{x_i}u$
  • (ii) we require the initial value $u(x,0)=f(x)$ because for $\sigma=0$ we want to get back to the original image (no blur, but noisy)
  • (iii) for $d=1$: the unique solution is $u=(g_{\sqrt{2\sigma}}\ast f)(x)$, thus $g_\sigma$ is a suitable kernel for edge detection with a “knob” to configure the edge strength
    • we just want that if we increase sigma then the edges decrease, but it does not matter how fast. In other words the scale of sigma does not matter
      • eg. using $\sigma^2/2$ as the time variable instead of $\sigma$ turns the solution into $g_\sigma\ast f$
    • for $d>1$: for all $d\in\mathbb{N}$ the solution of the heat equation is $g_\sigma$ (exercise)
  • (iv) the convolution can be expressed as a correlation $(g_{\sqrt{2\sigma}}\ast f)=(g_{\sqrt{2\sigma}}\star f)$
  • (v) the Canny Edge Detector computes
    • size: $\rho(x)=\sqrt{(\partial_{x_1}(g_{\sqrt{2\sigma}}\star f))^2+(\partial_{x_2}(g_{\sqrt{2\sigma}}\star f))^2}$
    • direction of the edge (direction of “steepest ascent”): $\theta(x)=\text{arctan2}((\partial_{x_1}(g_{\sqrt{2\sigma}}\star f)), (\partial_{x_2}(g_{\sqrt{2\sigma}}\star f)))$
      • the angle that the gradient has with the x-axis
    • similar to the Sobel detector before, but here we use a Gaussian kernel instead of a Prewitt or Sobel kernel
    • trick: $\partial_{x_i}(g_{\sqrt{2\sigma}}\star f) = \partial_{x_i}g_{\sqrt{2\sigma}}\star f$, we can precompute the corresponding Gaussian derivative filters and filter the image with those
    • as edges we consider all points where $\rho$ is strictly maximal locally in direction $(\sin(\theta),\cos(\theta))$ TODO
  • implementation: to implement this we will have to
    • discretize: ie instead of looking at all $\theta$s we only look at multiples of 45 degrees by rounding $\theta$ to those multiples and then compare neighboring pixel values in direction $\theta$ and $\theta+180°$
      • figure below: 3 red boxes: we compare the center value with the 2 neighboring pixels (at $\theta$ and $\theta+180°$) and if the middle value is strictly bigger than the 2 outer values then you say there is an edge
    • thresholding: disregard all edges for which $\rho<\text{threshold}$
      • $\text{threshold}$ is another parameter that you set; this will avoid interpreting small noise components as edges (this noise will also lead to response in the gradient but the value may be quite small which is why we can just filter them out like this)

[Figure: Screenshot-from-2024-02-25-04-40-24]
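
A stripped-down sketch of the whole detector (assuming scipy is available; `gaussian_filter` with `order=1` filters with the corresponding Gaussian derivative, the 45°-rounding and the threshold follow the implementation notes above, and the neighbor offsets are just one plausible mapping of the rounded directions to pixel neighbors):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def canny_like(f, sigma=2.0, threshold=0.05):
    # derivatives of the smoothed image = image filtered with derivatives of g_sigma
    gx = gaussian_filter(f, sigma, order=[0, 1])   # derivative along axis 1
    gy = gaussian_filter(f, sigma, order=[1, 0])   # derivative along axis 0
    rho = np.sqrt(gx**2 + gy**2)
    theta = np.arctan2(gy, gx)

    # round the direction to multiples of 45 degrees; a direction and its opposite coincide
    step = np.round(theta / (np.pi / 4)).astype(int) % 4
    offsets = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}   # (axis-0, axis-1) neighbor
    edges = np.zeros(rho.shape, dtype=bool)
    for i in range(1, rho.shape[0] - 1):
        for j in range(1, rho.shape[1] - 1):
            di, dj = offsets[step[i, j]]
            # strict local maximum of rho along the gradient direction, plus thresholding
            if rho[i, j] > rho[i + di, j + dj] and rho[i, j] > rho[i - di, j - dj]:
                edges[i, j] = rho[i, j] >= threshold
    return edges
```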

  • slide demo:
    • the larger $\sigma$, the fewer edges you see
    • edges stay where they are, they do not move
    • no new edges appear
    • small $\sigma$s give you even the tiny noise that you have in the intensities
    • but you can nicely control which kind of edge strength you want to look at

2.17 Laplacian Sharpening

  • goal: Sharpen an unsharp image
  • in 1D: sharpening = emphasizing edges (edges = zero crossings of $f''$)
  • (i) $f-\tau f''$, where $\tau>0$
    • $\tau$: a measure for the degree of “edge amplification”
    • idea:
      • in (2.16): at the zero crossing of the red line we are only allowed to make the slope of the edge (black line) less steep (for increasing $\sigma$) (bec we do not want to move the edge to some other place)
      • here: we want to pronounce the edge → we have to make the slope of the edge MORE steep. → subtracting the red line from the black line exactly achieves this effect
        • this works at all places in the domain, incl. the left as well as the right edge: for example:
          1. left of left edge: pos red values are subtracted from pos black curve
          2. right of left edge: neg red values are subtracted from pos black curve
          3. by 1. and 2. the slope of the black curve must increase
            • The same slope increase effect happens at the right edge.
  • for $d>1$:
    • problem: edges occur in multiple directions (the gradient is orthogonal to the direction of the edge. We cannot simply look at positions where the Hessian is zero!)
    • solution: need a rotationally invariant form of the 2nd derivative → Laplace operator
    • (ii) thus, $f-\tau\Delta_x f$
      • Berkels: facts about the Hessian:
        • its trace is the Laplace operator
        • the sum of its eigenvalues is invariant under rotation (but eigendirections can change!)
        • key insight: all the mixed derivatives are not considered by the Laplace operator. (exercise)
  • problem: images are not smooth enough
    • solution: like in (2.16), smoothen with $g_\sigma$
    • (iii) thus, $(g_\sigma\star f)-\tau\Delta_x(g_\sigma\star f) = (g_\sigma-\tau\Delta_xg_\sigma)\star f$ (exercise)
  • slide demo:
    • increase $\tau$ → sharper edges
    • peppers image: problem: since Gaussian blur does not remove the stripe patterns we are also sharpening those stripe patterns!
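
A sketch of (iii) (assuming scipy; the Laplacian of the smoothed image is obtained by filtering with the two pure second Gaussian derivatives, and the sigma/tau defaults are arbitrary):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def laplacian_sharpen(f, sigma=1.0, tau=0.5):
    """(g_sigma ⋆ f) - tau * Laplacian(g_sigma ⋆ f)."""
    smoothed = gaussian_filter(f, sigma)
    laplacian = (gaussian_filter(f, sigma, order=[2, 0]) +
                 gaussian_filter(f, sigma, order=[0, 2]))   # sum of the pure 2nd derivatives
    return smoothed - tau * laplacian
```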

C.2.1 Morphological Filters

2.18 Denoising of Objects

  • the “object” is a subset of the domain (the “big circle” in the drawing) together with the noise dots
  • $f(x)=1$ if $\forall y\in B: \chi_A(x+y)=1$ (universal quantifier) → “denoiser” of $\chi_A$
  • $g(x)=1$ if $\exists y\in B: f(x+y)=1$ (existential quantifier) → “grower” of $f$
  • where
    • $B$ is a structuring element, eg. a small ball around the origin
    • $A\subset\Omega$ is a noisy object (ie. the “noise dots” also belong to $A$)
    • $\chi_A$ is the noisy image
    • $f$ is an image that takes the noisy image $\chi_A$ as input
    • $g$ is an image that takes the denoised image $f$ as input

2.19 Erosion and Dilation when $f(x)\in\{0,1\}$

  • erosion = “denoiser” = $\ominus$
  • dilation = “grower” = $\oplus$

2.20 Erosion and Dilation when $f(x)\in\{0,1\}$ - Alternative Formulation

2.21 Erosion and Dilation when $f(x)\in\left[0,1\right]$

  • generalization of (2.19): if we plug in an image that only takes the values $0$ and $1$, then by (2.20) the definitions for characteristic functions coincide with this more general definition.
  • (i) $(f\oplus B)(x) = \sup_{y\in B}f(x+y)$
  • (ii) $(f\ominus B)(x) = \inf_{y\in B}f(x+y)$
  • (iii) erosion and dilation coincide with the minimum and maximum filter
    • more precisely: in (2.11 (v)) we saw: median is just one operation that we could use. We could also use minimum and maximum. And we called this rank-order filters (or “rank filters”).
      • So erosion and dilation are in the same class of filters as the median, they are just a bit more specialized for the task here.
      • The only difference is that
        • in the discrete setting (2.11 (v)) we had a rectangle as structuring element. Because we did the minimum or maximum or median over a rectangle of pixels and since it is finitely many values in this discrete rectangle we could also use maximum and minimum.
        • Here in the continuous setting we must rely on $\sup$ and $\inf$ bec we do not know whether $\max$ and $\min$ will actually be attained.
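
A sketch of (i)/(ii) for discrete images (pure numpy; the structuring element is given as a list of integer offsets, and the constant extension at the boundary is an arbitrary choice):

```python
import numpy as np

def dilate(f, B):
    """(f ⊕ B)(x) = sup_{y in B} f(x+y); B = list of integer offsets (k, l)."""
    r = max(max(abs(k), abs(l)) for k, l in B)
    fp = np.pad(f, r, mode="edge")
    out = np.full(f.shape, -np.inf)
    for k, l in B:
        out = np.maximum(out, fp[r + k:r + k + f.shape[0], r + l:r + l + f.shape[1]])
    return out

def erode(f, B):
    """(f ⊖ B)(x) = inf_{y in B} f(x+y); via the duality f ⊖ B = -((-f) ⊕ B)."""
    return -dilate(-f, B)

B = [(k, l) for k in (-1, 0, 1) for l in (-1, 0, 1)]  # a 3x3 "square" structuring element
f = np.zeros((7, 7)); f[3, 3] = 1.0                   # one isolated bright pixel ("noise dot")
print(erode(f, B).max())                              # 0.0 -> the dot is removed
print(dilate(erode(f, B), B).max())                   # 0.0 -> the opening does not bring it back
```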

2.22 Opening and Closing

  • if $B$ and $-B$ are the same (eg. when $B$ is a ball around the origin), then the minus sign makes no difference, ie. then $B = -B$
  • (i) opening = 1. erosion, 2. dilation
    • denoted by “open” circle symbol $\circ$
    • Memorize: this is what we used in the object denoising example (2.18)
      • $f$ is applying erosion to $\chi_A$, and then to the result we apply the dilation $g$ with $-B$
      • (notice the minus sign before $B$, because in the definition of erosion and dilation there is a $+y$ in $f(x+y)=1$, not a $-y$)
  • (ii) closing = 1. dilation, 2. erosion
    • denoted by “closed” circle symbol $\bullet$
    • Memorize: the closing “closes holes”
      • also the hole would not reappear because if it is fully closed then the erosion cannot open it again

2.23 Segmentation with Background Equalization

  • we had the isodata algorithm for the segmentation problem, but this algorithm gives a global threshold and eg. uneven illumination would cause local problems
  • the opening is an estimate of the background:
    1. erosion removes this thin, bright writing
    2. dilation corrects the shrinkage caused by the erosion
  • Here the structuring element $B$ must be wider than the width of the writing, otherwise it would not remove the writing
  • in the end we get an estimate of the background: ie. the smudges stay, the general background stays, the illumination stays, but the writing is gone
  • idea:
    • use this as pre-processing step before isodata:
      • subtract the estimated background from the image
    • this compensates
      • brightness variations and
      • small “smudges” (which can be seen as local brightness variations)
    • this is called white-top-hat transform
    • for white background: black-top-hat transform
    • connection: $f\bullet B - f = (-f) - (-f)\circ B$
      • from the duality of opening and closing (2.24)
      • derivation: $f - f\circ B = f - (- ((-f)\bullet B)) = f + (-f)\bullet B$; substituting $f\to -f$ gives $(-f) - (-f)\circ B = -f + f\bullet B = f\bullet B - f$, which is exactly the claimed connection
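
A sketch of the background-equalization idea on a synthetic “document” (assuming scipy's `grey_opening`; the image, the 5×5 structuring element, and the stroke width are made up so that the element is wider than the writing):

```python
import numpy as np
from scipy.ndimage import grey_opening

# synthetic "document": dark background with uneven illumination and thin bright writing
background = np.tile(np.linspace(0.0, 0.3, 64), (64, 1))   # brightness increases to the right
image = background.copy()
image[20:44:6, 10:54] = 1.0                                # thin bright "strokes" (1 px wide)

# opening with a structuring element wider than the strokes estimates the background
estimated_bg = grey_opening(image, size=(5, 5))
white_top_hat = image - estimated_bg                       # writing remains, illumination is gone
print(np.abs(estimated_bg - background).max())             # small -> decent background estimate
print(white_top_hat.max(), white_top_hat[0, :5].max())     # strokes stand out, background ~ 0
```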

[Figure: Screenshot-from-2024-02-28-11-54-32]

2.24 Properties of the Morphological Operators

  • $f,g: \mathbb{R}^d\to \left[0,1\right]$
  • erosion, dilation:
    • duality $f\ominus B = - ((-f)\oplus B)$
    • translational invariance $T_hf\ominus B = T_h(f\ominus B)$ (and analogously for $\oplus$)
      • if I translate my image by a vector $h$ and then apply the erosion, this is the same as if I would first apply the erosion and then shift everything. → rather self-explanatory (bec it does not matter where the structures are, the shrinkage and the growth are independent of the position in the image)
    • monotonicity $f\leq g \Rightarrow f\ominus B \leq g\ominus B$ (and analogously for $\oplus$)
    • distributivity
      • here, the combination of “minimum and erosion” (or “maximum and dilation”) is important - the other combinations are not allowed (→ because the maximum is not compatible with the $\inf$, when you want to exchange them)
      • pointwise minimum $\land$ and maximum $\lor$ behave like $\cap$ and $\cup$: for binary images $f=\chi_A$, $g=\chi_B$ we have $f\land g = \chi_{A\cap B}$ and $f\lor g = \chi_{A\cup B}$
        • good mnemonic! → picture a 2D Venn diagram with sets $A$ and $B$:
          • the intersection always takes the “lower” value of $\chi_A$ and $\chi_B$ → minimum
          • the union always takes the “higher” value of $\chi_A$ and $\chi_B$ → maximum

[Figure: Screenshot-from-2024-02-28-13-13-09]

  • opening, closing:
    • fulfill all of the above except distributivity (which makes sense bec of (2.27)) and additionally:
    • non-increasingness
      • opening (“removing noise”) can only make $f$ smaller or leave it unchanged
    • non-decreasingness
      • closing (“closing holes”) can only make $f$ larger or leave it unchanged
    • idempotence
      • ie. if you try to open sth multiple times you will not change anything → there is no reason to iterate opening or closing
  • problem:
    • one thing mathematicians wonder when they see these properties is whether these properties already determine what these operators are or if there could be other operators that fulfill these properties

2.25 Summary of next two Propositions

  • (i) just a lemma: distributivity holds also for infinitely many arguments, where in the case of an infinite number of arguments you would take the infimum over all functions instead of taking the minimum over $f$ and $g$ ( $=(f \land g)$ ) only, see (2.26)
    • a lemma needed for the proof of (ii)
  • (ii) main proposition: dilation and erosion are the only translational invariant operators that fulfill the generalized distributivity in (i), see (2.27)
    • proof: using (i)

2.26 Infinite Distributive Law

  • $f_i\in F(\mathbb{R}^d,\left[0,1\right])$, (Warning: not binary images!)
  • (i) generalized distributivity with the maximum $(\lor_i f_i)\oplus B = \lor_i(f_i\oplus B)$
  • (ii) generalized distributivity with the minimum $(\land_i f_i)\ominus B = \land_i(f_i\ominus B)$
  • where $\land_if_i := \inf_i f_i$ and $\lor_if_i := \sup_i f_i$ (instead of $\min$ and $\max$!)

2.27 Dilation and Erosion are the only operators that fulfill (2.26)

  • conditions:
    • mapping $D$ maps binary images to binary images, ie. $D: F(\mathbb{R}^d, \{0,1\})\to F(\mathbb{R}^d, \{0,1\})$
    • mapping $D$ fulfills the translational invariance property (2.24)
    • mapping $D$ fulfills the generalized distributivity either (i) with the maximum or (ii) with the minimum (2.26)
    • some technical assumptions to exclude “pathological examples” (but they are necessary!)
  • (i) then there is a nonempty $B$ s.t. \(D(f) = f\oplus B\quad \forall\, \text{binary images}\,f\)
    • This means that the dilation with $B$ is characterized by these properties:
      • if you know $D$ is translationally invariant and fulfills the conditions above, then it is necessarily a dilation with some set $B$.
  • (ii) then there is a nonempty $B$ s.t. \(D(f) = f\ominus B\quad \forall\, \text{binary images}\,f\)

2.28 Contrast Invariance of Erosion and Dilation

  • conditions:
    • for all continuous, increasing intensity transformations $T: \mathbb{R}\to\mathbb{R}$ (→ (Chapter 1))
    • $f: \mathbb{R}\to \left[0,1\right]$ (not binary!)
  • then,
    • (i) $T(f)\oplus B = T(f\oplus B)$
    • (ii) $T(f)\ominus B = T(f\ominus B)$
  • in words: “it does not matter whether we change the contrast first or whether we dilate first”
  • proof:
    • $(T(f)\ominus B)(x) = \inf_{y\in B} T(f(x+y)) = T(\inf_{y\in B} f(x+y)) = T((f\ominus B)(x))$
    • interchangeability of the infimum with continuous, increasing functions: $\inf_{y\in B} T(f(x+y)) = T(\inf_{y\in B} f(x+y))$ (shown in exercise)

2.29 Generalization of Erosion and Dilation

  • replace the “flat” structuring element with a function $b(y)$, ie. $(f\oplus b)(x) := \sup_{y\in B} (f(x+y)+b(y))$
    • so far we were shifting a “flat” set $B$ (eg. a disc); now $b$ is a function that weights the locations within the structuring element differently, eg. to emphasize what is going on in the middle of the element
    • for $b\equiv 0$ this is equivalent to the usual dilation (see the sketch below)
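
A sketch of the generalized (“non-flat”) dilation (pure numpy; representing the structuring function $b$ as a dict mapping offsets to values is just an illustrative choice):

```python
import numpy as np

def dilate_nonflat(f, b):
    """(f ⊕ b)(x) = sup_{y in B} (f(x+y) + b(y)); b maps offsets (k, l) -> b(y)."""
    r = max(max(abs(k), abs(l)) for k, l in b)
    fp = np.pad(f, r, mode="edge")
    out = np.full(f.shape, -np.inf)
    for (k, l), bv in b.items():
        out = np.maximum(out, fp[r + k:r + k + f.shape[0], r + l:r + l + f.shape[1]] + bv)
    return out

# b == 0 on a 3x3 square reproduces the usual "flat" dilation;
# a peaked b (largest in the middle) emphasizes the center of the structuring element
flat_b   = {(k, l): 0.0 for k in (-1, 0, 1) for l in (-1, 0, 1)}
peaked_b = {(k, l): -0.1 * (abs(k) + abs(l)) for k in (-1, 0, 1) for l in (-1, 0, 1)}
f = np.random.default_rng(0).random((8, 8))
print(np.all(dilate_nonflat(f, flat_b) >= dilate_nonflat(f, peaked_b)))  # True
```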