
UNIT-3

Chapter- 3

Basics of Images and Videos:


An image can be digitally represented as a function of space as shown below:

I = f(x, y)

where x and y are points in space and I represents the intensity. f is a mapping of points in space
to the corresponding intensity values.
As shown in the figure below, each point on the image is mapped to an intensity value which is
the color that we see at that particular point.

For an 8-bit encoded image, intensity levels can take 2^8 = 256 values, i.e., from 0 to 255.
For a grayscale image, each point in space can have values in the range [0, 255].
For a color image, each point in space takes 3 values, each in the range [0, 255] for the 3 color
channels, R, G and B.
A video on the other hand varies both in space and time as shown in the equation below:
V = f(x, y, t)
where x and y are points in space, t represents time and V represents the intensity. f is a mapping
of points in space and time to the corresponding intensity values.
As shown in the video below, the intensity at a point (on the road) not only is a function of the x
and y coordinates of that point but is also a function of time. It is time that determines whether
the intensity value is contributed by a vehicle or by the road.
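As a minimal illustration (a sketch assuming NumPy and OpenCV are available, and a hypothetical video file named traffic.mp4), the array shapes below mirror the I = f(x, y) and V = f(x, y, t) formulations:

```python
import numpy as np
import cv2  # OpenCV, assumed available

# A grayscale image: I = f(x, y), with 8-bit intensities in [0, 255]
gray = np.zeros((480, 640), dtype=np.uint8)        # shape: (height, width)

# A color image: 3 values per point in space, one per channel
color = np.zeros((480, 640, 3), dtype=np.uint8)    # shape: (height, width, 3)

# A video: V = f(x, y, t), i.e. a stack of frames indexed by time t
cap = cv2.VideoCapture("traffic.mp4")              # hypothetical file name
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()

video = np.stack(frames)                           # shape: (num_frames, height, width, 3)
print(gray.shape, color.shape, video.shape)
```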

Ways of Detecting Moving Objects


1. Basic Motion Detection
The first and most intuitive way is to calculate the differences between frames, or between a
frame considered “background” and all the other frames.
The idea is quite simple at the highest level: first, save the first frame. After saving it, compare it
to each new frame. By comparing them pixel by pixel, you simply subtract the two images; in this
way, you obtain the moving objects. This technique is quick to implement, but it is not well suited
to many applications, because you need to fix a default frame as the background, and the
background is unlikely to stay constant in your application.
Imagine you are detecting cars. Setting a default background is not going to be effective because
cars are constantly moving, and everything is changing. Lighting conditions fluctuate, and
objects are in motion. For instance, you set the first frame as a background image and there are 3
cars in the background image, but after just one second, they are not going to exist because they
are moving. Consequently, the background image becomes inaccurate as everything is changing
rapidly. Therefore, the algorithm is not going to be accurate, especially in environments with
rapid changes.
Look at the images; the algorithm works, but it is not accurate. Look at the image on the left;
there are meaningless areas. That is because in videos the background changes nearly every
second, but in the algorithm the background is held constant.

By now you understand the main idea behind Basic Motion Detection. If you don't expect high
accuracy, it may be useful, but it is certainly not the best approach.
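A minimal sketch of this basic frame-differencing idea, assuming OpenCV and a hypothetical input file traffic.mp4 (the blur kernel and threshold values are illustrative choices, not part of the original text):

```python
import cv2

cap = cv2.VideoCapture("traffic.mp4")          # hypothetical input video
ok, first = cap.read()                         # the "background" frame
background = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)
background = cv2.GaussianBlur(background, (21, 21), 0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (21, 21), 0)

    # Pixel-by-pixel difference against the saved background frame
    diff = cv2.absdiff(background, gray)
    # Pixels that changed "enough" are treated as moving objects
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)

    cv2.imshow("motion mask", mask)
    if cv2.waitKey(30) & 0xFF == 27:           # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```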
To address the issues described above, Background Subtractors come into play. Now it is time to
talk about Background subtraction and subtractors.

Background subtraction (BS) is a common and widely used technique for generating a
foreground mask (namely, a binary image containing the pixels belonging to moving objects in
the scene) by using static cameras.

As the name suggests, BS calculates the foreground mask by performing a subtraction between
the current frame and a background model containing the static part of the scene or, more
generally, everything that can be considered background given the characteristics of the
observed scene.
2. Background Subtraction
What is Background Subtraction?
Background subtraction is a fundamental technique in computer vision for isolating moving
objects from the background in a video stream. By comparing each frame of the video to a
background model, areas of significant difference can be identified as potential foreground
objects. This foreground information can then be used for various purposes, including object
detection and tracking. Background subtraction is often a crucial step in many object tracking
and detection algorithms.

How it solves background problem


In background subtraction, the background image is not constant; it changes over time due to
various factors such as lighting variations, object movements, and scene dynamics. The goal of
background subtraction algorithms is to adaptively model and update the background to
accurately detect foreground objects in changing environments. In this way, the background
problem is solved.
There are several different background subtractors; two common ones are:
● K-Nearest Neighbors (KNN)
● Mixture of Gaussians (MOG2)

Here is how MOG2 works for background subtraction:


1. Initialization: Initialize a mixture of K Gaussian distributions to model the background of
the scene. Each pixel's background model is represented by a mixture of Gaussians, with
K being a predefined parameter.
2. Adaptation: Update the background model for each pixel over time, adjusting the
parameters of the Gaussian distributions to adapt to changes in the scene.

(Figure: background model)

3. Foreground Detection: Compute the probability of each pixel belonging to the
background based on the Gaussian mixture model. Pixels with low probabilities are
classified as foreground.
4. Update Background: For pixels classified as background, update the Gaussian
distributions to incorporate new observations and adapt to changes in the scene.
5. Post-processing: Apply morphological operations (erosion, dilation, etc.) or other
techniques to refine the foreground mask and remove noise.
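OpenCV provides both subtractors listed above. A minimal usage sketch for MOG2 is shown below (the input file traffic.mp4 and the history/varThreshold values are assumptions for illustration):

```python
import cv2

cap = cv2.VideoCapture("traffic.mp4")      # hypothetical input video

# MOG2 background subtractor: models each pixel as a mixture of Gaussians
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500, varThreshold=16, detectShadows=True)
# For KNN, cv2.createBackgroundSubtractorKNN() can be used instead.

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # apply() both updates the background model and returns the foreground mask
    fg_mask = subtractor.apply(frame)
    # Post-processing: morphological opening to remove small noise blobs
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)

    cv2.imshow("foreground mask", fg_mask)
    if cv2.waitKey(30) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
```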

Gaussian Mixture Modeling (GMM):


Gaussian Mixture Modeling is the method of modeling data as a weighted sum of Gaussians.
GMMs are widely used to cluster data, where each point in the n-dimensional feature space gets
associated with each of the clusters with a certain probability, unlike in k-means clustering,
where a point in the feature space gets associated with only a single cluster.
Each of these clusters is parameterized by the cluster mean (μ), the covariance (Σ) and the weight
(π), so that the mixture density is

p(x) = Σ_k π_k N(x | μ_k, Σ_k),   with Σ_k π_k = 1.

A couple of articles on Gaussian Mixture Modeling are provided in the reference section. It is
recommended to read those articles to gain a better understanding of how we will leverage
GMMs for background modeling.
Background Modeling:
Background modeling is the task of extracting the static background from a sequence of
video frames. Once the background has been modeled, a technique called background
subtraction, which allows an image's foreground to be extracted for further processing (object
recognition, etc.), is generally used. Hence, background modeling forms an integral part of
foreground extraction and analysis.
As shown in the video below, the background consists of the road which gets hidden
occasionally owing to variations in the foreground caused by the moving vehicles.


The vehicles constitute the foreground here and their dynamic nature accounts for the variation
in intensity levels of points on the road. The end result of this exercise would be an image
(frame) where there will be no vehicles, i.e., an image devoid of the dynamic foreground.
We are going to model each point in space for all the three image channels, namely R, G and B
as a bimodal distribution of Gaussians, where one Gaussian in the mixture accounts for the
background and the other for the foreground.
Algorithm:
The step-wise approach is as follows:
1. Extract frames from the video.
2. Stack the frames in an array where the final array dimensions will be (num_frames,
image_width, image_height, num_channels)
3. Initialize a dummy background image of the same size as that of the individual
frames.
4. For each point characterized by the x-coordinate, the y-coordinate and the channel,
model the intensity value across all the frames as a mixture of two Gaussians.
5. Once modeled, initialize the intensity value at the corresponding location in the
dummy background image with the mean of the most weighted cluster. The most
weighted cluster will be the one coming from the background whereas owing to the
dynamically changing and sparse nature of the foreground, the other cluster will be
weighted less.
6. Finally, the background image will contain the intensity values corresponding to the
static background.
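A minimal sketch of steps 2 to 6 is shown below, using scikit-learn's GaussianMixture as the per-pixel bimodal model (the choice of library is an assumption; fitting one GMM per pixel and channel is slow and purely illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_background(frames):
    """frames: array of shape (num_frames, height, width, channels), dtype uint8."""
    num_frames, h, w, c = frames.shape
    background = np.zeros((h, w, c), dtype=np.uint8)

    for y in range(h):
        for x in range(w):
            for ch in range(c):
                # Intensity of this (x, y, channel) point across all frames
                samples = frames[:, y, x, ch].reshape(-1, 1).astype(np.float64)
                # Bimodal mixture: one Gaussian for background, one for foreground
                gmm = GaussianMixture(n_components=2).fit(samples)
                # The most weighted cluster is taken as the static background
                dominant = np.argmax(gmm.weights_)
                background[y, x, ch] = np.clip(gmm.means_[dominant, 0], 0, 255)
    return background
```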
Optical Flow:
Put simply, optical flow is the movement of pixels over time. The goal of optical flow is to
generate a motion vector for each pixel in an image between t0 and t1 by looking at two images
I0 and I1. By computing a motion vector field between each successive frame in a video, we can
track the flow of objects, or, more accurately, "brightness patterns" over extended periods of
time. However, it is important to note that while optical flow aims to represent the motion of
image patterns, it is limited to representing the apparent motion of these patterns.
This nuanced difference is explained in more depth in the Assumptions and Limitations section
further below.
Two types of flow can be seen in images/videos:

● Optical flow is the apparent motion of brightness patterns in the image. It’s the 2D
projection of the physical movement of points relative to the observer. A common
assumption is brightness constancy: the brightness of a point in the image does not
change over time. Algorithms for optical flow estimation analyze the pixel intensity
patterns between frames to determine their displacement. By observing how these
intensities change, the algorithm infers the motion of objects in the scene.
1. Optical flow is the motion of each brightness pattern.
2. The Lucas-Kanade algorithm is the most popular algorithm for computing optical
flow.
3. This method assumes that motion is constant within a local neighborhood and
computes optical flow using spatial gradients.
4. It's efficient and works well for small motion.
5. Optical flow has various applications in computer vision, including object
tracking, video recognition, and 3D reconstruction.
6. It's like watching a movie and noticing how things move from one frame to the
next.
7. It's all about tracking the movement of bright spots in an image.
8. It assumes that the brightness of a point in an image doesn't change over time.
9. It uses changes in brightness to figure out how things are moving.
10. It's like watching a bunch of dots move around on a screen.
11. It's used in things like video games and special effects to make things look more
realistic.
● Motion Field: In computer vision, the motion field represents the 3D motion as it is
projected onto an image plane. The primary goal of object tracking is to estimate the
motion field. The motion field represents the motion of each pixel in the frame. The
motion field cannot be measured directly. In object tracking, the motion field can be
measured in some particular condition by estimating the optical flow.
1. The motion field cannot be correctly measured for all image points.
2. The motion field is superficially similar to a dense motion field derived from the
techniques of motion estimation.
3. The motion field is used in estimating the three-dimensional nature and structure
of the scene.
4. It is also used in estimating the 3D motion of objects and the observer relative to
the scene.
5. A major interest of motion analysis is to estimate 3D motion.
6. It's like having a map that shows you how everything in a scene is moving.
7. It's all about figuring out the 3D motion of objects.
8. It's used in things like robotics and self-driving cars to understand the world
around them.
9. It can't be measured directly, so it has to be estimated.
10. It's like having a bunch of arrows that show you how everything in a scene is
moving.

In what conditions Optical Flow and the Motion Field are equal:

○ Ideally, the optical flow would be the same as the motion field. This is because
both are trying to estimate the same thing: the motion of objects in the scene.
○ In a perfect scenario where there is no noise, occlusion, or other visual artifacts,
and the scene is fully textured, the optical flow and motion field would be equal.

In what conditions Optical Flow and the Motion Field are not equal:

○ Unfortunately, in real-world scenarios, the optical flow does not always correspond to the
motion field.
○ For example, consider a stationary sphere made of a single uniform material, illuminated
by a light source that moves around it. As the light source moves, we observe shading
changes and brightness patterns moving in the images. Hence, there is a non-zero optical
flow but no motion field. (Conversely, if the sphere spins about a vertical axis through its
center under a fixed light, the image does not change at all: there is a motion field but
zero optical flow.)
○ Another example is when there is a lack of texture in the image, which leads to
noisy matches among the many nearly identical pixels.
○ Image intensity noise, specularities in the image which change location from one
camera view to another, and scene points appearing in one image but not another
due to occlusion, can also lead to differences between optical flow and motion
field.
We now derive the optical flow constraint equation and then develop an algorithm, applied at
each point, that uses this constraint equation to estimate the optical flow. So here is our scenario.

Figure: Let's say we have two images of a scene taken in quick succession; the scene in this case
is a bird in flight. You have an image taken at time T and another taken at time T + ΔT, where ΔT
is small. Now consider one small window in these two images, and let's assume its location is
(x, y). At time T + ΔT, that point has moved to a new location (x + Δx, y + Δy), so the
displacement of the point (x, y) is (Δx, Δy).
If you take Δx/ΔT and Δy/ΔT, we essentially have the speed of the point in the x and y directions,
which we will call u and v. That is the optical flow corresponding to the point; (u, v) is what we
want to measure.
In order to solve this problem, the following assumptions are required:

● The first assumption is that the brightness of an image point remains constant over time.
● The second assumption is that the spatial displacement (Δx, Δy) and the time step ΔT are
very small. That has to be the case for us to be able to derive a constraint equation.
Taylor Series Expansion

● The Taylor series expansion is used to make a linear approximation of the function
representing image brightness.
○ In the context of small spatial and time displacements, it allows the expansion to be
modeled using just the first-order terms.

Optical Flow Constraint Equation

● The optical flow constraint equation is derived by subtracting and manipulating the
equations based on the two assumptions. The equation states that

Ix * u + Iy * v + It = 0,

where u and v are the components of optical flow and Ix, Iy, It are the partial derivatives of
image brightness with respect to x, y and t.
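As a brief derivation sketch (reconstructed from the two assumptions above, not reproduced from the original figure), written in LaTeX:

```latex
\begin{align*}
% Brightness constancy between t and t + \Delta t:
I(x + \Delta x,\; y + \Delta y,\; t + \Delta t) &= I(x, y, t) \\
% First-order Taylor expansion of the left-hand side (small displacements):
I(x, y, t) + I_x\,\Delta x + I_y\,\Delta y + I_t\,\Delta t &\approx I(x, y, t) \\
% Subtract I(x, y, t), divide by \Delta t, and write u = \Delta x/\Delta t,\; v = \Delta y/\Delta t:
I_x u + I_y v + I_t &= 0
\end{align*}
```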

Geometrical Interpretation of the Optical Flow Constraint Equation

The optical flow constraint equation is represented by the line equation Ix*u + Iy*v + It = 0.
○ This equation forms a straight line in the u-v space.
○ The true optical flow vector lies on this line.

The optical flow estimation problem is under-constrained because the exact location of the
optical flow vector on the line is unknown.
○ The optical flow vector can be split into two components: the normal component
and the parallel component.
■ The normal component is perpendicular to the constraint line.
■ The parallel component is parallel to the constraint line.

Estimation of Optical Flow Components

● The normal flow can be computed using the constraint line equation: it follows from the
unit vector perpendicular to the constraint line and the distance of the line from the origin.
● The parallel flow cannot be computed, as the exact location of the optical flow vector on
the line is unknown.

Under-Constrained Nature of the Optical Flow Problem

● The under-constrained nature of the optical flow problem applies to both algorithms and
humans.
○ When observing the motion of an object through a small aperture, only the normal
flow can be determined.
○ This limitation is known as the aperture problem in the estimation of optical flow.
Solution Approach

● The optical flow constraint equation has two unknowns and only one equation.
○ Additional constraints are used to solve the under-constrained optical flow
problem.

Here are the assumptions and limitations of optical flow:

2.2.1 Apparent Motion


Given a two dimensional image, optical flow can only represent the apparent motion of
brightness patterns, meaning that the movement vectors of optical flow can be the result of a
variety of actions. For instance, variable lighting can cause strong motion vectors on static
objects, and movement into or out of the frame cannot be captured by the 2D motion vectors of
optical flow. One example of an issue poorly dealt
with by optical flow is the aperture problem.

Figure 1: In the aperture problem, the line appears to have moved to the right when viewed only
in the context of the frame, but the true motion of the line was down and to the right. The
aperture problem is a result of optical flow being unable to represent motion along an edge, an
issue that can lead to other errors in motion estimation as well.

2.2.2 Brightness Consistency


As optical flow can only represent apparent motion, to correctly track the motion of points on an
image we must assume that these points remain at the same brightness between frames. This
brightness consistency assumption can be written as

I(x, y, t) = I(x + u(x, y), y + v(x, y), t + 1),

where u(x, y) represents the horizontal motion of a point and v(x, y) represents the vertical
motion.

2.2.3 Small Motion


Optical flow assumes that points do not move very far between consecutive images. This is often
a safe assumption, as videos typically comprise 20+ frames per second, so motion between
individual frames is small. However, in cases where the object is very fast or close to the camera,
this assumption can still prove untrue. To understand why this assumption is necessary, we must
consider the brightness consistency equation defined above. When trying to solve this equation,
it is useful to linearize the right side using a Taylor expansion. This yields

I(x + u, y + v, t + 1) ≈ I(x, y, t) + Ix * u + Iy * v + It.

Linearizing in this way allows us to solve for the u and v motion vectors we want, but in this
case we have only included the first-order Taylor series terms. When motion is large between
frames, these terms do a poor job of capturing the entire motion, thus leading to inaccurate u, v.

2.2.4 Spatial Coherence


Spatial coherence is the assumption that nearby pixels will move together, typically because they
are part of the same object. To see why this assumption is necessary, consider the linearized
equation for optical flow as defined above,

I(x + u, y + v, t + 1) ≈ I(x, y, t) + Ix * u + Iy * v + It,

giving us

Ix * u + Iy * v + It = 0.

Ignoring the meaning of this derivation for the moment, it is clear that we do not have enough
equations to find both u and v at every single pixel. Assuming that pixels move together allows
us to use many more equations with the same [u v], making it possible to solve for the motion of
pixels in this neighborhood.

Lucas-Kanade Technique:

Optical Flow Estimation Problem

● The problem is under-constrained.
● The Lucas-Kanade method assumes that the optical flow in a small neighborhood is constant.

Assumptions for Optical Flow

● The motion field and optical flow are constant within a small neighborhood.
○ All points within a patch move in the same way.
○ For each point within the window, the total derivative of intensity is zero, i.e., the
constraint Ix * u + Iy * v + It = 0 holds.

Solving for Optical Flow

● A system of equations is derived from the constraints for each point.
○ The equations are not linearly dependent, because the spatial derivatives differ from
point to point.
● The equations are stacked and written in matrix form.
○ A least squares solution is used to find the unknown vector for the optical flow.

Recovering the image motion given by (u, v) in the above equation requires at least two
equations per pixel. To achieve this, the Lucas-Kanade technique for image tracking relies on an
additional constraint: spatial coherence. The spatial coherence constraint is applied to a pixel
using a window of size k x k. The assumption is that the neighboring pixels in this window will
have the same (u, v). For example, in a 5x5 window, 25 such equations apply, one per pixel.

This produces an over-constrained system of linear equations of the form A d = b. Using a least
squares method for solving over-constrained systems, we reduce the problem to solving for d in
(A^T A) d = A^T b. More explicitly, the system to solve is reduced to

[ Σ IxIx   Σ IxIy ] [u]     [ Σ IxIt ]
[ Σ IxIy   Σ IyIy ] [v]  = -[ Σ IyIt ],

where the sums are taken over all pixels in the window.
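A minimal NumPy sketch of this per-window least squares solve (an illustrative implementation, not the author's code; the derivative filters, the window size k and the eigenvalue threshold are assumptions):

```python
import numpy as np

def lucas_kanade_window(I1, I2, y, x, k=5):
    """Estimate the flow (u, v) for the k x k window centered at (y, x).

    I1, I2: consecutive grayscale frames as float arrays of the same shape.
    """
    r = k // 2
    # Spatial derivatives (simple central differences) and temporal derivative
    Ix = (np.roll(I1, -1, axis=1) - np.roll(I1, 1, axis=1)) / 2.0
    Iy = (np.roll(I1, -1, axis=0) - np.roll(I1, 1, axis=0)) / 2.0
    It = I2 - I1

    # Stack the constraint Ix*u + Iy*v + It = 0 for every pixel in the window
    ix = Ix[y - r:y + r + 1, x - r:x + r + 1].ravel()
    iy = Iy[y - r:y + r + 1, x - r:x + r + 1].ravel()
    it = It[y - r:y + r + 1, x - r:x + r + 1].ravel()
    A = np.stack([ix, iy], axis=1)         # shape (k*k, 2)
    b = -it                                # shape (k*k,)

    # Check that A^T A is well-conditioned (both eigenvalues large enough)
    ATA = A.T @ A
    eigvals = np.linalg.eigvalsh(ATA)
    if eigvals.min() < 1e-2:
        return None                        # flat region or aperture problem

    # Least squares solution of (A^T A) d = A^T b
    u, v = np.linalg.solve(ATA, A.T @ b)
    return u, v
```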
Conditions for Working
3.1 Condition for an Existing Solution:
In order to solve the system the following conditions should hold:

- A^T A should be invertible.
- A^T A should not be too small due to noise: the eigenvalues λ1 and λ2 of A^T A should not be
  too small.
- A^T A should be well-conditioned, i.e., λ1/λ2 should not be too large (for λ1 > λ2).

Figure 2: Conditions for a solvable matrix A may be interpreted as different edge regions
depending on the relation between λ1 and λ2. Corner regions produce more optimal conditions.

Using this interpretation, it is apparent that an ideal region for Lucas-Kanade optical flow
estimation is a corner. Visually, if λ1 and λ2 are too small, this means the region is too "flat".
If λ1 >> λ2, the method suffers from the aperture problem, and may fail to solve for correct
optical flow.
Figure 3: Example of regions with large 𝜆1 and small 𝜆2 (left), small 𝜆1 and small 𝜆2 (center,
low texture region), large 𝜆1 and large 𝜆2 (right, high texture region)
3.3 Error in Lucas-Kanade:
The Lucas-Kanade method is constrained under the assumptions of optical flow. Supposing that
A is easily invertible and that there is not much noise in the image, errors may still arise
when:
● Brightness constancy is not satisfied, meaning that a pixel may change intensity from
different time steps.
● The motion is not small or does not change gradually over time.
● Spatial coherence is not satisfied, meaning neighboring pixels do not move alike.
This may arise due to an inappropriately sized window (choosing bad k).
A core assumption of Lucas-Kanade was that there would be small motion of points between
consecutive frames. This assumption causes the algorithm to fall apart when dealing with large
motion:

Notice in the graphic above, Lucas-Kanade can't find a consistent vector for the flow of the tree
trunk. In order to correct for this, we can apply a tactic where we apply Lucas-Kanade iteratively
to a lower-resolution version of the image, similar to how we created image pyramids for our
sliding-window feature detector.

Manifestation in Real Images

● A textureless patch in the scene leads to small spatial gradients.
○ Visualize the condition number by fitting an ellipse to the points in the gradient space.
○ Small eigenvalues result in unreliable optical flow computation.
● An edge in the image has strong gradients in one direction and weak gradients in the other.
○ One eigenvalue is significantly larger than the other.
○ The aperture problem arises.
● Good case: rich texture, with large eigenvalues in the matrix A^T A.
○ Reliable optical flow computation.
Conclusion of KLT

● Reliable optical flow computation depends on the conditioning of A^T A.
● Textureless and edge regions lead to unreliable optical flow.
● Rich texture results in reliable optical flow estimation.

Coarse to fine Optical Flow Estimation:

What happens if you have large motion between consecutive images? Consider this case: you
have two images taken in quick succession.

Let's assume that in this case the camera is moving, and that is the cause of the motion. Because
the tree is close to the camera, its motion is going to be substantial under perspective projection,
maybe by tens of pixels. Here the simple linear optical flow constraint equation is not valid
anymore. So what do we do in this case? There is a simple trick that we can play here, which is
to use what is called a resolution pyramid.

Let's say that these are the images given to us, taken at time T and T + ΔT, and let's say that the
resolution of each image is N x N. We need to compute lower-resolution versions of these two
images; for example, we can compute images of resolution N/2 x N/2.

Maybe you come down to N/8 x N/8, a very small number of pixels, but at some point all
motions are going to be less than a pixel or so in magnitude. If that is the case, then at that very
low resolution your optical flow constraint equation becomes valid again. That is the key
observation.

Here is the step-by-step procedure for going from coarse to fine estimation of optical flow:

1. Start with the Lowest Resolution: Begin with the smallest or lowest-resolution images.
2. Apply Optical Flow Algorithm: Apply an optical flow algorithm, such as the Lucas-Kanade
algorithm, to these low-resolution images to get the initial optical flow.
3. Warp the Image: Use the computed optical flow to warp the image in the next higher
resolution. This involves pushing the pixels around in the image to compute a new image,
which is a warped version of the original.
4. Compute Optical Flow Between Warped and Original Image: Now, compute the
optical flow between these two images. This is possible because these two images have
motions that are very small with respect to each other, as most of the large motions have
been taken care of by applying the warp.
5. Add Residual Flow: Add the newly computed optical flow (residual flow) to the
previous one. This gives a new flow.
6. Repeat for Each Level of Resolution: Use this new flow to warp the next level of
resolution. Repeat this process until you finally arrive at the highest resolution images.
7. Final Flow: The final flow is obtained at the highest resolution, giving the optical flow at
every pixel in the image.

This method works well because it propagates information from lower resolutions to higher
resolutions while always ensuring that the optical flow constraint equation remains valid.
When we try to find the flow vector, the small motion condition is fulfilled, as the downsampled
pixels move less from frame to consecutive frame than pixels in the higher resolution image.
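OpenCV's pyramidal Lucas-Kanade tracker implements this coarse-to-fine scheme. A minimal usage sketch, assuming a hypothetical video traffic.mp4 (winSize, maxLevel and the feature parameters are illustrative choices):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("traffic.mp4")                 # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Pick corner-like points, which (as discussed above) are the reliable regions
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                             qualityLevel=0.01, minDistance=7)

lk_params = dict(winSize=(21, 21),       # the k x k spatial-coherence window
                 maxLevel=3,             # number of pyramid levels (coarse to fine)
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade: flow is estimated coarse-to-fine, refined at each level
    p1, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None, **lk_params)
    good_new = p1[status.ravel() == 1]
    good_old = p0[status.ravel() == 1]

    for new, old in zip(good_new, good_old):
        x1, y1 = new.ravel()
        x0, y0 = old.ravel()
        cv2.line(frame, (int(x0), int(y0)), (int(x1), int(y1)), (0, 255, 0), 2)

    cv2.imshow("tracks", frame)
    if cv2.waitKey(30) & 0xFF == 27:
        break
    prev_gray = gray
    p0 = good_new.reshape(-1, 1, 2)

cap.release()
cv2.destroyAllWindows()
```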

Here are some key points for Coarse to fine Optical Flow:

1. Optical Flow Results: Optical flow can handle scenes with varying textures and
motions. Each vector in the optical flow reveals the speed and direction of the motion of
the point.
2. Consistency with Motion: Optical flow vectors are consistent with the motion in the
scene, even in cases of rotation.
3. Template Matching Approach: Another approach to compute optical flow is template
matching. This involves using a small window in one image as a template and searching
for the best match in another image.
4. Search Space: The search space needs to be large enough to account for the unknown
magnitude of motion.
5. Optical Flow Vector: The difference between the locations of the two windows is the
optical flow vector.
6. Challenges with Template Matching: This approach can be slow, especially with large
search spaces, and it needs to be repeated for every pixel in the image. It can also lead to
mismatches, as it doesn’t use local image derivatives.
7. Advantages of Using Local Derivatives: Using local derivatives to compute optical
flow, as done earlier, has major advantages. It can prevent mismatches that might occur
when appearances match but don’t correspond to the optical flow.
Motion Illusion: Motion illusion, also known as “apparent motion” or “illusory motion”, is
an optical illusion in which a static image appears to be moving due to the cognitive effects of
interacting color contrasts, object shapes, and position. It’s a fascinating aspect of visual
perception and plays a significant role in our understanding of the world around us.
In the context of computer vision, motion illusion can be seen as a challenge or an opportunity.
For instance, it can be a challenge when trying to analyze video data, as the perceived motion
may not align with the actual motion in the scene. On the other hand, understanding the
principles behind motion illusions can help in designing more effective algorithms for motion
detection and analysis.

Motion is a perceptual attribute: the visual system infers motion from the changing pattern of
light in the retinal image. Often the inference is correct. Sometimes it is not. In class I showed
you a number of demonstrations in which motion is misperceived. Below is one example of a
visual illusion of motion that I made. It is a tribute to Duchamp's cubist painting titled "Nude
Descending a Staircase" in which the changing pattern of light gives the illusion of motion even
though she never gets anywhere (you may need to double-click on the image below or reload
the page for the animation to play).
Another example is the motion aftereffect. Stare at the center of the following animation for
about a minute, as it expands continuously (you may need to reload the page to get it moving
again after it stops), then fix your gaze on the colorful texture pattern next to it.

After viewing continuous motion in the same direction for a long time, if you look at a stationary
object, it appears to move in the direction opposite to the one you were viewing.
This is sometimes called the "waterfall illusion" - if you look at a waterfall for a while, then look
at a tree next to it, the tree appears to move upward. The demonstration above shows that this
adaptation is local in the retina (to the right of where you were looking, you were adapting to
rightward motion, to the left you adapted to leftward, and so on). We take this as evidence for the
existence of neurons that are sensitive to motion and selective for the direction of motion, which
adapt to the stimulus (analogous to color adaptation after-effects).

Below is yet another example of a motion illusion.


Role of motion perception: Motion perception serves lots of helpful functions.

● Simply detecting that something is moving draws your attention to it.
● Segmentation of foreground from background.
● Compute the 3D shape of an object.
● Compute the distance to various objects in the scene and estimate the direction in which
you are heading within the scene. For example, hold up two fingers (one on each hand)
at different distances, and move your head slowly from side to side while fixating an
object on a far wall. Things that are further away slide across the retina more slowly.
When there is strong motion on your retina, especially in peripheral regions, you can
misattribute that motion and perceive yourself as moving (called "vection"). Movies
(especially with large screens as in an IMAX theater) can give this illusion that you are
moving.
● Recognizing actions, such as movements of a human (in the "point light displays" shown
in class of people walking, dancing, etc., displayed as the motion of a small number of
dots attached to the joints of the person).

Here are some key points about motion illusion:


1. Continuity Illusion: This is a phenomenon where our brain perceives a sequence of
quick flashes as continuous, smooth motion. It’s a fundamental aspect of how all
mammals, from humans to rats, perceive the dynamic world around them.
2. Flicker Fusion Frequency (FFF): The speed at which flashes must occur for our brain
to see them as constant rather than flickering is known as the Flicker Fusion Frequency
(FFF) threshold. This threshold varies among animals.
3. Illusory Motion: Illusory motion illusions create the perception of movement in static
images. They can make objects appear to rotate, vibrate, or pulsate, even though they are
not actually moving.
4. Vection: When there is strong motion on your retina, especially in peripheral regions,
you can misattribute that motion and perceive yourself as moving. This is called
“vection”. Movies, especially with large screens as in an IMAX theater, can give this
illusion that you are moving.

Optical illusions are fascinating phenomena that play tricks on our eyes and deceive our
perception of reality. These visual illusions occur when our brain interprets the information
received by our eyes in a way that does not match the physical reality. Optical illusions can be
found in various forms, such as geometric illusions, ambiguous illusions, and cognitive illusions.
They often challenge our understanding of depth, size, color, and motion. These illusions have
been studied by psychologists and neuroscientists to gain insights into how our brain processes
visual information. Now, let’s take a look at some key takeaways about optical illusions in the
table below:

Type of Optical Illusion | Description

Geometric Illusions | These illusions involve geometric shapes that appear distorted or misaligned.

Ambiguous Illusions | These illusions can be interpreted in more than one way, leading to confusion or uncertainty.

Cognitive Illusions | These illusions exploit our cognitive processes, such as memory and attention, to create perceptual distortions.

Depth Illusions | These illusions create an illusion of depth or three-dimensionality in a two-dimensional image.

Motion Illusions | These illusions give the perception of movement or motion in a static image.

Understanding Optical Illusions

Optical illusions are fascinating phenomena that play tricks on our visual perception. They occur
when our brain processes visual stimuli in a way that creates an illusionary effect, causing us to
see objects or images differently than they actually are. These illusions challenge our perception
of reality and provide insights into the workings of our visual system and cognitive psychology.

Definition Of Optical Illusions

Optical illusions can be defined as perceptual distortions that occur when our brain processes
visual information in a way that deviates from the objective reality of the stimulus. They can
occur due to various factors, such as the way our eyes perceive color, depth perception, or the
brain’s tendency to fill in missing information. Optical illusions can be created using a
combination of lines, shapes, colors, and patterns to trick our brain into perceiving something
that is not actually there.

The Science Behind Optical Illusions

The science behind optical illusions lies in the complex interplay between our visual system and
the brain. Our eyes receive visual stimuli, which are then processed by the brain to create our
subjective perception of the world around us. However, this process is not always accurate, and
our brain can be easily fooled by certain visual cues.

One of the key factors that contribute to optical illusions is the brain’s tendency to perceive
patterns and organize visual information. This perceptual organization can sometimes lead to
misinterpretations of visual stimuli, resulting in illusions. Additionally, our brain relies on past
experiences and expectations to interpret what we see, which can further contribute to the
creation of illusions.

Types Of Optical Illusions

There are various types of optical illusions that can be categorized based on the specific visual
effects they create. Some common types of optical illusions include:
1. Ambiguous Figures: These illusions involve images or patterns that can be
interpreted in more than one way, leading to a perceptual flip-flop between
different interpretations.
2. Motion Illusions: These illusions create a sense of movement or motion where
there is none. They can make static images appear as if they are moving or create
the illusion of objects changing their position.
3. Geometric Illusions: Geometric illusions involve the use of geometric shapes and
patterns to create distortions in size, length, or angles. These illusions can make
objects appear larger or smaller than they actually are.
4. Illusionary Contours: Illusionary contours are perceived edges or boundaries that
are not actually present in the stimulus. Our brain fills in the missing information
to create the illusion of contours or shapes.
5. Illusory Motion: Illusory motion illusions create the perception of movement in
static images. They can make objects appear to rotate, vibrate, or pulsate, even
though they are not actually moving.
6. Illusory Patterns: Illusory patterns involve the creation of patterns or textures that
are not actually present in the stimulus. These patterns can trick our brain into
perceiving regularity or organization where there is none.

These are just a few examples of the many types of optical illusions that exist.

Motion Estimation:
An important task in both human and computer vision is to model how images (and the
underlying scene) change over time. Our visual input is constantly moving, even when the world
is static. Motion tells us how objects move in the world, and how we move relative to the scene.
It is an important grouping cue that lets us discover new objects. It also tells us about the three-
dimensional (3D) structure of the scene.

Look around you and write down how many things are moving and what they are doing. Take
note of the things that are moving because you interact with them (such as this book or your
computer) and the things that move independently of you.

The first observation you might make is that not much is happening. Nothing really moves. Most
of the world is remarkably static, and when something moves it attracts our attention. However,
motion perception becomes extremely powerful as soon as the world starts to move. Our visual
system can form a detailed representation of moving objects with complex shapes. Even in front
of a static image, we form a representation of the dynamics of an object, as shown in the
accompanying photograph.
Looking at the power of that static image to convey motion, one wonders if seeing movies is
really necessary. From the notes you took about what moves around you, probably you deduced
that the world is, most of the time, static.

And yet, biological systems need motion signals to learn. Hubel and Wiesel (1982)
observed that a paralyzed kitten was not capable of developing its visual system properly. The
human eye is constantly moving with saccades and microsaccades. Even when the world is
static, the eye is a moving camera that explores the world. Motion tells us about the temporal
evolution of a 3D scene, and is important for predicting events, perceiving physics, and
recognizing actions. Motion allows us to segment objects from the static background, understand
events, and predict what will happen next. Motion is also an important grouping cue that our
visual system uses to understand what parts of the image are connected. Similarly moving scene
points are likely to belong to the same object. For example, the movement of a shadow
accompanying an object, or various parts of a scene moving in unison—even when the
connecting mechanism is concealed—strongly suggests that they are physically linked and form
a single entity.

Motion estimation between two frames in a sequence is closely related to disparity estimation in
stereo images. A key difference is that stereo images incorporate additional constraints, as only
the camera moves—imagine a stereo pair as a sequence with a moving camera while everything
else remains static. The displacements between stereo images respect the epipolar constraint,
which allows the estimated motions to be more robust. In contrast, optical flow estimation
doesn’t assume a static world.

Optical Flow and Motion Estimation: One way of representing the motion is by computing the
displacement for each pixel between the two frames. Under this formulation, the task of motion
estimation consists of finding, for each pixel in frame 1, the location of the corresponding pixel
in frame 2.
Template Matching Approach: Another approach to compute optical flow is template
matching. This involves using a small window in one image as a template and searching for the
best match in another image.

Challenges with Template Matching: This approach can be slow, especially with large search
spaces, and it needs to be repeated for every pixel in the image. It can also lead to mismatches, as
it doesn’t use local image derivatives.

Motion Perception in the Human Visual System:

● Eye Movement and Fixation: The eye moves and fixates on different scene locations every
300 ms, indicating that motion perception is a crucial aspect of visual perception. Smooth
tracking is possible only when following a moving object.
● Motion Illusions: The
human brain can be tricked by motion illusions, such as the waterfall illusion and
Rotating Snakes illusion, which demonstrate the brain’s adaptation to motion and its
ability to perceive motion from static images.
● Visual Cortex Processing: In the visual cortex, particularly in area V1, neurons are
selective to motion direction, with areas MT and MST playing significant roles in motion
processing, suggesting a modular architecture for the visual system.
● Computational Models: Early models like the Hassenstein-Reichardt and Adelson-
Bergen energy model have influenced the understanding of motion perception, with a
focus on motion estimation algorithms in computer vision.

Matching Based Motion Estimation:



1. Objective: Imagine watching a video of a busy street with moving cars. We want to
figure out how each pixel in the video moves between two frames.
2. Pixel Representation: Instead of just looking at individual pixel colors, we group pixels
together into small patches. Each patch represents the local appearance around a pixel.
3. Patch Size: The size of these patches matters. Smaller patches might miss important
details, while larger ones could cause problems. It’s like zooming in or out on a picture.
4. Matching Constraint: We assume that the motion between frames is small. So, we only
look for matching patches nearby in the second frame.

5. Distance Calculation: We measure how different patches are using the Euclidean distance.
It's like comparing colors or shapes (see the sketch after this list).


6. Motion Computation: For each pixel, we find the best matching patch in the second
frame. This helps us figure out how things move.
7. Algorithm Parameters: We use specific settings to make this work.
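Here is the sketch referenced above: a minimal NumPy block-matching routine for one pixel (the patch size and search radius are illustrative assumptions, not values from the original text):

```python
import numpy as np

def match_patch(frame1, frame2, y, x, patch=7, search=10):
    """Find the displacement (dy, dx) of the patch centered at (y, x) in frame1
    by exhaustively comparing it with nearby patches in frame2 (Euclidean distance)."""
    r = patch // 2
    template = frame1[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64)

    best_dist, best_motion = np.inf, (0, 0)
    # Matching constraint: motion is assumed small, so search only nearby
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            candidate = frame2[yy - r:yy + r + 1, xx - r:xx + r + 1].astype(np.float64)
            if candidate.shape != template.shape:
                continue                                 # candidate fell outside the image
            dist = np.linalg.norm(template - candidate)  # Euclidean distance between patches
            if dist < best_dist:
                best_dist, best_motion = dist, (dy, dx)
    return best_motion                                   # the motion vector for this pixel
```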

But this approach has many shortcomings:

1. Discrete Motion Assumption and Subpixel Accuracy:


○ Imagine you’re watching a video, and we want to figure out how things move
between frames.
○ Right now, we assume that motion happens in whole steps (like counting
numbers). But sometimes, things move just a tiny bit, like a fraction of a step.
○ To be more precise, we could look at smaller steps (like half-steps or quarter-
steps), but that would take more time and calculations.
2. Problems Near Edges:
○ When we compare patches (like little image pieces), we sometimes struggle near
edges or where things change a lot.
○ Imagine trying to match puzzle pieces near the edge of a picture—it’s tricky!
3. No Assumptions About Objects:
○ We don’t assume anything about what’s moving or how. It’s like solving a
mystery without knowing the characters.
○ Other methods might guess what objects are there, but we keep it open-ended.
4. Motion Helps Grouping:
○ The motion we find can actually help us group things together.
○ It’s like discovering new clues in our mystery—motion becomes one more piece
of evidence.

Do Humans Use Matching to Estimate Motion?

The human visual system does use matching to estimate motion:

● Motion Perception: Our visual system forms detailed representations of moving objects
and uses motion as a cue to segment objects from the background and understand events.
● Computational Models: Early models of motion perception, like those by Hassenstein and
Reichardt, and Adelson and Bergen, suggest that the human visual system uses pattern
matching methods based on image correlations.
● Patch Matching Algorithm: The algorithm described above uses patch matching to
estimate motion by finding correspondences between frames in a video sequence.
● Visual Illusions: Visual illusions, such as the waterfall illusion and motion-inducing
images, provide insights into how motion perception is implemented in the brain.

Motion Estimation Parameters:

•Image pyramid and Image Warping
•Patch-based Motion (Optical Flow)
•Parametric (Global Motion)
•Application: Image Morphing
Image Morphing is a technique in computer vision that creates a smooth, visually appealing
transition from one image to another. It’s like transforming one image into another. You might
have seen this in movies or animations where one face transforms into another.
Image Warping: This is the first step in morphing. It involves distorting the original image
using transformations like scaling (changing size), rotation (turning), and translation (moving).
For example, if you have an image of a rectangle and you want to rotate it by 45 degrees, you
would create a rotation matrix, apply it to the image, and obtain its rotated version.

○ Definition: Image warping involves distorting an image using various transformations like
scaling, rotation, translation, and more. It's also known as geometric transformation.
○ Purpose: We can view the image from different angles by applying these
transformations.
○ Example: Imagine you have an image of a rectangle, and you want to rotate it by
45 degrees. You’d create a rotation matrix, apply it to the image, and obtain its
rotated version. This is a simple example of warping.
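A minimal OpenCV sketch of exactly this rotation example (the file name rectangle.png is a placeholder assumption):

```python
import cv2

img = cv2.imread("rectangle.png")              # hypothetical input image
h, w = img.shape[:2]

# Build a rotation matrix about the image center (45 degrees, no scaling)
M = cv2.getRotationMatrix2D(center=(w / 2, h / 2), angle=45, scale=1.0)

# Apply the warp: every pixel is moved according to the transformation matrix
rotated = cv2.warpAffine(img, M, (w, h))

cv2.imwrite("rectangle_rotated.png", rotated)
```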

Various Examples of Image Warping:

1. Rotation:
○ Definition: Rotation is a linear transformation that rotates vectors about the
origin. When applied to a vector u in R^2, it rotates u by an angle θ anti-
clockwise about the origin.
2. Reflection:
○ Definition: Reflection about a line involves reflecting a vector u in R^2 across a
line passing through the origin, making an angle θ with the x-axis. The head of
the image vector v has the same distance from the origin as u.
3. Scaling:
○ Definition: Scaling changes the dimensions of a shape while preserving its basic
form. For instance, scaling an ellipse by a factor of 0.5 results in a smaller ellipse
that maintains the same proportions.

•Morphing: After warping, we create a smooth transition between the original image and the
warped image. This is the actual morphing process. For example, if we want to transform one
face into another, we first select corresponding features like eyes, nose, and mouth in both
images. Then, we create a smooth transition between these features to create a morphing effect.

○ Definition: Image morphing is a special form of image warping that smoothly transitions
between two or more images. It's like transforming one image into another.
○ Application: You’ve probably seen morphing in movies and animations. For
instance, transforming one face into another face.
○ Process:
■ Select corresponding features (e.g., eyes, nose, mouth) in both images.
■ Create a smooth transition between these features to achieve a morphing
effect.
■ Similar to an age filter, where faces gradually change over time.
The steps involved in image morphing:

1. Subdivision:
○ The initial and target images are subdivided into smaller regions, typically
triangles. These triangles serve as the building blocks for the morphing process.
2. Correspondence Mapping:
○ Create a mapping between corresponding triangles in the initial and final images.
○ Each triangle in the initial image must correspond to one triangle in the final
image.
○ This correspondence ensures that features align correctly during the morphing.
3. Individual Triangle Morphing:
○ Individually morph each triangle from the initial image to its corresponding
triangle in the final image.
○ The morphing process involves smoothly transitioning the vertices of each
triangle.
○ Common techniques include linear interpolation or weighted averages of vertex
positions.
4. Combining Triangles:
○ Combine all the morphed triangles into a single image.
○ The resulting image represents the gradual transformation from the initial to the
final state.
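As a rough sketch of step 3 (morphing a single pair of corresponding triangles), the snippet below warps both source triangles to an interpolated intermediate triangle and cross-dissolves them; the triangle coordinates and the blend factor alpha are assumptions supplied by the caller, and combining all triangles (step 4) is left out:

```python
import cv2
import numpy as np

def morph_triangle(img1, img2, tri1, tri2, alpha):
    """Morph one pair of corresponding triangles.

    img1, img2: source and target images of the same size.
    tri1, tri2: 3x2 float32 arrays of triangle vertices in each image.
    alpha: blend factor in [0, 1] (0 = img1, 1 = img2).
    Returns the morphed triangle composited on a black canvas.
    """
    # Linearly interpolate the vertex positions for the intermediate triangle
    tri = (1 - alpha) * tri1 + alpha * tri2

    # Affine warps mapping each source triangle onto the intermediate triangle
    M1 = cv2.getAffineTransform(np.float32(tri1), np.float32(tri))
    M2 = cv2.getAffineTransform(np.float32(tri2), np.float32(tri))
    h, w = img1.shape[:2]
    warp1 = cv2.warpAffine(img1, M1, (w, h))
    warp2 = cv2.warpAffine(img2, M2, (w, h))

    # Cross-dissolve the two warped images, keeping only the intermediate triangle
    blended = cv2.addWeighted(warp1, 1 - alpha, warp2, alpha, 0)
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.fillConvexPoly(mask, np.int32(tri), 255)
    out = np.zeros_like(img1)
    out[mask == 255] = blended[mask == 255]
    return out
```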

Image morphing is widely used in movies, animations, and creative applications. For instance, it
allows transforming one face into another or creating age filters. Each step in the process
contributes to achieving seamless and visually appealing transitions between images.

Applications of Image Morphing: Some of the main applications are:

1. Face Morphing:
○ Description: Face morphing is perhaps the most well-known application. It
involves transforming one face into another gradually.
○ Use Cases:
■ Age Progression/Regression: Simulate how a person’s face might look as
they age.
■ Celebrity Lookalike Filters: Create fun filters that morph a user’s face to
resemble a famous celebrity.
■ Gender Swapping: Transform a male face into a female face (and vice
versa).
2. Animating Transitions:
○ Description: Image morphing is used in animations to create smooth transitions
between scenes or frames.
○ Use Cases:
■ Scene Transitions: Transition from day to night, or from one location to
another seamlessly.
■ Shape Transformations: Morphing objects (e.g., a car transforming into
a robot).
3. Special Effects in Movies and Games:
○ Description: Image morphing enhances visual effects in movies and video games.
○ Use Cases:
■ Shape-Shifting Characters: Transforming characters (e.g., werewolves,
superheroes) smoothly.
■ Magical Transformations: Wizards, witches, or magical creatures
changing form.
4. Medical Imaging:
○ Description: In medical imaging, morphing helps visualize changes over time or
during treatments.
○ Use Cases:
■ Tumor Growth Visualization: Show how a tumor evolves over weeks or
months.
■ Facial Reconstruction: Morphing CT scans for facial reconstruction after
accidents.
5. Art and Creativity:
○ Description: Artists and designers use image morphing for creative expression.
○ Use Cases:
■ Surreal Art: Combine elements from different images seamlessly.
■ Metamorphosis: Create fantastical creatures by blending features.
6. Virtual Makeup and Plastic Surgery Simulations:
○ Description: Morphing helps visualize how makeup or surgical changes would
appear.
○ Use Cases:
■ Makeup Try-Ons: Show users how different makeup styles would look
on their face.
■ Plastic Surgery Previews: Simulate post-surgery appearance.
7. Evolutionary Biology and Anthropology:
○ Description: Morphing aids in studying evolutionary changes.
○ Use Cases:
■ Facial Evolution: Morphing skulls to understand how human faces
evolved.
■ Species Transitions: Visualize transitions between species.
8. Emotional Expression in Avatars and Chatbots:
○ Description: Morphing avatars or chatbot expressions based on user input.
○ Use Cases:
■ Animated Emojis: Create dynamic emojis that change expressions.
■ Chatbot Emotional Responses: Adjust chatbot avatars to match the
conversation tone.
