Unit 3, Chapter 3
I = f(x, y)
where x and y are points in space and I represents the intensity. f is a mapping of points in space to the corresponding intensity values.
As shown in the figure below, each point on the image is mapped to an intensity value, which is the color that we see at that particular point.
For an 8-bit encoded image, intensity levels can take 2^8 values, i.e., from 0 to 255.
For a grayscale image, each point in space has a single value in the range [0, 255].
For a color image, each point in space takes 3 values, one in the range [0, 255] for each of the 3 color channels: R, G and B.
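This mapping can be made concrete with a short sketch (Python with OpenCV is assumed here, and the file name is hypothetical): an 8-bit grayscale image is a 2-D array of intensities indexed by spatial position, while a color image stores three such values per point.

```python
# A minimal sketch of I = f(x, y), assuming OpenCV; "frame.png" is a hypothetical file name.
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # 2-D array: gray[y, x] is in [0, 255]
color = cv2.imread("frame.png", cv2.IMREAD_COLOR)      # 3-D array: color[y, x] holds (B, G, R)

y, x = 100, 150                                        # an arbitrary point in space
print("grayscale intensity f(x, y):", gray[y, x])      # a single 8-bit value
print("color intensities f(x, y):", color[y, x])       # three 8-bit values (OpenCV uses BGR order)
```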
A video, on the other hand, varies both in space and time, as shown in the equation below:
V = f(x, y, t)
where x and y are points in space, t represents time and V represents the intensity. f is a mapping of points in space and time to the corresponding intensity values.
As shown in the video below, the intensity at a point (on the road) is a function not only of the x and y coordinates of that point but also of time. It is time that determines whether the intensity value is contributed by a vehicle or by the road.
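The same idea extends naturally to code: the sketch below (again assuming OpenCV, with a hypothetical video file name) samples the intensity at one fixed (x, y) location across frames, i.e. across t.

```python
# Sample V = f(x, y, t) at a fixed spatial location across frames.
# A minimal sketch assuming OpenCV; "traffic.mp4" is a hypothetical file name.
import cv2

cap = cv2.VideoCapture("traffic.mp4")
x, y = 320, 240                       # a fixed point in space
intensities = []                      # f(x, y, t) for t = 0, 1, 2, ...

while True:
    ok, frame = cap.read()            # frame at time t
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    intensities.append(int(gray[y, x]))

cap.release()
print(intensities[:10])               # the value varies with t as vehicles pass over the point
```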
By now you should understand the main idea behind basic motion detection. It can be useful if you do not expect high accuracy, but it is certainly not the best approach.
To address the issues described above, background subtractors come into play. It is now time to talk about background subtraction and subtractors.
Background subtraction (BS) is a common and widely used technique for generating a foreground mask (namely, a binary image containing the pixels belonging to moving objects in the scene) using static cameras.
As the name suggests, BS calculates the foreground mask by performing a subtraction between the current frame and a background model containing the static part of the scene or, more generally, everything that can be considered background given the characteristics of the observed scene.
2. Background Subtraction
What is Background Subtraction?
Background subtraction is a fundamental technique in computer vision for isolating moving
objects from the background in a video stream. By comparing each frame of the video to a
background model, areas of significant difference can be identified as potential foreground
objects. This foreground information can then be used for various purposes, including object
detection and tracking. Background subtraction is often a crucial step in many object tracking
and detection algorithms.
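OpenCV provides ready-made background subtractors; the sketch below (assuming OpenCV and a hypothetical video file) uses the MOG2 subtractor, which models each pixel with a mixture of Gaussians, to produce a foreground mask for every frame.

```python
# Foreground mask via OpenCV's Gaussian-mixture background subtractor (MOG2).
# A minimal sketch; "traffic.mp4" is a hypothetical file name.
import cv2

cap = cv2.VideoCapture("traffic.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)          # 0 = background, 255 = foreground, 127 = shadow
    cv2.imshow("foreground mask", fg_mask)
    if cv2.waitKey(30) & 0xFF == 27:           # press Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```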
Background Model
3. Foreground Detection: Compute the probability of each pixel belonging to the background based on the Gaussian mixture model. Pixels with low probabilities are classified as foreground.
4. Update Background: For pixels classified as background, update the Gaussian distributions to incorporate new observations and adapt to changes in the scene.
5. Post-processing: Apply morphological operations (erosion, dilation, etc.) or other techniques to refine the foreground mask and remove noise, as sketched below.
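A possible post-processing step, following item 5 above, is to clean the mask with morphological opening and closing. The snippet assumes a foreground mask like the one produced by a subtractor; a random dummy mask stands in here only so the sketch runs on its own.

```python
# Refine a binary foreground mask with morphological opening and closing.
# A sketch assuming OpenCV and NumPy; fg_mask stands in for a real subtractor output.
import cv2
import numpy as np

fg_mask = (np.random.rand(240, 320) > 0.95).astype(np.uint8) * 255   # dummy noisy mask

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

# Opening (erosion followed by dilation) removes small speckles of noise.
clean = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)

# Closing (dilation followed by erosion) fills small holes inside detected objects.
clean = cv2.morphologyEx(clean, cv2.MORPH_CLOSE, kernel)
```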
A couple of articles on Gaussian mixture modeling are provided in the reference section. It is recommended to read those articles to gain a better understanding of how we will leverage GMMs for background modeling.
Background Modeling:
Background modeling is the task of extracting the static background from a sequence of video frames. Once the background has been modeled, a technique called background subtraction, which allows an image’s foreground to be extracted for further processing (object recognition, etc.), is generally used. Hence, background modeling forms an integral part of foreground extraction and analysis.
As shown in the video below, the background consists of the road which gets hidden
occasionally owing to variations in the foreground caused by the moving vehicles.
The vehicles constitute the foreground here, and their dynamic nature accounts for the variation in intensity levels of points on the road. The end result of this exercise will be an image (frame) in which there are no vehicles, i.e., an image devoid of the dynamic foreground.
We are going to model each point in space, for all three image channels (R, G and B), as a bimodal mixture of Gaussians, where one Gaussian in the mixture accounts for the background and the other for the foreground.
Algorithm:
The step-wise approach is as follows:
1. Extract frames from the video.
2. Stack the frames in an array where the final array dimensions will be (num_frames,
image_width, image_height, num_channels)
3. Initialize a dummy background image of the same size as that of the individual
frames.
4. For each point characterized by the x-coordinate, the y-coordinate and the channel,
model the intensity value across all the frames as a mixture of two Gaussians.
5. Once modeled, initialize the intensity value at the corresponding location in the
dummy background image with the mean of the most weighted cluster. The most
weighted cluster will be the one coming from the background whereas owing to the
dynamically changing and sparse nature of the foreground, the other cluster will be
weighted less.
6. Finally, the background image will contain the intensity values corresponding to the
static background.
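One possible implementation of these steps, assuming NumPy and scikit-learn's GaussianMixture and a frames array shaped as in step 2, is sketched below; fitting a mixture per pixel is slow, and the code is only meant to illustrate steps 4 and 5.

```python
# Per-pixel background estimation with a 2-component Gaussian mixture.
# A sketch assuming scikit-learn; `frames` has shape (num_frames, H, W, 3) as in step 2.
import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_background(frames):
    num_frames, height, width, channels = frames.shape
    background = np.zeros((height, width, channels), dtype=np.uint8)

    for y in range(height):
        for x in range(width):
            for c in range(channels):
                samples = frames[:, y, x, c].reshape(-1, 1).astype(np.float64)
                gmm = GaussianMixture(n_components=2).fit(samples)
                # The heavier-weighted component is assumed to come from the static background.
                background[y, x, c] = gmm.means_[np.argmax(gmm.weights_), 0]
    return background
```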
Optical Flow:
Put simply, optical flow is the movement of pixels over time. The goal of optical flow is to
generate a motion vector for each pixel in an image between t0 and t1 by looking at two images
I0 and I1. By computing a motion vector field between each successive frame in a video, we can
track the flow of objects, or, more accurately, "brightness patterns" over extended periods of
time. However, it is important to note that while optical flow aims to represent the motion of
image patterns, it is limited to representing the apparent motion of these patterns.
This nuanced difference is explained in more depth in the Assumptions and Limitations section further on.
Two types of flow can be seen in images/videos:
● Optical flow is the apparent motion of brightness patterns in the image. It’s the 2D
projection of the physical movement of points relative to the observer. A common
assumption is brightness constancy: the brightness of a point in the image does not
change over time. Algorithms for optical flow estimation analyze the pixel intensity
patterns between frames to determine their displacement. By observing how these
intensities change, the algorithm infers the motion of objects in the scene.
1. Optical flow is the motion of each brightness pattern, which may differ from the motion of the underlying physical points.
2. The Lucas-Kanade algorithm is the most popular algorithm for computing optical flow.
3. This method assumes that motion is constant within a local neighborhood and computes optical flow using spatial gradients.
4. It is efficient and works well for small motion.
5. Optical flow has various applications in computer vision, including object tracking, video recognition and 3D reconstruction; it is also used in video games and special effects to make motion look more realistic.
6. Intuitively, it is like watching a movie and noticing how bright spots move from one frame to the next, under the assumption that their brightness does not change over time.
● Motion Field: In computer vision, the motion field represents the 3D motion as it is projected onto an image plane. The primary goal of object tracking is to estimate the motion field. The motion field represents the motion of each pixel in the frame. The motion field cannot be measured directly; in object tracking, it can be measured under certain conditions by estimating the optical flow.
1. The motion field cannot be correctly measured for all image points, so it has to be estimated.
2. It is superficially similar to a dense motion field derived from the techniques of motion estimation.
3. It is used in estimating the three-dimensional structure of the scene and the 3D motion of objects and of the observer relative to the scene; estimating 3D motion is one of the major interests of motion analysis.
4. Intuitively, it is like a map of arrows showing how everything in the scene is moving, and it is used in areas such as robotics and self-driving cars to understand the surrounding world.
○ Ideally, the optical flow would be the same as the motion field. This is because
both are trying to estimate the same thing: the motion of objects in the scene.
○ In a perfect scenario where there is no noise, occlusion, or other visual artifacts,
and the scene is fully textured, the optical flow and motion field would be equal.
In What Conditions Optical Flow and the Motion Field Are Not Equal:
Figure: Suppose we have two images of a scene taken in quick succession; the scene in this case is a bird in flight. We have an image taken at time t and another taken at time t + Δt, where Δt is small. Now consider one small window in these two images, and assume its location in the first image is (x, y). At time t + Δt, that point has moved to a new location (x + Δx, y + Δy), so the displacement of the point (x, y) is (Δx, Δy).
If we take Δx/Δt and Δy/Δt, we essentially have the speed of the point in the x and y directions, which we call u and v. That is the optical flow corresponding to the point; (u, v) is what we want to measure.
In order to solve this problem, the following assumptions are required: brightness constancy (the intensity of a point does not change between frames) and small motion. Under these assumptions, a first-order Taylor expansion of the brightness constancy equation I(x + Δx, y + Δy, t + Δt) = I(x, y, t) gives the optical flow constraint equation:
Ix * u + Iy * v + It = 0,
where u(x, y) represents the horizontal motion of a point, v(x, y) represents the vertical motion, and Ix, Iy and It are the partial derivatives of the image intensity with respect to x, y and t.
Under-Constrained Nature of the Optical Flow Problem
● The optical flow constraint equation has two unknowns (u, v) and only one equation, so only the component of the flow along the image gradient, the normal flow, can be computed using the constraint line equation.
● Additional constraints are therefore used to solve the under-constrained optical flow problem.
Ignoring the meaning of this derivation for the moment, it is clear that we do not have enough equations to find both u and v at every single pixel. Assuming that neighboring pixels move together allows us to use many more equations with the same [u v], making it possible to solve for the motion of the pixels in this neighborhood.
Lucas-Kanade Technique:
This produces an over-constrained system of linear equations of the form A d = b, where each row of A contains the spatial gradients (Ix, Iy) at one pixel of the neighborhood, d = [u v]^T, and b collects the corresponding -It values. Using a least squares method for solving over-constrained systems, we reduce the problem to solving for d in (A^T A) d = A^T b. More explicitly, the system to solve is reduced to:
[ Σ Ix*Ix   Σ Ix*Iy ] [u]     [ Σ Ix*It ]
[ Σ Ix*Iy   Σ Iy*Iy ] [v]  = -[ Σ Iy*It ]
where the sums run over all pixels in the chosen window.
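As an illustration, a minimal NumPy sketch of this least-squares solve for a single window might look as follows (the gradient approximation and window size are assumptions made for the sketch, not part of the original formulation):

```python
# Lucas-Kanade flow for a single window: solve (A^T A) d = A^T b in a least-squares sense.
# A sketch assuming NumPy; frame1 and frame2 are float grayscale arrays of equal shape,
# and (y, x) is assumed to lie far enough from the image border.
import numpy as np

def lk_flow_at(frame1, frame2, y, x, k=7):
    """Estimate (u, v) in a (2k+1) x (2k+1) window centered at (y, x)."""
    Iy, Ix = np.gradient(frame1)                 # spatial derivatives
    It = frame2 - frame1                         # temporal derivative

    win = (slice(y - k, y + k + 1), slice(x - k, x + k + 1))
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)   # N x 2 matrix of gradients
    b = -It[win].ravel()                                       # length-N vector

    # Least-squares solution of A d = b, i.e. d = (A^T A)^-1 A^T b.
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d                                     # d = [u, v]
```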
Conditions for Working
3.1 Condition for an Existing Solution:
In order to solve the system, the following conditions should hold:
- A^T A should be invertible.
- A^T A should not be too small due to noise, i.e. the eigenvalues λ1 and λ2 of A^T A should not be too small.
- A^T A should be well-conditioned, i.e. λ1/λ2 should not be too large (for λ1 > λ2).
Using this interpretation, it is apparent that an ideal region for Lucas-Kanade optical flow estimation is a corner. Visually, if λ1 and λ2 are both too small, the region is too “flat”. If λ1 >> λ2, the method suffers from the aperture problem and may fail to solve for the correct optical flow.
Figure 3: Example of regions with large 𝜆1 and small 𝜆2 (left), small 𝜆1 and small 𝜆2 (center,
low texture region), large 𝜆1 and large 𝜆2 (right, high texture region)
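This is the same criterion used by corner detectors. As a sketch (assuming OpenCV; the file name and thresholds are illustrative), cv2.goodFeaturesToTrack selects points whose smaller eigenvalue is large, and cv2.cornerMinEigenVal exposes that eigenvalue directly:

```python
# Pick regions where Lucas-Kanade is well-posed (both eigenvalues large).
# A sketch assuming OpenCV and NumPy; "frame.png" is a hypothetical file name.
import cv2
import numpy as np

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# Shi-Tomasi corners: keeps points whose smaller eigenvalue exceeds a quality threshold.
corners = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01, minDistance=10)

# Alternatively, inspect the smaller eigenvalue of the local structure tensor per pixel.
min_eig = cv2.cornerMinEigenVal(gray, blockSize=7)
flat_regions = min_eig < 1e-3        # too "flat" for a reliable flow estimate
```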
3.3 Error in Lucas-Kanade:
The Lucas-Kanade method is constrained by the assumptions of optical flow. Supposing that A^T A is easily invertible and that there is not much noise in the image, errors may still arise when:
● Brightness constancy is not satisfied, meaning that a pixel may change intensity between time steps.
● The motion is not small or does not change gradually over time.
● Spatial coherence is not satisfied, meaning neighboring pixels do not move alike. This may arise due to an inappropriately sized window (choosing a bad window size k).
One of the assumptions behind Lucas-Kanade was that there would be small motion of points between consecutive frames. This assumption causes the algorithm to fall apart when dealing with large motion.
What happens if you have large motion between consecutive images? Consider two images taken in quick succession with a large displacement between them. If you downsample the images, say by a factor of eight in each direction, the motion shrinks to a very small number of pixels; at some resolution, all motions become less than a pixel in magnitude. At that very low resolution, the optical flow constraint equation becomes valid again. That is the key observation behind the coarse-to-fine (pyramidal) approach.
This method works well because it propagates information from lower resolutions to higher resolutions while always ensuring that the optical flow constraint equation remains valid. When we try to find the flow vector, the small motion condition is fulfilled, as the downsampled pixels move less from frame to consecutive frame than pixels in the higher resolution image.
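OpenCV's calcOpticalFlowPyrLK implements this pyramidal, coarse-to-fine Lucas-Kanade scheme; the sketch below (with hypothetical image file names) tracks corner points from one frame to the next.

```python
# Pyramidal (coarse-to-fine) Lucas-Kanade tracking of sparse points.
# A sketch assuming OpenCV; the image file names are hypothetical.
import cv2
import numpy as np

prev = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

# Points worth tracking (corners, where both eigenvalues are large).
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01, minDistance=10)

# maxLevel controls how many coarser pyramid levels are used before refining.
p1, status, err = cv2.calcOpticalFlowPyrLK(
    prev, curr, p0, None,
    winSize=(21, 21), maxLevel=3,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01),
)

flow_vectors = (p1 - p0)[status.ravel() == 1]   # (u, v) for successfully tracked points
```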
Here are some key points for Coarse to fine Optical Flow:
1. Optical Flow Results: Optical flow can handle scenes with varying textures and
motions. Each vector in the optical flow reveals the speed and direction of the motion of
the point.
2. Consistency with Motion: Optical flow vectors are consistent with the motion in the
scene, even in cases of rotation.
3. Template Matching Approach: Another approach to compute optical flow is template matching. This involves using a small window in one image as a template and searching for the best match in another image (see the sketch after this list).
4. Search Space: The search space needs to be large enough to account for the unknown
magnitude of motion.
5. Optical Flow Vector: The difference between the locations of the two windows is the
optical flow vector.
6. Challenges with Template Matching: This approach can be slow, especially with large
search spaces, and it needs to be repeated for every pixel in the image. It can also lead to
mismatches, as it doesn’t use local image derivatives.
7. Advantages of Using Local Derivatives: Using local derivatives to compute optical
flow, as done earlier, has major advantages. It can prevent mismatches that might occur
when appearances match but don’t correspond to the optical flow.
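A sketch of the template matching approach from points 3 to 5 (assuming OpenCV and NumPy; the window and search sizes are illustrative) takes a window around a pixel in the first frame and searches for its best match in the second frame:

```python
# Optical flow at one pixel via template matching (points 3-5 above).
# A sketch assuming OpenCV and NumPy; frame1 and frame2 are grayscale arrays
# and (y, x) is assumed to lie far enough from the image border.
import cv2
import numpy as np

def flow_by_template_matching(frame1, frame2, y, x, half=7, search=20):
    template = frame1[y - half:y + half + 1, x - half:x + half + 1].astype(np.float32)

    # Search space: large enough to cover the unknown magnitude of motion.
    y0, x0 = y - half - search, x - half - search
    region = frame2[y0:y + half + search + 1, x0:x + half + search + 1].astype(np.float32)

    scores = cv2.matchTemplate(region, template, cv2.TM_SQDIFF)
    best_y, best_x = np.unravel_index(np.argmin(scores), scores.shape)

    # The displacement between the two window locations is the optical flow vector.
    u = (x0 + best_x + half) - x
    v = (y0 + best_y + half) - y
    return u, v
```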
Motion Illusion: Motion illusion, also known as “apparent motion” or “illusory motion”, is
an optical illusion in which a static image appears to be moving due to the cognitive effects of
interacting color contrasts, object shapes, and position. It’s a fascinating aspect of visual
perception and plays a significant role in our understanding of the world around us.
In the context of computer vision, motion illusion can be seen as a challenge or an opportunity.
For instance, it can be a challenge when trying to analyze video data, as the perceived motion
may not align with the actual motion in the scene. On the other hand, understanding the
principles behind motion illusions can help in designing more effective algorithms for motion
detection and analysis.
Motion is a perceptual attribute: the visual system infers motion from the changing pattern of
light in the retinal image. Often the inference is correct. Sometimes it is not. In class I showed
you a number of demonstrations in which motion is misperceived. Below is one example of a
visual illusion of motion that I made. It is a tribute to Duchamp's cubist painting titled "Nude
Descending a Staircase" in which the changing pattern of light gives the illusion of motion even
though she never gets anywhere (you may need to double-click on the image below or reload the page for the animation to play).
Another example is the motion aftereffect. Stare at the center of the following animation for
about a minute, as it expands continuously (you may need to reload the page to get it moving
again after it stops), then fix your gaze on the colorful texture pattern next to it.
Optical illusions are fascinating phenomena that play tricks on our eyes and deceive our
perception of reality. These visual illusions occur when our brain interprets the information
received by our eyes in a way that does not match the physical reality. Optical illusions can be
found in various forms, such as geometric illusions, ambiguous illusions, and cognitive illusions.
They often challenge our understanding of depth, size, color, and motion. These illusions have
been studied by psychologists and neuroscientists to gain insights into how our brain processes
visual information. Now, let’s take a look at some key takeaways about optical illusions in the
table below:
Geometric Illusions: These illusions involve geometric shapes that appear distorted or misaligned.
Ambiguous Illusions: These illusions can be interpreted in more than one way, leading to confusion or uncertainty.
Cognitive Illusions: These illusions exploit our cognitive processes, such as memory and attention, to create perceptual distortions.
Optical illusions are fascinating phenomena that play tricks on our visual perception. They occur
when our brain processes visual stimuli in a way that creates an illusionary effect, causing us to
see objects or images differently than they actually are. These illusions challenge our perception
of reality and provide insights into the workings of our visual system and cognitive psychology.
Optical illusions can be defined as perceptual distortions that occur when our brain processes
visual information in a way that deviates from the objective reality of the stimulus. They can
occur due to various factors, such as the way our eyes perceive color, depth perception, or the
brain’s tendency to fill in missing information. Optical illusions can be created using a
combination of lines, shapes, colors, and patterns to trick our brain into perceiving something
that is not actually there.
The science behind optical illusions lies in the complex interplay between our visual system and
the brain. Our eyes receive visual stimuli, which are then processed by the brain to create our
subjective perception of the world around us. However, this process is not always accurate, and
our brain can be easily fooled by certain visual cues.
One of the key factors that contribute to optical illusions is the brain’s tendency to perceive
patterns and organize visual information. This perceptual organization can sometimes lead to
misinterpretations of visual stimuli, resulting in illusions. Additionally, our brain relies on past
experiences and expectations to interpret what we see, which can further contribute to the
creation of illusions.
There are various types of optical illusions that can be categorized based on the specific visual
effects they create. Some common types of optical illusions include:
1. Ambiguous Figures: These illusions involve images or patterns that can be
interpreted in more than one way, leading to a perceptual flip-flop between
different interpretations.
2. Motion Illusions: These illusions create a sense of movement or motion where
there is none. They can make static images appear as if they are moving or create
the illusion of objects changing their position.
3. Geometric Illusions: Geometric illusions involve the use of geometric shapes and
patterns to create distortions in size, length, or angles. These illusions can make
objects appear larger or smaller than they actually are.
4. Illusionary Contours: Illusionary contours are perceived edges or boundaries that
are not actually present in the stimulus. Our brain fills in the missing information
to create the illusion of contours or shapes.
5. Illusory Motion: Illusory motion illusions create the perception of movement in
static images. They can make objects appear to rotate, vibrate, or pulsate, even
though they are not actually moving.
6. Illusory Patterns: Illusory patterns involve the creation of patterns or textures that
are not actually present in the stimulus. These patterns can trick our brain into
perceiving regularity or organization where there is none.
These are just a few examples of the many types of optical illusions that exist.
Motion Estimation:
An important task in both human and computer vision is to model how images (and the
underlying scene) change over time. Our visual input is constantly moving, even when the world
is static. Motion tells us how objects move in the world, and how we move relative to the scene.
It is an important grouping cue that lets us discover new objects. It also tells us about the three-
dimensional (3D) structure of the scene.
Look around you and write down how many things are moving and what they are doing. Take
note of the things that are moving because you interact with them (such as this book or your
computer) and the things that move independently of you.
The first observation you might make is that not much is happening. Nothing really moves. Most
of the world is remarkably static, and when something moves it attracts our attention. However,
motion perception becomes extremely powerful as soon as the world starts to move. Our visual
system can form a detailed representation of moving objects with complex shapes. Even in front
of a static image, we form a representation of the dynamics of an object, as shown in the accompanying photograph.
Looking at the power of that static image to convey motion, one wonders if seeing movies is
really necessary. From the notes you took about what moves around you, probably you deduced
that the world is, most of the time, static.
And yet, biological systems need motion signals to learn. Hubel and Wiesel (Wiesel, 1982)
observed that a paralyzed kitten was not capable of developing its visual system properly. The
human eye is constantly moving with saccades and microsaccades. Even when the world is
static, the eye is a moving camera that explores the world. Motion tells us about the temporal
evolution of a 3D scene, and is important for predicting events, perceiving physics, and
recognizing actions. Motion allows us to segment objects from the static background, understand
events, and predict what will happen next. Motion is also an important grouping cue that our
visual system uses to understand what parts of the image are connected. Similarly moving scene
points are likely to belong to the same object. For example, the movement of a shadow
accompanying an object, or various parts of a scene moving in unison—even when the
connecting mechanism is concealed—strongly suggests that they are physically linked and form
a single entity.
Motion estimation between two frames in a sequence is closely related to disparity estimation in
stereo images. A key difference is that stereo images incorporate additional constraints, as only
the camera moves—imagine a stereo pair as a sequence with a moving camera while everything
else remains static. The displacements between stereo images respect the epipolar constraint,
which allows the estimated motions to be more robust. In contrast, optical flow estimation
doesn’t assume a static world.
Optical Flow and Motion Estimation: One way of representing the motion is by computing the
displacement for each pixel between the two frames. Under this formulation, the task of motion
estimation consists of finding, for each pixel in frame 1, the location of the corresponding pixel
in frame 2.
1. Objective: Imagine watching a video of a busy street with moving cars. We want to
figure out how each pixel in the video moves between two frames.
2. Pixel Representation: Instead of just looking at individual pixel colors, we group pixels
together into small patches. Each patch represents the local appearance around a pixel.
3. Patch Size: The size of these patches matters. Smaller patches might miss important
details, while larger ones could cause problems. It’s like zooming in or out on a picture.
4. Matching Constraint: We assume that the motion between frames is small. So, we only
look for matching patches nearby in the second frame.
5. Distance Calculation: We measure how different patches are using the Euclidean distance between them (a sketch of this patch comparison follows below).
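A minimal NumPy sketch of this patch comparison under the small-motion constraint from point 4 (the patch size and search radius are illustrative assumptions):

```python
# Find the displacement of one patch by minimizing the Euclidean distance
# to nearby patches in the next frame. A sketch assuming NumPy; frame1 and frame2
# are float grayscale arrays, and (y, x) is assumed to be away from the border.
import numpy as np

def patch_displacement(frame1, frame2, y, x, half=4, search=8):
    patch = frame1[y - half:y + half + 1, x - half:x + half + 1]
    best, best_uv = np.inf, (0, 0)

    # Small-motion constraint: only consider candidates within `search` pixels.
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = frame2[y + dy - half:y + dy + half + 1,
                          x + dx - half:x + dx + half + 1]
            dist = np.linalg.norm(patch - cand)     # Euclidean distance between patches
            if dist < best:
                best, best_uv = dist, (dx, dy)
    return best_uv                                  # (u, v) displacement of the patch
```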
● Motion Perception: Our visual system forms detailed representations of moving objects
and uses motion as a cue to segment objects from the background and understand events.
● Computational Models: Early models of motion perception, like those by Hassenstein and
Reichardt, and Adelson and Bergen, suggest that the human visual system uses pattern
matching methods based on image correlations.
● Patch Matching Algorithm: The algorithm described on the page uses patch matching to
estimate motion by finding correspondences between frames in a video sequence.
● Visual Illusions: Visual illusions, such as the waterfall illusion and motion-inducing
images, provide insights into how motion perception is implemented in the brain.
Various Examples of Image Warping:
1. Rotation:
○ Definition: Rotation is a linear transformation that rotates vectors about the origin. When applied to a vector u in R^2, it rotates u by an angle θ anti-clockwise about the origin.
2. Reflection:
○ Definition: Reflection about a line involves reflecting a vector u in R^2 across a line passing through the origin, making an angle θ with the x-axis. The head of the image vector v has the same distance from the origin as u.
3. Scaling:
○ Definition: Scaling changes the dimensions of a shape while preserving its basic form. For instance, scaling an ellipse by a factor of 0.5 results in a smaller ellipse that maintains the same proportions.
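These transformations can be written as 2x2 matrices and applied to an image as a warp. The sketch below (assuming OpenCV and NumPy, with a hypothetical input file) builds the rotation, reflection and scaling matrices described above and warps an image with one of them.

```python
# Rotation, reflection and scaling as 2x2 linear maps, applied as an image warp.
# A sketch assuming OpenCV and NumPy; "input.png" is a hypothetical file name.
import cv2
import numpy as np

theta = np.deg2rad(30)

# Rotation by theta, anti-clockwise about the origin.
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Reflection across a line through the origin at angle theta to the x-axis.
F = np.array([[np.cos(2 * theta),  np.sin(2 * theta)],
              [np.sin(2 * theta), -np.cos(2 * theta)]])

# Uniform scaling by 0.5.
S = 0.5 * np.eye(2)

img = cv2.imread("input.png")
h, w = img.shape[:2]

# warpAffine expects a 2x3 matrix: [linear part | translation].
M = np.hstack([R, np.zeros((2, 1))]).astype(np.float32)
warped = cv2.warpAffine(img, M, (w, h))
```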
•Morphing: After warping, we create a smooth transition between the original image and the
warped image. This is the actual morphing process. For example, if we want to transform one
face into another, we first select corresponding features like eyes, nose, and mouth in both
images. Then, we create a smooth transition between these features to create a morphing effect.
1. Subdivision:
○ The initial and target images are subdivided into smaller regions, typically
triangles. These triangles serve as the building blocks for the morphing process.
2. Correspondence Mapping:
○ Create a mapping between corresponding triangles in the initial and final images.
○ Each triangle in the initial image must correspond to one triangle in the final
image.
○ This correspondence ensures that features align correctly during the morphing.
3. Individual Triangle Morphing:
○ Individually morph each triangle from the initial image to its corresponding
triangle in the final image.
○ The morphing process involves smoothly transitioning the vertices of each
triangle.
○ Common techniques include linear interpolation or weighted averages of vertex
positions.
4. Combining Triangles:
○ Combine all the morphed triangles into a single image.
○ The resulting image represents the gradual transformation from the initial to the
final state.
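A highly simplified sketch of steps 2 to 4 for a single pair of corresponding triangles (assuming OpenCV and NumPy; the images, triangle coordinates and blend factor are illustrative) warps both triangles to an intermediate shape and cross-dissolves them:

```python
# Morph one corresponding triangle pair at blend factor alpha (steps 2-4 above).
# A sketch assuming OpenCV and NumPy; img1/img2 and the triangles are illustrative inputs.
import cv2
import numpy as np

def morph_triangle(img1, img2, tri1, tri2, alpha):
    """tri1, tri2: float32 arrays of shape (3, 2) holding corresponding vertices."""
    # Step 3: linearly interpolate the vertex positions.
    tri_mid = ((1 - alpha) * tri1 + alpha * tri2).astype(np.float32)

    # Affine warps taking each source triangle to the intermediate triangle.
    M1 = cv2.getAffineTransform(tri1, tri_mid)
    M2 = cv2.getAffineTransform(tri2, tri_mid)
    h, w = img1.shape[:2]
    warp1 = cv2.warpAffine(img1, M1, (w, h))
    warp2 = cv2.warpAffine(img2, M2, (w, h))

    # Cross-dissolve the two warped images, keeping only the triangle region.
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.fillConvexPoly(mask, np.int32(tri_mid), 1)
    blended = cv2.addWeighted(warp1, 1 - alpha, warp2, alpha, 0)
    return blended * mask[..., None]      # step 4 combines such triangles into one image
```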
Image morphing is widely used in movies, animations, and creative applications. For instance, it
allows transforming one face into another or creating age filters. Each step in the process
contributes to achieving seamless and visually appealing transitions between images.
1. Face Morphing:
○ Description: Face morphing is perhaps the most well-known application. It
involves transforming one face into another gradually.
○ Use Cases:
■ Age Progression/Regression: Simulate how a person’s face might look as
they age.
■ Celebrity Lookalike Filters: Create fun filters that morph a user’s face to
resemble a famous celebrity.
■ Gender Swapping: Transform a male face into a female face (and vice
versa).
2. Animating Transitions:
○ Description: Image morphing is used in animations to create smooth transitions
between scenes or frames.
○ Use Cases:
■ Scene Transitions: Transition from day to night, or from one location to
another seamlessly.
■ Shape Transformations: Morphing objects (e.g., a car transforming into
a robot).
3. Special Effects in Movies and Games:
○ Description: Image morphing enhances visual effects in movies and video games.
○ Use Cases:
■ Shape-Shifting Characters: Transforming characters (e.g., werewolves,
superheroes) smoothly.
■ Magical Transformations: Wizards, witches, or magical creatures
changing form.
4. Medical Imaging:
○ Description: In medical imaging, morphing helps visualize changes over time or
during treatments.
○ Use Cases:
■ Tumor Growth Visualization: Show how a tumor evolves over weeks or
months.
■ Facial Reconstruction: Morphing CT scans for facial reconstruction after
accidents.
5. Art and Creativity:
○ Description: Artists and designers use image morphing for creative expression.
○ Use Cases:
■ Surreal Art: Combine elements from different images seamlessly.
■ Metamorphosis: Create fantastical creatures by blending features.
6. Virtual Makeup and Plastic Surgery Simulations:
○ Description: Morphing helps visualize how makeup or surgical changes would
appear.
○ Use Cases:
■ Makeup Try-Ons: Show users how different makeup styles would look
on their face.
■ Plastic Surgery Previews: Simulate post-surgery appearance.
7. Evolutionary Biology and Anthropology:
○ Description: Morphing aids in studying evolutionary changes.
○ Use Cases:
■ Facial Evolution: Morphing skulls to understand how human faces
evolved.
■ Species Transitions: Visualize transitions between species.
8. Emotional Expression in Avatars and Chatbots:
○ Description: Morphing avatars or chatbot expressions based on user input.
○ Use Cases:
■ Animated Emojis: Create dynamic emojis that change expressions.
■ Chatbot Emotional Responses: Adjust chatbot avatars to match the
conversation tone.