CV Unit 3
1. Point Detection
Point detection is the process of identifying points of interest in an image. These points often
represent important features like corners or regions of high intensity variation.
1.1 Why is Point Detection Important?
Points are typically used to match and align images. They can also represent landmarks in a
scene, which are useful in applications like 3D reconstruction and image stitching.
1.2 Method for Point Detection: Laplacian of Gaussian (LoG)
The Laplacian of Gaussian (LoG) is commonly used to detect points of interest in an image.
The LoG operator is a combination of a Gaussian filter and the Laplacian operator, designed to
smooth an image and enhance areas of high intensity change.
Steps in LoG:
1. Gaussian Smoothing: Blur the image to reduce noise.
2. Apply Laplacian: The Laplacian operator is applied to detect areas of rapid intensity changes
(edges, points, etc.).
3. Thresholding: Points are identified based on a set threshold of intensity values.
Example: In a simple black and white checkerboard image, the intersection of two different
colored squares (a corner) would be detected as a point of interest using LoG.
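A minimal sketch of these three steps with OpenCV and NumPy (the filename, kernel sizes, and the 0.8 threshold factor are illustrative assumptions, not fixed choices):

import cv2
import numpy as np

img = cv2.imread("checkerboard.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
blurred = cv2.GaussianBlur(img, (5, 5), 1.0)                # 1. Gaussian smoothing
log = cv2.Laplacian(blurred, cv2.CV_64F, ksize=3)           # 2. Laplacian of the smoothed image
points = np.abs(log) > 0.8 * np.abs(log).max()              # 3. threshold strong responses

Here points is a boolean mask marking candidate interest points, such as the checkerboard corners.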
2. Line Detection
Lines represent continuous, elongated features in an image. Detecting lines helps identify
structures such as road markings, building edges, or pathways.
2.1 Why is Line Detection Important?
Line detection is useful in various applications like lane detection in autonomous vehicles,
contour tracing, and image segmentation.
2.2 Method for Line Detection: Hough Transform
The Hough Transform is a popular technique for detecting lines in an image. It works by transforming edge points into a parameter space (typically distance ρ and angle θ for lines) and then finding accumulations that correspond to collinear points.
Steps in Hough Transform:
1. Edge Detection: First, detect edges using techniques like Canny Edge Detection.
2. Hough Space Transformation: Every edge point is transformed into a sinusoidal curve in Hough space, representing all lines that could pass through it.
3. Detect Peaks in Hough Space: The highest values in Hough space correspond to detected
lines.
Example: In an image of a road, the Hough Transform would identify lane markings as straight lines; curved markings require generalized variants of the transform.
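A minimal sketch using OpenCV's built-in transform (the filename and the vote threshold of 200 are illustrative assumptions):

import cv2
import numpy as np

img = cv2.imread("road.jpg", cv2.IMREAD_GRAYSCALE)    # hypothetical input
edges = cv2.Canny(img, 50, 150)                       # step 1: edge detection
lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)    # steps 2-3: vote and keep peaks
if lines is not None:
    for rho, theta in lines[:, 0]:                    # each (rho, theta) peak is one line
        a, b = np.cos(theta), np.sin(theta)
        p1 = (int(a * rho - 1000 * b), int(b * rho + 1000 * a))
        p2 = (int(a * rho + 1000 * b), int(b * rho - 1000 * a))
        cv2.line(img, p1, p2, 255, 2)                 # draw the line for inspection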
3. Edge Detection
Edge detection is a process of identifying boundaries between objects in an image. Edges occur
where there is a sharp change in intensity, which often corresponds to object boundaries.
Example: In a grayscale image of a cat, the Canny edge detector would highlight the edges
around the cat's ears, whiskers, and body contours.
4. Corner Detection
Corners are points in an image where two edges meet. They often represent significant features
such as the junctions of objects or changes in the image structure.
Example: In an image of a building, Harris Corner Detection would identify the corners of
windows, doors, and the roof.
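A minimal Harris corner sketch with OpenCV (the filename and the 0.01 response threshold are illustrative assumptions):

import cv2
import numpy as np

img = cv2.imread("building.jpg", cv2.IMREAD_GRAYSCALE)    # hypothetical input
response = cv2.cornerHarris(np.float32(img), 2, 3, 0.04)  # corner response per pixel
corners = response > 0.01 * response.max()                # boolean mask of corner pixels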
1. Thresholding
1.1 Adaptive (Local) Thresholding
- Definition: The threshold value varies over the image, adapting to local variations in illumination.
- Method:
- Divide the image into smaller regions.
- Calculate T for each region based on local statistics (mean, median).
- Apply thresholding within each region.
- Example:
- Segmenting text from a photograph where lighting conditions vary across the image.
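A minimal sketch with OpenCV's adaptive threshold (the filename, the 11-pixel block size, and the offset C = 5 are illustrative assumptions):

import cv2

img = cv2.imread("document.png", cv2.IMREAD_GRAYSCALE)  # hypothetical unevenly lit page
# T for each pixel = mean of its 11x11 neighborhood minus 5, so T adapts to local lighting.
binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 11, 5)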
1.2 Otsu's Method
- Definition: Automatically computes the optimal global threshold by maximizing the between-class variance (equivalently, minimizing the intra-class variance).
- Method:
- Compute the histogram of the image.
- Calculate the probability of each intensity level.
- Iterate over all possible thresholds to find the one that minimizes intra-class variance.
- Example:
- Automatically segmenting cell images in medical diagnostics without manual threshold
selection.
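A direct NumPy sketch of these steps (maximizing the between-class variance is equivalent to minimizing the intra-class variance, so the loop below implements the same criterion):

import numpy as np

def otsu_threshold(img):
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()                       # step 2: probability of each level
    levels = np.arange(256)
    best_t, best_var = 0, 0.0
    for t in range(1, 256):                        # step 3: try every split point
        w0, w1 = prob[:t].sum(), prob[t:].sum()    # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * prob[:t]).sum() / w0   # class means
        mu1 = (levels[t:] * prob[t:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2       # between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t

In practice, OpenCV's cv2.threshold with the cv2.THRESH_OTSU flag computes the same threshold.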
1.3 Other Thresholding Methods
- Histogram Shape-Based Methods: Analyze peaks, valleys, and curvatures in the histogram.
- Clustering-Based Methods: Use clustering algorithms (e.g., k-means) to find natural
groupings of pixel intensities.
- Entropy-Based Methods: Use information theory to find thresholds that maximize entropy
between classes.
2. Edge Detection
Edge detection identifies points in an image where intensity changes sharply, indicating the presence of boundaries.
- Sobel Operator:
- Uses convolution kernels to approximate the gradient (see the sketch after this list).
- Emphasizes edges in horizontal and vertical directions.
- Prewitt Operator:
- Similar to Sobel but with different kernel weights.
- Roberts Cross Operator:
- Uses diagonal kernels to detect edges.
- Laplacian Operator:
- Second-order derivative operator.
- Detects zero-crossings corresponding to edges.
- Laplacian of Gaussian (LoG):
- Combines Gaussian smoothing with the Laplacian operator.
- Reduces noise before edge detection.
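As a concrete illustration of the gradient operators listed above, a Sobel sketch in OpenCV (the filename is an illustrative assumption):

import cv2
import numpy as np

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)       # horizontal gradient
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)       # vertical gradient
magnitude = np.sqrt(gx**2 + gy**2)                   # edge strength
direction = np.arctan2(gy, gx)                       # edge orientation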
Canny Edge Detector
- Definition: A multi-stage algorithm designed for good detection, accurate localization, and a single response per edge.
- Stages:
1. Noise Reduction: Apply Gaussian filter.
2. Gradient Calculation: Compute intensity gradients.
3. Non-Maximum Suppression: Thin out edges to 1-pixel width.
4. Double Thresholding: Identify strong and weak edges.
5. Edge Tracking by Hysteresis: Connect weak edges that are connected to strong edges.
- Advantages:
- Low error rate.
- Well-defined edges with minimal noise.
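All five stages are bundled into a single OpenCV call; only the two hysteresis thresholds are exposed (the 50/150 pair is an illustrative assumption):

import cv2

img = cv2.imread("cat.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
edges = cv2.Canny(img, 50, 150)                    # 50 = weak-edge threshold, 150 = strong-edge threshold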
After detecting edges, it is important to link them to form continuous boundaries.
3.2.1 Local Processing
- Based on Neighborhood:
- Examine the local neighborhood (typically 3x3 or 5x5).
- Link edge pixels that are close and have similar gradient directions.
- Criteria:
- Distance: Pixels are within a certain proximity.
- Angle: Gradient directions are similar.
3.2.2 Global Processing
- Hough Transform:
- Transforms edge points into a parameter space.
- Detects lines, circles, and other parametric shapes by finding accumulations in parameter
space.
- Graph Theory:
- Represent edge pixels as nodes in a graph.
- Use algorithms like shortest path to link edges.
3.2.3 Edge Relaxation
- Iterative Method:
- Update edge strengths based on neighboring pixels.
- Edges are reinforced or suppressed in each iteration.
- Advantages:
- Improves continuity.
- Reduces false edges.
- Histogram Analysis:
- The histogram of an image represents the distribution of pixel intensities.
- Peaks correspond to dominant intensity values (e.g., background and foreground).
- Gradient Magnitude: |∇f| = √(Gx² + Gy²), the strength of the intensity change at each pixel.
- Gradient Direction: θ = arctan(Gy / Gx), the orientation of the edge at each pixel.
5. The Hough Transform
- Line Detection:
- Each edge point (x, y) votes for all lines passing through it.
- Parameter space: ρ = xcosθ + ysinθ
- Accumulator array records votes; peaks correspond to detected lines (see the sketch after this list).
- Circle Detection:
- Parameter space includes center coordinates and radius.
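A sketch of the voting procedure itself, written directly from ρ = xcosθ + ysinθ (the accumulator size and the 1-degree angular step are illustrative choices):

import numpy as np

def hough_line_votes(edge_mask, n_theta=180):
    ys, xs = np.nonzero(edge_mask)                       # coordinates of edge points
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    diag = int(np.ceil(np.hypot(*edge_mask.shape)))      # largest possible |rho|
    acc = np.zeros((2 * diag, n_theta), dtype=int)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1        # one vote per (rho, theta)
    return acc, thetas                                   # peaks in acc = detected lines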
6. Practical Considerations
6.1 Preprocessing Steps
- Noise Reduction:
- Apply filters like Gaussian blur to reduce noise before thresholding or edge detection.
- Contrast Enhancement:
- Use histogram equalization to improve contrast.
6.2 Post-Processing Steps
- Morphological Operations:
- Erosion: Removes small objects or noise.
- Dilation: Fills small holes and gaps.
- Opening: Erosion followed by dilation.
- Closing: Dilation followed by erosion.
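A minimal sketch of the four operations on a toy binary image (the 3x3 square kernel is a common default, not a requirement):

import cv2
import numpy as np

binary = np.zeros((7, 7), np.uint8)
binary[2:5, 2:5] = 255                                      # a small square object
binary[0, 0] = 255                                          # one isolated noise pixel
kernel = np.ones((3, 3), np.uint8)
eroded = cv2.erode(binary, kernel)                          # shrinks objects, removes the noise pixel
dilated = cv2.dilate(binary, kernel)                        # grows objects, fills small gaps
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # erosion then dilation
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # dilation then erosion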
Region-based segmentation partitions an image into regions with the following properties:
- Homogeneity: The pixels within a region are similar in terms of some property (e.g., intensity, color).
- Connectedness: Pixels in a region are spatially connected to each other.
- Boundaries: The regions are distinct from their surrounding areas, forming clear boundaries.
There are several techniques for region-based segmentation, including region growing, region
splitting and merging, and watershed segmentation.
2.1 Region Growing
Region growing is a technique that starts with a seed pixel and expands by adding neighboring pixels that share similar properties.
2.1.1 Steps in Region Growing
1. Select one or more seed pixels inside the region of interest.
2. Examine the neighbors of the current region and add every pixel that satisfies the similarity criterion (e.g., intensity difference below a threshold).
3. Repeat until no more pixels can be added.
2.1.2 Example
- Medical Imaging: Region growing is commonly used to segment tumors or organs in medical
images. For example, a seed pixel is selected inside the tumor, and the region grows to include
all neighboring pixels with similar intensity values.
2.1.3 Advantages and Disadvantages
- Advantages:
- Simple to implement.
- Produces connected regions.
- Can handle noise to some extent.
- Disadvantages:
- Sensitive to the choice of seed points.
- May over-segment the image if regions are not homogeneous.
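A minimal sketch of region growing with an intensity-difference criterion (the 4-connected neighborhood and the tolerance of 10 are illustrative assumptions):

import numpy as np
from collections import deque

def region_grow(img, seed, tol=10):
    h, w = img.shape
    seed_val = int(img[seed])
    region = np.zeros((h, w), bool)
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            # Add unvisited neighbors whose intensity is close to the seed's.
            if (0 <= nr < h and 0 <= nc < w and not region[nr, nc]
                    and abs(int(img[nr, nc]) - seed_val) <= tol):
                region[nr, nc] = True
                queue.append((nr, nc))
    return region   # boolean mask of the grown region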
2.2 Region Splitting and Merging
This technique divides an image into smaller regions and merges adjacent regions that meet a
homogeneity criterion.
2.2.1 Steps in Region Splitting and Merging
1. Splitting:
- Start with the entire image.
- Divide the image into quadrants.
- Recursively split the quadrants until each region satisfies a homogeneity criterion.
2. Merging:
- Adjacent regions are merged if they are similar in terms of pixel properties (e.g., intensity).
2.2.2 Algorithm
The splitting phase is typically organized as a quadtree: each node represents a region, and a node is subdivided into four children until every leaf satisfies the homogeneity criterion. A final pass then merges adjacent leaves with similar statistics.
2.2.3 Example
- Satellite Imaging: Splitting and merging can be used to segment land types, where different
regions represent forests, water bodies, and urban areas.
2.2.4 Advantages and Disadvantages
- Advantages:
- Adaptive and flexible.
- Can handle complex images.
- Disadvantages:
- Computationally expensive.
- Requires a robust homogeneity criterion for splitting and merging.
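A sketch of the quadtree splitting phase, using intensity range as the homogeneity predicate (the tolerance and minimum block size are illustrative assumptions; a merging pass would follow):

import numpy as np

def quadtree_split(img, r0, c0, h, w, tol=12, min_size=8, leaves=None):
    if leaves is None:
        leaves = []
    block = img[r0:r0 + h, c0:c0 + w]
    if int(block.max()) - int(block.min()) <= tol or h <= min_size or w <= min_size:
        leaves.append((r0, c0, h, w))            # homogeneous (or minimal) region
    else:
        h2, w2 = h // 2, w // 2                  # split into quadrants and recurse
        for dr, dc, hh, ww in ((0, 0, h2, w2), (0, w2, h2, w - w2),
                               (h2, 0, h - h2, w2), (h2, w2, h - h2, w - w2)):
            quadtree_split(img, r0 + dr, c0 + dc, hh, ww, tol, min_size, leaves)
    return leaves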
2.3 Watershed Segmentation
The watershed algorithm treats an image like a topographic surface, where pixel intensities
correspond to elevations. The algorithm “floods” the surface from regional minima, and dams
are built where waters from different minima meet. These dams represent the segmented
regions.
2.3.1 Steps in Watershed Segmentation
1. Gradient Image: Compute the gradient of the image, which emphasizes edges.
2. Minima Identification: Identify the local minima in the gradient image.
3. Flooding Process: Start flooding the gradient image from the minima.
4. Region Creation: Build barriers where floods from different minima meet, creating
segmented regions.
2.3.2 Example
- Object Segmentation: In industrial applications, watershed segmentation is used to separate
overlapping objects, such as nuts and bolts on a conveyor belt.
2.3.3 Advantages and Disadvantages
- Advantages:
- Good for separating touching or overlapping objects.
- Provides accurate region boundaries.
- Disadvantages:
- Sensitive to noise.
- Over-segmentation is a common problem, often mitigated by preprocessing the image (e.g.,
smoothing).
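A sketch of the standard marker-based pipeline in OpenCV, where distance-transform peaks act as the minima (the filename and the 0.6 peak threshold are illustrative assumptions):

import cv2
import numpy as np

img = cv2.imread("parts.png")                           # hypothetical color image of touching objects
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)    # peaks = object cores (the "minima")
_, sure_fg = cv2.threshold(dist, 0.6 * dist.max(), 255, 0)
sure_fg = np.uint8(sure_fg)
sure_bg = cv2.dilate(binary, np.ones((3, 3), np.uint8), iterations=3)
unknown = cv2.subtract(sure_bg, sure_fg)                # pixels the flooding must decide
_, markers = cv2.connectedComponents(sure_fg)           # one seed label per object
markers = markers + 1                                   # background becomes 1, objects 2..n
markers[unknown == 255] = 0                             # 0 = to be flooded
markers = cv2.watershed(img, markers)                   # dams (region boundaries) get label -1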
3. Region-Based Segmentation in Practice
3.1 Similarity Criteria
3.1.1 Intensity Similarity
- Definition: Pixels are considered similar if their intensity values are close.
- Example: In a grayscale image, all pixels with intensity values between 100 and 120 might belong to the same region.
3.1.2 Color Similarity
- Definition: Pixels are grouped based on their color values (e.g., in RGB or HSV color spaces).
- Example: Grouping all pixels with similar shades of red to form a region representing a red object.
3.1.3 Texture Similarity
- Definition: Pixels are grouped based on their texture properties (e.g., roughness, regularity).
- Example: Segmenting regions in an image of a fabric based on different weaving patterns.
3.2 Post-Processing
After segmentation, post-processing is often required to refine the results and eliminate noise.
3.2.1 Morphological Operations
Erosion, dilation, opening, and closing are applied to remove small artifacts, fill holes, and smooth region boundaries.
4. Applications
4.1 Medical Imaging
- Tumor Detection: Segmenting medical images to isolate and analyze tumors based on pixel
intensities.
- Organ Segmentation: Identifying and segmenting different organs, such as the heart or liver,
for diagnostic purposes.
4.2 Satellite Imaging
- Land Use Classification: Segmenting satellite images into regions representing different land
types, such as forests, water bodies, and urban areas.
- Geographical Mapping: Creating maps by identifying and labeling different regions in
satellite images.
5. Challenges
5.1 Noise
- Problem: Noise can cause over-segmentation, where small irrelevant regions are created.
- Solution: Preprocessing techniques like filtering (e.g., Gaussian blur) or morphological
operations can help reduce noise.
5.3 Over-Segmentation
- Problem: The image is divided into many more small regions than needed, a frequent issue with watershed segmentation.
- Solution: Smooth the image before segmenting, or merge similar adjacent regions afterwards.
Boundary Representation
1. Introduction
Boundary representation (B-rep) in image processing is a method for describing the shape of
an object by specifying its outer boundary or contour. It captures the object's geometry by
focusing on its periphery, making it easier to perform shape-based analyses.
1.1 Why is Boundary Representation Important?
- Shape Analysis: B-rep helps in extracting meaningful shape features, such as corners and
edges.
- Object Recognition: Boundaries provide a way to recognize and classify objects by comparing
their shapes.
- Segmentation: Boundaries define the regions of interest in an image.
2. Boundary Representation Techniques
Boundary representations can be constructed using several techniques, each with its own
advantages depending on the application.
2.1 Chain Codes
Chain codes represent boundaries by encoding the direction of movement between consecutive
boundary pixels. Each movement is assigned a specific code, and the boundary is described by
a sequence of these codes.
2.1.1 Steps in Chain Code Representation
1. Boundary Extraction: Extract the boundary of the object from the image.
2. Coding: Assign directional codes (e.g., 0 for right, 1 for up, 2 for left, etc.) based on the
movement between consecutive boundary pixels.
2.1.2 Example of Chain Code
Consider an object boundary that moves right, up, and left in a grid. The chain code could be
represented as:
- Right → 0
- Up → 1
- Left → 2
2.1.3 Advantages and Disadvantages
- Advantages:
- Compact representation of boundaries.
- Easy to compute.
- Disadvantages:
- Sensitive to noise.
- The chain code representation is dependent on the starting point.
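A small sketch that derives the chain code from an ordered list of boundary pixels, using the 4-directional convention above (0 = right, 1 = up, 2 = left, 3 = down):

# (row delta, col delta) -> directional code; "up" decreases the row index
CODES = {(0, 1): 0, (-1, 0): 1, (0, -1): 2, (1, 0): 3}

def chain_code(boundary):
    return [CODES[(r2 - r1, c2 - c1)]
            for (r1, c1), (r2, c2) in zip(boundary, boundary[1:])]

chain_code([(2, 0), (2, 1), (1, 1), (1, 0)])   # right, up, left -> [0, 1, 2]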
2.2 Polygonal Approximation
Polygonal approximation represents a boundary by straight-line segments connecting selected significant points (vertices) on the boundary.
- Example: A circular object might be represented by a polygon with 8 vertices, approximating the circular shape with straight lines connecting those vertices.
- Advantages:
- Simplifies complex boundaries.
- Reduces computational complexity.
- Disadvantages:
- Approximation may lose details of the actual boundary.
- Choice of significant points can impact accuracy.
2.3 B-Splines and Parametric Curves
B-splines (basis splines) and parametric curves represent boundaries using smooth, continuous curves defined by control points. This method is particularly useful for representing curved boundaries.
2.3.1 B-Splines
B-splines use a set of control points to create smooth curves that approximate or interpolate the
boundary of an object.
- Control Points: These are the main points that define the shape of the spline.
- Knot Vector: This determines how control points influence the curve.
2.3.2 Example
An object's boundary, which is curvilinear, can be described by a spline that passes through
key points on the boundary. The resulting curve is smooth, capturing the object's shape more
naturally than straight-line segments.
- Advantages:
- Captures smooth boundaries accurately.
- Flexible and can represent complex shapes.
- Disadvantages:
- Computationally more expensive.
- Requires careful selection of control points.
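A sketch using SciPy's spline fitting (the noisy circle is a stand-in boundary; the smoothing factor s is an illustrative assumption):

import numpy as np
from scipy.interpolate import splprep, splev

t = np.linspace(0, 2 * np.pi, 20, endpoint=False)   # sparse, noisy boundary samples
x = 5 * np.cos(t) + np.random.normal(0, 0.1, t.size)
y = 5 * np.sin(t) + np.random.normal(0, 0.1, t.size)
tck, u = splprep([x, y], s=1.0, per=True)           # fit a closed cubic B-spline
                                                    # (tck = knots, coefficients, degree)
xs, ys = splev(np.linspace(0, 1, 200), tck)         # evaluate a smooth, dense boundary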
3. Boundary Descriptors
Once a boundary is represented, various descriptors can be used to characterize its shape and
properties. These descriptors are essential for comparing and recognizing objects.
3.1 Geometric Descriptors
3.1.1 Perimeter
The perimeter is the length of the boundary, calculated by summing the distances between
consecutive boundary points.
3.1.2 Compactness
Compactness measures how efficiently a boundary encloses its area. A common definition is compactness = P² / (4πA), where P is the perimeter and A is the area; the 4π normalization makes the value exactly 1 for a circle.
3.1.3 Example
For a circular object, compactness would be close to 1, while for irregular objects, compactness
would be larger.
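A sketch that checks this on a synthetic disc using OpenCV contours (the image size and radius are arbitrary choices):

import cv2
import numpy as np

mask = np.zeros((100, 100), np.uint8)
cv2.circle(mask, (50, 50), 30, 255, -1)          # filled disc as the test object
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
P = cv2.arcLength(contours[0], True)             # perimeter of the boundary
A = cv2.contourArea(contours[0])                 # enclosed area
compactness = P**2 / (4 * np.pi * A)             # close to 1 for the disc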
4. Applications of Boundary Representation
4.1 Object Recognition
Boundary representations are widely used in object recognition tasks. By extracting boundary
features and comparing them to known objects, the system can classify or identify objects in
images.
- Example: Recognizing shapes like circles, triangles, or more complex objects in industrial
applications.
4.2 Medical Imaging
- Example: Identifying the boundaries of a tumor in an MRI scan for diagnostic purposes.
4.3 Robotics and Navigation
Robots and autonomous systems use boundary representations to navigate their environment,
recognizing and avoiding obstacles based on their boundaries.
- Example: A robot using boundary representations to identify and avoid objects in its path.
5. Challenges
- Scale and Resolution: Boundaries can appear different at varying scales and resolutions, making it difficult to use the same representation across different images.
Region Representation
1. Introduction
Region representation describes an object by the pixels it contains rather than by its contour. It supports:
- Segmentation: Helps in dividing an image into meaningful parts, like different objects or regions.
- Texture and Color Analysis: Allows for analyzing the texture, color, or intensity inside the
region.
- Feature Extraction: Extracts features that describe the region, which are useful for
classification and recognition tasks.
2. Region Representation Techniques
Several techniques exist for representing regions in an image, depending on the type of information needed for a particular task. The following are the most commonly used methods.
2.1 Binary Region Representation
In binary region representation, each pixel within a region is assigned a value of 1 (indicating
the pixel belongs to the region) or 0 (indicating it belongs to the background). This method is
often used when we only need to distinguish between object and background.
2.1.1 Steps in Binary Region Representation
1. Thresholding: Convert the grayscale image to a binary image by choosing a threshold value T.
2. Region Identification: Mark the pixels with intensity values greater than or equal to T as belonging to the region.
2.1.2 Example of Binary Region Representation
Consider a grayscale image where an object has higher intensity than the background. By setting a threshold, all pixels with intensity greater than or equal to the threshold are set to 1, while the others are set to 0.
- Original image:
10 20 30
80 90 100
40 50 60
- After thresholding (threshold = 50):
0 0 0
1 1 1
0 1 1
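The same example in NumPy (using the >= 50 rule described above):

import numpy as np

img = np.array([[10, 20, 30],
                [80, 90, 100],
                [40, 50, 60]])
binary = (img >= 50).astype(np.uint8)
# binary == [[0 0 0], [1 1 1], [0 1 1]], matching the matrix above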
2.2 Connected Component Labeling
Connected component labeling assigns unique labels to each distinct region in a binary image.
Pixels that are connected (e.g., neighboring pixels with the same intensity) are grouped together
to form a region.
2.2.1 Steps in Connected Components Labeling
1. Binary Image: Start with a binary image where pixels belonging to the region are labeled as
1.
2. Labeling: Traverse the image, labeling connected pixels as belonging to the same region.
3. Region Extraction: Identify and label each connected region.
- Binary image:
1001
1011
0100
- Labeled regions:
1002
1022
0300
Here, pixels connected in the same region are assigned the same label (1, 2, 3). Note that 4-connectivity is assumed, so the diagonally touching pixels at positions (1, 2) and (2, 1) belong to different regions.
- Advantages:
- Useful for separating multiple objects within an image.
- Provides a unique label for each connected region.
- Disadvantages:
- Complex for large images with many connected regions.
- May fail if noise creates false connections between regions.
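SciPy reproduces the example above directly; ndimage.label uses 4-connectivity by default, which is why the pixel at (2, 1) forms its own region:

import numpy as np
from scipy import ndimage

binary = np.array([[1, 0, 0, 1],
                   [1, 0, 1, 1],
                   [0, 1, 0, 0]])
labels, n = ndimage.label(binary)   # n == 3 connected regions
# labels == [[1 0 0 2], [1 0 2 2], [0 3 0 0]]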
2.3 Region Adjacency Graph (RAG)
The Region Adjacency Graph (RAG) represents an image's regions as nodes in a graph, where edges connect nodes if the regions they represent are adjacent. RAG is helpful for hierarchical segmentation and region merging.
- Regions in an image:
R1 R1 R2
R3 R3 R2
- Corresponding RAG:
(R1)---(R2)
   \     /
    (R3)
All three regions are pairwise adjacent: R1 touches R2 and R3, and R2 touches R3 in the bottom row.
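A small sketch that builds the adjacency set from a label image by scanning 4-connected neighbor pairs (the grid below is the example above):

import numpy as np

def region_adjacency(labels):
    edges = set()
    h, w = labels.shape
    for r in range(h):
        for c in range(w):
            for nr, nc in ((r + 1, c), (r, c + 1)):   # down and right neighbors
                if nr < h and nc < w and labels[r, c] != labels[nr, nc]:
                    edges.add(frozenset((int(labels[r, c]), int(labels[nr, nc]))))
    return edges

grid = np.array([[1, 1, 2],
                 [3, 3, 2]])
region_adjacency(grid)   # {{1, 2}, {1, 3}, {2, 3}} -- the triangle drawn above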
3. Region Descriptors
Once a region has been represented, various descriptors can be used to characterize its
properties, such as shape, texture, and intensity. These descriptors provide additional
information for tasks like classification and object recognition.
3.1 Area
The area is the total number of pixels within the region. This is a basic descriptor that indicates
the size of the object.
3.2 Centroid
The centroid of a region is the average of the positions of all the pixels in the region. It gives
the "center of mass" of the region.
4. Applications of Region Representation
4.1 Medical Imaging
In medical image processing, region representations are used to identify and analyze
anatomical structures, such as organs or tumors, based on their intensity characteristics.
- Example: Identifying a tumor as a region in an MRI scan based on its higher intensity
compared to surrounding tissue.
Boundary Descriptors
1. Introduction
There are several types of boundary descriptors used in computer vision, each capturing different aspects of the object's boundary. Common types include:
1. Shape Signatures
2. Curvature
3. Fourier Descriptors
4. Chain Codes
5. Shape Invariants
2. Types of Boundary Descriptors
2.1 Shape Signatures
2.1.1 Radial Distance Signature
The radial distance signature plots the distance from the shape's centroid to each boundary point as a function of position along the boundary.
- Example:
Consider a circular object with a radius of 5. The radial distance signature for every boundary point will be constant (5), producing a flat line in the signature plot.
2.1.2 Curvature Signature
The curvature signature represents the change in the angle of the boundary at each point. It is
calculated by taking the derivative of the boundary orientation with respect to the arc length.
- Steps:
1. Calculate the angle θi at each boundary point.
2. Compute the change in angle Δθi between consecutive boundary points.
- Application: The curvature signature is useful for identifying corners and sharp turns in the
boundary.
2.2 Curvature
Curvature measures how sharply a boundary is turning at a given point. It is a key feature for
recognizing objects with distinct corners or smooth curves.
2.4 Chain Codes
- Example: Consider the boundary of a square. Starting from the top-left corner and moving clockwise, using 8-directional codes (0 = right, 6 = down, 4 = left, 2 = up):
Chain code: 0 0 0 6 6 6 4 4 4 2 2 2
- Disadvantages:
- Sensitive to noise and small perturbations.
- Not scale- or rotation-invariant.
2.5 Shape Invariants
Moment invariants are a set of functions of image moments that are invariant to translation,
rotation, and scaling. They are used to describe the shape of an object in a way that remains
consistent under transformations.
- Hu’s Moment Invariants: These are seven functions of central moments that provide a
rotation, translation, and scale-invariant description of a shape.
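A sketch computing the centroid and Hu's seven invariants from image moments with OpenCV (the rectangle is an arbitrary test shape):

import cv2
import numpy as np

mask = np.zeros((100, 100), np.uint8)
cv2.rectangle(mask, (20, 30), (80, 60), 255, -1)   # filled rectangle as the shape
m = cv2.moments(mask, binaryImage=True)            # raw, central, and normalized moments
cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]  # centroid from first-order moments
hu = cv2.HuMoments(m).ravel()                      # seven translation/rotation/scale invariants

Shifting, rotating, or rescaling the rectangle leaves hu (nearly) unchanged, which is what makes these values useful for matching shapes.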
1. Introduction
In computer vision and image processing, regional descriptors provide a way to describe the
characteristics of regions within an image, focusing on the internal properties of an object (such
as texture, area, and moments). Image warping, on the other hand, involves geometrically
transforming an image to correct distortions or achieve a specific visual effect, often used in
tasks like image registration and texture mapping.
2. Regional Descriptors
Regional descriptors capture the properties of a region or area within an image, typically
focusing on the interior of an object rather than its boundary. These descriptors are essential
for understanding the object's characteristics, like its shape, size, or texture, and are useful for
tasks such as object recognition and segmentation.
2.1 Definition
Regional descriptors are a set of features that describe various properties of a region in an
image, including its geometric shape, texture, and statistical properties. They are especially
important when working with region-based segmentation techniques.
2.2 Common Regional Descriptors
2.2.1 Area
Area measures the number of pixels contained within the boundary of a region or object. It’s
one of the simplest descriptors and provides an indication of the size of the object.
- Example: In a binary image, the area of an object is simply the number of pixels with value
1 (foreground pixels).
2.2.2 Centroid
The centroid is the center of mass of the region, calculated by averaging the pixel coordinates
of all the points within the region. It gives a point that represents the location of the region in
space.
- Example: For a circular object, the centroid will be the center of the circle.
2.2.3 Eccentricity
Eccentricity is a measure of how much a region deviates from being circular. For the ellipse that best fits the region, with semi-major axis a and semi-minor axis b, it is computed as e = √(1 − (b/a)²).
- Example: A circle has an eccentricity of 0, while an elongated ellipse will have a higher
eccentricity value.
2.2.4 Moments
Moments are a set of scalar quantities that provide information about the shape of the region.
Moments can be used to calculate various geometric properties, such as area, centroid, and
orientation.
- Central Moments: Central moments are calculated relative to the centroid and provide more
robust information about the shape.
2.2.5 Texture Descriptors
- Gray-Level Co-occurrence Matrix (GLCM): Describes how often pairs of pixels with specific
values occur in a certain spatial relationship in the region. It captures texture properties such
as contrast, homogeneity, and correlation.
- Example: A region with a smooth texture will have a low contrast and a more homogeneous
distribution of pixel intensities.
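A sketch with scikit-image (the random 8-level patch is a stand-in texture; in scikit-image versions before 0.19 the functions are spelled greycomatrix/greycoprops):

import numpy as np
from skimage.feature import graycomatrix, graycoprops

patch = np.random.randint(0, 8, (32, 32), dtype=np.uint8)   # texture patch with 8 gray levels
glcm = graycomatrix(patch, distances=[1], angles=[0],
                    levels=8, symmetric=True, normed=True)  # horizontal pixel-pair co-occurrences
contrast = graycoprops(glcm, "contrast")[0, 0]
homogeneity = graycoprops(glcm, "homogeneity")[0, 0]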
2.2.6 Compactness
Compactness is a measure of how tightly packed the region's pixels are relative to its perimeter, commonly defined as P² / (4πA) for perimeter P and area A.
- Example: A perfect circle has the smallest compactness value for a given area, while elongated
or irregular shapes have higher compactness.
2.3 Applications of Regional Descriptors
- Medical Imaging: Regional descriptors are used to analyze regions of interest, such as
detecting tumors or lesions in MRI scans based on size, texture, or shape.
3. Image Warping
Image warping refers to the process of geometrically transforming an image so that the objects
within it appear in a desired manner. The transformation can involve scaling, rotating,
translating, or distorting the image.
3.1 Steps in Image Warping
1. Define the Transformation: Specify the transformation that needs to be applied, such as affine or perspective transformation.
2. Apply the Transformation: For each pixel, compute its new location from the transformation matrix. In practice the mapping is usually applied in reverse, tracing each output pixel back to a source location, so that no holes appear in the result.
3. Interpolation: Since pixel coordinates are typically non-integer values after transformation,
interpolation is used to estimate the new pixel values. Common interpolation methods include
nearest neighbor, bilinear, and bicubic interpolation.
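A sketch of an affine warp in OpenCV (the filename, 30-degree rotation, and 0.8 scale are illustrative assumptions):

import cv2

img = cv2.imread("scene.jpg")                          # hypothetical input
h, w = img.shape[:2]
M = cv2.getRotationMatrix2D((w / 2, h / 2), 30, 0.8)   # 1. define the transformation
warped = cv2.warpAffine(img, M, (w, h),
                        flags=cv2.INTER_LINEAR)        # 2-3. apply it with bilinear interpolation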
3.2 Applications of Image Warping
- Image Registration: Aligning two images taken from different perspectives or at different times, such as in medical imaging to track the progression of disease.
- Texture Mapping: In computer graphics, warping is used to map textures onto 3D models.
- Panorama Stitching: Image warping is used to blend multiple images together into a single
panoramic view.