Computer Vision – Unit 3

Topic 1 - Point, Line and Edge and Corner Detection

Introduction to Feature Detection


In computer vision and image processing, feature detection plays a critical role in
understanding and interpreting images. Features such as points, lines, edges, and corners help
to identify meaningful information within an image, enabling tasks like object recognition,
image matching, and motion tracking.

1. Point Detection
Point detection is the process of identifying points of interest in an image. These points often
represent important features like corners or regions of high intensity variation.
1.1 Why is Point Detection Important?
Points are typically used to match and align images. They can also represent landmarks in a
scene, which are useful in applications like 3D reconstruction and image stitching.
1.2 Method for Point Detection: Laplacian of Gaussian (LoG)
The Laplacian of Gaussian (LoG) is commonly used to detect points of interest in an image.
The LoG operator is a combination of a Gaussian filter and the Laplacian operator, designed to
smooth an image and enhance areas of high intensity change.
Steps in LoG:
1. Gaussian Smoothing: Blur the image to reduce noise.
2. Apply Laplacian: The Laplacian operator is applied to detect areas of rapid intensity changes
(edges, points, etc.).
3. Thresholding: Points are identified based on a set threshold of intensity values.
Example: In a simple black and white checkerboard image, the intersection of two different
colored squares (a corner) would be detected as a point of interest using LoG.
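A minimal Python/OpenCV sketch of these three steps (the file name, kernel sizes, and the threshold fraction are illustrative assumptions, not part of the notes):

```python
import cv2
import numpy as np

# Load a grayscale image (path is a placeholder).
img = cv2.imread("checkerboard.png", cv2.IMREAD_GRAYSCALE)

# Step 1: Gaussian smoothing to suppress noise.
blurred = cv2.GaussianBlur(img, (5, 5), 1.5)

# Step 2: Laplacian to emphasise rapid intensity changes.
log_response = cv2.Laplacian(blurred, cv2.CV_64F, ksize=3)

# Step 3: threshold the absolute response to keep strong points of interest.
threshold = 0.8 * np.abs(log_response).max()          # illustrative cut-off
points = np.argwhere(np.abs(log_response) > threshold)  # (row, col) coordinates
print(f"Detected {len(points)} candidate interest points")
```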

2. Line Detection
Lines represent continuous, elongated features in an image. Detecting lines helps identify
structures such as road markings, building edges, or pathways.
2.1 Why is Line Detection Important?
Line detection is useful in various applications like lane detection in autonomous vehicles,
contour tracing, and image segmentation.
2.2 Method for Line Detection: Hough Transform
The Hough Transform is a popular technique for detecting lines in an image. It works by
transforming edge points into a parameter space (typically the distance ρ and angle θ of a line)
and then finding accumulations of votes that correspond to collinear points, i.e., lines.
Steps in Hough Transform:
1. Edge Detection: First, detect edges using techniques like Canny Edge Detection.
2. Hough Space Transformation: Every edge point is transformed into a sinusoidal curve in
Hough space, representing all the lines that could pass through it.
3. Detect Peaks in Hough Space: The highest accumulator values (most votes) in Hough space
correspond to detected lines.
Example: In an image of a road, the Hough Transform would identify straight lane markings as
lines; detecting curved markings requires a generalized Hough transform.
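A short OpenCV sketch of this pipeline (the file name, the Canny thresholds, and the accumulator threshold are illustrative assumptions):

```python
import cv2
import numpy as np

img = cv2.imread("road.png", cv2.IMREAD_GRAYSCALE)   # placeholder path

# Step 1: edge detection (Canny).
edges = cv2.Canny(img, 50, 150)

# Steps 2-3: each edge point votes in (rho, theta) space; peaks in the
# accumulator are returned as lines.
lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)

if lines is not None:
    for rho, theta in lines[:, 0]:
        print(f"Line: rho={rho:.1f}, theta={np.degrees(theta):.1f} deg")
```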

3. Edge Detection
Edge detection is a process of identifying boundaries between objects in an image. Edges occur
where there is a sharp change in intensity, which often corresponds to object boundaries.

3.1 Why is Edge Detection Important?


Edges are crucial for tasks such as object recognition, shape detection, and segmentation, as
they define the shape and structure of objects.

3.2 Methods for Edge Detection:


There are several methods for detecting edges in images:
3.2.1 Sobel Operator
The Sobel operator computes the gradient of the image intensity at each pixel, highlighting
areas of high gradient, which correspond to edges.
Horizontal Gradient (Gx): Detects edges in the horizontal direction.
Vertical Gradient (Gy): Detects edges in the vertical direction.

The overall gradient magnitude is computed as:

G = √(Gx² + Gy²)

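A brief OpenCV/NumPy sketch of the Sobel gradients and this magnitude computation (the file name and the edge threshold of 100 are illustrative assumptions):

```python
import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

# Horizontal and vertical gradients (Gx, Gy).
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)

# Gradient magnitude: G = sqrt(Gx^2 + Gy^2).
magnitude = np.sqrt(gx ** 2 + gy ** 2)

# Simple edge map by thresholding the magnitude.
edges = (magnitude > 100).astype(np.uint8) * 255
```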
3.2.2 Canny Edge Detector
The Canny edge detector is an advanced edge detection algorithm that includes:
Noise Reduction: Gaussian filtering to remove noise.
Gradient Calculation: Similar to Sobel, it computes intensity gradients.
Non-Maximum Suppression: Ensures that only local maxima in the gradient are considered as
edges.
Hysteresis Thresholding: Two thresholds are used to identify strong and weak edges, ensuring
that only meaningful edges are retained.

Example: In a grayscale image of a cat, the Canny edge detector would highlight the edges
around the cat's ears, whiskers, and body contours.
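A minimal OpenCV sketch of the Canny detector (the file name and the two hysteresis thresholds are illustrative assumptions):

```python
import cv2

img = cv2.imread("cat.png", cv2.IMREAD_GRAYSCALE)    # placeholder path

# Optional explicit noise reduction; cv2.Canny also smooths internally.
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)

# The two values are the hysteresis thresholds for weak and strong edges.
edges = cv2.Canny(blurred, 50, 150)

cv2.imwrite("cat_edges.png", edges)
```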

4. Corner Detection
Corners are points in an image where two edges meet. They often represent significant features
such as the junctions of objects or changes in the image structure.

4.1 Why is Corner Detection Important?


Corners are particularly useful in tasks such as object tracking, stereo vision, and image
matching. They provide stable features that can be used to compare images taken from different
perspectives.

4.2 Methods for Corner Detection:

4.2.1 Harris Corner Detection


The Harris Corner Detector is one of the most widely used algorithms for detecting corners. It
is based on the idea that a corner should have significant intensity variation in multiple
directions.

Steps in Harris Corner Detection:


1. Gradient Calculation: Compute intensity gradients (Gx and Gy) in both horizontal and
vertical directions.
2. Autocorrelation Matrix: Use the gradients to form an autocorrelation matrix, which measures
the intensity changes in the neighborhood of each pixel.
3. Eigenvalue Calculation: The eigenvalues of the autocorrelation matrix indicate the presence
of edges and corners. If both eigenvalues are large, a corner is detected.

The Harris response R is calculated as:

R = det(M) − k · (trace(M))²

where M is the autocorrelation matrix and k is a constant (usually set to 0.04).

4.2.2 Shi-Tomasi Corner Detection (Good Features to Track)


An improvement over the Harris detector, the Shi-Tomasi method selects corners by comparing
eigenvalues. Corners are detected where the minimum eigenvalue is above a certain threshold,
ensuring more accurate results.

Example: In an image of a building, Harris Corner Detection would identify the corners of
windows, doors, and the roof.
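A short OpenCV sketch showing both corner detectors side by side (the file name, the quality level, and the corner-response cut-off are illustrative assumptions):

```python
import cv2
import numpy as np

img = cv2.imread("building.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
gray = np.float32(img)

# Harris: neighbourhood size 2, Sobel aperture 3, k = 0.04.
harris = cv2.cornerHarris(gray, 2, 3, 0.04)
corners_harris = np.argwhere(harris > 0.01 * harris.max())  # illustrative cut-off

# Shi-Tomasi ("good features to track"): up to 100 corners whose minimum
# eigenvalue exceeds 1% of the best corner response, at least 10 px apart.
corners_st = cv2.goodFeaturesToTrack(img, 100, 0.01, 10)

print(len(corners_harris), "Harris corners,",
      0 if corners_st is None else len(corners_st), "Shi-Tomasi corners")
```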

5. Applications of Feature Detection


Object Recognition: Points, edges, and corners help in identifying and recognizing objects in
images.
Image Matching: Corner and edge features are used to align images in tasks like stitching
panoramas or comparing satellite images.
Autonomous Driving: Edge and line detection help in lane and obstacle detection.
3D Reconstruction: Corner detection is crucial for creating accurate 3D models from multiple
2D images.
Topic 2 – Thresholding, Edge, and Boundary Linking

1. Thresholding

Thresholding is the simplest method of image segmentation. It transforms a grayscale image
into a binary image by assigning pixel values based on a threshold.

1.1 Importance of Thresholding

- Simplifies Images: Reduces complexity by converting images to binary form.


- Segmentation: Separates objects (foreground) from the background.
- Preprocessing: Often a preliminary step before higher-level processing.

1.2 Types of Thresholding

1.2.1 Global Thresholding

- Definition: Applies a single threshold value to the entire image.


- Method:
- Choose a threshold value T.
- For each pixel (x, y):
If f(x, y) > T, set f'(x, y) = 1 (object);
else, set f'(x, y) = 0 (background).
- Example:
- Separating printed text from a uniformly lit background in a scanned document.
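A minimal NumPy sketch of global thresholding (the sample array and threshold value are illustrative):

```python
import numpy as np

def global_threshold(image, T):
    """Return a binary image: 1 where f(x, y) > T (object), 0 otherwise."""
    return (image > T).astype(np.uint8)

# Example on a small grayscale array.
scan = np.array([[ 30,  40, 200],
                 [ 35, 210, 220],
                 [ 25,  45, 205]], dtype=np.uint8)
print(global_threshold(scan, T=128))
```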

1.2.2 Adaptive Thresholding

- Definition: The threshold value varies over the image, adapting to local variations.
- Method:
- Divide the image into smaller regions.
- Calculate T for each region based on local statistics (mean, median).
- Apply thresholding within each region.
- Example:
- Segmenting text from a photograph where lighting conditions vary across the image.
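A one-call OpenCV sketch of adaptive (local-mean) thresholding; the block size and the offset C are illustrative assumptions:

```python
import cv2

img = cv2.imread("photo_text.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

# The threshold for each pixel is (mean of its 31x31 neighbourhood) - 10.
binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 31, 10)
```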

1.2.3 Otsu's Method

- Definition: Automatically computes the optimal global threshold by maximizing the between-
class variance.
- Method:
- Compute the histogram of the image.
- Calculate the probability of each intensity level.
- Iterate over all possible thresholds to find the one that minimizes intra-class variance.
- Example:
- Automatically segmenting cell images in medical diagnostics without manual threshold
selection.
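A short OpenCV sketch of Otsu thresholding (the file name is a placeholder):

```python
import cv2

img = cv2.imread("cells.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

# Otsu's method picks the threshold automatically; the 0 passed here is
# ignored and the chosen threshold value is returned.
T, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu threshold:", T)
```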

1.2.4 Multilevel Thresholding

- Definition: Uses multiple thresholds to segment an image into several regions.


- Method:
- Select multiple thresholds T1, T2, …, Tn.
- Assign pixels to different classes based on which threshold interval they fall into.
- Example:
- Segmenting an image into background, shadow, and object regions.

1.3 Threshold Selection Techniques

- Histogram Shape-Based Methods: Analyze peaks, valleys, and curvatures in the histogram.
- Clustering-Based Methods: Use clustering algorithms (e.g., k-means) to find natural
groupings of pixel intensities.
- Entropy-Based Methods: Use information theory to find thresholds that maximize entropy
between classes.

1.4 Challenges in Thresholding

- Uneven Illumination: Can cause global thresholding to fail.


- Noise: Random variations in intensity can affect thresholding accuracy.
- Solution:
- Preprocess the image with filters (Gaussian, median).
- Use adaptive thresholding for uneven lighting conditions.

2. Edge Detection

Edge detection identifies points in an image where intensity changes sharply, indicating the
presence of boundaries.

2.1 Importance of Edge Detection

- Object Recognition: Edges define object boundaries.


- Feature Extraction: Essential for extracting meaningful features from images.
- Image Segmentation: Helps in dividing the image into regions of interest.

2.2 Edge Detection Operators

2.2.1 Gradient-Based Methods

- Sobel Operator:
- Uses convolution kernels to approximate the gradient.
- Emphasizes edges in horizontal and vertical directions.
- Prewitt Operator:
- Similar to Sobel but with different kernel weights.
- Roberts Cross Operator:
- Uses diagonal kernels to detect edges.

2.2.2 Laplacian-Based Methods

- Laplacian Operator:
- Second-order derivative operator.
- Detects zero-crossings corresponding to edges.
- Laplacian of Gaussian (LoG):
- Combines Gaussian smoothing with the Laplacian operator.
- Reduces noise before edge detection.

2.2.3 Canny Edge Detector

- Definition: A multi-stage algorithm that provides good detection, accurate localization, and a
single response per edge.
- Stages:
1. Noise Reduction: Apply Gaussian filter.
2. Gradient Calculation: Compute intensity gradients.
3. Non-Maximum Suppression: Thin out edges to 1-pixel width.
4. Double Thresholding: Identify strong and weak edges.
5. Edge Tracking by Hysteresis: Connect weak edges that are connected to strong edges.
- Advantages:
- Low error rate.
- Well-defined edges with minimal noise.

2.3 Edge Detection Challenges

- Noise Sensitivity: Noise can create false edges.


- Edge Localization: Balancing between detecting all edges and accurately locating them.
- Computational Complexity: Advanced methods like Canny are computationally intensive.
3. Edge and Boundary Linking

After detecting edges, it's important to link them to form continuous boundaries.

3.1 Importance of Edge and Boundary Linking

- Completes Object Contours: Essential for shape analysis.


- Facilitates Object Recognition: Continuous boundaries are easier to match with models.
- Improves Segmentation: Leads to more accurate region segmentation.

3.2 Techniques for Edge Linking

3.2.1 Local Processing

- Based on Neighborhood:
- Examine the local neighborhood (typically 3x3 or 5x5).
- Link edge pixels that are close and have similar gradient directions.
- Criteria:
- Distance: Pixels are within a certain proximity.
- Angle: Gradient directions are similar.
3.2.2 Global Processing

- Hough Transform:
- Transforms edge points into a parameter space.
- Detects lines, circles, and other parametric shapes by finding accumulations in parameter
space.
- Graph Theory:
- Represent edge pixels as nodes in a graph.
- Use algorithms like shortest path to link edges.
3.2.3 Edge Relaxation

- Iterative Method:
- Update edge strengths based on neighboring pixels.
- Edges are reinforced or suppressed in each iteration.
- Advantages:
- Improves continuity.
- Reduces false edges.

3.3 Boundary Detection Techniques

3.3.1 Active Contours (Snakes)

- Definition: Energy-minimizing curves that evolve to fit object boundaries.


- Components:
- Internal Energy: Smoothness constraints.
- External Energy: Attracted to features like edges.
- Method:
- Initialize a contour close to the desired boundary.
- Iteratively adjust the contour to minimize the energy function.
- Example:
- Segmenting organs in medical images.

3.3.2 Level Set Methods

- Definition: Implicitly represents contours as level sets of a higher-dimensional function.


- Advantages:
- Handles topological changes naturally (e.g., splitting, merging).
- Method:
- Evolve the level set function according to partial differential equations.
4. Examples and Applications

4.1 Example: Thresholding

Scenario: Segmenting handwritten digits from scanned documents.

- Problem: Variations in ink intensity and paper background.


- Solution:
- Use adaptive thresholding to handle uneven lighting.
- Apply morphological operations (erosion, dilation) to refine the binary image.

4.2 Example: Edge Detection

Scenario: Detecting road edges in autonomous driving.

- Problem: Variable lighting and complex backgrounds.


- Solution:
- Use the Canny edge detector for robust edge detection.
- Apply edge linking to form continuous road boundaries.
4.3 Example: Edge and Boundary Linking

Scenario: Medical image segmentation for tumor detection.

- Problem: Edges are weak and discontinuous due to noise.


- Solution:
- Apply edge relaxation to strengthen true edges.
- Use active contours to accurately delineate tumor boundaries.
5. Detailed Explanations

5.1 Mathematical Foundations of Thresholding

- Histogram Analysis:
- The histogram of an image represents the distribution of pixel intensities.
- Peaks correspond to dominant intensity values (e.g., background and foreground).

5.2 Gradient Calculation in Edge Detection

- Gradient Magnitude:

|∇f| = √(Gx² + Gy²)

where Gx, Gy are the gradients in the x and y directions.

- Gradient Direction:

θ = arctan(Gy / Gx)

The direction is used in non-maximum suppression to thin edges.


5.3 Hough Transform for Edge Linking

- Line Detection:
- Each edge point (x, y) votes for all lines passing through it.
- Parameter space: ρ = xcosθ + ysinθ
- Accumulator array records votes; peaks correspond to detected lines.

- Circle Detection:
- Parameter space includes center coordinates and radius.
6. Practical Considerations
6.1 Preprocessing Steps
- Noise Reduction:
- Apply filters like Gaussian blur to reduce noise before thresholding or edge detection.
- Contrast Enhancement:
- Use histogram equalization to improve contrast.
6.2 Post-Processing Steps
- Morphological Operations:
- Erosion: Removes small objects or noise.
- Dilation: Fills small holes and gaps.
- Opening: Erosion followed by dilation.
- Closing: Dilation followed by erosion.

- Edge Linking Refinement:


- Remove spurious edges.
- Ensure continuity of significant edges.
Topic 3 - Region-Based Segmentation

1. What is Region-Based Segmentation?

Region-based segmentation identifies regions of interest in an image by grouping pixels that
share common characteristics. Unlike edge-based methods that focus on detecting boundaries,
region-based methods look for areas with uniform properties.

1.1 Importance of Region-Based Segmentation

- Object Recognition: Helps in grouping pixels corresponding to objects.


- Image Interpretation: Simplifies images by reducing them to meaningful regions.
- Preprocessing: Often used as an initial step for further image analysis tasks.

1.2 Characteristics of Regions

- Homogeneity: The pixels within a region are similar in terms of some properties (e.g.,
intensity, color).
- Connectedness: Pixels in a region are connected to each other spatially.
- Boundaries: The regions are distinct from their surrounding areas, forming clear boundaries.

2. Region-Based Segmentation Techniques

There are several techniques for region-based segmentation, including region growing, region
splitting and merging, and watershed segmentation.

2.1 Region Growing

Region growing is a technique that starts with a seed pixel and expands by adding neighboring
pixels that share similar properties.
2.1.1 Steps in Region Growing

1. Seed Selection: Select one or more initial seed points.


2. Region Expansion: Add neighboring pixels to the region based on a similarity criterion (e.g.,
intensity, texture).
3. Stopping Criterion: Continue growing the region until no more similar pixels are found.

2.1.2 Region Growing Algorithm

1. Select seed points in the image.


2. Define a similarity measure (e.g., pixel intensity difference).
3. Add neighboring pixels to the region if they satisfy the similarity criterion.
4. Repeat until no more pixels can be added.
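A small region-growing sketch in plain Python/NumPy, using a 4-connected neighbourhood and an intensity-difference criterion (the seed position and tolerance in the usage line are illustrative assumptions):

```python
from collections import deque
import numpy as np

def region_grow(image, seed, tol=10):
    """Grow a region from `seed` (row, col), adding 4-connected neighbours
    whose intensity differs from the seed value by at most `tol`."""
    h, w = image.shape
    region = np.zeros((h, w), dtype=bool)
    seed_val = int(image[seed])
    queue = deque([seed])
    region[seed] = True
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and not region[nr, nc]
                    and abs(int(image[nr, nc]) - seed_val) <= tol):
                region[nr, nc] = True
                queue.append((nr, nc))
    return region

# Usage (illustrative seed and tolerance):
# mask = region_grow(gray_image, seed=(120, 200), tol=15)
```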

2.1.3 Example

- Medical Imaging: Region growing is commonly used to segment tumors or organs in medical
images. For example, a seed pixel is selected inside the tumor, and the region grows to include
all neighboring pixels with similar intensity values.

2.1.4 Advantages and Disadvantages

- Advantages:
- Simple to implement.
- Produces connected regions.
- Can handle noise to some extent.

- Disadvantages:
- Sensitive to the choice of seed points.
- May over-segment the image if regions are not homogeneous.
2.2 Region Splitting and Merging

This technique divides an image into smaller regions and merges adjacent regions that meet a
homogeneity criterion.

2.2.1 Steps in Region Splitting and Merging

1. Splitting:
- Start with the entire image.
- Divide the image into quadrants.
- Recursively split the quadrants until each region satisfies a homogeneity criterion.

2. Merging:
- Adjacent regions are merged if they are similar in terms of pixel properties (e.g., intensity).

2.2.2 Algorithm

1. Begin with the whole image as a single region.


2. Split the region if it fails to meet the homogeneity criterion.
3. Continue splitting recursively.
4. Merge adjacent regions if they meet the homogeneity criterion.

2.2.3 Example

- Satellite Imaging: Splitting and merging can be used to segment land types, where different
regions represent forests, water bodies, and urban areas.

2.2.4 Advantages and Disadvantages

- Advantages:
- Adaptive and flexible.
- Can handle complex images.

- Disadvantages:
- Computationally expensive.
- Requires a robust homogeneity criterion for splitting and merging.

2.3 Watershed Segmentation

The watershed algorithm treats an image like a topographic surface, where pixel intensities
correspond to elevations. The algorithm “floods” the surface from regional minima, and dams
are built where waters from different minima meet. These dams represent the segmented
regions.

2.3.1 Steps in Watershed Segmentation

1. Gradient Image: Compute the gradient of the image, which emphasizes edges.
2. Minima Identification: Identify the local minima in the gradient image.
3. Flooding Process: Start flooding the gradient image from the minima.
4. Region Creation: Build barriers where floods from different minima meet, creating
segmented regions.
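A sketch of one common OpenCV marker-based watershed pipeline (the file name, the Otsu binarisation, and the 0.5 distance-transform cut-off are illustrative assumptions):

```python
import cv2
import numpy as np

img = cv2.imread("bolts.png")                          # placeholder colour image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Binary foreground via Otsu, then markers from the distance transform.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
sure_fg = sure_fg.astype(np.uint8)
unknown = cv2.subtract(binary, sure_fg)

# Label the sure-foreground blobs; 0 is reserved for the "unknown" band.
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown == 255] = 0

# Flooding; pixels on the dams (region boundaries) are labelled -1.
markers = cv2.watershed(img, markers)
```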

2.3.2 Example
- Object Segmentation: In industrial applications, watershed segmentation is used to separate
overlapping objects, such as nuts and bolts on a conveyor belt.
2.3.3 Advantages and Disadvantages
- Advantages:
- Good for separating touching or overlapping objects.
- Provides accurate region boundaries.
- Disadvantages:
- Sensitive to noise.
- Over-segmentation is a common problem, often mitigated by preprocessing the image (e.g.,
smoothing).
3. Region-Based Segmentation in Practice

3.1 Homogeneity Criteria

The success of region-based segmentation depends on defining appropriate homogeneity
criteria, which determine when pixels should be grouped together.

3.1.1 Intensity-Based Homogeneity

- Definition: Pixels are considered similar if their intensity values are close.
- Example: In a grayscale image, all pixels with intensity values between 100 and 120 might
belong to the same region.

3.1.2 Color-Based Homogeneity

- Definition: Pixels are grouped based on their color values (e.g., in RGB or HSV color spaces).
- Example: Grouping all pixels with similar shades of red to form a region representing a red
object.

3.1.3 Texture-Based Homogeneity

- Definition: Pixels are grouped based on their texture properties (e.g., roughness, regularity).
- Example: Segmenting regions in an image of a fabric based on different weaving patterns.

3.2 Post-Processing Techniques

After segmentation, post-processing is often required to refine the results and eliminate noise.
3.2.1 Morphological Operations

- Erosion: Removes small objects or noise.


- Dilation: Fills gaps between segmented regions.
- Closing: Dilation followed by erosion to close small holes.
- Opening: Erosion followed by dilation to remove noise.

3.2.2 Region Merging and Refinement

- Adjacent regions may need to be merged based on similarity criteria.


- Small, irrelevant regions may be removed or merged with larger regions.

4. Applications of Region-Based Segmentation

4.1 Medical Imaging

- Tumor Detection: Segmenting medical images to isolate and analyze tumors based on pixel
intensities.
- Organ Segmentation: Identifying and segmenting different organs, such as the heart or liver,
for diagnostic purposes.

4.2 Satellite Imaging

- Land Use Classification: Segmenting satellite images into regions representing different land
types, such as forests, water bodies, and urban areas.
- Geographical Mapping: Creating maps by identifying and labeling different regions in
satellite images.

4.3 Industrial Inspection

- Defect Detection: Identifying defects in manufacturing products by segmenting regions with
abnormal intensity or texture patterns.
- Object Counting: Segmenting and counting objects, such as nuts and bolts, on a conveyor
belt.
5. Challenges in Region-Based Segmentation

5.1 Noise and Artifacts

- Problem: Noise can cause over-segmentation, where small irrelevant regions are created.
- Solution: Preprocessing techniques like filtering (e.g., Gaussian blur) or morphological
operations can help reduce noise.

5.2 Homogeneity Criteria Selection

- Problem: Choosing an inappropriate homogeneity criterion can result in poor segmentation.


- Solution: Experimenting with different criteria, such as intensity, color, or texture, and
combining them if necessary.

5.3 Over-Segmentation

- Problem: Some techniques, like watershed, are prone to over-segmentation.


- Solution: Apply preprocessing (e.g., gradient smoothing) or post-processing (e.g., region
merging) to refine the results.
Topic 4 - Boundary Representations

1. What is Boundary Representation?

Boundary representation (B-rep) in image processing is a method for describing the shape of
an object by specifying its outer boundary or contour. It captures the object's geometry by
focusing on its periphery, making it easier to perform shape-based analyses.

1.1 Why Boundary Representations?

- Shape Analysis: B-rep helps in extracting meaningful shape features, such as corners and
edges.
- Object Recognition: Boundaries provide a way to recognize and classify objects by comparing
their shapes.
- Segmentation: Boundaries define the regions of interest in an image.

2. Techniques for Boundary Representation

Boundary representations can be constructed using several techniques, each with its own
advantages depending on the application.

2.1 Chain Code Representation

Chain codes represent boundaries by encoding the direction of movement between consecutive
boundary pixels. Each movement is assigned a specific code, and the boundary is described by
a sequence of these codes.
2.1.1 Steps in Chain Code Representation

1. Boundary Extraction: Extract the boundary of the object from the image.
2. Coding: Assign directional codes (e.g., 0 for right, 1 for up, 2 for left, etc.) based on the
movement between consecutive boundary pixels.
2.1.2 Example of Chain Code

Consider an object boundary that moves right, up, and left in a grid. The chain code could be
represented as:

- Right → 0
- Up → 1
- Left → 2

So, the boundary sequence would be: `[0, 1, 2]`.
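A small Python sketch that computes this 4-directional chain code from an ordered list of boundary pixels (the boundary coordinates are illustrative and assumed to be 4-connected; the "down = 3" code is assumed for completeness):

```python
# 4-directional chain code: 0 = right, 1 = up, 2 = left, 3 = down (assumed).
# Coordinates are (row, col), so "up" means the row index decreases.
DIRECTIONS = {(0, 1): 0, (-1, 0): 1, (0, -1): 2, (1, 0): 3}

def chain_code(boundary):
    """boundary: ordered list of (row, col) points, consecutive points 4-connected."""
    codes = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:]):
        codes.append(DIRECTIONS[(r1 - r0, c1 - c0)])
    return codes

# Right, then up, then left -> [0, 1, 2]
print(chain_code([(5, 5), (5, 6), (4, 6), (4, 5)]))
```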

2.1.3 Advantages and Disadvantages

- Advantages:
- Compact representation of boundaries.
- Easy to compute.

- Disadvantages:
- Sensitive to noise.
- The chain code representation is dependent on the starting point.

2.2 Polygonal Approximations

Polygonal approximations represent boundaries as a series of straight-line segments connecting
significant points on the contour. This method simplifies complex boundaries by reducing them
to a set of vertices and edges.
2.2.1 Steps in Polygonal Approximation

1. Boundary Extraction: Extract the boundary of the object.


2. Vertex Detection: Identify significant boundary points (e.g., corners, junctions).
3. Line Fitting: Approximate the boundary using straight lines between the vertices.
2.2.2 Example of Polygonal Approximation

A circular object might be represented by a polygon with 8 vertices, approximating the circular
shape with straight lines connecting those vertices.
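A short OpenCV sketch of polygonal approximation using the Douglas-Peucker routine cv2.approxPolyDP (the file name and the 2% epsilon are illustrative assumptions):

```python
import cv2

img = cv2.imread("shape.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# Extract the outer boundary and approximate it with a polygon.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contour = max(contours, key=cv2.contourArea)

# epsilon = maximum allowed deviation from the true boundary (here 2% of length).
epsilon = 0.02 * cv2.arcLength(contour, True)
polygon = cv2.approxPolyDP(contour, epsilon, True)
print("Boundary reduced to", len(polygon), "vertices")
```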

2.2.3 Advantages and Disadvantages

- Advantages:
- Simplifies complex boundaries.
- Reduces computational complexity.

- Disadvantages:
- Approximation may lose details of the actual boundary.
- Choice of significant points can impact accuracy.

2.3 B-Splines and Parametric Curves

B-splines (Basis splines) and parametric curves represent boundaries using smooth, continuous
curves, defined by control points. This method is particularly useful for representing curved
boundaries.

2.3.1 B-Splines

B-splines use a set of control points to create smooth curves that approximate or interpolate the
boundary of an object.

- Control Points: These are the main points that define the shape of the spline.
- Knot Vector: This determines how control points influence the curve.

2.3.2 Steps in B-Spline Representation


1. Control Point Selection: Choose control points along the boundary.
2. Curve Fitting: Generate a smooth curve that passes near (or through) these control points.
2.3.3 Example of B-Spline

An object's boundary, which is curvilinear, can be described by a spline that passes through
key points on the boundary. The resulting curve is smooth, capturing the object's shape more
naturally than straight-line segments.
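A small SciPy sketch that fits a closed smoothing B-spline to ordered boundary points (the sample coordinates and the smoothing factor are illustrative assumptions):

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Ordered boundary points (x, y); a rough hand-picked outline (illustrative).
x = np.array([10, 30, 50, 60, 50, 30, 10, 0], dtype=float)
y = np.array([0, -5, 0, 20, 40, 45, 40, 20], dtype=float)

# Fit a closed (periodic) smoothing B-spline through the control points.
tck, u = splprep([x, y], s=5.0, per=True)

# Sample the smooth curve densely for display or further analysis.
u_fine = np.linspace(0, 1, 200)
x_smooth, y_smooth = splev(u_fine, tck)
```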

2.3.4 Advantages and Disadvantages

- Advantages:
- Captures smooth boundaries accurately.
- Flexible and can represent complex shapes.

- Disadvantages:
- Computationally more expensive.
- Requires careful selection of control points.

3. Boundary Descriptors

Once a boundary is represented, various descriptors can be used to characterize its shape and
properties. These descriptors are essential for comparing and recognizing objects.

3.1 Simple Shape Descriptors

3.1.1 Perimeter

The perimeter is the length of the boundary, calculated by summing the distances between
consecutive boundary points.
3.1.2 Compactness

Compactness measures how closely packed an object is. A common (normalized) form is the ratio of
the square of the perimeter to 4π times the area: compactness = P² / (4πA).
3.1.3 Example

For a circular object, compactness would be close to 1 (its minimum possible value), while for
irregular objects, compactness would be larger.
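A short OpenCV sketch computing perimeter, area, and the normalized compactness of the largest contour in a binary mask (the file name is a placeholder):

```python
import cv2
import numpy as np

mask = cv2.imread("object_mask.png", cv2.IMREAD_GRAYSCALE)  # binary mask, placeholder
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
contour = max(contours, key=cv2.contourArea)

perimeter = cv2.arcLength(contour, True)
area = cv2.contourArea(contour)
compactness = perimeter ** 2 / (4 * np.pi * area)   # 1 for a perfect circle
print(f"perimeter={perimeter:.1f}, area={area:.1f}, compactness={compactness:.2f}")
```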

4. Applications of Boundary Representations

4.1 Object Recognition

Boundary representations are widely used in object recognition tasks. By extracting boundary
features and comparing them to known objects, the system can classify or identify objects in
images.
- Example: Recognizing shapes like circles, triangles, or more complex objects in industrial
applications.

4.2 Medical Imaging


In medical image processing, boundaries are used to segment and identify important structures,
such as organs or tumors, in medical scans.

- Example: Identifying the boundaries of a tumor in an MRI scan for diagnostic purposes.
4.3 Robotics and Navigation

Robots and autonomous systems use boundary representations to navigate their environment,
recognizing and avoiding obstacles based on their boundaries.

- Example: A robot using boundary representations to identify and avoid objects in its path.

5. Challenges in Boundary Representations


5.1 Noise and Artifacts
Noise in images can cause boundary extraction to fail, producing irregular or broken
boundaries.
- Solution: Preprocessing techniques such as smoothing or filtering can help remove noise
before boundary extraction.

5.2 Scale and Resolution

Boundaries can appear different at varying scales and resolutions, making it difficult to use the
same representation across different images.

- Solution: Multi-scale representations can help by representing boundaries at different levels
of detail.

5.3 Occlusion and Partial Boundaries


Objects may be partially occluded, resulting in incomplete boundaries.
- Solution: In such cases, shape completion algorithms can be used to infer the missing parts
of the boundary.
Topic 5 - Region Representations

1. What is Region Representation?

Region representation is a technique used to define an object or area within an image by
considering its interior pixels and their properties. Unlike boundary representation, which
focuses on the object's outline, region representation considers the object's entire area.

1.1 Importance of Region Representation

- Segmentation: Helps in dividing an image into meaningful parts, like different objects or
regions.
- Texture and Color Analysis: Allows for analyzing the texture, color, or intensity inside the
region.
- Feature Extraction: Extracts features that describe the region, which are useful for
classification and recognition tasks.

2. Techniques for Region Representation

Several techniques exist for representing regions in an image, depending on the type of
information needed for a particular task. The following are the most commonly used methods.

2.1 Binary Region Representation

In binary region representation, each pixel within a region is assigned a value of 1 (indicating
the pixel belongs to the region) or 0 (indicating it belongs to the background). This method is
often used when we only need to distinguish between object and background.

2.1.1 Steps in Binary Region Representation

1. Thresholding: Convert the grayscale image to a binary image by choosing a threshold value.
2. Region Identification: Mark the pixels with intensity values above the threshold as belonging
to the region.
2.1.2 Example of Binary Region Representation

Consider a grayscale image where an object has higher intensity than the background. By
setting a threshold, all pixels with intensity greater than the threshold are set to 1, while the
others are set to 0.

- Original image:
10 20 30
80 90 100
40 50 60
- After thresholding (T = 50; pixels with intensity greater than 50 become 1):
0 0 0
1 1 1
0 0 1

2.1.3 Advantages and Disadvantages


- Advantages:
- Simple and easy to implement.
- Effective for images with distinct object/background contrast.
- Disadvantages:
- Not suitable for images with varying object intensity.
- Sensitive to noise and threshold selection.

2.2 Labeling and Connected Components

Connected component labeling assigns unique labels to each distinct region in a binary image.
Pixels that are connected (e.g., neighboring pixels with the same intensity) are grouped together
to form a region.
2.2.1 Steps in Connected Components Labeling

1. Binary Image: Start with a binary image where pixels belonging to the region are labeled as
1.
2. Labeling: Traverse the image, labeling connected pixels as belonging to the same region.
3. Region Extraction: Identify and label each connected region.

2.2.2 Example of Connected Components

- Binary image:
1 0 0 1
1 0 1 1
0 1 0 0
- Labeled regions:
1 0 0 2
1 0 2 2
0 3 0 0
Here, pixels connected in the same region are assigned the same label (1, 2, 3).
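A minimal SciPy sketch that labels this example with 4-connectivity (the exact label ordering may differ slightly between implementations):

```python
import numpy as np
from scipy import ndimage

binary = np.array([[1, 0, 0, 1],
                   [1, 0, 1, 1],
                   [0, 1, 0, 0]])

# 4-connected labelling: each connected group of 1s gets its own label.
labels, num = ndimage.label(binary)
print(num)      # 3 regions
print(labels)   # e.g. [[1 0 0 2], [1 0 2 2], [0 3 0 0]]
```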

2.2.3 Advantages and Disadvantages

- Advantages:
- Useful for separating multiple objects within an image.
- Provides a unique label for each connected region.

- Disadvantages:
- Complex for large images with many connected regions.
- May fail if noise creates false connections between regions.
2.3 Region Adjacency Graph (RAG)

The Region Adjacency Graph (RAG) represents an image’s regions as nodes in a graph, where
edges connect nodes if the regions they represent are adjacent. RAG is helpful for hierarchical
segmentation and region merging.

2.3.1 Steps in RAG Representation

1. Segmentation: Segment the image into regions.


2. Graph Construction: Represent each region as a node.
3. Adjacency: Connect nodes whose regions are adjacent.
2.3.2 Example of Region Adjacency Graph

- Regions in an image:
R1 R1 R2
R3 R3 R2

- Corresponding RAG:

(R1)---(R2)
  |      |
(R3)-----+

Here R1 is adjacent to both R2 and R3, and R2 and R3 also touch in the second row, so the graph
contains the edges R1-R2, R1-R3, and R2-R3.
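A small NumPy sketch that derives these adjacency edges from the label image by comparing each pixel with its right and lower neighbours (4-connectivity):

```python
import numpy as np

# Label image from the example above (regions R1, R2, R3 -> labels 1, 2, 3).
labels = np.array([[1, 1, 2],
                   [3, 3, 2]])

edges = set()
h, w = labels.shape
for r in range(h):
    for c in range(w):
        for nr, nc in ((r, c + 1), (r + 1, c)):
            if nr < h and nc < w and labels[nr, nc] != labels[r, c]:
                edges.add(tuple(sorted((labels[r, c], labels[nr, nc]))))

print(edges)   # {(1, 2), (1, 3), (2, 3)}
```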

2.3.3 Advantages and Disadvantages


- Advantages:
- Simplifies the representation of segmented images.
- Useful for region merging and hierarchical segmentation.
- Disadvantages:
- Graph-based methods can be computationally intensive for large images.
- Requires accurate segmentation as a precursor.
2.4 Quadtree Representation

In quadtree representation, an image is recursively divided into four quadrants, or subregions,
based on homogeneity. This representation is especially useful for images that contain regions
of varying sizes.

2.4.1 Steps in Quadtree Representation

1. Initial Division: Divide the image into four quadrants.


2. Homogeneity Check: Check if each quadrant is homogeneous (i.e., contains similar pixel
values).
3. Recursive Division: If a region is not homogeneous, divide it further into four subregions.

2.4.2 Example of Quadtree Representation


Consider a 4x4 image:
- Initial image:
10 10 20 20
10 10 20 20
30 30 40 40
30 30 40 40

- After quadtree division:


- Quadrant 1: 10 (homogeneous)
- Quadrant 2: 20 (homogeneous)
- Quadrant 3: 30 (homogeneous)
- Quadrant 4: 40 (homogeneous)
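A compact recursive quadtree sketch in NumPy that reproduces this example (it assumes a square image whose side is a power of two and a strict max-minus-min homogeneity test):

```python
import numpy as np

def quadtree(img, r0, c0, size, max_range=0):
    """Recursively split a square block until each block is homogeneous
    (max - min intensity <= max_range). Returns (row, col, size, value) leaves."""
    block = img[r0:r0 + size, c0:c0 + size]
    if size == 1 or block.max() - block.min() <= max_range:
        return [(r0, c0, size, int(block.mean()))]
    half = size // 2
    leaves = []
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        leaves += quadtree(img, r0 + dr, c0 + dc, half, max_range)
    return leaves

img = np.array([[10, 10, 20, 20],
                [10, 10, 20, 20],
                [30, 30, 40, 40],
                [30, 30, 40, 40]])
print(quadtree(img, 0, 0, 4))   # four homogeneous 2x2 quadrants
```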
2.4.3 Advantages and Disadvantages
- Advantages:
- Reduces memory usage by focusing on large homogeneous regions.
- Efficient for images with large areas of uniform intensity.
- Disadvantages:
- Less effective for images with small, irregular regions.
- Recursive divisions can be computationally expensive.

3. Region Descriptors
Once a region has been represented, various descriptors can be used to characterize its
properties, such as shape, texture, and intensity. These descriptors provide additional
information for tasks like classification and object recognition.

3.1 Area
The area is the total number of pixels within the region. This is a basic descriptor that indicates
the size of the object.

3.2 Centroid
The centroid of a region is the average of the positions of all the pixels in the region. It gives
the "center of mass" of the region.

3.3 Aspect Ratio


The aspect ratio is the ratio of the region’s width to its height. It provides information about the
shape of the region.

3.4 Texture Descriptors


Texture descriptors characterize the variation of pixel intensity within the region. Common
texture measures include contrast, correlation, and entropy, which can be computed using
statistical methods like the Gray Level Co-occurrence Matrix (GLCM).
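A small NumPy sketch computing the simpler descriptors from 3.1-3.3 (area, centroid, and a bounding-box aspect ratio) for a binary region mask; the sample mask is illustrative:

```python
import numpy as np

def region_descriptors(mask):
    """Basic descriptors for a binary region mask (1 = region, 0 = background)."""
    rows, cols = np.nonzero(mask)
    area = len(rows)                              # number of region pixels
    centroid = (rows.mean(), cols.mean())         # centre of mass (row, col)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    aspect_ratio = width / height                 # bounding-box width / height
    return area, centroid, aspect_ratio

mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:5, 3:9] = 1                                # a 3x6 rectangular region
print(region_descriptors(mask))                   # (18, (3.0, 5.5), 2.0)
```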

4. Applications of Region Representation

4.1 Image Segmentation


Region representations are critical in image segmentation tasks, where the goal is to divide the
image into meaningful regions based on pixel properties such as intensity or texture.

- Example: Segmenting an image into regions corresponding to different objects (e.g.,
separating a car from the background).
4.2 Medical Imaging

In medical image processing, region representations are used to identify and analyze
anatomical structures, such as organs or tumors, based on their intensity characteristics.

- Example: Identifying a tumor as a region in an MRI scan based on its higher intensity
compared to surrounding tissue.

4.3 Object Detection and Recognition


Region representations are also used in object detection and recognition tasks.
Topic 6 - Boundary Descriptors

1. What are Boundary Descriptors?

Boundary descriptors provide a mathematical or geometric representation of an object’s
boundary, capturing essential features such as shape, size, and curvature. These descriptors help
reduce the complexity of the boundary data while preserving the most relevant information for
tasks such as object detection, classification, and recognition.

2. Types of Boundary Descriptors

There are several types of boundary descriptors used in computer vision, each capturing
different aspects of the object’s boundary. Common types include:

1. Shape Signatures
2. Curvature
3. Fourier Descriptors
4. Chain Codes
5. Shape Invariants

2.1 Shape Signatures

A shape signature is a one-dimensional function derived from a two-dimensional shape. It
provides a compact representation of the shape by measuring some property of the boundary
(e.g., distance from the centroid) at each boundary point.
2.1.1 Radial Distance Signature
The radial distance signature measures the distance between the centroid of the object and each
point on the boundary.

- Example:
Consider a circular object with a radius of 5. The radial distance signature for every boundary
point will be constant (5), producing a flat line in the signature plot.
2.1.2 Curvature Signature
The curvature signature represents the change in the angle of the boundary at each point. It is
calculated by taking the derivative of the boundary orientation with respect to the arc length.

- Steps:
1. Calculate the angle θi at each boundary point.
2. Compute the change in angle Δθi between consecutive boundary points.

- Application: The curvature signature is useful for identifying corners and sharp turns in the
boundary.

2.2 Curvature
Curvature measures how sharply a boundary is turning at a given point. It is a key feature for
recognizing objects with distinct corners or smooth curves.

2.2.1 Curvature Calculation

For a continuous boundary curve, curvature k is given by:


k = dθ/ds
where θ is the angle of the tangent to the curve, and s is the arc length along the boundary.
For a discrete boundary, curvature is approximated by the change in angle between consecutive
boundary points.

2.2.2 Example of Curvature


- A circle has a constant curvature at all points because the boundary turns uniformly.
- A rectangle has sharp changes in curvature at its corners (high curvature at the corners, zero
elsewhere).

2.4 Chain Codes


Chain codes provide a way to represent the boundary of a region by encoding the relative
movement from one boundary pixel to the next. This representation is based on the direction
in which the boundary moves between adjacent pixels.
2.4.1 Chain Code Directions
In an 8-connected grid, there are 8 possible directions of movement, each represented by a
number:
3 2 1
4 * 0
5 6 7
where the centre position (*) is the current boundary pixel.

2.4.2 Chain Code Representation


For a boundary, the chain code is a sequence of numbers representing the direction of
movement between consecutive boundary points.

- Example: Consider the boundary of a small square traced clockwise from the top-left corner:
three moves right, three moves down, three moves left, and three moves up.
Chain code: 0 0 0 6 6 6 4 4 4 2 2 2

2.4.3 Advantages and Disadvantages


- Advantages:
- Simple and easy to implement.
- Compact representation of boundary information.

- Disadvantages:
- Sensitive to noise and small perturbations.
- Not scale- or rotation-invariant.

2.5 Shape Invariants


Shape invariants are boundary descriptors that remain unchanged under transformations such
as translation, rotation, and scaling. These descriptors are particularly useful for object
recognition tasks where the object may appear in different positions, orientations, or sizes.
2.5.1 Examples of Shape Invariants
- Area: The area enclosed by the boundary (invariant to translation and rotation).
- Perimeter: The length of the boundary (invariant to translation and rotation).
- Aspect Ratio: The ratio of the length of the major axis to the length of the minor axis (also
invariant to scaling).
2.5.2 Invariant Moments

Moment invariants are a set of functions of image moments that are invariant to translation,
rotation, and scaling. They are used to describe the shape of an object in a way that remains
consistent under transformations.
- Hu’s Moment Invariants: These are seven functions of central moments that provide a
rotation, translation, and scale-invariant description of a shape.
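A two-line OpenCV sketch computing Hu's seven moment invariants from a binary shape mask (the file name is a placeholder):

```python
import cv2

mask = cv2.imread("shape_mask.png", cv2.IMREAD_GRAYSCALE)  # binary mask, placeholder

# Image moments (treating nonzero pixels as 1), then Hu's seven invariants,
# which are invariant to translation, rotation, and scaling.
moments = cv2.moments(mask, True)
hu = cv2.HuMoments(moments).flatten()
print(hu)   # seven values describing the shape
```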

3. Applications of Boundary Descriptors


Boundary descriptors are widely used in various computer vision applications, particularly
those that require a compact and descriptive representation of an object’s shape.

3.1 Object Recognition


Boundary descriptors, such as Fourier descriptors and shape invariants, are used to recognize
objects in images, even if they have undergone transformations such as rotation or scaling.
- Example: Recognizing different types of vehicles (cars, trucks, etc.) based on their boundary
shapes.
3.2 Shape Analysis
In medical imaging, boundary descriptors are used to analyze the shapes of anatomical
structures, such as organs or tumors. Shape analysis helps in diagnosing abnormalities and
detecting disease.
- Example: Identifying tumors based on the irregularity of their boundaries in MRI scans.

3.3 Handwritten Character Recognition


Boundary descriptors, particularly chain codes and Fourier descriptors, are used in recognizing
handwritten characters and digits. The boundary of each character is analyzed and compared
to a set of reference shapes.
- Example: Recognizing handwritten digits in postal code sorting applications.
Topic 7 - Regional Descriptors and Image Warping

1. Introduction
In computer vision and image processing, regional descriptors provide a way to describe the
characteristics of regions within an image, focusing on the internal properties of an object (such
as texture, area, and moments). Image warping, on the other hand, involves geometrically
transforming an image to correct distortions or achieve a specific visual effect, often used in
tasks like image registration and texture mapping.

2. Regional Descriptors

Regional descriptors capture the properties of a region or area within an image, typically
focusing on the interior of an object rather than its boundary. These descriptors are essential
for understanding the object's characteristics, like its shape, size, or texture, and are useful for
tasks such as object recognition and segmentation.

2.1 What are Regional Descriptors?

Regional descriptors are a set of features that describe various properties of a region in an
image, including its geometric shape, texture, and statistical properties. They are especially
important when working with region-based segmentation techniques.

2.2 Types of Regional Descriptors

2.2.1 Area
Area measures the number of pixels contained within the boundary of a region or object. It’s
one of the simplest descriptors and provides an indication of the size of the object.

- Example: In a binary image, the area of an object is simply the number of pixels with value
1 (foreground pixels).

2.2.2 Centroid
The centroid is the center of mass of the region, calculated by averaging the pixel coordinates
of all the points within the region. It gives a point that represents the location of the region in
space.
- Example: For a circular object, the centroid will be the center of the circle.

2.2.3 Eccentricity
Eccentricity is a measure of how much a region deviates from being circular. For the ellipse that
best fits the region, it is the ratio of the distance between the foci to the length of the major axis
(equivalently, √(1 − (b/a)²), where a and b are the semi-major and semi-minor axis lengths).

- Example: A circle has an eccentricity of 0, while an elongated ellipse has an eccentricity
approaching 1.

2.2.4 Moments
Moments are a set of scalar quantities that provide information about the shape of the region.
Moments can be used to calculate various geometric properties, such as area, centroid, and
orientation.
- Central Moments: Central moments are calculated relative to the centroid and provide more
robust information about the shape.

2.2.5 Texture Descriptors


Texture descriptors provide information about the surface texture of the region, which is
important for recognizing objects based on their appearance rather than just their shape.

- Gray-Level Co-occurrence Matrix (GLCM): Describes how often pairs of pixels with specific
values occur in a certain spatial relationship in the region. It captures texture properties such
as contrast, homogeneity, and correlation.
- Example: A region with a smooth texture will have a low contrast and a more homogeneous
distribution of pixel intensities.

2.2.6 Compactness
Compactness is a measure of how tightly packed the region’s pixels are, relative to its
perimeter; it is commonly computed as perimeter² / area (sometimes normalized by 4π so that a
circle scores 1).
- Example: A perfect circle has the smallest compactness value for a given area, while elongated
or irregular shapes have higher compactness.
2.3 Applications of Regional Descriptors

- Object Recognition: Regional descriptors help in distinguishing between objects based on
their internal properties such as area, centroid, and texture.

- Medical Imaging: Regional descriptors are used to analyze regions of interest, such as
detecting tumors or lesions in MRI scans based on size, texture, or shape.

- Image Segmentation: Regional descriptors can be used to improve the accuracy of
segmentation algorithms by identifying similar regions based on shared properties.

3. Image Warping

Image warping refers to the process of geometrically transforming an image so that the objects
within it appear in a desired manner. The transformation can involve scaling, rotating,
translating, or distorting the image.

3.1 What is Image Warping?


Image warping modifies the spatial coordinates of pixels in an image to achieve a desired
transformation. This process is particularly useful in correcting distortions caused by
perspective or lens distortion, aligning images, and for special effects in computer graphics.

3.2 Types of Image Warping

3.2.1 Affine Transformation


An affine transformation is a linear mapping that preserves points, straight lines, and planes. It
can include translation, rotation, scaling, and shearing.
- Example: Rotating an image by a certain angle while maintaining the straightness of lines and
the uniformity of parallel lines.
3.2.2 Perspective Transformation
A perspective transformation (also called homography) is a more general form of
transformation that maps points in a 2D plane to other points on a plane, taking into account
the effects of perspective.
- Example: Correcting the distortion in an image taken at an angle to make it look as if the
image was taken head-on.
3.2.3 Non-Linear Warping
In non-linear warping, the transformation applied to the image is not a simple linear operation
but can vary across the image. Non-linear warping is used to correct lens distortions or to morph
images.
- Radial Distortion Correction: Corrects the distortion caused by wide-angle lenses, where
straight lines appear curved.
- Example: Removing the barrel distortion in images taken with fisheye lenses.

3.3 Steps in Image Warping

1. Define the Transformation: Specify the transformation that needs to be applied, such as affine
or perspective transformation.
2. Apply the Transformation: For each pixel in the original image, compute its new location
based on the transformation matrix.

3. Interpolation: Since pixel coordinates are typically non-integer values after transformation,
interpolation is used to estimate the new pixel values. Common interpolation methods include
nearest neighbor, bilinear, and bicubic interpolation.
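A short OpenCV sketch of an affine (rotation) warp and a perspective warp (the file name, rotation angle, and corner coordinates are illustrative assumptions):

```python
import cv2
import numpy as np

img = cv2.imread("document.png")                       # placeholder path
h, w = img.shape[:2]

# Affine warp: rotate 15 degrees about the image centre, scale = 1.
M_affine = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0)
rotated = cv2.warpAffine(img, M_affine, (w, h), flags=cv2.INTER_LINEAR)

# Perspective warp: map four corners of a tilted document to a rectangle
# (the source coordinates below are illustrative).
src = np.float32([[50, 80], [420, 60], [460, 500], [30, 520]])
dst = np.float32([[0, 0], [400, 0], [400, 500], [0, 500]])
M_persp = cv2.getPerspectiveTransform(src, dst)
frontal = cv2.warpPerspective(img, M_persp, (400, 500), flags=cv2.INTER_LINEAR)
```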

3.4 Applications of Image Warping

- Image Registration: Aligning two images taken from different perspectives or at different
times, such as in medical imaging to track the progression of disease.
- Texture Mapping: In computer graphics, warping is used to map textures onto 3D models.
- Panorama Stitching: Image warping is used to blend multiple images together into a single
panoramic view.
