DL Unit - 5


UNIT – V

UNIT V: Applications: Object recognition, sparse coding, computer vision, natural language
processing. Introduction to Deep Learning Tools: Caffe, Theano, Torch.

object detection:
Object detection, a subfield of computer vision, is an automated method for locating
objects of interest in an image and separating them from the background. For example, Figure 1 shows
two images with objects in the foreground. There is a bird in the left image, while there is a
dog and a person in the right image.

Figure 1. The Concept of Object Detection


Solving the object detection problem means placing a tight bounding box around each of these
objects and associating the correct object category with each bounding box. As with other
computer vision tasks, deep learning is the state-of-the-art approach to object detection.

How object detection works:

A key issue for object detection is that the number of objects in the foreground can
vary across images. But to understand how object detection works, let’s first consider
restricting the object detection problem by assuming that there is only one object per image.
If there is only one object per image, finding a bounding box and categorizing the object can
be solved in a straightforward manner. The bounding box consists of four numbers, so
learning the bounding box location can naturally be modeled as a regression problem. From
there, categorizing the object is a classification problem.

The convolutional neural network (CNN) shown in Figure 2 provides a solution to the
regression and classification problems for our restricted object detection problem. Like other
traditional computer vision tasks such as image recognition, keypoint detection, and
semantic segmentation, our restricted object detection problem deals with a fixed number of
targets. These targets can be fit by modeling them as a fixed number of classification or
regression problems.
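To make the "regression plus classification" view concrete, here is a minimal sketch of a combined loss for the single-object case. The function names, the (x, y, width, height) box layout, and the `box_weight` parameter are illustrative assumptions, not something the notes or Figure 2 specify.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def single_object_loss(pred_box, true_box, class_scores, true_class, box_weight=1.0):
    """Regression loss on the four box numbers plus classification loss on the label."""
    # Mean squared error over the four bounding-box numbers (regression part)
    box_loss = sum((p - t) ** 2 for p, t in zip(pred_box, true_box)) / 4.0
    # Cross-entropy on the predicted class distribution (classification part)
    class_loss = -math.log(softmax(class_scores)[true_class])
    return box_weight * box_loss + class_loss

# Assumed box layout: (x, y, width, height), normalized to [0, 1]
true_box = (0.25, 0.25, 0.5, 0.5)
good = single_object_loss((0.25, 0.25, 0.5, 0.5), true_box, [0.0, 5.0, 0.0], true_class=1)
bad = single_object_loss((0.6, 0.1, 0.2, 0.9), true_box, [5.0, 0.0, 0.0], true_class=1)
print(good < bad)
```

A network trained on such a loss has a fixed number of outputs (four box numbers and one score per class), which is why this formulation only covers one object per image.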

Figure 2. Network architecture to detect one object in the image

As already noted, true object detection must be able to deal with N objects, where N
varies from image to image. Unfortunately, the CNN shown in Figure 2 cannot solve this
more general problem. One workaround is to hypothesize many rectangular box locations and
sizes and use the CNN only to classify the content of each box. We often
refer to these rectangular boxes as windows. To be comprehensive, the window hypotheses
must cover all possible locations and sizes in the image. For each window size and location, a
decision can be made on whether an object is present and, if so, which category it
belongs to.
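The window-hypothesis idea can be sketched in a few lines. The helper names below are hypothetical, and `classify` is only a stand-in for running the CNN on a cropped window; the stride and window sizes are arbitrary illustrative choices.

```python
def sliding_windows(image_w, image_h, window_sizes, stride):
    """Yield (x, y, w, h) for every window of the given sizes that fits in the image."""
    for w, h in window_sizes:
        for y in range(0, image_h - h + 1, stride):
            for x in range(0, image_w - w + 1, stride):
                yield (x, y, w, h)

def detect(windows, classify):
    """Keep the windows for which the (hypothetical) classifier reports an object."""
    detections = []
    for box in windows:
        label = classify(box)  # stand-in for running the CNN on the cropped window
        if label is not None:
            detections.append((box, label))
    return detections

windows = list(sliding_windows(image_w=8, image_h=6,
                               window_sizes=[(4, 4), (8, 6)], stride=2))

# Toy classifier: pretend a dog is visible only in one particular window
toy_classifier = lambda box: "dog" if box == (2, 2, 4, 4) else None
detections = detect(windows, toy_classifier)
print(len(windows), detections)
```

Even on this tiny 8 x 6 grid the number of windows grows quickly as more sizes and smaller strides are added, which hints at why exhaustive window search is expensive on real images.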

sparse coding:
Sparse coding involves translating data into a high-dimensional feature space and
representing it as a linear combination of a limited number of basis functions. The
coefficients of the linear combination are selected under a sparsity constraint,
while reducing the reconstruction error between the original data and the representation.
This constraint pushes the final representation to be a sparse linear combination of the basis
functions, one in which most coefficients are zero.
In this way, sparse coding seeks to discover a usable sparse
representation of any given data. A sparse code is then used to encode the data:

 To learn the sparse representation, sparse coding requires only the input data
 This is helpful because it is a form of unsupervised learning and can be applied
directly to any data
 It finds the representation automatically while keeping the reconstruction error small
Sparse coding techniques aim to meet the following two constraints simultaneously:

 For a given data vector X, learn a useful sparse representation, a vector h,
using an encoder matrix D
 For each representation h, reconstruct the data X using a decoder matrix D

Mathematical Background of Sparse Coding:
We assume our data X can be approximated as a linear combination of the basis functions
collected in D, weighted by the sparse code h:

$$X \approx D h$$

 Sparse learning: a process that uses the training data $x_j$, $j \in \{1, \dots, m\}$, to
generate a dictionary of basis functions D
 Sparse encoding: a process that aims to learn the sparse code h using the test
data X and the generated dictionary D
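The general form of the target function to optimize the dictionary learning process may be
mathematically represented as follows:

$$\min_{D} \frac{1}{N} \sum_{k=1}^{N} \min_{h_k} \left( \frac{1}{2} \lVert x_k - D h_k \rVert_2^2 + \lambda \lVert h_k \rVert_1 \right)$$

In the previous equation, N is the number of data vectors and $x_k$ is the k-th
data vector. Furthermore, $h_k$ is the sparse representation of $x_k$; D is the
decoder matrix (dictionary); and $\lambda$ is the coefficient of sparsity. C is a constant,
typically used to bound the norm of the columns of D.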
The min expression in the general form shows that two optimization problems are nested.
We can think of them as two separate optimization problems competing to
reach the best compromise:

 Outer minimization: the procedure that deals with the left side of the sum,
adjusting D to reduce the loss incurred when reconstructing the input
 Inner minimization: the procedure that attempts to increase sparsity by lowering
the L1-norm of the sparse representation h. Put simply, we are trying to solve a problem (in
this case, reconstruction) while using the fewest resources feasible to store our data
A typical algorithm performs sparse coding by iteratively updating the coefficients for
each sample in the dataset, using the residuals from the previous iteration to update the
coefficients for the current iteration. The sparsity constraint is enforced by setting
coefficients to zero, or shrinking them toward zero, when they fall below a certain threshold.
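The notes do not include the algorithm listing itself, so the sketch below uses one standard choice that matches the description: the iterative shrinkage-thresholding algorithm (ISTA). It approximately minimizes 0.5 * ||x - D h||^2 + lam * ||h||_1 for a single sample with a fixed dictionary. The function names, the toy dictionary, and the values of `lam` and `step` are assumptions made for illustration.

```python
import math

def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero; small values become exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sparse_encode(x, atoms, lam=0.1, step=0.4, n_iter=300):
    """Sparse code h for one sample x via ISTA, with a fixed dictionary.

    `atoms` holds the dictionary columns (one list per basis function).
    For convergence, `step` should not exceed 1 / (largest eigenvalue of D^T D).
    """
    h = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Residual of the current reconstruction: r = x - D h
        recon = [sum(h[j] * atoms[j][i] for j in range(len(atoms)))
                 for i in range(len(x))]
        residual = [xi - ri for xi, ri in zip(x, recon)]
        # Gradient step along D^T r, then thresholding to enforce sparsity
        h = [
            soft_threshold(
                h[j] + step * sum(a * r for a, r in zip(atoms[j], residual)),
                step * lam,
            )
            for j in range(len(atoms))
        ]
    return h

s = math.sqrt(0.5)
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]  # toy dictionary with three basis functions
x = [1.0, 1.0]                            # points along the third basis function
h = sparse_encode(x, atoms)
print(h)
```

In this toy setup the sample could be represented with the first two basis functions together or with the third one alone; the L1 penalty favors the single-function representation, so only the third coefficient stays non-zero.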

Sparse Coding Neural Networks:


From a biological perspective, a sparse code is a special case of the more general idea of a neural
code. Consider the case when you have N binary neurons:

 The neural network receives some inputs and delivers outputs
 To calculate the outputs, some neurons in the network are frequently activated while others
are not activated at all
 The average activity ratio refers to the number of activations on some data, whereas
the neural code is the observed pattern of those activations for a specific input
 Neural coding is the process of instructing your neurons to produce a reliable neural
code
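As a small illustration of these terms, the sketch below represents each neural code as a list of 0/1 activations for N = 8 binary neurons. Treating the activity ratio as the fraction of active neurons, averaged over the inputs, is an interpretation assumed here; the function names are hypothetical.

```python
def activity_ratio(code):
    """Fraction of neurons that are active (1) in one binary neural code."""
    return sum(code) / len(code)

def average_activity_ratio(codes):
    """Mean activity ratio over the neural codes of several inputs."""
    return sum(activity_ratio(c) for c in codes) / len(codes)

# One neural code per input, for N = 8 binary neurons
codes = [
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]
avg = average_activity_ratio(codes)
print(avg)
```

A code is considered sparse when this ratio is low, that is, when only a few neurons respond to any given input.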
Now that we know what a neural code is, we can consider what it may look like.
Data will be encoded using a sparse code while taking into consideration the following
scenarios:

 No neurons are activated
 Only one neuron is activated
 Half of the neurons are active
 Neurons in half of the brain activate
The next figure shows an example of how a dense neural network graph is transformed into a
sparse neural network graph:

Applications of Sparse Coding


Brain imaging, natural language processing, and image and sound processing all use sparse
coding. Sparse coding has also been employed in leading-edge systems in several areas and
has proven to be very successful at capturing the structure of complicated data.
Resources for implementing sparse coding are available in several Python modules.
Scikit-learn, a well-known toolkit, offers sparse coding functionality with methods for
learning a dictionary of basis functions and sparsely combining these basis functions to
encode data.
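A minimal sketch of this workflow with scikit-learn's `DictionaryLearning` class from `sklearn.decomposition` is shown below. The data is random and purely illustrative, and the number of components and the sparsity weights are arbitrary assumptions rather than recommended settings.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)
X = rng.randn(20, 5)                    # 20 synthetic samples with 5 features each

learner = DictionaryLearning(
    n_components=8,                     # number of basis functions in the dictionary
    alpha=1.0,                          # sparsity weight while learning the dictionary
    transform_algorithm="lasso_lars",   # solver used to compute the sparse codes
    transform_alpha=0.1,                # sparsity weight while encoding
    max_iter=50,
    random_state=0,
)
codes = learner.fit(X).transform(X)     # sparse code h for every sample
dictionary = learner.components_        # learned basis functions, one per row
reconstruction = codes @ dictionary     # approximate reconstruction of X

print(codes.shape, dictionary.shape)
print("fraction of zero coefficients:", np.mean(codes == 0))
```

Here `fit` plays the role of sparse learning (building the dictionary) and `transform` plays the role of sparse encoding (computing h for given data with that dictionary).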

Content Source: https://www.baeldung.com/cs/sparse-coding#1-mathematical-background-of-sparse-coding

computer vision:
Computer vision, a fascinating field, involves teaching machines to interpret and
understand visual information. Deep learning, particularly neural networks, has
revolutionized computer vision by enabling models to learn directly from raw image data.
Here are some exciting applications of deep learning in computer vision:

1) Image Classification:

Image classification involves assigning a label to an entire image or photograph.

This problem is also referred to as “object classification” and perhaps more generally
as “image recognition,” although the latter term may apply to a much broader set of tasks
related to classifying the content of images.

Some examples of image classification include:


 Labeling an x-ray as cancer or not (binary classification).
 Classifying a handwritten digit (multiclass classification).
 Assigning a name to a photograph of a face (multiclass classification).

A popular example of image classification used as a benchmark problem is the MNIST
dataset.

Example of Handwritten Digits From the MNIST Dataset

2) Image Classification With Localization:


Image classification with localization assigns a class label to an image and indicates the
object's location with a bounding box, making it a more challenging version of image
classification.

Some examples of image classification with localization include:

 Labeling an x-ray as cancer or not and drawing a box around the cancerous region.
 Classifying photographs of animals and drawing a box around the animal in each
scene.

A classical dataset for image classification with localization is the PASCAL Visual
Object Classes dataset, or PASCAL VOC for short (e.g. VOC 2012). These datasets have been used
in computer vision challenges over many years.

The task may involve adding bounding boxes around multiple examples of the same
object in the image. As such, this task may sometimes be referred to as “object detection.”

3) Object Detection:
Object detection is a challenging task that involves image classification with localization,
often involving multiple objects of different types. This task is more challenging than simple
or localized image classification, and techniques developed for image classification with
localization are often used for object detection.

Some examples of object detection include:

 Drawing a bounding box and labeling each object in a street scene.


 Drawing a bounding box and labeling each object in an indoor photograph.
 Drawing a bounding box and labeling each object in a landscape.

The PASCAL Visual Object Classes dataset, or PASCAL VOC for short (e.g. VOC 2012),
is a common dataset for object detection.

Another dataset for multiple computer vision tasks is Microsoft’s Common Objects in
Context dataset, often referred to as MS COCO.

4) Object Segmentation:

Object segmentation is the process of detecting objects in an image by drawing a line
around each object, while image segmentation is a broader problem of dividing an image into
segments.

Object detection is also sometimes referred to as object segmentation.

Object segmentation is a method that identifies specific pixels in an image that belong
to an object, akin to fine-grained localization. It involves segmenting all pixels in an image
into different object categories.

Again, the VOC 2012 and MS COCO datasets can be used for object segmentation.
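To illustrate why segmentation is finer-grained than a bounding box, the sketch below stores a segmentation result as a grid of per-pixel class ids and derives both pixel counts and a tight box from it. The tiny mask, the class ids, and the function names are made up for illustration.

```python
from collections import Counter

def pixels_per_class(mask):
    """Count how many pixels carry each class id."""
    return Counter(label for row in mask for label in row)

def bounding_box(mask, target):
    """Tight (x_min, y_min, x_max, y_max) around all pixels labelled `target`."""
    coords = [(x, y)
              for y, row in enumerate(mask)
              for x, label in enumerate(row)
              if label == target]
    if not coords:
        return None
    xs, ys = zip(*coords)
    return (min(xs), min(ys), max(xs), max(ys))

# 0 = background, 1 and 2 = two hypothetical object categories
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
]
counts = pixels_per_class(mask)
print(counts[1], bounding_box(mask, 1), bounding_box(mask, 2))
```

A bounding box can always be recovered from a mask in this way, but not the other way around, since the box discards the exact outline of the object.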

5) Style Transfer:

Style transfer is a process of learning a style from images and applying it to a new
one, often referred to as a photo filter or transform.

It involves applying the style of famous artists, such as Pablo Picasso or Vincent
van Gogh, to new photographs. Datasets often combine public domain artworks with
photographs from standard computer vision datasets.

6) Image Colorization:

Image colorization or neural colorization involves converting a grayscale image to a
full-color image.

This task can be thought of as a type of photo filter or transform that may not have an
objective evaluation.

Examples include colorizing old black and white photographs and movies.

Datasets are often built from existing photo datasets by creating grayscale versions of
photos that models must learn to colorize.
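Building such a training pair can be sketched as follows. The luminance weights are the common ITU-R BT.601 ones, which is a choice made here and not something the notes prescribe; the image is a tiny made-up grid of (R, G, B) tuples and the function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels to luminance using ITU-R BT.601 weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

def make_training_pair(rgb_image):
    """Return (model input, target output) for a colorization model."""
    return to_grayscale(rgb_image), rgb_image

image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray, target = make_training_pair(image)
print(gray)
```

Because the original color photo is kept as the target, no manual labeling is needed to create the dataset.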

7) Image Reconstruction:

Image reconstruction and inpainting involve filling in missing or corrupt parts of an image.
Datasets are often built from existing photo datasets by creating corrupted versions of photos that
models must learn to repair.

Examples include reconstructing old, damaged black and white photographs and movies. Like
colorization, this task can be thought of as a type of photo filter or transform that may
not have an objective evaluation.

8) Image Super-Resolution:

Image super-resolution is the process of creating a new image with higher resolution and
detail than the original. Models developed for this purpose can often also be used for image
restoration and inpainting, since these are related problems. Datasets are often built from
existing photo datasets by creating down-scaled versions of photos that models must learn to upscale.
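One simple way to create a down-scaled version is block averaging, sketched below for a grayscale image stored as a list of rows. Averaging is only one of several possible down-scaling methods and is an assumption here, as are the function name and the toy image.

```python
def downscale(image, factor=2):
    """Average non-overlapping factor x factor blocks of a grayscale image."""
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[y + dy][x + dx]
                for dy in range(factor)
                for dx in range(factor)) / factor ** 2
            for x in range(0, width - factor + 1, factor)
        ]
        for y in range(0, height - factor + 1, factor)
    ]

high_res = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [50, 60, 70, 80],
]
low_res = downscale(high_res)   # model input; high_res is the training target
print(low_res)
```

The pair (low_res, high_res) then serves as one training example for a super-resolution model.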

9) Image Synthesis:

Image synthesis is the task of generating targeted modifications of existing images or
entirely new images.

This is a very broad area that is rapidly advancing.

It may include small modifications of image and video (e.g. image-to-image translations),
such as:

 Changing the style of an object in a scene.


 Adding an object to a scene.
 Adding a face to a scene.

natural language processing (NLP):
Language is a method of communication with which we can speak, read,
and write. We think, make decisions, and make plans in natural language;
that is, in words. The big question that confronts us in this AI era is whether we can
communicate in a similar manner with computers. In this sense, Natural
Language Processing (NLP) is the sub-field of Computer Science, and especially of Artificial
Intelligence (AI), that is concerned with enabling computers to understand and process
human language. Technically, the main task of NLP is to program computers to
analyse and process large amounts of natural language data.

Natural Language Processing Phases:

Figure: NLP Phases

The above diagram shows the phases or logical steps involved in natural language processing.

Morphological Processing:

It is the first phase of NLP. The purpose of this phase is to break chunks of language
input into sets of tokens corresponding to paragraphs, sentences and words. For example, a
word like “uneasy” can be broken into two sub-word tokens, “un” and “easy”.
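A deliberately naive sketch of this phase is shown below, using only regular expressions from the Python standard library. The function names and the short prefix list are hypothetical, and real morphological analysers are far more careful (this prefix rule would, for instance, wrongly split a word like "under").

```python
import re

def split_sentences(paragraph):
    """Split a paragraph after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

def split_words(sentence):
    """Extract alphabetic word tokens, keeping simple apostrophe forms together."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)

def split_prefix(word, prefixes=("un", "re", "dis")):
    """Naively split off a known prefix, e.g. 'uneasy' -> ['un', 'easy']."""
    for prefix in prefixes:
        if word.startswith(prefix) and len(word) > len(prefix) + 2:
            return [prefix, word[len(prefix):]]
    return [word]

text = "The boy goes to school. He felt uneasy at first."
sentences = split_sentences(text)
words = split_words(sentences[1])
print(sentences)
print(words)
print(split_prefix("uneasy"))
```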

Syntax Analysis:

It is the second phase of NLP. The purpose of this phase is twofold: to check whether a
sentence is well formed and to break it up into a structure that shows the syntactic
relationships between the different words. For example, a sentence like “The school goes to
the boy” would be rejected by the syntax analyser or parser.

Semantic Analysis:

It is the third phase of NLP. The purpose of this phase is to draw the exact meaning, or
dictionary meaning, from the text. The text is checked for meaningfulness. For
example, the semantic analyser would reject a phrase like “Hot ice-cream”.

Pragmatic Analysis:
It is the fourth phase of NLP. Pragmatic analysis fits the actual objects and events
that exist in a given context to the object references obtained during the previous phase (semantic
analysis).

For example, the sentence “Put the banana in the basket on the shelf” can have two semantic
interpretations, and the pragmatic analyser will choose between these two possibilities.

Content Source: https://www.tutorialspoint.com/natural_language_processing/natural_language_processing_quick_guide.htm

Types of NLP tasks based on how they work:

1) Speech Recognition: the translation of spoken language into text.
2) Natural Language Understanding (NLU): the computer’s ability to
understand what we say.
3) Natural Language Generation (NLG): the generation of natural language by
a computer.

Applications of NLP:

 Spam filters
 Algorithmic trading
 Question answering
 Summarizing information, etc.

Content Source: https://www.geeksforgeeks.org/natural-language-processing-overview/

Introduction to Deep Learning Tools: Caffe, Theano, Torch:


Deep learning tools like Caffe, Theano, and Torch are popular frameworks that facilitate
the development and implementation of deep neural networks. Each of these tools has its own
strengths and use cases, and they have been widely used in both research and industry. Let's
briefly introduce each of them:

1. Caffe:
o Description: Caffe, short for Convolutional Architecture for Fast Feature
Embedding, is a deep learning framework developed by the Berkeley Vision
and Learning Center (BVLC). It is known for its speed and efficiency,
particularly in convolutional neural networks (CNNs).
o Key Features:
 Designed for image classification tasks.
 Supports both CPU and GPU acceleration.
 Provides a clean and expressive architecture for defining neural
networks.
2. Theano:
o Description: Theano is a numerical computation library for Python that
allows you to define, optimize, and evaluate mathematical expressions,
especially matrix-valued ones. It is widely used for deep learning tasks.
o Key Features:
 Efficiently compiles mathematical expressions into optimized CPU or
GPU code.
 Supports symbolic differentiation, which is useful for training neural
networks.
 Provides a flexible and extensible platform for building various types
of models.
3. Torch:
o Description: Torch is an open-source machine learning library and scientific
computing framework built on the Lua scripting language. It provides a wide array of
tools for building and training deep learning models. PyTorch, a popular deep learning
library for Python, is based on Torch.
o Key Features:
 Dynamic computational graph: Allows for dynamic creation and
modification of the computation graph during runtime.
 Provides modules for building various types of neural networks.
 Supports GPU acceleration for faster training.
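Two ideas from the feature lists above, automatic differentiation for training and a computation graph that is built while the program runs, can be illustrated without any framework. The toy `Value` class below is a pure-Python sketch written for these notes; it is not the API of Caffe, Theano, Torch, or PyTorch, and it only supports addition and multiplication of scalars.

```python
class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents      # Values this one was computed from
        self._grad_fns = grad_fns    # maps the output gradient to each parent's share

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        """Reverse-mode differentiation over the graph recorded so far."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):  # outputs first, inputs last
            for parent, fn in zip(node._parents, node._grad_fns):
                parent.grad += fn(node.grad)

def model(w, x, b, n_layers):
    out = x
    for _ in range(n_layers):         # graph depth is decided at run time
        out = w * out + b
    return out

w, x, b = Value(2.0), Value(3.0), Value(1.0)
y = model(w, x, b, n_layers=2)        # y = w * (w * x + b) + b
y.backward()
print(y.data, w.grad, x.grad, b.grad)
```

Because the graph is recorded as ordinary Python code executes, changing `n_layers` or adding an `if` statement changes the graph on the next call, which is the essence of a dynamic computational graph.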

Each of these tools has played a significant role in the evolution of deep learning, but
it's important to note that the field has seen rapid advancements, and newer frameworks and
libraries have emerged over time. For example, PyTorch and TensorFlow have gained
popularity and are often preferred for their flexibility, ease of use, and strong community
support. When choosing a deep learning tool, it's essential to consider factors such as your
specific use case, the framework's community support, documentation, and your familiarity
with the tools.
