Computer Vision Class X


COMPUTER VISION

Introduction
As we all know, artificial intelligence is a technique that enables
computers to mimic human intelligence. As humans, we can see
things, analyse them and then take the required action on the basis of
what we see.
But can machines do the same? Can machines have the eyes that
humans have? If you answered Yes, then you are absolutely right. The
Computer Vision domain of Artificial Intelligence enables machines to
see through images or visual data, and to process and analyse them
using algorithms and methods in order to understand real-world
phenomena.
Now before we get into the concepts of Computer
Vision, let us experience this domain with the help of
the following game:
* Emoji Scavenger Hunt:
https://emojiscavengerhunt.withgoogle.com/
Go to the link and try to play the game of Emoji
Scavenger Hunt. The challenge here is to find 8
items within the time limit to pass.
Applications of Computer Vision

The concept of computer vision was first introduced in the
1970s, and the applications it promised excited everyone.
The technology has since advanced enough to make these
applications easily available to everyone today. In recent
years in particular, the world has witnessed a significant leap
in technology that has put computer vision on the priority list
of many industries. Let us look at some of them:
Facial Recognition: With the advent of smart cities and
smart homes, Computer Vision plays a vital role in making
the home smarter. Security, the most important
application, involves the use of Computer Vision for facial
recognition. This can be either guest recognition or log
maintenance of visitors. It also finds application in
schools, in attendance systems based on facial
recognition of students.
Face Filters: Modern-day apps like Instagram
and Snapchat have a lot of features based on
computer vision. Face filters are one of them.
Through the camera, the machine or the algorithm
is able to identify the facial dynamics of the
person and apply the selected face filter.
Google’s Search by Image: Most searches on Google’s search engine
are made with text, but it also has an interesting feature for getting
search results from an image. This uses Computer Vision: it
analyses various features of the input image, compares them with
its database of images and gives us the search results.
Computer Vision in Retail: The retail field has been one of the
fastest growing fields and at the same time is using Computer
Vision to make the user experience more fruitful. Retailers can
use Computer Vision techniques to track customers’ movements
through stores, analyse navigational routes and detect walking
patterns.
Inventory management is another such application. Through
security camera image analysis, a Computer Vision algorithm can
generate a very accurate estimate of the
items available in the store. It can also analyse
the use of shelf space to identify suboptimal
configurations and suggest better item placement.
Self-Driving Cars: Computer Vision is the fundamental
technology behind autonomous vehicles. Most
leading car manufacturers in the world are reaping the
benefits of investing in artificial intelligence to develop
on-road versions of hands-free technology.
This involves identifying objects, working out
navigational routes and, at the same time, monitoring the
environment.
Medical Imaging: For the last few decades, computer-supported
medical imaging applications have been a
trustworthy help for physicians. They not only create and
analyse images, but also act as an assistant and help
doctors with their interpretation. Such applications are used to
read and convert 2D scan images into interactive 3D
models that enable medical professionals to gain a
detailed understanding of a
patient’s health condition.
Google Translate App: All you need to do to read signs
in a foreign language is to point your phone’s camera at
the words and let the Google Translate app tell you what they
mean in your preferred language almost instantly. By
using optical character recognition to read the image and
augmented reality to overlay an accurate translation, this
is a convenient tool that uses
Computer Vision.
Computer Vision: Getting Started
Computer Vision is a domain of Artificial
Intelligence that deals with images. It combines
the concepts of image processing and machine
learning models to build Computer Vision based
applications.
Computer Vision Tasks
The various applications of Computer Vision are based
on a certain number of tasks which are performed to
get information from the input image. This information can
be used directly for prediction or can form the base for
further analysis. The tasks used in a computer vision
application are:
For Single Objects:
* Classification
* Classification + Localisation
For Multiple Objects:
* Object Detection
* Instance Segmentation
The image classification problem is the task of assigning an input
image one label from a fixed set of categories. This is one of
the core problems in CV that, despite its simplicity, has a large
variety of practical applications.
Classification + Localisation
This task involves both identifying what object is present in
the image and identifying where in the image that object is
located. It is used only for single objects.
Object Detection
Object detection is the process of finding instances of real-world
objects such as faces, bicycles, and buildings in images or videos.
Object detection algorithms typically use extracted features and
learning algorithms to recognize instances of an object category. It is
commonly used in applications such as image retrieval and automated
vehicle parking systems.
Instance Segmentation
Instance Segmentation is the process of detecting instances of the
objects, giving them a category and then giving each pixel a label on
the basis of that. A segmentation algorithm takes an image as input
and outputs a collection of regions (or segments).
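One way to compare the four tasks is by the shape of the result each one produces. The sketch below is only illustrative: the labels, the `Detection` class, the `(x, y, width, height)` box convention and the tiny mask are all made-up assumptions, not the output format of any particular library.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed convention for a box: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Detection:
    """A hypothetical container for one located object."""
    label: str
    box: Box


# Classification: one label for the whole image.
classification = "cat"

# Classification + Localisation: one label plus where that single object is.
classification_localisation = Detection("cat", (40, 30, 120, 90))

# Object Detection: a list of located objects, one entry per object found.
object_detection = [
    Detection("cat", (40, 30, 120, 90)),
    Detection("dog", (200, 50, 100, 110)),
]

# Instance Segmentation: every pixel carries the id of the instance it
# belongs to (0 means background in this made-up example).
instance_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 2, 2],
    [0, 0, 0, 2, 2],
]
instance_labels = {1: "cat", 2: "dog"}


def pixels_per_instance(mask):
    """Count how many pixels belong to each non-background instance."""
    counts = {}
    for row in mask:
        for instance_id in row:
            if instance_id != 0:
                counts[instance_id] = counts.get(instance_id, 0) + 1
    return counts


print(pixels_per_instance(instance_mask))
```

The single-object tasks return one result per image, while the multiple-object tasks return a collection: a list of boxes for detection and a grid of per-pixel labels for segmentation.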
Basics of Images
We all see a lot of images around us and use them daily, either through our
mobile phones or computer systems. But do we ask ourselves some basic questions
about them while we use them on such a regular basis?

Don’t know the answers yet? Don’t worry, in this section we will study the
basics of an image:
Basics of Pixels
The word “pixel” means a picture element. Every photograph,
in digital form, is made up of pixels. They are the smallest unit
of information that make up a picture. Usually round or
square, they are typically arranged in a 2-dimensional grid.
In the image below, one portion has been magnified many
times over so that you can see its individual composition in
pixels. As you can see, the pixels approximate the actual image.
The more pixels you have, the more closely the image
resembles the original.
Resolution
The number of pixels in an image is sometimes called the resolution.
When the term is used to describe pixel count, one convention is to
express resolution as the width by the height, for example a monitor
resolution of 1280×1024. This means there are 1280 pixels from one
side to the other, and 1024 from top to bottom.
Another convention is to express the number of pixels as a single
number, like a 5 megapixel camera (a megapixel is a million pixels).
This means the pixels along the width multiplied by the pixels along
the height of the image taken by the camera equals 5 million pixels. In
the case of our 1280×1024 monitor, it could also be expressed as
1280 x 1024 = 1,310,720, or 1.31 megapixels.
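The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `megapixels` is only an illustrative choice.

```python
def megapixels(width, height):
    """Return the pixel count of a width x height image in megapixels."""
    total_pixels = width * height
    return total_pixels / 1_000_000  # a megapixel is a million pixels


total = 1280 * 1024
print(total)                             # 1310720
print(round(megapixels(1280, 1024), 2))  # 1.31
```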
Pixel value
Each of the pixels that represents an image stored inside a computer has a
pixel value which describes how bright that pixel is, and/or what colour it
should be. The most common pixel format is the byte image, where this
number is stored as an 8-bit integer giving a range of possible values from 0
to 255. Typically, zero is to be taken as no colour or black and 255 is taken to
be full colour or white.
Why do we have a maximum value of 255? In computer systems, data is stored
in the form of ones and zeros, which we call the binary system. Each bit in a
computer system can have either a zero or a one.
Each pixel uses 1 byte of the image, which is equivalent to 8 bits of
data. Since each bit can have two possible values, 8 bits can represent
2^8 = 256 possible values, which start at 0 and end at 255.
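A short sketch of the counting argument: 8 bits, each either 0 or 1, give 2 x 2 x ... x 2 (eight times) combinations, which is 256 values running from 0 to 255. The variable names are illustrative only.

```python
BITS_PER_PIXEL = 8

# Each bit doubles the number of combinations, so 8 bits give 2**8 values.
num_values = 2 ** BITS_PER_PIXEL
max_value = num_values - 1  # counting starts at 0, so the largest value is 255

print(num_values, max_value)

# The smallest and largest pixel values written as 8 binary digits.
print(format(0, "08b"))    # all bits off: black
print(format(255, "08b"))  # all bits on: white
```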
Grayscale Images
Grayscale images are images which have a range of shades of gray
without apparent colour. The darkest possible shade is black, which is
the total absence of colour, or a pixel value of zero. The lightest possible
shade is white, which is the total presence of colour, or a pixel value of
255. Intermediate shades of gray are represented by equal brightness
levels of the three primary colours.
In a grayscale image, each pixel is 1 byte in size and the image consists of
a single plane, that is, one 2D array of pixels. The size of a grayscale
image is defined as the Height x Width of that image.
Let us look at an image to understand grayscale images.
Here is an example of a grayscale image. As you can check, the values of the pixels are
within the range of 0 to 255. Computers store the images we see in the form of
these numbers.
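As a rough illustration, a grayscale image can be written out as a 2D grid of numbers. The tiny 3 x 4 image below is made up for this sketch and uses plain Python lists rather than any image library.

```python
# A made-up grayscale image: 3 rows (height) by 4 columns (width).
# Each number is one pixel: 0 is black, 255 is white.
gray = [
    [0, 64, 128, 255],
    [32, 96, 160, 224],
    [255, 128, 64, 0],
]

height = len(gray)
width = len(gray[0])

# One byte per pixel, so the size in bytes is Height x Width.
size_in_bytes = height * width

darkest = min(min(row) for row in gray)
brightest = max(max(row) for row in gray)

print(height, width, size_in_bytes)
print(darkest, brightest)
```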
RGB Images
All the images that we see around are coloured
images. These images are made up of three
primary colours Red, Green and Blue. All the
colours that are present can be made by
combining different intensities of red, green and
blue.
For example,
As you can see, each colour image is stored in the form of three different
channels, each having a different intensity. All three channels combine to
form the colour we see.

In the image given above, if we split the image into three different channels,
namely Red (R), Green (G) and Blue (B), the individual layers will have the
following intensities of colour for the individual pixels. These individual
layers, when stored in memory, look like the image on the extreme right. The
layers look like grayscale images because each pixel has an intensity value
from 0 to 255 and, as studied earlier, 0 is considered black or no presence of
colour and 255 means white or full presence of colour. These three individual
RGB values, when combined, form the colour of each pixel.
Therefore, each pixel in the RGB image has three values to form the complete
colour.
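Splitting an image into its channels, and combining them again, can be sketched with plain Python lists. The 2 x 2 image and the helper names `split_channels` and `merge_channels` are assumptions made for this illustration.

```python
# A made-up 2 x 2 colour image. Each pixel is a (red, green, blue) triple.
image = [
    [(255, 0, 0), (0, 255, 0)],      # a red pixel, a green pixel
    [(0, 0, 255), (255, 255, 0)],    # a blue pixel, a yellow pixel
]


def split_channels(img):
    """Return three 2D grids holding the R, G and B values separately."""
    red = [[pixel[0] for pixel in row] for row in img]
    green = [[pixel[1] for pixel in row] for row in img]
    blue = [[pixel[2] for pixel in row] for row in img]
    return red, green, blue


def merge_channels(red, green, blue):
    """Combine three channel grids back into one grid of (R, G, B) pixels."""
    return [
        [(red[y][x], green[y][x], blue[y][x]) for x in range(len(red[0]))]
        for y in range(len(red))
    ]


red, green, blue = split_channels(image)
print(red)    # each channel on its own is just a grid of 0 to 255 values
print(green)
print(blue)
print(merge_channels(red, green, blue) == image)
```

Each channel on its own has the same shape as a grayscale image, which is why the separated layers appear as shades of gray.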
Image Features
In computer vision and image processing, a feature is a piece
of information which is relevant for solving the computational
task related to a certain application. Features may be specific
structures in the image such as points, edges or objects.
For example:
Imagine that your security camera is capturing an image. At the
top of the image we are given six small patches of images. Our
task is to find the exact location of those image patches in the
image.
Take a pencil and mark the exact location of those patches in
the image.
Conclusion
In image processing, we can get a lot of features from an image. A feature
can be a blob, an edge or a corner. These features help us to perform
various tasks and then carry out the analysis required by the
application. Now the question that arises is: which of these are
good features to use? As you saw in the previous activity, the
features having corners are easy to find as they can be found only at
a particular location in the image, whereas edges, which are spread
along a line, look the same all along. This tells us that
corners are good features to extract from an image, followed by
edges.
Let’s look at another example to understand this. Consider the images
given below and apply the concept of good features for the following.
In the above image how would we determine the exact
location of each patch?
The blue patch is a flat area and difficult to find and
track. Wherever you move the blue patch it looks the
same. The black patch has an edge. Moved along the
edge (parallel to edge), it looks the same. The red patch
is a corner. Wherever you move the patch, it looks
different, therefore it is unique. Hence, corners are
considered to be good features in an image.
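The same reasoning can be tried on a small synthetic image. The sketch below is an assumption-laden toy, not how OpenCV or any feature detector works: it builds a 10 x 10 image with a white square in one corner, cuts out a flat patch, an edge patch and a corner patch, and counts how many places in the image each patch matches exactly.

```python
def make_image(size=10, start=5):
    """Black image with a white square covering rows and columns >= start."""
    return [
        [255 if (y >= start and x >= start) else 0 for x in range(size)]
        for y in range(size)
    ]


def get_patch(image, top, left, k=3):
    """Cut a k x k patch whose top-left pixel is at (top, left)."""
    return [row[left:left + k] for row in image[top:top + k]]


def count_matches(image, patch):
    """Slide the patch over the image and count exact matches."""
    k = len(patch)
    height, width = len(image), len(image[0])
    count = 0
    for top in range(height - k + 1):
        for left in range(width - k + 1):
            if get_patch(image, top, left, k) == patch:
                count += 1
    return count


image = make_image()
flat_patch = get_patch(image, 0, 0)    # entirely inside the black area
edge_patch = get_patch(image, 6, 4)    # straddles the left edge of the square
corner_patch = get_patch(image, 4, 4)  # covers the square's top-left corner

print("flat:", count_matches(image, flat_patch))
print("edge:", count_matches(image, edge_patch))
print("corner:", count_matches(image, corner_patch))
```

The flat patch matches in many places, the edge patch matches at several positions along the edge, and the corner patch matches at exactly one location, which is what makes it easy to find.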
Introduction to OpenCV
Now that we have learnt about image features and their
importance in image processing, we will learn about a tool we
can use to extract these features from our image for further
processing.
OpenCV or Open Source Computer Vision Library is that tool
which helps a computer extract these features from the
images. It is used for all kinds of images and video processing
and analysis. It is capable of processing images and videos to
identify objects, faces, or even handwriting.
In this chapter we will use OpenCV for basic image processing operations on images such as
resizing, cropping and many more.
To install OpenCV for Python, run the following command:
pip install opencv-python

Now let us take a deep dive into the various functions of OpenCV to understand
the various image processing techniques. Head to the Jupyter Notebook for the
introduction to OpenCV given at this link: http://bit.ly/cv_notebook
