Text and Face Detection


Text Recognition and Face Detection Aid for Visually Impaired People Using Raspberry PI 2017-2018

Chapter 1

Introduction

1.1 Prelude
The goal of the proposed project is to develop smart text recognition and face detection
technology for blind or visually impaired people using a Raspberry Pi. A camera-based assistive
text reading system is proposed to help a visually impaired person read the text present in a
captured image. Faces can also be detected when a person enters the frame; the function is selected
through the mode control.

The proposed idea involves extracting text from a scanned image using Tesseract Optical
Character Recognition (OCR) and converting the text to speech with the eSpeak tool, a process that
helps visually impaired persons to read the text. This is a prototype that lets blind people recognize
products in the real world by extracting the text in an image and converting it into speech. The
proposed method is implemented on a Raspberry Pi, and portability is achieved with a battery backup.
Thus the user can carry the device anywhere and use it at any time.

1.2 Aim of the project


The aim is to develop a camera-based assistive text reading system that helps a visually impaired
person read the text present in a captured image. The system removes the need for another person's
support in recognizing text and provides visually impaired people with a convenient environment.
Once implemented, the system assists blind people with both reading and face detection.

1.3 Existing system


The problem of text recognition (reading) and face detection has been addressed in academia,
primarily from the angle of human-computer interaction, and in industry, through commercially
viable systems that use recent advances in mobile device and sensor technology. For outdoor
navigation in particular, the availability of GPS-compatible cell phones and PDAs prompted the
appearance of a number of software products, some of which have accessibility features that make
them potentially suitable for blind and visually impaired users.

Department of Electronics and Communication Engineering, AIET, Mijar

1.4 Proposed System


The proposed idea involves extracting text from a scanned image using Tesseract Optical
Character Recognition (OCR) and converting the text to speech with the eSpeak tool, a process that
enables visually impaired persons to read the text. This is a prototype that lets blind people
recognize products in the real world by extracting the text in an image and converting it into speech.
The proposed method is implemented on a Raspberry Pi, and portability is achieved with a battery
backup. Thus the user can carry the device anywhere and use it at any time.
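
The capture-to-speech flow described in this section can be sketched in Python as follows. This is a
minimal sketch under stated assumptions, not the project's actual code: it assumes the `tesseract`
and `espeak` command-line tools are installed and on the PATH, and the function names, the default
language `eng` and the speaking rate of 140 words per minute are illustrative choices.

```python
import shutil
import subprocess


def tesseract_command(image_path, language="eng"):
    """Build the Tesseract CLI call that prints recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", language]


def espeak_command(text, words_per_minute=140):
    """Build the eSpeak CLI call that speaks `text` at the given rate."""
    return ["espeak", "-s", str(words_per_minute), text]


def clean_ocr_text(raw):
    """Collapse line breaks and repeated spaces so the text is read as one passage."""
    return " ".join(raw.split())


def read_image_aloud(image_path):
    """Run OCR on an image, speak the result, and return the recognized text."""
    for tool in ("tesseract", "espeak"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not installed or not on PATH")
    ocr = subprocess.run(tesseract_command(image_path),
                         capture_output=True, text=True, check=True)
    text = clean_ocr_text(ocr.stdout)
    if text:  # stay silent when no text was recognized
        subprocess.run(espeak_command(text), check=True)
    return text
```

On the device, `read_image_aloud("capture.png")` would be called for each image saved by the camera;
`capture.png` is a hypothetical file name.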

1.5 Objectives of the proposed system

Text recognition and face detection for visually impaired people is undertaken to help blind
people as well as people who cannot read or recognize text. The main objectives of the project are:
• To design a smart text recognition system so that visually impaired people can live
independently.
• To design a smart face detection system as a portable device that people can easily carry
from one place to another.

1.6 Motivation

Text recognition and face detection for visually impaired people is a project undertaken to help
visually impaired people and to make them independent. The idea arose from the fact that blind
people depend on another person to read text for them. Therefore, the main objective of the project
is to design a system that can help blind or visually impaired people.

The main purpose of the project is to develop a text recognition and face detection system for
visually impaired persons. The system removes the need for another person's support in text reading
and face detection and provides visually impaired people with a convenient environment. Once
implemented, the system assists visually impaired people with text reading and face detection.

1.7 Organization of the report


The report is organized as follows:

Chapter 2: This chapter presents a review of related work on text recognition and face
detection for visually impaired people.

Chapter 3: This chapter presents the fundamentals and block diagram of the text recognition and face
detection aid for visually impaired people using Raspberry Pi, and describes each component used in
this project.

Chapter 4: This chapter presents the circuit diagram and the flowchart of the text recognition and face
detection aid for visually impaired people using Raspberry Pi.

Chapter 5: This chapter presents how the text recognition and face detection aid for visually
impaired people is implemented.

Chapter 6: This chapter presents the results and discussion of text recognition and face detection for
visually impaired people, showing the result after every step. This chapter also includes the
advantages of this project.

Chapter 7: This chapter presents the conclusion and future scope of this project.


Chapter 2

Literature Survey

2.1 Introduction

Text extraction methods are studied in terms of text localization and text recognition in natural
images of real-world scenes. The survey covers ongoing research on Raspberry Pi based document
analysis, including text detection, extraction, enhancement, recognition and their applications.
Most existing systems are built on the MATLAB platform, and a few of them use laptops, so they are
not portable. The algorithms used in earlier systems lack efficiency and accuracy.

2.2 Literature review

Rupali et al [1] have discussed a prototype for extracting text from images using a
Raspberry Pi. The images are captured using a web cam and are processed using the Open Source
Computer Vision (OpenCV) library and Otsu’s thresholding algorithm. Initially the captured images
are converted to grayscale. The images are rescaled and cosine transformations are applied
by setting the vertical and horizontal ratios. After some morphological transformations, Otsu’s
thresholding, which selects the threshold automatically from the image histogram, is applied to the
images. The proposed system needs to improve the accuracy of text detection and text recognition
with the help of an improved algorithm.
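
Otsu's method picks the grey level that maximizes the between-class variance of the two pixel
classes it creates. The following is a minimal pure-Python sketch of that idea, assuming 8-bit grey
levels supplied as a flat list; the function names are illustrative and this is not the
implementation used in [1].

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes the between-class variance.

    Pixels <= t form one class and pixels > t form the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0     # number of pixels with grey level <= t
    sum0 = 0   # sum of grey levels of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 (above the threshold) or 0 (at or below it)."""
    return [1 if p > threshold else 0 for p in pixels]
```

Because a single threshold is chosen for the whole image, uneven illumination can still defeat it,
which is one reason later surveyed work applies the threshold locally.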

Rajkumar N et al [2] have proposed a camera-based assistive text reading framework to help
visually impaired persons read text labels and product packaging on hand-held objects in daily life.
The system uses a motion-based method to define a Region of Interest (ROI) that isolates the
object from cluttered backgrounds or other surrounding objects in the camera view. A
mixture-of-Gaussians background subtraction technique is used to extract the moving object region.
Text localization and recognition are then performed on the ROI to acquire the text details. A novel
text localization algorithm learns gradient features of stroke orientations and distributions of
edge pixels in an AdaBoost model. Text characters in the localized text regions are binarized and
recognized by off-the-shelf optical character recognition software. The proposed system needs to
extend the localization algorithm to process text strings with fewer than three characters and to
design more robust block patterns for text feature extraction.

Bindu K Rajan et al [3] have proposed a camera-based assistive text reading system to help a
visually impaired person read the text present in a captured image. Faces can also be detected when
a person enters the frame, selected through the mode control. The proposed idea involves extracting
text from a scanned image using Tesseract Optical Character Recognition (OCR) and converting the
text to speech with the eSpeak tool, a process that enables visually impaired persons to read the
text. This is a prototype that lets blind people recognize products in the real world by extracting
the text in an image and converting it into speech. The proposed system needs to improve the quality
of the scanned images.

Ezaki et al [4] have proposed a method in which a binary image is created using global or
local thresholding, the choice being decided from Fisher’s Discriminant Rate (FDR). The technique is
essentially based on Otsu’s binarization method, an automatic threshold selection method for
region-based segmentation. When characters are present in a frame, the local histogram has two
peaks, and this is reflected in a high value of the FDR. For quasi-uniform frames the FDR is small
and the histogram has only one peak. In complex areas the histogram is dispersed, resulting in
higher FDR values, which are still lower than those of text areas. The FDR is thus used to detect
image frames with a bimodal gray-level histogram. When an image frame has a high FDR value, the
local Otsu threshold is used to binarize it. The proposed system needs to improve the Otsu
binarization method, since it assumes that only two classes are present.

Chucai Yi et al [5] have proposed a new framework to extract text strings with multiple sizes
and colours, and arbitrary orientations, from scene images with a complex and cluttered background.
The proposed framework consists of two main steps. The first is image partition, which finds text
character candidates based on gradient features and colour uniformity. The second is character
candidate grouping, for which two methods are used: adjacent character grouping and text line
grouping. The adjacent character grouping method calculates the sibling groups of each character
candidate as string segments and then merges the intersecting sibling groups into a text string.
The text line grouping method performs a Hough transform to fit a text line among the centroids of
the text candidates. The drawback of the proposed system is that the accuracy of text detection is
low. Hence the system cannot be extended to word-level recognition.
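
The text line grouping idea, fitting a line through the centroids of character candidates with a
Hough transform, can be illustrated with a small voting scheme. This is a hedged sketch rather than
the authors' implementation; the angular step of one degree and the distance resolution of one
pixel are assumptions chosen for the example.

```python
import math
from collections import Counter


def hough_line_through_points(points, theta_steps=180, rho_resolution=1.0):
    """Vote in (theta, rho) space and return (angle_degrees, rho, votes).

    A line is parameterized as x*cos(theta) + y*sin(theta) = rho.
    """
    votes = Counter()
    for x, y in points:
        for k in range(theta_steps):
            theta = math.pi * k / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            # Quantize rho so that nearly collinear points share a bin.
            votes[(k, round(rho / rho_resolution))] += 1
    (k, rho_bin), count = votes.most_common(1)[0]
    return math.degrees(math.pi * k / theta_steps), rho_bin * rho_resolution, count


# Four character centroids on the row y = 50 and one stray candidate.
centroids = [(10, 50), (30, 50), (50, 50), (70, 50), (40, 5)]
angle, rho, count = hough_line_through_points(centroids)
```

Candidates that voted for the winning bin would be grouped into one text line, and the stray
centroid would be left out.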

Priya S et al [6] have proposed a method to detect panels and to recognize the information
inside them. The proposed system extracts local descriptors at interest key points after applying
colour segmentation. The images are then represented as a Bag of Visual Words (BOVW) and classified
using support vector machines. Restricting the method to a limited geographical area reduces the
size of the dictionary. The drawback of the proposed system is that modeling a sign board with the
BOVW technique, from local descriptors extracted at interest key points, is not an easy task because
of the immense variability of the information that sign boards contain.
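
A Bag of Visual Words representation assigns each local descriptor to its nearest visual word and
counts the assignments. The sketch below shows only that step, assuming the vocabulary (for example
cluster centres learned beforehand) is already available; the descriptor extraction and the SVM
classifier are not shown, and the function names are illustrative.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to a descriptor (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(descriptor, vocabulary[i]))


def bovw_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)   # image without any descriptors
    return [c / total for c in counts]


# Toy example: two visual words and four 2-D descriptors.
vocabulary = [(0, 0), (10, 10)]
descriptors = [(1, 1), (0, 0), (2, 1), (9, 10)]
histogram = bovw_histogram(descriptors, vocabulary)
```

The resulting fixed-length histogram is what a classifier such as an SVM would receive, regardless
of how many descriptors the image produced.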

Devendra Kumar et al [7] have discussed how automated recognition of facial expressions
helps to create applications that can be used in security systems and for other investigative
purposes. It also has an emerging impact on commercial identification and marketing. Facial
expression recognition systems are mostly based on feature tracking in video. These systems can be
implemented using a variety of algorithms, such as the local binary pattern and the Viola-Jones
algorithm. The Viola-Jones algorithm is used to detect the face in an image and the local binary
pattern is used for expression recognition. A support vector machine is used to classify the
expressions.
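
The local binary pattern compares each pixel with its eight neighbours and packs the comparison
results into one byte. A minimal sketch of the basic 3x3 operator follows; the clockwise neighbour
order starting at the top-left is a convention chosen here, and the code is an illustration rather
than the implementation used in [7].

```python
def lbp_code(image, row, col):
    """8-neighbour LBP code of the pixel at (row, col).

    Each neighbour whose value is >= the centre value sets one bit.
    """
    centre = image[row][col]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

A histogram of these codes over a face region is the kind of feature vector that could then be
passed to a classifier.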

Marvan A Mattar et al [8] have discussed sign detection, which is an extremely challenging
problem. Sign detection uses image features, which are divided into two categories, local and
global. Local features are computed at multiple points in the image and describe the image patches
around these points. The result is a set of feature vectors for each image. All the feature vectors
have the same dimensionality, but each image produces a different number of features, depending on
the interest point detector used and on the image content. Local feature extraction consists of two
components, the interest point detector and the feature descriptor.
The interest point detector finds specific image structures that are considered important. The drawback
of the proposed system is that the sign detection rate needs to be improved. This can be done

Oi-Mean Foong et al [9] have proposed a sign board recognition framework for visually
impaired people, for whom independent navigation is always a challenge. The proposed framework
captures an image of a public sign board and transforms it into a text file using an optical
character recognition method based on Otsu’s thresholding. The text file is read by a speech
synthesizer that tells the visually impaired person what the image contains. This framework does not
require a huge database of sign boards but only a character database. The proposed framework has two
parts: an image-to-text process, which uses Otsu’s method to separate the foreground object from the
background, and a text-to-speech process, which uses the Speech Application Programming Interface
(SAPI). The drawback of the proposed system is that the framework cannot correctly differentiate
between alphabets, symbols and text images.

Boris Epshtein et al [10] have discussed detecting text in natural images. Conventional OCR
engines are designed for scanned text and therefore depend on a segmentation that correctly
separates text from background pixels. Natural images exhibit a wide range of imaging conditions,
such as color noise, blur and occlusions. One feature that separates text from other elements
of a scene is its nearly constant stroke width. The main idea presented in this work is how to
compute the stroke width for each pixel. The operator output can be used to separate text from other
high-frequency content of a scene. Using logical and flexible geometric reasoning, places with
similar stroke width can be grouped together into bigger components that are likely to be words. The
proposed system needs to work on the grouping of letters by considering the directions of the
recovered strokes.

Jaychand Upadhyay et al [11] have proposed a system for face recognition. Two analysis
algorithms are used, Eigenface and Independent Component Analysis (ICA). The local data set is
pre-processed using standard statistical techniques. The pre-processing software, the Face
Identification Evaluation System version 5.0, runs under Unix shell scripts and is written in
American National Standards Institute (ANSI) C. The independent component analysis algorithm for
face recognition is written in MATLAB. The system is designed for low power consumption, resource
optimization and enhanced operation speed. The main goal of this work is to form an intelligent
doorbell system based on human face identification. The first part involves face detection with the
help of Haar-like filters. This system is helpful for those who are away from home most of the time
and need to keep track of visitors. It alerts the owner to visitors and provides information about
them through a dynamic website and a phone application. It could also be used in other settings such
as industries, offices and even airports for identifying wanted people.

A Viji et al [12] have developed real-time face recognition using a Raspberry Pi. The input
image is captured from a web camera. The face is detected using the Viola-Jones face detection
technique, in which Haar wavelet-like features, computed with the help of the integral image,
capture the intensity differences between adjacent rectangular regions. The detected face is marked
with a rectangular box. Feature extraction is performed using the PCA algorithm. The training images
are prepared at equal size and all images are centered. The average face vector is calculated from
the images and subtracted from all the original images in the database. Classification is performed
by an AdaBoost classifier. In real-time face recognition, when a person looks into the camera,
his or her image is taken and given as input to the Raspberry Pi, where the face recognition
software is already deployed, and the recognized face is shown on the monitor. The drawback of the
proposed system is that all the images must be of the same size and must be centered.

Shruthika et al [13] have proposed a security access control application based on face
recognition. Haar-like features are used for face detection, and Histogram of Oriented Gradients
(HOG) features with a Support Vector Machine (SVM) are used for face recognition. To achieve higher
accuracy and effectiveness, the OpenCV libraries and the Python programming language are used.
Training and identification are done on an embedded device, the Raspberry Pi. The system is divided
into two stages, face detection and face recognition. In face detection, image regions are
classified as face or non-face, while in recognition a single face image from the input is compared
with multiple images. The proposed system needs to work on the identification of face and non-face
regions.

Brunelli et al [14] have discussed computer recognition of human faces. The purpose of this
paper is to compare two simple but general strategies on a common database. Two new algorithms
are developed and implemented, the first based on the computation of a set of geometrical features,
such as nose width and length, mouth position and chin shape, and the second based on
almost-gray-level template matching. The work focuses on these two traditional classes of techniques
applied to the recognition of digital images of frontal views of faces under roughly constant
illumination. A face can be recognized even when the details of the individual features are no
longer resolved. The drawback of the proposed system is that extracting the relative position and
other parameters of distinctive features such as the eyes, mouth, nose and chin is very difficult.

Rutuja et al [15] have proposed a face recognition and anti-spoofing system. The proposed
architecture has been validated with real users in a real environment. First, an algorithm is
proposed for face normalization that is robust to rotations and misalignments in the face detection
algorithm. Robust normalization can significantly increase the success rate of the face detection
algorithm. Once the face has been detected, a robust facial feature detector based on Deformable
Part Models (DPM) is applied to it. The detector output estimates the locations of a set of
characteristic points in the image: the corners of the eyes, the corners of the mouth and the nose.
The drawback of the proposed system is that it is difficult to realize, since the data must be
passed through the robust detector.

Liton Chandra Paul et al [16] have proposed a system that mainly addresses the building of a
face recognition system using Principal Component Analysis (PCA). PCA is a statistical approach
used for reducing the number of variables in face recognition. In PCA, every image in the training
set is represented as a linear combination of weighted eigenvectors called eigenfaces. These
eigenvectors are obtained from the covariance matrix of the training image set. The weights are
found after selecting a set of the most relevant eigenfaces. Recognition is performed by projecting
a test image onto the subspace spanned by the eigenfaces, and classification is then done by finding
the minimum Euclidean distance. A number of experiments were done to evaluate the performance of the
face recognition system. The drawback of the proposed system is that face images of different sizes
cannot be recognized.
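
The recognition step, projection onto the eigenfaces followed by a minimum Euclidean distance
decision, can be sketched as follows. The sketch assumes that the mean face and an orthonormal set
of eigenfaces have already been computed from the training set; the four-pixel "faces", the labels
and the function names are hypothetical and only serve the illustration.

```python
import math


def project(face, mean_face, eigenfaces):
    """Weights of a face in the eigenface subspace.

    The mean face is subtracted first, then the result is dotted with each eigenface.
    """
    centred = [f - m for f, m in zip(face, mean_face)]
    return [sum(c * e for c, e in zip(centred, eigenface)) for eigenface in eigenfaces]


def recognize(test_face, gallery, mean_face, eigenfaces):
    """Return (label, distance) of the gallery face nearest to the test face.

    `gallery` maps a label to one training face vector.
    """
    test_weights = project(test_face, mean_face, eigenfaces)
    best_label, best_dist = None, math.inf
    for label, face in gallery.items():
        dist = math.dist(test_weights, project(face, mean_face, eigenfaces))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Toy data: 4-pixel images and two eigenfaces.
mean_face = [5, 5, 3, 3]
eigenfaces = [[1, 0, 0, 0], [0, 1, 0, 0]]
gallery = {"alice": [10, 0, 3, 3], "bob": [0, 10, 3, 3]}
label, distance = recognize([9, 1, 7, 7], gallery, mean_face, eigenfaces)
```

All vectors must have the same length, which mirrors the limitation noted above: images of a
different size cannot be projected onto the same eigenfaces without resizing.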

Kriti P Bhure et al [17] have proposed a sensory substitution mechanism built on a
comparison-based efficient system. The system focuses on evaluating robust algorithms that recognize
and locate objects in images with better efficiency. To this end, they have proposed a system that
compares the SIFT (Scale Invariant Feature Transform) and SURF (Speeded Up Robust Features)
algorithms. The method that gives more matches, good processing speed and robustness to variations
in illumination is selected. The output of the more efficient algorithm is processed further to
convert the information about the recognized object into speech. Converting the detected object
information into speech makes it easier and friendlier for a visually impaired person to identify
the object.

Gary B Huang et al [18] have presented a database of human face images designed as an aid in
studying the problem of unconstrained face recognition. Face recognition is the problem of
identifying a specific individual, rather than merely detecting the presence of a human face, which
is often called face detection. Another database that shares important properties with LFW (Labeled
Faces in the Wild) is the BioID face database. This database consists of 1521 gray-level images with
a resolution of 384 by 286 pixels. Each image shows a frontal view of the face of one out of 23
different test persons. The most important property shared by the BioID Face Database and Labeled
Faces in the Wild is that both databases strive to capture realistic settings, with significant
variability in pose, lighting, and expression.

Paul Viola et al [19] have described a machine learning approach to visual object
detection that is capable of processing images extremely rapidly and achieving high detection rates.
It is distinguished by three key contributions. The first is the introduction of a new image
representation called the integral image, which allows the features used by the detector to be
computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small
number of critical visual features from a larger set and yields extremely efficient classifiers. The
third contribution is a method for combining increasingly complex classifiers in a cascade, which
allows background regions of the image to be quickly discarded while spending more computation on
promising object-like regions. The cascade can be viewed as an object-specific focus-of-attention
mechanism which, unlike previous approaches, provides statistical guarantees that discarded regions
are unlikely to contain the object of interest.
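
The integral image makes the sum over any rectangle available in four table lookups, which is why
rectangle features are so cheap to evaluate. A minimal pure-Python sketch follows; the function
names and the particular two-rectangle feature are illustrative and not taken from [19].

```python
def integral_image(image):
    """Summed-area table with an extra zero row and column.

    ii[r][c] holds the sum of all pixels above row r and left of column c.
    """
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += image[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```

The table is built once per image in a single pass, after which every feature costs the same small
number of lookups regardless of the rectangle size.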

Ali Mosleh et al [20] have proposed a text detection method based on a feature vector
generated from connected components produced via the stroke width transform. Several properties,
such as the variant directionality of the gradient of text edges, high contrast with the background,
and geometric properties of text components, jointly with the properties found by the stroke width
transform, are considered in forming the feature vectors. Clustering is then performed on the
feature vectors to distinguish text from non-text components. Finally, the obtained text components
are grouped and the remaining components are discarded. Since the stroke width transform relies on a
precise edge detection scheme, a novel bandlet-based edge detector, which is quite effective at
obtaining text edges in images, is used. This technique gives the proposed method high performance
and makes it effective for text localization.

Nitin Gaurkar et al [21] have discussed portable camera-based information reading of
hand-held packaged products for blind persons. The capturing process involves several processing
steps. First, the analog video signal is digitized by an analog-to-digital converter to produce a
raw digital data stream. Second, for composite video, the luminance and chrominance are separated.
Next, the chrominance is demodulated to produce color difference video data. At this point, the data
may be modified to adjust brightness, contrast, saturation and hue. The data is then transformed by
a color space converter to generate data conforming to any of several color space standards, such as
RGB. The images are compared in MATLAB and, if the scene matches, the MATLAB output is fed to the
ARM7 kit. The information obtained from MATLAB is passed by the microcontroller to a Bluetooth
module and transferred to an Android mobile phone with built-in Bluetooth. In the proposed system,
Bluetooth should be replaced by Wi-Fi so that several users can connect simultaneously.

Prachee H Shah et al [22] have proposed a cost-effective prototype system to help blind
persons shop independently. A camera-based assistive text reading framework is proposed to help
blind persons read text labels and product packaging on hand-held objects in their daily lives. To
isolate the object from cluttered backgrounds or other surrounding objects in the camera view, a
region of interest (ROI) is defined in the image. In the extracted ROI, text localization and text
recognition are performed to acquire the text information. Text characters in the localized text
regions are then converted into binary format and recognized by trained optical character
recognition software. The recognized text codes are output to blind users as speech. The proposed
framework is implemented on a Raspberry Pi board. The drawback of the proposed system is that the
localization algorithm cannot process text strings with fewer than three characters.

Sunil Kumar et al [23] have proposed a novel scheme for the extraction of textual areas of an
image using globally matched wavelet filters. A clustering-based technique has been devised for
estimating globally matched wavelet filters using a collection of ground truth images. The system
further extends the text extraction scheme to the segmentation of document images into text,
background, and picture components (which include graphics and continuous tone images). Matched
wavelets are used to develop the globally matched wavelet (GMW) filters, which are specifically
adapted for text and non-text regions. These filters are used for detecting text regions in scene
images and for segmenting document images into text, picture and background.

Cheng-Lin Liu et al [24] have proposed a novel hybrid method to robustly and accurately
localize text in natural scene images. To make use of region information, a text region detector is
designed to measure the confidence that local image regions contain text. It generates a text
confidence map, from which text components can be segmented accurately by a local binarization
approach. A Conditional Random Field (CRF) model, considering the unary component properties as well
as the binary relationships between neighboring components, is then presented to label components as
text or non-text. Finally, text components are grouped into text lines with an energy minimization
approach.

Rainer Lienhart et al [25] have proposed a novel method for localizing and segmenting text
in complex images and videos. Text lines are identified using a complex-valued multilayer
feed-forward network trained to detect text at a fixed scale and position. The network’s output at
all scales and positions is integrated into a single text saliency map, which serves as a starting
point for candidate text lines. In the case of video, these candidate text lines are refined by
exploiting the temporal redundancy of text in video. Localized text lines are then scaled to a fixed
height of 100 pixels and segmented into a binary image with black characters on a white background.
For videos, temporal redundancy is also exploited to improve segmentation performance. Input images
and videos can be of any size because of a true multiresolution approach. For greater efficiency,
the globally adaptive threshold used in binarizing the text bounding boxes can be replaced by a
locally adaptive threshold.

Arthur et al [26] have noted that large vocabulary continuous speech recognition systems are
known to be computationally intensive. Gaussian mixture model (GMM) computation and its various fast
techniques are examined in this work, which can support speech-based aids for visually impaired
people. The work involves a large number of GMMs. The objectives were met by categorizing the GMM
computation techniques into four layers, and representative techniques were selected for evaluation
in each layer based on this framework. They provided an analysis comparing GMM computation
techniques from the four-layer perspective. It addressed two subtle practical issues: 1) how these
different techniques can be combined effectively and 2) how beam pruning affects the performance of
GMM computation techniques.
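
The cost that such techniques try to reduce is the evaluation of GMM likelihoods for every speech
frame. As a hedged illustration that is not taken from [26], the sketch below computes the exact
log-likelihood of a diagonal-covariance GMM and, for comparison, a commonly used cheaper
approximation that keeps only the best-scoring component. The function names are illustrative.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, weights, means, variances):
    """Exact log p(x) = log sum_k w_k N(x; mean_k, var_k), computed with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def gmm_log_likelihood_best_component(x, weights, means, variances):
    """Approximation: keep only the largest weighted component."""
    return max(math.log(w) + log_gaussian_diag(x, m, v)
               for w, m, v in zip(weights, means, variances))
```

The approximation is never larger than the exact value; it trades accuracy for fewer exponentials
and logarithms, and it is close when one component dominates the mixture.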

Florence et al [27] have proposed functional specifications for a localized wayfinding verbal
aid that helps blind pedestrians in simple and structured urban areas. Route descriptions produced
for blind pedestrians are analysed. These analyses first provide evidence of verbal guidance rules
and then allow route descriptions of unfamiliar paths to be elaborated. The database is built around
streets, sidewalks, crosswalks and intersections, and the guidance functions consist of a
combination of orientation and localization, goal location, intersection, crosswalk and warning
information, as well as progression, crossing, orientation and route-ending instructions.

Jack et al [28] have stated that wayfinding by visually impaired people involves two distinct
functions. The first is sensing the immediate environment for impediments to travel (e.g. obstacles
and hazards). The second is navigating to remote destinations beyond the immediately perceptible
environment. Navigation, in turn, involves updating one’s position and orientation during travel
with respect to the intended route or desired destination and, in the event of becoming lost,
reorienting and re-establishing travel towards the destination. Methods of updating position and
orientation can be classified on the basis of kinematic order; position-based navigation (called
pilotage or piloting) relies on external signals indicating the traveler’s position and orientation.

David [29] has studied the porting and optimization of CMU SPHINX, a popular open source
large vocabulary continuous speech recognition (LVCSR) system, to hand-held devices. The system
runs at an average of 0.87 times real time on a 206 MHz device. The work resulted in the first
hand-held LVCSR system available under an open source license, which can help visually impaired
people. The drawback of this method lies in the computational requirements of continuous speech
recognition for a medium to large vocabulary scenario. Minimizing the size and power consumption of
such devices leads to compromises in their hardware and operating system software, which further
restrict their capabilities.

Scooter [30] has described a methodology in which spatial language (SL) is used to create
spatial images of layouts and objects. When spatial images formed from language have the same
precision as those formed perceptually, they can be used to guide a person’s action through space. In
particular, a short verbal description, like “1 o’clock, 3 m,” results in a spatial image functionally
similar to that produced by a sound source perceived to be at the same direction and distance. The
task of interest here is called spatial updating.

Siddhesh et al [31] have proposed path finding software that resides on the server side and
is responsible for formulating directions to reach the destination. It assumes a general floor
structure. Data sent by the client acts as input to the path finding software, which then interacts
with the R-tag database and obtains the row ID and co-ordinates of the corresponding R-tag. The
database also records the relative position of each row number. The path finding software uses this
information to direct the user to move towards the left, right or opposite side. The current position
of the user is then compared with the destination co-ordinates and the navigation directions are sent
back to the PDA. The response time for giving a navigation direction to the user after an R-tag has
been scanned is approximately 3-4 seconds. Multiple clients do not affect the response time, as each
client is serviced by a separate object (instance) created at the server side.

Simon et al [32] have stated that tactile maps can help visually impaired people perform tasks
in their everyday life. The map reading strategies used by the participants were effective for gaining
practical route-based knowledge. However, this work did not give the participants an overall spatial
representation of the space. To explore this possibility further, the Sheffield study considered the
effect of individual differences in map reading strategies on the type of mental representation which
visually impaired people acquire from a tactile map.

Graziano et al [33] have designed and developed an electronic stick for the Sesamonet system,
which has been strongly influenced by the progress in electronics during the past years. The first
prototype was based on the low frequency RFID reader produced by Allflex for livestock identification
and equipped with a Bluetooth module for data communication. After the Sesamonet proof of principle
demonstration, custom electronics and modules were designed and produced in order to obtain a
fully functional, safe and secure navigation system for blind people. The main characteristics and the
evolution from the first prototype to the actual electronic stick are presented in that paper.


Chapter 3
Fundamentals of the Project

Raspberry Pi is a small, powerful, cheap and education-oriented computer board introduced in
2012. Its performance and affordability make this credit card-sized computer a suitable platform
for interfacing with many devices.

3.1 Hardware Components

The hardware components required are:

• Camera
• Raspberry Pi
• Power Bank
• Loudspeaker

3.2 Software Tools

The Raspberry Pi runs Raspbian, which is derived from the Debian operating system. The
algorithms are written in the Python language, and the functions used in the algorithms are called
from the OpenCV library, an open source computer vision library. Tesseract is an open source OCR
engine. It assumes that its input is a binary image with optional polygonal text regions defined.

3.3 Block Diagram of the Proposed System

The block diagram of the proposed system is shown in fig 3.1.


[Block diagram: Camera -> Raspberry Pi (text recognition & face detection) -> Voice output]

Fig 3.1. Block diagram of the proposed system
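
The text-reading path of Fig 3.1 can be sketched as follows. This is a minimal sketch, not the project's actual code: it assumes the Tesseract and eSpeak command-line tools named in Chapter 1 are installed, and the function names and the file name capture.jpg are hypothetical. Camera capture and face detection are omitted.

```python
import shutil
import subprocess


def ocr_command(image_path, lang="eng"):
    """Build the Tesseract CLI call that prints the recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def speak_command(text):
    """Build the eSpeak CLI call that reads the given text aloud."""
    return ["espeak", text]


def read_image_aloud(image_path):
    """Run OCR on a captured image and speak the result.

    Returns the recognized text, or None when either tool is missing.
    """
    if shutil.which("tesseract") is None or shutil.which("espeak") is None:
        return None  # tools not installed; nothing is run
    result = subprocess.run(
        ocr_command(image_path), capture_output=True, text=True, check=True
    )
    text = result.stdout.strip()
    if text:  # speak only when some text was recognized
        subprocess.run(speak_command(text), check=True)
    return text


if __name__ == "__main__":
    # "capture.jpg" is a hypothetical file saved by the camera stage.
    print(read_image_aloud("capture.jpg"))
```

Keeping the command construction separate from the execution makes each stage easy to check without the camera or loudspeaker attached.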

3.4 Raspberry Pi

Fig 3.2. Raspberry Pi Board

Fig 3.2 shows the Raspberry Pi board. All models feature a Broadcom system on a chip (SoC)
with an integrated ARM compatible central processing unit (CPU) and on-chip graphics processing
unit (GPU).


Processor speed ranges from 700 MHz to 1.4 GHz for the Pi 3 Model B+; on-board memory
ranges from 256 MB to 1 GB RAM. Secure Digital (SD) cards are used to store the operating system
and program memory in either SDHC or MicroSDHC sizes. The boards have one to four USB ports.
For video output, HDMI and composite video are supported, with a standard 3.5 mm phono jack for
audio output. Lower-level output is provided by a number of GPIO pins which support common
protocols like I²C. The B-models have an 8P8C Ethernet port and the Pi 3 and Pi Zero W have on-
board Wi-Fi 802.11n and Bluetooth.

The first generation (Raspberry Pi 1 Model B) was released in February 2012, followed by the
simpler and cheaper Model A. In 2014, the Foundation released a board with an improved design,
Raspberry Pi 1 Model B+. These boards are approximately credit-card sized and represent the standard
mainline form-factor. Improved A+ and B+ models were released a year later. A "Compute Module"
was released in April 2014 for embedded applications.

A Raspberry Pi Zero with smaller size and reduced input/output (I/O) and general-purpose
input/output (GPIO) capabilities was released in November 2015 for US$5. By 2017, it became the
newest mainline Raspberry Pi. On 28 February 2017, the Raspberry Pi Zero W was launched, a version
of the Zero with Wi-Fi and Bluetooth capabilities, for US$10. On 12 January 2018, the Raspberry Pi
Zero WH was launched, a version of the Zero W with pre-soldered GPIO headers.

The Raspberry Pi 3 Model B has a 64-bit quad-core processor and on-board Wi-Fi, Bluetooth
and USB boot capabilities. The Raspberry Pi 3 Model B+ appeared with a faster 1.4 GHz processor and
a three times faster network based on gigabit Ethernet (300 Mbit/s) or 2.4/5 GHz dual-band Wi-Fi
(100 Mbit/s). Other options are Power over Ethernet (PoE), USB boot and network boot (an SD card is
no longer required). This allows the use of the Pi in hard-to-reach places (possibly without
electricity).

The organization behind the Raspberry Pi consists of two arms. The first two models were
developed by the Raspberry Pi Foundation. After the Pi Model B was released, the Foundation set up
Raspberry Pi Trading, with Eben Upton as CEO, to develop the third model, the B+. Raspberry Pi
Trading is responsible for developing the technology while the foundation is an educational charity to
promote the teaching of basic computer science in schools and in developing countries.


The Foundation provides Raspbian, a Debian-based Linux distribution, for download, as well
as third-party Ubuntu, Windows 10 IoT Core, RISC OS, and specialized media centre distributions. It
promotes Python and Scratch as the main programming languages, with support for many other
languages. The default firmware is closed source, while an unofficial open source version is available.

3.4.1 Hardware

The Raspberry Pi hardware has evolved through several versions that feature variations in
memory capacity and peripheral-device support.

Model A, A+, and the Pi Zero lack the Ethernet and USB hub components. On the boards that
have Ethernet, the Ethernet adapter is internally connected to an additional USB port. In Model A,
A+, and the Pi Zero, the USB port is connected directly to the system on a chip (SoC). On the Pi 1
Model B+ and later models the USB/Ethernet chip contains a five-port USB hub, of which four ports are
available, while the Pi 1 Model B only provides two. On the Pi Zero, the USB port is also connected
directly to the SoC, but it uses a micro USB (OTG) port.

3.4.2 Processor

The Raspberry Pi 2B uses a 32-bit 900 MHz quad-core ARM Cortex-A7 processor. The
Broadcom BCM2835 SoC used in the first generation Raspberry Pi includes a 700 MHz
ARM1176JZF-S processor, VideoCore IV graphics processing unit (GPU), and RAM. It has a level 1
(L1) cache of 16 KB and a level 2 (L2) cache of 128 KB. The level 2 cache is used primarily by the
GPU. The SoC is stacked underneath the RAM chip, so only its edge is visible.

The earlier V1.1 model of the Raspberry Pi 2 used a Broadcom BCM2836 SoC with a 900 MHz
32-bit quad-core ARM Cortex-A7 processor, with 256 KB shared L2 cache. The Raspberry Pi 2 V1.2
was upgraded to a Broadcom BCM2837 SoC with a 1.2 GHz 64-bit quad-core ARM Cortex-A53
processor, the same SoC which is used on the Raspberry Pi 3, but underclocked (by default) to the
same 900 MHz CPU clock speed as the V1.1. The BCM2836 SoC is no longer in production (as of
late 2016).


The Raspberry Pi 3+ uses a Broadcom BCM2837B0 SoC with a 1.4 GHz 64-bit quad-core
ARM Cortex-A53 processor, with 512 KB shared L2 cache.

Chip      Architecture        Clock speed   Cores   FCC
BCM2835   ARM1176JZF-S        700 MHz       1       2011-06/29
BCM2836   ARMv7 Cortex-A7     900 MHz       4       2016-04/05
BCM2837   ARMv8 Cortex-A53    1.2 GHz       4       2016-02/26

Features (all three chips) – Full HD 1080p HP H.264 video encode/decode, dual-core VideoCore IV
multimedia coprocessor.

Of the three chips, the BCM2837 is generally preferred since it has the highest clock speed.


3.4.3 Performance
The Raspberry Pi 3, with a quad-core ARM Cortex-A53 processor, is described as having 10
times the performance of a Raspberry Pi 1. This was suggested to be highly dependent upon task
threading and instruction set use. Benchmarks showed the Raspberry Pi 3 to be approximately 80%
faster than the Raspberry Pi 2 in parallelized tasks.

Raspberry Pi 2 V1.1 included a quad-core Cortex-A7 CPU running at 900 MHz and 1 GB
RAM. It was described as 4–6 times more powerful than its predecessor. The GPU was identical to the
original. In parallelized benchmarks, the Raspberry Pi 2 V1.1 could be up to 14 times faster than a
Raspberry Pi 1 Model B+.

While operating at 700 MHz by default, the first generation Raspberry Pi provided a real-world
performance roughly equivalent to 0.041 GFLOPS. On the CPU level the performance is similar to a
300 MHz Pentium II of 1997–99. The GPU provides 1 Gpixel/s or 1.5 Gtexel/s of graphics processing
or 24 GFLOPS of general purpose computing performance. The graphical capabilities of the Raspberry
Pi are roughly equivalent to the performance of the Xbox of 2001.

The LINPACK single node compute benchmark results in a mean single precision performance
of 0.065 GFLOPS and a mean double precision performance of 0.041 GFLOPS for one Raspberry Pi
Model-B board. A cluster of 64 Raspberry Pi Model B computers, labelled "Iridis-pi", achieved a
LINPACK HPL suite result of 1.14 GFLOPS (n=10240) at 216 watts for c. US$4000.
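
The cluster figures above can be turned into per-node and per-watt numbers with a few lines of arithmetic. The sketch below uses only the values quoted in the text; comparing the cluster result with 64 times the single-node double precision figure is an assumption made here for illustration, since the two benchmark runs may not be directly comparable.

```python
# Figures quoted in the text for the Iridis-pi cluster and a single Model B.
NODES = 64
CLUSTER_GFLOPS = 1.14        # LINPACK HPL result for the whole cluster
POWER_WATTS = 216
COST_USD = 4000              # approximate cost
SINGLE_NODE_GFLOPS = 0.041   # mean double precision result for one board


def cluster_metrics(gflops, nodes, watts, cost, single_node_gflops):
    """Derive per-node, per-watt and per-dollar figures from the totals."""
    return {
        "gflops_per_node": gflops / nodes,
        "mflops_per_watt": gflops * 1000 / watts,
        "mflops_per_usd": gflops * 1000 / cost,
        # Assumption: ideal scaling would be nodes * single-node result.
        "scaling_vs_single_node": gflops / (nodes * single_node_gflops),
    }


metrics = cluster_metrics(
    CLUSTER_GFLOPS, NODES, POWER_WATTS, COST_USD, SINGLE_NODE_GFLOPS
)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```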

3.4.4 Overclocking
Most Raspberry Pi chips could be overclocked to 800 MHz, and some to 1000 MHz. There are
reports that the Raspberry Pi 2 can be similarly overclocked, in extreme cases even to 1500 MHz
(discarding all safety features and over-voltage limitations). In the Raspbian Linux distribution the
overclocking options can be set at boot by running the command "sudo raspi-config", without
voiding the warranty. In those cases, the Pi automatically shuts the overclocking down if the chip
reaches 85 °C (185 °F), but it is possible to override the automatic over-voltage and overclocking
settings (voiding the warranty); an appropriately sized heat sink is needed to protect the chip from
serious overheating.
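
The 85 °C limit can be checked from Python as sketched below. This is an illustrative sketch only: it assumes the SoC temperature is exposed by the Linux kernel at /sys/class/thermal/thermal_zone0/temp in thousandths of a degree Celsius, and the function names are hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the SoC temperature (value in millidegrees Celsius).
THERMAL_FILE = Path("/sys/class/thermal/thermal_zone0/temp")
LIMIT_C = 85.0  # temperature at which overclocking is shut down, per the text


def parse_millidegrees(raw):
    """Convert a sysfs reading such as '48312' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_cpu_temp(path=THERMAL_FILE):
    """Return the temperature in Celsius, or None if it cannot be read."""
    try:
        return parse_millidegrees(path.read_text())
    except (OSError, ValueError):
        return None


def is_over_limit(temp_c, limit_c=LIMIT_C):
    """True when a valid reading has reached the limit."""
    return temp_c is not None and temp_c >= limit_c


if __name__ == "__main__":
    temp = read_cpu_temp()
    print("temperature:", temp, "over limit:", is_over_limit(temp))
```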


Newer versions of the firmware contain the option to choose between five overclock ("turbo")
presets that when used, attempt to maximize the performance of the SoC without impairing the lifetime
of the board. This is done by monitoring the core temperature of the chip, the CPU load, and
dynamically adjusting clock speeds and the core voltage. When the demand is low on the CPU or it is
running too hot the performance is throttled, but if the CPU has much to do and the chip's temperature
is acceptable, performance is temporarily increased with clock speeds of up to 1 GHz depending on
the individual board and on which of the turbo settings is used.

The seven overclock presets are:


None - 700 MHz ARM, 250 MHz core, 400 MHz SDRAM, 0 overvolting,
Modest - 800 MHz ARM, 250 MHz core, 400 MHz SDRAM, 0 overvolting,
Medium - 900 MHz ARM, 250 MHz core, 450 MHz SDRAM, 2 overvolting,
High - 950 MHz ARM, 250 MHz core, 450 MHz SDRAM, 6 overvolting,
Turbo - 1000 MHz ARM, 500 MHz core, 600 MHz SDRAM, 6 overvolting,
Pi 2 - 1000 MHz ARM, 500 MHz core, 500 MHz SDRAM, 2 overvolting,
Pi 3 - 1100 MHz ARM, 550 MHz core, 500 MHz SDRAM, 6 overvolting.

For the Pi 3 preset, the CPU speed appears as 1200 MHz in system information and lowers to
600 MHz when idle. In the highest (turbo) preset the SDRAM clock was originally 500 MHz, but this was
later changed to 600 MHz because 500 MHz sometimes causes SD card corruption. Simultaneously, in high
mode the core clock speed was lowered from 450 to 250 MHz, and in medium mode from 333 to 250 MHz.
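
The presets listed above map naturally onto the overclocking keys of the Pi's config.txt file (arm_freq, core_freq, sdram_freq and over_voltage). The sketch below only formats those lines from the values in the list; the dictionary and function names are hypothetical, and whether a given board is stable at a preset is not something the code can tell.

```python
# Values copied from the preset list: ARM, core and SDRAM clocks in MHz,
# followed by the overvolting step.
PRESETS = {
    "None":   (700, 250, 400, 0),
    "Modest": (800, 250, 400, 0),
    "Medium": (900, 250, 450, 2),
    "High":   (950, 250, 450, 6),
    "Turbo":  (1000, 500, 600, 6),
}


def config_lines(preset):
    """Return the config.txt lines that correspond to a named preset."""
    arm, core, sdram, overvolt = PRESETS[preset]
    return [
        f"arm_freq={arm}",
        f"core_freq={core}",
        f"sdram_freq={sdram}",
        f"over_voltage={overvolt}",
    ]


if __name__ == "__main__":
    print("\n".join(config_lines("Medium")))
```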

The CPU on the first and second generation Raspberry Pi board did not require cooling, such
as a heat sink or fan, even when overclocked, but the Raspberry Pi 3 may generate more heat when
overclocked.

3.4.5 RAM
On the older beta Model B boards, 128 MB was allocated by default to the GPU, leaving 128
MB for the CPU. On the first 256 MB release Model B (and Model A), three different splits were
possible. The default split was 192 MB (RAM for CPU), which should be sufficient for standalone
1080p video decoding, or for simple 3D. 224 MB was for Linux only, with only a 1080p framebuffer,
and was likely to fail for any video or 3D. 128 MB was for heavy 3D, possibly also with video decoding
(e.g. XBMC). Comparatively, the Nokia 701 uses 128 MB for the Broadcom VideoCore IV.

3.4.6 Software Operating Systems

1. Python
Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is
designed to be highly readable. It uses English keywords frequently, whereas other languages use
punctuation, and it has fewer syntactical constructions than other languages.
• Python is Interpreted: Python is processed at runtime by the interpreter. You do not need to
compile your program before executing it. This is similar to PERL and PHP.

• Python is Interactive: You can actually sit at a Python prompt and interact with the interpreter
directly to write your programs.

• Python is Object-Oriented: Python supports the object-oriented style or technique of
programming that encapsulates code within objects.

• Python is a Beginner's Language: Python is a great language for beginner-level
programmers and supports the development of a wide range of applications from simple text
processing to WWW browsers to games.

2. Python Features

Python's features include:

• Easy-to-learn: Python has few keywords, simple structure, and a clearly defined syntax. This
allows the student to pick up the language quickly.

• Easy-to-read: Python code is more clearly defined and visible to the eyes.

• Easy-to-maintain: Python's source code is fairly easy to maintain.

• A broad standard library: The bulk of Python's library is very portable and cross-platform
compatible on UNIX, Windows, and Macintosh.


• Interactive Mode: Python has support for an interactive mode which allows interactive testing
and debugging of snippets of code.

• Portable: Python can run on a wide variety of hardware platforms and has the same interface
on all platforms.

• Extendable: You can add low-level modules to the Python interpreter. These modules enable
programmers to add to or customize their tools to be more efficient.

• Databases: Python provides interfaces to all major commercial databases.

• GUI Programming: Python supports GUI applications that can be created and ported to many
system calls, libraries, and windows systems, such as Windows MFC, Macintosh, and the X
Window system of Unix.

• Scalable: Python provides a better structure and support for large programs than shell
scripting.

• It supports functional and structured programming methods as well as OOP.

• It can be used as a scripting language or can be compiled to byte-code for building large
applications.

• It provides very high-level dynamic data types and supports dynamic type checking.

• It supports automatic garbage collection.

• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
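
A few of the features listed above (object-oriented and functional styles, dynamic data types) can be seen in a short example. The class and function names are invented for illustration and are not part of the project code.

```python
from functools import reduce


class WordCounter:
    """Object-oriented style: state and behaviour are kept in one object."""

    def __init__(self):
        self.counts = {}  # high-level dynamic data type, grown at runtime

    def add(self, text):
        """Count each whitespace-separated word, ignoring case."""
        for word in text.lower().split():
            self.counts[word] = self.counts.get(word, 0) + 1
        return self  # returning self allows calls to be chained


def total_length(words):
    """Functional style: fold a sequence of words into one number."""
    return reduce(lambda acc, word: acc + len(word), words, 0)


# Dynamic typing: the same name can refer to an int and later to a str.
value = 42
value = "forty-two"

counter = WordCounter().add("read the text and speak the text")

if __name__ == "__main__":
    print(counter.counts)
    print(total_length(["raspberry", "pi"]))
```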

3. Python Environment

Python is available on a wide variety of platforms including Linux and Mac OS X. Let's
understand how to set up the Python environment.

4. Local Environment Setup

Open a terminal window and type "python" to find out if it is already installed and which
version is installed.
Python is available on the following platforms:
• Unix (Solaris, Linux, FreeBSD, AIX, HP/UX, SunOS, IRIX, etc.)
• Win 9x/NT/2000
• Macintosh (Intel, PPC, 68K)
• OS/2
• DOS (multiple versions)
• PalmOS
• Nokia mobile phones
• Windows CE
• Acorn/RISC OS
• BeOS
• Amiga
• VMS/OpenVMS
• QNX
• VxWorks
• Psion
• Python has also been ported to the Java and .NET virtual machines

5. Unix and Linux Installation

The steps to install Python on Unix/Linux machine are:

• Open a Web browser and go to http://www.python.org/download/.
• Follow the link to download the zipped source code available for Unix/Linux.
• Download and extract the files.
• Edit the Modules/Setup file to customize some options.
• Run the ./configure script.
• Run make.
• Run make install.

This installs Python at the standard location /usr/local/bin and its libraries at
/usr/local/lib/pythonXX, where XX is the version of Python.

6. Python Environment Variables


The important environment variables, which can be recognized by Python are shown in table 3.1:


Table 3.1. Python Environment Variables

Variable         Description
PYTHONPATH       It has a role similar to PATH. This variable tells the Python interpreter
                 where to locate the module files imported into a program. It should
                 include the Python source library directory and the directories
                 containing Python source code. PYTHONPATH is sometimes preset by
                 the Python installer.
PYTHONSTARTUP    It contains the path of an initialization file containing Python source
                 code. It is executed every time the interpreter starts. It is named
                 .pythonrc.py in Unix and it contains commands that load utilities or
                 modify PYTHONPATH.
PYTHONCASEOK     It is used in Windows to instruct Python to find the first case-insensitive
                 match in an import statement. Set this variable to any value to activate
                 it.
PYTHONHOME       It is an alternative module search path. It is usually embedded in the
                 PYTHONSTARTUP or PYTHONPATH directories to make switching
                 module libraries easy.
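
The effect of PYTHONPATH in Table 3.1 can be observed by starting a second interpreter with the variable set and listing its module search path. This is a small sketch; the helper name is hypothetical and a temporary directory stands in for a real module folder.

```python
import os
import subprocess
import sys
import tempfile


def sys_path_with_pythonpath(extra_dir):
    """Start a child interpreter with PYTHONPATH set and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return completed.stdout.splitlines()


with tempfile.TemporaryDirectory() as module_dir:
    child_path = sys_path_with_pythonpath(module_dir)
    # Compare resolved paths so that symbolic links do not hide a match.
    found = os.path.realpath(module_dir) in [
        os.path.realpath(entry) for entry in child_path if entry
    ]

if __name__ == "__main__":
    print("directory from PYTHONPATH is on sys.path:", found)
```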

7. Python Installation on Windows

Python does not come pre-packaged with Windows, but Windows users can still make good use
of this flexible programming language. Installation is not quite as simple as installing the newest
version, however, so the right tools for the task at hand must be chosen.

First released in 1991, Python is a popular high-level programming language used for general
purpose programming. Not only is it a comparatively easy language to pick up, but thousands of
projects available online also require Python to be installed in order to run.

8. Version


Unfortunately, there was a significant update to Python several years ago that created a big split
between Python versions. The Python for Windows download page makes this division immediately
visible. Right at the top, front and center, the page offers the latest release of Python 2 or Python 3
(2.7.13 and 3.6.1, respectively).

Fig 3.3. Different versions of python

The version to choose depends on the end goal. For example, the MCDungeon project for
expanding a Minecraft world is coded in Python and requires Python 2.7; it cannot be run with Python
3.6. In fact, nearly all hobby projects like MCDungeon use 2.7. On the other hand, to actually learn
Python, it is recommended to install both versions side by side, as shown below. This makes it
possible to work with the newest version of the language, but also to run older Python scripts (and
test backwards compatibility for newer projects).
It is possible to download just Python 2 or Python 3 if only a particular version is required.
This work shows the installation of both versions, and downloading both is recommended. Under the
main entry for both versions there is an “x86-64” installer, as seen below.


Fig 3.4. Required python version

This installer will install the appropriate 32-bit or 64-bit version on the computer automatically.

9. Installation of Python 2
Installing Python 2 is straightforward and, unlike in years past, the installer even sets the path
variable. Download and run the installer, select “Install for all users,” and then click “Next.”

Fig 3.5. Selecting install for all users


On the directory selection screen, leave the directory as “Python27” and click “Next.”


Fig 3.6. Selecting destination directory

On the customization screen, scroll down, click “Add python.exe to Path,” select “Will
be installed on local hard drive,” and then click “Next.”

Fig 3.7. Selecting Add python.exe to path


Just click through the wizard to complete the installation. When the installation is finished,
confirm it by opening Command Prompt and typing the command “python -V”.

10. Installation of Python 3.6

To learn the newest version of Python, Python 3 must be installed. It can be installed
alongside Python 2.7 with no problems, so download and run the installer now.

On the first screen, enable the “Add Python 3.6 to PATH” option and then click “Install Now.”

Fig 3.8. Python installation window


Clicking the “Disable path length limit” option removes the limitation on the MAX_PATH variable.
This change won’t break anything but it will allow Python to use long path names. Since many Python
programmers are working in Linux and other Unix systems where path name length isn’t an issue,
turning this on in advance can help smooth over any path-related issues while working in Windows.
After selecting this option, click on “Close” to finish the installation.


Fig 3.9. Python installation setup was successful

After installing Python 3, the same command line trick of typing “python -V” used above can
be used to check that it is installed correctly and that the path variable is set. When both versions
are installed, however, the quick tweak described in the following section is required.

11. Accessing Both Python Versions

This section is completely optional but allows quick access to both versions of Python
from the command line. After installing both versions of Python, a little quirk appears: even
though the system path is enabled for both Python installations, typing “python” at the command
prompt only points to Python 2.7.

The path variable (whether automatically adjusted by an installer or manually tweaked) simply
points at a directory, and every executable in that directory becomes a command line command. If
there are two directories listed and both have a “python.exe” file in them, whichever directory is
higher in the list gets used. And, if there is a variable set for the system and for the user, the
system path takes precedence over the user path.


The latter is exactly what is happening in this case: the Python 2 installer edited the system
wide variable and the Python 3 installer added a user level variable, which can be confirmed by
looking at the Windows environment variables.
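
The lookup rule described above (directories are searched in order, system entries before user entries, and the first match wins) can be sketched as follows. The function is a simplified model written for illustration, not the actual Windows implementation, and the temporary folders merely stand in for the two installation directories.

```python
import os
import tempfile


def resolve(command, system_dirs, user_dirs):
    """Return the first file named `command`, searching the system
    directories before the user directories, each list in order."""
    for directory in list(system_dirs) + list(user_dirs):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    py27 = os.path.join(root, "Python27")  # stands in for the system-level entry
    py36 = os.path.join(root, "Python36")  # stands in for the user-level entry
    for folder in (py27, py36):
        os.makedirs(folder)
        open(os.path.join(folder, "python.exe"), "w").close()

    winner = resolve("python.exe", system_dirs=[py27], user_dirs=[py36])
    winner_folder = os.path.basename(os.path.dirname(winner))

if __name__ == "__main__":
    print("typing 'python' would run the copy in:", winner_folder)
```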

Hit Start, type “advanced system settings,” and then select the “View advanced system settings”
option. In the “System Properties” window that opens, on the “Advanced” tab, click the “Environment
Variables” button.

Fig 3.10. Python system properties

Here, Python 3 is listed in the “User variables” section and Python 2 in the
“System variables” section.


Fig 3.11. Python environment variable

There are a few ways of remedying this situation. The simplest (the one with the least
functionality) is to just remove the entry for the version of Python that is used the least. While that
is simple, it is also limiting. Instead, another change can be made that gives access to “python” for
Python 2 and “python3” for Python 3.

To do this, open File Manager and go to the folder where Python 3 is installed
(C:\Users\[username]\AppData\Local\Programs\Python\Python36 by default). Make a copy of the
“python.exe” file and rename that copy (not the original) to “python3.exe”.
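
The same copy-and-rename step can also be scripted. The sketch below is an illustration only: the function name is hypothetical, and a temporary folder with a placeholder file is used instead of the real Python36 directory.

```python
import os
import shutil
import tempfile


def add_python3_alias(install_dir):
    """Copy python.exe to python3.exe in the same folder, keeping the original."""
    source = os.path.join(install_dir, "python.exe")
    alias = os.path.join(install_dir, "python3.exe")
    shutil.copy2(source, alias)  # copy2 also preserves the file metadata
    return alias


with tempfile.TemporaryDirectory() as demo_dir:  # stand-in for the Python36 folder
    with open(os.path.join(demo_dir, "python.exe"), "wb") as handle:
        handle.write(b"placeholder")
    alias_path = add_python3_alias(demo_dir)
    both_exist = (
        os.path.isfile(alias_path)
        and os.path.isfile(os.path.join(demo_dir, "python.exe"))
    )

if __name__ == "__main__":
    print("python.exe and python3.exe both present:", both_exist)
```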


Fig 3.12. Selecting python.exe and python3.exe


Open a new command prompt (the environment variables refresh with each new command
prompt that is opened), and type “python3 --version”.

Fig 3.13. Python working


Now use the “python” command at the Command Prompt when Python 2.7 is required
and the “python3” command when Python 3 is required.
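
Which interpreter a command starts can also be confirmed from a script by running it with the --version option. In this sketch the helper name is hypothetical, and the interpreter running the script is used for the demonstration so that the example does not depend on what is installed.

```python
import subprocess
import sys


def reported_version(executable):
    """Run `<executable> --version` and return the text it prints."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True
    )
    # Python 3 prints the version to stdout; Python 2 printed it to stderr.
    return (result.stdout or result.stderr).strip()


current = reported_version(sys.executable)

if __name__ == "__main__":
    print(current)
    # On the setup described above one would pass "python" or "python3" instead.
```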

12. Installation of PuTTY software on Windows

PuTTY is a free software application for Windows 95, 98, XP, Vista, 7 and 10 which can be
used to make an SSH connection to the server. The application can be downloaded at
http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html.

13. Requirements

• SSH login credentials.
• Be familiar with connecting to your server via SSH.

Secure shell (SSH) is a UNIX based command interface and protocol for securely getting access
to a remote computer. SSH is actually a suite of three utilities (slogin, ssh, and scp) that are secure
versions of the earlier UNIX utilities rlogin, rsh, and rcp. SSH commands are encrypted and secure in
several ways. Both ends of the client/server connection are authenticated using a digital certificate,
and passwords are protected by being encrypted. SSH allows the user to connect to the server securely
and perform Linux command-line operations.

14. Instructions

• Search for the PuTTY software on Google and download it.
• The “PuTTY.exe” download is good for basic SSH.
• Save the download to your C:\windows folder.
• Double click on the putty.exe program or the desktop shortcut to launch the application.
• Enter your connection settings.

Fig 3.14. PuTTY configuration

• Type the IP address and click “Open” to start the SSH session.
• If this is the first time connecting to the server from this computer, a pop-up screen will appear.
Accept the connection by clicking “Yes”.


Fig 3.15. PuTTY security alert

• Once the SSH connection is open, a terminal prompt asks for the username and password:
User Name: pi
Password: raspberry
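
From a Linux or macOS machine the same login can be made with the OpenSSH client instead of PuTTY. The sketch below only builds the command line; it is an illustration under stated assumptions: the function name and the address 192.168.1.10 are hypothetical, and the user name pi is the default given above.

```python
import ipaddress

DEFAULT_USER = "pi"  # default Raspbian account named in the text


def ssh_command(host, user=DEFAULT_USER, port=22):
    """Build the OpenSSH client call for logging in to the Raspberry Pi."""
    ipaddress.ip_address(host)  # raises ValueError for a malformed address
    return ["ssh", "-p", str(port), f"{user}@{host}"]


if __name__ == "__main__":
    # Hypothetical address; replace it with the address of the actual board.
    print(" ".join(ssh_command("192.168.1.10")))
```

The password is deliberately not part of the command; the SSH client asks for it interactively, as PuTTY does.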

15.Installation of VNC Server in Windows

Virtual Network Computing (VNC) is a graphical desktop sharing system that uses the Remote
Frame Buffer (RFB) protocol to remotely control another computer. It transmits the keyboard and
mouse events from one computer to another, relaying the graphical screen updates back in the other
direction, over a network.

The following steps are used for the installation process:

Step1: Download the VNC server from the internet.
Step2: Double click on the VNC server application.
Step3: A pop-up screen will appear. Click on the “Run” button.

Fig 3.16. Selecting run

Step4: Another window will appear. Select “English” and click on “OK”.

Fig 3.17. Selecting the language

Step5: The setup wizard window will appear. Click on the “NEXT” button.

Fig 3.18. VNC setup wizard

Step6: Another window will open. Click on “Change” and then on “NEXT”.

Fig 3.19. VNC operation

Step7: The custom setup window will appear. Click on “NEXT”.

Fig 3.20. Custom setup

Step8: A window saying “Ready to change VNC viewer” will appear. Click on “Change”.

Fig 3.21. Ready to change VNC viewer

Step9: Another window will appear. If the VNC application is already installed, a pop-up screen
saying “The specified account already exists” is shown; otherwise click on “NEXT”.

Fig 3.22. VNC server status

Step10: After the information has been fetched, a window will appear. Click on “Finish”.

Fig 3.23. VNC setup wizard ended prematurely

Step11: The VNC server is installed successfully.


Various operating systems for the Raspberry Pi can be installed on a MicroSD, MiniSD or SD card,
depending on the board and available adapters. The MicroSD slot is located on the bottom of the
Raspberry Pi 2 board.

The Raspberry Pi Foundation recommends the use of Raspbian, a Debian-based Linux operating
system. Other third-party operating systems available via the official website include Ubuntu
MATE, Windows 10 IoT Core, RISC OS and specialized distributions for the Kodi media centre and
classroom management.

16.Other Operating Systems (not Unix/Linux-based):

• RISC OS Pi – a special cut-down version, RISC OS Pico, for 16 MB cards and larger and for all
models of Pi 1 & 2, has also been made available.
• Plan 9 from Bell Labs and Inferno (in beta).
• Windows 10 IoT Core – a no-cost edition of Windows 10 offered by Microsoft that runs natively on
the Raspberry Pi 2.
• xv6 – a modern reimplementation of the Sixth Edition Unix OS for teaching purposes; it has been
ported to the Raspberry Pi from MIT xv6, and this port can boot from NOOBS.
• Haiku – an open source BeOS clone that has been compiled for the Raspberry Pi and several other
ARM boards. Work on Pi 1 began in 2011, but only the Pi 2 will be supported.
• HelenOS – a portable microkernel-based multi-server operating system; it has had basic Raspberry
Pi support since version 0.6.0.

17.Other operating systems (Unix/Linux-based):

• Android Things – an embedded version of the Android operating system designed for IoT
device development.
• Arch Linux ARM – a port of Arch Linux for ARM processors.
• OpenSUSE – SUSE Linux Enterprise Server 12 SP2
• Raspberry Pi Fedora Remix
• Gentoo Linux
• CentOS for Raspberry Pi 2 and later
• Devuan – a version of Debian with sysvinit instead of systemd
• RedSleeve (a RHEL port) for Raspberry Pi 1
• Slackware ARM – version 13.37 and later runs on the Raspberry Pi without modification. The
128–496 MB of available memory on the Raspberry Pi is at least twice the minimum requirement of
64 MB needed to run Slackware Linux on an ARM or i386 system. (Whereas the majority of Linux
systems boot into a graphical user interface, Slackware's default user environment is the textual
shell / command line interface.) The Fluxbox window manager running under the X Window System
requires an additional 48 MB of RAM.
• OpenWrt – primarily used on embedded devices to route network traffic.
• Kali Linux – a Debian-derived distro designed for digital forensics and penetration testing.
• SolydXK – a light Debian-derived distro with Xfce.
• Ark OS – designed for website and email self-hosting.
• Sailfish OS – runs on the Raspberry Pi 2 (because it uses an ARM Cortex-A7 CPU; the Raspberry
Pi 1 uses the different ARMv6 architecture, and Sailfish requires ARMv7).
• Tiny Core Linux – a minimal Linux operating system focused on providing a base system using
BusyBox and FLTK. Designed to run primarily in RAM.
• Alpine Linux – a Linux distribution based on musl and BusyBox, primarily designed for "power
users who appreciate security, simplicity and resource efficiency".
• Void Linux – a rolling release Linux distribution which was designed and implemented from
scratch; it provides images based on musl or glibc.
• Fedora 25 – supports Pi 2 and later (Pi 1 is supported by some unofficial derivatives).
Media center operating systems
• Daylight Linux – an ultra-lightweight operating system with the Fluxbox interface.
• Raspberry Digital Signage – an operating system designed for digital signage deployments.
Driver APIs

The Raspberry Pi can use a VideoCore IV GPU via a binary blob, which is loaded into the GPU at
boot time from the SD card, and additional software that was initially closed source. This part of the
driver code was later released. However, much of the actual driver work is done using the closed source
GPU code. Application software uses calls to closed source run-time libraries (OpenMAX, OpenGL ES
or OpenVG), which in turn call an open source driver inside the Linux kernel, which then calls the
closed source VideoCore IV GPU driver code. The API of the kernel driver is specific to these closed
libraries. Video applications use OpenMAX, 3D applications use OpenGL ES and 2D applications use
OpenVG; the latter two in turn use EGL. OpenMAX and EGL in turn use the open source kernel driver.

18.Firmware
The official firmware is a freely redistributable binary blob, that is closed-source. A minimal
open source firmware is also available.

19.Third party application software


AstroPrint – AstroPrint's wireless 3D printing software can be run on the Pi 2.
C/C++ Interpreter Ch – Released on 3 January 2017, the C/C++ interpreter Ch and Embedded Ch are
free for non-commercial use on the Raspberry Pi. ChIDE is also included for beginners learning C/C++.
Mathematica and the Wolfram Language – Programs can be run either from a command line interface
or from a Notebook interface. There are Wolfram Language functions for accessing connected devices.
There is also a Wolfram Language desktop development kit allowing development for Raspberry Pi in
Mathematica from desktop machines, including features from the loaded Mathematica version such as
image processing and machine learning.
Minecraft – A modified version that allows players to directly alter the world with computer code.
RealVNC – Raspbian includes RealVNC's remote access server and viewer software. This includes
a new capture technology which allows directly-rendered content (e.g. Minecraft, camera preview and
omxplayer) as well as non-X11 applications to be viewed and controlled remotely.
UserGate Web Filter – Florida-based security vendor Entensys announced the porting of UserGate Web
Filter to the Raspberry Pi platform.

Fig 3.3 shows the components of the Raspberry Pi. It contains:

• 4x USB 2.0 ports
• 10/100 LAN port
• 3.5 mm 4-pole composite video and audio output jack
• CSI (Camera Serial Interface) camera port
• Full size HDMI video output
• Micro USB power input. Upgraded switched power source that can handle up to 2.5 A
• DSI port
• Micro SD card slot
• On-board Bluetooth 4.1 and Wi-Fi
• Broadcom BCM2837 64-bit quad-core CPU at 1.2 GHz, 1 GB RAM
• 40-pin extended GPIO
• Dimensions 85.6 mm x 55 mm x 21 mm

Fig 3.24. Raspberry Pi components

3.5 Pin diagram of Raspberry Pi

Fig 3.25. Pin diagram of Raspberry Pi


When programming the GPIO pins there are two different ways to refer to them:
• GPIO Numbering
• Physical Numbering

3.5.1 GPIO Numbering

These are the GPIO pins as the computer sees them. The numbers do not follow any obvious order,
so a reference board that fits over the pins, or a printed pin diagram, is needed to identify them.

3.5.2 Physical Numbering

The other way to refer to the pins is by simply counting across and down from pin 1 at the top
left (nearest to the SD card).
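
The relationship between the two numbering schemes can be illustrated with a small lookup table. The sketch below is illustrative only: the names PHYSICAL_TO_GPIO and physical_to_gpio are hypothetical, and only a subset of the 40-pin header is listed. In the RPi.GPIO Python library the scheme is chosen with GPIO.setmode(GPIO.BCM) for GPIO numbering or GPIO.setmode(GPIO.BOARD) for physical numbering.

```python
# Physical header pin -> GPIO number for a subset of the 40-pin header.
# Power and ground pins (for example pins 1, 2 and 6) have no GPIO number.
PHYSICAL_TO_GPIO = {
    3: 2, 5: 3, 7: 4, 8: 14, 10: 15, 11: 17, 12: 18, 13: 27,
    15: 22, 16: 23, 18: 24, 22: 25, 29: 5, 31: 6, 36: 16, 37: 26, 40: 21,
}


def physical_to_gpio(pin):
    """Translate a physical pin number into its GPIO number (hypothetical helper)."""
    if pin not in PHYSICAL_TO_GPIO:
        raise ValueError(f"physical pin {pin} is not in the lookup table")
    return PHYSICAL_TO_GPIO[pin]


if __name__ == "__main__":
    for pin in (11, 12, 40):
        print(f"physical pin {pin} is GPIO{physical_to_gpio(pin)}")
```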

3.6 Features of Raspberry Pi

The features of the Raspberry Pi 3 include:

• CPU: Quad-core 64-bit ARM Cortex A53 clocked at 1.2 GHz
• GPU: 400 MHz VideoCore IV multimedia
• Memory: 1 GB LPDDR2-900 SDRAM (i.e. 900 MHz)
• USB ports: 4
• Video outputs: HDMI, composite video (PAL and NTSC) via 3.5 mm jack
• Network: 10/100 Mbps Ethernet and 802.11n Wireless LAN
• Peripherals: 17 GPIO plus specific functions, and HAT ID bus
• Bluetooth: 4.1
• Power source: 5 V via MicroUSB or GPIO header
• Size: 85.60 mm × 56.5 mm
• Weight: 45 g (1.6 oz)

3.6.1 Advantages of Raspberry Pi

1. Raspberry Pi is a small independent computer.
2. It has a very large working memory, which many other sensor nodes do not have.
3. It has expandable memory to store the data.
4. It operates at speeds from 700 MHz to 1000 MHz.
5. It supports USB 2.0, which allows its expansion with a large number of peripherals.
6. Depending on the needs, it is possible to expand the Raspberry Pi with WiFi and Bluetooth
adapters (power and range can be changed by changing the adapter).
7. Expansion and communication with network devices over a LAN adapter are possible.
8. The system can easily be used for security and tracking purposes.

3.6.2 Disadvantages of Raspberry Pi

The main disadvantages of Raspberry Pi are:

1. It does not have a real-time clock (RTC) with a backup battery.
2. The Raspberry Pi always boots from an SD card. This means that even if a perfectly valid
installation of an operating system is available on a USB stick or an external hard drive, it
cannot be booted. In other words, external storage devices can be used for data but not to boot
the Raspberry Pi.
3. It does not support Bluetooth or Wi-Fi out of the box, but support can be added with USB
dongles. Unfortunately, most Linux distributions are still a bit picky about their hardware, so
it should first be checked whether the chosen flavor of Linux supports the particular device.
4. It does not have a built-in analog-to-digital converter. An external component must be used
for A/D conversion.

3.7 Tesseract OCR

The Tesseract engine was originally developed as proprietary software at Hewlett Packard labs
in Bristol, England and Greeley, Colorado between 1985 and 1994, with some more changes made in
1996 to port to Windows, and some migration from C to C++ in 1998. A lot of the code was written in
C, and then some more was written in C++. Since then all the code has been converted to at least
compile with a C++ compiler. Very little work was done in the following decade. It was then released
as open source in 2005 by Hewlett Packard and the University of Nevada, Las Vegas (UNLV).
Tesseract development has been sponsored by Google since 2006.


3.7.1 Features

Tesseract was in the top three OCR engines in terms of character accuracy in 1995. It is
available for Linux, Windows and Mac OS X. However, due to limited resources it is only rigorously
tested by developers under Windows and Ubuntu.

Tesseract up to and including version 2 could only accept TIFF images of simple one-column
text as inputs. These early versions did not include layout analysis, and so inputting multi-columned
text, images, or equations produced garbled output. Since version 3.00 Tesseract has supported output
text formatting, hOCR positional information and page-layout analysis. Support for a number of new
image formats was added using the Leptonica library. Tesseract can detect whether text is monospaced
or proportionally spaced.

Tesseract is suitable for use as a backend and can be used for more complicated OCR tasks
including layout analysis by using a frontend such as OCRopus.

Tesseract's output will have very poor quality if the input images are not preprocessed to suit
it. Images (especially screenshots) must be scaled up such that the text x-height is at least 20 pixels,
any rotation or skew must be corrected or no text will be recognized, low-frequency changes in
brightness must be high-pass filtered, or Tesseract's binarization stage will destroy much of the page,
and dark borders must be manually removed, or they will be misinterpreted as characters.
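
The scaling requirement above reduces to simple arithmetic. As a minimal sketch (the function names are hypothetical, and measuring the x-height of the text is assumed to be done elsewhere), the upscale factor needed to reach the 20-pixel minimum can be computed as follows:

```python
import math

MIN_X_HEIGHT_PX = 20  # minimum text x-height stated in the paragraph above


def upscale_factor(x_height_px, min_x_height=MIN_X_HEIGHT_PX):
    """Factor by which an image must be enlarged; never smaller than 1.0."""
    if x_height_px <= 0:
        raise ValueError("x-height must be positive")
    return max(1.0, min_x_height / x_height_px)


def scaled_size(width, height, x_height_px):
    """New (width, height) after applying the upscale factor, rounded up."""
    factor = upscale_factor(x_height_px)
    return math.ceil(width * factor), math.ceil(height * factor)


if __name__ == "__main__":
    # A 640x480 frame whose text has an x-height of 8 pixels.
    print(scaled_size(640, 480, 8))
```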

3.8 E-speak tool

eSpeak is a compact open source software speech synthesizer for English and other languages,
for Linux, Unix and Windows.
eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a
small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger
synthesizers which are based on human speech recordings.

eSpeak is available as:

• A command line program (Linux and Windows) to speak text from a file or from stdin.
• A shared library version for use by other programs (on Windows this is a DLL).
• A SAPI5 version for Windows, so it can be used with screen-readers and other programs that
support the Windows SAPI5 interface.

eSpeak has been ported to other platforms, including Android, Mac OS X and Solaris.

3.8.1 Features:
• Includes different voices, whose characteristics can be altered.
• Can produce speech output as a WAV file.
• SSML (Speech Synthesis Markup Language) is supported (not complete), and also HTML.
• The program and its data, including many languages, total about 2 Mbytes.
• Can be used as a front-end to MBROLA diphone voices. eSpeak converts text to phonemes
with pitch and length information.
• Can translate text into phoneme codes, so it could be adapted as a front end for another speech
synthesis engine.
• Potential for other languages. Several are included in varying stages of progress.

3.9 Digital Image Processing

Digital image processing is the use of computer algorithms to perform image processing on
digital images. As a subcategory or field of digital signal processing, digital image
processing has many advantages over analog image processing. It allows a much wider range of
algorithms to be applied to the input data and can avoid problems such as the build-up of noise and
signal distortion during processing. Since images are defined over two dimensions (perhaps more),
digital image processing may be modeled in the form of multidimensional systems.
Image processing is a method of performing operations on an image in order to get an
enhanced image or to extract some useful information from it. It is a type of signal processing in which
the input is an image and the output may be an image or characteristics/features associated with that
image. Nowadays, image processing is among the rapidly growing technologies. It also forms a core
research area within the engineering and computer science disciplines.


Image processing basically includes the following three steps:

• Importing the image via image acquisition tools;
• Analyzing and manipulating the image;
• Output, in which the result can be an altered image or a report based on the image analysis.

There are two types of methods used for image processing, namely analogue and digital image
processing. Analogue image processing can be used for hard copies like printouts and photographs.
Image analysts use various fundamentals of interpretation while using these visual techniques. Digital
image processing techniques help in the manipulation of digital images by using computers. The three
general phases that all types of data have to undergo while using the digital technique are
pre-processing, enhancement and display, and information extraction.


Chapter-4
Methodology

4.1 Architecture of the Proposed System

Fig 4.1. System architecture of camera based assistive reading

Fig 4.1 shows the system architecture of the camera-based assistive text reader, which helps a
visually impaired person to read the text present in the captured image. Faces can also be detected
when a person enters the frame, depending on the mode control. This is a prototype for blind people
to recognize products in the real world by extracting the text in the image and converting it into
speech. It is built around a Raspberry Pi, and portability is achieved by using a battery backup. The
system consists of:
1. Text Detection Section
2. Face Detection Section


4.2 Flow chart of the Proposed System

Fig 4.2. Flow chart of the proposed system


The Raspberry Pi runs Raspbian, which is derived from the Debian operating system. The
algorithms are written in Python, a scripting language, and the functions used in the algorithms are
called from the OpenCV library, an open source computer vision library. Tesseract is an open source
OCR engine. It assumes that its input is a binary image with optional polygonal text regions defined.

The flow chart for the proposed system is shown in Fig 4.2. The system initializes the values
of count and mode to zero. Count stores the number of frames, and the mode value selects the text or
face mode. When the number of frames reaches 120, the system checks for a face or text, depending
on the mode, and gives the voice output. The switches ‘c’ and ‘m’ are used in the system. When
switch ‘c’ is high, the system captures the image and performs the processing. The mode control is
done by switch ‘m’. The system also includes a switch ‘s’ to shut down the system.
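
The control flow described above can be sketched as a small simulation. This is not the project's code: the function name run, the event encoding, the assignment of mode 0 to text and mode 1 to face, and the reset of the frame counter after each check are all assumptions made for illustration.

```python
FRAMES_PER_CHECK = 120      # frame count at which the system checks for text or a face
MODES = ("text", "face")    # assumption: mode 0 selects text, mode 1 selects face


def run(events):
    """Simulate the loop. `events` holds one entry per frame: None, 'c', 'm' or 's'."""
    count = 0   # number of frames seen since the last check
    mode = 0    # current mode index
    actions = []
    for key in events:
        if key == "s":                      # switch 's' shuts the system down
            actions.append("shutdown")
            break
        if key == "m":                      # switch 'm' toggles the mode
            mode = 1 - mode
            actions.append("mode:" + MODES[mode])
        elif key == "c":                    # switch 'c' captures and processes a frame
            actions.append("capture:" + MODES[mode])
        count += 1
        if count == FRAMES_PER_CHECK:       # periodic check with voice output
            actions.append("check:" + MODES[mode])
            count = 0                       # assumption: the counter restarts
    return actions


if __name__ == "__main__":
    print(run(["m"] + [None] * 119 + ["s"]))
```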


Chapter 5
Implementation of the System

5.1 Introduction

The proposed method helps a blind person to read the text present on text labels, printed
notes and products by acting as a camera-based assistive text reader. The implemented idea involves
detecting text and faces in the image taken by the camera and recognizing the text using OCR.
Conversion of the recognized text file to voice output is done by the eSpeak tool. The system has
good portability, which is achieved by providing a battery backup.

5.2 Working of Proposed System

The proposed system has two different modes, as shown in Fig. 4.1. The face and text modes
are selected using the mode control switch. The system captures the frame and checks for the presence
of text in the frame. It also checks for the presence of a face in the frame and informs the user via
an audio message. If a character is found by the camera, the user is informed that an image with some
text was detected. The captured image is first converted to grayscale and then filtered using a
Gaussian filter to reduce the noise in the image. The filtered image is then converted to binary
using adaptive Gaussian thresholding. The binarized image is cropped so that the portions of the image
with no characters are removed. The cropped frame is passed to Tesseract OCR to perform text
recognition. The output of Tesseract OCR is a text file, which is the input to eSpeak. eSpeak creates
an analog signal corresponding to the text file given as the input. This signal is then sent to a
headphone to produce the audio output.
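
The cropping step in this pipeline can be illustrated without any image library. The sketch below is a simplified stand-in, not the project's implementation: it assumes the binarized image is a list of rows holding 0 for text pixels and 255 for background, and the function name crop_to_text is hypothetical.

```python
def crop_to_text(binary, text_value=0):
    """Return the smallest rectangular sub-image that contains every text pixel."""
    rows = [r for r, row in enumerate(binary) if text_value in row]
    if not rows:
        return []  # no text pixels were found in the frame
    width = len(binary[0])
    cols = [c for c in range(width) if any(row[c] == text_value for row in binary)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in binary[top:bottom + 1]]


if __name__ == "__main__":
    frame = [
        [255, 255, 255, 255],
        [255,   0, 255, 255],
        [255,   0,   0, 255],
        [255, 255, 255, 255],
    ]
    for row in crop_to_text(frame):
        print(row)
```

Removing the empty border this way reduces the area that the OCR engine has to examine.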

5.2.1 Camera

A compatible camera is used for image capturing. It has auto-focus capability and a
resolution of 1280×720, which is sufficient for capturing good quality images. A USB-powered
camera is used so that it can be connected to the Raspberry Pi board.

5.2.2 Mode Selection

Model selection is the task of selecting a statistical model from a set of candidate models, given
data. Once the set of candidate models has been chosen, statistical analysis allows us to select the
best of these models. Model selection techniques can be considered as estimators of some physical
quantity, such as the probability of the model producing the given data. Model selection can be carried
out in many ways, but the most commonly used are the Akaike Information Criterion (AIC) and the
Bayes factor. The AIC is an estimator of the relative quality of statistical models for a given set of
data. Given a collection of models for the data, AIC estimates the quality of each model relative to
each of the other models. Thus, AIC provides a means for model selection. The Bayes factor is the
ratio of the likelihoods of two competing hypotheses, usually a null and an alternative. When the two
hypotheses are equally probable a priori, the Bayes factor is equal to the ratio of their posterior
probabilities. If, instead of the Bayes factor integral, the likelihood corresponding to the maximum
likelihood estimate of the parameters of each statistical model is used, then the test becomes a
classical likelihood-ratio test.
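
As a small worked example of the AIC comparison, the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood, can be coded directly. The function names and the candidate values below are illustrative assumptions, not results from this project.

```python
def aic(num_params, log_likelihood):
    """Akaike Information Criterion: 2k - 2 ln(L), with ln(L) passed in directly."""
    return 2 * num_params - 2 * log_likelihood


def select_by_aic(candidates):
    """Return the name of the candidate with the lowest AIC.

    `candidates` maps a model name to (number of parameters, maximized log-likelihood).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


if __name__ == "__main__":
    # Made-up values: the larger model fits slightly better but uses more parameters.
    models = {"simple": (2, -12.0), "complex": (5, -11.5)}
    for name, (k, ll) in models.items():
        print(name, aic(k, ll))
    print("selected:", select_by_aic(models))
```

The model with the lowest AIC is preferred, so the extra parameters of the larger model are only justified by a sufficiently large gain in likelihood.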

5.2.3 Face Detection

Face detection is a computer technology, used in a variety of applications, that identifies
human faces in digital images. First, the possible human eye regions are detected by testing all the
valley regions in the gray level image. Then a genetic algorithm is used to generate all the possible
face regions, which include the eyebrows, the iris, the nostrils and the mouth corners. Each possible
face candidate is normalized to reduce both the lighting effect, which is caused by uneven
illumination, and the shearing effect, which is due to head movement. The fitness value of each
candidate is measured based on its projection on the eigenfaces. After a number of iterations, all the
face candidates with a high fitness value are selected for further verification.
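
One common way to score a candidate by its projection on the eigenfaces is the "distance from face space": the candidate is projected onto the eigenfaces, reconstructed from the resulting weights, and the reconstruction error is measured. The sketch below assumes this particular measure, assumes the eigenfaces are orthonormal vectors, and maps the distance to a fitness value through 1 / (1 + distance); none of these choices is stated in the text above, and the function names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def distance_from_face_space(candidate, mean_face, eigenfaces):
    """Reconstruction error of a candidate after projection on orthonormal eigenfaces."""
    phi = [c - m for c, m in zip(candidate, mean_face)]   # mean-centred candidate
    weights = [dot(u, phi) for u in eigenfaces]           # projection coefficients
    recon = [sum(w * u[i] for w, u in zip(weights, eigenfaces))
             for i in range(len(phi))]
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(phi, recon)))


def fitness(candidate, mean_face, eigenfaces):
    """Assumed fitness: 1.0 for a perfect reconstruction, approaching 0 for poor ones."""
    return 1.0 / (1.0 + distance_from_face_space(candidate, mean_face, eigenfaces))


if __name__ == "__main__":
    mean = [0.0, 0.0, 0.0]
    basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # toy eigenfaces
    print(fitness([3.0, 4.0, 0.0], mean, basis))  # lies in the face space
    print(fitness([0.0, 0.0, 2.0], mean, basis))  # lies outside the face space
```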

5.2.4 Text Detection

In text detection, Optical Character Recognition (OCR) is used. OCR is the mechanical or
electronic conversion of images of typed, handwritten or printed text into machine-encoded text,
whether from a scanned document, a photo of a document, a scene photo or from subtitle text
superimposed on an image. OCR often pre-processes images to improve the chances of successful
recognition: it removes positive and negative spots, smooths edges, and converts the image from
colour or grayscale to black and white. This binarisation is performed as a simple way of
separating the text from the background.

5.2.5 Noise Correction and Sound Indication

Images taken with both digital cameras and conventional cameras will pick up noise from a
variety of sources. Noisy pixels are very different in colour or intensity from their surrounding
pixels. The defining characteristic is that the value of a noisy pixel bears no relation to the colour of
the surrounding pixels. In Gaussian noise, each pixel in the image is changed from its original value
by a small amount. A histogram, a plot of the amount of distortion of a pixel value against the
frequency with which it occurs, shows a normal distribution of noise. One method of removing noise is
to convolve the original image with a mask that represents a low-pass filter or smoothing operation.
This convolution brings the value of each pixel into closer harmony with the values of its neighbors.
In general, a smoothing filter sets each pixel to the average value, or a weighted average, of itself and
its nearby neighbors; the Gaussian filter is just one possible set of weights. Smoothing filters tend to
blur an image, because pixel intensity values that are significantly higher or lower than the surrounding
neighborhood are smeared across the area. Because of this blurring, linear filters are seldom used in
practice for noise reduction; they are, however, often used as the basis for nonlinear noise reduction
filters.
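
The weighted-average smoothing described above can be shown with a 3×3 Gaussian-like mask. This is a pure-Python sketch for illustration: the kernel weights, the integer division and the border handling (edge pixels are replicated) are assumptions, and an OpenCV-based program would normally use a library routine such as cv2.GaussianBlur instead.

```python
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]      # weights sum to 16
KERNEL_SUM = 16


def gaussian_smooth(image):
    """Convolve a grayscale image (list of rows of ints) with the 3x3 mask above."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), height - 1)  # replicate border rows
                    xx = min(max(x + dx, 0), width - 1)   # replicate border columns
                    total += KERNEL[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = total // KERNEL_SUM
    return out


if __name__ == "__main__":
    spike = [[0, 0, 0], [0, 160, 0], [0, 0, 0]]   # a single noisy pixel
    for row in gaussian_smooth(spike):
        print(row)
```

The isolated spike is spread over its neighbors and reduced in height, which is the blurring effect mentioned in the paragraph above.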

5.2.6 Thresholding

Thresholding is the simplest method of image segmentation and is used to partition an
image into a foreground and a background. This image analysis technique isolates objects by
converting grayscale images into binary images. Image thresholding is most effective in images
with high levels of contrast.
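
A global threshold is easy to express in code. The sketch below is illustrative: the function name and the default level of 128 are assumptions, and it shows a fixed global threshold rather than the adaptive Gaussian thresholding named in Section 5.2.

```python
def threshold(gray, level=128):
    """Binarize a grayscale image: pixels above `level` become 255, all others 0."""
    return [[255 if pixel > level else 0 for pixel in row] for row in gray]


if __name__ == "__main__":
    gray = [[10, 200],
            [129, 128]]
    print(threshold(gray))
```

An adaptive method differs in that the level is computed separately for each pixel from its neighborhood, which copes better with uneven illumination.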

5.2.7 Tesseract OCR


Tesseract is an optical character recognition engine for various operating systems. It is free
software, released under the Apache License, Version 2.0. It supports Unicode and can recognize
more than 100 languages. Tesseract assumes that its input is a binary image with optional polygonal
text regions defined. The first step is a connected component analysis in which the outlines of the
components are stored. By inspecting the nesting of outlines, it is easy to detect inverse text and
recognize it as easily as black-on-white text. At this stage, outlines are gathered together, purely by
nesting, into blobs. Blobs are organized into text lines, and the lines and regions are analyzed for
fixed pitch or proportional text. The slope across the line is used to find text lines. These lines are
broken into words differently according to the kind of character spacing. Fixed pitch text is chopped
immediately by character cells. The cells are checked for joined letters, and any that are found are
separated. The quality of the recognized text is then verified. If the clarity is not sufficient, the
text is passed to the associator. The classifier compares each recognized letter with the training
data. Word recognition is done by considering confidence and rating.

5.2.8 E-speak Tool

The eSpeak tool is a compact open source software speech synthesizer for English and other
languages. eSpeak uses a formant synthesis method. This allows many languages to be provided in a
small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as
larger synthesizers which are based on human speech recordings.
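
Because eSpeak is available as a command-line program, a recognized text file can be spoken with a single command. The sketch below only builds that command; the helper name build_espeak_command, the file names and the default values are assumptions. The options used are eSpeak's -v (voice), -s (speed in words per minute), -f (input text file) and -w (write the speech to a WAV file).

```python
def build_espeak_command(text_file, voice="en", words_per_minute=150, wav_file=None):
    """Return the argument list for an eSpeak call that reads `text_file` aloud."""
    command = ["espeak", "-v", voice, "-s", str(words_per_minute), "-f", text_file]
    if wav_file is not None:
        command += ["-w", wav_file]   # write a WAV file instead of playing the audio
    return command


if __name__ == "__main__":
    # "ocr_output.txt" is a hypothetical file name for the OCR result.
    print(" ".join(build_espeak_command("ocr_output.txt")))
```

On a system with eSpeak installed, the list can be executed with subprocess.run(command, check=True).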

5.2.9 Audio Output

Audio outputs are used for playing sound, i.e., for providing the voice output of the text read
or the face detected.

5.2.10 Conversion of Text to Voice using E-speak Tool

A TTS (Text to Speech) system is composed of two parts: a front end and a back end. The front
end has two major tasks, the normalization and the phonetic transcription of text. Normalization,
pre-processing, or tokenization of text is the conversion of text containing symbols like abbreviations
and numbers into equivalent written-out words. The front end then assigns a phonetic transcription to
each word. The prosodic units, like clauses, sentences, and phrases, are marked and divided. Text to
phoneme conversion is the process of assigning phonetic transcriptions to words. The output from the
front end is a symbolic linguistic representation made up of the phonetic transcriptions and the
prosody information. The back end performs the function of a synthesizer: it converts the symbolic
linguistic representation into sound. The most important qualities of a speech synthesis system
are naturalness and intelligibility. Naturalness describes how closely the output sounds like human
speech, while intelligibility is the ease with which the output is understood. Speech synthesis systems
usually try to maximize both naturalness and intelligibility, which are the characteristics of an ideal
speech synthesizer.
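
The normalization task of the front end can be illustrated with a toy example. The sketch below is not how eSpeak works internally: the abbreviation table, the digit-by-digit reading of numbers and the function name normalize are all simplifying assumptions.

```python
import re

# Tiny illustrative tables; a real front end uses far larger, language-specific rules.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister", "no.": "number"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def normalize(text):
    """Convert abbreviations and digits into written-out words."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]+", token):
            words.extend(DIGITS[int(d)] for d in token)   # read digit by digit
        else:
            words.append(re.sub(r"[^a-z']", "", token))   # drop punctuation
    return " ".join(word for word in words if word)


if __name__ == "__main__":
    print(normalize("Dr. Smith has 42 cats."))
```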

5.2.11 Software Implementation

Both the Raspberry Pi module and the laptop should be connected to the same network, for
example the user's mobile network. The steps for the software implementation are indicated from figures 5.1 to 5.

1. Click on the Putty Software.

Fig 5.1. Selecting Putty software

2. Type the IP address in the given Host name or IP address in the configuration window. Click
on open option.

Department of Electronics and Communication Engineering, AIET, Mijar 57


Text Recognition and Face Detection Aid for Visually Impaired People Using Raspberry PI 2017-2018

Fig 5.2. Typing the IP address

3. IP address (192.168.43.179) window will open.


Login ID - pi

Fig 5.3. Putty login ID

Department of Electronics and Communication Engineering, AIET, Mijar 58


Text Recognition and Face Detection Aid for Visually Impaired People Using Raspberry PI 2017-2018

4. Enter the password as “raspberry”.

Fig 5.4. Password window

5. “ls” command is used to know the files present in the Raspberry Pi.

Fig 5.5. ls command window

Department of Electronics and Communication Engineering, AIET, Mijar 59


Text Recognition and Face Detection Aid for Visually Impaired People Using Raspberry PI 2017-2018

6. “vncserver” command is used in Putty to know the internal files of Putty.

Fig 5.6. VNC server path

7. Click on “vncserver” .

Fig 5.7. Selecting VNC software

Department of Electronics and Communication Engineering, AIET, Mijar 60


Text Recognition and Face Detection Aid for Visually Impaired People Using Raspberry PI 2017-2018

8. A window pops up. Type the IP address and display number given below.

“192.168.43.179:1”

Fig 5.8. VNC viewer window

9. Press Enter. An encryption window will open. Click on the Continue option.

Fig 5.9. Encryption window

10. Now in the new window, enter the password as “raspberry”.

Fig 5.10. Authentication window

11. VNC Viewer will open and show the Raspberry Pi desktop.

Fig 5.11. Raspberry Pi desktop

12. Click on the file option. A new window opens.

Fig 5.12. Raspberry Pi project window

13. Click on “captureTest.py” to view the code for text recognition.

Fig 5.13. Text recognition python code

14. Type “python captureTest.py” in PuTTY to execute the text recognition program.

Fig 5.14. Executing text recognition code

15. Click on “faceD.py” to view the code for face detection.

Fig 5.15. Face detection code

16. The face detection code, continued.

Fig 5.16. Face detection code

17. To execute the face detection code, type “sudo python3 faceD.py” in PuTTY.

Fig 5.17. Executing face detection code

18. After a few seconds, the detected images will be shown in the PuTTY window.

Fig 5.18. Executing face detection code again

19. The detected images will be stored on the Raspberry Pi and can be viewed through the VNC
session 192.168.43.179:1 (pi’s X desktop (raspberrypi:1)).

Fig 5.19. Detected faces set1

20. To see the stored images, open the File Manager and click on projects.

Fig 5.20. Detected face set2

21. To shut down the Raspberry Pi, type “sudo shutdown --p now” in PuTTY.

Fig 5.21. Shutdown command
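
The address typed in step 8 combines the IP address of the Raspberry Pi with a VNC display number. By VNC convention, display N listens on TCP port 5900 + N, so “:1” corresponds to port 5901, while the PuTTY (SSH) login of steps 2 to 4 uses port 22 by default. The following is a minimal sketch, not part of the project code, showing how the address can be split and how reachability of the two services could be checked from the laptop. The function names parse_vnc_address and is_reachable are hypothetical.

```python
import socket

VNC_BASE_PORT = 5900  # VNC convention: display N listens on TCP port 5900 + N


def parse_vnc_address(address):
    """Split a 'host:display' string into (host, tcp_port)."""
    host, sep, display = address.strip().rpartition(":")
    if not sep or not host or not display.isdigit():
        raise ValueError("expected 'host:display', got %r" % address)
    return host, VNC_BASE_PORT + int(display)


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Address taken from step 8 of the procedure.
    host, port = parse_vnc_address("192.168.43.179:1")
    print(host, port)
```

With both devices on the same network, calling is_reachable(host, 22) and is_reachable(host, port) before opening PuTTY or VNC Viewer would indicate whether the SSH and VNC services on the Raspberry Pi are accepting connections.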

Chapter 6
Results and Discussions

This chapter discusses the results obtained after the implementation of the project. The required
connections are made as shown in Fig 6.1.

Fig 6.1. System design


Steps #1 to #7 illustrate the various stages in obtaining the desired result.

Step #1: The captured image is first converted to grayscale and then filtered using a Gaussian filter to
reduce the noise in the image.
Step #2: The filtered image is then converted to binary.
Step #3: The binarized image is cropped so that the portions of the image with no characters are
removed.
Step #4: The cropped frame is loaded to the Tesseract OCR so as to perform text recognition.
Step #5: The output of the Tesseract OCR is a text file, which serves as the input to e-Speak.
Step #6: e-Speak produces an audio signal corresponding to the text file given as the input.
Step #7: The audio signal produced by e-Speak is then sent to a headphone to obtain the audio
output.
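
The stages above can be illustrated with a small sketch. The project itself relies on OpenCV, Tesseract OCR and e-Speak. The code below is only a pure-Python illustration of Steps #1 to #3 on an image stored as nested lists, so that each operation is visible. All function names, the 3x3 Gaussian kernel and the threshold of 128 are assumptions made for illustration, and dark text on a light background is assumed. For Steps #4 to #7 the sketch only builds the command lines for the standard tesseract and espeak tools without running them.

```python
def to_grayscale(rgb):
    """Step #1 (first half): convert rows of (R, G, B) pixels to luminance."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def gaussian_blur(gray):
    """Step #1 (second half): 3x3 Gaussian smoothing, border pixels replicated."""
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to image bounds
                    xx = min(max(x + dx, 0), w - 1)
                    total += kernel[dy + 1][dx + 1] * gray[yy][xx]
            row.append(total // 16)
        out.append(row)
    return out


def binarize(gray, threshold=128):
    """Step #2: dark pixels (assumed text) become 1, light pixels become 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def crop_to_text(binary):
    """Step #3: keep only the bounding box that contains text pixels."""
    rows = [y for y, row in enumerate(binary) if any(row)]
    if not rows:
        return []  # no text pixels found
    cols = [x for x in range(len(binary[0])) if any(r[x] for r in binary)]
    return [r[cols[0]:cols[-1] + 1] for r in binary[rows[0]:rows[-1] + 1]]


def ocr_and_speech_commands(image_path, text_base):
    """Steps #4 to #7: command lines for Tesseract and e-Speak (not executed)."""
    ocr = ["tesseract", image_path, text_base]    # writes text_base + ".txt"
    speak = ["espeak", "-f", text_base + ".txt"]  # speaks the text file
    return ocr, speak


if __name__ == "__main__":
    white, black = (255, 255, 255), (0, 0, 0)
    # 5x5 white image with a 3x3 black block in the centre.
    image = [[black if 1 <= x <= 3 and 1 <= y <= 3 else white
              for x in range(5)] for y in range(5)]
    print(crop_to_text(binarize(gaussian_blur(to_grayscale(image)))))
    print(ocr_and_speech_commands("frame.png", "out"))
```

In a real run, the two command lists could be passed to subprocess.run on the Raspberry Pi once the cropped frame has been saved as an image file. The file names frame.png and out are placeholders.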

Chapter 7

Conclusion & Future Scope


7.1 Conclusion

In the proposed system, the portability issue is solved by using the Raspberry Pi. MATLAB is
replaced with OpenCV, which results in faster processing. OpenCV, an open-source library for image
processing, has more supporting libraries than MATLAB. The device consists of a camera and a
processor that is small enough to be kept inside the user’s pocket. A wired connection to the
camera is provided for fast access. The power bank provided with the system allows the device to work
for about 6 to 8 hours. With these features the device becomes simple, reliable and more user friendly.

The proposed system can be improved through the addition of various components. Adding
GPS to the present system would enable the user to get directions and to obtain information about
his or her present location. The device could also be extended to face recognition, so that a visually
impaired person need not guess who is nearby but can identify people as the camera captures their
faces. A GSM module can be added to the system to implement a panic button. If the user is in trouble,
he or she can use the panic button to seek help by sending the location to predefined mobile numbers.

This will increase the safety of blind people. The device could give better results if some training
is given to the visually impaired person. By adding an object detection feature to the visual narrator, it
could recognize objects that are commonly used by visually impaired people. Recognizing objects
such as currency notes, tickets, Visa cards, and numbers or details on a smartphone could make the
life of blind people easier. Identification of traffic signals, sign boards and other landmarks could be
helpful when traveling. A Bluetooth facility could be added in order to remove the wired connection.

7.2 Future Scope

The proposed system for text recognition and face detection for visually impaired people has been
summarized. The future directions are listed below.

• This project can be extended to indoor navigation, such as routing between rooms and locating
things.
• The system can also be extended to navigation tasks such as bus-stop identification, for which
bus-stop sections for blind people and wireless technology can be used.
