Visual Assistant Project-Content Final


CHAPTER 1

INTRODUCTION

1.1 DOMAIN INTRODUCTION

1.1.1 Machine Learning

Machine learning is the practice of helping software perform a
task without explicit programming or rules. With traditional computer
programming, a programmer specifies rules that the computer should use.
ML requires a different mindset, though. Real-world ML focuses far
more on data analysis than coding. Programmers provide a set of
examples and the computer learns patterns from the data. You can think
of machine learning as “programming with data”.
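
To make the idea of “programming with data” concrete, the sketch below contrasts hand-written rules with a learner that simply copies the label of the closest labeled example (a one-nearest-neighbour classifier). It is a small, self-contained illustration written for this report, not part of the project code, and all class and method names in it are invented for the example.

import java.util.Arrays;
import java.util.List;

public class NearestNeighbourDemo {

    // A labeled example: a feature vector plus the answer the program should learn.
    static class Example {
        final double[] features;
        final String label;
        Example(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    // The "program" here is the training data itself: a new point is given
    // the label of its nearest labeled example (squared Euclidean distance).
    static String classify(List<Example> trainingData, double[] query) {
        Example best = null;
        double bestDistance = Double.MAX_VALUE;
        for (Example e : trainingData) {
            double d = 0;
            for (int i = 0; i < query.length; i++) {
                double diff = e.features[i] - query[i];
                d += diff * diff;
            }
            if (d < bestDistance) {
                bestDistance = d;
                best = e;
            }
        }
        return best == null ? "unknown" : best.label;
    }

    public static void main(String[] args) {
        // Toy labeled dataset: (height in cm, weight in kg) -> animal label.
        List<Example> data = Arrays.asList(
                new Example(new double[]{30, 5}, "cat"),
                new Example(new double[]{60, 25}, "dog"));
        // Prints "cat": the query point is closer to the "cat" example.
        System.out.println(classify(data, new double[]{35, 7}));
    }
}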

Machine learning is an application of artificial intelligence (AI)
that provides systems the ability to automatically learn and improve from
experience without being explicitly programmed. Machine learning
focuses on the development of computer programs that can access data
and use it to learn for themselves.

The process of learning begins with observations or data, such as
examples, direct experience, or instruction, in order to look for patterns in
data and make better decisions in the future based on the examples that
we provide. The primary aim is to allow computers to learn
automatically without human intervention or assistance and adjust their
actions accordingly.

Machine learning algorithms are often categorized as supervised or unsupervised.

● Supervised Machine Learning:
Supervised machine learning algorithms can apply what has been
learned in the past to new data using labeled examples to predict future
events. Starting from the analysis of a known training dataset, the
learning algorithm produces an inferred function to make predictions
about the output values. The system is able to provide targets for any new
input after sufficient training. The learning algorithm can also compare its
output with the correct, intended output and find errors in order to modify
the model accordingly.

● Unsupervised Machine Learning:


In contrast, unsupervised machine learning algorithms are used
when the information used to train is neither classified nor labeled.
Unsupervised learning studies how systems can infer a function to
describe a hidden structure from unlabeled data. The system doesn’t
figure out the right output, but it explores the data and can draw
inferences from datasets to describe hidden structures from unlabeled data.

● Semi-supervised Machine Learning:


These algorithms fall somewhere in between supervised and
unsupervised learning, since they use both labeled and unlabeled data for
training – typically a small amount of labeled data and a large amount of
unlabeled data. The systems that use this method are able to considerably
improve learning accuracy. Usually, semi-supervised learning is chosen
when the acquired labeled data requires skilled and relevant resources in
order to train it / learn from it.

● Reinforcement Machine Learning:

Reinforcement learning algorithms interact with their
environment by producing actions and discovering errors or rewards.
Trial-and-error search and delayed reward are the most relevant
characteristics of reinforcement learning. This method allows machines
and software agents to automatically determine the ideal behavior within
a specific context in order to maximize their performance. Simple reward
feedback is required for the agent to learn which action is best; this is
known as the reinforcement signal.

Machine learning enables analysis of massive quantities of data.
While it generally delivers faster, more accurate results in order to
identify profitable opportunities or dangerous risks, it may also require
additional time and resources to train it properly. Combining machine
learning with AI and cognitive technologies can make it even more
effective in processing large volumes of information.

1.1.2 TensorFlow

TensorFlow is a free and open-source software library for dataflow
and differentiable programming across a range of tasks. It is a symbolic
math library, and is also used for machine learning applications such as
neural networks. It is used for both research and production at Google. It
is a standard expectation in the industry to have experience in
TensorFlow to work in machine learning.[6]

TensorFlow is Google Brain's second-generation system. Version
1.0.0 was released on February 11, 2017. While the reference
implementation runs on single devices, TensorFlow can run on multiple
CPUs and GPUs (with optional CUDA and SYCL extensions for general-purpose
computing on graphics processing units). TensorFlow is
available on 64-bit Linux, macOS, Windows, and mobile computing
platforms including Android and iOS.

Its flexible architecture allows for the easy deployment of
computation across a variety of platforms (CPUs, GPUs, TPUs), and
from desktops to clusters of servers to mobile and edge devices.

TensorFlow computations are expressed as stateful dataflow
graphs. The name TensorFlow derives from the operations that such
neural networks perform on multidimensional data arrays, which are
referred to as tensors.
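
As an illustration of this dataflow-graph model, the sketch below builds a graph with a single constant node and evaluates it in a session. It is a minimal sketch using the legacy TensorFlow 1.x Java API and assumes the libtensorflow dependency is available; it is not part of the project code.

import org.tensorflow.Graph;
import org.tensorflow.Session;
import org.tensorflow.Tensor;
import org.tensorflow.TensorFlow;

public class HelloTensorFlow {
    public static void main(String[] args) throws Exception {
        try (Graph g = new Graph()) {
            // Add a single constant (string) node to the dataflow graph.
            byte[] value = ("Hello from TensorFlow " + TensorFlow.version()).getBytes("UTF-8");
            try (Tensor t = Tensor.create(value)) {
                g.opBuilder("Const", "greeting")
                        .setAttr("dtype", t.dataType())
                        .setAttr("value", t)
                        .build();
            }
            // Run the graph in a session and fetch the tensor produced by the node.
            try (Session s = new Session(g);
                 Tensor output = s.runner().fetch("greeting").run().get(0)) {
                System.out.println(new String(output.bytesValue(), "UTF-8"));
            }
        }
    }
}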

1.1.3 Android Software Development

Android software development is the process by which new
applications are created for devices running the Android operating
system. Google states that "Android apps can be written using Kotlin,
Java, and C++ languages" using the Android software development kit
(SDK), while using other languages is also possible.[7]

All non-JVM languages, such as Go, JavaScript, C, C++ or
assembly, need the help of JVM language code that may be supplied by
tools, likely with restricted API support. Some languages and programming
tools allow cross-platform app support, i.e. for both Android and iOS.
Third-party tools, development environments and language support have
also continued to evolve and expand since the initial SDK was released in
2008.

Until around the end of 2014, the officially-supported integrated
development environment (IDE) was Eclipse using the Android
Development Tools (ADT) Plugin, though IntelliJ IDEA IDE (all
editions) fully supports Android development out of the box. As of 2015,
Android Studio, made by Google and powered by IntelliJ, is the official
IDE; however, developers are free to use others. Google made it clear
that ADT has been officially deprecated since the end of 2015 in order to focus on
Android Studio as the official Android IDE.[7] Additionally, developers
may use any text editor to edit Java and XML files, then use command
line tools (Java Development Kit and Apache Ant are required) to create,
build and debug Android applications as well as control attached Android
devices (e.g., triggering a reboot, installing software package(s)
remotely).

What all do we need?

Below are the basic requirements you need to get started with
Android Studio to build mobile applications.

Table 1.1 System Requirements

Computer : Laptop or Desktop
Storage : 2 GB
OS : Windows 7/8/10 or macOS 10.10+
Processor : Intel i3 or above
RAM : Minimum 3 GB (Recommended 8 GB)
Display : LCD Display
Input : Keyboard and Mouse
Network : Ethernet or Wi-Fi

Table 1.2 Android Phone Requirements

Device : Android Mobile
Storage : Min 1 GB
OS : Android API 21 (Lollipop) or higher
Processor : Snapdragon or ARM with GPU
RAM : Min 4 GB
Display : LCD Phone Display
Input : Camera
Network : Wi-Fi or Mobile Data

1.2 PROJECT IDENTIFICATION

Visually impaired people are often unaware of dangers in front of
them, even in familiar environments. Furthermore, in unfamiliar
environments, such people require guidance to reduce the risk of
colliding with obstacles. This study proposes a simple smartphone-based
guiding system for solving the navigation problems of visually impaired
people and achieving obstacle avoidance, enabling them to travel
smoothly from a starting point to a destination with greater awareness of
their surroundings.

1.3 PROJECT INTRODUCTION

Research shows that camera-based object detection has not yet
matched the accuracy of the human eye, and cameras cannot simply
replace it. Detection refers to the identification of an object or a person
by a trained model. Detection of images and moving objects has been
studied extensively and has been integrated into commercial, residential
and industrial environments. However, most existing strategies and
techniques have severe limitations: heavy computational requirements,
lack of proper analysis of the training data, dependence on the motion of
the objects, inability to differentiate one object from another, and
sensitivity to the speed of movement and to lighting. Hence, there is a
need to design, apply and evaluate new detection techniques that tackle
these limitations.

Humans learn to recognize objects and other humans from birth.
The same idea is utilized here by incorporating intelligence into a camera
through training with neural networks and TensorFlow. This gives the
camera a comparable kind of intelligence, so it can be used as an artificial
eye in many areas such as surveillance and the detection of objects.

A model based on Scalable Object Detection using Deep Neural
Networks is used to localize and track people, cars, animals and many other
objects in the camera preview in real time. It is implemented in an Android
application so that it can be used handily on a mobile phone or any other
smart device.

1.4 ORGANIZATION OF THE PROJECT

The project is organized into 8 chapters, each focusing on different
aspects of the project. Chapter 1 deals with the introduction to the
project domain and the project description. Chapter 2 deals with the
various papers presented about the project and the literature
survey. Chapter 3 explains the existing and proposed system and the
feasibility studies. Chapter 4 specifies the hardware and software
requirements for this project. Chapter 5 explains the overall system
architecture and the modules it consists of; the system is also represented
by various UML diagrams. Chapter 6 describes the various techniques
used in the project and gives a general introduction to them. Chapter 7
deals with the various tests and results of the project. Chapter 8 gives
the conclusion and the future enhancements to be done.

1.5 SUMMARY
In this chapter, the project domain of machine learning is
discussed along with object detection and the types of machine learning.
The problem description gives a brief account of the main problem to be
resolved in this project. Later, the project itself is described.

CHAPTER 2
LITERATURE SURVEY

1. Smart Vision using Machine Learning for Blind, International Journal of
Advanced Science and Technology, Vol. 29, No. 5, (2020)

In this paper, they have used hardware components like a Raspberry Pi
SoC and a Pi camera, which makes the project more accurate and effective.
They have used a pre-trained model named SSD MobileNet v1, which performs
very well compared to other common deep learning models.

This paper provides a flexible and effective model to help visually
challenged people. The results obtained from this prototype were accurate and
reliable. Using TensorFlow, an advanced technique, the objects were trained and
given to the module, hence object detection and identification was easier.
This model will be helpful for visually challenged people to overcome their
disability.

2. Cang Ye and Xiangfei Qian (2018), ‘3-D Object Recognition of a Robotic
Navigation Aid for the Visually Impaired’, IEEE Transactions on
Neural Systems and Rehabilitation Engineering

This paper presents a 3-D object recognition method and its
implementation on a robotic navigation aid to allow real-time detection of
indoor structural objects for the navigation of a blind person. The method
segments a point cloud into numerous planar patches and extracts their
inter-plane relationships (IPRs). Based on the existing IPRs of the object
models, the method defines six high-level features (HLFs) and determines
the HLFs for each patch.

A Gaussian-mixture-model-based plane classifier is then devised to
classify each planar patch as belonging to a particular object model.
Finally, a recursive plane clustering procedure is used to cluster the
classified planes into the model objects. As the proposed method uses
geometric context to detect an object, it is robust to changes in the
object's visual appearance. As a result, it is ideal for detecting structural
objects (e.g., stairways, doorways, and so on). In addition, it has high
scalability and parallelism. The method is also capable of detecting some
indoor non-structural objects. Experimental results demonstrate that the
proposed method has a high success rate in object recognition [1].

3. Endo, Y., Sato, K., Yamashita, A., & Matsubayashi, K. (2017).
Indoor positioning and obstacle detection for visually impaired
navigation system based on LSD-SLAM.

This paper presents a 6-degree-of-freedom (DOF) pose estimation (PE)
method and an indoor wayfinding system based on the method for the
visually impaired. The PE method involves two graph-based simultaneous
localization and mapping (SLAM) processes to reduce the accumulative
pose error of the device. In the first step, the floor plane is extracted from
the 3D camera's point cloud and added as a landmark node into the graph
for 6-DOF SLAM to reduce roll, pitch, and Z errors [2].

In the second step, the wall lines are extracted and incorporated into the
graph for 3-DOF SLAM to reduce X, Y, and yaw errors. The method
reduces the 6-DOF pose error and results in a more accurate pose with less
computational time than the state-of-the-art planar SLAM methods. Based
on the PE method, a wayfinding system is developed for navigating a
visually impaired person in an indoor environment.

4. Jianhe Yuan, Wenming Cao, Zhihai He, Zhi Zhang, Zhiquan He (2018),
‘Fast Deep Neural Networks With Knowledge Guided Training and
Predicted Regions of Interests for Real-Time Video Object Detection’,
IEEE Access (Volume: 6)

It has been recognized that deeper and wider neural networks are
continuously advancing the state-of-the-art performance of various computer
vision and machine learning tasks.[3] However, they often require large sets
of labeled data for effective training and suffer from extremely high
computational complexity, preventing them from being deployed in real-time
systems, for example vehicle object detection from vehicle cameras for
assisted driving. In this paper, we aim to develop a fast deep neural network
for real-time video object detection by exploring the ideas of knowledge-guided
training and predicted regions of interest. Specifically, we develop a
new framework for training deep neural networks on datasets with limited
labeled samples using cross-network knowledge projection, which is able to
improve the network performance while reducing the overall computational
complexity significantly. A large pre-trained teacher network is used to
observe samples from the training data.

A projection matrix is learned to project this teacher-level knowledge and
its visual representations from an intermediate layer of the teacher network
to an intermediate layer of a thinner and faster student network to guide and
regulate the training process. To further speed up the network, we propose to
train a low-complexity object detector using traditional machine learning
methods, such as a support vector machine. Using this low-complexity object
detector, we identify the regions of interest that contain the target objects
with high confidence.

CHAPTER 3
SYSTEM OVERVIEW

3.1 EXISTING SYSTEM

The existing system proposes a robotic navigation aid for the visually
impaired based on 3-D object recognition. Visual feature matching and
visual shape matching have been employed for single-view object
detection. The model classifies a 3D scene's planar patches by using both
local features and geometric context. This system requires an additional
instrument to help visually impaired people.

3.1.1 Drawbacks

● This model is restricted to indoor use only.
● This model mainly focuses on non-structural objects.
● This model uses a robotic navigation aid.
● This model does not inform the user of the location of the object.

3.2 PROPOSED SYSTEM

This model is used to assist visually impaired people in their daily
routine through a mobile application. It helps the user by locating objects
in front of them. The technologies used are image processing and machine
learning. The models are trained using Google's TensorFlow framework.
They are mapped with the images from real-time computer vision.
The output of the above-mentioned process is delivered through a
talkback (speech) feature.

3.3 FEASIBILITY STUDY


The feasibility of the project is analyzed in this phase and a business
proposal is put forth with a general plan for the project and some cost
estimates. During system analysis the feasibility study of the proposed
system is carried out. This is to ensure that the proposed system is
not a burden to the company. For feasibility analysis, some understanding
of the major requirements of the system is essential.
3.3.1 Technical Feasibility
This study is carried out to check the technical feasibility,
that is, the technical requirements of the system. Any system
developed must not place high demands on the available technical
resources, as this would in turn place high demands on the client.
The developed system must have modest requirements, as only
minimal or no changes are required for implementing this system.
3.3.2 Operational Feasibility

This study is carried out to check and measure how well a
proposed system solves the problems, takes advantage of the
opportunities identified during scope definition, and satisfies
the requirements identified in the requirements analysis phase of
system development.

3.3.3 Economic Feasibility

This aspect of the study checks the level of acceptance of the
system by the user. This includes the process of training the users
to use the system efficiently. The user must not feel threatened by
the system, but must instead accept it as a necessity.

3.4 SUMMARY
This chapter describes the existing system and its disadvantages,
which lead to the proposed system. It also discusses the various
feasibility studies of the system.

CHAPTER 4

SYSTEM REQUIREMENTS

4.1 HARDWARE REQUIREMENTS

The hardware requirements may serve as the basis for a contract
for the implementation of the system and should therefore be a complete
and consistent specification that engineers use as the starting point for
the system design.

❏ Android Phone : API 21 or higher (Lollipop)

❏ Laptop/Desktop : i3 or Higher with min 8GB RAM

4.2 SOFTWARE REQUIREMENTS

The software requirements document is a specification of the
system. It should include both a definition and a specification of
requirements. It describes what the system should do rather than how it
should do it. The software requirements provide a basis for creating the
software requirement specification.

Software Requirement

❏ Operating system : Windows 7/8/10, macOS, Linux

❏ Front End : XML

❏ Back End : Java

❏ API : TensorFlow Object Detection API

❏ IDE : Android Studio IDE

4.3 SUMMARY
This chapter describes the hardware and software requirements of
the project.

CHAPTER 5

SYSTEM DESIGN

5.1 ARCHITECTURE DIAGRAM

The overall architecture diagram describes all modules of the
project. Fig 5.1 explains the flow of the project.

Fig 5.1 Architecture Diagram

5.2 MODULES DESCRIPTION

The project is divided into the following two modules:

1. Image Recognition and Object Detection
2. Text to Speech Conversion

5.2.1 Image Recognition and Object Detection

This module captures the real-time image through the mobile
camera. The objects in front of the blind person while walking are the
input data. This module is shown in Fig. 5.2.

Fig 5.2 Module Diagram 1
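
A simplified sketch of how a camera frame could flow from this module to the speech module is given below. The runInBackground(), toSpeech() and readyForNextImage() methods come from CameraActivity in Appendix 1; the Classifier detector with its recognizeImage() method and the getCurrentFrameBitmap() helper are hypothetical placeholders for the TensorFlow detection model and the frame-conversion step.

import android.graphics.Bitmap;
import java.util.List;

// Hypothetical subclass sketch: Module 1 (detection) feeding Module 2 (speech).
// The other abstract methods of CameraActivity are omitted for brevity.
public class DetectorActivity extends CameraActivity {

    private Classifier detector;   // assumed to be initialised with the trained model

    @Override
    protected void processImage() {
        final Bitmap rgbFrame = getCurrentFrameBitmap();   // hypothetical helper
        runInBackground(new Runnable() {
            @Override
            public void run() {
                // Module 1: run the detection model on the current camera frame.
                final List<Classifier.Recognition> results = detector.recognizeImage(rgbFrame);
                // Module 2: announce the detected labels and their positions.
                toSpeech(results);
                // Release the frame so the next preview image can be processed.
                readyForNextImage();
            }
        });
    }
}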

5.2.2 Text To Speech Conversion


This module converts the text data produced by the mobile
application into speech. This helps the visually impaired person to
interact with the application and know about their surroundings. This
module is shown in Fig. 5.3.

Fig 5.3 Module Diagram 2
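
The conversion itself relies on Android's built-in TextToSpeech engine, using the same API calls as CameraActivity.java in Appendix 1. A minimal sketch is shown below; it assumes it is placed inside an Activity, and the spoken sentence is only an example, since in the application the string is assembled from the detected labels and their positions.

import android.speech.tts.TextToSpeech;

// Minimal text-to-speech sketch (inside an Activity), mirroring Appendix 1.
private TextToSpeech textToSpeech;

private void initTextToSpeech() {
    textToSpeech = new TextToSpeech(this, new TextToSpeech.OnInitListener() {
        @Override
        public void onInit(int status) {
            if (status == TextToSpeech.SUCCESS) {
                // Example announcement only; the real string is built in speak().
                textToSpeech.speak("person in front of you detected.",
                        TextToSpeech.QUEUE_FLUSH, null);
            }
        }
    });
}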

5.3 UML DIAGRAMS

Design deals with the various UML (Unified Modeling Language)
diagrams for the implementation of the project. Design is a meaningful
engineering representation of a thing that is to be built. Software design is a
process through which the requirements are translated into a representation of
the software. Design is the place where quality is rendered in software
engineering. UML has many types of diagrams, which are divided into two
categories: some types represent structural information, and the rest
represent general types of behavior, including a few that represent different
aspects of interactions.

5.3.1 Use case Diagram

A use case is a set of scenarios that describes an interaction
between a user and a system. A use case diagram displays the
relationship among actors and use cases. The two main
components of a use case diagram are use cases and actors. An
actor represents a user or another system that will interact with
the system being modeled. The visually impaired user in Fig 5.4 is the
actor.

Figure 5.4 Use case Diagram

5.3.2 Class Diagram

A class diagram represents the object orientation of a system.
A class is drawn as a rectangular box with three
compartments separated by horizontal lines. The top
compartment holds the class name, the middle compartment
holds the attributes, and the bottom compartment holds the list of
operations. There are three major classes as defined in Fig 5.5.

Figure 5.5 Class Diagram

5.3.3 Sequence Diagram

A sequence diagram shows an interaction arranged in time
sequence: it shows the objects participating in the interaction and
the messages they exchange, arranged in time order. The vertical
dimension represents time and the horizontal dimension represents
the objects. Fig 5.6 shows the major sequence of the app.

Figure 5.6 Sequence Diagram

5.3.4 Activity Diagram

Activity diagrams are graphical representations of workflows
of stepwise activities and actions, with support for choice, iteration
and concurrency. An activity diagram can be used to describe the
business and operational step-by-step workflows of components in
a system. It consists of an initial node, an activity final node and
the activities in between. The start and end of the activities
are shown in Fig 5.7.

Figure 5.7 Activity Diagram for Visual assistant.

5.4 SUMMARY
This chapter gives the overall architecture of this project. It also
describes each module that the overall architecture is composed of. The
flow of the project is depicted by various UML diagrams.

CHAPTER 6

SYSTEM IMPLEMENTATION

6.1 SOFTWARE DESCRIPTION

6.1.1 Android SDK

The Android software development kit (SDK) includes a
comprehensive set of development tools. These include a debugger,
libraries, a handset emulator based on QEMU, documentation, sample
code, and tutorials. Currently supported development platforms include
computers running Linux (any modern desktop Linux distribution),
Mac OS X 10.5.8 or later, and Windows 7 or later. As of March 2015,
the SDK is not available on Android itself, but software development is
possible by using specialized Android applications.

● Java

Obstacles to development include the fact that Android does not use
established Java standards, that is, Java SE and ME. This prevents
compatibility between Java applications written for those platforms and
those written for the Android platform. Android reuses the Java
language syntax and semantics, but it does not provide the full class
libraries and APIs bundled with Java SE or ME.

While most Android applications are written in a Java-like language,
there are some differences between the Java API and the Android API,
and Android does not run Java bytecode on a traditional Java virtual
machine (JVM), but instead on a Dalvik virtual machine in older
versions of Android and an Android Runtime (ART) in newer
versions, which compiles the same code that Dalvik runs into Executable
and Linkable Format (ELF) executables containing machine code.

Java bytecode in Java Archive (JAR) files is not executed by Android
devices. Instead, Java classes are compiled into a proprietary bytecode
format and run on Dalvik (or a compiled version thereof with the newer
ART), a specialized virtual machine (VM) designed for Android.
Unlike Java VMs, which are stack machines (stack-based architecture),
the Dalvik VM is a register machine (register-based architecture).

● XML

Extensible Markup Language (XML) is a markup language that
defines a set of rules for encoding documents in a format that is both
human-readable and machine-readable. The W3C's XML 1.0
Specification and several other related specifications, all of them free
open standards, define XML.

The design goals of XML emphasize simplicity, generality, and
usability across the Internet. It is a textual data format with strong
support via Unicode for different human languages. Although the
design of XML focuses on documents, the language is widely used for
the representation of arbitrary data structures such as those used in web
services.

Several schema systems exist to aid in the definition of XML-based
languages, while programmers have developed many application
programming interfaces (APIs) to aid the processing of XML data.

6.1.2 TensorFlow

TensorFlow is Google's open-source machine learning
framework for dataflow programming across a range of tasks.
Nodes in the graph represent mathematical operations, while the
graph edges represent the multi-dimensional data arrays (tensors)
communicated between them.[9]

TensorFlow provides a collection of workflows to develop
and train models using Python, JavaScript, or Swift, and to easily
deploy them in the cloud, in the browser, or on-device no matter what
language you use. Tensors are just multidimensional arrays, an
extension of two-dimensional tables to data with a higher dimension.
There are many features of TensorFlow that make it appropriate
for deep learning.

● Object Detection API

Creating accurate machine learning models capable of
localizing and identifying multiple objects in a single image
remains a core challenge in computer vision. The TensorFlow
Object Detection API is an open-source framework built on top of
TensorFlow that makes it easy to construct, train and deploy object
detection models.[8] Google reports that it has found this codebase
useful for its own computer vision needs. The API is a very
powerful tool that can quickly enable anyone to build and deploy
powerful image recognition software.
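
As an illustration, a detection graph exported from the Object Detection API can be run on Android through the TensorFlow Android inference library. The sketch below is an assumed, simplified example rather than the project's exact implementation: the org.tensorflow:tensorflow-android dependency, the model file name and the input size are placeholders, while the tensor names follow the standard graphs exported by the API.

import android.content.res.AssetManager;
import org.tensorflow.contrib.android.TensorFlowInferenceInterface;

// Simplified sketch of running an exported SSD detection graph on Android.
// MODEL_FILE and INPUT_SIZE are assumed placeholder values.
public class DetectorSketch {
    private static final String MODEL_FILE = "file:///android_asset/ssd_mobilenet_v1.pb";
    private static final int INPUT_SIZE = 300;      // SSD MobileNet input resolution
    private static final int MAX_RESULTS = 100;

    private final TensorFlowInferenceInterface inference;

    public DetectorSketch(AssetManager assets) {
        inference = new TensorFlowInferenceInterface(assets, MODEL_FILE);
    }

    // pixels: RGB bytes of the resized camera frame (INPUT_SIZE x INPUT_SIZE x 3).
    public float[] detectScores(byte[] pixels) {
        // Tensor names used by graphs exported from the Object Detection API.
        inference.feed("image_tensor", pixels, 1, INPUT_SIZE, INPUT_SIZE, 3);
        inference.run(new String[] {
                "detection_boxes", "detection_scores",
                "detection_classes", "num_detections"});

        float[] scores = new float[MAX_RESULTS];
        inference.fetch("detection_scores", scores);   // one confidence per detection
        return scores;
    }
}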

6.2 SUMMARY

This chapter gives an overall idea about the system implementation
and the libraries being used in the project. An explanation of every
API is given along with the API's name.

CHAPTER 7

SYSTEM TESTING

7.1 AIM OF TESTING

The purpose of testing is to discover errors. Testing is the process
of trying to discover every conceivable fault or weakness in a work
product. It provides a way to check the functionality of components, sub-assemblies,
assemblies and/or a finished product. It is the process of
exercising software with the intent of ensuring that the software system
meets its requirements and user expectations and does not fail in an
unacceptable manner. There are various types of test, and each test type
addresses a specific testing requirement.

7.2 TYPES OF TESTS

7.2.1 UNIT TESTING

Unit testing involves the design of test cases that validate
that the internal program logic is functioning properly, and that
program inputs produce valid outputs. All decision branches and
internal code flow should be validated. It is the testing of
individual software units of the application and is done after the
completion of an individual unit, before integration. This is
structural testing that relies on knowledge of the unit's construction and is
invasive. Unit tests perform basic tests at component level and test
a specific business process, application, and/or system
configuration. Unit tests ensure that each unique path of a business
process performs accurately to the documented specifications and
contains clearly defined inputs and expected results.

Table 7.1 Unit Testing - Module 1

Input: Video stream - a classroom scene
System Constraints: Camera must be working
Expected Output: Multiple objects must be detected (bench, person, board)

Input: Video stream - people
System Constraints: Camera must be working. Camera view should not be obstructed.
Expected Output: Person label must be detected.

Input: Video stream - laptop
System Constraints: Camera must be working
Expected Output: Sometimes detected as monitor or TV screen.

Table 7.2 Unit Testing - Module 2

Input: Label of the object detected
System Constraints: Speaker must be working or headphones must be inserted
Expected Output: The name of the label and the position of the object (right, left or in front of the camera) is given as audio.

Input: Labels of multiple objects detected as a string
System Constraints: Speaker must be working or headphones must be inserted
Expected Output: The name of each label and the position of the object (right, left or in front of the camera) is given as audio.

Input: No label
System Constraints: Speaker must be working or headphones must be connected.
Expected Output: No voice

7.2.2 INTEGRATION TESTING

Integration tests are designed to test integrated software
components to determine if they actually run as one program.
Testing is event driven and is more concerned with the basic
outcome of screens or fields. Integration tests demonstrate that
although the components were individually satisfactory, as shown
by successful unit testing, the combination of components is
correct and consistent. Integration testing is specifically aimed at
exposing the problems that arise from the combination of
components.

Table 7.3 Integration Testing

Input: Video stream - a classroom scene
System Constraints: Camera must be working
Expected Output: Multiple objects must be detected (bench, person, board) and audio output given.

Input: Video stream - people
System Constraints: Camera must be working. Camera view should not be obstructed.
Expected Output: Person label must be detected and audio output is received.

Input: Video stream - window
System Constraints: Camera must be working
Expected Output: Nothing is detected.

7.2.3 SYSTEM TESTING

System testing ensures that the entire integrated software system
meets requirements. It tests a configuration to ensure known and
predictable results. An example of system testing is the configuration-oriented
system integration test. System testing is based on process
descriptions and flows, emphasizing pre-driven process links and
integration points.

7.3 SUMMARY
In this chapter, the various tests performed are discussed. The
description and purpose of each test are also given above.

CHAPTER 8

CONCLUSION AND FUTURE ENHANCEMENT

8.1 CONCLUSION

In conclusion, we have learned many things through our research on
the visually impaired. Of all these things, the most important is an
understanding of the obstacles and experiences that people who are
visually impaired go through. When we first thought of the idea for this
project, we did not believe there was much to being visually impaired.
But now we realize that being visually impaired involves many
complicated issues that people must deal with every day, such as
financial issues, emotional issues, and medications. Every day is a
struggle for them. This project aims to help them move with ease in their
everyday life.

8.2 FUTURE ENHANCEMENT

● Facial recognition
● Processing speed of images
● Google maps navigation

8.3 SUMMARY
In this chapter, the conclusion describes the improvements achieved by
this project, and the future enhancements are listed.

APPENDIX 1

SAMPLE CODE

CameraActivity.java

/*
Main Function
*/

public abstract class CameraActivity extends Activity implements


OnImageAvailableListener {

/*
Initializing Variables
*/
private static final Logger LOGGER = new Logger();
private static final int PERMISSIONS_REQUEST = 1;
private static final String PERMISSION_CAMERA =
Manifest.permission.CAMERA;
private static final String PERMISSION_STORAGE =
Manifest.permission.WRITE_EXTERNAL_STORAGE;

private Handler handler;


private HandlerThread handlerThread;
private boolean isProcessingFrame = false;
private byte[][] yuvBytes = new byte[3][];
private int[] rgbBytes = null;
private int yRowStride;

protected int previewWidth = 0;


protected int previewHeight = 0;
private Runnable postInferenceCallback;
private Runnable imageConverter;
private TextToSpeech textToSpeech;

@Override
protected void onCreate(final Bundle savedInstanceState) {
LOGGER.d("onCreate " + this);
super.onCreate(null);
//Set Flag to keep the screen on.
getWindow().addFlags(WindowManager.LayoutParams.FLAG_KEEP_SCREEN_ON);

setContentView(R.layout.activity_camera);

//Check Camera and Storage Permissions


if (hasPermission()) {
setFragment();
} else {
requestPermission();
}

//Initialize Text To Speech Module


this.textToSpeech = new TextToSpeech(this, new
TextToSpeech.OnInitListener() {
@Override
public void onInit(int status) {
if (status == TextToSpeech.SUCCESS) {
LOGGER.i("onCreate", "TextToSpeech is initialised");
} else {
LOGGER.e("onCreate", "Cannot initialise text to speech!");
}
}
});
}

protected int[] getRgbBytes() {
imageConverter.run();
return rgbBytes;
}
protected byte[] getLuminance() {

return yuvBytes[0];
}

/**
* Callback for Camera2 API
* Initialize Module 1 for Object Detection
*/
@Override
public void onImageAvailable(final ImageReader reader) {
//We need to wait until we have some size from onPreviewSizeChosen
if (previewWidth == 0 || previewHeight == 0) {
return;
}
if (rgbBytes == null) {
rgbBytes = new int[previewWidth * previewHeight];
}
try {
final Image image = reader.acquireLatestImage();

if (image == null) {
return;
}

if (isProcessingFrame) {
image.close();
return;
}
isProcessingFrame = true;
Trace.beginSection("imageAvailable");
final Plane[] planes = image.getPlanes();
fillBytes(planes, yuvBytes);
yRowStride = planes[0].getRowStride();
final int uvRowStride = planes[1].getRowStride();
final int uvPixelStride = planes[1].getPixelStride();

imageConverter =
new Runnable() {

@Override
public void run() {
ImageUtils.convertYUV420ToARGB8888(
yuvBytes[0],
yuvBytes[1],
yuvBytes[2],
previewWidth,
previewHeight,
yRowStride,
uvRowStride,
uvPixelStride,
rgbBytes);
}
};

//Callback to the object detection method.


postInferenceCallback =
new Runnable() {
@Override
public void run() {
image.close();
isProcessingFrame = false;
}
};

processImage();
} catch (final Exception e) {
LOGGER.e(e, "Exception!");
Trace.endSection();
return;
}
Trace.endSection();
}

//Logging various activities while detection of objects.


@Override
public synchronized void onStart() {

LOGGER.d("onStart " + this);
super.onStart(); }
@Override
public synchronized void onResume() {
LOGGER.d("onResume " + this);
super.onResume();

handlerThread = new HandlerThread("inference");


handlerThread.start();
handler = new Handler(handlerThread.getLooper());
}

@Override
public synchronized void onPause() {
LOGGER.d("onPause " + this);

if (!isFinishing()) {
LOGGER.d("Requesting finish");
finish();
}

handlerThread.quitSafely();
try {
handlerThread.join();
handlerThread = null;
handler = null;
} catch (final InterruptedException e) {
LOGGER.e(e, "Exception!");
}

//Stop Text To Speech Module.


if (textToSpeech != null) {
textToSpeech.stop();
textToSpeech.shutdown();
}

super.onPause();

}

@Override
public synchronized void onStop() {
LOGGER.d("onStop " + this);
super.onStop();
}

@Override
public synchronized void onDestroy() {
LOGGER.d("onDestroy " + this);
super.onDestroy();
}

protected synchronized void runInBackground(final Runnable r) {


if (handler != null) {
handler.post(r);
}
}

//Get permission from user for Camera and Storage.


@Override
public void onRequestPermissionsResult(
final int requestCode, final String[] permissions, final int[] grantResults) {
if (requestCode == PERMISSIONS_REQUEST) {
if (grantResults.length > 0
&& grantResults[0] == PackageManager.PERMISSION_GRANTED
&& grantResults[1] == PackageManager.PERMISSION_GRANTED) {
setFragment();
} else {
requestPermission();
}
}
}

private boolean hasPermission() {


if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {

return checkSelfPermission(PERMISSION_CAMERA) ==
PackageManager.PERMISSION_GRANTED &&
checkSelfPermission(PERMISSION_STORAGE) ==
PackageManager.PERMISSION_GRANTED;
} else {
return true;
}
}

private void requestPermission() {


if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
if (shouldShowRequestPermissionRationale(PERMISSION_CAMERA) ||
shouldShowRequestPermissionRationale(PERMISSION_STORAGE)) {
Toast.makeText(CameraActivity.this,
"Camera AND storage permission are required for this demo",
Toast.LENGTH_LONG).show();
}
requestPermissions(new String[]{PERMISSION_CAMERA,
PERMISSION_STORAGE}, PERMISSIONS_REQUEST);
}
}

// Returns true if the device supports the required hardware level, or better.
private boolean isHardwareLevelSupported(
CameraCharacteristics characteristics, int requiredLevel) {
int deviceLevel = characteristics.get(CameraCharacteristics.INFO_SUPPORTED_HARDWARE_LEVEL);
if (deviceLevel == CameraCharacteristics.INFO_SUPPORTED_HARDWARE_LEVEL_LEGACY) {
return requiredLevel == deviceLevel;
}
// deviceLevel is not LEGACY, can use numerical sort
return requiredLevel <= deviceLevel;
}

private String chooseCamera() {
final CameraManager manager = (CameraManager)
getSystemService(Context.CAMERA_SERVICE);
try {
for (final String cameraId : manager.getCameraIdList()) {
final CameraCharacteristics characteristics =
manager.getCameraCharacteristics(cameraId);

// We don't use a front facing camera in this sample.


final Integer facing =
characteristics.get(CameraCharacteristics.LENS_FACING);
if (facing != null && facing ==
CameraCharacteristics.LENS_FACING_FRONT) {
continue;
}

final StreamConfigurationMap map =
characteristics.get(CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP);

if (map == null) {
continue;
}

boolean useCamera2API = isHardwareLevelSupported(characteristics,
CameraCharacteristics.INFO_SUPPORTED_HARDWARE_LEVEL_FULL);
LOGGER.i("Camera API lv2?: %s", useCamera2API);
return cameraId;
}
} catch (CameraAccessException e) {
LOGGER.e(e, "Not allowed to access camera");
}

return null;
}

protected void setFragment() {

String cameraId = chooseCamera();

CameraConnectionFragment camera2Fragment =
CameraConnectionFragment.newInstance(
new CameraConnectionFragment.ConnectionCallback() {
@Override
public void onPreviewSizeChosen(final Size size, final int rotation) {
previewHeight = size.getHeight();
previewWidth = size.getWidth();
CameraActivity.this.onPreviewSizeChosen(size, rotation);
}},
this,
getLayoutId(),
getDesiredPreviewFrameSize());

camera2Fragment.setCamera(cameraId);

getFragmentManager()
.beginTransaction()
.replace(R.id.container, camera2Fragment)
.commit();
}

protected void fillBytes(final Plane[] planes, final byte[][] yuvBytes) {


// Because of the variable row stride it's not possible to know in
// advance the actual necessary dimensions of the yuv or xyz planes.
for (int i = 0; i < planes.length; ++i) {
final ByteBuffer buffer = planes[i].getBuffer();
if (yuvBytes[i] == null) {
LOGGER.d("Initializing buffer %d at size %d", i, buffer.capacity());
yuvBytes[i] = new byte[buffer.capacity()];}
buffer.get(yuvBytes[i]);
}}

protected void readyForNextImage() {


if (postInferenceCallback != null) {
postInferenceCallback.run();

}}

protected int getScreenOrientation() {


switch (getWindowManager().getDefaultDisplay().getRotation()) {
case Surface.ROTATION_270:
return 270;
case Surface.ROTATION_180:
return 180;
case Surface.ROTATION_90:
return 90;
default:
return 0;
}}
//Define the list of classifiers present.
private List<Classifier.Recognition> currentRecognitions;

protected void toSpeech(List<Classifier.Recognition> recognitions) {


if (recognitions.isEmpty() || textToSpeech.isSpeaking()) {
currentRecognitions = Collections.emptyList();
return;
}

if (currentRecognitions != null) {

// Ignore if current and new are same.


if (currentRecognitions.equals(recognitions)) {
return;
}
final Set<Classifier.Recognition> intersection = new HashSet<>(recognitions);
intersection.retainAll(currentRecognitions);

// Ignore if new is sub set of the current


if (intersection.equals(recognitions)) {
return;
}}

currentRecognitions = recognitions;

speak();
}

//Create the string to be spoken through the speaker or headphone.


private void speak() {

final double rightStart = previewWidth / 2 - 0.10 * previewWidth;
final double rightFinish = previewWidth;
final double leftStart = 0;
final double leftFinish = previewWidth / 2 + 0.10 * previewWidth;
final double previewArea = previewWidth * previewHeight;

StringBuilder stringBuilder = new StringBuilder();

for (int i = 0; i < currentRecognitions.size(); i++) {


Classifier.Recognition recognition = currentRecognitions.get(i);
stringBuilder.append(recognition.getTitle());

float start = recognition.getLocation().top;


float end = recognition.getLocation().bottom;
double objArea = recognition.getLocation().width() *
recognition.getLocation().height();

if (objArea > previewArea / 2) {


stringBuilder.append(" in front of you ");
} else {
if (start > leftStart && end < leftFinish) {
stringBuilder.append(" on the right ");
} else if (start > rightStart && end < rightFinish) {
stringBuilder.append(" on the left ");
} else {
stringBuilder.append(" in front of you ");
}}

if (i + 1 < currentRecognitions.size()) {
stringBuilder.append(" and ");

}}
stringBuilder.append(" detected.");

textToSpeech.speak(stringBuilder.toString(), TextToSpeech.QUEUE_FLUSH,
null);
}

protected abstract void processImage();


protected abstract void onPreviewSizeChosen(final Size size, final int rotation);
protected abstract int getLayoutId();
protected abstract Size getDesiredPreviewFrameSize();
}

APPENDIX - 2

SAMPLE SCREENSHOTS

In this output the motorcycle is detected.

Fig A2.1 Vehicle

In this output multiple objects are detected.

Fig A2.2 Multiple objects

In this output, two persons are detected at a certain distance in front of
the user.

Fig A2.3 Person

REFERENCES

1. Smart Vision using Machine Learning for Blind, International Journal of
Advanced Science and Technology, Vol. 29, No. 5, (2020).

2. Endo, Y., Sato, K., Yamashita, A., & Matsubayashi, K. (2017). Indoor
positioning and obstacle detection for visually impaired navigation system
based on LSD-SLAM.

3. Cang Ye and Xiangfei Qian (2018), ‘3-D Object Recognition of a Robotic
Navigation Aid for the Visually Impaired’, IEEE Transactions on Neural
Systems and Rehabilitation Engineering, Volume: 26, Issue: 2, pp. 441-450.

4. Jianhe Yuan, Wenming Cao, Zhihai He, Zhi Zhang, Zhiquan He (2018),
‘Fast Deep Neural Networks With Knowledge Guided Training and
Predicted Regions of Interests for Real-Time Video Object Detection’,
IEEE Access (Volume: 6), pp. 8990-8999.

WEB REFERENCES

5. https://www.expertsystem.com/machine-learning-definition/
6. https://en.wikipedia.org/wiki/TensorFlow
7. https://en.wikipedia.org/wiki/Android_software_development
8. https://github.com/tensorflow/models/tree/master/research/object_detection
9. https://www.tensorflow.org/
