5, Avenue Taha Hussein, Tunis. Tel.: 71 496 066
B.P. 56, Bab Menara 1008. Fax: 71 391 166
Acknowledgments
educational pursuit.
Additionally, we extend our sincere appreciation to all those who have played a part in this work. We are thankful for our supervisor's openness to our ideas and for the discerning eye he kept on our progress.
Our heartfelt appreciation also extends to everyone who has supported us along
our academic journey. We extend special acknowledgment to the members of
the jury who generously dedicated their time and expertise to evaluate our
work, providing invaluable insights and contributing to our growth.
ENSIT has been instrumental in shaping our academic and personal growth,
and we are deeply grateful for the opportunities it has provided us.
Table of Contents

General Introduction
1 General Framework
1.1 Introduction
1.2 Presentation of the subject
1.3 Study of the existing situation
1.4 Proposed solution
1.5 Project management method
1.6 Conclusion
2 Analysis and Design
2.1 Introduction
2.2 Actors' Identification
2.4 Design
2.5 Conclusion
3 Deep Learning Models and Libraries
3.1 Introduction
3.2 MobileNet V2 model
3.3 OpenCV library
3.4 Conclusion
4 Realization and Implementation
4.1 Introduction
4.4 Conclusion
General Conclusion
Netography
List of Figures

4.12 Some of the detected objects in Arabic
4.13 Recognized text
4.14 Recognized text
4.15 Recognized text
4.16 Recognized medication leaflet

List of Tables
General Introduction
Visual impairment can significantly impact one’s ability to interact with the world effectively.
Tasks that are routine and effortless for sighted individuals, such as identifying objects, reading texts,
or navigating unfamiliar places, can pose substantial challenges for those with visual impairments.
This underscores the importance of developing assistive technologies that cater specifically to the
needs of visually impaired individuals. Innovation in this area is not just a matter of convenience;
it is a moral imperative and a societal responsibility. Engineers and technology developers play a
crucial role in advancing solutions that enhance accessibility and inclusion for all.
As software engineering students, we have chosen to develop a specialized mobile application
dedicated to assisting visually impaired individuals in order to promote inclusion and accessibility.
This decision stems from our belief in the transformative power of technology to create positive social
impact. Our application aims to leverage artificial intelligence algorithms to provide real-time object
identification for users with visual impairments, empowering them to navigate their surroundings
more independently. By focusing on the needs of this underserved community, we aspire to contribute
towards a more inclusive society where everyone, regardless of ability, has equal access to the tools
and resources needed to thrive. Through this project, we are committed to bridging the gap between
technology and accessibility, fostering empathy-driven innovation, and advocating for the rights and
empowerment of individuals with disabilities. We view this endeavor not just as a technical challenge,
but as a meaningful opportunity to make a tangible difference in the lives of others and promote a
culture of inclusivity within the field of software engineering.
This report summarizes the steps in the implementation of this system. It is structured into
four chapters as follows:
— The first chapter presents the general framework of the project.
— The second chapter presents the functional analysis and design of our application in order to
specify the requirements.
— The third chapter presents the deep learning models and libraries used.
— The fourth chapter presents the realization and implementation of our solution.
We conclude this report with a general conclusion and some potential perspectives that can improve
our solution.
Chapter 1
General Framework
Plan
1 Introduction
2 Presentation of the subject
3 Study of the existing situation
4 Proposed solution
5 Project management method
6 Conclusion
1.1 Introduction
The study of a project is a strategic approach that gives us an overall vision of it and helps ensure that it runs smoothly. In this first chapter, we present the context of the project, a study of the existing situation, the proposed solution, and the project management method that we followed.
1.2 Presentation of the subject

Object recognition for visually impaired assistance revolves around the development and
implementation of technologies aimed at helping individuals with visual impairments identify objects
in their surroundings. This area of study is critical due to the challenges faced by visually impaired
individuals in recognizing and interacting with objects independently. By leveraging computer vision
and artificial intelligence techniques, researchers and developers are exploring methods to enable
real-time object recognition through devices like smartphones or wearable technology.
In our project, we are tackling the challenge of improving object recognition for visually
impaired individuals without the need for specialized hardware. Our approach involves leveraging
computer vision and artificial intelligence techniques to enable real-time object identification using
smartphones or other accessible devices. The goal is to develop an application that harnesses the
power of these technologies to assist users in identifying objects in their environment efficiently
and accurately. Additionally, our application includes text detection capabilities, allowing users to
identify text in their surroundings. Through the use of innovative algorithms and user-friendly
interfaces, we aim to create a tool that enhances accessibility and promotes independence for
individuals with visual impairments, ultimately empowering them to interact more confidently with
their surroundings.
1.3 Study of the existing situation

In examining the existing landscape of applications in this field, it is evident that there are
several offerings, although not all are specifically designed to cater to the needs of visually impaired
individuals. While these applications vary in their focus and functionality, they generally lack the
comprehensive adaptation required to effectively support users with visual impairments.
1.4 Proposed solution

Our project is distinct in its aim to address this gap by focusing on improving object
recognition and text detection specifically for the visually impaired, without relying on specialized
hardware. Through the utilization of computer vision and machine learning techniques, we intend
to develop a user-friendly application that can be accessed on commonly available devices like
smartphones. This approach seeks to empower visually impaired individuals by providing them
with efficient and accurate object identification and text detection capabilities, ultimately promoting
greater independence and confidence in navigating their surroundings.
1.5 Project management method

To manage our project, we followed the CRISP-DM methodology, whose phases are the following:
• Business understanding: focuses on understanding the objectives and requirements of the project from a business perspective.
• Data understanding: focuses on identifying, collecting, and analyzing the data sets that can help accomplish the project goals.
• Data preparation: often referred to as "data munging"; prepares the final data set(s) for modeling. It has five tasks: select, clean, construct, integrate, and format data.
• Modeling: builds and assesses various models based on several different modeling techniques.
• Evaluation: looks more broadly at which model best meets the business needs and at what should be done next.
• Deployment: the final stage of the process. When the developed model is ready, it is deployed and integrated into daily use; the objectives of this stage are deployment planning, monitoring, and maintenance.[1]
1.6 Conclusion
Throughout this chapter, we have presented the general context of our project as well as the
study of the existing situation, the proposed solution and the project management method. The
following chapter will be devoted to the analysis and design.
Chapter 2
Analysis and Design
Plan
1 Introduction
2 Actors' Identification
4 Design
5 Conclusion
2.1 Introduction
In this chapter, we will analyze and specify the business requirements. Next, we will present
the use case diagram along with the textual description and the detailed sequence diagram. Finally,
we will develop the overall schema of the entire application accompanied by an explanation of each
step.
2.2 Actors' Identification

This phase consists of highlighting the application's context in order to address specific points
of the specifications and to clearly determine the functionalities. We will identify the actors in our
application.
An actor represents an abstract role performed by an external entity, such as a person, process, or
another system, that interacts with the system being designed. In our application, we have identified
a single actor:
— User: the visually impaired user, who can access text recognition and object detection.
Functional needs express the services that the mobile application must provide in response to user requests in order to meet their expectations. Our application offers the following functionalities (a sketch of the voice-assistance step follows the list):
— Text Recognition: accurately detecting and recognizing text from various sources such as documents, signs, labels, and screens.
— Object Detection: accurately detecting and identifying objects within the user's environment using computer vision, with results announced in the phone's default language (English, French, or Arabic).
— Voice Assistance: offering real-time audio feedback by converting recognized texts and objects into speech, respecting the default language settings on the user's phone.
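As a concrete illustration of the voice-assistance requirement, here is a minimal Kotlin sketch, assuming Android's standard TextToSpeech API; the class and helper names are ours, not taken from the report:

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import java.util.Locale

// Speaks recognized labels and texts in the phone's default language,
// falling back to English when that language is not supported.
class VoiceAssistant(context: Context) : TextToSpeech.OnInitListener {
    private val tts = TextToSpeech(context, this)

    override fun onInit(status: Int) {
        if (status != TextToSpeech.SUCCESS) return
        val result = tts.setLanguage(Locale.getDefault())
        if (result == TextToSpeech.LANG_MISSING_DATA ||
            result == TextToSpeech.LANG_NOT_SUPPORTED
        ) {
            tts.language = Locale.ENGLISH
        }
    }

    fun speak(text: String) {
        // QUEUE_FLUSH interrupts the previous utterance so feedback stays real-time.
        tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, "feedback")
    }

    fun shutdown() = tts.shutdown()
}
```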
The non-functional needs express the internal requirements that the mobile application must satisfy, such as constraints related to the environment and to the implementation. Our mobile application must have the following characteristics:
— Accessibility: the application must embody accessibility, catering to the visually impaired through features and designs that facilitate seamless navigation and interaction.
— Ergonomics: the application must have simple, consistent interfaces that are easy to use, with a reasonable density of components on each screen.
— Offline Functionality: the application should remain usable without an internet connection, since detection and recognition run on-device.
Figure 2.1 below presents the overall use case diagram of the application.
[Figure 2.1: Use case diagram of the application]
The textual descriptions of the main use cases are summarized below.

Use case: Text detection
— Actor: User
— Nominal scenario:
1. The user specifies the text, or the area in which text should be detected.
2. The text is detected.

Use case: Object detection
— Actor: User
— Nominal scenario:
1. The user specifies the object to detect.
2.4 Design
In this section, we illustrate the dynamic aspect of our application by presenting the sequence
diagram. The sequence diagram focuses more specifically on the temporal interactions between the
actors and the system. In other words, it describes the process and messages exchanged between
them in order to produce a function.
Figure 2.2 below represents our sequence diagram.

[Figure 2.2: Sequence diagram of the application]
2.5 Conclusion
During this chapter, we have elaborated on the analysis and design of our application. We
began with the specification and analysis of requirements. We identified the actors as well as the
list of functional requirements through the use case diagram accompanied by a textual description, and we illustrated the dynamic behavior of the application with the sequence diagram. The next chapter presents the deep learning models and libraries on which our solution relies.
Chapter 3
Deep Learning Models and Libraries
Plan
1 Introduction
2 MobileNet V2 model
3 OpenCV library
4 Conclusion
3.1 Introduction
This chapter is devoted first to selecting the right model and determining the appropriate hyperparameters, and then to testing the chosen model.
3.2 MobileNet V2 model

3.2.1 Presentation
MobileNet is a neural network architecture optimized for mobile and embedded devices, offering efficiency and compactness. It employs depthwise separable convolutions to reduce computational cost while maintaining accuracy, and it is commonly used for tasks like image classification, object detection, and semantic segmentation on resource-constrained devices. We utilized MobileNet for object detection and recognition with a confidence threshold of 0.5: only detections whose confidence exceeds this threshold are accepted as valid, which prioritizes reliable detections and enhances the system's accuracy.[3]
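To quantify the cost reduction from depthwise separable convolutions, the following comparison is taken from the original MobileNet paper rather than from this report. For a $D_F \times D_F$ feature map with $M$ input channels, $N$ output channels, and $D_K \times D_K$ kernels, a standard convolution costs $D_K^2 \cdot M \cdot N \cdot D_F^2$ multiply-accumulate operations, while a depthwise separable convolution costs $D_K^2 \cdot M \cdot D_F^2 + M \cdot N \cdot D_F^2$, giving a reduction factor of

\[
\frac{D_K^2 \cdot M \cdot D_F^2 + M \cdot N \cdot D_F^2}{D_K^2 \cdot M \cdot N \cdot D_F^2} = \frac{1}{N} + \frac{1}{D_K^2},
\]

which for the usual $3 \times 3$ kernels amounts to roughly 8 to 9 times fewer operations.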
[Figure: MobileNet logo]
3.2.2 Architecture
MobileNetV2 introduces inverted residuals and linear bottlenecks to improve on the original MobileNet. An inverted residual block first expands its input to a higher-dimensional representation, applies an efficient depthwise convolution with ReLU activations, and then projects back down to a narrow bottleneck. Because applying ReLU in this low-dimensional bottleneck would destroy information, the last layer of the block uses a linear activation instead. The diagram below, taken from the original MobileNetV2 paper, shows the bottleneck block and its inverted residual connection; thicker blocks have more channels.
[Figure: MobileNetV2 bottleneck blocks with inverted residuals, from the original paper]
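To make the detection step concrete, the following Kotlin sketch shows how such a model can be queried on-device with the 0.5 confidence threshold described above, using the TensorFlow Lite Task Library; the model file name is an assumption, and the report's actual integration code may differ:

```kotlin
import android.content.Context
import android.graphics.Bitmap
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.task.vision.detector.Detection
import org.tensorflow.lite.task.vision.detector.ObjectDetector

// Runs MobileNet-based detection on a frame, keeping only detections whose
// confidence exceeds 0.5. In production the detector should be created once
// and reused, not rebuilt for every frame.
fun detectObjects(context: Context, frame: Bitmap): List<Detection> {
    val options = ObjectDetector.ObjectDetectorOptions.builder()
        .setScoreThreshold(0.5f) // discard low-confidence detections
        .setMaxResults(5)
        .build()
    val detector = ObjectDetector.createFromFileAndOptions(
        context, "mobilenet_v2_detector.tflite", options // hypothetical asset name
    )
    return detector.detect(TensorImage.fromBitmap(frame))
}
```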
3.3 OpenCV library

3.3.1 Presentation

3.3.1.1 OpenCV
OpenCV (Open Source Computer Vision Library) is an open-source computer vision library that provides a wide range of image processing techniques. It is widely used in artificial intelligence and computer vision projects. We utilized OpenCV 3.4.13 for text extraction and recognition.[8]
[Figure: OpenCV logo]
[Figure: Setting the parameters and the input/output process]
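Since the original screenshots are not reproduced here, the following Kotlin sketch, using the OpenCV Android bindings, illustrates one plausible preprocessing step ahead of text extraction; it is an assumption rather than the report's actual pipeline:

```kotlin
import org.opencv.core.Mat
import org.opencv.imgproc.Imgproc

// Converts a camera frame to grayscale, then binarizes it with an adaptive
// threshold so that text stands out from an unevenly lit background.
fun preprocessForText(src: Mat): Mat {
    val gray = Mat()
    Imgproc.cvtColor(src, gray, Imgproc.COLOR_RGBA2GRAY)
    val binary = Mat()
    Imgproc.adaptiveThreshold(
        gray, binary, 255.0,
        Imgproc.ADAPTIVE_THRESH_GAUSSIAN_C,
        Imgproc.THRESH_BINARY,
        31,   // block size: odd-sized neighbourhood for the local threshold
        10.0  // constant subtracted from the local weighted mean
    )
    return binary
}
```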
3.4 Conclusion
In this chapter, we selected the appropriate model and libraries for our application and tested their performance. The next chapter describes how the model is deployed in a mobile application.
Chapter 4
Realization and Implementation
Plan
1 Introduction
4 Conclusion
4.1 Introduction
In this chapter, we present the adopted tools as well as the hardware and software environment in which our application was developed, and we conclude by introducing some of its interfaces.
4.2 Work environment

In this section, we present the work environments and technologies selected to implement our system.
To implement our solution in an easy and optimal manner, and to keep up with new techniques, we opted for Java 21, the latest version of the Java programming language and platform at the time of development, featuring performance improvements, enhanced security, and language updates. We also used Kotlin, a modern, concise programming language by JetBrains, known for its interoperability with Java and for safety features like null safety and type inference.
This application was developed on a Microsoft Surface Book 2 machine with the following
characteristics:
— RAM: 16 GB
Android Studio is a development environment created by Google for building Android applications.
It provides comprehensive tools for designing, developing, and deploying mobile apps.[10]
[Figure: Android Studio logo]
Firebase ML Kit is a mobile SDK by Google for adding machine learning features to Android and iOS apps. It provides ready-to-use models for tasks like text recognition and image labeling, as well as support for custom models.[11]

[Figure: Firebase ML Kit logo]
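To illustrate the text recognition flow, here is a minimal Kotlin sketch; note that Firebase ML Kit has since been repackaged as Google ML Kit, so the package names below are an assumption and may differ from the report's actual code (the speak callback stands in for the voice assistant sketched earlier):

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

// Runs on-device text recognition on a captured photo and hands the result
// to a caller-supplied speech callback.
fun recognizeAndSpeak(photo: Bitmap, speak: (String) -> Unit) {
    val image = InputImage.fromBitmap(photo, 0) // 0 = no extra rotation
    val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
    recognizer.process(image)
        .addOnSuccessListener { result -> speak(result.text) }
        .addOnFailureListener { e -> e.printStackTrace() }
}
```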
Several packages were used in our work; we list some of them below.

These dependencies provide support for TensorFlow Lite, a lightweight version of the TensorFlow machine learning framework optimized for mobile and embedded devices. They offer utilities for loading and running TensorFlow Lite models, as well as metadata parsing capabilities.[12]
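The dependency declarations themselves are not reproduced in the report; the following Gradle (Kotlin DSL) lines are an illustrative guess at what they typically look like, with assumed version numbers:

```kotlin
dependencies {
    implementation("org.tensorflow:tensorflow-lite:2.9.0")          // core interpreter
    implementation("org.tensorflow:tensorflow-lite-support:0.4.2")  // image/tensor utilities
    implementation("org.tensorflow:tensorflow-lite-metadata:0.4.2") // model metadata parsing
}
```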
[Figure: TensorFlow Lite logo]
4.2.4 Permissions
Several permissions must be granted to use the application; we list some of them below. A sketch of the runtime permission request follows.
— CAMERA: this permission grants the app access to the device's camera hardware. It allows the app to capture photos and videos using the device's camera.
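On modern Android versions, declaring the permission in the manifest is not enough; it must also be requested at runtime. A minimal Kotlin sketch, with a hypothetical startCamera callback, might look like this:

```kotlin
import android.Manifest
import android.content.pm.PackageManager
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.content.ContextCompat

// Construct in onCreate(): the result launcher must be registered before
// the activity is started.
class CameraPermissionHelper(
    private val activity: AppCompatActivity,
    private val startCamera: () -> Unit // hypothetical callback
) {
    private val launcher = activity.registerForActivityResult(
        ActivityResultContracts.RequestPermission()
    ) { granted -> if (granted) startCamera() }

    fun ensureCameraPermission() {
        val alreadyGranted = ContextCompat.checkSelfPermission(
            activity, Manifest.permission.CAMERA
        ) == PackageManager.PERMISSION_GRANTED
        if (alreadyGranted) startCamera()
        else launcher.launch(Manifest.permission.CAMERA)
    }
}
```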
In this section, we detail each interface of the general workflow, along with an explanation of each tool used to achieve our final result: object and text detection and recognition, and the navigation between them. The figures below illustrate the app's logo and the splash interface.

[Figure: App logo and splash interface]
When users launch the app, they are greeted with the home interface, and a welcome message is conveyed in one of several languages (Arabic, French, English) according to the user's default language settings. A voice guide then directs users on how to use the app: a single tap leads to text detection, while a double tap triggers object detection (a sketch of this tap routing follows the figure). The figures below illustrate the home interface:

[Figure: Home interface]
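The tap routing can be implemented with Android's GestureDetector, which already distinguishes confirmed single taps from double taps. The sketch below is an assumption about how such navigation might be wired; the two open* callbacks are hypothetical:

```kotlin
import android.annotation.SuppressLint
import android.content.Context
import android.view.GestureDetector
import android.view.MotionEvent
import android.view.View

// Routes a confirmed single tap to text detection and a double tap to
// object detection, as described above.
@SuppressLint("ClickableViewAccessibility")
fun installTapRouting(
    context: Context,
    root: View,
    openTextDetection: () -> Unit,
    openObjectDetection: () -> Unit
) {
    val listener = object : GestureDetector.SimpleOnGestureListener() {
        // Fires only once the double-tap timeout has elapsed, so it cannot
        // race with onDoubleTap.
        override fun onSingleTapConfirmed(e: MotionEvent): Boolean {
            openTextDetection(); return true
        }
        override fun onDoubleTap(e: MotionEvent): Boolean {
            openObjectDetection(); return true
        }
    }
    val detector = GestureDetector(context, listener)
    root.setOnTouchListener { _, event -> detector.onTouchEvent(event) }
}
```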
Users can access the object recognition interface by performing a double-tap gesture on the screen. This interface supports multiple languages: French, English, and Arabic. The displayed language is detected automatically from the default language setting of the user's phone; otherwise, it falls back to English. This approach ensures a user-friendly experience by catering to diverse linguistic preferences and simplifying navigation through intuitive gestures. Additionally, voice assistance is provided for every detected object, enhancing accessibility and usability for users with varying needs and preferences. The figures below illustrate some of the objects detected by the application:
[Figures: Objects detected by the application, including some detected objects in Arabic (Figure 4.12)]
Users can access the text recognition interface with a single tap on the screen. They can take a photo of the text to read by pressing the volume up button, followed by the volume down button (a sketch of this key handling follows the figures). This approach ensures a user-friendly experience, accommodating diverse linguistic preferences and simplifying navigation through intuitive gestures. Additionally, voice assistance is provided for every detected word, enhancing accessibility and usability for users with varying needs. The figures below illustrate some texts detected by Warrini's text detection interface:
[Figures 4.13-4.16: Recognized texts and a recognized medication leaflet]
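The volume-button capture flow described above can be handled by overriding onKeyDown in the text recognition screen. The sketch below is an assumption; prepareCapture and capturePhotoAndRecognize are hypothetical helpers, not names from the report:

```kotlin
import android.view.KeyEvent
import androidx.appcompat.app.AppCompatActivity

class TextRecognitionActivity : AppCompatActivity() {
    // Volume up prepares the shot; volume down takes the photo and runs OCR.
    override fun onKeyDown(keyCode: Int, event: KeyEvent): Boolean = when (keyCode) {
        KeyEvent.KEYCODE_VOLUME_UP -> { prepareCapture(); true }
        KeyEvent.KEYCODE_VOLUME_DOWN -> { capturePhotoAndRecognize(); true }
        else -> super.onKeyDown(keyCode, event)
    }

    private fun prepareCapture() { /* e.g. lock focus/exposure (assumption) */ }
    private fun capturePhotoAndRecognize() { /* capture, then recognize text */ }
}
```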
4.4 Conclusion
This chapter provided an overview of our app's development journey: the technology choices, the tools, and the application's features, namely object and text recognition. It also covered permissions and user interface details, with an emphasis on a user-friendly experience throughout.
General Conclusion
Having completed this project, we have realized the profound impact that technology can have
on the lives of visually impaired individuals. Our journey from conceptualization to implementation
has reinforced the importance of developing solutions that address the unique challenges faced by
this community. This project is not merely a technical exercise; it represents a significant step
towards creating a more inclusive society where everyone, regardless of ability, can fully participate
and engage with the world around them.
The specialized mobile application we have developed has the potential to revolutionize the
way visually impaired individuals interact with their environment. By harnessing the power of
artificial intelligence and mobile technology, we have created a tool that empowers users to identify
objects, read texts, and navigate their surroundings with greater independence and confidence.
This project underscores the transformative power of technology to break down barriers and
promote accessibility for all.
Looking ahead, we envision expanding the capabilities of our application to further enhance
the user experience and address additional needs of the visually impaired community. One key
aspect we plan to explore is the integration of advanced navigation features, including a guiding
person feature, allowing users to share their location with a trusted individual and communicate
with them directly through the app for added support and assistance.
In conclusion, this project has been a journey of innovation, empathy, and social
responsibility. We are proud of the work we have accomplished and are excited about the potential
impact it can have on the lives of visually impaired individuals. As we continue to refine and
improve our solution, we remain committed to advocating for inclusivity and accessibility in the
field of technology. Through collaboration, empathy, and a dedication to making a difference, we
believe that we can create a more inclusive world for all.
Netography
Résumé

The project presents a mobile application focused on assisting visually impaired people through real-time object recognition, text detection, and voice assistance.
Keywords: MobileNet, OpenCV, Android Studio, AI, object recognition, ...
Abstract
The project introduces a mobile application focused on aiding visually impaired individuals
through real-time object recognition, text detection, and voice assistance.