Artificial Intelligence-Based Voice Assistant
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue XII Dec 2022- Available at
Abstract: Machines replaced workers throughout the industrial revolution, pushing more people into the service
industry. Chatbots and voice assistants, which can offer support to customers or users, are now part of the
digital revolution's advance on this field. Voice assistants (VA) are a type of voice-enabled artificial intelligence (AI). AI refers
to some level of intelligence displayed by digital interfaces, or the ability of algorithms to mimic intelligent human behaviour.
More specifically, AI refers to the “cognitive” functions that we tend to associate with the human mind, including problem solving and
learning. The counselling response model provides a suitable response by combining the user's input with the user's emotional state;
this can have a consoling effect that helps a user weighed down by depression feel better.
A voice assistant (VA) is a kind of artificial intelligence that can respond to voice commands. Voice is now incorporated in
various products in consumers' homes, including smartphones and smart speakers, and voice assistants are becoming increasingly
important in daily life. AI-based voice assistants are systems that can recognize the human voice and respond
via synthesized voices. This voice assistant gathers audio from the microphone and converts it into text. The text is then passed
to pyttsx3, a Python text-to-speech library, to be spoken aloud. pyttsx3 supports multiple TTS engines, including sapi5, nsss, and
espeak. Just as human personalities influence how we connect with the world, voice assistant personalities (VAP) can affect how we
interact with our surroundings on a daily basis.
This analysis identifies seven personality traits shared by three popular applications: Microsoft's Cortana, Google Assistant, and
Amazon's Alexa. This study uses and extends flow theory to investigate why VAP has the impact it does, and which aspects of
VAP generate the voice interaction flow experience that can influence consumers' attitudes, behavioural intentions, and
current mood.
This reveals that voice engagement with a virtual assistant that integrates operational intelligence, sincerity, and creativity
encourages customers to take charge of their voice interactions with the VA, focus on them, and engage in exploratory behaviour.
Consumers' exploratory behaviour in turn influences their happiness and their willingness to use voice assistants. VAP refers to the
attribution of cognitive, emotional, and social human traits to a VA in order to personalize interactions with customers.
Consumers are more engaged in their dealings with a VA because of these human-like features.
A. Drawbacks
1) Time-consuming process
2) Sentiment analysis is performed only on Twitter data
3) The existing system has no voice assistance option for opening applications
4) It is not a user-friendly application
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 1130
A. Advantage
1) Less time-consuming process
2) Effective sentiment analysis is performed on the user's voice input
3) The proposed system has a voice assistance option for opening any application
4) It is a user-friendly application
Outputs are the most important and direct source of information for the customer and management. Intelligent output design
improves the system's relationship with the user and supports decision-making. Outputs are used to make a permanent record of the
results for later consultation. The output generated by the system is usually considered the standard for evaluating the performance
of the system.
For the proposed system, the output must be compatible with the existing manual reports, and the outputs are
formatted with this in mind. The outputs are obtained after all the phases of the system and can be displayed on screen or
produced as hard copy. Text output is preferred since it is often used by the controller section for future reference and
for maintaining records.
A. User Enrollment Process
This module lets users register with the application. Registration is mandatory, since users must be registered to use the voice
assistance options for opening applications. In the registration form, the user fills in personal details such as name, address,
date of birth, mobile number, and email ID. The user also selects a username and password at the time of registration, and the
username must be unique. All the details are stored in the user table. Users can log on to this software using their usernames and
passwords.
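The enrollment and login flow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the paper does not name a database or say how passwords are stored, so the use of SQLite, the column names, the salted PBKDF2 hash, and the helper names create_user_table, register, and login are all hypothetical.

```python
import hashlib
import hmac
import os
import sqlite3


def create_user_table(conn):
    """Create the user table; the PRIMARY KEY keeps usernames unique."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS user (
            username TEXT PRIMARY KEY,
            name TEXT, address TEXT, dob TEXT, mobile TEXT, email TEXT,
            salt BLOB NOT NULL,
            password_hash BLOB NOT NULL
        )
        """
    )


def _hash_password(password, salt):
    # Assumption: store a salted PBKDF2 hash instead of the plain password.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(conn, username, password, **details):
    """Insert a new user; return False if the username is already taken."""
    salt = os.urandom(16)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO user VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                (username, details.get("name"), details.get("address"),
                 details.get("dob"), details.get("mobile"), details.get("email"),
                 salt, _hash_password(password, salt)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def login(conn, username, password):
    """Return True when the username exists and the password matches."""
    row = conn.execute(
        "SELECT salt, password_hash FROM user WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    return hmac.compare_digest(stored, _hash_password(password, salt))
```

A caller would open a connection (for example sqlite3.connect(":memory:") for a quick trial), call create_user_table once, and then use register and login from the registration and sign-in forms.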
B. Voice Process
This module is entirely for users: through it, the user gives voice input. Speech recognition, or speech-to-text, is the ability
of a machine or program to identify words spoken aloud and convert them into readable text. This readable text is used to open the
application and carry out the other activities in the system. The spoken replies come from a Python text-to-speech converter
(pyttsx3), which relies on platform engines such as sapi5 and nsss; these engines synthesize speech rather than recognize it.
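One listen, recognize, reply, speak cycle might look like the sketch below. The paper names pyttsx3 for speech output but does not name the speech-to-text library, so the SpeechRecognition package (which needs PyAudio for microphone access) and its Google recognizer are assumptions here, and the helper names listen_once, speak, respond, and run_once are hypothetical. The third-party imports sit inside the functions so the reply logic can be tried without audio hardware.

```python
def listen_once():
    """Capture one utterance from the microphone and return it as text."""
    import speech_recognition as sr  # third-party; assumed, not named in the paper

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        # Nothing intelligible was heard, or the recognition service failed.
        return ""


def speak(text):
    """Read text aloud with pyttsx3 using the platform's default TTS driver."""
    import pyttsx3  # third-party

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


def respond(text):
    """Hypothetical reply logic: echo what was heard, or ask for a repeat."""
    text = text.strip()
    if not text:
        return "Sorry, I did not catch that."
    return f"You said: {text}"


def run_once(listen=listen_once, speak_fn=speak):
    """Run one listen -> text -> reply -> speech cycle and return the reply."""
    reply = respond(listen())
    speak_fn(reply)
    return reply
```

Passing the listen and speak_fn callables as parameters is a design choice for this sketch: it lets the cycle be exercised with stand-ins such as a lambda and a list's append method before a microphone or speaker is involved.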
C. Pattern Matching
Pattern matching is a technique that finds predetermined patterns in sequences of raw or processed data.
After the user's voice has been converted successfully, the resulting text is compared with a list of process names inside the system
using a pattern-matching algorithm, and the matching application is opened. The same approach also handles queries to the voice
assistant that are not related to the computer system: the machine learning algorithm matches the query against the patterns of
previously asked queries and gives the best and most relevant answer to the user.
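As a rough illustration of comparing recognized text with a list of process names, the sketch below uses fuzzy string matching from Python's standard difflib module. This is a stand-in technique, not necessarily the algorithm the system uses; the KNOWN_APPS table, the match_application helper, and the 0.75 similarity cutoff are all hypothetical.

```python
import difflib

# Hypothetical table of spoken names -> commands; a real system would build
# this from the applications available on the machine.
KNOWN_APPS = {
    "notepad": "notepad.exe",
    "calculator": "calc.exe",
    "paint": "mspaint.exe",
}


def match_application(spoken_text, apps=KNOWN_APPS, cutoff=0.75):
    """Return the command for the first word that closely matches an app name."""
    for word in spoken_text.lower().split():
        hits = difflib.get_close_matches(word, list(apps), n=1, cutoff=cutoff)
        if hits:
            return apps[hits[0]]
    return None  # no word was close enough to any known name
```

Fuzzy matching tolerates small transcription slips, so "start the calculater" still resolves to the calculator entry, while an unrelated request returns None and can be handed to the query-answering path.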
D. Text Classification
A support vector machine (SVM) is a machine learning algorithm that analyzes data for classification and regression analysis. SVM
is a supervised learning method that looks at data and sorts it into one of two categories. An SVM builds a map of the sorted data
with the margin between the two categories as wide as possible. Every sentence is segmented and each keyword is matched by prefix.
Based on this analysis, the sentence is then classified using the SVM classification algorithm.
1) Pros
a) It works very well when the classes have a clear margin of separation.
b) It works well in high-dimensional spaces.
c) It remains effective when the number of dimensions exceeds the number of samples.
d) It is memory-efficient because it uses a subset of training points (called support vectors) in the decision function.
2) Cons
a) It does not perform well on large data sets, because the required training time is long.
b) It does not perform well when the data set is noisy, for example when the target classes overlap.
c) SVM does not directly provide probability estimates; they are computed using an expensive five-fold cross-validation, as in
the SVC class of the Python scikit-learn library.
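A minimal sketch of SVM text classification with scikit-learn is given below. The training sentences, the intent labels open_app and send_mail, the TF-IDF features, and the classify helper are made up for illustration; the paper does not describe its features or training data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy, made-up training data: utterance -> intent label.
TRAIN_TEXTS = [
    "open notepad", "open the browser", "launch calculator",
    "send an email to john", "mail the report", "send a mail",
]
TRAIN_LABELS = ["open_app"] * 3 + ["send_mail"] * 3

# TF-IDF turns each sentence into a sparse keyword vector; a linear-kernel
# SVC then separates the two intents with a maximum-margin boundary.
classifier = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
classifier.fit(TRAIN_TEXTS, TRAIN_LABELS)


def classify(sentence):
    """Return the predicted intent label for one recognized sentence."""
    return classifier.predict([sentence])[0]
```

The probability=True option of SVC is left off here because it enables the costly internal cross-validation mentioned in the cons; predict alone only needs the decision function.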
F. Other Modules
This voice assistant model also includes several smaller modules.
1) SMTP: Simple Mail Transfer Protocol (SMTP) is used to send mail, with the complete process driven by voice.
2) Selenium: Selenium WebDriver is mostly used to automate the browser actions that users request by voice.
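The mail step could be sketched with Python's standard smtplib and email modules, as below. The helper names build_message and send_message are hypothetical, and the host, port, and credentials are placeholders the caller must supply; the paper does not say which mail server or security mode is used, so STARTTLS is an assumption.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email from dictated fields."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, host, port, username, password):
    """Send the message over SMTP, upgrading the connection with STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)
```

In a voice-driven flow, the recipient, subject, and body would each come from a recognized utterance before build_message is called.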
The proposed system aims to provide a simple, easy-to-use voice assistant, especially for computers, that performs simple operations
such as responding to the user, opening programs (system apps, Google, and other applications), telling jokes, finding lyrics, sending
emails, and looking up dictionaries and converters. It has an OCR function that extracts text from images. It can be connected to the
IoT to control electrical appliances such as lights, fans, and air conditioners by voice, turning them on and off (depending on the
hardware and software).