Docudetective (2) (1) Final


CHAPTER 1

INTRODUCTION
1.1 Overview
This project develops a chatbot that answers questions by referencing a PDF
document uploaded by the user. The system is built with Python, Streamlit, and LangChain.
Streamlit provides the user interface through which users upload PDF documents. LangChain
processes natural language queries and extracts the relevant information from the
uploaded PDF. When the chatbot cannot find an answer in the provided document, it
searches the internet for relevant information. Combining local document knowledge with
real-time internet-based retrieval is intended to give users accurate and complete
responses drawn from both local and online resources.
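
To make the idea of extracting relevant information from an uploaded document concrete, the following is a minimal sketch in plain Python. It assumes the PDF text has already been extracted into a string, and it uses simple word overlap as a stand-in for the retrieval that LangChain would perform in the actual system. The function names `split_into_chunks` and `rank_chunks` and all parameter values are hypothetical and do not come from the project.

```python
import re


def split_into_chunks(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into chunks of `chunk_size` words that share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def word_set(text: str) -> set[str]:
    """Lower-case the text and return its distinct alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_chunks(question: str, chunks: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Score each chunk by the fraction of question words it contains."""
    q_words = word_set(question)
    scored = []
    for chunk in chunks:
        shared = q_words & word_set(chunk)
        score = len(shared) / len(q_words) if q_words else 0.0
        scored.append((score, chunk))
    # Highest score first; only the best `top_k` chunks are returned.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

The overlap between neighbouring chunks reduces the chance that a sentence relevant to the question is cut in half at a chunk boundary. A real deployment would likely replace the word-overlap score with embedding similarity.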

1.2 Problem Statement


The existing methods of manual reading and keyword-based search are time-consuming
and often fail to provide precise, context-aware answers from PDF documents. With the
advent of ChatGPT, people began to get generic answers to the questions they asked, but
these may not be the answers they are looking for from the PDF.

1.3 Existing System

1. Keyword-Based Search Engines: Before the advent of PDF-based chatbots, users
typically relied on traditional search engines like Google to find information in PDF
documents. They had to enter specific keywords and hope that the search engine's
algorithms would lead them to relevant PDFs. This approach lacked context and often
required users to sift through numerous results to find the information they needed.

2. Manual Document Reading: Another method employed by users was manually
reading through lengthy PDF documents, which could be time-consuming and
inefficient, especially when dealing with large volumes of data.

Department of AI & ML, BIT 2023-24 Page No.1


Docu Detective.AI
1.4 Proposed Solution
The project aims to address the challenge of efficiently retrieving information from
PDF documents uploaded by users through a chatbot interface. The primary issue lies in
creating an intelligent system capable of accurately understanding and responding to
natural language queries about the uploaded PDFs. A further challenge is integrating
internet searches into the chatbot's functionality so that it can provide relevant
answers when the information is not available in the uploaded documents. The project
seeks to overcome these hurdles by developing a robust and user-friendly solution that
combines local document referencing with dynamic internet searches to deliver
comprehensive responses.

1.5 Motivation
The motivation for this project stems from the desire to make information more
accessible to users through a conversational interface. A chatbot capable of
referencing uploaded PDF files lets users retrieve specific information with little
effort. Moreover, integrating internet search functionality broadens the knowledge
base, so that users can receive accurate and up-to-date responses even when the
information is not present in the uploaded documents. This approach aligns with the
evolving needs of users, offering a dynamic and efficient solution for information
retrieval.

1.6 Objectives

• To minimize errors in understanding the user's statement.
• To build a chatbot that answers by referencing a PDF uploaded by the user.
• To design an interface that is easy for the common user to understand and use.
• To implement a Large Language Model (LLM) flow using LangChain.



CHAPTER 2
LITERATURE SURVEY
2.1 Survey Sources
Jitender Kumar et al. [1] present an extensive comparative analysis of ChatGPT,
Google BARD and Microsoft Bing in terms of natural language processing, machine learning,
and user experience. The study also highlights the importance of natural language
processing and machine learning in enhancing chatbot performance. It does not extensively
address ethical considerations or evolving trends in chatbot technology; covering them
would give a more complete picture of these technologies.

Nagendra prasad Krishnam, Dr. Ashim Bora et al. [2] propose the development of a
multilingual talkbot, powered by AI and NLP technologies, to address the high demand for
student inquiries in both English and Arabic. This talkbot is designed to provide quick and
accurate responses to questions related to college policies, academic processes, and
extracurriculars. The paper lacks an extensive discussion of scalability, real-world
implementation challenges, and adaptability to diverse and evolving educational
environments.

C Kavitha and K Pavan [3] envision and create a chatbot framework that attempts to
improve the chatbot's skill by intelligently recognizing and gathering the missing data
from the user that is required for answer generation. The paper lacks a detailed
exploration of practical implementation, which leaves a gap in the research: more
real-world testing and results are needed to understand how effective this chatbot
system is.

Asha, Nithya, Durgaprasad and Kiran [4] propose an algorithm to train a chatbot that
gives instantaneous answers to users' queries using natural language processing and
deep learning. They also suggest that the system can be improved by implementing a
Recurrent Neural Network. The paper focuses on the technical aspects of chatbot
development and its applications but does not mention the ethical and privacy
considerations associated with using chatbots.

Jin-Hyuk, Wan-Sup Cho and Chi-Hwan Choi [5] propose a technique that extracts
words from files using OCR and then generates questions via an overgenerating
transformations and ranking algorithm. It was built to form questions that can be used in
exam question papers. It has limited knowledge of the file uploaded by the user.

Guru kiran, Angad Pal, Rishi J and Saritha K [6] present a practical application
implementing a chatbot at PES University that resolves the queries of students and
guests regarding college management, courses, and general matters related to the
college. However, the model can answer only predefined questions, and no out-of-the-box
questions can be asked.

Ana Rodriguez, Rodrigo mejia and Karen Cornejo [7] propose an algorithm for
implementing Business Process Outsourcing that handles multiple types of input,
including text and audio. The main objective is to help clients grow economically through
optimization of time and resources. However, it can be used only on CSV files, which are
used to understand the financial growth of the business and give predictive answers.

Patchara Vanichvasin [8] proposes an algorithm for implementing chatbot technology
in an educational setting to give personalized learning support in order to increase
students' research knowledge and lead to positive outcomes. It failed to handle unseen
text or misspelled keywords.

Sardar Jaf and Kenneth McGarry [9] give detailed explanations of recent
advancements in chatbots in terms of language models, applications, datasets used, and
evaluation frameworks.

Julija Skrebeca, Paula Kalniete and Janis Goldbergs [10] advocate the widespread use
of chatbots, particularly during challenging times like the COVID-19 pandemic. These
chatbots, powered by artificial intelligence, are proposed to serve a dual purpose: as
interactive teaching systems in education and as tools to support e-commerce processes.
The paper lacks a comprehensive exploration of chatbots' potential to bridge the domains
of education and e-commerce, such as how chatbots could enhance both the learning
experience for students and e-commerce.

Pallavi, Prassana Kumar and Akshath MG [11] propose ideas for implementing a
chatbot for hospitals to reduce the work pressure on doctors during COVID times by
providing information on diseases and infections. They tried to build the knowledge base
with live information updating.

Dr. Vishwanath Karnad [12] proposes an algorithm for implementing a chatbot that
identifies the user's question or query and answers accordingly, and develops a database
where all the related data is stored and matched with the question. It is limited to a
single institute, and new modifications need to be made for each one.

Munira Ansari, Talha Khan and Anupam Singh [13] present a theoretical paper that
describes the ways to implement a web-based chatbot and compares different
techniques, giving their pros and cons. In this approach cloud storage plays a vital role
and might be expensive if not used properly.

Kokou Gaglo and Bessan Melckior Degboe [14] present a practical implementation of a
chatbot built during the COVID period, when schools were shut down, and describe
technologies like NLP and how they can be used for educational remediation in day-to-day
life. However, it never tests creativity or unique thinking, and the effectiveness and
user experience of this chatbot in real education are not evaluated.

Satyendra Praneel Reddy Karri and Dr B Santosh [15] present a practical implementation
of a chatbot built on ELIZA that tries to overcome the drawbacks of speech-to-text
and to increase the accuracy of the system output. It falls short in discussing and
analysing the specific topic given and the challenges faced by chatbots in truly
emulating human-like responses.



CHAPTER 3
SYSTEM ANALYSIS AND MODELLING
3.1 System Analysis

3.1.1 Use Case Diagram

Fig 3.1: Use Case Diagram


The main actors in the diagram are:

• External Developer: The person who builds the application.
• Non-technical User: The person who uses the application.
• Open API Service: Connects to the OpenAI server.
• Chatbot main: Provides all the modules for the chatbot application.
• Database: The place where all user-uploaded PDFs are stored.

The steps in the process are as follows:

• First, the developer starts developing the application.
• The developer connects to the OpenAI service using API keys.
• The developer uses the needed modules from chatbot main for the application.
• The user uses the application and uploads the PDF.
• The PDF is stored in the database.
• The user asks a question, and the application backend runs the algorithms to
find the answer in the PDF uploaded by the user; if the answer is not found, it
searches the internet for the answer.
• The answer is displayed as output on the frontend.
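
The upload, store, ask, and fallback steps above can be sketched as a small class. This is only an illustration under stated assumptions: the PDF text is assumed to be already extracted, an in-memory dictionary stands in for the database, word overlap stands in for the real answer-finding algorithm, and the `web_search` callable is a hypothetical placeholder for the internet search. The class name `DocumentChatbot` and the `min_score` threshold are not taken from the project.

```python
from typing import Callable


class DocumentChatbot:
    """Answer from uploaded documents first, then fall back to a web search."""

    def __init__(self, web_search: Callable[[str], str], min_score: float = 0.5):
        # In-memory stand-in for the database that stores uploaded PDFs.
        self.database: dict[str, list[str]] = {}
        self.web_search = web_search  # hypothetical search function
        self.min_score = min_score    # assumed threshold for "answer found"

    def upload(self, name: str, text: str) -> None:
        """Store the (already extracted) text of a PDF as a list of sentences."""
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        self.database[name] = sentences

    def ask(self, question: str) -> dict:
        """Return the best matching sentence, or a web result if none is good enough."""
        q_words = set(question.lower().replace("?", "").split())
        best_score, best_sentence = 0.0, ""
        for sentences in self.database.values():
            for sentence in sentences:
                shared = q_words & set(sentence.lower().split())
                score = len(shared) / len(q_words) if q_words else 0.0
                if score > best_score:
                    best_score, best_sentence = score, sentence
        if best_score >= self.min_score:
            return {"source": "pdf", "answer": best_sentence}
        # Nothing in the uploaded documents matched well enough: search the web.
        return {"source": "web", "answer": self.web_search(question)}
```

Returning the source alongside the answer lets the frontend show the user whether a response came from the uploaded PDF or from the internet.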

A use case diagram is crucial in systems analysis and design as it illustrates the
interactions between system components and external entities, showcasing how users
interact with a system. It helps identify and define system functionalities, requirements, and
user roles, providing a high-level view of the system's behavior. This diagram serves as a
communication tool between stakeholders, aiding in the understanding of system
functionality and ensuring alignment with user needs and business goals.



CHAPTER 4
SYSTEM ARCHITECTURE

Fig 4.1: System Architecture

Modules:

• User interface: A critical aspect of chatbot technology. It determines the overall
experience that users have when interacting with the chatbot. A well-designed
chatbot that is easy to use and provides relevant responses gives a positive
user experience and increases user engagement.
• NLU: Natural language understanding (NLU) enables chatbots to understand the
context and meaning of user inputs, which is important for providing relevant
responses. Another machine learning technique used in chatbot development is
natural language generation (NLG).
• Large language models: LLMs are emerging as a transformative technology,
enabling developers to build applications that they previously could not. However,
using these LLMs in isolation is often not enough to create a truly powerful app;
the real power comes from combining them with other sources of computation or
knowledge. The LangChain library is aimed at assisting in the development of those
types of applications.
• Data sources: These are divided into two categories:
1. Knowledge Base: Contains the PDF document uploaded by the user, which
the machine refers to.
2. Web: The connection to the WWW helps find answers to questions
that are not covered in the uploaded PDF file.
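
One way to picture how an LLM is combined with another source of knowledge is prompt assembly: passages retrieved from the knowledge base are placed in front of the user's question before the model is called. The sketch below is illustrative only. The function name `build_prompt`, the `max_chars` budget, and the `NOT_FOUND` marker (which a caller could use to trigger the web data source) are assumptions, not part of the project or of LangChain.

```python
def build_prompt(question: str, context_chunks: list[str], max_chars: int = 2000) -> str:
    """Combine retrieved passages and the user's question into one LLM prompt."""
    context_lines = []
    used = 0
    for i, chunk in enumerate(context_chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break  # stop adding passages once the character budget is used up
        context_lines.append(entry)
        used += len(entry)
    context = "\n".join(context_lines) if context_lines else "(no context found)"
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, reply with NOT_FOUND.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )
```

Numbering the passages makes it possible to ask the model which passage supported its answer, and the explicit marker gives the application a simple signal for switching from the knowledge base to the web.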
CHAPTER 5
SOFTWARE REQUIREMENT SPECIFICATIONS

5.1 Functional Requirements

API calls:
• Client responsibility: The client uploads the PDF and then asks the question in a
single line. The client sends a GET request to the web API with the question as a
URL parameter.
• Server responsibility: The server sends all API data as JSON response
documents. The server responds with a 200 OK status code if the requested answer
is found in the database. The server responds with a 400 Bad Request status code if
the answer is not found in the document, and then reaches out to the SERP API.
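
The request and response contract above can be sketched without any web framework. The example mirrors the status codes as stated in the requirement. The `/ask` path, the `question` parameter name, the JSON field names, and the `lookup` and `web_search` callables are hypothetical; `web_search` merely stands in for the call to the SERP API.

```python
import json
from typing import Callable, Optional
from urllib.parse import parse_qs, urlparse


def handle_request(
    url: str,
    lookup: Callable[[str], Optional[str]],
    web_search: Callable[[str], str],
) -> tuple[int, str]:
    """Handle a GET request URL and return (status code, JSON body)."""
    # The question arrives as a URL parameter, e.g. /ask?question=...
    params = parse_qs(urlparse(url).query)
    question = params.get("question", [""])[0].strip()

    answer = lookup(question) if question else None
    if answer is not None:
        body = {"question": question, "answer": answer, "source": "document"}
        return 200, json.dumps(body)

    # Not found in the document: respond with 400 and use the web fallback.
    web_answer = web_search(question) if question else None
    body = {"question": question, "answer": web_answer, "source": "web"}
    return 400, json.dumps(body)
```

Keeping the handler a pure function of the URL makes the status-code logic easy to check before it is wired into a real server.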

5.2 Nonfunctional Requirements


• Accuracy: Errors made should be minimum.
• Fast response: Waiting time should be minimal.
• Security: Implementation of security measures to protect user data, including secure
transmission (HTTPS) and proper authentication mechanisms.
• Performance: Respond promptly, with low latency, even during peak usage periods.

5.3 System Requirements

Software Requirement

• Operating system: Windows 7 or above.
• IDE: VS Code.
• API level: 19 or above.
• Browser compatibility.

Hardware Requirement

• Processor: Pentium IV or above.
• RAM: 8GB or more.
• Internet Connectivity.



CHAPTER 6
PROJECT PLANNING REPORT


A Gantt chart is essential for project management as it visually represents project
tasks, their durations, and dependencies over time. It provides a clear and comprehensive
overview, facilitating effective scheduling, coordination, and tracking of project progress,
thus helping teams and stakeholders manage and complete projects efficiently.

Fig 6.1: Project Planning



CHAPTER 7
APPLICATIONS
1. Customer Support:
• Users can upload product manuals or guides, and the chatbot can help them find
information or troubleshoot issues.
• The chatbot can refer to the uploaded documentation to provide step-by-step instructions
or solutions.
2. Educational Assistance:
• Students can upload lecture notes, textbooks, or research papers, and the chatbot can
help them with questions related to the material.
• The chatbot can serve as a study aid, providing explanations or additional information
based on the uploaded documents.
3. Technical Documentation:
• In a professional setting, employees can upload technical manuals or documentation,
and the chatbot can assist with queries related to procedures or specifications.
• It can help streamline the process of accessing and understanding complex technical
information.
4. Legal Assistance:
• Users can upload legal documents, and the chatbot can provide information on legal
terms, processes, or general advice.
• The chatbot can guide users through legal documents and offer explanations or
interpretations.
5. Research Support:
• Researchers can use the chatbot to quickly access relevant information from a large set
of documents.
• The chatbot can provide summaries, key points, or references from the uploaded
documents.
6. Healthcare Information:
• Patients can upload medical documents, and the chatbot can provide explanations of
medical terms, procedures, or general health information.
• The chatbot can also direct users to reliable online health resources.



CONCLUSION
The integration of PDF file parsing allows users to leverage information from their own
documents, making the chatbot a valuable tool for personal and professional use. Users can
simply upload a PDF file, and the chatbot intelligently extracts relevant information to
address inquiries. This feature enhances the user experience by eliminating the need for
manual input and streamlining the process of obtaining specific information from
documents. Moreover, the chatbot's ability to extend its search to the internet when
information is not found in the uploaded PDF file adds an extra layer of adaptability. This
ensures that users receive answers even when the necessary data is not available locally.
The integration of internet search functionality broadens the scope of the chatbot's
knowledge and enhances its overall utility.
REFERENCES
[1] Jitender Kumar, Saumyamani Bhardwaz, (2023). “An Extensive Comparative
Analysis of Chatbot Technologies - ChatGPT, Google BARD and Microsoft
Bing”.
[2] Nagendra prasad Krishnam, Dr. Ashim Bora, Dr. R.S.V. Rama Swathi,
(2023). “AI-Based advanced Talk-chatbot for Implementation”.
[3] C Kavitha, K Pavan, (2023). “A chatbot system for education NLP using
Deep Learning”.
[4] Asha, Nithya, Durgaprasad, Kiran, (2022). “Implication and advantage of
machine learning based chatbot in diverse discipline”.
[5] Jin-Hyuk, Wan-Sup Cho, Chi-Hwan Choi, (2022). “Smart Answering
Chatbot Based on OCR”.
[6] Guru kiran, Angad Pal, Rishi J, Saritha K, (2022). “Cross Domain Answering
FAQ Chatbot”.
[7] Ana Rodriguez, Rodrigo mejia, Karen Cornejo, (2022). “Chatbot analysis for
the creation of Automated Conversations in Real Time”.
[8] Patchara Vanichvasin, (2022). “Chatbot Development as Digital Learning
Tool to increase Students’ Research Knowledge”.
[9] Sardar Jaf, Kenneth McGarry, (2022). “Recent Advances in Chatbot”.
[10] Julija Skrebeca, Paula Kalniete, Janis Goldbergs, (2021). “Modern
Development trend of chatbot using Artificial Intelligence”.
[11] Pallavi, Prassana Kumar, Akshath MG, (2021). “ChatDoc (Medical
Assistant)”.
[12] Dr. Vishwanath Karnad, (2021). “Chatbot Development for educational
Institute”.
[13] Munira Ansari, Talha Khan, Anupam Singh, (2021). “Intelligent Chatbot”.
[14] Kokou Gaglo, Bessan Melckior Degboe, (2021). “Proposal of conversational
chatbot for educational remediation in the context of COVID-19”.
[15] Satyendra Praneel Reddy Karri, Dr B Santosh, (2021). “Deep Learning
Technique for implementation of Chatbot”.
[16] M. Jovanovic, M. Baez, and F. Casati, (2021). “Talkbots as Conversational
Healthcare Services”.
[17] Siddhant Meshram, Namit Naik, Megha VR, Tanmay More, Shubhangi
kharche, (2021). “Conversational AI: Chatbots”.
[18] E. Adamopoulou and L. Moussiades, (2020). “An Overview of Talkbot
Technology BT-Artificial Intelligence Applications and Innovations”.
[19] M. Daswani, K. Desai, M. Patel, R. Vani, and M. Eirinaki, (2020).
“CollegeBot: A Conversational AI Approach to Help Students Navigate
College”.
[20] Gwendal Daniel, Jordi Cabot, (2020). “The software challenges of building
smart chatbots”.
