on

IMAGE CAPTION GENERATOR

Submitted in partial fulfillment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
in
DATA SCIENCE

By

Priyanka Saxena
Assistant Professor, Department of CSD
CERTIFICATE
This is to certify that this is a bonafide record of the project report titled "Image Caption
Generator", which is being presented as the Industrial Oriented Mini Project / Summer
Internship report by
3. P. VARSHA 20BD1A6745
in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology
in Computer Science and Engineering (Data Science), affiliated to the Jawaharlal Nehru
Technological University Hyderabad, Hyderabad.
Mission of KMIT
● To establish an industry–institute interaction to make students ready for the industry.
● To encourage and enable students to not merely seek jobs from industry but also to create jobs.
● To induce a spirit of nationalism which will enable the student to develop, understand and
contribute towards building a vibrant nation.
● To support the faculty to accelerate their learning curve to deliver excellent service to
students.
5. Modern Tool Usage: Create, select, and apply appropriate techniques, resources,
and modern engineering and IT tools including prediction and modeling to complex
engineering activities with an understanding of the limitations.
12. Life-Long Learning: Recognize the need for and have the preparation and ability
to engage in independent and life-long learning in the broadest context of
technological change.
PSO1: An ability to analyze the common business functions to design and develop
PSO2: Shall have expertise in evolving technologies like Python, Machine Learning,
Deep Learning, IoT, Data Science, Full Stack Development, Social Networks, Cyber
Security, Mobile Apps, CRM, ERP, Big Data, etc.
PEO1: Graduates will have successful careers in computer-related engineering fields or
PEO2: Graduates will try to provide solutions to challenging problems in their
PEO4: Graduates will communicate effectively, work collaboratively and exhibit high
(Mapping of projects P1–P4 against Programme Outcomes PO1–PO12, with correlation levels
marked H (High), M (Medium) and L (Low).)
(Mapping of projects P1–P4 against the PSOs and PEOs, with correlation levels marked H, M and L.)
We hereby declare that the results embodied in the dissertation entitled "Image Caption
Generator" have been carried out by us together during the academic year 2023-24, in partial
fulfillment of the requirements for the award of the B. Tech degree in Computer Science and
Engineering (Data Science) from JNTUH. We have not submitted this report to any other
university or institution for the award of any other degree.
We take this opportunity to thank all the people who have rendered their full support to our
project work. We render our thanks to Dr. B L Malleswari, Principal, who encouraged us
to do the Project.
We are grateful to Mr. Neil Gogte, Founder & Director, Mr. S. Nitin, Director, for
facilitating all the amenities required for carrying out this project.
We express our sincere gratitude to Ms. Deepa Ganu, Director Academic, for providing an
excellent environment in the college.
We are also thankful to Dr. G Narender, Head of the Department, for providing us with
time to make this project a success within the given schedule.
We are also thankful to our guide Ms. Savitha Ramesh, for her valuable guidance and
encouragement given to us throughout the project work.
We would like to thank the entire IT Department faculty, who helped us directly and
indirectly in the completion of the project.
We sincerely thank our friends and family for their constant motivation during the project
work.
Image captioning has recently emerged as a challenging task that has gathered widespread
interest. The task involves generating a concise description of an image in natural language and
is currently accomplished by techniques that combine computer vision (CV), natural language
processing (NLP), and machine learning methods. In this report, we present a model based on an
encoder–decoder architecture. In the encoder phase, CNN models are used to extract the
important features from an image. The decoder phase then uses RNN models such as LSTM or
GRU to translate those features into natural language sentences. We incorporate an attention
mechanism while generating captions: for image captioning, attention focuses on specific regions
of the image while generating the description. Adding an attention mechanism allows the model
to concentrate on the parts of the input that are most important for making a prediction, which
helps it produce more accurate and informative captions.
CV Computer Vision
UI User Interface
2. Sequence Diagram 18
3. Class Diagram 19
4. State Diagram 20
5. Deployment Diagram 20
1. Login Page 31
2. Sign Up Page 32
4. Result Page 30
DESCRIPTION PAGE
CHAPTER - 1 1
1. INTRODUCTION 2
1.1 Purpose of the project 2
1.2 Problem with Existing Systems 3
1.3 Proposed System 3
The primary purpose of the project is to develop a mobile application that leverages deep
learning models and cloud services to enable users to automatically generate descriptive captions
for their images. This application is designed to fulfill the following objectives:
Enhance User Experience: The project aims to enhance the user experience by providing a user-
friendly and intuitive mobile application that allows users to effortlessly add context and
descriptions to their images. Users can create meaningful captions for personal photos, share
captivating content on social media, and support visually impaired individuals in understanding
image content.
Leverage Deep Learning Models: By incorporating the InceptionV3 pre-trained architecture for
image encoding and LSTM models for caption generation, the project harnesses the power of
deep learning and natural language processing to generate contextually relevant and coherent
captions. This enhances the overall quality of image descriptions and user satisfaction.
User Authentication and Data Management: Utilizing Firebase for user authentication and data
storage, the project ensures secure user account management. Users can register, log in, and
securely store their data and generated captions. Firebase simplifies user management and
provides a reliable cloud-based data storage solution.
Cross-Platform Accessibility: The mobile application, built using Flutter, is designed to be
cross-platform, supporting both Android and iOS devices. This broad accessibility ensures that a
wide range of users can benefit from the image captioning capabilities.
Multilingual Support: The project offers the potential to generate captions in multiple
languages, contributing to its inclusivity and making it suitable for a global audience. This
multilingual support ensures that users from diverse linguistic backgrounds can use the
application effectively.
Innovation and Automation: The project represents an innovative solution to the challenges of
adding context and meaning to visual content. By automating the caption generation process, it
reduces the manual effort involved in describing visual content.
Existing systems for image caption generation often face several challenges and
limitations that the Image Caption Generator project aims to address. Many image caption
generators struggle to provide accurate and contextually relevant captions. They often produce
generic or incorrect descriptions that do not effectively convey the content of the image.
Existing systems may have difficulty understanding the nuances of language and context. They
often fail to capture the subtleties and intricacies of images, leading to vague or inappropriate
captions. Some systems may suffer from slow processing times, especially when dealing with a
large number of images or complex visual content. This can result in frustrating delays for users.
Users may have limited control over the generated captions, with little ability to tailor the
descriptions to their specific needs or preferences.
The proposed system for the Image Caption Generator is a mobile application developed using
Flutter, which seamlessly integrates with a deep learning model. This deep learning model
employs Inception v3 as the encoder and LSTM (Long Short-Term Memory) as the decoder. This
innovative combination of technologies aims to address the existing limitations in image caption
generation by delivering a user-friendly, accurate, and versatile solution.
The mobile application, built on the Flutter framework, offers an intuitive and accessible platform
for users to upload their images and receive automatically generated captions. This user-friendly
interface ensures that individuals from various backgrounds and expertise levels can effortlessly
harness the power of image captioning to enhance their visual content.
The core of the system lies in the deep learning model.
Improved Descriptive Quality: Image caption generators with attention mechanisms have a
broader scope in terms of generating high-quality, contextually relevant captions for a wide range
of images. The attention mechanism allows the model to focus on different regions of the image
while generating the caption, resulting in more accurate and detailed descriptions.
Contextual Awareness: These models excel in capturing contextual information within an image.
They can better understand the relationships between objects and scenes, and they can produce
captions that reflect these relationships effectively.
Multimodal Understanding: Image caption generators with attention can seamlessly integrate
information from both the visual (image) and textual (caption) modalities. This makes them well-
suited for applications where cross-modal understanding is crucial.
Adaptability: They can be fine-tuned and adapted to specific domains or applications, allowing
for customization and improved performance in specialized areas.
Improved Training Efficiency: Attention mechanisms often reduce the reliance on large-scale
training datasets because the model can focus on relevant image regions. This can make training
more efficient.
Variety in Captions: The attention mechanism can promote the generation of diverse captions for
the same image, adding richness and variety to the output.
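As a rough illustration of the attention idea described above, the following is a minimal Keras sketch of Bahdanau-style (additive) attention over image regions; the layer and variable names are illustrative assumptions rather than the project's actual implementation.

import tensorflow as tf

class AdditiveAttention(tf.keras.layers.Layer):
    """Bahdanau-style attention: score each image region against the decoder's
    current hidden state and return a weighted context vector."""
    def __init__(self, units):
        super().__init__()
        self.w_features = tf.keras.layers.Dense(units)  # projects image region features
        self.w_hidden = tf.keras.layers.Dense(units)    # projects decoder hidden state
        self.score = tf.keras.layers.Dense(1)           # collapses to a scalar score

    def call(self, features, hidden):
        # features: (batch, num_regions, feat_dim); hidden: (batch, hidden_dim)
        hidden_expanded = tf.expand_dims(hidden, 1)
        scores = self.score(tf.nn.tanh(self.w_features(features) + self.w_hidden(hidden_expanded)))
        weights = tf.nn.softmax(scores, axis=1)               # attention over regions
        context = tf.reduce_sum(weights * features, axis=1)   # weighted sum of region features
        return context, weights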
Vision Part:
Objective: This part deals with visual information, particularly the extraction of features from
images.
Deep Convolutional Neural Network (CNN): A deep CNN is used to process images. CNNs
are known for their ability to automatically learn and extract hierarchical features from images.
In this context, the CNN takes an image as input and transforms it into a fixed-size feature
representation.
Multimodal Part:
Objective: The multimodal part bridges the gap between the textual and visual modalities,
enabling the model to jointly consider both types of information.
One-layer Representation: This component connects the language model and the deep CNN. It
essentially combines the output from the language model and the vision part into a unified
representation.
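A minimal sketch of such a one-layer multimodal fusion follows, assuming the word embedding, the recurrent hidden state, and the CNN image feature are each projected to a common size and summed before the word-prediction softmax; the dimensions and names below are illustrative, not taken from the original m-RNN implementation.

import tensorflow as tf
from tensorflow.keras import layers, Model

embed_dim, rnn_units, feat_dim, multimodal_dim, vocab_size = 256, 256, 2048, 512, 10000

# Inputs: current word embedding, recurrent hidden state, and CNN image feature.
word_emb = layers.Input(shape=(embed_dim,))
rnn_hidden = layers.Input(shape=(rnn_units,))
image_feat = layers.Input(shape=(feat_dim,))

# Multimodal layer: project each modality to a common size and fuse them.
fused = layers.Activation("tanh")(
    layers.Dense(multimodal_dim)(word_emb)
    + layers.Dense(multimodal_dim)(rnn_hidden)
    + layers.Dense(multimodal_dim)(image_feat)
)

# Predict a probability distribution over the next word in the caption.
next_word = layers.Dense(vocab_size, activation="softmax")(fused)
multimodal_part = Model([word_emb, rnn_hidden, image_feat], next_word)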
The BLEU score of the above model is around 37.6, and its main disadvantages are:
Limited Contextual Information: m-RNNs typically use recurrent neural networks (RNNs) to
capture temporal dependencies in sentences. However, RNNs have limitations in modeling
long-range dependencies effectively. Attention mechanisms, on the other hand, can focus on
specific regions of the image and relevant parts of the input sentence, providing richer
contextual information and more accurate descriptions.
Image Features (from a CNN): The model first processes the input image using a Convolutional
Neural Network (CNN). CNNs are well-suited for extracting hierarchical features from images.
In this case, the CNN converts the image into a fixed-length vector representation that captures
important visual information.
Words (from a Vocabulary): The model generates sentences word by word. It takes the previous
words in the sentence (if any) and combines them with the image features to predict the next
word. The words are represented as vectors using an embedding model.
The core of the NIC model is a type of recurrent neural network (RNN) known as Long Short-
Term Memory (LSTM).
The BLEU score of this model is around 42.1.
The main disadvantage of this particular model is its complexity.
NIC Model: The NIC model consists of a Convolutional Neural Network (CNN) for image
feature extraction and a Long Short-Term Memory (LSTM) network for text generation.
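For reference, BLEU scores like the ones quoted above can be computed with NLTK; the sentences below are made-up examples, not the project's evaluation data.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# One tokenized reference caption and one generated caption.
reference = [["a", "dog", "is", "running", "on", "the", "grass"]]
candidate = ["a", "dog", "runs", "on", "the", "grass"]

# BLEU-4 with smoothing so that short sentences do not collapse to zero.
score = sentence_bleu(reference, candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {score:.3f}")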
Top-down visual attention mechanisms have been used extensively in image captioning and
visual question answering to enable deeper image understanding through fine-grained analysis
and even multiple steps of reasoning. In this work, we propose a combined bottom-up and
top-down attention mechanism that enables attention to be calculated at the level of objects and
other salient image regions. This is the natural basis for attention to be considered.
Applying this approach to image captioning, our results on the MSCOCO test server establish a
new state-of-the-art for the task, achieving a BLEU score of 45.7.
The Software Requirements Specification (SRS) plays a crucial role in the software development
process, serving as a cornerstone for the entire project. Its primary role is to provide a detailed
and comprehensive description of what the software system is supposed to accomplish, how it
should behave, and what its constraints and limitations are.
The SRS serves as the foundation for designing, developing, and testing the software system.
It clearly defines the scope of the software project by specifying what the software will and will
not do. This helps manage client expectations and avoid scope creep.
The SRS acts as a reference document for managing changes to the project. Any modifications to
the requirements can be documented and evaluated for their impact on the project's timeline and
budget.
Software architects and designers use the SRS as a starting point for designing the system's
architecture, data structures, and user interfaces.
Image Upload:
Users should be able to upload an image from their device's gallery or capture one using the device's
camera.
Image Preprocessing:
The system should preprocess the uploaded image, ensuring it is compatible with the Inception v3
model. This may involve resizing, normalizing, and formatting the image appropriately.
Inception v3 Integration:
The system should integrate the Inception v3 model to perform image recognition. It should send
the preprocessed image to the model for analysis.
Caption Generation:
Upon analyzing the image, the system should generate a natural language caption that describes the
contents of the image.
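A minimal sketch of the preprocessing and Inception v3 integration steps described above, using the Keras pre-trained model; the 299x299 input size and the pooled 2048-dimensional feature vector are standard for Inception v3, while the file path is a placeholder.

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# Encoder: Inception v3 without its classification head, pooled to a feature vector.
encoder = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def encode_image(image_path):
    """Resize, normalize, and encode an uploaded image into a 2048-d feature vector."""
    img = tf.keras.utils.load_img(image_path, target_size=(299, 299))  # resize for Inception v3
    x = tf.keras.utils.img_to_array(img)
    x = preprocess_input(x)                  # scale pixel values to the range the model expects
    x = np.expand_dims(x, axis=0)            # add a batch dimension
    return encoder.predict(x, verbose=0)[0]  # feature vector of shape (2048,)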
Performance:
Performance requirements for an image caption generator using Flutter and the Inception v3
model are essential to ensure that the system operates efficiently and meets user expectations.
Response Time:
The system should generate captions for images within a maximum response time of X seconds
(e.g., 2 seconds) under normal operating conditions.
Image Processing Time:
The Inception v3 model should process and recognize an image within a maximum time of X
seconds.
Concurrent Users:
The system should support a minimum of X concurrent users without a significant degradation in
response time. Define the expected level of concurrency.
Complementing this, Long Short-Term Memory (LSTM) networks are employed as decoders for
generating textual descriptions. LSTMs, part of the recurrent neural network family, address the
vanishing gradient problem often encountered in standard RNNs. This makes them well-suited for
sequential data generation tasks. In image captioning, LSTMs sequentially produce words, taking
into account the context from both the image features encoded by Inception V3 and the
previously generated words. The synergy between Inception V3 and LSTM decoders creates a
powerful model that bridges the gap between visual and textual information, generating coherent
and contextually relevant captions for images.
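To make this encoder-decoder pairing concrete, the following is a minimal Keras sketch of an LSTM decoder conditioned on the Inception V3 feature vector; the vocabulary size, embedding size, and maximum caption length are illustrative assumptions rather than the project's exact configuration.

import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size, embed_dim, units, max_len, feat_dim = 10000, 256, 256, 34, 2048

# Image branch: project the Inception V3 feature vector into the decoder's space.
image_input = layers.Input(shape=(feat_dim,))
image_proj = layers.Dense(units, activation="relu")(layers.Dropout(0.5)(image_input))

# Text branch: embed the partial caption and run it through an LSTM.
caption_input = layers.Input(shape=(max_len,))
embedded = layers.Embedding(vocab_size, embed_dim, mask_zero=True)(caption_input)
lstm_out = layers.LSTM(units)(layers.Dropout(0.5)(embedded))

# Merge both branches and predict the next word of the caption.
merged = layers.add([image_proj, lstm_out])
hidden = layers.Dense(units, activation="relu")(merged)
output = layers.Dense(vocab_size, activation="softmax")(hidden)

caption_model = Model(inputs=[image_input, caption_input], outputs=output)
caption_model.compile(loss="categorical_crossentropy", optimizer="adam")

At inference time the decoder is run word by word: the caption generated so far is fed back in, and the most probable next word is appended until an end-of-sequence token or the maximum length is reached.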
The server-side infrastructure should be capable of hosting the backend application, including the
Inception v3 model and image recognition processes.
Unified Modeling Language (UML) is a standardized modeling language used in the field of
software engineering to visually represent and document the design and structure of complex
systems. UML provides a set of diagrams and symbols that allow software developers, designers,
and stakeholders to communicate and understand the architecture, behavior, and relationships
within a software system. UML was developed by the Object Management Group (OMG) and has
become a widely accepted and essential tool in the software development process.
UML serves as a powerful tool for visualizing, documenting, and communicating the design and
leading to more effective software development and maintenance. UML diagrams are an integral
part of the software development process, from initial design and modeling to system
implementation and maintenance.
You'll use Flutter, a popular open-source framework for building natively compiled applications
for mobile, web, and desktop from a single codebase.
Flutter allows you to create a cross-platform mobile application that can run on both Android and
iOS devices, ensuring a broad reach.
InceptionV3 is a pre-trained deep learning model used for image recognition. It can analyze and
encode images effectively, extracting meaningful features from them.
In your application, InceptionV3 will take user-uploaded images and provide encoded
representations, which will be used for caption generation.
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) that is suitable
for sequence-to-sequence tasks like natural language processing.
You'll use LSTM to generate textual captions for the images. The encoded image data from
InceptionV3 will serve as input to the LSTM model.
Firebase is a comprehensive mobile and web application development platform provided by
Google.
You'll use Firebase for user authentication, enabling users to create accounts, log in, and secure
their data. Firebase Authentication simplifies the authentication process.
Firebase can also serve as a backend for storing user data, images, and generated captions.
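On the server side, Firebase can be exercised from Python through the Admin SDK; the sketch below assumes a service-account key file and a "captions" Firestore collection, both of which are illustrative placeholders (the Flutter app itself would use the official Firebase plugins).

import firebase_admin
from firebase_admin import auth, credentials, firestore

# Initialize the Admin SDK with a service-account key (path is a placeholder).
cred = credentials.Certificate("serviceAccountKey.json")
firebase_admin.initialize_app(cred)
db = firestore.client()

def save_caption(id_token, image_url, caption):
    """Verify the user's Firebase ID token, then store the generated caption."""
    decoded = auth.verify_id_token(id_token)   # raises an error if the token is invalid
    db.collection("captions").add({
        "uid": decoded["uid"],
        "image_url": image_url,
        "caption": caption,
    })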
First, you need to have a deep learning model that performs the specific task you want. You
can build your model using popular deep learning frameworks like TensorFlow. Once the
model is trained, save it in a format that can be loaded on the server.
You'll need a backend server that hosts your deep learning model. Common technologies for
building backend servers include Flask (Python), Express (Node.js), Django (Python), or FastAPI
(Python). Your server will be responsible for receiving requests from the Flutter app, running
predictions using the deep learning model, and sending back the results.
Create an API endpoint on your backend server to accept requests from the Flutter app. This
endpoint should allow the app to send data to the server for processing. For example, you might
use HTTP endpoints or RESTful APIs.
Ensure that the data you send between your Flutter app and the backend is in a format that both
can understand. JSON is a common choice for this purpose. You might need to serialize your data
on the app side and deserialize it on the server side.
On the backend server, implement the logic to handle the incoming requests. This includes
deserializing the data, using your deep learning model to make predictions, and sending
the results back to the Flutter app.
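Putting these steps together, the following is a minimal Flask sketch of the backend flow: the server loads a saved model, exposes an endpoint, deserializes the JSON request, runs the prediction, and returns the caption as JSON. The /caption route, the "image" field, the model file name, and the generate_caption helper are all illustrative assumptions rather than the project's actual API.

import base64
import io

import tensorflow as tf
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)
model = tf.keras.models.load_model("caption_model.h5")  # saved encoder-decoder model

def generate_caption(image):
    # In the real application the Inception v3 encoder and LSTM decoder run here;
    # a fixed string keeps this sketch self-contained.
    return "a placeholder caption"

@app.route("/caption", methods=["POST"])
def caption():
    payload = request.get_json()                       # deserialize the JSON body
    image_bytes = base64.b64decode(payload["image"])   # image is sent as base64 text
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    return jsonify({"caption": generate_caption(image)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)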
Deploy your backend server to a hosting platform. Common choices include AWS,
Google Cloud, Heroku, or your own server. Make sure the server is accessible via a public
URL.
Test the connection between the Flutter app and the backend to ensure that data is
transferred correctly, and predictions are received as expected. Debug any issues that may
arise.
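One way to exercise the connection before wiring up the Flutter client is a small Python script against the same hypothetical endpoint; the URL, field name, and test image path are assumptions matching the sketch above.

import base64
import requests

with open("test.jpg", "rb") as f:
    payload = {"image": base64.b64encode(f.read()).decode("utf-8")}

response = requests.post("http://localhost:5000/caption", json=payload, timeout=30)
response.raise_for_status()
print(response.json()["caption"])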
By following these steps, you can set up connections between your Flutter app and the deep
learning model for image recognition and caption generation. This will enable your app to
provide meaningful captions for uploaded images.
The implementation involves creating a Flutter application that takes an image, sends it to an
Inception V3 model for image feature extraction, generates a caption using an LSTM decoder,
and then displays the image with the generated caption.
Login Page: Your Flutter application starts with a login page where users can authenticate
themselves if required. This page typically includes fields for entering a username and password,
as well as buttons for signing in. The login page can be implemented using Flutter's
TextFormField widgets for input and an ElevatedButton for the sign-in action. Users need to log in to access
the caption generation feature.
Image Selection: After logging in, the user can navigate to a screen where they can select an
image. You can use Flutter's image_picker package to implement image selection. Users can
choose to either pick an image from the gallery or capture a picture using the device's camera.
This screen includes buttons to access the gallery or camera, and a preview of the selected
image.
Decide on the design and layout of your dashboard. Consider what information you want to
display, how you want to visualize the data, and what interactions you want to enable. You can
use UI frameworks like Flutter to create the dashboard.
Your dashboard will need to communicate with the backend server that hosts the deep learning
model.
Build components in your dashboard to display the data and predictions received from the
backend. You might use charts, graphs, tables, or any other appropriate visualization methods to
present the results to users.
Implement user interactions on the dashboard. Users should be able to input data or customize
parameters for the deep learning model. This may involve adding input forms, sliders, or buttons
that allow users to interact with the model.
If your deep learning model provides real-time predictions or updates, consider implementing a
mechanism for real-time data visualization and updates on the dashboard. Technologies like
WebSockets can be helpful for real-time communication.
If your dashboard should only be accessible to authorized users, implement user authentication
and authorization mechanisms. This ensures that only authorized individuals can access and
interact with the dashboard.
Test the dashboard to ensure that it correctly communicates with the backend and displays the
results as intended. Debug any issues that may arise during testing.
Deploy the dashboard to the intended environment, whether it's a web server, cloud platform, or a
local network.
Ensure that the deployment environment can support the technologies used in your dashboard.
5.5 UI Screenshots:
6.1 Introduction
Testing objectives for the Image Caption Generator project, built with Flutter and using the
Inception v3 model as the encoder and LSTM as the decoder, are crucial to ensure the system's
functionality, reliability, and performance.
Functional Testing:
Objective: Verify that the system functions according to the specified requirements.
Test Use Cases: Test image recognition, caption generation, user authentication, and caption saving
functionality.
Ensure that all features work as expected, including image upload, caption generation, and user
account management.
Performance Testing:
Objective: Evaluate the system's performance under different load conditions.
Test Use Cases: Measure response times, throughput, and scalability under normal and peak usage
scenarios.
Ensure the system can handle a predefined number of concurrent users and process a specific
number of image recognition requests per minute.
Compatibility Testing:
Objective: Ensure that the system functions correctly on different platforms and devices.
Test Use Cases: Test the Flutter application on various mobile devices (iOS and Android) and web
browsers.
Confirm that the system is compatible with different screen sizes and operating systems.
Language and Localization Testing:
Objective: Verify that the system supports multiple languages and localization.
Test Use Cases: Test the application in different languages and ensure that captions can be
generated in various supported languages.
Functional Testing:
Unit Testing: Test individual components, such as the image recognition module and caption
generation module, in isolation to verify their correctness.
Integration Testing: Ensure that different components of the system work seamlessly together,
including the integration between the Flutter app and the deep learning model.
System Testing: Verify the end-to-end functionality of the system, including user interactions,
image upload, recognition, and caption generation.
Performance Testing:
Load Testing: Simulate a high volume of concurrent users and image recognition requests to
evaluate the system's response time and scalability.
Stress Testing: Push the system beyond its normal capacity to identify its breaking points and
measure performance under extreme conditions.
Scalability Testing: Assess how the system handles increasing loads by adding resources and
verifying its ability to scale.
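As a starting point for such load testing, a small script can fire concurrent requests at the caption endpoint and report response times; the endpoint URL, the 20-user concurrency level, and the request count are assumptions, and dedicated tools such as Locust or JMeter would be used for fuller scenarios.

import base64
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:5000/caption"
with open("test.jpg", "rb") as f:
    PAYLOAD = {"image": base64.b64encode(f.read()).decode("utf-8")}

def one_request(_):
    """Send one captioning request and return its latency in seconds."""
    start = time.perf_counter()
    requests.post(URL, json=PAYLOAD, timeout=60)
    return time.perf_counter() - start

# Simulate 20 concurrent users issuing 100 requests in total.
with ThreadPoolExecutor(max_workers=20) as pool:
    latencies = list(pool.map(one_request, range(100)))

print(f"average {sum(latencies) / len(latencies):.2f}s, worst {max(latencies):.2f}s")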
Compatibility Testing:
Platform and Device Testing: Test the Flutter app on various mobile devices, operating systems
(iOS and Android), and web browsers to ensure consistent functionality.
Cross-Browser Testing: Ensure that the web version of the app works correctly on different
browsers and platforms.
Language and Localization Testing:
System evaluation for the Image Caption Generator project involves assessing the system's
performance, functionality, usability, and overall effectiveness. It aims to determine whether the
project has achieved its objectives and if it meets the requirements and expectations of both the
developers and end-users.
Verify that the system successfully recognizes images and generates contextually relevant
captions.
Ensure that all the functional requirements, such as image upload, caption saving, user
authentication, and multilingual support, are met.
Assess the system's response time for image recognition and caption generation under various
load conditions, ensuring it meets performance requirements.
Confirm that the system scales as expected to accommodate growing numbers of users and image
processing requests.
Test the system on various platforms, devices, and web browsers to ensure consistent
functionality and appearance.
Address any compatibility issues that may arise in different environments or screen sizes.
Validate that the system correctly displays content in multiple languages and supports accurate
caption generation in various languages.
Confirm that the localization of the user interface elements is culturally appropriate.
Create test cases for various aspects of the system, including positive and negative scenarios,
boundary cases, and exceptional conditions. Set up the testing environment, including the
hardware, software, and data needed to execute the test cases.
Begin with unit testing, which focuses on testing individual components or functions in isolation.
Verify that each unit works as expected and correct any defects identified.
Conduct integration testing to ensure that the units or components work together seamlessly.
Test the interactions between modules and validate data flows.
Perform system testing to validate the end-to-end functionality of the entire system.
Evaluate the system's compliance with all specified requirements.
Test the system on various platforms, devices, and web browsers to ensure compatibility.
Confirm that the system functions correctly and looks consistent across different environments.
Validate that the system correctly displays content in multiple languages and supports accurate
localization.
Ensure that user interface elements are culturally appropriate.
Performance Test Case - Response Time Under Load:
Test Objective: To measure the system's response time for image recognition and caption
generation under a simulated heavy load.
Test Scenario: Simulate a scenario with a high volume of concurrent users and image recognition
requests.
Test Steps:
Simulate concurrent user connections to the system.
Upload multiple images simultaneously.
Measure the response time for image recognition and caption generation.
Ensure that the system meets the specified response time requirements under load.
Usability Test Case - User Interface Evaluation:
Test Objective: To assess the user-friendliness of the application's interface and navigation.
Test Scenario: Involve real users in usability testing to provide feedback.
Test Steps:
Engage representative users in the test.
Ask users to upload images and generate captions.
Collect feedback on the intuitiveness of the user interface and navigation.
Address any usability issues identified during testing.
Security Test Case - Authentication and Authorization: Test Objective: To verify that user
authentication and authorization mechanisms are robust. Test Scenario: Attempt to access the
system without proper authentication.
Test Steps:
Compatibility Test Case - Cross-Platform Testing: Test Objective: To ensure that the system
works consistently across different platforms and devices. Test Scenario: Test the application on
various mobile devices (iOS and Android).
Test Steps:
Install and run the Flutter app on different mobile devices with varying screen sizes and operating
systems.
Access the web version of the application using various web browsers.
Verify that the system's functionality and appearance are consistent on all tested platforms.
Concluding the Image Caption Generator project, it is evident that the combination of Flutter,
Inception v3, and LSTM has resulted in a sophisticated and user-friendly system that fulfills its
primary objectives. The project aimed to bridge the gap between images and language, enhancing
the storytelling potential of visual content, and it has largely succeeded in doing so. Through
rigorous testing and evaluation, several key conclusions can be drawn.
First and foremost, the system has proven its functionality, successfully recognizing images and
generating contextually relevant captions. The combination of Inception v3's accurate image
recognition and LSTM's natural language processing capabilities has resulted in a reliable and
accurate caption generation process. Users can confidently rely on the system to describe the
contents of their images effectively.
The conclusion of the project also acknowledges the ongoing nature of software development.
Regular monitoring, user feedback, and continuous improvement processes will be essential to
keep the system current and effective. User acceptance testing has been crucial in ensuring that
the system aligns with user expectations and business requirements, and this feedback-driven
approach will continue to guide the system's evolution.
Overall, the Image Caption Generator project has successfully achieved its primary objectives, offering a
reliable, user-friendly, and scalable solution for image recognition and caption generation. As it
continues to evolve based on user feedback and emerging technologies, the project holds the
promise of further enriching the storytelling potential of visual content for a wide range of users
and applications.
Expanding language support beyond the currently supported languages opens the door to a global
audience. Future enhancements could involve incorporating additional languages, dialects, and
regional variations, making the system more inclusive and accessible to users from diverse
linguistic backgrounds.
Incorporating collaboration and content sharing features can encourage users to create and
share content with others more easily. The ability to collaborate on captioned images, co-
editing, and content-sharing functionalities can expand the system's utility in various
collaborative settings.
1. O. Vinyals, A. Toshev, S. Bengio and D. Erhan, "Show and tell: A neural image caption
generator," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 3156-3164, 2015.
2. J. Mao et al., "Deep captioning with multimodal recurrent neural networks (m-RNN)," arXiv
preprint arXiv:1412.6632, 2014.
3. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, "Rethinking the Inception
architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pp. 2818-2826, 2016.
4. Keras. (2021). "Keras: An open-source deep learning API." Retrieved from https://keras.io/.
5. Firebase. (2021). "Firebase: A comprehensive app development platform." Retrieved from
https://firebase.google.com/.
6. Hochreiter, S., & Schmidhuber, J. (1997). "Long short-term memory." Neural Computation,
9(8), 1735-1780.