
Volume 8, Issue 10, October – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Substantiating Precise Analysis of Data to Evaluate Students Answer Scripts

Anshika Singh1, Dr. Sharvan Kumar Garg2
1,2Subharti Institute of Technology & Engineering, Swami Vivekanand Subharti University, Meerut, India

Abstract:- Handwriting recognition refers to interpreting and analyzing handwritten text. In recent years, there have been notable advancements in this field, especially in the context of computerized assessments. As online exams and digital education platforms continue to gain popularity, handwriting recognition plays a crucial role in evaluating students' written answers. Our proposed system automatically recognizes and scores handwritten responses on answer sheets by comparing them to the correct answers provided by a moderator. To achieve this, the system utilizes Optical Character Recognition (OCR) to convert the handwritten text images into computer-readable text. Additionally, BERT is employed to convert the text into embeddings, and cosine similarity takes these embeddings as input and provides a final matching confidence score.

Keywords:- OCR, Google Vision OCR, BERT, Cosine similarity.

I. INTRODUCTION

In universities and colleges, as well as schools, the common way of assessment is manual evaluation of the written examinations attempted by the students. The student's response is evaluated based on their understanding of language, concepts, and other relevant aspects. Professors encounter numerous challenges when manually grading handwritten answer booklets. Answer scripts can be in the form of OMR sheets in the case of multiple-choice question papers, which are easier to evaluate using a computer; but descriptive answer scripts are trickier to evaluate due to the nature of handwriting, the way of expressing ideas or keywords, etc. Hence, the evaluation task demands a significant amount of time and labor.

Ganga Sanuvala et al. present a model for assessing descriptive responses in tests through a three-module evaluation system: 1) Scanning, wherein Optical Character Recognition (OCR) technology is employed to scan the page and retrieve student responses, which are kept in a dataset in the form of a text file. 2) Preprocessing, where NLP is utilized to extract a collection of distinct words corresponding to each sentence in the response by conducting a grammatical check, tokenizing the text, removing stop words, checking for synonyms and antonyms, and performing stemming. 3) Learning, comprising both training and testing. During the training phase, a model is constructed by acquiring knowledge from the scored responses dataset and the answer key. The model is employed to assess the ungraded responses during the testing stage. The ungraded responses are transformed into TF-IDF vectors, then cosine similarity matching is executed using the trained model to award scores[1].

Aqil M. Azmi et al. developed a system that applies LSA (Latent Semantic Analysis) and RST (Rhetorical Structure Theory) and involves two stages: training and testing. In the training phase, pre-scored essays are used to train the Latent Semantic Analysis model; the training set includes essays that were scored by human instructors. During the testing stage, a new essay is processed through several steps, including pre-processing, checking cohesion, counting spelling mistakes, comparing essay length, and applying RST. The essay's overall score is calculated as the weighted sum of the scores assigned to semantic or conceptual analysis, spelling mistakes, and writing style[2].

Muhammad Farrukh Bashir et al. presented a method that employs machine learning (ML) and natural language processing (NLP) techniques such as WMD (Word Mover's Distance), cosine similarity, and MNB (Multinomial Naive Bayes) to evaluate subjective answer responses. Assessment of responses involves employing solution statements and relevant keywords, while an ML model is constructed to forecast the grades of the replies. The comparison score is determined by assessing the solution sentence against each answer sentence using keyword weighting and similarity distance calculations. Keyword-weight computation involves identifying keywords in both the solution sentence and the matching answer sentence; the keyword-weight, which falls within the range of 0 to 1, is derived by computing the percentage of solution-sentence keywords present in the answer sentence and dividing it by 100. The similarity distance is computed using either the Word Mover's Distance (WMD) method or the Cosine Similarity (CSim) approach. The current comparison score is computed by combining the similarity weight and the keyword-weight when the similarity distance falls below a certain threshold, with the keyword-weight considered only when at least 30% of the keywords are present. The overall score is calculated by taking the average of the current comparison scores over all solution sentences[3].

IJISRT23OCT1081 www.ijisrt.com 1179
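The TF-IDF-plus-cosine-similarity matching used by the systems above [1][4] can be sketched as follows. This is a minimal illustration using smoothed IDF weights (as in common implementations); the example sentences and the whitespace tokenization are assumptions for demonstration, not data from the cited papers.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors for a small corpus of whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(tokenized)
    # Smoothed IDF: idf = ln((1 + n) / (1 + df)) + 1
    df = {t: sum(t in doc for doc in tokenized) for t in vocab}
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vectors

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Illustrative answer key and two candidate responses:
key = "the doctrine of lapse was introduced by lord dalhousie"
good = "lord dalhousie introduced the doctrine of lapse"
bad = "photosynthesis converts sunlight into chemical energy"
v_key, v_good, v_bad = tfidf_vectors([key, good, bad])
# The matching response scores higher than the unrelated one.
assert cosine(v_key, v_good) > cosine(v_key, v_bad)
```

In such schemes the score for an ungraded answer is derived from its similarity to the key (or to already-scored answers), with marks awarded above a chosen threshold.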


Volume 8, Issue 10, October – 2023 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165
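One reading of the keyword-weight computation described in [3] — the fraction of a solution sentence's keywords found in the answer sentence, normalized to the range 0 to 1 — can be sketched as below. The keyword list, the example sentences, and the whitespace tokenization are illustrative simplifications, not details taken from that paper.

```python
def keyword_weight(solution_keywords, answer_sentence):
    """Fraction (0 to 1) of the solution sentence's keywords that
    appear in the answer sentence."""
    answer_tokens = set(answer_sentence.lower().split())
    hits = sum(1 for kw in solution_keywords if kw.lower() in answer_tokens)
    return hits / len(solution_keywords) if solution_keywords else 0.0

# Hypothetical keywords for one solution sentence:
w = keyword_weight(["doctrine", "lapse", "dalhousie"],
                   "The doctrine of lapse was introduced by Lord Dalhousie")
```

In [3] this weight is then combined with a WMD or cosine similarity score to form the per-sentence comparison score.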
Sijimol P J et al. proposed a system named HSAES (Handwritten Short Answer Evaluation System), created specifically to automatically detect and assess short responses written on answer papers. The system employs Optical Character Recognition (OCR) technologies to retrieve handwritten texts and utilizes NLP to obtain keywords from human-evaluated sample datasets of handwritten answer papers and answer keys. The model assesses scores by utilizing sentence cosine similarity metrics, where every sentence in an examined answer paper is assigned a corresponding score or marks. The established approach can be utilized to assess the grades of ungraded brief responses. The study emphasizes the necessity of employing automated essay scoring approaches to enhance the caliber of students' writing and diminish the laborious and expensive process of essay evaluation. The model is trained by selecting text files at random and utilizing a heavily weighted response key text file. The semantic similarity between sentences is computed by a method that relies on cosine similarity, and mobile agents are used as a highly efficient paradigm for distributed applications. The objective of the suggested system is to automatically evaluate descriptive text responses through three modules: scanning, training, and testing. The scanning phase processes answer paper PDF files and extracts pertinent characteristics from the answers, employing a model to assess the marks of descriptive answers. Meanwhile, the training phase identifies crucial sections from the answer sheets and maps each processed answer and key to vector spaces using TF-IDF and cosine similarity scores; the learning phase entails generating a trained model by acquiring knowledge from the scored responses dataset and the answer key. The testing phase evaluates unanswered questions by utilizing the knowledge acquired in the training model: it transforms these questions into TF-IDF vectors and then compares their similarity using cosine-based similarity. The grades are determined based on the scores of the sentences that are most comparable to the unscored replies[4].

To address the issue of manual assessment of answer booklets, our paper proposes a system based on BERT and cosine similarity, which will reduce the time and effort of an examiner.

II. METHODOLOGY

Our proposed system (Figure 1) employs sophisticated artificial intelligence techniques for various objectives: Optical Character Recognition (OCR) is employed to convert handwritten answers from images into textual data; text embedding is utilized to convert regular text into a numerical representation using a deep learning model, which aids the system's comprehension of the student's intended message while placing emphasis on significant terms; and vector similarity is employed to evaluate the correctness of answers by assessing the numerical representation of the student's response in relation to that of the correct answer. There are numerous methods for assessment of handwritten answer scripts, majorly relying on matching keywords, comparing sequences of words, and performing quantitative analysis. Evaluating subjective or descriptive answers is still a challenging task which remains unresolved, and it is even trickier in the case of handwritten answers.

Fig 1: Proposed system

Initially, we collect scanned answer scripts submitted by the students, which are then fed to Optical Character Recognition. OCR is an automated technique used to identify and interpret the characters included in digital images by performing feature extraction and classification stages. It is a highly reliable technology in the field of pattern recognition and AI (Artificial Intelligence) which is employed to convert handwritten text, including words, letters, or characters, into a digitized format that can be easily edited, searched, and stored more efficiently[5,9].

We may utilize a CNN, an RNN, or any other deep learning model to develop a customized Optical Character Recognition (OCR) system. This requires training data and developing our own model, which can process each character individually and subsequently combine the characters into a cohesive word. The accuracy of such a model depends on various criteria, like the diversity of the dataset, the size of the dataset, and the pre- and post-processing of data, and creating an accurate model is a challenging task given the cost and time involved. So, it is advisable to utilize existing OCR models like Tesseract, which is trained on large datasets catering to different languages. However, the level of accuracy still does not meet our minimum standards, and a significant amount of pre- and post-processing is still necessary to handle the misclassification of some characters. A better option is to use publicly accessible OCR APIs like Google Cloud Vision OCR, Amazon Textract, Microsoft
Azure Computer Vision API, which are cost-effective compared to any user-made OCR model. They undergo training using extensive datasets and are capable of processing numerous languages, including Indian languages. This enables us to perform OCR with exceptional precision and little time consumption. The Google Cloud Vision OCR API gives 98% accuracy and supports 200+ languages, while Amazon Textract and the Microsoft Azure Computer Vision API give 95% accuracy and support 200 and 120+ languages respectively. Therefore, we choose to utilize the Google Cloud Vision API for OCR.

The Google Cloud Vision API is a machine learning service hosted on the cloud that assists developers in comprehending the content of photos and videos. It employs a range of machine learning methodologies, such as deep learning, to scrutinize photos and videos and derive valuable insights from them. The Vision API provides an extensive array of functionalities, encompassing image classification, text extraction, landmark recognition, facial detection and analysis, etc. The Google Cloud Vision API is a dynamic service that undergoes continuous development, with Google frequently introducing novel functionalities and enhancements. It is an invaluable resource for developers seeking to construct intelligent applications capable of comprehending and manipulating photos and videos[6].

Our next step is text embedding, which involves transforming ordinary text into a numerical format using Word2Vec, LSA, LDA, or more advanced deep learning models like BERT. The main purpose of utilizing text embedding is to encapsulate the fundamental semantic meanings of, and connections between, words, phrases, or texts in a way that is suitable for processing and analysis by machine learning algorithms. This helps the system understand the conceptual meaning of the student's intended answer while giving weight to important terms. Here, each individual word or token is paired with a high-dimensional numerical vector, where each dimension corresponds to a certain attribute or aspect of the word's meaning.

BERT embeddings are compact vector representations of words and phrases acquired by the BERT model through pre-training. BERT is a language model pre-trained using transformers, which is capable of encoding text bidirectionally; it learns to comprehend the contextual connections among words in a given text. BERT embeddings differ from earlier word embedding approaches in that they are contextual, capturing the meaning of a word by considering its position in a sentence and the words surrounding it[7].

BERT embeddings are generated by training the BERT model on a vast collection of text data, enabling it to learn to forecast masked words inside the text and to determine whether two sentences are consecutive. As the model acquires the ability to execute these tasks, it cultivates a profound comprehension of the connections between words and phrases. The BERT embeddings encapsulate this comprehension by utilizing real-number vectors to symbolize the semantic essence of individual words or phrases.

BERT embeddings are applicable to several NLP tasks, including text categorization, question answering, and sentiment analysis. For instance, BERT embeddings can be employed to train a text classifier that differentiates between spam and ham emails, or to build a question answering model that responds to queries on a specific passage of text.

The creation of BERT embeddings transforms the text into vectors of size 768. Next, vector similarity is employed in the system to evaluate the correctness of answers. It can be calculated using various similarity measures like Euclidean distance, cosine similarity, etc.; a similarity measure takes these embeddings and produces a numerical value that quantifies their similarity[8]. Cosine similarity is preferred over Euclidean distance and Jaccard similarity for measuring text similarity due to distinct advantages suited to the unique properties of textual data. Firstly, it normalizes vectors, rendering the measure unaffected by vector magnitude, which is essential in text analysis where document lengths differ significantly. Euclidean distance, on the other hand, can be sensitive to high dimensionality and often suffers from the "curse of dimensionality," which is typical with text data. Cosine similarity is particularly effective in dealing with high-dimensional and sparse data, which is common in text documents; it achieves this by emphasizing the angle between vectors, which measures the similarity of texts based on word usage, a more relevant metric than the raw distance between vectors. Moreover, text data is intrinsically sparse, as most documents include only a small portion of the entire lexicon, and cosine similarity addresses sparsity effectively by taking into account only the vector components that are not zero. Unlike Jaccard similarity, which focuses on set intersections and unions and does not consider word frequencies, cosine similarity also incorporates semantic meaning to some degree, making it a versatile and appropriate option for various natural language processing applications.

Output using Google Vision OCR:

Sample Input Answer: (shown in Fig 2)
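The embedding-and-matching stage described in this section can be sketched as follows. Mean pooling over BERT's final hidden states is an assumption for illustration (the paper does not state its pooling strategy), and the use of the `transformers` library with the `bert-base-uncased` checkpoint is likewise illustrative.

```python
import numpy as np

def bert_embedding(text: str) -> np.ndarray:
    """Mean-pool BERT's final hidden states into one 768-dim vector.

    Mean pooling is one common choice, assumed here for illustration.
    Requires the third-party torch and transformers packages.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(theta) = a.b / (|a||b|): depends only on vector direction,
    not magnitude, which is why it suits length-varying text."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# score = cosine_similarity(bert_embedding(student_answer),
#                           bert_embedding(model_answer))
```

The resulting score lies in [-1, 1] and serves as the matching confidence between the student's response and the moderator's answer.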

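Sample outputs such as the one below can be obtained from the Cloud Vision API with a call of roughly the following shape. This is a sketch, assuming the `google-cloud-vision` client library is installed and Google Cloud credentials are configured; the file path is a placeholder.

```python
def extract_text(image_path: str) -> str:
    """OCR a scanned answer-sheet image with the Cloud Vision API.

    document_text_detection is the endpoint intended for dense and
    handwritten text; it returns the full recognized text block.
    """
    from google.cloud import vision  # third-party client library

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text

# text = extract_text("answer_sheet.jpg")  # placeholder path
```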


Fig 2: Sample handwritten answer

Sample Output By Google OCR:

Answer 1-

b) The Doctrine of Lapse -
This policy was ordopted and introduced by Lord Dalhousie who was the Governor General of India from 1848 to 1856. According to Dalhousies, If a state had never been subordinate or has territory of subordination to the British Government or the states that have been under the grants of the British Goverment , verment were the broad. classification of the states. So, in that been a case if it has no heir they cannot adopt. These annerations were majorly done to provide profit and access.

III. CONCLUSION

We applied our model to more than 100 answers and obtained OCR accuracy above 90% using the Google Vision API. We take the BERT embedding of the student's handwritten answer and the embedding of the correct answer, and compute the cosine similarity between them. As cosine similarity works on vector direction, we obtain an average cosine similarity of 0.72 when the answer does not match and 0.81 when it does. These results suggest that cosine similarity yields very little separation between positively and negatively matched answers. Also, we cannot use cosine similarity for partial scoring of an answer, as it is not a matching percentage. Our work can be further extended to calculate the score for answers in terms of percentage. It also finds applications in grading answers in languages other than English.

REFERENCES

[1]. G. Sanuvala and S. S. Fatima, "A Study of Automated Evaluation of Student's Examination Paper using Machine Learning Techniques," 2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), Greater Noida, India, pp. 1049-1054, 2021. doi: 10.1109/ICCCIS51004.2021.9397227.
[2]. Aqil M. Azmi, Maram F. Al-Jouie, Muhammad Hussain, "AAEE – Automated evaluation of students' essays in Arabic language," Information Processing & Management, vol. 56, pp. 1736-1752, 2019. https://doi.org/10.1016/j.ipm.2019.05.008.
[3]. M. F. Bashir, H. Arshad, A. R. Javed, N. Kryvinska and S. S. Band, "Subjective Answers Evaluation Using Machine Learning and Natural Language Processing," IEEE Access, vol. 9, pp. 158972-158983, 2021. doi: 10.1109/ACCESS.2021.3130902.
[4]. Sijimol P J, Surekha Mariam Varghese, "Handwritten Short Answer Evaluation System (HSAES)," International Journal of Scientific Research in Science and Technology (IJSRST), Print ISSN: 2395-6011, Online ISSN: 2395-602X, vol. 4, pp. 1514-1518, January-February 2018. https://ijsrst.com/IJSRST1841325
[5]. Singh, A., Garg, S.K., "Comparative Study of Optical Character Recognition Using Different Techniques on Scanned Handwritten Images," in: Sharma, D.K., Peng, S.L., Sharma, R., Jeon, G. (eds), Micro-Electronics and Telecommunication Engineering, Lecture Notes in Networks and Systems, vol. 617, Springer, Singapore, 2023. https://doi.org/10.1007/978-981-19-9512-5_38
[6]. Cloud Vision API. https://cloud.google.com/vision
[7]. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." https://arxiv.org/abs/1810.04805
[8]. Measuring similarity. https://developers.google.com/machine-learning/clustering/similarity/measuring-similarity
[9]. What is OCR? https://aws.amazon.com/what-is/ocr/

