A Python Tool For Evaluation of Subjective Answers (ApTeSa)
Dharma Reddy Tetali
Lakshmi Ramana
Research Scholar, MLR Institute of Technology, Hyderabad
ABSTRACT
The marks awarded to the answers of subjective questions vary from evaluator to
evaluator. There are also instances where the same evaluator has awarded different marks
to the same answer. ApTeSa is a tool developed for the automated evaluation of subjective
answers. ApTeSa uses a smart and systematic technique to evaluate the answers. The
tool is developed using PyQt, Python and its modules: pyuic, xlsxwriter, Platypus and
Reportlab. This paper elaborates on the ApTeSa tool, its functionalities and implementation
details, along with an example evaluation of a subject titled 'Structural Engineering'.
ApTeSa works either in a semi-automated mode or in a completely automated mode. The
semi-automated mode has the flexibility to allow the faculty to re-evaluate an answer and
update the results. It is established that the semi-automated mode yields comparatively
better results than the completely automated mode.
Key words: Computer Assisted Assessment (CAA), Multiple Choice Question (MCQ),
PyQt, pyuic, xlsxwriter, Platypus and Reportlab
Cite this Article: Dharma Reddy Tetali, Dr. Kiran Kumar G and Lakshmi Ramana, A
Python Tool For Evaluation of Subjective Answers (APTESA). International Journal
of Mechanical Engineering and Technology, 8(7), 2017, pp. 247–255.
http://iaeme.com/Home/issue/IJMET?Volume=8&Issue=7
1. INTRODUCTION
In the current evaluation system, descriptive answers given by the students are evaluated
manually by the faculty. This is an error-prone process, as different professors are likely to
award different marks to the same answer. As per [1], during the last semester about four
hundred and twenty thousand students of Anna University applied for revaluation of their
answer scripts, as they believed that the marks obtained during the first evaluation were incorrect.
Nineteen percent of the failed students passed after the revaluation. This clearly indicates that
the marks obtained during revaluation differ from those of the initial evaluation: the same
answer scripts are awarded different marks in the revaluation. This example illustrates one of the
problems with manual evaluation of answer scripts. Marks awarded during manual evaluation
depend not only on the content of the answer script, but also on other factors such as handwriting,
style of answering and even the perception of the evaluator.
Manual evaluation is a time-consuming process, and any delay in this process ultimately
leads to a delay in the announcement of results and discomfort to the students. For instance,
Jawaharlal Nehru Technological University conducted the first-semester end examinations for
the final-year engineering students during the third week of November 2016, and the corresponding
results were announced on 9 February 2017 [2]. There is a time gap of about 80 days between the
conduct of the examination and the declaration of results.
The current evaluation system is tedious, as the faculty has to put in a lot of manual effort to read
through and evaluate the scripts of all students. The faculty also has to put in considerable effort
to deal with the following concerns while evaluating the scripts [3]:
• Concerns about equity and fairness
• Concerns about comparability of the evaluations
• Concerns about what to weigh in making judgments
The time needed to correct any given answer script depends on its content. On an average,
if a teacher spends twenty-five minutes to correct an answer script, then it would take about ten
hours and twenty-five minutes to correct 25 scripts [4].
At present, assessment of course outcomes is carried out after the evaluation. The
assessment process involves data entry of the marks of all answers for every student in a class.
This process again takes a considerable amount of time.
Attempts to use computers in educational assessment started in the early 1990s [5].
Bunderson et al. explored the ways in which hardware and software technologies can be used
for effective educational assessment. Their recommendations in this regard can be summarized
as follows:
• Enhance the frequency and variety of help services to the students through online assessments
• Increase the frequency of formative evaluation, and provide incentives to use the evaluation
data for ongoing improvement of educational programs
• Make use of alternate methods of assessment that require human judgment and that measure
more complex, integrated, and strategic objectives
• Foster new item types and uses of portable answer media in order to utilize the existing testing
infrastructure more effectively
• Emphasize the development of a localized infrastructure of Integrated Learning and Assessment
Systems, and the coordinated evolution of central sites for the development of systems, tests, and
research
• Stimulate the professional development of faculty and other professionals who are
knowledgeable and skilled about both the human judgment and the technical aspects of
automated assessment
G. Frosini et al. described a tool to build software systems [6]. Their tool is used to replace
the role of an examiner during typical exams, and to reduce effort as well as anxiety. The tool
uses computerized adaptive testing to increase assessment efficiency. According to their model,
the database consists of queries submitted by an author. A design module handles insertion of
the queries, while an analysis module analyzes the results gathered during the exams and calculates
particular indexes to determine the level of difficulty of the submitted questions.
2. RESEARCH METHODOLOGY
ApTeSa is applied during the first internal examination conducted at MLR Institute of
Technology [10]. The tool is applied to evaluate the answer scripts for the subject 'Structural
Engineering'. The answer scripts of three batches of students are taken for evaluation. Each
batch comprises 40 students. The answer scripts of the first batch of students are evaluated
manually, the scripts of the second batch are evaluated in the 'semi-automated mode' of ApTeSa,
and the 'completely automated mode' is used to assess the scripts of the third batch. A feedback
survey is conducted after the evaluation, and the results of this survey are depicted in the
'RESULTS ANALYSIS' section.
The answer base of ApTeSa stores the keywords and phrases of every model answer, along with
the marks to be awarded for each of their occurrences. The structure of the answer base is
discussed in detail in the later section entitled 'ANSWER BASE OF ApTeSa'.
ApTeSa accepts the students' answer scripts as text files. The name of the text file containing
an answer script is provided as an input to the system. The system gives an appropriate error
message if that file does not exist. Otherwise, it accepts the file and initiates its processing. The
system gets the keywords and the corresponding marks to be awarded from the answer base. It
then checks for the existence of those keywords in the answer file. If a keyword exists in the
given answer, then the marks corresponding to that keyword are added to the marks to be given
to that answer. This process is repeated for the phrases as well. The total marks to be awarded
are calculated by summing up the marks for keywords and phrases. The system has the
flexibility to update these total marks. The system generates a report in the format (either
.pdf or .xls) chosen by the faculty. This report contains the details of the marks awarded for
each keyword and each phrase of the given answer. The faculty can update the marks, if
needed, after going through the report.
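As a point of reference for the description above, the following is a purely illustrative sketch of how one answer-base entry could be represented as a Python mapping. The actual structure is defined in the 'ANSWER BASE OF ApTeSa' section; the Structural Engineering keywords and phrases shown here are invented samples, and the marks follow the per-keyword and per-phrase defaults mentioned later in the RESULTS ANALYSIS section.

# Illustrative shape of a single answer-base entry (not the real ApTeSa store).
answer_base = {
    "001": {                                        # question id, as in the example report
        "keywords": {"beam": 0.25, "stiffness": 0.25, "load": 0.25},     # default 0.25 per keyword
        "phrases": {"bending moment": 0.5, "shear force": 0.5},          # default 0.5 per phrase
    }
}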
ApTeSa.ui is the primary user interface, which is shown in the following figure.
Generate Report
This push button instantiates the user interface genrep.ui, which is used to generate the report
either in .PDF format, or in .XLS format, as chosen by the user.
The pyuic tool is used to automatically generate the Python code for all of the above-mentioned
user interfaces. This automatically generated code is imported into the main programs. These main
programs are briefly described in the following section.
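As context for those modules, the following minimal sketch illustrates the typical pyuic workflow: the .ui file is converted to Python code, and the generated class is then imported and wired to a window. The command shown in the comment, the module name aptesa_ui, the class name Ui_ApTeSa and the choice of PyQt5 are assumptions for illustration, not the actual ApTeSa sources.

import sys
from PyQt5 import QtWidgets

# The UI module is generated beforehand, e.g.: pyuic5 ApTeSa.ui -o aptesa_ui.py
from aptesa_ui import Ui_ApTeSa

class ApTeSaWindow(QtWidgets.QMainWindow):
    def __init__(self):
        super().__init__()
        self.ui = Ui_ApTeSa()
        self.ui.setupUi(self)    # builds the widgets designed in Qt Designer

if __name__ == "__main__":
    app = QtWidgets.QApplication(sys.argv)
    window = ApTeSaWindow()
    window.show()
    sys.exit(app.exec_())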
Question_main.py:
This Python program accepts the Question ID, the Question Description and the total marks that
can be awarded to the question through the line edits of the user interface, and inserts those
values into the Question entity. Likewise, Phrase_main.py and Keyword_main.py deal with the
details of phrases and keywords.
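A hedged, self-contained sketch of this behaviour is given below: the values typed into the line edits are inserted into a 'Question' entity. A sqlite3 table is used purely as a stand-in store, and the line edits are created in code rather than loaded from the pyuic-generated class, since the actual ApTeSa user interface and answer-base definitions are described elsewhere in the paper.

import sqlite3
import sys
from PyQt5 import QtWidgets

class QuestionForm(QtWidgets.QWidget):
    """Collects Question ID, description and total marks, then stores them."""

    def __init__(self, db_path="answer_base.db"):
        super().__init__()
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS Question "
            "(qid TEXT PRIMARY KEY, description TEXT, total_marks REAL)"
        )
        # In ApTeSa these line edits come from the pyuic-generated UI class;
        # they are created directly here so the sketch stays self-contained.
        self.qid = QtWidgets.QLineEdit()
        self.description = QtWidgets.QLineEdit()
        self.total_marks = QtWidgets.QLineEdit()
        save_button = QtWidgets.QPushButton("Save question")
        save_button.clicked.connect(self.save_question)

        layout = QtWidgets.QFormLayout(self)
        layout.addRow("Question ID", self.qid)
        layout.addRow("Description", self.description)
        layout.addRow("Total marks", self.total_marks)
        layout.addRow(save_button)

    def save_question(self):
        # Insert the line-edit values into the Question entity.
        self.conn.execute(
            "INSERT OR REPLACE INTO Question VALUES (?, ?, ?)",
            (self.qid.text(), self.description.text(), float(self.total_marks.text())),
        )
        self.conn.commit()

if __name__ == "__main__":
    app = QtWidgets.QApplication(sys.argv)
    form = QuestionForm()
    form.show()
    sys.exit(app.exec_())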
Answer_main.py:
This program uses the isfile() function of the os.path module [11] to verify whether the answer
script file exists or not.
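A minimal sketch of this check is shown below, using os.path.isfile as cited in [11]; the function name and the error-message text are illustrative.

import os.path

def read_answer_script(filename):
    # Report an error for a missing answer script, otherwise return its text.
    if not os.path.isfile(filename):
        print(f"Error: answer script '{filename}' does not exist")
        return None
    with open(filename, encoding="utf-8") as fh:
        return fh.read()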
Eval_main.py
This program obtains the details of the keywords and phrases of the answer to be evaluated from
the answer base. It then verifies the existence of the required keywords and phrases in the answer
script being evaluated. If they exist, it adds the corresponding marks to the 'marks to be awarded'.
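A hedged sketch of this scoring step follows: every keyword or phrase found in the answer script contributes its configured marks to the total. The dictionary shapes mirror the illustrative answer-base entry shown earlier and are not the actual ApTeSa data structures.

def score_answer(answer_text, keywords, phrases):
    """Sum the marks of all keywords and phrases present in the answer text."""
    text = answer_text.lower()
    marks = 0.0
    for keyword, keyword_marks in keywords.items():   # e.g. {"beam": 0.25, ...}
        if keyword.lower() in text:
            marks += keyword_marks
    for phrase, phrase_marks in phrases.items():      # e.g. {"shear force": 0.5, ...}
        if phrase.lower() in text:
            marks += phrase_marks
    return marks

# Example: score_answer("The bending moment of the beam ...",
#                       {"beam": 0.25}, {"bending moment": 0.5}) returns 0.75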
Genrep_main.py
This program generates the details of the awarded marks in the form of either a .PDF report or
a .XLS report. It imports Python's xlsxwriter [12] module to generate the report in .XLS format,
and Python's Reportlab [13] and Platypus [14] modules to generate the report in .PDF format.
'reportlab.platypus.SimpleDocTemplate.multiBuild' is used to generate the report body,
'xlsxwriter.Workbook' is used to create the workbook and 'workbook.add_worksheet' is used
to add the worksheets to the workbook.
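The sketch below illustrates both report paths under stated assumptions: the per-keyword marks data and file names are invented examples, while the library calls themselves (SimpleDocTemplate.multiBuild, xlsxwriter.Workbook, add_worksheet) are the ones named above.

import xlsxwriter
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import Paragraph, SimpleDocTemplate

awarded = [("beam", 0.25), ("shear force", 0.5)]   # (keyword/phrase, marks) - sample data

def write_pdf_report(path="report.pdf"):
    styles = getSampleStyleSheet()
    story = [Paragraph("Marks awarded for question 001", styles["Heading1"])]
    for item, marks in awarded:
        story.append(Paragraph(f"{item}: {marks}", styles["Normal"]))
    SimpleDocTemplate(path).multiBuild(story)       # builds the report body

def write_xls_report(path="report.xlsx"):
    workbook = xlsxwriter.Workbook(path)            # create the workbook
    worksheet = workbook.add_worksheet("Question 001")
    worksheet.write(0, 0, "Keyword/Phrase")
    worksheet.write(0, 1, "Marks")
    for row, (item, marks) in enumerate(awarded, start=1):
        worksheet.write(row, 0, item)
        worksheet.write(row, 1, marks)
    workbook.close()

if __name__ == "__main__":
    write_pdf_report()
    write_xls_report()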
8. RESULTS ANALYSIS
The following is part of the .pdf report generated after evaluating the answer to the question
with question id 001. By default, ApTeSa assigns 0.5 marks to each phrase and 0.25 marks to
each keyword. ApTeSa allows these default values to be changed, so that each keyword or phrase
can have separate marks.
ApTeSa is tested on a sample of 120 students at MLR Institute of Technology. The students
were divided into three batches, each comprising 40 students. The answer scripts of
these students for the subject ‘Structural Engineering’ were evaluated in the following three
modes of evaluation:
• Complete Manual evaluation
• Semi-Automated evaluation using ApTeSa
• Completely automated evaluation using ApTeSa
A feedback survey is conducted after the above three evaluations, and the following graphs
depict the results of this survey.
Feedback survey results (number of students, out of 40, selecting each rating in the three graphs):

Rating            Graph 1    Graph 2    Graph 3
Very Good             5          4          6
Good                  9          7         12
Satisfactory         13         12         16
Unsatisfactory        7         10          4
Not known             6          7          2
CONCLUSION
Analysis of the results clearly indicates that the semi-automated evaluation method
dominated the other two evaluation methods. This is expected, since the semi-automated
evaluation involves a high-level manual review after the automated evaluation. Even though
the semi-automated evaluation is not as fast as the completely automated evaluation, it can still
be adopted, since it is more efficient than complete manual evaluation.
REFERENCES
[1] http://timesofindia.indiatimes.com/city/chennai/Engineering-grades-rise-after-revaluation-
of-papers/articleshow/19718892.cms
[2] http://jntuh.ac.in//bulletin_board/II_III_IV_Year_BTech_BPharm_2016_17.pdf
[3] https://www.theatlantic.com/national/archive/2013/01/why-teachers-secretly-hate-
grading-papers/266931/
[4] http://indianexpress.com/article/cities/mumbai/evaluation-season-school-teachers-sweat-
it-out-for-a-pittance/
[5] C.V. Bunderson, J.B. Olsen and A. Greenberg, Computers in Educational Assessment: An
Opportunity to Restructure Education Practice. Institute for Computer Uses in Education, 1990.
[6] G. Frosini, B. Lazzerini and F. Marcelloni, Performing automatic exams. Computers &
Education, vol. 31, pp. 282, 1998.
[7] M.C. Rodriguez, Three options are optimal for multiple-choice items: A meta-analysis of
80 years of research. Educational Measurement: Issues and Practice, 24(2), 2005, pp. 3–13.
[8] http://caacentre.lboro.ac.uk/
[9] Prof. Megha Mehta, An Evaluation of Budgetary Control at Bhima Co-Operatives Sugar
Industry Ltd, Pune. International Journal of Management (IJM), 5(2), February 2014, pp. 54–60.
[10] Naveenkumar Jayakumar, Farid Zaeimfar, Manjusha Joshi and Dr. Shashank D. Joshi, A
Generic Performance Evaluation Model for the File Systems. International Journal of Computer
Engineering and Technology, 5(1), January 2014, pp. 46–51.