A Mental Health Chatbot For Regulating Emotions (SERMO) - Concept and Usability Test

This article has been accepted for publication in a future issue of this journal, but has not been
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TETC.2020.2974478, IEEE
Transactions on Emerging Topics in Computing
JOURNAL OF LATX CLASSFILES, VOL.14, NO. 8,AUGUST 2021 1
A Mental Health Chatbot for Regulating

Emotions (SERMO) - Concept and Usability Test
Kerstin Denecke, Sayan Vaaheesan, and Aaganya Arulnathan
Abstract—Mental disorders are widespread in countries all over the world. Nevertheless, there is a global shortage in human
resources delivering mental health services. Leaving people with mental disorders untreated may increase suicide attempts and
mortality. To address this matter of limited resources, conversational agents have gained momentum in the last years. In this work, we
introduce SERMO, a mobile application with integrated chatbot that implements methods from cognitive behaviour therapy (CBT) to
support mentally ill people in regulating emotions and dealing with thoughts and feelings. SERMO asks the user on a daily basis on
events that occurred and on emotions. It determines automatically the basic emotion of a user from the natural language input using
natural language processing and a lexicon-based approach. Depending on the emotion, an appropriate measurement such as
activities or mindfulness exercises are suggested by SERMO. Additional functionalities are an emotion diary, a list of pleasant activities,
mindfulness exercises and information on emotions and CBT in general. User experience was studied with 21 participants using the
User Experience Questionnaire (UEQ). Findings show that efficiency, perspicuity and attractiveness are considered as good. The
scales describing hedonic quality (stimulation and novelty), i.e., fun of use, show neutral evaluations.
Index Terms—Conversational user interface, natural language processing, sentiment analysis, mental health, mHealth.
1 I NTRODUCTION
M ENTAL disorders affect 29% of the global population

in their lives [1]. Every year, 25% of adults and
10% of children are affected [2]. The most common mental
giving the user appropriate written answers in a language
that he or she understands. Chatbots were mainly used
in marketing to enhance customer experiences. A recent
disorders are depressive disorders and anxiety disorders. In review of Laranjo et al. confirms that the use of conver-
2017, 322 million people suffered from depressive disorders sational agents with unconstrained natural language input
and 264 million from anxiety disorders worldwide [3]. Apart capabilities for health-related purposes is an emerging field
from the fact that mental disorders impact on people’s of research [11]. Vaidyam et al. concluded that the mental
quality of life, they are one of the most common causes of health field could use conversational agents in psychiatric
occupational disability [4] leading to high economic costs. In treatment [12]. According to a systematic review [13], there
Switzerland, it is estimated that more than CHF 11 billion are currently 41 different chatbots for mental health reported
- including indirect costs that are associated with untreated by 53 studies. About 43% of those chatbots were imple-
mental disorders - are spent per year, e.g. for reduced labour mented in the United States of America. They focused on
productivity and suicide complications [5]. different use cases including therapy (e.g. Woebot), training
Mental disorders are usually treated by pharmacother- (e.g. LISSA), and screening (e.g. SimSensei), and mainly on
apy or psychotherapy [6]. However, there is a global short- users suffering from depression or autism. So far, there is
age of human resources for delivering such mental health only limited clinical evidence that mental health chatbots
services. In developed countries there are nine psychia- are reducing symptoms. Woebot was tested with students
trists per 100,000 people available [7], while in developing with depression [14]. The students who used Woebot sig-
countries there is one psychiatrist per ten million people nificantly reduced their symptoms of depression over the
[8]. According to the WHO, about 45% of people in devel- study period as measured by the depression questionnaire
oped countries and 15% of people in developing countries PHQ-9 while those in the information control group did not.
have access to psychiatric services [9]. Leaving people with The control group had to read a self-help book.
mental disorders untreated can increase suicide attempts Many existing mental health chatbots restrict the con-
and mortality [10]. There is a need for providing support versation to a limited set of patterns in form of predefined
and self-help between therapeutic sessions and for patients answers. This limits the expressiveness of interactions for
waiting for treatment by mental health service providers. the user. Furthermore, currently only mental health chatbots
To address this problem, conversational agents have arisen in English are available.
interest in recent years, particularly in psychoeducation, In this paper, we are focusing on conversational agents
behaviour change and self-help. with unconstrained natural language input in German.
Conversational agents or chatbots are text-based dia- More specifically, the aim of this work is to develop a mobile
logue systems integrated in mobile apps or web pages. application with integrated chatbot that supports the user in
The chatbot simulates a realistic conversation partner by regulating his or her emotions. Emotions will be recognized
directly from the conversation which requires methods for
• Bern University of Applied Sciences, Bern, Switzerland . analysing free text input with respect to emotions. The
E-mail: [email protected]
following questions serve as a basis for the development
2168-6750 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: UNIVERSITY OF BIRMINGHAM. Downloaded on May 10,2020 at 19:47:07 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TETC.2020.2974478, IEEE
and implementation of the application: stand-alone software, e.g. Wysa and WoeBot. In contrast
to other digital interventions in mental health, chatbots
• How can emotions be recognized in chatbot conver-
aim to increase the adherence to the intervention [14] [20]
sations?
[21]. Chatbots process the user input, and offer responsive,
• How can detected emotions be used to support a user
guided conversations and advice to help users in current
in regulating his / her emotions?
mental health challenges. The bots normally ask a user on
The paper is structured as follows. Section 2 describes a daily basis on his emotions, thoughts, and behaviour.
the background of cognitive behaviour therapy (CBT) and Some systems passively track users’ movements via the
introduces the state of the art in conversational agents accelerometer integrated in the phone [22].
for mental health. Furthermore, an overview on emotion The chatbot Wysa [22] provides a mood tracker and can
analysis from natural language text is given. In Section 3, detect negative moods. If necessary, it suggests a depression
we describe the requirement analysis process, the system test and recommends seeking professional help. To sup-
development and used frameworks. The mobile application port the relief of anxiety, depression and stress, there are
including the chatbot is introduced in Section 4. A usability mindfulness meditation exercises integrated in the app. The
test was performed with methods and results summarized chatbot was tested in a study with a total of 129 participants
in Section 5. The paper finishes with discussions (Section 6) [22]. The participants were divided into two groups (fre-
and conclusions (Section 7). quent and occasional users). The quantitative results show
that frequent users had a higher, average improvement in
their mood than the group of occasional users. Two thirds
2 BACKGROUND of the users perceived the app as positive. They replied that
2.1 CBT and emotion regulation the conversation with Wysa was helpful and stimulating.
We grounded the development of the application, especially Woebot is a chatbot that uses CBT strategies to help users
the information basis of the chatbot, on knowledge and cope with symptoms of anxiety and depression [14]. The
best practices of CBT. CBT was initially developed for the chatbot allows to enter emotions by selecting terms from a
treatment of mild to moderate depression [15]. However, list of suggestions. This limits the user to comprehensively
it is also in use for treating other mental disorders such express his or her current emotions and feelings. In current
as anxiety disorders, panic disorders, bipolar disorders and publications, it is not mentioned on which psychological ev-
post-traumatic disorders. The basic assumption of CBT is idences the system is based on. In a study comparing people
that psychiatric disorders arise and are maintained due who interacted with Woebot versus a group of people who
to distorted cognition (thoughts and attitudes). In modern read a self-help book 12 times over two weeks , those who
CBT, the treatment pays increasingly more attention to used Woebot had a reduction in their symptoms. Another
the emotional aspects since emotions and their regulation chatbot, Replica1 , allows users to reply in their own words
impact on mental health [16]. to chatbot comments, but the chatbot does not understand
Basically, emotions are considered subjective perceptions the context and therefore gives inappropriate answers or
that persist over a short period of time and relate to specific changes the subject.
events, persons or objects. According to Richard Graph’s C- Mental health apps are easy accessible and easy to use.
I-E (Cognition, Intuition, Emotion) theory, there are seven They can be consulted whenever users feel sad, anxious,
basic emotions: Fear, disgust, anger, joy, grief, guilt and stressed, or just want a distraction. They are also signif-
shame [17]. Finally, emotion regulation concerns influencing icantly less costly than face-to-face interventions such as
the type, intensity and duration of emotions into a certain CBT [21]. Vaidyam et al. found out that chatbots showed
direction. potential to support psychoeducation and self-adherence
[12]. Users are satisfied by interacting with such systems, in-
dicating that they could provide an extension to psychiatric
2.2 Conversational agents in mental health treatment. Limitations of the existing chatbots are that they
Existing studies and reviews show that mobile apps with are only available in English and the chatbots are asking for
integrated CBT can be successfully used for the treatment of emotions, but are normally incapable of determining emo-
psychiatric disorders. In a review by the Bolton University, tions based on natural language user input. In this paper, we
the effectiveness of mHealth applications with CBT was introduce SERMO, a chatbot with integrated CBT interven-
investigated. Half of the studies focused on the treatment of tions in German enabling unrestricted natural language user
depression with generally positive results [18]. A marginal- input. It differs from the available systems by integrating
ized control study was conducted with 300 participants natural language processing (NLP) and emotion analysis
using the mobile Web app MoodHacker [19]. An online self- methods in order to automatically determine emotions from
assessment survey was realized at the beginning and after the user input. In contrast to existing decision-tree based
six weeks. During the six-week follow-up, significant effects systems, our system does not rely on strict patterns, but on
of depressive syndromes, behavioural activation, negative syntactic and semantic similarities between user input and
thinking, knowledge, work productivity, absenteeism and stored expressions. Furthermore, with this paper we address
disability were observed. the issue that existing mental health chatbots are rarely
Clinical outcomes for the use of mental health chatbots described in sufficient detail [13]. We describe details on the
are still rare as a recent systematic review by Abd-alrazaq et underlying psychological evidences, technical implementa-
al. [13] shows. The authors identified 41 different chatbots
in mental health, mainly implemented as rule-based and 1. https://play.google.com/store/apps/details?id=ai.replika.app
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 3
tion and dialogue structure. This will allow researchers and applied recursive deep models to predict sentence level
practitioners to judge the quality of the underlying evidence sentiment. With a complicated treebank annotation, the
base. proposed method has recognized the negated sentiment in
a better way and achieved more than 80% overall accuracy
[30].
2.3 Emotions in chatbots and natural language
Sentiment and emotion analysis in a medical context
Emotions can be detected in text [23], voice [24] and faces has been mainly addressed for web content. Denecke and
[25] with varying reliability. In general, emotion recognition Deng reviewed the state of the art and studied the chal-
is a two-step procedure which involves extraction of signifi- lenges of sentiment analysis in medical settings [31]. They
cant features and classification. This general principle holds found out that given the varying usage and meanings of
true for all three sources of emotion detection, text, voice terms, sentiment analysis from medical documents requires
or faces, but the relevant features differ. Feature extraction a domain-specific sentiment source and complementary
determines a set of independent attributes, which in sum context-dependent features to be able to correctly interpret
can characterize an expression of an emotion. Features for the implicit sentiment. The challenges of sentiment and
emotion recognition from faces include for example specific emotion analysis in mental health chatbots have not yet
distances or angles in the face determined from recorded been considered so far. Furthermore, health applications
images (e.g. angle of eyebrows [26]). For classification in equipped with emotion and sentiment analysis are still
emotion recognition the features are mapped to one of missing.
various emotion classes like anger, joy, sadness, disgust, Although there are limitations of lexicon-based ap-
surprise, etc. The feature attributes and the chosen classifier proaches, we decided on the current implementation to inte-
impact on the classification quality. The classification is often grate only a lexicon-based approach into SERMO. The main
challenging since multiple emotions can be expressed at the reason is that training data in German is missing, while
same time [27]. emotion lexicons for some emotions are already available.
We are focusing on emotions expressed in natural lan- SERMO integrates a method to analyse a user statement to
guage. There are two basic procedures: lexicon-based or select an appropriate, motivating or encouraging response
machine learning-based methods for analysing emotions in when given a specific user emotion. The system is imple-
text. A lexicon-based method uses an emotion term lexicon mented in a way, that in future, the system could create
that for each emotion contains terms that could be used to in parallel an annotated data set: Emotions recognized by
express this particular emotion. Through lexicon lookup, SERMO have to be confirmed by the user which could be
the input sentence is matched with the lexicon terms. The used to collect labeled data.
matches are aggregated to determine an emotion. A problem
with this method is, that the context remains unconsidered.
This is sometimes important for a correct emotion classifica- 3 M ETHODS
tion since a term can change meaning in different contexts.
In contrast, machine learning-based approaches are based In this section, the methods for collecting requirements and
upon labeled training data and could consider the context. developing the application SERMO are described. Methods
Prominent examples for algorithms are Naive Bayes and for testing the usability of the application are introduced in
Support Vector Machines (e.g. [23]). section 5 along with the test results.
Analyzing emotions or sentiments resulting from inter-
actions with chatbots has so far only rarely been addressed.
3.1 Requirement analysis
There are multiple ways to enable a chatbot to choose an
emotion category for a response. On the one hand, the The application requirements were determined by means of
chatbot can be equipped with a personality and background a literature search, interviews and discussions with experts.
knowledge. On the other hand, training data can be used to More specifically, four psychologists of different clinics in
find the most frequent response emotion category for an Switzerland and Germany were interviewed. The inter-
emotion in a given response and use this as the response views focused on the current treatment of mental disor-
emotion. Previous research by Skowron proposed affect ders and on current practices to accompany patients in
listeners, i.e. conversational systems that can respond to the time between two therapeutic sessions. The collected
users’ utterances on a content-, but also on an affect-level requirements formed the framework of the implementation.
[28]. Zhou et al. [29] describe an emotional chatting machine The literature search focused on mental diseases in general,
that can generate appropriate responses fitting in content psychotherapy, mental health applications with conversa-
and emotion to a users’ response. The architecture consists tional user interfaces, emotion recognition in free text, and
of a recurrent neural network enabled with GRU cells with sentiment analysis. Results were retrieved from Pubmed
attention mechanism. It contains three different mechanisms and Google Scholar.
for generating responses with a specific emotion: Exter- Furthermore, persons suffering from mental diseases
nal knowledge serves to model emotions explicitly using were asked for app functionalities that would be of help
an external emotion vocabulary. Internal memory captures to deal with their disease. Four young people aged 16-25
emotion dynamics and finally, different emotion categories with diagnosed depressions and one 59-year-old man with
are represented as embedded vector. Socher et al. introduced bipolar disorder were interviewed. The patients had been
a sentiment treebank that includes fine-grained sentiment suggested by the psychologists considering their current
labels to parse trees of sentences. On this treebank, they mental state.
3.2 System development document her emotions and to deal consciously with her
The chatbot was developed using the Syn.Bot frame- problems. For this purpose, she desires an app that sup-
work (https://www.nuget.org/packages/Syn.Bot/). It con- ports her in keeping a diary of thoughts and emotions and
tains OSCOVA (https://oscova.com) and an official SIML that supports in coping with her mental health problems
(Synthetic Intelligence Markup Language) interpreter. The between therapy sessions.
framework is platform independent. We use OSCOVA to The functional requirements can be grouped into six
realize the chatbot. Compared to other chatbot frameworks, categories. They concern the login, the chatbot, diary, list of
OSCOVA does not have a hybrid decision tree or does not activities, information provision and notification. It should
rely on strict patterns. Instead, it relies on the semantic be possible to log in to the app using a code. The diary
and syntactic similarities between user input and stored should allow to enter a daily goal, to record the daily mood
expressions. OSCOVA also allows developers to use ma- and to document an event. The event is stored according
chine learning and NLP functions. Another advantage is to the ABC schema (situation, thoughts, emotions) [32]. De-
that OSCOVA does not require a connection to the online pending on the users’ emotion, the system should suggest
API and can therefore be used in an offline setting. activities or exercises to the user and should contact the user
OSCOVA consists of five different components: expres- at least daily. Further, the user should be enabled to access
sions, entities, contexts, intents, and dialogues. An expression diary entries and to get an overview on the mood develop-
is a pattern that defines user input. The expression attribute ment over the past month. Desired chatbot functionalities
is used to decorate an intent method by triggering user include:
input expressions. Entities are pieces of information within • User can frame answers in his own words.
a user message that a developer would like to extract. • User can select predefined answers.
Entities are associated with an entity type like “dateTime” • Chatbot recognizes emotions in natural language
and “emotion”. The context represents the current context of user input.
the conversation, i.e. the conversation state of a user session. • Chatbot suggests activities and exercises for regulat-
An intent is any action that the bot is supposed to execute ing emotions.
when the user message is similar to an expression. A dialog • Chatbot creates an entry for an event.
in OSCOVA is used to group together a collection of related • Chabot stores the mood of the user on a daily basis.
intents and actions. Dialogues determine which responses • Chatbot stores specified goals of a user.
must be returned to the chatbot’s user input. • Chatbot reminds user on appointments.
The application was developed with Xamarin.Forms
(https://docs.microsoft.com/en-us/xamarin/xamarin- Non-functional requirements include that the system
forms/). Xamarin.Forms was chosen because the should run on Android 7.1 or iOS 12.2.
OSCOVA framework exists as .NET NuGet to facilitate
referencing OSCOVA in .NET projects. An advantage is that 4.2 Architecture
Xamarin.Forms provides a cross-platform interface toolkit
for .NET developers. Large parts of the development results Figure 1 shows the system architecture of SERMO. The
can already be used for all platform implementations. We collected data on events, daily mood etc. are stored in a
developed a native Android platform application with structured manner in an SQLite database in the internal
Xamarin.Forms. The underlying database uses SQLite. storage of the mobile phone. User input from the chat is
processed using the OSCOVA interpreter. It includes an
NLP component that realises the emotion recognition. The
4 SERMO - S YSTEM OVERVIEW OSCOVA interpreter determines the context and intentions.
In the following, we are going to describe the scenario un- The NLP component, in particular the emotion recognition
derlying the system development. It was developed based method, exploits a knowledge base which is a lexical re-
on the collected requirements that are also summarized. Af- source with lists of words that express the emotions that
terwards, the system architecture and functionalities are in- can currently be detected. More details on the emotion
troduced. Finally, technological details on the implemented recognition algorithm are provided in section 4.4
chatbot conversation and emotion analysis algorithm are
provided. 4.3 Functionalities
Following the collected requirements, SERMO provides four
4.1 Requirements main functionalities: 1) interaction with the chatbot, 2) pro-
From the expert and patient interviews, we decided on the vision of activities and exercises to train the attentiveness, 3)
following scenario: Mona is a 25-years old student. During diary of events with associated emotions and 4) information
the semester, she has to pass many exams and projects to provision.
hand in. She is under stress and constantly has negative
thoughts and emotions regarding her ability to complete 4.3.1 Chatbot
the studies successfully. She reproaches herself and blames In interaction with the user, the chatbot asks for the current
herself for everything. One day, she collapses at school. mood, runs an ABC dialogue to retrieve information on
As a consequence, she starts psychotherapy and sees a a current event that impacted on the user as well as the
psychotherapist once a week for an hour who exploits CBT emotion associated with the event. Based on this, it suggests
methods. In addition to the sessions, Mona is asked to suited activities and exercises (see Fig. 2). The content of the
chatbot was grounded on scientific literature, expert input

and assessment of existing apps. SERMO integrates four
contexts (where a context is a topic of a specific part of a
conversation): daily mood, emotion recognition, supporting
measurements and information / other activities.
Patients with depression or bipolar disorder often suffer
additionally from mood disorders. Keeping an emotion
diary can help people monitor and control their emotions
[33]. Previous research showed that mobile mood applica-
tions helped users to experience patterns in their mood,
to improve their mood and to cope better with stress and
emotions [34]. Therefore, SERMO asks daily the user about
his mood. It can be selected on a slider scale from one to
five. When the mood is rather positive, tasks are suggested
to train the strengths and resources of the user. In case the
mood is rather negative, the chatbot asks for the triggering
Fig. 1. System architecture of SERMO. The knowledge base contains event of the mood. For this purpose, the ABC theory has
the chatbot brain, i.e. information on how to react to an emotion. Further,
a list of emotion terms is captured in the knowledge base as a basis for been implemented in the chatbot.
the emotion analysis algorithm. The SQLite database is used to store The ABC theory was developed by Albert Ellis. It follows
diary entries. the approach that consciously or unconsciously perceived
stimuli are evaluated and these evaluations lead to cer-
tain feelings and behaviors [32]. The Activating Event (A)
represents an external or internal event or situation. The
Belief (B) comprises attitudes and thoughts regarding the
event and Consequences (C) reflect feelings and behaviors.
SERMO collects in a dialogue with the user information on
A, B, and C and exploits the user input to determine the
emotion. The emotion analysis (see section 4.4) currently
distinguishes five emotions: fear, anger, sadness, joy, grief.
In case the user decides to regulate the emotion, SERMO
first offers information on the detected emotion and then
suggests exercises or activities. For example, if the emotion
anger is detected, SERMO asks whether it is a justified anger
or not. If justified, the user is asked whether he or she would
like to change the situation. If unjustified , the chatbot tries
to distract the user by asking positive questions such as
”What are you proud of?”, ”What do you like doing in your
spare time?”.
For realizing the chatbot conversation, 13 dialogues have
been developed with OSCOVA. They are described in table
1. They cover the various interactions triggered by an emo-
tion or mood expressed by the user. An example of the flow
of trigger events is shown in figure 3.
4.3.2 Activities and exercises

Several studies show that certain activities have a positive
impact on patients of various disorder groups [35]. To reflect
this aspect, SERMO provides users with a list of pleasant
activities divided into four categories: Mindfulness exer-
cises, relaxation exercises, leisure activities and others that
are accessible also outside the chat conversation. The list
was adopted by a psychologist. A list of pleasant activities
is helpful for users who are unable to develop sufficient
ideas for positive activities. The user can mark activities as
Fig. 2. Screenshot of the chat: SERMO asks whether the user would favorites within the app. After performing an activity, the
like to get some suggestions on how to improve the mood. It suggests user can enter his experiences and emotions associated with
hugging someone, listening to music, social interaction, smiling. Once the activity in his SERMO diary.
the user selects an activity, SERMO will explain, why this activity is Mindfulness exercises are a form of meditation. They
helpful for changing the mood.
are increasingly used within CBT. In this context, mind-
fulness means to focus one’s attention intentionally and
non-judging on the conscious experience of the present
Fig. 3. Flow chart of the Emotion Dialog. Following the ABC theory, the user is asked about an event, his thoughts and feelings. SERMO then
determines the emotion. If recognized, SERMO asks for confirmation. If recognized incorrectly, the user can enter the correct emotion. Finally,
SERMO
. proceeds by handling the correct emotion (FearDialogue, AngerDialogue, GriefDialogue)
moment [36]. In the application, the mindfulness exercises 4.3.4 Information

breathing, sitting meditation, body scan and mindful yoga
are suggested and integrated as audio.
4.3.3 Diary The information section of SERMO comprises four topics:

SERMO, CBT, emotions and advice. In the tab “SERMO”,
In the diary, the user can record his mood on a daily basis the app is explained with all available functions. The basics
(see figure 4 ). The mood is recorded using a slider with five of CBT are introduced under tab “CBT” and the seven basic
different smileys ranging from good to bad. It is possible to emotions (fear, disgust, anger, joy, grief, guilt, shame) are
record several moods on one day. In this case, the average explained in tab “Emotions”. Emotion regulation primarily
mood of the day is displayed in the day view. The diary concerns supporting a user in identifying and naming his
contains a day and a month view. In the day view, the user feelings, emotions and thoughts. In order to achieve this
can quickly and easily document his mood and the things goal, it is important that the user knows what emotions
he has experienced. The month view shows the course of are and what they mean. This is not always as self-evident
the average mood for a period of a month. In this way, the as thought and is part of emotion-focused therapy. SERMO
user can observe changes in his mood over a longer period therefore explains the basic emotions in the information sec-
of time. tion. Beyond, addresses of professional counselling centres
In CBT, psychotherapists work with goals. Together with from all over Switzerland are provided. The information
the patients, goals are defined for a short time and the on the counselling services is relevant, as mental illnesses
patient tries to achieve this goal. In SERMO, patients can should not be underestimated and must therefore be treated
enter the discussed goal. With the diary and set goals, a by specialists as early as possible. The app is not expected
user can understand himself in a better way, observe the to replace professional help, but to encourage users for
course of illness and if necessary, based on the aggregated self-help and equip them with appropriate information and
view shown in the app, discuss it with a therapist. tools.
TABLE 1
SERMO integrated chatbot dialogues.
Welcome Dialogue In this dialogue, the welcoming of the user is administered. If started for the first time, the chatbot asks for the user’s
name and consent to the privacy policy. Then, the user is asked about his mood. If the mood was already entered up to
three hours ago, the welcome dialogue is based on the last mood.
Joy Dialogue As soon as the user states that he is in a good mood, the Joy Dialogue is called. SERMO asks for the reason of the good
mood and finally asks if he wants to do a task. If so, the HashTag dialogue is started.
Normal Dialogue This dialogue is started when the user is in a balanced mood. SERMO asks for the reason of the mood and starts the
Emotion Dialogue to determine the emotion.
Sadness Dialogue This dialogue is started when the user states he is sad. The Emotion Dialogue is started to determine the emotion.
Emotion Dialogue After the mood has been selected, the emotion dialogue is executed. This dialogue implements the ABC theory. The
user is asked about the situation or event, his thoughts and feelings. Based on the replies, the emotion is recognized
and passed forward to the appropriate emotion dialogue (i.e. Fear Dialogue, Anger Dialogue, Grief Dialogue, Sadness
Dialogue, Joy Dialogue).
Anger Dialogue The dialogue handles the emotion anger. The user is informed on the different types of anger (appropriate anger and
inadequate anger). Further, a pleasant activity is suggested.
Fear Dialogue This dialogue concerns the emotion fear. The user is provided with information on reasonable and inadequate fear.
Finally, he is asked to transform the fear-provoking thoughts into positive thoughts.
Grief Dialogue The dialogue handles the emotion grief. The user is informed on the different phases of grief. Further, activities for
distraction are suggested.
Other Dialogue Further measures are proposed to the user in this dialogue. One dialogue is about improving the user’s mood and the
other allows the user to plan the day. In addition, the user can select mindfulness exercises or the option Nothing.
Improved Mood In this dialogue, various activities are suggested to the user which could improve his current mood. After having carried
Dialogue out an activity, he has the possibility to carry out another activity.
HashTag Dialogue This dialogue manages specific interactions that are triggered by the user using a hashtag. In its current implementation
two interactions are available: #todo show a list of tasks for the current day, #strengths shows a list of strengths of the
user. Both lists can be adapted by the user.
Activity Dialogue In this dialogue, mindfulness exercises are suggested and, if selected, the user is redirected directly to the exercise on
the Activities page and the exercise is started.
Goodbye Dialogue This dialogue manages the ending of the conversation.
4.4 Emotion analysis user input is classified as emotion class where the largest
number of terms were extracted from the input. In some
The implemented emotion analysis algorithm uses a cases, however, it may happen that no emotion terms are
lexicon-based approach. Five emotions are recognized au- identified or there is no majority of emotion terms of one
tomatically: Fear, anger, grief, sadness, joy. The processing specific category. In these cases, the application responds
comprises six steps (see figure 5). First, the user input is split that it could not identify the user’s emotion and asks the
into sentences. Second, each sentence is tokenized. Third, user to select one of the five emotions. Depending on the
stop words are removed, i.e. all words that are irrelevant for determined emotion, the dialogue proceeds as foreseen in
emotion classification. This includes prepositions, pronouns the emotion-specific dialogues (see table 1).
etc. Fourth, negations are detected, but we did not yet
implemented an interpretation of negations. In principle,
the meaning of emotion words with negation has to be 5 U SABILITY TEST
inverted. This requires a list of antonyms for all emotion We conducted a usability test to study the user experience
words. In the current version, the negated emotion words and quality of the app and to determine areas of improve-
are excluded from further processing. Fifth, the emotion ment. Furthermore, we collected feedback from patients and
terms are determined and finally, the input is classified into experts on the app and its functionalities. The methodology
one out of the five emotion categories. and results are presented in the following.
The underlying emotion lexicon is the Emotional Dic-
tionaries of SentiWS. The SentiWS is a publicly available 5.1 Usability test methodology
German vocabulary for emotional analysis [37]. It covers
only the five emotions listed above. It remains to the future As demographic data, we collected age, gender and a per-
to develop emotion term lists for the emotions guilt and sonal judgment of the technical competencies on a scale of
shame to cover all relevant emotions. In order to deal with 1 (no competencies) to 10 (expert). The usability test was
typos and writing errors, a fuzzy matching method is used scenario-based comprising six tasks. The users were asked
for identifying emotion terms. In this way, words can be to perform the tasks to test the specific functionalities and
recognized even if they do not match 100% with words in provide feedback whether they could complete the task (yes
the dictionary. A threshold value was defined for the fuzzy / no) or whether and which problems occurred. The tasks
matching [38]. included
For the user input, all matches of terms with the emotion • Define a goal,
lexicon are determined. Per emotion class, the number of • Enter a mood,
terms that have been identified are calculated. Finally, the • Enter a current event,
ranging from -3 (fully agree with negative term) to +3 (fully

agree with positive term). Half of the items start with the
positive term, the rest with the negative term (in random-
ized order). The 26 items (see figure 6) are grouped into six
scales: attractiveness, perspicuity, efficiency, dependability,
stimulation, and novelty. Attractiveness concerns the overall
impression of the app. Perspicuity assesses whether it is
easy to get familiar with the app. Efficiency studies whether
users can solve their tasks without unnecessary effort. De-
pendability aggregates factors whether users feel in control
of the interaction. Further, stimulation assesses how exciting
and motivating it is to use the app, while novelty asks for the
degree of innovation. The scales of the UEQ can be grouped
into pragmatic quality (includes the dimensions perspicuity,
efficiency, dependability) and hedonic quality (stimulation,
novelty). Pragmatic quality aggregates task related quality
aspects; hedonic quality the non-task related quality aspects.
Further, we compared the measured user experience of
SERMO to results of other established products using a
benchmark data set containing quite different products. The
UEQ offers such a benchmark, which contains the data of
246 product evaluations with the UEQ (with a total of 9905
participants in all evaluations) [39].
We targeted to include at least five persons into the us-
ability test. Previous studies from the human-computer in-
terface literature have found that 80% of usability problems
can be detected with only five research subjects [40]. Turner
et al. even claim that the most serious usability problems
can be revealed with only three subjects [41]. Participants
Fig. 4. Screenshot of the diary: A goal has not yet been specified. The were recruited amongst students and employees of two
daily mood is rather bad (smiley). Two events had been added: One
concerns a funeral; the other one a positive activity which is eating ice
universities. Patients suffering from mental diseases were
cream. recruited from a clinic specialised in psychiatric disorders
with in- and outpatient care. Psychologists of this clinic
selected candidates to be asked to participate in the test to
ensure that the people are mentally stable. Further, psychol-
ogists and psychotherapists were asked to participate in the
testing. None of the participants who joined the test had
been involved in the development of SERMO. Before the
survey was conducted and participants were recruited, a
clarification of responsibilities has been sent to the Cantonal
Ethics Committee. The Ethics Committee confirmed that for
the planned evaluation no approval is required.
5.2 Usability test results

The usability test took place between September 15 and
October 24, 2019 and was performed by 21 persons (13
Fig. 5. Emotion analysis process
female, 8 male). 9 persons were currently under treatment
in the clinic. None of them was currently keeping an emo-
tion diary, but five confirmed to have problems in regulat-
• Choose a pleasant activity, ing their emotions. Furthermore, 4 psychologists and psy-
• Run a mindfulness exercise, chotherapists and 8 persons with different background (de-
• Chat with SERMO for at least 1 minute. sign, computer science, commercial, communication) from
two universities joined the test. The age of the participants
In the second part, the participants had to judge concrete ranged between 22 and 67 (average 38.4). The average self-
aspects of user experience. For this purpose, we applied the perceived technical competence was 7.3.
user experience questionnaire (UEQ) provided by Schrepp Figure 9 and 10 present the results of the judgements
et al. [39]. The main goal of the UEQ is a fast and direct of all participants aggregated into the 6 categories. Find-
measurement of user experience. Each item (in total 26 ings show that the scores for attractiveness and two scales
items) of the UEQ consists of a pair of terms with opposite describing a pragmatic quality aspect (efficiency, perspicu-
meanings. Each item can be rated on a 7-point Likert scale ity) are good, i.e., values are above 0.8 (see figure 9). In
particular, the participants confirm that the app is under- that a therapist should receive an alert when the system de-
standable, easy to learn, and clear (scale perspicuity) as well termines a certain risk for a patient from the conversations.
as friendly, attractive and pleasant (scale attractiveness, see The results show that the tasks can be completed well
figure 6). The scales describing hedonic quality (stimulation with the app and it is easy to get used to the app (perspicuity
and novelty), i.e., fun to use, show neutral evaluations. and efficiency are good). By non-experts, the app is per-
When considering only the expert judgements, all mean ceived as not very stimulating and motivating. This might
values are good, i.e. a value above 0.8 is achieved for all be due to false expectations. Nowadays, people use voice
scales (see figure 8). user interfaces such as Siri, Alexa etc. The objective of those
Figure 7 shows the comparison of the judgements of all systems is to entertain and provide information. SERMO is
participants to the UEQ benchmark. For the categories per- not designed to entertain a user, but to collect information
spicuity and efficiency a value above average was achieved. and improve certain skills of a user. The critical judgements
Attractiveness and novelty was judged as below average regarding novelty, stimulation and attractiveness of the non-
and dependability and stimulation is bad compared to the experts might also be due to their technical background: the
benchmark. According to the UEQ handbook, the label participants were rather experienced.
”good” means 10% of the results in the benchmark data The evaluation setting had several difficulties: The non-
set are better and 75% of the results are worse. The label experts had a high technical competence, while the experts
”Above average” means 25% of the results in the benchmark were rather inexperienced in technical issues (score average
are better than the result for the evaluated system, 50% of of 4). The non-experts were not informed on cognitive
the results are worse. Interestingly, the four experts (psy- behaviour therapy and received only a brief introduction
chologists or psychotherapists) judged the application more into the goals of SERMO. This might have impacted the ex-
positively (see figure 8): The category attractiveness was pectations. We conclude that SERMO in its current stage still
good compared to the benchmark; perspicuity, efficiency needs improvements with respect to design and variability
and stimulation above average and novelty even excellent. of chatbot responses. The system has to be evaluated in a
We observed a larger variance in the judgements of real world treatment setting where people are informed on
attractiveness, stimulation and novelty. The reliability of the actual purpose of the app.
the UEQ scales attractiveness, stimulation and originality is To study the user experience of SERMO, we decided to
good and excellent as the Cronbach-Alpha values indicate. use the UEQ since a benchmark has been created that allows
Cronbach-Alpha is a measure of the internal consistency of to better judge and compare the results with other systems.
a questionnaire dimension. The Cronbach-Alpha coefficients However, the benchmark does not reflect the peculiarities
in our evaluation are 0.94 for attractiveness, 0.94 for stimu- of systems or mobile applications in healthcare. There are
lation and 0.85 for originality. Thus, these values indicate other scales and questionnaires available such as the System
a good scale consistency. The Cronbach-Alpha values for Usability Scale. Chatbot usability is still a very incipient
efficiency (0.26), 0.53 for perspicuity, and dependability field as the study of Ren et al. shows [42]. Beyond usability
(0.27) are rather weak, i.e. the internal consistency is poor and user experience, there are other aspects that have to be
or even unacceptable. This can be due to problems with the studied specifically for a health chatbot such as the specific
interpretation of the items in these scales: Some UEQ items task-oriented perspective and the clinical efficacy.
are difficulty to assess for SERMO (e.g. item ”slow/fast”
and ”not secure / secure” cannot be judged by the users or 6.2 Comparison with existing mental health chatbots
interpreted differently). SERMO is one of the first applications for supporting emo-
During the interactions with the chatbot, the conversation regulation and processesing German natural language.
tion sometimes stopped during the test due to unexpected There are several applications for mentally ill people on the
user input. Conversation competencies of SERMO are still market. Compared to the mental health chatbot Woebot for
limited which was recognized by all participants. It became example [14], SERMO differs in the scope: While Woebot
clear that even though the app is designed to collect the provides psychotherapy support and education, SERMO
same information on a daily basis, the users desire a larger additionally aims to support practicing emotion regulation
flexibility. Some users suggested reformulations of the chat- and allows self-monitoring of emotions and related events.
bot responses to be more helpful and acceptable by the Through automatic emotion recognition from free text user
users. input by SERMO, the user can also be supported more
specifically by appropriate information, tasks and exercises.
The integrated feedback function on suggested exercises
6 D ISCUSSION
could help in future to study the effectiveness of different
6.1 Lessons learnt from the usability test measures depending on individual situations and emotions.
Psychologists and psychotherapists confirmed that SERMO The system could also learn user preferences.
could be stimulating for patients. Obviously, a similar ap- Another peculiarity of SERMO is that the system runs
plication is not yet in use in their daily treatment practice. offline. The underlying technology is the OSCOVA frame-
They clearly see benefits of SERMO. We received the feed- work. The OSCOVA NLP engine supports machine learn-
back from experts that the app is well suited for patients ing, i.e. OSCOVA trains itself to understand natural lan-
who have problems in expressing themselves in a face-to- guage which helps to improve the recognition rate for
face encounter. It could well bridge the gap between two natural language user input. In contrast, Woebot is based
therapeutic sessions (instead of calling the therapist, the on decision trees and is thus more restricted with respect to
patient could chat with SERMO). However, they suggested interpreting user input.
Fig. 6. UEQ Answers per item (n=21). -3 means fully agree with negative item, +3 means fully agree with positive item. Dark red (-3), light red (-2),
orange (-1), grey (0), light green (+1), darker green (+2), dark green (+3)
Fig. 7. Comparison of SERMO evaluation results with the UEQ bench-

mark (n=21)
Fig. 10. Results of the UEQ questionnaire aggregated into six categories
attractiveness, perspicuity, efficiency, dependability, stimulation and nov-
elty (n=21)
Fig. 8. SERMO evaluation results from psychologists and psychothera-
pists compared to the UEQ benchmark (n=4)
base and the reliability of chatbot answers. The system is
not designed to generate responses on its own from training
In this paper, we provided a detailed description of data. This is to ensure that the responses are reliable and the
SERMO, its knowledge base and its integrated technology. patient safety is not impacted. The system development was
This will enable users and mental health service provider based on clinical evidences from CBT which are described in
to inform themselves on the system and judge the evidence this paper. The conversation flow of SERMO was approved
by psychologists. For many existing systems, it is unclear
on which psychological principles they are based if any of
them have been considered at all. We integrated exercises
and knowledge retrieved from discussion with psycholo-
gists and from literature. Integrated exercises are currently
only examples, i.e. the application has to be extended for
a real-world use with additional exercises to increase the
user experience and have a larger variability in suggested
measures.
Fig. 9. Confidence intervals for the six categories (n=21) SERMO is not yet available in app stores as other mental
JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 11
health chatbots. The reason is the current development current implementation, all data is stored on the mobile
phase: the system still needs improvements to be considered phone. In future, privacy issues have to be addressed. In
as the usability test showed. case of a market-ready application, a usage condition, data
protection declaration and declaration of consent covering
all aspects of the General Data Protection Regulation has
6.3 Limitations of the emotion recognition
to be integrated. In the next phase of the project, we will
The implemented emotion recognition method still has po- work closely together with psychological experts and re-
tentials for improvement. First of all, the algorithm has searchers to improve and expand the chat processes. Once
to be extended to cover all relevant emotions. There are the improvements of the app has been extended and the
different theories on emotions and emotion types. We based variability in chatbot answers has been increased, a pilot
our work on the theory of Berking and want to distinguish study will be conducted as randomized controlled trial with
only seven emotions [17]. This was a result of our dis- the target group to further study efficacy and usability of
cussions with psychologists. Currently, the prerequisite for the app.
emotion recognition is that the user writes whole sentences
in German, with punctuation marks, without errors. Smaller
spelling errors can already be handled by the integrated 8 ACKNOWLEDGEMENTS
fuzzy matching method. We have run a preliminary test
on the emotion classifier with texts derived from a German We would like to thank all persons participating in the
depression forum. 50 statements were classified by SERMO. usability test and the collaborating psychologists for their
An accuracy of 81% could be achieved. Errors were partially input and selection of patients to be involved in the testing.
due to the fact that the statements expressed emotions that
SERMO is not yet able to determine. For six statements,
no emotion was returned. Surprisingly, the algorithm well R EFERENCES
recognizes the emotions even though it is still simple. For [1] Z. Steel, C. Marnane, C. Iranpour, T. Chey, J. W. Jackson, V. Patel,
example, the sentence “This does not make me angry, but and D. Silove, “The global prevalence of common mental disor-
sad” was classified correctly as “sadness”. However, the ders: a systematic review and meta-analysis 1980–2013,” Interna-
tional journal of epidemiology, vol. 43, no. 2, pp. 476–493, 2014.
current lexicon-based approach for emotion classification [2] M. H. Foundation, “Fundamental facts about mental health,”
still has limitations. A morphological analysis or at least https://www.mentalhealth.org.uk/sites/default/files/fundamental-facts-
a stemming algorithm could, among other things, help to 15.pdf, 2015.
improve the matching with the lexicon by reducing the [3] WHO, “Depression and other common mental disorders,” Global
Health Estimates, 2017.
terms of the user input to their lexical roots. In this way, the [4] H. A. Whiteford, A. J. Ferrari, L. Degenhardt, V. Feigin, and T. Vos,
recognition rate could be improved. Furthermore, methods “The global burden of mental, neurological and substance use
are necessary to determine implicit emotions. For example, disorders: an analysis from the global burden of disease study
2010,” PloS one, vol. 10, no. 2, p. e0116820, 2015.
when someone is writing “There is no sunshine inside of
[5] D. Schuler, A. Tuch, N. Buscher, and P. Camenzind, “Psychis-
me” the person is most probably sad. che gesundheit in der schweiz,” Schweiz Gesundheitsobservatorium,
Existing work on emotion extraction from free text ex- 2016.
ploited support vector machines. Desmet and Hoste intro- [6] P. Cuijpers, M. Sijbrandij, S. Koole, G. Andersson, A. Beekman, and
C. Reynolds, “The efficacy of psychotherapy and pharmacother-
duced an emotion classification algorithm and tested it on apy in treating depressive and anxiety disorders: a meta-analysis
suicide note in English [43]. They distinguished 15 emotions. of direct comparisons.” World Psychiatry, vol. 12, no. 2, pp. 137–48,
Their results show that the most salient features are trigram 2013.
and lemma bags-of-words and subjectivity clues. Spelling [7] C. J. Murray, T. Vos, R. Lozano, M. Naghavi, A. D. Flaxman,
C. Michaud, M. Ezzati, K. Shibuya, J. A. Salomon, S. Abdalla
correction had a slightly positive effect on classification et al., “Disability-adjusted life years (dalys) for 291 diseases and
performance. Shao et al. propose a lexicon-based emotion injuries in 21 regions, 1990–2010: a systematic analysis for the
detection approach that combines basic grammars, tagged global burden of disease study 2010,” The lancet, vol. 380, no. 9859,
pp. 2197–2223, 2012.
emotional words, and WordNet thesauruses [44]. Clearly, [8] B. D. Oladeji and O. Gureje, “Brain drain: a challenge to global
such existing work has to be considered for improving the mental health,” BJPsych international, vol. 13, no. 3, pp. 61–63, 2016.
emotion recognition in SERMO. Additional lexicons could [9] E. Anthes, “Mental health: theres an app for that,” Nature News,
be included such as LIWC for German [45] or GermaNet vol. 532, no. 7597, p. 20, 2016.
[10] R. D. Hester, “Lack of access to mental health services contributing
[46]. A future evaluation has to find out how reliably and to the high suicide rates among veterans,” International journal of
correctly SERMO recognizes a user’s emotions. Another mental health systems, vol. 11, no. 1, p. 47, 2017.
extension is to understand the entire context of emotion [11] L. Laranjo, A. G. Dunn, H. L. Tong, A. B. Kocaballi, J. Chen,
terms and in this way improve the emotion classification. R. Bashir, D. Surian, B. Gallego, F. Magrabi, A. Y. S. Lau, and
E. Coiera, “Conversational agents in healthcare: a systematic
review,” Journal of the American Medical Informatics Association,
vol. 25, no. 9, pp. 1248–1258, 07 2018. [Online]. Available:
7 C ONCLUSIONS AND FUTURE WORK https://doi.org/10.1093/jamia/ocy072
[12] A. N. Vaidyam, H. Wisniewski, J. D. Halamka, M. S. Kashavan,
In this paper, we introduced SERMO, a mobile application and J. B. Torous, “Chatbots and conversational agents in mental
to support mentally ill people in regulating their emotions. health: A review of the psychiatric landscape,” The Canadian
The usability test results showed that the app is very well Journal of Psychiatry, 2019.
perceived by psychologist, but still needs improvements [13] A. A. Abd-alrazaq, M. Alajlani, A. A. Alalwan, B. M. Bewick,
P. Gardner, and M. Househ, “An overview of the features of
with respect to system stability, dealing with unexpected chatbots in mental health: A scoping review,” International Journal
user input and variability of chatbots responses. In the of Medical Informatics, p. 103978, 2019.
[14] K. Fitzpatrick, A. Darcy, and M. Vierhile, “Delivering cognitive be- [35] M. Linden, Verhaltenstherapiemanual. Springer Medizin Verlag,
havior therapy to young adults with symptoms of depression and 2005.
anxiety using a fully automated conversational agent (woebot): A [36] J. Brantley, “Mindfulness-based stress reduction,” Acceptance and
randomized controlled trial,” JMIR Ment Health, vol. 4, no. 2, 2019. Mindfulness-Based Approaches to Anxiety: Conceptualization and Treat-
[15] A. Beck, Cognitive therapy and the emotional disorders. International ment, pp. 131–145, 2005.
Universities Press, 1976. [37] R. Remus, U. Quasthoff, and G. Heyer, “SentiWS - a publicly avail-
[16] S. Barnow, “Emotionsregulation und psychopathologie,” Psychol able German-language resource for sentiment analysis,” in Pro-
Rundsch., vol. 63, pp. 111–24, 2012. ceedings of the Seventh conference on International Language Resources
[17] M. Berking, P. Wupperman, A. Reichardt, T. Pejic, A. Dippel, and Evaluation (LREC’10). Valletta, Malta: European Languages
and H. Znoj, “Emotion-regulation skills as a treatment target in Resources Association (ELRA), May 2010.
psychotherapy,” Behav Res Ther., vol. 46, pp. 1230–7, 2008. [38] K. Chatzitheodorou, “Improving translation memory fuzzy
[18] A. Rathbone, L. Clarry, and J. Prescott, “Assessing the efficacy matching by paraphrasing,” Proceedings of the Work-shop Natural
of mobile health apps using the basic principles of cognitive Language Processing for Translation Memories, Hissar, Bulgaria: Asso-
behavioral therapy: Systematic review,” J Med Internet Res., vol. 19, ciation for Computational Linguistics;, pp. 24–30, 2015.
2017. [39] M. Schrepp, A. Hinderks, and J. Thomaschewski, “Construction of
[19] A. Birney, R. Gunn, J. Russell, and D. Ary, “Moodhacker mobile a benchmark for the user experience questionnaire (ueq).” IJIMAI,
web app with email for adults to self-manage mild-to-moderate vol. 4, no. 4, pp. 40–44, 2017.
depression: Randomized controlled trial,” JMIR MHealth UHealth, [40] J. R. Lewis, “Sample sizes for usability tests: Mostly math, not
vol. 4, 2016. magic,” Interactions, vol. 13, no. 6, pp. 29–33, Nov. 2006. [Online].
[20] K. H. Ly, A.-M. Ly, and G. Andersson, “A fully automated conver- Available: http://doi.acm.org/10.1145/1167948.1167973
sational agent for promoting mental well-being: A pilot rct using [41] C. Turner, J. Lewis, and J. Nielsen, “Determining usability test
mixed methods,” Internet Interventions, vol. 10, pp. 39 – 46, 2017. sample size,” International Encyclopedia of Ergonomics and Human
[21] K. Kretzschmar, H. Tyroll, G. Pavarini, A. Manzini, and I. Singh, Factors, pp. 3084–3088, 2006.
“Can your phone be your therapist? young peoples ethical per- [42] R. Ren, J. W. Castro, S. T. Acuña, and J. de Lara, “Usability of
spectives on the use of fully automated conversational agents chatbots: A systematic mapping study,” in The 31st International
(chatbots) in mental health support,” Biomedical Informatics In- Conference on Software Engineering and Knowledge Engineering,
sights, vol. 11, 2019. SEKE 2019, Hotel Tivoli, Lisbon, Portugal, July 10-12, 2019.,
[22] B. Inkster, S. Sarda, and V. Subramanian, “An empathy-driven, A. Perkusich, Ed. KSI Research Inc. and Knowledge Systems
conversational artificial intelligence agent (wysa) for digital men- Institute Graduate School, 2019, pp. 479–617. [Online]. Available:
tal well-being: Real-world data evaluation mixed-methods study,” https://doi.org/10.18293/SEKE2019-029
JMIR Mhealth Uhealth, vol. 6, no. 11, 2018. [43] B. Desmet and V. Hoste, “Emotion detection in suicide notes,”
[23] M. Deshpande and V. Rao, “Depression detection using emotion Expert Systems with Applications, vol. 40, no. 16, pp. 6351 – 6358,
artificial intelligence,” in 2017 International Conference on Intelligent 2013.
Sustainable Systems (ICISS). IEEE, 2017, pp. 858–862. [44] Z. Shao, R. Chandramouli, K. Subbalakshmi, and C. T. Boyadjiev,
“An analytical system for user emotion extraction, mental state
[24] K. M. Kudiri, G. K. Verma, and B. Gohel, “Relative amplitude
modeling, and rating,” Expert Systems with Applications, vol. 124,
based features for emotion detection from speech,” in 2010 Inter-
pp. 82 – 96, 2019.
national Conference on Signal and Image Processing. IEEE, 2010, pp.
[45] T. Meier, R. L. Boyd, J. W. Pennebaker, M. R. Mehl, M. Martin,
301–304.
M. Wolf, and A. B. Horn, “liwc auf deutsch: The development,
[25] A. De and A. Saha, “A comparative study on different approaches
psychometrics, and introduction of de-liwc2015,” 2019.
of real time human emotion recognition based on facial expression
[46] B. Hamp and H. Feldweg, “Germanet-a lexical-semantic net for
detection,” in 2015 International Conference on Advances in Computer
german,” in Automatic information extraction and building of lexical
Engineering and Applications. IEEE, 2015, pp. 483–487.
semantic resources for NLP applications, 1997.
[26] A. Fernández-Caballero, A. Martı́nez-Rodrigo, J. M. Pastor, J. C.
Castillo, E. Lozano-Monasor, M. T. López, R. Zangróniz, J. M.
Latorre, and A. Fernández-Sotos, “Smart environment architec-
ture for emotion detection and regulation,” Journal of biomedical
informatics, vol. 64, pp. 55–73, 2016.
[27] L. F. Barrett, R. Adolphs, S. Marsella, A. M. Martinez, and S. D.
Pollak, “Emotional expressions reconsidered: Challenges to infer-
ring emotion from human facial movements,” Psychological science Kerstin Denecke is a professor of medical informatics at Bern University
in the public interest, vol. 20, no. 1, pp. 1–68, 2019. of Applied Sciences. Her research interests include medical language
[28] M. Skowron, “Affect listeners: Acquisition of affective states by processing, information extraction, sentiment analysis, and text clas-
means of conversational systems,” Proceedings of the Second Inter- sification. Denecke received a Doctoral degree in computer science
national Conference on Development of Multimodal Interfaces: Active from the Technical University of Braunschweig. She is a member of
Listening and Synchrony, pp. 169–181, 2010. the German Society of Medical Computer Science, Biometry and Epi-
[29] H. Zhou, M. Huang, T. Zhang, X. Zhu, and B. Liu, “Emotional demiology (GMDS); the German Journalists Association; and the IMIA
chatting machine: Emotional conversation generation with inter- Participatory Health and Social Media Working Group.
nal and external memory,” Proceedings of the 2017 AAAI conference,
2017.
[30] R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning,
A. Ng, and C. Potts, “Recursive deep models for semantic
compositionality over a sentiment treebank,” in Proceedings
of the 2013 Conference on Empirical Methods in Natural
Language Processing. Seattle, Washington, USA: Association Sayan Vaaheesan studied medical informatics at the Bern University of
for Computational Linguistics, Oct. 2013, pp. 1631–1642. [Online]. Applied Sciences. In August 2019, he started as application developer
Available: https://www.aclweb.org/anthology/D13-1170 at CISTEC AG.
[31] K. Denecke and Y. Deng, “Sentiment analysis in medical settings,”
Artif. Intell. Med., vol. 64, no. 1, pp. 17–27, May 2015. [Online].
Available: http://dx.doi.org/10.1016/j.artmed.2015.03.006
[32] B. Wilken, Methoden der Kognitiven Umstrukturierung. Ein Leitfaden
fr die psychotherapeutische Praxis. Verlag W. Kohlhammer, 1998.
[33] V. Vahia, “Diagnostic and statistical manual of mental disorders 5:
A quick glance,” Indian J Psychiatry, vol. 55, pp. 220–23, 2013. Aaganya Arulnathan studied medical informatics at the Bern University
[34] C. Caldeira, Y. Chen, L. Chan, V. Pham, Y. Chen, and K. Zheng, of Applied Sciences. Since August 2019 she works as an IT consultant
“Mobile apps for mood tracking: an analysis of features and at ERNI Schweiz AG, a Swiss software engineering company.
user reviews.” in AMIA Annual Symposium proceedings. AMIA
Symposium, 2017, pp. 495–504.

A Mental Health Chatbot For Regulating Emotions (SERMO) - Concept and Usability Test

Uploaded by

Copyright:

Available Formats

A Mental Health Chatbot For Regulating Emotions (SERMO) - Concept and Usability Test

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

A Mental Health Chatbot For Regulating Emotions (SERMO) - Concept and Usability Test

Uploaded by

Copyright:

Available Formats

This article has been accepted for publication in a future issue of this journal, but has not been

A Mental Health Chatbot for Regulating

M ENTAL disorders affect 29% of the global population

chatbot was grounded on scientific literature, expert input

4.3.2 Activities and exercises

moment [36]. In the application, the mindfulness exercises 4.3.4 Information

4.3.3 Diary The information section of SERMO comprises four topics:

ranging from -3 (fully agree with negative term) to +3 (fully

5.2 Usability test results

Fig. 7. Comparison of SERMO evaluation results with the UEQ bench-

You might also like