HAV Exercise EpiInfo (Part 2)


Foodborne outbreak investigations

Case study
Outbreak with Hepatitis A virus in
Scandinavian countries 2013

Case study prepared by Steen Ethelberg, Statens Serum Institut, and
Karin Nygård, Folkehelseinstituttet, 2013. Adapted by Tine Hald, DTU
National Food Institute, for the DTU course: Epidemiology – an
introduction (23256)
Introduction

Hepatitis A
Hepatitis A virus (HAV) is a single-stranded, non-enveloped RNA virus belonging to the Picornaviridae
family of viruses (the same family as poliovirus). It can infect only humans and some other primates. The
route of transmission is faecal-oral.

Symptoms of infection range from mild to severe and can include fever, malaise, loss of appetite,
diarrhea, nausea, abdominal discomfort, dark-colored urine and jaundice. The disease is rarely
fatal and, unlike Hepatitis B and C infections, has no chronic state. Adults have symptoms of
illness more often than children. The incubation period is long, typically about three weeks (normally 14
to 28 days). Infection confers lifelong immunity. A vaccine that provides good protection is
available.

Infections are generally transmitted via food or water that has become contaminated with HAV, or
via person-to-person contact. HAV is among the most frequent causes of foodborne infections
worldwide and can cause large outbreaks. The disease is typical of areas with poor sanitation and
is endemic in many areas of the developing world, but not in Europe. In Northern Europe, infections
are generally associated with foreign travel, and outbreaks are very rare.

HAV surveillance and diagnostics in Denmark

HAV disease is notifiable in Denmark. Physicians must report patients directly to Statens Serum
Institut (SSI). The notifications include basic clinical information and information about where (in
which country) the infection was most likely acquired. At SSI, these data are monitored, and
if something unusual is observed, action is taken.

Diagnostic testing for HAV is performed at local laboratories in Denmark. The standard test is
serological: a blood sample from the patient is examined for IgM antibodies against
the virus. Virus typing is carried out only at the reference laboratory at SSI, which receives
material from a subset of the IgM-positive patients in Denmark. These samples undergo
confirmatory detection of viral RNA by PCR and further characterisation by genotyping and
subtyping, which consists of sequencing relevant regions of the viral genome (the VP1 region). If two
persons were not infected from the same source, their sequences will most likely differ.

Number of cases reported in Denmark


HAV disease is usually not acquired in Denmark; most Danish cases are infected in endemic countries.
Transmission within Denmark is generally limited and associated with patients having had contact with
persons who were infected abroad. Until 2013, no foodborne or waterborne outbreaks had
been described in Denmark. The figure below shows the number of notified cases from 2009 to
2012, stratified by foreign/domestic place of infection.

The epidemiology and the surveillance system for HAV in Norway are not identical to, but closely
comparable with, those of Denmark.

Case definition
The following case definition was developed for the outbreak:
• Probable case: A person living in Denmark or Norway with clinical illness associated with
Hepatitis A and positive for HAV IgM antibodies, no travel history outside of Western
European countries or other known HAV risk factors, and symptom onset on or after 1
October 2012.
• Confirmed case: Probable case typed with HAV genotype 1B and having the outbreak strain
sequence.
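Because the case definition is a set of nested rules, it can be expressed as a small classification function. This is a minimal sketch, assuming each interviewed patient is stored as a Python dictionary; all keys (`country`, `clinical_hav`, `igm_positive`, `travel_outside_western_europe`, `other_hav_risk_factor`, `onset`, `genotype`, `outbreak_sequence`) are hypothetical and are not the variable names of the course dataset.

```python
from datetime import date

# Onset cut-off from the case definition (1 October 2012).
ONSET_CUTOFF = date(2012, 10, 1)


def classify_case(patient):
    """Return 'confirmed', 'probable' or None for one patient record.

    All dictionary keys are hypothetical illustrations.
    """
    probable = (
        patient["country"] in {"Denmark", "Norway"}
        and patient["clinical_hav"]
        and patient["igm_positive"]
        and not patient["travel_outside_western_europe"]
        and not patient["other_hav_risk_factor"]
        and patient["onset"] >= ONSET_CUTOFF
    )
    if not probable:
        return None
    # A confirmed case is a probable case typed as genotype 1B with the outbreak sequence.
    if patient.get("genotype") == "1B" and patient.get("outbreak_sequence", False):
        return "confirmed"
    return "probable"


example = {
    "country": "Denmark",
    "clinical_hav": True,
    "igm_positive": True,
    "travel_outside_western_europe": False,
    "other_hav_risk_factor": False,
    "onset": date(2013, 1, 15),
    "genotype": "1B",
    "outbreak_sequence": True,
}
print(classify_case(example))  # confirmed
```

Checking the probable-case criteria first mirrors the definition: a patient can only be confirmed if they already qualify as probable.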

Part 1: Descriptive analysis of outbreak data – using EpiInfo 7

Throughout this exercise, you should imagine that you are part of the team working on solving
an outbreak of HAV infections in Denmark.

The situation is as follows:

It is March 2013. An outbreak has been recognised and an outbreak team formed. Following international
alerts, it has become clear that Norway is also experiencing an outbreak of HAV, and it is thought that
the source of infection in both countries is the same. However, the source is not known. As of 1
March, 20 Danish cases are being counted as part of the outbreak, of which 10 are confirmed
microbiologically. In Norway, there are six cases, of which three are confirmed.

Most of the cases in Denmark and Norway have been interviewed by you or your Norwegian
colleagues. The interviews have two aims. Firstly, to establish whether notified patients were likely part of the
outbreak, by assessing time of onset, foreign travel and similar characteristics. Secondly, to generate
hypotheses about the source of the illness. This is done by asking detailed questions about food intake
and behaviour at the likely time of infection. The questions can also help clarify whether cases were infected directly
by another case (and thus were secondary cases), and so on.

To collect this information, a questionnaire has been developed. Patients have been interviewed over the
phone using the questionnaire, with the answers filled in on paper. A data entry form was made
using EpiInfo 7 and the data were entered into the computer.

On the next page you can see a shortened form of the questionnaire that was used. Also shown is a table
(divided into two) with a few rows of the dataset that resulted from the questionnaire.

Task 1.
• Look at the questionnaire and the dataset (next page).
• Make sure you understand what the column headings in the table mean, for
instance “DiseaseOnset” and “NotificationDate”.
• Make sure you understand how the data shown were generated.
Task 2.
• Which types of descriptive analysis would you suggest making?

On the course site on D2L, you will find the full dataset (“Dataset Session 2a”) in Excel format.
We will now do the descriptive analysis using EpiInfo 7 and its so-called “Visual
Dashboard” function. It is also possible to use the “Classical Analysis” format, so if
there is time, familiarise yourself with both and use the one you prefer.

Open dataset. First we need to load the data into EpiInfo 7.


Stepwise instruction:
• Open EpiInfo 7
• Go to the ‘Visual Dashboard’. The program will ask you to ‘set data source’. Alternatively,
right-click on the dashboard and choose ‘set data source’.
• A dialogue box will open (as shown below). Choose Excel as the Database type and locate the
Excel file. Then click ‘Descriptive$’ in the large box (this is the name of the Excel sheet)
and press OK.

Now that you have successfully loaded the dataset, please perform the tasks listed on the
next page using the Visual Dashboard. For most of the tasks, you will simply need to right-click
with the mouse and then choose one of the ‘Analysis Gadgets’ (see the figure
above).
Task 3 – Line list
• Make a line list containing basic information (i.e., case no., case, age, sex, date of onset, date of
diagnosis, country of residence, foreign travel).
• Use the line list to count the number of cases. How many are there?
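Outside EpiInfo, a line list is simply a table with one row per case and a chosen subset of columns. A minimal sketch, assuming the cases are held as Python dictionaries; the field names and rows are hypothetical stand-ins for the Excel sheet, not the course data.

```python
# Hypothetical rows standing in for the Excel sheet (not the course data).
cases = [
    {"case_no": 3, "age": 41, "sex": "M", "onset": "2013-01-20",
     "country": "Norway", "travel": "No"},
    {"case_no": 1, "age": 34, "sex": "F", "onset": "2013-01-07",
     "country": "Denmark", "travel": "No"},
    {"case_no": 2, "age": 8, "sex": "F", "onset": "2013-01-12",
     "country": "Denmark", "travel": "No"},
]

# Columns to show in the line list.
FIELDS = ["case_no", "age", "sex", "onset", "country", "travel"]


def line_list(records, fields):
    """Return one row per case with the chosen fields, ordered by onset date.

    ISO-formatted date strings (YYYY-MM-DD) sort correctly as text.
    """
    ordered = sorted(records, key=lambda r: r["onset"])
    return [[r[f] for f in fields] for r in ordered]


print(*FIELDS, sep="\t")
for row in line_list(cases, FIELDS):
    print(*row, sep="\t")
print("Number of cases:", len(cases))
```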

Task 4 - Epidemic curve


• Make an epidemic curve. Use days as the time interval on the x-axis.
• Check whether other time intervals (e.g. weeks) are more informative.
• Try stratifying the epi curve by the patients' country of residence (e.g. using
different colors for Danish and Norwegian cases, respectively). What does this tell you?
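An epidemic curve is a count of cases per onset interval. The sketch below shows the counting step behind the chart, binning onset dates either by day or by ISO week and stratifying by country; the onset dates are hypothetical, and the bar drawing is left to EpiInfo or any plotting tool.

```python
from collections import Counter
from datetime import date

# Hypothetical (onset date, country) pairs, not the course data.
cases = [
    (date(2013, 1, 7), "Denmark"),
    (date(2013, 1, 9), "Denmark"),
    (date(2013, 1, 9), "Norway"),
    (date(2013, 1, 16), "Denmark"),
]


def epi_curve(records, interval="day"):
    """Count cases per (time interval, country)."""
    counts = Counter()
    for onset, country in records:
        if interval == "day":
            key = onset
        elif interval == "week":
            iso_year, iso_week, _ = onset.isocalendar()
            key = (iso_year, iso_week)
        else:
            raise ValueError("interval must be 'day' or 'week'")
        counts[(key, country)] += 1
    return counts


# Text "bars": one '#' per case in each week and country.
for (period, country), n in sorted(epi_curve(cases, "week").items()):
    print(period, country, "#" * n)
```

Weekly bins smooth out day-to-day noise, which is often helpful for a disease with a long incubation period such as HAV.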

Task 5 - Sex distribution


• How many cases are male/female? Count the number of males/females using the frequency
function.
• How many males/females in Denmark and Norway, respectively?
• Make a chart of the sex distribution.
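The frequency function amounts to counting categories, overall and within each country. A minimal sketch with hypothetical (sex, country) pairs:

```python
from collections import Counter

# Hypothetical (sex, country) pairs, one per case.
cases = [("F", "Denmark"), ("M", "Denmark"), ("F", "Norway"),
         ("F", "Denmark"), ("M", "Norway")]

overall = Counter(sex for sex, _ in cases)  # frequency of each sex
by_country = Counter(cases)                 # frequency of each (sex, country)

for sex in sorted(overall):
    share = 100 * overall[sex] / len(cases)
    print(f"{sex}: {overall[sex]} ({share:.0f}%)")
for (sex, country), n in sorted(by_country.items()):
    print(country, sex, n)
```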

Task 6 – Age distribution


• The age of the cases is provided. However, it is difficult to get an overview of the age distribution
without first grouping the ages into age groups. Divide the cases into 10-year age groups
using the ‘Define variables’ panel on the left-hand side of the dashboard: move the mouse
over it and it will pop up. Then choose ‘New variable’ and ‘with recoded value’. You can now
choose the variable you want to recode (Age) and name the new variable you are creating (e.g.
Agegroup). You can fill in the fields below, or press the ‘Fill ranges’ button and let EpiInfo do it for
you. You can then run the ‘Frequency’ function on the new variable ‘Agegroup’.
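The recode into 10-year age groups can be sketched as integer division of the age by the band width. This is an illustration with hypothetical ages; EpiInfo's ‘Fill ranges’ button may choose band boundaries or labels differently.

```python
from collections import Counter


def age_group(age, width=10):
    """Label an age with its band, e.g. 35 -> '30-39' for 10-year bands."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


ages = [4, 19, 20, 35, 38, 71]  # hypothetical ages
freq = Counter(age_group(a) for a in ages)

# Print bands in numeric (not alphabetical) order.
for band in sorted(freq, key=lambda b: int(b.split("-")[0])):
    print(band, freq[band])
```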

Task 7 – Select/filter data


• You want to focus in particular on the Norwegian cases in order to see whether they are distributed
differently from the Danish cases. Using the filter function, select only the Norwegian cases.
• Redo the charts of the age and sex distributions so that only the Norwegian cases are depicted.
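Filtering is just keeping the records that meet a condition before recounting. A minimal sketch with hypothetical records; the `country` field name is an assumption.

```python
from collections import Counter

# Hypothetical case records (not the course data).
cases = [
    {"age": 34, "sex": "F", "country": "Denmark"},
    {"age": 52, "sex": "M", "country": "Norway"},
    {"age": 27, "sex": "F", "country": "Norway"},
    {"age": 45, "sex": "M", "country": "Denmark"},
]

# Keep only the Norwegian cases, as the dashboard filter would.
norway = [c for c in cases if c["country"] == "Norway"]

# Recount sex and 10-year age groups on the filtered subset.
sex_norway = Counter(c["sex"] for c in norway)
age_norway = Counter(
    f"{(c['age'] // 10) * 10}-{(c['age'] // 10) * 10 + 9}" for c in norway
)
print(sex_norway, age_norway)
```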

Task 8 – Map
• Using the map function, make a map of where the cases live, so that it is easy to see which cases
live in each country. From the main interface page of the program, press ‘Create maps’. Then go to
the ‘Add Data Layer’ menu and choose ‘Case Cluster’. You can then open the Excel dataset again.
You are then asked to choose the latitude and longitude variables. The map will then be drawn,
showing the locations of the cases.
• Zoom in on Denmark. Redo the map so that you stratify by sex, applying a different color to each
sex. Using the map layers function at the lower end of the map, you can restrict the cases shown, e.g.
to females in Denmark only. You can then add another layer, e.g. males in Denmark only. Showing the
two layers at the same time in different colors will give you the distribution of male and female cases
in Denmark.

Task 9 – Saving
• Finally, remember to save your output. By clicking ‘Save’ in the top blue bar, you can save what you
have made on the Visual Dashboard, so that you can open it again in EpiInfo and continue working
with the data and output. The resulting file will have the suffix .cvs7, which can be opened in
EpiInfo. You can also export the visuals as a webpage (HTML) or to Excel and Word by right-clicking
on the Visual Dashboard and choosing ‘Send output to’ and then e.g. ‘Microsoft Word’.
You can also copy and paste each element from the dashboard into a Word or PowerPoint
document if you prefer.

Part 2: Case-control study – using EpiInfo 7

Throughout this exercise, you should imagine that you are part of the team working on solving
an outbreak of HAV infections in Denmark.

The situation is as follows:


It is the first week of March 2013. An outbreak has been declared and an outbreak team formed.
A total of 33 cases are being counted as part of the outbreak, of which 14 are confirmed
microbiologically. The cases have been interviewed, and none had travelled outside
Denmark during the period of infection. Based on a review of previous sources of HAV outbreaks and
subsequent interviews with some of the cases, a series of hypotheses about the source has been
formulated. It is decided to conduct a case-control study to test a number of these hypotheses.

Case-control study:
The cases included in the case-control study were the most recent cases from the line list.
Control persons, representing the background population, were selected from the Danish
population register. They were individually matched to the cases, meaning that they had the
same age (born within two months of the case), had the same sex and lived in the same
municipality. Two controls were interviewed for every case.
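The matching rule can be sketched as a filter over a register extract followed by a random draw. This is an illustration only: the register fields (`id`, `sex`, `municipality`, `birth_date`) are hypothetical, and "born within two months" is approximated here as at most 61 days apart, which is an assumption rather than the rule used in the study.

```python
import random
from datetime import date

# Assumption: "born within two months" is approximated as at most 61 days apart.
MAX_BIRTH_GAP_DAYS = 61


def eligible_controls(case, register):
    """Return register entries matching the case on sex, municipality and birth date."""
    return [
        person
        for person in register
        if person["id"] != case["id"]
        and person["sex"] == case["sex"]
        and person["municipality"] == case["municipality"]
        and abs((person["birth_date"] - case["birth_date"]).days) <= MAX_BIRTH_GAP_DAYS
    ]


def pick_controls(case, register, k=2, rng=random):
    """Draw k controls at random from the eligible persons."""
    pool = eligible_controls(case, register)
    if len(pool) < k:
        raise ValueError("not enough eligible controls for this case")
    return rng.sample(pool, k)


# Hypothetical case and register extract.
case = {"id": "c1", "sex": "F", "municipality": "Aarhus", "birth_date": date(1980, 5, 15)}
register = [
    {"id": "p1", "sex": "F", "municipality": "Aarhus", "birth_date": date(1980, 6, 20)},
    {"id": "p2", "sex": "F", "municipality": "Aarhus", "birth_date": date(1980, 9, 1)},
    {"id": "p3", "sex": "M", "municipality": "Aarhus", "birth_date": date(1980, 5, 10)},
    {"id": "p4", "sex": "F", "municipality": "Odense", "birth_date": date(1980, 5, 15)},
    {"id": "p5", "sex": "F", "municipality": "Aarhus", "birth_date": date(1980, 4, 1)},
]
print([p["id"] for p in pick_controls(case, register, rng=random.Random(1))])
```

Passing a seeded `random.Random` makes the draw reproducible, which is useful if the selection has to be documented.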

Task 1 - discussion points


• Among yourselves, make sure that everyone understands what a case-control study is
• What is being compared in a case-control study?
• Why was a cohort study not performed in this situation?
• Discuss how you could select control persons

For the case-control study, information was collected through telephone interviews, with the answers
filled in on paper questionnaires. This was done from March 6 to March 14. In total, 25 cases and 50
matching controls were interviewed. A data entry form was then made and the data entered into
the computer. On the next pages you can see an extract of the questionnaire that was used for
the case-control study (the real questionnaire was longer). The table shows the first rows of the
resulting dataset.

Task 2 - Discussion points


• Look at the questionnaire and the dataset. Make sure you understand what the column
headings in the table mean.
• Note that cases and controls had four choices for answering the questions about which
food items they had consumed, but in the dataset these data have been recoded as ‘yes’
and ‘no’. Why do you think this was done?
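One way the four answer options could be collapsed to yes/no is sketched below, grouping "likely ate" with the exposed and "didn't or most likely didn't" with the unexposed, as the counts later in this part are described. The answer labels are hypothetical; the wording on the actual questionnaire may differ.

```python
# Hypothetical answer labels; the wording of the course questionnaire may differ.
RECODE = {
    "yes": "yes",
    "most likely yes": "yes",
    "most likely no": "no",
    "no": "no",
}


def recode(answer):
    """Collapse a four-level answer to 'yes'/'no'; anything else becomes None (missing)."""
    if answer is None:
        return None
    return RECODE.get(answer.strip().lower())


print([recode(a) for a in ["Yes", "Most likely yes", "most likely no", "No", ""]])
# ['yes', 'yes', 'no', 'no', None]
```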

You now wish to make an analysis of the data. We will do that using the computer. We need to
work fast, since people are getting ill. The media knows that you’re conducting the study and the
main newspapers have been calling to hear about the results, and also the food authorities are
eagerly requesting the information in order to take action if needed.

Before we begin the data analysis using the computer, we do a rough calculation for two food
items that we are particularly interested in: eating edamame beans and drinking home-made
smoothies. We count the number of cases and controls that have reported eating these two
products.

We learn that 2 cases said that they likely ate edamame beans before becoming ill, while 23 said
that they didn’t or most likely didn’t. For the controls, 4 persons said that they ate edamame
beans, while 46 said that they didn’t.

Concerning home-made smoothies, 18 cases said that they likely drank home-made smoothies
before becoming ill, while 7 said that they didn’t or most likely didn’t. For the control persons, 10
said that they drank home-made smoothies and 40 said they didn’t.

Task 3
• Without using the computer, make a 2×2 table for each of these two exposures. Calculate
the crude odds ratios for both and discuss the results.
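For a 2×2 table with a = exposed cases, b = unexposed cases, c = exposed controls and d = unexposed controls, the crude odds ratio is ad/bc. The sketch below applies this to the counts given above and adds a 95% confidence interval using Woolf's log-based method, exp(ln OR ± 1.96·√(1/a + 1/b + 1/c + 1/d)). EpiInfo may report intervals computed by other methods (for example exact intervals), so its limits can differ somewhat.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-based) 95% confidence interval.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Counts from the text: (exposed cases, unexposed cases, exposed controls, unexposed controls).
edamame = odds_ratio(2, 23, 4, 46)
smoothie = odds_ratio(18, 7, 10, 40)
for name, (or_, lo, hi) in [("Edamame", edamame), ("Smoothie", smoothie)]:
    print(f"{name}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Note that this ignores the matching, just like the hand calculation.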

A dataset ‘Dataset Session 2B’ is supplied in Excel format. You now wish to examine the dataset
and perform the odds ratio calculation for all food items. (Hint: use the ‘create new variable’ function to
form a ‘Food’ variable, so that you don't have to make a separate cross-tabulation for each food item.)
First, you will do the odds ratio calculation without taking the matching of cases and controls into
account. This is formally wrong, because the study design is matched, but later you will do both a
matched analysis and a logistic regression.

For the present analysis, we will use the functions on the Visual Dashboard. Start by opening the
dataset, as in Part 1. For most of the tasks, you will simply need to right-click with the mouse and
then choose one of the ‘Analysis Gadgets’.

Task 4. Please perform the following operations.


• Inspect the data: what should be done about missing values?
• Count cases and controls.
• Count the number of persons who drank smoothies, stratified by cases and controls.
• Calculate odds ratios and confidence intervals for each food item.
• Filter data. It becomes clear that case number 22 was most likely infected by her sister.
Exclude her and repeat the analyses above.
• Create variable. You speculate that strawberries may be the source of the outbreak, as you
realise that the bags with mixed frozen berries also contain strawberries. Therefore, you
want to create a new variable for being exposed to strawberries. Do so and perform the
odds ratio analysis for this variable. See the screenshots below.
• Make a report with the output in either HTML or Word using the ‘Send output to’ function
on the Dashboard.
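The Task 4 steps (handling missing answers, excluding case 22, deriving a strawberry variable and looping over food items) can be sketched in a few functions. Everything here is hypothetical: the records, the field names (`id`, `caco`, `smoothie`, `frozen_berries`, `fresh_strawberries`) and the choice of which items count as strawberry exposure. Missing answers are dropped item by item, which is one common choice, not necessarily the one you should make.

```python
import math

# Hypothetical records: caco = 1 for cases, 0 for controls; food items are
# 1 (ate), 0 (did not eat) or None (missing answer). Names are illustrative.
records = [
    {"id": 1, "caco": 1, "smoothie": 1, "frozen_berries": 1, "fresh_strawberries": 0},
    {"id": 2, "caco": 1, "smoothie": 1, "frozen_berries": None, "fresh_strawberries": 1},
    {"id": 3, "caco": 1, "smoothie": 0, "frozen_berries": 0, "fresh_strawberries": 0},
    {"id": 22, "caco": 1, "smoothie": 0, "frozen_berries": 0, "fresh_strawberries": 0},
    {"id": 101, "caco": 0, "smoothie": 1, "frozen_berries": 0, "fresh_strawberries": 1},
    {"id": 102, "caco": 0, "smoothie": 0, "frozen_berries": 0, "fresh_strawberries": 0},
    {"id": 103, "caco": 0, "smoothie": 0, "frozen_berries": 1, "fresh_strawberries": 0},
    {"id": 104, "caco": 0, "smoothie": 0, "frozen_berries": 0, "fresh_strawberries": None},
]

# Assumption: these are the items that may contain strawberries.
STRAWBERRY_ITEMS = ["frozen_berries", "fresh_strawberries"]


def add_strawberries(record):
    """Return a copy of the record with a derived 'strawberries' exposure."""
    values = [record[item] for item in STRAWBERRY_ITEMS]
    out = dict(record)
    if any(v == 1 for v in values):
        out["strawberries"] = 1      # exposed through at least one item
    elif all(v == 0 for v in values):
        out["strawberries"] = 0      # answered 'no' to every item
    else:
        out["strawberries"] = None   # no 'yes', but some answers missing
    return out


def two_by_two(data, item):
    """Return (a, b, c, d): exposed/unexposed cases, exposed/unexposed controls."""
    a = b = c = d = 0
    for r in data:
        if r[item] is None:
            continue  # drop persons with a missing answer for this item only
        exposed = r[item] == 1
        if r["caco"] == 1:
            if exposed:
                a += 1
            else:
                b += 1
        elif exposed:
            c += 1
        else:
            d += 1
    return a, b, c, d


def crude_or(a, b, c, d):
    """Unmatched odds ratio ad/bc; infinite or undefined when b or c is zero."""
    if b * c == 0:
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


# Exclude case 22 (infected by her sister), then derive the strawberry variable.
data = [add_strawberries(r) for r in records if r["id"] != 22]

for item in ["smoothie", "frozen_berries", "fresh_strawberries", "strawberries"]:
    table = two_by_two(data, item)
    print(f"{item:20} {table} OR = {crude_or(*table):.2f}")
```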

A dataset ‘Dataset Session 2B one control only matched’ is supplied in Excel format. Here only
one control per case is retained, because EpiInfo allows only one control per case in the matched-pair
analysis. You now wish to examine the dataset and perform a matched analysis for each
suspected food item. In the dataset, new exposure variables have been created by recoding the 1/0
values to yes/no. Likewise, a CaCo2 variable has been created by recoding CaCo to yes/no.

Again, we will use the functions on the Visual Dashboard. Start by opening the dataset, as in the
previous exercise.

Task 5. Please perform the following operations.


• Inspect the data to understand the coding. Notice that the variable Case ‘matches’ a case
with a control. What are the matching variables?
• As before
– Filter the data by excluding case number 22.
– Create a variable for overall exposure to strawberries.
• Make a matched-pair case-control analysis for suspected food items (see screenshots
below).
– Which variable is the ‘Pair Group ID’?
• Interpret and present the output and discuss the most likely source of the outbreak.
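In a 1:1 matched analysis, only discordant pairs carry information: b pairs where only the case was exposed and c pairs where only the control was exposed. The matched odds ratio is b/c, and McNemar's test compares b with c. A minimal sketch, assuming records carry a hypothetical pair identifier (`pair_id`) and a `caco` flag; the McNemar statistic here is the version without continuity correction, and EpiInfo may report other variants.

```python
import math
from collections import defaultdict


def pairs_from_records(records, exposure, pair_key="pair_id"):
    """Group records into (case exposure, control exposure) tuples per matched pair."""
    groups = defaultdict(dict)
    for r in records:
        role = "case" if r["caco"] == 1 else "control"
        groups[r[pair_key]][role] = r[exposure]
    return [(g["case"], g["control"]) for g in groups.values()
            if "case" in g and "control" in g]


def matched_pair_analysis(pairs):
    """Matched-pair OR (b/c) and McNemar chi-square without continuity correction."""
    b = sum(1 for case, control in pairs if case == 1 and control == 0)
    c = sum(1 for case, control in pairs if case == 0 and control == 1)
    odds_ratio = b / c if c else math.inf
    mcnemar = (b - c) ** 2 / (b + c) if b + c else math.nan
    return b, c, odds_ratio, mcnemar


# Hypothetical pairs: (case exposed, control exposed); 1 = yes, 0 = no.
pairs = [(1, 0), (1, 0), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (0, 0)]
print(matched_pair_analysis(pairs))  # (4, 1, 4.0, 1.8)
```

Concordant pairs, (1, 1) and (0, 0), do not enter either count, which is why a matched analysis can give a rather different answer from the unmatched one.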

Go back to the full dataset ‘Dataset Session 2B’. Create a new dependent variable called ‘ill’ by
recoding CaCo to the values 1 and 0. Perform a matched logistic regression analysis including
multiple independent variables (consider carefully which to include).

Again, we will use the functions on the Visual Dashboard. Start by opening the dataset, as in the
previous exercise.

Task 6. Please perform the following operations.


• As before
– Filter the data by excluding case number 22.
– Create a variable for overall exposure to strawberries.
• Make a matched logistic regression including suspected food items (see screenshots
below).
• Interpret and present the output and discuss the most likely source of the outbreak.
• Compare the results with your previous analyses.
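Matched (conditional) logistic regression maximises, over all matched sets, the probability that the person who actually is the case is the case, given the covariates of everyone in the set: exp(x_case·β) / Σ_j exp(x_j·β). The sketch below fits this likelihood from scratch by plain gradient ascent, so it works for sets of any size (such as 1:2 matching). It is an illustration of the technique, not EpiInfo's algorithm, and it returns point estimates only (no standard errors or confidence intervals). The data are hypothetical; for 1:1 pairs and a single binary exposure, the fitted odds ratio equals the ratio of the two kinds of discordant pairs, which the example uses as a check.

```python
import math


def fit_conditional_logit(strata, n_iter=5000, lr=1.0):
    """Fit a conditional (matched) logistic regression by plain gradient ascent.

    strata: one entry per matched set; each entry is a list of
    (is_case, covariates) tuples with exactly one case per set.
    Returns the coefficient vector; exp(coefficient) is the matched odds ratio.
    """
    k = len(strata[0][0][1])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        for members in strata:
            # Risk score exp(x . beta) for every member of the matched set.
            scores = [math.exp(sum(b * xj for b, xj in zip(beta, x))) for _, x in members]
            total = sum(scores)
            case_x = next(x for is_case, x in members if is_case)
            for j in range(k):
                expected = sum(s * x[j] for s, (_, x) in zip(scores, members)) / total
                grad[j] += case_x[j] - expected
        # Average the gradient over sets so the step size does not depend on sample size.
        beta = [b + lr * g / len(strata) for b, g in zip(beta, grad)]
    return beta


# Hypothetical 1:1 data with one exposure: 6 pairs where only the case was exposed,
# 2 where only the control was exposed, and 2 concordant pairs.
strata = (
    [[(True, [1]), (False, [0])]] * 6
    + [[(True, [0]), (False, [1])]] * 2
    + [[(True, [1]), (False, [1])], [(True, [0]), (False, [0])]]
)
beta = fit_conditional_logit(strata)
print(f"matched OR = {math.exp(beta[0]):.3f}")  # 3.000, i.e. 6 / 2
```

With several food items as covariates, each coefficient is adjusted for the others, which is what makes this analysis useful for separating, say, smoothies from the berries they contain.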
