
Identification and Estimation of the

SEIRD Epidemic Model for COVID-19∗


Ivan Korolev†

April 20, 2020

Abstract

This paper studies the SEIRD epidemic model for COVID-19. First, I show that the
model is poorly identified from the observed number of deaths and confirmed cases. There
are many sets of parameters that are observationally equivalent in the short run but lead to
markedly different long run forecasts. Next, I show that the basic reproduction number R0
can be identified from the data, conditional on the clinical parameters. I then estimate it for
the US and several other countries and regions, allowing for possible underreporting of the
number of cases. The resulting estimates of R0 are heterogeneous across countries: they are
2-3 times higher for Western countries than for Asian countries. I demonstrate that if one
fails to take underreporting into account and estimates R0 from the reported cases data, the
resulting estimate of R0 will be biased downward and the resulting forecasts will exaggerate
the number of deaths. Finally, I demonstrate that auxiliary information from random tests
can be used to calibrate the initial parameters of the model and reduce the range of possible
forecasts about the future number of deaths.


∗ I thank Andy Atkeson, Jeremy Fox, Oleg Itskhoki, David Slichter, Jim Stock, Ping Yan, and Tom Zohar
for their comments and suggestions. All remaining errors are mine.

† Department of Economics, Binghamton University, 4400 Vestal Parkway East, PO Box 6000,
Binghamton, NY 13902-6000, USA. E-mail: [email protected].

Electronic copy available at: https://ssrn.com/abstract=3569367


1 Introduction

The SIR (Susceptible, Infectious, Recovered) model and its variations are widely used in
epidemiology to model the spread of epidemics. Since the outbreak of COVID-19, it has seen
increased popularity among economists who are trying to assess the economic consequences
of the coronavirus and various mitigation policies, such as Atkeson (2020b,c), Berger et al.
(2020), Eichenbaum et al. (2020), Fernandez-Villaverde and Jones (2020), Piguillem and Shi
(2020), Toda (2020), and others. In this paper, I study identification and estimation of a
modification of the SIR model called SEIRD (Susceptible, Exposed, Infectious, Recovered,
and Dead) and present several findings.
First, I show that the SEIRD model has too many degrees of freedom and is poorly
identified from the deaths and confirmed cases data. Conditional on the values of clinical
parameters, i.e. parameters that reflect the clinical progression of the disease, the only model
parameter that is identified is the basic reproduction number R0. While R0 governs the speed
of spread of the virus, the key driver of the long run number of deaths in the model is the
case fatality ratio (CFR), which is not identified separately from initial values. As a result,
when I estimate the model for different initial values, I obtain models that are observationally
equivalent in the short run but produce markedly different estimates of unobserved variables
and long run forecasts. For instance, the estimated number of people who had the virus in
the US on March 31 for different observationally identical models ranges from several million
to around 140 million, while the predicted death toll varies from around 30 thousand to over
a million.
Second, I estimate the basic reproduction number R0, which is identified, and show that it
is heterogeneous across countries and regions. For the same values of clinical parameters, the
estimates of R0 for the US, UK, California, and New York state are about 2-3 times higher
than for Japan or Taiwan. Moreover, the estimates of R0 are highly sensitive to the values
of clinical parameters. There is no agreement in the medical literature on the length of the
incubation and infectious periods for COVID-19, and different values of these parameters result
in estimates of R0 for the US that range from 3.75 to 11.6. Despite the different values
of R0 and clinical parameters, the resulting models produce very similar, if not identical,
fits of the observed data. These findings highlight that there is no single value of R0 that is
consistent with the data, at least in the short run. There are many values of R0 that are
equally “correct,” and the exact value of R0 that is appropriate depends both on the country
and on the model.
My model takes into account possible underreporting of the number of COVID-19 cases.
Even though the fraction of all cases that is reported is not identified, I show that it is im-
portant to allow it to differ from one. I demonstrate that if one does not take underreporting
into account and estimates R0 from the confirmed cases data, assuming that all cases are
reported, the estimate of R0 will be biased downward. The resulting model is not consistent
with the observed US data, produces poor out of sample forecasts, and leads to severe over-
estimation of the long run number of deaths. In contrast, the estimates of R0 that I obtain
not only fit the observed data well, but also produce reasonable short run out of sample
forecasts.
Finally, I use the example of Iceland to show how auxiliary data can be used to narrow
down the range of possible forecasts of the long run number of deaths from the epidemic.
I use the results of presumably random testing conducted in Iceland to calibrate the initial
parameters of the model and show that doing so results in a 5-fold reduction in the range
of possible forecasts. This finding highlights the importance of random testing. Once more
countries conduct tests of random samples of population for having COVID-19 as well as
for having antibodies to it, it may become possible to calibrate the initial values better and
obtain more precise forecasts about the future.
The remainder of the paper is organized as follows. Section 2 presents the SEIRD model.
Section 3 describes the data I use. Section 4 discusses identification of the model. Section 5
outlines the estimation procedure, and Section 6 presents the results. Section 7 concludes.
All tables and figures are collected in Appendix A.



2 Model

In this paper I study a version of the SEIR model that includes dead among its compart-
ments. Similar models have been used in epidemiology by Chowell et al. (2007), Lin et al.
(2020), Wang et al. (2020), and others. More advanced versions of the model with more
compartments are considered in Chowell et al. (2003) and Chowell et al. (2006). I consider
a model with five groups of people: susceptible (S), exposed (E), infectious (I), recovered
(R), and dead (D). Susceptible are those who have not gotten the virus yet and can become
infected. Exposed are those who have gotten the virus but cannot transmit it to others yet.
This corresponds to the so called incubation period. Infectious are those who have the virus
and are contagious. Recovered are those who were sick in the past but have recovered from
the virus. Dead are those who have died because of the virus.
The number of people in different groups evolves over time as follows:

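\begin{align}
\frac{dS(t)}{dt} &= -\beta \frac{S(t)}{N} I(t) \tag{2.1} \\
\frac{dE(t)}{dt} &= \beta \frac{S(t)}{N} I(t) - \sigma E(t) \tag{2.2} \\
\frac{dI(t)}{dt} &= \sigma E(t) - \gamma I(t) \tag{2.3} \\
\frac{dR(t)}{dt} &= (1 - \alpha)\gamma I(t) \tag{2.4} \\
\frac{dD(t)}{dt} &= \alpha\gamma I(t) \tag{2.5} \\
\frac{dC(t)}{dt} &= \lambda\gamma I(t) \tag{2.6}
\end{align}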

N is the population size of a given country or region. I assume that it is fixed and does
not vary over time. I could model the dynamics of the population size to account for the fact
that some people die from the disease, but then I would also need to model births and deaths
due to other causes. In order to avoid these complications, I simply fix N, as is commonly
done in the literature. C(t) is the cumulative number of cases confirmed. It does not affect
the model dynamics but is used to match the model to the confirmed cases data. In my
main analysis, I assume that it is the people who are infectious, rather than exposed, who
are tested for the virus. In my robustness checks, I replace I(t) in equation (2.6) with E(t)
and find that the results remain virtually unchanged.
The evolution of the SEIRD model depends on several parameters. I will refer to γ
and σ as clinical parameters. The parameter γ reflects the estimated duration of illness.
Its estimates in the literature vary from 1/18 (e.g. Wang et al. (2020)) to 1/5 (e.g. Lin
et al. (2020)). The parameter σ reflects the estimated incubation period of the disease. Its
estimates in the literature vary from 1/5 (e.g. Wang et al. (2020), Lauer et al. (2020)) to 1/3
(Lin et al. (2020)).
The parameter β reflects the rate at which infectious people interact with others. It is
often written as β = R0 γ, where R0, called the basic reproduction number, measures the
transmission of the disease with no mitigation efforts. Liu et al. (2020) review the literature
on the estimation of R0 for COVID-19 and conclude that the average and median estimates
in the literature are around 3. However, Sanche et al. (2020) estimate in their recent study
that R0 in China was equal to 5.7, much higher than found in the previous literature.
The parameter α is the case fatality ratio (CFR). As discussed in Korolev (2020), the
CFR has serious limitations and heavily depends on the composition of people who get sick.
The CFR may also not be constant over time and can substantially increase if the health
care system becomes overwhelmed. However, for simplicity, I assume that α is fixed and try
to estimate it. Finally, λ is the proportion of all COVID-19 cases that is observed. It is also
estimated within the model.
The initial conditions for the number of recovered and dead are R(0) = 0 and D(0) = 0.
I need to pick the initial number of infectious I(0) and exposed E(0). I discuss their choice
later in the paper. Finally, the initial number of susceptible people is S(0) = N − I(0) −
E(0) − R(0) − D(0) = N − I(0) − E(0).
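To make equations (2.1)-(2.6) concrete, the following minimal Python sketch integrates them with a fixed-step Euler scheme. It is an illustration only, not the estimation code used in the paper: the function name `simulate_seird`, the step size, the population size, and all parameter values in the usage lines are assumptions chosen for readability.

```python
def simulate_seird(n_pop, r0, alpha, lam, sigma, gamma, e0, i0, days,
                   steps_per_day=20):
    """Integrate equations (2.1)-(2.6) with a fixed-step Euler scheme.

    Returns a list of daily states (S, E, I, R, D, C), starting at t = 0.
    """
    beta = r0 * gamma                      # beta = R0 * gamma, as in the text
    dt = 1.0 / steps_per_day
    # Initial conditions: R(0) = D(0) = 0 and S(0) = N - I(0) - E(0).
    s, e, i = n_pop - e0 - i0, float(e0), float(i0)
    r, d, c = 0.0, 0.0, 0.0
    path = [(s, e, i, r, d, c)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s / n_pop * i * dt     # flow S -> E
            new_infectious = sigma * e * dt             # flow E -> I
            removed = gamma * i * dt                    # flow out of I
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - removed
            r += (1.0 - alpha) * removed
            d += alpha * removed
            c += lam * removed                          # confirmed cases
        path.append((s, e, i, r, d, c))
    return path


# Hypothetical inputs: sigma = 1/4, gamma = 1/10, and made-up R0, alpha, lambda.
path = simulate_seird(n_pop=330e6, r0=5.0, alpha=0.01, lam=0.2,
                      sigma=1 / 4, gamma=1 / 10, e0=2, i0=2, days=60)
s, e, i, r, d, c = path[-1]
print(f"day 60: deaths = {d:.1f}, confirmed cases = {c:.1f}")
```

Because C(t) does not feed back into the other compartments, it can be dropped from the loop without changing the epidemic path.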



3 Data

In my estimation, I rely on the deaths and confirmed cases data for COVID-19. The
country level data is collected by the Center for Systems Science and Engineering at Johns
Hopkins University and is available online.1 The state level data for the US is collected by
the New York Times and is also available online.2 The population of different countries is
taken from World Population Prospects 2019 by the United Nations.3 The population of different
states in the US is taken from the US Census Bureau.4
I use T = 70 observations in my sample, with the first observation being January 22,
2020. Around that time, cases of coronavirus were widely registered outside China, e.g. in
the USA (January 21),5 Germany (January 27),6 and the UK (January 31).7 However, as I
show below, the initial conditions and the epidemic start date are not identified separately
from the CFR and the fraction of cases observed. I discuss the identification challenges in
more detail below. The last observation in my data corresponds to March 31, 2020.

4 Identification

In this section, I study identification of the model parameters based on the deaths and
confirmed cases data. There are several earlier papers on identification of the parameters of
the SIR and related models, e.g. Marinov et al. (2014), Magal and Webb (2018), and Ducrot
et al. (2019), but they are not directly applicable in the current setting. In particular,
they do not study whether the parameters are identified based on the short run data only.
Atkeson (2020a), written concurrently with and independently of this paper, attempts to answer
a question similar to mine in the context of the usual SIR model.

1 https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
2 https://github.com/nytimes/covid-19-data
3 https://population.un.org/wpp/Download/Standard/Population/
4 https://www.census.gov/data/tables/time-series/demo/popest/2010s-state-total.html
5 https://www.cdc.gov/media/releases/2020/p0121-novel-coronavirus-travel-case.html
6 https://www.dw.com/en/germany-confirms-human-transmission-of-coronavirus/a-52169007
7 https://www.bbc.co.uk/news/health-51325192



In the econometrics literature, identification often studies whether the parameters of the
model would be known if the researcher knew the population that data is drawn from (see,
e.g., Lewbel (2019)). In the context of this paper, the question is somewhat different: if
we observed the evolution of deaths and confirmed cases in the short run without any noise,
would we then know the parameters of the model? Or, in other words, do different parameter
values lead to different realizations of the observable data in the short run? Because the
SEIRD model cannot be solved in closed form, I simulate the deaths and observed cases
paths from models with different parameter values and investigate whether these paths are
identical instead of studying identification theoretically.
First, I assume that clinical parameters γ and σ are known constants and study identification
of the remaining parameters. The parameters of the model then include the case
fatality ratio α, the basic reproduction number R0, the fraction λ of all cases that is observed,
and the initial conditions: the number of infectious people I(0) = I0, the number of exposed
people E(0) = E0, and the time T0 that has passed since the epidemic started. For instance,
T0 = 1 means that the epidemic just started and the initial values (E0, I0) correspond to the
first period we observe. T0 = 2 means that the epidemic started last period and the current
period corresponds to (E(1), I(1)). T0 = 10 means that the epidemic started 9 periods ago
from values (E0, I0) and the current values are (E(9), I(9)). I denote the vector of parameters
θ = (α, R, λ, E0, I0, T0) and study whether these parameters can be identified based on the
short run (say, 60 days) data.
The upper panel of Figure 1 plots the simulated paths of deaths and confirmed cases
for three sets of parameters: θ1 = (0.01, 5, 0.2, 2, 2, 2), θ2 = (0.005, 5, 0.1, 4, 4, 2), and θ3 =
(0.004, 5, 0.08, 2, 2, 10). The first two sets of parameters share the same start date and re-
production number, but differ in the initial values and the values of α and λ. Essentially,
the epidemic that corresponds to the second set of parameters just scales the first epidemic
up by a factor of two, but cuts the fatality rate and the observable fraction of cases in half.
As a result, these two epidemics are indistinguishable in the short run. In other words, we
cannot tell from the short run data whether we observe a large epidemic with a low fatality
rate and large number of unreported cases, or a small epidemic with a high fatality rate and
small number of unreported cases.
The third epidemic starts from the same values of (E0, I0) as the first one, but nine
periods ago instead of last period. At the same time, it reduces the fatality rate and the
observable fraction by a factor of 2.5. It produces more cases in the current period than
the first epidemic, but a smaller fraction of them is observed and a smaller fraction leads to
death. As a result, the third epidemic is indistinguishable from the first one.
While the three sets of parameters are indistinguishable in the short run, the middle
panel of Figure 1 shows that the resulting epidemics lead to very different long run numbers
of deaths. The epidemic with the highest fatality rate α will result in about twice as many
deaths as either of the other two epidemics.
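The scaling argument can be checked numerically. The sketch below compares θ1 and θ2 from the text, which share R0 = 5 but differ in (α, E0, I0). The population size, the clinical parameters σ = 1/4 and γ = 1/10, the Euler integrator, and the helper name `deaths_path` are assumptions made for illustration, since the text does not state which values underlie Figure 1. T0 is ignored because both parameter sets share it.

```python
def deaths_path(alpha, r0, e0, i0, days, n_pop=330e6,
                sigma=1 / 4, gamma=1 / 10, steps_per_day=10):
    """Cumulative deaths at the end of each day, from Euler steps on (2.1)-(2.5)."""
    beta = r0 * gamma
    dt = 1.0 / steps_per_day
    s, e, i, d = n_pop - e0 - i0, float(e0), float(i0), 0.0
    deaths = []
    for _ in range(days):
        for _ in range(steps_per_day):
            infections = beta * s / n_pop * i * dt
            onsets = sigma * e * dt
            removals = gamma * i * dt
            s -= infections
            e += infections - onsets
            i += onsets - removals
            d += alpha * removals
        deaths.append(d)
    return deaths


# theta_1 and theta_2 from the text: same R0, doubled initial values, halved alpha.
small_epidemic = deaths_path(alpha=0.01, r0=5.0, e0=2, i0=2, days=600)
large_epidemic = deaths_path(alpha=0.005, r0=5.0, e0=4, i0=4, days=600)

# Relative gap in cumulative deaths after 60 days, and ratio of final death tolls.
short_run_gap = abs(small_epidemic[59] - large_epidemic[59]) / small_epidemic[59]
long_run_ratio = small_epidemic[-1] / large_epidemic[-1]
print(f"relative gap in deaths on day 60: {short_run_gap:.4f}")
print(f"ratio of long run deaths: {long_run_ratio:.2f}")
```

While S(t)/N stays close to one, the system is linear in (E, I), so doubling the initial values and halving α leaves the deaths path essentially unchanged. Once almost everyone has been infected, the death toll is driven by α alone.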
Next, I study identification of R0. The bottom panel of Figure 1 shows that R0 affects
the curvature of the deaths and reported cases curves, while other parameters only tilt them
around the origin. As a result, R0 can be uniquely identified from the curvature of deaths
and confirmed cases.
Figure 2 parallels Figure 1, but plots the logarithms of deaths and reported cases rather
than their levels. It starts from day 30 rather than day 1, because logarithms are very
sensitive to small values of different variables that are observed in the very beginning of the
epidemic. The figure shows that changes in the initial conditions, the fatality rate α, or
the observable fraction of cases λ shift the lines up or down without affecting their slope,
while changes in the reproduction number R0 change the slope of the lines. Thus, R0 can be
identified from the slope of the log series, but the remaining parameters cannot be separately
identified.
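The role of the slope can be made explicit with a standard linearization, added here as an illustration rather than as part of the original argument. Early in the epidemic S(t)/N ≈ 1, so equations (2.2) and (2.3) are approximately linear in (E, I). The symbols r (the early exponential growth rate) and a_D, a_C (intercepts) are notation introduced for this sketch.

```latex
\begin{align*}
\frac{dE(t)}{dt} &\approx \beta I(t) - \sigma E(t), \\
\frac{dI(t)}{dt} &= \sigma E(t) - \gamma I(t), \\
(r + \sigma)(r + \gamma) &= \sigma\beta = \sigma\gamma R_0
  \quad\Longrightarrow\quad
  R_0 = \Bigl(1 + \frac{r}{\sigma}\Bigr)\Bigl(1 + \frac{r}{\gamma}\Bigr), \\
\log D(t) &\approx a_D + r t, \qquad \log C(t) \approx a_C + r t .
\end{align*}
```

The third line is the condition under which E(t) and I(t) grow in proportion to e^{rt}, with r its positive root. The last line holds once this growing mode dominates. For given σ and γ, the slope r pins down R0, while α, λ, and the initial conditions enter only through the intercepts a_D and a_C.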
Because one cannot separately identify α, λ, E0, I0, and T0, I set T0 = 1 and I0 = 0. In
my main analysis, I pick the lowest value of E0 that results in an interior solution for both α
and λ during all stages of the estimation process. I should note, however, that other choices
of initial parameters are possible and that the estimate of α that I will obtain should be
viewed as a surrogate of the fatality rate and initial conditions rather than the true fatality
rate per se.
Next, Figure 3 explores the role of different parameters in the evolution of the model. It
demonstrates that changes in the value of R0 primarily affect the timing of the epidemic but
have little effect on the total death toll. The values of α and λ affect the number of deaths
and reported cases respectively, but they have no effect on the model dynamics. Finally, the
initial values E0 and I0 affect the timing of the model, but to a much smaller extent than
the value of R0. Thus, if we are interested in modeling the evolution of the epidemic and its
burden in terms of the number of deaths, the primary parameters of interest are R0 and α,
while the remaining model parameters can be viewed as nuisance parameters.
In the discussion so far, I have assumed that the values of clinical parameters γ and σ are
known. But, as has been discussed earlier, there is no agreement in the medical literature
on their appropriate values. The estimates of γ range from 1/18 to 1/5 and the estimates
of σ range from 1/5 to 1/3. Hence, I consider three scenarios: “fast” with σ = 1/3, γ = 1/5,
“medium” with σ = 1/4, γ = 1/10, and “slow” with σ = 1/5, γ = 1/18. While these exact choices
are somewhat arbitrary, considering several values of clinical parameters instead of just one
will allow me to better understand their effect on the estimation results and forecasts.
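To illustrate why the estimate of R0 depends so strongly on the clinical parameters, the sketch below uses the linearized early-phase relation (r + σ)(r + γ) = σγR0, where r is the exponential growth rate of infections while S(t)/N ≈ 1. This relation follows from equations (2.2) and (2.3) but is not stated in the text, and the growth rate used below is a hypothetical number chosen only for illustration.

```python
def r0_from_growth_rate(r, sigma, gamma):
    """R0 implied by an early exponential growth rate r in the linearized system."""
    return (1.0 + r / sigma) * (1.0 + r / gamma)


# The three scenarios from the text, as (sigma, gamma) pairs.
scenarios = {
    "fast": (1 / 3, 1 / 5),
    "medium": (1 / 4, 1 / 10),
    "slow": (1 / 5, 1 / 18),
}

growth_rate = 0.23  # hypothetical daily growth rate, not an estimate from the paper

implied_r0 = {name: r0_from_growth_rate(growth_rate, sigma, gamma)
              for name, (sigma, gamma) in scenarios.items()}
for name, value in implied_r0.items():
    print(f"{name:>6}: R0 = {value:.2f}")
```

The same observed growth rate maps into a larger R0 when the incubation and infectious periods are assumed to be longer, so the ordering across scenarios is fast, medium, slow.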
Finally, I should note that the number of deaths from the virus may be overreported,
because some people who died with the virus did not die because of it. In that case, my estimate of
R0 will still be correct as long as the fraction of such deaths in all observed deaths is constant over time.
This measurement error will only bias the estimate of α upward. Similarly, if the
number of deaths is underreported by the same factor in all time periods (say, only half of
all deaths from the virus are reported), the estimate of R0 will be correct but the estimate of
α will be biased downward.
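The argument can be written out in one line. Suppose the reported deaths are a constant multiple k of the true deaths, with k > 1 for overreporting and k < 1 for underreporting. The symbol k is notation introduced here for illustration. Integrating equation (2.5) with D(0) = 0 gives

```latex
\[
D^{\mathrm{obs}}(t) = k\,D(t) = (k\alpha)\,\gamma \int_0^t I(s)\,ds ,
\]
```

so the observed series is generated by the same model with α replaced by kα. The path of I(t), and hence R0, is unaffected.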



5 Estimation

In this section, I describe the estimation procedure that I use. It is similar to Method 2
in Chowell et al. (2007) but consists of several steps because I use both deaths and reported
cases data. As discussed above, I fix T0 = 1 and I0 = 0. I start from E0 = 1 and increase it
if needed as discussed below. For a given vector of parameters (α, R0, λ) and each period t, I
compute the implied number of deaths D(t, α, R) and the implied number of reported cases
C(t, R, λ).8 Then I compute the residual sum of squares for the deaths series, given by

\[
\mathrm{RSS}_D(\alpha, R) = \sum_{t=1}^{T} \bigl(D(t) - D(t, \alpha, R)\bigr)^2,
\]

and the residual sum of squares for the reported cases series, given by

\[
\mathrm{RSS}_C(R, \lambda) = \sum_{t=1}^{T} \bigl(C(t) - C(t, R, \lambda)\bigr)^2,
\]

where D(t) and C(t) are the actual data. I then find the values (α̃D, R̃D) that minimize
RSSD and the values (R̃C, λ̃C) that minimize RSSC. I then estimate (α, R0, λ) jointly by
minimizing

\[
\mathrm{RSS}_T(\alpha, R, \lambda) = \frac{\mathrm{RSS}_D(\alpha, R)}{\mathrm{RSS}_D(\tilde{\alpha}_D, \tilde{R}_D)} + \frac{\mathrm{RSS}_C(R, \lambda)}{\mathrm{RSS}_C(\tilde{R}_C, \tilde{\lambda}_C)}.
\]

I use the normalization by the preliminary values of the RSS for deaths and reported cases
so that both series contribute roughly equally to the final objective function. If I did not
normalize their contributions, then RSSC would dominate, because the number of reported
cases in the data is orders of magnitude larger than the number of deaths. I call the resulting
estimates (α̂, R̂, λ̂).
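The normalization can be sketched in a few lines of Python. The helper names `rss` and `rss_total`, the toy series, and the two first-step minima are hypothetical. In the procedure described above, the fitted series would come from solving the SEIRD model at candidate values of (α, R0, λ), and the denominators would be the minimized first-step values of RSSD and RSSC.

```python
def rss(actual, fitted):
    """Residual sum of squares between an observed and a model-implied series."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted))


def rss_total(deaths, cases, fitted_deaths, fitted_cases, rss_d_min, rss_c_min):
    """Joint objective: each RSS is divided by its first-step minimum."""
    return (rss(deaths, fitted_deaths) / rss_d_min
            + rss(cases, fitted_cases) / rss_c_min)


# Toy data in which cases are two orders of magnitude larger than deaths.
deaths, cases = [1, 3, 9], [100, 300, 900]
fitted_deaths, fitted_cases = [1, 4, 8], [110, 280, 930]

# Without normalization the cases series dominates the sum.
raw = rss(deaths, fitted_deaths) + rss(cases, fitted_cases)

# Hypothetical first-step minima, used only to show the rescaling.
normalized = rss_total(deaths, cases, fitted_deaths, fitted_cases,
                       rss_d_min=1.0, rss_c_min=700.0)
print(raw, normalized)
```

After dividing by the first-step minima, both terms are measured relative to the best fit attainable for each series on its own, so neither dominates purely because of its scale.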
If any of α̃D, λ̃C, α̂, or λ̂ are at the upper bound of 1, then I increase E0 until the
constraint is no longer binding. Thus, I use the lowest possible value of E0 for which none
of the constraints on the parameters is binding. One could drop that restriction and always
use E0 = 1. Because I attempt to use the lowest possible value of E0, the estimated fatality
rate α will be close to, or at, its upper bound. The forecasted number of deaths could then
be viewed as the upper bound on the possible number of deaths, but it should be interpreted
with caution: the model I use may not be correct, and some of its assumptions, such as the
fatality rate α being constant over time, may be unrealistic.

8 Note that the fatality rate does not affect the number of reported cases in the model, while the observable
fraction of cases does not affect the number of deaths.

In addition to estimating the parameters by minimizing the RSS in levels, I obtain a
separate set of estimates by minimizing the RSS in logarithms. For the deaths series, it is
given by
\[
\mathrm{RSS}_{D,\mathrm{logs}}(\alpha, R) = \sum_{t=1}^{T} \bigl(\log D(t) - \log D(t, \alpha, R)\bigr)^2 \, \mathbf{1}\{D(t) > 1\},
\]

and for the reported cases series, by

\[
\mathrm{RSS}_{C,\mathrm{logs}}(R, \lambda) = \sum_{t=1}^{T} \bigl(\log C(t) - \log C(t, R, \lambda)\bigr)^2 \, \mathbf{1}\{C(t) > 25\}.
\]

When I compute the RSS for deaths in logs, I trim observations where the number of
deaths is below 2. For the RSS for cases in logs, I trim observations where the number of
cases is below 26. These choices are arbitrary, but they are motivated by the fact that the
log series, which should be linear in the model in the short run, typically exhibit breaks in
the data. During the early stages of the epidemic, they are almost flat, but then their slope
increases. By using trimming, I fit the model to the latter portion of the data, where the
linearity of the log series in time appears to hold, but exclude the early observations before
the break.
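A minimal sketch of the trimmed log criterion follows. The function name `rss_logs` and the toy series are hypothetical; the thresholds of 1 for deaths and 25 for cases are the ones described in the text.

```python
import math


def rss_logs(actual, fitted, threshold):
    """Log-scale RSS that keeps only periods where the observed series exceeds threshold."""
    return sum((math.log(a) - math.log(f)) ** 2
               for a, f in zip(actual, fitted)
               if a > threshold)


# Toy series: the early flat stretch (0, 1, 1) is trimmed and does not enter the sum.
observed_deaths = [0, 1, 1, 2, 4, 8]
fitted_deaths = [0.1, 0.3, 0.9, 2.0, 4.0, 8.0]
print(rss_logs(observed_deaths, fitted_deaths, threshold=1))  # deaths: keep D(t) > 1
```

For the cases series the same function would be called with `threshold=25`. The filter is applied before the logarithm is taken, so zero observations never reach `math.log`.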
I abstract away from the statistical properties of my estimators and consider them purely
as curve fitting algorithms. There are several reasons for this. First, the SEIRD model is
deterministic and does not contain any error terms. In order to conduct inference properly,
one would need to introduce the error term in the model and discuss its properties. Because
I focus on identification and forecasting, doing so is beyond the scope of the paper. Second,
while a bootstrap procedure for computing standard errors is proposed and used in Chowell
et al. (2007), I am not aware of its theoretical justification. Third, the bootstrapped standard
errors turn out to be very unstable and poorly estimated.
While I do not consider inference in this paper, in order to validate whether my model is
reasonable, I assess its short run forecasting performance using the validation set approach.

6 Results

This section presents the estimation results. Before I move on to my main results, I discuss
computational issues associated with estimation of the model. Based on the arguments from
the previous section, when I change the initial parameters, the estimate of R0 should remain
unchanged, while the estimates of α and λ should change proportionally to the changes in the initial
parameters. In practice, however, this is not always the case. Both α and λ are constrained
to lie between 0 and 1, and when these constraints are binding or close to binding, changes
in the initial values can have a substantial effect on the estimate of R0 .
Table 1 reports the estimates of R0, α, and λ for different values of the initial parameters
when the RSS is computed in levels or in logs. When α and λ are reasonably far from the
upper and lower bound, as in Panel B, the estimate of R0 is fairly robust to changes in the
initial parameters. However, the left parts of Panels A and C demonstrate that when α is
close to its lower bound, the estimate of R0 can be very sensitive to the choice of initial
parameters. This issue seems to be more pronounced when the RSS is computed in levels
rather than logs. I have tried checking the sensitivity of R0 to the initial values by estimating
the model for several sets of initial values. Whenever possible, I pick the initial values so
that the constraints are not binding and the estimate of R0 is fairly stable.
The estimates of R0 for different values of clinical parameters σ and γ are presented in
Table 2. The upper panel of the table uses the RSS computed in levels, while the bottom
panel is based on the RSS computed in logarithms. The first three columns of the table
correspond to the “fast,” “medium,” and “slow” scenarios above. The last column of the table
reports the results from a modified version of the SEIRD model that replaces β (S(t)/N) I(t) in
equations (2.1) and (2.2) with β (S(t)/N) (I(t) + qE(t)) for q = 0.5. This version of the model
assumes that people in the exposed compartment can be contagious, but to a lesser extent
than people in the infectious compartment.
The results suggest that there is no single correct value of R0 . The estimates of R0 for a
given country or region change a lot as clinical parameters change. For example, the estimate
of R0 for the US ranges from under 4 in the “fast” scenario to around 11.5 in the “slow” one.
Moreover, for the fixed values of clinical parameters γ and σ, the estimates of R0 differ a lot
from region to region. For instance, in the “medium” scenario with σ = 1/4 and γ = 1/10, the
estimates of R0 vary from 2 for Japan to around 6.5 for the US and 7 for New York state.
While the magnitude of cross-country differences might be surprising, heterogeneity itself is
not. For example, Ferguson et al. (2020) assume in their analysis that various mitigation
or suppression policies can reduce R0 . If this is the case, one could expect that different
countries have different values of R0 due to differences in their approaches to dealing with
COVID-19, as well as due to differences in social norms, population density, etc.
Tables 3 and 4 present robustness checks. Table 3 replaces I(t) in equation (2.6) with
E(t), i.e. it assumes that the number of confirmed cases is based on the number of exposed
rather than infectious people in the model. As we can see, this has little effect on the estimates
of R0, which remain very similar to those in the baseline model.
Table 4 presents the estimates based on the first 60, rather than 70, observations. This
robustness check is motivated by the fact that several states in the US issued stay home
orders around March 20, e.g. California on March 19 and New York on March 22.9 Hence,
one may be worried that these measures affected the value of the basic reproduction number
R0. By considering the first 60 observations, i.e. the data up to March 21, I should be able
to address this concern. Indeed, the estimates of R0 in Table 4 are typically higher than in
Table 2; however, the pattern of heterogeneity remains unchanged: Japan and Taiwan have
substantially lower values of R0 than Western countries and regions.

9 https://www.nytimes.com/interactive/2020/us/coronavirus-stay-at-home-order.html

Next, I present my results graphically. The upper panel of Figure 4 plots the fitted values
of deaths and reported cases for the US from the models with the estimates from the first
row of Panel A of Table 2. As we can see, even though the four models have different values
of clinical parameters and different estimates of R0, they appear to be indistinguishable or
almost indistinguishable in the short run: the resulting paths of deaths and reported cases are
nearly identical. However, in the long run the story is different. The lower panel of Figure 4
demonstrates that the predicted total number of deaths from the COVID-19 epidemic in the
four models ranges from around 35 thousand to almost 340 thousand.
Figure 5 repeats this exercise for the estimates from the first row of Panel B of Table 2,
when the RSS is computed in logarithms. The exact forecasted number of deaths in this
figure differs from that in Figure 4, in part due to different initial values, but the qualitative
results remain unchanged: different observationally equivalent models produce very different
forecasts.
Figure 6 shows that replacing I(t) in equation (2.6) with E(t) has little, if any, effect on
the results. Figures 7 and 8 present the results for California and demonstrate that they are
qualitatively the same as for the US as a whole.
Next, Figures 9 and 10 fix the values of clinical parameters σ = 1/4 and γ = 1/10 and
consider the pessimistic and optimistic scenarios, given by different initial conditions, for
which the resulting models are observationally equivalent or almost equivalent.10 The pes-
simistic scenario corresponds to the lowest possible initial condition E0 = 1, I0 = 0, while the
optimistic scenario corresponds to E0 = 16, I0 = 0. Intuitively, the lower the initial values,
the lower the cumulative number of people who have had the virus, the higher the estimated
fatality rate, and the higher the forecasted death toll.

10 Small differences are possible due to the aforementioned numerical problems.


We can see that different initial conditions lead to observationally equivalent models
in the short run. However, there are large differences in estimates of unobserved variables and
in long run forecasts. For instance, when the RSS is computed in levels, the model with the
initial condition E0 = 1, I0 = 0 estimates that the current number of people with COVID-19
(here I count people both in the exposed and infectious compartments) in the US is around
6 million and predicts over 1.14 million deaths in the long run. In contrast, the model with
E0 = 16, I0 = 0 estimates that there are currently around 140 million people with COVID-19
and predicts around 33 thousand deaths in the long run, a 34-fold difference. When the
RSS is computed in logs, the current number of people with COVID-19 varies from under
10 million to 108 million, and the forecasted death toll ranges from 45 thousand to over 730
thousand.
Figure 11 presents the results for different estimates of R0 in the “medium” scenario with
σ = 1/4 and γ = 1/10. One model corresponds to the pessimistic case scenario from Figure 9
with R̂0 = 6.37 and E0 = 1. The forecasted long run number of deaths for that model is 1.14
million. The remaining two models estimate R0 from the confirmed cases data assuming that
all cases are observed, i.e. λ = 1, and then recover α from the deaths data. The resulting
estimates of R0 are lower, 5.51 when the initial conditions are I0 = 0, E0 = 1, and 3.18
when the initial conditions are I0 = 25, E0 = 100. As we can see from the upper panel, the
resulting models, especially the latter one, provide a poorer fit of the observed data: they
cannot generate enough curvature because of the low R0 .
Because the difference between the red (E0 = 1, possible underreporting) and orange
(E0 = 1, no underreporting) models in Figure 11 may not appear substantial, Figure 12
compares these two models using the validation set approach. I divide the available data
into two parts: a training period, based on which the model is estimated, and a test period,
for which I compute forecasts and compare them with actual values. I vary the length of
the training period from 55 to 65 observations. We can see that the model that allows for
underreporting of cases can predict the number of deaths out of sample much better than the
model that does not allow for underreporting. As the sample size gets larger, the difference
between the models becomes smaller because the value of R0 estimated from the reported
cases data increases. Nevertheless, the method that allows for underreporting seems to be
preferable if we are interested in predicting the future number of deaths.
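The validation set procedure can be sketched as follows. This is only an illustration of the train/test split: the fitted model is a log-linear trend used as a stand-in for the SEIRD estimation, the series is synthetic, and the function names and the series length of 70 are assumptions. Only the training lengths of 55, 60, and 65 observations come from the text.

```python
import math


def fit_log_linear(y):
    """OLS of log(y) on the time index; a stand-in for estimating the SEIRD model."""
    n = len(y)
    t_mean = (n - 1) / 2
    logs = [math.log(v) for v in y]
    l_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in enumerate(logs))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return l_mean - slope * t_mean, slope


def validation_error(series, train_len):
    """Fit on the first train_len points, forecast the rest, return RMSE in logs."""
    train, test = series[:train_len], series[train_len:]
    intercept, slope = fit_log_linear(train)
    errors = [intercept + slope * (train_len + k) - math.log(v)
              for k, v in enumerate(test)]
    return math.sqrt(sum(err * err for err in errors) / len(errors))


# Synthetic cumulative deaths (placeholder, not real data).
series = [2.0 * math.exp(0.15 * t) for t in range(70)]
errors_by_length = {n: validation_error(series, n) for n in (55, 60, 65)}
for train_len, rmse in errors_by_length.items():
    print(train_len, rmse)
```

Because the synthetic series is exactly exponential, the stand-in model forecasts it almost perfectly; with real data, comparing these out of sample errors across models is what distinguishes them.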
The bottom panel of Figure 11 shows that the forecasted long run number of deaths
from both these models is between 5 and 6 million, a lot higher than in the pessimistic model
with higher R0 that fits the data well. Thus, estimating R0 based on the confirmed cases data
under the assumption that all cases are observed leads to the downward bias in the estimate
of R0 , poor fit of the observed data, and severe overestimation of the long run number of
deaths. Figure 13 repeats this analysis when the RSS is computed in logarithms and reaches
similar conclusions.
One may wonder whether the models studied here provide a good enough approximation
of reality. I try to answer this question by assessing the quality of out of sample predictions
generated by the models. If a model can fit the observed data well, it does not necessarily
produce good forecasts. But if the forecasts obtained using a given model are precise, this
can be viewed as evidence in favor of the model’s credibility.
I use the validation set approach as described above to study whether the models considered
here have any predictive power in the short run. Figures 14 and 15 present the results
from the validation set exercise, similar to the one described above, for the US for different
choices of clinical parameters when the RSS is computed in levels and logs, respectively. Figures
16 and 17 present the results for California. As we can see, out of sample forecasts seem
to be better when the RSS is computed in levels, especially when the number of observations
in the training sample is equal to 55 or 60. But overall, it appears that the models considered
here generate fairly similar short run out of sample forecasts, and these forecasts match the
observed values closely. The discrepancy between the forecasts and observed values closer
to the end of the data might arise due to the mitigation policies, which are not taken into
account by the models. I could allow R0 to differ before and after policy interventions, but
the analysis of the impact of lockdowns on the value of R0 is beyond the scope of this paper.
Finally, I study whether additional information may be helpful for calibrating the initial
conditions using Iceland as an example. Iceland is an interesting country to study because
it was among the first countries to launch wide-scale random, or nearly random, testing of
its population.11 While it is debatable whether testing in Iceland is completely random, my
goal here is to demonstrate how information from these tests could in principle be used to
calibrate the initial values.
The tests done in Iceland found 48 positives among 5,571 people who were tested for
COVID-19. I fix the clinical parameter values σ = 1/4 and γ = 1/10 and focus on the choice
of initial values. I assume that the fraction of Iceland’s population who had COVID-19 on
March 21, when the results were published, is the same as in the test, 0.86%. Given the
population of 341,250, this translates into 2,935 cases, with the 95% confidence interval of
[2,107, 3,762]. I hold I0 = 0 and calibrate E0 so that the sum of exposed and infectious
people on March 21 matches the numbers calculated above. E0 = 4.745 yields 2,933 cases,
E0 = 3.49 yields 2,107 cases, and E0 = 5.956 yields 3,762 cases on March 21. For simplicity, I
do not require that E0 be an integer. The results are presented in the right panel of Figure 18.
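The case count and its interval can be reproduced with a short calculation. The sketch below assumes that the interval is a normal-approximation (Wald) interval applied to the positive share rounded to 0.86%; this is inferred from the reported numbers rather than stated in the text, and the variable names are illustrative.

```python
import math

positives, tested = 48, 5571
population = 341_250

# Share of positives, rounded to 0.86% as quoted in the text (assumption).
share = round(positives / tested, 4)

# Wald 95% interval for a proportion (assumed method).
std_err = math.sqrt(share * (1 - share) / tested)
z = 1.96
share_low, share_high = share - z * std_err, share + z * std_err

cases = share * population
cases_low, cases_high = share_low * population, share_high * population
print(round(cases), round(cases_low), round(cases_high))  # 2935 2107 3762
```

The values of E0 quoted above are then those for which the simulated sum of exposed and infectious people on March 21 matches each of these three numbers.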
For comparison, the left panel of Figure 18 plots the results for E0 = 1.5, E0 = 4.745,
and E0 = 16. As we can see, both in the left and right panel all models are indistinguishable
on the available data. However, in the left panel, the forecasted death toll varies from under
50 for E0 = 16 to over 650 for E0 = 1.5; in the right panel, it varies from around 150 for
E0 = 5.956 to around 275 for E0 = 3.49. Thus, the use of additional information leads to a
5-fold reduction in the range of forecasted deaths for observationally equivalent models. This
result demonstrates the value of auxiliary information that becomes available due to random
testing.
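The 5-fold figure follows from comparing the widths of the two forecast ranges quoted above. As a rough check, treating the approximate endpoints ("under 50", "over 650", "around 150", "around 275") as exact:

```latex
\[
\frac{650 - 50}{275 - 150} = \frac{600}{125} = 4.8 \approx 5 .
\]
```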
11 https://www.government.is/news/article/?newsid=f96a270c-66e8-11ea-945f-005056bc4d74


7 Conclusion

In this paper, I show that the SEIRD model for COVID-19 is poorly identified from the
short run data on deaths and reported cases. There can be many different models that are
indistinguishable in the short run but result in different estimates of unobserved variables and
markedly different long run forecasts. The estimated number of active COVID-19 cases in the
US based on observationally equivalent models can vary from several million to 140 million,
while the forecasted number of deaths from the epidemic ranges from tens of thousands to over
a million. Thus, this paper highlights that long run forecasts for COVID-19 heavily depend
on arbitrary choices made by the researcher. Available data cannot be used to determine
which forecast is correct because there are many models that are observationally equivalent
in the short run.
Next, I show that the basic reproduction number R0 is identified from the data, conditional
on the values of clinical parameters, and estimate it. My model takes into account possible
underreporting of cases. I show that there appears to be no single correct value of the basic
reproduction number R0 , as its estimates vary depending on clinical parameters. However,
for all values of clinical parameters the estimates of R0 are heterogeneous across countries:
they are 2-3 times higher in the US and Western countries than in Asian countries. I also
demonstrate that the estimates of R0 based on the confirmed cases data under the assumption
that all cases are reported are biased downward. The resulting models are inconsistent with
the observed data and dramatically overestimate the long run number of deaths.
Finally, I demonstrate that auxiliary information from random tests for COVID-19 can
help calibrate the initial values of the model and reduce the range of possible forecasts that
are consistent with the observed data. Random, or nearly random, tests were conducted in
Iceland, and utilizing the information from these tests leads to a 5-fold reduction in the range
of the forecasted number of deaths.
I do not take a stand on which of the models used and forecasts made, if any, is correct.
The model I consider is fairly simplistic and does not take into account important factors
such as possible overloading of the health care system, mitigation efforts, etc. There are
more sophisticated and realistic epidemic models that may be able to predict the spread
of COVID-19 and the long run number of deaths better than the models presented here.
However, those models usually have even more parameters, so one may worry that their
identification would be even more troublesome.


Appendix A Tables and Figures

Table 1: Estimates of α, R0, and λ

        Levels,           Levels,           Logs,             Logs,
        E0 = 3, I0 = 0    E0 = 6, I0 = 0    E0 = 3, I0 = 0    E0 = 6, I0 = 0

Panel A: Deaths Only
α       0.00052           0.00013           0.00152           0.00076
R0      6.822             7.268             6.138             6.140

Panel B: Cases Only
R0      5.156             5.160             7.457             7.513
λ       0.6695            0.3329            0.0108            0.0052

Panel C: Deaths and Cases
α       0.00103           0.00009           0.00072           0.00036
R0      6.440             7.492             6.640             6.656
λ       0.0569            0.0053            0.0312            0.0153

The table presents the estimates of α, R0 , and λ for USA for γ = 1/10, σ = 1/5, and different initial parameters.


Table 2: Estimates of R0

              σ = 1/3,    σ = 1/4,    σ = 1/5,    σ = 1/4, γ = 1/10,
              γ = 1/5     γ = 1/10    γ = 1/18    q = 0.5

Panel A: RSS in levels
USA           3.748       6.440       11.625      5.308
California    2.827       4.604       7.637       3.576
New York      3.854       6.937       11.423      4.631
UK            3.140       5.279       8.967       3.974
Japan         1.568       2.045       2.809       1.848
Taiwan        1.750       2.404       3.468       2.117

Panel B: RSS in logs
USA           3.768       6.625       11.621      4.741
California    3.082       5.144       8.695       3.902
New York      4.014       7.445       13.601      5.048
UK            3.736       6.634       11.752      4.704
Japan         1.785       2.467       3.574       2.159
Taiwan        1.577       2.064       2.843       1.863

The table presents the estimates of R0 for different countries and different values of clinical parameters σ
and γ. Panel A computes RSS in levels. Panel B computes RSS in logs.
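As a rough check of the cross-country gap, the sketch below computes the ratios of the R0 estimates in Panel A for σ = 1/4, γ = 1/10. The values are copied from the table; the grouping of regions into "Western" and "Asian" follows the wording of the paper, and the variable names are illustrative.

```python
# R0 estimates from Table 2, Panel A (RSS in levels), sigma = 1/4, gamma = 1/10.
r0_levels = {
    "USA": 6.440, "California": 4.604, "New York": 6.937, "UK": 5.279,
    "Japan": 2.045, "Taiwan": 2.404,
}
western = ["USA", "California", "New York", "UK"]
asian = ["Japan", "Taiwan"]

# Ratio of each Western estimate to each Asian estimate.
ratios = {(w, a): r0_levels[w] / r0_levels[a] for w in western for a in asian}
for (w, a), ratio in ratios.items():
    print(f"{w} / {a}: {ratio:.2f}")
```

For this column the ratios fall roughly between 1.9 and 3.4.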


Table 3: Alternative Estimates of R0

              σ = 1/3,    σ = 1/4,    σ = 1/5,    σ = 1/4, γ = 1/10,
              γ = 1/5     γ = 1/10    γ = 1/18    q = 0.5

Panel A: RSS in levels, E instead of I in equation (2.6)
USA           3.652       6.352       11.058      4.606
California    2.829       4.608       7.644       3.580
New York      3.859       7.080       12.832      4.452
UK            3.144       5.261       8.913       3.981
Japan         1.575       2.069       2.870       1.867
Taiwan        1.752       2.410       3.483       2.121

Panel B: RSS in logs, E instead of I in equation (2.6)
USA           3.768       6.625       11.623      4.741
California    3.083       5.146       8.698       3.903
New York      3.932       7.209       13.055      4.926
UK            3.777       6.713       11.908      4.773
Japan         1.798       2.507       3.673       2.190
Taiwan        1.586       2.092       2.912       1.884

The table presents the estimates of R0 for different countries and different values of clinical parameters σ and
γ when E(t) is used in equation (2.6) instead of I(t). Panel A computes RSS in levels. Panel B computes
RSS in logs.


Table 4: Estimates of R0, First 60 Observations

              σ = 1/3,    σ = 1/4,    σ = 1/5,    σ = 1/4, γ = 1/10,
              γ = 1/5     γ = 1/10    γ = 1/18    q = 0.5

Panel A: RSS in levels
USA           4.594       8.597       15.784      5.676
California    3.157       5.304       9.009       3.996
New York      4.250       7.955       15.784      5.296
UK            3.973       7.137       18.424      4.982
Japan         1.712       2.323       3.308       2.054
Taiwan        1.390       1.526       1.865       1.436

Panel B: RSS in logs
USA           3.316       5.648       9.688       4.193
California    3.070       5.123       8.656       3.888
New York      4.427       8.352       15.408      5.500
UK            3.991       7.147       12.709      5.002
Japan         1.898       2.690       3.985       2.318

The table presents the estimates of R0 for different countries and different values of clinical parameters σ
and γ. Panel A computes RSS in levels. Panel B computes RSS in logs. Only the first 60 observations are
used in estimation. Note: results for Taiwan become unreliable when RSS is computed in logs due to the
very small number of reported cases and deaths in the data and are thus omitted.


Figure 1: Parameter Identification

The upper panel shows the short run number of deaths and reported cases for three sets of parameters θ1 =
(0.01, 5, 0.2, 2, 2, 2), θ2 = (0.005, 5, 0.1, 4, 4, 2), and θ3 = (0.004, 5, 0.08, 2, 2, 10), where θ = (α, R, λ, E0 , I0 , T0 ).
The middle panel shows the long run forecasts from these models. The lower panel fixes the initial conditions
and shows the short run number of deaths and reported cases for (α, R, λ) = (0.01, 5, 0.2), (0.01, 3, 0.2), and
(0.05, 3, 0.8).


Figure 2: Parameter Identification in Logarithms

The upper panel shows the logarithms of the short run number of deaths and reported cases for three sets
of parameters θ1 = (0.01, 5, 0.2, 2, 2, 2), θ2 = (0.005, 5, 0.1, 4, 4, 2), and θ3 = (0.004, 5, 0.08, 2, 2, 10), where
θ = (α, R, λ, E0 , I0 , T0 ). The lower panel fixes the initial conditions and shows the logarithms of the short
run number of deaths and reported cases for (α, R, λ) = (0.01, 5, 0.2), (0.01, 3, 0.2), and (0.05, 3, 0.8).


Figure 3: Role of Different Parameters

The upper panel shows the short run number of deaths and reported cases for three sets of parameters θ1 =
(0.01, 5, 0.2, 2, 2, 2), θ2 = (0.005, 5, 0.1, 4, 4, 2), and θ3 = (0.004, 5, 0.08, 2, 2, 10), where θ = (α, R, λ, E0 , I0 , T0 ).
The middle panel shows the long run forecasts from these models. The lower panel fixes the initial conditions
and shows the short run number of deaths and reported cases for (α, R, λ) = (0.01, 5, 0.2), (0.01, 3, 0.2), and
(0.05, 3, 0.8).


Figure 4: Results for USA

The upper panel shows the fit of the actual deaths and reported cases by models with four different values
of clinical parameters σ and γ. The middle panel shows the fit of the logarithms of the actual deaths and
reported cases for the same four models. The lower panel shows the forecasts from the same four models.


Figure 5: Results for USA, Fitting in Logs

The upper panel shows the fit of the actual deaths and reported cases by models with four different values
of clinical parameters σ and γ. The middle panel shows the fit of the logarithms of the actual deaths and
reported cases for the same four models. The lower panel shows the forecasts from the same four models.


Figure 6: Results for USA, Robustness Check

The upper panel shows the fit of the actual deaths for the models that use I(t) and E(t) in equation (2.6).
The middle panel shows the fit of the reported cases for the same models. The bottom panel shows the
deaths forecasts. The left panel computes RSS in levels. The right panel computes RSS in logs.


Figure 7: Results for California

The upper panel shows the fit of the actual deaths and reported cases by models with four different values
of clinical parameters σ and γ. The middle panel shows the fit of the logarithms of the actual deaths and
reported cases for the same four models. The lower panel shows the forecasts from the same four models.


Figure 8: Results for California, Fitting in Logs

The upper panel shows the fit of the actual deaths and reported cases by models with four different values
of clinical parameters σ and γ. The middle panel shows the fit of the logarithms of the actual deaths and
reported cases for the same four models. The lower panel shows the forecasts from the same four models.


Figure 9: Pessimistic and Optimistic Scenarios for USA, σ = 1/4, γ = 1/10

The upper panel shows the fit of the actual deaths and reported cases by models with three different sets of
initial conditions. The middle panel shows the fit of the logarithms of the actual deaths and reported cases
by the same three models. The lower panel shows the forecasts from the same three models.


Figure 10: Pessimistic and Optimistic Scenarios for USA, σ = 1/4, γ = 1/10

The upper panel shows the fit of the actual deaths and reported cases by models with three different sets of
initial conditions. The middle panel shows the fit of the logarithms of the actual deaths and reported cases
by the same three models. The lower panel shows the forecasts from the same three models.


Figure 11: Results for USA for Different Values of R0 , σ = 1/4, γ = 1/10

The upper panel shows the fit of the actual deaths and reported cases by models with different estimates of
R0 and different initial conditions. The middle panel shows the fit of the logarithms of the actual deaths and
reported cases by the same three models. The lower panel shows the forecasts from the same three models.


Figure 12: Validation Set Results for USA for Different Values of R0 , σ = 1/4, γ = 1/10

The figure presents the results from the validation set exercise for the US. The vertical line marks the end of
the training period. Observations to the left of the line are used to estimate the model. Observations to the
right are used to assess the accuracy of forecasts. The red model allows for possible underreporting of the
number of confirmed cases. The blue model assumes that all cases are reported.


Figure 13: Results for USA for Different Values of R0 , σ = 1/4, γ = 1/10, Fitting in Logs

The upper panel shows the fit of the actual deaths and reported cases by models with different estimates of
R0 and different initial conditions. The middle panel shows the fit of the logarithms of the actual deaths and
reported cases by the same three models. The lower panel shows the forecasts from the same three models.


Figure 14: Validation Set Results for USA

The figure presents the results from the validation set exercise for the US. The vertical line marks the end of
the training period. Observations to the left of the line are used to estimate the model. Observations to the
right are used to assess the accuracy of forecasts.


Figure 15: Validation Set Results for USA, Fitting in Logs

The figure presents the results from the validation set exercise for the US. The vertical line marks the end of
the training period. Observations to the left of the line are used to estimate the model. Observations to the
right are used to assess the accuracy of forecasts.


Figure 16: Validation Set Results for CA

The figure presents the results from the validation set exercise for California. The vertical line marks the end of
the training period. Observations to the left of the line are used to estimate the model. Observations to the
right are used to assess the accuracy of forecasts.


Figure 17: Validation Set Results for CA, Fitting in Logs

The figure presents the results from the validation set exercise for California. The vertical line marks the end of
the training period. Observations to the left of the line are used to estimate the model. Observations to the
right are used to assess the accuracy of forecasts.


Figure 18: Results for Iceland

The figure presents the results for Iceland. The left panel does not use any additional information. The right
panel matches the number of active COVID-19 cases on March 21 to the one estimated based on testing a
random sample of population. The upper panel shows the deaths fit by models with three different initial
values E0 . The middle panel shows the reported cases fit by the same three models. The lower panel shows
the deaths forecasts from these models.


