HW 15 220 Soln
Required Problems:
(1) (a) These are paired data. Recall Section 9.3 and Lecture 8 regarding the variance of linear combinations of variables:
We also know that: $\text{MEAN}(dec21 - jan22) = \text{MEAN}(dec21) - \text{MEAN}(jan22) = 163.1968$
(Note: Your answers may differ very slightly because of rounding: the above numbers come directly from STATA.)
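$H_0: \mu_d = 0$ versus $H_1: \mu_d \neq 0$
$SE(\bar{d}) = \frac{s_d}{\sqrt{n}} = \frac{590.6239}{\sqrt{1200}} = 17.0498$
$t = \frac{\bar{d} - \Delta_0}{SE(\bar{d})} = \frac{163.1968 - 0}{17.0498} = 9.57$
$\nu = n - 1 = 1{,}199$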
Use the Standard Normal table because the degrees of freedom are huge. However, the t test statistic is enormous, and
the P‐value is 0 for all practical purposes: extremely strong evidence of a difference. There is a highly statistically
significant difference in the mean credit card spending in December versus January.
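The arithmetic in Part (a) can be checked with a short sketch. It uses only the summary statistics quoted above, and it approximates the P-value with the Standard Normal, as the solution suggests for such large degrees of freedom. The variable names are illustrative, not from the assignment.

```python
import math
from statistics import NormalDist

# Summary statistics quoted in the solution (originally from STATA).
d_bar = 163.1968   # sample mean of the paired differences
s_d = 590.6239     # sample s.d. of the paired differences
n = 1200           # number of customers (pairs)
delta_0 = 0.0      # hypothesized mean difference under the null

se = s_d / math.sqrt(n)           # standard error of the mean difference
t_stat = (d_bar - delta_0) / se   # paired t test statistic
df = n - 1                        # degrees of freedom

# Two-tailed P-value from the Standard Normal (an approximation to the
# Student t with 1,199 degrees of freedom).
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"SE = {se:.4f}, t = {t_stat:.2f}, df = {df}, P-value = {p_value:.3g}")
```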
(b) Use $\bar{X} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$, applied to the paired differences. The 95% CI for a mean difference from December to January is (-196.6, -129.7). (Show your work.)
We are 95% confident that the mean credit card spending of ALL customers decreased by between $129.7 and $196.6.
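As a rough check on the size of the decrease in Part (b), the sketch below rebuilds the interval from the Part (a) summary statistics. It is an approximation: it takes the critical value from the Standard Normal rather than the Student t with 1,199 degrees of freedom, so the endpoints can differ from the reported ones in the last digit. It reports magnitudes, so negate the endpoints for the December-to-January change.

```python
import math
from statistics import NormalDist

d_bar = 163.1968   # magnitude of the mean paired difference
s_d = 590.6239     # s.d. of the paired differences
n = 1200
conf_level = 0.95

se = s_d / math.sqrt(n)
# Normal critical value (about 1.96); an assumption standing in for t.
crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
margin = crit * se

lower, upper = d_bar - margin, d_bar + margin
print(f"Decrease is between {lower:.1f} and {upper:.1f} dollars")
```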
(c) Failing to recognize that these data are paired means the standard error of the difference would be much bigger.
There is a positive correlation and changing that correlation to zero – like for independent samples – would mean not
subtracting anything in computing the variance of the difference in Part (a): hence, a bigger variance, which feeds
through to a bigger standard error. Hence, the hypothesis test would be less powerful: there is a greater chance of an
inconclusive result, which means being unable to rule out no difference in spending. The confidence interval estimate
would be wider: less precise inferences about the size of the difference in spending.
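To see the Part (c) argument numerically, recall that Var(X - Y) = Var(X) + Var(Y) - 2 * rho * SD(X) * SD(Y). The sketch below compares the standard error with a positive correlation against the standard error with the correlation set to zero. The standard deviations and the correlation are hypothetical values chosen only for illustration; they are not from the data.

```python
import math

def se_of_mean_difference(sd_x, sd_y, rho, n):
    """SE of the mean of (X - Y) over n pairs with correlation rho."""
    var_d = sd_x**2 + sd_y**2 - 2 * rho * sd_x * sd_y
    return math.sqrt(var_d / n)

# Hypothetical inputs (NOT from the assignment data).
sd_dec, sd_jan, n = 900.0, 800.0, 1200

se_paired = se_of_mean_difference(sd_dec, sd_jan, rho=0.75, n=n)
se_ignoring_pairing = se_of_mean_difference(sd_dec, sd_jan, rho=0.0, n=n)

print(f"SE recognizing the pairing: {se_paired:.2f}")
print(f"SE treating samples as independent: {se_ignoring_pairing:.2f}")
```

With any positive correlation the first number is smaller, which is why the paired analysis gives a more powerful test and a narrower interval.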
(c) For Part (a), the point estimate is the difference in the sample mean wages: 𝑋 , 𝑋 , . For Part (b),
the point estimate is the difference in the sample proportions unemployed: 𝑃 , 𝑃 , .
(d) Any difference in wages may have been small (in terms of dollars), which means not economically significant, and/or
any difference may simply be because of sampling error, which means not statistically significant.
(3) (a) Call $\mu_0$ the population mean donated for those giving money and offered no match. Call $\mu_1$ the population mean
donated for those giving money and offered a 1:1 match. Similarly, $\mu_2$ and $\mu_3$ are for a 2:1 and 3:1 match. The question
asks three things: $H_0: \mu_1 - \mu_0 = 0$ versus $H_1: \mu_1 - \mu_0 > 0$; $H_0: \mu_2 - \mu_0 = 0$ versus $H_1: \mu_2 - \mu_0 > 0$; $H_0: \mu_3 - \mu_0 = 0$
versus $H_1: \mu_3 - \mu_0 > 0$. There is no point in doing these tests because there is NO EVIDENCE in favor of the research
hypotheses. In all three cases the sample average amount donated with a match is LESS THAN the average amount
donated with no match. If we did the formal hypothesis tests, the P‐values would be above 0.5 (i.e. huge): we have no
evidence that offering a match increases the mean amount donated among those donating.
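The claim that the P-values would be above 0.5 follows from the direction of the test: for a right-tailed alternative, a negative test statistic puts more than half of the distribution to its right. A minimal sketch, using the Standard Normal as an approximation and hypothetical negative test statistics:

```python
from statistics import NormalDist

def right_tail_p_value(test_stat):
    """P-value for a right-tailed test, Standard Normal approximation."""
    return 1 - NormalDist().cdf(test_stat)

# Hypothetical negative test statistics (sample mean with a match is
# below the sample mean with no match).
for stat in (-0.1, -1.0, -2.5):
    print(f"test statistic {stat:5.1f} -> P-value {right_tail_p_value(stat):.3f}")
```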
(b) This sentence is potentially misleading: “We find that the match offer increases both the revenue per solicitation and
the response rate.” It seems to imply two effects when there is only one effect. The proportion giving money does
increase with a match compared to no match (i.e. the match causes an increase in the response rate). However, that is
the only effect. The only reason the mean revenue per solicitation goes up is because a higher proportion of people give
and so they are averaging in fewer zeros. Part (a) shows that conditional on giving (i.e. averaging in only the non‐zero
donations), the mean amount donated goes down (slightly) with the matches compared to no match.
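The point in Part (b) rests on a simple decomposition: revenue per solicitation equals the response rate times the mean donation among those who give. The sketch below uses hypothetical numbers (not the study's figures) to show revenue per solicitation rising even though the mean gift among givers falls slightly.

```python
def revenue_per_solicitation(response_rate, mean_gift_given_giving):
    """Mean revenue over ALL solicitations, zeros included."""
    return response_rate * mean_gift_given_giving

# Hypothetical values chosen only for illustration.
no_match = revenue_per_solicitation(response_rate=0.018, mean_gift_given_giving=45.0)
with_match = revenue_per_solicitation(response_rate=0.022, mean_gift_given_giving=44.0)

print(f"No match:   {no_match:.3f} dollars per solicitation")
print(f"With match: {with_match:.3f} dollars per solicitation")
```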
(c) In this case, because we know the amount donated cannot be less than 0, the mean and s.d. ($79.99 and $627.06)
alone show the obvious presence of an outlier (or outliers). If we made the mistake of going ahead with the analysis and
compared the amount donated for a 3:1 match versus a 2:1 match:
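(Sample 1 is the 3:1 match group; sample 2 is the 2:1 match group.)
$S.E.(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{393201.5}{253} + \frac{1871.691}{252}} = 39.5$
$t = \frac{79.98696 - 45.3373}{39.5} = 0.88$
$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} = 254.4 \approx 254$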
From the Student t table, the critical value for 250 degrees of freedom (a good approximation) and a significance level of
0.05 for this two‐tailed test is 1.969. The rejection region is (‐∞, ‐1.969) and (1.969, ∞). The test statistic is not in the
rejection region so we fail to reject the null. We do not have a statistically significant difference at a 5% level. We can
approximate the P‐value with the Student t table to be greater than 0.20 (remember it is a two‐tailed test) so it is not
statistically significant at any reasonable significance level. [Note: You may be surprised that this large positive outlier
that pulled up the mean so much did not cause a statistically significant difference. The outlier also made the s.d. very
large.] The difference between an average donation of $79.99 and $45.34 would certainly be economically significant
even if it is not statistically significant. However, because a single data point single‐handedly caused this large
difference, we would not say that we have an economically significant result.
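The unequal-variances calculation in Part (c) can be reproduced from the summary statistics quoted above. This is a sketch for checking the arithmetic; the variable names are illustrative, and the critical value 1.969 is the one read from the Student t table in the solution.

```python
import math

# Summary statistics quoted in the solution.
mean_3to1, var_3to1, n_3to1 = 79.98696, 393201.5, 253   # 3:1 match group
mean_2to1, var_2to1, n_2to1 = 45.3373, 1871.691, 252    # 2:1 match group

a = var_3to1 / n_3to1
b = var_2to1 / n_2to1

se = math.sqrt(a + b)                   # unequal-variances standard error
t_stat = (mean_3to1 - mean_2to1) / se   # test statistic
# Degrees of freedom for the unequal-variances t test.
df = (a + b) ** 2 / (a**2 / (n_3to1 - 1) + b**2 / (n_2to1 - 1))

critical_value = 1.969                  # from the t table, 250 df, two-tailed 5%
reject_null = abs(t_stat) > critical_value

print(f"SE = {se:.1f}, t = {t_stat:.2f}, df = {df:.1f}, reject null: {reject_null}")
```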
(4) (a) Panel B breaks out the original sample of 24,646 lottery participants (i.e. the winners and losers) into three
different subgroups: those that had “no visits” to the ED BEFORE the lottery (i.e. the healthy people), those with “one
visit” BEFORE the lottery (i.e. not as healthy people), and those with two or more visits to the ED BEFORE the lottery (i.e.
the chronically unhealthy people). Notice the description in the title of Panel B and how 16,930 + 3,881 + 3,835 = 24,646.
Within each subgroup (e.g. the healthy subgroup) some people won the lottery (got insurance) and others lost (no
insurance). Hence, the Panel B results show that regardless of whether we look at people who were healthier or sicker
PRIOR to the lottery, we see POSITIVE and economically significant increases in the ED use with coverage (the opposite
of what many predicted). However, not all the results are statistically significant.
(b) Define $p_C$ to be the proportion of all people in the control group (i.e. no Medicaid, lost the lottery) who did not have
any visits to the ED in the pre‐randomization period (i.e. before the lottery) that did visit the ED after the lottery. Define
$p_T$ to be the proportion of all people in the treatment group (i.e. got Medicaid, won the lottery) who did not have any
visits to the ED in the pre‐randomization period (i.e. before the lottery) that did visit the ED after the lottery.
$H_0: p_T - p_C = 0$
$H_1: p_T - p_C \neq 0$
The table tells us that $n_C + n_T = 16{,}930$. However, it does not report how many people are in the control group and
treatment group. But it does tell us that $\hat{P}_T - \hat{P}_C = 0.067$ and that $SE(\hat{P}_T - \hat{P}_C) = 0.029$ (and that $\hat{P}_C = 0.225$,
which implies that $\hat{P}_T = 0.292$). Hence, we can find the z value:
$z = \frac{0.067 - 0}{0.029} = 2.31$
$P\text{-value} = P(z < -2.31) + P(z > 2.31) = 2 * 0.0104 = 0.0208$. The table reports a P‐value of 0.019: ours is off a
tiny bit because we used rounded numbers in our calculations (and we used the Normal table instead of software).
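The Part (b) numbers can be checked with software instead of the Normal table. The sketch below uses only the rounded estimate and standard error quoted from the table, so it shares the rounding issue noted above. Because it does not round z to 2.31 before looking up the tail area, its P-value can differ from 0.0208 in the fourth decimal place.

```python
from statistics import NormalDist

diff = 0.067       # difference in sample proportions, as reported
se_diff = 0.029    # standard error of that difference, as reported
p_control = 0.225  # sample proportion in the control group, as reported

z = (diff - 0) / se_diff
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed
p_treatment = p_control + diff                 # implied treatment proportion

print(f"z = {z:.2f}, P-value = {p_value:.4f}, treatment proportion = {p_treatment:.3f}")
```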
(c) Define $\mu_C$ to be the mean number of ED visits post‐randomization (i.e. after the lottery) of all people in the control
group (i.e. no Medicaid, lost the lottery) who did not have any visits to the ED in the pre‐randomization period (before
the lottery). Define $\mu_T$ to be the mean number of ED visits post‐randomization (i.e. after the lottery) of all people in the
treatment group (i.e. got Medicaid, won the lottery) who did not have any visits to the ED in the pre‐randomization
period (before the lottery).
$H_0: \mu_T - \mu_C = 0$
$H_1: \mu_T - \mu_C \neq 0$
These are independent samples (not paired data). However, it is unclear if they assumed equal variances. Even though
our textbook cautions against it, researchers often use the equal variances assumption.
From the table $\bar{X}_T - \bar{X}_C = 0.261$ and $\Delta_0 = 0$ (and that $\bar{X}_C = 0.418$, which implies that $\bar{X}_T = 0.679$). Again, the table
does not give us enough to compute $s$ and $n_C$ and $n_T$. But it does tell us that $SE(\bar{X}_T - \bar{X}_C) = 0.084$. Hence, we can find the
t value: $t = \frac{0.261 - 0}{0.084} = 3.11$. Given the large degrees of freedom, we can use the Normal table to very accurately approximate
the P‐value. $P\text{-value} = P(t < -3.11) + P(t > 3.11) \approx 2 * P(Z > 3.11) = 2 * 0.0009 = 0.0018$. The table reports a
P‐value of 0.002 and our calculations round to that.
(d) There are two reasons. One is the bigger difference in mean visits comparing the treatment and control groups
(0.652 versus 0.380): other things equal, this leads to a bigger test statistic and smaller P‐value. (It is easier to reject a
null of no difference in mean visits comparing the two groups if we see a big difference in the sample.) However, if you
look at the standard error for that estimate, it is also smaller than the next row (0.254 versus 0.648). Remember the
formula for the standard error of the difference between two sample means, independent samples: $SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$,
which is an estimate of $SD(\bar{X}_1 - \bar{X}_2)$. In addition to sample sizes, it is also a function of the
variance of ED visits among those in the treatment group and those in the control group. These must be smaller to
explain the smaller standard error. It makes sense the s.d. would be smaller for the group that had one ED visit
compared to the group that had two+ visits as the latter group likely includes some rather unwell people who had lots of
visits: i.e. a long right tail.
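The two reasons in Part (d) can be separated numerically using the estimates and standard errors quoted above. The sketch below computes the test statistic for each row, plus a counterfactual that pairs the larger difference with the larger standard error. P-values use the Standard Normal approximation, which is an assumption here; the published table may use a different method.

```python
from statistics import NormalDist

def two_sided_p(test_stat):
    """Two-tailed P-value using the Standard Normal approximation."""
    return 2 * (1 - NormalDist().cdf(abs(test_stat)))

# Estimated differences and standard errors quoted in the solution.
diff_one, se_one = 0.652, 0.254   # subgroup with one prior ED visit
diff_two, se_two = 0.380, 0.648   # subgroup with two or more prior ED visits

t_one = diff_one / se_one
t_two = diff_two / se_two
# Counterfactual: the larger difference, but with the larger standard error.
t_mixed = diff_one / se_two

for label, t in (("one visit", t_one), ("two+ visits", t_two), ("counterfactual", t_mixed)):
    print(f"{label:15s} t = {t:.2f}, P-value = {two_sided_p(t):.3f}")
```

The counterfactual sits between the two actual rows, so both the larger difference and the smaller standard error contribute to the smaller P-value.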