Session 4 Summary
Descriptive statistics involves summarizing and describing data using numerical measures, tables, and graphs. It focuses on organizing and presenting data in a meaningful way to gain insights and understand the characteristics of the data set. Descriptive statistics provide a snapshot of the data and are typically used to describe and analyze a sample or a population.

Let's say a biostatistician is studying the heights of a group of individuals. They collect data on the heights of 100 people in a population. To summarize and describe this data, the biostatistician calculates various descriptive statistics such as the mean height, median height, range, and standard deviation. They might also create a histogram or a box plot to visualize the distribution of heights in the population.
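As a minimal sketch of the summary measures named above, the following Python snippet computes the mean, median, range, and sample standard deviation. The five heights are hypothetical values chosen for illustration; they are not data from the session.

```python
import statistics

# Hypothetical heights in cm (illustrative values, not data from the session).
heights_cm = [160, 165, 170, 175, 180]

summary = {
    "mean": statistics.mean(heights_cm),
    "median": statistics.median(heights_cm),
    "range": max(heights_cm) - min(heights_cm),
    # Sample standard deviation (n - 1 in the denominator).
    "sd": statistics.stdev(heights_cm),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With real data the same four calls apply unchanged; only the list of heights would differ.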
Regression includes
ANOVA
Confidence intervals for both the slope and the Y intercept
Data
Level of Significance 0.05
Number of Rows 2
Number of Columns 2
Degrees of Freedom 1
Results
Critical Value 3.8414588
Chi-Square Test Statistic 0.3159808
p-Value 0.5740331
Do not reject the null hypothesis
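The critical value and p-value in the output above can be checked with a short Python sketch. It relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so this shortcut is only valid for the 2 x 2 case shown here; a table with more rows or columns would need a general chi-square routine.

```python
import math
from statistics import NormalDist

alpha = 0.05            # level of significance from the output above
chi_square = 0.3159808  # chi-square test statistic from the output above

# For 1 degree of freedom: critical value = (z at 1 - alpha/2) squared,
# and P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
critical_value = NormalDist().inv_cdf(1 - alpha / 2) ** 2
p_value = math.erfc(math.sqrt(chi_square / 2))

reject_null = chi_square > critical_value

print(f"critical value = {critical_value:.7f}")
print(f"p-value        = {p_value:.7f}")
print("Reject the null hypothesis" if reject_null else "Do not reject the null hypothesis")
```

Both decision rules agree: the test statistic is below the critical value, and the p-value is above 0.05.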
The normal probability distribution, also known as the Gaussian distribution or bell curve, is one of the most important and widely used probability distributions in statistics. It is characterized by its symmetric, bell-shaped curve.

The probability density function (PDF) is a mathematical function that describes the shape of the normal distribution. The PDF of the normal distribution is given by the following formula:

f(x) = 1 / (σ √(2π)) · e^(-(x - μ)² / (2σ²))

In this formula:
x represents a value of the random variable.
μ (mu) is the mean or the center of the distribution.
σ (sigma) is the standard deviation, which measures the spread or variability of the distribution.
π (pi) is a mathematical constant, approximately equal to 3.14159.
e is the base of the natural logarithm, approximately equal to 2.71828.
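The normal PDF described above can be evaluated directly. The sketch below is one straightforward way to code it in Python; the function name `normal_pdf` and the example parameters (mean 170, standard deviation 10) are illustrative assumptions, and the result is compared against the standard library's `NormalDist.pdf` as a sanity check.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Height of the standard normal curve at its center.
peak = normal_pdf(0.0)

# Hypothetical parameters: mean 170, standard deviation 10.
ours = normal_pdf(175.0, mu=170.0, sigma=10.0)
reference = NormalDist(mu=170.0, sigma=10.0).pdf(175.0)

print(f"peak of standard normal: {peak:.6f}")
print(f"hand-coded vs library:   {ours:.6f} vs {reference:.6f}")
```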
Hypothesis Testing: In biostatistics, hypothesis testing is often used to determine whether a specific treatment or intervention affects a biological outcome. The normal distribution is frequently used to model the sampling distribution of test statistics employed to test hypotheses about means, proportions, or other parameters.
Confidence Intervals: The estimation of population parameters, such as means or proportions, is a common task in biostatistics. Confidence intervals provide a range of plausible values for a population parameter, along with an associated level of confidence. When the sampling distribution is approximately normal, confidence intervals are often based on the normal distribution.
Power and Sample Size Calculations: Power analysis is used in biostatistics to determine the required sample size for a study to achieve a desired statistical power. Statistical power refers to the ability of a study to detect a true effect when it exists. The normal distribution is used to model the variability of the outcome and calculate the necessary sample size for different effect sizes and levels of significance.
Regression Analysis: In regression analysis, the normal distribution plays a crucial role. Linear regression models, for example, assume that the errors or residuals follow a normal distribution with constant variance. This assumption allows for the calculation of confidence intervals and hypothesis tests for the regression coefficients.
[...] height, weight, blood pressure, enzyme activity, and gene [...]
[...] can model and analyze these variables in biostatistical [...]
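To make the power and sample size item above concrete, here is a hedged sketch of one common normal-approximation formula for a two-sided, one-sample test of a mean: n = ((z for 1 - α/2 + z for power) · σ / δ)². The function name and the inputs (σ = 10, detectable difference δ = 5, 80% power) are assumptions for illustration only; the session does not specify a particular formula or values.

```python
import math
from statistics import NormalDist


def sample_size_one_mean(delta: float, sigma: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n for a two-sided one-sample z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical inputs: standard deviation 10, smallest difference of interest 5.
n_required = sample_size_one_mean(delta=5.0, sigma=10.0)
print(f"required sample size: {n_required}")
```

Halving the detectable difference roughly quadruples the required sample size, which is why small effects are expensive to study.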
Interval Estimate Example
An interval estimate could be a confidence interval for the average height of the population. For instance, a 95% confidence interval might be (165 cm, 175 cm), indicating that we are 95% confident that the true average height of the population lies between 165 cm and 175 cm based on the sample data and the chosen level of confidence.
Calculate the sample average
Xi: -5, 0, -10, -15, 5, -5, -5, 0, 5, -10
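A short Python sketch of the calculation, assuming the ten Xi values listed above make up the whole sample:

```python
import statistics

# The ten Xi values from the exercise above.
xi = [-5, 0, -10, -15, 5, -5, -5, 0, 5, -10]

n = len(xi)
sample_mean = sum(xi) / n  # sum of the observations divided by the sample size

print(f"n = {n}, sum = {sum(xi)}, sample mean = {sample_mean}")
```

The library call `statistics.mean(xi)` gives the same result and is the more idiomatic choice in practice.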
Estimation of μ
Estimation of μ refers to the process of estimating or determining the unknown population mean (μ) based on sample data. In statistics, μ represents the true average or mean of a particular variable in the entire population.

When we have a sample from a population, we can use statistical methods to estimate the population mean. Estimating μ involves using the sample mean (x̄) as a point estimate of the population mean. The sample mean is calculated by summing the values of the observations in the sample and dividing the total by the sample size.

The process of estimation involves using the sample mean as an approximation or best guess of the unknown population mean. The quality of the estimate depends on the representativeness and size of the sample, as well as the variability within the population.

Estimation of μ ~ what is the purpose?
In practical terms, e[...] population based on [...] the average income o[...] their incomes, and us[...]

What should we con[...]
It's important to note [...] samples from the sa[...] confidence intervals c[...] and provide a range o[...]
More specifically, the central limit theorem states that the sampling distribution of the sum (or average) of a large number of independent and identically distributed (i.i.d.) random variables with finite variance approaches a normal distribution as the sample size increases, regardless of the shape of the original population distribution.

In simpler terms
In simpler terms, the central limit theorem tells us that if we take many samples of a certain size from any population (regardless of its distribution shape), calculate the means of those samples, and plot a histogram of those means, the resulting distribution will be approximately normal.

Central limit theorem ~ why is widely used?
The central limit theorem is widely used in statistical inference because it allows us to make assumptions about the sampling distribution and perform hypothesis tests, construct confidence intervals, and estimate parameters, even when the original population distribution is unknown or non-normal.

A common rule of thumb
It is important to note that the central limit theorem holds for sufficiently large sample sizes. The exact conditions for its application depend on the characteristics of the underlying population distribution, but a commonly used rule of thumb is that a sample size of at least 30 is often sufficient for the central limit theorem to be applicable.
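The "many samples, then look at their means" idea can be tried with a small simulation. The sketch below is an illustration under assumed settings: the population is exponential with mean 1 (strongly skewed), each sample has 30 observations to match the rule of thumb, and the seed and number of samples are arbitrary choices.

```python
import math
import random
import statistics

random.seed(42)  # arbitrary seed so the run is repeatable

SAMPLE_SIZE = 30    # observations per sample (the rule-of-thumb size)
NUM_SAMPLES = 2000  # number of repeated samples

# Draw repeated samples from a skewed population and keep each sample mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = 1.0 / math.sqrt(SAMPLE_SIZE)  # population SD divided by sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"SD of sample means:   {sd_of_means:.3f} (theory: {theoretical_sd:.3f})")
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve centered near 1, even though the individual observations are far from normal.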
Online calculator: https://onlinestatbook.com/2/calculators/normal_dist.html
The confidence interval [0.04, 3.96] for the estimate of μ suggests that we are 95% confident that the true population mean (μ) lies within this interval.
Range of Plausible Values: The interval [0.04, 3.96] provides a range of plausible values for the population mean. It suggests that, with 95% confidence, the true population mean falls somewhere between 0.04 and 3.96.
Precision of the Estimate: The width of the confidence interval reflects the precision of the estimate. In this case, the interval has a width of 3.92 (3.96 - 0.04), indicating that the estimate is relatively imprecise. A narrower interval would indicate a more precise estimate.
No Guarantee about a Specific Value: It's important to note that the confidence interval does not make a definitive statement about the true population mean. It provides a range of plausible values within which the true population mean is likely to lie, but it does not single out a specific value.
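For reference, a normal-based 95% confidence interval has the form x̄ ± 1.96 · SE. The sketch below shows one combination of inputs that reproduces the interval [0.04, 3.96]: a sample mean of 2 and a standard error of 1. These two inputs are assumptions chosen to match the interval; the underlying values are not stated in the text.

```python
from statistics import NormalDist


def z_confidence_interval(mean: float, standard_error: float,
                          confidence: float = 0.95) -> tuple[float, float]:
    """Normal-based confidence interval: mean +/- z * standard error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * standard_error
    return mean - margin, mean + margin


# Assumed inputs that reproduce the interval discussed above.
lower, upper = z_confidence_interval(mean=2.0, standard_error=1.0)
print(f"95% CI: [{lower:.2f}, {upper:.2f}], width = {upper - lower:.2f}")
```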
Increased Precision: A larger sample size provides more information about the population,
leading to increased precision in estimating the parameter of interest. With a larger sample, the
estimate of the population parameter becomes more reliable, reducing the variability and
resulting in a narrower confidence interval.
More Certainty: With a larger sample size and the same confidence level, the confidence interval
becomes narrower. This means that the range of plausible values for the
population parameter becomes more focused, providing a higher level of certainty about where
the true parameter lies.
Trade-off with Cost and Resources: While increasing the sample size generally leads to
narrower confidence intervals and increased precision, it often requires more resources, time,
and effort. Collecting a larger sample may involve increased costs and logistical challenges.
Therefore, the decision regarding sample size should consider the trade-off between precision
and the available resources.
Relationship between CI and Sample Size
It's important to note that the relationship between the confidence interval and sample size is
not linear. The width of the interval is proportional to 1/√n, so quadrupling the sample size
only halves the width. The effect of increasing the sample size therefore diminishes as the
sample size becomes larger. For very large sample sizes, additional increases in
the sample size have a minimal impact on further narrowing the confidence interval.
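The diminishing effect of sample size can be seen numerically. This sketch assumes a normal-based interval with a known standard deviation; the value σ = 10 and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def ci_width(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Full width of a normal-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / math.sqrt(n)


sigma = 10.0  # hypothetical population standard deviation
widths = {n: ci_width(sigma, n) for n in (25, 100, 400, 1600)}

for n, width in widths.items():
    print(f"n = {n:5d}  width = {width:.2f}")
```

Each fourfold increase in n halves the width, so going from 25 to 100 observations buys far more precision than going from 400 to 475.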
Alternate hypothesis
The alternate hypothesis (H₁ or Ha) contradicts the null hypothesis and
represents the claim or theory we are trying to support. It asserts the
existence of an effect, difference, or relationship in the population.
Hypothesis Testing Procedure
The hypothesis testing process involves the following steps:
1 State the hypotheses: Clearly specify the null hypothesis (H₀) and the alternative
hypothesis (H₁).
2 Set the significance level (α): Determine the level of significance, often denoted
as α, which represents the probability of rejecting the null hypothesis when it is
true. Commonly used values for α include 0.05 (5%) and 0.01 (1%).
3 Collect and analyze the data: Gather a sample of data and perform the appropriate
statistical analysis to obtain test statistics or p-values.
4 Determine the critical region: Based on the chosen significance level (α), determine
the critical region or rejection region of the test statistic. This is the range of values
that would lead to the rejection of the null hypothesis.
5 Calculate the test statistic: Compute the test statistic based on the data and the
chosen statistical test (e.g., t-test, chi-square test, etc.).
6 Make a decision: Compare the test statistic to the critical value(s) or use the
p-value to make a decision. If the test statistic falls in the critical region or the
p-value is less than α, reject the null hypothesis. Otherwise, fail to reject the null
hypothesis.
7 Draw conclusions: Based on the decision made, interpret the results and draw
conclusions about the population based on the evidence provided by the data.
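The seven steps above can be sketched in code for one simple case, a two-sided one-sample z-test. Everything specific here is an assumption for illustration: the choice of a z-test, the function name, and the inputs (hypothesized mean 25, sample mean 25.3, known standard deviation 1.5, sample size 100).

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean: float, mu0: float, sigma: float,
                      n: int, alpha: float = 0.05):
    """Two-sided one-sample z-test. H0: mu = mu0, H1: mu != mu0."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error               # step 5: test statistic
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)   # step 4: critical region is |z| > this
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))           # two-sided p-value
    reject = p_value < alpha                               # step 6: decision
    return z, critical_value, p_value, reject


# Steps 1-3 with illustrative values: H0: mu = 25, alpha = 0.05, observed data summary.
z, critical_value, p_value, reject = one_sample_z_test(
    sample_mean=25.3, mu0=25.0, sigma=1.5, n=100, alpha=0.05
)

print(f"z = {z:.2f}, critical value = {critical_value:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")  # step 7: interpret in context
```

For a two-sided test the critical-region rule and the p-value rule always lead to the same decision, so either can be used in step 6.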
What is the P-value?
"The p-value is the probability of observing the data, or more extreme data, assuming that the null hypothesis is true."

In simpler terms, the p-value quantifies the strength of evidence against the null hypothesis. It tells us how likely it is to obtain the observed data or more extreme data if the null hypothesis is true.

A smaller p-value indicates stronger evidence against the null hypothesis, suggesting that the observed data is unlikely to occur by chance alone. That's why we reject the null hypothesis when the p-value is less than the level of significance.

What is the level of significance?
The level of significance, denoted as α (alpha), is a threshold used in hypothesis testing to determine whether to reject the null hypothesis. It represents the maximum probability of making a Type I error, which is the incorrect rejection of the null hypothesis when it is true.

The most commonly used levels of significance are 0.05 and 0.01. Choosing a higher level of significance increases the chance of rejecting the null hypothesis, while choosing a lower level requires stronger evidence to reject the null hypothesis.
[Figure: Sample 1, annotated with Z = 2 and U = 25]
Confidence Interval Excluding Zero ~ Example
Consider a study that investigates the effect of a new drug on blood pressure. A researcher collects data from a sample of individuals and calculates a 95% confidence interval for the mean difference in blood pressure before and after taking the drug. The confidence interval is (-5.2, -1.8).
Since the confidence interval does not include zero, it suggests that there is a statistically significant difference in blood pressure before and after taking the drug.
On the other hand, the interval indicates that, with 95% confidence, the true mean difference lies between -5.2 and -1.8 units.
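The check described in this example is simple enough to write down. In the sketch below, the helper name `excludes_zero` is hypothetical, and the second interval is an invented contrast case.

```python
def excludes_zero(lower: float, upper: float) -> bool:
    """True when an interval for a difference lies entirely on one side of zero."""
    return lower > 0 or upper < 0


drug_ci = (-5.2, -1.8)     # interval from the example above
contrast_ci = (-1.0, 2.0)  # hypothetical interval that straddles zero

drug_significant = excludes_zero(*drug_ci)
contrast_significant = excludes_zero(*contrast_ci)

print(f"{drug_ci}: excludes zero = {drug_significant}")
print(f"{contrast_ci}: excludes zero = {contrast_significant}")
```

A 95% interval that excludes zero corresponds to rejecting the null hypothesis of no difference in a two-sided test at α = 0.05.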
x̄ = 25.3
Why do some researchers suggest confidence intervals should be preferred over statistical tests?
Some researchers suggest that confidence intervals should be preferred over statistical tests for several reasons:

1 Simplicity and Interpretability
Confidence intervals provide a more intuitive and straightforward interpretation compared to p-values and statistical tests. They directly estimate the range of plausible values for the population parameter of interest. This makes it easier for researchers and decision-makers to understand and communicate the uncertainty associated with the estimate.

2 Complete Information
Confidence intervals provide more complete information about the parameter estimate by including the point estimate and the associated uncertainty. They convey not only the magnitude of the effect but also the precision of the estimate. This additional information can be valuable in making decisions and drawing meaningful conclusions.

3 Avoiding Arbitrary Cutoffs
Confidence intervals avoid the reliance on arbitrary significance levels (e.g., α = 0.05) often used in hypothesis testing. By presenting a range of plausible values, confidence intervals allow for a more nuanced assessment of the evidence. Researchers can evaluate the magnitude and precision of the effect without the need for dichotomous decisions based on arbitrary thresholds.

4 Overemphasis on Statistical Significance
Statistical tests based on p-values [...] (significant or not significant) and [...]. Confidence intervals, on the other hand, provide a more [...], enabling a more balanced consideration [...]

5 Replicability and Reproducibility
Confidence intervals encourage replication [...] indication of the range of values that [...] comparison and synthesis of research findings.

In summary
It's important to note that confidence intervals and statistical tests serve different purposes and can provide complementary information. However, the preference for confidence intervals stems from their ability to offer a more comprehensive and interpretable summary of the data and the associated uncertainty.