2024-Lecture 02


PROBABILITY AND STATISTICS

LECTURE 2
DESCRIPTIVE STATISTICS II

Adapted from http://www.prenhall.com/mcclave


OUTLINE

1. Graphical methods for describing 2 variables


2. Numerical methods for describing data:
- Measures of central tendency
- Measures of variation
- Measures of relative standing
DESCRIPTIVE STATISTICS

Descriptive Statistics
- Graphical Methods
- Numerical Methods

Now, we will continue to explore graphical methods


GRAPHICAL METHODS FOR 2
CATEGORICAL VARIABLES

2 categorical variables
- Tabulating Data: Crosstabulation table
- Graphing Data: Clustered bar chart, Stacked bar chart
CROSSTABULATION TABLE
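
A crosstabulation table counts how many observations fall into each combination of categories of the two variables. A minimal sketch in Python, using a small made-up survey (the variable names and the data are hypothetical, not taken from the lecture):

```python
from collections import Counter

# Hypothetical paired observations: (gender, preferred drink) for 8 respondents.
observations = [
    ("F", "tea"), ("F", "coffee"), ("M", "coffee"), ("M", "coffee"),
    ("F", "tea"), ("M", "tea"), ("F", "coffee"), ("M", "coffee"),
]

def crosstab(pairs):
    """Return (row labels, column labels, cell counts) for two categorical variables."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, counts

rows, cols, counts = crosstab(observations)

# Print the table with row and column totals.
print("      " + "".join(f"{c:>8}" for c in cols) + f"{'Total':>8}")
for r in rows:
    cells = [counts[(r, c)] for c in cols]
    print(f"{r:<6}" + "".join(f"{v:>8}" for v in cells) + f"{sum(cells):>8}")
col_totals = [sum(counts[(r, c)] for r in rows) for c in cols]
print(f"{'Total':<6}" + "".join(f"{v:>8}" for v in col_totals) + f"{sum(col_totals):>8}")
```

The same cell counts are what a clustered or stacked bar chart displays, one bar (or bar segment) per cell.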
ACTIVITY

Exploring graphs via
http://www.r-graph-gallery.com/all-graphs/
GRAPHICAL METHODS FOR 2
NUMERICAL VARIABLES

2 numerical
variables

Scatter plot
SCATTER PLOT

Used for paired observations taken from two numerical variables
2 axes:
• dependent variable (conventionally the vertical axis)
• independent variable (conventionally the horizontal axis)
EXAMPLE
Average SAT scores by state: 1998

State         Verbal  Math
Alabama       562     558
Alaska        521     520
Arizona       525     528
Arkansas      568     555
California    497     516
Colorado      537     542
Connecticut   510     509
Delaware      501     493
D.C.          488     476
Florida       500     501
Georgia       486     482
Hawaii        483     513
ANALYZING SCATTER PLOT

Look for:
• Overall pattern
  - Form
  - Direction
  - Strength
• Possible clusters/groups
• Possible outliers
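
Direction and strength are usually judged by eye from the plot. One common way to put a number on them, which is an assumption here because the slide itself does not name it, is Pearson's correlation coefficient r: its sign indicates direction and its magnitude (between 0 and 1) indicates the strength of a linear pattern. A minimal sketch using the 12 states from the SAT example:

```python
import math

# Average SAT scores (verbal, math) for the 12 states listed in the example.
verbal = [562, 521, 525, 568, 497, 537, 510, 501, 488, 500, 486, 483]
math_scores = [558, 520, 528, 555, 516, 542, 509, 493, 476, 501, 482, 513]

def pearson_r(x, y):
    """Sample correlation coefficient of the paired observations x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(verbal, math_scores)
print(f"r = {r:.2f}")  # positive sign: upward direction; near 1: strong linear form
```

A single number does not replace the plot: clusters and outliers still have to be looked for visually.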

Q: What is an outlier?
DESCRIPTIVE STATISTICS

Descriptive Statistics
- Graphical Methods
- Numerical Methods

We will now explore numerical methods


NUMERICAL METHODS TO DESCRIBE AND
SUMMARIZE DATA

Describing Data Numerically
- Central Tendency: Arithmetic Mean, Median, Mode
- Variation: Range, Interquartile Range, Variance, Standard Deviation
- Relative standing: Percentile
MEASURES OF CENTRAL
TENDENCY
ARITHMETIC MEAN


MEAN EXAMPLE
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

How would the formula change if the data represent a population?
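
The sample mean is the sum of the observations divided by the sample size, x̄ = (x1 + x2 + ... + xn) / n. For a population the arithmetic is the same; by convention only the notation changes (μ for the mean, N for the size). A minimal sketch for the raw data above (the function name is illustrative):

```python
# Raw data from the mean example.
data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

x_bar = mean(data)
print(round(x_bar, 1))  # 8.3
```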
MEDIAN

1. Measure of Central Tendency
2. Middle Value In Ordered Sequence
• If Odd n, Middle Value of Sequence
• If Even n, Average of 2 Middle Values
3. Position of Median (for sample): (n + 1) / 2 in the ordered sequence
4. Not Affected by Extreme Values


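MEDIAN EXAMPLE
ODD-SIZED SAMPLE
• Raw Data: 24.1 22.6 21.5 23.7 22.6
• Ordered:  21.5 22.6 22.6 23.7 24.1
• Position:    1    2    3    4    5
• Median: the value in position 3, which is 22.6
MEDIAN EXAMPLE
EVEN-SIZED SAMPLE
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered:  4.9 6.3 7.7 8.9 10.3 11.7
• Position:   1   2   3   4    5    6
• Median: the average of the values in positions 3 and 4, (7.7 + 8.9) / 2 = 8.3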
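
The two cases (odd n and even n) can be sketched as a single function. This is a minimal illustration, not a prescribed implementation; the function and variable names are assumptions:

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # 1-based position (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [24.1, 22.6, 21.5, 23.7, 22.6]
even_sample = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

print(median(odd_sample))             # 22.6
print(round(median(even_sample), 1))  # 8.3
```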
MODE

1. Value That Occurs Most Often
2. Not Affected by Extreme Values
3. May Be No Mode or Several Modes
4. May Be Used for Numerical & Categorical Data
MODE EXAMPLE

No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
One Mode
Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
More Than 1 Mode
Raw Data: 21 28 28 41 43 43

Q: Can you give an example of a mode for categorical data?
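
A minimal sketch of finding the mode(s) by counting occurrences. It follows the slide's convention that data in which every value occurs once has no mode; the categorical data at the end is hypothetical, not from the lecture:

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values; an empty list if every value occurs once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats: no mode
    return [v for v, c in counts.items() if c == top]

print(modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))  # []
print(modes([6.3, 4.9, 8.9, 6.3, 4.9, 4.9]))    # [4.9]
print(modes([21, 28, 28, 41, 43, 43]))          # [28, 43]

# Hypothetical categorical data: eye colours recorded in a class.
print(modes(["brown", "blue", "brown", "green", "brown", "blue"]))  # ['brown']
```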
DATA TYPE CONSIDERATIONS

• For which types of data are the mean, the median, and the mode appropriate?
MEASURES OF VARIATION
(OR VARIABILITY)
RANGE

1. Difference Between Largest & Smallest Observations
2. Simple to compute and interpret
3. Affected by outliers
4. Ignores How Data Are Distributed

[Figure: two dot plots, each on a 7 to 10 scale]
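
A minimal sketch of the range, showing why it ignores how the data are distributed. The two samples are hypothetical (not from the lecture): they share the same smallest and largest values but are spread very differently in between.

```python
def data_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)

# Two hypothetical samples with identical extremes but different distributions.
spread_out = [7, 8, 9, 10, 11, 12]
clustered = [7, 12, 12, 12, 12, 12]

print(data_range(spread_out), data_range(clustered))  # 5 5
```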
VARIANCE &
STANDARD DEVIATION

1. Most Common Measures
2. Involve all values in sample (or population)
3. Show Variation About Mean (X̄ or μ)
4. Affected by outliers

[Figure: dot plot on a 4 to 12 scale, with X̄ = 8.3]
Variance

STANDARD DEVIATION

EXAMPLE
6 8 10 12 14 9 11 7 13 11
Calculate the sample variance and standard
deviation
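
The sample variance is the sum of squared deviations from the sample mean divided by n - 1, s² = Σ(xi - x̄)² / (n - 1), and the sample standard deviation s is its square root. A minimal sketch that works through the example (the function name is illustrative):

```python
import math

data = [6, 8, 10, 12, 14, 9, 11, 7, 13, 11]

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

s2 = sample_variance(data)
s = math.sqrt(s2)
print(f"mean = {sum(data) / len(data):.1f}, s^2 = {s2:.3f}, s = {s:.3f}")
# mean = 10.1, s^2 = 6.767, s = 2.601
```

For a population, the usual convention divides by N instead of n - 1 and writes the results as σ² and σ.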
VARIANCE AND SD
CHEBYSHEV’S RULE
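
In its standard textbook form, Chebyshev's rule states that for any data set, whatever the shape of its distribution, at least 1 - 1/k² of the observations lie within k standard deviations of the mean, for any k > 1. A minimal sketch of that lower bound (the function name is an assumption):

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} of the data")
```

So at least 75% of the observations fall within 2 standard deviations of the mean, and at least about 89% within 3.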

EXAMPLE


EMPIRICAL RULE

EXAMPLE
Consider a very large number of students taking a college entrance exam such as the SAT. Suppose that the distribution of SAT scores is bell-shaped, and that the mean score on the mathematics section of the SAT is 550 with a standard deviation of 50.
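
In its usual textbook form, the empirical rule says that for bell-shaped distributions roughly 68%, 95%, and 99.7% of the observations fall within 1, 2, and 3 standard deviations of the mean. A minimal sketch applying those approximate percentages to the SAT example (the variable names are illustrative):

```python
mean, sd = 550, 50  # values from the SAT example

# Approximate percentages for bell-shaped distributions, keyed by number of SDs.
empirical_rule = {1: 68, 2: 95, 3: 99.7}

intervals = {k: (mean - k * sd, mean + k * sd) for k in empirical_rule}
for k, pct in empirical_rule.items():
    low, high = intervals[k]
    print(f"about {pct}% of scores lie between {low} and {high}")
```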
Measures of relative standing
PERCENTILE

• Indicates the position of a value relative to the entire data set (it is a measure of relative standing)
• Generally used for large data sets
• Example: an IQ score at the 90th percentile

• Question: An oil company’s sales are in the 80th percentile of all companies in the industry. What does this mean?
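
One common reading of "x is at the pth percentile" is that about p% of the observations fall below x and about (100 - p)% fall above it. A minimal sketch of that idea, using hypothetical sales figures that are not from the lecture:

```python
def percent_below(values, x):
    """Percentage of observations that are strictly less than x."""
    return 100 * sum(v < x for v in values) / len(values)

# Hypothetical annual sales (in $ millions) of 10 companies in one industry.
sales = [12, 15, 18, 21, 25, 30, 34, 41, 47, 60]

print(percent_below(sales, 47))  # 80.0
```

With only 10 values this is purely illustrative; as the slide notes, percentiles are generally used for large data sets.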
QUARTILES

• Split Ordered Data into 4 Quarters

  25%  |  25%  |  25%  |  25%
       Q1      Q2      Q3
INTERQUARTILE RANGE
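
The interquartile range is the distance between the third and first quartiles, IQR = Q3 - Q1, so it covers the middle 50% of the ordered data. A minimal sketch using Python's standard library on the data from the variance example; note that several conventions exist for locating quartiles, so a textbook or other software may give slightly different values:

```python
import statistics

data = [6, 8, 10, 12, 14, 9, 11, 7, 13, 11]

# statistics.quantiles sorts the data and, by default, uses its "exclusive"
# method; this is one convention among several for locating quartiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 7.75 10.5 12.25 4.5
```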


BOX PLOT

• How to construct
• How to represent outliers
• Use a boxplot to assess and compare the
shape, central tendency, and variability of
distributions and to look for potential
outliers.
• Sample size: n at least 20
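
A box plot is drawn from the quartiles: the box runs from Q1 to Q3 with a line at the median. A common convention for outliers, assumed here because the slide does not spell it out, is to plot individually any observation more than 1.5 × IQR beyond the box edges. A minimal sketch on a hypothetical sample (kept shorter than the recommended 20 observations for readability):

```python
import statistics

# Hypothetical sample (not from the lecture) with one unusually large value.
data = [6, 7, 8, 9, 10, 11, 11, 12, 13, 14, 30]

q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Assumed convention: fences at 1.5 * IQR beyond the box edges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3)                # box edges and median line
print(lower_fence, upper_fence)  # limits beyond which points are flagged
print(outliers)                  # observations plotted individually
```

The whiskers then extend to the most extreme observations that are still inside the fences.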
SHAPE & BOX PLOT

Source: https://www.slideshare.net/mido02/chap-3gbu
CONCLUSION

1. Graphical methods for describing 2 variables
2. Numerical methods for describing data:
- Measures of central tendency
- Measures of variation
- Measures of relative standing
