Chapter 1: Populations, Samples and Processes


Populations, Samples and Variable

Visual Displays for Univariate Data


Describing distributions
The Normal Distribution
Other Continuous Distributions
Several Useful discrete distributions


Applied Probability and Statistics for Engineering and Science Chapter 1: Populations, Samples and Processes 1

Outline of Chapter 1
1.1 Populations, Samples and Variable
1.2 Visual Displays for Univariate Data
1.3 Describing Distributions
1.4 The Normal Distribution
1.5 Other Continuous Distributions
1.6 Several Useful Discrete Distributions


Introduction
Statistical theory and techniques are powerful and
indispensable tools for understanding the world around
us.
These tools help one make intelligent judgments
and decisions in the presence of uncertainty and
variation.
Without uncertainty or variation, there would be little
need for statistical techniques and statisticians.


Populations
Engineers and scientists are constantly exposed to
collections of facts/data in their work.
A population is a well-defined collection of objects.
Examples:
Students in Class ECE08
People in Vietnam
...


Sample
When desired information is available for all objects in the
population, we have what is called a census.
Practical constraints (e.g., money, time and other limited
resources) usually make a census impractical or infeasible.
Sample: a (random) subset of the population.
For instance, we might select a sample of last year’s
engineering graduates to obtain feedback about the
quality of the engineering curricula.


Sample: variable
Variable: any characteristic whose value may change
from one object to another in the population. Examples:
X = gender of a graduating engineer, Y = age of a
graduating engineer, Z = temperature at a certain time
of day.
Univariate data: observations are made on a single
variable.
Bivariate data: observations are made on each of two
variables.
Multivariate data: observations are made on more than
two variables.


Branches of statistics
Descriptive Statistics: methods to summarize and
describe important features of the data. Examples:
Graphical: construction of histograms, stem-and-leaf
displays, and dotplots
Numerical: calculation of summary measures such as means,
variances, and correlations
Inferential Statistics: techniques for generalizing from a
sample to a population


Stem-and-leaf displays
Stem-and-leaf display: an effective way to organize
numerical data into two parts:
Stem: one or more leading digits
Leaf: the remaining digits
The display can provide the following information:
Identification of a typical or representative value
Extent of spread about the typical value
Presence of any gaps in the data
Extent of symmetry in the distribution of values
Number and location of peaks
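
The stem/leaf split described above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own procedure: the function name `stem_and_leaf` and the data are hypothetical, and it assumes non-negative integer observations whose last digit is the leaf and whose leading digits form the stem.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display.

    Assumes non-negative integers: the last digit is the leaf and the
    leading digits form the stem.
    """
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    lines = []
    # Empty stems are printed too, so gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(d) for d in groups.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical data, chosen only to show a gap at stem 4.
for line in stem_and_leaf([12, 15, 21, 24, 24, 37, 50, 53]):
    print(line)
```

Reading the printed rows from top to bottom shows where the values concentrate, how far they spread, and that no observation falls in the 40s.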


Stem-and-leaf displays: an example


In a given experiment, the observed values of the
variable are: 41, 43, 49, 52, 57, ..., 112, 114, 123.
The corresponding stem-and-leaf display can be presented as follows:


Dotplots
Dotplot: a summary of data when the data set is
reasonably small or there are relatively few distinct data
values.
Each observation is represented by a dot above the
corresponding location on a horizontal measurement
scale.
When a value occurs more than once, there is a dot for
each occurrence, and these dots are stacked vertically.
A dotplot provides information about location, spread,
extremes, and gaps.
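
The stacking rule can be illustrated with a small text renderer. This is a sketch under stated assumptions: the function name `dotplot` and the sample are hypothetical, and the observations are assumed to be integers so that every position on the horizontal scale can be listed.

```python
from collections import Counter

def dotplot(values):
    """Return the rows of a text dotplot, top row first.

    Assumes integer observations. Each observation contributes one '*'
    stacked above its value on the horizontal scale.
    """
    counts = Counter(values)
    axis = list(range(min(counts), max(counts) + 1))
    width = max(len(str(v)) for v in axis)
    rows = []
    for level in range(max(counts.values()), 0, -1):
        cells = [("*" if counts[v] >= level else " ").rjust(width) for v in axis]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(v).rjust(width) for v in axis))
    return rows

# Hypothetical sample: the value 2 occurs twice, and nothing falls at 3.
print("\n".join(dotplot([1, 2, 2, 4])))
```

The empty column above 3 is how a gap shows up, and the tallest stack marks the most frequent value.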


Dotplot: an example

Here is an example to show what a dotplot looks like and how to interpret it. Suppose 30 first
graders are asked to pick their favorite color. Their choices can be summarized in a dotplot, as
shown below.

*
*
* *
* *
* * *
* * *
* * * * *
* * * * * *
* * * * * * *
Red Orange Yellow Green Blue Indigo Violet

Each dot represents one student, and the number of dots in a column represents the number of first
graders who selected the color associated with that column. For example, Red was the most popular
color (selected by 9 students), followed by Blue (selected by 7 students). Selected by only 1
student, Indigo was the least popular color.


Histograms
Construct a histogram for:
discrete data:
Determine the (relative) frequency of each x value in a
sample set
Mark possible x values on a horizontal scale
Above each value, draw a rectangle whose height is the
relative frequency of that value.
continuous data:
Determine the (relative) frequency of each class
Mark the class boundaries on a horizontal measurement
axis
Above each class interval, draw a rectangle whose height
is the corresponding (relative) frequency.
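
The first step of each recipe, computing the relative frequencies that become rectangle heights, might look as follows. This is a sketch only: the function names and data are hypothetical, and treating each class as the half-open interval [lower, upper) is an assumed convention that the slides do not specify.

```python
from collections import Counter

def relative_frequencies(sample):
    """Discrete data: relative frequency of each distinct x value."""
    n = len(sample)
    return {x: c / n for x, c in sorted(Counter(sample).items())}

def class_frequencies(sample, boundaries):
    """Continuous data: relative frequency of each class.

    Class i is taken to be [boundaries[i], boundaries[i + 1]) (assumed
    convention). Observations outside every class are not counted in any
    class but still contribute to the sample size n.
    """
    counts = [0] * (len(boundaries) - 1)
    for x in sample:
        for i in range(len(counts)):
            if boundaries[i] <= x < boundaries[i + 1]:
                counts[i] += 1
                break
    n = len(sample)
    return [c / n for c in counts]

# Hypothetical samples.
print(relative_frequencies([0, 1, 1, 2]))
print(class_frequencies([0.5, 1.5, 1.7, 2.9], [0, 1, 2, 3]))
```

With equal class widths these values can be used directly as rectangle heights.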

Histogram: an example

[Figure: Gaussian histogram. Horizontal axis: variable value, from −4 to 4. Vertical axis: number of values in each interval, from 0 to 1500.]


Density function
A density function f (x) is used to describe
(approximately) the population distribution of a
continuous variable x.
The graph of f (x) is called the density curve.
The following properties of f (x) must be satisfied:
$f(x) \ge 0$
$\int_{-\infty}^{\infty} f(x)\,dx = 1$ (i.e., the total area under the density
curve is 1)
For any two numbers a and b with a < b, the proportion
of x values between a and b is $\int_a^b f(x)\,dx$.
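
These properties can be checked numerically for a concrete density. The sketch below uses a hypothetical example density, f(x) = 2x on [0, 1] and 0 elsewhere, which is not from the slides, and a hand-written trapezoidal rule named `integrate`.

```python
def integrate(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] (composite trapezoidal rule)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def example_density(x):
    """Hypothetical density: f(x) = 2x for 0 <= x <= 1, and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

# Total area under the density curve (f is zero outside [0, 1]).
total_area = integrate(example_density, 0.0, 1.0)
# Proportion of x values between a = 0.2 and b = 0.5.
proportion = integrate(example_density, 0.2, 0.5)
print(total_area, proportion)
```

For this density the exact answers are 1 and 0.5² − 0.2² = 0.21, so the numerical results can be compared against them.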



Mass function
A mass function p(x) is used to describe (approximately)
the population distribution of a discrete variable x.
The following properties of p(x) must be satisfied:
$p(x) \ge 0$
$\sum_x p(x) = 1$


Definition
A continuous variable x is said to have a normal distribution
with parameters µ and σ, where −∞ < µ < ∞ and σ > 0, if
the density function of x is
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}$ with $-\infty < x < \infty$ (1)

The normal distribution is the most important distribution
in statistics.
Many population and process variables have distributions
that can be very closely fit by an appropriate normal
curve.
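
As an illustration of equation (1), the sketch below evaluates the normal density and the proportion of values between two numbers. The function names are hypothetical. The proportion is computed through the error function `math.erf`, a standard closed form for the integral of the normal density rather than anything stated on the slides.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with parameters mu and sigma, as in equation (1)."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def normal_proportion(a, b, mu, sigma):
    """Proportion of x values between a and b (integral of the density)."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))
    return cdf(b) - cdf(a)

# Height of the curve at its center, and the share within one sigma of mu.
print(normal_pdf(0.0, 0.0, 1.0))
print(normal_proportion(-1.0, 1.0, 0.0, 1.0))
```

About 68% of the values of a normal variable lie within one σ of µ, whatever the parameter values.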

The standard normal distribution


The normal distribution with parameters µ = 0 and σ = 1 is
called the standard normal distribution, with density
$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$.
[Figure: standard normal density curve. Horizontal axis: x, from −6 to 6. Vertical axis: f(x), from 0 to 0.4.]


The lognormal distribution


The nonnegative variable x is said to have a lognormal
distribution if ln(x) has a normal distribution with
parameters µ and σ.
The density function of the lognormal distribution is
$f(x) = \begin{cases} \frac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-(\ln(x)-\mu)^2/(2\sigma^2)} & x > 0 \\ 0 & x \le 0 \end{cases}$ (2)
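
Equation (2) translates directly into code. The function name `lognormal_pdf` is hypothetical; the sketch simply evaluates the two branches of the density.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Lognormal density as in equation (2); zero for x <= 0."""
    if x <= 0:
        return 0.0
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma * x)
    return coeff * math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2))

# With mu = 4 and sigma = 1 the density is highest at x = exp(mu - sigma**2).
peak_x = math.exp(3.0)
print(peak_x, lognormal_pdf(peak_x, 4.0, 1.0))
```

For µ = 4 and σ = 1 the peak sits near x ≈ 20 with a height of about 0.012, which shows how strongly the curve is skewed to the right.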


The lognormal distribution: an example

[Figure: lognormal density with µ = 4 and σ = 1. Horizontal axis: x, from 0 to 500. Vertical axis: lognormal distribution (density), from 0 to 0.014.]

The Weibull distribution


The distribution was introduced in 1939 by the Swedish
physicist Waloddi Weibull.
A variable x has a Weibull distribution with parameters α
and β if the density function of x is
$f(x) = \begin{cases} \frac{\alpha}{\beta^{\alpha}}\, x^{\alpha-1} e^{-(x/\beta)^{\alpha}} & x > 0 \\ 0 & x \le 0 \end{cases}$ (3)

In recent years, the Weibull distribution has been used to
model engine emissions of various pollutants.
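
A direct transcription of equation (3) is sketched below. The function name `weibull_pdf` is hypothetical, and the parameter values in the loop are chosen only for illustration.

```python
import math

def weibull_pdf(x, alpha, beta):
    """Weibull density with parameters alpha and beta, as in equation (3)."""
    if x <= 0:
        return 0.0
    return (alpha / beta ** alpha) * x ** (alpha - 1) * math.exp(-((x / beta) ** alpha))

# Density at x = 1 for beta = 1 and several values of alpha.
for alpha in (1.0, 1.5, 5.0):
    print(alpha, weibull_pdf(1.0, alpha, 1.0))
```

With β = 1 the density at x = 1 equals α·e⁻¹, so larger α gives a taller, narrower curve around x = 1.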


The Weibull distribution: an example

[Figure: Weibull density functions for β = 1 with α = 1, α = 1.5, and α = 5. Horizontal axis: x, from 0 to 3. Vertical axis: density function, from 0 to 2.]

Selecting an appropriate distribution


The choice of an appropriate distribution for a continuous
variable x is usually based on sample data.
An investigator must first decide whether a particular
family, such as the Weibull or the normal family, is
reasonable.
Then, any parameters of the chosen family must be
estimated to find a particular member of the family that
best fits the data.
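
The estimation step can be sketched for two of the families above. The slides do not say which estimation method is meant, so the choice here is an assumption: the sample mean and sample standard deviation serve as estimates of µ and σ for the normal family, and the same statistics computed on ln(x) serve for the lognormal family. The function names and data are hypothetical.

```python
import math
import statistics

def fit_normal(sample):
    """Estimate (mu, sigma) by the sample mean and sample standard deviation."""
    return statistics.fmean(sample), statistics.stdev(sample)

def fit_lognormal(sample):
    """Estimate (mu, sigma) of a lognormal by fitting a normal to ln(x)."""
    return fit_normal([math.log(x) for x in sample])

# Hypothetical measurements.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(fit_normal(data))
print(fit_lognormal(data))
```

Whether the chosen family is reasonable in the first place still has to be judged separately, for example by comparing a histogram of the data with the fitted density curve.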


The Binomial distribution


Suppose that items or entities of some sort come in
batches or groups of size n.
Let ρ denote the proportion of all items in the population
or process that are satisfactory (S), so the proportion of
all items that are unsatisfactory (F) is 1 − ρ.
Assume the condition of any particular item (S or F) is
independent of that of any other item.


The Binomial distribution (cont.)


The binomial variable x is the number of S’s in a batch or
group. The mass function of x is given by the formula
$p(x) = \text{proportion of batches with } x \text{ S's} = \frac{n!}{x!\,(n-x)!}\, \rho^{x} (1-\rho)^{n-x}, \quad x = 0, 1, \ldots, n$ (4)
The binomial distribution is used extensively in genetic
applications.
Computing binomial proportions by hand can be tedious when n is
large.
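
Equation (4) is simple to evaluate by machine, which removes most of the tedium for moderate n. The sketch below uses `math.comb` for the factorial ratio n!/(x!(n − x)!). The function name `binomial_pmf` and the parameter values are hypothetical.

```python
import math

def binomial_pmf(x, n, rho):
    """Proportion of batches of size n containing exactly x S's, as in equation (4)."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * rho ** x * (1.0 - rho) ** (n - x)

# Hypothetical batch size and proportion of satisfactory items.
n, rho = 8, 0.25
proportions = [binomial_pmf(x, n, rho) for x in range(n + 1)]
print(proportions)
print(sum(proportions))  # a mass function must sum to 1
```

Summing the values over x = 0, ..., n is a quick check that the mass-function property Σ p(x) = 1 holds.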


The Binomial distribution: a histogram

[Figure: binomial histogram. Horizontal axis: x, from 0 to 8. Vertical axis: proportion, from 0 to 0.35.]

The Poisson distribution


The Poisson distribution is usually used as a model for the
number of times an "event" of some sort occurs during a
specific time period or in a particular region of space.
The Poisson mass function is
$p(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, 3, \ldots$ (5)
The Poisson distribution is used in telephone engineering.
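
Equation (5) can be evaluated in the same way. The function name `poisson_pmf` is hypothetical, and λ = 2 is used only as an example value.

```python
import math

def poisson_pmf(x, lam):
    """Poisson mass function with parameter lam, as in equation (5)."""
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2.0
masses = [poisson_pmf(x, lam) for x in range(9)]
print(masses)
# Truncating at a large x approximates the infinite sum, which should be 1.
print(sum(poisson_pmf(x, lam) for x in range(50)))
```

For λ = 2 the values at x = 1 and x = 2 are equal (both 2e⁻²) and are the largest, after which the masses fall off quickly.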


The Poisson distribution: a histogram

[Figure: Poisson histogram with λ = 2. Horizontal axis: x, from 0 to 8. Vertical axis: proportion, from 0 to 0.35.]