Descriptive Statistics
Why Study Statistics
• Technological developments, the rise of the Internet and social
networks, and electronic devices generate large amounts of data
• Large storage capacity is available to hold this data
• Advances in computing power make it possible to process and
analyze large amounts of data effectively
• Business Intelligence tools provide better data visualization
• Discovering patterns and trends in this data can help
organizations gain a competitive advantage in the marketplace
Types of Statistics
• Descriptive statistics is concerned with data summarization
using graphs/charts and tables.
Data
• Categorical (Qualitative)
  E.g. Gender, Location of store, Preference
• Numerical (Quantitative)
  - Discrete: e.g. Family size, Number of rooms in a hotel, Number of
    credit cards issued
  - Continuous: e.g. Waiting time, Length of a part produced
Measurement Scales
• Nominal - e.g. Internet service provider
• 340 340 350 350 340 340 320 340 330 330
• Mean: Affected by extreme values. Can be treated algebraically. That is,
means of several groups can be combined.
• Median: Not affected by extreme values. Cannot be treated algebraically.
That is, medians of several groups cannot be combined.
• Mode: Not affected by extreme values. Cannot be treated algebraically.
That is, modes of several groups cannot be combined.
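The comparison of mean, median, and mode can be checked numerically. The sketch below uses the ten values listed earlier in this section; the appended extreme value of 1000, the split into two groups, and the helper names `summarize` and `combined_mean` are assumptions made only for illustration.

```python
from statistics import mean, median, mode

# The ten values listed earlier in this section.
values = [340, 340, 350, 350, 340, 340, 320, 340, 330, 330]


def summarize(data):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(data), median(data), mode(data)


def combined_mean(groups):
    """Combine group means using only each group's size and mean."""
    total_n = sum(len(g) for g in groups)
    return sum(len(g) * mean(g) for g in groups) / total_n


base = summarize(values)

# Hypothetical extreme value, appended to show which measures move.
with_extreme = summarize(values + [1000])

print("original:          ", base)
print("with extreme value:", with_extreme)

# Hypothetical split into two groups: the weighted group means
# reproduce the overall mean, which medians and modes cannot do in general.
group_a, group_b = values[:4], values[4:]
print("combined mean:     ", combined_mean([group_a, group_b]))
```

In this sample the mean shifts noticeably once the extreme value is added, while the median and mode stay where they were.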
Measures of Dispersion
• In simple terms, measures of dispersion indicate how large the spread
of the distribution is around the central tendency.
• It answers unambiguously the question of how widely the data are spread.
Range = X_maximum - X_minimum
Range - Example
Range = X_maximum - X_minimum = 18 - 9 = 9
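The same calculation can be sketched in a few lines. The observations below are hypothetical; they were chosen only so that the smallest value is 9 and the largest is 18, matching the example above.

```python
def value_range(data):
    """Range = largest observation minus smallest observation."""
    return max(data) - min(data)


# Hypothetical observations with minimum 9 and maximum 18.
observations = [9, 11, 12, 14, 15, 18]

print(value_range(observations))  # 18 - 9 = 9
```

Because the range depends only on the two most extreme observations, a single unusual value changes it directly.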
Inter-Quartile Range (IQR)
• IQR = Q3 - Q1
Interquartile Range - Example
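As a minimal sketch of IQR = Q3 - Q1, the code below uses Python's `statistics.quantiles` with its default "exclusive" method. Quartile conventions differ between textbooks and tools, so other software may report slightly different Q1 and Q3 for the same data. The data values are hypothetical and include one deliberately extreme observation.

```python
from statistics import quantiles

# Hypothetical sample of 11 observations; 40 is a deliberate extreme value.
data = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 40]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
print("Range =", max(data) - min(data))
```

Here the IQR describes the spread of the middle half of the data, so the extreme value 40 inflates the range but leaves the IQR unchanged.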
A histogram is a graphical representation of a frequency distribution in which the X-axis represents the
classes and the Y-axis represents the frequencies, drawn as bars.
A histogram depicts the pattern of the distribution of the characteristic being measured.
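The frequency distribution behind a histogram can be sketched without a plotting library. In the example below, the waiting times, the class width of 5, and the function name `frequency_table` are all assumptions for illustration; each class includes its lower limit and excludes its upper limit.

```python
def frequency_table(data, start, width, n_classes):
    """Count observations in each class [lower, upper)."""
    counts = [0] * n_classes
    for x in data:
        idx = int((x - start) // width)
        if 0 <= idx < n_classes:  # values outside all classes are ignored
            counts[idx] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]


# Hypothetical waiting times in minutes.
waiting_times = [2, 3, 5, 7, 8, 8, 9, 11, 12, 14, 15, 19]

table = frequency_table(waiting_times, start=0, width=5, n_classes=4)

# Text rendering: one row per class, one '#' per observation.
for lower, upper, count in table:
    print(f"{lower:>2} to under {upper:>2}: {'#' * count}")
```

The rows play the role of the classes on the X-axis, and the bar lengths play the role of the frequencies on the Y-axis.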
The Empirical Rule
• The empirical rule approximates the variation of data in a
bell-shaped distribution.
• Approximately 68% of the data in a bell-shaped distribution lies
within one standard deviation of the mean, or μ ± 1σ
• Approximately 95% of the data in a bell-shaped distribution lies within two
standard deviations of the mean, or μ ± 2σ
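The empirical rule can be checked on simulated bell-shaped data. The sketch below draws a sample from a normal distribution; the mean of 50, the standard deviation of 10, the sample size, the seed, and the helper name `share_within` are arbitrary assumptions, and the printed shares are approximations that vary with the sample.

```python
import random
from statistics import fmean, pstdev


def share_within(data, k):
    """Fraction of observations within k standard deviations of the mean."""
    mu = fmean(data)
    sigma = pstdev(data)
    inside = sum(1 for x in data if abs(x - mu) <= k * sigma)
    return inside / len(data)


# Simulated bell-shaped data (all parameters are arbitrary choices).
rng = random.Random(42)
sample = [rng.gauss(50, 10) for _ in range(20_000)]

print("within 1 standard deviation: ", round(share_within(sample, 1), 3))
print("within 2 standard deviations:", round(share_within(sample, 2), 3))
```

The rule describes bell-shaped data only; for a strongly skewed sample the same function can return quite different shares.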
• Most companies now recognize the power of data in making crucial
business decisions. For an insurance company, it becomes even more important to
study the attributes of its customers. Using this customer
information to make business decisions can provide a competitive edge
over other players in the market.
• We are provided with customer data from an insurance company, including age,
gender, BMI, and the medical charges billed by the insurance company. We need to
explore this data to see whether we can derive meaningful insights from it.
Five-Number Summary and the Boxplot
• The Boxplot: a graphical display of the data based on the
five-number summary.
Five-number summary: Minimum, Q1, Median, Q3, Maximum
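A five-number summary can be sketched with the standard library. The customer ages below are hypothetical and are not taken from the insurance data described earlier; the function name `five_number_summary` is likewise an assumption. Quartiles use the default method of `statistics.quantiles`, and other tools may use a slightly different convention.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = quantiles(data, n=4)
    return min(data), q1, med, q3, max(data)


# Hypothetical customer ages.
ages = [19, 23, 27, 31, 35, 39, 43, 47, 52, 58, 64]

labels = ("Minimum", "Q1", "Median", "Q3", "Maximum")
for label, value in zip(labels, five_number_summary(ages)):
    print(f"{label:>7}: {value}")
```

These five values are the ones a boxplot draws: the box spans Q1 to Q3, the central line marks the median, and the endpoints mark the minimum and maximum.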
Shape of Boxplots
• If the data are symmetric around the median, then the box and
central line are centered between the endpoints.
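One way to check this symmetry numerically is to compare distances on either side of the median. The sketch below is an illustration under assumptions: the data are hypothetical, the function name `box_distances` is invented for this example, and quartiles follow the default method of `statistics.quantiles`.

```python
from statistics import quantiles


def box_distances(data):
    """Distances from the median to the box edges and to the endpoints."""
    q1, med, q3 = quantiles(data, n=4)
    return {
        "median_to_q1": med - q1,
        "q3_to_median": q3 - med,
        "median_to_min": med - min(data),
        "max_to_median": max(data) - med,
    }


# Hypothetical data that is symmetric around its median of 6.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

for name, distance in box_distances(symmetric).items():
    print(f"{name:>13}: {distance}")
```

For symmetric data the two box distances match and the two endpoint distances match, which is what places the central line in the middle of the box and the box in the middle of the endpoints.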