The 8 Basic Statistics Concepts For Data Science - KDnuggets
Understanding the fundamentals of statistics is a core capability for becoming a Data Scientist. Review
these essential ideas that will be pervasive in your work and raise your expertise in the field.
By Shirley Chen, Data Analyst @ Outdoorsy on April 21, 2022 in Data Science
Statistics is a form of mathematical analysis that uses quantified models and
representations for a given set of experimental data or real-life studies. The main
advantage of statistics is that it presents information in a form that is easy to interpret.
Recently, I reviewed all the statistics materials and organized the 8 basic statistics concepts
for becoming a data scientist!
Understand the Type of Analytics
Probability
Central Tendency
Variability
Relationship Between Variables
Probability Distribution
Hypothesis Testing and Statistical Significance
Regression
https://www.kdnuggets.com/2020/06/8-basic-statistics-concepts.html 1/13
9/17/23, 10:31 AM The 8 Basic Statistics Concepts for Data Science - KDnuggets
Understand the Type of Analytics
Descriptive Analytics tells us what happened in the past and helps a business understand
how it is performing by providing context to help stakeholders interpret information.
Diagnostic Analytics takes descriptive data a step further and helps you understand why
something happened in the past.
Predictive Analytics predicts what is most likely to happen in the future and provides
companies with actionable insights based on the information.
Prescriptive Analytics provides recommendations regarding actions that will take
advantage of the predictions and guide the possible actions toward a solution.
Probability
Probability is the measure of the likelihood that an event will occur in a random
experiment.
Conditional Probability: P(A|B) is the probability that event A occurs given that event B
has occurred. P(A|B)=P(A∩B)/P(B), when P(B)>0.
Independent Events: Two events are independent if the occurrence of one does not affect
the probability of occurrence of the other. P(A∩B)=P(A)P(B); equivalently, where
P(A) != 0 and P(B) != 0, P(A|B)=P(A) and P(B|A)=P(B).
Mutually Exclusive Events: Two events are mutually exclusive if they cannot both occur at
the same time. P(A∩B)=0 and P(A∪B)=P(A)+P(B).
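The three probability rules above can be checked on a small sample space. The sketch below assumes one roll of a fair six-sided die; the events `even`, `low`, and `odd` and the helper names are illustrative choices, not part of the original article.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of OMEGA) when outcomes are equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(a, b):
    """P(A|B) = P(A∩B) / P(B), defined only when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be greater than 0")
    return prob(a & b) / prob(b)

even = frozenset({2, 4, 6})
low = frozenset({1, 2})
odd = frozenset({1, 3, 5})

# Conditional probability: P(even | low) = (1/6) / (2/6)
p_even_given_low = cond_prob(even, low)

# Independence: P(even ∩ low) equals P(even) * P(low)
independent = prob(even & low) == prob(even) * prob(low)

# Mutual exclusivity: even and odd cannot both occur, so probabilities add
exclusive = prob(even & odd) == 0
additive = prob(even | odd) == prob(even) + prob(odd)
```

Using `Fraction` keeps the arithmetic exact, so the equalities can be tested directly instead of within a floating-point tolerance.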
Bayes’ Theorem describes the probability of an event based on prior knowledge of
conditions that might be related to the event.
Figure: Bayes’ Theorem.
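In its usual form, Bayes' Theorem states P(A|B) = P(B|A)P(A)/P(B). A minimal sketch of that formula follows, assuming a hypothetical screening test; the prior, sensitivity, and false-positive rate are invented numbers used only for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) given P(A), P(B|A), and P(B|not A)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical inputs: 1% of people have the condition, the test detects it
# 95% of the time, and it wrongly flags 5% of people without it.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.161
```

Even with a fairly accurate test, the posterior stays low under these assumed numbers because the prior is small, which is the kind of update the theorem describes.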
Central Tendency
Mode: The most frequent value in the dataset. If several values are tied for the highest
frequency, we have a multimodal distribution.
Kurtosis: A measure of whether the data are heavy-tailed or light-tailed relative to a normal
distribution.
Figure: Skewness.
Figure: Kurtosis.
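The mode and the shape measures named above can be computed with the Python standard library. This is a sketch using the population moment formulas (skewness as m3 / m2^1.5, excess kurtosis as m4 / m2^2 - 3); other conventions exist, and the sample data are made up.

```python
from statistics import fmean, multimode

def central_moment(data, order):
    """Population central moment of the given order."""
    mu = fmean(data)
    return fmean([(x - mu) ** order for x in data])

def skewness(data):
    """Moment-based skewness; 0 for perfectly symmetric data."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

data = [1, 2, 2, 3, 3, 4]       # illustrative values
modes = multimode(data)          # two values tie, so the data are multimodal
skew = skewness(data)
kurt = excess_kurtosis(data)     # negative here: lighter tails than a normal
```

`multimode` returns every tied value, which makes the multimodal case explicit instead of silently picking one mode.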
Variability
Range: The difference between the highest and lowest value in the dataset.
Percentiles, Quartiles and Interquartile Range (IQR)
Percentiles: A measure that indicates the value below which a given percentage of
observations in a group of observations falls.
Quartiles: Values that divide the number of data points into four more or less equal
parts, or quarters.
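As a rough illustration of range, percentiles, quartiles, and IQR, the sketch below uses `statistics.quantiles` from the standard library. It assumes the "inclusive" interpolation method, which is only one of several conventions, and the data are invented.

```python
from statistics import quantiles

data = [7, 1, 9, 3, 5, 2, 8, 4, 6]   # illustrative values, order does not matter

value_range = max(data) - min(data)

# Quartiles: three cut points that split the sorted data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Interquartile range: the spread of the middle half of the data.
iqr = q3 - q1

# Percentiles: 99 cut points; index 89 is the 90th percentile.
p90 = quantiles(data, n=100, method="inclusive")[89]
```

Different tools interpolate between data points differently, so quartiles from another library may differ slightly on small samples.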
Variance: The average squared difference of the values from the mean, which measures how
spread out a set of data is relative to the mean.
Standard Deviation: The typical distance between each data point and the mean, computed as
the square root of the variance.
Standard Error (SE): An estimate of the standard deviation of the sampling distribution of
a statistic, such as the sample mean.
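The three spread measures above relate to each other as follows. This sketch assumes a small made-up sample and shows the standard error of the sample mean, which is the sample standard deviation divided by the square root of n.

```python
from math import sqrt
from statistics import pvariance, stdev, variance

sample = [4, 8, 6, 5, 3, 7]          # illustrative values

pop_var = pvariance(sample)          # population variance: divides by n
samp_var = variance(sample)          # sample variance: divides by n - 1
samp_sd = stdev(sample)              # square root of the sample variance

# Standard error of the mean: how much the sample mean would vary
# across repeated samples of this size.
se_mean = samp_sd / sqrt(len(sample))
```

The sample variance is larger than the population variance on the same numbers because it divides by n - 1 rather than n.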
Figure: Population and Sample Standard Error.
Relationship Between Variables
Causality: Relationship between two events where one event is affected by the other.
Covariance: A quantitative measure of the joint variability between two variables.
Correlation: A measure of the strength and direction of the linear relationship between two
variables. It ranges from -1 to 1 and is the covariance normalized by the two standard
deviations.
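Covariance and correlation can be written out by hand to show how one is a rescaled version of the other. The sketch below assumes the sample (n - 1) form of covariance and the Pearson correlation; the `hours` and `score` values are invented.

```python
from math import sqrt
from statistics import fmean

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

hours = [1, 2, 3, 4, 5]              # illustrative values
score = [52, 55, 61, 64, 70]

cov = covariance(hours, score)       # depends on the units of both variables
corr = correlation(hours, score)     # unit-free, always between -1 and 1
```

A high correlation here says nothing about causality; it only describes how closely the two variables move together along a line.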
Probability Distributions
Probability Distribution Functions
Probability Mass Function (PMF): A function that gives the probability that a discrete
random variable is exactly equal to some value.
Probability Density Function (PDF): A function for continuous data where the value at any
given sample can be interpreted as providing a relative likelihood that the value of the
random variable would equal that sample.
Cumulative Distribution Function (CDF): A function that gives the probability that a random
variable is less than or equal to a certain value.
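To make the PMF, PDF, and CDF distinction concrete, here is a sketch that assumes a binomial variable for the discrete case and a standard normal variable for the continuous case. Both distributions are illustrative choices.

```python
from math import comb
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """PMF: probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """CDF: probability of at most k successes, the running sum of the PMF."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Discrete case: 4 fair coin tosses.
p_exactly_two = binomial_pmf(2, 4, 0.5)
p_at_most_two = binomial_cdf(2, 4, 0.5)

# Continuous case: standard normal variable.
z = NormalDist(mu=0, sigma=1)
density_at_zero = z.pdf(0)    # a relative likelihood, not a probability
prob_below_zero = z.cdf(0)    # P(Z <= 0)
```

A PDF value can exceed 1 for narrow distributions, which is why it is read as a density and only areas under the curve are probabilities.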
Chi-Square Distribution: The distribution of the sum of squared standard normal deviates.
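The definition above can be explored by simulation. This sketch draws sums of k squared standard normal values and compares the sample mean with k, the known mean of a chi-square distribution with k degrees of freedom; the seed, k, and number of draws are arbitrary choices.

```python
import random

def chi_square_draw(k, rng):
    """One simulated value: the sum of k squared standard normal draws."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(k))

rng = random.Random(42)          # fixed seed so the run is repeatable
k = 3                            # degrees of freedom
draws = [chi_square_draw(k, rng) for _ in range(20_000)]

# The average of many draws should land close to k.
sample_mean = sum(draws) / len(draws)
```

Every draw is non-negative because it is a sum of squares, which is why the distribution is skewed to the right for small k.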
Discrete Probability Distribution
Bernoulli Distribution: The distribution of a random variable for a single trial with
only 2 possible outcomes, namely 1 (success) with probability p, and 0 (failure) with
probability (1-p).
Poisson Distribution: The distribution that expresses the probability of a given number of
events k occurring in a fixed interval of time if these events occur with a known constant
average rate λ and independently of the time since the last event.
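Both discrete distributions have short closed-form PMFs. The sketch below writes them out directly; the parameter values in the usage lines are invented for illustration.

```python
from math import exp, factorial

def bernoulli_pmf(x, p):
    """P(X = x) for a single trial: p for success (1), 1 - p for failure (0)."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli outcome must be 0 or 1")
    return p if x == 1 else 1 - p

def poisson_pmf(k, lam):
    """P(K = k) = lam**k * exp(-lam) / k! for an average rate lam per interval."""
    return lam ** k * exp(-lam) / factorial(k)

# Hypothetical example: events arrive at an average rate of 2 per interval.
p_no_events = poisson_pmf(0, 2.0)
p_three_events = poisson_pmf(3, 2.0)
p_success = bernoulli_pmf(1, 0.3)
```

Summing `poisson_pmf` over all k gives 1, which is a quick sanity check that the function is a valid PMF.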
Hypothesis Testing and Statistical Significance
Interpretation
P-value: The probability of the test statistic being at least as extreme as the one observed
given that the null hypothesis is true. When p-value > α, we fail to reject the null
hypothesis; when p-value ≤ α, we reject the null hypothesis and conclude that we have a
significant result.
Critical Value: A point on the scale of the test statistic beyond which we reject the null
hypothesis. It is derived from the level of significance α of the test. It depends upon a test
statistic, which is specific to the type of test, and the significance level, α, which defines the
sensitivity of the test.
Significance Level and Rejection Region: The rejection region depends on
the significance level. The significance level is denoted by α and is the probability of
rejecting the null hypothesis if it is true.
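The p-value rule and the critical-value rule lead to the same decision, which a small example can show. This sketch assumes a two-sided Z-test with a known population standard deviation; the sample mean, hypothesized mean, sigma, and n are invented numbers.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, p_value, critical_value, reject) for H0: mean == mu0."""
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability of a statistic at least this extreme in either tail under H0.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Boundary of the rejection region for the chosen significance level.
    critical_value = std_normal.inv_cdf(1 - alpha / 2)
    return z, p_value, critical_value, p_value <= alpha

# Hypothetical data: mean of 103 from 100 observations, sigma known to be 15.
z, p_value, critical, reject = two_sided_z_test(103, 100, 15, 100)
```

Comparing `p_value` with α and comparing `abs(z)` with the critical value are two views of the same rejection region.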
Z-Test
A Z-test is any statistical test for which the distribution of the test statistic under the null
hypothesis can be approximated by a normal distribution.
T-Test
A T-test is the statistical test to use when the population variance is unknown and the
sample size is not large (n < 30).
Paired sample means that we collect data twice from the same group, person, item, or
thing. Independent sample implies that the two samples must have come from two
completely different populations.
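The difference between paired and independent samples shows up in how the t statistic is built. The sketch below computes only the statistic, assuming Welch's unequal-variance form for the independent case; converting it to a p-value needs the Student t distribution, which the Python standard library does not provide. All data values are invented.

```python
from math import sqrt
from statistics import fmean, stdev, variance

def paired_t_statistic(before, after):
    """Paired samples: test the mean of the per-subject differences against 0."""
    diffs = [a - b for a, b in zip(after, before)]
    return fmean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def welch_t_statistic(xs, ys):
    """Independent samples: compare two group means using separate variances."""
    se = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (fmean(xs) - fmean(ys)) / se

# Hypothetical paired data: the same five subjects measured twice.
before = [72, 75, 78, 71, 74]
after = [70, 72, 77, 69, 71]
t_paired = paired_t_statistic(before, after)

# Hypothetical independent data: two unrelated groups.
t_independent = welch_t_statistic([5, 6, 7], [1, 2, 3])
```

Pairing removes subject-to-subject variation from the denominator, which is why the paired design can detect smaller changes with the same number of measurements.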
Figure: ANOVA Table.
Chi-Square Test
Regression
Linear Regression
Assumptions of Linear Regression
Linear Relationship
Multivariate Normality
No or Little Multicollinearity
No or Little Autocorrelation
Homoscedasticity
Figure: Linear Regression Formula.
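A simple linear regression models the response as y = b0 + b1 * x plus an error term. The sketch below fits b0 and b1 by ordinary least squares using the closed-form solution; the function name and data are illustrative, and it does not check any of the assumptions listed above.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1 * x."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = my - b1 * mx         # intercept: the line passes through the means
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]         # constructed to lie exactly on y = 1 + 2x
b0, b1 = fit_line(xs, ys)

# Residuals are the gaps between observed and fitted values.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
```

On real data the residuals are where assumptions such as homoscedasticity and absence of autocorrelation would be examined.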
Multiple Linear Regression is a linear approach to modeling the relationship between a
dependent variable and two or more independent variables.
Figure: Multiple Linear Regression Formula.
Step 2: Check the data, categorical data, missing data, and outliers
An outlier is a data point that differs significantly from other observations. We can detect
outliers with the standard deviation method or the interquartile range (IQR) method.
A dummy variable takes only the value 0 or 1 to indicate the presence or absence of a
category of a categorical variable.
Step 3: Simple Analysis. Check the relationship between the dependent variable and each
independent variable, and between pairs of independent variables
Step 4: Multiple Linear Regression. Check the model and the choice of variables
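The data checks in Step 2 can be sketched with the standard library. The fence multiplier of 1.5 for the IQR method and the cutoff of 2 standard deviations are common conventions assumed here, not values from the article; the helper names and data are hypothetical.

```python
from statistics import fmean, pstdev, quantiles

def iqr_outliers(data, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

def sd_outliers(data, k=2.0):
    """Flag values more than k standard deviations from the mean."""
    mu, sd = fmean(data), pstdev(data)
    return [x for x in data if abs(x - mu) > k * sd]

def dummy_encode(values):
    """One 0/1 column per category, dropping the first category as baseline."""
    categories = sorted(set(values))[1:]
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

data = [10, 12, 11, 13, 12, 11, 10, 95]          # 95 is the planted outlier
flagged_iqr = iqr_outliers(data)
flagged_sd = sd_outliers(data)
rows = dummy_encode(["red", "green", "blue", "green"])
```

An extreme value inflates the standard deviation it is judged against, so on small samples the standard deviation method can miss outliers that the IQR method catches. Dropping one category in `dummy_encode` avoids perfectly collinear columns, which ties back to the multicollinearity assumption.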
Shirley Chen is a Data Analyst at Outdoorsy.
Original. Reposted with permission.