Data Science Specialization


SUBMITTED TO: Mr. Pratimal Singh
SUBMITTED BY: Seenu Mangal
ROLL NO: 18ESKCS153
CONTENT
 Why study data science?
 How does data science impact organizations?
 Applications and competitive advantage of data science in organizations
 Importance of data science to society
 Road to becoming a data scientist
What is Data Science?
 Lots of data is being collected and warehoused
1. Web data, e-commerce
2. Financial transactions, bank/credit transactions
3. Online trading and purchasing
4. Social networks
Introduction-Data Science
 An area that manages, manipulates, extracts, and interprets knowledge
from tremendous amounts of data
 Data science (DS) is a multidisciplinary field of study whose goal is to
address the challenges of big data
 Data science principles apply to all data – big and small
 Theories and techniques from many fields and disciplines are used to
investigate and analyze a large amount of data to help decision makers
in many industries such as science, engineering, economics, politics,
finance, and education
Multidisciplinary Requirement
Skills Gained
 GitHub
 Machine Learning
 R Programming
 Regression Analysis
 Data Science
 RStudio
 Data Analysis
 Debugging
 Data Manipulation
 Regular Expressions (regex)
 Data Cleansing
 Cluster Analysis
The Data Scientist’s Toolbox
 Overview of the data, questions, and tools that data analysts and data
scientists work with.
 The course has two main components.
 The first is a conceptual introduction to the ideas behind turning data
into actionable knowledge.
 The second is a practical introduction to the tools that will be used in
the program, such as version control, Markdown, Git, GitHub, R, and
RStudio.
R Programming
 Program in R and use R for effective data analysis.
 Install and configure software necessary for a statistical programming
environment and describe generic programming language concepts as
they are implemented in a high-level statistical language.
 The course covers practical issues in statistical computing, including
programming in R, reading data into R, accessing R packages, writing
R functions, debugging, profiling R code, and organizing and
commenting R code.
Getting and Cleaning Data
 The course covers obtaining data from the web, from APIs, from
databases, and from colleagues in various formats.
 It also covers the basics of data cleaning and how to make data
“tidy”. Tidy data dramatically speeds up downstream data analysis tasks.
 The components of a complete data set, including raw data, processing
instructions, codebooks, and processed data.
 The basics needed for collecting, cleaning, and sharing data.
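The idea of "tidy" data can be illustrated with a small sketch. The course itself works in R; the example below uses plain Python only for illustration. The table contents, the column names, and the helper `to_tidy` are hypothetical and are not taken from the course material. It reshapes a "wide" table (one column per week) into a form where each row holds exactly one observation.

```python
# Hypothetical "wide" table: one row per subject, one column per measurement week.
wide_rows = [
    {"subject": "A", "week1": 5.1, "week2": 5.4},
    {"subject": "B", "week1": 4.8, "week2": 5.0},
]


def to_tidy(rows, id_column, value_name):
    """Reshape wide rows so that each output row holds exactly one observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row instead
            tidy.append({id_column: row[id_column], "variable": column, value_name: value})
    return tidy


tidy_rows = to_tidy(wide_rows, id_column="subject", value_name="weight")
for record in tidy_rows:
    print(record)
```

In the tidy form, every variable is a column and every observation is a row, which is what makes later grouping, filtering, and plotting steps simpler.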
Exploratory Data Analysis
 The essential exploratory techniques for summarizing data.
 These techniques are typically applied before formal modeling
commences and can help inform the development of more complex
statistical models.
 Exploratory techniques are also important for eliminating or
sharpening potential hypotheses about the world that can be
addressed by the data.
 The plotting systems in R as well as some of the basic principles of
constructing data graphics.
 The common multivariate statistical techniques used to visualize
high-dimensional data.
Fig – graph between count and weight
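A rough sketch of what such a first look at data might involve is given below. It is written in Python rather than the R plotting systems the course teaches, and the weight values are made up for illustration. It computes a few summary statistics and a text-only count of observations per weight bin, a stand-in for a count-versus-weight plot.

```python
import statistics
from collections import Counter

# Hypothetical weights (illustrative values only, not from the slides).
weights = [52, 55, 58, 60, 61, 63, 63, 65, 68, 70, 72, 75, 81]

# Basic numerical summary of the sample.
summary = {
    "n": len(weights),
    "min": min(weights),
    "median": statistics.median(weights),
    "mean": statistics.mean(weights),
    "max": max(weights),
    "stdev": statistics.stdev(weights),
}

# Count observations per 10-unit bin, a text stand-in for a histogram.
bin_width = 10
bin_counts = Counter((w // bin_width) * bin_width for w in weights)

print(summary)
for start in sorted(bin_counts):
    print(f"{start}-{start + bin_width - 1}: {'#' * bin_counts[start]}")
```

Summaries like these are cheap to produce and often show skew, gaps, or outliers before any formal model is fitted.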
Reproducible Research
 The concepts and tools behind reporting modern data analyses in a
reproducible manner.
 Reproducible research is the idea that data analyses, and more generally,
scientific claims, are published with their data and software code so that
others may verify the findings and build upon them.
 The need for reproducibility is increasing dramatically as data analyses
become more complex, involving larger datasets and more sophisticated
computations.
 Reproducibility lets people focus on the actual content of a data
analysis, rather than on superficial details reported in a written summary.
In addition, reproducibility makes an analysis more useful to others
because the data and code that actually conducted the analysis are
available.
 Literate statistical analysis tools, which allow one to publish a data
analysis in a single document that others can easily execute to obtain
the same results.
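One small ingredient of reproducibility, making a computation return the same result every time it is rerun, can be sketched as follows. This is an illustrative Python example, not the literate-programming tooling the course uses; the function name `run_analysis`, the seed value, and the simulated data are assumptions made for the sketch.

```python
import random
import statistics


def run_analysis(seed):
    """Simulate data and summarize it; the fixed seed makes the run repeatable."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    sample = [rng.gauss(50, 10) for _ in range(1000)]
    return round(statistics.mean(sample), 4)


first_run = run_analysis(seed=2024)
second_run = run_analysis(seed=2024)

# Anyone rerunning the code with the same seed gets the same number.
print(first_run == second_run)
```

Publishing the code and the seed alongside the written summary is what allows others to verify the reported figure rather than take it on trust.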
Statistical Inference
 Statistical inference is the process of drawing
conclusions about populations or scientific truths from
data.
 There are many modes of performing inference,
including statistical modeling, data-oriented strategies,
and explicit use of designs and randomization in
analyses. Furthermore, there are broad theories
(frequency, Bayesian, likelihood, design-based, …) and
numerous complexities (missing data, observed and
unobserved confounding, biases) for performing
inference.
 A practitioner can often be left in a debilitating maze of
techniques, philosophies and nuance.
 The fundamentals of inference in a practical approach
for getting things done. Students learn the broad directions of
statistical inference and use this information for making
informed choices in analyzing data.
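As one concrete instance of drawing a conclusion about a population from data, the sketch below computes an approximate 95% confidence interval for a mean. It is a Python illustration under stated assumptions: the sample values are invented, the helper `mean_confidence_interval` is hypothetical, and the interval uses a normal approximation (for a sample this small, a t-based interval would usually be preferred).

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample (illustrative values only).
sample = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 5.2, 4.8, 5.4, 5.1, 4.9, 5.3]


def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(data)
    mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


low, high = mean_confidence_interval(sample)
print(f"mean = {statistics.mean(sample):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Raising the confidence level widens the interval, which is the usual trade-off between certainty and precision.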
Regression Models
 Linear models, as their name implies, relate an outcome to a
set of predictors of interest using linear assumptions.
 Regression models, a subset of linear models, are the most
important statistical analysis tool in a data scientist’s toolkit.
 This course covers regression analysis, least squares, and
inference using regression models. Special cases of the
regression model, ANOVA and ANCOVA, will be covered as
well.
 Analysis of residuals and variability will be investigated. The
course will cover modern thinking on model selection and
novel uses of regression models including scatterplot
smoothing.
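Least squares for a single predictor can be sketched in a few lines. The Python example below is illustrative only; the data points and the helper `fit_line` are assumptions. It fits a line with the standard closed-form estimates (slope = S_xy / S_xx, intercept = mean(y) - slope * mean(x)) and then computes the residuals mentioned above.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor, using the closed-form solution."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data (illustrative values only).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)

# Residuals: observed outcome minus fitted value.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

With an intercept in the model, the least squares residuals sum to zero, so what is informative is their pattern (curvature, changing spread) rather than their total.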
Practical Machine Learning
 Prediction and machine learning are among the most common tasks
performed by data scientists and data analysts.
 The basic components of building and applying prediction
functions with an emphasis on practical applications.
 It provides a basic grounding in concepts such as training and test
sets, overfitting, and error rates.
 The course will also introduce a range of model-based and
algorithmic machine learning methods including regression,
classification trees, Naive Bayes, and random forests.
 The complete process of building prediction functions including
data collection, feature creation, algorithms, and evaluation.
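The notions of training set, test set, and error rate can be made concrete with a minimal sketch. The Python example below is an illustration under stated assumptions: the data are synthetic, and the classifier is a simple nearest-centroid rule chosen for brevity, not one of the methods the course lists (regression, classification trees, Naive Bayes, random forests).

```python
import random

rng = random.Random(0)  # fixed seed so the synthetic data are repeatable


def make_points(center, label, count):
    """Generate synthetic 2-D points scattered around (center, center)."""
    return [([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label) for _ in range(count)]


# Data collection (here: synthetic two-class data), then a 70/30 train/test split.
data = make_points(0.0, 0, 50) + make_points(3.0, 1, 50)
rng.shuffle(data)
split = int(0.7 * len(data))
train, test = data[:split], data[split:]


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


# "Training": estimate one centroid per class from the training set only.
centroids = {label: centroid([x for x, y in train if y == label]) for label in (0, 1)}


def predict(x):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))


# Evaluation: error rate on data the model never saw during training.
errors = sum(predict(x) != y for x, y in test)
test_error_rate = errors / len(test)
print(f"test error rate: {test_error_rate:.3f}")
```

Measuring the error on held-out data rather than on the training set is what guards against an overly optimistic estimate caused by overfitting.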
Developing Data Products
 A data product is the production output from a statistical
analysis.
 Data products automate complex analysis tasks or use
technology to expand the utility of a data-informed
model, algorithm, or inference.
 It covers the basics of creating data products using
Shiny, R packages, and interactive graphics.
 It focuses on the statistical fundamentals of creating a data
product that can be used to tell a story about data to a
mass audience.
Data Science Capstone
 The capstone project class will allow students to create a
usable/public data product that can be used to show their
skills to potential employers.
 Projects will be drawn from real-world problems and
will be conducted with industry, government, and
academic partners.
Concentration in Data Science
 Mathematics and Applied Mathematics
 Applied Statistics/Data Analysis
 Solid Programming Skills (R, Python, Julia,
SQL)
 Data Mining
 Database Storage and Management
 Machine Learning and Discovery
References
 https://en.wikipedia.org/wiki/Data_science
 https://www.coursera.org/specializations/jhu-data-science
 https://verify.wiki/wiki/Data_Science
3D Model Creation With Autodesk Fusion 360
 Conceptual Design
 2D Sketches to 3D Solid Model
 Generative Wing Design
 Photorealistic renderings
Thank You…