The document discusses a data science specialization that includes courses covering topics like the data scientist's toolbox, R programming, getting and cleaning data, exploratory data analysis, reproducible research, statistical inference, regression models, machine learning, developing data products, and a capstone project. The specialization provides skills in areas like machine learning, R programming, data analysis, data manipulation, and developing interactive data products to tell stories about data. It takes a multidisciplinary approach to address challenges in big data and helps decision makers across different industries.
Data science involves analyzing large amounts of data from various sources to support decision making. Key topics include what data science is, the skills gained, and developing data products.
Data science is a multidisciplinary field that uses techniques and theories from many fields, such as statistics and computer science, to extract knowledge from large amounts of data.
Skills such as machine learning, programming in R and Python, data analysis, data manipulation, and data visualization can be gained from studying data science.
Data Science Specialization
Submitted to: Mr. Pratimal Singh
Submitted by: Seenu Mangal (Roll No.: 18ESKCS153)

Contents
1. Why study Data Science?
2. How does Data Science impact organizations?
3. Applications and competitive advantages of Data Science for organizations
4. Importance of Data Science to society
5. The road to becoming a data scientist

What is Data Science?
Lots of data is being collected and warehoused, for example:
1. Web data and e-commerce
2. Financial transactions, bank/credit transactions
3. Online trading and purchasing
4. Social networks

Introduction: Data Science
Data science is an area that manages, manipulates, extracts, and interprets knowledge from tremendous amounts of data. It is a multidisciplinary field of study whose goal is to address the challenges of big data, although data science principles apply to all data, big and small. Theories and techniques from many fields and disciplines are used to investigate and analyze large amounts of data to help decision makers in many industries, such as science, engineering, economics, politics, finance, and education.

Multidisciplinary Requirements

Skills Gained
GitHub, Machine Learning, R Programming, Regression Analysis, Data Science, RStudio, Data Analysis, Debugging, Data Manipulation, Regular Expressions (regex), Data Cleansing, Cluster Analysis.

The Data Scientist's Toolbox
This course gives an overview of the data, questions, and tools that data analysts and data scientists work with. It has two main components. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools used throughout the program, such as version control, Markdown, Git, GitHub, R, and RStudio.

R Programming
This course teaches you to program in R and to use R for effective data analysis. You learn to install and configure the software needed for a statistical programming environment, and to describe generic programming language concepts as they are implemented in a high-level statistical language.
The course covers practical issues in statistical computing. These include programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code.

Getting and Cleaning Data
The course covers obtaining data in various formats from the web, from APIs, from databases, and from colleagues. It also covers the basics of data cleaning and how to make data "tidy". Tidy data dramatically speeds up downstream data analysis tasks. Topics include the components of a complete data set (raw data, processing instructions, codebooks, and processed data) and the basics of collecting, cleaning, and sharing data.

Exploratory Data Analysis
The course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling begins, and they can inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that the data can address. The course also covers the plotting systems in R, some basic principles of constructing data graphics, and the common multivariate statistical techniques used to visualize high-dimensional data.

Fig: Graph of count versus weight

Reproducible Research
The course covers the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility lets people focus on the actual content of a data analysis rather than on superficial details reported in a written summary.
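Publishing code and data is only enough when the code itself behaves deterministically. The Python sketch below illustrates that point; the course itself works in R, where `set.seed` plays a similar role. It assumes a simulation-based analysis step and fixes its random seed, so that anyone rerunning the published code gets the identical result. The function `simulate_mean` and the seed value are hypothetical, chosen only for illustration.

```python
import random
import statistics


def simulate_mean(seed: int, n: int = 100) -> float:
    """Hypothetical analysis step: the mean of n simulated standard-normal draws.

    A local random.Random(seed) is used instead of the global generator,
    so the result depends only on the published seed and code.
    """
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.mean(draws)


# Two independent runs with the same published seed give the same number.
first_run = simulate_mean(seed=2024)
second_run = simulate_mean(seed=2024)
print(first_run == second_run)  # True
```

A local generator is a deliberate choice here. Other code that also draws random numbers cannot silently change this step's output.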
In addition, reproducibility makes an analysis more useful to others, because the data and code that actually produced the analysis are available. The course also covers literate statistical analysis tools. These let you publish a data analysis as a single document that others can easily execute to obtain the same results.

Statistical Inference
Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many ways to perform inference, including statistical modeling, data-oriented strategies, and the explicit use of designs and randomization in analyses. There are also broad theories of inference (frequentist, Bayesian, likelihood, design-based, …) and numerous complexities (missing data, observed and unobserved confounding, biases). A practitioner can often be left in a debilitating maze of techniques, philosophies, and nuance. The course presents the fundamentals of inference in a practical approach for getting things done, along with the broad directions of statistical inference and how to use this information for making …

Regression Models
Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist's toolkit. This course covers regression analysis, least squares, and inference using regression models. It also covers ANOVA and ANCOVA, which are special cases of the regression model, along with the analysis of residuals and variability. Finally, it covers modern thinking on model selection and novel uses of regression models, including scatterplot smoothing.

Practical Machine Learning
Prediction and machine learning are among the most common tasks performed by data scientists and data analysts. The course covers the basic components of building and applying prediction functions, with an emphasis on practical applications.
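Training and test sets and error rates, which this course introduces, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's method. It uses a toy one-dimensional dataset with label noise and a 1-nearest-neighbour classifier, chosen only for brevity. The helper names `split_train_test`, `predict_1nn`, and `error_rate` are hypothetical and hand-written, not a library API. A 1-NN model memorises its training data, so its training error is 0. That is exactly why error is measured on a held-out test set.

```python
import random


def split_train_test(data, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, x):
    """Predict the label of the training point closest to x (1-nearest neighbour)."""
    _, nearest_label = min(train, key=lambda point: abs(point[0] - x))
    return nearest_label


def error_rate(train, dataset):
    """Fraction of points in dataset whose label the 1-NN model predicts wrongly."""
    wrong = sum(predict_1nn(train, x) != label for x, label in dataset)
    return wrong / len(dataset)


# Toy data (an assumption for illustration): label is 1 when x > 5,
# with roughly 15% of labels flipped to simulate noise.
rng = random.Random(42)
data = []
for _ in range(100):
    x = rng.uniform(0, 10)
    label = int(x > 5)
    if rng.random() < 0.15:
        label = 1 - label
    data.append((x, label))

train, test = split_train_test(data)
train_error = error_rate(train, train)  # 0.0: each training point is its own nearest neighbour
test_error = error_rate(train, test)    # typically well above 0: a more honest estimate
print("training error:", train_error)
print("test error:", test_error)
```

The gap between a perfect training error and a worse test error is a small-scale picture of overfitting.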
The course provides a basic grounding in concepts such as training and test sets, overfitting, and error rates. It also introduces a range of model-based and algorithmic machine learning methods, including regression, classification trees, Naive Bayes, and random forests. It then walks through the complete process of building prediction functions: data collection, feature creation, algorithms, and evaluation.

Developing Data Products
A data product is the production output of a statistical analysis. Data products automate complex analysis tasks, or use technology to expand the utility of a data-informed model, algorithm, or inference. The course covers the basics of creating data products using Shiny, R packages, and interactive graphics. It focuses on the statistical fundamentals of creating a data product that tells a story about data to a mass audience.

Data Science Capstone
The capstone project class allows students to create a usable, public data product that they can show to potential employers. Projects are drawn from real-world problems and are conducted with industry, government, and academic partners.

Concentration in Data Science
1. Mathematics and applied mathematics
2. Applied statistics/data analysis
3. Solid programming skills (R, Python, Julia, SQL)
4. Data mining
5. Database storage and management
6. Machine learning and discovery

3D Model Creation With Autodesk Fusion 360
1. Conceptual design
2. 2D sketches to 3D solid model
3. Generative wing design
4. Photorealistic renderings

Thank You…