Data Science Programming All-in-One For Dummies
By John Paul Mueller and Luca Massaron
About this ebook
Your logical, linear guide to the fundamentals of data science programming
Data science is exploding—in a good way—with a forecast of 1.7 megabytes of new information created every second for each human being on the planet by 2020 and 11.5 million job openings by 2026. It clearly pays dividends to be in the know. This friendly guide charts a path through the fundamentals of data science and then delves into the actual work: linear regression, logistic regression, machine learning, neural networks, recommender engines, and cross-validation of models.
Data Science Programming All-in-One For Dummies brings together the key data science, machine learning, and deep learning programming languages: Python and R. It helps you decide which programming language is best for specific data science needs, and it gives you guidelines for building your own projects to solve problems in real time.
- Get grounded: the ideal start for new data professionals
- What lies ahead: learn about specific areas that data is transforming
- Be meaningful: find out how to tell your data story
- See clearly: pick up the art of visualization
Whether you’re a beginning student or already mid-career, get your copy now and add even more meaning to your life—and everyone else’s!
John Paul Mueller
John Paul Mueller is a technical editor and freelance author who has written on topics ranging from database management to heads-down programming, from networking to artificial intelligence. He is the author of Start Here!™ Learn Microsoft Visual C#® 2010.
Book preview
Data Science Programming All-in-One For Dummies - John Paul Mueller
Introduction
Data science is a term that the media has chosen to minimize, obfuscate, and sometimes misuse. It involves a lot more than just data and the science of working with data. Today, the world uses data science in all sorts of ways that you might not know about, which is why you need Data Science Programming All-in-One For Dummies.
In the book, you start with both the data and the science of manipulating it, but then you go much further. In addition to seeing how to perform a wide range of analysis, you also delve into making recommendations, classifying real-world objects, analyzing audio, and even creating art.
However, you don’t just learn about amazing new technologies and how to perform common tasks. This book also dispels myths created by people who wish data science were something other than it really is or who don’t understand it at all. A great deal of misinformation swirls around the world today as the media seeks to sensationalize, anthropomorphize, and emotionalize technologies that are, in fact, quite mundane. It’s hard to know what to believe. You find reports that robots are on the cusp of becoming sentient and that the giant tech companies can discover your innermost thoughts simply by reviewing your record of purchases. With this book, you can replace disinformation with solid facts, and you can use those facts to create a strategy for performing data science development tasks.
About This Book
You might find that this book starts off a little slowly because most people don’t have a good grasp on getting a system prepared for data science use. Book 1 helps you configure your system. The book uses Jupyter Notebook as an Integrated Development Environment (IDE) for both Python and R. That way, if you choose to view the examples in both languages, you use the same IDE to do it. Jupyter Notebook also relies on the literate programming strategy first proposed by Donald Knuth (see http://www.literateprogramming.com/) to make your coding efforts significantly easier and more focused on the data. In addition, in contrast to other environments, you don’t actually write entire applications before you see something; you write code and focus on the results of just that code block as part of a whole application.
After you have a development environment installed and ready to use, you can start working with data in all its myriad forms in Book 2. This book covers a great many of these forms — everything from in-memory datasets to those found on large websites. In addition, you see a number of data formats ranging from flat files to Relational Database Management Systems (RDBMSs) and Not Only SQL (NoSQL) databases.
Of course, manipulating data is worthwhile only if you can do something useful with it. Book 3 discusses common sorts of analysis, such as linear and logistic regression, Bayes’ Theorem, and K-Nearest Neighbors (KNN).
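To give you an early taste of the Book 3 material, here’s a minimal linear regression sketch written with scikit-learn. The numbers are invented for illustration; the fitted model should recover a slope near 2 and an intercept near 0:

# A minimal linear regression sketch on a tiny, invented dataset.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # one feature per observation
y = np.array([2.1, 4.0, 6.2, 7.9])          # responses that roughly follow y = 2x

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)        # slope near 2, intercept near 0
print(model.predict([[5.0]]))               # predict the response for a new input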
Most data science books stop at this point. In this book, however, you discover AI, machine learning, and deep learning techniques to get more out of your data than you might have thought possible. This exciting part of the book, Book 4, represents the cutting edge of analysis. You use huge datasets to discover important information about large groups of people that will help you improve their health or sell them products.
Performing analysis may be interesting, but analysis is only a step along the path. Book 5 shows you how to put your analysis to use in recommender systems, to classify objects, work with nontextual data like music and video, and display the results of an analysis in a form that everyone can appreciate.
The final minibook, Book 6, offers something you won’t find in many places, not even online. You discover how to detect and fix problems with your data, the logic used to interpret the data, and the code used to perform tasks such as analysis. By the time you complete Book 6, you’ll know much more about how to ensure that the results you get are actually the results you need and want.
To make absorbing the concepts easy, this book uses the following conventions:
Text that you’re meant to type just as it appears in the book is in bold. The exception is when you’re working through a step list: Because each step is bold, the text to type is not bold.
When you see words in italics as part of a typing sequence, you need to replace that value with something that works for you. For example, if you see "Type Your Name and press Enter," you need to replace Your Name with your actual name.
Web addresses and programming code appear in monofont. If you're reading a digital version of this book on a device connected to the Internet, you can click or tap the web address to visit that website, like this: https://www.dummies.com.
When you need to type command sequences, you see them separated by a special arrow, like this: File ⇒ New File. In this example, you go to the File menu first and then select the New File entry on that menu.
Foolish Assumptions
You might find it difficult to believe that we’ve assumed anything about you — after all, we haven’t even met you yet! Although most assumptions are indeed foolish, we made these assumptions to provide a starting point for the book.
You need to be familiar with the platform you want to use because the book doesn’t offer any guidance in this regard. (Book 1, Chapter 3 does, however, provide Anaconda installation instructions for both Python and R, and Book 1, Chapter 5 helps you install the TensorFlow and Keras frameworks used for this book.) To give you the maximum information about Python concerning how it applies to deep learning, this book doesn’t discuss any platform-specific issues. You see the R version of the Python coding examples in the downloadable source, along with R-specific notes on usage and development. You really do need to know how to install applications, use applications, and generally work with your chosen platform before you begin working with this book.
You must know how to work with Python or R. You can find a wealth of Python tutorials online (see https://www.w3schools.com/python/ and https://www.tutorialspoint.com/python/ as examples). R, likewise, provides a wealth of online tutorials (see https://www.tutorialspoint.com/r/index.htm, https://docs.anaconda.com/anaconda/navigator/tutorials/r-lang/, and https://www.statmethods.net/r-tutorial/index.html as examples).
This book isn’t a math primer. Yes, you see many examples of complex math, but the emphasis is on helping you use Python or R to perform data science development tasks rather than teaching math theory. We include some examples that also discuss the use of technologies such as data management (see Book 2), statistical analysis (see Book 3), AI, machine learning, deep learning (see Book 4), practical data science application (see Book 5), and troubleshooting both data and code (see Book 6). Book 1, Chapters 1 and 2 give you a better understanding of precisely what you need to know to use this book successfully. You also use a considerable number of libraries in writing code for this book. Book 1, Chapter 4 discusses library use and suggests other libraries that you might want to try.
This book also assumes that you can access items on the Internet. Sprinkled throughout are numerous references to online material that will enhance your learning experience. However, these added sources are useful only if you actually find and use them.
Icons Used in This Book
As you read this book, you see icons in the margins that indicate material of interest (or not, as the case may be). This section briefly describes each icon in this book.
Tip Tips are nice because they help you save time or perform some task without a lot of extra work. The tips in this book are time-saving techniques or pointers to resources that you should try so that you can get the maximum benefit from Python or R, or from performing deep learning–related tasks. (Note that R developers will also find copious notes in the source code files for issues that differ significantly from Python.)
Warning We don’t want to sound like angry parents or some kind of maniacs, but you should avoid doing anything that’s marked with a Warning icon. Otherwise, you might find that your application fails to work as expected, you get incorrect answers from seemingly bulletproof algorithms, or (in the worst-case scenario) you lose data.
Technical Stuff Whenever you see this icon, think advanced tip or technique. You might find these tidbits of useful information just too boring for words, or they could contain the solution you need to get a program running. Skip these bits of information whenever you like.
Remember If you don’t get anything else out of a particular chapter or section, remember the material marked by this icon. This text usually contains an essential process or a bit of information that you must know to work with Python or R, or to perform deep learning–related tasks successfully. (Note that the R source code files contain a great deal of text that gives essential details for working with R when R differs considerably from Python.)
Beyond the Book
This book isn’t the end of your Python or R data science development experience — it’s really just the beginning. We provide online content to make this book more flexible and better able to meet your needs. That way, as we receive email from you, we can address questions and tell you how updates to Python, R, or their associated add-ons affect book content. In fact, you gain access to all these cool additions:
Cheat sheet: You remember using crib notes in school to make a better mark on a test, don’t you? You do? Well, a cheat sheet is sort of like that. It provides you with some special notes about tasks that you can do with Python and R with regard to data science development that not every other person knows. You can find the cheat sheet by going to www.dummies.com, searching this book's title, and scrolling down the page that appears. The cheat sheet contains really neat information, such as the most common data errors that cause people problems with working in the data science field.
Updates: Sometimes changes happen. For example, we might not have seen an upcoming change when we looked into our crystal ball during the writing of this book. In the past, this possibility simply meant that the book became outdated and less useful, but you can now find updates to the book, if we have any, by searching this book's title at www.dummies.com.
In addition to these updates, check out the blog posts with answers to reader questions and demonstrations of useful, book-related techniques at http://blog.johnmuellerbooks.com/.
Companion files: Hey! Who really wants to type all the code in the book and reconstruct all those neural networks manually? Most readers prefer to spend their time actually working with data and seeing the interesting things they can do, rather than typing. Fortunately for you, the examples used in the book are available for download, so all you need to do is read the book to learn Python or R data science programming techniques. You can find these files at www.dummies.com. Search this book's title, and on the page that appears, scroll down to the image of the book cover and click it. Then click the More about This Book button and on the page that opens, go to the Downloads tab.
Where to Go from Here
It’s time to start your Python or R for data science programming adventure! If you’re completely new to Python or R and its use for data science tasks, you should start with Book 1, Chapter 1. Progressing through the book at a pace that allows you to absorb as much of the material as possible makes it feasible for you to gain insights that you might not otherwise gain if you read the chapters in a random order. However, the book is designed to allow you to read the material in any order desired.
If you’re a novice who’s in an absolute rush to get going with Python or R for data science programming as quickly as possible, you can skip to Book 1, Chapter 3 with the understanding that you may find some topics a bit confusing later. Skipping to Book 1, Chapter 5 is okay if you already have Anaconda (the programming product used in the book) installed with the appropriate language (Python or R as you desire), but be sure to at least skim Chapter 3 so that you know what assumptions we made when writing this book.
This book relies on a combination of TensorFlow and Keras to perform deep learning tasks. Even if you’re an advanced reader who wants to perform deep learning tasks, you need to go to Book 1, Chapter 5 to discover how to configure the environment used for this book. You must configure the environment according to instructions or you’re likely to experience failures when you try to run the code. However, this issue applies only to deep learning. This book has a great deal to offer in other areas, such as data manipulation and statistical analysis.
Book 1
Defining Data Science
Contents at a Glance
Chapter 1: Considering the History and Uses of Data Science
Considering the Elements of Data Science
Defining the Role of Data in the World
Creating the Data Science Pipeline
Comparing Different Languages Used for Data Science
Learning to Perform Data Science Tasks Fast
Chapter 2: Placing Data Science within the Realm of AI
Seeing the Data to Data Science Relationship
Defining the Levels of AI
Creating a Pipeline from Data to AI
Chapter 3: Creating a Data Science Lab of Your Own
Considering the Analysis Platform Options
Choosing a Development Language
Obtaining and Using Python
Obtaining and Using R
Presenting Frameworks
Accessing the Downloadable Code
Chapter 4: Considering Additional Packages and Libraries You Might Want
Considering the Uses for Third-Party Code
Obtaining Useful Python Packages
Locating Useful R Libraries
Chapter 5: Leveraging a Deep Learning Framework
Understanding Deep Learning Framework Usage
Working with Low-End Frameworks
Understanding TensorFlow
Chapter 1
Considering the History and Uses of Data Science
IN THIS CHAPTER
Understanding data science history and uses
Considering the flow of data in data science
Working with various languages in data science
Performing data science tasks quickly
The burgeoning uses for data in the world today, along with the explosion of data sources, create a demand for people who have special skills to obtain, manage, and analyze information for the benefit of everyone. The data scientist develops and hones these special skills to perform such tasks on multiple levels, as described in the first two sections of this chapter.
Data needs to be funneled into acceptable forms that allow data scientists to perform their tasks. Even though the precise data flow varies, you can generalize it to a degree. The third section of the chapter gives you an overview of how data flow occurs.
As with anyone engaged in computer work today, a data scientist employs various programming languages to express the manipulation of data in a repeatable manner. The languages that a data scientist uses, however, focus on outputs expected from given inputs, rather than on the low-level control and precise procedures that a computer scientist would use. Because a data scientist may lack a formal programming education, the languages tend to focus on declarative strategies, with the data scientist expressing a desired outcome rather than devising a specific procedure. The fourth section of the chapter discusses various languages used by data scientists, with an emphasis on Python and R.
The final section of the chapter provides a very quick overview of getting tasks done quickly. Optimization without loss of precision is an incredibly difficult task and you see it covered a number of times in this book, but this introduction is enough to get you started. The overall goal of this first chapter is to describe data science and explain how a data scientist uses algorithms, statistics, data extraction, data manipulation, and a slew of other technologies to employ it as part of an analysis.
Remember You don’t have to type the source code for this chapter manually (or at all, really, given that you use it only to obtain an understanding of the data flow process). In fact, using the downloadable source is a lot easier. The source code for this chapter appears in the DSPD_0101_Quick_Overview.ipynb source code file for Python. See the Introduction for details on how to find these source files.
Considering the Elements of Data Science
At one point, the world viewed anyone working with statistics as a sort of accountant or perhaps a mad scientist. Many people consider statistics and the analysis of data boring. However, data science is one of those occupations in which the more you learn, the more you want to learn. Answering one question often spawns more questions that are even more interesting than the one you just answered. What makes data science so sexy, though, is that you see it everywhere, used in an almost infinite number of ways. The following sections give you more details on why data science is such an amazing field of study.
Considering the emergence of data science
Data science is a relatively new term. William S. Cleveland coined the term in 2001 as part of a paper entitled Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.
It wasn't until a year later that the International Council for Science actually recognized data science and created a committee for it. Columbia University got into the act in 2003 by beginning publication of the Journal of Data Science.
Remember However, the mathematical basis behind data science is centuries old because data science is essentially a method of viewing and analyzing statistics and probability. The first essential use of statistics as a term comes in 1749, but statistics are certainly much older than that. People have used statistics to recognize patterns for thousands of years. For example, the historian Thucydides (in his History of the Peloponnesian War) describes how the Athenians calculated the height of the wall of Platea in the fifth century BC by counting bricks in an unplastered section of the wall. Because the count needed to be accurate, the Athenians took the average of the counts made by several soldiers.
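Just for fun, here’s what that ancient calculation might look like in modern Python. The counts and the brick height are invented, but the error-reducing trick of averaging several independent counts is the same one the Athenians used:

# A toy reconstruction of the Platea estimate; every number here is made up.
brick_counts = [112, 115, 113, 114, 112]       # counts from several soldiers
brick_height_m = 0.09                          # assumed height of one brick course

average_count = sum(brick_counts) / len(brick_counts)  # averaging smooths out miscounts
print(f"Estimated wall height: {average_count * brick_height_m:.2f} m")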
The process of quantifying and understanding statistics is relatively new, but the science itself is quite old. An early attempt to begin documenting the importance of statistics appears in the ninth century, when Al-Kindi wrote Manuscript on Deciphering Cryptographic Messages. In this paper, Al-Kindi describes how to use a combination of statistics and frequency analysis to decipher encrypted messages. Even in the beginning, statistics saw use in the practical application of science for tasks that seemed virtually impossible to complete. Data science continues this process, and to some people it might actually seem like magic.
Outlining the core competencies of a data scientist
As is true of anyone performing most complex trades today, the data scientist requires knowledge of a broad range of skills to perform the required tasks. In fact, so many different skills are required that data scientists often work in teams. Someone who is good at gathering data might team up with an analyst and someone gifted in presenting information. Finding a single person who possesses all the required skills would be hard. With this in mind, the following list describes areas in which a data scientist can excel (with more competencies being better):
Data capture: It doesn’t matter what sort of math skills you have if you can’t obtain data to analyze in the first place. The act of capturing data begins by managing a data source using database-management skills. However, raw data isn’t particularly useful in many situations; you must also understand the data domain so that you can look at the data and begin formulating the sorts of questions to ask. Finally, you must have data-modeling skills so that you understand how the data is connected and whether the data is structured.
Analysis: After you have data to work with and understand the complexities of that data, you can begin to perform an analysis on it. You perform some analysis using basic statistical tool skills, much like those that just about everyone learns in college. However, the use of specialized math tricks and algorithms can make patterns in the data more obvious or help you draw conclusions that you can’t draw by reviewing the data alone.
Presentation: Most people don’t understand numbers well. They can’t see the patterns that the data scientist sees. Providing a graphical presentation of these patterns is important to help others visualize what the numbers mean and how to apply them in a meaningful way. More important, the presentation must tell a specific story so that the impact of the data isn’t lost.
Linking data science, big data, and AI
Interestingly enough, the act of moving data around so that someone can perform analysis on it is a specialty called Extract, Transform, and Load (ETL). The ETL specialist uses programming languages such as Python to extract the data from a number of sources. Corporations tend not to keep data in one easily accessed location, so finding the data required to perform analysis takes time. After the ETL specialist finds the data, a programming language or other tool transforms it into a common format for analysis purposes. The loading process takes many forms, but this book relies on Python to perform the task. In a large, real-world operation, you might find yourself using tools such as Informatica, MS SSIS, or Teradata to perform the task.
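Here’s a minimal sketch of that ETL flow using pandas. The file names, column names, and the cents-to-dollars mismatch are all invented for illustration:

# A minimal ETL sketch; every file and column name here is an assumption.
import pandas as pd

# Extract: pull raw records from two differently formatted sources.
web_orders = pd.read_csv("web_orders.csv")        # amounts in dollars
store_orders = pd.read_json("store_orders.json")  # amounts in cents

# Transform: normalize both sources to a common schema.
store_orders["amount"] = store_orders["amount_cents"] / 100
combined = pd.concat(
    [web_orders[["customer_id", "amount"]],
     store_orders[["customer_id", "amount"]]],
    ignore_index=True,
)

# Load: write the unified dataset somewhere the analysis step can reach it.
combined.to_csv("all_orders.csv", index=False)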
Remember Data science isn’t necessarily a means to an end; it may instead be a step along the way. As a data scientist works through various datasets and finds interesting facts, these facts may act as input for other sorts of analysis and AI applications. For example, consider that your shopping habits often suggest what books you might like or where you might like to go for a vacation. Shopping or other habits can also help others understand other, sometimes less benign, activities as well. Machine Learning For Dummies and Artificial Intelligence For Dummies, both by John Paul Mueller and Luca Massaron (Wiley), help you understand these other uses of data science. For now, consider the fact that what you learn in this book can have a definite effect on a career path that will go many other places.
Understanding the role of programming
A data scientist may need to know several programming languages in order to achieve specific goals. For example, you may need SQL knowledge to extract data from relational databases. Python can help you perform data loading, transformation, and analysis tasks. However, you might choose a product such as MATLAB (which has its own programming language) or PowerPoint (which relies on VBA) to present the information to others. (If you’re interested in seeing how MATLAB compares to the use of Python, you can get the book MATLAB For Dummies, by John Paul Mueller [Wiley].) The immense datasets that data scientists rely on often require multiple levels of redundant processing to transform into useful processed data. Manually performing these tasks is time consuming and error prone, so programming presents the best method for achieving the goal of a coherent, usable data source.
Given the number of products that most data scientists use, sticking to just one programming language may not be possible. Yes, Python can load data, transform it, analyze it, and even present it to the end user, but the process works only when the language provides the required functionality. You may have to choose other languages to fill out your toolkit. The languages you choose depend on a number of criteria. Here are some criteria you should consider:
How you intend to use data science in your code (you have a number of tasks to consider, such as data analysis, classification, and regression)
Your familiarity with the language
The need to interact with other languages
The availability of tools to enhance the development environment
The availability of APIs and libraries to make performing tasks easier
Defining the Role of Data in the World
This section of the chapter is too short. It can’t even begin to describe the ways in which data will affect you in the future. Consider the following subsections as offering tantalizing tidbits — appetizers that can whet your appetite for exploring the world of data and data science further. The applications listed in these sections are already common in some settings. You probably used at least one of them today, and quite likely more than just one. After reading the following sections, you might want to take the time to consider all the ways in which data currently affects your life. The use of data to perform amazing feats is really just the beginning. Humanity is on the cusp of an event that will rival the Industrial Revolution (see https://www.history.com/topics/industrial-revolution/industrial-revolution), and the use of data (and its associated technologies, such as AI, machine learning, and deep learning) is actually quite immature at this point.
Enticing people to buy products
Demographics, those vital or social statistics that group people by certain characteristics, have always been part art and part science. You can find any number of articles about getting your computer to generate demographics for clients (or potential clients). The use of demographics is wide ranging, but you see them used for things like predicting which product a particular group will buy (versus that of the competition). Demographics are an important means of categorizing people and then predicting some action on their part based on their group associations. Here are the methods that you often see cited for AIs when gathering demographics:
Historical: Based on previous actions, an AI generalizes which actions you might perform in the future.
Current activity: Based on the action you perform now and perhaps other characteristics, such as gender, a computer predicts your next action.
Characteristics: Based on the properties that define you, such as gender, age, and area where you live, a computer predicts the choices you are likely to make.
Warning You can find articles about AI’s predictive capabilities that seem almost too good to be true. For example, the article at https://medium.com/@demografy/artificial-intelligence-can-now-predict-demographic-characteristics-knowing-only-your-name-6749436a6bd3 says that AI can now predict your demographics based solely on your name. The company in that article, Demografy (https://demografy.com/), claims to provide gender, age, and cultural affinity based solely on name. Even though the site claims that it’s 90 to 95 percent accurate (see the Is Demografy Accurate answer at https://demografy.com/faq for details), this statistic is unlikely because some names are gender ambiguous, such as Renee, and others are assigned to one gender in some countries and another gender in others. In fact, the answer on the Demografy site seems to acknowledge this issue by saying that the outcome "heavily depends on your particular list and may show considerably different results than these averages." Yes, demographic prediction can work, but exercise care before believing everything that these sites tell you.
If you want to experiment with demographic prediction, you can find a number of APIs online. For example, the DeepAI API at https://deepai.org/machine-learning-model/demographic-recognition promises to help you predict age, gender, and cultural background based on a person’s appearance in a video. Each of the online APIs specializes, though, so you need to choose the API with an eye toward the kind of input data you can provide.
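You can also sketch the Characteristics method locally, without any online API. Here’s a minimal, completely invented example using scikit-learn; a real demographic model would use far more features, far more data, and careful validation:

# Predict a likely purchase from invented demographic features.
from sklearn.tree import DecisionTreeClassifier

# Features: [age, region code]; labels: the product each person bought.
X = [[23, 0], [31, 1], [45, 1], [52, 2], [36, 0], [61, 2]]
y = ["sneakers", "laptop", "laptop", "garden kit", "sneakers", "garden kit"]

model = DecisionTreeClassifier().fit(X, y)
print(model.predict([[29, 0]]))  # a guess for a new 29-year-old in region 0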
Keeping people safer
You already have a good idea of how data might affect you in ways that keep you safer. For example, statistics help car designers create new designs that provide greater safety for the occupant and sometimes other parties as well. Data also figures into calculations for things like
Medications
Medical procedures
Safety equipment
Safety procedures
How long to keep the crosswalk signs lit
Safety goes much further, though. For example, people have been trying to predict natural disasters for as long as there have been people and natural disasters. No one wants to be part of an earthquake, tornado, volcanic eruption, or any other natural disaster. Being able to get away quickly is the prime consideration in such cases, given that humans can’t control their environment well enough yet to prevent any natural disaster.
Data managed by deep learning provides the means to look for extremely subtle patterns that boggle the minds of humans. These patterns can help predict a natural catastrophe, according to the article on Google’s solution at http://www.digitaljournal.com/tech-and-science/technology/google-to-use-ai-to-predict-natural-disasters/article/533026. The fact that the software can predict any disaster at all is simply amazing. However, the article at http://theconversation.com/ai-could-help-us-manage-natural-disasters-but-only-to-an-extent-90777 warns that relying on such software exclusively would be a mistake. Overreliance on technology is a constant theme throughout this book, so don’t be surprised that deep learning is less than perfect in predicting natural catastrophes as well.
Creating new technologies
New technologies can cover a very wide range of applications. For example, you find new technologies for making factories safer and more efficient all the time. Space travel requires an inordinate number of new technologies. Just consider how the data collected in the past affects things like smart phone use and the manner in which you drive your car.
However, a new technology can take an interesting twist, and you should look for these applications as well. You probably have black-and-white videos or pictures of family members or special events that you’d love to see in color. Color consists of three elements: hue (the actual color); value (the darkness or lightness of the color); and saturation (the intensity of the color). You can read more about these elements at http://learn.leighcotnoir.com/artspeak/elements-color/hue-value-saturation/. Oddly enough, many artists are color-blind and make strong use of color value in their creations (read https://www.nytimes.com/2017/12/23/books/a-colorblind-artist-illustrator-childrens-books.html as one of many examples). So having hue missing (the element that black-and-white art lacks) isn’t the end of the world. Quite the contrary: Some artists view it as an advantage (see https://www.artsy.net/article/artsy-editorial-the-advantages-of-being-a-colorblind-artist for details).
When viewing something in black and white, you see value and saturation but not hue. Colorization is the process of adding the hue back in. Artists generally perform this process using a painstaking selection of individual colors, as described at https://fstoppers.com/video/how-amazing-colorization-black-and-white-photos-are-done-5384 and https://www.diyphotography.net/know-colors-add-colorizing-black-white-photos/. However, AI has automated this process using Convolutional Neural Networks (CNNs), as described at https://emerj.com/ai-future-outlook/ai-is-colorizing-and-beautifying-the-world/.
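You can see the role of these color elements yourself by going the other direction and stripping color out of a photo. Here’s a minimal sketch using the Pillow library (the file names are invented). Zeroing the saturation channel makes hue irrelevant, leaving only value, which is precisely the information a black-and-white photo preserves:

# Strip the color from a photo by zeroing saturation in HSV space.
from PIL import Image

img = Image.open("family_photo.jpg").convert("HSV")
h, s, v = img.split()

# With saturation at zero, hue no longer matters; only value remains.
no_color = Image.merge("HSV", (h, s.point(lambda _: 0), v)).convert("RGB")
no_color.save("family_photo_bw.jpg")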
Remember The easiest way to use CNN for colorization is to find a library to help you. The Algorithmia site at https://demos.algorithmia.com/colorize-photos/ offers such a library and shows some example code. You can also try the application by pasting a URL into the supplied field. The article at https://petapixel.com/2016/07/14/app-magically-turns-bw-photos-color-ones/ describes just how well this application works. It’s absolutely amazing!
Performing analysis for research
Most people think that research focuses only on issues like health, consumerism, or improving efficiency. However, research takes a great many other forms as well, many of which you’ll never even hear about, such as figuring out how people move in order to keep them safer. Think about a manikin for a moment. You can pose the manikin in various ways to see how that pose affects an environment, such as in car crash research. However, manikins are simply snapshots in a process that happens in real time. In order to see how people interact with their environment, you must pose the people in a fluid, real-time manner using a strategy called person poses.
Person poses don’t tell you who is in a video stream, but rather what elements of a person are in the video stream. For example, using a person pose can tell you whether the person’s elbow appears in the video and where it appears. The article at https://medium.com/tensorflow/real-time-human-pose-estimation-in-the-browser-with-tensorflow-js-7dd0bc881cd5 tells you more about how this whole visualization technique works. In fact, you can see how the system works through a short animation of one person in the first case and three people in the second case.
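The output of such a system is just data that you can reason about in ordinary code. Here’s a minimal sketch of consuming PoseNet-style keypoints; the positions, scores, and the 0.5 confidence threshold are invented for illustration:

# Decide which body parts appear in the frame, given pose-estimator output.
keypoints = [
    {"part": "leftElbow",  "position": (212, 340), "score": 0.91},
    {"part": "rightElbow", "position": (305, 338), "score": 0.42},
]

MIN_SCORE = 0.5  # below this confidence, treat the part as not visible
for kp in keypoints:
    if kp["score"] >= MIN_SCORE:
        print(f"{kp['part']} appears at {kp['position']}")
    else:
        print(f"{kp['part']} isn't confidently in the frame")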
Person poses can have all sorts of useful purposes. For example, you might use a person pose to help people improve their form for various kinds of sports — everything from golf to bowling. A person pose could also make new sorts of video games possible. Imagine being able to track a person’s position for a game without the usual assortment of cumbersome gear. Theoretically, you could use person poses to perform crime-scene analysis or to determine the possibility of a person committing a crime.
Another interesting application of pose detection is for medical and rehabilitation purposes. Software powered by data managed by deep learning techniques could tell you whether you’re doing your exercises correctly and track your improvements. An application of this sort could support the work of a professional rehabilitator by taking care of you when you aren’t in a medical facility (an activity called telerehabilitation; see https://matrc.org/telerehabilitation-telepractice for details).
Remember Fortunately, you can at least start working with person poses today using the tfjs-models (PoseNet) library at https://github.com/tensorflow/tfjs-models/tree/master/posenet. You can see it in action with a webcam, complete with source code, at https://ml5js.org/docs/posenet-webcam. The example takes a while to load, so you need to be patient.
Providing art and entertainment
Book 5, Chapter 4 provides you with some good ideas on how deep learning can use the content of a real-world picture and an existing master painter (live or dead) for style to create a combination of the two. In fact, some pieces of art generated using this approach are commanding high prices on the auction block. You can find all sorts of articles on this particular kind of art generation, such as the Wired article at https://www.wired.com/story/we-made-artificial-intelligence-art-so-can-you/.
However, even though pictures are nice for hanging on the wall, you might want to produce other kinds of art. For example, you can create a 3-D version of your picture using products like Smoothie 3-D. The articles at https://styly.cc/tips/smoothie-3d/ and https://3dprint.com/38467/smoothie-3d-software/ describe how this software works. It’s not the same as creating a sculpture; rather, you use a 3-D printer to build a 3-D version of your picture. The article at https://thenextweb.com/artificial-intelligence/2018/03/08/try-this-ai-experiment-that-converts-2d-images-to-3d/ offers an experiment that you can perform to see how the process works.
Remember The output of an AI doesn’t need to consist of something visual, either. For example, deep learning enables you to create music based on the content of a picture, as described at https://www.cnet.com/news/baidu-ai-creates-original-music-by-looking-at-pictures-china-google/. This form of art makes the method used by AI clearer. The AI transforms content that it doesn’t understand from one form to another. As humans, we see and understand the transformation, but all the computer sees are numbers to process using clever algorithms created by other humans.
Making life more interesting in other ways
Data is part of your life. You really can’t perform too many activities anymore that don’t have data attached to them in some way. For example, consider gardening. You might think that digging in the earth, planting seeds, watering, and harvesting fruit has nothing to do with data, yet the seeds you use likely rely on research conducted as the result of gathering data. The tools you use to dig are now ergonomically designed based on human research studies. The weather reports you use to determine whether to water or not rely on data. The clothes you wear, the shoes you employ to work safely, and even the manner in which you work are all influenced by data. Now, consider that gardening is a relatively nontechnical task that people have performed for thousands of years, and you get a good feel for just how much data affects your daily life.
Creating the Data Science Pipeline
Data science is partly art and partly engineering. Recognizing patterns in data, considering what questions to ask, and determining which algorithms work best are all part of the art side of data science. However, to make the art part of data science realizable, the engineering part relies on a specific process to achieve specific goals. This process is the data science pipeline, which requires the data scientist to follow particular steps in the preparation, analysis, and presentation of the data. The following sections help you understand the data science pipeline better so that you can understand how the book employs it during the presentation of examples.
Preparing the data
The data that you access from various sources doesn’t come in an easily packaged form, ready for analysis — quite the contrary. The raw data not only may vary substantially in format, but you may also need to transform it to make all the data sources cohesive and amenable to analysis. Transformation may require changing data types, the order in which data appears, and even the creation of data entries based on the information provided by existing entries.
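Here’s a minimal pandas sketch of the kinds of transformations this step involves. The column names and values are invented:

# Change types, reorder rows, and derive a new entry from existing ones.
import pandas as pd

raw = pd.DataFrame({
    "date": ["2019-03-02", "2019-01-15"],
    "price": ["19.99", "5.49"],
})

raw["date"] = pd.to_datetime(raw["date"])   # change data types
raw["price"] = raw["price"].astype(float)
raw = raw.sort_values("date")               # change the order in which data appears
raw["is_big_ticket"] = raw["price"] > 10    # create an entry from existing entries
print(raw)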
Performing exploratory data analysis
The math behind data analysis relies on engineering principles in that the results are provable and consistent. However, data science provides access to a wealth of statistical methods and algorithms that help you discover patterns in the data. A single approach doesn’t ordinarily do the trick. You typically use an iterative process to rework the data from a number of perspectives. The use of trial and error is part of the data science art.
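Here’s a minimal sketch of one such iteration using pandas; the data is invented. Each pass suggests the next question to ask, which is where the trial and error comes in:

# One exploratory pass over a tiny, invented dataset.
import pandas as pd

df = pd.DataFrame({"age": [23, 31, 45, 52, 36],
                   "spend": [120, 340, 310, 95, 280]})

print(df.describe())  # first look: ranges, means, and spread
print(df.corr())      # second look: do age and spend move together?
# A weak correlation here might prompt a rework, such as binning ages,
# followed by another pass from that new perspective.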
Learning from data
As you iterate through various statistical analysis methods and apply algorithms to detect patterns, you begin learning from the data. The data might not tell the story that you originally thought it would, or it might have many stories to tell. Discovery is part of being a data scientist. In fact, it’s the fun part of data science because you can’t ever know in advance precisely what the data will reveal to you.
Remember Of course, the imprecise nature of data and the finding of seemingly random patterns in it means keeping an open mind. If you have preconceived ideas of what the data contains, you won’t find the information it actually does contain. You miss the discovery phase of the process, which translates into lost opportunities for both you and the people who depend on you.
Visualizing
Visualization means seeing the patterns in the data and then being able to react to those patterns. It also means being able to see when data is not part of the pattern. Think of yourself as a data sculptor — removing the data that lies outside the patterns (the outliers) so that others can see the masterpiece of information beneath. Yes, you can see the masterpiece, but until others can see it, too, it remains in your vision alone.
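Here’s a minimal sketch of that sculpting idea using a z-score cutoff with NumPy. The data and the threshold of 2 are invented, and in practice you should be able to justify every point you set aside:

# Separate points that lie far from the pattern before visualizing.
import numpy as np

spend = np.array([120, 340, 310, 95, 280, 9500])  # one wild outlier

z = (spend - spend.mean()) / spend.std()
pattern = spend[np.abs(z) < 2]       # the masterpiece beneath
outliers = spend[np.abs(z) >= 2]     # set aside, not silently discarded
print("show:", pattern, "set aside:", outliers)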
Obtaining insights and data products
The data scientist may seem to simply be looking for unique methods of viewing data. However, the process doesn’t end until you have a clear understanding of what the data means. The insights you obtain from manipulating and analyzing the data help you to perform real-world tasks. For example, you can use the results of an analysis to make a business decision.
In some cases, the result of an analysis creates an automated response. For example, when a robot views a series of pixels obtained from a camera, the pixels that form an object have special meaning, and the robot’s programming may dictate some sort of interaction with that object. However, until the data scientist builds an application that can load, analyze, and visualize the pixels from the camera, the robot doesn’t see anything at all.
Comparing Different Languages Used for Data Science
None of the existing programming languages in the world can do everything. One endeavor to create such a language, Ada, has achieved only limited success because the language is incredibly difficult to learn (see https://www.nap.edu/read/5463/chapter/3 and https://news.ycombinator.com/item?id=7824570 for details). The problem is that if you make a language robust enough to do everything, it’s too complex to do anything. Consequently, as a data scientist, you likely need exposure to a number of languages, each of which has a forte in a particular aspect of data science development. The following sections help you to better understand the languages used for data science, with a special emphasis on Python and R, the languages supported by this book.
Obtaining an overview of data science languages
Many different programming languages exist, and most were designed to perform tasks in a certain way or even make a particular profession’s work easier to do. Choosing the correct tool makes your life easier. It’s akin to using a hammer instead of a screwdriver to drive a screw. Yes, the hammer works, but the screwdriver is much easier to use and definitely does a better job. Data scientists usually use only a few languages because they make working with data easier. With this idea in mind, here are the top languages for data science work in order of preference:
Python (general purpose): Many data scientists prefer to use Python because it provides a wealth of libraries, such as NumPy, SciPy, MatPlotLib, pandas, and Scikit-learn, to make data science tasks significantly easier. Python is also a precise language that makes using multiprocessing on large datasets easier, thereby reducing the time required to analyze them. The data science community has also stepped up with specialized IDEs, such as Anaconda, that implement the Jupyter Notebook concept, which makes working with data science calculations significantly easier. (Chapter 3 of this minibook demonstrates how to use Jupyter Notebook, so don’t worry about it in this chapter.) In addition to all these aspects in Python’s favor, it’s also an excellent language for creating glue code (code that is used to connect various existing code elements together into a cohesive whole) with languages such as C/C++ and Fortran. The Python documentation actually shows how to create the required extensions. Most Python users rely on the language to see patterns, such as allowing a robot to see a group of pixels as an object. It also sees use for all sorts of scientific tasks.
R (special purpose statistical): In many respects, Python and R share the same sorts of functionality but implement it in different ways. Depending on which source you view, Python and R have about the same number of proponents, and some people use Python and R interchangeably (or sometimes in tandem). Unlike Python, R provides its own environment, so you don’t need a third-party product such as Anaconda. However, Chapter 3 of this minibook shows how you can use R in Jupyter Notebook so that you can use a single IDE for all your needs. Unfortunately, R doesn’t appear to mix with other languages with the ease that Python provides.
SQL (database management): The most important thing to remember about Structured Query Language (SQL) is that it focuses on data rather than tasks. (This distinction makes it a full-fledged language for a data scientist, but only part of a solution for a computer scientist.) Businesses can’t operate without good data management — the data is the business. Large organizations use some sort of relational database, which is normally accessible with SQL, to store their data. Most Database Management System (DBMS) products rely on SQL as their main language, and DBMS usually has a large number of data analysis and other data science features built in. Because you’re accessing the data natively, you often experience a significant speed gain in performing data science tasks this way. Database Administrators (DBAs) generally use SQL to manage or manipulate the data rather than necessarily perform detailed analysis of it. However, the data scientist can also use SQL for various data science tasks and make the resulting scripts available to the DBAs for their needs.
Java (general purpose): Some data scientists perform other kinds of programming that require a general-purpose, widely adopted, and popular language. In addition to providing access to a large number of libraries (most of which aren’t actually all that useful for data science, but do work for other needs), Java supports object orientation better than any of the other languages in this list. In addition, it’s strongly typed and tends to run quite quickly. Consequently, some people prefer it for finalized code. Java isn’t a good choice for experimentation or ad hoc queries. Oddly enough, an implementation of Java exists for Jupyter Notebook, but it isn’t refined and is not usable for data science work at this time. (You can find helpful information about the Jupyter Java implementation at https://blog.frankel.ch/teaching-java-jupyter-notebooks/, https://github.com/scijava/scijava-jupyter-kernel, and https://github.com/jupyter/jupyter/wiki/Jupyter-kernels.)
Scala (general purpose): Because Scala uses the Java Virtual Machine (JVM), it does have some of the advantages and disadvantages of Java. However, like Python, Scala provides strong support for the functional programming paradigm, which uses lambda calculus as its basis (see Functional Programming For Dummies, by John Paul Mueller [Wiley] for details). In addition, Apache Spark is written in Scala, which means that you have good support for cluster computing when using this language. Think huge dataset support. Some of the pitfalls of using Scala are that it’s hard to set up correctly, it has a steep learning curve, and it lacks a comprehensive set of data science–specific libraries.
Defining the pros and cons of using Python
Given the right data sources, analysis requirements, and presentation needs, you can use Python for every part of the data science pipeline. In fact, that’s precisely what you do in this book. Every example uses Python to help you understand another part of the data science equation. Of all the languages you could choose for performing data science tasks, Python is the most flexible and capable because it supports so many third-party libraries devoted to the task. The following sections help you better understand why Python is such a good choice for many (if not most) data science needs.
Considering the shifting profile of data scientists
Some people view the data scientist as an unapproachable nerd who performs miracles on data with math. The data scientist is the person behind the curtain in an Oz-like experience. However, this perspective is changing. In many respects, the world now views the data scientist as either an adjunct to a developer or as a new type of developer. The ascendance of applications of all sorts that can learn is the essence of this change. For an application to learn, it has to be able to manipulate large databases and discover new patterns in them. In addition, the application must be able to create new data based on the old data — making an informed prediction of sorts. The new kinds of applications affect people in ways that would have seemed like science fiction just a few years ago. Of course, the most noticeable of these applications define the behaviors of robots that will interact far more closely with people tomorrow than they do today.
From a business perspective, the necessity of fusing data science and application development is obvious: Businesses must perform various sorts of analysis on the huge databases they have collected — to make sense of the information and use it to predict the future. In truth, however, the far greater impact of the melding of these two branches of science — data science and application development — will be felt in terms of creating altogether new kinds of applications, some of which aren’t even possible to imagine with clarity today. For example, new applications could help students learn with greater precision by analyzing their learning trends and creating new instructional methods that work for that particular student. This combination of sciences might also solve a host of medical problems that seem impossible to solve today — not only in keeping disease at bay, but also in solving problems such as how to create truly usable prosthetic devices that look and act like the real thing.
Working with a multipurpose, simple, and efficient language
Many different ways are available for accomplishing data science tasks. This book covers only one of the myriad methods at your disposal. However, Python represents one of the few single-stop solutions that you can use to solve complex data science problems. Instead of having to use a number of tools to perform a task, you can simply use a single language, Python, to get the job done. The Python difference is the large number of scientific and math libraries created for it by third parties. Plugging in these libraries greatly extends Python and allows it to easily perform tasks that other languages can perform only with great difficulty.
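As a tiny example of what plugging in a library buys you (a generic sketch, not one of the book’s examples), the NumPy library reduces a whole loop’s worth of summary-statistics code to single method calls:

import numpy as np

prices = np.array([249.9, 315.0, 185.5, 402.3])  # some made-up house prices

print(prices.mean())  # the average, with no loop or accumulator to write
print(prices.std())   # the standard deviation, likewise a single call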
Tip Python’s libraries are its main selling point; however, Python offers more than reusable code. The most important thing to consider with Python is that it supports four different coding styles (each illustrated in the short sketch after this list):
Functional: Treats every statement as a mathematical equation and avoids any form of state or mutable data. The main advantage of this approach is having no side effects to consider. In addition, this coding style lends itself better than the others to parallel processing because you have no state to consider. Many developers prefer this coding style for recursion and for lambda calculus.
Imperative: Performs computations as a direct change to program state. This style is especially useful when manipulating data structures and produces elegant, but simple, code.
Object-oriented: Relies on data fields that are treated as objects and manipulated only through prescribed methods. Python doesn’t fully support this coding form because it can’t implement features such as data hiding. However, this is a useful coding style for complex applications because it supports encapsulation and polymorphism. This coding style also favors code reuse.
Procedural: Treats tasks as step-by-step iterations in which common tasks are placed in functions that are called as needed. This coding style favors iteration, sequencing, selection, and modularization.
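Here’s the small sketch promised earlier, which computes the sum of squares of a list four times, once in each of the four styles. It’s a generic illustration rather than code from the book’s downloadable source:

numbers = [1, 2, 3, 4]

# Functional: one expression, no mutable state or side effects.
functional = sum(map(lambda n: n * n, numbers))

# Imperative: compute by directly changing program state.
imperative = 0
for n in numbers:
    imperative += n * n

# Object-oriented: data fields manipulated only through methods.
class SquareSummer:
    def __init__(self, values):
        self._values = values  # a naming convention, not true data hiding

    def total(self):
        return sum(n * n for n in self._values)

object_oriented = SquareSummer(numbers).total()

# Procedural: common steps packaged in a function called as needed.
def sum_of_squares(values):
    result = 0
    for n in values:
        result += n * n
    return result

procedural = sum_of_squares(numbers)

print(functional, imperative, object_oriented, procedural)  # 30 30 30 30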
Defining the pros and cons of using R
The standard download of R is a combination of an environment and a language. It’s a form of the S programming language, which John Chambers originally created at Bell Laboratories to make working with statistics easier. Rick Becker and Allan Wilks eventually added to the S programming language as well. The goal of the R language is to turn ideas into software quickly and easily. In other words, R is a language designed to help someone who doesn’t have much programming experience create code without a huge learning curve.
This book uses R instead of S because R is a free, downloadable product that can run most S code without modification; in contrast, you have to pay for S. Given the examples used in the book, R is a great choice. You can read more about R in general at https://www.r-project.org/about.html.
Warning You don’t want to make sweeping generalizations about the languages used for data science because you must also consider how the languages are used within the field (such as performing machine learning or deep learning tasks). Both R and Python are popular languages, for different reasons. Articles such as “In data science, the R language is swallowing Python” (http://www.infoworld.com/article/2951779/application-development/in-data-science-the-r-language-is-swallowing-python.html) initially seem to say that R is becoming more popular, without clearly articulating why. The author wisely backs away from this statement by pointing out that R is best used for statistical purposes and that Python is the better general-purpose language. The best developers always have an assortment of programming tools in their tool belts to make performing tasks easier. Languages address developer needs, so you need to use the right language for the job. After all, every language ultimately becomes machine code that a processor understands — an extremely low-level, processor-specific language that few developers understand any longer because high-level programming languages make development easier.
You can get a basic copy of R from the Comprehensive R Archive Network (CRAN) site at https://cran.r-project.org/. The site provides both source code versions and compiled versions of the R distribution for various platforms. Unless you plan to make your own changes to the basic R support or want to delve into how R works, getting the compiled version is always better. If you use RStudio, you must also download and install a copy of R.
Remember This book uses a version of R specially designed for use in Jupyter Notebook (as described in Chapter 3 of this minibook). Because you can work with Python and R using the same IDE, you save time and effort because now you don’t have to learn a separate IDE for R. However, you might ultimately choose to work with a specialized R environment to obtain language help features that Jupyter Notebook doesn’t provide. If you use a different IDE, the screenshots in the book won’t match what you see onscreen, and the downloadable source code files may not load without error (but should still work with minor touchups).
The RStudio Desktop version (https://www.rstudio.com/products/rstudio/#Desktop) can make the task of working with R even easier. This product is a free download, and you can get it in Linux (Debian/Ubuntu, RedHat/CentOS, and SUSE Linux), Mac, and Windows versions. The book doesn’t use the advanced features found in the paid version of the product, nor will you require the RStudio Server features for the examples.
You can try other R distributions if you find that you don’t like Jupyter Notebook or RStudio. The most common alternative distributions are StatET (http://www.walware.de/goto/statet), Red-R (https://decisionstats.com/2010/09/28/red-r-1-8-groovy-gui/ or http://www.red-r.org/), and Rattle (http://rattle.togaware.com/). All of them are good products, but RStudio appears to have the strongest following and is the simplest product to use outside Jupyter Notebook. You can read discussions about the various choices online at places such as https://www.quora.com/What-are-the-best-choices-for-an-R-IDE.
Learning to Perform Data Science Tasks Fast
It’s time to see the data science pipeline in action. Even though the following sections use Python to provide a brief overview of the process you explore in detail in the rest of the book, they also apply to using R. Throughout the book, you see Python used directly in the text for every example, with some R additions. The downloadable source contains R versions of the examples that reflect the capabilities that R provides.
You won’t actually perform the tasks in the following sections. In fact, you don’t find installation instructions for Python until Chapter 3, so in this chapter, you can just follow along in the text. This book uses a specific version of Python and an IDE called Jupyter Notebook, so please wait until Chapter 3 to install these features (or skip ahead, if you insist, and install them now). Don’t worry about understanding every aspect of the process at this point. The purpose of these sections is to help you gain an understanding of the flow of using Python to perform data science tasks. Many of the details may seem difficult to understand at this point, but the rest of the book will help you understand them.
Remember The examples in this book rely on a web-based application named Jupyter Notebook. The screenshots you see in this and other chapters reflect how Jupyter Notebook looks in Firefox on a Windows 7 system. The view you see will contain the same data, but the actual interface may differ a little depending on platform (such as using a notebook instead of a desktop system), operating system, and browser. Don’t worry if you see some slight differences between your display and the screenshots in the book.
Loading data
Before you can do anything, you need to load some data. The book shows you all sorts of methods for performing this task. In this case, Figure 1-1 shows how to load a dataset called Boston that contains housing prices and other facts about houses in the Boston area. The code places the entire dataset in the boston variable and then places parts of that data in variables named X and y. Think of variables as you would storage boxes. The variables are important because they enable you to work with the data.
FIGURE 1-1: Loading data into variables so that you can manipulate it.
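In case you’re curious about what such code looks like, here’s a minimal sketch of the sort of code the figure shows. It assumes an older scikit-learn release, because the load_boston() function was deprecated and later removed from the library:

from sklearn.datasets import load_boston

boston = load_boston()  # the entire dataset, as a Bunch object
X = boston.data         # the predictor columns: facts about each house
y = boston.target       # the target column: median house value

print(X.shape)          # (506, 13): 506 houses, 13 facts apiece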
Training a model
Now that you have some data to work with, you can do something with it. All sorts of algorithms are built into Python. Figure 1-2 shows a linear regression model. Again, don't worry about precisely how this works; later chapters discuss linear regression in detail. The important thing to note in Figure 1-2 is that Python lets you perform the linear regression using just two statements and place the result in a variable named hypothesis.
FIGURE 1-2: Using the variable content to train a linear regression model.
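Continuing the sketch begun with Figure 1-1, the two statements might look like the following in scikit-learn (X and y are the variables loaded earlier):

from sklearn.linear_model import LinearRegression

hypothesis = LinearRegression()  # statement 1: create an untrained model
hypothesis.fit(X, y)             # statement 2: train it on the loaded data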
Viewing a result
Performing any sort of analysis doesn’t pay unless you obtain some benefit from it in the form of a result. This book shows all sorts of ways to view output, but Figure 1-3 starts with something simple. In this case, you see the coefficient output from the linear regression analysis.
FIGURE 1-3: Outputting a result as a response to the model.
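In the same sketch, viewing the result takes only a line or two; the trained model’s coef_ attribute holds one weight for each predictor column:

print(hypothesis.coef_)       # one coefficient per predictor column
print(hypothesis.intercept_)  # the fitted intercept term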
Tip One of the reasons that this book uses Jupyter Notebook is that the product helps you create nicely formatted output as part of creating the application. Look again at Figure 1-3 and you see a report that you could simply print and offer to a colleague. The output won’t suit every audience, but those experienced with Python and data science will find it quite usable and informative.
Chapter 2
Placing Data Science within the Realm of AI
IN THIS CHAPTER
check Understanding how data and data science relate
check Considering the progression into AI and beyond
check Developing a data pipeline to AI
Some people perceive data science as simply a method of managing data for use with an AI discipline, but you can use your data science skills for a great many tasks other than AI. You use data science skills for various types of statistical analysis that don’t rely on AI, such as performing analytics, managing data in various ways, and locating information that you use directly rather than as input to anything else. However, the data science to AI connection does exist as well, so you need to know about it as a data scientist, which is the focus of the first part of this chapter.
Many terms used in data science become muddled because people misuse them. When you hear the term AI, you might think about all sorts of technologies that are either distinct AI subcategories or have nothing to do with AI at all. The second part of the chapter defines AI and then clarifies its connection to machine learning, which is a subcategory of AI, and finally to deep learning, which is actually a subcategory of machine learning. Understanding this hierarchy is important in understanding the role data science plays in making these technologies work.
The first two sections define the endpoints of a data pipeline. The third section describes the pipeline between data science and AI (and its subcategories). This data pipeline is a particular implementation of data science skills, so you need to know about it. You also need to consider that this data pipeline isn’t the only one you need to create and use as a data scientist. For example, you might be involved in a type of data mining that doesn’t rely on AI but rather on specific sorts of filtering, sorting, and the use of statistical analysis. The articles at https://www.innoarchitech.com/blog/data-science-big-data-explained-non-data-scientist and https://www.northeastern.edu/graduate/blog/what-does-a-data-scientist-do/ give you some other ideas about how data scientists create and use data pipelines.
Seeing the Data to Data Science Relationship
Obviously, to become a data scientist, you must have data to work with. What isn’t obvious is the kind of data, the data sources, and the uses of the data. Data is the requirement for analysis, but that analysis can take many forms. For example, the article at https://blog.allpsych.com/spending-habits-can-reveal-personality-traits/ talks about data used to guess your psychological profile, which can then be used for all sorts of purposes — many positive; others not. The issue is that these analyses often help others know more about you than you know yourself, which is a scary thought when you consider how someone might use the information.
The