Data Analytics
UNIT – I
Introduction to Data Analysis
Overview of Data Analytics (DA)
Analysis of data, also known as data analytics, is a process of inspecting,
cleansing, transforming, and modeling data with the goal of discovering useful
information, suggesting conclusions, and supporting decision-making.
There are four ways in which big data BI aids businesses:
2. Faster, better decision making. With the speed of Hadoop and in-
memory analytics, combined with the ability to analyze new sources of
data, businesses are able to analyze information immediately – and
make decisions based on what they’ve learned.
3. New products and services. With the ability to gauge customer needs
and satisfaction through analytics comes the power to give customers
what they want. Davenport points out that with big data analytics, more
companies are creating new products to meet customers’ needs.
Classification of Data
Structured Data
Structured data refers to any data that can be stored in a SQL database, in
tables with rows and columns. It has relational keys and can easily be mapped
into pre-designed fields. Today, structured data is the most commonly processed
type of data in development and the simplest way to manage information.
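The sketch below makes the idea of rows, columns and relational keys concrete. It uses Python's built-in sqlite3 module with a throwaway in-memory database. The customers and orders tables and their values are hypothetical and serve only as an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- relational key
    name        TEXT NOT NULL,
    city        TEXT
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Delhi');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 120.0), (12, 2, 80.0);
""")

# Because every row fits pre-designed fields, tables can be joined on their keys.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(rows)  # [('Asha', 370.0), ('Ravi', 80.0)]
```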
Like structured data, semi-structured data represents only a small share of all
data (about 5 to 10%).
Unstructured data
Unstructured data represents around 80% of all data. It often includes text and
multimedia content. Examples include e-mail messages, word processing
documents, videos, photos, audio files, presentations, web pages and many
other kinds of business documents. Note that while these sorts of files may
have an internal structure, they are still considered “unstructured” because
the data they contain doesn’t fit neatly in a database.
Common sources of unstructured data include the following.
Satellite images: This includes weather data or the data that the
government captures in its satellite surveillance imagery. Just think
about Google Earth, and you get the picture.
Scientific data: This includes seismic imagery, atmospheric data, and
high-energy physics.
Photographs and video: This includes security, surveillance, and traffic
video.
Radar or sonar data: This includes vehicular, meteorological, and
oceanographic seismic profiles.
Text internal to your company: Think of all the text within documents,
logs, survey results, and e-mails. Enterprise information actually
represents a large percentage of the text information in the world today.
Social media data: This data is generated from the social media
platforms such as YouTube, Facebook, Twitter, LinkedIn, and Flickr.
Mobile data: This includes data such as text messages and location
information.
Website content: This comes from any site delivering unstructured
content, like YouTube, Flickr, or Instagram.
Unstructured data is growing faster than the other types, and exploiting it
could help in business decision-making.
A group called the Organization for the Advancement of Structured
Information Standards (OASIS) has published the Unstructured Information
Management Architecture (UIMA) standard. The UIMA “defines
platform-independent data representations and interfaces for software components or
services called analytics, which analyze unstructured information and assign
semantics to regions of that unstructured information.”
Many industry watchers say that Hadoop has become the de facto industry
standard for managing Big Data.
Characteristics of Data
There is a lot of buzz around data these days. Businesses, big and small, have
started relying on data analytics for critical business decisions. However, not
all businesses are able to leverage the benefits of data analytics to the same
degree. Let us try to understand the reason behind this.
There are five data characteristics that are the building blocks of an efficient
data analytics solution: accuracy, completeness, consistency, uniqueness, and
timeliness. Understanding each of these will help us understand why
different businesses are not able to leverage the benefits of data analytics to
the same degree.
Accuracy
When insights are extracted from a well-developed and well-tested data
analytics solution, we assume that the data is reliable and accurate.
However, flaws in data collection, data storage, or data retrieval will result in
unreliable data, and this will reduce the accuracy of the insights extracted by a
data analytics solution.
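Accuracy is hard to measure directly without a trusted reference. A common first step is to flag values that break simple validity rules, because such values usually come from collection or storage errors. The sketch below assumes a hypothetical record layout and a hypothetical rule set: an age between 0 and 120, and an e-mail that contains "@". A real solution would derive its rules from its own domain.

```python
# Hypothetical validity rules, one per field, used only for illustration.
RULES = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "email": lambda v: isinstance(v, str) and "@" in v,
}

def flag_inaccurate(records):
    """Return (record index, field) pairs whose values break a validity rule."""
    problems = []
    for i, rec in enumerate(records):
        for field, is_valid in RULES.items():
            if field in rec and not is_valid(rec[field]):
                problems.append((i, field))
    return problems

records = [
    {"age": 34, "email": "a@example.com"},
    {"age": 250, "email": "b@example.com"},  # impossible age, e.g. a data-entry error
    {"age": 29, "email": "not-an-email"},    # malformed e-mail
]
print(flag_inaccurate(records))  # [(1, 'age'), (2, 'email')]
```

Rule checks only catch values that are obviously impossible. A plausible but wrong value, such as a mistyped age of 43 instead of 34, passes them.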
Completeness
The insights or information extracted by a data analytics solution depend a
great deal on the completeness of the data. Partial data, or a dataset with a lot
of missing values, represents an incomplete picture. Thus, the degree of
completeness of the data determines the accuracy of a data analytics solution.
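Completeness can be quantified as the share of non-missing values in each field. This sketch assumes that records arrive as Python dictionaries. It counts both `None` and an absent key as missing. The field names and values are made up for illustration.

```python
def completeness(records, fields):
    """Fraction of non-missing values per field (None or absent counts as missing)."""
    total = len(records)
    report = {}
    for field in fields:
        present = sum(1 for rec in records if rec.get(field) is not None)
        report[field] = present / total if total else 0.0
    return report

rows = [
    {"id": 1, "city": "Pune", "income": 52000},
    {"id": 2, "city": None, "income": 61000},
    {"id": 3, "city": "Delhi"},                  # income key missing entirely
    {"id": 4, "city": "Chennai", "income": None},
]
print(completeness(rows, ["id", "city", "income"]))
# {'id': 1.0, 'city': 0.75, 'income': 0.5}
```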
Consistency
The consistency within a dataset is another important factor that determines
the degree of accuracy of a data analytics solution. A consistent dataset is less
prone to errors and results in better accuracy of a data analytics solution.
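One simple consistency check verifies that every value of a field follows the same format. The sketch below assumes that dates should be stored as YYYY-MM-DD strings, which is a hypothetical convention for this example. It reports any value that does not parse with that single format.

```python
from datetime import datetime

def inconsistent_dates(values, fmt="%Y-%m-%d"):
    """Return the values that do not parse with the one expected date format."""
    bad = []
    for v in values:
        try:
            datetime.strptime(v, fmt)
        except ValueError:
            bad.append(v)
    return bad

order_dates = ["2023-01-15", "15/01/2023", "Jan 20, 2023", "2023-03-01"]
print(inconsistent_dates(order_dates))  # ['15/01/2023', 'Jan 20, 2023']
```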
Uniqueness
One of the essential components of any business is high-quality data. This data,
if used properly, can make a company competitive or keep it
competitive. Thus, the degree of uniqueness of data determines the efficiency of
a data analytics solution. In order to add value to any business, the data should
be unique and distinctive.
Timeliness
A data analytics solution that uses outdated data can prevent a company from
achieving its goals or from surviving in a competitive arena. New and current
data is more valuable to a business than old, outdated data. Although old data
should not be completely overlooked by a data analytics solution, emphasis
should be placed on current data.
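One way to put more emphasis on current data without ignoring old data is recency weighting. The scheme below is an assumption and is not taken from these notes. Each record receives a weight that halves every `half_life_days` days, so older records still count, only less. The dates and sales figures are hypothetical.

```python
from datetime import date

def recency_weights(record_dates, today, half_life_days=30.0):
    """Weight = 0.5 ** (age in days / half-life): recent records count more,
    but old records never drop to exactly zero."""
    return [0.5 ** ((today - d).days / half_life_days) for d in record_dates]

def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

today = date(2024, 6, 30)
dates = [date(2024, 6, 30), date(2024, 5, 31), date(2024, 3, 2)]  # 0, 30, 120 days old
sales = [120.0, 100.0, 60.0]
w = recency_weights(dates, today)
print(w)                                      # [1.0, 0.5, 0.0625]
print(round(weighted_average(sales, w), 2))   # 111.2
```

The plain mean of these sales is about 93.33. The recency-weighted average is pulled towards the most recent figures. The half-life controls how quickly old data fades.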
Applications of Data Analytics / Uses of Data Science
Using data science, companies have become intelligent enough to push and sell
products according to customers’ purchasing power and interests. Here’s how
they are ruling our hearts and minds:
Internet Search
When we speak of search, we think ‘Google’. Right? But there are many other
search engines like Yahoo, Bing, Ask, AOL, DuckDuckGo etc. All these search
engines (including Google) make use of data science algorithms to deliver the
best results for our search queries in a fraction of a second. Consider the fact
that Google processes more than 20 petabytes of data every day. Had there
been no data science, Google wouldn’t have been the ‘Google’ we know today.
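Search engines do not publish their full ranking methods, so the sketch below does not show how Google works. It illustrates TF-IDF, a classic textbook relevance score, ranking a few hypothetical documents for a query. Terms that appear in fewer documents receive a higher inverse document frequency and therefore count more.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def rank(query, docs):
    """Return document indices ordered by summed TF-IDF score of the query terms."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                idf = math.log(n / df[term])
                score += (tf[term] / len(toks)) * idf
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

docs = [
    "data science course for beginners",
    "cooking recipes for beginners",
    "big data analytics and data science tools",
]
print(rank("data science", docs))  # [2, 0, 1]
```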
Digital Advertisements (Targeted Advertising and re-targeting)
If you thought Search would have been the biggest application of data science
and machine learning, here is a challenger – the entire digital marketing
spectrum. From the display banners on various websites to the digital
billboards at airports, almost all of them are decided using data
science algorithms.
This is why digital ads have been able to achieve a much higher click-through
rate (CTR) than traditional advertisements. They can be targeted based on users’
past behaviour. This is why I see ads for analytics training while my
friend sees ads for apparel in the same place at the same time.
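The two ideas in this paragraph can be sketched in a few lines. CTR is simply clicks divided by impressions. For behaviour-based targeting, the rule below shows the ad for the category a user has browsed most often. That rule is a hypothetical stand-in, because real ad platforms use far richer models. The categories and ad texts are also made up.

```python
from collections import Counter

def ctr(clicks, impressions):
    """Click-through rate: the fraction of ad impressions that were clicked."""
    return clicks / impressions if impressions else 0.0

def choose_ad(browsing_history, ads_by_category, default_ad):
    """Hypothetical targeting rule: show the ad for the user's most-browsed category."""
    if not browsing_history:
        return default_ad
    top_category, _count = Counter(browsing_history).most_common(1)[0]
    return ads_by_category.get(top_category, default_ad)

ads = {"analytics": "Analytics training course", "apparel": "Summer clothing sale"}
me = ["analytics", "analytics", "news", "apparel"]
friend = ["apparel", "apparel", "sports"]
print(choose_ad(me, ads, "Generic banner"))      # Analytics training course
print(choose_ad(friend, ads, "Generic banner"))  # Summer clothing sale
print(ctr(clicks=45, impressions=1500))          # 0.03
```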
Recommender Systems
Who can forget the suggestions about similar products on Amazon? They not
only help you find relevant products from the billions of products available,
but also add a lot to the user experience.
A lot of companies have extensively used recommender systems to promote their
products and suggestions in accordance with users’ interests and the relevance
of information. Internet giants like Amazon, Twitter, Google Play, Netflix,
LinkedIn, IMDb and many more use such systems to improve user experience.
The recommendations are made based on a user’s previous search results.
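Companies rarely disclose the exact recommender they use. One well-known technique is item-based collaborative filtering. It treats two items as similar when the same users interacted with both, and it recommends unseen items that are similar to what a user already has. The sketch below assumes a hypothetical history of items each user viewed or bought, and it measures similarity with the cosine of the items' user sets.

```python
import math
from collections import defaultdict

def item_similarities(histories):
    """Cosine similarity between items, each item represented by its set of users."""
    users_of = defaultdict(set)
    for user, items in histories.items():
        for item in items:
            users_of[item].add(user)
    sims = defaultdict(dict)
    for a in users_of:
        for b in users_of:
            if a != b:
                overlap = len(users_of[a] & users_of[b])
                if overlap:
                    sims[a][b] = overlap / math.sqrt(len(users_of[a]) * len(users_of[b]))
    return sims

def recommend(user, histories, k=2):
    """Score unseen items by summed similarity to the user's items; return the top k."""
    seen = set(histories[user])
    sims = item_similarities(histories)
    scores = defaultdict(float)
    for item in seen:
        for other, s in sims[item].items():
            if other not in seen:
                scores[other] += s
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]

histories = {
    "asha": ["laptop", "mouse", "laptop bag"],
    "ravi": ["laptop", "mouse"],
    "meera": ["laptop", "keyboard"],
    "john": ["novel", "bookmark"],
}
print(recommend("ravi", histories))  # ['laptop bag', 'keyboard']
```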
Image Recognition
You upload a photo with friends on Facebook and start getting
suggestions to tag your friends. This automatic tag suggestion feature uses a
face recognition algorithm. Similarly, while using WhatsApp Web, you scan a QR
code in your web browser using your mobile phone. In addition, Google gives
you the option to search for images by uploading them. It uses image
recognition and provides related search results. To know more about image
recognition, watch the 1:31-minute video in this article:
https://www.analyticsvidhya.com/blog/2015/09/applications-data-science/
Speech Recognition
Some of the best examples of speech recognition products are Google Voice,
Siri, Cortana etc. With speech recognition, even if you aren’t in a
position to type a message, you can simply speak it and it will be converted
to text. However, at times, you will notice that speech recognition doesn’t
perform accurately. Just for a laugh, check out the hilarious 1:30-minute video
of a conversation between Cortana and Satya Nadella (CEO, Microsoft) in this
article:
https://www.analyticsvidhya.com/blog/2015/09/applications-data-science/
Gaming
Price Comparison Websites
At a basic level, price comparison websites are driven by lots and lots of data,
which is fetched using APIs and RSS feeds. If you have ever used these websites,
you know the convenience of comparing the price of a product from
multiple vendors in one place. PriceGrabber, PriceRunner, Junglee, Shopzilla and
DealTime are some examples of price comparison websites. Nowadays, price
comparison websites can be found in almost every domain, such as technology,
hospitality, automobiles, durables, apparel etc.
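Once offers have been collected from vendors, the core comparison is a group-by-product minimum. In practice the offers would be fetched through vendor APIs or RSS feeds. Here a hard-coded list with hypothetical vendor names and prices stands in for that data.

```python
def cheapest_offers(offers):
    """Group (vendor, product, price) offers by product and keep the lowest price."""
    best = {}
    for vendor, product, price in offers:
        if product not in best or price < best[product][1]:
            best[product] = (vendor, price)
    return best

offers = [
    ("VendorA", "phone-x", 299.0),
    ("VendorB", "phone-x", 279.5),
    ("VendorC", "phone-x", 289.0),
    ("VendorA", "headphones-y", 49.9),
    ("VendorC", "headphones-y", 54.0),
]
print(cheapest_offers(offers))
# {'phone-x': ('VendorB', 279.5), 'headphones-y': ('VendorA', 49.9)}
```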
Airline Industry
The airline industry across the world is known to bear heavy losses. Except for
a few airline service providers, companies are struggling to maintain their
occupancy ratios and operating profits. The steep rise in air fuel prices and
the need to offer heavy discounts to customers have made the situation even
worse. It wasn’t long before airline companies started using data science to
identify strategic areas of improvement. Now, using data science, airline
companies can:
5. Southwest Airlines and Alaska Airlines are among the top companies that have
embraced data science to change the way they work.
Delivery logistics
Who says data science has limited applications? Logistics companies like DHL,
FedEx, UPS and Kuehne + Nagel have used data science to improve their
operational efficiency. Using data science, these companies have discovered the
best routes to ship, the best time to deliver and the best mode of transport to
choose, leading to cost efficiency, among many other benefits. Furthermore,
the data that these companies generate from installed GPS devices
provides them with many possibilities to explore using data science.
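Finding the best route is, at its core, a shortest-path problem. The sketch below uses Dijkstra's algorithm with Python's heapq module on a hypothetical road network, where the edge weights are assumed travel times in minutes. Real logistics planners also handle time windows, vehicle capacities and live traffic, none of which appear here.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_cost, path) of the cheapest route."""
    queue = [(0, start, [start])]  # (cost so far, node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []  # goal unreachable

# Hypothetical network: travel times in minutes between locations.
roads = {
    "Warehouse": {"HubA": 30, "HubB": 10},
    "HubA": {"Customer": 15},
    "HubB": {"HubA": 5, "Customer": 50},
}
print(shortest_route(roads, "Warehouse", "Customer"))
# (30, ['Warehouse', 'HubB', 'HubA', 'Customer'])
```

The direct-looking option through HubA alone costs 45 minutes. The detour through HubB is cheaper, which is the kind of non-obvious result route optimisation is meant to find.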
Miscellaneous
Apart from the applications mentioned above, data science is also used in
marketing, finance, human resources, health care, government policy and
every other industry where data gets generated. Using data science, the
marketing departments of companies decide which products are best for
upselling and cross-selling, based on behavioral data from customers. In
addition, questions such as a customer’s likely wallet share, which customer is
likely to churn and which customer should be pitched a high-value product can
be answered with data science. In finance (credit risk, fraud), human resources
(which employees are most likely to leave, employee performance, employee
bonuses) and many other disciplines, such tasks are readily accomplished using
data science.
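These notes do not say which method answers a question such as "which customer is likely to churn". A common choice is logistic regression, sketched below from scratch with batch gradient descent. The two features (support complaints in the last quarter, years as a customer) and the six labelled customers are hypothetical and serve only as an illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights and bias with batch gradient descent."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def churn_probability(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical customers: [complaints last quarter, years as a customer]
X = [[5, 1], [4, 0.5], [6, 2], [0, 6], [1, 8], [0, 4]]
y = [1, 1, 1, 0, 0, 0]  # 1 = customer churned
w, b = train_logistic(X, y)
print(round(churn_probability([5, 1], w, b), 2))  # close to 1: high churn risk
print(round(churn_probability([0, 7], w, b), 2))  # close to 0: low churn risk
```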