BD Unit1
For example, bad advertising decisions can be one of the greatest wastes of resources
in a company. With data showing how different marketing channels are performing,
however, you can see which ones offer the greatest ROI and focus on those. Or you
could dig into why other channels are not performing as well and work to improve
their performance. This would allow your budget to generate more leads without
having to increase the advertising spend.
5) Data helps you understand consumers
1. Structured data –
Structured data is data whose elements are addressable for effective analysis.
It has been organized into a formatted repository, typically a database. It
covers all data that can be stored in an SQL database in a table with rows and
columns. Such data has relational keys and can easily be mapped into
pre-designed fields. Today, structured data is the most processed and the
simplest kind of information to manage. Example: relational data.
2. Semi-structured data –
Semi-structured data is information that does not reside in a relational
database but has some organizational properties that make it easier to
analyze. With some processing, it can be stored in a relational database
(though this can be very hard for some kinds of semi-structured data).
Example: XML data.
3. Unstructured data –
Unstructured data is data that is not organized in a pre-defined manner and
does not have a pre-defined data model, so it is not a good fit for a
mainstream relational database. For unstructured data there are alternative
platforms for storage and management. It is increasingly prevalent in IT
systems and is used by organizations in a variety of business intelligence
and analytics applications. Example: Word, PDF, text, media logs.
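To make the three categories concrete, here is a minimal Python sketch (the table, field names and sample values are invented for illustration) that handles one record of each kind:

    # Illustrative sketch: one record of each data type (all values are made up).
    import json
    import sqlite3

    # Structured data: fixed rows and columns with relational keys, as in an SQL table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary REAL)")
    conn.execute("INSERT INTO employees VALUES (1, 'Asha', 52000.0)")
    print(conn.execute("SELECT name, salary FROM employees").fetchall())

    # Semi-structured data: self-describing keys but no rigid schema (JSON here; XML is similar).
    record = json.loads('{"id": 2, "name": "Ravi", "skills": ["SQL", "Spark"]}')
    print(record["skills"])

    # Unstructured data: free text with no predefined model; we can only scan or mine it.
    review = "The delivery was late but the product quality was excellent."
    print("excellent" in review.lower())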
Velocity
Velocity refers to the speed at which the data is generated, collected and
analyzed. Data continuously flows through multiple channels such as computer
systems, networks, social media, mobile phones etc. In today’s data-driven
business environment, the pace at which data grows can be best described as
‘torrential’ and ‘unprecedented’. Now, this data should also be captured as close
to real-time as possible, making the right data available at the right time. The
speed at which data can be accessed has a direct impact on making timely and
accurate business decisions. Even a limited amount of data that is available in
real-time yields better business results than a large volume of data that needs a
long time to capture and analyze.
Several Big data technologies today allow us to capture and analyze data as it is
being generated in real-time.
Volume
Big data volume defines the ‘amount’ of data that is produced. The value of data is
also dependent on the size of the data.
Value
Although data is being produced in large volumes today, just collecting it is of no
use. Instead, data from which business insights are garnered add ‘value’ to the
company. In the context of big data, value amounts to how worthy the data is of
positively impacting a company’s business. This is where big data analytics come
into the picture. While many companies have invested in establishing data
aggregation and storage infrastructure in
their organizations, they fail to understand that the aggregation of data doesn’t
equal value addition. What you do with the collected data is what matters. With the
help of advanced data analytics, useful insights can be derived from the collected
data. These insights, in turn, are what add value to the decision-making process.
One way to ensure that the value of big data is considerable and worth investing
time and effort into is by conducting a cost vs. benefit analysis. By calculating the
total cost of processing big data and comparing it with the ROI that the business
insights are expected to generate, companies can effectively decide whether or not
big data analytics will actually add any value to their business.
Variety
While the volume and velocity of data are important factors that add value to a
business, big data also entails processing diverse data types collected from varied
data sources. Data sources may involve external sources as well as internal
business units. Generally, big data is classified as structured, semi-structured and
unstructured data. While structured data is data whose format, length and volume
are clearly defined, semi-structured data may only partially conform to a specific
data format. Unstructured data, on the other hand, is unorganized data that
doesn't conform to traditional data formats. Data generated via digital and
social media (images, videos, tweets, etc.) can be classified as unstructured data.
The sheer volume of data that organizations usually collect and generate may look
chaotic and unstructured. In fact, almost 80 percent of data produced globally,
including photos, videos, mobile data, and social media content, is unstructured in
nature.
Veracity/Validity
The veracity of big data, also called validity, is the assurance of the
quality or credibility of the collected data. Can you trust the data that
you have collected? Is this data credible enough to glean insights from? Should
we be basing our business decisions on the insights garnered from this data? All
these questions and more, are answered when the veracity of the data is known.
Since big data is vast and involves so many data sources, there is the possibility
that not all collected data will be of good quality or accurate in nature. Hence,
when processing big data sets, it is important that the validity of the data is
checked before processing begins.
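As a small illustration of such a validity check, the pandas sketch below (the column names and rules are assumptions made for the example) screens a data set for obvious quality problems before it is processed:

    # Minimal data-validity screen before processing (rules are illustrative).
    import pandas as pd

    df = pd.DataFrame({
        "order_id": [1, 2, 2, 4],
        "amount":   [250.0, None, 99.0, -10.0],
    })

    print("duplicate ids:", int(df.duplicated(subset="order_id").sum()))
    print("missing values:", int(df["amount"].isna().sum()))
    print("impossible values:", int((df["amount"] < 0).sum()))

    # Keep only the rows that pass every check.
    clean = df.drop_duplicates(subset="order_id").dropna()
    clean = clean[clean["amount"] >= 0]
    print(clean)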
Examples of Operational Big Data include:
Online ticket bookings, which include your rail tickets, flight tickets, movie
tickets, etc.
Online shopping, which is your Amazon, Flipkart, Walmart, Snapdeal and
many more.
Data from social media sites like Facebook, Instagram, WhatsApp and a lot
more.
The employee details of any multinational company.
So, with this let us move into the Analytical Big Data Technologies.
Analytical Big Data is like the advanced version of Big Data Technologies. It is a
little more complex than Operational Big Data. In short, Analytical Big Data is
where the actual performance comes into the picture and the crucial real-time
business decisions are made by analyzing the Operational Big Data.
Examples include:
Stock market analysis.
Carrying out space missions, where every single bit of information is crucial.
Weather forecast information.
Medical fields, where a particular patient's health status can be monitored.
2. Diagnostic Analytics
Diagnostic analytics is used to determine why something happened in the past. It is
characterized by techniques such as drill-down, data discovery, data mining and
correlations. Diagnostic analytics takes a deeper look at data to understand the root
causes of the events. It is helpful in determining what factors and events contributed
to the outcome. It mostly uses probabilities, likelihoods, and the distribution of
outcomes for the analysis.
In time series data of sales, diagnostic analytics would help you understand why
sales decreased or increased in a specific year. However, this type of analytics
has a limited ability to give actionable insights. It just provides an
understanding of causal relationships and sequences while looking backward.
A few techniques that use diagnostic analytics include attribute importance,
principal component analysis, sensitivity analysis, and conjoint analysis.
Training algorithms for classification and regression also fall under this type
of analytics.
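As a sketch of the correlation step described above, the following Python example (the sales figures and candidate factors are invented) asks which factor moves together with sales:

    # Toy diagnostic step: which factor moves with sales? (data is invented)
    import pandas as pd

    sales = pd.DataFrame({
        "ad_spend":  [10, 12, 9, 15, 14, 8],
        "discounts": [5, 5, 6, 4, 4, 7],
        "sales":     [100, 118, 95, 140, 135, 88],
    })

    # Correlation is a first drill-down clue, not proof of a causal relationship.
    print(sales.corr()["sales"])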
3. Predictive Analytics
As mentioned above, predictive analytics is used to predict future outcomes.
However, it is important to note that it cannot tell whether an event will occur in
the future; it merely forecasts the probability of the event occurring. A
predictive model builds on the preliminary descriptive analytics stage to derive the
possibility of the outcomes.
The essence of predictive analytics is to devise models that understand the
existing data well enough to extrapolate future occurrences or, simply, to predict
future data. One of the common applications of predictive analytics is sentiment
analysis, where all the opinions posted on social media are collected and analyzed
(existing text data) to predict the person's sentiment on a particular subject as
positive, negative or neutral (future prediction).
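One simple way to reproduce that sentiment-analysis example is with an off-the-shelf library such as TextBlob; the posts below are invented, and the polarity thresholds are an assumption for the sketch:

    # Sentiment scoring sketch with TextBlob (pip install textblob).
    from textblob import TextBlob

    posts = [
        "Absolutely love the new phone, the camera is brilliant!",
        "Worst customer service I have ever experienced.",
    ]

    for post in posts:
        polarity = TextBlob(post).sentiment.polarity  # -1 (negative) .. +1 (positive)
        label = "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
        print(label, round(polarity, 2), post)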
Hence, predictive analytics includes building and validation of models that provide
accurate predictions. Predictive analytics relies on machine learning algorithms like
random forests, SVM, etc. and statistics for learning and testing the data. Usually,
companies need trained data scientists and machine learning experts for building
these models. The most popular tools for predictive analytics include Python, R,
RapidMiner, etc.
The prediction of future data relies on the existing data as it cannot be obtained
otherwise. If the model is properly tuned, it can be used to support complex forecasts
in sales and marketing. It goes a step ahead of the standard BI in giving accurate
predictions.
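A minimal sketch of that build-and-validate workflow, using scikit-learn's random forest on synthetic data standing in for real historical records:

    # Predictive-model sketch: train on past data, validate, output probabilities.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for historical business data (features X, outcomes y).
    X, y = make_classification(n_samples=500, n_features=8, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # The model forecasts probabilities of the event, not certainties.
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
    print("P(event) for first test case:", model.predict_proba(X_test[:1])[0][1])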
4. Prescriptive Analytics
The basis of this analytics is predictive analytics, but it goes beyond the three
types mentioned above to suggest future solutions. It can suggest all favorable
outcomes for a specified course of action and also suggest various courses of
action to reach a particular outcome. Hence, it uses a strong feedback system that
constantly learns and updates the relationship between the action and the outcome.
The computations include optimisation of some functions that are related to the
desired outcome. For example, while calling for a cab online, the application uses
GPS to connect you to the correct driver from among a number of drivers found
nearby. Hence, it optimises the distance for faster arrival time. Recommendation
engines also use prescriptive analytics.
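A toy version of that driver-matching optimisation is sketched below with invented GPS coordinates; real systems optimise estimated arrival time over road networks rather than straight-line distance:

    # Pick the driver with the shortest great-circle (haversine) distance to the rider.
    from math import asin, cos, radians, sin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance between two points, in kilometres.
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371 * asin(sqrt(a))

    rider = (19.0760, 72.8777)
    drivers = {"D1": (19.0890, 72.8656), "D2": (19.0700, 72.8800), "D3": (19.1100, 72.9000)}

    best = min(drivers, key=lambda d: haversine_km(*rider, *drivers[d]))
    print("assign driver:", best)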
The other approach involves simulation, where all the key performance areas are
combined to design the correct solution and to ensure that the key performance
metrics are included in the solution. The optimisation model then works on the
impact of the previously made forecasts. Because of its power to suggest favorable
solutions, prescriptive analytics is the final frontier of advanced analytics, or
data science in today's terms.
Big data solutions typically involve one or more of the following types
of workload:
Batch processing of big data sources at rest.
Real-time processing of big data in motion.
Interactive exploration of big data.
Predictive analytics and machine learning.
Big Data Architectures components
1. Data sources:
All big data solutions start with one or more data sources. Examples
include:
Application data stores, such as relational databases.
Static files produced by applications, such as web server log files.
Real-time data sources, such as IoT devices.
2. Data storage:
Data for batch processing operations is typically stored in a distributed file
store that can hold high volumes of large files in various formats. This kind of store
is often called a data lake.
Options for implementing this storage include Azure Data Lake Store or blob
(object) containers in Azure Storage.
3. Batch processing:
Because the data sets are so large, often a big data solution must process data
files using long-running batch jobs to filter, aggregate, and otherwise prepare
the data for analysis.
Usually these jobs involve reading source files, processing them,
and writing the output to new files.
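As a sketch of such a read, process, write batch job, the PySpark example below (the file paths and column names are assumptions) filters and aggregates source files and writes the prepared output to new files:

    # Long-running batch job sketch in PySpark (paths/columns are illustrative).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily-batch").getOrCreate()

    orders = spark.read.csv("/data/lake/orders/*.csv", header=True, inferSchema=True)
    daily = (orders
             .filter(F.col("status") == "COMPLETED")   # filter
             .groupBy("order_date")
             .agg(F.sum("amount").alias("revenue")))   # aggregate
    daily.write.mode("overwrite").parquet("/data/lake/curated/daily_revenue")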
4. Real-time message ingestion
If the solution includes real-time sources, the architecture must include a way
to capture and store real-time messages for stream processing.
This might be a simple data store, where incoming messages are dropped into a
folder for processing.
However, many solutions need a message ingestion store to act as a buffer for
messages, and to support scale-out processing, reliable delivery, and other
message queuing semantics. Options include Azure Event Hubs, Azure IoT
Hub, and Kafka.
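To show what feeding such a buffer looks like, here is a small sketch with the kafka-python client; the broker address and topic name are assumptions:

    # Publish device readings into a Kafka ingestion buffer (pip install kafka-python).
    import json
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",                       # assumed broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Kafka buffers each message until a stream processor consumes it.
    producer.send("iot-readings", {"device_id": "sensor-7", "temp_c": 21.4})
    producer.flush()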
5. Stream processing:
After capturing real-time messages, the solution must process them
by filtering, aggregating, and otherwise preparing the data for
analysis.
The processed stream data is then written to an output sink.
EXAMPLE: Azure Stream Analytics provides a managed stream
processing service based on perpetually running SQL queries that
operate on unbounded streams.
You can also use open source Apache streaming technologies like Storm
and Spark Streaming in an HDInsight cluster.
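As a language-agnostic stand-in for those engines, the plain-Python sketch below filters and aggregates a simulated unbounded stream in tumbling 60-second windows and emits the result to an output sink; the event format is an assumption:

    # Stream-processing sketch: filter, window, aggregate, then write to a sink.
    from collections import defaultdict

    def process(stream):
        windows = defaultdict(float)      # window start -> running total
        for ts, device, temp_c in stream:
            if temp_c is None:            # filtering step: drop bad readings
                continue
            window = ts - (ts % 60)       # tumbling 60-second window
            windows[window] += temp_c     # aggregation step
        return dict(windows)              # stand-in for the output sink

    events = [(0, "s1", 20.0), (30, "s1", 22.0), (65, "s2", None), (70, "s2", 19.0)]
    print(process(events))                # {0: 42.0, 60: 19.0}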
6. Analytical data store
Many big data solutions prepare data for analysis and then serve the
processed data in a structured format that can be queried using
analytical tools. The analytical data store used to serve these queries
can be a relational data warehouse, as seen in most traditional business
intelligence (BI) solutions.
The goal of most big data solutions is to provide insights into the data
through analysis and reporting.
To empower users to analyze the data, the architecture may include a
data modeling layer, such as a multidimensional OLAP cube or tabular
data model in Azure Analysis Services.
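A pandas pivot table is a tiny stand-in for such a multidimensional cube; the dimensions (region, product) and the sales figures below are invented:

    # OLAP-style roll-up: aggregate a measure across two dimensions.
    import pandas as pd

    facts = pd.DataFrame({
        "region":  ["North", "North", "South", "South"],
        "product": ["A", "B", "A", "B"],
        "sales":   [100, 150, 80, 120],
    })

    cube = facts.pivot_table(values="sales", index="region",
                             columns="product", aggfunc="sum")
    print(cube)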
7. Orchestration
Most big data solutions consist of repeated data processing
operations, encapsulated in workflows, that transform source data,
move data between multiple sources and sinks, load the processed
data into an analytical data store, or push the results straight to a
report or dashboard.
Orchestration is the automated configuration, management, and
coordination of computer systems, applications, and
services. Orchestration helps IT to more easily manage complex
tasks and workflows.
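A toy orchestration sketch in plain Python follows; the step names are invented, and production systems would use a dedicated orchestrator (for example Azure Data Factory or Apache Airflow) rather than this hand-rolled runner:

    # Run workflow steps in dependency order (requires Python 3.9+ for graphlib).
    from graphlib import TopologicalSorter

    def extract():   print("extract source data")
    def transform(): print("transform and clean")
    def load():      print("load into analytical store")
    def report():    print("refresh dashboard")

    steps = {"extract": extract, "transform": transform, "load": load, "report": report}
    dag = {"transform": {"extract"}, "load": {"transform"}, "report": {"load"}}  # step -> prerequisites

    for name in TopologicalSorter(dag).static_order():
        steps[name]()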
Now let us look at the main components of big data.
1. Machine Learning
It is the science of making computers learn things by themselves. In machine
learning, computers perform tasks without any explicit instructions, and machine
learning applications provide results based on past experience. For example, these
days there are mobile applications that will give you a summary of your finances
and bills, remind you of your bill payments, and may also suggest saving plans.
2. Natural Language Processing (NLP)
Obvious examples that people can relate to these days are Google Home and Amazon
Alexa; both use NLP and other technologies to give us a virtual assistant
experience. NLP is all around us without us even realizing it. When we write a
mail, it automatically corrects our mistakes, these days it auto-suggests text to
complete the mail, and it alerts us when we try to send an email without the
attachment that we referenced in the text of the email; this is all part of NLP.
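A toy rule-based version of that missing-attachment alert is sketched below; real mail clients use more sophisticated NLP, and the keyword list is an assumption:

    # Flag an email that mentions an attachment but carries none.
    import re

    def mentions_attachment(body):
        return re.search(r"\b(attached|attachment|enclosed)\b", body, re.IGNORECASE) is not None

    body = "Hi, please find the attached report for Q3."
    has_attachment = False  # in practice this would come from the mail client

    if mentions_attachment(body) and not has_attachment:
        print("Warning: you mentioned an attachment but none is attached.")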
3. Business Intelligence
Business Intelligence (BI) is a method or process that is technology-driven to gain
insights by analyzing data and presenting it in a way that end-users (usually
high-level executives like managers and corporate leaders) can draw actionable
insights from.
4. Cloud Computing
We can define cloud computing as the delivery of computing services (servers,
storage, databases, networking, software, analytics and more) over the Internet
("the cloud") to offer faster innovation, flexible resources, and economies of
scale.
Advantages and Disadvantages
Advantages:
1. Cost Savings: Some tools of Big Data like Hadoop and Cloud-Based Analytics
can bring cost advantages to business when large amounts of data are to be stored
and these tools also help in identifying more efficient ways of doing business.
2. Time Reductions: The high speed of tools like Hadoop and in-memory analytics
can easily identify new sources of data, which helps businesses analyze data
immediately and make quick decisions based on the learnings.
3. Understand the market conditions: By analyzing big data you can get a better
understanding of current market conditions. For example, by analyzing customers’
purchasing behaviors, a company can find out the products that are sold the most
and produce products according to this trend. By this, it can get ahead of its
competitors.
4. Control online reputation: Big data tools can do sentiment analysis. Therefore,
you can get feedback about who is saying what about your company. If you want
to monitor and improve the online presence of your business, then, big data tools
can help in all this.
5. Using Big Data Analytics to Boost Customer Acquisition and Retention
The customer is the most important asset any business depends on. There is no
single business that can claim success without first having to establish a solid
customer base. However, even with a customer base, a business cannot afford to
disregard the high competition it faces. If a business is slow to learn what
customers are looking for, then it is very easy to begin offering poor quality
products. In the end, loss of clientele will result, and this creates an adverse overall
effect on business success. The use of big data allows businesses to observe
various customer related patterns and trends. Observing customer behaviour is
important to trigger loyalty.
6. Using Big Data Analytics to Solve Advertisers' Problems and Offer Marketing
Insights
Big data analytics can help change all business operations. This includes the ability
to match customer expectations, change the company's product line and, of course,
ensure that the marketing campaigns are powerful.
7. Big Data Analytics As a Driver of Innovations and Product Development
Another huge advantage of big data is the ability to help companies innovate and
redevelop their products.
> Simple classification can be: financial, HR, sales, inventory, and communications.
> Once organizations better understand their data, they can take important steps to
segregate the information, which makes it easier to employ security measures like
encryption and monitoring more effectively.
Protecting Big Data Analytics:
A real concern with Big Data is the fact that Big Data contains all of the things
you don't want to see when you are trying to protect data: very unique sample sets,
etc.
Such uniqueness also means that you can't leverage time-saving backup and
security technologies such as deduplication.
A significant issue is the large size and number of files involved in a Big Data
Analytics environment. Backup bandwidth and/or the backup appliance must
be large, and the receiving devices must be able to ingest data at the rate at
which it is delivered.
Hadoop distributions:
Big data platforms based on Hadoop are market newcomers that have appeared
within the past several years.
The primary vendors in this space (MapR, Hortonworks, and Cloudera) run
Hadoop as their core data processing platform, which they supplement with a
range of open source software and, in some cases, proprietary software.
Cloud managed services:
This category includes pure-play cloud service providers that manage and
operate big data platforms on behalf of subscribers in the cloud.
More than a platform-as-a-service, a cloud managed service lets customers
focus solely on analyzing data and building data-driven applications rather than
data infrastructure.
In addition, cloud managed services provide a quick and easy way for
customers without information technology experts or available servers to try
out or deploy a big data platform.
Leading cloud managed service providers include Altiscale, Qubole,
Treasure Data, Cazena, and Amazon Web Services (AWS).
Challenges of Conventional System
The challenges when dealing with Big Data fall into three dimensions: data,
process, and management.
Data Challenge
Volume
The volume of data, especially machine-generated data, is exploding, and it
is growing faster every year as new sources of data emerge. For example, in
the year 2000, 800,000 petabytes (PB) of data were stored in the world, and
this was expected to reach 35 zettabytes (ZB) by 2020 (according to IBM).
Social media plays a key role: Twitter generates 7+ terabytes (TB) of data
every day. Facebook, 10 TB.
Mobile devices play a key role as well, as there were estimated 6 billion
mobile phones in 2011.
The challenge is how to deal with the size of Big Data.
PROCESSING
Variety, Combining Multiple Data Sets
More than 80% of today’s information is unstructured and it is typically
too big to manage effectively.
Today, companies are looking to leverage a lot more
data from a wider variety of sources both inside and outside the
organization.
Things like documents, contracts, machine data, sensor data, social media,
health records, emails, etc. The list is endless really.
MANAGEMENT
A lot of this data is unstructured, or has a complex structure that's hard to
represent in rows and columns.
1. R:
R is the leading analytics tool in the industry and is widely used for statistics
and data modeling. It can easily manipulate your data and present it in different
ways. It has exceeded SAS in many ways, such as capacity of data, performance and
outcomes. R compiles and runs on a wide variety of platforms, viz. UNIX, Windows
and macOS. It has 11,556 packages and allows you to browse the packages by
category. R also provides tools to automatically install all packages as per user
requirements, and it also works well with Big Data.
2. Tableau Public:
Tableau Public is free software that connects to any data source, be it a corporate
Data Warehouse, Microsoft Excel or web-based data, and creates data
visualizations, maps, dashboards, etc. with real-time updates presented on the web.
These can also be shared through social media or with the client, and it allows
access to download the file in different formats. To see the power of Tableau, you
must have a very good data source. Tableau's Big Data capabilities make it
important, and with it one can analyze and visualize data better than with any
other data visualization software in the market.
QlikView:
QlikView has many unique features like patented technology and has in-memory
data processing, which executes the result very fast to the end users and stores
the data in the report itself. Data association in QlikView is automatically
maintained and can be compressed to almost 10% from its original size. Data
relationship is visualized using colors – a specific color is given to related data
and another color for non-related data.
10. Splunk:
Splunk is a tool that analyzes and searches machine-generated data. Splunk pulls
in all text-based log data and provides a simple way to search through it; a user
can pull in all kinds of data, perform all sorts of interesting statistical
analysis on it, and present it in different formats.
Reporting vs Analysis
Living in the era of digital technology and big data has made organizations
dependent on the wealth of information data can bring. You might have seen how
reporting and analysis are used interchangeably, especially in the manner in which
outsourcing companies market their services. While both areas are part of web
analytics (note that analytics isn't the same as analysis), there's a vast
difference between them, and it's more than just spelling.
It’s important that we differentiate the two because some organizations might be
selling themselves short in one area and not reap the benefits, which web analytics
can bring to the table. The first core component of web analytics, reporting, is
merely organizing data into summaries. On the other hand, analysis is the process
of inspecting, cleaning, transforming, and modeling these summaries (reports)
with the goal of highlighting useful information.
Simply put, reporting translates data into information while analysis turns
information into insights. Also, reporting should enable users to ask “What?”
questions about the information, whereas analysis should answer "Why?" and
"What can we do about it?"
1. Purpose
Reporting has helped companies monitor their data since even before digital
technology boomed. Various organizations have depended on the information it
brings to their business, as reporting extracts it and makes it easier to
understand.
Analysis interprets data at a deeper level. While reporting can link cross-channels
of data, provide comparisons, and make information easier to understand (think of
dashboards, charts, and graphs, which are reporting tools and not analysis
reports), analysis interprets this information and provides recommendations on
actions.
2. Tasks
As reporting and analysis have a very fine line dividing them, sometimes it’s easy
to confuse tasks that have analysis labeled on top of them when all it does is
reporting. Hence, ensure that your analytics team has a healthy balance doing
both.
3. Outputs
Reporting and analysis have a push and a pull effect on their users through their
outputs. Reporting has a push approach, as it pushes information to users, and its
outputs come in the form of canned reports, dashboards, and alerts.
Analysis has a pull approach, where a data analyst draws information to further
probe and to answer business questions. Outputs from such can be in the form of
ad hoc responses and analysis presentations. Analysis presentations consist of
insights, recommended actions, and a forecast of their impact on the company, all
in a language that's easy to understand at the level of the user who'll be reading
and deciding on it.
This is important for organizations to truly realize the value of data: a standard
report is not the same as meaningful analytics.
4. Delivery
Considering that reporting involves repetitive tasks, often with truckloads of
data, automation has been a lifesaver, especially now with big data. It's not
surprising that among the first things outsourced are data entry services, since
outsourcing companies are perceived as data reporting experts.
Analysis requires a more custom approach, with human minds doing superior
reasoning and analytical thinking to extract insights, and technical skills to provide
efficient steps towards accomplishing a specific goal. This is why data analysts
and scientists are in demand these days, as organizations depend on them to come
up with recommendations that help leaders and business executives make decisions
about their businesses.
5. Value
This isn't about identifying which one brings more value, but rather about
understanding that both are indispensable when looking at the big picture.
Together, they should help businesses grow, expand, move forward, and make more
profit or increase their value.
The Path to Value idea illustrates how data converts into value through reporting
and analysis, such that one isn't achievable without the other.
Data alone is useless, and action without data is baseless. Both reporting and
analysis are vital to bringing value to your data and operations.
Reporting and Analysis are Valuable
Not to undermine the role of reporting in web analytics, but organizations need to
understand that reporting itself is just numbers. Without drawing insights and
getting reports aligned with your organization’s big picture, you can’t make
decisions based on reports alone.
Data analysis is the most powerful tool to bring into your business. Employing
the powers of analysis can be comparable to finding gold in your reports, which
allows your business to increase profits and further develop.