BIG DATA Technology: Subtitle
What is Big Data?
▪ 'Big Data' is data of very large size. The term describes collections
of data that are already huge and are still growing exponentially with
time.
▪ Such data is so large and complex that traditional data management
tools cannot store or process it efficiently.
What comes under Big Data?
▪ Big data involves the data produced by different devices and applications.
Given below are some of the fields that come under the umbrella of Big
Data.
▪ Black Box Data
▪ Social Media Data
▪ Stock Exchange Data
▪ Power Grid Data
▪ Search Engine Data
Types of Data
▪ Big Data involves huge volume, high velocity, and a wide variety
of data. The data falls into three types.
▪ Structured data: relational data.
▪ Semi-structured data: XML data.
▪ Unstructured data: Word, PDF, text, media logs.
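The three types above can be illustrated with a minimal Python sketch. The sample records, field names, and variable names are made up for illustration; CSV text stands in for a relational table.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, as in a relational table (CSV stands in here).
structured = "id,name,amount\n1,Asha,250\n2,Ben,125\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(int(r["amount"]) for r in rows)

# Semi-structured: XML carries its own tags, but fields may vary per record.
semi = "<orders><order id='1'><item>pen</item></order><order id='2'/></orders>"
root = ET.fromstring(semi)
order_ids = [o.get("id") for o in root.findall("order")]
items = [o.findtext("item") for o in root.findall("order")]  # None if missing

# Unstructured: free text with no schema; any structure must be inferred.
unstructured = "Server restarted at noon. Disk usage was high before restart."
word_count = len(unstructured.split())

print(total, order_ids, items, word_count)
```

Note how the structured rows can be summed directly, the XML records need a check for missing fields, and the free text only yields a crude word count without further processing.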
Importance of Big Data:
▪ The importance of big data lies not in how much data you have
but in what you do with it. You can take data from any
source and analyze it to find answers that enable:
1) cost reductions,
2) time reductions,
3) new product development and optimized offerings,
4) smart decision making.
Importance of Big Data: (cont.)
▪ When you combine big data with high-powered analytics, you can
accomplish business-related tasks such as:
1. Determining root causes of failures, issues and defects in near-real
time.
2. Generating coupons at the point of sale based on the customer’s
buying habits.
3. Recalculating entire risk portfolios in minutes.
4. Detecting fraudulent behavior before it affects your organization.
4 V’s of Big Data / Characteristics of Big Data
Volume
▪ Since more than one client may access the same data simultaneously, the
server must have a mechanism in place (such as maintaining information
about the times of access) to organize updates, so that each client always
receives the most current version of the data and data conflicts do not
arise.
▪ Distributed file systems typically use file or database replication
(distributing copies of data on multiple servers) to protect against data
access failures.
▪ Sun Microsystems' Network File System (NFS), Novell NetWare,
Microsoft's Distributed File System, and IBM/Transarc's DFS are some
examples of distributed file systems.
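The two ideas above, ordering updates so clients see the current version and replicating data across servers, can be sketched in a few lines of Python. This is an in-memory toy, not how NFS or any real distributed file system is implemented. The class names `Replica` and `TinyDFS` are hypothetical, and a version counter is used in place of access times as the ordering mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One server's copy of a file (hypothetical, in-memory only)."""
    name: str
    online: bool = True
    version: int = 0
    content: str = ""


@dataclass
class TinyDFS:
    replicas: list = field(default_factory=list)

    def write(self, content, based_on_version):
        """Accept a write only if the client saw the latest version."""
        current = max(r.version for r in self.replicas)
        if based_on_version != current:
            return False  # stale write: the client must re-read first
        for r in self.replicas:
            if r.online:
                r.version = current + 1
                r.content = content
        return True

    def read(self):
        """Return (version, content) from the freshest online replica."""
        live = [r for r in self.replicas if r.online]
        if not live:
            raise RuntimeError("no replica available")
        best = max(live, key=lambda r: r.version)
        return best.version, best.content


dfs = TinyDFS([Replica("server-a"), Replica("server-b")])
first_ok = dfs.write("v1", based_on_version=0)     # accepted
second_ok = dfs.write("late", based_on_version=0)  # rejected as stale
dfs.replicas[0].online = False                     # simulate a server failure
latest = dfs.read()                                # still served by server-b
print(first_ok, second_ok, latest)
```

The second write is refused because it was based on an outdated version, and the read still succeeds after one server goes offline because the other replica holds a copy.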
Benefits of DFS:
▪ Resource management
users access all resources through a single point
▪ Accessibility
users do not need to know the physical location of the shared folder; they
can navigate to it through Explorer and the domain tree
▪ Fault tolerance
shares can be replicated, so if the server in Chicago goes down,
resources will still be available to users
▪ Workload management
DFS allows administrators to distribute shared folders and workloads
across several servers for more efficient use of network and server resources.
Big Data Analytics
• descriptive analytics
• diagnostic analytics
• predictive analytics
• prescriptive analytics
Types of Big Data analytics (cont.)
Descriptive Analytics
▪ Descriptive analytics is mostly about summarizing and reporting data.
▪ This type of data analytics is geared towards what is currently happening or
what has already happened.
▪ Descriptive analytics is often carried out via ad-hoc reporting or dashboards.
▪ The reports are generally static in nature and display historical data that is
presented in the form of data grids or charts.
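As a small illustration of the summarizing and reporting described above, the sketch below builds a static report from historical records. The sales figures, field layout, and the `summarize` function are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical sales records: (month, region, revenue).
sales = [
    ("Jan", "North", 120.0),
    ("Jan", "South", 80.0),
    ("Feb", "North", 150.0),
    ("Feb", "South", 90.0),
]


def summarize(records):
    """Group revenue by month and compute the total and average."""
    by_month = defaultdict(list)
    for month, _region, revenue in records:
        by_month[month].append(revenue)
    return {m: {"total": sum(v), "average": mean(v)} for m, v in by_month.items()}


report = summarize(sales)

# Print the summary as a simple data grid.
print(f"{'Month':<6}{'Total':>10}{'Average':>10}")
for month, stats in report.items():
    print(f"{month:<6}{stats['total']:>10.2f}{stats['average']:>10.2f}")
```

The output only describes what has already happened; it makes no attempt to explain or predict.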
Diagnostic Analytics
The Mapper class takes the input, tokenizes it, and maps and sorts it. The
output of the Mapper class is used as input by the Reducer class, which in
turn finds matching pairs and reduces them.
MapReduce uses various mathematical algorithms to divide a task into
small parts and assign them to multiple systems.
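The Mapper and Reducer roles described above can be sketched as a word count in plain Python. This is a single-process illustration, not the Hadoop API: the function names `mapper`, `shuffle`, `reducer`, and `map_reduce` are chosen for the example, and a real framework would spread these steps across multiple systems.

```python
from collections import defaultdict


def mapper(line):
    """Tokenize one input line and emit a (word, 1) pair per token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Sort the pairs by key and group the values that share a key."""
    grouped = defaultdict(list)
    for key, value in sorted(pairs):
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Reduce all values for one key to a single count."""
    return key, sum(values)


def map_reduce(lines):
    """Run the map, shuffle, and reduce steps in sequence."""
    pairs = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())


counts = map_reduce(["big data is big", "data is growing"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both steps could be handed to separate machines, which is the point of dividing the task into small parts.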