Big Data
In order to understand 'Big Data', you first need to know what data is.
Systems that process and store big data have become a common component of data
management architectures in organizations. Big data is often characterized by the 3Vs: the
large volume of data in many environments, the wide variety of data types stored in big data systems
and the velocity at which the data is generated, collected and processed. These characteristics were
first identified by Doug Laney, then an analyst at Meta Group Inc., in 2001; Gartner further popularized
them after it acquired Meta Group in 2005. More recently, several other Vs have been added to
different descriptions of big data, including veracity, value and variability.
Although big data doesn't equate to any specific volume of data, big data deployments often involve
terabytes (TB), petabytes (PB) and even exabytes (EB) of data captured over time.
Companies use the big data accumulated in their systems to improve operations, provide
better customer service, create personalized marketing campaigns based on specific
customer preferences and, ultimately, increase profitability. Businesses that utilize big data
hold a potential competitive advantage over those that don't since they're able to make faster
and more informed business decisions, provided they use the data effectively.
Big data is also used by medical researchers to identify disease risk factors and by doctors to
help diagnose illnesses and conditions in individual patients. In addition, data derived from
electronic health records (EHRs), social media, the web and other sources provides
healthcare organizations and government agencies with up-to-the-minute information on
infectious disease threats or outbreaks.
In the energy industry, big data helps oil and gas companies identify potential drilling locations
and monitor pipeline operations; likewise, utilities use it to track electrical grids. Financial
services firms use big data systems for risk management and real-time analysis of market
data. Manufacturers and transportation companies rely on big data to manage their supply
chains and optimize delivery routes. Other government uses include emergency response,
crime prevention and smart city initiatives.
Characteristics Of Big Data
(i) Volume – The name 'Big Data' itself refers to a size which is enormous. The size of data plays a
very crucial role in determining its value. Whether particular data can actually be considered Big
Data or not also depends on its volume. Hence, 'Volume' is one characteristic which needs to be
considered while dealing with Big Data.
(ii) Variety – Variety refers to heterogeneous sources and the nature of data, both structured and unstructured.
In earlier days, spreadsheets and databases were the only sources of data considered by most
applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs,
audio, etc. is also considered in analysis applications. This variety of unstructured data
poses certain issues for storing, mining and analyzing data.
(iii) Velocity – The term 'velocity' refers to the speed of generation of data. How fast the data
is generated and processed to meet demand determines the real potential in the data.
Big Data velocity deals with the speed at which data flows in from sources like business
processes, application logs, networks, social media sites, sensors and mobile devices.
The flow of data is massive and continuous (a small Python sketch after this list illustrates the idea).
(iv) Variability – This refers to the inconsistency the data can show at times, which hampers
the process of handling and managing the data effectively.
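To make velocity concrete, here is a minimal Python sketch of stream-style processing. It is illustrative only: the sensor_stream generator and its reading range are invented stand-ins for a real feed such as application logs or a message queue.

import random
from collections import deque

def sensor_stream(n):
    # Invented stand-in for a continuous feed (sensors, logs, social media).
    for _ in range(n):
        yield random.uniform(20.0, 30.0)  # simulated temperature reading

window = deque(maxlen=100)  # keep only the 100 most recent readings
for reading in sensor_stream(1000):
    window.append(reading)
    rolling_avg = sum(window) / len(window)  # updated as each record arrives

print(f"rolling average of last {len(window)} readings: {rolling_avg:.2f}")

The point of the sketch is that the data never sits still: each record must be handled as it arrives, which is exactly the demand velocity places on big data systems.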
Types Of Big Data
'Big Data' can be found in three forms:
1. Structured
2. Unstructured
3. Semi-structured
Structured
Any data that can be stored, accessed and processed in a fixed format is termed 'structured'
data. Over time, computer science talent has achieved great success in developing techniques for
working with such data (where the format is well known in advance) and deriving value from it.
However, nowadays we are foreseeing issues as the size of such data grows to a huge extent,
with typical sizes in the range of multiple zettabytes.
Do you know? 10^21 bytes, or one billion terabytes, form one zettabyte.
Looking at these figures one can easily understand why the name Big Data is given
and imagine the challenges involved in its storage and processing.
Do you know? Data stored in a relational database management system is one example of
'structured' data.
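As a small illustration, the following Python sketch uses the standard sqlite3 module; the employee table and its sample rows are made up for the example. Because the format is fixed and known in advance, records can be stored and queried directly with SQL.

import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("A. Rao", "Finance", 65000.0), ("S. Mane", "Admin", 48000.0)],
)
# Because the schema is fixed and known in advance, fields can be queried directly.
for name, dept in conn.execute("SELECT name, dept FROM employee WHERE salary > 50000"):
    print(name, dept)
conn.close()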
Unstructured
Any data with an unknown form or structure is classified as unstructured data. In addition to its
huge size, unstructured data poses multiple challenges in terms of processing it to derive value.
A typical example of unstructured data is a heterogeneous data source containing a
combination of simple text files, images, videos, etc. Nowadays organizations have a wealth of data
available to them but, unfortunately, they don't know how to derive value from it since this data is
in its raw form or unstructured format.
Semi-structured
Semi-structured data can contain both the forms of data. We can see semi-structured
data as a structured in form but it is actually not defined with e.g. a table definition in
relational DBMS. Example of semi-structured data is a data represented in an XML
file.
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
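Records like these carry their tags with them instead of conforming to a predefined table, so a program discovers the fields while reading. Here is a minimal sketch using Python's standard xml.etree.ElementTree module (the records are wrapped in a <recs> root element, since well-formed XML requires one):

import xml.etree.ElementTree as ET

records = """<recs>
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
</recs>"""

root = ET.fromstring(records)
for rec in root.findall("rec"):
    # Fields are read by tag name; no table definition exists up front.
    print(rec.findtext("name"), rec.findtext("age"))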
The need to handle big data velocity imposes unique demands on the underlying compute
infrastructure. The computing power required to quickly process huge volumes and varieties of data
can overwhelm a single server or server cluster. Organizations must apply adequate processing
capacity to big data tasks in order to achieve the required velocity. This can potentially demand
hundreds or thousands of servers that can distribute the processing work and operate collaboratively
in a clustered architecture, often based on technologies like Hadoop and Apache Spark.
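As a hedged sketch of that clustered approach, the following uses Apache Spark's Python API; it assumes pyspark is installed and that a log file named app.log exists, both of which are assumptions for illustration. Run locally it uses all available cores, and pointed at a cluster manager such as YARN the same code distributes the work across many servers.

from pyspark.sql import SparkSession

# local[*] uses all local cores; on a real cluster the master URL would
# point at YARN or another resource manager so work is distributed.
spark = SparkSession.builder.master("local[*]").appName("log-counts").getOrCreate()

lines = spark.read.text("app.log")  # hypothetical log file
errors = lines.filter(lines.value.contains("ERROR"))
print("error lines:", errors.count())  # the count runs in parallel across partitions
spark.stop()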
Achieving such velocity in a cost-effective manner is also a challenge. Many enterprise leaders are
reluctant to invest in an extensive server and storage infrastructure to support big data workloads,
particularly ones that don't run 24/7. As a result, public cloud computing is now a primary vehicle for
hosting big data systems. A public cloud provider can store petabytes of data and scale up the
required number of servers just long enough to complete a big data analytics project. The business
only pays for the storage and compute time actually used, and the cloud instances can be turned off
until they're needed again.
To improve service levels even further, public cloud providers offer big data capabilities through
managed services that include the following:
• lower-cost cloud object storage, such as Amazon Simple Storage Service (S3), shown in the sketch after this list;
• YARN, Hadoop's built-in resource manager and job scheduler, which stands for Yet Another Resource Negotiator but is commonly known by the acronym alone;
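As an example of the object-storage item above, here is a minimal sketch using boto3, the AWS SDK for Python. It assumes boto3 is installed and AWS credentials are configured, and the bucket and file names are hypothetical placeholders.

import boto3  # AWS SDK for Python; assumes credentials are configured

s3 = boto3.client("s3")

# Bucket and key names below are hypothetical placeholders.
s3.upload_file("clickstream.csv", "my-analytics-bucket", "raw/clickstream.csv")

# Objects can later be listed and pulled down for an analytics run.
for obj in s3.list_objects_v2(Bucket="my-analytics-bucket", Prefix="raw/")["Contents"]:
    print(obj["Key"], obj["Size"])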
Users can install the open source versions of the technologies themselves or turn to
commercial big data platforms offered by Cloudera, which merged with former rival
Hortonworks in January 2019, or Hewlett Packard Enterprise (HPE), which bought the assets
of big data vendor MapR Technologies in August 2019. The Cloudera and MapR platforms
are also supported in the cloud.
Big data can be contrasted with small data, another evolving term that's often used to describe
data whose volume and format can be easily used for self-service analytics. A commonly
quoted axiom is that "big data is for machines; small data is for people."
Volume is the most commonly cited characteristic of big data. A big data environment doesn't
have to contain a large amount of data, but most do because of the nature of the data being
collected and stored in them. Clickstreams, system logs and stream processing systems are
among the sources that typically produce massive volumes of big data on an ongoing basis.
Access to social data from search engines and sites like Facebook and Twitter is enabling
organizations to fine-tune their business strategies.
Traditional customer feedback systems are being replaced by new systems designed with Big
Data technologies. In these new systems, Big Data and natural language processing technologies
are used to read and evaluate consumer responses.
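A toy sketch of that idea in Python follows. A production system would use a trained natural language processing model; the word lists and sample feedback here are invented purely for illustration.

POSITIVE = {"great", "love", "excellent", "fast"}   # invented word lists
NEGATIVE = {"slow", "broken", "terrible", "refund"}

def score(response):
    # Count positive words minus negative words in the response.
    words = set(response.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

feedback = [
    "Great product, love the fast delivery",
    "Terrible experience, the item arrived broken",
]
for text in feedback:
    label = "positive" if score(text) > 0 else "negative"
    print(label, "->", text)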
Big Data technologies can be used for creating a staging area or landing zone for new data before
identifying what data should be moved to the data warehouse. In addition, such integration of Big
Data technologies and the data warehouse helps an organization offload infrequently accessed data.
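A minimal sketch of that staging-area pattern in Python: records land in a staging list, frequently accessed ones move on to the warehouse, and the rest are offloaded to cheaper storage. The access-count field and the threshold are assumptions made up for the example.

landing_zone = [
    {"id": 1, "accesses_last_90d": 120},  # assumed access-frequency field
    {"id": 2, "accesses_last_90d": 0},
    {"id": 3, "accesses_last_90d": 45},
]

HOT_THRESHOLD = 10  # assumed cutoff for "frequently accessed"

# Route hot records to the warehouse and offload the cold ones.
warehouse = [r for r in landing_zone if r["accesses_last_90d"] >= HOT_THRESHOLD]
cold_storage = [r for r in landing_zone if r["accesses_last_90d"] < HOT_THRESHOLD]

print("to warehouse:", [r["id"] for r in warehouse])
print("offloaded:", [r["id"] for r in cold_storage])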
USES
Big data is used in nearly every industry to identify patterns and trends, answer questions,
gain insights into customers, and tackle complex problems. Companies and organizations use the
information for a multitude of reasons like growing their businesses, understanding customer
decisions, enhancing research, making forecasts and targeting key audiences for advertising.
BIG DATA EXAMPLES
• Personalized e-commerce shopping experiences
• Financial market modeling
• Media recommendations from streaming services like Spotify, Hulu and Netflix
• Big data helping sports teams maximize their efficiency and value
• Recognizing trends in education habits from individual students, schools and districts
Here are a few industries in which the big data revolution is already underway:
Finance
The finance and insurance industries utilize big data and predictive analytics for fraud detection, risk
assessments, credit rankings, brokerage services and blockchain technology, among other uses.
Financial institutions are also using big data to enhance their cybersecurity efforts and
personalize financial decisions for customers.
Healthcare
Hospitals, researchers and pharmaceutical companies are adopting big data solutions to improve and
advance healthcare. With access to vast amounts of patient and population data, healthcare is
enhancing treatments, performing more effective research on diseases like cancer and Alzheimer’s,
developing new drugs, and gaining critical insights on patterns within population health.
Media
Media companies analyze our reading, viewing and listening habits to build individualized
experiences. Netflix even uses data on graphics, titles and colors to make decisions about customer
preferences.
Agriculture
From engineering seeds to predicting crop yields with remarkable accuracy, big data and automation are
rapidly enhancing the farming industry.
With the influx of data in the last two decades, information is more abundant than food in many
countries, leading researchers and scientists to use big data to tackle hunger and malnutrition. With
groups like the Global Open Data for Agriculture & Nutrition (GODAN) promoting open and
unrestricted access to global nutrition and agricultural data, some progress is being made in the fight
to end world hunger.
Summary
• Big Data is a term used to describe a collection of data that is huge in size and yet growing
exponentially with time.
• Examples of Big Data generation include stock exchanges, social media sites, jet engines,
etc.
• Big Data can be 1) Structured, 2) Unstructured or 3) Semi-structured
• Volume, Variety, Velocity and Variability are a few characteristics of Big Data
• Improved customer service, better operational efficiency and better decision-making are a few
advantages of Big Data