Introduction to Hadoop: What Hadoop is and why it is important in today's data-driven world.
What is Hadoop?
In today's data-driven world, millions of data are generated in seconds. It's getting tough to store and analyze the structure and unstructured data. But with the help of Hadoop, we can quickly get this job done.
Hadoop has two Main Layers
- Storage: HDFS ( Hadoop Distributed File System)
- Processing of Data: MapReduce
HDFS
HDFS is simply storage for big data, The ability of HDFS to store the data in a distributed manner gives us the power to store large amounts of Data.
Suppose you have 4TB of data that you need to store, Now HDFS spits the data into 1TB and stores it in clusters.
MapReduce
MapReduce is for the Data Processing of Large amounts of data. So MapReduce processes the data in different machines or Nodes. So we can imagine our 4TB of data processed in multiple machines or DataNodes in chunks at the same time, when the processing is done on different clusters the results are then aggregated to give the final output.