Hadoop HDFS

The Hadoop Distributed File System (HDFS) is Hadoop's storage layer. A file stored in HDFS is split into blocks (how many depends on the file's size), and those blocks are then distributed and stored across the cluster's slave machines.

In the Hadoop architecture, HDFS divides large data sets into blocks. Each block holds 128 MB of data (the default block size), and each block is replicated three times by default. Replica placement follows two rules:

  1. Two replicas of the same block cannot be placed on the same DataNode

  2. When a cluster is rack-aware, all the replicas of a block cannot be placed on the same rack
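
To make those defaults concrete, here is a minimal Java sketch using the Hadoop FileSystem API to write a file with an explicit replication factor of 3 and a 128 MB block size. The NameNode address and file path are placeholder values, not something from the article; in practice they come from core-site.xml and your own layout.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address; normally picked up from core-site.xml.
            conf.set("fs.defaultFS", "hdfs://namenode:8020");

            FileSystem fs = FileSystem.get(conf);

            // Replication factor and block size matching the HDFS defaults
            // (dfs.replication = 3, dfs.blocksize = 128 MB); this create()
            // overload lets you override them per file.
            short replication = 3;
            long blockSize = 128L * 1024 * 1024;
            Path path = new Path("/user/demo/sample.txt"); // placeholder path

            try (FSDataOutputStream out =
                     fs.create(path, true, 4096, replication, blockSize)) {
                out.writeUTF("HDFS splits this file into 128 MB blocks and replicates each block.");
            }

            fs.close();
        }
    }

The same defaults can also be changed cluster-wide through the dfs.replication and dfs.blocksize properties in hdfs-site.xml.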

For example, suppose blocks A, B, C, and D are each replicated three times and spread across different racks. If DataNode 7 crashes, two copies of block C still survive: one on DataNode 4 in Rack 1 and one on DataNode 9 in Rack 3.
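
If you want to see where a file's block replicas actually landed, the FileSystem API can report block locations. The sketch below is illustrative only; the path and cluster configuration are assumed, not taken from the article.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationsExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            Path path = new Path("/user/demo/sample.txt"); // placeholder path
            FileStatus status = fs.getFileStatus(path);

            // Ask the NameNode which DataNodes hold each block of the file.
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

            for (BlockLocation block : blocks) {
                System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
            }

            fs.close();
        }
    }

Each line of output lists the DataNodes holding one block's replicas, so a healthy file with the default settings should show three hosts per block.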

There are three components of the Hadoop Distributed File System:  

  1. NameNode (also called the master node): Holds the file system metadata, both in RAM and on disk

  2. Secondary NameNode: Keeps a checkpointed copy of the NameNode's metadata on disk

  3. Slave node (DataNode): Stores the actual data in the form of blocks
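
As a small illustration of the NameNode's role as the metadata store, the fragment below asks for a file's status (replication factor, block size, length, owner). This request is answered from the NameNode's metadata alone; no DataNode is contacted because no block data is read. The path is again a placeholder.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class NameNodeMetadataExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Served entirely from NameNode metadata; no block data is read.
            FileStatus st = fs.getFileStatus(new Path("/user/demo/sample.txt")); // placeholder path

            System.out.println("replication = " + st.getReplication());
            System.out.println("block size  = " + st.getBlockSize() + " bytes");
            System.out.println("length      = " + st.getLen() + " bytes");
            System.out.println("owner       = " + st.getOwner());

            fs.close();
        }
    }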
