Hadoop
Hadoop HDFS
The Hadoop Distributed File System (HDFS) is Hadoop's storage layer. Files are split into fixed-size blocks, and those blocks are distributed and stored across the cluster's slave machines (DataNodes) running on multiple servers.
HDFS divides large files into blocks of 128 MB (the default block size) and replicates each block three times by default. Replica placement follows two rules; a short sketch after the rules shows how these settings can be inspected and changed programmatically:
Two replicas of the same block cannot be placed on the same DataNode
When a cluster is rack aware, not all replicas of a block can be placed on the same rack
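As a rough illustration, the sketch below uses the Hadoop Java FileSystem API to read the cluster's default block size and replication factor and to change the replication factor for a single file; the path /data/example.txt is a hypothetical placeholder.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSettings {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/example.txt"); // hypothetical example file

        // Cluster defaults: typically 128 MB blocks and 3 replicas unless overridden.
        System.out.println("Default block size: " + fs.getDefaultBlockSize(file) + " bytes");
        System.out.println("Default replication: " + fs.getDefaultReplication(file));

        // Replication can also be changed per file; the NameNode then schedules
        // the creation or removal of replicas to match the new factor.
        fs.setReplication(file, (short) 2);
    }
}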
In this example, blocks A, B, C, and D are replicated three times and placed on different racks. If DataNode 7 crashes, we still have two copies of block C data on DataNode 4 of Rack 1 and DataNode 9 of Rack 3.
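To see the placement rules in action, a client can ask the NameNode where each block's replicas live. The sketch below, again using a hypothetical path, prints the DataNodes hosting every block of a file.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/data/example.txt")); // hypothetical example file

        // One BlockLocation per block; getHosts() lists the DataNodes holding its replicas.
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("Block at offset " + block.getOffset()
                    + " replicated on: " + String.join(", ", block.getHosts()));
        }
    }
}

With the default factor of three, each line should list three DataNodes, spread across at least two racks on a rack-aware cluster.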
There are three components of the Hadoop Distributed File System; a short read sketch after the list shows how a client touches them:
NameNode (a.k.a. master node): Holds the file system metadata, both in RAM and on disk
Secondary NameNode: Periodically merges the NameNode's edit log into the fsimage, maintaining a checkpointed copy of the NameNode's metadata on disk
Slave node (DataNode): Stores the actual data in the form of blocks
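The sketch below, reusing the same hypothetical file, shows how a client touches these components during a read: the metadata lookup is answered by the NameNode from its in-memory namespace, while the file's bytes are streamed from the DataNodes that hold its blocks.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/example.txt"); // hypothetical example file

        // Served by the NameNode: file length, replication factor, block size.
        FileStatus status = fs.getFileStatus(file);
        System.out.println(status.getLen() + " bytes, replication " + status.getReplication());

        // Served by the DataNodes: the block contents themselves.
        try (FSDataInputStream in = fs.open(file);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            System.out.println(reader.readLine());
        }
    }
}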