Lab Assignment 1: MapReduce / Hadoop
Notes
You can work on this assignment in teams of two.
Objectives
• Understand the MapReduce programming model.
Overview
In this assignment, you will install Hadoop on both a single-node cluster and a multi-node cluster. Next,
you will practice running a few HDFS commands and executing Hadoop jobs. You can use the following
command to download Hadoop to your machine:
wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
You can extract the downloaded file using:
tar -xzvf hadoop-2.7.3.tar.gz
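If the extraction succeeded, the bundled hadoop script should run and report its version. A quick sanity
check, assuming you extracted the archive into your current directory:
cd hadoop-2.7.3
bin/hadoop version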
Setting up Hadoop
• You will need to download the latest stable version of Hadoop (2.7.3) from this link:
http://hadoop.apache.org/releases.html.
• Set up the downloaded Hadoop version on your machine by following these steps:
https://goo.gl/8KVyGJ. You can do this step before the lab time. (A sketch of the typical
environment variables appears after this list.)
• During the lab, you will need to set up Hadoop on a cluster of machines using the following
steps: https://goo.gl/KLyzFU. You can use one of the following options to set up the
Hadoop cluster: (1) AWS EC2 instances; (2) lab machines; or (3) your laptops.
Useful resources: this tutorial can help you set up Hadoop: Part I and Part II.
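For reference, the single-node setup largely comes down to pointing Hadoop at your Java installation and
putting the Hadoop scripts on your PATH. A minimal sketch, assuming Hadoop was extracted to
/home/userid/hadoop-2.7.3 and Java lives under /usr/lib/jvm/java-8-openjdk-amd64 (both paths are
assumptions; adjust them to your machine):

# Append to ~/.bashrc, then reload it with: source ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed Java location
export HADOOP_HOME=/home/userid/hadoop-2.7.3         # assumed extraction path
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

The linked guides also have you set JAVA_HOME in etc/hadoop/hadoop-env.sh and edit core-site.xml and
hdfs-site.xml before formatting the NameNode with hdfs namenode -format.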
HDFS
• Create a directory called input in your home directory.
• Download the following text files from the Gutenberg project, in Plain Text UTF-8 format (hint:
you can use wget; example commands appear after this list):
– Ulysses by James Joyce
– The Art of War by Sunzi (6th cent. B.C.)
– The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle
– Encyclopaedia Britannica, 11th Edition, "Bréquigny, Louis Georges Oudard Feudrix"
• Store the downloaded files in the input directory on your machine.
• Create a new file mydata.txt in the input directory. Open the file and write the following line
to it: CS432 FirstStudentID SecondStudentID. Repeat this line four times in the file (a one-line
shell version appears after this list).
• Copy the input directory from your local disk to HDFS. You can use the command:
hadoop fs -copyFromLocal /home/userid/input /home/userid/input. The first path is the
source, which is on your local disk. The second path is the destination, which is on HDFS.
• Now verify that the files were copied using this command:
hadoop fs -ls /home/userid/input
• You need to build the WordCount example described in this tutorial. Name the created jar file
wc.jar, then run the job on the input directory in HDFS (build-and-run commands appear after
this list).
• In the job's output, check that CS432 and your SIDs are each counted four times. Check the
word count for various words that appeared in the input files (an example check appears after
this list).
• Finally, update WordCount to filter the words based on a second input file (a dictionary), so
that only the words that appear in that dictionary file are counted (see the sketch after this
list).
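Example download commands for the texts above. This is a sketch: the Project Gutenberg ebook numbers and
file names below are my best guesses, so confirm the Plain Text UTF-8 link on each book's page before
relying on them:

mkdir -p ~/input && cd ~/input
wget http://www.gutenberg.org/files/4300/4300-0.txt -O ulysses.txt          # Ulysses
wget http://www.gutenberg.org/files/132/132-0.txt -O art-of-war.txt         # The Art of War
wget http://www.gutenberg.org/files/1661/1661-0.txt -O sherlock-holmes.txt  # Sherlock Holmes
# Fetch the Britannica slice the same way once you look up its ebook number on gutenberg.org.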
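The repeated line in mydata.txt can be written with one shell loop (replace the placeholder IDs with your
own):

for i in 1 2 3 4; do echo "CS432 FirstStudentID SecondStudentID" >> ~/input/mydata.txt; done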
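To build and run WordCount, the MapReduce Tutorial listed under Resources uses the commands below; they
assume WordCount.java (with no package declaration) sits in your current directory and JAVA_HOME is set:

export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
hadoop com.sun.tools.javac.Main WordCount.java     # compile against the Hadoop classpath
jar cf wc.jar WordCount*.class                     # package the classes as wc.jar
hadoop jar wc.jar WordCount /home/userid/input /home/userid/output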
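To spot-check the counts, read the reducer output from HDFS (part-r-00000 is the default name of the
first reducer's output file):

hadoop fs -cat /home/userid/output/part-r-00000 | grep CS432

CS432 should appear with a count of 4, and so should each of your student IDs, provided they occur
nowhere else in the input.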
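For the dictionary filter, the WordCount2 example in the same tutorial shows the general pattern: it
ships a patterns file to every mapper through the distributed cache and reads it in the Mapper's setup()
method. One way to adapt that pattern here, sketched under the assumption that you invert the tutorial's
skip list into a keep list: load the dictionary file into a Set in setup(), and in map() emit a word only
if the set contains it. If your driver parses generic options (via GenericOptionsParser or ToolRunner, as
the tutorial's code does), the -files option will place the dictionary in each task's working directory;
dict.txt and output2 below are assumed names:

hadoop jar wc.jar WordCount -files /home/userid/dict.txt /home/userid/input /home/userid/output2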
Resources
• HDFS shell commands
• MapReduce Tutorial