Grid Computing Lab
Globus Toolkit
Introduction
The open source Globus Toolkit is a fundamental enabling technology for the "Grid,"
letting people share computing power, databases, and other tools securely online across
corporate, institutional, and geographic boundaries without sacrificing local autonomy.
The toolkit includes software services and libraries for resource monitoring, discovery,
and management, plus security and file management.
In addition to being a central part of science and engineering projects that total nearly a
half-billion dollars internationally, the Globus Toolkit is a substrate on which leading IT
companies are building significant commercial Grid products.
1. cp /home/stack/downloads/* /usr/local
2. pwd
3. tar zxvf jdk-8u60-linux-x64.gz
cd jdk1.8.0_60/
pwd
export JAVA_HOME=/usr/local/grid/SOFTWARE/jdk1.8.0_60
export PATH=$JAVA_HOME/bin:$PATH
cd ..
4. tar zxvf apache-ant-1.9.6-bin.tar.gz
pwd
export ANT_HOME=/usr/local/grid/SOFTWARE/apache-ant-1.9.6
cd ..
5. cd apache-tomcat-7.0.67/
pwd
export CATALINA_HOME=/usr/local/grid/SOFTWARE/apache-tomcat-7.0.67
cd ..
6. unzip junit3.8.1.zip
cd junit3.8.1
pwd
export JUNIT_HOME=/usr/local/grid/SOFTWARE/junit3.8.1
cd ..
pwd
7. dpkg -i globus-toolkit-repo_latest_all.deb
8. apt-get update
Ex. No:1 Developing New Web Service for Calculator
Date :
Aim:
To develop a new web service for calculator using the Globus Toolkit.
Procedure :
When you start the Globus Toolkit container, a number of services start up.
The service for this task will be a simple Math service that can perform basic arithmetic for a client.
The Math service will access a resource with two properties:
1. An integer value that can be operated upon by the service
2. A string value that holds a string describing the last operation
The service itself will have three remotely accessible operations that operate upon value:
(a) add, which adds a to the resource property value.
(b) subtract, which subtracts a from the resource property value.
(c) getValueRP, which returns the current value of value.
Usually, the best way for any programming task is to begin with an overall description of what
you want the code to do, which in this case is the service interface. The service interface
describes what the service provides in terms of the names of operations, their arguments, and
return values.
public interface Math {
    public void add(int a);
    public void subtract(int a);
    public int getValueRP();
}
It is possible to start with this interface and create the necessary WSDL file using the
standard Web service tool called Java2WSDL. However, the WSDL file for GT 4 has to include
details of resource properties that are not given explicitly in the interface above.
Hence, we will provide the WSDL file.
Step 1 – Getting the Files
All the required files are provided and come directly from [1]. The MathService source code files can be found at http://www.gt4book.com (http://www.gt4book.com/downloads/gt4book-examples.tar.gz).
A Windows zip compressed version can be found at http://www.cs.uncc.edu/~abw/ITCS4146S07/gt4book-examples.zip. Download and uncompress the file into a directory called GT4services. Everything is included (the Java source, WSDL, and deployment files, etc.)
WSDL service interface description file -- The WSDL service interface description file is
provided within the GT4services folder at:
GT4Services\schema\examples\MathService_instance\Math.wsdl This file, and discussion of its
contents, can be found in Appendix A. Later on we will need to modify this file, but first we will
use the existing contents that describe the Math service above.
Service code in Java -- For this assignment, both the code for service operations and for the
resource properties are put in the same class for convenience. More complex services and
resources would be defined in separate classes.
The Java code for the service and its resource properties is located within the GT4services folder
at:
GT4services\org\globus\examples\services\core\first\impl\MathService.java.
Deployment Descriptor -- The deployment descriptor gives several different important
sets of information about the service once it is deployed. It is located within the GT4services
folder at: GT4services\org\globus\examples\services\core\first\deploy-server.wsdd.
Step 2 – Building the Math Service
It is now necessary to package all the required files into a GAR (Grid Archive) file.
The build tool ant from the Apache Software Foundation is used to achieve this as shown
overleaf:
BUILD SUCCESSFUL
Total time: 8 seconds
During the build process, a new directory is created in your GT4Services directory that is
named build. All of your stubs and class files that were generated will be in that directory and its
subdirectories.
More importantly, there is a GAR (Grid Archive) file called
org_globus_examples_services_core_first.gar. The GAR file is the package that contains every
file that is needed to successfully deploy your Math Service into the Globus container.
The files contained in the GAR file are the Java class files, WSDL, compiled stubs, and the
deployment descriptor.
Step 3 – Deploying the Math Service
If the container is still running in the Container Window, then stop it using Control-C.
To deploy the Math Service, you will use a tool provided by the Globus Toolkit called
globus-deploy-gar. In the Container Window, issue the command:
globus-deploy-gar org_globus_examples_services_core_first.gar
Successful output of the command is:
Step 4 – Compiling the Client
A client has already been provided to test the Math Service and is located in the GT4Services directory.
Step 5 – Start the Container for your Service
If the container is not running, restart the Globus container from the Container Window with:
globus-start-container -nosec
Step 6 – Run the Client
To start the client from your GT4Services directory, do the following in the Client Window, which passes the GSH of the service as an argument:
java -classpath build\classes\org\globus\examples\services\core\first\impl\;%CLASSPATH% org.globus.examples.clients.MathService_instance.Client http://localhost:8080/wsrf/services/examples/core/first/MathService
which should give the output:
Current value: 15
Current value: 10
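The client class used above is the one supplied with the tutorial files. For reference, a sketch of it is shown below; the stub package and class names (MathServiceAddressingLocator, MathPortType, GetValueRP) are the ones generated by the tutorial build and should be treated as assumptions if your stubs were generated differently.
package org.globus.examples.clients.MathService_instance;

import org.apache.axis.message.addressing.Address;
import org.apache.axis.message.addressing.EndpointReferenceType;
import org.globus.examples.stubs.MathService_instance.GetValueRP;
import org.globus.examples.stubs.MathService_instance.MathPortType;
import org.globus.examples.stubs.MathService_instance.service.MathServiceAddressingLocator;

public class Client {
    public static void main(String[] args) {
        MathServiceAddressingLocator locator = new MathServiceAddressingLocator();
        try {
            // The GSH of the service is passed as the only command-line argument
            String serviceURI = args[0];
            EndpointReferenceType endpoint = new EndpointReferenceType();
            endpoint.setAddress(new Address(serviceURI));
            MathPortType math = locator.getMathPortTypePort(endpoint);
            // Two additions followed by a subtraction give the 15 and 10 shown above
            math.add(10);
            math.add(5);
            System.out.println("Current value: " + math.getValueRP(new GetValueRP()));
            math.subtract(5);
            System.out.println("Current value: " + math.getValueRP(new GetValueRP()));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}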
Step 7 – Undeploy the Math Service and Kill a Container
Before we can add functionality to the Math Service (Section 5), we must undeploy the service.
In the Container Window, kill the container with a Control-C.
Then to undeploy the service, type in the following command:
globus-undeploy-gar org_globus_examples_services_core_first
which should result with the following output:
Undeploying gar... Deleting /.
Undeploy successful
Result:
Thus the new web service for calculator was successfully developed and the output verified.
Ex. No:2 Developing New Grid Service
Date :
Aim:
To develop a new Grid Service.
Procedure :
1. Setting up Eclipse, GT4, Tomcat, and the other necessary plug-ins and tools
2. Creating and configuring the Eclipse project in preparation for the source files
3. Adding the source files (and reviewing their major features)
4. Creating the build/deploy Launch Configuration that orchestrates the automatic
generation of the remaining artifacts, assembling the GAR, and deploying the grid service
into the Web services container
5. Using the Launch Configuration to generate and deploy the grid service
6. Running and debugging the grid service in the Tomcat container
7. Executing the test client
8. To test the client, simply right-click the Client.java file and select Run > Run... from the
pop-up menu (See Figure 27).
9. In the Run dialog that is displayed, select the Arguments tab and enter
http://127.0.0.1:8080/wsrf/services/examples/ProvisionDirService in the Program
Arguments: textbox.
10. Run the client application by simply right-clicking the Client.java file and selecting Run > Java Application.
Output
Result:
Thus the new Grid Service was successfully developed and the output verified.
Ex. No:3 Develop applications using Java - Grid APIs
Date :
Aim:
To develop Applications using Java – Grid APIs
Procedure:
1. Build a server-side SOAP service using Tomcat and Axis; a minimal service class is sketched below.
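With Apache Axis 1.x, the quickest way to build such a service is JWS "drop-in" deployment: a plain Java source file with a .jws extension is copied into Tomcat's axis web application, and Axis compiles and exposes it on the next request. The file name HelloGrid.jws and its methods below are only illustrative.
// HelloGrid.jws -- copy into $CATALINA_HOME/webapps/axis/
// Axis then exposes it at http://localhost:8080/axis/HelloGrid.jws
// (append ?wsdl to see the generated WSDL)
public class HelloGrid {
    // Each public method becomes a SOAP operation
    public String sayHello(String name) {
        return "Hello " + name + ", greetings from the Grid!";
    }

    public int add(int a, int b) {
        return a + b;
    }
}
A client can then invoke these operations through the generated WSDL, for example with the Axis Call API or wsdl2java-generated stubs.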
Output
Result:
Thus the applications using Java Grid APIs were successfully developed and the
output verified.
Ex. No:4 Develop secured applications using basic security in Globus
Date :
Aim:
To Develop secured applications using basic security in Globus.
Procedure:
Mandatory prerequisite:
Tomcat v4.0.3
Axis beta 1
Commons Logging v1.0
Java CoG Kit v0.9.12
Xerces v2.0.1
Add the following Valve entry to Tomcat's conf/server.xml:
<Valve className="org.globus.tomcat.catalina.valves.CertificatesValve"
debug="1" />
Copy gsiaxis.jar to the WEB-INF/lib directory of your Axis installation under Tomcat.
You should ensure that the following jars from the axis/lib directory are in your
classpath:
o axis.jar
o clutil.jar
o commons-logging.jar
o jaxrpc.jar
o log4j-core.jar
o tt-bytecode.jar
o wsdl4j.jar
You should also have these jars in your classpath:
o gsiaxis.jar
o cog.jar
o xerces.jar (or other XML parser)
Check the logs in Tomcat's logs/ directory to ensure the server started correctly. In particular
check that:
Alpha 3 version
Let's assume we already have a web service called MyService with a single method,
myMethod. When a SOAP message request comes in over the GSI httpg transport, the Axis RPC
dispatcher will look for the same method, but with an additional parameter: the MessageContext.
So we can write a new myMethod which takes an additional argument, the MessageContext.
package org.globus.example;

import org.apache.axis.MessageContext;
import org.globus.axis.util.Util;

public class MyService {

    // The "normal" method
    public String myMethod(String arg) {
        System.out.println("MyService: http request\n");
        System.out.println("MyService: you sent " + arg);
        return "Hello Web Services World!";
    }

    // Add a MessageContext argument to the normal method
    public String myMethod(MessageContext ctx, String arg) {
        System.out.println("MyService: httpg request\n");
        System.out.println("MyService: you sent " + arg);
        System.out.println("GOT PROXY: " + Util.getCredentials(ctx));
        return "Hello Web Services World!";
    }
}
Beta 1 version
In the Beta 1 version, you don't even need to write a different method. Instead, the
MessageContext is put on thread-local storage.
It can be retrieved by calling MessageContext.getCurrentContext():

package org.globus.example;

import org.apache.axis.MessageContext;
import org.globus.axis.util.Util;

public class MyService {

    // Beta 1 version
    public String myMethod(String arg) {
        System.out.println("MyService: httpg request\n");
        System.out.println("MyService: you sent " + arg);
        // Retrieve the context from thread-local storage
        MessageContext ctx = MessageContext.getCurrentContext();
        System.out.println("GOT PROXY: " + Util.getCredentials(ctx));
        return "Hello Web Services World!";
    }
}
Part of the code provided by ANL in gsiaxis.jar is a utility package which includes the
getCredentials() method. This allows the service to extract the proxy credentials from the
MessageContext.
7.2. Deploying the service
Before the service can be used it must be made available. This is done by deploying the service.
This can be done in a number of ways:
As in the previous example, this is very similar to writing a normal web services client. There are
some additions required to use the new GSI over SSL transport:
Result:
Thus developing secured applications using basic security in Globus was successfully completed and the output verified.
Ex. No:5 Develop a Grid portal where user can submit a job and get the result
Date :
Aim :
To develop a Grid portal where a user can submit a job and get the result, and to implement it
with and without the GRAM concept.
Procedure:
1) Building the GridSphere distribution requires Java 1.5+. You will also need Ant 1.6+, available at
http://jakarta.apache.org/ant.
2) You will also need a Tomcat 5.5.x servlet container available at
http://jakarta.apache.org/tomcat. In addition to providing a hosting environment for GridSphere,
Tomcat provides some of the required XML (JAR) libraries that are needed for compilation.
3) Compiling and Deploying
4) The Ant build script, build.xml, uses the build.properties file to specify any compilation
options. Edit build.properties appropriately for your needs.
5) At this point, simply invoking "ant install" will deploy the GridSphere portlet container to
Tomcat using the default database. Please see the User Guide for more details on configuring the
database.
6) The build.xml supports the following basic tasks:
install -- builds and deploys GridSphere, makes the documentation and installs the database
clean -- removes the build and dist directories including all the compiled classes
update -- updates the existing source code from CVS
compile -- compiles the GridSphere source code
deploy -- deploys the GridSphere framework and all portlets to a Tomcat servlet container
located at $CATALINA_HOME
create-database - creates a new, fresh database with original GridSphere settings, this wipes out
your current database
docs -- builds the Javadoc documentation from the source code
To see all the targets invoke "ant --projecthelp".
7) Start up Tomcat and then go to http://127.0.0.1:8080/gridsphere/gridsphere to see the portal.
Installation of KVM
1. Update the packages:
apt-get update
2. Check whether your operating system is 64-bit:
uname -m
If the result is x86_64, it means that your operating system is a 64-bit operating system.
3. A few KVM packages are available with the Linux installation.
To check this, run the command:
ls /lib/modules/$(uname -r)/kernel/arch/x86/kvm
The three kernel module files installed on your system will be displayed:
kvm-amd.ko kvm-intel.ko kvm.ko
4. Install the KVM packages
1. Switch to root (Administrator) user
sudo -i
2. To install the packages, run the following commands,
apt-get update
apt-get install qemu-kvm
apt-get install libvirt-bin
apt-get install bridge-utils
apt-get install virt-manager
apt-get install qemu-system
5. To verify your installation, run the command:
virsh -c qemu:///system list
It shows the output:
Id Name State
-------------------------------------------
If VMs are running, it shows the names of the VMs. If no VM is running, the system shows blank
output, which means your KVM installation is working correctly.
6. Run the command
virsh --connect qemu:///system list --all
7. Working with KVM
Run the command:
virsh
Then, at the virsh prompt:
version (displays the version of the installed software tools)
nodeinfo (displays your system information)
quit (exits the virsh shell)
8. To test the KVM installation we could create virtual machines here in manual mode, but we
skip this and directly install OpenStack.
Installation of Openstack
1. Add a new user named stack. This stack user is the administrator of the OpenStack services.
To add the new user, run the command as the root user:
adduser stack
2. Run the command:
apt-get install sudo -y || install -y sudo
3. Be careful when running the following command and check the syntax closely; any error in it
can break the system because of permission errors:
echo "stack ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
4. Log out of the system and log in as the stack user.
5. Run the command (this installs the git package).
Please ensure that you are logged in as the non-root (stack) user, and not in the /root
directory.
sudo apt-get install git
6. Run the command (this clones the updated version of DevStack, which is the auto-installer
package for OpenStack):
git clone https://git.openstack.org/openstack-dev/devstack
ls (this shows a folder named devstack)
cd devstack (enter into the folder)
7. create a file called local.conf. To do this run the command,
nano local.conf
8. In the file, make the following entry (contact your network administrator if you have any doubts
about these values):
[[local|localrc]]
FLOATING_RANGE=192.168.1.224/27
FIXED_RANGE=10.11.11.0/24
FIXED_NETWORK_SIZE=256
FLAT_INTERFACE=eth0
ADMIN_PASSWORD=root
DATABASE_PASSWORD=root
RABBIT_PASSWORD=root
SERVICE_PASSWORD=root
SERVICE_TOKEN=root
9. Save this file
10. Run the command (this installs OpenStack):
./stack.sh
11. If any error occurs, then run the command for uninstallation:
./unstack.sh
Ex. No:1 Procedure to run virtual machines of different configuration
Aim :
To find the procedure to run virtual machines of different configurations and to check
how many virtual machines can be utilized at a particular time.
Procedure:
This experiment is performed through the portal. Log in to the OpenStack portal and, under
Instances, create virtual machines.
TO RUN VM
Step 1 : Under the Project Tab, Click Instances. In the right side screen Click Launch Instance.
Step 2 : In the details, Give the instance name(eg. Instance1).
Step 3: Click Instance Boot Source list and choose 'Boot from image'
Step 4: Click Image name list and choose the image currently uploaded.
Step 5: Click launch.
Your VM will get created.
Ex. No:2
Procedure to attach virtual block to the virtual machine
Date :
Aim:
To find procedure to attach virtual block to the virtual machine and check whether it
holds the data even after the release of the virtual machine.
Procedure:
This experiment is performed through the portal. Log in to the OpenStack portal and, under
Instances, create virtual machines.
Under Volumes, create a storage block of the available capacity. Attach/mount the storage block
volumes to virtual machines, unmount the volume, and reattach it.
Volumes are block storage devices that you attach to instances to enable persistent storage.
You can attach a volume to a running instance or detach a volume and attach it to another
instance at any time. You can also create a snapshot from or delete a volume. Only
administrative users can create volume types.
Create a volume
1. Log in to the dashboard.
2. Select the appropriate project from the drop down menu at the top left.
3. On the Project tab, open the Compute tab and click Volumes category.
In the dialog box that opens, enter or select the following values.
o No source, empty volume: Creates an empty volume. An empty volume does not
contain a file system or a partition table.
o Image: If you choose this option, a new field for Use image as a source displays.
You can select the image from the list.
o Volume: If you choose this option, a new field for Use volume as a
source displays. You can select the volume from the list. Options to use a
snapshot or a volume as the source for a volume are displayed only if there are
existing snapshots or volumes.
Availability Zone: Select the Availability Zone from the list. By default, this value is set
to the availability zone given by the cloud provider (for example, us-west or apac-south). For
some cases, it could be nova.
After you create one or more volumes, you can attach them to instances. You can attach a
volume to one instance at a time.
Attach a volume to an instance
1. Log in to the dashboard.
2. Select the appropriate project from the drop down menu at the top left.
3. On the Project tab, open the Compute tab and click Volumes category.
6. Enter the name of the device from which the volume is accessible by the instance.
The dashboard shows the instance to which the volume is now attached and the device
name.
You can view the status of a volume in the Volumes tab of the dashboard. The volume is either
Available or In-Use.
Now you can log in to the instance and mount, format, and use the disk.
Create a volume snapshot
1. Log in to the dashboard.
2. Select the appropriate project from the drop down menu at the top left.
3. On the Project tab, open the Compute tab and click Volumes category.
6. In the dialog box that opens, enter a snapshot name and a brief description.
The dashboard shows the new volume snapshot in Volume Snapshots tab.
Edit a volume
3. On the Project tab, open the Compute tab and click Volumes category.
6. In the Edit Volume dialog box, update the name and description of the volume.
Delete a volume
When you delete an instance, the data in its attached volumes is not deleted.
2. Select the appropriate project from the drop down menu at the top left.
3. On the Project tab, open the Compute tab and click Volumes category.
4. Select the check boxes for the volumes that you want to delete.
Ex. No:3 Install a C compiler in the virtual machine and show virtual machine migration
Aim :
To install a C compiler in the virtual machine and execute a sample program, and to show
virtual machine migration from one node to the other based on a certain condition.
Procedure:
1. Install a C compiler in the virtual machine and execute a sample program.
Through Openstack portal create virtual machine. Through the portal connect to virtual
machines. Login to VMs and install c compiler using commands.
Eg : apt-get install gcc
2. Show the virtual machine migration based on the certain condition from one node to the other.
To demonstrate virtual machine migration, two machines must be configured in one cloud. Take
snapshot of running virtual machine and copy the snapshot file to the other destination machine
and restore the snapshot. On restoring the snapshot, VM running in source will be migrated to
destination machine.
1. List the running instances:
$ nova list
2. After selecting a VM from the list, run this command where VM_ID is set to the ID in the
list returned in the previous step:
4. To migrate an instance and watch the status, use this example script:
#!/bin/bash
# Watch a migration; assumes the legacy python-novaclient CLI (nova) is installed
# Provide usage
usage() {
    echo "Usage: $0 VM_ID"
    exit 1
}
[[ $# -eq 0 ]] && usage
VM_ID=$1
nova migrate $VM_ID
VM_STATUS=$(nova show $VM_ID | awk '/ status / {print $4}')
while [[ "$VM_STATUS" != "VERIFY_RESIZE" ]]; do
    echo -n "."
    sleep 2
    VM_STATUS=$(nova show $VM_ID | awk '/ status / {print $4}')
done
nova resize-confirm $VM_ID
echo
echo "Instance $VM_ID migrated."
Ex. No:4 Procedure to install storage controller and interact with it
Aim :
To find the procedure to install a storage controller and interact with it.
Procedure:
The storage controller is installed as the Swift and Cinder components when OpenStack is installed.
The ways to interact with the storage are through the portal.
OpenStack Object Storage (swift) is used for redundant, scalable data storage using clusters of
standardized servers to store petabytes of accessible data. It is a long-term storage system for
large amounts of static data which can be retrieved and updated.
OpenStack Object Storage provides a distributed, API-accessible storage platform that can be
integrated directly into an application or used to store any type of file, including VM images,
backups, archives, or media files. In the OpenStack dashboard, you can only manage containers
and objects.
In OpenStack Object Storage, containers provide storage for objects in a manner similar to a
Windows folder or Linux file directory, though they cannot be nested. An object in OpenStack
consists of the file to be stored in the container and any accompanying metadata.
Create a container
1. Log in to the dashboard.
2. Select the appropriate project from the drop down menu at the top left.
3. On the Project tab, open the Object Store tab and click Containers category.
4. Click Create Container.
5. In the Create Container dialog box, enter a name for the container, and then click Create
Container.
Upload an object
2. Select the appropriate project from the drop down menu at the top left.
3. On the Project tab, open the Object Store tab and click Containers category.
The Upload Object To Container: <name> dialog box appears. <name> is the name of the
container to which you are uploading the object.
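Outside the dashboard, the same upload can also be scripted against the Swift REST API. The sketch below is only an illustration: the proxy endpoint, project, container name, object name, and token are placeholders, and the token would normally be obtained from Keystone (for example with openstack token issue).
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SwiftUpload {
    public static void main(String[] args) throws Exception {
        // Placeholders: substitute your Swift proxy endpoint, project, and token
        String endpoint = "http://127.0.0.1:8080/v1/AUTH_demo";
        String token = "<auth-token-from-keystone>";
        byte[] data = "hello object storage".getBytes(StandardCharsets.UTF_8);

        // Objects are created or overwritten with an HTTP PUT to /<container>/<object>
        URL url = new URL(endpoint + "/mycontainer/hello.txt");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("X-Auth-Token", token);
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(data);
        }
        // 201 Created indicates the object was stored in the container
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}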
Manage an object
To edit an object
1. Log in to the dashboard.
2. Select the appropriate project from the drop down menu at the top left.
3. On the Project tab, open the Object Store tab and click Containers category.
5. Click the menu button and choose Edit from the dropdown list.
Result:
Thus the procedure to install storage controller and interact with it was successfully
completed and output verified.
Ex. No:5 Procedure to set up the one node Hadoop cluster
Aim :
To find the procedure to set up a one node Hadoop cluster.
Mandatory prerequisite:
1) Installing Java:
Hadoop is a framework written in Java for running applications on large clusters of commodity
hardware. Hadoop needs Java 6 or above to work.
Step 1: Download the JDK tar.gz file for Linux 64-bit and extract it into "/opt"
boss@solaiv[]# cd /opt
Step 2:
Open the "/etc/profile" file and add the following lines as per the version.
Use the root user to save /etc/profile, or use gedit instead of vi.
The 'profile' file contains commands that ought to be run for login shells.
boss@solaiv[]# sudo vi /etc/profile
#--insert JAVA_HOME
JAVA_HOME=/opt/jdk1.8.0_05
#--in the PATH variable, just append at the end of the line: PATH=$PATH:$JAVA_HOME/bin
By default the OS will have OpenJDK. Check with "java -version"; you will be prompted with "openJDK".
If you also have OpenJDK installed, then you'll need to update the Java alternatives:
If your system has more than one version of Java, configure which one your system uses by
entering the following command in a terminal window.
After updating the alternatives, check again with "java -version"; you should now be prompted with
"Java HotSpot(TM) 64-Bit Server".
2) Configure SSH
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine
if you want to use Hadoop on it (which is what we want to do in this short tutorial). For our
single-node setup of Hadoop, we therefore need to configure SSH access to localhost.
Password-less, key-based SSH authentication is needed so that the master node can log in to the
slave nodes (and the secondary node) to start/stop them easily without any delays for
authentication.
Generate an SSH key for the user, then enable password-less SSH access to your local machine.
root@solaiv[]# exit
3) Hadoop installation
Now download Hadoop from the official Apache site, preferably a stable release version of
Hadoop 2.7.x, and extract the contents of the Hadoop package to a location of your choice.
Step 1: Download the tar.gz file of latest version Hadoop ( hadoop-2.7.x) from the official site .
Step 2: Extract(untar) the downloaded file from this commands to /opt/bigdata
root@solaiv[]# cd /opt
root@solaiv[/opt]# sudo tar xvpzf /home/itadmin/Downloads/hadoop-2.7.0.tar.gz
root@solaiv[/opt]# cd hadoop-2.7.0/
#--in PATH variable just append at the end of the line PATH=$PATH:$HADOOP_PREFIX/bin
boss@solaiv[]# cd $HADOOP_PREFIX
Add the following properties to the various Hadoop configuration files, which are available under
$HADOOP_PREFIX/etc/hadoop/: core-site.xml, hdfs-site.xml, mapred-site.xml & yarn-site.xml
boss@solaiv[]# cd $HADOOP_PREFIX/etc/hadoop
boss@solaiv[]# vi hadoop-env.sh
Paste the following line at the beginning of the file:
export JAVA_HOME=/opt/jdk1.8.0_05
Modify the core-site.xml
boss@solaiv[]# vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Modify the hdfs-site.xml
boss@solaiv[]# vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Modify the mapred-site.xml
boss@solaiv[]# vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Modify the yarn-site.xml
boss@solaiv[]# vi yarn-site.xml
<configuration>
<property><name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value></property>
</configuration>
The first step in starting up your Hadoop installation is formatting the Hadoop file system, which
is implemented on top of the local file system of our "cluster" (which includes only our local
machine). We need to do this the first time we set up a Hadoop cluster. Do not format a running
Hadoop file system, as you will lose all the data currently in the cluster (in HDFS).
root@solaiv[]# sbin/start-dfs.sh
root@solaiv[]# sbin/stop-dfs.sh
root@solaiv[]# sbin/stop-yarn.sh
Result :
Thus the procedure to set up the one node Hadoop cluster was successfully completed
and the output verified.
Ex. No:6 Mount the one node Hadoop cluster using FUSE
Introduction
FUSE (Filesystem in Userspace) enables you to write a normal user application as a bridge for a
traditional filesystem interface.
The hadoop-hdfs-fuse package enables you to use your HDFS cluster as if it were a traditional
filesystem on Linux. It is assumed that you have a working HDFS cluster and know the
hostname and port that your NameNode exposes.
Aim :
To mount the one node Hadoop cluster using FUSE.
Procedure:
mkdir -p <mount_point>
hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port> <mount_point>
You can now run operations as if they are on your mount point. Press Ctrl+C to end the fuse-dfs
program, and umount the partition if it is still mounted.
Note:
$ umount <mount_point>
You can now add a permanent HDFS mount which persists through reboots. To do this, add the
following line to /etc/fstab:
hadoop-fuse-dfs#dfs://<name_node_hostname>:<namenode_port> <mount_point> fuse
allow_other,usetrash,rw 2 0
For example:
$ mount <mount_point>
Your system is now configured to allow you to use the ls command and use that mount point as
if it were a normal system disk.
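Because the FUSE bridge makes HDFS look like an ordinary directory tree, normal file APIs work on it as well. The following is a small sketch, assuming a hypothetical mount point of /mnt/hdfs and that the mount was made read-write (the rw option above).
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FuseMountDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical mount point; use the directory given to hadoop-fuse-dfs
        Path mountPoint = Paths.get("/mnt/hdfs");

        // An ordinary directory listing now reads HDFS through the FUSE bridge
        File[] entries = mountPoint.toFile().listFiles();
        if (entries != null) {
            for (File entry : entries) {
                System.out.println(entry.getName() + "  " + entry.length() + " bytes");
            }
        }

        // Writing through the mount creates a file in HDFS
        Files.write(mountPoint.resolve("fuse-test.txt"),
                "written via FUSE\n".getBytes());
    }
}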
Result :
Thus mounting the one node Hadoop cluster using FUSE was successfully completed and the
output verified.
Ex. No:7 Program to use the APIs of Hadoop to interact with it
Aim :
To write a program that uses the APIs of Hadoop to interact with it.
Procedure:
1. Given below is the data regarding the electrical consumption of an organization. It
contains the monthly electrical consumption and the annual average for various years.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Avg
1979 23 23 2 43 24 25 26 26 26 26 25 26 25
1980 26 27 28 28 28 30 31 31 31 30 30 30 29
1981 31 32 32 32 33 34 35 36 36 34 34 34 34
1984 39 38 39 39 39 41 42 43 40 39 38 38 40
1985 38 39 39 39 39 41 41 41 00 40 39 39 45
If the above data is given as input, we have to write applications to process it and produce results
such as finding the year of maximum usage, the year of minimum usage, and so on. This is a
walkover for programmers with a finite number of records: they will simply write the logic to
produce the required output and pass the data to the application.
But think of the data representing the electrical consumption of all the large-scale industries of a
particular state since its formation.
There will be heavy network traffic when we move such data from the source to the network
server, and so on.
2. The above data is saved as sample.txt and given as input. The input file looks as shown below.
1979 23 23 2 43 24 25 26 26 26 26 25 26 25
1980 26 27 28 28 28 30 31 31 31 30 30 30 29
1981 31 32 32 32 33 34 35 36 36 34 34 34 34
1984 39 38 39 39 39 41 42 43 40 39 38 38 40
1985 38 39 39 39 39 41 41 41 00 40 39 39 45
3. Write a program to process the sample data using the MapReduce framework and save the
program as ProcessUnits.java. The compilation and execution of the program are explained below.
Follow the steps given below to compile and execute the above program.
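The original ProcessUnits.java listing is not reproduced in this manual; the sketch below is only an illustration of how such a job can be written with the org.apache.hadoop.mapreduce API (the class names, and the choice to report the yearly maximum reading, are this sketch's own). The compilation and execution steps that follow apply to whichever version you use.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxUsage {
    // Each input line is: year, 12 monthly readings, yearly average
    public static class MaxMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = value.toString().trim().split("\\s+");
            if (fields.length < 3 || !Character.isDigit(fields[0].charAt(0))) {
                return; // skip headers or malformed lines
            }
            String year = fields[0];
            int max = Integer.MIN_VALUE;
            // Columns 1..12 are monthly readings; the last column is the average
            for (int i = 1; i < fields.length - 1; i++) {
                max = Math.max(max, Integer.parseInt(fields[i]));
            }
            ctx.write(new Text(year), new IntWritable(max));
        }
    }

    public static class MaxReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text year, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int max = Integer.MIN_VALUE;
            for (IntWritable v : values) {
                max = Math.max(max, v.get());
            }
            ctx.write(year, new IntWritable(max));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "max usage");
        job.setJarByClass(MaxUsage.class);
        job.setMapperClass(MaxMapper.class);
        job.setReducerClass(MaxReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}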
Step 1
The following command is to create a directory to store the compiled java classes.
$ mkdir units
Step 2
Download Hadoop-core-1.2.1.jar, which is used to compile and execute the MapReduce
program. Visit the following link http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core/1.2.1
to download the jar. Let us assume the downloaded folder is /home/hadoop/.
Step 3
The following commands are used for compiling the ProcessUnits.javaprogram and creating a
jar for the program.
Step 4
The following command is used to create an input directory in HDFS.
Step 5
The following command is used to copy the input file named sample.txt into the input directory of
HDFS.
Step 6
The following command is used to verify the files in the input directory.
Step 7
The following command is used to run the Eleunit_max application by taking the input files from
the input directory.
Wait for a while until the file is executed. After execution, as shown below, the output will
contain the number of input splits, the number of Map tasks, the number of reducer tasks, etc.
completed successfully
14/10/31 06:02:52
Map-Reduce Framework
Spilled Records=10
Shuffled Maps =2
Failed Shuffles=0
Bytes Written=40
Step 8
The following command is used to verify the resultant files in the output folder.
Step 9
The following command is used to see the output in Part-00000 file. This file is generated by
HDFS.
1984 40
1985 45
Step 10
The following command is used to copy the output folder from HDFS to the local file system for
analyzing.
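Besides the hadoop command line used in the steps above, the same interactions (listing an HDFS directory, reading the job output) can be done programmatically through Hadoop's FileSystem API. The following is a minimal sketch: the fs.defaultFS value matches the core-site.xml shown earlier, while the paths are placeholders for your own input and output directories.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsApiDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Matches the fs.defaultFS configured in core-site.xml
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        // List the contents of the input directory (placeholder path)
        for (FileStatus status : fs.listStatus(new Path("/user/hadoop/input_dir"))) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }

        // Read the MapReduce output file (placeholder path)
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/user/hadoop/output_dir/part-00000"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}
Compile it against the Hadoop client libraries and run it with the hadoop jar command so that the cluster configuration and dependencies are on the classpath.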
Important Commands
All Hadoop commands are invoked by the $HADOOP_HOME/bin/hadoop command. Running
the Hadoop script without any arguments prints the description for all commands.
Output
1981 34
1984 40
1985 45
Result:
Thus the program to use the APIs of Hadoop to interact with it was successfully
completed and the output verified.
Ex. No:8 Word count program to demonstrate the use of Map and Reduce tasks
Aim :
To write a word count program to demonstrate the use of Map and Reduce tasks.
Procedure:
WordCount is a simple application that counts the number of occurrences of each word in a given
input set.
export JAVA_HOME=/usr/java/default
export PATH=${JAVA_HOME}/bin:${PATH}
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
Assuming that:
/user/joe/wordcount/input - input directory in HDFS
/user/joe/wordcount/output - output directory in HDFS
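The WordCount source itself is essentially the standard example from the Apache Hadoop MapReduce tutorial; it is reproduced below for reference.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Emits (word, 1) for every token in the input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts for each word; also used as the combiner
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
After compiling it into a jar and running it with the hadoop jar command against the input directory above, the output directory contains each word with its count, for example: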
World 2
Result :
Thus the Word Count program to demonstrate the use of Map and reduce tasks was
successfully completed and output verified.