Lab Manual
OBJECTIVE:
PROCEDURE:
When the Globus Toolkit container starts, a number of services start up with it.
The service for this task will be a simple Math service that can perform basic
arithmetic for a client.
The Math service will access a resource with two properties:
1. An integer value that can be operated upon by the service.
2. A string value that describes the last operation.
The service itself will have three remotely accessible operations that operate upon
the value:
Usually, the best way to begin any programming task is with an overall
description of what you want the code to do, which in this case is the service
interface. The service interface describes what the service provides in terms of
the names of its operations, their arguments and return values. A Java interface for our
service is:
It is possible to start with this interface and create the necessary WSDL file using the
standard Web service tool called Java2WSDL. However, the WSDL file for GT 4 has
to include details of resource properties that are not given explicitly in the interface
above. Hence, we will provide the WSDL file.
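As a rough illustration of the resource and operations described above, the following plain-Java sketch models the two resource properties and the three operations locally. The operation names add, subtract and getValueRP are taken from the client code used later in this exercise. The interface name MathOperations, the class LocalMath, the lastOp field, its getter and the text stored in it are assumptions made for this sketch, and none of the WSRF or Globus plumbing is shown.

```java
// Hypothetical local model of the Math service; this is not the GT4 implementation.
interface MathOperations {
    void add(int a);
    void subtract(int a);
    int getValueRP();
}

class LocalMath implements MathOperations {
    private int value = 0;           // resource property 1: the integer value
    private String lastOp = "NONE";  // resource property 2: last operation (initial text is an assumption)

    public void add(int a) {
        value += a;
        lastOp = "ADDITION";
    }

    public void subtract(int a) {
        value -= a;
        lastOp = "SUBTRACTION";
    }

    public int getValueRP() {
        return value;
    }

    public String getLastOp() {
        return lastOp;
    }
}

class LocalMathDemo {
    public static void main(String[] args) {
        LocalMath math = new LocalMath();
        math.add(10);
        math.add(5);
        System.out.println("Current value: " + math.getValueRP());
        math.subtract(5);
        System.out.println("Current value: " + math.getValueRP());
    }
}
```

In the real service the two properties are exposed as WSRF resource properties through the WSDL file, which is why the interface alone is not enough to generate it.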
CS6712 – GRID AND CLOUD COMPUTING LABORATORY
All the required files are provided. The MathService source code files can be
downloaded from http://www.gt4book.com
(http://www.gt4book.com/downloads/gt4book-examples.tar.gz)
WSDL service interface description file -- The WSDL service interface description
file is provided within the GT4services folder at:
GT4Services\schema\examples\MathService_instance\Math.wsdl
Service code in Java -- Both the code for service operations and for the resource
properties are put in the same class for convenience. More complex services and
resources would be defined in separate classes. The Java code for the service and
its resource properties is located within the GT4services folder at:
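GT4services\org\globus\examples\services\core\first\impl\MathService.java
Deployment descriptor -- The deployment descriptor for the service is located
within the GT4services folder at:
GT4services\org\globus\examples\services\core\first\deploy-server.wsdd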
It is now necessary to package all the required files into a GAR (Grid Archive) file.
The build tool ant from the Apache Software Foundation is used to achieve this as
shown overleaf:
Generating a GAR file with Ant (from
http://gdp.globus.org/gt4-tutorial/multiplehtml/ch03s04.html)
Ant is similar in concept to the UNIX make tool, but it is a Java tool and is XML based.
Build scripts are provided by Globus 4 to use the ant build file. The Windows version
of the build script for MathService is the Python file called globus-build-service.py,
which is held in the GT4services directory. The build script takes one argument, the
name of the service that you want to deploy. To keep with the naming convention,
this service will be called first.
In the Client Window, run the build script from the GT4services directory with:
globus-build-service.py first
During the build process, a new directory is created in your GT4Services directory
that is named build. All of your stubs and class files that were generated will be in
that directory and its subdirectories. More importantly, there is a GAR (Grid Archive)
file called org_globus_examples_services_core_first.gar.
The GAR file is the package that contains every file that is needed to successfully
deploy your Math Service into the Globus container. The files contained in the GAR
file are the Java class files, WSDL, compiled stubs, and the deployment descriptor.
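To check what actually ended up in the package, the archive can be listed programmatically. The sketch below assumes that the GAR file is an ordinary zip/jar-format archive, which is our understanding of how the Ant build produces it but is not stated in this manual. The class name GarLister and the method listEntries are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists the entries of a zip-format archive such as a GAR file.
class GarLister {
    // Returns the entry names of the archive, sorted alphabetically.
    static List<String> listEntries(String archivePath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archivePath)) {
            zip.stream().map(ZipEntry::getName).forEach(names::add);
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Expects the path to the archive as the only argument.
        for (String name : listEntries(args[0])) {
            System.out.println(name);
        }
    }
}
```

Run against org_globus_examples_services_core_first.gar, the listing should include the class files, the WSDL, the stubs and the deployment descriptor mentioned above, if the assumption about the archive format holds.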
If the container is still running in the Container Window, then stop it using Control-C.
To deploy the Math Service, you will use a tool provided by the Globus Toolkit called
globus-deploy-gar. In the Container Window, issue the command:
globus-deploy-gar org_globus_examples_services_core_first.gar
The service has now been deployed. Check that the service is deployed by starting the
container from the Container Window. You should see the service called MathService.
A client has already been provided to test the Math Service and is located in the
GT4Services directory at:
GT4Services\org\globus\examples\clients\MathService_instance\Client.java
package org.globus.examples.clients.MathService_instance;

import org.apache.axis.message.addressing.Address;
import org.apache.axis.message.addressing.EndpointReferenceType;
import org.globus.examples.stubs.MathService_instance.MathPortType;
import org.globus.examples.stubs.MathService_instance.GetValueRP;
import org.globus.examples.stubs.MathService_instance.service.MathServiceAddressingLocator;

public class Client {
    public static void main(String[] args) {
        MathServiceAddressingLocator locator = new MathServiceAddressingLocator();
        try {
            // The service URL is the single command-line argument
            String serviceURI = args[0];
            // Create endpoint reference to service
            EndpointReferenceType endpoint = new EndpointReferenceType();
            endpoint.setAddress(new Address(serviceURI));
            MathPortType math;
            // Get PortType
            math = locator.getMathPortTypePort(endpoint);
            // Perform an addition
            math.add(10);
            // Perform another addition
            math.add(5);
            // Access value
            System.out.println("Current value: "
                    + math.getValueRP(new GetValueRP()));
            // Perform a subtraction
            math.subtract(5);
            // Access value
            System.out.println("Current value: "
                    + math.getValueRP(new GetValueRP()));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
When the client is run from the command line, you pass it one argument. The
argument is the URL that specifies where the service resides. The client will create
the end point reference and incorporate this URL as the address.
The end point reference is then used with the getMathPortTypePort method of a
MathServiceAddressingLocator object to obtain a reference to the Math interface
(portType). Then, we can apply the methods available in the service as though they
were local methods. Notice that the calls to the service (add and subtract method
calls) must be in a "try {} catch(){}" block because a "RemoteException" may be
thrown. The code for the "MathServiceAddressingLocator" is created during the
build process.
To compile the new client, you will need the JAR files from the Globus toolkit in your
CLASSPATH. Do this by executing the following command in the Client Window:
%GLOBUS_LOCATION%\etc\globus-devel-env.bat
You can verify that this sets your CLASSPATH, by executing the command:
echo %CLASSPATH%
You should see a long list of JAR files. Running \gt4\etc\globus-devel-env.bat only
needs to be done once for each Client Window that you open. It does not need to be
done each time you compile.
Once your CLASSPATH has been set, then you can compile the Client code by
typing in the following command:
javac -classpath build\classes\org\globus\examples\services\core\first\impl\;%CLASSPATH% org\globus\examples\clients\MathService_instance\Client.java
To start the client from your GT4Services directory, do the following in the Client
Window, which passes the GSH of the service as an argument:
java -classpath build\classes\org\globus\examples\services\core\first\impl\;%CLASSPATH% org.globus.examples.clients.MathService_instance.Client http://localhost:8080/wsrf/services/examples/core/first/MathService
Before we can add functionality to the Math Service, we must undeploy the service.
In the Container Window, kill the container with a Control-C. Then to undeploy the
service, type in the following command:
globus-undeploy-gar org_globus_examples_services_core_first
In this final task, you are asked to modify the Math service and associated files so
the service supports the multiplication operation. To do this task, you will need to
modify:
Service code (MathService.java)
WSDL file (Math.wsdl)
RESULT:
Thus the web service for Calculator applications has been developed and
deployed.
OBJECTIVE:
PROCEDURE:
Writing and deploying a WSRF Web Service is straightforward. We just have to follow five
simple steps:
To run this program, the following prerequisite software must be installed at a
minimum.
1. Download the latest Axis2 runtime from the above link and extract it. Now we point
Eclipse WTP to the downloaded Axis2 runtime. Open Window -> Preferences -> Web
Services -> Axis2 Emitter
2. Select the Axis2 Runtime tab and point to the correct Axis2 runtime location.
Alternatively, at the Axis2 Preference tab, you can set the default settings that will
come up in the Web Services Creation wizards. For the moment we will accept the
default settings.
a. Click OK.
b. Next we need to create a project with the support of Axis2 features. Open File ->
New -> Other... -> Web -> Dynamic Web Project
c. Click next
3. Select the name Axis2WSTest as the Dynamic Web project name (you can specify
any name you prefer), and select the configured Tomcat runtime as the target
runtime. Click next.
6. Import the wtp/Converter.java class into Axis2WSTest/src (be sure to preserve the
package). Build the project if it is not built automatically.
7. Select Converter.java, open File -> New -> Other... -> Web Services -> Web
Service. Click next.
8. The Web service wizard would be brought up with Web service type set to Bottom
up Java bean Web Service with the service implementation automatically filled in.
Move the service scale to Start service.
9. Click on the Web Service runtime link to select the Axis2 runtime. Click OK.
10. Ensure that the correct server and service project are selected as displayed below.
Click next.
11. This page is the services.xml selection page. If you have a custom services.xml, you
can include it by clicking the Browse button. For the moment, just leave it at the
default.
Click next.
12. This page is the Start Server page. It will be displayed if the server has not been
started. Click on the Start Server button. This will start the server runtime.
Click next.
13. This page is the Web services publication page, accept the defaults.
Click Finish.
14. Now, select the Axis2WSTest dynamic Web project, right-click and select Run ->
Run As -> Run on Server to bring up the Axis2 servlet.
Click Next.
15. Make sure you have the Axis2WSTest dynamic Web project on the right-hand side
under the Configured project.
Click Finish.
16. This will deploy the Axis2 server webapp on the configured servlet container and
will display the Axis2 home page. Note that the servlet container will start up
according to the Server configuration files on your workspace.
17. Click on the Services link to view the available services. The newly created
converter Web service will be shown there.
18. Click on the Converter Service link to display the wsdl URL of the newly created
Web service. Copy the URL.
19. Now we'll generate the client for the newly created service by referring to the ?wsdl
generated by the Axis2 Server. Open File -> New -> Other... -> Web Services ->
Web Service Client
20. Paste the URL that was copied earlier into the service definition field.
21. Click on the Client project hyperlink and enter Axis2WSTestClient as the name of
the client project. Click OK.
22. Back on the Web Services Client wizard, make sure the Web service runtime is set
to Axis2 and the server is set correctly. Click Next.
23. Next page is the Client Configuration Page. Accept the defaults and click Finish.
24. The client stubs will be generated in your Dynamic Web project
Axis2WSTestClient.
25. Now we are going to write a Java main program to invoke the client stub. Import the
ConverterClient.java file into the workspace, into the wtp package in the src folder of
Axis2WSTestClient.
26. Then select the ConverterClient file, right-click and select Run As -> Java
Application. Here's what you get on the server console:
27. Another way to test and invoke the service is to select Generate test case to test
the service check box on the Axis2 Client Web Service Configuration Page when
going through the Web Service Client wizard.
28. If that option is selected, the Axis2 emitter will generate JUnit testcases matching
the WSDL we provide to the client. These JUnit testcases will be generated to a
newly added source directory to the Axis2WSTestClient project called test.
The next step is to fill in the test case with valid inputs as the Web
service method arguments. In this case, let's test
ConverterConverterSOAP11Port_httpTest.java by providing values for Celsius and
Fahrenheit for the temperature conversion. As an example, replace the generated TODO
statement in each test method to fill in the data with values as:
Here the test cases were generated to test both the synchronous and asynchronous
clients.
29. After that, select the testcase, right-click, select Run As -> JUnit Test. You will be
able to run the unit test successfully invoking the Web service.
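The temperature conversion exercised by the test cases in step 28 can be checked locally before calling the Web service. Converter.java itself is not reproduced in this manual, so the sketch below only assumes that it implements the standard Celsius/Fahrenheit formulas. The class name ConverterSketch and its method names are hypothetical and are not the names used by the generated stubs.

```java
// Hypothetical stand-in for the conversion logic behind the Converter service.
class ConverterSketch {
    // F = C * 9 / 5 + 32
    static float celsiusToFahrenheit(float celsius) {
        return (celsius * 9 / 5) + 32;
    }

    // C = (F - 32) * 5 / 9
    static float fahrenheitToCelsius(float fahrenheit) {
        return (fahrenheit - 32) * 5 / 9;
    }

    public static void main(String[] args) {
        System.out.println("100 C in Fahrenheit: " + celsiusToFahrenheit(100f));
        System.out.println("212 F in Celsius: " + fahrenheitToCelsius(212f));
    }
}
```

Values computed this way can serve as the expected results when filling in the generated JUnit test methods.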
We can choose Web Services -> Test with Web Services Explorer to test the
service.
We can choose Web Services -> Publish WSDL file to publish the service to a
public UDDI registry.
RESULT:
OBJECTIVE:
PROCEDURE:
1. Java 2 SDK
Run the downloaded executable (j2sdk-1_4_1-windows-i586.exe) which will install
the SDK in C:\j2sdk1.4.1. Set the JAVA_HOME environment variable to point to this
directory as follows:
Click on START->CONTROL PANEL->SYSTEM
Click on the Advanced tab
Click on the Environment Variables button
Click on the New… button in the user variable section and enter the details
Add the Java binaries to your PATH variable in the same way by setting a user
variable called PATH with the value “%PATH%;C:\j2sdk1.4.1\bin”
2. Apache Tomcat
3. XML Security
Download and unzip
http://www.apache.org/dist/xml/security/javalibrary/xmlsecurity-bin 1_0_4.zip
Copy xmlsec.jar to C:\axis-1_1\lib\
Set up your CLASSPATH environment variable to include the following:
C:\axis-1_1\lib\xmlsec.jar;
4. Apache Axis
Unzip the downloaded Axis archive to C: (this will create a directory C:\axis-1_1).
Extract the file xmlsec.jar from the downloaded security archive to
C:\axis-1_1\webapps\axis\WEB-INF\lib.
Now tell Tomcat about your Axis web application by creating the file
C:\jakarta-tomcat-4.1.24\webapps\axis.xml with the following content:
<Context path="/axis" docBase="C:\axis-1_1\webapps\axis" debug="0"
         privileged="true">
  <Logger className="org.apache.catalina.logger.FileLogger" prefix="axis_log."
          suffix=".txt" timestamp="false"/>
</Context>
RESULT:
Thus the development of a Grid Service using Apache Axis is executed successfully.
OBJECTIVE:
To develop an application using Java or C/C++ Grid APIs.
SAMPLE CODE:
import AgentTeamwork.Ateam.*;
import MPJ.*;

public class UserProgAteam extends AteamProg {
    private int phase;

    public UserProgAteam( Ateam o ) {}

    public UserProgAteam( ) {}

    // real constructor
    public UserProgAteam( String[] args ) {
        phase = 0;
    }

    // phase recovery: resume from the last snapshot id
    private void userRecovery( ) {
        phase = ateam.getSnapshotId( );
    }

    // application body: take one snapshot per phase
    private void compute( ) {
        for ( phase = 0; phase < 10; phase++ ) {
            try {
                Thread.sleep( 1000 );
            } catch ( InterruptedException e ) {
            }
            ateam.takeSnapshot( phase );
            System.out.println( "UserProgAteam at rank " + MPJ.COMM_WORLD.Rank( )
                                + " : took a snapshot " + phase );
        }
    }

    public static void main( String[] args ) {
        System.out.println( "UserProgAteam: got started" );
        MPJ.Init( args, ateam );
        UserProgAteam program = null;
        // Timer timer = new Timer( );
        if ( ateam.isResumed( ) ) {
            // restore the program object saved before the interruption
            program = ( UserProgAteam ) ateam.retrieveLocalVar( "program" );
            program.userRecovery( );
        } else {
            program = new UserProgAteam( args );
            ateam.registerLocalVar( "program", program );
        }
        program.compute( );
        MPJ.Finalize( );
    }
}
public class UserProgAteam extends AteamProg {
    // application body
    private void compute( ) {
        for ( phase = 0; phase < 10; phase++ ) {
            try {
                Thread.sleep( 1000 );
            } catch ( InterruptedException e ) {
            }
            ateam.takeSnapshot( phase );
            System.out.println( "UserProgAteam at rank " + MPJ.COMM_WORLD.Rank( )
                                + " : took a snapshot " + phase );
        }
    }
}
RESULT:
Thus the development of applications using Java or C/C++ Grid APIs is executed
successfully.
OBJECTIVE:
PROCEDURE:
The Globus Toolkit's Authentication and Authorization components provide the
de facto standard for the "core" security software in Grid systems and applications.
These software development kits (SDKs) provide programming libraries, Java classes,
and essential tools for a PKI, certificate-based authentication system with single sign-on
and delegation features, in either Web Services or non-Web Services frameworks.
("Delegation" means that once someone accesses a remote system, he can give the
remote system permission to use his credentials to access other systems on his
behalf.)
Web Services are simply applications that interact with each other using Web
standards, such as the HTTP transport protocol and the XML family of standards. In
particular, Web Services use the SOAP messaging standard for communication
between service and requestor. They should be self-describing, self-contained and
modular; present a platform and implementation neutral connection layer; and be based
on open standards for description, discovery and invocation.
The Grid Security Infrastructure (GSI) is based on the Generic Security Services
API (GSS-API) and uses an extension to X.509 certificates to provide a mechanism to
authenticate subjects and authorise resources. It allows users to benefit from the ease
of use of a single sign-on mechanism by using delegated credentials and time-limited
proxy certificates. GSI is used as the security infrastructure for the Globus Toolkit.
Recently, a new proposal for an Open Grid Services Architecture (OGSA) was
announced which marries the Grid and Web Services to create a new Grid Services
model. One problem, which has not yet been explicitly addressed, is that of security. A
possible solution is to use a suitably secure transport binding, e.g. TLS, and extend it to
incorporate appropriate support for proxy credentials. It would be useful to test out some
of the principles of Grid Services using the currently available frameworks and tools for
developing Web Services. Unfortunately, no standards currently exist for implementing
proxy credential support to provide authenticated communication between web
services. A number of XML/Web Services security standards are currently in
development, e.g. XML Digital Signatures, SAML, XKMS, XACML, but the remainder of
this document describes an approach proposed by ANL to use GSI over an SSL link.
A generic Job Submission environment, GAP enables researchers and scientists to
execute their applications on Grid from a conventional web browser. Both Sequential
and Parallel jobs can be submitted to GARUDA Grid through Portal. It provides a
web interface for viewing the resources, and for submitting and monitoring jobs.
Accessing GAP
Type http://192.168.60.40/GridPortal1.3/ (to access the Portal through the GARUDA
Network) or http://203.200.36.236/GridPortal1.3 (to access the Portal through the Internet)
in the address bar of the web browser to invoke the Portal. It is preferable to access
the Portal through the GARUDA Network, since it is much faster than the Internet.
In order to access the facilities of the Grid Portal such as Job Submission, Job Status
tracking, Storing (Uploading) of Executables and View Output/Error data, the user has
to log in to the Portal using the User Login Form on the Home page of the Portal.
a) New users are required to click Sign up in the User Login Form, which leads them
to the home page of the Indian Grid Certification Authority (IGCA) (http://ca.garudaindia.in/).
Click on Request Certificate and acquire the required user/host certificate(s);
details are provided in the IGCA section.
b) Registered users are required to provide User Id and Password for logging into the
Portal and access various facilities.
Job Management
Users can submit their jobs, monitor the status and view output files using the Job
Management interfaces. Types of job submission (Basic and Advanced) and Job
information are covered under this section.
Id. The Job Id has to be noted for future reference to this job. In the event of
unsuccessful submission, the corresponding error message is displayed.
All those fields marked with * are mandatory fields and should be filled before
submitting a job. By clicking on submit button, the portal submits the job to GridWay
Meta Scheduler, which then schedules the job for execution and returns the Job Id.
The Job Id has to be noted for future reference to this job.
This interface is provided for the user to submit Sequential and Parallel
Jobs. It differs from Basic job submission in that it uses GT4 Web Services
components to submit jobs to the Grid instead of GridWay as the scheduler.
Job Info
The user can view the status of a job submitted through the Portal and the output file of
the job by specifying the Job Id. The option for downloading the Output/Error file is
also provided after the job execution. To cancel any of the queued jobs, the user
has to select the job and click the Cancel Job button, following which an
acknowledgment of the cancellation is provided.
Resources
The GridWay meta-scheduler provides the following information - Node Name, Head
Node, OS, ARCH, Load Average, Status, Configured Process and Available Process.
This information helps the user to select a suitable cluster and reserve it in advance for
job submission.
File browser
For the logged-in user, the File Browser lists files, such as the uploaded executables
and Input/Output/Error files, along with their size and last modified information. It also
allows deletion of files.
Accounting
This module provides Accounting information for the jobs that are submitted to
GARUDA, such as the number of jobs submitted, and system parameters such as Memory
usage, Virtual memory, Wall Time, and CPU time. Data for the last month is displayed by
default.
MyProxy
MyProxy allows users to upload their Globus Certificates into the MyProxy Server, and the
same can be used for initializing the Grid proxy on the Grid. If the certificate has
already been generated for you, but you do not have access to the above-mentioned files,
you can download it from the GridFS machine (from the $HOME/.globus directory) using
winscp/scp.
MyProxy Init
By default, the "Myproxy Init" option is enabled for the user. Upload proxy by entering
valid inputs - User name, Grid-proxy Passphrase, User certificate file (usercert.pem),
User key file (userkey.pem) and Proxy life time (168 hours is the default value).
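The effect of the proxy lifetime can be illustrated with a small calculation. The sketch below is plain Java and does not call MyProxy or any Globus library. It only shows how an expiry time follows from an issue time and a lifetime in hours, using the 168-hour default quoted above. The class name ProxyLifetime and its methods are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper illustrating time-limited proxies; no MyProxy API is used.
class ProxyLifetime {
    static final long DEFAULT_LIFETIME_HOURS = 168; // default value quoted in the text

    // Expiry time = issue time + lifetime.
    static Instant expiry(Instant issuedAt, long lifetimeHours) {
        return issuedAt.plus(Duration.ofHours(lifetimeHours));
    }

    // A proxy is usable only strictly before its expiry time.
    static boolean isValid(Instant issuedAt, long lifetimeHours, Instant now) {
        return now.isBefore(expiry(issuedAt, lifetimeHours));
    }

    public static void main(String[] args) {
        Instant issued = Instant.now();
        System.out.println("Proxy issued at:  " + issued);
        System.out.println("Proxy expires at: " + expiry(issued, DEFAULT_LIFETIME_HOURS));
    }
}
```

With the default of 168 hours, a proxy uploaded now stays usable for seven days, after which it has to be uploaded again.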
MyProxyGet
Grid proxy will be initialized on the Grid head node by providing the inputs - User
name, Myproxy Passphrase and Life time of the certificate.
VOMS Proxy
The Virtual Organization Management System (VOMS) allows users to belong to
Virtual Organizations (VOs), thereby allowing them to utilize resources earmarked for
those VOs.
The user can also request for a new VO by using "Request for VO" link. VOMS proxy
initialization with Multiple roles is provided to the user, by selecting more than one
entry on the Role combo box.
2. pvfs2 (172.20.1.81): pvfs2 is the GSRM testing node with the following client
interfaces installed.
3. GSRM Web Client is accessible from any user machine that can reach the
GSRM server (xn05.ctsf.cdac.org.in), using URL --
https://xn05.ctsf.cdac.org.in/GSRM Client Interfaces
Compiler GUI
The users are required to adhere to following directory structure. Application Parent
Dir- src/,bin/,lib/,include/
1) Login
This method is for logging in to GARUDA.
Inputs
  user name            MyProxy user name
  password             MyProxy password
  life time            How long the proxy remains valid
Output
  Proxy string         Proxy issued by the MyProxy server
  Login status         Status of the operation
  Last Login Time      When this user last logged in
  Current Login Time   The user's current login time
2) uploadProxy
This method uploads a proxy that was generated using other tools to the MyProxy
Server.
Inputs
  user name            MyProxy user name
  password             MyProxy password
  proxyBytes           Existing proxy file, given as a byte array
Output
  uploadStatus         Status of the operation
3) storeCredential
This method is used for uploading credentials, that is, the PKCS12 certificate,
directly to the MyProxy Server. It converts the PKCS12 file to a certificate and stores it on
the server so that users can download the proxy until it expires.
Inputs
  user name            MyProxy user name
  password             MyProxy password
  p12Bytes             PKCS12 file as a byte array
Output
  storeStatus          Status of the operation
RESULT:
Ex.No.6 Develop a Grid portal, where user can submit a job and get the result.
Implement it with and without GRAM concept
OBJECTIVE:
To develop a Grid portal, where user can submit a job and get the result and to
implement it with and without GRAM concept.
PROCEDURE:
1. Opening the workflow editor
The editor is a Java Web Start application; download and installation take only a click.
4. The information system can query EGEE and Globus information systems
RESULT:
Thus the development of a Grid portal, where user can submit a job and get the
result and to implement it with and without GRAM is executed successfully.
Ex.No.7
FIND PROCEDURE TO RUN THE VIRTUAL MACHINE OF DIFFERENT
CONFIGURATION. CHECK HOW MANY VIRTUAL MACHINES CAN BE UTILIZED
AT PARTICULAR TIME
OBJECTIVE:
To understand the procedure to run the virtual machine of different configuration.
Check how many Virtual machines can be utilized at particular time.
PROCEDURE:
KVM INSTALLATION
To see if your running kernel is 64-bit, issue the following command:
$ uname -m
x86_64 indicates a running 64-bit kernel. If you see i386, i486, i586 or i686, you're
running a 32-bit kernel.
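The rule above can be written down as a small classifier, which may help when checking several hosts. The sketch takes the text printed by uname -m as input and applies only the mapping stated in this section. Any other value is reported as unknown. The class name KernelArch and the method classify are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: classifies the output of "uname -m" using the rule in the text.
class KernelArch {
    private static final List<String> X86_32 = Arrays.asList("i386", "i486", "i586", "i686");

    // Returns "64-bit", "32-bit" or "unknown" for the given machine string.
    static String classify(String unameOutput) {
        String machine = unameOutput.trim(); // drop the trailing newline, if any
        if (machine.equals("x86_64")) {
            return "64-bit";
        }
        if (X86_32.contains(machine)) {
            return "32-bit";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Pass the uname -m output as the first argument.
        System.out.println(classify(args.length > 0 ? args[0] : ""));
    }
}
```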
$ ls /lib/modules/3.16.0-30-generic/kernel/arch/x86/kvm/kvm
Verify Installation
You can test if your install has been successful with the following command:
$ virsh -c qemu:///system list
Id Name State
----------------------------------
If on the other hand you get something like this:
$ virsh -c qemu:///system list
libvir: Remote error : Permission denied
error: failed to connect to the hypervisor
virsh # version
virsh # nodeinfo
Creating VMS
$ virt-install --connect qemu:///system -n hardy -r 512 -f hardy1.qcow2 -s 12 -c ubuntu-14.04.2-server-amd64.iso --vnc --noautoconsole --os-type linux --os-variant ubuntuHardy
(or)
Open disk image Error
$ sudo chmod 777 hardy.qcow2
To run
$ virt-install --connect qemu:///system -n hardy -r 512 -f hardy1.qcow2 -s 12 -c
To Login in Guest OS
Step 1 : Under the Project Tab, Click Instances. In the right side screen Click Launch Instance.
$ cd devstack
$ ll
$ mv ../local.conf .
$ ls -l local.conf
$ ./stack.sh
Edit stack.sh:
$ nano stack.sh
Below the comment "# Make sure umask is sane", add FORCE=yes, then save and exit.
$ ./unstack.sh
$ ./clean.sh
Run DevStack:
$ ./stack.sh
Re-Starting OpenStack
$ ./rejoin.sh
$ ps -ef|grep devstack
This shows all the running processes.
End all the processes.
RESULT:
Thus the procedure to run the virtual machine of different configuration is executed
successfully.
OBJECTIVE:
To write the procedure to attach virtual block to the virtual machine and check
whether it holds the data even after the release of the virtual machine.
PROCEDURE:
Volumes are block storage devices that you attach to instances to enable
persistent storage. You can attach a volume to a running instance or detach a volume
and attach it to another instance at any time. You can also create a snapshot from or
delete a volume. Only administrative users can create volume types.
Create a volume
1. Log in to the dashboard.
2. Select the appropriate project from the drop down menu at the top left.
3. On the Project tab, open the Compute tab and click the Volumes category.
4. Click Create Volume.
In the dialog box that opens, enter or select the following values.
Volume: If you choose this option, a new field for Use volume as a source displays.
You can select the volume from the list. Options to use a snapshot or a volume as the
source for a volume are displayed only if there are existing snapshots or volumes.
On the Project tab, open the Compute tab and click the Volumes category.
6. Select the volume and click Manage Attachments.
7. Click Detach Volume and confirm your changes.
A message indicates whether the action was successful.
Edit a volume
1. Log in to the dashboard.
2. Select the appropriate project from the drop down menu at the top left.
3. On the Project tab, open the Compute tab and click the Volumes category.
4. Select the volume that you want to edit.
5. In the Actions column, click Edit Volume.
6. In the Edit Volume dialog box, update the name and description of the volume.
7. Click Edit Volume.
Note
You can extend a volume by using the Extend Volume option available in the More
dropdown list and entering the new value for volume size.
Delete a volume
When you delete an instance, the data in its attached volumes is not deleted.
1. Log in to the dashboard.
2. Select the appropriate project from the drop down menu at the top left.
3. On the Project tab, open the Compute tab and click the Volumes category.
Note: The actual device name might differ from the volume name because of hypervisor
settings.
7. Click Attach Volume.
The dashboard shows the instance to which the volume is now attached and the device
name.
You can view the status of a volume in the Volumes tab of the dashboard. The volume
is either Available or In-Use.
Now you can log in to the instance and mount, format, and use the disk.
RESULT:
Thus the procedure to attach virtual block to the virtual machine and check
whether it holds the data even after the release of the virtual machine is executed
successfully.
OBJECTIVE:
PROCEDURE:
Step 1: To login into Guest OS in KVM
RESULT:
OBJECTIVE:
To learn virtual machine migration based on a certain condition from one node
to the other.
PROCEDURE:
To demonstrate virtual machine migration, two machines must be configured in
one cloud.
MIGRATION LIMITATIONS
Openstack has two commands specific to virtual machine migration:
nova migrate $UUID
nova live-migration $UUID $COMPUTE-HOST
The nova migrate command shuts down an instance to move it to another
hypervisor.
The instance is down for a period of time and sees this as a regular shutdown. It is
not possible to specify the compute host you want to migrate the instance to.
This command does not require shared storage, but the migration can take a long time.
The OpenStack cluster chooses the target hypervisor machine based on free
resources and availability.
The migrate command works with any type of instance, and the VM clock is not affected.
The nova live-migration command has almost no instance downtime.
The instance is suspended and does not see this as a shutdown.
The live-migration command lets you specify the compute host you want to migrate to, but
with some limitations. It requires shared storage, instances without a configdrive
when block storage is used, or volume-backed instances.
The migration fails if there are not enough resources on the target hypervisor.
The VM clock might be off.
Hypervisor Capacity
Before you do a migration, check if the hypervisor host has enough free capacity for the
VM you want to migrate:
nova host-describe compute-30
Example output:
+------------+------------+-----+-----------+---------+
| HOST       | PROJECT    | cpu | memory_mb | disk_gb |
+------------+------------+-----+-----------+---------+
| compute-30 | (total)    | 64  | 512880    | 5928    |
| compute-30 | (used_now) | 44  | 211104    | 892     |
| compute-30 | (used_max) | 44  | 315568    | 1392    |
| compute-30 | 4[...]0288 | 1   | 512       | 20      |
| compute-30 | 4[...]0194 | 20  | 4506      | 62      |
+------------+------------+-----+-----------+---------+
In this table, the first row shows the total amount of resources available on the
physical server. The second row shows the currently used resources. The third row
shows the maximum used resources. The fourth row and below show the resources
used by each project.
If the VM flavor fits on this hypervisor, continue on with the manual migration. If not,
free up some resources or choose another compute server. If the hypervisor node
lacks enough capacity, the migration will fail.
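The capacity check described above can be sketched in a few lines of Java. This is only an illustration: it assumes that free capacity is the (total) row minus the (used_now) row, and the flavor (4 vCPUs, 8192 MB RAM, 80 GB disk) is a hypothetical example, not a value from this manual. The class and method names are made up for this sketch.

```java
/** Sketch: does a flavor fit into the free capacity reported for a hypervisor? */
final class CapacityCheck {
    /** Free amount of one resource: total minus currently used. */
    static long free(long total, long usedNow) {
        return total - usedNow;
    }

    /** True only if every resource of the flavor fits into the free capacity. */
    static boolean fits(long[] total, long[] usedNow, long[] flavor) {
        for (int i = 0; i < total.length; i++) {
            if (flavor[i] > free(total[i], usedNow[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Order: cpu, memory_mb, disk_gb (taken from the example output).
        long[] total = {64, 512880, 5928};
        long[] usedNow = {44, 211104, 892};
        // Hypothetical flavor: 4 vCPUs, 8192 MB RAM, 80 GB disk.
        long[] flavor = {4, 8192, 80};
        System.out.println("fits: " + fits(total, usedNow, flavor));
    }
}
```

With the example output, the free capacity is 20 CPUs, 301776 MB of memory and 5036 GB of disk, so the hypothetical flavor fits.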
On versions older than Kilo, the Compute service does not use libvirt's live migration
functionality by default; guests are therefore suspended before migration and might
experience several minutes of downtime. This is because there is a risk that the
migration process will never end, which can happen if the guest operating system
uses blocks on the disk faster than they can be migrated. To enable true live
migration using libvirt's migrate functionality, see the OpenStack documentation.
If you have shared storage, or if the instance is volume backed, a live migration sends the
instance's memory (RAM) content over to the destination host. The source hypervisor
keeps track of which memory pages are modified on the source while the transfer is
in progress.
Once the initial bulk transfer is complete, pages changed in the meantime are
transferred again. This is done repeatedly with (ideally) ever smaller increments. As
long as the differences can be transferred faster than the source VM dirties memory
pages, at some point the source VM gets suspended.
Final differences are sent to the target host and an identical machine started there. At
the same time the virtual network infrastructure takes care of all traffic being directed
to the new virtual machine. Once the replacement machine is running, the suspended
source instance is deleted.
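The iterative memory transfer described above can be illustrated with a small simulation. This is a simplified sketch, not the algorithm used by nova or libvirt: it assumes a constant transfer rate, a constant rate at which the guest dirties memory, and a hypothetical threshold below which the VM is suspended for the final copy. All names and numbers are illustrative.

```java
/** Sketch: how many copy rounds are needed before the VM can be suspended? */
final class PreCopySketch {
    /**
     * Returns the number of copy rounds until the memory dirtied during a round
     * drops to thresholdMb or less, or -1 if that does not happen within maxRounds.
     */
    static int roundsUntilSuspend(double memoryMb, double transferMbPerSec,
                                  double dirtyMbPerSec, double thresholdMb,
                                  int maxRounds) {
        double remaining = memoryMb; // the first round sends all memory
        for (int round = 1; round <= maxRounds; round++) {
            double seconds = remaining / transferMbPerSec; // time to send this round
            remaining = dirtyMbPerSec * seconds;           // pages dirtied meanwhile
            if (remaining <= thresholdMb) {
                return round;
            }
        }
        return -1; // the guest dirties memory too fast; the migration does not converge
    }

    public static void main(String[] args) {
        // 4096 MB of RAM, 1000 MB/s network, guest dirties 100 MB/s, suspend below 1 MB.
        System.out.println(roundsUntilSuspend(4096, 1000, 100, 1, 30));
        // Guest dirties memory faster than the network can transfer it.
        System.out.println(roundsUntilSuspend(4096, 1000, 1200, 1, 30));
    }
}
```

In the first call each round leaves one tenth of the previous amount to resend, so the increments shrink quickly. In the second call the amount to resend grows every round, which is the case where the migration never finishes.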
Usually the actual handover takes place so quickly and seamlessly that only very
time-sensitive applications ever notice anything. You can check this by starting a ping
to the VM you are live-migrating. It stays online, and while the VM is suspended
and resumed on the target hypervisor, the ping responses take a bit longer.
If we don't have shared storage and the VM is not backed by a volume as root disk
(an image-based VM), a live migration requires an extra parameter:
nova live-migration --block-migrate $UUID $COMPUTE-HOST
The process is almost exactly the same as described above, with one extra step.
Before the memory contents are sent, the disk content is copied over without
downtime. When the VM is suspended, both the memory contents and the disk
contents are sent over.
The suspend action takes longer and might be noticeable as downtime. The --block-migrate
option is incompatible with read-only devices such as ISO CD/DVD drives
and the Config Drive.
With nova migrate, the VM is shut down and stays down for as long as the copying takes. The
OpenStack cluster chooses a compute-service-enabled hypervisor with the most
resources available. This works with any type of instance, with any type of backend
storage.
A migrate is even simpler than a live-migration. Here's the syntax:
nova migrate $UUID
This is perfect for instances that are part of a clustered service, or when you have
scheduled and communicated downtime for that specific VM. The downtime
depends on the size of the disk and the speed of the (storage) network. Because rsync over
ssh is used to copy the actual disk, you can test the speed yourself with a few regular
rsync tests and combine that with the disk size to get an indication of the migration
downtime.
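As a rough illustration of that estimate, the sketch below divides the disk size by a measured copy rate. The 20 GB disk and the 80 MB/s rsync rate are hypothetical numbers, and the calculation ignores shutdown, boot and scheduling time, so treat the result as a lower bound.

```java
/** Sketch: estimate the copy time of a cold migration from disk size and copy rate. */
final class MigrateDowntime {
    /** Estimated copy time in seconds, with 1 GB counted as 1024 MB. */
    static double estimateSeconds(double diskGb, double rateMbPerSec) {
        return diskGb * 1024.0 / rateMbPerSec;
    }

    public static void main(String[] args) {
        // Hypothetical values: 20 GB disk, rsync test measured at 80 MB/s.
        System.out.println(estimateSeconds(20, 80) + " seconds");
    }
}
```

For these numbers the estimate is 256 seconds, a little over four minutes.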
Example output:
+----+--------------+--------------+------+----------+-------+----------------------------+-----------------+
13T17:02:49.000000 | - |
| 9  | nova-compute | compute-32   | OS2  | enabled  | up    | 2015-06-13T17:02:50.000000 | None            |
| 10 | nova-compute | compute-33   | OS2  | enabled  | up    | 2015-06-13T17:02:50.000000 | -               |
| 11 | nova-compute | compute-34   | OS1  | disabled | up    | 2015-06-13T17:02:49.000000 | Migrations Only |
+----+--------------+--------------+------+----------+-------+----------------------------+-----------------+
In the above example, we have 5 compute nodes, of which one is disabled with
reason Migrations Only. In our case, before we started migrating we have enabled
nova compute on that hypervisor and disabled it on all the other hypervisors:
Now execute the nova migrate command. Since we have disabled all compute
hypervisors except the target hypervisor, that one will be used as migration target. All
new virtual machines created during the migration will also be spawned on that
specific hypervisor. When the migration is finished, enable all the other compute
nodes:
In our case, we would disable the compute-34 because it is for migrations only. This
is a bit dirty and might cause problems if you have monitoring on the cluster state or
spawn a lot of machines all the time.
Do note that this part is specific to the storage you use. In this example we use local
storage (or, a local folder on an NFS mount not shared with other compute nodes)
and image-backed instances. In my case, I needed to migrate an image-backed block
storage instance to a non-shared storage node, but the instance had a configdrive
enabled. Disabling the compute service everywhere is not an option, since the cluster
was getting about a hundred new VM's every 5 minutes and that would overload the
hypervisor node.
Example output:
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-30 |
Log in to that hypervisor via SSH. Navigate to the folder where this instance is located,
in our case /var/lib/nova-compute/instances/$UUID.
The instance is booted from an image based root disk, named disk. qemu in our case
diffs the root disk from the image the VM was created from. Therefore the new
hypervisor also needs that backing image. Find out which file is the backing image:
cd /var/lib/nova-compute/instances/$UUID/
qemu-img info disk # disk is the filename of the instance root disk
Example output:
image: disk
file format: qcow2
virtual size: 32G (34359738368 bytes)
disk size: 1.3G
cluster_size: 65536
backing file: /var/lib/nova-compute/instances/_base/d00[...]61
Example output:
image: /var/lib/nova-compute/instances/_base/d00[...]61
Check the target hypervisor for the existence of that image. If it is not there, copy that
file to the target hypervisor first:
cd /var/lib/nova-compute/instances/
rsync -r --progress $VM_UUID -e ssh compute-34:/var/lib/nova-compute/instances/
Set the correct permissions on the folder on the target hypervisor:
chown nova:nova /var/lib/nova-compute/instances/$VM_UUID
chown nova:nova /var/lib/nova-compute/instances/$VM_UUID/disk.info
chown nova:nova /var/lib/nova-compute/instances/$VM_UUID/libvirt.xml
chown libvirt:kvm /var/lib/nova-compute/instances/$VM_UUID/console.log
chown libvirt:kvm /var/lib/nova-compute/instances/$VM_UUID/disk
chown libvirt:kvm /var/lib/nova-compute/instances/$VM_UUID/disk.config
If you use other usernames and groups, change those in the commands.
Log in to your database server. In my case that is a MySQL Galera cluster. Start up a MySQL
command prompt in the nova database:
mysql nova
Execute the following command to update the nova database with the new
hypervisor for this VM:
This was tested on an Icehouse database schema; other versions might require
other queries. Use the nova show command to see if the new hypervisor is set. If so,
start the VM:
Do note that we must check the free capacity ourselves. The VM will work if there is not
enough capacity, but we do run into weird issues with the hypervisor, like bad
performance or killed processes (OOMs).
RESULT:
Thus the virtual machine migration, based on a certain condition, from one node
to the other was executed successfully.
OBJECTIVE:
To find the procedure to install a storage controller and interact with it.
PROCEDURE:
cinder-api
Accepts API requests, and routes them to the cinder-volume for action.
cinder-volume
Interacts directly with the Block Storage service, and processes such as the
cinder-scheduler. It also interacts with these processes through a message queue. The
cinder-volume service responds to read and write requests sent to the Block Storage
service to maintain state. It can interact with a variety of storage providers through a
driver architecture.
cinder-scheduler daemon
Selects the optimal storage provider node on which to create the volume. A similar
component to the nova-scheduler.
Messaging queue
Routes information between the Block Storage processes.
To configure prerequisites
Before you install and configure the Block Storage service, you must create a
database, service credentials, and API endpoints.
1. To create the database, complete these steps:
a. Use the database access client to connect to the database server as
the root user:
$ mysql -u root -p
2. Create the cinder database:
| id | 16e038e449c94b40868277f1d801edb5 |
| name | cinderv2 |
| type | volumev2 |
+-------------+----------------------------------+
4. Create the Block Storage service API endpoints:
$ keystone endpoint-create \
--service-id $(keystone service-list | awk '/ volume / {print $2}') \
--publicurl http://controller:8776/v1/%\(tenant_id\)s \
--internalurl http://controller:8776/v1/%\(tenant_id\)s \
--adminurl http://controller:8776/v1/%\(tenant_id\)s \
--region regionOne
+-------------+-----------------------------------------+
| Property    | Value                                   |
+-------------+-----------------------------------------+
| adminurl    | http://controller:8776/v1/%(tenant_id)s |
| id          | d1b7291a2d794e26963b322c7f2a55a4        |
| internalurl | http://controller:8776/v1/%(tenant_id)s |
| publicurl   | http://controller:8776/v1/%(tenant_id)s |
| region      | regionOne                               |
| service_id  | 1e494c3e22a24baaafcaf777d4d467eb        |
+-------------+-----------------------------------------+
$ keystone endpoint-create \
--service-id $(keystone service-list | awk '/ volumev2 / {print $2}') \
--publicurl http://controller:8776/v2/%\(tenant_id\)s \
--internalurl http://controller:8776/v2/%\(tenant_id\)s \
--adminurl http://controller:8776/v2/%\(tenant_id\)s \
--region regionOne
+-------------+-----------------------------------------+
| Property    | Value                                   |
+-------------+-----------------------------------------+
| adminurl    | http://controller:8776/v2/%(tenant_id)s |
| id          | 097b4a6fc8ba44b4b10d4822d2d9e076        |
| internalurl | http://controller:8776/v2/%(tenant_id)s |
| publicurl   | http://controller:8776/v2/%(tenant_id)s |
| region      | regionOne                               |
| service_id  | 16e038e449c94b40868277f1d801edb5        |
+-------------+-----------------------------------------+
[database]
...
connection = mysql://cinder:CINDER_DBPASS@controller/cinder
Replace CINDER_DBPASS with the password you chose for the Block Storage
database.
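b. In the [DEFAULT] section, configure RabbitMQ message broker access:
[DEFAULT]
...
rpc_backend = rabbit
rabbit_host = controller
rabbit_password = RABBIT_PASS
Replace RABBIT_PASS with the password you chose for the guest account in
RabbitMQ.
c. In the [DEFAULT] and [keystone_authtoken] sections, configure Identity service
access:
[DEFAULT]
...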
auth_strategy = keystone
[keystone_authtoken]
...
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = cinder
admin_password = CINDER_PASS
Replace CINDER_PASS with the password you chose for the cinder user in the
Identity service.
d. In the [DEFAULT] section, configure the my_ip option to use the management
interface IP address of the controller node:
[DEFAULT]
...
my_ip = 10.0.0.11
To finalize installation
1. Restart the Block Storage services:
# service cinder-scheduler restart
# service cinder-api restart
To configure prerequisites
You must configure the storage node before you install and configure the volume
service on it. Similar to the controller node, the storage node contains one network
interface on the management network. The storage node also needs an empty block
storage device of suitable size for your environment.
Only instances can access Block Storage volumes. However, the underlying
operating system manages the devices associated with the volumes. By default, the
LVM volume scanning tool scans the /dev directory for block storage devices that
contain volumes. If tenants use LVM on their volumes, the scanning tool detects these
volumes and attempts to cache them which can cause a variety of problems with both
the underlying operating system and tenant volumes. You must reconfigure LVM to
scan only the devices that contain the cinder-volume volume group.
In the devices section, add a filter that accepts the /dev/sdb device and rejects all
other devices:
devices {
...
filter = [ "a/sdb/", "r/.*/"]
Each item in the filter array begins with a for accept or r for reject and includes a regular
expression for the device name. The array must end with r/.*/ to reject any remaining
devices. You can use the vgs -vvvv command to test filters.
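The way such a filter array is evaluated can be sketched as follows. This is an illustration of the rule described above, not LVM's own code: items are checked in order, the first item whose regular expression matches the device name decides, and a device that matches nothing is accepted (which is why the array ends with r/.*/). The class name and the sample device names are assumptions for this sketch.

```java
import java.util.regex.Pattern;

/** Sketch: evaluate an LVM-style accept/reject filter array for a device name. */
final class LvmFilterSketch {
    /**
     * The first item whose regular expression matches anywhere in the device
     * name decides the result. Devices that match no item are accepted.
     */
    static boolean accepts(String[] filter, String device) {
        for (String item : filter) {
            char action = item.charAt(0); // 'a' = accept, 'r' = reject
            char delim = item.charAt(1);  // delimiter, '/' in this example
            String regex = item.substring(2, item.lastIndexOf(delim));
            if (Pattern.compile(regex).matcher(device).find()) {
                return action == 'a';
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] filter = {"a/sdb/", "r/.*/"};
        System.out.println("/dev/sdb -> " + accepts(filter, "/dev/sdb"));
        System.out.println("/dev/sda -> " + accepts(filter, "/dev/sda"));
    }
}
```

Note that the pattern sdb is not anchored, so a name such as /dev/sdb1 is accepted as well.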
Replace RABBIT_PASS with the password you chose for the guest account in
RabbitMQ.
c. In the [DEFAULT] and [keystone_authtoken] sections, configure Identity service
access:
[DEFAULT]
...
auth_strategy = keystone
[keystone_authtoken]
...
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = cinder
admin_password = CINDER_PASS
Replace CINDER_PASS with the password you chose for the cinder user in the Identity
service.
d. In the [DEFAULT] section, configure the my_ip option:
[DEFAULT]
...
my_ip = MANAGEMENT_INTERFACE_IP_ADDRESS
Replace MANAGEMENT_INTERFACE_IP_ADDRESS with the IP address of the management
network interface on your storage node, typically 10.0.0.41 for the first node in the
example architecture.
e. In the [DEFAULT] section, configure the location of the Image Service:
[DEFAULT]
...
glance_host = controller
f. (Optional) To assist with troubleshooting, enable verbose logging in
the [DEFAULT] section:
[DEFAULT]
...
verbose = True
To finalize installation
1. Restart the Block Storage volume service including its dependencies:
# service tgt restart
# service cinder-volume restart
2. By default, the Ubuntu packages create an SQLite database. Because this
configuration uses a SQL database server, remove the SQLite database file:
# rm -f /var/lib/cinder/cinder.sqlite
Verify operation
This section describes how to verify operation of the Block Storage service by creating a
volume.
+------------------+------------+------+---------+-------+----------------------------+-----------------+
| cinder-scheduler | controller | nova | enabled | up    | 2014-10-18T01:30:54.000000 | None            |
| cinder-volume    | block1     | nova | enabled | up    | 2014-10-18T01:30:57.000000 | None            |
+------------------+------------+------+---------+-------+----------------------------+-----------------+
3. Source the demo tenant credentials to perform the following steps as a
nonadministrative tenant:
$ source demo-openrc.sh
4. Create a 1 GB volume:
$ cinder create --display-name demo-volume1 1
+---------------------+--------------------------------------+
| Property | Value |
+---------------------+--------------------------------------+
| attachments | [] |
| availability_zone | nova |
| bootable | false |
| created_at | 2014-10-14T23:11:50.870239 |
| display_description | None |
| display_name | demo-volume1 |
| encrypted | False |
| id | 158bea89-07db-4ac2-8115-66c0d6a4bb48 |
| metadata | {} |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | creating |
| volume_type | None |
+---------------------+--------------------------------------+
5. Verify creation and availability of the volume:
$ cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
RESULT:
Thus the procedure to install a storage controller and interact with the OpenStack service is
executed successfully.
OBJECTIVE:
To set up the one node Hadoop cluster.
PROCEDURE:
1) Installing Java
Hadoop is a framework written in Java for running applications on large clusters of
commodity hardware. Hadoop needs Java 6 or above to work.
Step 1: Download tar and extract
Download the JDK tar.gz file for 64-bit Linux and extract it into “/opt”
# cd /opt
# sudo tar xvpzf /home/itadmin/Downloads/jdk-8u5-linux-x64.tar.gz
# cd /opt/jdk1.8.0_05
2) Configure SSH
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your
local machine if you want to use Hadoop on it (which is what we want to do in this
exercise).
For our single-node setup of Hadoop, we therefore need to configure SSH access to
localhost. Password-less, key-based SSH authentication is needed so that the master
node can log in to the slave nodes (and the secondary node) to start and stop them
without any delay for authentication.
Generate an SSH key for the user, then enable password-less SSH access to localhost.
First install the SSH server:
sudo apt-get install openssh-server
--You will be asked to enter password,
root@abc []# ssh localhost
root@abc[]# ssh-keygen
root@abc[]# ssh-copy-id -i localhost
--After above 2 steps, You will be connected without password,
root@abc[]# ssh localhost
root@abc[]# exit
3) Hadoop installation
Now download Hadoop from the official Apache site, preferably a stable release of
Hadoop 2.7.x, and extract the contents of the Hadoop package to a location of your
choice.
For example, choose the location “/opt/”.
Step 1: Download the tar.gz file of the latest Hadoop version (hadoop-2.7.x) from the
official site.
Step 2: Extract (untar) the downloaded file to /opt with these commands:
root@abc[]# cd /opt
root@abc[/opt]# sudo tar xvpzf /home/itadmin/Downloads/hadoop-2.7.0.tar.gz
root@abc[/opt]# cd hadoop-2.7.0/
Like Java, update the Hadoop environment variable in /etc/profile
# sudo vi /etc/profile
#--insert HADOOP_PREFIX
HADOOP_PREFIX=/opt/hadoop-2.7.0
#--in PATH variable just append at the end of the line
PATH=$PATH:$HADOOP_PREFIX/bin
#--Append HADOOP_PREFIX at end of the export statement
Add the following properties to the Hadoop configuration files, which are available
under $HADOOP_PREFIX/etc/hadoop/: core-site.xml, hdfs-site.xml, mapred-site.xml and
yarn-site.xml
Update Java, hadoop path to the Hadoop environment file
# cd $HADOOP_PREFIX/etc/hadoop
# vi hadoop-env.sh
Paste following line at beginning of the file
export JAVA_HOME=/opt/jdk1.8.0_05
export HADOOP_PREFIX=/opt/hadoop-2.7.0
Modify the core-site.xml
# cd $HADOOP_PREFIX/etc/hadoop
# vi core-site.xml
Paste following between <configuration> tags
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
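Modify the hdfs-site.xml
# vi hdfs-site.xml
Paste following between <configuration> tags
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>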
</property>
</configuration>
YARN configuration - Single Node
modify the mapred-site.xml
# cp mapred-site.xml.template mapred-site.xml
# vi mapred-site.xml
Paste following between <configuration> tags
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Modify yarn-site.xml
# vi yarn-site.xml
Copy the output files from the distributed filesystem to the local filesystem and examine
them:
$ bin/hdfs dfs -get output output
$ cat output/*
or view the output files on the distributed filesystem:
$ bin/hdfs dfs -cat /output/*
RESULT:
Thus setting up of one node Hadoop cluster is successfully executed.
OBJECTIVE:
To mount the one node Hadoop cluster using FUSE.
PROCEDURE:
Once fuse-dfs is installed, go ahead and mount HDFS using FUSE as follows.
$ sudo hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port> <mount_point>
Once HDFS has been mounted at <mount_point>, you can use most of the traditional
filesystem operations (e.g., cp, rm, cat, mv, mkdir, rmdir, more, scp). However,
random write operations such as rsync, and permission related operations such as
chmod, chown are not supported in FUSE-mounted HDFS.
RESULT:
Thus mounting the one node Hadoop cluster using FUSE is successfully executed.
OBJECTIVE:
To write a program that uses Hadoop's FileSystem API.
PROCEDURE:
Reading data from and writing data to Hadoop Distributed File System (HDFS) can be
done in a lot of ways. Now let us start by using the FileSystem API to create and write to
a file in HDFS, followed by an application to read a file from HDFS and write it back to
the local file system.
Step 1: Once you have downloaded a test dataset, we can write an application to
read a file from the local file system and write the contents to Hadoop Distributed
File System.
package com.hadoop.hdfs.writer;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ToolRunner;
public class HdfsWriter extends Configured implements Tool {
public static final String FS_PARAM_NAME = "fs.defaultFS";
public int run(String[] args) throws Exception {
if (args.length < 2) {
System.err.println("HdfsWriter [local input path] [hdfs output path]");
return 1;
}
String localInputPath = args[0];
Path outputPath = new Path(args[1]);
Configuration conf = getConf();
System.out.println("configured filesystem = " + conf.get(FS_PARAM_NAME));
FileSystem fs = FileSystem.get(conf);
if (fs.exists(outputPath)) {
System.err.println("output path exists");
return 1;
}
OutputStream os = fs.create(outputPath);
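InputStream is = new BufferedInputStream(new FileInputStream(localInputPath));
// Copy the local file into HDFS; copyBytes closes both streams when it finishes.
IOUtils.copyBytes(is, os, conf);
return 0;
}
public static void main(String[] args) throws Exception {
int returnCode = ToolRunner.run(new HdfsWriter(), args);
System.exit(returnCode);
}
}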
Step 2: Export the Jar file and run the code from terminal to write a sample file to
HDFS.
[training@localhost ~]$ hadoop jar HdfsWriter.jar com.hadoop.hdfs.writer.HdfsWriter sample.txt /user/training/HdfsWriter_sample.txt
Step 3: Verify whether the file is written into HDFS and check the contents of the
file.
[training@localhost ~]$ hadoop fs -cat /user/training/HdfsWriter_sample.txt
Step 4: Next, we write an application to read the file we just created in Hadoop
Distributed File System and write its contents back to the local file system.
package com.hadoop.hdfs.reader;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class HdfsReader extends Configured implements Tool {
public static final String FS_PARAM_NAME = "fs.defaultFS";
public int run(String[ ] args) throws Exception {
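if (args.length < 2) {
System.err.println("HdfsReader [hdfs input path] [local output path]");
return 1;
}
Path inputPath = new Path(args[0]);
String localOutputPath = args[1];
Configuration conf = getConf();
FileSystem fs = FileSystem.get(conf);
// Open the HDFS file and copy it to the local file system.
InputStream is = fs.open(inputPath);
OutputStream os = new BufferedOutputStream(new FileOutputStream(localOutputPath));
IOUtils.copyBytes(is, os, conf);
return 0;
}
public static void main(String[] args) throws Exception {
System.exit(ToolRunner.run(new HdfsReader(), args));
}
}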
Step 5: Export the Jar file and run the code from terminal to read the file from
HDFS and write it to the local file system.
[training@localhost ~]$ hadoop jar HdfsReader.jar com.hadoop.hdfs.reader.HdfsReader /user/training/HdfsWriter_sample.txt /home/training/HdfsReader_sample.txt
Step 6: Verify whether the file is written back into local file system.
[training@localhost ~]$ cat /home/training/HdfsReader_sample.txt
FileSystem is an abstract class that represents a generic file system. Most Hadoop
file system implementations can be accessed and updated through the FileSystem
object. To create an instance of the HDFS, you call the method FileSystem.get().
The FileSystem.get() method will look at the URI assigned to the fs.defaultFS
parameter of the Hadoop configuration files on your classpath and choose the correct
implementation of the FileSystem class to instantiate. The fs.defaultFS parameter of
HDFS has the value hdfs://.
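The idea of choosing an implementation from the URI scheme can be illustrated without Hadoop. The sketch below is not Hadoop's code: the registry contents, the class name and the returned descriptions are assumptions made for illustration. It only shows how the scheme part of a value such as hdfs://localhost:9000 can be used as a lookup key.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Sketch: pick a file system implementation from the scheme of a default-FS URI. */
final class SchemeLookupSketch {
    // Hypothetical registry: URI scheme -> description of the implementation.
    static final Map<String, String> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("hdfs", "distributed file system");
        REGISTRY.put("file", "local file system");
    }

    static String implementationFor(String defaultFs) {
        String scheme = URI.create(defaultFs).getScheme();
        String impl = REGISTRY.get(scheme);
        if (impl == null) {
            throw new IllegalArgumentException("No file system for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) {
        System.out.println(implementationFor("hdfs://localhost:9000"));
        System.out.println(implementationFor("file:///"));
    }
}
```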
Once an instance of the FileSystem class has been created, the HdfsWriter class
calls the create() method to create a file in HDFS. The create() method returns an
OutputStream object, which can be manipulated using normal Java I/O methods.
Similarly HdfsReader calls the method open() to open a file in HDFS, which returns
an InputStream object that can be used to read the contents of the file. The
FileSystem API is extensive. To demonstrate some of the other methods available in
the API, we can add some error checking to the HdfsWriter and HdfsReader classes
we created.
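Because create() and open() hand back ordinary Java streams, the copy step itself is plain Java I/O. The sketch below shows such a buffered copy loop using in-memory streams as stand-ins for the HDFS and local streams. It is an illustration only and not the implementation of Hadoop's IOUtils; the class name, method names and buffer size are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Sketch: copy one stream to another with a fixed-size buffer. */
final class StreamCopySketch {
    /** Copies all bytes from in to out, closes both, and returns the byte count. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        long total = 0;
        byte[] buffer = new byte[bufferSize];
        try (InputStream i = in; OutputStream o = out) {
            int n;
            while ((n = i.read(buffer)) != -1) {
                o.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    /** Helper for the demo: copies a byte array through copy() and returns the result. */
    static byte[] roundTrip(byte[] data, int bufferSize) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            copy(new ByteArrayInputStream(data), sink, bufferSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] copied = roundTrip("hello hdfs".getBytes(StandardCharsets.UTF_8), 4);
        System.out.println(new String(copied, StandardCharsets.UTF_8));
    }
}
```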
RESULT:
Thus the program to use the Hadoop File System API to interact with it is
successfully executed.
OBJECTIVE:
Word count program to demonstrate the use of Map and Reduce tasks.
PROCEDURE:
Sample Program:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount
{
//Step a
public static class TokenizerMapper extends Mapper < Object , Text, Text, IntWritable >
{
//hadoop supported data types
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
//map method that performs the tokenizer job and framing the initial key value pairs
public void map(Object key, Text value, Context context) throws IOException, InterruptedException
{
//taking one line at a time and tokenizing the same
StringTokenizer itr = new StringTokenizer (value.toString());
//iterating through all the words available in that line and forming the key value pair
while (itr.hasMoreTokens())
{
word.set(itr.nextToken());
//sending to the context which inturn passes the same to reducer
context.write(word, one);
}
}
}
//Step b
public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
private IntWritable result = new IntWritable();
// Reduce method accepts the key-value pairs from the mappers, does the aggregation based on keys
// and produces the final output
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
{
int sum = 0;
/*iterates through all the values available with a key and
add them together and give the final result as the key and sum of its values*/
for (IntWritable val: values)
{
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
//Step c
public static void main( String [] args) throws Exception
{
//creating conf instance for Job Configuration
Configuration conf = new Configuration();
//Parsing the command line arguments
String [] otherArgs = new GenericOptionsParser(conf,args).getRemainingArgs();
if (otherArgs.length < 2)
{
System.err.println("Usage: wordcount <in> [<in>...] <out>");
System.exit(2);
}
//Create a new Job object and assign a job name for identification purposes
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
// Specify various job specific parameters
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
//Setting job object with the Data Type of output Key
job.setOutputKeyClass(Text.class);
//Setting job object with the Data Type of output value
job.setOutputValueClass(IntWritable.class);
//the hdfs input and output directory to be fetched from the command line
for (int i = 0; i < otherArgs.length - 1; ++i)
{
FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
}
FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
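To see what the map, shuffle and reduce steps do without starting a cluster, the same logic can be run in memory with plain Java collections. This is only a sketch: it runs in a single process, and the class name, method names and sample lines are assumptions for illustration. The shuffle step stands in for the grouping that the Hadoop framework performs between the mapper and the reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Sketch: word count as map, shuffle and reduce steps, entirely in memory. */
final class LocalWordCountSketch {
    /** Map: split one line into words; each word stands for the pair (word, 1). */
    static List<String> map(String line) {
        List<String> words = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            words.add(itr.nextToken());
        }
        return words;
    }

    /** Shuffle: group the emitted 1s by word, sorted by key. */
    static Map<String, List<Integer>> shuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : map(line)) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    /** Reduce: sum all values that belong to one key. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(lines).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or", "not to be")));
    }
}
```

For the two sample lines, the words "to" and "be" are each counted twice and "or" and "not" once.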
Creating Input path in HDFS and moving the data into Input path
bigdata@localhost:/home/bigdata/Downloads/hadoop2.5.1$ bin/hadoop fs -mkdir /mrin
bigdata@localhost:/home/bigdata/Downloads/hadoop2.5.1$ bin/hadoop fs -copyFromLocal /home/bigdata/Downloads/mrcode/mrsampledata/* hdfs://localhost:9000/mrin
RESULT:
Thus the Word count program to use Map and reduce tasks is demonstrated
successfully.