BDA Manual
AIM:
To download and install Hadoop; to understand the different Hadoop modes, startup scripts, and configuration files.
THEORY:
Hadoop is a Java-based programming framework that supports the processing and storage of
extremely large datasets on a cluster of inexpensive machines. It was the first major open-source
project in the big data field and is sponsored by the Apache Software Foundation.
Hadoop 2.8.0 comprises four main layers:
Hadoop Common is the collection of utilities and libraries that support the other Hadoop
modules.
HDFS, which stands for Hadoop Distributed File System, is responsible for persisting
data to disk.
YARN, short for Yet Another Resource Negotiator, acts as the "operating system" of the
cluster, managing and scheduling the resources used by jobs that run over data stored in HDFS.
MapReduce is the original processing model for Hadoop clusters. It distributes work
within the cluster (the map), then organizes and reduces the results from the nodes into a
response to a query. Many other processing models are available for the 2.x version of
Hadoop.
Hadoop clusters are relatively complex to set up, so the project includes a stand-alone mode
which is suitable for learning about Hadoop, performing simple operations, and debugging.
PREPARE:
The following software is required to install Hadoop 2.8.0 on Windows 10 (64-bit):
1. Download Hadoop 2.8.0
2. Java JDK 1.8.0.zip
PROCEDURE:
Procedure to Run Hadoop
1. Install Apache Hadoop 2.8.0 in Microsoft Windows OS
If Apache Hadoop 2.8.0 is not already installed, then follow the post "Build, Install,
Configure and Run Apache Hadoop 2.8.0 in Microsoft Windows OS".
2. Start HDFS (Namenode and Datanode) and YARN (Resource Manager and Node
Manager)
Run the following commands:
Command Prompt
C:\Users\> hdfs namenode -format
C:\hadoop\sbin>start-dfs
C:\hadoop\sbin>start-yarn
C:\hadoop\sbin>start-all.cmd
C:\hadoop\sbin>jps (used to check which Hadoop daemons are running in the background)
The Namenode, Datanode, Resource Manager and Node Manager will start in a few minutes,
ready to execute Hadoop MapReduce jobs on the single-node (pseudo-distributed mode) cluster.
PREREQUISITES:
Step 1: Install Java 8 and verify the installation.
Openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
This output verifies that OpenJDK has been successfully installed.
Note: Set the JAVA_HOME environment variable to the JDK installation path.
2. If Java is not installed on your system, first install the JDK under C:\Java.
4. Set the HADOOP_HOME environment variable on Windows 10 (see Steps 1, 2, 3 and 4 below).
5. Set the JAVA_HOME environment variable on Windows 10 (see Steps 1, 2, 3 and 4 below).
6. Next, add the Hadoop bin directory path and the Java bin directory path to the Path variable, as sketched below.
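Equivalently, the same variables can be set from an administrator command prompt; a minimal sketch, assuming Hadoop is extracted to C:\Hadoop-2.8.0 and the JDK to C:\Java\jdk1.8.0 (adjust to your actual paths):
setx HADOOP_HOME "C:\Hadoop-2.8.0"
setx JAVA_HOME "C:\Java\jdk1.8.0"
Then add C:\Hadoop-2.8.0\bin, C:\Hadoop-2.8.0\sbin and C:\Java\jdk1.8.0\bin to the Path variable through the Environment Variables dialog, and re-open the command prompt so the new values are picked up.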
CONFIGURATION
1. Edit the file C:/Hadoop-2.8.0/etc/hadoop/core-site.xml, paste the XML paragraph below into it and save the file.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
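Two further configuration steps are usually performed at this point in similar walkthroughs (a minimal sketch; the mapreduce.framework.name value is the standard one for running MapReduce on YARN, and the folder locations are chosen to match the hdfs-site.xml values below):
2. Edit the file C:/Hadoop-2.8.0/etc/hadoop/mapred-site.xml (rename mapred-site.xml.template if it does not exist), paste the XML paragraph below into it and save the file.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
3. Under the Hadoop installation folder, create the folders data\namenode and data\datanode so that they match the dfs.namenode.name.dir and dfs.datanode.data.dir values below.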
4. Edit the file C:/Hadoop-2.8.0/etc/hadoop/hdfs-site.xml, paste the XML paragraph below into it and save the file.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop-2.8.0/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop-2.8.0/data/datanode</value>
</property>
</configuration>
5. Edit the file C:/Hadoop-2.8.0/etc/hadoop/yarn-site.xml, paste the XML paragraph below into it and save the file.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
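Hadoop's startup scripts also read JAVA_HOME from etc/hadoop/hadoop-env.cmd. If the scripts cannot find Java, a minimal edit to that file (the JDK path is an assumption; use a path without spaces) is:
set JAVA_HOME=C:\Java\jdk1.8.0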
Hadoop Configuration
1. Download the file Hadoop Configuration.zip.
2. Delete the bin folder at C:\Hadoop-2.8.0\bin and replace it with the bin folder from the downloaded Hadoop Configuration.zip.
3. Open cmd and type the command "hdfs namenode -format". You will see:
Testing
1. Open cmd and change directory to “C:\Hadoop-2.8.0\sbin” and type “start-all.cmd” to start Hadoop
3. OUTPUT:
Open:
http://localhost:8088
4. OUTPUT:
Open:
http://localhost:50070
RESULT:
Thus, the procedure to install a Hadoop cluster was executed successfully.
EX.NO: 2 HADOOP FILE MANAGEMENT TASKS
DATE:
AIM:
Implement the following file management tasks in Hadoop:
Adding files and directories
Retrieving files
Deleting files
DESCRIPTION:
HDFS is a scalable distributed filesystem designed to scale to petabytes of data while running
on top of the underlying filesystem of the operating system. HDFS keeps track of where the data
resides in a network by associating the name of its rack (or network switch) with the dataset. This
allows Hadoop to efficiently schedule tasks to those nodes that contain data, or which are nearest to
it, optimizing bandwidth utilization. Hadoop provides a set of command-line utilities that work
similarly to the Linux file commands and serve as your primary interface with HDFS. We're going
to have a look at HDFS by interacting with it from the command line, covering the
most common file management tasks in Hadoop, which include:
Adding files and directories to HDFS
Retrieving files from HDFS to local filesystem
Deleting files from HDFS
ALGORITHM:
SYNTAX AND COMMANDS TO ADD, RETRIEVE AND DELETE DATA FROM HDFS
Step-1: Adding Files and Directories to HDFS
Before you can run Hadoop programs on data stored in HDFS, you'll need to put the data into
HDFS first. Let's create a directory and put a file in it. HDFS has a default working directory
of /user/$USER, where $USER is your login user name. This directory isn't automatically created
for you, though, so let's create it with the mkdir command.
Note: input_file.txt is created in sbin with some contents
C:\hadoop-2.8.0\sbin>hadoop fs -mkdir /input_dir
C:\hadoop-2.8.0\sbin>hadoop fs -put input_file.txt /input_dir/input_file.txt
Step 2: List the contents of a directory:
C:\hadoop-2.8.0\sbin>hadoop fs -ls /input_dir/
Step 3: Retrieving Files from HDFS
The Hadoop get command copies files from HDFS back to the local filesystem, while cat prints a
file's contents to the console. To view the contents of input_file.txt, we can run the following command:
C:\hadoop-2.8.0\sbin>hadoop fs -cat /input_dir/input_file.txt
Output: Hello world hello hi (which is the content stored in input_file.txt)
Step 4: Download the file:
Command: hadoop fs -get: Copies/Downloads files to the local file system
Example: hadoop fs -get /user/saurzcode/dir3/Samplefile.txt /home/
Step 5: Copy a file from source to destination
This command allows multiple sources as well in which case the destination must be a directory.
Command: hadoop fs -cp
Example: hadoop fs -cp /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2
Step 6: Copy a file from/to the local file system to/from HDFS
copyFromLocal
Command: hadoop fs -copyFromLocal URI
Example: hadoop fs -copyFromLocal /home/saurzcode/abc.txt /user/saurzcode/abc.txt
copyToLocal
Command: hadoop fs -copyToLocal [-ignorecrc] [-crc] URI
Step 7: Move file from source to destination
Note:- Moving files across filesystem is not permitted.
Command: hadoop fs -mv
Example: hadoop fs -mv /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2
Step 8: Deleting Files from HDFS
C:\hadoop-2.8.0\sbin>hadoop fs -rm /input_dir/input_file.txt
Recursive version of delete:
Command: hadoop fs -rmr
Example: hadoop fs -rmr /user/saurzcode/
Step 9: Display last few lines of a file
Similar to tail command in Unix.
Usage : hadoop fs -tail
Example: hadoop fs -tail /user/saurzcode/dir1/abc.txt
Step 10: Display the aggregate length of a file
Command: hadoop fs -du
Example: hadoop fs -du /user/saurzcode/dir1/abc.txt
HADOOP OPERATION:
1. Open cmd in administrative mode, move to "C:/Hadoop-2.8.0/sbin" and start the cluster:
start-all.cmd
2. Copy the input text file named input_file.txt into the input directory (input_dir) of HDFS.
hadoop fs -put C:/input_file.txt /input_dir
OUTPUT:
OTHER COMMANDS:
OUTPUT:
OUTPUT:
RESULT:
Thus, the file management tasks in Hadoop were implemented and executed successfully.
EX.NO: 3 MATRIX MULTIPLICATION
DATE:
AIM:
To implement matrix multiplication with Hadoop MapReduce.
THEORY:
In mathematics, matrix multiplication or the matrix product is a binary operation that produces
a matrix from two matrices. In more detail, if A is an n × m matrix and B is an m × p matrix, their matrix
product AB is an n × p matrix, in which the m entries across a row of A are multiplied with the m
entries down a column of B and summed to produce an entry of AB. When two linear transformations
are represented by matrices, then the matrix product represents the composition of the two
transformations.
ALGORITHM FOR MAP FUNCTION:
For each element mij of M, produce the (key, value) pair ((i,k), (M, j, mij)) for k = 1, 2, 3, ... up to the number of columns of N.
For each element njk of N, produce the (key, value) pair ((i,k), (N, j, njk)) for i = 1, 2, 3, ... up to the number of rows of M.
Return the set of (key, value) pairs in which each key (i,k) has a list of values (M, j, mij) and (N, j, njk) for all possible values of j.
ALGORITHM FOR REDUCE FUNCTION:
For each key (i,k), take the values (M, j, mij) and (N, j, njk) in its list, multiply mij and njk for each matching j, and sum these products over all j to produce the pair ((i,k), Σj mij × njk).
HADOOP OPERATION:
Make sure that Hadoop is installed on your system with the Java JDK. Steps to follow:
Step 1: Open Eclipse> File > New > Java Project > (Name it – MRProgramsDemo) > Finish
Step 2: Right Click > New > Package (Name as com.mapreduce.wc) > Finish
Step 3: Right Click on Package > New > Class (Name it - Matrixmultiply)
Step 4: Add the following reference libraries:
Right Click on Project > Build Path > Add External Archives
1. C:/Hadoop/share/hadoop -> common/lib -> add all jars
2. C:/Hadoop/share/hadoop -> client -> add all jars
3. C:/Hadoop/share/hadoop -> mapreduce -> add all jars
4. C:/Hadoop/share/hadoop -> yarn -> add all jars
5. C:/lib/hadoop-2.8.0/lib/commons-cli-1.2.jar
6. C:/lib/hadoop-2.8.0/hadoop-core.jar
Alternatively, download the Hadoop jar files from these links:
Download Hadoop Common Jar files: https://goo.gl/G4MyHp
hadoop-common-2.2.0.jar
Download Hadoop Mapreduce Jar File: https://goo.gl/KT8yfB
hadoop-mapreduce-client-core-2.7.1.jar
PROGRAM:
Creating Map file for Matrix Multiplication.
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
public class Map
extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, Text> {
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
Configuration conf = context.getConfiguration();
int m = Integer.parseInt(conf.get("m")); // number of rows of M
int p = Integer.parseInt(conf.get("p")); // number of columns of N
String line = value.toString(); // input record: (M, i, j, Mij) or (N, j, k, Njk)
String[] indicesAndValue = line.split(",");
Text outputKey = new Text();
Text outputValue = new Text();
if (indicesAndValue[0].equals("M")) {
// emit ((i,k), (M, j, Mij)) for every column k of N
for (int k = 0; k < p; k++) {
outputKey.set(indicesAndValue[1] + "," + k);
outputValue.set("M," + indicesAndValue[2] + "," + indicesAndValue[3]);
context.write(outputKey, outputValue);
}
} else {
// emit ((i,k), (N, j, Njk)) for every row i of M
for (int i = 0; i < m; i++) {
outputKey.set(i + "," + indicesAndValue[2]);
outputValue.set("N," + indicesAndValue[1] + "," + indicesAndValue[3]);
context.write(outputKey, outputValue);
}
}
}
}
Creating Reduce file for Matrix Multiplication.
import java.io.IOException;
import java.util.HashMap;
import org.apache.hadoop.io.Text;
public class Reduce
extends org.apache.hadoop.mapreduce.Reducer<Text, Text, Text, Text> {
@Override
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
// collect the (M, j, Mij) and (N, j, Njk) values that share this (i,k) key
HashMap<Integer, Float> hashA = new HashMap<Integer, Float>();
HashMap<Integer, Float> hashB = new HashMap<Integer, Float>();
for (Text val : values) {
String[] v = val.toString().split(",");
if (v[0].equals("M")) {
hashA.put(Integer.parseInt(v[1]), Float.parseFloat(v[2]));
} else {
hashB.put(Integer.parseInt(v[1]), Float.parseFloat(v[2]));
}
}
int n = Integer.parseInt(context.getConfiguration().get("n"));
float result = 0.0f;
float m_ij;
float n_jk;
for (int j = 0; j < n; j++) {
m_ij = hashA.containsKey(j) ? hashA.get(j) : 0.0f;
n_jk = hashB.containsKey(j) ? hashB.get(j) : 0.0f;
result += m_ij * n_jk;
}
if (result != 0.0f) {
context.write(null,
new Text(key.toString() + "," + Float.toString(result)));
}
}
}
Creating the driver file for Matrix Multiplication (Matrixmultiply.java).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class Matrixmultiply {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
// matrix dimensions: M is m x n, N is n x p (2 x 2 for the sample M.txt and N.txt in Step 5)
conf.set("m", "2");
conf.set("n", "2");
conf.set("p", "2");
Job job = new Job(conf, "matrixmultiply");
job.setJarByClass(Matrixmultiply.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
Step 5: Uploading the M, N file which contains the matrix multiplication data to HDFS.
Create M.txt in sbin
M,0,0,1
M,0,1,2
M,1,0,3
M,1,1,4
Create N.txt in sbin
N,0,0,5
N,0,1,6
N,1,0,7
N,1,1,8
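These files encode M = [[1, 2], [3, 4]] and N = [[5, 6], [7, 8]], so the product MN = [[19, 22], [43, 50]]. Assuming the code reconstructed above, the job's output file should therefore contain one "i,k,value" line per entry of the product:
0,0,19.0
0,1,22.0
1,0,43.0
1,1,50.0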
Run the following commands in the cmd prompt:
$ hadoop fs -mkdir /input_matrix/
$ hadoop fs -put M.txt /input_matrix/
$ hadoop fs -put N.txt /input_matrix/
$ hadoop fs -cat /input_matrix/M.txt
$ hadoop fs -cat /input_matrix/N.txt
Step 1: Open Eclipse> open -> (MRProgramsDemo )project -> Right Click -> Export->java->JAR file
-> Next -> name it as matrix.jar
Step 2: Open Command prompt
C:\hadoop-2.8.0\sbin> hadoop jar C:\MRProgramsDemo\matrix.jar com.mapreduce.wc.Matrixmultiply
/input_matrix/* /output_matrix
OUTPUT:
RESULT:
Thus, the implementation of matrix multiplication with Hadoop MapReduce was executed successfully.
EX.NO: 4 WORD COUNT MAP REDUCE
DATE:
AIM:
To run a basic WordCount MapReduce program to understand the MapReduce paradigm.
THEORY:
MapReduce is a programming model used for efficient parallel processing over large datasets in a
distributed manner. The data is first split and then combined to produce the final result. Libraries
for MapReduce have been written in many programming languages, with various optimizations.
Workflow of MapReduce consists of 5 steps:
1. Splitting – The splitting parameter can be anything, e.g. splitting by space, comma, semicolon, or
even by a new line (‘\n’).
2. Mapping – It takes a set of data and converts it into another set of data, where individual elements
are broken down into tuples (Key-Value pair).
3. Intermediate splitting – the entire process runs in parallel on different nodes of the cluster. In order to group
the data in the "Reduce Phase", records with the same KEY must end up on the same node.
4. Reduce – this is essentially the group-by-and-aggregate phase.
5. Combining – The last phase where all the data (individual result set from each cluster) is
combined together to form a Result.
PREPARE:
1. Download MapReduceClient.jar
(Link: https://github.com/MuhammadBilalYar/HADOOP- INSTALLATION-ON-
WINDOW-10/blob/master/MapReduceClient.jar)
2. Download Input_file.txt
(Link: https://github.com/MuhammadBilalYar/HADOOP-
INSTALLATION-ON-WINDOW-10/blob/master/input_file.txt)
PROGRAM:
package PackageDemo;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {
public static void main(String [] args) throws Exception
{
Configuration c=new Configuration();
String[] files=new GenericOptionsParser(c,args).getRemainingArgs();
Path input=new Path(files[0]);
Path output=new Path(files[1]);
Job j=new Job(c,"wordcount");
j.setJarByClass(WordCount.class);
j.setMapperClass(MapForWordCount.class);
j.setReducerClass(ReduceForWordCount.class);
j.setOutputKeyClass(Text.class);
j.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(j, input);
FileOutputFormat.setOutputPath(j, output);
System.exit(j.waitForCompletion(true)?0:1);
}
public static class MapForWordCount extends Mapper<LongWritable, Text, Text,IntWritable>
{
public void map(LongWritable key, Text value, Context con) throws IOException,
InterruptedException
{
String line = value.toString();
String[] words=line.split(",");
for(String word: words )
{
Text outputKey = new Text(word.toUpperCase().trim());
IntWritable outputValue = new IntWritable(1);
con.write(outputKey, outputValue);
}
}
}
public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text,IntWritable>
{
public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException,
InterruptedException
{
int sum = 0;
for(IntWritable value : values)
{
sum += value.get();
}
con.write(word, new IntWritable(sum));
}
}
}
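To see what the job should produce, note that the mapper splits each line on commas and upper-cases each token, and the reducer sums the counts per token. For a hypothetical input_file.txt containing the single line hello,world,hello the expected (tab-separated) output is:
HELLO 2
WORLD 1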
OUTPUT:
HADOOP OPERATION:
1. Open cmd in administrative mode, move to "C:/Hadoop-2.8.0/sbin" and start the cluster:
start-all.cmd
2. Create an input directory in HDFS.
hadoop fs -mkdir /input_dir
3. Copy the input text file named input_file.txt in the input directory (input_dir) of HDFS.
hadoop fs -put C:/input_file.txt /input_dir/input_file.txt
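To actually execute the WordCount job and inspect its result, commands along the following lines are used (the jar path matches the PREPARE download location; the wordcount argument and the output directory name are assumptions based on the linked walkthrough, so adjust them if your jar's driver differs):
4. Run the MapReduce job:
hadoop jar C:/MapReduceClient.jar wordcount /input_dir /output_dir
5. View the result:
hadoop fs -cat /output_dir/*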
OUTPUT
RESULT:
Thus, the implementation of WordCount with Hadoop MapReduce was executed successfully.
EX.NO: 5 HIVE
DATE:
AIM:
To install Hive along with practice examples.
THEORY:
Apache Hive is a data warehouse and ETL tool that provides an SQL-like interface between the user and the
Hadoop Distributed File System (HDFS), and it integrates with Hadoop. It is built on top of Hadoop.
It is a software project that provides data query and analysis. It facilitates reading, writing and handling
large datasets stored in distributed storage, queried using an SQL-like syntax (HiveQL).
PREPARE:
The following software is required to install Hive 2.1.0 on Windows 10 (64-bit):
1. Download Hadoop 2.8.0
2. Java JDK 1.8.0.zip
3. Download Hive 2.1.0 : https://archive.apache.org/dist/hive/hive-2.1.0/
4. Download Derby Metastore 10.12.1.1: https://archive.apache.org/dist/db/derby/db-derby-10.12.1.1/
5. Download hive-site.xml :
https://drive.google.com/file/d/1qqAo7RQfr5Q6O-GTom6Rji3TdufP81zd/view?usp=sharing
PROCEDURE:
STEP - 1: Download and Extract the Hive file:
[1] Extract file apache-hive-2.1.0-bin.tar.gz and place under "D:\Hive", you can use any preferred location
[2] Copy the leaf folder "apache-hive-2.1.0-bin" and move it to the root folder "D:\Hive".
STEP - 3: Moving hive-site.xml file
Drop the downloaded file “hive-site.xml” to hive configuration location “D:\Hive\apache-hive-2.1.0-
bin\conf”.
[2] Select all, copy and paste all libraries from the Derby lib folder to the Hive lib location D:\Hive\apache-hive-2.1.0-bin\lib.
STEP - 6: Configure System variables
Next, set the following system variables, including the Hive and Derby bin directory paths:
HADOOP_USER_CLASSPATH_FIRST = true
Variable: Path
Value:
1. D:\Hive\apache-hive-2.1.0-bin\bin
2. D:\Derby\db-derby-10.12.1.1-bin\bin
STEP - 7: Verify hive-site.xml properties
Open D:\Hive\apache-hive-2.1.0-bin\conf\hive-site.xml and make sure it contains the following properties:
<configuration>
<property>
<name>hive.server2.authentication</name>
<value>NONE</value>
<description> Client authentication types. NONE: no authentication check LDAP: LDAP/AD based
authentication KERBEROS: Kerberos/GSSAPI authentication CUSTOM: Custom authentication provider
(Use with property hive.server2.custom.authentication.class) </description>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>True</value>
</property>
</configuration>
STEP - 8: Start Hadoop
Hadoop needs to be started first:
Open a command prompt, change directory to "D:\Hadoop\hadoop-2.8.0\sbin" and type "start-all.cmd" to
start the Hadoop daemons.
It can be verified via browser also as –
Namenode (hdfs) - http://localhost:50070
Datanode - http://localhost:50075
All Applications (cluster) - http://localhost:8088 etc.
Since the 'start-all.cmd' command has been deprecated, you can use the commands below instead, in order:
“start-dfs.cmd” and
“start-yarn.cmd”
OUTPUT:
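The step that starts the Derby network server (which backs the Hive metastore) is not reproduced in this copy; a typical way to start it, assuming the Derby paths from STEP - 6, is:
> cd D:\Derby\db-derby-10.12.1.1-bin\bin
> startNetworkServer -h 0.0.0.0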
STEP - 10: Start Hive
Once the Derby server has started and is ready to accept connections, open a new command prompt with
administrator privileges and move to the Hive bin directory:
> cd D:\Hive\apache-hive-2.1.0-bin\bin
[1] Type "jps -m" to check that the Derby NetworkServerControl process is running.
OUTPUT:
PROGRAM:
HIVE QUERIES AND OUTPUT:
[1] Create Database in Hive -
hive>CREATE DATABASE IF NOT EXISTS TRAINING;
[2] Show Database -
hive>SHOW DATABASES;
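[3] Create Table in Hive - The statement that creates the STUDENTS table queried below is not reproduced here; a minimal sketch, with column names and types chosen only for illustration:
hive>USE TRAINING;
hive>CREATE TABLE IF NOT EXISTS STUDENTS (id INT, name STRING, dept STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';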
[4] DESCRIBE Table Command in Hive -
hive>describe students;
[6] Retrieve Data from Table -
hive>SELECT * FROM STUDENTS;
Example:
The following query renames the table from employee to emp:
hive> ALTER TABLE employee RENAME TO emp;
The following queries rename the column name and column data type using the above data:
hive> ALTER TABLE employee CHANGE name ename String;
hive> ALTER TABLE employee CHANGE salary salary Double;
The following query adds a column named dept to the employee table:
hive> ALTER TABLE employee ADD COLUMNS (dept STRING COMMENT 'Department name');
RESULT:
Thus, the installation of Hive was completed and the commands were executed successfully.
EX NO: 6.1 HBASE
DATE:
AIM:
To install HBase along with practice examples.
THEORY:
Apache HBase is an open-source, non-relational (NoSQL), distributed, column-oriented database that runs
on top of HDFS and provides real-time read/write access to large datasets. It is modeled after Google's
Bigtable, is primarily written in Java, and is designed to provide quick random access
to huge amounts of data.
In brief, HBase can store massive amounts of data, from terabytes to petabytes, and allows the fast random
reads and writes that cannot be handled efficiently by HDFS and MapReduce alone. Even relational databases
(RDBMS) cannot handle the variety of data that is growing exponentially.
HBase can be installed in three modes. The features of these modes are mentioned below.
[1] Standalone mode installation (No dependency on Hadoop system)
This is default mode of HBase
It runs against local file system
It doesn't use Hadoop HDFS
Only HMaster daemon can run
Not recommended for production environment
Runs in single JVM
[2] Pseudo-Distributed mode installation (Single node Hadoop system + HBase installation)
It runs on Hadoop HDFS
All Daemons run in single node
Recommended for development and testing rather than production
[3] Fully Distributed mode installation (Multi node Hadoop environment + HBase installation)
It runs on Hadoop HDFS
All daemons going to run across all nodes present in the cluster
Highly recommended for production environment
PREPARE:
The following software is required to install HBase 1.4.7 on Windows 10 (64-bit):
1. Download Hadoop 2.8.0
2. Java JDK 1.8.0.zip
3. Download HBase 1.4.7
http://www.apache.org/dyn/closer.lua/hbase/
PROCEDURE:
Hbase - Standalone mode installation
Here, we will go through the Standalone mode installation with Hbase on Windows 10.
STEP - 1: Extract the HBase file
Extract file hbase-1.4.7-bin.tar.gz and place under "D:\HBase", you can use any preferred location:
[1] You will get another tar file after the first extraction.
[2] Go inside the hbase-1.4.7-bin.tar folder and extract again. Then copy the leaf folder "hbase-1.4.7" and move
it to the root folder "D:\HBase":
STEP - 2: Configure Environment variable
Set the path for the following Environment variable (User Variables) on windows 10 –
HBASE_HOME - D:\HBase\hbase-1.4.7
This PC - > Right Click - > Properties - > Advanced System Settings - > Advanced - > Environment Variables
For example -
[2] Edit the file D:/HBase/hbase-1.4.7/conf/hbase-site.xml, paste the XML paragraph below into it and save the file.
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///D:/HBase/hbase-1.4.7/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/D:/HBase/hbase-1.4.7/zookeeper</value>
</property>
<property>
<name> hbase.zookeeper.quorum</name>
<value>127.0.0.1</value>
</property>
</configuration>
All HMaster and ZooKeeper activities point out to this hbase-site.xml.
[3] Edit file hosts (C: /Windows/System32/drivers/etc/hosts), mention localhost IP and save this file.
127.0.0.1 localhost
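The step that actually starts HBase falls between the configuration above and the validation below; it is typically started from the HBase bin folder (paths as configured in STEP - 2):
> cd D:\HBase\hbase-1.4.7\bin
> start-hbase.cmd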
STEP - 7: Validate HBase
After HBase starts successfully, verify the installation using the following commands –
hbase version
jps
HBase installed !!
PROGRAM:
QUERIES AND OUTPUT:
[1] Create a simple table-
hbase> create 'student', 'bigdata'
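The examples for inserting and reading rows are not reproduced in this copy; typical commands against the student table created above (the row key and column names are chosen only for illustration) look like:
hbase> put 'student', '1', 'bigdata:name', 'Arun'
hbase> put 'student', '1', 'bigdata:dept', 'CSE'
hbase> scan 'student'
hbase> get 'student', '1'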
[5] Disabling a Table using HBase Shell
To delete a table or change its settings, you need to first disable the table using the disable command.
You can re-enable it using the enable command.
Given below is the syntax to disable a table:
hbase> disable 'emp'
Example
Given below is an example that shows how to disable a table.
hbase> disable 'emp'
0 row(s) in 1.2760 seconds
Verification
After disabling the table, you can still sense its existence through list and exists commands. You cannot scan
it. It will give you the following error.
hbase> scan 'emp'
ROW COLUMN + CELL
ERROR: emp is disabled.
is_disabled
This command is used to find whether a table is disabled. Its syntax is as follows.
hbase> is_disabled 'table name'
The following example verifies whether the table named emp is disabled. If it is disabled, it will return true
and if not, it will return false.
hbase(main):031:0> is_disabled 'emp'
true
0 row(s) in 0.0440 seconds
disable_all
This command is used to disable all the tables matching the given regex. The syntax for disable_all command
is given below.
hbase> disable_all 'r.*'
Suppose there are 5 tables in HBase, namely raja, rajani, rajendra, rajesh, and raju. The following code will
disable all the tables starting with raj.
hbase(main):002:07> disable_all 'raj.*'
raja
rajani
rajendra
rajesh
raju
Disable the above 5 tables (y/n)?
y
5 tables successfully disabled
RESULT:
Thus, the installation of HBase was completed and the queries were executed successfully.
EX.NO: 6.2 THRIFT
DATE:
AIM:
To install Thrift along with practice examples.
THEORY:
Apache Thrift is an RPC framework originally developed at Facebook and now an Apache project. Thrift lets
you define data types and service interfaces in a language-neutral definition file. That definition file is used
as the input for the compiler to generate code for building RPC clients and servers that communicate
across different programming languages.
The Thrift software framework, for scalable cross-language services development, combines a software stack
with a code generation engine to build services that work efficiently and seamlessly between C++, Java,
Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi
and other languages.
PROCEDURE:
PROGRAM:
1.Example definition file (add.thrift)
namespace java com.eviac.blog.samples.thrift.server // defines the namespace
typedef i32 int //typedefs to get convenient names for your types
service AdditionService { // defines the service to add two numbers
int add(1:int n1, 2:int n2), //defines a method
}
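The definition file is compiled with the Thrift compiler to generate the Java sources referred to below; a typical invocation is:
thrift --gen java add.thrift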
After running the command, the gen-java directory will contain the generated source code used for
building RPC clients and servers; among other files it will contain a Java file called AdditionService.java.
package com.eviac.blog.samples.thrift.server;
import org.apache.thrift.TException;
// Handler class (the class name is assumed here) implementing the generated AdditionService.Iface
public class AdditionServiceHandler implements AdditionService.Iface {
@Override
public int add(int n1, int n2) throws TException {
return n1 + n2;
}
}
Following is the server code (MyServer.java):
import org.apache.thrift.transport.TServerTransport;
import org.apache.thrift.server.TServer;
import org.apache.thrift.server.TServer.Args;
import org.apache.thrift.server.TSimpleServer;
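The rest of MyServer.java is not reproduced in this copy; a minimal sketch that registers the handler above with a simple single-threaded Thrift server (the port number 9090 is an assumption) is:
// additional imports needed beyond those shown above
import org.apache.thrift.transport.TServerSocket;
import org.apache.thrift.transport.TTransportException;
public class MyServer {
public static void main(String[] args) {
try {
// listen for client connections on port 9090
TServerTransport serverTransport = new TServerSocket(9090);
AdditionService.Processor processor = new AdditionService.Processor(new AdditionServiceHandler());
TServer server = new TSimpleServer(new Args(serverTransport).processor(processor));
System.out.println("Starting the simple server...");
server.serve();
} catch (TTransportException e) {
e.printStackTrace();
}
}
}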
Following is an example java client code which consumes the service provided by AdditionService.
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TTransportException;
public class AdditionClient {
public static void main(String[] args) {
try {
TTransport transport;
// connect to the AdditionService server (host and port must match MyServer)
transport = new TSocket("localhost", 9090);
transport.open();
TProtocol protocol = new TBinaryProtocol(transport);
AdditionService.Client client = new AdditionService.Client(protocol);
System.out.println(client.add(100, 200));
transport.close();
} catch (TTransportException e) {
e.printStackTrace();
} catch (TException x) {
x.printStackTrace();
}
}
}
OUTPUT:
Run the server code (MyServer.java) first; it will start and listen for requests. Then run the client code (AdditionClient.java), which should print the following output:
300
RESULT:
Thus, the installation of Thrift was completed and the programs were executed successfully.
EX.NO: 7.1 CASSANDRA
DATE:
AIM:
To export and import data in Cassandra.
THEORY:
Cassandra is an open-source, distributed, wide-column-store NoSQL database management system designed
to handle large amounts of data across many commodity servers, providing high availability with no single
point of failure. It is written in Java and developed by the Apache Software Foundation.
CQL shell (cqlsh) :
cqlsh is a command-line shell for interacting with Cassandra through CQL (the Cassandra Query Language);
CQL queries can be used to read and write data. By default, cqlsh is installed in the bin/ directory alongside
the Cassandra executable. cqlsh uses the Python native protocol driver and connects to the single node
specified on the command line.
PROGRAM:
Example:
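The cqlsh statements that produce the output below are not reproduced in this copy; a minimal sketch that matches the exported columns (the keyspace name and the file path are assumptions):
cqlsh> USE testkeyspace;
cqlsh:testkeyspace> COPY Data (id, firstname, lastname) TO 'D:\data.csv' WITH HEADER = TRUE;
cqlsh:testkeyspace> COPY Data (id, firstname, lastname) FROM 'D:\data.csv' WITH HEADER = TRUE;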
OUTPUT:
The CSV file is created:
Using 7 child processes
Starting copy of Data with columns [id, firstname, lastname].
Processed: 6 rows; Rate: 20 rows/s; Avg. rate: 30 rows/s
6 rows exported to 1 files in 0.213 seconds.
Step 6: RETRIEVE IMPORTED DATA
To view the results and check whether the data was imported successfully:
cqlsh>SELECT * FROM Data;
OUTPUT:
RESULT:
Thus, the installation of Cassandra was completed and the data files were exported and imported successfully.
EX.NO: 7.2 MongoDB
DATE:
AIM:
To export and import data in MongoDB.
THEORY:
MongoDB is an open-source document database and a leading NoSQL database. MongoDB is written
in C++. This tutorial will give you a good understanding of the MongoDB concepts needed to create and
deploy a highly scalable and performance-oriented database.
MongoDB is a cross-platform, document-oriented database that provides high performance, high
availability, and easy scalability. MongoDB works on the concepts of collections and documents.
Database
Database is a physical container for collections. Each database gets its own set of files on the file system. A
single MongoDB server typically has multiple databases.
Collection
Collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists
within a single database. Collections do not enforce a schema. Documents within a collection can have
different fields. Typically, all documents in a collection are of similar or related purpose.
Document
A document is a set of key-value pairs. Documents have dynamic schema. Dynamic schema means that
documents in the same collection do not need to have the same set of fields or structure, and common fields
in a collection's documents may hold different types of data.
PROCEDURE:
Step 1:
Install MongoDB Compass On Windows
MongoDB Compass is a GUI-based tool (unlike the MongoDB shell) for interacting with local or remote
MongoDB servers and databases. Use Compass to visually explore your data, run ad hoc queries,
perform CRUD operations, and view and optimize your query performance. It can be installed on
Linux, Mac, or Windows.
just click OK to connect with your local server, as shown below.
As you can see above, it will display all the databases on the connected MongoDB server. On the left pane,
it displays information about the connected server.
Now, you can create, modify, delete databases, collections, documents using MongoDB Compass.
Click on the CREATE DATABASE button to create a new database. This will open Create Database popup,
as shown below.
Enter your database name and collection name and click Create Database. This will create
a new database humanResourceDB with the new employees collection shown below.
Click on employees collection to insert, update, find documents in it. This will open the following
window to manage documents.
PROGRAMS:
1. Create a new database:
Command: use <database-name>
Example: use humanResourceDB
3.Delete a database:
Command: db.dropDatabase()
Example: db.dropDatabase()
OUTPUT:
4. Create a collection:
Command: db.createCollection()
Example: db.createCollection("employee")
OUTPUT:
To show databases:
5.Insert documents into a collection:
5.1 insertOne() - Inserts a single document into a collection.
Command: db.<collection>.insertOne()
Example: db.employees.insertOne({
firstName: "John",
lastName: "King",
email: "[email protected]"
})
OUTPUT:
{
acknowledged: true,
insertedId: ObjectId("616d44bea861820797edd9b0")
}
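5.2 insertMany() - Inserts multiple documents into a collection with a single call. The example that produces the output below is not reproduced in this copy; a minimal sketch (the field values, and the explicit _id of 1 on the second document, are assumptions chosen to match the output):
Command: db.<collection>.insertMany([ ... ])
Example: db.employees.insertMany([
{ firstName: "Sachin", lastName: "Kumar" },
{ _id: 1, firstName: "Rahul", lastName: "Sharma" },
{ firstName: "Vikram", lastName: "Singh" }
])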
OUTPUT:
{
acknowledged: true,
insertedIds: {
'0': ObjectId("616d63eda861820797edd9b3"),
'1': 1,
'2': ObjectId("616d63eda861820797edd9b5")
}
}
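6. Query documents in a collection: the example for this step is not reproduced in this copy; a minimal sketch using the employees collection (the filter field is an assumption):
Command: db.<collection>.find()
Example: db.employees.find({ lastName: "King" })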
7. Import Data: Import data into a collection from either a JSON or CSV file
Step 1: Navigate to your target collection: Either select the collection from the Collections tab or click the
collection in the left-hand pane.
Step 2: Click the Add Data dropdown and select Import JSON or CSV file.
Step 3: Select the appropriate file type.
Select either a JSON or CSV file to import and click Select.
Step 4: Click insert or Import.
8. Export Data from a Collection: Export data from a collection as either a JSON or CSV file.
Step 1: Click the Export Data dropdown and select Export the full collection.
Step 2: Select your file type: You can select either JSON or CSV.
Step 3: Click Export and Choose where to export the file and click Select.
RESULT:
Thus, the installation of MongoDB was completed and the data files were exported and imported successfully.