Introduction To Perl


Introduction to Perl

A group activity
HTDG July 29th, 2009 C-DAC, Pune

Outline
Perl History and current activity
A simple Perl Program
Data Type and storing classes
Subroutines
Input, Output and File handling
The world of Regular Expression
Control structures and logical operators
Process management
Introduction to Perl Modules

Perl History
Perl is sometimes called the Practical Extraction and Report Language. Larry Wall created it in the mid-1980s, when he was fed up with scripting tools like awk, sed, grep, cut and sort. Being a lazy programmer, he wanted a language that would be easy and quick to use. He decided to overkill the problem with a general-purpose tool that he could use in at least one other place. The result was Perl version zero.

Current activity
Larry Wall made Perl available to Usenet readers, commonly known as "the Net". Nowadays Perl developers are working on Perl 6, which is an object-oriented version. CPAN, the Comprehensive Perl Archive Network, is where you will get most of the Perl stuff (the source code itself, modules, extensions, examples and documentation).
http://search.cpan.org/
http://kobsearch.cpan.org/

A Simple Perl Program


    #!/usr/bin/perl
    # This is perl commenting style
    print "Hello, World!\n";

    $ chmod a+x hello.pl
    $ perl -c hello.pl      # check the syntax only
    $ ./hello.pl
Or
    $ perl hello.pl

Perl Data Type and storing classes


Scalar Data
It is useful for storing integers, floating-point numbers, characters and strings.

Lists (storing class)


A list is similar to an array in C, but unlike C arrays, a list can hold mixed data types.

Here scalar and list context is very important.
Hashes (storing class)

Scalar Data
Declaration, Initialization and accessing Scalar variable
    my $variable;    # it contains undef
    $variable = 10;
    print "my first variable is $variable\n";

Integer literals
    0
    2001
    -40
    43555586769367
Or
    435_55_58_67_69_367

Scalar Data

(cntd)

Floating-point literals
    1.25
    255.000
    7.25e45       # 7.25 times 10 to the 45th power
    -12e-24       # negative 12 times 10 to the -24th power (a very small magnitude)
Non-decimal integer literals
    0377          # 377 octal, same as 255 decimal
    0xff          # FF hex, also 255 decimal
    0b11111111    # binary, also 255 decimal

Strings

Scalar Data

(cntd)

Single Quoted Strings
Any character other than a single quote or backslash between the quote marks stands for itself (including a newline, if the string continues onto the next line). The sequence \n has no special meaning inside single quotes.
    'nilesha'         # string having 7 characters
    'nilesh awate'    # string with 12 characters
    ''                # null string
    'Don\'t let an apostrophe end this string prematurely'

Scalar Data
Strings

(cntd)

Double Quoted Strings
These are similar to the strings you may have seen in other languages, where the backslash takes on its full power to specify certain control characters, and variables are interpolated.
    print 0377, " means 255\n";    # 255 means 255
    print "what is 2+2\n";         # what is 2+2
    $var = 2+2;
    print "variable is $var\n";    # variable is 4

String Operators
. or concatenation operator
    "Hello" . "World"          # HelloWorld
    "Hello" . " " . "World"    # Hello World
    "Hello World" . "\n"       # Hello World\n
x or repetition operator
    "fred" x 3                 # fredfredfred
    "trick" x (2+2)            # same as "trick" x 4
    5 x 4                      # ?
chomp
chomp gets rid of a trailing newline (\n).
    my $text = "good afternoon\n";
    chomp($text);              # get rid of \n
This is very useful in many cases; you will see it in almost all programs.
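The string operators above can be tried out in a short script. This is a minimal sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Concatenation joins two strings end to end.
my $greeting = "Hello" . " " . "World";

# Repetition treats the left operand as a string and the right as a count,
# so a number on the left is repeated as text, not multiplied.
my $cheer  = "fred" x 3;
my $digits = 5 x 4;

# chomp removes one trailing newline, if there is one.
my $text = "good afternoon\n";
chomp($text);

print "$greeting\n$cheer\n$digits\n$text\n";
```

Running it prints Hello World, fredfredfred, 5555 and good afternoon, each on its own line.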

Perl's Built-in Warnings

Warnings
Perl can warn you when it sees something suspicious going on in your program. To turn warnings on in your program use
    #!/usr/bin/perl -w
Or
    use warnings;    # it's a pragma
Without turning warnings on you might end up operating on wrong data.
    my ($a, $b);
    print "sum is ", $a + $b, "\n";    # let's check

Strict
    use strict;    # at the beginning of the program
This pragma enforces some good programming rules. The main one: always declare a variable with my before using it. With warnings on, Perl also flags a second my declaration of the same name in the same scope.
Diagnostics
    use diagnostics;    # when you want to debug
    $ perl -Mdiagnostics ./my_program.pl
It gives you a nice snippet of Perl documentation about your mistakes.


Lists and arrays

Scalar Data

(cntd)

List: a list is an ordered collection of scalars. In simple words, a list is data. For example:
    (35, 12.5, "hello", 17.2e12, "bye");
Arrays: an array is a variable that contains/holds a list. An array is declared and initialized as follows.
    my @array = (35, 12.5, "hello", 17.2e12, "bye");
    print $array[0];         # prints 35
    print $array[2];         # prints hello
    $array[3] = 5;           # stores 5 instead of 17.2e12
    $array[4] .= "\tbye";    # ?

More on arrays
Special array indices
An array is automatically extended as needed. There is no limit on its length as long as there's memory.

    my @nums = (1, 2, 3, 4);     # array of 4 elements
    $nums[99] = 99;              # there are now 95 undef elements
    my $end = $#nums;            # index of the last element: 99
    $nums[ $#nums ] = "last";    # Or $nums[-1]
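The automatic growth and the special indices can be checked with a few lines. This is a small sketch; the counting of undef elements with grep is an addition for illustration and is not part of the slides.

```perl
use strict;
use warnings;

my @nums = (1, 2, 3, 4);
$nums[99] = 99;                      # the array grows to 100 elements

my $last_index = $#nums;             # index of the last element
my $count      = scalar @nums;       # number of elements
my $undefined  = grep { !defined $_ } @nums;   # how many slots are still undef

$nums[-1] = "last";                  # negative indices count from the end

print "last index $last_index, count $count, undef $undefined, tail $nums[-1]\n";
```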

More on arrays

(cntd)

List literals: range operator ( .. )
    (1 .. 100);       # list of 100 integers
    (0 .. $#nums);    # range of indices of @nums
    ($m .. $n);       # range decided by $m and $n
    (1.7 .. 5.7);     # both values are truncated
Quoted words: qw
    my @names = qw( fred brian harry dino );
    qw/ fred brian harry dino /    # to save typing
    qw{ fred brian harry dino }    # avoids quotes and commas
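A quick sketch of the range operator and qw in use, with variable names chosen only for this example:

```perl
use strict;
use warnings;

my @range = (1 .. 5);                        # five integers
my @trunc = (1.7 .. 5.7);                    # both ends are truncated to integers
my @names = qw( fred brian harry dino );     # four words, no quotes or commas
my @index = (0 .. $#names);                  # every valid index of @names

print "@range\n@trunc\n@names\n@index\n";
```

Both ranges print 1 2 3 4 5, and the index list prints 0 1 2 3.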

Perl built-in Array @ARGV


It's a Perl built-in array which stores the arguments given to the program. It is similar to argv in C, except that element zero here is the first argument to the program, not the program name.
    $ ./my_prog first 1 zero
    print $ARGV[0];    # prints first
    print $#ARGV;      # index of the last element (2 here), one less than the number of elements
    print @ARGV;       # prints the array elements without gaps
    print "@ARGV";     # prints the array elements separated by spaces

Array operators
pop and push
pop: removes the last element from an array.
    @array = 5 .. 9;
    $val = pop @array;     # gives you 9, leaving (5,6,7,8)
    $next = pop @array;    # gives you 8, leaving (5,6,7)
    pop(@array);           # 7 is discarded, leaving (5,6)
push: appends an element to an array.
    push(@array, 0);       # now it's (5,6,0)
    push(@array, 1);       # now it's (5,6,0,1)

Array operators

(cntd)

shift and unshift
These perform operations on the start of the array.
shift: removes the first element from an array.
    @array = qw# dino fred jon #;
    $m = shift @array;     # $m gets dino, leaving (fred, jon)
unshift: adds an element at the start of an array.
    unshift(@array, 2);    # now it has (2, fred, jon)
    unshift(@array, 1);    # now (1, 2, fred, jon)

Array operators

(cntd)

reverse: takes a list of values (possibly from an array) and returns the list in the opposite order.
    @fred = 6 .. 9;
    @rev = reverse(@fred);     # gets 9, 8, 7, 6
    @fred = reverse @fred;     # ?
sort: takes a list of values (possibly from an array) and returns the list in sorted order.
    @rocks = qw/ slate hard bed /;
    @sorted = sort(@rocks);    # bed, hard, slate
    @nums = sort 97 .. 102;    # ?
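The array operators from the last few slides can be exercised together. This is a sketch under one assumption worth naming: the numeric sort with a { $a <=> $b } comparison block is not covered in the slides and is shown only to contrast with the default string sort.

```perl
use strict;
use warnings;

my @array = (5 .. 9);
my $val = pop @array;            # removes 9 from the end
push @array, 0;                  # (5, 6, 7, 8, 0)
my $first = shift @array;        # removes 5 from the start
unshift @array, 1;               # (1, 6, 7, 8, 0)

my @rev = reverse @array;        # (0, 8, 7, 6, 1)

# The default sort compares its items as strings, so "100" sorts before "97".
my @str_sorted = sort(97 .. 102);
# A comparison block with <=> sorts numerically instead.
my @num_sorted = sort { $a <=> $b } 97 .. 102;

print "@array\n@rev\n@str_sorted\n@num_sorted\n";
```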

Iterating an array
foreach loop control
It is an easy and handy control loop to process an entire array or list.
    foreach $shell (qw/ sh csh tsh bash /) {
        print "shell is $shell\n";    # print shell names
    }                                 # $shell is the control variable
    @names = qw/ sh csh tsh bash /;
    foreach $shell (@names) {
        $shell .= "\n";               # here we modify the
    }                                 # actual array elements

Perl's Favorite Default: $_

If you omit the control variable from a foreach loop, Perl uses its default variable $_.
    foreach (1 .. 10) {
        print "start counting : $_\n";
    }    # prints the numbers from 1 to 10
    $_ = "Hey, how are you\n";
    print;    # automatically takes $_ and prints its contents

Scalar and List context


A given expression may mean different things depending upon where it appears. As Perl parses your expression it always expects either a scalar or a list value. What Perl expects is called the context of the expression.
    42 + something;         # something must be a scalar
    sort something;         # something must be a list
    @ppl = qw/ fred dino jon /;
    @sorted = sort @ppl;    # list context
    $num = 5 + @ppl;        # scalar context: 5+3 gives 8

Scalar and List context (cntd)


List-producing expressions in scalar context.
    $sorted = sort @ppl;      # scalar context: the result is undefined, don't rely on it
    @rev = reverse @ppl;      # gives (jon, dino, fred)
    $back = reverse @rev;     # scalar context: gives derfonidnoj
Here are some common contexts.
    $fred = something;            # scalar context
    @pebble = something;          # list context
    ($fred, $jon) = something;    # list context
    ($fred) = something;          # still list context
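A short sketch showing how the same array behaves in each context. The variable names follow the slides where possible; $count and $first are made up for this example.

```perl
use strict;
use warnings;

my @ppl = qw/ fred dino jon /;

my $num     = 5 + @ppl;          # scalar context: the array gives its length, 3
my $count   = @ppl;              # scalar assignment: also the length
my ($first) = @ppl;              # list assignment: the first element

my @rev  = reverse @ppl;         # list context: (jon, dino, fred)
my $back = reverse @rev;         # scalar context: joins the items, then reverses the characters

print "$num $count $first\n@rev\n$back\n";
```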

Subroutines
Defining a subroutine
    my $cnt = 0;    # declared before the subroutine so that it is visible inside it
    sub message {
        $cnt += 1;
        print "my count is $cnt\n";
    }
Invoking a subroutine
    &message;      # my count is 1
    &message();    # my count is 2
    message();     # my count is 3

Subroutines (cntd)
Return values: a subroutine always returns the value of the last expression it evaluates.
    my $max = &max;    # gives either $fred or $dino
    sub max {
        if ($fred > $dino) {
            $fred;
        } else {
            $dino;
        }
    }
Here I've used the same name for a variable and a subroutine, which Perl allows.

Arguments to Subroutine
Lexical variables (private variables)
These variables are declared by writing my before them.
    my $result = max(3, 4);
    sub max {
        my ($m, $n) = @_;    # $m and $n are private to max()
        if ($m > $n) { $m } else { $n }
    }                        # the scope of $m and $n ends here
@_ : stores the arguments passed to the subroutine. It's a special array that exists for the duration of the subroutine call.

Variable length parameter list


    sub max {
        if ($_[0] > $_[1]) { $_[0] } else { $_[1] }
    }    # avoids the private variables
    my $result = max(10, 15, 12);    # 12 is ignored, because the subroutine never looks at $_[2]
    sub max {
        print "warning insufficient argument\n" if (@_ != 2);
        ...
    }    # here @_ is checked in scalar context

A better max subroutine


    my $maximum = &max(@numbers);
    sub max {
        my ($big) = shift @_;    # first is the largest yet seen
        foreach (@_) {           # look at the remaining elements
            if ($_ > $big) { $big = $_ }
        }
        return $big;             # returns from the subroutine
    }
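The better max subroutine can be run as a complete script. This is a sketch; the contents of @numbers are sample data chosen for the example.

```perl
use strict;
use warnings;

# Returns the largest of any number of arguments (undef for an empty list).
sub max {
    my $big = shift @_;              # the first argument is the largest seen so far
    foreach (@_) {                   # compare against each remaining argument
        $big = $_ if $_ > $big;
    }
    return $big;
}

my @numbers = (3, 17, 5, 11);        # sample data
my $maximum = max(@numbers);
my $single  = max(42);               # works with one argument too

print "maximum is $maximum, single is $single\n";
```

Because the loop simply walks over whatever is left in @_, the same subroutine handles two, ten or a hundred arguments.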

Non scalar return values


You can even return an array or a list from a subroutine.
    sub max {
        if ($fred < $dino) {
            $fred .. $dino;
        } else {
            $dino .. $fred;
        }
    }
    $fred = 4;
    $dino = 7;
    @result = &max;    # gets (4, 5, 6, 7)

Omitting the ampersand &

If the compiler sees the sub definition before the invocation, or Perl can tell from the syntax that it's a sub call, then the sub can be called without &.
    my @cards = shuffle(@my_cards);
    sub divide {                    # sub divide definition first
        $_[0] / $_[1];
    }
    my $quotient = divide 35, 4;    # called like Perl's built-ins
For this parenthesis-free form, don't put the definition after the invocation, as the compiler won't know what the attempted invocation is all about. With &, Perl always calls the user-defined subroutine.

Input, Output and File handling


Line-input operator <STDIN>
To get a value/input from the keyboard into a Perl program we use this operator on the file handle STDIN.
    my $line = <STDIN>;    # scalar context
    chomp($line);          # get rid of the trailing \n
It reads the next complete line, including the \n, from standard input. When input comes from a file it can be read until EOF, but what about input from the keyboard? On Unix-like systems type Ctrl-D; on Windows use Ctrl-Z. Either indicates EOF.

Check user input


User input in list context
    my @lines = <STDIN>;
    chomp(@lines);    # each element is one line, without its \n
Or
    chomp(my $line = <STDIN>);    # commonly used
    while (defined($line = <STDIN>)) {    # scalar context
        chomp $line;
        print $line;
    }    # prints one line at a time till Ctrl-D
    foreach (<STDIN>) {    # list context
        chomp;
        print;
    }    # same output as above, but reads all the input first

Input from Diamond <> operator


This is useful for making programs that work like standard Unix utilities such as cat, grep, lpr, awk etc. The invocation arguments to a program are normally a number of words on the command line after the program name, usually names of files.
    $ ./my_prog hello.pl array.pl sub.pl
    while (<>) {    # takes the files named in @ARGV as the input source
        chomp;
        print;
    }    # prints each line, starting with file hello.pl
Or
    print sort <>;    # ?

Formatting output
    @ARGV = qw# larry nilesh randal #;    # forces these three files to be used by the <> operator
    while (<>) {
        chomp;
        print;
    }
printf: it is similar to the C language's printf.
    printf "My name is %s & age is %6d", $user, $age;
    my @info = qw/ htdg c-dac pune university /;
    printf "%s\t%s\t%s\t%s", @info;
You need to give as many %s conversions as there are elements in the array.

Array and printf


    my @items = qw( hi hello bye bolo );
    my $format = "the items are:\n" . ("%10s\n" x @items);    # @items in scalar context
    printf $format, @items;                                   # @items in list context
The above printf is equivalent to the one below.

    # printf "the items are:\n%10s\n%10s\n%10s\n%10s\n", @items;
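The format-building trick can be checked by capturing the result with sprintf instead of printing it directly. This is a sketch; the use of sprintf and the $out variable are additions for illustration.

```perl
use strict;
use warnings;

my @items = qw( hi hello bye bolo );

# "%10s\n" is repeated once per element, because x sees @items in scalar context.
my $format = "the items are:\n" . ("%10s\n" x @items);

# sprintf builds the same text that printf would print.
my $out = sprintf $format, @items;
print $out;
```

Each item is right-justified in a field of ten characters, one per line, and the format always has exactly as many conversions as the array has elements.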

File handles
A file handle is the name in a Perl program for an I/O connection between your Perl process and the outside world. Perl uses six special file handle names for its own purposes: STDIN, STDOUT, STDERR, DATA, ARGV, and ARGVOUT.

STDOUT is the implicit file handle for print.
    $ ./my_progm <data >result
    $ cat fred dino | sort | ./process | grep hpl | lpr

Opening File handles


Open file for reading: in this mode you can only read from the file; writing is not permitted. The < sign is optional, as read is the default.
    open(CONFIG, "/etc/dat.conf");
Open file in write mode: only writing is permitted. This mode is useful when you are creating a new file; an existing file is emptied first.
    open(CONFIG, ">/etc/dat.conf");

Opening File handles (cntd)


Open file in append mode: use it when you want to add to an existing file, or to a file shared between many processes.
    open(LOG, ">>/tmp/log");
Open a file in read and write mode: this is what we want, reading and writing. Note that +> empties the file first; use +< to keep the existing contents.
    open(LOG, "+>/tmp/log");
Open a file in read and append mode:
    open(LOG, "+>>/tmp/log");

die: a way of terminating the program

Standard way of opening a file
    open(CONFIG, ">>$config") or die "couldn't open $config file $!";
die: terminates the program and gives you the file name and line number at which it failed.
$! : a Perl special variable that shows the OS's reason for the failure.
If you don't want the file name and line number, simply put "\n" at the end of the message. For example:
    die "You must be root\n" if ($> != 0);

Reading from File handle


    $datself = <CONFIG>;    # for a single-line file
If your file has multiple lines:

    $/ = undef;             # input record separator; undef reads the whole file at once
    $dathosts = <HOSTS>;
    # Or
    my $file = join '', <CONFIG>;

    # Don't forget to use chomp once you grab the lines
    foreach (<LSMOD>) {
        chomp;
        . . .;
    }
    close(LSMOD);

Writing through File handle


To write to a file you need to mention the file handle in print, as STDOUT is the default file handle.
    print CONFIG "GUID\tDATSELF\tDATSELF\n\n";
    print CONFIG "$guid\t$datself\t$datself\n\n";
To change the default file handle of print:
    select CONFIG;
    $| = 1;           # don't buffer the output; flush after every print
    . . .;            # print/output operations go to the file
    select STDOUT;    # don't forget to restore it
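The open modes, die and print-to-a-handle steps can be put together in one round trip. This is a sketch under stated assumptions: the scratch file comes from the core File::Temp module so that no real configuration file is touched, and the two data lines are invented sample content.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() creates a scratch file; only its name is kept here.
my (undef, $config) = tempfile();

# Write mode: creates the file or empties an existing one.
open(CONFIG, ">$config") or die "couldn't open $config for writing: $!";
print CONFIG "GUID\tDATSELF\n";
print CONFIG "0x01\tpn1\n";
close(CONFIG);

# Append mode: adds to the end without touching what is already there.
open(CONFIG, ">>$config") or die "couldn't open $config for appending: $!";
print CONFIG "0x02\tpn2\n";
close(CONFIG);

# Read mode (the default): list context returns every line at once.
open(CONFIG, $config) or die "couldn't open $config for reading: $!";
my @lines = <CONFIG>;
close(CONFIG);
chomp(@lines);

print scalar(@lines), " lines, last is '$lines[-1]'\n";
unlink $config;                      # remove the scratch file
```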

Hashes
It's a data structure like an array. Instead of indexing the values by number, it uses keys (names) to access, modify and retrieve its values. Keys are strings, but values can be mixed data such as integers, floats, characters and strings. A single hash can hold any number of key-value pairs.
    %hash = ("hi", "tata", "2.5", "how", "why", "100");
    my %hash = ( "hi" => "tata", "2.5" => "how", "why" => "100" );

Perl's built-in Hash %ENV

%ENV: Perl's built-in hash. Perl uses it for storing environment variables. You can access it in your program.
    $path = $ENV{'PATH'};    # to access a single variable
keys and values: these are Perl functions to get the keys and values of a hash respectively.
    my @env_vars = values %ENV;
    my @env_keys = keys %ENV;
    foreach $key (keys %ENV) {
        print "$key = $ENV{$key}\n";
    }

Practical usage of hashes


Following are situations where one would choose the hash data structure.
1. For storing environment variables.
2. Host name to IP address lookup table.
3. Word to count of the number of times that word appears.

    my $kshipra_env = 'KSHIPRA_HOME';
    print "$kshipra_env = $ENV{$kshipra_env}\n"
        if (exists $ENV{$kshipra_env} and defined $ENV{$kshipra_env});
    delete $ENV{$kshipra_env};    # delete the entry from the hash
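The word-count use case from the list above makes a compact hash example. This is a minimal sketch; the word list is sample data, not taken from the slides.

```perl
use strict;
use warnings;

my @words = qw( perl awk sed perl grep perl sed );    # sample input
my %count;

# Each word is a key; its value is how many times it has been seen.
$count{$_}++ foreach @words;

foreach my $word (sort keys %count) {
    print "$word = $count{$word}\n";
}

# delete removes a key-value pair; exists tests for the key afterwards.
delete $count{awk};
my $has_awk  = exists $count{awk}  ? "yes" : "no";
my $has_perl = exists $count{perl} ? "yes" : "no";
print "awk present: $has_awk, perl present: $has_perl\n";
```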

Regular Expression
A regular expression, often called a pattern in Perl, is a template that either matches or doesn't match a given string. Perl has strong support for regular expressions, which allows fast, flexible and reliable string handling. To match a pattern (regex) against the contents of $_, put the pattern between / /.
    foreach (<HOSTS>) {
        print "It matched\n" if (/pn1/);
    }

Metacharacters
Period . : matches any single character except "\n". For example, /Ni.esh/ matches Nilesh, Nitesh, Nimesh . . . If you want to search for . itself, escape it: /3\.142/
Quantifier * : matches 0 or more of the immediately preceding character, i.e. {0, infinite}. For example:
    /mem\t*lock/    # memlock, or mem and lock separated by one or more tabs
    /http.*in/      # any string that contains "http" followed somewhere later by "in"

Some more Quantifiers


+ : matches 1 or more (at least one) of the immediately preceding character, i.e. {1, infinite}. For example:
    /bam +bam/    # bam bam with one or more spaces in between, but not bambam
    /(bred)+/     # ?

? : matches 0 or 1 of the immediately preceding character, i.e. {0,1}. For example:

    /bam-?bam/    # bambam or bam-bam

Character classes
| : alternation (OR) means that either the left or the right side may match.
    /CDAC|HTDG/      # matches CDAC or HTDG in the given string
[] : matches any single one of the characters listed inside [].
    /[abc]/          # matches any string which contains a or b or c
    /[a-zA-Z]/       # matches any of the 52 letters
    /[a-zA-Z0-9]/    # any alphanumeric character
    /[pn0-9]/        # ?

Character classes (cntd)


{} : repetition count for the preceding character or group
    /a.{3}e/    # awate, aapte, amare
    {3,5}       # you can write it this way too; it means minimum 3, maximum 5
    {1,}        # this means minimum 1, no maximum

\1 : refers back to whatever the first parenthesized group matched
\2 : refers back to whatever the second parenthesized group matched
    /a(.)b(.)c\1\2z/    # matches axbycxyz
You can use $1, $2 outside the regex later on.
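The quantifiers and backreferences can be verified against a few test strings. This is a sketch; the @results array and the 1/0 encoding of each match are choices made for this example.

```perl
use strict;
use warnings;

# Record 1 for a match and 0 for no match.
my @results;
push @results, ("bambam"   =~ /bam-?bam/ ? 1 : 0);    # optional hyphen absent
push @results, ("bam-bam"  =~ /bam-?bam/ ? 1 : 0);    # optional hyphen present
push @results, ("bam  bam" =~ /bam +bam/ ? 1 : 0);    # two spaces satisfy +
push @results, ("bambam"   =~ /bam +bam/ ? 1 : 0);    # + needs at least one space
push @results, ("awate"    =~ /a.{3}e/   ? 1 : 0);    # exactly three characters between a and e

# \1 and \2 must repeat what the two groups captured.
my ($first, $second);
if ("axbycxyz" =~ /a(.)b(.)c\1\2z/) {
    ($first, $second) = ($1, $2);
}

print "@results first=$first second=$second\n";
```

Only the fourth test fails to match, so the script prints 1 1 1 0 1 followed by first=x second=y.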

Character class short cuts


\b : word boundary
    /pn1\b/    # selects only pn1 and not pn11, pn12...
^ : matches at the beginning of the string
    /^http/    # selects only a leading http, not an http in the middle

$ : matches at the end of the string
    /com$/     # matches com only at the end of the string, the opposite of ^

Few more short cuts


\d -> matches any digit [0-9]
\D -> not a digit [^0-9]
\w -> matches a single word character [A-Za-z0-9_]
\W -> not a word character
\s -> matches a whitespace character [ \t\n\f\r]
\S -> not a whitespace character
    /-?\d+\.?\d*/    # what is this doing ?
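One way to explore a pattern like the one above is to anchor it with ^ and $ and run it over a handful of candidates. This is a sketch; looks_numeric is a hypothetical helper name, and the anchors are an addition so that the whole string has to fit the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the whole string fits the pattern, else 0.
sub looks_numeric {
    my ($text) = @_;
    return $text =~ /^-?\d+\.?\d*$/ ? 1 : 0;
}

foreach my $candidate ("42", "-3.14", "7.", "abc", ".5") {
    printf "%-6s %s\n", $candidate, looks_numeric($candidate) ? "matches" : "no match";
}
```

The first three candidates match. "abc" has no digits, and ".5" fails because \d+ demands at least one digit before the optional dot.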

Matching with Regex


m/ / : the forward slashes (//) are a shortcut for the m// pattern match operator. As with the qw// operator, you can choose other delimiters, so you could write the same expression as m{}, m(), m## . . .
Flags used in Perl regex:

m//i : case insensitive, e.g. m/y/i
m//s : lets . match a newline too. Without it, (.) fails to search beyond \n, e.g. in "Nilesh\nfwfwq\nqwgwg\nawate".

Few more flags


m//x : allows you to add arbitrary white space in a regex, e.g. m/ -? \d+ \.? \d* /x
s///g : global replacement; replaces all the occurrences of the pattern found in a string.
\L : forces lower case
\U : forces upper case
\l : lower-cases only the next character
\u : upper-cases only the next character
\L and \U apply till the end of the string; you can turn them off with \E.

Binding Operator =~
Matching against $_ is merely the default; the binding operator =~ tells Perl to match the pattern on the right against the string on the left.
    print "Do you like perl ? : ";
    my $opinion = (<STDIN> =~ /\byes\b/i);
    print "Hey, you said that you like Perl\n" if ($opinion);

    $date = `date`;    # Wed Jul 22 12:36:08 IST 2009
    $date =~ /((\d{2}):(\d{2}):(\d{2}))/;    # ?
    @array = ($2, $3, $4);    # explained in a later slide

The persistence memory


These match variables generally stay around until the next successful pattern match. An unsuccessful match leaves the previous memories intact, but a successful one resets them all.
    $host =~ m/(\w+)/;                # BAD! Untested match result
    print "$1 is the host\n";         # unpredictable
    if ($host =~ /(\w+)/) {           # you can check whether the
        print "$1 is the host\n";     # match succeeded or not
    } else {
        print "Sorry unknown host\n";
    }

Persistence Memory (cntd)


For the date example, Wed Jul 22 12:36:08 IST 2009:
    $1    # total time, because of the outer parentheses around the complete pattern: 12:36:08
    $2    # hour: 12
    $3    # minute: 36
    $4    # seconds: 08
    $&    # the part of the string that matched the complete pattern
    $`    # the part before the match: Wed Jul 22
    $'    # the part after the match: IST 2009
These remain there until the next successful pattern match.
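The match variables can be inspected safely by copying them inside a tested match. This is a sketch that uses a fixed date string instead of running the date command, so the result is the same on every machine.

```perl
use strict;
use warnings;

my $date = "Wed Jul 22 12:36:08 IST 2009";    # fixed sample instead of `date`
my ($time, $hour, $minute, $second, $before, $matched, $after);

# Copy the match variables only when the match succeeded.
if ($date =~ /((\d{2}):(\d{2}):(\d{2}))/) {
    ($time, $hour, $minute, $second) = ($1, $2, $3, $4);
    ($before, $matched, $after)      = ($`, $&, $');
    print "time=$time hour=$hour minute=$minute second=$second\n";
    print "before=[$before] matched=[$matched] after=[$after]\n";
} else {
    print "no time found\n";
}
```

Copying into named variables right away also protects the values from being overwritten by the next successful match.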

Substitutions
If you think of m// as a search feature, then the search-and-replace feature would be Perl's s/// substitution operator. It replaces whichever part of a variable matches a pattern with a replacement string. You can also write s{ }{ }, s# # # . . .
    $_ = "HTDG, C-DAC Pune University";
    s/(\w+), ([\w-]+)/$2, $1/;    # now it becomes C-DAC, HTDG Pune University
                                  # ([\w-]+ is needed because \w does not match the hyphen)
    s/^/Wow/;    # ?
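A short sketch of substitution with captures, the g flag and the case-shifting escapes. Note that \w does not match a hyphen, so the second capture here uses [\w-]+ to take in all of C-DAC; the $hosts and $name strings are sample data.

```perl
use strict;
use warnings;

# Swap two comma-separated words using the captures $1 and $2.
my $title = "HTDG, C-DAC Pune University";
$title =~ s/(\w+), ([\w-]+)/$2, $1/;

# With /g every occurrence is replaced; s/// returns how many were changed.
my $hosts   = "pn1 pn2 pn1";
my $changed = ($hosts =~ s/pn1/node/g);

# \u upper-cases the next character, \U everything up to \E.
my $name = "nilesh awate";
$name =~ s/(\w+) (\w+)/\u$1 \U$2\E/;

print "$title\n$hosts ($changed)\n$name\n";
```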

split and join operators


split breaks up a string according to a pattern and returns the pieces as plain strings. It is useful with tab, colon, white space, underscore or any regex as the separator, e.g. for /etc/passwd (colon separated) or the many /etc files which are tab separated.
    @data = split(/separator/, "string");    # list context
    $line = `tail -1 /etc/passwd`;           # vivek:x:501:501::/home/vivek:/bin/bash
    my @user_info = split(/:/, $line);       # vivek, x, 501, 501, (empty string), /home/vivek, . . .

split and join operators (cntd)


join is exactly the opposite of split: it glues/joins a bunch of pieces to make a single string. Typical uses are reading a complete file into a single variable, or concatenating strings, e.g. time, date or tab-separated fields in files.
    join("separator", ... , or array);    # e.g. 4,5,6 or 1 .. 3
It operates on a list but returns a scalar.
    my @array = qw(12 43 05);             # no commas inside qw
    my $time = join(":", @array);         # 12:43:05
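split and join undo each other, which a round trip makes visible. This is a sketch using the passwd-style line from the slides as a fixed string rather than reading /etc/passwd.

```perl
use strict;
use warnings;

my $line = "vivek:x:501:501::/home/vivek:/bin/bash";

# Two adjacent colons produce an empty field, so there are seven pieces.
my @user_info = split /:/, $line;
my $fields    = @user_info;

# join glues the pieces back together with the same separator.
my $rejoined = join ":", @user_info;

my @hms  = qw(12 43 05);
my $time = join ":", @hms;

print "$fields fields, home is $user_info[5]\n$rejoined\n$time\n";
```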

Editing a File
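The most common way of programmatically updating a text file is by writing a new file that looks similar to the old one. A small file can also be rewritten through a read-write file handle:
    open(FILE, "+<$filename") or die "couldn't open $filename: $!";
    my $line = join('', <FILE>);
    $line =~ s/^/$filename/gm;     # ?
    seek(FILE, 0, 0);              # go back to the start before writing
    print FILE $line;
    truncate(FILE, tell(FILE));    # drop any leftover old content
    close FILE;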

Editing many files


    chomp(my $date = `date`);
    $^I = ".bak";
    while (<>) {    # remember the diamond operator
        s/Author:.*/Author: Nilesh Awate/;
        s/Group:.*/Group: HTDG/;
        s/Date:.*/Date: $date/;
        print;      # keeps the remaining lines as they are
    }
$^I : a Perl special variable that turns on in-place editing; each file is backed up under the extension you provide before it is edited. Don't leave it empty, or no backup is made.
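In-place editing can be tried without risking real files. This is a sketch under stated assumptions: the scratch directory comes from the core File::Temp module, notes.txt and its three lines are invented sample content, and @ARGV and $^I are set with local so the change is confined to one block.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory, removed automatically when the program exits.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/notes.txt";          # hypothetical file name

open(OUT, ">$file") or die "couldn't create $file: $!";
print OUT "Author: someone\nGroup: none\nbody line\n";
close(OUT);

{
    local @ARGV = ($file);            # the diamond operator reads these files
    local $^I   = ".bak";             # in-place edit, keeping a .bak copy
    while (<>) {
        s/Author:.*/Author: Nilesh Awate/;
        s/Group:.*/Group: HTDG/;
        print;                        # goes into the edited file, not to the screen
    }
}

open(IN, $file) or die "couldn't read $file: $!";
my $edited = join '', <IN>;
close(IN);
my $backup_exists = -e "$file.bak" ? 1 : 0;

print $edited, "backup kept: $backup_exists\n";
```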
