A Crash Course in Perl

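Perl is usually expanded as Practical Extraction and Report Language, although the expansion was coined after the name was chosen. It dates back to 1987. It was originally written by Larry Wall while he was working as a programmer at Unisys; he later worked at NASA's Jet Propulsion Laboratory.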

Perl started life as a 'glue' language, for the use of Larry and his officemates, allowing one to 'stick' different tools together by converting between their various data formats. It pulled together the best features of several languages: the powerful regular expressions from sed (the UNIX stream editor), the pattern-scanning language awk, and a few other languages and utilities. The syntax borrows further from C, Pascal, BASIC, the UNIX shell languages, English and maybe a few other things along the way.

Perl combines the power and flexibility of a general-purpose programming language such as C with the convenience of a scripting language. It performs well on text-processing tasks and is quick to code in. It is meant for geeks (not for the muggles). It does allow the usual readable code. But the real power of Perl lies in the concept of default variables and parameters, which lets us pack a great deal of functionality into a single line of code.

Let us now have a look at some important aspects of Perl.

Perl 5 Basic Concepts

The first and foremost task with any new programming language is the Hello World. So also with Perl.

Before that, we have to start with installing the Perl interpreter. Perl normally comes pre-installed with Linux and Mac but we need to install it on Windows. We can obtain it from the Perl Website.

Once the interpreter is set up, we can go ahead with the first script. Windows associates a file extension with the application used to run the file. Once the Perl interpreter is installed, the .pl extension is normally associated with the interpreter and the job is done.

But that is not the case on Linux (or Unix), where a "real" Perl script will normally run. On Linux, the interpreter is specified by the first line of the script, which makes that line very important. It is called the "shebang" line. It should look something like this:

#!/bin/perl

The #! is called the shebang. What follows it is the path to the interpreter executable. It could be /bin/ksh or /bin/python or /bin/bash ... depending on the kind of script we create. On many systems Perl lives at /usr/bin/perl, so adjust the path to match the installation. It is customary (not necessary) to leave a blank line after the shebang. After that comes the line of code:

print "Hello World\n";

No marks for guessing: this prints out the text Hello World. Thus the helloworld.pl script looks like this:

#!/bin/perl
print "Hello World\n";

Save this in a file, make it executable (chmod +x helloworld.pl on Linux) and run it. There we see on the console: Hello World. Nothing much for the world. But it is an assurance that we have started well.

Basic Syntax

Perl syntax is simple and flexible. It is based on the principle of forgiveness. Most languages have their own lunacies. Some need parentheses around the parameters of a function, some don't. Some are strict about types. Some don't understand types... Perl accommodates all of these. You may pass parameters in parentheses if you like, but that is usually not required. Variables carry no declared type; Perl decides how to treat a value from the way it is used. There is only one thing that Perl enforces: curly braces to mark blocks. Let's take a bird's-eye view of the Perl syntax.

Comments

First in any syntax come the comments. In Perl, anything that follows a # (outside a quoted string) is a comment. The # may be the first character of a line or appear midway through it; everything between the # and the end of the line is ignored by the interpreter.

The Semicolon

Every statement in Perl must end with a semicolon (;). The semicolon, not the end of the line, marks the end of a statement; newlines and other white space between tokens are not significant. A single statement can span several lines of text, and several statements can share one line of text.
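A minimal sketch of how statements and lines of text relate. The variable names are made up for illustration, and the strict and warnings pragmas (with my declarations) are additions not used in the article's own scripts.

```perl
use strict;
use warnings;

# One statement spread over several lines of text.
my $total = 1
          + 2
          + 3;

# Several statements on one line of text.
my $x = 10; my $y = 20; my $sum = $x + $y;

print "$total $sum\n";   # prints "6 30"
```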

Statements and Tokens

In Perl, most code can be read as three components: the statement, the tokens and a return value. A typical line of code is a function call (the statement) with some parameters passed into it (the tokens), and it has a return value that may or may not be captured into a variable. The parameters are separated by commas and may or may not be wrapped in parentheses. A fourth, optional element is a trailing conditional that dictates whether the line of code should run.
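The pieces described above can be sketched with two built-in functions, join and length. The variable names are hypothetical, and strict, warnings and my are additions for safety rather than part of the article's style.

```perl
use strict;
use warnings;

# Arguments in parentheses, separated by commas.
my $joined = join("-", "a", "b", "c");

# The same call without the parentheses.
my $joined_too = join "-", "a", "b", "c";

# The return value of a call captured in a variable.
my $length = length $joined;

# A trailing conditional decides whether the statement runs at all.
my $note = "unset";
$note = "five characters" if $length == 5;

print "$joined $joined_too $length $note\n";   # prints "a-b-c a-b-c 5 five characters"
```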

Control Structures

Perl also provides a rich set of control structures: if, elsif, else, while, until, for, foreach, do, last, continue, next. They are not very different from their counterparts in other languages. We will look at these in detail later. But it is important to note here that the scope of a conditional is defined either by a pair of curly braces {...} or, when it is written as a trailing modifier, by the single statement that contains it.

if (1) {
   print "Hello World\n";
}

Is the same as

print "Hello World\n" if (1);
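As a rough illustration of a few of the other keywords in that list, the sketch below combines until, next and last. The loop bounds and the @seen array are invented for the example; strict, warnings and my are additions.

```perl
use strict;
use warnings;

my @seen;
my $n = 0;
until ($n >= 10) {      # until keeps looping while the condition is false
    $n++;
    next if $n % 2;     # odd number: skip to the next iteration
    last if $n > 6;     # leave the loop entirely
    push @seen, $n;     # remember the even numbers that got this far
}
print "@seen\n";        # prints "2 4 6"
```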

STDIN / STDOUT

Console interaction generally goes through STDIN and STDOUT. Any input typed at the console is available on STDIN, and anything that we print in the code is sent to STDOUT by default. Check out the script below, echo.pl, which echoes anything that we type:

#!/bin/perl

print "Enter a string\n";
while (1) {
  $input = <STDIN>;
  print $input;
}

This script runs in an infinite loop. Anything that you type into the console is echoed back. The first line in the loop reads a line from the STDIN. The second line prints whatever it read.

Scalar Variables

Perl does not make us declare variable types. Internally, though, a scalar variable (one whose name begins with $) holds one of two basic kinds of data: strings and numbers.

Numbers

A variable that is assigned a numeric value is a numeric variable.

$number = 1;

Perl does not restrict us to simple numbers. All these are valid numbers in Perl.

4
3.2
.23434234
5.
1_123_456
10E2
45e-4
0xbeef
012
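To see what some of these literals evaluate to, a small sketch along these lines can be run. The variable names are illustrative only, and strict, warnings and my are additions.

```perl
use strict;
use warnings;

my $million  = 1_123_456;   # underscores are only visual separators
my $thousand = 10E2;        # 10 times 10 squared
my $small    = 45e-4;       # 45 divided by 10,000
my $hex      = 0xbeef;      # hexadecimal literal
my $octal    = 012;         # a leading zero means octal

print "$million $thousand $small $hex $octal\n";   # prints "1123456 1000 0.0045 48879 10"
```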

Strings

A sequence of characters in quotes is a string.

$string1 = "This is a string";
$string2 = 'This is also a string';

We can use single or double quotes to define a string. There are some subtle differences in the way they are parsed. Inside double quotes, some sequences have a special meaning, for example the newline \n or the tab \t, and any variable within the string is replaced by its value. Single-quoted strings take everything literally, except for \' and \\, which are used to include a single quote or a backslash inside a single-quoted string.

#!/bin/perl

$c = 1;
$string1 = "String with newline \n and also some variables \" $c\n";
$string2 = 'String with newline \n and also some variables \\ \' $c\n';

print $string1;
print $string2;

This generates the output:

String with newline 
 and also some variables " 1
String with newline \n and also some variables \ ' $c\n

Most programming languages have a similar set of escape characters. The exhaustive list for Perl can be found in the documentation.

Concatenating strings

Perl allows us to concatenate two strings to generate a new third string. The operator . is used for this. Check this script.

#!/bin/perl

$string = "Hello" . " " . "World" . "\n";
print($string);

Arithmetic Operations

Perl arithmetic is not restricted to integers: 19/2 gives us 9.5 and not 9 as a Java/C developer would expect. If we really want 9, we can force an integer result using the int function. This does not change the division; it just truncates the result to an integer.

#!/bin/perl

$x = 19/2;
$y = int(19/2);      # truncates 9.5 to 9
print("$x $y\n");

Perl includes most of the common operators found in other programming languages. They mean the same thing here, so we will not elaborate on them.

String - Number Conversion

Perl does its best to help us with the conversion. If a variable is used in string context, it converts the variable into a string and if it is in a numeric context, it tries to get a number out of it. Most often this is quite intuitive. For example,

#!/bin/perl

$x = "10";
$y = "ten";
$z = "5 hundred";

print $x + 5, "\n";      # 15
print $y + 5, "\n";      # 5
print $z + 5, "\n";      # 10

The first will give us 15 as one would expect. But what would you expect for the other two? This is less intuitive. When converting a string to a number, Perl scans the string from the left for as long as the characters form a number, and stops when it sees anything else. If there is no leading number at all, the result is 0. Thus "ten" is evaluated as 0 while "5 hundred" is evaluated as 5, so the second prints 5 and the third prints 10.
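A few more conversions, as a sketch of the left-to-right scanning rule. The sample strings are made up; strict, warnings and my are additions, and the numeric warnings that Perl would normally emit for such strings are switched off here to keep the output clean.

```perl
use strict;
use warnings;
no warnings 'numeric';   # these conversions normally warn under "use warnings"

my $first  = "3.5 apples" + 0;   # the leading "3.5" is used
my $second = "abc12" + 0;        # no leading number, so this is 0
my $third  = "12abc" * 2;        # the leading "12" is used

print "$first $second $third\n"; # prints "3.5 0 24"
```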

Boolean

Perl does not have an explicit boolean variable. Strings and numbers are evaluated as true or false depending on their content. An empty string, the number 0, the string "0" and an undefined variable all evaluate to false. Anything else is true. This leads to an interesting situation. What about a string such as "0.0", which looks like zero? Is that a non-empty string evaluating to true, or the number 0 evaluating to false?

This allows us to peep behind the curtains into the way Perl handles variables. "0.0" is a non-empty string that is not exactly "0", hence true. It is converted to a number only when it is seen in a numeric context. Until then it is a string. Thus, "0.0" is true while 0 + "0.0" is false.

Such booleans are used for conditional evaluation and various control structures that Perl provides.
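One way to check how Perl treats a given value in a condition is a small sketch like the one below. The truth helper is hypothetical, written only for this illustration; strict, warnings and my are additions.

```perl
use strict;
use warnings;

# Hypothetical helper: report how Perl sees a value in boolean context.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

print 'undef     -> ', truth(undef),       "\n";   # false
print '""        -> ', truth(""),          "\n";   # false
print '0         -> ', truth(0),           "\n";   # false
print '"0"       -> ', truth("0"),         "\n";   # false
print '"0.0"     -> ', truth("0.0"),       "\n";   # true
print '0 + "0.0" -> ', truth(0 + "0.0"),   "\n";   # false
print '" "       -> ', truth(" "),         "\n";   # true
```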

Arrays

An array in most programming languages is an ordered collection of elements that can be accessed by index. Each language adds its own flavors to this, but the essence remains the same. Perl arrays are no different. A single Perl array can mix numbers and strings freely, and it can hold other arrays through references.

In the syntax, a Perl array is a variable whose name begins with @. Individual elements are scalars, so they are written with a $ and an index in square brackets. Array elements can be indexed using negative numbers as well. Just as the first element of the array is indexed by 0, the last element is indexed by -1, the second last by -2 and so on. Thus we can write:

#!/bin/perl

$variable = 1.0;               # any scalar will do
@a = (1, 'one', $variable);

print("$a[0], $a[1], $a[2]\n");
print("$a[-3], $a[-2], $a[-1]\n"); 
print("@a\n");

There are other ways of creating arrays. For sequences of numbers or letters, the range operator .. is handy:

#!/bin/perl

@numbers = (1..50);
@alphabets = ('a'..'z');

print "@numbers\n"; 
print "@alphabets\n";

Array Size

The size of an array can be found by evaluating it in scalar context. Now what is that? It is not as complicated as it sounds. When we assign an array to a scalar variable, the array is evaluated in scalar context and yields its size. We can force the same behavior elsewhere with the explicit keyword "scalar".

#!/bin/perl

@array = ("Test", "Array", "Size");

$size = @array;                         # This assignment forces the scalar context.
print "Size of array: ", scalar(@array), "\n";  # print imposes list context, so the explicit keyword is required.
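The difference between the two contexts can be sketched by assigning the same array in two ways. The variable names are illustrative; $#array (the index of the last element) is standard Perl but is not mentioned in the text above, and strict, warnings and my are additions.

```perl
use strict;
use warnings;

my @array = ("Test", "Array", "Size");

my $count      = @array;    # scalar context: the number of elements
my ($first)    = @array;    # list context (note the parentheses): the first element
my $last_index = $#array;   # index of the last element, always size minus one

print "$count $first $last_index\n";   # prints "3 Test 2"
```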

Modifying the Array

Perl provides push, pop, shift and unshift to add and remove elements at either end. push and unshift are not limited to one element: we can pass in a list or an array and it works just as well.

#!/usr/bin/perl

# create a simple array
@coins = ("Quarter","Dime","Nickel");
print "1. \@coins  = @coins\n";

# add one element at the end of the array
push(@coins, "Penny");
print "2. \@coins  = @coins\n";

# add one element at the beginning of the array
unshift(@coins, "Dollar");
print "3. \@coins  = @coins\n";

# remove one element from the end of the array.
pop(@coins);
print "4. \@coins  = @coins\n";

# remove one element from the beginning of the array.
shift(@coins);
print "5. \@coins  = @coins\n";
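The script above adds one element at a time. A short sketch of the multi-element form, together with the values that pop and shift hand back, might look like this. The extra coin names are invented for the example; strict, warnings and my are additions.

```perl
use strict;
use warnings;

my @coins = ("Dime", "Nickel");
my @small = ("Penny", "Halfpenny");

push(@coins, @small);                   # append a whole array at the end
unshift(@coins, "Dollar", "Quarter");   # prepend a list, keeping its order
print "@coins\n";                       # prints "Dollar Quarter Dime Nickel Penny Halfpenny"

my $last  = pop(@coins);                # pop returns the element it removed
my $first = shift(@coins);              # so does shift
print "$first $last\n";                 # prints "Dollar Halfpenny"
```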

That takes care of both ends. For elements in the middle, Perl provides array slices and the splice function.

#!/bin/perl

@nums = (1..20);
print "Before - @nums\n";

print "@nums[1, 5, -2]\n";
print "@nums[2..10]\n";

splice(@nums, 5, 5, 21..25); 
print "After - @nums\n";
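In the script above, splice replaces five elements starting at index 5 with the values 21 to 25. The same function can also remove or insert elements, as in this small sketch; the array contents are arbitrary, and strict, warnings and my are additions.

```perl
use strict;
use warnings;

my @nums = (1..10);

my @picked  = @nums[1, 5, -2];        # a slice copies elements, @nums is unchanged
my @removed = splice(@nums, 2, 3);    # remove 3 elements starting at index 2
splice(@nums, 2, 0, 'a', 'b');        # length 0: insert without removing anything

print "@picked\n";    # prints "2 6 9"
print "@removed\n";   # prints "3 4 5"
print "@nums\n";      # prints "1 2 a b 6 7 8 9 10"
```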

Sorting Arrays

Perl provides the sort function to sort arrays.

#!/bin/perl

@strings = ("Perl", "provides", "the", "keyword", "sort", "to", "sort", "arrays");
print join(" ", sort @strings), "\n";

@numbers = (1, 44, 9, 5, 8);
print join(" ", sort @numbers), "\n";                # Sorts as strings: 1 44 5 8 9
print join(" ", sort {$a <=> $b} @numbers), "\n";    # Sorts in numeric order: 1 5 8 9 44

By default, the sort function orders elements as strings, in alphabetical (character code) order. But Perl allows us to pass in our own sort order. The code inside the curly braces can contain any logic that compares the two values $a and $b and returns a negative number, zero or a positive number. This comparison should be consistent and meet the usual requirements of a comparator; otherwise the resulting order is unpredictable.
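As a sketch of a custom sort order, the comparator below sorts words by length and breaks ties alphabetically with cmp, the string counterpart of the numeric comparison operator. The word list is invented; strict, warnings and my are additions.

```perl
use strict;
use warnings;

my @words = ("pear", "fig", "banana", "kiwi", "apple");

# Shorter words first; words of equal length in alphabetical order.
my @by_length = sort { length($a) <=> length($b) or $a cmp $b } @words;
print "@by_length\n";    # prints "fig kiwi pear apple banana"

# Swapping $a and $b reverses the order.
my @descending = sort { $b <=> $a } (1, 44, 9, 5, 8);
print "@descending\n";   # prints "44 9 8 5 1"
```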

Iterating over an Array

Any language worth its name makes some provision for iteration. For and while loops are quite common. Perl adds its own flavor to these. In fact, Perl has many different ways of looping over arrays. Some examples below:

#!/bin/perl

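@a = (1..100);

# C-style for loop with an explicit index
for ($i=0; $i<@a; $i++) {
    print $a[$i], "\n";
}

# while loop; the index must be reset and advanced by hand
$i = 0;
while ($i < @a) {
    print $a[$i], "\n";
    $i++;
}
# the same loop written as a statement modifier
$i = 0;
print $a[$i++], "\n" while ($i < @a);

# foreach loop with a named loop variable
foreach $x (@a) {
    print "$x\n";
}
# as a statement modifier, foreach cannot name a variable; each element is in $_
print "$_\n" foreach (@a);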

Allocation

Perl arrays are self-allocating. We do not need to allocate them explicitly. Just assigning a value to an index allocates the memory, growing the array if necessary; if the memory is not available, the program dies. Similarly, reading from an index always returns, irrespective of whether the array actually has that element. If no value is stored there, it returns undef. We can also explicitly clear an element and set it to undef by using the undef function.

#!/bin/perl

@a = (1..10);
print "@a\n";

undef($a[5]);
print "@a\n";

foreach $x (@a) {
    print "$x\n" if (defined $x);
}
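The self-allocating behavior can be observed with a sketch like the following, which assigns past the end of a short array and then reads a far-away index. The array name and index values are arbitrary; strict, warnings and my are additions.

```perl
use strict;
use warnings;

my @list = (1, 2, 3);

$list[6] = 7;                  # assigning past the end grows the array
my $size = @list;              # 7 elements now; indexes 3 to 5 hold undef

my $missing = $list[100];      # reading a missing index just returns undef
my $size_after_read = @list;   # reading did not grow the array

# Count the elements that actually hold a value.
my $defined_count = 0;
foreach my $item (@list) {
    $defined_count++ if defined $item;
}

print "$size $size_after_read $defined_count\n";   # prints "7 7 4"
```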

Scalar Functions

Many functions that make sense for scalars are extended for arrays. When applied on an array, they work on all the elements of the array.

#!/bin/perl

@a = ("chop", "every", "word", "in", "this", "array");
chop @a;
print "@a\n";

Note that this may not work for all functions. It works only if the function is written to accept a list. Otherwise we have other ways, such as map.

#!/bin/perl

@a = ("chop", "every", "word", "in", "this", "array");
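# chop returns the removed character, so work on a copy and return the shortened word
@a = map { my $word = $_; chop $word; $word } @a;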
print "@a\n";

The map function essentially loops through the list and passes each element to the code inside the curly braces. The result of each such iteration is accumulated into a new array that is returned by the map.
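A small sketch of map with functions that have no list form of their own. Inside the braces each element is available as $_, and the value of the last expression becomes the corresponding element of the result. The sample data is invented; strict, warnings and my are additions.

```perl
use strict;
use warnings;

my @numbers = (1..5);
my @squares = map { $_ * $_ } @numbers;    # one result per input element
print "@squares\n";                        # prints "1 4 9 16 25"

my @words   = ("chop", "every", "word");
my @shouted = map { uc($_) } @words;       # uc upper-cases a single string
print "@shouted\n";                        # prints "CHOP EVERY WORD"
print "@words\n";                          # prints "chop every word" (the input is untouched)
```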