anders.com: words: perl 101

perl 101: written for maximum linux magazine

MaximumLinux Article: Perl 101
By Anders Brownworth

1. Intro:

Perl stands for "Practical Extraction and Report Language", but trust me, it isn't even close to being as boring as it sounds! It has actually become the de facto standard glue language for Unix. You might as well get used to saying, "Oh, that can be done with perl," because that's usually the case! About the only downside to perl is its speed, but for the overwhelming majority of tasks that perl is used for, it's far more than fast enough. In short, perl rocks.

"In love, anything is possible. For everything else, there is perl."
~Anders Brownworth

So the real problem as I see it is, "How does one get over the learning curve?" As you'll see, perl is very concise and has lots of simple ways of doing fairly complicated things quickly. That conciseness is possibly perl's chief strength and its chief weakness: it's handy but cryptic. But stick with it and you'll thank me!

We'll go through several example scripts to get you up to speed. Make sure you are sitting in front of your trusty Linux machine and try all of the examples. Because I won't be able to go in all the directions you might want to go, there will be no better teacher than the perl interpreter itself!

2. Sample script:

Perl is an interpreted language. This means that you make little text files and "execute" them by sending them through the perl interpreter. Break out your favorite text editor (such as emacs) and create a file called "sample.pl" and enter the following program. (I usually use the extension ".pl" to denote perl scripts, but you can use whatever you want.)

#!/usr/bin/perl

print "World domination, one line of perl at a time.\n";

Now make that file "executable" and run it by typing its name.

eyore:~> chmod +x sample.pl
eyore:~> sample.pl
World domination, one line of perl at a time.
eyore:~>

(If execution fails with a "command not found", then your default path may not include the current directory, so execute the program like this: "./sample.pl". If that fails too, check whether perl lives in /usr/bin/perl or somewhere else.)

When you "run" this script, the system notices that sample.pl starts with the characters "#!" and treats the rest of that first line as the name of the program that should interpret the file. In our case, it launches /usr/bin/perl and hands it our script (all of one line) for execution. That's what the first line of sample.pl is for. (Without a "#!" line, the shell would try to process the contents as if they were shell commands.)

The script is a simple print statement that does exactly what you would expect: it prints whatever you typed between the quotes. Perl statements end in a semicolon (;) so never forget them. Many syntax errors crop up because of a missing semicolon. But what is the "\n" all about? "Backslash n" stands for the special character "newline", or the same thing as hitting enter.

Sniglet: Line endings

The standard Unix line ending is a "newline" character ( \n ), but DOS and Windows terminate lines with the two characters "carriage return, newline" ( \r\n ). This really won't matter to you until you need to start reading Windows files with your perl scripts, but if you are editing perl files on Windows and executing them on Linux, you will need to yank the \r out of the #!/usr/bin/perl line.
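To make the \r problem concrete, here is a minimal sketch that cleans up one line as it might arrive from a file saved on Windows. The sample string is made up for illustration, the "use strict" and "use warnings" lines and the "my" declarations are additions to this article's style, and the substitution uses a regular expression, which is covered later in this article.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical line, as it would look when read from a Windows text file.
my $line = "World domination\r\n";

chomp $line;          # removes the trailing "\n" only
$line =~ s/\r$//;     # removes a trailing "\r" if one is left over

my $length = length $line;
print "[$line] is $length characters long\n";
```

If the \r were still there, the length would come out one higher than the visible text, which is a handy way to spot the problem.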

Now let's make it a little more interesting. Let's add a variable.

#!/usr/bin/perl

$name = "Anders";
print "Your name is $name?\n";

$name is a variable that we're using to store my name. We could also set this variable on the fly:

#!/usr/bin/perl

print "What's your name? ";
$name = <STDIN>;
chomp $name;
print "You wouldn't happen to be $name?, would you?\n";

(chomp is a quick little function that lops off the last character of a string if it is a newline. Any other trailing whitespace is left alone.) Notice the lack of a \n character in the first print statement. We leave it out so the name is typed on the same line as the question; the user ends the line by pressing enter after entering their name.
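If you want to see exactly what chomp removes, the following sketch may help. The three sample strings are invented stand-ins for lines read from <STDIN>, and the "my" keyword (explained in the subroutines section) simply declares each variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample values standing in for typed input.
my $with_newline = "Anders\n";
my $with_space   = "Anders \n";
my $no_newline   = "Anders";

# chomp returns the number of characters it removed.
my $removed_a = chomp($with_newline);   # removes the "\n"
my $removed_b = chomp($with_space);     # removes the "\n", keeps the space
my $removed_c = chomp($no_newline);     # nothing to remove

print "[$with_newline] removed $removed_a\n";
print "[$with_space] removed $removed_b\n";
print "[$no_newline] removed $removed_c\n";
```

The square brackets in the output make it easy to see that the space in the second string survives.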

Or we could store a number in a variable:

#!/usr/bin/perl

$number = 5;
$square = $number * $number;
print "The square of $number is $square\n";

What about a loop?

#!/usr/bin/perl

$maximum = 10;
$number  = 1;
while ( $number <= $maximum ) {
  print "$number\n";
  $number++;
}

Here we are setting $maximum and $number and saying "while the number is less than or equal to the maximum, run everything between the { and the }". We increment $number with the perl-ism "++", or in other words, add one to $number. The opposite of that would be "--", meaning subtract one from the variable.
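As a small illustration of "--", here is the same kind of loop run backwards. This is only a sketch: the $printed counter is an extra variable added here to show "++" and "--" working side by side, and "my" just declares the variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $maximum = 10;
my $number  = $maximum;   # start at the top this time
my $printed = 0;          # counts how many lines we print

while ( $number >= 1 ) {
  print "$number\n";
  $number--;              # subtract one from $number
  $printed++;             # add one to $printed
}

print "printed $printed numbers\n";
```

The loop stops as soon as $number drops below 1, so it counts down from 10 to 1.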

We can also construct arrays (lists) of items and pick things from them.

#!/usr/bin/perl

@days = ( "Sunday",   "Monday",
          "Tuesday",  "Wednesday",
          "Thursday", "Friday",
          "Saturday" );

$total_days = @days;

print "There are $total_days days in the week.\n";
print "The first day of the week is $days[0].\n";
print "The last day of the week is $days[$total_days - 1].\n";

@time = localtime(time);

print "Today is $days[$time[6]].\n";

Hold on a minute. A few things are happening here. First we are making an array called @days with all the days of the week in it. We can pick out any particular day by referencing it by number, but arrays are numbered starting from 0, so $days[1] is "Monday". (A single element is a scalar, so it is written with a $ rather than an @. Writing @days[1] also happens to work, but it asks for a one-element slice of the array, and perl will complain about it if warnings are turned on.)

@days is an array, but when we assign it to a scalar variable ($total_days = @days) it returns the total number of elements in that array. In this case, that would be 7. But don't forget that the last element of the array @days is number 6, not 7, because the array is numbered from 0. So to get the last element of the array, we call it like this: $days[$total_days - 1]

Next we are using a handy perl function called localtime which returns an array representing the current date and time. It just so happens that array element 6 is the day of the week on a 0 - 6 scale, so we can convert that to the long form by calling the $time[6]'th element of the @days array: $days[$time[6]]
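Perl also offers a couple of shortcuts for reaching the end of an array, sketched below. Single elements are written here with a $ in front because one element is a scalar value. $#days (the last index) and negative subscripts are standard perl, but they are not used elsewhere in this article.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @days = ( "Sunday",   "Monday",
             "Tuesday",  "Wednesday",
             "Thursday", "Friday",
             "Saturday" );

my $total_days = @days;     # number of elements
my $last_index = $#days;    # index of the last element
my $last_day   = $days[-1]; # negative subscripts count from the end

print "There are $total_days elements and the last index is $last_index.\n";
print "The last day of the week is $last_day.\n";
```

Both shortcuts save you from writing the "$total_days - 1" arithmetic by hand.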

Tip:
The function localtime exposes a number of items which are covered in the manual page. In short, you can use them like this:

($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);

To read up on localtime and many other functions, type: perldoc perlfunc
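As a rough sketch of how those named pieces might be used, the script below builds a date and a time string. Two details come from the localtime documentation rather than from this article: $year counts years since 1900 and $mon runs from 0 to 11. sprintf, which pads the numbers with zeros, is also a function not covered here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);

# $year is "years since 1900" and $mon starts at 0, so adjust both.
my $date  = sprintf( "%04d-%02d-%02d", $year + 1900, $mon + 1, $mday );
my $clock = sprintf( "%02d:%02d:%02d", $hour, $min, $sec );

print "It is $clock on $date.\n";
```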

Regular Expressions:

One of the coolest things in perl is regular expressions. Essentially, they are a very sophisticated way to search through a string and find a match. While regular expressions may look cryptic, it doesn't take long to start understanding them. Check this out:

#!/usr/bin/perl

$line = "Love is blindness, I don't want to see";

print "blind is in the phrase\n" if ( $line =~ /blind/ );
print "love isn't in the phrase\n" unless ( $line =~ /love/ );
print "ignoring case, love is in the phrase\n" if ( $line =~ /love/i );

You will want to run the above example to get exactly what is going on here. First we establish a variable with a phrase in it. Next we print "blind is in the phrase" only if $line contains the string "blind", which of course it does. But in the next line, "love" is not in the phrase because "love" and "Love" are different strings. Notice the use of "unless" which does the opposite of an "if" in this case. In the last line, we add case insensitivity to the match, and hence get a match.
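A match can also remember what it found. In the sketch below, the parentheses capture whatever matched inside them and perl stores it in the special variable $1. The pattern is an illustration of my own choosing: \w* means "any run of word characters", so the match grows from "blind" to the whole word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "Love is blindness, I don't want to see";
my $word = "";

# The parentheses capture the matched text into $1.
if ( $line =~ /(blind\w*)/ ) {
    $word = $1;
}

print "matched the whole word '$word'\n";
```

Run against this phrase, $word ends up holding "blindness".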

We can also selectively replace things in a variable:

#!/usr/bin/perl

$line = "Love is blindness, I don't want to see";

$line =~ s/want to/wanna/;
print "$line\n";

In this case, we are changing "want to" to "wanna". Preceding the regular expression with an "s" swaps the first instance of "want to" with what's on the other side of the replace, in this case, the word "wanna".

What would happen if we had another occurrence of the string "want to" in our sample? It wouldn't be replaced unless we added a "g" (for "global") to the end of the replace statement. For instance, the output of the following line:

$line =~ s/s/z/g;

which replaces all "s" characters with "z" characters, looks like this:

Love iz blindnezz, I don't want to zee
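To see the difference the "g" makes on a phrase that really does contain "want to" twice, consider this sketch. The phrase is made up for the purpose. It also relies on the fact that a substitution returns the number of replacements it made.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical phrase with two occurrences of "want to".
my $first_only = "I want to see, I want to know";
my $global     = $first_only;

$first_only =~ s/want to/wanna/;                # first occurrence only
my $count = ( $global =~ s/want to/wanna/g );   # every occurrence

print "$first_only\n";
print "$global\n";
print "the global version made $count replacements\n";
```

The first line of output still contains one "want to", while the second has none.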

How about a more practical example? Let's say that you have a whole pile of mp3 files in a directory and we wanted to get rid of the spaces in the names.

#!/usr/bin/perl

@files = qx {ls *.mp3};

foreach $original ( @files ) {
    chomp $original;
    $modified = $original;
    $modified =~ s/ /_/g;
    print "renaming '$original' to '$modified'\n";
    qx {mv '$original' '$modified'};
}

First we create an array of all the files ending in .mp3 with the line @files = qx {ls *.mp3}; qx executes everything between the { and } marks as a system command and sends each line of the results to @files. Then we do a foreach loop through all the elements in @files. Let's say that the first element of the array is "love is blindness.mp3\n". In the first iteration of the loop, the string $original is set to "love is blindness.mp3\n" and then a chomp operation is done on the string, killing the \n. Then $modified is set to the contents of $original and has all its spaces replaced with underscores. (s/ /_/g) Next we print what we are going to do and then execute a move command with the qx line. Because we are inside a foreach loop, the process will repeat until all the mp3 files are renamed with underscores for spaces. Voilà!
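One weakness of shelling out to mv is that a file name containing a single quote would confuse the shell. A possible alternative, sketched below, uses perl's built-in glob and rename functions instead of ls and mv. This is a swapped-in technique, not the script above: the helper name underscore_name is made up for illustration, and subroutines and "my" are explained later in this article.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the name with spaces
# replaced by underscores.
sub underscore_name {
    my ( $name ) = @_;
    $name =~ s/ /_/g;
    return $name;
}

# glob returns the matching file names directly, no "ls" needed.
foreach my $original ( glob("*.mp3") ) {
    my $modified = underscore_name( $original );
    next if $modified eq $original;    # no spaces, nothing to do
    print "renaming '$original' to '$modified'\n";
    rename( $original, $modified )
        or warn "could not rename '$original': $!\n";
}
```

Because nothing is passed through the shell, quotes and other odd characters in file names are handled as plain text.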

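What happens if we have a bunch of illegal characters such as ( and ) in an mp3 name and we want to convert those to underscores? With regular expressions come a whole slew of characters with special meanings. Here are a few:

\w    a "word" character (a letter, a digit or an underscore)
\W    any character that is not a word character
\t    a tab
\d    a digit
\D    any character that is not a digit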

So if we do this:

#!/usr/bin/perl

@files = qx {ls *.mp3};

foreach $original ( @files ) {
    chomp $original;
    $modified = $original;
    $modified =~ s/\.mp3$//;   # \. is a literal dot, $ means "at the end"
    $modified =~ s/\W/_/g;
    print "renaming '$original' to '$modified.mp3'\n";
    qx {mv '$original' '$modified.mp3'};
}

Note that we use \W to select all non-word characters and replace them with underscores. But the dot in ".mp3" is also considered a non-word character, so we lop the extension off in the line above and add it back when we do the print and qx commands. Obviously this little script could become as complicated as you want, adding numbers to filenames or deleting multiple underscore marks.
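As one example of taking this further, the sketch below wraps the clean-up in a subroutine and squeezes runs of underscores down to one. The name clean_name and the sample file names are made up for illustration, and this is only one of many ways to tidy a name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: cleans up one mp3 file name.
sub clean_name {
    my ( $name ) = @_;
    $name =~ s/\.mp3$//;     # set the extension aside
    $name =~ s/\W/_/g;       # non-word characters become underscores
    $name =~ s/_+/_/g;       # collapse runs of underscores into one
    $name =~ s/^_|_$//g;     # drop an underscore at either end
    return "$name.mp3";      # put the extension back
}

print clean_name("love is (blindness) - live.mp3"), "\n";
print clean_name("(live) take.mp3"), "\n";
```

The first name comes out as love_is_blindness_live.mp3 and the second as live_take.mp3.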

Subroutines:
As soon as you start doing things more than once in any computer language, it's usually a good idea to create a subroutine to take care of things. Let's say that you are reading numbers in and computing the square of each number. It would be handy to have a subroutine to take care of that, so consider the following example:

#!/usr/bin/perl

print "enter a list of numbers. (control-c to quit)\n";

while (<>) {
    chomp;
    $result = square ( $_ );
    print "the square of $_ is $result.\n";
}

sub square {
    my ( $number ) = @_;
    $number = $number * $number;
    return ( $number );
}

Whoa there! What happened to all the variable names? Well, in perl there is a default variable (referred to as $_) that is implied wherever there would normally be a variable. That is why chomp with nothing after it works: it chomps $_. Standard input from the keyboard is also implied when we do the while (<>) statement and no file names were given on the command line. In this case, what we are actually saying is something like this: while ( $_ = <STDIN> ) meaning that we are setting the default variable to whatever is typed in, one line at a time.

The next line sets the variable $result to be equal to the output of the subroutine square ( ). We are sending $_, or whatever the user typed in, to the subroutine square ( ) as a parameter. With luck, square ( ) will return the square of $_ so in the next line, we print out the result.

The subroutine is defined at the bottom. In fact it doesn't matter where you define a subroutine! Let's take a look at the first line in the subroutine: "my ( $number ) = @_;" The special array variable @_ holds the parameters that were sent to this subroutine. Because it is an array, we have to list our variables in array form, which is why we have parentheses around $number. But because we may have used the variable $number somewhere else in this script, we want to confine the scope of $number to just this subroutine so we don't clobber whatever value may already have been in there. (That's what the "my" does.) Next, we compute the square of $number and lastly we return the result. Now we have a routine that we can call over and over with different values that computes the square.
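The same pattern extends to more than one parameter, and it is easy to check that "my" really does protect variables outside the subroutine. The subroutine power below is a made-up example, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $number = 7;    # a variable outside the subroutine

# Hypothetical example: raises $number to the power $exponent
# by repeated multiplication.
sub power {
    my ( $number, $exponent ) = @_;   # private copies of the parameters
    my $result = 1;
    while ( $exponent > 0 ) {
        $result = $result * $number;
        $exponent--;
    }
    return $result;
}

my $cube = power( 2, 3 );
print "2 cubed is $cube; the outer \$number is still $number\n";
```

Even though the subroutine has its own $number, the one defined at the top of the script is left untouched.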

CGI Applications

One of the coolest uses of perl is CGI Applications on websites. Typically HTML is static. You write up the HTML file, you put it on your web server, and people download it and look at it. Well, wouldn't it be nice if that little html file had some dynamic element to it, such as "today's date"? Well, have no fear, we're going to do just that!

Because this is MaximumLinux and just about every Linux machine in existence has the apache webserver on it, I'm going to make the assumption that you have apache set up and executable files ending in ".cgi" are run as CGI programs. Consider the following code. (although, this time, save it as "sample.cgi" and place it somewhere that you can get to it via the web.)

#!/usr/bin/perl

print "Content-type: text/html\n\n";

$time = localtime ( time );
print "Today: $time\n";

In this script, we print a legal CGI header (Content-type: text/html\n\n) and then call localtime ( time ) in a scalar context. (That is to say, we assign its result to a single scalar variable instead of an array, so it hands back one ready-made date string rather than a list of numbers.) Next we simply print the string "Today: " and then whatever localtime gave us. Some sample output of this program might look like this:

Content-type: text/html

Today: Fri Apr 28 02:26:37 2000

Notice the \n\n after the Content-type line. This is very important, as the blank line it produces signifies the end of the HTTP header and the beginning of the content. When the web browser renders this, it won't render Content-type but rather just Today: Fri Apr 28 02:26:37 2000.

Let's make a completely server generated page that parses an html file and inserts the date wherever it sees ##date. If we have some html in a file called sample.inc that looks like this:

<html>
<title>Welcome: ##date</title>
<body>
  Today's date is ##date.
</body>
</html>

and we have the following script named sample.cgi:

#!/usr/bin/perl

print "Content-type: text/html\n\n";

$date = localtime ( time );

open (FILE, "sample.inc") or die "Can't open sample.inc: $!\n";
while ( <FILE> )  {
  s/##date/$date/g;
  print;
}
close ( FILE );

Now when we hit sample.cgi in a web browser, we should get the contents of the include file (sample.inc) printed out with every instance of ##date replaced with today's date. Of course you don't want to do this on a heavily hit webserver because parsing every single line of html takes time, but it's still a great example.
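If you would like to try the substitution without setting up a web server, the following sketch does the same replacement on a couple of lines held in an array. The subroutine name fill_date and the sample lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns $text with every ##date replaced by $date.
sub fill_date {
    my ( $text, $date ) = @_;
    $text =~ s/##date/$date/g;
    return $text;
}

my $date = localtime( time );    # scalar context gives one date string

# Sample lines standing in for the contents of sample.inc.
my @template = ( "<title>Welcome: ##date</title>\n",
                 "  Today's date is ##date.\n" );

foreach my $line ( @template ) {
    print fill_date( $line, $date );
}
```

Run it from the command line and both lines should come back with the current date filled in.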

So that's cool, but what about forms? Now that's where the real fun comes in! Consider the following code with the subroutine "parse". When a form is submitted with the "post" method, the form data shows up on the CGI application's standard input (STDIN), as if it had been typed at the keyboard. This little routine will grab the data from STDIN and set it up in a nice handy format.

#!/usr/bin/perl

parse ( );

print "Content-type: text/html\n\n";
print "<html><body>\n";
print "Your name is $cgi{'name'}\n";
print "and your email is $cgi{'email'}\n";
print "</body></html>\n";

sub parse  {
  read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
  @pairs = split(/&/, $buffer);
  foreach $pair (@pairs)
  {
    ($name, $value) = split(/=/, $pair);
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    $cgi{$name} = $value;
  }
}

And we'll access this cgi program from a form that asks for some input:

<html>
<body>
<form method="post" action="sample.cgi">
Name: <input type="text" name="name"><br>
Email: <input type="text" name="email"><br>
<input type="submit" value="Go!">
</form>
</body>
</html>

Now when the user hits the html page and enters some data, we will have access to it in the hash called %cgi. So to access what the person typed in for the name field, we reference it as $cgi{'name'}. The inner workings of the parse subroutine are a little beyond this perl primer, but at least it gives you something with which to start hacking around.
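For the curious, here is a sketch of the same decoding steps applied to a string instead of STDIN, so you can experiment from the command line. The subroutine name parse_pairs and the sample form data (including the email address) are invented for illustration. A browser sends spaces as "+" and other special characters as "%" followed by two hex digits, which is what the two substitution lines undo.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turns "a=1&b=2" style form data into a hash.
sub parse_pairs {
    my ( $buffer ) = @_;
    my %form;
    foreach my $pair ( split( /&/, $buffer ) ) {
        my ( $name, $value ) = split( /=/, $pair, 2 );
        $value = "" unless defined $value;    # field sent with no value
        $value =~ tr/+/ /;                    # "+" stands for a space
        # "%40" and friends: convert the two hex digits back to a character
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $form{$name} = $value;
    }
    return %form;
}

# Made-up form data, as a browser would encode it.
my %cgi = parse_pairs( 'name=Anders+Brownworth&email=anders%40example.com' );

print "name:  $cgi{'name'}\n";
print "email: $cgi{'email'}\n";
```

Here the "+" turns back into a space and "%40" turns back into "@".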

Now that you have a good head start into the world of perl, you will probably want to get a book and start looking at some of the online resources that perl has to offer. You really can do just about anything with perl. It has been extended to deal with so many things that it would be hard to cover them all in one sitting! You can get perl modules to do everything from reading the id3 tag on an mp3 file to resizing an image or connecting to a database. The horizon is limitless, so don't be afraid to start hacking! Aside from its incredible flexibility, perl is also quite a valuable tool to have on a resume if you are so inclined. Take it from me, time with perl is never wasted!

Other Information
Without a question, the best book that I have ever had the pleasure of reading has been O'Reilly and Associates' "Programming Perl". Not only is it a well written and entertaining book, but it's littered with practical examples of not only how you might use something, but why you would use it. All of this is done by example, which as you probably know, is the best way to teach.

"Programming Perl" is a hard core perl programming book. A softer introduction is O'Reilly's "Learning Perl" which is actually the first real book on perl that I ever read. Between the two of them you should have all the perl book you'll ever need.

CPAN, or the Comprehensive Perl Archive Network, is an invaluable source for perl information as well. It keeps the official archive of perl modules and is a great clearing house for everything perl. You'll probably want to spend some time looking through the scripts and documentation here. http://www.cpan.org/

http://www.perl.org/ is also a must-see perl site. Aside from the usual, they will have very insightful articles about practical uses of perl today. They will cover everything from the latest XML hacks to using a perl script to virtually eliminate spam!