Perl/Addendum


Latest revision as of 13:06, 24 March 2008

This article is developing and not approved.
 
This addendum is a continuation of the article Perl.

Enhancements for readability

healing slasheritis

In standard Unix tools such as sed, a regular expression is enclosed in a pair of slashes, e.g. '/pattern/'. A non-printing character is written with the backslash ("escape") character '\'; for example, '/\n/' matches a newline character. Some printing characters also need to be escaped, among them the delimiter '/' itself: to match a literal '/', the pattern is written as '/\//'. This comes up often, for example in file (path) names.

It gets confusing quickly if, say, '\/' is to be replaced by its duplicate '\/\/'. Both the backslash and the slash need to be escaped: '/\\\//' is the pattern that matches the string '\/'. The "substitute" construct '$g =~ s/a/b/' (substitute 'a' by 'b') then explodes into so-called slasheritis: '$g =~ s/\\\//\\\/\\\//'. Such regular expression patterns quickly become unreadable.

Perl's solution is to let the programmer choose the pattern delimiters on the fly. Perl knows that a pattern follows the 's' of a substitution (or the 'm' of a match), so the character right after it can serve as the delimiter. The slasheritis above can then be written as '$g =~ s#\\/#\\/\\/#' (the backslash still needs to be escaped), and everything is (somewhat) clearer again. It is customary to use non-alphanumeric characters such as '!', '#' or '|' as delimiters. Perl also accepts paired characters such as '<>' or '{}', and some well-known Perl authors prefer this style because it is even clearer: '$g =~ s{\\/} {\\/\\/}'.
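The equivalence of the three spellings discussed above can be checked with a small sketch. The sample string and the variable names are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical input: a string containing the two characters \ and /
my $input = 'a\/b';

# Three spellings of the same substitution: replace \/ by \/\/
(my $slashes = $input) =~ s/\\\//\\\/\\\//;   # slash delimiters: slasheritis
(my $hashes  = $input) =~ s#\\/#\\/\\/#;      # hash sign as delimiter
(my $braces  = $input) =~ s{\\/}{\\/\\/};     # paired delimiters

# All three lines print a\/\/b
print "$slashes\n$hashes\n$braces\n";
```

With any delimiter other than '/', the slash no longer needs escaping; only the backslash does.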

special symbols

Perl introduced a whole flock of shortcuts for classes of characters, usually paired with an upper-case complement. For example, '/\s/' stands for all whitespace characters (blank, tab, newline, and a few special ones), and '/\S/' (capital 'S') stands for all non-whitespace characters. Similarly, '/\w/' stands for "word" characters (letters, digits and the underscore) and '/\W/' for non-word characters; '/\d/' stands for digits and '/\D/' for non-digits, etc. The whole list can be found in the "Camel" book [1].
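As a minimal sketch of these shortcuts in use, with '\s' for whitespace, '\w' for word characters and '\d' for digits; the sample text and variable names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample text
my $text = "Perl 5 was released in 1994";

my @numbers = $text =~ /\d+/g;     # runs of digits: "5" and "1994"
my @words   = $text =~ /\w+/g;     # runs of word characters
my @fields  = split /\s+/, $text;  # split on runs of whitespace

# The upper-case complement \D matches everything that is not a digit
(my $digits_only = $text) =~ s/\D+//g;

print "numbers: @numbers\n";
print "word count: ", scalar(@words), "\n";
print "digits only: $digits_only\n";
```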

inline comments

Since version 5.002, a regular expression can be written with inline comments if the closing delimiter is followed by the 'x' modifier: whitespace in the pattern is then ignored, and a '#' starts a comment that runs to the end of the line. Here is a short program to eliminate comments from HTML code (by Perl author Tom Christiansen, with his original comments):

#!/usr/bin/perl -p0777
#
# htdecom -- remove html comments from a document
# tchrist@perl.com
# 
# taken from the larger striphtml program

require 5.002;

s{ <!                  # comments begin with a `<!'
                       # followed by 0 or more comments;

   (.*?)               # this is actually to eat up comments in non 
                       # random places

    (                  # not suppose to have any white space here

                       # just a quick start; 
     --                # each comment starts with a `--'
       .*?             # and includes all text up to and including
     --                # the *next* occurrence of `--'
       \s*             # and may have trailing while space
                       #   (albeit not leading white space XXX)
    )+                 # repetire ad libitum  XXX should be * not +
   (.*?)               # trailing non comment text
  >                    # up to a `>'
}{
   if ($1 || $3) {     # this silliness for embedded comments in tags
       "<!$1 $3>";
 } 
}gesx;                 # mutate into nada, nothing, and niente
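The same commenting technique works on a much smaller scale. The following sketch, which is not part of the program above, uses the 'x' modifier to annotate a simple date pattern; the input line and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical input line
my $line = "latest revision: 2008-03-24";

# With the 'x' modifier, whitespace in the pattern is ignored
# and everything after a '#' is a comment
my ($year, $month, $day) = $line =~ m{
    (\d{4})    # year, four digits
    -          # literal separator
    (\d{2})    # month, two digits
    -          # literal separator
    (\d{2})    # day, two digits
}x;

print "year=$year month=$month day=$day\n";
```

When braces are used as delimiters, any braces inside the pattern, including those in comments, have to be balanced.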

Literature

  • [1] Larry Wall, Tom Christiansen, Jon Orwant: Programming Perl (the Camel Book). O'Reilly Media, Inc.; 3rd edition (July 14, 2000). ISBN 0596000278. The standard reference.
  • [2] Jeffrey E. F. Friedl: Mastering Regular Expressions (the Owl Book). O'Reilly Media, Inc.; 3rd edition (August 8, 2006). ISBN 0596528124. All you ever need to know about regular expressions; not Perl-specific.