     regexp(n)                    Tcl ( )                    regexp(n)



     _________________________________________________________________

     NAME
          regexp - Match a regular expression against a string

     SYNOPSIS
          regexp  ?switches?  exp   string   ?matchVar?   ?subMatchVar
          subMatchVar ...?
     _________________________________________________________________


     DESCRIPTION
          Determines whether the regular expression exp  matches  part
          or all of string and returns 1 if it does, 0 if it doesn't.

          If additional arguments are specified after string then they
          are  treated  as  the  names of variables in which to return
          information about  which  part(s)  of  string  matched  exp.
          MatchVar will be set to the range of string that matched all
          of exp.  The first subMatchVar will contain  the  characters
          in   string   that   matched   the   leftmost  parenthesized
          subexpression within exp, the next subMatchVar will  contain
          the   characters   that   matched   the  next  parenthesized
          subexpression to the right in exp, and so on.
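
          As an illustration of matchVar and the subMatchVars, the
          sketch below parses a made-up input line.  The input text
          and the variable names (line, all, w, h, found) are
          assumptions chosen for the example, not part of regexp.

```tcl
# Hypothetical input; any string of this shape would do.
set line "width=640 height=480"

# "all" plays the role of matchVar and receives the whole match.
# "w" and "h" are subMatchVars for the first and second
# parenthesized subexpressions, counted from the left.
set found [regexp {width=([0-9]+) height=([0-9]+)} $line all w h]

puts $found   ;# 1
puts $all     ;# width=640 height=480
puts $w       ;# 640
puts $h       ;# 480
```

          The pattern is enclosed in braces so that the Tcl parser
          does not treat [0-9] as a command substitution.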

          If the initial arguments to regexp start with  -  then  they
          are   treated  as  switches.   The  following  switches  are
          currently supported:

          -nocase
                    Causes  upper-case  characters  in  string  to  be
                    treated as lower case during the matching process.

          -indices
                    Changes what is stored in matchVar and the
                    subMatchVars.  Instead of storing the matching
                    characters from string, each variable will contain
                    a list of two decimal strings giving the indices
                    in string of the first and last characters in the
                    matching range of characters.

          --
                    Marks the end of switches.  The argument following
                    this one will be treated as exp even if it  starts
                    with a -.
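
          The three switches might be exercised as in the following
          sketch.  The strings and the variable names (r1, r2, range)
          are made up for the example.

```tcl
# -nocase: upper-case characters in the string are treated as
# lower case, so the lower-case pattern matches HELLO.
set r1 [regexp -nocase hello "Say HELLO"]
puts $r1      ;# 1

# -indices: the variable receives the indices of the first and
# last matching characters instead of the matching text.
regexp -indices {b+} aabbbcc range
puts $range   ;# 2 4

# --: needed here because the pattern itself starts with a dash.
set r2 [regexp -- -v "tclsh -v"]
puts $r2      ;# 1
```

          Indices count from zero, so the run bbb in aabbbcc spans
          positions 2 through 4.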

          If there are more subMatchVars than parenthesized
          subexpressions within exp, or if a particular subexpression
          in exp doesn't match the string (e.g. because it was in a
          portion of the expression that wasn't matched), then the
          corresponding subMatchVar will be set to ``-1 -1'' if
          -indices has been specified or to an empty string otherwise.
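
          A short sketch of both cases follows.  The pattern and the
          variable names (m, s1, s2, i1, i2, extra) are chosen only
          for illustration.

```tcl
# The subexpression (b) lies in a branch that takes no part in
# the match, so its variable is set to an empty string ...
regexp {(a)|(b)} a m s1 s2
puts "s1=<$s1> s2=<$s2>"   ;# s1=<a> s2=<>

# ... or to "-1 -1" when -indices is given.
regexp -indices {(a)|(b)} a m i1 i2
puts "i1=<$i1> i2=<$i2>"   ;# i1=<0 0> i2=<-1 -1>

# More subMatchVars than subexpressions: "extra" has no
# subexpression to report on and is set to an empty string.
regexp {(x)} x m s1 extra
puts "extra=<$extra>"      ;# extra=<>
```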



     Page 1                                         (printed 11/11/93)


     REGULAR EXPRESSIONS
          Regular expressions are implemented  using  Henry  Spencer's
          package  (thanks,  Henry!),  and  much of the description of
          regular expressions below is copied verbatim from his manual
          entry.

          A regular expression is zero or more branches, separated  by
          ``|''.    It  matches  anything  that  matches  one  of  the
          branches.

          A branch is zero or more pieces, concatenated.  It matches a
          match  for  the  first,  followed by a match for the second,
          etc.
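
          The following sketch shows branches and concatenation at
          work.  The strings and the variable names (animal, meal)
          are invented for the example.

```tcl
# Two branches separated by |: either one may match.
regexp {cat|dog} "hot dog" animal
puts $animal   ;# dog

# Each branch is a whole concatenation of pieces, so the
# branches here are "dog" and "cat food", not "dog|cat"
# followed by " food".
regexp {dog|cat food} "cat food" meal
puts $meal     ;# cat food
```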

          A piece is an atom possibly followed  by  ``*'',  ``+'',  or
          ``?''.  An atom followed by ``*'' matches a sequence of 0 or
          more matches of the atom.  An atom followed by ``+'' matches
          a  sequence  of  1  or  more  matches  of the atom.  An atom
          followed by ``?'' matches a match of the atom, or  the  null
          string.
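
          The difference between the three suffixes can be seen in a
          few one-line tests.  The variable names p1 through p5 are
          illustrative only.

```tcl
# b* allows zero b's, so "ac" matches.
set p1 [regexp {ab*c} ac]     ;# 1

# b+ requires at least one b.
set p2 [regexp {ab+c} ac]     ;# 0
set p3 [regexp {ab+c} abbc]   ;# 1

# b? allows zero or one b, but not two.
set p4 [regexp {ab?c} abc]    ;# 1
set p5 [regexp {ab?c} abbc]   ;# 0

puts "$p1 $p2 $p3 $p4 $p5"    ;# 1 0 1 1 0
```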

          An atom is a regular expression in parentheses  (matching  a
          match  for  the  regular  expression),  a range (see below),
          ``.''  (matching any single character), ``^'' (matching  the
          null  string  at  the  beginning of the input string), ``$''
          (matching the null string at the end of the input string), a
          ``\''   followed   by  a  single  character  (matching  that
          character), or a single character with no other significance
          (matching that character).
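
          Each kind of atom is shown in the sketch below.  The
          variable names (a1 through a5, m) are assumptions made for
          the example.

```tcl
# ^ and $ anchor the match to the ends of the input string.
set a1 [regexp {^abc$} abc]    ;# 1
set a2 [regexp {^abc$} xabc]   ;# 0

# . matches any single character; \. matches only a period.
set a3 [regexp {a.c}  a-c]     ;# 1
set a4 [regexp {a\.c} a-c]     ;# 0
set a5 [regexp {a\.c} a.c]     ;# 1

# A parenthesized regular expression is itself an atom, so a
# suffix such as + applies to the whole group.
regexp {(ab)+} ababab m
puts $m                        ;# ababab
```

          The braces around each pattern keep the Tcl parser from
          processing the backslash before regexp sees it.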

          A range is a sequence of characters enclosed in ``[]''.   It
          normally matches any single character from the sequence.  If
          the sequence  begins  with  ``^'',  it  matches  any  single
          character  not  from  the  rest  of  the  sequence.   If two
          characters in the sequence are separated by ``-'',  this  is
          shorthand for the full list of ASCII characters between them
          (e.g. ``[0-9]'' matches any decimal digit).   To  include  a
          literal  ``]''  in the sequence, make it the first character
          (following a possible ``^'').  To include a  literal  ``-'',
          make it the first or last character.
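
          The range rules above can be tried out as follows.  The
          input strings and the variable names (digits, other, rb,
          dash) are made up for the example.

```tcl
# A range with "-" shorthand: one or more decimal digits.
regexp {[0-9]+} "room 42" digits
puts "<$digits>"   ;# <42>

# A leading ^ inverts the range: one or more non-digits.
regexp {[^0-9]+} "room 42" other
puts "<$other>"    ;# <room >

# A literal ] must come first in the sequence.
regexp {[]x]} "a]b" rb
puts "<$rb>"       ;# <]>

# A literal - must come first or last in the sequence.
regexp {[a-]+} "x-a-y" dash
puts "<$dash>"     ;# <-a->
```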


     CHOOSING AMONG ALTERNATIVE MATCHES
          In general there may be more than one way to match a regular
          expression  to  an  input string.  For example, consider the
          command

               regexp  (a*)b*  aabaaabb  x  y

          Considering only the rules given so far, x and y  could  end
          up  with  the values aabb and aa, aaab and aaa, ab and a, or
          any  of  several  other  combinations.   To   resolve   this
          potential  ambiguity regexp chooses among alternatives using



     Page 2                                         (printed 11/11/93)






     regexp(n)                    Tcl ( )                    regexp(n)



          the  rule  ``first  then  longest''.   In  other  words,  it
          considers the possible matches in order working from left to
          right across the  input  string  and  the  pattern,  and  it
          attempts  to  match longer pieces of the input string before
          shorter ones.  More specifically, the following rules  apply
          in decreasing order of priority:

          [1]  If a regular expression could match two different parts
               of  an  input  string  then  it will match the one that
               begins earliest.

          [2]  If a regular expression contains | operators  then  the
               leftmost matching sub-expression is chosen.

          [3]  In *, +, and ? constructs, longer matches are chosen in
               preference to shorter ones.

          [4]  In sequences of expression  components  the  components
               are considered from left to right.

          In the example from above, (a*)b*  matches  aab:   the  (a*)
          portion  of the pattern is matched first and it consumes the
          leading aa; then the b* portion of the pattern consumes  the
          next b.  Or, consider the following example:

               regexp  (ab|a)(b*)c  abc  x  y  z

          After this command x will be abc, y will be ab, and  z  will
          be an empty string.  Rule 4 specifies that (ab|a) gets first
          shot at the input string and Rule 2 specifies  that  the  ab
          sub-expression is checked before the a sub-expression.  Thus
          the b has already been claimed before the (b*) component  is
          checked and (b*) must match an empty string.
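
          The two commands discussed in this section can be run
          directly to confirm the results.  The last command is an
          extra case, not taken from the text above, and the variable
          name m is an assumption; it shows Rule 1 taking priority
          over Rule 3.

```tcl
# First example: the match begins as early as possible, (a*)
# takes the leading aa, and b* takes the b that follows.
regexp {(a*)b*} aabaaabb x y
puts "x=<$x> y=<$y>"          ;# x=<aab> y=<aa>

# Second example: (ab|a) claims ab, leaving nothing for (b*).
regexp {(ab|a)(b*)c} abc x y z
puts "x=<$x> y=<$y> z=<$z>"   ;# x=<abc> y=<ab> z=<>

# Rule 1 outranks Rule 3: the match that begins earliest is
# chosen even though a longer run of a's appears later.
regexp {a+} "a aaa" m
puts "m=<$m>"                 ;# m=<a>
```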


     KEYWORDS
          match, regular expression, string