I've written a q extension that uses the pcre library for regular-expression pattern matching and text replacement. "pcre" stands for "Perl-Compatible Regular Expressions"; see http://www.pcre.org for more information.

You can get the installation bundle here. See below for installation instructions.

This extension provides the following functions:

pmatch: perform regular expression matching; returns info for the first match found
pmatchall: perform regular expression matching; returns info for all matches found
psubst: perform text substitution based on regular expressions

pmatch[subject; pattern; options]

Perform regular expression matching and return info for the first match found

arguments:

subject:

the text to match against

pattern:

a pcre-compatible regular expression

options:

zero or more of the following pcre match options in the form of a single string or symbol:

i: ignore case when matching (PCRE_CASELESS)
m: match multiple lines (PCRE_MULTILINE)
s: dot also matches newlines (PCRE_DOTALL)
x: use extended syntax (PCRE_EXTENDED)

for more complete explanations of these options, see the pcre man pages: http://www.pcre.org/pcre.txt

the three arguments can be either chars, char lists, or symbols, in any combination
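
As a rough illustration of how the option letters combine, the sketch below passes them first as a symbol and then as a string. The subject text and the variable names (txt, r1, r2, r3) are made up for this example, and it assumes that pcre.q has already been loaded.

```q
/ assumes pcre.q has been loaded, so that pmatch is defined
txt:"first line\nsecond line"       / hypothetical two-line subject
r1:pmatch[txt; "^second"; `]        / no options: ^ anchors only at the start of the subject
r2:pmatch[txt; "^second"; `m]       / m: ^ may also match just after a newline
r3:pmatch[txt; "^SECOND"; "im"]     / several option letters combined in one string
(r1 0; r2 0; r3 0)                  / element 0 of each result is the success flag
```

Under standard PCRE semantics the first call should fail and the other two should succeed.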

The pmatch function returns a four-element list:

element 0:

1b if the match succeeded, 0b otherwise

element 1:

a more meaningful return code

this is either a positive integer containing the number of expressions and subexpressions that were matched, a negative integer which is a standard pcre error code (see http://www.pcre.org/pcre.txt), or one of the following:

-100: one or more arguments were passed incorrectly
-101: the regular expression could not be parsed

element 2:

a list of the matching subexpressions, or an empty list if the match failed. It will be a list of symbols if the match subject is a symbol; otherwise, it will be a list of char lists

element 3:

a list of pairs of integers that represent the start and end position of every matching subexpression in the subject string, or an empty list if the match failed. The end position is the index of the character that follows the end of the match, so you can subtract the start position from the end position to get the length of the matched string.
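
The return code in element 1 can be turned into a readable message with a small helper. The sketch below is only one way to do it: the name describe is hypothetical and not part of the extension, and it treats -1 as pcre's standard "no match" code (PCRE_ERROR_NOMATCH).

```q
/ hypothetical helper: describe the return code found in element 1 of a pmatch result
describe:{[rc] $[rc>0;"matched";rc = -1;"no match";rc = -100;"arguments passed incorrectly";rc = -101;"pattern could not be parsed";"other pcre error code"]}

describe 3       / a positive code means the match succeeded
describe -1      / pcre's standard no-match code
describe -101    / the extension's own code for an unparsable pattern
```

With pcre.q loaded, the helper can be applied directly to a result, as in describe pmatch[subject; pattern; options] 1.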

examples:

q)s:"Now is the time for all good men to come to the aid of their hippo"
q)pmatch[s; "^now\\s+is"; `] / this will fail
0b
-1
()
()
q)pmatch[s; "^now\\s+is"; `i] / this will succeed
1b
1
,"Now is"
,(0;6)
q)pmatch[s; `$"(Good|Bad).*(hippo|axolotl)"; `] / will fail
0b
-1
()
()
q)pmatch[s; `$"(Good|Bad).*(hippo|axolotl)"; "i"] / will succeed (char lists)
1b
3
("good men to come to the aid of their hippo";"good";"hippo")
((24;66);(24;28);(61;66))
q)xs:`$s
q)pmatch[xs; `$"(Good|Bad).*(hippo|axolotl)"; "i"] / will succeed (symbol list)
1b
3
(`good men to come to the aid of their hippo`good`hippo)
((24;66);(24;28);(61;66))

Note that in the second-to-last example, the matched subexpressions are returned as a list of char lists, because the match subject is a char list. In the final example, the matched subexpressions are returned as a list of symbols, because the match subject is a symbol.
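
Because each end position is exclusive, the pairs in element 3 can be used to compute lengths and to cut the matched text back out of the subject. The sketch below assumes pcre.q is loaded; slice, lens, and parts are hypothetical names chosen for this example.

```q
/ assumes pcre.q has been loaded, so that pmatch is defined
s:"Now is the time for all good men to come to the aid of their hippo"
r:pmatch[s; "(Good|Bad).*(hippo|axolotl)"; "i"]
pos:r 3                                  / (start;end) pairs; end is exclusive
lens:(pos[;1])-pos[;0]                   / length of each match: end minus start
slice:{[txt;p] txt p[0]+til p[1]-p[0]}   / hypothetical helper: cut one (start;end) span out of txt
parts:slice[s] each pos                  / should reproduce element 2 of the result
```

Given the positions shown in the example above, lens should come out as 42 4 5, and parts should hold the same three strings as element 2.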

pmatchall[subject; pattern; options]

Perform regular expression matching and return info for all matches found

Note that this routine returns a more complex data structure than the one returned by pmatch. Currently, its principal use is as a helper function for psubst (below).

arguments:

subject:

the text to match against

pattern:

a pcre-compatible regular expression

options:

zero or more of the following pcre match options in the form of a single string or symbol:

i: ignore case when matching (PCRE_CASELESS)
m: match multiple lines (PCRE_MULTILINE)
s: dot also matches newlines (PCRE_DOTALL)
x: use extended syntax (PCRE_EXTENDED)

for more complete explanations of these options, see the pcre man pages: http://www.pcre.org/pcre.txt

the three arguments can be either chars, char lists, or symbols, in any combination

The pmatchall function returns a three-element list:

element 0:

1b if the match succeeded, 0b otherwise

element 1:

the number of matches found in the text, or a negative return code if the match failed (for example, -1 when nothing matches)

element 2:

a list of lists of data that represents the matching expressions, or an empty list if the match failed. Each element of this list consists of a pair of lists (each referred to as an "item", below). There is one element for each match that is found in the text.

item 0: a list of the matching subexpressions for the nth match. It will be a list of symbols if the match subject is a symbol; otherwise, it will be a list of char lists. There is one entry for each matching subexpression.
item 1: a list of pairs of integers that represent the start and end position of every matching subexpression in the nth match. The end position is the index of the character that follows the end of the match, so you can subtract the start position from the end position to get the length of the matched string. There is one entry for each matching subexpression.

examples:

q)s:"Now is the time for all good men to come to the aid of their hippo"
q)pmatchall[s; "T(O)+"; `]  / this will fail
0b
-1
()
q)pmatchall[s; "T(O)+"; `i] / this will succeed
1b
2
((("to";,"o");((33;35);(34;35)));(("to";,"o");((41;43);(42;43))))
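
The nested result can be picked apart with ordinary q indexing. The sketch below assumes pcre.q is loaded and that element 2 has the shape shown in the successful example above; the variable names are only illustrative.

```q
/ assumes pcre.q has been loaded, so that pmatchall is defined
s:"Now is the time for all good men to come to the aid of their hippo"
r:pmatchall[s; "T(O)+"; `i]
m:r 2                            / one element per match found
whole:first each first each m    / text of each overall match
spans:{x[1;0]} each m            / (start;end) pair of each overall match
starts:first each spans          / start offset of each overall match
```

For the result shown above, whole should be ("to";"to") and starts should be 33 41.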

psubst[subject; pattern; options; replacement; count]

Perform text substitution based on regular expressions

arguments:

subject:

the text upon which the substitution will be performed

pattern:

a pcre-compatible regular expression

options:

zero or more of the following pcre match options in the form of a single string or symbol:

i: ignore case when matching (PCRE_CASELESS)
m: match multiple lines (PCRE_MULTILINE)
s: dot also matches newlines (PCRE_DOTALL)
x: use extended syntax (PCRE_EXTENDED)

for more complete explanations of these options, see the pcre man pages: http://www.pcre.org/pcre.txt

replacement:

the replacement text; this can contain references to the matched patterns and subpatterns as follows:

\0: the string that matches the entire pattern
\n: the string that matches the nth subexpression, where n is a number that ranges from 1 to the subexpression count

note that you need to double the backslash if this replacement text is a string; for example: "abc\\1fff\\2xyz"

count:

the number of substitutions to perform; if zero, perform the substitution on every match of the pattern within the subject string

the four initial arguments can be either chars, char lists, or symbols, in any combination

The psubst function returns the substituted text. This will be contained in a symbol if subject is a symbol; otherwise, it will be contained within a char list.

examples:

q)s:"Now is the time for all good men to come to the aid of their hippo"
q)psubst[s; "hippo"; `; "potamus"; 0]        / succeeds
"Now is the time for all good men to come to the aid of their potamus"
q)psubst[s; "HiPpO"; `; "potamus"; 0]        / fails
"Now is the time for all good men to come to the aid of their hippo"
q)psubst[s; "HiPpO"; `i; "potamus"; 0]       / succeeds
"Now is the time for all good men to come to the aid of their potamus"
q)t:"Doo-wah diddy, diddy-dum, diddy-doo"
q)psubst[t; "diddy"; `; "diggity"; 1]        / replaces only the 1st instance
"Doo-wah diggity, diddy-dum, diddy-doo"
q)psubst[t; "diddy"; `; "diggity"; 0]        / replaces all instances
"Doo-wah diggity, diggity-dum, diggity-doo"
q)psubst[t; "d(.[^d])"; `i; "sh\\1"; 0]      / subexpression replacement
"shoo-wah dishdy, dishdy-shum, dishdy-shoo"
q)tt:`$t
q)psubst[tt; "d(.[^d])"; `i; `$"sh\\1"; 0]   / works with symbols, too
`shoo-wah dishdy, dishdy-shum, dishdy-shoo
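
Two further sketches, assuming pcre.q is loaded, show \0 together with a count other than 0 or 1, and more than one subexpression reference in the same replacement. The "hello world" subject is made up for the example.

```q
/ assumes pcre.q has been loaded, so that psubst is defined
t:"Doo-wah diddy, diddy-dum, diddy-doo"
psubst[t; "diddy"; `; "[\\0]"; 2]                         / \0 wraps the whole match; count 2 stops after two substitutions
psubst["hello world"; "(\\w+) (\\w+)"; `; "\\2 \\1"; 0]   / \1 and \2 swap the two captured words
```

If psubst behaves as documented above, the first call should bracket only the first two occurrences of "diddy", and the second should return "world hello".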

installation:

Unzip the installation bundle and cd into the newly created "pcre" directory. Edit the Makefile, changing the macros near the top of the file to match your system configuration, then type "make" followed by "make install".

This will only work if pcre is installed on your system. If it isn't, you can find the pcre installation bundle and lots of documentation here: http://www.pcre.org.

Finally, within your q session, invoke the following command to load these functions into your workspace:

\l pcre.q

software licensing:

Copyright (C) 2007, 2008, Lloyd Zusman <q.o@potam.us>

This program is free software; you can redistribute
it and/or modify it under the terms of Version 2 of
the GNU General Public License as published by the
Free Software Foundation.

This program is distributed in the hope that it will
be useful, but WITHOUT ANY WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.  See the GNU General Public License
for more details: http://q.o.potam.us/license or
http://www.gnu.org/copyleft/gpl.html

You should have received a copy of the GNU General
Public License along with this program; if not, write to
the Free Software Foundation, Inc., 51 Franklin Street,
Fifth Floor, Boston, MA  02110-1301, USA.