- A RegularExpression is a way of describing search patterns. The letters A-Z and the numbers 0-9 match themselves (case sensitively) and "." matches any character. "*" matches the previous character zero or more times. "\" prevents the next character from having special meaning. So
a*b..e
(any number of a's, followed by a b, two characters and an e)
- will match, for example, "aaabxye" and "bxxe"
- but not "abe" or "aaaxxe"
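As a quick check of the pattern described above, here is a minimal sketch using Python's `re` module. Python is a stand-in for grep or Perl here, and the sample strings are made up for illustration.

```python
import re

# Pattern from the text: any number of a's, a b, two characters, then an e.
pattern = re.compile(r"a*b..e")

# Hypothetical sample strings, chosen only for illustration.
samples = ["aabcde", "b12e", "abe", "axxe"]

# re.search looks for the pattern anywhere in the string, much like grep does.
results = {s: pattern.search(s) is not None for s in samples}

for s, matched in results.items():
    print(s, "matches" if matched else "does not match")
```

The first two strings match; "abe" fails because only one character sits between the b and the end, and "axxe" fails because it has no b at all.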
| . | Any single character
| ^ | Beginning of line
| $ | End of line
| \character | Match that character literally (even if it's a special character)
| [character group] | Any single character in the group
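The table entries above can be tried out with a short sketch in Python's `re` module (used here as a stand-in; the sample lines are hypothetical).

```python
import re

# Hypothetical sample lines for illustration.
lines = ["foo bar", "bar foo", "a.b", "axb"]

def matching(pattern):
    """Return the lines in which the pattern is found anywhere."""
    return [line for line in lines if re.search(pattern, line)]

# ^ anchors the match at the beginning of the line, $ at the end.
starts_with_foo = matching(r"^foo")
ends_with_foo = matching(r"foo$")

# An unescaped dot matches any character; an escaped dot matches only ".".
any_char = matching(r"a.b")
literal_dot = matching(r"a\.b")

# A character group matches exactly one of the listed characters.
starts_with_f_or_b = matching(r"^[fb]")
```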
Things that alter the previous expression
| ? | Match the previous expression exactly zero or one times
| * | Match the previous expression zero or more times
| + | Match the previous expression one or more times
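The three repetition operators above can be compared side by side. This is a small sketch in Python's `re` module; the helper `full` and the sample words are assumptions made for the example.

```python
import re

def full(pattern, text):
    """True if the whole of text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# ? : the "u" may appear zero or one times.
optional = [full(r"colou?r", w) for w in ("color", "colour", "colouur")]

# * : the "b" may appear zero or more times.
star = [full(r"ab*c", w) for w in ("ac", "abc", "abbbc")]

# + : the "b" must appear at least once.
plus = [full(r"ab+c", w) for w in ("ac", "abc", "abbbc")]
```

The only difference between `*` and `+` shows up on "ac": `ab*c` accepts it, `ab+c` does not.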
regex(7) explains all the neat things you can do with RegularExpressions and the different types. perlre(1) explains Perl's extended regexes.
- grep(1) is a command to look for a regex in a file, e.g.
- grep 'foo' /tmp/baz.txt
- will look for the string "foo" in /tmp/baz.txt. More usefully
- grep 'wlug\.linuxcare\.co\.nz' *
will search for every occurrence of "wlug.linuxcare.co.nz" in all the files in this directory. The backslashes make each "." match a literal dot rather than any character.
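To see why the dots are escaped in the hostname example, the following sketch imitates a line-by-line search in Python. The function `grep_lines` and the sample lines are hypothetical, not part of grep itself.

```python
import re

# Hypothetical input lines for illustration.
lines = [
    "see wlug.linuxcare.co.nz for details",
    "wlugXlinuxcareXcoXnz",
    "nothing here",
]

def grep_lines(pattern, lines):
    """Return the lines containing a match, roughly what grep prints."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# With escaped dots, only the real hostname matches.
escaped = grep_lines(r"wlug\.linuxcare\.co\.nz", lines)

# With bare dots, any character is accepted in those positions.
unescaped = grep_lines(r"wlug.linuxcare.co.nz", lines)
```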
- sed(1) is a "stream editor" which uses regexes. sed is usually used for its amazing search and replace capability. A simple example:
- sed 's/foo/baz/g' <a.txt >b.txt
will search for "foo" and replace it with "baz" in a.txt and output the result in b.txt
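The effect of the trailing "g" in the sed command can be sketched in Python. The function `sed_substitute` below is a hypothetical helper that imitates `s/pattern/replacement/` line by line; it is not how sed is implemented.

```python
import re

def sed_substitute(pattern, replacement, text, global_flag=True):
    """Apply a substitution to each line, imitating sed's s/// command.

    With global_flag=True every match on a line is replaced (like s///g);
    otherwise only the first match on each line is replaced.
    """
    count = 0 if global_flag else 1  # in re.sub, count=0 means "replace all"
    return "".join(
        re.sub(pattern, replacement, line, count=count)
        for line in text.splitlines(keepends=True)
    )

text = "foo foo\nfoo\n"
with_g = sed_substitute(r"foo", "baz", text)
without_g = sed_substitute(r"foo", "baz", text, global_flag=False)
```

Without the flag, the second "foo" on the first line is left alone.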
perl(1) can also be used for in-place substitutions, like so
perl -pi -e 's/foo/bar/g' a.txt
will replace all instances of "foo" with "bar" in a.txt
awk(1) is a tool for processing record-oriented files. It allows you to specify different actions to perform based on regexes.
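The "different actions based on regexes" idea can be illustrated with a rough Python sketch. The rule list, the function `run_rules` and the sample records are all hypothetical; real awk has its own syntax and many more features.

```python
import re

def run_rules(rules, lines):
    """For each line, run the action of every rule whose regex matches."""
    output = []
    for line in lines:
        fields = line.split()  # whitespace-separated fields, as awk does by default
        for pattern, action in rules:
            if re.search(pattern, line):
                output.append(action(fields))
    return output

# Hypothetical (regex, action) pairs in the spirit of awk's "pattern { action }".
rules = [
    (r"^ERROR", lambda f: "error: " + " ".join(f[1:])),
    (r"[0-9]+$", lambda f: "number: " + f[-1]),
]

records = ["ERROR disk full", "count 42", "ERROR code 7", "plain line"]
result = run_rules(rules, records)
```

A record can trigger more than one rule, as "ERROR code 7" does here, and a record that matches nothing produces no output.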
See also: File Globs
Tricks and Traps:
- When specifying regexes on the command line, surround them in single quotes ('), so that the shell does not interpret characters such as * and \ before the command sees them.
Examples of single-character expressions
To match any lowercase vowel:
[aeiou]
To match any lowercase or uppercase vowel:
[aeiouAEIOU]
To match any single digit:
[0123456789]
The same thing:
[0-9]
Any single digit or minus:
[0-9-]
Any lowercase letter:
[a-z]
The ^ character can be used to negate a character group when it is the first character inside the brackets:
To match anything except a lowercase letter:
[^a-z]
To match anything except a lowercase or uppercase letter, digit or underscore:
[^a-zA-Z0-9_]
These can be used with * too, so:
[0-9]*
matches any number of digits, including no digits.
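The character groups described above can be exercised with a short Python sketch. The sample text is made up, and Python's `re` module is used as a stand-in for the tools discussed here.

```python
import re

# Hypothetical sample text for illustration.
text = "Hello World-42_x"

# Each group matches one character at a time; findall collects every match.
vowels = re.findall(r"[aeiou]", text)
digit_or_minus = re.findall(r"[0-9-]", text)

# A leading ^ inside the brackets negates the group.
not_lowercase = re.findall(r"[^a-z]", text)
not_word_char = re.findall(r"[^a-zA-Z0-9_]", text)

# With *, a group may repeat any number of times, including zero.
empty_ok = re.fullmatch(r"[0-9]*", "") is not None
digits_ok = re.fullmatch(r"[0-9]*", "2024") is not None
```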
Note: These apply to perl regular expressions. They will most likely work in other regex parsers such as sed, but there may be subtle differences.
To match any digit:
/\d/
(Equivalent to /[0-9]/)
To match any 'word' character:
/\w/
(Equivalent to /[a-zA-Z0-9_]/)
To match any space character:
/\s/
(Equivalent to /[ \r\t\n\f]/)
\D, \W and \S are the negated versions of \d, \w and \s:
/[\D]/ is equivalent to /[^0-9]/
/[\W]/ is equivalent to /[^a-zA-Z0-9_]/
/[\S]/ is equivalent to /[^ \r\t\n\f]/
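The shorthand classes can be compared against their bracketed equivalents in Python, which is one of the "other regex parsers" with subtle differences. In this sketch the `re.ASCII` flag is passed because, without it, Python's `\d` and `\w` also match non-ASCII digits and letters. Python's `\s` additionally matches the vertical tab. The sample text is hypothetical.

```python
import re

# Hypothetical sample text for illustration.
text = "id_7 = 42\n"

digits = re.findall(r"\d", text, re.ASCII)
words = re.findall(r"\w+", text, re.ASCII)
spaces = re.findall(r"\s", text, re.ASCII)

# The negated forms match everything the lowercase forms do not.
non_word = re.findall(r"\W", text, re.ASCII)

# On this input the shorthands agree with the explicit groups from the text.
same_digits = digits == re.findall(r"[0-9]", text)
same_words = words == re.findall(r"[a-zA-Z0-9_]+", text)
same_spaces = spaces == re.findall(r"[ \r\t\n\f]", text)
```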
As mentioned before, perl uses an extended regular expression syntax explained on the perlre(1) man page.
Having said that though, here are some hints.
DON'T interpret variables as regular expressions
If we have
$search = '[c]';
then
$text =~ s/\Q$search\E/XX/ ;
would replace the substring '[c]' in $text with "XX", while
$text =~ s/$search/XX/ ;
would replace the first occurrence of the character 'c' with "XX" (or every occurrence, if the /g modifier is added).
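The same trap exists outside Perl. As a sketch, Python's `re.escape` plays roughly the role of `\Q...\E`; the variable names and sample text below are assumptions for illustration. Note that `re.sub` replaces every match by default, like Perl's /g.

```python
import re

search = "[c]"   # user-supplied text that happens to look like a character group
text = "a[c]c"   # hypothetical sample text

# Escaped: the brackets are treated literally, so only the substring "[c]" is replaced.
literal = re.sub(re.escape(search), "XX", text)

# Unescaped: "[c]" is read as a character group, so every "c" is replaced.
as_regex = re.sub(search, "XX", text)
```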