Regular expression syntax

A regular expression is a sequence of characters that defines a text pattern. Workbench uses regular expressions to express search criteria.

Regular expressions are useful when you need to search for text but do not want to specify its entire content. For example, use a regular expression in the following situations:

  • You only know a portion of the text that you are searching for. For example, if you are searching for files, you may only know the first three letters of the file name.

  • You need to find multiple instances of text that share a certain pattern. For example, you might want to find all of the files that have the same file extension.
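The two situations above can be sketched in Java, assuming that Workbench follows java.util.regex semantics and requires the whole file name to match (both are assumptions; this page does not confirm them). The sketch uses the period, the * quantifier, and backslash escaping, all described later in this section. The class name, the filter helper, and the file names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FileNameSearch {
    // Keeps only the names that match the regular expression in full.
    static List<String> filter(List<String> names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return names.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical file names, for illustration only.
        List<String> names = List.of("report.txt", "repo.log", "summary.txt", "rep.csv");

        // Only the first three letters are known: "rep" followed by anything.
        System.out.println(filter(names, "rep.*"));

        // Every file with the same extension: anything ending in ".txt".
        // In a Java string literal, the regex \. is written "\\.".
        System.out.println(filter(names, ".*\\.txt"));
    }
}
```

Run as written, the first call prints [report.txt, repo.log, rep.csv] and the second prints [report.txt, summary.txt].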

When constructing regular expressions, you need to use the correct syntax:

Literal characters

The base components of regular expressions are literal characters. For example, the regular expression abc matches the sequence abc.
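A minimal Java sketch of literal matching, assuming java.util.regex semantics (an assumption about Workbench). This page does not say whether Workbench requires the whole text to match or only looks for the pattern somewhere inside it, so the sketch shows both modes; the class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class LiteralMatch {
    // True only if the whole input is exactly the sequence the regex describes.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the sequence appears anywhere inside the input.
    static boolean occursIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("abc", "abc"));   // true
        System.out.println(matchesWhole("abc", "xabcx")); // false: extra characters around abc
        System.out.println(occursIn("abc", "xabcx"));     // true: abc occurs inside the input
    }
}
```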

Groups

Enclosing a sequence of characters in parentheses forms a group, for example (defg). The characters in a group are treated as a single element.
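To show what "a single element" means, the following Java sketch (assuming java.util.regex semantics, which this page does not confirm for Workbench) applies the + quantifier, described under Quantifiers, to a group and to a bare sequence. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class GroupDemo {
    // True only if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // With a group, + repeats the whole sequence defg.
        System.out.println(matchesWhole("(defg)+", "defgdefg")); // true
        // Without a group, + applies only to the preceding g.
        System.out.println(matchesWhole("defg+", "defgdefg"));   // false
        System.out.println(matchesWhole("defg+", "defggg"));     // true
    }
}
```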

Any character

The period (.) represents any single character. For example, a.bc matches a followed by any character followed by bc, such as ambc.
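A short Java sketch of the period, assuming java.util.regex semantics (an assumption about Workbench). In that engine the period stands for exactly one character and, by default, does not match a line terminator such as a newline. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class AnyCharDemo {
    // True only if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("a.bc", "ambc"));  // true
        System.out.println(matchesWhole("a.bc", "a7bc"));  // true
        // The period must consume exactly one character, so it cannot match nothing.
        System.out.println(matchesWhole("a.bc", "abc"));   // false
        // By default in java.util.regex, the period does not match a newline.
        System.out.println(matchesWhole("a.bc", "a\nbc")); // false
    }
}
```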

The OR operator

To specify a pattern that matches either of two alternatives, separate them with the pipe character (|). For example, x|y matches either the character x or the character y. The regular expression x|(zy) matches either the character x or the sequence zy.
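A Java sketch of the OR operator, assuming java.util.regex semantics (an assumption about Workbench). In that engine the pipe has the lowest precedence of any operator, so x|zy already means "x or zy"; parentheses are needed only when the alternatives sit inside a longer sequence. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class AlternationDemo {
    // True only if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("x|(zy)", "x"));    // true
        System.out.println(matchesWhole("x|(zy)", "zy"));   // true
        System.out.println(matchesWhole("x|(zy)", "xzy"));  // false: only one alternative is matched
        // Without parentheses, | still separates x from the whole sequence zy.
        System.out.println(matchesWhole("x|zy", "zy"));     // true
        // Grouping limits the alternatives to the middle character.
        System.out.println(matchesWhole("a(x|y)b", "ayb")); // true
    }
}
```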

Quantifiers

Quantifiers are special characters used to express a quantity. Quantifiers are applied to the element that directly precedes the quantifier. For example, when a group precedes a quantifier, the quantifier applies to the characters in the group as a whole.

  • ? (zero or one): a? matches zero or one a character. abc? matches ab followed by zero or one c character, such as abc or ab.

  • * (zero or more): a*(b|c) matches zero or more a characters followed by either b or c, such as aaaac. Without the parentheses, a*b|c matches either zero or more a characters followed by b, or the single character c, because the OR operator separates everything on its left from everything on its right.

  • + (one or more): z+ matches a sequence of one or more z characters.

  • {n} (exactly n times): efg{3}b matches ef followed by three consecutive g characters followed by b, such as efgggb.

  • {n,} (n or more times): a{3,} matches a sequence of three or more consecutive a characters.

  • {n,m} (at least n, but no more than m times): a{3,5} matches a sequence of three, four, or five consecutive a characters.
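The quantifier examples above can be checked with a small Java sketch, assuming java.util.regex semantics and whole-text matching (assumptions about Workbench). The inputs are chosen to show both accepted and rejected cases, including the difference between a*(b|c) and a*b|c. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // True only if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Each row is {regex, input}; the comment gives the result.
        String[][] cases = {
            {"a?", ""},            // true: zero a characters
            {"a?", "aa"},          // false: more than one a
            {"abc?", "ab"},        // true: the c is optional
            {"a*(b|c)", "aaaac"},  // true
            {"a*b|c", "aaaac"},    // false: means (a*b) or c
            {"z+", ""},            // false: at least one z is required
            {"efg{3}b", "efgggb"}, // true
            {"a{3,}", "aa"},       // false: fewer than three
            {"a{3,5}", "aaaa"},    // true
            {"a{3,5}", "aaaaaa"},  // false: more than five
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " on \"" + c[1] + "\": " + matchesWhole(c[0], c[1]));
        }
    }
}
```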

Note: The backslash character (\) escapes special characters so that they are treated as literal characters. For example, \+ matches the plus sign (+). To match a literal backslash, such as the separator in a Windows file system path, use \\ as follows:
    c:\\file.txt
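A Java sketch of escaping, assuming java.util.regex semantics (an assumption about Workbench). When a regular expression is written inside Java source code, each regex backslash must be escaped again in the string literal, so the regex c:\\file.txt becomes "c:\\\\file.txt". The sketch also shows that the unescaped period in file.txt matches any character, and that the standard Pattern.quote method treats a whole string literally. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // True only if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The regex \+ is written "\\+" in a Java string literal.
        System.out.println(matchesWhole("1\\+1", "1+1")); // true: \+ is a literal plus sign
        System.out.println(matchesWhole("1+1", "11"));    // true: unescaped + is a quantifier
        System.out.println(matchesWhole("1+1", "1+1"));   // false

        // The regex c:\\file.txt matches the path c:\file.txt (written "c:\\file.txt" in Java).
        System.out.println(matchesWhole("c:\\\\file.txt", "c:\\file.txt")); // true
        // The unescaped period still matches any character.
        System.out.println(matchesWhole("c:\\\\file.txt", "c:\\fileXtxt")); // true

        // Pattern.quote treats the entire path literally, including the period.
        String literal = Pattern.quote("c:\\file.txt");
        System.out.println(matchesWhole(literal, "c:\\file.txt")); // true
        System.out.println(matchesWhole(literal, "c:\\fileXtxt")); // false
    }
}
```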

Character classes

A character class matches any single character from a set of characters. You define a character class by enclosing characters and special characters within brackets ([ ]).

  • Literals: the characters that you want to include in the character class. [abc] matches a, b, or c.

  • ^ (negation): when it is the first character inside the brackets, the class matches any character that is not listed. [^abc] matches any character except a, b, and c.

  • - (range): an inclusive range of characters. [a-z] matches any character from a to z, including a and z. [a-zA-Z] matches any character from a to z or from A to Z, inclusive.

  • Nested brackets (union): a character class nested inside another class is combined with it. [a-d[m-p]] matches any character from a to d or from m to p, inclusive, and is equivalent to [a-dm-p].

  • && (intersection): the class matches only characters that belong to both operands. [a-z&&[def]] intersects the range a-z with the class [def], so it matches only d, e, or f.
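The union and intersection forms above are features of java.util.regex and are not supported by every regular expression engine. The following sketch assumes Workbench uses that engine (an assumption this page does not confirm) and checks each class against single characters; it also confirms that [a-d[m-p]] and [a-dm-p] accept the same lowercase letters. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // True if the single character belongs to the character class.
    static boolean inClass(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(inClass("[abc]", 'b'));        // true
        System.out.println(inClass("[^abc]", 'b'));       // false
        System.out.println(inClass("[a-zA-Z]", 'Q'));     // true
        System.out.println(inClass("[a-d[m-p]]", 'n'));   // true (union)
        System.out.println(inClass("[a-d[m-p]]", 'h'));   // false
        System.out.println(inClass("[a-z&&[def]]", 'e')); // true (intersection)
        System.out.println(inClass("[a-z&&[def]]", 'a')); // false

        // The nested form and the flat form accept the same lowercase letters.
        boolean same = true;
        for (char c = 'a'; c <= 'z'; c++) {
            same &= inClass("[a-d[m-p]]", c) == inClass("[a-dm-p]", c);
        }
        System.out.println(same); // true
    }
}
```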

Predefined character classes

Several predefined character classes can be expressed using shorthand.

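  • \d: a digit. Equivalent to [0-9].

  • \D: any character that is not a digit. Equivalent to [^0-9].

  • \s: a white space character. Equivalent to [ \t\n\x0B\f\r] (space, tab, newline, vertical tab, form feed, and carriage return).

  • \S: any character that is not a white space character. Equivalent to [^\s].

  • \w: a word character, that is, a letter of the alphabet, a digit, or the underscore. Equivalent to [a-zA-Z_0-9].

  • \W: any character that is not a word character. Equivalent to [^\w].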
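The equivalences above can be checked in Java, assuming Workbench uses java.util.regex with its default flags (an assumption; this page does not confirm it). In that engine, the Pattern.UNICODE_CHARACTER_CLASS flag widens \w and \s to Unicode characters, so this sketch compares each shorthand with its bracketed form only over the ASCII range. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class PredefinedClassDemo {
    // Pairs of {shorthand, equivalent bracketed class}, written as Java string literals.
    static final String[][] PAIRS = {
        {"\\d", "[0-9]"},
        {"\\D", "[^0-9]"},
        {"\\s", "[ \\t\\n\\x0B\\f\\r]"},
        {"\\S", "[^\\s]"},
        {"\\w", "[a-zA-Z_0-9]"},
        {"\\W", "[^\\w]"},
    };

    // True if both patterns accept exactly the same single ASCII characters.
    static boolean sameOnAscii(String a, String b) {
        Pattern pa = Pattern.compile(a);
        Pattern pb = Pattern.compile(b);
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (pa.matcher(s).matches() != pb.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String[] pair : PAIRS) {
            System.out.println(pair[0] + " vs " + pair[1] + ": " + sameOnAscii(pair[0], pair[1]));
        }
        // The underscore counts as a word character; the hyphen does not.
        System.out.println(Pattern.matches("\\w", "_")); // true
        System.out.println(Pattern.matches("\\w", "-")); // false
    }
}
```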
