
Regular Expression

Hikasiru 2006. 5. 12. 15:00
Table 4-2. Regular expression metacharacter syntax

Subexpression

Matches

Notes

General

^

Start of line/string

 

$

End of line/string

 

\b

Word boundary

 

\B

Not a word boundary

 

\A

Beginning of entire string

 

\z

End of entire string

 

\Z

End of entire string (except allowable final line terminator)

 

.

Any one character (except line terminator)

 

[...]

"Character class"; any one character from those listed

 

[^...]

Any one character not from those listed


 
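The anchors above are easiest to tell apart on input that ends with a line terminator. The following is a minimal sketch, assuming java.util.regex (the table's notes refer to Java); the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // True if the regex matches somewhere in the input (Matcher.find semantics).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Same, but with MULTILINE so that ^ and $ also match at embedded line breaks.
    static boolean foundMultiline(String regex, String input) {
        return Pattern.compile(regex, Pattern.MULTILINE).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "first\nsecond\n";
        System.out.println(found("second$", text));          // true: $ tolerates the final line terminator
        System.out.println(found("second\\Z", text));        // true: so does \Z
        System.out.println(found("second\\z", text));        // false: \z is the very end of the input
        System.out.println(found("^second", text));          // false: ^ is only the start of the input
        System.out.println(foundMultiline("^second", text)); // true: ^ now matches after each line break
        System.out.println(found("\\bsec", text));           // true: word boundary before "second"
        System.out.println(found("[^a-z\\n]", text));        // false: nothing outside a-z and newline
    }
}
```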

Alternation and grouping

(...)

Grouping (capture groups)


 

|

Alternation

 

(?:re)

Non-capturing parentheses (grouping without a capture group)

 

\G

End of the previous match

 

\n

Back-reference to capture group number "n", where n is a digit (\1 is the first group)

 
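As a rough illustration of capture groups, non-capturing groups and back-references working together, here is a small sketch assuming java.util.regex. The patterns and the names GroupDemo, doubledWords and issueNumber are hypothetical examples, not part of the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // (\w+) captures a word as group 1; \1 requires the same text again.
    private static final Pattern DOUBLED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // (?:...) groups the alternation without creating a capture group,
    // so the digits after the slash are still group 1.
    private static final Pattern ISSUE = Pattern.compile("(?:bug|issue)/(\\d+)");

    // Collects every word that is immediately repeated.
    static List<String> doubledWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = DOUBLED.matcher(text);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    // Returns the number after "bug/" or "issue/", or null if there is none.
    static String issueNumber(String text) {
        Matcher m = ISSUE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(doubledWords("this is is a test of the the matcher")); // [is, the]
        System.out.println(issueNumber("see issue/1234 for details"));            // 1234
    }
}
```

The leading \b matters here: without it, the "is" at the end of "this" followed by " is" would also count as a doubled word.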

Normal (greedy) multipliers

{m,n}

Multiplier for "from m to n repetitions"

 

{m,}

Multiplier for "m or more repetitions"

 

{m}

Multiplier for "exactly m repetitions"

 

{,n}

Multiplier for 0 up to n repetitions

Not accepted by java.util.regex, which requires a lower bound; write {0,n}

*

Multiplier for 0 or more repetitions

Short for {0,}

+

Multiplier for 1 or more repetitions

Short for {1,}

?

Multiplier for 0 or 1 repetitions (i.e., present exactly once, or not at all)

Short for {0,1}
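
A quick way to check which bounded forms a given engine accepts is to compile them. The sketch below assumes java.util.regex, where a brace must be followed by a digit, so the open-lower-bound form {,n} is rejected and {0,n} is used instead. BoundsDemo and compiles are illustrative names.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class BoundsDemo {
    // Returns true if java.util.regex accepts the expression.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("a{2,4}"));          // true
        System.out.println(compiles("a{2,}"));           // true
        System.out.println(compiles("a{2}"));            // true
        System.out.println(compiles("a{0,4}"));          // true
        System.out.println(compiles("a{,4}"));           // false: the lower bound is missing
        System.out.println("aaaa".matches("a{2,4}"));    // true
        System.out.println("a".matches("a{2,4}"));       // false: fewer than 2 repetitions
    }
}
```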

Reluctant (non-greedy) multipliers

{m,n}?

Reluctant multiplier for "from m to n repetitions"

 

{m,}?

Reluctant multiplier for "m or more repetitions"

 

{,n}?

Reluctant multiplier for 0 up to n repetitions

Write {0,n}? in java.util.regex, which rejects an omitted lower bound

*?

Reluctant multiplier: 0 or more

 

+?

Reluctant multiplier: 1 or more

 

??

Reluctant multiplier: 0 or 1 times

 

Possessive (very greedy) multipliers

{m,n}+

Possessive multiplier for "from m to n repetitions"

 

{m,}+

Possessive multiplier for "m or more repetitions"

 

{,n}+

Possessive multiplier for 0 up to n repetitions

Write {0,n}+ in java.util.regex, which rejects an omitted lower bound

*+

Possessive multiplier: 0 or more

 

++

Possessive multiplier: 1 or more

 

?+

Possessive multiplier: 0 or 1 times

 
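The three multiplier families differ only in how much they consume and whether they give anything back. A minimal sketch, assuming java.util.regex, runs the same expression in greedy, reluctant and possessive form over one input; QuantifierDemo and firstMatch are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> and <i>italic</i>";

        // Greedy: .+ grabs as much as possible, then backs off to the last '>'.
        System.out.println(firstMatch("<.+>", html));   // <b>bold</b> and <i>italic</i>

        // Reluctant: .+? grabs as little as possible, stopping at the first '>'.
        System.out.println(firstMatch("<.+?>", html));  // <b>

        // Possessive: .++ grabs everything and never backs off,
        // so the closing '>' can no longer be matched.
        System.out.println(firstMatch("<.++>", html));  // null
    }
}
```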

Escapes and shorthands

\

Escape (quote) character: turns most metacharacters into literals, and turns a following alphabetic character into a metacharacter

 

\Q

Escape (quote) all characters up to \E

 

\E

Ends quoting begun with \Q

 

\t

Tab character

 

\r

Return (carriage return) character

 

\n

Newline character


 

\f

Form feed

 

\w

Character in a word

Use \w+ for a word

\W

A non-word character

 

\d

Numeric digit

Use \d+ for an integer

\D

A non-digit character

 

\s

Whitespace

Space, tab, etc.; by default [ \t\n\x0B\f\r] in java.util.regex, which is narrower than java.lang.Character.isWhitespace()

\S

A non-whitespace character


 
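Quoting matters most when the text to search for comes from outside the program. The sketch below, assuming java.util.regex, compares an unquoted expression with the \Q...\E form and with Pattern.quote, and then exercises a few of the shorthands. QuoteDemo and containsLiteral are illustrative names.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Searches for the literal text, whatever metacharacters it contains.
    // Pattern.quote does the \Q...\E wrapping for us.
    static boolean containsLiteral(String literal, String input) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).find();
    }

    public static void main(String[] args) {
        // Unquoted, "." matches any character and "+" is a multiplier.
        System.out.println(Pattern.compile("1.5+").matcher("1x555").find());        // true
        // Between \Q and \E every character is literal.
        System.out.println(Pattern.compile("\\Q1.5+\\E").matcher("1x555").find());  // false
        System.out.println(containsLiteral("1.5+", "total: 1.5+ tax"));             // true

        // Shorthand classes from the table.
        System.out.println("id_42".matches("\\w+"));          // true: letters, digits, underscore
        System.out.println("42".matches("\\d+"));             // true
        System.out.println("a\tb  c".split("\\s+").length);   // 3
    }
}
```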

Unicode blocks (representative samples)

\p{InGreek}

A character in the Greek block

(simple block)

\P{InGreek}

Any character not in the Greek block

 

\p{Lu}

An uppercase letter

(simple category)

\p{Sc}

A currency symbol

 
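To show the block and category escapes in use, here is a small sketch assuming java.util.regex. Non-ASCII characters are written as \u escapes so the source file encoding does not matter; UnicodeDemo and its methods are illustrative names.

```java
class UnicodeDemo {
    // True if every character is in the Greek block.
    static boolean isGreek(String s) {
        return s.matches("\\p{InGreek}+");
    }

    // Removes every currency symbol (category Sc).
    static String stripCurrency(String s) {
        return s.replaceAll("\\p{Sc}", "");
    }

    // Counts uppercase letters by deleting everything that is NOT in category Lu.
    static int countUpper(String s) {
        return s.replaceAll("\\P{Lu}", "").length();
    }

    public static void main(String[] args) {
        System.out.println(isGreek("\u03b1\u03b2\u03b3"));        // true: alpha, beta, gamma
        System.out.println(isGreek("abc"));                       // false
        System.out.println(stripCurrency("$10 or \u20AC9"));      // 10 or 9
        System.out.println(countUpper("\u00C9cole de Paris"));    // 2: E-acute and P
    }
}
```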

POSIX-style character classes (defined only for US-ASCII)

\p{Alnum}

Alphanumeric characters

[A-Za-z0-9]

\p{Alpha}

Alphabetic characters

[A-Za-z]

\p{ASCII}

Any ASCII character

[\x00-\x7F]

\p{Blank}

Space and tab characters

[ \t]

\p{Space}

Space characters

[ \t\n\x0B\f\r]

\p{Cntrl}

Control characters

[\x00-\x1F\x7F]

\p{Digit}

Numeric digit characters

[0-9]

\p{Graph}

Printable and visible characters (not spaces or control characters)

[\p{Alnum}\p{Punct}]

\p{Print}

Printable characters

\p{Graph} plus the space character: [\p{Graph}\x20]

\p{Punct}

Punctuation characters

One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

\p{Lower}

Lowercase characters

[a-z]

\p{Upper}

Uppercase characters

[A-Z]

\p{XDigit}

Hexadecimal digit characters

[0-9a-fA-F]
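
Because these classes are defined only for US-ASCII by default, they behave differently from the Unicode categories on accented text. The following sketch, assuming java.util.regex with no flags set, contrasts the two and also shows the space character separating \p{Print} from \p{Graph}. PosixDemo and matchesAll are illustrative names.

```java
class PosixDemo {
    // True if the whole string matches the expression.
    static boolean matchesAll(String regex, String s) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("\\p{Alpha}", "e"));        // true
        System.out.println(matchesAll("\\p{Alpha}", "\u00E9"));   // false: e-acute is not US-ASCII
        System.out.println(matchesAll("\\p{L}", "\u00E9"));       // true: Unicode letter category
        System.out.println(matchesAll("\\p{Print}", " "));        // true: space is printable
        System.out.println(matchesAll("\\p{Graph}", " "));        // false: space is not visible
        System.out.println(matchesAll("\\p{XDigit}+", "7F"));     // true
        System.out.println(matchesAll("\\p{Punct}+", "!?"));      // true
    }
}
```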