Regular Expression

Hikasiru 2006. 5. 12. 15:00

Table 4-2. Regular expression metacharacter syntax
Subexpression	Matches	Notes
General
`^`	Start of line/string
`$`	End of line/string
`\b`	Word boundary
`\B`	Not a word boundary
`\A`	Beginning of entire string
`\z`	End of entire string
`\Z`	End of entire string (except allowable final line terminator)
`.`	Any one character (except line terminator)
`[...]`	"Character class"; any one character from those listed
`[^...]`	Any one character not from those listed
Alternation and grouping
`(...)`	Grouping (capture groups)
`|`	Alternation
`(?:re)`	Noncapturing parenthesis
`\G`	End of the previous match
`\n`	Back-reference to capture group number "`n`"	`n` is a digit, e.g. `\1` for the first group
Normal (greedy) multipliers
`{m,n}`	Multiplier for "from `m` to `n` repetitions"
`{m,}`	Multiplier for "`m` or more repetitions"
`{m}`	Multiplier for "exactly `m` repetitions"
`{,n}`	Multiplier for 0 up to `n` repetitions	`java.util.regex` rejects this form; write `{0,n}`
`*`	Multiplier for 0 or more repetitions	Short for `{0,}`
`+`	Multiplier for 1 or more repetitions	Short for `{1,}`
`?`	Multiplier for 0 or 1 repetitions (i.e., present exactly once, or not at all)	Short for `{0,1}`
Reluctant (non-greedy) multipliers
`{m,n}?`	Reluctant multiplier for "from `m` to `n` repetitions"
`{m,}?`	Reluctant multiplier for "`m` or more repetitions"
`{,n}?`	Reluctant multiplier for 0 up to `n` repetitions	In `java.util.regex`, write `{0,n}?`
`*?`	Reluctant multiplier: 0 or more
`+?`	Reluctant multiplier: 1 or more
`??`	Reluctant multiplier: 0 or 1 times
Possessive (very greedy) multipliers
`{m,n}+`	Possessive multiplier for "from `m` to `n` repetitions"
`{m,}+`	Possessive multiplier for "`m` or more repetitions"
`{,n}+`	Possessive multiplier for 0 up to `n` repetitions	In `java.util.regex`, write `{0,n}+`
`*+`	Possessive multiplier: 0 or more
`++`	Possessive multiplier: 1 or more
`?+`	Possessive multiplier: 0 or 1 times
Escapes and shorthands
`\`	Escape (quote) character: turns most metacharacters off; turns a subsequent alphabetic character into a metacharacter
`\Q`	Escape (quote) all characters up to `\E`
`\E`	Ends quoting begun with `\Q`
`\t`	Tab character
`\r`	Return (carriage return) character
`\n`	Newline character
`\f`	Form feed
`\w`	Character in a word	Use `\w+` for a word
`\W`	A non-word character
`\d`	Numeric digit	Use `\d+` for an integer
`\D`	A non-digit character
`\s`	Whitespace	Space, tab, etc.; by default `[ \t\n\x0B\f\r]` in `java.util.regex`, which is narrower than `java.lang.Character.isWhitespace()`
`\S`	A nonwhitespace character
Unicode blocks (representative samples)
`\p{InGreek}`	A character in the Greek block	(simple block)
`\P{InGreek}`	Any character not in the Greek block
`\p{Lu}`	An uppercase letter	(simple category)
`\p{Sc}`	A currency symbol
POSIX-style character classes (defined only for US-ASCII)
`\p{Alnum}`	Alphanumeric characters	`[A-Za-z0-9]`
`\p{Alpha}`	Alphabetic characters	`[A-Za-z]`
`\p{ASCII}`	Any ASCII character	`[\x00-\x7F]`
`\p{Blank}`	Space and tab characters
`\p{Space}`	Space characters	`[ \t\n\x0B\f\r]`
`\p{Cntrl}`	Control characters	`[\x00-\x1F\x7F]`
`\p{Digit}`	Numeric digit characters	`[0-9]`
`\p{Graph}`	Printable and visible characters (not spaces or control characters)
`\p{Print}`	Printable characters	`\p{Graph}` plus the space character
`\p{Punct}`	Punctuation characters	One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
`\p{Lower}`	Lowercase characters	`[a-z]`
`\p{Upper}`	Uppercase characters	`[A-Z]`
`\p{XDigit}`	Hexadecimal digit characters	`[0-9a-fA-F]`
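
The three multiplier families in the table (greedy, reluctant, possessive) are easiest to tell apart by running the same input through each. The following is a minimal sketch using `java.util.regex`; the class name `QuantifierDemo`, the helper `firstMatch`, and the sample input are made up for illustration and do not come from the table's source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: returns the first match of regex in input,
    // or null when the pattern does not match anywhere.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>bold</b> and <i>italic</i>";

        // Greedy: .+ takes as much as it can, then backs off just far
        // enough to find a closing '>', so the match runs to the last '>'.
        System.out.println(firstMatch("<.+>", input));

        // Reluctant: .+? takes as little as it can, so the match stops
        // at the first '>'.
        System.out.println(firstMatch("<.+?>", input));   // prints <b>

        // Possessive: .++ takes everything to the end of the line and
        // never gives anything back, so the trailing '>' cannot match.
        System.out.println(firstMatch("<.++>", input));   // prints null
    }
}
```

The possessive form failing here is the expected behavior, not a bug: possessive multipliers trade backtracking for speed, so they fit best when what follows can never overlap with what the multiplier consumes (for example `\d++` followed by a letter).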

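Several other rows of the table combine naturally in practice: capture groups with back-references, word boundaries, quoting with `\Q...\E`, and the POSIX-style classes. The sketch below shows one possible use of each in Java. The class and method names (`GroupDemo`, `hasDoubledWord`, `countLiteral`, `isHexByte`) are illustrative assumptions. `Pattern.quote` is the standard library method that wraps a string in `\Q...\E` quoting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // (\w+) captures a word, \s+ skips whitespace, and \1 must repeat the
    // captured text. The \b anchors stop "the theory" from counting.
    private static final Pattern DOUBLED =
            Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // \p{XDigit} is the POSIX-style class for [0-9a-fA-F]; {2} means
    // exactly two repetitions.
    private static final Pattern HEX_BYTE =
            Pattern.compile("\\p{XDigit}{2}");

    // True if the input contains the same word twice in a row.
    public static boolean hasDoubledWord(String input) {
        return DOUBLED.matcher(input).find();
    }

    // Counts occurrences of needle taken literally, so that characters
    // such as '.' lose their metacharacter meaning.
    public static int countLiteral(String needle, String input) {
        Matcher m = Pattern.compile(Pattern.quote(needle)).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True if the whole string is exactly two hexadecimal digits.
    public static boolean isHexByte(String s) {
        return HEX_BYTE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledWord("it was the the best")); // true
        System.out.println(hasDoubledWord("the theory"));          // false
        System.out.println(countLiteral("a.b", "a.b axb a.b"));    // 2
        System.out.println(isHexByte("fF"));                       // true
        System.out.println(isHexByte("g0"));                       // false
    }
}
```

Note that every backslash from the table is doubled inside a Java string literal, so `\w` is written `"\\w"`.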