C# Regular Expressions

A regular expression is a pattern that is matched against input text.

The .NET Framework provides a regular expression engine that performs this kind of matching.

A pattern is composed of one or more character literals, operators, or constructs.

Defining regular expressions

The following categories of characters, operators, and structures are used to define regular expressions:

  • Character escaping

  • Character class

  • Locators

  • Group construction

  • Quantifiers

  • Backreference construction

  • Alternative construction

  • Replacement

  • Miscellaneous constructions

Character escaping

The backslash character (\) in regular expressions indicates that the character following it is a special character, or that the character should be interpreted literally.

The following table lists escape characters:

| Escape character | Description | Pattern | Matches |
| --- | --- | --- | --- |
| \a | Matches the bell (alarm) character, \u0007. | \a | "\u0007" in "Warning!" + '\u0007' |
| \b | In a character class, matches the backspace character, \u0008. | [\b]{3,} | "\b\b\b\b" in "\b\b\b\b" |
| \t | Matches the tab character, \u0009. | (\w+)\t | "Name\t" and "Addr\t" in "Name\tAddr\t" |
| \r | Matches the carriage return character, \u000D. (\r is not equivalent to the newline character, \n.) | \r\n(\w+) | "\r\nHello" in "\r\nHello\nWorld." |
| \v | Matches the vertical tab character, \u000B. | [\v]{2,} | "\v\v\v" in "\v\v\v" |
| \f | Matches the form feed character, \u000C. | [\f]{2,} | "\f\f\f" in "\f\f\f" |
| \n | Matches the newline character, \u000A. | \r\n(\w+) | "\r\nHello" in "\r\nHello\nWorld." |
| \e | Matches the escape character, \u001B. | \e | "\x001B" in "\x001B" |
| \nnn | Specifies a character using octal notation (nnn consists of two or three digits). | \w\040\w | "a b" and "c d" in "a bc d" |
| \xnn | Specifies a character using hexadecimal notation (nn consists of exactly two digits). | \w\x20\w | "a b" and "c d" in "a bc d" |
| \cX, \cx | Matches the ASCII control character specified by X or x, where X or x is the letter of the control character. | \cC | "\x0003" in "\x0003" (Ctrl-C) |
| \unnnn | Matches a Unicode character using hexadecimal notation (exactly four digits, represented by nnnn). | \w\u0020\w | "a b" and "c d" in "a bc d" |
| \ | When followed by a character that is not recognized as an escape, matches that character literally. | \d+[\+-x\*]\d+ | "2+2" and "3*9" in "(2+2) * 3*9" |
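
The escapes above can be tried directly from C#. The following is a minimal sketch, not part of the original tutorial: the class name EscapeDemo and the helper Find are hypothetical. It assumes patterns are written as verbatim strings (@"...") so that backslashes reach the regex engine unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
   // Hypothetical helper: returns every match of pattern in input.
   public static string[] Find(string input, string pattern)
   {
      MatchCollection matches = Regex.Matches(input, pattern);
      string[] results = new string[matches.Count];
      for (int i = 0; i < matches.Count; i++)
         results[i] = matches[i].Value;
      return results;
   }

   public static void Main()
   {
      // \x20 and \u0020 both denote the space character.
      Console.WriteLine(string.Join("|", Find("a bc d", @"\w\x20\w")));   // a b|c d
      Console.WriteLine(string.Join("|", Find("a bc d", @"\w\u0020\w"))); // a b|c d

      // Regex.Escape adds the backslashes needed to match text literally.
      Console.WriteLine(Regex.Escape("1.5+2"));                           // 1\.5\+2
   }
}
```

Regex.Escape is useful when part of a pattern comes from user input and must not be interpreted as regex syntax.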

Character class

Character classes match any character in a set of characters.

The following table lists character classes:

| Character class | Description | Pattern | Matches |
| --- | --- | --- | --- |
| [character_group] | Matches any single character in character_group. By default, the match is case-sensitive. | [mn] | "m" in "mat"; "m" and "n" in "moon" |
| [^character_group] | Negation: matches any single character that is not in character_group. By default, characters in character_group are case-sensitive. | [^aei] | "v" and "l" in "avail" |
| [first-last] | Character range: matches any single character in the range from first to last. | [b-d]irds | "birds", "cirds", and "dirds" (the match is case-sensitive by default, so "Birds" does not match) |
| . | Wildcard: matches any single character except \n. To match a literal period character (. or \u002E), precede it with the escape character (\.). | a.e | "ave" in "have"; "ate" in "mate" |
| \p{name} | Matches any single character in the Unicode general category or named block specified by name. | \p{Lu} | "C" and "L" in "City Lights" |
| \P{name} | Matches any single character that is not in the Unicode general category or named block specified by name. | \P{Lu} | "i", "t", and "y" in "City" |
| \w | Matches any word character. | \w | "R", "o", "o", "m", and "1" in "Room#1" |
| \W | Matches any non-word character. | \W | "#" in "Room#1" |
| \s | Matches any white-space character. | \w\s | "D " in "ID A1.3" |
| \S | Matches any non-white-space character. | \s\S | " _" in "int __ctr" |
| \d | Matches any decimal digit. | \d | "4" in "4 = IV" |
| \D | Matches any character that is not a decimal digit. | \D | " ", "=", " ", "I", and "V" in "4 = IV" |
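
A few of the character classes above can be checked with a short program. This is a sketch only; the class name CharClassDemo and the helper Find are hypothetical and are not part of the original tutorial.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
   // Hypothetical helper: joins every match of pattern in input with commas.
   public static string Find(string input, string pattern)
   {
      return string.Join(",", Regex.Matches(input, pattern)
                                   .Cast<Match>()
                                   .Select(m => m.Value));
   }

   public static void Main()
   {
      Console.WriteLine(Find("City Lights", @"\p{Lu}")); // C,L   (uppercase letters)
      Console.WriteLine(Find("City", @"\P{Lu}"));        // i,t,y (everything else)
      Console.WriteLine(Find("Room#1", @"\w"));          // R,o,o,m,1
      Console.WriteLine(Find("Room#1", @"\W"));          // #
      Console.WriteLine(Find("avail", @"[^aei]"));       // v,l
   }
}
```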

Locators

Locators, more commonly called anchors or atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string, but they do not advance the engine through the string or consume characters.

The table below lists the locators:

| Assertion | Description | Pattern | Matches |
| --- | --- | --- | --- |
| ^ | The match must start at the beginning of the string or, in multiline mode, at the beginning of a line. | ^\d{3} | "567" in "567-777-" |
| $ | The match must occur at the end of the string or before \n at the end of the line or string. | -\d{4}$ | "-2012" in "8-12-2012" |
| \A | The match must occur at the start of the string. | \A\w{4} | "Code" in "Code-007-" |
| \Z | The match must occur at the end of the string or before \n at the end of the string. | -\d{3}\Z | "-007" in "Bond-901-007" |
| \z | The match must occur at the end of the string. | -\d{3}\z | "-333" in "-901-333" |
| \G | The match must occur at the point where the previous match ended. | \G\(\d\) | "(1)", "(3)", and "(5)" in "(1)(3)(5)[7](9)" |
| \b | Matches a word boundary, that is, the position between a word character (\w) and a non-word character (\W) or the start or end of the string. | er\b | "er" in "never", but not "er" in "verb" |
| \B | Matches a position that is not a word boundary. | er\B | "er" in "verb", but not "er" in "never" |
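
The effect of these assertions, and of multiline mode on ^, can be illustrated as follows. This is a minimal sketch; AnchorDemo and Find are hypothetical names, and the two-line input string is made up for the illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
   // Hypothetical helper: joins every match of pattern in input with commas.
   public static string Find(string input, string pattern,
                             RegexOptions options = RegexOptions.None)
   {
      return string.Join(",", Regex.Matches(input, pattern, options)
                                   .Cast<Match>()
                                   .Select(m => m.Value));
   }

   public static void Main()
   {
      string lines = "901-333\n567-777";

      // Without Multiline, ^ anchors only at the start of the whole string.
      Console.WriteLine(Find(lines, @"^\d{3}"));                         // 901
      // With Multiline, ^ also anchors after each \n.
      Console.WriteLine(Find(lines, @"^\d{3}", RegexOptions.Multiline)); // 901,567

      // \G requires each match to begin where the previous one ended.
      Console.WriteLine(Find("(1)(3)(5)[7](9)", @"\G\(\d\)"));           // (1),(3),(5)

      // \b accepts only the "er" that ends a word.
      Console.WriteLine(Find("never verb", @"er\b"));                    // er
   }
}
```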

Group construction

Group constructions delineate the subexpressions of a regular expression and are usually used to capture substrings of the input string.

The following table lists the group constructions:

| Group construction | Description | Pattern | Matches |
| --- | --- | --- | --- |
| (subexpression) | Captures the matched subexpression and assigns it a number. Explicit captures are numbered from 1; group 0 is the entire match. | (\w)\1 | "ee" in "deep" |
| (?<name>subexpression) | Captures the matched subexpression into a named group. | (?<double>\w)\k<double> | "ee" in "deep" |
| (?<name1-name2>subexpression) | Defines a balancing group definition. | (((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$ | "((1-3)*(3-1))" in "3+2^((1-3)*(3-1))" |
| (?:subexpression) | Defines a non-capturing group. | Write(?:Line)? | "WriteLine" in "Console.WriteLine()" |
| (?imnsx-imnsx:subexpression) | Enables or disables the specified options within subexpression. | A\d{2}(?i:\w+)\b | "A12xl" and "A12XL" in "A12xl A12XL a12xl" |
| (?=subexpression) | Zero-width positive lookahead assertion. | \w+(?=\.) | "is", "ran", and "out" in "He is. The dog ran. The sun is out." |
| (?!subexpression) | Zero-width negative lookahead assertion. | \b(?!un)\w+\b | "sure" and "used" in "unsure sure unity used" |
| (?<=subexpression) | Zero-width positive lookbehind assertion. | (?<=19)\d{2} | "99", "50", and "05" in "1851 1999 1950 1905 2003" |
| (?>subexpression) | Non-backtracking (atomic) subexpression. | [13579](?>A+B+) | "1ABB", "3ABB", and "5AB" in "1ABB 3ABBC 5AB 5AC" |

The following example applies the positive lookbehind pattern from the table and prints the two digits that follow each "19":
using System;
using System.Text.RegularExpressions;
public class Example
{
   public static void Main()
   {
      string input = "1851 1999 1950 1905 2003";
      // Positive lookbehind: match two digits only when "19" comes right before them.
      string pattern = @"(?<=19)\d{2}";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
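
Captured groups are read back through the Match.Groups collection. The following sketch shows named and non-capturing groups side by side; the class name GroupDemo, the helper Capture, and the "mode=fast" input are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
   // Hypothetical helper: returns the text captured by a named group,
   // or an empty string when the pattern does not match.
   public static string Capture(string input, string pattern, string groupName)
   {
      Match m = Regex.Match(input, pattern);
      return m.Success ? m.Groups[groupName].Value : "";
   }

   public static void Main()
   {
      string pattern = @"(?<key>\w+)=(?<value>\w+)";
      Match m = Regex.Match("mode=fast", pattern);

      Console.WriteLine(m.Groups[0].Value);                    // mode=fast (whole match)
      Console.WriteLine(Capture("mode=fast", pattern, "key")); // mode
      Console.WriteLine(Capture("mode=fast", pattern, "value")); // fast

      // (?:...) groups for the ? quantifier without creating a capture.
      Match w = Regex.Match("Console.WriteLine()", @"Write(?:Line)?");
      Console.WriteLine(w.Value);        // WriteLine
      Console.WriteLine(w.Groups.Count); // 1 (only group 0, the whole match)
   }
}
```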



Quantifiers

Quantifiers specify how many instances of the preceding element (which can be a character, a group, or a character class) must be present in the input string for a match to occur.

The following table lists the quantifiers:

| Quantifier | Description | Pattern | Matches |
| --- | --- | --- | --- |
| * | Matches the preceding element zero or more times. | \d*\.\d | ".0", "19.9", and "219.9" |
| + | Matches the preceding element one or more times. | be+ | "bee" in "been"; "be" in "bent" |
| ? | Matches the preceding element zero or one time. | rai?n | "ran" and "rain" |
| {n} | Matches the preceding element exactly n times. | ,\d{3} | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
| {n,} | Matches the preceding element at least n times. | \d{2,} | "166", "29", and "1930" |
| {n,m} | Matches the preceding element at least n times, but no more than m times. | \d{3,5} | "166" and "17668"; "19302" in "193024" |
| *? | Matches the preceding element zero or more times, but as few times as possible. | \d*?\.\d | ".0", "19.9", and "219.9" |
| +? | Matches the preceding element one or more times, but as few times as possible. | be+? | "be" in "been"; "be" in "bent" |
| ?? | Matches the preceding element zero or one time, but as few times as possible. | rai??n | "ran" and "rain" |
| {n}? | Matches the preceding element exactly n times. | ,\d{3}? | ",043" in "1,043.6"; ",876", ",543", and ",210" in "9,876,543,210" |
| {n,}? | Matches the preceding element at least n times, but as few times as possible. | \d{2,}? | "16" in "166"; "29" in "29"; "19" and "30" in "1930" |
| {n,m}? | Matches the preceding element between n and m times, but as few times as possible. | \d{3,5}? | "166" in "166"; "176" in "17668"; "193" and "024" in "193024" |
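
The difference between greedy and lazy quantifiers is easiest to see when both are run on the same input. This is a small sketch; QuantifierDemo and Find are hypothetical names, and the "<b>bold</b>" input is an added illustration rather than an example from the table.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
   // Hypothetical helper: joins every match of pattern in input with commas.
   public static string Find(string input, string pattern)
   {
      return string.Join(",", Regex.Matches(input, pattern)
                                   .Cast<Match>()
                                   .Select(m => m.Value));
   }

   public static void Main()
   {
      // Greedy: .+ takes as much as it can, so the match runs to the last '>'.
      Console.WriteLine(Find("<b>bold</b>", @"<.+>"));  // <b>bold</b>
      // Lazy: .+? stops at the first '>' that lets the match succeed.
      Console.WriteLine(Find("<b>bold</b>", @"<.+?>")); // <b>,</b>

      // Greedy {3,5} takes five digits; lazy {3,5}? takes only three at a time.
      Console.WriteLine(Find("193024", @"\d{3,5}"));    // 19302
      Console.WriteLine(Find("193024", @"\d{3,5}?"));   // 193,024
   }
}
```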

Backreference construction

A backreference allows a previously matched subexpression to be identified again later in the same regular expression.

The following table lists the backreference constructions:

| Backreference construction | Description | Pattern | Matches |
| --- | --- | --- | --- |
| \number | Backreference. Matches the value of a numbered subexpression. | (\w)\1 | "ee" in "seek" |
| \k<name> | Named backreference. Matches the value of a named expression. | (?<char>\w)\k<char> | "ee" in "seek" |
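
A common use of backreferences is finding accidentally repeated words. The sketch below assumes a hypothetical class BackreferenceDemo and a made-up input sentence; the pattern uses a named backreference as described in the table.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
   // Hypothetical helper: returns each doubled word pair found in text,
   // joined with commas.
   public static string FindDoubledWords(string text)
   {
      // (?<word>\w+) captures a word; \k<word> requires the same text again.
      string pattern = @"\b(?<word>\w+)\s+\k<word>\b";
      return string.Join(",", Regex.Matches(text, pattern)
                                   .Cast<Match>()
                                   .Select(m => m.Value));
   }

   public static void Main()
   {
      Console.WriteLine(FindDoubledWords("This is is a test test case"));
      // is is,test test
   }
}
```

The leading \b keeps the pattern from matching the "is" at the end of "This".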

Alternative construction

Alternative constructions modify a regular expression to enable either/or matching.

The following table lists the alternative constructions:

|
   Description: Matches any one of the elements separated by the vertical bar (|) character.
   Pattern: th(e|is|at)
   Matches: "this" and "the" in "this is the day. "

(?(expression)yes|no)
   Description: Matches yes if the regular expression pattern designated by expression matches; otherwise, matches the optional no part. expression is interpreted as a zero-width assertion.
   Pattern: (?(A)A\d{2}\b|\b\d{3}\b)
   Matches: "A10" and "910" in "A10 C103 910"

(?(name)yes|no)
   Description: Matches yes if the named or numbered capture group name has a match; otherwise, matches the optional no part.
   Pattern: (?<quoted>")?(?(quoted).+?"|\S+\s)
   Matches: Dogs.jpg (with its trailing space) and "Yiska playing.jpg" in Dogs.jpg "Yiska playing.jpg"
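
The first two alternation constructs can be exercised with the patterns and inputs listed above. This is a minimal sketch; AlternationDemo and Find are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
   // Hypothetical helper: joins every match of pattern in input with commas.
   public static string Find(string input, string pattern)
   {
      return string.Join(",", Regex.Matches(input, pattern)
                                   .Cast<Match>()
                                   .Select(m => m.Value));
   }

   public static void Main()
   {
      // Plain alternation: "th" followed by "e", "is", or "at".
      Console.WriteLine(Find("this is the day. ", @"th(e|is|at)"));           // this,the

      // Conditional: if the next character is "A", match A plus two digits;
      // otherwise match a standalone three-digit number.
      Console.WriteLine(Find("A10 C103 910", @"(?(A)A\d{2}\b|\b\d{3}\b)"));   // A10,910
   }
}
```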

Replacement

Replacement (substitution) elements are language constructs that are recognized only within a replacement pattern.

The following table lists characters used for replacement:

| Character | Description | Pattern | Replacement pattern | Input string | Resulting string |
| --- | --- | --- | --- | --- | --- |
| $number | Substitutes the substring matched by group number. | \b(\w+)(\s)(\w+)\b | $3$2$1 | "one two" | "two one" |
| ${name} | Substitutes the substring matched by the named group name. | \b(?<word1>\w+)(\s)(?<word2>\w+)\b | ${word2} ${word1} | "one two" | "two one" |
| $$ | Substitutes a literal "$". | \b(\d+)\s?USD | $$$1 | "103 USD" | "$103" |
| $& | Substitutes a copy of the entire match. | (\$*(\d*(\.+\d+)?){1}) | **$& | "$1.30" | "**$1.30**" |
| $` | Substitutes all the text of the input string before the match. | B+ | $` | "AABBCC" | "AAAACC" |
| $' | Substitutes all the text of the input string after the match. | B+ | $' | "AABBCC" | "AACCCC" |
| $+ | Substitutes the last group that was captured. | B+(C+) | $+ | "AABBCCDD" | "AACCDD" |
| $_ | Substitutes the entire input string. | B+ | $_ | "AABBCC" | "AAAABBCCCC" |
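
The replacement patterns above are passed as the third argument of Regex.Replace. The sketch below reproduces three rows of the table; the class name SubstitutionDemo and the method SwapWords are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
   // Hypothetical helper: swaps the two words of a two-word string
   // by referring to the named groups in the replacement pattern.
   public static string SwapWords(string input)
   {
      return Regex.Replace(input,
                           @"\b(?<word1>\w+)(\s)(?<word2>\w+)\b",
                           "${word2} ${word1}");
   }

   public static void Main()
   {
      // $number: groups 3, 2, and 1 are emitted in reverse order.
      Console.WriteLine(Regex.Replace("one two", @"\b(\w+)(\s)(\w+)\b", "$3$2$1")); // two one

      // ${name}: same result using named groups.
      Console.WriteLine(SwapWords("one two"));                                       // two one

      // $$ emits a literal dollar sign, followed here by group 1.
      Console.WriteLine(Regex.Replace("103 USD", @"\b(\d+)\s?USD", "$$$1"));         // $103
   }
}
```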

Miscellaneous constructions

The following table lists various miscellaneous constructions:

| Construction | Description | Example |
| --- | --- | --- |
| (?imnsx-imnsx) | Sets or disables options, such as case insensitivity, in the middle of a pattern. | \bA(?i)b\w+\b matches "ABA" and "Able" in "ABA Able Act" |
| (?#comment) | Inline comment. The comment ends at the first closing parenthesis. | \bA(?#Match words starting with A)\w+\b |
| # [to end of line] | X-mode comment. The comment starts at an unescaped # and continues to the end of the line. | (?x)\bA\w+\b#Match words that start with A |
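
Inline options and comments can be combined to make longer patterns readable. The following sketch uses the "ABA Able Act" input from the table; the class name MiscDemo and the helper Find are hypothetical, and RegexOptions.IgnorePatternWhitespace is used here as the option equivalent of the inline (?x) flag.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MiscDemo
{
   // Hypothetical helper: joins every match of pattern in input with commas.
   public static string Find(string input, string pattern,
                             RegexOptions options = RegexOptions.None)
   {
      return string.Join(",", Regex.Matches(input, pattern, options)
                                   .Cast<Match>()
                                   .Select(m => m.Value));
   }

   public static void Main()
   {
      // (?i) turns on case-insensitive matching for the rest of the pattern,
      // so the "b" after the capital A matches both "B" and "b".
      Console.WriteLine(Find("ABA Able Act", @"\bA(?i)b\w+\b")); // ABA,Able

      // In x-mode, unescaped white space is ignored and # starts a comment.
      string commented = @"
         \b A    # word boundary, then a capital A
         \w+ \b  # remaining word characters up to the next boundary
      ";
      Console.WriteLine(Find("ABA Able Act", commented,
                             RegexOptions.IgnorePatternWhitespace)); // ABA,Able,Act
   }
}
```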

Regex Class

The Regex class is used to represent a regular expression.

The following table lists some commonly used methods in the Regex class:

| No. | Method | Description |
| --- | --- | --- |
| 1 | public bool IsMatch(string input) | Indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string. |
| 2 | public bool IsMatch(string input, int startat) | Indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string, beginning at the specified starting position in the string. |
| 3 | public static bool IsMatch(string input, string pattern) | Indicates whether the specified regular expression finds a match in the specified input string. |
| 4 | public MatchCollection Matches(string input) | Searches the specified input string for all occurrences of the regular expression. |
| 5 | public string Replace(string input, string replacement) | Replaces all strings in the specified input string that match the regular expression pattern with the specified replacement string. |
| 6 | public string[] Split(string input) | Splits the input string into an array of substrings at the positions defined by the regular expression pattern specified in the Regex constructor. |
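
The methods listed above can be called on a single Regex instance, as in the sketch below. The class name RegexMethodsDemo, the helper SplitList, and the comma-separated input are hypothetical and serve only to illustrate the calls.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexMethodsDemo
{
   // A comma with optional white space on either side.
   private static readonly Regex Separator = new Regex(@"\s*,\s*");

   // Hypothetical helper: splits a comma-separated list into its items.
   public static string[] SplitList(string input)
   {
      return Separator.Split(input);
   }

   public static void Main()
   {
      string input = "red , green,blue";

      Console.WriteLine(Separator.IsMatch(input));           // True
      Console.WriteLine(Separator.IsMatch(input, 12));       // False (no comma from index 12 on)
      Console.WriteLine(Regex.IsMatch("Suns", @"^S"));       // True (static overload)
      Console.WriteLine(Separator.Matches(input).Count);     // 2
      Console.WriteLine(Separator.Replace(input, ";"));      // red;green;blue
      Console.WriteLine(string.Join("|", SplitList(input))); // red|green|blue
   }
}
```

Creating the Regex once and reusing it avoids reparsing the pattern on every call; the static methods are convenient for one-off matches.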

For a complete list of the methods and properties of the Regex class, refer to Microsoft's C# documentation.

Example 1

The following example matches words that start with 'S':

using System;
using System.Text.RegularExpressions;
namespace RegExApplication
{
   class Program
   {
      private static void showMatch(string text, string expr)
      {
         Console.WriteLine("The Expression: " + expr);
         MatchCollection mc = Regex.Matches(text, expr);
         foreach (Match m in mc)
         {
            Console.WriteLine(m);
         }
      }
      static void Main(string[] args)
      {
         string str = "A Thousand Splendid Suns";
         Console.WriteLine("Matching words that start with 'S': ");
         showMatch(str, @"\bS\S*");
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it will produce the following result:

Matching words that start with 'S':
The Expression: \bS\S*
Splendid
Suns

Example 2

The following example matches words that start with 'm' and end with 'e':

using System;
using System.Text.RegularExpressions;
namespace RegExApplication
{
   class Program
   {
      private static void showMatch(string text, string expr)
      {
         Console.WriteLine("The Expression: " + expr);
         MatchCollection mc = Regex.Matches(text, expr);
         foreach (Match m in mc)
         {
            Console.WriteLine(m);
         }
      }
      static void Main(string[] args)
      {
         string str = "make maze and manage to measure it";
         Console.WriteLine("Matching words start with 'm' and end with 'e':");
         showMatch(str, @"\bm\S*e\b");
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it will produce the following result:

Matching words start with 'm' and end with 'e':
The Expression: \bm\S*e\b
make
maze
manage
measure

Example 3

The following example replaces extra spaces:

using System;
using System.Text.RegularExpressions;
namespace RegExApplication
{
   class Program
   {
      static void Main(string[] args)
      {
         string input = "Hello   World   ";
         string pattern = "\\s+";   // one or more white-space characters
         string replacement = " ";
         Regex rgx = new Regex(pattern);
         string result = rgx.Replace(input, replacement);
         Console.WriteLine("Original String: {0}", input);
         Console.WriteLine("Replacement String: {0}", result);    
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it will produce the following result:

Original String: Hello   World   
Replacement String: Hello World