Scala supports regular expressions through the Regex class in the scala.util.matching package. The following example uses a regular expression to search for the word Scala:
    import scala.util.matching.Regex

    object Test {
      def main(args: Array[String]): Unit = {
        val pattern = "Scala".r
        val str = "Scala is Scalable and cool"

        println(pattern findFirstIn str)
      }
    }
Running the code above produces the following output:
    $ scalac Test.scala
    $ scala Test
    Some(Scala)
The example calls the r method, which Scala makes available on strings through StringOps, to construct a Regex object.
It then uses the findFirstIn method to find the first match, which is returned as an Option[String].
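Because findFirstIn returns an Option, the result is usually unpacked before use. The following is a minimal sketch of one way to do that; the object name FirstMatchDemo and the helper describe are hypothetical names chosen for illustration.

```scala
import scala.util.matching.Regex

object FirstMatchDemo {
  // Hypothetical helper: turns the Option returned by findFirstIn into a message.
  def describe(pattern: Regex, input: String): String =
    pattern.findFirstIn(input) match {
      case Some(found) => s"found: $found"
      case None        => "no match"
    }

  def main(args: Array[String]): Unit = {
    val pattern = "Scala".r
    println(describe(pattern, "Scala is Scalable and cool")) // found: Scala
    println(describe(pattern, "Java is cool"))               // no match
  }
}
```

Pattern matching on Some and None keeps the "no match" case explicit instead of relying on a null check.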
If you need to view all the matched items, you can use the findAllIn method.
You can use the mkString method to join the matches into a single string, and the pipe (|) to match alternative patterns:
    import scala.util.matching.Regex

    object Test {
      def main(args: Array[String]): Unit = {
        // The first letter can be an uppercase S or a lowercase s
        val pattern = new Regex("(S|s)cala")
        val str = "Scala is scalable and cool"

        // Join the matches with commas
        println((pattern findAllIn str).mkString(","))
      }
    }
Running the code above produces the following output:
    $ scalac Test.scala
    $ scala Test
    Scala,scala
To replace matched text with a specified string, use the replaceFirstIn method to replace the first match, or the replaceAllIn method to replace every match. For example:
    object Test {
      def main(args: Array[String]): Unit = {
        val pattern = "(S|s)cala".r
        val str = "Scala is scalable and cool"

        // Only the first match is replaced
        println(pattern.replaceFirstIn(str, "Java"))
      }
    }
Running the code above produces the following output:
    $ scalac Test.scala
    $ scala Test
    Java is scalable and cool
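For comparison, here is a short sketch of replaceAllIn applied to the same input. The object name ReplaceAllDemo and the helper replaceEvery are hypothetical, introduced only for this illustration.

```scala
object ReplaceAllDemo {
  // Hypothetical helper: replaces every match of (S|s)cala with "Java".
  def replaceEvery(input: String): String = {
    val pattern = "(S|s)cala".r
    pattern.replaceAllIn(input, "Java")
  }

  def main(args: Array[String]): Unit = {
    // Both "Scala" and the "scala" inside "scalable" are replaced.
    println(replaceEvery("Scala is scalable and cool")) // Java is Javable and cool
  }
}
```

Note that the pattern also matches inside longer words such as "scalable", so the replacement applies there too.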
Scala's regular expressions inherit the syntax rules of Java, while Java mostly uses the rules of the Perl language.
The following table lists some commonly used regular expression rules:
Expression | Matching rule |
---|---|
^ | Match the position at the beginning of the input string. |
$ | Match the position at the end of the input string. |
. | Match any single character except a line terminator (unless DOTALL mode is enabled) |
[...] | Character class. Matches any one of the listed characters. For example, "[abc]" matches the "a" in "plain". |
[^...] | Negated character class. Matches any one character that is not listed. For example, "[^abc]" matches "p", "l", "i", and "n" in "plain". |
\\A | Match the start of the input string (not affected by the multi-line option) |
\\z | Match the end of the input string (similar to $, but not affected by the multi-line option) |
\\Z | Match the end of the input string, or just before a final line terminator (not affected by the multi-line option) |
re* | Repeat zero times or more |
re+ | Repeat once or more |
re? | Repeat zero times or once |
re{n} | Repeat exactly n times |
re{n,} | Repeat n times or more |
re{n,m} | Repeat n to m times |
a\|b | Match a or b |
(re) | Match re and capture the text into an automatically numbered group |
(?:re) | Match re without capturing the text or assigning a group number |
(?>re) | Match re as an independent (atomic) non-capturing group that is not re-entered on backtracking |
\\w | Match a word character: by default an ASCII letter, digit, or underscore, equivalent to [a-zA-Z_0-9] |
\\W | Match any character that is not a word character |
\\s | Match any whitespace character, equivalent to [ \t\n\x0B\f\r] |
\\S | Match any non-whitespace character |
\\d | Match digits, similar to [0-9] |
\\D | Match any non-digit character |
\\G | Match at the end of the previous match |
\\n | Newline |
\\b | Match a word boundary |
\\B | Match a position that is not a word boundary |
\\t | Tab |
\\Q | Start of a quoted literal: \\Q(a+b)*3\\E matches the text "(a+b)*3" |
\\E | End of a quoted literal started by \\Q |

The following examples illustrate these rules:

Example | Description |
---|---|
. | Match any single character except a line terminator |
[Rr]uby | Match "Ruby" or "ruby" |
rub[ye] | Match "ruby" or "rube" |
[aeiou] | Match any one lowercase vowel: a, e, i, o, or u |
[0-9] | Match any digit, similar to [0123456789] |
[a-z] | Match any ASCII lowercase letter |
[A-Z] | Match any ASCII uppercase letter |
[a-zA-Z0-9] | Match any ASCII letter or digit |
[^aeiou] | Match any character except aeiou |
[^0-9] | Match any character except digits |
\\d | Match digits, similar to: [0-9] |
\\D | Match non-digits, similar to: [^0-9] |
\\s | Match whitespace characters, similar to: [ \t\r\n\f] |
\\S | Match non-whitespace characters, similar to: [^ \t\r\n\f] |
\\w | Match letters, numbers, underscores, similar to: [A-Za-z0-9_] |
\\W | Match a non-letter, non-digit, non-underscore character, similar to: [^A-Za-z0-9_] |
ruby? | Match "rub" or "ruby": the y is optional |
ruby* | Match "rub" followed by zero or more "y" characters |
ruby+ | Match "rub" followed by one or more "y" characters |
\\d{3} | Match exactly 3 digits |
\\d{3,} | Match 3 or more digits |
\\d{3,5} | Match 3, 4, or 5 digits |
\\D\\d+ | No grouping: + repeats only \\d |
(\\D\\d)+ | Grouping: + repeats the \\D\\d pair |
([Rr]uby(, )?)+ | Match "Ruby", "Ruby, ruby, ruby", etc. |
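
The grouping and quantifier rules above can be combined, as in the following sketch. It assumes a simple year-month text format chosen only for illustration; the object name GroupDemo and the helper firstYearMonth are hypothetical. It uses findFirstMatchIn, which returns an Option of a Regex.Match whose group(n) method reads the n-th capture group.

```scala
import scala.util.matching.Regex

object GroupDemo {
  // Two capture groups, each built from \\d with an {n} quantifier.
  val yearMonth: Regex = "(\\d{4})-(\\d{2})".r

  // Hypothetical helper: returns (year, month) from the first match, if any.
  def firstYearMonth(input: String): Option[(String, String)] =
    yearMonth.findFirstMatchIn(input).map(m => (m.group(1), m.group(2)))

  def main(args: Array[String]): Unit = {
    println(firstYearMonth("released 2021-05, updated 2023-11")) // Some((2021,05))
    println(firstYearMonth("no dates here"))                      // None
  }
}
```

Group numbering starts at 1; group(0) refers to the whole match.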
Note that each escape sequence in the tables above is written with two backslashes. This is because in Java and Scala string literals the backslash is itself an escape character, so you need to write \\ in the source to put a single \ into the pattern. See the following example:
    import scala.util.matching.Regex

    object Test {
      def main(args: Array[String]): Unit = {
        // \\d in the string literal becomes \d in the pattern
        val pattern = new Regex("abl[ae]\\d+")
        val str = "ablaw is able1 and cool"

        println((pattern findAllIn str).mkString(","))
      }
    }
Running the code above produces the following output:
    $ scalac Test.scala
    $ scala Test
    able1
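
As an alternative to doubling backslashes, Scala's triple-quoted string literals do not process backslash escapes such as \d, so the pattern can be written with single backslashes. The sketch below compares both spellings; the object name EscapingDemo and the helper allMatches are hypothetical names used only here.

```scala
import scala.util.matching.Regex

object EscapingDemo {
  // Regular string literal: the backslash must be doubled.
  val escaped: Regex = "abl[ae]\\d+".r

  // Triple-quoted literal: the backslash is taken literally.
  val tripleQuoted: Regex = """abl[ae]\d+""".r

  // Hypothetical helper: collects every match into a list.
  def allMatches(pattern: Regex, input: String): List[String] =
    pattern.findAllIn(input).toList

  def main(args: Array[String]): Unit = {
    val str = "ablaw is able1 and cool"
    println(escaped.regex == tripleQuoted.regex)        // true
    println(allMatches(tripleQuoted, str).mkString(",")) // able1
  }
}
```

Both literals produce the same pattern text, so which one to use is mostly a matter of readability.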