Regular Expressions Examples


A regular expression is a sequence of characters that defines a search pattern. The search pattern can be used for text search and text replace operations. In this post we have a few common examples of regular expressions that serve mainly as a learning tool rather than a reference. A good reference book with many common examples of regular expressions is Regular Expressions Cookbook, Second Edition, by Jan Goyvaerts and Steven Levithan, published by O’Reilly Media, Inc.

Here are a few examples of regular expressions. If you want to test your regular expressions online, you can try https://regex101.com. We start with some simple examples.

Characters

You can simply search for a series of characters in another series of characters by just specifying the characters. For example, to search/match the word calendar just use calendar in the regular expression. You can use a character class to find calendar and its common misspellings as shown in the example below. The notation using square brackets is called a character class. A character class matches a single character out of a list of possible characters. The three classes in the regular expression below match either an a or an e, and they do it independently.

c[ae]l[ae]nd[ae]r
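If you would like to experiment outside a web tester, here is a minimal sketch in Python. Python is only our choice for illustration; the expressions in this post are not tied to one language. It uses the standard re module, and the list of misspellings is made up for the example.

```python
import re

# Each [ae] class accepts exactly one "a" or one "e", independently of the others.
CALENDAR = re.compile(r"c[ae]l[ae]nd[ae]r")

def is_calendar_spelling(word: str) -> bool:
    # fullmatch() succeeds only if the entire word fits the pattern
    return CALENDAR.fullmatch(word) is not None

for word in ["calendar", "calender", "celandar", "calindar", "calendr"]:
    print(word, is_calendar_spelling(word))
```

The first three spellings are accepted. "calindar" fails because i is not in the class, and "calendr" fails because every class must match exactly one character.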

As previously mentioned, if you want to search/match the three characters cat, you just specify cat. However, if you want to match the word cat, you need a different regular expression. Just specifying cat will potentially match cat, concatenate, application and many others. To match only the whole word cat, use the following regular expression.

\bcat\b

The word boundaries at both ends of the regular expression ensure that cat is matched only when it appears as a complete word. More precisely, the word boundaries require that cat is set apart from other text by the beginning or end of the string, whitespace, punctuation, or other non-word characters. Regular expression engines consider letters, numbers, and underscores to all be word characters.
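A short Python sketch comparing the plain pattern with the word-boundary version on a made-up sentence:

```python
import re

text = "The cat sat on a concatenated mat; my_cat and cat_food don't count, but (cat) does."

# Without word boundaries, "cat" is also found inside longer words.
plain = re.findall(r"cat", text)

# With \b on both sides, "cat" must be bordered by non-word characters or the
# ends of the string. The underscore is a word character, so my_cat and
# cat_food are not matched, while "(cat)" is.
whole = re.findall(r"\bcat\b", text)

print(len(plain), len(whole))  # 5 2
```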

Digits

[0-9]{3}
\d\d\d

In most engines, and for ordinary ASCII text, the two expressions above are equivalent: each matches three digits in a row, each in the range zero to nine. However, be aware that they will match any string that includes three digits in a row, such as “247MGR”, “R520” and so on. If you want to match three digits and only three digits, put a ^ at the beginning and a $ at the end: ^\d{3}$. Note also that \d matches any digit and \D matches anything that is not a digit, so \D is equivalent to [^\d] (or [^0-9]), not to ^\d, which means a digit at the start of the string.
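A minimal Python sketch of the unanchored and anchored forms, using the example strings from the paragraph above. One Python-specific caveat: in Python 3, \d in a str pattern also matches non-ASCII decimal digits (such as Arabic-Indic digits) unless the re.ASCII flag is used, so [0-9] and \d are only interchangeable for ASCII input.

```python
import re

def contains_three_digits(s: str) -> bool:
    # Unanchored: succeeds if three digits appear anywhere in s
    return re.search(r"[0-9]{3}", s) is not None

def is_exactly_three_digits(s: str) -> bool:
    # Anchored with ^ and $: s must be three digits and nothing else
    return re.search(r"^\d\d\d$", s) is not None

for s in ["247", "247MGR", "R520", "12"]:
    print(s, contains_three_digits(s), is_exactly_three_digits(s))

# \D and the negated class [^\d] both match a single non-digit character.
print(re.findall(r"\D", "a1b2"), re.findall(r"[^\d]", "a1b2"))  # ['a', 'b'] ['a', 'b']

# Python caveat: \d matches Unicode digits, [0-9] does not.
arabic_indic = "\u0663\u0664\u0665"  # three Arabic-Indic digits
print(re.search(r"\d{3}", arabic_indic) is not None)     # True
print(re.search(r"[0-9]{3}", arabic_indic) is not None)  # False
```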

Seven-Digit Phone Number

^\d{3}-?\d{4}$

The above expression will match a seven-digit phone number, with or without a dash. The -? in the expression means zero or one occurrence of the dash. It doesn’t check that the phone number is valid, however; it just matches three digits from zero to nine, an optional hyphen, and then another four digits from zero to nine. The ^ means match at the beginning of the string and the $ means match at the end of the string.
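A quick Python check of the seven-digit expression; the sample strings are made up for illustration:

```python
import re

SEVEN_DIGIT = re.compile(r"^\d{3}-?\d{4}$")

def looks_like_seven_digit_number(s: str) -> bool:
    # Shape check only: three digits, an optional dash, four digits
    return SEVEN_DIGIT.search(s) is not None

for s in ["555-4454", "5554454", "555 4454", "555--4454", "0000000"]:
    print(s, looks_like_seven_digit_number(s))
```

A space is not accepted as a separator, and "0000000" is accepted, which shows that the expression checks the shape of the number, not whether it is a real one.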

Ten-Digit Phone Number

Area codes in North American telephone numbers cannot begin with a zero or a one, so the area code must start with a digit from 2 through 9, as shown with ^[2-9]. The next two digits can be anything from 0 through 9, as shown with \d{2}. The parts of the phone number may or may not be delimited. If you want to match no delimiter, a dash or a space character, you can use this in your pattern: [- ]?. Read that as zero or one occurrence of either a dash or a space. Therefore it matches no delimiter, one dash or one space.

^[2-9]\d{2}[- ]?\d{3}[- ]?\d{4}$
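The ten-digit expression can be tried the same way in Python; these sample numbers are hypothetical:

```python
import re

TEN_DIGIT = re.compile(r"^[2-9]\d{2}[- ]?\d{3}[- ]?\d{4}$")

def looks_like_ten_digit_number(s: str) -> bool:
    return TEN_DIGIT.search(s) is not None

samples = ["4165554454", "416-555-4454", "416 555-4454",
           "116-555-4454", "416--555-4454", "(416) 555-4454"]
for s in samples:
    print(s, looks_like_ten_digit_number(s))
```

The first three are accepted. "116-555-4454" fails because the area code starts with 1, "416--555-4454" fails because only one delimiter is allowed, and the bracketed form is not handled by this version.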

To allow the area code to be enclosed in round brackets, we could use the expression below.

^[(]?[2-9]\d{2}[- )]?[ ]?\d{3}[- ]?\d{4}$

The above will match the following examples.

  • 4165554454
  • 416 555 4454
  • 416-555-4454
  • 416 555-4454
  • 416-5554454
  • 416-555 4454
  • (416)5554454
  • (416) 555 4454
  • (416)555-4454
  • (416) 555-4454
  • (416)555 4454
  • (416555-4454
  • (416-555-4454
  • 416)555-4454

As the last three entries in the list above show, our regular expression also accepts unbalanced brackets, so it does not handle every case correctly. Looking at non-matches, it will not match these (and a whole bunch more not listed here):

  • (416)-555 4454
  • (416)-555-4454
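The two lists above can be checked directly in Python against the bracket-tolerant expression:

```python
import re

PAREN = re.compile(r"^[(]?[2-9]\d{2}[- )]?[ ]?\d{3}[- ]?\d{4}$")

should_match = [
    "4165554454", "416 555 4454", "416-555-4454", "416 555-4454",
    "416-5554454", "416-555 4454", "(416)5554454", "(416) 555 4454",
    "(416)555-4454", "(416) 555-4454", "(416)555 4454",
    # Unbalanced brackets that are accepted anyway:
    "(416555-4454", "(416-555-4454", "416)555-4454",
]
should_not_match = ["(416)-555 4454", "(416)-555-4454"]

accepted = [s for s in should_match if PAREN.search(s)]
rejected = [s for s in should_not_match if not PAREN.search(s)]

print(len(accepted), "of", len(should_match), "listed matches accepted")
print(len(rejected), "of", len(should_not_match), "listed non-matches rejected")
```

Both listed non-matches fail for the same reason: [- )]? can consume the closing bracket or the dash, but not both.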

Canadian telephone numbers have a three-digit area code, a three-digit central office code (or exchange code) and a four-digit station code.

Canadian Postal Code

^[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d$

You could also use the case-insensitive (i) flag so that the explicit upper- and lowercase ranges aren’t needed. To dig a little deeper, one person on the Internet commented that postal codes never start with characters that might be mistaken for numbers: I, O, Q, U, W, Z. I don’t know if this is true or not, but the first-letter class in the expression below does exclude all six of these letters.

Did you know that no valid Canadian postal code starts with the letter D? D is not the only letter a postal code cannot start with. Here is an expression that restricts the first letter.

^[ABCEGHJKLMNPRSTVXYabceghjklmnprstvxy]{1}\d{1}[A-Za-z]{1} *\d{1}[A-Za-z]{1}\d{1}$

The space followed by an asterisk ( *) in the above expression allows zero or more spaces between the two parts of the postal code. (The {1} quantifiers are redundant, since a single element matches exactly once by default.)
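A Python sketch comparing the two postal code expressions, plus the first one rewritten with the case-insensitive flag (re.IGNORECASE in Python). The sample codes are chosen only to exercise each rule.

```python
import re

# First expression: letter, digit, letter, optional space or dash, digit, letter, digit.
BASIC = re.compile(r"^[A-Za-z]\d[A-Za-z][ -]?\d[A-Za-z]\d$")

# The same shape using the case-insensitive flag instead of explicit A-Za-z ranges.
BASIC_I = re.compile(r"^[A-Z]\d[A-Z][ -]?\d[A-Z]\d$", re.IGNORECASE)

# Second expression: restricted first letter, zero or more spaces, no dash.
RESTRICTED = re.compile(
    r"^[ABCEGHJKLMNPRSTVXYabceghjklmnprstvxy]{1}\d{1}[A-Za-z]{1} *\d{1}[A-Za-z]{1}\d{1}$"
)

def check(code: str) -> tuple:
    return (
        BASIC.search(code) is not None,
        BASIC_I.search(code) is not None,
        RESTRICTED.search(code) is not None,
    )

for code in ["K1A 0B1", "k1a0b1", "K1A-0B1", "K1A   0B1", "D1A 0B1"]:
    print(code, check(code))
```

The restricted expression rejects the dash and the leading D but accepts several spaces, while the basic one does the opposite on all three.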

US Zip Codes

The expression below matches both US ZIP code formats: five digits, or ZIP+4 (e.g., “58175” or “58175-0021”).

^\d{5}(-\d{4})?$
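In Python, the optional group (-\d{4})? means the four-digit extension is accepted only together with its dash:

```python
import re

US_ZIP = re.compile(r"^\d{5}(-\d{4})?$")

def is_us_zip(s: str) -> bool:
    return US_ZIP.search(s) is not None

for z in ["58175", "58175-0021", "5817", "58175-002", "58175 0021", "581750021"]:
    print(z, is_us_zip(z))
```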

The one below matches either a US ZIP code or a Canadian postal code. Logical OR (alternation) is written with the | operator; here each alternative is wrapped in its own parentheses, in the form ( )|( ). Also, this version does not allow lowercase letters in the Canadian postal code.

(^\d{5}(-\d{4})?$)|(^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}$)
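A Python sketch that uses the combined expression and tells which alternative matched. Groups are numbered by their opening parenthesis, so group 1 is the US alternative, group 2 is the optional -\d{4} inside it, and group 3 is the Canadian alternative. The classify helper is our own illustration.

```python
import re

US_OR_CA = re.compile(
    r"(^\d{5}(-\d{4})?$)|(^[ABCEGHJKLMNPRSTVXY]{1}\d{1}[A-Z]{1} *\d{1}[A-Z]{1}\d{1}$)"
)

def classify(code: str) -> str:
    m = US_OR_CA.search(code)
    if m is None:
        return "no match"
    # The alternative that did not take part in the match has group value None.
    return "US" if m.group(1) is not None else "Canada"

for code in ["58175-0021", "K1A 0B1", "k1a 0b1", "D1A 0B1"]:
    print(code, classify(code))
```

The lowercase code is rejected because this version only allows A-Z, and the D code is rejected by the restricted first-letter class.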

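Parentheses () group a series of pattern elements into a single element and capture the text that element matched. You can then refer to the captured text by group number: inside the pattern as \1, \2, …, and in a replacement string as $1, $2, … in many tools (Python’s re.sub uses \1 or \g<1> instead).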
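A small Python sketch of both uses of a captured group. Note the Python-specific syntax: re.sub writes group references as \g<1> (or \1), where tools such as JavaScript use $1. The phone number and the sentence are made-up examples.

```python
import re

# Three capturing groups: area code, exchange code, station code.
DIGITS = re.compile(r"^(\d{3})(\d{3})(\d{4})$")

# In the replacement string, \g<1>, \g<2>, \g<3> insert what each group captured.
formatted = DIGITS.sub(r"(\g<1>) \g<2>-\g<3>", "4165554454")
print(formatted)  # (416) 555-4454

# Inside the pattern, \1 must match the same text that group 1 captured,
# so this finds a word that is immediately repeated.
doubled = re.search(r"\b(\w+) \1\b", "this is is a typo")
print(doubled.group(1))  # is
```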

Passwords

Below is the regular expression for six or more characters (for example, when checking a password).

.{6,}

Below is the expression for a password that must contain six or more characters, including at least one digit, one lowercase letter and one uppercase letter.

(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}

The ?= at the start of each parenthesized group makes it a positive lookahead: it checks that what follows matches without consuming any characters, so all three checks start from the same position. The period after the ?= means any character except newline, and the asterisk after the period means zero or more times. The \d is a digit from 0 to 9. So the first lookahead checks for anything or nothing (except newlines) followed by a digit from 0 to 9. The other two lookaheads work the same way for a lowercase and an uppercase letter, and .{6,} then matches the six or more characters themselves. If you wanted to make the password even more secure, you could require the user to also include at least one symbol from the following list of acceptable symbols: !@#$%^&*()_+ as shown in the expression below. This example can be seen at w3schools.com.

(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&*()_+]).{6,}
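The three password expressions side by side in Python. We use fullmatch() so the rules apply to the whole password; that is our assumption about how the expressions are meant to be used, and the sample passwords are made up.

```python
import re

AT_LEAST_SIX = re.compile(r".{6,}")
MIXED = re.compile(r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}")
WITH_SYMBOL = re.compile(r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&*()_+]).{6,}")

def check(password: str) -> tuple:
    # The lookaheads run at position 0 without consuming anything;
    # .{6,} then has to cover the entire password.
    return (
        AT_LEAST_SIX.fullmatch(password) is not None,
        MIXED.fullmatch(password) is not None,
        WITH_SYMBOL.fullmatch(password) is not None,
    )

for pw in ["abc", "abcdef", "Abcde1", "Abcde1!", "ABCDE1!"]:
    print(pw, check(pw))
```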

Email

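The email address must be in the following order: characters@characters.domain (characters, followed by an @ sign, followed by more characters, and then a “.”). After the “.”, you can only write 2 to 4 letters from a to z. Note that the pattern allows lowercase letters only, and that many top-level domains today are longer than four letters (for example, .museum).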

[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$
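A Python sketch of the email expression. Because the pattern ends with $ but has no leading ^, search() also finds an address at the end of longer text, so we use fullmatch() to test a whole string. The addresses use the reserved example.com domain.

```python
import re

EMAIL = re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}$")

def is_email_shape(s: str) -> bool:
    # fullmatch() requires the entire string to fit the pattern
    return EMAIL.fullmatch(s) is not None

for s in ["jane.doe@example.com", "jane@example.co", "Jane@Example.com",
          "jane@example.museum", "jane@example"]:
    print(s, is_email_shape(s))

# search() finds the address that ends the text.
found = EMAIL.search("contact: jane@example.com")
print(found.group(0))  # jane@example.com
```

"Jane@Example.com" fails only because of the uppercase letters; compiling with re.IGNORECASE would accept it. "jane@example.museum" fails because of the 2 to 4 letter limit.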
