
How to Use Regex: A Practical Guide to Regular Expressions

Regular expressions let you match patterns in text with precision. Here's the core syntax, the most useful patterns, and how to test and debug your expressions.

A regular expression (regex) is a pattern that describes a set of strings. Instead of searching for a literal word or phrase, you describe the shape of what you're looking for — things like "a digit followed by three letters" or "any email address" or "a line that starts with a hash." Regex is supported in virtually every programming language and in most text editors, making it one of the most transferable skills in software development.

The Building Blocks

Literal characters match themselves. The pattern cat matches the string "cat" wherever it appears.

The dot . matches any single character except a newline:

c.t  →  matches "cat", "cut", "c4t", "c t"

Character classes [] match any one character from a set:

[aeiou]    →  any vowel
[a-z]      →  any lowercase letter
[A-Za-z0-9]  →  any letter or digit
[^aeiou]   →  any character that is NOT a vowel (^ negates inside [])

Shorthand character classes:

  • \d — any digit (equivalent to [0-9] in most engines; some Unicode-aware engines also match non-ASCII digits)
  • \w — any word character (letters, digits, underscore)
  • \s — any whitespace (space, tab, newline)
  • \D, \W, \S — the negated versions
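As a rough illustration of the building blocks above, here is a minimal sketch using Python's standard re module. The sample strings and variable names are made up for the example.

```python
import re

# The dot matches any single character except a newline.
dot_matches = re.findall(r"c.t", "cat cut c4t c t coat")

# A character class matches exactly one character from the set.
vowels = re.findall(r"[aeiou]", "regex")

# \d matches a digit; \D matches anything that is not a digit.
digits = re.findall(r"\d", "a1b22")
non_digits = re.findall(r"\D", "a1b22")

print(dot_matches)  # ['cat', 'cut', 'c4t', 'c t']
print(vowels)       # ['e', 'e']
print(digits)       # ['1', '2', '2']
print(non_digits)   # ['a', 'b']
```

Note that "coat" is not matched: c.t allows exactly one character between the c and the t.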

Quantifiers

Quantifiers specify how many times the preceding element must match:

*     →  zero or more
+     →  one or more
?     →  zero or one (makes it optional)
{3}   →  exactly 3 times
{2,5} →  between 2 and 5 times
{3,}  →  3 or more times

Examples:

\d+      →  one or more digits ("42", "1000")
colou?r  →  "color" or "colour"
\w{3,8}  →  a run of 3 to 8 word characters (use \b\w{3,8}\b to require a whole word of that length)
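The quantifier examples can be tried out with a short Python sketch. The helper is_short_code is a hypothetical name introduced here only to show a bounded repetition.

```python
import re

# \d+ grabs each run of one or more digits.
numbers = re.findall(r"\d+", "42 apples, 1000 pears")

# u? makes the "u" optional, so both spellings match; "colr" does not.
spellings = re.findall(r"colou?r", "color colour colr")


def is_short_code(s: str) -> bool:
    """Hypothetical check: 2 to 5 lowercase letters and nothing else."""
    # fullmatch requires the pattern to cover the entire string.
    return re.fullmatch(r"[a-z]{2,5}", s) is not None


print(numbers)                  # ['42', '1000']
print(spellings)                # ['color', 'colour']
print(is_short_code("abc"))     # True
print(is_short_code("abcdef"))  # False
```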

Anchors

Anchors match a position in the string, not a character:

^   →  start of string (or start of line in multiline mode)
$   →  end of string (or end of line in multiline mode)
\b  →  word boundary

Examples:

^hello     →  "hello" only at the start
world$     →  "world" only at the end
\bcat\b   →  "cat" as a whole word, not inside "concatenate"
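A small Python sketch of the anchors in action, with illustrative input strings. The last line uses multiline mode to find lines that start with a hash.

```python
import re

# ^ and $ tie the match to the start or end of the string.
starts_with_hello = re.search(r"^hello", "hello world") is not None
hello_elsewhere = re.search(r"^hello", "say hello") is not None
ends_with_world = re.search(r"world$", "hello world") is not None

# \b matches between a word character and a non-word character,
# so the "cat" inside "concatenate" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concatenate cat.")

# In multiline mode, ^ matches at the start of every line.
hash_lines = re.findall(r"^#.*", "# one\ntext\n# two", flags=re.MULTILINE)

print(starts_with_hello, hello_elsewhere, ends_with_world)  # True False True
print(whole_words)  # ['cat', 'cat']
print(hash_lines)   # ['# one', '# two']
```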

Groups and Alternation

Parentheses () group parts of a pattern and capture the matched text:

(\d{4})-(\d{2})-(\d{2})  →  captures year, month, day from a date like "2026-04-03"
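In Python, the captured pieces can be read back from the match object, and referred to in a replacement string as \1, \2, \3. This is a sketch with a made-up input sentence; other languages expose captures through their own APIs.

```python
import re

date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

match = date_pattern.search("Released on 2026-04-03.")
# groups() returns the captured text in order: group 1, group 2, group 3.
year, month, day = match.groups()

# In a replacement string, \1..\3 refer back to the captured groups.
reordered = date_pattern.sub(r"\3/\2/\1", "2026-04-03")

print(year, month, day)  # 2026 04 03
print(reordered)         # 03/04/2026
```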

Alternation | works like a logical OR:

cat|dog    →  matches "cat" or "dog"
(jpg|png|webp)  →  matches any of the three extensions

Non-capturing groups (?:...) group without capturing, which is useful when you want alternation but don't need the matched text:

(?:https?|ftp)://  →  matches "http://", "https://", or "ftp://"
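One place the difference shows up in Python is re.findall, which returns only the captured text when the pattern contains a capturing group and the whole match otherwise. A minimal sketch with made-up file names and URLs:

```python
import re

files = "a.jpg b.gif c.png d.webp"

# Capturing group: findall returns just the group's text.
extensions = re.findall(r"\w+\.(jpg|png|webp)", files)

# Non-capturing group: findall returns the whole match.
filenames = re.findall(r"\w+\.(?:jpg|png|webp)", files)

# Alternation inside a non-capturing group, used as a prefix test.
urls = ["http://a", "https://b", "ftp://c", "mailto:d"]
with_scheme = [u for u in urls if re.match(r"(?:https?|ftp)://", u)]

print(extensions)   # ['jpg', 'png', 'webp']
print(filenames)    # ['a.jpg', 'c.png', 'd.webp']
print(with_scheme)  # ['http://a', 'https://b', 'ftp://c']
```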

Practical Patterns

Email address (simplified):

[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}

US phone number:

\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}

URL:

https?://[\w.-]+(?:\.[a-zA-Z]{2,})(?:/[^\s]*)?

Hex color code (3 or 6 hex digits):

#(?:[0-9A-Fa-f]{3}){1,2}\b

Date in YYYY-MM-DD format:

\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])
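To use patterns like these for validation rather than searching, one option in Python is fullmatch, which requires the pattern to cover the whole input. The sketch below reuses the email, phone, and date patterns as written above; is_valid and the sample values are hypothetical.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}")
PHONE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")
ISO_DATE = re.compile(r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])")


def is_valid(pattern: re.Pattern, value: str) -> bool:
    """Return True only if the pattern matches the entire value."""
    return pattern.fullmatch(value) is not None


print(is_valid(EMAIL, "ada@example.com"))    # True
print(is_valid(EMAIL, "not an email"))       # False
print(is_valid(PHONE, "(555) 123-4567"))     # True
print(is_valid(PHONE, "555-1234"))           # False
print(is_valid(ISO_DATE, "2026-04-03"))      # True
print(is_valid(ISO_DATE, "2026-13-01"))      # False
# The date pattern checks shape only, not the calendar.
print(is_valid(ISO_DATE, "2026-02-31"))      # True
```

These patterns check the shape of the input only. The last line shows the limit: February 31 passes, so a real date check still needs a date library.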

Flags

Most regex engines support flags that modify matching behavior:

  • i — case-insensitive (Cat matches cat, CAT, etc.)
  • g — global (find all matches, not just the first); this is a JavaScript-style flag, and some languages use a separate find-all function instead
  • m — multiline (^ and $ match per line, not just start/end of string)
  • s — dotall (makes . match newlines too)
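How flags are spelled varies by language. As one example, Python passes them as constants from the re module and has no g flag: re.search returns the first match and re.findall returns all of them. A sketch with a made-up three-line input:

```python
import re

text = "Cat\ncat\nCAT"

# i: case-insensitive matching.
any_case = re.findall(r"cat", text, flags=re.IGNORECASE)

# No g flag in Python: search stops at the first match, findall collects all.
first = re.search(r"cat", text).group()

# m: ^ and $ match at every line, not just the ends of the string.
per_line = re.findall(r"^cat$", text, flags=re.MULTILINE | re.IGNORECASE)
whole_string_only = re.findall(r"^cat$", text, flags=re.IGNORECASE)

# s: the dot also matches a newline.
dot_default = re.search(r"Cat.cat", text) is not None
dot_all = re.search(r"Cat.cat", text, flags=re.DOTALL) is not None

print(any_case)              # ['Cat', 'cat', 'CAT']
print(first)                 # cat
print(per_line)              # ['Cat', 'cat', 'CAT']
print(whole_string_only)     # []
print(dot_default, dot_all)  # False True
```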

Greedy vs. Lazy Matching

By default, quantifiers are greedy — they match as much as possible. Add ? to make them lazy (match as little as possible):

Input: <b>bold</b> and <i>italic</i>

<.+>   →  greedy: matches the entire string "<b>bold</b> and <i>italic</i>"
<.+?>  →  lazy: matches "<b>", then "</b>", then "<i>", then "</i>"

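Lazy matching is useful for delimited content such as tags or quoted strings. It does not make regex able to parse nested structures, though; for real HTML, use a proper parser.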
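The greedy and lazy results can be reproduced in Python with the same input. The third pattern, a negated character class, is a common alternative added here for comparison; it is not part of the example above.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the end, then backs up to the last ">".
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">" it can.
lazy = re.findall(r"<.+?>", html)

# Alternative: "anything except >" gives the same tags without laziness.
negated = re.findall(r"<[^>]+>", html)

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(negated)  # ['<b>', '</b>', '<i>', '</i>']
```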

Testing and Debugging

Complex regex patterns are notoriously hard to read. The best approach is to build them incrementally: start with the simplest version that partially works, then extend it. Use the Regex Tester to test patterns against sample input with live match highlighting, or the Find & Replace tool to apply regex substitutions to any block of text.
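The same incremental approach can be scripted. The sketch below keeps a small table of sample inputs with the expected outcome and reports which ones a draft pattern gets wrong; the names draft, final, CASES, and failures are hypothetical, and the final pattern is the US phone number pattern from earlier.

```python
import re

# Draft: the simplest version that partially works.
draft = re.compile(r"\d{3}-\d{3}-\d{4}")

# Extended: optional parentheses and other separators.
final = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

# Sample inputs mapped to whether they should match.
CASES = {
    "555-123-4567": True,
    "(555) 123-4567": True,
    "555.123.4567": True,
    "555-1234": False,
}


def failures(pattern: re.Pattern, cases: dict) -> list:
    """Return the inputs whose full-match result differs from the expectation."""
    return [
        text
        for text, expected in cases.items()
        if (pattern.fullmatch(text) is not None) != expected
    ]


print(failures(draft, CASES))  # ['(555) 123-4567', '555.123.4567']
print(failures(final, CASES))  # []
```

Each time the pattern is extended, rerunning the table shows whether the change fixed the remaining cases without breaking the ones that already passed.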
