A regular expression (regex) is a pattern that describes a set of strings. Instead of searching for a literal word or phrase, you describe the shape of what you're looking for — things like "a digit followed by three letters" or "any email address" or "a line that starts with a hash." Regex is supported in virtually every programming language and in most text editors, making it one of the most transferable skills in software development.
The Building Blocks
Literal characters match themselves. The pattern cat matches the string "cat" wherever it appears.
The dot . matches any single character except a newline:
c.t → matches "cat", "cut", "c4t", "c t"
Character classes [] match any one character from a set:
[aeiou] → any vowel
[a-z] → any lowercase letter
[A-Za-z0-9] → any letter or digit
[^aeiou] → any character that is NOT a vowel (^ negates inside [])
Shorthand character classes:
\d → any digit (equivalent to [0-9])
\w → any word character (letters, digits, underscore)
\s → any whitespace (space, tab, newline)
\D, \W, \S → the negated versions
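The building blocks above can be tried out with a short Python sketch. It assumes Python's standard re module; the sample strings and variable names are made up for illustration.

```python
import re

# The dot matches any single character except a newline.
dot_matches = re.findall(r"c.t", "c4t cat cut coat")
dot_across_newline = re.findall(r"c.t", "c\nt")

# A character class matches exactly one character from the set.
vowels = re.findall(r"[aeiou]", "regex")

# A negated class matches one character that is NOT in the set.
non_vowels = re.findall(r"[^aeiou]", "regex")

# Shorthand classes: \d for digits, \s for whitespace.
digits = re.findall(r"\d", "a1b22")
pieces = re.split(r"\s", "one two\tthree")

print(dot_matches)         # ['c4t', 'cat', 'cut']
print(dot_across_newline)  # []
print(vowels)              # ['e', 'e']
print(non_vowels)          # ['r', 'g', 'x']
print(digits)              # ['1', '2', '2']
print(pieces)              # ['one', 'two', 'three']
```

Note that "coat" is not matched by c.t, because the dot stands for exactly one character.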
Quantifiers
Quantifiers specify how many times the preceding element must match:
* → zero or more
+ → one or more
? → zero or one (makes it optional)
{3} → exactly 3 times
{2,5} → between 2 and 5 times
{3,} → 3 or more times
Examples:
\d+ → one or more digits ("42", "1000")
colou?r → "color" or "colour"
\w{3,8} → a run of 3 to 8 word characters (on its own, this can also match part of a longer word)
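A minimal Python sketch of these quantifiers, assuming the standard re module. The sample strings are illustrative. re.fullmatch is used here to require that the entire string fits the pattern.

```python
import re

# \d+ : each run of one or more digits is a single match.
numbers = re.findall(r"\d+", "order 42 shipped 1000 units")

# colou?r : the "u" may appear zero or one time.
spellings = re.findall(r"colou?r", "color colour colr")

# \w{3,8} : matches up to 8 word characters at a time,
# so a longer word is split into several matches.
runs = re.findall(r"\w{3,8}", "extraordinarily")

# fullmatch succeeds only if the whole string fits the pattern.
fits = re.fullmatch(r"\w{3,8}", "regex") is not None
too_long = re.fullmatch(r"\w{3,8}", "extraordinarily") is not None

print(numbers)    # ['42', '1000']
print(spellings)  # ['color', 'colour']
print(runs)       # ['extraord', 'inarily']
print(fits)       # True
print(too_long)   # False
```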
Anchors
Anchors match a position in the string, not a character:
^ → start of string (or start of line in multiline mode)
$ → end of string (or end of line in multiline mode)
\b → word boundary
Examples:
^hello → "hello" only at the start
world$ → "world" only at the end
\bcat\b → "cat" as a whole word, not inside "concatenate"
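The anchor examples above could be checked in Python roughly as follows. This is a sketch assuming the standard re module, where multiline mode is spelled re.MULTILINE; the input strings are invented for the demonstration.

```python
import re

# ^ and $ anchor the match to the start and end of the string.
at_start = re.search(r"^hello", "hello world") is not None
not_at_start = re.search(r"^hello", "say hello") is not None
at_end = re.search(r"world$", "hello world") is not None

# In multiline mode, ^ and $ match at the start and end of every line.
comment_lines = re.findall(r"^#.*$", "# title\ncode\n# note", flags=re.MULTILINE)

# \b requires a word boundary on each side, so "cat" inside a longer word is skipped.
whole_words = re.findall(r"\bcat\b", "cat concatenate bobcat cat.")

print(at_start, not_at_start, at_end)  # True False True
print(comment_lines)                   # ['# title', '# note']
print(whole_words)                     # ['cat', 'cat']
```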
Groups and Alternation
Parentheses () group parts of a pattern and capture the matched text:
(\d{4})-(\d{2})-(\d{2}) → captures year, month, day from a date like "2026-04-03"
Alternation | works like a logical OR:
cat|dog → matches "cat" or "dog"
(jpg|png|webp) → matches any of the three extensions
Non-capturing groups (?:) group without capturing — useful when you want alternation but don't need to capture the result:
(?:https?|ftp):// → matches "http://", "https://", or "ftp://"
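To see the difference between capturing and non-capturing groups, here is a small Python sketch using the standard re module. The sample text is illustrative, and the extension pattern adds a leading \. and trailing \b that are not part of the pattern above.

```python
import re

# Capturing groups expose each parenthesized part of the match.
date_match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "released on 2026-04-03")
year, month, day = date_match.groups()

# With one capturing group, findall returns the captured text only.
extensions = re.findall(r"\.(jpg|png|webp)\b", "a.jpg b.png c.gif d.webp")

# With a non-capturing group, findall returns the whole match.
schemes = re.findall(r"(?:https?|ftp)://", "http://a https://b ftp://c mailto:d")

print(year, month, day)  # 2026 04 03
print(extensions)        # ['jpg', 'png', 'webp']
print(schemes)           # ['http://', 'https://', 'ftp://']
```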
Practical Patterns
Email address (simplified):
[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}
US phone number:
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
URL:
https?://[\w.-]+(?:\.[a-zA-Z]{2,})(?:/[^\s]*)?
Hex color code (3 or 6 hex digits):
#(?:[0-9A-Fa-f]{3}){1,2}\b
Date in YYYY-MM-DD format:
\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])
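When patterns like these are used for validation rather than searching, the whole input should fit the pattern. A minimal Python sketch follows, assuming the standard re module; the helper name is_valid and the constant names are hypothetical, chosen for this example. Keep in mind that the date pattern only checks the shape of the string, so an impossible date such as February 31 still passes.

```python
import re

# Patterns copied from the list above.
DATE = re.compile(r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])")
PHONE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")


def is_valid(pattern, candidate):
    """Return True only if the entire candidate string matches the pattern."""
    return pattern.fullmatch(candidate) is not None


print(is_valid(DATE, "2026-04-03"))       # True
print(is_valid(DATE, "2026-13-03"))       # False: month 13 is rejected
print(is_valid(DATE, "2026-02-31"))       # True: shape is fine, calendar is not checked
print(is_valid(PHONE, "(555) 123-4567"))  # True
print(is_valid(PHONE, "555.123.4567"))    # True
print(is_valid(PHONE, "555-1234"))        # False: too few digits
```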
Flags
Most regex engines support flags that modify matching behavior:
i → case-insensitive (Cat matches cat, CAT, etc.)
g → global (find all matches, not just the first; some languages expose this as a separate function rather than a flag)
m → multiline (^ and $ match per line, not just start/end of string)
s → dotall (makes . match newlines too)
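As one illustration of how flags look in practice, here is a Python sketch using the standard re module. In Python the i, m, and s flags are spelled re.IGNORECASE, re.MULTILINE, and re.DOTALL. There is no g flag; the choice between re.search (first match) and re.findall (all matches) plays that role. The sample text is made up.

```python
import re

text = "Cat\ncat\nCAT"

# i: ignore case. findall returns every match, like the g flag elsewhere.
all_cats = re.findall(r"cat", text, flags=re.IGNORECASE)

# search stops at the first match.
first_cat = re.search(r"cat", text, flags=re.IGNORECASE).group()

# m: ^ and $ match at each line; without it they only match at the string ends.
per_line = re.findall(r"^cat$", text, flags=re.IGNORECASE | re.MULTILINE)
whole_string_only = re.findall(r"^cat$", text, flags=re.IGNORECASE)

# s: the dot also matches a newline.
dot_default = re.search(r"Cat.cat", text) is not None
dot_all = re.search(r"Cat.cat", text, flags=re.DOTALL) is not None

print(all_cats)           # ['Cat', 'cat', 'CAT']
print(first_cat)          # Cat
print(per_line)           # ['Cat', 'cat', 'CAT']
print(whole_string_only)  # []
print(dot_default)        # False
print(dot_all)            # True
```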
Greedy vs. Lazy Matching
By default, quantifiers are greedy: they match as much as possible. Add ? after a quantifier to make it lazy (match as little as possible):
Input: <b>bold</b> and <i>italic</i>
<.+> → greedy: matches the entire string "<b>bold</b> and <i>italic</i>"
<.+?> → lazy: matches "<b>", then "</b>", then "<i>", then "</i>"
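Lazy matching is useful for pulling out short delimited fragments such as individual tags. It does not make regex able to parse arbitrarily nested structures, though; for real HTML, a dedicated parser is the safer choice.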
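The greedy and lazy results above can be reproduced with a short Python sketch, assuming the standard re module. The third pattern, a negated character class, is a common alternative that is not mentioned above; it is included only for comparison.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last ">" in the string.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">" it can.
lazy = re.findall(r"<.+?>", html)

# Alternative: match anything except ">" between the brackets.
negated_class = re.findall(r"<[^>]+>", html)

print(greedy)         # ['<b>bold</b> and <i>italic</i>']
print(lazy)           # ['<b>', '</b>', '<i>', '</i>']
print(negated_class)  # ['<b>', '</b>', '<i>', '</i>']
```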
Testing and Debugging
Regex is notoriously hard to read, especially complex patterns. The best approach is to build patterns incrementally — start with the simplest version that partially works, then extend it. Use the Regex Tester to test patterns against sample input with live match highlighting, or the Find & Replace tool to apply regex substitutions to any block of text.
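One way to build a pattern incrementally is to keep a small set of strings that should and should not match, and rerun them after every refinement. The Python sketch below assumes the standard re module; the sample lists, the step names, and the failures helper are all hypothetical, chosen for this illustration.

```python
import re

# Strings the finished pattern should accept and reject.
samples_ok = ["2026-04-03", "1999-12-31"]
samples_bad = ["2026-13-03", "2026-04-32", "26-04-03"]

# Step 1: rough shape only (digits separated by hyphens).
step1 = re.compile(r"\d+-\d+-\d+")
# Step 2: fix the width of each part.
step2 = re.compile(r"\d{4}-\d{2}-\d{2}")
# Step 3: restrict month to 01-12 and day to 01-31.
step3 = re.compile(r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])")


def failures(pattern):
    """Return the samples that the pattern classifies incorrectly."""
    wrong = [s for s in samples_ok if not pattern.fullmatch(s)]
    wrong += [s for s in samples_bad if pattern.fullmatch(s)]
    return wrong


print(failures(step1))  # ['2026-13-03', '2026-04-32', '26-04-03']
print(failures(step2))  # ['2026-13-03', '2026-04-32']
print(failures(step3))  # []
```

Each step shrinks the list of misclassified samples, which makes it easy to see what the latest change actually fixed.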