Regular Expression Syntax

A comprehensive, reference-ready guide to regex metacharacters, quantifiers, groups, lookarounds, and performance best practices across modern engines.

Updated: Nov 12, 2025
Read Time: 14 min
Aevum Editorial Team

Regular expressions (regex) provide a declarative language for pattern matching, text searching, validation, and transformation. Most modern programming languages and text editors support them natively: the engine compiles a concise pattern string into an internal matcher and runs it against the input.

1. Introduction

A regular expression is a sequence of characters that defines a search pattern. Although rooted in formal language theory (regular languages), modern implementations go beyond that theoretical model: backreferences and recursive matching can describe languages that are not regular, and lookarounds add zero-width context checks.

Most contemporary engines follow either Perl-style syntax, as implemented by the PCRE (Perl-Compatible Regular Expressions) library, or the flavor defined in the ECMAScript standard. This article documents the shared core syntax, noting engine-specific variations where applicable.

2. Basic Metacharacters

Metacharacters carry special meaning within a regex pattern. To match one literally, escape it with a backslash (\).

Symbol  Meaning                       Example
.       Any character except newline  c.t → cat, cot, cut
^       Start of string/line          ^Hello → "Hello" at the beginning
$       End of string/line            end$ → "end" at the end
\       Escape character              \. → literal dot
|       Alternation (OR)              cat|dog → matches either
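
The rows above can be checked with a short Python sketch. The sample strings are made up for illustration, and re.escape is shown as one way to escape every metacharacter in a literal at once.

```python
import re

# "." matches any single character except a newline, so "c\nt" is skipped.
dot_hits = re.findall(r"c.t", "cat cot cut c\nt")   # ['cat', 'cot', 'cut']

# "^" and "$" anchor the match to the start and end of the string.
starts = bool(re.search(r"^Hello", "Hello world"))  # True
ends = bool(re.search(r"end$", "the end"))          # True

# An escaped dot matches only a literal "."
literal_dots = re.findall(r"\.", "a.b.c")           # ['.', '.']

# re.escape escapes every metacharacter in a literal string.
escaped = re.escape("a.b")                          # the pattern a\.b

# "|" tries each alternative in turn.
pets = re.findall(r"cat|dog", "cat, dog, bird")     # ['cat', 'dog']
```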

3. Character Classes & Ranges

Character classes match a single character from a specified set. Square brackets [] define custom classes, while shorthand sequences provide convenience.

Regex
[a-zA-Z0-9]     → alphanumeric
[0-9]           → digits only
[^abc]          → negation: any except a, b, c
\w \d \s        → word, digit, whitespace
\W \D \S        → negated counterparts

Engine Note
\w behavior varies: in ASCII mode it matches [a-zA-Z0-9_], while Unicode-aware engines include accented characters and non-Latin scripts.
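
As a rough illustration of the note above, Python's re module treats str patterns as Unicode-aware by default and switches to ASCII-only shorthand classes with the re.ASCII flag. The sample text is invented for the example.

```python
import re

# "naïve café_42", written with escapes so the accented letters are unambiguous.
text = "na\u00efve caf\u00e9_42"

# Default (Unicode-aware): accented letters count as word characters.
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the accented letters split the words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)   # ['na', 've', 'caf', '_42']

# A negated class matches any single character not listed.
not_abc = re.findall(r"[^abc]", "abcd")                  # ['d']
```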

4. Quantifiers & Repetition

Quantifiers specify how many times the preceding token should repeat. By default, they are greedy (match as much as possible). Append ? to make them lazy (match as little as possible).

Quantifier  Matches            Lazy Variant
*           0 or more          *?
+           1 or more          +?
?           0 or 1             ??
{n}         exactly n          n/a
{n,}        n or more          {n,}?
{n,m}       between n and m    {n,m}?
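
The difference between greedy and lazy matching is easiest to see side by side. The following Python sketch uses made-up input strings.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: ".*" runs to the last "</b>" it can reach.
greedy = re.findall(r"<b>.*</b>", html)    # ['<b>one</b> and <b>two</b>']

# Lazy: ".*?" stops at the first "</b>" that lets the match succeed.
lazy = re.findall(r"<b>.*?</b>", html)     # ['<b>one</b>', '<b>two</b>']

# Bounded repetition: greedy takes up to 3 digits, lazy takes only the minimum 2.
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")   # ['22', '333', '444']
bounded_lazy = re.findall(r"\d{2,3}?", "4444")      # ['44', '44']
```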

5. Anchors & Word Boundaries

Anchors match positions rather than characters. They are zero-width assertions that constrain where a pattern can match.

Regex
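\b       → word boundary (between \w and \W, or between \w and the start or end of the string)
\B       → non-word boundary
\A       → absolute start of string (not available in ECMAScript)
\Z       → end of string (Python: absolute end; PCRE, Java, .NET: also before a final newline, with \z as the absolute end; not available in ECMAScript)
(?=X)    → positive lookahead (X follows)
(?!X)    → negative lookahead (X does not follow)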
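
A small Python sketch of how these position assertions behave. The inputs are illustrative, and the \Z behavior shown is specific to Python's re module.

```python
import re

# \b requires a word/non-word transition on each side of "cat".
whole_words = re.findall(r"\bcat\b", "cat concat cat's")   # ['cat', 'cat']

# \B matches only where there is no boundary, i.e. inside a word.
inside = re.findall(r"\Bcat", "cat concat")                # ['cat'] (from "concat")

lines = "first\nsecond"

# Without MULTILINE, "^" matches only at the start of the string.
single = re.findall(r"^\w+", lines)                        # ['first']
# With MULTILINE, "^" also matches after each newline...
multi = re.findall(r"^\w+", lines, flags=re.MULTILINE)     # ['first', 'second']
# ...while \A still matches only at the very start.
absolute = re.findall(r"\A\w+", lines, flags=re.MULTILINE) # ['first']

# "$" tolerates one trailing newline; Python's \Z does not.
dollar = re.search(r"end$", "the end\n") is not None       # True
strict = re.search(r"end\Z", "the end\n") is not None      # False
```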

6. Groups & Capturing

Parentheses () group subpatterns and capture the matched text for later reference. Non-capturing groups (?:...) group without storing a capture, which keeps group numbering simple and avoids the bookkeeping overhead when no backreference is needed.

JavaScript
const text = "2025-11-12";
const match = text.match(/(\d{4})-(\d{2})-(\d{2})/);
// match[1] → "2025", match[2] → "11", match[3] → "12"

// Non-capturing: the alternation is grouped but not stored.
const scheme = /(?:http|https):\/\//i;

// Named groups (PCRE/JS ES2018+):
const named = text.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
// named.groups.year → "2025"
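
Captured text can also be reused inside the pattern itself or in a replacement string. The Python sketch below shows both, using invented sample strings.

```python
import re

# \1 inside the pattern must match the same text that group 1 captured,
# which makes it a simple way to find doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")   # ['is', 'test']

# In a replacement string, \1..\3 insert the captured groups,
# here reordering an ISO date into day/month/year.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2025-11-12")
# '12/11/2025'

# A non-capturing group does not occupy a group number,
# so findall returns only the host captured by the one real group.
hosts = re.findall(r"(?:http|https)://(\w+)", "http://alpha https://beta")
# ['alpha', 'beta']
```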

7. Lookarounds & Assertions

Lookarounds allow conditional matching without consuming characters. They are essential for context-aware extraction.

Assertion      Description          Example
(?=pattern)    Positive lookahead   \d+(?=px) → number before "px"
(?!pattern)    Negative lookahead   \b(?!foo)\w+\b → words not starting with "foo"
(?<=pattern)   Positive lookbehind  (?<=\$)\d+\.\d{2} → dollar amounts
(?<!pattern)   Negative lookbehind  (?<!\w)error → "error" at word start

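Compatibility
JavaScript did not support lookbehind assertions until ES2018. Many engines restrict what a lookbehind may contain: Python's built-in re module requires fixed-width lookbehinds, and PCRE has traditionally required each alternative to have a fixed length. .NET, JavaScript (ES2018+), and the third-party Python regex module accept variable-length lookbehinds.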
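
The four assertions from the table can be exercised in Python as follows. The input strings are made up, and the last step shows that the built-in re module rejects a variable-width lookbehind at compile time.

```python
import re

# Positive lookahead: digits only when "px" follows; "px" itself is not consumed.
px = re.findall(r"\d+(?=px)", "width: 100px; height: 50em; margin: 8px")
# ['100', '8']

# Negative lookahead: whole words that do not start with "foo".
not_foo = re.findall(r"\b(?!foo)\w+\b", "foo foobar bar food barfoo")
# ['bar', 'barfoo']

# Positive lookbehind: amounts only when a "$" comes immediately before.
prices = re.findall(r"(?<=\$)\d+\.\d{2}", "cost $19.99, was 24.50 or $5.00")
# ['19.99', '5.00']

# Negative lookbehind: "error" not preceded by a word character,
# so the "error" inside "terror" is skipped.
errors = re.findall(r"(?<!\w)error", "error terror errors")
# ['error', 'error']

# Python's re module requires fixed-width lookbehind patterns.
try:
    re.compile(r"(?<=\$\d+)x")
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True
```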

8. Practical Examples

Email Validation (Basic)

Regex
/^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$/

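Note: RFC 5322 allows far more complex email formats. This pattern is a readable approximation, not a validator: it accepts some malformed addresses (for example, consecutive dots) and rejects some valid ones (for example, a "+" in the local part).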
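
To see where the basic pattern draws its lines, here is a minimal Python sketch. The helper name looks_like_email and the sample addresses are hypothetical, and Python's \w is Unicode-aware by default, so results for non-ASCII input may differ from other engines.

```python
import re

# Same pattern as above, without the /.../ delimiters.
EMAIL = re.compile(r"^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$")


def looks_like_email(candidate: str) -> bool:
    """Return True if the candidate matches the basic pattern (hypothetical helper)."""
    return EMAIL.match(candidate) is not None


ok_plain = looks_like_email("user@example.com")          # True
ok_plus = looks_like_email("user+tag@example.com")       # False: "+" is not in the class
ok_dots = looks_like_email("a..b@example.com")           # True: consecutive dots slip through
ok_no_at = looks_like_email("no-at-sign.example.com")    # False
```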

Extracting Hex Colors

Python
import re
html = "<div style='color: #ff5733; background: #1a1a2e;'>"
colors = re.findall(r'#[0-9a-fA-F]{6}', html)
# ['#ff5733', '#1a1a2e']

Log Timestamp Parsing

Regex
/\[(?<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (?<level>\w+): (?<msg>.*)/
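
Applied in Python, the same pattern might look like the sketch below. Python's re module spells named groups (?P<name>...), and the log line is an invented sample in the format the pattern expects.

```python
import re
from datetime import datetime

# Python syntax for named groups is (?P<name>...).
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (?P<level>\w+): (?P<msg>.*)"
)

# Hypothetical log line for illustration.
line = "[2025-11-12 08:30:15] ERROR: disk quota exceeded"

m = LOG_LINE.match(line)
fields = m.groupdict() if m else {}
# {'ts': '2025-11-12 08:30:15', 'level': 'ERROR', 'msg': 'disk quota exceeded'}

# The captured timestamp can then be parsed into a datetime.
parsed = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S")
```

Lines that do not fit the format make match return None, so real code should check m before reading fields.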

9. Best Practices & Performance

10. References & Further Reading