Regular Expressions Introduction


2025-08-07

Basic Definitions

Alphabet and Regular Expressions

Alphabet ( $\Sigma$ ): A finite set of symbols
Regular Expression: Built from alphabet symbols using:
- Union ( $+$ ): $r_1 + r_2$ matches either $r_1$ or $r_2$
- Concatenation ( $\cdot$ ): $r_1 \cdot r_2$ matches $r_1$ followed by $r_2$
- Kleene Star ( $*$ ): $r^*$ matches zero or more repetitions of $r$

Formal Definition

A regular expression over alphabet $\Sigma$ is defined recursively:

Base Cases:
- $\emptyset$ is a regular expression denoting the empty language
- $\epsilon$ is a regular expression denoting $\{\epsilon\}$
- For each $a \in \Sigma$ , $a$ is a regular expression denoting $\{a\}$
Inductive Cases:
- If $r_1$ and $r_2$ are regular expressions:
  - $(r_1 + r_2)$ is a regular expression
  - $(r_1 \cdot r_2)$ is a regular expression
  - $(r_1^*)$ is a regular expression
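
The recursive definition maps directly onto a small data type. The following is a minimal sketch, not part of the original notes: the class names (`Empty`, `Epsilon`, `Sym`, `Union`, `Concat`, `Star`) and the helper `language` are hypothetical, and `language` only enumerates strings up to a chosen length because the full language may be infinite.

```python
from dataclasses import dataclass

# Hypothetical names: one class per case of the recursive definition.
@dataclass(frozen=True)
class Empty:            # the expression for the empty language
    pass

@dataclass(frozen=True)
class Epsilon:          # the expression for {""}
    pass

@dataclass(frozen=True)
class Sym:              # a single alphabet symbol
    char: str

@dataclass(frozen=True)
class Union:            # (r1 + r2)
    left: object
    right: object

@dataclass(frozen=True)
class Concat:           # (r1 . r2)
    left: object
    right: object

@dataclass(frozen=True)
class Star:             # (r*)
    inner: object

def language(r, max_len):
    """Return every string of length <= max_len denoted by r."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Epsilon):
        return {""}
    if isinstance(r, Sym):
        return {r.char} if max_len >= 1 else set()
    if isinstance(r, Union):
        return language(r.left, max_len) | language(r.right, max_len)
    if isinstance(r, Concat):
        left = language(r.left, max_len)
        right = language(r.right, max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if isinstance(r, Star):
        base = language(r.inner, max_len)
        result, frontier = {""}, {""}
        # Keep appending one more repetition until nothing new appears.
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")
```

For instance, `language(Star(Union(Sym("a"), Sym("b"))), 2)` lists every string over $\{a, b\}$ of length at most 2.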

Precedence Rules

When parentheses are omitted, the following precedence applies (highest to lowest):

Kleene Star ( $*$ )
Concatenation ( $\cdot$ )
Union ( $+$ )

Tips

Use parentheses whenever the grouping could be misread. For example, $a + b \cdot c^*$ parses as $a + (b \cdot (c^*))$, not as $((a + b) \cdot c)^*$.

Examples

Simple Regular Expressions

Regular Expression	Language Description	Example Strings
$a$	Single 'a'	"a"
$a + b$	'a' or 'b'	"a", "b"
$a \cdot b$	'a' followed by 'b'	"ab"
$a^*$	Zero or more 'a's	"", "a", "aa", "aaa", ...
$(a + b)^*$	Any combination of 'a' and 'b'	"", "a", "b", "aa", "ab", "ba", "bb", ...
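
To experiment with these examples, the formal notation can be translated into Python's `re` syntax. This is a sketch under a few assumptions: union $+$ is written `|` in `re`, concatenation is juxtaposition, `(?:...)` is used for grouping, and `re.fullmatch` is used so that the whole string must match. The dictionary `PATTERNS` and the helper `matches` are illustrative names only.

```python
import re

# Formal notation -> Python `re` pattern (assumed translation).
PATTERNS = {
    "a": r"a",
    "a + b": r"a|b",
    "a . b": r"ab",
    "a*": r"a*",
    "(a + b)*": r"(?:a|b)*",
}

def matches(formal, s):
    """True if the whole string s belongs to the language of `formal`."""
    return re.fullmatch(PATTERNS[formal], s) is not None
```

Note that practical regex engines add many features (backreferences, lookahead) that go beyond the three formal operations used here.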

More Complex Examples

Regular Expression: $(a + b)^* \cdot a \cdot (a + b)^*$

This describes all strings over $\{a, b\}$ that contain at least one 'a'.

$(a + b)^*$ matches any prefix (including empty)
$a$ matches exactly one 'a'
$(a + b)^*$ matches any suffix (including empty)
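
One way to gain confidence in such a claim is to compare the expression against the plain-language specification on every short string. The sketch below assumes the Python `re` translation `(?:a|b)*a(?:a|b)*` and an arbitrary length bound of 6; `all_strings` is a hypothetical helper.

```python
import re
from itertools import product

CONTAINS_A = re.compile(r"(?:a|b)*a(?:a|b)*")

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

# Strings on which the regex and the specification disagree.
mismatches = [
    s for s in all_strings("ab", 6)
    if (CONTAINS_A.fullmatch(s) is not None) != ("a" in s)
]
```

An empty `mismatches` list is evidence, not a proof, since only strings up to the chosen bound are tested.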

Fundamental Properties

Union Properties

Commutative: $r_1 + r_2 = r_2 + r_1$
Associative: $(r_1 + r_2) + r_3 = r_1 + (r_2 + r_3)$
Identity: $r + \emptyset = r$

Concatenation Properties

Associative: $(r_1 \cdot r_2) \cdot r_3 = r_1 \cdot (r_2 \cdot r_3)$
Identity: $r \cdot \epsilon = \epsilon \cdot r = r$
Annihilator: $r \cdot \emptyset = \emptyset \cdot r = \emptyset$

Kleene Star Properties

Basic: $\emptyset^* = \epsilon^* = \epsilon$
Iteration: $(r^*)^* = r^*$
Closure: $\epsilon + r \cdot r^* = r^*$
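
These identities can be spot-checked by working with the languages themselves, cut off at a fixed length. The following is a sketch under stated assumptions: languages are Python sets of strings, `MAX_LEN` is an arbitrary bound, and the sample languages `r1`, `r2`, `r3` are chosen only for illustration.

```python
MAX_LEN = 5  # assumption: languages are compared only up to this length

def concat(l1, l2):
    """Concatenation of two languages, truncated at MAX_LEN."""
    return {x + y for x in l1 for y in l2 if len(x + y) <= MAX_LEN}

def star(l):
    """Kleene star of a language, truncated at MAX_LEN."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, l) - result
        result |= frontier
    return result

EMPTY = set()   # language of the expression for the empty language
EPS = {""}      # language of the expression epsilon
r1, r2, r3 = {"a"}, {"ab", "b"}, {"", "ba"}  # arbitrary sample languages

checks = {
    "union commutative": r1 | r2 == r2 | r1,
    "union associative": (r1 | r2) | r3 == r1 | (r2 | r3),
    "union identity": r1 | EMPTY == r1,
    "concat associative": concat(concat(r1, r2), r3) == concat(r1, concat(r2, r3)),
    "concat identity": concat(r2, EPS) == r2 == concat(EPS, r2),
    "concat annihilator": concat(r2, EMPTY) == EMPTY == concat(EMPTY, r2),
    "star basic": star(EMPTY) == star(EPS) == EPS,
    "star iteration": star(star(r2)) == star(r2),
    "star closure": EPS | concat(r2, star(r2)) == star(r2),
}
```

Every entry of `checks` should be `True`; a `False` entry would point to a misstated law or a bug in the helpers.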

Language Operations

Important Language Classes

Finite Languages: Can be expressed as a union of their strings (each string is a concatenation of symbols)
- $L = \{w_1, w_2, ..., w_n\} = w_1 + w_2 + ... + w_n$
Infinite Languages: Require Kleene star
- $L = \{a^n \mid n \geq 0\} = a^*$
- $L = \{a^n \mid n \geq 1\} = a \cdot a^*$
Concatenation of Languages
- If $L_1 = \{ab, a\}$ and $L_2 = \{b, bc\}$
- Then $L_1 \cdot L_2 = \{abb, abbc, ab, abc\}$
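
The concatenation example can be reproduced with a set comprehension. This is a minimal sketch for finite languages represented as Python sets; `concat_languages` is a hypothetical helper name.

```python
def concat_languages(l1, l2):
    """All strings xy with x taken from l1 and y taken from l2."""
    return {x + y for x in l1 for y in l2}

L1 = {"ab", "a"}
L2 = {"b", "bc"}
L1_L2 = concat_languages(L1, L2)
```

The result can have fewer than $|L_1| \cdot |L_2|$ elements when two different pairs produce the same string: `concat_languages({"a", "ab"}, {"c", "bc"})` has only three elements because "abc" arises both as "a" + "bc" and as "ab" + "c".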

Practical Considerations

Shorthand Notation

In practice, we often use:

$ab$ instead of $a \cdot b$
$r^+$ for one or more repetitions: $r^+ = r \cdot r^*$
$r?$ for optional elements: $r? = \epsilon + r$

Common Patterns

Pattern	Meaning	Example
$a^*$	Zero or more 'a's	"", "a", "aa", ...
$a^+$	One or more 'a's	"a", "aa", "aaa", ...
$a?$	Zero or one 'a'	"", "a"
$a^n$	Exactly n 'a's	"a" (when n=1), "aa" (when n=2)
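
The shorthand definitions can be checked against their expansions on a handful of strings. The sketch below assumes Python's `re` syntax, where `a{3}` plays the role of $a^n$ with $n = 3$ and `(?:|a)` spells out $\epsilon + a$; `same_language` is a hypothetical helper that compares two patterns only on the supplied candidates.

```python
import re

def same_language(p1, p2, candidates):
    """True if p1 and p2 accept exactly the same strings among candidates."""
    return all(
        (re.fullmatch(p1, s) is None) == (re.fullmatch(p2, s) is None)
        for s in candidates
    )

# A small, arbitrary test set: runs of a's plus a few strings containing b.
candidates = ["a" * n for n in range(6)] + ["b", "ab", "ba"]

plus_ok = same_language(r"a+", r"aa*", candidates)          # r+ = r . r*
optional_ok = same_language(r"a?", r"(?:|a)", candidates)   # r? = epsilon + r
power_ok = same_language(r"a{3}", r"aaa", candidates)       # a^3 = aaa
```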

Regular expressions provide a powerful and concise way to describe patterns in strings. By combining the three fundamental operations (union, concatenation, and Kleene star), we can express both finite and infinite languages with remarkable precision.

The key insight is that regular expressions correspond to the class of regular languages, which can be recognized by finite automata, forming an important equivalence in automata theory.

Kleene’s Theorem (overview)

Kleene’s Theorem states that a language is regular iff it is recognized by some finite automaton (DFA/NFA) iff it can be described by a regular expression. We use this equivalence throughout (regex→NFA via Thompson, NFA→DFA via subset construction, DFA→regex via GNFA state elimination).
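
To make one of these conversions concrete, here is a sketch of the subset construction (NFA→DFA) only. It assumes an NFA without ε-transitions, given as a dictionary from (state, symbol) to a set of states; the function names and the hand-built example NFA for "contains at least one a" are illustrative assumptions, not taken from the notes.

```python
def subset_construction(nfa_delta, start, accepting, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    start_set = frozenset([start])
    dfa_delta = {}
    seen = {start_set}
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        for symbol in alphabet:
            # Union of all NFA moves from any state in the current subset.
            target = frozenset(
                q for state in current
                for q in nfa_delta.get((state, symbol), ())
            )
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    dfa_accepting = {s for s in seen if s & accepting}
    return start_set, dfa_delta, dfa_accepting

def dfa_accepts(start, delta, accepting, word):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in accepting

# Hand-built NFA: state 0 guesses when the required 'a' occurs, state 1 has seen it.
NFA_DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {1}, (1, "b"): {1}}
DFA_START, DFA_DELTA, DFA_ACCEPTING = subset_construction(NFA_DELTA, 0, {1}, "ab")
```

The resulting DFA has two reachable states, `{0}` and `{0, 1}`, and accepts exactly the strings that contain an a.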

Problem-Solving Playbook

Clarify the alphabet and constraints
- List what is allowed/forbidden.
- Identify must-have substrings or positional constraints.
Choose building blocks
- Fixed pieces (literals), choices (union), repetition (Kleene star/plus), and optional parts.
Compose with precedence in mind
- Use parentheses to group; remember: Star > Concatenation > Union.
Sanity-check with examples and non-examples
- Generate a few strings that should match and should not match.
Simplify algebraically
- Apply idempotence, identities, and distributive laws to get a cleaner expression.

Worked Examples

Spec → Regex

Goal: Strings over {a,b} that end with ab or ba.

Build endings: ab + ba
Allow any prefix: (a + b)^*

Final: (a + b)^* (ab + ba)

Strings matched

Examples: ab, ba, aab, baba, abba

Non-examples: a, b, aa, bbb
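
The examples and non-examples can be checked mechanically. This sketch assumes the Python `re` translation `(?:a|b)*(?:ab|ba)` of the final expression; `matches` and `spec` are hypothetical helper names, and the length bound of 6 is arbitrary.

```python
import re
from itertools import product

ENDS_AB_OR_BA = re.compile(r"(?:a|b)*(?:ab|ba)")

examples = ["ab", "ba", "aab", "baba", "abba"]
non_examples = ["a", "b", "aa", "bbb"]

def matches(s):
    """True if the whole string s is matched by the regex."""
    return ENDS_AB_OR_BA.fullmatch(s) is not None

def spec(s):
    """The goal stated in words: s ends with ab or ba."""
    return s.endswith(("ab", "ba"))

# Every string over {a, b} up to length 6 on which regex and spec disagree.
mismatches = [
    w
    for n in range(7)
    for w in map("".join, product("ab", repeat=n))
    if matches(w) != spec(w)
]
```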

Simplify

No algebraic identity yields a simpler equivalent expression here, so keep (a + b)^* (ab + ba).

Parse Tree vs Postfix

Parse tree clarifies precedence; postfix (reverse Polish) simplifies Thompson’s construction.

Example: (a + b) a^* → postfix a b + a * ·, where · is the explicit concatenation operator.
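
The conversion to postfix can be sketched with a shunting-yard style loop. The code below is an illustration under assumptions, not a full parser: symbols are single characters, `+` is union, `*` is a postfix star (a `^` before it is ignored), concatenation is implicit in the input, and parentheses are assumed to be balanced. The names `add_concat` and `to_postfix` are hypothetical.

```python
PRECEDENCE = {"+": 1, "·": 2}  # star is postfix and binds tightest

def add_concat(tokens):
    """Insert explicit '·' tokens where concatenation is implied."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            left_ends_operand = tok not in "(+·"       # symbol, ')' or '*'
            right_starts_operand = nxt not in ")+*·"   # symbol or '('
            if left_ends_operand and right_starts_operand:
                out.append("·")
    return out

def to_postfix(expr):
    """Convert an infix regular expression to space-separated postfix."""
    tokens = [c for c in expr if not c.isspace() and c != "^"]
    output, stack = [], []
    for tok in add_concat(tokens):
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        elif tok in PRECEDENCE:
            # Pop operators of equal or higher precedence (left-associative).
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        else:
            # A symbol, or the postfix star, goes straight to the output.
            output.append(tok)
    while stack:
        output.append(stack.pop())
    return " ".join(output)
```

With this sketch, `to_postfix("(a + b) a^*")` gives `a b + a * ·`, and `to_postfix("a + b c*")` gives `a b c * · +`, which reflects the precedence Star > Concatenation > Union.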

Contains at least one a

Idea: any prefix, then an a, then any suffix.

Regex: (a + b)^* a (a + b)^*

Even number of a’s

Group the a’s into pairs, allowing any number of b’s before each a and after the last pair: (b^* a b^* a)^* b^*
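
By Kleene's Theorem the same language is recognized by a finite automaton, which gives an independent check. The sketch below compares an assumed Python `re` translation of the expression with a hand-written two-state DFA that tracks the parity of a's; the state names and the helper functions are illustrative assumptions.

```python
import re
from itertools import product

EVEN_AS = re.compile(r"(?:b*ab*a)*b*")

# Two-state DFA: the state records whether the number of a's so far is even.
DELTA = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}

def dfa_accepts(word):
    """Run the parity DFA; accept if the count of a's is even."""
    state = "even"
    for ch in word:
        state = DELTA[(state, ch)]
    return state == "even"

def disagreements(max_len):
    """Strings up to max_len on which the regex and the DFA differ."""
    return [
        w
        for n in range(max_len + 1)
        for w in map("".join, product("ab", repeat=n))
        if dfa_accepts(w) != (EVEN_AS.fullmatch(w) is not None)
    ]
```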

Exactly three b’s

Allow any number of a’s between the three b’s: a^* b a^* b a^* b a^*
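
The same shape extends to any fixed count of b's. As a sketch that goes slightly beyond the text, the hypothetical helper `exactly_k_bs` builds the Python `re` pattern for exactly k occurrences of b by repeating the block `ba*` k times after an initial `a*`.

```python
import re

def exactly_k_bs(k):
    """Pattern for strings over {a, b} with exactly k occurrences of b."""
    return "a*" + "ba*" * k

THREE_BS = re.compile(exactly_k_bs(3))
```

Here `exactly_k_bs(3)` is `a*ba*ba*ba*`, the `re` spelling of the expression above.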

Mini Exercises (with hints)

Contains substring aba and does not end with bb

All strings over {0,1} with no 11 as a substring

Strings over {a,b} where every b is immediately followed by a
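
A candidate answer to any of these exercises can be tested against the specification written as a plain Python predicate. The harness below is a sketch that deliberately gives no solutions: the predicate names and `counterexamples` are hypothetical, candidates are assumed to be written in Python `re` syntax, and only strings up to `max_len` are tried.

```python
import re
from itertools import product

def contains_aba_not_ending_bb(s):
    """Exercise 1 specification."""
    return "aba" in s and not s.endswith("bb")

def no_11(s):
    """Exercise 2 specification."""
    return "11" not in s

def every_b_followed_by_a(s):
    """Exercise 3 specification: each b has an a directly after it."""
    return all(s[i + 1 : i + 2] == "a" for i, ch in enumerate(s) if ch == "b")

def counterexamples(pattern, spec, alphabet, max_len):
    """Strings up to max_len on which the candidate pattern and spec disagree."""
    compiled = re.compile(pattern)
    return [
        w
        for n in range(max_len + 1)
        for w in map("".join, product(alphabet, repeat=n))
        if (compiled.fullmatch(w) is not None) != spec(w)
    ]
```

For example, the deliberately wrong candidate `(?:0|1)*` for the second exercise yields counterexamples such as "11", since it accepts every string.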

Tips

When designing a regex, write 3–5 positive and 3–5 negative examples first. This quickly reveals missing edge cases.

Changelog

9/7/25, 2:51 AM


87c17-web-deploy(Auto): Update base URL for web-pages branch on 9/7/25

Copyright

Licensed under: Attribution-NonCommercial-NoDerivatives 4.0 International (CC-BY-NC-ND-4.0)