Pumping Lemma for Regular Languages

The Pumping Lemma

All regular languages satisfy the pumping lemma:

\begin{alignat*}{2} \c{blue}{\forall L \subseteq}&\c{blue}{ \Sigma^*,\ \text{regular}(L)} \implies \c{magenta}{\exists p \ge 1} \text{ such that}\\ &\begin{alignat*}{2} \c{teal}{\forall w\in }&\c{teal}{ L,\ \left|w\right| \ge p} \implies \c{green}{\exists x, y, z \in \Sigma^*}\text{ such that}\\ &\begin{alignat*}{2} \c{green}{w = xyz} \hspace{0.01cm} \land \c{yellow}{\left|y\right| \ge 1} \hspace{0.01cm} \land \c{orange}{\left|xy\right| \le p} \hspace{0.01cm} \land \c{cyan}{\forall n \ge 0,\ xy^{n}z \in L}\\ \end{alignat*} \end{alignat*} \end{alignat*}

The pumping lemma states that if a language $L$ is regular, then there exists some integer $p \ge 1$ such that every string $w$ in $L$ at least $p$ in length can be split into substrings $x, y, z$ where:

The substring to be "pumped" $y$ has at least length 1
The first two substrings together $xy$ has at most length $p$
The result of pumping the substring $y$ any number of times is also in $L$

If you now understand the lemma given the definition above, the post ends here for you! Otherwise, I hope to explain it in more detail below.

Inside Out

Looking at the innermost statement:

\begin{alignat*}{2} \c{green}{\exists x, y, z \in}&\c{green}{ \Sigma^*} \text{ }\text{such that}\\ &\begin{alignat*}{2} \c{green}{w =}&\c{green}{\ xyz}\\ &\land \c{yellow}{\left|y\right| \ge 1}\\ &\land \c{orange}{\left|xy\right| \le p}\\ &\land \c{cyan}{\forall n \ge 0,\ xy^nz \in L}\\ \end{alignat*} \end{alignat*}

The statement shows that there exists some string $w$ that can be split into substrings $\c{magenta}{x}$ , $\c{red}{y}$ , and $\c{blue}{z}$ . The length of $\c{red}{y}$ must be at least 1 and the length of substring $\c{magenta}{x}\c{red}{y}$ must be at most some constant $p$ . Additionally, pumping the string should still give a string in the language $L$ . Let's assume the constant $p$ is the length of $\c{magenta}{x}\c{red}{y}$ for now, and look at examples of pumping some string $w$ .

The three examples shown above demonstrate that the substrings are chosen in a way that pumping substring $y$ still results in a string that is in the regular language $L$ .

Note that for the same regular expression and string, dividing the string into the wrong set of substrings can result in a string that is not in the language when pumped, as shown below for example 3:

Now consider the remainder of the statement:

\begin{alignat*}{2} &\c{blue}{\forall L \subseteq \Sigma^*,\ \text{regular}(L)} \implies\\ &\begin{alignat*}{2} \c{magenta}{\exists p \ge}&\c{magenta}{ 1} \text{ such that}\\ &\begin{alignat*}{2} &\c{teal}{\forall w \in L,\ \left|w\right| \ge p} \implies\\ &\ldots \end{alignat*} \end{alignat*} \end{alignat*}

This shows that a regular language $L$ will have some constant $p$ such that every string ${w}$ that is at least $\c{magenta}{p}$ characters long can be pumped according to the rules above.

Therefore, if we can demonstrate that every constant $p \ge 1$ leads to some string $w$ (with length $\ge p$ ) that cannot be pumped, we can prove that the language $L$ is non-regular.

Proving That a Language is Non-Regular

The classic non-regular language to prove using the pumping lemma is $L = \{a^n b^n : n \ge 0\}$ over the alphabet $\Sigma = \{a, b\}$ .

We first assume that the language $L$ is regular, and therefore satistifes the pumping lemma. Our goal: Show that every constant $p \ge 1$ leads to some string $w$ (with length $\ge p$ ) that cannot be pumped for a language $L$ .

We cannot pick a concrete value for $p$ , since we would then have to find $w$ for an infinite number of values, which is infeasible. Therefore, we need to come up with a string $w$ that is at least an unknown $p$ characters long. The smallest such string is $a^{\ p/2} b^{\ p/2}$ . However, since we have freedom of choice, we'll go with $a^{p}b^{p}$ for simplicity.

According to the pumping lemma, $w = \c{cyan}{a^p}\c{teal}{b^p} = \c{magenta}{x}\c{red}{y}\c{blue}{z}$ for some arbitrary $x, y, z$ . Since $\c{orange}{\left|xy\right| \le p}$ and $\c{cyan}{a^p}$ has length $\c{orange}{p}$ , $\c{magenta}{x}$ and $\c{red}{y}$ can only consist of $\c{cyan}{a}$ . Additionally, since $\c{yellow}{\left|y\right| \ge 1}$ , $\c{red}{y}$ must consist of at least 1 $\c{cyan}{a}$ .

Here we have our contradiction: Since $y$ only contains $a$ , pumping even once would result in $a$ and $b$ no longer having the same number of occurrences. The pumped string would no longer be in $a^p b^p$ .

Therefore, $L = \{a^n b^n : n \ge 0\}$ is not a regular language.

Finite vs. Infinite Regular Languages

The pumping lemma is mainly used with infinite regular languages. For finite languages, the pumping lemma is vacuously true.

Consider any finite regular language (for example, $abcd$ ). To pick some integer $p$ , we can pick the integer $m + 1$ where $m$ is the maximum length of a string (in the example, $m + 1 = 5$ ). Since there can never be a string $w$ at least $m + 1$ in length in $L$ , we don't need to worry about the remainder of the lemma.

For infinite regular languages, there can never be a maximum string length, so we do have to consider the pumping lemma fully.