Pumping Lemma for Regular Languages

The Pumping Lemma

All regular languages satisfy the pumping lemma:

LΣ, regular(L)    p1 such thatwL, wp    x,y,zΣ such thatw=xyzy1xypn0, xynzL \begin{alignat*}{2} \c{blue}{\forall L \subseteq}&\c{blue}{ \Sigma^*,\ \text{regular}(L)} \implies \c{magenta}{\exists p \ge 1} \text{ such that}\\ &\begin{alignat*}{2} \c{teal}{\forall w\in }&\c{teal}{ L,\ \left|w\right| \ge p} \implies \c{green}{\exists x, y, z \in \Sigma^*}\text{ such that}\\ &\begin{alignat*}{2} \c{green}{w = xyz} \hspace{0.01cm} \land \c{yellow}{\left|y\right| \ge 1} \hspace{0.01cm} \land \c{orange}{\left|xy\right| \le p} \hspace{0.01cm} \land \c{cyan}{\forall n \ge 0,\ xy^{n}z \in L}\\ \end{alignat*} \end{alignat*} \end{alignat*}

The pumping lemma states that if a language LL is regular, then there exists some integer p1p \ge 1 such that every string ww in LL at least pp in length can be split into substrings x,y,zx, y, z where:

  • The substring to be “pumped” yy has at least length 1
  • The first two substrings together xyxy has at most length pp
  • The result of pumping the substring yy any number of times is also in LL

If you now understand the lemma given the definition above, the post ends here for you! Otherwise, I hope to explain it in more detail below.

Inside Out

Looking at the innermost statement:

x,y,zΣ such thatw= xyzy1xypn0, xynzL \begin{alignat*}{2} \c{green}{\exists x, y, z \in}&\c{green}{ \Sigma^*} \text{ }\text{such that}\\ &\begin{alignat*}{2} \c{green}{w =}&\c{green}{\ xyz}\\ &\land \c{yellow}{\left|y\right| \ge 1}\\ &\land \c{orange}{\left|xy\right| \le p}\\ &\land \c{cyan}{\forall n \ge 0,\ xy^nz \in L}\\ \end{alignat*} \end{alignat*}

The statement shows that there exists some string ww that can be split into substrings x\c{magenta}{x}, y\c{red}{y}, and z\c{blue}{z}. The length of y\c{red}{y} must be at least 1 and the length of substring xy\c{magenta}{x}\c{red}{y} must be at most some constant pp. Additionally, pumping the string should still give a string in the language LL. Let’s assume the constant pp is the length of xy\c{magenta}{x}\c{red}{y} for now, and look at examples of pumping some string ww.

1. Regular expression
ab^*c
Accepted string w
abc
Pumped 0 times
abc
2. Regular expression
a(bc)^*d
Accepted string w
abcd
Pumped 0 times
abcd
3. Regular expression
a^*b^*c^*
Accepted string w
aaaabbbccc
Pumped 0 times
aaaabbbccc

The three examples shown above demonstrate that the substrings are chosen in a way that pumping substring yy still results in a string that is in the regular language LL.

Note that for the same regular expression and string, dividing the string into the wrong set of substrings can result in a string that is not in the language when pumped, as shown below for example 3:

4. Regular expression
a^*b^*c^*
Accepted string w
aaaabbbccc
Pumped 0 times
aaaabbbccc

Now consider the remainder of the statement:

LΣ, regular(L)    p1 such thatwL, wp     \begin{alignat*}{2} &\c{blue}{\forall L \subseteq \Sigma^*,\ \text{regular}(L)} \implies\\ &\begin{alignat*}{2} \c{magenta}{\exists p \ge}&\c{magenta}{ 1} \text{ such that}\\ &\begin{alignat*}{2} &\c{teal}{\forall w \in L,\ \left|w\right| \ge p} \implies\\ &\ldots \end{alignat*} \end{alignat*} \end{alignat*}

This shows that a regular language LL will have some constant pp such that every string w{w} that is at least p\c{magenta}{p} characters long can be pumped according to the rules above.

Therefore, if we can demonstrate that every constant p1p \ge 1 leads to some string ww (with length p\ge p) that cannot be pumped, we can prove that the language LL is non-regular.

Proving That a Language is Non-Regular

The classic non-regular language to prove using the pumping lemma is L={anbn:n0}L = \{a^n b^n : n \ge 0\} over the alphabet Σ={a,b}\Sigma = \{a, b\}.

We first assume that the language LL is regular, and therefore satistifes the pumping lemma. Our goal: Show that every constant p1p \ge 1 leads to some string ww (with length p\ge p) that cannot be pumped for a language LL.

We cannot pick a concrete value for pp, since we would then have to find ww for an infinite number of values, which is infeasible. Therefore, we need to come up with a string ww that is at least an unknown pp characters long. The smallest such string is a p/2b p/2a^{\ p/2} b^{\ p/2}. However, since we have freedom of choice, we’ll go with apbpa^{p}b^{p} for simplicity.

According to the pumping lemma, w=apbp=xyzw = \c{cyan}{a^p}\c{teal}{b^p} = \c{magenta}{x}\c{red}{y}\c{blue}{z} for some arbitrary x,y,zx, y, z. Since xyp\c{orange}{\left|xy\right| \le p} and ap\c{cyan}{a^p} has length p\c{orange}{p}, x\c{magenta}{x} and y\c{red}{y} can only consist of a\c{cyan}{a}. Additionally, since y1\c{yellow}{\left|y\right| \ge 1}, y\c{red}{y} must consist of at least 1 a\c{cyan}{a}.

Here we have our contradiction: Since yy only contains aa, pumping even once would result in aa and bb no longer having the same number of occurrences. The pumped string would no longer be in apbpa^p b^p.

5. Regular expression
a^pb^p
Accepted string w
aaabbb
Pumped 0 times
aaabbb

Therefore, L={anbn:n0}L = \{a^n b^n : n \ge 0\} is not a regular language.

Finite vs. Infinite Regular Languages

The pumping lemma is mainly used with infinite regular languages. For finite languages, the pumping lemma is vacuously true.

Consider any finite regular language (for example, abcdabcd ). To pick some integer pp, we can pick the integer m+1m + 1 where mm is the maximum length of a string (in the example, m+1=5m + 1 = 5 ). Since there can never be a string ww at least m+1m + 1 in length in LL, we don’t need to worry about the remainder of the lemma.

For infinite regular languages, there can never be a maximum string length, so we do have to consider the pumping lemma fully.