
Properties of Regular Expressions

Last Updated : 18 Nov, 2024

Regular expressions, often called regex or regexp, are a powerful tool used to search, match, and manipulate text. They are essentially patterns made up of characters and symbols that allow you to define a search pattern for text.

In this article, we will look at the basic properties of regular expressions, how they work, and how they help in real-world applications.

What is a Regular Expression?

A regular expression is a way of representing a regular language; regular expressions give an algebraic description of regular languages. They can define exactly the same languages that the various forms of finite automata describe. Regular expressions offer something that finite automata do not: a declarative way to express the strings that we want to accept. They serve as the input for many systems that process strings, and string matching based on them is available in many programming languages (Java, Python, etc.).

Example: Lexical-analyzer generators, such as Lex or Flex.

The widely used operators in regular expressions are Kleene closure(∗), concatenation(.), and Union(+).


Rules for Regular Expressions

The set of regular expressions is defined by the following rules.

  • ∅ and the null string ε are regular expressions, and every letter of ∑ is a regular expression.
  • If r1 and r2 are regular expressions, then (r1), r1.r2, r1+r2, r1* and r1⁺ are also regular expressions.

Example – ∑ = {a, b} and r is a regular expression of a language made using these symbols

Regular expression   Regular set
    ∅                { }
    ε                {ε}
    a*               {ε, a, aa, aaa, …}
    a + b            {a, b}
    a.b              {ab}
    a* + ba          {ε, a, aa, aaa, …, ba}
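
As a rough illustration, the rows of this table can be checked with Python's `re` module. The translation from the theory notation to Python syntax used here (union `+` becomes `|`, concatenation `.` becomes plain juxtaposition) and the helper name `in_language` are assumptions made for this sketch. Note that in Python patterns `+` and `.` mean something different (one-or-more, and any character), which is why the translation is needed.

```python
import re

# Theory notation -> Python `re` syntax (assumed translation for this sketch):
#   union          r1 + r2  ->  r1|r2
#   concatenation  r1.r2    ->  r1r2
#   Kleene closure r*       ->  r*
PATTERNS = {
    "a*": r"a*",
    "a + b": r"a|b",
    "a.b": r"ab",
    "a* + ba": r"a*|ba",
}

def in_language(theory_expr, s):
    """Return True if the whole string s belongs to the language of theory_expr."""
    return re.fullmatch(PATTERNS[theory_expr], s) is not None

print(in_language("a*", ""))         # True: the empty string is in a*
print(in_language("a + b", "ab"))    # False: only "a" and "b" are in a + b
print(in_language("a* + ba", "ba"))  # True
```

`re.fullmatch` is used rather than `re.search` because membership in a language is a statement about the whole string, not about a substring.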

Operations Performed on Regular Expressions

1. Union

The union of two regular languages L1 and L2, written L1 ∪ L2, is also regular. It is the set of strings that are in L1, in L2, or in both.

Example:

L1 = (1+0).(1+0) = {00, 10, 11, 01} and
L2 = {ε, 100}
then L1 ∪ L2 = {ε, 00, 10, 11, 01, 100}.
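
For finite languages, this example can be reproduced with Python sets. This is only a sketch: representing a language as a `set` of strings, with `""` standing for the empty string, is an assumption of the example.

```python
# Finite languages as Python sets of strings; "" plays the role of the empty string.
L1 = {x + y for x in "10" for y in "10"}   # the language of (1+0).(1+0)
L2 = {"", "100"}

union = L1 | L2   # set union corresponds to L1 ∪ L2

# Sort by length, then alphabetically, for a readable listing.
print(sorted(union, key=lambda s: (len(s), s)))
# ['', '00', '01', '10', '11', '100']
```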

2. Concatenation

The concatenation of two regular languages L1 and L2, written L1.L2, is also regular. It is the set of strings formed by taking any string in L1 and concatenating it with any string in L2.

Example:

L1 = { 0,1 } and L2 = { 00, 11} then L1.L2 = {000, 011, 100, 111}.
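
A minimal sketch of the same computation for finite languages, again assuming languages are modelled as Python sets of strings (the helper name `concat` is hypothetical):

```python
def concat(A, B):
    """Concatenation of two finite languages: every x in A followed by every y in B."""
    return {x + y for x in A for y in B}

L1 = {"0", "1"}
L2 = {"00", "11"}

print(sorted(concat(L1, L2)))  # ['000', '011', '100', '111']
```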

3. Kleene closure

If L1 is a regular language, then its Kleene closure L1* is also regular. It is the set of strings formed by concatenating zero or more strings taken from L1, where the same string may be used any number of times.

Example:

L1 = {0, 1}, then L1* = {ε, 0, 1, 00, 01, 10, 11, …}, i.e. all strings possible with symbols 0 and 1, including the null string.
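
Because the Kleene closure is usually infinite, a program can only list it up to some length bound. The following is a sketch under that assumption; the function name `kleene_star` and the `max_len` parameter are hypothetical and not part of the original article.

```python
def kleene_star(L, max_len):
    """Return the strings of L* whose length is at most max_len."""
    result = {""}      # zero copies of strings from L give the empty string
    frontier = {""}
    while frontier:
        # Extend the strings found in the previous round by one more string from L,
        # keeping only new strings that respect the length bound.
        frontier = {x + y for x in frontier for y in L
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

print(sorted(kleene_star({"0", "1"}, 2), key=lambda s: (len(s), s)))
# ['', '0', '1', '00', '01', '10', '11']
```

Calling `kleene_star(set(), 3)` returns `{""}`, which matches the fact that the closure of the empty language contains only the empty string.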

Algebraic Properties of Regular Expressions

Kleene closure is a unary operator, while union (+) and concatenation (.) are binary operators.

1. Closure

If r1 and r2 are regular expressions(RE), then 

  • r1*  is a RE
  • r1+r2 is a RE
  • r1.r2 is a RE

2. Closure Laws

  • (r*)* = r*, closing an expression that is already closed does not change the language.
  • ∅* = ε, because the closure always contains the concatenation of zero strings, which is the empty string, even when there are no strings to choose from.
  • r⁺ = r.r* = r*.r, as r* = ε + r + rr + rrr + … and r.r* = r + rr + rrr + …
  • r* = r* + ε, since r* already contains ε.

3. Associativity

If r1, r2, r3 are RE, then 
i.) r1 + (r2 + r3) = (r1 + r2) + r3

  • For example, let r1 = a, r2 = b, r3 = c.
  • The LHS becomes a + (b + c), and the regular set for this RE is {a, b, c}.
  • The RHS becomes (a + b) + c, and the regular set for this RE is also {a, b, c}. Therefore, the associativity property holds for the union operator.

ii.) r1.(r2.r3) = (r1.r2).r3

  • For example, let r1 = a, r2 = b, r3 = c.
  • The only string accepted by the LHS, a.(b.c), is abc.
  • The only string accepted by the RHS, (a.b).c, is also abc. Therefore, the associativity property holds for the concatenation operator.

Associativity does not apply to Kleene closure (*) because it is a unary operator.

4. Identity

In the case of the union operator,

r + ∅ = ∅ + r = r

Therefore, ∅ is the identity element for the union operator.

In the case of the concatenation operator,

r.x = x.r = r holds for x = ε, i.e. r.ε = ε.r = r

Therefore, ε is the identity element for the concatenation operator (.).

5. Annihilator

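  • An annihilator for + would be an x such that r + x = x for every r. Neither ∅ nor ε has this property (r + ∅ = r, and r + ε ≠ ε whenever r matches a non-empty string), so neither constant is an annihilator for +. Over a fixed alphabet ∑, only ∑*, the set of all strings, absorbs every r under union.
  • In the case of the concatenation operator, r.x = x.r = x holds for x = ∅, i.e. r.∅ = ∅.r = ∅; therefore ∅ is the annihilator for the (.) operator. For example, {a, aa, ab}.{ } = { }.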
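
The identity and annihilator laws can be tried out on a small finite language. This is a sketch that assumes languages are modelled as Python sets of strings; the names `concat`, `EMPTY_SET` and `EMPTY_STRING` are hypothetical.

```python
def concat(A, B):
    """Concatenation of two finite languages."""
    return {x + y for x in A for y in B}

R = {"a", "aa", "ab"}
EMPTY_SET = set()       # the language with no strings at all
EMPTY_STRING = {""}     # the language containing only the empty string

print((R | EMPTY_SET) == R)               # True: the empty set is the identity for union
print(concat(R, EMPTY_STRING) == R)       # True: the empty string is the identity for concatenation
print(concat(R, EMPTY_SET) == EMPTY_SET)  # True: the empty set annihilates concatenation
```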

6. Commutative Property

If r1, r2 are RE, then 

  • r1 + r2 = r2 + r1. For example, for r1 = a and r2 = b, the REs a + b and b + a are equal.
  • r1.r2 ≠ r2.r1 in general. For example, for r1 = a and r2 = b, the RE a.b is not equal to b.a.

7. Distributive Property

If r1, r2, r3 are regular expressions, then 

  • (r1+r2).r3 = r1.r3 + r2.r3, i.e. right distribution
  • r1.(r2+r3) = r1.r2 + r1.r3, i.e. left distribution
  • (r1.r2) + r3 ≠ (r1+r3).(r2+r3) in general, i.e. union does not distribute over concatenation
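
A small check of these three statements on concrete languages, as a sketch only. It assumes r1 = a, r2 = b, r3 = c and models each language as a Python set of strings; the helper `concat` is hypothetical.

```python
def concat(A, B):
    """Concatenation of two finite languages."""
    return {x + y for x in A for y in B}

r1, r2, r3 = {"a"}, {"b"}, {"c"}

# Right distribution: (r1+r2).r3 = r1.r3 + r2.r3
print(concat(r1 | r2, r3) == concat(r1, r3) | concat(r2, r3))  # True

# Left distribution: r1.(r2+r3) = r1.r2 + r1.r3
print(concat(r1, r2 | r3) == concat(r1, r2) | concat(r1, r3))  # True

# Union does not distribute over concatenation.
lhs = concat(r1, r2) | r3        # {"ab", "c"}
rhs = concat(r1 | r3, r2 | r3)   # {"ab", "ac", "cb", "cc"}
print(lhs == rhs)                # False
```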

8. Idempotent Law

  • r1 + r1 = r1 ⇒ r1 ∪ r1 = r1, therefore the union operator satisfies the idempotent property.
  • r.r ≠ r in general ⇒ the concatenation operator does not satisfy the idempotent property.

9. Identities for Regular Expression

There are many identities for regular expressions. Let p, q and r be regular expressions.

  • ∅ + r = r
  • ∅.r= r.∅ = ∅
  • ε.r = r.ε = r
  • ε* = ε and ∅* = ε
  • r + r = r
  • r*.r* = r*
  • r.r* = r*.r = r⁺
  • (r*)*  =  r*
  • ε + r.r* = r* = ε + r*.r
  • (p.q)*.p = p.(q.p)*
  • (p + q)* = (p*.q*)* = (p* + q*)*
  • (p+ q).r= p.r+ q.r and r.(p+q) = r.p + r.q
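
Some of these identities can be spot-checked by brute force: translate both sides into Python `re` patterns and compare them on every string up to a small length. This is a sketch, not a proof. The helper `same_language`, the choice p = a and q = b, and the length bound are all assumptions of the example, and agreement on short strings only suggests (but does not establish) that two expressions are equivalent.

```python
import re
from itertools import product

def same_language(p, q, alphabet="ab", max_len=6):
    """Compare two Python patterns on every string over `alphabet` up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(p, s)) != bool(re.fullmatch(q, s)):
                return False   # found a string accepted by only one side
    return True

# (p.q)*.p = p.(q.p)*  with p = a, q = b
print(same_language(r"(?:ab)*a", r"a(?:ba)*"))      # True

# (p + q)* = (p*.q*)* = (p* + q*)*
print(same_language(r"(?:a|b)*", r"(?:a*b*)*"))     # True
print(same_language(r"(?:a|b)*", r"(?:a*|b*)*"))    # True

# r*.r* = r*
print(same_language(r"a*a*", r"a*"))                # True

# A non-identity for contrast: r.r is not r
print(same_language(r"aa", r"a"))                   # False
```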

Conclusion

In conclusion, regular expressions are a versatile and powerful tool for working with text. They allow you to search, match, and manipulate patterns efficiently, making them invaluable in tasks like data validation, text searching, and automated editing. Mastering regular expressions can greatly enhance your efficiency and problem-solving capabilities. The flexibility and power they offer make them an essential skill in many fields.

Properties of Regular Expressions – FAQs

What are some common uses of regular expressions?

Regular expressions are commonly used for tasks such as pattern matching, text searching, data validation (e.g., email or phone numbers), extracting information from large datasets, and formatting or replacing text in files.

What is the difference between greedy and non-greedy quantifiers in regular expressions?

Greedy quantifiers match as much of the input as possible, while non-greedy quantifiers match as little as possible. For example, in the pattern .*, the * is greedy and will match as many characters as possible, whereas .*? is non-greedy and will match the fewest characters necessary.
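
A short illustration using Python's `re` module; the sample text is made up for this example.

```python
import re

text = "<b>one</b><b>two</b>"

# Greedy: .* grabs as much as possible, so the match runs to the last </b>.
greedy = re.search(r"<b>.*</b>", text).group()

# Non-greedy: .*? stops at the first point where the rest of the pattern can match.
lazy = re.search(r"<b>.*?</b>", text).group()

print(greedy)  # <b>one</b><b>two</b>
print(lazy)    # <b>one</b>
```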

What are lookahead and lookbehind assertions in regular expressions?

Lookahead ((?=…)) and lookbehind ((?<=…)) are zero-width assertions: a lookahead checks that a specific pattern follows the current position, and a lookbehind checks that it precedes the current position, without including those characters in the final match. They are useful for complex pattern matching where certain criteria need to be checked but not captured.
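
As a brief sketch in Python (the sample text is invented for the example):

```python
import re

text = "price: $42, qty: 7"

# Lookbehind: digits that are preceded by "$"; the "$" itself is not part of the match.
amounts = re.findall(r"(?<=\$)\d+", text)

# Lookahead: words that are followed by ":"; the ":" itself is not part of the match.
labels = re.findall(r"\w+(?=:)", text)

print(amounts)  # ['42']
print(labels)   # ['price', 'qty']
```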

How do regular expressions handle special characters?

Special characters in regular expressions (like . for any character, * for zero or more matches, or \ for escaping) keep their special meaning unless they are escaped with a backslash (\), in which case they are treated as literals. For example, to match a literal period, you would use \. in the pattern.
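
A minimal Python sketch of the difference between an escaped and an unescaped period. It also uses `re.escape`, which builds a pattern that matches a given string literally; the sample strings are assumptions of the example.

```python
import re

# Unescaped "." matches any character, so "abc" is accepted.
print(bool(re.fullmatch(r"a.c", "abc")))    # True

# Escaped "\." matches only a literal period.
print(bool(re.fullmatch(r"a\.c", "abc")))   # False
print(bool(re.fullmatch(r"a\.c", "a.c")))   # True

# re.escape adds the backslashes needed to treat a whole string literally.
literal = re.escape("1+1=2")
print(bool(re.fullmatch(literal, "1+1=2")))  # True
print(bool(re.fullmatch(literal, "11=2")))   # False: "+" is no longer a quantifier
```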


