Regular Expressions in Python
  • To use regular expressions in python you need to import re module
  • Regular expressions are used for pattern matching

Simple Match

  • Searches for the given pattern in that string and return True or False
  • If you want to ignore case you can use re.I
# Simple matching
string: input("Enter a String: ")

if (re.search('Python', string)): # search for python in string
    print ("Python satisfied")
else:
    print ("Python not satisfied")

if (re.search('python', string, re.I)): # search for pyhton in string ignroe case
    print ("python satisfied")
else:
    print ("python satisfied")

Anchors

  • ^ Represents starts with
  • $ Represents ends with
if (re.search("^python", string)): # Matches string if python is at the start of the sentence
    print ("^python satisfied")
else:
    print ("^python not satisfied")

if (re.search("python$", string)): # Matches string if python at the end of the sentence
    print ("python$ satisfied")
else:
    print ("python$ not satisfied")

if (re.search("^python$", string)): # Matches if input should be only the word python
    print ("^python$ satisfied")
else:
    print ("^python$ not satisfied")

Range of Characters

  • [0-9]:> Matches single digit
  • [a-z]:> Matches single lowercase alphabet
  • [A-Z]:> Matches single uppercase alphabet
  • [a-zA-Z]:> Matches any single alphabet
  • [0-9a-zA-Z]:> Matches single alphanumeric
if (re.search("[0-9][a-z][A-Z]", string)): # Matches string anywhere in the sentence which has a three letter word whoose first lette ris num , 2nd small char, 3rd caps char
    print ("[0-9][a-z][A-Z] satisfied")
else:
    print ("[0-9][a-z][A-Z] not satisfied")

if (re.search("^[0-9][a-z][A-Z]$", string)): # input should exactly be a three letter word whoose first lette ris num , 2nd small char, 3rd caps char
    print ("^[0-9][a-z][A-Z]$ satisfied")
else:
    print ("^[0-9][a-z][A-Z]$ not satisfied")

Meta characters

  • + Matches one or more occurrences of the previous character
  • * Matches zero or more occurrences of the previous character
  • ? Matches zero or one occurrence of the previous character
  if (re.search("^ab+c$", string)): # between a nd c atleast one b should be present
    print ("^ab+c$ - satisfied")
  else:
    print ("^ab+c$ - not satisfied")

  if (re.search("^ab*c$", string)): # between a nd c zero or more b's should be present
    print ("^ab*c$ - satisfied")
  else:
    print ("^ab*c$ - not satisfied")

  if (re.search("^ab?c$", string)): # between a nd c zero or one b should be present
    print ("^ab?c$ - satisfied")
  else:
    print ("^ab?c$ - not satisfied")

Quantifiers

  • {m} Matches exactly ’m’ occurrences of the previous character
  • {m,n} Matches minimum ’m’ and maximum ’n’ occurrences of the previous character
if (re.search("^ab{3}c$", string)): # only three b's should be present between a and c
    print ("^ab{3}c$ - satisfied")
  else:
    print ("^ab{3}c$ - not satisfied")

  if (re.search("^ab{1,3}c$", string)):# minimum one and maximum three b's should be present between a and c
    print ("^ab{1,3}c$ - satisfied")
  else:
    print ("^ab{1,3}c$ - not satisfied")

DOT Character

  • . Matches any single character
if (re.search("^a.c$", string)): # Any single character can be present between a and c
  print ("^a.c$ - satisfied")
else:
  print ("^a.c$ - not satisfied")

if (re.search("^a.*c$", string)): # Any number of characters can be present between a and c
  print ("^a.*c$ - satisfied")
else:
  print ("^a.*c$ - not satisfied")

Grouping

  • If you want to operate on a group of character you can enclose them in ()
# Groupings
if (re.search("^(ab){3}c$", string)): # valid for abababc
  print ("^(ab){3}c$ - satisfied")
else:
  print ("^(ab){3}c$ - not satisfied")

Character Range Escape Sequences

  • \d:> [0-9] -> Matches single digit
  • \D:> [^0-9] -> Matches single other than digit
  • \w:> [0-9a-zA-Z_]
  • \W:> [^0-9a-zA-Z_]
  • \s:> Spaces and Tabs
  • \S:> Other than Spaces and Tabs

Choices and Alternatives

  • [abc]:> Matches any one of ‘a’, ‘b’ and ‘c’
  • [^abc]:> Matches other than ‘a’, ‘b’ and ‘c’
if (re.search("^a[123]c$", string)):
  print ("^a[123]c$ - satisfied")
else:
  print ("^a[123]c$ - not satisfied")

if (re.search("^a[^123]c$", string)):
  print ("^a[^123]c$ - satisfied")
else:
  print ("^a[^123]c$ - not satisfied")

Difference between search and match

  • Search looks for the pattern anywhere in the given string and matches it
  • Match looks for the pattern only at the begining of the string
if (re.search("\d{3}", string)):
  print ("Search - Satisfied")
else:
  print ("Search - Not Satisfied")

if (re.match("\d{3}", string)):
  print ("Match - Satisfied")
else:
  print ("Match - Not Satisfied")

Findall

  • Returns all the values that are matched with the pattern in the form of list
all_matched: re.findall("\d{3}", string)
print (all_matched, '->', type(all_matched))

Split

  • Splits the string based on that pattern
  • See regex_demo.py in the code repository detailed for examples
spt_str: re.split("\d{3}", string)
print (spt_str, '->', type(spt_str))

Sub

  • Substitutes the pattern with the 555 in the given string
rep_str: re.sub("\d{3}", "555", string)
print (rep_str, '->', type(rep_str))