---
title: Regular Expressions
tags: Programming Concepts, Python Cheatbook
---
# Regular Expressions (RegEx)
Regular expressions, also known as *RegEx*, are search patterns for text. They are one of the most important tools data wranglers use to search, filter and validate text data in their applications.
Let's say I have the text
> Quick brown fox jumps over a quick fox.
Let's search for the text `quick` (case-insensitive)
> ==Quick== brown fox jumps over a ==quick== fox.
:::success
2 matches
:::
That was easy: we got 2 matches. But now we want to match 'Quick' only at the start of the text. What do we do?
1. Search for `Quick` with match-case enabled. This might fail if the text contains grammar mistakes such as wrong capitalisation.
1. Use the regex search `^quick`, where the `^` meta-character says to match the following **keyword** only at the start of the text.
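
The second option can be tried out in Python. This is only a minimal sketch, assuming the standard `re` module, with `re.IGNORECASE` standing in for the case-insensitive option; the variable names are illustrative.

```python
import re

text = "Quick brown fox jumps over a quick fox."

# Plain case-insensitive search: finds every occurrence.
all_matches = re.findall(r"quick", text, flags=re.IGNORECASE)

# Anchored search: `^` only allows a match at the start of the text.
start_matches = re.findall(r"^quick", text, flags=re.IGNORECASE)

print(all_matches)    # ['Quick', 'quick']
print(start_matches)  # ['Quick']
```
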
Before we start, it is important to note that each language has a slightly different implementation of regex. **Most** implementations share a common set of features, which are covered in the basics.
In the following sections we will look at basic regex usage with examples. :star: indicates popular regex, based on my assumptions.
# Anchors `^` `$` `\b` `\B`
:star: Search: `^quick` (case-insensitive)
> ==Quick== brown fox jumps over a quick fox.
:::success
1 match
:::
Search: `fox$` (case-insensitive)
> Quick brown fox jumps over a quick fox.
:::danger
No match. This failed because there is a full stop at the end of the text. Update the regex to include the full stop: `fox\.$`.
:::
:star: Search: `fox\.$` (case-insensitive)
> Quick brown fox jumps over a quick ==fox.==
:::success
1 match. Use `\` to **escape special regex characters**. `.` is a special regex character (more on it below), so it must be escaped in this case.
:::
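
The `$` anchor and the escaped full stop can be checked the same way. A small sketch, again assuming Python's `re` module; the variable names are my own.

```python
import re

text = "Quick brown fox jumps over a quick fox."

# `fox$` fails: the text ends with ".", not with "fox".
no_match = re.search(r"fox$", text, flags=re.IGNORECASE)

# `\.` matches a literal full stop, so the pattern now reaches the end.
end_match = re.search(r"fox\.$", text, flags=re.IGNORECASE)

print(no_match)           # None
print(end_match.group())  # fox.
```

Writing the pattern as a raw string (`r"..."`) keeps Python from interpreting the backslash before the regex engine sees it.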
Search: `fox\b` (case-insensitive)
> Quick brown ==fox== jumps over a quick **foxy**
:::success
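1 match. `\b` is a **word boundary match**: it matches the position where a word character (letter, digit or underscore) meets a non-word character or the start or end of the text, without consuming any character. The `fox` in **foxy** is not matched because it is followed by the letter `y`.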
:::
Search: `fox\B` (case-insensitive)
> Quick brown fox jumps over a quick **==fox==y**
:::success
1 match. `\B` is a **non-word boundary match**: it matches any position that is not a word boundary, here the position between the `x` and the `y` of **foxy**.
:::
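
To see that `\b` and `\B` pick opposite occurrences, the sketch below records where each pattern matches. It assumes Python's `re` module; the variable names are illustrative.

```python
import re

text = "Quick brown fox jumps over a quick foxy"

# `fox\b`: "fox" must be followed by a word boundary (here, a space).
boundary_starts = [
    m.start() for m in re.finditer(r"fox\b", text, flags=re.IGNORECASE)
]

# `fox\B`: "fox" must NOT be followed by a word boundary (here, the "y").
non_boundary_starts = [
    m.start() for m in re.finditer(r"fox\B", text, flags=re.IGNORECASE)
]

print(boundary_starts)      # [12] -> the standalone "fox"
print(non_boundary_starts)  # [35] -> the "fox" inside "foxy"
```

Both anchors are zero-width: the matched text is just `fox` in each case, and only the position that follows it differs.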
<!-- todo remaining article -->
# Examples
Search for N-letter words in a document.
Search: `\b[A-Za-z]{N}\b` (where N is an integer)
The anchored form `^[A-Za-z]{N}$` instead checks that the whole text is a single N-letter word.
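
A hedged Python sketch of this example follows. The helper names `n_letter_words` and `is_n_letter_word` are hypothetical, and the `\b` variant is my own substitution for finding words inside longer text, since `^...$` only succeeds when the whole text is one N-letter word.

```python
import re

def n_letter_words(text, n):
    """Return every whole word made of exactly n ASCII letters."""
    # Doubled braces produce a literal {n} quantifier inside the f-string.
    pattern = rf"\b[A-Za-z]{{{n}}}\b"
    return re.findall(pattern, text)

def is_n_letter_word(text, n):
    """Check that the text as a whole is n ASCII letters.

    Note: in Python, `$` also tolerates a single trailing newline.
    """
    pattern = rf"^[A-Za-z]{{{n}}}$"
    return re.search(pattern, text) is not None

sentence = "Quick brown fox jumps over a quick fox."
print(n_letter_words(sentence, 3))   # ['fox', 'fox']
print(is_n_letter_word("fox", 3))    # True
print(is_n_letter_word("a fox", 3))  # False
```
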
# Resources
1. Regulex helps visualize regex patterns: https://jex.im/regulex/#!flags=&re=%5E(a%7Cb)*%3F%24
2. Compile and test regex: https://regexr.com/