The first time I ever encountered a regular expression was many years ago now, but I still remember my first thoughts on it:
- What is this string-like thing?
- I don’t want to touch it, it looks scary.
I don’t remember quite what that regex was doing, or what exactly it looked like, but it scared me to death. In hindsight, I now realize that it wasn’t actually that scary after all. In fact, it was an easy way to solve the problem at hand. But why did I ever feel this way? It’s just the awkwardness of the syntax: regular expressions certainly look strange, and if you don’t know what they are, they look very complicated.
My intention here is not to scare you, because regex can be simple. But if you don’t understand regex just yet, a pattern could look a little daunting.
In this article, I’m going to demystify regular expressions. I’ll tell you what they look like, what they’re used for, and how you can design your regular expressions to solve problems.
So, first. What are regular expressions?
Regular expressions are a way to describe patterns in data strings. They have their own syntax, as if they were their own programming language, and there are methods and ways to interact with regular expressions in most (if not all) programming languages.
But what kind of patterns are we talking about? Common uses of regular expressions include determining whether a given string is an email address or a phone number, or verifying that a password meets certain complexity requirements.
Once you have the pattern, what can you do with the regular expressions?
- Validate a string with the pattern.
- Search within a string.
- Replace substrings in a string.
- Extract information from a string.
Working with regular expressions
I’m also going to cover how to work with regular expressions in JavaScript, though the concepts learned here apply to other languages as well. With that said, in other languages there may be some differences in the way they treat regular expressions.
Let’s look at an example that checks whether a string contains the word Hello.
In JavaScript, there are two ways to create the regular expression for this:
- Constructor
- Literal
Constructor
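A minimal sketch of the constructor form could look like this. The sample strings are invented for illustration:

```javascript
// Constructor form: the pattern is passed as a string.
// An optional second argument can carry flags such as 'i' or 'g'.
const regex = new RegExp('Hello');

console.log(regex.test('Hello World')); // true
console.log(regex.test('Goodbye'));     // false
```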
Literal
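The same check written as a literal might look as follows. Again, the sample strings are only assumptions for the sake of the example:

```javascript
// Literal form: the pattern is wrapped in slashes instead of quotes.
const regex = /Hello/;

console.log(regex.test('Hello World')); // true
console.log(regex.test('hello world')); // false, matching is case sensitive by default
```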
In both scenarios, the variable regex is an object, which exposes different methods we can use to interact with the regular expression. However, the first example has a more familiar look, instantiating an object with a string as a parameter. In the second scenario things look a bit weird: there’s something that resembles a string, but instead of quotes it is wrapped in /. As it turns out, both ways represent the same thing. I personally like the second option, which is very clean, and IDEs or code editors can apply syntax highlighting to the regular expression, whereas in the first scenario the pattern is defined as just a string.
So far, our regular expression has been fairly simple: it’s just an exact match on the string Hello, and it worked perfectly in JavaScript. However, the result we obtained can be different in other languages, even though the regular expression is the same. This is because each programming language can define certain defaults or special behaviors in its regular expressions, which can vary from one to another. Sorry about that, but that’s just how it is. For the most part, though, a regex we build will be the same in most programming languages. Before you use it somewhere else, you will have to test it and adjust it if necessary.
Different uses of regular expressions
When working with regular expressions we’re basically working with the RegExp object methods, or with string methods which allow us to interact with regular expressions.
RegExp.prototype.test()
The test() method executes a search for a match between a regular expression and a specified string. It returns true or false.
Here’s an example. First, see if the specified string contains the string foo:
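A small sketch of that check, assuming a made-up input string:

```javascript
const str = 'table football';

const hasFoo = /foo/.test(str);
const hasBar = /bar/.test(str);

console.log(hasFoo); // true, "football" contains "foo"
console.log(hasBar); // false
```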
RegExp.prototype.exec()
The exec() method executes a search for a match in a specified string. It returns a result array, or null.
Example: Look for all the instances of foo in the given string:
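One way to sketch this is a loop over exec() with the g flag. The input string and the indexes array are assumptions added for illustration:

```javascript
const str = 'table football, foosball';
const regex = /foo/g;

const indexes = [];
let result;
// With the g flag, each exec() call resumes from regex.lastIndex
// and returns null once there are no more matches.
while ((result = regex.exec(str)) !== null) {
  indexes.push(result.index);
  console.log(`Found ${result[0]} at index ${result.index}`);
}
// Found foo at index 6
// Found foo at index 16
```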
String.prototype.match()
The match() method retrieves the result of matching a string against a regular expression.
Example: Find all the capital letters in a string:
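A minimal sketch, assuming an invented sample sentence:

```javascript
const paragraph = 'The Quick Brown Fox';

// The g flag makes match() return every match instead of just the first one.
const capitals = paragraph.match(/[A-Z]/g);

console.log(capitals); // [ 'T', 'Q', 'B', 'F' ]
```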
String.prototype.matchAll()
The matchAll() method returns an iterator of all results matching a string against a regular expression, including capturing groups.
Example: Find occurrences of a string in groups:
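A sketch of how this could look, with a hypothetical input and pattern chosen so that each match carries three groups:

```javascript
const str = 'test1test2';

// matchAll() needs the g flag; each result also carries its captured groups.
const matches = [...str.matchAll(/t(e)(st(\d?))/g)];

for (const match of matches) {
  console.log(match[0], match[1], match[2], match[3]);
}
// test1 e st1 1
// test2 e st2 2
```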
String.prototype.search()
The search() method executes a search for a match between a regular expression and this string object. It returns the index at which the match happened, or -1 if there is no match.
Example: Find the position of any character that is not a word character or white space:
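A minimal sketch, assuming a made-up sentence as input:

```javascript
const paragraph = 'The quick brown fox jumps. Then it rests.';

// [^\w\s] matches anything that is neither a word character nor whitespace.
const index = paragraph.search(/[^\w\s]/);

console.log(index);            // 25
console.log(paragraph[index]); // .
```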
String.prototype.replace()
The replace() method returns a new string with some or all matches of a pattern replaced by a replacement. The pattern can be a string or a RegExp, and the replacement can be a string or a function to be called for each match. If the pattern is a string, only the first occurrence will be replaced.
Note that the original string will remain unchanged.
Example: Replace the word dog with monkey:
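A sketch of both pattern types, using an invented sentence. The variable names are only illustrative:

```javascript
const sentence = 'The dog chased another dog';

// A regex with the g flag replaces every match.
const withRegex = sentence.replace(/dog/g, 'monkey');
// A plain string pattern replaces only the first occurrence.
const withString = sentence.replace('dog', 'monkey');

console.log(withRegex);  // The monkey chased another monkey
console.log(withString); // The monkey chased another dog
console.log(sentence);   // The dog chased another dog (unchanged)
```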
Don’t be misled here: replace() only uses a regular expression when we pass one. When we pass a string, it is treated as literal text and only its first occurrence is replaced. That is why, in the second console.log, the word dog got replaced only once. But we will cover more on that later.
String.prototype.replaceAll()
The replaceAll() method returns a new string with all matches of a pattern replaced by a replacement. The pattern can be a string or a RegExp, and the replacement can be a string or a function to be called for each match.
Example: Replace the word ‘dog’ with ‘monkey’:
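A minimal sketch, assuming the same kind of invented sentence and a runtime that supports replaceAll():

```javascript
const sentence = 'The dog chased another dog';

// With a string pattern, every occurrence is replaced.
const fromString = sentence.replaceAll('dog', 'monkey');
// With a regex pattern, the g flag is mandatory (otherwise a TypeError is thrown).
const fromRegex = sentence.replaceAll(/dog/g, 'monkey');

console.log(fromString); // The monkey chased another monkey
console.log(fromRegex);  // The monkey chased another monkey
```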
It’s similar to our previous example, but now we replace all the matches. I usually avoid this function, as I can always do the same with the replace() function and a regular expression. Besides, replaceAll() is not supported on all platforms/browsers.
String.prototype.split()
The split() method divides a String into an ordered set of substrings, puts these substrings into an array, and returns the array. The division is done by searching for a pattern, which is provided as the first parameter in the method’s call.
Example:
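As an illustration, the sketch below splits a hypothetical list whose items are separated by commas or semicolons with irregular spacing:

```javascript
const list = 'apples, oranges;bananas ,  pears';

// Split on a comma or semicolon, swallowing any whitespace around it.
const parts = list.split(/\s*[,;]\s*/);

console.log(parts); // [ 'apples', 'oranges', 'bananas', 'pears' ]
```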
Building regular expressions
Now that we know how to work with regular expressions and the different methods that are available to interact with them, let’s spend some time building regular expressions to match the patterns we want.
Anchoring
/hello/
This will match hello wherever it appears inside the string. If you want to match strings that start with hello, use the ^ operator.
If you want to match strings that end with hello, use the $ operator.
You can also combine them to find exact matches.
To find strings with wildcards in the middle you can use .*, which matches any character repeated zero or more times.
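The anchoring cases above can be sketched as follows. The constant names and test strings are assumptions made for this example:

```javascript
const anywhere = /hello/;
const atStart = /^hello/;
const atEnd = /hello$/;
const exact = /^hello$/;
const wildcard = /^hello.*world$/;

console.log(anywhere.test('say hello there'));      // true
console.log(atStart.test('hello world'));           // true
console.log(atStart.test('say hello'));             // false
console.log(atEnd.test('say hello'));               // true
console.log(exact.test('hello'));                   // true
console.log(exact.test('hello world'));             // false
console.log(wildcard.test('hello big wide world')); // true
```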
Match items by character or numeric range
One very cool feature of regular expressions is the ability to match by character or numeric range. But what do I mean by range? Well, it’s something like [a-z], [A-Z], or [0-9].
These types of regex patterns match when at least one character in the string falls within the range.
You can also combine ranges, as in [A-Za-z0-9].
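A short sketch of ranges in action, with illustrative constant names and inputs:

```javascript
const lowercase = /[a-z]/;
const alphanumeric = /[A-Za-z0-9]/;

console.log(lowercase.test('a'));      // true
console.log(lowercase.test('1'));      // false
console.log(lowercase.test('A1b'));    // true, "b" is in the range
console.log(alphanumeric.test('A'));   // true
console.log(alphanumeric.test('#?!')); // false
```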
Negating a pattern
We saw that the ^ character at the beginning of a pattern anchors it to the beginning of a string. However, when used as the first character inside a range, it negates it, so the pattern matches any character that is not in the range.
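For instance, a negated range could be used like this (the inputs are made up for illustration):

```javascript
const notAlphanumeric = /[^A-Za-z0-9]/;

console.log(notAlphanumeric.test('abc123')); // false
console.log(notAlphanumeric.test('@'));      // true
console.log(notAlphanumeric.test('abc@'));   // true, "@" is outside the range
```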
Meta-characters
There are special characters in regular expressions. Some of them include:
- \d matches any digit, equivalent to [0-9]
- \D matches any character that’s not a digit, equivalent to [^0-9]
- \w matches any alphanumeric character (plus underscore), equivalent to [A-Za-z_0-9]
- \W matches any non-alphanumeric character, equivalent to [^A-Za-z_0-9]
- \s matches any whitespace character: spaces, tabs, newlines and Unicode spaces
- \S matches any character that’s not whitespace
- \0 matches a null character
- \n matches a newline character
- \t matches a tab character
- \uXXXX matches the Unicode character with code XXXX (the extended \u{XXXXX} form requires the u flag)
- . matches any character that is not a newline character (e.g. \n), unless you use the s flag, explained later on
- [^] matches any character, including newline characters. It’s useful on multiline strings
- \b matches a word boundary, that is, the position at the beginning or end of a word
- \B matches any position that is not a word boundary
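A few of these meta-characters can be tried out with a sketch like the one below. The constant names and sample strings are assumptions for the example:

```javascript
const hasDigit = /\d/;
const onlyWordChars = /^\w+$/;
const hasWhitespace = /\s/;
const wholeWordCat = /\bcat\b/;

console.log(hasDigit.test('room 42'));          // true
console.log(onlyWordChars.test('snake_case1')); // true
console.log(hasWhitespace.test('no-spaces'));   // false
console.log(wholeWordCat.test('a cat sat'));    // true
console.log(wholeWordCat.test('concatenate'));  // false, "cat" is inside a word

// The dot skips newlines, while [^] matches them too.
console.log(/a.b/.test('a\nb'));   // false
console.log(/a[^]b/.test('a\nb')); // true
```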
Regular expression choices (or)
If you want to search for one string or another, use the | operator:
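A minimal sketch, assuming two made-up alternatives:

```javascript
const greeting = /hey|ho/;

console.log(greeting.test('hey there')); // true
console.log(greeting.test('ho ho ho'));  // true
console.log(greeting.test('hi'));        // false
```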
Quantifiers
Quantifiers are special operators that specify how many times the preceding item may repeat. Here are some of them:
- ? is the optional quantifier. It matches zero or one (0 or 1) item. For example, it lets you check that a string holds at most one digit.
- + means one or more. It matches one or more (>=1) items.
- * means zero or more. It matches zero or more (>=0) items.
- {n} is a fixed number of matches. It matches exactly n items.
- {n,m} is a range of matches. It matches between n and m items.
- {n,} omits m. In that case, it matches at least n items.
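The quantifiers can be compared side by side with a sketch like this one. All patterns are anchored with ^ and $ so that the whole string has to match; the constant names are illustrative:

```javascript
const optionalDigit = /^\d?$/;
const oneOrMore = /^\d+$/;
const zeroOrMore = /^\d*$/;
const exactlyThree = /^\d{3}$/;
const twoToFour = /^\d{2,4}$/;
const atLeastTwo = /^\d{2,}$/;

console.log(optionalDigit.test(''));    // true
console.log(optionalDigit.test('1'));   // true
console.log(optionalDigit.test('12'));  // false
console.log(oneOrMore.test('123'));     // true
console.log(oneOrMore.test(''));        // false
console.log(zeroOrMore.test(''));       // true
console.log(exactlyThree.test('123'));  // true
console.log(exactlyThree.test('1234')); // false
console.log(twoToFour.test('123'));     // true
console.log(twoToFour.test('12345'));   // false
console.log(atLeastTwo.test('12345'));  // true
```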
Escaping
As we saw already, certain characters have a special meaning. But what if we want to match one of those characters literally? It’s possible to escape special characters with a backslash (\). Let’s see an example:
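As a sketch, the hypothetical pattern below matches a price such as $19.99, where both $ and . have to be escaped:

```javascript
const price = /^\$\d+\.\d{2}$/;

console.log(price.test('$19.99')); // true
console.log(price.test('$19x99')); // false, the escaped dot only matches a literal "."
console.log(price.test('19.99'));  // false, the literal "$" is missing

// In a string passed to the constructor, the backslash itself must be escaped.
const samePrice = new RegExp('^\\$\\d+\\.\\d{2}$');
console.log(samePrice.test('$19.99')); // true
```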
Groups
Using parentheses, you can create groups of characters: (...).
You can also use quantifiers (like the repetition or the optional quantifier) on a group:
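A minimal sketch of quantified groups, with invented patterns and inputs:

```javascript
const laugh = /^(ha)+$/;             // the whole group "ha" repeated one or more times
const greeting = /^hello( world)?$/; // the group " world" is optional

console.log(laugh.test('hahaha'));         // true
console.log(laugh.test('hah'));            // false
console.log(greeting.test('hello'));       // true
console.log(greeting.test('hello world')); // true
console.log(greeting.test('hello there')); // false
```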
Groups are also very interesting, especially when used with functions like match() and exec(), which we saw before: each group can be captured separately.
Example with exec():
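A sketch with a hypothetical pattern that has two groups:

```javascript
const result = /^(\d{3})-(\w+)$/.exec('123-abc');

console.log(result[0]); // 123-abc (the full match)
console.log(result[1]); // 123 (first group)
console.log(result[2]); // abc (second group)
```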
Example with match():
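The same hypothetical pattern, this time called from the string side. Note that the groups are only returned when the regex has no g flag:

```javascript
const result = '123-abc'.match(/^(\d{3})-(\w+)$/);

console.log(result[0]); // 123-abc (the full match)
console.log(result[1]); // 123 (first group)
console.log(result[2]); // abc (second group)
```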
Named capture groups
With ES2018 it’s now possible to assign names to groups, so that working with the results is much easier. Now, take a look at the following example without naming groups:
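A sketch without named groups, assuming a made-up date string. The groups can only be read by position:

```javascript
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const result = datePattern.exec('2015-01-02');

console.log(result[1]); // 2015
console.log(result[2]); // 01
console.log(result[3]); // 02
```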
Now using named groups:
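A minimal sketch with named groups; the names year, month, and day are chosen for this illustration:

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const result = datePattern.exec('2015-01-02');

// Named groups are exposed on the groups property of the result.
console.log(result.groups.year);  // 2015
console.log(result.groups.month); // 01
console.log(result.groups.day);   // 02
```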
Flags
As we saw in the constructor example, regular expressions have some flags which change the behavior of the matches:
- g: matches the pattern multiple times
- i: makes the regex case insensitive
- m: enables multiline mode. In this mode, ^ and $ match the beginning and end of each line. Without it, they match only the start and end of the whole string, even when the string spans multiple lines.
- u: enables support for Unicode (introduced in ES6/ES2015)
- s: short for single line, it causes the . to match newline characters as well
Flags can be combined, and in the case of regex literals they’re set at the end of the regex:
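A sketch of flags on a literal, with invented sample strings:

```javascript
const regex = /hello/gi;
const found = 'Hello hello HELLO'.match(regex);

console.log(found);       // [ 'Hello', 'hello', 'HELLO' ]
console.log(regex.flags); // gi

// m changes what ^ matches; s changes what . matches.
console.log(/^second/.test('first\nsecond'));  // false
console.log(/^second/m.test('first\nsecond')); // true
console.log(/a.b/s.test('a\nb'));              // true
```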
Or, with the constructor, as the second parameter of the function:
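The constructor variant might look like this, again with a made-up input:

```javascript
// The flags are passed together as a single string.
const regex = new RegExp('hello', 'gi');
const found = 'Hello hello'.match(regex);

console.log(found);                          // [ 'Hello', 'hello' ]
console.log(regex.global, regex.ignoreCase); // true true
```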
That was a lot. Enough with that, let’s see some cool examples:
Password strength
This checks a password’s strength, and it’s especially useful if you want to build your own password validator. I know this is subjective, as different services may have different needs, but it’s a great place to start.
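One possible sketch is shown below. The policy (at least 8 characters, with a lowercase letter, an uppercase letter, a digit, and a special character) is an assumption for this example, and the pattern relies on lookaheads, written (?=...), which this article does not cover:

```javascript
// Each (?=...) lookahead checks one requirement without consuming characters.
const strongPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z0-9]).{8,}$/;

console.log(strongPassword.test('Passw0rd!')); // true
console.log(strongPassword.test('password1')); // false, no uppercase or special character
console.log(strongPassword.test('Sh0rt!'));    // false, fewer than 8 characters
```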
Validating email address
This is probably one of the most famous cases for regular expressions, validating email addresses.
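A deliberately simple sketch follows. It only checks for the general shape something@domain.tld and is an assumption for illustration, not a complete validator for every valid address:

```javascript
// One or more non-space, non-@ characters, an @, a domain part, a dot, and a suffix.
const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

console.log(emailPattern.test('user@example.com'));     // true
console.log(emailPattern.test('user@example'));         // false, no dot in the domain
console.log(emailPattern.test('user name@example.com')); // false, contains a space
```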
IP Addresses
V4:
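A sketch for dotted-decimal IPv4 addresses, assuming each of the four parts must be a number from 0 to 255 without leading zeros:

```javascript
// One octet: 250-255, 200-249, 100-199, or 0-99.
const ipv4Pattern = /^(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$/;

console.log(ipv4Pattern.test('192.168.0.1')); // true
console.log(ipv4Pattern.test('256.1.1.1'));   // false, 256 is out of range
console.log(ipv4Pattern.test('1.2.3'));       // false, only three parts
```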
V6:
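A simplified sketch for IPv6. It only accepts the full form with eight groups of hexadecimal digits and, as a stated limitation, does not handle the shortened :: notation:

```javascript
// Seven groups of 1-4 hex digits followed by ":", then a final group.
const ipv6FullPattern = /^(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$/;

console.log(ipv6FullPattern.test('2001:0db8:85a3:0000:0000:8a2e:0370:7334')); // true
console.log(ipv6FullPattern.test('2001:db8::1'));                             // false, compressed form is not covered
console.log(ipv6FullPattern.test('192.168.0.1'));                             // false
```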
Pull domain from URL
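One way to sketch this is with a capture group that grabs everything after the optional protocol and www. prefix. The getDomain helper is hypothetical, and the pattern ignores edge cases such as user credentials in the URL:

```javascript
const domainPattern = /^(?:https?:\/\/)?(?:www\.)?([^\/:?#]+)/i;

// Hypothetical helper: returns the captured domain, or null if nothing matched.
function getDomain(url) {
  const match = url.match(domainPattern);
  return match ? match[1] : null;
}

console.log(getDomain('https://www.example.com/path?x=1')); // example.com
console.log(getDomain('http://blog.example.org:8080/'));    // blog.example.org
```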
Pull image source
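A sketch that captures the src attribute of the first img tag in a made-up HTML snippet. Regular expressions are fragile against arbitrary HTML, so treat this as an illustration of capture groups only:

```javascript
const imgSrcPattern = /<img[^>]*\ssrc=["']([^"']+)["']/i;

const html = '<p><img class="logo" src="/images/logo.png" alt="Logo"></p>';
const match = html.match(imgSrcPattern);
const src = match ? match[1] : null;

console.log(src); // /images/logo.png
```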
Credit card numbers
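A simplified sketch that recognizes two card types by common prefix and length. The detectCardType helper and the cardPatterns table are hypothetical, the sample values are non-real test-style numbers, and no checksum validation is performed:

```javascript
const cardPatterns = {
  visa: /^4\d{12}(?:\d{3})?$/,   // starts with 4, 13 or 16 digits
  mastercard: /^5[1-5]\d{14}$/,  // starts with 51-55, 16 digits
};

// Hypothetical helper: strips spaces and dashes, then tries each pattern.
function detectCardType(input) {
  const digits = input.replace(/[\s-]/g, '');
  for (const [type, pattern] of Object.entries(cardPatterns)) {
    if (pattern.test(digits)) return type;
  }
  return null;
}

console.log(detectCardType('4111 1111 1111 1111')); // visa
console.log(detectCardType('5555-5555-5555-4444')); // mastercard
console.log(detectCardType('1234'));                // null
```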
Regular expressions are a very powerful feature that can be intimidating at first. But once you get the hang of them, they’re pretty cool. Today we learnt what they are, how to use them, and how to build them, and we saw some cool examples. I hope the next time you see one of them in your projects you don’t run away (like I did), and you try to understand it and work with it.
This article was originally published on Live Code Stream by Juan Cruz Martinez, founder and publisher of Live Code Stream. He is a Software Engineer with more than 10 years of experience in the field, working in a wide variety of projects, from open source solutions to enterprise applications. Happily married, with a kid, officially engaged to JavaScript, in a love relationship with Python, and pursuing the writer’s dream! You can read this original piece here.