Hello and welcome! This is a quick tutorial on a regex (short for regular expression) that matches a URL.
Regular expressions are sequences of characters that define a search pattern. Take the following example of a regular expression, which we’ll call “Matching a URL”:
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
- The characters '^' and '$' are both considered to be anchors.
- The '^' anchor requires the match to start at the beginning of the string, and the '$' anchor requires it to end at the end of the string. Together they force the whole string, not just part of it, to fit the pattern.
- This regex alternates between literal characters and metacharacters.
- It also alternates between ranges and exact matches. Ranges are found inside square brackets:
[insert range here]
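To see what the anchors contribute, here is a minimal sketch that compares the tutorial's regex with a copy that has '^' and '$' removed. The names urlRegex, unanchored, clean, and noisy are made up for this illustration.

```javascript
// The URL pattern from this tutorial, anchored with ^ and $.
const urlRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical variant with both anchors removed, for comparison only.
const unanchored = /(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?/;

const clean = 'https://github.com/paths/';
const noisy = 'visit https://github.com/paths/ today!';

console.log(urlRegex.test(clean));   // true: the whole string is a URL
console.log(urlRegex.test(noisy));   // false: extra text before and after
console.log(unanchored.test(noisy)); // true: a URL appears somewhere inside
```

With the anchors in place the pattern validates a whole string. Without them it only finds a URL somewhere in a longer text.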
This regex uses 7 quantifiers in total, 4 of which are unique: ?, +, {,} and *
- ? matches 0 or 1 of the preceding token, which makes that token optional. In ^(https?:\/\/)? the first ? makes the s optional, so both http and https match, and the ? after the group makes the whole protocol optional. Placed directly after another quantifier (as in +? or *?), a ? instead makes that quantifier lazy, so it matches as few characters as possible. This regex does not use it that way.
- + matches 1 or more of the preceding token. ([\da-z\.-]+) matches one or more characters, each of which is a digit 0-9, a lowercase letter a-z, a '.' or a '-'. It accepts most domain-style input, but an uppercase letter or any other special character ends the match.
- {min,max} matches between min and max of the preceding token. With {2,6} in this example there must be at least 2 characters but no more than 6.
- * matches 0 or more of the preceding token.
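Each quantifier can be tried on its own. In this small sketch the sub-patterns are lifted out of the URL regex and wrapped in '^' and '$' so that each test covers the whole input. The results object and the sample inputs are assumptions for illustration.

```javascript
// Each entry tests a handful of inputs against one quantifier in isolation.
const results = {
  // ? : the s is optional, but only one s is allowed
  optional: ['http', 'https', 'httpss'].map((s) => /^https?$/.test(s)),
  // + : at least one character from the class is required
  oneOrMore: ['github', 'my-site.io', ''].map((s) => /^[\da-z\.-]+$/.test(s)),
  // {2,6} : between 2 and 6 characters from the class
  range: ['c', 'com', 'museum', 'website'].map((s) => /^[a-z\.]{2,6}$/.test(s)),
  // * : zero characters is acceptable
  zeroOrMore: ['', '/paths/file.html'].map((s) => /^[\/\w \.-]*$/.test(s)),
};

console.log(results.optional);   // [ true, true, false ]
console.log(results.oneOrMore);  // [ true, true, false ]
console.log(results.range);      // [ false, true, true, false ]
console.log(results.zeroOrMore); // [ true, true ]
```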
Character classes, also known as character sets, match any one character out of several. With a hyphen, they can match a range of characters. The classes, or sets, are written inside square brackets [] (also known as a bracket expression).
- There are 3 character classes in this regular expression.
- [\da-z\.-] matches a single character that is a digit 0-9 (from \d), a letter in the range a through z (case sensitive, so lowercase only), a '.' (from \.) or a '-'.
- [a-z\.] matches a single character that is a letter in the range a through z (case sensitive) or a '.'.
- [\/\w \.-] matches a single character that is a '/', a word character from \w (a letter, a digit or an underscore, in both capital and lowercase), a space, a '.' or a '-'.
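A quick way to check what each class accepts is to run it over a string of sample characters, one at a time. The helper accepted and the sample string are hypothetical names for this sketch.

```javascript
// The three character classes, each anchored so it tests exactly one character.
const domainClass = /^[\da-z\.-]$/;
const tldClass = /^[a-z\.]$/;
const pathClass = /^[\/\w \.-]$/;

// Hypothetical helper: keep only the characters that the class accepts.
function accepted(charClass, chars) {
  return [...chars].filter((c) => charClass.test(c)).join('');
}

const sample = 'aZ9_.-/ !';

console.log(accepted(domainClass, sample)); // 'a9.-'
console.log(accepted(tldClass, sample));    // 'a.'
console.log(accepted(pathClass, sample));   // 'aZ9_.-/ ' (includes the space)
```

Only the third class accepts the capital Z and the underscore, because \w covers both cases while a-z does not.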
While this regex does not use any flags, below is a quick breakdown of six JavaScript flags and how they can help. Flags go after the closing / of your regular expression, for example /pattern/i.
- i makes your regex search case-insensitive, meaning it will match both A and a.
- g stands for global and returns all matches. Without it, only the first match is returned.
- m enables multiline mode, in which ^ and $ match not just at the start and end of the whole string but also at the start and end of each line.
- s enables 'dotall' mode, which allows a . to match the newline character \n.
- u enables full Unicode support. This flag allows correct processing of surrogate pairs. JavaScript stores strings as UTF-16, and Unicode covers characters across the writing systems of the world. For example, the Unicode code point for a is 0x0061.
- y is known as 'sticky' mode. This flag matches only at the exact position in the text given by the regex's lastIndex property.
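The effect of each flag is easiest to see side by side. This sketch runs small throwaway patterns (not the URL regex) with and without each flag. The flags object and the sample strings are assumptions for illustration.

```javascript
// An emoji outside the Basic Multilingual Plane, stored as a surrogate pair.
const emoji = '\u{1F600}';

// Sticky: the match must begin exactly at lastIndex.
const sticky = /cat/y;
sticky.lastIndex = 4;
const stickyAt4 = sticky.test('Cat cat');

// Each entry is [without the flag, with the flag].
const flags = {
  i: [/cat/.test('CAT'), /cat/i.test('CAT')],
  g: ['Cat cat'.match(/cat/i).length, 'Cat cat'.match(/cat/gi).length],
  m: [/^cat$/.test('Cat\ncat'), /^cat$/m.test('Cat\ncat')],
  s: [/a.b/.test('a\nb'), /a.b/s.test('a\nb')],
  u: [/^.$/.test(emoji), /^.$/u.test(emoji)],
  y: [/cat/y.test('Cat cat'), stickyAt4],
};

console.log(flags.i); // [ false, true ]
console.log(flags.g); // [ 1, 2 ]
console.log(flags.m); // [ false, true ]
console.log(flags.s); // [ false, true ]
console.log(flags.u); // [ false, true ]
console.log(flags.y); // [ false, true ]
```

In the y row, the first test starts at index 0, where the capital C does not match. The second starts at index 4, where cat begins.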
Grouping constructs use parentheses () to define sub-expressions within our regex. In our URL regex we have 4 groups that help us define a URL when searching. These groups are:
- (https?:\/\/) defines the http:// or https:// part of the URL.
- ([\da-z\.-]+) defines the domain name of the URL, excluding the top-level domain (example: github).
- ([a-z\.]{2,6}) defines the top-level domain of the URL (example: com). The '.' in front of it is matched by the \. that sits between the two groups.
- ([\/\w \.-]*) defines the path that follows the domain and top-level domain groups (example: /paths/). Because its character class has no '?', '=' or '&', query parameters are not matched.
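Because each group is in parentheses, JavaScript hands back what it captured. Here is a minimal sketch that names the four captures. The function parseUrl and the property names protocol, domain, tld, and path are made up for this example.

```javascript
const urlRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical helper: return the four captured groups, or null if no match.
function parseUrl(url) {
  const match = url.match(urlRegex);
  if (match === null) return null;
  // match[0] is the full match; the groups follow in order.
  const [, protocol, domain, tld, path] = match;
  return { protocol, domain, tld, path };
}

console.log(parseUrl('https://github.com/paths/'));
// { protocol: 'https://', domain: 'github', tld: 'com', path: '/paths/' }

console.log(parseUrl('github.com').protocol); // undefined (optional group skipped)
console.log(parseUrl('not a url'));           // null
```

One thing to watch for: the domain group is greedy, so for www.github.com it captures www.github and leaves only com for the top-level domain group.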
For regular expressions, greedy matching is the default behavior. This means that a quantifier attempts to match as much text as possible. Lazy matching does the opposite, attempting to match as little text as possible. A quantifier becomes lazy only when a ? is placed directly after it, as in +? or *?. In our regex every ? is a 0-or-1 quantifier rather than a lazy modifier, so all 7 of our quantifiers are greedy.
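The difference shows up as soon as a pattern could stop in more than one place. This sketch uses small throwaway patterns rather than the URL regex, and the variable names are assumptions for illustration.

```javascript
const path = 'paths/to/file';

// Greedy: .* grabs as much as it can and still ends on a slash.
const greedy = path.match(/.*\//)[0];
// Lazy: .*? stops at the first slash it can.
const lazy = path.match(/.*?\//)[0];

console.log(greedy); // 'paths/to/'
console.log(lazy);   // 'paths/'

// A plain ? is itself greedy: it takes the optional s when it is there.
const optionalGreedy = 'https'.match(/https?/)[0];
// Adding a second ? makes it lazy: it prefers to skip the s.
const optionalLazy = 'https'.match(/https??/)[0];

console.log(optionalGreedy); // 'https'
console.log(optionalLazy);   // 'http'
```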
My name is Ceres Markley! Check out my work on GitHub!