
RegEx Searching

(Regular Expressions Searching)

Regular Expressions can be used in various JavaScript methods.
Regular Expressions can be extremely complex, but they are very flexible and powerful and can be used to perform comparisons that cannot be done using the other checks available.
There follow some very basic examples of regular expression usage.
For a complete description please visit w3schools or www.regular-expressions.info.

It can seem very complicated...but here goes...

Regular Expressions can be used in various JavaScript methods - like :-
.test() = returns true if a match was found & false if there was no match.
.match() = returns null if there is no match, or an Array of matches.
.replace() = used to replace the matched characters with other characters; it returns a new string and leaves the original unchanged.
.search() = used to get the starting position of the FIRST instance of your search string (or -1 if it is not found).

eg
var x, y, z, zz, a;
var stringToBeChecked = "Hello world!";
var stringToSearchFor = /w[ox]/;

x = stringToSearchFor.test(stringToBeChecked);
y = stringToBeChecked.match(stringToSearchFor);
z = stringToBeChecked.replace(/[lo]/gi, "");
zz = stringToBeChecked.replace(/[lo]/, "%");
a = stringToBeChecked.search(/or/);


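(Note
There are NO quote marks around the Regex expression /w[ox]/ )
AND also note that .test() is called on the Regex expression with the string as its argument, whereas .match(), .replace() and .search() are called on the string with the Regex expression as the argument.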


In the above scenario :-
x will = true as the .test() method just returns true (matched) or false (not matched).
y will be the array ["wo"] as there was a match; y would = null if there were no matches.
z will = "He wrd!" as ALL l's & o's will be replaced with nothing.
zz will = "He%lo world!" as only the FIRST occurrence will be replaced - it finds an 'l' first, so it stops searching and does not change any more l's or any o's.
a will = 7 as the FIRST occurrence of "or" starts at chr position 7 (0=1st chr. -1=Not Found)

By default, searching finds only the first occurrence and is case-sensitive.

If you need to find ALL occurrences then you need to include the global search flag after your search expression - do this by including g.

If you need to do a case-insensitive search then you need to include i after the search expression.

If you have a string that has line breaks in (\n = line break) and you want ^ and $ (explained below) to match at the start and end of every line, not just the start and end of the whole string, then you need to include the multi-line flag (m) after the search expression.

A search expression needs to be enclosed inside / /.
eg
/abc/ will search for the first instance of abc only
/abc/g will search for ALL instances of abc only
/abc/i will search for the first instance of abc in any mix of case (abc, ABC, Abc etc)
/abc/gi will search for ALL instances of abc in any mix of case
/abc/igm will search for ALL instances of abc in any mix of case; the m only changes how ^ and $ behave, so it makes no difference to this particular expression
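
As a rough illustration of the flags above, this sketch runs the same kind of pattern with different flags. The sample text and the variable names are made up for the example.

```javascript
// Sample text (made up for this illustration) containing a line break.
const text = "abc ABC\nabc";

// No flags: only the first, case-sensitive match is returned.
const first = text.match(/abc/);
// g: every case-sensitive match.
const all = text.match(/abc/g);
// gi: every match, ignoring case.
const allAnyCase = text.match(/abc/gi);
// m makes ^ (and $) match at the start (and end) of every line.
const lineStarts = text.match(/^abc/gm);
// Without m, ^ only matches at the very start of the string.
const withoutM = text.match(/^abc/g);

console.log(first[0], first.index); // abc 0
console.log(all);                   // [ 'abc', 'abc' ]
console.log(allAnyCase);            // [ 'abc', 'ABC', 'abc' ]
console.log(lineStarts.length);     // 2
console.log(withoutM.length);       // 1
```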



My attempt at an explanation of some things...

Using the ^ symbol just after the opening /
When a ^ is used just after the opening / in the search expression it means to search for the chr's requested at the BEGINNING of your search field.
(using ^ inside square brackets [] means NOT (explained later))
eg
/^ABC/ finds ABCDEF but not ZABCDEF


Using the $ symbol just before the closing /
When a $ is used just before the closing / in the search expression it means to search for the chr's requested at the END of your search field.
eg
/ABC$/ finds XYZABC but not ABCDEF
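
The two anchors above can be tried out with .test(). This is only a small sketch; the variable names are invented for the example.

```javascript
// ^ ties the match to the start of the string, $ ties it to the end.
const startsWithABC = /^ABC/;
const endsWithABC = /ABC$/;

console.log(startsWithABC.test("ABCDEF"));  // true
console.log(startsWithABC.test("ZABCDEF")); // false
console.log(endsWithABC.test("XYZABC"));    // true
console.log(endsWithABC.test("ABCDEF"));    // false

// Using both anchors means the whole string must be exactly "ABC".
const exactlyABC = /^ABC$/;
console.log(exactlyABC.test("ABC"));        // true
console.log(exactlyABC.test("ABCABC"));     // false
```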


Using the square bracket [ ] symbols
Use the square brackets [] to specify which characters are allowed in a single position of a string.
If you have [ABC] it means it will look for one character that is A or B or C, anywhere in the string.
So if you have a field to search that = "Hello world!"
eg
/[ABC]/ : would not match as there are no capital A, B or C's present.
/[EFG]/ : would not match as there are no capital E, F or G's present.
/[efg]/ : would match as there is an e present.
/[EFG]/i : would match as you are using the i to perform a case-insensitive search
/w[ox]/ : would match as this means search for a 'w' followed by either an o or an x
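
The examples above can be checked directly. A minimal sketch, using the same "Hello world!" field (the variable name field is just for illustration):

```javascript
const field = "Hello world!";

console.log(/[ABC]/.test(field));     // false: no capital A, B or C
console.log(/[EFG]/.test(field));     // false: no capital E, F or G
console.log(/[efg]/.test(field));     // true: there is an e
console.log(/[EFG]/i.test(field));    // true: i ignores case
// .match() shows which characters were actually matched.
console.log(field.match(/w[ox]/)[0]); // wo
```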


Using the ^ symbol inside square brackets []
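If the ^ is used as the first character inside the square brackets [] then that means NOT: the brackets match any single character that is not in the list.
(Note that this looks for ONE character that is not in the list; it does not check that the listed characters are absent from the whole field.)
eg (again searching "Hello world!")
/[^ABC]/ : would match as the first chr 'H' is not A, B or C.
/[^EFG]/ : would match as 'H' is not E, F or G.
/[^efg]/ : would match as 'H' is not e, f or g (the e later in the field does not stop the match).
/[^EFG]/i : would match for the same reason, even though the case-insensitive search treats the e as an E.
/^[^efg]*$/ : would not match as this means every chr in the field must not be e, f or g, and there is an e present (the * is explained below).
/w[^ox]/ : would not match as this means search for a 'w' followed by a chr that is not an o or an x, and the only w is followed by an o.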
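
A negated set is easy to misread, so here is a small sketch of what it actually matches. It assumes the same "Hello world!" field; the names field and noEFG are made up for the example.

```javascript
const field = "Hello world!";

// [^efg] matches the first single character that is NOT e, f or g.
// That is the "H", so the pattern matches even though an e is present.
console.log(field.match(/[^efg]/)[0]); // H

// To ask "does the field contain no e, f or g at all?", the negated
// set has to cover the whole string, from ^ to $.
const noEFG = /^[^efg]*$/;
console.log(noEFG.test(field));   // false: there is an e
console.log(noEFG.test("Hallo")); // true

// w must be followed by something other than o or x.
console.log(/w[^ox]/.test(field));  // false: the w is followed by o
console.log(/w[^ox]/.test("swim")); // true: the w is followed by i
```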



Using the *, + and ? symbols
The symbols '*', '+', and '?' denote the number of times a character or a sequence of characters may occur.
What they mean is:
* = "zero or more"
+ = "one or more"
? = "zero or one"
eg
/ab*/ : would match a string that has an a followed by zero or more b's ("ac", "abc", "abbc", etc.)
/ab+/ : same, but there's at least one b ("abc", "abbc", etc., but not "ac")
/ab?/ : an a followed by a single b or no b at all ("ac", "abc"; in "abbc" it still matches, but only the "ab" part).
/^a?b+$/ : a possible 'a' followed by one or more 'b's, and nothing else in the string: Matches "ab", "abb", "abbb" etc. or "b", "bb" etc. but not "aab", "aabb" etc. (Without the ^, /a?b+$/ would also match "aab", as the match is allowed to start at the second a.)
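
To see how much text each of these symbols takes, it helps to look at the matched text rather than just true or false. This is a sketch only; matchedText is a hypothetical helper written for this example, not a built-in method.

```javascript
// Hypothetical helper: returns the text a pattern matched, or null.
function matchedText(pattern, text) {
  const m = text.match(pattern);
  return m ? m[0] : null;
}

console.log(matchedText(/ab*/, "ac"));   // a    (zero b's is fine)
console.log(matchedText(/ab*/, "abbc")); // abb  (takes as many b's as it can)
console.log(matchedText(/ab+/, "ac"));   // null (needs at least one b)
console.log(matchedText(/ab?/, "abbc")); // ab   (takes at most one b)

// Without ^ the match may start part-way through the string.
console.log(/a?b+$/.test("aab"));  // true
console.log(/^a?b+$/.test("aab")); // false
console.log(/^a?b+$/.test("abb")); // true
```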


Using the braces { } symbol
Known as bounds...values which appear inside braces indicate ranges in the number of occurrences
Note... that you must always specify the first number of a range (i.e., "{0,2}", not "{,2}").
Also, the symbols '*', '+', and '?' have the same effect as using the bounds "{0,}", "{1,}", and "{0,1}", respectively.
eg
/ab{2}/ : matches a string that has an a followed by exactly two b's ("abb")
/ab{2,}/ : matches a string that has an a followed by two or more b's ("abb", "abbb" etc)
/ab{3,5}/ : matches a string that has an a followed by three to five b's ("abbb", "abbbb" or "abbbbb")
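
The bounds above can be checked the same way. In this sketch, matchedText is again a hypothetical helper made up for the example. The last two lines show what appears to happen in JavaScript (without the u flag) when the first number is left out: the braces are treated as ordinary characters.

```javascript
// Hypothetical helper: returns the text a pattern matched, or null.
function matchedText(pattern, text) {
  const m = text.match(pattern);
  return m ? m[0] : null;
}

console.log(matchedText(/ab{2}/, "abbb"));        // abb
console.log(/^ab{2}$/.test("abbb"));              // false: too many b's
console.log(matchedText(/ab{2,}/, "abbbb"));      // abbbb
console.log(matchedText(/ab{3,5}/, "abbbbbbb"));  // abbbbb (stops at five b's)
console.log(/ab{3,5}/.test("abb"));               // false: only two b's

// {,2} is not a bound, so it is matched literally.
console.log(/ab{,2}/.test("abb"));     // false
console.log(/ab{,2}/.test("ab{,2}"));  // true
```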

Using the round brackets ( ) symbols (parentheses)
To quantify a sequence of characters, put them inside parentheses
eg
/a(bc)*/ : matches a string that has an a followed by zero or more copies of the sequence "bc"
/a(bc){1,5}/ : same, but there must be between one and five instances of the sequence "bc"


Using the | symbol
Known as the OR operator...
eg
/hi|hello/ : matches a string that has either "hi" or "hello" in it
/(b|cd)ef/ : matches "bef" or "cdef"
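
Here is a short sketch of the round brackets and the | symbol working together. One extra thing it shows: .match() also returns what each pair of round brackets matched, in position 1 onwards of the array. The sample strings are made up for the example.

```javascript
// Round brackets let a quantifier apply to a whole sequence.
console.log("abcbcd".match(/a(bc)*/)[0]); // abcbc
console.log(/a(bc){1,5}/.test("a"));      // false: needs at least one "bc"
console.log(/a(bc){1,5}/.test("abc"));    // true

// | means OR.
console.log(/hi|hello/.test("oh hello")); // true

// Brackets limit how far the OR reaches: (b|cd) then "ef".
const m = "cdef".match(/(b|cd)ef/);
console.log(m[0]); // cdef (the whole match)
console.log(m[1]); // cd   (what the brackets matched)
console.log(/(b|cd)ef/.test("bef")); // true
console.log(/(b|cd)ef/.test("def")); // false
```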


Using the dot (.) symbol
A dot (.) stands for any single character
eg
/a.[0-9]/ : matches a string that has an 'a' followed by any chr and a digit (0-9)
/^.{3}$/ : matches a string with exactly 3 chr's in it
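
A quick sketch of the dot in use. The sample strings are made up; the last line relies on the fact that the dot does not match a line break.

```javascript
// a, then any one character, then a digit.
console.log(/a.[0-9]/.test("ax7")); // true
console.log(/a.[0-9]/.test("a7"));  // false: nothing between the a and the digit

// Exactly three characters from start to end.
console.log(/^.{3}$/.test("abc"));  // true
console.log(/^.{3}$/.test("abcd")); // false
console.log(/^.{3}$/.test("a\nc")); // false: the dot does not match \n
```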



Special characters (Metacharacters)

Regex also uses some special characters (most start with a backslash (\))
Eg...

. (dot) - Find any single character, except newline or other line terminators.
\w - Find a 'word character' (A-Z, a-z, 0-9 and _)
\W - Find a 'non-word character' (anything that is not A-Z, a-z, 0-9 or _)
\d - Find a 'digit' (0-9)
\D - Find a 'non-digit' (anything other than 0-9)
\s - Find a 'whitespace chr' (space, tab, c/return, new line, vertical tab, form feed)
\S - Find a 'non-whitespace chr' (not: space, tab, c/return, new line, vertical tab, form feed)
\b - Find a match at the beginning (\bABC) or at the end (ABC\b) of a word
\B - Find a match that is NOT at the beginning (\BABC) or NOT at the end (ABC\B) of a word
\0 - Find a NULL character...it is backslash zero (\0) not backslash capital O (\O)

You can also search for New Line (\n), form feed (\f), c/return (\r), tab (\t) and vertical tab (\v).
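
A few of the special characters above, tried on a made-up sample string. This is only a sketch; note that _ counts as a word character, which affects where \b finds a word boundary.

```javascript
const sample = "Order 66: ship_to = Mr Fox";

// \d+ : the first run of digits.
console.log(sample.match(/\d+/)[0]);  // 66
// \w+ with g : every run of word characters (letters, digits, _).
console.log(sample.match(/\w+/g));    // [ 'Order', '66', 'ship_to', 'Mr', 'Fox' ]
// \s+ : split the string wherever there is whitespace.
console.log(sample.split(/\s+/).length); // 6

// \b : a word boundary on both sides means a whole word.
console.log(/\bFox\b/.test(sample));  // true
console.log(/\bship\b/.test(sample)); // false: "ship" runs on into "_to"
// \B : NOT at a word boundary, so "hip" must be inside a word.
console.log(/\Bhip/.test(sample));    // true: found inside "ship"
```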

NOTE...
[...] has a special meaning in regular expressions.
[xyz] matches x, y or z
[^...] negates that; [^xyz] matches any character that is not x, y, or z
BUT if [ is preceded by \, it loses its special meaning and matches [ literally.
Eg... \[ checks for [ , \* checks for * , \^ checks for ^ , etc
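
The difference a backslash makes can be seen in this small sketch. The label string is made up for the example.

```javascript
const label = "Total [net]: 3*4 = 12";

// Escaped brackets are matched literally.
console.log(/\[net\]/.test(label));    // true
// Unescaped, [net] means "one character that is n, e or t".
console.log(label.match(/[net]/)[0]);  // t (the t in "Total")

// Escaped star is matched literally.
console.log(label.match(/3\*4/)[0]);   // 3*4
// Unescaped, 3*4 means "zero or more 3's followed by a 4".
console.log(label.match(/3*4/)[0]);    // 4
```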

A little recap on some of it...

n+ - Matches a string that contains at least one n
n* - Matches any string that contains zero or more occurrences of n
n? - Matches any string that contains zero or only one occurrence of n
n{X} - Matches any string that contains a sequence of X n's
n{X,Y} - Matches any string that contains a sequence of X to Y n's
n{X,} - Matches any string that contains a sequence of X or more n's
n$ - Matches any string with n at the END of it
^n - Matches any string with n at the BEGINNING of it
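(?=n) - Matches only where the next characters are the specific string n (written inside round brackets, eg /x(?=n)/ finds an x only when n comes straight after it)
(?!n) - Matches only where the next characters are NOT the specific string n (eg /x(?!n)/ finds an x only when n does not come straight after it)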
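
The last two entries in the recap are the least obvious, so here is a rough sketch. They are written inside round brackets as (?=n) and (?!n), and the text they check for is not included in the match. The sample string is made up.

```javascript
const sizes = "10px 20em 30px";

// Digits that ARE followed by "px" (the "px" itself is not returned).
console.log(sizes.match(/\d+(?=px)/g));    // [ '10', '30' ]

// Digits NOT followed by "px". On its own this gives a surprise:
// "10" fails, so the search backs off to "1", which is followed by "0".
console.log(sizes.match(/\d+(?!px)/g));    // [ '1', '20', '3' ]

// Also refusing a following digit stops the number being cut short.
console.log(sizes.match(/\d+(?!px|\d)/g)); // [ '20' ]
```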

Useful...

An example of a RegEx expression that checks that a DATE has been correctly formatted could be
   /^(\d){1,2}\/(\d){1,2}\/(\d){4}$/
this allows (dd/mm/yyyy or d/m/yyyy)
BUT does not check if the numbers entered are a valid DD (1-31) or MM (1-12)
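
If the values do need checking, one possible approach is to use the RegEx for the format and then let the built-in Date object check the numbers. This is only a sketch: isValidDate is a hypothetical function name, and the brackets are moved to (\d{1,2}) so that each one captures the whole number rather than just its last digit.

```javascript
// Hypothetical helper: true only for a real date written as d/m/yyyy or dd/mm/yyyy.
function isValidDate(text) {
  const m = text.match(/^(\d{1,2})\/(\d{1,2})\/(\d{4})$/);
  if (!m) return false; // wrong format

  const day = Number(m[1]);
  const month = Number(m[2]);
  const year = Number(m[3]);

  // Date rolls impossible values over (31 Feb becomes a day in March),
  // so build the date and check that nothing changed.
  // Side effect: years 0000-0099 are rejected, as Date reads them as 19xx.
  const d = new Date(year, month - 1, day);
  return d.getFullYear() === year &&
         d.getMonth() === month - 1 &&
         d.getDate() === day;
}

console.log(isValidDate("31/12/2024")); // true
console.log(isValidDate("1/2/2024"));   // true
console.log(isValidDate("31/02/2024")); // false: no 31st of February
console.log(isValidDate("12/13/2024")); // false: no month 13
console.log(isValidDate("1/22024"));    // false: wrong format
```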


An example of a RegEx expression that checks that a TIME has been correctly formatted could be
   /^(\d){1,2}:(\d){1,2}([ ]?[ap]m)?$/
this allows (HH:MM or HH:MMam/pm or HH:MM am/pm)
BUT does not check if the numbers entered are a valid HH (0-23) or MM (0-59)
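
The same idea can be sketched for the time. Here isValidTime is a hypothetical function name, and the ranges are an assumption made for this example: hours 1-12 when am/pm is given, 0-23 when it is not, and minutes 0-59.

```javascript
// Hypothetical helper: checks the format first, then the number ranges.
function isValidTime(text) {
  const m = text.match(/^(\d{1,2}):(\d{1,2})( ?[ap]m)?$/);
  if (!m) return false; // wrong format

  const hours = Number(m[1]);
  const minutes = Number(m[2]);
  const hasAmPm = m[3] !== undefined; // the optional group did not take part if undefined

  // Assumption: am/pm implies a 12-hour clock, otherwise a 24-hour clock.
  const hoursOk = hasAmPm ? (hours >= 1 && hours <= 12) : hours <= 23;
  return hoursOk && minutes <= 59;
}

console.log(isValidTime("23:59"));   // true
console.log(isValidTime("9:05am"));  // true
console.log(isValidTime("9:05 pm")); // true
console.log(isValidTime("24:00"));   // false
console.log(isValidTime("13:00pm")); // false
console.log(isValidTime("12:60"));   // false
```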












NOTE
I am not going to validate the entries as, in real life, you would not really use a normal text box here.
You would use type = "date" or type = "time" in the <input> boxes.
