Perl is very popular among bioinformaticians, and one of the reasons is its ability to do string matching and pattern matching with regular expressions. The cumbersome task of finding a pattern in a long sequence can be done easily with Perl's string matching facilities.
How is Perl pattern matching carried out?
As pointed out earlier, regular expressions are widely used for finding patterns that might be present in a string. Another popular method is approximate string matching. With the help of a string matching algorithm, any set of patterns in a given string can be found.
Three approaches are mainly used for Perl pattern matching or string matching:
A) Edit Distance: In this method, the edit distance is calculated, which is the number of operations needed to transform one string into another. Which operations are allowed depends on the variant of edit distance used.
B) Levenshtein distance: This is another popular Perl string matching algorithm and the most common form of edit distance. It is a metric defined as the number of operations required to derive one string from another, where the operations are limited to substitution, insertion or deletion of a single character.
C) Fuzzy String Matching: In this method of Perl pattern matching, strings are selected because they closely, rather than exactly, match the given pattern.
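The Levenshtein distance described in B) can be sketched with a small dynamic programming routine. This is a minimal illustration, not a tuned implementation; the subroutine name levenshtein() is a hypothetical helper and not a Perl built-in.

```perl
use strict;
use warnings;

# Levenshtein distance between two strings via dynamic programming.
# levenshtein() is a hypothetical helper name chosen for this sketch.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # @prev holds the distances from the previous prefix of $s
    # to every prefix of $t (starting with the empty prefix).
    my @prev = (0 .. scalar @t);

    for my $i (1 .. scalar @s) {
        my @curr = ($i);    # turning $i characters into "" takes $i deletions
        for my $j (1 .. scalar @t) {
            my $cost = ($s[$i - 1] eq $t[$j - 1]) ? 0 : 1;
            my $best = $prev[$j - 1] + $cost;                        # substitution or match
            $best = $prev[$j] + 1     if $prev[$j] + 1     < $best;  # deletion
            $best = $curr[$j - 1] + 1 if $curr[$j - 1] + 1 < $best;  # insertion
            push @curr, $best;
        }
        @prev = @curr;
    }
    return $prev[-1];
}

print levenshtein("ATGGTACGTA", "ATGGTACGAT"), "\n";    # prints 2
```

Only two rows of the distance table are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.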
String Matching and Regular Expressions:
Some regular expressions used for string matching are as follows:
$string =~ m/pattern/; # m is "match" operator.
The above operation returns true if $string contains the pattern, and false if it does not.
If you want to search for a pattern at the start of the string, place ^ at the beginning of the pattern:
$string =~ m/^pattern/;
If you want to find it at the end of the string, place $ at the end of the pattern:
$string =~ m/pattern$/;
If you want a case-insensitive match, add the i modifier after the closing delimiter:
$string =~ m/pattern/i;
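The four forms above can be tried together on a short DNA string. The sequence and the patterns below are illustrative values chosen for this sketch.

```perl
use strict;
use warnings;

# $seq and the patterns are illustrative values for this sketch.
my $seq = "ATGGTACGTA";

my $contains   = ($seq =~ m/GGTA/)  ? 1 : 0;   # pattern anywhere in the string
my $starts     = ($seq =~ m/^ATG/)  ? 1 : 0;   # pattern anchored at the start
my $ends       = ($seq =~ m/GTA$/)  ? 1 : 0;   # pattern anchored at the end
my $ignorecase = ($seq =~ m/ggta/i) ? 1 : 0;   # case-insensitive match
my $missing    = ($seq =~ m/TTTT/)  ? 1 : 0;   # pattern that is not present

print "contains=$contains starts=$starts ends=$ends ",
      "ignorecase=$ignorecase missing=$missing\n";
```

The match operator returns a true or false value, so it can be used directly in an if condition as well as in the ternary expressions shown here.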
Perl String Matching Applications:
a) Approximate string matching is widely used for finding genetic variability among organisms.
For example, consider a genetic sequence for which four variants are found in different organisms:
1) ATGGTACGTA
2) ATGGTACGAT
3) ATGGTACGAA
4) ATGGTAAGTG
With the help of an approximate string matching algorithm, any two of these sequences can be compared, and the two most closely matching sequences can be given as output.
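As a rough sketch of this comparison, the code below scores every pair of the four sequences and reports the closest pair. Because the sequences all have the same length, it uses the simpler Hamming distance (the count of mismatching positions) rather than a full edit distance; that choice, and the helper names hamming() and closest_pair(), are assumptions made for this illustration.

```perl
use strict;
use warnings;

my @seqs = qw(ATGGTACGTA ATGGTACGAT ATGGTACGAA ATGGTAAGTG);

# Hamming distance: number of positions at which two equal-length
# strings differ. hamming() is a hypothetical helper name.
sub hamming {
    my ($s, $t) = @_;
    die "sequences must have equal length\n"
        unless length($s) == length($t);
    my $diff = 0;
    for my $i (0 .. length($s) - 1) {
        $diff++ if substr($s, $i, 1) ne substr($t, $i, 1);
    }
    return $diff;
}

# Returns (index_i, index_j, distance) for the closest pair.
# On ties, the first pair encountered is kept.
sub closest_pair {
    my @s = @_;
    my @best;
    for my $i (0 .. $#s - 1) {
        for my $j ($i + 1 .. $#s) {
            my $d = hamming($s[$i], $s[$j]);
            @best = ($i, $j, $d) if !@best || $d < $best[2];
        }
    }
    return @best;
}

my ($i, $j, $d) = closest_pair(@seqs);
printf "Closest pair: sequences %d and %d (distance %d)\n", $i + 1, $j + 1, $d;
```

For the four sequences above this reports sequences 1 and 3 at distance 1; sequences 2 and 3 are equally close, and only the first such pair is kept. For sequences of different lengths, an edit distance such as Levenshtein would be needed instead.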
b) Pattern matching is also used for finding a consensus sequence among many closely related organisms.
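One simple way to derive a consensus, sketched below, is a per-position majority vote. This assumes the sequences are already aligned and of equal length, and the helper name consensus() and the alphabetical tie-breaking rule are choices made for this illustration rather than a standard.

```perl
use strict;
use warnings;

# consensus() is a hypothetical helper. It assumes the input sequences
# are already aligned and have the same length.
sub consensus {
    my @seqs = @_;
    my $result = '';
    for my $pos (0 .. length($seqs[0]) - 1) {
        my %count;
        $count{ substr($_, $pos, 1) }++ for @seqs;
        # The most frequent base wins; ties are broken alphabetically.
        my ($top) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        $result .= $top;
    }
    return $result;
}

print consensus(qw(ATGGTACGTA ATGGTACGAT ATGGTACGAA ATGGTAAGTG)), "\n";
# prints ATGGTACGAA
```

At position 9 the bases T and A each occur twice, so the result at that position depends entirely on the tie-breaking rule.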
In future, we will cover more on Perl string matching and Perl pattern matching. For a Perl regex tutorial, click here.