Perl | Searching in a File using regex
Last Updated: 07 Jun, 2019
Prerequisite: Perl | Regular Expressions
A Regular Expression (Regex, Regexp or RE) in Perl is a special text string that describes a search pattern within a given text. Regex support is built into the Perl language itself, and its syntax is not identical to the regex flavours of PHP, Python, etc. Libraries that imitate this syntax are often described as "Perl Compatible Regular Expressions" (PCRE). To apply a regex to a string, the binding operators =~ (Regex Operator) and !~ (Negated Regex Operator) are used.
These binding operators test a string against a regular expression. The left-hand side of the statement holds the string, and the right-hand side holds the pattern it is matched against. The =~ operator is true when the string matches the pattern; the negated operator !~ is true when the string does not match it.
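As a minimal sketch of the two binding operators, the snippet below tests two sample strings against the pattern /the/. The strings and the helper name describe_match are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample strings, used only for illustration
my @samples = ('over the hill', 'hello world');

# Returns a short label describing whether $text matches /the/
sub describe_match
{
    my ($text) = @_;
    return 'match'    if $text =~ /the/;   # true when the pattern is found
    return 'no match' if $text !~ /the/;   # true when the pattern is absent
}

foreach my $text (@samples)
{
    print "'$text': ", describe_match($text), "\n";
}
# prints:
# 'over the hill': match
# 'hello world': no match
```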
Regex operators help in searching for a specific word or a group of words in a file. This can be done in several ways, depending on the user's requirement. Searching in Perl follows a standard sequence: open the file in read mode, read it line by line, and test each line against the required pattern. When a line matches, the statement that follows the test decides what happens next: the line can be written to another file specified by the user, or simply printed on the console.
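The sequence described above, with matching lines written to a second file instead of the console, might be sketched as follows. The helper name copy_matching_lines and the sample text are assumptions for illustration, and temporary files created with the core File::Temp module stand in for real paths so that the sketch runs anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Copies every line of $in_path that matches $pattern into $out_path
# and returns the number of lines copied.
sub copy_matching_lines
{
    my ($in_path, $out_path, $pattern) = @_;

    open(my $in,  '<', $in_path)  or die("Could not open $in_path: $!");
    open(my $out, '>', $out_path) or die("Could not open $out_path: $!");

    my $count = 0;
    while (my $line = <$in>)
    {
        if ($line =~ $pattern)
        {
            print {$out} $line;
            $count++;
        }
    }

    close($in);
    close($out) or die("Could not close $out_path: $!");
    return $count;
}

# Build a small throwaway input file (hypothetical contents)
my ($tmp_in, $in_path) = tempfile(UNLINK => 1);
print {$tmp_in} "Perl is the language used here.\n",
                "No match on this line.\n",
                "Another line with the word.\n";
close($tmp_in);

my (undef, $out_path) = tempfile(UNLINK => 1);
my $copied = copy_matching_lines($in_path, $out_path, qr/the/);
print "Copied $copied matching line(s) to $out_path\n";
```

Passing the pattern as a qr// object keeps the helper reusable for any of the patterns discussed in this article.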
The pattern itself can be written in several ways, depending on how strictly the required string should be matched:
Regular Search:
This is the most basic form of regular expression: it looks for the required string anywhere within each line of the specified file. The syntax is:
$String =~ /the/
This expression is true when the line held in $String contains the letters 'the' anywhere in it, whether as a whole word or inside a longer word. The match does not change $String; the variable still holds the complete line, which can then be copied to a file or simply printed on the console.
Example:
Perl
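use strict;
use warnings;

sub main
{
    my $file = 'C:\Users\GeeksForGeeks\GFG.txt';

    # Open the file for reading with a lexical filehandle
    open(my $fh, '<', $file) or die("Could not open $file: $!");
    while (my $String = <$fh>)
    {
        # Print every line that contains 'the' anywhere in it
        if ($String =~ /the/)
        {
            # The line already ends with a newline
            print $String;
        }
    }
    close($fh);
}
main();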
This search also selects lines in which 'the' appears only as part of a longer word, such as 'there' or 'other'. To avoid such matches, the regular expression can be changed as follows:
$String =~ / the /
With a space before and after the word, the pattern isolates it on both sides, so longer words that merely contain it no longer match. However, this also excludes occurrences of the word at the start or end of a line, and those followed immediately by a comma or full stop.
To avoid this, there are other ways to limit the search to a specific word. One of them is the word boundary.
Using Word Boundary in Regex Search:
As seen in the example above, a regular search either returns extra words that contain the searched word as a part or, when spaces are placed around the word, misses some genuine occurrences. To avoid both problems, a word boundary is used, which is denoted by '\b'.
$String =~ /\bthe\b/;
\b matches the position between a word character (a letter, digit or underscore) and a non-word character, or the start or end of the string. The pattern therefore rejects longer words that merely contain 'the', yet still accepts the word when it is followed by a comma or full stop.
Example:
Perl
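use strict;
use warnings;

sub main
{
    my $file = 'C:\Users\GeeksForGeeks\GFG.txt';

    # Open the file for reading with a lexical filehandle
    open(my $fh, '<', $file) or die("Could not open $file: $!");
    while (my $String = <$fh>)
    {
        # Print every line that contains 'the' as a whole word
        if ($String =~ /\bthe\b/)
        {
            # The line already ends with a newline
            print $String;
        }
    }
    close($fh);
}
main();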
With the word boundary, a 'the' that is followed by a full stop is included in the search, while words that merely contain 'the' as a part are excluded. Hence, the word boundary overcomes the problems of the Regular Search method.
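To make the difference between the three patterns concrete, the sketch below runs /the/, / the / and /\bthe\b/ over a few sample lines. The lines and the helper name classify are assumptions chosen only to expose each pattern's behaviour.

```perl
use strict;
use warnings;

# Hypothetical sample lines, chosen to expose each pattern's behaviour
my @lines = (
    'We met the team.',        # standalone word
    'They went there.',        # 'the' only inside a longer word
    'Is that the, or not?',    # word followed by a comma
);

# Returns a yes/no flag for each of the three patterns discussed above
sub classify
{
    my ($line) = @_;
    return (
        plain    => ($line =~ /the/     ? 'yes' : 'no'),
        spaced   => ($line =~ / the /   ? 'yes' : 'no'),
        boundary => ($line =~ /\bthe\b/ ? 'yes' : 'no'),
    );
}

foreach my $line (@lines)
{
    my %r = classify($line);
    printf "%-24s plain=%-3s spaced=%-3s boundary=%-3s\n",
        $line, $r{plain}, $r{spaced}, $r{boundary};
}
```

Only the word-boundary pattern accepts the first and third lines while rejecting the second.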
What if the words to be found must start with, end with, or both start and end with specific characters? Neither the regular search nor the word boundary alone can express this. For such cases, Perl allows the use of wildcards in the regular expression.
Use of Wild Cards in Regular Expression:
Perl can search a file for a whole set of words that follow a common pattern by using wildcards in the regular expression. The wildcard is the dot ('.'), which by default matches any single character except a newline. Placing dots alongside fixed letters lets one regex match every string of the given shape, so a single pass over the file replaces separate searches for each individual word.
$String =~ /t..s/;
The above pattern matches any four-character sequence that starts with t, ends with s, and has any two characters in between. The sequence can occur anywhere in the line, including inside a longer word or across a space; to restrict it to whole words, the pattern can be combined with the word boundary \b.
Example:
Perl
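use strict;
use warnings;

sub main
{
    my $file = 'C:\Users\GeeksForGeeks\GFG.txt';

    # Open the file for reading with a lexical filehandle
    open(my $fh, '<', $file) or die("Could not open $file: $!");
    while (my $String = <$fh>)
    {
        # Print every line containing 't', any two characters, then 's'
        if ($String =~ /t..s/)
        {
            # The line already ends with a newline
            print $String;
        }
    }
    close($fh);
}
main();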
The above code prints every line that contains a match for the given pattern.
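Because the dot matches any character, /t..s/ can also match inside a longer word or across a space. The sketch below compares it with a stricter variant, /\bt\w\ws\b/, which is an assumption introduced here to restrict matches to whole four-letter words. The sample line and the helper name all_matches are likewise made up for illustration.

```perl
use strict;
use warnings;

# Returns every non-overlapping substring of $text that matches $pattern
sub all_matches
{
    my ($text, $pattern) = @_;
    my @found = $text =~ /($pattern)/g;
    return @found;
}

# Hypothetical sample line, used only for illustration
my $line = 'this mattress at issue';

# Unanchored: 't', any two characters, then 's', even inside a word
print join(', ', all_matches($line, qr/t..s/)), "\n";
# prints: this, tres, t is

# Restricted to whole four-letter words made of word characters
print join(', ', all_matches($line, qr/\bt\w\ws\b/)), "\n";
# prints: this
```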
With this method, the whole line that contains the match is printed, which makes it hard to see exactly which text matched. To avoid this confusion, only the matched text can be printed instead of the whole line. This is done by enclosing the pattern in parentheses to form a capture group, and then printing the corresponding $number variable.
The $number variables ($1, $2 and so on) hold the text captured by the groups of the last successful match. If the regular expression contains several groups, $1 holds the text matched by the first group, $2 the text matched by the second group, and so on.
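A pattern with two capture groups might be used as in the following sketch, which splits a 'key=value' string into its two parts. The input string and the helper name parse_pair are hypothetical and are not part of the example file used in this article.

```perl
use strict;
use warnings;

# Splits a 'key=value' line into its two captured parts.
# Returns an empty list when the line does not match.
sub parse_pair
{
    my ($line) = @_;
    if ($line =~ /(\w+)=(\w+)/)
    {
        return ($1, $2);   # $1 is the first group, $2 the second
    }
    return;
}

# Hypothetical input, used only for illustration
my ($key, $value) = parse_pair('colour=blue');
print "key: $key, value: $value\n";
# prints: key: colour, value: blue
```

The $number variables are read only inside the if block because they are meaningful only after a successful match.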
Given below is the above program rewritten with a capture group, so that it prints only the matched text (the first match on each line) and not the whole line:
Perl
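use strict;
use warnings;

sub main
{
    my $file = 'C:\Users\GeeksForGeeks\GFG.txt';

    # Open the file for reading with a lexical filehandle
    open(my $fh, '<', $file) or die("Could not open $file: $!");
    while (my $String = <$fh>)
    {
        # The parentheses capture the matched text into $1
        if ($String =~ /(t..s)/)
        {
            print "$1\n";
        }
    }
    close($fh);
}
main();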
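An if test reports only the first match on each line. One possible way to collect every match is to loop with the /g modifier, as sketched below. The word-boundary form of the pattern, the helper name collect_words and the sample text are assumptions for illustration. The text is read through an in-memory filehandle so that the sketch needs no file on disk.

```perl
use strict;
use warnings;

# Hypothetical file contents, held in a string so the sketch is self-contained
my $content = "this mattress at issue\nno hit here\ntoys and tabs\n";

# Collects every match of /\b(t\w\ws)\b/ from each line of the filehandle
sub collect_words
{
    my ($fh) = @_;
    my @words;
    while (my $line = <$fh>)
    {
        # In scalar context, /g resumes after the previous match
        while ($line =~ /\b(t\w\ws)\b/g)
        {
            push @words, $1;
        }
    }
    return @words;
}

# Open a read-only filehandle on the string instead of a file on disk
open(my $fh, '<', \$content) or die("Could not open in-memory file: $!");
my @words = collect_words($fh);
close($fh);

print "$_\n" for @words;
# prints:
# this
# toys
# tabs
```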