Find and Grep utilities



The find command locates files according to search criteria. This paper presents some useful examples, especially with regular expressions, which are underused but very powerful.

The grep command, on the other hand, searches one or more files for lines that match a given regular expression. The name grep stands for "Globally search for a Regular Expression and Print matching lines".

The find command and its options

Command find syntax

find directory criteria

Some common search criteria :

-name search by file name
-perm search by file permissions
-user search by file owner
-group search by group
-type search by type (d=directory, l=link, f=regular file)
-size search by file size
-mtime search by last modified time
-ctime search by last status change time (permissions, owner, content), which is not a creation time

Search criteria can be combined using logical operators :

  • \( criteria1 criteria2 \) or \( criteria1 -a criteria2 \) : logical AND
  • \( ! criteria \) : logical NOT
  • \( criteria1 -o criteria2 \) : logical OR

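The action -print writes each matching path to the standard output. Current implementations (GNU, BSD and any POSIX-compliant find) apply -print implicitly when no other action is given. Some older Unix versions displayed nothing without it, even if the search was successful, so the examples in this paper specify it explicitly.

The find command is recursive : the starting directory and all its subdirectories are scanned.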

Searching by file name

Option -name

To find all files with the .c extension in the directory /usr :

find /usr -name "*.c" -print
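The pattern given to -name is best quoted so that the shell hands it to find unchanged. A minimal sketch in a scratch directory created with mktemp (the file names a.c and sub/b.c are made up for the illustration) :

```shell
# Scratch tree: one .c file at the top level, one in a subdirectory.
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/a.c" "$tmp/sub/b.c"

# Unquoted: the shell expands *.c to a.c before find starts,
# so find only looks for files named exactly a.c.
unquoted=$(cd "$tmp" && find . -name *.c -print | sort)

# Quoted: find receives the pattern itself and applies it at every level.
quoted=$(cd "$tmp" && find . -name "*.c" -print | sort)

printf 'unquoted:\n%s\nquoted:\n%s\n' "$unquoted" "$quoted"
rm -rf "$tmp"
```

The unquoted form misses sub/b.c here, and it fails with an error as soon as several .c files exist in the current directory.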

To find in the current directory all files with the .jpg extension or the .gif extension but not containing the keyword gimp in the name :

find . \( ! -name "*gimp*" -a \( -name "*.jpg" -o -name "*.gif" \) \) -print | sort

By default, results are not sorted, that’s why in the above example sort is applied on the output of the find command.

To specify the current directory : find . <criteria>

Searching by dates

Option -mtime +/-

To find *.js or *.css files modified less than 2 days ago :

find . -mtime -2 -a \( -name "*.js" -o -name "*.css" \) -print

Option -ctime +/-

To find *.json files whose status (content, permissions, owner) last changed more than 30 days ago in the directory $LOG :

find "$LOG" -ctime +30 -name "*.json" | sort

The option -mtime -2 means less than 48 hours ago, the option -ctime +30 means more than 30 × 24h ago : by default find counts in 24-hour periods back from the current date and time. With GNU find, the option -daystart, placed before -mtime, measures from the beginning of the current day instead.

find . -daystart -mtime -2 -name "*.html"
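The day arithmetic can be checked on two scratch files. This is only a sketch : it assumes GNU coreutils, because touch -d with a relative date is a GNU feature, and the file names are invented.

```shell
tmp=$(mktemp -d)
touch "$tmp/recent.html"
# Back-date the second file (GNU touch accepts relative dates).
touch -d '3 days ago' "$tmp/old.html"

# -mtime -2: modified less than 2 x 24 hours ago.
recent=$(find "$tmp" -name "*.html" -mtime -2 -print)
# -mtime +1: the age in whole days is greater than 1, i.e. at least 48 hours.
older=$(find "$tmp" -name "*.html" -mtime +1 -print)

echo "recent: $recent"
echo "older:  $older"
rm -rf "$tmp"
```

Fractions of a day are dropped before the comparison, which is why +1 only matches files that are at least two full days old.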

Searching by size

Option -size +/-

To identify *.html files whose size is greater than 50K (i.e. 100 blocks of 512 bytes) :

find . -size +100 -name "*.html" -print

In practice, the unit (k | M | G) is specified to avoid computing the multiples of 512 bytes.

find . -size +50k -name "*.html" -print
find . -size +100M -print
find . -size +2G -print
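A quick way to compare the two notations is to build files of known size. The sketch below assumes a find that accepts the k suffix (GNU and BSD find do; POSIX only defines the 512-byte block and the c suffix). File names are illustrative.

```shell
tmp=$(mktemp -d)
# Two files of known size: 10 KiB and 60 KiB of zero bytes.
dd if=/dev/zero of="$tmp/small.html" bs=1024 count=10 2>/dev/null
dd if=/dev/zero of="$tmp/big.html" bs=1024 count=60 2>/dev/null

# Without a unit, +100 counts 512-byte blocks; with k, +50k counts kibibytes.
by_blocks=$(find "$tmp" -size +100 -name "*.html" -print)
by_kib=$(find "$tmp" -size +50k -name "*.html" -print)

echo "blocks: $by_blocks"
echo "kib:    $by_kib"
rm -rf "$tmp"
```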

Redirecting error messages

Due to access rights in some directories, the find command may produce a large number of error messages (permission denied, etc.). To avoid this, redirect the standard error (file descriptor 2) to /dev/null. The errors can also be saved in a regular file if needed.

Example :

find . \( -name a.out -o -name "*.c" \) -print 2> /dev/null
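To keep the errors rather than discard them, the same redirection can point to a file. In this sketch the error is provoked with a starting directory that does not exist, which is easier to reproduce than a permission problem; the log file name find-errors.log is an arbitrary choice.

```shell
tmp=$(mktemp -d)
touch "$tmp/main.c"

# The second starting point does not exist, so find reports an error on stderr.
# 2> sends that message to a log file; matching paths still go to stdout.
# find exits with a non-zero status when it met errors, hence the || true.
found=$(find "$tmp" "$tmp/missing" -name "*.c" -print 2> "$tmp/find-errors.log" || true)
errors=$(cat "$tmp/find-errors.log")

echo "found:  $found"
echo "errors: $errors"
rm -rf "$tmp"
```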

find and the option exec

The option -print displays the results on the standard output. The option -exec runs a command on each file found instead : when -exec is present, find no longer prints the file names by default.

When the option -exec is used, it is then possible to execute a command on the files found by the find command.

find directory criteria -exec command {} \;

The output of the find command with the -print option is very basic :

find . -type f -size +100k -print

With the -exec option, the command find is usually combined with the command ls to display more details :

find . -type f -size +100k -exec ls -lh {} \; 2> /dev/null
-rw-r--r-- 1 sqlpac wapp 118K Jun 15 11:34 ./sybase-replication-server-guide-pratique.pdf
-rw-r--r-- 1 sqlpac wapp 104K Jun 15 11:33 ./images/gimp-supprimer-couleur-arriere-plan-fond-09.jpg

Other common examples :

To remove all files named core with the command rm :

find . -name core -exec rm {} \;

To remove all *.json files whose status last changed more than 10 days ago in the directory $LOG with the command rm :

find "$LOG" -name "*.json" -ctime +10 -exec rm {} \;
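The terminator \; starts one process per file. POSIX find also accepts + as terminator, which passes many file names to a single command. The sketch below uses echo instead of rm so that nothing is deleted, and counts the output lines to show the difference; the file names are invented.

```shell
tmp=$(mktemp -d)
touch "$tmp/a.json" "$tmp/b.json" "$tmp/c.json"

# \; runs the command once per file: three echo calls, three output lines.
per_file=$(find "$tmp" -name "*.json" -exec echo {} \; | wc -l | tr -d ' ')
# + appends as many names as fit to one command: a single echo call here.
batched=$(find "$tmp" -name "*.json" -exec echo {} + | wc -l | tr -d ' ')

echo "with \\; : $per_file lines, with + : $batched line"
rm -rf "$tmp"
```

On large trees the + form is usually noticeably faster for commands such as rm, ls or grep that accept several file arguments.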

More concrete examples :

The encoding of a file is displayed by the command file and the option -i, so to list all *.htm, *.html, *.inc, *.php, *.css, *.json, *.xml files for which the encoding is iso-8859-1 :

find . -type f \( -name "*.html" -o -name "*.htm" -o -name "*.json" -o -name "*.php" -o -name "*.inc" -o -name "*.xml" -o -name "*.css" \) -exec file -i {} \; | grep -i 'iso-8859-1'
./admpmgportal/ text/x-php; charset=iso-8859-1
./admpmgportal/include/ text/x-php; charset=iso-8859-1
./admpmgportal/include/ text/html; charset=iso-8859-1...

To find the string '79.13' in non binary files :

find . -type f -exec grep -Il '79\.13' {} \;

This is useful for locating values (IP addresses, function names…) anywhere in a directory tree, in any file that is not binary. The option -I of the grep command skips binary files, and the option -l prints only the names of the matching files.

The find command and regular expressions (-regex and -regextype)

Option -regex

The previous example is not very elegant ( -o -name "*.css" -o -name "*.php… ). Regular expressions are implemented in the find command using the option -regex.

The code becomes much more readable with this feature.

find . -regex '.*\.\(css\|htm\|html\|inc\|js\|json\|php\|xml\)' -exec file -i {} \;

Several regular expression libraries exist (posix, GNU awk…), and their syntaxes may differ.

The option -regextype, available in GNU find, specifies the library to use for regular expressions. For example, to find *.txt and *.inc files using the library posix-basic :

find -regextype posix-basic -regex ".*\(txt\|inc\)" -print

Libraries and regular expression syntaxes are numerous and are not detailed here, as they are not the subject of this paper.
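As an illustration of how the choice of library changes the notation, the sketch below uses posix-extended, where the alternation needs no backslashes. It assumes GNU find, and the file names are made up.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/css"
touch "$tmp/css/site.css" "$tmp/app.js" "$tmp/notes.txt"

# -regex is matched against the whole path, hence the leading .*
# -regextype (GNU find) must be placed before the -regex it applies to.
matches=$(find "$tmp" -type f -regextype posix-extended -regex '.*\.(css|js)' -print | sort)

echo "$matches"
rm -rf "$tmp"
```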

A trick to list the libraries available on the platform used : call the find command with an invalid -regextype value.

find . -regextype dummy
find: Unknown regular expression type `dummy'; valid types are `findutils-default', `awk', `egrep', `ed',
`emacs', `gnu-awk', `grep', `posix-awk', `posix-basic', `posix-egrep',
`posix-extended', `posix-minimal-basic', `sed'.

To handle case insensitivity in regular expressions, use the -iregex option.

find . -iregex '.*\.\(txt\)' -print
./README.TXT

Combining find and xargs commands

To find the PHP function ereg_replace in the source code of *.php and *.inc files :

find . -type f  \( -name "*.php" -o -name "*.inc" \) -print | xargs grep -ni "ereg_replace"
./sqlpacv2/prp_article.php5:216:    $caption = ereg_replace("\.","",_USRDIR_DOC)."/".$datadoc[1]["fichier"];
./sqlpacv2/prp_glossaire.php5:95:   $caption = ereg_replace("\.","",__USRDIR_DOC)."/".$article["fichier"];

The xargs command reads the file names that find writes to its standard output and appends them as arguments to the command that follows it (grep in the example above). That command is therefore executed on the files found, usually with many files per invocation.
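By default xargs splits its input on blanks and newlines, so a file name containing a space is cut in two. A common remedy, sketched here under the assumption that find supports -print0 and xargs supports -0 (GNU and BSD versions do), is to separate the names with a NUL byte. The file names are invented for the example.

```shell
tmp=$(mktemp -d)
printf 'ereg_replace("a","b",$s);\n' > "$tmp/old code.php"
printf 'echo "hello";\n' > "$tmp/clean.php"

# -print0 and xargs -0 delimit names with a NUL byte,
# so the space in "old code.php" does not split the name.
hits=$(find "$tmp" -type f -name "*.php" -print0 | xargs -0 grep -l "ereg_replace")

echo "$hits"
rm -rf "$tmp"
```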

Find and links

Option -xtype

The find command is very useful to find broken links :

find . -xtype l -exec ls -ll {} \;
lrwxrwxrwx 1 sqlpac wapp 26 Sep  3 14:33 ./postmenu.php -> ../../engines/postmenu.php

If the option -xtype is not available on the platform, use the command test :

find . -type l  ! -exec test -e {} \; -exec ls -ll {} \;
lrwxrwxrwx 1 sqlpac wapp 26 Sep  3 14:33 ./postmenu.php -> ../../engines/postmenu.php
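The test-based variant can be tried safely on a scratch directory holding one valid and one broken link. A minimal sketch, with made-up file names :

```shell
tmp=$(mktemp -d)
touch "$tmp/target.php"
ln -s "$tmp/target.php" "$tmp/good.php"
ln -s "$tmp/missing.php" "$tmp/broken.php"

# test -e follows the link, so it fails when the target does not exist;
# the ! keeps only the links for which it failed.
broken=$(find "$tmp" -type l ! -exec test -e {} \; -print)

echo "$broken"
rm -rf "$tmp"
```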

A useful example : computing the size of images in a directory

In a single command line combining find and awk, to compute the size of images in a directory :

find . -regex '.*\.\(png\|gif\|jpg\|jpeg\)' -exec ls -l {} \; | \
  awk 'BEGIN {sum=0} {sum+=$5} END { printf("%.2f %s\n",sum/1024000,"Mb") }'
18.16 Mb
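A possible variant, assuming GNU find, lets find print the size itself with -printf instead of parsing the output of ls. The sketch builds files of known size so that the total can be checked by hand; the file names are illustrative.

```shell
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/a.png" bs=1024 count=1 2>/dev/null
dd if=/dev/zero of="$tmp/b.jpg" bs=1024 count=2 2>/dev/null
touch "$tmp/readme.txt"

# -printf '%s\n' (GNU find) prints the size in bytes, one file per line.
total=$(find "$tmp" -type f -regex '.*\.\(png\|gif\|jpg\|jpeg\)' -printf '%s\n' |
        awk '{ sum += $1 } END { print sum }')

echo "$total bytes"
rm -rf "$tmp"
```

This avoids depending on the column layout of ls -l, which may vary between platforms.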

The grep command and its options

grep command syntax

grep -option(s) expression file(s)

Some common options :

-v display the lines that do not match the expression
-c count the lines matching the expression without displaying them
-n display each matching line with its line number
-i ignore case

Some examples :

To search for tables (HTML tags <table>) in *.html files :

grep "<table" *.html
sybase-iq-12.7-migration-ase-vers-iq.html:          <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:          <table>
sybase-iq-12.7-migration-ase-vers-iq.html:          <table class="alt- r-brdr">

With the line numbers and ignoring case :

grep -ni "<table" *.html
sybase-iq-12.7-migration-ase-vers-iq.html:232:          <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:512:          <table>
sybase-iq-12.7-migration-ase-vers-iq.html:661:          <table class="alt- r-brdr">

Just the number of matching lines per file, ignoring case :

grep -ci "<table" *.html

The fields in the output of the grep command are separated by a colon, which is very useful to quickly process the results with the awk utility.

grep -ci "<table" *.html | \
  awk -F":" 'BEGIN { hf=0; tb=0; } { if ($2 != 0) { hf++; tb +=$2 } } END { print tb" tables in "hf" files"}'
549 tables in 187 files
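The option -c counts matching lines, so a line holding two tables counts once. When the number of occurrences matters, one approach is to combine the option -o with wc, as in this sketch. It assumes a grep that supports -o (GNU and BSD grep do), and the sample file is invented.

```shell
tmp=$(mktemp -d)
printf '<table><tr><td><table></table></td></tr></table>\n<p>text</p>\n' > "$tmp/page.html"

# -c counts matching lines: both <table tags sit on the same line.
lines=$(grep -c "<table" "$tmp/page.html")
# -o prints each match on its own line, so wc -l counts occurrences.
occurrences=$(grep -o "<table" "$tmp/page.html" | wc -l | tr -d ' ')

echo "$lines matching line, $occurrences occurrences"
rm -rf "$tmp"
```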

Regular expressions with grep

The option -E tells the grep command to interpret the expression as an extended regular expression, in which the characters |, ( and ) need no backslash.

To find the strings mysql_connect, mysql_query and mysql_close in *.php files with line numbers :

grep -ni -E "mysql_connect|mysql_query|mysql_close" *.php
cls_database_myisam.php5:59:    $objRessource = mysql_connect(_SGBD_SERVER,_SGBD_USER,$pwdUnCrypted);
cls_database_myisam.php:81:             $ret=mysql_close($objRessource);
cls_database_myisam.php5:108:   $return = mysql_query($queryString,$objRessource);
cls_database_myisam.php5:114:   $reserrors = mysql_query('select @errno as errno , @errmsg as errmsg'
cls_database_myisam.php5:341:   $resultPlan=mysql_query("EXPLAIN ".$query,$objRessource);

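The egrep command is none other than the grep command with the option -E. Recent versions of GNU grep flag egrep as obsolescent, so grep -E is the safer spelling in scripts :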

egrep -ni "mysql_connect|mysql_query|mysql_close" *.php

Another possible syntax, more elegant :

egrep -ni "mysql_(connect|query|close)" *.php

The list of terms in a regular expression is sometimes too long for a single command line. The terms can then be stored in a text file, one pattern per line, and passed to the grep command with the option -f.

grep -ni -f regex.txt *.php
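A minimal sketch of what such a pattern file can look like, with one pattern per line. The content of regex.txt and of the sample file db.php is assumed for the illustration.

```shell
tmp=$(mktemp -d)
# One pattern per line.
printf 'mysql_connect\nmysql_query\nmysql_close\n' > "$tmp/regex.txt"
printf '$db = mysql_connect($host);\n$x = 1;\nmysql_close($db);\n' > "$tmp/db.php"

# A line is selected as soon as it matches any pattern of the file.
matches=$(grep -n -f "$tmp/regex.txt" "$tmp/db.php")

echo "$matches"
rm -rf "$tmp"
```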

As with the find command, the grep command allows the use of different regular expressions syntaxes.

  • Option -E : ERE, Extended Regular Expressions
  • Option -G : BRE, Basic Regular Expressions (the default)
  • Option -P : PCRE, Perl-Compatible Regular Expressions (GNU grep)

Some other examples :

The character ^ in a regular expression matches the start of a line. To find lines starting with a star :

grep -ni -E "^\*" *.php

The character $ in a regular expression matches the end of a line, so it is placed after the text to match. To find lines ending with a semicolon :

grep -ni -E ';$' *.php
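Both anchors can be checked on a small sample file. In this sketch the file content is invented, and the patterns are written in single quotes so that the shell leaves $ and \ untouched.

```shell
tmp=$(mktemp -d)
printf '* comment line\n$a = 1;\nif ($a) {\n' > "$tmp/sample.php"

# ^ anchors the match at the start of the line; \* is a literal star.
starred=$(grep -n -E '^\*' "$tmp/sample.php")
# $ anchors at the end of the line, so it follows the semicolon.
semicolon=$(grep -n -E ';$' "$tmp/sample.php")

echo "$starred"
echo "$semicolon"
rm -rf "$tmp"
```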

Using the first examples, to find the tables in *.html files having CSS class rco- and/or r-brdr :

egrep -ni "<table.*class.*(rco-|r-brdr).*>" *.html
sybase-iq-12.7-migration-ase-vers-iq.html:53:          <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:121:          <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:152:          <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:232:          <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:661:          <table class="alt- r-brdr">

To list *.html files not containing tables (option -L) :

grep -L "<table" *.html
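The options -l and -L are complementary : the first lists the files with at least one match, the second the files with none. A short sketch on two invented files, assuming a grep that supports -L (GNU and BSD grep do) :

```shell
tmp=$(mktemp -d)
printf '<table>\n</table>\n' > "$tmp/with.html"
printf '<p>no markup of interest</p>\n' > "$tmp/without.html"

# -l: names of the files containing the expression.
with=$(cd "$tmp" && grep -l "<table" *.html)
# -L: names of the files that do not contain it.
without=$(cd "$tmp" && grep -L "<table" *.html)

echo "with: $with, without: $without"
rm -rf "$tmp"
```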