Introduction
The find command locates files that match search criteria. This paper presents some useful examples, especially
with regular expressions, an underused but powerful feature.
The grep command, on the other hand, searches one or more files
for lines that match a given regular expression. grep is the acronym for "Globally search for a Regular Expression and Print matching lines".
The find command and its options
Command find syntax
find directory criteria
Some common search criteria :
-name | search by file name
-perm | search by file permissions
-user | search by file owner
-group | search by group
-type | search by type (d=directory, l=link, f=regular file)
-size | search by file size
-mtime | search by last modified time
-ctime | search by last status change time (permissions, owner, rename…), not creation time
Search criteria can be combined using logical operators :
\( criteria1 criteria2 \) or \( criteria1 -a criteria2 \) : logical AND
\( ! criteria \) : logical NOT
\( criteria1 -o criteria2 \) : logical OR
The option -print displays the results to the standard output. On current implementations (GNU find, POSIX), -print is the default action
when no other action is given ; on some older Unix systems, find displays nothing without it, even if the search is successful.
The find command is recursive : directories and subdirectories are scanned.
Searching by file name
Option -name
To find all files with the .c extension in the directory /usr :
find /usr -name "*.c" -print
/usr/share/bison/yacc.c
/usr/share/bison/glr.c
…
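The pattern given to -name should be quoted so that the shell passes it to find unchanged. The sketch below illustrates the difference in a hypothetical temporary directory (the file names a.c and sub/b.c are made up for the demonstration) :

```shell
#!/bin/sh
# Hypothetical sandbox: one .c file at the top level, one in a subdirectory.
tmp=$(mktemp -d)
mkdir "$tmp/sub"
touch "$tmp/a.c" "$tmp/sub/b.c"
cd "$tmp" || exit 1

# Unquoted: the shell expands *.c to a.c before find runs,
# so find only looks for files named exactly a.c.
unquoted=$(find . -name *.c -print | sort)

# Quoted: find receives the pattern itself and applies it in every directory.
quoted=$(find . -name "*.c" -print | sort)

printf 'unquoted:\n%s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"

cd / && rm -rf "$tmp"
```

With several .c files in the current directory, the unquoted form would expand to several words and find would typically stop with a syntax error.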
To find in the current directory all files with the .jpg extension or the .gif extension
but not containing the keyword gimp in the name :
find . \( ! -name "*gimp*" -a \( -name "*.jpg" -o -name "*.gif" \) \) -print | sort
…
./images/conception-html-dynamique-suppression-document.write-01.jpg
./images/conception-html-equations-math-mathjax-asciimath-01.jpg
./images/google-analytics-optimisation-mesure-audience-01.jpg
./images/google-analytics-optimisation-mesure-audience-02.jpg
…
By default, results are not sorted, which is why sort is applied to the output of the find command in the example above.
To specify the current directory : find . <criteria>
Searching by dates
Option -mtime +/-
To find *.js or *.css files modified less than 2 days ago :
find . -mtime -2 -a \( -name "*.js" -o -name "*.css" \) -print
./css/style-df.css
./css/style.css
./js/resources/nohttp.js
Option -ctime +/-
To find *.json files whose status last changed more than 30 days ago in the directory $LOG (-ctime tests the last status change of the file, not its creation) :
find "$LOG" -ctime +30 -name "*.json" | sort
./postgresql-9.6-setup-installation-rapide_20200929030800.json
./postgresql-9.6-setup-installation-rapide_20200929023720.json
…
The option -mtime -2 means less than 48 hours ago, the option -ctime +30 means more than 30 × 24 hours ago : by default, find measures ages in 24-hour periods counted back from the current date and time.
Specify -daystart (GNU find) to measure from the beginning of the current day instead of from the current time.
find . -daystart -mtime -2 -name "*.html"
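As a minimal sketch of how the +/- prefixes behave, the following creates two hypothetical files (new.js and old.js) in a temporary directory. It assumes GNU touch for the -d '3 days ago' syntax :

```shell
#!/bin/sh
# Hypothetical sandbox with one fresh file and one file aged 3 days.
tmp=$(mktemp -d)
touch "$tmp/new.js"
touch -d '3 days ago' "$tmp/old.js"   # GNU touch date syntax (assumption)

# -mtime -2: modified less than 2 x 24 hours ago.
recent=$(cd "$tmp" && find . -mtime -2 -name "*.js" -print)

# -mtime +1: the age in whole 24-hour periods is greater than 1,
# i.e. the file was modified at least 48 hours ago.
older=$(cd "$tmp" && find . -mtime +1 -name "*.js" -print)

echo "recent: $recent"
echo "older:  $older"

rm -rf "$tmp"
```

Because the fractional part of the age is discarded before the comparison, +1 only matches files that are at least two full days old, which is easy to overlook.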
Searching by size
Option -size +/-
To identify *.html files whose size is greater than 50K (i.e. 100 blocks of 512 bytes) :
find . -size +100 -name "*.html" -print
./influxdb-v1.7-architecture-installation-configuration-utilisation.html
./sybase-ase-iq-comparaison.html
…
In practice, the unit (k | M | G) is specified to avoid computing the multiples of 512 bytes.
find . -size +50k -name "*.html" -print
find . -size +100M -print
find . -size +2G -print
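The equivalence between the block form and the unit form can be checked with a small sketch. The two files below (big.html, 60 KiB, and small.html, 10 KiB) are hypothetical and are created only for the test :

```shell
#!/bin/sh
# Hypothetical sandbox: a 60 KiB file and a 10 KiB file.
tmp=$(mktemp -d)
head -c 61440 /dev/zero > "$tmp/big.html"
head -c 10240 /dev/zero > "$tmp/small.html"

# Size expressed in kibibytes.
by_unit=$(cd "$tmp" && find . -size +50k -name "*.html" -print)

# Same threshold expressed in 512-byte blocks (the default unit).
by_blocks=$(cd "$tmp" && find . -size +100 -name "*.html" -print)

echo "with +50k: $by_unit"
echo "with +100: $by_blocks"

rm -rf "$tmp"
```

Both commands select only the 60 KiB file, since 60 KiB is 120 blocks of 512 bytes.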
Redirecting error messages
Due to access rights in some directories, the find command may produce a large number of error
messages (permission denied, etc.). To avoid this, redirect the standard error (2>) to /dev/null.
The errors can also be saved in a regular file if needed.
Example :
find . \( -name a.out -o -name "*.c" \) -print 2> /dev/null
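To keep the error messages instead of discarding them, the standard error can be redirected to a regular file. The sketch below provokes an error with a starting directory that does not exist, which avoids depending on access rights. The names src, missing and find-errors.log are hypothetical :

```shell
#!/bin/sh
# Hypothetical sandbox: one valid directory, one path that does not exist.
tmp=$(mktemp -d)
mkdir "$tmp/src"
touch "$tmp/src/main.c"

# Results go to standard output, error messages go to find-errors.log.
found=$(find "$tmp/src" "$tmp/missing" -name "*.c" -print 2> "$tmp/find-errors.log")
found_name=$(basename "$found")

# Record whether the error log received anything.
if [ -s "$tmp/find-errors.log" ]; then has_errors=yes; else has_errors=no; fi

echo "found: $found_name"
echo "errors logged: $has_errors"

rm -rf "$tmp"
```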
find and the option exec
The option -print displays the results to the standard output.
The option -exec executes a command on each file found by the find command.
When the option -exec is used, the results are no longer printed implicitly ; add -print explicitly
to get both behaviours.
find directory criteria -exec command {} \;
The output of the find command with the -print option is very basic :
find . -type f -size +100k -print
./sybase-replication-server-guide-pratique.pdf
./images/gimp-supprimer-couleur-arriere-plan-fond-09.jpg
…
With the -exec option, the command find is usually combined with the command ls to display more details :
find . -type f -size +100k -exec ls -lh {} \; 2> /dev/null
-rw-r--r-- 1 sqlpac wapp 118K Jun 15 11:34 ./sybase-replication-server-guide-pratique.pdf
-rw-r--r-- 1 sqlpac wapp 104K Jun 15 11:33 ./images/gimp-supprimer-couleur-arriere-plan-fond-09.jpg
…
Other common examples :
To remove all files named core with the command rm :
find . -name core -exec rm {} \;
To remove all *.json files whose status last changed more than 10 days ago in the directory $LOG with the command rm :
find "$LOG" -name "*.json" -ctime +10 -exec rm {} \;
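Before removing files, it can be safer to run the same criteria with -print first. The sketch below does this in a hypothetical temporary tree and then uses the {} + terminator, which passes many file names to a single rm invocation instead of starting one rm per file :

```shell
#!/bin/sh
# Hypothetical sandbox: two core files and one file that must survive.
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
touch "$tmp/a/core" "$tmp/b/core" "$tmp/b/keep.txt"

# Dry run: list what the criteria select.
to_remove=$(cd "$tmp" && find . -type f -name core -print | sort)

# Same criteria, then remove; "{} +" groups the file names into one rm call.
find "$tmp" -type f -name core -exec rm -- {} +

# What is left afterwards.
remaining=$(cd "$tmp" && find . -type f -print | sort)

printf 'selected:\n%s\n' "$to_remove"
printf 'remaining:\n%s\n' "$remaining"

rm -rf "$tmp"
```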
More concrete examples :
The encoding of a file is displayed by the command file and the option -i,
so to list all *.htm, *.html, *.inc, *.php, *.css, *.json, *.xml files for which the encoding is iso-8859-1 :
find . -type f \( -name "*.html" -o -name "*.htm" -o -name "*.json" -o -name "*.php" -o -name "*.inc" -o -name "*.css" -o -name "*.xml" \) -exec file -i {} \; | grep -i 'iso-8859-1'
./admpmgportal/config.inc: text/x-php; charset=iso-8859-1
./admpmgportal/include/rules.inc: text/x-php; charset=iso-8859-1
./admpmgportal/include/treeview.inc: text/html; charset=iso-8859-1
...
To find the string '79.13' in non binary files :
find . -type f -exec grep -Il '79\.13' {} \;
./redis/dba/srvrdisqlpac/cfg/srvrdisqlpac.conf
...
This is useful to find values (IP addresses, functions…) anywhere in a tree structure, whatever the file type, binary files excepted.
The option -I in the grep command skips binary files.
The find command and regular expressions (-regex and -regextype)
Option -regex
The previous example is not very elegant ( -o -name "*.css" -o -name "*.php… ).
Regular expressions are implemented in the find command using the option -regex.
The code becomes much more readable with this feature.
find . -regex '.*\.\(css\|htm\|html\|inc\|js\|json\|php\|xml\)' -exec file -i {} \;
Several regular expression syntaxes exist (posix, GNU awk…), and they may differ from one another.
The option -regextype specifies the syntax to use for regular expressions. Example : finding *.txt and *.inc files using the posix-basic syntax.
find . -regextype posix-basic -regex ".*\.\(txt\|inc\)" -print
These syntaxes are numerous and are not detailed here, as they are not the subject of this paper.
A trick to find the libraries available on the platform used : call the find command with an invalid -regextype option.
find . -regextype dummy
find: Unknown regular expression type `dummy'; valid types are `findutils-default', `awk', `egrep', `ed', `emacs', `gnu-awk', `grep', `posix-awk', `posix-basic', `posix-egrep', `posix-extended', `posix-minimal-basic', `sed'.
To handle case insensitivity in regular expressions, use the -iregex option.
find . -iregex '.*\.\(txt\)' -print
./README.TXT
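Two points are worth illustrating : the expression given to -regex must match the whole path (hence the leading .*), and the posix-extended syntax removes the need for backslashes around groups and alternations. The sketch below assumes GNU find and uses hypothetical file names :

```shell
#!/bin/sh
# Hypothetical sandbox with mixed extensions and mixed case.
tmp=$(mktemp -d)
touch "$tmp/a.css" "$tmp/b.txt" "$tmp/c.JSON"

# posix-extended: ( ) and | are operators without backslashes.
# The expression is matched against the full path, e.g. ./a.css.
sensitive=$(cd "$tmp" && find . -regextype posix-extended \
  -regex '.*\.(css|json)' -print | sort)

# Same expression, case-insensitive.
insensitive=$(cd "$tmp" && find . -regextype posix-extended \
  -iregex '.*\.(css|json)' -print | sort)

printf 'case-sensitive:\n%s\n' "$sensitive"
printf 'case-insensitive:\n%s\n' "$insensitive"

rm -rf "$tmp"
```

Note that -regextype has to appear before the -regex or -iregex test it applies to.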
Combining find and xargs commands
To find the PHP function ereg_replace in the source code of *.php and *.inc files :
find . -type f \( -name "*.php" -o -name "*.inc" \) -print | xargs grep -ni "ereg_replace"
...
./sqlpacv2/prp_article.php5:216: $caption = ereg_replace("\.","",_USRDIR_DOC)."/".$datadoc[1]["fichier"];
./sqlpacv2/prp_glossaire.php5:95: $caption = ereg_replace("\.","",__USRDIR_DOC)."/".$article["fichier"];
The xargs command reads the file names returned by the find command from its standard input and passes them as arguments to the command that follows it.
So the command following xargs (grep in the example above) is executed on those files.
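File names containing spaces are split into several arguments by a plain xargs. A common workaround, sketched below, is to separate names with a null character using -print0 and xargs -0 (available in GNU and BSD tools). The file names and contents are hypothetical :

```shell
#!/bin/sh
# Hypothetical sandbox: a file name with a space, and a file without the pattern.
tmp=$(mktemp -d)
printf '%s\n' '<?php $s = ereg_replace("a", "b", $s);' > "$tmp/my page.php"
printf '%s\n' '<?php echo "nothing here";' > "$tmp/other.inc"

# -print0 ends each name with a null byte; xargs -0 splits on null bytes only,
# so "my page.php" reaches grep as a single argument.
matches=$(cd "$tmp" && find . -type f \( -name "*.php" -o -name "*.inc" \) -print0 \
  | xargs -0 grep -l "ereg_replace")

echo "files containing ereg_replace: $matches"

rm -rf "$tmp"
```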
Find and links
Option -xtype
The find command is very useful to find broken links :
find . -xtype l -exec ls -ll {} \;
lrwxrwxrwx 1 sqlpac wapp 26 Sep 3 14:33 ./postmenu.php -> ../../engines/postmenu.php
If the option -xtype is not available on the platform, use the command test :
find . -type l ! -exec test -e {} \; -exec ls -ll {} \;
lrwxrwxrwx 1 sqlpac wapp 26 Sep 3 14:33 ./postmenu.php -> ../../engines/postmenu.php
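Both forms can be compared on a small hypothetical tree containing one valid link and one broken link. The sketch assumes GNU find for the -xtype form :

```shell
#!/bin/sh
# Hypothetical sandbox: one link to an existing file, one link to nothing.
tmp=$(mktemp -d)
touch "$tmp/target.txt"
ln -s target.txt "$tmp/valid-link"
ln -s missing.txt "$tmp/broken-link"

# GNU form: -xtype l is true for links whose target cannot be resolved.
with_xtype=$(cd "$tmp" && find . -xtype l -print)

# Portable form: keep the links for which "test -e" fails.
with_test=$(cd "$tmp" && find . -type l ! -exec test -e {} \; -print)

echo "-xtype l : $with_xtype"
echo "test -e  : $with_test"

rm -rf "$tmp"
```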
A useful example : computing the size of images in a directory
In a single command line combining find and awk, to compute the size of images in a directory :
find . -regex '.*\.\(png\|gif\|jpg\|jpeg\)' -exec ls -l {} \; | \
awk 'BEGIN {sum=0} {sum+=$5} END { printf("%.2f %s\n",sum/1024000,"Mb") }'
18.16 Mb
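As a possible variant, GNU find can print the size of each file directly with -printf '%s\n', which avoids parsing the output of ls. This is a sketch under the assumption that GNU find is available ; the files and sizes below are hypothetical :

```shell
#!/bin/sh
# Hypothetical sandbox: two images (1000 and 2000 bytes) and one unrelated file.
tmp=$(mktemp -d)
head -c 1000 /dev/zero > "$tmp/a.png"
head -c 2000 /dev/zero > "$tmp/b.jpg"
head -c 4000 /dev/zero > "$tmp/notes.txt"

# %s prints the size in bytes, one file per line; awk adds the sizes up.
total=$(find "$tmp" -type f -regex '.*\.\(png\|gif\|jpg\|jpeg\)' -printf '%s\n' \
  | awk '{ sum += $1 } END { printf("%d\n", sum) }')

echo "total image size in bytes: $total"

rm -rf "$tmp"
```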
The grep command and its options
grep command syntax
grep -option(s) expression file(s)
Some common options :
-v | display lines that do not match the expression
-c | count the lines matching the expression without displaying them
-n | display each matching line with its line number
-i | ignore case
Some examples :
To search for tables (HTML tags <table>) in *.html files :
grep "<table" *.html
…
sybase-iq-12.7-migration-ase-vers-iq.html: <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html: <table>
sybase-iq-12.7-migration-ase-vers-iq.html: <table class="alt- r-brdr">
…
With line numbers and ignoring case :
grep -ni "<table" *.html
…
sybase-iq-12.7-migration-ase-vers-iq.html:232: <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:512: <table>
sybase-iq-12.7-migration-ase-vers-iq.html:661: <table class="alt- r-brdr">
…
Just the number of matching lines per file, ignoring case :
grep -ci "<table" *.html
…
sybase-iq-12.7-migration-ase-vers-iq.html:6
…
sybase-iq-index-cardinalite-sp_dba_helpcolumn.html:0
…
The fields in the output of the grep command are separated by a colon (:),
which makes it easy to process the results with the awk utility.
grep -ci "<table" *.html | \
awk -F":" 'BEGIN { hf=0; tb=0; } { if ($2 != 0) { hf++; tb +=$2 } } END { print tb" tables in "hf" files"}'
549 tables in 187 files
Regular expressions with grep
The option -E enables extended regular expressions (ERE) in the grep command.
To find the strings mysql_connect, mysql_query and mysql_close in *.php files with line numbers :
grep -ni -E "mysql_connect|mysql_query|mysql_close" *.php
cls_database_myisam.php5:59: $objRessource = mysql_connect(_SGBD_SERVER,_SGBD_USER,$pwdUnCrypted);
cls_database_myisam.php:81: $ret=mysql_close($objRessource);
cls_database_myisam.php5:108: $return = mysql_query($queryString,$objRessource);
cls_database_myisam.php5:114: $reserrors = mysql_query('select @errno as errno , @errmsg as errmsg'
cls_database_myisam.php5:341: $resultPlan=mysql_query("EXPLAIN ".$query,$objRessource);
The egrep command is equivalent to the grep command with the option -E (recent GNU grep releases flag egrep as obsolescent in favor of grep -E) :
egrep -ni "mysql_connect|mysql_query|mysql_close" *.php
Another possible syntax, more elegant :
egrep -ni "mysql_(connect|query|close)" *.php
The list of terms in a regular expression can become long for a single command line.
In that case, the terms can be defined in a text file, one pattern per line, and passed to the grep command with the option -f.
grep -ni -f regex.txt *.php
regex.txt
mysql_connect
mysql_query
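The following sketch shows the -f option end to end : it writes a pattern file and a hypothetical PHP file (db.php) into a temporary directory, then searches the latter with the patterns of the former :

```shell
#!/bin/sh
# Hypothetical sandbox: a pattern file and a small PHP source file.
tmp=$(mktemp -d)
printf '%s\n' 'mysql_connect' 'mysql_query' > "$tmp/regex.txt"
printf '%s\n' '<?php' '$c = mysql_connect($host);' \
  '$r = MYSQL_QUERY($sql, $c);' 'echo "done";' > "$tmp/db.php"

# A line is reported when it matches any pattern of regex.txt (-i: ignore case).
hits=$(grep -ni -f "$tmp/regex.txt" "$tmp/db.php")
hit_count=$(grep -ci -f "$tmp/regex.txt" "$tmp/db.php")

printf 'matching lines:\n%s\n' "$hits"
echo "count: $hit_count"

rm -rf "$tmp"
```

An empty line in the pattern file matches every input line, so the file should not contain blank lines.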
As with the find command, the grep command supports several regular expression syntaxes.
- Option -E: ERE, Extended Regular Expressions
- Option -G: BRE, Basic Regular Expressions
- Option -P: PCRE, Perl-Compatible Regular Expressions
Some other examples :
The character ^ in a regular expression matches the start of a line. To find lines starting with a star :
grep -ni -E "^\*" *.php
The character $ in a regular expression matches the end of a line. To find lines ending with a semicolon :
grep -ni -E ';$' *.php
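The two anchors can be tried on a small hypothetical file (sample.php). Note that the anchor $ comes after the character it anchors, and that single quotes keep the shell from interpreting $ :

```shell
#!/bin/sh
# Hypothetical sample file with four lines.
tmp=$(mktemp -d)
printf '%s\n' '* a comment line' '$x = 1;' 'if ($x) {' ' * indented star' \
  > "$tmp/sample.php"

# ^ anchors the match at the start of the line; \* is a literal star.
starts_with_star=$(grep -n -E '^\*' "$tmp/sample.php")

# $ anchors the match at the end of the line.
ends_with_semicolon=$(grep -n -E ';$' "$tmp/sample.php")

echo "starts with a star:    $starts_with_star"
echo "ends with a semicolon: $ends_with_semicolon"

rm -rf "$tmp"
```

The fourth line is not reported by the first search because its star is preceded by a space.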
Using the first examples, to find the tables in *.html files having CSS class rco- and/or r-brdr :
egrep -ni "<table.*class.*(rco-|r-brdr).*>" *.html
…
sybase-iq-12.7-migration-ase-vers-iq.html:53: <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:121: <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:152: <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:232: <table class="alt- r-brdr rco-">
sybase-iq-12.7-migration-ase-vers-iq.html:661: <table class="alt- r-brdr">
…
To list *.html files not containing tables (option -L) :
grep -L "<table" *.html
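The options -l and -L are complementary : the first lists the files with at least one matching line, the second lists the files with none. A minimal sketch with two hypothetical HTML files :

```shell
#!/bin/sh
# Hypothetical sandbox: one page with a table, one without.
tmp=$(mktemp -d)
printf '%s\n' '<table><tr><td>1</td></tr></table>' > "$tmp/with.html"
printf '%s\n' '<p>no tables</p>' > "$tmp/without.html"

# -l: names of the files containing a match.
with_table=$(cd "$tmp" && grep -l "<table" *.html)

# -L: names of the files containing no match.
without_table=$(cd "$tmp" && grep -L "<table" *.html)

echo "with tables:    $with_table"
echo "without tables: $without_table"

rm -rf "$tmp"
```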