Manual page and help for the agrep Linux command

Content

Data
Man page output
Help output

Data

license:
Version number: 4.17-9.1 (in Debian 10)
Developer / owner: Sun Wu and Udi Manber

Short description:

Manual page and help for the agrep Linux command. The agrep command searches files for strings or regular expressions, with approximate matching capabilities.

To use the command on a Debian system, the agrep package must be installed:

sudo apt-get install agrep

Man page output

man agrep

AGREP(1) General Commands Manual AGREP(1)

NAME
agrep - search a file for a string or regular expression, with approximate matching
capabilities

SYNOPSIS
agrep [ -#cdehiklnpstvwxBDGIS ] pattern [ -f patternfile ] [ filename... ]

DESCRIPTION
agrep searches the input filenames (standard input is the default, but see a warning under
LIMITATIONS) for records containing strings which either exactly or approximately match a
pattern. A record is by default a line, but it can be defined differently using the -d
option (see below). Normally, each record found is copied to the standard output.
Approximate matching allows finding records that contain the pattern with several errors,
including substitutions, insertions, and deletions. For example, Massechusets matches
Massachusetts with two errors (one substitution and one insertion). Running agrep -2
Massechusets foo outputs all lines in foo containing any string with at most 2 errors from
Massechusets.
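
The error count in the Massechusets example can be checked with a small dynamic-programming sketch. This is a textbook approximate-substring computation, not agrep's own algorithm, and the function name min_errors is made up for illustration.

```python
def min_errors(pattern, text):
    """Fewest insertions, deletions, and substitutions (each counted as one
    error) needed to turn `pattern` into some substring of `text`."""
    # prev[j]: cost of matching the pattern prefix processed so far against
    # a substring of text ending at position j. The empty prefix matches for
    # free everywhere, so a match may start at any position in the text.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        cur = [i]  # against empty text, all i pattern characters are dropped
        for j, t in enumerate(text, start=1):
            cur.append(min(
                prev[j - 1] + (p != t),  # match or substitution
                prev[j] + 1,             # pattern character missing from text
                cur[j - 1] + 1,          # extra character in text
            ))
        prev = cur
    return min(prev)  # the match may end at any position in the text


print(min_errors("Massechusets", "Massachusetts"))  # 2
```

Under this model a record would be reported by agrep -2 when min_errors returns 2 or less for it.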

agrep supports many kinds of queries including arbitrary wild cards, sets of patterns, and
in general, regular expressions. See PATTERNS below. It supports most of the options
supported by the grep family plus several more (but it is not 100% compatible with grep).
For more information on the algorithms used by agrep see Wu and Manber, "Fast Text
Searching With Errors," Technical report #91-11, Department of Computer Science, University of
Arizona, June 1991 (available by anonymous ftp from cs.arizona.edu in agrep/agrep.ps.1),
and Wu and Manber, "Agrep -- A Fast Approximate Pattern Searching Tool", To appear in
USENIX Conference 1992 January (available by anonymous ftp from cs.arizona.edu in
agrep/agrep.ps.2).

As with the rest of the grep family, the characters `$', `^', `*', `[', `]', `|', `(',
`)', `!', and `\' can cause unexpected results when included in the pattern, as these
characters are also meaningful to the shell. To avoid these problems, one should always
enclose the entire pattern argument in single quotes, i.e., 'pattern'. Do not use double
quotes (").

When agrep is applied to more than one input file, the name of the file is displayed
preceding each line which matches the pattern. The filename is not displayed when processing
a single file, so if you actually want the filename to appear, use /dev/null as a second
file in the list.

OPTIONS
-# # is a non-negative integer (at most 8) specifying the maximum number of errors
permitted in finding the approximate matches (defaults to zero). Generally, each
insertion, deletion, or substitution counts as one error. It is possible to adjust
the relative cost of insertions, deletions and substitutions (see the -I, -D, and -S
options).

-c Display only the count of matching records.

-d 'delim'
Define delim to be the separator between two records. The default value is '$',
namely a record is by default a line. delim can be a string of size at most 8
(with possible use of ^ and $), but not a regular expression. Each stretch of text
between two delims, before the first delim, or after the last delim is considered one
record. For example, -d '$$' defines paragraphs as records and -d '^From ' defines
mail messages as records. agrep matches each record separately. This option does
not currently work with regular expressions.
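
The effect of changing the record delimiter can be pictured with a rough sketch. It assumes a literal delimiter string, with "\n\n" standing in for -d '$$', and it uses exact substring matching only; split_records and matching_records are hypothetical helper names, not part of agrep.

```python
def split_records(text, delim="\n"):
    """Split text into records at each occurrence of the literal delim.
    Text before the first delim and after the last one forms a record too."""
    return text.split(delim)


def matching_records(text, word, delim="\n"):
    """Return the records that contain `word` (exact match only)."""
    return [record for record in split_records(text, delim) if word in record]


text = "first line\nsecond line\n\nthird line"

# Default delimiter: every line is its own record.
print(matching_records(text, "second"))

# Blank-line delimiter: the first two lines form one paragraph record,
# so the whole paragraph is returned for a match on "second".
print(matching_records(text, "second", delim="\n\n"))
```

The same pattern therefore returns a single line in the first call and a two-line paragraph in the second.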

-e pattern
Same as a simple pattern argument, but useful when the pattern begins with a `-'.

-f patternfile
patternfile contains a set of (simple) patterns. The output is all lines that
match at least one of the patterns in patternfile. Currently, the -f option works
only for exact match and for simple patterns (any meta symbol is interpreted as a
regular character); it is compatible only with the -c, -h, -i, -l, -s, -v, -w, and -x
options. See LIMITATIONS for size bounds.

-h Do not display filenames.

-i Case-insensitive search — e.g., "A" and "a" are considered equivalent.

-k No symbol in the pattern is treated as a meta character. For example, agrep -k
'a(b|c)*d' foo will find the occurrences of a(b|c)*d in foo whereas agrep
'a(b|c)*d' foo will find substrings in foo that match the regular expression
'a(b|c)*d'.

-l List only the files that contain a match. This option is useful for looking for
files containing a certain pattern. For example, " agrep -l 'wonderful' * " will
list the names of those files in the current directory that contain the word
'wonderful'.

-n Each line that is printed is prefixed by its record number in the file.

-p Find records in the text that contain a supersequence of the pattern. For example,
agrep -p DCS foo will match "Department of Computer Science."
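
A minimal sketch of what "contains a supersequence of the pattern" means, assuming no errors are allowed: the pattern characters must appear in the record in the same order, with anything in between. The function name is hypothetical.

```python
def contains_supersequence(pattern, record):
    """True if the characters of `pattern` occur in `record` in order,
    possibly separated by other characters."""
    chars = iter(record)
    # `ch in chars` consumes the iterator up to and including the first
    # occurrence of ch, so each pattern character is looked for only
    # after the position where the previous one was found.
    return all(ch in chars for ch in pattern)


print(contains_supersequence("DCS", "Department of Computer Science."))  # True
print(contains_supersequence("DCS", "Computer Science Department"))      # False
```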

-s Work silently, that is, display nothing except error messages. This is useful for
checking the error status.

-t Output the record starting from the end of delim to (and including) the next delim.
This is useful for cases where delim should come at the end of the record.

-v Inverse mode — display only those records that do not contain the pattern.

-w Search for the pattern as a word — i.e., surrounded by non-alphanumeric characters.
The non-alphanumeric must surround the match; they cannot be counted as errors.
For example, agrep -w -1 car will match cars, but not characters.

-x The pattern must match the whole line.

-y Used with the -B option. When -y is on, agrep will always output the best matches
without giving a prompt.

-B Best match mode. When -B is specified and no exact matches are found, agrep will
continue to search until the closest matches (i.e., the ones with minimum number of
errors) are found, at which point the following message will be shown: "the best
match contains x errors, there are y matches, output them? (y/n)" The best match
mode is not supported for standard input, e.g., pipeline input. When the -#, -c,
or -l options are specified, the -B option is ignored. In general, -B may be
slower than -#, but not by very much.

-Dk Set the cost of a deletion to k (k is a positive integer). This option does not
currently work with regular expressions.

-G Output the files that contain a match.

-Ik Set the cost of an insertion to k (k is a positive integer). This option does not
currently work with regular expressions.

-Sk Set the cost of a substitution to k (k is a positive integer). This option does
not currently work with regular expressions.
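
How the -D, -I, and -S costs interact with the error limit can be sketched as a weighted edit-distance computation. This is an illustration under assumptions, not agrep's implementation: an insertion is taken to mean an extra character in the text (as in the Massechusets example), a deletion means a pattern character missing from the text, and min_cost is a made-up name.

```python
def min_cost(pattern, text, ins=1, dele=1, sub=1):
    """Cheapest way to turn `pattern` into some substring of `text`,
    with separate costs for insertions, deletions, and substitutions."""
    prev = [0] * (len(text) + 1)  # a match may start anywhere in the text
    for p in pattern:
        cur = [prev[0] + dele]  # no text consumed: pattern character deleted
        for j, t in enumerate(text, start=1):
            cur.append(min(
                prev[j - 1] + (0 if p == t else sub),  # match or substitution
                prev[j] + dele,                        # deletion
                cur[j - 1] + ins,                      # insertion
            ))
        prev = cur
    return min(prev)  # a match may end anywhere in the text


# Costs corresponding to -D2 -S2 with the default insertion cost of 1.
print(min_cost("ABCD", "ABzCD", dele=2, sub=2))  # 1: one insertion
print(min_cost("ABCD", "ABD", dele=2, sub=2))    # 2: one deletion
print(min_cost("ABCD", "ABXD", dele=2, sub=2))   # 2: one substitution
```

With an error limit of 1, only the first of these three texts would still count as a match, which is the sense in which higher costs make deletions and substitutions too expensive.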

PATTERNS
agrep supports a large variety of patterns, including simple strings, strings with classes
of characters, sets of strings, wild cards, and regular expressions.

Strings
any sequence of characters, including the special symbols `^' for beginning of line
and `$' for end of line. The special characters listed above ( `$', `^', `*', `[',
`]', `|', `(', `)', `!', and `\' ) should be preceded by `\' if they are to be
matched as regular characters. For example, \^abc\\ corresponds to the string
^abc\, whereas ^abc corresponds to the string abc at the beginning of a line.

Classes of characters
a list of characters inside [] (in order) corresponds to any character from the
list. For example, [a-ho-z] is any character between a and h or between o and z.
The symbol `^' inside [] complements the list. For example, [^i-n] denotes any
character in the character set except the characters 'i' to 'n'. The symbol `^' thus
has two meanings, but this is consistent with egrep. The symbol `.' (don't care)
stands for any symbol (except for the newline symbol).

Boolean operations
agrep supports an `and' operation `;' and an `or' operation `,', but not a
combination of both. For example, 'fast;network' searches for all records containing both
words.
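
The two boolean operators can be modelled in a few lines. This sketch assumes plain words with exact matching and a hypothetical function name; it only illustrates the and/or semantics and the rule that the two operators cannot be mixed.

```python
def record_matches(pattern, record):
    """Evaluate an agrep-style boolean pattern against one record."""
    if ";" in pattern and "," in pattern:
        raise ValueError("';' and ',' cannot be combined in one pattern")
    if ";" in pattern:
        # 'and': every word must occur somewhere in the record
        return all(word in record for word in pattern.split(";"))
    # 'or' (a pattern without ',' is just a single word)
    return any(word in record for word in pattern.split(","))


print(record_matches("fast;network", "a fast local network"))  # True
print(record_matches("fast;network", "a fast car"))            # False
print(record_matches("fast,network", "a fast car"))            # True
```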

Wild cards
The symbol '#' is used to denote a wild card. # matches zero or any number of
arbitrary characters. For example, ex#e matches example. The symbol # is equivalent
to .* in egrep. In fact, .* will work too, because it is a valid regular
expression (see below), but unless this is part of an actual regular expression, # will
work faster.
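
The stated equivalence between # and .* can be tried out by translating a wild-card pattern into a regular expression. The sketch below uses Python's re module as a stand-in for egrep and assumes '#' is the only special symbol in the pattern; wildcard_to_regex is a hypothetical helper.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern whose only special symbol is '#' into a regex,
    escaping everything else so it is matched literally."""
    return ".*".join(re.escape(part) for part in pattern.split("#"))


regex = wildcard_to_regex("ex#e")
print(regex)                                 # ex.*e
print(bool(re.search(regex, "an example")))  # True
print(bool(re.search(regex, "exit")))        # False
```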

Combination of exact and approximate matching
any pattern inside angle brackets <> must match the text exactly even if the match
is with errors. For example, <mathemat>ics matches mathematical with one error
(replacing the last s with an a), but mathe<matics> does not match mathematical no
matter how many errors we allow.

Regular expressions
The syntax of regular expressions in agrep is in general the same as that for
egrep. The union operation `|', Kleene closure `*', and parentheses () are all
supported. Currently '+' is not supported. Regular expressions are currently
limited to approximately 30 characters (generally excluding meta characters). Some
options (-d, -w, -f, -t, -x, -D, -I, -S) do not currently work with regular
expressions. The maximal number of errors for regular expressions that use '*' or '|' is
4.

EXAMPLES
agrep -2 -c ABCDEFG foo
gives the number of lines in file foo that contain ABCDEFG within two errors.

agrep -1 -D2 -S2 'ABCD#YZ' foo
outputs the lines containing ABCD followed, within arbitrary distance, by YZ, with
up to one additional insertion (-D2 and -S2 make deletions and substitutions too
"expensive").

agrep -5 -p abcdefghij /path/to/dictionary/words
outputs the list of all words containing at least 5 of the first 10 letters of the
alphabet in order. (Try it: any list starting with academia and ending with
sacrilegious must mean something!)
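
The "at least 5 of the first 10 letters in order" reading can be checked with a longest-common-subsequence count. This is an interpretation sketched for illustration, not agrep's code: it assumes that each pattern letter missing from the word counts as one error, and both function names are made up.

```python
def letters_in_order(pattern, word):
    """Length of the longest common subsequence of `pattern` and `word`."""
    prev = [0] * (len(word) + 1)
    for p in pattern:
        cur = [0]
        for j, w in enumerate(word, start=1):
            if p == w:
                cur.append(prev[j - 1] + 1)          # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))  # skip a letter on one side
        prev = cur
    return prev[-1]


def matches_with_errors(pattern, word, max_errors):
    """True if at most `max_errors` pattern letters are missing from `word`."""
    return len(pattern) - letters_in_order(pattern, word) <= max_errors


print(letters_in_order("abcdefghij", "academia"))      # 5 (a, c, d, e, i)
print(letters_in_order("abcdefghij", "sacrilegious"))  # 5 (a, c, e, g, i)
print(matches_with_errors("abcdefghij", "academia", 5))  # True
```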

agrep -1 'abc[0-9](de|fg)*[x-z]' foo
outputs the lines containing, within up to one error, the string that starts with
abc followed by one digit, followed by zero or more repetitions of either de or fg,
followed by either x, y, or z.

agrep -d '^From ' 'breakdown;internet' mbox
outputs all mail messages (the pattern '^From ' separates mail messages in a mail
file) that contain keywords 'breakdown' and 'internet'.

agrep -d '$$' -1 '<word1> <word2>' foo
finds all paragraphs that contain word1 followed by word2 with one error in place
of the blank. In particular, if word1 is the last word in a line and word2 is the
first word in the next line, then the space will be substituted by a newline symbol
and it will match. Thus, this is a way to overcome separation by a newline. Note
that -d '$$' (or another delim which spans more than one line) is necessary,
because otherwise agrep searches only one line at a time.

agrep '^agrep' <this manual>
outputs all the examples of the use of agrep in this man page.

SEE ALSO
ed(1), ex(1), grep(1V), sh(1), csh(1).

BUGS/LIMITATIONS
Any bug reports or comments will be appreciated! Please mail them to sw@cs.arizona.edu or
udi@cs.arizona.edu

Regular expressions do not support the '+' operator (match 1 or more instances of the
preceding token). These can be searched for by using this syntax in the pattern:

'pattern(pattern)*'

(search for strings containing one instance of the pattern, followed by 0 or more
instances of the pattern).
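
The rewrite of '+' can be sanity-checked against a regex engine that does support it. The sketch below uses Python's re module, which is not agrep's matcher, purely to show that X(X)* accepts the same strings as X+; one_or_more is a hypothetical helper.

```python
import re


def one_or_more(subpattern):
    """Rewrite X+ as X(X)* for an engine that lacks the '+' operator."""
    return f"{subpattern}({subpattern})*"


rewritten = one_or_more("(de|fg)")
print(rewritten)  # (de|fg)((de|fg))*

# Compare the rewritten pattern with the '+' form on a few whole strings.
for candidate in ["", "de", "defg", "fgfgde", "dex"]:
    with_plus = re.fullmatch("(de|fg)+", candidate) is not None
    without_plus = re.fullmatch(rewritten, candidate) is not None
    print(repr(candidate), with_plus, without_plus)
```

Each line of output shows the same truth value twice, since both forms accept exactly the strings made of one or more repetitions of de or fg.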

The following can cause an infinite loop: agrep pattern * > output_file. If the number of
matches is high, they may be deposited in output_file before it is completely read, leading
to more matches of the pattern within output_file (the matches are against the whole
directory). It's not clear whether this is a "bug" (grep will do the same), but be warned.

The maximum size of the patternfile is limited to 250Kb, and the maximum number of
patterns is limited to 30,000.

Standard input is the default if no input file is given. However, if standard input is
keyed in directly (as opposed to through a pipe, for example) agrep may not work for some
non-simple patterns.

There is no size limit for simple patterns. More complicated patterns are currently
limited to approximately 30 characters. Lines are limited to 1024 characters. Records are
limited to 48K, and may be truncated if they are larger than that. The limit of record
length can be changed by modifying the parameter Max_record in agrep.h.

DIAGNOSTICS
Exit status is 0 if any matches are found, 1 if none, 2 for syntax errors or inaccessible
files.
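
A script that calls agrep can branch on these exit codes. The wrapper below is a sketch that assumes agrep is installed and on the PATH; the function names are made up, and only options documented above (-# and -s) are used.

```python
import subprocess


def interpret_exit_status(code):
    """Map the documented exit codes to a short description."""
    if code == 0:
        return "match"
    if code == 1:
        return "no match"
    return "error"  # 2: syntax error or inaccessible file


def agrep_finds(pattern, path, max_errors=0):
    """Run agrep silently and report whether `path` contains a match.
    Raises FileNotFoundError if the agrep binary is not available."""
    command = ["agrep", "-s"]
    if max_errors:
        command.append(f"-{max_errors}")  # e.g. -2 allows up to two errors
    command += [pattern, path]
    outcome = interpret_exit_status(subprocess.run(command).returncode)
    if outcome == "error":
        raise RuntimeError(f"agrep failed for pattern {pattern!r} on {path!r}")
    return outcome == "match"


print(interpret_exit_status(1))  # no match
```

Passing the pattern as a separate list element avoids the shell quoting issues described in the DESCRIPTION section, because no shell is involved.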

AUTHORS
Sun Wu and Udi Manber, Department of Computer Science, University of Arizona, Tucson, AZ
85721. {sw|udi}@cs.arizona.edu.

Jan 17, 1992 AGREP(1)

Help output

agrep

usage: agrep [-@#abcdehiklnoprstvwxyBDGIMSV] [-f patternfile] [-H dir] pattern [files]

summary of frequently used options:
(For a more detailed listing see 'man agrep'.)
-#: find matches with at most # errors
-c: output the number of matched records
-d: define record delimiter
-h: do not output file names
-i: case-insensitive search, e.g., 'a' = 'A'
-l: output the names of files that contain a match
-n: output record prefixed by record number
-v: output those records that have no matches
-w: pattern has to match as a word, e.g., 'win' will not match 'wind'
-B: best match mode. find the closest matches to the pattern
-G: output the files that contain a match
-H 'dir': the cast-dictionary is located in directory 'dir'
