|
PROLOG | NAME | SYNOPSIS | DESCRIPTION | OPTIONS | OPERANDS | STDIN | INPUT FILES | ENVIRONMENT VARIABLES | ASYNCHRONOUS EVENTS | STDOUT | STDERR | OUTPUT FILES | EXTENDED DESCRIPTION | EXIT STATUS | CONSEQUENCES OF ERRORS | APPLICATION USAGE | EXAMPLES | RATIONALE | FUTURE DIRECTIONS | SEE ALSO | COPYRIGHT |
DIFF(1P) POSIX Programmer's Manual DIFF(1P)
This manual page is part of the POSIX Programmer's Manual. The Linux
implementation of this interface may differ (consult the
corresponding Linux manual page for details of Linux behavior), or
the interface may not be implemented on Linux.
diff — compare two files
diff [−c|−e|−f|−u|−C n|−U n] [−br] file1 file2
The diff utility shall compare the contents of file1 and file2 and
write to standard output a list of changes necessary to convert file1
into file2. This list should be minimal. No output shall be produced
if the files are identical.
The diff utility shall conform to the Base Definitions volume of
POSIX.1‐2008, Section 12.2, Utility Syntax Guidelines.
The following options shall be supported:
−b Cause any amount of white space at the end of a line to be
treated as a single <newline> (that is, the white-space
characters preceding the <newline> are ignored) and other
strings of white-space characters, not including <newline>
characters, to compare equal.
−c Produce output in a form that provides three lines of
copied context.
−C n Produce output in a form that provides n lines of copied
context (where n shall be interpreted as a positive decimal
integer).
−e Produce output in a form suitable as input for the ed
utility, which can then be used to convert file1 into
file2.
−f Produce output in an alternative form, similar in format to
−e, but not intended to be suitable as input for the ed
utility, and in the opposite order.
−r Apply diff recursively to files and directories of the same
name when file1 and file2 are both directories.
The diff utility shall detect infinite loops; that is,
entering a previously visited directory that is an ancestor
of the last file encountered. When it detects an infinite
loop, diff shall write a diagnostic message to standard
error and shall either recover its position in the
hierarchy or terminate.
−u Produce output in a form that provides three lines of
unified context.
−U n Produce output in a form that provides n lines of unified
context (where n shall be interpreted as a non-negative
decimal integer).
The following operands shall be supported:
file1, file2
A pathname of a file to be compared. If either the file1 or
file2 operand is '−', the standard input shall be used in
its place.
If both file1 and file2 are directories, diff shall not compare block
special files, character special files, or FIFO special files to any
files and shall not compare regular files to directories. Further
details are as specified in Diff Directory Comparison Format. The
behavior of diff on other file types is implementation-defined when
found in directories.
If only one of file1 and file2 is a directory, diff shall be applied
to the non-directory file and the file contained in the directory
file with a filename that is the same as the last component of the
non-directory file.
The standard input shall be used only if one of the file1 or file2
operands references standard input. See the INPUT FILES section.
The input files may be of any type.
The following environment variables shall affect the execution of
diff:
LANG Provide a default value for the internationalization
variables that are unset or null. (See the Base Definitions
volume of POSIX.1‐2008, Section 8.2, Internationalization
Variables for the precedence of internationalization
variables used to determine the values of locale
categories.)
LC_ALL If set to a non-empty string value, override the values of
all the other internationalization variables.
LC_CTYPE Determine the locale for the interpretation of sequences of
bytes of text data as characters (for example, single-byte
as opposed to multi-byte characters in arguments and input
files).
LC_MESSAGES
Determine the locale that should be used to affect the
format and contents of diagnostic messages written to
standard error and informative messages written to standard
output.
LC_TIME Determine the locale for affecting the format of file
timestamps written with the −C and −c options.
NLSPATH Determine the location of message catalogs for the
processing of LC_MESSAGES.
TZ Determine the timezone used for calculating file timestamps
written with a context format. If TZ is unset or null, an
unspecified default timezone shall be used.
Default.
Diff Directory Comparison Format
If both file1 and file2 are directories, the following output formats
shall be used.
In the POSIX locale, each file that is present in only one directory
shall be reported using the following format:
"Only in %s: %s\n", <directory pathname>, <filename>
In the POSIX locale, subdirectories that are common to the two
directories may be reported with the following format:
"Common subdirectories: %s and %s\n", <directory1 pathname>,
<directory2 pathname>
For each file common to the two directories, if the two files are not
to be compared: if the two files have the same device ID and file
serial number, or are both block special files that refer to the same
device, or are both character special files that refer to the same
device, in the POSIX locale the output format is unspecified.
Otherwise, in the POSIX locale an unspecified format shall be used
that contains the pathnames of the two files.
For each file common to the two directories, if the files are
compared and are identical, no output shall be written. If the two
files differ, the following format is written:
"diff %s %s %s\n", <diff_options>, <filename1>, <filename2>
where <diff_options> are the options as specified on the command
line.
All directory pathnames listed in this section shall be relative to
the original command line arguments. All other names of files listed
in this section shall be filenames (pathname components).
Diff Binary Output Format
In the POSIX locale, if one or both of the files being compared are
not text files, it is implementation-defined whether diff uses the
binary file output format or the other formats as specified below.
The binary file output format shall contain the pathnames of two
files being compared and the string "differ".
If both files being compared are text files, depending on the options
specified, one of the following formats shall be used to write the
differences.
Diff Default Output Format
The default (without −e, −f, −c, −C, −u, or −U options) diff utility
output shall contain lines of these forms:
"%da%d\n", <num1>, <num2>
"%da%d,%d\n", <num1>, <num2>, <num3>
"%dd%d\n", <num1>, <num2>
"%d,%dd%d\n", <num1>, <num2>, <num3>
"%dc%d\n", <num1>, <num2>
"%d,%dc%d\n", <num1>, <num2>, <num3>
"%dc%d,%d\n", <num1>, <num2>, <num3>
"%d,%dc%d,%d\n", <num1>, <num2>, <num3>, <num4>
These lines resemble ed subcommands to convert file1 into file2. The
line numbers before the action letters shall pertain to file1; those
after shall pertain to file2. Thus, by exchanging a for d and
reading the line in reverse order, one can also determine how to
convert file2 into file1. As in ed, identical pairs (where num1=
num2) are abbreviated as a single number.
Following each of these lines, diff shall write to standard output
all lines affected in the first file using the format:
"< %s", <line>
and all lines affected in the second file using the format:
"> %s", <line>
If there are lines affected in both file1 and file2 (as with the c
subcommand), the changes are separated with a line consisting of
three <hyphen> characters:
"−−−\n"
Diff −e Output Format
With the −e option, a script shall be produced that shall, when
provided as input to ed, along with an appended w (write) command,
convert file1 into file2. Only the a (append), c (change), d
(delete), i (insert), and s (substitute) commands of ed shall be used
in this script. Text lines, except those consisting of the single
character <period> ('.'), shall be output as they appear in the file.
Diff −f Output Format
With the −f option, an alternative format of script shall be
produced. It is similar to that produced by −e, with the following
differences:
1. It is expressed in reverse sequence; the output of −e orders
changes from the end of the file to the beginning; the −f from
beginning to end.
2. The command form <lines> <command-letter> used by −e is reversed.
For example, 10c with −e would be c10 with −f.
3. The form used for ranges of line numbers is <space>-separated,
rather than <comma>-separated.
Diff −c or −C Output Format
With the −c or −C option, the output format shall consist of affected
lines along with surrounding lines of context. The affected lines
shall show which ones need to be deleted or changed in file1, and
those added from file2. With the −c option, three lines of context,
if available, shall be written before and after the affected lines.
With the −C option, the user can specify how many lines of context
are written. The exact format follows.
The name and last modification time of each file shall be output in
the following format:
"*** %s %s\n", file1, <file1 timestamp>
"−−− %s %s\n", file2, <file2 timestamp>
Each <file> field shall be the pathname of the corresponding file
being compared. The pathname written for standard input is
unspecified.
In the POSIX locale, each <timestamp> field shall be equivalent to
the output from the following command:
date "+%a %b %e %T %Y"
without the trailing <newline>, executed at the time of last
modification of the corresponding file (or the current time, if the
file is standard input).
Then, the following output formats shall be applied for every set of
changes.
First, a line shall be written in the following format:
"***************\n"
Next, the range of lines in file1 shall be written in the following
format if the range contains two or more lines:
"*** %d,%d ****\n", <beginning line number>, <ending line number>
and the following format otherwise:
"*** %d ****\n", <ending line number>
The ending line number of an empty range shall be the number of the
preceding line, or 0 if the range is at the start of the file.
Next, the affected lines along with lines of context (unaffected
lines) shall be written. Unaffected lines shall be written in the
following format:
" %s", <unaffected_line>
Deleted lines shall be written as:
"− %s", <deleted_line>
Changed lines shall be written as:
"! %s", <changed_line>
Next, the range of lines in file2 shall be written in the following
format if the range contains two or more lines:
"−−− %d,%d −−−−\n", <beginning line number>, <ending line number>
and the following format otherwise:
"−−− %d −−−−\n", <ending line number>
Then, lines of context and changed lines shall be written as
described in the previous formats. Lines added from file2 shall be
written in the following format:
"+ %s", <added_line>
Diff −u or −U Output Format
The −u or −U options behave like the −c or −C options, except that
the context lines are not repeated; instead, the context, deleted,
and added lines are shown together, interleaved. The exact format
follows.
The name and last modification time of each file shall be output in
the following format:
"--- %s%s%s %s0, file1, <file1 timestamp>, <file1 frac>, <file1 zone>
"+++ %s%s%s %s0, file2, <file2 timestamp>, <file2 frac>, <file2 zone>
Each <file> field shall be the pathname of the corresponding file
being compared, or the single character '−' if standard input is
being compared. However, if the pathname contains a <tab> or a
<newline>, or if it does not consist entirely of characters taken
from the portable character set, the behavior is implementation-
defined.
Each <timestamp> field shall be equivalent to the output from the
following command:
date '+%Y-%m-%d %H:%M:%S'
without the trailing <newline>, executed at the time of last
modification of the corresponding file (or the current time, if the
file is standard input).
Each <frac> field shall be either empty, or a decimal point followed
by at least one decimal digit, indicating the fractional-seconds part
(if any) of the file timestamp. The number of fractional digits shall
be at least the number needed to represent the file's timestamp
without loss of information.
Each <zone> field shall be of the form "shhmm", where "shh" is a
signed two-digit decimal number in the range −24 through +25, and
"mm" is an unsigned two-digit decimal number in the range 00 through
59. It represents the timezone of the timestamp as the number of
hours (hh) and minutes (mm) east (+) or west (−) of UTC for the
timestamp. If the hours and minutes are both zero, the sign shall be
'+'. However, if the timezone is not an integral number of minutes
away from UTC, the <zone> field is implementation-defined.
Then, the following output formats shall be applied for every set of
changes.
First, the range of lines in each file shall be written in the
following format:
"@@ -%s +%s @@", <file1 range>, <file2 range>
Each <range> field shall be of the form:
"%1d", <beginning line number>
if the range contains exactly one line, and:
"%1d,%1d", <beginning line number>, <number of lines>
otherwise. If a range is empty, its beginning line number shall be
the number of the line just before the range, or 0 if the empty range
starts the file.
Next, the affected lines along with lines of context shall be
written. Each non-empty unaffected line shall be written in the
following format:
" %s", <unaffected_line>
where the contents of the unaffected line shall be taken from file1.
It is implementation-defined whether an empty unaffected line is
written as an empty line or a line containing a single <space>
character. This line also represents the same line of file2, even
though file2's line may contain different contents due to the −b.
Deleted lines shall be written as:
"-%s", <deleted_line>
Added lines shall be written as:
"+%s", <added_line>
The order of lines written shall be the same as that of the
corresponding file. A deleted line shall never be written immediately
after an added line.
If −U n is specified, the output shall contain no more than n
consecutive unaffected lines; and if the output contains an affected
line and this line is adjacent to up to n consecutive unaffected
lines in the corresponding file, the output shall contain these
unaffected lines. −u shall act like −U3.
The standard error shall be used only for diagnostic messages.
None.
None.
The following exit values shall be returned:
0 No differences were found.
1 Differences were found.
>1 An error occurred.
Default.
The following sections are informative.
If lines at the end of a file are changed and other lines are added,
diff output may show this as a delete and add, as a change, or as a
change and add; diff is not expected to know which happened and users
should not care about the difference in output as long as it clearly
shows the differences between the files.
If dir1 is a directory containing a directory named x, dir2 is a
directory containing a directory named x, dir1/x and dir2/x both
contain files named date.out, and dir2/x contains a file named y, the
command:
diff −r dir1 dir2
could produce output similar to:
Common subdirectories: dir1/x and dir2/x
Only in dir2/x: y
diff −r dir1/x/date.out dir2/x/date.out
1c1
< Mon Jul 2 13:12:16 PDT 1990
−−−
> Tue Jun 19 21:41:39 PDT 1990
The −h option was omitted because it was insufficiently specified and
does not add to applications portability.
Historical implementations employ algorithms that do not always
produce a minimum list of differences; the current language about
making every effort is the best this volume of POSIX.1‐2008 can do,
as there is no metric that could be employed to judge the quality of
implementations against any and all file contents. The statement
``This list should be minimal'' clearly implies that implementations
are not expected to provide the following output when comparing two
100-line files that differ in only one character on a single line:
1,100c1,100
all 100 lines from file1 preceded with "< "
−−−
all 100 lines from file2 preceded with "> "
The ``Only in'' messages required when the −r option is specified are
not used by most historical implementations if the −e option is also
specified. It is required here because it provides useful information
that must be provided to update a target directory hierarchy to match
a source hierarchy. The ``Common subdirectories'' messages are
written by System V and 4.3 BSD when the −r option is specified. They
are allowed here but are not required because they are reporting on
something that is the same, not reporting a difference, and are not
needed to update a target hierarchy.
The −c option, which writes output in a format using lines of
context, has been included. The format is useful for a variety of
reasons, among them being much improved readability and the ability
to understand difference changes when the target file has line
numbers that differ from another similar, but slightly different,
copy. The patch utility is most valuable when working with difference
listings using a context format. The BSD version of −c takes an
optional argument specifying the amount of context. Rather than
overloading −c and breaking the Utility Syntax Guidelines for diff,
the standard developers decided to add a separate option for
specifying a context diff with a specified amount of context (−C).
Also, the format for context diffs was extended slightly in 4.3 BSD
to allow multiple changes that are within context lines from each
other to be merged together. The output format contains an additional
four <asterisk> characters after the range of affected lines in the
first filename. This was to provide a flag for old programs (like old
versions of patch) that only understand the old context format. The
version of context described here does not require that multiple
changes within context lines be merged, but it does not prohibit it
either. The extension is upwards-compatible, so any vendors that wish
to retain the old version of diff can do so by adding the extra four
<asterisk> characters (that is, utilities that currently use diff and
understand the new merged format will also understand the old
unmerged format, but not vice versa).
The −u and −U options of GNU diff have been included. Their output
format, designed by Wayne Davison, takes up less space than −c and −C
format, and in many cases is easier to read. The format's timestamps
do not vary by locale, so LC_TIME does not affect it. The format's
line numbers are rendered with the %1d format, not %d, because the
file format notation rules would allow extra <blank> characters to
appear around the numbers.
The substitute command was added as an additional format for the −e
option. This was added to provide implementations with a way to fix
the classic ``dot alone on a line'' bug present in many versions of
diff. Since many implementations have fixed this bug, the standard
developers decided not to standardize broken behavior, but rather to
provide the necessary tool for fixing the bug. One way to fix this
bug is to output two periods whenever a lone period is needed, then
terminate the append command with a period, and then use the
substitute command to convert the two periods into one period.
The BSD-derived −r option was added to provide a mechanism for using
diff to compare two file system trees. This behavior is useful, is
standard practice on all BSD-derived systems, and is not easily
reproducible with the find utility.
The requirement that diff not compare files in some circumstances,
even though they have the same name, is based on the actual output of
historical implementations. The specified behavior precludes the
problems arising from running into FIFOs and other files that would
cause diff to hang waiting for input with no indication to the user
that diff was hung. An earlier version of this standard specified the
output format more precisely, but in practice this requirement was
widely ignored and the benefit of standardization seemed small, so it
is now unspecified. In most common usage, diff −r should indicate
differences in the file hierarchies, not the difference of contents
of devices pointed to by the hierarchies.
Many early implementations of diff require seekable files. Since the
System Interfaces volume of POSIX.1‐2008 supports named pipes, the
standard developers decided that such a restriction was unreasonable.
Note also that the allowed filename − almost always refers to a pipe.
No directory search order is specified for diff. The historical
ordering is, in fact, not optimal, in that it prints out all of the
differences at the current level, including the statements about all
common subdirectories before recursing into those subdirectories.
The message:
"diff %s %s %s\n", <diff_options>, <filename1>, <filename2>
does not vary by locale because it is the representation of a
command, not an English sentence.
None.
cmp(1p), comm(1p), ed(1p), find(1p)
The Base Definitions volume of POSIX.1‐2008, Chapter 8, Environment
Variables, Section 12.2, Utility Syntax Guidelines
Portions of this text are reprinted and reproduced in electronic form
from IEEE Std 1003.1, 2013 Edition, Standard for Information
Technology -- Portable Operating System Interface (POSIX), The Open
Group Base Specifications Issue 7, Copyright (C) 2013 by the
Institute of Electrical and Electronics Engineers, Inc and The Open
Group. (This is POSIX.1-2008 with the 2013 Technical Corrigendum 1
applied.) In the event of any discrepancy between this version and
the original IEEE and The Open Group Standard, the original IEEE and
The Open Group Standard is the referee document. The original
Standard can be obtained online at http://www.unix.org/online.html .
Any typographical or formatting errors that appear in this page are
most likely to have been introduced during the conversion of the
source files to man page format. To report such errors, see
https://www.kernel.org/doc/man-pages/reporting_bugs.html .
IEEE/The Open Group 2013 DIFF(1P)
Pages that refer to this page: cmp(1p), comm(1p), delta(1p), patch(1p), xargs(1p)