Diff

  1. History

  2. Algorithm

  3. Usage

  4. Variations

      • Edit script
      • Context format
      • Unified format
      • Others

  5. Free file comparison tools

  6. See also

  7. References

  8. Further reading

  9. External links

{{About|the file comparison utility|data comparisons in general|data comparison|diffs in Wikipedia|:help:diff|other uses|DIFF (disambiguation){{!}}DIFF}}{{lowercase|title=diff}}{{Infobox software
| name = diff
| title = diff
| screenshot =
| caption =
| screenshot size =
| screenshot alt =
| collapsible =
| author = Douglas McIlroy
| developer = AT&T Bell Laboratories
| released = {{Start date and age|1974|6}}
| latest release version =
| latest release date =
| programming language =
| platform = Unix and Unix-like
| genre = Command
| license =
| website =
| standard =
| AsOf =
}}

In computing, the {{Mono|diff}} utility is a data comparison tool that calculates and displays the differences between two files. Unlike edit distance notions used for other purposes, {{Mono|diff}} is line-oriented rather than character-oriented, but it is like Levenshtein distance in that it tries to determine the smallest set of deletions and insertions to create one file from the other. The {{Mono|diff}} command displays the changes made in a standard format, such that both humans and machines can understand the changes and apply them: given one file and the changes, the other file can be created.

Typically, {{Mono|diff}} is used to show the changes between two versions of the same file. Modern implementations also support binary files.[1] The output is called a "diff", or a patch, since the output can be applied with the Unix program {{Mono|patch}}. The output of similar file comparison utilities is also called a "diff"; like the use of the word "grep" for describing the act of searching, the word diff became a generic term for calculating data difference and the results thereof.[2] The POSIX standard specifies the behavior of the "diff" and "patch" utilities and their file formats.[3]

History

The {{Mono|diff}} utility was developed in the early 1970s on the Unix operating system, which was emerging from Bell Labs in Murray Hill, New Jersey. The final version, first shipped with the 5th Edition of Unix in 1974, was entirely written by Douglas McIlroy. This research was published in a 1976 paper co-written with James W. Hunt, who developed an initial prototype of {{Mono|diff}}.[4] The algorithm this paper described became known as the Hunt–McIlroy algorithm.

McIlroy's work was preceded and influenced by Steve Johnson's comparison program on GECOS and Mike Lesk's {{Mono|proof}} program. {{Mono|Proof}} also originated on Unix and, like {{Mono|diff}}, produced line-by-line changes and even used angle brackets (">" and "<") for presenting line insertions and deletions in the program's output. The heuristics used in these early applications were, however, deemed unreliable. The potential usefulness of a diff tool provoked McIlroy into researching and designing a more robust tool that could be used in a variety of tasks but still perform well within the processing and size limitations of the PDP-11's hardware. His approach to the problem also resulted from collaboration with other individuals at Bell Labs, including Alfred Aho, Elliot Pinson, Jeffrey Ullman, and Harold S. Stone.

In the context of Unix, the use of the {{Mono|ed}} line editor provided {{Mono|diff}} with the natural ability to create machine-usable "edit scripts". When saved to a file, an edit script can be applied by {{Mono|ed}} to the original file to reconstitute the modified file in its entirety. This greatly reduced the secondary storage necessary to maintain multiple versions of a file. McIlroy considered writing a post-processor for {{Mono|diff}} in which a variety of output formats could be designed and implemented, but he found it more frugal and simpler to have {{Mono|diff}} be responsible for generating the syntax and reverse-order input accepted by the {{Mono|ed}} command.

Late in 1984 Larry Wall created a separate utility, patch, releasing its source code on the mod.sources and net.sources newsgroups.[5][6][7] This program generalized and extended the ability to modify files with output from {{Mono|diff}}.

Modes in Emacs also allow for converting the format of patches and even editing patches interactively.

In {{Mono|diff}}'s early years, common uses included comparing changes in the source of software code and markup for technical documents, verifying program debugging output, comparing filesystem listings and analyzing computer assembly code. The output targeted at {{Mono|ed}} was motivated by the need to compress a sequence of modifications made to a file. The Source Code Control System (SCCS) and its ability to archive revisions emerged in the late 1970s as a consequence of storing edit scripts from {{Mono|diff}}.

Algorithm

The operation of {{Mono|diff}} is based on solving the longest common subsequence problem.[4]

In this problem, given two sequences of items:

    a b c d f g h j q z
    a b c d e f g i j k r x y z

we want to find a longest sequence of items that is present in both original sequences in the same order. That is, we want to find a new sequence which can be obtained from the first original sequence by deleting some items, and from the second original sequence by deleting other items. We also want this sequence to be as long as possible. In this case it is

    a b c d f g j z

From a longest common subsequence it is only a small step to get {{Mono|diff}}-like output: if an item is absent in the subsequence but present in the first original sequence, it must have been deleted (as indicated by the '-' marks, below). If it is absent in the subsequence but present in the second original sequence, it must have been inserted (as indicated by the '+' marks).

    e   h i   q   k r x y
    +   - +   -   + + + +
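The two steps above, finding a longest common subsequence and then reading off what is missing from it, can be sketched in a few lines of Python. This is a minimal illustration, not diff's own code: it uses the textbook quadratic-time dynamic-programming table rather than the Hunt–McIlroy algorithm or any later refinement, and the function names `lcs` and `diff_marks` are hypothetical.

```python
def lcs(a, b):
    """Return one longest common subsequence of a and b (textbook DP)."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk forward through the table to recover one LCS.
    result, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


def diff_marks(a, b):
    """Yield ("-", item) for deletions from a and ("+", item) for insertions."""
    i = j = 0
    for item in lcs(a, b):
        while a[i] != item:      # items of a skipped before the match were deleted
            yield ("-", a[i])
            i += 1
        while b[j] != item:      # items of b skipped before the match were inserted
            yield ("+", b[j])
            j += 1
        i += 1
        j += 1
    for item in a[i:]:
        yield ("-", item)
    for item in b[j:]:
        yield ("+", item)


if __name__ == "__main__":
    first = "a b c d f g h j q z".split()
    second = "a b c d e f g i j k r x y z".split()
    marks = list(diff_marks(first, second))
    print(" ".join(item for _, item in marks))   # e h i q k r x y
    print(" ".join(sign for sign, _ in marks))   # + - + - + + + +
```

Because every item of the first sequence is either kept or deleted, and every item of the second is either kept or inserted, the number of marks is fixed by the length of the common subsequence; only their order depends on how the matches are chosen.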

Usage

The diff command is invoked from the command line, passing it the names of two files: diff original new. The output of the command represents the changes required to transform the original file into the new file.

If original and new are directories, then {{Mono|diff}} will be run on each file that exists in both directories. An option, -r, will recursively descend any matching subdirectories to compare files between directories.

All of the examples in this article use the following two files, original and new:

{{Col-begin}}{{Col-break|width=33%}}

original:
{{pre|
This part of the
document has stayed the
same from version to
version. It shouldn't
be shown if it doesn't
change. Otherwise, that
would not be helping to
compress the size of the
changes.

This paragraph contains
text that is outdated.
It will be deleted in the
near future.

It is important to spell
check this dokument. On
the other hand, a
misspelled word isn't
the end of the world.
Nothing in the rest of
this paragraph needs to
be changed. Things can
be added after it.
}}

{{col-break}}

new:
{{pre|
This is an important
notice! It should
therefore be located at
the beginning of this
document!

This part of the
document has stayed the
same from version to
version. It shouldn't
be shown if it doesn't
change. Otherwise, that
would not be helping to
compress the size of the
changes.

It is important to spell
check this document. On
the other hand, a
misspelled word isn't
the end of the world.
Nothing in the rest of
this paragraph needs to
be changed. Things can
be added after it.

This paragraph contains
important new additions
to this document.
}}

{{col-break|width=33%}}

The command diff original new produces the following normal diff output:

{{pre|
0a1,6
{{font color|darkgreen|> This is an important
> notice! It should
> therefore be located at
> the beginning of this
> document!
>}}
11,15d16
{{font color|darkred|< This paragraph contains
< text that is outdated.
< It will be deleted in the
< near future.
<}}
17c18
{{font color|darkred|< check this dokument. On}}
---
{{font color|darkgreen|> check this document. On}}
24a26,29
{{font color|darkgreen|>
> This paragraph contains
> important new additions
> to this document.}}}}

{{Note2}} Here, the diff output is shown with colors to make it easier to read. The diff utility traditionally produces plain text without colors, although recent versions of GNU diff can color their output with the {{code|--color}} option; many other tools can also display the output with syntax highlighting.{{col-end}}

In this traditional output format, a stands for added, d for deleted and c for changed. Line numbers of the original file appear before a/d/c and those of the new file appear after. The less-than and greater-than signs at the beginning of lines that are added, deleted or changed indicate which file the lines appear in: lines marked < come from the original file and lines marked > come from the new file. In a change, a line of three hyphens separates the old lines from the new ones. Added lines are inserted into the original file to produce the new file, and deleted lines are removed from the original file so that they are missing in the new file.
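Because each command line has the fixed shape described above (a line range, one of the letters a, d or c, and a second line range), it can be split apart with a regular expression. The Python sketch below is illustrative only: `parse_command` is a hypothetical helper, not part of any diff implementation, and it ignores the `<`, `>` and `---` lines that follow each command.

```python
import re

# A normal-diff command: a line range of the original file, one of the
# letters a (add), d (delete) or c (change), then a line range of the new file.
NORMAL_COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def parse_command(line):
    """Split a command such as "11,15d16" into (old_range, letter, new_range).

    Each range is returned as an inclusive (first, last) pair; a single line
    number n becomes (n, n).
    """
    match = NORMAL_COMMAND.match(line.strip())
    if match is None:
        raise ValueError(f"not a normal diff command: {line!r}")
    old_first, old_last, letter, new_first, new_last = match.groups()
    old_range = (int(old_first), int(old_last or old_first))
    new_range = (int(new_first), int(new_last or new_first))
    return old_range, letter, new_range


if __name__ == "__main__":
    for command in ["0a1,6", "11,15d16", "17c18", "24a26,29"]:
        print(command, "->", parse_command(command))
```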

By default, lines common to both files are not shown. Lines that have moved are shown as added at their new location and as deleted from their old location.[8] However, some diff tools highlight moved lines.

Variations

Changes since 1975 include improvements to the core algorithm, the addition of useful features to the command, and the design of new output formats. The basic algorithm is described in the papers An O(ND) Difference Algorithm and its Variations by Eugene W. Myers[9] and in A File Comparison Program by Webb Miller and Myers.[10]

The algorithm was independently discovered and described in Algorithms for Approximate String Matching, by Esko Ukkonen.[11]
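The central idea of Myers's algorithm is a greedy search over the diagonals of an edit graph: for each number of edits d, it records how far along each diagonal k = x - y a path can reach, sliding for free along runs of matching items. The Python sketch below, with the hypothetical name `myers_distance`, computes only the length D of a shortest edit script (insertions plus deletions); recovering the script itself requires keeping the history of the array `v`, and production implementations typically add further refinements such as a linear-space variant.

```python
def myers_distance(a, b):
    """Return D, the number of insertions plus deletions in a shortest edit
    script turning sequence a into sequence b (greedy forward search)."""
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d + 1               # lets diagonals -max_d-1 .. max_d+1 index the list
    v = [0] * (2 * max_d + 3)        # v[offset + k]: furthest x reached on diagonal k
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            # Step down from diagonal k + 1 (an insertion) or right from
            # diagonal k - 1 (a deletion), whichever has reached further.
            if k == -d or (k != d and v[offset + k - 1] < v[offset + k + 1]):
                x = v[offset + k + 1]
            else:
                x = v[offset + k - 1] + 1
            y = x - k
            # Follow the "snake": diagonal moves over matching items are free.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[offset + k] = x
            if x >= n and y >= m:
                return d
    return max_d  # not reached: d = n + m edits always suffice


if __name__ == "__main__":
    print(myers_distance("abcabba", "cbabac"))
```

For the two sequences from the Algorithm section the result is 8, matching the two deletions and six insertions found there; in general D = N + M - 2L, where N and M are the sequence lengths and L is the length of a longest common subsequence.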

The first editions of the diff program were designed for line comparisons of text files expecting the newline character to delimit lines. By the 1980s, support for binary files resulted in a shift in the application's design and implementation.

Edit script

An edit script can still be generated by modern versions of diff with the -e option. The resulting edit script for this example is as follows:

{{pre|
24a

This paragraph contains
important new additions
to this document.
.
17c
check this document. On
.
11,15d
0a
This is an important
notice! It should
therefore be located at
the beginning of this
document!

.
}}

In order to transform the content of file original into the content of file new using {{Mono|ed}}, we should append two lines to this diff file, one containing a w (write) command and one containing a q (quit) command (e.g. by {{code|lang=bash|printf "w\nq\n" >> mydiff}}). Here we gave the diff file the name mydiff, and the transformation then happens when we run {{code|lang=bash|ed -s original < mydiff}}.
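The edit script can also be replayed without ed. The Python sketch below interprets only the three commands that diff -e emits in this example (a, c and d, where a and c are followed by text lines terminated by a lone "."); it is not a full ed implementation, it does not handle the appended w and q lines, and `apply_ed_script` is a hypothetical name. It also shows why the changes are listed from the end of the file backwards: an edit near the end never shifts the lines that the remaining commands refer to.

```python
import re

# The only ed commands diff -e produces here: "Na", "N,Mc"/"Nc", "N,Md"/"Nd".
ED_COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_script(lines, script):
    """Apply a diff -e style script (a, c and d only) to a list of lines.

    Text for "a" and "c" runs until a line containing a single ".".
    Commands are applied in the order given.
    """
    result = list(lines)
    script_lines = iter(script)
    for command in script_lines:
        match = ED_COMMAND.match(command)
        if match is None:
            raise ValueError(f"unsupported ed command: {command!r}")
        first = int(match.group(1))
        last = int(match.group(2) or first)
        action = match.group(3)
        text = []
        if action in ("a", "c"):
            for text_line in script_lines:   # consumes the shared iterator
                if text_line == ".":
                    break
                text.append(text_line)
        if action == "a":
            result[first:first] = text       # insert after line `first`
        else:
            result[first - 1:last] = text    # c replaces, d deletes lines first..last
    return result


if __name__ == "__main__":
    print(apply_ed_script(["one", "two", "three"], ["2c", "TWO", ".", "0a", "zero", "."]))
    # ['zero', 'one', 'TWO', 'three']
```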

Context format

The Berkeley distribution of Unix made a point of adding the context format ({{code|-c}}) and the ability to recurse on filesystem directory structures ({{code|-r}}), adding those features in 2.8 BSD, released in July 1981. The context format of diff introduced at Berkeley helped with distributing patches for source code that may have been changed minimally.

In the context format, any changed lines are shown alongside unchanged lines before and after. The inclusion of any number of unchanged lines provides a context to the patch. The context consists of lines that have not changed between the two files; they serve as a reference to locate the lines' place in a modified file and to find the intended location for a change to be applied, regardless of whether the line numbers still correspond. The context format offers greater readability for humans and greater reliability when applying the patch, and its output is accepted as input by the patch program. This intelligent behavior isn't possible with the traditional diff output.

The number of unchanged lines shown above and below a change hunk can be defined by the user, even zero, but three lines is typically the default. If the context of unchanged lines in a hunk overlaps with an adjacent hunk, then diff will avoid duplicating the unchanged lines and merge the hunks into a single hunk.
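The rule for choosing and merging context ranges can be sketched as follows. This is a simplification that works on the line numbers of one file only; `group_hunks` is a hypothetical helper, and real implementations track old and new line numbers together. The sketch assumes that ranges which merely touch are merged as well as ranges that overlap, since splitting them would not skip any lines.

```python
def group_hunks(changed_lines, num_lines, context=3):
    """Return inclusive (first, last) line ranges to print, one per hunk.

    `changed_lines` is a sorted list of 1-based line numbers that differ in a
    file of `num_lines` lines. Each change is widened by `context` lines on
    either side (clipped to the file), and a range that overlaps or touches
    the previous one is merged into it, so no unchanged line is shown twice.
    """
    hunks = []
    for line in changed_lines:
        first = max(1, line - context)
        last = min(num_lines, line + context)
        if hunks and first <= hunks[-1][1] + 1:
            hunks[-1] = (hunks[-1][0], max(hunks[-1][1], last))
        else:
            hunks.append((first, last))
    return hunks


if __name__ == "__main__":
    # Lines 11-15 are deleted and line 17 is changed in the example original file.
    print(group_hunks([11, 12, 13, 14, 15, 17], num_lines=24))   # [(8, 20)]
```

For the example files, the sketch yields the single range 8 to 20, the same range used by the second hunk of the context-format output below.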

A "{{code|!}}" represents a change between lines that correspond in the two files, a "{{code|+}}" represents the addition of a line, and a "{{code|-}}" represents the removal of a line, while a blank space represents an unchanged line. At the beginning of the patch is the file information, including the full path and a time stamp delimited by a tab character. At the beginning of each hunk are the line numbers that apply for the corresponding change in the files. A number range appearing between sets of three asterisks applies to the original file, while sets of three dashes apply to the new file. The hunk ranges specify the starting and ending line numbers in the respective file.

The command {{code|diff -c original new}} produces the following output:

{{pre|
*** /path/to/original timestamp
--- /path/to/new timestamp
***************
*** 1,3 ****
--- 1,9 ----
+ This is an important
+ notice! It should
+ therefore be located at
+ the beginning of this
+ document!
+
  This part of the
  document has stayed the
  same from version to
***************
*** 8,20 ****
  compress the size of the
  changes.

- This paragraph contains
- text that is outdated.
- It will be deleted in the
- near future.
-
  It is important to spell
! check this dokument. On
  the other hand, a
  misspelled word isn't
  the end of the world.
--- 14,21 ----
  compress the size of the
  changes.

  It is important to spell
! check this document. On
  the other hand, a
  misspelled word isn't
  the end of the world.
***************
*** 22,24 ****
--- 23,29 ----
  this paragraph needs to
  be changed. Things can
  be added after it.
+
+ This paragraph contains
+ important new additions
+ to this document.
}}

Unified format

The unified format (or unidiff) inherits the technical improvements made by the context format, but produces a smaller diff with old and new text presented immediately adjacent. Unified format is usually invoked using the "-u" command line option. This output is often used as input to the patch program. Many projects specifically request that "diffs" be submitted in the unified format, making unified diff format the most common format for exchange between software developers.

Unified context diffs were originally developed by Wayne Davison in August 1990 (in unidiff which appeared in Volume 14 of comp.sources.misc). Richard Stallman added unified diff support to the GNU Project's diff utility one month later, and the feature debuted in GNU diff 1.15, released in January 1991. GNU diff has since generalized the context format to allow arbitrary formatting of diffs.

The format starts with the same two-line header as the context format, except that the original file is preceded by "---" and the new file is preceded by "+++". Following this are one or more change hunks that contain the line differences in the file. The unchanged, contextual lines are preceded by a space character, addition lines are preceded by a plus sign, and deletion lines are preceded by a minus sign.

A hunk begins with range information and is immediately followed with the line additions, line deletions, and any number of the contextual lines. The range information is surrounded by double-at signs, and combines onto a single line what appears on two lines in the context format (above). The format of the range information line is as follows:

 @@ -l,s +l,s @@ optional section heading

The hunk range information contains two hunk ranges. The range for the hunk of the original file is preceded by a minus symbol, and the range for the new file is preceded by a plus symbol. Each hunk range is of the format l,s where l is the starting line number and s is the number of lines the change hunk applies to for each respective file. In many versions of GNU diff, each range can omit the comma and trailing value s, in which case s defaults to 1. Note that the only really interesting value is the l line number of the first range; all the other values can be computed from the diff.

The line count for the original file should be the sum of all contextual and deletion (including changed) hunk lines. The line count for the new file should be the sum of all contextual and addition (including changed) hunk lines. If hunk size information does not correspond with the number of lines in the hunk, then the diff could be considered invalid and be rejected.
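These rules lend themselves to a mechanical check. The Python sketch below (the names `parse_hunk_header` and `hunk_is_consistent` are hypothetical) parses a range line of the form @@ -l,s +l,s @@, applies the default s = 1 when a count is omitted, and compares the counts with the hunk body, assuming each body line keeps its one-character prefix (a space, - or +).

```python
import re

# "@@ -l,s +l,s @@ optional section heading"; a missing ",s" means s = 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(?: (.*))?$")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count, heading)."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a unified hunk header: {line!r}")
    old_start, old_count, new_start, new_count, heading = match.groups()
    return (int(old_start), int(old_count or 1),
            int(new_start), int(new_count or 1), heading)


def hunk_is_consistent(header, body):
    """Check the header's line counts against the hunk's body lines.

    Context lines (prefix " ") count toward both files, "-" lines toward the
    original only and "+" lines toward the new file only.
    """
    _, old_count, _, new_count, _ = parse_hunk_header(header)
    old_seen = sum(1 for line in body if line[:1] in (" ", "-"))
    new_seen = sum(1 for line in body if line[:1] in (" ", "+"))
    return old_seen == old_count and new_seen == new_count


if __name__ == "__main__":
    print(parse_hunk_header("@@ -22,3 +23,7 @@"))
```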

Optionally, the hunk range can be followed by the heading of the section or function that the hunk is part of. This is mainly useful to make the diff easier to read. When creating a diff with GNU diff, the heading is identified by regular expression matching.[12]

If a line is modified, it is represented as a deletion and an addition. Since the original and new lines appear in the same hunk, such changes appear adjacent to one another.[13]

An occurrence of this in the example below is:

 -check this dokument. On
 +check this document. On

The command diff -u original new produces the following output:

{{pre|
--- /path/to/original timestamp
+++ /path/to/new timestamp
@@ -1,3 +1,9 @@
+This is an important
+notice! It should
+therefore be located at
+the beginning of this
+document!
+
 This part of the
 document has stayed the
 same from version to
@@ -8,13 +14,8 @@
 compress the size of the
 changes.
 
-This paragraph contains
-text that is outdated.
-It will be deleted in the
-near future.
-
 It is important to spell
-check this dokument. On
+check this document. On
 the other hand, a
 misspelled word isn't
 the end of the world.
@@ -22,3 +23,7 @@
 this paragraph needs to
 be changed. Things can
 be added after it.
+
+This paragraph contains
+important new additions
+to this document.
}}

Note that to successfully separate the file names from the timestamps, the delimiter between them is a tab character. This is invisible on screen and can be lost when diffs are copy/pasted from console/terminal screens.

There are some modifications and extensions to the diff formats that are used and understood by certain programs and in certain contexts. For example, some revision control systems—such as Subversion—specify a version number, "working copy", or any other comment instead of or in addition to a timestamp in the diff's header section.

Some tools allow diffs for several different files to be merged into one, using a header for each modified file that may look something like this:

The special case of files that do not end in a newline is not handled. Neither the unidiff utility nor the POSIX diff standard defines a way to handle this type of file. (Indeed, such files are not "text" files by strict POSIX definitions.[14])

Nor is the patch program aware of any implementation-specific diff output for this case.

Others

Postprocessors sdiff and diffmk render side-by-side diff listings and apply change marks to printed documents, respectively. Both were developed elsewhere in Bell Labs in or before 1981.{{citation needed|date=February 2015}}

Diff3 compares one file against two other files. It was originally conceived by Paul Jensen to reconcile changes made by two people editing a common source. It is also used by revision control systems, e.g. RCS,[15] for merging.

GNU diff and diff3 are included in the diffutils package with other diff and patch related utilities. Emacs has Ediff for showing the changes a patch would provide in a user interface that combines interactive editing and merging capabilities for patch files. Nowadays there is also a patchutils package that can combine, rearrange, compare and fix context diffs and unified diffs.

GNU Wdiff[16] is a front end to diff that shows the words or phrases that changed in a text document of written language even in the presence of word-wrapping or different column widths.

colordiff is a Perl script that wraps diff and produces the same output, but with syntax highlighting.[17]

Utilities that compare source files by their syntactic structure have been built mostly as research tools for some programming languages;[18][19][20] some are available as commercial tools.[21][22][23] Tools exist to compare HTML,[24] and tools for XML have been published by Microsoft and IBM.[25][26]

spiff compares files for logical rather than literal differences: it works like diff but ignores differences in:

  1. floating point calculations with roundoff errors[27][28][29]
  2. whitespace[27][29]

which are generally irrelevant to source code comparison. At the opposite extreme is cmp, which compares files byte by byte. Bellcore wrote the original version.[27][29] An HP-UX port is the most current public release. spiff does not support binary files.[30][31] spiff writes to standard output in standard diff format and accepts inputs in the C, Bourne shell, Fortran, Modula-2 and Lisp programming languages.
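The idea of comparing logical rather than literal content can be illustrated with a small line comparison that ignores whitespace layout and accepts numbers that agree within a tolerance. This is a hypothetical sketch, not spiff's actual algorithm or interface; the function name `lines_equivalent` and the default tolerances are assumptions chosen for the example.

```python
import math


def _as_number(token):
    """Return token as a float, or None if it is not numeric."""
    try:
        return float(token)
    except ValueError:
        return None


def lines_equivalent(left, right, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two lines token by token, ignoring how whitespace is laid out
    and treating numbers as equal when they agree within the tolerances."""
    left_tokens, right_tokens = left.split(), right.split()
    if len(left_tokens) != len(right_tokens):
        return False
    for a, b in zip(left_tokens, right_tokens):
        x, y = _as_number(a), _as_number(b)
        if x is not None and y is not None:
            if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
        elif a != b:
            return False
    return True


if __name__ == "__main__":
    print(lines_equivalent("x =  0.30000000000000004", "x = 0.3"))
```

A literal comparison would report both example differences (extra spaces and a roundoff in the last digit); the tolerant comparison reports neither.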

Free file comparison tools

{{colbegin}}
  • cmp
  • comm
  • Diff-Text
  • diff3
  • Kompare
  • tkdiff
  • WinMerge (Microsoft Windows)
  • meld
  • Pretty Diff
  • spiff
{{colend}}

See also

{{colbegin}}
  • Comparison of file comparison tools
  • Delta encoding
  • Difference operator
  • Edit distance
    • Levenshtein distance
  • History of software configuration management
  • Longest common subsequence problem
  • Microsoft File Compare
  • Revision control
  • Software configuration management
{{colend}}

References

1. ^MacKenzie et al. "Binary Files and Forcing Text Comparison" in Comparing and Merging Files with GNU Diff and Patch. Downloaded 28 April 2007. [https://www.gnu.org/software/diffutils/manual/html_node/Binary.html]
2. ^Eric S. Raymond (ed.), “diff”, The Jargon File, version 4.4.7
3. ^{{cite book|author1 = IEEE Computer Society|author2 = The Open Group|date=26 September 2008|title = Standard for Information Technology—Portable Operating System Interface (POSIX) Base Specifications, Issue 7|pages = 2599–2607}} IEEE Std. 1003.1-2001 specifies traditional, "ed script", and context diff output formats; IEEE Std. 1003.1-2008 added the (by then more common) unified format.
4. ^{{cite journal|author1=James W. Hunt |author2=M. Douglas McIlroy |title=An Algorithm for Differential File Comparison|volume=41|journal=Computing Science Technical Report, Bell Laboratories|date=June 1976|pages=|url=http://www.cs.dartmouth.edu/~doug/diff.pdf}}
5. ^{{cite newsgroup | title = A patch applier--YOU WANT THIS!!! | author = Larry Wall | date = November 9, 1984 | newsgroup = net.sources | message-id = 1457@sdcrdcf.UUCP | url = https://groups.google.com/d/msg/net.sources/qtfVio1sSHs/G0cPT5HFDFcJ | access-date = May 11, 2015 }}
6. ^{{cite newsgroup | title = patch version 1.2--YOU WANT THIS | author = Larry Wall | date = November 29, 1984 | newsgroup = net.sources | message-id = 1508@sdcrdcf.UUCP | url = https://groups.google.com/d/msg/net.sources/uWFr9NOp_fw/SRS_P2vSgFgJ | access-date = May 11, 2015 }}
7. ^{{cite newsgroup | title = patch version 1.3 | author = Larry Wall | date = May 8, 1985 | newsgroup = net.sources | message-id = 813@genrad.UUCP | url = https://groups.google.com/d/msg/mod.sources/xSQM63e39YY/apNNJSkJi0gJ | access-date = May 11, 2015 }}
8. ^{{cite book|title=Comparing and Merging Files with GNU Diff and Patch|url=https://www.gnu.org/software/diffutils/manual/|author1=David MacKenzie |author2=Paul Eggert |author3=Richard Stallman |isbn=978-0-9541617-5-0|publisher=Network Theory|year=1997|location=Bristol}}
9. ^{{cite journal|author=E. Myers|title=An O(ND) Difference Algorithm and Its Variations|journal=Algorithmica|volume=1|issue=2|year=1986|pages=251–266|doi=10.1007/BF01840446|citeseerx=10.1.1.4.6927}}
10. ^{{cite journal|author1=Webb Miller |author2=Eugene W. Myers |title=A File Comparison Program|journal=Software — Practice and Experience|volume=15|issue=11|year=1985|pages=1025–1040|doi=10.1002/spe.4380151102|citeseerx=10.1.1.189.70 }}
11. ^{{cite journal|author=Esko Ukkonen|title=Algorithms for Approximate String Matching|volume=64|journal=Information and Control|issue=1–3|year=1985|pages=100–118 | doi = 10.1016/S0019-9958(85)80046-2}}
12. ^[https://www.gnu.org/software/diffutils/manual/html_node/Sections.html 2.2.3 Showing Which Sections Differences Are in], GNU diffutils manual
13. ^Unified Diff Format by Guido van Rossum, June 14, 2006
14. ^http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_205 Section 3.206
15. ^https://www.gnu.org/software/rcs/manual/merge.html
16. ^https://www.gnu.org/software/wdiff/
17. ^https://www.colordiff.org/
18. ^{{cite journal|last1=Horwitz|first1=Susan|title=Identifying the semantic and textual differences between two versions of a program|journal=ACM SIGPLAN Notices|date=June 1990|volume=25|issue=6|pages=234–245|doi=10.1145/93548.93574|url=http://www.cs.wisc.edu/wpis/papers/sigplan90.ps|citeseerx=10.1.1.49.3377}}
19. ^{{cite journal|last1=Yang|first1=Wuu|title=Identifying syntactic differences between two programs|journal=Software: Practice and Experience|date=July 1991|volume=21|issue=7|pages=739–755|doi=10.1002/spe.4380210706|citeseerx=10.1.1.13.9377}}
20. ^Grass. Cdiff: A syntax directed Diff for C++ programs. Proceedings USENIX C++ Conf., pp. 181-193, 1992
21. ^Compare++, http://www.coodesoft.com/
22. ^SmartDifferencer, http://www.semanticdesigns.com/Products/SmartDifferencer
23. ^Cheney, Austin. Pretty Diff - Documentation. http://prettydiff.com/documentation.php
24. ^DaisyDiff, https://code.google.com/p/daisydiff/
25. ^xmldiffpatch, http://msdn.microsoft.com/en-us/library/aa302294.aspx
26. ^xmldiffmerge, http://www.alphaworks.ibm.com/tech/xmldiffmerge
27. ^{{cite web|url=https://github.com/dontcallmedom/spiff|title=spiff|author=dontcallmedotcom|accessdate=2013-06-16}}
28. ^{{cite web|url=https://stackoverflow.com/a/1489107/2291035|date=2009-09-28|author=Davide|title=stackoverflow|accessdate=2013-06-16}}
29. ^{{cite web|url=http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/|title=HP-UX Porting and Archiving|location=UK|first=Daniel W|last=Nachbar|date=1999-12-01|accessdate=2013-06-13}}
30. ^{{cite web|url=http://www.math.utah.edu/cgi-bin/man2html.cgi?/usr/local/man/man1/spiff.1|title=SPIFF 1|date=1988-02-02|accessdate=2013-06-16}}
31. ^{{cite web|url=http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/man.html|title=Man page|location=UK|first=Daniel W|last=Nachbar|date=1988-02-02|accessdate=2013-06-16}}

Further reading

  • {{cite journal|author=Paul Heckel|title=A technique for isolating differences between files|journal=Communications of the ACM|volume=21|issue=4|date=April 1978|pages=264–268|doi=10.1145/359460.359467}} 
  • A generic implementation of the Myers SES/LCS algorithm with the Hirschberg linear space refinement (C source code)

External links

  • {{man|cu|diff|SUS|compare two files}}
  • {{Dmoz|Computers/Software/File_Management/File_Comparison|File comparison}}
  • [https://www.gnu.org/software/diffutils/ GNU Diff utilities]. Made available by the Free Software Foundation. Free Documentation. Free source code.
  • JavaScript Implementation
{{Unix commands}}{{Version control software}}

