Summary
I couldn't find a thorough spec for the format called "unified diff" so I decided to research it. Here are my findings.
I haven't found a satisfactory specification of the unified diff
format (the one on the GNU website is hopelessly incomplete).
Here's what I've discovered by experimenting with diff(1) on Red Hat
Linux; this identifies itself as 'diff (GNU diffutils) 2.8.1'.
Hopefully this is useful for someone who needs to generate unified
diffs or who needs to parse them. (I had both needs recently. :-)
The header lines look like this:
indicator ' ' filename '\t' date ' ' time ' ' timezone
where:
indicator is '---' for the old file and '+++' for the new
date has the form YYYY-MM-DD
time has the form hh:mm:ss.nnnnnnnnn on a 24-hour clock
timezone has the form ('+'|'-') hhmm where hhmm is hours and
minutes east (if the sign is +) or west (if the sign is -) of
GMT/UTC
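For anyone parsing these headers, here is a minimal Python sketch of the layout just described. The function name parse_header and the regular expression are my own, not part of any tool. It assumes the filename contains no tab. Because Python's datetime only holds microseconds, the nine-digit fraction is truncated. If the text after the tab doesn't fit the date/time/timezone layout, the timestamp comes back as None instead of raising an error.

```python
import re
from datetime import datetime, timedelta, timezone

# date ' ' time ' ' timezone, e.g. "2002-02-21 23:30:39.942229878 -0800"
_STAMP = re.compile(
    r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})(?:\.(\d+))? ([+-])(\d{2})(\d{2})$"
)


def parse_header(line):
    """Split a '---'/'+++' header into (indicator, filename, timestamp).

    timestamp is a timezone-aware datetime, or None if the text after the
    tab is missing or doesn't match the date/time/timezone layout.
    """
    line = line.rstrip("\r\n")
    indicator = line[:3]
    if indicator not in ("---", "+++") or line[3:4] != " ":
        raise ValueError("not a unified diff header: %r" % line)
    filename, tab, stamp = line[4:].partition("\t")
    m = _STAMP.match(stamp) if tab else None
    if m is None:
        return indicator, filename, None
    y, mo, d, h, mi, s, frac, sign, tzh, tzm = m.groups()
    # hhmm east (+) or west (-) of UTC
    offset = timedelta(hours=int(tzh), minutes=int(tzm))
    if sign == "-":
        offset = -offset
    # datetime stores microseconds, so only the first six fraction digits survive.
    micro = int((frac or "0").ljust(6, "0")[:6])
    ts = datetime(int(y), int(mo), int(d), int(h), int(mi), int(s), micro,
                  tzinfo=timezone(offset))
    return indicator, filename, ts


print(parse_header("--- lao\t2002-02-21 23:30:39.942229878 -0800"))
```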
Each chunk starts with a line that looks like this:
'@@ -' range ' +' range ' @@'
where range is either one unsigned decimal number or two separated
by a comma. The first number is the start line of the chunk in the
old or new file. The second number is the chunk size in that file; it
and the comma are omitted if the chunk size is 1.
(Email from a reader suggests that this omission is optional
and may be phased out.) If the chunk size is
0, the first number is one lower than one would expect (it is the
line number after which the chunk should be inserted or deleted; in
all other cases it gives the first line number of the replaced range
of lines).
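Here is a small sketch of parsing a chunk header under those rules. The helper names are made up for illustration. It fills in an omitted size as 1. For a size of 0, it adds one back to the start, so every range comes out as (first_line, count) with 1-based line numbers. Anything after the closing '@@' is ignored.

```python
import re

_CHUNK = re.compile(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_range(start, size):
    """Turn one range into (first_line, count).

    An omitted size means 1. With size 0 the start is the line *after which*
    the change applies, so the range is taken to begin at start + 1.
    """
    count = 1 if size is None else int(size)
    first = int(start)
    if count == 0:
        first += 1
    return first, count


def parse_chunk_header(line):
    """Return ((old_first, old_count), (new_first, new_count))."""
    m = _CHUNK.match(line)
    if m is None:
        raise ValueError("not a chunk header: %r" % line)
    return parse_range(m.group(1), m.group(2)), parse_range(m.group(3), m.group(4))


print(parse_chunk_header("@@ -1,7 +1,6 @@"))
```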
A chunk then continues with lines starting with ' ' (common line),
'-' (only in old file), or '+' (only in new file). If the last line
of a file doesn't end in a newline character, it is displayed with a
newline character, and the following line in the chunk has the
literal text (starting in the first column):
'\ No newline at end of file'
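Here is a sketch of turning one chunk body back into its old and new lines. It assumes the body lines still carry their newline terminators. Any line starting with a backslash is treated as the no-newline marker, and the newline is stripped from the line just before it, on whichever side(s) that line belonged to. The name split_chunk_body is hypothetical.

```python
def split_chunk_body(lines):
    """Rebuild the old and new lines covered by one chunk.

    lines are the chunk's body lines (after the '@@' line), each still
    ending in a newline. Returns (old, new): lists of lines keeping their
    newlines, except that a line followed by the marker loses its newline.
    """
    old, new = [], []
    last = None  # which side(s) the previous body line went to
    for line in lines:
        if line.startswith("\\"):
            # The marker refers to the line just before it.
            if last is None:
                raise ValueError("marker with no preceding line")
            for side in last:
                if side[-1].endswith("\n"):
                    side[-1] = side[-1][:-1]
            continue
        tag, text = line[:1], line[1:]
        if tag == " ":    # common line: present in both files
            old.append(text)
            new.append(text)
            last = (old, new)
        elif tag == "-":  # only in the old file
            old.append(text)
            last = (old,)
        elif tag == "+":  # only in the new file
            new.append(text)
            last = (new,)
        else:
            raise ValueError("unexpected chunk line: %r" % line)
    return old, new
```

For example, an old file "a" without a final newline and a new file "a\n" give the body lines "-a", the marker, and "+a". This yields old == ["a"] and new == ["a\n"].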
I had the same need to parse unified diffs a while ago. Your description of the format looks accurate, though there are some other details about deleted and created files:
When a file is deleted (rather than just made empty), the +++ date is set to the epoch. Similarly, when a file is created, the --- date is set to the epoch.
This can be seen by running a recursive directory diff with the -N option.
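A minimal sketch of that created/deleted check, assuming GNU-style timestamps. It compares the instant rather than the literal text, in case the diff was produced in a zone other than UTC, where the epoch would print with a different local date and time. Headers without such a timestamp simply count as not-epoch. is_epoch_header and classify are illustrative names, not from any library.

```python
import re
from datetime import datetime, timedelta, timezone

_STAMP = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d+))? ([+-])(\d{2})(\d{2})$"
)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def is_epoch_header(line):
    """True if a '---'/'+++' header's timestamp is exactly the Unix epoch."""
    _, tab, stamp = line.rstrip("\r\n").partition("\t")
    m = _STAMP.match(stamp) if tab else None
    if m is None:
        return False  # no GNU-style timestamp to compare
    when, frac, sign, hh, mm = m.groups()
    if frac and int(frac) != 0:
        return False
    offset = timedelta(hours=int(hh), minutes=int(mm))
    if sign == "-":
        offset = -offset
    ts = datetime.strptime(when, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone(offset))
    return ts == EPOCH


def classify(old_header, new_header):
    """'created', 'deleted' or 'modified', judged by epoch timestamps."""
    if is_epoch_header(old_header):
        return "created"
    if is_epoch_header(new_header):
        return "deleted"
    return "modified"
```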
I found that the option of omitting the comma (and the size) in the range description wasn't very useful, and the patchutils maintainer agrees, so it will probably be phased out.
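Here is a sketch of the generating side. It formats the ranges and makes the ",1" omission a switch, so a generator can always emit the start,size form if the short form does go away. The names are hypothetical. first is the 1-based first line of the chunk. For an empty range, first is the line the change comes before, so the printed start is one lower, as the format requires.

```python
def format_range(first, count, always_comma=False):
    """Format one side of an '@@' line from a 1-based first line and a count."""
    start = first - 1 if count == 0 else first  # empty range: line after which
    if count == 1 and not always_comma:
        return str(start)                       # short form: ',1' omitted
    return "%d,%d" % (start, count)


def format_chunk_header(old_first, old_count, new_first, new_count,
                        always_comma=False):
    return "@@ -%s +%s @@" % (format_range(old_first, old_count, always_comma),
                              format_range(new_first, new_count, always_comma))


print(format_chunk_header(3, 1, 3, 1))
print(format_chunk_header(3, 1, 3, 1, always_comma=True))
```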
A colleague just thanked me for writing this, and added:
"Just fyi, there seems to be some small variations in the header ---/+++ lines. SVN replaces the timestamps with repository paths and revision info which invalidates Aaron's removed/created logic."
Hi Guido, I know this is an old post, but I just found it and here's a (hopefully) useful addition:
The other way to tell if a file has been created/deleted is to look at the range of lines affected in the first chunk. -0,0 means that it's a created file. +0,0 means that it's a deleted file.
0,0 will never appear in the ranges except in these cases.
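That rule is easy to check with a sketch like the one below, which looks only at the first chunk header of a file's diff. classify_by_ranges is a made-up name. It treats an omitted size as 1, so only an explicit ",0" counts as empty.

```python
import re

_CHUNK = re.compile(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def classify_by_ranges(first_chunk_header):
    """'created' for -0,0, 'deleted' for +0,0, otherwise 'modified'."""
    m = _CHUNK.match(first_chunk_header)
    if m is None:
        raise ValueError("not a chunk header: %r" % first_chunk_header)
    old_start, old_size, new_start, new_size = m.groups()
    # An omitted size means 1, so only an explicit ",0" can be empty.
    if old_size is not None and int(old_start) == 0 and int(old_size) == 0:
        return "created"
    if new_size is not None and int(new_start) == 0 and int(new_size) == 0:
        return "deleted"
    return "modified"
```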