Thursday, 25 August 2011

Diff two files ignoring certain fields (like timestamps)

This is a useful trick for comparing two files that are almost the same but contain some fields that are guaranteed to differ, and it avoids creating lots of intermediate temporary files. The classic example is a pair of log files holding almost the same data, where every line is prefixed by a pesky timestamp that differs between the two files.

bash process substitution to the rescue! Each <(command) is replaced by a file name (a /dev/fd entry or a named pipe, depending on the system) from which that command's output can be read. Let's assume we have two files, "/tmp/file1.log" and "/tmp/file2.log", and that the first six space-separated fields of each line make up the timestamp. To ignore those fields and diff just the log contents, we can do this:

$ diff <(cut -d" " -f7- /tmp/file1.log) <(cut -d" " -f7- /tmp/file2.log)
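
To see the effect end to end, here is a minimal sketch that builds two throwaway logs and diffs them with the timestamps cut away. The file contents, the date-style timestamp layout and the temporary directory are all made up for illustration; only the diff/cut invocation comes from the command above.

```shell
#!/usr/bin/env bash
# Create two sample logs (hypothetical data). Every line starts with a
# six-field timestamp; only the second message really differs.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1.log" <<'EOF'
Thu Aug 25 10:00:01 BST 2011 service started
Thu Aug 25 10:00:02 BST 2011 loaded 3 plugins
EOF
cat > "$tmpdir/file2.log" <<'EOF'
Thu Aug 25 11:30:17 BST 2011 service started
Thu Aug 25 11:30:19 BST 2011 loaded 4 plugins
EOF

# Drop fields 1-6 from each file on the fly and diff what is left.
# diff exits with status 1 when the inputs differ, hence the "|| true".
diff_output=$(diff <(cut -d" " -f7- "$tmpdir/file1.log") \
                   <(cut -d" " -f7- "$tmpdir/file2.log")) || true
printf '%s\n' "$diff_output"

# Clean up the sample files.
rm -r "$tmpdir"
```

Only the "loaded ... plugins" line is reported, even though every timestamp differs. One caveat: cut treats every single space as a separator, so a timestamp padded with two spaces (such as "Aug  5") shifts the field count and -f7- would no longer line up.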
But why limit yourself to a textual diff? Go graphical if you prefer:
$ meld <(cut -d" " -f7- /tmp/file1.log) <(cut -d" " -f7- /tmp/file2.log)
This technique can be employed in other ways. Imagine you're working with a VCS / SCM tool such as Bazaar. Here's how you can interactively edit a file whilst comparing it against the latest committed version in the branch or repository, using Vim's diff mode:
$ vim -d src/foo.c <(bzr cat src/foo.c)
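
The filter inside <(...) does not have to be cut or a VCS command. As one further variation (not from the examples above, and using made-up file names and contents), here is a sketch that compares two lists while ignoring the order of their lines by sorting each one on the fly:

```shell
#!/usr/bin/env bash
# Two hypothetical lists with the same entries in a different order,
# plus one entry that appears only in the second list.
tmpdir=$(mktemp -d)
printf '%s\n' zlib openssl curl > "$tmpdir/list1.txt"
printf '%s\n' curl zlib sqlite openssl > "$tmpdir/list2.txt"

# Sort each file inside a process substitution; diff only ever sees the
# sorted streams, and no sorted copies are written to disk.
# diff exits with status 1 when the inputs differ, hence the "|| true".
sorted_diff=$(diff <(sort "$tmpdir/list1.txt") \
                   <(sort "$tmpdir/list2.txt")) || true
printf '%s\n' "$sorted_diff"

# Clean up the sample files.
rm -r "$tmpdir"
```

Here diff reports only the extra "sqlite" entry; the reordering of the other lines is ignored.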
