Tuesday, 24 July 2012

simplified strace diffing

strace is an extremely powerful tool. But have you ever attempted to compare strace log files? That can get tricky. How about diffing multiple strace logs of multi-process applications? That can be a world of pain. So, I wrote a simple shell script to make a life easier. Then I decided to rewrite it in Python and life got even better :-)

My reason for diffing multi-process strace logs was not to see where an application was failing (that's what debuggers are for) but more to understand the flow of program execution. So what follows may have a fairly niche audience.

The script is pretty simple: it's just simplifies the log files to allow easier diffing. Specifically it:
  • replaces all addresses with 0xADDR (or 0xNULL).
  • replaces all timestamps with HH:MM:SS.
  • replaces all datestamps with YYYY/MM/DD.
  • tracks PIDs and replaces each PID seen with a 1-based value (so the first PID seen will be assigned PID1, the second PID2 et cetera).
These simple changes turn out to be pretty powerful. For example...

$ strace -o /tmp/st1.log -fFv -s 1024 script -c 'whoami' /dev/null
$ strace -o /tmp/st2.log -fFv -s 1024 script -c 'whoami' /dev/null
$ diff /tmp/st?.log | wc -l

1,718 lines of differences from running the same command twice. Ouch!

As shown in the screenshot, meld cannot find any lines in common between the files (hence the blue background).

But when we pre-process st1.log and st2.log through strace_summarise.py, the results are much improved:

$ strace_summarise.py /tmp/st1.log > /tmp/st1.sum
$ strace_summarise.py /tmp/st2.log > /tmp/st2.sum
$ diff /tmp/st?.sum | wc -l

Down to only 153 lines of difference - that's a lot more manageable as shown by the meld screenshot above.

In actual fact, since meld is written in Python, you can configure it to ignore certain patterns (Edit -> Preferences -> Text Filters). However, I usually use the awesomely fast xxdiff which doesn't offer this feature (although unlike meld it actually highlights similarities in lines which are marked as non-matching).

The script is here:

Wednesday, 18 July 2012

A Bazaar pre-commit hook to look for signs of unfinished work


Programming involves a fair amount of mental juggling: jumping around a codebase working on a new feature or attempting to track down a bug in existing code, you have to keep track of a lot of pieces of information. However, more balls end up being thrown into the air when, passing through some file or function, you come across an area that needs a bit of lovin'. Often the niggles you spot are minor, but not minor enough to ignore (no Broken Windows here please!)

The problem is that these minor imperfections -- which are probably entirely unrelated to the task at hand -- are both annoying and distracting. Imagine...
  • A needs fixing.
  • C, D, E and G need refactoring now that B exists.
  • Documentation for C, E and G need their function docs updating.
  • H introduces a new feature that needs to be documented in a man page.
  • I is inconsistently formatted.
  • J contains a couple of typos.
  • K lacks sufficient tests.
  • L is now giving warnings with the new stricter version of GCC you've just started using.

Now... where was I?
If you can't make the changes at the time (and unless they are extremely quick to make you probably shouldn't as it will derail your current activity), an obvious strategy is leave little "doc droppings" (aka FIXME's) around your code as reminders of things that need to be done "later". If this sounds like you, read on...

Found some Fluff

Recently, I discovered a file I'd changed a long time ago still had a "FIXME" in it. This irked me on a couple of levels. Firstly, that I hadn't spotted it before committing the change, but also that I shouldn't have to worry about checking for such things manually. And so, unto the world an unscratched itch was born.

It's Not a Problem, it's an Opportunity

Enter bzr which allows you to write plugins and hooks extremely easily in Python. I'd wanted an excuse to write a plugin for a while. And now I had discovered the need to be reminded of unfinished work before I commit it, hence the excuse to write the unimaginatively-named "find_fixmes_pre_commit_hook.py".

Hook Overview

The hook iterates through all modified files looking for a set of regular expressions I tend to use. The current list being:

  • TODO
  • // 
  • #if 1
  • #if 0

If you're not a C coder, the last two are preprocessor directives: the last regex being indicative of a commented out block of code (we all know these are bad to leave lying around right?), and the penultimate being indicative of a chunk of test or debug code that is enabled but should probably be removed. I also occasionally use C++-style comments for temporarily disabling a 1-liner.

If the hook finds any matches, it will perform the following steps for each match:
  1. Display the line matching the regular expression in question.
  2. Display a number of lines of context.
  3. Interactively prompt the user to either proceed or abort the commit.


The hook is currently extremely basic - it doesn't even have any form of packaging so to install it, all you need to do is copy the hook file to ~/.bazaar/plugins/ (create the directory if required). You could just sym link it to that directory if you prefer.


Once you've installed the hook, whenever you run "bzr commit" in any branch, the hook will be called automatically.


To disable the hook, just delete the file or move it out of that directory.


  • If you "bzr commit -F mychanges.txt", you'll still be prompted if any of the regex's are found.
  • If "bzr commit" is run with no associated terminal (for example if you redirect stdin to /dev/null), even if the hook finds issues, the changes will still be committed.


  • Add ability to show "post-context" (where available) to give the user a better indication of where the code issue is located.
  • Add in ability to toggle the hook on/off depending on the branch being committed to.
  • Add ability to restrict the hook to only operating on particular types of files (it currently attempts to ignore checking binary files by considering the mime-type).
  • Add command-line option to "bzr commit" to force the commit but still display the warnings.

The Hook in Operation

Lets create a branch of the "hello" application (yes, it really does exist!), make a change to it that introduces one of my chosen regular expressions ("FIXME") and then attempt to commit the change.

$ cd /tmp
$ bzr branch lp:ubuntu/hello
$ cd hello
$ echo "/* FIXME: still have stuff to do */" >> src/hello.c
$ bzr commit -m "a change"
Committing to: /tmp/hello/
modified src/hello.c
| Running pre_commit hooks - Stage 3/5

ERROR: Found unexpected pattern '\bFIXME\b' at line 214 in file 'src/hello.c'
(branch: name '', path '/tmp/hello', url 'file:///tmp/hello/'):

209:License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>\n\
210:This is free software: you are free to change and redistribute it.\n\
211:There is NO WARRANTY, to the extent permitted by law.\n"),
212:              "2011");
214:/* FIXME: still have stuff to do */

Abort commit to allow problem to be resolved? ([y]es, [n]o): yes
Commit aborted.

Show Me the Code

The plugin is here: find_fixmes_pre_commit_hook.py

To Infinity and Beyond

Hooks are cool and not difficult to implement. Consider a hook that stopped you committing code unless:

  • a build is successful.
  • a test suite passes successfully.
  • a security / license auditing tool runs cleanly over the code.
  • a static analysis run is successful (or code is atleast within some predefined threshold).
  • the new code conforms to a suitable coding standard.
(Those last 2 ideas might seem a little harsh but there are companies which mandate this (I've worked for some ;-))


See also: