Tuesday, 24 July 2012

simplified strace diffing

strace is an extremely powerful tool. But have you ever tried to compare strace log files? That can get tricky. How about diffing multiple strace logs of multi-process applications? That can be a world of pain. So I wrote a simple shell script to make life easier. Then I decided to rewrite it in Python and life got even better :-)

My reason for diffing multi-process strace logs was not to see where an application was failing (that's what debuggers are for) but more to understand the flow of program execution. So what follows may have a fairly niche audience.

The script is pretty simple: it just simplifies the log files to make them easier to diff. Specifically, it:
  • replaces all addresses with 0xADDR (or 0xNULL).
  • replaces all timestamps with HH:MM:SS.
  • replaces all datestamps with YYYY/MM/DD.
  • tracks PIDs and replaces each PID seen with a 1-based label (so the first PID seen becomes PID1, the second PID2, and so on).
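
To make the four steps above concrete, here is a minimal sketch in Python. It is not the actual strace_summarise.py; the function name simplify, the regular expressions, and the choice to map a zero-valued hex literal to 0xNULL are all assumptions made for illustration. It also assumes the log was written with -f and -o, so each line starts with a PID column, and it only rewrites that leading column (PIDs that appear inside arguments or return values are left alone).

```python
import re

# Hypothetical patterns; the real script may match things differently.
ADDR_RE = re.compile(r"\b0x[0-9a-fA-F]+\b")               # hex literals
TIME_RE = re.compile(r"\b\d{2}:\d{2}:\d{2}(?:\.\d+)?\b")  # 12:34:56[.789012]
DATE_RE = re.compile(r"\b\d{4}/\d{2}/\d{2}\b")            # 2012/07/24
PID_RE = re.compile(r"^(\d+)[ \t]+")                      # leading PID column


def _addr_label(match):
    # Assumption: a zero-valued hex literal is what gets reported as 0xNULL.
    return "0xNULL" if int(match.group(0), 16) == 0 else "0xADDR"


def simplify(lines):
    """Return a copy of the strace log lines with volatile values replaced."""
    pids = {}  # real PID (as text) -> "PID1", "PID2", ... in order first seen
    result = []
    for line in lines:
        m = PID_RE.match(line)
        if m:
            label = pids.setdefault(m.group(1), "PID%d" % (len(pids) + 1))
            # Emit exactly one space after the label, whatever the original
            # column padding was, so logs with PIDs of different lengths
            # still line up.
            line = label + " " + line[m.end():]
        line = DATE_RE.sub("YYYY/MM/DD", line)
        line = TIME_RE.sub("HH:MM:SS", line)
        line = ADDR_RE.sub(_addr_label, line)
        result.append(line)
    return result
```

Feeding it the lines of a log (for example `simplify(open("/tmp/st1.log"))`) and writing the result out would give a file that can be passed to diff, meld or xxdiff.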
These simple changes turn out to be pretty powerful. For example...

$ strace -o /tmp/st1.log -fFv -s 1024 script -c 'whoami' /dev/null
$ strace -o /tmp/st2.log -fFv -s 1024 script -c 'whoami' /dev/null
$ diff /tmp/st?.log | wc -l
1718
$ 



1,718 lines of differences from running the same command twice. Ouch!

As shown in the screenshot, meld cannot find any lines in common between the files (hence the blue background).

But when we pre-process st1.log and st2.log through strace_summarise.py, the results are much improved:

$ strace_summarise.py /tmp/st1.log > /tmp/st1.sum
$ strace_summarise.py /tmp/st2.log > /tmp/st2.sum
$ diff /tmp/st?.sum | wc -l
153
$ 



Down to only 153 lines of difference - that's a lot more manageable as shown by the meld screenshot above.
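
If you would rather measure the improvement from Python than with diff and wc, something along these lines should work. It is only a sketch: count_changed_lines is a made-up helper name, and because it counts just the lines that differ (no hunk headers or separators), its numbers will not match the `diff | wc -l` figures above exactly.

```python
import difflib


def count_changed_lines(a_lines, b_lines):
    """Count lines present on only one side of the comparison."""
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Lines dropped from the first file plus lines added in the second.
            changed += (i2 - i1) + (j2 - j1)
    return changed
```

Calling it on the lines of /tmp/st1.log and /tmp/st2.log, and then again on the two .sum files, gives a rough before-and-after comparison.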

In fact, meld (which is written in Python) can itself be configured to ignore certain patterns (Edit -> Preferences -> Text Filters). However, I usually use the awesomely fast xxdiff, which doesn't offer this feature (although, unlike meld, it actually highlights the similarities within lines that are marked as non-matching).

The script is here:




3 comments:

  1. Nice Post, Thanks for your very useful information... I will bookmark for next reference.

  2. I know where I'm going and l know the truth, and I don't have to be what you want me to be. I'm free to be what I want.Thankyou i really love it..

  3. A couple of little bugs:

    1) errors for 3-digit PIDs

    2) files with different length PIDs end up with different number of spaces, so no lines match
