Friday, 25 May 2012

Job Logging in Upstart

Job Logging in Upstart


The big Upstart feature in Ubuntu Precise is "job logging" (in fact, it turned out to be a significantly bigger feature than we'd originally envisaged :-). This had been a wishlist item for some time and a lot of folk were very keen to see this implemented. All system jobs now have their stdout and stderr logged automatically by default to a text file in directory /var/log/upstart/.

Why did we make this the default? Surely daemons and services don't generally write any output? True, but when they do produce output, it is worth capturing since it has a very high chance of being an error message. And errors should not be ignored. For jobs that do not produce any output, there is minimal overhead and of course no log is written.

The logger actually uses pseudo-ptys just like script(1), xterm(1), expect(1)screen(1) et al. This is advantageous for a number of reasons, but from the logging perspective the biggie is "no buffering" so Upstart is able to slurp the jobs data as soon as it is generated. So even if a job produces only a single byte of output, this will be detected immediately.

Whilst developing this feature, we found a few services that had been generating warning and error messages for a looooong time but since the output was been discarded (implicit "console none"), we were not aware of them.

Job Debugging


The other great thing about job logging is that you don't need any special setup to use this feature as it's already enabled. All you need to do is add some calls to echo(1) and you are away.

To help with debugging script sections, you might like to add a "set -x" immediately after the script stanza. This is particularly helpful if you have tests in your code as you can see clearly which one is failing (recall that Upstart runs the script sections using "/bin/sh -e" so any errors are fatal.

$ cat <<EOT | sudo tee /etc/init/foo.conf
> task
> script
>   set -x
>   echo I am a job running with the following environment:
>   env
>   if [ 1 -gt 2 ]
>   then
>     echo impossible
>   else
>     echo sane
>   fi
> end script
> EOT
$ init-checkconf /etc/init/foo.conf && sudo start foo && sudo cat /var/log/upstart/foo.log
+ echo I am a job running with the following environment:
I am a job running with the following environment:
+ env
UPSTART_INSTANCE=
UPSTART_JOB=foo
TERM=linux
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
PWD=/
+ [ 1 -gt 2 ]
+ echo sane
sane

Early Logging


Another neat feature of job logging is that it works from Upstart startup. What this means is that you can have a job that specifies "start on startup" (where startup is the first event Upstart emits at the point the disks are still read-only) and still have its output logged. Upstart handles this situation by writing the log as soon as the disk becomes writeable. This makes debugging very early system startup issues significantly easier. If you want the data "immediately" you could always use the  "--logdir" command-line option and arrange to have Upstart write job logs to a tmpfs filesystem.

Controlling Logging


Logging can be disabled entirely using the "--no-log" command-line option which gives you the old Upstart behaviour where all jobs implicitly defaulted to "console none".

You can of course disable logging for individual jobs by specifying an explicit "console none".

But you can also invert the logic and disable logging for all jobs except those you choose. To do this, add "console log" to all jobs you wish Upstart to log output for and then boot adding the following command-line option:  "--default-console=none". Now, Upstart will honour any job that specifies "console" explicitly, but if not, it will default to "console none", and discard all job output.

Further Reading


As usual, for further details, consult init(5) and the Upstart Cookbook, specifically:


If you haven't yet upgraded to precise, check out the release notes:

Why not take a look in /var/log/upstart/ now and see what you can find?

JaMeS



Thursday, 17 May 2012

A quick libnih tutorial

Introduction

The NIH Utility Library (libnih) is a small, efficient and most importantly safe library of general purpose routines. It was written by Keybuk so you can be assured that it is extremely elegant, well-designed, well-tested (includes 2863 tests currently!) and well-written. NIH is used by Upstart, the event-based init daemon which is used by:

That's a lot of deployments of Upstart and NIH around the world!! (And we're not even including mobile device operating systems in that list).

But why not just use glib I hear you ask? Well, glib is a very large library whereas NIH is small and designed for low-level daemons and systems which may be resource-constrained. Also, lets not forget that NIH, like Upstart, comes with a very comprehensive test suite so bugs are rare.

Other reasons to use NIH:
  • It handles garbage collection for you

    That's right, you don't need to free memory manually.

  • It uses an Object-Oriented-like Design... in C!

    This is extremely powerful and elegant. It's also quite easy to use once you understand the way the API works.

Let's start with some basics...

Garbage Collection


/* WARNING! Contains bugs! */
int
main (int argc, char *argv[])
{
    nih_local char *string;

    if (argc > 1) {
        string = nih_strdup (NULL, "hello, world");
        nih_message ("string is set to '%s'", string);
    }
}

This code nominally is trying to display a message if the user runs this application with one or more command-line arguments specified. However, there are a couple of problems with it:

  • No check is performed on the memory allocated by nih_strdup().
  • If no command-line argument is specified, chances are this program will crash.
The first issue is easy to spot and easy to remedy, but what about this crash? Well, nih_local variables are garbage collected automatically when they go out of scope. The string variable will therefore be garbage collected when it goes out of scope, which is when main() exits. However, since string was never initialized, it will be pointing to a random location in memory such that when the program exits, the runtime will attempt to free that random memory address. That will probably result in a SIGSEGV caused by dereferencing an illegal pointer value. The fix is easy and you should chant this mantra whenever you use nih_local variables:

Always assign nih_local variables to NULL.

Here's a corrected version:

/* Correct version */
int
main (int argc, char *argv[])
{
    /* XXX: *ALWAYS* set nih_local variables to NULL */ 
    nih_local char *string = NULL;

    if ( argc > 1) {
        string = nih_strdup (NULL, "hello, world");
        if (string) 
            nih_message ("string is set to '%s'", string);
        else {
            nih_error ("failed to allocate space for string");
            exit (EXIT_FAILURE);
        }
    }
}


However, there is an even better way to code that check to ensure nih_strdup() succeeded:


/* Improved version */ 
int
main (int argc, char *argv[])
{
    /* XXX: *ALWAYS* set nih_local variables to NULL */ 
    nih_local char *string = NULL;

    if ( argc > 1) {
        string = NIH_MUST (nih_strdup (NULL, "hello, world"));
        nih_message ("string is set to '%s'", string);
    }
}

So now, if the user specifies a command-line option, the program will print "hello, world" and automatically free the variable string. If the user does not specify a command-line option, no garbage collection will be performed since the string variable will never be associated with allocated memory.

Note that the code is simpler and easier to understand as a result. Note too that we're now using NIH_MUST(). This is a macro which will call the block you pass to it ('nih_strdup (NULL, "hello, world")' in this case) repeatedly until it succeeds. You should exercise caution using NIH_MUST()though since if there is a high likelihood of the allocation never succeeding, the code will spin forever at this point. There is similar call "NIH_SHOULD()" that will call the block passed to it repeatedly until either the result is TRUE, or an error other than ENOMEM is raised.

Parent-Pointer

Let's take a closer look at that call to nih_strdup. The system version of strdup takes a single argument (the string to copy), so why does nih_strdup take two arguments?


nih_strdup (NULL, "hello, world");

Well that first NULL parameter is the parent pointer. Most NIH functions take a parent pointer as their first argument. Lets see these pointers in action before explaining the detail...


#include <nih/macros.h>
#include <nih/logging.h>
#include <nih/string.h>
#include <nih/alloc.h>

int
main(int argc, char *argv[])
{
    typedef struct foo {
        char *str1;
        char *str2;
    } Foo;

    nih_local Foo *foo = NIH_MUST (nih_new (NULL, Foo));

    foo->str1 = NIH_MUST (nih_strdup (foo, "first string"));
    foo->str2 = NIH_MUST (nih_strdup (foo, "second string"));

    nih_message ("foo->str1='%s'", foo->str1); 
    nih_message ("foo->str2='%s'", foo->str2); 

    exit(EXIT_SUCCESS);
}


Here we see our first complete NIH program. There are a couple of important points to note:


  • The call to nih_new() is like malloc()except it too takes a parent pointer. Since the foo object we're creating doesn't have a parent, we set the pointer to NULL.
  • Note that there is no call to free the memory allocated by nih_new()because since we're using nih_local, the object and all its children will be freed automatically when the block (in this example the main() function) ends. This is incredibly powerful: we've made 3 memory allocations in the example (one call to nih_new() and two calls to nih_strdup()), and all that memory will be automatically garbage collected for us because NIH knows to free the foo object when it goes out of scope, but it also knows that the str1 and str2 elements also need to be freed (since we told nih_strdup() their parent is the foo object we previously created).
So the parent pointer provided by most NIH calls is used to enable intelligent garbage collection: by effectively tagging objects with a reference to their parent you are assured of that object being automatically garbage collected when the parent is freed.

Lists

The NIH list implementation is essentially the same as a "struct list_head" in the Linux kernel:

typedef struct nih_list {
    struct nih_list *prev, *next;  
} NihList;

Lists are designed to be contained within other objects like this:
typedef struct bar {
    NihList   entry;
    char     *str;
} Bar;

So you don't create a "list of Bar objects", you create a list of list objects which provide access to their containing types.

Note that the list element is the first in the Bar structure. This allows a list pointer to be dereferenced to its containing type trivially.

Let's look at an example of list usage by implementing echo(1):

#include <nih/macros.h>
#include <nih/logging.h>
#include <nih/string.h>
#include <nih/alloc.h>

typedef struct bar {
    NihList  entry;
    char    *str;
} Bar;

int
main(int argc, char *argv[])
{
    int i;
    NihList *args;

    args = NIH_MUST (nih_list_new (NULL));

    /* store all arguments in a list */
    for (i = 1; i < argc; ++i) {
        Bar *bar = NIH_MUST (nih_new (args, Bar));

        nih_list_init (&bar->entry);

        bar->str = NIH_MUST (nih_strdup (bar, argv[i]));

        nih_list_add (args, &bar->entry);
    }

    i = 1;

    /* display all arguments by iterating over list */
    NIH_LIST_FOREACH (args, iter) {
        Bar *bar = (Bar *)iter;

        nih_message ("argument %d='%s'", i, bar->str);

        ++i;
    }

    nih_free (args);

    return (0);
}

The new features introduced here are the calls to nih_list_init() to initialise a list, and nih_list_add(), which adds the second argument to the list specified by the first argument. Additionally, we have that rather funky NIH_LIST_FOREACH() macro which allows for easy (and fast!) list traversal. In this example we are not using nih_local so what happens when nih_free() is called? Well, all entries in the args list are freed, but before each is freed, the str string within each entry is freed. Then the list itself is freed. Neat huh?

To build our version of echo:
gcc -std=gnu99 -Wall -pedantic echo.c -o echo $(pkg-config --cflags --libs libnih)
Now let's run it:
$ ./echo a b c "hello world" "foo bar" wibble "the end"
argument 1='a'
argument 2='b'
argument 3='c'
argument 4='hello world'
argument 5='foo bar'
argument 6='wibble'
argument 7='the end'
$ 


We've really only scratched the surface of NIHs abilities in this post. Here are some of the other facilities it provides:


  • hashes
  • binary trees
  • string arrays
  • file watches
  • I/O handling
  • signal handling
  • timers
  • reference handling
  • error/exception handling
  • main loop handling
  • command-line option and usage handling
  • child process handling
  • config file handling
  • logging facilities
  • test facilities
If you're interested to learn more, start hacking, or take a look at some of the projects already using NIH:

References