martes, 26 de julio de 2011

Auditing table changes

How often are you asked for logging changes in tables? AFAIR, every single customer i have had asked me for that... Until now, i have using table_log (http://pgfoundry.org/projects/tablelog) for that but this is less than
ideal for several reasons.

what table_log does is to create a new table for every table we track with the same columns as the original table has plus some added columns at the end of the table, this has a few problems:
  1. if we add new columns to the original table we need to contort things to add the same columns in the correct place in the log table if we don't want to lose history.
  2. if you drop a column, you should do the same on the log table or you couldn't be able to insert new records. in this case the solution is to lose some history
  3. this inserts 1 record per INSERT, 1 record per DELETE and 2 records per UPDATE (the old and new versions of the record are stored); so in heavily updated tables size can be double of the real one
  4. no support for logging truncate actions
  5. there is no simple way to determine which is the old and new versions of a single tuple, you need to match new versions with old versions using time of change, user and mode (UPDATE)
  6. there is missing information, at least ip of the client
for these reasons, i have made a little trigger to make this a little better (https://github.com/jcasanov/pg_audit).

what i have done here is:
  • use hstore, and ensure only log the values that have changed. because of that, if columns are added or removed we don't need to do anything to the logger table.
  • it inserts only 1 record per action so use less space than table_log
  • it logs TRUNCATE actions (it needs an additional trigger for that, though)
  • old and new values are on the same record so is really obvious what have changed in the tuple
  • i have added two fields that there aren't in table_log current_user and client_addr (i'm not sure how useful current_user is but surely the ip can be useful)
  • i have used only one table for all tables tracked, that way if we want to have separate tables for some we can partition by relid
i have written this in plpgsql but can think on move it to c if it seems useful