
InfluxDB With Cascaded Downsampling

InfluxDB as Round-Robin Database replacement

Time-series databases are useful beasts. They are particularly useful for storing and visualising measurements.

When storing measurement data, a particularly useful property is granularity decay. The most recent measurement data is kept at the highest granularity, and as time passes, the data is aggregated and downsampled in progressively coarser steps.

The venerable tools, RRD and Graphite (or more accurately, its Carbon/Whisper storage), require you to configure the granularity, compaction and retention upfront. InfluxDB doesn't.

If you want the same kind of retention logic and granularity decay with InfluxDB, there are a few hoops to jump through. Oddly enough, configuring such a setup is not really documented.

Data storage and downsampling in InfluxDB

Retention time and data granularity are tied to retention policies, which are used to specify how long the stored data is kept around. However, they say nothing about what this data should look like.

As time-series data comes in, it gets stored in, and is retained according to, the DEFAULT retention policy. (Yep, such a thing always exists. Even if you didn't create it.)

When storing and accessing data, InfluxDB uses $database.$retention_policy.$datapoint_name as the full data point path. Incidentally, $database/$retention_policy is also an on-disk path under the main data directory.

We might as well call them buckets.

So, internally InfluxDB writes the incoming data points to the DEFAULT bucket. The retention policy is just a fancy way of saying that data will be expired and deleted from the bucket once it is older than the retention time.

What has this got to do with downsampling?

We're getting to that.

The standard use case for downsampling is that all data, across all time-series dimensions, is continuously aggregated according to configured granularity decay rules. So far nothing above has dealt with this aspect.

The trick with InfluxDB is that we can create individual buckets with progressively longer retention periods. Then, finally, we tell InfluxDB how to populate these buckets with data. Until then, only the DEFAULT bucket will be written to.

Step 1: Choose your retention periods

Let's go with something truly extreme. Don't try this at home.[1]

We want the following:

  • 1-second granularity for 60 hours
  • 10-second granularity for 8 days
  • 30-second granularity for 15 days
  • 2-minute granularity for 45 days
  • 5-minute granularity for 120 days
  • 15-minute granularity for 220 days, and
  • 1-hour granularity for 800 days
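To see why the footnote warns about storage, it helps to count the stored points per series in each tier (retention divided by granularity). The Python sketch below does only that arithmetic; it ignores compression, tags and shard overhead, so treat the numbers as relative sizes rather than bytes on disk.

```python
# Retention tiers from the list above: (granularity in seconds, retention in seconds)
MINUTE = 60
HOUR = 60 * MINUTE
DAY = 24 * HOUR

TIERS = [
    (1, 60 * HOUR),
    (10, 8 * DAY),
    (30, 15 * DAY),
    (2 * MINUTE, 45 * DAY),
    (5 * MINUTE, 120 * DAY),
    (15 * MINUTE, 220 * DAY),
    (HOUR, 800 * DAY),
]


def points_per_series(tiers):
    """Number of points one series holds in each tier: retention / granularity."""
    return [retention // granularity for granularity, retention in tiers]


if __name__ == "__main__":
    counts = points_per_series(TIERS)
    for (granularity, _), count in zip(TIERS, counts):
        print(f"{granularity:>5}s granularity: {count:>7} points")
    print("total per series:", sum(counts))
```

For these tiers the 1-second level holds 216,000 points per series out of a total of 435,600, so the first 60 hours account for roughly half of everything stored.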

All the data will be coming from collectd.

Step 2: Create named retention policies

Now that we know how long we want to store data, and how we want it to decay, it's time to get down and dirty.

Create a text file with the following contents:

CREATE DATABASE collectd
CREATE RETENTION POLICY "1s_for_60h" ON collectd DURATION 60h REPLICATION 1 DEFAULT
CREATE RETENTION POLICY "10s_for_8d" ON collectd DURATION 8d REPLICATION 1
CREATE RETENTION POLICY "30s_for_15d" ON collectd DURATION 15d REPLICATION 1
CREATE RETENTION POLICY "2m_for_45d" ON collectd DURATION 45d REPLICATION 1
CREATE RETENTION POLICY "5m_for_120d" ON collectd DURATION 120d REPLICATION 1
CREATE RETENTION POLICY "15m_for_220d" ON collectd DURATION 220d REPLICATION 1
CREATE RETENTION POLICY "1h_for_800d" ON collectd DURATION 800d REPLICATION 1

And run it with InfluxDB: influx < textfile
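Since the statements follow a fixed pattern, they can also be generated rather than typed. The Python sketch below is a hypothetical helper, not part of InfluxDB: it prints the same statements as the text file above, one per line, marking the finest-grained policy as DEFAULT.

```python
# Hypothetical helper: generate the retention-policy statements shown above.
# Each tier is (granularity, duration); the policy name is "<granularity>_for_<duration>".
TIERS = [
    ("1s", "60h"),
    ("10s", "8d"),
    ("30s", "15d"),
    ("2m", "45d"),
    ("5m", "120d"),
    ("15m", "220d"),
    ("1h", "800d"),
]


def retention_policy_statements(database, tiers):
    statements = [f"CREATE DATABASE {database}"]
    for index, (granularity, duration) in enumerate(tiers):
        name = f"{granularity}_for_{duration}"
        statement = (
            f'CREATE RETENTION POLICY "{name}" ON {database} '
            f"DURATION {duration} REPLICATION 1"
        )
        if index == 0:
            # The finest tier is the DEFAULT: incoming data is written there
            statement += " DEFAULT"
        statements.append(statement)
    return statements


if __name__ == "__main__":
    # Hypothetical file name; pipe the output to the CLI: python3 make_rps.py | influx
    print("\n".join(retention_policy_statements("collectd", TIERS)))
```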

At this point we have the data buckets in place, but data is still only being stored in the DEFAULT bucket.

NOTE: There has to be a DEFAULT. That is the only bucket incoming data is written to.

Step 3: Tell InfluxDB how to generate the downsampled data

As we have already learned, out of the box InfluxDB only writes data points to the DEFAULT bucket. However, we expect the RRD/Graphite semantics - at least they are intuitive.

InfluxDB has a concept of CONTINUOUS QUERY. We can think of them as time-based triggers. A continuous query runs at specified time intervals, reads data from one RETENTION POLICY bucket and writes - likely modified - data to another.

We have the missing piece of the puzzle.

In order to generate the downsampled data, we will need to create continuous queries that progressively aggregate all time-series data from one bucket to another.
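What a mean-based continuous query does to one series can be modelled in plain Python. This is a simplified sketch, not InfluxDB's implementation: points are (timestamp in seconds, value) pairs, and windows start at multiples of the interval counted from the Unix epoch, which matches the default alignment of GROUP BY time(). Feeding one level's output into the next mimics the cascade; the 20-second level in the usage is only for illustration.

```python
from collections import defaultdict
from statistics import mean


def downsample_mean(points, interval):
    """Average (timestamp, value) points into fixed windows of `interval` seconds.

    Window starts are multiples of `interval` counted from the Unix epoch,
    mirroring the default alignment of GROUP BY time().
    """
    windows = defaultdict(list)
    for timestamp, value in points:
        windows[timestamp - timestamp % interval].append(value)
    return sorted((start, mean(values)) for start, values in windows.items())


if __name__ == "__main__":
    raw = [(0, 1.0), (1, 1.0), (2, 1.0), (10, 4.0)]
    level_10s = downsample_mean(raw, 10)        # first level, like cq_10s_for_8d
    level_20s = downsample_mean(level_10s, 20)  # cascaded from the 10s level
    print(level_10s, level_20s, downsample_mean(raw, 20))
```

On these sample points the cascaded 20-second mean is 2.5, while averaging the raw points directly gives 1.75: a mean of means matches the plain mean only when every window holds the same number of points, which is worth keeping in mind when collection has gaps.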

So, we can create a file with contents like this:

CREATE CONTINUOUS QUERY "cq_10s_for_8d" ON "collectd" BEGIN SELECT mean(*) INTO "collectd"."10s_for_8d".:MEASUREMENT FROM /.*/ GROUP BY time(10s),* END
CREATE CONTINUOUS QUERY "cq_30s_for_15d" ON "collectd" BEGIN SELECT mean(*) INTO "collectd"."30s_for_15d".:MEASUREMENT FROM collectd."10s_for_8d"./.*/ GROUP BY time(30s),* END
CREATE CONTINUOUS QUERY "cq_2m_for_45d" ON "collectd" BEGIN SELECT mean(*) INTO "collectd"."2m_for_45d".:MEASUREMENT FROM collectd."30s_for_15d"./.*/ GROUP BY time(2m),* END
CREATE CONTINUOUS QUERY "cq_5m_for_120d" ON "collectd" BEGIN SELECT mean(*) INTO "collectd"."5m_for_120d".:MEASUREMENT FROM collectd."2m_for_45d"./.*/ GROUP BY time(5m),* END
[... and so on ...]

And run it: influx -database collectd < myqueries.txt
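Because every level after the first follows the same pattern, the whole chain can be generated from the list of policy names. The Python sketch below is a hypothetical helper: it chains each policy to the next-finer one and leaves the first query reading the DEFAULT bucket through a bare /.*/, as the first query in the file does. The names follow the naming used in step 2.

```python
# Policy names from step 2, finest first; the first one is the DEFAULT bucket.
POLICIES = ["1s_for_60h", "10s_for_8d", "30s_for_15d", "2m_for_45d",
            "5m_for_120d", "15m_for_220d", "1h_for_800d"]


def continuous_query_statements(database, policies):
    """One CREATE CONTINUOUS QUERY per level, each reading the level before it."""
    statements = []
    for index, (source, target) in enumerate(zip(policies, policies[1:])):
        interval = target.split("_for_")[0]  # e.g. "30s" from "30s_for_15d"
        if index == 0:
            # First level reads the DEFAULT bucket, so no policy is named
            origin = "/.*/"
        else:
            origin = f'{database}."{source}"./.*/'
        statements.append(
            f'CREATE CONTINUOUS QUERY "cq_{target}" ON "{database}" BEGIN '
            f'SELECT mean(*) INTO "{database}"."{target}".:MEASUREMENT '
            f"FROM {origin} GROUP BY time({interval}),* END"
        )
    return statements


if __name__ == "__main__":
    # One statement per line, ready for: influx -database collectd
    print("\n".join(continuous_query_statements("collectd", POLICIES)))
```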

If we look at the first two continuous queries, we can see a slight syntactic difference. The first aggregation level reads data from the DEFAULT bucket, and the subsequent ones read from their respective higher-granularity buckets.

However, internally the created queries are stored like this:

cq_10s_for_8d   CREATE CONTINUOUS QUERY cq_10s_for_8d ON collectd BEGIN SELECT mean(*) INTO collectd."10s_for_8d".:MEASUREMENT FROM collectd."1s_for_60h"./.*/ GROUP BY time(10s), * END
cq_30s_for_15d  CREATE CONTINUOUS QUERY cq_30s_for_15d ON collectd BEGIN SELECT mean(*) INTO collectd."30s_for_15d".:MEASUREMENT FROM collectd."10s_for_8d"./.*/ GROUP BY time(30s), * END
cq_2m_for_45d   CREATE CONTINUOUS QUERY cq_2m_for_45d ON collectd BEGIN SELECT mean(*) INTO collectd."2m_for_45d".:MEASUREMENT FROM collectd."30s_for_15d"./.*/ GROUP BY time(2m), * END
cq_5m_for_120d  CREATE CONTINUOUS QUERY cq_5m_for_120d ON collectd BEGIN SELECT mean(*) INTO collectd."5m_for_120d".:MEASUREMENT FROM collectd."2m_for_45d"./.*/ GROUP BY time(5m), * END
[... and so on ...]

The first query has been created with the name of the DEFAULT bucket as the data source, even though we didn't specify it.

After this operation, we can inspect the files on the disk and see how the storage buckets and continuous queries behave:

fluxhost:/var/lib/influxdb% find data -path '*/collectd/*.tsm'
data/collectd/10s_for_8d/13/000000005-000000002.tsm
data/collectd/10s_for_8d/18/000000001-000000001.tsm
data/collectd/autogen/2/000000020-000000002.tsm
data/collectd/30s_for_15d/14/000000002-000000002.tsm
data/collectd/2m_for_45d/15/000000001-000000001.tsm
data/collectd/1s_for_60h/17/000000002-000000002.tsm
data/collectd/1s_for_60h/9/000000002-000000002.tsm
data/collectd/1s_for_60h/12/000000014-000000003.tsm
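Each .tsm file sits under data/$database/$retention_policy/ followed by a numeric shard directory, so a listing like this can be tallied per bucket. The Python sketch below assumes exactly that four-level layout and is only a reading aid; autogen is the policy InfluxDB creates automatically for a new database.

```python
from collections import Counter
from pathlib import PurePosixPath


def tsm_files_per_policy(paths):
    """Count .tsm files per (database, retention policy).

    Assumes paths of the form data/<database>/<retention_policy>/<shard_id>/<file>.tsm.
    """
    counts = Counter()
    for path in paths:
        _, database, policy, _shard, _file = PurePosixPath(path).parts
        counts[(database, policy)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical usage: find data -path '*/collectd/*.tsm' | python3 count_tsm.py
    listing = [line.strip() for line in sys.stdin if line.strip()]
    for (database, policy), count in sorted(tsm_files_per_policy(listing).items()):
        print(f"{database}.{policy}: {count} file(s)")
```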

Result: Downsampled data for all collected series

It took a few somewhat unintuitive steps, but we have created progressively decaying time-series storage in InfluxDB.

At the time of writing, the above sequence has not really been documented. The official docs explain how to build the individual RETENTION POLICY and CONTINUOUS QUERY elements, but not really how they should be tied together.

Footnotes

  1. Most time-series setups store their highest-granularity data at 10- or 20-second intervals and start to decay it after just a few hours. High granularity combined with a long retention period will explode the storage requirements.

Graphite - No Such Thing as Overkill


Pretty graphs, state-of-the-art backend system, easy deployment. Pick any two.

Graphite seems to be the current tool for taking care of the heavy lifting, and of the three options, it has gone for the first two. This is the story of making Graphite run on a Debian Sid system. To add some really gnarly twists, this is done with Lighttpd.

Some of the unexpected dependencies include Twisted (for graphite itself), and Django (for actually running Graphite).

Update: Graphite was broken with Django 1.5, and upstream was aware of it. This has since been fixed in Debian Unstable as #721085.

Prepackaged

The Graphite stack is commonly available:

sudo apt-get install python-whisper graphite-carbon graphite-web

The tentacles...

Running Graphite with Postgres takes a few extra packages.

sudo apt-get install postgresql python-psycopg2 python-flup

Psycopg2 is a Postgres connector. Flup is required for FastCGI. And Postgres? Well, that's for Django.

Massaging Postgres into shape

Before we add any databases (for Django's benefit), there are a couple of changes we need to make:

New line in Postgres's pg_ident.conf:

# MAPNAME SYSTEM-USERNAME PG-USERNAME
graphs _graphite graphite

This creates a new internal account-name map "graphs", stating that the system user "_graphite" will be treated as Postgres user/role "graphite".
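The lookup that this map enables is simple enough to model. The Python sketch below is a simplification for illustration only: it handles plain three-column entries and ignores the regular-expression and quoting forms that pg_ident.conf also accepts.

```python
def parse_ident_map(text):
    """Parse pg_ident.conf text into a set of (map, system_user, pg_user) entries."""
    entries = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line:
            map_name, system_user, pg_user = line.split()
            entries.add((map_name, system_user, pg_user))
    return entries


def ident_allows(entries, map_name, system_user, pg_user):
    """True if `map_name` lets `system_user` connect as `pg_user`."""
    return (map_name, system_user, pg_user) in entries


CONFIG = """
# MAPNAME SYSTEM-USERNAME PG-USERNAME
graphs _graphite graphite
"""

if __name__ == "__main__":
    entries = parse_ident_map(CONFIG)
    print(ident_allows(entries, "graphs", "_graphite", "graphite"))
    print(ident_allows(entries, "graphs", "postgres", "postgres"))
```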

Then, we change access control for domain-socket connections:

Changes in Postgres's pg_hba.conf:

# Locally mapped usernames in 'graphs'
local all all ident map=graphs
# "local" is for Unix domain socket connections only
local all all ident

These changes mean that access control decisions are based on local identity (available to peers when connecting over a domain socket), so these connections do not require database passwords. We add the manually mapped account identification before the global default, so the map is checked first.
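One detail worth knowing here: PostgreSQL uses the first pg_hba.conf record whose connection type, database and user match, and if authentication with that record fails, later records are not tried. With both lines reading local all all, every local connection picks the mapped line, including for instance the postgres superuser that runs createuser below, which may then be rejected unless the map also covers it. The Python sketch models only that selection step (no addresses, groups or keywords such as sameuser); the narrower first line in the usage is a hypothetical alternative, not the configuration used here.

```python
# Each rule: (connection type, database, user, method), as in pg_hba.conf
RULES = [
    ("local", "all", "all", "ident map=graphs"),
    ("local", "all", "all", "ident"),
]


def select_hba_rule(rules, conn_type, database, user):
    """Return the first rule matching the connection, or None.

    Mirrors PostgreSQL's first-match selection: a failed authentication on
    the chosen rule does not fall through to later rules.
    """
    for rule in rules:
        rule_type, rule_database, rule_user, _method = rule
        if (rule_type == conn_type
                and rule_database in ("all", database)
                and rule_user in ("all", user)):
            return rule
    return None


if __name__ == "__main__":
    # Hypothetical narrower variant: only graphite/graphite goes through the map
    narrower = [("local", "graphite", "graphite", "ident map=graphs")] + RULES[1:]
    print(select_hba_rule(RULES, "local", "postgres", "postgres"))
    print(select_hba_rule(narrower, "local", "postgres", "postgres"))
```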

Restart Postgres.

No base without data

Let's create the database for our graph data.

sudo -u postgres createuser -D -R -S graphite
sudo -u postgres createdb -O graphite graphite

This gives us the Postgres user/role "graphite" as the owner of database "graphite". Thanks to the mapping above, we can now do this:

sudo -s
su -l -s /bin/bash _graphite
psql -d graphite graphite

So we can connect to the newly created graphite database as Postgres user "graphite", when our session user is "_graphite".

Configuring Graphite

The default configuration file /etc/graphite/settings.py corresponds to local_settings.py in the Graphite documentation. However, the file provided in the graphite-web package is from an older version. The database connection section in particular has changed. Instead of individual DATABASE_FOO tokens, the configuration section now allows multiple databases and has been changed into a dictionary. The options now reside in DATABASES['default'][FOO], so the configuration for our setup looks like this:

DATABASES = {
    'default': {
        'NAME': 'graphite',
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'USER': 'graphite',
        'PORT': 5432,
    }
}

The connection port is not generally needed, since we're connecting over a domain socket. However, the port number is also used by Psycopg2: the listening port number is included in the name of the PostgreSQL socket. With a non-standard Postgres port, the number is therefore necessary.
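The socket naming is easy to show. PostgreSQL names its Unix socket .s.PGSQL.<port> inside the socket directory, and libpq (which Psycopg2 uses) derives the same name from the configured port. The Python sketch below assumes Debian's default socket directory, /var/run/postgresql.

```python
import os.path


def pg_socket_path(port=5432, directory="/var/run/postgresql"):
    """Path of the PostgreSQL Unix-domain socket for a given port.

    The server creates .s.PGSQL.<port> inside its socket directory; the default
    directory here is Debian's, which is an assumption for other systems.
    """
    return os.path.join(directory, f".s.PGSQL.{port}")


if __name__ == "__main__":
    for port in (5432, 5433):
        print(port, pg_socket_path(port), os.path.exists(pg_socket_path(port)))
```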

While editing settings.py, we also drop RRD_DIR from DATA_DIRS. We simply don't have any RRD files around at the moment, and Graphite insists that all data directories must exist. Thus, removing the bogus directory prevents a runtime error.

At this point, Graphite should be able to connect to Postgres. We'll initialise the database.

_graphite% /usr/bin/graphite-manage syncdb

That takes care of the Django part. (Note that we did not create a Django project. Graphite IS a pre-packaged Django project, with its own manage.py wrapper. The file's just named differently and is placed in an unusual path.)

Connecting with FastCGI

We start by enabling FastCGI in Lighttpd. Individual modules are available in conf-available, and linked from conf-enabled just like with Apache.

/etc/lighttpd% ls -l conf-enabled/10-fastcgi.conf
lrwxrwxrwx 1 root 33 Jan 21 14:47 conf-enabled/10-fastcgi.conf -> ../conf-available/10-fastcgi.conf

Then, we need to choose the path for our FastCGI domain socket. In my case, I added the following block to /etc/lighttpd/lighttpd.conf:

# Graphite setup; FCGI+Django+Twisted+Carbon+GraphiteWeb
fastcgi.server = (
    "/graphite" => (
        "main" => (
            "socket" => "/run/graphite/graphite.sock",
            "check-local" => "disable",
        )
    ),
)

The socket is the communication point between Lighttpd and Graphite. Since the socket is created by Graphite, we need to ensure that Lighttpd processes can use it. We add the www-data user to the _graphite group:

% grep -e www-data -e _graphite /etc/group
www-data:x:33:
_graphite:x:122:www-data

Now, finally, with only slight preparations we can run the entire stack.

As root/sudo-capable user:

sudo mkdir -m 775 /run/graphite
sudo chown _graphite._graphite /run/graphite

And as _graphite:

/usr/bin/graphite-manage runfcgi socket=/run/graphite/graphite.sock pidfile=/run/graphite/graphite-fcgi.pid daemonize=true umask=002

A particularly fun fact is that socket permissions are only applied when running in daemonize=true mode.
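The umask=002 argument is what makes this work. On Linux a Unix socket file is created with mode 0777 minus the umask, and connecting to it requires write permission on that file. With umask 002 the socket ends up as 0775, so members of its group (here www-data, assuming the socket is created with the _graphite group) can connect; a typical default umask of 022 would drop the group write bit. The Python sketch just does that arithmetic.

```python
import stat


def socket_mode(umask):
    """Permission bits of a Unix socket file created under `umask` on Linux."""
    return 0o777 & ~umask


def group_can_connect(mode):
    """Connecting to a Unix socket requires write permission on the socket file."""
    return bool(mode & stat.S_IWGRP)


if __name__ == "__main__":
    for umask in (0o002, 0o022):
        mode = socket_mode(umask)
        print(f"umask {umask:03o}: mode {mode:03o}, group can connect: {group_can_connect(mode)}")
```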

Taking it out for a spin

From the FastCGI server setup in lighttpd.conf we know that all queries to the path /graphite are forwarded to the Graphite engine. We can test that Graphite is working correctly by manually entering the following URL in a browser:

http://YOURHOST/graphite/render?target=carbon.*.*.*

If everything is working as it should, we should see a single graph showing how the storage backend (carbon) has been using resources.
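The same check can be scripted. The Python sketch below only builds render URLs under the /graphite prefix configured above; YOURHOST is the placeholder from the text, and format=json is a standard render API parameter that returns the datapoints instead of an image.

```python
from urllib.parse import urlencode


def render_url(host, target, **params):
    """Build a Graphite render URL; extra keyword arguments become query parameters."""
    # Keep '*' unescaped so the URL reads like the hand-typed one
    query = urlencode({"target": target, **params}, safe="*")
    return f"http://{host}/graphite/render?{query}"


if __name__ == "__main__":
    print(render_url("YOURHOST", "carbon.*.*.*"))
    print(render_url("YOURHOST", "carbon.*.*.*", format="json"))
```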

Stapling the stack together

Visualised, the stack looks something like this:

Graphite stack with lighty+fastcgi

Pokerstats - Hold'em and Omaha Statistics


PokerStats is software for Linux (and probably other *nix variants) which allows you to track your own play, as well as your opponents', in Hold'em and Omaha cash games. It reads the hand histories generated by the PokerStars client and creates statistics about players and their tendencies. A savvy player can use this information as an additional tool when making decisions.

Features

New/Hot Features

  • HUD - FPDB's Heads-Up Display ported to PokerStats
  • Omaha is now a supported game (currently only in git)

  • Basic statistical information for players: VPIP, PFR, attempt to steal, continuation-bet frequency, 3bet/4bet frequency and so on
  • Visualisation of your monetary progress with a graph
  • Statistics about individual hands
  • Player notes
  • Positional statistics
  • Filtering by dates and game level
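For readers new to the abbreviations: VPIP is the share of hands in which a player voluntarily put money into the pot before the flop, and PFR the share in which they raised before the flop. The Python sketch below uses those common definitions with hypothetical counters; it is not taken from PokerStats' own code.

```python
from dataclasses import dataclass


@dataclass
class PreflopCounts:
    """Hypothetical per-player counters collected from hand histories."""
    hands: int      # hands dealt to the player
    voluntary: int  # hands where the player called or raised preflop (blinds alone don't count)
    raised: int     # hands where the player raised preflop


def percentage(part, whole):
    return 100.0 * part / whole if whole else 0.0


def vpip(counts):
    """Voluntarily Put money In Pot, as a percentage of hands dealt."""
    return percentage(counts.voluntary, counts.hands)


def pfr(counts):
    """PreFlop Raise, as a percentage of hands dealt."""
    return percentage(counts.raised, counts.hands)


if __name__ == "__main__":
    player = PreflopCounts(hands=200, voluntary=50, raised=30)
    print(f"VPIP {vpip(player):.1f}%  PFR {pfr(player):.1f}%")
```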

Download

Latest release: pokerstats-0.3.99 (1.1 MB)

Git tree:

git://bostik.iki.fi/src/pokerstats.git

What does it do?

  • Basic functionality
  • Money graph
  • Most often needed stats
  • Hand reader and HUD in their separate processes for reliability
  • Simple set-o-meter

Requirements

  • Python
  • PostgreSQL
  • Python-Psycopg2
  • PyGTK2+
  • Python-matplotlib
  • Python-dbus
  • Python-libxml2
  • Python-wnck

License

GNU General Public License 3

You are free to use the software as you wish. You are free to learn from it. You can improve or modify it as you wish. You can redistribute it to anyone you want to. The only requirement is that if you pass the software to someone else, you need to provide them with your modifications and grant them the same privileges.

Known Problems

Every now and then the pot calculations are off. About once every 8000 hands a small blind is missed. Also, some shove-contests may end up being miscalculated. If you know which hand triggers this behaviour, please report it.

History

I wanted a poker statistics software that would run natively on my system. But more than that, I wanted to understand how they worked on the inside.

When I first started playing, the only client that would work neatly with WINE was PokerStars. So I took to that. Later on, when I decided to take poker seriously, I found out that there was a selection of very sophisticated statistics software written to help the players analyse their own, as well as their opponents' game.

At the time, the state-of-the-art software was Poker Tracker, which I tried to get working with WINE, without success. I only needed a very basic subset of all the features, so rather than installing another operating system just for casual poker playing, I decided to write my own statistics tool. How hard could it be?

And that's how it started.

Wayland for Kids

Educational Computing

What do you get if you combine Wayland, in-car minimonitor and Raspberry Pi? In my case, a perfect setup for our daughter who just loves computers.

Background

Our daughter has been infatuated with computers since before she could crawl. A physical keyboard which produces immediate effects on the screen is a particular favourite. Since it's impossible to draw the little lady away from computers, we might as well let her have one of her own.

The idea for this project was to build a simple computer we could let our daughter play with, and which would automatically boot to a kiosk-like state. There were only a few requirements:

  • Cheap
  • Low power consumption
  • Movable around the house as needed
  • Easy to hack

The computer part was easy: Raspberry Pi fits all the specifications. If it breaks, replacement cost won't even exceed that of a few beers. I still wanted the setup to be simple, so any monitor I chose would have to have HDMI input and relatively small physical size. I found the answer in a rather unexpected place - small in-car monitors. They fit the physical specifications - and they come with HDMI inputs by default. In comparison, the cheapest desktop monitors are heavier, way bigger, and tend to omit HDMI. (At least for the hardware available where I live.)

One thing still missing is the enclosure for RPi. It's a work-in-progress, made of a leftover piece of 3mm thick acrylic plate which cost next to nothing. The piece is also big enough to provide material for a dozen setups.

Construction

The setup looks and behaves like this:

[Image: the finished setup, linked to a video]

As a base, I used a thick(ish) piece of plywood.

The monitor's power adapter is literally strapped down with large zip ties. The keyboard and cables are set in their place with smaller ones and matching stick-on hooks.

For powering the RPi, I simply used a quality 1.2A/5V µUSB charger.

When the system starts, it boots directly into a native Wayland session. Instructions for building Wayland+Weston (reference compositor) can be read here, and followed with just a couple of tiny modifications.

  1. install autoconf, libpam0g-dev (+ vim-nox) in addition to other packages
  2. add --disable-documentation to wayland build configuration
  3. if necessary, rebuild cairo with --enable-glesv2
  4. build weston with --enable-weston-launch
  5. after installing weston, make weston-launch setuid root, so that it looks like this:

    -rwsr-xr-x 1 root root 37289 Jan 15 16:41 bin/weston-launch
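The s in -rwsr-xr-x is the set-user-ID bit, and the owner is root. A quick way to confirm that the install step took effect is sketched below in Python; the path in the usage line is an assumption, derived from WLD="$HOME/local" in the environment script below.

```python
import os
import stat


def has_setuid_root(mode, uid):
    """True for a regular file owned by root (uid 0) with the set-user-ID bit set."""
    return stat.S_ISREG(mode) and uid == 0 and bool(mode & stat.S_ISUID)


def is_setuid_root(path):
    info = os.stat(path)
    return has_setuid_root(info.st_mode, info.st_uid)


if __name__ == "__main__":
    # Assumed location, based on WLD="$HOME/local" in wayland-env-setup.sh
    print(is_setuid_root(os.path.expanduser("~/local/bin/weston-launch")))
```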

Session startup scripts

For the automatic session launch, I use a few simple scripts.

Wayland Environment:

pi@raspberrypi ~ $ cat wayland-env-setup.sh
#!/bin/sh
export WLD="$HOME/local"
export PATH="$WLD/bin:$PATH"
export LD_LIBRARY_PATH="$WLD/lib:/opt/vc/lib"
export PKG_CONFIG_PATH="$WLD/lib/pkgconfig:$WLD/share/pkgconfig"
export ACLOCAL="aclocal -I $WLD/share/aclocal"
export XDG_RUNTIME_DIR="/run/shm/wayland"
export XDG_CONFIG_HOME="$WLD/etc"

if [ ! -d $WLD/share/aclocal ]; then
    mkdir -p $WLD/share/aclocal
fi

if [ ! -d $XDG_RUNTIME_DIR ]; then
    mkdir -p $XDG_RUNTIME_DIR
fi
chmod 700 $XDG_RUNTIME_DIR
chown pi.pi $XDG_RUNTIME_DIR

Weston Compositor Launch

pi@raspberrypi ~ $ cat W
#!/bin/bash
source ${HOME}/wayland-env-setup.sh
exec weston-launch -- --max-planes=0 --idle-time=99999

Terminal Launch

pi@raspberrypi ~ $ cat T
#!/bin/bash
source ${HOME}/wayland-env-setup.sh
exec weston-terminal

Modified /etc/rc.local

pi@raspberrypi ~ $ cat /etc/rc.local
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

# Print the IP address
_IP=$(hostname -I) || true
if [ "$_IP" ]; then
    printf "My IP address is %s\n" "$_IP"
fi

# Launch Weston as 'pi'
if [ -e /home/pi/W ]; then
    su -l -c '/home/pi/W &' pi
fi

# Launch terminal inside weston session
# but wait until the compositor is ready
if [ -e /home/pi/T ]; then
    printf "Waiting for Weston session..\n"
    while true; do
        sleep 1
        if [ -e /run/shm/wayland/wayland-0 ]; then
            break
        fi
    done
    printf "Socket found, compositor ready!\n"
    su -l -c '/home/pi/T &' pi
fi

exit 0
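The waiting loop in rc.local polls once a second and waits indefinitely if Weston never creates its socket. The same idea with an upper bound is sketched in Python below; the 60-second default is an arbitrary assumption, and this is an alternative illustration rather than part of the setup described here.

```python
import os
import time


def wait_for_path(path, timeout=60.0, interval=1.0):
    """Poll until `path` exists; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


if __name__ == "__main__":
    # Same socket path as in rc.local; the 60-second limit is an arbitrary choice
    if wait_for_path("/run/shm/wayland/wayland-0"):
        print("Socket found, compositor ready!")
    else:
        print("Gave up waiting for Weston")
```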

Known problems/oddities

The monitor supposedly has a native resolution of 800x480, but the EDID data does not expose this setting. Trying to force this resolution in /boot/config.txt results in black screen and no video output at all. The automatically chosen nearest-match resolution is larger than what the monitor supports, and so the desktop does not fit on screen.

The monitor has to be on when the RPi is booted. Apparently the RPi only tries an HDMI handshake at boot time. (Fixed by hdmi_force_hotplug=1 in config.txt)

Disclosure

Some of the fascination for Wayland and car monitors comes from my place of work, Nomovok. Our dealings with bleeding-edge stuff and embedded systems have had an effect on me.

Tramp - Tray MPD Client


The primary function of a music player is to play audio files. mpd does just that, and Tramp is a minimalistic client with only the bare necessities needed to listen to music.

Features:

  • Tray icon
  • Playlist creation
  • Playlist selection
  • Seeking with mousewheel

Everything else is controlled with mpc by binding actions to multimedia keys.

Get the Source

There are no prebuilt binaries. Just grab the source:

git clone git://bostik.iki.fi/src/tramp.git

and build. The source tree contains the files required to build a Debian/Ubuntu package. Install the needed build dependencies and make your own package:

sudo aptitude install autotools-dev debhelper libmpd-dev libgtk-3-dev fakeroot

dpkg-buildpackage -rfakeroot

The code, including comments, is just ~1600 lines of C.

Screenshots

Because tramp is a minimalistic client, there won't be much in the way of screenshots. However, these are the things you're likely to see when using tramp:

Systray icon: [image]; main window for playlist selection: [image]

In addition, the current playlist is shown in a tooltip.

Using Tramp

  • Left-clicking on tray icon pops up the window
  • Right-clicking on window hides it
  • Right-clicking on tray icon also hides window, if it's shown
  • Each scrollwheel click on main window seeks 15 seconds

Configuration Examples

When using tramp, controlling mpd is meant to be done with hardware keys. Practically all modern keyboards sport a variety of "multimedia keys" that may be bound to perform actions of all sorts.

My controls for mpd and audio, from my fluxbox keybinding file, look like this:

None XF86Forward :ExecCommand mpc next
None XF86Back :ExecCommand mpc prev
None XF86AudioPlay :ExecCommand mpc play
None XF86AudioMute :ExecCommand mpc stop
Mod1 XF86AudioPlay :ExecCommand mpc toggle
None XF86AudioRaiseVolume :ExecCommand aumix -v +2
None XF86AudioLowerVolume :ExecCommand aumix -v -2

While the unmodified keys require no explanation, I have bound Alt+AudioPlay to "toggle", which really is just mpd's way of saying "Pause/Un-Pause".

Blazing Fast 7-card Poker Hand Evaluator


The so-called "twoplustwo algorithm" is the fastest known poker hand evaluator for any form of 7-card poker. It is based on a precomputed lookup table, through which any hand can be evaluated in 7 steps.
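The 7-step lookup can be sketched in a few lines of Python. This assumes the layout of the widely circulated 2+2 HandRanks.dat table: native-endian 32-bit integers, a walk starting at offset 53, cards numbered 1-52, and the hand category stored above the lowest 12 bits. The table built by handeval may differ, so for real use the library's own functions are the way to go.

```python
from array import array


def load_table(path):
    """Read the lookup table as 32-bit signed integers in native byte order.

    Assumes array("i") is 4 bytes wide, as on common Linux platforms.
    """
    table = array("i")
    with open(path, "rb") as handle:
        table.frombytes(handle.read())
    return table


def eval_7(table, cards):
    """Evaluate seven cards (numbered 1-52) with one table lookup per card."""
    position = 53  # conventional starting offset of the walk
    for card in cards:
        position = table[position + card]
    return position  # after the last card this is the hand's value


def hand_category(value):
    """Category 1 (high card) to 9 (straight flush), stored above the low 12 bits."""
    return value >> 12


if __name__ == "__main__":
    table = load_table("/usr/share/twoplustwo/handranks.dat")
    value = eval_7(table, [4, 7, 11, 12, 21, 27, 40])
    print(value, hand_category(value))
```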

The handeval library computes the lookup table and builds a shared library that makes the table easy to use. It also provides Python bindings.

Source and Installation

There is no general build yet, but the source builds directly into a set of debian/ubuntu packages. First, get the source:

git clone git://bostik.iki.fi/src/handeval.git

Build the packages:

dpkg-buildpackage -rfakeroot

which creates these four:

  • libtwoplustwoeval
  • libtwoplustwoeval-data
  • libtwoplustwoeval-dev
  • python-twoplustwo

Then just install the packages. The package libtwoplustwoeval-data contains the huge (>120MB) lookup table, which is computed during the build.

The following examples assume that the lookup table is installed in the default path as provided by the package.

Using the Evaluator

In C

#include <2p2-eval.h>
...
int hand1[] = { 2, 5, 18, 31, 32, 47, 50 };
int eval1;
long hand_category;
const char *hand_name;

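/* Load the precomputed lookup table once, before evaluating any hands */
eval_init("/usr/share/twoplustwo/handranks.dat");
eval1 = eval_7_card_hand(hand1);
/* Map the evaluation result to its hand category and a readable name */
hand_category = hand_category_id(eval1);
hand_name = hand_category_name(hand_category);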

The returned value is the evaluated key for the best poker hand that can be formed by using the seven cards. Bigger value is better, so all evaluated hands can be compared directly.

In Python

import twoplustwo as t
t.init('/usr/share/twoplustwo/handranks.dat')
t.eval_7hand(4, 7, 11, 12, 21, 27, 40)