Wednesday, November 30, 2016

Stampede is out!

Stampede just went live!

I'll try this for sure, when I have time, when I have time...

Sunday, November 27, 2016

pgchem::tigress 4.0 released

pgchem::tigress 4.0 is finally out!
  • This builds against PostgreSQL >= 9.4.x and OpenBabel 2.4.x on Linux.
  • It contains all fixes and contributions of the previous versions.
  • Support for building with Indigo has been removed.
  • A lot of dead code has been removed.
  • All palloc calls have been replaced by palloc0.
  • MOLECULE is not binary compatible and needs to be recreated when you upgrade from 3.2 to 4.0!
  • MS spectrum creation has been removed. You may use enviPat instead for much more accurate spectra.
So: CREATE EXTENSION pgchem_tigress;

Friday, November 18, 2016

tcn again

In the previous post, I suggested that contrib/tcn could, for example, be used to selectively update caches.

OK, here is my try:
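
The snippet embedded in the original post is not reproduced here; what follows is a minimal sketch of the idea in Python. The payload format is the one tcn emits (see the Module Monday: tcn post below), a plain dict stands in for the real cache, and the helper names parse_tcn_payload and refresh_cache are my own inventions for this example. A LISTEN loop, e.g. via psycopg2, would feed payloads into refresh_cache.

```python
def parse_tcn_payload(payload):
    """Split a tcn payload like "cats",I,"face_name"='Tailchaser'
    into (table, operation, {pk_column: value}).
    Naive split: assumes no commas inside key values."""
    table_part, op, *key_parts = payload.split(",")
    keys = {}
    for part in key_parts:
        column, _, value = part.partition("=")
        keys[column.strip('"')] = value.strip("'")
    return table_part.strip('"'), op, keys

def refresh_cache(cache, payload):
    """Invalidate only the rows the notification names,
    instead of flushing the whole cache."""
    table, op, keys = parse_tcn_payload(payload)
    if table != "cats":
        return
    for value in keys.values():
        cache.pop(value, None)  # stale entry is reloaded on next read
```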


By touching the cache only when the database signals a change to the base table, and by using the passed information to touch only the affected rows, this should be much more efficient than strategies like routinely invalidating the whole cache every n minutes or so.

Monday, November 14, 2016

Module Monday: tcn

The next installment of the series covers contrib/tcn.

tl;dr

tcn provides a native trigger function that sends notifications about INSERTs, UPDATEs and DELETEs on any table it is attached to, via PostgreSQL's NOTIFY/LISTEN mechanism, to subscribers of an arbitrarily named channel. This is useful e.g. for refreshing caches automagically.

First, tcn must be installed in the database:

CREATE EXTENSION tcn;

Now, a test table is created:

CREATE TABLE public.cats
(
  id serial NOT NULL,
  face_name text NOT NULL,
  CONSTRAINT cats_pkey PRIMARY KEY (face_name)
)
WITH (
  OIDS=FALSE
);

And the trigger:

CREATE TRIGGER cats_tcn_trigger
   AFTER INSERT OR UPDATE OR DELETE ON cats
   FOR EACH ROW EXECUTE PROCEDURE triggered_change_notification('cats');

Here, the channel is named 'cats'. If the parameter is omitted, the channel is 'tcn' by default.

It's all set, now we can subscribe to channel 'cats':

LISTEN cats;

Now, when we INSERT some data, the following happens:

INSERT INTO cats (face_name) VALUES ('Tailchaser');

Query returned successfully: one row affected, 13 msec execution time.

Asynchronous notification of 'cats' received from backend pid 5457

Data: "cats",I,"face_name"='Tailchaser'

Multiple operations generate multiple notifications:

INSERT INTO cats (face_name) VALUES ('Hushpad'),('Pouncequick');

Query returned successfully: 2 rows affected, 11 msec execution time.

Asynchronous notification of 'cats' received from backend pid 5457
Data: "cats",I,"face_name"='Hushpad'
Asynchronous notification of 'cats' received from backend pid 5457
Data: "cats",I,"face_name"='Pouncequick'

But what is this good for?

Well, some PostgreSQL drivers support asynchronous notifications, e.g. libpq for C, psycopg2 for Python, or JDBC for Java. So this can be used to notify all connected clients whenever the content of a table changes.

Then the client can react accordingly, e.g. refresh the cache of such a table, or display the changes in a (near) real-time information system, or whatever action is required in the context of the application if a change has occurred.

The following Python code shows an example matching the 'cats' table from above.

Start with python notify.py:
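
The script itself was embedded in the original post and is not shown here; a minimal psycopg2-based sketch that matches the described behaviour could look like this. The DSN is a placeholder, and psycopg2 (a third-party driver) is imported inside main so the formatting helper works even without the driver installed.

```python
import select

def format_notify(pid, channel, payload):
    """Render a notification the way the output below shows it."""
    return "Got NOTIFY: %s %s %s" % (pid, channel, payload)

def main():
    # psycopg2 is a third-party package; imported here, not at module level.
    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect("dbname=test")  # placeholder DSN
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    cur.execute("LISTEN cats;")
    print("Waiting for notifications on channel 'cats'")
    while True:
        # Block until the connection's socket becomes readable,
        # then drain all pending notifications.
        if select.select([conn], [], [], 5) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            n = conn.notifies.pop(0)
            print(format_notify(n.pid, n.channel, n.payload))
            print("Must refresh cache")

if __name__ == "__main__":
    main()
```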


Waiting for notifications on channel 'cats'

Another INSERT

INSERT INTO cats (face_name) VALUES ('Roofshadow');

yields

Got NOTIFY: 5457 cats "cats",I,"face_name"='Roofshadow'
Must refresh cache

from Python.

As always, imagination is the limit.

Monday, November 7, 2016

Locking PostgreSQL shared memory to physical RAM

As an anonymous reader of my last post remarked, instance-level encryption might leak decrypted data to disk.

It looks like there is a solution to this problem. If you change only one line in src/include/portability/mem.h from:

#define PG_MMAP_FLAGS                   (MAP_SHARED|MAP_ANONYMOUS|MAP_HASSEMAPHORE)

to:

#define PG_MMAP_FLAGS                   (MAP_LOCKED|MAP_SHARED|MAP_ANONYMOUS|MAP_HASSEMAPHORE)

PostgreSQL shared memory should be locked in physical RAM and never be swapped/paged out to disk.

For this to work you obviously need enough physical RAM, and the user PostgreSQL runs as needs permission to lock enough memory. So you'd better check the ulimit:

ulimit -l

unlimited (or at least big enough)

Otherwise you get this rather cryptic error at startup:

"could not map anonymous shared memory: resource temporarily unavailable"

If you are not root, it will report "resource temporarily unavailable" and hide the real cause. And since PostgreSQL refuses to run as root, you'll never see the real error: "cannot allocate memory".
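
For completeness, the effective memlock limit can also be read programmatically; a small sketch using only Python's standard library (Linux; note getrlimit reports bytes, while ulimit -l reports kilobytes):

```python
import resource

# Read the RLIMIT_MEMLOCK soft/hard limits of the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

def fmt(limit):
    """Format a limit the way ulimit does for unlimited values."""
    return "unlimited" if limit == resource.RLIM_INFINITY else "%d bytes" % limit

print("memlock soft limit:", fmt(soft))
print("memlock hard limit:", fmt(hard))
```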

Well, I'm no PostgreSQL hacker. It seems to be working, but is it really that easy to fix?


Sunday, November 6, 2016

The Elephant in the Locker Room: Instance-encrypted PostgreSQL meets Vault

Recently, Cybertec released a patched PostgreSQL 9.6.0 with instance-level encryption. And since I'm frequently asked whether PostgreSQL supports this, I had to try it out immediately.

Well, it seems to work as advertised, but as always when dealing with encryption, the pesky question is: where to put the key?

One method to provide the key is to put a passphrase into an environment variable named PGENCRYPTIONKEY, which is a bit of a misnomer, because the actual key is derived from the passphrase by SHA-256. But the documentation mentions a second way to provide the key.

Once the cluster is initialized with initdb, you can provide a command that retrieves the key at startup, but no example for this is given in the release announcement. So, here is how it works:

As key repository, I chose HashiCorp's Vault, which is "a tool for managing secrets" as they say.

First, fire up Vault. I use development mode here, which is the easiest way, but of course not recommended for production systems.

vault server -dev

Keep the output of this command; you'll need it later!

Second, get the SHA256 hash of the passphrase the database cluster was initialized with. Mine was 'sekrit'. :-)

echo -n sekrit | sha256sum

7e35aaaea6b83ab43c9271737060ce485dc285837d9b4b98b6458408886ac7b1  -

This is the secret to store in Vault. Don't forget the '-n' option for echo!
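
The same derivation can be reproduced with Python's standard library; note the passphrase is hashed without a trailing newline, which is exactly what the -n option to echo ensures:

```python
import hashlib

# SHA-256 of the passphrase, equivalent to: echo -n sekrit | sha256sum
key = hashlib.sha256(b"sekrit").hexdigest()
print(key)
# 7e35aaaea6b83ab43c9271737060ce485dc285837d9b4b98b6458408886ac7b1
```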

vault write secret/pg_key value=encryptionkey=7e35aaaea6b83ab43c9271737060ce485dc285837d9b4b98b6458408886ac7b1

The documentation says that the value returned by the key retrieval command must be 'encryptionkey=...', so this whole string, not only the key, was written to Vault as the secret. 'pg_key' is an arbitrary name under which the secret can be retrieved later.

Now, we need a command to retrieve the secret from the Vault. Using the REST interface this becomes:

curl -s -X GET -H "X-Vault-Token:$VAULT_TOKEN" http://localhost:8200/v1/secret/pg_key | jq '.data.value' | tr -d '"'

This is the raw response:

{
  "request_id": "1d485f2a-893f-67e7-6257-12b3b401f916",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 2592000,
  "data": {
    "value": "encryptionkey=7e35aaaea6b83ab43c9271737060ce485dc285837d9b4b98b6458408886ac7b1"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

curl does the actual REST call, jq parses the value of interest out of the JSON response, and tr removes the superfluous quotes around it. At the end, we get the desired output:

encryptionkey=7e35aaaea6b83ab43c9271737060ce485dc285837d9b4b98b6458408886ac7b1

Looks OK, so this can be put into postgresql.conf.

pgcrypto.keysetup_command='curl -s -X GET -H "X-Vault-Token:$VAULT_TOKEN" http://localhost:8200/v1/secret/pg_key | jq ''.data.value'' | tr -d ''"'''

And with that in place, every time the PostgreSQL server starts, the key is pulled out of vault and all is set.
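
If jq is not at hand, the extraction it performs can also be sketched with Python's standard library; the JSON below is the sample response shown earlier, and the HTTP request itself (a GET with the X-Vault-Token header) is left out:

```python
import json

# Sample response from the Vault REST call shown earlier.
response = """
{
  "request_id": "1d485f2a-893f-67e7-6257-12b3b401f916",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 2592000,
  "data": {
    "value": "encryptionkey=7e35aaaea6b83ab43c9271737060ce485dc285837d9b4b98b6458408886ac7b1"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}
"""

# What `jq '.data.value' | tr -d '"'` does in the pipeline above.
value = json.loads(response)["data"]["value"]
print(value)
```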

But what is the point of using a secret manager like Vault instead of other mechanisms? Well, Vault needs authentication and authorization to provide its services.

Actually, the REST interface requires an authorization token to work: -H "X-Vault-Token:$VAULT_TOKEN". Vault writes a root token to stdout at startup; that's why its output is important. Vault will never reveal that token again after the first startup!

Unseal Key (hex)   : d6426284cb1b9756d8a108f17a2508a261963acf529cfbbd6b12b8cdc90da643
Unseal Key (base64): 1kJihMsbl1bYoQjxeiUIomGWOs9SnPu9axK4zckNpkM=
Root Token: 60a42d59-1e2b-76fb-329f-c32f3b77291e

In production you would of course never use the root token directly, but get additional tokens with limited rights and lifetime! And HTTPS, instead of HTTP, is the protocol of choice for the REST API.

Also note the unseal key. A vault in production mode will emit several unseal keys of which you need a subset to unseal the secret storage.

So in a production setup, Vault allows you to:
  1. Manage all your secrets in a central service
  2. Restrict access to the secrets by scope and time
  3. Protect all secrets in encrypted storage
  4. Seal the vault immediately if you suspect your system is compromised
  5. Unseal the secrets only by consent of several unseal key holders
  6. And much more, see the documentation for details
I'm not affiliated with HashiCorp in any way, and when I took a first look at Vault, I was not impressed. But the more I dive into its documentation and try things out, the more this changes. And it can not only manage secrets for PostgreSQL, but also use PostgreSQL as a storage backend!

Monday, October 31, 2016

Module Monday: spi/timetravel

For pg_sentinel, I used an SPI function for convenience. After that, I decided to refresh my SPI knowledge and discovered the contrib/spi module that comes with PostgreSQL, especially timetravel, which I will introduce today.

tl;dr

Timetravel transparently morphs UPDATEs to INSERTs and DELETEs to UPDATEs together with timestamps to preserve DML changes made to a table. Thus, the state of the data in the table can be reproduced for any given point in time since the table was put under timetravel control.

First, timetravel must be installed in the database:

CREATE EXTENSION timetravel;

Now, a test table is created:

CREATE TABLE baz (
  id INTEGER,
  val INTEGER,
  PRIMARY KEY (id)
);

Let's INSERT and UPDATE a single row.

INSERT INTO baz(id,val) VALUES(1,0);

UPDATE baz SET val=1 WHERE id=1;
UPDATE baz SET val=2 WHERE id=1;
UPDATE baz SET val=3 WHERE id=1;

A SELECT reveals that only the last UPDATE survived, which is the normal behaviour.

SELECT * FROM baz;

Now, in order to enable timetravel capabilities on table baz, it has to be changed a bit.

TRUNCATE TABLE baz;

ALTER TABLE baz ADD COLUMN v_start ABSTIME;
ALTER TABLE baz ADD COLUMN v_end ABSTIME;
ALTER TABLE baz DROP CONSTRAINT baz_pkey;
ALTER TABLE baz ADD PRIMARY KEY (id, v_end);

Now, for every row, two timestamp columns define the timespan for which this row was valid. These columns can be named arbitrarily, but have to be of type ABSTIME. And the primary key must be extended with the 'end' timestamp, otherwise primary key violations would occur.

Now, baz can be put under timetravel control:

CREATE TRIGGER baz_timetravel
BEFORE INSERT OR DELETE OR UPDATE ON baz
FOR EACH ROW
EXECUTE PROCEDURE
timetravel (v_start, v_end);

Again, we INSERT and UPDATE a single row.

INSERT INTO baz(id,val) VALUES(1,0);

UPDATE baz SET val=1 WHERE id=1;
UPDATE baz SET val=2 WHERE id=1;
UPDATE baz SET val=3 WHERE id=1;

Now, a SELECT reveals what timetravel does - for every UPDATE a new row was INSERTed and timestamped.

SELECT * FROM baz;

See, all values were persisted. The row from the last UPDATE has an 'end' timestamp of infinity, indicating that this is the current value of the row.

Likewise, a DELETE becomes an UPDATE, setting the 'end' timestamp to the time when the DELETE happened.

DELETE FROM baz WHERE id=1 AND val=3;

SELECT * FROM baz;

If you try to UPDATE or DELETE a non-active row, timetravel intercepts it with a no-op. Only active rows with an 'end' of infinity are mutable.
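
To make the mechanics concrete, here is a pure-Python model of the trigger's bookkeeping. This is only an illustration of the semantics described above, not the extension's actual C code; the table is a list of dicts and the tt_* function names are made up for this sketch.

```python
from datetime import datetime

INFINITY = datetime.max  # stands in for abstime 'infinity'

def tt_insert(table, row_id, val, now):
    """INSERT stays an INSERT, stamped valid from now until infinity."""
    table.append({"id": row_id, "val": val, "v_start": now, "v_end": INFINITY})

def tt_update(table, row_id, new_val, now):
    """UPDATE closes the active version and inserts a new one."""
    for row in table:
        if row["id"] == row_id and row["v_end"] == INFINITY:
            row["v_end"] = now                    # close the old version
            tt_insert(table, row_id, new_val, now)  # insert the new one
            return
    # no active row found: no-op, as described above

def tt_delete(table, row_id, now):
    """DELETE becomes an UPDATE of the 'end' timestamp."""
    for row in table:
        if row["id"] == row_id and row["v_end"] == INFINITY:
            row["v_end"] = now
            return
```

Querying the state at some time t then amounts to selecting the rows with v_start <= t AND v_end > t.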

Timetravel can be switched on and off per table with SELECT set_timetravel('table_name', 1) and SELECT set_timetravel('table_name', 0) respectively, while SELECT get_timetravel('table_name') shows the current status.

And that's basically it. However, timetravel has a few shortcomings:
  • Tables and primary keys have to be changed
  • Does not protect against TRUNCATE TABLE
  • Needs additional disk space
  • The timestamps have to be ABSTIME, which is deprecated according to the PostgreSQL documentation
  • You can only travel backwards, not into the future ;-)
To be continued every Monday whenever my spare time permits...