tag:blogger.com,1999:blog-83333629532161763672024-01-11T04:03:16.520+01:00The plate is badchemoinformatics and whatnotergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.comBlogger142125tag:blogger.com,1999:blog-8333362953216176367.post-72050759123916805132023-11-16T19:13:00.008+01:002023-11-16T19:15:16.734+01:00TRAKTOR, protection for tables in a replicaset<p>Altering or dropping tables which are in logical replication is not a good idea if the changes are not uniform across a <a href="https://github.com/ergo70/TRAKTOR/tree/main" target="_blank">TRAKTOR </a>cluster. So, TABLEs in a replicaset are now protected against DROP TABLE and ALTER TABLE by default using PostgreSQL <a href="https://www.postgresql.org/docs/16/event-triggers.html" target="_blank">event triggers</a>.</p>ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-79524354732358908892023-11-10T21:04:00.002+01:002023-11-10T21:04:59.299+01:00TRAKTOR, a native trigger function for the fearless<p>TRAKTOR uses a trigger to break replication cycles on PostgreSQL servers < 16.x. By default, the trigger function <i>tf_break_cycle()</i> is implemented in pl/pgSQL. But now, if you are fearless, there is a native replacement written in C <a href="https://github.com/ergo70/TRAKTOR/tree/main/native_trigger" target="_blank">available</a> for pedal-to-the-metal performance.</p>ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-67284843937358576112023-10-30T23:40:00.007+01:002023-11-02T14:22:03.810+01:00TRAKTOR: Towards the cloud<p><a href="https://github.com/ergo70/TRAKTOR" target="_blank">TRAKTOR</a> now has the capability to analyze the PostgreSQL server logs through <a href="https://www.postgresql.org/docs/current/file-fdw.html" target="_blank">file_fdw </a>or <a href="https://github.com/aws/postgresql-logfdw" target="_blank">log_fdw</a>, without requiring direct access to the server filesystem any more.</p><p>This enables it to work with managed PostgreSQL like AWS <a href="https://aws.amazon.com/rds/" target="_blank">RDS </a>and <a href="https://aws.amazon.com/rds/aurora/" target="_blank">Aurora</a>.</p><p>File_fdw and log_fdw seem to be lacking from Azure Database for PostgreSQL.</p>ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-19186867742056241452023-10-10T09:59:00.003+02:002023-11-11T13:36:39.205+01:00TRAKTOR, true multimaster replication for PostgreSQL<p>Hi all,</p><p>after the retirement of pgchem::tigress, I stumbled over this <a href="https://programmer.help/blogs/implementation-of-multi-master-using-postgresql-logical-subscription.html" target="_blank">Blog post</a>, that showed a method to break the vicious replication cycle when using PostgreSQL logical replication in a multimaster cluster, i.e. all nodes are readers and writers.</p><p>For PostgreSQL 16.x, this is not necessary anymore, since it introduced <a href="https://www.postgresql.fastware.com/blog/bi-directional-replication-using-origin-filtering-in-postgresql" target="_blank">origin=NONE</a>, but for pre-16.x,it works quite well.</p><p>Yet, setting up a multimaster cluster is tedious and error prone, so I decided to start a new endeavour and make it easier (that was well before the introduction of <a href="https://www.pgedge.com/" target="_blank">pgEdge</a> or <a href="https://aws.amazon.com/de/blogs/database/using-pgactive-active-active-replication-extension-for-postgresql-on-amazon-rds-for-postgresql/" target="_blank">pgactive</a>).</p><p>So without much furter ado, <a href="https://github.com/ergo70/TRAKTOR" target="_blank">TRAKTOR </a>was released today. With the help of TRAKTOR, you can set-up true multimaster replication clusters on top of vanilla PostgreSQL servers. Notable features are:</p><p></p><ol style="text-align: left;"><li>Does not need anything beyond a standard PostgreSQL server. No fork, no extension.</li><li>Shared nothing architecture. No single point of failure,</li><li>Can automatically heal conflicts in case of split brain situations.</li><li>REST API for control and monitoring.</li><li>Works with extensions and custom datataypes, <a href="https://postgis.net/" target="_blank">PostGIS </a>and <a href="https://www.rdkit.org/docs/Cartridge.html" target="_blank">rdkit </a>have been tested.</li></ol><div>It does NOT help with designing a robust logical multimaster model! No fancy <a href="https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type" target="_blank">CRDT</a> datatypes, no vector clocks. And it lacks a comprehensive user's guide yet, but there is a <a href="https://github.com/ergo70/TRAKTOR/blob/main/doc/TRAKTOR_tutorial.md" target="_blank">tutorial </a>to get you started.</div><p></p>ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-23991421057542447922023-09-14T17:36:00.000+02:002023-09-14T17:36:06.080+02:00Static code analysis in PostgreSQL like ORACLE has.<p>Out of the box, PostgreSQL lacks static code analysis at compile time like ORACLE can do.</p><p><a href="https://github.com/okbob/plpgsql_check" target="_blank">plpgsql_check </a>provides this capability (and a profiler), but needs to be called manually. So I tried to emulate ORACLE's behavior with an <a href="https://www.postgresql.org/docs/current/event-triggers.html" target="_blank">event trigger</a>.</p><p>The <a href="https://github.com/ergo70/plpgsql_compile_check" target="_blank">plpgsql_compile_check</a> extension is experimental. You MUST preload plpgsql and plpgsql_check with <i>shared_preload_libraries='plpgsql,plpgsql_check'</i> in postgresql.conf, otherwise strange things happen!</p>ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-88451830929799112642023-05-15T13:07:00.005+02:002023-05-15T15:34:05.856+02:00How to build RDKit 2023 on Windows w/o Conda<p>It looks like the build instructions for RDKit on Windows have not been updated for some time, at least they did not work for me. So I dug out the correct build process out of the <a href="https://docs.conda.io/en/latest/" target="_blank">Conda </a>builds:</p><p><b>General instructions:</b></p><p><a href="https://github.com/rdkit/rdkit/blob/master/Docs/Book/Install.md">https://github.com/rdkit/rdkit/blob/master/Docs/Book/Install.md</a></p><p>Required to run the binaries, IF you don't have the Visual Studio C++ compiler and SDK installed:</p><p>Visual Studio 2015, 2017, 2019 and 2022 redistributable</p><p><a href="https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170">https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170</a></p><p><b>Development environment:</b></p><p>Windows 11 x64</p><p>Microsoft Visual Studio 2022 (Community Edition or Professional)</p><p>Windows SDK version 10.0.22000.0</p><p>Compiler MSVC 19.35.32217.1</p><p>PowerShell</p><p><b>Directory structure:</b></p><p>c:\Devel\RDBuild</p><p style="text-align: left;"><span style="white-space: pre;"> </span>boost<br /><span style="white-space: pre;"> </span>cairo<br /><span style="white-space: pre;"> </span>eigen3<br /><span style="white-space: pre;"> </span>freetype<br /><span style="white-space: pre;"> </span>RDKit<br /><span style="white-space: pre;"> </span>zlib</p><p>RDBASE = C:\Devel\RDBuild\RDKit</p><p><b>RDKit source 2023_03_1:</b></p><p><a href="https://codeload.github.com/rdkit/rdkit/zip/refs/tags/Release_2023_03_1">https://codeload.github.com/rdkit/rdkit/zip/refs/tags/Release_2023_03_1</a></p><p><b>ZLib 1.2.13:</b></p><p><a href="https://github.com/kiyolee/zlib-win-build">https://github.com/kiyolee/zlib-win-build</a></p><p><b>Eigen 3.4.0:</b></p><p><a href="https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.zip">https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.zip</a></p><p><b>Cairo 1.17.2:</b></p><p><a href="https://github.com/preshing/cairo-windows/releases">https://github.com/preshing/cairo-windows/releases</a></p><p><b>Boost 1.82.0:</b></p><p><a href="https://sourceforge.net/projects/boost/files/boost-binaries/">https://sourceforge.net/projects/boost/files/boost-binaries/</a></p><p><b>freetype 2.11.1:</b></p><p><a href="https://codeload.github.com/ubawurinna/freetype-windows-binaries">https://codeload.github.com/ubawurinna/freetype-windows-binaries</a></p><p><b>CMake 3.26.3:</b></p><p><a href="https://github.com/Kitware/CMake/releases/download/v3.26.3/cmake-3.26.3-windows-x86_64.msi">https://github.com/Kitware/CMake/releases/download/v3.26.3/cmake-3.26.3-windows-x86_64.msi</a></p><p><b>PostgreSQL 15.2.1:</b></p><p><a href="https://get.enterprisedb.com/postgresql/postgresql-15.2-1-windows-x64-binaries.zip">https://get.enterprisedb.com/postgresql/postgresql-15.2-1-windows-x64-binaries.zip</a></p><p><b>Python 3.10.11:</b></p><p><a href="https://www.python.org/downloads/release/python-31011/">https://www.python.org/downloads/release/python-31011/</a></p><p><b>Build commands:</b></p><p>1.) <i>Prepare for DLLs and Python package:</i></p><p>c:/cmake/bin/cmake -DRDK_BUILD_PYTHON_WRAPPERS=ON -DBOOST_ROOT=C:/Devel/RDBuild/boost -DRDK_BUILD_CAIRO_SUPPORT=ON -DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_BUILD_AVALON_SUPPORT=ON -DRDK_BUILD_PGSQL=OFF -DPostgreSQL_ROOT="C:\PostgreSQL\15" -DRDK_INSTALL_INTREE=OFF -DCMAKE_INSTALL_PREFIX=c:/RDKit -DEIGEN3_INCLUDE_DIR=C:/Devel/RDBuild/eigen3 -DFREETYPE_INCLUDE_DIRS=c:/Devel/RDbuild/freetype/include -DFREETYPE_LIBRARY="c:/Devel/RDBuild/freetype/release dll/win64/freetype.lib" -DZLIB_INCLUDE_DIR=c:/Devel/RDBuild/zlib/include -DZLIB_LIBRARY=c:/Devel/RDBuild/zlib/libz.lib -DCAIRO_INCLUDE_DIRS=c:/Devel/RDBuild/cairo/include -DCAIRO_LIBRARIES=c:/Devel/RDBuild/cairo/lib/x64/cairo.lib -DRDK_BUILD_FREETYPE_SUPPORT=ON -DRDK_BUILD_COMPRESSED_SUPPLIERS=ON -DRDK_OPTIMIZE_POPCNT=ON -DRDK_INSTALL_STATIC_LIBS=OFF -DRDK_INSTALL_DLLS_MSVC=ON -DCMAKE_BUILD_TYPE=Release -DRDK_BUILD_THREADSAFE_SSS=ON -G"Visual Studio 17 2022" -A x64 ..</p><p>2.) <i>Build & install:</i></p><p>C:\CMake\bin\cmake --build . --config=Release --target install</p><p>3.) <i>Prepare for static libraries and PostgreSQL extension:</i></p><p>c:/cmake/bin/cmake -DRDK_BUILD_PYTHON_WRAPPERS=OFF -DBOOST_ROOT=C:/Devel/RDBuild/boost -DRDK_BUILD_CAIRO_SUPPORT=ON -DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_BUILD_AVALON_SUPPORT=ON -DRDK_BUILD_PGSQL=ON -DPostgreSQL_ROOT="C:\PostgreSQL\15" -DRDK_INSTALL_INTREE=OFF -DCMAKE_INSTALL_PREFIX=c:/RDKit -DEIGEN3_INCLUDE_DIR=C:/Devel/RDBuild/eigen3 -DFREETYPE_INCLUDE_DIRS=c:/Devel/RDbuild/freetype/include -DFREETYPE_LIBRARY="c:/Devel/RDBuild/freetype/release dll/win64/freetype.lib" -DZLIB_INCLUDE_DIR=c:/Devel/RDBuild/zlib/include -DZLIB_LIBRARY=c:/Devel/RDBuild/zlib/libz.lib -DCAIRO_INCLUDE_DIRS=c:/Devel/RDBuild/cairo/include -DCAIRO_LIBRARIES=c:/Devel/RDBuild/cairo/lib/x64/cairo.lib -DRDK_BUILD_FREETYPE_SUPPORT=ON -DRDK_BUILD_COMPRESSED_SUPPLIERS=ON -DRDK_OPTIMIZE_POPCNT=ON -DRDK_INSTALL_STATIC_LIBS=ON -DRDK_INSTALL_DLLS_MSVC=OFF -DCMAKE_BUILD_TYPE=Release -DRDK_BUILD_THREADSAFE_SSS=ON -G"Visual Studio 17 2022" -A x64 ..</p><p>4.) <i>Build & install:</i></p><p>C:\CMake\bin\cmake --build . --config=Release --target install</p><p><b>Usage:</b></p><p>If you want to use the PostgreSQL extension, it needs rdkit.dll, which is automatically copied to the PostgreSQL /lib directory BUT also freetype.dll, and boost_serialization-vc142-mt-x64-1_82.dll. The easiest way is to copy them to the PostgreSQL /lib directory AND set the PATH, e. g. $env:PATH+=';C:\PostgreSQL\15\lib'. You also need to define RDBASE, e. g. $env:RDBASE='c:\RDKit'. Start PostgreSQL and 'CREATE EXTENSION IF NOT EXISTS rdkit;' should work.</p><p>If you want to use the Python package, you must point PYTHONPATH to the RDKit installation directory AND for Python < 3.8 add <where the dlls live> to the PATH OR for Python >= 3.8 add the DLL path explicitly with:</p><p><span style="font-family: courier;">os.add_dll_directory(<where the DLLs live>)</span></p><p>In addition to the RDKit DLLs, freetype.dll, cairo.dll, libz.dll, boost_serialization-vc142-mt-x64-1_82.dll, boost_bzip2-vc143-mt-x64-1_82.dll, boost_iostreams-vc143-mt-x64-1_82.dll, boost_python310-vc143-mt-x64-1_82.dll, and boost_zlib-vc143-mt-x64-1_82.dll must be present.</p><p><b>Result:</b></p><p></p><ul style="text-align: left;"><li>C++ static libraries</li><li>C++ shared libraries</li><li>PostgreSQL extension</li><li>Python package</li></ul><p></p>ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-42059729445544448172023-02-05T01:21:00.028+01:002023-03-14T13:18:18.714+01:00The BfArM database of essential drugs in short supply is lacking an API. So I built one.<p>The <a href="https://www.bfarm.de/DE/Home/_node.html" target="_blank">BfArM</a> database of essential drugs in short supply is lacking an API. So I implemented one in about 12 hours. The <a href="https://github.com/ergo70/LE_Meldungen_API" target="_blank">source code</a> can be found on GitHub.</p><p><b>tl;dr </b></p><p>The state of digitalization in Germany is in dire straits, especially so when looking at the healtcare system and govenment/administration. If APIs exist, they are often difficult to discover and/or undocumented (The private <a href="http://bund.dev">bund.dev</a> project tries to collect and document them in a central repository).</p><p>Yet I was surprised when I, inspired by an an <a href="https://www.spiegel.de/netzwelt/netzpolitik/medikamentenknappheit-in-deutschland-haette-haette-schmerztablette-kolumne-von-sascha-lobo-a-5e1aa287-2255-4adb-952e-36c503ae757e" target="_blank">article </a>about the shortage of essential drugs in Germany, took a closer look at the official governmental database on this matter.</p><p>This database is hosted by the "Bundesinstitut für Arzneimittel und Medizinprodukte (BfArM)", and can be found <a href="https://anwendungen.pharmnet-bund.de/lieferengpassmeldungen/faces/public/meldungen.xhtml" target="_blank">here</a>.</p><p>It can be accessed in two ways:</p><p></p><ol style="text-align: left;"><li>As filterable dynamic HTML/JavaScript table rendered using JSF.</li><li>As CSV download.</li></ol><p></p><p>No API for M2M communication or the like. At least I could not find one. There is also no documentation about the data itself. </p><p>Regarding data quality, I noticed the following issues:</p><p></p><ol style="text-align: left;"><li>The CSV file is encoded in ISO-8859-1 (Latin1) and not UTF-8. While this is not uncommon it is a bit unexpected, since ISO-8859-1 only covers the first 256 Unicode characters. The file encoding is not documented.</li><li>The CSV is actually not comma, but semicolon separated.</li><li>Data not available is not only NULL, it is also encoded as "N/A", "n.a.", "-", and "'-", respectively. There might me more undocumented encodings.</li><li>"*" encodes "Altdatenübernahme war nicht möglich", meaning that older data could not be transferred. This is documented in the legend of the table, but not what it actually means.</li><li>The update frequency of the data on display is not documented.</li></ol><p></p><p>Having worked with databases for about 30 years now, this looks like this data comes directly from some kind of manually curated data set to me. There is obviously no decent data standardization process in place.</p><p>But whining alone doesn't help, so I decided to implement the missing API on a tiny server hosted in Germany. It took me about 12 hours of my private time, including implementing basic data sanitization.</p><p>The most difficult part was to automate the CSV download, since the submit button calls some JavaScript function, and thus can't be called using a HTTP request or a scraping library like <a href="https://pypi.org/project/beautifulsoup4/" target="_blank">beautifulsoup</a>. I'm now using a remote controlled headless Browser via <a href="https://pypi.org/project/selenium/" target="_blank">selenium</a>. <strike>That the HTML <i>name</i> attribute frequently changes does not especially help, either</strike>. This has been solved with an XPATH expression on the <i>value</i>.</p><p>The demo API is available under <strike>https://3.73.42.17:8443/docs</strike>. You will be warned because of the self-signed SSL certificate. This is ok, AWS did not want to register a domain, so no <a href="https://letsencrypt.org/" target="_blank">Let's Encrypt</a>. But this just swaps SSL with SSH semantics anyway. Since it is for demonstration purposes only, and runs on a small server, there is a rate limiter in place. Resource names and data are in German, like in the source system.</p><p>The <a href="https://github.com/ergo70/LE_Meldungen_API" target="_blank">source code</a> can be found on GitHub. Maybe somebody at BfArM realizes that it does not cost a fortune to implement an API on top of what they already have, and finds my example useful to build on.</p><p>Public data (I assume it is public, there is no need to register) is updated daily at 11:00 UTC. In complicance with Article 4 of the <a href="https://gdpr.eu/" target="_blank">GDPR</a>, personal information (Client IP address, Telephone number and E-Mail address) is not stored or displayed. As mentioned before, the API is hosted in Germany.</p><p>I<strike>'m not intending to run this forever, and outages are possible at any time.</strike> I have shut down the service by now.</p>ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-54717569855010373632022-12-27T20:54:00.003+01:002023-02-05T01:23:23.456+01:00pg_sentinel - Update<p>In 2016 I released <a href="https://github.com/ergo70/pg_sentinel" target="_blank">pg_sentinel</a> as a proof-of-concept for implementing a sentinel value sensor deep in the PostgreSQL sever. This does not compile since PostgreSQL >= 12.x because of changes in the internal API. So I adapted my old code to make it work again.</p><p>As a bonus, it does not need <a href="https://www.postgresql.org/docs/current/spi.html" target="_blank">SPI</a> any more.</p>ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-5653308040086625212022-07-23T20:30:00.003+02:002023-02-05T01:22:58.161+01:00After 18 years, pgchem::tigress retires<p>To whom it may concern.</p><p>Today I will retire pgchem::tigress, the PostgreSQL chemoinformatics extension based on OpenBabel, after 18 years of service. This decision is based on three main reasons:</p><p style="text-align: left;"></p><ol style="text-align: left;"><li>I have not touched the GIST-Index Code for at least eight years, but beginning with PostgreSQL 14.x it started to cause SIGSEVs when building the index on molecules, and I'm unable to find the cause.</li><li>OpenBabel 3.x made changes in their API that would require me to rewrite functions or to disable them. And those changes are not very well documented.</li><li>Since my recent brush with death, I have decided that there are better ways to spend my time, than chasing Signal 11s. Especially since the RDKit cartridge has come a long way, and is more powerful than pgchem::tigress ever was.</li></ol><p></p><p style="text-align: left;">This decision was not easy, since building pgchem::tigress was a part of my life. It was the first open source ever released by Bayer AG (at least in Germany). It also is the foundation of my PhD thesis. </p><p style="text-align: left;">The code will remain public as long as there is a way to publish it.</p>ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-27394229940941903672020-05-22T10:48:00.001+02:002020-05-22T10:59:21.544+02:00Native (PostgreSQL only) streaming data tablesIf you want to see (and analyze) only a window of data over some continuous data stream in <a href="https://www.postgresql.org/" target="_blank">PostgreSQL</a>, one way is to use a specialized tool like the <a href="https://www.pipelinedb.com/" target="_blank">PipelineDB </a>extension. But if you can't do that, e.g. because you are stuck with AWS <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html" target="_blank">RDS</a> or for some other reason, streaming data tables, or continuous views, can be implemented with pretty much PostgreSQL alone.<br />
<br />
The basic idea is to have a table that allows for fast INSERT operations, is aggressively <a href="https://www.postgresql.org/docs/12/sql-vacuum.html" target="_blank">VACUUM</a>ed, and has some key that can be used to prune outdated entries. This table is fed with the events from the data stream and regularly pruned. Voilà: a streaming data table.<br />
<br />
We have done some testing with two approaches on an <a href="https://blog.nukomeet.com/faster-inserts-how-to-increase-postgresql-write-performance-24d76bd56f75" target="_blank">UNLOGGED </a>table, prune on every INSERT, and pruning at reqular intervals. UNLOGGED is not a problem here, since a view on a data stream can be considered pretty much as ephemeral.<br />
<br />
The timed variant is about 5x - 8x faster on INSERTs. And if you balance the timing and the pruning interval right, the window size is almost as stable.<br />
<br />
The <a href="https://github.com/ergo70/PostgreSQLNativeStreamTables" target="_blank">examples</a> are implemented in <a href="https://www.python.org/" target="_blank">Python3 </a>with <a href="https://www.psycopg.org/" target="_blank">psycopg2</a>. Putting an index on the table can help or hurt performance, INSERT might get slower but pruning with DELETE faster, depending on the size and structure of the data. Feel free to experiment. In our case, a vanilla <a href="https://www.postgresql.org/docs/12/brin-intro.html" target="_blank">BRIN </a>index did just fine.<br />
<br />
Instead of using an external scheduler for pruning, like the Python daemon thread in the <a href="https://github.com/ergo70/PostgreSQLNativeStreamTables/blob/master/stream_timed_cleanup.py" target="_blank"><i>stream_timed_cleanup.py</i></a> example, other scheduling mechanisms can be of course used, e.g. <a href="https://github.com/citusdata/pg_cron" target="_blank">pg_cron</a>, or a <a href="https://docs.aws.amazon.com/lambda/latest/dg/services-cloudwatchevents.html" target="_blank">scheduled Lambda</a> on AWS, or similar.<br />
<br />
Feel free to experiment and improve...ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-5728748991475728542020-05-19T12:13:00.003+02:002020-05-19T12:13:55.180+02:00MQTT as transport for PostgreSQL events<a href="http://mqtt.org/" target="_blank">MQTT </a>has become a de-facto standard for the transport of messages between <a href="https://en.wikipedia.org/wiki/Internet_of_things" target="_blank">IoT</a> devices. As a result, a plethora of libraries and MQTT message brokers have become available. Can we use this to transport messages originating from <a href="https://www.postgresql.org/" target="_blank">PostgreSQL</a>?<br />
<br />
Aa message broker we use <a href="https://mosquitto.org/" target="_blank">Eclipse Mosquitto</a> which is dead simple to set up if you don't have to change the default settings. Such a default installation is neither secure nor highly available, but for our demo it will do just fine. The event generators are written in <a href="https://www.python.org/">Python3</a> with <a href="https://www.eclipse.org/paho/">Eclipse paho</a><a href="https://www.eclipse.org/paho/" target="_blank"> </a><a href="https://pypi.org/project/paho-mqtt/" target="_blank">mqtt for Python</a>.<br />
<br />
There are at least two ways to generate events from a PostgreSQL database, <a href="https://www.postgresql.org/docs/12/app-pgrecvlogical.html" target="_blank">pg_recvlogical </a>and <a href="https://www.postgresql.org/docs/12/libpq-notify.html" target="_blank">NOTIFY / LISTEN</a>. Both have their advantages and shortcomings.<br />
<br />
pg_recvlogical:<br />
<br />
<ul>
<li>Configured on server and database level</li>
<li>Generates comprehensive information about everything that happens in the database</li>
<li>No additional programming neccessary</li>
<li>Needs plugins to decode messages, e.g. into JSON</li>
<li>Filtering has to be done later, e.g. by the decoder plugin</li>
</ul>
<div>
NOTIFY / LISTEN:</div>
<div>
<ul>
<li>Configured on DDL and table level</li>
<li>Generates exactly the information and format you program into the triggers</li>
<li>Filtering can be done before sending the message</li>
<li>Needs trigger programming</li>
<li>The message size is limited to 8000 bytes</li>
</ul>
<div>
Examples for both approaches can be found <a href="https://github.com/ergo70/PostgreSQL2MQTT" target="_blank">here</a>. The NOTIFY / LISTEN example lacks a proper decoder but this makes be a good excercise to start with. The pg_recvlogical example needs the wal2json plugin, which can be found <a href="https://github.com/eulerto/wal2json" target="_blank">here </a>and the proper setup, which is also explained in the Readme. Please note, that the slot used in the example is <i>mqtt_slot</i>, not <i>test_slot</i>:</div>
</div>
<div>
<br /></div>
<br />
<pre style="background-color: #f6f8fa; border-radius: 3px; box-sizing: border-box; color: #24292e; font-family: SFMono-Regular, Consolas, "Liberation Mono", Menlo, monospace; font-size: 13.6px; line-height: 1.45; margin-bottom: 16px; overflow-wrap: normal; overflow: auto; padding: 16px;"><code style="background: initial; border-radius: 3px; border: 0px; box-sizing: border-box; display: inline; font-family: SFMono-Regular, Consolas, "Liberation Mono", Menlo, monospace; font-size: 13.6px; line-height: inherit; margin: 0px; overflow-wrap: normal; overflow: visible; padding: 0px; word-break: normal;">pg_recvlogical -d postgres --slot mqtt_slot --create-slot -P wal2json</code></pre>
<div>
</div>
<br />
<div>
Otherwise, <a href="https://github.com/ergo70/PostgreSQL2MQTT/blob/master/setup.sql">setup.sql</a> should generate all objects to run both examples.</div>
<div>
<br /></div>
ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-42429156212204232362020-04-25T14:08:00.000+02:002020-04-25T14:09:25.750+02:00It looks like pgchem::tigress just got a major upgradeWith the Release of<a href="https://www.postgresql.org/about/news/1976/" target="_blank"> PostgreSQL 12.x</a> and <a href="https://github.com/openbabel/openbabel/releases/tag/openbabel-3-0-0" target="_blank">OpenBabel 3.x</a>, I decided to see if <a href="https://github.com/ergo70/pgchem_tigress/" target="_blank">pgchem::tigress</a> would still compile. Well, it took some minor changes, but YES, it does!<br />
<br />
And - it seems like OpenBabel now handles E/Z and enantiomer stereochemistry correctly, at least in SMILES notation. This is a major step forward, but I have to do some more checks before the next release...ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-43737355404976678302020-03-15T00:10:00.001+01:002020-03-15T11:36:17.812+01:00Authenticate PostgreSQL users against the Amazon AWS Cognito serviceI was asked recently if PostgreSQL could authenticate login users against AWS <a href="https://aws.amazon.com/cognito/?nc1=h_ls" target="_blank">Cognito</a>. Since PostgreSQL allows <a href="https://www.postgresql.org/docs/12/auth-pam.html" target="_blank">PAM authentication</a>, I was pretty sure it could.<br />
<br />
But an (admittedly not exhaustive) search on the web did not produce any PAMs for Cognito.<br />
<br />
So I wrote one, using <a href="http://pam-python.sourceforge.net/" target="_blank">pam-python</a>, <a href="https://github.com/boto/boto3" target="_blank">boto3</a>, <a href="https://github.com/capless/warrant" target="_blank">warrant</a> and <a href="https://pyjwt.readthedocs.io/en/latest/" target="_blank">pyJWT</a>:<br />
<br />
<div style="text-align: center;">
<a href="https://gist.github.com/ergo70/ca776d1f49c464c07930d94c6e8b01aa" target="_blank">cognito_PAM.py</a></div>
<br />
It is designed primarily for PostgreSQL and <a href="https://www.pgbouncer.org/" target="_blank">pgbouncer</a>, so it only supports <a href="http://man7.org/linux/man-pages/man3/pam_sm_authenticate.3.html" target="_blank">pam_sm_authenticate</a> and <a href="http://man7.org/linux/man-pages/man3/pam_sm_acct_mgmt.3.html" target="_blank">pam_sm_acct_mgmt</a>, and all the work is done in pam_sm_authenticate. Because calling Cognito is comparatively slow, I didn't want to call it twice.<br />
<br />
The necessary <a href="https://linux.die.net/man/5/pam.d">pam.d</a> config is:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">#%PAM-1.0</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"># Information for PostgreSQL process with the 'pam' option.</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">auth required pam_python.so cognito_PAM.py <i>aws_region user_pool_id client_id </i></span><span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><client_id></client_id></span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">account required pam_python.so cognito_PAM.py</span><br />
<br />
<b>If you use PAM authentication, passwords are sent in cleartext, so transport layer encryption, e.g. SSL/TLS, between client and server becomes mandatory!</b><br />
<br />
I think it does the correct dance of authentication with Cognito and supports USER_SRP_AUTH, but if you see any problem, please raise a paw.<br />
<div>
<br /></div>
<br />ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-59401022429710525832020-02-14T11:53:00.001+01:002020-02-14T11:53:40.639+01:00Excel and ODF support for cloudfs_fdw<a href="https://github.com/ergo70/cloudfs_fdw" target="_blank">cloudfs_fdw</a> now supports .xls (Excel 97-2003), .xlsx, and .ods (Open Document Format) Spreadsheets via p<a href="https://pandas.pydata.org/" target="_blank">andas</a>, <a href="https://pypi.org/project/xlrd/" target="_blank">xlrd</a>, and <a href="https://pypi.org/project/odfpy/" target="_blank">odfpy</a>. It requires pandas >= 1.0.1, so <a href="https://github.com/Kozea/Multicorn/" target="_blank">Multicorn</a> must be compiled against Python 3.<br />
<br />
Since pandas provides sorting and filtering capabilities, <a href="https://github.com/ergo70/cloudfs_fdw" target="_blank">cloudfs_fdw</a> tries to push down SQL qualifiers and sort keys when they can be translated into pandas notation.<br />
<br />
Take a look and have fun.ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-41876750837879730082019-09-25T20:40:00.001+02:002019-09-25T20:40:35.381+02:00cloudfs_fdwSince I needed a <a href="https://www.percona.com/blog/2018/08/21/foreign-data-wrappers-postgresql-postgres_fdw/" target="_blank">Foreign Data Wrapper</a> for files stored on S3, and the ones I found did things like loading the whole file in memory before sending the first rows, I wrote my own, using <a href="https://multicorn.org/" target="_blank">Multicorn</a>.<br />
<br />
Along the way, I discovered libraries like <a href="https://pypi.org/project/smart-open/" target="_blank">smart-open</a> and <a href="https://pypi.org/project/ijson/" target="_blank">ijson</a> that allow to stream various file formats from various filesystems - and so this escalated a bit, into <a href="https://github.com/ergo70/cloudfs_fdw" target="_blank">cloudfs_fdw</a>.<br />
<br />
It currently supports CSV and JSON files from S3, HTTP/HTTPS sources and local or network filesystems but since smart-open supports more than that (e.g. HDFS, SSH), it certainly can be extended if needed.<br />
<br />
For now, have fun.ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-20115102180626900032019-05-02T18:32:00.002+02:002019-05-03T09:45:25.384+02:00Not all CASTs are created equal?<span style="font-family: "arial" , "helvetica" , sans-serif;">Can somebody explain this?</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">[Solved: See the comments section]</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">PostgreSQL 11.2.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The <a href="https://www.postgresql.org/docs/11/sql-expressions.html#SQL-SYNTAX-TYPE-CASTS" target="_blank">documentation</a> says:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">"A type cast specifies a conversion from one data type to another. PostgreSQL accepts two equivalent syntaxes for type casts:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<i><span style="font-family: "arial" , "helvetica" , sans-serif;">CAST ( expression AS type )</span></i><br />
<i><span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></i>
<i><span style="font-family: "arial" , "helvetica" , sans-serif;">expression::type</span></i><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The CAST syntax conforms to SQL; the syntax with :: is historical PostgreSQL usage."</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">But when I test the <a href="https://www.postgresql.org/docs/11/datatype-numeric.html" target="_blank">lower limits</a> of PostgreSQL's integer types, strange things happen.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">select cast(-9223372036854775808 as bigint);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">select cast(-2147483648 as integer);</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">select cast(-32768 as smallint);</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">All OK.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">select -9223372036854775808::bigint;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">select -2147483648::integer;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">select -32768::smallint;</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">All fail with <i>SQL Error [22003]: ERROR: <type> out of range</type></i></span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">But:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">select -9223372036854775807::bigint;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">select -2147483647::integer;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">select -32767::smallint;</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">All OK.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">???</span>ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com3tag:blogger.com,1999:blog-8333362953216176367.post-91909513614139680212019-04-19T00:22:00.001+02:002019-04-19T00:22:37.396+02:00The Hare and the Hedgehog. Muscle, brain - or both?I<span style="font-family: "arial" , "helvetica" , sans-serif;">n the famous fairy tale the hedgehog wins the race against the hare because he uses his brain to outwit the much faster hare: Brain beats muscle. But is that always the case? And what if we combine the two virtues?</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The case at hand: Screening large sets of molecules for chemical simliarity.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Since <a href="https://en.wikipedia.org/wiki/Graph_isomorphism" target="_blank">(sub)graph isomorphism</a> searching faces some mathematical challenges because of nonpolynomial O - even if you can use a specialized index, like <a href="https://github.com/ergo70/pgchem_tigress" target="_blank">pgchem::tigress</a> does - fast <a href="https://en.wikipedia.org/wiki/Chemical_similarity" target="_blank">similarity</a> searching based on <a href="http://www.daylight.com/dayhtml/doc/theory/theory.finger.html" target="_blank">binary fingerprints</a> has gained popularity in recent years.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">I was tasked with evaluating a solution to the problem of similarity screening large sets of molecules with PostgreSQL where the fingerprints are generated externally, e.g. with the <a href="https://cdk.github.io/" target="_blank">CDK</a>.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">This is, what I came up with...</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">
Preparing the Racetrack</span></h4>
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace;">CREATE TABLE cdk.externalfp (</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>id int4 NOT NULL,</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>smiles varchar NOT NULL,</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>pubchemfp varbit NULL,</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>"cardinality" int4 NULL,</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="white-space: pre;"> </span>CONSTRAINT externalfp_pk PRIMARY KEY (id)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">);</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Above is the table definition of the final table. The cardinality column will be not used now, but since it is calculated by the fingerprint generator anyway, keeping it will save some work later. If you want to copy my example code 1:1, please use a database named <i>chemistry</i> and a schema named <i>cdk</i>.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">First we need to load some data into the table. I used the free <a href="https://cactus.nci.nih.gov/download/nci/NCISMA99.sdz" target="_blank">NCISMA99</a> dataset from the <a href="https://www.cancer.gov/" target="_blank">National Cancer Institute</a>, containing 249081 chemical structures in SMILES notation.</span><br />
<br />
<span style="font-family: Courier New, Courier, monospace;">COPY cdk.externalfp (id, smiles) FROM '/tmp/NCISMA99' </span><br />
<span style="font-family: Courier New, Courier, monospace;">WITH (DELIMITER ' ', HEADER false, FORMAT csv);</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">And a few seconds later you should have 249081 rows in the table. Now we need to generate the fingerprints. The generator code is <a href="https://gist.github.com/ergo70/a7ee432fe67db4a8f4124c6d5f5e589b" target="_blank">here</a>, additionally you need the <a href="https://github.com/cdk/cdk/releases/download/cdk-2.2/cdk-2.2.jar" target="_blank">CDK 2.2</a> and a PostgreSQL <a href="https://jdbc.postgresql.org/" target="_blank">JDBC driver</a>. After changing the code to reflect your JDBC URL you are good to go.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Running the <i>FingerprintGenerator</i> should show no errors and takes about 30 Minutes on my Core i5 Linux Notebook. The fingerprint used is the <a href="https://pubchem.ncbi.nlm.nih.gov/" target="_blank">PubChem</a> fingerprint as described <a href="ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt" target="_blank">here</a>.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Now we can put an index on the cardinality column (also used later) and are all set.</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">CREATE INDEX externalfp_cardinality_idx ON cdk.externalfp USING btree (cardinality);</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Almost...</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The function to calculate the similarity measure is still missing. We use the <a href="http://mines.humanoriented.com/classes/2010/fall/csci568/portfolio_exports/sphilip/tani.html" target="_blank">Tanimoto coefficient</a>, as it is widely used and fairly easy to understand. The Tanimoto coefficient over PostgreSQL <a href="https://www.postgresql.org/docs/11/datatype-bit.html" target="_blank">BIT VARYING</a> can thus be written in pure SQL as:</span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">CREATE OR REPLACE FUNCTION cdk.tanimoto(bit varying, bit varying)</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> RETURNS real</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> LANGUAGE sql</span><br />
<span style="font-family: "courier new" , "courier" , monospace;"> IMMUTABLE STRICT SECURITY INVOKER LEAKPROOF</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">AS $function$</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">select length(replace(($1 & $2)::text, '0', ''))::real / length(replace(($1 | $2)::text, '0', ''))::real;</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">$function$;</span><br />
<div>
<br /></div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Please note that the required bitcount function on BIT VARYING is emulated by removing all 0s and measuring the length of the remaining string, containing all 1s.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The Baseline. No brain, no muscles</span></h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">For the first naive test, we run <a href="https://gist.github.com/ergo70/22472c0225a9721709194f6da8048706" target="_blank">FindBySimilarity</a> with the following inputs:</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">SMILES: <span style="background-color: white; color: #212121;"><span style="font-family: inherit;">CC(=O)OC1=CC=CC=C1C(=O)O</span></span></span><br />
<span style="background-color: white; color: #212121;"><span style="font-family: Arial, Helvetica, sans-serif;">Threshold: 0.9</span></span><br />
<span style="background-color: white; color: #212121;"><span style="font-family: Arial, Helvetica, sans-serif;">top N: 10</span></span><br />
<span style="background-color: white; color: #212121;"><span style="font-family: Arial, Helvetica, sans-serif;"><br /></span></span>
<span style="background-color: white; color: #212121;"><span style="font-family: Arial, Helvetica, sans-serif;">which gives the following plan from EXPLAIN ANALYZE:</span></span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;">Limit (cost=14515.22..14516.38 rows=10 width=77) (actual time=6436.022..6436.621 rows=10 loops=1) </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> -> Gather Merge (cost=14515.22..22587.95 rows=69190 width=77) (actual time=6436.021..6436.618 rows=10 loops=1) </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Workers Planned: 2 </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Workers Launched: 2 </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> -> Sort (cost=13515.19..13601.68 rows=34595 width=77) (actual time=6432.534..6432.535 rows=8 loops=3) </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Sort Key: (((length(replace((('1100000001110000001110000000000000000000000000000000000000000000000000000000000000000000000000</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Sort Method: top-N heapsort Memory: 27kB </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Worker 0: Sort Method: quicksort Memory: 27kB </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Worker 1: Sort Method: top-N heapsort Memory: 27kB </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> -> Parallel Seq Scan on externalfp (cost=0.00..12767.61 rows=34595 width=77) (actual time=41.329..6432.411 rows=30 loops=3)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Filter: (((length(replace((('110000000111000000111000000000000000000000000000000000000000000000000000000000000000000000</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Rows Removed by Filter: 82997 </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;">Planning Time: 0.169 ms </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;">Execution Time: 6436.648 ms </span> <br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Including retrieval, this plan leads to an overall response time of about 14 seconds.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">
First race. Introducing the Hedgehog</span></h4>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">An index could be helpful, but can we build one without major programming effort? Yes, due to the work of S. Joshua Swamidass and Pierre Baldi. In their 2007 <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2527184/" target="_blank">paper</a> <i>"Bounds and Algorithms for Fast Exact Searches of Chemical Fingerprints in Linear and Sub-Linear Time"</i>, they found ways to calculate upper and lower bounds on the cardinality of fingerprints necessary to meet a given similarity once the cardinality of the search fingerprint is known.</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The paper covers these calculations for various similarity measures. For Tanimoto it is:</span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<i><span style="font-family: "arial" , "helvetica" , sans-serif;">min_cardinality_of_target = floor(cardinality_of_search_argument * similarity_threshold)</span></i></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">and </span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<i><span style="font-family: "arial" , "helvetica" , sans-serif;">max_cardinality_of_target = ceil(cardinality_of_search_argument / similarity_threshold)</span></i></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">The function <i>swamidassBaldiLimitsForTanimoto() </i>in <a href="https://gist.github.com/ergo70/22472c0225a9721709194f6da8048706" target="_blank">FindBySimilarity</a> calculates those bounds and if you change:</span></div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">//int[] lohi = swamidassBaldiLimitsForTanimoto(fp.cardinality(), threshold);</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">int[] lohi = {0, Integer.MAX_VALUE};</span></div>
</div>
<div>
<br /></div>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;">to</span></div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">int[] lohi = swamidassBaldiLimitsForTanimoto(fp.cardinality(), threshold);</span></div>
<div>
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">//int[] lohi = {0, Integer.MAX_VALUE};</span></div>
</div>
<br />
<span style="font-family: Arial, Helvetica, sans-serif;">they will be used. Now the index we already put on the "cardinality" column makes sense, allowing the database to filter out impossible candidates before invoking the tanimoto function.</span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Arial, Helvetica, sans-serif;">For CC(=O)OC1=CC=CC=C1C(=O)O, the fingerprint has a cardinality of 115, which gives an upper bound of 128 and a lower bound of 103 bits. All fingerprints with cardinalities outside that bounds can be safely ignored since they cannot yield a Tanimoto coefficient >= 0.9.</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Now the plan becomes:</span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;">Limit (cost=10445.68..10446.85 rows=10 width=77) (actual time=1492.430..1495.552 rows=10 loops=1) </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> -> Gather Merge (cost=10445.68..12328.11 rows=16134 width=77) (actual time=1492.429..1495.548 rows=10 loops=1) </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Workers Planned: 2 </span><span style="font-family: arial, helvetica, sans-serif; font-size: xx-small;"> </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Workers Launched: 2 </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> -> Sort (cost=9445.66..9465.82 rows=8067 width=77) (actual time=1489.566..1489.567 rows=7 loops=3) </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Sort Key: (((length(replace((('1100000001110000001110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Sort Method: top-N heapsort Memory: 27kB </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Worker 0: Sort Method: top-N heapsort Memory: 27kB </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Worker 1: Sort Method: top-N heapsort Memory: 27kB </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> -> Parallel Bitmap Heap Scan on externalfp (cost=826.09..9271.33 rows=8067 width=77) (actual time=22.032..1489.502 rows=30 loops=3) </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Recheck Cond: ((cardinality >= 103) AND (cardinality <= 128)) </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Filter: (((length(replace((('110000000111000000111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Rows Removed by Filter: 19303 </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Heap Blocks: exact=1928 </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> -> Bitmap Index Scan on externalfp_cardinality_idx (cost=0.00..821.25 rows=58083 width=0) (actual time=5.238..5.238 rows=58000 loops=1)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Index Cond: ((cardinality >= 103) AND (cardinality <= 128)) </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;">Planning Time: 0.166 ms </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;">Execution Time: 1495.587 ms </span> <br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Including retrieval, this plan gives us now an overall response time of about 3 seconds, or 4.6 times faster.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Second race. Introducing the Hare</span></h4>
<div>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Time to pull out the big guns. The tanimoto calculation in SQL is apparently slow, primarily because PostgreSQL is lacking a native bit count function for BIT VARYING, so this must be emulated using binary string replace() and length().</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">However, we can build one in C. Since counting bits is a simple operation for a microprocessor (Actually, it is an art in itself. See Andrew Dalke's <a href="http://dalkescientific.com/writings/diary/popcnt.cpp" target="_blank">popcount benchmark</a> for many different ways to count bits.), a C function should perform much better.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: arial, helvetica, sans-serif;">After you installed <a href="https://github.com/ergo70/tanimoto" target="_blank">tanimoto.c</a>, changed the tanimoto function calls in the SQL in FindBySimilarity to tanimoto_c, and disabled the Swamidass/Baldi indexing, we see the raw power of a native function.</span><br />
<span style="font-family: arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: arial, helvetica, sans-serif;">The plan now becomes:</span></div>
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;">Limit (cost=10363.85..10365.02 rows=10 width=77) (actual time=45.829..48.101 rows=10 loops=1) </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> -> Gather Merge (cost=10363.85..18436.58 rows=69190 width=77) (actual time=45.827..48.097 rows=10 loops=1) </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Workers Planned: 2 </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Workers Launched: 2 </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> -> Sort (cost=9363.83..9450.32 rows=34595 width=77) (actual time=43.667..43.668 rows=8 loops=3) </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Sort Key: (tanimoto_c('11000000011100000011100000000000000000000000000000000000000000000000000000000000000000000000000000</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Sort Method: top-N heapsort Memory: 27kB </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Worker 0: Sort Method: top-N heapsort Memory: 27kB </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Worker 1: Sort Method: top-N heapsort Memory: 27kB </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> -> Parallel Seq Scan on externalfp (cost=0.00..8616.24 rows=34595 width=77) (actual time=0.095..43.559 rows=30 loops=3)</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Filter: (tanimoto_c('1100000001110000001110000000000000000000000000000000000000000000000000000000000000000000000000</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;"> Rows Removed by Filter: 82997 </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;">Planning Time: 0.062 ms </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;">Execution Time: 48.145 ms </span><br />
<br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span><span style="font-family: "arial" , "helvetica" , sans-serif;">Overall response is 0.128 seconds, or 110 times faster.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Muscle beats brain. At least here.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Third race. Relay</span></h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">But if we enable the Swamidass/Baldi index again and keep the native Tanimoto function, the real magic happens.<br /><br />The final plan:</span><br />
<br />
<span style="font-family: arial, helvetica, sans-serif; font-size: xx-small;">Limit (cost=9427.54..9427.56 rows=10 width=77) (actual time=35.567..35.570 rows=10 loops=1) </span><br />
<span style="font-family: arial, helvetica, sans-serif; font-size: xx-small;"> -> Sort (cost=9427.54..9475.94 rows=19361 width=77) (actual time=35.566..35.566 rows=10 loops=1) </span><br />
<span style="font-family: arial, helvetica, sans-serif; font-size: xx-small;"> Sort Key: (tanimoto_c('110000000111000000111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000</span><br />
<span style="font-family: arial, helvetica, sans-serif; font-size: xx-small;"> Sort Method: top-N heapsort Memory: 26kB </span><br />
<span style="font-family: arial, helvetica, sans-serif; font-size: xx-small;"> -> Bitmap Heap Scan on externalfp (cost=826.09..9009.15 rows=19361 width=77) (actual time=5.404..35.459 rows=90 loops=1) </span><br />
<span style="font-family: arial, helvetica, sans-serif; font-size: xx-small;"> Recheck Cond: ((cardinality >= 103) AND (cardinality <= 128)) </span><br />
<span style="font-family: arial, helvetica, sans-serif; font-size: xx-small;"> Filter: (tanimoto_c('11000000011100000011100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000</span><br />
<span style="font-family: arial, helvetica, sans-serif; font-size: xx-small;"> Rows Removed by Filter: 57910 </span><br />
<span style="font-family: arial, helvetica, sans-serif; font-size: xx-small;"> Heap Blocks: exact=6801 </span><br />
<span style="font-family: arial, helvetica, sans-serif; font-size: xx-small;"> -> Bitmap Index Scan on externalfp_cardinality_idx (cost=0.00..821.25 rows=58083 width=0) (actual time=4.307..4.307 rows=58000 loops=1)</span><br />
<span style="font-family: arial, helvetica, sans-serif; font-size: xx-small;"> Index Cond: ((cardinality >= 103) AND (cardinality <= 128)) </span><br />
<span style="font-family: arial, helvetica, sans-serif; font-size: xx-small;">Planning Time: 0.149 ms </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif; font-size: xx-small;">Execution Time: 35.641 ms </span> <br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"> </span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;">Final overall response now 0.04 seconds, or 350 times faster!</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Muscle + brain: unmatched.</span><br />
<span style="font-family: "arial" , "helvetica" , sans-serif;"><br /></span>
<h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Conclusion</span></h4>
<span style="font-family: "arial" , "helvetica" , sans-serif;">Sometimes the Hare wins, sometimes the Hedgehog does. Always attack nontrivial optimization problems from different angles and experiment, experiment, experiment.<br /><br />While C largely outperforms the index in this case, it's still good to know both, because using native functions is not always possible, e.g. on an AWS RDS instance. Then, 4.6x faster is better than nothing.<br /><br />PostgreSQL needs a native builtin function for counting bits in BIT VARYING. Since fingerprint screening can be used not only for chemical structures, but for virtually every complex data that can be broken down into binary features, this would be ever so useful.</span>ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-52410794151507825552018-06-20T09:04:00.002+02:002018-06-20T09:04:18.683+02:00Blockchain SoccerA opinion piece - published on medium:<br />
<br />
<a href="https://medium.com/@ernst.georg.schmid/blockchain-soccer-869a45a8077b">https://medium.com/@ernst.georg.schmid/blockchain-soccer-869a45a8077b</a>ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-55221341635603639502017-10-18T08:53:00.002+02:002017-10-18T08:53:28.443+02:00Wrong prediction of the day<pre style="white-space: pre-wrap; word-wrap: break-word;">"There is this fear people have that eventually actors are going to be replaced by computer characters. I don't think it is valid at all. The only thing that the technology is going to do is provide the actors with new places to go and new ways to go there."</pre>
<pre style="white-space: pre-wrap; word-wrap: break-word;">
</pre>
<pre style="white-space: pre-wrap; word-wrap: break-word;">- Steven Lisberger, 1982, BYTE Magazine, Vol. 07, #11, p. 74</pre>
ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-37202248969806232112017-06-08T12:14:00.002+02:002017-06-08T12:15:29.946+02:00Using PostgreSQL to get things done: 2. Tuning for read performance<div style="height: 0; padding-bottom: 56.25%; position: relative;">
<iframe allowfullscreen="" frameborder="0" height="360" src="https://www.youtube.com/embed/l_KfvVO6vmA?ecver=2" style="height: 100%; left: 0; position: absolute; width: 100%;" width="640"></iframe></div>
ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-40110701465365485832017-03-29T13:57:00.002+02:002017-03-29T13:57:48.546+02:00Using PostgreSQL to get things done: 1. InstallationCurrently I'm doing a short series of videos about data analysis PostgreSQL. Since this is for people who usually do not directly deal with databases every day, it starts very basic with the installation on Linux and Windows.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe width="320" height="266" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/6DpGdYTmO38/0.jpg" src="https://www.youtube.com/embed/6DpGdYTmO38?feature=player_embedded" frameborder="0" allowfullscreen></iframe></div>
<br />
What do you think, is it worth the effort?ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com1tag:blogger.com,1999:blog-8333362953216176367.post-9282314863544094922017-03-23T15:09:00.000+01:002017-03-23T16:33:53.991+01:00Windows, keep your dirty fingers off my files!<span style="font-family: "georgia" , "times new roman" , serif;">I spent the better part of the morning figuring out why a colleague could not import a PostgreSQL dump in <a href="https://www.postgresql.org/docs/9.6/static/app-pgdump.html">plain format</a> made on Linux on his Windows machine.</span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><br /></span>
<span style="font-family: "georgia" , "times new roman" , serif;">According to <a href="https://www.postgresql.org/docs/9.6/static/backup-dump.html">documentation</a>, this works like so (OS agnostic):</span><br />
<br />
<pre class="SYNOPSIS"><span style="font-size: small;"><span style="font-family: "courier new" , "courier" , monospace;">psql <span class="REPLACEABLE c2">dbname</span> < <span class="REPLACEABLE c2">infile</span></span></span></pre>
<br />
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: small;">However, this gave the following error:</span></span><br />
<br />
<pre><span style="font-size: small;"><code>ERROR: missing data for ...</code></span></pre>
<br />
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: small;">However, the documentation for<a href="https://www.postgresql.org/docs/9.6/static/app-psql.html"> <i>psql</i></a> gives an alternative way to read commands from a file:</span></span><br />
<br />
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: small;">The <i>-f</i> switch.</span></span><br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: small;"> "</span></span><br />
<dl><dd><span style="font-family: "courier new" , "courier" , monospace;">Read commands from the file <span class="REPLACEABLE c2">filename</span>, rather than standard input.
This option can be repeated and combined in any order with
the <span class="OPTION">-c</span> option. When either
<span class="OPTION">-c</span> or <span class="OPTION">-f</span> is
specified, <span class="APPLICATION">psql</span> does not
read commands from standard input; instead it terminates
after processing all the <span class="OPTION">-c</span> and
<span class="OPTION">-f</span> options in sequence. Except for
that, this option is largely equivalent to the meta-command
<span class="COMMAND">\i</span>.</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">If <span class="REPLACEABLE c2">filename</span> is
<span class="LITERAL">-</span> (hyphen), then standard input is
read until an EOF indication or <span class="COMMAND">\q</span>
meta-command. This can be used to intersperse interactive
input with input from files. Note however that Readline is
not used in this case (much as if <span class="OPTION">-n</span> had been specified).</span><br />
<span style="font-family: "courier new" , "courier" , monospace;">Using this option is subtly different from writing
<span class="LITERAL">psql < <span class="REPLACEABLE c2">filename</span></span>. In general, both will
do what you expect, but using <span class="LITERAL">-f</span>
enables some nice features such as error messages with line
numbers. There is also a slight chance that using this
option will reduce the start-up overhead. On the other
hand, the variant using the shell's input redirection is
(in theory) guaranteed to yield exactly the same output you
would have received had you entered everything by hand.</span></dd></dl>
<span style="font-family: "courier new" , "courier" , monospace;"><span style="font-size: small;"> "</span></span><br />
<br />
<br />
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: small;">What this doesn't tell you, is that on Windows, <i>CMD.exe</i> apparently somehow tries to interpret the file it reads. And by doing so, it destroyed data in the dump so that <i><a href="https://www.postgresql.org/docs/9.6/static/sql-copy.html">COPY</a></i> was unable to understand it anymore. So the last sentence of the statement above is just theory on Windows.</span></span><br />
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: small;"><br /></span></span>
<span style="font-family: "georgia" , "times new roman" , serif;"><span style="font-size: small;">Long story short, with <i>psql -f </i>all went fine - and don't use I/O redirection with <i>psql</i> on Windows!</span></span>ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-72893609828226244582017-02-06T00:00:00.000+01:002017-02-06T00:00:10.332+01:00Module Monday: safeupdateA recent tweet pointed me to <a href="https://bitbucket.org/eradman/pg-safeupdate">safeupdate</a> which is a little extension that forbids unconditional <span style="font-family: "courier new" , "courier" , monospace;">UPDATE</span> and <span style="font-family: "courier new" , "courier" , monospace;">DELETE</span> statements, i.e. without <span style="font-family: "courier new" , "courier" , monospace;">WHERE</span> clause.<br />
<br />
Let's see how this works...<br />
<br />
The build and install procedure is pretty non-spectacular: <span style="font-family: "courier new" , "courier" , monospace;">make</span>, <span style="font-family: "courier new" , "courier" , monospace;">make install</span> and then either activate it per session with <span style="font-family: "courier new" , "courier" , monospace;">load 'safeupdate'</span> or globally by adding it to <i>postgresql.conf</i>:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">shared_preload_libraries = 'safeupdate'</span><br />
<br />
After that, if you try an unconditional UPDATE or DELETE on any table, you'll get the following errors:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">ERROR: UPDATE requires a WHERE clause</span><br />
<br />
and<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">ERROR: DELETE requires a WHERE clause</span><br />
<br />
respectively and the operation is cancelled.<br />
<br />
Unconditional <span style="font-family: "courier new" , "courier" , monospace;">UPDATE</span> and <span style="font-family: "courier new" , "courier" , monospace;">DELETE</span> can still be forced by using a condition that always evaluates to <i>true</i>, like<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">WHERE TRUE</span> or <span style="font-family: "courier new" , "courier" , monospace;">WHERE 1=1</span><br />
<br />
But now you have to explicitly <i>request</i> this behaviour.<br />
<br />
I think this is ever so useful to protect against human error, that I wonder why this isn't something mandatory, even if it's non-standard.<br />
<br />
However, there is one glitch: If you <span style="font-family: "courier new" , "courier" , monospace;">JOIN</span> tables to <span style="font-family: "courier new" , "courier" , monospace;">UPDATE</span> or <span style="font-family: "courier new" , "courier" , monospace;">DELETE</span> only certain rows that match the <span style="font-family: "courier new" , "courier" , monospace;">ON</span> condition, this is not recognized as a conditional operation and the <span style="font-family: "courier new" , "courier" , monospace;">WHERE TRUE</span> is still needed to pacify the extension. Well, I can live with that...ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com2tag:blogger.com,1999:blog-8333362953216176367.post-85677516090318135982017-01-23T19:03:00.002+01:002017-01-23T19:03:46.434+01:00A new type for PostgreSQL<pre class="PROGRAMLISTING" style="background-color: #f7f7f7; border-color: rgb(207, 207, 207); border-radius: 8px; border-style: solid; border-width: 1px; box-shadow: rgb(223, 223, 223) 3px 3px 5px; font-size: 1.3em; margin-bottom: 2ex; margin-left: 2ex; margin-top: 2ex; overflow: auto; padding: 2ex;">CREATE TYPE trumpean AS ENUM ('true', 'alternateTrue');</pre>
ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0tag:blogger.com,1999:blog-8333362953216176367.post-63912775587444637082017-01-04T23:38:00.001+01:002017-01-04T23:42:05.613+01:00What happened?<table __gwtcellbasedwidgetimpldispatchingblur="true" __gwtcellbasedwidgetimpldispatchingfocus="true" cellspacing="0" class="OYKEW4D-lc-y" style="border-spacing: 0px; border: 0px; color: #222222; font-family: Arial, Helvetica, sans-serif; font-size: 13px; height: 32px; margin: 0px; padding: 0px; vertical-align: baseline; width: 362px;"><tbody style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;">
<tr __gwt_row="0" __gwt_subrow="0" class="OYKEW4D-lc-b" style="background: rgb(255, 255, 255); border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><td class="OYKEW4D-lc-a OYKEW4D-lc-c OYKEW4D-lc-d" style="border: 2px solid rgb(255, 255, 255); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1283" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;" tabindex="0">
<div class="OYKEW4D-d-k" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; overflow: hidden; padding: 0px; text-overflow: ellipsis; vertical-align: baseline; white-space: nowrap; width: 220px;">
Deutschland</div>
<div style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; padding: 0px; vertical-align: baseline;">
</div>
</div>
</td><td class="OYKEW4D-lc-a OYKEW4D-lc-c OYKEW4D-lc-n" style="border: 2px solid rgb(255, 255, 255); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1284" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
3222</div>
</td></tr>
<tr __gwt_row="1" __gwt_subrow="0" class="OYKEW4D-lc-r" style="background: rgb(243, 247, 251); border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><td class="OYKEW4D-lc-a OYKEW4D-lc-s OYKEW4D-lc-d" style="border: 2px solid rgb(243, 247, 251); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1283" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
<div class="OYKEW4D-d-k" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; overflow: hidden; padding: 0px; text-overflow: ellipsis; vertical-align: baseline; white-space: nowrap; width: 220px;">
Vereinigte Staaten</div>
<div style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; padding: 0px; vertical-align: baseline;">
</div>
</div>
</td><td class="OYKEW4D-lc-a OYKEW4D-lc-s OYKEW4D-lc-n" style="border: 2px solid rgb(243, 247, 251); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1284" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
3127</div>
</td></tr>
<tr __gwt_row="2" __gwt_subrow="0" class="OYKEW4D-lc-b OYKEW4D-lc-i" style="background: rgb(238, 238, 238); border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><td class="OYKEW4D-lc-a OYKEW4D-lc-c OYKEW4D-lc-d OYKEW4D-lc-j" style="border: 2px solid rgb(238, 238, 238); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1283" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
<div class="OYKEW4D-d-k" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; overflow: hidden; padding: 0px; text-overflow: ellipsis; vertical-align: baseline; white-space: nowrap; width: 220px;">
Tschechische Republik</div>
<div style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; padding: 0px; vertical-align: baseline;">
</div>
</div>
</td><td class="OYKEW4D-lc-a OYKEW4D-lc-c OYKEW4D-lc-n OYKEW4D-lc-j" style="border: 2px solid rgb(238, 238, 238); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1284" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
2918</div>
</td></tr>
<tr __gwt_row="3" __gwt_subrow="0" class="OYKEW4D-lc-r" style="background: rgb(243, 247, 251); border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><td class="OYKEW4D-lc-a OYKEW4D-lc-s OYKEW4D-lc-d" style="border: 2px solid rgb(243, 247, 251); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1283" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
<div class="OYKEW4D-d-k" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; overflow: hidden; padding: 0px; text-overflow: ellipsis; vertical-align: baseline; white-space: nowrap; width: 220px;">
Frankreich</div>
<div style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; padding: 0px; vertical-align: baseline;">
</div>
</div>
</td><td class="OYKEW4D-lc-a OYKEW4D-lc-s OYKEW4D-lc-n" style="border: 2px solid rgb(243, 247, 251); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1284" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
66</div>
</td></tr>
<tr __gwt_row="4" __gwt_subrow="0" class="OYKEW4D-lc-b" style="background: rgb(255, 255, 255); border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><td class="OYKEW4D-lc-a OYKEW4D-lc-c OYKEW4D-lc-d" style="border: 2px solid rgb(255, 255, 255); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1283" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
<div class="OYKEW4D-d-k" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; overflow: hidden; padding: 0px; text-overflow: ellipsis; vertical-align: baseline; white-space: nowrap; width: 220px;">
Vereinigtes Königreich</div>
<div style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; padding: 0px; vertical-align: baseline;">
</div>
</div>
</td><td class="OYKEW4D-lc-a OYKEW4D-lc-c OYKEW4D-lc-n" style="border: 2px solid rgb(255, 255, 255); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1284" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
49</div>
</td></tr>
<tr __gwt_row="5" __gwt_subrow="0" class="OYKEW4D-lc-r" style="background: rgb(243, 247, 251); border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><td class="OYKEW4D-lc-a OYKEW4D-lc-s OYKEW4D-lc-d" style="border: 2px solid rgb(243, 247, 251); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1283" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
<div class="OYKEW4D-d-k" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; overflow: hidden; padding: 0px; text-overflow: ellipsis; vertical-align: baseline; white-space: nowrap; width: 220px;">
Belgien</div>
<div style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; padding: 0px; vertical-align: baseline;">
</div>
</div>
</td><td class="OYKEW4D-lc-a OYKEW4D-lc-s OYKEW4D-lc-n" style="border: 2px solid rgb(243, 247, 251); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1284" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
34</div>
</td></tr>
<tr __gwt_row="6" __gwt_subrow="0" class="OYKEW4D-lc-b" style="background: rgb(255, 255, 255); border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><td class="OYKEW4D-lc-a OYKEW4D-lc-c OYKEW4D-lc-d" style="border: 2px solid rgb(255, 255, 255); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1283" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
<div class="OYKEW4D-d-k" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; overflow: hidden; padding: 0px; text-overflow: ellipsis; vertical-align: baseline; white-space: nowrap; width: 220px;">
Polen</div>
<div style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; padding: 0px; vertical-align: baseline;">
</div>
</div>
</td><td class="OYKEW4D-lc-a OYKEW4D-lc-c OYKEW4D-lc-n" style="border: 2px solid rgb(255, 255, 255); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1284" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
23</div>
</td></tr>
<tr __gwt_row="7" __gwt_subrow="0" class="OYKEW4D-lc-r" style="background: rgb(243, 247, 251); border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><td class="OYKEW4D-lc-a OYKEW4D-lc-s OYKEW4D-lc-d" style="border: 2px solid rgb(243, 247, 251); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1283" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
<div class="OYKEW4D-d-k" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; overflow: hidden; padding: 0px; text-overflow: ellipsis; vertical-align: baseline; white-space: nowrap; width: 220px;">
Slowakei</div>
<div style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; padding: 0px; vertical-align: baseline;">
</div>
</div>
</td><td class="OYKEW4D-lc-a OYKEW4D-lc-s OYKEW4D-lc-n" style="border: 2px solid rgb(243, 247, 251); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1284" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
12</div>
</td></tr>
<tr __gwt_row="8" __gwt_subrow="0" class="OYKEW4D-lc-b" style="background: rgb(255, 255, 255); border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><td class="OYKEW4D-lc-a OYKEW4D-lc-c OYKEW4D-lc-d" style="border: 2px solid rgb(255, 255, 255); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1283" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
<div class="OYKEW4D-d-k" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; overflow: hidden; padding: 0px; text-overflow: ellipsis; vertical-align: baseline; white-space: nowrap; width: 220px;">
Brasilien</div>
<div style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; padding: 0px; vertical-align: baseline;">
</div>
</div>
</td><td class="OYKEW4D-lc-a OYKEW4D-lc-c OYKEW4D-lc-n" style="border: 2px solid rgb(255, 255, 255); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1284" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
11</div>
</td></tr>
<tr __gwt_row="9" __gwt_subrow="0" class="OYKEW4D-lc-r" style="background: rgb(243, 247, 251); border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; padding: 0px; vertical-align: baseline;"><td class="OYKEW4D-lc-a OYKEW4D-lc-s OYKEW4D-lc-d" style="border: 2px solid rgb(243, 247, 251); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1283" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
<div class="OYKEW4D-d-k" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; overflow: hidden; padding: 0px; text-overflow: ellipsis; vertical-align: baseline; white-space: nowrap; width: 220px;">
Irland</div>
<div style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; padding: 0px; vertical-align: baseline;">
</div>
</div>
</td><td class="OYKEW4D-lc-a OYKEW4D-lc-s OYKEW4D-lc-n" style="border: 2px solid rgb(243, 247, 251); font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; overflow: hidden; padding: 2px 15px; vertical-align: baseline;"><div __gwt_cell="cell-gwt-uid-1284" style="border: 0px; font-family: inherit; font-style: inherit; font-weight: inherit; margin: 0px; outline-style: none; padding: 0px; vertical-align: baseline;">
11</div>
</td></tr>
</tbody></table>
<br />
<div>
Germany suddenly jumped from a two-digit to a four-digit access count?<br />
<br />
Sudden FAME! ;-)</div>
ergohttp://www.blogger.com/profile/06303496169445217331noreply@blogger.com0