Comments on The plate is bad: pg_strom - The rough road ahead

ergo (2015-09-14T14:54:04.092+02:00):
Thank you for addressing the issue so fast.

After doing some desk research on CUDA and high CPU load, I still have a feeling that the driver has a say in this too, but I'm stuck with 352 for the moment. 355 messes up my installation.

ergo (2015-09-14T14:52:12.837+02:00):
Yes, it's a known issue with my type of graphics card. See the answer by the original author. Your GTX960 has much more memory bandwidth, and your driver is the latest. After doing some desk research on CUDA and high CPU load, I still have a feeling that the driver has a say in this too.

KaiGai Kohei (2015-09-14T13:49:37.737+02:00):
It's a known issue, and I'm working on it now.

Your GPU (Quadro K1100M) has relatively low memory performance (44.8GB/sec bandwidth), while the workload is very memory intensive: GpuPreAgg heavily uses atomic operations.
In addition, the grouping key distribution is the worst case, because every row has 'a' in the "y" column. That means all the GPU kernel threads try to perform an atomic operation on the same item.

Anonymous (2015-09-14T08:56:13.647+02:00):
Looks good to me. 4360.404 ms vs 2165.910 ms

wocuda=# EXPLAIN ANALYZE SELECT count(*)
FROM t_test
WHERE sqrt(x) > 0
GROUP BY y;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=242892.45..242892.46 rows=1 width=101) (actual time=4360.312..4360.312 rows=1 loops=1)
   Group Key: y
   ->  Seq Scan on t_test  (cost=0.00..234559.11 rows=1666669 width=101) (actual time=4.197..1791.154 rows=5000000 loops=1)
         Filter: (sqrt((x)::double precision) > '0'::double precision)
 Planning time: 0.134 ms
 Execution time: 4360.404 ms
(6 rows)

wocuda=# SET pg_strom.enabled = ON;
SET
wocuda=# EXPLAIN ANALYZE SELECT count(*)
FROM t_test
WHERE sqrt(x) > 0
GROUP BY y;
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=177230.91..177230.92 rows=1 width=101) (actual time=2006.707..2006.707 rows=1 loops=1)
   Group Key: y
   ->  Custom Scan (GpuPreAgg)  (cost=13929.24..173681.41 rows=260 width=408) (actual time=997.161..2005.989 rows=76 loops=1)
         Bulkload: On (density: 100.00%)
         Reduction: Local + Global
         Device Filter: (sqrt((x)::double precision) > '0'::double precision)
         ->  Custom Scan (BulkScan) on t_test  (cost=9929.24..168897.56 rows=5000006 width=101) (actual time=22.665..1975.907 rows=5000000 loops=1)
 Planning time: 0.434 ms
 Execution time: 2165.910 ms
(9 rows)

Running on:
Gentoo
PostgreSQL 9.5alpha1

CUDA Runtime version: 7.5.0
NVIDIA driver version: 355.11
GPU0 GeForce GTX 960 (1024 CUDA cores, 1278MHz), L2 1024KB, RAM 4095MB (128bits, 3505MHz), capability 5.2
NVRTC - CUDA Runtime Compilation version 7.5
AMD Phenom(tm) II X6 1100T Processor
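
KaiGai Kohei's point about the grouping key distribution can be illustrated with a small model. The sketch below is not PG-Strom code and does not run on a GPU. It only counts, for a given list of grouping keys, what share of the per-row updates would target the single busiest group, assuming one accumulator per distinct key. The function name `busiest_group_share` and the sample data are made up for illustration.

```python
from collections import Counter


def busiest_group_share(keys):
    """Return the fraction of rows whose update hits the busiest group.

    Assumption: each distinct grouping key has exactly one accumulator,
    and updates to the same accumulator have to be applied one at a time.
    A share of 1.0 therefore means every update contends for one item.
    """
    hits = Counter(keys)
    return max(hits.values()) / len(keys)


# Every row has 'a' in the grouping column, as in the test table discussed above.
skewed = ["a"] * 100_000

# Hypothetical alternative: the same number of rows spread over 1000 keys.
spread = [f"k{i % 1000}" for i in range(100_000)]

print(busiest_group_share(skewed))
print(busiest_group_share(spread))
```

With a single key, all updates land on one accumulator, which is the case described as the worst distribution. With the keys spread evenly, each accumulator receives only a small share of the updates, so far fewer of them compete for the same item.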