postgresql 13.1 insert into select并行查询的实现

数据库 2024/12/28 佚名

3 1 2

本文信息基于PG13.1。

从PG9.6开始支持并行查询。PG11开始支持CREATE TABLE … AS、SELECT INTO以及CREATE MATERIALIZED VIEW的并行查询。

先说结论：

换用create table as 或者select into或者导入导出。

首先跟踪如下查询语句的执行计划：

select count(*) from test t1,test1 t2 where t1.id = t2.id ;
postgres=# explain analyze select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                  QUERY PLAN                  
-------------------------------------------------------------------------------------------
Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=683.246..715.324 rows=1 loops=1)
 -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=681.474..715.311 rows=3 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=674.689..675.285 rows=1 loops=3)
    -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=447.799..645.689 rows=333333 loops=3)
      Hash Cond: (t1.id = t2.id)
      -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.025..74.010 rows=333333 loops=3)
      -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=260.052..260.053 rows=333333 loops=3)
       Buckets: 131072 Batches: 16 Memory Usage: 3520kB
       -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..104.804 rows=333333 loops=3)
 Planning Time: 0.420 ms
 Execution Time: 715.447 ms
(13 rows)

可以看到走了两个Workers。

下边看一下insert into select：

postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;   
                 QUERY PLAN                 
-------------------------------------------------------------------------------------------
Insert on va (cost=73228.00..73228.02 rows=1 width=4) (actual time=3744.179..3744.187 rows=0 loops=1)
 -> Subquery Scan on "*SELECT*" (cost=73228.00..73228.02 rows=1 width=4) (actual time=3743.343..3743.352 rows=1 loops=1)
   -> Aggregate (cost=73228.00..73228.01 rows=1 width=8) (actual time=3743.247..3743.254 rows=1 loops=1)
    -> Hash Join (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1092.295..3511.301 rows=1000000 loops=1)
      Hash Cond: (t1.id = t2.id)
      -> Seq Scan on test t1 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.030..421.537 rows=1000000 loops=1)
      -> Hash (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1090.078..1090.081 rows=1000000 loops=1)
       Buckets: 131072 Batches: 16 Memory Usage: 3227kB
       -> Seq Scan on test1 t2 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.021..422.768 rows=1000000 loops=1)
 Planning Time: 0.511 ms
 Execution Time: 3745.633 ms
(11 rows)

可以看到并没有Workers的指示，没有启用并行查询。

即使开启强制并行，也无法走并行查询。

postgres=# set force_parallel_mode =on;
SET
postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                 QUERY PLAN                 
-------------------------------------------------------------------------------------------
Insert on va (cost=73228.00..73228.02 rows=1 width=4) (actual time=3825.042..3825.049 rows=0 loops=1)
 -> Subquery Scan on "*SELECT*" (cost=73228.00..73228.02 rows=1 width=4) (actual time=3824.976..3824.984 rows=1 loops=1)
   -> Aggregate (cost=73228.00..73228.01 rows=1 width=8) (actual time=3824.972..3824.978 rows=1 loops=1)
    -> Hash Join (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1073.587..3599.402 rows=1000000 loops=1)
      Hash Cond: (t1.id = t2.id)
      -> Seq Scan on test t1 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.034..414.965 rows=1000000 loops=1)
      -> Hash (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1072.441..1072.443 rows=1000000 loops=1)
       Buckets: 131072 Batches: 16 Memory Usage: 3227kB
       -> Seq Scan on test1 t2 (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.022..400.624 rows=1000000 loops=1)
 Planning Time: 0.577 ms
 Execution Time: 3825.923 ms
(11 rows)

原因在官方文档有写：

The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. As an exception, the commands CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW which create a new table and populate it can use a parallel plan.

解决方案有如下三种：

1.select into

postgres=# explain analyze select count(*) into vaa from test t1,test1 t2 where t1.id = t2.id ;
                  QUERY PLAN                  
-------------------------------------------------------------------------------------------
Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=742.736..774.923 rows=1 loops=1)
 -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=740.223..774.907 rows=3 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=731.408..731.413 rows=1 loops=3)
    -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=489.880..700.830 rows=333333 loops=3)
      Hash Cond: (t1.id = t2.id)
      -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.033..87.479 rows=333333 loops=3)
      -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=266.839..266.840 rows=333333 loops=3)
       Buckets: 131072 Batches: 16 Memory Usage: 3520kB
       -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.058..106.874 rows=333333 loops=3)
 Planning Time: 0.319 ms
 Execution Time: 783.300 ms
(13 rows)

2.create table as

postgres=# explain analyze create table vb as select count(*) from test t1,test1 t2 where t1.id = t2.id ;
                  QUERY PLAN                  
-------------------------------------------------------------------------------------------
 Finalize Aggregate (cost=34244.16..34244.17 rows=1 width=8) (actual time=540.120..563.733 rows=1 loops=1)
 -> Gather (cost=34243.95..34244.16 rows=2 width=8) (actual time=537.982..563.720 rows=3 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Partial Aggregate (cost=33243.95..33243.96 rows=1 width=8) (actual time=526.602..527.136 rows=1 loops=3)
    -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=0) (actual time=334.532..502.793 rows=333333 loops=3)
      Hash Cond: (t1.id = t2.id)
      -> Parallel Seq Scan on test t1 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.018..57.819 rows=333333 loops=3)
      -> Parallel Hash (cost=8591.67..8591.67 rows=416667 width=4) (actual time=189.502..189.503 rows=333333 loops=3)
       Buckets: 131072 Batches: 16 Memory Usage: 3520kB
       -> Parallel Seq Scan on test1 t2 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.023..77.786 rows=333333 loops=3)
 Planning Time: 0.189 ms
 Execution Time: 565.448 ms
(13 rows)

3.或者通过导入导出的方式，例如：

psql -h localhost -d postgres -U postgres -c "select count(*) from test t1,test1 t2 where t1.id = t2.id " -o result.csv -A -t -F ","
psql -h localhost -d postgres -U postgres -c "COPY va FROM 'result.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER FALSE, ENCODING 'windows-1252')"

一些场景下也会比非并行快。

补充：POSTGRESQL: 动态SQL语句中不能使用SELECT INTO？

我的数据库版本是 PostgreSQL 8.4.7 。下面是出错的存储过程：

CREATE or Replace FUNCTION func_getnextid(
 tablename varchar(240),
 idname varchar(20) default 'id')
RETURNS integer AS $funcbody$
Declare
 sqlstring varchar(240);
 currentId integer;
Begin
 sqlstring:= 'select max("' || idname || '") into currentId from "' || tablename || '";';
 EXECUTE sqlstring;
 if currentId is NULL or currentId = 0 then
  return 1;
 else
  return currentId + 1;
 end if;
End;
$funcbody$ LANGUAGE plpgsq

执行后出现这样的错误：

SQL error:

ERROR: EXECUTE of SELECT ... INTO is not implemented

CONTEXT: PL/pgSQL function "func_getnextbigid" line 6 at EXECUTE statement

改成这样的就对了：

CREATE or Replace FUNCTION func_getnextid(
 tablename varchar(240),
 idname varchar(20) default 'id')
RETURNS integer AS $funcbody$
Declare
 sqlstring varchar(240);
 currentId integer;
Begin
 sqlstring:= 'select max("' || idname || '") from "' || tablename || '";';
 EXECUTE sqlstring into currentId;
 if currentId is NULL or currentId = 0 then
  return 1;
 else
  return currentId + 1;
 end if;
End;
$funcbody$ LANGUAGE plpgsql;

以上为个人经验，希望能给大家一个参考，也希望大家多多支持。如有错误或未考虑完全的地方，望不吝赐教。

postgresql,insert,into,select,并行查询

华山资源网 Design By www.eoogi.com

广告合作：本站广告合作请联系QQ：858582 申请时备注：广告合作（否则不回）
免责声明：本站资源来自互联网收集,仅供用于学习和交流,请遵循相关法律法规,本站一切资源不代表本站立场,如有侵权、后门、不妥请联系本站删除！

华山资源网 Design By www.eoogi.com

评论“postgresql 13.1 insert into select并行查询的实现”

暂无评论...

www.eoogi.com 华山资源网

120,135影音资源

344,641技术资源

22,817软件资源

435,032站长资源

postgresql 13.1 insert into select并行查询的实现

先说结论：

1.select into

2.create table as

3.或者通过导入导出的方式，例如：

postgresql连续归档及时间点恢复的操作

PostgreSQL pg_archivecleanup与清理archivelog的操作

评论“postgresql 13.1 insert into select并行查询的实现”

RTX 5090要首发性能要翻倍！三星展示GDDR7显存

更新日志

友情链接

postgresql 13.1 insert into select并行查询的实现

先说结论：

1.select into

2.create table as

3.或者通过导入导出的方式，例如：

postgresql连续归档及时间点恢复的操作

PostgreSQL pg_archivecleanup与清理archivelog的操作

评论“postgresql 13.1 insert into select并行查询的实现”

RTX 5090要首发 性能要翻倍！三星展示GDDR7显存

更新日志

友情链接

RTX 5090要首发性能要翻倍！三星展示GDDR7显存