- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 视频嵌入链接 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
Greenplum磁盘配额管理工具Diskquota
Diskquota 是 Greenplum 的磁盘配额管理工具(https://github.com/greenplum-db/diskquota)。本次分享围绕 Diskquota 怎样使用 launcher 进程和多个 worker 进程实现监控多个数据库, 怎样增量收集表占用磁盘空间大小,以及怎样拦截超出限额的 query 来深入了解 Diskquota 的内部实现,最后分享一下 Diskquota 下一个版本的改进。
话题大纲:
- Diskquota 介绍
- Launcher 和 worker 架构
- Diskquota 元数据表
- schema/role 占用磁盘大小计算
- Query 阻拦
- Diskquota下一个版本
展开查看详情
1 . Diskquota Internals https://github.com/greenplum-db/diskquota 王晓冉 Software developer at VMware Confidential │ ©2018 VMware, Inc.
2 .Agenda q Diskquota 介绍 q Launcher-worker 架构 q Diskquota 元数据表 q Schema/Role 占用磁盘大小计算 q Query 拦截 Confidential │ ©2018 VMware, Inc. 2
3 .Diskquota 介绍 o Enable/Disable diskquota on a database o Set shema/role quota o Disk usage Confidential │ ©2018 VMware, Inc. 3
4 .CREATE/DROP EXTENSION wxiaoran-a01:backend xiwang$ psql postgres pp | grep diskquota 501 25871 25692 0 11:06AM ?? 0:00.05 postgres: 6000, bgworker: postgres=# create extension diskquota; [diskquota] - launcher CREATE EXTENSION 501 25876 25590 0 11:06AM ?? 0:00.04 postgres: 6002, xiwang diskquota 127.0.0.1(51880) con5 seg0 idle 501 25877 25559 0 11:06AM ?? 0:00.04 postgres: 6003, xiwang diskquota 127.0.0.1(51881) con5 seg1 idle 501 25878 25602 0 11:06AM ?? 0:00.04 postgres: 6004, xiwang diskquota 127.0.0.1(51882) con5 seg2 idle 501 27594 25692 0 11:06AM ?? 0:00.20 postgres: 6000, bgworker: [diskquota] postgres cmd2 wxiaoran-a01:backend xiwang$ psql postgres wxiaoran-a01:backend xiwang$ pp | grep diskquota 501 25871 25692 0 11:06AM ?? 0:00.07 postgres: 6000, bgworker: postgres=# drop extension diskquota; [diskquota] - launcher DROP EXTENSION 501 25876 25590 0 11:06AM ?? 0:00.06 postgres: 6002, xiwang diskquota 127.0.0.1(51880) con5 seg0 idle 501 25877 25559 0 11:06AM ?? 0:00.05 postgres: 6003, xiwang diskquota 127.0.0.1(51881) con5 seg1 idle 501 25878 25602 0 11:06AM ?? 0:00.06 postgres: 6004, xiwang diskquota Confidential │ ©2018 VMware, Inc. 127.0.0.1(51882) con5 seg2 idle 4
5 .SCHEMA/ROLE QUOTA 1. CREATE SCHEMA s1; 2. SET search_path TO s1; 3. CREATE TABLE a(i int) DISTRIBUTED BY (i); 4. INSERT INTO a SELECT generate_series(1,100000); SELECT diskquota.set_schema_quota('s1', '1 MB'); set_schema_quota ------------------ (1 row) INSERT INTO a SELECT generate_series(1,100); ERROR: schema's disk space quota exceeded with name: s1 Confidential │ ©2018 VMware, Inc. 5
6 .DISK USAGE SELECT schema_name, quota_in_mb, nspsize_in_bytes FROM diskquota.show_fast_schema_quota_view WHERE schema_name = ‘s1' schema_name | quota_in_mb | nspsize_in_bytes ---------------------+-------------------+-------------------------------------- s1 |1 | 3932160 SELECT tableid::regclass, size, segid FROM diskquota.table_size WHERE tableid::regclass::name NOT LIKE ‘%.%’ ORDER BY tableid; tableid | size | segid ---------+---------+------- a | 3932160 | -1 a | 1310720 | 2 a | 1310720 | 1 a | 1310720 | 0 Confidential │ ©2018 VMware, Inc. 6
7 .Launcher-worker 架构 o Extension o Bgworker o Launcher & worker Confidential │ ©2018 VMware, Inc. 7
8 . MPP Confidential │ ©2018 VMware, Inc. 8
9 .Extension 1. diskquota extension package • diskquota.so • control file (version) • sql scripts (UDF/TABLE) 2. CREATE/DROP EXTENSION • Execute SQL scripts CREATE FUNCTION diskquota.diskquota_start_worker() RETURNS void STRICT AS '$libdir/diskquota-2.0.so' LANGUAGE C; • Load diskquota.so 3. pg_extension (upgrade/downgrade) • diskquota--2.0.sql • diskquota--1.0--2.0.sql • diskquota--2.0--1.0.sql Confidential │ ©2018 VMware, Inc. 9
10 .Bgworker 特性 1. 访问共享内存 2. 直接访问数据库(SPI) 3. 运行用户代码(diskquota.so) 基本步骤 1. Postmaster fork 2. InitProcess (shared memory) 3. InitPostgres(dbname, InvalidOid, username, NULL) (连接数据库) 4. Extension main function Confidential │ ©2018 VMware, Inc. 10
11 .Bgworker API 1. RegisterBackgroundWorker • 在 _PG_init() 中调用, • 其所在的外部库需要注册到 shared_preload_libraries . postmaster 在启动的时候会加载 shared_preload_libraries 里面指定的外部库,调用 _PG_init , 调用 RegisterBackgroundWorker 注册启动 bgworker . 这类 bgworker 和 postmaster 一起启动 2. RegisterDynamicBackgroudWorker 当 postgres启动之后,用户也可以在其它进程中(bgworker) 动态注册启动新的bgworker. 3. TerminateBackgroundWorker Confidential │ ©2018 VMware, Inc. 11
12 .Launcher-worker 架构 gpconfig -c shared_preload_libraries -v 'diskquota-2.0’ _PG_init RegisterBackgroundWorker diskquota.diskquota_start_worker() RegisterDynamicBa Postgres process signal Diskquota ckgroudWorker Bgworker (create diskquota Master extension ) 1 Launcher 2 Framawork 4 3 1 Shared memory Diskquota worker Launcher( on master node) • Greenplum 集群启动的时候创建 (shared_preload_libraries) • 创建或者销毁 diskquota worker 进程 Confidential │ ©2018 VMware, Inc. 12
13 .Launcher-worker 架构 Worker Postgres Shared memory Generate Postgres Collect disk usage Block list Block list Segment Segment Segment Diskquota worker ( On master node) • 轮询收集表占用的磁盘大小 • 计算 shema/role 的磁盘大小 • 生成 block list Confidential │ ©2018 VMware, Inc. 13
14 .Limited bgworkers diskquota 目前最多支持 10 个 database • max_worker_processes • One bgworker can only connect to one database diskquota 不限制 database 数量(diskquota 2.1) Dynamically create bgworker for diskquota Confidential │ ©2018 VMware, Inc. 14
15 .Diskquota 元数据表 Confidential │ ©2018 VMware, Inc. 15
16 .Diskquota 元数据表 o diskquota 数据库 • diskquota_namespace.database_list (dbid oid) o diskquota shema (each database has installed diskquota extension) • diskquota.table_size(tableid oid, size bigint, segid smallint) • diskquota. quota_config Confidential │ ©2018 VMware, Inc. 16
17 .Schema/Role 磁盘占用 Confidential │ ©2018 VMware, Inc. 17
18 .Schema/Role 磁盘占用 1. 初始化 o 获取当前数据库下的所有表的大小 § Create diskquota extension之后扫描所有的表 § 从table_size中获取 o 计算每个shema/role 磁盘占用大小 2. 每次轮询 o 获取 active tables (轮询间隔) o 获取每个 active table 占用磁盘大小 o 获取删除 和 新增的表的大小 o 增量计算每个 shema/role 占用磁盘大小 o 同步表的大小到table_size表中 Confidential │ ©2018 VMware, Inc. 18
19 .Active tables Heap AO/CO Buffer SMGR MD File IO Confidential │ ©2018 VMware, Inc. 19
20 .Active tables Hooks in SMGR: Worker • file_create_hook • file_extend_hook Collect disk usage • file_truncate_hook • file_unlink_hook diskquota_fetch_table_stat Shared memory Shared memory Shared memory Active table list Active table list Active table list Segment Segment Segment 1. Save active table oids in share memory. (on segments) 2. UDF to fetch active oids from shared memory on segments Datum diskquota_fetch_table_stat(PG_FUNCTION_ARGS) 3. Diskquota worker run “diskquota_fetch_table_stat” on each segment by CdbDispatchCommand Confidential │ ©2018 VMware, Inc. 20
21 .表的可见性 1. 已提交的表 2. 未提交的表 wxiaoran-a01:backend xiwang$ psql postgres wxiaoran-a01:backend xiwang$ psql postgres postgres=# create table foo(a int); postgres=# begin; postgres=# insert into foo select generate_series(0,1000); postgres=# create table test(a int); postgres=# select oid from pg_class where relname='foo’; postgres=# insert into test select generate_series(1,1000); oid postgres=# select oid from pg_class where relname='test'; ------- oid 49166 ------- 49169 select * from diskquota.table_size where tableid in (49166, postgres=# abort; 49169) and segid=-1; postgres=# select * from diskquota.table_size where tableid tableid | size | segid in (49166, 49169) and segid=-1;; ---------+-------+------- tableid | size | segid 49166 | 98304 | -1 ---------+-------+------- 49169 | 98304 | -1 49166 | 98304 | -1 Confidential │ ©2018 VMware, Inc. 21
22 .Table size 1. HEAP • main fork • fsm fork • vm fork 2. APPENDONLY • segments Confidential │ ©2018 VMware, Inc. 22
23 .Table size 1. Heap table • table + index (heap) • toast table + index (heap) 2. AO/CO table • segments (appendonly) • index (heap) • block directory table + index (heap) • visibility map table + index (heap) • toast table + index (heap) • aoseg table (heap) Confidential │ ©2018 VMware, Inc. 23
24 .stat 1. relationpath = relpathbackend(rel->rd_node, rel->rd_backend, forknum); 2. stat(pathname, &fst) 3. totalsize += fst.st_size Confidential │ ©2018 VMware, Inc. 24
25 .Query 拦截 o Query 执行之前拦截 o Query 执行中拦截 Confidential │ ©2018 VMware, Inc. 25
26 .Query 执行之前拦截 o Hook: ExecutorCheckPerms_hook(在 master 节点上拦截 query) o Insert/update query 执行之前拦截 • INSERT/UPDATE/COPY • CREATE TABLE AS, CREATE INDEX o 不足 有些 Query 一次插入很多数据,执行之前 quota 没有超,diskquota 没拦截,执行完之后导致磁盘占用量过大 Confidential │ ©2018 VMware, Inc. 26
27 .Query 执行中拦截 o Hook: file_extend_hook (在 segment 拦截 query) o Insert/update query 执行之中拦截 • INSERT/UPDATE/COPY • CREATE TABLE AS, CREATE INDEX o 实现: • Distribute block map to segments • 通过 Hook 在 segment 上拦截正在执行的 query Confidential │ ©2018 VMware, Inc. 27
28 .Thank You Confidential │ ©2018 VMware, Inc.