使用PostgreSQL外部数据包装器加入异构数据库

PostgreSQL提供了一种与外部数据源通信的方法。这可能是另一个PostgreSQL实例或任何其他数据库。另一个数据库可能是关系数据库,如mysql或oracle;或者任何nosql数据库,如mongodb或hadoop。为了实现这一点,PostgreSQL以外部数据包装器(FDW)的形式实现了ISO标准调用SQL-MED。本演示将详细解释PostgreSQL FDW的工作原理。它将包括对简单特性的详细解释,并将介绍在PostgreSQL的最新版本中添加的更高级的特性。这些示例将显示PostgreSQL中聚合下推和连接下推的工作方式。
会谈将包括这些高级功能的工作示例,并演示它们在不同数据库中的使用。这些例子展示了PostgreSQL如何使用来自不同数据库风格的数据,包括来自异构关系数据库的数据,并显示了NoSQL列存储连接。

展开查看详情

1.Join Heterogeneous Databases Using PostgreSQL Foreign Data Wrappers Ibrar Ahmed Senior Database Architect - Percona LLC May 2019

2.Why? Accessing Data From Multiple Sources SELECT * from multiple “Database Engines” and generate results?

3.Application Architecture 1/2 MySQL Module libmysqlclient U PostgreS S QL Libpq Module E R MongoDB Module libmongo-c A P Join P JDBC L Module JDBC I C A JDBC JDBC T Module I O ODBC N Module ODBC

4.SQL-MED - Management of External Data • SQL standard, it is defined by ISO/IEC 9075-9:2008 • SQL/MED provides extensions to SQL that define FDW ( Foreign Data Wrapper) • PostgreSQL start implementing in its core since PostgreSQL Version 9.1 • PostgreSQL community builds PostgreSQL FDW called postgresql_fdw Now there are many FDWs implemented by other people https://wiki.postgresql.org/wiki/Foreign_data_wrappers

5.Application Architecture 2/2 MySQL Module mysql_fdw libmysqlclient U S PostgreS E QL postgres_fdw Libpq Module R P o A MongoDB s P Module t mogo_fdw libmongo-c P g FDW L r I Spark e Module hdfs_fdw JDBC C S A Q T L Hive I Module hdfs_fdw JDBC O N Clickhouse Module file_fdw ODBC

6.Example US States / Cities pg_tbl_states Countries / mysql_tbl_continents Country mysql_tbl_countries Flight Information clickhouse_tbl_ontime

7.Setup mysqldb_fdw (MySQL) CREATE EXTENSION mysqldb_fdw; CREATE SERVER mysql_svr FOREIGN DATA WRAPPER mysqldb_fdw OPTIONS (host '127.0.0.1', port '3306’ CREATE USER MAPPING FOR postgres SERVER mysql_svr OPTIONS (username 'mysql_user', password 'mysql_pass'); CREATE FOREIGN TABLE mysql_tbl_continents CREATE FOREIGN TABLE mysql_tbl_countries ( ( code VARCHAR(2), code VARCHAR(2), name VARCHAR(255) name VARCHAR(255), ) SERVER mysql_svr OPTIONS(dbname ‘db’); full_name VARCHAR(255), iso3 CHAR(3), number INTEGER, continent_code VARCHAR(2) ) SERVER mysql_svr OPTIONS (dbname ‘db’);

8.Setup clickhousedb_fdw (ClickHouse) CREATE EXTENSION clickhousedb_fdw; CREATE SERVER clickhouse_svr FOREIGN DATA WRAPPER mysql_fdw OPTIONS(dbname 'test_database’, driver '/use/lib/libclickhouseodbc.so'); CREATE USER MAPPING FOR postgres OPTIONS (username ‘clickhouse_user’, password ‘clickhouse_pass’); CREATE FOREIGN TABLE clickhouse_tbl_ontime( Year INTEGER, Quarter INTEGER, Month INTEGER, … ) SERVER clickhouse_svr OPTIONS (table_name ‘ontime’);

9.SELECT Data From MySQL Using mysqldb_fdw 1/2 postgres=# SELECT * FROM mysql_tbl_continents; postgres=# SELECT code, name, continent_code code | name FROM mysql_tbl_countries LIMIT 7; ------+--------------- code | name | continent_code AF | Africa ------+----------------------+---------------- AN | Antarctica AD | Andorra | EU AS | Asia AE | United Arab Emirates | AS EU | Europe AF | Afghanistan | AS NA | North America AG | Antigua and Barbuda | NA OC | Oceania AI | Anguilla | NA SA | South America AL | Albania | EU (7 rows) AM | Armenia | AS (7 rows) Data comes from MySQL Database

10.SELECT Data From MySQL Using mysqldb_fdw 2/2 postgres=# SELECT country.code, country.name, continent.name FROM mysql_tbl_continents continent, mysql_tbl_countries country WHERE continent.code = country.continent_code LIMIT 3; code | name | name ------+--------------+-------- AO | Angola | Africa BF | Burkina Faso | Africa BI | Burundi | Africa (3 rows)

11.SELECT Data From Clickhouse Using clickhousedb_fdw postgres=# SELECT a."Year", c1/c2 as value FROM (SELECT "Year", count(*)*1000 as c1 FROM clickhouse_tbl_ontime WHERE "DepDelay">10 GROUP BY "Year") a INNER JOIN (SELECT "Year", count(*) as c2 FROM clickhouse_tbl_ontime GROUP BY "Year" ) b ON a."Year"=b."Year" LIMIT 3; Year | value ------+----------- 1987 | 199 1988 | 654182000 (2 rows)

12.Join ClickHouse, MySQL and PostgreSQL Using FDW postgres=# SELECT "Year",pg.code,"OriginStateName", pg.country_code, my.name FROM clickhouse_tbl_ontime ch LEFT JOIN pg_tbl_states pg ON pg.name = ch."OriginStateName" LEFT JOIN mysql_tbl_countries my ON pg.country_code = my.code LIMIT 3; Year | code | OriginStateName | country_code | name ------+------+-----------------+--------------+-------------------------- 2011 | MO | Missouri | US | United States of America 2011 | MO | Missouri | US | United States of America 2011 | MO | Missouri | US | United States of America (3 rows)

13.EXPLAIN: Join ClickHouse, MySQL and PostgreSQL postgres=# EXPLAIN VERBOSE SELECT "Year", pg.code, "OriginStateName", pg.country_code,my.name FROM clickhouse_tbl_ontime ch LEFT JOIN pg_tbl_states pg ON pg.name = ch."OriginStateName" LEFT JOIN mysql_tbl_countries my ON pg.country_code = my.code limit 3; QUERY PLAN -> Hash Right Join (cost=10.00..1900.21 rows=5000 width=558) Hash Cond: ((pg.name)::text = ch."OriginStateName") -> Nested Loop Left Join (cost=10.00..1899.09 rows=295 width=532) Join Filter: ((pg.country_code)::text = (my.code)::text) -> Seq Scan on public.pg_tbl_states pg (cost=0.00..1.59 rows=59 width=16) -> Materialize (cost=10.00..1015.00 rows=1000 width=528) -> Foreign Scan on public.mysql_tbl_countries my (cost=10.00..1010.00 rows=1000 width=528) Remote query: SELECT `code`, `name` FROM `db`.`mysql_tbl_countries` -> Hash (cost=0.00..0.00 rows=0 width=36) -> Foreign Scan on public.clickhouse_tbl_ontime ch (cost=0.00..0.00 rows=0 width=36) Output: ch."Year", ch."OriginStateName" Remote SQL: SELECT "Year", "OriginStateName" FROM "default".ontime

14.Push Down – A Performance Feature • Operator and function push down • Predicate push down • Aggregate push down • Join push down

15.PostgreSQL Foreign Data Wrapper - JOIN Push Down postgres=# EXPLAIN (VERBOSE, COST off) SELECT * FROM postgres_tbl_name n RIGHT JOIN postgres_tbl_job j ON(j.name_id > n.id); QUERY PLAN Foreign Scan Output: n.id, n.name, j.id, j.job_title, j.name_id Relations: (public.postgres_tbl_job j) LEFT JOIN (public.postgres_tbl_name n) Remote SQL: SELECT r2.id, r2.job_title, r2.name_id, r1.id, r1.name FROM (public.postgres_tbl_job r2 LEFT JOIN public.postgres_tbl_name r1 ON (((r2.name_id > r1.id)))) (4 rows)

16.PostgreSQL Foreign Data Wrapper - Aggregate Push Down postgres=# EXPLAIN VERBOSE SELECT count(*) FROM postgres_tbl_name; QUERY PLAN ------------------------------------------------------------- Foreign Scan (cost=108.53..152.69 rows=1 width=8) Output: (count(*)) Relations: Aggregate on (public.postgres_tbl_name) Remote SQL: SELECT count(*) FROM public.postgres_tbl_name (4 rows) postgres=# EXPLAIN VERBOSE SELECT count(*) FROM mysql_tbl_continents; QUERY PLAN -------------------------------------------------------------------------------------------- Aggregate (cost=1012.50..1012.51 rows=1 width=8) Output: count(*) -> Foreign Scan on public.mysql_tbl_continents (cost=10.00..1010.00 rows=1000 width=0) Output: continent_id, continent_name Local server startup cost: 10 Remote query: SELECT NULL FROM `db`.`mysql_tbl_continents` (6 rows)

17.DML Support • PostgreSQL has DML support • There are a number of Foreign Data Wrappers that support DML such as: ○ postgres_fdw ○ mysql_fdw ○ oracle_fdw ○ etc.

18. ? “Poor leaders rarely ask questions of themselves or others. Good leaders, on the other hand, ask many questions. Great leaders ask the great questions.” Michael Marquardt author of Leading with Questions 18

19.Thank You to Our Sponsors

20.Rate My Session

21.Champions of Unbiased Open Source Database Solutions