Getting Started with PostgreSQL

PostgreSQL无疑是第二大最受欢迎的开源RDBMS,它在db-engines.com中排名前五。为什么不在你最喜欢的数据库服务器上休息一会儿,开始了解更多关于它的信息,并在你的简历上寻找另一个标志呢?
在本课程中,我们将探讨这个RDBMS,并将其与主要竞争对手MySQL进行比较,以同时关联概念并了解Postgres的优势所在。在我们将涉及的主题中:
服务器体系结构 复制和HA
Postgres特定功能 还有更多!

展开查看详情

1.© 2017 Pythian. Confidential 1

2. PYTHIAN A global IT company that helps businesses leverage disruptive technologies to better compete. Our services and software solutions unleash the power of cloud, data and analytics to drive better business outcomes for our clients. Our 20 years in data, commitment to hiring the best talent, and our deep technical and business expertise allow us to meet our promise of using technology to deliver the best outcomes faster. © The Pythian Group Inc., 2019 2018 2

3. 20 Years in Business 400+ Pythian Experts 350+ Current Clients in 35 Countries Globally © 2017 Pythian. Confidential 3

4.DIVERSIFIED, BLUE CHIP CLIENT BASE Media/Information Financial Services Retail E-commerce SaaS Services © The Pythian Group Inc., 2019 2018 4

5.About me ● > 20 years in IT ● 2 kids (5 and 8) and 2 dogs ● 3 Years with Pythian ● 6 Years working with MySQL ● AWS Certified Solutions Architect - Associate ● GCP Certified Professional Cloud Architect © The Pythian Group Inc., 2019 2018 5

6.Getting Started with PostgreSQL by Kevin Markwardt © The Pythian Group Inc., 2018 6

7.Introduction ● My Journey on Learning PostgreSQL ● Looking at it from a MySQL perspective ● Review Major Differences that I have found ● Look at what makes PosgreSQL special ©©The The Pythian Pythian Group Group Inc., Inc., 2017 2018 7

8.Major Differences ● Command Line and Clients: mysql and mysqladmin (MySQL) - psql and pg_ctl (PostgreSQL) ● Client Commands: SHOW DATABASES; (MySQL) - \l (PostgreSQL) ● Database and Schemas: Databases (MySQL) - Schemas (PostgreSQL) ● Users: Users (MySQL) - Roles (PostgreSQL) ● Connections: Threads (MySQL) - Processes (PostgreSQL) ● MVCC - Purge (MySQL) - Vacuuming (PostgreSQL) ● Memory Management: Buffer Pool (MySQL) - Shared Memory (PostgreSQL) ● Replication: Binary Logs (MySQL) - WAL Logs (PostgreSQL) ● File System: File Per Table (MySQL) - PGDATA (PostgreSQL) © The Pythian Group Inc., 2019 2018 8

9.Major Differences - Command Line and Clients(MySQL) ● mysql - Command line client. Default login is with root user with no password from localhost. ● mysqladmin - Administrative client ● mysqldump - Logical backup ● MySQL Workbench - Graphical client © The Pythian Group Inc., 2019 2018 9

10.Major Differences - Command Line and Clients (PostgreSQL) ● psql - Command line client. Default login is done with the postgres user account ● pg_ctl - Administrative client ● pg_dump - Logical backup for one database ● pg_dumpall - Logical backup of all the databases ● pg_basebackup - Binary backup ● pgAdmin - Graphical client © The Pythian Group Inc., 2019 2018 10

11.Major Differences - Client Commands MySQL PostgreSQL ● Client Commands ● Client commands ○ SHOW DATABASES; - List databases ○ \l - List databases ○ use <db_name> - Connect to database ○ \c <db_name> - Connect to database ○ SELECT user, host FROM mysql.user - ○ \du - List users List users ○ \dt - List tables ○ SHOW TABLES; - List tables ○ \d <table> - Describe table ○ DESCRIBE TABLE <table_name> - ○ select * from pg_stat_activity; - Process Describe table list ○ SHOW PROCESS LIST; - Process list ○ show all - Show settings ○ SHOW VARIABLES; - Show settings ○ \? - Show commands available © The Pythian Group Inc., 2019 2018 2017 11

12.Major Differences - Database and Schemas ● MySQL has Databases Only ● PostgreSQL has both Databases and Schemas ○ Schemas add a layer of isolation within a database between tables ○ Tables in different schemas but in the same database can have the same name ○ Used to organize database objects into logical groups ○ “public” is the default schema in a database © The Pythian Group Inc., 2019 2018 12

13.Major Differences - Users (MySQL) ● Users are created with the “CREATE USER” command ● Privileges are granted with the “GRANT” commands ● Restrictions on where users can login from are based upon IP address or hostname ● MySQL 8.0 and MariaDB 10.0 has ROLES that allows permissions to be granted to a role and then users added or removed from the role © The Pythian Group Inc., 2019 2018 13

14.Major Differences - Users (PostgreSQL) ● Users has been depreciated and there are only ROLES ● A ROLE can be a user or a group. A ROLE becomes a user when it is given the LOGIN permission. By default a ROLE is created with NOLOGIN ● Permissions to databases and schemas are granted to roles ● Restricting where a ROLE can login from is done in the pg_hba.conf configuration file. ● File is scanned on each connection request ● Grants to database objects ● Table, column, view, sequence, database, foreign-data wrapper, foreign server, function, procedural language, schema, or tablespace © The Pythian Group Inc., 2019 2018 14

15.Major Differences - Connections (MySQL) Threads ● MySQL runs as a single process ● New connections coming into MySQL spawn a new thread within the MySQL process ● Threads are lightweight ● Each Thread connection is 256k on a 64-bit system by default ● Less CPU in opening and closing threaded connections ● If a thread crashes, the entire MySQL process crashes ● The “SHOW PROCESSLIST” command should actually be called “SHOW THREADLIST” © The Pythian Group Inc., 2019 2018 15

16.Major Differences - Connections (PostgreSQL) Processes ● Postmaster (Daemon) - Is the main process in PostgreSQL. As a new connection comes in it spawns a backend process ● Backend processes - A new process that is spawned from the postmaster process when a new connection is established to handle the requests that ● Memory consumption can be up to 10MB per connection ● When scaling PostgreSQL, connection planning is a very critical component to consider © The Pythian Group Inc., 2019 2018 16

17.Major Differences - MVCC (MySQL) ● Undo logs stores the data changed in a transaction for rollback ● A row is not physically deleted until MySQL discards the update undo log record ● Commit transactions regularly, or InnoDB cannot discard data from the update undo logs ● Old data in tables is automatically cleaned up with the purge process that is run after commits © The Pythian Group Inc., 2019 2018 2017 17

18.Major Differences - MVCC (PostgreSQL) ● Deleted records are marked with a transaction_id and removed from view ● Updates insert a new row and the old row is marked with a transaction_id ● Unused data is still in the table, known as tuples ● A VACUUM process goes through the tables and removes the dead tuples. ● VACUUM FULL command can reclaim disk space, but locks the table ● pg_repack (previously pg_reorg) does an online VACUUM FULL without holding an exclusive lock ● Recommended blog https://www.percona.com/blog/2018/08/06/basic-unde rstanding-bloat-vacuum-postgresql-mvcc/ © The Pythian Group Inc., 2019 2018 2017 18

19.Major Differences - Memory Management (MySQL) InnoDB buffer pool - Stores many in-memory data structures of InnoDB, buffers, caches, indexes and even row-data (Usually 80% of the system memory) InnoDB log buffer - Allows transactions to run without having to write the log to disk before the transactions commit © The Pythian Group Inc., 2019 2018 2017 19

20.Major Differences - Memory Management (PostgreSQL) ● Backend Process - Created by the primary PostgreSQL server process to handle client queries ● Each client connection creates a backend process ● Repeated connections are handled by outside tool like pg_bouncer ● Shared Memory - Used by all processes of a PostgreSQL server ● Determines how much memory is dedicated to PostgreSQL to use for caching data and other processes ● Usually 25% to 40% of system memory © The Pythian Group Inc., 2019 2018 20

21.Major Differences - Replication (MySQL) ● Uses binary logs which are logs of changes that have been made to the database ● binary log data is replicated using Row, Statement, or Mixed replication formats ● Many types of replication configurations such as Master/Slave, Master/Master, Percona XtraDB or Galera Clustering, Multi Master, and more ● Replication assumes all changes are to be replicated to slaves unless filters are put into place © The Pythian Group Inc., 2019 2018 21

22.Major Differences - Streaming Replication (Postgres) ● Before PostgreSQL 10 there was only streaming replication ● Streaming replication implements file-based log shipping by transferring WAL records one file (WAL segment) at a time ● If replication breaks or a failover occurs, the slave needs to be rebuilt ● Primary and standby nodes must be the same version ● Everything must be replicated, no filtering © The Pythian Group Inc., 2019 2018 2017 22

23.Major Differences - Logical Replication (PostgreSQL) ● PostgreSQL 10 and greater introduces logical replication ● Data is replicated from a publisher that has a publication to subscribers that have a subscription ● Publications can be configured on a subset of data ● When subscription is created, the data is then backed up from publisher and restored to the subscriber ● Changes are then replicated using WAL logs ● A publication can have multiple subscriptions from different servers ● Cross version support and can be used for migrations and upgrades ● Multi source replication for consolidation ● Truncate and DDL such as ALTER TABLE are not supported in replication ● UPDATE and DELETE operations on missing data will be automatically skipped © The Pythian Group Inc., 2019 2018 23

24.Major Differences - File System (MySQL) ● Data Files are located in the data directory ● Logs are also stored in the data directory unless configured differently ● File per table config allows for tables to be stored individually on disk instead of shared table space ● Folder structure is similar to the layout within the database. Each database is a folder, and the tables are files within that folder ● Partitioning a table splits the table IBD file across multiple files ● /var/lib/mysql (Default path) ● Database ■ Table IBD (table data file) ■ Table FRM (table definition file) © The Pythian Group Inc., 2019 2018 24

25.Major Differences - File System (PostgreSQL) ● The data directory is a known as a database cluster. Which is a single directory under which all data will be stored. ● SHOW DATA_DIRECTORY; can be used to display the data directory ● /var/lib/pgsql/10/data ● SELECT pg_relation_filepath('people'); can be used to find the path of a table ● base/16385/16388 ● Using this information you can then see things like file size on disk by looking at the file path /var/lib/pgsql/10/data/base/16385/16388 ● After 1 GB or configured size, tables are split across multiple files. adjusted using the configuration option --with-segsize when building PostgreSQL © The Pythian Group Inc., 2019 2018 25

26.Minor Differences to Note ● PostgreSQL has a true Boolean data type where MySQL you have to use TinyINT with a 0 and 1 ● PostgreSQL has SERIAL where MySQL has AUTO_INCREMENT ● MySQL is not case specific for string comparison. PostgreSQL is ● PostgreSQL is truly open source where MySQL is owned by Oracle and has components that need to be purchased ● MySQL is a relational database (RDB) where PostgreSQL is an object relational database (ORD), allowing for PostgreSQL to be customizable and extensible ● And many more …. © The Pythian Group Inc., 2019 2018 26

27.PostgreSQL specific features ● PostgreSQL is a developers dream database ● PostgreSQL has Procedural Languages PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python ● Has Data domain objects for constraints across tables ● PostgreSQL has table inheritance ● Has materialized views, used in data warehousing ● Is Highly extensible, so you can add your own data types, operators, index types, and functional languages. You can even override different parts of the system, using plugins to alter the execution of commands, or add a new optimizer. © The Pythian Group Inc., 2019 2018 27

28.References ● https://www.covermymeds.com/main/insights/articles/command-equivalents-in-postgres-co ming-from-mysql/ ● http://blog.dumper.io/showdown-mysql-8-vs-postgresql-10/ ● https://www.yumpu.com/en/document/read/53683323/migrating-uber-from-mysql-to-postgr esql ● https://eng.uber.com/mysql-migration/ ● https://xuri.me/2014/11/10/fix-error-innodb-the-innodb-memory-heap-is-disabled.html ● https://dev.mysql.com/doc/refman/5.7/en/innodb-architecture.html ● https://www.datadoghq.com/blog/postgresql-monitoring/ ● http://www.interdb.jp/pg/pgsql02.html ● https://severalnines.com/blog/architecture-and-tuning-memory-postgresql-databases ● https://severalnines.com/blog/postgresql-streaming-replication-deep-dive ● https://severalnines.com/blog/postgresql-streaming-replication-vs-logical-replication ● https://www.postgresql.org/docs/10/creating-cluster.html ● https://dev.mysql.com/doc/refman/5.7/en/innodb-multi-versioning.html © The Pythian Group Inc., 2019 2018 28

29.Thank You © The Pythian Group Inc., 2018 29