PostgreSQL 101

PostgreSQL是一个完全由社区驱动的高级开源数据库软件。
理解它在概念上与MySQL有多大的不同,以及它与MySQL的不同之处也很重要。
本文首先简要概述了PostgreSQL以及PostgreSQL和MySQL之间的概念差异。

展开查看详情

1.PostgreSQL 101 Avinash Vallarapu (Avi)
 Fernando Laudares
 Percona

2. History of PostgreSQL Ingres Year 1973 - INGRES (INteractive GRaphics Retrieval System), work on one of the world's first RDBMS was Started by Eugene Wong and Michael Stonebraker at University of California at Berkeley. Year 1979 - Oracle Database first version was released. Early 1980’s - INGRES used QUEL as its preferred Query Language. Whereas Oracle used SQL. Ingres lost its Market dominance to Oracle as it was too late for IngreS to adopt SQL as a Preferred Query Language as opposed to QUEL. Year 1985 - UC Berkeley INGRES research project officially ended. Postgres Year 1986 - Postgres was introduced as a Post-Ingres evolution aimed to incorporate ORDBMS. Postgres used POSTQUEL as its query language until 1994 Year 1995 - Postgres95 replaced Postgres with its support for SQL as a query language. - Andrew Yu and Jolly Chen(PhD students from Stonebraker’s lab). PostgreSQL Year 1996 - Project renamed to PostgreSQL to reflect the original name Postgres and its SQL Compatibility. Year 1997 - PostgreSQL first version - PostgreSQL 6.0 released. 2 © 2018 Percona

3. PostgreSQL Features ▪ Portable ▪ Written in C ▪ Flexible across all the UNIX platforms, Windows, MacOS and others. ▪ World’s most advanced open source database. Community driven. ▪ ANSI/ISO Compliant SQL support.
 ▪ Reliable ▪ ACID Compliant ▪ Supports Transactions ▪ Uses Write Ahead Logging ▪ Scalable ▪ MVCC ▪ Table Partitioning ▪ Tablespaces ▪ FDWs ▪ Sharding 3 © 2018 Percona

4. PostgreSQL Advanced Features ▪ Security ▪ Host-Based Access Control ▪ Object-Level and Row-Level Security ▪ Logging and Auditing ▪ Encryption using SSL
 ▪ High Availability ▪ Synchronous/Asynchronous Replication and Delayed Standby ▪ Cascading Replication ▪ Online Consistent Physical Backups and Logical Backups ▪ PITR ▪ Logical Replication ▪ Other Features ▪ Triggers and Functions/Stored Procedures ▪ Custom Stored Procedural Languages like PL/pgSQL, PL/perl, PL/TCL, PL/php, PL/python, PL/java. ▪ PostgreSQL Major Version Upgrade using pg_upgrade ▪ Unlogged Tables ▪ Materialized Views ▪ Hot Standby - Slaves accept Reads ▪ Parallel Query and parallel partition scan 4 © 2018 Percona

5. PostgreSQL Cluster ▪ After Initializing your PostgreSQL using initdb(similar to mysqld --initialize) and starting it, you can create multiple databases in it. 
 ▪ A group of databases running on one Server & One Port - Is called a Cluster in PostgreSQL. 
 ▪ PostgreSQL Cluster may be referred to as a PostgreSQL Instance as well.
 ▪ A PostgreSQL Cluster or an Instance : ▪ Servers only one TCP/IP Port ▪ Has a Dedicated Data Directory ▪ Contains 3 default databases : postgres, template0 and template1. ▪ When you add a Slave(aka Standby) to your PostgreSQL Cluster(Master), it may be referred to as a PostgreSQL High Availability Cluster or a PostgreSQL Replication Cluster.
 ▪ PostgreSQL Cluster that can accept Writes and ships WALs to Slave(Standby), is called a Master. 5 © 2018 Percona

6. PostgreSQL Database & Schema ▪ A PostgreSQL Database can contain one or more Schemas. Default Schema is - public schema. 
 ▪ A Schema is a logical entity used to group objects together. An example : A Folder/Directory that contains Tables, Index and other objects as files. 
 ▪ A Database can be related to a Parent Folder/Directory that contains one or more Schemas. 
 ▪ You can always have more than 1 Database with one or more Schemas in it.
 ▪ A Schema in PostgreSQL helps you group objects of a certain Application logic together. This helps you create multiple objects with the same name in one Database. 
 For example : In a Database named percona, A Table employee can exist in both full_time and contractor schemas. Database : percona Schema(s) : full_time & contractor Tables : 1. full_time.employee 2. contractors.employee ▪ A Fully Qualified Table Name : schemaname.tablename must be used to query a particular Table in a Schema. 
 For example : select * from full_time.employee where salary > 10000; 6 © 2018 Percona

7. PostgreSQL ACID Compliance ▪ Atomicity : Transactions. Either All or Nothing. BEGIN …SQL1, SQL2, …SQLn…..COMMIT/ROLLBACK/END.
 ▪ Consistency : Give me a consistent picture of the data based on Isolation Levels. 
 Let us see the following example when Isolation Level is READ_COMMITTED
 
 Query 1 : select count(*) from employees;
 
 9am : Records in employee table : 10000
 9:10 am : Query 1 Started by User 1
 9:11am : 2 employee records deleted by User 2.
 9:12am : Query 1 that was started by User 1 Completed.
 
 Result of Query 1 at 9:12am would still be 10000. A Consistent image as how it was at 9:00am.
 ▪ Isolation : Prevent Concurrent data access through Locking. 
 ▪ Durability : Once the Data is committed, it must be safe. 
 Through WAL’s, fsync, synchronous_commit, Replication.
 
 7 © 2018 Percona

8. PostgreSQL Terminology ▪ PostgreSQL was designed in academia
 ▪ Objects are defined in academic terms ▪ Terminology based on relational calculus/algebra 8 © 2018 Percona

9. Client Architecture Applications connect to Database and send SQL’s to interact with the Database. Client-side APIs are needed to send SQL’s and receive the results. ▪ libpq : ▪ C application programmer’s interface to PostgreSQL. ▪ libpq is is a set of library functions that allow client programs to pass queries to the PostgreSQL backend server and to receive the results of these queries. ▪ Along with C, other PostgreSQL application interfaces such as C++, Perl, Python, Tcl and ECPG uses libpq.
 ▪ JDBC : ▪ Java, Client side API 9 © 2018 Percona

10. PostgreSQL Server Multi-Process Architecture. ▪ Postmaster (Parent PostgreSQL Process)
 ▪ Backend Utility Processes
 ▪ Per-Connection backend processes
 ▪ Every Connection is a Process. 10 © 2018 Percona

11. PostgreSQL Installation using rpm’s on RedHat/CentOS/OEL PGDG Repository : PostgreSQL Global Development Group maintains YUM and APT repository of PostgreSQL for the linux platforms. One of the most easiest and the desired methods is to install PostgreSQL using rpm’s from PGDG repo. For YUM https://yum.postgresql.org For APT https://apt.postgresql.org/pub/repos/apt/ Step 1 : Choose the appropriate rpm that adds pgdg repo to your server. Please make sure to choose the desired PostgreSQL version and the OS version appropriately. Install the pgdg repo rpm using YUM. # yum install https://yum.postgresql.org/11/redhat/rhel-7.5-x86_64/pgdg-centos11-11-2.noarch.rpm Step 2 : Install PostgreSQL using the following step. # yum install postgresql11 postgresql11-contrib postgresql11-libs postgresql11-server 11 © 2018 Percona

12. PostgreSQL Architecture 12 © 2018 Percona

13.13 © 2018 Percona

14. Background Utility Processes 14 © 2018 Percona

15. Process Components ▪ Postmaster :
 ▪ Master database control process. ▪ Responsible for startup & shutdown ▪ Spawning other necessary backend processes
 ▪ Postgres backend :
 ▪ Dedicated, per-connection server process ▪ Responsible for fetching data from disk and communicating with the client 15 © 2018 Percona

16. Utility Processes ▪ BGWriter :
 ▪ Background Writer ▪ Writes/Flushes dirty data blocks to disk
 ▪ WAL Writer :
 ▪ Writes WAL Buffers to Disk. ▪ WAL Buffers are written to WALs(Write-Ahead Logs) on the Disk. 
 ▪ Autovacuum :
 ▪ Starts Autovacuum worker processes to start a vacuum and analyze
 ▪ Checkpointer :
 ▪ Perform a CHECKPOINT that ensures that all the changes are flushed to Disk ▪ Depends on configuration parameters. 16 © 2018 Percona

17. Utility Processes (Cont.d) ▪ Archiver :
 ▪ Archives Write-Ahead-Logs ▪ Used for High Availability, Backups, PITR
 ▪ Logger : 
 ▪ Logs messages, events, error to syslog or log files. ▪ Errors, slow running queries, warnings,..etc. are written to log files by this Process.
 ▪ Stats Collector :
 ▪ Collects statistics of Relations. ▪ Similar to ANALYZE in MySQL 17 © 2018 Percona

18. Utility Processes (Cont.d) ▪ WAL Sender :
 ▪ Sends WALs to Replica(s). ▪ One WAL Sender for each Slave connected for Replication.
 ▪ WAL Receiver : 
 ▪ Started on a Slave(aka Standby or Replica) in Replication ▪ Streams WALs from Master
 ▪ bgworker : 
 ▪ PostgreSQL is extensible to run user-supplied code in separate processes that are monitored by Postgres. ▪ Such processes can access PostgreSQL's shared memory area ▪ Connect as a Client using libpq
 ▪ bgworker: logical replication launcher
 ▪ Logical Replication between a Publisher and a Subscriber 18 © 2018 Percona

19. Memory Components ▪ Shared Buffers
 ▪ PostgreSQL Database Memory Area ▪ Shared by all the Databases in the Cluster ▪ Pages are fetched from Disk to Shared Buffers during Reads/Writes ▪ Modified Buffers are also called as Dirty Buffers ▪ Parameter : shared_buffers sets the amount of RAM allocated to shared_buffers ▪ Uses LRU Algorithm to flush less frequently used buffers. ▪ Dirty Buffers written to disk after a CHECKPOINT.
 ▪ WAL Buffers : 
 ▪ Stores Write Ahead Log Records ▪ Contains the change vector for a buffer being modified. ▪ WAL Buffers written to WAL Segments(On Disk).
 ▪ work_mem :
 ▪ Memory used by each Query for internal sort operations such as ORDER BY and DISTINCT. ▪ Postgres writes to disk(temp files) if memory is not sufficient. 19 © 2018 Percona

20. Memory Components (Cont.d) ▪ maintenance_work_mem
 ▪ Amount of RAM used by VACUUM, CREATE INDEX, REINDEX like maintenance operations. ▪ Setting this to a bigger value can help in faster database restore. 20 © 2018 Percona

21. Disk Components ▪ Data Directory
 ▪ In MySQL, Data Directory is created when you initialize your MySQL Instance. ▪ Initialized using initdb in PostgreSQL. Similar to mysqld --initialize ▪ Contains Write-Ahead-Logs, Log Files, Databases, Objects and other configuration files. ▪ You can move WAL’s and Logs to different directories using symlinks and parameters. ▪ Environment Variable : $PGDATA
 ▪ Configuration Files inside the Data Directory ▪ postgresql.conf (Similar to my.cnf file for MySQL). ▪ Contains several configurable parameters. ▪ pg_ident.conf ▪ pg_hba.conf ▪ postgresql.auto.conf 21 © 2018 Percona

22. What’s inside the Data Directory ? 22 © 2018 Percona

23. Configuration Files inside the Data Directory ▪ PG_VERSION ▪ Version String of the Database Cluster
 ▪ pg_hba.conf ▪ Host-Based access control file (built-in firewall)
 ▪ pg_ident.conf ▪ ident-based access file for OS User to DB User Mapping
 ▪ postgresql.conf ▪ Primary Configuration File for the Database
 ▪ postmaster.opts ▪ Contains the options used to start the PostgreSQL Instance
 ▪ postmaster.pid ▪ The Parent Process ID or the Postmaster Process ID 23 © 2018 Percona

24. Base Directory(base) 24 © 2018 Percona

25. Tables and Indexes inside Base Directory(base) 25 © 2018 Percona

26. Base Directory & Datafiles on Disk ▪ Base Directory ▪ Contains Sub-Directories for every Database you create ▪ Every Database Sub-Directory contains files for every Relation/Object created in the Database.
 ▪ Datafiles ▪ Datafiles are the files for Relations in the base directory ▪ Base Directory contains Relations. ▪ Relations stored on Disk as 1GB segments. ▪ Each 1GB Datafile is made up of several 8KB Pages that are allocated as needed. ▪ Segments are automatically added unlike Oracle. 26 © 2018 Percona

27. Write Ahead Logs(WALs) ▪ WALs ▪ When Client commits a transaction, it is written to WAL Segments (on Disk) before a success message is sent to Client. ▪ Transaction Journal aka REDO Logs. Similar to InnoDB Buffers in MySQL. ▪ Written by WAL Writer background process. ▪ Ensures Durability with fsync and synchronous_commit set to ON and commit_delay set to 0. ▪ Used during Crash Recovery. ▪ Size of each WAL is 16MB. Modifiable during Initialization. ▪ Created in pg_xlog directory until PostgreSQL 9.6. ▪ Location of WALs is renamed to pg_wal in PostgreSQL 10 ▪ WAL Directory exits in Data Directory by default. Can be modified using Symlinks. ▪ WALs are deleted depending on the parameters : wal_keep_segments and checkpoint_timeout. 27 © 2018 Percona

28. Archived Logs ▪ Archived WALs ▪ Archiving in PostgreSQL can be enabled through parameters : archive_mode and archive_command. ▪ Ships WALs to safe locations like a Backup Server or Cloud Storage like S3 or Object Store. ▪ WALs are archived by archiver background process. ▪ archive_command can be set with the appropriate shell command to archive WALs. 28 © 2018 Percona

29. Reading Data from PostgreSQL 29 © 2018 Percona