Introduction to PostgreSQL for Oracle and MySQL DBAs

本教程的重点是向对Oracle或MySQL有经验的专业人员教授PostgreSQL。正如我们从基础知识中讨论的,即使是初学者也可以很容易地适应这些概念。关系数据库有许多相似的概念。如果我们帮助您将PostgreSQL的概念与Oracle/MySQL联系起来,您会发现这很容易。因此,我们将从用于Oracle/MySQL DBA的PostgreSQL体系结构开始。 我们在本教程中采用的方法是从MySQL和Oracle的角度概述PostgreSQL的架构和操作。我们将讲座与实际工作相结合,让听众将我们在实践中提出的概念以实验室的形式应用,然后再介绍新的概念,从WI开始。这是一个简单而快速的PostgreSQL安装,并一直在建立和维护一个备用副本。当天结束时,与会者将对PostgreSQL的工作方式有一个很好的了解,并有足够的信心自己尝试。
PostgreSQL简史及其特点 针对MySQL/Oracle DBA和初学者的PostgreSQL体系结构。
PostgreSQL安装 创建/启动/停止PostgreSQL集群和数据目录的组件
需要知道的PostgreSQL参数/设置 PostgreSQL用户管理
PostgreSQL中的MVCC PostgreSQL中的真空和事务ID环绕
索引类型 在PostgreSQL中解释
逻辑和物理备份和pitr 流式复制
故障转移 PostgreSQL 10中的声明性分区以及PostgreSQL 11中的进一步特性。


1. Introduction to PostgreSQL for Oracle and MySQL DBAs Avinash Vallarapu Percona

2.The History of PostgreSQL Ingres Year 1973 - INGRES (INteractive GRaphics Retrieval System) work on one of the world's first RDBMS was started by Eugene Wong and Michael Stonebraker at University of California at Berkeley Year 1979 - Oracle Database first version was released Early 1980’s - INGRES used QUEL as its preferred Query Language. Whereas Oracle used SQL Year 1985 - UC Berkeley INGRES research project officially ended Postgres Year 1986 - Postgres was introduced as a Post-Ingres evolution. Used POSTQUEL as its query language until 1994 Year 1995 - Postgres95 replaced Postgres with its support for SQL as a query language PostgreSQL Year 1996 - Project renamed to PostgreSQL to reflect the original name Postgres and its SQL Compatibility Year 1997 - PostgreSQL first version - PostgreSQL 6.0 released 2

3.PostgreSQL Features ● Portable ○ Written in C ○ Flexible across all the UNIX platforms, Windows, MacOS and others ○ World’s most advanced open source database. Community-driven ○ ANSI/ISO Compliant SQL support ● Reliable ○ ACID Compliant ○ Supports Transactions ○ Uses Write Ahead Logging ● Scalable ○ MVCC ○ Table Partitioning ○ Tablespaces ○ FDWs ○ Sharding 3

4.PostgreSQL Advanced Features ● Security ○ Host-Based Access Control ○ Object-Level and Row-Level Security ○ Logging and Auditing ○ Encryption using SSL ● High Availability ○ Synchronous/Asynchronous Replication and Delayed Standby ○ Cascading Replication ○ Online Consistent Physical Backups and Logical Backups ○ PITR ● Other Features ○ Triggers and Functions/Stored Procedures ○ Custom Stored Procedural Languages like PL/pgSQL, PL/perl, PL/TCL, PL/php, PL/python, PL/java. ○ PostgreSQL Major Version Upgrade using pg_upgrade ○ Unlogged Tables, Parallel Query, Native Partitioning, FDWs ○ Materialized Views 4 ○ Hot Standby - Slaves accept Reads

5.PostgreSQL Cluster ● After Initializing your PostgreSQL using initdb (similar to mysqld --initialize) and starting it, you can create multiple databases in it ● A group of databases running on one Server & One Port - Is called a Cluster in PostgreSQL ● PostgreSQL Cluster may be referred to as a PostgreSQL Instance as well ● A PostgreSQL Cluster or an Instance: ○ Serves only one TCP/IP Port ○ Has a Dedicated Data Directory ○ Contains 3 default databases: postgres, template0 and template1 ● When you add a Slave(aka Standby) to your PostgreSQL Cluster(Master), it may be referred to as a PostgreSQL High Availability Cluster or a PostgreSQL Replication Cluster ● PostgreSQL Cluster that can accept Writes and ships WALs to Slave(Standby), is called a Master 5

6.PostgreSQL Database and Schema ● A PostgreSQL Database can contain one or more Schemas ● Default Schema is - public schema ● A Schema in PostgreSQL is a logical entity that helps you group objects of a certain Application logic together. This helps you create multiple objects with the same name in one Database ● A Database can be related to a Parent Folder/Directory.You can always have more than 1 Database with one or more Schemas in it ● For example: In a Database named percona, a Table employee can exist in both scott and tiger schemas Database: percona Schema(s): scott & tiger Tables: 1. scott.employee 2. tiger.employee ● A Fully Qualified Table Name: schemaname.tablename must be used to query a particular Table in a Schema For example: select * from scott.employee where salary > 10000; 6

7.PostgreSQL ACID Compliance ● Atomicity: Transactions. Either All or Nothing BEGIN …SQL1, SQL2, …SQLn…..COMMIT/ROLLBACK/END ● Consistency: Give me a consistent picture of the data based on Isolation Levels Let us see the following example when Isolation Level is READ_COMMITTED Query 1 : select count(*) from employees; 9am: Records in employee table: 10000 9:10 am: Query 1 Started by User 1 9:11am: 2 employee records deleted by User 2 9:12am: Query 1 that was started by User 1 Completed Result of Query 1 at 9:12am would still be 10000. A Consistent image as how it was at 9:00am ● Isolation: Prevent Concurrent data access through Locking ● Durability: Once the Data is committed, it must be safe Through WAL’s, fsync, synchronous_commit, Replication 7

8.PostgreSQL Terminology ● PostgreSQL was designed in academia ○ Objects are defined in academic terms ○ Terminology based on relational calculus/algebra 8

9.PostgreSQL Installation

10.PostgreSQL Installation Using RPM’s PGDG Repository : PostgreSQL Global Development Group maintains YUM and APT repository For YUM For APT Step 1: Choose the appropriate rpm that adds pgdg repo to your server # yum install Step 2: Install PostgreSQL using the following step # yum install postgresql11 postgresql11-contrib postgresql11-libs postgresql11-server 10

11.Initialize Your First PostgreSQL Cluster ● initdb is used to Initialize a PostgreSQL cluster $ echo "PATH=/usr/pgsql-11/bin:$PATH">>~/.bash_profile $ source .bash_profile $ echo $PGDATA /var/lib/pgsql/11/data $initdb --version initdb (PostgreSQL) 11.0 $ initdb 11


13.Starting and Stopping a PostgreSQL • PostgreSQL can be stopped and started from command line using pg_ctl ○ Starting PostgreSQL ▪ pg_ctl -D $PGDATA start ○ Stopping PostgreSQL ▪ pg_ctl -D $PGDATA stop 13

14.Shutdown Modes in PostgreSQL ● -ms (Smart Mode - Default mode) ○ Waits for all connections to exit and does not allow new transactions ○ Committed transactions applied to Disk through a CHECKPOINT before shutdown ○ May take more time on busy systems $ pg_ctl -D $PGDATA stop -ms ● -mf (Fast Mode - Recommended on Busy Systems) ○ Closes/Kills all the open transactions and does not allow new transactions. SIGTERM is sent to server processes to exit promptly ○ Committed transactions applied to Disk through a CHECKPOINT before shutdown ○ Recommended on Busy Systems $ pg_ctl -D $PGDATA stop -mf ● -mi (Immediate Mode - Forced and Abnormal Shutdown during Emergencies) ○ SIGQUIT is sent to all the processes to exit immediately, without properly shutting down ○ Requires Crash Recovery after Instance Start ○ Recommended in Emergencies $ pg_ctl -D $PGDATA stop -mi 14

15.psql and shortcuts ● Connect to your PostgreSQL using psql ▪ $ psql List the databases \l \l + (Observe the difference) To connect to your database \c dbname List Objects \dt -> List all the tables \dn -> List all the schemas ▪ Show all backslash (shortcut) commands \? 15

16.PostgreSQL Architecture

17.PostgreSQL Server ● Multi-Process Architecture ○ Postmaster (Parent PostgreSQL Process) ○ Backend Utility Processes ○ Per-Connection backend processes 17

18.Background Utility Processes Start your PostgreSQL Instance and see the Postgres processes 18


20.PostgreSQL Components ● Postmaster: ○ Master database control process ○ Responsible for startup and shutdown ○ Spawning other necessary backend processes 20

21.Utility Processes • BGWriter: ○ Background Writer ○ Writes/Flushes dirty data blocks to disk • WAL Writer: ○ Writes WAL Buffers to Disk ○ WAL Buffers are written to WALs(Write-Ahead Logs) on the Disk • Autovacuum: ○ Starts Autovacuum worker processes to start a vacuum and analyze • Checkpointer: ○ Perform a CHECKPOINT that ensures that all the changes are flushed to Disk ○ Depends on configuration parameters 21

22.Utility Processes • Archiver: ○ Archives Write-Ahead-Logs ○ Used for High Availability, Backups, PITR • Logger: ○ Logs messages, events, error to syslog or log files. ○ Errors, slow running queries, warnings,..etc. are written to log files by this Process • Stats Collector: ○ Collects statistics of Relations. 22

23.Utility Processes • WAL Sender: • Sends WALs to Replica(s) • One WAL Sender for each Slave connected for Replication • WAL Receiver: • Started on a Slave(aka Standby or Replica) in Replication • Streams WALs from Master • bgworker: • PostgreSQL is extensible to run user-supplied code in separate processes that are monitored by Postgres • Such processes can access PostgreSQL's shared memory area • Connect as a Client using libpq • bgworker: logical replication launcher • Logical Replication between a Publisher and a Subscriber 23

24.Memory Components ● Shared Buffers: ○ PostgreSQL Database Memory Area ○ Shared by all the Databases in the Cluster ○ Pages are fetched from Disk to Shared Buffers during Reads/Writes ○ Modified Buffers are also called as Dirty Buffers ○ Parameter : shared_buffers sets the amount of RAM allocated to shared_buffers ○ Uses LRU Algorithm to flush less frequently used buffers ○ Dirty Buffers written to disk after a CHECKPOINT ● WAL Buffers: ○ Stores Write Ahead Log Records ○ Contains the change vector for a buffer being modified ○ WAL Buffers written to WAL Segments(On Disk) ● work_mem: ○ Memory used by each Query for internal sort operations such as ORDER BY and DISTINCT ○ Postgres writes to disk(temp files) if memory is not sufficient 24

25.Memory Components • maintenance_work_mem: ○ Amount of RAM used by VACUUM, CREATE INDEX, REINDEX like maintenance operations ○ Setting this to a bigger value can help in faster database restore 25

26.PostgreSQL Does Not Use Direct IO ● When it needs a Page(Data Block), it searches it’s own memory aka Shared Buffers ● If not found in shared buffers, it will request the OS for the same block ● The OS fetches the block from the Disk and gives it to Postgres, if the block is not found in OS Cache ● More important to Caching when Database and Active Data set cannot fit in memory 26

27.Disk Components • Data Directory ○ In MySQL, Data Directory is created when you initialize your MySQL Instance ○ Initialized using initdb in PostgreSQL. Similar to mysqld --initialize ○ Contains Write-Ahead-Logs, Log Files, Databases, Objects and other configuration files ○ You can move WAL’s and Logs to different directories using symlinks and parameters ○ Environment Variable: $PGDATA • Configuration Files inside the Data Directory ○ postgresql.conf (Similar to my.cnf file for MySQL) ○ Contains several configurable parameters ○ pg_ident.conf ○ pg_hba.conf ○ 27

28.What’s Inside Data Directory? 28

29.Configuration Files Inside Data Directory? ● PG_VERSION ○ Version String of the Database Cluster ● pg_hba.conf ○ Host-Based access control file (built-in firewall) ● pg_ident.conf ○ ident-based access file for OS User to DB User Mapping ● postgresql.conf ○ Primary Configuration File for the Database ● postmaster.opts ○ Contains the options used to start the PostgreSQL Instance ● ○ The Parent Process ID or the Postmaster Process ID 29