Introduction to PostgreSQL for Oracle and MySQL DBAs

这是一个分为两部分的辅导课程的第一部分,上午和下午。
本教程的重点是向对Oracle或MySQL有经验的专业人员教授PostgreSQL。正如我们从基础知识中讨论的,即使是初学者也可以很容易地适应这些概念。关系数据库有许多相似的概念。如果我们帮助您将PostgreSQL的概念与Oracle/MySQL联系起来,您会发现这很容易。因此,我们将从用于Oracle/MySQL DBA的PostgreSQL体系结构开始。 我们在本教程中采用的方法是从MySQL和Oracle的角度概述PostgreSQL的架构和操作。我们将讲座与实际工作相结合,让听众将我们在实践中提出的概念以实验室的形式应用,然后再介绍新的概念,从WI开始。这是一个简单而快速的PostgreSQL安装,并一直在建立和维护一个备用副本。当天结束时,与会者将对PostgreSQL的工作方式有一个很好的了解,并有足够的信心自己尝试。
以下是本教程中介绍的要点:
PostgreSQL简史及其特点 针对MySQL/Oracle DBA和初学者的PostgreSQL体系结构。
PostgreSQL安装 创建/启动/停止PostgreSQL集群和数据目录的组件
需要知道的PostgreSQL参数/设置 PostgreSQL用户管理
PostgreSQL中的MVCC PostgreSQL中的真空和事务ID环绕
索引类型 在PostgreSQL中解释
逻辑和物理备份和pitr 流式复制
故障转移 PostgreSQL 10中的声明性分区以及PostgreSQL 11中的进一步特性。

展开查看详情

1. Introduction to PostgreSQL for Oracle and MySQL DBAs Avinash Vallarapu Percona

2.The History of PostgreSQL Ingres Year 1973 - INGRES (INteractive GRaphics Retrieval System) work on one of the world's first RDBMS was started by Eugene Wong and Michael Stonebraker at University of California at Berkeley Year 1979 - Oracle Database first version was released Early 1980’s - INGRES used QUEL as its preferred Query Language. Whereas Oracle used SQL Year 1985 - UC Berkeley INGRES research project officially ended Postgres Year 1986 - Postgres was introduced as a Post-Ingres evolution. Used POSTQUEL as its query language until 1994 Year 1995 - Postgres95 replaced Postgres with its support for SQL as a query language PostgreSQL Year 1996 - Project renamed to PostgreSQL to reflect the original name Postgres and its SQL Compatibility Year 1997 - PostgreSQL first version - PostgreSQL 6.0 released 2

3.PostgreSQL Features ● Portable ○ Written in C ○ Flexible across all the UNIX platforms, Windows, MacOS and others ○ World’s most advanced open source database. Community-driven ○ ANSI/ISO Compliant SQL support ● Reliable ○ ACID Compliant ○ Supports Transactions ○ Uses Write Ahead Logging ● Scalable ○ MVCC ○ Table Partitioning ○ Tablespaces ○ FDWs ○ Sharding 3

4.PostgreSQL Advanced Features ● Security ○ Host-Based Access Control ○ Object-Level and Row-Level Security ○ Logging and Auditing ○ Encryption using SSL ● High Availability ○ Synchronous/Asynchronous Replication and Delayed Standby ○ Cascading Replication ○ Online Consistent Physical Backups and Logical Backups ○ PITR ● Other Features ○ Triggers and Functions/Stored Procedures ○ Custom Stored Procedural Languages like PL/pgSQL, PL/perl, PL/TCL, PL/php, PL/python, PL/java. ○ PostgreSQL Major Version Upgrade using pg_upgrade ○ Unlogged Tables, Parallel Query, Native Partitioning, FDWs ○ Materialized Views 4 ○ Hot Standby - Slaves accept Reads

5.PostgreSQL Cluster ● After Initializing your PostgreSQL using initdb (similar to mysqld --initialize) and starting it, you can create multiple databases in it ● A group of databases running on one Server & One Port - Is called a Cluster in PostgreSQL ● PostgreSQL Cluster may be referred to as a PostgreSQL Instance as well ● A PostgreSQL Cluster or an Instance: ○ Serves only one TCP/IP Port ○ Has a Dedicated Data Directory ○ Contains 3 default databases: postgres, template0 and template1 ● When you add a Slave(aka Standby) to your PostgreSQL Cluster(Master), it may be referred to as a PostgreSQL High Availability Cluster or a PostgreSQL Replication Cluster ● PostgreSQL Cluster that can accept Writes and ships WALs to Slave(Standby), is called a Master 5

6.PostgreSQL Database and Schema ● A PostgreSQL Database can contain one or more Schemas ● Default Schema is - public schema ● A Schema in PostgreSQL is a logical entity that helps you group objects of a certain Application logic together. This helps you create multiple objects with the same name in one Database ● A Database can be related to a Parent Folder/Directory.You can always have more than 1 Database with one or more Schemas in it ● For example: In a Database named percona, a Table employee can exist in both scott and tiger schemas Database: percona Schema(s): scott & tiger Tables: 1. scott.employee 2. tiger.employee ● A Fully Qualified Table Name: schemaname.tablename must be used to query a particular Table in a Schema For example: select * from scott.employee where salary > 10000; 6

7.PostgreSQL ACID Compliance ● Atomicity: Transactions. Either All or Nothing BEGIN …SQL1, SQL2, …SQLn…..COMMIT/ROLLBACK/END ● Consistency: Give me a consistent picture of the data based on Isolation Levels Let us see the following example when Isolation Level is READ_COMMITTED Query 1 : select count(*) from employees; 9am: Records in employee table: 10000 9:10 am: Query 1 Started by User 1 9:11am: 2 employee records deleted by User 2 9:12am: Query 1 that was started by User 1 Completed Result of Query 1 at 9:12am would still be 10000. A Consistent image as how it was at 9:00am ● Isolation: Prevent Concurrent data access through Locking ● Durability: Once the Data is committed, it must be safe Through WAL’s, fsync, synchronous_commit, Replication 7

8.PostgreSQL Terminology ● PostgreSQL was designed in academia ○ Objects are defined in academic terms ○ Terminology based on relational calculus/algebra 8

9.PostgreSQL Installation

10.PostgreSQL Installation Using RPM’s PGDG Repository : PostgreSQL Global Development Group maintains YUM and APT repository For YUM https://yum.postgresql.org For APT https://apt.postgresql.org/pub/repos/apt/ Step 1: Choose the appropriate rpm that adds pgdg repo to your server # yum install https://yum.postgresql.org/11/redhat/rhel-7.5-x86_64/pgdg-centos11-11-2.noarch.rpm Step 2: Install PostgreSQL using the following step # yum install postgresql11 postgresql11-contrib postgresql11-libs postgresql11-server 10

11.Initialize Your First PostgreSQL Cluster ● initdb is used to Initialize a PostgreSQL cluster $ echo "PATH=/usr/pgsql-11/bin:$PATH">>~/.bash_profile $ source .bash_profile $ echo $PGDATA /var/lib/pgsql/11/data $initdb --version initdb (PostgreSQL) 11.0 $ initdb 11

12.12

13.Starting and Stopping a PostgreSQL • PostgreSQL can be stopped and started from command line using pg_ctl ○ Starting PostgreSQL ▪ pg_ctl -D $PGDATA start ○ Stopping PostgreSQL ▪ pg_ctl -D $PGDATA stop 13

14.Shutdown Modes in PostgreSQL ● -ms (Smart Mode - Default mode) ○ Waits for all connections to exit and does not allow new transactions ○ Committed transactions applied to Disk through a CHECKPOINT before shutdown ○ May take more time on busy systems $ pg_ctl -D $PGDATA stop -ms ● -mf (Fast Mode - Recommended on Busy Systems) ○ Closes/Kills all the open transactions and does not allow new transactions. SIGTERM is sent to server processes to exit promptly ○ Committed transactions applied to Disk through a CHECKPOINT before shutdown ○ Recommended on Busy Systems $ pg_ctl -D $PGDATA stop -mf ● -mi (Immediate Mode - Forced and Abnormal Shutdown during Emergencies) ○ SIGQUIT is sent to all the processes to exit immediately, without properly shutting down ○ Requires Crash Recovery after Instance Start ○ Recommended in Emergencies $ pg_ctl -D $PGDATA stop -mi 14

15.psql and shortcuts ● Connect to your PostgreSQL using psql ▪ $ psql List the databases \l \l + (Observe the difference) To connect to your database \c dbname List Objects \dt -> List all the tables \dn -> List all the schemas ▪ Show all backslash (shortcut) commands \? 15

16.PostgreSQL Architecture

17.PostgreSQL Server ● Multi-Process Architecture ○ Postmaster (Parent PostgreSQL Process) ○ Backend Utility Processes ○ Per-Connection backend processes 17

18.Background Utility Processes Start your PostgreSQL Instance and see the Postgres processes 18

19.19

20.PostgreSQL Components ● Postmaster: ○ Master database control process ○ Responsible for startup and shutdown ○ Spawning other necessary backend processes 20

21.Utility Processes • BGWriter: ○ Background Writer ○ Writes/Flushes dirty data blocks to disk • WAL Writer: ○ Writes WAL Buffers to Disk ○ WAL Buffers are written to WALs(Write-Ahead Logs) on the Disk • Autovacuum: ○ Starts Autovacuum worker processes to start a vacuum and analyze • Checkpointer: ○ Perform a CHECKPOINT that ensures that all the changes are flushed to Disk ○ Depends on configuration parameters 21

22.Utility Processes • Archiver: ○ Archives Write-Ahead-Logs ○ Used for High Availability, Backups, PITR • Logger: ○ Logs messages, events, error to syslog or log files. ○ Errors, slow running queries, warnings,..etc. are written to log files by this Process • Stats Collector: ○ Collects statistics of Relations. 22

23.Utility Processes • WAL Sender: • Sends WALs to Replica(s) • One WAL Sender for each Slave connected for Replication • WAL Receiver: • Started on a Slave(aka Standby or Replica) in Replication • Streams WALs from Master • bgworker: • PostgreSQL is extensible to run user-supplied code in separate processes that are monitored by Postgres • Such processes can access PostgreSQL's shared memory area • Connect as a Client using libpq • bgworker: logical replication launcher • Logical Replication between a Publisher and a Subscriber 23

24.Memory Components ● Shared Buffers: ○ PostgreSQL Database Memory Area ○ Shared by all the Databases in the Cluster ○ Pages are fetched from Disk to Shared Buffers during Reads/Writes ○ Modified Buffers are also called as Dirty Buffers ○ Parameter : shared_buffers sets the amount of RAM allocated to shared_buffers ○ Uses LRU Algorithm to flush less frequently used buffers ○ Dirty Buffers written to disk after a CHECKPOINT ● WAL Buffers: ○ Stores Write Ahead Log Records ○ Contains the change vector for a buffer being modified ○ WAL Buffers written to WAL Segments(On Disk) ● work_mem: ○ Memory used by each Query for internal sort operations such as ORDER BY and DISTINCT ○ Postgres writes to disk(temp files) if memory is not sufficient 24

25.Memory Components • maintenance_work_mem: ○ Amount of RAM used by VACUUM, CREATE INDEX, REINDEX like maintenance operations ○ Setting this to a bigger value can help in faster database restore 25

26.PostgreSQL Does Not Use Direct IO ● When it needs a Page(Data Block), it searches it’s own memory aka Shared Buffers ● If not found in shared buffers, it will request the OS for the same block ● The OS fetches the block from the Disk and gives it to Postgres, if the block is not found in OS Cache ● More important to Caching when Database and Active Data set cannot fit in memory 26

27.Disk Components • Data Directory ○ In MySQL, Data Directory is created when you initialize your MySQL Instance ○ Initialized using initdb in PostgreSQL. Similar to mysqld --initialize ○ Contains Write-Ahead-Logs, Log Files, Databases, Objects and other configuration files ○ You can move WAL’s and Logs to different directories using symlinks and parameters ○ Environment Variable: $PGDATA • Configuration Files inside the Data Directory ○ postgresql.conf (Similar to my.cnf file for MySQL) ○ Contains several configurable parameters ○ pg_ident.conf ○ pg_hba.conf ○ postgresql.auto.conf 27

28.What’s Inside Data Directory? 28

29.Configuration Files Inside Data Directory? ● PG_VERSION ○ Version String of the Database Cluster ● pg_hba.conf ○ Host-Based access control file (built-in firewall) ● pg_ident.conf ○ ident-based access file for OS User to DB User Mapping ● postgresql.conf ○ Primary Configuration File for the Database ● postmaster.opts ○ Contains the options used to start the PostgreSQL Instance ● postmaster.pid ○ The Parent Process ID or the Postmaster Process ID 29