大众媒体(硬盘、CD、DVD等),以任何顺序访问信息,通常不支持或不需要顺序访问,因为“文件”不是按顺序存储的。 在PC平台上进行磁盘碎片整理 面向块的设备利用。

注脚

展开查看详情

1.Chapter 19 6e - 17 & 18 5: System Catalog and Query Optimization Prof. Steven A. Demurjian, Sr. Computer Science & Engineering Department The University of Connecticut 191 Auditorium Road, Box U-155 Storrs, CT 06269-3155 steve@engr.uconn.edu http://www.engr.uconn.edu/~steve (860) 486 - 4818 A portion of these slides are being used with the permission of Dr. Ling Lui , Associate Professor, College of Computing, Georgia Tech. Other slides have been adapted from the AWL web site for the textbook. Remaining slides represent new material .

2.Overview of Material Key Background Topics: What are Typical Database Processing Actions? Disk Drives and Disk Storage Database Processing/Architectures Motivating Query Optimization Query Processing Chapter 17 - System Catalog What is it? How is it Used? Chapter 18 - Query Optimization in RDBMS High-level Query Optimization (Algebraic) Low-level Query Optimization (Cost-based)

3.Typical Database Processing Pre-Processing - Parser/Lexical - Optimizer/Views Post-Processing - Collection of Results - Aggregation Operations - Security Checks User Transaction Response to User Errors High-Level Processing - Enqueue Trans. - Request Locks - Release Locks -Dequeue Trans. Errors Results Parsed and Optimized User Trans. Low-Level Processing - Enqueue Trans. - Request Locks - Issue I/Os - Process Returned Data - Integrity Checks - Security Checks - Logging for Recovery - Release Locks - Dequeue Trans. Concurrency Control Lock Request Response Lock Request Disk I/O Recovery I/O Request Results

4.Typical Database Processing Pre-Processing - Parser/Lexical - Optimizer/Views Post-Processing - Collection of Results - Aggregation Operations - Security Checks User Transaction Response to User Errors High-Level Processing - Enqueue Trans. - Request Locks - Release Locks -Dequeue Trans. Errors Results Parsed and Optimized User Trans. Low-Level Processing - Enqueue Trans. - Request Locks - Issue I/Os - Process Returned Data - Integrity Checks - Security Checks - Logging for Recovery - Release Locks - Dequeue Trans. Concurrency Control Lock Request Response Lock Request Disk I/O Recovery I/O Request Results

5.90-10 Rule for Database Processing Load (Transaction per second) vs. Performance (Response Time of Transactions) Processing of Large Amounts of Raw Data Addressed in Secondary Storage Staged to Main Memory Identifying Relevant Data Large Amounts of Raw Data Discarded Focus on Data Most Likely to Contain Answers Possible Loss of CPU and Main Memory Cycles This is Double Jeopardy! Load of DBS Must be Reduced Performance of DBS Degrades

6.Only 10% of Relevant Data has Answers Note: Naive Approach to Database Searching Often Occurs (Little or No Indexing in Practice!) 90-10 Rule for Conventional DBS Application Programs Operating System Database Functions On-Line I/O Disk I/O Only 10% of Raw Data is Relevant

7.Randomly Accessed Storage Devices Popular Media (Hard Drives, CDs, DVDs, etc.) Access to Information in Any Order Sequential Access Not Typically Supported or Needed, Since “Files” Not Stored Sequentially Recall, Disk Defragmentation on PC Platform Block-Oriented Utilization of Device Block Access to Optimize Transfer Block Size is Device/Controller Dependent Linear/Non-Linear Byte Orders with Blocks Key Concepts … Platter Track Sector Cylinder Read/Write Heads

8.Track Sector Top View of a Surface Rotating Storage Cylinder Platters R/W Heads Note: Parallel Read/Write Drives Activate All Heads Simultaneously

9. Disk Drive Components

10.Disk Characteristics and Access Transfer Time: Time to Copy Bits From Disk Surface to Primary Memory Disk Latency Time: Rotational Delay Waiting for Proper Sector to Rotate Under R/W Head Rotate to Next Sector to Process Next Request Disk Seek Time: Delay While R/W Head Moves to the Destination Track/Cylinder Move Head In/Out to Seek Next Track/Cylinder Access = Seek (In/Out) + Latency (Around) + Transfer (Bytes) For DBMS - Key is Moving Data To/From Disk ASAP w.r.t. Performance and Response Time Improve on 90-10 via Processing/Optimization

11.Historical DB Architecture - Mainframe

12.Client/Server DBS Architecture

13.Mixed Architecture

14.Three and Four Tier Architectures From: http://java.sun.com/javaone/javaone98/sessions/T400/index.html

15.What is MBDS? MBDS is Multi-Process, Multi-Computer, Parallel Database System MBDS Composed of … Host for Issuing User Requests Controller to Interact with Host (and User) One or More Backend Database Processors Goals of MBDS Suppose Request Takes 4 Minutes with One Backend Improve Response Time by Increasing Backends Two Backends - Request 2+ Minutes Four Backends - Request 1+ Minutes

16.What is MBDS Architecture? Backend Database Processor Backend Database Processor Backend Database Processor Database Controller Host User Database Blocks are Distributed Across All Backends Backend (BE) DB Processors are Replicated Database Controller Sends Same Query in Parallel to all BEs BEs work in Parallel on Each Query and Communicate for Join Results are Sent to and Collected by the DB Controller - then to the User

17.Approach Distributes Data Across Backends Suppose System has 10 Backends Consider a Number of Tables Inventory Customers Employees … What Happens if Place One Table/Backend? What Happens if you Distribute … Table Across 10 Backends ? Backend Database Processor 2 Backend Database Processor 1 Backend Database Processor 10

18.What are MBDS Processes? Get Msg. Put Msg. Request Preparation Post Processing Get Msg. Put Msg. Directory Management Record Processing Concurrency Control Disk I/O Database Controller Backend Database Processor

19.What are MBDS Messages? No. Type SRC DST 1 New Request Host ReqP 2 Results of Request PoPr Host 3 Number of Reqs in Transaction ReqP PoPr 4 Aggregate Operators (Sum, etc.) ReqP PoPr 6 Parsed Request to Backends ReqP DM 12 Backend Aggregate Operator Results RecP PoPr 15 Ids for Accessing Database Indexes DM DMs 16 Request and Disk Addresses DM RecP 21 Ids for Accessing Database Records DM CC 22 Locks Obtained: Okay to Execute CC RecP 23 Request ID of Finished Request RecP CC

20.Sample Processing of Retrieve Request Get Msg. Put Msg. Request Preparation Post Processing Get Msg. Put Msg. Directory Management Record Processing Concurrency Control Disk I/O F15 From Other Backend E15 To Backend(s) A1 B3 C4 D6 D6,F15 E15 G21 H22 I16 J23 K12 K12 K12

21.Coordination of Synchronous Behavior … Within Controller and Backend to Allow Multiple Active Requests within Each Process Requests at Different Stages in Different Processes Between Controller and Backends to Allow A Request to be Processed by All Backends A Request to be Processed by One Backend Among Multiple Backends to Allow a Backend to Synchronize its Work on one Request with Other Backends to Forward Results to Another Backend What are Synchronization Issues in MBDS?

22.Introduction to Query Processing Query optimization: The process of choosing a suitable execution strategy for processing a query. Two internal representations of a query: Query Tree Query Graph

23.Introduction to Query Processing

24.How Does this Relate to Compilers? Source Program Lexical Analyzer 1 Syntax Analyzer 2 Semantic Analyzer 3 Intermediate Code Generator 4 Code Optimizer 5 Code Generator 6 Target Program Symbol-table Manager Error Handler 1, 2, 3 : Analysis - Our Focus 4, 5, 6 : Synthesis

25.Translating SQL Queries into Relational Algebra Query block: The basic unit that can be translated into the algebraic operators and optimized. A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clause if these are part of the block. Nested queries within a query are identified as separate query blocks. Aggregate operators in SQL must be included in the extended algebra.

26.Translating SQL Queries into Relational Algebra SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > ( SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO = 5); SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO = 5 SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > C π LNAME, FNAME ( σ SALARY>C (EMPLOYEE)) ℱ MAX SALARY ( σ DNO=5 (EMPLOYEE))

27.Why is Query Optimization Needed? Data Volume in any Type of Join or Cartesian Product has the Potential to be Very Large! Consider R(A, B) = {r 1 , r 2 , ..., r n } Consider S(C, D) = {s 1 , s 2 , ..., s m } R x S = { r 1 s 1 , r 1 s 2 , r 1 s 3 , r 1 s 4 , … r 2 s 1 , r 2 s 2 , r 2 s 3 , r 2 s 4 , … } which contains n x m tuples! What is the Issue? If n is 10,000 and m is 20,000 then Cartesian Product has 200,000,000 Tuples Join must Perform 200,000,000 Comparisons

28.Aside – What is an External Sort? Traditional – All Algorithm/Programming Classes Focus on Internal Searches/Sorts Internal – All Data Loaded into Main Memory Data Searched/Sorted in Main Memory Results in Main Memory What Happens When Data Source to be Searched Exceeds Main Memory (or Virtual Memory)? External Search Stages Blocks of Data from Disk into Memory Sorts/Searches with Blocks in Memory Writes Intermediate results to Disk Need to Reread Results from Disk for Final Search Results, Merging for Sort, etc.

29.Why is Query Optimization Needed? n/m - Number of Tuples of R/S Respectively b R / b S - Number of Tuples/Block of Memory Assume that K Blocks Fit into Primary Memory 2 1 3 K-1 n / b R (m / b S ) Number of Blocks for R/S 1 Block of R K-1 Blocks of S (m / b S )/(K-1) Number of Times that K-1 Memory Chunk Filled by S (n / b R )[(m / b S )/(K-1)] Which if Filled for Each Block of R (n / b R ) + (n / b R )[(m / b S )/(K-1)] Total Block Reads Must also Read Blocks of R