- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
UsingVoltDB9.2
This book explains how to use VoltDB to design, build, and run high performance applications.
The text and illustrations in this document are licensed under the terms of the GNU Affero General Public License Version 3 as published by the Free Software Foundation. See the GNU Affero General Public License (http://www.gnu.org/licenses/) for more details.
Many of the core VoltDB database features described herein are part of the VoltDB Community Edition, which is licensed under the GNU Affero Public License 3 as published by the Free Software Foundation. Other features are specific to the VoltDB Enterprise Edition and VoltDB Pro, which are distributed by VoltDB, Inc. under a commercial license. Your rights to access and use VoltDB features described herein are defined by the license you received when you acquired the software.
VoltDB is a trademark of VoltDB, Inc.
VoltDB software is protected by U.S. Patent Nos. 9,600,514 and 9,639,571. Other patents pending
展开查看详情
1 .Using VoltDB Abstract This book explains how to use VoltDB to design, build, and run high performance applicaons. V9.2
2 .Using VoltDB V9.2 Copyright © 2008-2019 VoltDB, Inc. The text and illustrations in this document are licensed under the terms of the GNU Affero General Public License Version 3 as published by the Free Software Foundation. See the GNU Affero General Public License (http://www.gnu.org/licenses/) for more details. Many of the core VoltDB database features described herein are part of the VoltDB Community Edition, which is licensed under the GNU Affero Public License 3 as published by the Free Software Foundation. Other features are specific to the VoltDB Enterprise Edition and VoltDB Pro, which are distributed by VoltDB, Inc. under a commercial license. Your rights to access and use VoltDB features described herein are defined by the license you received when you acquired the software. VoltDB is a trademark of VoltDB, Inc. VoltDB software is protected by U.S. Patent Nos. 9,600,514 and 9,639,571. Other patents pending. This document was generated on November 04, 2019.
3 .Table of Contents About This Book .............................................................................................................. xiii 1. Overview ....................................................................................................................... 1 1.1. What is VoltDB? .................................................................................................. 1 1.2. Who Should Use VoltDB ....................................................................................... 1 1.3. How VoltDB Works .............................................................................................. 2 1.3.1. Partitioning ................................................................................................ 2 1.3.2. Serialized (Single-Threaded) Processing ......................................................... 2 1.3.3. Partitioned vs. Replicated Tables ................................................................... 3 1.3.4. Ease of Scaling to Meet Application Needs ..................................................... 4 1.4. Working with VoltDB Effectively ............................................................................ 4 2. Installing VoltDB ............................................................................................................ 5 2.1. Operating System and Software Requirements ............................................................ 5 2.2. Installing VoltDB .................................................................................................. 6 2.2.1. Upgrading From Older Versions ................................................................... 6 2.2.2. Building a New VoltDB Distribution Kit ........................................................ 7 2.3. Setting Up Your Environment ................................................................................. 7 2.4. What is Included in the VoltDB Distribution ............................................................. 7 2.5. VoltDB in Action: Running the Sample Applications .................................................. 8 3. Starting the Database ....................................................................................................... 9 3.1. Initializing and Starting a VoltDB Database ............................................................... 9 3.2. Initializing and Starting a VoltDB Database on a Cluster ............................................ 10 3.3. Stopping a VoltDB Database ................................................................................. 11 3.4. Saving the Data .................................................................................................. 12 3.5. Restarting a VoltDB Database ............................................................................... 12 3.6. Updating Nodes on the Cluster .............................................................................. 12 3.7. Defining the Cluster Configuration ......................................................................... 13 3.7.1. Determining How Many Sites per Host ......................................................... 14 3.7.2. Configuring Paths for Runtime Features ........................................................ 14 3.7.3. Verifying your Hardware Configuration ........................................................ 15 4. Designing the Database Schema ....................................................................................... 16 4.1. How to Enter DDL Statements .............................................................................. 17 4.2. Creating Tables and Primary Keys ......................................................................... 18 4.3. Analyzing Data Volume and Workload ................................................................... 19 4.4. Partitioning Database Tables ................................................................................. 20 4.4.1. Choosing a Column on which to Partition Table Rows ..................................... 20 4.4.2. Specifying Partitioned Tables ...................................................................... 21 4.4.3. Design Rules for Partitioning Tables ............................................................ 21 4.5. Replicating Database Tables .................................................................................. 21 4.5.1. Choosing Replicated Tables ........................................................................ 22 4.5.2. Specifying Replicated Tables ...................................................................... 22 4.6. Modifying the Schema ......................................................................................... 22 4.6.1. Effects of Schema Changes on Data and Clients ............................................. 23 4.6.2. Viewing the Schema .................................................................................. 24 4.6.3. Modifying Tables ...................................................................................... 24 4.6.4. Adding and Dropping Indexes ..................................................................... 26 4.6.5. Modifying Partitioning for Tables and Stored Procedures ................................. 27 5. Designing Stored Procedures to Access the Database ........................................................... 31 5.1. How Stored Procedures Work ................................................................................ 31 5.1.1. VoltDB Stored Procedures are Transactional .................................................. 31 5.1.2. VoltDB Stored Procedures are Deterministic .................................................. 31 5.2. The Anatomy of a VoltDB Stored Procedure ............................................................ 33 iii
4 . Using VoltDB 5.2.1. The Structure of the Stored Procedure .......................................................... 33 5.2.2. Passing Arguments to a Stored Procedure ...................................................... 35 5.2.3. Creating and Executing SQL Queries in Stored Procedures ............................... 36 5.2.4. Interpreting the Results of SQL Queries ........................................................ 37 5.2.5. Returning Results from a Stored Procedure .................................................... 40 5.2.6. Rolling Back a Transaction ......................................................................... 41 5.3. Installing Stored Procedures into the Database .......................................................... 41 5.3.1. Compiling, Packaging, and Loading Stored Procedures .................................... 42 5.3.2. Declaring Stored Procedures in the Schema ................................................... 42 5.3.3. Partitioning Stored Procedures in the Schema ................................................. 43 6. Designing VoltDB Client Applications .............................................................................. 47 6.1. Connecting to the VoltDB Database ....................................................................... 47 6.1.1. Connecting to Multiple Servers ................................................................... 48 6.1.2. Using the Auto-Connecting Client ............................................................... 48 6.2. Invoking Stored Procedures ................................................................................... 49 6.3. Invoking Stored Procedures Asynchronously ............................................................ 49 6.4. Closing the Connection ........................................................................................ 51 6.5. Handling Errors .................................................................................................. 51 6.5.1. Interpreting Execution Errors ...................................................................... 51 6.5.2. Handling Timeouts .................................................................................... 53 6.5.3. Writing a Status Listener to Interpret Other Errors .......................................... 54 6.6. Compiling and Running Client Applications ............................................................. 56 6.6.1. Starting the Client Application .................................................................... 56 6.6.2. Running Clients from Outside the Cluster ..................................................... 57 7. Simplifying Application Development ............................................................................... 58 7.1. Using Default Procedures ..................................................................................... 58 7.2. Shortcut for Defining Simple Stored Procedures ....................................................... 59 7.3. Verifying Expected Query Results .......................................................................... 60 7.4. Scheduling Stored Procedures as Tasks ................................................................... 61 7.5. Directed Procedures: Distributing Transactions to Every Partition ................................. 62 8. Using VoltDB with Other Programming Languages ............................................................. 64 8.1. C++ Client Interface ............................................................................................ 64 8.1.1. Writing VoltDB Client Applications in C++ .................................................. 64 8.1.2. Creating a Connection to the Database Cluster ............................................... 65 8.1.3. Invoking Stored Procedures ........................................................................ 65 8.1.4. Invoking Stored Procedures Asynchronously .................................................. 66 8.1.5. Interpreting the Results .............................................................................. 67 8.2. JSON HTTP Interface .......................................................................................... 67 8.2.1. How the JSON Interface Works .................................................................. 67 8.2.2. Using the JSON Interface from Client Applications ......................................... 69 8.2.3. How Parameters Are Interpreted .................................................................. 71 8.2.4. Interpreting the JSON Results ..................................................................... 72 8.2.5. Error Handling using the JSON Interface ...................................................... 73 8.3. JDBC Interface ................................................................................................... 74 8.3.1. Using JDBC to Connect to a VoltDB Database .............................................. 74 8.3.2. Using JDBC to Query a VoltDB Database ..................................................... 75 9. Using VoltDB in a Cluster .............................................................................................. 77 9.1. Starting a Database Cluster ................................................................................... 77 9.2. Updating the Cluster Configuration ........................................................................ 77 9.3. Elastic Scaling to Resize the Cluster ....................................................................... 78 9.3.1. Adding Nodes with Elastic Scaling .............................................................. 79 9.3.2. Removing Nodes with Elastic Scaling .......................................................... 80 9.3.3. Configuring How VoltDB Rebalances Nodes During Elastic Scaling .................. 80 10. Availability ................................................................................................................. 82 iv
5 . Using VoltDB 10.1. How K-Safety Works ......................................................................................... 82 10.2. Enabling K-Safety ............................................................................................. 83 10.2.1. What Happens When You Enable K-Safety ................................................. 84 10.2.2. Calculating the Appropriate Number of Nodes for K-Safety ............................ 84 10.3. Recovering from System Failures ......................................................................... 85 10.3.1. What Happens When a Node Rejoins the Cluster .......................................... 85 10.3.2. Where and When Recovery May Fail ......................................................... 86 10.4. Avoiding Network Partitions ................................................................................ 86 10.4.1. K-Safety and Network Partitions ................................................................ 86 10.4.2. Using Network Fault Protection ................................................................. 87 11. Database Replication .................................................................................................... 90 11.1. How Database Replication Works ......................................................................... 91 11.1.1. Starting Database Replication .................................................................... 92 11.1.2. Database Replication, Availability, and Disaster Recovery .............................. 93 11.1.3. Database Replication and Completeness ...................................................... 94 11.2. Using Passive Database Replication ...................................................................... 95 11.2.1. Specifying the DR Tables in the Schema ..................................................... 95 11.2.2. Configuring the Clusters ........................................................................... 96 11.2.3. Starting the Clusters ................................................................................ 96 11.2.4. Loading the Schema and Starting Replication ............................................... 96 11.2.5. Updating the Schema During Replication .................................................... 97 11.2.6. Stopping Replication ................................................................................ 98 11.2.7. Database Replication and Read-only Clients ............................................... 100 11.3. Using Cross Datacenter Replication ..................................................................... 100 11.3.1. Designing Your Schema for Active Replication ........................................... 101 11.3.2. Configuring the Database Clusters ............................................................ 101 11.3.3. Starting the Database Clusters .................................................................. 103 11.3.4. Loading a Matching Schema and Starting Replication .................................. 104 11.3.5. Updating the Schema During Active Replication ......................................... 104 11.3.6. Stopping Replication .............................................................................. 105 11.3.7. Example XDCR Configurations ............................................................... 106 11.3.8. Understanding Conflict Resolution ............................................................ 106 11.4. Monitoring Database Replication ........................................................................ 113 12. Security .................................................................................................................... 115 12.1. How Security Works in VoltDB ......................................................................... 115 12.2. Enabling Authentication and Authorization ........................................................... 115 12.3. Defining Users and Roles .................................................................................. 116 12.4. Assigning Access to Stored Procedures ................................................................ 117 12.5. Assigning Access by Function (System Procedures, SQL Queries, and Default Procedures) ............................................................................................................. 117 12.6. Using Built-in Roles ......................................................................................... 118 12.7. Encrypting VoltDB Communication Using TLS/SSL .............................................. 118 12.7.1. Configuring TLS/SSL on the VoltDB Server .............................................. 119 12.7.2. Choosing What Ports to Encrypt with TLS/SSL .......................................... 120 12.7.3. Using the VoltDB Command Line Utilities with TLS/SSL ............................ 120 12.7.4. Implementing TLS/SSL in the Java Client Applications ................................ 121 12.7.5. Configuring Database Replication (DR) With TLS/SSL ................................ 121 12.8. Integrating Kerberos Security with VoltDB ........................................................... 122 12.8.1. Installing and Configuring Kerberos .......................................................... 122 12.8.2. Installing and Configuring the Java Security Extensions ................................ 123 12.8.3. Configuring the VoltDB Servers and Clients ............................................... 124 12.8.4. Accessing the Database from the Command Line and the Web ....................... 126 13. Saving & Restoring a VoltDB Database ......................................................................... 127 13.1. Performing a Manual Save and Restore of a VoltDB Cluster .................................... 127 v
6 . Using VoltDB 13.1.1. How to Save the Contents of a VoltDB Database ........................................ 128 13.1.2. How to Restore the Contents of a VoltDB Database Manually ........................ 128 13.1.3. Changing the Cluster Configuration Using Save and Restore .......................... 129 13.2. Scheduling Automated Snapshots ........................................................................ 131 13.3. Managing Snapshots ......................................................................................... 131 13.4. Special Notes Concerning Save and Restore ......................................................... 132 14. Command Logging and Recovery .................................................................................. 133 14.1. How Command Logging Works ......................................................................... 133 14.2. Controlling Command Logging .......................................................................... 134 14.3. Configuring Command Logging for Optimal Performance ....................................... 134 14.3.1. Log Size .............................................................................................. 135 14.3.2. Log Frequency ...................................................................................... 135 14.3.3. Synchronous vs. Asynchronous Logging .................................................... 135 14.3.4. Hardware Considerations ........................................................................ 136 15. Importing and Exporting Live Data ............................................................................... 138 15.1. Understanding Export ....................................................................................... 138 15.2. Planning your Export Strategy ............................................................................ 139 15.3. Identifying Export Streams in the Schema ............................................................ 141 15.4. Configuring Export in the Configuration File ........................................................ 142 15.5. How Export Works .......................................................................................... 143 15.5.1. Export Overflow .................................................................................... 144 15.5.2. Persistence Across Database Sessions ........................................................ 145 15.6. The File Export Connector ................................................................................ 145 15.7. The HTTP Export Connector ............................................................................. 147 15.7.1. Understanding HTTP Properties ............................................................... 147 15.7.2. Exporting to Hadoop via WebHDFS ......................................................... 149 15.7.3. Exporting to Hadoop Using Kerberos Security ............................................ 151 15.8. The JDBC Export Connector .............................................................................. 151 15.9. The Kafka Export Connector .............................................................................. 153 15.10. The RabbitMQ Export Connector ...................................................................... 155 15.11. The Elasticsearch Export Connector ................................................................... 157 15.12. Understanding Import ...................................................................................... 158 15.12.1. One-Time Import Using Data Loading Utilities ......................................... 159 15.12.2. Streaming Import Using Built-in Import Features ....................................... 159 15.13. The Kafka Importer ........................................................................................ 160 15.14. The Kinesis Importer ...................................................................................... 162 15.15. The CSV/TSV Import Formatters ...................................................................... 163 A. Supported SQL DDL Statements .................................................................................... 165 ALTER STREAM ................................................................................................... 166 ALTER TABLE ...................................................................................................... 167 ALTER TASK ........................................................................................................ 170 CREATE AGGREGATE FUNCTION ......................................................................... 171 CREATE FUNCTION .............................................................................................. 173 CREATE INDEX .................................................................................................... 175 CREATE PROCEDURE AS ...................................................................................... 177 CREATE PROCEDURE FROM CLASS ..................................................................... 179 CREATE ROLE ...................................................................................................... 181 CREATE STREAM ................................................................................................. 183 CREATE TABLE .................................................................................................... 186 CREATE TASK ...................................................................................................... 193 CREATE VIEW ...................................................................................................... 195 DR TABLE ............................................................................................................ 197 DROP FUNCTION .................................................................................................. 198 DROP INDEX ........................................................................................................ 199 vi
7 . Using VoltDB DROP PROCEDURE ............................................................................................... 200 DROP ROLE .......................................................................................................... 201 DROP STREAM ..................................................................................................... 202 DROP TABLE ........................................................................................................ 203 DROP TASK .......................................................................................................... 204 DROP VIEW .......................................................................................................... 205 PARTITION PROCEDURE ...................................................................................... 206 PARTITION TABLE ............................................................................................... 208 B. Supported SQL Statements ............................................................................................ 209 DELETE ................................................................................................................ 210 INSERT ................................................................................................................. 212 MIGRATE .............................................................................................................. 214 SELECT ................................................................................................................. 215 TRUNCATE TABLE ............................................................................................... 224 UPDATE ................................................................................................................ 225 UPSERT ................................................................................................................ 226 C. SQL Functions ............................................................................................................ 228 ABS() .................................................................................................................... 231 APPROX_COUNT_DISTINCT() ............................................................................... 232 AREA() .................................................................................................................. 233 ARRAY_ELEMENT() .............................................................................................. 234 ARRAY_LENGTH() ................................................................................................ 235 ASTEXT() .............................................................................................................. 236 AVG() ................................................................................................................... 237 BIN() ..................................................................................................................... 238 BIT_SHIFT_LEFT() ................................................................................................. 239 BIT_SHIFT_RIGHT() .............................................................................................. 240 BITAND() .............................................................................................................. 241 BITNOT() .............................................................................................................. 242 BITOR() ................................................................................................................. 243 BITXOR() .............................................................................................................. 244 CAST() .................................................................................................................. 245 CEILING() ............................................................................................................. 246 CENTROID() .......................................................................................................... 247 CHAR() ................................................................................................................. 248 CHAR_LENGTH() .................................................................................................. 249 COALESCE() ......................................................................................................... 250 CONCAT() ............................................................................................................. 251 CONTAINS() .......................................................................................................... 252 COS() .................................................................................................................... 253 COT() .................................................................................................................... 254 COUNT() ............................................................................................................... 255 CSC() .................................................................................................................... 256 CURRENT_TIMESTAMP() ...................................................................................... 257 DATEADD() ........................................................................................................... 258 DAY(), DAYOFMONTH() ........................................................................................ 259 DAYOFWEEK() ...................................................................................................... 260 DAYOFYEAR() ...................................................................................................... 261 DECODE() ............................................................................................................. 262 DEGREES() ............................................................................................................ 263 DISTANCE() .......................................................................................................... 264 DWITHIN() ............................................................................................................ 265 EXP() .................................................................................................................... 266 EXTRACT() ........................................................................................................... 267 vii
8 . Using VoltDB FIELD() ................................................................................................................. 269 FLOOR() ................................................................................................................ 271 FORMAT_CURRENCY() ......................................................................................... 272 FORMAT_TIMESTAMP() ....................................................................................... 273 FROM_UNIXTIME() ............................................................................................... 274 HEX() .................................................................................................................... 275 HOUR() ................................................................................................................. 276 INET6_ATON() ...................................................................................................... 277 INET6_NTOA() ...................................................................................................... 278 INET_ATON() ........................................................................................................ 279 INET_NTOA() ........................................................................................................ 280 ISINVALIDREASON() ............................................................................................ 281 ISVALID() ............................................................................................................. 282 IS_VALID_TIMESTAMP() ....................................................................................... 284 LATITUDE() .......................................................................................................... 285 LEFT() ................................................................................................................... 286 LN(), LOG() ........................................................................................................... 287 LOG10() ................................................................................................................ 288 LONGITUDE() ....................................................................................................... 289 LOWER() ............................................................................................................... 290 MAKEVALIDPOLYGON() ....................................................................................... 291 MAX() ................................................................................................................... 292 MAX_VALID_TIMESTAMP() .................................................................................. 293 MIGRATING() ........................................................................................................ 294 MIN() .................................................................................................................... 295 MIN_VALID_TIMESTAMP() ................................................................................... 296 MINUTE() .............................................................................................................. 297 MOD() ................................................................................................................... 298 MONTH() .............................................................................................................. 299 NOW() ................................................................................................................... 300 NUMINTERIORRINGS() ......................................................................................... 301 NUMPOINTS() ....................................................................................................... 302 OCTET_LENGTH() ................................................................................................. 303 OVERLAY() ........................................................................................................... 304 PI() ........................................................................................................................ 305 POINTFROMTEXT() ............................................................................................... 306 POLYGONFROMTEXT() ......................................................................................... 307 POSITION() ........................................................................................................... 308 POWER() ............................................................................................................... 309 QUARTER() ........................................................................................................... 310 RADIANS() ............................................................................................................ 311 REGEXP_POSITION() ............................................................................................. 312 REPEAT() .............................................................................................................. 313 REPLACE() ............................................................................................................ 314 RIGHT() ................................................................................................................ 315 ROUND() ............................................................................................................... 316 SEC() .................................................................................................................... 317 SECOND() ............................................................................................................. 318 SET_FIELD() .......................................................................................................... 319 SIN() ..................................................................................................................... 321 SINCE_EPOCH() .................................................................................................... 322 SPACE() ................................................................................................................ 323 SQRT() .................................................................................................................. 324 STR() .................................................................................................................... 325 viii
9 . Using VoltDB SUBSTRING() ........................................................................................................ 326 SUM() ................................................................................................................... 327 TAN() .................................................................................................................... 328 TO_TIMESTAMP() ................................................................................................. 329 TRIM() .................................................................................................................. 330 TRUNCATE() ......................................................................................................... 331 UPPER() ................................................................................................................ 332 VALIDPOLYGONFROMTEXT() .............................................................................. 333 WEEK(), WEEKOFYEAR() ...................................................................................... 334 WEEKDAY() .......................................................................................................... 335 YEAR() .................................................................................................................. 336 D. VoltDB CLI Commands ............................................................................................... 337 csvloader ................................................................................................................ 338 jdbcloader ............................................................................................................... 343 kafkaloader ............................................................................................................. 347 sqlcmd ................................................................................................................... 351 voltadmin ............................................................................................................... 356 voltdb .................................................................................................................... 363 voltsql .................................................................................................................... 370 E. Configuration File (deployment.xml) ............................................................................... 373 E.1. Understanding XML Syntax ................................................................................ 373 E.2. The Structure of the Configuration File ................................................................. 373 F. VoltDB Datatype Compatibility ...................................................................................... 379 F.1. Java and VoltDB Datatype Compatibility ............................................................... 379 G. System Procedures ....................................................................................................... 382 @AdHoc ................................................................................................................ 383 @Explain ............................................................................................................... 385 @ExplainProc ......................................................................................................... 386 @ExplainView ........................................................................................................ 387 @GetPartitionKeys ................................................................................................... 389 @Pause .................................................................................................................. 391 @Ping .................................................................................................................... 392 @Promote .............................................................................................................. 393 @QueryStats ........................................................................................................... 394 @Quiesce ............................................................................................................... 396 @Resume ............................................................................................................... 398 @Shutdown ............................................................................................................ 399 @SnapshotDelete ..................................................................................................... 400 @SnapshotRestore ................................................................................................... 402 @SnapshotSave ....................................................................................................... 405 @SnapshotScan ....................................................................................................... 409 @Statistics .............................................................................................................. 412 @StopNode ............................................................................................................ 435 @SwapTables ......................................................................................................... 437 @SystemCatalog ...................................................................................................... 439 @SystemInformation ................................................................................................ 444 @UpdateApplicationCatalog ...................................................................................... 448 @UpdateClasses ...................................................................................................... 450 @UpdateLogging ..................................................................................................... 452 ix
10 .List of Figures 1.1. Partitioning Tables ........................................................................................................ 2 1.2. Serialized Processing ..................................................................................................... 3 1.3. Replicating Tables ......................................................................................................... 4 4.1. Components of a Database Schema ................................................................................ 16 4.2. Partitions Distribute Table Data and Stored Procedure Processing ........................................ 17 4.3. Diagram Representing the Flight Reservation System ........................................................ 19 5.1. Array of VoltTable Structures ....................................................................................... 37 5.2. One VoltTable Structure is returned for each Queued SQL Statement .................................... 38 5.3. Stored Procedures Execute in the Appropriate Partition Based on the Partitioned Parameter Value ............................................................................................................................... 44 8.1. The Structure of the VoltDB JSON Response ................................................................... 72 10.1. K-Safety in Action ..................................................................................................... 83 10.2. Network Partition ...................................................................................................... 87 10.3. Network Fault Protection in Action ............................................................................... 88 11.1. Passive Database Replication ....................................................................................... 90 11.2. Cross Datacenter Replication ....................................................................................... 91 11.3. Replicating an Existing Database .................................................................................. 93 11.4. Promoting the Replica ................................................................................................ 94 11.5. Read-Only Access to the Replica ................................................................................ 100 11.6. Standard XDCR Configuration ................................................................................... 106 11.7. XDCR Configuration with Read-Only Replicas ............................................................. 106 11.8. Transaction Order and Conflict Resolution ................................................................... 107 14.1. Command Logging in Action ..................................................................................... 133 14.2. Recovery in Action .................................................................................................. 134 15.1. Overview of the Export Process ................................................................................. 139 15.2. Flight Schema with Export Streams ............................................................................. 140 E.1. Configuration XML Structure ...................................................................................... 375 x
11 .List of Tables 2.1. Operating System and Software Requirements ................................................................... 5 2.2. Components Installed by VoltDB ..................................................................................... 7 4.1. Example Application Workload ..................................................................................... 19 5.1. Methods of the VoltTable Classes .................................................................................. 39 8.1. Datatypes in the JSON Interface .................................................................................... 71 11.1. Structure of the XDCR Conflict Logs .......................................................................... 112 12.1. Named Security Permissions ...................................................................................... 117 15.1. File Export Properties ............................................................................................... 146 15.2. Export Metadata ...................................................................................................... 147 15.3. HTTP Export Properties ............................................................................................ 149 15.4. JDBC Export Properties ............................................................................................ 152 15.5. Kafka Export Properties ............................................................................................ 154 15.6. RabbitMQ Export Properties ...................................................................................... 156 15.7. Elasticsearch Export Properties ................................................................................... 158 15.8. Kafka Import Properties ............................................................................................ 161 15.9. Kinesis Import Properties .......................................................................................... 162 15.10. CSV and TSV Formatter Properties ........................................................................... 163 A.1. Supported SQL Datatypes .......................................................................................... 186 C.1. Selectable Values for the EXTRACT Function ............................................................... 267 E.1. Configuration File Elements and Attributes .................................................................... 376 F.1. Java and VoltDB Datatype Compatibility ....................................................................... 379 G.1. @SnapshotRestoreOptions .......................................................................................... 402 G.2. @SnapshotSave Options ............................................................................................. 406 xi
12 .List of Examples 4.1. DDL Example of a Reservation Schema .......................................................................... 18 5.1. Components of a VoltDB Java Stored Procedure ............................................................... 34 5.2. Cycles of Queue and Execute in a Stored Procedure .......................................................... 37 5.3. Displaying the Contents of VoltTable Arrays ................................................................... 40 xii
13 .About This Book This book is a complete guide to VoltDB. It describes what VoltDB is, how it works, and — more importantly — how to use it to build high performance, data intensive applications. The book is divided into five parts: Part 1: Getting Started Explains what VoltDB is, how it works, how to install it, and how to start using VoltDB. The chapters in this section are: • Chapter 1, Overview • Chapter 2, Installing VoltDB • Chapter 3, Starting the Database Part 2: Developing VoltDB Describes how to design and develop applications using VoltDB. The Database Applications chapters in this section are: • Chapter 4, Designing the Database Schema • Chapter 5, Designing Stored Procedures to Access the Database • Chapter 6, Designing VoltDB Client Applications • Chapter 7, Simplifying Application Development • Chapter 8, Using VoltDB with Other Programming Languages Part 3: Running VoltDB in a Describes additional features useful for running a database in a cluster. Cluster The chapters in this section are: • Chapter 9, Using VoltDB in a Cluster • Chapter 10, Availability • Chapter 11, Database Replication • Chapter 12, Security Part 4: Managing the Data Provides techniques for ensuring data durability and integrity. The chapters in this section are: • Chapter 13, Saving & Restoring a VoltDB Database • Chapter 14, Command Logging and Recovery • Chapter 15, Importing and Exporting Live Data Part 5: Reference Material Provides reference information about the languages and interfaces used by VoltDB, including: • Appendix A, Supported SQL DDL Statements • Appendix B, Supported SQL Statements • Appendix C, SQL Functions • Appendix D, VoltDB CLI Commands xiii
14 . About This Book • Appendix E, Configuration File (deployment.xml) • Appendix F, VoltDB Datatype Compatibility • Appendix G, System Procedures This book provides the most complete description of the VoltDB product. It includes features from both the open source Community Edition and the commercial products VoltDB Enterprise Edition and VoltDB Pro. In general, the features described in Parts 1 and 2 are available in all versions of the product. Several features in Parts 3 and 4 are unique to the commercial products. If you are new to VoltDB, the VoltDB Tutorial provides an introduction to the product and its features. The tutorial, and other books, are available on the web from http://docs.voltdb.com/. xiv
15 .Chapter 1. Overview 1.1. What is VoltDB? VoltDB is a revolutionary new database product. Designed from the ground up to be the best solution for high performance business-critical applications, the VoltDB architecture is able to achieve 45 times higher throughput than current database products. The architecture also allows VoltDB databases to scale easily by adding processors to the cluster as the data volume and transaction requirements grow. Current commercial database products are designed as general-purpose data management solutions. They can be tweaked for specific application requirements. However, the one-size-fits-all architecture of traditional databases limits the extent to which they can be optimized. Although the basic architecture of databases has not changed significantly in 30 years, computing has. As have the demands and expectations of business applications and the corporations that depend on them. VoltDB is designed to take full advantage of the modern computing environment: • VoltDB uses in-memory storage to maximize throughput, avoiding costly disk access. • Further performance gains are achieved by serializing all data access, avoiding many of the time- consuming functions of traditional databases such as locking, latching, and maintaining transaction logs. • Scalability, reliability, and high availability are achieved through clustering and replication across multiple servers and server farms. VoltDB is a fully ACID-compliant transactional database, relieving the application developer from having to develop code to perform transactions and manage rollbacks within their own application. By using ANSI standard SQL for the schema definition and data access, VoltDB also reduces the learning curve for experienced database designers. 1.2. Who Should Use VoltDB VoltDB is not intended to solve all database problems. It is targeted at a specific segment of business computing. VoltDB focuses specifically on fast data. That is, applications that must process large streams of data quickly. This includes financial applications, social media applications, and the burgeoning field of the Internet of Things. The key requirements for these applications are scalability, reliability, high availability, and outstanding throughput. VoltDB is used today for traditional high performance applications such as capital markets data feeds, financial trade, telco record streams and sensor-based distribution systems. It's also used in emerging applications like wireless, online gaming, fraud detection, digital ad exchanges and micro transaction systems. Any application requiring high database throughput, linear scaling and uncompromising data accuracy will benefit immediately from VoltDB. However, VoltDB is not optimized for all types of queries. For example, VoltDB is not the optimal choice for collecting and collating extremely large historical data sets which must be queried across multiple tables. This sort of activity is commonly found in business intelligence and data warehousing solutions, for which other database products are better suited. 1
16 . Overview To aid businesses that require both exceptional transaction performance and ad hoc reporting, VoltDB includes integration functions so that historical data can be exported to an analytic database for larger scale data mining. 1.3. How VoltDB Works VoltDB is not like traditional database products. Each VoltDB database is optimized for a specific application by partitioning the database tables and the stored procedures that access those tables across multiple "sites" or partitions on one or more host machines to create the distributed database. Because both the data and the work is partitioned, multiple queries can be run in parallel. At the same time, because each site operates independently, each transaction can run to completion without the overhead of locking individual records that consumes much of the processing time of traditional databases. Finally, VoltDB balances the requirements of maximum performance with the flexibility to accommodate less intense but equally important queries that cross partitions. The following sections describe these concepts in more detail. 1.3.1. Paroning In VoltDB, each stored procedure is defined as a transaction. The stored procedure (i.e. transaction) succeeds or rolls back as a whole, ensuring database consistency. By analyzing and precompiling the data access logic in the stored procedures, VoltDB can distribute both the data and the processing associated with it to the individual partitions on the cluster. In this way, each partition contains a unique "slice" of the data and the data processing. Each node in the cluster can support multiple partitions. Figure 1.1. Partitioning Tables 1.3.2. Serialized (Single-Threaded) Processing At run-time, calls to the stored procedures are passed to the appropriate partition. When procedures are "single-partitioned" (meaning they operate on data within a single partition) the server process executes the procedure by itself, freeing the rest of the cluster to handle other requests in parallel. 2
17 . Overview By using serialized processing, VoltDB ensures transactional consistency without the overhead of locking, latching, and transaction logs, while partitioning lets the database handle multiple requests at a time. As a general rule of thumb, the more processors (and therefore the more partitions) in the cluster, the more transactions VoltDB completes per second, providing an easy, almost linear path for scaling an application's capacity and performance. When a procedure does require data from multiple partitions, one node acts as a coordinator and hands out the necessary work to the other nodes, collects the results and completes the task. This coordination makes multi-partitioned transactions slightly slower than single-partitioned transactions. However, transactional integrity is maintained and the architecture of multiple parallel partitions ensures throughput is kept at a maximum. Figure 1.2. Serialized Processing It is important to note that the VoltDB architecture is optimized for total throughput. Each transaction runs uninterrupted in its own thread, minimizing the individual latency per transaction (the time from when the transaction begins until processing ends). This also eliminates the overhead needed for locking, latching, and other administrative tasks, reducing the amount of time requests sit in the queue waiting to be executed. The result is that for a suitably partitioned schema, the number of transactions that can be completed in a second (i.e. throughput) is orders of magnitude higher than traditional databases. 1.3.3. Paroned vs. Replicated Tables Tables are partitioned in VoltDB based on a column that you, the developer or designer, specify. When you choose partitioning columns that match the way the data is accessed by the stored procedures, it optimizes execution at runtime. To further optimize performance, VoltDB allows certain database tables to be replicated to all nodes of the cluster. For small tables that are largely read-only, this allows stored procedures to create joins between this table and another larger table while remaining a single-partitioned transaction. For example, a retail merchandising database that uses product codes as the primary key may have one table that simply correlates the product code with the product's category and full name, Since this table is relatively small and does not change frequently (unlike inventory and orders) it can be replicated for access by all partitions. This way stored procedures can retrieve and return user-friendly product information when searching by product code without impacting the performance of order and inventory updates and searches. 3
18 . Overview Figure 1.3. Replicating Tables 1.3.4. Ease of Scaling to Meet Applicaon Needs The VoltDB architecture is designed to simplify the process of scaling the database to meet the changing needs of your application. Increasing the number of nodes in a VoltDB cluster both increases throughput (by increasing the number of simultaneous queues in operation) and increases the data capacity (by increasing the number of partitions used for each table). Scaling up a VoltDB database is a simple process that doesn't require any changes to the database schema or application code. You can either: • Save the database (using a snapshot), then restart the database specifying the new number of nodes for the resized cluster and using restore to reload the schema and data. • Add nodes "on the fly" while the database is running. 1.4. Working with VoltDB Effecvely It is possible to use VoltDB like any other SQL database, creating tables and performing ad hoc SQL queries using standard SQL statements. However, to take full advantage of VoltDB's capabilities, it is best to design your schema and your stored procedures to maximize the use of partitioned tables and procedures. There are also additional features of VoltDB to increase the availability and durability of your data. The following sections explain how to work effectively with VoltDB, including: • Chapters 2 and 3 explain how to install VoltDB and create a new database. • Chapters 4 through 8 explain how to design your database, stored procedures, and client applications to maximize performance. • Chapters 9 through 12 explain how to create and use VoltDB clusters to increase scalability and availability. • Chapters 13 through 15 explain how VoltDB ensures the durability of your data and how you can integrate VoltDB with other data sources using export for complete business solutions 4
19 .Chapter 2. Installing VoltDB VoltDB is available in both open source and commercial editions. The open source, or community, edition provides all the transactional performance benefits of VoltDB, plus basic durability and availability. The commercial editions provide additional features needed to support production environments, such as complete durability, dynamic scaling, and WAN replication. Depending on which version you choose, the VoltDB software comes as either pre-built distributions or as source code. This chapter explains the system requirements for running VoltDB, how to install and upgrade the software, and what resources are provided in the kit. 2.1. Operang System and Soware Requirements The following are the requirements for developing and running VoltDB applications. Table 2.1. Operating System and Software Requirements Operating System VoltDB requires a 64-bit Linux-based operating system. Kits are built and qualified on the following platforms: • CentOS version 7.0 and later • Red Hat (RHEL) version 7.0 and later • Ubuntu versions 14.04, 16.04, and 18.04 • Macintosh OS X 10.9 and later (for development only) CPU • Dual core1 x86_64 processor • 64 bit • 1.6 GHz Memory 4 Gbytes2 Java3 VoltDB Server: Java 8 or Java 11 Java and JDBC Client: Java 8 or Java 11 Required Software Time synchronization service, such as NTP or chrony4 Python 2.6 or later (2.7 is recommended) Recommended Software Eclipse 3.x (or other Java IDE) Footnotes: 1. Dual core processors are a minimum requirement. Four or eight physical cores are recommended for optimal performance. 2. Memory requirements are very specific to the storage needs of the application and the number of nodes in the cluster. However, 4 Gigabytes should be considered a minimum configuration. 3. VoltDB supports JDKs from OpenJDK or Oracle. 4. Time synchronization services minimize the time difference between nodes in a database cluster, which is critical for VoltDB. All nodes of the cluster should be configured to synchronize against the same time server. Using a single local server is recommended, but not required. 5
20 . Installing VoltDB 2.2. Installing VoltDB VoltDB is distributed as a compressed tar archive. The file name identifies the edition (community or enterprise) and the version number. The best way to install VoltDB is to unpack the distribution kit as a folder in the home directory of your personal account, like so: $ tar -zxvf voltdb-ent-7.0.tar.gz -C $HOME/ Installing into your personal directory gives you full access to the software and is most useful for development. If you are installing VoltDB on a production server where the database will be run, you may want to install the software into a standard system location so that the database cluster can be started with the same commands on all nodes. The following shell commands install the VoltDB software in the folder /opt/voltdb: $ sudo tar -zxvf voltdb-ent-7.0.tar.gz -C /opt $ cd /opt $ sudo mv voltdb-ent-7.0 voltdb Note that installing as root using the sudo command makes the installation folders read-only for non- privileged accounts. Which is why installing in $HOME is recommended for running the sample applications and other development activities. 2.2.1. Upgrading From Older Versions When upgrading an existing database from a recent version of VoltDB, the easiest way to upgrade is as follows: 1. Perform an orderly shutdown of the database, saving a final snapshot (voltadmin shutdown --save) 2. Upgrade the VoltDB software 3. Restart the database (voltdb start) Using this process VoltDB automatically restores the final snapshot taken before the upgrade. To upgrade VoltDB on clusters running database replication (DR), see the instructions specific to DR in the VoltDB Administrator's Guide. If you are upgrading from a version before V6.8, you need to save and restore the snapshot manually. In which case, the recommended steps for upgrading an existing database are: 1. Place the database in admin mode (voltadmin pause --wait). 2. Perform a manual snapshot of the database (voltadmin save --blocking). 3. Shutdown the database (voltadmin shutdown). 4. Upgrade VoltDB. 5. Initialize a new database root directory (voltdb init) 6. Start the new database in admin mode (voltdb start --pause). 7. Restore the snapshot created in Step #2 (voltadmin restore). 6
21 . Installing VoltDB 8. Return the database to normal operations (voltadmin resume). 2.2.2. Building a New VoltDB Distribuon Kit If you want to build the open source VoltDB software from source (for example, if you want to test recent development changes), you must first fetch the VoltDB source files. The VoltDB sources are stored in a GitHub repository. The VoltDB sources are designed to build and run on 64-bit Linux-based or 64-bit Macintosh platforms. However, the build process has not been tested on all possible configurations. Attempts to build the sources on other operating systems may require changes to the build files and possibly to the sources as well. Once you obtain the sources, use Ant 1.7 or later to build a new distribution kit for the current platform: $ ant dist The resulting distribution kit is created as obj/release/volt-n.n.nn.tar.gz where n.n.nn identifies the current version and build numbers. Use this file to install VoltDB according to the instructions in Section 2.2, “Installing VoltDB”. 2.3. Seng Up Your Environment VoltDB comes with shell command scripts that simplify the process of developing and deploying VoltDB applications. These scripts are in the /bin folder under the installation root and define short-cut commands for executing many VoltDB actions. To make the commands available to your session, you must include the /bin directory as part your PATH environment variable. You can add the /bin directory to your PATH variable by redefining PATH. For example, the following shell command adds /bin to the end of the environment PATH, assuming you installed the VoltDB Enterprise Edition as /voltdb-ent-n.n in your $HOME directory: $ export PATH="$PATH:$HOME/voltdb-ent-n.n/bin" To avoid having to redefine PATH every time you create a new session, you can add the preceding command to your shell login script. For example, if you are using the bash shell, you would add the preceding command to the $HOME/.bashrc file. 2.4. What is Included in the VoltDB Distribuon Table 2.2 lists the components that are provided as part of the VoltDB distribution. Table 2.2. Components Installed by VoltDB Component Description VoltDB Software & Runtime The VoltDB software comes as Java archives (.JAR files) and a callable library that can be found in the /voltdb subfolder. Other software libraries that VoltDB depends on are included in a separate /lib subfolder. Example Applications VoltDB comes with several example applications that demonstrate VoltDB capabilities and performance. They can be found in the / examples subfolder. 7
22 . Installing VoltDB Component Description VoltDB Management Center VoltDB Management Center is a browser-based management tool for monitoring, examining, and querying a running VoltDB database. The Management Center is bundled with the VoltDB server software. You can start the Management Center by connecting to the HTTP port of a running VoltDB database server. For example, http:// voltsvr:8080/. Note that the httpd server and JSON interface must be enabled on the server to be able to access the Management Center. Shell Commands The /bin subfolder contains executable scripts to perform common VoltDB tasks, such as starting the VoltDB server process and issuing database queries from the command line using sqlcmd, Add the / bin subfolder to your PATH environment variable to use the following shell commands: csvloader jdbcloader kafkaloader sqlcmd voltadmin voltdb Documentation Online documentation, including the full manuals and javadoc describing the Java programming interface, is available in the /doc subfolder. 2.5. VoltDB in Acon: Running the Sample Applicaons Once you install VoltDB, you can use the sample applications to see VoltDB in action and get a better understanding of how it works. The easiest way to do this is to set directory to the /examples folder where VoltDB is installed. Each sample application has its own subdirectory and a run.sh script to simplify building and running the application. See the README file in the /examples subfolder for a complete list of the applications and further instructions. Once you get a taste for what VoltDB can do, we recommend following the VoltDB tutorial to understand how to create your own applications using VoltDB. 8
23 .Chapter 3. Starng the Database This chapter describes the procedures for starting and stopping a VoltDB database and includes details about configuring the database. The chapter contains the following sections: • Section 3.1, “Initializing and Starting a VoltDB Database” • Section 3.2, “Initializing and Starting a VoltDB Database on a Cluster” • Section 3.3, “Stopping a VoltDB Database” • Section 3.5, “Restarting a VoltDB Database” • Section 3.6, “Updating Nodes on the Cluster” • Section 3.7, “Defining the Cluster Configuration” 3.1. Inializing and Starng a VoltDB Database Before you start a VoltDB database, you must initialize the root directory where VoltDB stores its configuration data, logs, and other disk-based information. Once you initialize the root directory, you can start the database. For example, you can accept the defaults for the voltdb init and start commands to initialize and start a new, single-node database suitable for developing and testing a database and application. $ voltdb init $ voltdb start This creates a VoltDB root directory as a subfolder of your current working directory and starts a database with all default options. You only need to initialize the root directory once and can then start and stop the database as often as you like. $ voltadmin shutdown $ voltdb start If you are using command logging, which is enabled by default in the VoltDB Enterprise Edition, VoltDB automatically saves and recovers your database between any stoppage and a restart. If you are not using command logging, you will want to save a snapshot before shutting down. The easiest way to do this is by adding the --save argument to the shutdown command. The snapshot is automatically restored when the database restarts: $ voltadmin shutdown --save $ voltdb start If you want to create a new database, you can reinitialize the root directory. However, you must use the -- force flag if the database has already been used; VoltDB will not clear the root directory of existing data unless you explicitly "force" it to. $ voltdb init --force $ voltdb start Also, you can specify an alternate location for the root directory using the --dir or -D flag. Of course, you must specify the same location for the root directory when both initializing and starting the database. You cannot start a database in a directory that has not been initialized. 9
24 . Starting the Database $ voltdb init --dir=~/mydb $ voltdb start --dir=~/mydb In most cases, you will want to use additional arguments to configure the server and database options. But the preceding commands are sufficient to get you started in a test environment. The rest of this chapter explains how to use other arguments and how to start, stop, and recover a database when using a cluster. Finally, when using the VoltDB Enterprise Edition, you must provide a license file when starting the database. VoltDB looks for the license as a file named license.xml in three possible locations, in the following order: 1. The current working directory 2. The directory where the VoltDB image files are installed (usually in the /voltdb subfolder of the installation directory) 3. The current user's home directory If the license file is not in any of these locations, you must explicitly identify it when you run the voltdb start command using the --license or -l flag. For example, the command might be: $ voltdb start -l /usr/share/voltdb-license.xml The examples in this manual assume that the license file is in one of the default locations and therefore do not show the --license flag for simplicity's sake. 3.2. Inializing and Starng a VoltDB Database on a Cluster You initialize and start a cluster the same way you start a single node: with the voltdb init and start commands. The only difference is that when starting the cluster, you must tell the cluster nodes how big the cluster is and which nodes to use as potential hosts for the startup. You initialize a root directory on each server using the voltdb init command. You can accept the default configuration as shown in the previous section. However, when setting up a cluster you often want to make some configuration adjustments (for example, enabling K-safety). So it is a good idea to get into the habit of specifying a configuration file. You specify the configuration file with the --config or -C flag when you initialize the root directory. All nodes must use the same configuration file. For example: $ voltdb init -D ~/mydb --config=myconfig.xml Once the nodes are initialized, you start the cluster by issuing the voltdb start command on all nodes specifying the following information: • Number of nodes in the cluster: When you start the cluster, you specify how many servers will make up the cluster using the --count flag. • Host names: You specify the hostnames or IP addresses of one or more servers from the cluster that are potential "hosts" for coordinating the formation of the cluster. You specify the list of hosts with the --host or -H flag. You must specify at least one node as a host. For each node of the cluster, log in and start the server process using the same voltdb start command. For example, the following example starts a five-node database cluster specifying voltsvr1 as the host node. 10
25 . Starting the Database Be sure the number of nodes on which you run the command match the number of nodes specified in the --count argument. $ voltdb start --count=5 -–host=voltsvr1 Or you can also use shortened forms for the argument flags: $ voltdb start -c 5 -H voltsvr1 Although you only need to specify one potential host, it is a good idea to specify multiple hosts. This way, you can use the exact same command for both starting and rejoining nodes in a highly-available cluster. Even if the rejoining node is in the host list another, running node can be chosen to facilitate the rejoin. To simplify even further, you can specify all of the servers in the --host argument. If you do this, you can skip the --count argument. If --count is missing, VoltDB assumes the number of servers in the --host list is complete and sets the server count to match. For example, the following command — issued on all three servers — starts a three node cluster: $ voltdb start --host=svrA,svrB,svrC When starting a VoltDB database on a cluster, the VoltDB server process performs the following actions: 1. If you are starting the database process on the node selected as the host node, it waits for initialization messages from the remaining nodes. The host is selected from the list of hosts on the command line and plays a special role during startup by managing the cluster initiation process. It is important that all nodes in the cluster can resolve the hostnames or IP addresses of the host nodes you specify. 2. If you are starting the database on a non-host node, it sends an initialization message to the host indicating that it is ready. The database is not operational until the correct number of nodes (as specified on the command line) have connected. 3. Once all the nodes have sent initialization messages, the host sends out a message to the other nodes that the cluster is complete. Once the startup procedure is complete, the host's role is over and it becomes a peer like every other node in the cluster. It performs no further special functions. Manually logging on to each node of the cluster every time you want to start the database can be tedious. Instead, you can use secure shell (ssh) to execute shell commands remotely. By creating an ssh script (with the appropriate permissions) you can copy files and/or start the database on each node in the cluster from a single script. Or you can use distributed system management tools such as Chef and Puppet to automate the startup procedures. 3.3. Stopping a VoltDB Database Once the VoltDB database is up and running, you can shut it down by stopping the VoltDB server processes on each cluster node. However, it is easier to stop the database as a whole with a single command. You do this with the voltadmin shutdown command, which pauses database activity, completes all current transactions, and empties any queued data (such as export or database replication) before shutting down. For example, entering the following command without specifying a host server will perform an orderly shut down the database cluster the current system is part of. $ voltadmin shutdown If you are not using command logging, which automatically saves all progress, be sure to add the --save argument to save a final snapshot before shutting down: 11
26 . Starting the Database $ voltadmin shutdown --save To shutdown a database running on another system, use the --host argument to access the remote database. For example, the following command shuts down the VoltDB database that includes the server zeus: $ voltadmin shutdown --host=zeus You can pause the database using the voltadmin pause command to restrict clients from accessing it while you perform changes in administration mode. You resume the database using the voltadmin resume command. See the VoltDB Administrator's Guide for more about modes of operation. 3.4. Saving the Data Because VoltDB is an in-memory database, once the database server process stops, the database schema and the data itself are removed from memory. However, VoltDB can save this information to disk through the use of command logs and snapshots, so use of these features is strongly encouraged. • Command logging provides the most complete data durability for VoltDB and is enabled by default in the VoltDB Enterprise Edition. Command logging works automatically by saving a record of every transaction. These logs can then be replayed if the database stops for any reason. • Snapshots, on the other hand, provide a point-in-time copy of the database contents written to disk. You can create snapshots manually with the voltadmin save command, you can enable periodic (also known as automatic) snapshots, or you can save a final snapshot when you shutdown the database using the voltadmin shutdown --save command. Snapshots are restored when the database restarts, but only take you back to the state of the database at the time the last snapshot was saved. To learn more about using command logging see Chapter 14, Command Logging and Recovery. To learn more about how to save and restore snapshots of the database, see Chapter 13, Saving & Restoring a VoltDB Database. 3.5. Restarng a VoltDB Database Once a database stops, you can restart it using the same voltdb start command used to start the database the first time. Once the database starts, any command logs or snapshots are restored. In the VoltDB Enterprise Edition, command logs automatically restore the last state of the database. If no command log exist but a snapshot does, the database is restored to its state when that snapshot was taken. For example, the following command restarts a single-node database: $ voltdb start To restart a database on a cluster, issue the same voltdb start command used to start that cluster, including the server count and list of host nodes. For example: $ voltdb start --count=5 -–host=voltsvr1 3.6. Updang Nodes on the Cluster A cluster is a dynamic system in which nodes might be stopped either deliberately or by unforeseen circumstances, or nodes might be added to the cluster on-the-fly to scale the database for improved performance. The voltdb start command provides the following additional functions, described later in this book, for rejoining and adding nodes to a running VoltDB database: 12
27 . Starting the Database • Section 10.3, “Recovering from System Failures” — Use the same voltdb start command to start the cluster or rejoin a failed node. • Section 9.3.1, “Adding Nodes with Elastic Scaling” — Use voltdb start with the --add flag to add a new node to the running database cluster. 3.7. Defining the Cluster Configuraon Two important aspects of a VoltDB database are the physical layout of the cluster that runs the database and the database features you choose to use. You define the physical cluster layout on the voltdb start command using the --count and --host arguments. You enable and disable specific database features in the configuration file when you initialize the database root directory with the voltdb init command. The configuration file is an XML file, which you specify when you initialize the root directory. The basic syntax of the configuration file is as follows: <?xml version="1.0"?> <deployment> <cluster kfactor="n" /> <feature option... > </feature> ... </deployment> The attributes of the <cluster> tag define the layout of the database partitions. The attributes of the <cluster> tag are: • sitesperhost — specifies the number of partitions created on each server in the cluster. The sitesperhost value times the number of servers gives you the total number of partitions in the cluster. See Section 3.7.1, “Determining How Many Sites per Host” for more information about partition count. • kfactor — specifies the K-safety value to use for durability when creating the database. The K-safety value controls the duplication of database partitions. See Chapter 10, Availability for more information about K-safety. In the simplest case — when running on a single node with no special options enabled — you can skip the configuration file on the voltdb init command and the server count and host list on the voltdb start command. If you do not specify a configuration file, VoltDB defaults to eight execution sites per host, and a K-safety value of zero. The configuration file is also used to enable and configure many other runtime options related to the database, which are described later in this book. For example, the configuration file can specify: • Whether security is enabled and what users and passwords are needed to authenticate clients at runtime. See Chapter 12, Security for more information. • A schedule for saving automatic snapshots of the database. See Section 13.2, “Scheduling Automated Snapshots”. • Properties for exporting and importing data to other data sources. See Chapter 15, Importing and Exporting Live Data. 13
28 . Starting the Database For the complete configuration file syntax, see Appendix E, Configuration File (deployment.xml). 3.7.1. Determining How Many Sites per Host There is very little penalty for allocating more sites than needed for the partitions the database will use (except for incremental memory usage). Consequently, VoltDB defaults to eight sites per node to provide reasonable performance on most modern system configurations. This default does not normally need to be changed. However, for systems with a large number of available processors (16 or more) or older machines with fewer than 8 processors and limited memory, you may wish to tune the sitesperhost attribute. The number of sites needed per node is related to the number of processor cores each system has, the optimal number being approximately 3/4 of the number of CPUs reported by the operating system. For example, if you are using a cluster of dual quad-core processors (in other words, 8 cores per node), the optimal number of partitions is likely to be 6 or 7 sites per node. <?xml version="1.0"?> <deployment> <cluster . . . sitesperhost="6" /> </deployment> For systems that support hyperthreading (where the number of physical cores support twice as many threads), the operating system reports twice the number of physical cores. In other words, a dual quad- core system would report 16 virtual CPUs. However, each partition is not quite as efficient as on non- hyperthreading systems. So the optimal number of sites is more likely to be between 10 and 12 per node in this situation. Because there are no hard and set rules, the optimal number of sites per node is best calculated by actually benchmarking the application to see what combination of cores and sites produces the best results. However, it is important to remember that all nodes in the cluster will use the same number of sites. So the best performance is achieved by using a cluster with all nodes having the same physical architecture (i.e. cores). 3.7.2. Configuring Paths for Runme Features An important aspect of some runtime features is that they make use of disk resources for persistent storage across sessions. For example, automatic snapshots need a directory for storing snapshots of the database contents. Similarly, export uses disk storage for writing overflow data if the export connector cannot keep up with the export queue. You can specify individual paths for each feature in the configuration file. If not, VoltDB creates subfolders for each feature in the database root directory as needed, which can be useful for testing. However, in production, it is useful to direct certain high volume features, such as command logging, to separate devices to avoid disk I/O affecting database performance. You can identify specific path locations, within the <paths> element, for the following features: • <commandlog> • <commandlogsnapshot> • <exportoverflow> • <snapshots> 14
29 . Starting the Database If you name a specific feature path and it does not exist, VoltDB attempts to create it for you. For example, the <exportoverflow> path contains temporary data which can be deleted periodically. The following excerpt from a configuration file specifies /opt/overflow as the directory for export overflow. <paths> <exportoverflow path="/opt/overflow" /> </paths> 3.7.3. Verifying your Hardware Configuraon The configuration file and start command options define the desired configuration of your database cluster. However, there are several important aspects of the physical hardware and operating system configuration that you should be aware of before running VoltDB: • VoltDB can operate on heterogeneous clusters. However, best performance is achieved by running the cluster on similar hardware with the same type of processors, number of processors, and amount of memory on each node. • All nodes must be able to resolve the IP addresses and host names of the other nodes in the cluster. That means they must all have valid DNS entries or have the appropriate entries in their local hosts file. • You must run a time synchronization service such as Network Time Protocol (NTP) or chrony on all of the cluster nodes, preferably synchronizing against the same local time server. If the time skew between nodes in the cluster is greater than 200 milliseconds, VoltDB cannot start the database. • It is strongly recommended that you configure your time service to avoid adjusting time backwards for all but very large increments. For example, in NTP this is done using the -x argument. If the server time moves backward, VoltDB must pause and wait for time to catch up. 15