The 5 Main Benefits of Apache Cassandra™

EBOOK | THE 5 MAIN BENEFITS OF APACHE CASSANDRA™

Introduction

For decades, organizations relied on traditional relational database management systems (RDBMS) to organize, store, and analyze their data. But then Facebook came along, and an RDBMS was suddenly not quite enough. The social giant needed a powerful database solution for its Inbox Search feature, and Apache Cassandra, a distributed NoSQL database, was born.

Released as an open source project in July 2008, Cassandra (named after the mythological prophet who was famously cursed never to be believed) became an Apache Incubator project in March 2009. It graduated to a top-level project in February 2010.

Since graduating in 2010, Cassandra has gone through several iterations. As we approach the release of Cassandra 4.0, it’s worth checking out a brief overview of how the database evolved over the last several years:

- June 2011: Cassandra 0.8 was released, adding support for the Cassandra Query Language (CQL), support for zero-downtime upgrades, and more.
- October 2011: Cassandra 1.0 was released, adding improved read performance, integrated compression, and more.
- September 2013: Cassandra 2.0 was released, adding lightweight transactions, improved compactions, and more.
- November 2015: Cassandra 3.0 was released, adding a refactored storage engine, materialized views, and more.
- June 2017: Cassandra 3.11 was released; it’s the latest release.
- Cassandra 4.0 is expected to be released in the near future. It will include increased reliability, audit logging, and simplified repair operations, among other things.

As an open source project, Cassandra is freely available from the Apache Software Foundation. There are, however, various distributions of Cassandra. One of them is DataStax Distribution of Apache Cassandra™, which is distributed and supported by the same people who wrote the majority of Cassandra’s code.

Cassandra adoption has significantly increased over the last few years, and for good reason: the distributed database delivers a ton of value. With that in mind, let’s take a look at five of the big benefits of Cassandra.

Scalability

When you scale easily, you win. Period. There’s no substitute for knowing you’ll be able to handle a surge of holiday season traffic, even when you’re asleep.

On the flipside, when scaling is difficult to achieve or adds significant risk like potential downtime, you panic. You never know when a large influx of traffic is headed your way. If your systems can’t scale to accommodate this traffic, your customers will go somewhere else.

Generally speaking, there are two ways to achieve scale at the database level:

1. You can scale upward by adding capacity to a single machine (e.g., memory, storage, and CPU). You’ll only need to manage a single system, or a small number of systems, rather than many servers. However, there’s a much bigger chance your infrastructure will fail under the increased strain, you’ll have a single point of failure, and you will likely spend a lot of money on implementation (i.e., expensive high-end hardware), to the point that you’ll likely be completely locked in.

2. You can scale out by adding more servers. Of course, you’ll have to run more servers. Licensing fees and utility costs might go up, too. But overall you’ll spend a lot less cash compared to scaling up. You’ll also enjoy resilience and fault tolerance, both of which can be baked into the foundation of the database cake.

Cassandra enables organizations to scale out easily in a linear fashion, which is quickly becoming the preferred method of scalability for leading enterprises. Scaling out is simple: if you want to double the workload, just double the number of servers. It’s that easy. You can scale out without downtime and without degrading performance.
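The "double the workload, double the servers" rule can be sketched as simple arithmetic. The snippet below is a minimal illustration under an idealized linear-scaling model; the function name `nodes_needed` and the per-node throughput of 50,000 ops/sec are assumptions chosen for the example, not measured Cassandra figures.

```python
import math


def nodes_needed(target_ops_per_sec: int, ops_per_node: int = 50_000) -> int:
    """Estimate cluster size under an idealized linear-scaling model.

    ops_per_node is a hypothetical per-node throughput; real capacity
    depends on hardware, data model, and workload.
    """
    if target_ops_per_sec <= 0 or ops_per_node <= 0:
        raise ValueError("throughput values must be positive")
    # Round up: a fraction of a node still means one more server.
    return math.ceil(target_ops_per_sec / ops_per_node)


if __name__ == "__main__":
    for target in (100_000, 200_000, 400_000):
        print(f"{target:>7} ops/sec -> {nodes_needed(target)} nodes")
```

Under these assumptions, each doubling of the target throughput doubles the node count. Real clusters should be sized from benchmarks of the actual workload.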

[Figure: linear scalability. 2 nodes: 100,000 ops/sec; 4 nodes: 200,000 ops/sec; 8 nodes: 400,000 ops/sec.]

Not only will Cassandra’s ability to scale save you tons of money, but you also won’t have to worry about getting locked into a less-than-optimal vendor’s tech stack.

BUSINESS VALUE: It’s estimated that Amazon lost up to $100 million on a roughly one-hour outage in 2018, ostensibly due to too many users flooding the site simultaneously. With scalable systems in place, your business won’t miss out on opportunities during heavily trafficked periods, and you’ll be avoiding extremely costly outages. Add opportunity. Subtract losses. That’s value.

High Availability via Data Replication

It’s a bit of a paradox, but the world is becoming increasingly connected as it becomes increasingly distributed. This evolving reality demands a database that can handle data coming from multiple geographically distributed sources.

Traditionally, databases had master-slave architectures. Master nodes could read and write while slave nodes could only read. While this architecture helped ensure consistency, it also introduced serious problems. Database operations, for example, would grind to a halt in the event the master node failed.

That might have been something an enterprise could stomach in the 1980s. But as we approach 2020, no serious organization can absorb such a significant disruption.

Good news: Cassandra’s masterless architecture means that every node can perform read and write operations. This enables data to quickly be replicated across data centers and geographies.

[Figure: data replicated across regions and clouds. North America (Amazon EC2; San Francisco, New York) and EMEA (Microsoft Azure).]

As a result, team members and customers spread out across the world can expect an optimal experience each time they interact with applications. Data is always available, no matter where the physical infrastructure is located. In the event a node gets knocked offline, traffic is automatically rerouted to the nearest healthy node.

BUSINESS VALUE: Recent IBM research revealed that bad data collectively costs U.S. organizations $3.1 trillion each year. Thanks to Cassandra, you won’t have to worry about duplicative work, lost intellectual property, or inaccessible customer data. Automatic data replication means data is never lost, and because of this, you don’t need to invest in a separate disaster recovery data center. Money saved is money earned.
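The idea of rerouting traffic to the nearest healthy node can be pictured with a toy model. This is only an illustrative sketch, not how Cassandra or its drivers implement routing; the `Node` class, the `pick_node` function, and the latency values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Node:
    """Hypothetical view of a cluster node as seen by a client."""
    name: str
    latency_ms: float
    healthy: bool = True


def pick_node(nodes: Iterable[Node]) -> Node:
    """Return the lowest-latency node that is still healthy."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        raise RuntimeError("no healthy nodes available")
    return min(candidates, key=lambda n: n.latency_ms)


if __name__ == "__main__":
    cluster = [
        Node("new-york", 12.0, healthy=False),  # nearest node is offline
        Node("san-francisco", 70.0),
        Node("emea", 95.0),
    ]
    print(pick_node(cluster).name)
```

Because every node in a masterless cluster can serve reads and writes, losing the closest node only changes which node answers; the request still succeeds.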

High Fault Tolerance

In a perfect world, your systems would always run as designed, even when one part fails. Cassandra gives you the ticket to that perfect world.

Thanks to its masterless, peer-to-peer architecture and data replication capabilities, applications never slow down or fail when nodes get knocked offline. If you use the leading distribution of Cassandra, DataStax Enterprise, you’ll have built-in repair services that fix problems immediately after they occur.

Cassandra also has transparent fault detection and recovery: nodes that fail can easily be restored or replaced.

When a node goes down, master-slave architectures require administrators to invest a lot of time and energy repairing the database. Cassandra has no such requirements; there’s no need for any manual intervention when a node fails.

With Cassandra, you can forget about fault tolerance altogether. It’s automatic.

BUSINESS VALUE: According to Gartner, the average company loses $5,600 per minute of downtime. On the high end, an enterprise can lose as much as $540,000 per hour of downtime. Who can afford that?
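How many node failures a request can survive depends on the replication factor and on how many replicas must acknowledge the request. The sketch below works through that arithmetic using Cassandra's quorum rule (a majority of replicas, i.e. floor(RF / 2) + 1); the function names are hypothetical helpers written for this example.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a majority of the replication factor."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerable_failures(replication_factor: int, required_acks: int) -> int:
    """Replicas that can be down while requests still succeed."""
    if not 1 <= required_acks <= replication_factor:
        raise ValueError("required_acks must be between 1 and the replication factor")
    return replication_factor - required_acks


if __name__ == "__main__":
    for rf in (3, 5):
        q = quorum(rf)
        print(f"RF={rf}: quorum={q}, survives {tolerable_failures(rf, q)} replica(s) down")
```

With a replication factor of 3, quorum reads and writes keep working with one replica offline; requiring acknowledgement from every replica would tolerate none.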

High Performance

Suffice it to say: speed matters.

We expect prompt service at restaurants, quick delivery of packages, and zero lag from our applications. And when things don’t happen as quickly as we hoped, we are prone to switch to a better service.

The same holds true for websites and applications. Consider these statistics compiled by HubSpot:

- 47% of customers expect a website to load in two seconds or less
- 79% of customers are unlikely to support a business that has poor website performance
- A one-second delay in page load time translates into an 11% reduction in page views

The end result? Employees can get things done quickly and customers can enjoy positive user experiences in every interaction. In a world that moves faster than lightning, yesterday’s data is already a dinosaur.

BUSINESS VALUE: Thanks to Cassandra’s high performance, developer productivity increases as users don’t have high latency or bottlenecks slowing them down. From the customer’s perspective, websites and applications will work as they’re expected to, translating into positive user experiences and improved customer retention.

Multi-Data Center and Hybrid Cloud Support

In an age where hybrid cloud is quickly becoming the go-to data management environment, this is key.

Cassandra is designed as a distributed system for deployment of large numbers of nodes across multiple data centers. Key features of Cassandra’s distributed architecture are specifically tailored for multiple-data center deployment. These features are robust and flexible enough that you can configure the cluster for optimal geographical distribution, for redundancy for failover and disaster recovery, or even for creating a dedicated analytics center replicated from your main data storage centers.

Cassandra characteristics that are key to multi-data center deployment include:

- Replication factor and replica placement strategy: NetworkTopologyStrategy (the placement strategy intended for multi-data center clusters) has capabilities for fine-grained adjustment of the number and location of replicas at the data center and rack level.
- Snitch: For multi-data center deployments, it is important to make sure the snitch has complete and accurate information about the network, either by automatic detection (RackInferringSnitch) or details specified in a properties file (PropertyFileSnitch).
- Consistency level: Cassandra provides consistency levels that are specifically designed for scenarios with multiple data centers.

Your specific needs will determine how you combine these ingredients in a “recipe” for multi-data center operations.

BUSINESS VALUE: Being able to reliably serve a distributed, global audience with powerful, always-on applications means using multiple data centers. To serve multiple data centers, having an easily scalable database across geographic regions is critical.
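Per-data center replica counts are set when a keyspace is created. The sketch below builds a CQL `CREATE KEYSPACE` statement that uses `NetworkTopologyStrategy`; the helper `create_keyspace_cql`, the keyspace name `shop`, and the data center names `dc_east` and `dc_west` are hypothetical, and in a real cluster the data center names must match the ones reported by the snitch.

```python
from typing import Mapping


def create_keyspace_cql(keyspace: str, replicas_per_dc: Mapping[str, int]) -> str:
    """Build a CREATE KEYSPACE statement with per-data center replica counts."""
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    if any(count < 1 for count in replicas_per_dc.values()):
        raise ValueError("each data center needs at least one replica")
    # The replication map names the strategy, then one entry per data center.
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {count}" for dc, count in replicas_per_dc.items()]
    options = ", ".join(parts)
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {{{options}}};"


if __name__ == "__main__":
    # Hypothetical layout: three replicas in one data center, two in another.
    print(create_keyspace_cql("shop", {"dc_east": 3, "dc_west": 2}))
```

The generated statement could then be run through `cqlsh` or a driver. Keeping the replica counts in a plain mapping makes it easy to dedicate a smaller data center to analytics or disaster recovery, as described above.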

How DataStax Distribution of Apache Cassandra Adds Value

As previously mentioned, there are also various distributions of Cassandra out there, one of which is DataStax Distribution of Apache Cassandra.

DataStax Distribution of Apache Cassandra is 100% open source compatible and allows organizations to unlock the true power of Cassandra.

Enterprises that use DataStax Distribution of Apache Cassandra benefit from a production-ready version of Cassandra that’s gone through an intensive QA process.

Remember how expensive downtime is? With DataStax Distribution of Apache Cassandra, hotfixes, bug escalation, and upgrades are included, which accelerates time to resolution and reduces maintenance costs.

DataStax Distribution of Apache Cassandra also comes with 8x5 support, with the option of 24x7x365 support, from the folks responsible for writing a majority of the Cassandra codebase.

DataStax Distribution of Apache Cassandra allows you to avoid the maintenance, support, and compliance issues many enterprises that deploy the open source version of Cassandra eventually run into.

Get started with DataStax Distribution of Apache Cassandra here.

About DataStax

DataStax delivers the always-on, active-everywhere distributed hybrid cloud database built on Apache Cassandra™. The foundation for personalized, real-time applications at scale, DataStax Enterprise makes it easy for enterprises to exploit hybrid and multi-cloud environments via a seamless data layer that eliminates the issues that typically come with deploying applications across multiple on-premises data centers and/or multiple public clouds.

Our product also gives businesses full data visibility, portability, and control, allowing them to retain strategic ownership of their most valuable asset in a hybrid/multi-cloud world. We help many of the world’s leading brands across industries transform their businesses through an enterprise data layer that eliminates data silos and cloud vendor lock-in while powering modern, mission-critical applications.

For more information, visit www.DataStax.com and follow us on Twitter @DataStax.

© 2019 DataStax, All Rights Reserved. DataStax, Titan, and TitanDB are registered trademarks of DataStax, Inc. and its subsidiaries in the United States and/or other countries. Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka, and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States, and/or other countries.

Last Update: FEB2019
