HBase Schema Design

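Most developers are familiar with the term "database design", and in the relational world normal forms are the key concept. In the big-data era, however, distributed, scalable NoSQL databases such as HBase break some of the normal-form rules of traditional relational databases. This deck introduces the ideas behind HBase schema design at a high level and presents some general patterns and real schema design case studies. It also covers basic HBase concepts such as rowkeys and column families, because understanding them clarifies the trade-offs we have to make when defining a schema.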
1.HBase Schema Design

2.HBase Schema Design: How I Learned To Stop Worrying And Love Denormalization

3.What Is Schema Design?

4. Who am I? Ian Varley, software engineer at Salesforce.com, @thefutureian

5.What Is Schema Design? Logical Data Modeling

6.What Is Schema Design? Logical Data Modeling + Physical Implementation

7.You always start with a logical model. Even if it's just an implicit one. That's totally fine. (If you're right.)

8.There are lots of ways to model data. The most common one is: Entity / Attribute / Relationship (This is probably what you know of as just "data modeling".)

9.There's a well established visual language for data modeling.

10. Entities are boxes. With rounded corners if you're fancy.

11.Attributes are listed vertically in the box. Optionally with data types, for clarity.

12.Relationships are connecting lines. Optionally with special endpoints and/or verbs.

13.Example: Concerts

14.Example: Concerts

15. Example: Concerts A note about diagrams: they're useful for communicating, but can be more trouble than they're worth. Don't do them out of obligation; only do them to understand your problem better.

16. Example: Concerts Entity Attribute Relationship

17.For relational databases, you usually start with this normalized model, then plug & chug.

18.For relational databases, you usually start with this normalized model, then plug & chug. ● Entities → Tables ● Attributes → Columns ● Relationships → Foreign Keys ● Many-to-many → Junction tables ● Natural keys → Artificial IDs
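
The mapping on this slide can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the schema from the deck: the tables `artist`, `concert`, and `performance` and all of their columns are hypothetical names chosen for the concert example.

```python
import sqlite3

# Hypothetical concert schema; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    -- Entity -> table; natural key (the name) -> artificial integer ID
    CREATE TABLE artist (
        artist_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    -- Attributes -> columns
    CREATE TABLE concert (
        concert_id INTEGER PRIMARY KEY,
        venue      TEXT NOT NULL,
        held_on    TEXT NOT NULL
    );
    -- Many-to-many relationship -> junction table made of foreign keys
    CREATE TABLE performance (
        artist_id  INTEGER NOT NULL REFERENCES artist(artist_id),
        concert_id INTEGER NOT NULL REFERENCES concert(concert_id),
        PRIMARY KEY (artist_id, concert_id)
    );
""")

conn.execute("INSERT INTO artist (artist_id, name) VALUES (1, 'Example Band')")
conn.execute(
    "INSERT INTO concert (concert_id, venue, held_on) VALUES (10, 'Example Hall', '2024-01-01')"
)
conn.execute("INSERT INTO performance (artist_id, concert_id) VALUES (1, 10)")

# Answering "who played where?" means joining through the junction table.
rows = conn.execute("""
    SELECT a.name, c.venue
    FROM performance p
    JOIN artist  a ON a.artist_id  = p.artist_id
    JOIN concert c ON c.concert_id = p.concert_id
""").fetchall()
print(rows)
```

Each question asked of the normalized model becomes a join, which is what keeps the schema close to the logical model.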

19.

20.So, what's missing?

21. So, what's missing? If your data is not massive, NOTHING. You should use a relational database. They rock*

22. So, what's missing? If your data is not massive, NOTHING. You should use a relational database. They rock* * - This statement has not been approved by the HBase product management committee, and neglects known deficiencies with the relational model such as poor modeling of hierarchies and graphs, overly rigid attribute structure enforcement, neglect of the time dimension, and physical optimization concerns leaking into the conceptual abstraction.

23.Relational DBs work well because they are close to the pure logical model. That model is less likely to change as your business needs change. You may want to ask different questions over time, but if you got the logical model correct, you'll have the answers.

24.Ah, but what if you do have massive data? Then what's missing?

25.Problem: The relational model doesn't acknowledge scale.

26.Problem: The relational model doesn't acknowledge scale. "It's an implementation concern; you shouldn't have to worry about it."

27.The trouble is, you do have to worry about it. So you... ● Add indexes ● Add hints ● Write really complex, messy SQL ● Memorize books by Tom Kyte & Joe Celko ● Bow down to the optimizer! ● Denormalize ● Cache ● etc ...
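
Three of the workarounds listed above (an index, a denormalized column, and a cache) can be sketched in a few lines of Python with `sqlite3` and `functools.lru_cache`. This is only an assumed illustration: the `artist` and `ticket` tables, their columns, and the `revenue_cents` helper are hypothetical and do not come from the deck.

```python
import sqlite3
from functools import lru_cache

# Hypothetical tables; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (
        artist_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE ticket (
        ticket_id   INTEGER PRIMARY KEY,
        artist_id   INTEGER NOT NULL REFERENCES artist(artist_id),
        artist_name TEXT NOT NULL,    -- denormalized copy of artist.name, avoids a join on reads
        price_cents INTEGER NOT NULL
    );
    -- Index added purely for read speed; it changes nothing in the logical model
    CREATE INDEX ticket_by_artist ON ticket (artist_id);

    INSERT INTO artist VALUES (1, 'Example Band');
    INSERT INTO ticket VALUES (100, 1, 'Example Band', 2500);
    INSERT INTO ticket VALUES (101, 1, 'Example Band', 4000);
""")


@lru_cache(maxsize=128)
def revenue_cents(artist_id: int) -> int:
    """Sum ticket prices for one artist; repeated calls are served from the cache."""
    row = conn.execute(
        "SELECT COALESCE(SUM(price_cents), 0) FROM ticket WHERE artist_id = ?",
        (artist_id,),
    ).fetchone()
    return row[0]


total = revenue_cents(1)   # first call runs the query
again = revenue_cents(1)   # second call is answered by the cache
hits = revenue_cents.cache_info().hits
print(total, again, hits)
```

Each shortcut adds something the application now has to keep in sync by hand: the copied `artist_name` goes stale if the artist is renamed, and the cached total goes stale as soon as another ticket is inserted.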

28.Generally speaking, you poke holes in the abstraction, and it starts leaking.

29.So then you hear about this thing called NoSQL. Can it help?