1.HBase Schema Design
2.HBase Schema Design How I Learned To Stop Worrying And Love Denormalization
3.What Is Schema Design?
4. Who am I? Ian Varley, software engineer at Salesforce.com (@thefutureian)
6.What Is Schema Design? Logical Data Modeling + Physical Implementation
7.You always start with a logical model. Even if it's just an implicit one. That's totally fine. (If you're right.)
8.There are lots of ways to model data. The most common one is Entity / Attribute / Relationship. (This is probably what you know of as just "data modeling".)
9.There's a well established visual language for data modeling.
10. Entities are boxes. With rounded corners if you're fancy.
11.Attributes are listed vertically in the box. Optionally with data types, for clarity.
12.Relationships are connecting lines. Optionally with special endpoints and/or verbs.
15. Example: Concerts A note about diagrams: they're useful for communicating, but can be more trouble than they're worth. Don't do them out of obligation; only do them to understand your problem better.
16. Example: Concerts (diagram labels: Entity, Attribute, Relationship)
18.For relational databases, you usually start with this normalized model, then plug & chug:
● Entities → Tables
● Attributes → Columns
● Relationships → Foreign Keys
● Many-to-many → Junction tables
● Natural keys → Artificial IDs
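The plug & chug mapping can be sketched roughly like this. It's a minimal sketch, not the deck's actual Concerts model: the table and column names (artist, venue, concert, concert_artist) are assumptions made up for illustration, and SQLite via Python's built-in sqlite3 module stands in for "a relational database".

```python
import sqlite3


def build_concert_schema(conn):
    """Create a hypothetical normalized schema for the Concerts example."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- Entity -> table; natural key (name) -> artificial ID
        CREATE TABLE artist (
            artist_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL UNIQUE
        );
        CREATE TABLE venue (
            venue_id INTEGER PRIMARY KEY,
            name     TEXT NOT NULL,      -- attribute -> column
            city     TEXT NOT NULL
        );
        -- Relationship (a concert is held at a venue) -> foreign key
        CREATE TABLE concert (
            concert_id INTEGER PRIMARY KEY,
            venue_id   INTEGER NOT NULL REFERENCES venue(venue_id),
            held_on    TEXT NOT NULL
        );
        -- Many-to-many (artists play concerts) -> junction table
        CREATE TABLE concert_artist (
            concert_id INTEGER NOT NULL REFERENCES concert(concert_id),
            artist_id  INTEGER NOT NULL REFERENCES artist(artist_id),
            PRIMARY KEY (concert_id, artist_id)
        );
    """)


def artists_at(conn, venue_name):
    """Answer 'who has played this venue?' by joining back through the keys."""
    rows = conn.execute(
        """
        SELECT DISTINCT a.name
        FROM artist a
        JOIN concert_artist ca ON ca.artist_id = a.artist_id
        JOIN concert c         ON c.concert_id = ca.concert_id
        JOIN venue v           ON v.venue_id = c.venue_id
        WHERE v.name = ?
        ORDER BY a.name
        """,
        (venue_name,),
    ).fetchall()
    return [row[0] for row in rows]


# Tiny made-up data set to exercise the schema.
conn = sqlite3.connect(":memory:")
build_concert_schema(conn)
conn.executemany("INSERT INTO artist (artist_id, name) VALUES (?, ?)",
                 [(1, "Band A"), (2, "Band B")])
conn.execute("INSERT INTO venue (venue_id, name, city) VALUES (1, 'Hall X', 'Austin')")
conn.execute("INSERT INTO concert (concert_id, venue_id, held_on) VALUES (10, 1, '2012-06-01')")
conn.executemany("INSERT INTO concert_artist (concert_id, artist_id) VALUES (?, ?)",
                 [(10, 1), (10, 2)])
names = artists_at(conn, "Hall X")
```

Note that even a simple question needs a three-way join here. That's the price of the normalized model, and at modest data sizes it's usually a fine price to pay.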
20.So, what's missing?
22. So, what's missing? If your data is not massive, NOTHING. You should use a relational database. They rock.*
* This statement has not been approved by the HBase product management committee, and neglects known deficiencies with the relational model such as poor modeling of hierarchies and graphs, overly rigid attribute structure enforcement, neglect of the time dimension, and physical optimization concerns leaking into the conceptual abstraction.
23.Relational DBs work well because they are close to the pure logical model. That model is less likely to change as your business needs change. You may want to ask different questions over time, but if you got the logical model correct, you'll have the answers.
24.Ah, but what if you do have massive data? Then what's missing?
26.Problem: The relational model doesn't acknowledge scale. "It's an implementation concern; you shouldn't have to worry about it."
27.The trouble is, you do have to worry about it. So you...
● Add indexes
● Add hints
● Write really complex, messy SQL
● Memorize books by Tom Kyte & Joe Celko
● Bow down to the optimizer!
● Denormalize
● Cache
● etc ...
28.Generally speaking, you poke holes in the abstraction, and it starts leaking.
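Here's one way the leak can look in practice. This is a hedged sketch using Python's built-in sqlite3 module and functools.lru_cache; the venue and concert tables, the venue_name copy column, and the function names are all assumptions invented for illustration, not anything from a real system.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venue (
        venue_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE concert (
        concert_id INTEGER PRIMARY KEY,
        venue_id   INTEGER NOT NULL REFERENCES venue(venue_id),
        held_on    TEXT NOT NULL,
        venue_name TEXT            -- denormalized copy of venue.name
    );
    -- Add an index: a physical concern the logical model never mentioned.
    CREATE INDEX concert_by_date ON concert(held_on);
    INSERT INTO venue VALUES (1, 'Hall X');
    INSERT INTO concert VALUES (10, 1, '2012-06-01', 'Hall X');
""")


def venue_name_normalized(concert_id):
    """The 'pure' answer: join back to the single source of truth."""
    row = conn.execute(
        "SELECT v.name FROM concert c "
        "JOIN venue v ON v.venue_id = c.venue_id "
        "WHERE c.concert_id = ?",
        (concert_id,),
    ).fetchone()
    return row[0]


def venue_name_denormalized(concert_id):
    """Skip the join by reading the copied column."""
    row = conn.execute(
        "SELECT venue_name FROM concert WHERE concert_id = ?",
        (concert_id,),
    ).fetchone()
    return row[0]


@lru_cache(maxsize=1024)
def venue_name_cached(concert_id):
    """Skip the database entirely on repeat lookups."""
    return venue_name_normalized(concert_id)


before = venue_name_cached(10)  # warms the cache with 'Hall X'

# The venue gets renamed, but only in the normalized table.
conn.execute("UPDATE venue SET name = 'Hall Y' WHERE venue_id = 1")

after = (
    venue_name_normalized(10),    # 'Hall Y': the logical model is still right
    venue_name_denormalized(10),  # 'Hall X': the copy is now stale
    venue_name_cached(10),        # 'Hall X': so is the cache
)
```

Each shortcut buys speed, and each one now needs its own upkeep (rewrite the copies, invalidate the cache) that the logical model knows nothing about.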
29.So then you hear about this thing called NoSQL. Can it help?