
Aggregate Data Models

Chapter 2 discusses aggregate data models, which provide a way to perceive and manipulate data in databases, distinguishing between data models and storage models. It highlights the shift from relational models to NoSQL models, specifically focusing on aggregate orientation, which allows for more complex data structures and easier manipulation of related data as a unit. The chapter also contrasts key-value and document data models, emphasizing their aggregate-oriented nature and the differences in how data is accessed and structured.

Uploaded by

Chess
Copyright
© © All Rights Reserved
Chapter 2: Aggregate Data Models

A data model is the model through which we perceive and manipulate our data. For people using a database, the data model describes how we interact with the data in the database. This is distinct from a storage model, which describes how the database stores and manipulates the data internally. In an ideal world, we should be ignorant of the storage model, but in practice we need at least some inkling of it, primarily to achieve decent performance.

In conversation, the term "data model" often means the model of the specific data in an application. A developer might point to an entity-relationship diagram of their database and refer to that as their data model containing customers, orders, products, and the like. However, in this book we'll mostly be using "data model" to refer to the model by which the database organizes data, which might be more formally called a metamodel.

The dominant data model of the last couple of decades is the relational data model, which is best visualized as a set of tables, rather like a page of a spreadsheet. Each table has rows, with each row representing some entity of interest. We describe this entity through columns, each having a single value. A column may refer to another row in the same or different table, which constitutes a relationship between those entities. (We're using informal but common terminology when we speak of tables and rows; the more formal terms would be relations and tuples.)

One of the most obvious shifts with NoSQL is a move away from the relational model. Each NoSQL solution has a different model that it uses, which we put into four categories widely used in the NoSQL ecosystem: key-value, document, column-family, and graph. Of these, the first three share a common characteristic of their data models which we will call aggregate orientation. In this chapter we'll explain what we mean by aggregate orientation and what it means for data models.

2.1 Aggregates

The relational model takes the information that we want to store and divides it into tuples (rows). A tuple is a limited data structure: It captures a set of values, so you cannot nest one tuple within another to get nested records, nor can you put a list of values or tuples within another. This simplicity underpins the relational model; it allows us to think of all operations as operating on and returning tuples.

Aggregate orientation takes a different approach. It recognizes that often, you want to operate on data in units that have a more complex structure than a set of tuples. It can be handy to think in terms of a complex record that allows lists and other record structures to be nested inside it. As we'll see, key-value, document, and column-family databases all make use of this more complex record. However, there is no common term for this complex record; in this book we use the term "aggregate."

Aggregate is a term that comes from Domain-Driven Design [Evans]. In Domain-Driven Design, an aggregate is a collection of related objects that we wish to treat as a unit. In particular, it is a unit for data manipulation and management of consistency. Typically, we like to update aggregates with atomic operations and communicate with our data storage in terms of aggregates. This definition matches really well with how key-value, document, and column-family databases work. Dealing in aggregates makes it much easier for these databases to handle operating on a cluster, since the aggregate makes a natural unit for replication and sharding. Aggregates are also often easier for application programmers to work with, since they often manipulate data through aggregate structures.

2.1.1 Example of Relations and Aggregates

At this point, an example may help explain what we're talking about.
Let's assume we have to build an e-commerce website; we are going to be selling items directly to customers over the web, and we will have to store information about users, our product catalog, orders, shipping addresses, billing addresses, and payment data. We can use this scenario to model the data using a relation data store as well as NoSQL data stores and talk about their pros and cons. For a relational database, we might start with a data model shown in Figure 2.1.

[Figure 2.1 Data model oriented around a relational database (using UML notation [Fowler UML])]

Figure 2.2 presents some sample data for this model.

[Figure 2.2 Typical data using RDBMS data model]

As we're good relational soldiers, everything is properly normalized, so that no data is repeated in multiple tables. We also have referential integrity. A realistic order system would naturally be more involved than this, but this is the benefit of the rarefied air of a book.

Now let's see how this model might look when we think in more aggregate-oriented terms (Figure 2.3).

[Figure 2.3 An aggregate data model]

Again, we have some sample data, which we'll show in JSON format as that's a common representation for data in NoSQL land.

    // in customers
    {
      "id": 1,
      "name": "Martin",
      "billingAddress": [{"city": "Chicago"}]
    }

    // in orders
    {
      "id": 99,
      "customerId": 1,
      "orderItems": [
        {
          "productId": 27,
          "price": 32.45,
          "productName": "NoSQL Distilled"
        }
      ],
      "shippingAddress": [{"city": "Chicago"}],
      "orderPayment": [
        {
          "ccinfo": "1000-1000-1000-1000",
          "txnId": "abelif879rft",
          "billingAddress": {"city": "Chicago"}
        }
      ]
    }

In this model, we have two main aggregates: customer and order. We've used the black-diamond composition marker in UML to show how data fits into the aggregation structure. The customer contains a list of billing addresses; the order contains a list of order items, a shipping address, and payments. The payment itself contains a billing address for that payment.

A single logical address record appears three times in the example data, but instead of using IDs it's treated as a value and copied each time. This fits the domain where we would not want the shipping address, nor the payment's billing address, to change. In a relational database, we would ensure that the address rows aren't updated for this case, making a new row instead. With aggregates, we can copy the whole address structure into the aggregate as we need to.

The link between the customer and the order isn't within either aggregate; it's a relationship between aggregates. Similarly, the link from an order item would cross into a separate aggregate structure for products, which we haven't gone into. We've shown the product name as part of the order item here. This kind of denormalization is similar to the tradeoffs with relational databases, but is more common with aggregates because we want to minimize the number of aggregates we access during a data interaction.

The important thing to notice here isn't the particular way we've drawn the aggregate boundary so much as the fact that you have to think about accessing that data, and make that part of your thinking when developing the application data model.
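
The point about copying an address as a value rather than referring to it by ID can be sketched in a few lines of Python. This is only an illustration using plain dictionaries in memory; the `place_order` helper is a hypothetical name introduced here, not something from the text or from any database API.

```python
import copy

# Hypothetical in-memory stand-in for the customer aggregate in the sample data.
customer = {
    "id": 1,
    "name": "Martin",
    "billingAddress": [{"city": "Chicago"}],
}


def place_order(order_id, customer, items):
    """Build an order aggregate, copying the address in as a value (illustrative helper)."""
    address = customer["billingAddress"][0]
    return {
        "id": order_id,
        "customerId": customer["id"],  # the link between aggregates is held as an ID
        "orderItems": copy.deepcopy(items),
        "shippingAddress": [copy.deepcopy(address)],
        "orderPayment": [{"billingAddress": copy.deepcopy(address)}],
    }


order = place_order(
    99,
    customer,
    [{"productId": 27, "price": 32.45, "productName": "NoSQL Distilled"}],
)

# The customer later moves; the order keeps the address it was placed with.
customer["billingAddress"][0]["city"] = "Boston"
```

Because each address is copied, a later change to the customer aggregate leaves the addresses recorded in the order untouched, which is the behavior the domain asks for here.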
Indeed we could draw our aggregate boundaries differently, putting all the orders for a customer into the customer aggregate (Figure 2.4).

[Figure 2.4 Embed all the objects for customer and the customer's orders]

Using the above data model, an example Customer and Order would look like this:

    // in customers
    {
      "customer": {
        "id": 1,
        "name": "Martin",
        "billingAddress": [{"city": "Chicago"}],
        "orders": [
          {
            "id": 99,
            "customerId": 1,
            "orderItems": [
              {
                "productId": 27,
                "price": 32.45,
                "productName": "NoSQL Distilled"
              }
            ],
            "shippingAddress": [{"city": "Chicago"}],
            "orderPayment": [
              {
                "ccinfo": "1000-1000-1000-1000",
                "txnId": "abelif879rft",
                "billingAddress": {"city": "Chicago"}
              }
            ]
          }
        ]
      }
    }

Like most things in modeling, there's no universal answer for how to draw your aggregate boundaries. It depends entirely on how you tend to manipulate your data. If you tend to access a customer together with all of that customer's orders at once, then you would prefer a single aggregate. However, if you tend to focus on accessing a single order at a time, then you should prefer having separate aggregates for each order. Naturally, this is very context-specific; some applications will prefer one or the other, even within a single system, which is exactly why many people prefer aggregate ignorance.

2.1.2 Consequences of Aggregate Orientation

While the relational mapping captures the various data elements and their relationships reasonably well, it does so without any notion of an aggregate entity. In our domain language, we might say that an order consists of order items, a shipping address, and a payment.
This can be expressed in the relational model in terms of foreign key relationships, but there is nothing to distinguish relationships that represent aggregations from those that don't. As a result, the database can't use a knowledge of aggregate structure to help it store and distribute the data.

Various data modeling techniques have provided ways of marking aggregate or composite structures. The problem, however, is that modelers rarely provide any semantics for what makes an aggregate relationship different from any other; where there are semantics, they vary. When working with aggregate-oriented databases, we have a clearer semantics to consider by focusing on the unit of interaction with the data storage. It is, however, not a logical data property: It's all about how the data is being used by applications, a concern that is often outside the bounds of data modeling.

Relational databases have no concept of aggregate within their data model, so we call them aggregate-ignorant. In the NoSQL world, graph databases are also aggregate-ignorant. Being aggregate-ignorant is not a bad thing. It's often difficult to draw aggregate boundaries well, particularly if the same data is used in many different contexts. An order makes a good aggregate when a customer is making and reviewing orders, and when the retailer is processing orders. However, if a retailer wants to analyze its product sales over the last few months, then an order aggregate becomes a trouble. To get to product sales history, you'll have to dig into every aggregate in the database. So an aggregate structure may help with some data interactions but be an obstacle for others.
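
The product sales case can be made concrete with a small Python sketch. It assumes order aggregates are held in a plain dictionary keyed by order ID; the second order and the `sales_by_product` function are invented here for illustration only.

```python
from collections import defaultdict

# Hypothetical order aggregates, keyed by order ID as an aggregate store might hold them.
orders = {
    99: {
        "customerId": 1,
        "orderItems": [
            {"productId": 27, "price": 32.45, "productName": "NoSQL Distilled"},
        ],
    },
    100: {
        "customerId": 2,
        "orderItems": [
            {"productId": 27, "price": 32.45, "productName": "NoSQL Distilled"},
            {"productId": 31, "price": 10.00, "productName": "Example Title"},
        ],
    },
}


def sales_by_product(orders):
    """Total sales per product; every order aggregate has to be opened and read."""
    totals = defaultdict(float)
    for order in orders.values():
        for item in order["orderItems"]:
            totals[item["productId"]] += item["price"]
    return dict(totals)


totals = sales_by_product(orders)
```

Fetching one order by its key stays cheap, but the sales question cuts across the aggregate boundary, so the loop has to visit every order in the store.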
An aggregate-ignorant model allows you to easily look at the data in different ways, so it is a better choice when you don't have a primary structure for manipulating your data.

The clinching reason for aggregate orientation is that it helps greatly with running on a cluster, which as you'll remember is the killer argument for the rise of NoSQL. If we're running on a cluster, we need to minimize how many nodes we need to query when we are gathering data. By explicitly including aggregates, we give the database important information about which bits of data will be manipulated together, and thus should live on the same node.

Aggregates have an important consequence for transactions. Relational databases allow you to manipulate any combination of rows from any tables in a single transaction. Such transactions are called ACID transactions: Atomic, Consistent, Isolated, and Durable. ACID is a rather contrived acronym; the real point is the atomicity: Many rows spanning many tables are updated as a single operation. This operation either succeeds or fails in its entirety, and concurrent operations are isolated from each other so they cannot see a partial update.

It's often said that NoSQL databases don't support ACID transactions and thus sacrifice consistency. This is a rather sweeping simplification. In general, it's true that aggregate-oriented databases don't have ACID transactions that span multiple aggregates. Instead, they support atomic manipulation of a single aggregate at a time. This means that if we need to manipulate multiple aggregates in an atomic way, we have to manage that ourselves in the application code. In practice, we find that most of the time we are able to keep our atomicity needs to within a single aggregate; indeed, that's part of the consideration for deciding how to divide up our data into aggregates.
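
As a rough illustration of atomic manipulation of a single aggregate, here is a minimal in-memory sketch in Python. The `AggregateStore` class and its methods are hypothetical and are not modeled on any particular database; the sketch only shows an update that either applies in full or leaves the stored aggregate unchanged.

```python
import copy
import threading


class AggregateStore:
    """Hypothetical in-memory store: each update touches exactly one aggregate."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, aggregate):
        with self._lock:
            self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._data[key])

    def update(self, key, change):
        """Apply change to a private copy; store it only if change raises no error."""
        with self._lock:
            draft = copy.deepcopy(self._data[key])
            change(draft)  # may raise, leaving the stored aggregate untouched
            self._data[key] = draft


store = AggregateStore()
store.put("order:99", {"orderItems": [], "total": 0.0})


def add_item(order):
    order["orderItems"].append({"productId": 27, "price": 32.45})
    order["total"] += 32.45


def failing_change(order):
    order["orderItems"].clear()
    raise ValueError("payment declined")


store.update("order:99", add_item)

try:
    store.update("order:99", failing_change)
except ValueError:
    pass  # the half-applied change was discarded with the draft copy
```

A change that spans two keys would need two separate `update` calls in this sketch, and the application code would have to decide what to do if the second one failed, which is the situation the text describes.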
We should also remember that graph and other aggregate-ignorant databases usually do support ACID transactions similar to relational databases. Above all, the topic of consistency is much more involved than whether a database is ACID or not, as we'll explore in Chapter 5.

2.2 Key-Value and Document Data Models

We said earlier on that key-value and document databases were strongly aggregate-oriented. What we meant by this was that we think of these databases as primarily constructed through aggregates. Both of these types of databases consist of lots of aggregates with each aggregate having a key or ID that's used to get at the data.

The two models differ in that in a key-value database, the aggregate is opaque to the database: just some big blob of mostly meaningless bits. In contrast, a document database is able to see a structure in the aggregate. The advantage of opacity is that we can store whatever we like in the aggregate. The database may impose some general size limit, but other than that we have complete freedom. A document database imposes limits on what we can place in it, defining allowable structures and types. In return, however, we get more flexibility in access.

With a key-value store, we can only access an aggregate by lookup based on its key. With a document database, we can submit queries to the database based on the fields in the aggregate, we can retrieve part of the aggregate rather than the whole thing, and the database can create indexes based on the contents of the aggregate.

In practice, the line between key-value and document gets a bit blurry. People often put an ID field in a document database to do a key-value style lookup.
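
The difference in access can be sketched with two toy stores in Python. Both are plain in-memory structures made up for illustration; `kv_get`, `find`, and the second customer record are hypothetical and do not correspond to any real product's API.

```python
import json

# Key-value style: the store holds opaque bytes and never looks inside them.
kv_store = {}
kv_store["customer:1"] = json.dumps(
    {"id": 1, "name": "Martin", "billingAddress": [{"city": "Chicago"}]}
).encode("utf-8")


def kv_get(key):
    """The only access path is lookup by key; the caller decodes the blob."""
    return kv_store[key]


# Document style: the store can see fields, so it can query and project on them.
documents = [
    {"id": 1, "name": "Martin", "billingAddress": [{"city": "Chicago"}]},
    {"id": 2, "name": "Ada", "billingAddress": [{"city": "Boston"}]},  # made-up record
]


def find(collection, field, value, projection=None):
    """Return documents whose field equals value, optionally keeping only some fields."""
    matches = [doc for doc in collection if doc.get(field) == value]
    if projection is None:
        return matches
    return [{k: doc[k] for k in projection if k in doc} for doc in matches]


blob = kv_get("customer:1")                          # whole aggregate, as bytes
by_name = find(documents, "name", "Martin", ["id"])  # query on a field, partial result
by_id = find(documents, "id", 2)                     # key-value style lookup via an ID field
```

The last call shows why the line blurs: querying a document store on an ID field behaves much like a key lookup.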
Databases classified as key-value databases may allow you structures for data beyond just an opaque aggregate. For example, Riak allows you to add metadata to aggregates for indexing and interaggregate links, Redis allows you to break down the aggregate into lists or sets. You can support querying by integrating search tools such as Solr. As an example, Riak includes a search facility that uses Solr-like searching on any aggregates that are stored as JSON or XML structures.

Despite this blurriness, the general distinction still holds. With key-value databases, we expect to mostly look up aggregates using a key. With document databases, we mostly expect to submit some form of query based on the internal structure of the document; this might be a key, but it's more likely to be something else.

2.3 Column-Family Stores

One of the early and influential NoSQL databases was Google's BigTable [Chang etc.]. Its name conjured up a tabular structure which it realized with sparse columns and no schema. As you'll soon see, it doesn't help to think of this structure as a table; rather, it is a two-level map. But, however you think about the structure, it has been a model that influenced later databases such as HBase and Cassandra.

These databases with a bigtable-style data model are often referred to as column stores, but that name has been around for a while to describe a different animal. Pre-NoSQL column stores, such as C-Store [C-Store], were happy with SQL and the relational model. The thing that made them different was the way in which they physically stored data. Most databases have a row as a unit of storage which, in particular, helps write performance.
However, there are many scenarios where writes are rare, but you often need to read a few columns of many rows at once. In this situation, it's better to store groups of columns for all rows as the basic storage unit, which is why these databases are called column stores.

Bigtable and its offspring follow this notion of storing groups of columns (column families) together, but part company with C-Store and friends by abandoning the relational model and SQL. In this book, we refer to this class of databases as column-family databases.

Perhaps the best way to think of the column-family model is as a two-level aggregate structure. As with key-value stores, the first key is often described as a row identifier, picking up the aggregate of interest. The difference with column-family structures is that this row aggregate is itself formed of a map of more detailed values. These second-level values are referred to as columns. As well as accessing the row as a whole, operations also allow picking out a particular column, so to get a particular customer's name from Figure 2.5 you could do something like get('1234', 'name').

Column-family databases organize their columns into column families. Each column has to be part of a single column family, and the column acts as unit for access, with the assumption that data for a particular column family will be usually accessed together.

This also gives you a couple of ways to think about how the data is structured.
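
The two-level map idea and the get('1234', 'name') style of access can be sketched in Python as follows. Figure 2.5 is not reproduced in this excerpt, so the family names, column names, and values below are assumptions made for illustration, and `ColumnFamilyStore` is a hypothetical class rather than the API of HBase, Cassandra, or Bigtable.

```python
class ColumnFamilyStore:
    """Hypothetical two-level map: row key -> column family -> column -> value."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, family, column, value):
        self._rows.setdefault(row_key, {}).setdefault(family, {})[column] = value

    def get(self, row_key, column):
        """Pick out a single column from a row, whichever family holds it."""
        for columns in self._rows[row_key].values():
            if column in columns:
                return columns[column]
        raise KeyError(column)

    def get_family(self, row_key, family):
        """Fetch the columns of one family together, as they are assumed to be used."""
        return dict(self._rows[row_key][family])

    def get_row(self, row_key):
        """Fetch the whole row aggregate."""
        return {family: dict(cols) for family, cols in self._rows[row_key].items()}


store = ColumnFamilyStore()
# Assumed sample data: a "profile" family and an "orders" family for row 1234.
store.put("1234", "profile", "name", "Martin")
store.put("1234", "profile", "billingAddress", "Chicago")
store.put("1234", "orders", "order-99", "NoSQL Distilled")

name = store.get("1234", "name")
profile = store.get_family("1234", "profile")
```

Grouping the inner map by family mirrors the assumption in the text that columns in one family tend to be read together, while the row key still identifies the aggregate as a whole.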
