
Index

A
Aerospike, 91, 217
Aerospike query language (AQL), 218
AJAX. See Asynchronous JavaScript and XML (AJAX)
Alternative persistence model, 92
Amazon
  ACID RDBMS, 46
  Dynamo, 14, 45–46
  DynamoDB, 219
  hashing, 47–48
  key-value stores, 51
  NWR notation, 49–50
  SOA, 45
Amazon Web Services (AWS), 15
Apache Cassandra, 218. See also Cassandra
Apache HBase, 220
Apache Kudu, 211. See also HBase
Append only file (AOF), 94
Asynchronous JavaScript and XML (AJAX), 15
Atomic, Consistent, Independent, and Durable (ACID) transactions, 9–10, 128
AWS. See Amazon Web Services (AWS)

B
Berkeley analytics data stack and spark
  AMPlab, 99
  BDAS, 100
  DAG, 101
  Hadoop, 99–100
  JDBC-compliant database, 101
  MapReduce, 99
  MLBase component, 100
  RDD, 101
  spark architecture, 101
  spark processing elements, 102
  spark SQL, 100
  spark streaming, 100
Big Data revolution
  cloud computing, 22
  competing definitions, 21
  industrial revolution, 22
  IoT, 22
  social networks and smartphones, 22
Binary JSON (BSON), 157
Blockchain, 212
Bloom filters, 161
Boolean bit logic, 214
B-tree index structure, 158–159
Business intelligence (BI) practices, 193

C
Cache-less architecture, 92
CAP theorem
  partition tolerance, 44
  RAC solution, 44
Cascading Style Sheets (CSS), 54
Cassandra, 211
  cluster node, 120
  consistent hashing, 120–121
  data model, 153, 155
  gossip, 119
  node adding, 122
  order-preserving partitioners, 124
  replicas, 124–125
  snitches, 126
  virtual nodes, 122–123
Cassandra consistency
  hinted handoff, 136–137
  LWT (see Lightweight transactions (LWT))
  read consistency, 135
  read repair, 136–137
  replication factor, 134
  timestamps and granularity, 137
  vector clocks, 138–140
  write consistency, 134–135
Cassandra Query Language (CQL), 218
  column structures, 175
  cqlsh program, 175
  JDBC, 177
CGI. See Common Gateway Interface (CGI)
Cloud computing, 15
Codd’s 13th rule (nonsubversion), 198
Columnar architecture and Column database architectures
  advantage, 77, 79
  aggregate operations, 78
  columnar and row-oriented storage, comparison, 77
  compression, 79
  disadvantage, 79
  data backed, 81
  data warehouse, 81
  delta store, 81
  insert, column store, 80
  IO and CPU optimizations, 78
  columnar technology, 84
  large-scale bulk sequential loads, 82
  oracle’s hybrid columnar compression scheme, 84
  projections
    database table, 83
    pre-join, 83
    superprojection, 82
  RLV, 81
  Tuple Mover, 81
  vertica, 81
  write optimization, 82
  write penalty, 79
  write-optimized delta store, 81
Column family structure, 151–152
Common Gateway Interface (CGI), 40
Consistency models
  ACID, 128, 130
  Cassandra (see Cassandra consistency)
  HBase, 132–133
  MongoDB, 131
  MVCC, 128, 130
  transactional consistency, 130
  transaction sequence number, 130
  two-phase-commit (2PC), 130
Copenhagen interpretation, 213
Couchbase, 61, 219
  queries, 202
  N1QL, 198
CouchDB, 61
CQL. See Cassandra Query Language (CQL)
CRDT. See Convergent replicated data types (CRDT)
Cryptocurrency, 212
CSS. See Cascading Style Sheets (CSS)
C-store, 81
Cypher graph query language, 199, 222

D
DAG. See Directed acyclic graph (DAG)
Database
  bewildering array, 215
  BI frameworks, 197
  blockchain, 212
  Cambrian explosion, 214
  Cloudera distribution of Hadoop, 201
  consistency models, 195–196
  convergent, 210
  Criticisms of next generation
    business intelligence, 193
    compromises, 193
    decision points, 194
    de-normalization, 193
    Edgar Codd’s key critiques, 193
    high-level logical model, 194
    IDMS and IMS, 193
    inconsistent, 193
    navigational model, 193
    nonrelational systems, 192–193
    RDBMS, 194
    unambiguous and nonredundant view, 194
  disruptive database technologies, 211
  Dynamo-style eventual consistency, 210
  graph compute engine, 202
  hybrid capabilities, 197
  in ACID RDBMS systems, 210
  incompatible technologies, 195
  JSON embedded in RDBMS, 201
  JSON via Oracle REST, 204–206
  languages, 198–199
  modern RDBMS, 214
  MongoDB users, 210
  Oracle Big Data SQL, 201
  Oracle graph, 207
  Oracle JSON support, 202–203
  Oracle sharding, 208–210
  Oracle’s RAC clustered, 210
  Oracle tables, 206–207
  ORDS, 201
  possible convergence, schema models, 198
  quantum computing, 213–214
  RDBMS incumbents, 195
  relational model, 197
  revolution
    competitive challenges, 192
    graph databases, 192
    Hadoop and Spark, 192
    Internet of Things (IoT), 191
    nonrelational operational databases, 192
    predominant drivers, 191
    relational model, 192
    SQL, 192
    transactions, 192
  sharded distributed database, 202
  storage, 199–201
  storage technologies, 211–212
  strict multi-record ACID transactions, 195
Database Management System (DBMS), 6
Database survey
  Aerospike, 217
  Cassandra, 218
  CouchBase, 219
  DB-Engines site, 217
  DynamoDB, 219
  HBase, 220
  MarkLogic, 221
  MongoDB, 221
  Neo4J, 222
  non-trivial score, 217
  NuoDB, 223
  Oracle RDBMS, 223
  Redis, 224
  “revolutionary”, 217
  Riak, 225
  SAP Hana, 225
  TimesTen, 226
  Vertica, 227
  VoltDB, 227
Data models
  BigTable and HBase, 145, 151–152
  Cassandra, 153–156
  document databases, 146
  graph databases, 146
  JSON, 156–157
  key-value, 145
  key-value stores
    conflict resolution, 148
    CRDT, 148–150
    data-type agnostic, 148
    Riak, 148, 150
    secondary indexes, 148
  relational, 145–147
Data warehousing schemas
  CPU and IO intensive, 76
  CRUD operations, 75
  OLAP, 75
  OLTP system, 75
  snowflake schema, 75
  star schemas, 75–76
Directed acyclic graph (DAG), 101, 198
  Apache Tez project, 181
  MapReduce paradigm, 181
Distributed relational databases
  client-server, 106
  mainframe, 105
  monolithic database server, 106
  MPP databases, 108
  RAC cluster database, 109
  replication
    approach, 107
    log-based, 107
    standby database, 107
    transaction log, 107
  shared disk, 107, 109
  shared-nothing, 108
  web architectures, 106
Document databases, 15
  JSON, 57
  nonrelational database, 53
  XML, 54–57
Durable distributed cache (DDC) architecture, 223

E
Early database systems
  definition, 4
  electronic computers, 5
  human civilization and technology, 4
  indexing methods, 5
  tabulating machines and punched cards, 5
EC2. See Elastic Compute Cloud (EC2)
EHCC. See Enhanced Hybrid Columnar Compression (EHCC)
Elastic Compute Cloud (EC2), 15
Enhanced Hybrid Columnar Compression (EHCC), 84
eXtensible Markup Language (XML)
  CSS, 54
  database architecture, 56
  relational systems, 57
  tools and standards, 54
  XQuery statement, 54

F
Facebook, 14
Fast projection index, 84
First database revolution
  data handling code, 6
  DBMS, 6
  network and hierarchical model, 6–7

G
Google
  hardware platform, 23
  MapReduce, 26–27
  modular data center, 24
  PageRank, 23
  software, 25
Google Cloud BigTable, 151
Google Modular Data Center, 24
Graph databases, 192
  definition, 66
  graph compute engines, 73
  Gremlin, 71, 73
  index-free adjacency, 73
  internal storage, 73
  Neo4j, 69–71
  property, 69–71
  RDBMS patterns, 67–68
  RDF, 68–69
  SPARQL, 68–69

H
Hadoop
  analytic processing, 28
  architecture, 29–30
  ecosystem, 37
  HBase, 32–34
  Hive, 34–35
  Nutch, 28
  open-source project, 28
  Pig, 36
Hadoop Distributed File System (HDFS), 29
HANA architecture, 95
HBase
  architecture, 115, 117
  caching and data locality, 117–118
  catalog tables, 116
  DataNode, 118
  Hadoop HDFS file system, 115
  HDFS, 32
  HDFS DataNodes, 115
  implementation, 115
  master node, 119
  master server, 116
  OpenTSDB, 118
  random access database services, 115
  real-time random access database, 115
  region replicas, 119
  RegionServer, 116, 118–119
  vs. relational model, 33
  rowkey ordering, 118
  short-circuit reads, 117
  tables, 116
  Zookeeper service, 116
HDFS. See Hadoop Distributed File System (HDFS)
Hierarchical model, 6–7
Hive
  architecture, 35
  Impala, 35
  SQL processing layer, 34
Hive Query Language (HQL), 34

I
IaaS. See Infrastructure as a Service (IaaS)
IBM, 7
Inconsistent, 193
Index Sequential Access Method (ISAM), 5
Infrastructure as a Service (IaaS), 15
In-memory databases
  alternative persistence model, 92
  Big Data phenomenon, 92
  Cache-less architecture, 92
  COMMIT operations, 92
  memory cost and capacity, 91
  Oracle 12c, 98–99
  Redis, 93–94
  SAP HANA, 95
  TimesTen, 92–93
  traditional database architecture, 92
  VoltDB, 97–98
Internet of Things (IoT), 191
ISAM. See Index Sequential Access Method (ISAM)

J, K
JavaScript object notation (JSON), 15, 156
  AJAX, 57, 58
  content-management systems, 57
  CouchBase, 61
  CouchDB, 61
  databases, 58–59
  document embedding, 60
  MemBase, 61
  MongoDB, 61
JSON. See JavaScript object notation (JSON)
JSON embedded in RDBMS, 201

L
Lightweight transactions (LWT)
  compare-and-set (CAS) pattern, 141
  lockless architecture, 140
  optimistic locking pattern, 142
  Paxos protocol, 142
  processing, 142–143
Log-based replication, 107
Log-structured merge (LSM) trees, 117
  architecture, 90
  bloom filters, 161
  Cassandra terminology, 160–161
  CommitLog, 160
  compaction, 162
  in-memory tree, 160
  on-disk trees, 160
  SSTables, 160–161
  Tombstones, 162
  WAL, 160
LSM. See Log-structured merge tree (LSM)

M
Magnetic disk device, 87
MarkLogic, 221
Massively parallel processing (MPP), 108
Membase
  Memcached technology, 61
  nonrelational system, 61
Memristors, 212
MemTable, 160
Mesos, 100
MongoDB, 221
  cluster balancing, 113
  JavaScript query and SQL, 173
  JSON, 61
  MySQL, 62
  replica set and primary failover, 113–114
  replication, 113
  sharding
    architecture, 110
    mechanisms, 111, 113
    range and hash, 112
    shard key, 111
    tag-aware, 113
  write concern and read preference, 115
MongoDB query, 192–193, 199, 202, 205, 208
Multi-level cell (MLC), 88
Multi-version concurrency control (MVCC)
  advantage, 130
  patterns, 193
  snapshot construction, 128

N
N1QL
  analytic systems, 185
  UNNEST command, 187
N1QL. See Non-first Normal Form Query Language (N1QL)
Neo4j, 222
  cypher and cypher query, 69–70, 71
Network model, 6–7
Network topology aware replication strategy, 124
NewSQL
  H-Store and C-Store, 16
  RDBMS, 16
Non-first Normal Form Query Language (N1QL), 61
Nonrelational distributed databases
  ACID compliance, 110
  balancing availability and consistency, 110
  consistent hashing model, 110
  hardware economics, 110
  omniscient master, 110
  traditional sharding architecture, 110
Nonrelational operational databases, 192
NoSQL
  APIs, 169
  cascading, 181
  CQL, 175–177
  DAG, 181
  HBase, 171–172
  MapReduce, 177–179
  MongoDB, 173–175
  Pig, 179–180
  Riak, 169–171
  Spark project, 181–182
NuoDB, 210, 223
Nutch, 28

O
Object Oriented Database Management System (OODBMS), 11, 13
Object-oriented programming (OOP)
  encapsulation, 12
  inheritance, 12
  RDBMS, 12
Object-Relational Mapping (ORM), 13
OLTP. See On-line Transaction Processing (OLTP)
Online Analytic Processing, 75
On-line Transaction Processing (OLTP), 5
OODBMS. See Object Oriented Database Management System (OODBMS)
OOP. See Object-oriented programming (OOP)
openCypher graph language, 207
Oplog, 113
Oracle Big Data Appliance, 201
Oracle Big Data Hadoop system, 207
Oracle database in-memory, 98–99
Oracle JSON support, 202–203
Oracle Parallel Server, 208
Oracle RDBMS, 223
Oracle Real Application Clusters (RAC), 208
Oracle REST Data Services (ORDS), 201
Oracle REST interface, 207
Oracle REST JSON query, 204
Oracle REST query, 206
Oracle sharding architecture, 208–209
Oracle TimesTen In-Memory Database (TimesTen), 226
ORM. See Object-Relational Mapping (ORM)
P
Pig Latin, 36
PropertyFileSnitch, 126

Q
Quantum query language (QQL), 214
Quantum search, 213
Quantum transactions, 213
QUEL, 10

R
RAC. See Real Application Clusters (RAC)
RackInferringSnitch, 126
RDD. See Resilient distributed datasets (RDD)
RDF. See Resource Description Framework (RDF)
Real Application Clusters (RAC), 43, 109
Redis. See Remote dictionary server (Redis)
Relational storage model
  B-tree index structure, 158–159
  Couchbase’s HB+-Trie, 160
  database architecture, 158
  index blocks, 159
  RDBMS architectural pattern, 158
  Tokutek’s fractal tree index, 160
Relational theory
  concepts, 8
  normalized and un-normalized data, 8–9
Remote dictionary server (Redis), 224
  AOF, 94
  architectural components, 94
  architecture, 94
  EMC, 93
  key-value store, 95
  key-value store architecture, 93
  memory database system, 95
  MongoDB, 95
  snapshot, 93
  virtual memory system, 94
Replica sets, 113–114
Replication factor, 124
Resilient distributed datasets (RDD), 101
Resource Description Framework (RDF), 68–69
Riak, 225
Row Level Versioned, 81

S
SAP Hana, 95–96, 225
SCN. See System change number (SCN)
Second database revolution
  client-server computing, 11
  IBM, 7
  INGRES, 10
  mainframe computer, 10
  OODBMS, 11–13
  OOP, 11–13
  QUEL, 10
  relational database model (see Relational theory)
  SQL/DS, 10
  SQL language, 10
  transaction models, 9–10
Secondary index
  B-tree indexes, 163
  DIY, 163
  global and local, 165
  implementations, 166
  nonrelational operational database systems, 163
Service-oriented architecture (SOA), 45
Set-based query language (SQL)
  advantages, 183
  ANSI and ISO standard, 168
  Apache Drill framework, 188–190
  Hive, 183–184
  Impala, 184
  N1QL, 185–187
  NoSQL, 190
  spark, 185
  types, 168
Shard chunk, 113
Sharding
  ACID transactions, 14
  drawbacks, 43
  Facebook, 14
  memcached/replication architecture, 42
Shared-disk database architecture, 109
Simple Oracle Data Access (SODA), 204
Simple Oracle Document Access (SODA), 201
Single-level cell (SLC), 88
SOA. See Service-oriented architecture (SOA)
SODA REST query, 205
Solid state disk (SSD)
  Aerospike, 91
  algorithms, 89
  battery-backed RAM device, 88
  DDR RAM, 88
  economics, 89–90
  enabled databases, 90
  NAND flash, 88
  performance characteristics, 89
  SLC and MLC, 88
  write amplification, 89
SPARQL Protocol, 69
Splice Machine layers, 210
SQL/DS, 10
SSTables, 160
Star schemas, 75–76, 147
Superprojection, 82
Sybase IQ, 81
System change number (SCN), 130

T, U
Tabulating machines and punched cards, 5
Tachyon, 100
Third database revolution
  cloud computing, 15
  document databases, 15
  Google, 14
  Hadoop, 14
  NewSQL, 16
TimesTen, 92–93, 226
Time to live (TTL), 152
Transaction models, ACID transaction, 9
Transactions, 192
Tunable consistency model, 201

V
Vertica, 81, 83, 227
VoltDB, 97–98, 227

W
Web 2.0
  CAP theorem, 43–44
  CGI-based approaches, 40
  e-commerce, 40
  eventual consistency, 45
  open-source solution
    memcached servers and replication, 41
    MySQL, 41
  scale-up solution, 40
  sharding, 41–43
Write-Ahead Log (WAL), 117, 160

X
XML. See eXtensible Markup Language (XML)

Y, Z
Yet Another Resource Negotiator (YARN)
  Application Manager, 30
  Resource Manager, 30