Unit5_Notes_Short_DB
Storage Systems
Things I know.
MongoDB
collections of similar documents
individual documents resemble complex objects or XML documents
documents are self-describing
can have different data elements
documents can be specified in various formats: XML, JSON
MongoDB supports CRUD operations
documents stored in binary JSON (BSON) format
individual documents stored in a collection
each document in a collection has a unique _id field; if the user does not supply one, MongoDB generates an ObjectId value for it
a collection does not have a schema
structure of the data fields in documents chosen based on how documents will be accessed
user can choose normalized or denormalized design
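A minimal sketch of the points above, using plain Python dicts and lists in place of a real MongoDB server. The `insert_one` helper, the `uuid`-based `_id`, and the project/worker data are assumptions made for illustration; they are not MongoDB's API or its ObjectId format.

```python
import json
import uuid

def insert_one(collection, document):
    """Store a copy of `document`, adding a unique _id if none was supplied."""
    doc = dict(document)
    doc.setdefault("_id", uuid.uuid4().hex)  # stand-in for a generated ObjectId
    collection.append(doc)
    return doc["_id"]

# Two documents in the same collection with different fields (no fixed schema).
projects = []
p1 = insert_one(projects, {"name": "Project A", "location": "Building 1"})
p2 = insert_one(projects, {
    "name": "Project B",
    # Denormalized design: the workers are embedded in the project document.
    "workers": [
        {"ename": "Worker 1", "hours": 32.5},
        {"ename": "Worker 2", "hours": 20.0},
    ],
})

# Normalized design: workers live in their own collection and
# reference the project through its _id.
workers = []
insert_one(workers, {"ename": "Worker 3", "project_id": p1, "hours": 10.0})

# Documents are self-describing: field names travel with the values.
print(json.dumps(projects[1], indent=2))
```

The embedded form suits reads that always fetch a project together with its workers; the referenced form avoids duplicating worker data when it is accessed on its own.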
replication
concept of replica set to create multiple copies on different nodes
variation of master-slave approach
a replica set will have one primary copy of a collection C stored in one node N1, and at least one secondary copy (replica) of C stored at another node N2
a replica set can contain a primary copy, secondary copies, and an arbiter
arbiter participates in elections to select new primary if needed
all write operations applied to the primary copy and propagated to the secondaries
user can choose read preference
read requests can be processed at the primary or at a secondary, depending on the read preference (the primary by default)
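The write and read paths of a replica set can be sketched as a toy in-memory model. The `ReplicaSet` class and its method names are hypothetical, elections and arbiters are left out, and propagation is done synchronously here for simplicity, whereas a real system may let secondaries lag behind the primary.

```python
import random

class ReplicaSet:
    """Toy model: one primary copy plus secondary copies of a collection."""

    def __init__(self, num_secondaries=2):
        self.primary = {}
        self.secondaries = [{} for _ in range(num_secondaries)]

    def write(self, key, document):
        # All writes are applied to the primary copy first...
        self.primary[key] = document
        # ...and then propagated to every secondary copy.
        for replica in self.secondaries:
            replica[key] = document

    def read(self, key, preference="primary"):
        # The read preference decides which copy serves the request.
        if preference == "primary":
            return self.primary.get(key)
        if preference == "secondary":
            return random.choice(self.secondaries).get(key)
        raise ValueError(f"unknown read preference: {preference}")

rs = ReplicaSet()
rs.write("d1", {"name": "Research"})
print(rs.read("d1", preference="secondary"))
```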
sharding
horizontal partitioning divides the documents into disjoint partitions (shards)
allows adding more nodes as needed
shards stored on different nodes to achieve load balancing
partitioning field (shard key) must exist in every document in the collection and must have an index
range partitioning
creates chunks by specifying a range of key values
works best with range queries
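A small sketch of range partitioning on a numeric shard key. The boundary values and the function names are assumptions chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical chunk boundaries on the shard key:
# chunk 0 holds keys below 100, chunk 1 holds [100, 200),
# chunk 2 holds [200, 300), chunk 3 holds keys of 300 and above.
BOUNDARIES = [100, 200, 300]

def chunk_for(key):
    """Return the index of the chunk whose range contains `key`."""
    return bisect_right(BOUNDARIES, key)

def chunks_for_range(low, high):
    """Return the chunks a query for low <= key <= high has to visit."""
    return list(range(chunk_for(low), chunk_for(high) + 1))

print(chunk_for(150))
print(chunks_for_range(150, 250))
```

Because neighbouring key values land in the same or adjacent chunks, a range query touches only the chunks that overlap its range rather than every shard.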
hash partitioning
partitioning based on the hash values of each shard key
a hash function h(K) is applied to each shard key K to determine the shard
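A minimal sketch of h(K). The shard count, the choice of MD5, and the final modulo step are assumptions made here for illustration and are not a description of how MongoDB assigns hashed keys to chunks.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of shards

def h(key):
    """Map a shard key K to a shard number in [0, NUM_SHARDS)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce it
    # to a shard number.
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Consecutive keys are scattered across shards instead of clustering.
for k in range(5):
    print(k, "-> shard", h(k))
```

Scattering spreads inserts evenly across shards, but a range query on the shard key can no longer be confined to a few neighbouring chunks.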