BCIS 5420 - Lecture Note - Ch. 6 - Big Data Technologies
Chapter 6
Ch.6. Big Data Technologies
Section. 1
• Big Data: data that exist in very large volumes and many
different data types and that need to be processed at a very
high velocity (e.g., Internet of Things)
• Hard to organize big data into relational structures, due to various data
types (variety), enormous size (volume), and growth rate (velocity)
• Hence, hard to ensure credibility (veracity) and usefulness of data (value)
Variety: data types and sources (e.g., unstructured data such as image, video, text)
Velocity: real (or near-real) time data transactions
Section. 2
NoSQL
source: aws.amazon.com
NoSQL Classifications
• Key-Value Stores
▪ A simple pair of a key and an associated collection of values
▪ Key is usually a string
▪ Database has no knowledge of the structure or meaning of the values
▪ Example: Redis
Key-value format:
Table_Name:Primary_Key_value:Attribute_Name = Value

STUDENT (relational table)
ID      FName   LName   Grade
ID100   John    Smith   90

STUDENT (key-value pairs)
STUDENT:ID100:FName = “John”,
STUDENT:ID100:LName = “Smith”,
STUDENT:ID100:Grade = 90
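The key format above can be sketched in a few lines of Python. This is only an illustration of the naming scheme, not how Redis or any other key-value store works internally; the function name row_to_key_value_pairs is made up for this note.

```python
def row_to_key_value_pairs(table_name, primary_key, row):
    """Flatten one relational row into Table:PrimaryKey:Attribute keys."""
    pairs = {}
    for attribute, value in row.items():
        if attribute == primary_key:
            continue  # the primary key value is already part of every key
        key = f"{table_name}:{row[primary_key]}:{attribute}"
        pairs[key] = value
    return pairs


# The STUDENT row from the slide
student_row = {"ID": "ID100", "FName": "John", "LName": "Smith", "Grade": 90}
student_pairs = row_to_key_value_pairs("STUDENT", "ID", student_row)

for key, value in student_pairs.items():
    print(key, "=", repr(value))
```

Each attribute becomes its own entry, so the store itself never needs to know that FName, LName, and Grade belong to the same student; that knowledge lives only in the key naming convention.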
• Wide-Column Stores
[Figure: wide-column store layout; surviving label: Column Family_2]
“Document” format:
{ Primary_Key: ID1, Attribute_1: Value_1, Attribute_2: Value_2, … }

STUDENT (relational table)
ID      FName   LName   Grade
ID100   John    Smith   90

STUDENT (document)
{_id: “ID100”, FName: “John”, LName: “Smith”, Grade: 90}
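As a rough illustration of the row-to-document mapping above, the following Python sketch turns the STUDENT row into a single document whose primary key is stored under _id. The helper name row_to_document is an assumption made for this note; it is not part of any database API.

```python
import json


def row_to_document(primary_key, row):
    """Build one document from a relational row, storing the key under _id."""
    document = {"_id": row[primary_key]}
    for attribute, value in row.items():
        if attribute != primary_key:
            document[attribute] = value
    return document


# The STUDENT row from the slide
student_row = {"ID": "ID100", "FName": "John", "LName": "Smith", "Grade": 90}
student_document = row_to_document("ID", student_row)

# Serialize to JSON text, the usual exchange format for document stores
student_json = json.dumps(student_document)
print(student_json)
```

Unlike the key-value layout, all attributes of one student travel together in one self-describing unit.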
[Figure: diagram relating CUSTOMER, ORDER, ORDERLINE, and PRODUCT]
Section. 2
MongoDB
• “Mongo” is short for “humongous”, meaning huge or gigantic
• A document-store database
• Collections
▪ Equivalent to tables in a relational database; each contains multiple documents
• Keys
▪ Equivalent to columns in a relational database
• Documents
▪ Equivalent to rows in a relational database
▪ Documents do not need to have the same structure
• Relationships
▪ The _id property serves as the “primary key”
▪ Another document can hold that _id value in one of its own JSON properties, which then acts as a “foreign key”
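The collection, document, and relationship ideas above can be sketched with plain Python lists and dictionaries. This is a stand-in for illustration only and does not use MongoDB. The advisor_id property, the second student, and the helper names find_by_id and advisor_of are hypothetical and were invented for this example.

```python
# A "collection" is modeled as a list; each "document" is a dict.
advisors = [
    {"_id": "Advisor1", "name": "Luke Shire", "dept": "Sales"},
]

students = [
    # advisor_id is a hypothetical "foreign key" holding an advisor's _id
    {"_id": "ID100", "FName": "John", "LName": "Smith", "advisor_id": "Advisor1"},
    # Hypothetical second document with a different structure (no advisor)
    {"_id": "ID200", "FName": "Jane"},
]


def find_by_id(collection, document_id):
    """Return the document whose _id matches, or None if there is none."""
    for document in collection:
        if document["_id"] == document_id:
            return document
    return None


def advisor_of(student):
    """Follow the student's advisor_id reference into the advisors collection."""
    advisor_id = student.get("advisor_id")
    if advisor_id is None:
        return None
    return find_by_id(advisors, advisor_id)


for student in students:
    print(student["_id"], "->", advisor_of(student))
```

Nothing enforces the reference here: a document may omit advisor_id entirely, which is the flexibility (and the risk) of documents not sharing one structure.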
• JSON Data
"Advisor":
[
{"_id": "Advisor1", "name": "Luke Shire", "dept": "Sales"},
{"_id": "Advisor2", "name": "Merry Longbottom", "dept": "Accounting"},
{"_id": "Advisor3", "name": "Jason Manor", "dept": "Marketing", "tell": "123-456-7890"}
]
Missing values may occur
• Relational Data (i.e., data anomalies)
ADVISOR
id         name               dept         tell
Advisor1   Luke Shire         Sales
Advisor2   Merry Longbottom   Accounting
Advisor3   Jason Manor        Marketing    123-456-7890
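To make the contrast between the two representations concrete, here is a minimal Python sketch that forces the advisor documents into one fixed set of columns. Properties a document does not have show up as None, the counterpart of an empty cell in the relational table. The function name documents_to_rows is made up for this note.

```python
# The advisor documents from the slide
advisors = [
    {"_id": "Advisor1", "name": "Luke Shire", "dept": "Sales"},
    {"_id": "Advisor2", "name": "Merry Longbottom", "dept": "Accounting"},
    {"_id": "Advisor3", "name": "Jason Manor", "dept": "Marketing",
     "tell": "123-456-7890"},
]


def documents_to_rows(documents):
    """Return (columns, rows), padding absent properties with None."""
    columns = []
    for document in documents:
        for key in document:
            if key not in columns:
                columns.append(key)  # keep first-seen order of property names
    rows = [[document.get(column) for column in columns]
            for document in documents]
    return columns, rows


columns, rows = documents_to_rows(advisors)
print(columns)
for row in rows:
    print(row)
```

In the document form the tell property is simply absent for two advisors; in the tabular form every row must carry the column, so those two rows hold a missing value.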
▪ db.collection.find()
• Types of Brackets
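The slide content for these two items did not survive, so the following is only a hedged sketch. It assumes that “Types of Brackets” refers to the three bracket kinds that appear in a query such as db.collection.find({dept: "Sales"}): parentheses for the method call, curly braces for a document, and square brackets for an array. The Python function below imitates a top-level equality filter on a list of dictionaries; it is not MongoDB's implementation and ignores operators, nested properties, and array matching.

```python
# Square brackets: an array (here, the collection's documents)
advisors = [
    # Curly braces: one document
    {"_id": "Advisor1", "name": "Luke Shire", "dept": "Sales"},
    {"_id": "Advisor2", "name": "Merry Longbottom", "dept": "Accounting"},
    {"_id": "Advisor3", "name": "Jason Manor", "dept": "Marketing",
     "tell": "123-456-7890"},
]


def find(collection, query=None):
    """Return documents whose properties equal every key/value in query.

    An empty or omitted query matches all documents.
    """
    query = query or {}
    return [
        document
        for document in collection
        if all(key in document and document[key] == value
               for key, value in query.items())
    ]


# Parentheses: the method call; the braces inside hold the query document
all_advisors = find(advisors)
sales_advisors = find(advisors, {"dept": "Sales"})
print(len(all_advisors), [a["_id"] for a in sales_advisors])
```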
References
• The major contents of this note are reproduced from the textbook of BCIS 5420:
Topi et al. Modern Database Management. 13th edition. Pearson, 2019
• Unless a specific source is cited, the photos and icons used in this
material come from the following sources of copyright-free images:
imagesource.com, iconfinder.com, and pexels.com