
Lecture 1 - Overview

Introduction to Data

Selvarajah Selvendra

IT 4005 - Advanced Database Systems 1


Welcome!
• Lecturer : Selvarajah Selvendra
• Email : [email protected]
• Lecture Hour : Tuesday (8:00 AM to 12:00 NOON)
• More info : https://www.linkedin.com/in/selvarajahselvendra/



Course Outline
• 40% of the class is about core DBMS concepts
• Query execution, query optimization, transactions, recovery, etc.
• Textbook material
• 60% of the class is on “what is happening today in data management”
• New developments on textbook material
• Data streams
• Web search – Google, Yahoo!
• Data integration (structured data + unstructured data)
• Data mining
• BIG Data
• BIG data Analytics
• Unsolved challenges

Grade Scheme
• Exam: 60%
• Individual Assessments: 20%
• Group Assessments: 20%



Group Project
• Project (due May 29th)
• One project: Group size <= 4 students
• Checkpoints
• Proposal: title and goal (due March 6th)
• Outline of approach (due March 27th)
• Implementation and Demo (May 16th and 23rd)
• Final Project Report (due May 29th)
• Each group will give a short presentation and demo (15-20 minutes)
• Each group will provide a 20-page document on the project; the responsibility and work of each student shall be described precisely



What’s Data?
1. Information in raw or unorganized form (such as alphabets, numbers, or symbols) that refer to, or represent, conditions, ideas, or objects. Data is limitless and present everywhere in the universe.

2. Computers: Symbols or signals that are input, stored, and processed by a computer, for output as usable information.

Read more: http://www.businessdictionary.com/definition/data.html



What’s Big Data?
There is no single definition; here is one from Wikipedia:
1. Big data is the term for a collection of data sets so large and complex that it becomes difficult to
process using on-hand database management tools or traditional data processing applications.

2. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and
visualization.

3. The trend to larger data sets is due to the additional information derivable from analysis of a
single large set of related data, as compared to separate smaller sets with the same total amount
of data, allowing correlations to be found to "spot business trends, determine quality of research,
prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic
conditions."

Data Scalars
1. Data Size (Volume) – Byte

2. Data Transfer (Velocity) – Byte/s

3. Complexity (Variety)

4. Uncertainty (Veracity)

Scale reference: zetta (Z): 10^21 = 1,000^7 (binary counterpart: 1,024^7 = 2^70)
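
The gap between the decimal and binary readings of the zetta scale can be checked with a few lines of integer arithmetic. This is only a sketch; the constant names are illustrative and do not come from the slides.

```python
# Decimal (SI) vs. binary interpretation of the "zetta" scale.
# Constant names are illustrative assumptions, not part of the lecture.
ZETTA_DECIMAL = 10 ** 21   # 1,000^7 bytes
ZETTA_BINARY = 2 ** 70     # 1,024^7 bytes (the IEC prefix for this is "zebi")

# Python integers are arbitrary precision, so these checks are exact.
assert ZETTA_DECIMAL == 1000 ** 7
assert ZETTA_BINARY == 1024 ** 7

ratio = ZETTA_BINARY / ZETTA_DECIMAL
print(f"binary/decimal ratio at the zetta scale: {ratio:.4f}")  # 1.1806
```

At this scale the two conventions differ by roughly 18%, so it matters which one a storage figure uses.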



Volume
• Data Volume
• 44x increase from 2009 to 2020
• From 0.8 zettabytes to 35 zettabytes
• Data volume is increasing exponentially
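
As a rough check on these figures, the sketch below multiplies the 2009 volume by the stated factor and derives the yearly growth rate that a 44x increase would imply. It assumes the growth compounds evenly over the 11 years, which the slide does not state.

```python
# Back-of-the-envelope check of the volume figures on this slide.
start_zb = 0.8            # zettabytes in 2009 (from the slide)
factor = 44               # overall growth factor (from the slide)
years = 2020 - 2009       # 11 years

end_zb = start_zb * factor
# Assumption: growth compounds at a constant annual rate.
annual_growth = factor ** (1 / years)

print(f"projected 2020 volume: {end_zb:.1f} ZB")
print(f"implied annual growth factor: {annual_growth:.2f}")
```

The product is about 35.2 ZB, consistent with the rounded 35 on the slide, and the implied growth is roughly 41% per year.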



Velocity (Speed)
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions ➔ missing opportunities
• Examples
• E-Promotions: Based on your current location, your purchase history, and what you like ➔ send promotions right now for the store next to you

• Healthcare monitoring: sensors monitoring your activities and body ➔ any abnormal measurements require immediate reaction
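
The healthcare example can be pictured as a filter applied to each reading as it arrives, instead of a batch job run later. The sketch below is purely illustrative: the heart-rate range, the `(timestamp, bpm)` record layout, and the function name are assumptions made for this example, not clinical guidance or part of the lecture.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical "normal" heart-rate range in beats per minute (assumption).
LOW_BPM, HIGH_BPM = 50, 120

def abnormal_readings(
    readings: Iterable[Tuple[int, int]]
) -> Iterator[Tuple[int, int]]:
    """Yield (timestamp, bpm) pairs outside the assumed normal range.

    Each reading is checked as soon as it is seen, so an alert can be
    raised without waiting for the rest of the data.
    """
    for timestamp, bpm in readings:
        if bpm < LOW_BPM or bpm > HIGH_BPM:
            yield timestamp, bpm

# Made-up sample data standing in for a live sensor feed.
sample = [(0, 72), (1, 75), (2, 134), (3, 70), (4, 41)]
alerts = list(abnormal_readings(sample))
print(alerts)  # [(2, 134), (4, 41)]
```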



Variety (Complexity)
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
• Social Network, Semantic Web (RDF), …

• Streaming Data
• You can only scan the data once

• A single application can be generating/collecting many types of data

• Big Public Data (online, weather, finance, etc.)
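
The constraint that streaming data can be scanned only once shapes the algorithms that can be used on it. One common illustration is Welford's method, which maintains the mean and variance incrementally without storing past values. The class below is a minimal sketch; its name and the sample values are assumptions for this example.

```python
class RunningStats:
    """One-pass mean and population variance (Welford's method)."""

    def __init__(self) -> None:
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / self.n if self.n else 0.0

stats = RunningStats()
# A generator stands in for the stream: each value is seen exactly once.
for value in (v for v in [2, 4, 4, 4, 5, 5, 7, 9]):
    stats.update(value)

print(stats.mean, stats.variance)
```

Memory use stays constant regardless of how long the stream runs, which is the property a one-scan setting requires.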



Distributed and Parallel Programming
• How do we assign work units to workers?
• What if we have more work units than workers?
• What if workers need to share partial results?
• How do we aggregate partial results?
• How do we know all the workers have finished?
• What if workers die?
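
Several of these questions can be seen in miniature with a worker pool on one machine. The sketch below uses Python's standard `concurrent.futures` module: the pool queues work units when there are more units than workers, `as_completed` reports when each unit finishes, and partial results are aggregated into a total. The function names, chunk size, and the "recompute locally on failure" policy are assumptions for illustration; a real distributed system faces these issues across machines and networks, and threads in CPython do not provide true CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def partial_sum(chunk):
    """Work unit: sum one slice of the input."""
    return sum(chunk)

def parallel_sum(data, n_workers=4, chunk_size=100):
    # Split the input into work units; there may be more units than workers,
    # in which case the pool queues the extras until a worker is free.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    total = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = {pool.submit(partial_sum, c): c for c in chunks}
        # as_completed yields each future once its work unit has finished;
        # the loop ends only when all units are done.
        for fut in as_completed(futures):
            try:
                total += fut.result()   # aggregate the partial result
            except Exception:
                # Stand-in for "a worker died": redo that unit locally.
                total += partial_sum(futures[fut])
    return total

result = parallel_sum(list(range(1000)))
print(result)  # 499500
```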



Cloud Computing
• IT resources provided as a service
• Compute, storage, databases, queues
• Clouds leverage economies of scale of commodity hardware
• Cheap storage, high bandwidth networks & multicore processors
• Geographically distributed data centers
• Offerings from Microsoft, Amazon, Google, …


Syllabus
• Introduction and overview
• Advanced SQL
• Stored Procedures and Triggers
• Query Optimization
• Concurrency and Recovery
• Database System Architectures
• Data Warehousing
• Graph Databases
• NoSQL Databases
• Object-Oriented and Object-Relational Databases
• XML and Databases