Lecture 1- Introduction to Big Data
https://xkcd.com/605/
WORKING WITH STRUCTURED DATA
Working with DBMS
• DBMSs have been with us for a long time (the first DBMS
was developed in the 1960s)
• Using Structured Query Language (SQL) is a common
and useful way to analyze/manipulate data
• There are excellent open source DBMS that can
be easily installed and used
• SQL is also useful for running queries on
Hadoop, Spark, and BigQuery
Data Science and Databases
From my personal experience:
When to use databases:
• Working with structured/tabular data
• Working with relatively small datasets (up to several million
rows)
• Doing relatively simple analytics
• Needing to work with many subsets of the datasets
When not to use databases:
• Working with unstructured data
• Working with data that contains dictionary/list structures
• Working with relatively large datasets (several hundred
million rows)
• Doing complex analytics
SQL - A Very Quick Review
Select <Col_1>, <Col_2>, …, <Col_N>
From <Table_1>, <Table_2>, …, <Table_N>
Where <RowCondition>
Order by <Col_i>
Links
User1  User2
1      2
2      3
2      1
3      1
4      1

Users
Userid  FirstName  LastName  JoinYear  GroupNumber
1       Jhon       Smith     2018      1
2       Marry      Perry     2019      1
3       William    Brown     2018      2
4       Daniel     Miler     2017      2
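As a quick illustration, the Select / From / Where / Order by template can be filled in against the Users table above. This is only a minimal sketch using Python's sqlite3 module with an in-memory database. The column types and the specific condition (JoinYear >= 2018) are assumptions chosen for the example, not part of the slides.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Column types are an assumption; the slide only shows the column names.
conn.execute(
    "CREATE TABLE Users (Userid INTEGER PRIMARY KEY, FirstName TEXT, "
    "LastName TEXT, JoinYear INTEGER, GroupNumber INTEGER)"
)

# Rows copied from the Users table on the slide (spelling kept as shown).
users = [
    (1, "Jhon", "Smith", 2018, 1),
    (2, "Marry", "Perry", 2019, 1),
    (3, "William", "Brown", 2018, 2),
    (4, "Daniel", "Miler", 2017, 2),
]
conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?, ?)", users)

# The template, filled in: pick columns, one table, a row condition, a sort key.
query = """
    SELECT FirstName, LastName, JoinYear
    FROM Users
    WHERE JoinYear >= 2018
    ORDER BY LastName
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

The Where clause filters rows before Order by sorts them, so Daniel Miler (2017) does not appear in the result.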
SQL Joins
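To make joins concrete, here is a small sketch that joins the Links and Users tables from the earlier slide, again with sqlite3 and an in-memory database. The two queries (an inner join that resolves both ends of each link to a first name, and a left join that counts incoming links per user) are illustrative choices, not queries taken from the course material. Column types are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recreate the two example tables; column types are an assumption.
conn.executescript("""
    CREATE TABLE Users (Userid INTEGER PRIMARY KEY, FirstName TEXT,
                        LastName TEXT, JoinYear INTEGER, GroupNumber INTEGER);
    CREATE TABLE Links (User1 INTEGER, User2 INTEGER);
    INSERT INTO Users VALUES
        (1, 'Jhon', 'Smith', 2018, 1),
        (2, 'Marry', 'Perry', 2019, 1),
        (3, 'William', 'Brown', 2018, 2),
        (4, 'Daniel', 'Miler', 2017, 2);
    INSERT INTO Links VALUES (1, 2), (2, 3), (2, 1), (3, 1), (4, 1);
""")

# Inner join: replace both user ids in each link with first names.
named_links = conn.execute("""
    SELECT u1.FirstName, u2.FirstName
    FROM Links
    JOIN Users AS u1 ON Links.User1 = u1.Userid
    JOIN Users AS u2 ON Links.User2 = u2.Userid
    ORDER BY Links.User1, Links.User2
""").fetchall()

# Left join: keep every user, even one that nobody links to.
incoming_counts = conn.execute("""
    SELECT u.FirstName, COUNT(l.User1)
    FROM Users AS u
    LEFT JOIN Links AS l ON l.User2 = u.Userid
    GROUP BY u.Userid
    ORDER BY u.Userid
""").fetchall()

print(named_links)
print(incoming_counts)
conn.close()
```

With an inner join, a user with no matching link row would drop out of the result. The left join keeps that user and reports a count of 0.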
SQLite
In this course, we will be working with SQLite
Useful Links:
• SQLite.org
• DB Browser for SQLite
• sqlite3 module
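A minimal sketch of the sqlite3 module workflow with a database file: open a connection, write inside a transaction, close, then reopen and read. The file name course_demo.db and the Notes table are hypothetical, made up for this example. A file created this way can also be opened in DB Browser for SQLite.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file in a temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "course_demo.db")

conn = sqlite3.connect(db_path)  # creates the file if it does not exist
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("CREATE TABLE Notes (id INTEGER PRIMARY KEY, body TEXT)")
    # Use ? placeholders instead of formatting values into the SQL string.
    conn.execute("INSERT INTO Notes (body) VALUES (?)", ("hello sqlite",))
conn.close()

# Reopen the file to confirm the data was persisted.
conn = sqlite3.connect(db_path)
stored = conn.execute("SELECT id, body FROM Notes").fetchall()
conn.close()
print(stored)
```

Note that `with conn:` manages the transaction only; it does not close the connection, which is why `conn.close()` is called explicitly.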
WORKING WITH REAL WORLD DATA
Importing Dataset from CSV
Example 1: Baby Names
There is an open dataset containing the names
- SQLZOO
- SQL Murder Mystery
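For the CSV import step mentioned above, one possible approach is to read the file with Python's csv module and load the rows into SQLite with executemany. The sketch below uses a few made-up rows held in a string instead of a real file. The table name BabyNames and the columns name, year, gender, and births are assumptions for illustration and may not match the actual dataset used in the course.

```python
import csv
import io
import sqlite3

# Made-up sample rows standing in for a real CSV file (NOT real data).
csv_text = """name,year,gender,births
Emma,2018,F,120
Liam,2018,M,115
Emma,2019,F,110
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BabyNames (name TEXT, year INTEGER, gender TEXT, births INTEGER)"
)

# For a real file, replace io.StringIO(csv_text) with open(path, newline="").
reader = csv.DictReader(io.StringIO(csv_text))
records = [
    (r["name"], int(r["year"]), r["gender"], int(r["births"])) for r in reader
]
conn.executemany("INSERT INTO BabyNames VALUES (?, ?, ?, ?)", records)
conn.commit()

# Once imported, the data can be queried like any other table.
totals = conn.execute(
    "SELECT name, SUM(births) FROM BabyNames GROUP BY name ORDER BY name"
).fetchall()
print(totals)
conn.close()
```

Converting the numeric fields with int() before inserting keeps year and births as integers, so aggregates such as SUM behave as expected.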
Let’s move on to reviewing the
course’s first notebook