MSA8040 Lecture 1

Uploaded by

pratik.nikam1920
Copyright © All Rights Reserved

Introduction

MSA 8040: Data Management for Analytics


Saeid Motevali, [email protected]

CRN 92564: T 08/22/2023


Course information
Instructor: Saeid Motevali
Offices: Room 542 @ Buckhead Center
Room 1632 @ 55 Park Place
Office hours: Wednesday 11 am – 12:30 pm @ Buckhead / WebEx
Email: [email protected]
Location: Buckhead Room 542
Website: iCollege, https://icollege.gsu.edu/
It is highly recommended that you enable notifications for this course (and all your courses) so that you receive announcements, updates, etc.
Software: MySQL, MongoDB, Python (packages including Beautiful Soup, Selenium, etc.)
Teaching Assistants
Rida Fathima
Email: [email protected]
Office hours: Wednesday 4:30 – 6:30 pm

Swetha Siddhantam
Email: [email protected]
Office hours: Tuesday 11am – 1 pm

Office location: Institute for Insight lab (3rd floor)


Course Objective:

By the end of the semester, students will be able to:
• Understand databases
• Be familiar with relational database concepts
• Be proficient in manipulating data using SQL
• Understand structured and unstructured data
• Be familiar with MongoDB
• Extract, store, and query unstructured data
• Recall and discuss algorithms for analyzing unstructured data
• Be familiar with Python
• Apply unstructured data analytics techniques to solve real problems
Overview of the Course:

Data Management for Analytics: From Query to Analytics
Topic flow: Overview; ER / Design DB; SQL standard; Use of SQL in software / SQL; NoSQL / ETC

Section 1: Introduction
• Database concepts
• Design concepts
• ER model
• ER diagrams
• Normalization
• Relational model
Section 2: MySQL
• MySQL functions
• Operators
• SQL statement syntax
• Advanced SQL
• Procedure
• Trigger
Section 3: NoSQL
• NoSQL types
• Pros & cons
• MongoDB
• CRUD
• Aggregation
Section 4: Web Scraping
• Beautiful Soup
• Regular expressions
• Selenium
• Navigating
• Locating elements
• Twitter API
Section 5: Text Mining
• Unstructured data
• Textual data
• Topic modeling
• LDA
• Dynamic LDA
• Sentiment analysis
• Neural network
• SVM
• Decision tree

Work-based projects

08/24/2023 MSA8040-I4I 6
Why Study Databases?

[Figure: Computation / Information]

Files and Databases
File: A collection of records or documents
• Manual (paper) files
• Computer files
Database: A collection of similar records, along with their relationships
What Is a Database?
A shared, integrated computer structure that stores data
Components:
• End-user data: raw facts of interest to the end user
• Metadata: data about data, through which end-user data are integrated and managed
⎯ Describes data characteristics and relationships
⎯ Examples: the name of each data element, the type of values (numeric, dates, or text) stored in each data element, and whether the data element can be left empty
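To make the metadata examples concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL. This is an assumption made for portability; in MySQL, `INFORMATION_SCHEMA.COLUMNS` plays a similar role. The table and column names are hypothetical. SQLite's `PRAGMA table_info` reports the metadata listed above for each data element: its name, its declared type, and whether it may be left empty.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")

# The rows of this (hypothetical) table are end-user data;
# its definition is stored by the DBMS as metadata.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY NOT NULL,
        name        TEXT    NOT NULL,
        signup_date DATE,
        balance     NUMERIC
    )
""")

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
metadata = {
    name: {"type": col_type, "can_be_empty": not notnull}
    for _cid, name, col_type, notnull, _default, _pk
    in conn.execute("PRAGMA table_info(customer)")
}

for column, info in metadata.items():
    print(column, info)
```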
A database management system (DBMS) is a collection of programs that:
• Manage the database structure
• Control access to data stored in the database
Why Use a DBMS?
Minimal data redundancy
Data consistency, data integration, and data sharing
Ease of application development, reduced program maintenance
Uniform security, privacy, and integrity controls
Data accessibility and responsiveness
Data independence
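A view can illustrate data independence. The sketch below again uses SQLite through Python's `sqlite3` as a stand-in for MySQL, and all table names are hypothetical. One application query stays fixed while the underlying tables are reorganized; only the view definition changes. Views are one mechanism for logical data independence, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Version 1 of the (hypothetical) design: one table holds names and e-mails.
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO customer VALUES (1, 'Ada', 'ada@example.com');
    INSERT INTO customer VALUES (2, 'Grace', 'grace@example.com');
    CREATE VIEW customer_contact AS SELECT name, email FROM customer;
""")

# The application only ever issues this query, always against the view.
APP_QUERY = "SELECT name, email FROM customer_contact ORDER BY name"
before = conn.execute(APP_QUERY).fetchall()

# Version 2: the data are split into two tables and the view is redefined.
conn.executescript("""
    DROP VIEW customer_contact;
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_email (person_id INTEGER REFERENCES person(id), email TEXT);
    INSERT INTO person SELECT id, name FROM customer;
    INSERT INTO person_email SELECT id, email FROM customer;
    DROP TABLE customer;
    CREATE VIEW customer_contact AS
        SELECT p.name, e.email
        FROM person AS p JOIN person_email AS e ON e.person_id = p.id;
""")

# The unchanged application query still returns the same answer.
after = conn.execute(APP_QUERY).fetchall()
print(before == after)
```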
DBMS Functions
Data dictionary management
• Data dictionary: stores definitions of data elements and their relationships
Data storage management
• Performance tuning makes data storage and access more efficient
Data transformation and presentation
• Data is formatted to conform to logical expectations
Security management
• Enforces user security and data privacy
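Performance tuning in storage management often means adding an index. This minimal sketch uses SQLite (via Python's `sqlite3`) rather than MySQL, and the `orders` table and index name are hypothetical. It compares the query plan before and after the index is created. The exact wording of SQLite's plan text varies between versions, but it changes from a full scan to an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = 42"

def plan(sql):
    """Return the 'detail' column (last field) of SQLite's EXPLAIN QUERY PLAN rows."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(QUERY)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(QUERY)   # the index is now used to find matching rows
print(before)
print(after)
```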
DBMS Functions
Multiuser access control
• Sophisticated algorithms ensure that multiple users can access the database concurrently without compromising its integrity
Backup and recovery management
• Enables recovery of the database after a failure
Data integrity management
• Minimizes redundancy and maximizes consistency
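Integrity and recovery both depend on transactions that either complete fully or leave no trace. This minimal sketch uses SQLite through Python's `sqlite3` and a hypothetical `account` table with a `CHECK` constraint. It shows a failed transfer being rolled back. The example covers atomicity only. Real backup and crash recovery also rely on logs and backup copies, which are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move money between accounts as one atomic unit of work (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to dst was rolled back as well

ok = transfer("A", "B", 30)        # succeeds
failed = transfer("A", "B", 500)   # would overdraw A, so nothing changes
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```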
DBMS Functions
Database access languages and application programming interfaces
• Query language: lets the user specify what must be done without having to specify how
• Structured Query Language (SQL): de facto query language and data access standard supported by the majority of DBMS vendors
Database communication interfaces
• Accept end-user requests via multiple, different network environments
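The "what, not how" idea behind query languages shows up when the same totals are computed twice. The first version uses an explicit Python loop. The second uses a declarative SQL `GROUP BY` that leaves the execution strategy to the DBMS. SQLite via `sqlite3` stands in for MySQL here, and the sales data are made up for illustration.

```python
import sqlite3
from collections import defaultdict

sales = [("east", 120.0), ("west", 80.0), ("east", 30.0), ("north", 55.0), ("west", 20.0)]

# Procedural: we spell out HOW to accumulate a total per region.
totals_loop = defaultdict(float)
for region, amount in sales:
    totals_loop[region] += amount

# Declarative: SQL states WHAT result we want; the DBMS chooses how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
totals_sql = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))

print(dict(totals_loop) == totals_sql)
```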
Data Modeling
Creating a specific data model before building your
databases.

Data Models
A data model is a collection of concepts for describing data
A schema is a description of a particular collection of data, using a given data model
The relational model of data is the most widely used model today
• Main concept: relation, basically a table with rows and columns
• Every relation has a schema, which describes the columns, or fields
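A relation's schema and its rows can be shown side by side. This sketch uses SQLite through Python's `sqlite3` as a stand-in for MySQL, and the `student` table and its data are hypothetical. The `CREATE TABLE` statement is the schema, the inserted rows are the relation's contents, and `cursor.description` recovers the column (field) names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema of the relation: its name plus the name and type of each column (field).
conn.execute("""
    CREATE TABLE student (
        sid   TEXT PRIMARY KEY,
        name  TEXT NOT NULL,
        major TEXT,
        gpa   REAL
    )
""")

# Contents of the relation: rows, each holding one value per field.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [("s1", "Ada", "Analytics", 3.9), ("s2", "Alan", "CS", 3.7)],
)

cur = conn.execute("SELECT * FROM student ORDER BY sid")
fields = [col[0] for col in cur.description]  # column names taken from the schema
rows = cur.fetchall()
print(fields)
print(rows)
```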
Evolution of Data Models

NoSQL Databases
Usually very simple key/value search operations
May use distributed parallel processing (grid/cloud, e.g., MongoDB + Hadoop)
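Key/value access and distribution can be sketched together. The class below is a hypothetical plain-Python toy, not the API of MongoDB or Hadoop. Each key is hashed to pick one of several in-memory "nodes", and lookups are simple get/put operations by key. Real systems add replication, rebalancing (often via consistent hashing), and fault tolerance.

```python
import hashlib

class ShardedKeyValueStore:
    """Toy key/value store that spreads keys over several 'nodes' (plain dicts)."""

    def __init__(self, num_nodes=3):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # A stable hash of the key selects the node (simple modulo partitioning).
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

store = ShardedKeyValueStore(num_nodes=3)
store.put("user:1001", {"name": "Ada", "cart": ["book"]})
store.put("user:1002", {"name": "Alan", "cart": []})
print(store.get("user:1001"))
print([len(node) for node in store.nodes])
```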
Semantic Web "TripleStores" are one type
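A triple store keeps facts as (subject, predicate, object) triples and answers queries by pattern matching. This toy sketch uses a Python set and a hypothetical `match` helper. Production triple stores use RDF and the SPARQL query language instead.

```python
# Each fact is a (subject, predicate, object) triple.
triples = {
    ("MSA8040", "covers", "MySQL"),
    ("MSA8040", "covers", "MongoDB"),
    ("MongoDB", "isA", "NoSQL database"),
    ("MySQL", "isA", "relational DBMS"),
}

def match(s=None, p=None, o=None):
    """Return matching triples in sorted order; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )

# One-hop pattern: what does MSA8040 cover?
covered = [o for _, _, o in match(s="MSA8040", p="covers")]

# Two-hop pattern: which subjects cover something that is a NoSQL database?
nosql_courses = sorted(
    s for s, _, topic in match(p="covers")
    if match(s=topic, p="isA", o="NoSQL database")
)
print(covered, nosql_courses)
```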
Well-designed DBs Facilitate Data Management

Databases Make These Folks HAPPY …
End users and DBMS vendors
DB application programmers
• E.g., smart webmasters
Database administrator (DBA)
• Designs logical/physical schemas
• Handles security and authorization
• Ensures data availability and crash recovery
• Tunes the database as needs evolve
Must understand how a DBMS works!
Structure of a DBMS
A typical DBMS has a layered architecture. The figure does not show the concurrency control and recovery components.
[Figure: layered DBMS architecture, top to bottom: query optimization and execution; relational operators; files and access methods; buffer management; disk space management; DB. Annotation: these layers must consider concurrency control and recovery.]
This is one of several possible architectures; each system has its own variations.
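The buffer management layer can be made concrete with a toy buffer manager in plain Python. The sketch assumes a least-recently-used (LRU) replacement policy, which is one common choice among several. It also assumes a hypothetical `read_page_from_disk` function that stands in for the disk space management layer. Pinning, dirty-page write-back, and concurrency control are omitted, although a real buffer manager must handle all three.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: caches a fixed number of disk pages in memory (LRU eviction)."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk  # stand-in for the disk space manager
        self.frames = OrderedDict()  # page_id -> page contents, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # hit: mark as most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)   # miss with a full pool: evict the LRU page
        page = self.read_page_from_disk(page_id)
        self.disk_reads += 1
        self.frames[page_id] = page
        return page

# Hypothetical "disk": page i simply holds the string f"page-{i}".
pool = BufferPool(capacity=2, read_page_from_disk=lambda i: f"page-{i}")
for pid in [1, 2, 1, 3, 1, 2]:
    pool.get_page(pid)
print(pool.disk_reads, list(pool.frames))
```

With two frames, the access sequence 1, 2, 1, 3, 1, 2 causes four disk reads. The repeated hits on page 1 keep it resident while pages 2 and 3 are evicted in turn.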
Summary
A DBMS is used to maintain and query large datasets
Benefits include recovery from system crashes, concurrent access, quick application development, data integrity, and security
Levels of abstraction give data independence
A DBMS typically has a layered architecture
DBAs hold responsible jobs and are well paid
DBMS R&D is one of the broadest, most exciting areas in CS
