Module 4 Lecture Slides-1


SQL Data Definition Language

Create and Insert
SQL Create and Use Database

Let's create a database named “mystore”

Item_no  Item_name           Unit_Price  Inventory
2321     Dell Laptop         1500        56
5432     Seagate Drive       200         100
5674     Kingston USB Drive  70          500
8542     Backpack            100         45
Introduction to Data Science: SQL
-- Create database for mystore
CREATE SCHEMA mystore;

-- Create table named inventory in mystore
CREATE TABLE mystore.inventory (
    Item_no INT NOT NULL,
    Item_name VARCHAR(100) NOT NULL,
    Unit_Price INT NOT NULL,
    Inventory INT,
    PRIMARY KEY (Item_no)
);

-- Populate table with values/data
INSERT INTO mystore.inventory (Item_no, Item_name, Unit_Price, Inventory)
VALUES (2321, 'Dell Laptop', 1500, 56);
INSERT INTO mystore.inventory (Item_no, Item_name, Unit_Price, Inventory)
VALUES (5432, 'Seagate Drive', 200, 100);
INSERT INTO mystore.inventory (Item_no, Item_name, Unit_Price, Inventory)
VALUES (5674, 'Kingston USB Drive', 70, 500);
INSERT INTO mystore.inventory (Item_no, Item_name, Unit_Price, Inventory)
VALUES (8542, 'Backpack', 100, 45);
Introduction to Data Science: SQL DDL
-- Select database to use
USE mystore;

/* Once USE mystore is executed, we can drop
   the database name and the dot operator */
CREATE TABLE inventory (
    Item_no INT NOT NULL,
    Item_name VARCHAR(100) NOT NULL,
    Unit_Price INT NOT NULL,
    Inventory INT,
    PRIMARY KEY (Item_no)
);

-- Insert new product with an explicit NULL value
INSERT INTO inventory (Item_no, Item_name, Unit_Price, Inventory)
VALUES (2348, 'HP Laptop', 1000, NULL);

-- Insert new product, listing only the NOT NULL columns
INSERT INTO inventory (Item_no, Item_name, Unit_Price)
VALUES (7344, 'Lenovo Laptop', 988);
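The effect of the two inserts above can be checked without a MySQL server. The sketch below is a stand-in, not the lecture's setup: it assumes Python's built-in sqlite3 module with an in-memory database (SQLite has no CREATE SCHEMA or USE, so the table name is unqualified) and reuses the same inventory definition. The item 9999 'No Price' is a hypothetical row added only to trigger the NOT NULL check.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute(
    "CREATE TABLE inventory ("
    "Item_no INT NOT NULL, "
    "Item_name VARCHAR(100) NOT NULL, "
    "Unit_Price INT NOT NULL, "
    "Inventory INT, "
    "PRIMARY KEY (Item_no))"
)

# Explicit NULL for the nullable Inventory column.
cur.execute(
    "INSERT INTO inventory (Item_no, Item_name, Unit_Price, Inventory) "
    "VALUES (2348, 'HP Laptop', 1000, NULL)"
)
# Inventory left out of the column list; it is stored as NULL.
cur.execute(
    "INSERT INTO inventory (Item_no, Item_name, Unit_Price) "
    "VALUES (7344, 'Lenovo Laptop', 988)"
)
conn.commit()

rows = cur.execute(
    "SELECT Item_no, Item_name, Unit_Price, Inventory "
    "FROM inventory ORDER BY Item_no"
).fetchall()
for row in rows:
    print(row)  # SQL NULL comes back as Python None

# Leaving out a NOT NULL column (hypothetical item) is rejected by SQLite.
try:
    cur.execute("INSERT INTO inventory (Item_no, Item_name) VALUES (9999, 'No Price')")
    not_null_rejected = False
except sqlite3.IntegrityError:
    not_null_rejected = True
print(not_null_rejected)
```

Both rows end up with no Inventory value, whether NULL is written out or the column is simply omitted.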
SQL Load Data from CSV File

Step 1: Understand the structure (column heading, data type)

Step 2: Create SQL script
• Create database
• Create table with the same structure observed from CSV file
• Load data from CSV file

Step 3: Run SQL script
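Step 1 can be sketched in Python before any SQL is written. This is only an illustration under stated assumptions: the column headings are borrowed from the cookies.sales table used in this section, the two data rows are made-up placeholders rather than the real file contents, and guess_sql_type is a hypothetical helper that makes a very rough type guess.

```python
import csv
import io

# Made-up sample standing in for the real CSV file (placeholder values).
sample = io.StringIO(
    "Sales_Date,Day_of_Week,Salesman,Temperature,Tweets,Price,Sales\n"
    "1/1/2020,Wednesday,Alice,70,12,2,100\n"
    "1/2/2020,Thursday,Bob,65,8,3,90\n"
)

reader = csv.reader(sample)
header = next(reader)      # column headings
first_row = next(reader)   # one data row to judge the types

def guess_sql_type(value):
    """Rough guess: INT if the text parses as an integer, else VARCHAR(10)."""
    try:
        int(value)
        return "INT"
    except ValueError:
        return "VARCHAR(10)"

# Pair each heading with the guessed type of its first value.
columns = [(name, guess_sql_type(value)) for name, value in zip(header, first_row)]
for name, sql_type in columns:
    print(name, sql_type)
```

A guess based on a single row is only a starting point; a column that looks like INT in the first row may hold decimals further down, so it is worth scanning more of the file before fixing the table definition.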
SQL Load Data from CSV File
CREATE SCHEMA cookies;

CREATE TABLE cookies.sales
(Sales_Date varchar(10),
Day_of_Week varchar(10),
Salesman varchar(10),
Temperature INT,
Tweets INT,
Price INT,
Sales INT);
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/Cookies Sample.csv'
INTO TABLE cookies.sales
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
Python MySQL Load Data from CSV File
import mysql.connector as sq

# Connect to the local MySQL server
mydb = sq.connect(host="localhost", user="root", passwd="ucla", buffered=True)

mycursor = mydb.cursor()
mycursor.execute("CREATE SCHEMA cookies")

# Adjacent string literals inside parentheses are joined into one statement
SQLCMD = (
    "CREATE TABLE cookies.sales (Sales_Date varchar(10), "
    "Day_of_Week varchar(10), Salesman varchar(10), Temperature INT, "
    "Tweets INT, Price FLOAT, Sales INT)"
)
mycursor.execute(SQLCMD)

# '\\n' sends the two characters \n to MySQL, which reads them as a newline
SQLCMD = (
    "LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/Cookies Sample.csv' "
    "INTO TABLE cookies.sales FIELDS TERMINATED BY ',' "
    "LINES TERMINATED BY '\\n' IGNORE 1 ROWS"
)
mycursor.execute(SQLCMD)
mydb.commit()
Python Pandas DataFrame

A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
Data Science: Python Pandas DataFrame
Data Science: Python Pandas DataFrame and Correlation
Data Science: Python Pandas DataFrame and Correlation Demo
Python Pandas DataFrame
import pandas as pd 
from matplotlib import pyplot as plt
%matplotlib inline

grades = [90.5, 100.0, 75.8, 25.6]
studytime = [40, 50, 35, 10]

# Pair each grade with its study time so every tuple becomes one spreadsheet-like row
data = list(zip(grades, studytime))

df = pd.DataFrame(data, columns = ["Grades", "StudyTime"]) 

print (df)

# Use Pandas DataFrame Correlation 
print (df.corr())
plt.scatter(studytime, grades)
plt.show()
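To see what df.corr() reports for this data, the coefficient can be recomputed by hand. The sketch below assumes the default method of DataFrame.corr, which is the Pearson correlation; pearson is a helper written for this illustration and is not part of pandas.

```python
import math
import pandas as pd

grades = [90.5, 100.0, 75.8, 25.6]
studytime = [40, 50, 35, 10]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

manual = pearson(grades, studytime)

df = pd.DataFrame({"Grades": grades, "StudyTime": studytime})
from_pandas = df.corr().loc["Grades", "StudyTime"]

print(manual, from_pandas)
```

The two numbers should agree up to floating-point rounding, and a value close to 1 matches the upward trend in the scatter plot.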
Python Pandas DataFrame

import pandas as pd

MyClass = {'students':['Bruce', 'Jane', 'Nancy', 
'Bill'],
'grades':[10, 9, 9, 8]}

df = pd.DataFrame(MyClass)
Python Pandas DataFrame
import pandas as pd

MyClass = {'students':['Bruce', 'Jane', 'Nancy', 
'Bill'],
'grades':[10, 9, 9, 8]}

df = pd.DataFrame(MyClass, index=["ID1", "ID2", "ID3", "ID4"])
Python Pandas DataFrame
import pandas as pd

MyClass = {'John': 10, 'Jake': 9, 'Jackie': 8, 'Jack': 7, 'Jane': 6,
           'Jo': 10, 'Ja': 9, 'Jac': 8, 'Jacky': 7, 'Jan': 6}

df = pd.DataFrame(MyClass, index=[1, 2, 3])
Python Pandas DataFrame
MyInventory = {
"Item": ["coffee", "chocolate", "tea", 
"water"],
"Promotion": [False, False, True, False],
"Price": [5.95, 5.95, 3.95, 2.95],
"Stock": [100, 250, 1000, 1200]
}

ddf = pd.DataFrame(MyInventory)
ddf
Python Pandas DataFrame
Inv2 = {
"Item": ["coffee", "chocolate", "tea", 
"water"],
"Promotion": ["no","no","yes","yes"],
"Price": [5.95, 5.95, 3.95, 2.95],
"Stock": [100, 250, 1000, 1200]
}

InvDF = pd.DataFrame(Inv2)
InvDF
Python Pandas DataFrame
Inv2 = {
"Item": ["coffee", "chocolate", "tea", 
"water"],
"Promotion": ["no","no","yes","yes"],
"Price": [5.95, 5.95, 3.95, 2.95],
"Stock": [100, 250, 1000, 1200]
}

InvDF = pd.DataFrame(Inv2)
InvDF

InvDF = InvDF.replace({'Promotion': {'no': 
False,'yes': True}})
Python Pandas DataFrame
InvDF[InvDF["Promotion"] == False]

InvDF[InvDF["Price"] < 5]
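The two filters above rely on boolean masks. As a minimal sketch, the block below rebuilds InvDF with Promotion already converted to booleans and shows the mask on its own, then combines two conditions; the names mask, cheap and cheap_promo are introduced here for illustration.

```python
import pandas as pd

InvDF = pd.DataFrame({
    "Item": ["coffee", "chocolate", "tea", "water"],
    "Promotion": [False, False, True, True],
    "Price": [5.95, 5.95, 3.95, 2.95],
    "Stock": [100, 250, 1000, 1200],
})

# The comparison itself yields a boolean Series, one value per row.
mask = InvDF["Price"] < 5
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True.
cheap = InvDF[mask]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
cheap_promo = InvDF[(InvDF["Promotion"] == True) & (InvDF["Price"] < 5)]
print(cheap_promo["Item"].tolist())
```

Python's and / or keywords do not work on a Series, which is why the element-wise operators & and | are used here.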
Data Science: Python Pandas DataFrame from Excel
Pandas DataFrame Review
A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
Data Science: Python Pandas DataFrame from Excel

STEPS:
1. Import the pandas library.
2. Create a variable for the data frame to store all columns and values from the Excel worksheet.
3. Use the pandas API pandas.read_excel("filename.xlsx") to read the file "filename.xlsx" and assign all columns and values to the data frame variable.
Data Science: Python Pandas DataFrame from Excel
• You can access a specific column by referencing its column label.
  – df["Price"] returns the values of the entire column "Price".
  – You can also use the form df.Price.
• The numbers 0, 1, 2, 3, 4, 5, 6, 7 in the leftmost column are the index for the rows. You can access the value stored in a given row of column Price using df.Price[index].
  – df.Price[0] returns the first element of column "Price", which is 50.
  – df.Price[6] returns the seventh element of column "Price", which is 80.
  – df.Sales[2] returns 11.
  – df.Quantity[4] returns 13.2.
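The lookups above can be tried on a small stand-in for the worksheet. In this sketch only the four values quoted in the bullets (50, 80, 11 and 13.2) come from the slides; every other number is a made-up placeholder so that the frame has rows 0 to 7.

```python
import pandas as pd

# Stand-in for the worksheet: only Price[0], Price[6], Sales[2] and
# Quantity[4] match the slides; the rest are placeholder values.
df = pd.DataFrame({
    "Price":    [50, 55, 60, 65, 70, 75, 80, 85],
    "Sales":    [10, 12, 11, 9, 8, 7, 6, 5],
    "Quantity": [10.0, 11.5, 12.0, 12.8, 13.2, 14.0, 14.5, 15.0],
})

price_column = df["Price"]   # whole column as a Series
same_column = df.Price       # attribute form, same data

print(df.Price[0])
print(df.Price[6])
print(df.Sales[2])
print(df.Quantity[4])

# df.loc[row_label, column_label] does the same lookup in one step.
print(df.loc[6, "Price"])
```

The attribute form df.Price only works when the column label is a valid Python name and does not clash with a DataFrame method, so df["Price"] is the safer habit.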
Data Science: Python Pandas DataFrame from Excel Columns

pandas.read_excel("filename.xlsx")
Data Science: Python Pandas DataFrame from Excel

What if we only want to read column "Price" into the data frame variable?
Data Science: Python Pandas DataFrame from Excel

STEPS:
1. Import the pandas library.
2. Create a variable for the data frame to store the selected columns and values from the Excel worksheet.
3. Use the pandas API pandas.read_excel("filename.xlsx") to read the file "filename.xlsx" and assign the columns and values to the data frame variable.
   1. This time we will use an additional argument named usecols.
   2. pandas.read_excel("filename.xlsx", usecols=[0]) reads the first column only.
   3. pandas.read_excel("filename.xlsx", usecols=[0, 1]) reads the first and second columns only.
Data Science: Python Pandas DataFrame Saving to Excel File
Data Science: Python Pandas DataFrame from csv

STEPS:
1. Import the pandas library.
2. Create a variable for the data frame to store all columns and values from the csv file.
3. Use the pandas API pandas.read_csv("filename.csv") to read the file "filename.csv" and assign all data to the data frame variable.
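The steps above can be sketched as follows. To keep the example self-contained, io.StringIO stands in for a file on disk (an assumption), and the three columns and two rows are illustrative values reused from the inventory examples earlier in the deck.

```python
import io
import pandas as pd

# io.StringIO stands in for a file on disk; with a real file, pass its
# path instead, e.g. pd.read_csv("filename.csv").
csv_text = io.StringIO(
    "Item,Price,Stock\n"
    "coffee,5.95,100\n"
    "tea,3.95,1000\n"
)

df = pd.read_csv(csv_text)   # first line becomes the column labels
print(df)
print(df.dtypes)             # a type is inferred for each column
```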
Data Science: Python Pandas DataFrame from csv

Specific Column
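pandas.read_csv accepts the same usecols argument that was used with read_excel, so a specific column can be selected while reading. A minimal sketch, again assuming an in-memory io.StringIO in place of a real file and illustrative data:

```python
import io
import pandas as pd

csv_data = "Item,Price,Stock\ncoffee,5.95,100\ntea,3.95,1000\n"

# Select by column label ...
by_name = pd.read_csv(io.StringIO(csv_data), usecols=["Price"])
print(by_name)

# ... or by position (0 is the first column).
by_position = pd.read_csv(io.StringIO(csv_data), usecols=[0, 1])
print(by_position)
```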
Data Science: Python Pandas DataFrame Saving to a csv file
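Saving goes the other way with DataFrame.to_csv. The sketch below writes to an in-memory buffer (an assumption made so the example leaves no file behind); passing a path such as "filename.csv" instead would write to disk. The data is illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({"Item": ["coffee", "tea"], "Price": [5.95, 3.95]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False leaves out the row index column
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the same columns and values.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

Without index=False, the row index is written as an extra unnamed first column, which then shows up as "Unnamed: 0" when the file is read back.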
