Module 4 Lecture Slides-1
Create and Insert
SQL Create and Use Database
Step 1: Understand the structure (column heading, data type)
Step 2: Create SQL script
• Create database
• Create table with the same structure observed from CSV file
• Load data from CSV file
Step 3: Run SQL script
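Step 1 can also be done in code. The sketch below assumes pandas is installed and reads a small in-memory sample instead of the real file. The column names follow the cookies.sales table used in this module, but the two data rows are invented placeholders, not the contents of Cookies Sample.csv.

```python
import io
import pandas as pd

# Hypothetical sample: column names follow the cookies.sales table,
# but the two data rows are invented for illustration only.
sample_csv = io.StringIO(
    "Sales_Date,Day_of_Week,Salesman,Temperature,Tweets,Price,Sales\n"
    "1/1/2020,Wednesday,Alice,70,5,3,20\n"
    "1/2/2020,Thursday,Bob,65,8,3,25\n"
)

preview = pd.read_csv(sample_csv)

# Column headings tell us what to name the SQL columns
print(list(preview.columns))
# Inferred types suggest varchar for text columns and INT for whole numbers
print(preview.dtypes)
```

With a real file, replace sample_csv with the path to the CSV file.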
SQL Load Data from CSV File
CREATE SCHEMA cookies;
CREATE TABLE cookies.sales
(Sales_Date varchar(10),
Day_of_Week varchar(10),
Salesman varchar(10),
Temperature INT,
Tweets INT,
Price INT,
Sales INT);
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/Cookies Sample.csv'
INTO TABLE cookies.sales
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
Python MySQL Load Data from CSV File
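import mysql.connector as sq
# Connect to the local MySQL server
mydb = sq.connect(host="localhost", user="root", passwd="ucla", buffered=True)
mycursor = mydb.cursor()
mycursor.execute('CREATE SCHEMA cookies')
# Create the table before loading; the columns mirror the SQL script
SQLCMD = "CREATE TABLE cookies.sales (Sales_Date varchar(10), Day_of_Week varchar(10), \
Salesman varchar(10), Temperature INT, Tweets INT, Price INT, Sales INT)"
mycursor.execute(SQLCMD)
# Load the CSV rows, skipping the header line
SQLCMD = "LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/Cookies Sample.csv' \
INTO TABLE cookies.sales FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' IGNORE 1 ROWS"
mycursor.execute(SQLCMD)
mydb.commit()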
Python Pandas DataFrame
A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
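A minimal sketch of what "columns of potentially different types" means in practice. The table and its values are hypothetical and exist only to show one type per column.

```python
import pandas as pd

# Hypothetical rows: each column holds a different type
people = pd.DataFrame({
    "name": ["Ann", "Ben", "Cy"],      # text
    "age": [31, 25, 47],               # integers
    "height_m": [1.62, 1.80, 1.75],    # floats
    "member": [True, False, True],     # booleans
})

print(people.shape)    # (rows, columns)
print(people.dtypes)   # one type per column
```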
Data Science: Python Pandas DataFrame
Data Science: Python Pandas DataFrame and Correlation
Data Science: Python Pandas DataFrame and Correlation Demo
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
grades = [90.5, 100.0, 75.8, 25.6]
studytime = [40, 50, 35, 10]
# Pair each grade with its study time, then build a two-column table (like a spreadsheet)
data = list(zip(grades, studytime))
df = pd.DataFrame(data, columns = ["Grades", "StudyTime"])
print(df)
# Correlation matrix of the columns (Pearson by default)
print(df.corr())
plt.scatter(studytime, grades)
plt.show()
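To see what the correlation number represents, the sketch below computes the Pearson coefficient by hand for the same grades and study times and compares it with the pandas result. The helper name pearson is an illustrative choice, not part of pandas.

```python
import math
import pandas as pd

grades = [90.5, 100.0, 75.8, 25.6]
studytime = [40, 50, 35, 10]

def pearson(x, y):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

r_manual = pearson(grades, studytime)

# Same quantity from pandas, for one pair of columns
df = pd.DataFrame({"Grades": grades, "StudyTime": studytime})
r_pandas = df["Grades"].corr(df["StudyTime"])

print(round(r_manual, 4), round(r_pandas, 4))
```

A value close to 1 means the two columns rise together; a value close to -1 means one falls as the other rises.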
Python Pandas DataFrame
import pandas as pd
MyClass = {'students':['Bruce', 'Jane', 'Nancy',
'Bill'],
'grades':[10, 9, 9, 8]}
df = pd.DataFrame(MyClass)
Python Pandas DataFrame
import pandas as pd
MyClass = {'students':['Bruce', 'Jane', 'Nancy',
'Bill'],
'grades':[10, 9, 9, 8]}
df = pd.DataFrame(MyClass, index=["ID1", "ID2", "ID3", "ID4"])
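Once rows carry custom labels, they can be looked up by label or by position. A minimal sketch using the same MyClass data; the variable names jane_row, jane_grade and first_student are illustrative.

```python
import pandas as pd

MyClass = {'students': ['Bruce', 'Jane', 'Nancy', 'Bill'],
           'grades': [10, 9, 9, 8]}
df = pd.DataFrame(MyClass, index=["ID1", "ID2", "ID3", "ID4"])

# .loc selects by row label; .iloc selects by integer position
jane_row = df.loc["ID2"]
jane_grade = df.loc["ID2", "grades"]
first_student = df.iloc[0]["students"]

print(jane_row)
print(jane_grade, first_student)
```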
Python Pandas DataFrame
import pandas as pd
MyClass = {'John':10, 'Jake':9, 'Jackie':8, 'Jack':7, 'Jane':6,
           'Jo':10, 'Ja':9, 'Jac':8, 'Jacky':7, 'Jan':6}
df = pd.DataFrame(MyClass, index=[1, 2, 3])
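When the dictionary values are single numbers rather than lists, pandas repeats each value for every index label. The sketch below uses a shortened, illustrative version of the dictionary (named scores here) to show the resulting shape and what happens when the index is left out.

```python
import pandas as pd

# Scalar values: pandas repeats each one for every index label
scores = {'John': 10, 'Jake': 9, 'Jackie': 8}
df = pd.DataFrame(scores, index=[1, 2, 3])

print(df.shape)   # rows come from the index, columns from the dict keys
print(df)

# Without an index, pandas cannot tell how many rows to build
try:
    pd.DataFrame(scores)
except ValueError as err:
    print("ValueError:", err)
```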
Python Pandas DataFrame
MyInventory = {
"Item": ["coffee", "chocolate", "tea",
"water"],
"Promotion": [False, False, True, False],
"Price": [5.95, 5.95, 3.95, 2.95],
"Stock": [100, 250, 1000, 1200]
}
ddf = pd.DataFrame(MyInventory)
ddf
Python Pandas DataFrame
Inv2 = {
"Item": ["coffee", "chocolate", "tea",
"water"],
"Promotion": ["no","no","yes","yes"],
"Price": [5.95, 5.95, 3.95, 2.95],
"Stock": [100, 250, 1000, 1200]
}
InvDF = pd.DataFrame(Inv2)
InvDF
InvDF = InvDF.replace({'Promotion': {'no': False, 'yes': True}})
Python Pandas DataFrame
InvDF[InvDF["Promotion"] == False]
InvDF[InvDF["Price"] < 5]
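The two filters above can be combined. This sketch rebuilds the inventory table with the Promotion column already stored as real booleans (an assumption made to keep the example short) and joins conditions with & and ~.

```python
import pandas as pd

InvDF = pd.DataFrame({
    "Item": ["coffee", "chocolate", "tea", "water"],
    "Promotion": [False, False, True, True],
    "Price": [5.95, 5.95, 3.95, 2.95],
    "Stock": [100, 250, 1000, 1200],
})

# Each condition yields a True/False Series; wrap each in parentheses
# and join with & (and) or | (or)
on_sale_and_cheap = InvDF[(InvDF["Promotion"]) & (InvDF["Price"] < 5)]
not_promoted = InvDF[~InvDF["Promotion"]]   # ~ negates a boolean column

print(on_sale_and_cheap)
print(not_promoted)
```

Python's plain and / or keywords do not work on whole columns, which is why the element-wise operators are used.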
Data Science: Python Pandas DataFrame from Excel
Pandas DataFrame Review
A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
Data Science: Python Pandas DataFrame from Excel
STEPS:
1. Import the pandas library
2. Create a variable for the data frame to store all columns and values from the Excel worksheet
3. Call pandas.read_excel("filename.xlsx") to read the file "filename.xlsx" and assign all columns and values to the data frame variable
Data Science: Python Pandas DataFrame from Excel
• You can access a specific column by referencing its column label
– df["Price"] returns the values of the entire column "Price"
– You can also use the attribute form df.Price
• The numbers 0, 1, 2, 3, 4, 5, 6, 7 in the leftmost column are the row index. You can access the value stored in a given row of column Price using df.Price[index]
– df.Price[0] returns the first element of column "Price", which is 50
– df.Price[6] returns the seventh element of column "Price", which is 80
– df.Sales[2] returns 11
– df.Quantity[4] returns 13.2
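The access forms in the bullets above can be tried on a small table built in memory. The values below are invented stand-ins, not the worksheet from the slide, and the .loc form is added here as an equivalent that names the row and the column together.

```python
import pandas as pd

# Hypothetical stand-in for the worksheet; values are invented
df = pd.DataFrame({
    "Price": [50, 60, 70],
    "Sales": [9, 10, 11],
})

price_column = df["Price"]       # whole column as a Series
same_column = df.Price           # attribute form, same result
first_price = df.Price[0]        # row label 0 of column Price
also_first = df.loc[0, "Price"]  # row label and column label together

print(price_column.tolist())
print(first_price, also_first)
```

The attribute form only works when the column name is a valid Python identifier, so a label containing a space needs the bracket form.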
Data Science: Python Pandas DataFrame from Excel Columns
pandas.read_excel("filename.xlsx")
What if we only want to read column "Price" into the dataframe variable?
STEPS:
1. Import the pandas library
2. Create a variable for the data frame to store the columns and values from the Excel worksheet
3. Call pandas.read_excel("filename.xlsx") to read the file "filename.xlsx" and assign its columns and values to the data frame variable. This time, pass an additional argument named usecols:
– pandas.read_excel("filename.xlsx", usecols=[0]) reads the first column only
– pandas.read_excel("filename.xlsx", usecols=[0, 1]) reads the first and second columns only
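Besides positions, usecols also accepts a list of column names, which can be easier to read when the goal is "only the Price column". The helper read_columns below is a hypothetical wrapper written for this sketch, and "filename.xlsx" is the placeholder name from the steps above.

```python
import pandas as pd

def read_columns(path, columns):
    """Read only the requested columns from an Excel worksheet.

    `columns` may be a list of positions such as [0, 1]
    or a list of column names such as ["Price"].
    """
    return pd.read_excel(path, usecols=columns)

# Example call (needs a real workbook on disk, so it is left commented out):
# price_df = read_columns("filename.xlsx", ["Price"])
```

Reading .xlsx files requires an Excel engine such as openpyxl to be installed alongside pandas.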
Data Science: Python Pandas DataFrame Saving to Excel File
Data Science: Python Pandas DataFrame from csv
STEPS:
1. Import the pandas library
2. Create a variable for the data frame to store all columns and values from the csv file
3. Call pandas.read_csv("filename.csv") to read the file "filename.csv" and assign all data to the data frame variable
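The steps above can be sketched as follows. To keep the example self-contained, the csv content is an in-memory string with invented rows; with a real file, pass the file name to pandas.read_csv instead.

```python
import io
import pandas as pd

# Stand-in for the contents of "filename.csv"; the rows are invented
csv_text = "Item,Price,Stock\ncoffee,5.95,100\ntea,3.95,1000\n"

# With a real file: df = pd.read_csv("filename.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows, to confirm the load worked
print(df.shape)    # (rows, columns)
```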
Data Science: Python Pandas DataFrame from csv Specific Column
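A minimal sketch of reading only specific columns from a csv file. pandas.read_csv takes the same usecols argument as pandas.read_excel, by name or by position. The csv text and variable names are invented for illustration.

```python
import io
import pandas as pd

# Stand-in for the contents of a csv file; the rows are invented
csv_text = "Item,Price,Stock\ncoffee,5.95,100\ntea,3.95,1000\n"

# Select by column name
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["Price"])

# Select by position: first and second columns
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1])

print(by_name)
print(by_position)
```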
Data Science: Python Pandas DataFrame Saving to a csv file
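A minimal sketch of saving a DataFrame to a csv file with DataFrame.to_csv and reading it back to check the result. The table contents and the file name inventory.csv are hypothetical, and a temporary folder is used so the example leaves no files behind.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"Item": ["coffee", "tea"], "Price": [5.95, 3.95]})

# Write to a temporary folder so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "inventory.csv")   # hypothetical file name
    df.to_csv(path, index=False)                   # index=False omits the row labels
    reloaded = pd.read_csv(path)

print(reloaded)
```

Leaving out index=False writes the row labels as an extra first column, which then shows up as an unnamed column when the file is read again.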