Refactor Wine Quality - Ipynb

This notebook contains code that analyzes a wine quality dataset. The code first renames the columns, replacing spaces with underscores. It then checks how individual features relate to the quality rating by splitting each feature at its median and computing the mean quality for the rows at or above the median and for the rows below it. The notebook asks the reader to refactor the code so that it is cleaner, more modular, and more efficient.
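
The renaming step described above can be sketched in a single pass over the column labels. This is only one possible approach, not necessarily the intended solution to the exercise. The small frame below is made-up illustration data that borrows a few of the dataset's column labels; it is not read from winequality-red.csv.

```python
import pandas as pd

# Toy frame using a few of the dataset's column labels. The values are
# made up for illustration and are not taken from winequality-red.csv.
df = pd.DataFrame({
    'fixed acidity': [7.4, 7.8],
    'residual sugar': [1.9, 2.6],
    'pH': [3.51, 3.20],
    'quality': [5, 5],
})

# Replace spaces with underscores in every label in one pass, instead of
# listing each column by hand or by index.
df.columns = [label.replace(' ', '_') for label in df.columns]

print(list(df.columns))
```

Labels without spaces, such as pH, pass through unchanged, so there is no need to skip them explicitly.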

Uploaded by

Amal Abdallah

{

"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Refactor: Wine Quality Analysis\n",
"In this exercise, you'll refactor code that analyzes a wine quality dataset taken from the UCI Machine Learning Repository [here](https://archive.ics.uci.edu/ml/datasets/wine+quality). Each row contains data on a wine sample, including several physicochemical properties gathered from tests, as well as a quality rating evaluated by wine experts.\n",
"\n",
"The code in this notebook first renames the columns of the dataset and then calculates some statistics on how some features may be related to quality ratings. Can you refactor this code to make it cleaner and more modular?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"df = pd.read_csv('winequality-red.csv', sep=';')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Renaming Columns\n",
"You want to replace the spaces in the column labels with underscores to be able to reference columns with dot notation. Here's one way you could've done it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"new_df = df.rename(columns={'fixed acidity': 'fixed_acidity',\n",
"                            'volatile acidity': 'volatile_acidity',\n",
"                            'citric acid': 'citric_acid',\n",
"                            'residual sugar': 'residual_sugar',\n",
"                            'free sulfur dioxide': 'free_sulfur_dioxide',\n",
"                            'total sulfur dioxide': 'total_sulfur_dioxide'\n",
"                            })\n",
"new_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And here's a slightly better way you could do it. You can avoid making naming errors due to typos caused by manual typing. However, this looks a little repetitive. Can you make it better?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"labels = list(df.columns)\n",
"labels[0] = labels[0].replace(' ', '_')\n",
"labels[1] = labels[1].replace(' ', '_')\n",
"labels[2] = labels[2].replace(' ', '_')\n",
"labels[3] = labels[3].replace(' ', '_')\n",
"labels[5] = labels[5].replace(' ', '_')\n",
"labels[6] = labels[6].replace(' ', '_')\n",
"df.columns = labels\n",
"\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Analyzing Features\n",
"Now that your columns are ready, you want to see how different features of this dataset relate to the quality rating of the wine. A very simple way you could do this is by observing the mean quality rating for the top and bottom half of each feature. The code below does this for four features. It looks pretty repetitive right now. Can you make this more concise?\n",
"\n",
"You might challenge yourself to figure out how to make this code more efficient! But you don't need to worry too much about efficiency right now - we will cover that more in the next section."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"median_alcohol = df.alcohol.median()\n",
"for i, alcohol in enumerate(df.alcohol):\n",
"    if alcohol >= median_alcohol:\n",
"        df.loc[i, 'alcohol'] = 'high'\n",
"    else:\n",
"        df.loc[i, 'alcohol'] = 'low'\n",
"df.groupby('alcohol').quality.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"median_pH = df.pH.median()\n",
"for i, pH in enumerate(df.pH):\n",
"    if pH >= median_pH:\n",
"        df.loc[i, 'pH'] = 'high'\n",
"    else:\n",
"        df.loc[i, 'pH'] = 'low'\n",
"df.groupby('pH').quality.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"median_sugar = df.residual_sugar.median()\n",
"for i, sugar in enumerate(df.residual_sugar):\n",
"    if sugar >= median_sugar:\n",
"        df.loc[i, 'residual_sugar'] = 'high'\n",
"    else:\n",
"        df.loc[i, 'residual_sugar'] = 'low'\n",
"df.groupby('residual_sugar').quality.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"median_citric_acid = df.citric_acid.median()\n",
"for i, citric_acid in enumerate(df.citric_acid):\n",
"    if citric_acid >= median_citric_acid:\n",
"        df.loc[i, 'citric_acid'] = 'high'\n",
"    else:\n",
"        df.loc[i, 'citric_acid'] = 'low'\n",
"df.groupby('citric_acid').quality.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
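
The four analysis cells in the notebook above repeat the same steps with a different column each time, so one way to make them more concise is to move those steps into a function. The sketch below is a possible refactor under stated assumptions, not the official solution: the function name `mean_quality_by_half` is hypothetical, and the small frame is made-up illustration data rather than the real dataset. Unlike the loops in the notebook, it builds the high/low labels as a separate Series and leaves the feature column untouched.

```python
import pandas as pd


def mean_quality_by_half(df, feature, target='quality'):
    """Mean of `target` for rows at/above vs. below the median of `feature`.

    Hypothetical helper written for illustration; not part of the notebook.
    """
    median = df[feature].median()
    # Vectorized comparison, then map the booleans to 'high'/'low' labels.
    # The original column is not overwritten.
    halves = (df[feature] >= median).map({True: 'high', False: 'low'})
    return df.groupby(halves)[target].mean()


# Toy data, made up for illustration (not from winequality-red.csv).
df = pd.DataFrame({
    'alcohol': [9.0, 10.0, 11.0, 12.0],
    'pH': [3.5, 3.1, 3.3, 3.0],
    'quality': [5, 5, 6, 7],
})

# One loop replaces the repeated per-feature cells.
for feature in ['alcohol', 'pH']:
    print(mean_quality_by_half(df, feature), '\n')
```

With the real data, the same loop could run over alcohol, pH, residual_sugar, and citric_acid once the column labels have been renamed. Keeping the numeric columns intact also means the cell can be re-run without first reloading the CSV.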
