Refactor Wine Quality - Ipynb

The document describes code that analyzes a wine quality dataset. The code first renames columns to replace spaces with underscores. It then calculates statistics to see how different features relate to wine quality ratings by grouping data into above and below median values and finding the mean quality for each group. The document notes the code could be refactored to make it more clean, modular and efficient.

Uploaded by

Amal Abdallah
Copyright: © All Rights Reserved

{

"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Refactor: Wine Quality Analysis\n",
"In this exercise, you'll refactor code that analyzes a wine quality dataset taken from the UCI Machine Learning Repository [here](https://archive.ics.uci.edu/ml/datasets/wine+quality). Each row contains data on a wine sample, including several physicochemical properties gathered from tests, as well as a quality rating evaluated by wine experts.\n",
"\n",
"The code in this notebook first renames the columns of the dataset and then calculates some statistics on how some features may be related to quality ratings. Can you refactor this code to make it cleaner and more modular?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"df = pd.read_csv('winequality-red.csv', sep=';')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Renaming Columns\n",
"You want to replace the spaces in the column labels with underscores to be able to reference columns with dot notation. Here's one way you could've done it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"new_df = df.rename(columns={'fixed acidity': 'fixed_acidity',\n",
"                            'volatile acidity': 'volatile_acidity',\n",
"                            'citric acid': 'citric_acid',\n",
"                            'residual sugar': 'residual_sugar',\n",
"                            'free sulfur dioxide': 'free_sulfur_dioxide',\n",
"                            'total sulfur dioxide': 'total_sulfur_dioxide'\n",
"                           })\n",
"new_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And here's a slightly better way you could do it. It avoids naming errors from typos introduced by typing each new label by hand. However, this looks a little repetitive. Can you make it better?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"labels = list(df.columns)\n",
"labels[0] = labels[0].replace(' ', '_')\n",
"labels[1] = labels[1].replace(' ', '_')\n",
"labels[2] = labels[2].replace(' ', '_')\n",
"labels[3] = labels[3].replace(' ', '_')\n",
"labels[5] = labels[5].replace(' ', '_')\n",
"labels[6] = labels[6].replace(' ', '_')\n",
"df.columns = labels\n",
"\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Analyzing Features\n",
"Now that your columns are ready, you want to see how different features of this dataset relate to the quality rating of the wine. A very simple way you could do this is by splitting the samples at the median of each feature and observing the mean quality rating for the top and bottom half. The code below does this for four features. It looks pretty repetitive right now. Can you make this more concise?\n",
"\n",
"You might challenge yourself to figure out how to make this code more efficient! But you don't need to worry too much about efficiency right now - we will cover that more in the next section."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"median_alcohol = df.alcohol.median()\n",
"for i, alcohol in enumerate(df.alcohol):\n",
"    if alcohol >= median_alcohol:\n",
"        df.loc[i, 'alcohol'] = 'high'\n",
"    else:\n",
"        df.loc[i, 'alcohol'] = 'low'\n",
"df.groupby('alcohol').quality.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"median_pH = df.pH.median()\n",
"for i, pH in enumerate(df.pH):\n",
"    if pH >= median_pH:\n",
"        df.loc[i, 'pH'] = 'high'\n",
"    else:\n",
"        df.loc[i, 'pH'] = 'low'\n",
"df.groupby('pH').quality.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"median_sugar = df.residual_sugar.median()\n",
"for i, sugar in enumerate(df.residual_sugar):\n",
"    if sugar >= median_sugar:\n",
"        df.loc[i, 'residual_sugar'] = 'high'\n",
"    else:\n",
"        df.loc[i, 'residual_sugar'] = 'low'\n",
"df.groupby('residual_sugar').quality.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"median_citric_acid = df.citric_acid.median()\n",
"for i, citric_acid in enumerate(df.citric_acid):\n",
"    if citric_acid >= median_citric_acid:\n",
"        df.loc[i, 'citric_acid'] = 'high'\n",
"    else:\n",
"        df.loc[i, 'citric_acid'] = 'low'\n",
"df.groupby('citric_acid').quality.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
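
Here is one way the two steps in the notebook above might be refactored. Treat it as a minimal sketch, not the exercise's official solution. The helper names `underscore_columns` and `mean_quality_by_median_split` are hypothetical, and the small `sample` frame holds made-up values that stand in for `winequality-red.csv` so the block can run on its own.

```python
import pandas as pd


def underscore_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose column labels use underscores, not spaces."""
    # rename() accepts a function that is applied to every column label.
    return df.rename(columns=lambda label: label.replace(' ', '_'))


def mean_quality_by_median_split(df: pd.DataFrame, feature: str,
                                 target: str = 'quality') -> pd.Series:
    """Mean of `target` for rows below vs. at-or-above the median of `feature`."""
    median = df[feature].median()
    # Same rule as the notebook loops: values >= median are 'high', the rest 'low'.
    group = df[feature].ge(median).map({True: 'high', False: 'low'})
    # Group by the label Series directly so the feature column is not overwritten.
    return df.groupby(group)[target].mean()


# Made-up illustrative rows (an assumption, not values quoted from the dataset).
sample = pd.DataFrame({
    'fixed acidity': [7.4, 7.8, 11.2, 7.9],
    'residual sugar': [1.9, 2.6, 1.8, 2.2],
    'pH': [3.51, 3.20, 3.16, 3.30],
    'alcohol': [9.4, 9.8, 10.5, 11.0],
    'quality': [5, 5, 6, 7],
})

clean = underscore_columns(sample)

# One loop replaces the near-identical cells, one per feature.
results = {
    feature: mean_quality_by_median_split(clean, feature)
    for feature in ['alcohol', 'pH', 'residual_sugar']
}

for feature, means in results.items():
    print(feature)
    print(means)
```

Unlike the loops in the notebook, this version leaves the numeric feature columns untouched, so the same frame can be split on several features without reloading the data. To try it on the real dataset, replace `sample` with the frame returned by `pd.read_csv('winequality-red.csv', sep=';')`.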
