Starter-Chasedb1-Ca720eec-4 (1) .Ipynb - File
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"Greetings from the Kaggle bot! This is an automatically-generated kernel with starter code demonstrating how to read in the data and begin exploring. If you're inspired to dig deeper, click the blue \"Fork Notebook\" button at the top of this kernel to begin editing."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploratory Analysis\n",
"To begin this exploratory analysis, first import libraries and define functions for plotting the data using `matplotlib`. Depending on the data, not all plots will be made. (Hey, I'm just a simple kerneling bot, not a Kaggle Competitions Grandmaster!)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"_kg_hide-input": false,
"collapsed": false
},
"outputs": [],
"source": [
"from mpl_toolkits.mplot3d import Axes3D\n",
"from sklearn.preprocessing import StandardScaler\n",
"import matplotlib.pyplot as plt # plotting\n",
"import numpy as np # linear algebra\n",
"import os # accessing directory structure\n",
"import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are no CSV files in the current version of the dataset. The input directory contains only image files (.jpg and .png):\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"_kg_hide-input": false,
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/kaggle/input/Image_08L.jpg\n",
"/kaggle/input/Image_12L.jpg\n",
"/kaggle/input/Image_11R.jpg\n",
"/kaggle/input/Image_09L.jpg\n",
"/kaggle/input/Image_03R.jpg\n",
"/kaggle/input/Image_12L_2ndHO.png\n",
"/kaggle/input/Image_06L_2ndHO.png\n",
"/kaggle/input/Image_02R_2ndHO.png\n",
"/kaggle/input/Image_03R_1stHO.png\n",
"/kaggle/input/Image_06R_2ndHO.png\n",
"/kaggle/input/Image_12R.jpg\n",
"/kaggle/input/Image_01R_1stHO.png\n",
"/kaggle/input/Image_06R.jpg\n",
"/kaggle/input/Image_11L_2ndHO.png\n",
"/kaggle/input/Image_07R_2ndHO.png\n",
"/kaggle/input/Image_02R.jpg\n",
"/kaggle/input/Image_14R.jpg\n",
"/kaggle/input/Image_11L.jpg\n",
"/kaggle/input/Image_07L.jpg\n",
"/kaggle/input/Image_01L_2ndHO.png\n",
"/kaggle/input/Image_11L_1stHO.png\n",
"/kaggle/input/Image_12L_1stHO.png\n",
"/kaggle/input/Image_10R.jpg\n",
"/kaggle/input/Image_06R_1stHO.png\n",
"/kaggle/input/Image_14L_1stHO.png\n",
"/kaggle/input/Image_08R.jpg\n",
"/kaggle/input/Image_11R_2ndHO.png\n",
"/kaggle/input/Image_09L_1stHO.png\n",
"/kaggle/input/Image_01R_2ndHO.png\n",
"/kaggle/input/Image_09L_2ndHO.png\n",
"/kaggle/input/Image_14R_2ndHO.png\n",
"/kaggle/input/Image_11R_1stHO.png\n",
"/kaggle/input/Image_03L.jpg\n",
"/kaggle/input/Image_06L_1stHO.png\n",
"/kaggle/input/Image_09R_1stHO.png\n",
"/kaggle/input/Image_13R_1stHO.png\n",
"/kaggle/input/Image_14L.jpg\n",
"/kaggle/input/Image_01L.jpg\n",
"/kaggle/input/Image_05R_2ndHO.png\n",
"/kaggle/input/Image_13R.jpg\n",
"/kaggle/input/Image_09R.jpg\n",
"/kaggle/input/Image_07L_2ndHO.png\n",
"/kaggle/input/Image_09R_2ndHO.png\n",
"/kaggle/input/Image_01R.jpg\n",
"/kaggle/input/Image_10R_1stHO.png\n",
"/kaggle/input/Image_10L_1stHO.png\n",
"/kaggle/input/Image_04R.jpg\n",
"/kaggle/input/Image_07R.jpg\n",
"/kaggle/input/Image_02R_1stHO.png\n",
"/kaggle/input/Image_13L_1stHO.png\n",
"/kaggle/input/Image_12R_2ndHO.png\n",
"/kaggle/input/Image_08R_1stHO.png\n",
"/kaggle/input/Image_08L_1stHO.png\n",
"/kaggle/input/Image_05L_1stHO.png\n",
"/kaggle/input/Image_13L_2ndHO.png\n",
"/kaggle/input/Image_10L_2ndHO.png\n",
"/kaggle/input/Image_03L_2ndHO.png\n",
"/kaggle/input/Image_07R_1stHO.png\n",
"/kaggle/input/Image_08L_2ndHO.png\n",
"/kaggle/input/Image_02L_2ndHO.png\n",
"/kaggle/input/Image_01L_1stHO.png\n",
"/kaggle/input/Image_03L_1stHO.png\n",
"/kaggle/input/Image_12R_1stHO.png\n",
"/kaggle/input/Image_05L_2ndHO.png\n",
"/kaggle/input/Image_04R_1stHO.png\n",
"/kaggle/input/Image_08R_2ndHO.png\n",
"/kaggle/input/Image_05L.jpg\n",
"/kaggle/input/Image_02L.jpg\n",
"/kaggle/input/Image_04R_2ndHO.png\n",
"/kaggle/input/Image_05R_1stHO.png\n",
"/kaggle/input/Image_13R_2ndHO.png\n",
"/kaggle/input/Image_04L_2ndHO.png\n",
"/kaggle/input/Image_14L_2ndHO.png\n",
"/kaggle/input/Image_02L_1stHO.png\n",
"/kaggle/input/Image_14R_1stHO.png\n",
"/kaggle/input/Image_06L.jpg\n",
"/kaggle/input/Image_13L.jpg\n",
"/kaggle/input/Image_04L_1stHO.png\n",
"/kaggle/input/Image_10R_2ndHO.png\n",
"/kaggle/input/Image_03R_2ndHO.png\n",
"/kaggle/input/Image_10L.jpg\n",
"/kaggle/input/Image_07L_1stHO.png\n",
"/kaggle/input/Image_05R.jpg\n",
"/kaggle/input/Image_04L.jpg\n"
]
}
],
"source": [
"for dirname, _, filenames in os.walk('/kaggle/input'):\n",
"    for filename in filenames:\n",
"        print(os.path.join(dirname, filename))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next hidden code cells define functions for plotting data. Click on the \"Code\" button in the published kernel to reveal the hidden code."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"_kg_hide-input": true,
"collapsed": true
},
"outputs": [],
"source": [
"# Distribution graphs (histogram/bar graph) of column data\n",
"def plotPerColumnDistribution(df, nGraphShown, nGraphPerRow):\n",
"    nunique = df.nunique()\n",
"    # For display purposes, keep only columns with more than 1 and fewer than 50 unique values\n",
"    df = df[[col for col in df if nunique[col] > 1 and nunique[col] < 50]]\n",
"    nRow, nCol = df.shape\n",
"    columnNames = list(df)\n",
"    # Integer (ceiling) division: plt.subplot requires an integer row count\n",
"    nGraphRow = (nCol + nGraphPerRow - 1) // nGraphPerRow\n",
"    plt.figure(num = None, figsize = (6 * nGraphPerRow, 8 * nGraphRow), dpi = 80, facecolor = 'w', edgecolor = 'k')\n",
"    for i in range(min(nCol, nGraphShown)):\n",
"        plt.subplot(nGraphRow, nGraphPerRow, i + 1)\n",
"        columnDf = df.iloc[:, i]\n",
"        if (not np.issubdtype(type(columnDf.iloc[0]), np.number)):\n",
"            valueCounts = columnDf.value_counts()\n",
"            valueCounts.plot.bar()\n",
"        else:\n",
"            columnDf.hist()\n",
"        plt.ylabel('counts')\n",
"        plt.xticks(rotation = 90)\n",
"        plt.title(f'{columnNames[i]} (column {i})')\n",
"    plt.tight_layout(pad = 1.0, w_pad = 1.0, h_pad = 1.0)\n",
"    plt.show()\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"_kg_hide-input": true,
"collapsed": true
},
"outputs": [],
"source": [
"# Correlation matrix\n",
"def plotCorrelationMatrix(df, graphWidth):\n",
"    filename = df.dataframeName # custom attribute; the caller must set it on df beforehand\n",
"    df = df.dropna(axis='columns') # drop columns with NaN\n",
"    df = df[[col for col in df if df[col].nunique() > 1]] # keep columns with more than 1 unique value\n",
"    if df.shape[1] < 2:\n",
"        print(f'No correlation plots shown: The number of non-NaN, non-constant columns ({df.shape[1]}) is less than 2')\n",
"        return\n",
"    corr = df.corr()\n",
"    plt.figure(num=None, figsize=(graphWidth, graphWidth), dpi=80, facecolor='w', edgecolor='k')\n",
"    corrMat = plt.matshow(corr, fignum = 1)\n",
"    plt.xticks(range(len(corr.columns)), corr.columns, rotation=90)\n",
"    plt.yticks(range(len(corr.columns)), corr.columns)\n",
"    plt.gca().xaxis.tick_bottom()\n",
"    plt.colorbar(corrMat)\n",
"    plt.title(f'Correlation Matrix for {filename}', fontsize=15)\n",
"    plt.show()\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"_kg_hide-input": true,
"collapsed": true
},
"outputs": [],
"source": [
"# Scatter and density plots\n",
"def plotScatterMatrix(df, plotSize, textSize):\n",
"    df = df.select_dtypes(include =[np.number]) # keep only numerical columns\n",
"    # Remove columns that would lead to df being singular\n",
"    df = df.dropna(axis='columns')\n",
"    df = df[[col for col in df if df[col].nunique() > 1]] # keep columns with more than 1 unique value\n",
"    columnNames = list(df)\n",
"    if len(columnNames) > 10: # reduce the number of columns for matrix inversion of kernel density plots\n",
"        columnNames = columnNames[:10]\n",
"    df = df[columnNames]\n",
"    ax = pd.plotting.scatter_matrix(df, alpha=0.75, figsize=[plotSize, plotSize], diagonal='kde')\n",
"    corrs = df.corr().values\n",
"    for i, j in zip(*np.triu_indices_from(ax, k = 1)):\n",
"        ax[i, j].annotate('Corr. coef = %.3f' % corrs[i, j], (0.8, 0.2), xycoords='axes fraction', ha='center', va='center', size=textSize)\n",
"    plt.suptitle('Scatter and Density Plot')\n",
"    plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Oh, no! There are no automatic insights available for the file types used in this dataset. As your Kaggle kerneler bot, I'll keep working to fine-tune my hyperparameters. In the meantime, please feel free to try a different dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"This concludes your starter analysis! To go forward from here, click the blue \"Fork Notebook\" button at the top of this kernel. This will create a copy of the code and environment for you to edit. Delete, modify, and add code as you please. Happy Kaggling!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}