Pandas
{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":
[{"cell_type":"markdown","source":["# Guideline to Exploratory Data Analysis\n","\n","A standard guideline for exploratory
data analysis (EDA) that you can follow:\n","\n","- **Understand the Data:** Familiarize yourself with the dataset and
the variables it contains. Read any available documentation or data dictionary to gain insights into the meaning and
context of the data.\n","\n","- **Data Cleaning:** Preprocess and clean the data to handle missing values, outliers, and
inconsistencies. This step ensures the data is in a usable format for analysis.\n","\n","- **Descriptive Statistics:**
Calculate summary statistics (mean, median, mode, standard deviation, etc.) to gain a high-level understanding of the
data distribution and central tendencies.\n","\n","- **Data Visualization:** Visualize the data using various charts,
graphs, and plots to identify patterns, trends, and relationships. Common visualizations include histograms, box plots,
scatter plots, bar charts, and line plots.\n","\n","- **Univariate Analysis:** Analyze individual variables in isolation to
understand their distributions, skewness, and potential outliers. Use appropriate visualizations and statistical measures
for univariate analysis.\n","\n","- **Bivariate Analysis:** Explore relationships between pairs of variables to identify
correlations, associations, or dependencies. Scatter plots, heatmaps, correlation matrices, and statistical tests (e.g.,
Pearson correlation coefficient) can be useful for bivariate analysis.\n","\n","- **Multivariate Analysis:** Investigate
interactions and dependencies among multiple variables simultaneously. Techniques such as dimensionality reduction
(e.g., PCA), cluster analysis, and parallel coordinates plots can aid in understanding complex relationships.\n","\n","-
**Feature Engineering:** Create new derived features or transform existing features to enhance the predictive power of
the data. This step may involve techniques like scaling, binning, one-hot encoding, or creating interaction
variables.\n","\n","- **Data Quality Check:** Verify the quality of the data by checking for data integrity, consistency, and
accuracy. Address any issues or discrepancies that may affect the reliability of the analysis.\n","\n","- **Statistical
Testing:** Conduct statistical tests, such as t-tests, chi-square tests, or ANOVA, to evaluate hypotheses, compare
groups, or identify significant differences in the data.\n","\n","- **Documentation:** Document your findings, insights,
and any decisions made during the EDA process. Prepare clear and concise summaries, visualizations, and reports
that effectively communicate the results of your analysis.\n","\n"],"metadata":{"id":"x5mH3McmMxuU"}},
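{"cell_type":"markdown","source":["The first few steps of the guideline above can be sketched in a handful of pandas calls. This is a minimal illustration rather than part of the original notebook: the `toy` frame and its column names are hypothetical, and only size, missing values, summary statistics, and one Pearson correlation are covered.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
toy = pd.DataFrame({
    'height_cm': [160.0, 172.5, None, 181.0, 168.0],
    'weight_kg': [55.0, 70.0, 64.0, 90.0, 61.0],
})

# Understand the data: size and column types.
print(toy.shape)
print(toy.dtypes)

# Data cleaning: count missing values per column, then drop incomplete rows.
missing = toy.isna().sum()
clean = toy.dropna()

# Descriptive statistics: count, mean, std, and quartiles per numeric column.
summary = clean.describe()

# Bivariate analysis: Pearson correlation between the two columns.
corr = clean['height_cm'].corr(clean['weight_kg'])

print(missing)
print(summary)
print(corr)
```

The same calls should apply unchanged to a real DataFrame once it is loaded; visualization and statistical testing need additional libraries and are left out of this sketch."],"metadata":{}},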
{"cell_type":"markdown","source":["# Load dataset"],"metadata":{"id":"8sxU6JfQXkAY"}},{"cell_type":"code","source":
["!gdown --id 1Qk5FZxfA_jhDcxI3YmuEIbVgd8ZeldMn"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"Gay3yccNXjto","outputId":"f3b944cd-743d-4b7f-c404-
1fa10e36050c","executionInfo":{"status":"ok","timestamp":1731237952455,"user_tz":-360,"elapsed":5322,"user":
{"displayName":"Mohammad Rifat Ahmmad
Rashid","userId":"17207620860184690696"}}},"execution_count":null,"outputs":
[{"output_type":"stream","name":"stdout","text":["/usr/local/lib/python3.10/dist-packages/gdown/__main__.py:140:
FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. You don't need to pass it
anymore to use a file ID.\n"," warnings.warn(\n","Downloading...\n","From: https://drive.google.com/uc?
id=1Qk5FZxfA_jhDcxI3YmuEIbVgd8ZeldMn\n","To: /content/BMI Calculation_MJH.xlsx\n","100% 138k/138k
[00:00<00:00, 60.5MB/s]\n"]}]},{"cell_type":"markdown","source":["# Part 1: Basics of Pandas Data
Structures"],"metadata":{"id":"xuNEetqDX_y1"}},{"cell_type":"markdown","source":["### DataFrame
Creation\n","Objective: Learn to create Pandas DataFrames from various sources."],"metadata":
{"id":"EURjDyLTXH0m"}},{"cell_type":"code","source":["import pandas as pd\n","\n","# Create a DataFrame from a
Python dictionary\n","data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],\n"," 'Age': [28, 34, 29, 32],\n"," 'City': ['New York',
'Paris', 'Berlin', 'London']}\n","\n","employee = {'id' : [\"emp-1\",\"emp-2\"],\n"," 'name': ['emp1-name','emp2-name'],\n","
'salary': [50000,6000]\n"," }\n","\n","df = pd.DataFrame(data)\n","\n","df_employee =
pd.DataFrame(employee)\n","df_employee"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":125},"id":"VuXBMGSh1OgO","executionInfo":
{"status":"ok","timestamp":1731238247696,"user_tz":-360,"elapsed":384,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"4ded653c-fe00-4cf3-e900-
1eb5bc16b587"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" id name
salary\n","0 emp-1 emp1-name 50000\n","1 emp-2 emp2-name 6000"],"text/html":["\n","
<table>\n","<tr><th></th><th>id</th><th>name</th><th>salary</th></tr>\n",
"<tr><th>0</th><td>emp-1</td><td>emp1-name</td><td>50000</td></tr>\n",
"<tr><th>1</th><td>emp-2</td><td>emp2-name</td><td>6000</td></tr>\n","</table>
\n"],"application/vnd.google.colaboratory.intrinsic+json":
{"type":"dataframe","variable_name":"df_employee","summary":"{\n \"name\": \"df_employee\",\n \"rows\": 2,\n \"fields\":
[\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"emp-
2\",\n \"emp-1\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"name\",\n \"properties\": {\n
\"dtype\": \"string\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"emp2-name\",\n \"emp1-name\"\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"salary\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 31112,\n \"min\": 6000,\n \"max\": 50000,\n \"num_unique_values\": 2,\n \"samples\": [\n 6000,\n
50000\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":4}]},
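{"cell_type":"markdown","source":["A dictionary of columns is only one possible source. As a hedged sketch of two others, the block below builds a frame from a list of row records and from CSV text held in memory. The names `records`, `csv_text`, `df_records`, and `df_text` are illustrative and do not appear in the original notebook; the values mirror the employee example above.

```python
import io
import pandas as pd

# From a list of records: one dict per row, keys become column names.
records = [
    {'id': 'emp-1', 'salary': 50000},
    {'id': 'emp-2', 'salary': 6000},
]
df_records = pd.DataFrame(records)

# From CSV text in memory: read_csv accepts any file-like object,
# so io.StringIO stands in for a file on disk.
csv_text = '''id,salary
emp-1,50000
emp-2,6000
'''
df_text = pd.read_csv(io.StringIO(csv_text))

print(df_records)
print(df_text)
```

Both routes should yield a two-row frame with the columns `id` and `salary`; reading from a real file only requires replacing the `StringIO` object with a path."],"metadata":{}},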
{"cell_type":"code","execution_count":null,"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":258},"id":"REkZVYhtVQaY","outputId":"e1b31506-d5cc-4416-c2cf-
70f524494953","executionInfo":{"status":"ok","timestamp":1731238369211,"user_tz":-360,"elapsed":1094,"user":
{"displayName":"Mohammad Rifat Ahmmad Rashid","userId":"17207620860184690696"}}},"outputs":
[{"output_type":"stream","name":"stdout","text":["\n","Loaded DataFrame from an Excel file:\n","\n"]},
{"output_type":"execute_result","data":{"text/plain":[" Feet Inch Hieght (cm) Hieght (m2) W1 W2 BMI (Before COVID)
\\\n","0 5 6.0 167.64 2.810317 70.0 78.0 24.908222 \n","1 5 1.0 154.94 2.400640 51.0 55.0 21.244332 \n","2 5 0.0
152.40 2.322576 44.0 49.0 18.944482 \n","3 5 1.0 154.94 2.400640 49.0 47.0 20.411221 \n","4 5 3.0 160.02 2.560640
75.0 78.0 29.289552 \n","\n"," BMI (During COVID) \n","0 27.754876 \n","1 22.910554 \n","2 21.097264 \n","3
19.578110 \n","4 30.461134 "],"text/html":["\n","
<table>\n","<tr><th></th><th>Feet</th><th>Inch</th><th>Hieght (cm)</th><th>Hieght (m2)</th><th>W1</th><th>W2</th><th>BMI (Before COVID)</th><th>BMI (During COVID)</th></tr>\n",
"<tr><th>0</th><td>5</td><td>6.0</td><td>167.64</td><td>2.810317</td><td>70.0</td><td>78.0</td><td>24.908222</td><td>27.754876</td></tr>\n",
"<tr><th>1</th><td>5</td><td>1.0</td><td>154.94</td><td>2.400640</td><td>51.0</td><td>55.0</td><td>21.244332</td><td>22.910554</td></tr>\n",
"<tr><th>2</th><td>5</td><td>0.0</td><td>152.40</td><td>2.322576</td><td>44.0</td><td>49.0</td><td>18.944482</td><td>21.097264</td></tr>\n",
"<tr><th>3</th><td>5</td><td>1.0</td><td>154.94</td><td>2.400640</td><td>49.0</td><td>47.0</td><td>20.411221</td><td>19.578110</td></tr>\n",
"<tr><th>4</th><td>5</td><td>3.0</td><td>160.02</td><td>2.560640</td><td>75.0</td><td>78.0</td><td>29.289552</td><td>30.461134</td></tr>\n","</table>
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_excel","summary":"{\n
\"name\": \"df_excel\",\n \"rows\": 1602,\n \"fields\": [\n {\n \"column\": \"Feet\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 0,\n \"min\": 4,\n \"max\": 7,\n \"num_unique_values\": 4,\n \"samples\": [\n 6,\n 7,\n 5\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Inch\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 3.0922374995274406,\n \"min\": 0.0,\n \"max\": 33.0,\n \"num_unique_values\": 21,\n \"samples\": [\n 6.0,\n
7.5,\n 8.5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Hieght (cm)\",\n \"properties\": {\n
\"dtype\": \"number\",\n \"std\": 9.513993865916744,\n \"min\": 121.92,\n \"max\": 236.22,\n \"num_unique_values\":
40,\n \"samples\": [\n 185.42000000000002,\n 158.75,\n 200.66\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n
},\n {\n \"column\": \"Hieght (m2)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.31662606270109006,\n \"min\":
1.48644864,\n \"max\": 5.57998884,\n \"num_unique_values\": 40,\n \"samples\": [\n 3.4380576400000002,\n
2.52015625,\n 4.026443560000001\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"W1\",\n
\"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.335592024444987,\n \"min\": 3.0,\n \"max\": 118.0,\n
\"num_unique_values\": 78,\n \"samples\": [\n 80.0,\n 70.0,\n 71.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n
}\n },\n {\n \"column\": \"W2\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 21.01654716375466,\n \"min\": 0.4,\n
\"max\": 748.0,\n \"num_unique_values\": 102,\n \"samples\": [\n 61.8,\n 46.2,\n 37.0\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (Before COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
4.117923349181555,\n \"min\": 1.1352561767623535,\n \"max\": 40.47230316682855,\n \"num_unique_values\":
577,\n \"samples\": [\n 26.045000630223903,\n 27.151870912738364,\n 36.15167580189387\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (During COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 7.678420507393607,\n \"min\": 0.15621094482299824,\n \"max\": 283.05720673941346,\n
\"num_unique_values\": 600,\n \"samples\": [\n 26.696125645979503,\n 28.903053243519697,\n
24.432400150219703\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":
{},"execution_count":5}],"source":["# Assuming 'BMI Calculation_MJH.xlsx' is present in your directory\n","\n","# for csv
file pd.read_csv(\"file path\")\n","\n","df_excel = pd.read_excel('/content/BMI
Calculation_MJH.xlsx')\n","\n","print(\"\\nLoaded DataFrame from an Excel file:\\n\")\n","df_excel.head()\n"]},
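{"cell_type":"markdown","source":["The rows printed above suggest how the derived columns relate to the measured ones. As a sketch under that assumption (the formulas are inferred from the numbers, not stated in the file), the squared height in metres and the two BMI values for row 0 can be recomputed by hand. The variable names are illustrative.

```python
# Values copied from row 0 of the output above.
height_cm = 167.64
w1_kg, w2_kg = 70.0, 78.0

# Assumed relationship: squared height in metres.
height_m2 = (height_cm / 100) ** 2

# Assumed relationship: BMI = weight (kg) / squared height (m^2).
bmi_before = w1_kg / height_m2
bmi_during = w2_kg / height_m2

print(round(height_m2, 6), round(bmi_before, 6), round(bmi_during, 6))
```

The printed values should agree, to within rounding, with the `Hieght (m2)`, `BMI (Before COVID)`, and `BMI (During COVID)` entries of row 0."],"metadata":{}},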
{"cell_type":"markdown","source":["The dataset consists of the following columns:\n","\n","* **Feet and Inch:**
The two parts of each individual's height, in feet and inches. Taken separately they look discrete, but height is
usually treated as continuous once combined into a single metric (e.g., total inches or centimeters).\n","*
**Hieght (cm) and Hieght (m2):** Continuous variables giving height in centimeters and squared height in
square meters, respectively (the column names are misspelled this way in the file). The squared height is used in the BMI calculations.\n","* **W1 and W2:** Continuous variables giving each individual's weight in kilograms before and during COVID,
respectively.\n","* **BMI (Before COVID)
and BMI (During COVID):** Continuous variables giving each individual's Body Mass Index before and during the
COVID pandemic."],"metadata":{"id":"-sU4xT-rnViv"}},{"cell_type":"markdown","source":["### Viewing and Inspecting
Data\n","Objective: Become familiar with methods for viewing and inspecting DataFrame properties."],"metadata":
{"id":"BY3qmLY2XzPO"}},{"cell_type":"code","source":["# Viewing the first few rows\n","print(df.head())\n","\n","#
Viewing the last few rows\n","print(df.tail())\n","\n","# Getting info about DataFrame\n","df.info()\n","\n","# Descriptive
statistics for numerical columns\n","print(df.describe())\n"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"jDjW7jAPXLd4","outputId":"2ed5a7fd-c68b-4c80-d945-
69dafe26e9db","executionInfo":{"status":"ok","timestamp":1731168111463,"user_tz":-360,"elapsed":130,"user":
{"displayName":"Mohammad Rifat Ahmmad
Rashid","userId":"17207620860184690696"}}},"execution_count":null,"outputs":
[{"output_type":"stream","name":"stdout","text":[" Name Age City\n","0 John 28 New York\n","1 Anna 34 Paris\n","2
Peter 29 Berlin\n","3 Linda 32 London\n"," Name Age City\n","0 John 28 New York\n","1 Anna 34 Paris\n","2 Peter 29
Berlin\n","3 Linda 32 London\n","<class 'pandas.core.frame.DataFrame'>\n","RangeIndex: 4 entries, 0 to 3\n","Data columns (total 3 columns):\n"," # Column
Non-Null Count Dtype \n","--- ------ -------------- ----- \n"," 0 Name 4 non-null object\n"," 1 Age 4 non-null int64 \n"," 2 City
4 non-null object\n","dtypes: int64(1), object(2)\n","memory usage: 224.0+ bytes\n"," Age\n","count 4.000000\n","mean
30.750000\n","std 2.753785\n","min 28.000000\n","25% 28.750000\n","50% 30.500000\n","75% 32.500000\n","max
34.000000\n"]}]},{"cell_type":"code","source":["# Viewing the first few rows\n","print(df_excel.head())"],"metadata":
{"colab":{"base_uri":"https://localhost:8080/"},"id":"JFux6YA3oGpJ","executionInfo":
{"status":"ok","timestamp":1731238437582,"user_tz":-360,"elapsed":603,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"de76d65e-16d2-4b44-8169-
a179570c34a6"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":[" Feet Inch Hieght
(cm) Hieght (m2) W1 W2 BMI (Before COVID) \\\n","0 5 6.0 167.64 2.810317 70.0 78.0 24.908222 \n","1 5 1.0 154.94
2.400640 51.0 55.0 21.244332 \n","2 5 0.0 152.40 2.322576 44.0 49.0 18.944482 \n","3 5 1.0 154.94 2.400640 49.0
47.0 20.411221 \n","4 5 3.0 160.02 2.560640 75.0 78.0 29.289552 \n","\n"," BMI (During COVID) \n","0 27.754876
\n","1 22.910554 \n","2 21.097264 \n","3 19.578110 \n","4 30.461134 \n"]}]},{"cell_type":"code","source":["# Viewing the
last few rows\n","print(df_excel.tail())"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"6r8ynfOfoJC5","executionInfo":
{"status":"ok","timestamp":1731238452588,"user_tz":-360,"elapsed":473,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"523ddccd-c731-47e2-d199-
5ba961256ebc"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":[" Feet Inch Hieght
(cm) Hieght (m2) W1 W2 BMI (Before COVID) \\\n","1597 5 9.0 175.26 3.071607 68.0 71.0 22.138251 \n","1598 6 0.0
182.88 3.344509 76.0 75.0 22.723811 \n","1599 5 6.0 167.64 2.810317 67.0 63.0 23.840727 \n","1600 6 3.0 190.50
3.629025 70.0 77.0 19.288927 \n","1601 5 3.0 160.02 2.560640 63.0 61.0 24.603224 \n","\n"," BMI (During COVID)
\n","1597 23.114938 \n","1598 22.424813 \n","1599 22.417400 \n","1600 21.217820 \n","1601 23.822169 \n"]}]},
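{"cell_type":"markdown","source":["Alongside `head()` and `tail()`, a few attributes give a quick structural overview before calling `info()`. The sketch below uses a small stand-in frame named `frame` (hypothetical; in this notebook the same calls would be made on `df_excel`).

```python
import pandas as pd

# Small stand-in frame; values are taken from the Feet and Inch columns above.
frame = pd.DataFrame({'Feet': [5, 5, 6], 'Inch': [6.0, 1.0, 0.0]})

n_rows, n_cols = frame.shape        # (number of rows, number of columns)
column_names = list(frame.columns)  # column labels in order
first_two = frame.head(2)           # head() accepts a row count
last_one = frame.tail(1)            # so does tail()

print(n_rows, n_cols, column_names)
print(first_two)
print(last_one)
```

Passing a row count to `head()` or `tail()` is handy when the default of five rows is too many or too few for the table at hand."],"metadata":{}},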
{"cell_type":"code","source":["# Getting info about DataFrame\n","df_excel.info()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"g6q6-lnVoL15","executionInfo":
{"status":"ok","timestamp":1731238467338,"user_tz":-360,"elapsed":367,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"1aafd189-8ece-4467-84d7-
ddcd1662540f"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["<class 'pandas.core.frame.DataFrame'>\n","RangeIndex:
1602 entries, 0 to 1601\n","Data columns (total 8 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ -------------
- ----- \n"," 0 Feet 1602 non-null int64 \n"," 1 Inch 1602 non-null float64\n"," 2 Hieght (cm) 1602 non-null float64\n"," 3
Hieght (m2) 1602 non-null float64\n"," 4 W1 1602 non-null float64\n"," 5 W2 1602 non-null float64\n"," 6 BMI (Before
COVID) 1602 non-null float64\n"," 7 BMI (During COVID) 1602 non-null float64\n","dtypes: float64(7),
int64(1)\n","memory usage: 100.2 KB\n"]}]},{"cell_type":"code","source":["# Descriptive statistics for numerical
columns\n","df_excel.describe()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":355},"id":"blNoDNguoAHK","executionInfo":
{"status":"ok","timestamp":1731238505722,"user_tz":-360,"elapsed":420,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"befd1217-09a8-4b3e-e69b-
414407afdd7e"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Feet Inch Hieght
(cm) Hieght (m2) W1 \\\n","count 1602.000000 1602.000000 1602.000000 1602.000000 1602.000000 \n","mean
4.990012 4.876092 164.480855 2.714441 61.701623 \n","std 0.275867 3.092237 9.513994 0.316626 12.335592
\n","min 4.000000 0.000000 121.920000 1.486449 3.000000 \n","25% 5.000000 2.000000 157.480000 2.479995
53.000000 \n","50% 5.000000 5.000000 165.100000 2.725801 61.000000 \n","75% 5.000000 7.000000 170.180000
2.896123 69.000000 \n","max 7.000000 33.000000 236.220000 5.579989 118.000000 \n","\n"," W2 BMI (Before
COVID) BMI (During COVID) \n","count 1602.000000 1602.000000 1602.000000 \n","mean 63.661236 22.790117
23.525109 \n","std 21.016547 4.117923 7.678421 \n","min 0.400000 1.135256 0.156211 \n","25% 55.000000
20.112497 20.638241 \n","50% 62.000000 22.434305 23.129064 \n","75% 71.000000 24.993751 25.619886 \n","max
748.000000 40.472303 283.057207 "],"text/html":["\n","
<table>\n","<tr><th></th><th>Feet</th><th>Inch</th><th>Hieght (cm)</th><th>Hieght (m2)</th><th>W1</th><th>W2</th><th>BMI (Before COVID)</th><th>BMI (During COVID)</th></tr>\n",
"<tr><th>count</th><td>1602.000000</td><td>1602.000000</td><td>1602.000000</td><td>1602.000000</td><td>1602.000000</td><td>1602.000000</td><td>1602.000000</td><td>1602.000000</td></tr>\n",
"<tr><th>mean</th><td>4.990012</td><td>4.876092</td><td>164.480855</td><td>2.714441</td><td>61.701623</td><td>63.661236</td><td>22.790117</td><td>23.525109</td></tr>\n",
"<tr><th>std</th><td>0.275867</td><td>3.092237</td><td>9.513994</td><td>0.316626</td><td>12.335592</td><td>21.016547</td><td>4.117923</td><td>7.678421</td></tr>\n",
"<tr><th>min</th><td>4.000000</td><td>0.000000</td><td>121.920000</td><td>1.486449</td><td>3.000000</td><td>0.400000</td><td>1.135256</td><td>0.156211</td></tr>\n",
"<tr><th>25%</th><td>5.000000</td><td>2.000000</td><td>157.480000</td><td>2.479995</td><td>53.000000</td><td>55.000000</td><td>20.112497</td><td>20.638241</td></tr>\n",
"<tr><th>50%</th><td>5.000000</td><td>5.000000</td><td>165.100000</td><td>2.725801</td><td>61.000000</td><td>62.000000</td><td>22.434305</td><td>23.129064</td></tr>\n",
"<tr><th>75%</th><td>5.000000</td><td>7.000000</td><td>170.180000</td><td>2.896123</td><td>69.000000</td><td>71.000000</td><td>24.993751</td><td>25.619886</td></tr>\n",
"<tr><th>max</th><td>7.000000</td><td>33.000000</td><td>236.220000</td><td>5.579989</td><td>118.000000</td><td>748.000000</td><td>40.472303</td><td>283.057207</td></tr>\n","</table>
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","summary":"{\n \"name\": \"df_excel\",\n
\"rows\": 8,\n \"fields\": [\n {\n \"column\": \"Feet\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
564.8165691141892,\n \"min\": 0.2758669257875601,\n \"max\": 1602.0,\n \"num_unique_values\": 6,\n \"samples\":
[\n 1602.0,\n 4.990012484394507,\n 7.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\":
\"Inch\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 563.7136218772606,\n \"min\": 0.0,\n \"max\": 1602.0,\n
\"num_unique_values\": 8,\n \"samples\": [\n 4.8760923845193505,\n 5.0,\n 1602.0\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"Hieght (cm)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
518.6052678519698,\n \"min\": 9.513993865916744,\n \"max\": 1602.0,\n \"num_unique_values\": 8,\n \"samples\": [\n
164.48085518102374,\n 165.1,\n 1602.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\":
\"Hieght (m2)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 565.4752755075044,\n \"min\":
0.31662606270109006,\n \"max\": 1602.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 2.714441129843945,\n
2.725801,\n 1602.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"W1\",\n \"properties\":
{\n \"dtype\": \"number\",\n \"std\": 548.4417973432782,\n \"min\": 3.0,\n \"max\": 1602.0,\n \"num_unique_values\": 8,\n
\"samples\": [\n 61.70162297128589,\n 61.0,\n 1602.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n
\"column\": \"W2\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 570.9948056032903,\n \"min\": 0.4,\n \"max\":
1602.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 63.661235955056185,\n 62.0,\n 1602.0\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (Before COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 559.6564305916718,\n \"min\": 1.1352561767623535,\n \"max\": 1602.0,\n \"num_unique_values\": 8,\n
\"samples\": [\n 22.79011745578402,\n 22.434304824015044,\n 1602.0\n ],\n \"semantic_type\": \"\",\n \"description\":
\"\"\n }\n },\n {\n \"column\": \"BMI (During COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
554.9553142069764,\n \"min\": 0.15621094482299824,\n \"max\": 1602.0,\n \"num_unique_values\": 8,\n \"samples\":
[\n 23.525108794387545,\n 23.129063705326676,\n 1602.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n
]\n}"}},"metadata":{},"execution_count":9}]},{"cell_type":"code","source":["df_excel['BMI (Before
COVID)'].mean()"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"WsXVnopq6qvS","executionInfo":
{"status":"ok","timestamp":1731238559647,"user_tz":-360,"elapsed":380,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"13399a7e-6b4e-44e8-9ea4-
aa62ab8b5c8a"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":
["22.79011745578402"]},"metadata":{},"execution_count":11}]},{"cell_type":"code","source":["!gdown --id
1b2_5EdoTVS5XbuuhTUYXl5tyuNk7pxML"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"1PqJaiIi3ebl","executionInfo":
{"status":"ok","timestamp":1731238712036,"user_tz":-360,"elapsed":4314,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"f198105e-56de-47ca-af54-
7a9651fe727d"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":
["/usr/local/lib/python3.10/dist-packages/gdown/__main__.py:140: FutureWarning: Option `--id` was deprecated in
version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.\n","
warnings.warn(\n","Downloading...\n","From: https://drive.google.com/uc?
id=1b2_5EdoTVS5XbuuhTUYXl5tyuNk7pxML\n","To: /content/user_behavior_dataset.csv\n","100% 38.9k/38.9k
[00:00<00:00, 50.7MB/s]\n"]}]},{"cell_type":"code","source":["df_csv =
pd.read_csv('/content/user_behavior_dataset.csv')\n","df_csv.head()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":347},"id":"_LsS5lci3pBt","executionInfo":
{"status":"ok","timestamp":1731238747209,"user_tz":-360,"elapsed":393,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"65599d7e-f45e-40a3-9842-
d6860088d39b"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" User ID Device
Model Operating System App Usage Time (min/day) \\\n","0 1 Google Pixel 5 Android 393 \n","1 2 OnePlus 9 Android
268 \n","2 3 Xiaomi Mi 11 Android 154 \n","3 4 Google Pixel 5 Android 239 \n","4 5 iPhone 12 iOS 187 \n","\n"," Screen
On Time (hours/day) Battery Drain (mAh/day) \\\n","0 6.4 1872 \n","1 4.7 1331 \n","2 4.0 761 \n","3 4.8 1676 \n","4 4.3
1367 \n","\n"," Number of Apps Installed Data Usage (MB/day) Age Gender \\\n","0 67 1122 40 Male \n","1 42 944 47
Female \n","2 32 322 42 Male \n","3 56 871 20 Male \n","4 58 988 31 Female \n","\n"," User Behavior Class \n","0 4
\n","1 3 \n","2 2 \n","3 3 \n","4 3 "],"text/html":["\n","
<table>\n","<tr><th></th><th>User ID</th><th>Device Model</th><th>Operating System</th><th>App Usage Time (min/day)</th><th>Screen On Time (hours/day)</th><th>Battery Drain (mAh/day)</th><th>Number of Apps Installed</th><th>Data Usage (MB/day)</th><th>Age</th><th>Gender</th><th>User Behavior Class</th></tr>\n",
"<tr><th>0</th><td>1</td><td>Google Pixel 5</td><td>Android</td><td>393</td><td>6.4</td><td>1872</td><td>67</td><td>1122</td><td>40</td><td>Male</td><td>4</td></tr>\n",
"<tr><th>1</th><td>2</td><td>OnePlus 9</td><td>Android</td><td>268</td><td>4.7</td><td>1331</td><td>42</td><td>944</td><td>47</td><td>Female</td><td>3</td></tr>\n",
"<tr><th>2</th><td>3</td><td>Xiaomi Mi 11</td><td>Android</td><td>154</td><td>4.0</td><td>761</td><td>32</td><td>322</td><td>42</td><td>Male</td><td>2</td></tr>\n",
"<tr><th>3</th><td>4</td><td>Google Pixel 5</td><td>Android</td><td>239</td><td>4.8</td><td>1676</td><td>56</td><td>871</td><td>20</td><td>Male</td><td>3</td></tr>\n",
"<tr><th>4</th><td>5</td><td>iPhone 12</td><td>iOS</td><td>187</td><td>4.3</td><td>1367</td><td>58</td><td>988</td><td>31</td><td>Female</td><td>3</td></tr>\n","</table>
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_csv","summary":"{\n
\"name\": \"df_csv\",\n \"rows\": 700,\n \"fields\": [\n {\n \"column\": \"User ID\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 202,\n \"min\": 1,\n \"max\": 700,\n \"num_unique_values\": 700,\n \"samples\": [\n 159,\n 501,\n
397\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Device Model\",\n \"properties\": {\n
\"dtype\": \"category\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"OnePlus 9\",\n \"Samsung Galaxy S21\",\n
\"Xiaomi Mi 11\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Operating System\",\n
\"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"iOS\",\n \"Android\"\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"App Usage Time (min/day)\",\n \"properties\": {\n
\"dtype\": \"number\",\n \"std\": 177,\n \"min\": 30,\n \"max\": 598,\n \"num_unique_values\": 387,\n \"samples\": [\n
582,\n 402\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Screen On Time (hours/day)\",\n
\"properties\": {\n \"dtype\": \"number\",\n \"std\": 3.068583910273257,\n \"min\": 1.0,\n \"max\": 12.0,\n
\"num_unique_values\": 108,\n \"samples\": [\n 10.8,\n 1.4\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n
{\n \"column\": \"Battery Drain (mAh/day)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 819,\n \"min\": 302,\n
\"max\": 2993,\n \"num_unique_values\": 628,\n \"samples\": [\n 2597,\n 1632\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"Number of Apps Installed\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 26,\n \"min\": 10,\n \"max\": 99,\n \"num_unique_values\": 86,\n \"samples\": [\n 79,\n 67\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Data Usage (MB/day)\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 640,\n \"min\": 102,\n \"max\": 2497,\n \"num_unique_values\": 585,\n \"samples\": [\n 839,\n 765\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 12,\n \"min\": 18,\n \"max\": 59,\n \"num_unique_values\": 42,\n \"samples\": [\n 56,\n 26\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Gender\",\n \"properties\": {\n \"dtype\": \"category\",\n
\"num_unique_values\": 2,\n \"samples\": [\n \"Female\",\n \"Male\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n
}\n },\n {\n \"column\": \"User Behavior Class\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n
\"max\": 5,\n \"num_unique_values\": 5,\n \"samples\": [\n 3,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n
}\n ]\n}"}},"metadata":{},"execution_count":13}]},{"cell_type":"code","source":["df_csv.info()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"YaiMAStr4d9N","executionInfo":
{"status":"ok","timestamp":1731238952694,"user_tz":-360,"elapsed":4,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"4b5b4e6d-8d80-4589-95ba-
3a6c5568c0ab"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["<class 'pandas.core.frame.DataFrame'>\n","RangeIndex:
700 entries, 0 to 699\n","Data columns (total 11 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ --------------
----- \n"," 0 User ID 700 non-null int64 \n"," 1 Device Model 700 non-null object \n"," 2 Operating System 700 non-null
object \n"," 3 App Usage Time (min/day) 700 non-null int64 \n"," 4 Screen On Time (hours/day) 700 non-null float64\n","
5 Battery Drain (mAh/day) 700 non-null int64 \n"," 6 Number of Apps Installed 700 non-null int64 \n"," 7 Data Usage
(MB/day) 700 non-null int64 \n"," 8 Age 700 non-null int64 \n"," 9 Gender 700 non-null object \n"," 10 User Behavior
Class 700 non-null int64 \n","dtypes: float64(1), int64(7), object(3)\n","memory usage: 60.3+ KB\n"]}]},
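{"cell_type":"markdown","source":["The `info()` output above lists three `object` columns, which a plain `describe()` leaves out by default. A minimal sketch of summarising such categorical columns follows; the `users` frame is a hypothetical stand-in built from the five rows shown earlier, not the full CSV.

```python
import pandas as pd

# Stand-in rows modelled on the first five rows of the CSV shown above.
users = pd.DataFrame({
    'Operating System': ['Android', 'Android', 'Android', 'Android', 'iOS'],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Female'],
    'Age': [40, 47, 42, 20, 31],
})

# Frequency of each category in a single column.
os_counts = users['Operating System'].value_counts()

# Summary restricted to the non-numeric columns:
# count, number of unique values, most frequent value, and its frequency.
text_summary = users.select_dtypes(exclude='number').describe()

print(os_counts)
print(text_summary)
```

On the real data the same two calls could be run on `df_csv`, for example to see how the five device models are distributed."],"metadata":{}},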
{"cell_type":"code","source":["df.describe()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":300},"id":"VOrDqXS84hNF","executionInfo":
{"status":"ok","timestamp":1731238968808,"user_tz":-360,"elapsed":397,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"b8db2684-06ca-45d3-ce11-
d6462611ef84"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Age\n","count
4.000000\n","mean 30.750000\n","std 2.753785\n","min 28.000000\n","25% 28.750000\n","50% 30.500000\n","75%
32.500000\n","max 34.000000"],"text/html":["\n","
<table>\n","<tr><th></th><th>Age</th></tr>\n",
"<tr><th>count</th><td>4.000000</td></tr>\n",
"<tr><th>mean</th><td>30.750000</td></tr>\n",
"<tr><th>std</th><td>2.753785</td></tr>\n",
"<tr><th>min</th><td>28.000000</td></tr>\n",
"<tr><th>25%</th><td>28.750000</td></tr>\n",
"<tr><th>50%</th><td>30.500000</td></tr>\n",
"<tr><th>75%</th><td>32.500000</td></tr>\n",
"<tr><th>max</th><td>34.000000</td></tr>\n","</table>
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","summary":"{\n \"name\": \"df\",\n \"rows\":
8,\n \"fields\": [\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.817159456014078,\n
\"min\": 2.753785273643051,\n \"max\": 34.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 30.75,\n 30.5,\n 4.0\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":15}]},
{"cell_type":"markdown","source":["### Indexing, Selecting, and Filtering\n","Objective: Select and filter DataFrame
rows and columns."],"metadata":{"id":"4JZ3LctdX4gs"}},{"cell_type":"code","source":["# Select specific
columns\n","print(\"Names column:\\n\", df['Name'])\n","\n","# Filter rows based on a condition\n","print(\"\\nRows where
Age > 30:\\n\", df[df['Age'] > 30])\n","\n","# Label-based indexing\n","print(\"\\nSelect specific row and column with
.loc:\\n\",\n"," df.loc[1, 'Name'])\n","\n","# Integer-based indexing\n","print(\"\\nSelect specific row and column with
.iloc:\\n\",\n"," df.iloc[1, 0])\n"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"2L36VKjNX2I3","outputId":"4e29ae83-f17d-4a4c-fb9a-
14025a89e832","executionInfo":{"status":"ok","timestamp":1731168111465,"user_tz":-360,"elapsed":101,"user":
{"displayName":"Mohammad Rifat Ahmmad
Rashid","userId":"17207620860184690696"}}},"execution_count":null,"outputs":
[{"output_type":"stream","name":"stdout","text":["Names column:\n"," 0 John\n","1 Anna\n","2 Peter\n","3
Linda\n","Name: Name, dtype: object\n","\n","Rows where Age > 30:\n"," Name Age City\n","1 Anna 34 Paris\n","3
Linda 32 London\n","\n","Select specific row and column with .loc:\n"," Anna\n","\n","Select specific row and column
with .iloc:\n"," Anna\n"]}]},{"cell_type":"code","source":["df_excel.head()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":206},"id":"MYPCS-nw56At","executionInfo":
{"status":"ok","timestamp":1731239330808,"user_tz":-360,"elapsed":402,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"571042dd-b2c3-4a59-cbfc-
b0188f126711"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Feet Inch Hieght
(cm) Hieght (m2) W1 W2 BMI (Before COVID) \\\n","0 5 6.0 167.64 2.810317 70.0 78.0 24.908222 \n","1 5 1.0 154.94
2.400640 51.0 55.0 21.244332 \n","2 5 0.0 152.40 2.322576 44.0 49.0 18.944482 \n","3 5 1.0 154.94 2.400640 49.0
47.0 20.411221 \n","4 5 3.0 160.02 2.560640 75.0 78.0 29.289552 \n","\n"," BMI (During COVID) \n","0 27.754876
\n","1 22.910554 \n","2 21.097264 \n","3 19.578110 \n","4 30.461134 "],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_excel","summary":"{\n
\"name\": \"df_excel\",\n \"rows\": 1602,\n \"fields\": [\n {\n \"column\": \"Feet\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 0,\n \"min\": 4,\n \"max\": 7,\n \"num_unique_values\": 4,\n \"samples\": [\n 6,\n 7,\n 5\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Inch\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 3.0922374995274406,\n \"min\": 0.0,\n \"max\": 33.0,\n \"num_unique_values\": 21,\n \"samples\": [\n 6.0,\n
7.5,\n 8.5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Hieght (cm)\",\n \"properties\": {\n
\"dtype\": \"number\",\n \"std\": 9.513993865916744,\n \"min\": 121.92,\n \"max\": 236.22,\n \"num_unique_values\":
40,\n \"samples\": [\n 185.42000000000002,\n 158.75,\n 200.66\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n
},\n {\n \"column\": \"Hieght (m2)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.31662606270109006,\n \"min\":
1.48644864,\n \"max\": 5.57998884,\n \"num_unique_values\": 40,\n \"samples\": [\n 3.4380576400000002,\n
2.52015625,\n 4.026443560000001\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"W1\",\n
\"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.335592024444987,\n \"min\": 3.0,\n \"max\": 118.0,\n
\"num_unique_values\": 78,\n \"samples\": [\n 80.0,\n 70.0,\n 71.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n
}\n },\n {\n \"column\": \"W2\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 21.01654716375466,\n \"min\": 0.4,\n
\"max\": 748.0,\n \"num_unique_values\": 102,\n \"samples\": [\n 61.8,\n 46.2,\n 37.0\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (Before COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
4.117923349181555,\n \"min\": 1.1352561767623535,\n \"max\": 40.47230316682855,\n \"num_unique_values\":
577,\n \"samples\": [\n 26.045000630223903,\n 27.151870912738364,\n 36.15167580189387\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (During COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 7.678420507393607,\n \"min\": 0.15621094482299824,\n \"max\": 283.05720673941346,\n
\"num_unique_values\": 600,\n \"samples\": [\n 26.696125645979503,\n 28.903053243519697,\n
24.432400150219703\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":
{},"execution_count":16}]},{"cell_type":"code","source":["df_excel['W2']"],"metadata":
{"id":"jzuNZ_Zh5_7M","executionInfo":{"status":"ok","timestamp":1731239357824,"user_tz":-360,"elapsed":387,"user":
{"displayName":"Mohammad Rifat Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"fb751bff-5f31-
46e0-b191-8dd971b061b3","colab":{"base_uri":"https://localhost:8080/","height":458}},"execution_count":null,"outputs":
[{"output_type":"execute_result","data":{"text/plain":["0 78.0\n","1 55.0\n","2 49.0\n","3 47.0\n","4 78.0\n"," ... \n","1597
71.0\n","1598 75.0\n","1599 63.0\n","1600 77.0\n","1601 61.0\n","Name: W2, Length: 1602, dtype: float64"],"text/html":
["
dtype: float64"]},"metadata":{},"execution_count":17}]},{"cell_type":"code","source":["df_csv.info()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"0PRRk1cXpX4D","executionInfo":
{"status":"ok","timestamp":1731168111466,"user_tz":-360,"elapsed":99,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"61075d9b-9b2c-4617-c778-
63cbe73173e6"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","RangeIndex:
1602 entries, 0 to 1601\n","Data columns (total 8 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ -------------
- ----- \n"," 0 Feet 1602 non-null int64 \n"," 1 Inch 1602 non-null float64\n"," 2 Hieght (cm) 1602 non-null float64\n"," 3
Hieght (m2) 1602 non-null float64\n"," 4 W1 1602 non-null float64\n"," 5 W2 1602 non-null float64\n"," 6 BMI (Before
COVID) 1602 non-null float64\n"," 7 BMI (During COVID) 1602 non-null float64\n","dtypes: float64(7),
int64(1)\n","memory usage: 100.2 KB\n"]}]},{"cell_type":"code","source":["# Select a single column\n","print(\"W1
column:\\n\", df_csv['W1'])\n"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"hcANMGIMpeBx","executionInfo":
{"status":"ok","timestamp":1731168111466,"user_tz":-360,"elapsed":96,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"984da36f-e035-4a60-afe6-
79ee959ed2f7"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["W1 column:\n","
0 70.0\n","1 51.0\n","2 44.0\n","3 49.0\n","4 75.0\n"," ... \n","1597 68.0\n","1598 76.0\n","1599 67.0\n","1600
70.0\n","1601 63.0\n","Name: W1, Length: 1602, dtype: float64\n"]}]},{"cell_type":"code","source":["# Filter rows based
on a condition\n","print(\"\\nRows where W1 < 70:\\n\", df_csv[df_csv['W1'] < 70])"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"EJmNX_YXpnyJ","executionInfo":
{"status":"ok","timestamp":1731168111466,"user_tz":-360,"elapsed":94,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"912222aa-ed12-4709-8884-
4c84d31cb8b4"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","Rows where W1
< 70:\n"," Feet Inch Hieght (cm) Hieght (m2) W1 W2 BMI (Before COVID) \\\n","1 5 1.0 154.94 2.400640 51.0 55.0
21.244332 \n","2 5 0.0 152.40 2.322576 44.0 49.0 18.944482 \n","3 5 1.0 154.94 2.400640 49.0 47.0 20.411221 \n","6
5 6.0 167.64 2.810317 68.0 74.0 24.196559 \n","7 5 8.0 172.72 2.983220 69.0 72.0 23.129372 \n","... ... ... ... ... ... ...
... \n","1595 4 11.0 149.86 2.245802 59.0 62.0 26.271239 \n","1596 5 8.0 172.72 2.983220 67.0 64.0 22.458955
\n","1597 5 9.0 175.26 3.071607 68.0 71.0 22.138251 \n","1599 5 6.0 167.64 2.810317 67.0 63.0 23.840727 \n","1601
5 3.0 160.02 2.560640 63.0 61.0 24.603224 \n","\n"," BMI (During COVID) \n","1 22.910554 \n","2 21.097264 \n","3
19.578110 \n","6 26.331549 \n","7 24.134996 \n","... ... \n","1595 27.607065 \n","1596 21.453330 \n","1597 23.114938
\n","1599 22.417400 \n","1601 23.822169 \n","\n","[1227 rows x 8 columns]\n"]}]},{"cell_type":"code","source":["# Label-
based indexing\n","print(\"\\nSelect specific row and column with .loc:\\n\", df_csv.loc[1598, 'Hieght (cm)'])"],"metadata":
{"colab":{"base_uri":"https://localhost:8080/"},"id":"Vl2Cx0OWpzsp","executionInfo":
{"status":"ok","timestamp":1731168111466,"user_tz":-360,"elapsed":91,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"7abe114b-bc4e-4d1c-9fa4-
5218068fca28"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","Select specific
row and column with .loc:\n"," 182.88\n"]}]},{"cell_type":"code","source":["df_csv.iloc[1550,5]\n"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"O4VbxEmWqJx6","executionInfo":
{"status":"ok","timestamp":1731168111466,"user_tz":-360,"elapsed":87,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"cd32bd0c-623b-425b-bc09-
9388c5b4a577"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":
["71.0"]},"metadata":{},"execution_count":14}]},{"cell_type":"code","source":["df_csv.loc[1550,'W2']"],"metadata":
{"colab":{"base_uri":"https://localhost:8080/"},"id":"p8vfUiZxqZuw","executionInfo":
{"status":"ok","timestamp":1731168111467,"user_tz":-360,"elapsed":69,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"c535fc83-17ed-4aac-f9e6-
772d0a412913"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":
["71.0"]},"metadata":{},"execution_count":15}]},{"cell_type":"code","source":["# Label-based indexing\n","print(\"\\nSelect
specific row and column with .loc:\\n\", df.loc[1, 'Name'])\n","\n","# Integer-based indexing\n","print(\"\\nSelect specific
row and column with .iloc:\\n\", df.iloc[1, 0])"],"metadata":{"id":"dl7ygBXcpdDZ","colab":
{"base_uri":"https://localhost:8080/"},"executionInfo":
{"status":"ok","timestamp":1731168111467,"user_tz":-360,"elapsed":57,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"6351b174-424a-402a-9bad-
c33824731af8"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","Select specific
row and column with .loc:\n"," Anna\n","\n","Select specific row and column with .iloc:\n"," Anna\n"]}]},
{"cell_type":"code","source":["df_csv.info()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"ez68Btg08Kuq","executionInfo":
{"status":"ok","timestamp":1731168111467,"user_tz":-360,"elapsed":38,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"805908be-3021-4612-8a12-
0accfd1e0152"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","RangeIndex:
1602 entries, 0 to 1601\n","Data columns (total 8 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ -------------
- ----- \n"," 0 Feet 1602 non-null int64 \n"," 1 Inch 1602 non-null float64\n"," 2 Hieght (cm) 1602 non-null float64\n"," 3
Hieght (m2) 1602 non-null float64\n"," 4 W1 1602 non-null float64\n"," 5 W2 1602 non-null float64\n"," 6 BMI (Before
COVID) 1602 non-null float64\n"," 7 BMI (During COVID) 1602 non-null float64\n","dtypes: float64(7),
int64(1)\n","memory usage: 100.2 KB\n"]}]},{"cell_type":"code","source":["df_csv.loc[:,'BMI (During
COVID)']"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":458},"id":"BoGV74Lm8S38","executionInfo":
{"status":"ok","timestamp":1731168111467,"user_tz":-360,"elapsed":35,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"0a08908d-3894-4442-9c94-
a3b07e4ec8a7"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0
27.754876\n","1 22.910554\n","2 21.097264\n","3 19.578110\n","4 30.461134\n"," ... \n","1597 23.114938\n","1598
22.424813\n","1599 22.417400\n","1600 21.217820\n","1601 23.822169\n","Name: BMI (During COVID), Length:
1602, dtype: float64"],"text/html":["
\n","