Pandas

You can Learn Pandas

{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"toc_visible":true},"kernelspec":

{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":
[{"cell_type":"markdown","source":["# Guideline to Exploratory Data Analysis\n","\n","A standard guideline for exploratory
data analysis (EDA) that you can follow:\n","\n","- **Understand the Data:** Familiarize yourself with the dataset and
the variables it contains. Read any available documentation or data dictionary to gain insights into the meaning and
context of the data.\n","\n","- **Data Cleaning:** Preprocess and clean the data to handle missing values, outliers, and
inconsistencies. This step ensures the data is in a usable format for analysis.\n","\n","- **Descriptive Statistics:**
Calculate summary statistics (mean, median, mode, standard deviation, etc.) to gain a high-level understanding of the
data distribution and central tendencies.\n","\n","- **Data Visualization:** Visualize the data using various charts,
graphs, and plots to identify patterns, trends, and relationships. Common visualizations include histograms, box plots,
scatter plots, bar charts, and line plots.\n","\n","- **Univariate Analysis:** Analyze individual variables in isolation to
understand their distributions, skewness, and potential outliers. Use appropriate visualizations and statistical measures
for univariate analysis.\n","\n","- **Bivariate Analysis:** Explore relationships between pairs of variables to identify
correlations, associations, or dependencies. Scatter plots, heatmaps, correlation matrices, and statistical tests (e.g.,
Pearson correlation coefficient) can be useful for bivariate analysis.\n","\n","- **Multivariate Analysis:** Investigate
interactions and dependencies among multiple variables simultaneously. Techniques such as dimensionality reduction
(e.g., PCA), cluster analysis, and parallel coordinates plots can aid in understanding complex relationships.\n","\n","-
**Feature Engineering:** Create new derived features or transform existing features to enhance the predictive power of
the data. This step may involve techniques like scaling, binning, one-hot encoding, or creating interaction
variables.\n","\n","- **Data Quality Check:** Verify the quality of the data by checking for data integrity, consistency, and
accuracy. Address any issues or discrepancies that may affect the reliability of the analysis.\n","\n","- **Statistical
Testing:** Conduct statistical tests, such as t-tests, chi-square tests, or ANOVA, to evaluate hypotheses, compare
groups, or identify significant differences in the data.\n","\n","- **Documentation:** Document your findings, insights,
and any decisions made during the EDA process. Prepare clear and concise summaries, visualizations, and reports
that effectively communicate the results of your analysis.\n","\n"],"metadata":{"id":"x5mH3McmMxuU"}},
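A few of the steps listed above (data cleaning, descriptive statistics, bivariate analysis, feature engineering) can be sketched in pandas as follows. This is a minimal illustration rather than the notebook's own workflow: the DataFrame `df_demo`, its column names, and its values are hypothetical and invented for the example.

```python
import pandas as pd

# Hypothetical toy data, invented only to illustrate the steps
df_demo = pd.DataFrame({
    "height_cm": [167.6, 154.9, 152.4, None, 160.0],
    "weight_kg": [70.0, 51.0, 44.0, 49.0, 75.0],
    "group": ["a", "b", "a", "b", "a"],
})

# Data cleaning: count missing values per column, then drop incomplete rows
missing_per_column = df_demo.isna().sum()
df_clean = df_demo.dropna()

# Descriptive statistics for the numeric columns
summary = df_clean.describe()

# Bivariate analysis: Pearson correlation between two variables
corr = df_clean["height_cm"].corr(df_clean["weight_kg"])

# Feature engineering: one-hot encode the categorical column
encoded = pd.get_dummies(df_clean, columns=["group"])

print(missing_per_column)
print(summary)
print(round(corr, 3))
print(encoded.columns.tolist())
```

Dropping rows is only one way to handle missing values; imputing them may suit a real dataset better.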
{"cell_type":"markdown","source":["# Load dataset"],"metadata":{"id":"8sxU6JfQXkAY"}},{"cell_type":"code","source":
["!gdown --id 1Qk5FZxfA_jhDcxI3YmuEIbVgd8ZeldMn"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"Gay3yccNXjto","outputId":"f3b944cd-743d-4b7f-c404-
1fa10e36050c","executionInfo":{"status":"ok","timestamp":1731237952455,"user_tz":-360,"elapsed":5322,"user":
{"displayName":"Mohammad Rifat Ahmmad
Rashid","userId":"17207620860184690696"}}},"execution_count":null,"outputs":
[{"output_type":"stream","name":"stdout","text":["/usr/local/lib/python3.10/dist-packages/gdown/__main__.py:140:
FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. You don't need to pass it
anymore to use a file ID.\n"," warnings.warn(\n","Downloading...\n","From: https://drive.google.com/uc?
id=1Qk5FZxfA_jhDcxI3YmuEIbVgd8ZeldMn\n","To: /content/BMI Calculation_MJH.xlsx\n","100% 138k/138k
[00:00<00:00, 60.5MB/s]\n"]}]},{"cell_type":"markdown","source":["# Part 1: Basics of Pandas Data
Structures"],"metadata":{"id":"xuNEetqDX_y1"}},{"cell_type":"markdown","source":["### DataFrame
Creation\n","Objective: Learn to create a Pandas DataFrame from various sources."],"metadata":
{"id":"EURjDyLTXH0m"}},{"cell_type":"code","source":["import pandas as pd\n","\n","# Create a DataFrame from a
Python dictionary\n","data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],\n"," 'Age': [28, 34, 29, 32],\n"," 'City': ['New York',
'Paris', 'Berlin', 'London']}\n","\n","employee = {'id' : [\"emp-1\",\"emp-2\"],\n"," 'name': ['emp1-name','emp2-name'],\n","
'salary': [50000,6000]\n"," }\n","\n","df = pd.DataFrame(data)\n","\n","df_employee =
pd.DataFrame(employee)\n","df_employee"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":125},"id":"VuXBMGSh1OgO","executionInfo":
{"status":"ok","timestamp":1731238247696,"user_tz":-360,"elapsed":384,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"4ded653c-fe00-4cf3-e900-
1eb5bc16b587"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" id name
salary\n","0 emp-1 emp1-name 50000\n","1 emp-2 emp2-name 6000"],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":
{"type":"dataframe","variable_name":"df_employee","summary":"{\n \"name\": \"df_employee\",\n \"rows\": 2,\n \"fields\":
[\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"emp-
2\",\n \"emp-1\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"name\",\n \"properties\": {\n
\"dtype\": \"string\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"emp2-name\",\n \"emp1-name\"\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"salary\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 31112,\n \"min\": 6000,\n \"max\": 50000,\n \"num_unique_values\": 2,\n \"samples\": [\n 6000,\n
50000\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":4}]},
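The cell above builds DataFrames from dictionaries of columns. Two other common in-memory sources are a list of dictionaries (one per row) and a list of rows with explicit column names. The sketch below reuses the employee and name/age values from the cell above; the variable names `records`, `rows`, `df_records`, and `df_rows` are introduced here for illustration only.

```python
import pandas as pd

# A list of dictionaries: each dictionary becomes one row
records = [
    {"id": "emp-1", "name": "emp1-name", "salary": 50000},
    {"id": "emp-2", "name": "emp2-name", "salary": 6000},
]
df_records = pd.DataFrame(records)

# A list of rows plus explicit column names
rows = [["John", 28], ["Anna", 34]]
df_rows = pd.DataFrame(rows, columns=["Name", "Age"])

print(df_records)
print(df_rows)
```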
{"cell_type":"code","execution_count":null,"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":258},"id":"REkZVYhtVQaY","outputId":"e1b31506-d5cc-4416-c2cf-
70f524494953","executionInfo":{"status":"ok","timestamp":1731238369211,"user_tz":-360,"elapsed":1094,"user":
{"displayName":"Mohammad Rifat Ahmmad Rashid","userId":"17207620860184690696"}}},"outputs":
[{"output_type":"stream","name":"stdout","text":["\n","Loaded DataFrame from a Excel file:\n","\n"]},
{"output_type":"execute_result","data":{"text/plain":[" Feet Inch Hieght (cm) Hieght (m2) W1 W2 BMI (Before COVID)
\\\n","0 5 6.0 167.64 2.810317 70.0 78.0 24.908222 \n","1 5 1.0 154.94 2.400640 51.0 55.0 21.244332 \n","2 5 0.0
152.40 2.322576 44.0 49.0 18.944482 \n","3 5 1.0 154.94 2.400640 49.0 47.0 20.411221 \n","4 5 3.0 160.02 2.560640
75.0 78.0 29.289552 \n","\n"," BMI (During COVID) \n","0 27.754876 \n","1 22.910554 \n","2 21.097264 \n","3
19.578110 \n","4 30.461134 "],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_excel","summary":"{\n
\"name\": \"df_excel\",\n \"rows\": 1602,\n \"fields\": [\n {\n \"column\": \"Feet\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 0,\n \"min\": 4,\n \"max\": 7,\n \"num_unique_values\": 4,\n \"samples\": [\n 6,\n 7,\n 5\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Inch\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 3.0922374995274406,\n \"min\": 0.0,\n \"max\": 33.0,\n \"num_unique_values\": 21,\n \"samples\": [\n 6.0,\n
7.5,\n 8.5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Hieght (cm)\",\n \"properties\": {\n
\"dtype\": \"number\",\n \"std\": 9.513993865916744,\n \"min\": 121.92,\n \"max\": 236.22,\n \"num_unique_values\":
40,\n \"samples\": [\n 185.42000000000002,\n 158.75,\n 200.66\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n
},\n {\n \"column\": \"Hieght (m2)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.31662606270109006,\n \"min\":
1.48644864,\n \"max\": 5.57998884,\n \"num_unique_values\": 40,\n \"samples\": [\n 3.4380576400000002,\n
2.52015625,\n 4.026443560000001\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"W1\",\n
\"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.335592024444987,\n \"min\": 3.0,\n \"max\": 118.0,\n
\"num_unique_values\": 78,\n \"samples\": [\n 80.0,\n 70.0,\n 71.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n
}\n },\n {\n \"column\": \"W2\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 21.01654716375466,\n \"min\": 0.4,\n
\"max\": 748.0,\n \"num_unique_values\": 102,\n \"samples\": [\n 61.8,\n 46.2,\n 37.0\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (Before COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
4.117923349181555,\n \"min\": 1.1352561767623535,\n \"max\": 40.47230316682855,\n \"num_unique_values\":
577,\n \"samples\": [\n 26.045000630223903,\n 27.151870912738364,\n 36.15167580189387\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (During COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 7.678420507393607,\n \"min\": 0.15621094482299824,\n \"max\": 283.05720673941346,\n
\"num_unique_values\": 600,\n \"samples\": [\n 26.696125645979503,\n 28.903053243519697,\n
24.432400150219703\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":
{},"execution_count":5}],"source":["# Assumes 'BMI Calculation_MJH.xlsx' was downloaded to /content by the gdown cell above\n","\n","# For a CSV
file, use pd.read_csv(\"file path\") instead\n","\n","df_excel = pd.read_excel('/content/BMI
Calculation_MJH.xlsx')\n","\n","print(\"\\nLoaded DataFrame from a Excel file:\\n\")\n","df_excel.head()\n"]},
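The `head()` output of this cell shows `W1`, `Hieght (m2)`, and `BMI (Before COVID)` side by side, and the values appear consistent with the usual formula BMI = weight (kg) / height squared (m²). A minimal sketch of that check follows, using only the five rows printed above so that it runs without the Excel file; the names `sample`, `recomputed`, and `max_diff` are introduced here for illustration.

```python
import pandas as pd

# First five rows, copied from the df_excel.head() output above
sample = pd.DataFrame({
    "Hieght (m2)": [2.810317, 2.400640, 2.322576, 2.400640, 2.560640],
    "W1": [70.0, 51.0, 44.0, 49.0, 75.0],
    "BMI (Before COVID)": [24.908222, 21.244332, 18.944482, 20.411221, 29.289552],
})

# BMI = weight (kg) / height squared (m^2)
recomputed = sample["W1"] / sample["Hieght (m2)"]

# Largest absolute difference between the stored and recomputed BMI
max_diff = (recomputed - sample["BMI (Before COVID)"]).abs().max()
print(max_diff)
```

The same comparison could be run on the full `df_excel` to flag rows where the stored BMI and the recomputed value disagree.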
{"cell_type":"markdown","source":["The dataset consists of the following columns:\n","\n","* **Feet and Inch:**
The two parts of each individual's height. Taken separately they look like discrete values, but height is usually
treated as continuous once the parts are combined into a single measure (e.g., total inches or centimeters).\n","*
**Hieght (cm) and Hieght (m2):** Continuous variables holding height in centimeters and squared height in square
meters, the denominator of the BMI formula. The names are misspelled in the source file, so code must use the spelling `Hieght`.\n","* **W1 and W2:** Continuous
variables holding each individual's weight in kilograms before and during COVID.\n","* **BMI (Before COVID)
and BMI (During COVID):** Continuous variables holding each individual's Body Mass Index before and during the
COVID pandemic."],"metadata":{"id":"-sU4xT-rnViv"}},{"cell_type":"markdown","source":["### Viewing and Inspecting
Data\n","Objective: Become familiar with methods to view and inspect DataFrame properties."],"metadata":
{"id":"BY3qmLY2XzPO"}},{"cell_type":"code","source":["# Viewing the first few rows\n","print(df.head())\n","\n","#
Viewing the last few rows\n","print(df.tail())\n","\n","# Getting info about DataFrame\n","df.info()\n","\n","# Descriptive
statistics for numerical columns\n","print(df.describe())\n"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"jDjW7jAPXLd4","outputId":"2ed5a7fd-c68b-4c80-d945-
69dafe26e9db","executionInfo":{"status":"ok","timestamp":1731168111463,"user_tz":-360,"elapsed":130,"user":
{"displayName":"Mohammad Rifat Ahmmad
Rashid","userId":"17207620860184690696"}}},"execution_count":null,"outputs":
[{"output_type":"stream","name":"stdout","text":[" Name Age City\n","0 John 28 New York\n","1 Anna 34 Paris\n","2
Peter 29 Berlin\n","3 Linda 32 London\n"," Name Age City\n","0 John 28 New York\n","1 Anna 34 Paris\n","2 Peter 29
Berlin\n","3 Linda 32 London\n","\n","RangeIndex: 4 entries, 0 to 3\n","Data columns (total 3 columns):\n"," # Column
Non-Null Count Dtype \n","--- ------ -------------- ----- \n"," 0 Name 4 non-null object\n"," 1 Age 4 non-null int64 \n"," 2 City
4 non-null object\n","dtypes: int64(1), object(2)\n","memory usage: 224.0+ bytes\n"," Age\n","count 4.000000\n","mean
30.750000\n","std 2.753785\n","min 28.000000\n","25% 28.750000\n","50% 30.500000\n","75% 32.500000\n","max
34.000000\n"]}]},{"cell_type":"code","source":["# Viewing the first few rows\n","print(df_excel.head())"],"metadata":
{"colab":{"base_uri":"https://localhost:8080/"},"id":"JFux6YA3oGpJ","executionInfo":
{"status":"ok","timestamp":1731238437582,"user_tz":-360,"elapsed":603,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"de76d65e-16d2-4b44-8169-
a179570c34a6"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":[" Feet Inch Hieght
(cm) Hieght (m2) W1 W2 BMI (Before COVID) \\\n","0 5 6.0 167.64 2.810317 70.0 78.0 24.908222 \n","1 5 1.0 154.94
2.400640 51.0 55.0 21.244332 \n","2 5 0.0 152.40 2.322576 44.0 49.0 18.944482 \n","3 5 1.0 154.94 2.400640 49.0
47.0 20.411221 \n","4 5 3.0 160.02 2.560640 75.0 78.0 29.289552 \n","\n"," BMI (During COVID) \n","0 27.754876
\n","1 22.910554 \n","2 21.097264 \n","3 19.578110 \n","4 30.461134 \n"]}]},{"cell_type":"code","source":["# Viewing the
last few rows\n","print(df_excel.tail())"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"6r8ynfOfoJC5","executionInfo":
{"status":"ok","timestamp":1731238452588,"user_tz":-360,"elapsed":473,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"523ddccd-c731-47e2-d199-
5ba961256ebc"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":[" Feet Inch Hieght
(cm) Hieght (m2) W1 W2 BMI (Before COVID) \\\n","1597 5 9.0 175.26 3.071607 68.0 71.0 22.138251 \n","1598 6 0.0
182.88 3.344509 76.0 75.0 22.723811 \n","1599 5 6.0 167.64 2.810317 67.0 63.0 23.840727 \n","1600 6 3.0 190.50
3.629025 70.0 77.0 19.288927 \n","1601 5 3.0 160.02 2.560640 63.0 61.0 24.603224 \n","\n"," BMI (During COVID)
\n","1597 23.114938 \n","1598 22.424813 \n","1599 22.417400 \n","1600 21.217820 \n","1601 23.822169 \n"]}]},
{"cell_type":"code","source":["# Getting info about DataFrame\n","df_excel.info()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"g6q6-lnVoL15","executionInfo":
{"status":"ok","timestamp":1731238467338,"user_tz":-360,"elapsed":367,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"1aafd189-8ece-4467-84d7-
ddcd1662540f"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","RangeIndex:
1602 entries, 0 to 1601\n","Data columns (total 8 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ -------------
- ----- \n"," 0 Feet 1602 non-null int64 \n"," 1 Inch 1602 non-null float64\n"," 2 Hieght (cm) 1602 non-null float64\n"," 3
Hieght (m2) 1602 non-null float64\n"," 4 W1 1602 non-null float64\n"," 5 W2 1602 non-null float64\n"," 6 BMI (Before
COVID) 1602 non-null float64\n"," 7 BMI (During COVID) 1602 non-null float64\n","dtypes: float64(7),
int64(1)\n","memory usage: 100.2 KB\n"]}]},{"cell_type":"code","source":["# Descriptive statistics for numerical
columns\n","df_excel.describe()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":355},"id":"blNoDNguoAHK","executionInfo":
{"status":"ok","timestamp":1731238505722,"user_tz":-360,"elapsed":420,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"befd1217-09a8-4b3e-e69b-
414407afdd7e"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Feet Inch Hieght
(cm) Hieght (m2) W1 \\\n","count 1602.000000 1602.000000 1602.000000 1602.000000 1602.000000 \n","mean
4.990012 4.876092 164.480855 2.714441 61.701623 \n","std 0.275867 3.092237 9.513994 0.316626 12.335592
\n","min 4.000000 0.000000 121.920000 1.486449 3.000000 \n","25% 5.000000 2.000000 157.480000 2.479995
53.000000 \n","50% 5.000000 5.000000 165.100000 2.725801 61.000000 \n","75% 5.000000 7.000000 170.180000
2.896123 69.000000 \n","max 7.000000 33.000000 236.220000 5.579989 118.000000 \n","\n"," W2 BMI (Before
COVID) BMI (During COVID) \n","count 1602.000000 1602.000000 1602.000000 \n","mean 63.661236 22.790117
23.525109 \n","std 21.016547 4.117923 7.678421 \n","min 0.400000 1.135256 0.156211 \n","25% 55.000000
20.112497 20.638241 \n","50% 62.000000 22.434305 23.129064 \n","75% 71.000000 24.993751 25.619886 \n","max
748.000000 40.472303 283.057207 "],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","summary":"{\n \"name\": \"df_excel\",\n
\"rows\": 8,\n \"fields\": [\n {\n \"column\": \"Feet\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
564.8165691141892,\n \"min\": 0.2758669257875601,\n \"max\": 1602.0,\n \"num_unique_values\": 6,\n \"samples\":
[\n 1602.0,\n 4.990012484394507,\n 7.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\":
\"Inch\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 563.7136218772606,\n \"min\": 0.0,\n \"max\": 1602.0,\n
\"num_unique_values\": 8,\n \"samples\": [\n 4.8760923845193505,\n 5.0,\n 1602.0\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"Hieght (cm)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
518.6052678519698,\n \"min\": 9.513993865916744,\n \"max\": 1602.0,\n \"num_unique_values\": 8,\n \"samples\": [\n
164.48085518102374,\n 165.1,\n 1602.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\":
\"Hieght (m2)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 565.4752755075044,\n \"min\":
0.31662606270109006,\n \"max\": 1602.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 2.714441129843945,\n
2.725801,\n 1602.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"W1\",\n \"properties\":
{\n \"dtype\": \"number\",\n \"std\": 548.4417973432782,\n \"min\": 3.0,\n \"max\": 1602.0,\n \"num_unique_values\": 8,\n
\"samples\": [\n 61.70162297128589,\n 61.0,\n 1602.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n
\"column\": \"W2\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 570.9948056032903,\n \"min\": 0.4,\n \"max\":
1602.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 63.661235955056185,\n 62.0,\n 1602.0\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (Before COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 559.6564305916718,\n \"min\": 1.1352561767623535,\n \"max\": 1602.0,\n \"num_unique_values\": 8,\n
\"samples\": [\n 22.79011745578402,\n 22.434304824015044,\n 1602.0\n ],\n \"semantic_type\": \"\",\n \"description\":
\"\"\n }\n },\n {\n \"column\": \"BMI (During COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
554.9553142069764,\n \"min\": 0.15621094482299824,\n \"max\": 1602.0,\n \"num_unique_values\": 8,\n \"samples\":
[\n 23.525108794387545,\n 23.129063705326676,\n 1602.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n
]\n}"}},"metadata":{},"execution_count":9}]},{"cell_type":"code","source":["df_excel['BMI (Before
COVID)'].mean()"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"WsXVnopq6qvS","executionInfo":
{"status":"ok","timestamp":1731238559647,"user_tz":-360,"elapsed":380,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"13399a7e-6b4e-44e8-9ea4-
aa62ab8b5c8a"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":
["22.79011745578402"]},"metadata":{},"execution_count":11}]},{"cell_type":"code","source":["!gdown --id
1b2_5EdoTVS5XbuuhTUYXl5tyuNk7pxML"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"1PqJaiIi3ebl","executionInfo":
{"status":"ok","timestamp":1731238712036,"user_tz":-360,"elapsed":4314,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"f198105e-56de-47ca-af54-
7a9651fe727d"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":
["/usr/local/lib/python3.10/dist-packages/gdown/__main__.py:140: FutureWarning: Option `--id` was deprecated in
version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.\n","
warnings.warn(\n","Downloading...\n","From: https://drive.google.com/uc?
id=1b2_5EdoTVS5XbuuhTUYXl5tyuNk7pxML\n","To: /content/user_behavior_dataset.csv\n","100% 38.9k/38.9k
[00:00<00:00, 50.7MB/s]\n"]}]},{"cell_type":"code","source":["df_csv =
pd.read_csv('/content/user_behavior_dataset.csv')\n","df_csv.head()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":347},"id":"_LsS5lci3pBt","executionInfo":
{"status":"ok","timestamp":1731238747209,"user_tz":-360,"elapsed":393,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"65599d7e-f45e-40a3-9842-
d6860088d39b"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" User ID Device
Model Operating System App Usage Time (min/day) \\\n","0 1 Google Pixel 5 Android 393 \n","1 2 OnePlus 9 Android
268 \n","2 3 Xiaomi Mi 11 Android 154 \n","3 4 Google Pixel 5 Android 239 \n","4 5 iPhone 12 iOS 187 \n","\n"," Screen
On Time (hours/day) Battery Drain (mAh/day) \\\n","0 6.4 1872 \n","1 4.7 1331 \n","2 4.0 761 \n","3 4.8 1676 \n","4 4.3
1367 \n","\n"," Number of Apps Installed Data Usage (MB/day) Age Gender \\\n","0 67 1122 40 Male \n","1 42 944 47
Female \n","2 32 322 42 Male \n","3 56 871 20 Male \n","4 58 988 31 Female \n","\n"," User Behavior Class \n","0 4
\n","1 3 \n","2 2 \n","3 3 \n","4 3 "],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_csv","summary":"{\n
\"name\": \"df_csv\",\n \"rows\": 700,\n \"fields\": [\n {\n \"column\": \"User ID\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 202,\n \"min\": 1,\n \"max\": 700,\n \"num_unique_values\": 700,\n \"samples\": [\n 159,\n 501,\n
397\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Device Model\",\n \"properties\": {\n
\"dtype\": \"category\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"OnePlus 9\",\n \"Samsung Galaxy S21\",\n
\"Xiaomi Mi 11\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Operating System\",\n
\"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"iOS\",\n \"Android\"\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"App Usage Time (min/day)\",\n \"properties\": {\n
\"dtype\": \"number\",\n \"std\": 177,\n \"min\": 30,\n \"max\": 598,\n \"num_unique_values\": 387,\n \"samples\": [\n
582,\n 402\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Screen On Time (hours/day)\",\n
\"properties\": {\n \"dtype\": \"number\",\n \"std\": 3.068583910273257,\n \"min\": 1.0,\n \"max\": 12.0,\n
\"num_unique_values\": 108,\n \"samples\": [\n 10.8,\n 1.4\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n
{\n \"column\": \"Battery Drain (mAh/day)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 819,\n \"min\": 302,\n
\"max\": 2993,\n \"num_unique_values\": 628,\n \"samples\": [\n 2597,\n 1632\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"Number of Apps Installed\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 26,\n \"min\": 10,\n \"max\": 99,\n \"num_unique_values\": 86,\n \"samples\": [\n 79,\n 67\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Data Usage (MB/day)\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 640,\n \"min\": 102,\n \"max\": 2497,\n \"num_unique_values\": 585,\n \"samples\": [\n 839,\n 765\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 12,\n \"min\": 18,\n \"max\": 59,\n \"num_unique_values\": 42,\n \"samples\": [\n 56,\n 26\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Gender\",\n \"properties\": {\n \"dtype\": \"category\",\n
\"num_unique_values\": 2,\n \"samples\": [\n \"Female\",\n \"Male\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n
}\n },\n {\n \"column\": \"User Behavior Class\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n
\"max\": 5,\n \"num_unique_values\": 5,\n \"samples\": [\n 3,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n
}\n ]\n}"}},"metadata":{},"execution_count":13}]},{"cell_type":"code","source":["df_csv.info()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"YaiMAStr4d9N","executionInfo":
{"status":"ok","timestamp":1731238952694,"user_tz":-360,"elapsed":4,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"4b5b4e6d-8d80-4589-95ba-
3a6c5568c0ab"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","RangeIndex:
700 entries, 0 to 699\n","Data columns (total 11 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ --------------
----- \n"," 0 User ID 700 non-null int64 \n"," 1 Device Model 700 non-null object \n"," 2 Operating System 700 non-null
object \n"," 3 App Usage Time (min/day) 700 non-null int64 \n"," 4 Screen On Time (hours/day) 700 non-null float64\n","
5 Battery Drain (mAh/day) 700 non-null int64 \n"," 6 Number of Apps Installed 700 non-null int64 \n"," 7 Data Usage
(MB/day) 700 non-null int64 \n"," 8 Age 700 non-null int64 \n"," 9 Gender 700 non-null object \n"," 10 User Behavior
Class 700 non-null int64 \n","dtypes: float64(1), int64(7), object(3)\n","memory usage: 60.3+ KB\n"]}]},
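The `info()` output above lists three `object` columns (`Device Model`, `Operating System`, `Gender`). By default `describe()` summarizes numeric columns only, which is why the earlier `df.describe()` output shows just `Age`. Passing `include="all"` adds count, unique, top, and freq rows for the text columns. The sketch below uses four columns of the five rows printed by `df_csv.head()` so that it runs without the CSV file; the name `usage_sample` is introduced here for illustration.

```python
import pandas as pd

# Four columns of the first five rows, copied from the df_csv.head() output above
usage_sample = pd.DataFrame({
    "Device Model": ["Google Pixel 5", "OnePlus 9", "Xiaomi Mi 11",
                     "Google Pixel 5", "iPhone 12"],
    "Operating System": ["Android", "Android", "Android", "Android", "iOS"],
    "Age": [40, 47, 42, 20, 31],
    "Gender": ["Male", "Female", "Male", "Male", "Female"],
})

# Default behaviour: only the numeric column (Age) is summarized
numeric_summary = usage_sample.describe()

# include="all" also summarizes the text columns
full_summary = usage_sample.describe(include="all")

print(numeric_summary.columns.tolist())
print(full_summary)
```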
{"cell_type":"code","source":["df.describe()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":300},"id":"VOrDqXS84hNF","executionInfo":
{"status":"ok","timestamp":1731238968808,"user_tz":-360,"elapsed":397,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"b8db2684-06ca-45d3-ce11-
d6462611ef84"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Age\n","count
4.000000\n","mean 30.750000\n","std 2.753785\n","min 28.000000\n","25% 28.750000\n","50% 30.500000\n","75%
32.500000\n","max 34.000000"],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","summary":"{\n \"name\": \"df\",\n \"rows\":
8,\n \"fields\": [\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.817159456014078,\n
\"min\": 2.753785273643051,\n \"max\": 34.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 30.75,\n 30.5,\n 4.0\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":15}]},
{"cell_type":"markdown","source":["### Indexing, Selecting, and Filtering\n","Objective: Select and filter DataFrame
rows and columns."],"metadata":{"id":"4JZ3LctdX4gs"}},{"cell_type":"code","source":["# Select specific
columns\n","print(\"Names column:\\n\", df['Name'])\n","\n","# Filter rows based on a condition\n","print(\"\\nRows where
Age > 30:\\n\", df[df['Age'] > 30])\n","\n","# Label-based indexing\n","print(\"\\nSelect specific row and column with
.loc:\\n\",\n"," df.loc[1, 'Name'])\n","\n","# Integer-based indexing\n","print(\"\\nSelect specific row and column with
.iloc:\\n\",\n"," df.iloc[1, 0])\n"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"2L36VKjNX2I3","outputId":"4e29ae83-f17d-4a4c-fb9a-
14025a89e832","executionInfo":{"status":"ok","timestamp":1731168111465,"user_tz":-360,"elapsed":101,"user":
{"displayName":"Mohammad Rifat Ahmmad
Rashid","userId":"17207620860184690696"}}},"execution_count":null,"outputs":
[{"output_type":"stream","name":"stdout","text":["Names column:\n"," 0 John\n","1 Anna\n","2 Peter\n","3
Linda\n","Name: Name, dtype: object\n","\n","Rows where Age > 30:\n"," Name Age City\n","1 Anna 34 Paris\n","3
Linda 32 London\n","\n","Select specific row and column with .loc:\n"," Anna\n","\n","Select specific row and column
with .iloc:\n"," Anna\n"]}]},{"cell_type":"code","source":["df_excel.head()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":206},"id":"MYPCS-nw56At","executionInfo":
{"status":"ok","timestamp":1731239330808,"user_tz":-360,"elapsed":402,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"571042dd-b2c3-4a59-cbfc-
b0188f126711"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Feet Inch Hieght
(cm) Hieght (m2) W1 W2 BMI (Before COVID) \\\n","0 5 6.0 167.64 2.810317 70.0 78.0 24.908222 \n","1 5 1.0 154.94
2.400640 51.0 55.0 21.244332 \n","2 5 0.0 152.40 2.322576 44.0 49.0 18.944482 \n","3 5 1.0 154.94 2.400640 49.0
47.0 20.411221 \n","4 5 3.0 160.02 2.560640 75.0 78.0 29.289552 \n","\n"," BMI (During COVID) \n","0 27.754876
\n","1 22.910554 \n","2 21.097264 \n","3 19.578110 \n","4 30.461134 "],
"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_excel","summary":"{\n
\"name\": \"df_excel\",\n \"rows\": 1602,\n \"fields\": [\n {\n \"column\": \"Feet\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 0,\n \"min\": 4,\n \"max\": 7,\n \"num_unique_values\": 4,\n \"samples\": [\n 6,\n 7,\n 5\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Inch\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 3.0922374995274406,\n \"min\": 0.0,\n \"max\": 33.0,\n \"num_unique_values\": 21,\n \"samples\": [\n 6.0,\n
7.5,\n 8.5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Hieght (cm)\",\n \"properties\": {\n
\"dtype\": \"number\",\n \"std\": 9.513993865916744,\n \"min\": 121.92,\n \"max\": 236.22,\n \"num_unique_values\":
40,\n \"samples\": [\n 185.42000000000002,\n 158.75,\n 200.66\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n
},\n {\n \"column\": \"Hieght (m2)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.31662606270109006,\n \"min\":
1.48644864,\n \"max\": 5.57998884,\n \"num_unique_values\": 40,\n \"samples\": [\n 3.4380576400000002,\n
2.52015625,\n 4.026443560000001\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"W1\",\n
\"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.335592024444987,\n \"min\": 3.0,\n \"max\": 118.0,\n
\"num_unique_values\": 78,\n \"samples\": [\n 80.0,\n 70.0,\n 71.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n
}\n },\n {\n \"column\": \"W2\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 21.01654716375466,\n \"min\": 0.4,\n
\"max\": 748.0,\n \"num_unique_values\": 102,\n \"samples\": [\n 61.8,\n 46.2,\n 37.0\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (Before COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
4.117923349181555,\n \"min\": 1.1352561767623535,\n \"max\": 40.47230316682855,\n \"num_unique_values\":
577,\n \"samples\": [\n 26.045000630223903,\n 27.151870912738364,\n 36.15167580189387\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (During COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 7.678420507393607,\n \"min\": 0.15621094482299824,\n \"max\": 283.05720673941346,\n
\"num_unique_values\": 600,\n \"samples\": [\n 26.696125645979503,\n 28.903053243519697,\n
24.432400150219703\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":
{},"execution_count":16}]},{"cell_type":"code","source":["df_excel['W2']"],"metadata":
{"id":"jzuNZ_Zh5_7M","executionInfo":{"status":"ok","timestamp":1731239357824,"user_tz":-360,"elapsed":387,"user":
{"displayName":"Mohammad Rifat Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"fb751bff-5f31-
46e0-b191-8dd971b061b3","colab":{"base_uri":"https://localhost:8080/","height":458}},"execution_count":null,"outputs":
[{"output_type":"execute_result","data":{"text/plain":["0 78.0\n","1 55.0\n","2 49.0\n","3 47.0\n","4 78.0\n"," ... \n","1597
71.0\n","1598 75.0\n","1599 63.0\n","1600 77.0\n","1601 61.0\n","Name: W2, Length: 1602, dtype: float64"]
},"metadata":{},"execution_count":17}]},{"cell_type":"code","source":["df_csv.info()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"0PRRk1cXpX4D","executionInfo":
{"status":"ok","timestamp":1731168111466,"user_tz":-360,"elapsed":99,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"61075d9b-9b2c-4617-c778-
63cbe73173e6"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","RangeIndex:
1602 entries, 0 to 1601\n","Data columns (total 8 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ -------------
- ----- \n"," 0 Feet 1602 non-null int64 \n"," 1 Inch 1602 non-null float64\n"," 2 Hieght (cm) 1602 non-null float64\n"," 3
Hieght (m2) 1602 non-null float64\n"," 4 W1 1602 non-null float64\n"," 5 W2 1602 non-null float64\n"," 6 BMI (Before
COVID) 1602 non-null float64\n"," 7 BMI (During COVID) 1602 non-null float64\n","dtypes: float64(7),
int64(1)\n","memory usage: 100.2 KB\n"]}]},{"cell_type":"code","source":["# Select a specific column\n","print(\"W1
column:\\n\", df_csv['W1'])\n"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"hcANMGIMpeBx","executionInfo":
{"status":"ok","timestamp":1731168111466,"user_tz":-360,"elapsed":96,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"984da36f-e035-4a60-afe6-
79ee959ed2f7"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["W1 column:\n","
0 70.0\n","1 51.0\n","2 44.0\n","3 49.0\n","4 75.0\n"," ... \n","1597 68.0\n","1598 76.0\n","1599 67.0\n","1600
70.0\n","1601 63.0\n","Name: W1, Length: 1602, dtype: float64\n"]}]},{"cell_type":"code","source":["# Filter rows based
on a condition\n","print(\"\\nRows where W1 < 70:\\n\", df_csv[df_csv['W1'] < 70])"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"EJmNX_YXpnyJ","executionInfo":
{"status":"ok","timestamp":1731168111466,"user_tz":-360,"elapsed":94,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"912222aa-ed12-4709-8884-
4c84d31cb8b4"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","Rows where W1
< 70:\n"," Feet Inch Hieght (cm) Hieght (m2) W1 W2 BMI (Before COVID) \\\n","1 5 1.0 154.94 2.400640 51.0 55.0
21.244332 \n","2 5 0.0 152.40 2.322576 44.0 49.0 18.944482 \n","3 5 1.0 154.94 2.400640 49.0 47.0 20.411221 \n","6
5 6.0 167.64 2.810317 68.0 74.0 24.196559 \n","7 5 8.0 172.72 2.983220 69.0 72.0 23.129372 \n","... ... ... ... ... ... ...
... \n","1595 4 11.0 149.86 2.245802 59.0 62.0 26.271239 \n","1596 5 8.0 172.72 2.983220 67.0 64.0 22.458955
\n","1597 5 9.0 175.26 3.071607 68.0 71.0 22.138251 \n","1599 5 6.0 167.64 2.810317 67.0 63.0 23.840727 \n","1601
5 3.0 160.02 2.560640 63.0 61.0 24.603224 \n","\n"," BMI (During COVID) \n","1 22.910554 \n","2 21.097264 \n","3
19.578110 \n","6 26.331549 \n","7 24.134996 \n","... ... \n","1595 27.607065 \n","1596 21.453330 \n","1597 23.114938
\n","1599 22.417400 \n","1601 23.822169 \n","\n","[1227 rows x 8 columns]\n"]}]},{"cell_type":"code","source":["# Label-
based indexing\n","print(\"\\nSelect specific row and column with .loc:\\n\", df_csv.loc[1598, 'Hieght (cm)'])"],"metadata":
{"colab":{"base_uri":"https://localhost:8080/"},"id":"Vl2Cx0OWpzsp","executionInfo":
{"status":"ok","timestamp":1731168111466,"user_tz":-360,"elapsed":91,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"7abe114b-bc4e-4d1c-9fa4-
5218068fca28"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","Select specific
row and column with .loc:\n"," 182.88\n"]}]},{"cell_type":"code","source":["df_csv.iloc[1550,5]\n"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"O4VbxEmWqJx6","executionInfo":
{"status":"ok","timestamp":1731168111466,"user_tz":-360,"elapsed":87,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"cd32bd0c-623b-425b-bc09-
9388c5b4a577"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":
["71.0"]},"metadata":{},"execution_count":14}]},{"cell_type":"code","source":["df_csv.loc[1550,'W2']"],"metadata":
{"colab":{"base_uri":"https://localhost:8080/"},"id":"p8vfUiZxqZuw","executionInfo":
{"status":"ok","timestamp":1731168111467,"user_tz":-360,"elapsed":69,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"c535fc83-17ed-4aac-f9e6-
772d0a412913"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":
["71.0"]},"metadata":{},"execution_count":15}]},{"cell_type":"code","source":["# Label-based indexing\n","print(\"\\nSelect
specific row and column with .loc:\\n\", df.loc[1, 'Name'])\n","\n","# Integer-based indexing\n","print(\"\\nSelect specific
row and column with .iloc:\\n\", df.iloc[1, 0])"],"metadata":{"id":"dl7ygBXcpdDZ","colab":
{"base_uri":"https://localhost:8080/"},"executionInfo":
{"status":"ok","timestamp":1731168111467,"user_tz":-360,"elapsed":57,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"6351b174-424a-402a-9bad-
c33824731af8"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","Select specific
row and column with .loc:\n"," Anna\n","\n","Select specific row and column with .iloc:\n"," Anna\n"]}]},
{"cell_type":"code","source":["df_csv.info()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"ez68Btg08Kuq","executionInfo":
{"status":"ok","timestamp":1731168111467,"user_tz":-360,"elapsed":38,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"805908be-3021-4612-8a12-
0accfd1e0152"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","RangeIndex:
1602 entries, 0 to 1601\n","Data columns (total 8 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ -------------
- ----- \n"," 0 Feet 1602 non-null int64 \n"," 1 Inch 1602 non-null float64\n"," 2 Hieght (cm) 1602 non-null float64\n"," 3
Hieght (m2) 1602 non-null float64\n"," 4 W1 1602 non-null float64\n"," 5 W2 1602 non-null float64\n"," 6 BMI (Before
COVID) 1602 non-null float64\n"," 7 BMI (During COVID) 1602 non-null float64\n","dtypes: float64(7),
int64(1)\n","memory usage: 100.2 KB\n"]}]},{"cell_type":"code","source":["df_csv.loc[:,'BMI (During
COVID)']"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":458},"id":"BoGV74Lm8S38","executionInfo":
{"status":"ok","timestamp":1731168111467,"user_tz":-360,"elapsed":35,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"0a08908d-3894-4442-9c94-
a3b07e4ec8a7"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0
27.754876\n","1 22.910554\n","2 21.097264\n","3 19.578110\n","4 30.461134\n"," ... \n","1597 23.114938\n","1598
22.424813\n","1599 22.417400\n","1600 21.217820\n","1601 23.822169\n","Name: BMI (During COVID), Length:
1602, dtype: float64"]
},"metadata":{},"execution_count":18}]},{"cell_type":"markdown","source":["# Part 2: Data Cleaning and Preprocessing\n"],"metadata":{"id":"oauuj_dbYLXq"}},{"cell_type":"markdown","source":["## Handling Missing
Values\n","Objective: Identify and handle missing values in a DataFrame."],"metadata":{"id":"MO_suv1_YNjj"}},
{"cell_type":"code","source":["!gdown --id 1E74UleBaEe6GYPb9vx3lxI4Iq-0n8hGI"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"4kaz-ivgrhSC","executionInfo":
{"status":"ok","timestamp":1731168113690,"user_tz":-360,"elapsed":2256,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"fd1c464d-cee1-49c6-ab9d-
8f7516570870"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":
["/usr/local/lib/python3.10/dist-packages/gdown/__main__.py:140: FutureWarning: Option `--id` was deprecated in
version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.\n","
warnings.warn(\n","Downloading...\n","From: https://drive.google.com/uc?id=1E74UleBaEe6GYPb9vx3lxI4Iq-
0n8hGI\n","To: /content/BMI Calculation_MJH_With_Null.csv\n","100% 83.9k/83.9k [00:00<00:00, 50.0MB/s]\n"]}]},
{"cell_type":"code","source":["pd_null = pd.read_csv(\"/content/BMI Calculation_MJH_With_Null.csv\")"],"metadata":
{"id":"AavqDiazrplZ"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["pd_null.info()"],"metadata":
{"colab":{"base_uri":"https://localhost:8080/"},"id":"IXDiMCE3rvCi","executionInfo":
{"status":"ok","timestamp":1731168113690,"user_tz":-360,"elapsed":303,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"a9bbd55d-61ea-4939-82d4-
a5aeb14074cb"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","RangeIndex:
1602 entries, 0 to 1601\n","Data columns (total 8 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ -------------
- ----- \n"," 0 Feet 1602 non-null int64 \n"," 1 Inch 1602 non-null float64\n"," 2 Hieght (cm) 1602 non-null float64\n"," 3
Hieght (m2) 1602 non-null float64\n"," 4 W1 1600 non-null float64\n"," 5 W2 1600 non-null object \n"," 6 BMI (Before
COVID) 1602 non-null float64\n"," 7 BMI (During COVID) 1600 non-null float64\n","dtypes: float64(6), int64(1),
object(1)\n","memory usage: 100.2+ KB\n"]}]},{"cell_type":"code","source":["# Assuming 'pd_null' contains some missing
values\n","# Identify missing values\n","print(pd_null.isnull().sum())"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"CZNOfZBVr5HB","executionInfo":
{"status":"ok","timestamp":1731168113691,"user_tz":-360,"elapsed":298,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"db25188a-5138-48f8-966d-
864bc0a54803"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Feet 0\n","Inch
0\n","Hieght (cm) 0\n","Hieght (m2) 0\n","W1 2\n","W2 2\n","BMI (Before COVID) 0\n","BMI (During COVID)
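2\n","dtype: int64\n"]}]},{"cell_type":"code","source":["# Fill missing values\n","# Caution: 'W2' is filled with the mean of 'BMI (During COVID)', which looks unintended.\n","# 'W2' is still an object column at this point, so its own mean cannot be computed directly.\n","df_filled = pd_null.fillna(value={'W1':
pd_null['W1'].mean(),\n"," 'W2': pd_null['BMI (During COVID)'].mean()})"],"metadata":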
{"id":"Sfry7JgksC3x"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":
["print(df_filled.isnull().sum())"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"88CIta-
2sTEp","executionInfo":{"status":"ok","timestamp":1731168113691,"user_tz":-360,"elapsed":294,"user":
{"displayName":"Mohammad Rifat Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"f3818e46-efed-
45ea-f21d-bc5fd688229c"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Feet
0\n","Inch 0\n","Hieght (cm) 0\n","Hieght (m2) 0\n","W1 0\n","W2 0\n","BMI (Before COVID) 0\n","BMI (During COVID)
2\n","dtype: int64\n"]}]},{"cell_type":"code","source":["# Assuming 'df' contains some missing values\n","# Identify
missing values\n","print(df.isnull().sum())\n","\n","# Fill missing values\n","df_filled = df.fillna(value={'Age':
df['Age'].mean()})\n","\n","# Drop rows with missing values\n","df_dropped = df.dropna()\n"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"n2jqwBBRX7Lz","outputId":"42de0b50-bb66-461b-9ad9-
84f925bcf200","executionInfo":{"status":"ok","timestamp":1731168113691,"user_tz":-360,"elapsed":291,"user":
{"displayName":"Mohammad Rifat Ahmmad
Rashid","userId":"17207620860184690696"}}},"execution_count":null,"outputs":
[{"output_type":"stream","name":"stdout","text":["Name 0\n","Age 0\n","City 0\n","dtype: int64\n"]}]},
{"cell_type":"markdown","source":["## Data Types and Conversion\n","Objective: Check and convert the data types of
DataFrame columns."],"metadata":{"id":"O0U7Z6isY41T"}},{"cell_type":"code","source":["# Check data
types\n","print(df.dtypes)\n","\n","# Convert data types\n","df['Age'] = df['Age'].astype(float)\n"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"WdeBdo-WYPxZ","outputId":"d1d0a55f-5838-4919-a65d-
2094c640ab03","executionInfo":{"status":"ok","timestamp":1731168113691,"user_tz":-360,"elapsed":287,"user":
{"displayName":"Mohammad Rifat Ahmmad
Rashid","userId":"17207620860184690696"}}},"execution_count":null,"outputs":
[{"output_type":"stream","name":"stdout","text":["Name object\n","Age int64\n","City object\n","dtype: object\n"]}]},
{"cell_type":"code","source":["pd_null.info()"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"bw1G4ff6-
A6D","executionInfo":{"status":"ok","timestamp":1731168113691,"user_tz":-360,"elapsed":267,"user":
{"displayName":"Mohammad Rifat Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"0ddf9ec6-9c49-
4241-dd9f-0b6992d70513"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":
["\n","RangeIndex: 1602 entries, 0 to 1601\n","Data columns (total 8 columns):\n"," # Column Non-Null Count Dtype
\n","--- ------ -------------- ----- \n"," 0 Feet 1602 non-null int64 \n"," 1 Inch 1602 non-null float64\n"," 2 Hieght (cm) 1602
non-null float64\n"," 3 Hieght (m2) 1602 non-null float64\n"," 4 W1 1600 non-null float64\n"," 5 W2 1600 non-null object
\n"," 6 BMI (Before COVID) 1602 non-null float64\n"," 7 BMI (During COVID) 1600 non-null float64\n","dtypes:
float64(6), int64(1), object(1)\n","memory usage: 100.2+ KB\n"]}]},{"cell_type":"code","source":["# Remove percentage
signs and convert to float\n","pd_null['W2'] = pd_null['W2'].str.replace('%', '').astype('float32') / 100.0\n","\n","# Check
the first few rows to ensure the conversion is done correctly\n","pd_null.head()\n"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":206},"id":"YFf8Uq3BE052","executionInfo":
{"status":"ok","timestamp":1731175113152,"user_tz":-360,"elapsed":1863,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"e0482654-336b-47ce-f704-
5d0f101615dc"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Feet Inch Hieght
(cm) Hieght (m2) W1 W2 BMI (Before COVID) \\\n","0 5 6.0 167.64 2.810317 70.0 NaN 24.908222 \n","1 5 1.0 154.94
2.400640 NaN 0.55 0.000000 \n","2 5 0.0 152.40 2.322576 44.0 0.49 18.944482 \n","3 5 1.0 154.94 2.400640 49.0
NaN 20.411221 \n","4 5 3.0 160.02 2.560640 NaN 0.78 0.000000 \n","\n"," BMI (During COVID) \n","0 NaN \n","1
22.910554 \n","2 21.097264 \n","3 0.000000 \n","4 30.461134 "],
"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"pd_null","summary":"{\n
\"name\": \"pd_null\",\n \"rows\": 1602,\n \"fields\": [\n {\n \"column\": \"Feet\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 0,\n \"min\": 4,\n \"max\": 7,\n \"num_unique_values\": 4,\n \"samples\": [\n 6,\n 7,\n 5\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Inch\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
3.0922374995274406,\n \"min\": 0.0,\n \"max\": 33.0,\n \"num_unique_values\": 21,\n \"samples\": [\n 6.0,\n 7.5,\n 8.5\n
],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Hieght (cm)\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 9.513993865916744,\n \"min\": 121.92,\n \"max\": 236.22,\n \"num_unique_values\": 40,\n
\"samples\": [\n 185.42,\n 158.75,\n 200.66\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\":
\"Hieght (m2)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.3166260627010893,\n \"min\": 1.48644864,\n
\"max\": 5.57998884,\n \"num_unique_values\": 40,\n \"samples\": [\n 3.43805764,\n 2.52015625,\n 4.02644356\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"W1\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 12.33592048993348,\n \"min\": 3.0,\n \"max\": 118.0,\n \"num_unique_values\": 78,\n \"samples\": [\n 80.0,\n
70.0,\n 71.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"W2\",\n \"properties\": {\n
\"dtype\": \"float32\",\n \"num_unique_values\": 101,\n \"samples\": [\n 0.9900000095367432,\n
0.8399999737739563,\n 0.4620000123977661\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n
\"column\": \"BMI (Before COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.192523190526696,\n \"min\":
0.0,\n \"max\": 40.47230317,\n \"num_unique_values\": 575,\n \"samples\": [\n 18.10124728,\n 23.4793369,\n
19.33619372\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (During COVID)\",\n
\"properties\": {\n \"dtype\": \"number\",\n \"std\": 7.704015939745007,\n \"min\": 0.0,\n \"max\": 283.0572067,\n
\"num_unique_values\": 598,\n \"samples\": [\n 26.69612565,\n 24.19359677,\n 24.43240015\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":30}]},{"cell_type":"code","source":
["pd_null['W2'] = pd_null['W2'].astype(float)"],"metadata":{"id":"8VUg3q60E5I1"},"execution_count":null,"outputs":[]},
{"cell_type":"code","source":["pd_null.info()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"ynAaME0wFBaW","executionInfo":
{"status":"ok","timestamp":1731175130350,"user_tz":-360,"elapsed":488,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"6abf074e-964a-4a13-d476-
a5fea181ba7e"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","RangeIndex:
1602 entries, 0 to 1601\n","Data columns (total 8 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ -------------
- ----- \n"," 0 Feet 1602 non-null int64 \n"," 1 Inch 1602 non-null float64\n"," 2 Hieght (cm) 1602 non-null float64\n"," 3
Hieght (m2) 1602 non-null float64\n"," 4 W1 1600 non-null float64\n"," 5 W2 1600 non-null float64\n"," 6 BMI (Before
COVID) 1602 non-null float64\n"," 7 BMI (During COVID) 1600 non-null float64\n","dtypes: float64(7),
int64(1)\n","memory usage: 100.2 KB\n"]}]},{"cell_type":"markdown","source":["## Renaming and Replacing
Values\n","Objective: Rename DataFrame columns and replace values within the DataFrame."],"metadata":
{"id":"ipW30TdOY-AA"}},{"cell_type":"code","source":["# Rename columns\n","df_renamed = df.rename(columns=
{'Name': 'FirstName'})\n","\n","# Replace values\n","df_replaced = df.replace({'City': {'Berlin': 'Berlin,
Germany'}})\n"],"metadata":{"id":"8vPjRzYfY7XN"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":
["pd_null.info()"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"v4H_cnVv-sQL","executionInfo":
{"status":"ok","timestamp":1731175141818,"user_tz":-360,"elapsed":34,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"26aa7062-0060-453d-9c17-
dc4e0aef1b9e"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","RangeIndex:
1602 entries, 0 to 1601\n","Data columns (total 8 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ -------------
- ----- \n"," 0 Feet 1602 non-null int64 \n"," 1 Inch 1602 non-null float64\n"," 2 Hieght (cm) 1602 non-null float64\n"," 3
Hieght (m2) 1602 non-null float64\n"," 4 W1 1600 non-null float64\n"," 5 W2 1600 non-null float64\n"," 6 BMI (Before
COVID) 1602 non-null float64\n"," 7 BMI (During COVID) 1600 non-null float64\n","dtypes: float64(7),
int64(1)\n","memory usage: 100.2 KB\n"]}]},{"cell_type":"code","source":["# Rename columns\n","pd_renamed =
pd_null.rename(columns={'W1': 'BeforeW1',\n"," 'W2': 'AfterW2'})\n","pd_renamed.info()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/"},"id":"IDkn1HmP-xbi","executionInfo":
{"status":"ok","timestamp":1731175141818,"user_tz":-360,"elapsed":27,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"2aa978d7-7f7a-4015-dc14-
08d805cd19bb"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","RangeIndex:
1602 entries, 0 to 1601\n","Data columns (total 8 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ -------------
- ----- \n"," 0 Feet 1602 non-null int64 \n"," 1 Inch 1602 non-null float64\n"," 2 Hieght (cm) 1602 non-null float64\n"," 3
Hieght (m2) 1602 non-null float64\n"," 4 BeforeW1 1600 non-null float64\n"," 5 AfterW2 1600 non-null float64\n"," 6
BMI (Before COVID) 1602 non-null float64\n"," 7 BMI (During COVID) 1600 non-null float64\n","dtypes: float64(7),
int64(1)\n","memory usage: 100.2 KB\n"]}]},{"cell_type":"code","source":["pd_renamed.head()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":206},"id":"vWoGfPT4_M3r","executionInfo":
{"status":"ok","timestamp":1731175141818,"user_tz":-360,"elapsed":19,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"53d72f11-5fe8-4fdb-ef1e-
e19a5bb34578"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Feet Inch
Hieght (cm) Hieght (m2) BeforeW1 AfterW2 \\\n","0 5 6.0 167.64 2.810317 70.0 NaN \n","1 5 1.0 154.94 2.400640 NaN
0.55 \n","2 5 0.0 152.40 2.322576 44.0 0.49 \n","3 5 1.0 154.94 2.400640 49.0 NaN \n","4 5 3.0 160.02 2.560640 NaN
0.78 \n","\n"," BMI (Before COVID) BMI (During COVID) \n","0 24.908222 NaN \n","1 0.000000 22.910554 \n","2
18.944482 21.097264 \n","3 20.411221 0.000000 \n","4 0.000000 30.461134 "],
"application/vnd.google.colaboratory.intrinsic+json":
{"type":"dataframe","variable_name":"pd_renamed","summary":"{\n \"name\": \"pd_renamed\",\n \"rows\": 1602,\n
\"fields\": [\n {\n \"column\": \"Feet\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 4,\n \"max\": 7,\n
\"num_unique_values\": 4,\n \"samples\": [\n 6,\n 7,\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n
\"column\": \"Inch\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3.0922374995274406,\n \"min\": 0.0,\n \"max\":
33.0,\n \"num_unique_values\": 21,\n \"samples\": [\n 6.0,\n 7.5,\n 8.5\n ],\n \"semantic_type\": \"\",\n \"description\":
\"\"\n }\n },\n {\n \"column\": \"Hieght (cm)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9.513993865916744,\n
\"min\": 121.92,\n \"max\": 236.22,\n \"num_unique_values\": 40,\n \"samples\": [\n 185.42,\n 158.75,\n 200.66\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Hieght (m2)\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 0.3166260627010893,\n \"min\": 1.48644864,\n \"max\": 5.57998884,\n \"num_unique_values\":
40,\n \"samples\": [\n 3.43805764,\n 2.52015625,\n 4.02644356\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n
},\n {\n \"column\": \"BeforeW1\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.33592048993348,\n \"min\":
3.0,\n \"max\": 118.0,\n \"num_unique_values\": 78,\n \"samples\": [\n 80.0,\n 70.0,\n 71.0\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"AfterW2\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
0.20971222485805083,\n \"min\": 0.3199999928474426,\n \"max\": 7.480000019073486,\n \"num_unique_values\":
101,\n \"samples\": [\n 0.9900000095367432,\n 0.8399999737739563,\n 0.4620000123977661\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (Before COVID)\",\n \"properties\": {\n
\"dtype\": \"number\",\n \"std\": 4.192523190526696,\n \"min\": 0.0,\n \"max\": 40.47230317,\n \"num_unique_values\":
575,\n \"samples\": [\n 18.10124728,\n 23.4793369,\n 19.33619372\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n
}\n },\n {\n \"column\": \"BMI (During COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
7.704015939745007,\n \"min\": 0.0,\n \"max\": 283.0572067,\n \"num_unique_values\": 598,\n \"samples\": [\n
26.69612565,\n 24.19359677,\n 24.43240015\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n
]\n}"}},"metadata":{},"execution_count":36}]},{"cell_type":"code","source":["pd_renamed['Address'] =
'Dhaka'"],"metadata":{"id":"cNiSZ9Fx_nor"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":
["pd_renamed.head()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":206},"id":"f1W7EoU0_vxC","executionInfo":
{"status":"ok","timestamp":1731175143623,"user_tz":-360,"elapsed":1820,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"48e739e7-460e-4df6-ace9-
ecb199a29eb8"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Feet Inch
Hieght (cm) Hieght (m2) BeforeW1 AfterW2 \\\n","0 5 6.0 167.64 2.810317 70.0 NaN \n","1 5 1.0 154.94 2.400640 NaN
0.55 \n","2 5 0.0 152.40 2.322576 44.0 0.49 \n","3 5 1.0 154.94 2.400640 49.0 NaN \n","4 5 3.0 160.02 2.560640 NaN
0.78 \n","\n"," BMI (Before COVID) BMI (During COVID) Address \n","0 24.908222 NaN Dhaka \n","1 0.000000
22.910554 Dhaka \n","2 18.944482 21.097264 Dhaka \n","3 20.411221 0.000000 Dhaka \n","4 0.000000 30.461134
Dhaka "],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":
{"type":"dataframe","variable_name":"pd_renamed","summary":"{\n \"name\": \"pd_renamed\",\n \"rows\": 1602,\n
\"fields\": [\n {\n \"column\": \"Feet\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 4,\n \"max\": 7,\n
\"num_unique_values\": 4,\n \"samples\": [\n 6,\n 7,\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n
\"column\": \"Inch\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3.0922374995274406,\n \"min\": 0.0,\n \"max\":
33.0,\n \"num_unique_values\": 21,\n \"samples\": [\n 6.0,\n 7.5,\n 8.5\n ],\n \"semantic_type\": \"\",\n \"description\":
\"\"\n }\n },\n {\n \"column\": \"Hieght (cm)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 9.513993865916744,\n
\"min\": 121.92,\n \"max\": 236.22,\n \"num_unique_values\": 40,\n \"samples\": [\n 185.42,\n 158.75,\n 200.66\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Hieght (m2)\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 0.3166260627010893,\n \"min\": 1.48644864,\n \"max\": 5.57998884,\n \"num_unique_values\":
40,\n \"samples\": [\n 3.43805764,\n 2.52015625,\n 4.02644356\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n
},\n {\n \"column\": \"BeforeW1\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.33592048993348,\n \"min\":
3.0,\n \"max\": 118.0,\n \"num_unique_values\": 78,\n \"samples\": [\n 80.0,\n 70.0,\n 71.0\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"AfterW2\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
0.20971222485805083,\n \"min\": 0.3199999928474426,\n \"max\": 7.480000019073486,\n \"num_unique_values\":
101,\n \"samples\": [\n 0.9900000095367432,\n 0.8399999737739563,\n 0.4620000123977661\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (Before COVID)\",\n \"properties\": {\n
\"dtype\": \"number\",\n \"std\": 4.192523190526696,\n \"min\": 0.0,\n \"max\": 40.47230317,\n \"num_unique_values\":
575,\n \"samples\": [\n 18.10124728,\n 23.4793369,\n 19.33619372\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n
}\n },\n {\n \"column\": \"BMI (During COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
7.704015939745007,\n \"min\": 0.0,\n \"max\": 283.0572067,\n \"num_unique_values\": 598,\n \"samples\": [\n
26.69612565,\n 24.19359677,\n 24.43240015\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\":
\"Address\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Dhaka\"\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":38}]},
{"cell_type":"code","source":["df_replaced = pd_renamed.replace({'Address': {'Dhaka': 'Berlin,
Germany'}})"],"metadata":{"id":"v22zDXWk_cB8"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":
["df_replaced.head()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":206},"id":"52eQJvf3_7Jc","executionInfo":
{"status":"ok","timestamp":1731175143623,"user_tz":-360,"elapsed":42,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"441f6ec4-3d38-412b-aa77-
f04652a8a135"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Feet Inch Hieght
(cm) Hieght (m2) BeforeW1 AfterW2 \\\n","0 5 6.0 167.64 2.810317 70.0 NaN \n","1 5 1.0 154.94 2.400640 NaN 0.55
\n","2 5 0.0 152.40 2.322576 44.0 0.49 \n","3 5 1.0 154.94 2.400640 49.0 NaN \n","4 5 3.0 160.02 2.560640 NaN 0.78
\n","\n"," BMI (Before COVID) BMI (During COVID) Address \n","0 24.908222 NaN Berlin, Germany \n","1 0.000000
22.910554 Berlin, Germany \n","2 18.944482 21.097264 Berlin, Germany \n","3 20.411221 0.000000 Berlin, Germany
\n","4 0.000000 30.461134 Berlin, Germany "],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_replaced","summary":"
{\n \"name\": \"df_replaced\",\n \"rows\": 1602,\n \"fields\": [\n {\n \"column\": \"Feet\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 0,\n \"min\": 4,\n \"max\": 7,\n \"num_unique_values\": 4,\n \"samples\": [\n 6,\n 7,\n 5\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Inch\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 3.0922374995274406,\n \"min\": 0.0,\n \"max\": 33.0,\n \"num_unique_values\": 21,\n \"samples\": [\n 6.0,\n
7.5,\n 8.5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Hieght (cm)\",\n \"properties\": {\n
\"dtype\": \"number\",\n \"std\": 9.513993865916744,\n \"min\": 121.92,\n \"max\": 236.22,\n \"num_unique_values\":
40,\n \"samples\": [\n 185.42,\n 158.75,\n 200.66\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n
\"column\": \"Hieght (m2)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.3166260627010893,\n \"min\":
1.48644864,\n \"max\": 5.57998884,\n \"num_unique_values\": 40,\n \"samples\": [\n 3.43805764,\n 2.52015625,\n
4.02644356\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BeforeW1\",\n \"properties\": {\n
\"dtype\": \"number\",\n \"std\": 12.33592048993348,\n \"min\": 3.0,\n \"max\": 118.0,\n \"num_unique_values\": 78,\n
\"samples\": [\n 80.0,\n 70.0,\n 71.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\":
\"AfterW2\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.20971222485805083,\n \"min\":
0.3199999928474426,\n \"max\": 7.480000019073486,\n \"num_unique_values\": 101,\n \"samples\": [\n
0.9900000095367432,\n 0.8399999737739563,\n 0.4620000123977661\n ],\n \"semantic_type\": \"\",\n \"description\":
\"\"\n }\n },\n {\n \"column\": \"BMI (Before COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
4.192523190526696,\n \"min\": 0.0,\n \"max\": 40.47230317,\n \"num_unique_values\": 575,\n \"samples\": [\n
18.10124728,\n 23.4793369,\n 19.33619372\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\":
\"BMI (During COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 7.704015939745007,\n \"min\": 0.0,\n
\"max\": 283.0572067,\n \"num_unique_values\": 598,\n \"samples\": [\n 26.69612565,\n 24.19359677,\n
24.43240015\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Address\",\n \"properties\": {\n
\"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Berlin, Germany\"\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":40}]},{"cell_type":"markdown","source":["# Part 3:
Data Handling - Numerical and Categorical Data"],"metadata":{"id":"cC8F8rXWZQBV"}},
{"cell_type":"markdown","source":["## Handling Categorical Data\n","Objective: Convert categorical data into numeric
format using one-hot encoding."],"metadata":{"id":"Asp6LhjSZSal"}},{"cell_type":"code","source":
["df.head()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":175},"id":"2LI6YxpwADYT","executionInfo":
{"status":"ok","timestamp":1731175143623,"user_tz":-360,"elapsed":39,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"c16c3a42-ea46-4bff-bf91-
91c60e83f0ad"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Name Age
City\n","0 John 28.0 New York\n","1 Anna 34.0 Paris\n","2 Peter 29.0 Berlin\n","3 Linda 32.0 London"],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df","summary":"{\n
\"name\": \"df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\": \"string\",\n
\"num_unique_values\": 4,\n \"samples\": [\n \"Anna\",\n \"Linda\",\n \"John\"\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
2.753785273643051,\n \"min\": 28.0,\n \"max\": 34.0,\n \"num_unique_values\": 4,\n \"samples\": [\n 34.0,\n 32.0,\n
28.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"City\",\n \"properties\": {\n \"dtype\":
\"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Paris\",\n \"London\",\n \"New York\"\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":41}]},{"cell_type":"code","source":["# One-hot
encoding\n","df_encoded = pd.get_dummies(df, columns=['City'])\n"],"metadata":
{"id":"DkCWSr4GZFK5"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["df_encoded"],"metadata":
{"id":"H9PQLFP6tNSZ","colab":{"base_uri":"https://localhost:8080/","height":175},"executionInfo":
{"status":"ok","timestamp":1731175143623,"user_tz":-360,"elapsed":34,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"27952d13-77db-4bf9-8512-
dce2696fe137"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Name Age
City_Berlin City_London City_New York City_Paris\n","0 John 28.0 False False True False\n","1 Anna 34.0 False False
False True\n","2 Peter 29.0 True False False False\n","3 Linda 32.0 False True False False"],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_encoded","summary":"
{\n \"name\": \"df_encoded\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\":
\"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Anna\",\n \"Linda\",\n \"John\"\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
2.753785273643051,\n \"min\": 28.0,\n \"max\": 34.0,\n \"num_unique_values\": 4,\n \"samples\": [\n 34.0,\n 32.0,\n
28.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"City_Berlin\",\n \"properties\": {\n
\"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"City_London\",\n \"properties\": {\n \"dtype\": \"boolean\",\n
\"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n
\"column\": \"City_New York\",\n \"properties\": {\n \"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n
false,\n true\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"City_Paris\",\n \"properties\": {\n
\"dtype\": \"boolean\",\n \"num_unique_values\": 2,\n \"samples\": [\n true,\n false\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":43}]},{"cell_type":"markdown","source":["##
Numerical Data Transformation\n","Objective: Apply transformations to numerical data, such as min-max normalization and element-wise custom functions."],"metadata":
{"id":"jjQ5vTvHZWVb"}},{"cell_type":"code","source":["# Min-max normalize the Age column to the range [0, 1]\n","df['Age'] = (df['Age'] - df['Age'].min())
/ (df['Age'].max() - df['Age'].min())\n","\n","# Apply a custom function: rescale the normalized values to 0-100\n","df['Age'] = df['Age'].apply(lambda x: x *
100)\n"],"metadata":{"id":"KZwmBEIaZUWw"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":
["df_excel = pd.read_excel(\"/content/BMI Calculation_MJH.xlsx\")"],"metadata":
{"id":"oqyqYdastWwp"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["df_excel.info()"],"metadata":
{"colab":{"base_uri":"https://localhost:8080/"},"id":"n4n4_7ZMtge5","executionInfo":
{"status":"ok","timestamp":1731175146819,"user_tz":-360,"elapsed":48,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"d67a86bd-0a43-4264-d699-
9c6c7568c6ab"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["\n","RangeIndex:
1602 entries, 0 to 1601\n","Data columns (total 8 columns):\n"," # Column Non-Null Count Dtype \n","--- ------ -------------
- ----- \n"," 0 Feet 1602 non-null int64 \n"," 1 Inch 1602 non-null float64\n"," 2 Hieght (cm) 1602 non-null float64\n"," 3
Hieght (m2) 1602 non-null float64\n"," 4 W1 1602 non-null float64\n"," 5 W2 1602 non-null float64\n"," 6 BMI (Before
COVID) 1602 non-null float64\n"," 7 BMI (During COVID) 1602 non-null float64\n","dtypes: float64(7),
int64(1)\n","memory usage: 100.2 KB\n"]}]},{"cell_type":"code","source":["df_excel['ratio'] = df_excel['BMI (Before
COVID)']/df_excel['BMI (During COVID)']"],"metadata":{"id":"Nbeuwk9Vteta"},"execution_count":null,"outputs":[]},
{"cell_type":"code","source":["df_excel.head()"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":206},"id":"rLyyIyCotyU6","executionInfo":
{"status":"ok","timestamp":1731175146820,"user_tz":-360,"elapsed":44,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"444e503e-1164-4347-df73-
d2fec1fd6eba"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Feet Inch Hieght
(cm) Hieght (m2) W1 W2 BMI (Before COVID) \\\n","0 5 6.0 167.64 2.810317 70.0 78.0 24.908222 \n","1 5 1.0 154.94
2.400640 51.0 55.0 21.244332 \n","2 5 0.0 152.40 2.322576 44.0 49.0 18.944482 \n","3 5 1.0 154.94 2.400640 49.0
47.0 20.411221 \n","4 5 3.0 160.02 2.560640 75.0 78.0 29.289552 \n","\n"," BMI (During COVID) ratio \n","0 27.754876
0.897436 \n","1 22.910554 0.927273 \n","2 21.097264 0.897959 \n","3 19.578110 1.042553 \n","4 30.461134 0.961538
"],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_excel","summary":"{\n
\"name\": \"df_excel\",\n \"rows\": 1602,\n \"fields\": [\n {\n \"column\": \"Feet\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 0,\n \"min\": 4,\n \"max\": 7,\n \"num_unique_values\": 4,\n \"samples\": [\n 6,\n 7,\n 5\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Inch\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 3.0922374995274406,\n \"min\": 0.0,\n \"max\": 33.0,\n \"num_unique_values\": 21,\n \"samples\": [\n 6.0,\n
7.5,\n 8.5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Hieght (cm)\",\n \"properties\": {\n
\"dtype\": \"number\",\n \"std\": 9.513993865916744,\n \"min\": 121.92,\n \"max\": 236.22,\n \"num_unique_values\":
40,\n \"samples\": [\n 185.42000000000002,\n 158.75,\n 200.66\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n
},\n {\n \"column\": \"Hieght (m2)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.31662606270109006,\n \"min\":
1.48644864,\n \"max\": 5.57998884,\n \"num_unique_values\": 40,\n \"samples\": [\n 3.4380576400000002,\n
2.52015625,\n 4.026443560000001\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"W1\",\n
\"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.335592024444987,\n \"min\": 3.0,\n \"max\": 118.0,\n
\"num_unique_values\": 78,\n \"samples\": [\n 80.0,\n 70.0,\n 71.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n
}\n },\n {\n \"column\": \"W2\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 21.01654716375466,\n \"min\": 0.4,\n
\"max\": 748.0,\n \"num_unique_values\": 102,\n \"samples\": [\n 61.8,\n 46.2,\n 37.0\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (Before COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
4.117923349181555,\n \"min\": 1.1352561767623535,\n \"max\": 40.47230316682855,\n \"num_unique_values\":
577,\n \"samples\": [\n 26.045000630223903,\n 27.151870912738364,\n 36.15167580189387\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (During COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 7.678420507393607,\n \"min\": 0.15621094482299824,\n \"max\": 283.05720673941346,\n
\"num_unique_values\": 600,\n \"samples\": [\n 26.696125645979503,\n 28.903053243519697,\n
24.432400150219703\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ratio\",\n
\"properties\": {\n \"dtype\": \"number\",\n \"std\": 2.3510452656618366,\n \"min\": 0.05555555555555555,\n \"max\":
95.0,\n \"num_unique_values\": 734,\n \"samples\": [\n 1.0679611650485437,\n 0.98,\n 1.0833333333333333\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":48}]},
{"cell_type":"code","source":["# Divide every value in the Inch column by 10 using an element-wise function\n","df_excel['Inch'] = df_excel['Inch'].apply(lambda x: x / 10)"],"metadata":
{"id":"1jXlkh1fuFTR"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["df_excel.head()"],"metadata":
{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"HpheHAYiuNXx","executionInfo":
{"status":"ok","timestamp":1731175146820,"user_tz":-360,"elapsed":40,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}},"outputId":"19ec1a50-66f1-4f6f-b98b-
f25e0445c7c0"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" Feet Inch Hieght
(cm) Hieght (m2) W1 W2 BMI (Before COVID) \\\n","0 5 0.6 167.64 2.810317 70.0 78.0 24.908222 \n","1 5 0.1 154.94
2.400640 51.0 55.0 21.244332 \n","2 5 0.0 152.40 2.322576 44.0 49.0 18.944482 \n","3 5 0.1 154.94 2.400640 49.0
47.0 20.411221 \n","4 5 0.3 160.02 2.560640 75.0 78.0 29.289552 \n","\n"," BMI (During COVID) ratio \n","0 27.754876
0.897436 \n","1 22.910554 0.927273 \n","2 21.097264 0.897959 \n","3 19.578110 1.042553 \n","4 30.461134 0.961538
"],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_excel","summary":"{\n
\"name\": \"df_excel\",\n \"rows\": 1602,\n \"fields\": [\n {\n \"column\": \"Feet\",\n \"properties\": {\n \"dtype\":
\"number\",\n \"std\": 0,\n \"min\": 4,\n \"max\": 7,\n \"num_unique_values\": 4,\n \"samples\": [\n 6,\n 7,\n 5\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Inch\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 0.3092237499527406,\n \"min\": 0.0,\n \"max\": 3.3,\n \"num_unique_values\": 21,\n \"samples\": [\n 0.6,\n
0.75,\n 0.85\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Hieght (cm)\",\n \"properties\":
{\n \"dtype\": \"number\",\n \"std\": 9.513993865916744,\n \"min\": 121.92,\n \"max\": 236.22,\n \"num_unique_values\":
40,\n \"samples\": [\n 185.42000000000002,\n 158.75,\n 200.66\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n
},\n {\n \"column\": \"Hieght (m2)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.31662606270109006,\n \"min\":
1.48644864,\n \"max\": 5.57998884,\n \"num_unique_values\": 40,\n \"samples\": [\n 3.4380576400000002,\n
2.52015625,\n 4.026443560000001\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"W1\",\n
\"properties\": {\n \"dtype\": \"number\",\n \"std\": 12.335592024444987,\n \"min\": 3.0,\n \"max\": 118.0,\n
\"num_unique_values\": 78,\n \"samples\": [\n 80.0,\n 70.0,\n 71.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n
}\n },\n {\n \"column\": \"W2\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 21.01654716375466,\n \"min\": 0.4,\n
\"max\": 748.0,\n \"num_unique_values\": 102,\n \"samples\": [\n 61.8,\n 46.2,\n 37.0\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (Before COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
4.117923349181555,\n \"min\": 1.1352561767623535,\n \"max\": 40.47230316682855,\n \"num_unique_values\":
577,\n \"samples\": [\n 26.045000630223903,\n 27.151870912738364,\n 36.15167580189387\n ],\n \"semantic_type\":
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"BMI (During COVID)\",\n \"properties\": {\n \"dtype\": \"number\",\n
\"std\": 7.678420507393607,\n \"min\": 0.15621094482299824,\n \"max\": 283.05720673941346,\n
\"num_unique_values\": 600,\n \"samples\": [\n 26.696125645979503,\n 28.903053243519697,\n
24.432400150219703\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ratio\",\n
\"properties\": {\n \"dtype\": \"number\",\n \"std\": 2.3510452656618366,\n \"min\": 0.05555555555555555,\n \"max\":
95.0,\n \"num_unique_values\": 734,\n \"samples\": [\n 1.0679611650485437,\n 0.98,\n 1.0833333333333333\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":50}]},
{"cell_type":"markdown","source":["# Part 4: Advanced Data Manipulation"],"metadata":{"id":"wVYZeqqvZawy"}},
{"cell_type":"markdown","source":["## Grouping and Aggregating Data\n","Objective: Group data and apply various
aggregation functions."],"metadata":{"id":"4JRYwaXPZcz5"}},{"cell_type":"code","source":["# Group by City and
calculate the mean of the Age column (already rescaled to 0-100 above)\n","grouped = df.groupby('City')['Age'].mean()\n","print(\"Mean Age by City:\\n\",
grouped)\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"ZJNbuml3ZZHB","outputId":"ff5c2cce-7a1e-
4c20-cb25-8d8ad490970e","executionInfo":
{"status":"ok","timestamp":1731175146820,"user_tz":-360,"elapsed":36,"user":{"displayName":"Mohammad Rifat
Ahmmad Rashid","userId":"17207620860184690696"}}},"execution_count":null,"outputs":
[{"output_type":"stream","name":"stdout","text":["Mean Age by City:\n"," City\n","Berlin 16.666667\n","London
66.666667\n","New York 0.000000\n","Paris 100.000000\n","Name: Age, dtype: float64\n"]}]},
{"cell_type":"markdown","source":["## Merging, Joining, and Concatenating\n","Objective: Combine DataFrames using
different techniques."],"metadata":{"id":"sp4m-vH5Zhre"}},{"cell_type":"code","source":["# Another DataFrame to merge
with\n","data2 = {'City': ['New York', 'Paris'],\n"," 'Population': [8800000, 2148327]}\n","df2 =
pd.DataFrame(data2)\n"],"metadata":{"id":"b0ZSmIFbZf4_"},"execution_count":null,"outputs":[]},
{"cell_type":"code","source":["# Merge on City (inner join: keeps only cities present in both DataFrames)\n","df_merged = pd.merge(df, df2, on='City', how='inner')\n","df_merged"],"metadata":
{"colab":{"base_uri":"https://localhost:8080/","height":125},"id":"nkWdu5qqZr9-","outputId":"8acbc638-737a-42a2-85af-
919f60165a37","executionInfo":{"status":"ok","timestamp":1731175147423,"user_tz":-360,"elapsed":628,"user":
{"displayName":"Mohammad Rifat Ahmmad
Rashid","userId":"17207620860184690696"}}},"execution_count":null,"outputs":[{"output_type":"execute_result","data":
{"text/plain":[" Name Age City Population\n","0 John 0.0 New York 8800000\n","1 Anna 100.0 Paris
2148327"],"text/html":["\n","
Name Age City Population
0 John 0.0 New York 8800000
1 Anna 100.0 Paris 2148327
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_merged","summary":"
{\n \"name\": \"df_merged\",\n \"rows\": 2,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\":
\"string\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"Anna\",\n \"John\"\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
70.71067811865476,\n \"min\": 0.0,\n \"max\": 100.0,\n \"num_unique_values\": 2,\n \"samples\": [\n 100.0,\n 0.0\n ],\n
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"City\",\n \"properties\": {\n \"dtype\": \"string\",\n
\"num_unique_values\": 2,\n \"samples\": [\n \"Paris\",\n \"New York\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n
}\n },\n {\n \"column\": \"Population\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4703443,\n \"min\": 2148327,\n
\"max\": 8800000,\n \"num_unique_values\": 2,\n \"samples\": [\n 2148327,\n 8800000\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":53}]},{"cell_type":"code","source":["# Concatenate
DataFrames vertically (columns missing from one frame become NaN; original index labels are kept)\n","df_concat = pd.concat([df, df2], axis=0)\n","df_concat"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":238},"id":"4EijmWPkZsym","outputId":"93a3ae78-c422-44b4-fb22-
c7e1574053a0","executionInfo":{"status":"ok","timestamp":1731175147424,"user_tz":-360,"elapsed":28,"user":
{"displayName":"Mohammad Rifat Ahmmad
Rashid","userId":"17207620860184690696"}}},"execution_count":null,"outputs":[{"output_type":"execute_result","data":
{"text/plain":[" Name Age City Population\n","0 John 0.000000 New York NaN\n","1 Anna 100.000000 Paris NaN\n","2
Peter 16.666667 Berlin NaN\n","3 Linda 66.666667 London NaN\n","0 NaN NaN New York 8800000.0\n","1 NaN NaN
Paris 2148327.0"],"text/html":["\n","
Name Age City Population
0 John 0.000000 New York NaN
1 Anna 100.000000 Paris NaN
2 Peter 16.666667 Berlin NaN
3 Linda 66.666667 London NaN
0 NaN NaN New York 8800000.0
1 NaN NaN Paris 2148327.0
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_concat","summary":"
{\n \"name\": \"df_concat\",\n \"rows\": 6,\n \"fields\": [\n {\n \"column\": \"Name\",\n \"properties\": {\n \"dtype\":
\"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Anna\",\n \"Linda\",\n \"John\"\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
45.89642122738418,\n \"min\": 0.0,\n \"max\": 100.0,\n \"num_unique_values\": 4,\n \"samples\": [\n 100.0,\n
66.66666666666666,\n 0.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"City\",\n
\"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Paris\",\n \"London\",\n \"New
York\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Population\",\n \"properties\": {\n
\"dtype\": \"number\",\n \"std\": 4703443.0845354665,\n \"min\": 2148327.0,\n \"max\": 8800000.0,\n
\"num_unique_values\": 2,\n \"samples\": [\n 2148327.0,\n 8800000.0\n ],\n \"semantic_type\": \"\",\n \"description\":
\"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":54}]},{"cell_type":"markdown","source":["## Pivot Tables and Cross-
Tabulation\n","Objective: Utilize pivot tables and perform cross-tabulation for advanced data analysis."],"metadata":
{"id":"AbeHz7MiZmB7"}},{"cell_type":"code","source":["# Pivot table: mean Age for each City\n","pivot = df.pivot_table(values='Age', index='City',
aggfunc='mean')\n","\n","pivot"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":206},"id":"HgTmVf1yZjuz","outputId":"2ef11670-ee52-4583-bac5-
c60ac261c057","executionInfo":{"status":"ok","timestamp":1731175157428,"user_tz":-360,"elapsed":1012,"user":
{"displayName":"Mohammad Rifat Ahmmad
Rashid","userId":"17207620860184690696"}}},"execution_count":null,"outputs":[{"output_type":"execute_result","data":
{"text/plain":[" Age\n","City \n","Berlin 16.666667\n","London 66.666667\n","New York 0.000000\n","Paris
100.000000"],"text/html":["\n","
Age
City
Berlin 16.666667
London 66.666667
New York 0.000000
Paris 100.000000
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"pivot","summary":"{\n
\"name\": \"pivot\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"City\",\n \"properties\": {\n \"dtype\": \"string\",\n
\"num_unique_values\": 4,\n \"samples\": [\n \"London\",\n \"Paris\",\n \"Berlin\"\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\":
45.89642122738418,\n \"min\": 0.0,\n \"max\": 100.0,\n \"num_unique_values\": 4,\n \"samples\": [\n
66.66666666666666,\n 100.0,\n 16.666666666666664\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n
]\n}"}},"metadata":{},"execution_count":57}]},{"cell_type":"code","source":["# Cross-tabulation: count rows for each (City, Age) combination\n","cross_tab =
pd.crosstab(df['City'], df['Age'])\n","cross_tab"],"metadata":{"colab":
{"base_uri":"https://localhost:8080/","height":206},"id":"XItXg1k7Zn06","outputId":"663b09e0-e77b-45ec-8a29-
6ab4c710305c","executionInfo":{"status":"ok","timestamp":1731175157429,"user_tz":-360,"elapsed":15,"user":
{"displayName":"Mohammad Rifat Ahmmad
Rashid","userId":"17207620860184690696"}}},"execution_count":null,"outputs":[{"output_type":"execute_result","data":
{"text/plain":["Age 0.000000 16.666667 66.666667 100.000000\n","City \n","Berlin 0 1 0 0\n","London 0 0 1 0\n","New
York 1 0 0 0\n","Paris 0 0 0 1"],"text/html":["\n","
Age 0.000000 16.666667 66.666667 100.000000
City
Berlin 0 1 0 0
London 0 0 1 0
New York 1 0 0 0
Paris 0 0 0 1
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"cross_tab","summary":"
{\n \"name\": \"cross_tab\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"City\",\n \"properties\": {\n \"dtype\": \"string\",\n
\"num_unique_values\": 4,\n \"samples\": [\n \"London\",\n \"Paris\",\n \"Berlin\"\n ],\n \"semantic_type\": \"\",\n
\"description\": \"\"\n }\n },\n {\n \"column\": 0.0,\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n
\"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n
},\n {\n \"column\": 16.666666666666664,\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\":
1,\n \"num_unique_values\": 2,\n \"samples\": [\n 0,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n
\"column\": 66.66666666666666,\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n
\"num_unique_values\": 2,\n \"samples\": [\n 1,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n
\"column\": 100.0,\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n
\"num_unique_values\": 2,\n \"samples\": [\n 1,\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n
]\n}"}},"metadata":{},"execution_count":58}]},{"cell_type":"code","source":[],"metadata":{"id":"QBD-N-
AgZ4bb"},"execution_count":null,"outputs":[]}]}
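
The Part 4 cells above use one aggregation, one join type, and a plain vertical concat. The sketch below shows a few variations on the same operations. It is an illustration, not part of the original notebook. The `df` here is a stand-in rebuilt from the printed outputs (four people with already-normalized `Age` values), and the result column names `mean_age`, `max_age`, and `n` are hypothetical.

```python
import pandas as pd

# Stand-in for the notebook's `df`, rebuilt from the outputs printed above
# (assumption: Age was already normalized to the 0-100 range).
df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [0.0, 100.0, 50 / 3, 200 / 3],
    "City": ["New York", "Paris", "Berlin", "London"],
})
df2 = pd.DataFrame({
    "City": ["New York", "Paris"],
    "Population": [8800000, 2148327],
})

# Several aggregations per group in one call, using named aggregation.
# The keyword names (mean_age, max_age, n) become the result's column names.
age_stats = df.groupby("City")["Age"].agg(mean_age="mean", max_age="max", n="count")
print(age_stats)

# A left join keeps every row of `df`, unlike the inner join used above.
# indicator=True adds a `_merge` column telling where each row matched.
left_merged = pd.merge(df, df2, on="City", how="left", indicator=True)
print(left_merged)

# ignore_index=True renumbers the rows, avoiding the repeated 0 and 1
# index labels seen in the concatenated output above.
df_concat = pd.concat([df, df2], axis=0, ignore_index=True)
print(df_concat)
```

With the left join, cities missing from `df2` (Berlin and London here) keep their rows and get `NaN` for `Population`, which makes unmatched keys visible instead of silently dropping them.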
