Project Information-Gain
In [3]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 94269 entries, 0 to 94268
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 latitude 94269 non-null float64
1 longitude 94269 non-null float64
2 wind_kph 94269 non-null float64
3 wind_degree 94269 non-null int64
4 pressure_mb 94269 non-null int64
5 precip_mm 94269 non-null float64
6 humidity 94269 non-null int64
7 cloud 94269 non-null int64
8 feels_like_celsius 94269 non-null float64
9 visibility_km 94269 non-null float64
10 uv_index 94269 non-null int64
11 gust_kph 94269 non-null float64
12 air_quality_Carbon_Monoxide 94269 non-null float64
13 air_quality_Ozone 94269 non-null float64
14 air_quality_Nitrogen_dioxide 94269 non-null float64
15 air_quality_Sulphur_dioxide 94269 non-null float64
16 air_quality_PM2.5 94269 non-null float64
17 air_quality_PM10 94269 non-null float64
18 air_quality_us_epa_index 94269 non-null int64
19 air_quality_gb_defra_index 94269 non-null int64
20 temperature_celsius 94269 non-null int64
dtypes: float64(13), int64(8)
memory usage: 15.1 MB
In [5]: df.head()
localhost:8888/nbconvert/html/OneDrive/Desktop/project_information-gain.ipynb?download=false 1/5
5/10/24, 9:02 PM project_information-gain
Out[5]: latitude longitude wind_kph wind_degree pressure_mb precip_mm humidity cloud feels_li
5 rows × 21 columns
In [6]: X_train.head()
Out[8]:
feels_like_celsius 2.446813
pressure_mb 0.508741
latitude 0.497739
longitude 0.428921
air_quality_PM2.5 0.200308
air_quality_PM10 0.195606
gust_kph 0.183770
humidity 0.154072
cloud 0.123359
air_quality_us_epa_index 0.119657
air_quality_Carbon_Monoxide 0.103721
wind_degree 0.093052
visibility_km 0.088083
air_quality_gb_defra_index 0.078150
air_quality_Nitrogen_dioxide 0.072775
wind_kph 0.051646
precip_mm 0.049911
air_quality_Ozone 0.047286
uv_index 0.047006
air_quality_Sulphur_dioxide 0.033090
dtype: float64
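The cell that produced the scores above is not preserved in this export. A minimal sketch of how such a ranking is typically computed with scikit-learn's `mutual_info_regression` follows. It assumes `temperature_celsius` is the target (it is the only column of `df` missing from the ranking) and uses a small synthetic frame, `df_demo`, as a stand-in for the real data. The split ratio, the random seeds, and the `noise` column are illustrative assumptions, not values taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df (assumption: the real data is not available here).
rng = np.random.default_rng(0)
n = 500
df_demo = pd.DataFrame({
    "feels_like_celsius": rng.normal(20, 8, n),
    "pressure_mb": rng.normal(1013, 10, n),
    "noise": rng.normal(0, 1, n),  # hypothetical column unrelated to the target
})
# Target tracks feels_like_celsius closely, mimicking the ranking above.
df_demo["temperature_celsius"] = (
    df_demo["feels_like_celsius"] + rng.normal(0, 1, n)
).round()

# Separate features and target, then hold out a test set (assumed 70/30 split).
X = df_demo.drop(columns="temperature_celsius")
y = df_demo["temperature_celsius"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Estimate mutual information between each training feature and the target,
# then label the scores with column names and sort them in descending order.
mutual_info = pd.Series(
    mutual_info_regression(X_train, y_train, random_state=0),
    index=X_train.columns,
).sort_values(ascending=False)
print(mutual_info)
```

The estimator is nearest-neighbour based and adds a small amount of random noise to continuous features, so scores shift slightly between runs unless `random_state` is fixed. That would be consistent with the small differences between the two rankings in this notebook.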
Out[9]: <AxesSubplot:>
In [16]: X_train.head()
Out[20]:
feels_like_celsius 2.445022
pressure_mb 0.508167
latitude 0.495073
longitude 0.427537
air_quality_PM2.5 0.197115
air_quality_PM10 0.192541
gust_kph 0.184881
humidity 0.159219
cloud 0.119761
air_quality_us_epa_index 0.117665
air_quality_Carbon_Monoxide 0.097628
wind_degree 0.090348
visibility_km 0.086309
air_quality_gb_defra_index 0.080962
air_quality_Nitrogen_dioxide 0.071574
precip_mm 0.053176
air_quality_Ozone 0.051077
wind_kph 0.050961
uv_index 0.045472
air_quality_Sulphur_dioxide 0.027164
dtype: float64
In [21]: mutual_info.sort_values(ascending=False).plot.bar(figsize=(15,5))
Out[21]: <AxesSubplot:>
Out[23]: SelectPercentile(percentile=20,
                 score_func=<function mutual_info_regression at 0x000001BFAEBB9CA0>)
In [25]: X_train.columns[selected_top_columns.get_support()]
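The fitting cell behind `selected_top_columns` is not shown in this export, and neither is the output of the cell above. The sketch below illustrates how a `SelectPercentile` selector with `percentile=20` and `mutual_info_regression` is usually fitted and queried. The synthetic `X_train` and `y_train`, including the column names `f0` to `f9`, are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectPercentile, mutual_info_regression

# Synthetic training data (assumption): 10 features, only f0 and f1 drive y.
rng = np.random.default_rng(0)
n = 1000
X_train = pd.DataFrame(
    rng.normal(size=(n, 10)), columns=[f"f{i}" for i in range(10)]
)
y_train = 3 * X_train["f0"] + 2 * X_train["f1"] + rng.normal(0, 0.1, n)

# Keep the top 20% of features ranked by mutual information with the target.
selected_top_columns = SelectPercentile(mutual_info_regression, percentile=20)
selected_top_columns.fit(X_train, y_train)

# get_support() returns a boolean mask over the columns; indexing the column
# Index with it yields the names of the retained features.
kept = X_train.columns[selected_top_columns.get_support()]
print(list(kept))
```

With 10 features, the 20th-percentile cut keeps two of them. On the notebook's 20 feature columns the same setting would keep the four highest-scoring features.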