Binning
The binning method is used to smooth data or to handle noisy data. In this method, the
data is first sorted and then the sorted values are distributed into a number of
buckets or bins. Because binning methods consult the neighbourhood of values, they
perform local smoothing. There are three approaches to smoothing:
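Smoothing by bin mean : In this method each bin value is replaced by its bin
mean value.
Smoothing by bin median : In this method each bin value is replaced by its bin
median value.
Smoothing by bin boundary : In this method each bin value is replaced by the closer
of the bin's minimum and maximum values.
Sorted data partitioned into three bins of four values each:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
Smoothing by bin mean (rounded to whole numbers):
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
Smoothing by bin boundary:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
Smoothing by bin median (rounded to whole numbers):
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29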
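The smoothed bins listed above can be reproduced with a short NumPy sketch. It assumes equal-frequency bins of four values, and it assumes that fractional means and medians are rounded half up to whole numbers, which is what the listed values suggest. The helper name round_half_up and the tie rule for boundaries are our own choices for illustration.

```python
import numpy as np

# The 12 sorted values from the example, split into 3 bins of 4 values each.
data = np.array([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34])
bins = data.reshape(3, 4)

def round_half_up(x):
    # Round .5 upward (assumption: this matches the rounded values in the lists).
    return np.floor(x + 0.5).astype(int)

# Smoothing by bin mean: every value becomes the mean of its bin.
means = round_half_up(bins.mean(axis=1))
by_mean = np.repeat(means[:, None], 4, axis=1)

# Smoothing by bin median: every value becomes the median of its bin.
medians = round_half_up(np.median(bins, axis=1))
by_median = np.repeat(medians[:, None], 4, axis=1)

# Smoothing by bin boundary: every value moves to the nearer of the bin's
# minimum and maximum. Ties go to the minimum here (an arbitrary choice).
low = bins.min(axis=1, keepdims=True)
high = bins.max(axis=1, keepdims=True)
by_boundary = np.where(bins - low <= high - bins, low, high)

print("Mean:\n", by_mean)
print("Median:\n", by_median)
print("Boundary:\n", by_boundary)
```

The unrounded bin means are 9, 22.75 and 29.25, and the unrounded medians are 8.5, 22.5 and 28.5, so the rounding step is what makes the two results coincide.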
import numpy as np
import math
import pandas as pd
df = pd.read_csv('nse50_data.csv')
data = df['Turnover (Rs. Cr)']
data = data[:30]
data=np.sort(data)
print(data)
Output (partial):
12199.98, 12211.18, 12290.16, 12528.8 , 12649.4 , 12834.85,
13320.2 , 13520.01, 13591.3 , 13676.58, 13709.57, 13837.03,
13931.15, 14006.48, 14105.94, 14440.17, 14716.66, 14744.56,
14932.51, 15203.09, 15787.28, 15944.45, 20187.98, 21595.33])
b1=np.zeros((10,3))
b2=np.zeros((10,3))
b3=np.zeros((10,3))
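# Assumes data holds 30 sorted values, grouped into 10 bins of 3.
# Mean bin: replace every value in a bin with the bin mean.
for i in range(0, 30, 3):
    k=int(i/3)
    mean=(data[i]+data[i+1]+data[i+2])/3
    for j in range(3):
        b1[k,j]=mean
print("-----------------Mean Bin:----------------- \n",b1)
# Median bin: the middle of three sorted values is the bin median.
for i in range(0, 30, 3):
    k=int(i/3)
    for j in range(3):
        b2[k,j]=data[i+1]
print("-----------------Median Bin :----------------- \n",b2)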
The boundary bin is computed in the same way, replacing each value with the nearer
of its bin's minimum and maximum.
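A minimal sketch of the boundary-bin step, written in the same loop style as the mean and median code, is given here. It is not the original listing. The array b3 follows the earlier np.zeros setup, and the sample input is six of the sorted turnover values from the printed output, used only so that the sketch runs without the CSV file. Ties are sent to the lower boundary, which is an arbitrary choice.

```python
import numpy as np

# Stand-in for the sorted `data` array: six of the values printed earlier.
data = np.array([12199.98, 12211.18, 12290.16, 12528.8, 12649.4, 12834.85])

bin_size = 3
b3 = np.zeros((len(data) // bin_size, bin_size))

for i in range(0, len(data), bin_size):
    k = i // bin_size
    low, high = data[i], data[i + bin_size - 1]  # bin minimum and maximum
    for j in range(bin_size):
        value = data[i + j]
        # Move the value to whichever boundary is nearer (ties go to low).
        b3[k, j] = low if value - low <= high - value else high

print("-----------------Boundary Bin:----------------- \n", b3)
```

With the full 30 values from the CSV file, the same loop would fill a 10 x 3 array, matching the shape of b1 and b2.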
pd.cut()
We can use the ‘cut’ function in two broad ways: by specifying the number
of bins directly and letting pandas calculate equal-width bins for us, or by
manually specifying the bin edges we want.
import pandas as pd
df=pd.read_csv('/home/student/Desktop/IPL.csv')
print(df)
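Both ways of calling cut can be sketched on a small Series. The values below are hypothetical stand-ins for a numeric column such as AUCTION YEAR; they are not taken from IPL.csv.

```python
import pandas as pd

# Hypothetical values standing in for a numeric column of the DataFrame.
years = pd.Series([2008, 2009, 2011, 2011, 2012, 2013, 2013, 2014, 2015])

# 1) Pass only the number of bins: pandas computes equal-width edges.
by_count = pd.cut(years, bins=3)

# 2) Pass the edges explicitly. Intervals are right-closed by default,
#    so the first edge sits below the smallest value to keep it in a bin.
by_edges = pd.cut(years, bins=[2007, 2010, 2012, 2015])

print(by_count.value_counts().sort_index())
print(by_edges.value_counts().sort_index())
```

In both cases the result holds intervals such as (2007, 2010], and the number of values per bin is allowed to differ.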
Instead of getting the intervals back, we can pass a list to the ‘labels’ parameter
so that each bin gets a readable name; the list needs one label per bin.
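A short sketch of the labels parameter follows. The Series values and the label names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical values standing in for a numeric column of the DataFrame.
years = pd.Series([2008, 2009, 2011, 2011, 2012, 2013, 2013, 2014, 2015])

# Three equal-width bins, each named by one entry of the labels list.
labelled = pd.cut(years, bins=3, labels=['early', 'middle', 'late'])

print(labelled.tolist())
```

The result is a categorical Series whose values are the label strings rather than intervals, which makes grouping and counting easier to read.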
pd.qcut()
qcut (quantile cut) differs from cut in that the number of elements in each
bin will be roughly the same, at the cost of intervals of different widths.
In cut, on the other hand, the bins were of equal width (when we specified
bins=3) but held an uneven number of elements. Also, cut is useful when you
know the interval ranges and the bins for sure.
import pandas as pd
import numpy as np
df=pd.read_csv('/home/student/Desktop/IPL.csv')
print(df)
print(np.array(sorted(df['AUCTION YEAR'].unique())))
df['Yr_qcut'] = pd.qcut(df['AUCTION YEAR'], q=5,
                        labels=['oldest', 'not so old', 'medium', 'newer', 'latest'])
ERROR:
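A qcut call like this one can fail when the column has only a few distinct values, because several quantiles then fall on the same number and the bin edges repeat. Whether that is the cause here depends on the data in IPL.csv. The sketch below uses made-up values to show the failure and the duplicates='drop' option of qcut.

```python
import pandas as pd

# Made-up values with many repeats; these are not the IPL data.
values = pd.Series([1, 1, 1, 1, 2, 3])

try:
    pd.qcut(values, q=4)
    raised = False
except ValueError as exc:
    # The 0%, 25% and 50% quantiles are all 1, so the edges are not unique.
    raised = True
    print("qcut failed:", exc)

# duplicates='drop' merges the repeated edges, leaving fewer bins than requested.
binned = pd.qcut(values, q=4, duplicates='drop')
print(binned.value_counts().sort_index())
```

If labels are passed together with duplicates='drop', their count has to match the number of bins that remain, so it is often simpler to reduce q or switch to cut with explicit edges.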
import pandas as pd
# create DataFrame
df = pd.DataFrame({'points': [4, 4, 7, 8, 12, 13, 15, 18, 22, 23, 23, 25],
                   'assists': [2, 5, 4, 7, 7, 8, 5, 4, 5, 11, 13, 8],
                   'rebounds': [7, 7, 4, 6, 3, 8, 9, 9, 12, 11, 8, 9]})
print(df)
df['points_bin'] = pd.qcut(df['points'], q=[0, .2, .4, .6, .8, 1],
                           labels=['A', 'B', 'C', 'D', 'E'])
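To see the difference between cut and qcut on the same points column, the sketch below bins it both ways into five groups and counts the members of each bin. The labels are left out so that the interval edges stay visible.

```python
import pandas as pd

points = pd.Series([4, 4, 7, 8, 12, 13, 15, 18, 22, 23, 23, 25])

# qcut: edges are placed at quantiles, so the counts are roughly equal.
q_counts = pd.qcut(points, q=[0, .2, .4, .6, .8, 1]).value_counts().sort_index()

# cut: edges are equally spaced, so the counts are free to vary.
c_counts = pd.cut(points, bins=5).value_counts().sort_index()

print(q_counts)  # counts per bin: 3, 2, 2, 2, 3
print(c_counts)  # counts per bin: 4, 1, 2, 1, 4
```

The qcut intervals have different widths while every cut interval spans 4.2 points, which is the trade-off described in the qcut section.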