Binning Using Python (2)
Binning is used to smooth data and to handle noisy data. In this method, the data is first sorted and then the sorted values are
distributed into a number of buckets or bins. As binning methods consult the neighbourhood
of values, they perform local smoothing. There are three approaches to performing
smoothing:
Smoothing by bin means: each value in a bin is replaced by the mean value of the bin.
Smoothing by bin medians: each value in a bin is replaced by the median value of the bin.
Smoothing by bin boundaries: the minimum and maximum values in a given bin are identified as the bin boundaries, and each value in the bin is replaced by the closest boundary value.
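As a minimal sketch, the three rules can be applied to a single sorted bin as follows. The helper names and the sample values are illustrative and do not come from the original text. The tie-breaking choice in the boundary rule is an assumption.

```python
from statistics import mean, median

def smooth_by_mean(bin_values):
    """Replace every value with the bin mean."""
    m = mean(bin_values)
    return [m] * len(bin_values)

def smooth_by_median(bin_values):
    """Replace every value with the bin median."""
    m = median(bin_values)
    return [m] * len(bin_values)

def smooth_by_boundaries(bin_values):
    """Replace every value with the closer of the bin minimum and maximum.

    Assumption: a value exactly halfway between the two goes to the maximum.
    """
    lo, hi = min(bin_values), max(bin_values)
    return [lo if (v - lo) < (hi - v) else hi for v in bin_values]

sample_bin = [2, 6, 7, 9, 13]  # illustrative values, already sorted
print(smooth_by_mean(sample_bin))
print(smooth_by_median(sample_bin))
print(smooth_by_boundaries(sample_bin))
```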
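Approach: sort the data, divide the sorted values into bins that each hold the same number of values, then replace each value according to the chosen smoothing rule.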
Example:
Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
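To make the example concrete, the following sketch partitions the twelve prices into equal-depth bins and applies the three smoothing rules with NumPy. The choice of three bins of four values each is an assumption, since the text does not state the bin count. The variable names are illustrative.

```python
import numpy as np

# Sorted price data from the example
prices = np.array([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34])

# Assumption: three equal-depth bins of four values each (one bin per row)
bins = prices.reshape(3, 4)

# Bin means and medians, repeated across each row
by_mean = np.tile(bins.mean(axis=1, keepdims=True), (1, 4))
by_median = np.tile(np.median(bins, axis=1, keepdims=True), (1, 4))

# Bin boundaries: rows are sorted, so the first column holds each bin's
# minimum and the last column holds its maximum
lo = bins[:, [0]]
hi = bins[:, [-1]]
by_boundary = np.where((bins - lo) < (hi - bins), lo, hi)

print("Bins:\n", bins)
print("Smoothing by bin means:\n", by_mean)
print("Smoothing by bin medians:\n", by_median)
print("Smoothing by bin boundaries:\n", by_boundary)
```

Under that assumption the bins are (4, 8, 9, 15), (21, 21, 24, 25) and (26, 28, 29, 34). The bin means are 9, 22.75 and 29.25, and the bin medians are 8.5, 22.5 and 28.5. Boundary smoothing gives (4, 4, 4, 15), (21, 21, 25, 25) and (26, 26, 26, 34).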
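import numpy as np
from sklearn.datasets import load_iris

# Load the iris data set (150 samples, 4 features).
# Assumption: the listing never defines b; here it is taken to be one
# feature column (index 1, sepal width), sorted in ascending order.
dataset = load_iris()
b = np.sort(dataset.data[:, 1])

# Create bins: 30 bins of 5 values each
bin1 = np.zeros((30, 5))
bin2 = np.zeros((30, 5))
bin3 = np.zeros((30, 5))

# Bin mean: replace every value with the mean of its bin
for i in range(0, 150, 5):
    k = i // 5
    mean = (b[i] + b[i+1] + b[i+2] + b[i+3] + b[i+4]) / 5
    for j in range(5):
        bin1[k, j] = mean
print("Bin Mean: \n", bin1)

# Bin boundaries: replace every value with the closer of the bin's
# minimum b[i] and maximum b[i+4]
for i in range(0, 150, 5):
    k = i // 5
    for j in range(5):
        if (b[i+j] - b[i]) < (b[i+4] - b[i+j]):
            bin2[k, j] = b[i]
        else:
            bin2[k, j] = b[i+4]
print("Bin Boundaries: \n", bin2)

# Bin median: the middle (third) of the five sorted values in each bin
for i in range(0, 150, 5):
    k = i // 5
    for j in range(5):
        bin3[k, j] = b[i+2]
print("Bin Median: \n", bin3)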
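The loops above hardcode 150 values and a bin size of 5. As one possible generalization (a sketch, not the original author's code), `np.array_split` can divide a sorted array into a chosen number of nearly equal-depth bins even when the length is not evenly divisible. The function name `smooth_bins` and its parameters are hypothetical.

```python
import numpy as np

def smooth_bins(values, n_bins, method="mean"):
    """Sort values, split them into n_bins nearly equal-depth bins, smooth each bin.

    Returns a 1-D float array of the same length as values, in sorted order.
    """
    data = np.sort(np.asarray(values, dtype=float))
    smoothed = []
    for chunk in np.array_split(data, n_bins):
        if chunk.size == 0:
            continue  # more bins were requested than there are values
        if method == "mean":
            out = np.full(chunk.size, chunk.mean())
        elif method == "median":
            out = np.full(chunk.size, np.median(chunk))
        elif method == "boundary":
            # chunk is sorted, so its ends are the bin minimum and maximum
            lo, hi = chunk[0], chunk[-1]
            out = np.where((chunk - lo) < (hi - chunk), lo, hi)
        else:
            raise ValueError(f"unknown method: {method!r}")
        smoothed.append(out)
    return np.concatenate(smoothed)

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
print(smooth_bins(prices, 3, "boundary"))
```

With 150 values and `n_bins=30` this produces 30 bins of 5 values, the same partition as the loops. For bins with an even number of values, `np.median` averages the two middle values, so results can differ from simply picking a middle element.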