Binning Using Python (2)

Uploaded by

mahwatatakunda21
Copyright
© © All Rights Reserved

Prerequisite: ML | Binning or Discretization

Binning is used to smooth data and to handle noisy data. The data is first sorted, and the
sorted values are then distributed into a number of buckets, or bins. Because binning methods
consult the neighbourhood of each value (the other values in the same bin), they perform
local smoothing. There are three approaches to smoothing:

- Smoothing by bin means: each value in a bin is replaced by the mean value of the bin.
- Smoothing by bin median: each value in a bin is replaced by the median value of the bin.
- Smoothing by bin boundaries: the minimum and maximum values in a bin are identified as
  the bin boundaries, and each value in the bin is replaced by the closest boundary value.
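
The three rules can be sketched for a single bin as follows. This is a minimal illustration using only the standard library; the function names are hypothetical, and sending ties to the upper boundary is an assumption that mirrors the comparison used in the implementation later in this document.

```python
from statistics import mean, median


def smooth_by_mean(bin_values):
    """Replace every value in the bin with the bin mean."""
    m = mean(bin_values)
    return [m] * len(bin_values)


def smooth_by_median(bin_values):
    """Replace every value in the bin with the bin median."""
    m = median(bin_values)
    return [m] * len(bin_values)


def smooth_by_boundaries(bin_values):
    """Replace every value with the closer of the bin minimum and maximum."""
    lo, hi = min(bin_values), max(bin_values)
    # Assumption: a value exactly halfway between the boundaries goes to the upper one.
    return [lo if (v - lo) < (hi - v) else hi for v in bin_values]


one_bin = [4, 8, 9, 15]
print(smooth_by_mean(one_bin))
print(smooth_by_median(one_bin))
print(smooth_by_boundaries(one_bin))
```

Note that the median of an even-sized bin is the average of its two middle values, so it need not be a value that occurs in the bin.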

Approach:

1. Sort the array of the given data set.
2. Divide the sorted values into N intervals, each containing approximately the same number
   of samples (equal-depth partitioning).
3. Replace each value with its bin's mean, median, or closest boundary, storing one bin per row.
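
Steps 1 and 2 can be sketched in a few lines of plain Python. The helper name `equal_depth_bins` is hypothetical, and giving any leftover values to the first bins is an assumption, since the text only asks for approximately equal bin sizes.

```python
def equal_depth_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of near-equal size."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # Assumption: the first `extra` bins each take one additional value.
        size = base + (1 if i < extra else 0)
        bins.append(ordered[start:start + size])
        start += size
    return bins


# Unsorted price data; the function sorts it before partitioning.
prices = [21, 4, 34, 8, 25, 9, 28, 15, 21, 29, 24, 26]
for number, bin_values in enumerate(equal_depth_bins(prices, 3), start=1):
    print(f"Bin {number}: {bin_values}")
```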

Examples:

Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34

Partition using equal frequency approach:


- Bin 1 : 4, 8, 9, 15
- Bin 2 : 21, 21, 24, 25
- Bin 3 : 26, 28, 29, 34

Smoothing by bin means (exact means 9, 22.75 and 29.25, rounded to the nearest integer):


- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29

Smoothing by bin boundaries:


- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34

Smoothing by bin median (exact medians 8.5, 22.5 and 28.5, rounded up):

- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
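
The means and medians listed above are whole numbers, although the exact values are not always integers. A quick check, assuming the values were rounded half up (an assumption; the source does not state its rounding rule, and `round_half_up` is a hypothetical helper), reproduces them:

```python
import math
from statistics import mean, median


def round_half_up(x):
    """Round to the nearest integer, with exact halves going up."""
    return math.floor(x + 0.5)


bins = [[4, 8, 9, 15], [21, 21, 24, 25], [26, 28, 29, 34]]
for number, bin_values in enumerate(bins, start=1):
    m, med = mean(bin_values), median(bin_values)
    print(f"Bin {number}: mean {m} -> {round_half_up(m)}, "
          f"median {med} -> {round_half_up(med)}")
```

Python's built-in `round` sends exact halves to the nearest even integer, so it would give 8, 22 and 28 for the medians rather than the listed 9, 23 and 29.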

Below is a Python implementation of the above algorithm, applied to one feature column of the iris data set with 5 values per bin:

import numpy as np
from sklearn.datasets import load_iris

# load the iris data set (150 samples, 4 feature columns)
dataset = load_iris()
a = dataset.data

# take the column at index 1 (the second of the 4 feature columns)
b = np.zeros(150)
for i in range(150):
    b[i] = a[i, 1]

b = np.sort(b)  # sort the array

# create bins
bin1=np.zeros((30,5))
bin2=np.zeros((30,5))
bin3=np.zeros((30,5))

# Bin mean: every value in a bin of 5 is replaced by the bin's mean
for i in range(0, 150, 5):
    k = i // 5  # row index of the current bin
    mean = (b[i] + b[i+1] + b[i+2] + b[i+3] + b[i+4]) / 5
    for j in range(5):
        bin1[k, j] = mean
print("Bin Mean: \n", bin1)

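# Bin boundaries: every value is replaced by the closer of the bin's
# minimum b[i] and maximum b[i+4]; ties go to the upper boundary
for i in range(0, 150, 5):
    k = i // 5
    for j in range(5):
        if (b[i+j] - b[i]) < (b[i+4] - b[i+j]):
            bin2[k, j] = b[i]
        else:
            bin2[k, j] = b[i+4]
print("Bin Boundaries: \n", bin2)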

# Bin median: b[i+2] is the middle of the 5 sorted values in the bin
for i in range(0, 150, 5):
    k = i // 5
    for j in range(5):
        bin3[k, j] = b[i+2]
print("Bin Median: \n", bin3)
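
The loops above hard-code 150 values and bins of 5. As a hedged alternative, the same three smoothings can be sketched with NumPy array operations for any bin size that divides the data length evenly. The function name `smooth_equal_depth` is hypothetical, and the price data is reused here so that the sketch runs without scikit-learn.

```python
import numpy as np


def smooth_equal_depth(values, bin_size):
    """Return (means, boundaries, medians), each of shape (n_bins, bin_size)."""
    b = np.sort(np.asarray(values, dtype=float))
    if b.size % bin_size != 0:
        raise ValueError("len(values) must be a multiple of bin_size")
    bins = b.reshape(-1, bin_size)        # one row per bin
    lo, hi = bins[:, :1], bins[:, -1:]    # row minimum and maximum (rows are sorted)
    means = np.repeat(bins.mean(axis=1, keepdims=True), bin_size, axis=1)
    medians = np.repeat(np.median(bins, axis=1, keepdims=True), bin_size, axis=1)
    # Ties go to the upper boundary, matching the comparison in the loop version.
    boundaries = np.where((bins - lo) < (hi - bins), lo, hi)
    return means, boundaries, medians


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
means, boundaries, medians = smooth_equal_depth(prices, 4)
print("Bin Mean: \n", means)
print("Bin Boundaries: \n", boundaries)
print("Bin Median: \n", medians)
```

Calling `smooth_equal_depth(b, 5)` on the sorted iris column should give arrays of the same shape (30, 5) as `bin1`, `bin2` and `bin3`, with the means agreeing up to floating-point rounding.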
