Batch-2 Ieee DMT

Uploaded by

Rekha Desireddy
T3-ASSESSMENT

S. Vyshnavi (211FA04015), D. Pavan (211FA04101), T. Teja (211FA04108), S. Sri Pujitha (211FA04289)

B.TECH 3RD YEAR, CSE, BATCH-2, SECTION-C, VIGNAN UNIVERSITY

PROBLEM STATEMENT:
For the given dataset, solve the following problems:

A) Implement regression analysis

B) Generate histograms

C) Perform normalization

D) Perform discretization

ABSTRACT:
This study explores the application of statistical techniques including regression analysis, histogram generation, normalization, and discretization on a given dataset. Regression analysis uncovers relationships between variables, while histograms visualize their distributions. Normalization scales features for unbiased analysis, and discretization simplifies continuous data. Python, with libraries like NumPy, pandas, Matplotlib, and scikit-learn, is commonly used for implementation. Through these techniques, the study aims to uncover insights for informed decision-making across various domains.

INTRODUCTION:
Regression Analysis:
Regression analysis is a statistical method used to explore the relationship between one or more independent variables and a dependent variable. It helps in understanding how changes in independent variables affect the dependent variable and enables prediction based on these relationships.

Histogram Generation:
Histogram generation involves creating a graphical representation of the distribution of data values in a dataset. It displays the frequency or count of data points within specific intervals or bins, providing insights into the central tendency, spread, and shape of the data distribution.

Normalization:
Normalization is a data preprocessing technique used to scale the values of features in a dataset to a standard range. By bringing all features to a common scale, normalization ensures that each feature contributes proportionately to the analysis, preventing biases caused by varying scales.
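As a rough illustration of feature scaling, the sketch below applies z-score standardization, which is the form of normalization that scikit-learn's StandardScaler performs: each value is shifted by the mean and divided by the population standard deviation, giving zero mean and unit variance rather than a fixed range. The function name `z_score` and the sample cost values are hypothetical and are not taken from the study's dataset.

```python
from statistics import fmean, pstdev


def z_score(values):
    """Rescale values to zero mean and unit variance (population std)."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # All values are identical, so there is nothing to scale.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical annual healthcare costs, for illustration only.
costs = [2000.0, 4000.0, 6000.0]
scaled = z_score(costs)
print(scaled)
```

After scaling, the values are expressed in units of standard deviations from the mean, so features measured on very different scales become directly comparable.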
Discretization:
Discretization is the process of converting continuous variables into discrete intervals or categories. It simplifies the analysis of continuous data by grouping values into predefined bins or categories, facilitating interpretation and analysis, particularly in the context of categorical data or decision-making processes.

ALGORITHM:
1. Define a function get_input():
   a. Initialize an empty dictionary `data`.
   b. Define a list `features` containing the names of the features.
   c. Iterate over each feature:
      i. If the feature is 'Smoking Status' or 'Diabetes Status', prompt the user to input values for the feature 10 times and store them in a list.
      ii. If the feature is numerical, prompt the user to input numerical values for the feature 10 times and store them in a list.
   d. Create a DataFrame from the collected data and return it.
2. Define a function linear_regression(X, y):
   a. Instantiate a LinearRegression model.
   b. Fit the model to the input features `X` and target variable `y`.
   c. Print the intercept and coefficients of the model.
3. Define a function histogram_plotting(df):
   a. Create a figure with two subplots.
   b. Plot a histogram of 'BMI' in the first subplot.
   c. Plot a histogram of 'Cholesterol Level' in the second subplot.
   d. Set titles and labels for both subplots.
   e. Display the histograms.
4. Define a function normalization(df):
   a. Instantiate a StandardScaler.
   b. Fit and transform the 'Annual Healthcare Costs' column using the scaler.
   c. Add a new column 'Normalized Healthcare Costs' to the DataFrame containing the scaled values.
   d. Print the DataFrame with normalized values.
5. Define a function binning(df):
   a. Define bins and labels for age categories.
   b. Use pd.cut() to bin the 'Age' variable into categories.
   c. Add a new column 'Age Category' to the DataFrame containing the categorized age values.
   d. Print the DataFrame with categorized age values.
6. Main code:
   a. Call get_input() to collect data into a DataFrame `df`.
   b. Extract features and target variable for linear regression analysis.
   c. Call linear_regression() function with the extracted features and target variable.
   d. Call histogram_plotting() function with the DataFrame `df`.
   e. Call normalization() function with the DataFrame `df`.
   f. Call binning() function with the DataFrame `df`.

SOURCE CODE:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

def get_input():
    # Collect 10 values per feature from the user.
    data = {}
    features = ['Resident ID', 'Age', 'BMI', 'Blood Pressure', 'Cholesterol Level',
                'Daily Exercise Time (minutes)', 'Smoking Status', 'Diabetes Status',
                'Annual Healthcare Costs']
    for feature in features:
        if feature == 'Smoking Status' or feature == 'Diabetes Status':
            # Categorical features are stored as strings.
            data[feature] = [input(f'Enter {feature}: ') for _ in range(10)]
        else:
            # All other features are read as floats.
            data[feature] = [float(input(f'Enter {feature}: ')) for _ in range(10)]
    return pd.DataFrame(data)

def linear_regression(X, y):
    # Fit an ordinary least-squares model and report its parameters.
    model = LinearRegression()
    model.fit(X, y)
    print(f'Intercept: {model.intercept_}, Coefficient: {model.coef_[0]}')

def histogram_plotting(df):
    # Draw the two histograms side by side, each with 5 equal-width bins.
    plt.figure(figsize=(10, 4))
    plt.subplot(1, 2, 1)
    plt.hist(df['BMI'], bins=5, edgecolor='black')
    plt.xlabel('BMI')
    plt.ylabel('Frequency')
    plt.title('Distribution of BMI')
    plt.subplot(1, 2, 2)
    plt.hist(df['Cholesterol Level'], bins=5, edgecolor='black')
    plt.xlabel('Cholesterol Level')
    plt.ylabel('Frequency')
    plt.title('Distribution of Cholesterol Level')
    plt.tight_layout()
    plt.show()

def normalization(df):
    # StandardScaler applies z-score standardization: (x - mean) / standard deviation.
    scaler = StandardScaler()
    df['Normalized Healthcare Costs'] = scaler.fit_transform(df[['Annual Healthcare Costs']])
    print(df)

def binning(df):
    # With right=False the intervals are [20, 40), [40, 60), [60, 80);
    # ages outside this range are left as NaN.
    bins = [20, 40, 60, 80]
    labels = ['Young', 'Middle-aged', 'Elderly']
    df['Age Category'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    print(df[['Age', 'Age Category']])

# Main code
df = get_input()
# a) Linear Regression for Daily Exercise Time and Annual Healthcare Costs
X = df[['Daily Exercise Time (minutes)']]
y = df['Annual Healthcare Costs']
linear_regression(X, y)
# b) Histograms for BMI and Cholesterol Level
histogram_plotting(df)
# c) Normalization of Annual Healthcare Costs
normalization(df)
# d) Discretize the Age variable into meaningful categories
binning(df)
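For a single predictor, as in the regression of annual healthcare costs on daily exercise time, the least-squares intercept and slope can also be computed by hand, which may help in interpreting the printed coefficient. The sketch below is an illustration only and is not the study's implementation; `fit_line` and the sample values are hypothetical.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares for the line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean, y_mean = fmean(x), fmean(y)
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data: daily exercise minutes and annual healthcare costs.
minutes = [10.0, 20.0, 30.0, 40.0]
costs = [5000.0, 4500.0, 4000.0, 3500.0]
intercept, slope = fit_line(minutes, costs)
print(f'Intercept: {intercept}, Coefficient: {slope}')
```

A negative slope here would mean that, in this made-up sample, each additional minute of exercise is associated with lower annual costs; it does not by itself establish causation.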
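Because the age bins are half-open (left edge included, right edge excluded), it can be useful to check how boundary and out-of-range ages are treated. The sketch below mimics that behaviour with the standard library's `bisect` module; `age_category` is a hypothetical helper, and returning `None` for out-of-range ages is an assumption standing in for the missing value that pandas would assign.

```python
from bisect import bisect_right

BINS = [20, 40, 60, 80]
LABELS = ['Young', 'Middle-aged', 'Elderly']


def age_category(age):
    """Map an age to a label using intervals [20, 40), [40, 60), [60, 80)."""
    if age < BINS[0] or age >= BINS[-1]:
        # Outside every interval: no category (assumed stand-in for NaN).
        return None
    # bisect_right counts the edges <= age, so subtracting 1 gives the bin index.
    return LABELS[bisect_right(BINS, age) - 1]


for age in (19, 20, 39.5, 40, 79, 80):
    print(age, age_category(age))
```

If ages below 20 or of 80 and above can occur in the data, the bin edges would need to be widened for those rows to receive a category.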
OUTPUT:

CONCLUSION:
In conclusion, the integration of regression analysis, histograms, normalization, and discretization equips us with a holistic understanding of the dataset. These techniques empower informed decision-making and facilitate the implementation of targeted interventions across diverse domains.

REFERENCES:
https://www.javatpoint.com/regression-in-data-mining
https://www.geeksforgeeks.org/data-normalization-in-data-mining/
https://www.saedsayad.com/binning.htm#:~:text=Binning%20or%20discretization%20is%20the,(e.g.%2C%20decision%20trees)
https://corporatefinanceinstitute.com/resources/excel/histogram/#:~:text=A%20histogram%5B1%5D%20is%20used,to%20a%20vertical%20bar%20graph.
