Batch-2 Ieee DMT

Uploaded by

Rekha Desireddy

T3-ASSESSMENT

S. Vyshnavi (211FA04015), D. Pavan (211FA04101), T. Teja (211FA04108), S. Sri Pujitha (211FA04289)

B.TECH 3RD YEAR, CSE, BATCH-2, SECTION-C, VIGNAN UNIVERSITY

PROBLEM STATEMENT:
For the given dataset, solve the problems below:

A) Implement regression analysis

B) Generate histograms

C) Perform normalization

D) Perform discretization

ABSTRACT:
This study explores the application of statistical techniques including regression analysis, histogram generation, normalization, and discretization on a given dataset. Regression analysis uncovers relationships between variables, while histograms visualize their distributions. Normalization scales features for unbiased analysis, and discretization simplifies continuous data. Python, with libraries such as NumPy, pandas, Matplotlib, and scikit-learn, is used for the implementation. Through these techniques, the study aims to uncover insights for informed decision-making across various domains.

INTRODUCTION:
Regression Analysis:
Regression analysis is a statistical method used to explore the relationship between one or more independent variables and a dependent variable. It helps in understanding how changes in the independent variables affect the dependent variable and enables prediction based on these relationships.

Histogram Generation:
Histogram generation involves creating a graphical representation of the distribution of data values in a dataset. It displays the frequency or count of data points within specific intervals or bins, providing insights into the central tendency, spread, and shape of the data distribution.

Normalization:
Normalization is a data preprocessing technique used to scale the values of features in a dataset to a standard range. By bringing all features to a common scale, normalization ensures that each feature contributes proportionately to the analysis, preventing biases caused by varying scales. The implementation in this report uses scikit-learn's StandardScaler, which rescales a feature to zero mean and unit variance (z-score standardization) rather than to a fixed range such as [0, 1].

Discretization:
Discretization is the process of converting continuous variables into discrete intervals or categories. It simplifies the analysis of continuous data by grouping values into predefined bins or categories, facilitating interpretation and analysis, particularly in the context of categorical data or decision-making processes.

ALGORITHM:
1. Define a function get_input():
   a. Initialize an empty dictionary `data`.
   b. Define a list `features` containing the names of the features.
   c. Iterate over each feature:
      i. If the feature is 'Smoking Status' or 'Diabetes Status', prompt the user to input values for the feature 10 times and store them in a list.
      ii. If the feature is numerical, prompt the user to input numerical values for the feature 10 times and store them in a list.
   d. Create a DataFrame from the collected data and return it.

2. Define a function linear_regression(X, y):
   a. Instantiate a LinearRegression model.
   b. Fit the model to the input features `X` and target variable `y`.
   c. Print the intercept and coefficients of the model.

3. Define a function histogram_plotting(df):
   a. Create a figure with two subplots.
   b. Plot a histogram of 'BMI' in the first subplot.
   c. Plot a histogram of 'Cholesterol Level' in the second subplot.
   d. Set titles and labels for both subplots.
   e. Display the histograms.

4. Define a function normalization(df):
   a. Instantiate a StandardScaler.
   b. Fit and transform the 'Annual Healthcare Costs' column using the scaler.
   c. Add a new column 'Normalized Healthcare Costs' to the DataFrame containing the scaled values.
   d. Print the DataFrame with normalized values.

5. Define a function binning(df):
   a. Define bins and labels for age categories.
   b. Use pd.cut() to bin the 'Age' variable into categories.
   c. Add a new column 'Age Category' to the DataFrame containing the categorized age values.
   d. Print the DataFrame with categorized age values.

6. Main code:
   a. Call get_input() to collect data into a DataFrame `df`.
   b. Extract features and target variable for linear regression analysis.
   c. Call linear_regression() function with the extracted features and target variable.
   d. Call histogram_plotting() function with the DataFrame `df`.
   e. Call normalization() function with the DataFrame `df`.
   f. Call binning() function with the DataFrame `df`.

SOURCE CODE:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

def get_input():
    # Collect 10 values per feature from the user.
    data = {}
    features = ['Resident ID', 'Age', 'BMI', 'Blood Pressure', 'Cholesterol Level',
                'Daily Exercise Time (minutes)', 'Smoking Status', 'Diabetes Status',
                'Annual Healthcare Costs']
    for feature in features:
        if feature == 'Smoking Status' or feature == 'Diabetes Status':
            # Categorical features are stored as entered (strings).
            data[feature] = [input(f'Enter {feature}: ') for _ in range(10)]
        else:
            # Numerical features are converted to float.
            data[feature] = [float(input(f'Enter {feature}: ')) for _ in range(10)]
    return pd.DataFrame(data)

def linear_regression(X, y):
    model = LinearRegression()
    model.fit(X, y)
    print(f'Intercept: {model.intercept_}, Coefficient: {model.coef_[0]}')

def histogram_plotting(df):
    plt.figure(figsize=(10, 4))
    # Left subplot: BMI
    plt.subplot(1, 2, 1)
    plt.hist(df['BMI'], bins=5, edgecolor='black')
    plt.xlabel('BMI')
    plt.ylabel('Frequency')
    plt.title('Distribution of BMI')
    # Right subplot: Cholesterol Level
    plt.subplot(1, 2, 2)
    plt.hist(df['Cholesterol Level'], bins=5, edgecolor='black')
    plt.xlabel('Cholesterol Level')
    plt.ylabel('Frequency')
    plt.title('Distribution of Cholesterol Level')
    plt.tight_layout()
    plt.show()

def normalization(df):
    # StandardScaler standardizes to zero mean and unit variance.
    scaler = StandardScaler()
    # fit_transform returns an (n, 1) array; ravel() flattens it into a column.
    df['Normalized Healthcare Costs'] = scaler.fit_transform(df[['Annual Healthcare Costs']]).ravel()
    print(df)

def binning(df):
    bins = [20, 40, 60, 80]
    labels = ['Young', 'Middle-aged', 'Elderly']
    # right=False gives the intervals [20, 40), [40, 60), [60, 80);
    # ages below 20 or at or above 80 fall outside every bin and become NaN.
    df['Age Category'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    print(df[['Age', 'Age Category']])

# Main code
df = get_input()

# a) Linear Regression for Daily Exercise Time and Annual Healthcare Costs
X = df[['Daily Exercise Time (minutes)']]
y = df['Annual Healthcare Costs']
linear_regression(X, y)

# b) Histograms for BMI and Cholesterol Level
histogram_plotting(df)

# c) Normalization of Annual Healthcare Costs
normalization(df)

# d) Discretize the Age variable into meaningful categories
binning(df)
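
The four steps in the source code above can also be checked without typing values at the prompt. The following is a minimal sketch under stated assumptions, not the report's own run: the DataFrame `demo` and every value in it are made up for illustration, the cost column is deliberately constructed as an exact straight line so the fitted coefficients are known in advance, and np.histogram is used in place of plt.hist to obtain the bin counts without opening a plot window.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Made-up illustrative values (assumption: NOT the report's dataset).
demo = pd.DataFrame({
    'Age': [20, 35, 40, 59, 60, 80],
    'BMI': [18.5, 22.0, 24.5, 27.0, 31.0, 35.0],
    'Daily Exercise Time (minutes)': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})
# Costs built as an exact line: cost = 5000 - 40 * minutes (hypothetical).
demo['Annual Healthcare Costs'] = 5000.0 - 40.0 * demo['Daily Exercise Time (minutes)']

# a) Regression: on exactly linear data the fit recovers slope and intercept.
model = LinearRegression()
model.fit(demo[['Daily Exercise Time (minutes)']], demo['Annual Healthcare Costs'])
slope = float(model.coef_[0])
intercept = float(model.intercept_)

# b) Histogram: np.histogram returns the per-bin counts and the bin edges.
counts, edges = np.histogram(demo['BMI'], bins=5)

# c) StandardScaler: the result has mean 0 and population standard deviation 1.
scaled = StandardScaler().fit_transform(demo[['Annual Healthcare Costs']]).ravel()

# d) Binning with right=False: intervals are [20, 40), [40, 60), [60, 80),
#    so age 40 is 'Middle-aged' and age 80 falls outside every bin (NaN).
age_category = pd.cut(demo['Age'], bins=[20, 40, 60, 80],
                      labels=['Young', 'Middle-aged', 'Elderly'], right=False)

print(f'Intercept: {intercept}, Coefficient: {slope}')
print(counts, edges)
print(scaled.mean(), scaled.std())
print(age_category.tolist())
```

The binning result shows why the choice of bin edges matters: with these edges, a resident aged exactly 80 (or younger than 20) receives no category, so the edges would need to be widened if such ages occur in the data.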
OUTPUT:
[Output screenshots omitted]

CONCLUSION:
In conclusion, the integration of regression analysis, histograms, normalization, and discretization equips us with a holistic understanding of the dataset. These techniques empower informed decision-making and facilitate the implementation of targeted interventions across diverse domains.

REFERENCES:
https://www.javatpoint.com/regression-in-data-mining

https://www.geeksforgeeks.org/data-normalization-in-data-mining/

https://www.saedsayad.com/binning.htm#:~:text=Binning%20or%20discretization%20is%20the,(e.g.%2C%20decision%20trees)

https://corporatefinanceinstitute.com/resources/excel/histogram/#:~:text=A%20histogram%5B1%5D%20is%20used,to%20a%20vertical%20bar%20graph.
