Assignment Questions

Uploaded by

priyanshuaryangupta

PYTHON ASSIGNMENT

Q1. DAD hospital wants to understand the key factors influencing the
cost to the hospital. The hospital wants to offer treatment packages
(fixed-price contracts) to patients at the time of admission. Can the
hospital build a model using historical data to estimate the cost of
treatment?

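Ans 1.
import pandas as pd
import numpy as np
# Historical patient records; 'total_cost' is the target to estimate
data = pd.read_csv("hospital_data.csv")
X = data.drop('total_cost', axis=1)  # candidate predictors
y = data['total_cost']               # cost to the hospital
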
Q2. Build a correlation matrix between all the numeric features in the
dataset. Report the features which are correlated at a cut-off of 0.70.
What actions will you take on the features which are highly
correlated?

Ans 2.
from itertools import combinations
import seaborn as sns
import matplotlib.pyplot as plt
numeric_features = data.select_dtypes(include=[np.number])
corr_matrix = numeric_features.corr()
plt.figure(figsize=(12, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.show()
# Check each pair of features once; the diagonal (always 1.0) is skipped
highly_correlated_pairs = [(a, b, round(corr_matrix.loc[a, b], 2))
                           for a, b in combinations(corr_matrix.columns, 2)
                           if abs(corr_matrix.loc[a, b]) >= 0.7]
print(f"Highly correlated feature pairs (|r| >= 0.7): {highly_correlated_pairs}")

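A common action for each highly correlated pair is to keep one feature and drop the other, because the two carry largely the same information and together inflate the variance of the regression coefficients. Dropping the member that is less correlated with the target is one reasonable heuristic, not the only option (combining the pair or using regularization are alternatives). A minimal sketch on a small synthetic DataFrame; the helper `features_to_drop` and the columns `x1`, `x2`, `x3`, `target` are hypothetical, for illustration only:

```python
from itertools import combinations

import numpy as np
import pandas as pd


def features_to_drop(df, target, cutoff=0.7):
    """For every feature pair with |r| >= cutoff, drop the feature that is
    less correlated (in absolute value) with the target."""
    features = df.drop(columns=target)
    corr = features.corr().abs()
    target_corr = features.corrwith(df[target]).abs()
    to_drop = set()
    for a, b in combinations(features.columns, 2):
        if a in to_drop or b in to_drop:
            continue  # pair already resolved by an earlier drop
        if corr.loc[a, b] >= cutoff:
            to_drop.add(a if target_corr[a] < target_corr[b] else b)
    return sorted(to_drop)


# Synthetic example: x2 is a near-copy of x1, x3 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=200),
    "x3": rng.normal(size=200),
})
df["target"] = 3 * df["x1"] + df["x3"] + rng.normal(scale=0.5, size=200)
dropped = features_to_drop(df, "target")
print("Drop:", dropped)
```

Only one member of the `x1`/`x2` pair is removed, so the information they share stays in the model.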
Q3. Select the features that can be used to build a model to estimate
the cost to the hospital.

Ans 3.
# Candidate predictors; these names must match the columns in hospital_data.csv
selected_features = ['age', 'admission_days', 'diagnosis_code',
                     'hospital_type', 'previous_conditions']
X_selected = X[selected_features]

Q4. Identify which features are numerical and which are categorical.
Create a new DataFrame with the selected numeric and categorical
features. Encode the categorical features and create dummy features.

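Ans 4.
numeric_features = X_selected.select_dtypes(include=[np.number])
categorical_features = X_selected.select_dtypes(exclude=[np.number])
print("Numeric features:", list(numeric_features.columns))
print("Categorical features:", list(categorical_features.columns))
# drop_first=True avoids the dummy-variable trap; dtype=float keeps every
# column numeric so the VIF and OLS steps receive a float matrix
X_encoded = pd.get_dummies(X_selected, drop_first=True, dtype=float)
print(X_encoded.head())
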
Q5. Which features show symptoms of multicollinearity and need
to be removed from the model?

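Ans 5.
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
def compute_vif(df):
    # VIF is computed on a design matrix that includes an intercept;
    # index i + 1 skips the 'const' column added by add_constant
    exog = sm.add_constant(df)
    return pd.DataFrame({
        "Feature": df.columns,
        "VIF": [variance_inflation_factor(exog.values, i + 1) for i in range(df.shape[1])],
    })
vif_data = compute_vif(X_encoded)
print(vif_data)
# Remove the feature with the highest VIF one at a time until all VIFs are <= 10
while not vif_data.empty and vif_data["VIF"].max() > 10:
    worst = vif_data.loc[vif_data["VIF"].idxmax(), "Feature"]
    X_encoded = X_encoded.drop(columns=worst)
    vif_data = compute_vif(X_encoded)
print(vif_data)
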
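The VIF of feature j is VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing feature j on all the other features plus an intercept; a cut-off of 10 therefore corresponds to R_j^2 > 0.9. A minimal NumPy sketch computing it straight from that definition, on synthetic data where `x2` is deliberately built as a near-copy of `x1` (the helper `vif` and the data are illustrative assumptions):

```python
import numpy as np


def vif(X):
    """VIF for each column of X via VIF_j = 1 / (1 - R_j^2).

    R_j^2 comes from a least-squares regression of column j
    on an intercept plus all the other columns.
    """
    n, k = X.shape
    result = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        result.append(1 / (1 - r2))
    return np.array(result)


rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)  # nearly collinear with x1
x3 = rng.normal(size=300)                  # independent
v = vif(np.column_stack([x1, x2, x3]))
print(v.round(2))
```

Because the intercept is included in each auxiliary regression, these values should agree with `variance_inflation_factor` when it is given a design matrix that contains a constant column.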
Q6. Find the outliers in the dataset using Z-score and Cook’s distance.
If required, remove the observations from the dataset.

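Ans 6.
import statsmodels.api as sm
from scipy.stats import zscore
from statsmodels.stats.outliers_influence import OLSInfluence
# Z-score on the continuous features and the target; dummy columns are left out
# because rare categories would otherwise be flagged as outliers
num_cols = [c for c in numeric_features.columns if c in X_encoded.columns]
z_scores = np.abs(zscore(pd.concat([X_encoded[num_cols], y], axis=1)))
# A row is an outlier if ANY of its values lies more than 3 standard deviations from the mean
outliers = (z_scores > 3).any(axis=1)
print(f"Number of outliers detected by Z-score: {outliers.sum()}")
X_clean = X_encoded[~outliers]
y_clean = y[~outliers]
# Cook's distance from an OLS fit on the remaining observations
model = sm.OLS(y_clean, sm.add_constant(X_clean)).fit()
influence = OLSInfluence(model)
cooks_d = influence.cooks_distance[0]
outliers_cooks = cooks_d > 4 / len(X_clean)
print(f"Number of outliers detected by Cook's distance: {outliers_cooks.sum()}")
X_clean = X_clean[~outliers_cooks]
y_clean = y_clean[~outliers_cooks]
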
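For reference, Cook's distance for observation i is D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2, where e_i is the residual, p the number of fitted parameters (including the intercept), s^2 the residual mean squared error, and h_ii the leverage, i.e. the i-th diagonal element of the hat matrix X (X^T X)^-1 X^T. A minimal NumPy sketch of that formula on synthetic data with one planted influential point; the helper `cooks_distance` and the data are illustrative, and the values should match statsmodels' `OLSInfluence(...).cooks_distance[0]` for the same fit:

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance for an OLS fit of y on X (X must already include the intercept column)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)                 # residual mean squared error
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages: diagonal of the hat matrix
    return resid ** 2 / (p * s2) * h / (1 - h) ** 2


rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)
y = 2 + 3 * x + rng.normal(size=50)
x = np.append(x, 20.0)  # high-leverage point ...
y = np.append(y, 10.0)  # ... far below the true line
X = np.column_stack([np.ones_like(x), x])
d = cooks_distance(X, y)
print("Flagged rows:", np.where(d > 4 / len(y))[0])
```

The planted point combines high leverage with a large residual, which is exactly the combination Cook's distance is designed to catch.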
Q7. Split the data into training set and test set. Use 80% of data for
model training and 20% for model testing.

Ans 7.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_clean, y_clean,
test_size=0.2, random_state=42)

Q8. Build a regression model with statsmodels.api to estimate the total
cost to the hospital. How do you interpret the model outcome?

Q9. Which features are statistically significant in predicting the total
cost to the hospital?

Q10. Build a linear regression model with significant features and
report model performance.

Q11. Conduct residual analysis using a P-P plot to find out whether the
model is valid.

Q12. Predict the total cost using the test set and report the RMSE of the
model.
