Revision

This notebook contains Python code for data analysis and visualization using pandas, numpy, matplotlib, and seaborn. It includes examples of linear regression, dataset loading, and various plotting techniques with the iris and tips datasets, along with descriptions of the diabetes and wine datasets and their attributes.



In [136]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, accuracy_score, confusion_matrix, mean_squared_error
from sklearn.metrics import classification_report  # used later but missing from the truncated import line
from sklearn.preprocessing import MinMaxScaler

In [35]: xpoints = np.array([1, 2, 3, 4])
ypoints = np.array([3, 8, 1, 10])
font1 = {'family': 'serif', 'color': 'blue', 'size': 20}
font2 = {'family': 'serif', 'color': 'purple', 'size': 14}
# The final keyword was truncated in the export; color='green' is assumed here.
plt.plot(ypoints, marker='*', mec='red', mfc='black', ms=20, linestyle='dotted', color='green')
plt.xlabel("X label", fontdict=font2)
plt.ylabel("Y label", fontdict=font2)
plt.xlim(0, 4)
plt.ylim(0, 12)
plt.title("Revision plot", fontdict=font1, loc='left')
plt.show()

In [36]: #plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)

#plot 2:
x = np.array([0, 1, 2, 3])


y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)

plt.show()


In [44]: # Example dataset
data = {
    'Experience': [1, 2, 3, 4, 5],
    'Salary': [30000, 35000, 40000, 45000, 50000]
}
df = pd.DataFrame(data)

# Features and target
X = df[['Experience']].values  # Independent variable
y = df['Salary'].values        # Dependent variable

# Split data (the random_state value was truncated in the export; a fixed seed is assumed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Metrics
print("Coefficients:", model.coef_)    # Slope
print("Intercept:", model.intercept_)  # Y-intercept
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))

# Predicted results
print("Predicted Salary:", y_pred)

Coefficients: [5000.]
Intercept: 25000.0
Mean Squared Error: 0.0
Predicted Salary: [35000.]
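
The MSE of 0.0 is expected: the toy data lies exactly on the line Salary = 25000 + 5000 × Experience, so the held-out point is predicted perfectly. Separately, MinMaxScaler is imported at the top but never used; below is a minimal sketch (not from the original notebook) of how it could rescale the Experience feature to [0, 1] before fitting:

In [ ]: # Sketch: rescale the feature to [0, 1] with the imported MinMaxScaler.
scaler = MinMaxScaler()             # defaults to feature_range=(0, 1)
X_scaled = scaler.fit_transform(X)  # fit on the feature column, then transform it
print(X_scaled.ravel())             # [0.   0.25 0.5  0.75 1.  ]

# Rescaling a single feature does not change a linear fit's predictions,
# only the learned coefficient and intercept; it matters more with several features.
model_scaled = LinearRegression().fit(X_scaled, y)
print(model_scaled.coef_, model_scaled.intercept_)  # [20000.] 30000.0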

In [47]: iris=sns.load_dataset('iris')

In [48]: iris


Out[48]:      sepal_length  sepal_width  petal_length  petal_width    species
         0             5.1          3.5           1.4          0.2     setosa
         1             4.9          3.0           1.4          0.2     setosa
         2             4.7          3.2           1.3          0.2     setosa
         3             4.6          3.1           1.5          0.2     setosa
         4             5.0          3.6           1.4          0.2     setosa
         ..             ...          ...           ...          ...       ...
         145           6.7          3.0           5.2          2.3  virginica
         146           6.3          2.5           5.0          1.9  virginica
         147           6.5          3.0           5.2          2.0  virginica
         148           6.2          3.4           5.4          2.3  virginica
         149           5.9          3.0           5.1          1.8  virginica

         150 rows × 5 columns

In [54]: sns.scatterplot(x='sepal_length',y='sepal_width',hue='species',data=iris)

Out[54]: <Axes: xlabel='sepal_length', ylabel='sepal_width'>

In [63]: # The palette value was truncated in the export; 'viridis' is assumed here.
sns.scatterplot(x='petal_length', y='petal_width', hue='species', data=iris, palette='viridis')
plt.title('petal')
plt.show()
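
For revision purposes, seaborn can also draw every pairwise relationship in one grid; a short sketch (not in the original notebook) using the same iris frame:

In [ ]: # Sketch: scatterplots for all feature pairs, colored by species,
# with per-feature distributions on the diagonal.
sns.pairplot(iris, hue='species')
plt.show()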


In [65]: tip = sns.load_dataset('tips')

In [67]: tip.head(1)

Out[67]: total_bill tip sex smoker day time size

0 16.99 1.01 Female No Sun Dinner 2

In [74]: # The linewidth value was truncated in the export; 2.5 is assumed here.
sns.boxplot(x='day', y='tip', data=tip, hue='sex', palette='coolwarm', linewidth=2.5)

Out[74]: <Axes: xlabel='day', ylabel='tip'>


In [143]: sns.violinplot(x='day', y='tip', hue='smoker', data=tip)

Out[143]: <Axes: xlabel='day', ylabel='tip'>

In [82]: sns.countplot(x='sex',hue='smoker',data=tip)

Out[82]: <Axes: xlabel='sex', ylabel='count'>


In [102]: from sklearn.datasets import load_diabetes, load_wine

In [98]: diabetes=load_diabetes()

In [99]: diabetes.DESCR

Out[99]: .. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

:Number of Instances: 442
:Number of Attributes: First 10 columns are numeric predictive values
:Target: Column 11 is a quantitative measure of disease progression one year after baseline
:Attribute Information:
    - age   age in years
    - sex
    - bmi   body mass index
    - bp    average blood pressure
    - s1    tc, total serum cholesterol
    - s2    ldl, low-density lipoproteins
    - s3    hdl, high-density lipoproteins
    - s4    tch, total cholesterol / HDL
    - s5    ltg, possibly log of serum triglycerides level
    - s6    glu, blood sugar level

Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times the square root of `n_samples` (i.e. the sum of squares of each column totals 1).

Source URL: https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)

In [111]: diabetes_df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
diabetes_target = diabetes.target  # target vector, kept separate from the feature frame

In [112]: diabetes_df


Out[112]:       age       sex       bmi        bp        s1        s2        s3  ...
0          0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401  ...
1         -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163  0.074412  ...
2          0.085299  0.050680  0.044451 -0.005670 -0.045599 -0.034194 -0.032356  ...
3         -0.089063 -0.044642 -0.011595 -0.036656  0.012191  0.024991 -0.036038  ...
4          0.005383 -0.044642 -0.036385  0.021872  0.003935  0.015596  0.008142  ...
..              ...       ...       ...       ...       ...       ...       ...  ...
437        0.041708  0.050680  0.019662  0.059744 -0.005697 -0.002566 -0.028674  ...
438       -0.005515  0.050680 -0.015906 -0.067642  0.049341  0.079165 -0.028674  ...
439        0.041708  0.050680 -0.015906  0.017293 -0.037344 -0.013840 -0.024993  ...
440       -0.045472 -0.044642  0.039062  0.001215  0.016318  0.015283 -0.028674  ...
441       -0.045472 -0.044642 -0.073030 -0.081413  0.083740  0.027809  0.173816  ...

442 rows × 10 columns
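
The scaling note in the DESCR above can be checked directly; a one-line verification (not in the original notebook) that each feature column's sum of squares is 1:

In [ ]: # Each of the 10 scaled feature columns should have sum of squares ≈ 1.
print(np.sum(diabetes.data ** 2, axis=0))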

In [113]: wine = load_wine()

In [114]: wine.DESCR


Out[114]: .. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

:Number of Instances: 178
:Number of Attributes: 13 numeric, predictive attributes and the class
:Attribute Information:
    - Alcohol
    - Malic acid
    - Ash
    - Alcalinity of ash
    - Magnesium
    - Total phenols
    - Flavanoids
    - Nonflavanoid phenols
    - Proanthocyanins
    - Color intensity
    - Hue
    - OD280/OD315 of diluted wines
    - Proline
    - class:
        - class_0
        - class_1
        - class_2

:Summary Statistics:

============================= ==== ===== ======= =====
                               Min   Max    Mean    SD
============================= ==== ===== ======= =====
Alcohol:                      11.0  14.8    13.0   0.8
Malic Acid:                   0.74  5.80    2.34  1.12
Ash:                          1.36  3.23    2.36  0.27
Alcalinity of Ash:            10.6  30.0    19.5   3.3
Magnesium:                    70.0 162.0    99.7  14.3
Total Phenols:                0.98  3.88    2.29  0.63
Flavanoids:                   0.34  5.08    2.03  1.00
Nonflavanoid Phenols:         0.13  0.66    0.36  0.12
Proanthocyanins:              0.41  3.58    1.59  0.57
Colour Intensity:              1.3  13.0     5.1   2.3
Hue:                          0.48  1.71    0.96  0.23
OD280/OD315 of diluted wines: 1.27  4.00    2.61  0.71
Proline:                       278  1680     746   315
============================= ==== ===== ======= =====

:Missing Attribute Values: None
:Class Distribution: class_0 (59), class_1 (71), class_2 (48)
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%[email protected])
:Date: July, 1988

This is a copy of UCI ML Wine recognition datasets.
https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

The data is the results of a chemical analysis of wines grown in the same region in Italy by three different cultivators. There are thirteen different measurements taken for different constituents found in the three types of wine.

Original Owners:

Forina, M. et al, PARVUS -
An Extendible Package for Data Exploration, Classification and Correlation.
Institute of Pharmaceutical and Food Analysis and Technologies,
Via Brigata Salerno, 16147 Genoa, Italy.

Citation:

Lichman, M. (2013). UCI Machine Learning Repository [https://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

References:

(1) S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Technometrics.) The data was used with many others for comparing various classifiers. The classes are separable, though only RDA has achieved 100% correct classification. (RDA: 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data)) (All results using the leave-one-out technique.)

(2) S. Aeberhard, D. Coomans and O. de Vel, "THE CLASSIFICATION PERFORMANCE OF RDA", Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Journal of Chemometrics.)

In [116]: wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_target = wine.target

In [117]: wine_df


Out[117]:  alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  flavanoids  ...
0            14.23        1.71  2.43               15.6      127.0           2.80        3.06  ...
1            13.20        1.78  2.14               11.2      100.0           2.65        2.76  ...
2            13.16        2.36  2.67               18.6      101.0           2.80        3.24  ...
3            14.37        1.95  2.50               16.8      113.0           3.85        3.49  ...
4            13.24        2.59  2.87               21.0      118.0           2.80        2.69  ...
..             ...         ...   ...                ...        ...            ...         ...  ...
173          13.71        5.65  2.45               20.5       95.0           1.68        0.61  ...
174          13.40        3.91  2.48               23.0      102.0           1.80        0.75  ...
175          13.27        4.28  2.26               20.0      120.0           1.59        0.69  ...
176          13.17        2.59  2.37               20.0      120.0           1.65        0.68  ...
177          14.13        4.10  2.74               24.5       96.0           2.05        0.76  ...

178 rows × 13 columns

In [119]: # The random_state value was truncated in the export; a fixed seed is assumed.
X_train, X_test, y_train, y_test = train_test_split(diabetes_df, diabetes_target, random_state=42)

In [120]: model = LinearRegression()
model.fit(X_train, y_train)

Out[120]: LinearRegression()

In [122]: y_preds = model.predict(X_test)

In [123]: mean_squared_error(y_pred=y_preds, y_true=y_test)

Out[123]: np.float64(2848.3106508475043)

In [124]: mean_squared_error(y_test, y_preds)

Out[124]: np.float64(2848.3106508475043)
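
A single MSE number says little on its own; a quick sketch (not in the original notebook) plotting predicted against actual disease progression, where points near the dotted diagonal are well predicted:

In [ ]: # Sketch: predicted vs. actual targets for the diabetes regression.
plt.scatter(y_test, y_preds)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()],
         linestyle='dotted', color='red')  # perfect-prediction line
plt.xlabel("Actual progression")
plt.ylabel("Predicted progression")
plt.show()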

In [133]: model2 = LogisticRegression(max_iter=5000)

In [130]: # The test_size value was truncated in the export; 0.25 is assumed here,
# consistent with the 45 test samples in the report below (178 × 0.25 ≈ 45).
# A fixed seed is also assumed.
WX_train, WX_test, wy_train, wy_test = train_test_split(wine.data, wine.target, test_size=0.25, random_state=42)

In [134]: model2.fit(WX_train, wy_train)

Out[134]: LogisticRegression(max_iter=5000)


In [135]: yw_preds = model2.predict(WX_test)

In [139]: class_report = classification_report(y_pred=yw_preds, y_true=wy_test)
print(class_report)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        18
           2       1.00      1.00      1.00        12

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
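
confusion_matrix, accuracy_score, and f1_score are imported at the top but never used; a short sketch (not in the original notebook) applying them to the same wine predictions:

In [ ]: # Sketch: the imported-but-unused metrics on the wine predictions.
print(confusion_matrix(wy_test, yw_preds))           # 3×3 counts; diagonal = correct
print(accuracy_score(wy_test, yw_preds))             # 1.0, matching the report above
print(f1_score(wy_test, yw_preds, average='macro'))  # macro-averaged F1 over the 3 classes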

In [ ]:

