05 Data Preparation and Regression
618535-EPP-1-2020-1-JO-EPPKA2-CBHE-JP
Experiment Number 5
Objectives Students learn basic machine learning skills, including data preparation and regression,
using Python and scikit-learn.
Introduction This is an introductory experiment in machine learning. The student solves two exercises to
practice basic skills in data preparation and in solving a simple regression problem.
Materials Computer with Python integrated development environment (IDE) software installed
(PyCharm is recommended).
Dataset files: diabetes.features.csv and diabetes.labels.csv
import pandas as pd
from numpy.random import randn
from sklearn.model_selection import train_test_split
Exercise 2: Regression
The following Python code loads the features and labels of the Diabetes dataset. Complete
this code to evaluate the root-mean-square error (RMSE) of a linear regressor. Standardize the
entire feature set using the standard scaler (StandardScaler), then train and evaluate the
linear model using the scaled features.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
X = pd.read_csv('diabetes.features.csv')
y = pd.read_csv('diabetes.labels.csv').to_numpy().flatten()
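The remaining steps (scaling, fitting, and computing the RMSE) might look roughly like the sketch below. It is one possible completion, not a reference solution. Because the CSV files are not reproduced here, it uses synthetic stand-in data with 10 hypothetical feature columns, and it assumes the model is evaluated on the same data it was trained on, since the listing above does not split the data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the Diabetes CSV files (assumption: 10 numeric
# feature columns and one numeric label per row).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 10)),
                 columns=[f"f{i}" for i in range(10)])
y = X.to_numpy() @ rng.normal(size=10) + rng.normal(scale=0.5, size=200)

# Standardize every feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Fit the linear model on the scaled features.
reg = LinearRegression()
reg.fit(X_scaled, y)

# RMSE is the square root of the mean squared error.
y_pred = reg.predict(X_scaled)
rmse = np.sqrt(mean_squared_error(y, y_pred))
print(f"RMSE: {rmse:.3f}")
```

With the real dataset, the two synthetic-data lines would be replaced by the `pd.read_csv` calls shown above. The RMSE is expressed in the same units as the labels.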
Try to improve the results by replacing the linear regressor with the following SVM regressor,
which uses a polynomial kernel of degree 5 and C = 100.
from sklearn.svm import SVR
reg = SVR(kernel="poly", degree=5, C=100)
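To compare the two regressors, the same scaled features can be passed to both and the RMSE values printed side by side. The sketch below illustrates this on small synthetic data (an assumption, since the CSV files are not reproduced here); the helper name `train_rmse` is hypothetical, and the numbers it prints say nothing about the results on the actual Diabetes dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data (assumption: not the real Diabetes dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=100)

# Both regressors are trained on the same standardized features.
X_scaled = StandardScaler().fit_transform(X)

def train_rmse(model, X, y):
    """Fit the model and return its RMSE on the data it was trained on."""
    model.fit(X, y)
    return np.sqrt(mean_squared_error(y, model.predict(X)))

rmse_lin = train_rmse(LinearRegression(), X_scaled, y)
rmse_svr = train_rmse(SVR(kernel="poly", degree=5, C=100), X_scaled, y)

print(f"Linear regression RMSE: {rmse_lin:.3f}")
print(f"SVR (poly, degree 5) RMSE: {rmse_svr:.3f}")
```

A lower RMSE on the training data does not by itself mean the model generalizes better; a held-out test set, for example from `train_test_split`, would give a fairer comparison.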
The European Commission's support for the production of this publication does not constitute an endorsement of the contents, which reflect
the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained
therein.
Developing Curricula for Artificial Intelligence and Robotics (DeCAIR)
Data Collection Capture the output of your code for the above two exercises.