The document outlines a data preprocessing experiment using a dataset named Data.csv. It includes steps for handling missing data, encoding categorical variables, splitting the dataset into training and test sets, and applying feature scaling. Various libraries such as pandas, numpy, and sklearn are utilized for these preprocessing tasks.
Code Preprocessing
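# Encoding categorical data
# Assumes X (feature array) and Y (target array) were already loaded from Data.csv.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# One-hot encode categorical column 0 of X; the remaining columns pass through unchanged.
# OneHotEncoder's categorical_features argument was removed in scikit-learn 0.22,
# so the column is selected with ColumnTransformer instead.
# sparse_threshold=0 forces a dense array, matching the original .toarray() call.
ct = ColumnTransformer([("onehot", OneHotEncoder(), [0])], remainder="passthrough", sparse_threshold=0)
X = ct.fit_transform(X)

# Encode the target labels as integers, using the encoder created for Y
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)

# Splitting the dataset into Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)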
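The summary also mentions handling missing data and feature scaling, neither of which appears in the code shown here. The following is a minimal sketch of how those two steps might look, not the document's own code. The array `X_num`, its values, and the row-based split are hypothetical stand-ins for the numeric columns of Data.csv, and mean imputation with `SimpleImputer` followed by `StandardScaler` is an assumed choice. Fitting both on the training rows only is also an assumption made here, so that test rows do not influence the learned statistics.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features (two columns); np.nan marks a missing entry.
X_num = np.array([
    [44.0, 72000.0],
    [27.0, 48000.0],
    [30.0, np.nan],
    [38.0, 61000.0],
    [np.nan, 52000.0],
    [35.0, 58000.0],
])

# Hypothetical split: first four rows for training, last two for testing.
X_train, X_test = X_num[:4], X_num[4:]

# Replace missing values with the column mean learned from the training rows.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)

# Standardise each column to zero mean and unit variance (training statistics).
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

In a full pipeline these steps would apply only to the numeric columns, with the one-hot encoded columns usually left unscaled; that is a common convention rather than something the source states.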