ML Lab File
WHAT IS AN IDE?
There are many other IDEs available for Python development, but these
are some of the most popular and widely used ones. Ultimately, the
choice of IDE depends on personal preference and the specific needs of
the project.
PANDAS SERIES
A Pandas Series is a one-dimensional, labeled, array-like object provided
by the Pandas library in Python. It is similar to a column in a spreadsheet
or a SQL table. A Series can hold various data types, such as integers,
floats, strings, and Python objects. It has two main components: the data
and the index, where the index labels the data points in the Series.
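The two components can be inspected directly. The following is a minimal sketch; the values and the index labels are made up for illustration.

```python
import pandas as pd

# Hypothetical marks for three students; the names act as index labels.
marks = pd.Series([82, 91, 77], index=["asha", "ravi", "meena"])

print(marks.values)    # the data component
print(marks.index)     # the index component
print(marks["ravi"])   # look up one value by its label
```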
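import pandas as pd
# Create an empty Series; an explicit dtype avoids a warning in some pandas versions
my_series = pd.Series(dtype="float64")
print(my_series)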
CODE:
import numpy as np
import pandas as pd
By default, the index of the series runs from 0 to the length of the
series minus 1.
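As a small illustration of the default index, the sketch below builds a Series from a NumPy array without passing an index. The array contents are an assumption chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical array of four values; no index is passed.
data = np.array([10, 20, 30, 40])
my_series = pd.Series(data)

print(my_series)
print(list(my_series.index))  # [0, 1, 2, 3], i.e. 0 to len(my_series) - 1
```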
3. CREATING A SERIES FROM A LIST:
To create a series from a list, first create the list, then pass it to
pd.Series().
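A minimal sketch of these two steps, assuming a hypothetical list of five integers:

```python
import pandas as pd

# Step 1: create the list (hypothetical values).
my_list = [10, 20, 30, 40, 50]

# Step 2: pass the list to pd.Series(); the default integer index is used.
my_series = pd.Series(my_list)
print(my_series)
```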
CODE:
import pandas as pd
# Create a Series from a dictionary; the keys become the index labels
my_dict = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
my_series = pd.Series(my_dict)
print(my_series)
1. CREATING AN EMPTY DATAFRAME:
To create an empty dataframe using pandas library in Python,
you can use the pd.DataFrame() function with no arguments
passed to it.
CODE:
import pandas as pd
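A minimal sketch of the call described above; the checks on `empty` and `shape` are added here only to show that the result has no rows or columns.

```python
import pandas as pd

# Call pd.DataFrame() with no arguments to get an empty dataframe.
df = pd.DataFrame()

print(df)
print(df.empty)   # True
print(df.shape)   # (0, 0)
```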
We can read this file into a dataframe using the following code:
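import pandas as pd
# df is the dataframe read from the file
# Select the first row by integer position
first_row = df.iloc[0]
print(first_row)
# Select the rows labeled 1 to 2 (loc slices include both ends)
second_third_rows = df.loc[1:2]
print(second_third_rows)
# Select a single column as a Series
a_column = df['A']
print(a_column)
# Select several columns with a list of names (column 'C' is inferred from the variable name)
a_c_columns = df[['A', 'C']]
print(a_c_columns)
# Select rows and columns together (assumed selection; the original assignment is not shown)
subset = df.loc[1:2, ['A', 'C']]
print(subset)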
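The selections above can be tried without a file. This sketch assumes a hypothetical dataframe with columns A, B and C; the definitions of `a_c_columns` and `subset` are illustrative guesses based on the variable names.

```python
import pandas as pd

# Hypothetical data with columns A, B and C.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]})

first_row = df.iloc[0]              # row at position 0
second_third_rows = df.loc[1:2]     # labels 1 and 2, both included
a_c_columns = df[["A", "C"]]        # a list of names selects several columns
subset = df.loc[1:2, ["A", "C"]]    # rows and columns in one step
print(subset)
```

Note that `iloc` works with integer positions while `loc` works with index labels, so a `loc` slice includes its end label.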
DATA CLEANING
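import pandas as pd
import numpy as np
from scipy import stats
from sklearn.preprocessing import LabelEncoder
# Load the raw data
data = pd.read_csv('data.csv')
# Remove rows with missing values
data = data.dropna()
# Remove duplicates
data = data.drop_duplicates()
# Handle outliers: keep rows whose numeric values lie within 3 standard
# deviations of the column mean (the cutoff of 3 is an assumed, common choice)
numeric_cols = data.select_dtypes(include=np.number).columns
z = np.abs(stats.zscore(data[numeric_cols]))
data = data[(z < 3).all(axis=1)].copy()
# Handle categorical variables by label-encoding the 'category' column
le = LabelEncoder()
data['category'] = le.fit_transform(data['category'])
# Print the first 5 rows of the data after handling categorical variables
print(data.head())
# Print the first 5 rows of the data after removing irrelevant features
print(data.head())
# Save the cleaned data
data.to_csv('cleaned_data.csv', index=False)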
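The same cleaning steps can be sketched on a small in-memory dataframe. Everything here is an assumption made for illustration: the data is invented, the z-score is computed by hand with pandas rather than with scipy, the cutoff of 3 is a common convention, and pandas category codes stand in for scikit-learn's LabelEncoder.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: 12 ordinary rows, one extreme value (500),
# one duplicate of the first row, and one row with a missing value.
values = list(range(10, 22)) + [500, 10, np.nan]
labels = ["a", "b", "c"] * 4 + ["a", "a", "b"]
data = pd.DataFrame({"value": values, "category": labels})

data = data.dropna()            # remove rows with missing values
data = data.drop_duplicates()   # remove repeated rows

# Absolute z-score of each value, using the population standard deviation.
z = np.abs((data["value"] - data["value"].mean()) / data["value"].std(ddof=0))
data = data[z < 3].copy()       # drop rows more than 3 deviations from the mean

# Replace each category label with an integer code.
data["category"] = data["category"].astype("category").cat.codes
print(data.head())
```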
Write a program to extract a subset of data from a data frame:
import pandas as pd
df = pd.read_csv("C:\\Users\\surjit\\Documents\\pokemon_data.csv")
print(df.head(5))
#print(df)
#df.to_csv("C:\\Users\\surjit\\Documents\\pokemon_data1.csv")
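Beyond `head()`, a subset can be taken by column names or by a condition on the rows. This sketch uses a small stand-in dataframe, because the columns of the CSV file are not shown; the column names and values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pikachu"],
    "Type": ["Grass", "Fire", "Water", "Electric"],
    "HP": [45, 39, 44, 35],
})

first_two = df.head(2)          # first two rows
names_hp = df[["Name", "HP"]]   # two columns
strong = df[df["HP"] > 40]      # rows where the condition holds
print(strong)
```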
Write a program to handle categorical data:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df = pd.read_csv("C:\\Users\\varinder\\Documents\\machine_learning_Lab\\datasets\\melb_data.csv")
# Boolean Series marking the columns whose dtype is object
# (assumed definition of object_attributes, which is used below)
object_attributes = df.dtypes == "object"
object_attributes_list = list(object_attributes[object_attributes].index)
catagorical_df = df[object_attributes_list]
# Copy so that new columns are added to an independent dataframe
labeled_data = catagorical_df[["Type"]].copy()
# Apply fit_transform to the column Type and place the values in the new
# column Type(Label_Encoding)
labeled_data["Type(Label_Encoding)"] = le.fit_transform(catagorical_df["Type"])
# After using fit_transform, column-wise distinct value count of the
# dataframe labeled_data
print(labeled_data.nunique())
# One-hot encode Type and add each indicator column to labeled_data
one_hot_Encoding = pd.get_dummies(catagorical_df["Type"])
for i in range(0, len(one_hot_Encoding.columns)):
    labeled_data[one_hot_Encoding.columns[i]] = one_hot_Encoding.iloc[:, i]
# Read the CSV values
df = pd.read_csv("C:\\Users\\varinder\\Documents\\modified_1.csv")
print(df)
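The two encodings can also be compared on a small in-memory column. This is a sketch under stated assumptions: the values of Type are invented, and pandas category codes are used in place of LabelEncoder so that the example needs only pandas.

```python
import pandas as pd

# Hypothetical column named Type with three distinct values.
df = pd.DataFrame({"Type": ["h", "u", "t", "h", "u"]})

labeled_data = df[["Type"]].copy()
# Label encoding: one integer per category, assigned in sorted order.
labeled_data["Type(Label_Encoding)"] = df["Type"].astype("category").cat.codes
# One-hot encoding: one indicator column per distinct value.
one_hot = pd.get_dummies(df["Type"])
labeled_data = pd.concat([labeled_data, one_hot], axis=1)
print(labeled_data)
```

Label encoding keeps a single column but implies an order among the categories, while one-hot encoding avoids that at the cost of one extra column per category.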