Anoosha ML Lab01

The document outlines a lab assignment for a Machine Learning course focused on loading CSV files and handling data conversions. It includes code examples for detecting and handling missing values, removing empty lines, and supporting different delimiters. Additional suggestions for improving data handling efficiency are also provided.

ANOOSHA MEHAK 21B-200-SE SEC “C”

CS334 - Machine Learning


Lab 01
You learned how to load CSV files and perform basic data conversions. Data loading can be a difficult
task given the variety of data cleaning and conversion that may be required from problem to
problem. There are many extensions you could make so that these examples are more robust to
new and different data files. Below are just a few ideas for you to implement yourself and submit
as a homework file on MS Teams:

• Detect and handle missing values in a column.

• Detect and remove empty lines at the top or bottom of the file.

CODE:
from csv import reader

# Load a CSV file, trim blank lines at the top and bottom, and fill in missing values
def load_csv(filename, default_value=None):
    dataset = list()
    # newline='' lets the csv module handle line endings itself
    with open(filename, 'r', newline='') as file:
        lines = file.readlines()
    # Remove empty lines from the beginning and end
    while lines and lines[0].strip() == '':
        lines.pop(0)
    while lines and lines[-1].strip() == '':
        lines.pop()
    csv_reader = reader(lines)
    for row in csv_reader:
        # Skip blank lines in the middle of the file as well
        if not row:
            continue
        for i in range(len(row)):
            # Treat empty or whitespace-only cells as missing
            if row[i].strip() == '':
                # Replace missing value with default value
                row[i] = default_value
        dataset.append(row)
    return dataset

# Example usage:
filename = 'iris.csv'
default_value = 'missing value'  # Specify the default value for missing values
dataset = load_csv(filename, default_value)

# Check if replacement worked by printing some rows
print("First 5 rows after handling missing values and removing empty lines from top and bottom:")
for row in dataset[:5]:
    print(row)

OUTPUT:
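The task asks to detect missing values as well as handle them. As a minimal sketch of the detection step, the snippet below counts empty cells per column before any replacement is done. The function name count_missing and the in-memory SAMPLE text are assumptions made for illustration; they are not part of the lab handout.

```python
import csv
import io

# Hypothetical sample data: blank lines at the top and bottom,
# one missing cell in the second row and one in the third row.
SAMPLE = "\n\n5.1,3.5,setosa\n4.9,,setosa\n,3.2,virginica\n\n"

def count_missing(rows, is_missing=lambda cell: cell.strip() == ""):
    """Return a list holding the number of missing cells in each column."""
    counts = []
    for row in rows:
        # Grow the counter list if this row is wider than earlier ones.
        counts.extend([0] * (len(row) - len(counts)))
        for i, cell in enumerate(row):
            if is_missing(cell):
                counts[i] += 1
    return counts

# csv.reader yields an empty list for a blank line, so filter those out first.
rows = [row for row in csv.reader(io.StringIO(SAMPLE)) if row]
missing_per_column = count_missing(rows)
print(missing_per_column)
```

A per-column count like this can help decide whether a default value is a sensible replacement or whether a column has too many gaps to be useful.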

• Support for other delimiters such as pipe (|) or white space.

# Define a function to read the file with a specified delimiter.
# Pass delimiter=None to split on any run of white space.
def read_file_with_delimiter(file_path, delimiter='\t'):
    data = []
    with open(file_path, 'r') as file:
        # Read the header
        header = file.readline().strip().split(delimiter)
        # Read the remaining lines
        for line in file:
            line = line.strip()
            if not line:
                continue  # Skip empty lines
            # Split each line based on the delimiter
            row = line.split(delimiter)
            # Convert columns 1 to end to floats where possible;
            # text fields such as class labels are left unchanged
            for i in range(1, len(row)):
                try:
                    row[i] = float(row[i])
                except ValueError:
                    pass
            data.append(row)
    return header, data

# Example usage with pipe delimiter (assumes iris.csv is pipe-delimited)
header, data = read_file_with_delimiter('iris.csv', delimiter='|')

# Displaying the header and the first few rows of data
print("Header:", header)
print("Data:")
for row in data[:5]:  # Displaying first 5 rows
    print(row)

OUTPUT
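A possible alternative for the delimiter task, sketched here under stated assumptions: csv.reader from the standard library accepts a single-character delimiter argument, while white-space-separated data can be split with str.split() and no argument. The function name read_delimited and the sample strings are hypothetical.

```python
import csv
import io

def read_delimited(lines, delimiter=","):
    """Parse an iterable of text lines into a list of rows.

    delimiter=None means "split on any run of white space"; any other
    single character is handed to csv.reader.
    """
    if delimiter is None:
        # str.split() with no argument collapses repeated spaces and tabs.
        return [line.split() for line in lines if line.strip()]
    return [row for row in csv.reader(lines, delimiter=delimiter) if row]

# Hypothetical in-memory inputs standing in for real files.
pipe_rows = read_delimited(io.StringIO("a|b|c\n1|2|3\n"), delimiter="|")
space_rows = read_delimited(io.StringIO("a  b\tc\n1 2   3\n"), delimiter=None)
print(pipe_rows)
print(space_rows)
```

Using csv.reader for the single-character case keeps quoted fields intact, which a plain split on the delimiter would not.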

• Support more efficient data structures such as arrays.

def read_file_with_delimiter(file_path, delimiter='\t'):
    header = []
    data = []
    with open(file_path, 'r') as file:
        # Read the header
        header = file.readline().strip().split(delimiter)
        # Read the remaining lines
        for line in file:
            line = line.strip()
            if not line:
                continue  # Skip empty lines
            # Split each line based on the delimiter
            row = line.split(delimiter)
            # Convert columns 1 to end to floats where possible;
            # text fields such as class labels are left unchanged
            for i in range(1, len(row)):
                try:
                    row[i] = float(row[i])
                except ValueError:
                    pass
            data.append(row)
    return header, data

# Example usage with pipe delimiter (assumes Iris.csv is pipe-delimited)
header, data = read_file_with_delimiter("Iris.csv", delimiter='|')

# Displaying the header and the first few rows of data
print("Header:", header)
print("Data:")
for row in data[:5]:  # Displaying first 5 rows
    print(row)

OUTPUT
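The task above asks for more efficient data structures such as arrays. One way this could look, as a sketch only, is to store each numeric column in an array('d') from Python's standard-library array module, which holds floats compactly instead of as separate objects in a list. The function name load_numeric_columns, the n_numeric parameter, and the sample text are assumptions for illustration; a library such as NumPy would be a common alternative if external packages are allowed.

```python
from array import array
import io

def load_numeric_columns(lines, delimiter=",", n_numeric=4):
    """Store the first n_numeric fields of each row in array('d') columns.

    Any remaining fields (for example a class label) are kept as a list
    of strings per row.
    """
    columns = [array("d") for _ in range(n_numeric)]
    labels = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        fields = line.split(delimiter)
        for col, value in zip(columns, fields[:n_numeric]):
            col.append(float(value))
        labels.append(fields[n_numeric:])
    return columns, labels

# Hypothetical pipe-delimited sample with four measurements and a label.
SAMPLE = "5.1|3.5|1.4|0.2|setosa\n4.9|3.0|1.4|0.2|setosa\n"
columns, labels = load_numeric_columns(io.StringIO(SAMPLE), delimiter="|")
print(columns[0])
print(labels)
```

Storing data column by column means each array holds a single numeric type, which is what makes the compact representation possible.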
