Assignment 1

assignments

Assignment 1: Data Analysis and Machine Learning


Objective:
Analyze the Iris dataset using Python, focusing on string manipulation, pandas,
NumPy, and scikit-learn.

Data Loading and Cleaning:


import pandas as pd
import numpy as np
# Load the dataset
iris = pd.read_csv("iris.data", header=None)
# Assign column names
iris.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width",
"species"]
# Check for missing values
print(iris.isnull().sum())
# Handle missing values (if any) by forward-filling from the previous row
# iris = iris.ffill()
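
The missing-value check can be tried without the iris.data file. The sketch below is only an illustration: the three sample rows, the gap in the second row, and the choice of filling with the column median (instead of forward-filling) are all assumptions.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as iris.data; the second row is
# missing its sepal_width so that the check has something to report.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
)
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
df = pd.read_csv(sample, header=None, names=columns)

# Count missing values per column
missing = df.isnull().sum()
print(missing)

# One possible fix (an assumption, not the assignment's method):
# fill numeric gaps with the median of each column
numeric_cols = columns[:-1]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
print(df)
```

Median filling does not depend on row order, which may make it a safer default than forward-filling when the rows are not a time series.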

String Manipulation:
The only text column in the Iris dataset is species, so little string
manipulation is needed here; this step matters more for datasets with richer
textual data.
Example:
# Normalize the species labels: lowercase and strip surrounding whitespace
iris['species'] = iris['species'].str.lower().str.strip()
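
To show what these string methods do on messier input, here is a small sketch. The label values are hypothetical, and removing the "iris-" prefix is an extra step added for illustration.

```python
import pandas as pd

# Hypothetical labels with inconsistent case and stray whitespace
labels = pd.Series(["  Iris-Setosa", "IRIS-VERSICOLOR ", "iris-virginica"])

# Strip whitespace, lowercase, then drop the literal "iris-" prefix
cleaned = (
    labels.str.strip()
    .str.lower()
    .str.replace("iris-", "", regex=False)
)
print(cleaned.tolist())  # ['setosa', 'versicolor', 'virginica']
```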

NumPy Operations:
# Convert relevant columns to NumPy arrays
sepal_length_np = iris['sepal_length'].values
petal_width_np = iris['petal_width'].values
# Calculate basic statistics
print("Mean sepal length:", np.mean(sepal_length_np))
print("Median petal width:", np.median(petal_width_np))
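
The same statistics can be computed for all numeric columns at once by working along an axis of a 2-D array. The sketch below uses three made-up rows rather than the real dataset.

```python
import numpy as np

# Hypothetical measurements: one row per flower, columns in the order
# sepal_length, sepal_width, petal_length, petal_width
measurements = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
])

# axis=0 reduces over rows, giving one statistic per column
col_means = measurements.mean(axis=0)
col_medians = np.median(measurements, axis=0)
# ddof=1 gives the sample standard deviation
col_stds = measurements.std(axis=0, ddof=1)

print("Column means:", col_means)
print("Column medians:", col_medians)
print("Column sample std:", col_stds)
```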

Data Splitting:
from sklearn.model_selection import train_test_split
# Split into features (X) and target (y)
X = iris[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y = iris['species']
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
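
A plain random split can leave the classes unevenly represented in the test set. One option is to pass stratify, as in the sketch below. It loads the data with scikit-learn's built-in load_iris instead of iris.data, which is a substitution made so that the example runs on its own.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Built-in copy of the dataset (swapped in for iris.data)
X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions the same in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

test_counts = Counter(y_test.tolist())
print("Test set size:", len(X_test))
print("Test samples per class:", test_counts)
```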

Building a Model:
from sklearn.linear_model import LogisticRegression
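# Create a logistic regression model; max_iter is raised above the default
# of 100 so that the solver has room to converge on unscaled features
model = LogisticRegression(max_iter=200)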
# Train the model
model.fit(X_train, y_train)
# Evaluate the model
accuracy = model.score(X_test, y_test)
print("Accuracy:", accuracy)
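
Accuracy alone does not show which species get confused with each other. As a possible extension, a confusion matrix breaks the test predictions down per class. The sketch below again uses load_iris in place of iris.data, and max_iter=200 is an assumed setting.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Built-in copy of the dataset (swapped in for iris.data)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Rows are true classes, columns are predicted classes
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Off-diagonal entries count the misclassified test samples, so a model with perfect test accuracy would produce a purely diagonal matrix.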

Report:
Analysis and Results:


This analysis explored the Iris dataset using Python. After loading and
cleaning the data, basic statistics were calculated using NumPy. The dataset was
then split into training and testing sets to build a logistic regression model for
predicting iris species. The model achieved an accuracy of [insert accuracy
value] on the testing set.

Further Exploration
• Feature Engineering: Consider creating new features based on existing
ones (e.g., ratios of sepal and petal measurements).
• Model Selection: Experiment with other models like decision trees,
random forests, or support vector machines.
• Hyperparameter Tuning: Optimize model parameters to improve
performance.
• Visualization: Create plots to visualize the data and model predictions.

By following these steps and exploring further, you can gain deeper insights
into the dataset and build more accurate machine learning models.
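
Two of the suggestions above, feature engineering and model selection, can be sketched together. This is only a starting point under several assumptions: the petal_ratio feature, the three candidate models, their settings, and 5-fold cross-validation are illustrative choices, and the data comes from scikit-learn's load_iris rather than iris.data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Built-in copy of the dataset as a DataFrame (swapped in for iris.data)
data = load_iris(as_frame=True)
X = data.data.copy()
X.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
y = data.target

# Feature engineering: a hypothetical ratio feature
X["petal_ratio"] = X["petal_length"] / X["petal_width"]

# Model selection: compare candidates by mean 5-fold cross-validated accuracy
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
scores = {
    name: cross_val_score(estimator, X, y, cv=5).mean()
    for name, estimator in candidates.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Cross-validation averages over several splits, so it gives a steadier comparison between models than a single train/test split.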