Data Science Project
Project Objectives
The goal was to build a predictive model to classify
iris flowers based on their features.
I. Introduction
The task was to build a predictive model to classify iris flowers based on four
features: Sepal Length, Sepal Width, Petal Length, Petal Width. The Iris dataset
was sourced from a CSV file.
II. Data Exploration
We started by loading the dataset with pandas and displaying the initial rows to
understand the content and format of the dataset. We then checked for missing
values and anomalies in the data.
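The exploration steps above can be sketched roughly as follows. The original CSV
file is not reproduced here, so a few Iris-style rows held in a string stand in
for it; the column names and the CSV_TEXT variable are assumptions for
illustration, not the project's actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the project's CSV file: a handful of Iris-style rows.
# The header names are assumptions; the real file may label columns differently.
CSV_TEXT = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

# With a real file this would be pd.read_csv(<path to the CSV file>).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Display the initial rows and the column types to see content and format.
print(df.head())
print(df.dtypes)

# Count missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Summary statistics give a first look at possible anomalies (odd ranges, outliers).
print(df.describe())
```

A column with a non-zero count in missing_per_column, or a minimum or maximum far
outside the expected range in describe(), would be a cue to look closer before
modeling.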
III. Data Preprocessing
The dataset was well-structured, so no explicit data cleaning was required. We
split the data into features (X) and target (y).
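A minimal sketch of the feature/target split, assuming scikit-learn's bundled
copy of the Iris data as a stand-in for the project's CSV file. In that copy the
label column is named "target"; the column name in the original file may differ.

```python
from sklearn.datasets import load_iris

# Stand-in for the CSV: scikit-learn's bundled Iris data as a DataFrame.
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus an integer "target" column

# Features (X): everything except the label column.
X = df.drop(columns="target")

# Target (y): the class label for each flower.
y = df["target"]

print(X.shape, y.shape)
```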
IV. Model Selection and Training
We chose the Decision Tree classifier for its simplicity and interpretability. The
data was split into training and testing sets, and the model was trained on the
training set.
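The split and training step might look like the sketch below. The report does
not state the split ratio or any hyperparameters, so test_size=0.2,
random_state=42, and the stratified split are assumptions chosen only to make
the example reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the project's data: scikit-learn's bundled Iris dataset.
X, y = load_iris(return_X_y=True)

# Assumed 80/20 split; stratify=y keeps class proportions equal in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Decision tree with default settings; random_state fixes tie-breaking.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Depth and leaf count give a quick sense of how complex the fitted tree is.
print("depth:", model.get_depth(), "leaves:", model.get_n_leaves())
```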
V. Model Evaluation
The model was evaluated on the testing set using accuracy, precision, recall, a
confusion matrix, and a classification report. These metrics were chosen because
each highlights a different aspect of classification performance.
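One way to compute these metrics with scikit-learn is sketched below. The
dataset stand-in, the 80/20 split, and the macro averaging of precision and
recall are assumptions; the report does not say how the per-class scores were
averaged.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data and an assumed 80/20 stratified split.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy: fraction of test samples classified correctly.
acc = accuracy_score(y_test, y_pred)

# Macro averaging: compute the metric per class, then take the unweighted mean.
prec = precision_score(y_test, y_pred, average="macro")
rec = recall_score(y_test, y_pred, average="macro")

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
print(cm)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

The confusion matrix shows which classes get confused with each other, which a
single accuracy figure cannot reveal.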
VI. Exploratory Data Analysis (EDA)
We conducted EDA to understand the distribution of individual features and
their relationships. We used histograms, box plots, pair plots, violin plots, and a
correlation matrix heatmap.
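The plots listed above could be produced along the following lines. This sketch
uses only pandas and matplotlib, with scikit-learn's bundled Iris data as a
stand-in; the plotting library actually used in the project is not stated, and
the output folder name eda_plots is hypothetical.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # write image files instead of opening windows
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

features = load_iris(as_frame=True).frame.drop(columns="target")
out_dir = Path("eda_plots")  # hypothetical output folder
out_dir.mkdir(exist_ok=True)

# Histograms: distribution of each feature.
features.hist(bins=20, figsize=(8, 6))
plt.tight_layout()
plt.savefig(out_dir / "histograms.png")
plt.close("all")

# Box plots: median, quartiles, and outliers per feature.
features.boxplot(figsize=(8, 4))
plt.savefig(out_dir / "boxplots.png")
plt.close("all")

# Violin plots: one violin per column of the 2D array.
fig, ax = plt.subplots(figsize=(8, 4))
ax.violinplot(features.to_numpy(), showmedians=True)
ax.set_xticks(range(1, len(features.columns) + 1))
ax.set_xticklabels(features.columns, rotation=20)
fig.tight_layout()
fig.savefig(out_dir / "violinplots.png")
plt.close("all")

# Pair plot: scatter plots for every pair of features.
pd.plotting.scatter_matrix(features, figsize=(8, 8), diagonal="hist")
plt.savefig(out_dir / "pairplot.png")
plt.close("all")

# Correlation matrix heatmap (Pearson correlation between features).
corr = features.corr()
fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)
fig.tight_layout()
fig.savefig(out_dir / "correlation_heatmap.png")
plt.close("all")
```

Coloring the pair plot by class, for example by passing the target column as
the c argument of scatter_matrix, makes the separation between classes easier
to see.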
VII. Methodologies