PDS Report 2024-25
Guided by
Sunil Ghane
Course Project
Python for Data Science (S.Y.)
Abstract
This dataset provides a comprehensive basis for training machine learning models
to predict suitable crops based on environmental and soil parameters.
● Preprocessing:
○ Handling Missing Data: Ensured the dataset had no missing values, for
cleaner and more effective model training.
○ Normalization: For SVM, all feature values were scaled to the 0-1 range
using MinMaxScaler to improve model stability and convergence.
○ Label Encoding: Categorical crop labels were encoded as numeric values
to ensure compatibility with the machine learning models.
● Data Exploration:
● Models Used: The models tested include Decision Tree, Naive Bayes, SVM,
Logistic Regression, and Random Forest.
● Justification: Models were selected to capture a range of learning methods,
from simple decision boundaries (Decision Tree) to ensemble learning
(Random Forest) for complex patterns. SVM was chosen for its robustness
with normalized data, while Naive Bayes and Logistic Regression provided
baseline comparisons.
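The preprocessing steps and model set described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy table, the column names (temperature, rainfall, ph, label) and the crop names are made up, and GaussianNB is assumed as the Naive Bayes variant because the report does not name one.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy table standing in for the real dataset (assumed columns).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "temperature": rng.uniform(15, 35, 60),
    "rainfall": rng.uniform(50, 300, 60),
    "ph": rng.uniform(5, 8, 60),
    "label": np.repeat(["rice", "maize", "cotton"], 20),
})

# Handling missing data: count missing cells across the whole table.
missing_total = int(df.isnull().sum().sum())

# Label encoding: map crop names to integers.
encoder = LabelEncoder()
y = encoder.fit_transform(df["label"])
X = df.drop(columns="label")

# Normalization: scale every feature to the 0-1 range.
# Done on the full toy table here purely to show the effect.
X_scaled = MinMaxScaler().fit_transform(X)

# The five model families named in the report. The SVM is wrapped in a
# pipeline so that the scaler is fit only on the data the SVM is trained on.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),  # assumed variant
    "SVM": make_pipeline(MinMaxScaler(), SVC()),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
}
```

Placing MinMaxScaler inside the SVM pipeline is one possible design choice; it keeps the scaling limited to the SVM, as the report describes, and avoids fitting the scaler on test data.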
2. Model Implementation
● Training and Testing: Data was split into 80% for training and 20% for
testing to assess generalizability.
● Hyperparameter Tuning: Optimized parameters such as the maximum
depth for Decision Tree and kernel type for SVM. Random Forest was
evaluated with different tree counts.
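The 80/20 split and the tuning described above might look something like the sketch below. It rests on several assumptions: the data is synthetic (make_classification), the candidate values in each grid are illustrative, and the stratified split, 5-fold cross-validation and GridSearchCV are choices made for this example, since the report does not say which search method or settings were used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the crop dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

# 80% training / 20% testing; stratification is an added assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# One small grid per model, covering the parameters named in the report.
searches = {
    "Decision Tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [3, 5, 10, None]},
    ),
    "SVM": (
        Pipeline([("scale", MinMaxScaler()), ("svc", SVC())]),
        {"svc__kernel": ["linear", "rbf", "poly"]},
    ),
    "Random Forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100, 200]},
    ),
}

# Tune on the training split only, then score once on the held-out test split.
results = {}
for name, (estimator, grid) in searches.items():
    search = GridSearchCV(estimator, grid, cv=5)
    search.fit(X_train, y_train)
    results[name] = (search.best_params_, search.score(X_test, y_test))

for name, (params, accuracy) in results.items():
    print(f"{name}: best params {params}, test accuracy {accuracy:.3f}")
```

Keeping the test split out of the search means the reported accuracy reflects data that played no part in choosing the hyperparameters.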
● Pandas Documentation
● NumPy Documentation
○ Link: https://scikit-learn.org/stable/
● Seaborn Documentation
● Matplotlib Documentation
○ Link: https://matplotlib.org
Appendices (optional)
● Additional Figures or Tables: Include any figures or tables that do not fit
into the main body.
● Code Snippets: Provide any relevant code sections, especially if you want
to highlight a specific method or function.