Sampath Report
A report submitted in partial fulfilment of the requirements for the Award of Degree of
BACHELOR OF TECHNOLOGY
In
ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
BY
A. SAMPATH
Regd. No.: 22671A7362
Under Supervision of
Ms. Maryam Fatima Farooqui
(Duration: 05th August 2023 to 15th September 2023)
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND
MACHINE LEARNING
J.B. INSTITUTE OF ENGINEERING AND TECHNOLOGY
(UGC Autonomous)
CERTIFICATE
ACKNOWLEDGEMENT
First, I would like to thank COINCENT COMPANY for giving me the opportunity to
do an internship within the organization.
I would also like to thank all the people who worked alongside me at COINCENT COMPANY;
with their patience and openness, they created an enjoyable working environment.
It is with a great sense of pleasure and gratitude that I
acknowledge the help of these individuals.
I would like to thank my Head of the Department Dr. G. Arun Sampaul Thomas for
his constructive criticism throughout my internship. I am highly indebted to our
Principal Dr. P.C. Krishnamachary, for the facilities provided to accomplish this
internship.
A. SAMPATH
22671A7362
TABLE OF CONTENTS
1. ABSTRACT
2. ORGANISATION INFORMATION
3. INTERNSHIP OBJECTIVES
5. INTRODUCTION
6. MODULES
7. SYSTEM SPECIFICATION
9. SOFTWARE ENVIRONMENT
10. SYSTEM DESIGN
11. CODING
12. SCREENSHOTS
13. CONCLUSION
ABSTRACT
The provided code implements a Random Forest Classifier to predict diabetes based
on a dataset. The dataset, loaded using Pandas, consists of diabetes-related
features, and the target variable indicating the presence or absence of diabetes.
The data is split into training and testing sets using scikit-learn's train_test_split function.
A Random Forest Classifier is trained on the training set with a specified maximum
depth and random state. The model's predictions on the test set are then evaluated
using confusion matrix and accuracy score metrics. The confusion matrix provides
insights into the classifier's performance, while the accuracy score quantifies its
overall accuracy.
The code concludes with a demonstration of making a prediction for a new data
point, and the result is displayed, indicating whether the person is diabetic or not
based on the model's prediction.
The dataset is partitioned into training and testing sets using the widely adopted
train-test split methodology. The Random Forest Classifier, characterized by a
defined maximum depth and random state, is trained on the training set and then
evaluated on the held-out testing set.
This Python-based approach not only provides a foundation for diabetes prediction
but also serves as a template for the development of machine learning solutions in
healthcare. By offering a transparent and accessible codebase, this project strives
to empower practitioners and researchers alike in harnessing the potential of
machine learning for predictive analytics in the realm of medical diagnoses.
ORGANISATION INFORMATION
Coincent is a managed marketplace that offers Live Industrial Training, Projects &
Internships to students. We are a professional community of Industry Experts and academia,
who have come together to help learners become employable. We use Design Thinking and
a learner-centred approach to problem-solving, with a focus on application-based learning.
Our Industrial Trainers, mentors, and counsellors are passionate tutors and career-driven
specialists in their fields, with years of Industrial Expertise in eminent companies like
Google, IBM, Microsoft, and more.
In addition, we use our comprehensive, internally developed AI tools to make
certain that our learners experience customized and personalized learning to achieve
exponential success.
We are a group of individuals always working to make a difference in students’ careers and
to meet the needs of industry by bridging the skill gap between colleges and
industries.
Personalized Learning
Coincent Partnered Companies provide live interactive classes and amiable mentors to
make the sessions more engaging and informative.
Anywhere Anytime
Our time schedule is very flexible, and our live sessions are held in the evening to
avoid any clashes with the college schedule.
Lifetime Access
Students can access the dashboard at any time to see their progress, and a customized
resume builder will be accessible after completion of Internship.
METHODOLOGIES
o Data Collection
o Data Preprocessing
o Data Splitting
o Model Architecture
o Model Compilation
o Model Training
o Hyperparameter Tuning
o Model Evaluation
o Visualization
o Deployment
INTERNSHIP OBJECTIVES
One of the main objectives of an internship is to expose you to a
particular job and a profession or industry. While you might have an idea
about what a job is like, you won’t know until you actually perform it if it’s
what you thought it was, if you have the training and skills to do it and if it’s
something you like. For example, you might think that advertising is a
creative process that involves coming up with slogans and fun campaigns.
Taking an internship at an advertising agency would help you find that
advertising includes consumer demographic research, focus groups,
knowledge of a client’s pricing and distribution strategies, and media
research and buying. When you apply for jobs, the more experience and
accomplishments you have, the more attractive you’ll look to a potential
employer. Just because you have an internship with a specific title or well-
known company doesn’t mean your internship will help you land a nice gig.
Make an impact where you work by asking for responsibility and looking
for ways to achieve accomplishments. Be willing to work more hours than
you’re required and ask to work in different departments to expand your
skill set. Don’t just fetch coffee, make copies and sit in on meetings, even if
that’s all it will take to finish your internship.
Another benefit of an internship is developing business contacts. These
people can help you find a job later, act as references or help you with
projects after you’re hired somewhere else. Meet the people who have jobs
you would like some day and ask them if you can take them to lunch. Ask
them how they started their careers, how they got to where they are now and
if they have any suggestions for you to improve your skills.
WEEKLY REPORT OF INTERNSHIP ACTIVITIES
WEEK PROGRESS
WEEK-1 o Introduction to Python.
o Installation.
o Basic, Number, Strings…
o Basic Python and Datatypes.
o Control flow Conditions.
WEEK-2 o Exception Handling.
o Functions.
o Object-Oriented Programming (OOP).
PROGRAMS AND OPPORTUNITIES:
1. Machine Learning with Python
2. Full Stack Web development
3. Cloud computing
4. Data Science
5. Cyber Security
6. Artificial Intelligence
7. Microsoft Azure Cloud Computing
8. Augmented & Virtual Reality
9. App Development Combined Course
10.Graphic Design ……etc.,
INTRODUCTION
Diabetes is a chronic health condition characterized by elevated blood sugar levels,
primarily resulting from the body's inability to produce or effectively use insulin. It is
a significant global health concern, with millions of people affected worldwide. Early
detection and management of diabetes are crucial for preventing complications and
improving overall health outcomes.
In healthcare, the combination of data analytics and machine learning is increasingly
used to support diagnosis. This project implements a diabetes prediction system in
Python. Using Pandas, NumPy, and the scikit-learn library, it works with a diabetes
prediction dataset of patient records.
The project first selects the features used for modelling: the gender, smoking_history
and HbA1c_level columns are dropped, and the remaining columns form the dataset used
for predictive modelling. At the heart of the project lies the Random Forest Classifier,
an ensemble learning algorithm valued for its versatility and predictive power.
The classifier is trained on the training set and evaluated on the testing set,
yielding a model that predicts whether a person is diabetic from the remaining
health measurements.
Beyond its immediate application, this project carries profound implications for the
intersection of technology and healthcare. As the Python-based methodology
unfolds, it not only augurs
MODULES
● Users
● Data Collection
● Attribute Selection
● Preprocessing of data.
Users: Users add data to the database, view the stored data, and predict
diabetes using machine learning.
Data Collection: The first step for the prediction system is collecting the data and
deciding on the training and testing datasets. In this project, 80% of the
dataset is used for training and 20% is used for testing.
Admin: The admin activates users and grants them authority. The admin
can also predict diabetes.
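The 80%/20% partition described under Data Collection can be illustrated with scikit-learn's train_test_split. This is a minimal sketch on 100 made-up records, not the project's dataset; the array contents, the stratify option and the random_state value are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 hypothetical records with 2 features each and a balanced 0/1 label
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

# test_size=0.2 keeps 20% of the rows for testing and 80% for training.
# stratify=y keeps the class proportions the same in both parts (assumed option).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print(len(X_train), len(X_test))
```

Stratifying is optional, but it is a common choice when one class (for example, diabetic patients) is much rarer than the other.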
SYSTEM SPECIFICATIONS
HARDWARE REQUIREMENTS:
RAM: 16 GB.
SOFTWARE REQUIREMENTS:
HARDWARE AND SOFTWARE SPECIFICATIONS
REQUIREMENT ANALYSIS:
The project involved analyzing the design of a few applications in order to make the application more
user-friendly. To do so, it was important to keep the navigation from one screen to the next
well ordered while reducing the amount of typing the user needs to do. To
make the application more accessible, the browser version had to be chosen so that it is compatible
with most browsers.
REQUIREMENT SPECIFICATION
Functional Requirements:
Graphical User interface with the User.
Software Requirements:
For developing the application, the following are the Software Requirements:
1. Windows 10 64-bit OS
You can run these lines one by one in your Python interpreter to see the
intermediate results and visualize the data as you go along. For a more
comprehensive and interactive experience, consider using Jupyter
notebooks that allow mixing code, visualizations, and text explanations
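in a more organized way.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def generate_and_plot_data():
    # Create sample data using NumPy: the running sum of 100 random steps
    data = np.random.randn(100).cumsum()
    # Wrap the data in a pandas Series and plot it with Matplotlib
    series = pd.Series(data)
    series.plot(title="Cumulative sum of random steps")
    plt.show()

if __name__ == "__main__":
    generate_and_plot_data()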
RANDOM FOREST CLASSIFIER:-
Random Forest is a powerful and popular machine learning algorithm used
for both classification and regression tasks; the Random Forest Classifier is its
classification form. It belongs to the ensemble learning
methods, which combine multiple individual models to produce a more robust and
accurate prediction. Here is an explanation of the Random Forest Classifier in the
context of data science using Python.
How does Random Forest Classifier work?
1. Decision Trees: Random Forest is constructed from multiple decision trees.
Each decision tree is built independently and operates based on a set of rules
to make decisions. It splits the dataset into smaller subsets while
progressively narrowing down to make predictions.
2. Ensemble Learning: Random Forest uses the concept of ensemble learning
by creating a multitude of decision trees. Each tree is trained on a random
subset of the data and uses a random subset of features.
3. Voting Mechanism: When making predictions, Random Forest collects
predictions from each individual decision tree and performs a majority vote
(for classification) or averaging (for regression) to determine the final
prediction.
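from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)  # illustrative values
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)  # illustrative values
rf_classifier.fit(X_train, y_train)
# Make predictions on the test set
predictions = rf_classifier.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")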
Adjust the parameters like n_estimators (the number of trees in the forest) and
others based on your specific dataset and requirements.
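The three ideas listed above (independent decision trees, random subsets of data and features, and a majority vote) can also be sketched by hand. The example below is only an illustration of the mechanism, not scikit-learn's internal implementation; the helper names fit_small_forest and majority_vote, the number of trees and the seed are all hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def fit_small_forest(X, y, n_trees=5, seed=0):
    """Train n_trees decision trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    trees = []
    for i in range(n_trees):
        # Random subset of the data: draw row indices with replacement
        idx = rng.integers(0, len(X), size=len(X))
        # Random subset of features: consider sqrt(n_features) at each split
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def majority_vote(trees, X):
    """Collect one prediction per tree and return the most common class."""
    votes = np.stack([tree.predict(X) for tree in trees]).astype(int)
    # votes has shape (n_trees, n_samples); count the votes per sample
    return np.array([np.bincount(column).argmax() for column in votes.T])


X, y = load_iris(return_X_y=True)
trees = fit_small_forest(X, y)
predicted = majority_vote(trees, X)
print(predicted[:5])
```

In practice RandomForestClassifier does all of this in a single call; the hand-written version only makes the voting step visible.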
SYSTEM DESIGN
SYSTEM ARCHITECTURE:
CODING: -
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dataset = pd.read_csv('/content/diabetes_prediction_dataset.csv')
dataset.head()
   gender   age  hypertension  heart_disease  smoking_history    bmi  HbA1c_level  blood_glucose_level  diabetes
0  Female  80.0             0              1            never  25.19          6.6                  140         0
1  Female  54.0             0              0          No Info  27.32          6.6                   80         0
2    Male  28.0             0              0            never  27.32          5.7                  158         0
3  Female  36.0             0              0          current  23.45          5.0                  155         0
4    Male  76.0             1              1          current  20.14          4.8                  155         0
# Store the names of the columns to drop in a list
columns = ['gender', 'smoking_history', 'HbA1c_level']
# Drop those columns; the columns= keyword already selects the column axis
dataset = dataset.drop(columns=columns)
    age  hypertension  heart_disease    bmi  blood_glucose_level  diabetes
1  54.0             0              0  27.32                   80         0
# Features: every column except the last one
x = dataset.iloc[:, :-1]
# Target: the last column (diabetes)
y = dataset.iloc[:, -1]
print(x)
print(y)
age 0
hypertension 0
heart_disease 0
bmi 0
blood_glucose_level 0
diabetes 0
dtype: int64
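from sklearn.model_selection import train_test_split
# 80% of the rows for training and 20% for testing, as stated under Data Collection
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
print(x_train)
print(x_test)
print(y_train)
print(y_test)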
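The abstract and conclusion describe three further steps (training the Random Forest Classifier, evaluating it, and predicting for a new data point) whose code does not appear in this listing. The following is a minimal sketch of those steps, not the original script. Because the CSV file is not available here, it runs on a synthetic table with the same five feature columns; the generated values, the toy labelling rule, the max_depth and random_state values, and the measurements of the new person are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature columns kept after dropping three columns
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "age": rng.uniform(1, 80, n).round(1),
    "hypertension": rng.integers(0, 2, n),
    "heart_disease": rng.integers(0, 2, n),
    "bmi": rng.uniform(15, 45, n).round(2),
    "blood_glucose_level": rng.integers(80, 300, n),
})
# Toy labelling rule so the model has a pattern to learn (not a medical criterion)
demo["diabetes"] = (demo["blood_glucose_level"] > 200).astype(int)

x = demo.iloc[:, :-1]
y = demo.iloc[:, -1]
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# max_depth and random_state are assumed values; the report does not state them
model = RandomForestClassifier(max_depth=5, random_state=0)
model.fit(x_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(f"Accuracy: {acc:.3f}")

# Predict for one new person (hypothetical measurements, same column order)
new_person = pd.DataFrame([[45.0, 0, 0, 27.5, 240]], columns=x.columns)
result = model.predict(new_person)[0]
print("Diabetic" if result == 1 else "Not diabetic")
```

Building the new data point as a DataFrame with the training column names keeps the feature order explicit, which helps avoid passing values in the wrong positions.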
SCREENSHOTS
CONCLUSION: -
The provided code is a Python script for predicting diabetes using a Random Forest
Classifier. The script uses the pandas library to handle a diabetes dataset and the
scikit-learn library for machine learning tasks. Here is a summary of the key steps in the code:
1. Data Loading: The script begins by loading a diabetes dataset from a CSV file
using the pandas library.
2. Data Preparation: The features (X) and labels (Y) are extracted from the dataset.
The script then prints the features and labels for inspection.
3. Data Splitting: The dataset is split into training and testing sets using the
train_test_split function from scikit-learn.
4. Model Training: A Random Forest Classifier is initialized and trained on the
training data.
5. Prediction: The trained model is used to predict labels for the test set, and the
results are printed alongside the actual labels.
6. Model Evaluation: The script calculates a confusion matrix and accuracy score to
evaluate the performance of the model on the test set.
7. Prediction on New Data: The script performs a prediction on a new data point
representing a person's health metrics. It prints whether the person is predicted to be
diabetic or not based on the trained model.
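The evaluation in step 6 can be made concrete with a small worked example. The labels below are invented for illustration and are not results from the project; the sketch shows how the accuracy score follows from the four cells of a binary confusion matrix.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Ten hypothetical test cases: 1 = diabetic, 0 = not diabetic
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# For binary labels the matrix is [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# Accuracy is the share of correct predictions among all predictions
accuracy = (tp + tn) / cm.sum()
# Recall is the share of truly diabetic cases that the model found
recall = tp / (tp + fn)

print(cm)
print(accuracy, accuracy_score(y_true, y_pred))
print(recall)
```

Accuracy alone can hide missed diabetic cases when most patients are not diabetic, so looking at the false negatives in the confusion matrix is a useful complement.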
The provided code serves as a concise example of using a Random Forest Classifier for
diabetes prediction. However, there are a few points to consider for improvement:
o The script lacks proper comments and documentation, making it less readable and
harder to understand for someone unfamiliar with the code.
o It would be beneficial to include explanations of the features in the dataset for better
understanding.
o The dataset source and context are not provided, making it challenging to interpret
the significance of the features and the reliability of the model.
o Further analysis, such as hyperparameter tuning, cross-validation, or feature
importance exploration, could enhance the model's performance and interpretability.
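The further analysis suggested above could look like the following sketch, which combines hyperparameter tuning, cross-validation and feature importance in one pass. It is only an outline under stated assumptions: the data comes from scikit-learn's make_classification rather than the diabetes dataset, and the candidate max_depth values, the number of folds and the number of trees are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data with 5 features (stand-in dataset)
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Try several tree depths, scoring each with 5-fold cross-validation
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [2, 4, 8]},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)

# Feature importances of the best model; the values sum to 1
importances = search.best_estimator_.feature_importances_
print(importances)
```

Applied to the real dataset, the importances would indicate which of the remaining columns the model relies on most.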