Sample INTERNSHIP Report

Uploaded by

Rathika Goapalakrishanan


KGiSL Institute of Technology

(An Autonomous Institution)

Affiliated to Anna University, Approved by AICTE, Recognized by UGC,
Accredited by NAAC & NBA (B.E-CSE, B.E-ECE, B.Tech-IT),
365, KGiSL Campus, Thudiyalur Road, Saravanampatti, Coimbatore – 641035.

DIABETES PREDICTION USING DATA SCIENCE AND MACHINE LEARNING
A SUMMER INTERNSHIP REPORT

Submitted by

JASFER I (711721243038)

in partial fulfilment for the award of the degree

BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING

KGiSL INSTITUTE OF TECHNOLOGY

ANNA UNIVERSITY: CHENNAI 600 025

NOV 2024

BONAFIDE CERTIFICATE

Certified that this Internship report on “Diabetes prediction using Data Science and Machine Learning” at Exposys Data Labs is the bonafide work of JASFER I, who belongs to IV Year Computer Science and Engineering “A”, during the VII Semester of the Academic Year 2024-2025.

FACULTY INCHARGE HEAD OF THE DEPARTMENT

Certified that the candidates were examined by us for Summer Internship Viva
held on ____ at KGiSL Institute of Technology, Saravanampatti,
Coimbatore 641035.

INTERNAL EXAMINER EXTERNAL EXAMINER

ACKNOWLEDGEMENT

We express our deepest gratitude to our Chairman and Managing Trustee Dr. Ashok Bakthavathsalam for providing us with an environment to complete our Internship project successfully.

We are grateful to our CEO of Academic Initiatives, Mr. Aravind Kumar Rajendran, and our beloved Secretary, Dr. Rajkumar N. Our sincere thanks go to our honourable Principal, Dr. Suresh Kumar S, for his support, guidance, and blessings.

We would like to thank Dr. Thenmozhi T, Head of the Department, and Mr. Vivekanandan V, Internship Coordinator, Department of Computer Science and Engineering, for their firm support during the entire course of this Internship and for shaping us both technically and morally to achieve greater success in this project work.

We also thank all the faculty members of our department for their help in making this Internship project a success. Finally, we take this opportunity to express our deep appreciation to our family and friends for all they meant to us during the crucial stages of completing this project.
INTERNSHIP OFFER LETTER

INTERNSHIP CERTIFICATE

ABSTRACT

During the internship at Exposys Data Labs, machine learning methods for diabetes prediction were explored in depth. The analysis used a dataset of 768 patient profiles and rigorously evaluated three algorithms: Logistic Regression, Random Forest, and a Bagging ensemble.

The dataset, sourced from a reputable medical research database, was carefully preprocessed to ensure data integrity and relevance. A feature selection step then retained the most informative variables, including age, BMI, blood pressure, and glucose level.

The comparative analysis showed that the Bagging ensemble performed best: with 10 base estimators it reached an accuracy of 79%, surpassing both Logistic Regression and Random Forest. Bagging therefore emerged as a strong candidate for diabetes prediction.

Analysis of feature importance within the Bagging ensemble identified glucose level, BMI, and age as key predictors, marking them as critical determinants of diabetes outcomes. These findings contribute to diabetes prediction research and underline the value of ensemble methods for improving predictive accuracy on complex medical conditions.

This internship project offers a practical, professionally oriented perspective on applying machine learning to diabetes prediction. Its methodology and results add to the ongoing discussion on using data science to improve healthcare outcomes.
INTRODUCTION
Domain:

Healthcare Focus:

• The project centers on healthcare, specifically the prediction of diabetes. This focus reflects the critical need for better tools for disease prediction and prevention in the healthcare sector.
• Diabetes is a prevalent chronic condition, so reliable early prediction can help improve patient outcomes and reduce the overall burden of the disease on healthcare systems.

Global Health Impact:

• Because diabetes is prevalent worldwide, the project's scope is not limited to any one region; its findings and methods could inform healthcare practice on a global scale.
• Diabetes prediction addresses a key public-health need: enabling early detection and intervention, which can reduce the impact of diabetes worldwide.
Technology:

Machine Learning Algorithms:

• The project employs three widely used machine learning algorithms: Logistic Regression, Random Forest, and the Bagging ensemble. Together they cover a linear model, a tree ensemble, and a general bootstrap-aggregation ensemble, giving a broad comparison of modeling techniques for diabetes prediction.
• Using several algorithms makes it possible to leverage the strengths of each method and to compare their performance on the same healthcare data.

Programming Language:

• Python is the core programming language, providing a versatile and widely adopted platform for machine learning. Its readability and extensive libraries support an efficient, standardized development process.
• Using Python aligns the project with common industry practice in data science and machine learning.

Libraries:

• The project uses popular Python libraries, notably Scikit-Learn, to streamline the implementation of machine learning algorithms. These libraries provide robust functionality for model development, evaluation, and optimization.
• Relying on well-established libraries makes the project's technology stack more reliable and efficient.

Advanced Predictive Modeling:

• Logistic Regression, Random Forest, and the Bagging ensemble are established predictive modeling techniques, and applying them to healthcare data reflects current practice in machine learning.
• These approaches are used to uncover patterns in healthcare data that support predictive analytics for disease diagnosis and prognosis.

Technological Integration:

• The project's technology stack, built on Python and its libraries, integrates the machine learning components into a single workflow, which makes the predictive modeling process easier to scale and more efficient.
• Following current trends in machine learning tooling keeps the project up to date and supports the development of robust predictive models for healthcare.

Methods:

Systematic Data Collection:

• A dataset of 768 patient instances was collected from a reputable medical research database, providing a representative sample for training and evaluating diabetes prediction models.
• This systematic approach to data collection provides a solid foundation for studying the relationships between patient characteristics and diabetes outcomes.

Feature Selection and Engineering:

• Rigorous preprocessing is applied to improve data relevance, with particular focus on feature selection and engineering. This step retains the most informative variables, such as age, BMI, blood pressure, and glucose level, for the analysis.
• Feature engineering techniques, including scaling and normalization, put the features on comparable ranges so that the same data can be used across different algorithms, which makes the predictive models more robust.
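A minimal sketch of how scaling and normalization could be applied, assuming scikit-learn's `StandardScaler` (standardization) and `MinMaxScaler` (min-max normalization) placed inside a `Pipeline`, so the scaling parameters are learned from the training split only. The data come from `make_classification` as a synthetic stand-in for the 768-row table, and `max_iter=1000` is an assumption to help the solver converge; none of these settings are taken from the report.

```python
# Sketch: scaling inside a Pipeline so the scaler is fitted on training data only.
# Synthetic data stand in for the 768-patient table (assumption, not the report's data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           weights=[0.65], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

results = {}
for name, scaler in [("standardize", StandardScaler()), ("min-max", MinMaxScaler())]:
    # fit() learns the scaling statistics from X_train; score() reuses them on X_test
    model = make_pipeline(scaler, LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)   # test accuracy
print(results)

# After standardization each training column has mean ~0 and standard deviation ~1
scaled_train = StandardScaler().fit_transform(X_train)
print(scaled_train.mean(axis=0).round(3), scaled_train.std(axis=0).round(3))
```

Fitting a scaler on the full dataset before splitting would leak test-set statistics into training; keeping the scaler inside the pipeline avoids that by construction.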

Algorithmic Evaluation:

• Logistic Regression, Random Forest, and the Bagging ensemble are trained and evaluated on a train/test split of the dataset, giving a clear picture of how well each algorithm predicts diabetes.
• Each algorithm is first trained with its default parameters, which serve as a baseline for subsequent fine-tuning.
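The baseline step can be sketched as follows, assuming the three models are taken directly from scikit-learn with default hyperparameters (only `random_state` is fixed for reproducibility, and `max_iter` is raised for `LogisticRegression` to avoid convergence warnings). Synthetic data from `make_classification` replace the real dataset, so the printed accuracies say nothing about the report's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 768-patient dataset (assumption)
X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           weights=[0.65], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Default hyperparameters form the baseline; only reproducibility/convergence settings change
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
}
baseline = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    baseline[name] = model.score(X_test, y_test)   # accuracy on held-out data

for name, acc in sorted(baseline.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")
```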

Performance Metrics:

• Algorithm performance is assessed with a suite of metrics: accuracy, precision, recall, F1-score, and AUC-ROC. Together these show each algorithm's strengths and weaknesses in diabetes prediction.
• Using several metrics captures different aspects of performance. For example, recall measures how many diabetic patients are correctly identified, while precision measures how many positive predictions are correct, which allows a fairer comparison between algorithms than accuracy alone.
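As an illustration of the metrics listed above, the following sketch wraps scikit-learn's metric functions in a hypothetical helper `evaluate`. The labels and scores are made-up values chosen so the results can be checked by hand; they are not results from this project.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    """Return the five metrics named in the report.
    y_pred holds hard 0/1 labels; y_score holds predicted probabilities
    for the positive class, which AUC-ROC needs."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),
    }

# Tiny hand-checkable example (illustrative values only)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.3, 0.6, 0.2, 0.8, 0.4, 0.9, 0.7]
y_pred = [int(s >= 0.5) for s in y_score]   # threshold probabilities at 0.5
print(evaluate(y_true, y_pred, y_score))
```

Here three of four positives and three of four negatives are classified correctly, so accuracy, precision, recall, and F1 are all 0.75, and 15 of the 16 positive/negative pairs are ranked correctly, giving an AUC of 0.9375. Note that scikit-learn expects the true labels first; swapping the arguments transposes the confusion matrix and exchanges precision and recall.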

Bagging Ensemble Focus:

• Special attention is given to the Bagging (bootstrap aggregating) ensemble, which trains multiple base estimators on bootstrap samples of the training data and aggregates their predictions. Combining diverse models in this way aims to reduce variance and make predictions more stable.
• The focus on Bagging reflects the improvements in predictive accuracy that ensemble methods can achieve, particularly in healthcare applications.
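The bootstrap-and-aggregate idea behind Bagging can be sketched from scratch as below. This is a simplified illustration, not scikit-learn's implementation: the hypothetical helpers `fit_bagging` and `predict_bagging` use a hard majority vote (ties go to class 1), whereas `BaggingClassifier` averages the base estimators' predicted probabilities when they provide `predict_proba`. The data are synthetic, and 10 trees mirror the configuration mentioned in the Abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def fit_bagging(X, y, n_estimators=10, seed=0):
    """Train n_estimators trees, each on a bootstrap sample (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap row indices
        trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return trees

def predict_bagging(trees, X):
    """Aggregate the trees' 0/1 predictions by majority vote (ties -> class 1)."""
    votes = np.stack([tree.predict(X) for tree in trees])   # shape (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           weights=[0.65], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

trees = fit_bagging(X_train, y_train, n_estimators=10)
y_pred = predict_bagging(trees, X_test)
single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("single tree accuracy:", single.score(X_test, y_test))
print("bagged (10 trees) accuracy:", (y_pred == y_test).mean())
```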

Feature Importance Analysis:

• Within the Bagging ensemble, feature importance is analyzed with Python-based tools to identify the predictors that most influence diabetes outcomes.
• This analysis makes the predictive models more interpretable by showing which variables contribute most to diabetes prediction.
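`BaggingClassifier` does not expose a `feature_importances_` attribute, so one possible way to obtain importances "within the ensemble" is to average the impurity-based importances of its fitted trees. The sketch below assumes decision-tree base estimators (scikit-learn's default for `BaggingClassifier`) and synthetic data; the helper name `bagging_feature_importance` is hypothetical and not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

def bagging_feature_importance(bag, n_features):
    """Average impurity-based importances over the fitted trees of a BaggingClassifier.
    Each tree sees only the columns in bag.estimators_features_, so its
    importances are mapped back to the original column positions first."""
    total = np.zeros(n_features)
    for tree, cols in zip(bag.estimators_, bag.estimators_features_):
        np.add.at(total, cols, tree.feature_importances_)   # also handles repeated columns
    return total / len(bag.estimators_)

# Synthetic data: with shuffle=False the first 3 columns are informative, the rest noise
X, y = make_classification(n_samples=768, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
bag = BaggingClassifier(n_estimators=10, random_state=0).fit(X, y)   # default base: decision tree
importance = bagging_feature_importance(bag, X.shape[1])
for i in np.argsort(importance)[::-1]:
    print(f"feature {i}: {importance[i]:.3f}")
```

Impurity-based importances are computed on training data and can overstate features with many distinct values; permutation importance on held-out data is a common cross-check.
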
Contemporary Practices:

• The project's technology (Python and its libraries) and its methodology follow current best practices in data science and healthcare analytics.
• By following these practices, the project contributes to the ongoing discussion on applying data science in healthcare.

Model Training and Evaluation:

Dataset Splitting:

• The dataset is randomly split into training and testing sets with Scikit-Learn's train_test_split. The split is stratified on the Outcome column, so both subsets keep the same proportion of diabetic and non-diabetic patients as the full dataset. Stratification preserves the original class ratio; it does not balance the classes.
• Using Scikit-Learn for splitting relies on standard, well-established machine learning tooling.
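The effect of stratification can be checked directly. The sketch below assumes a label vector with 500 negatives and 268 positives purely for illustration (the report does not state its class counts) and uses a placeholder feature column.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative labels: 768 rows, 500 negatives and 268 positives (assumed counts)
y = np.array([0] * 500 + [1] * 268)
X = np.arange(len(y)).reshape(-1, 1)   # placeholder feature column

_, _, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The positive rate should be (almost) identical in all three sets
for name, part in [("full", y), ("train", y_train), ("test", y_test)]:
    print(f"{name:5s} n={len(part):3d} positive rate={part.mean():.3f}")
```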

Training Procedure:

• Each algorithm (Logistic Regression, Random Forest, and the Bagging ensemble) undergoes systematic training on the training set.

SYSTEM SPECIFICATIONS

Software Requirements:
1. Python Version:
• Python 3.11

2. Integrated Development Environment (IDE):

• Jupyter Notebook or Anaconda Navigator

3. Machine Learning Libraries:

• Scikit-Learn
• Pandas
• NumPy
• Matplotlib
• Seaborn
• TensorFlow and/or PyTorch (optional, based on specific
algorithm requirements)
4. Database Management System (Optional):
• SQLite or any preferred relational database

5. Documentation and Reporting:
• Jupyter Notebook, Microsoft Word

Hardware Requirements:

1. Processor:
• Quad-core processor or higher

2. RAM:
• 16 GB or higher
3. Storage:
• 512 GB SSD or higher

4. Graphics Processing Unit (GPU) (optional):

• NVIDIA GeForce (CUDA-capable) or AMD Radeon series

5. Display:
• Resolution: Full HD (1920 x 1080) or higher

6. Operating System:
• Windows 10, macOS, or Linux (Ubuntu recommended for machine
learning tasks)

7. Internet Connectivity:
• Required for accessing external datasets, libraries, and updates.

8. Peripheral Devices:
• Mouse and keyboard for input
• Webcam and microphone for virtual collaborations and presentations
MODULE DESCRIPTION
Data Collection Module:

• Retrieves a dataset of 768 patient instances from a reputable medical research database, covering a range of health profiles.
• Provides the foundation for model training and evaluation by supplying detailed patient information for studying diabetes predictors.
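A hypothetical loader for this module could check the schema as soon as the data are read. The sketch assumes a CSV containing the column names that appear in the report's code (other columns may also be present) and a 0/1 `Outcome` label; the function name `load_dataset` and the sample rows are illustrative only.

```python
import io
import pandas as pd

# Columns referenced by the report's code; the real file may contain more
REQUIRED_COLUMNS = ["Glucose", "BloodPressure", "SkinThickness", "Insulin",
                    "BMI", "Age", "Outcome"]

def load_dataset(source):
    """Read the CSV (path or file-like object) and fail early on schema problems."""
    df = pd.read_csv(source)
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not set(df["Outcome"].unique()) <= {0, 1}:
        raise ValueError("Outcome must be binary (0/1)")
    return df

# Tiny in-memory CSV with illustrative values
sample = io.StringIO(
    "Glucose,BloodPressure,SkinThickness,Insulin,BMI,Age,Outcome\n"
    "148,72,35,0,33.6,50,1\n"
    "85,66,29,0,26.6,31,0\n"
)
df = load_dataset(sample)
print(df.shape)   # (2, 7)
```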

Data Preprocessing Module:

• Applies preprocessing techniques, including feature selection and engineering, to make the dataset more relevant and informative.
• Applies scaling and normalization so that input features are on comparable ranges and compatible with the different machine learning algorithms.
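The report's code treats zeros in Glucose, BloodPressure, SkinThickness, Insulin, and BMI as missing values. A sketch of the same idea that learns the fill values from the training split only is shown below, using scikit-learn's `SimpleImputer`. The helper `impute_zeros` is hypothetical, and the default median strategy is an assumption; the report's code fills with the column mean, which `strategy="mean"` reproduces.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Columns where a 0 is treated as "missing" (as in the report's code)
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

def impute_zeros(train, test, strategy="median"):
    """Replace 0 with NaN in ZERO_AS_MISSING, then fill using statistics
    learned from the training frame only (avoids test-set leakage)."""
    train, test = train.copy(), test.copy()
    for frame in (train, test):
        frame[ZERO_AS_MISSING] = frame[ZERO_AS_MISSING].replace(0, np.nan)
    imputer = SimpleImputer(strategy=strategy).fit(train[ZERO_AS_MISSING])
    train[ZERO_AS_MISSING] = imputer.transform(train[ZERO_AS_MISSING])
    test[ZERO_AS_MISSING] = imputer.transform(test[ZERO_AS_MISSING])
    return train, test

# Toy frames (illustrative values only)
train = pd.DataFrame({"Glucose": [100, 0, 140], "BloodPressure": [70, 80, 0],
                      "SkinThickness": [0, 20, 30], "Insulin": [0, 0, 90],
                      "BMI": [30.0, 0.0, 25.0]})
test = pd.DataFrame({c: [0] for c in ZERO_AS_MISSING})   # every value "missing"
train_f, test_f = impute_zeros(train, test)
print(test_f.iloc[0].to_dict())   # filled with the training medians
```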

Machine Learning Algorithms Module:

• Implements three machine learning algorithms: Logistic Regression, Random Forest, and the Bagging ensemble.
• Uses the Scikit-Learn library for model implementation, and can optionally integrate TensorFlow and PyTorch.

Model Training and Evaluation Module:

• Trains each algorithm on a stratified training split, using default parameters as a baseline for subsequent optimization.
• Evaluates model performance on the testing set with a suite of metrics: accuracy, precision, recall, F1-score, and AUC-ROC.
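Moving beyond the default-parameter baseline could look like the sketch below, which tunes a Random Forest with scikit-learn's `GridSearchCV` using 5-fold cross-validation on the training split only. The grid values, the ROC-AUC scoring choice, and the synthetic data are assumptions for illustration, not settings used in the report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           weights=[0.65], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Baseline: default hyperparameters
baseline = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Small illustrative grid; cross-validation runs on the training split only
param_grid = {"max_depth": [None, 5], "min_samples_leaf": [1, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      scoring="roc_auc", cv=5)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("best cross-validated ROC-AUC:", round(search.best_score_, 3))
print("baseline test accuracy:", baseline.score(X_test, y_test))
# search.score() would return ROC-AUC (the chosen scorer), so accuracy comes from the refitted best model
print("tuned test accuracy:", search.best_estimator_.score(X_test, y_test))
```

The test split is used only once, after tuning, so the final accuracy is not biased by the search itself.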

Bagging Ensemble Focus Module:

• Examines the Bagging ensemble in depth, in particular how it aggregates the predictions of multiple base estimators trained on bootstrap samples.
• Aims to improve predictive performance by combining multiple models rather than relying on a single one.
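One way to study the Bagging ensemble more closely is its out-of-bag (OOB) estimate. Each base estimator is trained on a bootstrap sample that leaves out roughly 37% of the rows (about 1/e), and scoring every row only with the estimators that never saw it gives a validation estimate without a separate hold-out set. The sketch below assumes scikit-learn's `oob_score=True` option, synthetic data, and an arbitrary set of ensemble sizes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=768, n_features=8, n_informative=4,
                           weights=[0.65], random_state=0)

oob = {}
for n in (25, 50, 100):
    # oob_score=True scores each row using only the estimators that did not train on it
    bag = BaggingClassifier(n_estimators=n, oob_score=True, random_state=0).fit(X, y)
    oob[n] = bag.oob_score_
    print(f"n_estimators={n:3d}  OOB accuracy={bag.oob_score_:.3f}")
```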

Feature Importance Analysis Module:

• Analyzes feature importance within the Bagging ensemble using Python-based tools.
• Identifies the predictors that most influence diabetes outcomes, improving model interpretability and supporting informed decision-making in healthcare.
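Impurity-based importances can favor features with many distinct values, so a model-agnostic cross-check is permutation importance on held-out data, available as `sklearn.inspection.permutation_importance`. The sketch below applies it to a Bagging ensemble trained on synthetic data; the feature names are placeholders standing in for columns such as Glucose, BMI, and Age.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: first 3 columns informative, remaining 5 noise (shuffle=False)
X, y = make_classification(n_samples=768, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]   # placeholder names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

bag = BaggingClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)

# Shuffle one column at a time on the held-out split and record the drop in accuracy;
# a large drop means the model relies on that feature
result = permutation_importance(bag, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"{feature_names[i]:10s} {result.importances_mean[i]:.3f} "
          f"± {result.importances_std[i]:.3f}")
```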

Documentation and Reporting Module:

• Uses Jupyter Notebook, Microsoft Word, or similar tools to prepare reports and presentations.
• Presents the methodology, results, and implications of the diabetes prediction models so that stakeholders can follow them easily.

Contemporary Practices Module:

• Follows current best practices in data science and healthcare analytics.
• Keeps the project's tools and methods relevant by adopting established practices as the field evolves.

System Specifications Module:

• Defines the software and hardware prerequisites listed under System Specifications, with an emphasis on compatibility, efficiency, and performance throughout the project lifecycle.
• Provides clear guidelines for the technical infrastructure, ensuring a smooth development and execution experience.

User Interface Module:

• Provides an intuitive, visually clear user interface for interacting with the machine learning models.
• Lets users enter patient data, start predictions, and view the results easily, prioritizing accessibility and usability.
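The interface is not specified further in the report, so the following is only a sketch of the logic such an interface might call: a hypothetical `predict_patient` function that validates one form submission and returns a predicted probability. The four input fields, the synthetic training data, and the labelling rule are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical form fields; a real interface would match the model's training columns
FEATURES = ["Glucose", "BloodPressure", "BMI", "Age"]

def predict_patient(model, record):
    """Validate one form submission (a dict) and return P(Outcome = 1).
    Raises ValueError on missing or non-numeric fields so the UI can show a message."""
    missing = [name for name in FEATURES if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    try:
        row = np.array([[float(record[name]) for name in FEATURES]])
    except (TypeError, ValueError) as exc:
        raise ValueError(f"non-numeric input: {exc}") from exc
    return float(model.predict_proba(row)[0, 1])

# Synthetic training data and labelling rule, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(loc=[120, 70, 32, 33], scale=[30, 12, 7, 11], size=(200, 4))
y = (X[:, 0] + 2 * X[:, 2] > 185).astype(int)
model = LogisticRegression(max_iter=1000).fit(X, y)

p = predict_patient(model, {"Glucose": "150", "BloodPressure": 80, "BMI": 35, "Age": 45})
print(f"estimated probability of diabetes: {p:.2f}")
```

Keeping validation in a plain function separates it from any particular UI toolkit, so the same checks apply whether input arrives from a web form or a notebook widget.
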
EXPERIMENTAL RESULTS
Libraries and Data Visualization:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("/content/Diabetes_Dataset.csv")
df.head()
df.tail()
df.info()
df.isnull().sum()

# Columns in which a value of 0 is not a plausible measurement
zero_cols = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
print((df[zero_cols] == 0).sum())   # number of zeros per column

# Treat those zeros as missing values, then fill every gap with the column mean
df[zero_cols] = df[zero_cols].replace(0, np.nan)
df.fillna(df.mean(), inplace=True)

# df1 (used by the plotting and splitting code below) is assumed to be the cleaned data
df1 = df.copy()
# Histograms, per-column plots, and box plots of the preprocessed data
df.hist(figsize=(20, 15))
df.plot(kind='bar', subplots=True, layout=(3, 3),
        sharex=False, sharey=False, figsize=(50, 50))
plt.show()
df1.plot(kind='box', subplots=True, layout=(3, 3),
         sharex=False, sharey=False, figsize=(15, 15))
plt.show()

# Finding relations: density of each feature, split by Outcome
import seaborn as sns

for col in ["Age", "Insulin", "BMI", "BloodPressure", "Glucose"]:
    fig = sns.FacetGrid(df1, hue="Outcome", aspect=4)
    fig.map(sns.kdeplot, col, fill=True)   # fill=True replaces the deprecated shade=True
    fig.set(xlim=(0, df1[col].max()))
    fig.add_legend()
plt.show()
# Train/test split, stratified on Outcome so both sets keep the same class ratio
from sklearn.model_selection import train_test_split

train, test = train_test_split(df1, test_size=0.25, random_state=0,
                               stratify=df1['Outcome'])
feature_cols = df1.columns.drop('Outcome')   # every column except the label
train_X = train[feature_cols]
test_X = test[feature_cols]
train_Y = train['Outcome']
test_Y = test['Outcome']

Algorithms:
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Logistic regression
lr = LogisticRegression()
lr.fit(train_X, train_Y)
p = lr.predict(test_X)

# scikit-learn metrics take (y_true, y_pred) in that order
print('The accuracy score for logistic regression is:\n', metrics.accuracy_score(test_Y, p))
print('\n\nThe confusion matrix:\n', metrics.confusion_matrix(test_Y, p))
print('\n\nThe classification report:\n', metrics.classification_report(test_Y, p))

prob = lr.predict_proba(test_X)[:, 1]   # probability of Outcome = 1

def plot_roc_curve(fpr, tpr):
    plt.plot(fpr, tpr, color='orange', label='ROC')
    plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC Curve for logistic regression')
    plt.legend()
    plt.show()

fpr, tpr, _ = roc_curve(test_Y, prob)
plot_roc_curve(fpr, tpr)
# Random forest
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, auc

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(train_X, train_Y)
rfm = rf.predict(test_X)

# Results (true labels first)
print('The accuracy score for the random forest algorithm is:\n', metrics.accuracy_score(test_Y, rfm))
print('\n\nThe confusion matrix:\n', metrics.confusion_matrix(test_Y, rfm))
print('\n\nThe classification report:\n', metrics.classification_report(test_Y, rfm))

y_scores = rf.predict_proba(test_X)[:, 1]
fpr, tpr, thresholds = roc_curve(test_Y, y_scores)
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, color='yellow', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic for random forest algorithm')
plt.legend(loc="lower right")
plt.show()

# Bagging classifier
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# BaggingClassifier bags copies of ONE base estimator, each trained on a
# bootstrap sample; it does not accept a list of different classifiers
# (sklearn.ensemble.VotingClassifier is the tool for combining different model types).
bagging_classifier = BaggingClassifier(
    estimator=DecisionTreeClassifier(),   # this parameter was named base_estimator before scikit-learn 1.2
    n_estimators=10,                      # 10 base estimators, as reported in the Abstract
    random_state=42
)
bagging_classifier.fit(train_X, train_Y)
y_pred = bagging_classifier.predict(test_X)
accuracy = accuracy_score(test_Y, y_pred)
print("Accuracy for bagging classifier:", accuracy)
CONCLUSION
Embarking on this internship journey at Exposys Data Labs has been a
profound and enriching experience. The focus on diabetes prediction through
advanced data science and machine learning techniques has not only deepened
my understanding of these technologies but has also provided practical insights
into their real-world applications, particularly in healthcare analytics.

The internship was structured around a series of modules, each contributing to a complete workflow for developing and evaluating diabetes prediction models. Every step, from systematic data collection through feature engineering to the implementation of machine learning algorithms, was a learning opportunity. The use of methods such as the Bagging ensemble reflects Exposys Data Labs' commitment to current practice in the field.

Throughout the internship, the emphasis on contemporary practices was evident. Working with widely used tools such as Scikit-Learn, with TensorFlow and PyTorch available as options, follows industry best practice and reflects the organization's dedication to keeping up with technological advances. This exposure has enhanced my technical skills and broadened my perspective on the dynamic landscape of data science.

In conclusion, this internship has been an invaluable chapter in my professional development. The hands-on experience, exposure to advanced technologies, and collaborative work environment have strengthened my technical skills and given me a deeper appreciation of the transformative potential of data science in addressing real-world challenges. I am grateful for the opportunities and mentorship provided during this internship, which have laid a strong foundation for my future endeavors in the dynamic field of data science.

