Sample INTERNSHIP Report
Submitted by
JASFER I (711721243038)
of
BACHELOR OF ENGINEERING
IN
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
NOV 2024
KGiSL Institute of Technology
(An Autonomous Institution)
Affiliated to Anna University, Approved by AICTE, Recognized by UGC,
Accredited by NAAC & NBA (B.E-CSE,B.E-ECE, B.Tech-IT),
365, KGiSL Campus, Thudiyalur Road, Saravanampatti, Coimbatore – 641035.
BONAFIDE CERTIFICATE
Certified that the candidates were examined by us for Summer Internship Viva
held on ____ at KGiSL Institute of Technology, Saravanampatti,
Coimbatore 641035.
We also thank all the faculty members of our department for their help in
making this internship project a successful one. Finally, we take this
opportunity to extend our deep appreciation to our family and friends for all
they meant to us during the crucial times of completing our project.
INTERNSHIP OFFER LETTER
INTERNSHIP CERTIFICATE
ABSTRACT
Healthcare Focus:
Programming Language:
Libraries:
Technological Integration:
Methods:
Algorithmic Evaluation:
Performance Metrics:
Dataset Splitting:
The dataset is split at random into training (75%) and testing (25%)
sets with train_test_split from Python's Scikit-Learn library. The
split is stratified on the Outcome label, so both subsets keep the
same class proportions as the full dataset and give representative
training and evaluation samples. A fixed random seed makes the split
reproducible. Scikit-Learn is used here because it is a standard,
well-established machine learning tool.
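A minimal sketch of such a split, assuming synthetic stand-in data rather than the project's dataset. The array sizes, the 40% positive rate, and the variable names are illustrative assumptions; only train_test_split and its parameters come from Scikit-Learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 200 samples, 8 features, 40% positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = np.array([1] * 80 + [0] * 120)

# stratify=y keeps the positive/negative ratio the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

full_rate = y.mean()
train_rate = y_train.mean()
test_rate = y_test.mean()
print(f"positive rate: full={full_rate:.3f} train={train_rate:.3f} test={test_rate:.3f}")
```

Without the stratify argument the class ratio in a small test set can drift from that of the full data, which makes accuracy figures harder to compare.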
Training Procedure:
Software Requirements:
1. Python Version:
• Python 3.11
Hardware Requirements:
1. Processor:
• Quad-core processor or higher
2. RAM:
• 16 GB or higher
3. Storage:
• 512 GB SSD or higher
5. Display:
• Resolution: Full HD (1920 x 1080) or higher
6. Operating System:
• Windows 10, macOS, or Linux (Ubuntu recommended for machine
learning tasks)
7. Internet Connectivity:
• Required for accessing external datasets, libraries, and updates.
8. Peripheral Devices:
• Mouse and keyboard for input
• Webcam and microphone for virtual collaborations and presentations
MODULE DESCRIPTION
Data Collection Module:
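import seaborn as sns

# Density of each feature, coloured by Outcome.
# df1 is the diabetes DataFrame loaded in the data collection step.

# BMI (the first two calls are assumed to mirror the plots that follow)
fig = sns.FacetGrid(df1, hue="Outcome", aspect=4)
fig.map(sns.kdeplot, 'BMI', fill=True)
oldest = df1['BMI'].max()
fig.set(xlim=(0, oldest))
fig.add_legend()

# Blood pressure
fig = sns.FacetGrid(df1, hue="Outcome", aspect=4)
fig.map(sns.kdeplot, 'BloodPressure', fill=True)
oldest = df1['BloodPressure'].max()
fig.set(xlim=(0, oldest))
fig.add_legend()

# Glucose
fig = sns.FacetGrid(df1, hue="Outcome", aspect=4)
fig.map(sns.kdeplot, 'Glucose', fill=True)
oldest = df1['Glucose'].max()
fig.set(xlim=(0, oldest))
fig.add_legend()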
# Train/test split
from sklearn.model_selection import train_test_split

# Stratify on Outcome so both subsets keep the original class proportions
train, test = train_test_split(df1, test_size=0.25, random_state=0, stratify=df1['Outcome'])
X = df1.drop('Outcome', axis=1)
train_X = train[train.columns[:8]]  # the first eight columns are used as features
test_X = test[test.columns[:8]]
train_Y = train['Outcome']
test_Y = test['Outcome']
Algorithms:
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Logistic regression
lr = LogisticRegression()
lr.fit(train_X, train_Y)
p = lr.predict(test_X)

# scikit-learn metrics take the true labels first, then the predictions
print('The accuracy Score for logistic regression is:\n', metrics.accuracy_score(test_Y, p))
print('\n \n The confusion matrix: \n', metrics.confusion_matrix(test_Y, p))
print('\n\n The metrics classification report:\n ', metrics.classification_report(test_Y, p))

# Predicted probability of the positive class, used for the ROC curve
prob = lr.predict_proba(test_X)
prob = prob[:, 1]
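import matplotlib.pyplot as plt

def plot_roc_curve(fpr, tpr):
    plt.plot(fpr, tpr, color='orange', label='ROC')
    plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC Curve for logistic regression')
    plt.legend()
    plt.show()

# Assumed call site: build the curve from the probabilities computed above
fpr, tpr, thresholds = metrics.roc_curve(test_Y, prob)
plot_roc_curve(fpr, tpr)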
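The points of an ROC curve and its area under the curve (AUC) can be computed as in the sketch below. It assumes synthetic data from make_classification in place of the diabetes dataset, and names such as X_tr, X_te and scores are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), not the diabetes dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]  # probability of the positive class

# Both functions take the true labels first, then the scores.
fpr, tpr, thresholds = roc_curve(y_te, scores)
auc = roc_auc_score(y_te, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to the dashed diagonal in the plot (random guessing), and values closer to 1.0 indicate better separation of the two classes.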
# Random forest
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(train_X, train_Y)
rfm = rf.predict(test_X)
# Results
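The six classifiers imported under Algorithms can be compared on one split with a loop such as the following. This is only a sketch under stated assumptions: the data is synthetic, the models dictionary and its labels are hypothetical, and the printed accuracies say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption), not the diabetes dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical registry of the classifiers to compare.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit every model on the same training set and score it on the same test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    accuracies[name] = accuracy_score(y_te, model.predict(X_te))

# Print the models from highest to lowest test accuracy.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")
```

Using one shared split keeps the comparison fair, since every model sees exactly the same training and test samples.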
# Bagging classifier
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

base_classifiers = [
    DecisionTreeClassifier(),
    LogisticRegression(),
    RandomForestClassifier(),
    SVC(),
]

# BaggingClassifier trains copies of a single base estimator on bootstrap
# samples; when no estimator is given it uses a DecisionTreeClassifier.
# The list above therefore only sets the ensemble size here.
bagging_classifier = BaggingClassifier(
    n_estimators=len(base_classifiers),  # number of bagged estimators
    random_state=42,
)
bagging_classifier.fit(train_X, train_Y)
y_pred = bagging_classifier.predict(test_X)
accuracy = accuracy_score(test_Y, y_pred)
print("Accuracy for bagging classifier:", accuracy)
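BaggingClassifier resamples the training data for copies of one base estimator, so it does not mix different model types. If the aim is to combine the four classifiers listed above, Scikit-Learn's VotingClassifier is one option. The sketch below uses that different technique, which is not the report's method, on synthetic data; the short estimator labels are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption), not the diabetes dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hard voting: each fitted model casts one vote and the majority class wins.
voting = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    voting="hard",
)
voting.fit(X_tr, y_tr)
y_pred = voting.predict(X_te)
accuracy = accuracy_score(y_te, y_pred)
print("Accuracy for voting classifier:", accuracy)
```

Soft voting (voting="soft") averages predicted probabilities instead, which would require SVC to be created with probability=True.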
CONCLUSION
Embarking on this internship journey at Exposys Data Labs has been a
profound and enriching experience. The focus on diabetes prediction through
advanced data science and machine learning techniques has not only deepened
my understanding of these technologies but has also provided practical insights
into their real-world applications, particularly in healthcare analytics.