AI Project Medicine Recommending System

Uploaded by

kashaf.zahra04

Semester Project

“Medicine Prediction Model”

Name: Syeda Kashaf Naqvi

Roll#: BSSEM-F22-125

Section: 5C

Department: Software Engineering

Submission Date: 27-11-2024

Course: Artificial Intelligence

Course Instructor: Ms. Asma Abubakar

1. Title and Overview

Title:
Medical Dataset for Predicting Medicines Based on Symptoms

Overview:
This dataset is designed to support the prediction of appropriate medicines based
on a patient’s symptoms. It contains medical records including patient
demographics, symptoms, diagnosed causes, and prescribed treatments. The
dataset was likely collected for research or to develop a recommendation system
for healthcare. It represents a small sample of real-world medical cases.

2. Source

Origin:
The origin of the dataset is unspecified but may have come from simulated or
anonymized medical records.

Collection Method:
The data appear to have been collected from case summaries that combine
observational and synthetic inputs. The dataset includes common symptoms,
probable causes, and associated treatments.

Why use Random Forest Classifier?

Three classifiers were compared on accuracy, and Random Forest scored highest:

Logistic Regression Accuracy: 0.5344827586206896

Decision Tree Accuracy: 0.8448275862068966

Random Forest Accuracy: 0.896551724137931
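
A comparison of this kind could be run roughly as follows. This is a minimal sketch, not the report's actual experiment: the data is a synthetic stand-in generated with make_classification, and the hyperparameters (for example max_iter=1000) are assumptions, so the printed accuracies will differ from the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): the report's CSV is not reproduced here.
X, y = make_classification(n_samples=287, n_features=20, n_informative=10,
                           n_classes=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# The three model families named in the report, with assumed settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Fit each model on the same split and record its test accuracy.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name} Accuracy: {accuracies[name]}")
```

Using one shared train/test split keeps the comparison fair, since every model is scored on the same held-out rows.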

3. Content Description

Variables:

Name           Type         Description                                       Units
Name           Categorical  Patient name (potentially anonymized).            N/A
Date of Birth  Date         Date of birth of the patient.                     DD-MM-YYYY
Gender         Categorical  Gender of the patient (e.g., Male, Female).       N/A
Symptoms       Categorical  List of reported symptoms.                        N/A
Causes         Categorical  Possible causes of symptoms.                      N/A
Disease        Categorical  Diagnosed disease based on symptoms and causes.   N/A
Medicine       Categorical  Prescribed medication or treatment.               N/A

Sample Size:
The dataset includes 287 records, though some entries have missing values.

4. Data Structure
Format:
CSV file.

Dimensions:
287 rows × 7 columns.
5. Summary Statistics

Descriptive Statistics:

● Name: 241 non-null, with 87 unique entries; most frequent value is "Sophia
Koh".
● Gender: 242 non-null; predominantly "Male" (116 occurrences).
● Symptoms: 247 non-null, with 53 unique combinations; the most common
combination is "Fatigue, Weakness".
● Disease: 249 non-null, 68 unique; "Gastroenteritis" appears most frequently
(20 times).
● Medicine: 242 non-null, 65 unique; "Rest, Lifestyle" is the most common
treatment (16 occurrences).

Distributions:
Variables such as Symptoms, Causes, and Disease are categorical and multi-modal,
so numeric descriptive statistics are not applicable. The missing data and
frequent repetitions suggest that the records are synthetic or anonymized.
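
Per-column figures like the ones listed in this section (non-null count, number of unique values, most frequent value) can be computed with pandas. The sketch below uses a tiny made-up frame, not the actual dataset, and the variable names are illustrative.

```python
import pandas as pd

# Tiny illustrative frame (assumption): values are made up, not from the dataset.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", None],
    "Disease": ["Gastroenteritis", "Migraine", "Gastroenteritis", "Asthma"],
})

# One row per column: non-null count, distinct values, mode, and mode frequency.
summary = pd.DataFrame({
    "non_null": df.count(),
    "unique": df.nunique(),
    "top": df.mode().iloc[0],
    "top_count": df.apply(lambda col: col.value_counts().iloc[0]),
})
print(summary)
```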

6. Data Quality

Missing Values:

● Missing entries occur in columns such as Name, Gender, Symptoms, Causes, and
Medicine.
● Possible strategies include imputing each column's mode (the mean does not
apply to these categorical columns) or excluding incomplete rows.
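
The two strategies can be sketched as follows on a small made-up frame (the values are illustrative assumptions, not rows from the dataset).

```python
import pandas as pd

# Illustrative data with gaps (assumption): not taken from the dataset.
df = pd.DataFrame({
    "Gender": ["Male", None, "Male", "Female"],
    "Causes": ["Stress", "Obesity", None, "Stress"],
})

# Option 1: fill each column with its most frequent value (mode).
filled = df.copy()
for column in filled.columns:
    filled[column] = filled[column].fillna(filled[column].mode()[0])

# Option 2: drop every row that has at least one missing value.
dropped = df.dropna()

print(filled)
print(dropped)
```

Mode imputation keeps all rows but can inflate the most common category; dropping rows avoids that at the cost of a smaller training set.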

Outliers:

● Some rare values may indicate edge cases or errors but should be validated.

Validation:

● Cross-verification against known medical knowledge could help improve data
reliability.
7. References

● Dataset Source: Kaggle: Medicine Recommendation System Dataset

● Additional Resources:
o DrugBank Dataset (for drug information)
o Symptom-Disease Mapping from Mayo Clinic

“CODE”

The following code has been implemented for the model:

Library Imports
# =========== IMPORT REQUIRED LIBRARIES =====================

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import joblib

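Pre Processing
# ============= PREPROCESSING ==================
# NOTE: the function header and the symptom-splitting step do not appear in the
# listing; they are reconstructed here from how preprocess() is called in main()
# and test(), assuming Symptoms holds comma-separated strings.

def preprocess(df: pd.DataFrame, categorical_columns, mlb, test=False):
    # Split each comma-separated Symptoms string into a list of symptom names
    df['Symptoms'] = df['Symptoms'].fillna('').apply(
        lambda s: [x.strip() for x in s.split(',') if x.strip()])

    unique_symptoms = [symptom for symptoms in df['Symptoms'] for symptom in symptoms]
    print(unique_symptoms)

    # Encode Symptoms column using MultiLabelBinarizer
    if not test:
        # Fit and transform for training data
        symptoms_encoded = mlb.fit_transform(df['Symptoms'])
    else:
        # Only transform for test data
        symptoms_encoded = mlb.transform(df['Symptoms'])

    # Create a new DataFrame for the encoded symptoms
    symptoms_df = pd.DataFrame(symptoms_encoded, columns=mlb.classes_, index=df.index)

    # Drop the original Symptoms column and concatenate the new binary features
    df = pd.concat([df.drop('Symptoms', axis=1), symptoms_df], axis=1)

    # Fill missing values for other categorical columns with the column mode
    for column in categorical_columns:
        if column in df.columns:
            df[column] = df[column].fillna(df[column].mode()[0])
        else:
            print(f"Warning: Column '{column}' is missing in the DataFrame.")

    return df, mlb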
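
The symptom encoding step can be illustrated in isolation. The sketch below assumes symptoms arrive as comma-separated strings (as in the test data later in this report) and shows how MultiLabelBinarizer turns them into one binary column per symptom. The three sample rows are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Illustrative symptom strings (assumption: comma-separated, as in the test data).
symptoms = pd.Series(["Fatigue, Weakness", "Anxiety, Numbness", "Fatigue"])

# Split each string into a clean list of symptom names.
symptom_lists = symptoms.apply(lambda s: [x.strip() for x in s.split(",")])

# Fit the binarizer and build one 0/1 column per distinct symptom.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(mlb.fit_transform(symptom_lists),
                       columns=mlb.classes_, index=symptoms.index)
print(encoded)
```

Each row now has a 1 in the column of every symptom it reports, which lets a tree-based model split on individual symptoms rather than on whole combinations.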

Encoding features
# ============= Encoding of Features ==================

def label_encode(df: pd.DataFrame):
    # Fit one LabelEncoder per text column, encode the column in place,
    # and keep the encoders so predictions can be decoded later
    label_encoders = {}
    for column in df.columns:
        if df[column].dtype == 'object':
            le = LabelEncoder()
            df[column] = le.fit_transform(df[column])
            label_encoders[column] = le
    return label_encoders

def checking_missing(df: pd.DataFrame):
    # Return the missing-value count for each column that has any
    missing_values = df.isnull().sum()
    return missing_values[missing_values > 0]
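
As a small illustration of why the fitted encoders are returned: a LabelEncoder maps each distinct string to an integer code, and the same encoder can map codes back to the original labels. In the sketch below, "Rest, Lifestyle" comes from the dataset summary, while the other medicine names are made up for the example.

```python
from sklearn.preprocessing import LabelEncoder

# "Rest, Lifestyle" appears in the report; the other labels are illustrative.
medicines = ["Rest, Lifestyle", "Antibiotics", "Rest, Lifestyle", "Antacids"]

le = LabelEncoder()
codes = le.fit_transform(medicines)      # strings -> integer codes
decoded = le.inverse_transform(codes)    # integer codes -> strings

print(list(le.classes_))
print(list(codes))
print(list(decoded))
```

Codes are assigned in sorted order of the labels, so they carry no medical meaning; they only give the classifier a numeric target.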

Testing on unseen test data

# ================== UNSEEN TESTING DATA ========================

def test(C_COLS, LEs, MODEL, BINARIZER):
    data = pd.DataFrame({
        'Name': ['Zaid', 'Jawad'],
        'DateOfBirth': ['1990-01-01', '1992-02-02'],
        'Gender': ['Male', 'Female'],
        'Symptoms': ['Anxiety, Numbness', 'Abdominal Pain, Bloating'],
        'Causes': ['Stress', 'Obesity'],
        'Disease': ['Anxiety Disorder', 'Sleep Apnea'],
    })

    # Preprocess test data using the existing BINARIZER
    data, _ = preprocess(data, C_COLS, BINARIZER, test=True)

    # Apply label encoding using the trained encoders
    for column in data.columns:
        if column in LEs:
            encoder = LEs[column]
            data[column] = encoder.transform(data[column])

    # Make predictions
    predictions = MODEL.predict(data)
    print(LEs['Medicine'].inverse_transform(predictions))
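
One practical caveat when reusing trained encoders on new records: LabelEncoder.transform raises a ValueError for any label it did not see during fitting, which can happen with new names or causes. The sketch below shows one way to detect such values before encoding. The helper find_unseen and the sample labels are hypothetical and not part of the project code.

```python
from sklearn.preprocessing import LabelEncoder

# Encoder fitted on illustrative training labels (assumption).
encoder = LabelEncoder().fit(["Stress", "Obesity", "Infection"])


def find_unseen(encoder, values):
    """Return the values that the fitted encoder has never seen (hypothetical helper)."""
    known = set(encoder.classes_)
    return [v for v in values if v not in known]


new_causes = ["Stress", "Allergy"]
unseen = find_unseen(encoder, new_causes)
print("Unseen labels:", unseen)

# Transforming a list that contains an unseen label fails with ValueError.
try:
    encoder.transform(new_causes)
    raised = False
except ValueError:
    raised = True
print("transform raised ValueError:", raised)
```

Checking first makes it possible to report or skip problematic rows instead of failing in the middle of prediction.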

Main Function
This function prompts for the training dataset CSV and then trains the model on it.

# ================== MAIN FUNCTION ===================

def main():
    # For Colab: Upload file and load dataset
    from google.colab import files
    print("Upload your 'medical data.csv' file.")
    uploaded = files.upload()  # Prompt for file upload in Colab
    df = pd.read_csv(list(uploaded.keys())[0])

    CATEGORICAL_COLUMNS_TRAIN = ['Gender', 'Causes', 'Disease', 'Medicine']
    CATEGORICAL_COLUMNS_TEST = CATEGORICAL_COLUMNS_TRAIN[:-1]

    print([x for x in df['Disease'].unique()])

    # Preprocess training data
    mlb = MultiLabelBinarizer()
    df, mlb = preprocess(df, CATEGORICAL_COLUMNS_TRAIN, mlb)

    # Label encode categorical columns
    label_encoders = label_encode(df)

    # Prepare features and target variable
    X = df.drop('Medicine', axis=1)
    y = df['Medicine']

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Train a RandomForest model
    rf_classifier = RandomForestClassifier(random_state=42)
    rf_classifier.fit(X_train, y_train)

    # Evaluate the model
    predictions = rf_classifier.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    print(f'Accuracy: {accuracy}')

    # Test on new data
    # test(CATEGORICAL_COLUMNS_TEST, label_encoders, rf_classifier, mlb)

    print("ALL GOOD")

    # Save model (optional)
    # joblib.dump(rf_classifier, 'rf_classifier.joblib')

# Call the main function
main()
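
The optional save step can be tried on its own. The following is a minimal sketch that assumes a small synthetic dataset and a temporary directory; only the file name rf_classifier.joblib is taken from the listing above. It saves a trained Random Forest with joblib, loads it back, and checks that both copies predict the same labels.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data and a small forest (assumptions for the example).
X, y = make_classification(n_samples=100, n_features=8, random_state=42)
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X, y)

# Save to a temporary directory, then load the model back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rf_classifier.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print("Predictions match after reload:", same_predictions)
```

In a real deployment the fitted label encoders and the MultiLabelBinarizer would need to be saved alongside the model, since new records must be encoded the same way as the training data.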
