Predicting Early Stage Lung Cancer Using Advanced Machine Learning Methods

Uploaded by

joestanly8055

Abstract

Lung cancer is one of the most prevalent and serious diseases worldwide and affects people of all age
groups, from children to the elderly. Its diagnosis and treatment consume substantial financial
resources every year. Existing clinical techniques, such as X-rays and other imaging procedures,
require complex hardware and are costly, which motivates accurate and reliable prediction methods
that work from medical datasets. Machine learning models offer a comparatively cost-efficient option
for this kind of diagnosis. Long-term tobacco smoking accounts for about 85 percent of lung cancer
cases, while 10–15 percent of cases occur in individuals who have never smoked. This study uses
current data analysis and computation tools to build prediction models aimed at detecting lung cancer
at an early stage. It compares several classification and ensemble models: Support Vector Machine
(SVM), K-Nearest Neighbour (KNN), Random Forest (RF), Artificial Neural Networks (ANN), and a hybrid
model, the Voting classifier. The models are evaluated and compared by their accuracy, with the goal
of supporting early identification of lung cancer in patients.

Keywords: Machine Learning, Support Vector Machine, Voting, Random Forest, Cancer, K-Nearest
Neighbour, Neural Networks

Introduction

Lung cancer stands as one of the most formidable health challenges worldwide, with its prevalence
cutting across all age groups, from children to the elderly. The financial burden associated with lung
cancer is significant, encompassing both the costs of diagnosis and treatment. Traditional clinical
techniques for diagnosing lung cancer, such as X-rays and other imaging procedures, require
sophisticated hardware and are often expensive. This underscores the urgent need for more efficient,
cost-effective methods to accurately predict lung cancer, particularly in its early stages. In this
context, machine learning models have emerged as a promising alternative, offering the potential for
more effective and affordable diagnostic solutions.

The etiology of lung cancer is closely linked to tobacco smoking, which is responsible for
approximately 85 percent of cases. However, 10–15 percent of lung cancer cases occur in individuals
who have never smoked, which highlights the complexity of the disease and the need for robust
diagnostic tools. Advances in data analysis and computational methods have made it possible to build
machine learning models that aim to predict lung cancer accurately. These models use large medical
datasets to identify patterns and correlations that traditional diagnostic approaches might miss.

This study focuses on the application and comparison of various machine learning classification and
ensemble models for the early detection of lung cancer. The models under consideration include
Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Random Forest (RF), Artificial Neural
Networks (ANN), and a hybrid model known as the Voting classifier. Each of these models brings
unique strengths to the table, and their comparative analysis aims to identify the most accurate and
reliable model for lung cancer prediction.

Machine Learning in Medical Diagnosis

Machine learning, a subset of artificial intelligence, involves the use of algorithms and statistical
models to analyze and interpret complex data sets. In medical diagnosis, machine learning models
can process vast amounts of data, identify patterns, and make predictions with a high degree of
accuracy. These models are particularly valuable in the context of lung cancer, where early detection
is crucial for improving patient outcomes. By analyzing medical data sets, machine learning models
can predict the likelihood of lung cancer in patients, potentially before clinical symptoms become
apparent.

Support Vector Machine (SVM) is a classification method that finds the hyperplane separating the
classes with the largest margin. It works well in high-dimensional spaces and, through kernel
functions, handles both linearly and non-linearly separable data. K-Nearest Neighbour (KNN) is a
simple yet effective algorithm that assigns a sample to the majority class among its k nearest
neighbours. Random Forest (RF) is an ensemble method that builds multiple decision trees and
combines their predictions to improve accuracy and reduce overfitting. Artificial Neural Networks
(ANN) are loosely inspired by the neural networks of the human brain and learn complex patterns
through multiple layers of interconnected nodes.
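
As a rough illustration of the KNN rule described above, the following sketch classifies a point by
majority vote among its k nearest neighbours. The function name knn_predict and the toy points are
illustrative assumptions; they are not part of the study's code or data.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up feature vectors (not patient data): two loose clusters.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (1.1, 1.0)))  # prints 0
print(knn_predict(points, labels, (4.0, 4.0)))  # prints 1
```

An odd k avoids tied votes in a two-class problem, which is one reason small odd values are a common
starting point.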

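The Voting classifier is a hybrid (ensemble) model that combines the predictions of several machine
learning algorithms, either by majority vote over the predicted classes (hard voting) or by averaging
the predicted class probabilities (soft voting). Because the individual models tend to make different
errors, the combined prediction can be more reliable than any single model, which makes the Voting
classifier a useful tool in medical diagnosis.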
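
A minimal sketch of such a Voting classifier, using scikit-learn's VotingClassifier over SVM, KNN and
Random Forest, is given here. The synthetic data from make_classification, the choice of hard voting
and the variable names are assumptions made for illustration; the study's own dataset and settings
may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); replace with the real features and labels.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# SVM and KNN are distance based, so each gets its own scaler in a pipeline.
voting_model = VotingClassifier(
    estimators=[
        ("svm", make_pipeline(StandardScaler(), SVC(kernel="linear"))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ],
    voting="hard",  # majority vote over the predicted class labels
)
voting_model.fit(X_train, y_train)

voting_accuracy = voting_model.score(X_test, y_test)
print("Voting classifier accuracy:", voting_accuracy)
```

Soft voting (voting="soft") would instead average class probabilities, which requires every base
model to expose predict_proba; for SVC that means constructing it with probability=True.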

Importance of Early Detection

Early detection of lung cancer significantly enhances the chances of successful treatment and
survival. Traditional diagnostic methods often detect lung cancer at advanced stages, where
treatment options are limited and less effective. Machine learning models can facilitate earlier
detection by identifying subtle patterns in medical data that may indicate the presence of lung
cancer. This early intervention can lead to better patient outcomes, reduced treatment costs, and
improved quality of life for patients.

Evaluation and Comparison of Models

In this study, the performance of the machine learning models is evaluated and compared based on
their accuracy in predicting lung cancer, that is, the fraction of test cases each model classifies
correctly. Accuracy matters in medical diagnosis because it bears directly on the reliability of the
predictions and, consequently, on the clinical decisions based on them. By comparing the models, this
study aims to identify the most effective machine learning techniques for early stage lung cancer
prediction.

The application of machine learning models to lung cancer prediction represents a significant
advancement in medical diagnostics. By leveraging advanced computational methods and large medical
data sets, these models can provide accurate and cost-effective diagnostic solutions. The comparative
analysis of SVM, KNN, RF, ANN, and the Voting classifier offers insight into the most effective
techniques for early detection of lung cancer, ultimately contributing to better patient outcomes and
reduced healthcare costs.
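
Accuracy is the comparison metric used in this study. As a supplementary worked example, the sketch
here shows how accuracy relates to sensitivity and specificity on a small made-up set of labels in
which positives are rare. The function name binary_metrics and the label lists are illustrative
assumptions, not results from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    return accuracy, sensitivity, specificity


# Made-up labels: nine negatives, one positive, and a model that always predicts 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10

accuracy, sensitivity, specificity = binary_metrics(y_true, y_pred)
print(accuracy, sensitivity, specificity)  # prints 0.9 0.0 1.0
```

The always-negative model reaches 90 percent accuracy while missing the only positive case, which is
why the per-class figures in a classification report are worth reading alongside accuracy.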

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

# Load dataset from CSV file
# data_path must point to the CSV file itself. The path below is truncated in the
# source and ends at a directory, so pd.read_csv would fail on it as written.
data_path = 'D:/cancer/'
data = pd.read_csv(data_path)

# Check the first few rows of the dataset
print(data.head())

# Assuming the last column is the target variable and the rest are features
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

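# Plot the correlation matrix
# numeric_only=True (pandas 1.5+) skips text columns, which would otherwise
# raise an error in pandas 2.x
plt.figure(figsize=(12, 10))
correlation_matrix = data.corr(numeric_only=True)
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Matrix')
plt.show()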

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data
# The scaler is fitted on the training set only and then applied to the test set,
# so no test-set statistics leak into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train and evaluate SVM
svm_model = SVC(kernel='linear')
svm_model.fit(X_train, y_train)
y_pred_svm = svm_model.predict(X_test)
print("SVM Accuracy:", accuracy_score(y_test, y_pred_svm))
print(classification_report(y_test, y_pred_svm))

# Train and evaluate Random Forest
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)
print("Random Forest Accuracy:", accuracy_score(y_test, y_pred_rf))
print(classification_report(y_test, y_pred_rf))

# Train and evaluate K-Nearest Neighbour
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)
y_pred_knn = knn_model.predict(X_test)
print("K-Nearest Neighbour Accuracy:", accuracy_score(y_test, y_pred_knn))
print(classification_report(y_test, y_pred_knn))

# Prepare data for CNN
# Each row is reshaped into a square side x side single-channel "image".
# This only works when the number of features is a perfect square, and the 3x3
# convolution followed by 2x2 pooling needs side >= 4.
# Adjust the shape as per your actual dataset's requirements
num_features = X_train.shape[1]
side = int(np.sqrt(num_features))
if side * side != num_features or side < 4:
    raise ValueError(
        f"Cannot reshape {num_features} features into a square grid of side >= 4"
    )

X_train_cnn = X_train.reshape(X_train.shape[0], side, side, 1)
X_test_cnn = X_test.reshape(X_test.shape[0], side, side, 1)

# to_categorical expects integer class labels (0 and 1 for two classes)
y_train_cnn = to_categorical(y_train, 2)
y_test_cnn = to_categorical(y_test, 2)

# Define the CNN model
cnn_model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(side, side, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(2, activation='softmax')
])
cnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train and evaluate CNN
cnn_model.fit(X_train_cnn, y_train_cnn, epochs=10, batch_size=32, validation_split=0.2)
cnn_loss, cnn_accuracy = cnn_model.evaluate(X_test_cnn, y_test_cnn)
print("CNN Accuracy:", cnn_accuracy)

# Summarize results
print(f"SVM Accuracy: {accuracy_score(y_test, y_pred_svm)}")
print(f"Random Forest Accuracy: {accuracy_score(y_test, y_pred_rf)}")
print(f"K-Nearest Neighbour Accuracy: {accuracy_score(y_test, y_pred_knn)}")
print(f"CNN Accuracy: {cnn_accuracy}")
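
The abstract names an Artificial Neural Network (ANN) among the compared models, while the listing
above trains a convolutional network on reshaped rows. For tabular features, one possible ANN is a
small fully connected network; the sketch here uses scikit-learn's MLPClassifier for that purpose.
The synthetic data, the single hidden layer of 64 units, and the names X_demo, y_demo and ann_model
are assumptions for illustration and do not reproduce the study's model.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with the real features and labels.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Scaling and the network sit in one pipeline, so the scaler is fitted on training data only.
ann_model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=42),
)
ann_model.fit(X_tr, y_tr)

ann_accuracy = ann_model.score(X_te, y_te)
print("ANN Accuracy:", ann_accuracy)
```

Unlike the reshaped CNN input, this network accepts any number of features, so it does not depend on
the feature count being a perfect square.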
