Lung_Cancer_Prediction_and_Classification_Using_Machine_Learning_Algorithms
Adenocarcinoma            474
Large cell carcinoma      332
Squamous cell carcinoma   360
Normal                    363

The organized structure of the dataset allows it to be divided into separate subsets for training, validation, and testing. This split supports a thorough evaluation of machine learning models for lung cancer detection.

1. Random Forest: Random Forest, an ensemble learning technique, builds multiple decision trees during training. It then combines their predictions by taking the mode of the predicted classes (for classification) or the mean prediction (for regression). Random Forest is robust to overfitting and works efficiently with high-dimensional data. Because it can be used for both classification and regression, it is a versatile approach.
2. Decision Tree: A decision tree is a simple but effective supervised learning method for solving classification and regression problems. To maximize information gain or minimize impurity, it recursively partitions the dataset based on feature values to create a hierarchical tree structure. Decision trees can handle categorical and numerical data and
Authorized licensed use limited to: MVSR Engineering College. Downloaded on October 19,2024 at 07:38:01 UTC from IEEE Xplore. Restrictions apply.
are interpretable.
3. Logistic Regression: Logistic regression is a linear classification algorithm used for binary classification tasks. It uses a logistic (sigmoid) function to predict the probability that a case belongs to a certain class. It works well with linearly separable data and is interpretable and computationally economical.
4. Support Vector Machine (SVM): SVM is a powerful supervised learning method that can be applied to regression and classification problems. It defines an optimal hyperplane that maximizes the margin between classes in the feature space and, with kernel functions, works well with both linearly and nonlinearly separable data. SVM handles high-dimensional data well and is robust to overfitting.
5. K-Nearest Neighbors (KNN): KNN is an instance-based, non-parametric learning technique used to solve regression and classification problems. It classifies a new case by assigning the majority class label among its K nearest neighbors in the feature space. KNN is easy to understand, easy to use, and suitable for datasets with complex decision boundaries.

ii. Evaluating parameters
Several additional performance criteria are needed to evaluate a classification model. Precision, recall, F1 score, and support are some of these metrics, as shown in Figure 4. Taken as a whole, they provide a comprehensive assessment of the model's performance in lung cancer detection and give insight into its ability to accurately identify different types of lung tissue while reducing false positives and false negatives.
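To make the five classifiers and the precision, recall, F1 score, and support metrics described above more concrete, the following is a minimal sketch using scikit-learn. It is not the authors' pipeline: the synthetic four-class data from `make_classification` stands in for features extracted from lung images, and every hyperparameter (`n_estimators=100`, `n_neighbors=5`, the RBF kernel, the 75/25 split) is an assumption chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic 4-class data stands in for real image features.
X, y = make_classification(
    n_samples=800, n_features=20, n_informative=10,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Hold out 25% of the samples for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The five classifiers named in the text, with illustrative settings.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

reports = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Per-class precision, recall, F1 score, and support as a dict.
    reports[name] = classification_report(
        y_test, y_pred, output_dict=True, zero_division=0
    )
    print(name)
    print(classification_report(y_test, y_pred, zero_division=0))
```

Each printed report has one row per class, so a model that does well on one tissue type but poorly on another is visible even when its overall accuracy looks high.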
Fig. 4: Performance metrics

iii. Confusion matrix
The confusion matrix, shown in Figure 5, is a practical tool for evaluating classification models: it provides a comprehensive breakdown of the differences between the model's predictions and the actual classes in the dataset. This is particularly useful for finding out which mistakes the model makes and where it needs to be improved. By examining the confusion matrix, we can gain insight into the strengths and weaknesses of a classification model, identify patterns of misclassification, and modify the model to optimize its performance in real-world scenarios.

5. Data Acquisition
Data acquisition involves collecting relevant data from various sources for analysis or for training machine learning models. The process typically begins with defining the data requirements and identifying potential sources, followed by collecting the data. Techniques for data acquisition include web scraping, API calls, database queries, sensor data collection, and manual data entry. It is essential to ensure data quality, validity, and compliance with privacy regulations throughout the acquisition process. Once acquired, the data can be preprocessed, cleaned, and transformed to prepare it for analysis or model training.

6. Cross-Validation
Cross-validation is a technique used to assess the performance of a machine learning model by partitioning the available data into subsets for training and testing. The data is split into k folds; the model is trained on k-1 folds and evaluated on the remaining fold. This process is repeated k times, with each fold serving as the test set exactly once. Cross-validation helps to ensure that the model's performance is not overly dependent on a particular subset of the data, providing a more reliable estimate of its generalization ability.
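The k-fold procedure and the confusion matrix described above can be combined in a short sketch. This is an illustration under stated assumptions rather than the paper's implementation: the data is synthetic, `k = 5` is an arbitrary choice, and `StratifiedKFold` (a k-fold variant that preserves class proportions in every fold) is used in place of plain k-fold splitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold

# Assumption: synthetic 4-class data stands in for real image features.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

k = 5  # number of folds (illustrative choice)
folds = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)

fold_accuracy = []
total_cm = np.zeros((4, 4), dtype=np.int64)  # rows: actual, columns: predicted

for train_idx, test_idx in folds.split(X, y):
    # Train a fresh model on k-1 folds, evaluate on the held-out fold.
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[test_idx])

    fold_accuracy.append(float(np.mean(y_pred == y[test_idx])))
    # Accumulate the confusion matrix over all held-out folds.
    total_cm += confusion_matrix(y[test_idx], y_pred, labels=[0, 1, 2, 3])

print("Accuracy per fold:", np.round(fold_accuracy, 3))
print("Mean accuracy:", round(float(np.mean(fold_accuracy)), 3))
print("Confusion matrix summed over folds:")
print(total_cm)
```

Because every sample is held out exactly once, the summed matrix counts each sample once; off-diagonal entries show which classes are confused with each other.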
Fig. 6: Lung Cancer Prediction

VI. CONCLUSION
This study has demonstrated the ability of Machine Learning (ML) algorithms to accurately classify several different types of lung tissue using medical images. By achieving high accuracy with advanced approaches, the research can contribute to the ongoing fight against lung cancer and help improve patient outcomes. The Random Forest (RF) algorithm was the most effective model, classifying lung tissue types with 99.7% accuracy. In addition to accuracy, other performance measures such as precision, recall and F1 score also provide insights into the performance of classification models.