
Jurnal Informatika Universitas Pamulang (ISSN: 2541-1004, e-ISSN: 2622-4615)
Publisher: Fakultas Ilmu Komputer Universitas Pamulang
Vol. 9, No. 2, June 2024 (34-41). https://doi.org/10.32493/informatika.v9i2.39355

A Hybrid Model for Human DNA Sequence Classification Using Convolutional Neural Networks and Random Forests

Gregorius Airlangga 1*
1 Information System Study Program, Universitas Katolik Indonesia Atma Jaya, Jakarta, Indonesia
e-mail: [email protected]
*Corresponding author

Submitted Date: May 15th, 2024 Reviewed Date: June 29th, 2024
Revised Date: July 15th, 2024 Accepted Date: July 29th, 2024

Abstract

Human DNA sequence classification is a fundamental task in genomics, essential for understanding genetic variations and their implications in disease susceptibility, personalized medicine, and evolutionary biology. This study proposes a novel hybrid model combining a Convolutional Neural Network (CNN) for feature extraction and a Random Forest classifier for final classification. The model was evaluated on a dataset of human DNA sequences, achieving an accuracy of 75.34%. Performance metrics, including precision, recall, and F1-scores across multiple classes, showed significant improvements over traditional models. The CNN component effectively captures local dependencies and patterns within the sequences, while the Random Forest classifier handles complex decision boundaries, resulting in enhanced classification accuracy. Comparative analysis demonstrated the superiority of our hybrid approach, with the CNN-LSTM model achieving only 59.47% accuracy and other RNN-based models such as CNN-GRU and CNN-BiLSTM performing at similarly low levels. These results suggest that hybrid models can leverage the strengths of both deep learning and traditional machine learning techniques, offering a more effective tool for DNA sequence classification. Future work will optimize the model architecture and explore larger, more diverse datasets to validate our approach's generalizability and robustness.

Keywords: DNA classification; CNN; Random Forests; Hybrid models; Genomic data analysis

1. Introduction

Advancements in the field of genomics have significantly enhanced our understanding of the human genome, paving the way for breakthroughs in medical research, personalized medicine, and biotechnology (Satam et al., 2023; Sindelar, 2024; Wilson et al., 2022). One of the key challenges in genomics is the accurate classification of DNA sequences, which is crucial for identifying genetic disorders, understanding evolutionary relationships, and discovering new genetic markers (Laskar et al., 2021; Maharachchikumbura et al., 2021; Theodoridis et al., 2020). Traditional methods for DNA sequence classification often rely on manual feature extraction and domain-specific knowledge, which can be both time-consuming and prone to human error (Alamro et al., 2024; Landolsi et al., 2024; Papoutsoglou et al., 2023). In recent years, machine learning techniques have emerged as powerful tools for automating the analysis of genomic data, offering the potential for greater accuracy and efficiency (Li et al., 2022; Tan et al., 2021; Waring et al., 2020).

The classification of DNA sequences involves determining the class or category to which a given sequence belongs, based on its nucleotide composition (Tao et al., 2023). This task is challenging due to the vast amount of data and the complex patterns inherent in genomic sequences (Cortés-Ciriano et al., 2022). Traditional approaches, such as k-mer counting and motif analysis, have been used extensively but often require significant preprocessing and domain expertise (Nisa et al., 2021). Machine learning models, particularly deep learning architectures, offer a promising alternative by automating feature extraction and learning directly from raw sequence data (Goshisht, 2024).

This study offers a novel approach for human DNA sequence classification using a combination of deep learning and ensemble learning techniques. Specifically, we employ a Convolutional Neural Network (CNN) for automatic feature extraction from DNA sequences, followed by a Random Forest classifier to perform the final classification. The CNN is designed to capture local patterns in the DNA sequences through convolutional layers, while the Random Forest, an ensemble classifier, leverages the extracted features to make robust predictions. Ensemble classifiers like Random Forest work by combining the predictions of multiple base classifiers, typically decision trees, to enhance overall prediction performance. By aggregating the outputs of these individual trees, the Random Forest reduces the risk of overfitting and increases the model's accuracy and generalizability. This hybrid approach aims to leverage the strengths of both deep learning and traditional machine learning methods, potentially improving classification accuracy and generalizability. The urgency of developing accurate and efficient methods for DNA sequence classification cannot be overstated. With the increasing availability of genomic data, driven by advances in sequencing technologies, there is a pressing need for scalable and reliable analytical methods (Goshisht, 2024). Accurate classification of DNA sequences has far-reaching implications, including the early detection of genetic diseases, identification of therapeutic targets, and advancements in evolutionary biology (Satam et al., 2023). Moreover, the ability to automate this process can significantly reduce the time and resources required for genomic research, accelerating the pace of discovery and innovation (Liu et al., 2020).

Our literature survey reveals a diverse array of approaches for DNA sequence classification, ranging from traditional statistical methods to cutting-edge machine learning algorithms (Cheng et al., 2023). Early methods focused on alignment-based techniques, such as BLAST, which compare DNA sequences to known reference sequences (Wang et al., 2022). While effective, these methods are computationally intensive and may not scale well with large datasets (Rashed et al., 2021). Alignment-free methods, such as k-mer frequency analysis, offer an alternative by representing sequences as fixed-length vectors, enabling faster comparisons. However, these methods often require extensive feature engineering and may not capture complex patterns in the data (Narayanan et al., 2021). Recent advances in machine learning, particularly deep learning, have shown great promise in the field of genomics. Convolutional Neural Networks (CNNs) have been successfully applied to various genomic tasks, including sequence classification, motif discovery, and variant calling (Avanzo et al., 2020). CNNs are well-suited for genomic data due to their ability to capture local dependencies and hierarchical patterns (Walkowiak et al., 2020). However, training deep learning models on genomic data can be challenging due to the high dimensionality and limited availability of labeled data (Meharunnisa et al., 2024). Ensemble learning methods, such as Random Forests, provide a complementary approach by aggregating predictions from multiple models to improve accuracy and robustness (Mahmud et al., 2021).

State-of-the-art methods for DNA sequence classification often combine deep learning with traditional machine learning techniques to leverage their respective strengths (Luo et al., 2021). For instance, hybrid models that integrate CNNs with support vector machines (SVMs) or decision trees have shown improved performance over individual models (Khan et al., 2020). These approaches benefit from the feature extraction capabilities of deep learning and the interpretability and robustness of traditional classifiers (Balamurugan & Gnanamanoharan, 2023; Bian & Priyadarshi, 2024). Our proposed method builds on this paradigm by using a CNN for feature extraction and a Random Forest for classification, aiming to achieve a balance between accuracy, efficiency, and interpretability.

The objective of this study is to develop a robust and accurate method for human DNA sequence classification that can outperform traditional approaches. We aim to demonstrate that the combination of CNN and Random Forest can effectively capture complex patterns in DNA sequences and provide reliable predictions. Additionally, we seek to compare our method with other traditional models, such as k-mer frequency analysis and alignment-based techniques, to highlight the advantages and limitations of each approach. Gap analysis reveals several areas where current methods fall short. Traditional approaches often require extensive preprocessing and feature engineering, which can be both time-consuming and prone to human error. Deep learning models, while powerful, may suffer from overfitting and require large amounts of labeled data for training.
Hybrid models, which combine deep learning and traditional machine learning techniques, offer a promising solution but have not been extensively explored in the context of DNA sequence classification. Our research aims to address these gaps by developing a hybrid model that is both accurate and efficient, with minimal preprocessing requirements.

Our contributions to the field are threefold. First, we propose a novel hybrid model that combines a CNN for feature extraction with a Random Forest for classification, offering a balance between accuracy and interpretability. Second, we conduct a comprehensive comparison of our method with traditional models, demonstrating its advantages in terms of accuracy and efficiency. Third, we provide a detailed analysis of the model's performance, highlighting its ability to capture complex patterns in DNA sequences and its potential for scalability to large datasets. The remainder of this article is organized as follows. In the Methods section, we provide a detailed description of the dataset, preprocessing steps, and model architecture. The Results section presents the performance metrics of our proposed method, along with a comparison to traditional models. Finally, the Conclusion section summarizes our contributions and outlines potential directions for future research.

2. Research Methodology
2.1. Dataset
The dataset used in this study consists of human DNA sequences, each associated with a specific class label indicating its category or function. These sequences are drawn from a comprehensive genomic database, and the dataset encompasses seven distinct classes representing different functional categories. Each DNA sequence is composed of the four nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T). The sequences vary in length but have an average length of approximately 150 nucleotides. The dataset is stored in a tab-separated text file with columns representing the DNA sequences and their corresponding class labels. The dataset can be downloaded from (Vasani, 2022).

2.2. Preprocessing Steps
Preprocessing is a crucial step in preparing the dataset for model training. The steps involved in preprocessing the dataset are as follows. First, missing values are handled and the data is cleansed. The dataset is checked for missing values and inconsistencies, and any sequences with missing nucleotides or ambiguous characters (e.g., 'N' for unknown bases) are either removed or replaced based on the overall quality and importance of the data. This ensures that the input data is clean and reliable, which is essential for both the CNN and the Random Forest to learn effectively.

Furthermore, outlier detection and treatment are conducted. Outliers in the DNA sequences, which could be unusually short or long sequences or sequences with atypical nucleotide distributions, are identified. These outliers are either corrected, if possible, or removed to prevent them from skewing the model's learning process. Then, k-mer transformation is conducted: the DNA sequences are converted into k-mers of length 3. A k-mer is a substring of length $k$ from a sequence. For a DNA sequence $S = s_1, s_2, \ldots, s_n$, where $s_i$ represents the i-th nucleotide, the sequence is transformed into overlapping k-mers such that each k-mer is $(s_i, s_{i+1}, \ldots, s_{i+k-1})$. For example, for $k = 3$, the sequence AGCTCGA would be represented as AGC, GCT, CTC, TCG, CGA. This transformation helps capture local patterns in the sequences.

Next, the class labels are encoded into numerical values using a label encoder. Let the class labels be $C = \{c_1, c_2, \ldots, c_m\}$, where $c_i$ represents the i-th class. The label encoder assigns a unique integer to each class, transforming the labels into $C' = \{y_1, y_2, \ldots, y_m\}$, where $y_i$ is the encoded value of class $c_i$. The k-mers are then tokenized, converting them into sequences of integers. Let the vocabulary of k-mers be $V = \{v_1, v_2, \ldots, v_{|V|}\}$, where $v_i$ represents the i-th unique k-mer. The tokenizer maps each k-mer to a unique integer, transforming the sequence of k-mers into a sequence of integers $T = \{t_1, t_2, \ldots, t_n\}$, where $t_i$ is the integer representation of the i-th k-mer.

To ensure uniform input dimensions for the neural network, the tokenized sequences are padded to a fixed length. Let $L$ be the desired sequence length. If the length of a tokenized sequence $T$ is less than $L$, it is padded with zeros to obtain a sequence of length $L$. This results in a padded sequence $T' = \{t'_1, t'_2, \ldots, t'_L\}$, where $t'_i$ is either an integer token or zero. Finally, the dataset is split into training and testing sets. Let $X$ represent the set of padded sequences and $Y$ represent the set of encoded labels. The dataset is split into a training set $(X_{\text{train}}, Y_{\text{train}})$ and a testing set $(X_{\text{test}}, Y_{\text{test}})$ using an 80-20 split, where 80% of the data is used for training and 20% for testing. There are seven class labels: G-protein coupled receptor, tyrosine kinase, tyrosine phosphatase, synthetase, synthase, ion channel, and transcription factor.
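To make these steps concrete, the following minimal Python sketch (illustrative only, not taken from the paper) walks through the preprocessing pipeline described above: 3-mer extraction, label encoding, k-mer tokenization, zero-padding, and the 80-20 split. The file name, column names, padding length of 150, and the Keras and scikit-learn utilities are assumptions made for the example.

# Minimal preprocessing sketch (file/column names and padding length are assumed).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

def to_kmers(seq, k=3):
    # Overlapping k-mers: AGCTCGA -> AGC GCT CTC TCG CGA for k = 3.
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

df = pd.read_csv("human_dna.txt", sep="\t")            # columns 'sequence', 'class' (assumed)
df = df[~df["sequence"].str.contains("N")]             # drop sequences with ambiguous bases
kmer_sentences = [" ".join(to_kmers(s)) for s in df["sequence"]]

labels = LabelEncoder().fit_transform(df["class"])     # seven classes -> integers 0..6

tokenizer = Tokenizer()                                # maps each unique k-mer to an integer id
tokenizer.fit_on_texts(kmer_sentences)
token_seqs = tokenizer.texts_to_sequences(kmer_sentences)
X = pad_sequences(token_seqs, maxlen=150, padding="post")   # zero-pad to fixed length L = 150

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42)         # 80-20 split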
2.3. Model Architecture
As presented in Figure 1, the proposed model architecture combines a Convolutional Neural Network (CNN) for feature extraction with a Random Forest classifier for final classification. The CNN is designed to capture local patterns in the DNA sequences, while the Random Forest leverages these features for robust predictions.

Figure 1. Model's Architecture

2.3.1. Convolutional Neural Network (CNN)
The CNN consists of several layers designed to extract features from the input sequences. The architecture is as follows: firstly, an embedding layer maps the integer-encoded k-mers into dense vectors of fixed size. Let $E$ be the embedding matrix of size $|V| \times d$, where $|V|$ is the size of the k-mer vocabulary and $d$ is the embedding dimension. The embedding layer transforms the input sequence $T'$ into a sequence of dense vectors $Z = \{z_1, z_2, \ldots, z_L\}$, where $z_i \in \mathbb{R}^d$ is the embedding of the i-th k-mer. A convolutional layer applies a set of filters to the embedded sequences to capture local patterns. Let $F$ be the number of filters and $k_f$ be the filter size. Each filter $W \in \mathbb{R}^{k_f \times d}$ is convolved with the input sequence to produce a feature map. The convolution operation is defined as $h_i = f(W \cdot z_{i:i+k_f-1} + b)$, where $h_i$ is the i-th element of the feature map, $f$ is the activation function (ReLU), $\cdot$ denotes the dot product, and $b$ is the bias term.

A global max pooling layer reduces the dimensionality of the feature maps by taking the maximum value over each feature map. This operation produces a fixed-length feature vector $h = \{h_1, h_2, \ldots, h_F\}$, where $h_i$ is the maximum value in the i-th feature map. Fully connected layers further process the extracted features. Let $W_f \in \mathbb{R}^{F \times H}$ and $W_g \in \mathbb{R}^{H \times G}$ be the weight matrices of the fully connected layers, where $H$ and $G$ are the number of units in the respective layers. The output of the fully connected layers is given by $y_f = f(W_f \cdot h + b_f)$ and $y_g = f(W_g \cdot y_f + b_g)$, where $b_f$ and $b_g$ are the bias terms, and $f$ is the ReLU activation function. The output $y_g$ of the second fully connected layer is used as the feature vector for the subsequent classifier.
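The layer sequence described above (embedding, convolution, global max pooling, and two fully connected layers) can be sketched as follows. The paper does not report the embedding dimension $d$, the number of filters $F$, the filter size $k_f$, or the dense-layer sizes $H$ and $G$, so the values below, and the choice to train the extractor with a temporary softmax head, are assumptions for illustration.

# Illustrative CNN feature extractor; all layer sizes are assumed values.
from tensorflow.keras import layers, models

vocab_size = len(tokenizer.word_index) + 1        # |V| + 1 to account for the padding index 0

inputs = layers.Input(shape=(150,))               # padded k-mer ids, length L = 150 (assumed)
x = layers.Embedding(input_dim=vocab_size, output_dim=64)(inputs)    # d = 64 (assumed)
x = layers.Conv1D(filters=128, kernel_size=5, activation="relu")(x)  # F = 128, k_f = 5 (assumed)
x = layers.GlobalMaxPooling1D()(x)                # fixed-length vector h
x = layers.Dense(128, activation="relu")(x)       # y_f, H = 128 (assumed)
features = layers.Dense(64, activation="relu")(x) # y_g, G = 64 (assumed), used as features
outputs = layers.Dense(7, activation="softmax")(features)  # temporary head over the 7 classes

cnn = models.Model(inputs, outputs)
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
cnn.fit(X_train, y_train, epochs=10, batch_size=64, validation_split=0.1)

# Reuse the trained layers up to y_g as the feature extractor for the next stage.
feature_extractor = models.Model(inputs, features)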

2.3.2. Random Forest Classifier


The extracted features $y_g$ are used to train a Random Forest classifier. A Random Forest is an ensemble learning method that constructs multiple decision trees and aggregates their predictions. Let $F_i$ be the i-th decision tree in the forest, and $n$ be the total number of trees. The prediction of the Random Forest for an input feature vector $y_g$ is given by the majority vote of the individual trees, $\hat{y} = \text{mode}\{F_i(y_g) \mid i = 1, \ldots, n\}$, where $\hat{y}$ is the predicted class label. The Random Forest classifier is trained on the features extracted from the training set $(X_{\text{train}}, Y_{\text{train}})$ and evaluated on the test set $(X_{\text{test}}, Y_{\text{test}})$.
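Assuming the feature_extractor and data splits from the previous sketches, the second stage can be written as below; the number of trees is an assumed value, since the paper does not report the Random Forest hyperparameters.

# Train the Random Forest on the CNN features y_g and score it on the held-out split.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

train_feats = feature_extractor.predict(X_train)   # y_g for the training sequences
test_feats = feature_extractor.predict(X_test)

rf = RandomForestClassifier(n_estimators=200, random_state=42)   # number of trees assumed
rf.fit(train_feats, y_train)

y_pred = rf.predict(test_feats)                    # majority vote over the individual trees
print("Hybrid CNN + Random Forest accuracy:", accuracy_score(y_test, y_pred))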
2.4. Evaluation
The performance of the Random Forest classifier is evaluated using accuracy, precision, recall, and F1-score metrics. Accuracy is the ratio of correctly predicted instances to the total number of instances, $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$. Precision is the ratio of correctly predicted positive instances to the total predicted positive instances, $\text{Precision} = \frac{TP}{TP + FP}$. Furthermore, recall is the ratio of correctly predicted positive instances to the total actual positive instances, $\text{Recall} = \frac{TP}{TP + FN}$. The F1-score is the harmonic mean of precision and recall, $\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$, where $TP$, $TN$, $FP$, and $FN$ represent true positives, true negatives, false positives, and false negatives, respectively.
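Under the same assumptions as the earlier sketches, these metrics can be computed directly with scikit-learn; macro averaging corresponds to the unweighted per-class mean used in the results below.

# Accuracy, macro-averaged precision/recall/F1, and a per-class breakdown.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, classification_report)

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, average="macro"))
print("recall   :", recall_score(y_test, y_pred, average="macro"))
print("f1-score :", f1_score(y_test, y_pred, average="macro"))
print(classification_report(y_test, y_pred))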
2.5. Comparison with Traditional Models
The proposed method is compared with traditional models, including k-mer frequency analysis and alignment-based techniques. These methods involve manually extracting features from DNA sequences and using standard classifiers like Support Vector Machines (SVMs) or k-Nearest Neighbors (k-NN). In k-mer frequency analysis, k-mer counts are extracted from the DNA sequences and used as features for classification. Let $C(k)$ be the k-mer count vector for a sequence, representing the frequency of each k-mer in the sequence. These count vectors are used to train classifiers such as SVMs or k-NN.

In alignment-based techniques, DNA sequences are aligned to known reference sequences using tools like BLAST. The alignment scores are used as features for classification. Let $A(s)$ be the alignment score vector for a sequence, representing the similarity scores to reference sequences. These score vectors are used to train classifiers. The performance of the traditional models is evaluated using the same metrics as the proposed method, allowing for a comprehensive comparison.
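A minimal sketch of the k-mer frequency baseline is given below, assuming the same data frame and encoded labels as in the preprocessing sketch; the CountVectorizer configuration and SVM hyperparameters are illustrative assumptions, and a k-NN classifier could be substituted in the same way.

# k-mer frequency baseline: bag-of-3-mers count vectors C(k) fed to an SVM.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

seq_train, seq_test, y_tr, y_te = train_test_split(
    df["sequence"], labels, test_size=0.2, random_state=42)

vectorizer = CountVectorizer(analyzer="char", ngram_range=(3, 3))  # character 3-grams = 3-mers
svm = SVC(kernel="rbf", C=1.0)                                     # hyperparameters assumed
svm.fit(vectorizer.fit_transform(seq_train), y_tr)

pred = svm.predict(vectorizer.transform(seq_test))
print("k-mer frequency + SVM accuracy:", accuracy_score(y_te, pred))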
3. Results and Discussion
The proposed model, which integrates a Convolutional Neural Network (CNN) for feature extraction and a Random Forest classifier for final classification, demonstrated an overall accuracy of 0.753 on the test set. The detailed performance metrics, including precision, recall, and F1-score for each class, are presented in Table 2. The model achieved a balanced performance across different classes, with precision values ranging from 0.65 to 0.98, recall values ranging from 0.64 to 0.90, and F1-scores ranging from 0.65 to 0.88. The highest precision (0.98) was observed for class 2, indicating a strong ability to correctly identify positive instances of this class. Class 6 had the highest recall (0.90), reflecting the model's effectiveness in capturing most of the actual positive instances for this class. The macro-averaged F1-score, which considers the F1-score for each class and computes their unweighted mean, was 0.76, highlighting the model's overall balanced performance. The results indicate that the proposed model outperforms several other models in terms of accuracy.

As summarized in Table 1, the CNN-LSTM model achieved an accuracy of 0.5947, precision of 0.7628, recall of 0.4660, and F1-score of 0.5756. The CNN-GRU model had an accuracy of 0.5571, precision of 0.7607, recall of 0.4025, and F1-score of 0.5239. The CNN-BiLSTM model achieved an accuracy of 0.6110, precision of 0.7690, recall of 0.5039, and F1-score of 0.6042. The standalone CNN achieved an accuracy of 0.7486, precision of 0.8918, recall of 0.6934, and F1-score of 0.7800. The LSTM model achieved an accuracy of 0.7395, precision of 0.8667, recall of 0.6856, and F1-score of 0.7646. The GRU model had an accuracy of 0.7263, precision of 0.8908, recall of 0.6258, and F1-score of 0.7342. The BiLSTM model achieved an accuracy of 0.7397, precision of 0.8881, recall of 0.6575, and F1-score of 0.7546.

The performance comparison reveals several important insights. Firstly, the proposed hybrid model (CNN + Random Forest) exhibits superior performance compared to the CNN-LSTM, CNN-GRU, and CNN-BiLSTM models. This suggests that while combining a CNN with recurrent neural network (RNN) architectures like LSTM, GRU, or BiLSTM can capture sequential dependencies in the data, the Random Forest classifier is more effective in leveraging the features extracted by the CNN for classification purposes. The Random Forest's ability to aggregate the decisions from multiple trees contributes to its robustness and improved classification performance.

Secondly, standalone deep learning models, including CNN, LSTM, GRU, and BiLSTM, also demonstrate competitive performance. The CNN model, with an accuracy of 0.7486, performs nearly on par with the proposed hybrid model, indicating the strength of the CNN in capturing spatial patterns within the DNA sequences. The LSTM, GRU, and BiLSTM models, which are designed to handle sequential data, also achieve reasonable accuracies of 0.7395, 0.7263, and 0.7397, respectively. These models excel in capturing long-term dependencies
and temporal patterns, which are inherent in DNA sequences.

However, the hybrid approach of combining a CNN for feature extraction with a Random Forest for classification provides an optimal balance, leveraging the strengths of both deep learning and traditional machine learning techniques. The CNN efficiently extracts hierarchical features from the DNA sequences, while the Random Forest, with its ensemble of decision trees, effectively handles the classification task by reducing the risk of overfitting and improving generalization. The macro-averaged metrics (precision, recall, and F1-score) provide further insights into the model's performance across different classes. The proposed model achieved a macro-averaged precision of 0.81, recall of 0.73, and F1-score of 0.76, indicating a balanced performance across classes. This is particularly important in the context of DNA sequence classification, where it is crucial to accurately identify sequences belonging to different functional categories.

In terms of precision, the proposed model excels in classifying sequences of classes 1, 2, and 5, with precision values of 0.93, 0.98, and 0.92, respectively. These high precision values suggest that the model is effective in minimizing false positives for these classes. The high recall value of 0.90 for class 6 indicates the model's ability to correctly identify most of the true positive instances for this class, although the precision for this class is relatively lower (0.70). The balanced F1-scores across different classes, ranging from 0.65 to 0.88, reflect the model's overall robustness. The F1-score, which considers both precision and recall, is a crucial metric for evaluating classification performance, particularly when dealing with imbalanced datasets. The macro-averaged F1-score of 0.76 further supports the effectiveness of the proposed model in maintaining a balance between precision and recall across all classes.

Comparing the hybrid model's performance with standalone models, it is evident that the CNN model achieves the highest precision (0.8918) among all models, followed by GRU (0.8908), BiLSTM (0.8881), and LSTM (0.8667). These precision values highlight the capability of these models to accurately identify positive instances. However, their recall values are slightly lower, indicating potential challenges in capturing all true positive instances. This trade-off between precision and recall is common in classification tasks, and the F1-score provides a balanced measure to evaluate overall performance. The LSTM and BiLSTM models, with their ability to capture long-range (and, in the BiLSTM's case, bidirectional) dependencies, demonstrate strong performance, with F1-scores of 0.7646 and 0.7546, respectively. The GRU model, although slightly lower in performance, achieves a respectable F1-score of 0.7342. These results highlight the effectiveness of RNN-based models in handling sequential data, such as DNA sequences. The proposed hybrid model (CNN + Random Forest) outperforms several other models in terms of accuracy and balanced performance metrics. The integration of deep learning techniques for feature extraction with traditional machine learning classifiers for final classification proves to be an effective approach for DNA sequence classification. The results underscore the potential of hybrid models in leveraging the strengths of both paradigms to achieve superior predictive performance.

Table 1. Performance Results of Models

Model          Accuracy   Precision   Recall   F1-Score
CNN_LSTM       0.5947     0.7628      0.4660   0.5756
CNN_GRU        0.5571     0.7607      0.4025   0.5239
CNN_BiLSTM     0.6110     0.7690      0.5039   0.6042
CNN            0.7486     0.8918      0.6934   0.7800
LSTM           0.7395     0.8667      0.6856   0.7646
GRU            0.7263     0.8908      0.6258   0.7342
BiLSTM         0.7397     0.8881      0.6575   0.7546
Hybrid Model   0.7534     0.81        0.73     0.7699

Table 2. Per-Class Performance Results of the Proposed Hybrid Model

Class      Precision   Recall   F1-Score
0          0.84        0.72     0.77
1          0.93        0.70     0.80
2          0.98        0.79     0.88
3          0.65        0.65     0.65
4          0.67        0.64     0.66
5          0.92        0.69     0.79
6          0.70        0.90     0.79
Accuracy                        0.75
Average    0.81        0.73     0.76

4. Conclusions
In this study, we introduced a novel hybrid model for human DNA sequence classification that combines a Convolutional Neural Network (CNN) for feature extraction with a Random Forest classifier for final classification. Our model achieved a significant performance improvement, with an accuracy of 75.34%, outperforming several other models, including CNN-LSTM, CNN-GRU, and other standalone deep learning approaches. The hybrid model's superior performance in precision, recall, and F1-score across multiple classes demonstrates its effectiveness in accurately
classifying DNA sequences into their respective categories. The significance of our findings lies in the innovative integration of CNNs and Random Forests, which effectively captures local dependencies within DNA sequences while also handling complex decision boundaries. This combination allows for a more nuanced understanding and classification of genomic data, setting our approach apart from traditional models. Notably, the CNN-LSTM model, which achieved an accuracy of 59.47%, was less effective than our hybrid model, underscoring the potential of combining deep learning with traditional machine learning techniques.

Our research contributes to the existing body of knowledge by offering a scalable and efficient solution for genomic data analysis, demonstrating that hybrid models can leverage the strengths of both deep learning and traditional machine learning to improve predictive accuracy. This advancement has the potential to lead to more accurate and robust predictive models in the field of human DNA analysis, facilitating better understanding and classification of genomic sequences. Future work will focus on optimizing the model architecture, including fine-tuning hyperparameters and experimenting with different combinations of feature extraction and classification techniques. Additionally, applying the proposed model to larger and more diverse genomic datasets could provide further insights into its generalizability and robustness. Exploring other hybrid approaches, such as combining different deep learning architectures or incorporating domain-specific knowledge, could also be a promising direction for improving DNA sequence classification.

References

Alamro, H., Gojobori, T., Essack, M. & Gao, X. (2024). BioBBC: a multi-feature model that enhances the detection of biomedical entities. Scientific Reports, 14(1), 7697.

Avanzo, M., Wei, L., Stancanello, J., Vallieres, M., Rao, A., Morin, O., Mattonen, S. A. & El Naqa, I. (2020). Machine and deep learning methods for radiomics. Medical Physics, 47(5), e185–e202.

Balamurugan, T. & Gnanamanoharan, E. (2023). Brain tumor segmentation and classification using hybrid deep CNN with LuNetClassifier. Neural Computing and Applications, 35(6), 4739–4753.

Bian, K. & Priyadarshi, R. (2024). Machine learning optimization techniques: a survey, classification, challenges, and future research issues. Archives of Computational Methods in Engineering, 1–25.

Cheng, K., Guo, Q., He, Y., Lu, Y., Gu, S. & Wu, H. (2023). Exploring the potential of GPT-4 in biomedical engineering: the dawn of a new era. Annals of Biomedical Engineering, 51(8), 1645–1653.

Cortés-Ciriano, I., Gulhan, D. C., Lee, J. J.-K., Melloni, G. E. M. & Park, P. J. (2022). Computational analysis of cancer genome sequencing data. Nature Reviews Genetics, 23(5), 298–314.

Goshisht, M. K. (2024). Machine Learning and Deep Learning in Synthetic Biology: Key Architectures, Applications, and Challenges. ACS Omega, 9(9), 9921–9945.

Khan, S., Sajjad, M., Hussain, T., Ullah, A. & Imran, A. S. (2020). A review on traditional machine learning and deep learning models for WBCs classification in blood smear images. IEEE Access, 9, 10657–10673.

Landolsi, M. Y., Hlaoua, L. & Romdhane, L. Ben. (2024). Extracting and structuring information from the electronic medical text: state of the art and trendy directions. Multimedia Tools and Applications, 83(7), 21229–21280.

Laskar, P., Bhattacharya, S., Chaudhuri, A. & Kundu, A. (2021). Exploring the GRAS gene family in common bean (Phaseolus vulgaris L.): characterization, evolutionary relationships, and expression analyses in response to abiotic stresses. Planta, 254, 1–21.

Li, R., Li, L., Xu, Y. & Yang, J. (2022). Machine learning meets omics: applications and perspectives. Briefings in Bioinformatics, 23(1), bbab460.

Liu, C., Ma, Y., Zhao, J., Nussinov, R., Zhang, Y.-C., Cheng, F. & Zhang, Z.-K. (2020). Computational network biology: data, models, and applications. Physics Reports, 846, 1–66.

Luo, D., Cheng, W., Yu, W., Zong, B., Ni, J., Chen, H. & Zhang, X. (2021). Learning to drop: Robust graph neural network via topological denoising. Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 779–787.

Maharachchikumbura, S. S. N., Chen, Y., Ariyawansa, H. A., Hyde, K. D., Haelewaters, D., Perera, R. H., Samarakoon, M. C., Wanasinghe, D. N., Bustamante, D. E., Liu, J.-K. & others. (2021). Integrative approaches for species delimitation in Ascomycota. Fungal Diversity, 109(1), 155–179.

Mahmud, M., Kaiser, M. S., McGinnity, T. M. & Hussain, A. (2021). Deep learning in mining biological data. Cognitive Computation, 13(1), 1–33.

Meharunnisa, M., Sornam, M. & Ramesh, B. (2024). An optimized hybrid model for classifying bacterial genus using an integrated CNN-RF approach on 16S rDNA sequences. Journal of Scientific & Industrial Research (JSIR), 83(4), 392–404.

Narayanan, D., Shoeybi, M., Casper, J., LeGresley, P., Patwary, M., Korthikanti, V., Vainbrand, D., Kashinkunti, P., Bernauer, J., Catanzaro, B. & others. (2021). Efficient large-scale language model training on GPU clusters using Megatron-LM. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1–15.

Nisa, I., Pandey, P., Ellis, M., Oliker, L., Buluç, A. & Yelick, K. (2021). Distributed-memory k-mer counting on GPUs. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 527–536.

Papoutsoglou, G., Tarazona, S., Lopes, M. B., Klammsteiner, T., Ibrahimi, E., Eckenberger, J., Novielli, P., Tonda, A., Simeon, A., Shigdel, R. & others. (2023). Machine learning approaches in microbiome research: challenges and best practices. Frontiers in Microbiology, 14, 1261889.

Rashed, A. E. E.-D., Amer, H. M., El-Seddek, M. & Moustafa, H. E.-D. (2021). Sequence alignment using machine learning-based Needleman–Wunsch algorithm. IEEE Access, 9, 109522–109535.

Satam, H., Joshi, K., Mangrolia, U., Waghoo, S., Zaidi, G., Rawool, S., Thakare, R. P., Banday, S., Mishra, A. K., Das, G. & others. (2023). Next-generation sequencing technology: current trends and advancements. Biology, 12(7), 997.

Sindelar, R. D. (2024). Genomics, other "OMIC" technologies, precision medicine, and additional biotechnology-related techniques. In Pharmaceutical Biotechnology: Fundamentals and Applications (pp. 209–254). Springer.

Tan, X., Su, A. T., Hajiabadi, H., Tran, M. & Nguyen, Q. (2021). Applying machine learning for integration of multi-modal genomics data and imaging data to quantify heterogeneity in tumour tissues. Artificial Neural Networks, 209–228.

Tao, J., Bauer, D. E. & Chiarle, R. (2023). Assessing and advancing the safety of CRISPR-Cas tools: from DNA to RNA editing. Nature Communications, 14(1), 212.

Theodoridis, S., Fordham, D. A., Brown, S. C., Li, S., Rahbek, C. & Nogues-Bravo, D. (2020). Evolutionary history and past climate change shape the distribution of genetic diversity in terrestrial mammals. Nature Communications, 11(1), 2557.

Vasani, N. (2022). Human DNA Data. https://www.kaggle.com/datasets/neelvasani/humandnadata

Walkowiak, S., Gao, L., Monat, C., Haberer, G., Kassa, M. T., Brinton, J., Ramirez-Gonzalez, R. H., Kolodziej, M. C., Delorean, E., Thambugala, D. & others. (2020). Multiple wheat genomes reveal global variation in modern breeding. Nature, 588(7837), 277–283.

Wang, Z., Jiang, Y., Liu, Z., Tang, X. & Li, H. (2022). Machine learning and ensemble learning for transcriptome data: principles and advances. 2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), 676–683.

Waring, J., Lindvall, C. & Umeton, R. (2020). Automated machine learning: Review of the state-of-the-art and opportunities for healthcare. Artificial Intelligence in Medicine, 104, 101822.

Wilson, S., Steele, S. & Adeli, K. (2022). Innovative technological advancements in laboratory medicine: Predicting the lab of the future. Biotechnology & Biotechnological Equipment, 36(sup1), S9–S21.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License. Copyright © 2024 Gregorius Airlangga.
