Classification
• In recent years, machine learning (ML) has emerged as a transformative tool in the field of antenna design and materials, particularly in the
detection and classification of cracks. Traditional methods of crack detection, which often rely on visual inspections or basic statistical techniques, have
limitations in terms of scalability, accuracy, and the ability to process complex, high-dimensional data. These traditional methods struggle to handle the
variability and noise inherent in real-world sensor data. Machine learning, on the other hand, offers powerful algorithms that can automatically learn
from large datasets, identify intricate patterns, and make highly accurate predictions.
• ML-based approaches leverage features extracted from sensor data, such as frequency and amplitude, to enhance the detection and classification of
cracks in materials. These models can be trained to differentiate between various crack lengths or types, offering higher precision and reliability than
conventional methods. Additionally, machine learning models can effectively manage noisy and unstructured data, making them suitable for diverse
real-world scenarios. By applying advanced techniques like neural networks, random forests, and gradient boosting, these models provide a robust
framework for predicting crack lengths, assessing structural integrity, and ensuring safety in engineering applications.
• The integration of machine learning into crack detection systems not only improves the accuracy of predictions but also facilitates proactive
maintenance and decision-making, helping to reduce downtime and prevent catastrophic failures. With the ability to process and analyze large
volumes of data, machine learning represents a key advancement in creating smarter, more efficient monitoring systems for the maintenance of critical
infrastructure.
• The four techniques applied to crack assessment in this study are as follows:
1) Artificial Neural Network
In this study, we employed a three-layer Artificial Neural Network (ANN) for classification, with the goal of predicting the crack length classes based on
resonant frequency and amplitude measurements. The ANN model architecture was designed with the following layers:
Input Layer and First Hidden Layer: The input layer accepts the two features in the training data (resonant frequency and amplitude), which feed into a
first hidden layer of 128 neurons. The ReLU (Rectified Linear Unit) activation function was used for this layer to introduce non-linearity and allow the
model to learn complex patterns in the data. ReLU is widely used because of its simplicity and effectiveness in mitigating issues such as vanishing
gradients during training.
Second Hidden Layer: The second hidden layer consists of 64 neurons, with the ReLU activation function again used. This layer further processes the
information from the first hidden layer and enhances the model's ability to extract high-level features from the data.
Output Layer: The output layer contains 4 neurons, corresponding to the four classes (No Crack, 1 Crack, 2 Cracks, 3 Cracks) in the classification task. The
activation function used in the output layer is softmax, which is ideal for multi-class classification. Softmax converts the raw output of the network into a
probability distribution over the classes, with the highest probability corresponding to the predicted class.
Model Compilation: The model was compiled with the following settings:
Loss Function: The categorical crossentropy loss function was chosen, as this is the standard for multi-class classification problems. It measures the
difference between the predicted probability distribution (from the softmax layer) and the true class labels.
Optimizer: The Adam optimizer was used due to its adaptive learning rate capabilities, which help in efficiently minimizing the loss function. Adam is
popular for its fast convergence and robustness in training neural networks.
The ANN model was trained on the resampled and normalized dataset, which was split into training and testing sets. This architecture, along with the
chosen activation functions, optimizer, and loss function, allowed the model to learn complex patterns from the input features and perform multi-class
classification effectively.
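For concreteness, the following is a minimal Keras sketch of the architecture described above; the layer sizes, activations, loss, and optimizer mirror the text, while the variable names are illustrative:

```python
from tensorflow.keras import layers, models

# Input: two features (resonant frequency, amplitude)
model = models.Sequential([
    layers.Input(shape=(2,)),
    layers.Dense(128, activation="relu"),   # first hidden layer
    layers.Dense(64, activation="relu"),    # second hidden layer
    layers.Dense(4, activation="softmax"),  # No Crack, 1 Crack, 2 Cracks, 3 Cracks
])

# Categorical crossentropy + Adam, as described in the text
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```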
2) Random Forest
The Random Forest classifier is an ensemble learning technique that is highly effective for multi-class classification tasks, such as crack detection in
materials or antenna-based designs. Random Forest works by constructing a collection of decision trees during training and outputting the mode (for
classification) of the classes or the mean prediction (for regression) of the individual trees. The key advantages of using a Random Forest classifier are its
robustness, ability to handle a large number of features, and resilience to overfitting.
Each tree in the Random Forest is built by randomly selecting a subset of features and training on a random subset of the training data. This randomness
helps create a diverse set of trees, and by aggregating their predictions, Random Forest can provide more accurate and stable results compared to
individual decision trees. For multi-class classification, Random Forest uses a majority voting mechanism to predict the class label, improving the overall
prediction accuracy.
Random Forest also copes well with highly imbalanced data, as it can learn robust patterns even when the class distribution is skewed.
Although Random Forest is a complex model, feature importance can be extracted from the trained model, providing valuable insights into which features
(e.g., frequency or amplitude) most influence crack classification.
Random Forest can handle both numerical and categorical data, making it suitable for a wide range of features, including sensor data.
By using an ensemble of trees, Random Forest often outperforms individual models, especially on tasks with complex data and non-linear relationships,
like crack detection in materials.
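A minimal scikit-learn sketch of such a classifier on the two-feature data is shown below; X_train, y_train, and X_test are illustrative names for the preprocessed arrays:

```python
from sklearn.ensemble import RandomForestClassifier

# X_train, y_train, X_test: preprocessed arrays (see the experiment
# section); each row holds [resonant frequency, amplitude]
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)  # majority vote across the trees

# Which feature drives the classification more?
print(dict(zip(["frequency", "amplitude"], rf.feature_importances_)))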
3) Decision Tree
In our multi-class classification, the Decision Tree builds a tree where each leaf node corresponds to a predicted class label. The decision-making process
starts at the root of the tree, where a feature is chosen that best splits the data according to some criterion. This process continues recursively through the
tree, with each subsequent node corresponding to a further division of the dataset. The tree stops growing when a stopping criterion is met, such as a
maximum depth or when the data in the node is pure (i.e., all the samples in the node belong to the same class).
A Decision Tree predicts the class of a given instance by assigning it the class label that is most frequent among the samples in the leaf node. The tree can be
trained using algorithms like CART (Classification and Regression Trees) or ID3 (Iterative Dichotomiser 3), which build the tree based on splitting criteria.
Advantages of Decision Tree for Multi-Class Classification
Interpretability: One of the major advantages of Decision Trees is their ease of interpretation. The model can be visualized as a tree, where each path from
the root to a leaf node represents a series of decisions based on feature values. This makes it easy to understand the reasoning behind each prediction,
which is valuable for decision-making, especially in fields like materials analysis or crack detection.
Non-Linear Relationships: Decision Trees are capable of modeling non-linear relationships between features and the target variable. In our case, for crack
detection or antenna design, the relationship between the features (like frequency and amplitude) and the class labels (such as crack types) can be
complex and non-linear. Decision Trees handle such complexities naturally.
Handling Both Categorical and Numerical Data: Decision Trees can handle both categorical and continuous features, making them versatile for datasets
with mixed types of data, such as sensor readings or material properties.
Feature Importance: Decision Trees provide an inherent method to calculate the importance of each feature in making predictions. This can help in
understanding which features (like frequency or amplitude in crack detection) are most important in determining the class label.
Handling Missing Values: Decision Trees can handle missing values by splitting data based on available features. This is useful when dealing with real-world
datasets that may have incomplete or noisy entries.
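As an illustration, a scikit-learn sketch of such a tree with a depth-based stopping criterion; the depth value and feature names are assumptions, and X_train, y_train are the illustrative arrays from before:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# max_depth acts as the stopping criterion mentioned above
dt = DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=42)
dt.fit(X_train, y_train)

# The fitted tree reads as a set of if/else rules on the two features,
# which is what makes the model easy to interpret
print(export_text(dt, feature_names=["frequency", "amplitude"]))
```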
4) Gradient Boosting
Gradient Boosting is an ensemble machine learning technique that combines the predictions of multiple weak learners (typically
decision trees) to create a powerful predictive model. It builds the model iteratively by fitting new trees to the residual errors
made by the previous trees. This boosting process helps in reducing the bias of the model, making it highly effective for both
binary and multi-class classification tasks.
Initialization: The first model is usually a simple decision tree, which makes a prediction for each instance. This model is fit to
minimize the initial residuals (differences between the observed and predicted values).
Residual Calculation: After the first model is trained, the residuals (errors) are calculated by subtracting the predicted values from
the true target values.
Fitting the Next Model: A new decision tree is then trained on these residuals. This tree focuses on learning the errors made by
the previous tree and tries to improve the predictions.
Update Model Predictions: The predictions of all previous models are combined to form the final prediction. Each new tree added
is weighted by a learning rate, which controls the contribution of the new model to the final prediction.
Repeat: This process continues iteratively, with each new model correcting the errors of the previous one. The learning rate and
the number of trees are hyperparameters that influence the performance of the model.
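A minimal scikit-learn sketch of this boosting procedure; the hyperparameter values are illustrative, not the ones used in the study:

```python
from sklearn.ensemble import GradientBoostingClassifier

# n_estimators = number of boosting iterations; learning_rate is the
# shrinkage factor weighting each new tree's contribution
gb = GradientBoostingClassifier(n_estimators=200,
                                learning_rate=0.1,
                                max_depth=3,
                                random_state=42)
gb.fit(X_train, y_train)
y_pred = gb.predict(X_test)
```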
Advantages of Gradient Boosting for Multi-Class Classification
High Accuracy: Gradient Boosting is known for its high predictive performance, often outperforming other machine learning
algorithms. This makes it particularly useful when dealing with complex datasets like those involving crack detection or
materials design, where intricate patterns need to be captured.
Handling Complex Data: Gradient Boosting is capable of modeling complex relationships between features and target labels.
It is especially effective in scenarios with non-linear relationships, where other algorithms may struggle to identify patterns.
Feature Importance: Like Decision Trees, Gradient Boosting can evaluate the importance of each feature in the prediction
process. This helps identify which features (e.g., frequency, amplitude) are most influential in determining the class label,
aiding in feature selection and model interpretability.
Robust to Overfitting: While individual decision trees are prone to overfitting, Gradient Boosting mitigates this risk by adding
trees sequentially, focusing on the hardest-to-predict instances. Additionally, regularization techniques like early stopping and
shrinkage (learning rate) can further prevent overfitting.
Versatility: Gradient Boosting can be applied to a wide range of problems, including both classification and regression tasks.
In multi-class classification, it can handle multiple categories without the need for additional transformation of the problem.
Experiment: Predictive Analysis of Crack Erosion
We demonstrate a machine learning-based analysis using four algorithms: Artificial Neural
Network (ANN), Random Forest, Decision Tree, and Gradient Boosting. The
process involved critical steps such as data preprocessing, model training, and evaluation. In our
experiments, we applied the Synthetic Minority Oversampling Technique (SMOTE) to address
the issue of class imbalance in the dataset. Initially, the dataset contained 63 measurements of
resonant frequency and resonant amplitude, each corresponding to a specific crack length, with
an uneven distribution across the classes. This imbalance could lead to biased model
performance, where the classifier may be more inclined to predict the majority class, resulting in
inaccurate evaluations. To mitigate this, SMOTE was employed to generate synthetic samples for
the underrepresented classes, expanding the dataset to 500 samples per class. By doing so, we
ensured that the model was exposed to a more balanced dataset, allowing it to learn from all
classes equally and ultimately improving the generalization ability of the model. This
preprocessing step, along with normalization of the data on a scale of 0 to 1, was essential in
preparing the data for training, ensuring the model's robustness across different class
distributions. The dataset was then split into training and testing sets with an 80:20 ratio.
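A sketch of this preprocessing pipeline, assuming imbalanced-learn and scikit-learn; X holds the 63 frequency/amplitude pairs, y the crack-length class labels, and the class encoding 0-3 is an assumption:

```python
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# X: (63, 2) array of [resonant frequency, amplitude]; y: crack classes 0-3
smote = SMOTE(sampling_strategy={0: 500, 1: 500, 2: 500, 3: 500},
              random_state=42)
X_res, y_res = smote.fit_resample(X, y)

# Normalize both features to the [0, 1] range
X_res = MinMaxScaler().fit_transform(X_res)

# 80:20 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X_res, y_res, test_size=0.2, stratify=y_res, random_state=42)
```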
In this work, TensorFlow and Keras, powerful Python libraries, were
utilized for the development and training of neural network models,
enabling efficient and accurate multi-class classification. The performance
of these models was rigorously evaluated using key metrics, including
Precision, Recall, F1-Score, Accuracy, and AUC-ROC, all of which are
presented in the table below. The integration of these machine learning
algorithms significantly enhanced the predictive capabilities of the crack
detection system. By leveraging a variety of algorithms alongside robust
preprocessing techniques, the overall accuracy and reliability of the
system were notably improved, demonstrating the transformative
potential of machine learning in advanced crack detection and structural
integrity analysis.
Details for Evaluation Metrics
In the context of multi-class classification for crack detection, several evaluation metrics
were used to assess the performance of the models. These metrics provide valuable
insights into how well the model is performing in terms of both classification accuracy
and error rates, especially in imbalanced datasets.
1) Precision: Precision is a measure of how many of the predicted positive instances are
actually positive. In multi-class classification, precision is calculated for each class and
refers to the proportion of true positive predictions for that class out of all predictions
made for that class. It is especially important when the cost of false positives is high.
Formula: $\text{Precision} = \frac{TP}{TP + FP}$
2) Recall: Recall (also known as Sensitivity or True Positive Rate)
measures how many of the actual positive instances were correctly
identified by the model. In crack detection, recall is critical because it
emphasizes the model’s ability to detect cracks, even at the cost of
some false positives. A high recall ensures that the system is less likely
to miss any cracks.
Formula: $\text{Recall} = \frac{TP}{TP + FN}$
3) F1-Score: The F1-score is the harmonic mean of precision and recall,
providing a balance between the two. It is particularly useful when the
class distribution is imbalanced, as it considers both false positives and
false negatives. The F1-score is a good overall metric to evaluate the
model when the goal is to optimize both precision and recall
simultaneously, which is often the case in applications like crack detection
where both false positives and false negatives should be minimized.
Formula: $F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
4) AUC-ROC: The AUC-ROC (Area Under the Receiver Operating Characteristic Curve) is a performance measurement for
classification problems at various threshold settings. The ROC curve plots the true positive rate (Recall) against the false positive
rate (1 - Specificity). The AUC represents the area under this curve, with values closer to 1 indicating better model performance.
AUC ranges from 0 to 1, where 1 represents a perfect model and 0.5 indicates a model that performs no better than random
chance.
Formula: $\text{AUC} = \int_{0}^{1} \text{TPR}(\text{FPR}) \, d(\text{FPR})$
These metrics together provide a comprehensive view of the model’s effectiveness in predicting the correct crack class and help in
determining the trade-off between false positives and false negatives.
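As an illustration, these metrics can be computed with scikit-learn; y_test and y_pred follow from the earlier sketches, and y_proba is an illustrative name for the per-class probability matrix (e.g. from model.predict):

```python
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             accuracy_score, roc_auc_score)

# Macro averaging weighs the four crack classes equally
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")
accuracy = accuracy_score(y_test, y_pred)

# Multi-class AUC-ROC: one-vs-rest over per-class probabilities
auc = roc_auc_score(y_test, y_proba, multi_class="ovr")
```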
Details [To link with the training loss curves]
During training, the objective is to maximize the AUC, accuracy, and F1 scores, each with a
maximum possible value of 1. The model was trained for 200 epochs, achieving high
training accuracy and low training loss.
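A sketch of the training call, assuming the Keras model defined earlier; y_train_oh is an illustrative name for the one-hot encoded labels, and the returned history object holds the per-epoch values used for the loss curves:

```python
from tensorflow.keras.utils import to_categorical

# One-hot encode the integer class labels for categorical crossentropy
y_train_oh = to_categorical(y_train, num_classes=4)

history = model.fit(X_train, y_train_oh,
                    epochs=200,
                    validation_split=0.1,
                    verbose=0)

# history.history["loss"] / ["accuracy"] hold the per-epoch curves
```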
Similarly, simulated ambient noise was introduced only to the resonant frequency values. Figure 1 illustrates the
antenna responses with added noise, where the noise threshold was set to a 10% variation in frequency. This process
was repeated for various noise thresholds of 1%, 5%, and 10%, with the corresponding confusion matrices shown in
Figures 4(a-c). The results again demonstrated the model's robustness, with stable classification accuracies of 85%,
83%, and 81% for noise levels of 1%, 5%, and 10%, respectively. However, similar to the amplitude-based tests, when
the noise threshold increased beyond 10%, the error rate rose, indicating that the model's performance became less
stable with higher noise levels.
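A sketch of how such percentage-based noise injection might be implemented; the uniform-noise model is an assumption, since the source does not specify the noise distribution:

```python
import numpy as np

rng = np.random.default_rng(42)

def add_frequency_noise(freqs, threshold):
    """Perturb each frequency by up to +/- threshold (e.g. 0.10 = 10%)."""
    noise = rng.uniform(-threshold, threshold, size=freqs.shape)
    return freqs * (1.0 + noise)

# Re-evaluate the classifier at the 1%, 5%, and 10% noise thresholds
for level in (0.01, 0.05, 0.10):
    X_noisy = X_test.copy()
    X_noisy[:, 0] = add_frequency_noise(X_noisy[:, 0], level)  # column 0: frequency
    # ...recompute predictions and the confusion matrix on X_noisy
```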
Fill:

Evaluation Metrics | Artificial Neural Network (ANN) | Random Forest (RF) | Decision Tree (DT) | Gradient Boosting (GB)
-------------------|---------------------------------|--------------------|--------------------|-----------------------
Precision          |                                 |                    |                    |
Recall             |                                 |                    |                    |
Accuracy           |                                 |                    |                    |
F1-Score           |                                 |                    |                    |
AUC-ROC            |                                 |                    |                    |