
This article has been accepted for publication in IEEE Access. This is the author's version, which has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3479882

Fuzzy Convolution Neural Networks for Tabular Data Classification
Arun D. Kulkarni
Computer Science Department, University of Texas at Tyler, Tyler, TX 75799, USA

Corresponding author: Arun D. Kulkarni (e-mail: [email protected]).

ABSTRACT Recently, convolution neural networks (CNNs) have attracted a great deal of attention due to their remarkable performance in various domains, particularly in image and text classification tasks. However, their application to tabular data classification remains underexplored. There are many fields, such as bioinformatics, finance, and medicine, where non-image data are prevalent. Adaptation of CNNs to classify non-image data remains highly challenging. This paper investigates the efficacy of CNNs for tabular data classification, aiming to bridge the gap between traditional machine learning approaches and deep learning techniques. We propose a novel framework, the fuzzy convolution neural network (FCNN), tailored specifically for tabular data to capture local patterns within feature vectors. In our approach, we map feature values to fuzzy memberships. The fuzzy membership vectors are converted into images that are used to train a CNN model. The trained CNN model is then used to classify unknown feature vectors. To validate our approach, we generated six complex noisy data sets. We used seventy percent of the samples from each data set, selected at random, for training and thirty percent for testing. The data sets were also classified using state-of-the-art machine learning algorithms such as the decision tree (DT), support vector machine (SVM), fuzzy neural network (FNN), Bayes' classifier, and Random Forest (RF). Experimental results demonstrate that our proposed model can effectively learn meaningful representations from tabular data, achieving competitive or superior performance compared to existing methods. Overall, our findings suggest that the proposed FCNN model holds promise as a viable alternative for tabular data classification tasks, offering a fresh perspective and potentially unlocking new opportunities for leveraging deep learning in structured data analysis.

INDEX TERMS Fuzzy Logic, Convolution Neural Networks, Deep Learning, Tabular data, Machine
Learning, Classification

I. INTRODUCTION

In the era of data-driven decision-making, the ability to accurately classify and analyze tabular data plays a crucial role across various domains, including finance, healthcare, marketing, and beyond. Traditionally, this task has been approached using machine learning algorithms such as decision trees, support vector machines, random forests, and artificial neural network models, which rely on handcrafted features and explicit rule-based representations. However, with the advent of deep learning, particularly Convolutional Neural Networks (CNNs), there has been a change in thinking about how complex patterns and relationships in data can be learned directly from raw inputs. CNN models are conventionally used for image classification due to their high performance, the availability of various architectures, and the availability of graphical processing units (GPUs). CNN models excel in achieving high accuracy for image data classification. While CNNs have demonstrated remarkable success in tasks like image and text classification, their application to tabular data classification has received comparatively less attention. Tabular data typically consist of rows and columns, where each column represents a feature. CNNs are designed for processing grid-like data to capture spatial dependencies in data like images, where relationships exist both horizontally and vertically. However, tabular data do not possess the same grid-like structure as images, and the relationships between features are not spatial in nature. CNNs offer several advantages over traditional machine learning techniques. Firstly, they provide flexibility and support iterative learning. Secondly, deep networks enable the generation of tabular data, which can help alleviate class imbalance issues. Thirdly, neural networks can be employed for multimodal learning problems, where tabular data serve as one of many input modalities [1].

CNNs exploit the spatial locality of features in data, which is not applicable to tabular data. In tabular data, the relationships between features are not spatially organized but rather depend on their interdependencies. CNNs excel at capturing local patterns in data due to their convolutional kernels, which slide across the input. Convolutional kernels are excellent feature extractors that exploit two properties of the input images: local connectivity and spatial locality. Local connectivity means that each kernel is connected to a small region of the input image when performing the convolution. The spatial locality property means that the pixels where the convolutional kernel is applied are highly correlated, and usually processing them jointly makes it possible to extract meaningful feature representations. For example, a single convolutional kernel can learn to extract edges, textures, shapes, and gradients. While this is effective for tasks like image classification, where local features matter, tabular data often require capturing both local and global patterns to make accurate predictions. Fully connected neural networks or tree-based models can better capture these global patterns. In tabular data, where the number of features can be relatively small compared to other domains like images or text, the efficiency of CNNs might not be fully utilized. CNNs require a large amount of data to effectively learn the parameters of the convolutional filters. Tabular datasets are often smaller than image datasets, making it challenging for CNNs to generalize well. CNNs are also known for their black-box nature, making it challenging to interpret how they make predictions, especially in the context of tabular data, where interpretability is often crucial for understanding model decisions and gaining insights from the data. Despite these challenges, CNNs offer the potential to automatically learn hierarchical representations of tabular data, capturing both local and global patterns within feature vectors.

There have been attempts to adapt CNNs for tabular data. One approach to classifying tabular data is to transform the tabular data into images. CNNs require fixed-size input tensors, typically with three dimensions (width, height, channels). Tabular data, on the other hand, can have varying numbers of features, and the order of features may not have any significance. The effectiveness of CNNs on image processing tasks stems from their ability to exploit the spatial structure of data, capturing spatially local input patterns. In tabular data, the relationships between features are often more complex and might not be easily captured by 1-D convolutions alone. We cannot feed a tabular dataset directly to a convolutional layer because tabular features are not spatially correlated [2]. Most tabular data do not assume a spatial relationship between features and are thus unsuitable for modeling using CNNs. CNNs are designed to automatically learn hierarchical representations of features in data. In tabular data, the importance of features and their relationships may not follow a hierarchical structure, making such data less suitable for CNNs. CNN models also encounter challenges such as the vanishing gradient problem, which is mitigated by employing the cross-entropy loss function together with rectified linear units (ReLU). Another issue is overfitting, which is especially prevalent on small datasets. Both AlexNet and ResNet-50 employ several techniques to mitigate overfitting: AlexNet uses data augmentation, dropout, and weight decay, while ResNet-50 uses data augmentation, batch normalization, global average pooling, and weight decay. Tabular data, characterized by structured rows and columns, present unique challenges such as dealing with heterogeneous feature types and capturing interactions between features.

This paper aims to explore the efficacy of CNNs for tabular data classification, filling a crucial gap in the literature and advancing the understanding of deep learning methods in structured data analysis. We propose a novel Fuzzy Convolution Neural Network (FCNN) architecture specifically tailored for tabular data. We introduce a method for mapping a feature vector onto an image. This mapping involves assigning feature values to their corresponding fuzzy membership values. We employed fuzzy membership functions representing five term sets: very_low, low, medium, high, and very_high. We map each fuzzified feature vector into an image. In our earlier research work, we converted feature vectors into images by mapping features and their ratios to rectangular shapes in the image canvas [3]. In this research work, we assign features to their fuzzy membership values and represent the fuzzy membership values by square shapes within the mapped image.

Through extensive experimentation on six complex datasets, we evaluate the performance of our proposed FCNN model against traditional machine learning algorithms and the fuzzy neural network model. The contributions of this paper are: a) we introduce a novel FCNN architecture designed for tabular data classification, addressing the unique challenges associated with structured data; and b) we conducted comprehensive experiments to demonstrate the effectiveness of FCNNs in comparison to conventional machine learning approaches for tabular data classification tasks, showing that the proposed FCNN model performs equal or superior to state-of-the-art methods. The paper's structure is as follows: Section II delves into related work, Section III presents the framework for the FCNN model, Section IV covers implementation and results, and Section V presents conclusions and future work.

II. RELATED WORK

Various machine learning algorithms are employed for the classification of tabular data. These include the minimum distance classifier, the maximum likelihood classifier (MLC), and non-parametric techniques such as the support vector machine (SVM), decision tree (DT), ensembles of decision trees, multi-layer perceptron models, fuzzy inference systems, and fuzzy neural networks.


The maximum likelihood classification algorithm assumes a normal distribution for feature values, computing the mean vector and covariance matrix for each class using training set data. By applying Bayes' rule, the classifier calculates posterior probabilities and assigns the sample to the class with the highest posterior probability [4].

Decision tree (DT) classifiers are non-parametric classifiers that do not require any a priori statistical assumptions regarding the distribution of data. The structure of a decision tree consists of a root node, non-terminal nodes, and terminal nodes. The data are recursively divided down the decision tree according to the defined classifier framework. One of the most popular algorithms for constructing a decision tree is the ID3 algorithm. The ID3 induction tree algorithm has proven to be effective when working with large datasets that have several features, which are inefficient for human experts to process. C4.5 is a supervised learning algorithm that is a descendant of the ID3 algorithm. C4.5 allows the usage of both continuous and discrete attributes. The main problem with decision trees is overfitting [5, 6]. Random Forest (RF) is based on tree classifiers. It implements several classification trees. The input vector is classified with each tree in the forest. Each tree provides a classification, or "votes" for that class. The RF then selects the classification with the most votes among all the trees. The main advantages of Random Forest are unparalleled accuracy among current algorithms, efficient implementation on large datasets, and an easily saved structure for future use of pre-generated trees. In an ensemble, the results of trained classifiers are combined through a voting process. The most widely used ensemble methods are boosting and bagging [7].

Support vector machines (SVMs) are supervised non-parametric statistical learning methods. SVMs aim to find a hyperplane that separates training samples into a predefined number of classes. Vapnik [8] proposed the SVM algorithm. Operating as a binary classifier, SVM assigns a sample to one of two linearly separable classes. In this algorithm, two hyperplanes are chosen to not only maximize the distance between the two classes but also to exclude any points between them. Nonlinearly separable classes are accommodated by extending the SVM algorithm to map samples into a higher-dimensional feature space. The SVM algorithm is especially well-suited for tabular data due to its adeptness in handling small datasets, frequently yielding higher classification accuracy compared to traditional methods.

Neural networks are favored for classification due to their parallel processing capabilities, as well as their learning and decision-making prowess. Several studies have aimed to evaluate neural networks' performance compared to traditional statistical methods for tabular data. Neural networks equipped with learning algorithms like backpropagation (BP) can extract insights from training samples and are utilized in tabular data analysis [9]. With advancements in hardware and algorithms, neural networks (NNs) have evolved into deep neural networks (DNNs) and convolutional neural networks (CNNs). CNNs stand out as highly effective learning algorithms for understanding image content and have displayed remarkable performance in various computer vision tasks. CNN models employ multiple layers of nonlinear information processing units. The machine learning community's interest in CNNs surged after the ImageNet competition in 2012, where AlexNet achieved record-breaking results in classifying images from a dataset containing over 1.2 million images spanning one thousand classes [10]. AlexNet was built upon principles utilized in LeNet. Deep Convolutional Neural Networks (DCNNs) have heralded breakthroughs in processing images, videos, speech, and audio [11]. CNN models consist of convolution and pooling layers followed by one or more fully connected layers. They operate as feed-forward networks. In convolution layers, inputs undergo convolution with a weighted kernel, and the output is then passed through a nonlinear activation function to the subsequent layer. The primary aim of the pooling layer is to reduce spatial resolution. Rawat and Wang [12] offer a comprehensive survey of CNNs. Zhang et al. [13] present a taxonomy of CNN models. CNNs have the capability to learn internal representations directly from raw pixels and are hierarchical learning models capable of feature extraction [14]. Khan et al. [15], in their review article, categorized DCNN architectures into seven groups. Deep learning enables computational models composed of multiple processing layers to learn representations of data with various levels of abstraction. Recent advancements in CNN models have been facilitated by the accessibility of fast graphical processing units (GPUs) and the availability of extensive datasets.

The primary advantage of CNNs is their capacity to learn from input data and make decisions. However, due to the substantial number of parameters, they can be challenging to interpret and are often regarded as black boxes, since they do not transparently explain how outcomes are achieved. Fuzzy Logic (FL) systems, on the other hand, excel at explaining their decisions but struggle with learning from input data. Combining FL and CNN can mitigate the drawbacks of each approach to create a more robust and flexible computational system. Talpur et al. [16], in their survey article, detail methods for integrating FL and DNNs to create hybrid systems. One approach involves a sequential structure, where fuzzy systems and DNNs operate sequentially. In this structure, there are two possibilities: a) converting input data into fuzzy sets, followed by processing the fuzzified data with the DNN, and b) the DNN model aiding the fuzzy system in determining desired parameters. Another method to combine FL and CNN is to utilize the CNN for feature extraction, transforming the output of the final convolution layer for fuzzy classification. Sarabakha et al. [17] propose a DFNN structure where input features are fed to the fuzzification layer, and the fuzzified vector serves as input to fully connected hidden layers.


Fuzzy Deep Neural Networks (FDNNs) have been employed in many practical applications. FDNNs represent a compelling constructive collaboration between fuzzy logic and neural networks, offering a powerful tool for managing uncertainty and intricate relationships in real-world applications. Das et al. [18] provide a survey of FDNN systems.

Deep Convolutional Neural Network (DCNN) models have demonstrated remarkable performance and have consequently been widely adopted for computer vision tasks. However, adapting them to tabular data remains highly challenging. Sun et al. [19] introduced a method called SuperTML to convert tabular data into images. The algorithm adopts the concept of the Super Characters method for addressing machine learning tasks with tabular data. Initially, the input tabular features are projected onto a two-dimensional embedding and then fed into fine-tuned two-dimensional CNN models for classification. They validated the algorithm using four datasets, and experimental results demonstrate that the SuperTML method achieves state-of-the-art results on both large and small tabular datasets. The main difference between SuperTML and the FCNN method is that SuperTML builds on the success of the Super Characters method in text classification, whereas the FCNN method maps fuzzy membership values into rectangular shapes; the FCNN method is based on shape recognition. Sharma et al. [20] developed a method called DeepInsight to convert non-image data into images suitable for CNNs. Their approach constructs the image by grouping similar features together and positioning dissimilar ones farther apart, facilitating the collective utilization of neighboring elements. They evaluated their algorithm using four distinct datasets and compared the results against state-of-the-art classifiers such as decision trees, AdaBoost, and Random Forest. Their model exhibited superior classification accuracy across all datasets. In the DeepInsight method, feature vectors are transformed into feature matrices that are represented by pixels in the mapped image. The method is more suitable for large datasets. Zhu et al. [21] proposed a method named Image Generator for Tabular Data (IGTD) to convert tabular data into images by assigning features to pixel positions such that similar features are placed close to each other. The algorithm assigns each feature to a pixel in the image, generating an image for each data sample where the pixel intensity corresponds to the value of the respective feature in the sample. The algorithm seeks to optimize the assignment of features to pixels by minimizing the difference between the ranking of pairwise distances between features and the ranking of pairwise distances between assigned pixels. They applied the algorithm to two datasets. Their results demonstrate that CNNs trained on IGTD images yield the highest average prediction performance in cross-validation on both datasets. Du et al. [22] have proposed a neural network architecture, TabularNet, to simultaneously extract spatial and relational information from tables. The spatial encoder of TabularNet utilizes row/column-level pooling and a Bidirectional Gated Recurrent Unit (Bi-GRU) to capture statistical information and local positional correlation, respectively. Their experiments show that TabularNet significantly outperforms state-of-the-art ML algorithms. Arik and Pfister [23] propose a high-performance and interpretable deep tabular data learning architecture called TabNet that uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and more efficient learning. Besides robust performance, TabNet provides explainable insights into its reasoning, both locally and globally. Borisov et al. [1] provide an overview of deep learning methods tailored for tabular data, categorizing them into three groups: a) data transformations, b) specialized architectures, and c) regularization models. In this study, we focus on the first category: data transformations. Iqbal et al. [24] introduced a novel feature embedding technique, the Dynamic Weighted Tabular Method (DWTM), which dynamically uses feature weights based on the strength of their correlations to the class labels when applying any CNN architecture to tabular data. In their approach, each feature in the observation vector is assigned space in the image canvas based on its corresponding weight. They use statistical techniques, such as the Pearson correlation and the chi-square test, to compute the weight of each feature. Their results show that DWTM usually outperforms traditional ML algorithms. Medeiros et al. [25] have provided a comparative analysis of converters of tabular data into images for classification. They conclude that transforming tabular data into images to leverage the power of CNNs has the potential to increase model performance through the additional 2-D spatial information that can be processed by the CNN. Their study highlights the potential benefits and limitations of using image-based DL models for tabular data. Kulkarni [3] proposed a method to map tabular data into images, mapping feature values and ratios of feature values as rectangular shapes in the image canvas, and used the model to classify tabular data. Li et al. [26] provide a survey on Graph Neural Networks (GNNs) for tabular data learning. The survey highlights a critical gap in deep neural tabular data learning methods: the underrepresentation of latent correlations among data instances and feature values.

III. PROPOSED FRAMEWORK

The framework for the proposed Fuzzy Convolution Neural Network (FCNN) is shown in Fig. 1. We analyzed six tabular data sets using the proposed framework. The columns represent the features, and the rows represent entities. The last column in the training set data represents the class labels. The first module in the proposed framework is a fuzzifier block, which converts feature values into the corresponding fuzzy memberships. The second module is a converter that maps the fuzzy memberships onto the image canvas. During the training phase, the images are stored in a Datamart. The last block is the CNN, which is trained using the images in the Datamart. During the decision-making phase, an unknown input feature vector is fuzzified. The fuzzified vector is converted into an image, which is classified with the trained FCNN.
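The two phases can be summarized in code. Below is a minimal Python sketch of the pipeline, with `fuzzify`, `memberships_to_image`, and `train_cnn` as assumed helper functions (the paper's actual implementation uses MATLAB scripts; all names here are illustrative):

```python
import os
import numpy as np

def fcnn_train_pipeline(X, y, datamart_dir, fuzzify, memberships_to_image, train_cnn):
    """Training phase: fuzzify each feature vector, render it as an image,
    and store the image in a class-labeled folder (the Datamart).
    `fuzzify`, `memberships_to_image`, and `train_cnn` are assumed helpers,
    sketched later in this section."""
    for i, (x, label) in enumerate(zip(X, y)):
        mu = fuzzify(x)                    # feature values -> fuzzy memberships
        img = memberships_to_image(mu)     # memberships -> image canvas
        class_dir = os.path.join(datamart_dir, str(label))
        os.makedirs(class_dir, exist_ok=True)
        np.save(os.path.join(class_dir, f"sample_{i}.npy"), img)
    return train_cnn(datamart_dir)         # CNN trained on the stored images

def fcnn_classify(x, fuzzify, memberships_to_image, cnn):
    """Decision phase: an unknown feature vector is fuzzified, converted
    to an image, and classified with the trained CNN."""
    img = memberships_to_image(fuzzify(x))
    return cnn(img)
```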

FIGURE 1. Framework for fuzzy convolution neural network (FCNN)

The trapezoidal and π-shaped fuzzy membership functions are shown in Fig. 2 and Fig. 3, respectively. The trapezoidal membership functions are given in (1):

f(x;\,a,b,c,d)=
\begin{cases}
0, & x \le a\\
\dfrac{x-a}{b-a}, & a \le x \le b\\
1, & b \le x \le c\\
\dfrac{d-x}{d-c}, & c \le x \le d\\
0, & d \le x
\end{cases}
\qquad (1)

where a, b, c, and d are constants that define the fuzzy membership function.

FIGURE 2. Trapezoidal membership functions

The π-shaped membership functions are given by (2), where S(x; a, b, c) represents the S-shaped membership function defined in (3) [27]:

f(x;\,b,c)=
\begin{cases}
S\!\left(x;\;c-b,\;c-\dfrac{b}{2},\;c\right), & x \le c\\
1-S\!\left(x;\;c,\;c+\dfrac{b}{2},\;c+b\right), & x > c
\end{cases}
\qquad (2)

S(x;\,a,b,c)=
\begin{cases}
0, & x \le a\\
2\left(\dfrac{x-a}{c-a}\right)^{2}, & a \le x \le b\\
1-2\left(\dfrac{x-c}{c-a}\right)^{2}, & b \le x \le c\\
1, & x \ge c
\end{cases}
\qquad (3)

FIGURE 3. π-shaped membership functions

In (3), a, b, and c are the parameters that are adjusted to fit the desired membership function. The parameter b is the half-width of the curve at the crossover point. The triangular membership functions are defined by three parameters a, b, and c, as shown in (4):

f(x;\,a,b,c)=
\begin{cases}
0, & x \le a\\
\dfrac{x-a}{b-a}, & a \le x \le b\\
\dfrac{c-x}{c-b}, & b \le x \le c\\
0, & c \le x
\end{cases}
\qquad (4)
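For reference, the membership functions in (1)-(4) translate directly into code. The following Python sketch mirrors the equations; the parameter names follow the text:

```python
def trapezoidal(x, a, b, c, d):
    """Trapezoidal MF, eq. (1): ramps up on [a, b], flat on [b, c], ramps down on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def s_function(x, a, b, c):
    """S-shaped MF, eq. (3); b is the crossover point midway between a and c."""
    if x <= a:
        return 0.0
    if x <= b:
        return 2.0 * ((x - a) / (c - a)) ** 2
    if x <= c:
        return 1.0 - 2.0 * ((x - c) / (c - a)) ** 2
    return 1.0

def pi_function(x, b, c):
    """Pi-shaped MF, eq. (2): two S-functions joined at the center c."""
    if x <= c:
        return s_function(x, c - b, c - b / 2.0, c)
    return 1.0 - s_function(x, c, c + b / 2.0, c + b)

def triangular(x, a, b, c):
    """Triangular MF, eq. (4): peak at b, zero outside [a, c]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)
```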

The Gaussian membership functions are defined by the mean and standard deviation, as shown in (5):

f(x;\,\sigma,c)=\exp\!\left(\dfrac{-(x-c)^{2}}{2\sigma^{2}}\right)
\qquad (5)

where c represents the mean value and σ represents the standard deviation.

Both trapezoidal and π-shaped membership functions (MFs) offer advantages over triangular and Gaussian MFs, particularly in terms of computational efficiency, robustness to noise, and flexibility. The flat top of these functions allows a range of values to have full membership, which is beneficial when precise membership values are less critical. The flat-top region can enhance the system's robustness to small variations or noise in the input, as a range of input values will share the same membership degree. Trapezoidal MFs are computationally less intensive compared to Gaussian MFs. They are straightforward to define and implement, requiring only four parameters. Their simplicity and practical applicability in fuzzy inference systems make trapezoidal MFs especially advantageous. On the other hand, π-shaped MFs provide a balance between smooth transitions and robustness, combining the characteristics of both trapezoidal and Gaussian functions. They are smooth like Gaussian functions but also feature a flat top like trapezoidal functions. These features make π-shaped MFs suitable for a wide range of fuzzy logic applications where both smooth transitions and robustness are desired. Triangular MFs, with their linear transitions, can result in more abrupt changes, especially if the input value is close to the peak of the triangle. Managing overlapping regions can be challenging, as the membership value changes linearly and abruptly at the boundaries, making the system more sensitive to input variations. Any change in input directly affects the membership value due to the linear nature of the function. The advantage of triangular MFs lies in their ease of implementation. Gaussian MFs, which require exponential calculations, can be computationally more intensive. Although they provide smooth curves, adjusting their shape precisely can be less intuitive, since changes in parameters affect both the spread and the height of the curve simultaneously. In situations where the data distribution is normal with known mean and variance values, Gaussian MFs can represent the system more accurately. In our FCNN system implementation, we have chosen trapezoidal MFs.

The fuzzified feature vectors are mapped into images, which are saved in the Datamart in folders that are labeled with class names. A sample mapped image is shown in Fig. 4. The shapes within the resulting mapped image symbolize the fuzzy membership values. The number of columns of shapes is equal to the number of term sets, while the number of rows is equal to the number of features in the observation vector. The number of squares in the output image is equal to n_f × n_term, where n_f is the number of features and n_term is the number of term sets. The area of each square in the mapped image is proportional to the corresponding fuzzy membership value. We analyzed six datasets that contain two features for each observation.
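A sketch of this mapping in Python is shown below. Each of the n_f × n_term memberships becomes a centered square whose area is proportional to the membership value; the cell size and grayscale rendering are illustrative assumptions, not the paper's exact geometry:

```python
import numpy as np

def memberships_to_image(mu, cell=32):
    """Render an (n_f x n_term) membership matrix as a grayscale image.
    Each grid cell holds a centered filled square whose area is
    proportional to the membership value (side length ~ sqrt(mu))."""
    n_f, n_term = mu.shape
    img = np.zeros((n_f * cell, n_term * cell), dtype=np.float32)
    for i in range(n_f):
        for j in range(n_term):
            side = int(round(cell * np.sqrt(mu[i, j])))  # area proportional to mu
            if side == 0:
                continue
            r0 = i * cell + (cell - side) // 2
            c0 = j * cell + (cell - side) // 2
            img[r0:r0 + side, c0:c0 + side] = 1.0
    return img

# Example: two features, five term sets (very_low ... very_high)
mu = np.array([[0.0, 0.2, 0.8, 0.0, 0.0],
               [0.0, 0.0, 0.1, 0.9, 0.0]])
image = memberships_to_image(mu)   # shape (64, 160)
```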
FIGURE 4. Sample mapped image.

The last module implements the DCNN model. The DCNN models are trained with the images stored in the Datamart. Convolution layers extract features from the input image. Inputs are convolved with learned weights to compute feature maps, and the results are sent through a nonlinear activation function. The convolution layer is followed by a pooling layer. All neurons within a feature map have equal weights; however, different feature maps within the same convolution layer have different weights [11]. The output of the kth feature map Y_k is given by (6):

Y_k = f\!\left(W_k * x\right) \qquad (6)

where x denotes the input image, W_k is the convolution filter, and the '*' sign represents the 2-D convolution operator. The purpose of the pooling layer is to reduce the spatial resolution and extract invariant features. The output of a pooling layer is given by (7):

Y_{kij} = \max_{(p,q)\in R_{ij}} \left(X_{kpq}\right) \qquad (7)

where X_{kpq} denotes the element at location (p, q) within the pooling region R_{ij}.
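Equations (6) and (7) correspond to a standard 2-D convolution followed by max-pooling. A minimal NumPy illustration, taking f to be the ReLU activation and using non-overlapping 2 x 2 pooling regions:

```python
import numpy as np

def conv2d_valid(x, w):
    """Eq. (6): Y_k = f(W_k * x) with f = ReLU; plain 'valid' 2-D convolution."""
    kh, kw = w.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    wf = w[::-1, ::-1]                    # flip the kernel for true convolution
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * wf)
    return np.maximum(out, 0.0)           # ReLU activation

def max_pool2d(y, p=2):
    """Eq. (7): Y_kij = max over the pooling region R_ij (non-overlapping p x p)."""
    h, w = y.shape[0] // p, y.shape[1] // p
    return y[:h * p, :w * p].reshape(h, p, w, p).max(axis=(1, 3))
```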
We used two DCNN models in our analysis, AlexNet and ResNet-50. AlexNet is a seminal CNN architecture that significantly contributed to the advancement of deep learning in computer vision tasks. AlexNet consists of eight layers: five convolution layers followed by max-pooling layers, and three fully connected layers. The ReLU activation function is used throughout the network, and dropout regularization is applied to prevent overfitting. The network has an image input size of 227-by-227. The network maximizes the multinomial logistic regression objective function. ResNet-50 is a DCNN that is a variant of the ResNet architecture. It is one of the most popular and influential deep learning models used for image classification and related tasks. It uses residual connections that allow the network to learn a set of residual functions that map the input to the desired output. These connections enable the network to learn without suffering from vanishing gradients. It has fifty layers. The architecture is divided into four parts: convolution layers, the identity block, the convolution block, and fully connected layers. ResNet introduced the concept of residual connections, which are shortcut connections that skip one or more layers. These connections allow gradients to flow more easily during training, mitigating the vanishing gradient problem and enabling the training of very deep networks. ResNet-50 employs a bottleneck architecture, which reduces the computational cost of the network by using 1x1, 3x3, and 1x1 convolutions in sequence. ResNet-50 is often used as a pre-trained model for transfer learning; the pre-trained model can be fine-tuned on a smaller dataset for a specific task [28]. In our research work, we used both AlexNet and ResNet-50 to train and classify images that were generated from fuzzified feature vectors.
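The fine-tuning setup described above is straightforward to reproduce. The authors used the MATLAB Deep Learning Toolbox; an equivalent PyTorch sketch, which loads a pretrained ResNet-50 and replaces its final fully connected layer so that the number of output units equals the number of classes, might look like this:

```python
import torch
import torch.nn as nn
from torchvision import models

def build_fcnn_backbone(num_classes):
    """Load an ImageNet-pretrained ResNet-50 (torchvision >= 0.13 API) and
    replace its classifier head so the number of output units equals the
    number of classes, mirroring the transfer-learning setup in the paper."""
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

model = build_fcnn_backbone(num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()   # multinomial logistic regression objective
```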


IV. EXPERIMENT AND RESULTS

To validate our approach, we generated six artificial noisy non-linearly separable datasets: Half Kernel, Two Spirals, Cluster in Cluster, Crescent and Moon, Corners, and Outliers. Scatter plots for the datasets are displayed in Fig. 5. Each dataset comprises two attributes and 400 samples. All datasets, except for the Corners dataset, consist of samples from two classes, with 200 samples per class represented by blue and red dots. The Corners dataset contains samples from four classes. Two datasets, Half Kernel and Corners, exhibit overlapping samples in the feature space. The datasets were generated using a MATLAB script [29]. The Outliers dataset comprises 200 samples from four classes; the dataset contains overlapping samples in the feature space. As an illustration, the results from the analysis of the Two Spirals dataset are presented below. The parallel coordinates plot for the Two Spirals dataset is shown in Fig. 6.

FIGURE 5. Scatter plots for Half Kernel, Spirals, Cluster in Cluster, Crescent, Corners, Outliers datasets.

FIGURE 6. Parallel coordinates for Two Spiral dataset.

The dataset was split for training and testing: 70 percent of randomly selected samples were used for training, and the remaining 30 percent were used for evaluating the models. The decision tree that was generated for the Two Spirals dataset is shown in Fig. 7. The confusion matrix is shown in Fig. 8. The decision tree classifier was able to classify the dataset with 92.5 percent accuracy. Fig. 9 shows the ROC curve obtained with the DT classifier. The SVM and Bayes' classifiers were able to classify the dataset with 65 percent accuracy, and with the RF we obtained an accuracy of 95 percent.

FIGURE 7. Decision tree for Two Spiral dataset

We also classified the Two Spirals dataset using the fuzzy neural network (FNN) shown in Fig. 10. The fuzzy membership values were used as the input for the neural network. The FNN model consists of two modules. The first module is a fuzzifier module that maps feature values into fuzzy membership functions. We have used five trapezoidal membership functions that represent five term sets. The neural network has 10 input units that represent the fuzzy membership values for the two features. The hidden layer has ten units, and the output layer has two units that represent the two classes.
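The neural module of the FNN is a small fully connected network. A PyTorch sketch of the stated 10-10-2 topology is given below; the hidden-layer activation is an assumption, since the text does not specify it:

```python
import torch.nn as nn

# 10 inputs = 2 features x 5 term-set memberships; 10 hidden units; 2 class outputs.
fnn = nn.Sequential(
    nn.Linear(10, 10),
    nn.Sigmoid(),        # hidden activation (assumed; not stated in the text)
    nn.Linear(10, 2),
)
```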

The same dataset was classified by the FNN model with an accuracy of 86 percent. The learning curve for the FNN model is shown in Fig. 11.

FIGURE 8. Confusion matrix for decision tree for Two Spiral dataset.

FIGURE 9. ROC curve for decision tree for Two Spiral dataset.

FIGURE 10. Fuzzy neural network (FNN) model.

FIGURE 11. Error curve for FNN learning

The Two Spirals dataset was also analyzed by the proposed FCNN model. We developed software using MATLAB scripts to map fuzzy membership values to images. Each fuzzified feature vector was mapped to an image. The mapped images were stored in the respective class folders in the Datamart. We implemented two FCNN models, one with AlexNet and the other with ResNet-50, using MATLAB scripts. The input image size for AlexNet was 227 x 227 x 3, and the input image size for ResNet-50 was 512 x 512 x 3. The number of output units for both models was equal to the number of classes. The training progress plots for AlexNet and ResNet-50 are shown in Fig. 12 and Fig. 13, respectively. For training the two FCNN models, 70 percent of randomly chosen images were used for training, and the remaining 30 percent were used for testing. Both FCNNs were able to classify the images in the testing set with 100 percent accuracy. We implemented and executed the FCNN models with both AlexNet and ResNet-50 on a desktop with a Pentium dual processor. The execution time can be decreased by executing the script on a workstation with a GPU. The training process for AlexNet took about 4 min and 50 sec for each dataset over 28 iterations, while the learning process for ResNet-50 took about 78 min for each dataset over 72 iterations. Some sample classified images with class labels are shown in Fig. 14. In this example, the dataset consists of two features. While mapping the features, we mapped each feature twice: the first two rows of shapes represent the first feature, and the last two rows of shapes represent the second feature. The ROC curves for both FCNN models are shown in Fig. 15 and Fig. 16. All six datasets were classified using ML models that include the decision tree (DT), support vector machine (SVM), Bayes' classifier, Random Forest (RF), and fuzzy neural network (FNN). The classification accuracy obtained with these classifiers for all six datasets is shown in Table I. The classification accuracy for the FCNN models with AlexNet and ResNet-50 was the same for all datasets.
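The baseline results in Table I follow the same 70/30 protocol. A scikit-learn sketch of that evaluation loop is shown below; loading of the generated datasets is assumed, since they come from a MATLAB script [29]:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier

def evaluate_baselines(X, y, seed=0):
    """Train each baseline on a random 70/30 split and report test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "SVM": SVC(),
        "Bayes": GaussianNB(),
        "Random Forest": RandomForestClassifier(random_state=seed),
    }
    return {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
```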


TABLE I. CLASSIFICATION ACCURACY (PERCENT)

Classifier                        | Half Kernel | Two Spirals | Cluster-in-Cluster | Crescent Moon | Corners | Outliers
Decision Tree                     | 95.00       | 90.83       | 94.17              | 97.50         | 98.33   | 98.33
Support Vector Machine            | 66.67       | 65.00       | 56.67              | 85.00         | 58.33   | 99.17
Bayes' Classifier                 | 84.17       | 65.00       | 87.50              | 88.30         | 50.00   | 99.17
Random Forest                     | 98.33       | 95.00       | 95.83              | 97.50         | 71.67   | 44.17
Fuzzy Neural Network              | 91.50       | 86.60       | 100.00             | 98.50         | 53.00   | 95.20
Fuzzy Convolution Neural Network  | 99.19       | 100.00      | 100.00             | 100.00        | 100.00  | 99.17

FIGURE 12. Training progress plot for FCNN (AlexNet)

FIGURE 13. Training progress plot for FCNN (ResNet-50)

FIGURE 15. ROC curve for Two Spiral dataset (AlexNet)

FIGURE 16. ROC curve for Two Spiral dataset (ResNet-50)

FIGURE 14. Classified images with labels (FCNN-ResNet-50)

V. CONCLUSIONS

In this paper, we present a novel framework called FCNN for classifying tabular data. We developed software using MATLAB scripts to map features to corresponding fuzzy membership values and to convert fuzzified vectors into images. Additionally, we implemented AlexNet and ResNet-50 using the MATLAB Deep Learning Toolbox. To evaluate the proposed approach, we generated six complex noisy datasets and analyzed them using various ML algorithms: decision trees, support vector machines, Bayes' classifiers, Random Forests, and fuzzy neural networks. The six datasets were also classified using the proposed FCNN model. It can be observed from Table I that the FCNN model performs as well as or better than state-of-the-art ML algorithms, suggesting that FCNN provides a viable alternative for classifying tabular data. The limitation of the proposed approach is the number of features and term sets.

The number of shapes in the mapped image is proportional to the number of features and term sets. For a finite image size, the number of shapes that can be mapped onto the image canvas is limited. Therefore, the approach is suitable for datasets with a small number of features. Future work includes: a) feeding images directly to the DCNN, eliminating the Datamart; b) experimenting with shapes having different morphological properties, such as circular, rectangular, hexagonal, or triangular shapes, to generate the mapped images; c) trying other membership functions, such as Gaussian and triangular, in place of the trapezoidal fuzzy membership functions used in this work, and evaluating the classification accuracy; and d) analyzing data using other DCNNs, such as VGG-16 and GoogLeNet, and deploying the FCNN model in real-life applications.

ACKNOWLEDGMENT

The author would like to express sincere thanks to the anonymous reviewers for their insightful comments and constructive suggestions, which have significantly contributed to the improvement of this manuscript.

REFERENCES
[1] V. Borisov, T. Leemann, K. Seßler, J. Haug, M. Pawelczyk, and G. Kasneci, "Deep neural networks and tabular data: A survey," IEEE Transactions on Neural Networks and Learning Systems, 2022 (accepted for publication). https://doi.org/10.48550/arXiv.2110.01889
[2] V. Martin, "Convolutional neural networks on tabular datasets (Part 1)," 2021. https://medium.com/spikelab/convolutional-neural-networks-on-tabular-datasets-part-1-4abdd67795b6
[3] A. D. Kulkarni, "Multispectral image analysis using convolution neural networks," International Journal of Advanced Computer Science and Applications, vol. 14, no. 10, 2023, pp. 13-19. doi: 10.14569/IJACSA.2023.0141002
[4] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. John Wiley & Sons, New York, 2001.
[5] T. Mitchell, Machine Learning. WCB/McGraw-Hill, Boston, MA, 1997, pp. 52-80.
[6] Y. Song and Y. Lu, "Decision tree methods: applications for classification and prediction," Shanghai Archives of Psychiatry, vol. 27, no. 2, 2015, pp. 130-135. doi: 10.11919/j.issn.1002-0829.215044
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, 2001, pp. 5-32. https://doi.org/10.1023/A:1010933404324
[8] V. Vapnik and S. Kotz, Estimation of Dependences Based on Empirical Data, 2006. doi: 10.2307/2988246
[9] B. Mehlig, Machine Learning with Neural Networks: An Introduction for Scientists and Engineers. Cambridge University Press, 2021.
[10] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, 2012.
[11] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, 2015, pp. 436-444.
[12] W. Rawat and Z. Wang, "Deep convolutional neural networks for image classification: A comprehensive review," Neural Computation, vol. 29, 2017, pp. 2352-2449.
[13] S. Zhang, L. Yao, A. Sun, and Y. Tay, "Deep learning based recommender system: A survey and new perspectives," ACM Computing Surveys, vol. 1, no. 1, 2018, pp. 1-35.
[14] Q. Abbas, M. Ibrahim, and M. Jaffar, "A comprehensive review of recent advances in deep vision systems," Artificial Intelligence Review, vol. 52, 2019, pp. 39-76. https://doi.org/10.1007/s10462-018-9633-3
[15] A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi, "A survey of the recent architectures of deep convolutional neural networks," Artificial Intelligence Review, vol. 53, 2020, pp. 5455-5516. https://doi.org/10.1007/s10462-020-09825-6
[16] N. Talpur, S. J. Abdulkadir, H. Alhussian, M. Hilmi Hasan, N. Aziz, and A. Bamhdi, "Deep neuro-fuzzy system application trends, challenges, and future perspectives: a systematic survey," Artificial Intelligence Review, vol. 56, 2023, pp. 865-913. https://doi.org/10.1007/s10462-022-10188-3
[17] A. Sarabakha and E. Kayacan, "Online deep fuzzy learning for control of nonlinear systems using expert knowledge," IEEE Transactions on Fuzzy Systems, vol. 28, no. 7, 2019, pp. 1492-1503. https://doi.org/10.1109/TFUZZ.2019.2936787
[18] R. Das, S. Sen, and U. Maulik, "Survey on fuzzy deep neural networks," ACM Computing Surveys, vol. 53, no. 3, May 2020. https://dl.acm.org/doi/abs/10.1145/3369798
[19] B. Sun, L. Yang, P. Dong, W. Zhang, J. Dong, and C. Young, "Super characters: A conversion from sentiment classification to image classification," 2018, arXiv:1810.07653.
[20] A. Sharma, E. Vans, D. Shigemizu, K. A. Boroevich, and T. Tsunoda, "DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture," Nature Scientific Reports, vol. 9, 2019, 11399.
[21] Y. Zhu, T. Brettin, F. Xia, et al., "Converting tabular data into images for deep learning with convolutional neural networks," Nature Scientific Reports, vol. 11, 2021, 11325. https://doi.org/10.1038/s41598-021-90923-y
[22] L. Du et al., "TabularNet: A neural network architecture for understanding semantic structures of tabular data," KDD '21, August 14-18, 2021, Virtual Event, Singapore, pp. 322-331.
[23] S. O. Arik and T. Pfister, "TabNet: Attentive interpretable tabular learning," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 8, 2021, pp. 6679-6687. https://doi.org/10.1609/aaai.v35i8.16826
[24] Md. I. Iqbal et al., "Dynamic weighted tabular method for convolutional neural networks," IEEE Access, vol. 10, 2022, pp. 134183-134198.
[25] N. I. Medeiros, N. S. Rogerio da Silva, and P. T. Endo, "A comparative analysis of converters of tabular data into image for the classification of arboviruses using convolutional neural networks," PLoS ONE, vol. 18, no. 12, 2023, e0295598. https://doi.org/10.1371/journal.pone.0295598
[26] C.-T. Li, Y.-C. Tsai, C.-Y. Chen, and J. C. Liao, "Graph neural networks for tabular data learning: A survey with taxonomy and directions," 2024. https://github.com/Roytsai27/awesome-GNN4TDL
[27] A. D. Kulkarni, Computer Vision and Fuzzy-Neural Systems. Prentice Hall, Upper Saddle River, NJ, 2001.
[28] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 770-778.
[29] J. Kools, "6 functions for generating artificial datasets," MATLAB Central File Exchange. https://www.mathworks.com/matlabcentral/fileexchange/41459-6-functions-for-generating-artificial-datasets. Retrieved March 7, 2024.

ARUN KULKARNI obtained M.Tech. and Ph.D. degrees from the Indian Institute of Technology, Mumbai, and was a post-doctoral fellow at Virginia Tech. His areas of interest include machine learning, data mining, deep learning, and computer vision. He has more than eighty refereed papers to his credit and has authored two books. Currently, he is working as a Professor of Computer Science with The University of Texas at Tyler. His awards include the Office of Naval Research (ONR) 2008 Senior Summer Faculty Fellowship award, the 2005-2006 President's Scholarly Achievement Award, the 2001-2002 Chancellor's Council Outstanding Teaching award, the 1997 NASA/ASEE Summer Faculty Fellowship award, and the 1984 Fulbright Fellowship award. He has been listed in Who's Who in America.
