
3.2 PREPROCESSING:

 Images come in different shapes and sizes, and they come from different sources.

 Taking all these variations into consideration, we need to perform some pre-processing on any image data. RGB is the most popular encoding format, and most natural images are stored as RGB. One of the first steps of data pre-processing is to make all the images the same size.

 Here we have used automatic resizing during training to convert all the images in the dataset to the same resolution.
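A minimal sketch of this resizing step, assuming the Pillow library and placeholder folder names (not necessarily the exact tool used here):

from pathlib import Path
from PIL import Image

TARGET_SIZE = (224, 224)                      # placeholder common resolution

src = Path("dataset/raw")                     # hypothetical input folder
dst = Path("dataset/resized")                 # hypothetical output folder
dst.mkdir(parents=True, exist_ok=True)

for path in src.glob("*.png"):
    img = Image.open(path).convert("RGB")     # force a common RGB encoding
    img.resize(TARGET_SIZE).save(dst / path.name)   # simple resize to one resolution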

3.3 FEATURE EXTRACTION:

Feature extraction is useful when you need to reduce the number of resources needed for processing without losing important or relevant information. It can also reduce the amount of redundant data for a given analysis. Moreover, reducing the data and the machine's effort in building variable combinations (features) speeds up the learning and generalization steps of the machine learning process.
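As an illustration of the idea (not necessarily the pipeline used in this work), a pretrained CNN such as VGG16 can act as a fixed feature extractor that turns each image into a compact feature vector:

import numpy as np
import tensorflow as tf

# Pretrained VGG16 without its classification head acts as a feature extractor.
extractor = tf.keras.applications.VGG16(include_top=False, pooling="avg",
                                        input_shape=(224, 224, 3))

images = np.random.rand(4, 224, 224, 3).astype("float32") * 255.0   # stand-in batch
features = extractor.predict(tf.keras.applications.vgg16.preprocess_input(images))
print(features.shape)   # (4, 512): 512 features per image instead of 150,528 raw pixels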

3.5 CNN:

 In deep learning, a convolutional neural network (CNN) is a type of deep neural network that extracts information from data such as images, sounds, or videos. A CNN rests on three main ideas: local receptive fields, shared weights and biases, and activation and pooling. The network is first trained on a large set of data so that it can extract the features of a given input. When an input is given, image pre-processing is done first, then features are extracted on the basis of what was learned from the stored data, and finally the data is classified and the output is shown as the result (a minimal model sketch follows this list).

 A CNN can only deal with the kinds of input it has been trained on, using the parameters saved during training.

 CNNs are used in image and video recognition, recommender systems, image classification, medical image analysis, and natural language processing.
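A minimal Keras sketch of such a network (layer sizes are illustrative assumptions; the three-way output matches the frontal / maxillary / normal classes used below):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu",     # local receptive fields with
                           input_shape=(224, 224, 3)),        # shared weights and biases
    tf.keras.layers.MaxPooling2D((2, 2)),                     # pooling
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3, activation="softmax"),           # three-class output
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()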

DATASET:

One major advantage of using CNNs over ordinary neural networks is that you do not need to flatten the input images to 1D, as CNNs can work with image data in 2D. This helps in retaining the "spatial" properties of images. Here we use an X-ray database that consists of three categories: Frontal, Maxillary, and Normal.
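If the three categories are stored as one sub-folder per class, the dataset can be loaded without flattening, for example (the folder name is a placeholder):

import tensorflow as tf

# Hypothetical layout: xray_dataset/Frontal, xray_dataset/Maxillary, xray_dataset/Normal.
dataset = tf.keras.utils.image_dataset_from_directory(
    "xray_dataset",
    image_size=(224, 224),     # images stay 2D (plus channels); no flattening to 1D
    batch_size=32,
)
print(dataset.class_names)     # ['Frontal', 'Maxillary', 'Normal']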

PRE-PROCESSING STEPS

The pre-processing consisted of resizing, patch extraction, and augmentation steps. The first step normalizes the size of the input images. Almost all the radiographs were rectangles of different heights and were too large (median matrix size ≥1,800). Accordingly, we resized all images to a standardized 224×224 pixel square by preserving their aspect ratios and using zero-padding. The efficiency of deep learning depends on the input data; therefore, in the second step, input images were pre-processed using a patch (a cropped part of each image). A patch was extracted using a bounding box so that it contained sufficient maxillary sinus area for analysis. Finally, data augmentation was conducted for the training dataset only, using mirror images that were reversed left to right and images rotated by −30, −10, 10, and 30 degrees.
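A sketch of these three steps using the Pillow library (an assumption; the bounding-box coordinates are placeholders, and the rotation angles are the ones listed above):

from PIL import Image, ImageOps

def resize_with_padding(img):
    # Step 1: resize to a 224x224 square, preserving the aspect ratio and
    # zero-padding (filling with black) the remaining area.
    return ImageOps.pad(img, (224, 224), color=0)

def extract_patch(img, box):
    # Step 2: crop a patch from a placeholder bounding box given as
    # (left, upper, right, lower), assumed to cover the maxillary sinus region.
    return img.crop(box)

def augment(img):
    # Step 3: training-only augmentation: a left-right mirror image plus rotations.
    out = [ImageOps.mirror(img)]
    out += [img.rotate(angle) for angle in (-30, -10, 10, 30)]
    return out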

IMAGE LABELING AND DATASET DISTRIBUTIONS:

All subjects were independently labeled twice as "normal" or "sinusitis" by two radiologists. Labeling was first evaluated with the original images on a picture archiving and communication system (PACS) and then with the resized images that were used as the actual learning data. The datasets were defined as an internal dataset and a temporal dataset, with the temporal dataset used for the test evaluation. The internal dataset was randomly split into training (70%), validation (15%), and test (15%) subsets. The distribution of the internal test dataset was 32% maxillary sinusitis, 32% frontal sinusitis, and 34% normal.
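A sketch of the random 70/15/15 split (the dataset size and seed are placeholders):

import numpy as np

n_images = 1000                                   # placeholder dataset size
rng = np.random.default_rng(seed=42)
indices = rng.permutation(n_images)               # random order of image indices

train_idx = indices[: int(0.70 * n_images)]                          # 70% training
val_idx   = indices[int(0.70 * n_images): int(0.85 * n_images)]      # 15% validation
test_idx  = indices[int(0.85 * n_images):]                           # 15% test
print(len(train_idx), len(val_idx), len(test_idx))                   # 700 150 150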

Activation function

The activation function serves as a decision function and helps in learning intricate patterns. Selecting an appropriate activation function can accelerate the learning process. In the literature, different activation functions, such as sigmoid, tanh, maxout, SWISH, and ReLU, as well as variants of ReLU such as leaky ReLU, ELU, and PReLU, are used to introduce non-linear combinations of features.
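For reference, several of the activation functions named above written directly in NumPy (the 0.01 leaky slope and the ELU scale of 1.0 are common default choices, not values specified in this document):

import numpy as np

x = np.linspace(-3.0, 3.0, 7)

sigmoid    = 1.0 / (1.0 + np.exp(-x))
tanh       = np.tanh(x)
relu       = np.maximum(0.0, x)
leaky_relu = np.where(x > 0, x, 0.01 * x)                  # small slope for negative inputs
elu        = np.where(x > 0, x, 1.0 * (np.exp(x) - 1.0))   # smooth negative saturation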
Operations using NumPy:

Using NumPy, a developer can perform the following operations:

 Mathematical and logical operations on arrays.
 Fourier transforms and routines for shape manipulation.
 Operations related to linear algebra. NumPy has built-in functions for linear algebra and random number generation.
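A short example touching each of these operation groups:

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(a + b, a > 2)                                  # mathematical and logical operations
print(np.fft.fft(np.array([1.0, 0.0, -1.0, 0.0])))   # Fourier transform
print(a.reshape(1, 4))                               # shape manipulation
print(np.dot(a, b), np.linalg.inv(a))                # linear algebra: product and inverse
print(np.random.default_rng().random(3))             # random number generation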

The most important object defined in NumPy is an N-dimensional array type called ndarray. It describes a collection of items of the same type. Items in the collection can be accessed using a zero-based index. Every item in an ndarray occupies the same size block of memory. Each item in an ndarray is described by a data-type object (called dtype).

Data Type Objects (dtype)

A data type object describes the interpretation of a fixed block of memory corresponding to an array, depending on the following aspects:
 Type of data (integer, float, or Python object)
 Size of data
 Byte order (little-endian or big-endian)
 In the case of a structured type, the names of the fields, the data type of each field, and the part of the memory block taken by each field
 If the data type is a subarray, its shape and data type

The byte order is specified by prefixing '<' or '>' to the data type. '<' means little-endian encoding (the least significant byte is stored at the smallest address), and '>' means big-endian.
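A brief example of dtype objects, including an explicit little-endian byte order and a structured type with named fields:

import numpy as np

dt = np.dtype('<i4')                    # '<' prefix: little-endian 32-bit integer
print(dt.str, dt.itemsize)              # '<i4', 4 bytes per item

# Structured type: field names, the data type of each field, and each field's
# share of the memory block.
record = np.dtype([('age', np.int8), ('score', '<f4')])
arr = np.array([(25, 91.5), (32, 78.0)], dtype=record)
print(arr['age'], arr[0])               # zero-based indexing into the ndarray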
TENSORFLOW:
TensorFlow is an interface for expressing machine learning algorithms, and an
implementation for executing such algorithms. A computation expressed using
TensorFlow can be executed with little or no change on a wide variety of
heterogeneous systems, ranging from mobile devices such as phones and
tablets up to large-scale distributed systems of hundreds of machines and
thousands of computational devices such as GPU cards. The system is flexible
and can be used to express a wide variety of algorithms, including training and
inference algorithms for deep neural network models, and it has been used for
conducting research and for deploying machine learning systems into
production across more than a dozen areas of computer science and other
fields, including speech recognition, computer vision, robotics, information
retrieval, natural language processing, geographic information extraction, and
computational drug discovery. This paper describes the TensorFlow interface
and an implementation of that interface that we have built at Google. The
TensorFlow API and a reference implementation were released as an open-
source package under the Apache 2.0 license in November, 2015 and are
available at www.tensorflow.org.

Based on our experience with DistBelief and a more complete understanding of the desirable system properties and requirements for training and using neural networks, we have built TensorFlow, our second-generation system for the implementation and deployment of large-scale machine learning models.
TensorFlow takes computations described using a dataflow-like model and
maps them onto a wide variety of different hardware platforms, ranging from
running inference on mobile device platforms such as Android and iOS to
modest sized training and inference systems using single machines containing
one or many GPU cards to large-scale training systems running on hundreds of
specialized machines with thousands of GPUs. Having a single system that can
span such a broad range of platforms significantly simplifies the real-world use
of machine learning systems, as we have found that having separate systems for large-scale training and small-scale deployment leads to significant maintenance burdens and leaky abstractions. TensorFlow computations are expressed as stateful dataflow graphs, and we have focused on making the system both flexible enough for quickly
experimenting with new models for research purposes and sufficiently high
performance and robust for production training and deployment of machine
learning models. For scaling neural network training to larger deployments,
TensorFlow allows clients to easily express various kinds of parallelism through
replication and parallel execution of a core model dataflow graph, with many
different computational devices all collaborating to update a set of shared
parameters or other state. Modest changes in the description of the
computation allow a wide variety of different approaches to parallelism to be
achieved and tried with low effort [14, 29, 42]. Some TensorFlow uses allow
some flexibility in terms of the consistency of parameter updates, and we can
easily express and take advantage of these relaxed synchronization
requirements in some of our larger deployments. Compared to DistBelief,
TensorFlow’s programming model is more flexible, its performance is
significantly better, and it supports training and using a broader range of
models on a wider variety of heterogeneous hardware platforms.
In a TensorFlow graph, each node has zero or more inputs and zero or more
outputs, and represents the instantiation of an operation. Values that flow
along normal edges in the graph (from outputs to inputs) are tensors, arbitrary
dimensionality arrays where the underlying element type is specified or
inferred at graph-construction time. Special edges, called control
dependencies, can also exist in the graph: no data flows along such edges, but
they indicate that the source node for the control dependence must finish
executing before the destination node for the control dependence starts
executing. Since our model includes mutable state, control dependencies can
be used directly by clients to enforce happens-before relationships. Our
implementation also sometimes inserts control dependencies to enforce
orderings between otherwise independent operations as a way of, for
example, controlling the peak memory usage.
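A minimal sketch of these ideas in current TensorFlow, where tf.function builds the graph implicitly (the variable and values are placeholders):

import tensorflow as tf

counter = tf.Variable(0)                           # mutable state held in the graph

@tf.function
def step(x):
    # Normal dataflow edge: the tensor produced by matmul flows into the multiply.
    y = tf.matmul(x, tf.constant([[2.0], [3.0]]))
    # Control dependency: the counter update must finish before z is computed,
    # even though the two operations share no data.
    with tf.control_dependencies([counter.assign_add(1)]):
        z = tf.identity(y) * 10.0
    return z

print(step(tf.constant([[1.0, 4.0]])).numpy(), counter.numpy())   # [[140.]] 1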

TENSORFLOW IMPLEMENTATION:

The main components in a TensorFlow system are the client, which uses the
Session interface to communicate with the master, and one or more worker
processes, with each worker process responsible for arbitrating access to one
or more computational devices (such as CPU cores or GPU cards) and for
executing graph nodes on those devices as instructed by the master. We have
both local and distributed implementations of the TensorFlow interface. The
local implementation is used when the client, the master, and the worker all
run on a single machine in the context of a single operating system process
(possibly with multiple devices, if for example, the machine has many GPU
cards installed). The distributed implementation shares most of the code with
the local implementation, but extends it with support for an environment
where the client, the master, and the workers can all be in different processes
on different machines. In our distributed environment, these different tasks
are containers in jobs managed by a cluster scheduling system. Most of what follows applies to both implementations, while some issues are particular to the distributed implementation.
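A minimal sketch of the local mode through the legacy Session interface (tf.compat.v1 in current TensorFlow): the client builds a graph and asks the in-process master to run it, and a worker executes the nodes on an available device.

import tensorflow as tf

tf.compat.v1.disable_eager_execution()     # use the graph-and-session execution model

graph = tf.Graph()
with graph.as_default():                   # client: build the dataflow graph
    a = tf.constant([[1.0, 2.0]])
    b = tf.constant([[3.0], [4.0]])
    product = tf.matmul(a, b)

# The client uses the Session interface; the in-process master places the matmul
# node on a device and a worker executes it.
with tf.compat.v1.Session(graph=graph) as sess:
    print(sess.run(product))               # [[11.]]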

Data Parallel Training

One simple technique for speeding up SGD is to parallelize the computation of the gradient for a mini-batch across mini-batch elements. For example, if we
are using a mini-batch size of 1000 elements, we can use 10 replicas of the
model to each compute the gradient for 100 elements, and then combine the
gradients and apply updates to the parameters synchronously, in order to
behave exactly as if we were running the sequential SGD algorithm with a
batch size of 1000 elements. In this case, the TensorFlow graph simply has
many replicas of the portion of the graph that does the bulk of the model
computation, and a single client thread drives the entire training loop for this
large graph. The TensorFlow system shares some design characteristics with its predecessor system, DistBelief, and with later systems with similar designs like Project Adam and the Parameter Server project. Like DistBelief and Project
Adam, TensorFlow allows computations to be spread out across many
computational devices across many machines, and allows users to specify
machine learning models using relatively high-level descriptions. Unlike
DistBelief and Project Adam, though, the general-purpose dataflow graph
model in TensorFlow is more flexible and more amenable to expressing a wider
variety of machine learning models and optimization algorithms.
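One way to express this kind of synchronous data parallelism in current TensorFlow is tf.distribute.MirroredStrategy, which keeps one model replica per visible device and combines the per-replica gradients before each update (the tiny model and random data below are placeholders):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()        # one replica per visible GPU (or CPU)
with strategy.scope():                             # variables are mirrored across replicas
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

x = tf.random.normal((1000, 8))                    # placeholder data
y = tf.random.normal((1000, 1))
# The global batch of 1000 is split across the replicas; gradients are aggregated
# so the update matches sequential SGD on the full batch.
model.fit(x, y, batch_size=1000, epochs=1)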
LIMITATIONS:

The VGG-16 and VGG-19 architecture consist of large kernel-size filters with
multiple 3×3 kernel-size filters, one after another. Within a given receptive field
(the effective area size of an input image on which output depends), multiple
stacked smaller sized kernels are better than a single larger sized kernel because
multiple non-linear layers increase the depth of the network, enabling it to learn
more complex features at a lower cost. As a result, the 3×3 kernels in the VGG architecture help to retain finer details of an image. The ResNet architecture is similar to the VGG model, consisting mostly of 3×3 filters. Additionally, the ResNet model has a network depth as large as 152 layers. Therefore, it achieves better accuracy than VGG and GoogLeNet, while being
computationally more efficient than VGG. While the VGG and ResNet models
achieve phenomenal accuracy, their deployment on even the most modest sized
GPUs is a problem because of the massive computational requirements, both in
terms of memory and time. There are several limitations to this study. First, an external test dataset from multiple medical centers was not included for reproducibility. With X-ray equipment, there is a relatively small difference in performance compared to other medical imaging equipment, depending on the manufacturer or model; in this study, therefore, an external test dataset from other medical centers was not included. In the case of a local medical center using relatively old equipment, however, an additional performance evaluation is required before artificial intelligence (AI) assistive software can be utilized. Second, the proposed majority decision algorithm was optimized to evaluate only maxillary sinusitis, so it is limited in evaluating sinusitis in the frontal, ethmoid, and sphenoid sinuses. In order to utilize AI-based assistive software in the future, further study is underway because it is necessary to evaluate sinusitis at other locations as well as the maxillary sinus. Third, the method lacks pattern recognition and representation methods that can address the black-box problem in deep learning. A reasonable consensus needs to be determined for solving the
black-box problem. The feature recognition based activation map was used to
solve the black-box problem in deep learning. As can be seen from the results, not only the classification but also the lesion localization can be expressed as a result. This helps medical doctors make a reasonable inference about the deep learning analysis. However, it is not enough to understand all deep learning procedures. For example, it is difficult to understand the pattern of each learned
CNN model. By understanding the pattern recognition capabilities of each
model, we can understand the advantages and disadvantages of each model and
achieve the optimization of the overall AI system. To overcome this limitation,
a feature connectivity representation should be available for each layer to
determine which feature weights are strong. In addition to feature
representation, a text-based description algorithm can be applied to overcome the black-box limitation in a medical application by using a convolutional recurrent neural network (CRNN), which is a combination of a CNN and a recurrent neural network (RNN) (20,21). A majority decision algorithm with multiple CNN
models was shown to have high accuracy and significantly more accurate lesion
detection ability compared to individual CNN models. The proposed deep
learning method using PNS X-ray images can be used as an adjunct tool to help
improve the diagnostic accuracy of maxillary sinusitis.
