A Hybrid Data Model To Share Medical Images
International Journal of Computer Applications (0975 – 8887)
Volume 161 – No 9, March 2017
medical images is the need of the hour. The healthcare industry is moving to the cloud, and this adoption is essential to handle the huge storage that medical images require. In [6] the authors discuss the advantages of NoSQL databases over RDBMSs. As the literature shows the suitability of NoSQL databases for handling medical images, MongoDB is considered in this work. A hybrid data model that handles huge medical images effectively is essential. This paper aims to propose a hybrid data model for sharing medical images in the cloud using a distributed sharding environment. Medical images can be shared through the cloud, and non-relational storage is necessary to handle them in that environment. MongoDB is a document database that is well suited to storing information in the cloud. The concept of sharding supported by MongoDB allows a huge medical image to be partitioned into chunks and moved to a distributed environment [2,3]; studying this is the objective of this paper.

2.2 Robustness of NoSQL Databases
The need for a NoSQL database to handle medical images in DICOM format is unavoidable. The salient features of NoSQL databases lie in the ways they differ from a traditional RDBMS. NoSQL databases are better at handling unstructured data. They differ in data model, architecture, data distribution, and performance.
• Data model: A NoSQL database has a flexible schema, whereas the data model of an RDBMS follows a rigid schema and can handle only structured data. A NoSQL database is capable of handling all types of data: structured, semi-structured, and unstructured.
• Architecture: A NoSQL system can operate in a distributed, scale-out design, whereas RDBMSs are architected in a centralized way.
• Data distribution model: A NoSQL database allows data to be distributed evenly to all nodes making up a database cluster and enables both reads and writes on all machines, whereas an RDBMS, which works in a centralized fashion, makes it difficult to distribute data to the clients. In NoSQL, the distributed model makes it possible to process huge data in parallel.
• Scaling and performance model: A NoSQL database scales horizontally based on the load by adding extra nodes that deliver increased performance in a linear manner, whereas an RDBMS typically scales vertically by adding extra CPU, RAM, etc., to a centralized machine [2].

3. HYBRID DATA MODEL FOR MEDICAL IMAGE SHARING
As mentioned before, medical imaging plays a vital role in both decision making and treatment support. Sharing medical images with low access latency is needed to provide the best quality health care service.

3.1 Medical Images storage structures
The DICOM standard is capable of integrating almost all modern imaging equipment, networking servers, accessories, and picture archiving and communication systems (PACS) from different manufacturers [1]. A DICOM image file is a file that follows the Digital Imaging and Communications in Medicine standard. To be more specific, image files that are compliant with Part 10 of the DICOM standard are referred to as “DICOM format files” or simply “DICOM files” and have the extension “.dcm” [16]. Due to this ease of integration, this communication standard has reached a nearly universal level of acceptance among vendors of radiological equipment.

3.2 Parts of a DICOM file
The Digital Imaging and Communications in Medicine (DICOM) standard adopts files as individual, self-contained repositories for the storage of a mix of alphanumerical and binary content regarding radiological images.

The DICOM protocol is the default standard for image data management in healthcare. A DICOM file contains two parts stored as a single object: i) a header that stores metadata and ii) image data stored as pixels. The header stores details about the patient and the acquisition parameters for the imaging study. It also stores image dimensions, matrix size, color space, and a host of additional non-intensity information required by the computer to correctly display the image. The header is followed by the image data stored as a long series of 0s and 1s, which can be reconstructed as the image by using the information from the header. Fig 3.1 shows a sample DICOM file.

Fig 1. Structure of a DICOM image file

The process of medical diagnosis relies on the technological capabilities of medical imaging and image analysis. Diagnosis by physicians relies on the accuracy of the medical images and on the conclusions drawn from them. The final medical prescription depends on the capabilities of the medical imaging computer systems that aid diagnosis [7].

3.3 Medical Image Sharing
Today, medical diagnosis involves sharing medical images. The most important challenge in implementing a sharing system for medical diagnosis using medical imaging is choosing the right data storage technology. The development of information technology has produced different solutions for handling images, and the methods have changed over time. The various methods are discussed below.

3.3.1 File Systems
Initially, images were stored in files outside databases, and only their paths were kept inside databases. This was referred to as the file system approach. Usually, groups of DICOM files are hierarchically organized in studies and series, physically arranged into file system directory trees. Despite their simplicity in storing content, ordinary file systems do not provide index
capabilities that allow searches by content, which restricts access to lookups by directory name and file name.

3.3.2 RDBMS
To overcome this limitation, Picture Archiving and Communication Systems (PACSs) often adopt Relational Database Management Systems (RDBMSs) as metadata repositories, benefiting from their general-purpose index structures.

Then BLOB (Binary Large Object), a new data type, was introduced, which made it possible to store images in an RDBMS. Even though relational databases are the most popular technology for data storage, it is accepted that the BLOB is not the best solution for binary data storage. SQL is poorly suited to handling binary content, since the content inside a BLOB cannot be queried directly from SQL.

3.3.3 NoSQL
NoSQL (Not Only SQL) databases, which are non-relational in nature, can handle multimedia content with ease. As there is substantial growth of multimedia data in binary form, it is essential to look into non-relational solutions to handle images, and medical images in particular. So the use of NoSQL in handling medical images, for the many reasons discussed in Section 2, is to be considered. The primary reasons are i) the ability to handle binary content, as NoSQL databases have JSON as their native storage format, and ii) the ability to handle huge data files, as they are scalable. This approach has gained the general name of the NoSQL approach.

3.4 Hybrid Data model
Medical images in DICOM format have to be stored and also shared. Sharing medical images transfers large amounts of data across the network. This may lead to network bottlenecks and congestion, which increase latency. This can be avoided, and better bandwidth utilization is possible. A data model using MongoDB is presented here. It transfers medical images using a distributed sharding environment. Through sharding it is possible to distribute the huge data and process it in parallel.

Data needs to be processed in such a way that computations on it can be performed on isolated subsets and then combined to generate the desired output [15].

MongoDB has the ability to shard and distribute the data and process it in parallel.

3.4.2 Sharding
The process of splitting data up and storing the different portions on different machines is called sharding; the term partitioning also describes this concept. By splitting the data and storing it across many machines, it is possible to handle more load without using powerful servers, and to handle huge files without requiring large or powerful machines.

A single server’s capacity is challenged when handling large data sets or huge medical images. Applications that handle huge medical image data can be categorized as Big Data applications that demand high throughput. Data larger than the system’s RAM stresses the I/O capacity of disk drives. A shard is a computer connected to a cluster of machines used in the sharding process.

There are two types of sharding methods: i) manual sharding and ii) automatic sharding.

3.4.2.1 Manual Sharding
Manual sharding is when the application connects to different independent servers. The sharding process is handled by the application code, which stores the data on different servers and retrieves it by querying the appropriate server. This approach becomes difficult to maintain when nodes are added to or removed from the database cluster to maintain the load patterns.

3.4.2.2 Auto Sharding
Auto sharding is the process in which the data gets evenly distributed across the shards, that is, the computers connected to the sharding environment. The data is chunked and sent across. The balancer puts approximately the same number of chunks into each shard (system) connected to the sharding cluster.
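The chunk-and-distribute idea behind auto sharding can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MongoDB's actual balancer: the function names `split_into_chunks` and `assign_chunks`, the chunk size, and the round-robin placement are hypothetical choices made for this example. It only shows that cutting a large byte string into fixed-size chunks and dealing them out in turn leaves every shard with approximately the same number of chunks.

```python
from collections import Counter


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut a byte string into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def assign_chunks(num_chunks: int, num_shards: int) -> list[int]:
    """Return, for each chunk index, the shard index that receives it.

    Round-robin placement is an assumption of this sketch; it is used only
    because it keeps the per-shard chunk counts within one of each other.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return [i % num_shards for i in range(num_chunks)]


# Hypothetical stand-in for a DICOM file: 10,000 bytes of dummy data.
image = bytes(10_000)
chunks = split_into_chunks(image, chunk_size=1_024)
placement = assign_chunks(len(chunks), num_shards=3)
chunks_per_shard = Counter(placement)

# Joining the chunks in order restores the original byte string.
assert b"".join(chunks) == image
print(len(chunks), dict(chunks_per_shard))
```

With manual sharding, logic of this kind lives in the application and has to be revisited whenever a node is added or removed; with auto sharding, the equivalent bookkeeping is done by the database cluster.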
and sent. Here the sharding process is abstracted from the application. Sharding should be used only when there is a need to handle large objects with ease, in which case it improves performance.

Fig 4.1 shows a non-sharded MongoDB setup, where a client connects to a mongod process. Here there is no cluster of machines, so huge files cannot be handled with ease.

Fig 2

A non-sharded MongoDB setup is shown in Fig 3, where a client connects to a mongod process. Here there is no cluster of machines, so huge files cannot be handled with ease. The latency increases because the data is uploaded directly to the network for sharing. This method fails in handling huge medical images.

Fig 3

4.1 Setting up a Sharding environment
Sharding basically involves three different components working together:

shard
A shard is a container that holds a subset of a collection’s data. Even if there are many servers in a shard, there is only one master, and all of the servers contain the same data.

mongos
This is the router process and comes with all MongoDB distributions. It basically just routes requests and aggregates responses. It doesn’t store any data or configuration information.

Config Server
Config servers store the configuration of the cluster: which data is on which shard. Because mongos doesn’t store anything permanently, it needs somewhere to get the shard configuration. It syncs this data from the config servers.

5. EXPERIMENTAL SETUP
A sharding environment was set up using 7 systems running the Ubuntu operating system and MongoDB 3.03. The machines had a 6th Generation Intel(R) Core(TM) i5-6200U processor (3M cache, up to 2.80 GHz). The setup is as given below.

We set up one config server, one query router, and 3 shards. One system was a client and another was a server.

We studied the time complexity, measured as the elapsed time, of sharing or storing DICOM image files from a client machine to a server.

First the config server was set up, and then the query router. Then the shards were added one by one to the config server. Sharding was enabled at the database level, and the time complexity was studied for huge DICOM files. The file sizes varied from 1 GB to 5 GB. The study was carried out for sharing data from a client to a server machine through shards.

6. RESULT
6.1 Time Complexity with Sharding
We shared the medical images in a sharded and a non-sharded environment and recorded the time in each. The latency in the non-sharded environment is much higher than the latency in the sharded environment. The results of the non-sharded environment are shown in Table 1.

The study was also carried out in a sharded environment. The results indicate that the latency decreased as we increased the number of shards. The time taken to store was much higher with two shards and much lower with 3 or more shards.

Table-1 Time in a Non-Sharded Environment
Size(MB)    Three shards(Mins)    No shards(Mins)
1000        1.5                   3.1
2000        2.14                  13.3
3000        2.8                   19.1
4000        3.1                   24.4
5000        3.8                   35.2
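The gap between the two columns of Table 1 can be expressed as a speedup factor: the time without sharding divided by the time with three shards. The short Python sketch below computes it from the values reported in Table 1. The variable names are illustrative, and the times are assumed to be decimal minutes as printed.

```python
# Times from Table 1, in minutes, keyed by file size in MB.
# Assumption: the printed values are decimal minutes.
three_shards = {1000: 1.5, 2000: 2.14, 3000: 2.8, 4000: 3.1, 5000: 3.8}
no_shards = {1000: 3.1, 2000: 13.3, 3000: 19.1, 4000: 24.4, 5000: 35.2}

# Speedup = time without sharding / time with three shards.
speedup = {size: no_shards[size] / three_shards[size] for size in three_shards}

for size, factor in speedup.items():
    print(f"{size} MB: {factor:.2f}x faster with three shards")
```

Under these assumptions the factor grows from about 2.1 at 1000 MB to about 9.3 at 5000 MB.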
Table-2 Two and Three Shards comparison
Size(MB)    Three shards    Two Shards
1000        1.5             1.8
2000        2.14            5.21
3000        2.8             8.16
4000        3.1             12.23
5000        3.8             16.8

The following is the graphical representation of the same.

Fig 5. Sharding vs NoSharding

Fig 5 Three Shards vs Two Shards

7. CONCLUSION AND FUTURE WORK
7.1 Conclusion
The study shows the effect of parallel processing: there is a great reduction in time as the number of machines used to distribute the data is increased. The time taken to share an image is reduced by distributing the data using sharding, and it is reduced further with more shards. The more the data is distributed, the less time it takes. This clearly indicates that huge data can be shared in a sharded environment with ease. The main challenge with medical images was handling huge images: sharing, storing, and retrieval. The degradation of performance as the size increases can be overcome with this model. Health care departments using telemedicine and sharing medical images can benefit greatly from this model. It can be extended to any radiological department that shares medical images.

7.2 Future Work
This data model is suitable for medical image processing in the cloud environment. Health care informatics data grows day by day. The amount of data is huge, and health care providers are on the verge of moving health care information and medical images to the cloud. Moving medical images to the cloud needs a specific data model in which large amounts of data can be shared and processed without major network bottlenecks. There is also a huge need to process and analyze the huge volumes of data stored in the cloud. This model will be highly suitable for sharing and analyzing huge images and health informatics data stored in the cloud. In the future, it is possible to develop a model to move medical images to the cloud using this method.

8. ACKNOWLEDGMENTS
The authors wish to thank Gautham, Manikanda and Yogesh, MCA students of REVA University, Bangalore, for their contribution to this work.

9. REFERENCES
[1] Oleg S. Pianykh, "Digital Imaging and Communications in Medicine (DICOM): A Practical Introduction and Survival Guide", Springer-Verlag Berlin Heidelberg, pp. 247-261, 2008 and 2012.
[2] Yimeng Liu, Yizhi Wang, Yi Jin, "Research on The Improvement of MongoDB Auto-Sharding in Cloud Environment", IEEE, 978-1-4673-0242-5, 2012.
[3] Kristina Chodorow, Michael Dirolf, "Scaling MongoDB".
[4] Alexandre Savaris, Theo Härder, Aldo von Wangenheim, "DCMDSM: A DICOM decomposed storage model", Journal of the American Medical Informatics Association, February 2014.
[5] Alexandre Savaris, Gabriela Bussolo Colonetti, Rodrigo Rodrigues Pires de Mello, Aldo von Wangenheim, "Relational Databases versus Search Engines: A Performance Comparison for Storing and Querying DICOM Metadata".
[6] D. Revina Rebecca, I. Elizabeth Shanthi, "A NoSQL Solution to efficient storage and retrieval of Medical Images", International Journal of Scientific & Engineering Research, Volume 7, Issue 2, February 2016, ISSN 2229-5518.
[7] Liliana Byczkowska-Lipińska, Agnieszka Wosiak, "Multimedia NoSQL database solutions in the medical imaging data analysis".
[8] Simón J. Rascovsky, Jorge A. Delgado, Alexander Sanz, Víctor D. Calvo, Gabriel Castrillón, "Use of CouchDB for Document-based Storage of DICOM Objects".
[9] Luís A. Bastião Silva, Louis Beroud, Carlos Costa, José Luis Oliveira, "Medical imaging archiving: a comparison between several NoSQL", IEEE, 978-1-4799-2131-7, 2014.
[10] Luan Henrique Santos Simões de Almeida, Marcelo Costa Oliveira, "A Medical Image Backup Architecture Based on a NoSQL Database and Cloud Computing Services", MEDINFO 2015: eHealth-enabled Health,
IJCATM : www.ijcaonline.org