International Journal of Computer Applications (0975 – 8887)

Volume 161 – No 9, March 2017

A Hybrid Data Model to Share Medical Images


D. Revina Rebecca
Research Scholar
Avinashilingam University
Coimbatore, India

I. Elizabeth Shanthi, PhD
Associate Professor
Avinashilingam University
Coimbatore, India

ABSTRACT
The challenges involved in effectively storing, retrieving and sharing medical images have led researchers to look into various means and methods of doing so. A hybrid data model that addresses these challenges is the need of the hour. In previous work, the suitability of NoSQL databases for storing and retrieving medical images was analyzed, and MongoDB, a NoSQL database, was found suitable for handling medical images. A better way to transfer medical images is also needed: since medical images are huge, sharing them with minimal latency is a challenge. This paper proposes a model based on a distributed strategy using a sharding environment. It may be considered a hybrid data model that uses MongoDB to share and handle medical images, storing and retrieving them through parallel processing and by distributing the data across many machines. The aim of this paper is to study the effectiveness of the sharding (distributed processing) concepts available in NoSQL databases and how they help make better use of bandwidth when sharing huge medical images.

General Terms
Health Informatics, Distributed Databases, Sharding, Cloud Computing.

Keywords
DICOM, Cloud Computing, MongoDB, Chunked Storage, Sharding, Parallel Processing, Medical Images.

1. INTRODUCTION
Today it is possible to share almost any information instantly, but the instant sharing of huge medical images still poses challenges. The cloud can support the instant storing of medical images, but the literature offers little guidance on the means and methods of doing so. The limitations of relational databases in the cloud have led to a number of cloud databases, also referred to as NoSQL databases. Because NoSQL databases allow flexible data modeling, it is necessary to recommend a suitable data model that works well both with cloud technology and with medical images. Such a data model, suited to NoSQL databases and to medical images, would ease the move to cloud technology. A hybrid data model is therefore proposed for handling medical images with high throughput and minimal network latency. This paper is structured as follows. Section 2 discusses related work on handling medical images. Section 3 discusses medical image storage methods and the need for a better way to transfer medical images using a distributed, sharding-based methodology. Section 4 discusses the implementation of sharding in MongoDB. Sections 5 and 6 present the experimental setup and the results, and Section 7 presents conclusions and future work.

2. BACKGROUND RESEARCH & CHALLENGES
2.1 DICOM and NoSQL
The Digital Imaging and Communications in Medicine (DICOM) protocol is the de facto standard for image data management in healthcare. A DICOM file contains two parts stored as a single object: i) a header that stores metadata and ii) image data stored as pixels. Medical images in DICOM format are acquired from different types of modern modalities such as CT scanners, MR and X-ray. These images are huge, and the challenge lies in transmitting or sharing them, as required in telemedicine or teleradiology. Few attempts have been made to improve the data transmission time between medical imaging systems.
Rascovsky et al. [8] developed a CouchDB-based solution to store medical images. The authors argue that an RDBMS is ill-suited to storing and accessing DICOM metadata: DICOM objects are heterogeneous and cannot readily be represented in an RDBMS, and a DICOM object can be loaded into an RDBMS only if most of its metadata is stripped out.
The authors also concluded that document-based databases are suitable for storing medical images, as they do not share this limitation of RDBMSs. Document-based databases are better suited than RDBMSs for storing and retrieving DICOM objects because they are schema-less; DICOM objects are freely structured and cannot be forced to fit a predefined RDBMS schema.

Luís A. Bastião Silva et al. [9] developed PACS archives based on MongoDB and CouchDB. The authors concluded that neither NoSQL database handled huge files well: performance degraded as file size increased, and they reiterated the need for a better solution for storing huge files. The study concluded that a better replication scheme was needed to handle bigger files.
A poster paper by Luan Henrique Santos Simões de Almeida et al. [10] describes work based on MongoDB. In a previous work [11], the performance of two NoSQL databases, Cassandra and MongoDB, was compared, and MongoDB was shown experimentally to perform better with huge files. We therefore consider MongoDB, a document-based database, suitable for storing medical images. The literature clearly indicates the difficulty of handling huge medical images and the need to look to NoSQL databases for moving these images to the cloud environment.

2.2 Data Modeling to Handle Huge Medical Images
As medical image sizes vary from 10 GB to 300 GB, these images are categorized as Big Data. A recent study [13] predicts strong growth in medical imaging. A hybrid data model that can handle huge medical images is the need of the hour. The healthcare industry is moving to the cloud, and this adoption is essential to provide the huge storage required for medical images. In [6], the authors discussed the advantages of NoSQL databases over RDBMSs. As the literature shows the suitability of NoSQL databases for handling medical images, MongoDB is considered in this work. This paper proposes a hybrid data model for sharing medical images in the cloud using a distributed sharding environment. Medical images can be shared using the cloud, and non-relational storage is necessary to handle them in the cloud environment. MongoDB, a document database, is well suited to storing information in the cloud, and the sharding it supports allows a huge medical image to be partitioned into chunks and moved to a distributed environment [2,3]. Studying this is the objective of this paper.

2.3 Robustness of NoSQL Databases
A NoSQL database is indispensable for handling medical images in DICOM format. The salient features of NoSQL databases lie in the ways they differ from a traditional RDBMS. NoSQL databases are better at handling unstructured data, and they differ from RDBMSs in data model, architecture, data distribution and performance.
• Data model: A NoSQL database has a flexible schema, whereas the data model of an RDBMS follows a rigid schema and can handle only structured data. A NoSQL database can handle all types of data: structured, semi-structured and unstructured.
• Architecture: A NoSQL system can operate in a distributed, scale-out design, whereas RDBMSs are architected in a centralized way.
• Data distribution model: A NoSQL database allows data to be distributed evenly to all nodes of a database cluster and enables both reads and writes on all machines, whereas an RDBMS, working in a centralized fashion, makes it difficult to distribute data. In NoSQL, it is this distributed model that enables huge data to be processed in parallel.
• Scaling and performance model: A NoSQL database scales horizontally with the load by adding extra nodes, which deliver increased performance in a linear manner, whereas an RDBMS typically scales vertically by adding extra CPU, RAM, etc. to a centralized machine [2].

3. HYBRID DATA MODEL FOR MEDICAL IMAGE SHARING
As mentioned before, medical imaging plays a vital role in both decision making and treatment support. Medical images must be shared with low access latency to provide the best quality of health care.

3.1 Medical Image Storage Structures
The DICOM standard can integrate almost all modern imaging equipment, networking servers, accessories and picture archiving and communication systems (PACS) from different manufacturers [1]. A DICOM image file is a file that follows the Digital Imaging and Communications in Medicine standard. More specifically, image files that comply with Part 10 of the DICOM standard are referred to as “DICOM format files” or simply “DICOM files” and have the extension “.dcm” [16]. Owing to this ease of integration, the standard has achieved a nearly universal level of acceptance among vendors of radiological equipment.

3.2 Parts of a DICOM File
The DICOM standard adopts files as individual, self-contained repositories for storing a mix of alphanumeric and binary content about radiological images. As described in Section 2.1, a DICOM file contains two parts stored as a single object: i) a header that stores metadata and ii) image data stored as pixels. The header stores details about the patient and the acquisition parameters of the imaging study. It also stores the image dimensions, matrix size, color space and a host of additional non-intensity information that the computer needs to display the image correctly. The header is followed by the image data, stored as a long series of 0s and 1s, which can be reconstructed into the image using the information in the header. Fig 1 shows the structure of a DICOM file.

Fig 1. Structure of a DICOM image file

The process of medical diagnosis relies on the technological capabilities of medical imaging and image analysis. Physicians' diagnoses depend on the accuracy of the conclusions drawn from medical images, and the final prescription depends on the capabilities of the computer systems that support medical imaging and diagnosis [7].

3.3 Medical Image Sharing
Today, medical diagnosis relies on sharing medical images. The most important challenge in implementing a sharing system for diagnosis based on medical imaging is choosing the right data storage technology. Information technology has offered different solutions for handling images, and the methods have changed over time. These methods are discussed below.

3.3.1 File Systems
Initially, images were stored in files outside databases, and only their paths were stored inside databases. This approach is referred to as file systems. Usually, groups of DICOM files are organized hierarchically into studies and series, physically laid out as file system directory trees. Despite their simplicity for storing content, ordinary file systems do not provide indexing
capabilities allowing searches by content; access is limited to directory names and file names.

3.3.2 RDBMS
To overcome this limitation, Picture Archiving and Communication Systems (PACSs) often adopt Relational Database Management Systems (RDBMSs) as metadata repositories, benefiting from their general-purpose index structures.

Later, the BLOB (Binary Large Object), a new data type, was introduced, which made it possible to store images in an RDBMS. Although relational databases are the most popular technology for data storage, it is widely accepted that the BLOB is not the best solution for binary data storage. SQL is poorly suited to binary content, since the contents of a BLOB cannot be queried with SQL.

3.3.3 NoSQL
NoSQL (Not Only SQL) databases, which are non-relational in nature, can handle multimedia content with ease. As multimedia data in binary form is growing substantially, it is essential to look into non-relational solutions for handling images, and medical images in particular. The use of NoSQL for handling medical images should therefore be considered, for the many reasons discussed in Section 2. The primary reasons are i) their ability to handle binary content, since document-oriented NoSQL databases store JSON-like documents (MongoDB, for example, uses BSON, a binary JSON format that includes a binary data type), and ii) their ability to handle huge data files, because they are scalable. This approach is generally known as the NoSQL approach.

3.4 Hybrid Data Model
Medical images in DICOM format have to be stored and also shared. Sharing medical images means transferring large amounts of data across the network, which may lead to network bottlenecks and congestion and, in turn, increased latency. This can be mitigated, and bandwidth used more effectively, by distributing the data. A data model using MongoDB is presented here that transfers medical images using a distributed sharding environment; sharding makes it possible to distribute huge data and process it in parallel.

3.4.1 Methodology
A single machine cannot hold huge data, whereas a cluster of inexpensive hardware can be leveraged to hold huge amounts of data and to store and process it effectively and efficiently. Three key goals emerged to achieve this:
• Data needs to be stored in a networked file system spread across multiple machines, rather than in a centralized system as in an RDBMS. Huge files can be chunked and stored in multiple nodes.
• Data needs to be stored in a schema-free structure, or it should be possible to change schemas without much alteration.
• Data needs to be processed so that computations can be performed on isolated subsets and then combined to generate the desired output [15].
MongoDB can shard and distribute such data and process it in parallel.

3.4.2 Sharding
The process of splitting data up and storing different portions of it on different machines is called sharding; the term partitioning is also used for this concept. By splitting data and storing it across many machines, more load can be handled without powerful servers, and huge files can be handled without requiring large or powerful machines.

A single server's capacity is challenged when handling large data sets or huge medical images. Applications that handle huge medical image data can be categorized as Big Data and demand high throughput. Data that is larger than the system's RAM stresses the I/O capacity of disk drives. A shard is a computer connected to the cluster of machines used in the sharding process.

There are two types of sharding: i) manual sharding and ii) automatic sharding.

3.4.2.1 Manual Sharding
In manual sharding, the application connects to different independent servers. Sharding is handled by application code, which stores the data on different servers and retrieves it by querying the appropriate server. This approach becomes difficult to maintain when nodes are added to or removed from the database cluster to accommodate changing load patterns.

3.4.2.2 Auto Sharding
Auto sharding is the process in which data is distributed evenly across the shards, that is, the computers connected to the sharding environment. The data is split into chunks and sent across, and the balancer places approximately the same number of chunks on each shard in the cluster.

4. MONGODB AND SHARDING
MongoDB supports autosharding, which eliminates the administrative overhead involved in manual sharding. As mentioned earlier, the sharded cluster manages the splitting up of data and rebalances it automatically. MongoDB sharding can support applications with very large data sets that need high-throughput operations with minimal latency [3].

4.1 Autosharding in MongoDB
MongoDB performs autosharding by breaking up the data stored in collections into smaller chunks. A cluster of computers can be connected to the sharding environment, and the chunks are distributed evenly across the shards, so that each shard stores a subset of the total data set. A routing process called mongos keeps track of where all of the data is located, which keeps the distribution transparent to the application. Applications connect to the router, which obtains the metadata; because the router knows which data is on which shard, it can forward requests to the appropriate shard(s). Fig 2 shows the sharding process, with three shards connected to the router (mongos). When the client sends data, the data is evenly distributed
and sent. The sharding process is thus abstracted from the application. Sharding is intended for situations where large objects must be handled with ease, and in those situations it improves performance.

Fig 2

A non-sharded MongoDB setup is shown in Fig 3, where a client connects to a single mongod process. There is no cluster of machines, so huge files cannot be handled with ease. Latency increases because the data is uploaded directly to the network for sharing. This method fails to handle huge medical images.

Fig 3

4.2 Setting up a Sharding environment
Sharding involves three components working together:
Shard
A shard is a container that holds a subset of a collection's data. Even if there are many servers in a shard, there is only one master, and all of the servers contain the same data.
mongos
This is the router process, which comes with all MongoDB distributions. It routes requests and aggregates responses. It does not store any data or configuration information.
Config Server
Config servers store the configuration of the cluster: which data is on which shard. Because mongos does not store anything permanently, it needs somewhere to get the shard configuration, and it syncs this data from the config servers.

5. EXPERIMENTAL SETUP
A sharding environment was set up using 7 systems running the Ubuntu operating system and MongoDB 3.0.3. Each machine had a 6th Generation Intel(R) Core(TM) i5-6200U processor (3M cache, up to 2.80 GHz). The setup was as follows: one config server, one query router and 3 shards, plus one system acting as the client and another acting as the server.
We studied the time taken to share (store) DICOM image files from a client machine to a server. First the config server was set up, then the query router, and then the shards were added one by one. Sharding was enabled at the database level, and the time taken was measured for huge DICOM files whose sizes varied from 1 GB to 5 GB. The study was carried out for sharing data from a client to a server machine through the shards.

Fig 4. Splitting/Chunking of Huge DICOM files

6. RESULT
6.1 Time Taken with Sharding
We shared the medical images in both a sharded and a non-sharded environment and recorded the time taken in each. The latency in the non-sharded environment is much higher than in the sharded environment. The results are shown in Table 1.
The study was also carried out with different numbers of shards. The results indicate that latency decreased as the number of shards increased: the time taken to store the files was much higher with two shards than with three.

Table 1. Time in a Non-Sharded Environment (compared with three shards)

Size (MB)    Three shards (min)    No shard (min)
1000         1.5                   3.1
2000         2.14                  13.3
3000         2.8                   19.1
4000         3.1                   24.4
5000         3.8                   35.2
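To make the gap in Table 1 easier to read, the short Python sketch below derives two figures from each row: the speedup (non-sharded time divided by three-shard time) and the average rate implied by dividing file size by elapsed time. The values are copied from Table 1 and nothing is re-measured; treating size divided by time as an "effective throughput" is an interpretation, not a metric reported by the experiment, and the names TABLE_1, speedup, throughput_mb_per_min and summarize are illustrative only.

```python
"""Derive speedup and effective throughput from Table 1 (values copied from the paper)."""

# File size in MB -> (time with three shards, time with no sharding), both in minutes.
TABLE_1 = {
    1000: (1.5, 3.1),
    2000: (2.14, 13.3),
    3000: (2.8, 19.1),
    4000: (3.1, 24.4),
    5000: (3.8, 35.2),
}


def speedup(sharded_min: float, unsharded_min: float) -> float:
    """How many times faster the sharded run was than the non-sharded run."""
    return unsharded_min / sharded_min


def throughput_mb_per_min(size_mb: int, minutes: float) -> float:
    """Average rate implied by one row: file size divided by elapsed time."""
    return size_mb / minutes


def summarize(table: dict) -> list:
    """Return (size, speedup, three-shard MB/min, no-shard MB/min) per row, by size."""
    rows = []
    for size_mb, (three_shards, no_shard) in sorted(table.items()):
        rows.append((
            size_mb,
            round(speedup(three_shards, no_shard), 2),
            round(throughput_mb_per_min(size_mb, three_shards), 1),
            round(throughput_mb_per_min(size_mb, no_shard), 1),
        ))
    return rows


if __name__ == "__main__":
    print(f"{'Size (MB)':>9} {'Speedup':>8} {'3 shards MB/min':>16} {'No shard MB/min':>16}")
    for size, s, rate3, rate0 in summarize(TABLE_1):
        print(f"{size:>9} {s:>8.2f} {rate3:>16.1f} {rate0:>16.1f}")
```

Applied to the table values, the speedup rises from about 2.1x at 1000 MB to about 9.3x at 5000 MB, which is consistent with the observation that sharding matters more as file size grows.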

Table 2. Two and Three Shards Comparison

Size (MB)    Three shards (min)    Two shards (min)
1000         1.5                   1.8
2000         2.14                  5.21
3000         2.8                   8.16
4000         3.1                   12.23
5000         3.8                   16.8

The following figures show the same results graphically.

Fig 5. Sharding vs No Sharding

Fig 6. Three Shards vs Two Shards

7. CONCLUSION AND FUTURE WORK
7.1 Conclusion
The study shows the effect of parallel processing: the time taken drops considerably as the number of machines used to distribute the data increases. Distributing the data through sharding reduces the time taken to share an image, and the time falls further as shards are added; the more widely the data is distributed, the less time sharing takes. This indicates that huge data can be shared with ease in a sharded environment. The main challenges with medical images lay in handling, sharing, storing and retrieving huge images. The performance degradation as file size increases is greatly reduced with this model: with three shards, the time grew from 1.5 to 3.8 minutes as the file size rose from 1000 MB to 5000 MB, compared with 3.1 to 35.2 minutes without sharding (Table 1). Health care departments that use telemedicine and share medical images can benefit greatly from this model, and it can be extended to any radiological department that shares medical images.

7.2 Future Work
This data model is suitable for medical image processing in the cloud environment. Health care informatics data grows day by day, and health care providers are on the verge of moving health care information and medical images to the cloud. Moving medical images to the cloud requires a specific data model in which large amounts of data can be shared and processed without significant network bottlenecks. There is also a great need to process and analyze the huge volumes of data stored in the cloud. This model is expected to be well suited to sharing and analyzing huge images and health informatics data stored in the cloud. In future, a model can be developed to move medical images to the cloud using this method.

8. ACKNOWLEDGMENTS
The authors wish to thank Gautham, Manikanda and Yogesh, MCA students of REVA University, Bangalore, for their contribution to this work.

9. REFERENCES
[1] Oleg S. Pianykh, Digital Imaging and Communications in Medicine (DICOM): A Practical Introduction and Survival Guide, Springer-Verlag Berlin Heidelberg, 2008 and 2012, pp. 247-261.
[2] Yimeng Liu, Yizhi Wang, Yi Jin, Research on the Improvement of MongoDB Auto-Sharding in Cloud Environment, IEEE, 978-1-4673-0242-5, 2012.
[3] Kristina Chodorow, Michael Dirolf, Scaling MongoDB.
[4] Alexandre Savaris, Theo Härder, Aldo von Wangenheim, DCMDSM: A DICOM Decomposed Storage Model, Journal of the American Medical Informatics Association, February 2014.
[5] Alexandre Savaris, Gabriela Bussolo Colonetti, Rodrigo Rodrigues Pires de Mello, Aldo von Wangenheim, Relational Databases versus Search Engines: A Performance Comparison for Storing and Querying DICOM Metadata.
[6] D. Revina Rebecca, I. Elizabeth Shanthi, A NoSQL Solution to Efficient Storage and Retrieval of Medical Images, International Journal of Scientific & Engineering Research, Volume 7, Issue 2, February 2016, ISSN 2229-5518.
[7] Liliana Byczkowska-Lipińska, Agnieszka Wosiak, Multimedia NoSQL Database Solutions in the Medical Imaging Data Analysis.
[8] Simón J. Rascovsky, Jorge A. Delgado, Alexander Sanz, Víctor D. Calvo, Gabriel Castrillón, Use of CouchDB for Document-based Storage of DICOM Objects.
[9] Luís A. Bastião Silva, Louis Beroud, Carlos Costa, José Luis Oliveira, Medical imaging archiving: a comparison between several NoSQL, IEEE, 978-1-4799-2131-7/14/$31.00, 2014.
[10] Luan Henrique Santos Simões de Almeida, Marcelo Costa Oliveira, A Medical Image Backup Architecture Based on a NoSQL Database and Cloud Computing Services, MEDINFO 2015: eHealth-enabled Health,
doi:10.3233/978-1-61499-564-7-929.
[11] E. Marcos, C. J. Acuña, B. Vela, J. M. Cavero, J. A. Hernández, A database for medical image management, Computer Methods and Programs in Biomedicine, vol. 86, pp. 255-269, 2007, Elsevier Ireland Ltd.
[12] D. Revina Rebecca, I. Elizabeth Shanthi, Analysing the Suitability of Storing Medical Images in NoSQL Databases, International Journal of Scientific & Engineering Research, Volume 7, Issue 6, June 2016, ISSN 2229-5518.
[13] http://www.siemens.com/innovation/en/home/pictures-of-the-future/health-and-well-being/medical-imaging-facts-and-forecasts.html
[14] Yan Hu, Fangjie Lu, Israr Khan, Guohua Bai, A Cloud Computing Solution for Sharing Healthcare Information.
[15] D. Revina Rebecca et al., Impact of Adapting Cloud Computing in Health Care Industry for Storing Medical Images.
[16] Dandu Ravi Varma, Managing DICOM images: Tips and tricks for the radiologist, Indian J Radiol Imaging, 2012 Jan-Mar; 22(1): 4-13, doi: 10.4103/0971-3026.95396.