Paper 150
{cmgr4,aor14}@alu.ua.es, {jperal,jtrujillo}@dlsi.ua.es
1 Introduction
be easy to develop and manage, and it offers high performance through the internal
grouping of relevant data [1-3].
Every instance or combination of data is called a Document, and documents can
be grouped into Collections (equivalent to tables in relational databases) with no
predefined schema. Some fields in documents can be indexed for higher performance.
These elements form the main hierarchy of MongoDB. The main difference between
MongoDB and relational database tools is that the concepts of tables, rows and
columns do not exist; instead, documents with different structures are used.
A group of fields forms a document, and a group of documents forms a collection.
A database is composed of collections. Finally, in a real scenario, a server can
host many databases.
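As an illustration of this hierarchy, the following minimal sketch models a server, a database, a collection and two documents as plain JavaScript objects. All names and values are hypothetical and are not taken from the dataset used later in the paper; the point is only that two documents in the same collection may have different fields.

```javascript
// Hypothetical in-memory model of the hierarchy:
// server -> databases -> collections -> documents -> fields.
const server = {
  Hospital: {            // a database
    Patient: [           // a collection with no predefined schema
      { _id: 1, name: "Alice", address: "1 Main St" },
      { _id: 2, name: "Bob", phone: "555-0100", allergies: ["latex"] },
    ],
  },
};

// Return the sorted field names of every document in a collection.
function fieldSets(collection) {
  return collection.map((doc) => Object.keys(doc).sort());
}

// The two documents share a collection but not a structure.
const shapes = fieldSets(server.Hospital.Patient);
console.log(shapes);
```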
As mentioned before, MongoDB does not enforce a standard data schema, but this
does not mean that the data will be inconsistent. In fact, structured documents are
very common in MongoDB, so managing the data is not difficult. For data
transfer of documents, MongoDB uses BSON (Binary JavaScript Object Notation),
a binary representation of data structures designed to be lighter and
more efficient than JSON (JavaScript Object Notation). MongoDB has a series of tools
that allow a very intuitive interaction with the databases; among the most popular are:
Regarding security and access to sensitive data, MongoDB (and other
schemaless NoSQL databases) presents a serious problem: it is difficult to control
access to sensitive data automatically or semi-automatically, because
several different document structures can coexist in the same database.
Therefore, in this paper, we propose the basis for semi-automatically implementing
security rules on MongoDB (and other NoSQL databases in the future). Specifically, we
have defined a security approach focused on NoSQL document databases. Our
approach allows the specification of both structural and security aspects of
document databases. As a result, new collections of documents can easily be added and
new security and access-control rules can be defined more easily. In order to check the
applicability of our proposal, we use it to generate the code required to implement
the defined security rules on MongoDB.
The rest of the paper is structured as follows. Section 2 summarizes the most relevant
concepts and challenges in NoSQL database security. Section 3 presents our proposal
to establish security constraints on collections, fields and field content. Subsequently,
Section 4 shows a case study with a dataset of patient clinical reports. Finally, the
main contributions and our directions for future work are explained in Section 5.
2 Related Work
With regard to the incorporation of security policies in NoSQL databases (used
in Big Data technologies), several works have been proposed. However, they usually do
not consider security at the modelling stage [4-6].
Other contributions present a complete secure development of information systems.
Although they do not focus specifically on NoSQL databases and their particular security
problems, they present interesting ideas: (1) Secure TROPOS [7] is an extension that
includes security in the TROPOS methodology for software development based on the
intentional goals of agents. (2) Mokum [8] is an active object-oriented
knowledge-based system for modelling that allows the specification of security and
integrity constraints. (3) UMLsec [9] uses formal semantics to evaluate security
specifications; it supports the specification of confidentiality, integrity requirements
and access control policies. (4) MDS (Model-driven Security) [10] applies the
model-driven approach to include security properties in high-level system models and
to generate secure systems automatically.
3 Our proposal
One of the contributions of this proposal is the establishment of the security privileges
needed to access each field of the data set. This is carried out by using Natural Language
Processing (NLP) and lexical ontological resources; we have used the lexical resource
WordNet. By analyzing the values of each field, we establish three kinds of security
constraints.
0. Security collection. Not all users are allowed to access all site collections; a given
level of security is required, which limits the collections that users can access.
1. Security constraints. There are fields in which all the information is sensitive at
the same security level, that is, the sensitivity does not depend on the specific values.
For instance, the information in the address field is sensitive, and a certain security
level (for instance, SL = 1) could be required to query it. This level is the same for all
the values of this field, that is, no instance of address is more sensitive than
any other.
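A field-level constraint of this kind can be sketched as a lookup from field name to required security level. The sketch below is only an illustration under two assumptions that the text does not state: fields without an entry require level 0, and a higher clearance dominates lower ones. The names `fieldLevels` and `visibleFields` and the sample document are hypothetical.

```javascript
// Hypothetical mapping from field name to the security level required to read it.
const fieldLevels = { address: 1 };

// Return a copy of doc that keeps only the fields the clearance permits.
// Assumption: unlisted fields require level 0; higher clearance dominates.
function visibleFields(doc, levels, clearance) {
  const out = {};
  for (const [field, value] of Object.entries(doc)) {
    const required = levels[field] ?? 0;
    if (clearance >= required) out[field] = value;
  }
  return out;
}

const patient = { gender: "Female", address: "1 Main St" };
const level0View = visibleFields(patient, fieldLevels, 0); // address is hidden
const level1View = visibleFields(patient, fieldLevels, 1); // address is visible
```

The required level is attached to the field, not to its values, so every address is treated identically.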
Once the security constraints have been established, the designer models the data set
according to our security approach. In this paper we have defined an approach focused
on one kind of NoSQL database: document databases. It allows the specification of both
structural and security aspects related to document databases. It permits modeling
structural aspects such as Databases (as Packages), Collections (as Classes) and Fields
(as Properties).
The security configuration of the system to be modeled is defined from
three points of view: a hierarchical structure of Security Roles; a list of Security Levels
with the clearance levels of the users; and a set of horizontal Security Compartments or
groups. We can define security rules associated with structural elements. Each rule
indicates the actions that certain subjects can carry out on certain objects. Furthermore,
we can define fine-grained security rules which affect specific fields of a collection. This
kind of rule allows us to establish different security privileges when the values of a
field satisfy a condition.
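The interplay of roles, levels, compartments and value-dependent rules can be sketched as follows. This is a minimal illustration, not the paper's metamodel: the role names, the `diagnosis` field, the rule shape and the functions `hasRole` and `canRead` are all hypothetical.

```javascript
// Hypothetical role hierarchy: each role lists the roles it inherits from.
const roleParents = { Doctor: ["HealthStaff"], Nurse: ["HealthStaff"], HealthStaff: [] };

// True if `role` is `wanted` or inherits from it, directly or transitively.
function hasRole(role, wanted) {
  if (role === wanted) return true;
  return (roleParents[role] || []).some((parent) => hasRole(parent, wanted));
}

// A fine-grained rule: it applies only when the field value satisfies `when`.
const rule = {
  collection: "Admission",
  field: "diagnosis",
  when: (doc) => doc.diagnosis === "Oncology",
  requires: { role: "HealthStaff", level: 3, compartment: null },
};

// Decide whether a user may read the rule's field in a given document.
function canRead(user, rule, doc) {
  if (!rule.when(doc)) return true; // the rule does not apply to this document
  const r = rule.requires;
  return (
    hasRole(user.role, r.role) &&
    user.level >= r.level &&
    (r.compartment === null || user.compartments.includes(r.compartment))
  );
}

const doctor = { role: "Doctor", level: 2, compartments: ["Cardiology"] };
const denied = canRead(doctor, rule, { diagnosis: "Oncology" });     // level too low
const allowed = canRead(doctor, rule, { diagnosis: "Dermatology" }); // rule not triggered
```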
4 Case study
4.1 Description
The dataset used for the demonstration is a custom adaptation of a dataset from the UCI
Machine Learning Repository that represents the patient clinical reports of 130 hospitals
between the years 1999 and 2008 [11].
Both collections contain pre-loaded data. The documents contain data of
different patients with their admissions. Fig. 2 shows the insertion of the test documents
in the “Patient” collection, while for the “Admission” collection a reference-based
insertion technique has been used.
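The reference technique can be sketched as follows: each admission stores the identifier of its patient instead of embedding the patient document. The field names and values below are hypothetical and do not reproduce the documents of Fig. 2.

```javascript
// Hypothetical documents; field names are illustrative only.
const patients = [{ _id: 101, gender: "Female", age: "[60-70)" }];
const admissions = [
  { _id: 1, patient_id: 101, time_in_hospital: 3 }, // patient_id references Patient._id
  { _id: 2, patient_id: 101, time_in_hospital: 5 },
];

// In the mongo shell, the documents could be loaded with:
//   db.Patient.insertMany(patients);
//   db.Admission.insertMany(admissions);

// Resolve the reference locally, as an application would with a second query.
function admissionsOf(patientId) {
  return admissions.filter((a) => a.patient_id === patientId);
}

const found = admissionsOf(101);
```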
The next step is to create the role that will have access to this
View. For this we use the command “db.createRole(roleName, privileges,
otherRoles)”. Fig. 4 shows that the role is only applied to security level 2. The
“Collection: Admission2” is the view defined with restrictions and, as we can see, it is
treated as if it were a collection of the database.
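A minimal sketch of this step is given below. In the mongo shell, `db.createRole()` receives a single role document with the keys `role`, `privileges` and `roles`, and a view is created with `db.createView(view, source, pipeline)`. The role name and the projected fields are assumptions made for illustration; they are not the ones shown in Fig. 4.

```javascript
// Pipeline of the restricted view; the projected field names are hypothetical.
const admission2Pipeline = [
  { $project: { _id: 0, patient_id: 1, time_in_hospital: 1 } },
];

// Role document in the shape db.createRole() expects; the role name is hypothetical.
const level2Role = {
  role: "securityLevel2",
  privileges: [
    // The view is addressed exactly like a collection.
    { resource: { db: "Hospital", collection: "Admission2" }, actions: ["find"] },
  ],
  roles: [], // no inherited roles
};

// In the mongo shell, connected to the Hospital database:
//   db.createView("Admission2", "Admission", admission2Pipeline);
//   db.createRole(level2Role);
```

Granting `find` on the view only, and not on the underlying “Admission” collection, is what keeps the hidden fields out of reach.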
Finally, the users are created, so that the environment is entirely prepared for
testing. In Fig. 5 we use the “createUser” command to define a user that is
granted the security role 2 on the “Hospital” database.
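The user document passed to `db.createUser()` could look like the following sketch. The user name, the placeholder password and the role name are hypothetical; only the document shape (`user`, `pwd`, `roles`) follows the MongoDB shell method.

```javascript
// User document in the shape db.createUser() expects; all values are hypothetical.
const level2User = {
  user: "user_level2",
  pwd: "change-me", // placeholder, never store a real password in a script
  roles: [{ role: "securityLevel2", db: "Hospital" }],
};

// In the mongo shell, connected to the Hospital database:
//   db.createUser(level2User);
```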
In order to manage the access of users to the database and collections, it is mandatory
to include in the configuration file “mongod.conf”, inside the “#security” section, the
following setting: “security.authorization : enabled”. The file is located at
“/etc/mongod.conf” and can be edited from a terminal.
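Since “mongod.conf” is a YAML file, the dotted setting above corresponds to the following fragment. This is a sketch of the relevant section only; the rest of the file is left unchanged, and the server has to be restarted for the setting to take effect.

```yaml
# /etc/mongod.conf (fragment)
security:
  authorization: enabled
```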
2. Level 1, security constraints: Users with level 1 can only access the defined views of
“Patient” and “Admission”, since it is the lowest permission level allowed for the
database. Fig. 8 shows that the system does not authorize access to any database if the
user lacks permission, and that the view only shows the fields for which the user has
permission.
4. Level 3: Finally, the user with access level 3 is able to see all the fields and all
the data. As shown in Fig. 11, the query result displays all the available fields and
data in both the “Patient” and “Admission” collections.
Although level 3 is the highest security level, for the sake of investigation it has a
restriction on the CRUD (Create, Read, Update, and Delete) operations on the database:
it is not able to remove any data from the collections. In this way we demonstrate
that restrictions on commands can also be defined. Fig. 12 shows an
attempt by a user with this security level to execute an unauthorized command, which
results in an error.
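Such a command restriction can be sketched by granting the MongoDB privilege actions `find`, `insert` and `update` while omitting `remove`. The role name is hypothetical, and `isAllowed` is only a local helper that mimics the matching of a request against a privilege list; it is not a MongoDB function.

```javascript
// Role document for level 3; the role name is hypothetical.
const level3Role = {
  role: "securityLevel3",
  privileges: ["Patient", "Admission"].map((collection) => ({
    resource: { db: "Hospital", collection },
    actions: ["find", "insert", "update"], // "remove" is deliberately omitted
  })),
  roles: [],
};

// Local helper: is `action` granted on `collection` by this role document?
function isAllowed(role, collection, action) {
  return role.privileges.some(
    (p) => p.resource.collection === collection && p.actions.includes(action)
  );
}

const canFind = isAllowed(level3Role, "Patient", "find");
const canRemove = isAllowed(level3Role, "Patient", "remove");

// In the mongo shell: db.createRole(level3Role);
// A delete issued by a user holding only this role is then rejected
// by the server with an authorization error.
```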
5 Conclusion
6 Acknowledgements
This work is part of the Final Degree Project in the Computer Science Degree,
presented by Carlos M. García-Ruiz, Alejandro Oliver, and Jorge Espinosa in June 2017,
and advised by Jesús Peral, Juan Trujillo, Eduardo Fernández, and Carlos Blanco. This
work is partially funded by the projects SEQUOIA-UA (TIN2015-63502-C3-3-R)
and SEQUOIA-UCLM (TIN2015-63502-C3-1-R) from the MINECO.
7 References
1. https://docs.mongodb.com/manual/core/views/#create-view
2. https://blog.pandorafms.org/nosql-vs-sql-key-differences/
3. https://docs.mongodb.com/manual/reference/method/db.createRole/
4. N. Kshetri. Big data's impact on privacy, security and consumer welfare. Telecommunica-
tions Policy, 38(11):1134-1145, 2014.
5. K. Michael and K. Miller. Big data: New opportunities and new challenges [guest editors'
introduction]. Computer, 46(6):22-24, 2013.
6. R. Toshniwal, K.G. Dastidar, and A. Nath. Big data security issues and challenges. Interna-
tional Journal of Innovative Research in Advanced Engineering (IJIRAE), 2(2):15-20, 2015.
7. L. Compagna, P.E. Khoury, A. Krausová, F. Massacci, and N. Zannone. How to integrate
legal requirements into a requirements engineering methodology for the development of se-
curity and privacy patterns. Artificial Intelligence and Law, 17(1):1-30, 2009.
8. R.P. van de Riet. Twenty-five years of mokum: For 25 years of data and knowledge engi-
neering: Correctness by design in relation to mde and correct protocols in cyberspace. Data
& Knowledge Engineering, 67(2):293-329, 2008.
9. J. Jurjens and H. Schmidt. Umlsec4uml2 - adopting umlsec to support uml2. Technical re-
port, Technical Reports in Computer Science. Technische Universitat Dortmund,
http://hdl.handle.net/2003/27602, 2011.
10. D. Basin, J. Doser, and T. Lodderstedt. Model driven security: from uml models to access
control infrastructures. ACM Transactions on Software Engineering and Methodology,
15(1):39-91, 2006.
11. A. Frank and A. Asuncion. UCI machine learning repository [http://archive.ics.uci.edu/ml].
Irvine, CA: University of California, School of Information and Computer Science,
213, 2010.