Junhu Wang · Gao Cong
Jinjun Chen · Jianzhong Qi (Eds.)
LNCS 10837
Databases Theory
and Applications
29th Australasian Database Conference, ADC 2018
Gold Coast, QLD, Australia, May 24–27, 2018
Proceedings
Lecture Notes in Computer Science 10837
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7409
Editors
Junhu Wang
ICT
Griffith University
Southport, QLD
Australia

Gao Cong
Nanyang Technological University
Singapore
Singapore

Jinjun Chen
Faculty of Information and Communication Technologies
Swinburne University of Technology
Hawthorn, VIC
Australia

Jianzhong Qi
The University of Melbourne
Melbourne, VIC
Australia
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI
This Springer imprint is published by the registered company Springer International Publishing AG
part of Springer Nature
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
It is our great pleasure to present the proceedings of the 29th Australasian Database
Conference (ADC 2018). The Australasian Database Conference is an annual
international forum for sharing the latest research advancements and novel applications of
database systems, data-driven applications, and data analytics between researchers and
practitioners from around the globe, particularly Australia and New Zealand. The
mission of ADC is to share novel research solutions to problems of today’s information
society that fulfil the needs of heterogeneous applications and environments and to
identify new issues and directions for future research. ADC seeks papers from
academia and industry presenting research on all practical and theoretical aspects of
advanced database theory and applications, as well as case studies and implementation
experiences.
ADC 2018 was held during May 23–25, 2018, on the Gold Coast, Australia. As in
previous years, ADC 2018 accepted all the papers that the Program Committee
considered to be of ADC quality without setting any predefined quota. The conference
received 53 submissions, each of which was carefully peer reviewed by at least three
independent reviewers, and in some cases four or five reviewers. Based on the reviewer
comments, we accepted 23 full research papers, six short papers, and three demo
papers. The Program Committee that selected the papers comprised 52 members from
around the world including Australia, China, USA, Finland, Denmark, Switzerland,
Japan, New Zealand, and Singapore. The conference programme also includes keynote
talks and invited tutorials for ADC’s PhD school.
We are grateful to Professor Xiaofang Zhou (University of Queensland, ADC
Steering Committee member) for his helpful advice, and to Professor Rui Zhang (University
of Melbourne, ADC 2018 General Chair) and Dr. Sen Wang (Griffith University, ADC
2018 Local Organization Chair) for their tireless work in coordinating the conference
activities. We would like to thank all members of the Organizing Committee, and the
many volunteers, for their support in the conference organization. Special thanks go to
the Program Committee members and the external reviewers who contributed their time
and expertise in the paper review process. We would also like to thank the invited
speakers, all authors who submitted their papers, and all conference attendees.
Rui Zhang
Organization
General Chair
Rui Zhang University of Melbourne, Australia
Program Chairs
Junhu Wang Griffith University, Australia
Gao Cong Nanyang Technological University, Singapore
Jinjun Chen Swinburne University of Technology, Australia
Proceedings Chair
Jianzhong Qi University of Melbourne, Australia
Publicity Chair
Lijun Chang University of Sydney, Australia
Steering Committee
Rao Kotagiri University of Melbourne, Australia
Timos Sellis RMIT University, Australia
Gill Dobbie University of Auckland, New Zealand
Alan Fekete University of Sydney, Australia
Xuemin Lin University of New South Wales, Australia
Yanchun Zhang Victoria University, Australia
Xiaofang Zhou University of Queensland, Australia
Program Committee
Tarique Anwar Swinburne University of Technology, Australia
Zhifeng Bao RMIT University, Australia
Huiping Cao New Mexico State University, USA
Xin Cao University of New South Wales, Australia
Lijun Chang University of Sydney, Australia
Muhammad Aamir Cheema Monash University, Australia
External Reviewers
Taotao Cai University of Western Australia
Xuefeng Chen University of New South Wales, Australia
Qixu Gong New Mexico State University, USA
Yifan Hao New Mexico State University, USA
Nguyen Quoc Viet Hung Griffith University, Australia
Md Zahidul Islam University of South Australia
Saiful Islam Griffith University, Australia
Selasi Kwasie University of South Australia
Yadan Luo University of Queensland, Australia
Yue Qian Dalian University of Technology, China
Nguyen Khoi Tran University of Adelaide, Australia
Edgar Ceh Varela New Mexico State University, USA
Can Wang Griffith University, Australia
Fan Wang Aston University, UK
Lujing Yang University of South Australia
Invited Talks
MapReduce Algorithms for Big Data Analysis
Kyuseok Shim
Abstract. There is a growing trend of applications that should handle big data.
However, analyzing big data is very challenging today. For such applications,
the MapReduce framework has recently attracted a lot of attention. MapReduce
is a programming model that allows easy development of scalable parallel
applications to process big data on large clusters of commodity machines.
Google’s MapReduce or its open-source equivalent Hadoop is a powerful tool
for building such applications. In this tutorial, I will first introduce the
MapReduce framework based on the Hadoop system, which is available to everyone for running
distributed computing algorithms using MapReduce. I will next discuss how to
design efficient MapReduce algorithms and present the state of the art in
MapReduce algorithms for big data analysis. Since Spark was recently developed
to overcome the shortcomings of MapReduce, which is not optimized for
iterative algorithms and interactive data analysis, I will also present an outline of
Spark as well as the differences between MapReduce and Spark. The intended
audience of this tutorial is professionals who plan to develop efficient
MapReduce algorithms and researchers who should be aware of the
state-of-the-art in MapReduce algorithms available today for big data analysis.
Reynold Cheng
the Performance Reward in years 2006 and 2007 awarded by the Hong Kong
Polytechnic University. He is the Chair of the Department Research Postgraduate
Committee, and was the Vice Chairperson of the ACM (Hong Kong Chapter) in 2013.
He is a member of the IEEE, the ACM, the Special Interest Group on Management of
Data (ACM SIGMOD), and the UPE (Upsilon Pi Epsilon Honor Society). He is an
editorial board member of TKDE, DAPD and IS, and was a guest editor for TKDE,
DAPD, and Geoinformatica. He is an area chair of ICDE 2017, a senior PC member for
DASFAA 2015, PC co-chair of APWeb 2015, area chair for CIKM 2014, area chair for
Encyclopedia of Database Systems, program co-chair of SSTD 2013, and a workshop
co-chair of ICDE 2014. He received an Outstanding Service Award in the CIKM 2009
conference. He has served as a PC member and reviewer for top conferences (e.g.,
SIGMOD, VLDB, ICDE, EDBT, KDD, ICDM, and CIKM) and journals (e.g., TODS,
TKDE, VLDBJ, IS, and TMC).
Approximate Computation for Big Data
Analytics
Shuai Ma
Beihang University
Abstract. Over the past few years, research and development has made
significant progress on big data analytics with support from both
governments and industries all over the world, such as Spark, IBM Watson and Google
AlphaGo. A fundamental issue for big data analytics is efficiency, and
various advances towards attacking this issue have been achieved recently,
from theory to algorithms to systems. In this talk, we shall present the idea of
approximate computation for efficient and effective big data analytics: query
approximation and data approximation, based on our recent research
experiences. Different from existing approximation techniques, the approximate
computation that we are going to introduce does not necessarily ask for
theoretically guaranteed approximation solutions, but asks for sufficiently efficient
and effective solutions in practice.
Short Biography. Shuai Ma is a full professor in the School of Computer Science and
Engineering, Beihang University, China. He obtained two PhD degrees, from the University of
Edinburgh in 2010 and from Peking University in 2004. His research interests
include database theory and systems, and big data. He is a recipient of the best paper
award of VLDB 2010, the best challenge paper award of WISE 2013, the National
Science Fund of China for Excellent Young Scholars in 2013, and the special award of
Chinese Institute of Electronics for progress in science and technology in 2017 (8/15).
He has been an Associate Editor of the VLDB Journal since 2017.
Understanding Human Behaviors via Learning
Internet of Things Interactions
Lina Yao
Short Biography. Lina Yao is currently a lecturer in the School of Computer Science
and Engineering, University of New South Wales. Her research interests lie in data
mining and machine learning applications with a focus on the Internet of Things,
recommender systems, human activity recognition and Brain-Computer Interface.
Mining Geo-social Networks – Spatial Item
Recommendation
Abstract. The rapid development of Web 2.0, location acquisition and wireless
communication technologies has fostered a profusion of geo-social networks
(e.g., Foursquare, Yelp and Google Place). They provide users with an online
platform to check in at points of interest (e.g., cinemas, galleries and hotels) and
share their life experiences in the physical world via mobile devices. The new
dimension of location implies extensive knowledge about an individual’s
behaviors and interests by bridging the gap between online social networks and
the physical world. It is crucial to develop spatio-temporal recommendation
services for mobile users to explore new places, attend new events and find
their potentially preferred spatial items from billions of candidate ones.
Compared with traditional recommendation tasks, spatio-temporal
recommendation faces the following new challenges: Travel Locality, Spatial Dynamics of
User Interests, Temporal Dynamics of User Interests, Sequential Influence of
user mobility behaviors and Real-time Requirement. In this talk, I will present
our recent advances in spatio-temporal recommendation techniques and how
to address these unique challenges.
Short Biography. Dr. Hongzhi Yin is now working as a lecturer in data science and an
ARC DECRA Fellow (Australia Discovery Early Career Researcher Award) with The
University of Queensland, Australia. He received his doctoral degree from Peking
University in July 2014. After graduation, he joined the school of ITEE, the University
of Queensland. He successfully won the ARC DECRA award in 2015 and obtained an
ARC Discovery Project grant as a chief investigator in 2016. His current main research
interests include social media analytics, user profiling, recommender systems (especially
spatial-temporal recommendation), topic discovery and event detection, deep learning,
user linkage across social networks, and knowledge graph mining and construction. He has
published over 70 peer-reviewed papers in prestigious journals and top international
conferences including ACM TOIS, VLDBJ, IEEE TKDE, ACM TKDD, ACM TIST,
ACM SIGMOD, ACM SIGKDD, VLDB, IEEE ICDE, AAAI, SIGIR, WWW, ACM
Multimedia, ICDM, WSDM and CIKM. He has been actively engaged in professional
services, serving as a conference organizer, as a PC member for PVLDB,
SIGIR, ICDE, IJCAI, ICDM, CIKM, DASFAA, ASONAM, MDM, WISE and PAKDD,
and as a reviewer for more than 10 reputed journals such as VLDB Journal, TKDE, TOIS,
TKDD, TWeb, IEEE Transactions on Cybernetics, WWW Journal and Knowledge-Based
Systems.
Dr. Weiqing Wang is now working as a Research Fellow in the school of ITEE, the
University of Queensland, where she also obtained her PhD in July 2017. She will
join Monash University as a lecturer in data science this July. Her major research
interests include user modelling and recommender systems, especially spatial-temporal
recommender systems. She has published over ten peer-reviewed papers in prestigious
journals and top conferences including IEEE TKDE, ACM TOIS, ACM TIST,
ACM SIGKDD, ACM SIGIR, IEEE ICDE, ACM Multimedia, and CIKM.
Contents
An Efficient Framework for the Analysis of Big Brain Signals Data
Supriya, Siuly, Hua Wang, and Yanchun Zhang
Demo Papers
1 Introduction
Current in-memory databases are significantly limited by the main memory’s
latency and bandwidth [2]. In the time spent for transferring a cache line from
DRAM to the CPU (roughly 100 ns), a modern CPU can execute 300 instructions
or more. When the compute part of database operators executes in fewer cycles,
the CPU stalls and waits for more data to arrive. This gets exacerbated in NUMA
setups where remote DRAM accesses take roughly 200 ns with a single NUMA
hop. Scale-up systems, as used for big SAP HANA or Oracle databases, can
include multiple NUMA hops and up to 48 TB of memory. These connect up to
eight blades with four processors each to a single, cache-coherent network using
a proprietary interconnect. In such setups, memory latency from one end to the
other can reach hundreds of nanoseconds, making the influence even bigger.
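To make the latency figures above concrete, the following back-of-the-envelope sketch converts a memory latency into the number of instructions a core could have issued while waiting for one cache line. The 3 GHz clock rate and the rate of one instruction per cycle are illustrative assumptions, not values taken from the measurements, and the function name is hypothetical.

```python
def stalled_instructions(latency_ns: float,
                         clock_ghz: float = 3.0,   # assumed clock rate
                         ipc: float = 1.0) -> float:  # assumed instructions per cycle
    """Instructions a core could have issued while waiting for one cache line."""
    cycles = latency_ns * clock_ghz  # ns * (cycles per ns)
    return cycles * ipc


# Latencies quoted in the text: ~100 ns for local DRAM, ~200 ns for one NUMA hop.
local_stall = stalled_instructions(100)
one_hop_stall = stalled_instructions(200)
print(local_stall, one_hop_stall)
```

Under these assumptions, a single NUMA hop doubles the number of instruction slots lost per cache-line miss, which is why the compute part of a scan rarely hides the transfer time.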
Closely related to memory latency is memory bandwidth. On our test
system (cf. Sect. 3), we measured a NUMA node-local bandwidth of slightly over
50 GB/s, while remote accesses on the same blade had a reduced bandwidth of
12.5 GB/s and remote blades of 11.5 GB/s. As such, making good use of the
available physical bandwidth is vital. Doing so includes reducing the amount of
data transferred by using compression for a higher logical bandwidth (i.e., more
information transferred per byte) or organizing the data in a cache line-friendly
way. One example is a columnar table layout, in which each cache line holds only
values from the accessed column, so that cache line bycatch, i.e., data that is
loaded into the CPU but never used, is avoided for column store-friendly queries.
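The effect of cache line bycatch can be estimated with simple arithmetic. The sketch below counts the cache lines touched when one attribute is scanned in a columnar layout versus a row layout. The 64-byte cache line, the row and value widths, and all function names are assumptions chosen for illustration; they do not come from the paper.

```python
import math

CACHE_LINE_BYTES = 64  # assumption: typical x86 cache line size


def lines_columnar(n_rows: int, value_bytes: int) -> int:
    """Cache lines touched when the scanned column is stored contiguously."""
    return math.ceil(n_rows * value_bytes / CACHE_LINE_BYTES)


def lines_row_store(n_rows: int, row_bytes: int) -> int:
    """Cache lines touched when whole rows are stored contiguously.

    Assumes the scanned value never straddles two cache lines, so at most
    one line per row is needed even for very wide rows.
    """
    all_lines = math.ceil(n_rows * row_bytes / CACHE_LINE_BYTES)
    return min(all_lines, n_rows)


def bycatch_fraction(n_rows: int, value_bytes: int, lines: int) -> float:
    """Share of the loaded bytes that the scan never looks at."""
    useful_bytes = n_rows * value_bytes
    return 1.0 - useful_bytes / (lines * CACHE_LINE_BYTES)


rows, value_width, row_width = 1_000_000, 4, 32  # hypothetical table
col_lines = lines_columnar(rows, value_width)
row_lines = lines_row_store(rows, row_width)
print(col_lines, bycatch_fraction(rows, value_width, col_lines))
print(row_lines, bycatch_fraction(rows, value_width, row_lines))
```

For this hypothetical table, the row layout moves eight times as many cache lines as the columnar layout for the same scan, so seven eighths of the transferred bytes are bycatch.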
Making the DBMS more aware of NUMA can significantly improve the
performance [7]. By ensuring that data is moved across the NUMA network only
© Springer International Publishing AG, part of Springer Nature 2018
J. Wang et al. (Eds.): ADC 2018, LNCS 10837, pp. 3–14, 2018.
https://doi.org/10.1007/978-3-319-92013-9_1
4 M. Dreseler et al.
when it is unavoidable, the memory access costs can be reduced. Still, there
remain cases in which a load from a distant node cannot be avoided. This hap-
pens when joins access data from a remote table or when the data (and thus the
load) is imbalanced and an operator cannot be executed on the optimal node.
In addition to NUMA optimization and better data layouts, developers use
dedicated hardware to increase the effective bandwidth [1,6,9]. There are
several approaches, but no one-size-fits-all technique. In this paper, we look at one
specific method that is used to improve the physical bandwidth available to
database operators, namely the Global Reference Unit (GRU) built into
systems like SGI’s UV series or HPE’s Superdome Flex. The GRU provides an API
that can be used to offload certain memory operations, allowing the CPU to
work on other data in the meantime. Previous work [3] has shown that this can
result in a performance benefit of up to 30% for table scans. We extend this
work by evaluating which factors give the GRU an advantage over the CPU in
some cases and what causes it to be slower in others. This knowledge can be used
by the DBMS to automatically choose between the CPU and GRU access paths.
Furthermore, we present relaxed cache coherence as another access method.
This paper is organized as follows: Sect. 2 gives background information on
the hardware discussed in this paper. To gather data on the memory bus
utilization and better profile the physical properties of database operations, we use
performance counters as described in Sect. 3. These are then used in Sect. 4 to
discuss the factors that determine which method gives the higher
effective bandwidth. Section 5 explains how a DBMS can use these results in order
to choose an access method. In Sect. 6, we show how relaxing cache coherency
could further improve the physical bandwidth of a table scan. Related work is
discussed in Sect. 7 and a summary is given in Sect. 8.
each HARP is directly connected with every other
using a special interconnect called NUMAlink. This
creates an all-to-all topology that allows the addition
of more CPUs and more memory by attaching
additional blades to the machine. In order to
make the memory of one blade accessible to another
blade, the HARPs participate in the QPI ring of
their blades and mimic a NUMA node with a large
amount of main memory, i.e., the memory of every
other blade [8].

[Figure labels: HARP, CPU, QPI, NUMAlink, Blade 2]
Fig. 1. General architecture of the discussed system
Adaptive Access Path Selection for Hardware-Accelerated DRAM Loads 5
1 All benchmarks were executed on an SGI UV 300H with 6 TB RAM and eight Intel
E7-8890 v2 processors. Our code was compiled with gcc 7.2 at -O3.
Fig. 3. Influence of the input data size for off-blade CPU and GRU scans - each dot is
one measured data point
Data Locality. Depending on where the input data is located relative to the
executing CPU, it needs to be transferred through zero to multiple NUMA hops.
Figure 4 shows that the throughput for the regular CPU scan changes depending
on NUMA distance. The highest throughput of 8 GB/s is achieved on the same
node. With increasing NUMA distance, the throughput rates decrease. For a
blade-local scan, the throughput rates reach up to 5 GB/s, and scanning an off-
blade vector only nets approximately 3 GB/s. The GRU scan performance stays
Fig. 4. Influence of the NUMA distance for CPU and GRU scans
stable for all NUMA distances at around 6 GB/s. For both CPU and GRU, a
high variance is measured. This is dependent on the other execution parameters
as described in this section. It shows that there are parameters other than the
data locality, especially for small tables, that play an important role in deciding
if the CPU or the GRU is faster.
For the model described in Sect. 5, we take the latency (instead of the number
of hops) between source and destination node as it is a continuous variable,
not a discrete one. This makes it easier to adapt the model to other systems
where the latency between hops is different.
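As a rough illustration of how the throughput figures quoted above could feed an access path decision, the sketch below picks the faster path per NUMA distance. It is a deliberately simplified, hypothetical rule, not the model of Sect. 5: it uses discrete distance classes instead of latency, takes the approximate peak values reported for Fig. 4 as constants, and ignores the other parameters (such as data size and result size) that the text identifies as important.

```python
# Approximate peak scan throughputs in GB/s as quoted in the text for Fig. 4.
CPU_PEAK_GBPS = {"node-local": 8.0, "blade-local": 5.0, "off-blade": 3.0}
GRU_GBPS = 6.0  # reported as roughly stable across all NUMA distances


def faster_path(distance: str) -> str:
    """Return "CPU" or "GRU", whichever has the higher quoted throughput."""
    return "CPU" if CPU_PEAK_GBPS[distance] >= GRU_GBPS else "GRU"


for distance in CPU_PEAK_GBPS:
    print(distance, faster_path(distance))
```

With these constants, the CPU wins only for node-local data. Given the high variance reported for both paths, a real decision would need the additional parameters discussed in this section.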
Result Size. When scanning the input data vector, i.e., the column in an
in-memory database, the operation also needs to save the results. In this
implementation, the scan returns a vector of indexes of the input vector where a
certain search value was found. This means that both the value distribution and
the given search value have an impact on how large the result gets. We have
chosen to take both the data size and the result size as parameters instead of
just using the selectivity. This is because the impact of the selectivity on the
scan cost varies for different data sizes.
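The scan semantics described above can be sketched in a few lines. This is a minimal, unoptimized illustration of a scan that returns matching positions; the function name and the sample data are hypothetical, and it says nothing about how the benchmarked implementation is written.

```python
def scan_positions(column, search_value):
    """Return the indexes of `column` at which `search_value` occurs."""
    return [i for i, value in enumerate(column) if value == search_value]


column = [7, 3, 7, 7, 1, 3]  # hypothetical input vector
for needle in (7, 1, 9):
    hits = scan_positions(column, needle)
    # The result size depends on both the value distribution and the needle.
    print(needle, hits, len(hits) / len(column))
```

The same input vector yields result vectors of very different sizes depending on the search value, which is why data size and result size are treated as separate parameters.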
Figure 5 shows the performance of scans with different result sizes. As the
output size grows, the amount of time spent for writing the output vector slowly
grows as well, and at some point, surpasses the value comparisons in terms of
runtime. Consequently, after that point the benefits gained from our improved
scanning method become insignificant.
Fig. 5. Influence of the output size when scanning an off-blade vector of 512 KiB
*****
"But if any danger threatens your young lady, you will come and tell me
in secret."
"Gladly I will. But what if someone heard me when I opened the
porch door?…"
However it was, Harald could not help believing that
von Assar would carry out his threat, and so he expected
New Year's Day to be a very trying one for Irene. He would nevertheless
let her endure the ordeal by fire before stepping in to defend her,
for he doubted his own courage and firmness of mind until
real distress had emboldened him. Perhaps his presence
would also have postponed the final resolution of the matter… but
he wanted to have it settled right away. That is why he meant to stay
in his room all day, if need be, and then behave
as though he had arrived a day later. As we recall,
Harald had promised the chambermaid that he would come down in the forenoon. "The morning hour
is worth its weight in gold," says the proverb, and he was almost convinced
that if the chamberlain did not make his attack before noon,
it would be left undone for that day. Harald did remember that
the chamberlain's first attempt had taken place in the afternoon, but
compared with this it had been mere play.
What exactly he would do and how he would conduct himself, he did not consider now.
But of one thing he felt certain: he would at least take
effective action.
This little scheme of his had by no means been long
premeditated; it was born of the moment. Instead of being wholly
overcome by his feelings, he became, once he had stepped inside
the walls of Ristilä, far more practical than he had been
out of doors.
Your foster mother."
During these last days there had been moments when he had cursed
even his love for Irene. Now he no longer did so. Although
it pained him to know that he had abused that
love, he nevertheless thanked God that it was real.
During the day there had been a thaw, so it had even rained a little, but toward evening
the clouds began to disperse, and now, in the dead of night, the sky was clear and
full of stars. The bright light of the stars brought rest to Harald's soul, and
they seemed to him like old acquaintances.
THE WEST WING.