Bigdata
All content following this page was uploaded by Md. Anwar Hossain on 26 February 2021.
https://doi.org/10.34104/ajeit.021.0109
ABSTRACT
In the age of digitalization, data are generated from numerous online and offline sources every
second. Data of considerable size with several characteristic properties are termed Bigdata. It is
challenging to store, manage, process, analyze, visualize, and extract useful information from
Bigdata using traditional approaches on local machines. Cloud computing platforms offer a solution.
Cloud computing provides high-level processing units, storage, and applications that do not depend
on the performance of user devices. Many users can access resources and on-demand services remotely
from the cloud on a pay-as-you-use basis, so users do not need to buy and install costly resources
locally. Cloud Service Providers (CSPs) such as Google, AWS, IBM, and Microsoft offer robust,
cost-efficient systems and products for Bigdata analysis, and many CSPs offer different services in
this field. In this paper, we discuss BigQuery, a service in Google's data warehouse product, for
analyzing and representing numerous sample datasets in real time so that the right decisions can be
made within a short time.
Keywords: Bigdata, Cloud computing, Cloud service providers, Google cloud computing, BigQuery.
BigQuery in a platform of cloud computing for research and educational purposes and recommends
only how the platform can be utilized in data analysis. (Tomar and Tomar, 2018) present an
overview of Bigdata and cloud computing integration from two sources, i.e., RedBus and Twitter.
This paper only discusses the data analysis framework and some methods but does not present the
analysis process in detail.

(Kotecha and Joshiyara, 2018) present a method of managing and handling non-relational data on
BigQuery and calculating the processing time. This paper only covers the analysis time relative to
the dataset's size using the Google SDK rather than extracting the necessary values from the
dataset. (Harsha, 2017) discusses some important roles and analysis tools of Bigdata in relation to
cloud computing technology. However, these papers do not clearly present any methods for analyzing
Bigdata. (Bathla, 2018) presents a theoretical discussion of Bigdata management tools on the cloud
platform rather than analyzing non-structured data within a short time.

(Riahi and Riahi, 2018) discuss the fundamentals of Bigdata, its challenges, and its applications
in data analytics. The paper covers the use of Hadoop, an open-source framework for managing and
analyzing data from different sources. Our paper highlights cloud-based Bigdata analysis techniques
using Google's BigQuery service, without infrastructure development or database administrators.
BigQuery offers SQL commands as an easy way to extract useful information from Bigdata in real
time.

3. Bigdata
Nowadays, a large volume of data is produced from several offline and online sources every second.
These data are referred to as Bigdata. It is difficult to store, process, and analyze Bigdata with
traditional database technologies. Bigdata is indistinct and requires substantial processes of
classification and conversion of knowledge into new insights. Gartner defined Bigdata as "high
volume, high velocity, and a wide variety of information assets that require new forms of
processing to enable enhanced decision making, insight discovery, and process optimization"
(Harjinder, 2019).

Several researchers define Bigdata from differing points of view. Bigdata may be characterized by
the 5Vs, i.e., Volume, Variety, Velocity, Veracity, and Value.
1) Volume: This refers to the enormous data sets generated at high-frequency rates.
2) Variety: This deals with the various categories of data, such as structured, semi-structured,
and unstructured.
3) Velocity: This means the speed at which an application creates data and at which the data is
processed.
4) Veracity: This means the accuracy and truthfulness of the data, as well as its authenticity.
5) Value: This remarkable attribute of Bigdata refers to the ways of finding hidden values in the
datasets instantly (Juneja and Das, 2016).

Some vital statistics of Bigdata:
1) Every day, more than 1 billion Google queries are made and 294 billion emails are sent.
2) Every minute, 65,972 Instagram photos are posted, 448,800 tweets are composed, and 500 hours'
worth of YouTube videos are uploaded.
3) By 2020, the number of smartphone users could reach 6.1 billion. Moreover, taking the Internet
of Things into account, there could be 26 billion connected devices by then (THORNTECH, 2020).

3.1 Categories of Bigdata
We can classify Bigdata according to these five aspects: (a) data sources, (b) content format,
(c) data stores, (d) data staging, and (e) data processing (Hashem et al., 2015).
Fig 1: Categories of Bigdata.
UniversePG l www.universepg.com 2
Ali et al., / Australian Journal of Engineering and Innovative Technology, 3(1), 1-9, 2021
services, BigQuery is a serverless, user-friendly, low-cost data warehouse for analytics (Kumar,
2016).

4.2 BigQuery
BigQuery is a fully managed data warehouse on the cloud platform. It allows queries over
substantial amounts of data to be carried out economically and at the speeds one would expect from
Google. Its low pricing and Google's world-class scalability and security infrastructure make it a
strong source of business insights (Kotecha and Joshiyara, 2018). BigQuery is a petabyte-scale data
warehouse and one of the fastest solutions for Bigdata analysis. Without infrastructure or a
database administrator, one can easily query, represent, and analyze Bigdata in BigQuery using
SQL-like commands. Hence, it is used by many institutions and business organizations, from startups
to Fortune 500 companies (Kumar, 2016). Fig 3 shows the data sources that are integrated into
BigQuery (Bussiness2Community, 2020).

Fig 3: Bigdata from multiple sources into Google BigQuery.

4.3 Bigdata Integration into Cloud
Bigdata and cloud computing are very closely interrelated. Given the processing time and
environment setup required, analyzing Bigdata on a local machine is impractical. Moreover, with
data arriving in different forms and from different sources, it is not easy to extract useful
information for decision-making. Fig 4 shows the basic architecture for integrating Bigdata from
multiple sources into cloud computing (Tomar & Tomar, 2018).

Fig 4: Integration of Bigdata into Cloud.

5. Case Study
We focus our study of Bigdata on Google Cloud. We approach it through a problem statement for
dataset 01 (ted_main.csv) and dataset 02 (appstore_games.csv). You can load a dataset in any of the
following formats into Google Cloud:
CSV
JSON (newline delimited only)
Avro
ORC
Parquet

Explanation of the steps:
1. Firstly, setting up the environment for BigQuery in Google Cloud -
   Login to Console
   Login to Big Query
   Browsing Publicly available sample tables
2. Real-life case study-
   Downloading the Publicly available data set
   Uploading on Google Big Query
   Result set/ Query E
3. Results of the case study
   For the case of a public dataset
   For the case of real-time dataset

We create a project on Google Cloud Platform and use the BigQuery service. Publicly available
datasets can be accessed and queried through structured query language (SQL) to see various outputs
and the data processing speed of BigQuery's data warehouse.
1. Accessing publicly available sample data sets in the BigQuery data warehouse:
   a) Click on product and services
   b) Click on the product category of BigQuery
   c) Click on bigquery-public-data-sets
2. Browsing publicly available datasets and running some queries with the query editor.
3. After clicking on the tables, for example, Wikipedia and natality, one can see metadata about
the table. Metadata represents information about data. In Fig 5 and Fig 6, column details can be
seen for the Wikipedia table and the natality table. The tables can be queried by clicking the
'Query Table' button on the top right of the web console.

dataset, upload it into the BigQuery data warehouse, and then run a query to find the result. We
take a sample dataset of TED talks collected from www.kaggle.com in CSV format. This dataset
provides metadata on all TED Talk audio-video recordings posted to TED.com's official website until
March 2020 (TED, 2020). The downloaded dataset has information about all the recordings that were
uploaded to YouTube at different times. TED stands for "Technology, Entertainment, and Design"; it
is a media company that publishes talks online for free dissemination under the slogan "ideas worth
spreading" (WIKIPEDIA, 2020).

Problem Statement: The main target is to find the top 2000 topics from TED Talks on YouTube with
the maximum all-time views in the downloaded dataset 01 (ted_main.csv), and to find the top 1500
games on the App Store with the maximum user rating in dataset 02 (appstore_games.csv). To solve
the problem statement, we used datasets from external and internal sources.
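Of the supported load formats, JSON is accepted only in newline-delimited form: one JSON object per line, with no enclosing array. The following is a minimal sketch of producing that form from CSV text with the Python standard library. The helper name `csv_to_ndjson` and the two sample rows are assumptions made for illustration; the rows are not taken from ted_main.csv.

```python
import csv
import io
import json


def csv_to_ndjson(csv_text: str) -> str:
    """Convert CSV text with a header row into newline-delimited JSON."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # One JSON object per line: no surrounding array, no commas between records.
    return "\n".join(json.dumps(row) for row in reader)


# Toy rows invented for illustration only.
sample_csv = "title,views\nTalk A,100\nTalk B,250\n"
ndjson = csv_to_ndjson(sample_csv)
print(ndjson)
```

Note that `csv.DictReader` yields every field as a string, so numeric columns such as the hypothetical `views` would need an explicit conversion before or during loading.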
In Fig 9, the file path is given for the file downloaded from www.kaggle.com.

The final result for Dataset 01: Clicking Query Table as shown in Fig 10 and writing the SQL query
below produced the required output: the top 2000 topics in the TED event with the maximum views in
dataset 01 (ted_main.csv).
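A top-N query of this kind might look like the following sketch. It is not the authors' query: it runs against an in-memory SQLite table rather than BigQuery, and the column names `title` and `views`, the helper `top_n`, and the three rows are all assumptions made for illustration.

```python
import sqlite3


def top_n(conn, table, label_col, metric_col, n):
    """Return the n rows of `table` with the largest `metric_col`, highest first."""
    # Identifiers are interpolated directly, so they must come from trusted code.
    sql = (
        f"SELECT {label_col}, {metric_col} FROM {table} "
        f"ORDER BY {metric_col} DESC LIMIT ?"
    )
    return conn.execute(sql, (n,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ted_main (title TEXT, views INTEGER)")
# Toy rows invented for illustration; not taken from the real dataset.
conn.executemany(
    "INSERT INTO ted_main VALUES (?, ?)",
    [("Talk A", 100), ("Talk B", 250), ("Talk C", 175)],
)
top_two = top_n(conn, "ted_main", "title", "views", 2)
print(top_two)  # [('Talk B', 250), ('Talk C', 175)]
```

The same `ORDER BY ... DESC LIMIT n` shape is valid in BigQuery SQL. Under the problem statement it would use a limit of 2000 on the views column for dataset 01 and a limit of 1500 on a user-rating column for dataset 02.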
main analysis product of Google, BigQuery, is used here for its smooth management and handling
capability. The outcome of this study is the analysis of Bigdata (structured, semi-structured, and
non-structured) in a real-life scenario. Data from several sources are processed and represented
cost-effectively without infrastructure development or database administrators.

7. CONCLUSION AND FUTURE WORK:
Data is an important asset for firms, organizations, and other business areas in technology.
Processing data is essential for an organization or industry to make the right decision instantly.
However, processing and storing massive amounts of different data types (text, audio, video, etc.)
using traditional techniques and methods is complicated, time-consuming, and costly. Moreover,
traditional servers and databases have limitations in handling these data categories efficiently,
which is why the move to cloud computing began. In our paper, we use the BigQuery service of
Google's data warehouse product to address the issue. Data (structured, semi-structured, and
non-structured) from real-time sources is uploaded to the Google Cloud Platform, and using BigQuery
we can instantly extract the necessary information from the data. Moreover, BigQuery is serverless,
cost-effective, and easy to handle. Without infrastructure development or an administrator, we can
query, analyze, and represent Bigdata within a few seconds.

The paper's main outcome is a representation of big data from a different perspective so that
organizations, industries, or any business area can take action instantly. This research will be
continued in the future with a larger number of experimental datasets. The work will be expanded to
BigQuery GIS. GIS, rooted in spatial science, incorporates multiple data types. The spatial
location is analyzed, and information layers are structured into visualizations using maps and 3D
scenes.

8. ACKNOWLEDGEMENT:
Firstly, I acknowledge Almighty Allah's help, because this work could not have been done without
it. Furthermore, I thank the co-authors and my honorable teachers at the Department of Information
and Communication Engineering, Pabna University of Science and Technology (PUST), for supervising
me and giving me the proper support to complete the research work.

9. CONFLICTS OF INTEREST:
The authors declare that they do not have competing interests regarding the publication of the
paper.

10. REFERENCES:
1) Balachandran, B., and Prasad, S., (2017). Challenges and Benefits of Deploying Big Data
Analytics in the Cloud for Business Intelligence, in International Conference on Knowledge-Based
and Intelligent Information and Engineering System, Marseille, France, pp. 1113–1121.
https://doi.org/10.1016/j.procs.2017.08.138
2) Bussiness2Community, (2020). Google BigQuery: A Tutorial for Marketer.
https://www.business2community.com/marketing/google-bigquery-a-tutorial-for-marketers-02252216
3) Harjinder Kaur, D. M. S. G. (2019). Role of Big Data in Cloud Computing: A Review, International
Journal of Engineering Research & Technology (IJERT), 8(7), pp. 866-869.
4) Harsha, T. (2017). Big Data Analytics in the Cloud Computing Environment. International Journal
of Scientific & Engineering Research, 8(8), 393-398.
https://cutt.ly/TjX1wv1
5) Hashem, I., Yaqoob, I., Anuar, N., Mokhtar, S., Gani, A., and Ullah Khan, S. (2015). The rise of
'big data' on cloud computing: Review and open research issues, Information Systems, 47, pp.
98–115.
https://doi.org/10.1016/j.is.2014.07.006
6) Islam, M., and Reza, S., (2019). The Rise of Big Data and Cloud Computing, Internet of Things and
Cloud Computing, 7(2), 45–53.
https://doi.org/10.11648/j.iotcc.20190702.12
7) Juneja, A., and Das, N. (2019). Big Data Quality Framework: Pre-Processing Data in Weather
Monitoring Application, in 2019 International Conference on Machine Learning, BigData, Cloud and
Parallel Computing (Com-IT-Con), India, pp. 559-563.
https://doi.org/10.1109/COMITCon.2019.8862267
8) K. Bathla, R., G, S. (2018). Research Analysis of Big Data and Cloud Computing with
Citation: Ali MH, Hosain MS, and Hossain MA. (2021). Big data analysis using bigquery on cloud
computing platform, Aust. J. Eng. Innov. Technol., 3(1), 1-9.
https://doi.org/10.34104/ajeit.021.0109