
Machine Learning and Cloud Computing: Survey of Distributed and SaaS Solutions ∗

Daniel Pop
Institute e-Austria Timişoara
Bd. Vasile Pârvan No. 4, 300223 Timişoara, România
arXiv:1603.08767v1 [cs.DC] 29 Mar 2016

E-mail: [email protected]

∗ This manuscript was originally published as IEAT Technical Report at https://www.ieat.ro/technical-reports in 2012.

Abstract

Applying popular machine learning algorithms to large amounts of data has raised new challenges for ML practitioners. Traditional ML libraries do not handle huge datasets well, so new approaches were needed. Parallelization using modern parallel computing frameworks, such as MapReduce, CUDA, or Dryad, gained in popularity and acceptance, resulting in new ML libraries developed on top of these frameworks. We will briefly introduce the most prominent industrial and academic outcomes, such as Apache Mahout™, GraphLab or Jubatus.

We will investigate how the cloud computing paradigm has impacted the field of ML. The first direction is that of popular statistics tools and libraries (R system, Python) deployed in the cloud. A second line of products augments existing tools with plugins that allow users to create a Hadoop cluster in the cloud and run jobs on it. Next on the list are libraries of distributed implementations for ML algorithms, and on-premise deployments of complex systems for data analytics and data mining. The last approach on the radar of this survey is ML as Software-as-a-Service, with several BigData start-ups (and large companies as well) already opening their solutions to the market.

1 Introduction

Given the enormous growth of collected and available data in companies, industry and science, techniques for analyzing such data are becoming ever more important. Today, data to be analyzed are no longer restricted to sensor data and classical databases, but more and more include textual documents and web pages (text mining, Web mining), spatial data, multimedia data, relational data (molecules, social networks). Analytics tools allow end-users to harvest the meaningful patterns buried in large volumes of structured and unstructured data. Analyzing big datasets gives users the power to identify new revenue sources, develop loyal and profitable customer relationships, and run their overall organization more efficiently and cost effectively.

Research in knowledge discovery and machine learning combines classical questions of computer science (efficient algorithms, software systems, databases) with elements from artificial intelligence and statistics up to user oriented issues (visualization, interactive mining).

Although parallel database products, such as Teradata, Oracle or Netezza, have for more than two decades provided means to realize a parallel implementation of ML-DM algorithms, expressing ML-DM algorithms in SQL code is a complex task and the result is difficult to maintain. Furthermore, large-scale installations of these products are expensive and are not an affordable option in most cases. Another driver for the paradigm shift from the relational model to other alternatives is the new nature of data. Until about five years ago, most data was transactional in nature, consisting of numeric or string data that fit easily into rows and columns of relational databases. Since then, while structured data has followed a near-linear growth, unstructured (e.g. audio and video) and semi-structured data (e.g. Web traffic data, social media content, sensor generated data etc.) exhibit an exponential growth (see figure 1). Most of the new data is either semi-structured in format, i.e. it consists of headers followed by text strings, or pure unstructured data (photo, video, audio). While the latter has limited textual content and is more difficult to parse and analyze, semi-structured data triggered a plethora of non-relational data store (NoSQL) solutions tailored to handle huge amounts of data. Consequently, the past 5 years have
seen researchers moving to parallelization of ML-DM using these new platforms, such as NoSQL datastores, distributed processing environments (MapReduce), or cloud computing.

Figure 1. Trends in data growth

At this point, it is worth reflecting on a nice metaphor for big data processing today by Ben Werther [18], co-founder of Platfora:

    In ‘industrial revolution’ terms, we are in the pre-industrial era of artisanship that preceded mass production. It is the equivalent of needing to engage an expert blacksmith to forge the forks and spoons for our dinner table.

Machine Learning is inherently a time-consuming task, so considerable effort has gone into speeding up its execution. The cloud computing paradigm and cloud providers turned out to be valuable alternatives for speeding up machine learning platforms. Thus, popular statistics environments, like R, Octave and Python, went into the cloud as well. There are two main directions for integrating them with cloud providers: create a cluster in the cloud and bootstrap it with statistics tools, or augment statistics environments with plugins that allow users to create Hadoop clusters in the cloud and run jobs on them.

Environments like R, Octave, Maple and similar offer low-level infrastructure for data analysis that can be applied to large datasets once leveraged by cloud providers. Machine Learning comes on top of this and facilitates the retrieval of useful knowledge out of huge data for customers with little or no statistical background by automatically inferring ‘knowledge models’ out of data. To support this need, an explosion of start-ups (some of them still in stealth mode) offering machine learning or big data analysis services to their customers can be noticed in the past 5 years. These initiatives can be either PaaS/SaaS platforms or products that can be deployed on private environments.

Reviewing the literature and the market, we can conclude that ML-DM comes in many flavors. We classify these approaches in 5 distinct classes:

• Machine Learning environments from the cloud – create a computer cluster in the cloud and bootstrap it with statistics tools. ⇒ Section 3.
• Plugins for Machine Learning tools – augment statistics tools with plugins that allow users to create a Hadoop cluster in the cloud and run ML jobs on it. ⇒ Section 4.
• Distributed Machine Learning libraries – collections of parallelized implementations of ML algorithms for distributed environments (Hadoop, Dryad etc). ⇒ Section 5.
• Complex Machine Learning systems – products that need to be installed on private data centers (or in the cloud) and offer high performance data mining and analysis. ⇒ Section 6.
• Software as a Service providers for Machine Learning – PaaS/SaaS solutions that allow clients to access ML algorithms via Web services. ⇒ Section 7.

The remainder of the paper is structured as follows: the next section presents similar, recent studies, followed by 5 sections, each of them devoted to a particular class identified above. The paper ends with conclusions and future plans.

2 Related studies

Since 1995, many implementations have been proposed for parallelizing ML-DM algorithms on shared or distributed systems. For a comprehensive study the reader is referred to a recent survey [17]. Our work focuses on frameworks, toolkits and libraries that allow large-scale, distributed implementations of state-of-the-art ML-DM algorithms. In this respect, we mention a recent book dealing with large-scale machine learning [1], which contains both presentations of general frameworks for highly scalable ML implementations, like DryadLINQ or IBM PMLT, and specific implementations of ML techniques on these platforms, like ensemble decision trees, SVM, k-Means etc. It contains contributions from both industry leaders (Google, HP,
IBM, Microsoft) and academia (Berkeley, NYU, University of California etc).

Recent articles, such as those of S. Charrington [3], W. Eckerson [4] and D. Harris [5], review different providers of large-scale ML solutions that are trying to offer better tools and technologies, most of them based on Hadoop infrastructure, to move forward the novel industry of big data. They aim at improving user experience, product recommendations, or website optimization, applicable to finance, telecommunications, retail, advertising or media.

3 Machine Learning environments from the cloud

Providers in this category offer computer clusters hosted on public cloud providers, such as Amazon EC2, Rackspace etc., pre-installed with statistics software, the preferred packages being R system, Octave or Maple. These solutions offer scalable high-performance resources in the cloud to their customers, who are freed from the burden of installing and managing their own clusters.

Cloudnumbers.com 1 uses the Amazon EC2 2 provider to set up computer clusters preinstalled with software for scientific computing, such as R system, Octave or Maple. Customers benefit from a web interface where they can create their own workspaces, configure and monitor the cluster, upload datasets or connect to public databases. On top of the default features of the cloud provider, Cloudnumbers offers high security standards by providing secure encryption for data transmission and storage. Overall, it is an HPC platform in the cloud, easy to create and effortless to maintain.

CloudStat 3 is a cloud integrated development environment built on R system that exposes its functionality via 2 types of user interfaces: console, for users experienced in the R language, and applications, designed as a point-and-click, forms-based interface to R for users with no R programming skills. There is also a CloudStat AppStore where users can choose applications from a growing repository.

Opani 4 offers services similar to Cloudnumbers.com, but additionally helps customers to size their cluster according to their needs: the size of the data and the time-frame for processing it. It uses Rackspace’s 5 infrastructure and supports environments such as R system, Node and Python, bundled with map/reduce, visualization, security and version control packages. Results of data analysis processes, named dashboards in Opani, can easily be visualized and shared from desktop or mobile devices.

Approaches in this class are powerful and flexible solutions, offering users the possibility to develop complex ML-DM applications that run in the cloud. Users are freed from the burden of provisioning their own distributed environments for scientific computing, while being able to use their favorite environments. On the other hand, users of these tools need extensive programming experience and strong knowledge of statistics. Perhaps due to this limited audience, the stable providers in this category are fewer than in other categories, some of them (such as CRdata.org) shutting down operations only shortly after taking off.

4 Plugins for Machine Learning tools

In this class, statistics applications (e.g. R system, Python) are extended with plugins that allow users to create a Hadoop cluster in the cloud and run time-consuming jobs over large datasets on it. Most of the interest went towards R, for which several extensions are available, compared to Python, for which less effort was invested until recently in supporting distributed processing. In this section we will mention several solutions for R and Python.

RHIPE 6 is an R package that implements a map/reduce framework for R, offering access to a Hadoop installation from within the R environment. Using specific R functions, users are able to launch map/reduce jobs that are executed on the Hadoop cluster; results are then retrieved from HDFS.

Snow 7 [16] and its variants (snowfall, snowFT) implement a framework that is able to express an important class of parallel computations and is easy to use within an interactive environment like R. It supports three types of clusters: socket-based, MPI, and PVM.

The Segue for R 8 project makes it easier to run map/reduce jobs from within the R environment on elastic clusters at Amazon Elastic Map Reduce 9.

1 http://cloudnumbers.com
2 http://aws.amazon.com/ec2/
3 http://cs.croakun.com
4 http://opani.com
5 http://rackspace.com
6 http://www.stat.purdue.edu/~sguha/rhipe/doc/html/index.html
7 http://cran.r-project.org/web/packages/available_packages_by_name.html
8 http://code.google.com/p/segue/
9 http://aws.amazon.com/elasticmapreduce/
10 https://store.continuum.io/cshop/anaconda
11 http://continuum.io

Anaconda 10 offers scalable data analytics and scientific computing in Python and is provided by Continuum Analytics 11. It is a collection of packages (NumbaPro –
fast, multi-core and GPU-enabled computation, IOPro – fast data access, and wiseRF Pine – multi-core implementation of the Random Forest) that enables large-scale data management, analysis, visualization and more. It can be installed as a full Python distribution or plugged into an existing installation.

Due to its popularity among ML-DM practitioners, R system being the preferred tool for such tasks in the past 2 years [15, 10], efforts have been made recently to parallelize lengthy processes on scalable distributed frameworks (Hadoop). This approach is largely preferred over ML in the cloud due to the possibility of re-using the existing infrastructure of research or industrial (private) data centers. To the best of our knowledge, there are no similar approaches for related mathematical tools, such as Mathematica, Maple or Matlab/Octave, except HadoopLink 12 for Mathematica. The audience of this class of solutions is also highly qualified in programming languages, mathematics, statistics and machine learning algorithms.

5 Distributed Machine Learning libraries

This category offers complex libraries operating on various distributed setups (Hadoop, Dryad, MPI). They allow users to use out-of-the-box algorithms, or implement their own, that are run in parallel mode over a cluster of computers. These solutions neither integrate nor use statistics/mathematics software; rather, they offer self-contained packages of optimised, state-of-the-art ML-DM methods and algorithms.

Apache Mahout™ 13 [12] is an Apache project to produce free implementations of distributed or otherwise scalable machine learning algorithms on the Hadoop platform [20]. It started as a collection of independent, "Hadoop-free" components, e.g. "Taste" collaborative filtering. Its goal is to build scalable machine learning libraries, where scalable has a broader meaning:

• Scalable to reasonably large datasets. Mahout’s core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop [20] using the map/reduce paradigm. However, it does not restrict contributions to Hadoop based implementations: contributions that run on a single node or on a non-Hadoop cluster are welcome as well. The core libraries are highly optimized to allow for good performance also for non-distributed algorithms.
• Scalable to support various business cases. Mahout is distributed under a commercially friendly Apache Software license.
• Scalable community. The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases.

Currently Mahout supports mainly four use cases:

• Recommendation mining takes users’ behavior and from that tries to find items users might like.
• Clustering takes e.g. text documents and groups them into groups of topically related documents.
• Classification learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category.
• Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart content) and identifies which individual items usually appear together.

Integration with initiatives such as the graph processing platform Apache Giraph 14 is actively under discussion. An active community is behind this project.

GraphLab 15 [11] is a framework for ML-DM in the Cloud. While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, GraphLab is an abstraction which naturally expresses asynchronous, dynamic, graph-parallel computation while ensuring data consistency and achieving a high degree of parallel performance, in both shared-memory and distributed settings. It is written in C++ and is able to directly access data from the Hadoop Distributed File System (HDFS) [20]. The authors report out-performing similar approaches by orders of magnitude.

12 https://github.com/shadanan/HadoopLink
13 http://mahout.apache.org
14 http://incubator.apache.org/giraph/
15 http://graphlab.org
16 http://research.microsoft.com/en-us/projects/DryadLINQ/
17 http://msdn.microsoft.com/netframework/future/linq/

DryadLINQ 16 [19, 2] is a LINQ (Language INtegrated Query 17) subsystem developed at Microsoft
Research on top of Dryad [9], a general purpose architecture for execution of data parallel applications. It supports DAG-based abstractions, inherited from Dryad, for implementing data processing algorithms. A DryadLINQ program is a sequential program composed of LINQ expressions performing arbitrary side-effect-free transformations on datasets, and can be written and debugged using standard .NET development tools. The DryadLINQ system automatically and transparently translates the data-parallel portions of the program into a distributed execution plan, which is passed to the Dryad execution platform that ensures efficient and reliable execution of this plan. The authors demonstrate near-linear scaling of execution time with the number of computers used for a job. While the DAG-based abstraction permits rich computational dependencies, it does not naturally express the iterative, data parallel, task parallel and dynamic data driven algorithms that are prevalent in ML-DM.

Jubatus 18 [8], started in April 2011, is an online/real-time machine learning platform implemented on a distributed architecture. Compared to Mahout™, it is a next-step platform that offers stream processing and online learning. In online ML, the model is continuously updated with each incoming data sample by fast algorithms that are not memory-intensive. It requires no data storage or sharing, only model mixing. It supports classification (Passive Aggressive (PA), Confidence Weighted Learning, AROW), PA-based regression, nearest neighbor (LSH, MinHash, Euclid LSH), recommendation, anomaly detection (LOF based on NN) and graph analysis (shortest path, PageRank). In order to efficiently support online learning, Jubatus applies updates to local models; each server then transmits its model difference, and the differences are merged and distributed back to all servers. The mixed model improves gradually thanks to all servers’ work.

IBM Parallel Machine Learning Toolbox 19 [13] (PMLT), a joint effort of the Machine Learning group at the IBM Haifa Lab and the Data Analytics department at the IBM Watson Lab, provides tools for execution of data mining and machine learning algorithms on multiple processor environments or on multiple threaded machines. The toolbox comprises two main components: an API for running the users’ own machine learning algorithms, and several pre-programmed algorithms which serve both as examples and for comparison. The pre-programmed algorithms include a parallel version of the Support Vector Machine (SVM) classifier, linear regression, transform regression, nearest neighbors, k-means, fuzzy k-means, kernel k-means, PCA, and kernel PCA. One of the main advantages of the PML toolbox is the ability to run it on a variety of operating systems and platforms, from multi-core laptops to supercomputers such as BlueGene. This is because the toolbox incorporates a parallelization infrastructure that completely separates parallel communications, control, and data access from learning algorithm implementation. This approach enables learning algorithm designers to focus on algorithmic issues without having to concern themselves with low-level parallelization issues. It also enables learning algorithms to be deployed on multiple hardware architectures, running either serially or in parallel, without having to change any algorithmic code. The toolbox uses the popular MPI library as the basis for its operation, and is written in C++. Despite our efforts to get the latest news on this project, we found no activity since 2007, except for a chapter in [1] (2012). On the other hand, the toolkit is suited for parallel environments, not for distributed ones.

18 http://jubat.us/
19 https://www.research.ibm.com/haifa/projects/verification/ml_toolbox/index.html

NIMBLE [6] is a sequel project to the Parallel Machine Learning Toolbox, also developed at IBM Research Labs. It exposes a multi-layered framework where developers may express their ML-DM algorithms as tasks. Tasks are then passed to the next layer, an architecture independent layer, composed of one queue of DAGs of tasks, plus a pool of worker threads that unfold this queue. The next layer is an architecture dependent layer that translates the generic entities from the upper layer into various runtimes. Currently, NIMBLE supports execution on the Hadoop platform [20] only. Other platforms, such as Dryad [9], are also good candidates, but are not yet supported. Advantages of this framework include:

• higher level of abstraction, hiding low-level control and choreography details of most of the distributed and parallel programming paradigms (MR, MPI etc), allowing programmers to compose parallel ML-DM algorithms using reusable (serial and parallel) building blocks
• portability: by providing a specific implementation of the architecture dependent layer, the same code can be executed on various distributed runtimes
• efficiency and scalability: due to optimisations introduced by DAGs of tasks and co-scheduling, results presented in [6] for the Hadoop runtime show speedup improvement with increasing dataset size and dimensionality.
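To make the idea of a DAG of tasks unfolded by a pool of worker threads more concrete, here is a minimal single-machine sketch in Python. It is not NIMBLE's API: the function `run_dag`, the task names and the toy data are all hypothetical, and the sketch only illustrates the general scheduling pattern (a task is submitted to the pool once all of its dependencies have produced results).

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_dag(tasks, deps, max_workers=4):
    """Run a DAG of tasks on a thread pool (illustrative helper, not a NIMBLE API).

    tasks: dict mapping task name -> callable taking a dict of dependency results
    deps:  dict mapping task name -> list of task names it depends on
    Returns a dict mapping task name -> result.
    """
    results = {}
    pending = set(tasks)
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # Submit every pending task whose dependencies have all finished.
            ready = [t for t in pending
                     if all(d in results for d in deps.get(t, []))]
            for t in ready:
                pending.remove(t)
                inputs = {d: results[d] for d in deps.get(t, [])}
                running[pool.submit(tasks[t], inputs)] = t
            if not running:
                raise ValueError("cycle or missing dependency in task graph")
            # Block until at least one running task completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results


# Toy example: three independent partial sums feed one combining task.
data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
tasks = {
    "sum0": lambda _: (sum(data[0]), len(data[0])),
    "sum1": lambda _: (sum(data[1]), len(data[1])),
    "sum2": lambda _: (sum(data[2]), len(data[2])),
    "mean": lambda r: sum(s for s, _ in r.values()) / sum(n for _, n in r.values()),
}
deps = {"mean": ["sum0", "sum1", "sum2"]}
result = run_dag(tasks, deps)
print(result["mean"])  # 3.5
```

In this sketch the three partial sums can run concurrently, while the combining task waits for all of them. A real architecture dependent layer would replace the thread pool with a distributed runtime such as Hadoop.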

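Returning to the Jubatus update scheme described earlier in this section (local updates on each server, then merging of model differences), the following Python sketch shows one possible reading of it. The Passive Aggressive step follows the standard textbook formulation for binary labels. The mixing rule, which averages the differences, is an assumption made for illustration, since the text only states that differences are merged. All function names are hypothetical and do not correspond to the Jubatus API.

```python
def pa_update(w, x, y):
    """One Passive Aggressive step for a sparse example x and label y in {-1, +1}."""
    margin = y * sum(w.get(k, 0.0) * v for k, v in x.items())
    loss = max(0.0, 1.0 - margin)  # hinge loss
    norm_sq = sum(v * v for v in x.values())
    if loss > 0.0 and norm_sq > 0.0:
        tau = loss / norm_sq  # smallest step that fixes the margin violation
        for k, v in x.items():
            w[k] = w.get(k, 0.0) + tau * y * v


def mix(shared, local_models):
    """Apply the average of the servers' differences to the shared model.

    Averaging is an assumed merge rule, chosen only for illustration.
    """
    keys = set(shared)
    for m in local_models:
        keys |= set(m)
    n = len(local_models)
    return {
        k: shared.get(k, 0.0)
        + sum(m.get(k, 0.0) - shared.get(k, 0.0) for m in local_models) / n
        for k in keys
    }


def train_round(shared, streams):
    """Each simulated server updates a local copy on its own stream, then models are mixed."""
    local_models = []
    for stream in streams:
        w = dict(shared)
        for x, y in stream:
            pa_update(w, x, y)
        local_models.append(w)
    return mix(shared, local_models)


# Two simulated servers, each seeing one example.
streams = [[({"a": 1.0}, 1)], [({"b": 1.0}, -1)]]
mixed = train_round({}, streams)
print(mixed == {"a": 0.5, "b": -0.5})  # True
```

Note that no training example leaves its server in this sketch; only the model weights are exchanged, which matches the "no data storage or sharing, only model mixing" property stated above.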
SystemML [7], developed at IBM Research Labs like NIMBLE and PMLT, proposes an R-like language (Declarative Machine Learning language) that includes linear algebra primitives and shows how it can be optimized and compiled down to MapReduce. The authors report an extensive performance evaluation of three ML algorithms (Group Nonnegative Matrix Factorization, Linear regression, Page Rank) on varying data and Hadoop cluster sizes.

Table 5 presents a synthesis of the investigated platforms. One can notice that Java is the preferred environment, due to the large adoption and usage of Hadoop as distributed processing model. The good news is that the most active and lively solutions are the open-source ones. The target audience of this class of products is programmers, system developers and ML experts who need fast, scalable distributed solutions for ML-DM problems.

6 Complex Machine Learning systems

This section presents several solutions for business intelligence and data analytics that share a set of common features: (i) all are deployable on on-premise or in-the-cloud clusters, (ii) they provide a rich set of graphical tools to analyse, explore and visualize large amounts of data, (iii) they expose a rather limited set of ML-DM functions, usually limited to prediction models, and (iv) they utilize Apache Hadoop [20] as processing engine and/or storage environment. They differ in how data is integrated and processed, in the supported data sources, and in the complexity of the system. Here are the best known ones:

Kitenga Analytics 20, recently purchased by Dell, is a native Hadoop application that offers visual ETL, search based on Apache Solr™ 21, natural language processing, Apache Mahout-based data mining, and advanced visualization capabilities. It is a big data environment for sophisticated analysts who want a robust toolbox of analytical tools, all from an easy-to-use interface that does not require understanding of complex programming or the Apache Hadoop stack itself.

Pentaho Business Analytics 22 offers a complete solution for big data analytics, supporting all phases of an analytics process, from pre-processing to advanced data exploration and visualization. It offers (i) a complete visual design tool to accelerate data preparation and modeling, (ii) data integration from NoSQL and relational databases, (iii) distributed execution on the Hadoop platform [20], (iv) instant and interactive analysis (no code, no ETL (Extract, Transform, Load)) and (v) a business analytics platform: data discovery, exploration, visualization and predictive analytics. Main characteristics of the Pentaho solution include:

• MapReduce-based data processing
• Can be configured for different Hadoop distributions (such as Cloudera, Hadapt etc.)
• Data can be loaded and processed into Hadoop HDFS, HBase 23, or Hive 24
• Supports Pig scripts
• Native support for most NoSQL databases, such as Apache Cassandra, DataStax, Apache HBase, MongoDB, 10gen etc.
• Enables performance-optimized data analysis, reporting and data integration for analytic databases (such as Teradata, monetdb, Netezza etc.), through deep integration with native SQL dialects and parallel bulk data loader
• Integration with HPCC (High Performance Computing Cluster) from LexisNexis Risk Solutions 25
• Import/export from/to PMML (Predictive Modeling Markup Language)
• Pentaho Instaview, a visual application to reduce the time needed to deploy data analytics solutions and to help novice users get insights from their data in three simple steps: select data source, automatically prepare data for analytics, and visualize and explore built models.
• Pentaho Mobile - application for iPad that provides interactive business analytics for business users

20 http://www.quest.com/news-release/quest-software-expands-its-big-data-solution-with-new-hadoop-ce-102012-818658.aspx
21 http://lucene.apache.org/solr/
22 http://www.pentaho.org
23 http://hbase.apache.org
24 hive.apache.org
25 http://hpccsystems.com

The Pentaho ecosystem is composed of several powerful systems, each of them a complex project of its own:

Pentaho BI Platform/Server: the BI platform is a framework providing core services, such as authentication, logging, auditing and rules engines; it also has a solution engine that integrates all other systems (reporting, analysis, integration and data mining); BI Server is the most well known implementation of the platform, which functions as a web based report
management system, application integration server and lightweight workflow engine.

Pentaho Reporting, based on JFreeReport, is a suite of open-source tools (Pentaho Report Designer, Pentaho Reporting Engine, Pentaho Reporting SDK and the common reporting libraries shared with the entire Pentaho BI Platform) that allows users to create relational and analytical reports from a variety of sources, outputting results in various formats (HTML, PDF, Excel etc.);

Pentaho Data Integration (Kettle) delivers powerful ETL capabilities using a metadata-driven approach with an intuitive, graphical, drag and drop design environment;

Pentaho Analysis Service (Mondrian) is an Online Analytical Processing (OLAP) server that supports data analysis in real-time;

Pentaho Data Mining (Weka): a collection of machine learning algorithms for classification, regression, clustering and association rules.

Platfora 26 delivers in-memory business intelligence with no separate data warehouse or ETL required. Its visual interface, built on HTML5, allows business users to analyse data. Results may be easily shared between users. It relies on a Hadoop cluster that can be installed either on premise or on cloud providers (Amazon EMR and S3). It is primarily focused on BI features, such as elaborate visualization types (charts, plots, maps) or slice-and-dice operations, but also offers a predictive analysis framework.

Skytree Server 27 is a general purpose machine learning and data analytics system that supports data coming from relational databases, Hadoop systems, or flat files and offers connectors to common statistical packages and ML libraries. ML methods supported are: Support Vector Machine (SVM), Nearest Neighbor, K-Means, Principal Component Analysis (PCA), Linear Regression, 2-point correlation and Kernel Density Estimation (KDE). Skytree Server connects with analytics front-ends, such as Web services or statistical and ML libraries (R, Weka), for data visualization. Its deployment options include cloud providers or a dedicated cluster based on Linux machines. It also supports customers in estimating the size of the cluster they need by a simple formula (Analytics Requirements Index).

WibiData 28 is a complex solution based on an open source software stack from Apache, combining Hadoop, HBase and Avro with proprietary components. WibiData’s machine learning libraries provide the tools to start building complex data processing pipelines immediately. WibiData also provides graphical tools to export data from its distributed data repository into any relational database [21]. In order to simplify data processing using Hadoop, WibiData introduces the concepts of producers, which are computation functions that update a row in a table, and gatherers, which close the gap between WibiData tables and the key-value pairs processed by the Hadoop MapReduce engine.

We are aware that we could not cover all the solution providers in the field of business intelligence and big data analytics. We tried to cover those who also offer ML components in their applications; many others that focus only on big data analytics, such as Alteryx, SiSense, SAS or SAP, are omitted from this survey. Solutions in this category target mostly business users who need to quickly and easily extract insights from their data, and are good candidates for users with less computer or statistics background.

7 Software as a Service providers for Machine Learning

This section focuses on platform-as-a-service or software-as-a-service providers for machine learning problems. They offer their services mainly via RESTful interfaces, and in some (rare) cases the solution may also be installed on-premise (Myrrix), in contrast to solutions from the previous section, which are mainly systems deployable on private data centers. As a class of ML problems, predictive modeling is the favorite (BigML, Google Prediction API, Eigendog) among these systems. We did not include in this study providers of SQL over Hadoop solutions (e.g. Cloudera Impala, Hadapt, Hive) because their main target is not ML-DM, but rather fast, elastic and scalable SQL processing of relational data using the distributed architecture of Hadoop.

BigML 29 is a SaaS approach to machine learning. Users can set up data sources, create, visualize and share prediction models (only decision trees are supported), and use models to generate predictions, all from a Web interface or programmatically using a REST API.

26 http://platfora.com
27 http://skytree.net
28 http://wibidata.com
29 http://bigml.com
30 http://bityota.com

BitYota 30 is a young start-up (2012) SaaS provider of a BigData warehousing solution. On top of data integration from different sources (relational, NoSQL, HDFS) it also allows customers to run statistics and summarization queries in SQL92, standard R statistics and custom functions written in JavaScript, Perl, or
Python on a parallel analytics engine. Results are visualized by integrating with popular BI tools and dashboards.

Precog 31 offers a more elaborate SaaS solution composed of the Precog database, the Quirrel language, and the ReportGrid and LabCoat tools. At the core of Precog is an original (not based on Hadoop or any other NoSQL store), schemaless, columnar database designed for storing and analyzing semi-structured, measured data, such as events (users clicking, engaging, and buying), sensor data, activity stream data, facts, and other kinds of data that do not need to be mutably updated. Precog's functionality is exposed through REST APIs, and client libraries are available in JavaScript, Python, PHP, Ruby, Java, and C#. LabCoat is a GUI tool for creating and managing Quirrel queries. Quirrel is a highly expressive data analysis language that makes it easy to do in-database analytics, statistics, and machine learning across any kind of measured data. Results are available in JSON or CSV formats. ReportGrid is an HTML5 visualization engine that builds reports and charts interactively or programmatically.

Google Prediction API 32 is Google's cloud-based machine learning tool for analyzing data. It is closely connected to Google Cloud Storage 33, where training data is stored, and offers its services through a RESTful interface, with client libraries allowing programmers to connect from Java, JavaScript, .NET, Ruby, Python etc. In the first step, the model needs to be trained from data; the supported models are classification and regression for now. After the model is built, one can query it to obtain predictions on new instances. Adding new data to a trained model is called Streaming Training and is also supported. Recently, a PMML preprocessing feature has been added: Prediction API supports preprocessing your data against a PMML transform specified using PMML 4.0 syntax, but does not support importing a complete PMML model that includes data. Created models can be shared as hosted models in the marketplace.

EigenDog 34 is a service for scalable predictive modeling, hosted on the Amazon EC2 (for computation) and S3 (for data and model storage) platforms. It builds decision tree models from data in Weka's ARFF format. Models can be downloaded in binary format and integrated into user applications through an API or an open-source library provided by the vendor.

Metamarkets 35 presents itself as a Data Science-as-a-Service provider, helping users to get insights out of their large datasets. It offers end-users the possibility to perform fast, ad-hoc investigations on data, to discover new and unique anomalies, and to spot trends in data streams, based on statistical models, in an intuitive, interactive and collaborative way. It is focused on business people who are less knowledgeable in statistics and machine learning.

Myrrix 36 is a complete, real-time, scalable recommender system built using Apache Mahout™ (see Section 5). It can be accessed as PaaS using a RESTful interface. It is able to incrementally update the model once new data is available. It is organized in 2 layers, Serving (open source and free) and Computation (Hadoop based), which can also be deployed on-premise, either both of them or only one.

Prior Knowledge Veritable API 37 offers Python and Ruby interfaces; users upload data to its servers and build prediction models using Markov Chain Monte Carlo samplers. The company operated a cloud-based infrastructure built on Amazon Web Services. SalesForce.com acquired Prior Knowledge at the end of 2012.

Predictobot 38 by Prediction Appliance also aims at making machine learning modeling easier. The user uploads a spreadsheet of data, answers a few questions, and then downloads a spreadsheet with the predictive model. It aims to bring predictive modeling to anyone with the skills to make a spreadsheet. The business is still in stealth mode.

31 http://precog.com
32 https://developers.google.com/prediction/
33 https://developers.google.com/storage/
34 https://eigendog.com/#home
35 http://metamarkets.com/
36 http://myrrix.com
37 http://priorknowledge.com
38 http://predictobot.com

7.1 Text mining as SaaS

Due to the explosion of social media technologies, such as blog platforms (WordPress.com, Blogger etc.), mini-blogging (Twitter), or social networks (Facebook, Google+), increased interest is being paid to text mining and natural language processing (NLP) solutions delivered as services to their customers. This is why we devoted an entire subsection to group together software/platform-as-a-service solutions for text mining. Before reviewing available solutions, a short introduction to NLP and text mining is helpful.

While NLP uses linguistically inspired techniques (text is syntactically parsed using information from a formal grammar and a lexicon, and the resulting information is then interpreted semantically and used to extract information) to deeply analyse the document,
text mining is more recent and uses techniques developed in the fields of information retrieval, statistics, and machine learning. In contrast with NLP, the aim of text mining is not to understand what is "said" in a text, but rather to extract patterns across a large number of documents. Features of text mining include concept/entity extraction, text clustering, summarization, and sentiment analysis.

The size and number of documents that need to be processed, plus real-time processing constraints, contribute to the development of novel, distributed toolkits able to answer demanding users' needs. Website operators want to offer text mining features to their visitors with minimum investment and reduced maintenance costs. Thus, more and more providers are offering text mining services through RESTful web services, saving clients from costly infrastructures and deployments. Without aiming at providing an exhaustive survey of text mining P(S)aaS providers, we mention several of them hereafter:

AlchemyAPI 39 is a cloud-based text mining SaaS platform that claims the most comprehensive set of NLP capabilities of any text mining platform, including: named entity extraction, sentiment analysis, concept tagging, author extraction, relations extraction, web page cleaning, language detection, keyword extraction, quotations extraction, intent mining, and topic categorization. AlchemyAPI uses deep linguistic parsing, statistical natural language processing, and machine learning to analyze content, extracting semantic meta-data: information about people, places, companies, topics, languages, and more. It provides RESTful API endpoints and SDKs in all major programming languages, and responses are encoded in various formats (XML, JSON, RDF). Organizations with specific data security needs or regulatory constraints are offered the possibility to install the solution in their own environment.

NathanApp™ 40 is AI-one's general purpose machine learning PaaS, also available for on-premise deployment as NathanNode™. Like Topic-Mapper, it is ideally suited to learn the meaning of any human language by learning the context of words, only faster and with greater deployment flexibility. NathanApp is a RESTful API using JavaScript and JSON.

TextProcessing 41 is also an NLP API; it supports stemming and lemmatization, sentiment analysis, tagging and chunk extraction, phrase extraction and named entity recognition. These services are offered open and free (for limited usage) via RESTful API endpoints; client libraries exist in Java, Python, Ruby, PHP and Objective-C; responses are JSON encoded; and Python NLTK demos are offered to ease the learning curve. For commercial purposes, clients are offered monthly subscriptions via Mashape.com.

Yahoo! Content Analysis Web Service 42 detects entities/concepts, categories, and relationships within unstructured content. It ranks the detected entities/concepts by their overall relevance, resolves them, if possible, into Wikipedia pages, and annotates tags with relevant meta-data. The service is available as a YQL table and the response is in XML format. It is freely available for non-commercial usage.

This section presented PaaS solutions addressing, to some extent, machine learning problems. A special sub-section was devoted to the text mining problem due to its prevalence in the ML PaaS landscape. We notice big players, such as Yahoo! or Google, as well as many start-ups with millions of dollars in funding. They offer Web developers the possibility to easily integrate ML intelligence into their sites. Ease of use prevailed over the functionality offered by these services, so there are only limited options for tweaking the algorithms behind them. Thus, these are good candidates for users with basic ML needs, but are not flexible enough for addressing more advanced problems.

39 http://www.alchemyapi.com
40 http://ai-one.com
41 http://text-processing.com
42 http://developer.yahoo.com/search/content/V2/contentAnalysis.html

8 Conclusions and future work

Our main findings are synthesized below:

(1) Existing programming paradigms for expressing large-scale parallelism, such as MapReduce (MR) and the Message Passing Interface (MPI), are the de facto choices for implementing ML-DM algorithms. More and more interest has been devoted to MR due to its ability to handle large datasets and its built-in resilience against failures.

(2) Machine Learning in distributed environments comes in different approaches, offering viable and cost-effective alternatives to traditional ML and statistical applications, which are not focused on distributed environments [14].

(3) Existing solutions target either experienced, skilled computer scientists, mathematicians and statisticians, or novice users who are happy with no (or few) possibilities to tune the algorithms. End-user support and guidance is largely missing from existing distributed ML-DM solutions.

After reviewing over 30 different offers on the market, we think that there is still room for a scalable,
easy to use and deploy solution for ML-DM in the context of the cloud computing paradigm, targeting end-users with less programming or statistical experience who are nevertheless willing to run and tweak advanced scientific ML tasks, such as researchers and practitioners from fields like medicine, finance, telecommunications etc. In this respect, our future plans include prototyping such a distributed system, relying on existing distributed ML-DM frameworks but enhancing them with usability and user friendliness features.

Acknowledgments

This work was supported by EC-FP7 project FP7-REGPOT-2011-1 284595 (HOST).

References

[1] R. Bekkerman, M. Bilenko and J. Langford (editors) – Scaling up Machine Learning, Cambridge University Press, 2012, summary at http://people.cs.umass.edu/~ronb/scaling_up_machine_learning.htm

[2] M. Budiu, D. Fetterly, M. Isard, F. McSherry, and Y. Yu – Large-Scale Machine Learning using DryadLINQ, in R. Bekkerman, M. Bilenko and J. Langford (editors) – Scaling up Machine Learning, Cambridge University Press, 2012

[3] S. Charrington – Three New Tools Bring Machine Learning Insights to the Masses, February 2012, Read Write Web, http://www.readwriteweb.com/hack/2012/02/three-new-tools-bring-machine.php

[4] W. Eckerson – New technologies for Big Data, http://www.b-eye-network.com/blogs/eckerson/archives/2012/11/new_technologie.php (2012)

[5] D. Harris – 5 low-profile startups that could change the face of big data, January 2012, http://gigaom.com/cloud/5-low-profile-startups-that-could-change-the-face-of-big-data/

[6] A. Ghoting, P. Kambadur, E. Pednault, and R. Kannan – NIMBLE: A Toolkit for the Implementation of Parallel Data Mining and Machine Learning Algorithms on MapReduce, KDD 11

[7] A. Ghoting et al. – SystemML: Declarative machine learning on MapReduce. In Proceedings of the 2011 IEEE 27th International Conference on Data Engineering, ICDE 11, pages 231-242, Washington, DC, USA, 2011

[8] S. Hido – Jubatus: Distributed Online Machine Learning Framework for Big Data, XLDB Asia, Beijing, 2012, http://www.slideshare.net/JubatusOfficial/distributed-online-machine-learning-framework-for-big-data

[9] M. Isard et al. – Dryad: distributed data-parallel programs from sequential building blocks. In SIGOPS Operating System Review, 2007

[10] KD Nuggets Survey 2012, http://www.kdnuggets.com/software/suites.html

[11] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, J. M. Hellerstein – Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, Proceedings of the VLDB Endowment, Vol. 5, No. 8, August 2012, Istanbul, Turkey

[12] S. Owen, R. Anil, T. Dunning, E. Friedman – Mahout in Action, Manning Publications, 2011, ISBN 978-1935182689

[13] E. Pednault, E. Yom-Tov, A. Ghoting – IBM Parallel Machine Learning Toolbox, in R. Bekkerman, M. Bilenko and J. Langford (editors) – Scaling up Machine Learning, Cambridge University Press, 2012

[14] D. Pop, G. Iuhasz – Survey of Machine Learning Tools and Libraries, Institute e-Austria Timişoara Technical Report, 2011

[15] Rexer Analytics Survey 2011, http://www.rexeranalytics.com/Data-Miner-Survey-Results-2011.html

[16] L. Tierney, A. J. Rossini, Na Li – Snow: A parallel computing framework for the R System, Int J Parallel Prog (2009) 37:78–90, DOI 10.1007/s10766-008-0077-2

[17] S. R. Upadhyaya – Parallel approaches to machine learning: A comprehensive survey, Journal of Parallel and Distributed Computing, Volume 73, Issue 3, March 2013, Pages 284–292

[18] B. Werther – Pre-industrial age of big data, June 2012, http://www.platfora.com/pre-industrial-age-of-big-data/

[19] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. Kumar Gunda, J. Currey – DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, In OSDI, 2008

[20] Apache Hadoop Website, http://hadoop.apache.org (2012)

[21] WibiData How It Works, http://www.wibidata.com/product/how-it-works/ (2012)

Daniel Pop received his PhD degree in computer science from West University of Timişoara in 2006. He is currently a senior researcher at the Department of Computer Science, Faculty of Mathematics and Computer Science, West University of Timişoara. His research interests cover high performance computing and distributed computing technologies, machine learning and knowledge discovery and representation, and multi-agent systems. He also has broad experience in the IT industry (15+ years), where he applied agile software development processes, such as SCRUM and Kanban.

Name       Platform      Licensing   Language  Activity
Mahout     Hadoop        Apache 2    Java      High
GraphLab   MPI / Hadoop  Apache 2    C++       High
DryadLINQ  Dryad         Commercial  .NET      Low
Jubatus    ZooKeeper     LGPL 2      C++       Medium
NIMBLE     Hadoop        ?           Java      Low
SystemML   Hadoop        ?           DML       Low

Table 1. Distributed Frameworks for ML-DM
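
Several frameworks in Table 1 run on Hadoop, and finding (1) of the conclusions names MapReduce as a de facto choice for implementing ML-DM algorithms. As an illustration only, the following minimal sketch shows the map, shuffle and reduce phases computing a per-label feature mean, a typical sufficient statistic in ML algorithms. It is a single-process Python toy, not the Hadoop API and not code from any surveyed framework; the record schema and all function names are our assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical toy schema: each record is (label, feature value).
Record = Tuple[str, float]
# Partial statistic carried between phases: (sum, count).
Partial = Tuple[float, int]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, Partial]]:
    """Map: emit one (key, partial statistic) pair per input record."""
    for label, value in records:
        yield label, (value, 1)


def shuffle(pairs: Iterable[Tuple[str, Partial]]) -> Dict[str, List[Partial]]:
    """Shuffle: group emitted values by key.

    On a real cluster the framework performs this step across nodes.
    """
    groups: Dict[str, List[Partial]] = defaultdict(list)
    for key, partial in pairs:
        groups[key].append(partial)
    return groups


def reduce_phase(key: str, partials: List[Partial]) -> Tuple[str, float]:
    """Reduce: combine the partial sums and counts of one key into a mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return key, total / count


def per_label_mean(splits: List[List[Record]]) -> Dict[str, float]:
    """Map each input split independently, then shuffle and reduce."""
    emitted = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two input splits, as if stored on two different nodes.
    splits = [
        [("spam", 4.0), ("ham", 1.0)],
        [("spam", 6.0), ("ham", 3.0), ("ham", 2.0)],
    ]
    print(per_label_mean(splits))
```

Because each split is mapped independently and the reducer only combines sums and counts, the same computation can be spread over many machines, and a failed map task can be rerun on its split alone, which is the resilience property mentioned in finding (1).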
