Ingesting Data: © Hortonworks Inc. 2011 - 2018. All Rights Reserved
nfs gateway
The Files View is an Ambari Web UI plug-in providing a graphical interface to HDFS. Actions available:
– Create a directory
– Upload a file
– Rename a directory
– Go up one directory
– Delete to Trash or permanently
– Move to another directory
– Go to a directory
– Download to the local system
⬢ REST API for accessing all of the HDFS file system interfaces:
– http://host:port/webhdfs/v1/test/mydata.txt?op=OPEN
– http://host:port/webhdfs/v1/user/train/data?op=MKDIRS
– http://host:port/webhdfs/v1/test/mydata.txt?op=APPEND
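As a rough illustration of the URL pattern above, the following Python sketch builds the request line for the three operations listed. The function name `webhdfs_request`, the host name, and the port are assumptions made for this example. The sketch only constructs the method and URL and sends nothing over the network.

```python
from urllib.parse import urlencode

# HTTP verb WebHDFS expects for each operation shown above.
HTTP_METHOD = {"OPEN": "GET", "MKDIRS": "PUT", "APPEND": "POST"}


def webhdfs_request(host, port, hdfs_path, op, **params):
    """Return (method, url) for a WebHDFS call; nothing is sent over the network."""
    op = op.upper()
    if op not in HTTP_METHOD:
        raise ValueError(f"unsupported op in this sketch: {op}")
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    query = urlencode({"op": op, **params})
    return HTTP_METHOD[op], f"http://{host}:{port}/webhdfs/v1{hdfs_path}?{query}"


if __name__ == "__main__":
    # "namenode.example.com" and 50070 are placeholder values for illustration.
    for path, op in [("/test/mydata.txt", "OPEN"), ("/user/train/data", "MKDIRS")]:
        print(*webhdfs_request("namenode.example.com", 50070, path, op))
```

The returned pair can be handed to any HTTP client. A secured cluster would also need authentication, which this sketch leaves out.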
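[Diagram: NFS Gateway architecture. An NFS client connects to the NFS Gateway over NFSv3. The gateway's DFSClient uses ClientProtocol to reach the NameNode (NN) and DataTransferProtocol to reach the DataNodes (DN).]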
⬢ Credentials can be included in the connect string or supplied with the --username and
--password arguments
⬢ Must specify either a table to import using --table or the result of an SQL query using
--query
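To make the two rules above concrete, here is a small Python sketch that assembles an argument list for `sqoop import`. The helper name `sqoop_import_args` and every value passed to it are hypothetical. The sketch only builds the list and never runs Sqoop. It also assumes a single mapper for free-form queries, because Sqoop needs a `--split-by` column for parallel query imports and this example does not model one.

```python
def sqoop_import_args(connect, target_dir, table=None, query=None,
                      username=None, password=None, num_mappers=1):
    """Build an argv list for `sqoop import`; the command is not executed."""
    # Exactly one of --table / --query must be given.
    if (table is None) == (query is None):
        raise ValueError("specify exactly one of table or query")
    args = ["sqoop", "import", "--connect", connect]
    if username is not None:
        args += ["--username", username]
    if password is not None:
        args += ["--password", password]
    if table is not None:
        args += ["--table", table]
    else:
        # Free-form queries must contain the $CONDITIONS placeholder.
        if "$CONDITIONS" not in query:
            raise ValueError("query must include $CONDITIONS")
        if num_mappers > 1:
            raise ValueError("this sketch does not handle --split-by")
        args += ["--query", query]
    # Each mapper opens its own database connection, so keep this modest.
    args += ["--target-dir", target_dir, "--num-mappers", str(num_mappers)]
    return args


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    print(" ".join(sqoop_import_args(
        "jdbc:mysql://db.example.com/sales", "/user/train/orders",
        table="orders", username="train", password="secret", num_mappers=4)))
```

The resulting list could be passed to `subprocess.run` on a host where Sqoop is installed.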
[Diagram: a Flume Agent (a background process) made up of a Source, a Channel, and a Sink. Log data, event data, social media, etc. enter at the Source; the Sink delivers to a Hadoop cluster, including to HDFS.]
Flume uses a Channel between the Source and Sink to decouple the processing of events from the storing of events.
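The decoupling idea can be sketched in plain Python with a bounded queue standing in for the Channel. This is not the Flume API: `run_agent`, the capacity value, and the in-memory list that stands in for HDFS are all assumptions made for illustration.

```python
import queue
import threading

_DONE = object()  # sentinel telling the sink that the source has finished


def run_agent(events, capacity=2):
    """Pass events from a source thread to a sink thread through a bounded channel."""
    channel = queue.Queue(maxsize=capacity)
    stored = []

    def source():
        for event in events:
            channel.put(event)     # blocks while the channel is full
        channel.put(_DONE)

    def sink():
        while True:
            event = channel.get()  # blocks while the channel is empty
            if event is _DONE:
                break
            stored.append(event)   # stand-in for writing to HDFS

    threads = [threading.Thread(target=source), threading.Thread(target=sink)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stored


if __name__ == "__main__":
    print(run_agent(["log line 1", "log line 2", "log line 3"]))
```

Because the source and sink only share the queue, either side can run faster or slower than the other without losing events, up to the channel's capacity.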
[Diagram: a Storm topology, in which spouts emit streams that are consumed by bolts.]
Various types of message queues are often the source of the data processed by
real-time processing engines like Storm
– Real-time data source: operating systems, services and applications, sensors
– Message: log entries, events, errors, status messages, etc.
– Queue: Kestrel, RabbitMQ, AMQP, Kafka, JMS, others…
– Storm: data from the queue is read by Storm
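The queue-to-Storm flow can be imitated with Python generators to show the roles of a spout and bolts. This is a conceptual sketch only and does not use the Storm API: the names `spout`, `level_bolt`, `count_bolt`, and `run_topology`, and the sample log lines, are hypothetical.

```python
from collections import deque


def spout(message_queue):
    """Emit each message from the queue as a one-field tuple."""
    while message_queue:
        yield (message_queue.popleft(),)


def level_bolt(stream):
    """Keep only the severity word at the start of each log line."""
    for (line,) in stream:
        yield (line.split(" ", 1)[0],)


def count_bolt(stream):
    """Count how many tuples arrived for each severity level."""
    counts = {}
    for (level,) in stream:
        counts[level] = counts.get(level, 0) + 1
    return counts


def run_topology(messages):
    """Wire spout -> level_bolt -> count_bolt over an in-memory queue."""
    return count_bolt(level_bolt(spout(deque(messages))))


if __name__ == "__main__":
    print(run_topology(["ERROR disk full", "INFO started", "ERROR timeout"]))
```

In a real deployment the deque would be replaced by one of the message queues listed above, and the spout and bolts would run in parallel across the cluster.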
[Diagram: Spark Streaming and its DStream abstraction.]
[Diagram: data from the Internet of Anything flows through Hortonworks DataFlow (HDF), powered by Apache NiFi, which yields perishable insights, and into the Hortonworks Data Platform (HDP), powered by Apache Hadoop, which yields historical insights.]
Hortonworks DataFlow and the Hortonworks Data Platform
deliver the industry’s most complete Big Data solution
[Diagram: example streaming architecture. Sources: raw network stream, network metadata stream, syslog. Ingest: HDF, Kafka. Processing on Hadoop: Storm, Spark. Data stores: Phoenix, HBase, Hive, SOLR.]
⬢ There are many different ways to ingest data, including custom solutions written via
HDFS APIs as well as vendor connectors
⬢ Streaming and batch workflows can work together in a holistic system
⬢ The NFS Gateway may help some legacy systems populate data into HDFS
⬢ Sqoop’s configurable number of database connections can overload an RDBMS
⬢ The following are streaming frameworks:
– Flume
– Storm
– Spark Streaming
– HDF / NiFi