© Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information.
Open Source Computer Vision with TensorFlow, MiniFi,
Apache NiFi, OpenCV, Apache Tika and Python
Timothy Spann, Solutions Engineer
Hortonworks @PaaSDev
Vision Thing
Disclaimer
• This document may contain product features and technology directions that are under
development, may be under development in the future or may ultimately not be
developed.
• Technical feasibility, market demand, user feedback, and the Apache Software
Foundation community development process can all affect timing and final delivery.
• This document’s description of these features and technology directions does not
represent a contractual commitment, promise or obligation from Hortonworks to deliver
these features in any generally available product.
• Product features and technology directions are subject to change, and must not be
included in contracts, purchase orders, or sales agreements of any kind.
• Since this document contains an outline of general product development plans,
customers should not rely upon it when making a purchase decision.
Agenda
• Architecture
• OpenCV
• TensorFlow
• Apache Tika
• Apache NiFi and MiniFi
• Apache Kafka
• Schema Registry
• Streaming Analytics Manager
• Demos
• Questions
Use Cases
So Why Am I Ingesting Images From Edge Devices?
Object Recognition
• WebCam Security
• Anomaly Detection
• Logging
• Metadata about images
• Customer Analysis
Image Classification
Motion Estimation
• Movement tracking
• Security
• Occupied Room
Active Archive
• Store all images
• Training datasets
• Joining With Other Data
• Cameras Everywhere
Architecture
Open Computer Vision Flow
Ingestion
Simple Event Processing
Engine
Stream Processing
Destination
Data Bus
Build Predictive Model From Historical Data
Deploy Predictive Model For Real-time Insights
Perishable Insights
Historical Insights
Open Source Image Analytical Components
Streaming Analytics Manager
Image Ingest
Distributed queue
Buffering
Process decoupling
Routing and Pre-Processing
Orchestration
Queueing
Simple Event Processing
Image Capture
Image Processing
Streaming Analytics Manager
Part of MiniFi C++ Agent
Detect metadata and data
Extract metadata and data
Content Analysis
Deep Learning Framework
Open Source Image Analytical Components
Enabling Record Processing
Schema Management
Collect: Bring Together
Aggregate all data from sensors, geo-location devices, machines and social feeds.
Conduct: Mediate the Data Flow
Mediate point-to-point and bi-directional data flows, delivering data reliably to Apache HBase, Apache Hive, Slack and Email.
Curate: Gain Insights
Parse, filter, join, transform, fork, query, sort, dissect; enrich with weather, location, and TensorFlow.
{
"imagefilename" : "/opt/demo/images/2018-04-17_1127.jpg",
"yaw" : 100.0,
"host" : "sensehatmovidius",
"top3" : "n06874185 traffic light, traffic signal, stoplight",
"top5" : "n03773504 missile",
"humidity" : 31.2,
"uuid" : "uuid_json_20180417152727.json",
"ipaddress" : "192.168.1.104",
"top2" : "n04286575 spotlight, spot",
"top3pct" : "6.199999898672104",
"top2pct" : "10.199999809265137",
"cputemp2" : 56.92,
"z" : 1.0,
"diskfree" : "4152.5 MB",
"top1pct" : "13.79999965429306",
"currenttime" : "2018-04-17 15:27:37",
"label2" : "n04592741 wing",
"pitch" : 360.0,
"pressure" : 1026.2,
"roll" : 1.0,
"label1" : "n04286575 spotlight, spot",
"top5pct" : "4.30000014603138",
"label4" : "n06874185 traffic light, traffic signal, stoplight",
"y" : 0.0,
"label3" : "n04009552 projector",
"cputemp" : 58,
"top1" : "n02930766 cab, hack, taxi, taxicab",
"top4pct" : "5.000000074505806",
"tempf" : 75.81,
"memory" : 56.5,
"top4" : "n03345487 fire engine, fire truck",
"starttime" : "2018-04-17 15:27:25",
"runtime" : "12",
"label5" : "n09229709 bubble",
"temp" : 35.45,
"x" : 0.0
}
Example Data
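To make the shape of this record concrete, here is a minimal Python sketch that pulls the ranked predictions out of it. It assumes the `topN` / `topNpct` field naming seen in the example and that percentages arrive as strings, as they do above. The helper name `top_predictions` and the trimmed record are illustrative only and are not part of the original demo code.

```python
import json

# Trimmed copy of the example record; only the fields used below are kept.
RECORD = """
{
  "imagefilename": "/opt/demo/images/2018-04-17_1127.jpg",
  "host": "sensehatmovidius",
  "top1": "n02930766 cab, hack, taxi, taxicab",
  "top1pct": "13.79999965429306",
  "top2": "n04286575 spotlight, spot",
  "top2pct": "10.199999809265137",
  "top3": "n06874185 traffic light, traffic signal, stoplight",
  "top3pct": "6.199999898672104"
}
"""


def top_predictions(record, limit=5):
    """Return (synset_id, label, percent) tuples for top1..topN, best first.

    Hypothetical helper: ranks that are missing from the record are skipped.
    """
    results = []
    for rank in range(1, limit + 1):
        label_field = record.get(f"top{rank}")
        pct_field = record.get(f"top{rank}pct")
        if label_field is None or pct_field is None:
            continue
        # The label starts with a synset id, then a space, then the names.
        synset_id, _, label = label_field.partition(" ")
        results.append((synset_id, label, float(pct_field)))
    return results


record = json.loads(RECORD)
predictions = top_predictions(record)
for synset_id, label, pct in predictions:
    print(f"{pct:5.1f}%  {synset_id}  {label}")
```

Converting the percentage strings to floats early makes later steps, such as routing on a confidence threshold, simpler to express.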
OpenCV
https://en.wikipedia.org/wiki/OpenCV
• OpenCV is an open source computer vision library
• Nearly 20 years old
• Started by Intel
• Current Version 3.4.1
• C++, Python and Java Interfaces
• Runs on Windows, Linux, Mac, BSD, iOS, and Android.
• Can be built from source with Make
• Runs on Raspberry Pis and other devices
What is OpenCV?
https://github.com/jdye64/nifi-opencv
• Facial Recognition
• Image Capture From Cameras
• Object Identification
• Motion Tracking
• Pixel Manipulation
• Image Properties
• Image Data Manipulation
• Image Processing including filtering, color conversion and histograms
• Image Labelling
https://www.learnopencv.com/
https://docs.opencv.org/3.4.0/d9/df8/tutorial_root.html
What can I do with OpenCV?
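As a rough illustration of two items in the list above, color conversion and histograms, the following pure-Python sketch shows the arithmetic on a tiny synthetic image. It is not OpenCV code. It uses the common luma weights (0.299 R + 0.587 G + 0.114 B), which OpenCV's documentation also gives for RGB-to-gray conversion. The function names and the 2x2 image are made up for the example. In practice you would call OpenCV itself, which is far faster.

```python
def bgr_to_gray(pixel):
    """Convert one (blue, green, red) pixel to a gray level (0-255)
    using the luma weights 0.299 R + 0.587 G + 0.114 B."""
    blue, green, red = pixel
    return int(round(0.299 * red + 0.587 * green + 0.114 * blue))


def gray_histogram(image, bins=4):
    """Count the gray levels of a BGR image (a list of rows of pixels)
    into equal-width bins covering 0-255."""
    counts = [0] * bins
    bin_width = 256 / bins
    for row in image:
        for pixel in row:
            counts[int(bgr_to_gray(pixel) // bin_width)] += 1
    return counts


# A 2x2 synthetic image in BGR order: black, white, pure red, pure blue.
image = [
    [(0, 0, 0), (255, 255, 255)],
    [(0, 0, 255), (255, 0, 0)],
]
histogram = gray_histogram(image)
print(histogram)
```

Note the BGR channel order, which matches how OpenCV loads images by default and is why the slides convert with `COLOR_BGR2RGB` before handing frames to other libraries.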
• https://community.hortonworks.com/articles/182850/vision-thing.html
• https://community.hortonworks.com/articles/182984/vision-thing-part-2-processing-capturing-and-displ.html
• https://github.com/aruizga7/Self-Driving-Car-in-DSX/tree/master/1.%20Line%20Lane%20Detection
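import cv2
# Grab a single frame from the default camera and save it to disk
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
cap.release()
filename = 'images/ilovedataworkssummit.jpg'
cv2.imwrite(filename, frame)
# Reload the image, convert BGR to RGB and resize it to 224 x 224
img = cv2.cvtColor(cv2.imread(filename), cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (224, 224))
# Draw a box; x, y, w, h are placeholder values (e.g. from a detector)
x, y, w, h = 50, 50, 100, 100
cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 0), 2)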
What does OpenCV Python Code Look like?
TensorFlow
• TensorFlow (C++, Python, Java)
via ExecuteStreamCommand
• TensorFlow NiFi Java Custom Processor
• TensorFlow Running on Edge Nodes (MiniFi)
Apache NiFi Integration with TensorFlow Options
python classify_image.py --image_file /opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
radiator (score = 0.00041)
doormat, welcome mat (score = 0.00041)
bazel-bin/tensorflow/examples/label_image/label_image --image=/opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
I tensorflow/examples/label_image/main.cc:204] solar dish (577): 0.983162
I tensorflow/examples/label_image/main.cc:204] window screen (912): 0.00196204
I tensorflow/examples/label_image/main.cc:204] manhole cover (763): 0.000704005
I tensorflow/examples/label_image/main.cc:204] radiator (571): 0.000408321
I tensorflow/examples/label_image/main.cc:204] doormat (972): 0.000406186
TensorFlow via Python or C++ Binary (Java Library Is New!)
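When a script like this is run from NiFi's ExecuteStreamCommand, its standard output becomes the flow file content, so it often helps to turn the human-readable score lines into JSON. The sketch below is one possible way to do that, assuming output in exactly the `label (score = 0.98316)` form shown above. The function name `parse_scores` and the regular expression are illustrative and are not taken from the original demo.

```python
import json
import re

# Matches lines such as "window screen (score = 0.00196)".
SCORE_LINE = re.compile(r"^(?P<label>.+?) \(score = (?P<score>[0-9.]+)\)$")


def parse_scores(stdout_text):
    """Turn classify_image.py style output into a list of dicts.

    Lines that do not look like score lines (warnings, blanks) are ignored.
    """
    results = []
    for line in stdout_text.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            results.append({
                "label": match.group("label"),
                "score": float(match.group("score")),
            })
    return results


# Sample text copied from the Python run shown on the slide.
SAMPLE_OUTPUT = """\
solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
radiator (score = 0.00041)
doormat, welcome mat (score = 0.00041)
"""

scores = parse_scores(SAMPLE_OUTPUT)
print(json.dumps(scores[0]))
```

Emitting JSON from the script keeps the downstream flow simple, since record-oriented processors can then read the results directly.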
TensorFlow Python ExecuteStreamCommand NiFi
https://community.hortonworks.com/articles/58265/analyzing-images-in-hdf-20-using-tensorflow.html
Run TensorFlow on YARN 3.1
https://community.hortonworks.com/articles/83872/data-lake-30-containerization-erasure-coding-gpu-p.html
Why TensorFlow? Also Apache MXNet, PyTorch and DL4J.
• Google
• Multiple platform support
• Hadoop integration
• Spark integration
• Keras
• Large Community
• Python and Java APIs
• GPU Support
• Mobile Support
• Inception v3
• Clustering
• Fully functional demos
• Open Source
• Apache Licensed
• Large Model Library
• Buzz
• Extensive Documentation
• Raspberry Pi Support
TensorFlow Java Processor in NiFi
https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html
https://github.com/tspannhw/nifi-tensorflow-processor
TensorFlow Running on Edge Nodes (MiniFi)
Apache Tika with Apache NiFi
https://tika.apache.org/1.18/gettingstarted.html
• Detection
• Parsing
• Output Formats including Text and HTML
• Translation
• Language Identification
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-media-nar/1.6.0/org.apache.nifi.processors.media.ExtractMediaMetadata/
• Apache NiFi - Bundled ExtractMediaMetadata Processor
• Apache NiFi - Extract the content metadata from flowfiles
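To give a feel for what the "Detection" step involves, here is a deliberately tiny Python sketch that guesses a media type from the leading "magic" bytes of a file. This is not Tika's API and covers only a handful of well-known signatures. Tika's real detection is much broader and also considers file names and container contents. The names `MAGIC_SIGNATURES` and `sniff_media_type` are invented for this example.

```python
# Well-known leading byte signatures and the media types they indicate.
MAGIC_SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_media_type(data, default="application/octet-stream"):
    """Return a media type guessed from the first bytes of the content.

    Hypothetical helper: falls back to a generic type when nothing matches.
    """
    for signature, media_type in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return media_type
    return default


print(sniff_media_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))
print(sniff_media_type(b"plain text, no signature"))
```

In a NiFi flow, a detected type like this is typically stored as a flow file attribute so later processors can route images and documents differently.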
Apache Tika Supported File Formats
• HTML, XML
• Microsoft Word, Excel, PowerPoint, Outlook
• OpenOffice
• RSS
• RTF
• Zip, Tar, 7zip, Gzip, RAR
• PDF
• MP3, WAV, MIDI
• MP4, FLV
• TIFF, JPEG, PNG, BMP, GIF
• And more!
Apache Tika with Apache NiFi
https://community.hortonworks.com/articles/163776/parsing-any-document-with-apache-nifi-15-with-apac.html
https://community.hortonworks.com/articles/81694/extracttext-nifi-custom-processor-powered-by-apach.html
https://community.hortonworks.com/articles/76924/data-processing-pipeline-parsing-pdfs-and-identify.html
https://github.com/tspannhw/nifi-extracttext-processor
https://community.hortonworks.com/content/kbentry/177370/extracting-html-from-pdf-excel-and-word-documents.html
[Figure: Hortonworks DataFlow (HDF) release timeline, "Ongoing Innovation in Apache" / "Ongoing Innovation in Open Source". Releases shown: HDF 1.0 (Dec 2014), HDF 1.2, HDF 2.0, HDF 2.1, HDF 3.0 and HDF 3.1.2 (June 2018). Component versions are charted for NiFi, NiFi Registry, MiNiFi C++ and Java, Kafka, Storm, SAM, Schema Registry, Ranger, Ambari and ZooKeeper, grouped under Streaming & Integration, Security, and Operations.]
Hortonworks Data Flow 3.1.2
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.2/bk_release-notes/content/ch_hdf_relnotes.html
HDF Data-In-Motion Platform – with HDF 3.1
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull models
• Hundreds of processors
• Visual command and control
• Over fifty sources
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
• Version Control
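Two of the features listed above, backpressure and prioritized queuing, can be illustrated with a small Python sketch. This is only a conceptual model and not how NiFi is implemented. In NiFi these behaviors are configured on the connections between processors. The class name `BoundedPriorityQueue`, its methods and the sample items are all invented for the example.

```python
import heapq
import itertools


class BoundedPriorityQueue:
    """Toy queue that refuses new items once a threshold is reached
    (backpressure) and releases the most urgent item first."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._heap = []
        # Tie-breaker so items with equal priority leave in arrival order.
        self._counter = itertools.count()

    def is_full(self):
        return len(self._heap) >= self.max_items

    def offer(self, priority, item):
        """Queue an item; return False (apply backpressure) when full."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def poll(self):
        """Remove and return the item with the lowest priority number,
        or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = BoundedPriorityQueue(max_items=2)
accepted = [
    queue.offer(5, "routine sensor reading"),
    queue.offer(1, "security camera alert"),
    queue.offer(3, "thumbnail"),  # refused: the threshold is reached
]
first_out = queue.poll()
print(accepted, first_out)
```

The producer sees the refusal and has to wait or slow down, which is the essence of backpressure. Once an item is drained, the queue accepts new work again.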
Apache MiNiFi
• NiFi lives in the data center. Give it an enterprise server or a cluster of them.
• MiNiFi lives as close as possible to where data is born and is a guest on that device or system.
“Let me get the key parts of NiFi close to where data begins and provide bidirectional data transfer.”
Edge Intelligence with Apache MiNiFi
Key Features
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery / recording a rolling log of fine-grained history
• Designed for extension
Different from Apache NiFi
• Design and Deploy
• Warm re-deploys
Custom Apache NiFi Processors for Open Source Computer Vision
TensorFlow with MiniFi
https://community.hortonworks.com/articles/103863/using-an-asus-tinkerboard-with-tensorflow-and-pyth.html
https://community.hortonworks.com/articles/183151/enterprise-iiot-edge-processing-with-apache-nifi-m.html
https://community.hortonworks.com/articles/130814/sensors-and-image-capture-and-deep-learning-analys.html
https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html
Image Analytics
https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html
https://community.hortonworks.com/articles/155604/iot-ingesting-camera-data-from-nanopi-duo-devices.html
https://community.hortonworks.com/articles/182984/vision-thing-part-2-processing-capturing-and-displ.html
https://community.hortonworks.com/articles/182850/vision-thing.html
https://community.hortonworks.com/articles/77988/ingest-remote-camera-images-from-raspberry-pi-via.html
NiFi and Kafka Are Complementary
NiFi
Provide dataflow solution
• Centralized management, from edge to core
• Great traceability, event level data provenance starting when data is born
• Interactive command and control: real-time operational visibility
• Dataflow management, including prioritization, back pressure, and edge intelligence
• Visual representation of global dataflow
Kafka
Provide durable stream store
• Low latency
• Distributed data durability
• Decentralized management of producers & consumers
Integrated Provisioning and Security
Kafka 1.0 Support
To enhance data governance and lineage, users can
now manage access control policies using resource or
tag-based security in Ranger for Kafka 1.0 clusters.
Users can now install, configure, manage, upgrade,
monitor, and secure Kafka 1.0 clusters with Ambari.
New processors in NiFi and Streaming Analytics
Manager support Kafka 1.0 features including message
headers and transactions.
What is Apache Kafka?
• Distributed streaming platform that allows publishing and subscribing to streams of records
• Streams of records are organized into categories called topics
• Topics can be partitioned and/or replicated
• Records consist of a key, value, and timestamp
http://kafka.apache.org/intro
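The ideas of records, topics and partitions can be sketched in a few lines of plain Python. This is an in-memory toy, not a Kafka client. The `Record` and `Topic` classes and the `choose_partition` function are invented for illustration. CRC32 stands in for the hash here, whereas Kafka's Java client hashes keys with murmur2 by default.

```python
import time
import zlib
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record as described above: key, value and timestamp."""
    key: bytes
    value: bytes
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))


def choose_partition(key, num_partitions):
    """Map a key to a partition. CRC32 is an assumption made for this
    sketch; it only needs to be stable so equal keys land together."""
    return zlib.crc32(key) % num_partitions


class Topic:
    """A named category of records, split into append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, record):
        index = choose_partition(record.key, len(self.partitions))
        self.partitions[index].append(record)
        return index


topic = Topic("images", num_partitions=3)
first = topic.publish(Record(b"sensehatmovidius", b'{"top1": "cab"}'))
second = topic.publish(Record(b"sensehatmovidius", b'{"top1": "wing"}'))
print(first, second)
```

Because both records share a key, they land in the same partition and keep their relative order, which is why the device host name is a reasonable key for per-device image streams.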
[Diagram: Apache Kafka cluster with three producers and three consumers]
https://community.hortonworks.com/articles/177349/big-data-devops-apache-nifi-hwx-schema-registry-sc.html
Completion of Schema Lifecycle: Merged Schema from Dev Branch to Master
Schema Registry Support for Different “States”: Enable, Disable, Archive
SAM and Schema Registry Integration
• Streaming Apps Require a Schema
- Unlike NiFi, SAM requires a schema to build streaming analytics applications.
- Every SAM builder component requires a schema to function.
- SAM’s primary mechanism for connecting to a stream of data is Kafka, but Kafka does not have a schema.
- This is where HDF’s Schema Registry component becomes incredibly valuable.
• SAM’s Kafka Source Component Integrated with Schema Registry
- When you configure a Kafka source and supply a Kafka topic, SAM calls the Schema Registry.
- Using the Kafka topic as the key, SAM retrieves the schema.
- This schema is then displayed on the tile component and is passed to downstream components.
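The lookup described above, where the Kafka topic name is the key used to fetch a schema, can be modeled with a small Python sketch. It is a stand-in only: the real Schema Registry is a separate service that stores versioned schemas, while this example keeps a dictionary in memory. `InMemorySchemaRegistry`, `SchemaNotFound` and `conforms` are hypothetical names, and the field types are guesses based on the example record earlier in the deck.

```python
class SchemaNotFound(KeyError):
    """Raised when no schema is registered under the requested topic."""


class InMemorySchemaRegistry:
    """Toy registry that stores one schema per topic name."""

    def __init__(self):
        self._schemas = {}

    def register(self, topic, fields):
        # A schema here is just a mapping of field name to Python type.
        self._schemas[topic] = dict(fields)

    def schema_for(self, topic):
        """Look a schema up using the topic name as the key."""
        try:
            return self._schemas[topic]
        except KeyError:
            raise SchemaNotFound(topic) from None


def conforms(record, schema):
    """Check that every schema field is present with the expected type."""
    return all(
        name in record and isinstance(record[name], expected_type)
        for name, expected_type in schema.items()
    )


registry = InMemorySchemaRegistry()
registry.register("images", {"host": str, "humidity": float, "cputemp": int})

schema = registry.schema_for("images")
good = conforms({"host": "sensehatmovidius", "humidity": 31.2, "cputemp": 58}, schema)
bad = conforms({"host": "sensehatmovidius", "humidity": "31.2"}, schema)
print(good, bad)
```

Keying schemas by topic name means a consumer needs only the topic to know how to read its records, which is what lets downstream components receive the schema automatically.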
Streaming Analytics Manager
Thank you
Contact
https://community.hortonworks.com/users/9304/tspann.html
https://dzone.com/users/297029/bunkertor.html
https://www.meetup.com/futureofdata-princeton/
https://twitter.com/PaaSDev
https://dzone.com/articles/integrating-keras-tensorflow-yolov3-into-apache-ni
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories
Community Engagement
Participate now at: community.hortonworks.com
4,000+ Registered Users
10,000+ Answers
15,000+ Technical Assets
One Website!

More Related Content

PDF
GPU仮想化最前線 - KVMGTとvirtio-gpu -
PDF
ブロックチェーン系プロジェクトで着目される暗号技術
PPTX
C++でテスト駆動開発
PPTX
Apache Spark on Kubernetes入門(Open Source Conference 2021 Online Hiroshima 発表資料)
PPTX
Dockerからcontainerdへの移行
PDF
Kubernetesのしくみ やさしく学ぶ 内部構造とアーキテクチャー
PDF
【Unite Tokyo 2018】その最適化、本当に最適ですか!? ~正しい最適化を行うためのテクニック~
PDF
IoTで生き残れ!成功なんて結果論、こうすれば失敗します。プロ達が語る『IoT失敗あるある談』!!! | IoT ありがちな失敗パターンと 回避する方法
GPU仮想化最前線 - KVMGTとvirtio-gpu -
ブロックチェーン系プロジェクトで着目される暗号技術
C++でテスト駆動開発
Apache Spark on Kubernetes入門(Open Source Conference 2021 Online Hiroshima 発表資料)
Dockerからcontainerdへの移行
Kubernetesのしくみ やさしく学ぶ 内部構造とアーキテクチャー
【Unite Tokyo 2018】その最適化、本当に最適ですか!? ~正しい最適化を行うためのテクニック~
IoTで生き残れ!成功なんて結果論、こうすれば失敗します。プロ達が語る『IoT失敗あるある談』!!! | IoT ありがちな失敗パターンと 回避する方法

What's hot (20)

PDF
PHPでスマホアプリにプッシュ通知する
PPTX
サイバーエージェントにおけるプライベートコンテナ基盤AKEを支える技術
PPTX
BuildKitによる高速でセキュアなイメージビルド
PDF
【Unite Tokyo 2019】たのしいDOTS〜初級から上級まで〜
DOCX
UE4でPerforceと連携するための手順
PPTX
Karpenterで君だけの最強のオートスケーリングを実装しよう
PDF
DockerからKubernetesへのシフト
PDF
JavaScript難読化読経
PDF
【 Unity道場 1月 ~LWRPとシェーダー~】軽量レンダーパイプライン、Light Weight Renderer Pipeline…とは
PDF
Apache Kuduを使った分析システムの裏側
PDF
KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...
PDF
ARM Trusted FirmwareのBL31を単体で使う!
PPTX
最新UE4タイトルでのローカライズ事例 (UE4 Localization Deep Dive)
PDF
データベース屋がHyperledger Fabricを検証してみた
PPTX
Java 18で入ったJVM関連の(やや細かめな)改善(JJUGナイトセミナー「Java 18 リリース記念イベント」発表資料)
PDF
KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...
PDF
高位合成友の会(2018秋)スライド
PDF
オンラインゲームの仕組みと工夫
PDF
async/await不要論
PDF
[GKE & Spanner 勉強会] GKE 入門
PHPでスマホアプリにプッシュ通知する
サイバーエージェントにおけるプライベートコンテナ基盤AKEを支える技術
BuildKitによる高速でセキュアなイメージビルド
【Unite Tokyo 2019】たのしいDOTS〜初級から上級まで〜
UE4でPerforceと連携するための手順
Karpenterで君だけの最強のオートスケーリングを実装しよう
DockerからKubernetesへのシフト
JavaScript難読化読経
【 Unity道場 1月 ~LWRPとシェーダー~】軽量レンダーパイプライン、Light Weight Renderer Pipeline…とは
Apache Kuduを使った分析システムの裏側
KubeCon + CloudNativeCon Europe 2022 Recap / Kubernetes Meetup Tokyo #51 / #k...
ARM Trusted FirmwareのBL31を単体で使う!
最新UE4タイトルでのローカライズ事例 (UE4 Localization Deep Dive)
データベース屋がHyperledger Fabricを検証してみた
Java 18で入ったJVM関連の(やや細かめな)改善(JJUGナイトセミナー「Java 18 リリース記念イベント」発表資料)
KubeCon + CloudNativeCon Europe 2022 Recap - Batch/HPCの潮流とScheduler拡張事例 / Kub...
高位合成友の会(2018秋)スライド
オンラインゲームの仕組みと工夫
async/await不要論
[GKE & Spanner 勉強会] GKE 入門
Ad

Similar to Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, Python (20)

PPTX
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...
PDF
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
PDF
System and Software Engineering for Industry 4.0
PDF
Hands-On Deep Dive with MiniFi and Apache MXNet
PDF
Apache NiFi MiNiFi C++ and the tale of edgey stuff
PPTX
Navigating Idiosyncrasies of IoT Development
PPTX
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
PDF
Enterprise IIoT Edge Processing with Apache NiFi
PDF
Software Tools for Building Industry 4.0 Applications
PPTX
Hadoop Summit Tokyo Apache NiFi Crash Course
PPTX
Html5 today
PDF
Apache Deep Learning 101 - DWS Berlin 2018
PDF
Hortonworks sqrrl webinar v5.pptx
PDF
MiniFi and Apache NiFi : IoT in Berlin Germany 2018
PDF
Apache MXNet for IoT with Apache NiFi
PPTX
IoT with Apache MXNet and Apache NiFi and MiniFi
PPTX
OpenStack: Everything You Need To Know to Get Started (ATO2014)
PDF
Choreo: Empowering the Future of Enterprise Software Engineering
PDF
microXchg 2019: "Creating an Effective Developer Experience for Cloud-Native ...
PDF
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
System and Software Engineering for Industry 4.0
Hands-On Deep Dive with MiniFi and Apache MXNet
Apache NiFi MiNiFi C++ and the tale of edgey stuff
Navigating Idiosyncrasies of IoT Development
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
Enterprise IIoT Edge Processing with Apache NiFi
Software Tools for Building Industry 4.0 Applications
Hadoop Summit Tokyo Apache NiFi Crash Course
Html5 today
Apache Deep Learning 101 - DWS Berlin 2018
Hortonworks sqrrl webinar v5.pptx
MiniFi and Apache NiFi : IoT in Berlin Germany 2018
Apache MXNet for IoT with Apache NiFi
IoT with Apache MXNet and Apache NiFi and MiniFi
OpenStack: Everything You Need To Know to Get Started (ATO2014)
Choreo: Empowering the Future of Enterprise Software Engineering
microXchg 2019: "Creating an Effective Developer Experience for Cloud-Native ...
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
Ad

More from Timothy Spann (20)

PDF
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
PDF
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
PDF
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
PDF
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
PDF
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
PDF
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
PDF
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
PDF
2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines
PDF
14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...
PDF
2024 Nov 05 - Linux Foundation TAC TALK With Milvus
PPTX
tspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAG
PDF
tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...
PDF
2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...
PDF
10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How
PDF
2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween
PDF
DBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering
PDF
17-October-2024 NYC AI Camp - Step-by-Step RAG 101
PDF
11-OCT-2024_AI_101_CryptoOracle_UnstructuredData
PDF
2024-10-04 - Grace Hopper Celebration Open Source Day - Stefan
PDF
01-Oct-2024_PES-VectorDatabasesAndAI.pdf
14May2025_TSPANN_FromAirQualityUnstructuredData.pdf
Streaming AI Pipelines with Apache NiFi and Snowflake NYC 2025
2025-03-03-Philly-AAAI-GoodData-Build Secure RAG Apps With Open LLM
Conf42_IoT_Dec2024_Building IoT Applications With Open Source
2024 Dec 05 - PyData Global - Tutorial Its In The Air Tonight
2024Nov20-BigDataEU-RealTimeAIWithOpenSource
TSPANN-2024-Nov-CloudX-Adding Generative AI to Real-Time Streaming Pipelines
2024-Nov-BuildStuff-Adding Generative AI to Real-Time Streaming Pipelines
14 November 2024 - Conf 42 - Prompt Engineering - Codeless Generative AI Pipe...
2024 Nov 05 - Linux Foundation TAC TALK With Milvus
tspann06-NOV-2024_AI-Alliance_NYC_ intro to Data Prep Kit and Open Source RAG
tspann08-Nov-2024_PyDataNYC_Unstructured Data Processing with a Raspberry Pi ...
2024-10-28 All Things Open - Advanced Retrieval Augmented Generation (RAG) Te...
10-25-2024_BITS_NYC_Unstructured Data and LLM_ What, Why and How
2024-OCT-23 NYC Meetup - Unstructured Data Meetup - Unstructured Halloween
DBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering
17-October-2024 NYC AI Camp - Step-by-Step RAG 101
11-OCT-2024_AI_101_CryptoOracle_UnstructuredData
2024-10-04 - Grace Hopper Celebration Open Source Day - Stefan
01-Oct-2024_PES-VectorDatabasesAndAI.pdf

Recently uploaded (20)

PDF
ShowUs: Pharo Stream Deck (ESUG 2025, Gdansk)
PDF
Comprehensive Salesforce Implementation Services.pdf
PDF
2025 Textile ERP Trends: SAP, Odoo & Oracle
DOCX
The Five Best AI Cover Tools in 2025.docx
PPTX
Hire Expert WordPress Developers from Brainwings Infotech
PDF
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
PPTX
Visualising Data with Scatterplots in IBM SPSS Statistics.pptx
PPTX
Online Work Permit System for Fast Permit Processing
PDF
Forouzan Book Information Security Chaper - 1
PDF
Perfecting Gamer’s Experiences with Performance Testing for Gaming Applicatio...
PDF
Teaching Reproducibility and Embracing Variability: From Floating-Point Exper...
PPTX
CRUISE TICKETING SYSTEM | CRUISE RESERVATION SOFTWARE
PDF
How Creative Agencies Leverage Project Management Software.pdf
PDF
A REACT POMODORO TIMER WEB APPLICATION.pdf
PDF
Become an Agentblazer Champion Challenge
PPTX
Presentation of Computer CLASS 2 .pptx
PPTX
Odoo Consulting Services by CandidRoot Solutions
PDF
The Future of Smart Factories Why Embedded Analytics Leads the Way
PDF
Exploring AI Agents in Process Industries
PDF
Microsoft Teams Essentials; The pricing and the versions_PDF.pdf
ShowUs: Pharo Stream Deck (ESUG 2025, Gdansk)
Comprehensive Salesforce Implementation Services.pdf
2025 Textile ERP Trends: SAP, Odoo & Oracle
The Five Best AI Cover Tools in 2025.docx
Hire Expert WordPress Developers from Brainwings Infotech
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
Visualising Data with Scatterplots in IBM SPSS Statistics.pptx
Online Work Permit System for Fast Permit Processing
Forouzan Book Information Security Chaper - 1
Perfecting Gamer’s Experiences with Performance Testing for Gaming Applicatio...
Teaching Reproducibility and Embracing Variability: From Floating-Point Exper...
CRUISE TICKETING SYSTEM | CRUISE RESERVATION SOFTWARE
How Creative Agencies Leverage Project Management Software.pdf
A REACT POMODORO TIMER WEB APPLICATION.pdf
Become an Agentblazer Champion Challenge
Presentation of Computer CLASS 2 .pptx
Odoo Consulting Services by CandidRoot Solutions
The Future of Smart Factories Why Embedded Analytics Leads the Way
Exploring AI Agents in Process Industries
Microsoft Teams Essentials; The pricing and the versions_PDF.pdf

Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, Python

  • 1. 1 ©HortonworksInc. 2011–2018. All rightsreserved. © Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information. Open Source Computer Vision with TensorFlow, MiniFi, Apache NiFi, OpenCV, Apache Tika and Python Timothy Spann, Solutions Engineer Hortonworks @PaaSDev Vision Thing
  • 2. 2 ©HortonworksInc. 2011–2018. All rightsreserved. Disclaimer • This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed. • Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all effect timing and final delivery. • This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. • Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. • Since this document contains an outline of general product development plans, customers should not rely upon it when making a purchase decision.
  • 3. 4 ©HortonworksInc. 2011–2018. All rightsreserved. Agenda • Architecture • OpenCV • TensorFlow • Apache Tika • Apache NiFi and MiniFi • Apache Kafka • Schema Registry • Streaming Analytics Manager • Demos • Questions
  • 4. 5 ©HortonworksInc. 2011–2018. All rightsreserved. Use Cases So Why Am I Ingesting Images From Edge Devices? Object Recognition • WebCam Security • Anomaly Detection • Logging • Metadata about images • Customer Analysis Image Classification Motion Estimation • Movement tracking • Security • Occupied Room Active Archive • Store all images • Training datasets • Joining With Other Data • Cameras Everywhere
  • 5. 6 ©HortonworksInc. 2011–2018. All rightsreserved. Architecture
  • 6. 7 ©HortonworksInc. 2011–2018. All rightsreserved. Open Computer Vision Flow Ingestion Simple Event Processing Engine Stream Processing Destination Data Bus Build Predictive Model From Historical Data Deploy Predictive Model For Real-time Insights Perishable Insights Historical Insights
  • 7. 8 ©HortonworksInc. 2011–2018. All rightsreserved. Open Source Image Analytical Components Streaming Analytics Manager Image Ingest Distributed queue Buffering Process decoupling Routing and Pre-Processing Orchestration Queueing Simple Event Processing Image Capture Image Processing
  • 8. 9 ©HortonworksInc. 2011–2018. All rightsreserved. Streaming Analytics Manager Part of MiniFi C++ Agent Detect metadata and data Extract metadata and data Content Analysis Deep Learning Framework Open Source Image Analytical Components Enabling Record Processing Schema Management
  • 9. 10 ©HortonworksInc. 2011–2018. All rightsreserved. Aggregate all data from sensors, geo-location devices, machines and social feeds Collect: Bring Together Mediate point-to-point and bi-directional data flows, delivering data reliably to Apache HBase, Apache Hive, Slack and Email. Conduct: Mediate the Data Flow Parse, filter, join, transform, fork, query, sort, dissect; enrich with weather, location, and TensorFlow. Curate: Gain Insights
  • 10. 11 ©HortonworksInc. 2011–2018. All rightsreserved. { "imagefilename" : "/opt/demo/images/2018-04- 17_1127.jpg", "yaw" : 100.0, "host" : "sensehatmovidius", "top3" : "n06874185 traffic light, traffic signal, stoplight", "top5" : "n03773504 missile", "humidity" : 31.2, "uuid" : "uuid_json_20180417152727.json", "ipaddress" : 192.168.1.104, "top2" : "n04286575 spotlight, spot", "top3pct" : "6.199999898672104", "top2pct" : "10.199999809265137", "cputemp2" : 56.92, "z" : 1.0, "diskfree" : "4152.5 MB", "top1pct" : "13.79999965429306", "currenttime" : "2018-04-17 15:27:37", "label2" : "n04592741 wing", "pitch" : 360.0, "pressure" : 1026.2, "roll" : 1.0, "label1" : "n04286575 spotlight, spot", "top5pct" : "4.30000014603138", "label4" : "n06874185 traffic light, traffic signal, stoplight", "y" : 0.0, "label3" : "n04009552 projector", "cputemp" : 58, "top1" : "n02930766 cab, hack, taxi, taxicab", "top4pct" : "5.000000074505806", "tempf" : 75.81, "memory" : 56.5, "top4" : "n03345487 fire engine, fire truck", "starttime" : "2018-04-17 15:27:25", "runtime" : "12", "label5" : "n09229709 bubble", "temp" : 35.45, "x" : 0.0 } Example Data
  • 11. 12 ©HortonworksInc. 2011–2018. All rightsreserved. OpenCV
  • 12. 13 ©HortonworksInc. 2011–2018. All rightsreserved. https://en.wikipedia.org/wiki/OpenCV • OpenCV is a an open source computer vision library • Nearly 20 years old • Started by Intel • Current Version 3.4.1 • C++, Python and Java Interfaces • Runs on Windows, Linux, Mac, BSD, iOS, and Android. • Can be built from source with Make • Runs on Raspberry Pis and other devices What is OpenCV?
  • 13. What can I do with OpenCV?
      • Facial Recognition
      • Image Capture From Cameras
      • Object Identification
      • Motion Tracking
      • Pixel Manipulation
      • Image Properties
      • Image Data Manipulation
      • Image Processing including filtering, color conversion and histograms
      • Image Labelling
      https://github.com/jdye64/nifi-opencv
      https://www.learnopencv.com/
      https://docs.opencv.org/3.4.0/d9/df8/tutorial_root.html
  • 14. What does OpenCV Python Code Look like?
      • https://community.hortonworks.com/articles/182850/vision-thing.html
      • https://community.hortonworks.com/articles/182984/vision-thing-part-2-processing-capturing-and-displ.html
      • https://github.com/aruizga7/Self-Driving-Car-in-DSX/tree/master/1.%20Line%20Lane%20Detection
      import cv2
      cap = cv2.VideoCapture(0)          # open the default camera
      ret, frame = cap.read()            # grab a single frame
      cap.release()                      # free the camera once the frame is captured
      filename = 'images/ilovedataworkssummit.jpg'
      cv2.imwrite(filename, frame)       # save the captured frame to disk
      img = cv2.cvtColor(cv2.imread(filename), cv2.COLOR_BGR2RGB)
      img = cv2.resize(img, (224, 224))  # resize to 224 x 224
      x, y, w, h = 50, 50, 100, 100      # placeholder box; the slide does not define these values
      cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 0), 2)
  • 15. TensorFlow
  • 16. Apache NiFi Integration with TensorFlow Options
      • TensorFlow (C++, Python, Java) via ExecuteStreamCommand
      • TensorFlow NiFi Java Custom Processor
      • TensorFlow Running on Edge Nodes (MiniFi)
  • 17. TensorFlow via Python or C++ Binary (Java Library Is New!)
      python classify_image.py --image_file /opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
      solar dish, solar collector, solar furnace (score = 0.98316)
      window screen (score = 0.00196)
      manhole cover (score = 0.00070)
      radiator (score = 0.00041)
      doormat, welcome mat (score = 0.00041)
      bazel-bin/tensorflow/examples/label_image/label_image --image=/opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
      tensorflow/examples/label_image/main.cc:204] solar dish (577): 0.983162
      I tensorflow/examples/label_image/main.cc:204] window screen (912): 0.00196204
      I tensorflow/examples/label_image/main.cc:204] manhole cover (763): 0.000704005
      I tensorflow/examples/label_image/main.cc:204] radiator (571): 0.000408321
      I tensorflow/examples/label_image/main.cc:204] doormat (972): 0.000406186
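The Python script on this slide prints one `label (score = value)` line per prediction, so a flow that wants structured fields has to parse that text. Here is a minimal sketch of one way to do it with the standard library; `parse_scores` and `LINE_PATTERN` are hypothetical names, and the pattern is inferred only from the output lines shown on the slide.

```python
import re

# Output lines copied from the slide's classify_image.py run.
output = """\
solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
radiator (score = 0.00041)
doormat, welcome mat (score = 0.00041)
"""

# Matches "<label text> (score = <float>)".
LINE_PATTERN = re.compile(r"^(?P<label>.+?) \(score = (?P<score>[0-9.]+)\)$")


def parse_scores(text):
    """Turn classifier output into a list of (label, score) tuples."""
    results = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:  # skip anything that is not a score line
            results.append((match.group("label"), float(match.group("score"))))
    return results


predictions = parse_scores(output)
top_label, top_score = predictions[0]
print(top_label, top_score)
```

Lines that do not match the pattern are skipped rather than raising, on the assumption that a real run may interleave log messages with the scores.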
  • 18. TensorFlow Python ExecuteStreamCommand NiFi
      https://community.hortonworks.com/articles/58265/analyzing-images-in-hdf-20-using-tensorflow.html
  • 19. Run TensorFlow on YARN 3.1
      https://community.hortonworks.com/articles/83872/data-lake-30-containerization-erasure-coding-gpu-p.html
  • 20. Why TensorFlow? Also Apache MXNet, PyTorch and DL4J.
      • Google
      • Multiple platform support
      • Hadoop integration
      • Spark integration
      • Keras
      • Large Community
      • Python and Java APIs
      • GPU Support
      • Mobile Support
      • Inception v3
      • Clustering
      • Fully functional demos
      • Open Source
      • Apache Licensed
      • Large Model Library
      • Buzz
      • Extensive Documentation
      • Raspberry Pi Support
  • 21. TensorFlow Java Processor in NiFi
      https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html
      https://github.com/tspannhw/nifi-tensorflow-processor
  • 22. TensorFlow Running on Edge Nodes (MiniFi)
  • 23. Apache Tika with Apache NiFi
      https://tika.apache.org/1.18/gettingstarted.html
      • Detection
      • Parsing
      • Output Formats including Text and HTML
      • Translation
      • Language Identification
      https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-media-nar/1.6.0/org.apache.nifi.processors.media.ExtractMediaMetadata/
      • Apache NiFi - Bundled ExtractMediaMetadata Processor
      • Apache NiFi - Extract the content metadata from flowfiles
  • 24. Apache Tika Supported File Formats
      • HTML, XML
      • Microsoft Word, Excel, PowerPoint, Outlook
      • OpenOffice
      • RSS
      • RTF
      • Zip, Tar, 7zip, Gzip, RAR
      • PDF
      • MP3, WAV, MIDI
      • MP4, FLV
      • TIFF, JPEG, PNG, BMP, GIF
      • And more!
  • 25. Apache Tika with Apache NiFi
      https://community.hortonworks.com/articles/163776/parsing-any-document-with-apache-nifi-15-with-apac.html
      https://community.hortonworks.com/articles/81694/extracttext-nifi-custom-processor-powered-by-apach.html
      https://community.hortonworks.com/articles/76924/data-processing-pipeline-parsing-pdfs-and-identify.html
      https://github.com/tspannhw/nifi-extracttext-processor
      https://community.hortonworks.com/content/kbentry/177370/extracting-html-from-pdf-excel-and-word-documents.html
  • 26. Apache Tika with Apache NiFi
  • 27. Hortonworks Data Flow 3.1.2
      [Figure: HDF release timeline covering HDF 1.0 (Dec 2014), HDF 1.2 (Oct 2015), HDF 2.0 (Mar 2016), HDF 2.1 (Aug 2016), HDF 3.0 (Jul 2017) and HDF 3.1.2 (June 2018), with the bundled versions of NiFi, NiFi Registry, Ranger, Ambari, Kafka, ZooKeeper, Storm, SAM, Schema Registry, and MiNiFi C++ and Java, grouped under Security, Streaming & Integration, and Operations.]
      https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.2/bk_release-notes/content/ch_hdf_relnotes.html
  • 28. HDF Data-In-Motion Platform – with HDF 3.1
  • 29. Why Apache NiFi?
      • Guaranteed delivery
      • Data buffering
          - Backpressure
          - Pressure release
      • Prioritized queuing
      • Flow specific QoS
          - Latency vs. throughput
          - Loss tolerance
      • Data provenance
      • Supports push and pull models
      • Hundreds of processors
      • Visual command and control
      • Over fifty sources
      • Flow templates
      • Pluggable/multi-role security
      • Designed for extension
      • Clustering
      • Version Control
  • 30. Apache MiNiFi
      • NiFi lives in the data center. Give it an enterprise server or a cluster of them.
      • MiNiFi lives as close as possible to where data is born and is a guest on that device or system.
      "Let me get the key parts of NiFi close to where data begins and provide bidirectional data transfer"
  • 31. Edge Intelligence with Apache MiNiFi
      Key Features
      • Guaranteed delivery
      • Data buffering
          ‒ Backpressure
          ‒ Pressure release
      • Prioritized queuing
      • Flow specific QoS
          ‒ Latency vs. throughput
          ‒ Loss tolerance
      • Data provenance
      • Recovery / recording a rolling log of fine-grained history
      • Designed for extension
      Different from Apache NiFi
      • Design and Deploy
      • Warm re-deploys
  • 32. Custom Apache NiFi Processors for Open Source Computer Vision
  • 33. TensorFlow with MiniFi
      https://community.hortonworks.com/articles/103863/using-an-asus-tinkerboard-with-tensorflow-and-pyth.html
      https://community.hortonworks.com/articles/183151/enterprise-iiot-edge-processing-with-apache-nifi-m.html
      https://community.hortonworks.com/articles/130814/sensors-and-image-capture-and-deep-learning-analys.html
      https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html
  • 34. Image Analytics
      https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html
      https://community.hortonworks.com/articles/155604/iot-ingesting-camera-data-from-nanopi-duo-devices.html
      https://community.hortonworks.com/articles/182984/vision-thing-part-2-processing-capturing-and-displ.html
      https://community.hortonworks.com/articles/182850/vision-thing.html
      https://community.hortonworks.com/articles/77988/ingest-remote-camera-images-from-raspberry-pi-via.html
  • 35. NiFi and Kafka Are Complementary
      NiFi: provides a dataflow solution
      • Centralized management, from edge to core
      • Great traceability, event level data provenance starting when data is born
      • Interactive command and control: real-time operational visibility
      • Dataflow management, including prioritization, back pressure, and edge intelligence
      • Visual representation of global dataflow
      Kafka: provides a durable stream store
      • Low latency
      • Distributed data durability
      • Decentralized management of producers & consumers
  • 36. Integrated Provisioning and Security: Kafka 1.0 Support
      To enhance data governance and lineage, users can now manage access control policies using resource or tag-based security in Ranger for Kafka 1.0 clusters.
      Users can now install, configure, manage, upgrade, monitor, and secure Kafka 1.0 clusters with Ambari.
      New processors in NiFi and Streaming Analytics Manager support Kafka 1.0 features including message headers and transactions.
  • 37. What is Apache Kafka?
      • Distributed streaming platform that allows publishing and subscribing to streams of records
      • Streams of records are organized into categories called topics
      • Topics can be partitioned and/or replicated
      • Records consist of a key, value, and timestamp
      http://kafka.apache.org/intro
      [Diagram: producers writing to a Kafka cluster and consumers reading from it]
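The bullets above describe records (key, value, timestamp) flowing through partitioned topics. The sketch below models just that in memory using only the Python standard library. It is not a Kafka client: `Record`, `Topic`, the topic name `images`, the second file name, and the CRC32-based partition choice are all assumptions made for illustration.

```python
import time
import zlib
from dataclasses import dataclass, field


@dataclass
class Record:
    key: str
    value: str
    timestamp: float = field(default_factory=time.time)


class Topic:
    """In-memory stand-in for a partitioned topic (not a Kafka client)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: a stable CRC32 hash stands in for a real partitioner.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def publish(self, record):
        # Append the record to the partition chosen from its key.
        index = self.partition_for(record.key)
        self.partitions[index].append(record)
        return index

    def subscribe(self, index):
        # A consumer reads one partition's records in append order.
        return list(self.partitions[index])


topic = Topic("images", num_partitions=3)
first = topic.publish(Record("sensehatmovidius", "2018-04-17_1127.jpg"))
second = topic.publish(Record("sensehatmovidius", "2018-04-17_1128.jpg"))
values = [r.value for r in topic.subscribe(first)]
print(first == second, values)
```

Hashing the key means records from the same device land in the same partition, so a consumer of that partition sees them in the order they were published.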
  • 39. https://community.hortonworks.com/articles/177349/big-data-devops-apache-nifi-hwx-schema-registry-sc.html
  • 40. Completion of Schema Lifecycle: Merged Schema from Dev Branch to Master
  • 41. Schema Registry Support for Different "States": Enable, Disable, Archive
  • 42. SAM and Schema Registry Integration
      • Streaming Apps Require a Schema
          • Unlike NiFi, SAM requires a schema to build streaming analytics applications.
          • Every SAM builder component requires a schema to function.
          • SAM's primary mechanism for connecting to a stream of data is Kafka, but Kafka does not have a schema.
          • This is where HDF's Schema Registry component becomes incredibly valuable.
      • SAM's Kafka Source Component Is Integrated with Schema Registry
          • When you configure a Kafka source and supply a Kafka topic, SAM calls the Schema Registry.
          • Using the Kafka topic as the key, SAM will retrieve the schema.
          • This schema is then displayed on the tile component and is passed to downstream components.
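The lookup described above, where the topic name acts as the key for retrieving a schema that is then handed downstream, can be illustrated with a small in-memory sketch. This does not use the Schema Registry or SAM APIs; `SCHEMAS`, `schema_for_topic`, `conforms`, and the topic name `sensor_images` are hypothetical, and the schema is simplified to a mapping of field names to Python types.

```python
# Hypothetical in-memory registry: topic name -> {field name: expected Python type}.
SCHEMAS = {
    "sensor_images": {"host": str, "humidity": float, "top1": str},
}


def schema_for_topic(topic):
    """Look up the schema registered under the topic name."""
    try:
        return SCHEMAS[topic]
    except KeyError:
        raise LookupError(f"no schema registered for topic {topic!r}") from None


def conforms(record, schema):
    """True when every schema field is present with the expected type."""
    return all(isinstance(record.get(name), kind) for name, kind in schema.items())


schema = schema_for_topic("sensor_images")
good = {
    "host": "sensehatmovidius",
    "humidity": 31.2,
    "top1": "n02930766 cab, hack, taxi, taxicab",
}
bad = {"host": "sensehatmovidius", "humidity": "31.2"}  # wrong type, missing field
print(conforms(good, schema), conforms(bad, schema))
```

Keying the registry by topic name means a source component needs nothing beyond the topic it was configured with to find the schema for its stream.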
  • 43. Streaming Analytics Manager
  • 44. Streaming Analytics Manager
  • 45. Thank you
  • 46. Contact
      https://community.hortonworks.com/users/9304/tspann.html
      https://dzone.com/users/297029/bunkertor.html
      https://www.meetup.com/futureofdata-princeton/
      https://twitter.com/PaaSDev
      https://dzone.com/articles/integrating-keras-tensorflow-yolov3-into-apache-ni
  • 47. Hortonworks Community Connection
      Read access for everyone; join to participate and be recognized
      • Full Q&A Platform (like StackOverflow)
      • Knowledge Base Articles
      • Code Samples and Repositories
  • 48. Community Engagement
      Participate now at: community.hortonworks.com
      • 4,000+ Registered Users
      • 10,000+ Answers
      • 15,000+ Technical Assets
      • One Website!