Using Jython To Prototype Mahout Code
Jonathan Altman
Principal Engineer, Concur
Twitter: @async_io

Who Am I and What Do I Do?
• By day: principal engineer at Concur
• Architect of high-volume travel booking site
• Architect of travel data model for itinerary storage/expense integration
• Currently team lead for effort to leverage our travel and spend data into an effective recommendation engine

What is Mahout?
• Java library of pre-built implementations of various machine learning tasks
• Recommenders: collaborative filtering
• Clustering: grouping things by similarity
• Classification: analysis of a corpus for clustering
• Intended to run against Hadoop-based data sets
• http://mahout.apache.org/

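To make the "Recommenders: collaborative filtering" bullet concrete, here is a minimal pure-Python sketch of user-based collaborative filtering. It is not Mahout's implementation and does not call Mahout at all. The sample data and the names `RATINGS`, `pearson`, and `recommend` are hypothetical, chosen only for illustration. The idea: score each item a user has not rated by the preferences of similar users, weighted by how similar they are.

```python
from math import sqrt

# Hypothetical sample data: user -> {item: preference}
RATINGS = {
    "alice": {"paris": 5.0, "rome": 3.0, "tokyo": 4.0},
    "bob": {"paris": 4.0, "rome": 2.0, "tokyo": 5.0, "lima": 4.5},
    "carol": {"paris": 1.0, "rome": 5.0, "lima": 2.0},
}


def pearson(a, b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = [item for item in a if item in b]
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def recommend(ratings, user, how_many=1):
    """Rank items the user has not rated by similarity-weighted preference."""
    totals, weights = {}, {}
    for other, prefs in ratings.items():
        if other == user:
            continue
        sim = pearson(ratings[user], prefs)
        if sim <= 0:
            continue  # ignore users who are dissimilar or uncorrelated
        for item, pref in prefs.items():
            if item not in ratings[user]:
                totals[item] = totals.get(item, 0.0) + sim * pref
                weights[item] = weights.get(item, 0.0) + sim
    scored = [(totals[i] / weights[i], i) for i in totals]
    scored.sort(reverse=True)
    return [(item, score) for score, item in scored[:how_many]]
```

Calling `recommend(RATINGS, "alice")` only draws on bob's preferences, because carol's ratings are negatively correlated with alice's. A library like Mahout supplies tuned, scalable versions of the same pieces (similarity measure, neighborhood, recommender) so you don't have to write them by hand.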
What is jython?
• Implementation of python that runs against the jvm
• Has full access to any well-behaved java library
• Started in 1997 by Jim Hugunin, who also later did IronPython for the .Net CLR
• Version 2.5.2 mirrors python 2.5
• http://www.jython.org/

Why Do This?
• I needed to evaluate Mahout’s suitability as the toolkit for our travel recommender system
• I am not primarily a java dev (yet?), and I don’t know how to create a maven project
• But I do know python
• Fastest way between 2 points is a straight line
• Step 1: adapt sample code from “Mahout In Action” to jython

How Do I Do This?
# Standard-library modules the path setup below relies on
import glob
import os
import sys

# Add Mahout jars to jython’s path
sys.path.append(os.environ.get("MAHOUT_CORE"))
for jar in glob.glob(os.environ.get("MAHOUT_JAR_DIR") + "/*.jar"):
    sys.path.append(jar)

# import classes from Mahout jar…
from java.io import File  # needed for File(...) in main()
from org.apache.mahout.cf.taste.impl.model.file import *
# Bunch of imports deleted

def main():
    # and we are using the imported FileDataModel
    model = FileDataModel(File(sys.argv[1]))

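The jar loop on this slide can be wrapped in a small reusable helper. The sketch below is one way to do that, not the code from the talk. The function names `find_jars`, `add_jars`, and `jar_dir_from_env` are hypothetical. It also guards against one thing the slide's snippet does not: if `MAHOUT_JAR_DIR` is unset, `os.environ.get` returns `None` and the string concatenation fails with a `TypeError`.

```python
import glob
import os
import sys


def jar_dir_from_env(name="MAHOUT_JAR_DIR", environ=None):
    """Read the jar directory from the environment, failing with a clear message."""
    if environ is None:
        environ = os.environ
    value = environ.get(name)
    if not value:
        raise ValueError("set %s to the directory holding the Mahout jars" % name)
    return value


def find_jars(jar_dir):
    """Return the sorted paths of every .jar file directly inside jar_dir."""
    return sorted(glob.glob(os.path.join(jar_dir, "*.jar")))


def add_jars(jar_dir, path=None):
    """Append each jar in jar_dir to path (sys.path by default), skipping duplicates.

    Returns the list of jars that were actually added.
    """
    if path is None:
        path = sys.path
    added = []
    for jar in find_jars(jar_dir):
        if jar not in path:
            path.append(jar)
            added.append(jar)
    return added
```

In a jython script this would be called once, before any `from org.apache.mahout...` import, as `add_jars(jar_dir_from_env())`. It still puts every jar on the path rather than resolving real dependencies, which is the same trade-off the talk makes.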
What Did We Learn?
• About 3 hours to port first “Mahout In Action” example to jython
• 3 minutes to port the second
• Includes learning how to import jars into python
• And building a nice loop to punt on jar dependency management :-)
• Increases ability to experiment with ideas in Mahout by reducing ceremony

Want Some Extra Stuff?
• Python IDEs that work with jython:
  – PyCharm (JetBrains)
  – PyDev (Eclipse add-on)
  – WingIDE (no debugger)
• Ported GroupLens 100k data set example from section 2.5 of “Mahout In Action” is at https://gist.github.com/1041033


Editor's Notes

  • #3: First we built a travel booking tool. Then we integrated it with expense and built reporting. Then we went back and built the trip data storage subsystem to handle increased volumes of data. Now we are trying to put the combined travel and expense data into Hadoop to do analysis and leverage the knowledge of our customers for their benefit.
  • #4: So Mahout looked like it might be a good way to bootstrap our efforts around building recommendations. If nothing else, it might be a fast path to v1 while we write more specialized algorithms tuned to our specific data sets as a v2.
  • #5: It’s very cool: Jim H started both projects as tests: jython to see if the jvm would be faster than python’s vm, IronPython to “prove” the CLR was slow compared to e.g. the JVM (it wasn’t). Yeah, jython’s definitely on the cutting edge with python 2.5 support.
  • #6: Mahout appears to be a good system for doing recommendation engines. We need to find out how good, and what its strengths and limitations are.

    I do know some java; enough to do some light recreational Android programming. But not only do I know python, the data scientist who will actually determine the optimal factors to build our recommendation engine on knows python. She also doesn’t know java (yet?). So I have a tool that the team is familiar with.

    Just building Mahout so I could test it out was painful enough. It requires maven2 to build, but since this is an existing project it was all configured for me to just build after downloading. But I still find it painful to watch maven work.

    I shuddered at the thought of having to actually do the maven setup for a new project that would have to be built.

    Most importantly here, what you end up with when you make Mahout accessible via jython is a rapid prototyping/testing/experimentation tool for building out Mahout code. We’ve taken out the ceremony. That’s all.

    When you’re done figuring out what you need to do, you could then move to compiled java for speed.

    But, for many/most applications, you can probably stop there. The actual Mahout processing is the serious limiting factor here, not the jython code. My suspicion is that there’s far more performance to be gained optimizing the actual Mahout implementation than moving the jython code (which is native jvm by the time it runs) to java/scala/clojure.
  • #8: The single largest chunk of my time was actually spent trying to decide what jars I had to append to my jython path, followed by really grokking the jython path/import stuff.

    As you can see, after enough time I just punted on the jar dependencies. Every single jar is on the path, although I only import from the ones I need. Worth some research into jython to see if I’m adding any overhead other than search path, like opening/inspecting the jars. I suspect not.

    Now, if you knew maven, it might take less time to start a new project and get it up than I would take, but once *I* was done, every subsequent jython script takes almost no time to set up, and the project is ready to run as soon as you’ve saved your source code.

    We can work without having to either build a new app for every experiment, or build in some way to control which experiment runs in some ever-growing app.
  • #9: I haven’t really tested either PyCharm or PyDev to do these things. Someone else can do *that* lightning talk at a later meetup.