Russell Jurney

Bellevue, Washington, United States
4K followers · 500+ connections

About

I work at the intersection of large networks - property graphs or knowledge graphs…

Experience

  • Walmart Global Tech

    Seattle, Washington, United States

Publications

  • Agile Data Science 2.0

    O'Reilly Media

    Building analytics products at scale requires a deep investment in people, machines, and time. How can you be sure you’re building the right models that people will pay for? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Spark.

    Using lightweight tools such as Python, PySpark, Elastic MapReduce, MongoDB, ElasticSearch, Doc2vec, Deep Learning, D3.js, Leaflet, Docker and Heroku, your team will create an agile environment for exploring data, starting with an example application to mine flight data into an analytic product. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working applications.

    * Create analytics applications by using the Agile Data Science development methodology
    * Build value from your data in a series of agile sprints, using the data-value pyramid
    * Learn how to build and deploy predictive analytics using Kafka and Spark Streaming
    * Extract features for statistical models from a single dataset
    * Visualize data with charts, and expose different aspects through interactive reports
    * Use historical data to predict the future via classification and regression
    * Translate predictions into actions
    * Get feedback from users after each sprint to keep your project on track
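
    The flight-data example above maps naturally onto Spark ML. The following is a minimal sketch of that kind of pipeline, not code from the book; the file path and the column names (ArrDelay, DepDelay, Distance) are assumptions about the dataset.

        from pyspark.sql import SparkSession
        from pyspark.ml.feature import VectorAssembler
        from pyspark.ml.classification import LogisticRegression

        spark = SparkSession.builder.appName("flight_delays").getOrCreate()

        # Load flight records; the path and column names are hypothetical.
        flights = spark.read.csv("data/flights.csv", header=True, inferSchema=True)

        # Label: did the flight arrive more than 15 minutes late?
        flights = flights.withColumn(
            "label", (flights["ArrDelay"] > 15).cast("double")
        ).dropna()

        # Assemble numeric columns into the feature vector Spark ML expects.
        assembler = VectorAssembler(inputCols=["DepDelay", "Distance"],
                                    outputCol="features")
        train, test = assembler.transform(flights).randomSplit([0.8, 0.2], seed=13)

        # Classification, per the "predict the future" bullet above.
        model = LogisticRegression().fit(train)
        print("test AUC:", model.evaluate(test).areaUnderROC)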

  • Big Data for Chimps

    O'Reilly Media

    With Early Release ebooks, you get books in their earliest form—the author's raw and unedited content as he or she writes—so you can take advantage of these technologies long before the official release of these titles. You'll also receive updates when significant changes are made, new chapters as they're written, and the final ebook bundle.

    To help you answer big data questions, this unique guide shows you how to use simple, fun, and elegant tools leveraging Apache Hadoop. You'll learn how to break problems into efficient data transformations to meet most of your analysis needs.

    This book uses real data and real problems to illustrate patterns found across knowledge domains. It equips you with a fundamental toolkit for performing statistical summaries, text mining, spatial and time-series analysis, and light machine learning. For those working in an elastic cloud environment, you'll learn superpowers that make exploratory analytics especially efficient.

    * Learn from detailed example programs that apply Hadoop to interesting problems in context
    * Gain advice and best practices for efficient software development
    * Discover how to think at scale by understanding how data must flow through the cluster to effect transformations
    * Identify the tuning knobs that matter, and rules-of-thumb to know when they're needed
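
    "Breaking problems into efficient data transformations" concretely means expressing work as map and reduce steps over records. As a toy illustration of the technique (not an example from the book), here is a word count written as a Hadoop Streaming mapper and reducer:

        # mapper.py -- emit (word, 1) for every word read from stdin
        import sys

        for line in sys.stdin:
            for word in line.split():
                print(f"{word}\t1")

        # reducer.py -- Hadoop delivers keys sorted, so equal keys arrive in runs
        import sys
        from itertools import groupby

        pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin if line.strip())
        for word, group in groupby(pairs, key=lambda kv: kv[0]):
            print(f"{word}\t{sum(int(n) for _, n in group)}")

    The same pair can be tested locally with a shell pipeline (cat input.txt | python mapper.py | sort | python reducer.py), since sort stands in for Hadoop's shuffle.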

  • Agile Data Science

    O'Reilly Media

    Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop.

    Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps.

    * Create analytics applications by using the agile big data development methodology
    * Build value from your data in a series of agile sprints, using the data-value stack
    * Gain insight by using several data structures to extract multiple features from a single dataset
    * Visualize data with charts, and expose different aspects through interactive reports
    * Use historical data to predict the future, and translate predictions into action
    * Get feedback from users after each sprint to keep your project on track
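
    The inbox-mining starting point can be approximated with nothing but the standard library. A minimal sketch, not the book's code; the mbox path is an assumption:

        import mailbox
        from collections import Counter

        # Tally who appears most often in the From: header of a local mailbox.
        senders = Counter()
        for message in mailbox.mbox("inbox.mbox"):
            sender = message.get("From")
            if sender:
                senders[sender] += 1

        for sender, count in senders.most_common(10):
            print(count, sender)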

  • How to Build a Data Startup

    Forbes: O'Reilly Radar

    Jurney takes on the question of how many people you need to start a data product team. He draws out the ideal roles for such a team, including customer, market strategist, deal maker, product manager, experience designer, interaction designer, web developer, data hacker, and researcher.

  • The Next Silicon Valley

    The Wall Street Journal: All Things Digital

    What is Silicon Valley?

    An economic cluster. A network of networks, rich in financial and social capital spanning every area of technology, all focused on developing and commercializing new technologies.

  • LinkedIn, Apache Pig and Open Source

    LinkedIn Blog

    The Integrated Circuit solved the Tyranny of Numbers and unleashed Moore’s law, enabling a computerized, networked society.  It did so with the considerable overhead of patent licensing and litigation.  MapReduce is solving the Tyranny of Threads, enabling any company to process data at scale in parallel to extract real value from our most abundant and underutilized resource: information.  It is doing it in the open, through free and open-source software, through the Apache Foundation, Hadoop and its sub-projects.  We’ve gotten more efficient organizationally this time around.

  • MapReduce for the People

    Lance Weatherby

    We are constrained in our strategies by what we imagine possible. MapReduce and cloud computing open broad possibilities and business opportunities by placing a usable supercomputer, rented by the hour, in the hands of every startup that wants one. There is no problem for which you lack the processing power; it's just a question of whether the hourly cost is profitable. That's a profound change from being bound to one machine. As a result of this shift, smaller companies can attack 'bigger' problems without a large up-front investment in hardware or software infrastructure.

    A new renaissance in computing is coming that will be comparable to the business adoption of the personal computer and VisiCalc, and MapReduce will drive it.
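
    The "hourly cost" point is simple arithmetic. A back-of-envelope sketch with invented numbers (the prices and job size below are assumptions, not quoted figures):

        # Is renting a cluster by the hour profitable for one job?
        nodes = 100                  # assumed cluster size
        hours = 3                    # assumed job duration
        price_per_node_hour = 0.10   # assumed on-demand price, USD
        value_of_answer = 500.00     # assumed business value of the result, USD

        cost = nodes * hours * price_per_node_hour
        print(f"cost ${cost:.2f}, profitable: {value_of_answer > cost}")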

  • Mapping Big Data

    O'Reilly Media

    To discover the shape and structure of the big data market, the San Francisco-based startup Relato took a unique approach to market research and created the first fully data-driven market report. Company CEO Russell Jurney and his team collected and analyzed raw data from a variety of sources to reveal a boatload of business insights about the big data space. This exceptional report is now available for free download.

    Using data analytic techniques such as social network analysis (SNA), Relato exposed the vast and complex partnership network that exists among tens of thousands of unique big data vendors. The dataset Relato collected is centered around Cloudera, Hortonworks, and MapR, the major platform vendors of Hadoop, the primary force behind this market.

    From this snowball sample, a 2-hop network, the Relato team was able to answer several questions, including:

    * Who are the major players in the big data market?
    * Which is the leading Hadoop vendor?
    * What sectors are included in this market and how do they relate?
    * Which among the thousands of partnerships are most important?
    * Who’s doing business with whom?

    Metrics used in this report are also visible in Relato’s interactive web application, via a link in the report, which walks you through the insights step-by-step.
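
    "Snowball sample" and "2-hop network" have precise graph meanings: start from seed vendors and follow partnership edges outward two steps. A sketch of the technique with the networkx library; the edge list is illustrative, not Relato's data:

        import networkx as nx

        # Toy partnership graph; the real data came from Relato's collection effort.
        G = nx.Graph([
            ("Cloudera", "VendorA"), ("Cloudera", "VendorB"),
            ("Hortonworks", "VendorB"), ("MapR", "VendorC"),
            ("VendorA", "VendorD"), ("VendorC", "VendorE"),
        ])

        # 2-hop snowball sample: union of radius-2 ego networks around the seeds.
        seeds = ["Cloudera", "Hortonworks", "MapR"]
        sample = G.subgraph(
            {n for seed in seeds for n in nx.ego_graph(G, seed, radius=2)}
        )

        # "Which partnerships are most important?" -- edge betweenness is one proxy.
        ranked = sorted(nx.edge_betweenness_centrality(sample).items(),
                        key=lambda kv: -kv[1])
        for edge, score in ranked[:3]:
            print(edge, round(score, 3))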

Patents

  • Methods and systems for exploring career options

    US20120226623 A1

    Techniques for presenting career information are described. Consistent with some embodiments, the profile data of members of a social network service is analyzed to generate a set of probabilities for use in predicting career transitions. Based on some profile data (e.g., academic major, academic degree, desired industry, etc.) provided by a user, the derived probabilities are used to predict a set of job titles likely to be of interest to the user. By repeating this process, the user can generate a career path, which is displayed in a visual and interactive manner, enabling the user to explore various aspects of different careers, industries and jobs.
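
    The mechanism described reads like a Markov model over job titles: estimate transition probabilities from member histories, then repeatedly take the most likely next title. A minimal sketch of that idea (the histories below are invented, and the patent's actual method may differ):

        from collections import Counter, defaultdict

        # Invented career histories standing in for member profile data.
        histories = [
            ["Analyst", "Data Scientist", "Senior Data Scientist"],
            ["Analyst", "Data Engineer", "Senior Data Engineer"],
            ["Analyst", "Data Scientist", "Manager, Data Science"],
        ]

        # Count observed (current title -> next title) transitions.
        transitions = defaultdict(Counter)
        for history in histories:
            for current, nxt in zip(history, history[1:]):
                transitions[current][nxt] += 1

        # Predict a career path by repeatedly taking the most probable next title.
        title, path = "Analyst", ["Analyst"]
        while title in transitions:
            title = transitions[title].most_common(1)[0][0]
            path.append(title)
        print(" -> ".join(path))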

Projects

  • InMaps

    - Present

    InMaps is an interactive visual representation of your professional universe. It's a great way to understand the relationships between you and your entire set of LinkedIn connections. With it you can better leverage your professional network to help pass along job opportunities, seek professional advice, gather insights, and more.
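
    The colored clusters InMaps drew correspond to community detection on the member's connection graph. A sketch of that general technique with networkx (the bundled Les Misérables co-occurrence graph stands in for a real connection network; this is not InMaps' implementation):

        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities

        # A bundled social graph used as stand-in data for a member's network.
        G = nx.les_miserables_graph()

        # Detect communities -- the groupings InMaps rendered as colored clusters.
        for i, community in enumerate(greedy_modularity_communities(G)):
            print(f"cluster {i}: {len(community)} members")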

  • Agile Data Science 2.0

    Agile Data Science 2.0 [and its source code] were released in June 2017 by O'Reilly Media. Source code is included on GitHub, with a Dockerfile to run everything. Please look at the notebooks, not the code in the book, which is badly outdated.

    The book description follows:

    Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they’re to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools.

    Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, ElasticSearch, d3.js, scikit-learn, and Apache Airflow. You’ll learn an iterative approach that lets you quickly change the kind of analysis you’re doing, depending on what the data is telling you. Publish data science work as a web application, and affect meaningful change in your organization.
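
    Of the tools named above, Apache Airflow is the one that ties the batch steps together. A minimal sketch of such a DAG using Airflow 2.x conventions (the task names and placeholder callables are assumptions, not the book's code):

        from datetime import datetime
        from airflow import DAG
        from airflow.operators.python import PythonOperator

        def extract():
            pass  # placeholder: pull raw records, e.g. from MongoDB

        def train():
            pass  # placeholder: fit a scikit-learn model on extracted features

        def publish():
            pass  # placeholder: push predictions, e.g. into Elasticsearch

        with DAG(dag_id="analytics_app", start_date=datetime(2017, 6, 1),
                 schedule_interval="@daily", catchup=False) as dag:
            extract_task = PythonOperator(task_id="extract", python_callable=extract)
            train_task = PythonOperator(task_id="train", python_callable=train)
            publish_task = PythonOperator(task_id="publish", python_callable=publish)
            extract_task >> train_task >> publish_task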

Languages

  • Russian

    Limited working proficiency

  • Spanish

    Elementary proficiency

  • Greek, Ancient (to 1453)

    Elementary proficiency

Organizations

  • Bay Area Data Drinking Group

    Full Member

Recommendations received

22 people have recommended Russell
