Tut031 Zhu

The document outlines a tutorial on predictive modeling in business applications, emphasizing its importance in driving ROI through actionable predictions. It covers best practices, challenges, and technologies for building predictive modeling solutions, with case studies from LinkedIn. The tutorial is aimed at researchers and practitioners interested in industry applications of predictive modeling, providing insights into framework selection and popular tools.

Business Applications of Predictive Modeling at Scale
Qiang Zhu¹, Songtao Guo¹, Paul Ogilvie², Yan Liu¹
¹Business Analytics and ²Engineering at LinkedIn Corporation
2029 Stierlin Ct, Mountain View, CA 94043 USA
{qzhu, soguo, pogilvie, yliu}@linkedin.com

ABSTRACT
Predictive modeling is the art of building statistical models that forecast probabilities and trends of future events. It has broad applications in industry across different domains. Some popular examples include user intention prediction, lead scoring, and churn analysis. In this tutorial, we will focus on best practices of predictive modeling in the big data era and its applications in industry, with motivating examples across a range of business tasks and relevance products. We will start with an overview of how predictive modeling helps power and drive various key business use cases [5]. We will introduce the essential concepts and the state of the art in building end-to-end predictive modeling solutions, and discuss the challenges [6], key technologies, and lessons learned from our practice, including case studies of LinkedIn feed relevance [1-4] and a platform for email response prediction. Moreover, we will discuss practical solutions for building a predictive modeling platform that scales the modeling efforts of data scientists and analysts, along with an overview of popular tools and platforms used across the industry.

Keywords
predictive modeling; business analytics; machine learning; machine learning platforms

1. INTENDED AUDIENCE
This tutorial is suitable for researchers, students, and practitioners of predictive modeling who are interested in industry applications. Advanced techniques in data mining and statistical modeling are not required, but some background in statistics and big data is expected.

2. OUTLINE
The tutorial consists of three main sections: an introduction, an overview of predictive modeling, and considerations when choosing a framework for predictive modeling.

2.1 Introduction
In the introduction, we will briefly survey the audience to better understand their goals and adapt the depth of information presented during the tutorial. We motivate the tutorial with prominent examples of predictive models and demonstrate how actionable prediction scores can fuel better ROI.
• Motivating examples
• Understanding of audience and learning objectives

2.2 Predictive Modeling Overview
Those applying predictive modeling in a business environment must carefully consider a wider range of aspects than is commonly discussed in the academic literature. Statistical methods and machine learning algorithms are just one component of a full solution. This section describes the full range of considerations, details common challenges, and provides a concrete example of applying predictive modeling to the feed ranking problem at LinkedIn.
• End-to-end walkthrough of a production modeling solution
  o Label preparation
  o Data integration and feature engineering
  o Machine learning algorithms
  o Model management
  o Performance measurement through A/B testing
• Common pitfalls and challenges
• Case Study: LinkedIn Feed Ranking

2.3 Choosing a Framework
A practitioner in industry may be faced with the challenge of choosing or building a framework for predictive modeling within their company. Deciding whether to build or buy depends on a range of considerations, which the tutorial presents. We also present an overview of existing platforms and open source software, closing with a concrete example of the decisions made for a propensity modeling framework developed at LinkedIn.
• Considerations when choosing a framework
• Platforms
  o Amazon Machine Learning¹
  o Databricks²
  o Microsoft Azure Machine Learning³
  o Google Cloud Machine Learning Platform⁴
  o H2O⁵
  o Dato⁶
• Open source software
  o Vowpal Wabbit⁷
  o Spark MLlib⁸
  o DMLC⁹
  o Scikit-learn¹⁰
  o R¹¹
• Modoop: an example of a scaled framework

3. PRESENTER INFORMATION
Qiang Zhu is a Staff member of the Business Analytics Data Mining team at LinkedIn. He and his team apply advanced Data Mining techniques to drive LinkedIn's monetization efforts, ranging from a machine learning platform that powers member Email Marketing to Sales Intelligence tools that help salespeople sell smarter. Prior to joining LinkedIn, he worked at StumbleUpon as a Data Scientist. Qiang holds a PhD in Computer Science from the University of California, Riverside. His work has appeared in many top-tier Data Mining conferences and journals, including a paper that won the Best Paper Award at SIGKDD 2012.

Songtao Guo is a Principal Data Scientist and tech lead of the Data Mining team at LinkedIn, where he leads many data-driven products and analytics systems. His work involves building a large-scale knowledge base as one of the foundations of LinkedIn's Economic Graph, inventing data mining platforms to scale business analytics, and partnering with product, sales, and marketing to deliver impactful solutions. Before joining LinkedIn, Songtao was a senior researcher at AT&T Interactive, focusing on improving data quality and search relevancy for local business search. He holds a PhD in computer science from the University of North Carolina at Charlotte, where he studied privacy-preserving data mining.

Paul Ogilvie manages the Machine Learning Algorithms team in the Engineering organization of LinkedIn. The team's mission is to research and develop the learning algorithm libraries and datasets that help research scientists more productively build state-of-the-art relevance models. He earned his PhD in Language and Information Technologies from Carnegie Mellon University in 2010, where he studied semi-structured information retrieval with applications to web search, XML element retrieval, and question answering systems. He has previously worked on news recommendation at a startup (mSpoke) and at LinkedIn.

Yan Liu manages the Data Mining team in the LinkedIn Analytics group. She leads various data mining initiatives and efforts in building advanced intelligence solutions and scalable data mining platforms to create leverage and drive business impact across functions. Before joining LinkedIn, she worked on search relevance and personalization at NexTag. Yan holds a Ph.D. in statistics from the University of Virginia and a B.S. in computer science from China.

4. REFERENCES
[1] Agarwal, D., Chen, B.C., Gupta, R., Hartman, J., He, Q., Iyer, A., Kolar, S., Ma, Y., Shivaswamy, P., Singh, A., and Zhang, L. 2014. Activity ranking in LinkedIn feed. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '14). ACM, New York, NY, USA, 1603-1612.
[2] Agarwal, D., Chen, B.C., He, Q., Hua, Z., Lebanon, G., Ma, Y., Shivaswamy, P., Tseng, H.P., Yang, J., and Zhang, L. 2015. Personalizing LinkedIn Feed. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '15). ACM, New York, NY, USA, 1651-1660.
[3] Lebanon, G. 2015. Making Your Feed More Relevant – Part I. November 17, 2015. Retrieved June 12, 2016 from https://engineering.linkedin.com/blog/2015/11/making-your-feed-more-relevant--part-i
[4] Lebanon, G. 2016. Making Your Feed More Relevant – Part 2: Relevance models and features. March 15, 2016. Retrieved June 12, 2016 from https://engineering.linkedin.com/blog/2016/03/making-your-feed-more-relevant--part-2--relevance-models-and-fea
[5] Rosenberg, C. 2015. B2B Predictive Analytics Technology Report: Best practices, tools, and vendor evaluations to help marketing and sales organizations adopt predictive analytics. July 2015. Retrieved June 12, 2016 from Infer: https://www.infer.com/wp-content/uploads/2015/08/TOPO-Predictive-Analytics-08-03-15.pdf
[6] Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., and Young, M. 2014. Machine Learning: The High-Interest Credit Card of Technical Debt. In Software Engineering for Machine Learning (NIPS 2014 Workshop).

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Copyright is held by the owner/author(s).
KDD '16, August 13-17, 2016, San Francisco, CA, USA
ACM 978-1-4503-4232-2/16/08.
http://dx.doi.org/10.1145/2939672.2945388

1 https://aws.amazon.com/machine-learning
2 https://databricks.com
3 https://azure.microsoft.com/en-us/services/machine-learning
4 https://cloud.google.com/ml/
5 http://www.h2o.ai
6 https://dato.com
7 https://github.com/JohnLangford/vowpal_wabbit/wiki
8 http://spark.apache.org/docs/latest/mllib-guide.html
9 http://dmlc.ml
10 http://scikit-learn.org
11 https://www.r-project.org
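The end-to-end walkthrough listed in Section 2.2 (label preparation, feature engineering, a learning algorithm, and performance measurement through an A/B test) can be illustrated in miniature. The Python sketch below is a hypothetical example, not the presenters' method or a LinkedIn system: the `Member` record and its activity fields, the log-scaled features, logistic regression trained by full-batch gradient descent, and the pooled two-proportion z-test are all assumptions chosen for brevity, and the data are made up. It uses only the standard library.

```python
import math
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical member record; field names are illustrative assumptions."""
    logins_last_30d: int
    emails_opened_last_30d: int
    converted: bool  # observed outcome used to build the label


def prepare_labels(members):
    """Label preparation: turn the observed outcome into a 0/1 target."""
    return [1 if m.converted else 0 for m in members]


def engineer_features(members):
    """Feature engineering: bias term plus log-scaled activity counts."""
    return [
        [1.0, math.log1p(m.logins_last_30d), math.log1p(m.emails_opened_last_30d)]
        for m in members
    ]


def sigmoid(z):
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights, x):
    """Propensity score: predicted probability of the positive outcome."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train_logistic_regression(X, y, lr=0.5, epochs=3000):
    """Fit logistic regression by full-batch gradient descent on mean log-loss."""
    weights = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, target in zip(X, y):
            err = score(weights, x) - target  # derivative of log-loss w.r.t. the logit
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """A/B measurement: pooled two-proportion z-test; returns (z, two-sided p)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) under the null
    return z, p_value


# Toy usage with made-up data (not LinkedIn data).
members = [
    Member(20, 8, True), Member(15, 5, True), Member(25, 10, True), Member(12, 6, True),
    Member(1, 0, False), Member(3, 1, False), Member(0, 0, False), Member(2, 2, False),
]
weights = train_logistic_regression(engineer_features(members), prepare_labels(members))
active, dormant = engineer_features([Member(30, 12, False), Member(1, 1, False)])
print(f"active member score:  {score(weights, active):.3f}")
print(f"dormant member score: {score(weights, dormant):.3f}")

# Hypothetical A/B readout: control converts 100/1000, treatment 130/1000.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Logistic regression is used here only because it outputs a probability directly, matching the abstract's description of models that forecast probabilities of future events; the tutorial itself does not prescribe an algorithm. Model management is omitted, and at production scale these toy loops would presumably be replaced by one of the platforms or libraries listed in Section 2.3.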
