
Tutorial: Feature Engineering for Recommender Systems

Chris Deotte∗, Benedikt Schifferer∗, Even Oldridge∗†
[email protected], [email protected], [email protected]
NVIDIA
San Diego, California, United States · New York City, New York, United States · Vancouver, British Columbia, Canada
ABSTRACT
The selection of features and proper preparation of data for deep learning or machine learning models play a significant role in the performance of recommender systems. To address this, we propose a tutorial highlighting best practices and optimization techniques for feature engineering and preprocessing of recommender system datasets. The tutorial will explore feature engineering using pandas and Dask, and will also cover acceleration on the GPU using open-source libraries like RAPIDS and NVTabular. The proposed length is 180 minutes. We have designed the tutorial as a combination of a lecture covering the mathematical and theoretical background and an interactive session based on Jupyter notebooks. Participants will practice the discussed techniques by writing their own implementations in Python. NVIDIA will host the tutorial on its infrastructure, providing the dataset, Jupyter notebooks, and GPUs. Participants will be able to attend the tutorial easily via their web browsers, avoiding any complicated setup. The target audience is beginner to intermediate users, who should have prior knowledge of Python programming using libraries such as pandas and NumPy. In addition, they should have a basic understanding of recommender systems, decision trees, and feed-forward neural networks.

CCS CONCEPTS
• Information systems → Recommender systems; Content analysis and feature selection; • Computer systems organization → Single instruction, multiple data.

KEYWORDS
Recommender Systems, Deep Learning, Boosting, Preprocessing, Feature Engineering, GPU Acceleration

ACM Reference Format:
Chris Deotte, Benedikt Schifferer, and Even Oldridge. 2020. Tutorial: Feature Engineering for Recommender Systems. In Fourteenth ACM Conference on Recommender Systems (RecSys '20), September 22–26, 2020, Virtual Event, Brazil. ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3383313.3411543

∗ Authors contributed equally to this research.
† Corresponding Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
RecSys '20, September 22–26, 2020, Virtual Event, Brazil
© 2020 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-7583-2/20/09.
https://doi.org/10.1145/3383313.3411543

1 MOTIVATION
In our tutorial, we provide a general framework for feature engineering specific to recommender systems, building off our teams' collective experience creating production recommender systems and competing in data science competitions such as Kaggle and RecSys. Academic literature on recommender systems focuses mainly on the different models and model types and rarely discusses the steps for preprocessing or feature engineering. Yet feature engineering is an important component of recommender systems and can be easily integrated into an existing model. The tabular data structure of recommender systems limits the models' ability to learn the relationships between features, and adding hand-crafted features can significantly boost their performance. For example, we observed in the RecSys 2020 challenge that hand-crafted features with simple models outperformed complex model architectures.

2 IMPORTANCE FOR THE RECSYS COMMUNITY
Our goal is that participants can integrate the learned material into their own recommender systems. As mentioned above, engineering hand-crafted features can significantly improve recommendation systems. Furthermore, participants will learn to optimize their feature engineering pipelines, allowing for more exploration and iteration. The time taken to perform feature engineering, categorical encoding, and normalization of numerical variables often exceeds the time it takes to train the deep recommender model itself. Optimizing the data processing enables participants to run more iterations and try out more ideas. In our experiments, we were able to reduce the calculation time from multiple days to less than an hour. Applying our techniques allows participants to focus on their actual work of designing recommendation models instead of waiting for preprocessing calculations. Finally, reducing the data processing time makes it possible to retrain the recommendation system more frequently, keeping models in production systems up to date.

3 OUTLINE
Section 1 - Theory (40 min)
• Introduction and tutorial overview
• Short review of recommendation models, tree-based and deep learning
• Overview of different input feature types
• Preprocessing techniques
  – cleaning, imputing missing values, correcting outliers
• Feature engineering
  – see Table 1
  – Overview of all feature engineering techniques
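The preprocessing techniques in the outline (cleaning, imputing missing values, correcting outliers) can be sketched in pandas roughly as follows; the column name, toy data, and quantile thresholds are illustrative assumptions, not taken from the tutorial materials:

```python
import numpy as np
import pandas as pd

def clean_numeric(df, col, clip_quantiles=(0.01, 0.99)):
    """Impute missing values with the median, then clip outliers to quantiles."""
    out = df.copy()
    out[col] = out[col].fillna(out[col].median())
    lo, hi = out[col].quantile(clip_quantiles)
    out[col] = out[col].clip(lo, hi)
    return out

# Toy data: one missing value and one extreme outlier.
df = pd.DataFrame({"price": [1.0, 2.0, np.nan, 3.0, 1000.0]})
cleaned = clean_numeric(df, "price")
```

Clipping to quantiles rather than dropping rows keeps the training set size unchanged, which is one common way to correct outliers in tabular pipelines.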


Table 1: Overview of different feature engineering techniques by feature types

Feature Type        Feature Engineering

Categorical         Target Encoding
                    Count Encoding
                    Categorifying
Unstructured Lists  Target Encoding
                    Count Encoding
                    Categorifying
Numeric             Normalization (mean/std, min/max, log-based, Gauss Rank)
                    Power transformer
                    Binning
Timestamp           Extract Month, Day, Weekday, Weekend, Hour, Minute, Second
                    Target Encode intervals
                    Count Encode intervals
                    Normalize based on time zone
Timeseries          Time since last event
                    Differences in time (lag features)
                    # of events in the last 1 min, 5 min, 30 min, etc.
Text                Extract keywords
                    Tf–idf
                    Language embeddings (deep learning)
                    Length/Quality/Complexity
Images              Image embeddings (deep learning)
                    Resolution
                    Quality
                    Color spectrum
Social Graph        Link analysis
Geo Location        Distance to different POI
                    Characteristics in area
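Several of the most common techniques from Table 1 (Categorifying, Count Encoding, Target Encoding, timestamp extraction, and a time-since-last-event lag feature) can be sketched in pandas; the toy data and column names below are invented for illustration and the tutorial's own notebooks may differ:

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["a", "b", "a", "c", "a", "b"],
    "click": [1, 0, 1, 0, 0, 1],
    "ts": pd.to_datetime([
        "2020-09-22 10:00", "2020-09-22 10:05", "2020-09-22 11:00",
        "2020-09-23 09:30", "2020-09-23 09:45", "2020-09-24 12:00",
    ]),
})

# Categorifying: map each category to a contiguous integer id.
df["item_id"] = df["item"].astype("category").cat.codes

# Count Encoding: replace a category by its frequency in the data.
df["item_count"] = df.groupby("item")["item"].transform("count")

# Target Encoding: replace a category by the mean of the target.
# (In practice this should be computed out-of-fold to avoid target leakage.)
df["item_te"] = df.groupby("item")["click"].transform("mean")

# Timestamp features: extract calendar components.
df["hour"] = df["ts"].dt.hour
df["weekday"] = df["ts"].dt.weekday

# Timeseries: seconds since the previous event of the same item (lag feature).
df["secs_since_last"] = (
    df.sort_values("ts").groupby("item")["ts"].diff().dt.total_seconds()
)
```

Note that every encoding above is a single vectorized `groupby`/`transform` rather than a Python loop over rows, which is the kind of pattern the optimization section of the tutorial targets.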

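The Gauss Rank normalization listed in the Numeric row of Table 1 can be sketched with the standard library's `NormalDist`; this is a simplified variant (rank the values, scale the ranks into (0, 1), apply the inverse normal CDF) and not necessarily the exact implementation used in the tutorial:

```python
import pandas as pd
from statistics import NormalDist

def gauss_rank(series):
    """Gauss Rank: rank values, scale ranks into (0, 1), apply inverse normal CDF."""
    ranks = series.rank(method="average")      # ranks 1 .. n
    quantiles = (ranks - 0.5) / len(series)    # strictly inside (0, 1)
    return quantiles.map(NormalDist().inv_cdf)

# Heavily skewed toy values become roughly standard-normal after the transform.
x = pd.Series([1.0, 5.0, 2.0, 100.0])
z = gauss_rank(x)
```

Because the transform depends only on ranks, it is robust to outliers and skew, which is why it is a popular alternative to mean/std normalization for neural network inputs.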
  – In-depth lecture on the most common techniques (bold)

5 min break

Section 2 - Hands-on (80 min)
• Example of different types of data (exploring the dataset)
• Hands-on contains multiple exercises; participants have to fill in blank code blocks
  – Preprocessing techniques
    ∗ cleaning, imputing missing values, correcting outliers
  – Feature engineering
    ∗ Implementation of the most common techniques (see Table 1, bold)

5 min break

Section 3 - Optimization (40 min)
• Definition of typical bottlenecks
• General optimization best practices for speed-ups
• Implementation of preprocessing and feature engineering techniques that required significant calculation time
• Participants will implement some operations
• Introduction to the NVTabular pipeline
• Rewriting the preprocessing and feature engineering pipeline with NVTabular

Section 4 - Wrap up/Summary (10 min)

ACKNOWLEDGMENTS
The authors wish to thank our colleagues on the Deep Learning Institute, RecSys, KGMON, and RAPIDS.AI teams for their support, and in particular Joshua Patterson for his vision of a GPU-accelerated data science workflow and Nicolas Koumchatzky for his guidance and recommender system expertise.

