
Large-Scale Machine Learning

This document provides an overview of a course on large-scale machine learning. It discusses key concepts in data science including mathematics, statistics, machine learning, domain expertise, and applications. It notes challenges like dealing with large amounts of training data or high-dimensional data. The course will cover topics like online linear learning, second order optimization methods, boosted decision trees, parallel learning techniques, hashing and dimensionality reduction, feature learning and deep learning, active learning, and exploration and learning. Students will complete programming assignments and a project, and the course will utilize a 100-node computing cluster for large-scale processing.


Large-Scale Machine Learning
John Langford, Microsoft Research
Yann LeCun, Courant Institute

What is Data Science?
Data Science: automatically extracting knowledge from data
Mathematics & Statistics
Machine Learning
Domain Expertise
Applications in Business
Lots and lots
Applications in the Sciences
Astronomy, Cosmology
High-energy Physics
Biology, Genomics
Neuroscience
The Social Sciences

Medicine
Government

[Figure: Venn diagram with circles labeled "Mathematics & Statistics", "Machine Learning", "Computation", and "Domain Expertise"; regions labeled "Data Science", "conventional research", and "Danger Zone!". After Drew Conway's Data Science Venn Diagram.]

Large Scale Machine Learning
Class website:
http://cilvr.cs.nyu.edu/doku.php?id=courses:bigdata:start
(from http://cilvr.cs.nyu.edu, follow: courses, big data)
Forum, discussion, Q&A on Piazza
https://piazza.com/class#spring2013/csciga3033002
Evaluation:
Programming assignments
Project
Final exam
Computing infrastructure
100-node cluster, 8 CPUs/node, Hadoop (donated by Yahoo! Labs)
Software
Torch: http://www.torch.ch/
Vowpal Wabbit: https://github.com/JohnLangford/vowpal_wabbit/wiki

Big Data?
Data often comes in the form of a table
N: dimension of each vector (possibly very sparse)
T: number of training samples (possibly infinite)
Big Data is large T, or large N, or both
Large T, small N: great!
Infinite T, small N: on-line / streaming
Small T, large N: hell!
Problems:
(distributed) data storage and access
can't use algorithms that are super-linear in T
Large N: overfitting
Parallelizing
Dealing with unbalanced set
Representing high-dim data
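
The on-line / streaming case above can be illustrated with a minimal sketch, assuming a squared loss and a plain stochastic gradient update (neither is specified on the slide). Each example is a sparse vector stored as an index-to-value dictionary, so one update costs time proportional to the number of non-zero features and the whole pass is linear in T. The names `sgd_update`, `stream`, `train_online`, and `lr` are hypothetical, and the data is synthetic.

```python
import random

def sgd_update(w, x, y, lr):
    """One online step for squared loss on a single sparse example.

    w  : dense list of N weights (modified in place)
    x  : sparse feature vector as {index: value}
    y  : real-valued target
    lr : learning rate (hypothetical parameter name)
    """
    pred = sum(w[i] * v for i, v in x.items())
    err = pred - y
    for i, v in x.items():  # touch only the non-zero features
        w[i] -= lr * err * v
    return err * err

def stream(true_w, T, nnz, seed=0):
    """Yield T synthetic sparse examples one at a time (never stored as a table)."""
    rng = random.Random(seed)
    N = len(true_w)
    for _ in range(T):
        idx = rng.sample(range(N), nnz)
        x = {i: rng.uniform(-1.0, 1.0) for i in idx}
        y = sum(true_w[i] * v for i, v in x.items())  # noiseless linear target
        yield x, y

def train_online(N, examples, lr=0.1):
    """Single pass over the stream: O(non-zeros) work per example."""
    w = [0.0] * N
    for x, y in examples:
        sgd_update(w, x, y, lr)
    return w

# Synthetic check: recover a known weight vector from a stream of 20000 examples.
true_w = [0.5 * ((-1) ** i) for i in range(20)]
w = train_online(20, stream(true_w, T=20000, nnz=3))
max_err = max(abs(a - b) for a, b in zip(w, true_w))
```

Because the examples are consumed from a generator, memory use does not grow with T, which is the property that makes this style of learner usable when T is effectively infinite.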

Syllabus
Intro
Online Linear learning
2nd order optimization methods
LBFGS
Online Non-linear learning
Boosted Decision Trees
Hadoop, Allreduce
Parallel learning, OpenMP, CUDA
Inverted Indices & Predictive Indexing
Hashing, LSH, linear/non-linear dimensionality reduction
Feature Learning, deep learning
Many Classes
Active Learning
Exploration and Learning
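
As one illustration of the "Hashing" topic listed above, here is a minimal sketch of the feature hashing trick, which maps arbitrarily many string features into a fixed number of dimensions. It is not taken from the course material: the function name `hash_features`, the choice of CRC32 as the hash, and the use of the top hash bit as a sign are all assumptions made for this example.

```python
import zlib

def hash_features(tokens, num_buckets):
    """Map string tokens to a sparse {index: value} vector of fixed size.

    tokens      : iterable of feature strings
    num_buckets : dimension of the hashed feature space (hypothetical name)
    """
    x = {}
    for tok in tokens:
        h = zlib.crc32(tok.encode("utf-8"))  # deterministic unsigned 32-bit hash
        idx = h % num_buckets                # bucket index in [0, num_buckets)
        sign = 1.0 if (h >> 31) & 1 == 0 else -1.0  # top bit chooses the sign
        x[idx] = x.get(idx, 0.0) + sign      # colliding tokens share a bucket
    return x

vec = hash_features(["user=alice", "country=fr", "device=phone"], 1024)
repeat = hash_features(["word", "word"], 8)
```

The output dimension is fixed in advance regardless of how many distinct features appear, at the cost of occasional collisions between unrelated features.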