Lecture Summary
Instead of taking the union over all the events, we look at the biggest one: the supremum over the hypotheses. In the infinite
case a maximum does not have to exist, which is why we use the supremum.
Just like the limit of a function, the supremum is the upper limit (and the infimum the lower limit),
even though the limit value itself may never be attained by any sample we draw.
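A hedged sketch of this step in the usual uniform-convergence notation (R(h) the true risk, \hat{R}_D(h) the empirical risk on a sample D of size m; these symbols are assumed, not taken verbatim from the lecture):

    P[ \exists h \in H : |R(h) - \hat{R}_D(h)| > \epsilon ] = P[ \sup_{h \in H} |R(h) - \hat{R}_D(h)| > \epsilon ]

So instead of a union bound over all hypotheses, we bound the single event defined by the supremum.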
To get rid of the infinity, we observe that we are both training and testing in the equation.
During AI training we have a training set and a test set.
For the expectation it does not matter if we swap elements between D and D'; it does
not matter from which of the two datasets we sample.
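A hedged sketch of the ghost-sample (symmetrization) step this likely refers to, with D the training sample and D' an independent sample of the same size m (notation assumed):

    E_D[ \sup_{h \in H} ( R(h) - \hat{R}_D(h) ) ] \le E_{D,D'}[ \sup_{h \in H} ( \hat{R}_{D'}(h) - \hat{R}_D(h) ) ]

since R(h) = E_{D'}[\hat{R}_{D'}(h)] and the supremum of an expectation is at most the expectation of the supremum.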
Using Rademacher variables we can generate a random sign vector \sigma \in \{-1,1\}^m, which lets us observe
more things.
Since the equality holds for any sign vector of length m, we can swap the order of the
expectations.
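A hedged sketch of how the sign vector enters, again in the assumed notation (z_i from D, z'_i from D', loss \ell):

    E_{D,D'}[ \sup_h \frac{1}{m} \sum_i ( \ell(h,z'_i) - \ell(h,z_i) ) ]
      = E_{\sigma,D,D'}[ \sup_h \frac{1}{m} \sum_i \sigma_i ( \ell(h,z'_i) - \ell(h,z_i) ) ]
      \le 2 \, E_{\sigma,D}[ \sup_h \frac{1}{m} \sum_i \sigma_i \ell(h,z_i) ]

and the right-hand side is twice the Rademacher complexity of H (with respect to the loss).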
This is interesting (we have an expectation within an expectation)
An expectation is an average. An average is a loop, so a nested expectation is a nested loop.
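A minimal sketch of that intuition in Python, as a Monte Carlo estimate of a nested expectation E_sigma[ E_z[ f(sigma, z) ] ] (the function f and the sampling distributions are made-up placeholders, not from the lecture):

    import random

    def f(sigma, z):
        # placeholder quantity whose nested expectation we estimate
        return sigma * z

    def nested_expectation(n_outer=1000, n_inner=1000):
        outer_total = 0.0
        for _ in range(n_outer):                  # outer loop = outer expectation over sigma
            sigma = random.choice([-1, 1])        # Rademacher sign
            inner_total = 0.0
            for _ in range(n_inner):              # inner loop = inner expectation over z
                z = random.random()               # assume z ~ Uniform(0, 1)
                inner_total += f(sigma, z)
            outer_total += inner_total / n_inner  # inner average
        return outer_total / n_outer              # outer average

    print(nested_expectation())  # close to 0, since E[sigma] = 0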
We may have infinitely many hypotheses, but we're talking about finite things.
In the finite case the supremum is the same as taking the maximum over the expectations.
The theorem that follows from this says that all 6 statements are equivalent.
-- AFTER BREAK
Conceptual
We started by discussing what big data means
With big data, everything becomes statistically significant, regardless of how small the difference is.
Sampling is the way to handle big data and still learn from it.
We used frequent itemset mining.
However, this algorithm makes multiple scans over the database; if we don't sample and the data
doesn't fit in RAM, this is very costly.
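A minimal sketch of where those repeated scans come from, assuming an Apriori-style level-wise algorithm in Python (the toy transactions and support threshold are made up):

    from itertools import combinations

    transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    min_support = 3  # absolute support threshold, assumed

    def frequent_itemsets(transactions, min_support):
        # Level 1 candidates: all single items
        items = {i for t in transactions for i in t}
        current = [frozenset([i]) for i in items]
        frequent = {}
        k = 1
        while current:
            # one full scan of the database per level k
            counts = {c: sum(1 for t in transactions if c <= t) for c in current}
            level = {c: n for c, n in counts.items() if n >= min_support}
            frequent.update(level)
            # candidate generation for level k+1 from frequent k-itemsets
            keys = list(level)
            current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1})
            k += 1
        return frequent

    print(frequent_itemsets(transactions, min_support))

Each level of candidates triggers one more pass over all transactions, which is what hurts when the database does not fit in RAM.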
We then asked ourselves how big our sample should be to learn from it.
By lowering the required frequency threshold on the sample, you can check the itemsets that come out as
frequent, but not the ones that were missed.
However, there can still be itemsets that are frequent in reality but do not show up as frequent
in the sample.
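A minimal sketch of the sample-and-lower-the-threshold idea in Python (the toy database, sample size, threshold theta and slack eps are made-up placeholders):

    import random
    from collections import Counter

    def frequent_singletons(transactions, theta):
        # frequency = fraction of transactions containing the item
        counts = Counter(i for t in transactions for i in t)
        n = len(transactions)
        return {i for i, c in counts.items() if c / n >= theta}

    # toy database, assumed
    database = [{"a", "b"}] * 60 + [{"a"}] * 25 + [{"b", "c"}] * 15

    theta = 0.30  # target frequency threshold on the full database
    eps = 0.05    # slack: lower the threshold on the sample to reduce misses
    sample = random.sample(database, 40)

    approx = frequent_singletons(sample, theta - eps)  # mined on the sample
    exact = frequent_singletons(database, theta)       # what full scans would give

    print("from sample:", approx)
    print("exact:      ", exact)
    # items in `exact` but not in `approx` are the false negatives that can still occur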
Then we noticed that we can do itemset mining using machine learning.
Classification is supervised learning.
We derived a lower bound on how big the sample should be.
We call our learning PAC (Probably Approximately Correct) and then investigated how we can do this PAC learning,
using hypothesis sets etc. to formalize what it means to learn.
!!!!! It is DOABLE to learn with finite hypothesis sets
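A hedged sketch of the standard sample-size bound for a finite hypothesis set H in the realizable case (\epsilon the accuracy, \delta the failure probability; the exact form used in the lecture may differ):

    m \ge \frac{1}{\epsilon} \left( \ln|H| + \ln\frac{1}{\delta} \right)

samples suffice for a consistent learner to be probably approximately correct; the size of H only enters through \ln|H|, which is why finite hypothesis sets are learnable.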
The no-free-lunch theorem tells us that if your hypothesis class can express too much, you cannot learn.
We were trying to find sample bounds for frequent itemset mining, and what the guarantees are; this will be
covered on Friday.
One question is: can we learn more with a larger class of hypothesis sets?
The answer is not really, which is kind of surprising.
Can we guarantee that we eventually reach an accuracy of alpha? The answer is no.
Another side note: this course teaches that there are requirements on the amount of
data needed in order to learn things.
If we have only a limited number of clients and a great number of nodes, then
we will not be able to learn.
This is the opposite of big data, but it shows the other side of what we learned.