Introductory Notes
2. Do Background Research
Rather than starting from scratch when putting together a plan for answering the question, be a savvy
scientist: use library and Internet research to find the best way to do things and to ensure that
mistakes from the past are not repeated.
3. Construct a Hypothesis
A hypothesis is an educated guess about how things work. It is an attempt to answer your question
with an explanation that can be tested. A good hypothesis allows you to make a prediction:
"If _____[I do this]_____, then _____[this]_____ will happen."
State both the hypothesis and the resulting prediction to be tested. Predictions must be easy to
measure.
Data Science
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and
systems to extract knowledge and insights from many kinds of structured and unstructured data. Data
science is related to data mining and big data.
Data science is a "concept to unify statistics, data analytics, machine learning and their related
methods" in order to "understand and analyse actual phenomena" with data. It employs techniques
and theories drawn from many fields within the context of mathematics, statistics, Computer
Science, and Information Science. Turing Award winner Jim Gray imagined data science as a
"fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and
asserted that "everything about science is changing because of the impact of information
technology" and the data deluge.
The term Data Science has emerged recently with the evolution of mathematical statistics and data
analysis. The journey has been amazing: we have already accomplished a great deal in the field of
Data Science.
In the next few years, we will be able to predict the future, or so researchers from MIT claim.
They have already reached a milestone with their research: their machine can now predict what will
happen in the next scene of a movie! How? Well, it might be a little complex to understand right
now, but don’t worry: by the end of this blog, you shall have an answer to that as well.
Coming back to Data Science: it is also known as data-driven science, and it makes use of
scientific methods, processes and systems to extract knowledge or insights from data in various
forms, whether structured or unstructured.
These methods and processes are what we are going to discuss in this Data Science Tutorial
today.
Moving forward, who does all this brainstorming, or who practises Data Science? A Data
Scientist.
Who is a Data Scientist?
As you can see in the image, a Data Scientist is the master of all trades! He should be proficient in
maths, he should be acing the business field, and he should have great Computer Science skills as
well. Scared? Don’t be. You do need to be good in all these fields, but even if you aren’t,
you’re not alone! There is no such thing as “a complete data scientist”. In a corporate
environment, the work is distributed among teams, where each team has its own expertise. The
point is, you should be proficient in at least one of these fields.
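With these skills, a Data Scientist answers five kinds of questions: Is this A or B? Is this weird?
How much or how many? How is this organised? What should I do next? Let’s address each of these
questions and the associated algorithms one by one: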
Is this A or B?
With this question, we are referring to problems that have a categorical answer, that is, problems
whose answer comes from a fixed set of options: yes or no, 1 or 0, or interested, maybe or not
interested.
For Example:
Q. What will you have, Tea or Coffee?
Here, you cannot say you would like a Coke! The question only offers tea or coffee, so you may
answer with one of these only.
When we have only two types of answers, e.g. yes or no, 1 or 0, it is called two-class (binary)
classification. With more than two options, it is called multi-class classification.
In short, whenever you come across a question whose answer is categorical, in Data Science you
will solve it using Classification Algorithms.
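To make this concrete, here is a minimal sketch of one classification algorithm, k-nearest neighbours, in plain Python. The training data, the two features (hour of day and hours slept) and the function name `knn_classify` are hypothetical, made up purely for illustration; a real project would typically use a library and far more data.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_classify(train, query, k=3):
    """Predict a categorical label for `query` by majority vote among
    the k training points closest to it."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (hour of day, hours slept last night) -> drink chosen.
train = [
    ((7, 5), "coffee"), ((8, 4), "coffee"), ((9, 6), "coffee"),
    ((16, 8), "tea"), ((17, 7), "tea"), ((20, 8), "tea"),
]

print(knn_classify(train, (8, 5)))   # coffee
print(knn_classify(train, (18, 8)))  # tea
```

The prediction can only ever be one of the labels seen in training, just as the tea-or-coffee question rules out a Coke. The same majority vote works unchanged with three or more labels, i.e. for multi-class classification.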
Is this weird?
Questions like these deal with patterns and can be solved using Anomaly Detection algorithms.
For Example:
Try relating the question “is this weird?” to this diagram.
What is weird in the above pattern? The red guy, isn’t it?
Whenever there is a break in the pattern, the algorithm flags that particular event for us to review.
A real-world application of this is used by credit card companies, wherein any unusual transaction
by a user is flagged for review. This improves security and reduces the human effort needed for
monitoring.
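One of the simplest anomaly detection techniques is a z-score check: flag any value that lies unusually far from what the user normally does. The sketch below assumes made-up transaction amounts, a hypothetical `flag_unusual` helper and a threshold of three standard deviations; production fraud systems are typically far more sophisticated.

```python
from statistics import mean, stdev


def flag_unusual(history, new_amounts, threshold=3.0):
    """Return the new amounts lying more than `threshold` standard
    deviations from the mean of past transactions (a z-score test)."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs >= 2 values
    if sigma == 0:
        # Every past amount was identical, so any different amount is unusual.
        return [x for x in new_amounts if x != mu]
    return [x for x in new_amounts if abs(x - mu) / sigma > threshold]


# Hypothetical past card transactions for one user, in dollars.
history = [12.5, 30.0, 22.0, 18.75, 25.0, 40.0, 15.0, 28.0]
print(flag_unusual(history, [20.0, 35.0, 950.0]))  # [950.0]
```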
How much or how many?
Those of you who don’t like maths, relax! Regression algorithms are here!
So, whenever a problem asks for a figure or a numerical value, we solve it using Regression
Algorithms.
For Example:
What will be the temperature for tomorrow?
Since we expect a numeric value in the response to this problem, we will solve it using Regression
Algorithms.
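As a minimal sketch, assuming the temperature changes roughly in a straight line over a few days, ordinary least squares linear regression fits a line to past readings and extends it one day ahead. The readings and the `fit_line` helper are hypothetical; real weather forecasting is far more involved.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the line passes through the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical readings: day number -> temperature in degrees Celsius.
days = [1, 2, 3, 4, 5]
temps = [20.0, 21.0, 22.5, 23.0, 24.5]

slope, intercept = fit_line(days, temps)
print(round(slope * 6 + intercept, 2))  # 25.5, the numeric forecast for day 6
```

The answer is a number on a continuous scale rather than one of a fixed set of labels, which is exactly what separates regression from classification.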
Moving along in this Data Science Tutorial, let’s discuss the next algorithm.
How is this organised?
Say you have some data, but you have no idea how to make sense of it. Hence the question: how is
this organised?
Well, you can solve it using clustering algorithms. How do they solve these problems? Let’s see:
Clustering algorithms group data points by the characteristics they have in common. For example, in
the above diagram, the dots are organised by color. Similarly, whatever the data, clustering
algorithms try to find what the data points have in common and hence “cluster” them together.
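k-means is one widely used clustering algorithm, shown here as an example rather than as the only option. The plain-Python sketch below uses made-up 2-D points: each point joins its nearest centre, each centre then moves to the middle of its group, and the two steps repeat. The starting centres are chosen by hand so the result is reproducible; in practice they are usually picked at random.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep a centroid where it is if no point was assigned to it.
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabelled data: nobody tells the algorithm which group is which.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(clusters)  # [[(1, 1), (1.5, 2), (2, 1.5)], [(8, 8), (8.5, 9), (9, 8.5)]]
```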
What should I do next?
Whenever you encounter a problem wherein your computer has to make a decision based on the
training that you have given it, it involves Reinforcement Learning algorithms.
For Example:
Your temperature control system, when it has to decide whether it should lower the temperature of
the room or raise it.
How do these algorithms work?
These algorithms are inspired by human psychology. We like being appreciated, right? Computers
running these algorithms likewise learn from “appreciation”, in the form of rewards, while being
trained. How? Let’s see.
Rather than teaching the computer what to do, you let it decide what to do, and at the end of that
action, you give it either positive or negative feedback. In other words, rather than defining what
is right and what is wrong in your system, you let your system “decide” what to do and then give it
feedback.
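The feedback loop described above can be sketched with the thermostat example. This is a deliberately simplified, one-step form of reinforcement learning (often called a contextual bandit); the reward function `feedback`, the two room states, and the learning-rate and exploration values are all hypothetical. Full reinforcement learning methods such as Q-learning also account for rewards that arrive later.

```python
import random

ACTIONS = ["lower", "raise"]


def feedback(state, action):
    """Hypothetical reward: +1 if the action moves the room toward a
    comfortable temperature, -1 otherwise."""
    good = (state == "too_hot" and action == "lower") or (
        state == "too_cold" and action == "raise")
    return 1.0 if good else -1.0


def train(episodes=500, epsilon=0.1, alpha=0.1, seed=0):
    rng = random.Random(seed)
    # q[state][action]: the agent's running estimate of the reward it will get.
    q = {s: {a: 0.0 for a in ACTIONS} for s in ("too_hot", "too_cold")}
    for _ in range(episodes):
        state = rng.choice(list(q))
        if rng.random() < epsilon:        # explore: try a random action
            action = rng.choice(ACTIONS)
        else:                             # exploit: pick the best-known action
            action = max(ACTIONS, key=lambda a: q[state][a])
        reward = feedback(state, action)  # positive or negative feedback
        # Nudge the estimate toward the reward just received.
        q[state][action] += alpha * (reward - q[state][action])
    return q


q = train()
for state, values in q.items():
    print(state, "->", max(values, key=values.get))
# too_hot -> lower
# too_cold -> raise
```

Nobody tells the agent that a hot room needs cooling; it works this out purely from the positive and negative feedback.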
Machine learning is a type of Artificial Intelligence that makes computers capable of learning on
their own, i.e. without being explicitly programmed. With machine learning, machines can update what
they have learnt (the parameters of their models) whenever they come across new data or a new situation.
To conclude this part of the Data Science Tutorial: we now know that Data Science relies on Machine
Learning and its algorithms for analysis. As for how and where we do the analysis, Data Science has
further components which help us address those questions.
Before that, let me answer how MIT can predict the future, because I think you might be able to
relate to it now. Researchers at MIT trained their model on movies, and the computers learnt how
humans act just before they perform an action.
For example, when you are about to shake hands with someone, you take your hand out of your pocket,
or maybe lean in towards the person. Basically, there is a “pre-action” attached to everything we do.
Using movies, the computer was trained on these “pre-actions”, and by observing more and more
movies, it became able to predict what a character’s next action could be.
Easy, ain’t it? Then let me throw one more question at you in this Data Science Tutorial! Which
Machine Learning algorithm must they have implemented for this?