Easy Explanation of Data Modelling in Python
In data science, modelling means formulating every step and gathering the techniques
required to reach a solution. The calculations cannot all be performed at once, so we lay out
the flow of calculations. That flow is the sequence of modelling steps that leads to the solution.
In practice, modelling means training a machine learning algorithm to predict the labels from the
features, tuning it for the business need, and validating it on holdout data. The output of modelling
is a trained model that can be used for inference, that is, for making predictions on new data points.
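The steps above can be sketched in a few lines. This is a minimal sketch, not a definitive workflow. It makes several assumptions: scikit-learn is installed, the bundled Iris dataset stands in for real data, and the model is a k-nearest neighbors classifier. "Tuning" is reduced to picking `n_neighbors` from a small hypothetical grid. A real project would tune against whatever metric the business need dictates.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Features X (flower measurements) and labels y (species encoded as 0, 1, 2)
X, y = load_iris(return_X_y=True)

# Keep 25% of the data aside as holdout data; it is not used for training or tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Train and tune: 5-fold cross-validation on the training set picks n_neighbors
# (the candidate values are an illustrative assumption)
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7]}, cv=5)
search.fit(X_train, y_train)
model = search.best_estimator_  # refitted on the whole training set with the best setting

# Validate on the holdout data
holdout_accuracy = model.score(X_test, y_test)
print(f"best n_neighbors: {search.best_params_['n_neighbors']}")
print(f"holdout accuracy: {holdout_accuracy:.2f}")

# Inference: predict the class of a hypothetical new measurement
new_point = [[5.0, 3.5, 1.4, 0.25]]
prediction = model.predict(new_point)
print(f"predicted class: {prediction[0]}")
```

The holdout split is made before tuning. This keeps the final accuracy an honest estimate of performance on data the model has never seen.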
Although machine learning algorithms may sound technically complex, implementing them in
Python is straightforward thanks to standard machine learning libraries such as scikit-learn.
Scikit-learn is a free, open-source machine learning library for Python. It provides algorithms such as
support vector machines, random forests and k-nearest neighbors, and it is built on the Python numerical
and scientific libraries NumPy and SciPy.
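Every scikit-learn estimator exposes the same `fit`, `predict` and `score` methods. Trying the three algorithms named above is therefore mostly a matter of swapping one object for another. This sketch assumes scikit-learn's bundled wine dataset and default hyperparameters. The `StandardScaler` step in front of the SVM and k-nearest neighbors models is a common choice for distance-based methods, not a library requirement. The data arrives as plain NumPy arrays, which is how scikit-learn plugs into NumPy.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X is a NumPy array of shape (178, 13); y holds the three wine classes
X, y = load_wine(return_X_y=True)
print(isinstance(X, np.ndarray), X.shape)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Three different algorithms behind one common interface
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Random forest": RandomForestClassifier(random_state=0),
    "k-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # identical training call for each model
    scores[name] = model.score(X_test, y_test)   # mean accuracy on the test split
    print(f"{name}: {scores[name]:.2f}")
```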
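[Figure: modelling diagram linking Data, Modelling and Solution, with the labels Technique, Flow and Calculations]
Common modelling techniques include: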
• Regression
• Classification
• Clustering
• Dimensionality Reduction
• Ensemble Methods
• Neural Nets and Deep Learning
• Transfer Learning
• Reinforcement Learning
• Natural Language Processing
Regression methods fall within the category of supervised ML. They help to predict or explain a
particular numerical value based on a set of prior data, for example predicting the price of a property
based on previous pricing data for similar properties. The simplest method is linear regression,
where we use the equation of a straight line (y = m * x + b) to model a data set. Here (x, y) are the
coordinates of a point, m is the slope and b is the y-intercept.
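The line equation can be fitted with scikit-learn's `LinearRegression`. The floor areas and prices below are made-up numbers, generated exactly from price = 3000 * area + 50000. The fitted slope m and intercept b should therefore come back as 3000 and 50000. With real, noisy pricing data they would only be estimates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: floor area in square metres (x) and sale price (y)
# scikit-learn expects a 2-D feature array, hence reshape(-1, 1)
area = np.array([50, 70, 90, 110, 130]).reshape(-1, 1)
price = 3000 * area.ravel() + 50000

model = LinearRegression().fit(area, price)
m, b = model.coef_[0], model.intercept_  # slope and y-intercept of the fitted line
print(f"slope m = {m:.1f}, intercept b = {b:.1f}")

# Use the fitted line y = m * x + b to predict the price of a 100 m^2 property
predicted = model.predict(np.array([[100]]))[0]
print(f"predicted price: {predicted:.0f}")
```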
Classification is another class of supervised ML: classification methods predict or explain a class
value. For example, they can help predict whether or not an online customer will buy a product.
The output is one of two classes: buyer or not buyer.
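Here is a minimal sketch of the buyer / not buyer example. It assumes scikit-learn's `LogisticRegression` as the classifier, which is just one of many that would work. The two features, minutes on site and items added to the cart, and all of their values are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sessions: [minutes on site, items added to cart]
X = np.array([[1, 0], [2, 0], [3, 1], [4, 0], [10, 2], [12, 3], [15, 2], [20, 4]])
# Known outcomes: 1 = bought, 0 = did not buy
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# Predict the class of two new sessions
new_sessions = np.array([[2, 0], [18, 3]])
labels = clf.predict(new_sessions)
print(["buyer" if label == 1 else "not buyer" for label in labels])

# Estimated probability of buying for each new session
print(clf.predict_proba(new_sessions)[:, 1])
```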
With clustering methods, we move into the category of unsupervised ML, because their goal is to
group or cluster observations that have similar characteristics. Clustering methods don’t use output
information for training; instead, they let the algorithm define the output. Because there are no true
labels to compare against, the quality of a clustering is usually judged by visual inspection or by internal metrics such as the silhouette score.
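The sketch below clusters unlabelled 2-D points with k-means, one common clustering method. It uses the silhouette score as a numeric check that can sit alongside any plot. The three cluster centres are made up for illustration. The true group labels returned by `make_blobs` are discarded, so the algorithm sees only the features.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical 2-D data drawn around three made-up centres; true labels are thrown away
X, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [5, 5], [0, 10]], cluster_std=0.8, random_state=0
)

# Try several cluster counts and score each result without using any labels
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}: silhouette={scores[k]:.2f}")

best_k = max(scores, key=scores.get)
print(f"best number of clusters by silhouette: {best_k}")
```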
Deep learning is a subcategory of machine learning. Like machine learning in general, deep learning
includes supervised, unsupervised and reinforcement learning. As discussed earlier, the idea
of AI was inspired by the human brain. To connect the dots: deep learning is built on artificial
neural networks, commonly known as ANNs, that have many layers, and ANNs were in turn inspired by
the biological neural networks of the human brain. Deep learning is one of the ways of carrying out
machine learning.
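As a small illustration of an artificial neural network, the sketch below uses scikit-learn's `MLPClassifier`. This is a feedforward network, here given two hidden layers of 16 units each. Choosing it is an assumption made to stay within scikit-learn. It is a shallow network, not a deep learning framework, and larger deep learning models are usually built with dedicated libraries. The two-moons dataset is synthetic. It was chosen because no straight line can separate its two classes.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic two-class data shaped like two interleaving half-moons
X, y = make_moons(n_samples=400, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# A feedforward network: 2 inputs -> 16 ReLU units -> 16 ReLU units -> 1 output unit
net = MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu", max_iter=2000, random_state=0)
net.fit(X_train, y_train)

test_accuracy = net.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")

# Weight matrices between consecutive layers, one per connection between layers
layer_shapes = [w.shape for w in net.coefs_]
print(layer_shapes)
```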