ML Unit1.2
Unit 1: Fundamentals of Machine Learning
Introduction to Machine Learning: What is Machine Learning?
Why Use Machine Learning? Types of Machine Learning Systems,
Main Challenges of Machine Learning, Applications of Machine
Learning. Why Python? Scikit-learn, Essential Libraries & Tools.
Machine Learning:
Machine learning (ML) is a subdomain of artificial intelligence (AI)
that focuses on developing systems that learn, or improve
performance, based on the data they ingest.
It focuses on the development of algorithms and statistical models
that enable computers to perform tasks without being explicitly
programmed for each task. Instead of relying on explicit instructions,
machine learning algorithms use patterns and inference to learn from
data and make predictions or decisions.
How does Machine Learning work?
A machine learning system learns from previous data, builds a prediction
model, and predicts the output whenever it receives new data. The
accuracy of the predicted output depends on the amount of data: more
data generally helps to build a better model that predicts the output
more accurately.
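The learn-then-predict cycle described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn is installed; the toy numbers (which follow y = 2x + 1) and the choice of LinearRegression are assumptions made for the example, not part of the original notes.

```python
from sklearn.linear_model import LinearRegression

# Previous (historical) data: inputs and their known outputs.
# The values are made up for illustration and follow y = 2x + 1.
X_past = [[1], [2], [3], [4]]
y_past = [3, 5, 7, 9]

# Build a prediction model by learning from the previous data.
model = LinearRegression()
model.fit(X_past, y_past)

# Predict the output for new data the model has not seen before.
prediction = model.predict([[5]])
print(round(float(prediction[0]), 2))
```

With more (and more representative) rows in X_past and y_past, the fitted model would usually generalize better to new inputs.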
• Unsupervised Learning
Unsupervised learning is a learning method in which a machine
learns without any supervision.
The machine is trained on a set of data that has not been labeled,
classified, or categorized, and the algorithm needs to act on that data
without any supervision. The goal of unsupervised learning is to
restructure the input data into new features or groups of objects with
similar patterns.
Unsupervised learning is classified into two categories of algorithms:
• Clustering
• Association
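As a rough illustration of the clustering category, the sketch below groups unlabeled points with k-means. It assumes scikit-learn is installed; the six points and the choice of two clusters are made up for the example. Association rule mining is not shown here.

```python
from sklearn.cluster import KMeans

# Unlabeled data: no target column, only the feature values.
# Three points sit near (1, 1) and three near (8, 8).
points = [
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
    [8.0, 8.2], [8.1, 7.9], [7.8, 8.0],
]

# Ask k-means to find two groups of similar points.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

# Each point gets a cluster id; the ids themselves (0 or 1) are arbitrary.
print(labels)
```

The algorithm is never told which group a point belongs to; it infers the grouping from the distances between points.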
• Reinforcement Learning
• Image Recognition:
• Speech Recognition
• Traffic prediction:
• Real-time location of the vehicle from the Google Maps app and sensors
• Average time taken on past days at the same time of day
• Product recommendations:
• Self-driving cars:
• Virtual personal assistants:
These assistants can help us in various ways just by our voice
instructions, such as playing music, calling someone, opening an email,
or scheduling an appointment.
• Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis.
With this, medical technology is advancing very fast and is able to build
3D models that can predict the exact position of lesions in the brain.
• Gathering Data:
Data gathering is the first step of the machine learning life cycle. The
goal of this step is to identify and obtain all the data related to the
problem.
In this step, we need to identify the different data sources, as data can
be collected from various sources such as files, databases, the internet,
or mobile devices. It is one of the most important steps of the life cycle.
The quantity and quality of the collected data determine the
efficiency of the output. In general, the more data we have, the more
accurate the prediction will be.
Example code
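A minimal sketch of gathering data from two kinds of sources, a file and a database, using only the Python standard library. The column names (age, income), the table name (customers), and all values are hypothetical; an in-memory string and an in-memory SQLite database stand in for a real file and a real database.

```python
import csv
import io
import sqlite3

# Source 1: a CSV file (an in-memory string stands in for a real file).
csv_text = "age,income\n25,30000\n40,52000\n"
rows_from_file = [
    {"age": int(row["age"]), "income": int(row["income"])}
    for row in csv.DictReader(io.StringIO(csv_text))
]

# Source 2: a database (an in-memory SQLite database stands in for a real one).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age INTEGER, income INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(31, 41000), (58, 67000)],
)
rows_from_db = [
    {"age": age, "income": income}
    for age, income in conn.execute("SELECT age, income FROM customers")
]
conn.close()

# Combine the records from both sources into one dataset.
dataset = rows_from_file + rows_from_db
print(len(dataset), "records gathered")
```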
• Data preparation
After collecting the data, we need to prepare it for further steps. Data
preparation is the step where we put our data into a suitable place and
prepare it for use in machine learning training.
In this step, we first put all the data together and then randomize the
ordering of the data.
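The two actions just described, putting the data together and randomizing its order, can be sketched with the standard library. The sample names and labels are made up, and the fixed seed is only an assumption to make the shuffle repeatable.

```python
import random

# Two batches of (sample, label) pairs collected separately (made-up values).
batch_a = [("sample_1", 0), ("sample_2", 1)]
batch_b = [("sample_3", 0), ("sample_4", 1), ("sample_5", 1)]

# Step 1: put all the data together.
data = batch_a + batch_b

# Step 2: randomize the ordering so the original collection order
# cannot influence training. A seeded generator keeps it repeatable.
rng = random.Random(42)
rng.shuffle(data)

print(data)
```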
• Data exploration:
Data exploration is used to understand the nature of the data that we
have to work with. We need to understand the characteristics, format,
and quality of the data. A better understanding of the data leads to a
more effective outcome. In this step, we look for correlations, general
trends, and outliers.
Example code
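A minimal sketch of data exploration, assuming pandas is installed. The columns (hours_studied, exam_score) and their values are hypothetical, and the 1.5 x IQR rule used to flag outliers is one common convention chosen for this example, not something the notes prescribe.

```python
import pandas as pd

# Made-up data; the last exam score is a deliberate outlier.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 55, 61, 64, 70, 20],
})

print(df.describe())    # characteristics: count, mean, spread, quartiles
print(df.dtypes)        # format: the type of each column
print(df.isna().sum())  # quality: number of missing values per column

# Correlation between the two columns.
corr = df["hours_studied"].corr(df["exam_score"])
print("correlation:", corr)

# Outliers: values more than 1.5 * IQR outside the quartiles.
q1, q3 = df["exam_score"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["exam_score"] < q1 - 1.5 * iqr) | (df["exam_score"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
print(outliers)
```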
• Data Wrangling
Data wrangling is the process of cleaning and converting raw data into
a usable format. It involves cleaning the data, selecting the variables
to use, and transforming the data into a proper format to make it more
suitable for analysis in the next step. It is one of the most important
steps of the complete process. Cleaning the data is required to address
quality issues such as:
• Missing Values
• Duplicate data
• Invalid data
• Noise
Example code:
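A minimal sketch of cleaning for three of the issues listed above (duplicate data, invalid data, and missing values), assuming pandas is installed. The columns and values are hypothetical, and both the rule that a negative age is invalid and the use of the median to fill gaps are assumptions for the example. Noise handling is not shown.

```python
import pandas as pd

# Made-up raw data with a duplicate row, a missing age, and a negative age.
raw = pd.DataFrame({
    "age": [25, 25, None, 40, -3, 31],
    "income": [30000, 30000, 45000, 52000, 48000, 41000],
})

# Duplicate data: drop rows that repeat an earlier row exactly.
clean = raw.drop_duplicates()

# Invalid data: drop negative ages, but keep missing ones for the next step.
clean = clean[clean["age"].isna() | (clean["age"] >= 0)]

# Missing values: fill the remaining gaps with the median age.
clean = clean.fillna({"age": clean["age"].median()})

clean = clean.reset_index(drop=True)
print(clean)
```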
• Data Analysis
Now the cleaned and prepared data is passed on to the analysis step.
This step involves:
Example code:
• Train Model
The next step is to train the model. In this step, we train our model
to improve its performance and get a better outcome for the problem.
We use datasets to train the model using various machine learning
algorithms. Training a model is required so that it can understand the
various patterns, rules, and features.
Example code:
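A minimal sketch of training, assuming scikit-learn is installed. The Iris dataset bundled with scikit-learn, the decision tree, and the 75/25 split are illustrative choices, not the ones used in the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small labeled dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold back part of the data so the model can be tested on it later.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train the model: fit() learns patterns from the training portion only.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

print("training accuracy:", model.score(X_train, y_train))
```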
• Test Model
Once our machine learning model has been trained on a given dataset,
we test the model. In this step, we check the accuracy of our model by
providing a test dataset to it.
Example code:
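A minimal sketch of testing, assuming scikit-learn is installed. The dataset, the k-nearest neighbors classifier, and the split ratio are illustrative assumptions; the key point is that accuracy is measured on data the model did not see during training.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Keep 25% of the rows aside as the test dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train on the training portion only.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Test: predict on unseen rows and compare with the true labels.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```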
• Deployment
The last step of the machine learning life cycle is deployment, where
we deploy the model in a real-world system.
Example Code:
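One simple way to picture deployment is to save the trained model and load it again inside the application that serves predictions. The sketch below assumes scikit-learn is installed; the file name model.pkl and the helper predict_species are hypothetical, and real deployments usually add an API layer, monitoring, and versioning on top of this.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Train a model (stands in for the model produced by the earlier steps).
X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    # Save the trained model to disk (hypothetical file name).
    model_path = Path(tmp) / "model.pkl"
    model_path.write_bytes(pickle.dumps(model))

    # In the real-world system: load the saved model back.
    deployed_model = pickle.loads(model_path.read_bytes())


def predict_species(measurements):
    """Return the predicted class id for one flower's four measurements."""
    return int(deployed_model.predict([measurements])[0])


print(predict_species([5.1, 3.5, 1.4, 0.2]))
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.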
• Regression
• Classification
• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machines
Unsupervised Machine Learning
Types of Unsupervised Learning Algorithms:
• K-means clustering
• KNN (k-nearest neighbors), although in its standard form KNN is a supervised method
• Hierarchical clustering
• Anomaly detection
• Neural Networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition
What is Python?
Python Libraries
→ NumPy
→ Pandas
• Pandas (Python data analysis) is a must in the Data Science Life Cycle
→ Matplotlib
• Matplotlib is one of the most popular Python packages used for
Data Visualization
→ Seaborn
→ Scikit-learn
→ TensorFlow
→ Keras
→ NLTK
→ BeautifulSoup
3. Computational Complexity
• Resource Intensity: Training complex models, especially deep learning
models, requires significant computational resources. This includes powerful
hardware (e.g., GPUs) and time, which can be expensive.
• Scalability: Handling large-scale data and real-time processing can be
difficult, necessitating distributed computing frameworks and efficient
algorithms.
8. Cost
• Development and Maintenance Costs: Developing ML solutions can be
expensive due to the need for specialized hardware, software, and skilled
personnel. Ongoing maintenance and updates also add to the cost.
Measures to reduce the above challenges & Issues
• Data Augmentation: For image data, techniques like rotation, scaling, and
flipping can generate more training samples. For text data, synonym
replacement and back-translation are useful.
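For image data, flipping and rotation can be sketched with NumPy alone. The tiny 2 x 3 array below stands in for a real image and is an assumption for illustration; real pipelines typically use a dedicated image library and random transformation parameters.

```python
import numpy as np

# A tiny 2 x 3 "image" (made-up pixel values).
image = np.arange(6).reshape(2, 3)

# Flipping: mirror the image left to right.
flipped = np.fliplr(image)

# Rotation: rotate the image by 90 degrees counterclockwise.
rotated = np.rot90(image)

# One original sample has become three training samples.
augmented = [image, flipped, rotated]
print(len(augmented), "samples")
```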
3. Computational Complexity
• Dimensionality Reduction: Techniques like Principal Component Analysis
(PCA) or t-SNE can reduce the computational load by decreasing the
number of features.
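A minimal sketch of dimensionality reduction with PCA, assuming scikit-learn is installed. The Iris dataset and the choice of two components are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Four features per sample.
X, _ = load_iris(return_X_y=True)

# Project the data onto its two main directions of variation.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)

# Fraction of the original variance kept by the two components.
explained = pca.explained_variance_ratio_.sum()
print("variance kept:", explained)
```

Later steps then work with half as many features, while most of the variation in the data is preserved.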
9. Cost
• Cost-Effective Tools: Leverage open-source tools and cloud-based ML
services to reduce costs.