ML 01 (Shubham)

This document serves as a practical file for a Machine Learning course, detailing the fundamentals of machine learning, including its types (supervised, unsupervised, reinforcement), applications, and regression techniques. It explains regression as a supervised learning method for predicting continuous values, outlines various regression types, and provides a Python script for data handling and analysis. The document also highlights the importance of regression in various domains such as finance, healthcare, and marketing.

Uploaded by

pranav1256kam

Practical File

Machine Learning
B.E. (IT) 6th Semester

Submitted To: Dr. Mandeep Kaur

Submitted by: Shubham Kumar
UE228093
IT Section-2

Department of Information Technology


University Institute of Engineering and Technology

Panjab University Chandigarh


Practical Number - 01
Introduction to Machine Learning
Machine Learning (ML) is a branch of artificial intelligence (AI) that enables
computers to learn from data and make decisions or predictions without being
explicitly programmed. Instead of following a set of predefined rules, machine
learning models identify patterns and improve their performance over time based on
experience.

Types of Machine Learning


1. Supervised Learning – The model is trained on labeled data, meaning it learns
from input-output pairs. Examples include:

· Classification: Predicting categories (e.g., spam detection in emails).
· Regression: Predicting continuous values (e.g., stock price prediction).

2. Unsupervised Learning – The model is given data without labeled outputs and
must find hidden patterns. Examples include:

· Clustering: Grouping similar data points (e.g., customer segmentation).
· Dimensionality Reduction: Simplifying data while preserving important
features (e.g., principal component analysis).

3. Reinforcement Learning – The model learns by interacting with an environment
and receiving rewards or penalties for actions. It is widely used in robotics and
gaming.

Applications of Machine Learning

· Healthcare: Disease prediction, medical image analysis.


· Finance: Fraud detection, algorithmic trading.
· E-commerce: Personalized recommendations.
· Autonomous Vehicles: Self-driving cars use ML for object recognition.
· Natural Language Processing (NLP): Chatbots, language translation.
What is Regression?
Regression is a type of supervised learning used to predict continuous numerical
values based on input data. It identifies relationships between independent variables
(features) and a dependent variable (target) to make predictions.

In machine learning regression, independent and dependent variables are key
concepts:

· Independent Variable (Predictor, Feature, Input Variable):
These are the variables that you use to predict the outcome. They are the inputs
to the model.
Example: In predicting house prices, features like square footage, number of
bedrooms, and location are independent variables.
· Dependent Variable (Response, Target, Output Variable):
This is the variable that you are trying to predict. It depends on the
independent variables.
Example: The house price in a real estate prediction model is the dependent
variable.

Example: Simple Linear Regression


If you want to predict salary based on years of experience:

· Independent variable (X) = Years of experience


· Dependent variable (Y) = Salary

Mathematically, the regression equation is:

Y=mX+b

where:

· Y is the dependent variable (salary),


· X is the independent variable (years of experience),
· m is the coefficient (slope),
· b is the intercept.
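
As a minimal sketch of how m and b can be obtained, the snippet below fits Y = mX + b by ordinary least squares in plain Python. The data values and the helper name fit_line are hypothetical, chosen only for illustration; they are not taken from the practical.

```python
# Hypothetical data: years of experience (X) and salary in thousands (Y).
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [35.0, 40.0, 45.0, 50.0, 55.0]


def fit_line(xs, ys):
    """Return (m, b) that minimize the squared error of Y = mX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(X, Y) / variance(X)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x
    # The fitted line passes through the point (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b


m, b = fit_line(years, salary)
predicted = m * 6.0 + b  # predicted salary for 6 years of experience
print(f"m = {m}, b = {b}, prediction for X = 6: {predicted}")
```

With this made-up data the points lie exactly on a line, so the fit recovers a slope of 5 and an intercept of 30.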
How to Use Regression?
1. Identify the Problem

Determine if your problem involves predicting a continuous variable (e.g., price,
temperature, salary).

Choose regression when the goal is to find relationships between variables.

2. Collect and Prepare Data

Gather relevant data with independent and dependent variables.

Clean the data by handling missing values, removing outliers, and normalizing if
necessary.

Split data into training and testing sets (e.g., 80% train, 20% test).

3. Choose the Right Regression Model

Linear Regression: When the relationship between variables is linear.

Polynomial Regression: When the relationship is nonlinear.

Multiple Linear Regression: When multiple independent variables influence the
outcome.

Logistic Regression: For classification problems (e.g., spam or not spam).

Ridge/Lasso Regression: For regularization to avoid overfitting.

Decision Tree/Random Forest Regression: For complex, nonlinear relationships.

4. Train the Model

5. Evaluate the Model

6. Make Predictions
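
The steps above can be sketched end to end with scikit-learn. This is only an illustrative outline: the synthetic data (y = 3x + 4 plus noise), the choice of LinearRegression, and the metrics (mean squared error and R²) are assumptions, since the practical does not specify them for steps 4 to 6.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Step 2: synthetic data (an assumption for illustration): y = 3x + 4 + noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 4.0 + rng.normal(0, 0.5, size=100)

# Step 2 (continued): split into 80% training and 20% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Steps 3 and 4: choose a linear model and train it on the training set.
model = LinearRegression()
model.fit(X_train, y_train)

# Step 5: evaluate on data the model has not seen.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Step 6: predict the target for a new input (x = 5).
new_prediction = model.predict(np.array([[5.0]]))[0]
print(f"MSE = {mse:.3f}, R^2 = {r2:.3f}, prediction at x = 5: {new_prediction:.2f}")
```

Evaluating on the held-out 20% rather than on the training data gives a fairer estimate of how the model behaves on new inputs.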
Where Do We Use Regression?
Regression is widely used in various domains where predicting a continuous
numerical value is required. Here are some key applications:

1. Finance & Economics 💰


· Stock Price Prediction: Predict future stock prices based on historical data.
· Risk Assessment: Estimate financial risks for loans and investments.
· House Price Prediction: Estimate real estate prices based on location, size,
and amenities.

2. Healthcare & Medicine 🏥


· Disease Progression Prediction: Estimate how a disease (e.g., diabetes)
progresses over time.
· Medical Cost Estimation: Predict hospital bills based on patient data.
· BMI Estimation: Estimate BMI from lifestyle habits and other indirect factors (BMI itself is computed directly from weight and height).

3. Marketing & Sales 📊


· Sales Forecasting: Predict future sales based on past data and market trends.
· Customer Lifetime Value (CLV): Estimate how much revenue a customer
will generate over time.
· Advertising Effectiveness: Predict revenue impact from different ad
campaigns.

4. Manufacturing & Supply Chain 🏭


· Demand Forecasting: Predict future product demand to optimize inventory.
· Quality Control: Predict product defects based on production factors.
· Energy Consumption Prediction: Estimate power usage based on
operational conditions.

5. Environmental Science & Weather 🌍


· Weather Forecasting: Predict temperature, humidity, and rainfall.
· Pollution Level Prediction: Estimate air quality based on emissions and
climate factors.
· Natural Disaster Prediction: Forecast floods, earthquakes, and hurricanes.
6. Sports Analytics ⚽
· Performance Prediction: Estimate player performance based on training
data.
· Injury Risk Analysis: Predict probability of injuries based on physical
metrics.
· Game Score Prediction: Estimate the outcome of sports matches.

7. Education 🎓
· Student Performance Prediction: Predict grades based on study hours and
attendance.
· Dropout Rate Analysis: Estimate the likelihood of students dropping out.
· Tuition Fee Estimation: Predict costs based on various factors.
Types of Regression
1. Linear Regression 📈
✅ Use Case: Predict continuous values (e.g., house prices, salary).
✅ Equation:

Y=mX+b

✅ Example: Predicting salary based on years of experience.

🔹 Simple Linear Regression → One independent variable.


🔹 Multiple Linear Regression → Multiple independent variables.

2. Polynomial Regression 🔄
✅ Use Case: When data has a non-linear relationship but is still continuous.
✅ Equation:

Y = aX^2 + bX + c

✅ Example: Predicting population growth or temperature variations.

🔹 Used when the relationship is curved, not straight.
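
A small sketch of fitting the quadratic form above, assuming NumPy's polyfit as the fitting routine (the practical does not say which tool it uses). The data is generated from a hypothetical curve with a = 2, b = 3, c = 1 and no noise.

```python
import numpy as np

# Hypothetical data generated from Y = 2X^2 + 3X + 1 (no noise).
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = 2.0 * X**2 + 3.0 * X + 1.0

# np.polyfit returns coefficients from the highest degree down: [a, b, c].
a, b, c = np.polyfit(X, Y, deg=2)

# Evaluate the fitted curve at a new point, X = 5.
y_at_5 = np.polyval([a, b, c], 5.0)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}, Y(5) = {y_at_5:.3f}")
```

Polynomial regression is still linear in the coefficients a, b, and c, which is why a least-squares routine can fit it directly.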


3. Logistic Regression (for Classification) 🚦
✅ Use Case: Binary or multi-class classification problems (e.g., spam detection,
disease prediction).
✅ Equation (Sigmoid Function):

P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X)}}

✅ Example: Predicting whether an email is spam (Yes/No).

🔹 Types:

· Binary Logistic Regression → Two classes (e.g., pass/fail).


· Multinomial Logistic Regression → More than two classes (e.g., predicting
type of weather: sunny, rainy, snowy).
· Ordinal Logistic Regression → Ordered categories (e.g., rating: low,
medium, high).
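
To make the sigmoid equation concrete for the binary case, here is a minimal sketch in plain Python. The coefficients b0 and b1 and the 0.5 decision threshold are hypothetical values chosen for illustration; in practice they would be learned from data.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, b0, b1):
    """P(Y = 1) for a single feature x, using the logistic model."""
    return sigmoid(b0 + b1 * x)


# Hypothetical coefficients, not fitted to any real data.
b0, b1 = -4.0, 2.0

p_boundary = predict_probability(2.0, b0, b1)  # b0 + b1*x = 0, so p = 0.5
p_positive = predict_probability(3.0, b0, b1)  # b0 + b1*x = 2, so p > 0.5

# Turn the probability into a class label with a 0.5 threshold.
label = 1 if p_positive >= 0.5 else 0
print(p_boundary, p_positive, label)
```

The model outputs a probability, and the classification comes from thresholding that probability.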
4. Ridge Regression (L2 Regularization) 🏋️
✅ Use Case: Reduces overfitting by adding a penalty on large coefficients.
✅ Equation (Loss Function with Regularization Term):

\sum (Y - \hat{Y})^2 + \lambda \sum \beta^2

✅ Example: Used in high-dimensional data (e.g., financial modeling, genetics).

🔹 Helps when multicollinearity (correlation between independent variables) is
present.
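
The ridge loss can be computed by hand to see how λ changes it. The sketch below is plain Python; the function name ridge_loss and all numeric values are hypothetical.

```python
def ridge_loss(y_true, y_pred, coefficients, lam):
    """Sum of squared errors plus lam times the sum of squared coefficients."""
    squared_error = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    l2_penalty = lam * sum(beta ** 2 for beta in coefficients)
    return squared_error + l2_penalty


# Hypothetical values chosen only for illustration.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.0, 8.0]
coefficients = [2.0, -0.5]

loss_no_penalty = ridge_loss(y_true, y_pred, coefficients, lam=0.0)
loss_with_penalty = ridge_loss(y_true, y_pred, coefficients, lam=0.1)
print(loss_no_penalty, loss_with_penalty)
```

With λ = 0 the loss is the plain sum of squared errors; increasing λ makes large coefficients more costly, which pushes the fit toward smaller weights.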

5. Lasso Regression (L1 Regularization) ✂️


✅ Use Case: Feature selection by shrinking the coefficients of less important variables to exactly zero.
✅ Equation (Loss Function with Regularization Term):

\sum (Y - \hat{Y})^2 + \lambda \sum |\beta|

✅ Example: Selecting the most relevant factors affecting house prices.

🔹 Helps in feature selection by shrinking irrelevant coefficients to zero.
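
The shrink-to-zero behaviour can be seen in a small experiment with scikit-learn's Lasso. This is a sketch under assumptions: the synthetic data, the second (irrelevant) feature, and alpha = 0.5 are all made up for illustration. Note also that scikit-learn scales the squared-error term by 1/(2n), so its alpha is not numerically identical to the λ in the formula above.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption): the target depends only on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, size=200)

# alpha plays the role of the regularization strength.
model = Lasso(alpha=0.5)
model.fit(X, y)

relevant_coef, irrelevant_coef = model.coef_
print(f"relevant: {relevant_coef:.3f}, irrelevant: {irrelevant_coef:.3f}")
```

The coefficient of the irrelevant feature is driven to exactly zero, while the relevant one is shrunk but kept, which is what makes Lasso usable for feature selection.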


The Python script for this practical covers the following steps:

1. Read a CSV file
A CSV file can be read in Python using the pandas library:

· pd.read_csv("file.csv") loads the CSV file into a DataFrame.
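
A minimal sketch of this step. Because the practical's actual file is not available here, an in-memory string stands in for "file.csv"; the column names and values are hypothetical. pd.read_csv accepts a file path or a file-like object, so the same call works for a real file.

```python
import io

import pandas as pd

# Stand-in for "file.csv" (hypothetical contents, for illustration only).
csv_text = """experience,salary
1,35000
2,40000
3,45000
"""

# With a real file this would be: df = pd.read_csv("file.csv")
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```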


2. Perform descriptive exploration (head, summary statistics)

· df.head() shows the first 5 rows of the dataset.
· df.describe() provides summary statistics of a DataFrame's numerical columns.
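
For illustration, both calls on a small hypothetical DataFrame (the column names and values are assumptions, not the practical's dataset):

```python
import pandas as pd

# Hypothetical dataset with six rows.
df = pd.DataFrame(
    {
        "experience": [1, 2, 3, 4, 5, 6],
        "salary": [35, 40, 45, 50, 55, 60],
    }
)

first_rows = df.head()   # first 5 rows by default
summary = df.describe()  # count, mean, std, min, quartiles, max per numeric column

print(first_rows)
print(summary)
```

The result of describe() is itself a DataFrame, so individual statistics can be looked up with, for example, summary.loc["mean", "experience"].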
3. Plot feature distributions (histograms, scatter plots)

[Figure: Histogram Graph]
[Figure: Scatter Plot]
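
Plots like these could be produced with matplotlib, as in the sketch below. The plotting library, the data, the column names, and the bin count are all assumptions; the practical's own plotting code is not shown. The "Agg" backend is selected only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to view plots on screen
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical dataset.
df = pd.DataFrame(
    {
        "experience": [1, 2, 3, 4, 5, 6, 7, 8],
        "salary": [35, 41, 44, 52, 55, 61, 64, 72],
    }
)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: distribution of a single feature.
ax_hist.hist(df["salary"], bins=5)
ax_hist.set_title("Histogram of salary")
ax_hist.set_xlabel("salary")
ax_hist.set_ylabel("count")

# Scatter plot: relationship between two features.
ax_scatter.scatter(df["experience"], df["salary"])
ax_scatter.set_title("Experience vs. salary")
ax_scatter.set_xlabel("experience")
ax_scatter.set_ylabel("salary")

fig.tight_layout()
# To view or save the figure, call plt.show() or fig.savefig(<path>).
```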

4. Check linear relationship between two features
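
One common way to check for a linear relationship is the Pearson correlation coefficient; whether the practical used this or only a visual check is not stated, so treat the sketch below as an assumption. The data is hypothetical.

```python
import pandas as pd

# Hypothetical dataset in which salary rises linearly with experience.
df = pd.DataFrame(
    {
        "experience": [1, 2, 3, 4, 5],
        "salary": [35, 40, 45, 50, 55],
    }
)

# Series.corr uses the Pearson method by default.
# Values near +1 or -1 suggest a strong linear relationship; near 0, a weak one.
r = df["experience"].corr(df["salary"])
print(f"Pearson correlation: {r:.3f}")
```

A scatter plot of the two features is a useful companion check, since a correlation coefficient alone can hide curved relationships.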


5. Split the dataset into 70% train and 30% test

train_test_split from sklearn.model_selection is used to split a dataset into
training and testing subsets.

· train_test_split() randomly divides New_Data into:
· 70% training data (train_data)
· 30% test data (test_data)
· test_size=0.3 → 30% of the data is used for testing.
· random_state=42 ensures that the split is reproducible (same split every time
you run it).
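
A minimal sketch of the split, using the names New_Data, train_data, and test_data from the text. The contents of New_Data are hypothetical. train_test_split returns the training subset first and the test subset second.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for New_Data with ten rows.
New_Data = pd.DataFrame(
    {
        "experience": list(range(1, 11)),
        "salary": [35, 40, 45, 50, 55, 60, 65, 70, 75, 80],
    }
)

# test_size=0.3 keeps 30% of the rows for testing; the remaining 70% are for training.
train_data, test_data = train_test_split(New_Data, test_size=0.3, random_state=42)

print(len(train_data), "training rows,", len(test_data), "test rows")
```

Swapping the order of the two names on the left-hand side would silently swap the subsets, so it is worth checking the sizes after splitting.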
