CSC407 - Chapter 1
7: Unsupervised Learning
• Clustering techniques: K-Means, Hierarchical Clustering.
• Dimensionality reduction: PCA.
Practical:
• Apply K-Means clustering to a dataset (e.g., customer segmentation).
• Visualize clusters and reduce dimensions with PCA.
8: Feature Engineering
• Importance of feature selection and extraction.
• Techniques: One-hot encoding, label encoding, scaling, and transformation.
• Feature importance and correlation.
Practical:
• Perform feature engineering on a raw dataset.
• Use Scikit-learn to select important features.
Course Outline
9: Neural Networks and Deep Learning
• Introduction to neural networks.
• Key components: layers, activation functions, and loss functions.
• Overview of TensorFlow and Keras.
Practical:
• Build a simple neural network for image classification (e.g., MNIST dataset).
• Train and evaluate the model using Keras.
• Association
• Association rule learning is a technique for discovering relationships between items in a
dataset. It identifies rules stating that the presence of one item implies the presence of
another item with a specific probability.
• Examples: Apriori Algorithm, Eclat, FP-growth Algorithm
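To make "implies with a specific probability" concrete, here is a minimal sketch that computes support and confidence for one rule. The transactions and the helper names `support` and `confidence` are made up for illustration. This is not the Apriori algorithm itself, only the two measures that such algorithms rely on.

```python
# Toy market-basket data (hypothetical, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Rule under test: bread -> milk
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"support(bread, milk) = {rule_support:.2f}")
print(f"confidence(bread -> milk) = {rule_confidence:.2f}")
```

A rule is usually kept only if both values exceed thresholds chosen by the analyst.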
1: Introduction to Data Science and
Machine Learning
• Unsupervised Machine Learning
• Advantages of Unsupervised Machine Learning
• It helps to discover hidden patterns and relationships in the data.
• Used for tasks such as customer segmentation, anomaly detection, and data exploration.
• It does not require labeled data and reduces the effort of data labeling.
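As a rough illustration of finding structure without labels, the sketch below clusters the Iris measurements with K-Means and projects them to two dimensions with PCA. The choice of 3 clusters, the scaling step, and the variable names are assumptions made for this example, not part of the course material.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load only the features; the species labels are deliberately ignored.
X = load_iris()["data"]

# Scale features so each contributes comparably to the distance computation.
X_scaled = StandardScaler().fit_transform(X)

# Group the samples into 3 clusters (k=3 is an assumption for this sketch).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_scaled)

# Project to 2 dimensions with PCA so the clusters can be plotted.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)
print(sorted(set(cluster_labels.tolist())))
```

To view the result, the two columns of `X_2d` can be passed to `plt.scatter` with `c=cluster_labels`.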
• Example (Reinforcement Learning): Consider training an AI agent to play a game like chess.
The agent explores different moves and receives positive or negative feedback
based on the outcome. Reinforcement learning is also applied in settings where
agents learn to perform tasks by interacting with their surroundings.
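The explore-and-receive-feedback loop can be sketched with a very small epsilon-greedy agent. The two-move "game", its win probabilities, and all names below are hypothetical stand-ins that are far simpler than chess. The sketch only shows how feedback shifts the agent toward the better move.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical environment: two possible moves with different win chances.
WIN_PROBABILITY = {"move_a": 0.2, "move_b": 0.8}

def play(move):
    """Return +1 (positive feedback) on a win, -1 (negative feedback) on a loss."""
    return 1 if random.random() < WIN_PROBABILITY[move] else -1

value = {move: 0.0 for move in WIN_PROBABILITY}  # estimated average feedback per move
count = {move: 0 for move in WIN_PROBABILITY}    # number of times each move was tried
EPSILON = 0.1  # fraction of turns spent exploring at random

for _ in range(2000):
    if random.random() < EPSILON:
        move = random.choice(list(value))   # explore: try any move
    else:
        move = max(value, key=value.get)    # exploit: use the best estimate so far
    reward = play(move)
    count[move] += 1
    # Incremental running average of the feedback received for this move.
    value[move] += (reward - value[move]) / count[move]

best_move = max(value, key=value.get)
print(best_move, value, count)
```

The agent is never told which move is better. It infers this from the rewards alone.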
• Reinforcement Machine Learning
• Types of Reinforcement Machine Learning
• Positive reinforcement
• Rewards the agent for taking a desired action.
• Encourages the agent to repeat the behavior.
• Examples: Giving a treat to a dog for sitting, providing a point in a game for a correct
answer.
• Negative reinforcement
• Removes an undesirable stimulus to encourage a desired behavior.
• Encourages the agent to repeat the behavior that removed the stimulus.
• Examples: Turning off a loud buzzer when a lever is pressed, avoiding a penalty by
completing a task.
• Advantages of Reinforcement Machine Learning
• It supports autonomous decision-making and is well suited to tasks that require learning a
sequence of decisions, such as robotics and game-playing.
• It is preferred for achieving long-term results that are otherwise very difficult to reach.
• It is used to solve complex problems that cannot be solved by conventional techniques.
• Disadvantages of Reinforcement Machine Learning
• Training Reinforcement Learning agents can be computationally expensive and time-consuming.
• Reinforcement learning is not preferable for solving simple problems.
• It needs a lot of data and a lot of computation, which can make it impractical and costly.
• Comparison between Data Science & Machine Learning
While Data Science and Machine Learning are closely related, they have distinct purposes,
processes, and tools. Below is a detailed comparison:
Aspect | Data Science | Machine Learning
Definition | A broad field that focuses on extracting insights and knowledge from data using various techniques, including statistics, data analysis, and machine learning. It involves the end-to-end workflow of data handling, analysis, and interpretation. | A subset of Data Science that involves designing algorithms that can learn patterns from data and make predictions or decisions without being explicitly programmed.
Scope | Includes data collection, cleaning, preprocessing, analysis, modeling, visualization, and interpretation. Encompasses both statistical analysis and machine learning. Often involves domain expertise and storytelling with data. | Focuses specifically on designing and training models to learn from data. Involves supervised, unsupervised, and reinforcement learning techniques. Aims to automate predictions or decision-making processes.
Goal | Provide actionable insights from data. Explore and understand datasets. Solve business problems through analysis and visualization. | Build models to predict or classify future outcomes. Optimize tasks like recommendation, fraud detection, and natural language processing. Automate learning from data.
• Comparison between Data Science & Machine Learning
(Cont’d)
Aspect | Data Science | Machine Learning
Programming Tools | Python, R, SQL | Python, R, TensorFlow, PyTorch
Libraries | Pandas, NumPy, Matplotlib, Seaborn | Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, TensorFlow, PyTorch
Techniques | EDA, data cleaning, statistical modeling | Regression, classification, clustering
Focus | Data processing and analysis | Model development and optimization
Output | Visualizations, reports, dashboards, and insights. Answers to "what happened?" or "why did it happen?" | Predictive or classification models. Systems that answer "what will happen?" or "how can we improve this process?"
Common Roles | Data Scientist, Data Analyst, Business Analyst | Data Scientist, Data Analyst, Business Analyst
Example Applications | Exploring customer behavior through sales data and creating dashboards to guide decision-making. Analyzing trends and making data-driven recommendations. | Building a recommendation system for Netflix or Amazon. Training a model to detect fraudulent transactions in real-time.
• Tools
• Programming language: Python.
• Libraries:
• NumPy: Numerical computations.
• Pandas: Data manipulation.
• Matplotlib/Seaborn: Visualization.
• Scikit-learn: ML models and utilities.
• Dataset Example
• Dataset: Iris Dataset
• Features: Sepal length, sepal width, petal length, petal width.
• Target: Species (setosa, versicolor, virginica).
• Practicals
• Setting up the Environment:
Install Python (latest stable version), then install the libraries and Jupyter Notebook:
# Install the required libraries and Jupyter Notebook
pip install numpy pandas matplotlib seaborn scikit-learn jupyter
• Note: If you have a Gmail (Google) account, you can use Google Colab in your browser
instead. There is no need to install Python or the libraries locally.
• Basic Dataset Exploration
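import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Load the Iris dataset into a DataFrame of features plus the target column
data = load_iris()
df = pd.DataFrame(data=data['data'], columns=data['feature_names'])
df['target'] = data['target']

# Summary statistics
print(df.describe())

# Visualize pairwise feature distributions, colored by target class
sns.pairplot(df, hue='target', palette='Set1')
plt.show()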
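A possible next step in the exploration is to replace the numeric target codes with species names and compare the classes. The sketch below rebuilds the DataFrame so it runs on its own. The `species` column and the variable names `class_counts` and `species_means` are assumptions introduced for this example.

```python
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
df = pd.DataFrame(data=data["data"], columns=data["feature_names"])

# Map the numeric target codes (0, 1, 2) to readable species names.
df["species"] = data["target_names"][data["target"]]

# Number of samples per species.
class_counts = df["species"].value_counts()
print(class_counts)

# Mean of every feature within each species.
species_means = df.groupby("species").mean()
print(species_means)
```

Comparing the per-species means gives a first hint of which features separate the classes.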