
DSAI2201 Introduction To Data Science and AI: Winter 2022

This document provides an introduction to a course on data science and AI. It outlines the following key points: - The instructor's name is Professor Mohammed Hoda and the course covers topics like loading and analyzing datasets, using statistical techniques to summarize data, applying machine learning algorithms, and distinguishing reinforcement learning from traditional machine learning. - Students will be evaluated based on labs, quizzes, projects, a midterm, and a final exam. Course policies emphasize submitting original work and attending all classes and labs. - The textbook is "Python Data Science Handbook" and the first lecture provides an example of using customer satisfaction data to identify customer clusters through visualization and k-means clustering. This allows distinguishing different types


DSAI2201

Introduction to Data Science and AI


Winter 2022

Lecture 1: Introduction

Professor: Mohammed Hoda


Data Science Disciplines
Agenda
• Instructor’s Detail
• Acknowledgment
• What will you learn?
• Evaluation/ Earning Credits
• Course Policy.
• Textbook
• Course Introduction
Instructor’s Detail
• Name: Hussam Fetyan
• Email: [email protected]
• Office: 10.2.56
• Office Hours: As per the posted schedule
Acknowledgment
• These slides were prepared by Dr. Mohammed Hoda.
• Many of these slides were originally prepared by other
professors.
What will you learn?
• After you complete this course, you will be able to:
• Use current applications to load, access, and describe datasets
• Summarize data and determine relationships using various
statistical techniques
• Apply the process of exploratory data wrangling to analyze
datasets
• Use machine learning algorithms such as Linear and Logistic
Regression for making predictions
• Distinguish between reinforcement learning algorithms and
traditional machine learning algorithms
Evaluation/ Earning Credits

ITEM             DATE (SUBJECT TO CHANGE)              VALUE
Labs             Weekly Lab Reports + Exams            20%
Popup Quizzes    Weekly Quizzes                        10%
Projects         Related Set of Assignments            20%
Midterm          During the Midterm Exam Week          20%
Final            Scheduled during final exams week     30%
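As a rough illustration of how the weights above combine, the following sketch computes a weighted final grade. The function name and the student scores are hypothetical and used only for illustration; the weights are the ones listed in the evaluation table.

```python
# Weights taken from the evaluation table (fractions of the final grade).
WEIGHTS = {
    "labs": 0.20,
    "quizzes": 0.10,
    "projects": 0.20,
    "midterm": 0.20,
    "final": 0.30,
}


def final_grade(scores):
    """Combine per-component percentages (0-100) into a weighted final grade."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
example_scores = {"labs": 80, "quizzes": 70, "projects": 90, "midterm": 60, "final": 75}
print(round(final_grade(example_scores), 2))
```

Because the final exam carries the largest weight, a given change in the final exam score moves the overall grade more than the same change in any other single component.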


Course Policies Highlights
• Students must submit their lab work by the end of the lab session or as directed by their
instructor.
• Submitting lab work does not by itself earn any marks, even if the work is done
perfectly. Lab work is evaluated in class based on how much the student learned from
doing the exercises.
• Missed project submission (assignments): mark of zero.
• Late project submission is rejected and treated as a missed assignment, i.e., no grade
will be given.
• Instructors reserve the right to conduct interviews with any student to clarify the details
or authenticity of any assessment.
Course Policies Highlights

• Students must submit their own work and be able to defend it. Students who are unable to
explain/justify/defend their work will get very low or no grades even if the work submitted
is correct.
• Attendance is taken at the beginning of each session. Late arrivals will be recorded absent.
Once attendance is taken, the instructor will not make changes.
• Absence from final examination: see School of IT regulations.
• Issues regarding marks, scheduled tests, or mistakes in evaluations must be discussed with
the instructor within 3 days during office hours, not during class time.
Textbook
• Main textbook (required):
– “Python Data Science Handbook: Essential Tools for Working with Data”
by Jake VanderPlas
• Recommended Books
– “Doing Data Science” by Cathy O'Neil, Rachel Schutt
– “Numsense! Data Science for the Layman: No Math Added”, 1st Edition,
by Annalyn Ng, Kenneth Soo
– “Machine Learning” by Amit Kumar Das, Subramanian Chandramouli,
Saikat Dutt
– “Data Science from Scratch: First Principles with Python”
by Joel Grus
Lectures and Labs
– To succeed in this course, a student must attend and pay attention in lectures (popup
quizzes may be conducted).
– Next, review the lecture slides at home and try the examples presented. If you get stuck,
you can look at how the instructor did it, but make sure you learn from it. Prepare
questions and ask the instructor.
– Attend the labs and attempt the exercises yourself.
– To gain higher grades, you must expand your knowledge by reading other resources.
– Your goal must be to learn and to be ready for the test, nothing else!
– Material will be posted on D2L. Read all the announcements carefully.
Lectures and Labs

– Lecture content will be posted by the weekend of the corresponding lecture
– Lab content will be posted by the weekend of the corresponding lab
(we will try to have them available one week before)
– Scheduled labs will be in groups of about 20-25 students, with help
available from your instructor.
– All labs will involve programming in Python 3
Labs and Lab Tools
• Two labs every week.
• Download and install the Anaconda distribution to run your code (we may use Google
Colab too).
• Watch the following video for how to download and install Anaconda:
https://youtu.be/aN6OVm0mTHo
Project

• Made of two assignments during the term.
• Both assignments will involve programming in Python; some may also
involve written work.
• A project interview may be conducted to evaluate your work. Make sure to do
it yourself and, more importantly, to learn how it was done.
Final Examination

• To be scheduled by the University Registrar
– The exam date and time will be scheduled during
the exam period
– Check the university website later for the date,
time, and location.
Things to do right away

1. Check your lab schedule
2. Familiarize yourself with D2L (Virtual Campus)
3. Get the textbook
4. If you want to work on your own computer, install Anaconda.
5. Find out how to start Anaconda, then write your first Python
program: run it, save it, close it, and open it again (in the lab, at
home …)
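A first program for step 5 could be as small as the sketch below. The file name hello.py and the function name greeting are hypothetical choices, not something the course prescribes.

```python
# hello.py (hypothetical file name): a minimal first program.

def greeting(name):
    """Build a short greeting for the given name."""
    return f"Hello, {name}! Welcome to DSAI2201."


if __name__ == "__main__":
    # Runs only when the file is executed directly, e.g. `python hello.py`.
    print(greeting("world"))
```

Save the file, close it, reopen it, and run it again to confirm you can repeat the whole cycle both in the lab and at home.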
Plagiarism

• You can share ideas, but not code or worked solutions.
• Never look at any other person's or group’s assignment code, or have their code in
your possession, in any portion or form whatsoever.
• Never share your assignment code with other students. All students involved in
academic dishonesty will be penalized regardless of who copied from whom.
• As a rule of thumb:
– If you didn't learn it, you won't get credit for it. Furthermore, it may cause you
more trouble than the mark is worth.
– How is learning measured? If you cannot fully explain or defend the submitted
work, or redo a similar exercise, then you didn't learn it.
Academic Integrity
• What is academic fraud?
– Misrepresenting someone else’s work as your own:
• Failure to cite sources, including the internet and
discussions.
• Use of the words of someone else without quotation marks or
other highlighting.
– Falsified lab data or citations.
– Violation of examination regulations.
– Tampering with academic evaluations.
– Helping another student do any of the above.
General Suggestions
How to succeed in university?
• Based on problems identified by the Committee on Academic Standing:
– Stay engaged: attend classes, labs, etc.
– Understand academic regulations.
– Don't bite off more than you can chew: do not take too many courses.
– Consider dropping a course that you think you might fail. That is better than
failing it.
– Take university seriously; it is not a joke.
– Do not work too many hours (in part-time jobs).
Now Let us Have our First Practical Example
Customer Satisfaction
• We have data collected from the customers of a
shop: 30 observations in total. Each observation
represents a client who shared their customer
satisfaction and brand loyalty.
• How can we serve future customers better?
• To find out, we need to apply machine learning
algorithms to the data.
• First step: import the required libraries and load
the data set.
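The code for this step is not reproduced in the text, so the following is only a minimal sketch of what loading such a data set might look like. It uses Python's standard csv module rather than any particular data library; the column names Satisfaction and Loyalty and all six sample rows are hypothetical placeholders, not the course data set.

```python
import csv
import io

# Hypothetical stand-in for the data file; these values are made up.
SAMPLE_CSV = """Satisfaction,Loyalty
2,1
3,2
9,9
8,8
2,9
9,2
"""


def load_observations(text):
    """Parse CSV text into a list of (satisfaction, loyalty) float pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["Satisfaction"]), float(row["Loyalty"])) for row in reader]


observations = load_observations(SAMPLE_CSV)
print(len(observations), "observations; first:", observations[0])
```

With a real file, the same reader could be fed from open(path, newline="") instead of io.StringIO; the path would depend on where the course data is stored.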
Customer Satisfaction
• A good preliminary step of most analyses is to visualize the data and examine it.

• Two clusters can be identified instantly with
no machine learning whatsoever:
– One represents people with low loyalty and
low satisfaction.
– The other contains all the rest.
Customer Satisfaction
• But let's take a more scientific approach.
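• Most of the time in data science, you will want to standardize your data (rescale each
variable to mean 0 and standard deviation 1) so that no variable dominates because of its scale.
• After that, we perform unsupervised machine learning, specifically cluster
analysis, using the popular k-means algorithm.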
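The two steps described here, standardizing and then clustering, can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the code used in the lecture: the data points are hypothetical, and the centroids are seeded with a deterministic farthest-point rule so the result is repeatable, whereas library implementations such as scikit-learn's KMeans use a randomized initialization by default.

```python
import statistics


def standardize(points):
    """Rescale each column to mean 0 and population standard deviation 1.

    Assumes no column is constant (its standard deviation would be 0).
    """
    columns = list(zip(*points))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(point, means, stdevs))
        for point in points
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centroids(points, k):
    """Deterministic farthest-point start (an assumption of this sketch):
    each new centroid is the point farthest from those chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's iteration: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(statistics.fmean(col) for col in zip(*members))
    return labels, centroids


# Hypothetical (satisfaction, loyalty) pairs, made up for illustration.
raw_points = [(1, 1), (2, 2), (9, 9), (8, 8), (1, 9), (2, 8), (9, 1), (8, 2)]
scaled = standardize(raw_points)
labels, centroids = kmeans(scaled, k=4)
print(labels)
```

Standardizing first matters because k-means relies on distances: without it, a variable measured on a larger scale would contribute more to every distance than the other variable.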

• Let us now plot the data, this time with four clusters.


Customer Satisfaction
• From here we can distinguish four types of customers and actually name them:
• Those with low satisfaction and low loyalty will be called alienated (red
cluster; bottom-left).
• Those with high satisfaction and high loyalty are fans (green cluster; top-right).
• Those with low satisfaction and high loyalty are supporters (blue cluster; top-left).
• The last group is neutral or disloyal but highly satisfied. These
are roamers (black cluster; bottom-right).
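The naming scheme above can be written down as a small rule applied to a cluster's center. This is a hedged sketch: the function name is hypothetical, and the cut-off at 0 (the average of a standardized variable) is an assumption made for illustration. In the lecture, the groups come from the clustering itself, not from fixed thresholds.

```python
def segment_name(satisfaction_z, loyalty_z):
    """Name a customer segment from standardized scores (0 = average).

    The threshold at 0 is an illustrative assumption.
    """
    high_satisfaction = satisfaction_z > 0
    high_loyalty = loyalty_z > 0
    if high_satisfaction and high_loyalty:
        return "fan"
    if high_loyalty:
        return "supporter"  # loyal but not satisfied
    if high_satisfaction:
        return "roamer"  # satisfied but not loyal
    return "alienated"


# Hypothetical cluster centers as (satisfaction, loyalty), for illustration only.
for center in [(-1.0, -1.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)]:
    print(center, "->", segment_name(*center))
```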
Customer Satisfaction
• We've reached a remarkable result.
• We've applied an algorithm to our data to reach an insight.
• Naturally, we must analyze what we see.
• Data science is about storytelling and making sense of numbers.
• We have four groups, but only one of them is favorable (the fans).
• Cluster analysis indicates the problem: some customers are dissatisfied, others
are disloyal.
• However, we must figure out how to solve the problem ourselves.
Customer Satisfaction
• What are some ideas a data scientist and management might come up with?
• It makes sense to focus our efforts on turning supporters (blue cluster) into fans by
improving their shopping experience.
• Normally we would have to dig deeper to find the drivers of dissatisfaction for
these customers.
• Maybe it is long queues, unfriendly staff, or perhaps high prices.
• Whatever the reason, we must take actionable steps to fix the issue and make our
supporters happier.
• We can turn the roamers (black cluster) into fans by increasing their brand loyalty.
• For example, loyalty cards, gifts, personalized discount vouchers, and raffles are
different strategies used to make such clients loyal in the long run.
Customer Satisfaction
• It is worth noting that in this exercise we skipped a few steps along the way:
• Typing code step by step
• Creating a dendrogram
• Analyzing a heat map
• Finding the optimal number of clusters

• However, these are all topics we will address later on in the course.
