Bangabandhu Sheikh Mujibur Rahman Digital University, Bangladesh
Kaliakair, Gazipur
Week 01 Tutorial: Introduction to Data Science
1 This is a part of the Real-Time Pond Water Dataset for Fish Farming. It has 4 columns and 591
rows. The columns are pH, Temperature, Turbidity, and Fish. Fish is the target variable and
the others are the independent variables. There are 11 fish categories, 86 distinct pH values,
46 distinct temperature values, and 85 distinct turbidity values.
What kind of problem (classification/regression/clustering/association) is this? Why?
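A useful first step for a question like this is to inspect each column: how many distinct values it has and whether those values are numeric. The sketch below does that on a tiny made-up sample. The rows, the placeholder species names, and the helper `summarize_column` are assumptions for illustration only and are not taken from the actual dataset.

```python
def summarize_column(values):
    """Return the number of distinct values and whether every value is numeric."""
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return {"n_distinct": len(set(values)), "is_numeric": is_numeric}


# Made-up illustrative rows (NOT real data): (pH, temperature, turbidity, fish)
sample_rows = [
    (6.5, 27.0, 4.0, "species_a"),
    (7.1, 25.5, 3.5, "species_b"),
    (6.8, 27.0, 4.2, "species_a"),
    (8.0, 24.0, 5.1, "species_c"),
]
columns = ["ph", "temperature", "turbidity", "fish"]

# Build one summary per column by slicing the rows column-wise.
summary = {
    name: summarize_column([row[i] for row in sample_rows])
    for i, name in enumerate(columns)
}

for name, info in summary.items():
    print(name, info)
```

Comparing the summary of the target column with those of the input columns is one way to start justifying an answer.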
2 The following table is a portion of the 'Medical Cost Personal' dataset. The dataset has 1338
rows. The attributes are listed below. The attribute 'charges' is the target variable.
• age: age of primary beneficiary
• sex: gender of the insurance contractor (female, male)
• bmi: Body mass index, an objective index of body weight (kg / m^2) that shows whether
a person's weight is relatively high or low for their height. It is computed as weight
divided by height squared, ideally 18.5 to 24.9
• children: number of children covered by health insurance (number of dependents)
• smoker: smoking status
• region: the beneficiary's residential area in the US (northeast, southeast, southwest,
northwest)
Prepared by: Nurjahan Nipa, Lecturer, Department of Internet of Things & Robotics Engineering (IRE), BDU Page 1|3
• charges: Individual medical costs billed by health insurance.
What kind of problem is this? Why?
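The bmi attribute in the list above is given in kg / m^2 with an ideal range of 18.5 to 24.9. As a small worked example of that definition, the sketch below computes BMI as weight divided by height squared. The function names `bmi` and `in_ideal_range` and the sample person are hypothetical and do not come from the dataset.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg / m^2: weight divided by height squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


def in_ideal_range(value, low=18.5, high=24.9):
    """Check a BMI value against the 18.5 to 24.9 range quoted in the worksheet."""
    return low <= value <= high


# Hypothetical person: 70 kg, 1.75 m tall -> 70 / 3.0625
example = bmi(70.0, 1.75)
print(round(example, 1), in_ideal_range(example))
```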
3 Write down the differences between classification and regression based on the tables in
questions 1 and 2.
4 There are billions of websites on the internet with different classified content. To make this
information available to web users requires a vast team of human resources who can
organize and classify the content on the web pages.
i. In this scenario, which machine learning techniques would be most useful for
labeling and classifying the content, and so improving the user experience?
ii. Compare and contrast supervised, unsupervised, semi-supervised, and
reinforcement learning.
5 You own a mall and want to understand your customers, for example which ones are most
likely to convert [Target Customers], so that this insight can be passed to the marketing team
and the strategy planned accordingly. What kinds of algorithms can be useful for this?
Write some applications of clustering and association analysis.
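One family of methods that could fit this kind of unlabeled grouping task is clustering. The sketch below is a minimal k-means on made-up (annual income, spending score) pairs. The data, the function name `kmeans`, and the choice of the first k points as initial centroids are simplifying assumptions for illustration. In practice a library implementation such as scikit-learn's `KMeans` would be the more usual choice.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as the initial centroids."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: centroids stopped moving.
        centroids = new_centroids
    return labels, centroids


# Made-up customers (NOT real data): (annual income in thousands, spending score)
customers = [(15, 80), (70, 20), (16, 85), (72, 15), (18, 78), (75, 22)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

Each resulting group can then be described by its centroid, which gives the marketing team a compact profile of that customer segment.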
6 In data science, you have to know about various types of data. Briefly describe some
categories of data, such as structured, unstructured, natural language, machine-generated,
graph-based, audio/video/image, and streaming data, with examples.
7 Briefly discuss the properties that make big data different from the data found in traditional
data management tools. Also, mention how they are different from each other.