
Syllabus (CS361)

This document describes an introductory data science course offered in the spring 2023 semester. The course is taught by Professor Jae-Gil Lee and covers the basic probability, statistics, and machine learning techniques used in data science. Students learn skills such as exploratory data analysis, predictive modeling, and big data processing using Python. The grade is based on midterm and final exams and a data science competition project. The tentative schedule covers topics such as hypothesis testing, neural networks, decision trees, and MapReduce algorithms over 16 weeks, including the midterm and final exam weeks.


Introduction to Data Science (CS361)

Spring semester 2023 (Tue/Thu 14:30~16:00), elective, 3:0:3

1. Instructor
Jae-Gil Lee (x 3545, [email protected])

2. Summary
Data science is an interdisciplinary field focused on extracting knowledge from
data sets that are typically large. This course teaches basic data science skills to
undergraduate students. It covers the basic probability and statistics theory required
for data science; exploratory data analysis (EDA), which is needed to understand a given
data set; and predictive analysis based on statistical or machine learning techniques.
It also discusses recent big data processing techniques and various data
science applications. Students will learn how to implement these methods
in Python (on Google Colab).

IMPORTANT: If you took CS492 (Introduction to Data Science) in 2021 or 2022, you
cannot enroll in this course.

3. Textbooks
• Main: Joel Grus, Data Science from Scratch: First Principles with Python, 2nd ed.,
O’Reilly, 2019.
• Auxiliary: Peter Bruce, Andrew Bruce, and Peter Gedeck, Practical Statistics for
Data Scientists: 50+ Essential Concepts Using R and Python, 2nd ed., O’Reilly, 2019.
• Auxiliary: Zhi-Hua Zhou, Machine Learning, Springer, 2021.

4. Course requirements (tentative)


• One data science competition, held courtesy of the Korea Customs Service:
the winners of this competition will receive an award from the
Commissioner of the Korea Customs Service.

5. Grading policy
• Midterm exam: 35%
• Final exam: 35%
• Project (data science competition): 30%
• Class attendance: 1 point deducted after 3 absences
• Grading scale: A-F letter grades
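
As a rough illustration of how the weights above combine, the sketch below computes a weighted total in Python. It assumes each component is scored on a 0-100 scale, which the syllabus does not state. The function name `total_score` and the example scores are hypothetical, and the attendance deduction is passed in by the caller because the exact rule for converting absences into points is not spelled out here. The mapping from the total to a letter grade is also not given, so it is left out.

```python
# Weights taken from the grading policy above (percent of the total score).
WEIGHTS = {"midterm": 35, "final": 35, "project": 30}


def total_score(scores, attendance_deduction=0.0):
    """Combine component scores (assumed 0-100 each) into a 0-100 total.

    attendance_deduction is supplied by the caller; this sketch does not
    try to derive it from the number of absences.
    """
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
    # Clamp at zero so a deduction can never produce a negative total.
    return max(0.0, weighted - attendance_deduction)


# Hypothetical scores, for illustration only.
example = {"midterm": 80, "final": 90, "project": 70}
print(total_score(example))
print(total_score(example, attendance_deduction=1))
```

Keeping the weights in one dictionary makes it easy to check that they sum to 100 and to adjust them if the tentative policy changes.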

6. Prerequisite
• A data structures course (e.g., CS206)
• Python programming

7. Tentative schedule (subject to change)

Week  Contents
1     Introduction
      • Big data, data science, data scientist, etc.
2     Statistics and Probability Theory
      • Central tendency, dispersion, correlation, etc.
      • Bayes' theorem, normal distribution, central limit theorem, etc.
3     Hypothesis and Inference
      • Statistical hypothesis testing, confidence interval, p-value, A/B testing, etc.
4     Data Acquisition
      • Web scraping, data API (e.g., Twitter API), JSON format, etc.
5     Data Understanding
      • Exploratory data analysis
      • Data visualization (matplotlib, seaborn libraries)
6     Data Preprocessing
      • Data cleaning, data scaling, etc.
      • Dimensionality reduction, etc.
7     Machine Learning Basics
      • Modeling, training, overfitting, bias-variance tradeoff, feature extraction, etc.
8     Midterm Exam
9     Linear and Logistic Regression
      • Regression and prediction concepts
      • Gradient descent, maximum likelihood estimation, etc.
10    k-Nearest Neighbor and Naïve Bayes
      • Classification concepts
      • k-Nearest neighbor concepts and examples
      • Naïve Bayes concepts and examples
11    Decision Tree
      • Decision tree concepts and examples
      • Information theory (e.g., entropy)
12    Neural Network and Deep Learning (I)
      • Perceptron, feed-forward neural network, etc.
      • Learning theory (e.g., backpropagation)
13    Neural Network and Deep Learning (II)
      • Multi-layer perceptron (MLP)
      • Loss & optimization, activation function, softmax & cross-entropy loss, etc.
      • MNIST image classification example
14    Big Data Processing: MapReduce
      • MapReduce concepts and Hadoop/Spark
      • MapReduce algorithms
15    Relevant Topics and Applications
      • Recent trends in industry and academia
      • Recommender systems, etc.
16    Final Exam
