ST2195 Programming For Data Science: Course Outline

This course provides an introduction to programming for data science applications. It covers principles of programming in R and Python with a focus on data handling, manipulation, visualization, and machine learning. The course aims to develop skills in relational databases, data wrangling, software development, and communicating data analysis. It is divided into 10 blocks that combine lectures, practical sessions, and additional revision and assessment activities. Students will complete an individual case study project worth 50% of the grade.

ST2195 Programming for Data Science

Course Outline

Introduction

Name: Leong Chee Ming


Email: [email protected]

Introduction

This course will cover the main principles of computer programming with a
focus on data science applications by following the entire pathway from raw
data to databases, data wrangling and visualisation, machine learning
frameworks up to software development.

Aims and Objectives

• Gain knowledge of the main principles of programming in the data science context
• Develop the ability to handle and visualise data
• Apply computational thinking in various application domains
• Provide training in state-of-the-art tools, e.g. SQL, Python, R and Git
• Communicate the data analysis results to stakeholders and share work
with people in the Data Science industry

Learning Outcomes

At the end of the course, and having completed the essential reading and
activities, students should be able to:

• Convert raw data into relational databases that can be queried with SQL
• Import data into Python and R, then manipulate and visualise it
• Program in Python and R
• Develop software using version control via Git
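
As a rough illustration of the first two outcomes, the sketch below loads raw CSV text into a relational database and queries it with SQL from Python. It uses only the standard library (`csv`, `sqlite3`) with an in-memory SQLite database. The sample data, the `scores` table and both helper functions are hypothetical and are not taken from the course materials, which may use different tools.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; in practice this would usually come from a file.
RAW_CSV = """student,block,score
Ann,1,72
Ben,1,65
Ann,2,80
"""


def load_scores(conn, raw_text):
    """Create a `scores` table and fill it from CSV text; return the row count."""
    conn.execute(
        "CREATE TABLE scores (student TEXT, block INTEGER, score REAL)"
    )
    rows = [
        (r["student"], int(r["block"]), float(r["score"]))
        for r in csv.DictReader(io.StringIO(raw_text))
    ]
    # Parameterised inserts keep the data separate from the SQL text.
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def mean_score_by_student(conn):
    """Return {student: mean score} computed by a SQL aggregate query."""
    cur = conn.execute(
        "SELECT student, AVG(score) FROM scores "
        "GROUP BY student ORDER BY student"
    )
    return dict(cur.fetchall())


conn = sqlite3.connect(":memory:")  # temporary database, nothing is written to disk
n_rows = load_scores(conn, RAW_CSV)
means = mean_score_by_student(conn)
print(n_rows, means)
```

The same pattern applies to a file-based database by passing a file path to `sqlite3.connect` instead of `":memory:"`.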

Assessment

1. Individual case study piece of coursework (50%)
2. Two-hour unseen written examination (50%)

Course Materials
The following materials, available through the UOL Virtual Learning Environment (VLE),
will be your main resources:
1. ST2195 Subject Guides
2. ST2195 Practice Assignments

Readings/References:
• McKinney W. Python for Data Analysis, 2nd edition, O’Reilly (2017)
• Guttag J.V. Introduction to Computation and Programming Using Python, 2nd edition, MIT Press (2017)
• Wickham H. and Grolemund G. R for Data Science, 1st edition, O’Reilly (2017)
• Wickham H. Advanced R, 1st edition, Chapman & Hall (2015)
• Ramakrishnan R. and Gehrke J. Database Management Systems, 3rd edition, McGraw Hill (2002)

Course Structure – Lectures and Practical Sessions

Course materials are divided into 10 Blocks
• Each Block will be covered in ~2 Lectures and 1 Practical Session
• Total of 19 Lectures and 10 Practical Sessions

Lectures will go through the content in the ST2195 Subject Guides (19 sessions)
• Coverage of key concepts/highlights
• Illustrations through demos and trying out the code

Practical Sessions will go through the ST2195 Practice Assignments (10 sessions)
• Do the assignment in groups of 4-5 students each
• For each assignment, a few groups will be randomly selected to present their solution

Course Structure – Details
Each Block will have ~2 Lectures and 1 Practical Session

Data Science Ecosystem & Programming Tools
Block 1
• Introduce yourself to Data Science and review real-world data examples
• Gain experience using basic tools and technology for programming, such as notebooks, IDEs and version control using Git.

Interacting with Data Structures
Block 2
• Gain familiarity with various data types and structures as well as popular data-exchange formats (e.g. JSON, XML, CSV).
• Be able to work with various data types and structures and data-exchange formats in R and Python.
Block 3
• Use relational database models and structured query languages (SQL)
• Gain experience with interfacing SQL from R and Python
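
As a rough illustration of the data-exchange formats named under Block 2 (JSON, XML, CSV), the sketch below reads the same record from each format using only Python's standard library. The record, its field names and the three helper functions are hypothetical and are not taken from the course materials.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical record written in three data-exchange formats.
JSON_TEXT = '{"name": "Ann", "score": 72}'
XML_TEXT = "<record><name>Ann</name><score>72</score></record>"
CSV_TEXT = "name,score\nAnn,72\n"


def from_json(text):
    """Parse a JSON object into a dict with an integer score."""
    data = json.loads(text)
    return {"name": data["name"], "score": int(data["score"])}


def from_xml(text):
    """Parse a <record> element; XML text values are always strings."""
    root = ET.fromstring(text)
    return {"name": root.findtext("name"), "score": int(root.findtext("score"))}


def from_csv(text):
    """Parse the first data row of CSV text; CSV values are always strings."""
    row = next(csv.DictReader(io.StringIO(text)))
    return {"name": row["name"], "score": int(row["score"])}


records = [from_json(JSON_TEXT), from_xml(XML_TEXT), from_csv(CSV_TEXT)]
print(records)
```

JSON carries the number type itself, while XML and CSV deliver text that has to be converted explicitly, which is why two of the helpers call `int` on a string.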

Course Structure – Details (cont’d)

Core Programming Concepts
Block 4
• Understand and use basic programming concepts such as control flow, variable and function scoping in R.
• Understand and use basic programming concepts such as exceptions, error handling, testing and debugging in R.
Block 5
• Understand and use basic programming paradigms in R.
• Understand and use basic programming concepts and paradigms in Python.

Data Wrangling
Block 6
• Understand and use data types and structures in R.
• Demonstrate how to clean and manipulate data in R.
• Manage data types and structures in Python.

Course Structure – Details (cont’d)
Graphics and Data Visualisation
Block 7
• Gain familiarity with graphics and data visualisations in R.
• Understand the grammar of graphics paradigm and its implementation in R.
Block 8
• Use graphics frameworks in Python.
• Produce network visualisations.

Machine Learning Frameworks
Block 9
• Introduce yourself to Machine Learning frameworks.
• Gain experience with the formation of data analytic pipelines and principles of parallel computing.
• Interact with Machine Learning frameworks in R.
• Interact with Machine Learning frameworks in Python.

Software Development
Block 10
• Gain experience with documenting code.
• Understand software testing frameworks and test-driven development.
• Develop R and Python packages.
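
As a rough illustration of the Block 10 objectives on documenting and testing code, the sketch below pairs a documented function with tests written in Python's built-in `unittest` framework. The function `min_max_scale` and its test cases are hypothetical examples and are not taken from the course materials. In a test-driven workflow the test class would be written first and the function added until the tests pass.

```python
import unittest


def min_max_scale(values):
    """Rescale numbers linearly to the interval [0, 1].

    Raises ValueError if `values` is empty or if all values are equal,
    because the scaling would otherwise divide by zero.
    """
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("values must not all be equal")
    return [(v - lo) / (hi - lo) for v in values]


class TestMinMaxScale(unittest.TestCase):
    def test_scales_to_unit_interval(self):
        self.assertEqual(min_max_scale([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            min_max_scale([])

    def test_constant_input_is_rejected(self):
        with self.assertRaises(ValueError):
            min_max_scale([3, 3, 3])


# Run the tests explicitly so the script continues after the report.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMinMaxScale)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The docstring states both the behaviour and the failure cases, so each test maps directly onto one sentence of the documentation.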

Course Structure – Additional Activities

Revision Class (1 session)
• Scheduled for March 2022

Individual Project Reviews (2x)
• 1st Submission – Plan and Commencement (early Jan 2022)
• 2nd Submission – Near Completion (mid-end Feb 2022)

Class Test/Assignment (2x)
• Take-home mode, self-timed, closed-book

Course Structure – Summary

Course Divided into 10 Blocks
• Lectures (19 sessions)
• Practical Sessions (10 sessions)

Additional Activities
• Revision Class (1 session)
• Individual Project Reviews (2x)
• Class Test/Assignment (2x)

Key Takeaways

• R and Python (also SQL, Git)
• Know how to find help
• Practice is important
• Individual project worth 50%
