DSBA Curriculum Guide

The Post Graduate Program in Data Science and Business Analytics (PGP-DSBA) is designed for mid-senior and senior professionals to develop skills in data analysis, visualization, and machine learning using Python. The curriculum includes hands-on projects, mentorship, and a focus on practical applications to solve business problems. Participants will gain a comprehensive understanding of data science concepts, statistical analysis, and machine learning techniques to enhance their careers in this rapidly growing field.


POST GRADUATE PROGRAM IN DATA SCIENCE AND BUSINESS ANALYTICS
CURRICULUM GUIDE

In collaboration with The University of Texas at Austin (McCombs School of Business)
ABOUT THE PROGRAM
The Post Graduate Program in Data Science and Business Analytics (PGP-DSBA) is tailored for
mid-senior and senior professionals. The curriculum is designed for those interested in extracting
insights from data to craft compelling stories and influence business decisions. Through the
program, learners will familiarize themselves with the tools and techniques required to solve business
problems.
Learners will discover how to analyze and visualize data using Python to extract valuable insights and
offer practical business recommendations. They'll also learn how to conduct statistical analysis to test
business hypotheses and create machine learning models for predicting future occurrences based on
data relationships.
This program is built around the fundamental learning principle of ‘learning by doing’. It focuses on
building a practical skill set through hands-on case studies, projects, and a
portfolio of Data Science and analytics work. PGP-DSBA is designed to help you transition or
advance into one of the fastest-growing careers of the modern world.

PROGRAM HIGHLIGHTS

• 7 Months Online

• 20 Weekly Online Mentorship Sessions

• 7 Hands-On Projects

• Academic Learning Support (GL Community, Project Discussion Forums, Peer Groups)

• Dedicated Program Manager

• Personalized Evaluation and Feedback for all Projects

• Postgraduate Certificate from UT Austin

• Shareable E-Portfolio

• Career Services (Career Prep Material, Profile Reviews, Career Orientation Session, 1 CMS)

LEARNING OUTCOMES

01 A solid understanding of Data Science from a business, technical, and conceptual perspective

02 Working knowledge of using Python to perform end-to-end data analysis and extract strategic business insights for a variety of business problems

03 Ability to perform statistical analysis and extract statistical inferences from linear models

04 Ability to independently solve business problems using analytics and Data Science

05 Working knowledge of using Python to design and implement machine learning models to predict future trends and make informed business decisions

CURRICULUM
The curriculum, designed by the faculty of UT Austin, Great Learning, and leading industry
practitioners, is taught by best-in-class professors and practicing industry experts. The objective
of the program is to familiarize learners with the concepts of data science and business analytics
necessary to establish their career or transition to a career in the field of data science.

Pre-work: Introduction to Data Science

This course provides you with an understanding of the evolution of Data Science over time, its
application in industries, the mathematics and statistics behind it, and an overview of the life
cycle of building data-driven solutions.

• The Fascinating History of Data Science
• Transforming Industries through Data Science
• The Math and Stats underlying the technology
• Navigating the Data Science Lifecycle

Pre-work: Python

This course provides you with a fundamental understanding of the basics of Python programming
and builds a strong foundation of the basics of coding to build Data Science applications.

• Introduction to Python Programming
• Data Science Application Case Study

Module 1: Python Foundations

Master data storytelling with Python. Learn to read, manipulate, and visualize data, driving insights
for impactful business solutions through exploratory data analysis. Transform raw information into
compelling narratives.

Topic 1- Python Programming

Python is a widely used, high-level, interpreted programming language with a simple,
easy-to-learn syntax that emphasizes code readability. This module covers the fundamentals
of Python programming and the first steps in organizing data with Python.

Concepts Used:
• Variables and Datatypes
• Data Structures
• Conditional and Looping Statements
• Functions
Learning Outcomes: Learn about the fundamentals of Python programming (variables, data
structures, conditional and looping statements, functions).
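
For illustration, a minimal sketch of the fundamentals this topic covers (the order data and the count_large_orders function are hypothetical):

# Variables, a list data structure, a conditional, a loop, and a function.
order_totals = [250.0, 499.99, 89.5, 1200.0]   # list of order values
threshold = 500.0                               # numeric variable

def count_large_orders(totals, limit):
    """Count orders whose total exceeds a limit."""
    count = 0
    for total in totals:        # looping statement
        if total > limit:       # conditional statement
            count += 1
    return count

print(count_large_orders(order_totals, threshold))   # prints 1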

Topic 2- Python for Data Science

NumPy is a Python package for mathematical and scientific computing built around
arrays and matrices. Pandas is a fast, powerful, flexible, and easy-to-use
open-source Python library for manipulating and analyzing data. This module covers these
important libraries and provides a deep understanding of how to use them to explore data.

Concepts Used:
• NumPy Arrays and Functions
• Accessing and Modifying NumPy Arrays
• Saving and Loading NumPy Arrays
• Pandas Series (Creating, Accessing, and Modifying Series)
• Pandas DataFrames (Creating, Accessing, Modifying, and Combining DataFrames)
• Pandas Functions
• Saving and Loading Datasets using Pandas

Learning Outcomes: Learn about two of the most commonly used libraries
(NumPy and Pandas) used in Data Science for reading and manipulating data.
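
As a brief illustration of the kind of NumPy and Pandas operations covered here (the region and sales figures are made up):

import numpy as np
import pandas as pd

# NumPy: create an array and apply vectorized functions.
revenue = np.array([120.0, 340.5, 98.0, 410.0])
print(revenue.mean(), revenue.max())

# Pandas: create, access, and modify a DataFrame, then save and reload it.
df = pd.DataFrame({"region": ["North", "South", "North"], "sales": [200, 150, 320]})
df["sales_in_k"] = df["sales"] / 1000            # add a derived column
print(df.groupby("region")["sales"].sum())       # aggregate by region
df.to_csv("sales.csv", index=False)              # saving a dataset
print(pd.read_csv("sales.csv").head())           # loading it back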

Topic 3- Exploratory Data Analysis (Deep Dive)

Exploratory Data Analysis (EDA) is the process of examining and visualizing data to uncover
patterns, extract meaningful insights, and support storytelling. This module
provides a deep dive into conducting EDA using Python and using the extracted insights
to drive business decisions.

Concepts Used:
• Data Overview
• Univariate Analysis
• Bivariate/Multivariate Analysis
• Missing Value Treatment
• Outlier Detection and Treatment

Learning Outcomes: Learn how to perform Exploratory Data Analysis (EDA) to extract
insights from data.
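
A minimal sketch of an EDA workflow in Python; the orders.csv file and its columns are hypothetical placeholders:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("orders.csv")                    # hypothetical dataset

print(df.info())                                  # data overview
print(df.describe())                              # univariate summary statistics
print(df.isnull().sum())                          # locate missing values
df["delivery_time"] = df["delivery_time"].fillna(df["delivery_time"].median())  # missing value treatment

sns.histplot(data=df, x="order_value")            # univariate analysis
plt.show()
sns.boxplot(data=df, x="city", y="order_value")   # bivariate analysis and outlier detection
plt.show()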

Module 2: Business Statistics

Utilize Python for statistical analysis. Validate business estimates through confidence intervals,
ensuring reliability. Test assumptions with hypothesis testing, guiding informed resource
allocation and strategic decision-making based on data distribution analysis.

Topic 1- Inferential Statistics Foundations

Inferential statistics is pivotal in statistical analysis and decision-making and involves
drawing conclusions about populations based on samples. This module introduces
learners to the common probability distributions and how they are used to make
statistically sound, data-driven decisions.

Concepts Used:
• Experiments, Events, and Definition of Probability
• Introduction to Inferential Statistics
• Introduction to Probability Distributions (Random Variable, Discrete and Continuous
Random Variables, Probability Distributions)
• Binomial Distribution
• Normal Distribution
• Z-Score

Learning Outcomes: Learn about the fundamentals of probability distributions and the
foundations of Inferential Statistics
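
For example, a short sketch using SciPy (the sales-call and delivery-time numbers are invented for illustration):

from scipy import stats

# Binomial: probability of exactly 3 successful sales calls out of 10, with a 20% success rate.
print(stats.binom.pmf(k=3, n=10, p=0.2))

# Normal: probability a delivery takes under 30 minutes if times are Normal(mean=25, sd=4).
print(stats.norm.cdf(30, loc=25, scale=4))

# Z-score: how many standard deviations a 33-minute delivery lies from the mean.
print((33 - 25) / 4)   # 2.0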

Topic 2- Estimation and Hypothesis Testing

Estimation involves determining likely values for population parameters from sample data,
while hypothesis testing provides a framework for drawing conclusions from sample data to
the broader population. This module covers the important concepts of central limit theorem
and estimation theory that are vital for statistical analysis, and the framework for conducting
hypothesis tests.

Concepts Used:
• Sampling
• Central Limit Theorem
• Estimation
• Introduction to Hypothesis Testing (Null and Alternative Hypothesis, Type-I and Type-II Errors, Alpha, Critical Region, P-Value)
• Hypothesis Formulation and Performing a Hypothesis Test
• One-Tailed and Two-Tailed Tests
• Confidence Intervals and Hypothesis Testing

Learning Outcomes: Learn about the Central Limit Theorem, estimation, and the key
concepts of Hypothesis Testing.
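
A minimal sketch, assuming a small hypothetical sample of daily order values, of how a confidence interval and a one-sample test look in Python:

import numpy as np
from scipy import stats

sample = np.array([510, 480, 530, 495, 505, 520, 488, 515])   # hypothetical sample

# 95% confidence interval for the population mean (t-based, population std dev unknown).
print(stats.t.interval(0.95, df=len(sample) - 1, loc=sample.mean(), scale=stats.sem(sample)))

# Two-tailed one-sample t-test of the null hypothesis that the population mean is 500.
t_stat, p_value = stats.ttest_1samp(sample, popmean=500)
print(t_stat, p_value)   # reject the null at alpha = 0.05 only if p_value < 0.05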

Topic 3- Common Statistical Tests


Hypothesis tests assess the validity of a claim or hypothesis about a population parameter
through statistical analysis. This module introduces learners to the hypothesis tests most
commonly used in Data Science and how to choose the right test for a
given business claim, depending on the associated context.

Concepts Used:
• Common Statistical Tests
• Test for One Mean
• Test for Equality of Means (Known Standard Deviation)
• Test for Equality of Means (Equal and Unknown Std Dev)
• Test for Equality of Means (Unequal and Unknown Std Dev)
• Test of Independence
• One-Way ANOVA

Learning Outcomes: Learn about various commonly used statistical tests and their
implementation in Python with business examples.
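
For instance, a hedged sketch of two such tests with SciPy (the campaign and store-layout numbers are invented):

from scipy import stats

# Test for equality of means with unequal, unknown standard deviations (Welch's t-test).
campaign_a = [12.1, 13.4, 11.8, 12.9, 13.1]
campaign_b = [11.2, 11.9, 12.0, 11.5, 11.7]
print(stats.ttest_ind(campaign_a, campaign_b, equal_var=False))

# One-way ANOVA comparing mean sales across three store layouts.
layout_1 = [200, 210, 195]
layout_2 = [220, 230, 225]
layout_3 = [205, 215, 210]
print(stats.f_oneway(layout_1, layout_2, layout_3))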

Module 3: Supervised Learning - Foundations

Delve into linear models for uncovering relationships between variables and continuous outcomes.
Validate models for statistical soundness, drawing inferences to extract crucial business insights
into decision-making factors.

Topic 1- Intro to Supervised Learning - Linear Regression

Machine Learning (ML) is a subset of Artificial Intelligence (AI) that focuses on developing
algorithms capable of learning patterns in data and making predictions without being explicitly
programmed to do so. Linear Regression is one of the most popular supervised ML algorithms;
it identifies the degree of linear relationship in data. This module introduces participants to
ML and explores how linear regression can be used for predictive analysis.

Concepts Used:
• Introduction to Learning from Data
• Simple and Multiple Linear Regression
• Evaluating a Regression Model
• Pros and Cons of Linear Regression

Learning Outcomes: Understand the concept of learning from data, how the linear regression
algorithm works, and how to build and assess the performance of a regression model in Python.
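
A minimal sketch of that workflow with scikit-learn; the device_prices.csv file and its columns are hypothetical:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("device_prices.csv")                 # hypothetical dataset
X = df[["ram_gb", "battery_mah", "age_years"]]        # predictors
y = df["resale_price"]                                # continuous target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LinearRegression().fit(X_train, y_train)      # simple/multiple linear regression
pred = model.predict(X_test)

print(model.intercept_, model.coef_)                  # the fitted linear relationship
print(r2_score(y_test, pred), mean_squared_error(y_test, pred))   # evaluating the model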

Topic 2- Linear Regression Assumptions and Statistical Inference

The linear regression algorithm has a set of assumptions that need to be satisfied for the model
to be statistically validated and to be able to draw inferences from it. This module walks
participants through these assumptions, how to check them, what to do in case they are
violated, and the statistical inferences that can be drawn based on the model's output.

Concepts Used:
• Statistician vs ML Practitioner
• Linear Regression Assumptions
• Statistical Inferences from a Linear Regression Model

Learning Outcomes: Understand the underlying assumptions of a linear regression model,
how to check that they are satisfied, and how to make statistical inferences from the model.
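
As one possible illustration (again with a hypothetical dataset), statsmodels exposes the inferential output and common assumption checks:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import jarque_bera

df = pd.read_csv("device_prices.csv")                 # hypothetical dataset
X = sm.add_constant(df[["ram_gb", "age_years"]])      # add the intercept term
model = sm.OLS(df["resale_price"], X).fit()

print(model.summary())                                # coefficients, p-values, confidence intervals
print(jarque_bera(model.resid))                       # normality check on residuals
print([variance_inflation_factor(X.values, i) for i in range(X.shape[1])])   # multicollinearity (VIF)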

Module 4: Supervised Learning - Classification

Unlock the power of classification models to discern relationships between variables and
categorical outcomes. Extract business insights by identifying pivotal factors shaping
decision-making processes.

Topic 1- Logistic Regression

Logistic regression is a statistical modeling technique primarily used for modeling the
probability of binary outcomes. It finds applications in various fields such as medicine, finance,
and manufacturing. This module covers the theory behind the logistic regression model, how to
assess its performance, and how to draw statistical inferences from it.

Concepts Used:
• Introduction to Logistic Regression
• Interpretation from a Logistic Regression Model
• Changing the Threshold of a Logistic Regression Model
• Evaluation of a Classification Model
• Pros and Cons

Learning Outcomes: Understand the foundations of the Logistic Regression Model, how to make
interpretations from it, how to evaluate the performance of classification models, and how
changing the threshold of a Logistic Regression Model can help in improving predictions.
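
A hedged sketch of this with scikit-learn, using a hypothetical hotel_bookings.csv and an illustrative lowered threshold of 0.3:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("hotel_bookings.csv")                      # hypothetical dataset
X = df[["lead_time", "avg_price", "special_requests"]]
y = df["is_canceled"]                                       # binary outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]                     # predicted cancellation probabilities
for threshold in (0.5, 0.3):                                # default vs. lowered threshold
    pred = (proba >= threshold).astype(int)
    print(threshold, recall_score(y_test, pred))
    print(confusion_matrix(y_test, pred))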

Topic 2- Decision Tree

Decision Trees are supervised ML algorithms that utilize a hierarchical structure for
decision making and can be used for both classification and regression problems.
This module dives into how a decision tree can be used to model complex, non-linear
data and how to improve the performance of Decision Trees using pruning techniques.

Concepts Used:
• Introduction to Decision Tree
• How a Decision Tree is Built
• Methods of Pruning a Decision Tree
• Different impurity measures
• Regression Trees
• Pros and Cons

Learning Outcomes: Understand the Decision Tree algorithm, how it’s built, the different
pruning techniques that can be used to improve performance, and learn about the different
impurity measures used to make decisions.
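
For illustration, a small scikit-learn sketch contrasting an unpruned tree with a pre-/post-pruned one on a bundled sample dataset (the pruning values are arbitrary):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A fully grown tree tends to overfit; max_depth/min_samples_leaf (pre-pruning) and
# ccp_alpha (cost-complexity post-pruning) restrict its growth.
full_tree = DecisionTreeClassifier(criterion="gini", random_state=42).fit(X_train, y_train)
pruned_tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, min_samples_leaf=10,
                                     ccp_alpha=0.01, random_state=42).fit(X_train, y_train)

print(full_tree.score(X_train, y_train), full_tree.score(X_test, y_test))
print(pruned_tree.score(X_train, y_train), pruned_tree.score(X_test, y_test))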

Module 5: Ensemble Techniques and Model Tuning

Combine the decisions from multiple models using ensemble techniques to arrive at
more robust models that can make better predictions.

Topic 1- Bagging and Random Forest

Random forest is a popular ensemble learning technique that comprises several decision trees,
each trained on a subset of the data to learn its patterns. The outputs of the individual trees are then
aggregated to produce more robust predictions. This module will explore how to train a random
forest model to solve complex business problems.

Concepts Used:
• Introduction to Ensemble Techniques
• Introduction to Bagging
• Sampling with Replacement
• Introduction to Random Forest
Learning Outcomes: Understand how ensemble techniques work, learn about sampling with
replacement and the concept of bagging, and build Random Forest models to make better
predictions.
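
A minimal sketch with scikit-learn on a bundled sample dataset (the hyperparameter values are illustrative only):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each tree is trained on a bootstrap sample (sampling with replacement) and a random
# subset of features; the trees' votes are aggregated into a single prediction.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", bootstrap=True, random_state=42)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))
print(rf.feature_importances_[:5])   # which inputs drive the predictions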

Topic 2- Boosting

Boosting models are robust ensemble models comprising several sub-models, each of which
is developed sequentially to improve upon the errors made by the previous one.
This module covers essential boosting algorithms like AdaBoost and XGBoost that are
widely used in the industry for accurate and robust predictions.

Concepts Used:
• Introduction to Boosting
• Boosting Algorithms like AdaBoost, Gradient Boosting, and XGBoost
• Stacking

Learning Outcomes: Understand the concept of boosting, the difference between bagging
and boosting, learn various boosting algorithms, and understand the concept of stacking.
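
As a rough sketch, the scikit-learn boosting estimators follow the same fit/predict pattern (XGBoost's XGBClassifier offers a similar interface):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost re-weights misclassified rows at each step; gradient boosting fits each new
# tree to the errors of the ensemble built so far.
ada = AdaBoostClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42).fit(X_train, y_train)
print(ada.score(X_test, y_test), gb.score(X_test, y_test))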

Topic 3- Feature Engineering and Cross Validation

Feature engineering involves creating new input features or modifying existing ones to improve
a machine learning model's performance, while cross-validation is used to obtain a better
assessment of model performance. This module covers these two concepts, along with
regularization, to tune the performance of ML models and assess it correctly.

Concepts Used:
• Feature Engineering
• Cross-Validation
• Oversampling and Undersampling
• Regularization

Learning Outcomes: Learn how to handle imbalanced data, how to use the cross-validation
technique to get a better picture of model performance, and understand the concept of
regularization.
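
A brief, hedged sketch of cross-validation with a regularized model on a bundled sample dataset (rebalancing imbalanced data by oversampling or undersampling is typically done with a library such as imbalanced-learn):

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# 5-fold cross-validation gives a more stable view of performance than a single split;
# Ridge's alpha applies L2 regularization to keep the model from overfitting.
model = Ridge(alpha=1.0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(scores, scores.mean())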

Topic 4- ML Pipeline and Hyperparameter Tuning

Hyperparameter tuning involves optimizing the configuration of a machine learning model to
enhance its performance. This module covers two common techniques to find the optimal
hyperparameters of an ML model given a business context, and how to create an ML pipeline to
conduct data processing and modeling in a streamlined and reproducible manner.

Concepts Used:
• Machine Learning Pipeline
• Model Tuning and Performance
• Hyperparameter Tuning
• Grid Search
• Random Search

Learning Outcomes: Learn how to optimize model performance using hyperparameter tuning
and how to automate standard workflows in a machine learning process using pipelines.
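
For illustration, a small scikit-learn sketch of a pipeline tuned with grid search (the parameter grid is arbitrary; RandomizedSearchCV instead samples a fixed number of combinations):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The pipeline chains preprocessing and modeling so both are tuned and applied together.
pipe = Pipeline([("scaler", StandardScaler()),
                 ("rf", RandomForestClassifier(random_state=42))])
param_grid = {"rf__n_estimators": [100, 300], "rf__max_depth": [4, 8, None]}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")   # exhaustive search over the grid
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)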

Module 6: Unsupervised Learning

Unlock the power of clustering algorithms to group data based on similarity, unveiling hidden
patterns and intrinsic structures. Explore dimensionality reduction techniques to grasp the
significance of streamlined data analysis.

Topic 1- K-Means Clustering

K-means clustering is a popular unsupervised ML algorithm that is used for identifying patterns
in unlabeled data and grouping it. This module dives into the working of the algorithm and the
important points to keep in mind when implementing it in practical scenarios.

Concepts Used:
• Introduction to Clustering
• Types of Clustering
• K-Means Clustering
• Importance of Scaling
• Silhouette Score
• Visual Analysis of Clustering

Learning Outcomes: Learn about the different types of clustering algorithms, how K-means
clustering works, how to determine the optimal number of clusters by comparing different
metrics, and the importance of scaling data.
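
A minimal sketch using randomly generated stand-in data (in practice this would be real customer or stock attributes):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))                      # hypothetical numeric attributes
X_scaled = StandardScaler().fit_transform(X)       # scaling so no feature dominates the distances

# Compare silhouette scores across candidate values of k to choose the number of clusters.
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    print(k, silhouette_score(X_scaled, labels))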

Topic 2- Hierarchical Clustering and PCA

Hierarchical clustering organizes data into a tree-like structure of nested clusters, while
dimensionality reduction techniques are used to transform data into a lower-dimensional
space while retaining the most important information in it. This module covers the business
applications of hierarchical clustering and how to reduce the dimension of data using PCA
to aid in visualization and feature selection of multivariate datasets.

Concepts Used:
• Hierarchical Clustering
• Cophenetic Correlation
• Introduction to Dimensionality Reduction
• Principal Component Analysis

Learning Outcomes: Learn how to apply the hierarchical clustering technique to group similar
data points together and discover underlying patterns, understand the need for reducing
dimensions of the data, and understand the working of the PCA and how to transform data into
fewer dimensions using PCA.
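
A short sketch of both techniques, again on randomly generated stand-in data:

import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(100, 6)))   # hypothetical country indicators

# Hierarchical clustering: build the linkage tree and check how faithfully it preserves distances.
Z = linkage(X, method="ward")
coph_corr, _ = cophenet(Z, pdist(X))
print(coph_corr)                                    # cophenetic correlation

# PCA: project the data onto the two directions retaining the most variance.
pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_, pca.transform(X).shape)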

Module 7: Introduction to Generative AI

In this course, you will get an overview of Generative AI, understand the difference between
generative and discriminative AI, design, implement, and evaluate tailored prompts for specific
tasks to achieve desired outcomes, and integrate open-source models and prompt engineering
to solve business problems using generative AI.

Topic 1- Introduction to Generative AI

Generative AI is a subset of AI that leverages ML models to learn the underlying patterns and
structures in large volumes of training data and use that understanding to create new data such
as images, text, videos, and more. This module provides a comprehensive overview of what
generative AI models are, how they evolved, and how to apply them effectively to various
business challenges.

Concepts Used:
• Supervised vs Unsupervised Machine Learning
• Generative AI vs Discriminative AI
• Brief timeline of Generative AI
• Overview of Generative Models
• Generative AI Business Applications

Topic 2- Introduction to Prompt Engineering

Prompt engineering refers to the process of designing and refining prompts, which are
instructions provided to generative AI models, to guide the models in generating specific,
accurate, and relevant outputs. This module provides an overview of prompts and covers
common practices to effectively devise prompts to solve problems using generative AI models.

Concepts Used:
• Introduction to Prompts
• The Need for Prompt Engineering
• Different Types of Prompts (Conditional, Few-shot, Chain-of-thought, Returning
Structured Output)
• Limitations of Prompt Engineering
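
By way of illustration, a hypothetical few-shot prompt for tagging restaurant reviews, written as a plain Python string (the examples and JSON format are invented; the string would be sent to whichever generative AI model is in use):

few_shot_prompt = """You are a review-tagging assistant. Return only valid JSON.

Review: "The pasta was cold and the waiter ignored us."
Output: {"sentiment": "negative", "topics": ["food quality", "service"]}

Review: "Lovely rooftop seating and quick delivery."
Output: {"sentiment": "positive", "topics": ["ambience", "delivery"]}

Review: "Portions were small but the flavours were amazing."
Output:"""

print(few_shot_prompt)   # few-shot conditioning plus a required structured output format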

Module 8: Introduction to SQL

This course will help you gain an understanding of the core concepts of databases and SQL, gain
practical experience writing simple SQL queries to filter, manipulate, and retrieve data from
relational databases, and utilize complex SQL queries with joins, window functions, and subqueries
for data extraction and manipulation to solve real-world data problems and extract actionable
business insights.

Topic 1- Querying Data with SQL

SQL is a widely used querying language for efficiently managing and manipulating relational
databases. This module provides an essential foundation for understanding and working with
relational databases. Participants will explore the principles of database management and
Structured Query Language (SQL), and learn how to fetch, filter, and aggregate data using SQL
queries, enabling them to extract valuable insights from large datasets efficiently.

Concepts Used:
• Introduction to Databases and SQL
• Fetching data
• Filtering data
• Aggregating data
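
A minimal sketch, run from Python against an in-memory SQLite database so it is self-contained (the orders table is made up):

import sqlite3

conn = sqlite3.connect(":memory:")                 # hypothetical in-memory database
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, city TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'Austin', 120.0), (2, 'Dallas', 80.0),
                          (3, 'Austin', 200.0), (4, 'Houston', 55.0);
""")

# Fetching, filtering, and aggregating data with a single SQL query.
query = """
SELECT city, COUNT(*) AS num_orders, SUM(amount) AS total_amount
FROM orders
WHERE amount > 60
GROUP BY city
ORDER BY total_amount DESC;
"""
for row in conn.execute(query):
    print(row)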

Topic 2- Advanced Querying

SQL offers a wide range of numeric, string, and date functions; proficiency with these functions
enables advanced calculations, string manipulations, and date operations. SQL joins are used to
combine data from multiple tables effectively, and window functions enable complex analytical
tasks such as ranking, partitioning, and aggregating data within specified windows. This module
provides a comprehensive exploration of the functions and joins available in SQL for data
manipulation and analysis, enabling participants to summarize and analyze large datasets effectively.

Concepts Used:
• In-built functions (Numeric, Datetime, Strings)
• Joins
• Window functions
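
A hedged sketch of a join combined with a window function, again against an in-memory SQLite database (window functions need SQLite 3.25 or newer; the tables are invented):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 200.0), (12, 2, 80.0);
""")

# Join the two tables, then rank each customer's orders by amount within a window.
query = """
SELECT c.name, o.amount,
       RANK() OVER (PARTITION BY c.customer_id ORDER BY o.amount DESC) AS amount_rank
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id;
"""
for row in conn.execute(query):
    print(row)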

Topic 3- Enhancing Query Proficiency

Subqueries allow one to nest queries within other queries, enabling more complex and flexible
data manipulation. This module will equip participants with advanced techniques for filtering
data based on conditional expressions or calculating derived values to extract and manipulate
data dynamically.

Concepts Used:
• Subqueries
• Order of query execution
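
For example, a small subquery sketch on the same kind of hypothetical orders table:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, city TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'Austin', 120.0), (2, 'Dallas', 80.0),
                          (3, 'Austin', 200.0), (4, 'Houston', 55.0);
""")

# The inner query computes the overall average; the outer query keeps orders above it.
query = """
SELECT order_id, city, amount
FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders);
"""
print(conn.execute(query).fetchall())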

ENHANCE KNOWLEDGE WITH
SELF-PACED MODULES
The self-paced modules cover skills that complement those learnt in the guided modules.
Since not every learner needs or wants to learn them, they are offered as
self-paced modules. These modules feature the same high-quality recorded video lectures by UT Austin
faculty, global academicians, and industry experts, but do not include mentorship sessions. You can
complete them at your own pace and schedule, based on your interests and the current and future
demands of your role.

Pre-Work: Introduction to Data Science

Gain an understanding of the evolution of Data Science over time, its application in industries,
the mathematics and statistics behind it, and an overview of the life cycle of building
data-driven solutions.

Pre-Work: Python

Gain a fundamental understanding of the basics of Python programming and build a strong
foundation of coding to build Data Science applications.

Data Visualization in Tableau


Read, explore and effectively visualize data using Tableau and tell stories by analyzing data
using Tableau dashboards.

Time Series Forecasting


Learn how to describe components of a time series data and analyze them using special
techniques and methods for time series forecasting.

Marketing and Retail Analytics


Understand the role of predictive modeling in influencing customer behavior and how
businesses leverage analytics in marketing and retail applications to make data-driven decisions.

Finance and Risk Analytics


Cultivate a profound understanding of credit and market risk. Explore how predictive analytics
shapes risk modeling in financial institutions.

Web and Social Media Analytics


Understand tools of web analytics which form the basis for rational and sound online business
decisions. Learn how to analyze social media data, including posts, content, and marketing
campaigns, to create effective online marketing strategies.

Supply Chain and Logistics Analytics


Explore the discipline of supply chain management and its stakeholders. Understand the role
of logistics in businesses and supply chains, and learn methods of forecasting prices, demand,
and indexes.

Model Deployment
Learn the role of model deployment in realizing the value of an ML model and how to build
and deploy an application using Python.

BUILD INDUSTRY-RELEVANT SKILLS WITH
HANDS-ON PROJECTS

7 hands-on projects that will help you with:

• Practical Learning
• Skill Development
• Portfolio Enhancement

Data Analysis for Food Aggregator


Explore food aggregator data to address key business questions, uncover trends, and suggest
actionable insights for improved operations and customer satisfaction.

A/B Testing for News Portal


Conduct A/B testing to gauge the effectiveness of a new landing page design for an online news
portal, comparing user engagement metrics to optimize website performance.

Dynamic Pricing Model for Devices Seller


Utilize linear regression to build a dynamic pricing model for a seller of used and refurbished
devices, identifying influential factors to optimize pricing strategies for profitability.

Classification Analysis for Hotel Bookings


Employ classification models to determine factors influencing hotel booking cancellations, aiding
in proactive management strategies and customer retention efforts.

Visa Approval Prediction with ML


Implement ensemble machine learning models to facilitate visa approval processes, recommending
profiles for certification or denial based on comprehensive analysis of applicant data.

New Wheels Data Analysis


Analyze a vehicle resale company's listing and customer feedback data, answer business
questions, and provide recommendations for the leadership to enable data-driven decision-making.

Stock Clustering for Portfolio Diversification


Analyze financial attributes of stocks to cluster and build a diversified investment portfolio,
optimizing risk management and potential returns through strategic asset allocation.

SAMPLE CASE STUDIES

Hotel Booking Cancellation Prediction

Build a Data Science solution for a chain of hotels that will help them predict the likelihood of
a booking getting canceled so that they can take measures to fill in potential vacancies and
reduce revenue loss.

Tools and Concepts: Exploratory Data Analysis, Decision Trees, Random Forest, Scikit-Learn,
and Pandas

Restaurant Review Analysis

Analyze the customer reviews for different restaurants for a leading global food aggregator
and use generative AI models to analyze the reviews and tag them, thereby enhancing the
company's ability to understand customer sentiments at scale, enabling data-driven
decision-making, and improving overall customer satisfaction.

Tools and Concepts: Generative AI, Large Language Models, Prompt Engineering,
Hugging Face

Machine Predictive Maintenance

Analyze the data of an auto component manufacturing company and develop a predictive
model to detect potential machine failures, determine the most influencing factors on machine
health, and provide recommendations for cost optimization to the management.

Tools and Concepts: Exploratory Data Analysis, Data Visualization, Decision Trees, Pruning,
Scikit-Learn

Rental Bike Count Prediction

Analyze the customer data of a bike-sharing company and build a model to predict the count
of bikes shared so that the company can make prior decisions for surge hours.

Tools and Concepts: Exploratory Data Analysis, Data Visualization, Decision Trees,
AdaBoost, XGBoost, Scikit-Learn

CredPay

Analyze the data provided by a consultation firm that partners with banks, answer key questions
provided, draw actionable insights, and help the company to improve the business by
identifying the attributes of customers eligible for a credit card.

Tools and Concepts: Exploratory Data Analysis, Data Visualization, Pandas, Seaborn

Diabetes Risk Prediction

Analyze the historical patient data provided and build a predictive model to help identify
whether a person is at risk of diabetes or not.

Tools and Concepts: Exploratory Data Analysis, Data Visualization, Bagging, Random
Forests, Scikit-Learn

Music-Startup Data Analysis

Analyze the data from the database of a music-based startup that recently started selling
music records, answer questions for a performance review to identify customer preferences
by demographics, and generate recommendations to help business growth.

Tools and Concepts: SQL, Data Filtering, SQL Functions, Data Aggregation, Joins

Tourism Services Analysis

Analyze the data comprising economic, social, and environmental & infrastructure indicators,
and group countries based on them to help a tourism management organization identify key
locations to invest to promote tourism services.

Tools and Concepts: Exploratory Data Analysis, Data Visualization, K-means Clustering,
Hierarchical Clustering, Principal Component Analysis, Scikit-Learn

Diet Plan Analysis

Analyze the data provided by a health company regarding a market test experiment to check
the effectiveness of various diet plans for weight loss, and conduct hypothesis tests to find
evidence of whether the different diet plans differ significantly.

Tools and Concepts: Exploratory Data Analysis, Confidence Intervals, Hypothesis Testing,
ANOVA, Statsmodels

Online Course Provider Data Analysis

Analyze the platform engagement data of a massive open online course provider and create
an analytical report for a given academic year and enable informed decision-making regarding
actions for the next academic year.

Tools and Concepts: Exploratory Data Analysis, Data Visualization, Tableau

READY TO ADVANCE YOUR CAREER?

APPLY NOW

CONTACT US
+1 512 793 9938

[email protected]

https://onlineexeced.mccombs.utexas.edu/online-data-science-business-analytics-course
