Post-Graduate Diploma in Data Science in Health and Climate Change for Social Impact
2024-25 (Semester 1)
Module | Sub-topic | Description | Date | Exam Date
Module 1: Introduction to Data
Date: Oct 6 - Oct 27, 10 - 11 AM
Exam Date: Nov 03
Sub-topic: Introduction to Statistics and
  - Overview of the program: Unique Data Science opportunities and challenges
  - Overview of data science approaches
  - Types of data

Module 2: Working with Data
Date: Oct 13 - Nov 03, 2 - 3 PM
Exam Date: Dec 15
Sub-topic: Introduction to Excel, R and Python
  - Introduction to Excel, R and Python
  - Reading and Writing Data using R and Python
  - Basic exploration of Data
  - Data Quality (Seeing and Analyzing)
  - Remediation Approaches (Filtering, Transformations, Imputation)

Module 3: Inferential Statistics and Comparative Data Sciences
Date: Nov 10 - Dec 1, 10 - 11 AM
Exam Date: Dec 22
Sub-topic: Basics of Biostatistics
  - Getting Data into Shape (Long and Wide Data)
  - Random variables and probability
  - Measures of Centrality and Deviation
  - Probability Distributions: Discrete and Continuous
  - Moments of a distribution (mean, variance, skewness, kurtosis)
  - Probability density function and cumulative distribution function
  - Central Limit Theorem and Normal Distribution
  - Sampling distribution (e.g., sampling distribution of the mean)
Sub-topic: Foundational Data Analytics
  - Quantiles and Rank Statistics
  - Testing for Normality and Data Transformations
  - Statistical Inference: Parametric Tests
  - Statistical Inference: Non-parametric Tests
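Two of the Module 3 topics, moments of a distribution and the Central Limit Theorem, can be illustrated with a short standard-library Python sketch. It assumes the simple (biased) moment estimators that divide by n: skewness = m3 / m2^(3/2) and excess kurtosis = m4 / m2^2 - 3, where mk is the k-th central moment. The CLT part draws repeated samples from a skewed Exponential(rate = 1) population (mean 1, SD 1, skewness 2); the sampling distribution of the mean for samples of size 30 should be centred near 1, have an SD near 1/sqrt(30), and be far less skewed (theoretically 2/sqrt(30), about 0.37). The function names, the exponential population, the sample size, the replicate count and the seed are illustrative assumptions, not part of the course material.

```python
import math
import random
import statistics


def moments(xs):
    """Return mean, variance, skewness and excess kurtosis from central moments.

    Uses the simple (biased) moment estimators, dividing by n rather than n - 1.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("all values are identical; skewness/kurtosis undefined")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return mean, m2, skewness, excess_kurtosis


def sample_means(sample_size, replicates, seed=42):
    """Means of repeated samples from a skewed Exponential(rate=1) population."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(replicates)
    ]


# A small symmetric sample: skewness is 0 and excess kurtosis is negative.
print(moments([1, 2, 3, 4, 5]))

# CLT: the population is strongly right-skewed, but the sample means are
# centred near 1, have SD near 1/sqrt(30) and are much less skewed.
means = sample_means(sample_size=30, replicates=5000)
mean_of_means, var_of_means, skew_of_means, _ = moments(means)
print(mean_of_means, math.sqrt(var_of_means), skew_of_means)
```

Increasing `sample_size` should push the skewness of the sample means further towards 0, which is the behaviour the Central Limit Theorem describes.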
Module 4: Modelling and Basic Machine Learning for Health and Climate Applications
Date: Dec 15 - Jan 19, 10 - 11 AM
Exam Date: Feb 16
Sub-topic: Linear Algebra and Regression Models
  - Introductory Linear Algebra
  - Supervised and Unsupervised Learning
  - Regression: Linear Model, General Linear Model
  - Classification: Logistic Regression and Multinomial Model
  - Variable Selection: Stepwise approaches
Sub-topic: Machine Learning
  - Introduction to Machine Learning & Bayes' Theorem
  - High Dimensional Data: Principal Component Analysis and t-SNE
  - Machine Learning Approaches: Decision Tree, Neural Networks, SVM, Random Forest, Bayesian Networks
  - Overfitting, Generalization, Regularization
  - Unsupervised Learning: Clustering
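The first regression topic in Module 4, the linear model, can be sketched for a single predictor with ordinary least squares: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x), where Sxy and Sxx are sums of cross-products and squares of deviations from the means. The goodness of fit is summarised by R^2 = 1 - SS_res / SS_tot. This is a minimal standard-library sketch; the function names and the toy numbers are hypothetical, and in practice R's `lm()` or a Python package such as statsmodels would be used.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Toy numbers for illustration only (not real health or climate data).
xs = [0, 1, 2, 3]
ys = [1, 3, 2, 4]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)                     # about 1.3 and 0.8, up to floating-point rounding
print(r_squared(xs, ys, b0, b1))  # about 0.64
```

The same normal-equation idea extends to several predictors in matrix form, which is where the introductory linear algebra in this module comes in.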
Module 5: Visualization of Health and Climate Data
Date: Feb 2 - Mar 2, 10 - 11 AM
Exam Date: Mar 16
Sub-topic: Advanced Visualization Techniques in Data Sciences
  - Introduction to ggplot (R) and seaborn (Python)
  - Advanced Visualization Strategies in Data Sciences, e.g., heatmaps, Sankey plots, radar charts, sunbursts, word clouds, waterfall plots, icicle plots
  - Interactive Data Visualization (using plotly, networkD3)
  - 3D data visualization
  - Creating Dashboards
Sub-topic: Introduction to Geographical and Spatial Data & Spatial Statistics
  - Introduction to Spatial Data and Spatial Models: Geostatistical data, Lattice data and Point data
  - Characterising Spatial Autocorrelation (Metrics) and Relevant Issues for Classical Regression Analysis
  - Exploratory Spatial Data Analysis and Stationarity of Spatial Random Processes
  - Measuring Spatial Dependence and Spatial Heterogeneity
  - Environmental Pollution and Economic Growth application, with hands-on exercises in ArcGIS and R
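One widely used metric for spatial autocorrelation (a Module 5 topic) is global Moran's I. For values x_i with deviations z_i = x_i - mean(x) and a spatial weights matrix w_ij with total weight S0 = sum of all w_ij, it is I = (n / S0) * (sum over i, j of w_ij z_i z_j) / (sum over i of z_i^2). Values above the null expectation of -1/(n - 1) suggest that similar values cluster in space. The sketch below assumes simple binary contiguity weights for locations on a line; the function names and the toy values are hypothetical. It computes only the statistic, without a significance test; packages such as R's spdep (`moran.test`) provide tested implementations with inference.

```python
def morans_i(values, weights):
    """Global Moran's I for n observations and an n x n spatial weights matrix."""
    n = len(values)
    if n < 2 or len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix matching values")
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    s0 = sum(sum(row) for row in weights)  # total of all spatial weights
    denom = sum(zi * zi for zi in z)
    if s0 == 0 or denom == 0:
        raise ValueError("need nonzero weights and non-constant values")
    # Weighted cross-products of deviations over all pairs of locations.
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return n * num / (s0 * denom)


def line_contiguity(n):
    """Binary weights for n locations on a line: neighbours are i - 1 and i + 1."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


# Toy values for illustration only.
w = line_contiguity(4)
print(morans_i([1, 2, 3, 4], w))  # smooth trend: positive autocorrelation (I = 1/3)
print(morans_i([1, 0, 1, 0], w))  # alternating values: negative (I = -1 for this pattern)
```

With four locations the null expectation is -1/3, so the smooth trend sits well above it and the alternating pattern well below it.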