
Master of Science in Big Data Science Modules

This document outlines the modules for a Master of Science in Big Data Science. The modules cover topics like big data analytics, programming for data science, research methods, computational statistics, machine learning, data visualization, mathematical modeling, time series analysis and forecasting, a big data science project, and a dissertation. Some specific topics mentioned include the three V's of big data, clustering and mining large datasets, streaming data analysis, distributed computing, research design, Monte Carlo methods, machine learning algorithms, and applying statistical learning to big data problems.

Uploaded by

neer dinger

Master of Science in Big Data Science

MODULES

MODULE SYNOPSES

SCDS 5101: Big Data Analytics


The three V’s of Big Data (Volume, Velocity, and Variety); Building models for data; Understand the
occurrence of rare events in random data. Understand sources of big data such as the web and social
networks; Model social networks; Apply algorithms for community detection in networks. Clustering big
data: clustering social networks; Apply hierarchical clustering. Mining rapidly arriving data streams:
Understand the types of queries for data streams; Analyse sampling methods for data streams; Count distinct
elements in data streams; Filter data streams. The Big Data landscape, with examples of real-world big data
problems and the three key sources of Big Data: people, organizations, and sensors. Identify which problems
are big data problems and which are not, and recast big data problems as data science questions.
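
As an illustration of the stream-sampling topic above, the following is a minimal sketch of reservoir sampling, one common way to keep a uniform random sample from a stream whose length is not known in advance. The function name `reservoir_sample` and its parameters are illustrative assumptions, not part of the module description.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return a uniform random sample of up to k items from an iterable.

    Illustrative sketch: the stream is read once and only k items are
    held in memory at any time.
    """
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i (0-based) replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample
```

For example, `reservoir_sample(range(1_000_000), 10, seed=1)` returns ten values drawn from the stream without ever storing the whole stream.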

SCDS 5102: Programming for Data Science


High-performance computing using high-level languages (Python, Java), distributed computing, cores,
threads, and nodes. Operating systems, multicore architectures, file systems, point-to-point communication.
Single-core optimization, parallel algorithms. Collecting, storing and organizing data using big data solutions.
Techniques using real-time and semi-structured data examples. Systems and tools including AsterixDB, HP
Vertica, Impala, Neo4j, Redis, SparkSQL. Extracting value from existing untapped data sources and
discovering new data sources. Recognize different data elements in everyday life problems. Design a big data
infrastructure plan and information system design. Identify frequent data operations required for various
types of data. Select a data model to suit the characteristics of big data. Apply techniques to handle streaming
data. Differentiate between a traditional Database Management System and a Big Data Management System.
Apply MapReduce using Hadoop; Compute PageRank using MapReduce. Spark, Hadoop, R and SAS,
Streaming, Data fusion, Distributed file systems; and Data sources such as social media and sensor data.
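
To make the "PageRank using MapReduce" topic concrete, here is a small single-machine sketch that mimics the map and reduce phases in plain Python. It does not use Hadoop itself. The function name `pagerank_mapreduce`, the damping factor of 0.85, and the even redistribution of rank from pages without outgoing links are assumptions chosen for illustration.

```python
from collections import defaultdict


def pagerank_mapreduce(links, damping=0.85, iterations=50):
    """Approximate PageRank for a graph given as {page: [linked pages]}.

    Illustrative sketch: each iteration has a map phase that emits
    (destination, rank share) pairs and a reduce phase that sums them.
    """
    nodes = set(links) | {dest for dests in links.values() for dest in dests}
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Map phase: every page sends an equal share of its rank to each page it links to.
        contributions = []
        dangling = 0.0  # rank held by pages with no outgoing links
        for node in nodes:
            outs = links.get(node, [])
            if outs:
                share = ranks[node] / len(outs)
                contributions.extend((dest, share) for dest in outs)
            else:
                dangling += ranks[node]

        # Reduce phase: sum the shares received by each destination page.
        totals = defaultdict(float)
        for dest, share in contributions:
            totals[dest] += share

        # Assumption: dangling rank is spread evenly over all pages.
        ranks = {
            node: (1 - damping) / n + damping * (totals[node] + dangling / n)
            for node in nodes
        }
    return ranks
```

In a real Hadoop job the map and reduce phases would run as separate distributed tasks, with one job per iteration; the loop above only shows the data flow.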

SCDS 5103: Big Data Science Research Methods


Fundamentals of the research process, from developing a good research question, through designing good data
collection strategies, to putting results into context. Topics include, but are not limited to: research process,
research ethics, planning for analysis, research claims, measurement, correlational and experimental design.
Phases and life cycles of research in data science.

SORS 5104: Computational Statistics


Random number generation. Monte Carlo Integration: Simulation and Monte Carlo integration, variance
reduction, stratified sampling. Resampling Methods: Bootstrapping, jackknife resampling, percentile
confidence interval. Markov chain Monte Carlo methods: Markov chains, Metropolis-Hastings algorithms,
Gibbs sampling, convergence. Density Estimation: Univariate estimation, kernel smoothing, multivariate
density estimation. Numerical Methods: root finding, constrained and unconstrained optimisation, EM
algorithm.
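
Two of the topics above, Monte Carlo integration and the bootstrap percentile confidence interval, can be sketched with the Python standard library alone. The function names, default sample sizes, and seeds below are illustrative assumptions, not prescribed by the module.

```python
import random
import statistics


def mc_integrate(f, a, b, n=100_000, seed=0):
    """Estimate the integral of f over [a, b] by averaging f at uniform random points."""
    rng = random.Random(seed)
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n


def bootstrap_percentile_ci(data, stat=statistics.mean, n_resamples=2000,
                            alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of the data."""
    rng = random.Random(seed)
    # Resample with replacement, recompute the statistic, and sort the estimates.
    estimates = sorted(
        stat(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

For instance, `mc_integrate(lambda x: x * x, 0, 1)` should land close to the exact value 1/3, with an error that shrinks roughly in proportion to one over the square root of `n`.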

SCDS 5201: Big Data Project Management


The big data ecosystem; technology-agnostic market, pillars of big data. Adopting big data analytics; vendor
selection, opportunities and implications, research and development pipelines, intellectual property protection,
clients and projects, mission-critical and availability. Project management; project failures and successes,
PMBOK and data science. Project lifecycles; estimation, scope, schedule, quality, staffing shortages,
communication, risk management, and mitigation. Methodologies; scrum and scrum again, big data hub, big
data factory, big data lake, big data foundry, big data as a service, big data analytics as a service. Platforms
and governance; security and services, process monitoring, compliance reporting, ethical issues, system
metrics and KPIs. Program portfolio and program management office.

SCDS 5202: Machine Learning


Machine and statistical learning algorithms for big data: identifying trends in the data, modelling trends for
prediction purposes, and modelling for the detection of hidden knowledge. Supervised learning
algorithms and unsupervised learning algorithms. Stochastic gradient descent. Building a machine learning
algorithm, deep learning. Bayesian networks, support vector machines. Programming for machine learning
(e.g., Python, C, Java). New developments in regression and classification, probabilistic graphical models,
numerical, Bayesian, and Monte Carlo methods, neural networks, decision trees, deep learning, and other
computational methods. R for data mining, cluster analysis, dimensionality reduction, calculating statistical
significance.
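
As a small illustration of the stochastic gradient descent topic, the sketch below fits a one-variable linear model by updating the parameters after every single observation. The function name `sgd_linear_regression`, the squared-error loss, and the default learning rate and epoch count are assumptions made for this example; the learning rate in particular would need tuning for data on a different scale.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, epochs=500, seed=0):
    """Fit y ~ w * x + b with per-sample stochastic gradient descent.

    Illustrative sketch: minimises squared error, visiting the samples
    in a fresh random order each epoch.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b
```

With noise-free data such as `xs = [0, 1, 2, 3, 4]` and `ys = [2 * x + 1 for x in xs]`, the returned parameters should approach the slope 2 and intercept 1.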

SCDS 5203: Big Data Visualisation


Visualisation component focusing on the encoding of information, such as patterns, into visual objects.
Visualisation using Python and R. The Python pandas data science library, Python lambdas, and the NumPy
library; data cleaning and manipulation techniques. Data collection structures: lists, creating lists. Data frames.
File I/O processing and regular expressions. Data gathering and cleaning. Data exploration and analysis.

Mathematical Modelling


Formulation and analysis of mathematical models. Mathematical tools include dimensional analysis,
optimization, simulation, probability, and elementary differential equations. Applications to biology, sports,
economics, and other areas of science. The necessary mathematical and scientific background will be
developed as needed.

Time Series Analysis and Forecasting


Probability models for time series: stationarity. Moving average (MA), Autoregressive (AR), ARMA and
ARIMA models. Estimating the autocorrelation function and fitting ARIMA models. Forecasting:
Exponential smoothing, Forecasting from ARIMA models. Stationary processes in the frequency domain: The
spectral density function, the periodogram, spectral analysis. State-space models: Dynamic linear models and
the Kalman filter.

SCDS 5204: Big Data Science Project


In this module, a student is expected to demonstrate the application of the theoretical Big Data Science
knowledge to solve real-world problems. A student is expected to identify a real-world problem from industry
and commerce. Projects shall be based on the entire big data lifecycle. This includes the gathering of data of
significant size as well as a final technical report describing the process followed and the deliverables.
Students may be allowed to work in pairs. The proposed project shall be subject to approval by the
Department of Computer Science. It is expected that a submission to a relevant journal is made at the end of
this module. This module is assessed entirely through coursework.

SCDS 6101: Dissertation


A student shall be allowed to commence a dissertation only after successfully completing their taught
modules. The student is expected to identify real-world problems and provide well-researched solutions. It is
recommended that the approach to the research dissertation adheres to the phases of solving a big data science
project, i.e. (i) discovery, (ii) data preparation, (iii) model planning, (iv) model building, and (v) operationalising
and communicating results. The completion of the dissertation shall culminate in the production of a
dissertation report.
