UNGWG Competency Framework
Competency Framework
for big data acquisition and processing
Table of Contents
1. Background
2. How to use this Competency Framework
3. Big data-related competencies according to the statistical production process
4. Core competencies – areas of knowledge and skills
   Ethics and privacy
   Mathematics
   Data management
   Statistics
   Machine Learning
   Programming
   Data visualization
5. Generic skills
6. References
1. Background
Dynamic socio-economic changes, originating inter alia from the progressive and ubiquitous digitalization of most areas of life, have transformed the data environment tremendously. We can now speak of a data revolution at every stage of data management and processing, and of a vibrant data industry in which private entities act as data owners on an unprecedented scale. One of the most visible manifestations of these new circumstances is a shift in data users' expectations: commercial entities, the public and government increasingly demand real-time information that goes beyond traditional statistical production. The pace at which data are collected, processed and made available is often key to stakeholders.
The above-stated circumstances impose immense pressure on official statistics. On the one hand, its role of ensuring the highest standards and quality of statistical information becomes even more vital in the era of fake news and post-truth. On the other hand, official statistics is expected to keep up with the growing demands of data users. To this end, national statistical organizations (NSOs) have increasingly undertaken attempts to modernize statistical production, recognizing the potential of novel data processing techniques and new data sources. Among the latter, big data have been of particular interest to NSOs. Yet they entail a further challenge, not only at the level of their acquisition and implementation into statistical production, but also in sustaining relevant skills that reach beyond the traditional set of statistical competencies.
To address this challenge, the UN Global Working Group Task Team on Training, Competencies and Capacity Development has developed this Competency Framework for use by NSOs. It covers the wide array of skills and knowledge considered relevant for those working with big data acquisition and processing. The proposed framework comprises core competencies as well as a more general set of soft skills. They are outlined with reference to a simplified statistical production process and presented in thematic blocks. The framework is accompanied by an appendix listing selected IT packages and tools; the list is neither obligatory to apply nor exhaustive, but may prove useful as a reference catalogue of existing applications.
3. Big data-related competencies according to the statistical production process
[Diagram: the simplified statistical production process comprises four stages – data acquisition, data processing, data analysis and data visualization. The core competencies, from ethics and privacy through data visualization, together with agile project management, apply across all four stages.]
4. Core competencies – areas of knowledge and skills
Dimension 1
Name of the area Ethics and privacy
Dimension 2
Competence title and description: To possess a basic level of ethics and privacy knowledge of the issues listed below:
1) Basic definitions related to the processing of big data (personal data and anonymous data, active and passive big data, dimensions of big data, consciously and not-consciously transferred data, etc.)
2) Philosophical aspects of collecting and processing big data (ethical control and a pragmatic view of the impact on the lives of people and organizations: privacy, impact on personal capabilities and freedom, rights between data owner and data explorer)
3) Legal framework for the management of big data (personal data processing steps and principles, privacy and transparency policy, data processing purposes)
4) Technical aspects of working with private customer and identity data (obtaining and sharing private information, a transparent view of how our data is being used, openness of data)
Dimension 3
Proficiency levels:
A – Foundation: Demonstrates knowledge and understanding of the basic philosophical and legal rules for collecting, processing and sharing big data.
B – Intermediate: Demonstrates knowledge and understanding of, and the ability to put into practice, the philosophical, legal and technical rules for collecting, processing and sharing big data.
C – Advanced: Thorough knowledge of the application of personal data protection law, proficiency in personal data management, and skillfulness in performing operations on varied data sets while respecting the law and ethical norms and maintaining the highest technical standards. Advises others on the ethical and privacy considerations of data.
Dimension 4
Knowledge examples:
- Know the rules for the processing of personal data
- Understand the ethical basis of managing large customer data sets
- Describe the advantages and disadvantages of using record-level data for business purposes
Skills examples:
- Able to develop a method of collecting, storing and sharing data in accordance with legal regulations and the organization's ethical standards
- Able to assess whether acquired data sets contain personal data that allow the identification of units
- Describe and use software that protects data against uncontrolled disclosure
Attitude examples:
- Pragmatic view of the impact of personal data regulations on the lives of people and organizations
- Critical thinking around ethics
- Understanding and acceptance of the rights between data owner and data explorer
- Awareness of the responsibility for the use of private data
- Awareness of disclosure control methods if outputs are identifiable
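The skill of assessing whether a data set allows the identification of units can be illustrated with a minimal k-anonymity check. This sketch is illustrative only: the field names and the example records are invented assumptions, not part of the framework.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the smallest group size over the given quasi-identifier
    combination: a value of 1 means at least one unit is unique on those
    attributes and therefore potentially identifiable."""
    combos = Counter(
        tuple(rec[q] for q in quasi_identifiers) for rec in records
    )
    return min(combos.values())

# Illustrative micro-data: age band, postcode prefix, gender.
records = [
    {"age_band": "30-39", "postcode": "00-1", "gender": "F"},
    {"age_band": "30-39", "postcode": "00-1", "gender": "F"},
    {"age_band": "40-49", "postcode": "00-2", "gender": "M"},
]

k = k_anonymity(records, ["age_band", "postcode", "gender"])
print(k)  # 1 -> the third record is unique on these attributes
```

In practice an analyst would compare such a k value against the disclosure control threshold set by the organization before releasing the data.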
Dimension 1
Name of the area Mathematics
Dimension 2
Competence title and description: To possess a basic level of mathematical knowledge of the issues listed below:
1) Basics of algebra: matrices and linear algebra, algebra of sets
2) Probability: theories (conditional probability, Bayes' rule, likelihood, independence) and techniques (Naive Bayes, Gaussian Mixture Models, Hidden Markov Models)
Dimension 3
Proficiency levels:
A – Foundation: Demonstrates knowledge and understanding of algebra.
B – Intermediate: Demonstrates knowledge and understanding of algebra and the listed methods, and the ability to apply some of them.
C – Advanced: Thorough knowledge of algebra and skillfulness in performing operations on varied data sets. Able to advise others on possible solutions and on the application of methods to particular problems.
Dimension 4
Knowledge examples:
- Know the rules for creating matrices
- Know sentential (propositional) logic and first-order logic
- Describe the theoretical basis of probability theories
Skills examples:
- Carry out operations on matrices (addition, scalar multiplication and transposition)
- Able to study the basic properties of functions and relations
- Able to identify the equivalence classes (abstraction classes) of equivalence relations
Attitude examples:
- Prepared for independent study of related issues expressed in mathematical language
- Understand the significant limitations involved in defining concepts in mathematical terms
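The probability topics above (conditional probability, Bayes' rule) can be made concrete with a small worked example. The disease-screening numbers below are invented purely for illustration:

```python
def bayes(prior, sensitivity, false_positive_rate):
    """P(hypothesis | positive evidence) via Bayes' rule:
    P(H|E) = P(E|H) * P(H) / P(E), where P(E) is expanded with the
    law of total probability over H and not-H."""
    p_evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_evidence

# 1% base rate, 99% sensitivity, 5% false-positive rate.
posterior = bayes(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.167 -- a positive result alone is far from conclusive
```

The point of the exercise is the base-rate effect: even an accurate test yields a modest posterior probability when the prior is small.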
Dimension 1
Name of the area Data management
Dimension 2
Competence title and description: To possess data management knowledge of the issues listed below:
1) Database systems: database management systems; data models – definition and types; the entity-relationship model; model implementations (pre-relational, relational and object-oriented models)
2) Basics of cryptography: hash functions, binary trees
3) Databases: relational databases, tabular data, data frames and series, sharding, on-line analytical processing, data warehousing, data lakes, data vaults, the logical multidimensional data model, extract-transform-load (ETL), NoSQL
4) Varied data formats (JSON, SHP, XML, CSV)
Dimension 3
Proficiency levels:
A – Foundation: Demonstrates knowledge and understanding of basic data management skills.
B – Intermediate: Demonstrates knowledge and understanding of database management tools and methods, and the ability to apply some of them.
C – Advanced: Thorough knowledge of and proficiency in database management, and skillfulness in performing operations on varied data sets. Able to advise others in finding data management solutions.
Dimension 4
Knowledge examples:
- Know the basic concepts of SQL and NoSQL databases (such as table, column, row, field, field type, primary and foreign keys, relations)
- Understand the consequences of using a hash function
- Know the basic elements of the SQL language
- Define the functional dependencies occurring among the analyzed data
- Describe an existing database and indicate the appropriate transition keys for use in official statistics
- Describe the advantages and disadvantages of a data set held in various formats
Skills examples:
- Able to create database structures in selected database management systems (e.g. MySQL, MongoDB; more in the annex)
- Select the most suitable method of traversing all the nodes of a binary tree
- Able to present the logical structure of a database using tables and graphical relationships in selected programs (e.g. MS Access, HBase; more in the annex)
- Able to place and search for specific information in a database
- Perform simple administrative tasks related to databases, e.g. backing up structures and the data itself
- Apply queries to relational and non-relational databases
- Apply ETL techniques – acquiring, processing (including pre-cleaning) and loading data from non-statistical sources
Attitude examples:
- Systematically supplement knowledge of new trends in computer science on the subject of data storage
- Identify data sources and assess their usefulness in complementing the studies at hand
- Carefully analyze the data and adjust them to the needs of database users
- Use metadata to clarify data processing
- Aware of logging data import, export and edit processes
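A minimal sketch of the database skills above (creating a structure, loading data in a transaction, and querying it) using Python's built-in sqlite3 module, which follows a simplified ETL flow. The table, columns and records are illustrative assumptions only:

```python
import sqlite3

# Extract: raw records as they might arrive from a non-statistical source.
raw = [("Warsaw", "1794000"), ("Krakow", "780000"), ("Warsaw", "1794000")]

# Transform: de-duplicate and cast the population field to an integer.
clean = {(city, int(pop)) for city, pop in raw}

# Load: create the database structure and insert within a transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT PRIMARY KEY, population INTEGER)")
conn.executemany("INSERT INTO city VALUES (?, ?)", sorted(clean))
conn.commit()

# Query the loaded structure with plain SQL.
total = conn.execute("SELECT SUM(population) FROM city").fetchone()[0]
print(total)  # 2574000
```

The same extract-transform-load pattern scales up to the dedicated ETL tools listed in the annex; only the volume and the engine change.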
Dimension 1
Name of the area Statistics
Dimension 2
Competence title and description: To possess a certain level of statistical knowledge of the techniques listed below – to understand and be able to apply selected techniques, and to know their underlying assumptions and limitations:
1) Descriptive statistics (mean, median, range, standard deviation, variance)
2) Analysis of variance (ANOVA, MANOVA, ANCOVA, MANCOVA)
3) Multiple regression, time-series and cross-sectional analysis
4) Other multivariate techniques: principal component analysis, factor analysis, clustering techniques, discriminant analysis
5) Stochastic processes: e.g. Markov chains, queuing processes, Poisson processes, random walks
6) Time series analysis: time series models, ARIMA processes and stationarity, frequency domain analysis
7) Generalized linear models, including any of: log-linear models, logistic regression, probit models, Poisson regression
8) Hypothesis testing: formulation of hypotheses, types of error, p-values, common parametric (z, t, F) and non-parametric (χ², Mann-Whitney U, Wilcoxon, Kolmogorov-Smirnov) tests
9) Index numbers: Laspeyres/Paasche indices, hedonic indices, chaining, arithmetic and geometric means as applied to indices
Dimension 3
Proficiency levels:
A – Foundation: Demonstrates knowledge and understanding of the underlying assumptions of at least two of the above-listed areas/techniques.
B – Intermediate: Demonstrates knowledge and understanding of the underlying assumptions of, and the ability to apply, at least four of the above-listed techniques.
C – Advanced: Demonstrates knowledge and understanding of the underlying assumptions in own area of expertise as well as, more generally, in other statistical areas. Able to advise others and to use a network of contacts to ensure that the most appropriate methodology is applied.
Dimension 4
Knowledge examples:
- Understand the theoretical basis of analysis of variance (e.g. ANOVA)
- Describe the assumptions underlying logistic regression
- Understand the consequences of the assumptions not holding
- Depict the expected output of factor analysis
Skills examples:
- Compare selected statistical methods and specify the differences between them
- Select the most relevant statistical method for a specific analytical problem
- Deploy the most relevant statistical technique for a specific data set and analytical problem
- Effectively and accurately interpret statistical output
Attitude examples:
- Identify new statistical needs and develop statistical analyses to meet them
- Provide critique of statistical analyses produced or received
- Provide guidance on the selection of data sources and on matching them with relevant statistical techniques to meet the goals of the analysis at hand
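Several of the listed techniques – descriptive statistics and a basic hypothesis test – can be tried with Python's standard statistics module alone. The sample values and the hypothesised mean below are arbitrary illustrations:

```python
import math
import statistics

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

mean = statistics.mean(sample)        # arithmetic mean
median = statistics.median(sample)    # middle value
sd = statistics.stdev(sample)         # sample standard deviation
value_range = max(sample) - min(sample)

# One-sample t statistic against a hypothesised mean of 12.0; compare the
# result against t tables with len(sample) - 1 degrees of freedom.
t = (mean - 12.0) / (sd / math.sqrt(len(sample)))

print(round(mean, 2), round(median, 2), round(t, 3))  # 12.05 12.05 0.577
```

A t statistic this small would not reject the hypothesised mean at any conventional significance level, which is exactly the kind of interpretation the skills examples above call for.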
Dimension 1
Name of the area Machine Learning (ML)
Dimension 2
Competence title and description: To possess a combination of knowledge and skills in developing self-learning algorithms, including:
1) Programming: data structures (stacks, queues, multi-dimensional arrays, trees, graphs, etc.), algorithms (searching, sorting, optimization, dynamic programming, etc.), computability and complexity (P vs. NP, NP-complete problems, big-O notation, approximate algorithms, etc.)
2) Data modelling: finding useful patterns (correlations, clusters, eigenvectors, etc.) and/or predicting properties of previously unseen instances (classification, regression, anomaly detection, etc.)
3) Model evaluation: e.g. validation accuracy, precision, recall, F1-score, MCC, MAE, MAPE, RMSE, PCC2
4) Application of ML algorithms and libraries: identifying a suitable model (e.g. decision tree, nearest neighbour, neural network, SVM), selecting a learning procedure to fit the data (e.g. linear regression, gradient descent, genetic algorithms, bagging, boosting), and controlling for bias and variance, overfitting and underfitting, missing data and data leakage, among others
5) Understanding the digital product the ML solution will form part of
Dimension 3
Proficiency levels:
A – Foundation: Demonstrates knowledge and understanding of the underlying assumptions of basic probability theories and the most common statistical methods and machine learning techniques; programming skills in one of the ML-related applications.
B – Intermediate: Demonstrates knowledge and understanding of applying probability theories and a variety of statistical methods and machine learning techniques. May have developed further programming skills in at least two of the packages, with the ability to apply them to resolve ML-related analytical problems.
C – Advanced: Demonstrates knowledge and understanding of probability theories, most statistical methods and a variety of ML techniques. Demonstrates the ability to apply various ML techniques in various scenarios, and is able to advise and lead others. Has the understanding and skills to fit the ML solution into the system or product/service at hand.
Dimension 4
Knowledge examples:
- Understand Bayes' rule
- Understand the assumptions underlying model evaluation (quality) indicators, e.g. accuracy, recall, F1-score
- Understand the differences between neural networks and SVMs
Skills examples:
- Develop a statistical model and fit relevant ML techniques to the analytical problem at hand (e.g. classification and coding, data editing and imputation, image recognition, process optimization)
- Apply adequate model evaluation indicators
Attitude examples:
- Proactive in searching for optimization opportunities in statistical production with the use of ML
- Monitor the predictive performance of the employed model to ensure its quality, currency and ability to generate valid results
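As an illustration of identifying a suitable model and evaluating it, a nearest-neighbour classifier can be written in a few lines of plain Python. The two-dimensional toy data set is invented for the example and stands in for real training data:

```python
import math

def nearest_neighbour(train, query):
    """Classify `query` with the label of the closest training point (1-NN)."""
    features, label = min(
        train, key=lambda pair: math.dist(pair[0], query)  # Euclidean distance
    )
    return label

# Toy 2-D data: two well-separated clusters labelled 0 and 1.
train = [((1.0, 1.0), 0), ((1.2, 0.8), 0), ((5.0, 5.0), 1), ((4.8, 5.2), 1)]
test = [((1.1, 0.9), 0), ((5.1, 4.9), 1), ((4.0, 4.5), 1)]

# Model evaluation: accuracy on held-out test points.
correct = sum(nearest_neighbour(train, x) == y for x, y in test)
accuracy = correct / len(test)
print(accuracy)  # 1.0 on this separable toy set
```

On real data the same structure would be extended with a proper train/test split and the additional indicators listed above (precision, recall, F1-score), typically via an established library rather than hand-written code.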
Dimension 1
Name of the area Programming
Dimension 2
Competence title and description: To possess a certain level of proficiency in programming languages and tools, in terms of their functionality and their use for acquiring, processing and visualizing data, as follows:
1) Basic programs for handling data and creating databases: MS Office (e.g. Excel Analysis ToolPak, Access)
2) Relational database management language: SQL
3) Integrated development environments (IDEs): RStudio, Anaconda (more in the annex)
4) Programming languages, statistical computing environments and results visualization (e.g. Python, R), including:
   a) Basics of programming: variables, functions, expressions, loops (break, continue and for statements)
   b) Data structures: vectors, matrices, arrays, factors, lists, data frames
   c) Uploading, editing, saving and exporting data (including via APIs)
   d) Functions: built-in functions, user-defined functions (UDFs)
   e) Factor analysis
Dimension 3
Proficiency levels:
A – Foundation: Demonstrates knowledge and understanding of the basic functionalities of analysis tools with graphical interfaces.
B – Intermediate: Applies the appropriate programs and tools and performs intermediate operations (loading, editing, saving and exporting data). Uses built-in functions or defines own functions (UDFs) and performs factor analysis.
C – Advanced: Demonstrates knowledge and understanding of the advanced functionality of selected tools. Uses advanced functionalities of libraries and packages when working with data. Able to advise others on the best tool to use for the job at hand.
Dimension 4
Knowledge examples:
- Know the types of queries used in relational databases
- Understand the differences between data structures: vectors, matrices, arrays, factors, lists, data frames
- Describe the functionality of selected libraries and packages in Python and R
- Depict the expected output of factor analysis
Skills examples:
- Upload, edit, save and export data using the Python and R programming languages
- Develop and create a relational database using dedicated programs
- Deploy a selected library or package for in-depth data analysis
- Obtain data using an R package, determine their quality, and build and graphically present a model
Attitude examples:
- Automate processes related to the development of raw statistical data
- Discover dedicated libraries to facilitate statistical analysis of various file formats
- Systematically increase knowledge of coding practices for building scalable digital products
- Use version control platforms to assist with collaboration
- Understand the need to expand technological knowledge in order to improve skills in using new computer tools
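The programming basics above (loops, data structures, a user-defined function, loading and exporting data) can be combined in one short sketch. The field names are illustrative, and io.StringIO stands in for real files so the example is self-contained:

```python
import csv
import io

def to_record(row):
    """User-defined function (UDF): cast one raw CSV row to typed values."""
    return {"region": row["region"], "value": float(row["value"])}

# Upload: parse CSV input (in practice, open() a file instead of StringIO).
raw = io.StringIO("region,value\nA,10.5\nB,7.0\nA,2.5\n")
records = [to_record(row) for row in csv.DictReader(raw)]

# Edit: aggregate with a loop over a dictionary.
totals = {}
for rec in records:
    totals[rec["region"]] = totals.get(rec["region"], 0.0) + rec["value"]

# Export: write the aggregates back out as CSV.
out = io.StringIO()
csv.writer(out).writerows(sorted(totals.items()))
print(totals)  # {'A': 13.0, 'B': 7.0}
```

The equivalent R workflow would use read.csv(), a user-defined function and write.csv(); the competency is the pattern, not the language.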
Dimension 1
Name of the area Data visualization
Dimension 2
Competence title and description: To possess the skills to create graphical representations of information derived from big data sources (e.g. trends, outliers, patterns), based on the following areas of knowledge and competencies:
1) Mathematical basics: trigonometric functions, linear algebra, geometric algorithms, graph theory, etc.
2) Data management and analysis: data cleaning, statistics, modelling
3) Graphics: Canvas, SVG, WebGL, computational graphics, etc.
4) Programming (libraries and packages): e.g. R (ggplot2) and Python, Tableau, Power BI, ArcGIS (more in the annex)
5) Essential design principles: aesthetics, color, interaction, cognition, etc.
6) Visual solutions: coding, analysis, graphical interaction
Dimension 3
Proficiency levels:
A – Foundation: General knowledge of visual solutions related to big data. Programming skills to develop simple visual representations of the data (e.g. charts, graphs, box plots, histograms, infographics). Good understanding of when to use which type of graph.
B – Intermediate: Demonstrates knowledge of specific visual solutions related to big data. Programming skills to apply a selection of more complex visual methods (e.g. area chart, bubble cloud, heat map, treemap, word cloud), with an understanding of when to apply which visual method.
C – Advanced: Thorough knowledge of visual solutions related to big data. Programming skills to deploy a wide array of appropriate visual methods. General knowledge of graphic design and of the color regimes applicable in certain domains (e.g. map making). Able to advise others on the most appropriate data visualization tool to apply.
Dimension 4
Knowledge examples:
- Understand trigonometric functions and their relation to data visualization
- Understand graph theory
- Understand the visualization functions of analysis software
- Understand the color regimes used in map development
Skills examples:
- Prepare data sets for visualization purposes
- Generate a heat map
- Apply an adequate visualization technique to the data or analytical output at hand
- Able to simplify complex theories/data through visualization
Attitude examples:
- Proactive in searching for the most attractive, yet clear, visualization techniques
- Critically assess the match between the target audience and the purpose of the information to be presented, in order to utilize the most adequate visualization forms
- Proactive in exploring new data visualization techniques and packages in order to enhance data presentation
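At foundation level, the core idea of visualization – mapping values to visual length – can be sketched without any graphics library. The pure-text bar chart below is an illustration only; in practice the listed packages (ggplot2, matplotlib, Tableau, etc.) would be used, and the category data are invented:

```python
def text_histogram(values, width=40):
    """Render a horizontal bar chart in plain text, scaling the longest
    bar to `width` characters."""
    peak = max(count for _, count in values)
    lines = []
    for label, count in values:
        bar = "#" * round(count / peak * width)
        lines.append(f"{label:<10}{bar} {count}")
    return "\n".join(lines)

data = [("cars", 120), ("bikes", 45), ("buses", 30)]
print(text_histogram(data))
```

Even in this minimal form, the design questions from the framework apply: scale, labelling, and whether a bar chart is the right visual method for the data at hand.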
5. Generic skills
Communication
- Able to link business orientation with the scientific, analytical and technical facets
- Skillfully communicate findings to data users and decision-makers
- Describe and explain, with influence, the value of the work to stakeholders
- Able to effectively convey information to both technical and non-technical audiences
Curiosity
- Intellectually curious in looking for answers to statistical research questions
- Able to go beyond the initial assumptions of research and results
- Keen to seek solutions for hidden, overlooked queries
Business Acumen
- Able to deal with a massive amount of knowledge and translate it effectively for a non-technical audience
- Equipped with knowledge of current and upcoming trends
- Able to acquire the foundations of relevant disciplines, concepts and tools
- Possess knowledge and analytical understanding of the organization's business objectives in order to provide answers to current problems
- Able to use data to accelerate the growth of the organization
Storytelling
- Convey the results of work coherently and understandably
- Use data visualization to present concepts, ideas and phenomena to decision-makers from a new perspective
- Able to use different approaches to build narratives so that stakeholders attain a new sense of clarity and identify the best course of action
Adaptability
- Able to quickly adapt activities to the latest technologies
- Respond to varying business trends
Critical Thinking
- Able to perform an objective analysis of the problem at hand and take appropriate actions to solve it
- Understand the need to take a closer look at a data source and critically assess its quality, usefulness and the potential problems associated with it
- Logically identify the strengths and weaknesses of ideas and technical approaches, and make effective decisions based on these attributes
Product Understanding
- Work with the customer to fully understand their needs, and regularly report on progress for feedback
- Able to propose actionable insights that can improve product quality
- Understand the need to adapt the production process to the expected product and its functionality
- Ensure that a plan is in place for implementation of the new product, with customer involvement
Team Player
- Understand the importance of teamwork
- Able to collaborate effectively with others
- Able to manage a team effectively
Agile project management
- Work closely with the customer to deliver in small increments
- Manage work and deliver to plan
6. References
1) 5 Skills You Need to Become a Machine Learning Engineer. (2020). Retrieved May 19, 2020, from https://blog.udacity.com/2016/04/5-skills-you-need-to-become-a-machine-learning-engineer.html
2) Carretero S., Vuorikari R., Punie Y. (2017). DigComp 2.1: The Digital Competence Framework for Citizens. European Commission. Retrieved May 19, 2020, from http://publications.jrc.ec.europa.eu/repository/bitstream/JRC106281/web-digcomp2.1pdf_(online).pdf
3) Competency framework for the Government Statistician Group. (2016). Retrieved May 19, 2020, from https://gss.civilservice.gov.uk/policy-store/government-statistician-group-gsg-competency-framework/
4) Competency profiles created by the Modernisation Committee for Organisational Frameworks and Evaluation. (2016). Retrieved May 19, 2020, from https://statswiki.unece.org/display/bigdata/Competency+Profiles
5) Curriculum for data science. (2016). Retrieved May 19, 2020, from https://www.cyfronet.krakow.pl/cgw16/presentations/S8_02_presentation-Edison-CGW-26-10-2016.pdf
6) Data visualization beginner's guide: a definition, examples, and learning resources. (2020). Retrieved May 19, 2020, from https://www.tableau.com/learn/articles/data-visualization
7) Ferrari A. (2013). DIGCOMP: A Framework for Developing and Understanding Digital Competence in Europe. European Commission. Retrieved May 19, 2020, from https://www.rebiun.org/sites/default/files/2017-11/JRC83167.pdf
8) OECD Competency Framework. Talent.oecd – Learn. Perform. Succeed. (2018). Retrieved May 19, 2020, from https://www.oecd.org/careers/competency_framework_en.pdf
9) Proposing a framework for Statistical Capacity Development 4.0. (2017). Retrieved May 19, 2020, from https://paris21.org/sites/default/files/inline-files/CD4.0-Framework_final.pdf
10) Rybnicka D., Wilczek G. (2015). Python. Podstawy programowania [Python: Basics of programming]. Retrieved May 19, 2020, from https://python101.readthedocs.io/pl/py3/basic/basic.html
11) Rybiński M. (2020). Krótkie wprowadzenie do R dla programistów, z elementami statystyki opisowej [A short introduction to R for programmers, with elements of descriptive statistics]. Retrieved May 19, 2020, from https://www.mimuw.edu.pl/~trybik/edu/0809/rps/r-skrypt.pdf
12) van Loon R. (2020). The Soft Skills That Are An Asset to Every Data Scientist. Retrieved May 19, 2020, from https://www.simplilearn.com/soft-skills-for-data-scientist-article
13) Statistician Competency Framework. Government Statistical Service. (2012). Retrieved May 19, 2020, from https://gss.civilservice.gov.uk/archive/wp-content/uploads/2012/12/Statistician-competency-framework.pdf
14) The European Qualifications Framework. (2020). Retrieved May 19, 2020, from https://www.cedefop.europa.eu/en/events-and-projects/projects/european-qualifications-framework-eqf
15) The 30 Best Python Libraries and Packages for Beginners. Retrieved May 19, 2020, from https://www.ubuntupit.com/best-python-libraries-and-packages-for-beginners/
16) Top 7 key skills required for Machine Learning jobs. (2018). Retrieved May 19, 2020, from https://bigdata-madesimple.com/7-key-skills-required-for-machine-learning-jobs/
17) Tutorials in DataCamp. Retrieved May 19, 2020, from https://www.datacamp.com/community/tutorials/functions-python-tutorial
18) UN Competency Development – A Practical Guide. UN Office of Human Resources. (2010). Retrieved May 19, 2020, from https://hr.un.org/sites/hr.un.org/files/Un_competency_development_guide.pdf
19) U.S. Census Data Science & Visualization Curriculum. Retrieved May 19, 2020, from:
- https://datavizcatalogue.com/
- https://www.census.gov/dataviz/
- https://www.census.gov/data/adrm/what-is-data-census-gov.html
Appendix – List of programs and tools
Data management: MS Excel, Access, SQL Server, MySQL, Python (arrow, numpy, pandas), R (DBI, dplyr, stringr), MS Azure, Apache Hadoop, Ataccama, Profisee, SAS, Cassandra, MongoDB, Oracle NoSQL DB, HBase
Programming: Python, R, Linux commands, RStudio and Anaconda, Git and GitHub, SQL
15