ACM Task Force on Data Science Education: Draft Report and Opportunity for Feedback
Initial Draft
January 2019
1.1 Charter
1.2 Prior work on defining data science curricula
1.3 Committee work and processes
1.4 Survey of academic and industry representatives
1.5 Knowledge areas
1.6 Data Science in context
1.7 Competency framework
1.8 Motivating the study of data science
1.9 Overview of this report
References
1.1 Charter
At the August 2017 ACM Education Council meeting, a task force was formed to explore a
process to add to the broad, interdisciplinary conversation on data science, with an articulation of
the role of computing discipline-specific contributions to this emerging field. Specifically, the
task force would seek to define what the computing/computational contributions are to this new
field, and provide guidance on computing-specific competencies in data science for departments
offering such programs of study at the undergraduate level.
There are many stakeholders in the discussion of data science – these include colleges and
universities that (hope to) offer data science programs, employers who hope to hire a workforce
with knowledge and experience in data science, as well as individuals and professional societies
representing the fields of computing, statistics, machine learning, computational biology,
computational social sciences, digital humanities, and others. There is a shared desire to form a
broad interdisciplinary definition of data science and to develop curriculum guidance for degree
programs in data science.
1.2 Prior work on defining data science curricula
This volume builds upon the important work of other groups who have published guidelines for
data science education. There is a need to acknowledge the definition and description of the
individual contributions to this interdisciplinary field. For instance, those interested in the
business context for these concepts generally use the term “analytics”; in some cases, the
abbreviation DSA appears, meaning Data Science and Analytics.
As an inherently interdisciplinary area, data science generates interest within many fields. (See
Figure 1.) Accordingly, there have been a number of Data Science curriculum efforts, each
reflecting the perspective of the organization that created it.
Figure 1
This project looks at data science from the perspective of the computing disciplines, but
recognizes that other views contribute to the full picture. The following examples are especially
important, and have informed the committee’s work.
The EDISON Data Science Framework (2015)
EDISON is a project started in September 2015 “with the purpose of accelerating the creation of
the Data Science profession.” The core EDISON consortium consists of seven partners across
Europe. Since 2015, the group has worked to create the EDISON Data Science Framework. This
collection of documents includes a general introduction, as well as four detailed components:
● the Data Science Competence Framework (CF-DS)
● the Data Science Body of Knowledge (DS-BoK)
● the Data Science Model Curriculum (MC-DS)
● the Data Science Professional Profiles (DSPP)
This comprehensive set of curricular volumes parallels the intended structure of our work.
EDISON was in earlier stages as this project began; at present, it is clear that there are significant
overlaps, and future versions of our work will reconcile our model with the EDISON
curriculum, with the intention of creating a complementary volume rather than a replicated or
competing one.
The National Academies of Science, Engineering, and Medicine Report on Data Science for
Undergraduates (2018)
As the press release announcing the publication of the National Academies report states, “Data
science draws on skills and concepts from a wide array of disciplines that may not always
overlap, making it a truly interdisciplinary field. Students in many fields need to learn about data
collection, storage, integration, analysis, inference, communication, and ethics.” The report
highlights the demand for data scientists and calls for a broad education for students across
programs of study. Identifying many data science roles, including those related to hardware and
software platforms, data storage and access, statistical modelling and machine learning, and
business analytics, among others, the report does not presume that every data scientist will be
expert in all areas, but rather that programs will develop to allow graduates to fulfil specific
roles.
The intent of the National Academies report was to highlight the importance, breadth, and depth
of data science, and to provide high-level guidance for data science programs. It is not a detailed
curricular volume in the sense of the EDISON project or this ACM Data Science effort.
The Park City Math Institute Curriculum Guidelines (2017)
The Park City Math Institute 2016 Summer Undergraduate Faculty Program convened with the
purpose of articulating guidelines for undergraduate programs in data science. The three-week
workshop brought together 25 faculty from computer science, statistics and mathematics. The
base assertion of the report and proposed curriculum is that data is the core: “The recursive data
cycle of obtaining, wrangling, curating, managing and processing data, exploring data, defining
questions, performing analyses, and communicating the results lies at the core of the data science
experience.”
The resulting list of key competencies shows the interdisciplinary nature of data science, with an
understandable focus on mathematics and statistics.
The role of computer science appears in the description of computational thinking: “Data science
graduates should be proficient in many of the foundational software skills and the associated
algorithmic, computational problem solving of the discipline of computer science.” However,
further description relates these skills to understanding the programming and algorithms behind
“professional statistical analysis software tools.”
The Park City report deserves further description. It includes an outline of courses for the Data
Science Major.
The report also includes a description of each of the courses. For the purposes of this report, it is
noted that programming is introduced in Introduction to Data Science I and II, and appears again
as a part of Algorithms and Software Foundations. The course in Data Curation includes
traditional databases as well as newer approaches to data storage and interaction. The course in
Statistical and Machine Learning “blends the algorithmic perspective of machine learning in
computer science and the predictive perspective of statistical thinking.”
Although there certainly are additional aspects of computer science that are relevant to the
preparation of a student of data science, there is clearly an effort to combine the mathematical
and computer science contributions to produce a blended program. This ACM Data Science
report builds on the Park City work with a heavy orientation toward computer science. The
position of the Task Force is that any Data Science program will have to reflect competencies in
mathematics, statistics, and computer science, possibly with different emphases. This is
consistent with the view of the National Academies report. Graduates of programs following the
Park City guidelines will have valuable strengths and graduates of programs following these
ACM guidelines will have different, but equally valuable strengths.
The Business-Higher Education Forum (BHEF) Data Science and Analytics (DSA)
Competency Map (2016)
The work provides a four-level competency map. The base, or Tier 1, level describes personal
effectiveness competencies. These are not considered competencies learned in school, but rather
part of an individual’s personal development. Examples include integrity, initiative,
dependability, adaptability, professionalism, teamwork, interpersonal communication, and
respect.
Tier 2 describes academic competencies to be acquired in higher education. These are most
relevant to this report and include the following:
• Deriving value from data
• Data literacy
• Data Governance and Ethics
• Technology
• Programming and Data Management
• Analytic Planning
• Analytics
• Communication
Tier 3 presents workplace competencies: planning and organizing, problem solving, decision-
making, business fundamentals, customer focus, and working with tools and technology.
Tier 4 is for Industry-Wide Technical Competencies. These are not specified, but represent skills
that are common across sectors of a larger industry context.
Though Tier 2 includes a competency in “Programming and Data Management,” the description
mentions only “Write data analysis code using modern statistical software (e.g., R, Python, and
SAS).” This set of competencies does not address a need for developing new software or
systems in support of data science, but relies on available tools.
Business Analytics Curriculum for Undergraduate Majors (INFORMS, 2015)
This report was produced in 2015 by the Institute for Operations Research and the Management
Sciences (INFORMS). Reflecting the focus of programs in Business, this INFORMS curriculum
assumes basic computer literacy as a starting point. It suggests revising some of the standard
courses in statistics to meet newer needs. The resulting course list includes: Data Management,
Descriptive Analytics, Data Visualization, Predictive Analytics, Prescriptive Analytics, Data
Mining, and Analytics Practicum. It also includes electives.
Like the guidelines from the Business-Higher Education Forum, the focus is on doing
something with data, primarily to serve business needs. There is no mention of programming.
The data management course includes SQL, but has no prerequisites. The emphasis in the data
mining course is on framing a business problem. Data mining techniques are compared, and
large datasets are to be used. The tools to be used for that purpose are not specified.
Initial workshops related to this ACM Data Science Curriculum effort (2015)
In October 2015, the National Science Foundation sponsored a workshop with representatives of
many perspectives on data science. Some attendees represented established programs, others
represented societies with an interest in data science. The final report, “Strengthening Data
Science Education Through Collaboration,” describes the discussions and reflects the diversity of
opinions. Although opinions varied, there were some areas of agreement. Those form the basis
of the list of Knowledge Areas in this current ACM report.
Summary
The review of existing curricular efforts suggests that it would be important to capture in a single
volume the contributions that computing makes to data science. Through developments such as
the Internet of Things, sophisticated sensors, face recognition and voice recognition, automation,
etc., computing opens up many avenues for data collection. Computing can also play a vital role
as a custodian of information, with great attention paid to maintenance and, crucially, to security
and confidentiality. Finally, the analysis of large amounts of information, and its use for machine
learning or augmented intelligence in their various roles, can bring significant benefit.
1.3 Committee Work and Processes
The Data Science Task Force was initiated at a meeting of the ACM Education Council in
August 2017. The Co-Chairs were appointed at the meeting and were charged with developing a
charter for the work, as well as assembling a task force with global representation.
The Co-Chairs drafted a proposal to create the Task Force, which was approved by the ACM
Education Board in January 2018. The initial Task Force – approximately two-thirds of the
members of the current committee – convened for a full-day meeting in February 2018.
In preparation for a second face-to-face meeting in July 2018, the Task Force designed two
surveys to gather input from academia and industry on the computing competencies most central
to Data Science. The results of the survey are presented in this report, with details provided in
Appendix B. During this time, the Co-Chairs invited additional members to join the committee
and began to develop a global advisory group.
At the July 2018 meeting, the ACM Task Force developed the set of computing-focused
Knowledge Areas for Data Science that appear in this report and began to articulate
competencies in each of those areas.
With the release of this first draft report, the ACM Data Science Task Force is calling for
discussion and feedback from all data science constituencies. The Task Force will be presenting
the report and gathering comments at conferences and meetings, including Educational Advances
in Artificial Intelligence (EAAI-9), held at AAAI in January 2019; the SIGCSE Symposium in
February 2019; and the Joint Statistical Meetings in July 2019. The Task Force also welcomes
feedback by email to the Co-Chairs:
1.4 Survey of academic and industry representatives
In order to gain an understanding of the current data science landscape, the ACM Data Science
Task Force conducted a survey of ACM members, representing academic institutions and
industry organizations. Through outreach to ACM members, the Task Force was also able to
reach computing professionals outside of ACM membership. In all cases, the Task Force sought
global participation. There were 672 responses to the academic survey and 297 responses to the
industry survey.
Academic Survey
The academic survey asked academics whether their institution had any sort of data science
program at the undergraduate level, asked what type of program was offered, in what
department(s) it was housed, and what computing areas were required, elective, or not present in
the program. It also allowed respondents to add to the list of computing areas specified in the
survey. Finally, the survey asked participants whether their data science program had a “data
science in context” requirement – i.e., a requirement that students apply data science to another
area.
Nearly half of respondents from academic institutions (47%) reported they did not offer an
undergraduate data science program. However, over half of those who reported offering some
type of program offered a full bachelor’s degree in data science.
Nearly all of the programs offering a bachelor’s degree in data science required courses in
programming skills and statistics. In addition, the majority of programs also required data
management principles, probability, data structures and algorithms, data visualisation, data
mining, and machine learning. Other courses included topics such as ethics, calculus, discrete
mathematics and linear algebra. We note that a majority of programs also required a “data
science in context” course.
Additionally, over half of these programs reported graduating 10 or fewer students annually.
We expect that the number of Data Science programs will increase, as will the number of
students choosing to study it. This, then, is an ideal time to articulate computing-based
competencies for those programs.
Industry Survey
The industry survey roughly mirrored the academic survey; however, the primary question was
whether a company looked for job applicants with data science experience and what computing
experience they required or preferred those applicants to have.
In the survey of industry representatives, nearly half (48%) responded that they look for
candidates specifically with data science or analytics degrees or educational backgrounds.
We found it particularly interesting that the majority of employers reported these employees
work as individual contributors on data science tasks.
Industry respondents reported requiring experience or skills in similar areas to those required by
college or university Data Science programs. One slight difference is that employers reported
requiring more computing skills than statistical or mathematical skills.
Other Observations
The ACM Task Force was somewhat surprised by certain survey results. For instance, industry
respondents did not report data security and privacy as a required competency area for job
applicants. We note that this may reflect employers’ understanding of what Data Science (and
Computer Science) programs are requiring of their majors. That is, it might reflect the reality of
the applicant pool, rather than a “wish list” of competencies.
Similarly, we note that academic institutions reported what they currently require, rather than
what they would require in an ideal world. This might, in some cases, reflect the availability of
courses and faculty at an institution, rather than a “gold standard” for Data Science programs.
1.5 Knowledge areas
Following the work of previous ACM curricular volumes (see [ACM 2013], for instance), this
report is organized around Knowledge Areas (KAs) whose origins are based on survey input (see
Section 1.4) as well as prior work, with special attention being given to the results of the
workshop reported in [CasselTopi 2015].
The core computing discipline-specific Knowledge Areas for Data Science, detailed in
Appendix A, are:
● Computing Fundamentals
● Data Acquirement and Governance
● Data Management
● Data Privacy, Security, and Integrity
● Machine Learning
● Data Mining
● Big Data
● Analysis and Presentation
● Professionalism
Other areas of computing may merit attention: sensors and sensor networks, the Internet of
Things, vision systems, among others.
In addition, for a full curriculum the above need to be augmented with courses covering calculus,
discrete structures, probability theory, elementary statistics, advanced topics in statistics, and
linear algebra.
1.6 Data Science in context
In addition to developing foundational skills in computing and statistics, data science students
should also learn to apply those skills to real applications. It is important for data science
education to incorporate real data used in an appropriate context.
Data Science curricula should include courses that combine data science fundamentals with
applications, exploring why people turn to data to explain contextual phenomena. Such courses
highlight how valuable context is in data analytics: data are viewed alongside narratives, and
questions often arise about ethics and bias. It can be beneficial to teach some courses within a
disciplinary context so that students appreciate that data science is not an abstract set of
approaches. Related application disciplines might include physics, biology, chemistry, the
humanities, or other areas.
1.7 Competency Framework
The Competency Framework structures the description of the various Knowledge Areas. Each
KA is described by some preliminary material followed by a set of topics and a set of associated
competencies; levels of competence vary, with some requiring greater expertise than others.
The details of the Competency Framework are described in Chapter 2. The descriptions of the
Knowledge Areas are then provided in Appendix A.
1.8 Motivating the study of data science
Those who study Data Science have to develop a mindset with a strong focus on data – the
collection of data and its appropriate analysis to bring about beneficial insights and changes. For
instance:
• Obtaining data about the quality of air in a city can result in removing dangerous
pollution or sending warning messages to those who suffer from asthma.
• Collecting data about traffic in real time can result in steps being taken to avoid traffic
congestion.
• Collecting patient data can lead to new insights for disease diagnosis and treatment.
• Recording data about speech in a certain area can assist with speech recognition.
The possibilities are endless, and the contributions that Data Science can make to transforming
businesses, transforming society and, fundamentally, shaping the future for the better are huge.
The possibilities also carry with them potentially negative consequences.
Students of Data Science need to be imbued with the ‘joy of data’, seeing data as the ‘currency
or fuel of our time’. They also need to be imbued with a strong sense of professional and ethical
responsibility. Data Science courses ought to reflect such sentiments; likewise the education of
data scientists.
The topic of careers is of course important from a marketing perspective. Suffice it to say that the
current demand is considerable and growing daily.
1.9 Overview of this report
Having set the scene in this chapter, the second chapter sets out the Competency Framework
used in describing the various Knowledge Areas in some detail. The computing-related KAs are
captured in Appendix A.
References
[ACM 2013] Computer Science Curricula 2013: Curriculum Guidelines for Undergraduate
Degree Programs in Computer Science (ACM/IEEE 2013):
https://www.acm.org/education/CS2013-final-report.pdf
[ASA 2014] Curriculum Guidelines for Undergraduate Programs in Statistical Science (ASA
2014b): http://www.amstat.org/education/pdfs/guidelines2014-11-15.pdf
[BHEF 2016] Data Science and Analytics (DSA) Competency Map, version 1.0, Business-Higher
Education Forum (BHEF), November 2016.
[CasselTopi 2015] Strengthening Data Science Education Through Collaboration, by Lillian
Cassel and Heikki Topi, Technical Report and report of the 2015 NSF Workshop.
http://www.computingportal.org/sites/default/files/Data%20Science%20Education%20Workshop%20Report%20.0_0.pdf
[CUPM 2015] Curriculum Guide to Majors in the Mathematical Sciences (MAA 2015). See
http://www.maa.org/sites/default/files/pdf/CUPM/pdf/CUPMguide_print.pdf
[Edison 2015] Data science professional uncovered: How the EDISON project will
contribute to a widely accepted profile for data scientists, by Manieri, A.; Brewer, S.; Riestra, R.;
Demchenko, Y.; Hemmje, M.; Wiktorski, T.; Ferrari, T.; and Frey, J. Published in IEEE 7th
International Conference on Cloud Computing Technology and Science (CloudCom), 588–593.
[INF 2015] Business Analytics Curriculum for Undergraduate Majors, Coleen R. Wilder,
Ceyhun O. Ozgur (2015) published in INFORMS Transactions on Education 15(2):180-187.
https://doi.org/10.1287/ited.2014.0134
[NatAc 2018] Data Science for Undergraduates: Opportunities and Options, published by the
National Academies of Sciences, Engineering, and Medicine, 2018. Washington, DC: The
National Academies Press. https://doi.org/10.17226/25104
[Park City 2017] Curricular Guidelines for Undergraduate Programs in Data Science by
DeVeaux, R.; Agarwal, M.; Averett, M.; Baumer, B.; Bray, A.; Bressoud, T.; Bryant, L.; Cheng,
L.; Francis, A.; Gould, R.; Kim, A.; Kretchmar, M.; Lu, Q.; Moskol, A.; Nolan, D.; Pelayo, R.;
Raleigh, S.; Sethi, R.; Sondjaja, M.; Tiruviluamala, N.; Uhlig, P.; Washington, T.; Wesley, C.;
White, D.; and Ye, P. 2017. Annual Review of Statistics and Its Application 4:15–30.
Chapter 2: The Competency Framework
Much of the material in this chapter leans very heavily on (i.e., is taken verbatim from) the work
of IT2017 – see [IT2017]. The motivation for this is to maintain consistency across the set of
curriculum documents produced by ACM.
Learning outcomes are written statements of what a learner is expected to know and be able to
demonstrate at the end of a learning unit (or cohesive set of units, course module, entire course,
or full program).
In contrast with the wide agreement on the meaning of learning outcomes, there is extensive
confusion and vagueness around the terms competence and competency. Generally, the term
competence refers to the performance standards associated with a profession or membership in a
licensing organization. Assessing some level of performance in the workplace is frequently used
as a competence measure, which means measuring aspects of the job at which a person is
competent. Competencies are what a person brings to a job, conceptualized as qualities by which
people demonstrate superior job performance [Kli1].
There is general agreement in education that success in college and career readiness requires that
students develop a range of qualities [Ken1, Nas1, Nrc1], typically organized along three
dimensions: knowledge, skills, and dispositions. We utilize a working definition of competency
that connects knowledge, skills, and dispositions. Figure 2.1 (adapted from IT2017, which is, in
turn, adapted from [Ccs1, p. 5]) shows these interrelated dimensions of competency.
Figure 2.1: The interrelated dimensions of competency:
● Knowledge: mastery of content knowledge; transfer of learning
● Skills: capabilities and strategies for higher-order thinking; interactions with others and the
world around
● Dispositions: personal qualities (socio-emotional skills, behaviors, attitudes) associated with
success in college and career
• Knowledge designates a proficiency in core concepts (or topics) and content of Data
Science and the application of learning to new situations. This dimension usually gets most
of the attention: from teachers, when they design their syllabi; from departments, when
they develop program curricula; and from accreditation organizations, when they
articulate accreditation criteria.
• Skills refer to capabilities and strategies that develop over time, with deliberate practice
and through interactions with others and the real world [Nrc1]. Skills also require
engagement in higher-order cognitive activities, meaning that “hands-on” practice of
skills joins with “minds-on” engagement. The inextricable connection between
knowledge and skills is evident in Michael Polanyi’s characterization of explicit versus
tacit knowledge [Pol1]. Explicit knowledge, or “know-that,” reflects core ideas and
principles, and corresponds to the knowledge dimension in our definition. Tacit
knowledge, or “know-how,” is skillful action requiring sustained engagement and
practice. Problem-based assignments, real-world projects, and laboratory activities with
workplace relevance are examples of curriculum elements that focus on developing skills.
Well-designed syllabi and accredited programs are mindful of skill development when
they articulate student outcomes at course and program level.
A transmission theory of teaching, also known as teacher-focused, holds that knowledge emerges
as it transmits from the expert teacher to the inexpert learners with the objective of ‘getting it
across’ or covering all the topics in the material. The opposing theory of active learning is that
students themselves create meaning and develop understandings with the help of appropriately
designed learning activities. In undergraduate education, the active learning model underlies a
shift of the paradigm that has governed higher education institutions. The traditional paradigm of
providing instruction dominated by a passive lecture-based learning environment has shifted to
producing learning and creating experiences in which students are active participants in the
learning process [Bar1].
On a continuum of student learning from passive (attending a standard lecture) to active
(engaged in problem solving with peers), producing a high level of student engagement means
designing learning activities in which students do more than take notes, recall, observe, or
describe. Students learn more effectively when their active participation consists of asking
questions, applying concepts, discovering relationships, or generalizing a solution to new
situations [Big2]. Higher levels of engagement cannot be encouraged if teaching is only about
declarative and procedural knowledge: information, vocabulary, basic concepts, basic know-how,
and discrete skills [Wig1]. Students do need to acquire knowledge and develop basic skills, but
this is just a means to the more important preparation for authentic performance tasks and
transfer of learning to new situations.
Perkins and Blythe formulated a “performance perspective” of learning and offered the view that
“understanding something is a matter of being able to carry out a variety of performances
concerning the topic.” [Bly1, Bly2] A performance perspective of learning requires a “modicum
of transfer, because it asks the learner to go beyond the information given” and seeks to “...
transcend the boundaries of the topic, the discipline, or the classroom.” [Per1]
The conventional way of framing curriculum guidelines for computing programs has, until
recently, been content-driven: a disciplinary body of knowledge decomposes into areas, units,
and topics to track recent developments in a rapidly changing computing field. For this report,
we follow the approach of the IT2017 report, which used the Understanding by Design (UbD)
framework [Wig1] to present a competency-based curricular framework.
The idea of the UbD framework is to treat content mastery as a means, not the end, to long-term
achievement gains that a program of study envisions for its graduates. Learners could know and
do many discrete things, but still not be able to see the bigger picture, put it all together in
context, and apply their learning autonomously in new situations.
In the UbD framework, learning transfer is multi-faceted, as shown in Table 2.1 [Wig2]. We note
that these facets of learning transfer blend skills and dispositions. Explain, interpret, apply and
adjust are skills complemented by dispositions related to showing empathy, perceiving
sensitively, recognizing bias, considering various points of view, or reflecting on the meaning of
new learning and experiences. Dispositions relating to meta-cognitive awareness include being
responsible, adaptable, flexible, self-directed, and self-motivated, and having self-confidence,
integrity, and self-control. They also include how we work with others to achieve a common goal
or solution.
Table 2.1: Six facets of learning transfer adapted from Understanding by Design framework and
reproduced from IT2017.
Table 2.2: Performance verbs to generate ideas for performance goals and professional practice
(Reproduced from IT2017)
Explain: demonstrate, derive, describe how, exhibit, express, induce, instruct, justify, model,
predict, prove, show how, synthesize, teach
Interpret: create analogies, critique, document, evaluate, illustrate, judge, make sense of, make
meaning of, provide metaphors, read between the lines, represent, tell a story of, translate
Apply: adapt, build, create, debug, decide, design, exhibit, invent, perform, produce, propose,
solve, test, use
Demonstrate Perspective: analyze, argue, compare, contrast, criticize, infer
Show Empathy: assume role of, be like, be open to, believe, consider, imagine, relate, role play
Have Self-Knowledge: be aware of, realize, recognize, reflect, self-assess
References
[Bar1] Barr, R.B. and Tagg, J. 1995. From Teaching to Learning: A New Paradigm for
Undergraduate Education. Change, 27(5), 12-25.
[Big2] Biggs. J. 1999. Teaching for Quality Learning at University – What the Student
Does (1st Edition), SRHE / Open University Press, Buckingham.
[Bly1] Blythe, T. 1998. The teaching for understanding guide. San Francisco: Jossey-Bass.
[Bly2] Blythe, T., and Perkins, D. 1998. Understanding understanding. In T. Blythe (Ed.), The
teaching for understanding guide, 9-16. San Francisco: Jossey-Bass.
[Ccs1] Council of Chief State School Officers. 2013. Knowledge, Skills, and
Dispositions: The Innovation Lab Network State Framework for College, Career, and
Citizenship Readiness, and Implications for State Policy.
[Ken1] Kennedy, D., Hyland, Á., & Ryan, N. 2007. Writing and using learning
outcomes: a practical guide. Cork: University College Cork.
[Ken2] Kennedy, D., Hyland, A, and Ryan, N. 2009. Learning Outcomes and
Competences. Bologna Handbook. Introducing Bologna Objectives and Tools, B 2.3-3, 1-18.
[Kli1] Klink M. van der, Boon, J., and Schlusmans, K. 2007. Competences and
vocational higher education: Now and in future. European Journal of Vocational Training No 40
– 2007/1, 67-82.
[Nrc1] National Research Council. 2012. Education for Life and Work: Developing
Transferable Knowledge and Skills in the 21st Century. Washington, DC: The National
Academies Press. https://doi.org/10.17226/13398.
[Per2] Perkins, D., Jay, E., and Tishman, S. 1993. Beyond abilities: A dispositional
theory of thinking. Merrill-Palmer Quarterly, 39(1), 1-21.
[Pol1] Polanyi, M. 1966. The Tacit Dimension. University of Chicago Press: Chicago.
[Sch1] Schussler, D.L. 2006. Defining dispositions: Wading through murky waters. The
Teacher Educator, 41(4).
[Wig1] Wiggins, G.P., McTighe, J., and Ebrary, I. 2005. Understanding by design
(Expanded second edition). Alexandria, VA: Association for Supervision and Curriculum
Development.
[Wig2] Wiggins, G., and McTighe, J. 2011. The Understanding by Design Guide to
Creating High-Quality Units. Alexandria, VA: Association for Supervision and Curriculum
Development.
Appendix A: A Draft of Competencies for Data Science
Computing Fundamentals
Data scientists should be able to implement and understand algorithms for data collection and
analysis. They should understand the time and space considerations of algorithms. They should
follow good design principles when developing software, understanding the importance of those
principles for testability and maintainability.
Programming
Scope Competencies
Data Structures
A data scientist should know a variety of data structures, be able to use them, and understand the
implications of choosing one over another.
Scope:
● Classification of data storage, accessibility and complexities, based on implementation and
operation of domain-cluster problems and/or applications
● Analysis of a proper data structure that suits data formats and constraints
● Choice of an adequate data structure based on preliminary information about the data
Competencies:
● Compare various data structures for a given problem, such as array, list, set, map, stack,
queue, hash table, tree, and graph
● Compare the trade-offs of different representations of a matrix and common operations such
as addition, subtraction, and multiplication
● Recognize the data structures obtained after calling script-based subroutines
● Evaluate how efficient a data structure is for insert, remove, and access operations
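To make such comparisons concrete, here is a minimal illustrative Python sketch (the collection size and lookup count are arbitrary choices, not part of the competency specification) contrasting membership testing in a list, which scans linearly, with a set, which hashes:

```python
import timeit

# Membership testing: a list scans in O(n); a set hashes in O(1) on average.
items_list = list(range(100_000))
items_set = set(items_list)

list_time = timeit.timeit(lambda: 99_999 in items_list, number=1_000)
set_time = timeit.timeit(lambda: 99_999 in items_set, number=1_000)

print(f"list membership: {list_time:.4f}s  set membership: {set_time:.6f}s")
```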
Algorithms
A data scientist should recognize that the choice of algorithm will have an impact on the time
and space required for a problem. A data scientist should be familiar with a range of algorithmic
techniques in order to select the appropriate one in a given situation.
Scope:
● Problem solving through algorithmic, computational and statistical thinking
● Algorithm design, implementation, and analysis
● Comparison of the complexity of various well-known computing algorithms, including
machine learning (ML) and statistical techniques
● Complexity of a given algorithm
● Factors that influence algorithm complexity and performance
● Computational performance of certain algorithms when given different data sets
Competencies:
● Analyze the differences between iterative and recursive algorithms
● Implement an efficient search algorithm to find a target with certain characteristics
● Provide the big-O time and space complexity for a given procedure
● Evaluate best, average, and worst-case behaviors of an algorithm
● Apply an appropriate algorithmic approach to a given problem
● Contrast which technique is more appropriate to use in a given scenario
Software Engineering
Software engineering principles include design, implementation and testing of programs. A data
scientist should understand design principles and their implications for issues such as
modularization, reusability, and security.
Scope:
● Software engineering principles, including design, implementation and testing of programs
● Principles of object-oriented design, such as encapsulation, inheritance and polymorphism, to
address concerns such as modularization, reusability and security
● Principles of functional programming to maintain complex scaling applications and
model/function composition
● Principles of compiled imperative programming for numeric computations and scientific
computing
● Probabilistic computing for testing and the software lifecycle
Competencies:
● Implement a small software project that uses a defined coding standard
● Incorporate statistical models into the software lifecycle
● Evaluate results of a program by utilizing statistical significance testing
● Demonstrate how software interacts with various systems, including information
management, embedded, process control, and communications systems
● Test a given piece of code, including security, unit testing, system testing, integration testing,
and interface usability
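One way to exercise the coding-standard and testing competencies together is a unit-tested helper; the sketch below (the zscore function and its expected behaviour are invented for illustration) uses Python's standard unittest module:

```python
import unittest


def zscore(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    if var == 0:
        raise ValueError("constant input has no z-scores")
    sd = var ** 0.5
    return [(v - mean) / sd for v in values]


class TestZScore(unittest.TestCase):
    def test_standardized_mean_is_zero(self):
        scores = zscore([1.0, 2.0, 3.0, 4.0])
        self.assertAlmostEqual(sum(scores) / len(scores), 0.0)

    def test_constant_input_rejected(self):
        with self.assertRaises(ValueError):
            zscore([5.0, 5.0, 5.0])


if __name__ == "__main__":
    unittest.main()
```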
Data Acquirement and Governance
There can be no analysis of data without the data itself. A data scientist must understand the
source and quality of their data, as well as understand appropriate processes for acquiring and
maintaining high quality data.
Scope:
● Shaping data and their relationships
● Acquiring data from the physical world and extracting data to a form suitable for analysis
● Integrating heterogeneous data sources
● Preprocessing and cleaning data for applications
Competencies:
● Construct and tune the data acquirement and governance process according to the
requirements of applications, including the selection of data sources and data acquirement
equipment, and data preparation algorithms and steps (Process Construction and Tuning)
● Define and write semantics rules for data acquirement and governance, including information
extraction, data integration and data cleaning (Rules Definition)
● Develop scalable and efficient algorithms for data acquirement and governance according to
the properties of the data and the requirements of applications, including data property
discovery, data acquirement, information extraction, data integration, data sampling, data
reduction, data compression, data transformation and data cleaning algorithms (Algorithm
Development)
● Describe and discover the static and dynamic properties of data, the changing mechanisms of
data, and similarity between data (Property Description and Discovery)
Data Management
A data scientist must understand the storage, maintenance, and retrieval of data.
Scope:
● Storing and indexing (structured, semi-structured and unstructured) data
● Data models; query languages based on the data model
● Effective conceptual models and architectures for databases
● Data retrieval: queries, keywords, efficiency
● Processing transactions in database management systems
● Scaling database management systems
Competencies:
● Design the logical and physical structure for effective data management according to data
type, data model and application
● Design index structures for efficient query processing and information retrieval
● Describe the semantic requirements of data access in a declarative language or a keyword set
● Tune and optimize the storage structure and query processing in data management systems
for scalability and efficiency
● Determine a strategy for transaction processing to balance efficiency, scalability and
consistency of data management systems, especially in parallel and distributed environments
● Design scalable and efficient algorithms for query processing, query optimization, and
transaction processing, as well as information retrieval
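To ground the indexing competency, a hedged sketch using Python's built-in sqlite3 module (the readings table and its contents are made up for illustration) shows an index turning a full-table scan into a B-tree search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id INTEGER, ts TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(i % 50, f"2019-01-{(i % 28) + 1:02d}", i * 0.1) for i in range(10_000)],
)

# Without an index the query below scans all rows; with it, a B-tree lookup.
conn.execute("CREATE INDEX idx_sensor ON readings (sensor_id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT avg(value) FROM readings WHERE sensor_id = 7"
).fetchall()
print(plan)  # the plan should report a search using idx_sensor, not a scan
```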
Data Privacy, Security, and Integrity
Data Privacy
This is intended to provide students with an understanding of data privacy and its related
challenges. Students are expected to understand the tradeoffs of sharing and protecting sensitive
information; and how domestic and international privacy rights impact a company’s
responsibility for collecting, storing, and handling data. [xref: Professionalism: Privacy and
Confidentiality]
Scope:
● Interdisciplinary tradeoffs of privacy and security
● Individual rights and the needs of society
● Technologies to safeguard data privacy
● Relationships between individual, organizational, and governmental privacy requirements
Competencies:
● Evaluate and understand the concept of privacy, including the impact on the societal
definition of what constitutes personally private information and the tradeoffs between
individual privacy and security
● Summarize the tradeoff between the individual's right to privacy and the needs of society
● Evaluate common practices and technologies, and identify the tools that reduce the risk of
data breaches while safeguarding data privacy
● Thoroughly comprehend how organizations with international engagement must consider
variances in privacy laws, regulations, and standards across the jurisdictions in which they
operate. This topic includes how laws and technology intersect in the context of the judicial
structures that are present – international, national and local – as organizations safeguard
information systems from cyberattacks
Data Security
This focuses on the protection of data at rest, during processing, and in transit. This area requires
the application of mathematical and analytical algorithms.
Scope:
● Basic concepts in cryptography: encryption/decryption, sender authentication, data integrity,
non-repudiation; attack classification (ciphertext-only, known plaintext, chosen plaintext,
chosen ciphertext); secret-key (symmetric) cryptography and public-key (asymmetric)
cryptography
● Role of mathematical techniques for encryption
● Role of symmetric (private-key) ciphers for data security
● Role of asymmetric (public-key) ciphers for data security
● Cross-border privacy and data security laws
● Data security laws and their impact
Competencies:
● Describe the purpose of cryptography and list ways it is used in data communications, and
which cryptographic protocols, tools and techniques are appropriate for a given situation.
Describe the terms cipher, cryptanalysis, cryptographic algorithm, and cryptology, and
describe the two basic methods (ciphers) for transforming plaintext into ciphertext. Explain
how public key infrastructure supports digital signing and encryption, and discuss
limitations/vulnerabilities
● Exhibit a mathematical understanding of encryption algorithms, covering topics such as
modular arithmetic, the Fermat and Euler theorems, primitive roots, the discrete log problem,
primality testing, factoring large integers, elliptic curves, lattices and hard lattice problems,
abstract algebra, finite fields, and information theory
● Describe methods for data security, such as block ciphers and stream ciphers (pseudo-random
permutations, pseudo-random generators), Feistel networks, the Data Encryption Standard
(DES), and the Advanced Encryption Standard (AES)
● Describe how mathematical concepts (such as computational complexity) contribute to
algorithms for data security
● Explain the requirements of the General Data Protection Regulation (GDPR), and the Privacy
Shield agreement between countries, such as the United States and the United Kingdom,
allowing the transfer of personal data
● Describe how certain laws [such as the following in the USA: Section 5 of the Federal Trade
Commission Act, state data security laws, state data-breach notification laws, the Health
Insurance Portability and Accountability Act (HIPAA), the Gramm-Leach-Bliley Act
(GLBA), information sharing through US-CERT, and the Cybersecurity Act of 2015] impact
data security
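As a taste of the mathematics behind these competencies, the sketch below implements the Fermat primality test from modular exponentiation alone. It is probabilistic, and Carmichael numbers can fool it, so real systems use stronger tests such as Miller-Rabin; the test values are chosen only for illustration:

```python
import random


def fermat_probably_prime(n: int, trials: int = 20) -> bool:
    """Fermat test: if n is prime, a**(n-1) == 1 (mod n) for all a coprime to n."""
    if n < 4:
        return n in (2, 3)
    for _ in range(trials):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:  # built-in modular exponentiation
            return False           # witness found: n is definitely composite
    return True                    # no witness found: n is probably prime


print(fermat_probably_prime(2**61 - 1))  # True: a known Mersenne prime
print(fermat_probably_prime(2**61 + 1))  # False: divisible by 3
```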
Data Integrity
Data integrity refers to the overall soundness, completeness, accuracy, and consistency of data.
Scope:
● Approaches to the accuracy and consistency (validity) of data
Competencies:
● Explain the concepts of message authentication codes (HMAC, CBC-MAC), digital
signatures, authenticated encryption, and hash trees that provide data integrity
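To illustrate one of these mechanisms, the short sketch below uses Python's standard hmac and hashlib modules (the key and message are invented for illustration); the receiver recomputes the tag and compares it in constant time:

```python
import hashlib
import hmac

key = b"shared-secret-key"  # hypothetical shared key, for illustration only
message = b"patient_id=42,reading=98.6"

tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# The receiver recomputes the tag; compare_digest resists timing attacks.
recomputed = hmac.new(key, message, hashlib.sha256).hexdigest()
assert hmac.compare_digest(tag, recomputed)
print("message integrity verified")
```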
Machine Learning
Machine learning refers to a broad set of algorithms and related concerns for discovering patterns
in data, making new inferences based on data, and generally improving the performance of a
software system without direct programming. These methods are critical for data science. Data
scientists should understand the algorithms they apply, be able to implement them, if necessary,
and make principled decisions about their use.
Scope:
• Broad categories of machine learning approaches (e.g., supervised and unsupervised) that
make assumptions about the data available at learning time and the general types of
inferences that can be made from that data
• Algorithms and tools (i.e., implementations of those algorithms) in each of the broad learning
categories
• Machine learning as a set of principled algorithms (e.g., optimization algorithms), rather than
as a “bag of tricks”
• Computational learning theory and what it tells us about the theoretical limitations of various
approaches
• The notion of a hypothesis space of learning outcomes and its relationship to the expressive
power of learned models
• Problems related to model expressivity as well as availability of data, and techniques for
mitigating their effects: e.g., the problem of overfitting and regularization techniques for
mitigating its effects; the curse of dimensionality and feature
selection/weighting/reformulation techniques for mitigating its effects
• Ways to evaluate performance, both in terms of specifying objectives (e.g., predictive
accuracy, cost-sensitivity, size of learned model) and in techniques for measuring them
• Methodology for evaluating the model produced by a machine learning algorithm/tool for a
single problem; methodology for empirically comparing algorithms against each other more
generally
• Differences in interpretability of learned models
• Model drift over time
• Algorithmic and data bias, integrity of data, and professional responsibility for fielding
learned models
Competencies:
• Compare and contrast broad classes of learning approaches, with a focus on inputs, outputs,
and the ranges of problem types to which they can be applied
• Select and apply a broad range of machine learning tools/implementations to real data
• Derive a (current) learning algorithm from first principles and/or justify a (current) learning
algorithm from a mathematical, statistical, or information-theoretic perspective
• Express formally the representational power of models learned by an algorithm, and relate
that to issues such as expressiveness and overfitting
• Exhibit knowledge of methods to mitigate the effects of overfitting and the curse of
dimensionality in the context of machine learning algorithms
• Provide an appropriate performance metric for evaluating machine learning algorithms/tools
for a given problem
• Apply appropriate empirical evaluation methodology to assess the performance of a machine
learning algorithm/tool for a problem
• Apply appropriate empirical evaluation methodology to compare machine learning
algorithms/tools to each other
• Implement machine learning programs from their algorithmic specifications
• Be aware of problems related to algorithmic and data bias, as well as privacy and integrity of
data
• Consider and evaluate the possible effects – both positive and negative – of decisions arising
from machine learning conclusions
• Compare differences in interpretability of learned models
Data Mining
Data mining involves the application of machine learning and statistical techniques to extrapolate
information from data.
Scope:
● Workflow of data mining and its relationship to data preparation and data management
● Data mining models for a variety of data models and applications
● Design and analysis of data mining algorithms for various data mining models
Competencies:
● Design data mining models for specific data models according to applications (Model
Design)
● Design a data mining system, including the system architecture, data process flow, and data
storage structure (System Design)
● Develop efficient and scalable data mining algorithms for specific data models and data
mining models, as well as data management platforms (Algorithm Development)
● Evaluate the significance and usability of data mining results to ensure that they may be
applied properly in real applications (Result Evaluation)
Big Data
The term ‘Big Data’ has been coined to describe systems that are truly large. These introduce
problems of scale: how to store vast quantities of data, how to be certain the data is of high
quality, how to process it efficiently, and how to derive insights that prove useful. These matters
are addressed below under the headings of problems of scale, complexity theory, sampling and
filtering, and concurrency and parallelism.
Problems of Scale
Scope:
● Approaches to storing vast quantities of data
● Ensuring clean, consistent and representative data
● Protecting and maintaining the data
● Retrieval issues
● Problems of computation and the efficiency of algorithms
● Specific techniques used in addressing the problems of scale
Competencies:
● Explain the role of the storage hierarchy in dealing with Big Data
● Demonstrate how redundancy may be removed from a Big Data set
● Illustrate the role of hashing in dealing with Big Data
● Illustrate the role of filtering in dealing with Big Data
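To illustrate the roles of hashing named above, the sketch below (record names and worker count are invented) uses a stable hash both to drop duplicates without comparing full records and to partition keys across workers:

```python
import hashlib


def partition(key: str, n_workers: int) -> int:
    """Stable hash partitioning: the same key always lands on the same worker."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_workers


records = ["user:alice", "user:bob", "user:alice", "user:carol"]
seen = set()
for r in records:
    h = hashlib.sha1(r.encode()).hexdigest()
    if h in seen:
        continue  # duplicate removed by fingerprint, not by full comparison
    seen.add(h)
    print(f"record {r!r} -> worker {partition(r, 4)}")
```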
Complexity Theory
Scope:
● The notion of computational complexity
● Limits to complexity
● Evaluation of the complexity of algorithms
● Selecting appropriate algorithms
Competencies:
● Explain why mathematical analysis alone is not always sufficient in dealing with efficiency
considerations
● Demonstrate how to evaluate the efficiency of an algorithm to be used in processing Big Data
● Select algorithms appropriate to a particular application involving Big Data, taking account
of the problems of scale
Sampling and Filtering
Scope:
● The role of sampling and filtering in the processing of Big Data
● Benefits of sampling / filtering
● Criteria to be used in guiding typical sample selection
Competencies:
● Perform sample selection for a particular application involving Big Data
● List a variety of approaches to filtering, illustrating their use
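A standard technique for sample selection over a stream too large to store is reservoir sampling; the sketch below (algorithm R, with arbitrary stream and sample sizes) keeps a uniform sample of k items without knowing the stream length in advance:

```python
import random


def reservoir_sample(stream, k: int):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)  # fill the reservoir first
        else:
            j = random.randrange(i + 1)  # item i survives with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample


print(reservoir_sample(range(1_000_000), 5))
```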
Concurrency and Parallelism
Scope:
● Concurrency and parallelism, and distributed systems
● Limitations of parallelism, including the overheads
● Differing approaches to addressing concurrency
● Complexity of parallel / concurrent algorithms
Competencies:
● Explain the limitations of concurrency / parallelism in dealing with problems of scale
● Identify the overheads associated with parallelism in particular algorithms
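The overhead point can be demonstrated directly; in the sketch below (the task and sizes are arbitrary) a four-process pool often loses to a serial loop on tiny tasks, because process start-up and argument pickling dominate the useful work:

```python
import time
from multiprocessing import Pool


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    data = list(range(200_000))

    t0 = time.perf_counter()
    serial = [square(x) for x in data]
    t1 = time.perf_counter()

    with Pool(4) as pool:  # parallel map across four worker processes
        parallel = pool.map(square, data)
    t2 = time.perf_counter()

    assert serial == parallel
    # For tiny per-item tasks, the parallel overheads often outweigh the gain.
    print(f"serial: {t1 - t0:.3f}s  parallel: {t2 - t1:.3f}s")
```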
Analysis and Presentation
The human-computer interface provides the means whereby users interact with computer
systems. The quality of that interface significantly affects usability in all its forms and can
encompass a vast range of technologies: animation, visualisation, simulation, speech, video,
recognition (of faces, of handwriting, etc.), and graphics. For the data scientist it is important to
be aware of the range of options and possibilities, and to be able to deploy these as appropriate.
Scope:
• Importance of effectively presenting data, models, and inferences to clients in oral, written,
and graphical formats
• Visualization techniques for exploring data and making inferences, as well as for presenting
information to clients
• Effective visualizations for different types of data, including time-varying data, spatial data,
multivariate data, high-dimensional multivariate data, tree- or graph-structured data, and text
• Knowing the audience: the client or audience for a data science project is not, in general,
another data scientist
• Human-computer interface considerations for clients of data science products
Competencies:
• Explain data and inferences made from data in oral, written, and graphical formats
• Use standard APIs and tools to create visual displays of data, including graphs, charts, tables,
and histograms
• Apply a variety of visualization techniques to different types of data; make useful inferences /
extract useful information from a dataset using those techniques
• Propose a suitable visualization design for a particular combination of data characteristics
and application tasks
• Analyze the effectiveness of a given visualization for a particular task
• Describe issues related to scaling data visualization from small to large data sets
• Be aware that the client (for an interface or presentation) is often not a data scientist
• For an identified client, undertake and document an analysis of their needs
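As a small example of a client-facing visual display, the sketch below uses matplotlib (the air-quality numbers are invented; the labelled axes and units are the point) to produce a static chart suitable for a report:

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
pm25 = [38, 35, 29, 22, 18, 15]  # illustrative air-quality readings

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(months, pm25, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("PM2.5 (µg/m³)")  # units keep the chart honest for the client
ax.set_title("Average particulate levels, city centre")
fig.tight_layout()
fig.savefig("pm25_trend.png")  # a static artifact for a client report
```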
Professionalism
In their technical activities, data scientists should behave in a responsible manner that brings
credit to the profession. One aspect of this is being positive and proactive in seeking to bring
benefit and doing so in a way that is responsible and ethical. Much of this is amplified in general
terms in [1]. The sections below highlight a number of relevant issues of specific concern to the
data scientist. A number of sub-areas are identified: continuing professional development,
communication, teamwork, economic considerations, privacy and confidentiality, ethical issues,
legal considerations, intellectual property, and automation.
Continuing Professional Development
The essence of being a professional is competence in certain aspects of data science. It is the
responsibility of the professional to undertake only tasks for which they are competent. There are
then implications for keeping up to date in a manner that is demonstrable to interested parties,
e.g. employers.
Scope:
● The meaning of competency and being able to demonstrate competency
● Acquiring expertise / mastery or extending competency; the role of journals, conferences,
courses, webinars
● Technological change and its impact on competency
● The role of professional societies in CPD and professional activity
Competencies:
● Justify the importance to the professional data scientist of maintaining competence
● Describe the steps that would typically have to be taken to extend competence or acquire
mastery, explaining the advantages of the latter
● Argue the importance of the role of professional societies in supporting career development
Communication
There are various contexts in which the data scientist is required to undertake communication
with very diverse audiences. That communication may be oral, written or electronic. There is the
need to be able to engage in discussion about the role that data science can play, to communicate
multiple aspects of the data science process with colleagues, to convey results that may lead to
change or may provide insights. Being able to articulate the need for change and being sensitive
to the consequences are important attributes. These activities may entail the ability to have a
discussion about limitations in a certain context and to suggest a research topic.
Scope Competencies
Teamwork
The data scientist will often act as a member of a team. This may entail being a team leader, or
supporting the work of a team. It is important to understand the nature of the different roles as
well as the typical dynamics of teams. So in terms of teamwork the data scientist needs to be able
to collaborate not only with data scientists with different tool sets but, in general, with a diverse
group of problem solvers.
Professionalism: Teamwork
Scope:
● Team selection, and the need to complement the abilities and skills of team members
● The dynamics of teams and team discipline
● Elements of effective team operation
Competencies:
● Document and justify the considerations involved in selecting a team to undertake a specific
data science investigation
● Recognise the qualities desirable in the team leader for a data science research investigation
Economic considerations
Data scientists need to be able to justify their own positions as well as the kind of activity in
which they engage.
Scope:
● The cost and value of high quality data sets, and the costs of maintenance
● Justifying data science activity in cost terms
● Estimating the cost of projects
● Promoting data science
● Automation
Competencies:
● Predict the value of a particular data set to an organization, taking into account any
requirement for maintenance
● Argue the case for the data that an organization should routinely gather
● Document the cost (in terms of resources generally) of collecting high quality data for a
particular purpose
● Justify, or otherwise, the creation of a particular data science activity within an organization
and quantify the cost
● Infer the value to an organization of undertaking a particular investigation or research project
● Document and quantify the resources needed to carry out a particular investigation in house
and compare this with outsourcing the activity
● Evaluate and justify the costs associated with the automation of a particular activity
Privacy and Confidentiality [xref: Data Privacy, Security, Integrity]
It is possible to gain access to data in a multitude of ways: by accessing databases, using surveys
or questionnaires, taking account of conditions of access to some resource, and even through
developments such as the Internet of Things, specialized sensors, video capture and surveillance
systems. Although gaining access to all kinds of information is important, this must be done
legally and in such a way that the information is accurate and the rights of individuals, as well as
organizations and other groups, are protected.
Scope:
● Freedom of information
● Data protection regulations, including GDPR – see [5]
● Privacy legislation
● Maintaining the confidentiality of data
● Threats to privacy and confidentiality
● The international dimension
Competencies:
● Describe technical mechanisms for maintaining the confidentiality of data
● Compare the privacy legislation in two specific countries, and indicate the problems arising
from the differences
● Recognize the privacy and confidentiality issues arising from the use of video and face
recognition software
Ethical issues
Ethical issues are of vital importance for all involved in computing and information
activities. Such issues are captured extensively in [1]. Underpinning these is the view that a
professional should undertake only tasks for which they are competent, and even then should
carry out such tasks in a way that reflects good practice in its many forms. Maintaining or
extending competence is essential. A heightened awareness of legal and ethical issues must
underpin the work of the data scientist.
Teaching students to consider the ethical issues associated with their decisions is a very
important starting point, enabling them to recognize themselves as “independent, ethical
agents.”
Scope:
● Confidentiality issues associated with data and its use
● The General Data Protection Regulation (GDPR) – see [5]
● The need for data to be truly representative
● Bias in data and in algorithms; mechanisms for checking
Competencies:
● Demonstrate techniques for establishing lack of bias in a set of data or in algorithms
● Create a technical paper on an aspect of data science for colleagues
● Reflect on a network of professionals in the data science area and outline the advantages to
be gained by joining such a network
Legal Considerations
Computer crime has continued to increase in both volume and severity over recent years,
bringing disruption, even chaos, to many organizations. The threat of computer crime cannot be
ignored, and steps need to be taken to counter the possibility of severe disruption. The law has
adjusted to counter these trends, but this remains an ongoing area of change and adjustment.
Scope:
● Computer crime – examples of most relevance to data science
● Cyber security
● Crime prevention
● Mechanisms for detecting criminal activity, including the need for diverse approaches
● Recovery mechanisms and maintaining 100% operation
● Laws to counter computer crime
Competencies:
● Illustrate and evaluate a range of mechanisms for detecting a stated form of criminal activity
● Justify the desirability of having multiple diverse approaches to countering threats
Intellectual Property (IP)
Intellectual Property rights such as copyright, patents, designs, trademarks and moral rights exist
to protect the creators or owners of creations of the human mind; moral rights include the right to
be named as a creator of IP, and the right to avoid derogatory treatment of creations. For the
data scientist the items to be protected, in possibly different ways, include software, designs
(including GUIs), data sets, moral rights and reputation. Trade secrets may also be relevant.
Scope Competencies
Strategic Change
One possible outcome of a data science investigation is that strategic change is needed in an
organization. The change suggested may be minimal at one extreme or transformational at the
other. The data scientist needs to be alert to the range of possibilities and, perhaps by engaging
other experts, be in a position to offer advice and guidance about how to move forward with such
change, to advise on the consequences, and to outline and quantify the resources that will be
required to deliver on the change.
Scope:
● The need for strategic change; the role of simulation
● Structural change, transformational change
● Strategies for delivering effective change, including top-down and bottom-up approaches
● Resource needs associated with change
● People issues associated with change: managing resistance to change, the role of human
resources and communication
● Monitoring the effectiveness of change
● The role of automation
Competencies:
● Justify with evidence the need for strategic change within an organisation and recognise the
nature of the change required (e.g. personnel change, structural change, transformational
change)
● Provide a set of feasible approaches for dealing with transformational change in a given
situation, quantify the resources needed, and highlight the benefits
● Outline a range of strategies for managing resistance to change
● Identify teamwork, leadership and personnel issues associated with change, including gaining
support from management, maintaining operation while implementing change, and
addressing loss of employment
● Recognise the issues associated with change brought about by automation (including ethical
issues), with the possible need for back-up
On Automation
Automation often creates concerns about loss of employment opportunities and, in general terms,
about machines behaving unreasonably; explanations from machines about their behavior may
be sought. Related issues are the subject of [3] and [6]. Automation can occur in critical
situations where serious loss may be possible, and then typically there is an expectation that
machines will operate to a code of ethics in sympathy with that of humans.
Professionalism: On Automation
Scope:
● Automation, its benefits and its justification
● The particular concerns of automation in critical situations
● Transparency and accountability in algorithms
Competencies:
● Analyze the impact on design of the requirement to provide insights into decisions made
autonomously by machines
● Argue the benefits of automation in particular situations
References
[1] The ACM Code of Ethics and Professional Conduct, published by ACM on 17th July 2018.
[3] ACM US Public Policy Council and ACM Europe Policy Council, “Statement on
Algorithmic Transparency and Accountability,” 2017.
[6] Simson Garfinkel, Jeanna Mathews, Stuart S. Shapiro, Jonathan M. Smith, Towards
Algorithmic Transparency and Accountability, Communications of the ACM, September
2017, vol. 60, no. 9, page 5.
Appendix B: A Summary of Survey Responses