
Non-Functional Requirements for Machine Learning: Challenges and New Directions

Jennifer Horkoff
Chalmers and the University of Gothenburg
[email protected]

Abstract—Machine Learning (ML) provides approaches which use big data to enable algorithms to “learn”, producing outputs which would be difficult to obtain otherwise. Despite the advances allowed by ML, much recent attention has been paid to certain qualities of ML solutions, particularly fairness and transparency, but also qualities such as privacy, security, and testability. From a requirements engineering (RE) perspective, such qualities are also known as non-functional requirements (NFRs). In RE, the meaning of certain NFRs, how to refine those NFRs, and how to use NFRs for design and runtime decision making over traditional software is relatively well established and understood. However, in a context where the solution involves ML, much of our knowledge about NFRs no longer applies. First, the types of NFRs we are concerned with undergo a shift: NFRs like fairness and transparency become prominent, whereas other NFRs such as modularity may become less relevant. The meanings and interpretations of NFRs in an ML context (e.g., maintainability, interoperability, and usability) must be rethought, including how these qualities are decomposed into sub-qualities. Trade-offs between NFRs in an ML context must be re-examined. Beyond the changing landscape of NFRs, we can ask whether our known approaches to understanding, formalizing, modeling, and reasoning over NFRs at design and runtime must also be adjusted, or can be applied as-is to this new area. Given these questions, this work outlines challenges and a proposed research agenda for the exploration of NFRs for ML-based solutions.

Index Terms—Non-Functional Requirements, NFRs, qualities, Machine Learning, Requirements Engineering

I. INTRODUCTION

Machine Learning (ML) describes a computational approach which uses large amounts of data to enable algorithms to “learn”, performing tasks which are difficult to achieve via standard software. This enables solving difficult problems such as recognizing images, diagnosing cancer, and estimating insurance [1]. Despite the advances allowed by ML, recently much attention has been paid to certain qualities of ML solutions, particularly fairness [2], but also transparency [3], security [4], privacy [5], and testability [6].

Much existing work has been devoted to understanding, decomposing, managing, formalizing and reasoning over qualities of typical non-ML software. Such qualities are often included as part of non-functional requirements (NFRs). These include well-studied NFRs such as performance, reliability, maintainability, and usability, but also security, privacy, and customer satisfaction [7], [8].

From the perspective of traditional software, the meaning of certain qualities, how to refine those qualities, and how to use such qualities for design and runtime reasoning is relatively well understood. However, when the software solution involves ML, some of our knowledge about NFRs may no longer apply. Fundamentally, the way in which we ‘design’, ‘run’, and ‘maintain’ ML-based solutions differs. The broad question of how SE methods and procedures can be adapted for ML-based solution development is already starting to be considered in venues such as the SEMLConf [9]. Here we focus particularly on methods for NFRs.

In particular, the nature of ML means that the meaning of many NFRs for ML solutions differs compared to regular software, and these NFRs are often not well understood (e.g., what is fairness? [10]). What does it mean for an ML-enabled system to be maintainable? Are NFRs such as compatibility and modularity still relevant? Some NFRs may have reduced importance for ML solutions compared to typical software. On the other hand, NFRs such as fairness [2] and transparency [3] have become critical from an ML perspective, whereas previous NFR work has not typically emphasized these dimensions. Further, as-yet-unexplored NFRs such as “retrainability” may also become relevant.

The complexity of NFRs has long been managed by refinement, e.g., security is typically refined to confidentiality, integrity, etc. Not only may the meaning of certain NFRs change in an ML context, but the refinements may also need to be rethought and updated. In typical NFR research, we are aware of common quality trade-offs, often called conflicting NFRs [7], e.g., security and performance. But recent work is only just beginning to explore quality trade-offs in the ML space [2]. Do known trade-offs still apply in the case of ML? Do new trade-offs exist?

In a traditional system, one can collect and implement many functional requirements (FRs). The overall function or purpose of an ML application is much more focused, e.g., recognize a face or diagnose a disease. Thus, there are far fewer FRs, and ML research has focused on the NFRs associated directly with those key FRs, e.g., accuracy of facial recognition, performance of diagnosis. Because an ML application has few FRs, one can argue that the effective satisfaction of NFRs becomes particularly critical. However, in practice, ML implementations will be integrated with more standard software as part of larger and more complex systems (e.g., in a self-driving car), which, as a whole, will have many complex FRs and NFRs.

In this paper, we consider whether traditional knowledge about NFRs and quality from a requirements perspective can apply to ML-based systems. We can view this knowledge
from two dimensions: 1) knowledge of NFRs, i.e., what are common and important NFRs, how they are interpreted, refined, and measured, and how they can conflict, and 2) methods for NFRs, i.e., catalogues of NFRs [8], modeling methods like the NFR framework [7], methods to reason over NFRs, e.g., [11], [12], to use NFRs to monitor software, e.g., [13], and to drive software adaptation and evolution, e.g., [14], [15].

In this work, we make the argument that the first dimension (1) must be at least partially re-thought in light of the rise of ML. Many of the ideas and techniques considering NFRs for traditional software (2) may still be valid, but it is possible that techniques may need revamping in light of this new paradigm, or that new, completely novel techniques are needed.

The next section outlines the state-of-the-art, followed by an illustrative and motivating ML example. Research challenges and an agenda are outlined, followed by a discussion and consideration of future work.

II. STATE-OF-THE-ART

A. NFRs in Requirements Engineering

Requirements Engineering (RE) research has long made the argument that eliciting and considering NFRs is critical for the success of systems [7]. Such systems could be technically sound, but fail due to issues in quality. Such an argument is particularly relevant for ML solutions, whose effectiveness lies mainly in the quality of the outcomes they provide.

What is an NFR? Simply, an NFR is any quality or attribute which is non-functional. This broad definition, defining something critical in terms of what it is not, is not ideal, as has been discussed by several authors, e.g., [16], [17]. Our purpose here is not to define NFRs in a satisfactory way, but to explore their application to ML. The concept of quality has had better luck in terms of a precise definition, being covered by several prominent ontologies, e.g., DOLCE [18]. More recent work in RE uses ideas from DOLCE to treat NFRs as qualities over an entity [19], usually a functional requirement, the system, or a system component, e.g., “send mail (entity) quickly (quality)” or “the system (entity) should be secure (quality)”.

Although qualities of ML solutions and NFRs for ML solutions are similar, technically one can think of NFRs as the requirements over the quality, e.g., the quality is usability of system X, while the NFR is “System X must be usable”, which ideally should be defined in a measurable way, e.g., “90% of test users would rate the system as an 8/10 in terms of usability”. Although there is a distinction, for simplicity, in this work we treat NFRs and qualities as synonyms.

NFR Catalogues. To facilitate a consideration of NFRs, catalogues of software qualities were created. For example, the ISO/IEC 25010 standard divides system/software product quality into eight categories, including performance efficiency, compatibility, usability, and security [8]. Each quality is further decomposed; e.g., compatibility is refined into co-existence and interoperability. Such catalogues provide iterative refinement of NFRs into sub-qualities, possibly sub-sub-qualities, sometimes down to measurable indicators, when possible.

Languages and Reasoning. Much work focuses on capturing NFRs in visual modeling languages, sometimes with an underlying metamodel and semantics, facilitating (semi-)automated qualitative and quantitative methods to support decision making, e.g., [7], [11], [12]. Usually, approaches allow users to use NFRs to select among possible alternative functional requirements, e.g., given FRs and NFRs, many of which are in conflict, which requirements should we implement?

Runtime, Adaptation, and Evolution. NFR approaches were extended to consider a requirements-based view of runtime system operation, where functional and quality requirements could be monitored at runtime, based on data from the running system, e.g., [13], [20]. Work in this area went further to consider requirements-based runtime adaptation, e.g., a certain quality aspect is not sufficiently satisfied at runtime, thus the system will evolve and adjust to try to gain better performance or quality, all while considering quality trade-offs [14], [15].

Linking Data to Quality. A related line of work uses an adaptation of common requirements notations to link business data to organizational goals, including qualities [21], allowing for continuous goal-based business intelligence. More recent work focuses on the design of data analytic systems for business, which may include ML algorithms [22], [23]. This work focuses on finding designs which fit domain-specific analytic questions, considering aspects of quality performance for various ML options. In this case, the authors adapt existing RE languages to consider data analytics at the syntax level (no formal semantics or metamodel), and they use existing analysis procedures without modification.

B. Qualities for Machine Learning

ML encompasses over a dozen algorithm types (e.g., Regression, Bayesian, Instance-based, Deep Learning, Neural Networks), with many more specific algorithms (e.g., Logistic Regression, Linear Regression, Naïve Bayes, Nearest Neighbor) [1]. Most work on ML topics provides examples and algorithm details, including performance results, but does not focus on a wide range of NFRs or quality aspects. We summarize a selection of current work considering NFRs for ML in the following.

Accuracy & Performance. Most ML work reports on algorithm accuracy (often precision and recall), i.e., how “correct” the output is compared to reality. Further work looks more broadly at algorithm performance (e.g., [24]), including comparisons of performance in specific contexts (e.g., [25]).

Fairness. Recent work has focused on technical solutions to make ML algorithms more fair, finding that the removal of sensitive features (e.g., race, gender) is not sufficient to ensure fair results, and considering the trade-off between fairness and other NFRs [2]. Work in this area has attempted to find mathematical or formal definitions of fairness, e.g., statistical parity and individual fairness, and has found that the accurate implementation of fairness depends more on how fairness is defined and measured than how it is implemented [26]. Empirical work has asked practitioners about their needs for
fairness in ML, finding that in practice, engineers want to consider the side effects of fairness and see fairness in the context of the broader system [27].

Transparency. Although the results of ML can have significant real-world impact, it is often not clear how these results are derived, causing issues in trust and transparency. Work has begun to look at better explaining ML results [3], [28] to try to mitigate this issue.

Security & Privacy. Efforts have been made to address privacy concerns when using big (often personal) data to facilitate ML. Work in [4] introduces protocols for preserving privacy in various ML approaches, and explicitly acknowledges the trade-off in terms of algorithm speed when revising techniques for privacy. Similarly, Bonawitz et al. introduce a method to preserve privacy in ML, focusing on keeping the overhead in terms of runtime and communication low [5]. Papernot et al. recognize the increase in ML-related security and privacy threats and create a threat model for ML [29].

Testability. Work exists which considers systematically testing the outcome of ML systems (e.g., [6]). However, the majority of work focuses on the other direction, applying ML to improve software testing strategies (e.g., [30]).

Reliability. Further work has considered reliability in ML, e.g., looking at the reliability of individual ML predictions, focusing on reliability estimation [31].

Other NFRs for ML, such as sustainability or maintainability, have not seen significant attention. In most cases, one can find work applying ML techniques for prediction of the NFR, e.g., to predict maintainability [32], but not work considering the NFR as it applies to ML. Similarly, efforts like the AIRE workshop series focus on applying AI to RE, and typically not the other direction.

From a broader perspective, there are efforts to apply SE techniques to the application of ML [9], with a focus on reliability, testing and evolution. As far as we are aware, there is no unified collection or consideration of many NFRs for ML, including a consideration of ML-specific quality trade-off data. Current work consists of only individual considerations of specific quality trade-offs, e.g., privacy vs. processing time. Similarly, we are not aware of approaches for explicitly monitoring ML implementations at runtime, or considerations of what exactly runtime monitoring may mean in this context.

[Fig. 1. Example Solution Space Representation evaluating Nearest Neighbor against Relevant Quality attributes (Simplified and Incomplete). The figure shows layers for Problem Type (Classification, with Binary and Multi-class; Regression; Novelty Detection; ...), Algorithm Characteristics (Supervised, Unsupervised, Semi-supervised, Active, Reinforcement, ...), Algorithm Type (Regression, Bayesian, Instance-based, ...), and specific Algorithms (Logistic, Linear, Naïve Bayes, Nearest Neighbour, ...) annotated with Assumptions (A) and Optimizations (O), connected to Qualities (Accuracy, Storage, Time, Complexity, ...) and Implementation.]

III. NFRs FOR MACHINE LEARNING EXAMPLE

As a concrete example of an ML solution and its associated qualities, consider an airport which may screen passengers against images of people of interest; here a precise match is desirable, as the cost of misidentification is relatively high, yet the processing time may be relatively slow (e.g., 20 seconds), given the time one takes to get through security. These desired quality requirements should be captured and considered through the solution lifetime.

In order to understand what ML solutions may meet our quality requirements, we can attempt a relatively straightforward application of existing frameworks for NFR modeling, such as the NFR Framework [7] or iStar [11]. We do so using layers to separate different ML concepts in Fig. 1. We decompose available algorithms by problem type (classification, regression, etc.), then by algorithm characteristics (supervised, unsupervised, etc.), and by algorithm types (regression, Bayesian, etc.). The hierarchy and classification is as yet incomplete, with the dimensions of incompleteness indicated by ellipses (...). We add a simple syntax addition to capture algorithm assumptions (A) and optimizations (O), e.g., the Naïve Bayesian algorithm assumes that the probabilities of all relevant events are independent, and optimizations such as Laplace Smoothing can be applied to account for unseen observations. The Nearest Neighbor algorithm can apply optimizations such as data projection or weighting the distances between neighbors.

We can collect information on quality performance from existing sources. In some cases, such data is available generally; e.g., Nearest Neighbor without any optimizations has a comparatively high running time, but this decreases with a data projection optimization [1]. Other comparisons are more contextual, but often provide more specific findings; e.g., when applied to identify spam email, a variant of Nearest Neighbor has a high accuracy (∼97%) [33]. We can use existing analysis procedures such as [12] to evaluate an option in the model, such as Nearest Neighbour (in dark purple), against relevant qualities (bottom row). In this model, the evaluated algorithm solution, Nearest Neighbour, may be suitable for the airport recognition case due to its high accuracy and lower speed.

Although this example helps to illustrate the use of NFR-based RE techniques for ML understanding and selection, many questions remain. Here we stick to well-known and easily measurable qualities such as accuracy and speed (Time). What if we had included fairness, transparency or security? Here we show only one level of qualities, but how do these qualities decompose into sub-qualities? How can these qualities be decomposed to measurements? We have also found some information linking the technical space to quality performance, but this information is scattered and has significant gaps. The notion of trade-offs here (e.g., accuracy vs. time) is covered only implicitly, and we are similarly lacking data to feed into models of this type. Finally, this model considers only design-time decisions, not covering runtime monitoring, or re-design or evolution in the face of change. We outline these challenges further in the next section.

IV. CHALLENGES

Inspired by our example, and given our understanding of ML and of related work, we outline challenges related to the treatment of NFRs for ML. The first four challenges relate to knowledge of NFRs (1), while the last three focus on methods (2). We acknowledge that this initial list of challenges may be incomplete.

C1. Our understanding of NFRs for ML is fragmented and incomplete, including how to define and refine NFRs in ML-specific contexts.

Take fairness as an example. The general public are beginning to realize that ML algorithms may have an influence on critical decisions, and that such decisions may enforce unintended biases towards various vulnerable groups [34]. ML researchers have heard the call, and are working to find technical solutions to account for fairness (e.g., [2]). But what is fairness in these cases? How can it be effectively defined in ways understandable by ML? Does this definition change depending on the ML approach used (e.g., linear regression, neural networks)? What type of fairness is needed? How does this change depending on the domain and context? We can ask similar questions for all of these qualities in an ML context, e.g., what constitutes maintainability of an ML solution? Portability? Usability? Similarly, how are these qualities refined in an ML context, and does this refinement echo the non-ML cases?

C2. Our knowledge of how various ML algorithms, along with their optimizations and assumptions, affect relevant ML qualities, including trade-offs among qualities, is incomplete and fragmented.

The new meanings, refinements, and contexts of NFRs as applied to ML mean that our known space of NFR conflicts will have to be readjusted. Recent work has already begun to explore conflicts between specific NFRs in ML, e.g., fairness trade-offs can include performance or accuracy [2], and privacy-preserving ML involves trade-offs with algorithm speed [4], but further work may find trade-offs in other, unanticipated quality dimensions such as reusability or testability. How does one balance trade-offs between these critical qualities in ML solutions? Such questions will become pressing as ML becomes more widespread.

C3. Given our new understanding of the meaning and refinements of NFRs, we must also understand how NFRs can be measured in practice for ML-based solutions.

In typical SE, work on software metrics has established a large body of possible metrics, linked to desired system quality, measurable at design or runtime, e.g., cross-references between components indirectly measure modularity, or the number of runtime errors measures reliability. Some of these metrics can be applied as-is to ML solutions, while others must be reconsidered and rethought in an ML context, based on revised definitions of qualities, e.g., modifiability. Simple measurements like “average time to implement a change to the code” will no longer apply.

C4. We need to understand the effects of ML algorithms on desired qualities not only during ML solution design, but at runtime, during the lifetime of the ML solution.

In an ML context, runtime monitoring presents new challenges in understanding how to measure relevant qualities such as fairness or privacy as ML-supported decisions are made in practice. This is particularly challenging due to the nature of ML implementations, as transparency concerning the inner workings of such algorithms is an open challenge.

C5. ML researchers and users currently lack an ML-specific way to express and specify quality requirements for ML, including targets and trade-offs, and the influence of domain context.

Although many existing NFR-aware approaches can be reused in an ML context (see our Section III example), we anticipate that the rise of ML will reveal a variety of new concepts and challenges (e.g., training data, retraining, optimizations, networks, supervision) that will change the content and design of languages capturing NFRs for ML systems. Similarly, approaches supporting decision making using these languages will have to be adapted, not only as the underlying concepts are updated, but as the type of decisions and the space of alternatives differ, e.g., the question is no longer “which requirements do I implement?”, but “which algorithm type, with what characteristics, assumptions, training data, and optimizations do I use?”

C6. We need to understand how evolution, both in terms of available training data and in terms of quality requirements and thresholds, may affect our ML solutions. How do we use our knowledge of ML quality performance to understand when to re-train? When to modify our solutions?

If we can understand how to define, refine and measure the quality of ML solutions, monitoring quality over ML-based systems at runtime, we can understand when quality thresholds reach a point such that changes must be made. This can be changes in terms of available data, or changes in the requirements, particularly quality requirements or permissible quality levels. Existing approaches for software evolution for non-ML systems will need to be revisited and updated for an ML context.

C7. We need to understand how ML-based solutions integrate with typical software from a quality perspective.

Increasingly, data scientists and software engineers must work together to produce long-term, holistic, complex software solutions which include ML components. Thus, NFRs must be considered not only over ML components, with their unique quality definitions, refinements, and trade-offs, but also over a combination of ML solutions and typical software.
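To make the measurement problem of C1 and C3 concrete, one of the candidate fairness definitions named in Section II, statistical parity, can be reduced to a measurable indicator and paired with an NFR threshold. The following sketch is our own illustration, not taken from the cited works; the function name, the example decision data, and the 0.1 threshold are all hypothetical:

```python
def statistical_parity_difference(predictions, group):
    """Difference in positive-outcome rates between two groups.

    predictions: list of 0/1 model decisions.
    group: aligned list of 0/1 protected-attribute values.
    A value near 0.0 suggests statistical parity under this metric.
    """
    in_group = [p for p, g in zip(predictions, group) if g == 1]
    out_group = [p for p, g in zip(predictions, group) if g == 0]
    return sum(in_group) / len(in_group) - sum(out_group) / len(out_group)

# Hypothetical screening decisions for eight passengers.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
grp = [1, 1, 1, 1, 0, 0, 0, 0]  # illustrative protected attribute

spd = statistical_parity_difference(preds, grp)
print(spd)  # 0.75 - 0.25 = 0.5

# An NFR over this quality is then a measurable, checkable statement:
FAIRNESS_THRESHOLD = 0.1  # illustrative permissible disparity
print(abs(spd) <= FAIRNESS_THRESHOLD)  # False: the requirement is not met
```

Note that the metric itself is the easy part; as C1 argues, whether statistical parity is the right definition at all depends on the domain, the ML approach, and the broader system context.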
V. RESEARCH DIRECTIONS

Given the challenges outlined in the previous section, here we outline a number of research objectives and plans which may address these challenges. Given the size of the problem, there are a wide variety of possible approaches which may address these challenges. Our suggestion is one of many possible approaches, and may be expanded as further challenges arise.

Obj1. Explore and define NFRs for ML, including possible interpretations and refinements of prominent NFRs in a variety of exemplar contexts.

Survey ML literature for NFRs. There is a need for one or more systematic literature reviews to find NFR-related definitions and refinements for ML. This should include relevant contextual information (e.g., the type of problem solved, the algorithm applied, and optimizations) which may affect the nature of the NFR decomposition and definition. For example, sustainability may have a different meaning when applying ML in a medical domain as compared to application for autonomous driving.

Ask ML experts about NFRs. One could use empirical methods like surveys or interviews to derive knowledge from ML experts or researchers, asking questions like: “What are important qualities for ML solutions? In what context? How are they refined?”

Consider Existing NFR Refinements. Taking sources such as [8] or various NFR catalogues, consider if and in what contexts such NFRs and their refinement may apply to ML. This may also involve elicitation from ML experts, asking them to confirm or deny the applicability of established knowledge to ML. Via these efforts, NFR knowledge for ML would be derived bidirectionally, from ML to RE and back.

Obj2. Using the output of Obj1, create a catalogue of NFRs for ML.

Build NFR for ML Catalogue. The SLRs conducted in Obj1 will have produced much information in terms of NFRs for ML, and this can be captured in a publicly available catalogue. The catalogue should be made available online in a format that allows feedback and comments.

Link to empirical knowledge. The empirical findings found as part of Obj1 can be captured in the catalogue, including links to sources and data concerning quality performance of particular ML approaches in certain contexts.

Crowd sourcing. In order to improve the completeness and accuracy of the catalogue, it may be possible to collect feedback via crowd sourcing, e.g., from ML-related conferences and groups. The output can be used not only for research but also for educational purposes.

Obj3. Collect operationalizations and measures of NFRs for ML refinements, when possible.

Focus on possible measurements of ML qualities, including potential contextual variance of measurements. In addition to searching through ML-related literature, one should explore relevant literature in software quality and quality metrics which may apply to ML NFRs.

Obj4. Develop a language and representation for expressing NFRs specifically for ML solutions and required concepts.

Develop conceptual underpinning. Based on the surveys conducted previously, researchers can produce an ontology of concepts needed for expressing NFRs over ML. This can be based on existing quality and NFR ontologies (e.g., [19]), but adapted and adjusted for ML-specific concepts and needs based on the findings of Obj1. For comprehension and extension, the outcome can be captured as a visual metamodel.

Develop semantics. A formal semantics should be developed for the concepts, as appropriate. This will facilitate precise modeling and reasoning as in many existing RE approaches, and should be inspired by existing guidelines such as [35].

Develop graphical and textual syntax. Work in [22], [23] has already made progress in this subgoal. In order to be usable, language instance models should be readable and writable in a variety of forms, including text, tables and graphical syntax, using principles, for example, from the Physics of Notations [36].

Obj5. Develop methods to reason over NFRs for ML at design time, making appropriate trade-offs to inform implementation decisions, and at runtime, continually monitoring achievement of critical qualities.

Develop reasoning. At system design time, we want to select ML solutions which make a good context-based trade-off between desired NFRs. Using the concepts and semantics developed thus far, and inspired by many existing approaches, one can develop procedures to allow for interactive and automated trade-off analysis, helping users to design an ML solution which adequately accounts for NFRs. When accurate and meaningful design-time data is available, approaches should favor quantitative reasoning, providing numerical information about NFR achievement. A mix of quantitative and qualitative reasoning can be supported, as in [21], for example, to deal with incomplete data.

Develop runtime reasoning. One can make use of the measurements collected, along with the reasoning procedures developed, to facilitate runtime monitoring and reasoning over ML qualities, i.e., to understand how sub- and super-qualities are satisfied as the ML solution runs.

Obj6. Create methods to deal with changing data and changing quality requirements, including triggers based on measurement thresholds, and a process for periodically re-evaluating ML quality needs. Recent work concerning software adaptation and evolution can feed into the satisfaction of this objective.

Methods for changing NFRs. When quality requirements change, there must be procedures and methods in place to guide changes to the ML-based systems. This may include eliciting thresholds and deciding when and if to make changes, depending on desired and current quality values.

Methods for changing data. As the sources, quality, and content of data available for ML change, methods are needed to guide system changes. This includes when to change data sources, or perhaps ML algorithm choices, and how often to retrain based on current data.
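The threshold-based triggers envisioned in Obj5 and Obj6 could be prototyped as a simple runtime quality monitor. The sketch below is our own minimal illustration under stated assumptions: a single scalar quality measure, a windowed average as the trigger condition, and illustrative class, parameter, and threshold names that do not come from any existing framework:

```python
class QualityMonitor:
    """Tracks one measurable ML quality over a sliding window and
    flags when the permissible quality level is no longer met."""

    def __init__(self, name, threshold, window=5):
        self.name = name
        self.threshold = threshold  # minimum acceptable level (elicited per Obj6)
        self.window = window        # number of recent measurements considered
        self.values = []

    def record(self, value):
        """Record a runtime measurement, keeping only the recent window."""
        self.values.append(value)
        self.values = self.values[-self.window:]

    def needs_attention(self):
        """True once the window is full and its mean falls below threshold,
        signalling a possible retrain or redesign trigger."""
        if len(self.values) < self.window:
            return False
        return sum(self.values) / self.window < self.threshold

# Illustrative use: monitoring accuracy of the airport matching example.
acc = QualityMonitor("accuracy", threshold=0.95, window=3)
for measured in [0.97, 0.95, 0.90]:
    acc.record(measured)
print(acc.needs_attention())  # True: recent mean accuracy is below 0.95
```

A fuller realization would monitor several qualities at once (accuracy, fairness, time) and feed such triggers into the trade-off reasoning of Obj5 rather than reacting to each quality in isolation.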
VI. DISCUSSION AND CONCLUSIONS
Although our agenda is preliminary, to our knowledge this is the first RE-oriented broad consideration of many types of qualities over ML solutions. We aim to bring NFRs for ML to the forefront, facilitating early consideration, definition, and trade-off analysis, raising awareness of ML performance in terms of such qualities, and understanding how this information will lead to necessary changes.
The presented agenda opens significant opportunity for future work. The quality findings and catalogue will reveal gaps in ML knowledge, particularly in defining and refining quality characteristics for ML algorithms, as well as in empirical data reporting on quality trade-offs. This can lead to comparative empirical work evaluating ML algorithms against different quality attributes in different contexts. Gaps in terms of ML quality achievement will also spearhead new technical efforts to improve ML techniques to better satisfy such qualities, similar to the effort of [2] for fairness and [4] for privacy. Just as ML qualities such as fairness and transparency are currently receiving much attention, other NFRs such as sustainability and modifiability may receive similar attention. Execution of the proposed agenda paves the way for considering these and other critical NFRs in an ML context.
REFERENCES
[1] A. Smola and S. Vishwanathan, Introduction to Machine Learning. Cambridge University Press, 2008.
[2] T. Kamishima, S. Akaho, and J. Sakuma, "Fairness-aware learning through regularization approach," in 2011 IEEE 11th International Conference on Data Mining Workshops. IEEE, 2011, pp. 643–650.
[3] O. Biran and C. Cotton, "Explanation and justification in machine learning: A survey," in IJCAI-17 Workshop on Explainable AI (XAI), 2017, p. 8.
[4] P. Mohassel and Y. Zhang, "SecureML: A system for scalable privacy-preserving machine learning," in 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017, pp. 19–38.
[5] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, "Practical secure aggregation for privacy-preserving machine learning," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2017, pp. 1175–1191.
[6] X. Xie, J. W. Ho, C. Murphy, G. Kaiser, B. Xu, and T. Y. Chen, "Testing and validating machine learning classifiers by metamorphic testing," Journal of Systems and Software, vol. 84, no. 4, pp. 544–558, 2011.
[7] L. Chung, B. A. Nixon, E. Yu, and J. Mylopoulos, Non-Functional Requirements in Software Engineering. Springer Science & Business Media, 2000, vol. 5.
[8] ISO/IEC, "ISO/IEC 25010 - Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - System and software quality models," Tech. Rep., 2010.
[9] SEMLA, "Software engineering for machine learning applications (SEMLA) initiative," https://semla2018.soccerlab.polymtl.ca/program/, 2018, accessed: 2018-07-14.
[10] R. Binns, "Fairness in machine learning: Lessons from political philosophy," arXiv preprint arXiv:1712.03586, 2017.
[11] E. S. Yu, "Towards modelling and reasoning support for early-phase requirements engineering," in Proceedings of ISRE'97: 3rd IEEE International Symposium on Requirements Engineering. IEEE, 1997, pp. 226–235.
[12] D. Amyot, S. Ghanavati, J. Horkoff, G. Mussbacher, L. Peyton, and E. Yu, "Evaluating goal models within the goal-oriented requirement language," International Journal of Intelligent Systems, vol. 25, no. 8, pp. 841–877, 2010.
[13] Y. Wang, S. A. McIlraith, Y. Yu, and J. Mylopoulos, "Monitoring and diagnosing software requirements," Automated Software Engineering, vol. 16, no. 1, p. 3, 2009.
[14] V. E. S. Souza, A. Lapouchnian, K. Angelopoulos, and J. Mylopoulos, "Requirements-driven software evolution," Computer Science-Research and Development, vol. 28, no. 4, pp. 311–329, 2013.
[15] F. Dalpiaz, P. Giorgini, and J. Mylopoulos, "Adaptive socio-technical systems: a requirements-based approach," Requirements Engineering, vol. 18, no. 1, pp. 1–24, 2013.
[16] M. Glinz, "On non-functional requirements," in 15th IEEE International Requirements Engineering Conference (RE 2007). IEEE, 2007, pp. 21–26.
[17] L. Chung and J. C. S. do Prado Leite, "On non-functional requirements in software engineering," in Conceptual Modeling: Foundations and Applications. Springer, 2009, pp. 363–379.
[18] A. Gangemi, N. Guarino, C. Masolo, A. Oltramari, and L. Schneider, "Sweetening ontologies with DOLCE," in International Conference on Knowledge Engineering and Knowledge Management. Springer, 2002, pp. 166–181.
[19] F.-L. Li, J. Horkoff, J. Mylopoulos, R. S. Guizzardi, G. Guizzardi, A. Borgida, and L. Liu, "Non-functional requirements as qualities, with a spice of ontology," in 2014 IEEE 22nd International Requirements Engineering Conference (RE). IEEE, 2014, pp. 293–302.
[20] F. Dalpiaz, A. Borgida, J. Horkoff, and J. Mylopoulos, "Runtime goal models: Keynote," in Research Challenges in Information Science (RCIS), 2013 IEEE Seventh International Conference on. IEEE, 2013, pp. 1–11.
[21] J. Horkoff, D. Barone, L. Jiang, E. Yu, D. Amyot, A. Borgida, and J. Mylopoulos, "Strategic business modeling: representation and reasoning," Software & Systems Modeling, vol. 13, no. 3, pp. 1015–1041, 2014.
[22] S. Nalchigar and E. Yu, "Business-driven data analytics: a conceptual modeling framework," Data & Knowledge Engineering, vol. 117, pp. 359–372, 2018.
[23] S. Nalchigar and E. Yu, "Designing business analytics solutions," Business & Information Systems Engineering, Aug 2018.
[24] M. Sokolova, N. Japkowicz, and S. Szpakowicz, "Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation," in Australasian Joint Conference on Artificial Intelligence. Springer, 2006, pp. 1015–1021.
[25] R. Caruana and A. Niculescu-Mizil, "An empirical comparison of supervised learning algorithms," in Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006, pp. 161–168.
[26] S. Corbett-Davies and S. Goel, "The measure and mismeasure of fairness: A critical review of fair machine learning," arXiv preprint arXiv:1808.00023, 2018.
[27] K. Holstein, J. W. Vaughan, H. D. III, M. Dudík, and H. M. Wallach, "Improving fairness in machine learning systems: What do industry practitioners need?" CoRR, vol. abs/1812.05239, 2018. [Online]. Available: http://arxiv.org/abs/1812.05239
[28] A. Datta, S. Sen, and Y. Zick, "Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems," in 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 2016, pp. 598–617.
[29] N. Papernot, P. McDaniel, A. Sinha, and M. Wellman, "Towards the science of security and privacy in machine learning," arXiv preprint arXiv:1611.03814, 2016.
[30] C. Murphy, G. E. Kaiser, and L. Hu, "Properties of machine learning applications for use in metamorphic testing," 2008.
[31] Z. Bosnić and I. Kononenko, "An overview of advances in reliability estimation of individual predictions in machine learning," Intelligent Data Analysis, vol. 13, no. 2, pp. 385–401, 2009.
[32] R. Malhotra and A. Chug, "Software maintainability prediction using machine learning algorithms," Software Engineering: An International Journal (SEIJ), vol. 2, no. 2, 2012.
[33] I. Androutsopoulos, G. Paliouras, V. Karkaletsis, G. Sakkis, C. D. Spyropoulos, and P. Stamatopoulos, "Learning to filter spam e-mail: A comparison of a naïve Bayesian and a memory-based approach," arXiv preprint cs/0009009, 2000.
[34] R. Hauser, "Can we protect AI from our biases?" https://tinyurl.com/y9lvqxc8, March 2019.
[35] I. Jureta, The Design of Requirements Modelling Languages: How to Make Formalisms for Problem Solving in Requirements Engineering. Springer, 2015.
[36] D. Moody, "The physics of notations: toward a scientific basis for constructing visual notations in software engineering," IEEE Transactions on Software Engineering, vol. 35, no. 6, pp. 756–779, 2009.