The Abstraction First Approach To Data A
PERGAMON Computers & Education 31 (1998) 135–150
Abstract
Experience in industry suggests that reuse does not happen without retraining. However, since reuse is
meant to simplify programming, this paper argues the case for re-ordering a traditional data structures
and algorithms course, using an object-oriented language, so that it starts from abstraction and reuse,
and postpones coding from scratch as far as possible. The intention is that reuse should be learnt before
other strategies, so coding from scratch does not have to be unlearnt before reuse seems natural. The
paper presents experience with a restructured abstraction-first course, and proposes that an essential tool
for such a strategy is a set of scaled-down libraries and frameworks, designed for teaching. Compared
with a Modula-2-based course and an earlier C++-based course in which concepts were presented in a
different order, more ground was covered, without a major change in the students' results (grades).
© 1998 Elsevier Science Ltd. All rights reserved.
1. Introduction
It is widely recognized that practicing reuse does not happen automatically; some form of re-
education is necessary, if programmers have been schooled in more traditional coding styles
(Auer, 1995; Berg, Cline, & Girou, 1995; Fayad & Tsai, 1995; Frakes & Fox, 1995). So why is
reuse hard? Is not object-oriented programming, complete with reusable class libraries and
frameworks, meant to make programming easier?
Indeed, Brad Cox (1991) has gone as far as to propose that almost all programming should
be based on reusable object-oriented components. Is component software to be restricted to
user-level components, as in OpenDoc (Adler, 1995), or is Cox right? Could the problem be in
the difficulty in unlearning previously taught non-reuse strategies?
In the early days of structured programming, many educators struggled to find the best order
for presenting material. Early Pascal texts tended to start with the parts of the language that most
resembled FORTRAN, and gradually worked their way around to the "new" features: named
types, proper procedures and functions with local scope, the use of procedural abstraction to
reduce the complexity of code. Gradually, authors came round to the view that procedural
abstraction could and should be developed from the start: procedures and functions moved to
Chapter 1, user-defined types appeared early, and monolithic main programs disappeared.
Although object-oriented programming is not a new concept, its acceptance in the
mainstream is still relatively new. Consequently, problems of presentation order in course
materials are still likely to be an issue.
Many C++ books, for example, do not introduce code reuse at all; it is quite common
practice for abstraction mechanisms (classes, templates) to be either introduced late, or purely
as ways to implement classes from scratch (Budd, 1994; Coplien, 1992; Deitel & Deitel, 1994; Ford &
Topp, 1996; Headington & Riley, 1994; Lippman, 1991; Sedgewick, 1992; Stroustrup, 1991;
Wang, 1994; Wiener & Pinson, 1990).
If reuse really simplifies coding, should not reuse be at the start of any course involving
object-oriented programming, with skills such as design and coding from scratch only
introduced later?
In this paper, I present an alternative ordering, which I call abstraction-first, of concepts in
an object-oriented revision of a traditional data structures and algorithms course, which aims
to develop a reuse habit before other styles of programming are developed. Although the
discussion is based on transition from Modula-2 to C++, in principle, the issues I raise could
apply to transition from any procedural to any object-oriented language. In fact, some of the
findings I present in this paper relate to differences between the final C++ course and an
initial attempt which did not use the abstraction-first approach.
Just as later structured programming or Pascal texts (Garland, 1986; Koffman, 1992;
Tenenbaum & Augenstein, 1986) started with procedures, user-defined types, and so on, I
argue that C++ or object-oriented programming texts should be ordered in the way I propose
in this paper.
My abstraction-first ordering requires that students first be well schooled in the idea of
abstraction, starting from real-world examples, and then a virtual machine such as a good user
interface design (the Apple Macintosh, for example). The next step is to introduce object-oriented
code as a way of building programs from building blocks, rather than from scratch. Once the
essential idea of classes and objects is established, libraries and frameworks can be introduced. To
add substance to the discussion, toy libraries and frameworks are useful educational tools, just
as a toy machine can be a useful tool for teaching computer architecture, or a toy language can be
useful for teaching programming concepts. The next level of complication is the idea of
incomplete types, in the form of templates, which are introduced through a library of container
classes. Once this background is built up, it becomes possible to branch out to design decisions:
object-oriented design can be introduced, along with test strategies, algorithm analysis and
methods of choosing data structures. At early stages of the course, relatively little detailed coding
is required of the students, but availability of a good selection of reusable classes and templates
is essential. Eventually, algorithm analysis can become a springboard for principles of designing
containers, as well as classic sorts and searches. By the end of the course, the students should have
learnt that the default programming strategy is to reuse, but they should have the concepts to
start from scratch if need be. Better still, since they will be thoroughly schooled in reuse, if they
do code from scratch, they are more likely to think in terms of good abstractions that can be
reused.
P. Machanick / Computers & Education 31 (1998) 135–150
Can my alternative be evaluated?
I have run a second-year computer science course on the basis outlined in this paper. Short-
term evaluation shows that the class has responded well to the course, in that their results are
comparable with those of the previous year, which introduced C++ in a more conventional
fashion, despite more material being covered. However, proper evaluation can only take place
through work place studies, after students who have been schooled in my approach have
attempted to work on a real project. Only if they respond well to reuse in the real world can I
ultimately claim success. Evaluation by seeing how well the students cope with more advanced
courses would be difficult, since the introduction of the new course was followed by changes in
later courses. However, a positive outcome is that it was possible to revise a programming
languages course the same students did in the following year to include more sophisticated
material.
For now I believe my ideas are of sufficient interest to be worthy of further exposure.
In the remainder of the paper I outline the differences in content between the object-oriented
data abstraction and algorithms course and its predecessor, a Modula-2 data structures and
algorithms course. I go on to describe design decisions for the new course, based on the
abstraction-first principle, and then discuss ways in which C++ inhibits implementation of an
abstraction-first ordering. To add substance to the discussion, I then present more detail of the
content of the new course, and discuss how introducing the abstraction-first approach made it
possible to cover more ground than with a more conventionally ordered C++-based course,
which I presented the year before.
In conclusion, I weigh up the new course against the previous Modula-2 course, as well as
my previous C++ course, and propose a strategy for future authors of object-oriented texts.
Our old Modula-2 course was called Advanced Programming (AP); the new C++-based
course is called Data Abstraction and Algorithms (DAA).
The two courses cover substantially the same ground, except that the new course is heavily
object-oriented (a topic touched on briefly in the old course), and contains stronger reference to
software engineering. The AP course was run over 7 weeks; to accommodate the extra content,
DAA is run over 10 weeks.
Here is an extract from the course outline for AP:
The objectives of the topic include gaining skills in designing and analyzing algorithms,
becoming familiar with advanced data structures and fundamental algorithms. The emphasis
is on using abstraction as a problem solving technique.
The corresponding extract from the course outline for DAA is:
This topic covers modern program design, analysis and implementation techniques with
emphasis on object-oriented methods. It covers the following areas, not necessarily in this
order: abstract data types, recursive algorithms, complexity analysis, sorting and searching,
problem-solving strategies, advanced data structures, object-oriented programming, object-
oriented design and analysis, scope and binding.
This DAA outline specifies more detail, but object-oriented programming and
object-oriented design and analysis are really the only new areas.
However, the order of presentation of the current C++-based course is considerably different.
For detailed comparison, Appendix A contains the chapter and section structure of the AP
course, while Appendix B contains similar information for the DAA course. Since a change in
language introduces too many variables to evaluate, I have compared student results with a
previous version of the course which covered much the same ground as the latest DAA course,
but with less emphasis on the abstraction-first ordering. Since the older DAA course content is
not much different, I do not present more detail of it here.
An important point to note about the latest DAA structure is that the C++ language is
introduced relatively gently. Pointers are introduced as early as necessary to illustrate dynamic
dispatch and to make problems with aliasing and parameter passing clear. While pointer-based
data structures are introduced as pre-implemented container classes, using templates, as early as
Chapter 3, only in Chapter 8 is implementation of a container presented in detail. By that time the
students have a clear idea of what such containers are for and the issues in making them general
and abstract to the user. Yet by the end of the course the students have seen all the concepts of
the old AP course, plus all the new material related to object-oriented programming.
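As a minimal sketch of this sequencing, the fragment below uses pointers to a base class to show dynamic dispatch, and keeps those pointers in a pre-implemented container template. The names Shape, Rectangle, Square, ShapeList and totalArea are hypothetical, and std::vector stands in for the course's own container templates, which are not reproduced in this paper.

```cpp
#include <memory>
#include <vector>

// Hypothetical abstract class: clients program against this interface only.
class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const = 0;  // which body runs is decided at run time
};

class Rectangle : public Shape {
public:
    Rectangle(double w, double h) : width(w), height(h) {}
    double area() const override { return width * height; }
private:
    double width, height;
};

class Square : public Shape {
public:
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
private:
    double side;
};

// A pre-implemented container template holding pointers to the base class.
// std::vector is an assumption standing in for the course's container library.
typedef std::vector<std::unique_ptr<Shape>> ShapeList;

// Each call to area() is dispatched on the dynamic type of the object.
double totalArea(const ShapeList& shapes) {
    double sum = 0.0;
    for (const auto& shape : shapes) {
        sum += shape->area();
    }
    return sum;
}

// Usage: build a mixed list, then total it without naming concrete types again.
double demoTotalArea() {
    ShapeList shapes;
    shapes.push_back(std::unique_ptr<Shape>(new Rectangle(2.0, 3.0)));
    shapes.push_back(std::unique_ptr<Shape>(new Square(4.0)));
    return totalArea(shapes);
}
```

The student using ShapeList needs to know what the container is for, not how it grows or frees its storage, which is the point of deferring container implementation to a later chapter.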
By contrast, the AP course started a lot earlier with language syntax, and object-oriented
programming and generics (not available in the standard Modula-2 compiler we used at the
time) were treated superficially.
The AP course was broadly in line with the commonly accepted ACM/IEEE curriculum
(ACM, 1991), but we used our own notes because we felt we could cover some concepts better
than any Modula-2-based book of which we were aware. Given that Modula-2 was not in as
widespread use as some other languages like C or Pascal, it is not surprising that we could not
find a text that exactly fitted our needs. Given that C++ is in wide use, and the new DAA
course is also based on the ACM/IEEE curriculum (only the order of concepts is claimed to be
novel), it is more surprising that no book which we could find by the start of 1995 fitted the
principles of the new course. However, if teaching object-oriented programming is as much of
a paradigm shift as teaching structured programming was, perhaps the first generation of
C++ books can be expected to have the concepts in the wrong order.
So what is the right order? The following section outlines how I have designed the latest
DAA course.
3. Abstraction by design
The fundamental principle I have attempted to use throughout the design of the new DAA
course is to expose the students to just as much detail as they need to understand a specific
concept, but no more. The idea is that students cannot be expected to see the point of
abstraction if the course itself dumps them into detail indiscriminately.
This design principle has driven the ordering of chapters, and of sections within chapters.
First, to appreciate abstraction, it is not necessary to know about programming, so I use
real-world examples to illustrate abstraction before I talk about computers (a common practice
in object-oriented books (Booch, 1991; Freeman & Ince, 1996)). Then, to introduce abstraction
in programming, I show that it is possible to reuse code without knowing anything about how
it is implemented. Most members of my class have no prior exposure to C or C++, so keeping
them ignorant of implementation is a simple matter. I deliberately introduce C++ syntax from
the header file inwards to the compilable file. In this way the students are more familiar at first
with interfaces than implementations. I also introduce Booch diagrams (Booch, 1991) early, so
they learn that object-oriented design is not language specific: they learn to think of classes
and objects before they know enough C++ to be locked into its peculiarities.
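The "header file inwards" order can be illustrated with a small sketch. The class Tally and the function countWords are hypothetical and not taken from the course notes; the three parts are shown in one listing only so that it compiles as a unit, whereas in a course they would sit in a header file, a client file and a compilable file.

```cpp
#include <string>

// ---- Part 1: what a header file (say, tally.h) would show the student ----
class Tally {
public:
    Tally();
    void add(const std::string& word);  // record one more word
    int count() const;                  // number of words recorded so far
private:
    int total;                          // representation, not needed by clients
};

// ---- Part 2: client code, written knowing only the declarations above ----
int countWords(const std::string& text) {
    Tally tally;
    std::string word;
    for (std::string::size_type i = 0; i <= text.size(); ++i) {
        if (i == text.size() || text[i] == ' ') {
            if (!word.empty()) {        // a space or the end closes a word
                tally.add(word);
                word.clear();
            }
        } else {
            word += text[i];
        }
    }
    return tally.count();
}

// ---- Part 3: the compilable file (say, tally.cpp), seen only later ----
Tally::Tally() : total(0) {}
void Tally::add(const std::string&) { ++total; }
int Tally::count() const { return total; }
```

Part 2 compiles against Part 1 alone, so the bodies in Part 3 can be withheld, or replaced, without the client code changing.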
When I first introduce detailed implementation, I do so in the context of implementing a
small part of a partially constructed program, so as to demonstrate the value of abstraction as
a tool to shield the programmer from the complexity of a larger program. When I finally move
to larger examples, I try as far as possible to maintain the strategies of reuse, and hiding other
parts of a larger program. When I introduce design I do so by presenting part of a design and
showing how a small part can be implemented, only knowing the interfaces to the rest of the
design. Only when the general idea of abstraction is well established do I venture into
algorithm analysis, and even there, I interleave algorithm analysis with a section on design for
reuse, so the point is not lost: you only code from scratch with reuse in mind.
Why is this approach better than the common strategy used in many object-oriented texts
(including introductory ones (Decker & Hirshfield, 1995) as well as others aimed at the same
territory as my course (Budd, 1994; Ford & Topp, 1996)), where implementation is introduced
early? These books are teaching students to think of objects in terms of implementation rather
than the abstraction they represent. If I am right that unlearning previously taught strategies is
an obstacle to learning a different approach, then putting implementation early makes it harder
in the long run to make the case for thinking in terms of abstraction, classes as black boxes,
and reuse.
I do not argue that students should never learn how to implement abstraction, but rather
that implementation comes after learning reuse.
This point seems hard to put across to instructors familiar with non-object-oriented
languages who have converted to C++. It is instructive to go back to one of the earlier
Smalltalk object-oriented texts, though a book which is aimed more at the language specialist
than at the beginner, for support for the case that the natural order of presentation is first
interfaces of library classes, then usage of library classes and only finally how they are
implemented (Goldberg & Robson, 1983).
An abstraction-first ordering differs from the common objects-first ordering (Decker &
Hirshfield, 1995) in the following way:
abstraction-first
objects-first
The key differentiating factor between the two approaches is pushing implementation of an
ADT to later.
It is useful to differentiate the abstraction-first principle from the common distinction of
top-down versus bottom-up. A top-down approach, superficially, meets the requirement of
avoiding detail early. However, the conventional conception of a top-down approach requires
that students already have good abstraction skills: they have to be able to decompose a
problem and design appropriate abstractions. The abstraction-first approach differs in that
students are presented with existing abstractions to use as building blocks, and the much more
difficult problem of designing their own abstractions is deferred to later, when the principles
have been more firmly established.
access to reasonably priced compilers with good programming environments¹ and externally
imposed requirements for exposure to C++, many educators are likely to be faced with the
need to use C++ as a first object-oriented language.
¹ I use CodeWarrior on Power Macs for my course; since writing this paper, Java has become more accessible,
and may be worth considering as an alternative, despite the lack of some features, like parametrized types.
The mindset change required of instructors here is that it is acceptable to introduce a wide
range of object-oriented concepts before having told the students how to write a loop.
Although the number of concepts needed to get this far is relatively high, understanding how
to write a loop is also complicated, and is hard to introduce without breaking away from
pushing the value of abstraction.
To make all this work requires reasonably well-developed class libraries which can be used for
non-trivial examples. Since commercial-quality libraries are generally too large and complex, I
have developed my own. To keep complexity under control I have divided them into three
categories: container class templates, a window-based application framework, and toolboxes for
specific purposes (database, graphics, text manipulation and strings). Each toolbox can be used
separately for specific examples, or combined for larger examples; by the end of the course the
students have implemented part of an example using all the libraries in one application.
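What a scaled-down application framework might look like can be sketched as follows. This is an illustration under stated assumptions, not the framework used in the course: the names Application, EchoApp, run, start and handle are hypothetical, and events are reduced to strings so that no windowing system is needed.

```cpp
#include <string>
#include <vector>

// Hypothetical toy framework: run() fixes the flow of control, and an
// application supplies only the hooks that differ from program to program.
class Application {
public:
    virtual ~Application() {}

    // The invariant part: start up, handle each event in turn, shut down.
    std::vector<std::string> run(const std::vector<std::string>& events) {
        std::vector<std::string> log;
        log.push_back(start());
        for (const std::string& event : events) {
            log.push_back(handle(event));
        }
        log.push_back("quit");
        return log;
    }

protected:
    virtual std::string start() { return "start"; }            // optional hook
    virtual std::string handle(const std::string& event) = 0;  // required hook
};

// A complete application written by reuse: only one function is new code.
class EchoApp : public Application {
protected:
    std::string handle(const std::string& event) override {
        return "echo " + event;
    }
};
```

The student writes EchoApp without reading the body of run(), which mirrors the idea of building programs from existing blocks before writing control flow from scratch.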
One concept in the new DAA course which is totally new relative to both of the older
courses is the idea of a software architecture: the high-level design of overall structure and
flow of control (Garlan & Perry, 1995). I use the Smalltalk Model-View-Controller paradigm
as my main example. It would have been difficult to illustrate the idea of an application
architecture without an application framework as a basis for implementing the part of the
architecture which does not vary from application to application. Other architectures I describe
briefly include client-server architectures, a software bus and compound document architectures
(such as OpenDoc).
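The Model-View-Controller division of responsibilities can be sketched in a few classes. The names CounterModel, View, TextView and Controller are hypothetical, and the sketch omits windows and real input devices; it only shows which part holds data, which part displays it and which part interprets input.

```cpp
#include <string>
#include <vector>

// View interface: anything that wants to be told when the model changes.
class View {
public:
    virtual ~View() {}
    virtual void modelChanged(int newValue) = 0;
};

// Model: owns the application data and notifies the attached views.
class CounterModel {
public:
    CounterModel() : value(0) {}
    void attach(View* view) { views.push_back(view); }
    void increment() { ++value; notify(); }
    int get() const { return value; }
private:
    void notify() {
        for (View* view : views) view->modelChanged(value);
    }
    int value;
    std::vector<View*> views;  // non-owning: views must outlive the model's use
};

// A concrete view that renders the model as text.
class TextView : public View {
public:
    void modelChanged(int newValue) override {
        text = "count = " + std::to_string(newValue);
    }
    std::string text;
};

// Controller: translates user input into operations on the model.
class Controller {
public:
    explicit Controller(CounterModel& m) : model(m) {}
    void handleKey(char key) {
        if (key == '+') model.increment();  // other keys are ignored
    }
private:
    CounterModel& model;
};

// Usage: two '+' keys and one ignored key leave the view showing 2.
std::string demoMvc() {
    CounterModel model;
    TextView view;
    model.attach(&view);
    Controller controller(model);
    controller.handleKey('+');
    controller.handleKey('x');
    controller.handleKey('+');
    return view.text;
}
```

The controller never touches the view and the view never changes the model, so either can be replaced without altering the other.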
Once principles of abstraction and reuse are firmly established, the DAA course introduces
conventional data structures and algorithms material as a way of implementing your own
abstractions.
Compared with my first attempt at the DAA course, in which I did not pursue the
abstraction-first idea as strongly, I was able to introduce a number of new concepts, some of
which had previously been covered in more advanced courses.
The new concepts in the abstraction-first version of the DAA course included:
• templates
• iterators
• generators
• deep and shallow copy (including reference counts)
• exceptions (if mainly from the perspective of problems with the C++ model)
• more challenging algorithm analysis and data structures
• use of an application framework
• software architectures
All of this is in addition to the content of the initial version of DAA, which included object-
oriented concepts, data structures and algorithms, as well as an introduction to software
engineering.
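Of the concepts listed, deep and shallow copy with reference counts is perhaps the easiest to show in a few lines. The sketch below is an assumption about how the distinction might be demonstrated today: SharedBuffer and OwnedBuffer are hypothetical names, and std::shared_ptr supplies the reference count that a course of this period would have implemented by hand.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Shallow copy: copies share one buffer; shared_ptr keeps the reference count.
class SharedBuffer {
public:
    explicit SharedBuffer(std::size_t n)
        : data(std::make_shared<std::vector<int>>(n, 0)) {}
    void set(std::size_t i, int v) { (*data)[i] = v; }
    int get(std::size_t i) const { return (*data)[i]; }
    long owners() const { return data.use_count(); }  // current reference count
private:
    std::shared_ptr<std::vector<int>> data;  // default copy shares the buffer
};

// Deep copy: every copy gets its own buffer, so copies are independent.
class OwnedBuffer {
public:
    explicit OwnedBuffer(std::size_t n) : data(new std::vector<int>(n, 0)) {}
    OwnedBuffer(const OwnedBuffer& other)
        : data(new std::vector<int>(*other.data)) {}
    OwnedBuffer& operator=(const OwnedBuffer& other) {
        if (this != &other) data.reset(new std::vector<int>(*other.data));
        return *this;
    }
    void set(std::size_t i, int v) { (*data)[i] = v; }
    int get(std::size_t i) const { return (*data)[i]; }
private:
    std::unique_ptr<std::vector<int>> data;
};
```

Writing through a copy of a SharedBuffer changes what the original sees, while writing through a copy of an OwnedBuffer does not; the aliasing problems mentioned earlier in the course are the first case arising by accident.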
Despite covering a number of new, sophisticated topics in a course of the same duration,
students' results did not differ significantly between the two years (in both cases there were
about 85 students in the class). The average for the first DAA course was 63%; the average
final result for the new course was 62%; in both cases the standard deviation was 12%. The
two groups of students had similar results in other subjects, which suggests that the new
ordering made it possible to cover significantly more complex material.
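The size of the difference can be checked from the summary figures alone. The sketch below computes a two-sample t statistic (Welch's form) under assumptions that go beyond the text: each class is taken as exactly 85 students, the two classes are treated as independent samples, and no claim is made that this is the test originally applied. The function name welchT is hypothetical.

```cpp
#include <cmath>

// Welch's t statistic for two independent samples, from summary statistics.
// mean, sd: sample mean and standard deviation; n: sample size.
double welchT(double mean1, double sd1, double n1,
              double mean2, double sd2, double n2) {
    // Standard error of the difference between the two means.
    double standardError = std::sqrt(sd1 * sd1 / n1 + sd2 * sd2 / n2);
    return (mean1 - mean2) / standardError;
}
```

With means of 63 and 62, standard deviations of 12 and 85 students per class, the standard error is about 1.84 percentage points and the statistic is about 0.54, well below the value of roughly 2 needed for significance at the conventional 5% level.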
In our marking (grading) system a mark above 75% is an A, 70–74% is a B, a C is 60–69%,
a D is 50–59%, and below 50% is a fail. The students' results in the two C++ courses are in
line with other results in our undergraduate course.
It is difficult to evaluate the impact of the revised ordering on later courses, since later
courses were revised at the same time as the students advancing from the DAA course moved
to those later courses. However, the fact that considerably more material was covered (some of
which was covered in more advanced courses previously) supports the claim that the
abstraction-first approach is an efficient teaching strategy. One area where we were able to see
a benefit was in the programming language course taught to the same students in the following
year, where we were able to spend more time on language theory and semantics, since the
students had a better grounding in abstraction and object-oriented concepts.
7. Conclusions
Our old Modula-2 course had much to recommend it. Although Modula-2 is weak in
support for abstract data types, it has a relatively simple syntax and relatively few traps and
pitfalls. Much of the content in terms of data structures and algorithms issues is still good.
However, to present a course similar in order of concepts to the new DAA course would be
difficult in Modula-2. The opaque type concept, which is used to implement abstract types in
Modula-2, is not easy to introduce early, since the kind of pointer problems which arise in
C++ are even harder to avoid in Modula-2: an opaque type is generally implemented as a
pointer, so the standard operators for assignment and comparison for equality, which cannot
be replaced as they can in C++, act on the pointer rather than on the value when used by mistake.
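The pitfall can be reproduced in C++ by imitating an opaque type with a bare pointer. The names NameRecord, NameHandle, makeName, sameName and Name are hypothetical; the first half mimics the opaque-type style, and the second half shows the replacement of the equality operator that C++ permits.

```cpp
#include <string>

// Opaque-handle style, as with an opaque type: the client holds only a pointer.
struct NameRecord { std::string text; };
typedef NameRecord* NameHandle;

NameHandle makeName(const std::string& s) { return new NameRecord{s}; }
void destroyName(NameHandle h) { delete h; }

// The module has to export an explicit comparison; built-in == is not it.
bool sameName(NameHandle a, NameHandle b) { return a->text == b->text; }

// Value-style C++ class: == is redefined to compare contents.
class Name {
public:
    explicit Name(const std::string& s) : text(s) {}
    bool operator==(const Name& other) const { return text == other.text; }
private:
    std::string text;
};

// Built-in == on handles compares addresses, so two equal names look unequal.
bool handlesLookEqual() {
    NameHandle a = makeName("Ada");
    NameHandle b = makeName("Ada");
    bool result = (a == b);  // pointer comparison, not a comparison of names
    destroyName(a);
    destroyName(b);
    return result;
}
```

A student who writes a == b on handles gets a program that compiles and silently gives the wrong answer, which is why the concept is hard to introduce before pointers are understood.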
The lack of object-oriented features (especially inheritance and dynamic dispatch, but also
templates, or generics) makes it hard to use Modula-2 to implement concepts like general
containers with iterators and generators. An object-oriented language also makes it a lot easier
to emphasize reuse from the start.
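For comparison, a general container with an iterator takes only a short template in C++. This is a sketch, not the course's container library: List, Iterator, pushFront and sumOf are hypothetical names, and the list is kept non-copyable so that copy semantics do not have to be addressed here.

```cpp
#include <cstddef>
#include <vector>

// A minimal generic singly linked list with a read-only forward iterator.
template <typename T>
class List {
    struct Node { T value; Node* next; };
public:
    List() : head(nullptr), count(0) {}
    ~List() {
        while (head) { Node* old = head; head = head->next; delete old; }
    }
    List(const List&) = delete;             // copying deliberately left out
    List& operator=(const List&) = delete;

    void pushFront(const T& v) { head = new Node{v, head}; ++count; }
    std::size_t size() const { return count; }

    // The iterator hides the node structure from client code.
    class Iterator {
    public:
        explicit Iterator(const Node* n) : node(n) {}
        const T& operator*() const { return node->value; }
        Iterator& operator++() { node = node->next; return *this; }
        bool operator!=(const Iterator& other) const { return node != other.node; }
    private:
        const Node* node;
    };
    Iterator begin() const { return Iterator(head); }
    Iterator end() const { return Iterator(nullptr); }

private:
    Node* head;
    std::size_t count;
};

// One algorithm for any container of int that offers begin() and end().
template <typename Container>
int sumOf(const Container& items) {
    int total = 0;
    for (const auto& item : items) total += item;
    return total;
}
```

The same sumOf works on List<int> and on std::vector<int>, because it depends only on the iterator interface and not on how either container stores its elements.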
Although the AP course introduced these concepts, they could not be directly used, since the
language did not support them.
On the whole, though Modula-2 is a good language in many ways, it is not up to the task of
introducing abstraction and reuse from the start.
Other object-oriented languages may well be better than C++ for introducing concepts. A
language like Smalltalk (Goldberg & Robson, 1983), for example, which has implicit memory
management and garbage collection, would remove many of the pitfalls of C++. However,
Smalltalk is a large, complex environment and collection of libraries, which makes it difficult to
right-size it to a course which is not solely focused on the language and related tools. Java
(Arnold & Gosling, 1996) has become more accessible since the course was designed, so it
would be worth looking at as an alternative. Its main weakness as a language is that it lacks
parametrized types (equivalent to C++ templates). However, some effect of parametrized
types can be achieved with interfaces (which allow new classes to be based on partially
specified classes, and support a limited form of multiple inheritance). There would be two big
wins with Java. Like Smalltalk, it has a garbage collection system and does not allow explicit
pointer manipulation, which eliminate many of the pitfalls of C++. Also, Java's integration
with web pages could provide a basis for interesting examples, and Java academic texts are
starting to appear (Bishop, 1997). The quality and availability of Java environments has
improved considerably since this course was first run.
Other object-oriented languages, such as Eiffel, could be considered as well (Rist &
Terwilliger, 1995): the point of this paper though is the order of learning rather than to survey
a variety of object-oriented languages. This discussion should be sufficient to show that the
same principles can apply when using other languages.
Whichever language is used, the fundamental principles remain the same: concepts should be
introduced in an order which introduces and reinforces the reuse habit from the start.
Development from scratch should be seen as a more advanced skill, learnt only after
understanding reuse and design, so you can design for reuse. The way to develop course
materials which support such a strategy is to start from comprehensive (if toy by
industrial-strength standards) class libraries. The classes should contain sufficient functionality and
application-building tools to allow introduction of a broad range of object-oriented concepts,
before too much language syntax needs to be introduced.
To illustrate how these principles translate to text book design, I have included a table of
contents from both the Modula-2 and the latest C++ course notes as Appendices A and B,
respectively.
How well this strategy will translate to workplace skills will take time to evaluate; the widely
acknowledged problem in industry of converting programmers to reuse suggests that a change
in educational strategy is important to consider.
Acknowledgements
Much of the work that went into the Modula-2 AP course was done by Scott Hazelhurst;
his work saved me much time and effort in running the course in subsequent years. I have also
reused some of his material in the new C++ DAA course. Several generations of students
have contributed to my understanding of how best to present concepts, leading to the
proposals in this paper. I would also like to thank Apple Computer and Metrowerks (whose
CodeWarrior C++ was used) for making it possible for me to put this course together on a
very limited budget.
Appendix A
Appendix B
References
ACM (1991). A summary of the ACM/IEEE-CS Joint Curriculum Task Force Report: Computing Curricula 1991. Comm. ACM, 34(6), 68–84.
Adler, R. M. (1995). Emerging standards for component software. Computer, 28(3), 68–76.
Arnold, K., & Gosling, J. (1996). The Java Programming Language. Addison-Wesley, Reading, MA.
Auer, K. (1995). Smalltalk training: As innovative as the environment. Comm. ACM, 38(10), 115–117.
Berg, W., Cline, M., & Girou, M. (1995). Lessons learned from the OS/400 OO Project. Comm. ACM, 38(10), 54–64.
Bishop, J. M. (1997). Java Gently. Addison-Wesley, Harlow.
Booch, G. (1991). Object-oriented Design with Applications. Benjamin/Cummings, Redwood City, CA.
Budd, T. A. (1994). Classic Data Structures in C++. Addison-Wesley, Reading, MA.
Coplien, J. O. (1992). Advanced C++: Programming Styles and Idioms. Addison-Wesley, Reading, MA.
Cox, B. J. (1991). Object-oriented Programming: An Evolutionary Approach, 2nd edn. Addison-Wesley, Reading, MA.
Decker, R., & Hirshfield, S. (1995). The Object Concept: An Introduction to Computer Programming Using C++. PWS, Boston.
Deitel, H. M., & Deitel, P. J. (1994). C++ How to Program. Prentice Hall, Englewood Cliffs, NJ.
Fayad, M. E., & Tsai, W.-T. (1995). Object-oriented experiences. Comm. ACM, 38(10), 51–53.
Ford, W., & Topp, W. (1996). Data Structures with C++. Prentice Hall, Englewood Cliffs, NJ.
Frakes, W. B., & Fox, C. J. (1995). Sixteen questions about software reuse. Comm. ACM, 38(6), 75–87, 112.
Freeman, A., & Ince, D. (1996). Active Java. Addison-Wesley, Harlow.
Garlan, D., & Perry, D. E. (1995). Introduction to the Special Issue on Software Architecture. IEEE Trans. on Software Engineering, 21(4), 269–274.
Garland, S. J. (1986). Introduction to Computer Science with Applications in Pascal. Addison-Wesley, Reading, MA.
Goldberg, A., & Robson, D. (1983). Smalltalk-80: The Language and its Implementation. Addison-Wesley, Reading, MA.
Headington, M. R., & Riley, D. D. (1994). Data Abstraction and Structures Using C++. DC Heath, Lexington, MA.
Koffman, E. B. (1992). Pascal, 4th edn. Addison-Wesley, Reading, MA.
Lippman, S. B. (1991). C++ Primer, 2nd edn. Addison-Wesley, Reading, MA.
Liskov, B., Atkinson, R., Bloom, T., Moss, E., Schaffert, J. C., Scheifler, R., & Snyder, A. (1981). CLU Reference Manual. Springer, Berlin.
Rist, R., & Terwilliger, R. (1995). Object-Oriented Programming in Eiffel. Prentice Hall.
Sedgewick, R. (1992). Algorithms in C++. Addison-Wesley, Reading, MA.
Shaw, M. (1991). ALPHARD: Form & Content. Springer, New York.
Stroustrup, B. (1991). The C++ Programming Language, 2nd edn. Addison-Wesley, Reading, MA.
Tenenbaum, A. M., & Augenstein, M. J. (1986). Data Structures Using Pascal, 2nd edn. Prentice-Hall, Englewood Cliffs, NJ.
Wang, P. S. (1994). C++ with Object-Oriented Programming. PWS, Boston, MA.
Wiener, R. S., & Pinson, L. J. (1990). The C++ Workbook. Addison-Wesley, Reading, MA.