Teaching Natural User Interaction Using OpenNI and the Microsoft Kinect Sensor
Norman Villaroman
Brigham Young University, Provo, Utah
ABSTRACT
The launch of the Microsoft Kinect for Xbox (a real-time 3D imaging device) and its supporting libraries has spurred a flurry of development of, among other things, natural user interfaces for computer applications. Using the Kinect offers opportunities for novel approaches to classroom instruction on natural user interaction. With the launch of this sensor came the establishment of development platforms that can collect and process the data the sensor provides, one of which is OpenNI. We evaluate the current state of this technology and present examples of how Kinect-enabled user interfaces can provide tremendous opportunities for students in Human Computer Interaction (HCI) courses. Our paper presents sample learning activities to achieve various HCI learning outcomes listed in IT 2008, and discusses the advantages of using this technology as a tool in the classroom.
General Terms
Design, Experimentation, Human Factors
Keywords
HCI, Kinect, Natural User Interface, Education
1. INTRODUCTION
Human-computer interaction (HCI) is a fundamental pillar of the Information Technology discipline [1]. Any interactive computing system involves one or more interfaces with which a user can provide commands and gather results. Interactive computer systems started with command-line interfaces, which are still widely used today. The development of graphical user interfaces has allowed users with varying levels of computer skills to use a wide variety of software applications. Recent advancements in HCI have facilitated the development of natural user interfaces that provide more intuitive ways of interacting with the computing device.
The adoption and expansion of natural user interfaces is expected to make interfaces easy for people to learn in the quickest possible way.¹

¹ The exact scope of command-line, graphical, and natural user interfaces may vary from one expert to another, and the categories may even overlap. Terminology and definitions also vary, with terms beyond those mentioned here, such as tangible and gestural user interfaces. Suffice it to say that there have been continuous advancements in making user interfaces easier and more intuitive for the user to learn.

The desire to develop natural user interfaces has existed for decades. Since the last world war, professional and academic groups have been formed to enhance interaction between man and machine [2, 3]. While computer user interfaces started with punch cards and keyboard-like devices, attempts to develop interfaces that process more natural movements emerged relatively early. An example of such an attempt involved the study of pattern recognition of natural written language using a pen-like interface, the RAND tablet and stylus, in 1967 [4, 5]. An estimate of the value of a machine's ability to process audio commands was published in 1968 [6]. The first touch device, called the Elograph, was created in 1971, with further advancements from the same company continuing in the following decades. In the late 1970s, VIDEOPLACE was developed: a system of video cameras and projectors that allowed graphical interaction and communication between users without the aid of user-attached or user-controlled devices. The 1980s saw the beginnings of multi-touch systems with a master's thesis by Nimish Mehta that resulted in the development of the Flexible Machine Interface. In the same decade, touch screen interfaces began to be prevalent in commercial establishments, and the Hewlett-Packard 150, a personal computer with basic touch input, was launched. In the 1990s, touch screen interfaces made their way to mobile devices with Simon, launched by IBM and BellSouth in 1994 [7]. Touch screens continued to spread in different consumer applications. In the first decade of this century, more advanced gestural technology and accompanying applications surfaced in a plethora of PDAs, smartphones, and multimedia devices. Natural interfaces have become more prevalent in recent years in gaming systems such as the Nintendo Wii Remote, the PlayStation Eye/Move, and the Microsoft Kinect for Xbox. The emergence of these natural user interface and gestural technologies has motivated researchers and hardware enthusiasts to look for novel ways that they can be used.
In addition to their use in consumer applications, more of these technologies are finding their way into classrooms, particularly for the study of HCI. We suggest that the Kinect is an excellent addition to the toolset for teaching such topics, for reasons discussed in this paper. Our paper begins with a discussion and assessment of the state of Kinect-enabled technology. We then make specific recommendations for how this technology can be applied in HCI course instruction. Finally, we discuss the advantages and disadvantages of a Kinect-enabled approach to HCI instruction.
2. DEVELOPMENT FRAMEWORK
Before presenting ideas for how the Kinect's gestural, natural user interface technology can be used in the classroom, we will discuss the state of the hardware and software technology as it exists today.
2.1 Hardware
The Kinect is based on a sensor design developed by PrimeSense Ltd.; the technical specifications described here are taken from the reference design of the PrimeSense sensor [8]. Light Coding is the technology that allows it to construct detailed 3D depth maps of a scene in real time. Structured near-infrared light is cast on a region of space, and a standard CMOS image sensor receives the reflected light. The PS1080 SoC (System on a Chip) controls the structured light with the aid of a light diffuser and processes the data from the image sensor to provide real-time depth data [9].

The depth image from the PS1080 has a maximum resolution of 640 x 480. At 2 m from the sensor, it is able to resolve down to 3 mm in height and width and 1 cm in depth, and it operates at a range between 0.8 m and 3.5 m. Experimentation has shown that the Kinect processes depth data at a frame rate of 30 fps. The sensor also has an integrated RGB camera with a maximum resolution of 1600 x 1200 (UXGA), used to match the depth data with real images, and two built-in microphones for audio input. Connectivity is provided by a USB 2.0 interface.

While it should be noted that the Kinect is not the only device that uses the PrimeSense reference design (e.g., the ASUS Xtion Pro), all related experiments and activities in this paper were accomplished with the Kinect. Its basic applications do not require special or powerful computer hardware; a dual-core machine with 2 GB of RAM and a standard video graphics processor can handle these applications just fine.
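As a rough sanity check of these figures, the per-pixel footprint at a given distance can be estimated from the horizontal field of view. The minimal sketch below assumes a 58-degree horizontal field of view, a figure taken from the PrimeSense reference design rather than from the numbers above, so treat the result as an approximation:

```cpp
// Back-of-envelope check of the stated ~3 mm lateral resolution at 2 m.
// The 58-degree horizontal field of view is an assumed value from the
// PrimeSense reference design; the result is only an approximation.
#include <cmath>
#include <cstdio>

int main() {
    const double kPi         = 3.14159265358979;
    const double kFovDegrees = 58.0;    // assumed horizontal FOV
    const double kPixels     = 640.0;   // depth map width in pixels
    const double kDistanceMm = 2000.0;  // 2 m from the sensor

    // Width of the field of view at that distance, split across 640 pixels.
    const double fovRad  = kFovDegrees * kPi / 180.0;
    const double widthMm = 2.0 * kDistanceMm * std::tan(fovRad / 2.0);
    std::printf("per-pixel footprint at 2 m: %.1f mm\n", widthMm / kPixels);
    return 0;  // prints about 3.5 mm, consistent with the ~3 mm figure
}
```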
2.2 Software
At the time of this writing, there are three major projects with freely available libraries that can be used to collect and process data from a Kinect sensor: OpenKinect [10], OpenNI [11], and CL NUI [12].

Before the official release of the Microsoft Kinect on November 4th, 2010, Adafruit Industries announced that they would give a monetary reward to hardware enthusiasts who could write software enabling a PC to access and process the data coming from the sensor [13]. A developer with the alias AlexP was the first to accomplish the challenge, two days after the launch, but it was Hector Martin who was recognized as the winner because the former was not willing to open-source his code at the time [14]. The efforts of these two individuals led to the beginning of two of the projects mentioned. Hector Martin released his code on November 10th, which marks the beginning of OpenKinect [15]. Among other things, this project provides drivers for the Kinect, wrappers for different languages and other projects, and an analysis library, all of which are open source. The project is distributed under both the Apache 2.0 and the GPL v2 licenses [16]. AlexP's code was used in the development of Code Laboratories' Windows Kinect Driver/SDK, the CL NUI Platform, version 1.0 of which was officially released on December 8th. The driver and the SDK, with C/C++/C# libraries, are freely available to the public.

In the same month as the Kinect's launch, OpenNI [11, 17] was formed by a group of companies, including PrimeSense Ltd., as a not-for-profit organization that aims to set an industry-standard framework for the interoperability of natural interaction devices. With the framework they released came middleware libraries that can convert raw sensor data from a compliant device into application-ready data. These libraries are available to applications written in C, C++, and C#. Because the Kinect is an OpenNI-compliant device, this software can be used to build applications for it. OpenNI is written in C/C++ so that, among other reasons, it can be used across different platforms. While OpenNI is officially supported on Windows (from Windows XP) and Ubuntu (from 10.10) [18], its use on other Linux distributions and Mac OS X has been documented to work in online forums.

Microsoft Research has announced the release of a noncommercial Kinect for Windows SDK in the spring of 2011, with a commercial version coming later [19]. The initial release promises, among other things, audio processing with source localization [20]. Processing audio signals from the sensor's microphones is not enabled and is currently under development in the three projects previously mentioned. This official Microsoft SDK may also provide the support necessary to accomplish the learning outcomes discussed here.
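To give a sense of what programming against the OpenNI framework looks like, the minimal sketch below initializes an OpenNI context, creates a depth generator, and reads one frame using the C++ wrapper. It follows the OpenNI 1.x API current at the time of writing, with error handling reduced to a single macro; the exact calls should be verified against the installed version.

```cpp
// Minimal OpenNI 1.x sketch: read one depth frame from a compliant
// sensor such as the Kinect. Assumes OpenNI and a sensor driver are
// installed; verify the calls against your installed OpenNI version.
#include <XnCppWrapper.h>
#include <cstdio>

#define CHECK(rc, what)                                                \
    if ((rc) != XN_STATUS_OK) {                                        \
        std::printf("%s failed: %s\n", (what), xnGetStatusString(rc)); \
        return 1;                                                      \
    }

int main() {
    xn::Context context;
    XnStatus rc = context.Init();
    CHECK(rc, "Init");

    xn::DepthGenerator depth;
    rc = depth.Create(context);           // needs an OpenNI-compliant device
    CHECK(rc, "DepthGenerator::Create");

    rc = context.StartGeneratingAll();
    CHECK(rc, "StartGeneratingAll");

    rc = context.WaitOneUpdateAll(depth); // block until a new depth frame
    CHECK(rc, "WaitOneUpdateAll");

    xn::DepthMetaData dmd;
    depth.GetMetaData(dmd);
    // Depth values are in millimeters; resolution is typically 640 x 480.
    std::printf("%ux%u, center depth = %u mm\n",
                dmd.XRes(), dmd.YRes(),
                (unsigned)dmd(dmd.XRes() / 2, dmd.YRes() / 2));

    context.Shutdown();
    return 0;
}
```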
3. EDUCATIONAL USE
Familiarity with a Kinect-related SDK will help students develop and understand applications that work with the Kinect. Any of the three software packages mentioned can be used to create HCI learning activities. For the purposes of this paper, OpenNI, with its accompanying middleware library NITE, was investigated for its academic value. OpenNI was selected because it was established as an industry standard and because it can be further improved by its professional maintainers and the open source community. In this section, we discuss the advantages and limitations of using Kinect-based applications to teach various topics in HCI, enumerate some of those topics, and outline some learning activities.
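As an illustration of the kind of out-of-the-box functionality NITE gives students, the sketch below wires NITE's session manager to a push-gesture detector: a focus gesture ("Click" or "Wave") starts the session, after which each push is reported through a callback. This follows the pattern of the NITE 1.x samples; names and signatures should be checked against the installed release.

```cpp
// Sketch of NITE 1.x gesture handling: a session starts on a focus
// gesture and push gestures are then reported via a callback.
// Modeled on the NITE sample pattern; verify against your release.
#include <XnCppWrapper.h>
#include <XnVSessionManager.h>
#include <XnVPushDetector.h>
#include <cstdio>

// Called by the push detector each time a push gesture is recognized.
void XN_CALLBACK_TYPE OnPush(XnFloat velocity, XnFloat angle, void* /*cxt*/) {
    std::printf("push detected: velocity=%.2f angle=%.1f\n", velocity, angle);
}

int main() {
    xn::Context context;
    if (context.Init() != XN_STATUS_OK) return 1;

    // NITE relies on gesture and hands generators behind the scenes.
    xn::GestureGenerator gesture;
    xn::HandsGenerator hands;
    if (gesture.Create(context) != XN_STATUS_OK) return 1;
    if (hands.Create(context) != XN_STATUS_OK) return 1;

    XnVSessionManager session;
    if (session.Initialize(&context, "Click,Wave", "RaiseHand") != XN_STATUS_OK)
        return 1;

    XnVPushDetector push;
    push.RegisterPush(NULL, &OnPush);  // cookie, callback
    session.AddListener(&push);        // route session points to the detector

    context.StartGeneratingAll();
    for (;;) {
        context.WaitAnyUpdateAll();    // pump sensor data
        session.Update(&context);      // let NITE process the new frame
    }
}
```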
Sample activities for core learning outcomes:
- Come up with usability guidelines and standards for Kinect-enabled interfaces.
- Perform a heuristic evaluation of the usability of a Kinect-enabled application.
- Determine and design performance, usability, and user experience metrics for Kinect-enabled interfaces (see the sketch below for one concrete example).
- Evaluate the appropriateness of traditional usability and user experience testing methods to assess Kinect-enabled interactions (both single and multimodal).
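To make the metrics activity concrete, the sketch below computes two simple candidate measures from a log of timestamped 3D hand positions, such as those a point tracker like NITE's produces: task completion time and mean frame-to-frame movement (a crude jitter measure). The HandSample layout and the sample data are invented purely for illustration.

```cpp
// Illustrative metrics over a logged hand trajectory: task time and
// mean frame-to-frame movement. The HandSample layout is hypothetical,
// standing in for points reported by a tracker such as NITE's.
#include <cmath>
#include <cstdio>
#include <vector>

struct HandSample {
    double t;        // seconds since task start
    double x, y, z;  // hand position in millimeters
};

int main() {
    std::vector<HandSample> log = {
        {0.00, 100, 200, 1500}, {0.03, 102, 199, 1501},
        {0.06, 140, 210, 1490}, {0.09, 180, 230, 1480},
    };

    // Task completion time: span of the recorded timestamps.
    double taskTime = log.back().t - log.front().t;

    // Mean Euclidean distance between consecutive samples.
    double total = 0;
    for (size_t i = 1; i < log.size(); ++i) {
        double dx = log[i].x - log[i - 1].x;
        double dy = log[i].y - log[i - 1].y;
        double dz = log[i].z - log[i - 1].z;
        total += std::sqrt(dx * dx + dy * dy + dz * dz);
    }
    double jitter = total / (log.size() - 1);

    std::printf("task time: %.2f s, mean movement per frame: %.1f mm\n",
                taskTime, jitter);
    return 0;
}
```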
3.1.5 Accessibility
Sample activities for core learning outcomes:
- Identify different ways that a Kinect sensor can assist users with certain disabilities.
- Consider the limits and capabilities of the Kinect for disabled users and analyze the risks of such assistance.
- Discuss how 3D depth maps or gestures can improve computer use for special-purpose or disabled computer users.
- Compare traditional (e.g., mouse, keyboard, and game controllers) and emerging (e.g., voice-recognition) interaction technologies.
- Illustrate how existing applications can be improved upon by Kinect-enabled interfaces.
The OpenNI framework and its libraries are well documented, which is very helpful for self-directed study, a crucial skill for IT students. The libraries and middleware that process the raw data into more useful, application-ready information are already available (e.g., skeleton tracking, object segmentation, gesture and pose recognition, sensor data recording, etc.). A simple Internet search on Kinect-enabled applications will show scores of applications that were developed soon after the Kinect's release. Even though they are based on the same sensor technology, these applications vary widely in purpose and usage. Having a plethora of explored and unexplored applications provides a great opportunity for students to exercise creative thought and innovation.

Students who have gone through an undergraduate advanced programming course in C/C++ will find it relatively easy to develop applications directly from the project libraries. If students have only basic programming skills, wrapper classes or partial implementations can be written to make the libraries simpler to use. Even using the sample applications out of the box can be useful for various topics in HCI.

The NITE libraries provide a wide range of functionality that is more than sufficient for a structured course on natural user interaction, and they are well documented. The functionality provided out of the box includes (though this list is by no means complete):
- Recognition of push, steady, swipe, wave, and other gestures
- Skeleton detection and tracking of individual skeleton joints
- Pose detection
- User segmentation and multiple user detection
- Access to depth and video data
- Multiple point tracking
- Various calibration and smoothing functions to enhance recognition
- Sample applications of gesture-controlled interfaces, user segmentation (Figure 1), point tracking (Figure 2), skeleton tracking (Figure 3), and the other functionality previously mentioned

For example, once a user has been detected and calibrated, querying an individual skeleton joint takes only a few calls, as shown in the sketch below.
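The following fragment is a minimal sketch based on the OpenNI 1.x skeleton capability. It assumes a UserGenerator has been created and that the given user has already completed calibration and is being tracked; the user-detection and calibration callbacks a complete program needs are omitted.

```cpp
// Fragment: query a tracked user's head joint via the OpenNI 1.x
// skeleton capability. Assumes user 'id' has already been calibrated
// and is being tracked; the callback plumbing that gets a program to
// this point is omitted.
#include <XnCppWrapper.h>
#include <cstdio>

void printHead(xn::UserGenerator& user, XnUserID id) {
    if (!user.GetSkeletonCap().IsTracking(id)) return;

    XnSkeletonJointPosition head;
    user.GetSkeletonCap().GetSkeletonJointPosition(id, XN_SKEL_HEAD, head);

    // fConfidence ranges 0..1; positions are real-world millimeters.
    if (head.fConfidence > 0.5) {
        std::printf("user %u head at (%.0f, %.0f, %.0f) mm\n", id,
                    head.position.X, head.position.Y, head.position.Z);
    }
}
```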
[Figure 3: Skeleton tracking]

These and the other standard features included in OpenNI/NITE can be used in learning activities for HCI. While these could very well be sufficient, there are many other open-source and commercial applications that can be used if necessary. The technology behind the Kinect also has the potential to enable significant advances in natural user interaction, as evidenced by the wide variety of applications developed from the very beginning of its launch. Students interested in the field of computer vision and/or natural user interaction will benefit from keeping up with these advances. While Kinect-enabled technology can be a very helpful aid in learning HCI topics, care should be taken to keep its use appropriate and balanced with other technologies that are also useful for HCI instruction.

4. CONCLUSION
The Microsoft Kinect sensor and its supporting development platforms provide significant learning opportunities in various topics of Human Computer Interaction. A number of such platforms are currently available; in this paper, we evaluated OpenNI. OpenNI and its libraries can be very effective, robust, and flexible in providing natural user interface functionality, which can be used to give students hands-on experience with this gesture-based, natural user interaction technology. Since gesture-based interaction technologies are becoming a standard part of commercial systems, IT students will benefit from integrating this technology into their education, such as in HCI courses, in order to learn how to keep up with advancements in this exciting technology.
5. REFERENCES
[1] Lunt, B., Ekstrom, J., Reichgelt, H., et al. IT 2008. Communications of the ACM 53, 12, 133.
[2] Birmingham, H. P. Human Factors in Electronics - Historical Sketch. Proceedings of the IRE 50, 5 (1962), 1116-1117.
[3] Sanders, M. S. and McCormick, E. J. Human Factors in Engineering and Design. McGraw-Hill, New York, 1987.
[4] Hornbuckle, G. D. The Computer Graphics User/Machine Interface. IEEE Transactions on Human Factors in Electronics HFE-8, 1 (1967), 17-20.
[5] Davis, M. R. and Ellis, T. O. The RAND tablet: a man-machine graphical communication device. In Proceedings of the October 27-29, 1964, Fall Joint Computer Conference, Part I (San Francisco, California, 1964). ACM.
[6] Lea, W. Establishing the value of voice communication with computers. IEEE Transactions on Audio and Electroacoustics 16, 2 (1968), 184-197.
[7] Saffer, D. Designing Gestural Interfaces. O'Reilly Media, Inc., Sebastopol, 2008.
[8] PrimeSense Ltd. The PrimeSensor (TM) Reference Design 1.08. http://www.primesense.com/files/FMF_2.PDF (Last accessed: April 2011)
[9] Zalevsky, Z., Shpunt, A., Maizels, A., et al. Method and System for Object Reconstruction. 2006.
[10] OpenKinect. OpenKinect Main Page. http://openkinect.org/ (Last accessed: April 2011)
[11] OpenNI. OpenNI. http://openni.org/ (Last accessed: April 2011)
[12] Code Laboratories. About: CL NUI Platform. http://codelaboratories.com/kb/nui (Last accessed: April 2011)
[13] Adafruit Industries. The Open Kinect project - THE OK PRIZE - get $3,000 bounty for Kinect for Xbox 360 open source drivers. http://www.adafruit.com/blog/2010/11/04/the-openkinect-project-the-ok-prize-get-1000-bounty-for-kinect-for-xbox360-open-source-drivers/ (Last accessed: February 2011)
[14] Giles, J. Inside the Race to Hack the Kinect. 2010. http://www.newscientist.com/article/dn19762-inside-the-race-tohack-the-kinect.html?full=true
[15] OpenKinect. OpenKinect History. http://openkinect.org/wiki/History (Last accessed: April 2011)
[16] OpenKinect. OpenKinect Policies. http://openkinect.org/wiki/Policies (Last accessed: April 2011)
[17] OpenNI. About. http://openni.org/about (Last accessed: May 2011)
[18] OpenNI. OpenNI User Guide. 2011. http://openni.org/images/stories/pdf/OpenNI_UserGuide_v3.pdf
[19] Knies, R. Academics, Enthusiasts to Get Kinect SDK. Microsoft Research. http://research.microsoft.com/enus/news/features/kinectforwindowssdk-022111.aspx (Last accessed: April 2011)
[20] Microsoft Research. Kinect for Windows SDK beta. http://research.microsoft.com/enus/um/redmond/projects/kinectsdk/ (Last accessed: May 2011)