R Data Structures and Algorithms
Ebook, 519 pages, 3 hours

About this ebook

This book is for R developers who want to use data structures efficiently. Basic knowledge of R is expected.
Language: English
Publisher: Packt Publishing
Release date: Nov 21, 2016
ISBN: 9781786464163
    R Data Structures and Algorithms - Dr. PKS Prakash

    Table of Contents

    R Data Structures and Algorithms

    Credits

    About the Authors

    Acknowledgments

    About the Reviewer

    www.PacktPub.com

    Why subscribe?

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Downloading the color images of this book

    Errata

    Piracy

    Questions

    1. Getting Started

    Introduction to data structure

    Abstract data type and data structure

    Relationship between problem and algorithm

    Basics of R

    Installation of R

    Basic data types in R

    Operations in R

    Control structures in R

    If condition

    If...else condition

    Ifelse function

    For() loop

    Nested for( ) loop

    While loop

    Special statements in loops

    Break statement

    Next statement

    Repeat loop

    First class functions in R

    Exercises

    Summary

    2. Algorithm Analysis

    Getting started with data structure

    Memory management in R

    System runtime in R

    Best, worst, and average cases

    Computer versus algorithm

    Algorithm asymptotic analysis

    Upper bounds or Big O notation

    Lower bounds or Big Omega notation (Ω)

    Big θ notation

    Simplifying rules

    Classifying rules

    Computation evaluation of a program

    Component 1 - Assignment operator

    Component 2 - Simple loop

    Component 3 - Complex loop

    Component 4 - Loops with conditional statements

    Component 5 - Recursive statements

    Analyzing problems

    Space bounds

    Exercises

    Summary

    3. Linked Lists

    Data types in R

    Vector and atomic vector

    Element data types

    Factor

    Matrix

    Array

    Dataframes

    List

    Object-oriented programming using R

    Linked list

    Linear linked list

    Doubly linked list

    Circular linked list

    Array-based list

    Analysis of list operations

    Exercises

    Summary

    4. Stacks and Queues

    Stacks

    Array-based stacks

    Linked stacks

    Comparison of array-based and linked stacks

    Implementing recursion

    Queues

    Array-based queues

    Linked queues

    Comparison of array-based and linked queues

    Dictionaries

    Exercises

    Summary

    5. Sorting Algorithms

    Sorting terminology and notation

    Three Θ(n²) sorting algorithms

    Insertion sort

    Bubble sort

    Selection sort

    The cost of exchange sorting

    Shell sort

    Merge sort

    Quick sort

    Heap sort

    Bin sort and radix sort

    An empirical comparison of sorting algorithms

    Lower bounds for sorting

    Exercises

    Summary

    6. Exploring Search Options

    Searching unsorted and sorted vectors

    Self-organizing lists

    Heuristic 1 - Count

    Heuristic 2 - Move-to-front

    Heuristic 3 - Transpose

    Hashing

    Hash functions

    Open hashing

    Closed hashing

    Bucket hashing

    Linear probing

    Analysis of closed hashing

    Deletion

    Exercises

    Summary

    7. Indexing

    Linear indexing

    ISAM

    Tree-based indexing

    2-3 trees

    B-trees

    B+ trees

    B-tree analysis

    Exercises

    Summary

    8. Graphs

    Terminology and representations

    Graph implementations

    Graph traversals

    Depth-first search

    Breadth-first search

    Topological sort

    Shortest path problems

    Single-source shortest paths

    Minimum-cost spanning tree

    Prim's algorithm

    Kruskal's algorithm

    Exercises

    Summary

    9. Programming and Randomized Algorithms

    Dynamic programming

    The knapsack problem

    All pairs shortest paths

    Randomized algorithms

    Randomized algorithms for finding large values

    Skip lists

    Probabilistic analysis of skip lists

    Exercises

    Summary

    10. Functional Data Structures

    Functional data structure

    Lazy evaluation

    Functional stacks

    Functional queues

    Fast fully-persistent queues

    Slowly-persistent queues and deques

    Summary

    R Data Structures and Algorithms

    Copyright © 2016 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: November 2016

    Production reference: 1141116

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham 

    B3 2PB, UK.

    ISBN 978-1-78646-515-3

    www.packtpub.com

    Credits

    About the Authors

    Dr. PKS Prakash has pursued his PhD in industrial and system engineering at Wisconsin-Madison, US. He defended his second PhD in engineering from University of Warwick, UK. He has provided data science support to numerous leading companies in healthcare, manufacturing, pharmaceutical, and e-commerce domains on a wide range of business problems related to predictive and prescriptive modeling, virtual metrology, predictive maintenance, root cause analysis, process simulations, fraud detection, early warning systems, and so on. Currently, he is working as the Vice President and Practice Lead for data science at Dream11. Dream11 offers the world's largest fantasy cricket, football, and kabaddi games of skill. He has published widely in research areas of operational research and management, soft computing tools, and advanced algorithms in the manufacturing and healthcare domains in leading journals such as IEEE-Trans, EJOR, and IJPR, among others. He has contributed a chapter in Evolutionary Computing in Advanced Manufacturing and edited an issue of Intelligent Approaches to Complex Systems.

    Achyutuni Sri Krishna Rao has an MS in enterprise business analytics (data science) from National University of Singapore. He has worked on a wide range of data science problems in the domain of manufacturing, healthcare and pharmaceuticals. He is an R enthusiast and loves to contribute to the open source community. His passions include freelancing, technical blogs (http://rcodeeasy.blogspot.com) and marathon runs. Currently he works as a data science consultant at a leading consulting firm.

    Acknowledgments

    The completion of this book wouldn't have been possible without the participation and assistance of so many people whose names may not all be enumerated. Their contribution is sincerely appreciated and gratefully acknowledged. However, we would like to express our deep appreciation and indebtedness to all the members of the Packt team involved in this project. The idea for the book arose from an early discussion with Denim Pinto (acquisition editor), so we want to extend special thanks to him; without his input, this book would never have happened. We also want to thank Pooja Mhapsekar and Siddhi Chavan (content editor) and Sunith Shetty (technical editor) for ensuring the timely publication of this book. We would like to extend thanks to Vahid Mirjalili (reviewer), whose feedback has helped us tremendously improve this book.

    About the Reviewer

    Dr. Vahid Mirjalili is a software engineer/data scientist, currently working toward his PhD in computer science at Michigan State University. His research at the Integrated Pattern Recognition and Biometrics (i-PRoBE) lab involves attribute classification of face images from large image datasets. Furthermore, he teaches Python programming, as well as computing concepts for data analysis and databases. With his specialty in data mining, he is very interested in predictive modeling and getting insights from data. He is also a Python developer and likes to contribute to the open source community. Furthermore, he enjoys making tutorials for different areas of data science and computer algorithms, which can be found in his GitHub repository (http://github.com/mirjalil/DataScience).

    www.PacktPub.com

    For support files and downloads related to your book, please visit www.PacktPub.com.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    www.PacktPub.com

    https://www.packtpub.com/mapt

    Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Preface

    Data structures represent a way to organize and access particular data efficiently. They are critical to problem solving and provide a foundation for implementing reusable code. R Data Structures and Algorithms aims to strengthen data structure skills among R users working in the analytics and intelligence domains. R is a language and environment for statistical computing and graphics; it is similar to the S language, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies). This book will allow users to design algorithms that are optimized for computational efficiency and resource usage. It lays out the process of building algorithms by introducing several data structures and their relationship with algorithms, followed by their analysis and evaluation. The book covers not only the classical data structures but also the intricacies of functional data structures. We will cover the fundamentals of data structures, such as lists, stacks, queues, and dictionaries, followed by in-depth coverage of topics such as indexing, sorting, and searching. Readers will also be exposed to advanced topics such as graphs, dynamic programming, and randomized algorithms.

    The objective of this book is to build data structure concepts using R.

    What this book covers

    Chapter 1, Getting Started, builds a background in data structures, explains why they are important, and introduces the basics of R.

    Chapter 2, Algorithm Analysis, talks about motivation, basic notation, and fundamental techniques for algorithm analysis.

    Chapter 3, Linked Lists, builds a foundation of linked lists and will cover multiple variants of linked lists, such as linear linked lists, doubly linked lists, and circular linked lists.

    Chapter 4, Stacks and Queues, introduces you to array-based and linked list-based stacks and queues and their implementation in R.

    Chapter 5, Sorting Algorithms, explains various sorting algorithms, such as insertion sort, bubble sort, selection sort, and shell sort, and provides an empirical comparison between different algorithms.

    Chapter 6, Exploring Search Options, provides details about search operations carried out on both vectors and lists, including linked lists. It also introduces you to self-organizing lists and hashing concepts.

    Chapter 7, Indexing, covers indexing concepts, which are essential for structuring files and organizing large amounts of data on disk. It will also cover ISAM, 2-3 trees, B-trees, and B+ trees in detail.

    Chapter 8, Graphs, builds a foundation for the graph data structure and its implementation. It also covers various algorithms for traversals, shortest-paths problems, and minimum-cost spanning trees in detail.

    Chapter 9, Programming and Randomized Algorithms, extends the concept of static data structures to randomized data structures, such as randomized skip lists. The chapter also introduces dynamic programming concepts and several of their applications.

    Chapter 10, Functional Data Structures, introduces you to functional data structures and lazy evaluation. It will also cover functional stacks and queues in R.

    What you need for this book

    You will need inquisitiveness, perseverance, and a passion for algorithm design and data science. The scope and application of data structures are quite broad.

    You will need a good understanding of R or another programming language. Preliminary experience of programming and data analysis will be helpful as well. You will need to appreciate algorithms that can be applied at scale to build applications.

    Who this book is for

    This book is for R developers who want to use data structures efficiently. Basic knowledge of R is expected.

    Conventions

    In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: It also enables us to install and compile new R packages directly from the R console using the install.packages() command.

    A block of code is set as follows:

        if (test_expression) {
          # statement executed when the condition is TRUE
        }

    Any command-line input or output is written as follows:

        pip3 install --upgrade pip
        pip3 install jupyter

    New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: To start a new R Notebook, click on right hand side New tab and select R kernel as shown in Figure 1.7.

    Note

    Warnings or important notes appear in a box like this.

    Tip

    Tips and tricks appear like this.

    Reader feedback

    Feedback from our readers is always welcome. Let us know what you think about this book, including what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

    Customer support

    Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

    Downloading the example code

    You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

    You can download the code files by following these steps:

    1. Log in or register to our website using your e-mail address and password.

    2. Hover the mouse pointer on the SUPPORT tab at the top.

    3. Click on Code Downloads & Errata.

    4. Enter the name of the book in the Search box.

    5. Select the book for which you're looking to download the code files.

    6. Choose from the drop-down menu where you purchased this book.

    7. Click on Code Download.

    Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

    WinRAR / 7-Zip for Windows

    Zipeg / iZip / UnRarX for Mac

    7-Zip / PeaZip for Linux

    The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/R-Data-Structures-and-Algorithms. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Downloading the color images of this book

    We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/RDataStructuresandAlgorithms_ColorImages.pdf.

    Errata

    Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

    To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

    Piracy

    Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

    Please contact us at [email protected] with a link to the suspected pirated material.

    We appreciate your help in protecting our authors and our ability to bring you valuable content.

    Questions

    If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.

    Chapter 1. Getting Started

    Fast and efficient information retrieval is the primary objective of most computer programs. Data structures and algorithms help us achieve this objective by processing and retrieving data faster. Information retrieval can easily be integrated with algorithms to answer inferential questions from data such as:

    How are sales increasing over time?

    What is the customers' arrival distribution over time?

    Out of all the customers who visit between 3:00 and 6:00 PM, how many order Asian versus Chinese?

    Of all the customers visiting, how many are from the same city?

    The data structures and algorithms used for data retrieval play a significant role in processing the preceding queries efficiently, especially in big data scenarios. This book will introduce primary data structures such as lists, queues, and stacks, which are used for information retrieval, along with the trade-offs between different data structures. We will also introduce approaches for evaluating data structures and the algorithms used to retrieve and process the data they hold.
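
    As a rough illustration of the kind of queries listed above, the following sketch answers three of them on a tiny, made-up order log. The data frame orders and all of its column names and values are hypothetical and are not taken from the book; they only show how such questions map onto simple retrieval operations in base R.

```r
# Hypothetical order log; every column name and value here is invented
# purely for illustration.
orders <- data.frame(
  customer     = c("a", "b", "c", "d", "e", "f"),
  arrival_hour = c(14, 15, 16, 17, 17, 19),  # 24-hour clock
  cuisine      = c("Asian", "Chinese", "Asian", "Chinese", "Chinese", "Asian"),
  city         = c("Pune", "Pune", "Delhi", "Pune", "Delhi", "Pune"),
  stringsAsFactors = FALSE
)

# Customers arriving from 3:00 PM (inclusive) up to 6:00 PM (exclusive).
afternoon <- orders[orders$arrival_hour >= 15 & orders$arrival_hour < 18, ]

# Of those afternoon customers, how many ordered each cuisine?
cuisine_counts <- table(afternoon$cuisine)

# How many customers come from each city?
city_counts <- table(orders$city)

# Arrival distribution over time: number of customers per arrival hour.
arrivals_by_hour <- table(orders$arrival_hour)

print(cuisine_counts)
print(city_counts)
print(arrivals_by_hour)
```

    On six rows any approach is fast; the choice of data structure starts to matter when the same filters and counts have to run over millions of records.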

    Algorithms are evaluated based on complexity and efficiency, where complexity concerns how easy an algorithm design is to program and debug, and efficiency ensures that the algorithm uses computer resources optimally. This book will focus on the efficiency of algorithms using data structures, and the current chapter introduces the importance of data structures and the algorithms used to retrieve data from them.
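
    To make the idea of efficiency concrete, here is a minimal sketch (not taken from the book) of two ways to compute the sum of the first n positive integers. The function names sum_loop and sum_formula are hypothetical. Both return the same answer, but the first does work proportional to n while the second does a fixed amount of work.

```r
# Approach 1: add the numbers one at a time; the work grows linearly with n.
sum_loop <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i
  }
  total
}

# Approach 2: closed-form formula n(n + 1) / 2; constant work regardless of n.
sum_formula <- function(n) {
  n * (n + 1) / 2
}

# Both approaches agree on the result.
print(sum_loop(100))
print(sum_formula(100))
```

    Wrapping each call in base R's system.time(), for example system.time(sum_loop(1e7)), is one way to observe the difference in running time; the exact timings depend on the machine.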

    Introduction to data structure

    In 1965, Gordon Moore observed that the number of transistors per square inch in a dense integrated circuit (IC) had doubled every year since its invention, thus enhancing computational power per computer; this observation is known as Moore's law. He revised his forecast in 1975, stating that the number of transistors would double every 2 years, instead of every year, due to saturation:

    Figure 1.1: Moore's law (Ref: data credit - Transistor count, Wikipedia)

    Although computational power has been increasing, problem complexity and the number of data sources have also been increasing exponentially over the decade, reinforcing the need for efficient algorithms:

    Figure 1.2: Increase in size of unstructured data (Ref: Enterprise strategy group 2010)

    This explosion in data from 2008 to 2015 has led to the new field of data science, where people put a lot of effort into deriving insights from all kinds of datasets: structured, semi-structured, and unstructured. Thus, to deal with data scalability, it is important to store and retrieve datasets efficiently. For example, searching for a word in a dictionary would take a long time if the entries were randomly organized; thus, sorted list data structures are used to ensure a faster search for words. Similarly, searching for an optimal path in a city based on the input location requires data about the road network, position, and so on to be stored in the form of geometries. Strictly speaking, even a variable stored as a built-in data type such as character, integer, or float can be considered a data structure of scalar nature. However, a data structure is formally defined as a scheme for organizing related information in a computer so that it can be used efficiently.
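
    The dictionary example can be sketched in R as follows. This is an illustrative sketch rather than code from the book: the word list and the functions linear_search and binary_search are hypothetical. Each function returns the position of the target word together with the number of comparisons made, so the benefit of keeping the data sorted is visible directly.

```r
# Hypothetical word list; sort() provides the ordering binary search relies on.
words <- sort(c("queue", "array", "stack", "graph", "list", "tree", "heap", "hash"))

# Linear search: check each element in turn. Works on unsorted data, but may
# need to inspect every element.
linear_search <- function(x, target) {
  steps <- 0
  for (i in seq_along(x)) {
    steps <- steps + 1
    if (x[i] == target) {
      return(list(index = i, steps = steps))
    }
  }
  list(index = NA_integer_, steps = steps)
}

# Binary search: halve the candidate range at every step. Requires x to be
# sorted in the same order that `<` uses.
binary_search <- function(x, target) {
  low <- 1L
  high <- length(x)
  steps <- 0
  while (low <= high) {
    steps <- steps + 1
    mid <- (low + high) %/% 2L
    if (x[mid] == target) {
      return(list(index = mid, steps = steps))
    } else if (x[mid] < target) {
      low <- mid + 1L
    } else {
      high <- mid - 1L
    }
  }
  list(index = NA_integer_, steps = steps)
}

# Search for the last word in the sorted list with both approaches.
target <- words[length(words)]
print(linear_search(words, target))
print(binary_search(words, target))
```

    For a list of n words, the linear search may need up to n comparisons, whereas the binary search needs roughly log2(n), which is why sorted storage pays off as the dictionary grows.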

    Similarly, for algorithms, given sufficient space and time, any dataset can be stored and processed to
