R Data Structures and Algorithms
R Data Structures and Algorithms - Dr. PKS Prakash
Table of Contents
R Data Structures and Algorithms
Credits
About the Authors
Acknowledgments
About the Reviewer
www.PacktPub.com
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Getting Started
Introduction to data structure
Abstract data type and data structure
Relationship between problem and algorithm
Basics of R
Installation of R
Basic data types in R
Operations in R
Control structures in R
If condition
If...else condition
Ifelse function
For() loop
Nested for( ) loop
While loop
Special statements in loops
Break statement
Next statement
Repeat loop
First class functions in R
Exercises
Summary
2. Algorithm Analysis
Getting started with data structure
Memory management in R
System runtime in R
Best, worst, and average cases
Computer versus algorithm
Algorithm asymptotic analysis
Upper bounds or Big O notation
Lower bounds or Big Omega notation (Ω)
Big θ notation
Simplifying rules
Classifying rules
Computation evaluation of a program
Component 1 - Assignment operator
Component 2 - Simple loop
Component 3 - Complex loop
Component 4 - Loops with conditional statements
Component 5 - Recursive statements
Analyzing problems
Space bounds
Exercises
Summary
3. Linked Lists
Data types in R
Vector and atomic vector
Element data types
Factor
Matrix
Array
Dataframes
List
Object-oriented programming using R
Linked list
Linear linked list
Doubly linked list
Circular linked list
Array-based list
Analysis of list operations
Exercises
Summary
4. Stacks and Queues
Stacks
Array-based stacks
Linked stacks
Comparison of array-based and linked stacks
Implementing recursion
Queues
Array-based queues
Linked queues
Comparison of array-based and linked queues
Dictionaries
Exercises
Summary
5. Sorting Algorithms
Sorting terminology and notation
Three Θ(n²) sorting algorithms
Insertion sort
Bubble sort
Selection sort
The cost of exchange sorting
Shell sort
Merge sort
Quick sort
Heap sort
Bin sort and radix sort
An empirical comparison of sorting algorithms
Lower bounds for sorting
Exercises
Summary
6. Exploring Search Options
Searching unsorted and sorted vectors
Self-organizing lists
Heuristic 1 - Count
Heuristic 2 - Move-to-front
Heuristic 3 - Transpose
Hashing
Hash functions
Open hashing
Closed hashing
Bucket hashing
Linear probing
Analysis of closed hashing
Deletion
Exercises
Summary
7. Indexing
Linear indexing
ISAM
Tree-based indexing
2-3 trees
B-trees
B+ trees
B-tree analysis
Exercises
Summary
8. Graphs
Terminology and representations
Graph implementations
Graph traversals
Depth-first search
Breadth-first search
Topological sort
Shortest path problems
Single-source shortest paths
Minimum-cost spanning tree
Prim's algorithm
Kruskal's algorithm
Exercises
Summary
9. Programming and Randomized Algorithms
Dynamic programming
The knapsack problem
All pairs shortest paths
Randomized algorithms
Randomized algorithms for finding large values
Skip lists
Probabilistic analysis of skip lists
Exercises
Summary
10. Functional Data Structures
Functional data structure
Lazy evaluation
Functional stacks
Functional queues
Fast fully-persistent queues
Slowly-persistent queues and deques
Summary
R Data Structures and Algorithms
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2016
Production reference: 1141116
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78646-515-3
www.packtpub.com
Credits
About the Authors
Dr. PKS Prakash has pursued his PhD in industrial and system engineering at Wisconsin-Madison, US. He defended his second PhD in engineering from University of Warwick, UK. He has provided data science support to numerous leading companies in healthcare, manufacturing, pharmaceutical, and e-commerce domains on a wide range of business problems related to predictive and prescriptive modeling, virtual metrology, predictive maintenance, root cause analysis, process simulations, fraud detection, early warning systems, and so on. Currently, he is working as the Vice President and Practice Lead for data science at Dream11. Dream11 offers the world's largest fantasy cricket, football, and kabaddi games of skill. He has published widely in research areas of operational research and management, soft computing tools, and advanced algorithms in the manufacturing and healthcare domains in leading journals such as IEEE-Trans, EJOR, and IJPR, among others. He has contributed a chapter in Evolutionary Computing in Advanced Manufacturing and edited an issue of Intelligent Approaches to Complex Systems.
Achyutuni Sri Krishna Rao has an MS in enterprise business analytics (data science) from National University of Singapore. He has worked on a wide range of data science problems in the domain of manufacturing, healthcare and pharmaceuticals. He is an R enthusiast and loves to contribute to the open source community. His passions include freelancing, technical blogs (http://rcodeeasy.blogspot.com) and marathon runs. Currently he works as a data science consultant at a leading consulting firm.
Acknowledgments
The completion of this book wouldn't have been possible without the participation and assistance of so many people whose names may not all be enumerated. Their contribution is sincerely appreciated and gratefully acknowledged. However, we would like to express our deep appreciation and indebtedness to all the members of the Packt team involved in this project. The book grew out of an idea raised in an early discussion with Denim Pinto (acquisition editor), so we want to extend special thanks to him; without his input, this book would never have happened. We also want to thank Pooja Mhapsekar and Siddhi Chavan (content editors) and Sunith Shetty (technical editor) for ensuring the timely publication of this book. We would like to extend thanks to Vahid Mirjalili (reviewer), whose feedback has helped us tremendously in improving this book.
About the Reviewer
Dr. Vahid Mirjalili is a software engineer/data scientist, currently working toward his PhD in computer science at Michigan State University. His research at the Integrated Pattern Recognition and Biometrics (i-PRoBE) lab involves attribute classification of face images from large image datasets. Furthermore, he teaches Python programming, as well as computing concepts for data analysis and databases. With his specialty in data mining, he is very interested in predictive modeling and getting insights from data. He is also a Python developer and likes to contribute to the open source community. Furthermore, he enjoys making tutorials for different areas of data science and computer algorithms, which can be found in his GitHub repository (http://github.com/mirjalil/DataScience).
www.PacktPub.com
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Preface
Data structures represent a way to organize and access particular data efficiently. They are critical to problem solving and provide a foundation for writing reusable code. R Data Structures and Algorithms aims to strengthen the data structure skills of R users working in the analytics and intelligence domains. R is a well-designed language and environment for statistical computing and graphics; it is similar to the S language and environment, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies). This book will help users design algorithms that are optimized for computational efficiency and resource usage. It presents the process of building algorithms by introducing several data structures and their relationship with algorithms, followed by their analysis and evaluation. The book covers not only the classical data structures but also the intricacies of functional data structures. We will cover the fundamentals of data structures, such as lists, stacks, queues, and dictionaries, followed by topics such as indexing, sorting, and searching in depth. Readers will also be exposed to advanced topics such as graphs, dynamic programming, and randomized algorithms.
The objective of this book is to build data structure concepts using R.
What this book covers
Chapter 1, Getting Started, introduces the key aspects of data structures, explains why they are important, and covers the basics of R.
Chapter 2, Algorithm Analysis, talks about motivation, basic notation, and fundamental techniques for algorithm analysis.
Chapter 3, Linked Lists, builds a foundation of linked lists and will cover multiple variants of linked lists, such as linear linked lists, doubly linked lists, and circular linked lists.
Chapter 4, Stacks and Queues, introduces you to array-based and linked list-based stacks and queues and their implementation in R.
Chapter 5, Sorting Algorithms, explains various sorting algorithms, such as insertion sort, bubble sort, selection sort, and shell sort, and provides an empirical comparison between different algorithms.
Chapter 6, Exploring Search Options, provides details about search operations carried out on both vectors and lists, including linked lists. It also introduces you to self-organizing lists and hashing concepts.
Chapter 7, Indexing, covers indexing concepts, which are essential for structuring files and organizing large amounts of data on disk. It will also cover ISAM, 2-3 trees, B-trees, and B+ trees in detail.
Chapter 8, Graphs, builds a foundation for the graph data structure and its implementation. It also covers various algorithms for traversals, shortest-paths problems, and minimum-cost spanning trees in detail.
Chapter 9, Programming and Randomized Algorithms, extends the concept of static data structures to randomized data structures, such as randomized skip lists. The chapter will also introduce dynamic programming concepts and several of their applications.
Chapter 10, Functional Data Structures, introduces you to functional data structures and lazy evaluation. It will also cover functional stacks and queues in R.
What you need for this book
You will need inquisitiveness, perseverance, and a passion for algorithm design and data science. The scope and applications of data structures are broad.
You will need a good understanding of R or another programming language. Preliminary experience of programming and data analysis will be helpful as well. You will need to appreciate algorithms that can be applied at scale to build applications.
Who this book is for
This book is for R developers who want to use data structures efficiently. Basic knowledge of R is expected.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: It also enables us to install and compile new R packages directly from the R console using the install.packages() command.
A block of code is set as follows:
if (test expression)
{
Statement upon condition is true
}
Any command-line input or output is written as follows:
pip3 install --upgrade pip
pip3 install jupyter
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: To start a new R Notebook, click on right hand side New tab and select R kernel as shown in Figure 1.7.
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
1. Log in or register to our website using your e-mail address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code files.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/R-Data-Structures-and-Algorithms. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Downloading the color images of this book
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/RDataStructuresandAlgorithms_ColorImages.pdf.
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at [email protected] with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
Questions
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
Chapter 1. Getting Started
Fast and efficient information retrieval is the primary objective of most computer programs. Data structures and algorithms help us achieve this objective by processing and retrieving data faster. Information retrieval can easily be integrated with algorithms to answer inferential questions from data such as:
How are sales increasing over time?
What is the distribution of customer arrivals over time?
Out of all the customers who visit between 3:00 and 6:00 PM, how many order Asian versus Chinese?
Of all the customers visiting, how many are from the same city?
Data structures and the algorithms used for data retrieval play a significant role in processing the preceding queries efficiently, especially in big data scenarios. This book will introduce primary data structures such as lists, queues, and stacks, which are used for information retrieval, and discuss the trade-offs between different data structures. We will also introduce approaches for evaluating data structures and the algorithms that retrieve and process the data they hold.
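As a rough illustration, two of the questions listed above can be answered in base R on a small order log. This is a minimal sketch: the orders data frame, its column names, and all of its values are made up for this example and do not come from the book.

```r
# Hypothetical order log; every column name and value is invented for illustration.
orders <- data.frame(
  customer = c("c1", "c2", "c3", "c4", "c5", "c6"),
  city     = c("Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune"),
  hour     = c(14, 15, 16, 17, 19, 15),   # hour of visit on a 24-hour clock
  cuisine  = c("Asian", "Chinese", "Chinese", "Asian", "Chinese", "Chinese"),
  stringsAsFactors = FALSE
)

# Customers who visited between 3:00 PM and 6:00 PM, counted by cuisine
afternoon <- orders[orders$hour >= 15 & orders$hour < 18, ]
cuisine_counts <- table(afternoon$cuisine)

# Number of visiting customers per city
city_counts <- table(orders$city)

print(cuisine_counts)
print(city_counts)
```

On six rows any approach is fast; the choice of data structure starts to matter when the same filters and counts run over millions of records.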
Algorithms are evaluated on complexity and efficiency: complexity here concerns how easy the algorithm's design is to program and debug, while efficiency concerns whether the algorithm uses computer resources optimally. This book will focus on the efficiency of algorithms that use data structures, and the current chapter introduces the importance of data structures and of the algorithms used to retrieve data from them.
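One way to see the efficiency side in practice is to time two functions that compute the same result. The sketch below uses base R's system.time(); the function names squares_grow and squares_vec are hypothetical, and the measured times depend on the machine, so none are quoted here.

```r
# Two ways to build the vector 1^2, 2^2, ..., n^2.
n <- 1e4

# Version A: grow the result one element at a time. Each call to c()
# creates a new, longer vector, so earlier elements are copied repeatedly.
squares_grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Version B: a single vectorized operation over the whole sequence.
squares_vec <- function(n) seq_len(n)^2

time_grow <- system.time(a <- squares_grow(n))
time_vec  <- system.time(b <- squares_vec(n))

# Both versions return the same values; only the resources used differ.
print(identical(a, b))
print(time_grow)
print(time_vec)
```

Both versions are easy to write and debug, so they differ mainly in efficiency, which is the criterion this book concentrates on.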
Introduction to data structure
In 1965, Gordon Moore observed that the number of transistors per square inch in a dense integrated circuit (IC) had doubled every year since its invention, thus enhancing computational power per computer; this observation is known as Moore's law. He revised his forecast in 1975, stating that the number of transistors would double every 2 years, instead of every year, due to saturation:
Figure 1.1: Moore's law (Ref: data credit - Transistor count, Wikipedia)
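To get a feel for the difference between the two doubling rates, the growth factor over a fixed period can be computed directly. The 10-year horizon below is an arbitrary choice for illustration, not a figure from the text.

```r
# Growth factor of the transistor count over a fixed horizon,
# under the two doubling periods mentioned above.
years <- 10                         # illustrative horizon (an assumption)

growth_yearly   <- 2^(years / 1)    # doubling every year
growth_biennial <- 2^(years / 2)    # doubling every 2 years

print(growth_yearly)    # 1024
print(growth_biennial)  # 32
```

Over the same decade, yearly doubling gives a 1024-fold increase, while doubling every 2 years gives a 32-fold increase.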
Although computational power has been increasing, problem complexity and the number of data sources have also been increasing exponentially over the decade, reinforcing the need for efficient algorithms:
Figure 1.2: Increase in size of unstructured data (Ref: Enterprise strategy group 2010)
This explosion in data from 2008 to 2015 has led to the new field of data science, where people put a lot of effort into deriving insights from all kinds of datasets: structured, semi-structured, and unstructured. To deal with data at scale, it is therefore important to store and retrieve datasets efficiently. For example, searching for a word in a dictionary would take a long time if the entries were randomly organized; thus, sorted list data structures are used to ensure a faster search of words. Similarly, searching for an optimal path in a city based on the input location requires data about the road network, position, and so on to be stored in the form of geometries. Strictly speaking, even a variable stored as a built-in data type such as character, integer, or float can be considered a data structure of scalar nature. However, a data structure is formally defined as a scheme for organizing related information in a computer so that it can be used efficiently.
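The dictionary example can be sketched in R by comparing a linear scan with a binary search. Binary search is one standard way to exploit sorted data; the function names linear_search and binary_search and the word list are hypothetical choices for this sketch and are not taken from the book.

```r
# Linear search: inspect elements one by one; works on any ordering.
linear_search <- function(x, target) {
  for (i in seq_along(x)) {
    if (x[i] == target) return(i)
  }
  NA_integer_   # not found
}

# Binary search: halve the search range at each step; requires x to be sorted.
binary_search <- function(x, target) {
  lo <- 1L
  hi <- length(x)
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L
    if (x[mid] == target) return(mid)
    if (x[mid] < target) lo <- mid + 1L else hi <- mid - 1L
  }
  NA_integer_   # not found
}

# A tiny "dictionary", sorted so that binary search applies.
words <- sort(c("stack", "queue", "list", "graph", "tree", "hash", "heap"))

print(linear_search(words, "queue"))   # 5
print(binary_search(words, "queue"))   # 5
print(binary_search(words, "array"))   # NA
```

A linear scan may need up to n comparisons for n words, whereas binary search on the sorted vector needs roughly log2(n), which is why keeping the dictionary sorted pays off as it grows.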
Similarly, for algorithms, given sufficient space and time, any dataset can be stored and processed to