
Found 2038 Articles for R Programming

468 Views
An R data frame contains columns that might represent similar types of variables; therefore, we might want to find the sum of the values in each column and compare the columns based on those sums. This can be done with the sum function, but first we need to extract each column.
Example
Consider the below data frame:
> set.seed(1)
> df
           x1         x2         x3        x4        x5       x6       x7
1 -0.62645381 1.41897737 0.83547640 3.9016178 1.4313313 1.879633 2.494043
2  0.18364332 1.28213630 0.74663832 1.4607600 1.8648214 2.542116 4.343039
3 ...
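A minimal sketch of the idea, assuming a small all-numeric data frame. The column definitions here are hypothetical and are not the ones from the article, whose assignments are not shown. It extracts one column with `$` and sums it, then sums every column at once:

```r
# Hypothetical data frame; the original article's column definitions are not shown
set.seed(1)
df <- data.frame(
  x1 = rnorm(5),
  x2 = rnorm(5, mean = 1),
  x3 = rnorm(5, mean = 2)
)

# Extract a single column with $ and sum it
sum_x1 <- sum(df$x1)

# Apply sum to every column; colSums(df) gives the same result for numeric data
all_sums <- sapply(df, sum)
print(all_sums)
```

Once the sums are in one named vector, the columns can be compared directly, for example with `sort(all_sums)`.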

296 Views
Most of the time, the format of the data we get is not the one we are looking for; therefore, we need to change it according to our needs. When the levels of categorical variables are represented by words instead of numbers, we can convert those levels to lowercase or uppercase. Sometimes this is done just to make the information look user friendly. Mostly, we find that the values are in lowercase, so we can convert them to uppercase with the help of the sapply function.
Example
Consider the below data frame:
> df
  x1 x2 ...
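The following sketch shows one way this could look, assuming a data frame whose categorical columns are stored as character vectors (the data is made up for illustration). `sapply` applies `toupper` to each column and returns a character matrix, which is converted back to a data frame. For a true factor, changing its levels directly is an alternative:

```r
# Hypothetical data with lowercase category labels stored as character columns
df <- data.frame(
  x1 = c("a", "b", "c", "a"),
  x2 = c("low", "high", "low", "high"),
  stringsAsFactors = FALSE
)

# sapply() applies toupper() column by column and returns a character matrix
upper <- as.data.frame(sapply(df, toupper), stringsAsFactors = FALSE)
print(upper)

# For a factor, the levels themselves can be converted instead
f <- factor(c("low", "high", "low"))
levels(f) <- toupper(levels(f))
print(f)
```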

2K+ Views
If an R data frame contains a group variable that has many group levels, then finding the minimum and maximum values of a discrete or continuous variable for each group level becomes difficult. This can be done with the slice function in the dplyr package.
Consider the below data frame that has one group variable and continuous as well as discrete variables:
> set.seed(2)
> df
  x1 x2 x3 x4 x5       x6  x7 Group
1 85  8 14  7  8 2.900301 749     1
2 79  7 12  4  3 3.331022 200     2
...
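As an illustration, the sketch below assumes the dplyr package is installed and uses a small made-up data frame with one numeric column and a `Group` column. It combines `group_by` with `slice` and `which.min` / `which.max` to keep the row holding the minimum or maximum of `x1` in each group. This is one plausible form of the approach, not necessarily the article's exact code:

```r
library(dplyr)

# Hypothetical data: 12 values in three groups of four
set.seed(2)
df <- data.frame(
  x1 = sample(1:100, 12),
  Group = rep(1:3, each = 4)
)

# Row with the smallest x1 in each group
min_rows <- df %>% group_by(Group) %>% slice(which.min(x1))

# Row with the largest x1 in each group
max_rows <- df %>% group_by(Group) %>% slice(which.max(x1))

print(min_rows)
print(max_rows)
```

Because `slice` keeps whole rows, the other columns of the selected rows stay available for comparison.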

2K+ Views
When a data frame is large, we can split it into multiple parts randomly. This might be required when we want to analyze only part of the data. We can do this with the split function, using the sample function to select the values randomly.
Example
Consider the trees data in base R:
> str(trees)
'data.frame': 31 obs. of 3 variables:
 $ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
 $ Height: num 70 65 63 72 81 83 66 75 80 75 ...
 $ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
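A short sketch of a random split, using the built-in trees data. The choice of three parts and the seed are assumptions made for illustration. A vector of part labels is shuffled with `sample` and passed to `split` as the grouping factor:

```r
set.seed(123)  # arbitrary seed so the split is reproducible

# One label (1, 2 or 3) per row, shuffled into random order
labels <- sample(rep(1:3, length.out = nrow(trees)))

# split() returns a list of data frames, one per label
parts <- split(trees, labels)

# Number of rows that landed in each part
print(sapply(parts, nrow))
```

Shuffling a balanced label vector keeps the parts close to equal in size, while every row still lands in exactly one part.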

652 Views
When our data has empty values, it is difficult to perform the analysis; we might want to convert those empty values to NA so that we can count the values that are not available. This can be done by using single square brackets.
Example
Consider the below data frame that has some empty values:
> df
x1 x2 x3
1 1 2 5
2 2 2 5
3 3 2 4
4 1 2 4
5 2 4 4
6 3 4 4
7 1 4 4
8 2 4 2
9 3 2
10 1 2
11 2
12 3
13 1 4
14 2 4
15 3 4
16 4
17
18
19 2
20 1
Converting empty values to NA:
> df[df == ""] <- NA
> df
x1 x2 x3
1 1 2 5
2 2 2 5
3 3 2 4
4 1 2 4
5 2 4 4
6 3 4 4
7 1 4 4
8 2 4 2
9 3 2
10 1 2
11 2
12 3
13 1 4
14 2 4
15 3 4
16 4
17
18
19 2
20 1
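To make the single-bracket approach concrete, here is a small sketch with made-up character columns containing empty strings. `df == ""` builds a logical matrix marking the empty cells, and assigning `NA` through `[` replaces exactly those cells:

```r
# Hypothetical data frame with empty strings standing in for missing values
df <- data.frame(
  x1 = c("1", "", "3", "1"),
  x2 = c("2", "4", "", ""),
  stringsAsFactors = FALSE
)

# Replace every empty string with NA
df[df == ""] <- NA
print(df)

# Count the values that are not available in each column
na_counts <- colSums(is.na(df))
print(na_counts)
```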

657 Views
During a survey or any other form of data collection, getting all the information from all units is not always possible. Sometimes we get partial information and sometimes nothing. Therefore, it is possible that some rows in our data are completely blank and some have partial data. The blank rows can be removed, and the other empty values can be filled with methods that help to deal with missing information.
Example
Consider the below data frame; it has some missing rows and some missing values:
> df
  x1 x2 x3
1  1  2  5
2  2  2  5
...
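The sketch below illustrates both steps on a made-up data frame. Rows in which every value is `NA` are dropped first. The remaining gaps are then filled with the column mean, which is only one assumed choice of imputation method and is not taken from the article:

```r
# Hypothetical data: row 2 is completely blank, rows 3 and 4 are partial
df <- data.frame(
  x1 = c(1, NA, 3, NA),
  x2 = c(2, NA, NA, 4),
  x3 = c(5, NA, 4, 2)
)

# A row is blank when the number of NAs equals the number of columns
all_blank <- rowSums(is.na(df)) == ncol(df)
df_clean <- df[!all_blank, ]

# Assumed imputation method: replace remaining NAs with the column mean
df_filled <- df_clean
df_filled[] <- lapply(df_clean, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})
print(df_filled)
```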

1K+ Views
Selection of columns in R is generally done with the column number or with the column name and the $ operator. We can also select columns by a partial or complete name string without using the $ operator. This can be done with the select and matches functions of the dplyr package.
Example
Loading dplyr package:
> library(dplyr)
Consider the BOD data in base R:
> str(BOD)
'data.frame': 6 obs. of 2 variables:
 $ Time : num 1 2 3 4 5 7
 $ demand: num 8.3 10.3 19 16 15.6 19.8
 - attr(*, "reference")= chr "A1.4, p. 270"
Selecting the column of BOD ...
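A brief sketch of selecting by name string, assuming dplyr is installed. The patterns `"dem"` and `"Time"` are chosen here for illustration. `matches` treats its argument as a regular expression and, by default, ignores case:

```r
library(dplyr)

# Select by a partial name: "dem" matches the column "demand"
partial <- BOD %>% select(matches("dem"))

# Select by the complete name
complete <- BOD %>% select(matches("Time"))

print(names(partial))
print(names(complete))
```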

567 Views
Comparison is an important part of data analysis: sometimes we compare a variable with a variable, a value with a value, a case or row with another case or row, or even a complete data set with another data set. This is required to check the accuracy and consistency of data values. For this purpose, we need to select the required rows, columns, etc. To select the first row for each level of a factor variable, we can use the duplicated function with the ! sign.
Example
Consider the below data frame:
> head(df, 20)
  x1 ...
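For illustration, the sketch below uses a small made-up data frame with a factor column `x1`. `duplicated` flags every repeat of a value that has already been seen, so negating it with `!` keeps only the first row for each level:

```r
# Hypothetical data: factor x1 with levels A, B and C
df <- data.frame(
  x1 = factor(c("A", "A", "B", "B", "C")),
  x2 = 1:5
)

# Keep the rows where the level of x1 appears for the first time
first_rows <- df[!duplicated(df$x1), ]
print(first_rows)
```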

220 Views
To check the trend of all columns of a data frame, we need to create line charts for all of those columns. These line charts help us to understand how data points fall or rise for the columns. Once we know the trend, we can try to find out the reasons behind them and take appropriate actions. We can plot line charts for each of the columns by using the plot.ts function, which plots data as a time series.
Example
Consider the below data frame.
> set.seed(1)
> head(df, 20)
  x1 x2 x3 x4 x5 x6
...
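A minimal sketch, assuming a made-up numeric data frame with three columns. `plot.ts` coerces the data frame to a multivariate time series and, by default, draws one line-chart panel per column (up to ten columns):

```r
# Hypothetical data frame; the original article's columns are not shown
set.seed(1)
df <- data.frame(
  x1 = rnorm(20),
  x2 = rpois(20, lambda = 5),
  x3 = runif(20)
)

# One panel per column, with the row index on the horizontal axis
plot.ts(df, main = "Line chart for each column")
```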

390 Views
While doing data exploration in an analytical project, we sometimes need to find the index of some values, mostly the indices of the minimum and maximum values, to check whether the corresponding data row has some crucial information or whether we may neglect it. Also, these values are sometimes transformed to other values based on the data characteristics if we don’t want to neglect them.
Example
> which(x==min(x))
[1] 1
> which(x==max(x))
[1] 25
> set.seed(2)
> x1
[1] 85 79 70 6 32 8 17 93 81 76 41 50 75 65 3 80 96 50 55
[20] 63 8 33 ...
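The sketch below uses a made-up vector in which the minimum and the maximum each occur twice. It shows that `which(x == min(x))` returns every matching index, whereas `which.min` returns only the first:

```r
# Hypothetical vector with tied minimum (3) and tied maximum (25)
x <- c(12, 3, 25, 3, 18, 25)

# All positions of the minimum and maximum values
min_idx <- which(x == min(x))
max_idx <- which(x == max(x))
print(min_idx)
print(max_idx)

# which.min() and which.max() report only the first occurrence
first_min <- which.min(x)
first_max <- which.max(x)
```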