R Studio Project
ABSTRACT
R is a programming language and free software environment for statistical computing and
graphics supported by the R Foundation for Statistical Computing. The R language is widely
used among statisticians and data miners for developing statistical software and data
analysis. Polls, data mining surveys, and studies of scholarly literature databases show
substantial increases in popularity; as of February 2020, R ranks 13th in the TIOBE index, a
measure of popularity of programming languages.
INTRODUCTION
HISTORY
STATISTICAL FEATURES
R has Rd, its own LaTeX-like documentation format, which is used to supply comprehensive
documentation, both online in a number of formats and in hard copy.
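As an illustration, a minimal Rd help page for a hypothetical `add2` function might look like the following sketch (the function and its documentation are invented for this example):

```
\name{add2}
\alias{add2}
\title{Add Two Numbers}
\description{
  Returns the sum of two numeric values.
}
\usage{
add2(x, y)
}
\arguments{
  \item{x}{A numeric value.}
  \item{y}{A numeric value.}
}
\examples{
add2(1, 2)
}
```

Files in this format live in a package's `man/` directory, and R renders them into plain-text, HTML, and PDF help pages.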
PROGRAMMING FEATURES
Although used mainly by statisticians and other practitioners requiring an environment for
statistical computation and software development, R can also operate as a general matrix
calculation toolbox – with performance benchmarks comparable to GNU
Octave or MATLAB.
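A short sketch of R used as a matrix calculator, in the spirit of Octave or MATLAB (the matrix and vector here are made up for illustration):

```r
# Using R as a general matrix calculation toolbox
A <- matrix(c(2, 1, 1, 3), nrow = 2)  # column-major fill: rows are (2, 1) and (1, 3)
b <- c(3, 5)

x <- solve(A, b)        # solve the linear system A %*% x == b
print(x)                # 0.8 1.4

print(A %*% A)          # matrix multiplication
print(t(A))             # transpose
print(eigen(A)$values)  # eigenvalues of A
```

Operators such as `%*%`, and functions such as `solve()` and `eigen()`, are part of base R, so no additional packages are needed for this kind of work.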
PACKAGES
A core set of packages is included with the installation of R, with more than 15,000
additional packages (as of September 2018) available at the Comprehensive R Archive
Network (CRAN), Bioconductor, Omegahat, GitHub, and other repositories.
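The split between bundled and add-on packages can be seen from within R itself; the snippet below lists the base packages and shows the usual CRAN installation pattern (`data.table` is just one example of a real CRAN package):

```r
# Packages bundled with a base R installation
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Additional packages come from CRAN, e.g.:
#   install.packages("data.table")   # download and install once
#   library(data.table)              # attach for the current session
```

`install.packages()` fetches from the configured CRAN mirror, so the commented lines need network access.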
The "Task Views" page (subject list) on the CRAN website lists a wide range of tasks (in
fields such as Finance, Genetics, High Performance Computing, Machine Learning, Medical
Imaging, Social Sciences and Spatial Statistics) to which R has been applied and for which
packages are available. R has also been identified by the FDA as suitable for interpreting data
from clinical research.
Other R package resources include Crantastic, a community site for rating and reviewing all
CRAN packages, and R-Forge, a central platform for the collaborative development of R
packages, R-related software, and projects. R-Forge also hosts many unpublished beta
packages, and development versions of CRAN packages. Microsoft maintains a daily
snapshot of CRAN that dates back to September 17, 2014.
The Bioconductor project provides R packages for the analysis of genomic data. This
includes object-oriented data-handling and analysis tools for data
from Affymetrix, cDNA microarray, and next-generation high-throughput
sequencing methods.
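Bioconductor packages are installed through the BiocManager helper, which itself comes from CRAN. The sketch below wraps the standard pattern in a function so it can be sourced without triggering a download (`limma` and `DESeq2` are real Bioconductor packages for microarray and RNA-seq analysis, respectively):

```r
# Standard Bioconductor installation pattern, wrapped so that sourcing
# this file does not start a download
setup_bioc <- function() {
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")          # helper lives on CRAN
  BiocManager::install(c("limma", "DESeq2")) # fetched from Bioconductor
}
# setup_bioc()  # run once; requires network access
```

Once installed, Bioconductor packages load with `library()` like any other R package.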
INTERFACES
Some of the more common editors with varying levels of support for R
include Emacs (Emacs Speaks Statistics), Vim (Nvim-R plugin), Neovim (Nvim-R
plugin), Kate, LyX, Notepad++, Visual Studio Code, WinEdt, and Tinn-R.
IMPLEMENTATIONS
The main R implementation is written in R, C, and Fortran, and there are several other
implementations aimed at improving speed or increasing extensibility. A closely related
implementation is pqR (pretty quick R) by Radford M. Neal with improved memory
management and support for automatic multithreading. Renjin and FastR
are Java implementations of R for use in a Java Virtual Machine. CXXR, rho, and Riposte are
implementations of R in C++. Renjin, Riposte, and pqR attempt to improve performance by
using multiple processor cores and some form of deferred evaluation. Most of these
alternative implementations are experimental and incomplete, with relatively few users,
compared to the main implementation maintained by the R Development Core Team.
COMMUNITIES
R has local communities worldwide for users to network, share ideas, and learn.
There is a growing number of R events bringing its users together, such as conferences (e.g.
useR!, WhyR?, conectaR, SatRdays), meetups, as well as R-Ladies groups that promote
gender diversity.
LITERATURE REVIEW
1. TEXT MINING SCIENTIFIC ARTICLES USING R STUDIO
The aim of this study is to develop a solution for text mining scientific articles using
the R language in the "Knowledge Extraction and Machine Learning" course.
Automatic summarization of papers is a challenging problem whose solution would
allow researchers to browse large article collections, quickly view highlights, and
drill down for details. The proposed solution is based on social network analysis, topic
models, and bipartite graph approaches. The method defines a bipartite graph between
documents and topics, built using the Latent Dirichlet Allocation (LDA) topic model.
Topics that occur in the same document are then connected to generate a network of
topics. The approach proves to be a promising technique for gaining insight into, and
summarizing, scientific article collections.
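The document-topic structure described above can be sketched in base R. The topic assignments below are hypothetical stand-ins for LDA output (a real pipeline would fit them with a package such as `topicmodels`); the incidence matrix plays the role of the bipartite graph, and its product gives the topic co-occurrence network:

```r
# Hypothetical LDA output: the topics present in each document
doc_topics <- list(doc1 = c(1, 2), doc2 = c(2), doc3 = c(1, 3))

# Bipartite incidence matrix: rows = documents, columns = topics
topics <- sort(unique(unlist(doc_topics)))
B <- t(sapply(doc_topics, function(t) as.integer(topics %in% t)))
colnames(B) <- paste0("topic", topics)
print(B)

# Topic network: topics are linked when they share a document
topic_net <- t(B) %*% B
diag(topic_net) <- 0   # drop self-links
print(topic_net)
```

Here `topic_net[i, j]` counts the documents in which topics i and j co-occur, which is the kind of topic network the study builds before summarization.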
I. Journal
A larger amount of data gives better output, but working with it can become a challenge
due to processing limitations. Nowadays, companies are starting to realize the importance
of using more data to support their strategic decisions. Case studies have shown that
"more data usually beats better algorithms", and with this insight companies began to
invest in processing larger data sets rather than in more expensive algorithms. Over the
last decade, big data analysis has seen exponential growth and will certainly continue to
see remarkable development, driven by the emergence of new interactive multimedia
applications and highly integrated systems, alongside the rapid growth in data services
and microelectronic devices. Up to now, most current mobile systems have been targeted
mainly at voice communication with low transmission rates.
Doi: 10.5281/zenodo.3266146
Publication Date: 2019
CONCLUSION
There are a number of reasons why RStudio is preferred; some of the most important are:
One of the biggest perks of working with R and RStudio is that both are available free of
charge. Whereas other, proprietary statistics packages are often stuck in the dark ages of
development (the 1990s, for example), and can be incredibly expensive to purchase, R is a
free alternative that allows users of all experience levels to contribute to its development.
As many scientific fields embrace the idea of reproducible analyses, proprietary point-and-
click systems actually serve as a hindrance to this process. If you need to re-run your
analysis using one of these systems, you’ll need to carefully copy-and-paste your results
into your text editor, potentially from beginning to end. As anyone who has done this sort
of copy-and-pasting knows, this approach is both prone to errors and incredibly tedious.
If, on the other hand, you use the workflows described in this book, your analyses will be
reproducible, thus eliminating the copy-and-paste dance. And, as you can probably guess,
it is much better to be able to update your code and data inputs and then re-run all of your
analysis with the push of a button than to have to worry about manually moving your
results from one program to another. Reproducibility also helps you as a programmer,
since your most frequent collaborator is likely to be yourself a few months or years down
the road. Instead of having to carefully write down all the steps you took to find the correct
drop-down menu option, your entire code is stored, and immediately reusable.
This approach also helps with collaboration since, as you will see later, you can share a
single R Markdown file containing all of your analysis, documentation, comments, and
code with others. This reduces the time needed to work with others and reduces the
likelihood of errors being made in following along with point-and-click analyses. The
mantra here is to Say No to Copy-And-Paste! both for your sanity and for the sake of
science.
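A minimal R Markdown file of the kind described above might look like the following sketch (the title and analysis are invented; `mtcars` is a data set built into R). Rendering it re-runs every code chunk, so the results always match the code:

````markdown
---
title: "Reproducible Analysis"
output: html_document
---

Mean miles-per-gallon in the built-in `mtcars` data set:

```{r}
mean(mtcars$mpg)
```
````

Sharing this one file gives collaborators the code, the documentation, and the means to regenerate every result with a single render.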
We all know that learning isn’t easy. Do you have trouble remembering how to follow a
list of more than 10 steps or so? Do you find yourself going back over and over again
because you can’t remember what step comes next in the process? This is extremely
common, especially if you haven’t done the procedure in a while. Learning by following a
procedure is easy in the short term, but can be extremely frustrating to recall in the
long term. Done well, programming rewards long-term thinking over short-term fixes.