Package 'inpdfr'
R topics documented:
askQuit
checkEntry
doCA
doCluster
doKmeansClust
doMetacomEntropart
doMetacomMetacom
excludeStopWords
exclusionList_FR
exclusionList_SP
exclusionList_UK
getAllAnalysis
getListFiles
getMostFreqWord
getMostFreqWordCor
getPDF
getStopWords
getSummaryStatsBARPLOT
getSummaryStatsHISTO
getSummaryStatsOCCUR
getTXT
getwordOccuDF
getXFreqWord
inpdfr
loadGUI
loremIpsum
makeMainWindowsContent
makeMenuMainWindow
makeWordcloud
mergeWordFreq
open_cb
open_cbFile
postProcTxt
preProcTxt
quitSpaceFromChars
switchOffDialogWait
switchOnDialogWait
truncNumWords
Index
askQuit
Description
This function is provided so that you can easily see its content. It is not intended to be called directly;
use loadGUI() to load the RGtk2 GUI instead.
Usage
askQuit(myobject)
Arguments
myobject
checkEntry
Description
This function is provided so that you can easily see its content. It is not intended to be called directly;
use loadGUI() to load the RGtk2 GUI instead.
Usage
checkEntry(validatedEntry, myRegEx)
Arguments
validatedEntry
Entry to be checked.
myRegEx
Regular expression to test the validatedEntry.
doCA
Description
Performs a correspondence analysis on the basis of the word-occurrence data.frame using the ca function.
Usage
doCA(wordF, getPlot = TRUE, mwidth = 800, mheight = 800,
formatType = "png", ...)
Arguments
wordF
getPlot
mwidth
mheight
formatType
...
Value
The results of the ca function.
Examples
data("loremIpsum")
loremIpsum01 <- loremIpsum[1:100]
loremIpsum02 <- loremIpsum[101:200]
loremIpsum03 <- loremIpsum[201:300]
loremIpsum04 <- loremIpsum[301:400]
loremIpsum05 <- loremIpsum[401:500]
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum01, file = "RESULTS/loremIpsum01.txt")
write(x = loremIpsum02, file = "RESULTS/loremIpsum02.txt")
write(x = loremIpsum03, file = "RESULTS/loremIpsum03.txt")
write(x = loremIpsum04, file = "RESULTS/loremIpsum04.txt")
write(x = loremIpsum05, file = "RESULTS/loremIpsum05.txt")
wordOccuDF <- getwordOccuDF(mywd = paste0(getwd(), "/RESULTS"), excludeSW = FALSE)
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
doCA(wordF = wordOccuDF)
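doCA delegates the computation to the ca function. As a rough picture of what a correspondence analysis computes, the sketch below performs one in base R as a singular value decomposition of standardized residuals. It is an illustration only, not the code of inpdfr or of the ca package: caSketch and the toy matrix N (rows = words, columns = documents) are hypothetical, and every row and column is assumed to have a non-zero total.

```r
# Minimal correspondence analysis via SVD (base R only; illustrative sketch).
# N: count matrix with rows = words and columns = documents (hypothetical layout).
caSketch <- function(N) {
  N <- as.matrix(N)
  P <- N / sum(N)                      # correspondence matrix
  rmass <- rowSums(P)                  # row (word) masses
  cmass <- colSums(P)                  # column (document) masses
  E <- outer(rmass, cmass)             # expected proportions under independence
  S <- (P - E) / sqrt(E)               # standardized residuals
  dec <- svd(S)
  k <- max(1, min(dim(N)) - 1)         # number of non-trivial dimensions
  d <- dec$d[seq_len(k)]
  list(
    sv = d,                            # singular values
    inertia = sum(dec$d^2),            # total inertia
    # principal coordinates of words (rows) and documents (columns)
    rowPrincipal = (dec$u[, seq_len(k), drop = FALSE] / sqrt(rmass)) %*% diag(d, k),
    colPrincipal = (dec$v[, seq_len(k), drop = FALSE] / sqrt(cmass)) %*% diag(d, k)
  )
}

# Toy data: 4 words x 3 documents
N <- matrix(c(10, 2, 3,
               4, 8, 1,
               2, 3, 9,
               5, 5, 5), nrow = 4, byrow = TRUE,
            dimnames = list(c("lorem", "ipsum", "dolor", "amet"),
                            c("doc1", "doc2", "doc3")))
res <- caSketch(N)
res$sv
```

The total inertia equals the chi-squared statistic of N divided by sum(N), which is a convenient sanity check on any correspondence analysis output.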
doCluster
Description
Performs a cluster analysis on the basis of the word-occurrence data.frame using the hclust function.
Usage
doCluster(wordF, myMethod = "ward.D2", gp = FALSE, nbGp = 5,
getPlot = TRUE, mwidth = 800, mheight = 800, formatType = "png", ...)
Arguments
wordF
myMethod
gp
nbGp
getPlot
mwidth
mheight
formatType
The format for the output file ("eps", "pdf", "png", "svg", "tiff", "jpeg", "bmp").
...
Value
An object of class hclust.
Examples
data("loremIpsum")
loremIpsum01 <- loremIpsum[1:100]
loremIpsum02 <- loremIpsum[101:200]
loremIpsum03 <- loremIpsum[201:300]
loremIpsum04 <- loremIpsum[301:400]
loremIpsum05 <- loremIpsum[401:500]
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum01, file = "RESULTS/loremIpsum01.txt")
write(x = loremIpsum02, file = "RESULTS/loremIpsum02.txt")
write(x = loremIpsum03, file = "RESULTS/loremIpsum03.txt")
write(x = loremIpsum04, file = "RESULTS/loremIpsum04.txt")
write(x = loremIpsum05, file = "RESULTS/loremIpsum05.txt")
wordOccuDF <- getwordOccuDF(mywd = paste0(getwd(), "/RESULTS"), excludeSW = FALSE)
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
doCluster(wordF = wordOccuDF, myMethod = "ward.D2")
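A rough base R sketch of this kind of analysis: documents are compared with dist, grouped with hclust using "ward.D2" (the myMethod default), and the tree is cut into groups with cutree, which is one plausible reading of the gp and nbGp arguments. Clustering documents rather than words, Euclidean distance, and the toy counts in occu are all assumptions, not documented behavior of doCluster.

```r
# Hierarchical clustering of documents with base R (illustrative sketch).
# occu: hypothetical word-by-document count matrix (rows = words, columns = documents).
set.seed(1)
occu <- matrix(rpois(40, lambda = 3), nrow = 8,
               dimnames = list(paste0("word", 1:8), paste0("doc", 1:5)))
d <- dist(t(occu))                     # Euclidean distances between documents
hc <- hclust(d, method = "ward.D2")    # same method name as the myMethod default
groups <- cutree(hc, k = 2)            # cut the tree into 2 groups of documents
groups
## Not run:
# plot(hc); rect.hclust(hc, k = 2)
```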
doKmeansClust
Description
Performs a k-means cluster analysis on the basis of the word-occurrence data.frame using the kmeans
function.
Usage
doKmeansClust(wordF, nbClust = 4, nbIter = 10, algo = "Hartigan-Wong",
getPlot = TRUE, mwidth = 800, mheight = 800, formatType = "png", ...)
Arguments
wordF
nbClust
nbIter
algo
getPlot
mwidth
mheight
formatType
...
Value
An object of class kmeans (see kmeans).
Examples
data("loremIpsum")
loremIpsum01 <- loremIpsum[1:100]
loremIpsum02 <- loremIpsum[101:200]
loremIpsum03 <- loremIpsum[201:300]
loremIpsum04 <- loremIpsum[301:400]
loremIpsum05 <- loremIpsum[401:500]
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum01, file = "RESULTS/loremIpsum01.txt")
write(x = loremIpsum02, file = "RESULTS/loremIpsum02.txt")
write(x = loremIpsum03, file = "RESULTS/loremIpsum03.txt")
write(x = loremIpsum04, file = "RESULTS/loremIpsum04.txt")
write(x = loremIpsum05, file = "RESULTS/loremIpsum05.txt")
wordOccuDF <- getwordOccuDF(mywd = paste0(getwd(), "/RESULTS"), excludeSW = FALSE)
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
doKmeansClust(wordF = wordOccuDF, nbClust = 2)
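The call below shows stats::kmeans on a toy word-by-document matrix, with documents as observations. Reading nbClust as centers, nbIter as iter.max and algo as algorithm is an assumption based on the argument names and defaults; nstart = 10 is added only to make the toy result stable.

```r
# k-means clustering of documents with stats::kmeans (illustrative sketch).
set.seed(42)
occu <- cbind(matrix(rpois(12, lambda = 2), nrow = 6),    # 2 documents with low counts
              matrix(rpois(18, lambda = 20), nrow = 6))   # 3 documents with high counts
dimnames(occu) <- list(paste0("word", 1:6), paste0("doc", 1:5))
km <- kmeans(t(occu), centers = 2, iter.max = 10, nstart = 10,
             algorithm = "Hartigan-Wong")
km$cluster   # cluster membership of each document
```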
doMetacomEntropart
Description
Uses the entropart package to analyse the word-occurrence data.frame, considering words as
species and documents as communities.
Usage
doMetacomEntropart(wordF, getPlot = c(TRUE, TRUE, TRUE, TRUE),
getTextSink = c(TRUE, TRUE, TRUE, TRUE), mwidth = 800, mheight = 800,
formatType = "png")
Arguments
wordF
getPlot
getTextSink
mwidth
mheight
formatType
The format for the output file ("eps", "pdf", "png", "svg", "tiff", "jpeg", "bmp").
Value
A MetaCommunity object (see entropart-package).
Examples
data("loremIpsum")
loremIpsum01 <- loremIpsum[1:100]
loremIpsum02 <- loremIpsum[101:200]
loremIpsum03 <- loremIpsum[201:300]
loremIpsum04 <- loremIpsum[301:400]
loremIpsum05 <- loremIpsum[401:500]
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum01, file = "RESULTS/loremIpsum01.txt")
write(x = loremIpsum02, file = "RESULTS/loremIpsum02.txt")
write(x = loremIpsum03, file = "RESULTS/loremIpsum03.txt")
write(x = loremIpsum04, file = "RESULTS/loremIpsum04.txt")
write(x = loremIpsum05, file = "RESULTS/loremIpsum05.txt")
wordOccuDF <- getwordOccuDF(mywd = paste0(getwd(), "/RESULTS"),
excludeSW = FALSE)
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
doMetacomEntropart(wordF = wordOccuDF)
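With words as species and documents as communities, diversity can be partitioned across documents. A minimal plug-in version of one such partition, Shannon entropy split into a within-document (alpha) and a between-document (beta) component, is sketched below in base R. The toy counts, the size-proportional weights and the shannon helper are assumptions; entropart itself provides many more measures and bias-corrected estimators that this sketch does not attempt.

```r
# Shannon entropy (natural log) of a vector of counts; zero counts are dropped.
shannon <- function(n) {
  p <- n[n > 0] / sum(n)
  -sum(p * log(p))
}

# Hypothetical word-by-document counts: words = species, documents = communities.
occu <- matrix(c(5, 0, 2,
                 3, 4, 0,
                 0, 6, 1,
                 2, 2, 7), nrow = 4, byrow = TRUE,
               dimnames = list(paste0("word", 1:4), paste0("doc", 1:3)))

w <- colSums(occu) / sum(occu)              # community weights (document sizes)
alphaH <- sum(w * apply(occu, 2, shannon))  # weighted mean within-document entropy
gammaH <- shannon(rowSums(occu))            # entropy of the pooled metacommunity
betaH <- gammaH - alphaH                    # between-document entropy
c(alpha = alphaH, beta = betaH, gamma = gammaH)
```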
doMetacomMetacom
Description
Uses the Metacommunity function of the metacom package to analyse the word-occurrence data.frame, considering words as
species and documents as communities.
Usage
doMetacomMetacom(wordF, numSim = 10, limit = "Inf", getPlot = TRUE,
getTextSink = TRUE, mwidth = 800, mheight = 800, formatType = "png")
Arguments
wordF
numSim
limit
getPlot
getTextSink
mwidth
mheight
formatType
The format for the output file ("eps", "pdf", "png", "svg", "tiff", "jpeg", "bmp").
Value
An object of class Metacommunity.
Examples
data("loremIpsum")
loremIpsum01 <- loremIpsum[1:100]
loremIpsum02 <- loremIpsum[101:200]
loremIpsum03 <- loremIpsum[201:300]
loremIpsum04 <- loremIpsum[301:400]
loremIpsum05 <- loremIpsum[401:500]
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum01, file = "RESULTS/loremIpsum01.txt")
write(x = loremIpsum02, file = "RESULTS/loremIpsum02.txt")
write(x = loremIpsum03, file = "RESULTS/loremIpsum03.txt")
write(x = loremIpsum04, file = "RESULTS/loremIpsum04.txt")
write(x = loremIpsum05, file = "RESULTS/loremIpsum05.txt")
wordOccuDF <- getwordOccuDF(mywd = paste0(getwd(), "/RESULTS"),
excludeSW = FALSE)
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
doMetacomMetacom(wordF = wordOccuDF)
excludeStopWords
Description
Exclude stop words from the word-occurrence data.frame. excludeStopWords uses the parallel package to
perform parallel computation.
Usage
excludeStopWords(wordF, lang = "English")
Arguments
wordF
lang
Value
The word-occurrence data.frame.
Examples
## Not run:
excludeStopWords(wordF = myDF, lang = "French")
## End(Not run)
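At its core, stop-word exclusion is a set-membership filter. The sketch below assumes a hypothetical layout (a word column plus one count column per document) and a short hand-written stop-word vector standing in for a list such as exclusionList_UK; it does not reproduce excludeStopWords, which also parallelizes the work.

```r
# Hypothetical word-occurrence data.frame: a 'word' column plus one count column per document.
wordDF <- data.frame(word = c("the", "lorem", "and", "ipsum", "of"),
                     doc1 = c(12, 3, 7, 2, 5),
                     doc2 = c(9, 1, 4, 6, 3),
                     stringsAsFactors = FALSE)
stopWords <- c("the", "and", "of", "a")   # stand-in for a real stop-word list

# Keep only the rows whose word is not a stop word.
filtered <- wordDF[!(tolower(wordDF$word) %in% stopWords), , drop = FALSE]
rownames(filtered) <- NULL
filtered
```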
exclusionList_FR
Description
A vector containing stop words in French.
Usage
exclusionList_FR
Format
A vector with 173 elements (character), with UTF-8 characters escaped using stringi::stri_escape_unicode(exclusion
Source
Adapted from http://www.ranks.nl/stopwords/french.
exclusionList_SP
Description
A vector containing stop words in Spanish.
Usage
exclusionList_SP
Format
A vector with 190 elements (character), with UTF-8 characters escaped using stringi::stri_escape_unicode(exclusion
Source
Adapted from http://www.ranks.nl/stopwords/spanish.
exclusionList_UK
Description
A vector containing stop words in English.
Usage
exclusionList_UK
Format
A vector with 542 elements (character).
Source
Adapted from http://www.ranks.nl/stopwords.
getAllAnalysis
Description
A quick way to compute a set of analyses from the word-occurrence data.frame.
Usage
getAllAnalysis(dataset, wcloud = TRUE, sumStats = TRUE, freqW = TRUE,
corA = TRUE, clust = TRUE, metacom = TRUE)
Arguments
dataset
wcloud
sumStats
freqW
corA
clust
metacom
Value
A set of analyses available from the inpdfr package.
Examples
data("loremIpsum")
loremIpsum01 <- loremIpsum[1:100]
loremIpsum02 <- loremIpsum[101:200]
loremIpsum03 <- loremIpsum[201:300]
loremIpsum04 <- loremIpsum[301:400]
loremIpsum05 <- loremIpsum[401:500]
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum01, file = "RESULTS/loremIpsum01.txt")
write(x = loremIpsum02, file = "RESULTS/loremIpsum02.txt")
write(x = loremIpsum03, file = "RESULTS/loremIpsum03.txt")
write(x = loremIpsum04, file = "RESULTS/loremIpsum04.txt")
write(x = loremIpsum05, file = "RESULTS/loremIpsum05.txt")
wordOccuDF <- getwordOccuDF(mywd = paste0(getwd(), "/RESULTS"),
excludeSW = FALSE)
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
getAllAnalysis(dataset = wordOccuDF, wcloud = FALSE, sumStats = FALSE)
getListFiles
Description
List files in a specified directory, sorted by extension. Only .txt and .pdf files are taken into account;
extensions are identified with the strsplit function.
Usage
getListFiles(mywd)
Arguments
mywd
Value
A list of length 2 with file names sorted by extension (pdf and txt).
Examples
getListFiles(mywd = getwd())
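The extension-based sorting described above can be sketched as follows. sortByExtension is a hypothetical helper that works on a vector of file names; taking the last dot-separated piece as the extension and comparing it case-insensitively are assumptions, not documented behavior of getListFiles.

```r
# Sort file names by extension using strsplit (hypothetical helper, not inpdfr's code).
sortByExtension <- function(files) {
  ext <- vapply(strsplit(files, ".", fixed = TRUE),
                function(parts) if (length(parts) > 1) tolower(parts[length(parts)]) else "",
                character(1))
  list(pdf = sort(files[ext == "pdf"]), txt = sort(files[ext == "txt"]))
}

# On a directory: sortByExtension(list.files(getwd()))
res <- sortByExtension(c("b.txt", "report.v2.pdf", "a.TXT", "notes.md"))
res
```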
getMostFreqWord
Description
Returns the most frequent words and plots their frequencies per document.
Usage
getMostFreqWord(wordF, numWords, getPlot = TRUE, mwidth = 1024,
mheight = 800, formatType = "png")
Arguments
wordF
numWords
getPlot
mwidth
mheight
formatType
The format for the output file ("eps", "pdf", "png", "svg", "tiff", "jpeg", "bmp").
Value
The numWords most frequent words.
Examples
data("loremIpsum")
loremIpsum01 <- loremIpsum[1:100]
loremIpsum02 <- loremIpsum[101:200]
loremIpsum03 <- loremIpsum[201:300]
loremIpsum04 <- loremIpsum[301:400]
loremIpsum05 <- loremIpsum[401:500]
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum01, file = "RESULTS/loremIpsum01.txt")
write(x = loremIpsum02, file = "RESULTS/loremIpsum02.txt")
write(x = loremIpsum03, file = "RESULTS/loremIpsum03.txt")
write(x = loremIpsum04, file = "RESULTS/loremIpsum04.txt")
write(x = loremIpsum05, file = "RESULTS/loremIpsum05.txt")
wordOccuDF <- getwordOccuDF(mywd = paste0(getwd(), "/RESULTS"),
excludeSW = FALSE)
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
getMostFreqWord(wordF = wordOccuDF, numWords = 5)
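Selecting the numWords most frequent words amounts to ranking words by their total count across documents. A base R sketch on a hypothetical word-by-document matrix; ranking by the summed count over all documents is an assumption about how "most frequent" is defined.

```r
# Select the numWords most frequent words across all documents (base R sketch).
occu <- matrix(c(4, 1, 0,
                 9, 7, 8,
                 2, 2, 2,
                 0, 5, 6), nrow = 4, byrow = TRUE,
               dimnames = list(c("dolor", "lorem", "amet", "ipsum"),
                               c("doc1", "doc2", "doc3")))
numWords <- 2
totals <- rowSums(occu)                     # total count of each word
top <- names(sort(totals, decreasing = TRUE))[seq_len(min(numWords, length(totals)))]
top
occu[top, , drop = FALSE]                   # per-document frequencies, e.g. for barplot()
```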
getMostFreqWordCor
Description
Test for correlation between the most frequent words.
Usage
getMostFreqWordCor(wordF, numWords, getPlot = c(TRUE, TRUE),
getTextSink = TRUE, mwidth = 1024, mheight = 1024, formatType = "png")
Arguments
wordF
numWords
getPlot
A vector with two logical values. If getPlot[1]==TRUE, an image of the correlation matrix is saved in the RESULTS directory. If getPlot[2]==TRUE, an image
of the p-value matrix associated with the correlation is saved in the RESULTS
directory.
getTextSink
If TRUE, save the correlation matrix and the associated p-values in a text file in
the RESULTS directory.
mwidth
mheight
formatType
The format for the output file ("eps", "pdf", "png", "svg", "tiff", "jpeg", "bmp").
Value
A list with the correlation matrix and the p-value matrix.
Examples
data("loremIpsum")
loremIpsum01 <- loremIpsum[1:100]
loremIpsum02 <- loremIpsum[101:200]
loremIpsum03 <- loremIpsum[201:300]
loremIpsum04 <- loremIpsum[301:400]
loremIpsum05 <- loremIpsum[401:500]
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum01, file = "RESULTS/loremIpsum01.txt")
write(x = loremIpsum02, file = "RESULTS/loremIpsum02.txt")
write(x = loremIpsum03, file = "RESULTS/loremIpsum03.txt")
write(x = loremIpsum04, file = "RESULTS/loremIpsum04.txt")
write(x = loremIpsum05, file = "RESULTS/loremIpsum05.txt")
wordOccuDF <- getwordOccuDF(mywd = paste0(getwd(), "/RESULTS"),
excludeSW = FALSE)
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
getMostFreqWordCor(wordF = wordOccuDF, numWords = 5)
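The two matrices returned by getMostFreqWordCor can be pictured with the base R sketch below: Pearson correlations between word frequencies across documents, plus the matching cor.test p-values. The choice of Pearson correlation, the toy counts and the corWithP helper are assumptions; cor.test needs at least three documents.

```r
# Pairwise correlation and p-value matrices between words (base R sketch).
corWithP <- function(occu) {
  x <- t(occu)                              # rows = documents, columns = words
  k <- ncol(x)
  pmat <- matrix(NA_real_, k, k, dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i != j) pmat[i, j] <- cor.test(x[, i], x[, j])$p.value
    }
  }
  list(cor = cor(x), p = pmat)              # diagonal of p is left as NA
}

occu <- matrix(c(1, 3, 5, 7, 9,
                 2, 4, 6, 8, 11,
                 9, 4, 6, 1, 3), nrow = 3, byrow = TRUE,
               dimnames = list(c("lorem", "ipsum", "dolor"), paste0("doc", 1:5)))
res <- corWithP(occu)
round(res$cor, 2)
```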
getPDF
Description
getPDF returns a word-occurrence data.frame from PDF files. It requires XPDF to run (http://www.foolabs.com/xpdf/dow
and uses the parallel package to perform parallel computation.
Usage
getPDF(myPDFs, minword = 1, maxword = 20, minFreqWord = 1,
pathToPdftotext = "")
Arguments
myPDFs
minword
An integer specifying the minimum number of letters per word in the returned
data.frame.
maxword
An integer specifying the maximum number of letters per word in the returned data.frame.
minFreqWord
An integer specifying the minimum word frequency in the returned data.frame.
pathToPdftotext
A character string containing an alternative path to the XPDF pdftotext program; see the
Details section.
Details
getPDF uses the XPDF pdftotext program to extract the content of PDF files into TXT files. If
pdftotext is not on the PATH, provide the full path to the program in the
pathToPdftotext parameter.
Value
A list of lists, each containing a word-occurrence data.frame and a file name.
Examples
## Not run:
getPDF(myPDFs = "mypdf.pdf")
## End(Not run)
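The lookup described in Details (use pathToPdftotext if given, otherwise find pdftotext on the PATH) and the conversion call can be sketched as below. resolvePdftotext and pdfToTxt are hypothetical helpers, not inpdfr functions; they rely only on base R's Sys.which and system2 and on pdftotext's usage pdftotext [options] <PDF-file> [<text-file>].

```r
# Locate pdftotext: explicit path first, otherwise search the PATH (hypothetical helper).
resolvePdftotext <- function(pathToPdftotext = "") {
  exe <- if (nzchar(pathToPdftotext)) pathToPdftotext else unname(Sys.which("pdftotext"))
  if (!nzchar(exe)) stop("pdftotext not found on the PATH; supply pathToPdftotext")
  exe
}

# Convert one PDF into a TXT file next to it and return the TXT path (hypothetical helper).
pdfToTxt <- function(pdf, txt = sub("\\.pdf$", ".txt", pdf, ignore.case = TRUE),
                     pathToPdftotext = "") {
  status <- system2(resolvePdftotext(pathToPdftotext), args = shQuote(c(pdf, txt)))
  if (!identical(as.integer(status), 0L)) stop("pdftotext failed on ", pdf)
  txt
}

## Not run:
# pdfToTxt("mypdf.pdf", pathToPdftotext = "/path/to/pdftotext")
```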
getStopWords
Description
getStopWords returns a list of stopwords.
Usage
getStopWords()
Value
A list of vectors with stopwords for French, English, and Spanish languages.
Examples
getStopWords()
getSummaryStatsBARPLOT
Perform a barplot with the number of unique words per document
Description
Perform a barplot with the number of unique words per document using the barplot function.
Usage
getSummaryStatsBARPLOT(wordF, getPlot = TRUE, mwidth = 480, mheight = 480,
formatType = "png", ...)
Arguments
wordF
getPlot
mwidth
mheight
formatType
The format for the output file ("eps", "pdf", "png", "svg", "tiff", "jpeg", "bmp").
...
Value
The number of unique words per document.
Examples
data("loremIpsum")
loremIpsum01 <- loremIpsum[1:100]
loremIpsum02 <- loremIpsum[101:200]
loremIpsum03 <- loremIpsum[201:300]
loremIpsum04 <- loremIpsum[301:400]
loremIpsum05 <- loremIpsum[401:500]
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum01, file = "RESULTS/loremIpsum01.txt")
write(x = loremIpsum02, file = "RESULTS/loremIpsum02.txt")
write(x = loremIpsum03, file = "RESULTS/loremIpsum03.txt")
write(x = loremIpsum04, file = "RESULTS/loremIpsum04.txt")
write(x = loremIpsum05, file = "RESULTS/loremIpsum05.txt")
wordOccuDF <- getwordOccuDF(mywd = paste0(getwd(), "/RESULTS"),
excludeSW = FALSE)
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
getSummaryStatsBARPLOT(wordF = wordOccuDF)
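Counting the unique words of a document amounts to counting the non-zero entries in its column. A base R sketch on a hypothetical word-by-document count matrix:

```r
# Number of unique words per document = number of words with a non-zero count.
occu <- matrix(c(3, 0, 1,
                 0, 2, 4,
                 5, 1, 0,
                 1, 0, 0), nrow = 4, byrow = TRUE,
               dimnames = list(paste0("word", 1:4), c("doc1", "doc2", "doc3")))
uniqueWords <- colSums(occu > 0)
uniqueWords
## Not run:
# barplot(uniqueWords, ylab = "Number of unique words")
```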
getSummaryStatsHISTO
Description
Plot a histogram of the number of words, excluding stop words, using the hist function.
Usage
getSummaryStatsHISTO(wordF, mwidth = 800, mheight = 800,
formatType = "png", ...)
Arguments
wordF
mwidth
mheight
formatType
The format for the output file ("eps", "pdf", "png", "svg", "tiff", "jpeg", "bmp").
...
Examples
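data("loremIpsum")
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
# write lines 1-100, 101-200, ..., 401-500 to RESULTS/loremIpsum01.txt ... loremIpsum05.txt
for (i in 1:5) {
  write(x = loremIpsum[((i - 1) * 100 + 1):(i * 100)],
        file = sprintf("RESULTS/loremIpsum%02d.txt", i))
}
wordOccuDF <- getwordOccuDF(mywd = paste0(getwd(), "/RESULTS"),
excludeSW = FALSE)
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
getSummaryStatsHISTO(wordF = wordOccuDF)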
getSummaryStatsOCCUR
Description
Plot a scatter plot with the proportion of documents using similar words.
Usage
getSummaryStatsOCCUR(wordF, getPlot = TRUE, mwidth = 800, mheight = 800,
formatType = "png")
Arguments
wordF
getPlot
mwidth
mheight
formatType
The format for the output file ("eps", "pdf", "png", "svg", "tiff", "jpeg", "bmp").
Value
A data.frame containing the proportion of documents and the number of similar words.
Examples
## Not run:
getSummaryStatsOCCUR(wordF = myDF)
## End(Not run)
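One plausible reading of this summary, sketched below under that assumption: compute, for each word, the proportion of documents that contain it, then count how many words share each proportion. The toy matrix and the column names propDocs and nbWords are hypothetical.

```r
# Proportion of documents containing each word, then number of words per proportion.
occu <- matrix(c(1, 2, 3, 1,
                 0, 4, 0, 0,
                 2, 0, 1, 5,
                 0, 0, 0, 3,
                 1, 1, 1, 1), nrow = 5, byrow = TRUE,
               dimnames = list(paste0("word", 1:5), paste0("doc", 1:4)))
propDocs <- rowMeans(occu > 0)              # share of documents containing each word
counts <- as.data.frame(table(propDocs), stringsAsFactors = FALSE)
names(counts) <- c("propDocs", "nbWords")
counts$propDocs <- as.numeric(counts$propDocs)
counts
## Not run:
# plot(nbWords ~ propDocs, data = counts)
```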
getTXT
Description
Extract text from TXT files and return a word-occurrence data.frame.
Usage
getTXT(myTXTs)
Arguments
myTXTs
A character vector containing TXT file names (or complete path to these files).
Value
A list of lists, each containing a word-occurrence data.frame and a file name.
Examples
data("loremIpsum")
loremIpsum01 <- loremIpsum[1:100]
loremIpsum02 <- loremIpsum[101:200]
loremIpsum03 <- loremIpsum[201:300]
loremIpsum04 <- loremIpsum[301:400]
loremIpsum05 <- loremIpsum[401:500]
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum01, file = "RESULTS/loremIpsum01.txt")
write(x = loremIpsum02, file = "RESULTS/loremIpsum02.txt")
write(x = loremIpsum03, file = "RESULTS/loremIpsum03.txt")
write(x = loremIpsum04, file = "RESULTS/loremIpsum04.txt")
write(x = loremIpsum05, file = "RESULTS/loremIpsum05.txt")
wordOccuFreq <- getTXT(myTXTs = list.files(path = paste0(getwd(),
"/RESULTS/"), pattern = "loremIpsum", full.names = TRUE))
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
getwordOccuDF
Description
A quick way to obtain the word-occurrence data.frame from a set of documents.
Usage
getwordOccuDF(mywd, language = "English", excludeSW = TRUE)
Arguments
mywd
language
excludeSW
Value
A single word-occurrence data.frame.
Examples
data("loremIpsum")
loremIpsum01 <- loremIpsum[1:100]
loremIpsum02 <- loremIpsum[101:200]
loremIpsum03 <- loremIpsum[201:300]
loremIpsum04 <- loremIpsum[301:400]
loremIpsum05 <- loremIpsum[401:500]
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum01, file = "RESULTS/loremIpsum01.txt")
write(x = loremIpsum02, file = "RESULTS/loremIpsum02.txt")
write(x = loremIpsum03, file = "RESULTS/loremIpsum03.txt")
write(x = loremIpsum04, file = "RESULTS/loremIpsum04.txt")
write(x = loremIpsum05, file = "RESULTS/loremIpsum05.txt")
wordOccuDF <- getwordOccuDF(mywd = paste0(getwd(), "/RESULTS"),
excludeSW = FALSE)
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
getXFreqWord
Description
Returns the most frequent words.
Usage
getXFreqWord(wordF, occuWords)
Arguments
wordF
occuWords
Value
A vector with the most frequent words.
Examples
data("loremIpsum")
loremIpsum01 <- loremIpsum[1:100]
loremIpsum02 <- loremIpsum[101:200]
loremIpsum03 <- loremIpsum[201:300]
loremIpsum04 <- loremIpsum[301:400]
loremIpsum05 <- loremIpsum[401:500]
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum01, file = "RESULTS/loremIpsum01.txt")
write(x = loremIpsum02, file = "RESULTS/loremIpsum02.txt")
write(x = loremIpsum03, file = "RESULTS/loremIpsum03.txt")
write(x = loremIpsum04, file = "RESULTS/loremIpsum04.txt")
write(x = loremIpsum05, file = "RESULTS/loremIpsum05.txt")
wordOccuDF <- getwordOccuDF(mywd = paste0(getwd(), "/RESULTS"),
excludeSW = FALSE)
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
getXFreqWord(wordF = wordOccuDF, occuWords = 5)
inpdfr
Description
The inpdfr package allows analysing and comparing PDF/TXT documents using both classical
text mining tools and those from theoretical ecology. In the latter, words are considered as species
and documents as communities, therefore allowing analysis at the community and metacommunity
levels. The inpdfr package provides three categories of functions: functions to extract and process
text into a word-occurrence data.frame, functions to analyse the word-occurrence data.frame with
standard and ecological tools, and functions to use inpdfr through a Gtk2 graphical user interface.
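The words-as-species, documents-as-communities view rests on a word-by-document count matrix, the kind of table the word-occurrence data.frame holds. A minimal base R sketch that builds one from two short strings; the tokenization rule (lower-casing and splitting on non-letters) is an assumption, not inpdfr's pre-processing.

```r
# Build a word-by-document count matrix: words play the role of species,
# documents the role of communities (base R sketch, not inpdfr's pipeline).
texts <- c(doc1 = "Lorem ipsum dolor sit amet, lorem ipsum.",
           doc2 = "Ipsum dolor; dolor sit.")
tokens <- lapply(texts, function(x) {
  w <- strsplit(tolower(x), "[^a-z]+")[[1]]
  w[nzchar(w)]                              # drop empty strings left by leading separators
})
vocab <- sort(unique(unlist(tokens)))
occu <- sapply(tokens, function(w) as.vector(table(factor(w, levels = vocab))))
rownames(occu) <- vocab
occu
```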
loadGUI
Description
Load the graphical user interface in order to use the inpdfr package through a user-friendly interface.
Usage
loadGUI()
Details
The inpdfr package uses the RGtk2 package for its GUI. Non-Linux users may need to download additional
files, such as the "gtk-file" icon or the "hicolor" theme, which can be obtained by downloading GTK+
from http://www.gtk.org/. These files are not needed for the GUI to work as intended, but without them you may get
a "GTK-WARNING" when using loadGUI(); this warning can safely be ignored. The RGtk2 GUI is
not required to access the functionality of the inpdfr package, and some options are only available through
the command line interface.
Examples
## Not run:
loadGUI()
## End(Not run)
loremIpsum
Description
A vector containing a Lorem Ipsum text for testing purposes.
Usage
loremIpsum
Format
A vector with 556 elements, each element corresponding to a line in the original text (character).
Source
http://lipsum.com/.
makeMainWindowsContent
RGtk2 GUI function: dynamic content of main window.
Description
This function is provided so that you can easily see its content. It is not intended to be called directly;
use loadGUI() to load the RGtk2 GUI instead.
Usage
makeMainWindowsContent(main_window)
Arguments
main_window
makeMenuMainWindow
Description
This function is provided so that you can easily see its content. It is not intended to be called directly;
use loadGUI() to load the RGtk2 GUI instead.
Usage
makeMenuMainWindow(main_window)
Arguments
main_window
makeWordcloud
Description
Plot a word cloud from the word-occurrence data.frame using the wordcloud function.
Usage
makeWordcloud(wordF, wcFormat = "png", wcminFreq = 3, wcmaxWords = Inf,
wcRandOrder = FALSE, wcCol = RColorBrewer::brewer.pal(8, "Dark2"),
getPlot = c(TRUE, TRUE), mwidth = 1000, mheight = 1000,
formatType = "png")
Arguments
wordF
wcFormat
wcminFreq
wcmaxWords
wcRandOrder
wcCol
getPlot
A vector with two logical values. If getPlot[1]==TRUE, a word cloud is made for
each document. If getPlot[2]==TRUE, a word cloud is made for the combination
of all documents.
mwidth
mheight
formatType
The format for the output file ("eps", "pdf", "png", "svg", "tiff", "jpeg", "bmp").
Examples
## Not run:
makeWordcloud(wordF = myDF)
## End(Not run)
mergeWordFreq
Description
Merge word-occurrence data.frames into a single data.frame.
Usage
mergeWordFreq(wordF)
Arguments
wordF
Value
A single word-occurrence data.frame with each column corresponding to a text file.
Examples
data("loremIpsum")
loremIpsum01 <- loremIpsum[1:100]
loremIpsum02 <- loremIpsum[101:200]
loremIpsum03 <- loremIpsum[201:300]
loremIpsum04 <- loremIpsum[301:400]
loremIpsum05 <- loremIpsum[401:500]
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum01, file = "RESULTS/loremIpsum01.txt")
write(x = loremIpsum02, file = "RESULTS/loremIpsum02.txt")
write(x = loremIpsum03, file = "RESULTS/loremIpsum03.txt")
write(x = loremIpsum04, file = "RESULTS/loremIpsum04.txt")
write(x = loremIpsum05, file = "RESULTS/loremIpsum05.txt")
wordOccuFreq <- getTXT(myTXTs = list.files(path = paste0(getwd(),
"/RESULTS/"), pattern = "loremIpsum", full.names = TRUE))
wordOccuDF <- mergeWordFreq(wordF = wordOccuFreq)
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
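Merging per-document word tables into a single data.frame can be sketched with merge and Reduce. The (word, freq) input layout, the list perDoc, and the use of 0 for words absent from a document are assumptions made for this illustration, not mergeWordFreq's documented internals.

```r
# Merge per-document (word, freq) tables into one data.frame, one column per document.
perDoc <- list(
  doc1 = data.frame(word = c("lorem", "ipsum"), freq = c(3, 1), stringsAsFactors = FALSE),
  doc2 = data.frame(word = c("ipsum", "dolor"), freq = c(2, 5), stringsAsFactors = FALSE)
)
# Rename each 'freq' column after its document before merging.
renamed <- Map(function(df, nm) setNames(df, c("word", nm)), perDoc, names(perDoc))
merged <- Reduce(function(a, b) merge(a, b, by = "word", all = TRUE), renamed)
merged[is.na(merged)] <- 0                 # a word absent from a document has count 0
merged
```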
open_cb
Description
This function is provided so that you can easily see its content. It is not intended to be called directly;
use loadGUI() to load the RGtk2 GUI instead.
Usage
open_cb(widget, window)
Arguments
widget
Widget to open.
window
Value
The path to the user-defined working directory.
open_cbFile
Description
This function is provided so that you can easily see its content. It is not intended to be called directly;
use loadGUI() to load the RGtk2 GUI instead.
Usage
open_cbFile(widget, window)
Arguments
widget
Widget to open.
window
Value
The path to the user-defined file.
postProcTxt
Description
Process vectors containing words into a data.frame of word occurrences.
Usage
postProcTxt(txt, minword = 1, maxword = 20, minFreqWord = 1)
Arguments
txt
minword
An integer specifying the minimum number of letters per word in the returned
data.frame.
maxword
An integer specifying the maximum number of letters per word in the returned data.frame.
minFreqWord
An integer specifying the minimum word frequency in the returned data.frame.
Value
A data.frame (freq = occurrences, stem = stem words, word = words), sorted by word occurrences.
Examples
## Not run:
postProcTxt(txt = preProcTxt(filetxt = "loremIpsum.txt"))
## End(Not run)
data("loremIpsum")
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum, file = "RESULTS/loremIpsum.txt")
preProcTxt(filetxt = paste0(getwd(), "/RESULTS/loremIpsum.txt"))
postProcTxt(txt = preProcTxt(filetxt = paste0(getwd(), "/RESULTS/loremIpsum.txt")))
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
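The filtering performed by the minword, maxword and minFreqWord arguments, followed by sorting by occurrences, can be sketched in base R. postProcSketch is a hypothetical helper: it tokenizes on non-letter characters (an assumption) and omits the stemming step, so it returns only the freq and word columns rather than the documented freq, stem and word columns.

```r
# Word-frequency data.frame with minword / maxword / minFreqWord filters (sketch, no stemming).
postProcSketch <- function(txt, minword = 1, maxword = 20, minFreqWord = 1) {
  words <- unlist(strsplit(tolower(txt), "[^[:alpha:]]+"))
  n <- nchar(words)
  words <- words[nzchar(words) & n >= minword & n <= maxword]   # word-length filter
  if (length(words) == 0) {
    return(data.frame(freq = integer(0), word = character(0), stringsAsFactors = FALSE))
  }
  tab <- table(words)
  df <- data.frame(freq = as.vector(tab), word = names(tab), stringsAsFactors = FALSE)
  df <- df[df$freq >= minFreqWord, , drop = FALSE]              # frequency filter
  df <- df[order(-df$freq, df$word), , drop = FALSE]            # most frequent first
  rownames(df) <- NULL
  df
}

res <- postProcSketch(c("Lorem ipsum dolor", "ipsum a lorem ipsum"), minword = 2)
res
```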
preProcTxt
Description
Extract text from txt files and pre-process content.
Usage
preProcTxt(filetxt, encodingIn = "UTF-8", encodingOut = "UTF-8")
Arguments
filetxt
encodingIn
encodingOut
Value
A character vector with the content of the pre-processed txt file (one element per line).
Examples
data("loremIpsum")
subDir <- "RESULTS"
dir.create(file.path(getwd(), subDir), showWarnings = FALSE)
write(x = loremIpsum, file = "RESULTS/loremIpsum.txt")
preProcTxt(filetxt = paste0(getwd(), "/RESULTS/loremIpsum.txt"))
file.remove(list.files(path = "RESULTS", pattern = "loremIpsum", full.names = TRUE))
quitSpaceFromChars
Description
Delete spaces in file names located in the current working directory.
Usage
quitSpaceFromChars(vectxt)
Arguments
vectxt
Value
The function returns a logical for each file, with TRUE if the file has been found, and FALSE
otherwise.
Examples
quitSpaceFromChars(c("my pdf.pdf","my other pdf.pdf"))
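A sketch of the renaming step, assuming spaces are simply removed (not replaced) and returning one logical per file as the Value section describes. removeSpacesFromFileNames and its dir argument are hypothetical; the demo runs in a temporary directory so that it does not touch the working directory.

```r
# Remove spaces from file names in a directory (hypothetical helper).
# Returns one logical per file: FALSE if the file was not found, otherwise
# the result of the rename.
removeSpacesFromFileNames <- function(files, dir = getwd()) {
  vapply(files, function(f) {
    from <- file.path(dir, f)
    if (!file.exists(from)) return(FALSE)
    file.rename(from, file.path(dir, gsub(" ", "", f, fixed = TRUE)))
  }, logical(1))
}

td <- tempfile("demo")
dir.create(td)
invisible(file.create(file.path(td, "my pdf.pdf")))
res <- removeSpacesFromFileNames(c("my pdf.pdf", "missing file.pdf"), dir = td)
res
list.files(td)
```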
switchOffDialogWait
Description
This function is provided so that you can easily see its content. It is not intended to be called directly;
use loadGUI() to load the RGtk2 GUI instead.
Usage
switchOffDialogWait(dialogX)
Arguments
dialogX
switchOnDialogWait
Description
This function is provided so that you can easily see its content. It is not intended to be called directly;
use loadGUI() to load the RGtk2 GUI instead.
Usage
switchOnDialogWait()
truncNumWords
Description
Truncate the word-occurrence data.frame.
Usage
truncNumWords(wordF, maxWords)
Arguments
wordF
maxWords
Value
The data.frame containing word occurrences.
Examples
## Not run:
truncNumWords(wordF = myWordOccurrenceDF, maxWords = 50)
## End(Not run)
Index
Topic datasets
exclusionList_FR
exclusionList_SP
exclusionList_UK
loremIpsum
askQuit
barplot (see getSummaryStatsBARPLOT)
ca (see doCA)
checkEntry
dist (see doCluster)
doCA
doCluster
doKmeansClust
doMetacomEntropart
doMetacomMetacom
excludeStopWords
exclusionList_FR
exclusionList_SP
exclusionList_UK
getAllAnalysis
getListFiles
getMostFreqWord
getMostFreqWordCor
getPDF
getStopWords
getSummaryStatsBARPLOT
getSummaryStatsHISTO
getSummaryStatsOCCUR
getTXT
getwordOccuDF
getXFreqWord
hclust (see doCluster)
hist (see getSummaryStatsHISTO)
inpdfr
inpdfr-package (inpdfr)
kmeans (see doKmeansClust)
loadGUI
loremIpsum
makeMainWindowsContent
makeMenuMainWindow
makeWordcloud
mergeWordFreq
Metacommunity (see doMetacomMetacom)
open_cb
open_cbFile
postProcTxt
preProcTxt
quitSpaceFromChars
switchOffDialogWait
switchOnDialogWait
truncNumWords
wordcloud (see makeWordcloud)