Optimisation and Dimension Reduction Techniques
Introduction
Much as the mariner in Samuel Taylor Coleridge's "The Rime of the Ancient Mariner" is surrounded by water he cannot drink, an organisation can be awash in data it cannot use: the value of data is heavily reliant on the analyst's proficiency in managing and manipulating it. Despite the remarkable progress made in data-related technologies, analysts continue to dedicate a significant portion of their time to acquiring data, identifying and addressing data quality concerns and preparing data for effective utilisation.
Extensive research has demonstrated that data preparation is the most laborious and time-intensive phase of data analysis. Data wrangling nevertheless remains a foundational element of data analysis and exploration: despite its inherent difficulties and complexities, it is the stepping stone that enables impactful visualisations and robust statistical models.
The data wrangling process refers to the steps involved in preparing and transforming raw data into a format that is suitable for analysis. This process typically includes tasks such as data collection, organisation, cleaning, enrichment, validation and publishing.
Data wrangling is a technical term that is largely self-descriptive: "wrangling" describes the process of gathering and organising information in a specific manner. The operation comprises the following sequence of processes:
1. Exploration: Prior to commencing the data wrangling procedure, consider the potential underlying aspects of your data. Think critically about the anticipated outcomes of your data and its intended applications once the wrangling process is complete. After establishing your objectives, proceed to collect the necessary data.
2. Organisation: Once the raw data has been collected within a specific dataset, it is necessary to arrange the data in a structured manner. The initial observation of raw data can be overwhelming due to the multitude of data types and sources, as well as their inherent complexity.
3. Data Cleaning: Once your data has been organised, the next step is to initiate the data cleaning process. Data cleaning encompasses identifying and removing outliers, formatting null values and eliminating duplicate data (a minimal R sketch of these tasks follows this list). It is worth noting that cleaning data obtained through web scraping can be more laborious than cleaning data obtained from a database: web data is typically highly unstructured, which often necessitates a longer processing time than structured data drawn from a database.
4. Data Enrichment: Data enrichment involves evaluating the available data to ascertain its sufficiency for further processing. Insufficient data at the completion of the wrangling process can undermine the insights derived from subsequent analysis. For example, investors seeking to analyse product review data would require a substantial volume of data to effectively depict the market and enhance their investment intelligence.
5. Validation: Once a sufficient amount of data has been collected, it is necessary to implement validation rules to ensure the accuracy and integrity of the data. Validation rules are executed in iterative sequences to ensure the consistency of your data across the entire dataset, and they serve the dual purpose of ensuring both quality and security. This step employs a similar logical approach to data normalisation, which standardises data through the application of validation rules.
6. Data Publishing: Publishing is the concluding stage of the data munging process. It encompasses adequately preparing data for subsequent utilisation, which may involve creating comprehensive notes and documentation detailing the steps taken during the wrangling process, as well as establishing access permissions for other users and applications.
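A minimal base-R sketch of the cleaning tasks from step 3, assuming a hypothetical data frame raw with id and amount columns; the column names and the 3-standard-deviation outlier rule are illustrative, not prescribed by the process above.
# hypothetical raw data with a duplicate row and a missing value
raw <- data.frame(id = c(1, 2, 2, 3, 4),
                  amount = c(10, 12, 12, NA, 900))
# eliminate duplicate rows
cleaned <- raw[!duplicated(raw), ]
# format null (NA) values, here replaced by zero
cleaned$amount[is.na(cleaned$amount)] <- 0
# remove outliers beyond 3 standard deviations (illustrative rule)
z <- as.vector(scale(cleaned$amount))
cleaned <- cleaned[abs(z) < 3, ]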
The use case of data wrangling is the transformation and cleaning of raw data into a structured format suitable for analysis and interpretation. This process is essential for ensuring data quality and reliability in domains such as business intelligence, data science and research. Data munging is a versatile technique that finds application in various use-cases, which are outlined below:
1. Fraud Detection: By leveraging a data wrangling tool, businesses are empowered to execute the following actions:
Ɣ Distinguish corporate fraud by carefully analysing detailed information such as multi-party and multi-layered emails or web chats; examining these sources can reveal unusual behaviour that may indicate fraudulent activities.
Ɣ Enhance data security by enabling non-technical operators to efficiently analyse and manipulate data in order to manage the multitude of daily security tasks.
Ɣ Establish a standardised approach for quantifying both structured and unstructured data sets, which is essential for accurate and consistent modelling results.
The benefits of data wrangling include improved data quality, increased efficiency in data analysis, enhanced data integration and better decision-making capabilities. Data wrangling allows for the identification and correction of errors, inconsistencies and missing values in a dataset.
As stated earlier, the utilisation of big data has become a fundamental component of business and finance today. However, the complete potential of such data is not always evident; data processes such as data discovery serve as valuable tools for identifying and acknowledging that potential. In order to maximise the potential of your data, it is necessary to implement data wrangling. The following are several significant advantages associated with data wrangling.
1. Data Consistency: The process of data wrangling ensures that the resulting dataset
exhibits a higher level of consistency from an organisational perspective. Ensuring data
consistency is of utmost importance in business operations that entail the collection of
data input by consumers or other human end-users. In the event that a human end-
user erroneously submits personal information, such as creating a duplicate customer
account, it can have a subsequent impact on performance analysis.
2. Enhanced Statistical Insights: Data wrangling transforms metadata to achieve greater consistency, and enhanced statistical insights are frequently attained through that consistency. When metadata remains consistent, automated tools can analyse the data effectively and efficiently, leading to faster and more accurate results. When constructing a model for forecasting market performance, for example, data wrangling is essential to ensure that the metadata is cleansed and prepared appropriately, enabling the model to execute smoothly and without errors.
3. Cost Efficiency: As stated earlier, data wrangling enables businesses to enhance the efficiency of their data analysis and model-building procedures, resulting in long-term cost savings. One example of an effective practice is to perform a comprehensive cleaning and organisation of data prior to its integration, as this reduces errors and saves developers time.
Data stored in external files in a variety of formats can be imported into R for data analysis. Similarly, data within the R environment can be saved to external files using the same file formats. This facilitates seamless data exchange and enables efficient data manipulation and analysis in R.
RDS Files
Function: readRDS()
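A minimal sketch of reading and writing RDS files, using the companion saveRDS() function; the file name is illustrative:
# save an R object to an RDS file (file name is illustrative)
> saveRDS(mtcars, "mydata.rds")
# read the object back into a new variable
> dataRDS <- readRDS("mydata.rds")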
Space-Delimited
The read.table() function is a built-in function in R that is used to read data from a file
and create a data frame.
Function: read.table()
Common Parameters:
Ɣ header: TRUE when the first row includes variable names. The default is FALSE.
Ɣ sep: A string indicating what separates the data fields. The default is "" (any whitespace).
> dataSPACE <- read.table("C:/mydata/survey.dat", header=TRUE, sep=" ")
With the working directory set, this is equivalent to:
> dataSPACE <- read.table("survey.dat", header=TRUE, sep=" ")
The data from the file "survey.dat" is read into the variable dataSPACE using the read.table() function. The file has a header row and the columns are separated by spaces.
Tab-Delimited
Function: read.table()
Common Parameters:
Ɣ header: TRUE when the first row includes variable names. The default is FALSE.
Ɣ sep: A string indicating what separates the data fields; use "\t" for tab-delimited data.
> dataTAB <- read.table("survey.dat", header=TRUE, sep="\t")
Here the data from the file "survey.dat" is read into the data frame dataTAB. The file has a header row and the columns are separated by tabs.
Comma-Delimited
Function: read.csv()
Common Parameters:
Ɣ header: TRUE when the first row includes variable names. For read.csv() the default is TRUE.
> dataCOMMA <- read.csv("survey.csv", header=TRUE)
The read.csv() function reads a CSV (Comma-Separated Values) file in R. Here the contents of "survey.csv" are read into the dataCOMMA variable, with the header parameter set to TRUE.
Fixed-Width Formats
Function: read.fwf()
Common Parameters:
Ɣ widths: An integer vector giving the width of each fixed-width field (required).
Ɣ header: TRUE when the first row includes variable names. The default is FALSE.
> dataFW <- read.fwf("survey.txt", widths=c(4, 12, 6), header=TRUE)
Fixed-width formats are a type of data format commonly used in computer systems. In fixed-width formats, each field or data element is allocated a fixed number of character positions, so no separator character is needed. The read.fwf() function reads such fixed-width formatted files. Here the dataFW object is assigned the result of reading "survey.txt"; the widths shown are illustrative and must match the actual field widths in the file, and header=TRUE indicates that the file contains a header row.
SPSS
The read.spss() function (from the foreign package) reads SPSS data files.
Function: read.spss()
Common Parameters:
Ɣ to.data.frame: TRUE if R should treat loaded data as a data frame. The default is FALSE.
Ɣ use.value.labels: TRUE if R should convert variables with value labels into R factors with those levels. The default is TRUE.
The dataSPSS object is assigned the result of the read.spss() function, which reads the "C:/mydata/survey.sav" file and converts it into a data frame.
> dataSPSS <- read.spss("C:/mydata/survey.sav", to.data.frame=TRUE)
In R, it is assumed that any value labels present in the SPSS file pertain to factors,
which are R’s equivalent of categorical variables. Consequently, R stores the labels
themselves instead of the original numerical values. An illustrative instance involves the
utilisation of a variable denoted as “gender,” which is encoded as 0 for male and 1 for
female. These corresponding labels are stored within the SAV file. When data is imported
from SPSS into R, the variable values will be represented as “male” and “female” instead
of the original “0” and “1” values. The default behaviour can be modified in the call to the
read.spss function by specifying the desired changes.
Here the dataSPSS variable is assigned the result of read.spss(), with the file selected interactively via the file.choose() function and value labels left as their original numerical values:
> dataSPSS <- read.spss(file.choose(), use.value.labels=FALSE)
STATA
The read.dta() function (also from the foreign package) reads data from a STATA file.
Function: read.dta()
Common Parameters:
Ɣ convert.dates: Convert STATA dates to the Date class. The default is TRUE.
Ɣ convert.factors: TRUE to convert value labels into factors. The default is TRUE.
The data from the file "survey.dta" is read into the variable dataStata using the read.dta() function:
> dataStata <- read.dta("survey.dta")
The object that is generated is inherently a data frame. By default, the conversion process transforms value labels into factor levels. To disable this feature, set convert.factors to FALSE:
> dataStata <- read.dta("survey.dta", convert.factors=FALSE)
Note: STATA has a tendency to modify the way it stores data files between versions, which may cause compatibility issues with the foreign package. If the read.dta() command encounters an error, it is recommended to use the SAVEOLD command in STATA to store the data. This generates a DTA file saved in a previous version of STATA, which is more likely to be recognised by the read.dta() function.
SAS
The read.xport() function reads data from a SAS XPORT file format.
Function: read.xport()
The data from the file "C:/mydata/survey" is read into the variable dataSAS using the read.xport() function.
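A call consistent with this description (the path follows the text and may need a file extension for a real XPORT file) is:
> dataSAS <- read.xport("C:/mydata/survey")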
Syntax:
read.csv(path, header = TRUE, sep = ",")
The read.csv() function reads a CSV file from the specified path. Three arguments are shown here: path, header and sep. The path argument specifies the location of the CSV file, and the header argument is a logical value that indicates whether the CSV file contains a header row.
Arguments:
Ɣ path: CSV file path that needs to be imported.
Ɣ header: Indicates whether to import headers in CSV. By default, it is set to TRUE.
Ɣ sep: The field separator character.
The programming language R commonly employs factors to re-encode strings. It is recommended to set the parameter stringsAsFactors = FALSE in order to prevent R from automatically converting character or categorical variables into factors.
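As a minimal sketch, the parameter can be passed directly in the read.csv() call (the file path is the one used in the following example):
> data <- read.csv("C:\\Personal\\IMS\\cricket_points.csv", header = TRUE, stringsAsFactors = FALSE)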
# read the data from the CSV file
data <- read.csv("C:\\Personal\\IMS\\cricket_points.csv", header = TRUE)
# print the data variable (outputs as DataFrame)
data
Output
         Teams Wins Lose Points
1        India    5    2     10
2 South Africa    3    4      6
3  West Indies    1    6      2
4      England    2    4      4
5    Australia    4    2      8
6  New Zealand    2    5      4
Using the read_csv() Method
The read_csv() method is widely regarded as the preferred approach for reading CSV files in R. It processes each line of a CSV file sequentially. The data is read in the form of a tibble, with only the first 10 rows initially displayed; additional rows can be accessed by expanding the view. It also provides the user with the percentage of the file that has been read into the system, enhancing its robustness in comparison to the read.csv() method. When dealing with large CSV files, it is advisable to use the read_csv() method.
Syntax:
read_csv(path, col_names, n_max, col_types, progress)
Arguments:
Ɣ path: CSV file path that needs to be imported.
Ɣ col_names: Indicates whether to import headers in CSV. By default, it is set to TRUE.
Ɣ n_max: The maximum number of rows to read.
Ɣ col_types: If any column comes back NULL (its type cannot be guessed), the column types can be specified in a compact string format.
Ɣ progress: A progress meter to analyse the percentage of the file read into the system.
# import the readr library, which provides read_csv()
library(readr)
# import data
data <- read_csv("C:\\Personal\\IMS\\cricket_points.csv")
Output
Syntax:
read_excel(path, sheet, col_types)
Parameters:
Ɣ path: File name to read from.
Ɣ sheet: The name or index of the Excel sheet to read.
Ɣ col_types: Optional column types for the imported data.
Returns:
A tibble (data frame) containing the sheet's data.
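A minimal sketch using the readxl package, which provides read_excel(); the file and sheet names are illustrative:
> library(readxl)
> dataXL <- read_excel("survey.xlsx", sheet = 1)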
Steps to import an Excel file using the Import Dataset option in the environment window of RStudio:
Step 1: Select the Import Dataset option in the environment window of RStudio.
Step 2: Choose the "From Excel" option within the Import Dataset menu; this option is specifically designed for importing Excel files.
Step 3: Use the browse option to locate and select the Excel file to be imported into R.
Step 4: Click the "Import" button to complete the importation of the selected Excel file into R.
The import dialog also allows the dataset to be customised: the user can modify the name assigned to the data and, where the workbook contains multiple sheets, select which sheet to import (for example, choosing the second of two sheets via the Sheet option). The Max Rows box limits how many rows are read, the Skip box skips a chosen number of leading rows and any value entered in the NA box is treated as NA wherever it occurs in the dataset.
An alternative approach for importing Excel files into RStudio is also available.
Step 1: Click the "File" option to initiate the action.
Step 2: Within the File menu, select the "Import Dataset" option, then choose the desired dataset from the available Excel files.
Syntax:
# read data stored in a .txt file
x <- read.table("file_name.txt", header = TRUE/FALSE)
# Simple R program to read a txt file
x <- read.table("D:/Data/myfile.txt", header=FALSE)
# print x
print(x)
Output:
V1 V2 V3
1 100 a1 b1
2 200 a2 b2
3 300 a3 b3
If the header argument is set to TRUE, read.table() reads the column names from the first row of the file, if they exist.
RMySQL
To utilise the RMySQL package, first install it by running install.packages("RMySQL"). The RMySQL package serves as an interface to the MySQL Database Management System (DBMS). The current version of this package requires the DBI package to be installed beforehand.
The dbDriver() function, when passed the argument "MySQL", returns an object that manages database connections. This object can then be used with functions such as dbConnect() and dbDisconnect() to establish and terminate a connection to a database. Before working with other DBMSs via their respective driver calls, such as dbDriver("SQLite"), dbDriver("PostgreSQL") and dbDriver("Oracle"), it is necessary to install the corresponding packages: RSQLite, RPostgreSQL and ROracle.
Ɣ The function dbGetQuery is used to send queries and retrieve results in the form of
a data frame.
Ɣ The function dbSendQuery is responsible for sending the query and returning an object that belongs to a class inheriting from "DBIResult". This object can then be utilised to retrieve the desired result.
Ɣ The function dbClearResult is used to remove the result from the cache memory.
Ɣ The fetch operation retrieves either a subset or all of the rows that were specified in
the query. The output of the fetch function is a collection of elements organised in a
list structure.
Ɣ The function dbHasCompleted is used to verify if all the rows have been retrieved.
Ɣ The functions dbReadTable and dbWriteTable are utilised to read and write database tables from and to an R data frame.
Syntax:
> library(RMySQL)
> connection <- dbConnect(dbDriver("MySQL"), dbname = "Test_Database")
## Assuming that MySQL tables are used for the DBMS
> dbListTables(connection)
## Loading a data frame into the database
> data <- Sample_Data
> dbWriteTable(connection, "Column", Sample_Data, overwrite = TRUE)
## To read Column1 from the database
> dbReadTable(connection, "Column1")
## Selecting from the loaded table as a query
> dbGetQuery(connection, paste(Row_Name, Variable_Name, Condition))
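The dbSendQuery/fetch workflow described in the bullet list above is not shown in the syntax block; a minimal sketch, assuming the connection from above and a hypothetical table named Column, would be:
> result <- dbSendQuery(connection, "SELECT * FROM Column")
## Fetch the first 10 rows of the result set
> partial <- fetch(result, n = 10)
## Check whether all rows have been retrieved
> dbHasCompleted(result)
## Remove the result from the cache
> dbClearResult(result)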
RODBC
The RODBC package connects R to databases through the Open Database Connectivity (ODBC) interface. The popularity of this package stems from its ability to use the same R code for importing data from various database systems. RODBC is compatible with OS X, Windows and Unix/Linux operating systems and supports a wide range of database systems including MySQL, Microsoft SQL Server, Oracle and PostgreSQL.
Ɣ The functions odbcConnect and odbcDriverConnect are used to establish a
connection to a database.
Ɣ The odbcGetInfo function retrieves information regarding the client and server.
Ɣ The odbcClose function is utilised for the purpose of closing the database
connection.
Ɣ The sqlSave function is used to store the R data frame provided as an argument
into a database table.
Ɣ The sqlFetch function performs the reverse operation, retrieving a database table and storing it as an R data frame.
Ɣ The sqlQuery function transmits a SQL query to the database and returns the result as an R data frame.
In the example below, PostgreSQL with an ODBC driver is used.
> library(RODBC)
> connection <- odbcConnect("Sample_Database", uid = "Name", case = "tolower")
> data <- Sample_Data
> sqlSave(connection, Sample_Data, rownames = "Row", addPK = TRUE)
> rm(Sample_Data)
> sqlQuery(connection, "Select_Column Condition")
Importing Data from a Non-Relational Database
R also has packages that support non-relational databases for data import:
Ɣ rhbase is used for the Hadoop Distributed File System.
Ɣ RCassandra is used for the Cassandra database system.
Ɣ rmongodb is used for MongoDB.
> library(rmongodb)
> SampleDatabase <- "Test_Database"
> MyMongoDB <- mongo.create(db=SampleDatabase)
## To insert a list
> mongo.insert(MyMongoDB, "Test_Database.Column", list(name = ...))
2. CSS, on the other hand, is a style sheet language that determines the visual presentation and layout of a website. It is responsible for defining the appearance and aesthetics of the site, including aspects such as colours, fonts, sizes and spacing.
One of the crucial aspects of CSS is selectors, which are patterns used to select elements. The .class selector is particularly important, as it selects all elements with the same class. For example, the .xyz selector will target all elements with class="xyz".
3. JavaScript is a programming language that enables interactive and dynamic functionality on web pages. It is used to define the behaviour and interactivity of a website, allowing for actions such as user input validation, content manipulation and dynamic updates.
Web scraping is a technique used to extract information from the lines of HTML, CSS and JavaScript code that make up a web page. The term typically denotes an automated process characterised by reduced error rates and increased speed compared to manual data collection methods.
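As a minimal sketch of scraping with a CSS class selector from R, the rvest package can be used; the URL and the .xyz class are illustrative placeholders:
> library(rvest)
> page <- read_html("https://example.com")
## Select all elements with class="xyz" and extract their text
> items <- html_text2(html_elements(page, ".xyz"))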
It is imperative to acknowledge that web scraping may give rise to ethical concerns
due to its involvement in accessing and utilising data from websites without explicit
permission from the website owner. Adhering to the terms of use for a website and
obtaining written consent prior to extracting substantial volumes of data are considered
best practices.
Ɣ Web scrapers as browser extensions provide added features and can be easily integrated with the browser for user convenience. However, they have limitations, since they operate within the browser's constraints and cannot execute advanced features beyond those limits.
Ɣ Software web scrapers, on the other hand, are independent programs that users
download and install on their computers. They offer greater complexity and
advanced features not restricted by browser limitations.
Ɣ Cloud web scrapers use remote servers provided by the vendor, freeing up a user’s
computer resources for other tasks. Local web scrapers, on the contrary, rely on the
user’s computer resources, potentially impacting overall performance if significant
CPU or RAM resources are required during scraping.
The principles of tidy data establish a standardised approach for organising data values in a dataset. A standard facilitates initial data cleaning by eliminating the need to develop a solution from scratch on each occasion. The tidy data standard has been specifically designed to make initial data exploration and analysis easier and to streamline the development of cohesive data analysis tools. Translation is frequently necessary when using existing tools: time must be allocated to manipulate the output generated by one tool in order to prepare it for use as input to another. Tidy datasets and tidy tools are complementary components that facilitate data analysis, enabling users to concentrate on the substantive domain problem rather than the mundane data logistics.
Figure: Following three rules makes a dataset tidy: variables are in columns, observations are in rows
and values are in cells.
(Image Source: https://r4ds.had.co.nz/tidy-data.html)
The interrelation of these three rules stems from the impossibility of satisfying only two out of the three. That interrelationship gives rise to a more streamlined set of practical instructions:
Ɣ Create a tibble for each dataset.
Ɣ Arrange each variable in a separate column.
For individuals who extensively use Excel, particularly Excel pivot tables, it is helpful to think of tidy data as data that is highly compatible with pivoting operations. Consider a scenario in which a pivot table is needed, but the original dataset presents dimensions in both rows and columns: the rows contain information about Campaign, the columns contain information about Device Category and the corresponding Sessions data populates the cells. If an individual must manually or extensively apply formulas to transform such raw data into a format suitable for input into a pivot table, they have encountered non-tidy data (a sketch of tidying this layout in R follows below).
Tidy data refers to data that may not be optimised for human readability, but is highly
compatible with subsequent R functions, particularly those within the tidyverse.
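A minimal sketch of tidying the Campaign/Device Category layout described above, assuming hypothetical column names and using tidyr's pivot_longer():
> library(tidyr)
> untidy <- data.frame(Campaign = c("A", "B"),
+                      Desktop = c(120, 80),
+                      Mobile = c(200, 150))
## Move the device columns into a single Device Category variable,
## with one Sessions value per row
> tidy <- pivot_longer(untidy, cols = c(Desktop, Mobile),
+                      names_to = "DeviceCategory", values_to = "Sessions")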
Ensuring the tidiness of your data is important for several reasons. There are two
primary advantages:
1. There is a notable benefit associated with selecting a uniform approach for data
storage. Having a consistent data structure facilitates the learning process of
tools that operate on it due to the presence of a fundamental uniformity.
2. Placing variables in columns offers a distinct advantage by leveraging the vectorised nature of R. As previously discussed in the context of the mutate and summarise functions, most built-in R functions operate on vectors of values, which makes transforming tidy data feel natural and intuitive.
> library(tidyverse)
Output:
── Attaching packages ─────────────────────────── tidyverse ──
✓ ggplot2 3.3.6     ✓ purrr 0.3.5
Select Function
# To select the following columns
mycols <- select(census, age, education, occupation)
head(mycols)
# To select all columns from education to relationship
mycols <- select(census, education:relationship)
# To print the first 5 rows
head(mycols, 5)
# To select columns by numeric index
mycols <- select(census, c(6:9))
head(mycols)