
DSF Record

The document outlines a series of experiments conducted using Excel and R, focusing on data analysis techniques such as summary statistics, comparative statistics, univariate analysis, data cleansing, data visualization, and regression analysis. Each experiment includes an aim, procedure, and expected output, detailing steps for executing various statistical functions and data manipulation tasks. The document serves as a practical guide for performing data analysis and visualization using these software tools.


Table of Contents

Ex. No.  Date      Experiment Name                                              Marks  Signature

1        07-02-24  Data Entry and Calculating Summary Statistics
2        15-02-24  Generate Comparative Statistics in Excel
3        02-03-24  Univariate Analysis: Frequency, Mean, Median, Mode,
                   Variance, Standard Deviation, Skewness
4        11-03-24  Data Cleansing using Excel
5        18-03-24  Data Visualization using Excel
6        26-03-24  Simple Linear Regression Model in Microsoft Excel
7        13-04-24  Practical Based on NumPy Ndarray using R
8        13-04-24  Working with Pandas Data Frame using R
9        24-04-24  Handling Missing Values and Duplicate Values using R
10       24-04-24  Data Integration in R

Exp No: 1 DATA ENTRY AND CALCULATING SUMMARY STATISTICS
Date: 07/02/2024

AIM:
To calculate summary statistics for data entered in Excel.
PROCEDURE:
STEP 1: Open the data entry sheet in Excel.
STEP 2: Click File → Options.
STEP 3: Select Add-Ins in the window that opens.
STEP 4: In the Manage box, choose Excel Add-ins, click Go, and enable the Analysis ToolPak.
STEP 5: Choose the Data tab in the Microsoft Excel ribbon.
STEP 6: Select 'Data Analysis' in the Analysis group.
STEP 7: From the analysis tools, select Descriptive Statistics.
STEP 8: Select the input range and output range by dragging over the values.
STEP 9: Check Summary statistics and click OK.
STEP 10: The desired output is displayed.
OUTPUT:

FORMULAE:
Mean (Average) = AVERAGE(range)
Standard Error = STDEV(range) / SQRT(COUNT(range))
Median = MEDIAN(range)
Mode = MODE(range)
Standard Deviation = STDEV(range)
Sample Variance = VAR.S(range)
Kurtosis = KURT(range)
Skewness = SKEW(range)
Range = MAX(range) - MIN(range)
Minimum = MIN(range)
Maximum = MAX(range)
Sum = SUM(range)
Count = COUNT(range)
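The same statistics can be cross-checked in R, which is used in the later experiments. A minimal sketch with illustrative data, not taken from the record:

```r
# Illustrative data; any numeric vector works
x <- c(4, 8, 6, 5, 3, 7, 9, 5)

print(mean(x))                   # Mean (AVERAGE)
print(median(x))                 # Median
print(var(x))                    # Sample variance (VAR.S)
print(sd(x))                     # Sample standard deviation (STDEV)
print(sd(x) / sqrt(length(x)))   # Standard error of the mean
print(max(x) - min(x))           # Range
print(sum(x))                    # Sum
print(length(x))                 # Count
```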

RESULT:
The desired output is obtained.

Exp No: 2 GENERATE COMPARATIVE STATISTICS in EXCEL
Date: 15/02/2024
AIM:
To calculate comparative statistics in Excel.
PROCEDURE:
STEP 1: Open the data entry sheet in Excel.
STEP 2: Update the data, or enter some data in the Excel sheet.
STEP 3: Open Data Analysis in the Data tab.
STEP 4: In the Data Analysis dialog, select "t-Test: Paired Two Sample for Means".
STEP 5: In the dialog box, select the first dataset in the Variable 1 Range box.
STEP 6: Then, in the Variable 2 Range box, select the second dataset.
STEP 7: Select the output cell where the results should be displayed.
STEP 8: Click OK.
STEP 9: The desired output is displayed.
DESCRIPTION:
1. Mean:
The average value of each sample group, or the mean difference between paired observations.
2. Variance:
Measures how much the values in each sample differ from the mean, i.e., the variability within each sample group.
3. Observations:
The number of paired observations (data points) in the sample.
4. Pearson correlation:
Indicates the strength and direction of the linear relationship between the paired observations.
5. Hypothesized mean difference:
The expected difference between the means of the two paired samples under the null hypothesis.
6. Degrees of freedom:
Represents the number of independent pieces of information available to estimate a statistic, calculated here as the number of pairs minus 1.
7. t-Statistic (t Stat):
The calculated value of the t-test statistic, measuring the size of the difference relative to the variation in the data.
8. P(T<=t) one-tail:
The probability of observing the given result, or a more extreme one, in a one-tailed test if the null hypothesis is true.
9. t Critical one-tail:
The critical t-value for a one-tailed test, used to determine whether the observed t-statistic is significant.
10. P(T<=t) two-tail:
The probability of observing the given result, or a more extreme one, in a two-tailed test if the null hypothesis is true.
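The same paired t-test can be reproduced in R with the built-in t.test() function. The before/after values below are illustrative, not the record's data:

```r
# Illustrative paired samples (e.g., scores before and after training)
before <- c(72, 68, 75, 71, 69, 74)
after  <- c(75, 70, 78, 74, 70, 77)

result <- t.test(before, after, paired = TRUE)
print(result$statistic)   # t Stat
print(result$parameter)   # degrees of freedom (number of pairs minus 1)
print(result$p.value)     # P(T<=t) two-tail
```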
OUTPUT:

RESULT:
The desired output is obtained.

EX NO: 3 UNIVARIATE ANALYSIS: FREQUENCY, MEAN, MEDIAN, MODE,
Date: 02/03/2024 VARIANCE, STANDARD DEVIATION, SKEWNESS

AIM:
To use an employee dataset from a company and perform the following univariate analysis operations: frequency, mean, median, mode, variance, standard deviation, and skewness.

PROCEDURE:
STEP 1: Open the data entry sheet in Excel.
STEP 2: Update the data, or enter some data in the Excel sheet.
STEP 3: Calculate the mean by selecting the required data and using the mean formula.
STEP 4: Calculate the mode by selecting the required data and using the mode formula.
STEP 5: Calculate the variance, standard deviation, and skewness by selecting the required data and using the variance, standard deviation, and skewness formulas.
STEP 6: The desired output is displayed.

DESCRIPTION:
1. Frequency:
To create a frequency distribution in Excel, use the COUNTIF function.
Enter the unique values in one column.
In the adjacent column, use the formula:
=COUNTIF(DataRange, CellReference) for each value.

2. Mean (Average):
Use the AVERAGE function.
Example: =AVERAGE(DataRange)

3. Median:
Use the MEDIAN function.
Example: =MEDIAN(DataRange)

4. Mode:
Use the MODE.SNGL or MODE.MULT functions.
Example for single mode: =MODE.SNGL(DataRange)
Example for multiple modes: =MODE.MULT(DataRange)

5. Variance:
Use the VAR.P or VAR.S functions for population or sample variance.
Example for population variance: =VAR.P(DataRange)

Example for sample variance: =VAR.S(DataRange)

6. Standard Deviation:
Use the STDEV.P or STDEV.S functions for population or sample standard
deviation.
Example for population standard deviation: =STDEV.P(DataRange)
Example for sample standard deviation: =STDEV.S(DataRange)

7. Skewness:
Use the SKEW function for sample skewness or SKEW.P for population skewness.
Example for population skewness: =SKEW.P(DataRange)
Example for sample skewness: =SKEW(DataRange)
Replace DataRange with the actual range of your data.
Once these formulas are entered, Excel calculates the results automatically.
OUTPUT:

FORMULAE:
1. Mean = AVERAGE(RANGE)
2. Median = MEDIAN(RANGE)
3. Mode (single) = MODE.SNGL(RANGE)
4. Mode (multiple) = MODE.MULT(RANGE)
5. Variance (sample) = VAR.S(RANGE)
6. Variance (population) = VAR.P(RANGE)
7. Standard deviation (sample) = STDEV.S(RANGE)
8. Standard deviation (population) = STDEV.P(RANGE)
9. Skewness (sample) = SKEW(RANGE)
10. Skewness (population) = SKEW.P(RANGE)
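R offers the same univariate measures. Note that base R has no built-in statistical mode, so a small helper is needed; stat_mode below is a name chosen here for illustration, not a standard function. The data is illustrative:

```r
x <- c(2, 4, 4, 3, 5, 4, 2, 3)   # illustrative data

print(table(x))    # frequency of each unique value (COUNTIF per value)
print(mean(x))     # mean
print(median(x))   # median
print(var(x))      # sample variance
print(sd(x))       # sample standard deviation

# Base R has no built-in mode; this helper returns the most frequent value(s)
stat_mode <- function(v) {
  tab <- table(v)
  as.numeric(names(tab)[tab == max(tab)])
}
print(stat_mode(x))
```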

RESULT:
The desired output is obtained.

Exp No. 4 DATA CLEANSING USING EXCEL
Date: 11/03/2024
AIM:
To perform data cleansing with the given data set.
DESCRIPTION:
Data cleansing, also known as data cleaning or data scrubbing, involves
identifying and correcting errors or inconsistencies in a dataset to improve its quality and
reliability. Here are some common techniques for data cleansing in Excel:

REMOVING DUPLICATES:
Use the "Remove Duplicates" feature to eliminate duplicate records from
your dataset.
STEPS FOR REMOVING DUPLICATES:
STEP 1: Identify the range containing the data where you want to remove duplicates.
STEP 2: Go to the "Data" tab in Excel.
STEP 3: Locate and click on the "Remove Duplicates" button in the Data Tools group.
STEP 4: A dialog box will appear. Choose the columns where you want to remove duplicates
by checking the respective boxes.
STEP 5: Once you've selected the columns, click "OK" to confirm your selection and remove
duplicates from the chosen columns.
OUTPUT:

HANDLING MISSING VALUES:
Identify and handle missing values using Excel functions.
Use IFERROR, or other relevant functions to replace or remove missing values.
STEPS FOR HANDLING MISSING VALUES:
STEP 1: Identify cells or columns with missing values in your dataset.
STEP 2: Decide whether to replace missing values with appropriate values or remove them
entirely.
STEP 3: Utilize functions like IFERROR to replace missing values with specified
alternatives.
STEP 4: Implement the chosen method for handling missing values across the dataset.
OUTPUT:

CORRECTING TYPOS AND INCONSISTENCIES:
Use the "Find and Replace" feature to correct typos and inconsistencies.
STEPS FOR CORRECTING TYPOS AND INCONSISTENCIES:
STEP 1: Press Ctrl + H to open the "Find and Replace" dialog.
STEP 2: Enter the incorrect value in the "Find" field and the correct value in the "Replace"
field.
STEP 3: Click on "Replace All" to replace all occurrences of the incorrect value with the
correct one.
OUTPUT:

STANDARDIZING TEXT:
Ensure consistency in text data by standardizing it.
STEPS FOR STANDARDIZING TEXT:
STEP 1: Identify the range containing the text data you want to standardize.
STEP 2: Utilize functions like UPPER, LOWER, or PROPER to standardize text case as
required.
STEP 3: Apply the TRIM function to remove leading, trailing, and excess spaces in the text
data.
OUTPUT

VALIDATING DATA AGAINST A LIST:
Ensure that data values match a predefined list.
STEPS FOR VALIDATING DATA AGAINST A LIST:
STEP 1: Have a predefined list against which you want to validate your data.
STEP 2: Go to the "Data" tab in Excel.
STEP 3: Utilize VLOOKUP or INDEX/MATCH functions to check if each data value
matches an entry in the reference list.
STEP 4: Apply the chosen lookup function across the dataset to validate data against the
reference list.
OUTPUT

REMOVING UNNECESSARY SPACES:
Use the TRIM function to remove leading, trailing, and excess spaces.
STEPS FOR REMOVING UNNECESSARY SPACES:
STEP 1: Identify the range containing text data with unnecessary spaces.
STEP 2: Use the TRIM function to remove leading, trailing, and excess spaces from the text
data.
STEP 3: Apply the TRIM function across the identified range to remove unnecessary spaces.
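The same cleansing steps have direct equivalents in R, which is used in the later experiments. A sketch with illustrative strings:

```r
x <- c("  john SMITH ", "ALICE  jones", " bob  Brown ")

trimmed  <- trimws(x)                   # TRIM: remove leading/trailing spaces
squeezed <- gsub("\\s+", " ", trimmed)  # collapse internal runs of spaces
upper    <- toupper(squeezed)           # UPPER
lower    <- tolower(squeezed)           # LOWER
print(upper)

# Removing duplicates and flagging missing values, as in the Excel steps
y <- c("a", "b", "a", NA)
print(unique(y))   # duplicates removed
print(is.na(y))    # missing-value indicator
```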

RESULT:
The desired output is obtained.
Exp No: 5 DATA VISUALIZATION USING EXCEL
Date: 18/03/2024

AIM:
To implement Data Visualization using Excel.

PROCEDURE FOR CREATING SCATTER PLOT:


STEP 1: Prepare the data for a linear regression graph in Excel.
STEP 2: To visualize the relationship between the two variables, draw a linear regression chart.
STEP 3: Select the two columns with the data, including headers.
STEP 4: On the Insert tab, in the Charts group, click the Scatter chart icon, and select the Scatter thumbnail (the first one).
STEP 5: This will insert a scatter plot in the worksheet.
STEP 6: To draw the least squares regression line, right-click on any data point and choose Add Trendline… from the context menu.
STEP 7: On the right pane, select the Linear trendline shape and, optionally, check Display
Equation on Chart to get the regression formula.
STEP 8: Choose a different line color and use a solid line instead of a dashed line (select
Solid line in the Dash type box).
STEP 9: Add axes titles (Chart Elements button > Axis Titles).

OUTPUT:

PROCEDURE FOR CREATING HISTOGRAM:
STEP1: Prepare your data: Organize your data in a single column.
STEP2: Select your data range: Click and drag to select the range of cells containing your
data.
STEP3: Insert a histogram: Go to the "Insert" tab on the Excel ribbon.
STEP4: Choose a chart type: In the "Charts" group, select "Histogram" from the dropdown
menu.
STEP5: Customize the histogram: Excel will create a default histogram based on your data.
Customize it by clicking on various elements of the chart and using the formatting options
available.
STEP6: Adjust bin width (optional): Right-click on the histogram bars, select "Format Data
Series," and adjust the bin width under "Series Options."
STEP7: Add titles and labels: Add appropriate titles and labels to your histogram using the
Chart Design and Format tabs on the ribbon.
STEP8: Finalize and save: Once satisfied, finalize your histogram by adding finishing
touches and save your Excel file.
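The histogram's binning can be cross-checked in R: hist() with plot = FALSE returns the counts per bin without drawing. The data and bin edges below are illustrative:

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 7)

# Compute the bins without drawing the chart
h <- hist(x, breaks = seq(0, 8, by = 2), plot = FALSE)
print(h$counts)   # observations per bin
print(h$mids)     # bin midpoints

# To draw the chart interactively:
# hist(x, breaks = seq(0, 8, by = 2), main = "Histogram", xlab = "Value")
```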

RESULT:
The desired output is obtained.

Exp No: 6 SIMPLE LINEAR REGRESSION MODEL IN MICROSOFT EXCEL
Date: 26/03/2024

AIM:
To perform regression analysis in Excel.

PROCEDURE:
STEP 1: Open Excel.
STEP 2: Click File → Open and open the workbook.
STEP 3: Choose the Data tab in the Microsoft Excel ribbon.
STEP 4: Select 'Data Analysis' in the Analysis group.
STEP 5: Select Regression.
STEP 6: Select the Input X Range and Input Y Range by dragging over the values.
STEP 7: Uncheck Labels and check Residuals.
STEP 8: Select the output range by dragging over the values.
STEP 9: Click OK; the desired output is displayed.
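Excel's Regression tool can be cross-checked in R with lm(); the x/y values below are illustrative:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 4.1, 5.9, 8.2, 9.9)

model <- lm(y ~ x)
print(coef(model))               # intercept and slope coefficients
print(summary(model)$r.squared)  # R Square, as in Excel's summary output
print(residuals(model))          # residuals, as requested in the procedure
```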

OUTPUT:

RESULT:
The desired output is obtained.

Exp No:7 PRACTICAL BASED ON NUMPY NDARRAY USING R
Date: 13/04/2024

AIM:
To implement matrix addition and multiplication using NumPy ndarray using R.
DESCRIPTION:
NumPy is a Python library for numerical computing that provides a powerful array object called ndarray. R has no NumPy of its own; its closest built-in analogues are vectors, matrices, and arrays. The reticulate package lets R users interact with Python, including NumPy, so ndarray objects can also be used from R through that package.
Creation of ndarray:
Create a NumPy ndarray in R using the numpy module's array function.
Attributes of ndarray:
NumPy ndarrays have various attributes, such as shape, dtype, and size, which describe the
dimensions, data type, and total number of elements in the array.
Indexing and Slicing:
ndarray supports powerful indexing and slicing operations. You can access individual elements or subarrays using square brackets.
Mathematical Operations:
NumPy ndarray supports various mathematical operations, including element-wise
operations, matrix multiplication, and more.
Broadcasting:
NumPy allows broadcasting, which enables operations between arrays of different shapes and
sizes.
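NumPy's broadcasting corresponds most closely to R's recycling rules for vectors and the sweep() helper for matrices; a brief sketch:

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix

# A scalar is "broadcast" over every element
print(m + 10)

# A length-2 vector is recycled down the columns
v <- c(100, 200)
print(m + v)

# sweep() applies a vector across a chosen dimension (here, one value per column)
sw <- sweep(m, 2, c(10, 20, 30), "+")
print(sw)
```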
Conversion between R and NumPy:
Convert between R matrices and NumPy ndarrays using the py_to_r and r_to_py functions.
ALGORITHM
Step1: Start
Step2: Import numpy module
Step3: Print the basic characteristics and operations of array
Step4: Stop

PROGRAM CODE:
arr_1d <- c(1, 2, 3, 4, 5)
print("1D Array:")
print(arr_1d)
# Creating a 2-dimensional array
arr_2d <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
print("\n2D Array:")
print(arr_2d)
# Accessing elements
print(paste("\nElement at row 1, column 2:", arr_2d[1, 2]))
# Slicing
print("\nSlicing the array:")
print(arr_2d[2:3, 1:3])
# Operations
arr_a <- matrix(c(1, 2, 3, 4), nrow = 2)
arr_b <- matrix(c(5, 6, 7, 8), nrow = 2)
# Element-wise addition
result_add <- arr_a + arr_b
print("\nElement-wise Addition:")
print(result_add)
# Matrix multiplication
result_mul <- arr_a %*% arr_b
print("\nMatrix Multiplication:")
print(result_mul)

OUTPUT:

[1] "1D Array:"
[1] 1 2 3 4 5
[1] "\n2D Array:"
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[1] "\nElement at row 1, column 2: 2"
[1] "\nSlicing the array:"
[,1] [,2] [,3]
[1,] 4 5 6
[2,] 7 8 9
[1] "\nElement-wise Addition:"
[,1] [,2]
[1,] 6 10
[2,] 8 12
[1] "\nMatrix Multiplication:"
[,1] [,2]
[1,] 23 31
[2,] 34 46

RESULT:
Thus, the 2-dimensional array was created and the operations were performed.

Exp No: 8 WORKING WITH PANDAS DATA FRAME USING R
Date: 13/04/2024

AIM:
To create a data frame using R.

ALGORITHM:

1. Define vectors or lists for each column: Create vectors or lists containing the values
for each column in your data frame. Each vector represents a column, and the
elements in the vectors are the values for that column.

2. Combine the vectors into a data frame: Use the data.frame function to combine the
vectors into a data frame. Each vector becomes a column in the data frame, and you
can specify column names using the name = vector syntax.

3. Print or inspect the data frame: Print or use functions like head to inspect the created
data frame.

PROGRAM CODE:

# Create DataFrame from a 2D array


my_2darray <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)
df_2darray <- data.frame(my_2darray)
print(df_2darray)

# Create DataFrame from a dictionary


my_dict <- list('1' = c('1', '3'), '2' = c('1', '2'), '3' = c('2', '4'))
df_dict <- as.data.frame(my_dict)
print(df_dict)

# Create DataFrame from an existing DataFrame


my_df <- data.frame(A = c(4, 5, 6, 7))
df_existing <- data.frame(my_df)
print(df_existing)

# Create DataFrame from a Series


my_series <- c("London", "New Delhi", "Washington", "Brussels")
names(my_series) <- c("United Kingdom", "India", "United States", "Belgium")
df_series <- data.frame(Country = names(my_series), Capital = my_series)
print(df_series)

# Get the shape of the DataFrame


print(dim(df_2darray))

OUTPUT:

X1 X2 X3
1 1 2 3
2 4 5 6
X1 X2 X3
1 1 1 2
2 3 2 4
  A
1 4
2 5
3 6
4 7
Country Capital
United Kingdom United Kingdom London
India India New Delhi
United States United States Washington
Belgium Belgium Brussels
[1] 2 3


RESULT:
Thus, the data frame is created using R.

Exp No:9 HANDLING MISSING VALUES AND DUPLICATE VALUES
Date: 24/04/2024 USING R

AIM:
To implement operations of Data Cleansing and Preparation.

DESCRIPTION:
Data cleaning and preparation are crucial steps in the data science process. They involve handling missing values, dealing with outliers, transforming variables, and ensuring that the data is in a suitable format for analysis.

Handling Missing Values:

 Identify missing values in the dataset.


 Decide on a strategy for handling missing values:
 Remove rows or columns with missing values.
 Impute missing values with the mean, median, mode, or using more sophisticated
methods like regression imputation.
 Use domain knowledge to impute missing values.
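The imputation strategies above can be sketched in R: mean imputation with ifelse() and row removal with na.omit(). The data below is illustrative:

```r
age <- c(25, 30, NA, 35, 28)

# Mean imputation: replace each NA with the mean of the observed values
age_imputed <- ifelse(is.na(age), mean(age, na.rm = TRUE), age)
print(age_imputed)

# Alternatively, drop incomplete rows from a data frame
df <- data.frame(Age = age, City = c("NY", "SF", "CHI", "LA", "SEA"))
print(na.omit(df))
```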

Handling Duplicates:

 Check for and remove duplicate rows.


 Ensure that each row or observation is unique.

Handling Outliers:

 Identify and analyze outliers in the data.


 Decide on a strategy for dealing with outliers:
 Remove outliers if they are data entry errors.
 Transform data (e.g., log transformation) to reduce the impact of outliers.
 Use robust statistical methods that are less sensitive to outliers.

Variable Transformation:

 Transform variables if needed for better representation or analysis:


 Log transformation for skewed distributions.
 Standardization (scaling) of numerical variables.
 Encoding categorical variables using one-hot encoding, label encoding, or other
methods.
 Binning or discretization of continuous variables.

Dealing with Data Types:

 Ensure that data types are appropriate for analysis.


 Convert data types (e.g., converting string representations of numbers to actual
numeric values).

Handling Text Data:

 Tokenize and preprocess text data.


 Remove stop words and special characters.
 Perform stemming or lemmatization.

Dealing with Date and Time:

 Convert date and time columns to appropriate formats.


 Extract features like day of the week, month, or year.

Handling Categorical Data:

 Convert categorical variables to numerical representations using encoding techniques.


 Ensure that the encoding is suitable for the machine learning algorithm being used.

Data Integration:

 Merge or join datasets if needed for analysis.


 Ensure that the integration process is appropriate for the data and the analysis being
performed.

Data Scaling:

 Scale numerical features to a standard range if necessary, especially when using algorithms sensitive to scale.

Data Splitting:

 Split the dataset into training and testing sets for model evaluation.

PROGRAM CODE:

Handling Missing Values:


data <- data.frame(
Name = c("John", "Alice", "Bob", NA, "Charlie"),
Age = c(25, 30, NA, 35, 28),
City = c("New York", "San Francisco", "Chicago", "Los Angeles", "Seattle")
)

# Check for missing values in the entire dataset


missing_values <- is.na(data)

# Print the dataset with an indication of missing values


print("Original Dataset:")
print(data)

print("\nMissing Values Indicator:")


print(missing_values)
OUTPUT:

[1] "Original Dataset:"
Name Age City
1 John 25 New York
2 Alice 30 San Francisco
3 Bob NA Chicago
4 <NA> 35 Los Angeles
5 Charlie 28 Seattle
[1] "\nMissing Values Indicator:"
Name Age City
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE TRUE FALSE
[4,] TRUE FALSE FALSE
[5,] FALSE FALSE FALSE


Handling Duplicate Values:

# Create a sample dataset with duplicate rows


data <- data.frame(
ID = c(1, 2, 3, 4, 2, 5),
Name = c("John", "Alice", "Bob", "Charlie", "Alice", "Eva"),
Age = c(25, 30, 22, 35, 30, 28),
City = c("New York", "San Francisco", "Chicago", "Los Angeles", "San Francisco",
"Seattle")
)

# Check for and remove duplicate rows


duplicated_rows <- duplicated(data)
data_no_duplicates <- data[!duplicated_rows, ]

# Print the original dataset


print("Original Dataset:")
print(data)

# Print the dataset without duplicates


print("\nDataset without Duplicates:")
print(data_no_duplicates)

OUTPUT:

[1] "Original Dataset:"
ID Name Age City
1 1 John 25 New York
2 2 Alice 30 San Francisco
3 3 Bob 22 Chicago
4 4 Charlie 35 Los Angeles
5 2 Alice 30 San Francisco
6 5 Eva 28 Seattle
[1] "\nDataset without Duplicates:"
ID Name Age City
1 1 John 25 New York
2 2 Alice 30 San Francisco
3 3 Bob 22 Chicago
4 4 Charlie 35 Los Angeles
6 5 Eva 28 Seattle


RESULT:
Thus, the programs for handling missing values and duplicate values were implemented.
Exp No:10 DATA INTEGRATION IN R
Date: 24/04/2024

AIM:
To integrate two datasets using R programming.

DESCRIPTION:
Data integration is the process of combining and unifying data from different sources into a
single, cohesive view. In R, there are various tools and packages that facilitate data
integration tasks. Below is a description of the general process and some commonly used R
packages for data integration:

Data Integration Process:

Data Collection: Gather data from various sources, which may include databases,
spreadsheets, APIs, or other file formats.

Data Cleaning: Clean and preprocess the data to handle missing values, outliers, and
inconsistencies.

Data Transformation: Transform the data to a common format or structure, ensuring compatibility between different datasets.

Data Merging: Combine datasets based on common keys or variables to create a unified
dataset.

Data Validation: Check the integrity of the integrated dataset to ensure accuracy and
reliability.

R Packages for Data Integration:

dplyr: Part of the tidyverse, dplyr provides a set of functions for data manipulation. The
mutate(), filter(), and select() functions are commonly used for transforming and cleaning
data.

tidyr: Also part of the tidyverse, tidyr is useful for reshaping and cleaning data. Functions like
gather() and spread() can be handy for transforming data into a more suitable format.

merge(): The base R function merge() is used for merging datasets based on common
columns. It allows for inner and outer joins, similar to SQL.

data.table: The data.table package is known for its efficiency in handling large datasets. It
provides fast and concise syntax for data manipulation and merging.

dtplyr: This is an extension of dplyr that uses data.table under the hood, combining the syntax
of dplyr with the performance of data.table.

sqldf: If you are familiar with SQL, the sqldf package allows you to perform SQL queries on
R data frames, which can be useful for merging and transforming data.

reshape2: This package provides functions like melt() and dcast() for reshaping data frames,
which can be helpful during the transformation phase.
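The join behaviour of merge() mentioned above depends on its all arguments; a sketch with two small illustrative tables:

```r
a <- data.frame(ID = c(1, 2, 3), Name = c("John", "Alice", "Bob"))
b <- data.frame(ID = c(2, 3, 4), Age = c(25, 30, 22))

inner <- merge(a, b, by = "ID")                # inner join: only IDs 2 and 3
outer <- merge(a, b, by = "ID", all = TRUE)    # full outer join: IDs 1-4, NA filled
left  <- merge(a, b, by = "ID", all.x = TRUE)  # left join: every row of a kept

print(inner)
print(outer)
print(left)
```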

Program Code:
# Create two sample datasets
data1 <- data.frame(ID = c(1, 2, 3), Name = c("John", "Alice", "Bob"))
data2 <- data.frame(ID = c(2, 3, 4), Age = c(25, 30, 22))

# Merge datasets based on the common column "ID"


merged_data <- merge(data1, data2, by = "ID", all = TRUE)

# Print the original datasets


print("Dataset 1:")
print(data1)

print("\nDataset 2:")
print(data2)

# Print the merged dataset


print("\nMerged Dataset:")
print(merged_data)

OUTPUT:

[1] "Dataset 1:"


ID Name
1 1 John
2 2 Alice
3 3 Bob
[1] "\nDataset 2:"
ID Age
1 2 25
2 3 30
3 4 22
[1] "\nMerged Dataset:"
ID Name Age
1 1 John NA
2 2 Alice 25
3 3 Bob 30
4 4 <NA> 22


RESULT:
Thus, the integration of the two datasets is implemented using R.

