1. INITIALIZATION OF R PROGRAMMING
1. Display a welcome message using the cat function.
2. Set up two variables, a and b, with the values 5 and 7, respectively.
3. Perform a simple addition operation and store the result in the variable result.
4. Display the result using the cat function, including a message indicating that it is the sum of a and b.
5. Output a completion message using the cat function.
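The five steps above can be sketched as the short script below. The exact wording of the messages is an assumption; only the variable names a, b, and result come from the steps.

```r
# Step 1: display a welcome message (wording is illustrative)
cat("Welcome to R programming!\n")

# Step 2: set up the two variables
a <- 5
b <- 7

# Step 3: add them and store the sum
result <- a + b

# Step 4: report the sum with an explanatory message
cat("The sum of a and b is:", result, "\n")

# Step 5: signal completion
cat("Program completed.\n")
```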
2. IDENTIFYING TYPES OF VARIABLES: LEVELS OF MEASUREMENT
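As a preview, the procedure listed in this section could look roughly like the sketch below. The sample values are hypothetical. The sketch also makes two choices that are assumptions rather than part of the steps: it tests for ordered factors before plain factors (every ordered factor is also a factor, so the reverse order would never report "Ordinal Level"), and it returns the label so that the caller prints it. Note that R's types cannot separate interval from ratio data, so the numeric labels simply follow the wording used in this section.

```r
# Hypothetical sample data; the values are illustrative only
gender      <- c("Male", "Female", "Female", "Male")
education   <- factor(c("High School", "Bachelor", "Master", "Bachelor"),
                      levels = c("High School", "Bachelor", "Master"),
                      ordered = TRUE)
temperature <- c(22.5, 19.0, 30.2, 25.1)
income      <- c(42000, 55000.5, 61000, 38000)
age         <- c(25L, 32L, 47L, 51L)

identify_measurement <- function(variable) {
  # Ordered factors are checked first because is.factor() is also TRUE for them
  if (is.ordered(variable)) {
    "Ordinal Level"
  } else if (is.factor(variable) || is.character(variable)) {
    "Nominal Level"
  } else if (is.integer(variable)) {
    "Ratio Level (Integer)"
  } else if (is.numeric(variable)) {
    "Ratio Level (Continuous)"
  } else {
    "Unable to identify the level of measurement for this variable"
  }
}

# Apply the function to each sample variable and print the result
samples <- list(gender = gender, education = education,
                temperature = temperature, income = income, age = age)
for (name in names(samples)) {
  cat(name, ":", identify_measurement(samples[[name]]), "\n")
}
```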
1. Sample Data Definition:
• Define sample data vectors for gender, education, temperature, income, and age.
2. Function Definition (identify_measurement):
• Define a function named identify_measurement that takes a variable as an argument.
3. Level of Measurement Identification:
• Within the identify_measurement function:
  • Check if the variable is an ordered factor: if true, print "Ordinal Level." Make this check first, because an ordered factor is also a factor and would otherwise be reported as nominal.
  • Check if the variable is an unordered factor or a character vector: if true, print "Nominal Level."
  • Check if the variable is numeric:
    • If it's an integer, print "Ratio Level (Integer)."
    • If it's continuous, print "Ratio Level (Continuous)."
  • If none of the above conditions are met, print "Unable to identify the level of measurement for this variable."
4. Application of the Function:
• Apply the identify_measurement function to each of the sample variables (gender, education, temperature, income, and age).
• Print the result for each variable.
5. Output:
• The program will display the level of measurement for each variable based on its type and structure.
4. INTRODUCTION TO PROBABILITY - RECODING VARIABLES
1. Sample Data Definition:
• Define a vector named original_scores containing numerical values representing test scores.
2. Display Original Scores:
• Use the cat function to display the original scores.
3. Function Definition (recode_scores):
• Define a function named recode_scores that takes a vector of scores as an argument.
• Within the function:
  • Use the ifelse function to recode scores based on a threshold (e.g., 70).
  • Assign the recoded scores to a new variable (recoded_scores).
  • Return the recoded scores.
4. Recode the Original Scores:
• Call the recode_scores function with the original_scores vector as an argument.
• Assign the returned recoded scores to a variable named recoded_scores.
5. Display Recoded Scores:
• Use the cat function to display the recoded scores.
6. Output:
• The program will display the original scores and the corresponding recoded scores based on the threshold.
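A minimal sketch of these steps follows. The score values, the "Pass"/"Fail" labels, the choice of >= for the comparison, and the optional threshold argument are all assumptions made for illustration; the steps only fix the function name, the use of ifelse, and the example threshold of 70.

```r
# Hypothetical test scores
original_scores <- c(85, 62, 70, 91, 48, 77)
cat("Original scores:", original_scores, "\n")

# Recode each score against a threshold (default 70, as in the example)
recode_scores <- function(scores, threshold = 70) {
  # ifelse is vectorised, so every score is recoded in one call
  recoded_scores <- ifelse(scores >= threshold, "Pass", "Fail")
  return(recoded_scores)
}

recoded_scores <- recode_scores(original_scores)
cat("Recoded scores:", recoded_scores, "\n")
```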
5. UNIVARIATE STATISTICS
1. Sample Data Definition:
• Define a vector named numeric_variable containing numerical values.
2. Function Definition (compute_univariate_statistics):
• Define a function named compute_univariate_statistics that takes a vector (variable) as an argument.
• Within the function:
  • Calculate measures of central tendency: mean, median, and mode.
  • Calculate measures of dispersion: range, minimum, maximum, variance, and standard deviation.
  • Display the computed statistics using the cat function.
3. Compute Univariate Statistics:
• Call the compute_univariate_statistics function with the numeric_variable vector as an argument.
4. Output:
• The program will compute and display various univariate statistics for the numeric variable, including mean, median, mode, range, minimum, maximum, variance, and standard deviation.
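One possible implementation is sketched below. The data values are hypothetical. Because R's built-in mode function reports the storage type rather than the most frequent value, the sketch derives the statistical mode from a frequency table; that approach, and returning the statistics invisibly as a list, are choices made here rather than something the steps prescribe.

```r
# Hypothetical numeric data
numeric_variable <- c(12, 15, 15, 18, 21, 24, 15, 30)

compute_univariate_statistics <- function(variable) {
  # Statistical mode: the value(s) with the highest frequency
  freq <- table(variable)
  mode_value <- as.numeric(names(freq)[freq == max(freq)])

  stats <- list(
    mean     = mean(variable),
    median   = median(variable),
    mode     = mode_value,
    minimum  = min(variable),
    maximum  = max(variable),
    range    = max(variable) - min(variable),
    variance = var(variable),   # sample variance (n - 1 denominator)
    sd       = sd(variable)     # sample standard deviation
  )

  cat("Mean:", stats$mean, "\n")
  cat("Median:", stats$median, "\n")
  cat("Mode:", stats$mode, "\n")
  cat("Minimum:", stats$minimum, "\n")
  cat("Maximum:", stats$maximum, "\n")
  cat("Range:", stats$range, "\n")
  cat("Variance:", stats$variance, "\n")
  cat("Standard deviation:", stats$sd, "\n")

  invisible(stats)  # returned so the values can be reused
}

stats <- compute_univariate_statistics(numeric_variable)
```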
6. THE NORMAL CURVE - CREATING A HISTOGRAM IN R
1. Set Seed for Reproducibility:
• Use set.seed(123) to set a seed for reproducibility. This ensures that the random numbers generated in the following steps will be the same each time the script is run.
2. Generate Random Data:
• Use rnorm to generate 1000 random numbers from a normal distribution with a mean of 0 and a standard deviation of 1. Store the result in the variable data.
3. Create a Histogram:
• Use hist to create a histogram of the generated data.
• Customize the appearance with parameters such as color, title, axis labels, bar border color, and the number of bins.
• Plot densities rather than counts (freq = FALSE) so that the curve added in the next step shares the histogram's vertical scale.
4. Overlay Normal Curve on Histogram:
• Use curve to overlay a normal probability density function (PDF) on the histogram.
• The PDF is generated using dnorm with the mean and standard deviation calculated from the generated data.
• Customize the appearance of the curve with parameters such as color and line width, and set add = TRUE to draw it on the existing plot.
5. Output:
• The program will generate a reproducible set of random data, create a histogram of the data with the specified customization, and overlay a normal curve on the histogram. This visual representation helps in understanding the distribution of the generated data.
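The steps above might be written as follows. The colors, labels, and bin count are illustrative assumptions. The histogram is drawn with freq = FALSE so that its bars are densities and can be compared directly with the dnorm curve.

```r
# Fix the seed so the same random numbers are drawn on every run
set.seed(123)

# 1000 draws from a normal distribution with mean 0 and sd 1
data <- rnorm(1000, mean = 0, sd = 1)

# Histogram on the density scale (styling values are illustrative)
hist(data,
     freq   = FALSE,
     breaks = 30,
     col    = "lightblue",
     border = "white",
     main   = "Histogram of Simulated Normal Data",
     xlab   = "Value",
     ylab   = "Density")

# Normal PDF using the sample mean and sd, added to the existing plot
curve(dnorm(x, mean = mean(data), sd = sd(data)),
      col = "red", lwd = 2, add = TRUE)
```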
7. STANDARD DEVIATIONS, STANDARD SCORES AND THE NORMAL DISTRIBUTION
1. Set Seed for Reproducibility:
• Use set.seed(123) to set a seed for reproducibility. This ensures that the random numbers generated in the following steps will be the same each time the script is run.
2. Generate Random Data:
• Use rnorm to generate 1000 random numbers from a normal distribution with a mean of 50 and a standard deviation of 10. Store the result in the variable data.
3. Calculate Standard Deviation:
• Use sd to calculate the standard deviation of the generated data and store it in the variable std_dev.
• Display the calculated standard deviation using cat.
4. Calculate Standard Scores (Z-scores):
• Use scale to calculate the standard scores (z-scores) for the generated data and store them in the variable z_scores.
• Display the first 5 z-scores using cat and head.
5. Create a Histogram:
• Use hist to create a histogram of the generated data with the specified customization, plotting densities rather than counts (freq = FALSE) so that the curve added in the next step shares the vertical scale.
6. Overlay Normal Curve on Histogram:
• Use curve to overlay a normal probability density function (PDF) on the histogram.
• The PDF is generated using dnorm with the mean and standard deviation calculated from the generated data. Because the data are on their original scale (mean near 50, standard deviation near 10), this is a fitted normal curve, not the standard normal curve, which has mean 0 and standard deviation 1.
• Customize the appearance of the curve with parameters such as color and line width, and set add = TRUE to draw it on the existing plot.
7. Output:
• The program will generate a reproducible set of random data, calculate the standard deviation and z-scores, and create a histogram with an overlay of the fitted normal curve. This visual representation helps in understanding the distribution of the generated data and its standardization.
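A hedged sketch of this procedure is shown below. Styling values are assumptions. Two details are worth noting: scale returns a one-column matrix, so the sketch flattens it with as.vector, and the overlaid curve uses the sample mean and standard deviation of the raw data, since a curve with mean 0 and standard deviation 1 would not line up with data centred near 50.

```r
set.seed(123)

# 1000 draws from a normal distribution with mean 50 and sd 10
data <- rnorm(1000, mean = 50, sd = 10)

# Sample standard deviation
std_dev <- sd(data)
cat("Standard deviation:", std_dev, "\n")

# z-scores: (value - mean) / sd; as.vector drops the matrix wrapper
z_scores <- as.vector(scale(data))
cat("First 5 z-scores:", head(z_scores, 5), "\n")

# Histogram on the density scale (styling values are illustrative)
hist(data,
     freq   = FALSE,
     breaks = 30,
     col    = "lightgreen",
     border = "white",
     main   = "Histogram with Fitted Normal Curve",
     xlab   = "Value",
     ylab   = "Density")

# Normal PDF with the sample mean and sd, added to the existing plot
curve(dnorm(x, mean = mean(data), sd = std_dev),
      col = "darkblue", lwd = 2, add = TRUE)
```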
8. TESTING THE SIGNIFICANCE OF THE DIFFERENCE BETWEEN TWO MEANS
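As a preview, the procedure listed in this section can be sketched as below. The group means (50 and 55) and standard deviations (10) are hypothetical, since the steps leave them unspecified. Note that t.test uses the Welch test (unequal variances) by default; passing var.equal = TRUE would give the classical pooled-variance test instead.

```r
set.seed(123)

# Two groups of 30 observations (parameters are hypothetical)
group1 <- rnorm(30, mean = 50, sd = 10)
group2 <- rnorm(30, mean = 55, sd = 10)

# Independent samples t-test (Welch by default)
t_test_result <- t.test(group1, group2)

# Display the key results
cat("t statistic:", t_test_result$statistic, "\n")
cat("Degrees of freedom:", t_test_result$parameter, "\n")
cat("p-value:", t_test_result$p.value, "\n")

# Compare the p-value with the significance level
alpha <- 0.05
if (t_test_result$p.value < alpha) {
  cat("Reject the null hypothesis: the group means differ significantly.\n")
} else {
  cat("Fail to reject the null hypothesis: no significant difference found.\n")
}
```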
1. Set Seed for Reproducibility:
• Use set.seed(123) to set a seed for reproducibility. This ensures that the random numbers generated in the following steps will be the same each time the script is run.
2. Generate Two Sets of Random Data:
• Use rnorm to generate 30 random numbers for each of two groups (group1 and group2) from normal distributions with specified means and standard deviations.
3. Perform Independent Samples t-test:
• Use t.test to perform an independent samples t-test on the two groups (group1 and group2).
• Store the t-test results in the variable t_test_result.
4. Display t-test Results:
• Use cat to display the results of the independent samples t-test, including the test statistic, p-value, and degrees of freedom.
5. Check the Significance Level:
• Define the significance level (alpha) as 0.05.
6. Determine if the Difference is Statistically Significant:
• Use an if-else statement to check if the p-value is less than the significance level (alpha).
• Display a conclusion based on whether the null hypothesis is rejected or not.
7. Output:
• The program will generate two sets of random data, perform an independent samples t-test, display the t-test results, and provide a conclusion regarding the significance of the difference between the means of the two groups.
9. ONE AND TWO TAILED TESTS
1. Set Seed for Reproducibility:
• Use set.seed(123) to set a seed for reproducibility. This ensures that the random numbers generated in the following steps will be the same each time the script is run.
2. Generate Two Sets of Random Data:
• Use rnorm to generate 30 random numbers for each of two groups (group1 and group2) from normal distributions with specified means and standard deviations.
3. Perform Two-Tailed and One-Tailed Independent Samples t-tests:
• Use t.test to perform a two-tailed independent samples t-test on the two groups (group1 and group2).
• Use t.test with the alternative parameter set to "less" to perform a one-tailed independent samples t-test. With the call t.test(group1, group2, alternative = "less"), the alternative hypothesis is that the group1 mean is less than the group2 mean; swap the order of the groups to test the opposite direction.
4. Display the Results:
• Use cat to display the results of both the two-tailed and one-tailed t-tests, including the test statistic, p-value, and degrees of freedom.
5. Check the Significance Level:
• Define the significance level (alpha) as 0.05.
6. Determine if the Differences are Statistically Significant:
• Use an if-else statement to check if each p-value is less than the significance level (alpha). The two-tailed p-value reported by t.test already covers both tails, so it is compared with alpha directly, not with alpha/2.
• Display conclusions based on whether the null hypotheses are rejected or not for both tests.
7. Output:
• The program will generate two sets of random data, perform two types of t-tests, display the t-test results, and provide conclusions regarding the significance of the differences between the means of the two groups.
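The sketch below illustrates one way to carry out these steps. The group parameters and the helper report_test are hypothetical. With t.test(group1, group2, alternative = "less"), the alternative hypothesis is that the mean of the first argument is smaller than that of the second. Both p-values are compared with alpha itself, because the two-sided p-value returned by t.test already accounts for both tails.

```r
set.seed(123)

# Two groups of 30 observations (parameters are hypothetical)
group1 <- rnorm(30, mean = 50, sd = 10)
group2 <- rnorm(30, mean = 55, sd = 10)

# Two-tailed test: H1 is that the means differ
two_tailed_result <- t.test(group1, group2, alternative = "two.sided")

# One-tailed test: H1 is mean(group1) < mean(group2)
one_tailed_result <- t.test(group1, group2, alternative = "less")

# Hypothetical helper that prints the results and a conclusion
report_test <- function(label, test_result, alpha) {
  cat(label, "\n")
  cat("  t statistic:", test_result$statistic, "\n")
  cat("  Degrees of freedom:", test_result$parameter, "\n")
  cat("  p-value:", test_result$p.value, "\n")
  if (test_result$p.value < alpha) {
    cat("  Conclusion: reject the null hypothesis.\n")
  } else {
    cat("  Conclusion: fail to reject the null hypothesis.\n")
  }
}

alpha <- 0.05
report_test("Two-tailed test", two_tailed_result, alpha)
report_test("One-tailed test (group1 < group2)", one_tailed_result, alpha)
```

Both tests share the same t statistic; when that statistic is negative, the one-tailed p-value for "less" is half the two-tailed p-value.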
10. BIVARIATE STATISTICS FOR NOMINAL DATA AND ORDINAL DATA
1. Sample Data Definition:
• Define vectors nominal_data and ordinal_data containing categorical data.
2. Create a Data Frame:
• Use data.frame to create a data frame named data_df with two columns, "Nominal" and "Ordinal," containing the respective data vectors.
3. Display the Data Frame:
• Use cat and print to display the contents of the data frame.
4. Create a Contingency Table:
• Use table to create a contingency table named contingency_table from the data frame.
5. Display the Contingency Table:
• Use cat and print to display the contents of the contingency table.
6. Perform Chi-Square Test for Association:
• Use chisq.test to perform a chi-square test for association on the contingency table, storing the results in chi_square_result.
7. Display Chi-Square Test Results:
• Use cat and print to display the results of the chi-square test.
8. Check the Significance Level:
• Define the significance level (alpha) as 0.05.
9. Determine if the Association is Statistically Significant:
• Use an if-else statement to check if the p-value from the chi-square test is less than the significance level.
• Display a conclusion based on whether the null hypothesis is rejected or not.
10. Output:
• The program will create a data frame, generate a contingency table, perform a chi-square test for association, and display the results along with a conclusion about the significance of the association between the nominal and ordinal variables.
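These steps could be implemented along the following lines. The category labels and counts are hypothetical, chosen so that every expected cell count is at least 5 and the chi-square approximation is reasonable. Note that the chi-square test treats both variables as unordered, so it ignores the ordering of the ordinal variable.

```r
# Hypothetical categorical data: 3 areas x 3 satisfaction levels, 60 cases
nominal_data <- rep(c("Urban", "Suburban", "Rural"), each = 20)
ordinal_data <- factor(
  c(rep(c("Low", "Medium", "High"), times = c(4, 6, 10)),   # Urban
    rep(c("Low", "Medium", "High"), times = c(7, 7, 6)),    # Suburban
    rep(c("Low", "Medium", "High"), times = c(10, 6, 4))),  # Rural
  levels = c("Low", "Medium", "High"), ordered = TRUE)

# Combine the two vectors into a data frame
data_df <- data.frame(Nominal = nominal_data, Ordinal = ordinal_data)
cat("First rows of the data frame:\n")
print(head(data_df))

# Cross-tabulate the two variables
contingency_table <- table(data_df$Nominal, data_df$Ordinal,
                           dnn = c("Nominal", "Ordinal"))
cat("Contingency table:\n")
print(contingency_table)

# Chi-square test for association
chi_square_result <- chisq.test(contingency_table)
cat("Chi-square test results:\n")
print(chi_square_result)

# Compare the p-value with the significance level
alpha <- 0.05
if (chi_square_result$p.value < alpha) {
  cat("Reject the null hypothesis: the variables are associated.\n")
} else {
  cat("Fail to reject the null hypothesis: no significant association found.\n")
}
```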
11. BIVARIATE STATISTICS FOR INTERVAL/RATIO DATA
1. Sample Data Definition:
• Define vectors variable1 and variable2 containing numerical data.
2. Create a Data Frame:
• Use data.frame to create a data frame named data_df with two columns, "Variable1" and "Variable2," containing the respective data vectors.
3. Display the Data Frame:
• Use cat and print to display the contents of the data frame.
4. Create a Scatterplot:
• Use plot to create a scatterplot of variable1 against variable2.
• Customize the plot with a title, axis labels, point style (pch), and color (col).
5. Calculate the Correlation Coefficient:
• Use cor to calculate the correlation coefficient between variable1 and variable2.
6. Display the Correlation Coefficient:
• Use cat to display the calculated correlation coefficient.
7. Check the Strength of the Correlation:
• Use an if-else statement to check the absolute value of the correlation coefficient and classify the strength of the correlation as weak, moderate, or strong.
8. Output:
• The program will create a data frame, generate a scatterplot, calculate the correlation coefficient, display the results, and provide a qualitative assessment of the strength of the correlation between variable1 and variable2.
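A minimal sketch of these steps is given below. The data values, the plot styling, the choice to put variable1 on the horizontal axis, and the cut-offs of 0.3 and 0.7 for weak, moderate, and strong correlation are all assumptions; the steps do not fix any of them, and other conventions for the cut-offs exist. The cor function computes the Pearson correlation by default.

```r
# Hypothetical numerical data
variable1 <- c(2, 4, 6, 8, 10, 12, 14, 16)
variable2 <- c(1.5, 3.9, 6.1, 7.0, 10.4, 11.8, 15.2, 15.9)

# Combine the two vectors into a data frame
data_df <- data.frame(Variable1 = variable1, Variable2 = variable2)
cat("Data frame:\n")
print(data_df)

# Scatterplot (variable1 on the x-axis is an assumption)
plot(data_df$Variable1, data_df$Variable2,
     main = "Scatterplot of Variable1 and Variable2",
     xlab = "Variable1",
     ylab = "Variable2",
     pch  = 19,
     col  = "blue")

# Pearson correlation coefficient
correlation <- cor(variable1, variable2)
cat("Correlation coefficient:", correlation, "\n")

# Classify the strength using illustrative cut-offs
if (abs(correlation) < 0.3) {
  strength <- "weak"
} else if (abs(correlation) < 0.7) {
  strength <- "moderate"
} else {
  strength <- "strong"
}
cat("The correlation is", strength, "\n")
```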