Computational past papers

1. Functions of the three Python packages (NumPy, Pandas, Matplotlib) - 6 marks

NumPy:

- Array Operations:

Provides support for large multi-dimensional arrays and matrices, along with a large library of high-level
mathematical functions to operate on these arrays.

- Mathematical Functions:

Includes functions for operations like statistical analysis, linear algebra, Fourier transforms, and random
number generation.

- Efficiency:

Optimized for performance, allowing operations on arrays to be performed much faster than with
standard Python lists.
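As an illustrative sketch of these capabilities (the array values are arbitrary):

```python
import numpy as np

# Array operations: element-wise arithmetic on whole arrays at once.
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])
print(a + b)            # element-wise sum

# Mathematical functions: statistics, linear algebra, random numbers.
print(a.mean())         # statistical analysis
print(np.dot(a, b))     # linear algebra (dot product)
rng = np.random.default_rng(seed=0)
sample = rng.normal(size=5)   # random number generation
```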

Pandas:

- Data Structures:

Introduces data structures like Series (one-dimensional) and DataFrame (two-dimensional) for efficient
data manipulation and analysis.

- Data Manipulation:

Provides tools for data cleaning, merging, reshaping, and filtering.

- Handling Missing Data:

Includes functions to handle missing data, such as filling or dropping null values.
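A brief sketch of these three features (the station names and readings are made up):

```python
import pandas as pd

# Data structures: Series (1-D) and DataFrame (2-D).
s = pd.Series([1, 2, 3], name='counts')
df = pd.DataFrame({'station': ['A', 'B', 'C'],
                   'reading': [4.2, None, 5.1]})

# Data manipulation: filtering rows by a condition.
high = df[df['reading'] > 4.5]

# Handling missing data: fill nulls with the column mean, or drop them.
filled = df.fillna({'reading': df['reading'].mean()})
dropped = df.dropna()
```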

Matplotlib:

- Plotting:

Provides a comprehensive library for creating static, animated, and interactive visualizations in Python.

- Customization:

Allows for extensive customization of plots, including control over line styles, font properties, and more.

- Integration:

Works well with other libraries like NumPy and Pandas, enabling easy plotting of data stored in these
structures.
2. Describe what the following command does - 3 marks

x <- 3 if(x>2) y else y <- 3*x

This command is not valid R syntax: two statements (x <- 3 and the if expression) are written on one
line with no separator, and the if branch refers to y before it has been assigned. A corrected form is:

x <- 3

if (x > 2) {

y

} else {

y <- 3 * x

}

In the corrected command:

- x is assigned the value 3.

- The if condition checks whether x is greater than 2. Since x is 3, the condition is true.

- The true branch only evaluates y, so unless y was defined previously this raises an "object not found"
error; the assignment y <- 3 * x would run only if the condition were false.

3. State and describe five types of data representation in a computer - 5 marks

a. Binary (Machine Code):

The most basic form of data representation, using binary digits (0s and 1s) to represent all types of data.

b. Text (ASCII/Unicode):
Characters are represented using standards like ASCII or Unicode, allowing text data to be
encoded in a binary format.
c. Integer:
Whole numbers represented in binary form, either as signed or unsigned integers.
d. Floating-point:
Numbers with fractional parts, represented using a specific format (like IEEE 754) to encode the
value in binary.
e. Boolean:
Logical data that can be either true or false, often represented as 1 or 0 in binary.
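These representations can be inspected directly in Python (a small illustrative sketch):

```python
import struct

# a/c. Binary and integers: Python integers can be shown in binary form.
print(bin(10))                      # binary representation of 10

# b. Text: characters map to numeric code points under ASCII/Unicode.
print(ord('A'), chr(65))            # 'A' is code point 65

# d. Floating-point: the IEEE 754 double-precision bytes of 1.5.
print(struct.pack('>d', 1.5).hex())

# e. Boolean: True and False behave as 1 and 0.
print(int(True), int(False))
```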

4. Explain the difference between supervised and unsupervised learning - 4 marks

Supervised Learning:

- Definition: Involves training a model on a labeled dataset, where the correct output is known for each
training example.

- Purpose: Used for tasks like classification and regression where the goal is to predict an output based
on input data.

- Example: Predicting house prices based on features like size, location, and number of rooms.

Unsupervised Learning:

- Definition: Involves training a model on an unlabeled dataset, where the output is not provided, and
the model tries to find patterns or structures in the data.

- Purpose: Used for tasks like clustering and dimensionality reduction.

- Example: Grouping customers into segments based on purchasing behavior.
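The contrast can be sketched in a few lines of plain Python (the data points and cluster count are invented for illustration):

```python
# Supervised: labelled pairs (x, y) are given; fit y = w*x by least squares.
data = [(1, 2.1), (2, 3.9), (3, 6.0)]
w = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
predict = lambda x: w * x   # predicts an output for new input data

# Unsupervised: only unlabelled points; find structure (two clusters)
# by repeatedly assigning points to the nearest centroid.
points = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
c1, c2 = min(points), max(points)
for _ in range(5):
    g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
    g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
    c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
```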

5. Differentiate between overfitting and underfitting in data models - 4 marks

Overfitting:

- Definition: Occurs when a model learns the training data too well, including noise and outliers, leading
to poor performance on unseen data.

- Symptoms: High accuracy on training data but low accuracy on test data.

- Solution: Use techniques like cross-validation, pruning, regularization, and simplifying the model.

Underfitting:

- Definition: Occurs when a model is too simple to capture the underlying patterns in the data, leading
to poor performance on both training and test data.

- Symptoms: Low accuracy on both training and test data.


- Solution: Use more complex models, add features, or reduce bias.
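The symptoms can be demonstrated with a toy experiment (the data and the two extreme models are contrived for illustration):

```python
import random

random.seed(0)
# True relationship y = 2x, with noise added to both sets.
train = [(x, 2 * x + random.gauss(0, 0.5)) for x in range(10)]
test = [(x, 2 * x + random.gauss(0, 0.5)) for x in range(10)]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Underfitting: a constant model ignores x entirely, so it is poor
# on training and test data alike.
mean_y = sum(y for _, y in train) / len(train)
underfit = lambda x: mean_y

# Overfitting: memorise the training labels exactly (zero training error),
# including their noise, which does not transfer to the test set.
lookup = dict(train)
overfit = lambda x: lookup.get(x, mean_y)
```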

6. Briefly describe any three problem-solving strategies - 6 marks

a. Divide and Conquer:

- Approach: Break down a large problem into smaller, more manageable sub-problems, solve each sub-
problem individually, and then combine the solutions.

- Example: Sorting algorithms like Merge Sort and Quick Sort.
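Merge Sort illustrates the strategy directly (a standard textbook sketch):

```python
def merge_sort(xs):
    """Divide: split the list in half; conquer: sort each half
    recursively; combine: merge the two sorted halves."""
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```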

b. Dynamic Programming:

- Approach: Solve complex problems by breaking them down into simpler overlapping sub-problems
and storing the results of these sub-problems to avoid redundant computations.

- Example: Fibonacci sequence calculation, shortest path algorithms like Dijkstra's.
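The Fibonacci example can be written with memoisation, so each overlapping sub-problem is computed once and reused:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # fib(n-1) and fib(n-2) overlap heavily; caching their results
    # turns an exponential computation into a linear one.
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```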

c. Greedy Algorithm:

- Approach: Make a series of choices by selecting the best option available at each step without
reconsidering previous choices.

- Example: Coin change problem, Kruskal’s algorithm for minimum spanning trees.
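A sketch of the greedy coin-change approach (the coin denominations are illustrative; greedy choice is optimal only for canonical coin systems like this one):

```python
def greedy_change(amount, coins=(50, 20, 10, 5, 1)):
    """At each step take the largest coin that fits,
    never reconsidering earlier choices."""
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            result.append(coin)
            amount -= coin
    return result
```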

7. Define the following terms - 2 marks

Algorithm:

- Definition: A step-by-step procedure or formula for solving a problem, often expressed in pseudocode
or a programming language.

Debugging:

- Definition: The process of identifying, analyzing, and removing errors or bugs in a computer program to
ensure it runs as expected.

8. Write a Python code to create a data frame with appropriate headings from the list - 4 marks
Here's a Python example to create a DataFrame from a list of dictionaries:

python

import pandas as pd

# List of dictionaries; the keys become the column headings
data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'}
]

# Creating the DataFrame
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

9. Environmental data analysis - 16 marks

Preprocessing Steps (5 marks):

a. Handling Missing Data:


Identify missing values and decide whether to fill them (imputation) or remove them. For
instance, using mean/mode for imputation or dropping rows/columns with excessive missing
data.
b. Outlier Detection:
Identify and handle outliers using statistical methods or visualization techniques like box plots.
c. Normalization/Standardization:
Normalize or standardize data to bring different features onto a similar scale, which can
improve the performance of many machine learning algorithms.
d. Encoding Categorical Data:
Convert categorical variables into numerical format using techniques like one-hot encoding.
e. Data Splitting:
Split the dataset into training and testing sets to validate the model's performance on unseen
data.
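Steps a-c above can be sketched with pandas (the column names and readings are hypothetical):

```python
import pandas as pd

# Hypothetical environmental readings with gaps.
df = pd.DataFrame({
    'co2':  [410.0, None, 415.2, 900.0, 412.1],
    'pm25': [12.0, 14.5, None, 13.1, 12.8],
})

# a. Handling missing data: impute with each column's mean.
df = df.fillna(df.mean())

# b. Outlier detection: flag values far from the column mean (z-score).
z = (df['co2'] - df['co2'].mean()) / df['co2'].std()
outliers = df[z.abs() > 3]

# c. Normalisation: min-max scale every column to [0, 1].
scaled = (df - df.min()) / (df.max() - df.min())
```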

Correlation Analysis (4 marks):

a. Calculate Correlation Coefficients:


Use methods like Pearson, Spearman, or Kendall to calculate correlation coefficients between
industrial emissions and air quality metrics.
b. Visualize Correlation:
Create correlation matrices and heatmaps to visualize the relationships between different
variables.
c. Interpret Results:
Analyze the correlation coefficients to understand the strength and direction of the
relationships.

Variables Selection (2 marks):

- Industrial Emissions: Key variables might include emissions of specific pollutants like CO2, NOx, SOx.

- Air Quality Metrics: Include variables like PM2.5 levels, ozone levels, and other relevant air quality
indices.

- Reasoning: These variables are chosen because they directly measure the pollutants and air quality
levels which are necessary to assess the impact of industrial emissions.

Time Series Analysis (5 marks):

a. Decomposition: Decompose the time series data into trend, seasonal, and residual components to
understand the underlying patterns.

b. Visualization: Plot time series graphs to visualize trends, seasonal patterns, and anomalies over time.

c. Modeling: Apply time series models like ARIMA, SARIMA, or Exponential Smoothing to model and
forecast air quality trends.

d. Validation: Use techniques like cross-validation on time series data to ensure the model's accuracy.

e. Interpretation: Analyze the results to identify long-term trends, seasonal effects, and potential
impacts of industrial emissions on air quality.
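Step a can be sketched with a simplified moving-average decomposition (the monthly PM2.5 series is synthetic, with an injected trend and yearly seasonality; a full decomposition would typically use a dedicated library):

```python
import pandas as pd

# Synthetic monthly PM2.5: upward trend plus a 12-month seasonal swing.
idx = pd.date_range('2020-01-01', periods=24, freq='MS')
values = [10 + 0.2 * i + (3 if i % 12 < 6 else -3) for i in range(24)]
series = pd.Series(values, index=idx)

# Trend: a centred 12-month moving average cancels the seasonal swing.
trend = series.rolling(window=12, center=True).mean()

# Seasonal + residual: what remains after removing the trend.
detrended = series - trend
```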

10. Discuss the two sources of errors in computational methods - 4 marks


a. Truncation Error:

- Definition: Arises when an infinite process is approximated by a finite one, such as truncating an
infinite series or using a finite number of terms.

- Example: Approximating the value of π using a limited number of terms in its series representation.
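The π example can be made concrete with the Leibniz series, where stopping after a finite number of terms leaves a truncation error of roughly 1/n:

```python
import math

def leibniz_pi(n_terms):
    # pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...; truncated after n_terms terms.
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(n_terms))

truncation_error = abs(math.pi - leibniz_pi(1000))
```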

b. Round-off Error:

- Definition: Occurs due to the finite precision with which computers represent real numbers, leading to
small discrepancies between the true value and its computer representation.

- Example: When performing arithmetic operations on floating-point numbers, the precision limits of the
hardware can introduce small errors that accumulate over multiple operations.
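A classic demonstration: 0.1 and 0.2 have no exact binary representation, so their floating-point sum is not exactly 0.3, and repeated additions accumulate the discrepancy:

```python
print(0.1 + 0.2 == 0.3)              # False: round-off in both operands

total = sum(0.1 for _ in range(10))  # ten additions accumulate error
print(total == 1.0)                  # False
```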
