Computational
NumPy:
- Array Operations:
Provides support for large multi-dimensional arrays and matrices, along with a large library of high-level
mathematical functions to operate on these arrays.
- Mathematical Functions:
Includes functions for operations like statistical analysis, linear algebra, Fourier transforms, and random
number generation.
- Efficiency:
Optimized for performance, allowing operations on arrays to be performed much faster than with
standard Python lists.
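As a minimal sketch of the three points above, the following applies element-wise arithmetic, a statistical function, and a linear-algebra operation to small NumPy arrays. The array values are made up for illustration only.

```python
import numpy as np

# Hypothetical 2x2 matrix and vector, chosen only for illustration
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([10.0, 20.0])

# Array operation: element-wise arithmetic without an explicit Python loop
doubled = a * 2

# Mathematical function: mean of all elements
mean_value = a.mean()

# Linear algebra: matrix-vector product
product = a @ b

print(doubled)
print(mean_value)
print(product)
```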
Pandas:
- Data Structures:
Introduces data structures like Series (one-dimensional) and DataFrame (two-dimensional) for efficient
data manipulation and analysis.
- Data Manipulation:
Includes functions to handle missing data, such as filling or dropping null values.
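The handling of missing data can be sketched as follows. The column names and readings are hypothetical; only the fillna and dropna calls are the point of the example.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with one missing value per column
df = pd.DataFrame({
    "temperature": [21.5, np.nan, 23.0],
    "humidity": [40.0, 42.0, np.nan],
})

# Fill each missing value with the mean of its column
filled = df.fillna(df.mean())

# Alternatively, drop every row that contains a missing value
dropped = df.dropna()

print(filled)
print(dropped)
```

Filling keeps all rows at the cost of inventing values, while dropping keeps only fully observed rows; which one is appropriate depends on the analysis.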
Matplotlib:
- Plotting:
Provides a comprehensive library for creating static, animated, and interactive visualizations in Python.
- Customization:
Allows for extensive customization of plots, including control over line styles, font properties, and more.
- Integration:
Works well with other libraries like NumPy and Pandas, enabling easy plotting of data stored in these
structures.
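A minimal sketch of plotting NumPy data with a few customizations is given below. The sine data, the labels, and the choice of the non-interactive "Agg" backend are assumptions made so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Data stored in NumPy arrays
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

fig, ax = plt.subplots()

# Customization: line style, color, and width
ax.plot(x, y, linestyle="--", color="red", linewidth=2, label="sin(x)")

# Customization: title font size and axis labels
ax.set_title("Sine curve", fontsize=14)
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()
```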
2. Describe what the following command does - 3 marks
This command contains a logical error. In R, an if/else statement needs a condition and a separate
command for each of the if and else clauses. A correctly structured form is:
x <- 3
if (x > 2) {
  y <- y
} else {
  y <- 3 * x
}
- If the condition is true (as it is here, since x is 3), y is supposed to be assigned a value. However,
the if branch assigns y to itself, so R reports that object 'y' is not found unless y has been defined
previously.
a. Binary:
The most basic form of data representation, using binary digits (0s and 1s) to represent all types of data.
b. Text (ASCII/Unicode):
Characters are represented using standards like ASCII or Unicode, allowing text data to be
encoded in a binary format.
c. Integer:
Whole numbers represented in binary form, either as signed or unsigned integers.
d. Floating-point:
Numbers with fractional parts, represented using a specific format (like IEEE 754) to encode the
value in binary.
e. Boolean:
Logical data that can be either true or false, often represented as 1 or 0 in binary.
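The representations listed above can be inspected directly in Python. This is a small sketch: the sample values are arbitrary, and the use of 8-bit two's complement for the signed integer and of a 32-bit big-endian float are assumptions chosen for readability.

```python
import struct

# Integer: 5 as an 8-bit unsigned binary string
int_bits = format(5, "08b")

# Signed integer: -5 in 8-bit two's complement (assumed convention)
signed_bits = format(-5 & 0xFF, "08b")

# Text: "A" encoded to bytes (ASCII and UTF-8 agree for this character)
text_bits = format("A".encode("utf-8")[0], "08b")

# Floating-point: 0.5 packed as a 32-bit IEEE 754 value, big-endian
float_bits = "".join(format(byte, "08b") for byte in struct.pack(">f", 0.5))

# Boolean: True and False map to 1 and 0
bool_values = (int(True), int(False))

print(int_bits, signed_bits, text_bits, float_bits, bool_values)
```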
Supervised Learning:
- Definition: Involves training a model on a labeled dataset, where the correct output is known for each
training example.
- Purpose: Used for tasks like classification and regression where the goal is to predict an output based
on input data.
- Example: Predicting house prices based on features like size, location, and number of rooms.
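The house price example can be sketched as a least-squares linear regression. The data below is invented and deliberately follows an exact linear rule, so the model recovers it; real data would be noisy and would normally call for a dedicated machine learning library.

```python
import numpy as np

# Hypothetical labeled data: (size in square meters, number of rooms) -> price in thousands
sizes = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
rooms = np.array([2.0, 3.0, 3.0, 4.0, 5.0])
prices = 50 + 2 * sizes + 10 * rooms  # known outputs (labels)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(sizes), sizes, rooms])

# Training: fit the coefficients by least squares
coeffs, *_ = np.linalg.lstsq(X, prices, rcond=None)

# Prediction for an unseen house: 90 square meters, 3 rooms
predicted = np.array([1.0, 90.0, 3.0]) @ coeffs
print(predicted)
```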
Unsupervised Learning:
- Definition: Involves training a model on an unlabeled dataset, where the output is not provided, and
the model tries to find patterns or structures in the data.
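As one illustration of finding structure without labels, the sketch below runs a small k-means clustering on one-dimensional data. K-means is only one possible technique, and the points, the number of clusters, and the initialization from the minimum and maximum values are all assumptions made for this example.

```python
import numpy as np

# Hypothetical unlabeled measurements forming two visible groups
points = np.array([1.0, 1.2, 0.8, 8.0, 8.2, 7.8])

# Assumed initialization: start the two centroids at the extremes of the data
centroids = np.array([points.min(), points.max()])

for _ in range(10):
    # Assignment step: attach each point to its nearest centroid
    distances = np.abs(points[:, None] - centroids[None, :])
    labels = np.argmin(distances, axis=1)
    # Update step: move each centroid to the mean of its assigned points
    centroids = np.array([points[labels == k].mean() for k in range(2)])

print(labels)
print(centroids)
```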
Overfitting:
- Definition: Occurs when a model learns the training data too well, including noise and outliers, leading
to poor performance on unseen data.
- Symptoms: High accuracy on training data but low accuracy on test data.
- Solution: Use techniques like cross-validation, pruning, regularization, and simplifying the model.
Underfitting:
- Definition: Occurs when a model is too simple to capture the underlying patterns in the data, leading
to poor performance on both training and test data.
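The contrast between underfitting and overfitting can be explored with polynomial fits of increasing degree. This is a sketch on synthetic data: the sine curve, the noise level, the random seed, and the degrees 1, 3, and 9 are assumptions. A low degree is typically too simple for the curve, while a high degree tends to follow the noise in the training points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy samples of a sine curve, split into training and test points
x_train = np.linspace(0, 1, 12)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0.04, 0.96, 12)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, x_test.size)

def mse(degree):
    """Fit a polynomial of the given degree; return (training MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for degree in (1, 3, 9):
    train_err, test_err = mse(degree)
    print(f"degree {degree}: train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")
```

Training error can only shrink as the degree grows, so the gap between training and test error is the quantity to watch when diagnosing overfitting.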
a. Divide and Conquer:
- Approach: Break down a large problem into smaller, more manageable sub-problems, solve each
sub-problem individually, and then combine the solutions.
b. Dynamic Programming:
- Approach: Solve complex problems by breaking them down into simpler overlapping sub-problems
and storing the results of these sub-problems to avoid redundant computations.
c. Greedy Algorithm:
- Approach: Make a series of choices by selecting the best option available at each step without
reconsidering previous choices.
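- Example: Coin change problem (the greedy choice is optimal only for suitable coin systems, such as
standard currency denominations), Kruskal’s algorithm for minimum spanning trees.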
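The three approaches can be compared side by side in a short sketch. Merge sort and memoized Fibonacci numbers are illustrative choices that do not come from the text above, and the coin denominations are an assumption.

```python
from functools import lru_cache

def merge_sort(values):
    """Divide and conquer: split the list, sort each half, merge the results."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

@lru_cache(maxsize=None)
def fib(n):
    """Dynamic programming: overlapping sub-problems are cached, not recomputed."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def greedy_change(amount, coins=(25, 10, 5, 1)):
    """Greedy: always take the largest coin that still fits, never backtrack."""
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            result.append(coin)
    return result

print(merge_sort([5, 2, 9, 1, 5, 6]))
print(fib(30))
print(greedy_change(63))
```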
Algorithm:
- Definition: A step-by-step procedure or formula for solving a problem, often expressed in pseudocode
or a programming language.
Debugging:
- Definition: The process of identifying, analyzing, and removing errors or bugs in a computer program to
ensure it runs as expected.
8. Write a Python code to create a data frame with appropriate headings from the list - 4 marks
Here's a Python example to create a DataFrame from a list of dictionaries:
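import pandas as pd
# List of dictionaries (the entries given in the question are missing from the source)
data = [
]
# Create the DataFrame; the dictionary keys become the column headings
df = pd.DataFrame(data)
# Display the DataFrame
print(df)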
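Because the list from the question is not available, the following sketch uses hypothetical records to show how headings arise. With a list of dictionaries the keys become the column headings; with a list of lists the headings are passed through the columns argument.

```python
import pandas as pd

# Hypothetical records; these are not the values from the original question
records = [
    {"name": "Asha", "age": 21, "city": "Pune"},
    {"name": "Ravi", "age": 23, "city": "Delhi"},
    {"name": "Meena", "age": 22, "city": "Chennai"},
]

# Dictionary keys become the column headings
df_from_dicts = pd.DataFrame(records)

# For a list of lists, supply the headings explicitly
rows = [["Asha", 21, "Pune"], ["Ravi", 23, "Delhi"], ["Meena", 22, "Chennai"]]
df_from_rows = pd.DataFrame(rows, columns=["name", "age", "city"])

print(df_from_dicts)
```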
- Industrial Emissions: Key variables might include emissions of specific pollutants like CO2, NOx, SOx.
- Air Quality Metrics: Include variables like PM2.5 levels, ozone levels, and other relevant air quality
indices.
- Reasoning: These variables are chosen because they directly measure the pollutants and air quality
levels which are necessary to assess the impact of industrial emissions.
a. Decomposition: Decompose the time series data into trend, seasonal, and residual components to
understand the underlying patterns.
b. Visualization: Plot time series graphs to visualize trends, seasonal patterns, and anomalies over time.
c. Modeling: Apply time series models like ARIMA, SARIMA, or Exponential Smoothing to model and
forecast air quality trends.
d. Validation: Use time series cross-validation (for example, rolling-origin evaluation, where each test
period comes after its training period) to check the model's accuracy.
e. Interpretation: Analyze the results to identify long-term trends, seasonal effects, and potential
impacts of industrial emissions on air quality.
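The decomposition step can be sketched with pandas alone. This is a simplified moving-average decomposition on synthetic, noise-free monthly data; the series name pm25, the date range, and the trend and seasonal shapes are assumptions, and it does not replace the ARIMA-type models mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly PM2.5 series: linear trend plus a 12-month seasonal cycle
months = pd.date_range("2020-01-01", periods=48, freq="MS")
t = np.arange(48)
series = pd.Series(30 + 0.5 * t + 5 * np.sin(2 * np.pi * t / 12),
                   index=months, name="pm25")

# Trend: centered 12-month moving average (undefined near both ends)
trend = series.rolling(window=12, center=True).mean()

# Seasonal component: average detrended value for each calendar month
detrended = series - trend
seasonal = detrended.groupby(detrended.index.month).transform("mean")

# Residual: whatever the trend and seasonal components do not explain
residual = series - trend - seasonal

print(trend.dropna().head())
print(residual.dropna().abs().max())
```

On real measurements the residual would not vanish; unusually large residuals are candidates for the anomalies mentioned in the visualization step.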
a. Truncation Error:
- Definition: Arises when an infinite process is approximated by a finite one, such as truncating an
infinite series or using a finite number of terms.
- Example: Approximating the value of π using a limited number of terms in its series representation.
b. Round-off Error:
- Definition: Occurs due to the finite precision with which computers represent real numbers, leading to
small discrepancies between the true value and its computer representation.
- Example: When performing arithmetic operations on floating-point numbers, the precision limits of the
hardware can introduce small errors that accumulate over multiple operations.
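Both kinds of error can be observed in a few lines. The Leibniz series is used here as one possible series for π (an assumption; the text does not name a specific series), and the repeated addition of 0.1 is a standard illustration of round-off.

```python
import math

def leibniz_pi(terms):
    """Approximate pi using the first `terms` terms of the Leibniz series."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

# Truncation error: shrinks as more terms of the infinite series are kept
truncation_10 = abs(math.pi - leibniz_pi(10))
truncation_1000 = abs(math.pi - leibniz_pi(1000))

# Round-off error: 0.1 has no exact binary representation, so errors accumulate
naive_sum = sum([0.1] * 10)

# math.fsum tracks the lost low-order bits and returns the correctly rounded sum
exact_sum = math.fsum([0.1] * 10)

print(truncation_10, truncation_1000)
print(naive_sum == 1.0, exact_sum == 1.0)
```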