Assignment 2
Question 1: Explain the differences between traditional programming and machine learning
in the context of energy analytics. Provide an example of how each approach can be used to
solve a specific problem related to energy consumption prediction.
Traditional programming and machine learning (ML) approach problems in
fundamentally different ways, especially in the context of energy analytics.
Traditional Programming
Approach:
In traditional programming, a developer explicitly writes rules and algorithms to process data
and produce outputs. This method relies heavily on predefined logic and heuristics, meaning
the programmer must have a deep understanding of the problem domain.
For example, a rule-based program for predicting energy consumption might encode rules such as:
- If it's daytime, increase the prediction by a fixed percentage based on the historical average for that time.
- If the temperature is above a certain threshold, adjust the prediction upwards because cooling systems are likely to be in use.
This model would require continual tweaking and updating based on new data or changing
conditions, as it is rigid and relies on fixed rules.
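As a rough sketch, such a rule-based predictor might look like the following (the thresholds and adjustment percentages are illustrative assumptions, not values from a real system):

def predict_consumption(baseline_kwh, hour, temperature_c):
    # Start from a historical-average baseline and apply fixed, hand-written rules
    prediction = baseline_kwh
    if 8 <= hour <= 18:       # daytime rule: fixed percentage increase
        prediction *= 1.15
    if temperature_c > 28:    # hot-weather rule: cooling systems likely in use
        prediction *= 1.10
    return prediction

# Example: prediction for 2 p.m. on a 31 °C day
print(predict_consumption(baseline_kwh=500.0, hour=14, temperature_c=31.0))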
Machine Learning
Approach:
Machine learning, on the other hand, uses data-driven techniques to learn patterns from
historical data without explicit programming for every rule. Instead, the model is trained on a
dataset, which allows it to identify relationships and make predictions based on new, unseen
data.
1. Data Collection: Gather historical energy consumption data along with various influencing factors (e.g., temperature, occupancy, time of year).
2. Model Training: Use this historical data to train a machine learning model (e.g., a neural network, random forest, or gradient boosting).
3. Prediction: Once trained, the model can predict future energy consumption based on new
inputs (e.g., upcoming weather forecasts, day of the week).
For example, you might find that the ML model can identify complex interactions between
multiple factors that are not easily captured by a traditional rule-based system, leading to
more accurate predictions.
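A minimal sketch of this workflow with scikit-learn, assuming a CSV of historical readings with the file name and column names shown (both are illustrative assumptions):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical historical dataset of influencing factors and observed consumption
df = pd.read_csv('historical_energy.csv')
X = df[['temperature', 'occupancy', 'hour', 'day_of_week']]  # assumed feature columns
y = df['consumption']                                        # observed energy use

# Train a model that learns the feature/consumption relationships from the data
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X, y)

# Predict consumption for new, unseen conditions (e.g., tomorrow's forecast)
new_inputs = pd.DataFrame([{'temperature': 30, 'occupancy': 150, 'hour': 14, 'day_of_week': 2}])
print(model.predict(new_inputs))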
Summary
Traditional Programming: Explicitly defined rules and heuristics; suitable for well-
understood problems but can be inflexible and require constant updates.
Example: Rule-based algorithm adjusting energy predictions based on time and temperature.
Machine Learning: Data-driven, learns from patterns; more adaptive and can handle complex
relationships in data.
Example: Regression model predicting energy consumption based on historical data and
various features like weather and occupancy.
Both approaches have their place, but ML often offers more flexibility and accuracy in
dynamic fields like energy analytics.
Question 2: Describe how each element (representation, data collection, data preparation,
model selection, model training, model evaluation, and prediction) would be implemented in
a project aimed at forecasting energy demand for a city.
1. Representation
Implementation:
Feature Selection: Identify relevant features that may influence energy demand. This could include:
- Historical energy consumption data (hourly/daily)
- Weather data (temperature, humidity, precipitation)
- Time features (hour of the day, day of the week, holidays)
- Demographic data (population density, economic indicators)
- Events (local festivals, sports events)
Target Variable: Define the target variable as the total energy consumption for the city,
aggregated by the desired time interval (e.g., hourly, daily).
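For instance, a minimal sketch of building the target variable, assuming raw meter readings in a hypothetical meter_readings.csv with 'timestamp' and 'consumption' columns:

import pandas as pd

# Hypothetical raw meter readings indexed by timestamp
readings = pd.read_csv('meter_readings.csv', parse_dates=['timestamp'], index_col='timestamp')

# Target variable: total city-wide consumption aggregated to hourly intervals
target = readings['consumption'].resample('h').sum()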
2. Data Collection
Implementation:
Data Sources: Gather historical consumption records from the local utility or grid operator, weather observations and forecasts from meteorological services, and calendar, event, and demographic data from public records, at the same granularity as the target (e.g., hourly).
3. Data Preparation
Implementation:
Cleaning: Remove missing values, outliers, and duplicate records. Ensure data is
consistent and formatted correctly.
Transformation: Convert categorical variables (e.g., day of the week) into numerical
format using one-hot encoding.
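A brief pandas sketch of these two steps (the file and column names are assumptions for illustration):

import pandas as pd

df = pd.read_csv('city_energy.csv')  # hypothetical combined dataset

# Cleaning: drop duplicate rows and rows with missing values (or impute instead)
df = df.drop_duplicates()
df = df.dropna()

# Transformation: one-hot encode the categorical day-of-week column
df = pd.get_dummies(df, columns=['day_of_week'])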
4. Model Selection
Implementation:
Algorithm Choice: Evaluate various algorithms suitable for time series forecasting and
regression tasks, such as:
- Linear Regression
- Decision Trees or Random Forests
- Gradient Boosting Machines (GBM)
- Recurrent Neural Networks (RNN) for time series data
Framework: Choose appropriate machine learning frameworks (e.g., scikit-learn,
TensorFlow, or PyTorch) based on the selected algorithms.
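One way to compare candidate algorithms is cross-validation; the sketch below uses scikit-learn with placeholder data (real features and targets would come from the prepared dataset):

import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Placeholder features and target standing in for the prepared city-demand data
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = X @ np.array([3.0, 1.5, 0.5, 2.0]) + rng.normal(0, 0.1, 200)

candidates = {
    'linear_regression': LinearRegression(),
    'random_forest': RandomForestRegressor(random_state=42),
    'gradient_boosting': GradientBoostingRegressor(random_state=42),
}

# Compare candidates by mean absolute error across folds; for real time series data,
# TimeSeriesSplit could be passed as cv to respect temporal order
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_absolute_error')
    print(name, -scores.mean())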
5. Model Training
Implementation:
Training and Validation Split: Divide the dataset into training and validation sets
(e.g., 80% training, 20% validation) to assess model performance.
Training Process: Fit the model to the training data, allowing it to learn the
relationships between features and energy demand.
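A sketch of an 80/20 split and fit; because demand data is a time series, a chronological split (training on the earlier portion, validating on the most recent portion) is usually preferable to a random one. The placeholder data below is illustrative only:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Placeholder features and target ordered by time, standing in for the prepared dataset
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((1000, 3)), columns=['temperature', 'hour', 'day_of_week'])
y = pd.Series(rng.random(1000), name='demand')

# Chronological 80/20 split: train on the earlier 80%, validate on the most recent 20%
split_idx = int(len(X) * 0.8)
X_train, X_val = X.iloc[:split_idx], X.iloc[split_idx:]
y_train, y_val = y.iloc[:split_idx], y.iloc[split_idx:]

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)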
6. Model Evaluation
Implementation:
Metrics Selection: Choose appropriate evaluation metrics based on the project goals, such as:
- Mean Absolute Error (MAE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Percentage Error (MAPE)
Validation: Evaluate model performance using the validation set and assess overfitting by checking performance on unseen data.
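A short sketch of computing these metrics on held-out data (the actual and predicted values below are placeholders):

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Placeholder actual vs. predicted demand; real values would come from the validation set
y_val = np.array([120.0, 135.0, 150.0, 160.0])
y_pred = np.array([118.0, 140.0, 149.0, 155.0])

mae = mean_absolute_error(y_val, y_pred)
rmse = np.sqrt(mean_squared_error(y_val, y_pred))
mape = np.mean(np.abs((y_val - y_pred) / y_val)) * 100

print(f"MAE: {mae:.2f}  RMSE: {rmse:.2f}  MAPE: {mape:.2f}%")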
7. Prediction
Implementation:
Future Input Data: Collect and prepare future input data (e.g., weather forecasts,
upcoming events) for prediction.
Real-time Prediction: Implement a system that regularly fetches new data, updates
the input features, and generates energy demand forecasts at specified intervals (e.g.,
hourly, daily).
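A rough sketch of the recurring prediction step; fetch_forecast_features is a hypothetical helper standing in for whatever pipeline supplies weather forecasts and calendar data:

import pandas as pd

def fetch_forecast_features():
    # Hypothetical helper: in practice this would pull weather forecasts,
    # calendar information, and event data for the upcoming interval
    return pd.DataFrame([{'temperature': 22.0, 'hour': 15, 'day_of_week': 3}])

def forecast_next_interval(model):
    # Build the feature row for the upcoming interval and return the demand forecast
    features = fetch_forecast_features()
    return model.predict(features)[0]

# A scheduler (e.g., an hourly cron job) would call forecast_next_interval with the trained model.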
Question 3: For hourly energy consumption data for a year, describe the steps for preparing
this data for a machine learning model using Pandas. Include how you would handle missing
values, normalize the data, and create new features such as day of the week or hour of the
day.
import pandas as pd

# Load the hourly energy consumption data
data = pd.read_csv('hourly_energy_consumption.csv')
Approach:
You can fill missing values using various strategies, depending on the nature of the data.
Common approaches include forward filling, backward filling, or using interpolation.
# Forward fill to handle missing values
data['consumption'] = data['consumption'].ffill()
Ensure your date/time column is in the correct datetime format and set the timestamp as the index, which is useful for time series analysis.
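Assuming the timestamp column is named 'timestamp' (adjust to match your dataset):

# Parse the timestamp column and use it as the index
data['timestamp'] = pd.to_datetime(data['timestamp'])
data = data.set_index('timestamp')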
You can then extract useful features from the datetime index:
data['hour'] = data.index.hour
data['day_of_week'] = data.index.dayofweek
data['month'] = data.index.month
data['year'] = data.index.year
Normalization scales the data to a standard range, which is particularly useful for algorithms that are sensitive to feature scales.

from sklearn.preprocessing import MinMaxScaler

# Scale consumption values to the [0, 1] range
scaler = MinMaxScaler()
data['consumption_normalized'] = scaler.fit_transform(data[['consumption']])
If there are any columns you won't use in your model (such as the original, unscaled consumption column), you can drop them:
data.drop(columns=['consumption'], inplace=True)
Ensure the data is ready for modeling by checking its shape and content:
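For example:

print(data.shape)   # number of rows and columns
print(data.head())  # first few rows
data.info()         # column dtypes and non-null counts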
Question 4: Take a dataset of monthly energy consumption over the past 10 years, use
NumPy to calculate the following:
a. Mean and median monthly energy consumption.
b. Standard deviation and variance of monthly energy consumption.
c. The 25th and 75th percentiles of the monthly energy consumption.
Provide the Python code you would use to perform these calculations.
import numpy as np
import pandas as pd
# Load the dataset (assuming the data is in a CSV file with a column named 'monthly_consumption')
data = pd.read_csv('monthly_energy_consumption.csv')
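The requested statistics can then be computed from the 'monthly_consumption' column with NumPy:

consumption = data['monthly_consumption'].to_numpy()

# a. Mean and median monthly energy consumption
mean_consumption = np.mean(consumption)
median_consumption = np.median(consumption)

# b. Standard deviation and variance of monthly energy consumption
std_consumption = np.std(consumption)
var_consumption = np.var(consumption)

# c. 25th and 75th percentiles of monthly energy consumption
p25, p75 = np.percentile(consumption, [25, 75])

print(f"Mean: {mean_consumption:.2f}, Median: {median_consumption:.2f}")
print(f"Std Dev: {std_consumption:.2f}, Variance: {var_consumption:.2f}")
print(f"25th percentile: {p25:.2f}, 75th percentile: {p75:.2f}")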