
Assignment 2

The document discusses the differences between traditional programming and machine learning in energy analytics, highlighting that traditional programming relies on explicit rules while machine learning learns from data patterns. It outlines a project framework for forecasting energy demand, detailing steps such as representation, data collection, preparation, model selection, training, evaluation, and prediction. Additionally, it provides a guide for preparing hourly energy consumption data using Pandas and calculating statistical metrics using NumPy.

Uploaded by

chandanjat18
Copyright
© All Rights Reserved

Assignment- 1 (Unit 2)

Date of issue: Last date of submission:

Question 1: Explain the differences between traditional programming and machine learning
in the context of energy analytics. Provide an example of how each approach can be used to
solve a specific problem related to energy consumption prediction.
Traditional programming and machine learning (ML) approach problems in
fundamentally different ways, especially in the context of energy analytics.

Traditional Programming

Approach:
In traditional programming, a developer explicitly writes rules and algorithms to process data
and produce outputs. This method relies heavily on predefined logic and heuristics, meaning
the programmer must have a deep understanding of the problem domain.

Example in Energy Analytics:


For predicting energy consumption, a traditional programming approach might involve
creating a detailed algorithm that considers various factors such as time of day, temperature,
and historical consumption patterns. For instance, you might create a rule-based model that
states:

- If it's daytime, increase the prediction by a fixed percentage based on the historical average
for that time.
- If the temperature is above a certain threshold, adjust the prediction upwards because
cooling systems are likely to be in use.

This model would require continual tweaking and updating based on new data or changing
conditions, as it is rigid and relies on fixed rules.
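The two rules above can be sketched as a small function. This is only an illustration: the daytime window, the temperature threshold, and both percentages are assumed values chosen for the example, and `predict_consumption` is a hypothetical name.

```python
# Hypothetical rule-based predictor. All thresholds and percentages below
# are illustrative assumptions, not values taken from the assignment.
DAYTIME_HOURS = range(8, 20)   # assumed definition of "daytime"
DAYTIME_UPLIFT = 0.10          # assumed +10% during the day
HOT_THRESHOLD_C = 28.0         # assumed temperature threshold for cooling
HOT_UPLIFT = 0.15              # assumed +15% when it is hot


def predict_consumption(historical_avg_kwh, hour, temperature_c):
    """Apply fixed, hand-written rules to a historical average."""
    prediction = historical_avg_kwh
    if hour in DAYTIME_HOURS:
        prediction *= 1 + DAYTIME_UPLIFT
    if temperature_c > HOT_THRESHOLD_C:
        prediction *= 1 + HOT_UPLIFT
    return prediction


print(predict_consumption(100.0, hour=14, temperature_c=31.0))  # both rules fire
print(predict_consumption(100.0, hour=2, temperature_c=18.0))   # no rule fires
```

Every constant here has to be chosen and maintained by hand, which is the rigidity the paragraph above describes.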

Machine Learning

Approach:
Machine learning, on the other hand, uses data-driven techniques to learn patterns from
historical data without explicit programming for every rule. Instead, the model is trained on a
dataset, which allows it to identify relationships and make predictions based on new, unseen
data.

Example in Energy Analytics:


For energy consumption prediction using ML, you could employ a regression model or a
time-series forecasting technique. Here’s how it might work:

1. Data Collection: Gather historical energy consumption data along with various influencing
factors (e.g., temperature, occupancy, time of year).
2. Model Training: Use this historical data to train a machine learning model (e.g., a neural
network, random forest, or gradient boosting).
3. Prediction: Once trained, the model can predict future energy consumption based on new
inputs (e.g., upcoming weather forecasts, day of the week).

For example, you might find that the ML model can identify complex interactions between
multiple factors that are not easily captured by a traditional rule-based system, leading to
more accurate predictions.
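As a minimal sketch of the collect, train, predict sequence, the example below fits a linear regression with NumPy least squares. It uses synthetic data generated from an assumed linear relationship, so the feature names, coefficients, and noise level are all illustrative. A plain linear model is used here for brevity; it stands in for the neural network, random forest, or gradient boosting models mentioned above.

```python
import numpy as np

# Synthetic data for illustration only: consumption is generated from an
# assumed linear relationship plus noise, so we know what should be learned.
rng = np.random.default_rng(0)
n = 500
temperature = rng.uniform(10, 35, size=n)   # degrees Celsius
occupancy = rng.uniform(0, 1, size=n)       # fraction of the building in use
consumption = (50 + 2.0 * temperature + 30.0 * occupancy
               + rng.normal(0, 1.0, size=n))

# Training: the coefficients are estimated from the data instead of being
# written by hand. The first column of ones models the intercept.
X = np.column_stack([np.ones(n), temperature, occupancy])
coef, *_ = np.linalg.lstsq(X, consumption, rcond=None)

# Prediction for a new, unseen input (30 degrees Celsius, 80% occupancy).
new_input = np.array([1.0, 30.0, 0.8])
predicted = float(new_input @ coef)

print("learned coefficients:", np.round(coef, 2))
print("predicted consumption:", round(predicted, 1))
```

The learned coefficients should land close to the values used to generate the data, without any of them being coded as explicit rules.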

Summary

Traditional Programming: Explicitly defined rules and heuristics; suitable for well-
understood problems but can be inflexible and require constant updates.
Example: Rule-based algorithm adjusting energy predictions based on time and temperature.

Machine Learning: Data-driven, learns from patterns; more adaptive and can handle complex
relationships in data.
Example: Regression model predicting energy consumption based on historical data and
various features like weather and occupancy.

Both approaches have their place, but ML often offers more flexibility and accuracy in
dynamic fields like energy analytics.
Question 2: Describe how each element (representation, data collection, data preparation,
model selection, model training, model evaluation, and prediction) would be implemented in
a project aimed at forecasting energy demand for a city.

1. Representation

Implementation:

- Feature Selection: Identify relevant features that may influence energy demand. This could
include:
  - Historical energy consumption data (hourly/daily)
  - Weather data (temperature, humidity, precipitation)
  - Time features (hour of the day, day of the week, holidays)
  - Demographic data (population density, economic indicators)
  - Events (local festivals, sports events)
- Target Variable: Define the target variable as the total energy consumption for the city,
aggregated by the desired time interval (e.g., hourly, daily).

2. Data Collection

Implementation:

- Sources: Gather data from various sources:
  - Energy utility companies for historical consumption data.
  - Meteorological departments for weather data.
  - Local government databases for demographic and event data.
- APIs and Databases: Utilize APIs to automate data retrieval (e.g., weather APIs) and
maintain a database for storage.

3. Data Preparation

Implementation:

- Cleaning: Impute or drop missing values, treat outliers, and remove duplicate records.
Ensure data is consistent and formatted correctly.
- Transformation: Convert categorical variables (e.g., day of the week) into numerical
format using one-hot encoding.
- Feature Engineering: Create additional features that may enhance model performance,
such as lagged consumption values (previous day's demand) or rolling averages.
- Normalization/Scaling: Normalize or scale numerical features to improve model
performance, particularly for algorithms sensitive to feature scales.
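The transformation and feature engineering steps above can be sketched with Pandas as follows. The data is a small synthetic hourly series, and the column names (`demand`, `lag_24h`, `rolling_mean_24h`, `dow_*`) are hypothetical choices for this example.

```python
import numpy as np
import pandas as pd

# Small synthetic hourly series (illustrative values only).
idx = pd.date_range("2024-01-01", periods=72, freq="h")
df = pd.DataFrame({"demand": np.arange(72, dtype=float)}, index=idx)

# Feature engineering: demand 24 hours earlier and a 24-hour rolling mean.
# The rolling mean is computed on shifted data so it only uses past values.
df["lag_24h"] = df["demand"].shift(24)
df["rolling_mean_24h"] = df["demand"].shift(1).rolling(window=24).mean()

# Transformation: one-hot encode the day of the week (Monday=0).
df["day_of_week"] = df.index.dayofweek
df = pd.get_dummies(df, columns=["day_of_week"], prefix="dow")

# The first 24 rows have no history yet, so drop them.
df = df.dropna()
print(df.head())
```

Shifting before taking the rolling mean is a design choice that keeps the current hour's demand out of its own feature, which would otherwise leak the target into the inputs.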

4. Model Selection

Implementation:

- Algorithm Choice: Evaluate various algorithms suitable for time series forecasting and
regression tasks, such as:
  - Linear Regression
  - Decision Trees or Random Forests
  - Gradient Boosting Machines (GBM)
  - Recurrent Neural Networks (RNN) for time series data
- Framework: Choose appropriate machine learning frameworks (e.g., scikit-learn,
TensorFlow, or PyTorch) based on the selected algorithms.

5. Model Training

Implementation:

- Training and Validation Split: Divide the dataset into training and validation sets
(e.g., 80% training, 20% validation) to assess model performance.
- Hyperparameter Tuning: Use techniques like Grid Search or Random Search to
optimize hyperparameters for the selected model.
- Training Process: Fit the model to the training data, allowing it to learn the
relationships between features and energy demand.
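A minimal sketch of the 80/20 split is shown below. It assumes the rows are already sorted by time and splits them chronologically, so that the validation set lies entirely after the training set. The chronological ordering is an assumption added here because the task is forecasting; the text above only specifies the proportions. `chronological_split` is a hypothetical helper name.

```python
import numpy as np


def chronological_split(features, target, train_fraction=0.8):
    """Split time-ordered arrays so validation data follows training data."""
    split_at = int(len(target) * train_fraction)
    return (features[:split_at], features[split_at:],
            target[:split_at], target[split_at:])


# Illustrative stand-in data: 100 time-ordered samples with 3 features each.
X = np.arange(300, dtype=float).reshape(100, 3)
y = np.arange(100, dtype=float)

X_train, X_val, y_train, y_val = chronological_split(X, y)
print(X_train.shape, X_val.shape)  # (80, 3) (20, 3)
```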

6. Model Evaluation

Implementation:

- Metrics Selection: Choose appropriate evaluation metrics based on the project goals,
such as:
  - Mean Absolute Error (MAE)
  - Root Mean Squared Error (RMSE)
  - Mean Absolute Percentage Error (MAPE)
- Validation: Evaluate model performance on the validation set, and check for
overfitting by comparing it with performance on the training set.
- Cross-Validation: Optionally, employ k-fold cross-validation for a more robust
assessment of model performance.

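The three metrics listed above can be computed directly with NumPy. The sketch below uses made-up actual and predicted values; the function names are chosen for this example.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors, in the target's units."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more than MAE."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent. Undefined if y_true has zeros."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


# Made-up demand values for illustration.
y_true = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 190.0, 380.0])

print(f"MAE:  {mae(y_true, y_pred):.2f}")    # 13.33
print(f"RMSE: {rmse(y_true, y_pred):.2f}")   # 14.14
print(f"MAPE: {mape(y_true, y_pred):.2f}%")  # 6.67%
```
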
7. Prediction

Implementation:

- Future Input Data: Collect and prepare future input data (e.g., weather forecasts,
upcoming events) for prediction.
- Model Deployment: Deploy the trained model in a production environment (e.g.,
using cloud platforms) to allow for real-time predictions.
- Real-time Prediction: Implement a system that regularly fetches new data, updates
the input features, and generates energy demand forecasts at specified intervals (e.g.,
hourly, daily).
- Reporting and Visualization: Create dashboards or reports to visualize the predicted
energy demand, enabling stakeholders to make informed decisions based on the
forecasts.

Question 3: For hourly energy consumption data for a year, describe the steps for preparing
this data for a machine learning model using Pandas. Include how you would handle missing
values, normalize the data, and create new features such as day of the week or hour of the
day.

1. Load the Data

import pandas as pd

# Load the data
data = pd.read_csv('hourly_energy_consumption.csv')

2. Inspect the Data

# Inspect the first few rows
print(data.head())

# Check for missing values and data types
data.info()  # info() prints its report directly and returns None
print(data.isnull().sum())

3. Handle Missing Values

Approach:

You can fill missing values using various strategies, depending on the nature of the data.
Common approaches include forward filling, backward filling, or using interpolation.

# Forward fill to handle missing values
# (fillna(method='ffill') is deprecated in recent pandas versions)
data['consumption'] = data['consumption'].ffill()

# Alternatively, you could use interpolation
# data['consumption'] = data['consumption'].interpolate()
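To see how the two strategies differ, the sketch below applies both to a tiny made-up series with a two-hour gap.

```python
import numpy as np
import pandas as pd

# Made-up readings with two consecutive missing hours.
s = pd.Series([10.0, np.nan, np.nan, 40.0])

forward_filled = s.ffill()       # repeats the last known value
interpolated = s.interpolate()   # linear interpolation between known values

print(forward_filled.tolist())   # [10.0, 10.0, 10.0, 40.0]
print(interpolated.tolist())     # [10.0, 20.0, 30.0, 40.0]
```

Forward fill holds the last reading flat across the gap, while linear interpolation ramps between the readings on either side. Which is more appropriate depends on how the consumption typically behaves over the length of the gap.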

4. Convert Date/Time Column

Ensure your date/time column is in the correct datetime format. If your dataset includes a
timestamp:

# Convert the 'timestamp' column to datetime
data['timestamp'] = pd.to_datetime(data['timestamp'])

5. Set the Index (Optional)

Setting the timestamp as the index can be useful for time series analysis.

# Set timestamp as the index
data.set_index('timestamp', inplace=True)

6. Create New Features

You can extract useful features from the datetime index:

# Create new features
data['hour'] = data.index.hour
data['day_of_week'] = data.index.dayofweek  # Monday=0, Sunday=6
data['month'] = data.index.month
data['year'] = data.index.year
data['is_weekend'] = (data['day_of_week'] >= 5).astype(int)  # 1 if weekend, 0 if weekday

7. Normalize the Data

Normalization helps in scaling the data to a standard range, which is particularly useful for
algorithms sensitive to feature scales.

from sklearn.preprocessing import MinMaxScaler

# Initialize the scaler
scaler = MinMaxScaler()

# Normalize the consumption data
data['consumption_normalized'] = scaler.fit_transform(data[['consumption']])
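With its default settings, `MinMaxScaler` maps each value x to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. The sketch below reproduces that formula with NumPy on made-up values; `min_max_scale` is a hypothetical helper written for this example.

```python
import numpy as np


def min_max_scale(values):
    """Scale values to [0, 1] using (x - min) / (max - min).

    Assumes the values are not all identical (max > min).
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)


# Made-up consumption readings.
scaled = min_max_scale([200.0, 250.0, 400.0])
print(scaled)  # [0.   0.25 1.  ]
```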

8. Drop Unnecessary Columns

If there are any columns you won't use in your model (like the original consumption), you
can drop them:

data.drop(columns=['consumption'], inplace=True)

9. Final Data Preparation

Ensure the data is ready for modeling by checking its shape and content:

# Check the final shape and head of the prepared data
print(data.shape)
print(data.head())

Question 4: Take a dataset of monthly energy consumption over the past 10 years, use
NumPy to calculate the following:
a. Mean and median monthly energy consumption.
b. Standard deviation and variance of monthly energy consumption.
c. The 25th and 75th percentiles of the monthly energy consumption.
Provide the Python code you would use to perform these calculations.
import numpy as np
import pandas as pd

# Load the dataset (assuming the data is in a CSV file with a column named
# 'monthly_consumption')
data = pd.read_csv('monthly_energy_consumption.csv')

# Extract the monthly consumption values into a NumPy array
monthly_consumption = data['monthly_consumption'].values

# a. Mean and median monthly energy consumption
mean_consumption = np.mean(monthly_consumption)
median_consumption = np.median(monthly_consumption)

print(f'Mean Monthly Energy Consumption: {mean_consumption}')
print(f'Median Monthly Energy Consumption: {median_consumption}')

# b. Standard deviation and variance of monthly energy consumption
std_deviation = np.std(monthly_consumption)
variance = np.var(monthly_consumption)

print(f'Standard Deviation of Monthly Energy Consumption: {std_deviation}')
print(f'Variance of Monthly Energy Consumption: {variance}')

# c. The 25th and 75th percentiles of the monthly energy consumption
percentile_25 = np.percentile(monthly_consumption, 25)
percentile_75 = np.percentile(monthly_consumption, 75)

print(f'25th Percentile of Monthly Energy Consumption: {percentile_25}')
print(f'75th Percentile of Monthly Energy Consumption: {percentile_75}')
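One detail worth knowing when reading these results: `np.std` and `np.var` default to `ddof=0`, which gives the population standard deviation and variance. If the 120 monthly values are treated as a sample, `ddof=1` gives the sample versions. The sketch below shows the difference on a small made-up array, and also derives the interquartile range from the two percentiles.

```python
import numpy as np

# Made-up values chosen so the population statistics come out as round numbers.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = np.std(values)       # ddof=0 (default): divides by n
sample_std = np.std(values, ddof=1)   # ddof=1: divides by n - 1

# Both percentiles in one call, plus the interquartile range.
q25, q75 = np.percentile(values, [25, 75])
iqr = q75 - q25

print(f'Population std: {population_std}')   # 2.0
print(f'Sample std:     {sample_std:.3f}')   # 2.138
print(f'25th: {q25}, 75th: {q75}, IQR: {iqr}')  # 4.0, 5.5, 1.5
```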
