
Human Activity Recognition using Signal Feature Extraction and Machine Learning: A Code Explanation

This Python code implements human activity recognition using signal processing techniques and
machine learning classifiers. It processes data from a CSV file, extracts time and frequency
domain features, and trains various classifiers to predict human activities. This explanation is
structured to help students understand the code and the underlying theory.

1. Libraries and Data Loading:


import pandas as pd
import numpy as np
from scipy.signal import find_peaks
from scipy.fft import fft
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
# ... (other sklearn imports)

The code starts by importing the necessary libraries: pandas for data manipulation, numpy for
numerical operations, scipy.signal for signal processing, scipy.fft for Fourier transforms, and
various modules from sklearn for machine learning tasks. Google Drive mounting code is also
included (commented out); it is useful if your data is stored in Drive.
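If you are running the script in Google Colab, the mounting code typically looks like the sketch
below. The path shown is hypothetical; point file_path at your own CSV.

# Typical (commented-out) Google Drive mount for Colab
# from google.colab import drive
# drive.mount('/content/drive')
file_path = '/content/drive/MyDrive/activity_data.csv'  # hypothetical path to the CSV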
try:
    df = pd.read_csv(file_path)
    print("File loaded successfully!")
except FileNotFoundError:
    print(f"Error: file not found at {file_path}")  # ... (other error handling)

This block reads the data from a CSV file into a pandas DataFrame. Robust error handling is
included to catch potential issues like the file not being found.

2. Feature Extraction:
The core of the code lies in the feature extraction functions. Features are calculated from the raw
sensor data to represent the underlying activity.
2.1 Time Domain Features:
def calculate_time_features(signal):
    abs_mean = np.mean(np.abs(signal))            # mean absolute value, reused below
    rms = np.sqrt(np.mean(signal**2))             # Root Mean Square
    shape_factor = rms / abs_mean if abs_mean != 0 else 0
    peak_value = np.max(np.abs(signal))
    crest_factor = peak_value / rms if rms != 0 else 0
    sqrt_mean = np.mean(np.sqrt(np.abs(signal)))  # mean of the square roots
    clearance_factor = peak_value / sqrt_mean**2 if sqrt_mean != 0 else 0
    impulse_factor = peak_value / abs_mean if abs_mean != 0 else 0
    return (rms, shape_factor, peak_value, crest_factor,
            clearance_factor, impulse_factor)

This function calculates several time-domain features:


• RMS (Root Mean Square): Measures the effective magnitude of the signal. It's a good
indicator of the signal's energy.
• Shape Factor: The ratio of the RMS value to the mean absolute value. It characterizes the
shape of the waveform independently of its overall amplitude.
• Peak Value: The maximum absolute value of the signal.
• Crest Factor: The ratio of the peak value to the RMS value. It's sensitive to peaks and
outliers in the signal.
• Clearance Factor: Similar to the crest factor, but divides the peak value by the squared
mean of the square roots of the absolute signal values.
• Impulse Factor: Ratio of the peak value to the mean absolute value.
These features capture characteristics of the signal in the time domain.
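As a quick sanity check (illustrative only, not part of the original script), the function can be
applied to a synthetic sine wave, whose RMS is known analytically:

# Synthetic check: a unit-amplitude sine has RMS ~ 0.707 and crest factor ~ 1.414
t = np.linspace(0, 1, 100)
test_signal = np.sin(2 * np.pi * 5 * t)
rms, shape_f, peak, crest_f, clearance_f, impulse_f = calculate_time_features(test_signal)
print(f"RMS: {rms:.3f}, Crest Factor: {crest_f:.3f}")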
2.2 Frequency Domain Features:

def calculate_frequency_features(signal):
    N = len(signal)
    yf = fft(signal)                   # Fast Fourier Transform
    xf = np.linspace(0.0, fs/2, N//2)  # Frequency axis (fs: sampling frequency, defined elsewhere)

    # Locate peaks in the one-sided magnitude spectrum
    peaks, _ = find_peaks(np.abs(yf[0:N//2]), height=0,
                          distance=max(1, int(0.25*N/fs)))  # find_peaks requires distance >= 1
    peak_amplitude = np.abs(yf[peaks[0]]) if peaks.size > 0 else 0
    peak_location = xf[peaks[0]] if peaks.size > 0 else 0

    # Magnitude-weighted mean frequency
    mean_frequency = (np.sum(xf[0:N//2] * np.abs(yf[0:N//2])) / np.sum(np.abs(yf[0:N//2]))
                      if np.sum(np.abs(yf[0:N//2])) != 0 else 0)

    # Band power in the 0.5-4 Hz band (sum of squared magnitudes)
    f_low = 0.5
    f_high = 4
    band_indices = np.where((xf >= f_low) & (xf <= f_high))[0]
    band_power = np.sum(np.abs(yf[band_indices])**2)

    power_bandwidth = (xf[band_indices[-1]] - xf[band_indices[0]]
                       if band_indices.size > 1 else 0)

    return peak_amplitude, peak_location, mean_frequency, band_power, power_bandwidth

This function calculates frequency-domain features using the Fast Fourier Transform (FFT):
• FFT (Fast Fourier Transform): The FFT decomposes the signal into its constituent
frequencies. yf contains the frequency components, and xf represents the corresponding
frequencies.
• Peak Amplitude and Location: The code finds peaks in the one-sided magnitude spectrum
using find_peaks (with a minimum height and a minimum spacing between peaks) and extracts
the amplitude and frequency of the first detected peak.
• Mean Frequency: The average frequency weighted by the magnitude of the frequency
components.
• Band Power: The power within a specific frequency band (e.g., 0.5 Hz to 4 Hz). This can
be a good indicator of the energy in that particular frequency range.
• Power Bandwidth: The width of the frequency band used for the band power calculation.
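As a quick illustrative check of the frequency mapping (not part of the original script; a
sampling frequency of fs = 50 Hz is assumed here):

# Synthetic check: the strongest FFT component of a 2 Hz sine should sit near 2 Hz
fs = 50                               # sampling frequency the feature function relies on
t = np.arange(0, 2, 1/fs)             # 2 seconds of samples, N = 100
sig = np.sin(2 * np.pi * 2 * t)       # pure 2 Hz sine
N = len(sig)
spectrum = np.abs(fft(sig))[:N//2]
freqs = np.linspace(0.0, fs/2, N//2)  # same axis construction as the function above
print(f"Strongest component at {freqs[np.argmax(spectrum)]:.2f} Hz")  # ~2 Hz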

2.3 Mean Feature:


The code also calculates the mean of the signal as a simple but often informative feature:

mean_signal = np.mean(signal)

3. Data Preparation:
features = []
labels = []
for index, row in df.iterrows():
    signal = row.iloc[1:45].values  # input sample values for this row
    label = row.iloc[-1]            # activity label in the last column
    time_features = calculate_time_features(signal)
    frequency_features = calculate_frequency_features(signal)
    mean_signal = np.mean(signal)
    features.append(np.concatenate([time_features, frequency_features,
                                    [mean_signal]]))
    labels.append(label)

X = np.array(features)
y = np.array(labels)

This loop iterates through each row of the DataFrame, extracts the signal data and label,
calculates the time and frequency features, and combines them into a single feature vector using
np.concatenate. The features are stored in X, and the corresponding labels are stored in y.

4. Classifiers and K-Fold Cross-Validation:

classifiers = {  # Dictionary of classifiers
    # ... (classifier instantiations)
}

k = 5  # Number of folds
kf = KFold(n_splits=k, shuffle=True, random_state=42)  # KFold object

for name, clf in classifiers.items():
    accuracies = []
    for train_index, test_index in kf.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        scaler = StandardScaler()           # fit on the training folds only
        X_train = scaler.fit_transform(X_train)
        X_test = scaler.transform(X_test)   # apply the same scaling to the test fold
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        accuracies.append(accuracy)
    print(f"{name} - Mean Accuracy: {np.mean(accuracies):.4f} "
          f"(+/- {np.std(accuracies):.4f})")

This section trains and evaluates several classifiers using k-fold cross-validation:
• Classifiers: A dictionary stores the different classifiers to be used.
• K-Fold: KFold splits the data into k folds. The code iterates through each fold, using one
fold for testing and the remaining k-1 folds for training. shuffle=True shuffles the data
before splitting, and random_state ensures reproducibility.
• Feature Scaling: StandardScaler scales the features to have zero mean and unit variance.
This is crucial for many machine learning algorithms. It's done inside the cross-validation
loop to prevent data leakage.
• Training and Evaluation: The code trains each classifier on the training data and evaluates
its performance on the test data using accuracy as the metric. The mean and standard
deviation of the accuracy across all folds are printed.

Key Concepts and Theory:


• Feature Engineering: The process of extracting relevant features from raw data is crucial
for machine learning. The time and frequency domain features calculated in this code are
designed to capture different aspects of the signal related to human activity.
• Fast Fourier Transform (FFT): The FFT is an efficient algorithm for computing the
discrete Fourier transform (DFT). The DFT decomposes a signal into its constituent
frequencies, allowing for analysis in the frequency domain (see the formula after this list).
• K-Fold Cross-Validation: A robust technique for evaluating machine learning models. It
helps to estimate the model's performance on unseen data and reduces the risk of
overfitting.
• Feature Scaling: Many machine learning algorithms perform better when the features are
scaled to a similar range. This prevents features with larger values from dominating the
learning process.
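For reference, the DFT computed by fft maps the N signal samples x_n to N complex
coefficients X_k:

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}, \qquad k = 0, 1, \dots, N-1.$$

For a real-valued signal the coefficients are conjugate-symmetric, which is why the code above
inspects only the first N//2 bins of yf.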
Tasks for students
1. Feature Engineering Exploration:
Task: Investigate the importance of different features.
Exercises:
• Feature Ablation: Systematically remove each feature (or group of features – time,
frequency, mean) and observe the impact on classifier performance. This helps
understand which features are most discriminative.
• Feature Visualization: Plot the different features for each activity class (e.g., box plots,
histograms). This can help visualize which features are most separable and understand
their distributions.
• Feature Selection: Implement feature selection techniques (e.g., using SelectKBest or
SelectFromModel from sklearn.feature_selection) to find the optimal subset of features.
This can improve performance and reduce computational cost.
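As a starting point, a feature-selection sketch might look like the following (the value of k is
an arbitrary example; X and y come from Section 3):

# Hypothetical starter sketch for the feature-selection exercise
from sklearn.feature_selection import SelectKBest, f_classif

selector = SelectKBest(score_func=f_classif, k=6)  # k=6 chosen only for illustration
X_selected = selector.fit_transform(X, y)
print("Selected feature indices:", selector.get_support(indices=True))

In a complete experiment, fit the selector inside the cross-validation loop, just like the
scaler, so the test fold does not leak into the feature selection.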

2. Classifier Comparison and Tuning:


Task: Compare and tune different classifiers.
Exercises:
• More Classifiers: Experiment with other classifiers available in scikit-learn, such as
Support Vector Machines (SVMs), Gradient Boosting Machines (GBM), or Naive Bayes.
• Hyperparameter Tuning: Use techniques like GridSearchCV or RandomizedSearchCV
from sklearn.model_selection to find the optimal hyperparameters for each classifier.
This significantly impacts performance. Focus on understanding what each
hyperparameter controls and how it affects the model.
• Performance Metrics: Explore other relevant performance metrics beyond accuracy, such
as precision, recall, F1-score, and confusion matrices. Understand the trade-offs between
these metrics and when each is most appropriate. Use classification_report from
sklearn.metrics.
• Cross-Validation Strategies: Experiment with different cross-validation strategies, such as
StratifiedKFold (for imbalanced datasets) or LeaveOneOut cross-validation.
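One possible starting point for the tuning exercise is sketched below (Random Forest and the
grid values are illustrative only; substitute any classifier from the dictionary):

# Hypothetical starter sketch for hyperparameter tuning
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 100, 200],  # illustrative grid values
    "max_depth": [None, 5, 10],
}
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)
print("Best parameters:", grid.best_params_)
print(f"Best CV accuracy: {grid.best_score_:.4f}")

Note that GridSearchCV performs its own internal cross-validation, so it replaces the manual
KFold loop for the model being tuned.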

3. Signal Processing Deep Dive:


Task: Explore signal processing techniques in more detail.
Exercises:
• Windowing: Experiment with different windowing functions (e.g., Hamming, Hanning,
Blackman) before calculating the FFT. Understand the effects of windowing on the
frequency spectrum.
• FFT Length: Investigate the impact of different FFT lengths on the frequency resolution.
A longer FFT length provides finer frequency resolution but requires more computation.
• Band Power Refinement: Experiment with different frequency bands for band power
calculation. Research which frequency bands are most relevant for distinguishing
between the activities.
• Filtering: Try applying digital filters (e.g., bandpass, lowpass, highpass) to the raw signal
before feature extraction. This can help remove noise or isolate specific frequency
components.
• Signal Segmentation: Explore different methods for segmenting the signal into windows
for feature extraction. Consider overlapping windows or adaptive windowing techniques.
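For instance, the windowing and filtering exercises could begin with sketches like these
(fs is the sampling frequency assumed earlier; signal is one segment from the data-preparation
loop):

# Hypothetical starter sketch: Hamming window before the FFT
from scipy.signal.windows import hamming
windowed = signal * hamming(len(signal))  # taper the segment to reduce spectral leakage
spectrum = np.abs(fft(windowed))[:len(signal)//2]

# Hypothetical starter sketch: 0.5-4 Hz Butterworth bandpass filter
from scipy.signal import butter, filtfilt
b, a = butter(4, [0.5, 4], btype="bandpass", fs=fs)  # 4th-order bandpass
filtered = filtfilt(b, a, signal)                    # zero-phase filtering

Comparing features extracted from the raw, windowed, and filtered versions of the same segment
is a quick way to see how each preprocessing step changes the spectrum.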
