Human Activity Recognition
This Python code implements human activity recognition using signal processing techniques and
machine learning classifiers. It processes data from a CSV file, extracts time and frequency
domain features, and trains various classifiers to predict human activities. This explanation is
structured to help students understand the code and the underlying theory.
The code starts by importing necessary libraries: pandas for data manipulation, numpy for
numerical operations, scipy.signal for signal processing, scipy.fft for Fourier transforms, and
various modules from sklearn for machine learning tasks. The Google Drive mounting code is
included (commented out), useful if your data is stored there.
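The full import header is not reproduced in this document; a plausible version matching the description above (the exact set of names is an assumption) would be:

```python
import pandas as pd                          # data manipulation
import numpy as np                           # numerical operations
from scipy.signal import find_peaks          # peak detection in spectra
from scipy.fft import fft                    # Fourier transforms
from sklearn.model_selection import KFold    # cross-validation splits
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Google Drive mounting (Colab only), kept commented out as in the original:
# from google.colab import drive
# drive.mount('/content/drive')
```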
1. Data Loading:
try:
    df = pd.read_csv(file_path)
    print("File loaded successfully!")
except FileNotFoundError:
    print(f"Error: file not found at {file_path}")
This block reads the data from a CSV file into a pandas DataFrame. Robust error handling is
included to catch potential issues like the file not being found.
2. Feature Extraction:
The core of the code lies in the feature extraction functions. Features are calculated from the raw
sensor data to represent the underlying activity.
2.1 Time Domain Features:
def calculate_time_features(signal):
    abs_mean = np.mean(np.abs(signal))
    rms = np.sqrt(np.mean(signal**2))                       # Root Mean Square
    shape_factor = rms / abs_mean if abs_mean != 0 else 0
    peak_value = np.max(np.abs(signal))
    crest_factor = peak_value / rms if rms != 0 else 0
    sqrt_mean = np.mean(np.sqrt(np.abs(signal)))
    clearance_factor = peak_value / sqrt_mean**2 if sqrt_mean != 0 else 0
    impulse_factor = peak_value / abs_mean if abs_mean != 0 else 0
    return (rms, shape_factor, peak_value, crest_factor,
            clearance_factor, impulse_factor)
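As a quick sanity check on these definitions (this example is not part of the original code), the crest factor of a ±1 square wave is exactly 1, since its peak equals its RMS, while a sine wave gives √2 ≈ 1.414:

```python
import numpy as np

def crest_factor(signal):
    # peak / RMS, guarding the zero-signal case as in the function above
    rms = np.sqrt(np.mean(signal**2))
    return np.max(np.abs(signal)) / rms if rms != 0 else 0

square = np.tile([1.0, -1.0], 500)                  # +/-1 square wave
sine = np.sin(2 * np.pi * np.arange(1000) / 100)    # 10 full sine cycles

print(crest_factor(square))   # 1.0
print(crest_factor(sine))     # ~1.414 (sqrt(2))
```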
def calculate_frequency_features(signal):
    N = len(signal)
    yf = fft(signal)                       # Fast Fourier Transform
    xf = np.linspace(0.0, fs/2, N//2)      # Frequency axis (fs: sampling rate)
    mag = np.abs(yf[:N//2])                # One-sided magnitude spectrum
    # Band power in the 0.5-4 Hz band
    f_low = 0.5
    f_high = 4
    band_indices = np.where((xf >= f_low) & (xf <= f_high))[0]
    band_power = np.sum(mag[band_indices]**2)   # Sum of squared magnitudes
This function calculates frequency-domain features using the Fast Fourier Transform (FFT):
• FFT (Fast Fourier Transform): The FFT decomposes the signal into its constituent
frequencies. yf contains the frequency components, and xf represents the corresponding
frequencies.
• Peak Amplitude and Location: The code finds the peaks in the frequency spectrum using
scipy.signal's find_peaks and extracts the amplitude and frequency of the dominant peak.
• Mean Frequency: The average frequency weighted by the magnitude of the frequency
components.
• Band Power: The power within a specific frequency band (e.g., 0.5 Hz to 4 Hz). This can
be a good indicator of the energy in that particular frequency range.
• Power Bandwidth: The width of the frequency band used for band power calculation.
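The peak and mean-frequency calculations described above do not appear in the code fragment; a minimal sketch, assuming the one-sided spectrum convention used above (the sampling rate fs = 50 and the 2 Hz test tone are made up for illustration), might look like:

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.fft import fft

fs = 50                                                 # assumed sampling rate in Hz
signal = np.sin(2 * np.pi * 2 * np.arange(100) / fs)    # synthetic 2 Hz tone
N = len(signal)
mag = np.abs(fft(signal))[:N//2]                        # one-sided magnitude spectrum
xf = np.linspace(0.0, fs/2, N//2)                       # frequency axis

# Dominant spectral peak: amplitude and location
peaks, _ = find_peaks(mag)
dominant = peaks[np.argmax(mag[peaks])] if len(peaks) else int(np.argmax(mag))
peak_amplitude, peak_frequency = mag[dominant], xf[dominant]

# Mean frequency: average frequency weighted by spectral magnitude
mean_frequency = np.sum(xf * mag) / np.sum(mag)
```

For the 2 Hz tone, the dominant peak lands on the bin closest to 2 Hz, and the magnitude-weighted mean frequency is nearly identical, since almost all the energy sits in that one bin.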
In addition to these, the mean of the raw signal is kept as a simple amplitude feature:
mean_signal = np.mean(signal)
3. Data Preparation:
features = []
labels = []
for index, row in df.iterrows():
    signal = row.iloc[1:45].values          # Sensor samples for this row
    label = row.iloc[-1]                    # Activity label (last column)
    time_features = calculate_time_features(signal)
    frequency_features = calculate_frequency_features(signal)
    mean_signal = np.mean(signal)
    features.append(np.concatenate([time_features, frequency_features,
                                    [mean_signal]]))
    labels.append(label)

X = np.array(features)
y = np.array(labels)
This loop iterates through each row of the DataFrame, extracts the signal data and label,
calculates the time and frequency features, and combines them into a single feature vector using
np.concatenate. The features are stored in X, and the corresponding labels are stored in y.
4. Model Training and Evaluation:
This section trains and evaluates several classifiers using k-fold cross-validation:
• Classifiers: A dictionary stores the different classifiers to be used.
• K-Fold: KFold splits the data into k folds. The code iterates through each fold, using one
fold for testing and the remaining k-1 folds for training. shuffle=True shuffles the data
before splitting, and random_state ensures reproducibility.
• Feature Scaling: StandardScaler scales the features to have zero mean and unit variance.
This is crucial for many machine learning algorithms. It's done inside the cross-validation
loop to prevent data leakage.
• Training and Evaluation: The code trains each classifier on the training data and evaluates
its performance on the test data using accuracy as the metric. The mean and standard
deviation of the accuracy across all folds are printed.
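The training loop itself is not reproduced here; a minimal sketch of the scheme just described, using random stand-ins for the X and y built earlier and an illustrative pair of classifiers (the actual dictionary of models is not shown in the source), could be:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Toy stand-ins for the feature matrix X and label vector y built above
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 12))
y = rng.integers(0, 3, size=60)

classifiers = {
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=42),
}

kf = KFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, clf in classifiers.items():
    fold_acc = []
    for train_idx, test_idx in kf.split(X):
        # Fit the scaler on the training fold only, to avoid data leakage
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X[train_idx])
        X_test = scaler.transform(X[test_idx])
        clf.fit(X_train, y[train_idx])
        fold_acc.append(accuracy_score(y[test_idx], clf.predict(X_test)))
    results[name] = (np.mean(fold_acc), np.std(fold_acc))
    print(f"{name}: {results[name][0]:.3f} +/- {results[name][1]:.3f}")
```

Note that the StandardScaler is created and fitted inside the fold loop, so the test fold never influences the scaling parameters — this is the leakage-prevention point made above.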