README

This document provides a summary of a water quality prediction dataset. The dataset contains historical water measurement data from 36 sites in Georgia, USA, including 11 indices like dissolved oxygen, temperature, and specific conductance. The goal is to forecast the next day's pH value based on the input data. The data is split into training and test sets covering different date ranges. It includes features, location IDs, input data for the indices over time, output pH values over time, and location groups that form connected water systems centered around Atlanta and the eastern coast of Georgia. The dataset is derived from the US Geological Survey and is provided in a MATLAB format file.

Water Quality Prediction Dataset

Introduction
The goal is to forecast spatio-temporal water quality, measured as the “power of hydrogen
(pH)” value, for the next day. The input is the historical data of other water measurement
indices: daily samples from 36 sites in Georgia, USA, providing measurements related to pH
values. The input features are 11 common indices, including volume of dissolved oxygen,
temperature, and specific conductance (see details in the dataset). The output to predict is
the measurement of 'pH, water, unfiltered, field, standard units (Median)'.
There are two major water systems to consider: one is centered on the city of Atlanta, while
the other is centered on the eastern coast of Georgia. This information indicates spatial
dependency among different locations, which is important to the forecast.
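
One possible way to encode this spatial dependency is a binary mask that marks two stations as related when they belong to the same water system. The Python sketch below is illustrative only: the function name `group_mask` and the toy station IDs are hypothetical, and it assumes each entry of the `location_group` variable lists station IDs that also appear in `location_ids`, which this README does not confirm.

```python
import numpy as np

def group_mask(location_ids, groups):
    """Return an (n, n) 0/1 matrix with 1 where two stations share a group.

    Assumption: each group is a sequence of station IDs found in
    location_ids. A station that appears in no group gets an all-zero
    row and column, including its diagonal entry.
    """
    ids = [int(i) for i in np.ravel(location_ids)]
    index = {station: k for k, station in enumerate(ids)}
    mask = np.zeros((len(ids), len(ids)), dtype=int)
    for group in groups:
        members = [index[int(s)] for s in np.ravel(group)]
        # Mark every pair of stations inside this group as connected.
        mask[np.ix_(members, members)] = 1
    return mask

# Toy example with made-up IDs (not taken from the dataset).
toy_ids = [101, 102, 103, 104, 105]
toy_groups = [[101, 103], [102, 104, 105]]
mask = group_mask(toy_ids, toy_groups)
```

Such a mask could, for example, restrict a model so that a station's forecast only draws on stations in its own water system. Whether that matches the constraint used in the cited paper is not stated here.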

Processed Data
Download link: [Dataset]

Data format: *.mat (use Matlab to open)
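
For readers without MATLAB, the file can probably also be read from Python. The sketch below assumes SciPy is installed and that the file is saved in a MAT format that `scipy.io.loadmat` supports (it does not read v7.3 files). It also assumes that `X_tr` and `X_te` are cell arrays of matrices and that `Y_tr` and `Y_te` are plain numeric matrices. The helper names `cells_to_array` and `load_water_quality` and the `path` argument are hypothetical.

```python
import numpy as np

def cells_to_array(cell):
    """Stack a MATLAB cell array of equally sized matrices into one ndarray.

    scipy.io.loadmat returns a cell array as a NumPy array with
    dtype=object, so a 1*T cell of S*F matrices becomes shape (T, S, F).
    """
    return np.stack([np.asarray(m, dtype=float) for m in np.ravel(cell)])

def load_water_quality(path):
    """Load the .mat file at `path` (hypothetical name) into NumPy arrays."""
    from scipy.io import loadmat  # imported lazily; requires SciPy
    raw = loadmat(path)
    return {
        "X_tr": cells_to_array(raw["X_tr"]),
        "X_te": cells_to_array(raw["X_te"]),
        # Assumption: the outputs are stored as numeric matrices.
        "Y_tr": np.asarray(raw["Y_tr"], dtype=float),
        "Y_te": np.asarray(raw["Y_te"], dtype=float),
        "location_ids": np.ravel(raw["location_ids"]),
    }

# Synthetic stand-in for a 1*3 cell array of 4*2 matrices.
cell = np.empty((1, 3), dtype=object)
for t in range(3):
    cell[0, t] = np.full((4, 2), t)
stacked = cells_to_array(cell)  # shape (3, 4, 2)
```

With the real file, `load_water_quality("<path to the .mat file>")["X_tr"]` would be expected to have shape (423, 37, 11) if the assumptions above hold.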

Data description:
Variable Name   Type               Size    Description
features        array of strings   1*11    a list of water indices to measure
location_ids    array of integers  37*1    IDs of the water stations
X_te            array of matrices  1*282   test set input data: water indices for 282 contiguous
                                           dates until 2018-01-01; each element is a 37*11 matrix:
                                           37 spatial locations by 11 features
X_tr            array of matrices  1*423   training set input data: water indices for 423
                                           contiguous dates from 2016-01-28; each element is a
                                           37*11 matrix: 37 spatial locations by 11 features
Y_te            array of matrices  37*282  test set output data: water quality for 37 locations in
                                           282 contiguous dates until 2018-01-01
Y_tr            array of matrices  37*423  training set output data: water quality for 37 locations
                                           in 423 contiguous dates from 2016-01-28
location_group  array of cells     1*3     the groups of water stations, each group forms a
                                           connected spatial network (i.e., water system)
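
To make the shapes above concrete, here is a minimal baseline sketch in Python. It is not the method of the cited paper: it fits a single ordinary least-squares model shared by all stations. It assumes the inputs have been stacked into an array of shape (dates, locations, features), that the outputs have shape (locations, dates), and that column t of the output is the pH value to be predicted from the inputs at date t. The function names and the synthetic data are hypothetical, and missing values, if the real data contains any, would have to be handled before fitting.

```python
import numpy as np

def _design(X):
    """Flatten (T, S, F) inputs to (T*S, F + 1) with an intercept column."""
    T, S, F = X.shape
    return np.hstack([X.reshape(T * S, F), np.ones((T * S, 1))])

def fit_linear_baseline(X, Y):
    """Fit one least-squares model shared by all stations.

    X: (T, S, F) inputs, Y: (S, T) targets. Assumes Y[:, t] is the value
    to predict from X[t]. Returns F + 1 weights (last one is the intercept).
    """
    target = Y.T.reshape(-1)  # row t*S + s holds Y[s, t], matching _design
    weights, *_ = np.linalg.lstsq(_design(X), target, rcond=None)
    return weights

def predict(X, weights):
    """Return predictions with shape (S, T), the same layout as Y."""
    T, S, _ = X.shape
    return (_design(X) @ weights).reshape(T, S).T

def rmse(Y_true, Y_pred):
    """Root mean squared error over all stations and dates."""
    return float(np.sqrt(np.mean((Y_true - Y_pred) ** 2)))

# Synthetic check: 20 dates, 37 locations, 11 features, exact linear target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 37, 11))
true_w = rng.normal(size=11)
Y_demo = (X_demo @ true_w + 7.0).T  # shape (37, 20)
weights = fit_linear_baseline(X_demo, Y_demo)
error = rmse(Y_demo, predict(X_demo, weights))
```

On the real data one would fit on the training arrays and report `rmse` on the test arrays. A shared linear model ignores the spatial structure in `location_group`, so it serves only as a reference point.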

Data Source
This dataset is arranged and partly derived from the United States Geological Survey: [External
Link]

Citation
To use these datasets, please cite the papers:
Liang Zhao, Olga Gkountouna, and Dieter Pfoser. 2019. Spatial Auto-regressive Dependency
Interpretable Learning Based on Spatial Topological Constraints. ACM Trans. Spatial Algorithms
Syst. 5, 3, Article 19 (August 2019), 28 pages. DOI: https://doi.org/10.1145/3339823
