During my summer internship, I tried to read station observation data from the Atmospheric Science Research and Application Databank (ASRAD; 大氣科學研究與應用資料庫, also known as 大氣水文資料庫). The data turned out to be difficult to load, so I fixed some bugs in the code I was using and published it.
With this reader, you can:
- Quickly load ASRAD data using multiple threads
- Get the data in a format suited to time-series analysis
Since the reader's `DataSet` class inherits from `pandas.DataFrame`, you can use all DataFrame methods on the loaded data.
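Inheritance works this way because pandas builds new frames through the `_constructor` property. The sketch below assumes `DataSet` follows the standard pandas subclassing pattern; `MiniDataSet` is a hypothetical stand-in, not the reader's actual class. It also shows the kind of time-series call the data supports, assuming an hourly `DatetimeIndex`.

```python
import pandas as pd


class MiniDataSet(pd.DataFrame):
    """Hypothetical stand-in for a pandas.DataFrame subclass like DataSet."""

    @property
    def _constructor(self):
        # pandas calls this when an operation returns a new frame,
        # so results such as head() or row filters stay MiniDataSet.
        return MiniDataSet


# Two days of made-up hourly temperatures (assumption: hourly DatetimeIndex).
index = pd.date_range("2002-10-01", periods=48, freq=pd.Timedelta(hours=1))
data = MiniDataSet({"TX01": [20.0 + (i % 24) * 0.5 for i in range(48)]}, index=index)

subset = data.head(3)                          # inherited DataFrame method
daily_max = data["TX01"].resample("D").max()   # time-series resampling

print(type(subset).__name__)  # MiniDataSet
print(daily_max)
```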
Requirements:
- NumPy 1.26.2 or higher
- pandas 2.1.4 or higher
- Python 3.12.8 or higher
To set up the reader:
- Download `ASRAD_reader.py` and put it in the same folder as your `main.py`.
- Import the reader like any other module:
```python
from ASRAD_reader import NanMode
import ASRAD_reader as Reader
```
Use the example below to read a single file:
```python
file_path = "datas/20029999_cwb_hr/20021099.cwb_hr.txt"
df = Reader.DataSet.read_file(file_path)
```
The `mode` parameter is an enumeration, `NanMode`, that defines which values are treated as NaN (Not a Number).
There are four modes: `ObsEmpty`, `AllEmpty`, `NotInObs`, and `AllValue`. The default for `read_file` is `NotInObs`.
```python
df = Reader.DataSet.read_file(file_path, mode=NanMode.AllValue)
```
The `drop_nan` parameter: if set to `True`, rows containing NaN values are removed from the DataFrame. Defaults to `False`.
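In plain pandas terms, `drop_nan=True` presumably corresponds to `DataFrame.dropna()` with its default `how="any"`, which removes every row holding at least one NaN. A minimal illustration with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up hourly readings; two of the three rows each have one missing value.
frame = pd.DataFrame(
    {"TX01": [25.1, np.nan, 24.3], "RH01": [80.0, 75.0, np.nan]},
    index=pd.date_range("2002-10-01", periods=3, freq=pd.Timedelta(hours=1)),
)

# Drop every row that contains at least one NaN (how="any" is the default).
cleaned = frame.dropna()
print(cleaned)  # only the 2002-10-01 00:00 row remains
```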
```python
df = Reader.DataSet.read_file(file_path, drop_nan=True)
```
The `is_utf8` parameter specifies whether the file is encoded in UTF-8 (`True`) or Big5 (`False`).
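The encoding flag matters because Big5 bytes for Chinese text are generally not valid UTF-8, so decoding with the wrong codec fails. A minimal sketch of the idea, where `read_text_table` and the `stno` column are hypothetical and whitespace-separated columns are an assumption about the file layout:

```python
import io

import pandas as pd


def read_text_table(raw: bytes, is_utf8: bool) -> pd.DataFrame:
    """Hypothetical helper: decode raw bytes, then parse a whitespace table."""
    # Mirror the documented switch: True -> UTF-8, False -> Big5.
    encoding = "utf-8" if is_utf8 else "big5"
    text = raw.decode(encoding)
    # Assumption: columns are separated by runs of whitespace.
    return pd.read_csv(io.StringIO(text), sep=r"\s+")


# A tiny made-up table saved as Big5; "stno" is a hypothetical column name.
sample = "stno 測站 TX01\n467490 臺北 25.1\n".encode("big5")

table = read_text_table(sample, is_utf8=False)
print(table)

try:
    read_text_table(sample, is_utf8=True)
except UnicodeDecodeError as err:
    print("wrong codec:", err.reason)
```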
```python
df = Reader.DataSet.read_file(file_path, is_utf8=True)
```
The modes and the special values they cover:
| Mode | Description |
|---|---|
| `ObsEmpty` | No observation, for any reason. |
| `AllEmpty` | All special values except trace. |
| `NotInObs` | No data because the element is not observed. |
| `AllValue` | All special-value cases. |
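Conceptually, each mode selects a set of special codes to turn into NaN. The sketch below shows that mechanism with an `Enum` and `DataFrame.mask`; `SketchNanMode` and every numeric code in it are placeholders invented for illustration, not the real ASRAD special values, and the nesting of codes only loosely follows the table's wording.

```python
from enum import Enum

import pandas as pd


class SketchNanMode(Enum):
    """Hypothetical modes; the numeric codes are PLACEHOLDERS, not ASRAD's."""

    NOT_IN_OBS = (-901,)
    OBS_EMPTY = (-901, -902)
    ALL_EMPTY = (-901, -902, -903)
    ALL_VALUE = (-901, -902, -903, -904)


def mask_special_values(frame: pd.DataFrame, mode: SketchNanMode) -> pd.DataFrame:
    # Cells whose value is one of the mode's codes become NaN; others are kept.
    return frame.mask(frame.isin(list(mode.value)))


raw = pd.DataFrame({"TX01": [25.1, -901.0, -904.0], "RH01": [80.0, -902.0, 70.0]})

print(mask_special_values(raw, SketchNanMode.NOT_IN_OBS))
print(mask_special_values(raw, SketchNanMode.ALL_VALUE))
```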
To read all files in a folder, use `read_folder`:
```python
folder_path = "datas/"
df = Reader.DataSet.read_folder(folder_path)
```
Other parameters of `read_folder`:
- `mode`: optional, default `NanMode.AllEmpty`.
- `max_threads`: optional, default `4`.
- `drop_nan`: optional, default `True`.
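Reading many files is largely I/O-bound, which is the usual reason to spread the work over a thread pool. A minimal sketch of that pattern with `concurrent.futures.ThreadPoolExecutor`, assuming (not confirmed here) that `read_folder` parses each file independently and concatenates the results; `read_folder_sketch`, `read_one`, and the tiny CSV files are hypothetical:

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import pandas as pd


def read_one(path: Path) -> pd.DataFrame:
    # Hypothetical stand-in for a single-file parser such as read_file.
    return pd.read_csv(path)


def read_folder_sketch(folder: str, max_threads: int = 4) -> pd.DataFrame:
    # Search recursively, since the example layout nests files in subfolders.
    paths = sorted(Path(folder).rglob("*.txt"))
    # pool.map runs read_one concurrently but returns results in input order.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        frames = list(pool.map(read_one, paths))
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp, "20029999_cwb_hr")
    sub.mkdir()
    for month in (10, 11):
        (sub / f"2002{month}99.cwb_hr.txt").write_text(f"month,TX01\n{month},25.0\n")
    combined = read_folder_sketch(tmp, max_threads=4)

print(combined)
```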
To read only specific columns, use `read_folder_selected`:
```python
folder_path = "datas/"
df = Reader.DataSet.read_folder_selected(folder_path)
```
Other parameters of `read_folder_selected`:
- `mode`: optional, default `NanMode.AllEmpty`.
- `max_threads`: optional, default `4`.
- `drop_nan`: optional, default `True`.
- `selected_cols`: optional, default `["TX01", "PP01", "PS01", "RH01", "WD01", "WD02"]`.
- `station_number`: optional, default `467490`.
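Judging by its parameters, `read_folder_selected` keeps the listed columns for one station. A pandas sketch of that selection step on an already-loaded frame, where `select_station` and the station column name `stno` are hypothetical:

```python
import pandas as pd

# Defaults copied from the read_folder_selected parameter list.
DEFAULT_COLS = ("TX01", "PP01", "PS01", "RH01", "WD01", "WD02")
DEFAULT_STATION = 467490


def select_station(frame, station_number=DEFAULT_STATION, selected_cols=DEFAULT_COLS):
    # Keep rows of one station ("stno" is a hypothetical column name),
    # then keep only the requested columns, in the requested order.
    rows = frame["stno"] == station_number
    return frame.loc[rows, list(selected_cols)]


# Made-up readings for two stations, plus an extra column that gets dropped.
frame = pd.DataFrame({
    "stno": [467490, 466920],
    "TX01": [25.1, 26.0], "PP01": [0.0, 1.5], "PS01": [1010.2, 1009.8],
    "RH01": [80, 75], "WD01": [3.2, 2.1], "WD02": [90, 180], "TD01": [21.0, 22.0],
})

subset = select_station(frame)
print(subset)
```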
To save the data, you can:
- Export to a normal .csv with the DataFrame method (see the pandas documentation for details):
```python
df.to_csv("datas.csv")
```
- Export to a special .csv that this reader can load back:
```python
df.to_datasets_csv("datas.csv")
```
Then use the command below to load it back:
```python
df = Reader.DataSet.read_dataset_csv("datas.csv")
```
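A plain `to_csv` file stores timestamps as text, so reading it back with pandas alone needs the index restored explicitly; the reader's own format presumably spares you this step. A minimal pandas-only sketch, assuming a `DatetimeIndex` named `time` (a hypothetical name):

```python
import io

import pandas as pd

index = pd.date_range("2002-10-01", periods=3, freq=pd.Timedelta(hours=1), name="time")
frame = pd.DataFrame({"TX01": [25.1, 24.8, 24.3]}, index=index)

# Write with the ordinary DataFrame method (to a buffer instead of "datas.csv").
buffer = io.StringIO()
frame.to_csv(buffer)
buffer.seek(0)

# Restore the timestamp index; without these arguments "time" is read as text.
restored = pd.read_csv(buffer, index_col="time", parse_dates=True)
print(restored.index.dtype)
```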