Tuesday, October 7, 2008

Treatment of missing data corrected in daily RDAHMM analysis

A serious error in the daily RDAHMM analysis has been corrected, concerning the way we treat missing input data for stations. Previously we did nothing about the missing data except record them, which may have led to incorrect RDAHMM results because the Hidden Markov Model assumes that the data are evenly spaced in time. We now correct this error by inserting "fake data lines" into the missing-data time sections; these lines are duplicated from the most recent available data preceding each missing-data section.
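The gap filling described above can be sketched roughly as follows. This is only an illustration, not the actual code in our daily analysis: the function name `fill_missing_days`, the `(date, values)` record layout, and the sample numbers are all assumptions made for the example, and the real RDAHMM input files are formatted differently.

```python
from datetime import date, timedelta


def fill_missing_days(records):
    """Fill daily gaps by duplicating the most recent available record.

    `records` is a hypothetical list of (date, values) tuples, sorted by
    date, with one entry per day that actually has data.
    Returns (filled, inserted_dates).
    """
    one_day = timedelta(days=1)
    filled = []
    inserted = []
    for day, values in records:
        if filled:
            # Last real observation before the current one.
            prev_day, prev_values = filled[-1]
            gap_day = prev_day + one_day
            # Insert one "fake data line" for every missing day.
            while gap_day < day:
                filled.append((gap_day, prev_values))
                inserted.append(gap_day)
                gap_day += one_day
        filled.append((day, values))
    return filled, inserted


# Made-up sample: October 3 and 4 are missing.
observed = [
    (date(2008, 10, 1), (1.0, 2.0, 3.0)),
    (date(2008, 10, 2), (1.1, 2.1, 3.1)),
    (date(2008, 10, 5), (1.4, 2.4, 3.4)),
]
filled, inserted = fill_missing_days(observed)
for day, values in filled:
    marker = " (fake)" if day in inserted else ""
    print(day.isoformat(), values, marker)
```

Returning the inserted dates separately keeps a record of which lines are duplicates, so the missing days can still be reported even though the model now sees one line per day.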

After this correction, the total number of stations with state changes dropped significantly (from around 100 on average to fewer than 30) for the period from 2006 to 2008. Whether this new result is reasonable still needs verification.
