Tuesday, October 7, 2008

Treatment of missing data corrected in daily RDAHMM analysis

A serious error in the daily RDAHMM analysis has been corrected. It concerned the way we treat missing input data from stations. Previously we did nothing about missing data except record it, which could lead to incorrect RDAHMM results because the Hidden Markov Model assumes that observations are evenly spaced in time. We now correct this by inserting "fake data lines" into the missing-data time sections; each fake line duplicates the most recent data available before the gap.
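The gap-filling idea can be illustrated with a minimal sketch. It is not the code used in the service; it assumes one observation per day, and the function name `fill_missing_days` and the `(date, values)` tuple layout are hypothetical choices made for the example.

```python
from datetime import date, timedelta

def fill_missing_days(observations):
    """Fill gaps in a daily series by repeating the most recent values.

    observations: list of (date, values) tuples sorted by date (assumed layout).
    Returns (filled, missing): the gap-free series and the dates that were filled.
    """
    filled = []
    missing = []
    for day, values in observations:
        if filled:
            expected = filled[-1][0] + timedelta(days=1)
            # Insert "fake data lines" until we reach the next real observation.
            while expected < day:
                filled.append((expected, filled[-1][1]))
                missing.append(expected)
                expected += timedelta(days=1)
        filled.append((day, values))
    return filled, missing

series = [
    (date(2008, 1, 1), (1.0, 2.0, 3.0)),
    (date(2008, 1, 4), (4.0, 5.0, 6.0)),
]
filled, missing = fill_missing_days(series)
```

Returning the filled dates separately keeps the record of which days were missing, so the duplicated lines can still be told apart from real measurements.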

After this correction, the total number of stations with state changes dropped significantly (from around 100 on average to fewer than 30) for the period from 2006 to 2008. Whether this new result is reasonable still needs verification.

Result Aggregator added to real-time RDAHMM service

A new component, the real-time RDAHMM results aggregator, has been added to the real-time RDAHMM service. The aggregator connects to the service through NaradaBrokering, receives messages containing the real-time RDAHMM analysis results from all seven networks, aggregates these results, and saves them into a single .xml file, so that the portlet can show the results by reading this file.
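As a rough sketch of the aggregation step only, the following groups result messages by network and serializes them as one XML document. The NaradaBrokering subscription is not shown, and the message fields (`network`, `station`, `time`, `state`) and the element names are assumptions made for illustration, not the actual file format.

```python
import xml.etree.ElementTree as ET

def build_results_xml(messages):
    """Aggregate result messages into a single XML string.

    messages: iterable of dicts with the hypothetical keys
    'network', 'station', 'time' and 'state'.
    """
    root = ET.Element("results")
    networks = {}  # network name -> its <network> element
    for msg in messages:
        net = networks.get(msg["network"])
        if net is None:
            net = ET.SubElement(root, "network", name=msg["network"])
            networks[msg["network"]] = net
        ET.SubElement(
            net,
            "stateChange",
            station=msg["station"],
            time=msg["time"],
            state=str(msg["state"]),
        )
    return ET.tostring(root, encoding="unicode")

sample = [
    {"network": "netA", "station": "st01", "time": "2008-10-07T00:00:01", "state": 2},
    {"network": "netA", "station": "st02", "time": "2008-10-07T00:00:02", "state": 1},
    {"network": "netB", "station": "st03", "time": "2008-10-07T00:00:03", "state": 0},
]
xml_text = build_results_xml(sample)
```

The resulting string could then be written to the single file that the portlet reads.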

One problem is the large volume of results. Some stations went through more than 800 state changes within a 4-hour test run. So, for now, we save only one day's results in this file.
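One way to keep only a day's worth of results is to drop entries older than 24 hours before each write. The sketch below assumes that reading of "one day" (a rolling window rather than a calendar day) and a hypothetical `(timestamp, payload)` entry layout.

```python
from datetime import datetime, timedelta

def keep_last_day(entries, now):
    """Return only the entries whose timestamp falls within the last 24 hours.

    entries: list of (timestamp, payload) tuples (assumed layout).
    now: the reference time, passed in explicitly so the function is easy to test.
    """
    cutoff = now - timedelta(days=1)
    return [entry for entry in entries if entry[0] >= cutoff]

now = datetime(2008, 10, 7, 12, 0, 0)
entries = [
    (datetime(2008, 10, 6, 11, 59, 59), "too old"),
    (datetime(2008, 10, 6, 12, 0, 0), "exactly one day old"),
    (datetime(2008, 10, 7, 11, 0, 0), "recent"),
]
recent = keep_last_day(entries, now)
```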