Streaming Random Forests

Thumbnail Image
Abdulsalam, Hanady
Data mining , Streams , Classification algorithms , Streaming algorithms
Recent research addresses the problem of data-stream mining to deal with applications that require processing huge amounts of data such as sensor data analysis and financial applications. Data-stream mining algorithms incorporate special provisions to meet the requirements of stream-management systems, that is stream algorithms must be online and incremental, processing each data record only once (or few times); adaptive to distribution changes; and fast enough to accommodate high arrival rates. We consider the problem of data-stream classification, introducing an online and incremental stream-classification ensemble algorithm, Streaming Random Forests, an extension of the Random Forests algorithm by Breiman, which is a standard classification algorithm. Our algorithm is designed to handle multi-class classification problems. It is able to deal with data streams having an evolving nature and a random arrival rate of training/test data records. The algorithm, in addition, automatically adjusts its parameters based on the data seen so far. Experimental results on real and synthetic data demonstrate that the algorithm gives a successful behavior. Without losing classification accuracy, our algorithm is able to handle multi-class problems for which the underlying class boundaries drift, and handle the case when blocks of training records are not big enough to build/update the classification model.
External DOI