"In the stream scenario, data are collected continuously over time from different kinds of hardware, like sensors, and they come in very large volumes. That leads to new challenges. You only get one look at the data. You can't possibly store all the data or process all of it.
So how do you redesign traditional data mining algorithms for the stream case, which is far more challenging than the static case?"
How do you stop a speeding train?
That's the question researchers in the field of data stream mining might ask themselves as they write algorithms to get the proverbial "one look at the data" as it goes streaming past.
In this episode of Computer Science Spotlight, IBM computer scientist Charu Aggarwal talks about some of the fundamental issues in mining streamed data -- and alludes to two books he has worked on in recent years: Data Streams: Models and Algorithms and Privacy-Preserving Data Mining: Models and Algorithms.
Subscribe to Computer Science Spotlight.
Download the mp3 file (6 minutes, 48 seconds) | Download the transcript (Word Document)
Series producer: Barbara Finkelstein
Music: Salsa2Long by Kevin MacLeod
Last updated on July 21, 2008
