Online Topic Segmentation of Russian Broadcast News

This paper addresses topic segmentation of continuous speech. We propose an online segmentation method that uses sentence boundaries produced by an automatic sentence boundary detection system. We show that dividing continuous speech into fragments for topic classification at these sentence boundaries increases classification accuracy by about 25-30% compared with dividing it using only a threshold on the number of words. The highest average classification F-measure obtained in our experiments, over 5 topics, is 0.79.
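
The two ways of dividing a word stream into fragments that are contrasted above can be illustrated with a minimal sketch. It assumes that the recognizer output arrives as a stream of words and that the sentence boundary detector supplies one boolean flag per word. The function names, the `min_words` parameter, and the toy English tokens are hypothetical and chosen for illustration only; they are not the system evaluated in this paper.

```python
from typing import Iterable, Iterator, List


def fragments_by_word_count(words: Iterable[str],
                            max_words: int) -> Iterator[List[str]]:
    """Baseline: cut a fragment every `max_words` words, ignoring sentences."""
    buffer: List[str] = []
    for word in words:
        buffer.append(word)
        if len(buffer) >= max_words:
            yield buffer
            buffer = []
    if buffer:  # flush the incomplete tail of the stream
        yield buffer


def fragments_by_sentence_boundary(words: Iterable[str],
                                   boundaries: Iterable[bool],
                                   min_words: int = 1) -> Iterator[List[str]]:
    """Cut a fragment at a detected sentence boundary.

    `boundaries[i]` is True when word i ends a sentence (assumed detector
    output). `min_words` is a hypothetical knob: a boundary is used only
    after at least that many words have accumulated.
    """
    buffer: List[str] = []
    for word, is_boundary in zip(words, boundaries):
        buffer.append(word)
        if is_boundary and len(buffer) >= min_words:
            yield buffer
            buffer = []
    if buffer:  # flush words after the last usable boundary
        yield buffer


# Toy stream: two sentences of three and four words.
words = ["a", "b", "c", "d", "e", "f", "g"]
boundaries = [False, False, True, False, False, False, True]

by_count = list(fragments_by_word_count(words, max_words=2))
by_sentence = list(fragments_by_sentence_boundary(words, boundaries))
```

Both functions are generators, so each fragment can be passed to a topic classifier as soon as it is complete, which matches the online setting. In the toy stream the word-count baseline splits both sentences across fragments, while the boundary-based variant keeps each sentence whole.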