Original Articles: 2014 Vol: 6 Issue: 7
MFCC-based perceptual hashing for compressed domain of speech content identification
Abstract
Current research on speech content identification focuses primarily on raw wideband speech signals, although speech is
generally transmitted in a compressed format. Existing methods therefore cannot meet the demand for speech content
identification in the compressed domain. This paper proposes a new speech perceptual hashing algorithm for speech
content identification in the compressed domain, based on MFCC (Mel Frequency Cepstral Coefficients), to address
real-time speech content identification and the large volume of voice messages carried over the mobile
Internet. The algorithm builds on the MFCC feature extraction used by raw wideband methods. The process begins by
extracting the MDCT coefficients, which are intermediate decoding results of speech compressed in MP3
format. These coefficients are converted to MFCC parameters, and binary hash values are then generated
from these parameters in combination with human auditory features. Because the algorithm operates on highly compressed data, it can
achieve fast identification of speech content. Experimental results show that the proposed algorithm supports
tampering localization and improves efficiency by 5% compared with raw wideband algorithms, while
preserving robustness and discrimination.
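
The pipeline summarized in the abstract (MDCT coefficients, then MFCC parameters, then a binary hash) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the filter count, the number of cepstral coefficients, the sample rate, or the exact hashing rule, so the values and rules below are assumptions: squared MDCT coefficients stand in for the power spectrum, 20 triangular mel filters yield 12 coefficients, the sample rate is 16 kHz, and each hash bit records whether a coefficient rises from one frame to the next. The frame length of 576 matches the number of MDCT lines in one MP3 granule. All function names are hypothetical, and random noise stands in for decoded MDCT frames.

```python
import math
import random


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular mel filters laid over n_bins MDCT bins covering 0..Nyquist."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # Filter edges, equally spaced in mel, expressed as fractional bin positions.
    edges = [mel_to_hz(top * i / (n_filters + 1)) / nyquist * (n_bins - 1)
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_from_mdct(frame, bank, n_coeffs):
    """Assumption: squared MDCT coefficients are used as a pseudo power spectrum."""
    energies = [sum(w * c * c for w, c in zip(row, frame)) for row in bank]
    log_e = [math.log(e + 1e-10) for e in energies]
    n = len(log_e)
    # DCT-II of the log mel energies gives the cepstral coefficients.
    return [sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n) for j in range(n))
            for i in range(n_coeffs)]


def perceptual_hash(frames, sample_rate=16000, n_filters=20, n_coeffs=12):
    """Assumed hashing rule: bit = 1 if a coefficient rises from frame t to t+1."""
    bank = mel_filterbank(n_filters, len(frames[0]), sample_rate)
    feats = [mfcc_from_mdct(f, bank, n_coeffs) for f in frames]
    return [1 if b > a else 0
            for prev, cur in zip(feats, feats[1:])
            for a, b in zip(prev, cur)]


def bit_error_rate(h1, h2):
    if len(h1) != len(h2):
        raise ValueError("hashes must have equal length")
    return sum(x != y for x, y in zip(h1, h2)) / len(h1)


def tampered_transitions(h1, h2, n_coeffs=12):
    """Indices of frame transitions whose hash bits differ (coarse localization)."""
    return [t for t in range(len(h1) // n_coeffs)
            if h1[t * n_coeffs:(t + 1) * n_coeffs] != h2[t * n_coeffs:(t + 1) * n_coeffs]]


# Synthetic stand-in for decoded MDCT frames (576 lines per MP3 granule).
rng = random.Random(0)


def noise_frames(n_frames, n_bins=576):
    return [[rng.gauss(0.0, 1.0) for _ in range(n_bins)] for _ in range(n_frames)]


original = noise_frames(10)
louder = [[2.0 * c for c in frame] for frame in original]   # content-preserving change
tampered = [list(frame) for frame in original]
tampered[4] = noise_frames(1)[0]                            # replace one frame

h_orig = perceptual_hash(original)
print("BER after gain change:", bit_error_rate(h_orig, perceptual_hash(louder)))
print("Transitions flagged as tampered:",
      tampered_transitions(h_orig, perceptual_hash(tampered)))
```

In this sketch a uniform gain change shifts every log mel energy by the same constant, so the frame-to-frame comparisons are essentially unchanged, which is a simple instance of the robustness the abstract refers to. Replacing a frame can only alter the two transitions that touch it, which is what makes coarse tamper localization possible under this hashing rule.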