Adaptive Time-frequency Transform (ATFT) Coding of Wideband Audio Signals

Project: Adaptive Time-frequency Transform (ATFT) Coding of Wideband Audio Signals

Project Summary

This project addresses an important issue facing state-of-the-art digital audio technology. The data rate of high-fidelity wideband audio signals is currently too high for many transmission channels and storage media. The objective of this project is to develop a high-quality, low bit-rate audio coder that takes advantage of the psychoacoustic properties of human hearing and the true non-stationary nature of audio signals. The main challenge will be designing a non-stationary signal analysis algorithm that exploits the joint time and frequency correlation and provides an energy-compact representation of the audio signal in fewer transform coefficients.

The objective is to develop a computationally less expensive, high-quality, low bit-rate audio coder to meet the increasing demands of state-of-the-art digital audio technology. It is hypothesized that a high-quality, low bit-rate coder can be achieved by exploiting the true non-stationary characteristics of wideband audio signals and by taking advantage of the psychoacoustic properties of human hearing. The details of the proposed project are outlined below in four stages:

STAGE 1:
To start with, the wideband audio signal will be sampled and quantized. At this stage the audio signal is converted into digital audio, typically in PCM format. The data rate of a high-fidelity two-channel (stereo) audio signal is about 1.4 Mb/s for a 44.1 kHz sampling rate and 16 bits/sample quantization. This data rate is simply too high for many transmission channels and storage media. As a result, coding algorithms that reduce the output data rate have received much attention. These algorithms compress the audio signal by exploiting the statistical, temporal and spatial redundancies that are an integral part of any audio signal. A brief review of commonly used compression schemes and a detailed description of the proposed compression scheme are given in Stage 2.
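The quoted data rate can be checked with a short calculation. The sketch below assumes two channels (stereo); the function name pcm_bit_rate is introduced here for illustration only.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Return the uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# Assumption: two channels (stereo), 44.1 kHz sampling, 16 bits/sample.
stereo_rate = pcm_bit_rate(44_100, 16, 2)
mono_rate = pcm_bit_rate(44_100, 16, 1)

print(f"stereo: {stereo_rate / 1e6:.3f} Mb/s")  # 1.411 Mb/s
print(f"mono:   {mono_rate / 1e6:.3f} Mb/s")    # 0.706 Mb/s
```

A single channel at the same settings needs about 0.7 Mb/s, so the 1.4 Mb/s figure corresponds to a two-channel signal.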

STAGE 2:
Two fundamentally different techniques are available for the compression of PCM audio data: time domain coding and frequency domain coding. In time domain coding, the temporal redundancy between audio samples is exploited. The motivation for time domain coding of audio signals is to represent a correlated waveform in terms of difference samples, such that one can maintain the same signal-to-noise ratio (SNR) at a reduced bit rate. Frequency domain coders are designed to identify and remove redundancy in the frequency domain. A common feature of all frequency domain coders is the transformation technique used. The mapping into the frequency domain is accomplished by a transform, resulting in a transform coder, or by subband decomposition, resulting in a subband coder.
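The difference-sample idea behind time domain coding can be illustrated with a minimal sketch. The 440 Hz test tone, the 2048-sample length and the simple previous-sample predictor are assumptions chosen for illustration. The sketch omits the quantizer that a practical coder would place inside the prediction loop.

```python
import numpy as np

fs = 44_100                               # sampling rate in Hz
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 440.0 * t)         # strongly correlated test tone (assumed input)

# Difference samples: d[0] = x[0], d[n] = x[n] - x[n-1] for n >= 1.
d = np.diff(x, prepend=0.0)

# The decoder recovers the waveform by accumulating the differences.
x_rec = np.cumsum(d)

variance_ratio = np.var(d) / np.var(x)
print(f"variance of differences / variance of signal = {variance_ratio:.4f}")
```

Because neighbouring samples are highly correlated, the difference signal has a much smaller variance than the waveform itself, so it can be quantized with fewer bits at the same step size.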

In the case of audio signals, the instruments used as well as the music style affect the spectral content, and the spectra vary with time. Audio signals are therefore non-stationary random processes, and it is difficult to analyze such signals in either the time or the frequency domain alone. Popular coding techniques such as the discrete cosine transform (a transform coding technique) and linear predictive coding (a time domain coding technique) do not yield optimal performance in compressing non-stationary audio data.

In this project, we propose to develop a transform coder that takes into account the non-stationary behavior of audio signals. Non-stationary signals can be analyzed by using joint time-frequency (TF) transformations. In a TF transform the energy of a signal is mapped onto a two-dimensional time and frequency plane. There are various types of quadratic TF distributions, and it is difficult to determine the best (or optimal) one for analyzing audio signals. One way of providing an optimal analysis is to design a TF transformation that is fully adapted to the signal characteristics; we call such a technique the adaptive TF transformation (ATFT).
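As a point of reference for what a TF transform produces, the sketch below computes a spectrogram, one of the simplest quadratic TF distributions (the squared magnitude of a short-time Fourier transform). It is not the ATFT. The frame length, hop size, Hann window and two-tone test signal are assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(x, frame_len=256, hop=128):
    """Squared-magnitude STFT; rows are time frames, columns are frequency bins."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

# Non-stationary test signal: 500 Hz for the first half, 1500 Hz for the second.
fs = 8000
t = np.arange(4096) / fs
x = np.where(t < 2048 / fs, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))

S = spectrogram(x)
bin_hz = fs / 256                          # frequency spacing between bins
print("first frame peak:", S[0].argmax() * bin_hz, "Hz")
print("last frame peak: ", S[-1].argmax() * bin_hz, "Hz")
```

The peak frequency moves from one frame to another, which a single Fourier transform of the whole signal would not reveal. The fixed window, however, sets one time and frequency resolution for the entire signal, which is the limitation a signal-adapted transform is meant to remove.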

The main focus of this project will be designing an ATFT algorithm suitable for audio signals. The ATFT algorithm will be based on the idea of signal decomposition. Once the signal is decomposed into components, the ATFT of the signal will be constructed by combining commonly used quadratic distributions (e.g., Wigner distributions) of the individual components. The key to the successful design of the ATFT lies in the selection of the signal decomposition algorithm. The components obtained depend on the basis functions used. For example, the basis functions of the Fourier transform decompose the signal into tonal components, and the basis functions of the wavelet transform decompose the signal into components with good time and scale properties. The components obtained by decomposing a signal using basis functions with good TF properties may be ideal candidates for ATFT coding. It is known in the literature that Gaussian functions provide optimal TF resolution. Therefore the signal decomposition in ATFT will be performed by projecting the signal onto an overcomplete collection of Gaussian basis functions. The decomposed signal components, represented in terms of ATFT coefficients, will then contain the joint time and frequency correlation information.
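One possible way to project a signal onto an overcomplete collection of Gaussian functions is a greedy matching-pursuit iteration, sketched below. This is an assumption made for illustration: the text above does not name the decomposition algorithm, and the dictionary parameters (scales, shifts, frequencies) and function names are hypothetical.

```python
import numpy as np

def gabor_dictionary(n, scales, shifts, freqs):
    """Unit-norm real Gabor atoms: a Gaussian window times a sinusoid."""
    t = np.arange(n)
    atoms = []
    for s in scales:
        for u in shifts:
            window = np.exp(-np.pi * ((t - u) / s) ** 2)
            for f in freqs:
                for phase in (0.0, np.pi / 2):
                    atom = window * np.cos(2 * np.pi * f * (t - u) + phase)
                    atoms.append(atom / np.linalg.norm(atom))
    return np.array(atoms)

def matching_pursuit(x, dictionary, n_iter):
    """Greedily pick the atom best correlated with the residual, then subtract it."""
    residual = np.asarray(x, dtype=float).copy()
    indices, coeffs = [], []
    for _ in range(n_iter):
        corr = dictionary @ residual          # inner product with every atom
        k = int(np.argmax(np.abs(corr)))
        indices.append(k)
        coeffs.append(corr[k])
        residual = residual - corr[k] * dictionary[k]
    return np.array(indices), np.array(coeffs), residual

# Hypothetical dictionary: 128 atoms for a 256-sample frame (overcomplete would
# need more atoms than samples; the size is kept small here for readability).
dictionary = gabor_dictionary(
    256, scales=(8, 32), shifts=range(0, 256, 32), freqs=(0.05, 0.1, 0.2, 0.3)
)

# Test signal built from two atoms, so a few coefficients should describe it.
x = 2.0 * dictionary[5] + 1.0 * dictionary[100]

indices, coeffs, residual = matching_pursuit(x, dictionary, n_iter=10)
approx = coeffs @ dictionary[indices]
captured = 1.0 - np.sum(residual**2) / np.sum(x**2)
print(f"energy captured by 10 coefficients: {captured:.4f}")
```

Each selected atom is localized in both time and frequency, so the list of (index, coefficient) pairs is the kind of compact, jointly time and frequency indexed description the paragraph above refers to. A denser set of scales, shifts and frequencies would make the dictionary genuinely overcomplete at a higher search cost.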

The signal decomposition concept inherent in ATFT will also find application in an important area of audio signal processing known as adaptive denoising. Adaptive denoising is a novel technique for reducing noise in cases where the spectra of the signal and noise overlap significantly, and it forms the basis for high-quality audio reproduction. Noise (either white or colored) tends to produce low-magnitude ATFT coefficients. The SNR of an audio signal could be significantly improved by suppressing the low-magnitude ATFT coefficients and reconstructing the signal from those that remain. The threshold that defines a low-magnitude ATFT coefficient could be made signal adaptive, or it could be based on a hard decision. The ATFT coefficients retained after denoising will be a compact representation of the audio signal. In Stage 3, discussed below, the ATFT coefficients will be further processed in order to achieve low bit-rate coding.
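The coefficient suppression step can be sketched as follows. The coefficient values are synthetic, and the adaptive rule shown (a multiple of a median-based noise estimate) is a hypothetical choice for illustration, not the rule proposed for the ATFT coder.

```python
import numpy as np

def hard_threshold(coeffs, threshold):
    """Zero every coefficient whose magnitude is below `threshold`."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)

def adaptive_threshold(coeffs, k=3.0):
    """Hypothetical signal-adaptive rule: k times a robust noise-scale estimate."""
    return k * np.median(np.abs(coeffs)) / 0.6745

# Synthetic coefficients: mostly small noise-like values plus three large ones.
rng = np.random.default_rng(0)
coeffs = rng.normal(scale=0.05, size=200)
coeffs[[3, 40, 120]] = [2.0, -1.5, 0.9]

fixed = hard_threshold(coeffs, 0.5)                           # fixed threshold
adaptive = hard_threshold(coeffs, adaptive_threshold(coeffs)) # data-driven threshold

print("kept with fixed threshold:   ", np.count_nonzero(fixed))
print("kept with adaptive threshold:", np.count_nonzero(adaptive))
```

With the fixed threshold only the three large coefficients survive. The adaptive rule derives its threshold from the coefficients themselves, so it needs no prior knowledge of the noise level but may keep a few noise coefficients.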

STAGE 3:
The ATFT coefficients may contain perceptually redundant values. Psychoacoustics provides an analytic model of auditory perception. This model of the human auditory system establishes a framework under which the ATFT coefficients containing redundant audio information can be identified. The perceptually relevant ATFT coefficients will be re-quantized, and the re-quantized values form the low bit-rate output of the audio coder. This output could be further compressed by using entropy-based coding techniques such as Huffman coding. The Huffman coding stage is optional and may be omitted in applications where computational cost is a concern.
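The re-quantization and optional Huffman stage can be sketched as below. The uniform quantizer, the step size of 0.25 and the coefficient values are assumptions for illustration. A perceptual coder would normally derive the step size from the psychoacoustic model, which is not modelled here.

```python
import heapq
from collections import Counter

def requantize(coeffs, step):
    """Uniform quantizer: map each coefficient to an integer index."""
    return [round(c / step) for c in coeffs]

def huffman_code_lengths(symbols):
    """Return the Huffman code length in bits for each distinct symbol."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate case: one symbol, one bit
        return {next(iter(freq)): 1}
    # Heap items are (count, tie_breaker, symbols_in_subtree).
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:                 # every merge adds one bit to these codes
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, next_id, s1 + s2))
        next_id += 1
    return lengths

# Hypothetical coefficient values; most are small, as expected after denoising.
coeffs = [0.02, -0.03, 0.05, 0.01, -0.06, 0.04, -0.02, 0.03,
          0.26, 0.22, 0.30, 0.24, -0.27, -0.21, 0.52, 0.47]
indices = requantize(coeffs, step=0.25)

lengths = huffman_code_lengths(indices)
huffman_bits = sum(lengths[s] for s in indices)
fixed_bits = 2 * len(indices)               # 4 distinct indices need 2 bits each

print("code lengths:", lengths)
print("Huffman bits:", huffman_bits, "fixed-length bits:", fixed_bits)
```

For these values the Huffman code uses 28 bits against 32 bits for a fixed-length code. The gain comes from the skewed distribution of quantizer indices, and it costs an extra pass to gather symbol statistics, which is why the stage is treated as optional.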

STAGE 4:
The ATFT coder could be implemented in hardware. Processing digital audio and performing ATFT coding will require a significant amount of memory, computation, and internal data transfer. The project could be cost-effectively implemented using digital signal processing (DSP) chips, which are microprocessors tailored to implement signal processing tasks efficiently. Features that make DSP-based platforms well suited for implementing the ATFT coder include low power consumption, single-cycle multiply and multiply-accumulate (MAC) operations for fast calculation of ATFT coefficients and quantization, and various memory access modes for efficient data transfer.

The output of the ATFT coder could either be sent through a bandlimited communication network or stored on digital media, depending on the application at hand.

Preliminary Results

Joint time-frequency representation (spectrogram) of an audio signal.

Joint time-frequency representation (spectrogram) of the reconstructed audio signal.

Joint time-frequency representation (spectrogram) of the reconstructed audio signal with perceptual coding.

Auditory

Audio signal 1 (wav format for PCs) (au format for SUNs)
ATFT coded (wav format for PCs) (au format for SUNs)

Audio signal 2 (wav format for PCs) (au format for SUNs)
ATFT coded (wav format for PCs) (au format for SUNs)

Audio signal 3 (wav format for PCs) (au format for SUNs)
ATFT coded (wav format for PCs) (au format for SUNs)

Audio signal 4 (wav format for PCs) (au format for SUNs)
ATFT coded (wav format for PCs) (au format for SUNs)