This project addresses an important issue facing state-of-the-art digital audio technology. Currently, the data rate of high-fidelity wideband audio signals is too high for many transmission channels and storage media. The objective of this project is to develop a high-quality, low bit-rate audio coder that takes advantage of human psychoacoustic properties and the true non-stationary nature of audio signals. The main challenge will be designing a non-stationary signal analysis algorithm that exploits the joint time and frequency correlation and provides an energy-compact representation of the audio signal in fewer transform coefficients.
The objective is to develop a computationally less expensive, high-quality, low bit-rate audio coder to meet the increasing demands of state-of-the-art digital audio technology. It is hypothesized that a high-quality, low bit-rate coder can be achieved by exploiting the true non-stationary characteristics of wideband audio signals and by taking advantage of human psychoacoustic properties. The details of the proposed project are outlined below in four stages:

STAGE 1:
To start with, the wideband audio signal will be sampled and quantized. At this stage the audio signal is converted into digital audio, typically in PCM format. The data rate of a high-fidelity audio signal is about 1.4 Mb/s for two-channel (stereo) audio at a 44.1 kHz sampling rate and 16 bits/sample quantization. This data rate is simply too high for many transmission channels and storage media. As a result, coding algorithms that reduce the output data rate have received much attention. These algorithms compress the audio signal by exploiting the statistical, temporal and spatial redundancies inherent in audio signals. A brief review of commonly used compression schemes and a detailed description of the proposed compression scheme are given in Stage 2.

STAGE 2:
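Before turning to the compression schemes, the Stage 1 data-rate figure can be checked with a short calculation. The sketch below assumes two channels (stereo), since a single 44.1 kHz, 16-bit channel gives only about 0.7 Mb/s; the function name `pcm_bit_rate` is hypothetical and used here only for illustration.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Return the uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Assumption: "high fidelity" is read as CD-style stereo (two channels).
stereo_rate = pcm_bit_rate(44_100, 16, 2)
mono_rate = pcm_bit_rate(44_100, 16, 1)

print(f"stereo: {stereo_rate / 1e6:.2f} Mb/s")  # prints "stereo: 1.41 Mb/s"
print(f"mono:   {mono_rate / 1e3:.1f} kb/s")    # prints "mono:   705.6 kb/s"
```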
Two fundamentally different techniques are available for the compression of PCM audio data: time domain coding and frequency domain coding. In time domain coding, the temporal redundancy between audio samples is exploited. The motivation for time domain coding of audio signals is to represent a correlated waveform in terms of difference samples, so that the same signal-to-noise ratio (SNR) can be maintained at a reduced bit rate. Frequency domain coders are designed to identify and remove redundancy in the frequency domain. A common feature of all frequency domain coders is the use of a transformation. The mapping into the frequency domain is accomplished either by a transform, resulting in a transform coder, or by subband decomposition, resulting in a subband coder.

STAGE 3:
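The time domain idea described in Stage 2, representing a correlated waveform by difference samples, can be illustrated with a minimal sketch. This is plain first-order differencing without quantization, chosen for simplicity; it is not the coder proposed in this project, and the 440 Hz test tone and helper names are assumptions made for the example.

```python
import math


def difference_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    previous = 0
    residuals = []
    for s in samples:
        residuals.append(s - previous)
        previous = s
    return residuals


def difference_decode(residuals):
    """Rebuild the samples by accumulating the differences."""
    total = 0
    samples = []
    for r in residuals:
        total += r
        samples.append(total)
    return samples


def bits_needed(values):
    """Conservative signed word length that covers every value."""
    peak = max(abs(v) for v in values)
    return peak.bit_length() + 1


# Hypothetical test signal: a 440 Hz tone sampled at 44.1 kHz.
samples = [round(30000 * math.sin(2 * math.pi * 440 * n / 44100))
           for n in range(1000)]
residuals = difference_encode(samples)

assert difference_decode(residuals) == samples  # lossless round trip
print(bits_needed(samples), bits_needed(residuals))
```

Because neighbouring samples of a smooth waveform are strongly correlated, the differences span a smaller range than the samples themselves and fit in a shorter word length.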
The ATFT coefficients may contain perceptually redundant values. Psychoacoustics provides an analytic model of auditory perception. This model of the human auditory system establishes a framework under which the ATFT coefficients containing redundant audio information can be identified. The perceptually relevant ATFT coefficients will be re-quantized, and the re-quantized output forms the low bit-rate output of the audio coder. This output could be compressed further with entropy-based coding techniques such as Huffman coding. The Huffman coding stage is optional and may be omitted in computationally intensive applications.

STAGE 4:
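The optional entropy-coding step mentioned in Stage 3 can be sketched with a textbook Huffman code builder. The coefficient indices below are made-up values, not project data, and `huffman_code` is a hypothetical helper written for this illustration.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        # Prefix the two lightest subtrees with 0 and 1, then merge them.
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical re-quantized coefficient indices (illustration only).
quantized = [0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2]
code = huffman_code(quantized)
encoded = "".join(code[q] for q in quantized)
fixed_bits = 2 * len(quantized)  # four distinct symbols need 2 bits each

print(len(encoded), "bits with Huffman coding vs", fixed_bits, "fixed-length")
```

Frequent values receive short code words, so the gain depends on how skewed the distribution of re-quantized values is.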
The ATFT coder could be implemented in hardware. Processing digital audio and performing ATFT coding will require a significant amount of memory, computation, and internal data transfer. The project could be implemented cost-effectively using digital signal processing (DSP) chips, which are microprocessors tailored to perform signal processing tasks efficiently. Features that make DSP-based platforms well suited to implementing the ATFT coder include low power consumption, single-cycle multiply and multiply-accumulate (MAC) operations for fast calculation of ATFT coefficients and quantization, and various memory access modes for efficient data transfer.
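To see why a single-cycle MAC matters, consider a transform coefficient computed as an inner product between a signal frame and a basis function. That form is an assumption made for illustration, since the ATFT computation is not spelled out here; the cosine basis and the name `mac_inner_product` are likewise hypothetical.

```python
import math


def mac_inner_product(frame, basis):
    """Accumulate frame[n] * basis[n]: one multiply-accumulate per sample."""
    acc = 0.0
    for x, b in zip(frame, basis):
        acc += x * b  # a single MAC operation on a DSP
    return acc


N = 64  # frame length (assumed)
k = 5   # frequency index of the test tone (assumed)
frame = [math.cos(2 * math.pi * k * n / N) for n in range(N)]
matched = [math.cos(2 * math.pi * k * n / N) for n in range(N)]
other = [math.cos(2 * math.pi * (k + 3) * n / N) for n in range(N)]

print(mac_inner_product(frame, matched))  # close to N / 2
print(mac_inner_product(frame, other))    # close to 0
```

Each coefficient costs one MAC per sample in the frame, so the total MAC count grows with both the frame length and the number of coefficients.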

Joint time-frequency representation (spectrogram) of an audio signal.

Joint time-frequency representation (spectrogram) of the reconstructed audio signal.

Joint time-frequency representation (spectrogram) of the reconstructed audio signal with perceptual coding.
![]()
Audio signal 1 (wav format for PCs) (au format for SUNs)
ATFT coded (wav format for PCs) (au format for SUNs)
![]()
Audio signal 2 (wav format for PCs) (au format for SUNs)
ATFT coded (wav format for PCs) (au format for SUNs)
![]()
Audio signal 3 (wav format for PCs) (au format for SUNs)
ATFT coded (wav format for PCs) (au format for SUNs)
![]()
Audio signal 4 (wav format for PCs) (au format for SUNs)
ATFT coded (wav format for PCs) (au format for SUNs)