Series contents

Audio Encoding and Decoding with CoreAudio (1): Audio Decoding

Audio Encoding and Decoding with CoreAudio (2): Audio Encoding

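Preface

In Audio Encoding and Decoding with CoreAudio (1): Audio Decoding, we introduced the common data structures and basic concepts of Core Audio. If you have not read it yet, it is worth reading first.

Core Audio does not describe audio data as simply as "hi, this is an mp3 file". The file format (the container) and the format of the audio data inside the file are two quite different things.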

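Much about formats can seem arbitrary, but Audio File Services provides a useful function, AudioFileGetGlobalInfo, which reports information not about a single file but about how Core Audio handles audio files in general. AudioFileGetGlobalInfo can query the following properties: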

kAudioFileGlobalInfo_ReadableTypes					
kAudioFileGlobalInfo_WritableTypes					
kAudioFileGlobalInfo_FileTypeName					
kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat
kAudioFileGlobalInfo_AvailableFormatIDs				

kAudioFileGlobalInfo_AllExtensions					
kAudioFileGlobalInfo_AllHFSTypeCodes				
kAudioFileGlobalInfo_AllUTIs						
kAudioFileGlobalInfo_AllMIMETypes					

kAudioFileGlobalInfo_ExtensionsForType				
kAudioFileGlobalInfo_HFSTypeCodesForType			
kAudioFileGlobalInfo_UTIsForType					
kAudioFileGlobalInfo_MIMETypesForType				

kAudioFileGlobalInfo_TypesForMIMEType				
kAudioFileGlobalInfo_TypesForUTI					
kAudioFileGlobalInfo_TypesForHFSTypeCode			
kAudioFileGlobalInfo_TypesForExtension

For example, given a file type (AudioFileTypeID), kAudioFileGlobalInfo_AvailableFormatIDs returns an array of format IDs: the data formats that the file type supports.

Here is an example of using AudioFileGetGlobalInfo to get this information. Suppose we want to know which formats are supported when the file type is kAudioFileMPEG4Type. We can do this:

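OSStatus err;
UInt32 file_type = kAudioFileMPEG4Type;
UInt32 size = 0;
// first ask how many bytes the result needs
err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableFormatIDs,
                                 sizeof(file_type),
                                 &file_type,
                                 &size);
assert(err == noErr);

auto* formats = (UInt32*)malloc(size);
err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableFormatIDs,
                             sizeof(file_type),
                             &file_type,
                             &size,
                             formats);
assert(err == noErr);

int format_cnt = size / sizeof(UInt32);
for(int i = 0; i < format_cnt; ++i){
    // format IDs are four-character codes; swap to big-endian so the bytes are in reading order
    UInt32 format4cc = CFSwapInt32HostToBig(formats[i]);
    // copy into a null-terminated buffer before printing
    char name[5] = {0};
    memcpy(name, &format4cc, 4);
    cout << i << ": mFormatId: " << name << endl;
}
free(formats);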

The code prints 14 entries: kAudioFileMPEG4Type supports a rich set of data formats.

0: mFormatId: .mp1
1: mFormatId: .mp2
2: mFormatId: .mp3
3: mFormatId: aac 
4: mFormatId: aace
5: mFormatId: aacf
6: mFormatId: aacg
7: mFormatId: aach
8: mFormatId: aac
9: mFormatId: aacp	
10: mFormatId: ac-3
11: mFormatId: alac
12: mFormatId: ec-3
13: mFormatId: usac

What about kAudioFileAIFFType? It supports a single format:

0: mFormatId: lpcm

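As another example, given a file type (AudioFileTypeID) and a format ID, kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat returns an array of AudioStreamBasicDescription structures with the following fields filled in: mFormatID, mFormatFlags and mBitsPerChannel. This is very helpful when writing files; after all, you would rather not dig through the documentation for these values.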

AudioFileTypeAndFormatID file_type_and_format_id;
file_type_and_format_id.mFileType = kAudioFileAIFFType;
file_type_and_format_id.mFormatID = kAudioFormatLinearPCM;

// err and size are reused from the previous snippet
err = AudioFileGetGlobalInfoSize(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                                 sizeof(file_type_and_format_id),
                                 &file_type_and_format_id,
                                 &size);
assert(err == noErr);

auto* asbds = (AudioStreamBasicDescription*)malloc(size);
err = AudioFileGetGlobalInfo(kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat,
                             sizeof(file_type_and_format_id),
                             &file_type_and_format_id,
                             &size,
                             asbds);
assert(err == noErr);

int asbd_count = size / sizeof(AudioStreamBasicDescription);

for(int i = 0; i < asbd_count; ++i){
    UInt32 format4cc = CFSwapInt32HostToBig(asbds[i].mFormatID);
    // copy into a null-terminated buffer before printing
    char name[5] = {0};
    memcpy(name, &format4cc, 4);
    // only these three fields are filled in by this query
    cout << i << ": mFormatId: " << name
         << ", mFormatFlags: " << asbds[i].mFormatFlags
         << ", mBitsPerChannel: " << asbds[i].mBitsPerChannel << endl;
}
free(asbds);

The code above specifies kAudioFileAIFFType as the file type and kAudioFormatLinearPCM as the data format. The output is:

0: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 8
1: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 16
2: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 24
3: mFormatId: lpcm, mFormatFlags: 14, mBitsPerChannel: 32

The output shows that AIFF supports 8-, 16-, 24- and 32-bit data. mFormatFlags = 14 is 0x2 + 0x4 + 0x8, that is, kAudioFormatFlagIsBigEndian | kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked.

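Audio encoding

In the preface we saw how to query information with AudioFileGetGlobalInfo. This matters a lot for audio encoding, because encoding follows these steps:

Choose the file type. What kind of file do you want: wav, aiff or aac?
Choose the data format. Different file types support different data formats; use AudioFileGetGlobalInfo with kAudioFileGlobalInfo_AvailableFormatIDs to find them.
Choose suitable mFormatFlags and mBitsPerChannel. Correct flags and bit depth ensure that opening the file does not fail; use AudioFileGetGlobalInfo with kAudioFileGlobalInfo_AvailableStreamDescriptionsForFormat to find them.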

Show me the code

Without further ado, here is the code; the explanation follows.

#include <AudioToolbox/AudioToolbox.h>
#include <cassert>
#include <cmath>
#include <cstring>
#include <vector>

// createCFURLWithStdString is a helper from the complete example; it is not defined here.

int main(int argc, char* argv[])
{
    AudioFileTypeID file_type = kAudioFileMPEG4Type;
    int o_channels = 2;
    double o_sr = 44100;

    // describe the encoded output; Core Audio fills in the remaining fields
    AudioStreamBasicDescription output_asbd;
    memset(&output_asbd, 0, sizeof(output_asbd));
    output_asbd.mSampleRate = o_sr;
    output_asbd.mChannelsPerFrame = o_channels;
    output_asbd.mFormatID = kAudioFormatMPEG4AAC;
    UInt32 size = sizeof(output_asbd);
    OSStatus status = AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &output_asbd);
    assert(status == noErr);

    // open output file (.m4a matches kAudioFileMPEG4Type)
    CFURLRef output_url = createCFURLWithStdString("sin440.m4a");
    ExtAudioFileRef output_file;
    status = ExtAudioFileCreateWithURL(output_url, file_type,
                                       &output_asbd, nullptr,
                                       kAudioFileFlags_EraseFile,
                                       &output_file);
    assert(status == noErr);

    // client format: the layout of the samples we hand to ExtAudioFileWrite
    double i_sr = 44100;
    int i_channels = 2;
    AudioStreamBasicDescription input_asbd;
    FillOutASBDForLPCM(input_asbd, i_sr, i_channels, 32, 32, true, false, false);
    status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                     sizeof(input_asbd), &input_asbd);
    assert(status == noErr);

    // one interleaved buffer holding all channels
    const int num_frame_out_per_block = 1024;
    AudioBufferList outputData;
    outputData.mNumberBuffers = 1;
    outputData.mBuffers[0].mNumberChannels = i_channels;
    outputData.mBuffers[0].mDataByteSize = sizeof(float) * num_frame_out_per_block * i_channels;
    std::vector<float> buffer(num_frame_out_per_block * i_channels);
    outputData.mBuffers[0].mData = buffer.data();

    // generate 200 blocks of a 440 Hz sine wave
    float t = 0;
    float tincr = 2 * M_PI * 440.0f / i_sr;
    for(int i = 0; i < 200; ++i){
        for(int j = 0; j < num_frame_out_per_block; ++j){
            buffer[j * i_channels] = sin(t);
            buffer[j * i_channels + 1] = buffer[j * i_channels];

            t += tincr;
        }

        // write audio block
        status = ExtAudioFileWrite(output_file, num_frame_out_per_block, &outputData);
        assert(status == noErr);
    }

    // release resources
    ExtAudioFileDispose(output_file);
    CFRelease(output_url);

    return 0;
}

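First, we choose kAudioFileMPEG4Type as the file type and create an AudioStreamBasicDescription with its sample rate, channel count and data format set. All other fields are zeroed, and AudioFormatGetProperty is then called to fill them in. If the data format is kAudioFormatLinearPCM, however, it is better to fill in the description with FillOutASBDForLPCM.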

AudioFileTypeID file_type = kAudioFileMPEG4Type;

AudioStreamBasicDescription output_asbd;
memset(&output_asbd, 0, sizeof(output_asbd));
output_asbd.mSampleRate = o_sr;
output_asbd.mChannelsPerFrame = o_channels;
output_asbd.mFormatID = kAudioFormatMPEG4AAC;
// size goes in as the capacity of output_asbd and comes back as the bytes written
UInt32 size = sizeof(output_asbd);
AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &output_asbd);

Next, ExtAudioFileCreateWithURL creates and opens the file. The kAudioFileFlags_EraseFile flag means that an existing file at that path is overwritten.

CFURLRef output_url = createCFURLWithStdString("sin440.m4a");
ExtAudioFileRef output_file;
OSStatus status = ExtAudioFileCreateWithURL(output_url, file_type,
                                            &output_asbd, nullptr,
                                            kAudioFileFlags_EraseFile,
                                            &output_file);

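The next step is very important: ExtAudioFileSetProperty sets the client format, which tells ExtAudioFile what the input audio data looks like when encoding. In this example the input is two-channel interleaved 32-bit float.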

AudioStreamBasicDescription input_asbd;
FillOutASBDForLPCM (input_asbd,i_sr,i_channels,32,32,true,false,false);
status = ExtAudioFileSetProperty(output_file, kExtAudioFileProperty_ClientDataFormat,
                                 sizeof(input_asbd), &input_asbd);

Then we create an AudioBufferList to hold the audio data. Because the data is interleaved float, mNumberBuffers = 1.

const int num_frame_out_per_block = 1024;
AudioBufferList outputData;
outputData.mNumberBuffers = 1;
outputData.mBuffers[0].mNumberChannels = i_channels;
outputData.mBuffers[0].mDataByteSize = sizeof(float)*num_frame_out_per_block*i_channels;
std::vector<float> buffer(num_frame_out_per_block * i_channels);
outputData.mBuffers[0].mData = buffer.data();

Next we write the audio data. The example writes a 440 Hz sine wave.

float t = 0;
float tincr = 2 * M_PI * 440.0f / i_sr;
for(int i = 0; i < 200; ++i){
    for(int j = 0; j < num_frame_out_per_block; ++j){
        buffer[j * i_channels] = sin(t);
        buffer[j * i_channels + 1] = buffer[j * i_channels];

        t += tincr;
    }

    // write audio block
    status = ExtAudioFileWrite(output_file, num_frame_out_per_block, &outputData);
}

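Finally, don't forget to release resources: the full listing calls ExtAudioFileDispose(output_file) when writing is done.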

Q&A

How do I handle input data in planar format?

When the kAudioFormatFlagIsNonInterleaved flag is set, the data is planar. The header contains a dedicated comment about this case:

//    Typically, when an ASBD is being used, the fields describe the complete layout
//    of the sample data in the buffers that are represented by this description -
//        where typically those buffers are represented by an AudioBuffer that is
//    contained in an AudioBufferList.
//
//        However, when an ASBD has the kAudioFormatFlagIsNonInterleaved flag, the
//    AudioBufferList has a different structure and semantic. In this case, the ASBD
//    fields will describe the format of ONE of the AudioBuffers that are contained in
//    the list, AND each AudioBuffer in the list is determined to have a single (mono)
//    channel of audio data. Then, the ASBD's mChannelsPerFrame will indicate the
//    total number of AudioBuffers that are contained within the AudioBufferList -
//        where each buffer contains one channel. This is used primarily with the
//    AudioUnit (and AudioConverter) representation of this list - and won't be found
//    in the AudioHardware usage of this structure.

In this case the semantics of AudioBufferList change. Usage looks roughly like this:

int i_channels = 2;
const int num_frame_out_per_block = 1024;
// AudioBufferList declares room for one AudioBuffer, so allocate space for the other (i_channels - 1)
AudioBufferList *outputData = (AudioBufferList*)malloc(sizeof(AudioBufferList) + (sizeof(AudioBuffer) * (i_channels - 1)));

// if input_asbd is non-interleaved (planar data), set mNumberBuffers to the number of channels
outputData->mNumberBuffers = i_channels;
for(int i = 0; i < i_channels; ++i){
    // each buffer holds exactly one channel
    outputData->mBuffers[i].mNumberChannels = 1;
    outputData->mBuffers[i].mDataByteSize = sizeof(float) * num_frame_out_per_block;
    outputData->mBuffers[i].mData = new float[num_frame_out_per_block];
}
// when done: delete[] each mData (cast back to float*), then free(outputData)

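Summary

When encoding audio files with Core Audio, the most important thing is to find a suitable AudioStreamBasicDescription. With AudioFileGetGlobalInfo you can start from the file type, find a suitable data format, and finally arrive at a suitable AudioStreamBasicDescription. After that, ExtAudioFile does the rest simply and efficiently.

The complete code is in CoreAudioExtAudioFileExample.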
