
Software and Hardware Encoding of PCM to AAC on Android

Chapter 6 covers a lot of ground; here we study software and hardware encoding of audio and video.

Encoding with libfdk_aac

The book's sample code encodes through the FFmpeg API, though you could also build libfdk_aac on its own and call it directly. First, as usual, we initialize with parameters such as the sample rate, channel count, and bit rate.

Next we probe the output file, letting FFmpeg infer the format automatically from the file name.

int ret;
    av_register_all();
    LOGI("aacFilePath is %s ", aacFilePath);
    // Alternative approach: probe the format first, then assign it to avFormatContext
//    AVOutputFormat *fmt = av_guess_format(NULL, aacFilePath, NULL);
//    avFormatContext->oformat = fmt;

    // Infer the output format directly from the output file name.
    // avformat_alloc_output_context2 allocates the context itself, so a prior
    // avformat_alloc_context() call would only leak.
    if ((ret = avformat_alloc_output_context2(&avFormatContext, nullptr, nullptr, aacFilePath)) !=
        0) {
        LOGI("avFormatContext   alloc   failed : %s", av_err2str(ret));
        return -1;
    }
    if ((ret = avio_open2(&avFormatContext->pb, aacFilePath, AVIO_FLAG_WRITE, nullptr, nullptr)) < 0) {
        LOGI("Could not avio open fail %s", av_err2str(ret));
        return -1;
    }
           

Next we create an audio stream and grab the stream's codec context.

AVCodec *codec;
    AVSampleFormat preferedSampleFMT = AV_SAMPLE_FMT_S16;
    int preferedChannels = audioChannels;
    int preferedSampleRate = audioSampleRate;
    audioStream = avformat_new_stream(avFormatContext, nullptr);
    audioStream->id = 1;
    avCodecContext = audioStream->codec;
           

Configure the codec context. The main parameters to set are:

  • Media type (audio)
  • Sample rate
  • Bit rate
  • Codec ID, taken from the oformat probed earlier
  • Sample format
  • Channel count
  • The AAC profile; the profiles are roughly the following:

    MPEG-2 AAC LC: Low Complexity profile

    MPEG-2 AAC Main: Main profile

    MPEG-2 AAC SSR: Scalable Sample Rate profile

    MPEG-4 AAC LC: Low Complexity profile; the audio track in the MP4 files common on today's phones is typically this profile

    MPEG-4 AAC Main: Main profile

    MPEG-4 AAC SSR: Scalable Sample Rate profile

    MPEG-4 AAC LTP: Long Term Prediction profile

    MPEG-4 AAC LD: Low Delay profile

    MPEG-4 AAC HE: High Efficiency profile

avCodecContext->codec_type = AVMEDIA_TYPE_AUDIO;
    avCodecContext->sample_rate = audioSampleRate;
    if (publishBitRate > 0) {
        avCodecContext->bit_rate = publishBitRate;
    } else {
        avCodecContext->bit_rate = PUBLISH_BITE_RATE;
    }
    avCodecContext->codec_id = avFormatContext->oformat->audio_codec;
    avCodecContext->sample_fmt = preferedSampleFMT;
    LOGI("audioChannels is %d", audioChannels);
    avCodecContext->channel_layout =
            preferedChannels == 1 ? AV_CH_LAYOUT_MONO : AV_CH_LAYOUT_STEREO;
    avCodecContext->channels = av_get_channel_layout_nb_channels(avCodecContext->channel_layout);
    // Set the AAC profile to encode
    avCodecContext->profile = FF_PROFILE_AAC_LOW;
    LOGI("avCodecContext->channels is %d", avCodecContext->channels);

    if (avFormatContext->oformat->flags & AVFMT_GLOBALHEADER) {
        avCodecContext->flags |= CODEC_FLAG_GLOBAL_HEADER;
    }
           

Find the matching encoder, query the sample formats and sample rates it supports, and filter against our preferences. In the book's code, if the preferred sample format is unsupported it simply falls back to the first supported format, and for the sample rate it picks the closest supported one.

// Look up the encoder by the codec id probed earlier
    codec = avcodec_find_encoder(avCodecContext->codec_id);
    if (!codec) {
        LOGI("Couldn't find a valid audio codec");
        return -1;
    }

    if (codec->sample_fmts) {
        /* check if the prefered sample format for this codec is supported.
         * this is because, depending on the version of libav, and with the whole ffmpeg/libav fork situation,
         * you have various implementations around. float samples in particular are not always supported.
         */
        const enum AVSampleFormat *p = codec->sample_fmts;
        for (; *p != -1; p++) {
            if (*p == audioStream->codec->sample_fmt)
                break;
        }
        if (*p == -1) {
            LOGI("sample format incompatible with codec. Defaulting to a format known to work.........");
            avCodecContext->sample_fmt = codec->sample_fmts[0];
        }
    }

    // Pick the closest supported sample rate
    if (codec->supported_samplerates) {
        const int *p = codec->supported_samplerates;
        int best = 0;
        int best_dist = INT_MAX;
        for (; *p; p++) {
            int dist = abs(audioStream->codec->sample_rate - *p);
            if (dist < best_dist) {
                best_dist = dist;
                best = *p;
            }
        }
        /* best is the closest supported sample rate (same as selected if best_dist == 0) */
        avCodecContext->sample_rate = best;
    }
           

If the PCM we feed in doesn't match the formats the encoder supports, we need to convert it: initialize the SwrContext, and finally open the codec.

// Initialize the converter
    if (preferedChannels != avCodecContext->channels
        || preferedSampleRate != avCodecContext->sample_rate
        || preferedSampleFMT != avCodecContext->sample_fmt) {
        LOGI("channels is {%d, %d}", preferedChannels, audioStream->codec->channels);
        LOGI("sample_rate is {%d, %d}", preferedSampleRate, audioStream->codec->sample_rate);
        LOGI("sample_fmt is {%d, %d}", preferedSampleFMT, audioStream->codec->sample_fmt);
        LOGI("AV_SAMPLE_FMT_S16P is %d AV_SAMPLE_FMT_S16 is %d AV_SAMPLE_FMT_FLTP is %d",
             AV_SAMPLE_FMT_S16P, AV_SAMPLE_FMT_S16, AV_SAMPLE_FMT_FLTP);
        swrContext = swr_alloc_set_opts(NULL,
                                        av_get_default_channel_layout(avCodecContext->channels),
                                        (AVSampleFormat) avCodecContext->sample_fmt,
                                        avCodecContext->sample_rate,
                                        av_get_default_channel_layout(preferedChannels),
                                        preferedSampleFMT, preferedSampleRate,
                                        0, NULL);
        if (!swrContext || swr_init(swrContext)) {
            if (swrContext)
                swr_free(&swrContext);
            return -1;
        }
    }
    if (avcodec_open2(avCodecContext, codec, NULL) < 0) {
        LOGI("Couldn't open codec");
        return -1;
    }
           

Next we write the stream header, then allocate the frame buffers.

if (avformat_write_header(avFormatContext, nullptr) != 0) {
        LOGI("Could not write header\n");
        return -1;
    }
    this->isWriteHeaderSuccess = true;
    this->alloc_avframe();
           

Frame allocation covers the input frame plus, when conversion is needed, the converted frame and its data buffers. For an audio frame, the fields that must be initialized by hand are:

  • The number of samples in the frame (per channel)
  • The sample format
  • The channel count
  • The sample rate
  • The frame's data buffer: allocate it, then bind it with avcodec_fill_audio_frame

int AudioEncoder::alloc_avframe() {
    int ret = 0;
    AVSampleFormat preferedSampleFMT = AV_SAMPLE_FMT_S16;
    int preferedChannels = audioChannels;
    int preferedSampleRate = audioSampleRate;
    input_frame = av_frame_alloc();
    if (!input_frame) {
        LOGI("Could not allocate audio frame\n");
        return -1;
    }
    input_frame->nb_samples = avCodecContext->frame_size;
    input_frame->format = preferedSampleFMT;
    input_frame->channel_layout = preferedChannels == 1 ? AV_CH_LAYOUT_MONO : AV_CH_LAYOUT_STEREO;
    input_frame->sample_rate = preferedSampleRate;

    buffer_size = av_samples_get_buffer_size(NULL, av_get_channel_layout_nb_channels(
            input_frame->channel_layout),
                                             input_frame->nb_samples, preferedSampleFMT, 0);
    samples = static_cast<uint8_t *>(av_malloc(buffer_size));
    samplesCursor = 0;
    if (!samples) {
        LOGI("Could not allocate %d bytes for samples buffer\n", buffer_size);
        return -1;
    }
    LOGI("allocate %d bytes for samples buffer\n", buffer_size);
    /* setup the data pointers in the AVFrame */
    // Bind the buffer to the AVFrame
    ret = avcodec_fill_audio_frame(input_frame,
                                   av_get_channel_layout_nb_channels(input_frame->channel_layout),
                                   preferedSampleFMT, samples, buffer_size, 0);
    if (ret < 0) {
        LOGI("Could not setup audio frame\n");
    }

    if (swrContext) {
        if (av_sample_fmt_is_planar(avCodecContext->sample_fmt)) {
            LOGI("Codec Context SampleFormat is Planar...");
        }
        /* allocate the conversion buffers */
        convert_data = (uint8_t **) calloc(avCodecContext->channels,
                                           sizeof(*convert_data));
        av_samples_alloc(convert_data, nullptr, avCodecContext->channels,
                         avCodecContext->frame_size,
                         avCodecContext->sample_fmt, 0);
        swrBufferSize = av_samples_get_buffer_size(NULL, avCodecContext->channels,
                                                   avCodecContext->frame_size,
                                                   avCodecContext->sample_fmt, 0);
        swrBuffer = (uint8_t *) av_malloc(swrBufferSize);
        LOGI("After av_malloc swrBuffer");
        swrFrame = av_frame_alloc();
        if (!swrFrame) {
            LOGI("Could not allocate swrFrame frame\n");
            return -1;
        }
        swrFrame->nb_samples = avCodecContext->frame_size;
        swrFrame->format = avCodecContext->sample_fmt;
        swrFrame->channel_layout =
                avCodecContext->channels == 1 ? AV_CH_LAYOUT_MONO : AV_CH_LAYOUT_STEREO;
        swrFrame->sample_rate = avCodecContext->sample_rate;
        ret = avcodec_fill_audio_frame(swrFrame, avCodecContext->channels,
                                       avCodecContext->sample_fmt, (const uint8_t *) swrBuffer,
                                       swrBufferSize, 0);
        LOGI("After avcodec_fill_audio_frame");
        if (ret < 0) {
            LOGI("avcodec_fill_audio_frame error ");
            return -1;
        }
    }

    return ret;
}
           

Now for the encoding itself: we read data from the PCM file and encode it, converting first when the input format is unsupported, then write the encoded output to the file. The main flow is to initialize an AVPacket, which holds the encoded data; the fields to set by hand are:

  • The stream index
  • The data pointer
  • The data size

Then we call avcodec_encode_audio2 to encode, and av_interleaved_write_frame to write to the file.

void AudioEncoder::encodePacket() {
    int ret, got_output;
    AVPacket pkt;
    av_init_packet(&pkt);
    AVFrame *encode_frame;
    if (swrContext) {
        long long beginSWRTimeMills = getCurrentTime();
        swr_convert(swrContext, convert_data, avCodecContext->frame_size,
                    (const uint8_t **) input_frame->data, avCodecContext->frame_size);
        int length =
                avCodecContext->frame_size * av_get_bytes_per_sample(avCodecContext->sample_fmt);
        for (int k = 0; k < 2; ++k) {   // copy each (stereo) plane into the converted frame
            for (int j = 0; j < length; ++j) {
                swrFrame->data[k][j] = convert_data[k][j];
            }
        }
        totalSWRTimeMills += (getCurrentTime() - beginSWRTimeMills);
        encode_frame = swrFrame;
    } else {
        encode_frame = input_frame;
    }
    encode_frame->pts = frameIndex++;
    pkt.stream_index = audioStream->index;
//    pkt.duration = (int) AV_NOPTS_VALUE;
//    pkt.pts = pkt.dts = 0;
    pkt.data = samples;
    pkt.size = buffer_size;

    ret = avcodec_encode_audio2(avCodecContext, &pkt, encode_frame, &got_output);
    if (ret < 0) {
        LOGI("Error encoding audio frame\n");
        return;
    }
    if (got_output) {
        if (avCodecContext->coded_frame && avCodecContext->coded_frame->pts != AV_NOPTS_VALUE) {
            pkt.pts = av_rescale_q(avCodecContext->coded_frame->pts, avCodecContext->time_base,
                                   audioStream->time_base);
        }
        // Mark the packet as containing a key frame
        pkt.flags |= AV_PKT_FLAG_KEY;
        this->duration += (pkt.duration * av_q2d(audioStream->time_base));

        // This call writes the packet with correct interleaving. If the caller cannot
        // guarantee that packets from the individual streams are interleaved correctly,
        // prefer this function; otherwise av_write_frame can be used for better performance.
        int writeCode = av_interleaved_write_frame(avFormatContext, &pkt);
    }

    av_free_packet(&pkt);
}
           

When everything is done, destory() must call av_write_trailer(avFormatContext) to write the file trailer. That completes software encoding with FFmpeg.
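
As a rough sketch, assuming the member names from the snippets above (the cleanup order here is mine, not the book's code), the teardown might look like:

void AudioEncoder::destory() {
    // Free the conversion buffers first, if resampling was used
    if (nullptr != swrBuffer) {
        free(swrBuffer);
        swrBuffer = nullptr;
    }
    if (nullptr != swrContext) {
        swr_free(&swrContext);
    }
    if (nullptr != convert_data) {
        av_freep(&convert_data[0]);   // frees the planes from av_samples_alloc
        free(convert_data);
    }
    if (nullptr != swrFrame) {
        av_free(swrFrame);
    }
    if (nullptr != input_frame) {
        av_free(input_frame);
    }
    if (nullptr != samples) {
        av_free(samples);
    }
    // Only write the trailer if the header was written successfully;
    // otherwise av_write_trailer would operate on an unopened muxer
    if (this->isWriteHeaderSuccess) {
        av_write_trailer(avFormatContext);
    }
    if (nullptr != avCodecContext) {
        avcodec_close(avCodecContext);
    }
    if (nullptr != avFormatContext) {
        avio_close(avFormatContext->pb);
        avformat_free_context(avFormatContext);
    }
}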

While running the encoder I ran into a few issues. I'm not sure whether they come from my own configuration, so I'd appreciate it if someone more experienced could weigh in.

1. The AVPacket's pts only increments because encode_frame's pts is incremented each loop; if encode_frame's pts is left unset, the AVPacket's pts stays at a constant initial value.

2. The avCodecContext's time_base and frame_size don't need to be set manually, yet the book's source sets them. Meanwhile audioStream's time_base stays at the default 1/90000 after initialization and has to be set to 1/44100 by hand. Is manual assignment the only option, or is there an API that derives it automatically?
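
For reference, the manual assignment described above would be something like this sketch (assuming the field names used earlier):

// Make the stream's time base match the audio sample rate, e.g. 1/44100,
// so pts values rescaled in encodePacket() come out in sample units.
audioStream->time_base = AVRational{1, audioSampleRate};
avCodecContext->time_base = AVRational{1, audioSampleRate};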

Encoding AAC with MediaCodec

First initialize MediaCodec from the sample rate, bit rate, channel count, and so on. These parameters are kept in a map, and the keys alone make their meanings fairly clear. Then create a MediaCodec for the matching MIME type and configure it with the format and the encoder flag.

init {
        val encodeFormat: MediaFormat = MediaFormat.createAudioFormat(MINE_TYPE, sampleRate, channels)
        encodeFormat.setInteger(MediaFormat.KEY_BIT_RATE, bitRate)
        encodeFormat.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC)
        encodeFormat.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, 1024 * 1024) // 1 MB max input size (assumed value)
        mediaCodec = MediaCodec.createEncoderByType(MINE_TYPE)
        mediaCodec?.run {
            configure(encodeFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
            start()
            // "AudioEncoder" stands in for the enclosing class name
            this@AudioEncoder.inputBuffers = inputBuffers
            this@AudioEncoder.outputBuffers = outputBuffers
        }.ifNull {
            Log.e("problem", "create mediaEncode failed")
            return@ifNull
        }
    }
           

The diagram below shows how MediaCodec works:

[Figure: MediaCodec input and output buffer queues]

In short, the input and output are separate queues: we enqueue PCM into the input queue and, once encoding completes, fetch the AAC data from the output queue. So it's the usual routine of reading PCM from the file in a loop and feeding it into MediaCodec for encoding (a sketch of such a read loop follows the snippet below). The main thing to watch is the position bookkeeping of the ByteBuffers; the rest is straightforward.

fun fireAudio(data: ByteArray, len: Int) {
        mediaCodec?.run {
            val inputBufferIndex = dequeueInputBuffer(-1)
            if (inputBufferIndex >= 0) {
                val inputBuffer = this@AudioEncoder.inputBuffers!![inputBufferIndex]
                inputBuffer.clear()
                inputBuffer.put(data, 0, len)
                // note: presentationTimeUs expects microseconds, not nanoseconds
                queueInputBuffer(inputBufferIndex, 0, len, System.nanoTime(), 0)
            }
            val bufferInfo = MediaCodec.BufferInfo()
            var outputBufferIndex = dequeueOutputBuffer(bufferInfo, 0)

            while (outputBufferIndex >= 0) {
                val outputBuffer = this@AudioEncoder.outputBuffers!![outputBufferIndex]
                outputAACDelegate?.run {
                    val outPacketSize = bufferInfo.size + 7 // 7 extra bytes for the ADTS header
                    outputBuffer.position(bufferInfo.offset)
                    outputBuffer.limit(bufferInfo.offset + bufferInfo.size)
                    val outData = ByteArray(outPacketSize)
                    addADTStoPacket(outData, outPacketSize) // add the ADTS header; code below
                    outputBuffer.get(outData, 7, bufferInfo.size)
                    // reset the position once reading is done
                    outputBuffer.position(bufferInfo.offset)
                    // write to file
                    outputAACDelegate?.outputAACPacket(outData)
                }
                releaseOutputBuffer(outputBufferIndex, false)
                outputBufferIndex = dequeueOutputBuffer(bufferInfo, 0)
            }
        }
    }
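
The read loop that drives fireAudio isn't shown in the post. A minimal sketch might look like the following, where encodePCMFile is a hypothetical helper and the chunk size and file handling are my assumptions:

import java.io.FileInputStream

// Hypothetical driver: stream a PCM file into the encoder chunk by chunk.
fun encodePCMFile(pcmPath: String) {
    val buffer = ByteArray(8192) // arbitrary chunk size
    FileInputStream(pcmPath).use { input ->
        while (true) {
            val len = input.read(buffer)
            if (len <= 0) break       // end of file
            fireAudio(buffer, len)    // feed one chunk into MediaCodec
        }
    }
}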
           

One more thing to note: MediaCodec outputs raw AAC data, so we must prepend an ADTS header for a player to play the file directly. Using the stream parameters we build the 7-byte header, then write the whole packet to the file.

private fun addADTStoPacket(packet: ByteArray, packetLen: Int) {
        val profile = 2 // AAC LC
        val freqIdx = 4 // 44.1KHz
        val chanCfg = 2 // CPE (two channels)
        // Bytes 0-1: 12-bit syncword 0xFFF, MPEG ID, layer 00, protection absent
        packet[0] = 0xFF.toByte()
        packet[1] = 0xF9.toByte()
        // Byte 2: 2-bit profile, 4-bit sampling frequency index, high bit of channel config
        packet[2] = (((profile - 1) shl 6) + (freqIdx shl 2) + (chanCfg shr 2)).toByte()
        // Byte 3: low 2 bits of channel config, top 2 bits of the 13-bit frame length
        packet[3] = (((chanCfg and 3) shl 6) + (packetLen shr 11)).toByte()
        // Bytes 4-5: middle and low bits of the frame length, start of buffer fullness
        packet[4] = ((packetLen and 0x7FF) shr 3).toByte()
        packet[5] = (((packetLen and 7) shl 5) + 0x1F).toByte()
        // Byte 6: rest of buffer fullness, number of raw data blocks minus one
        packet[6] = 0xFC.toByte()
    }
           

For more detail on the ADTS header, see this article:

https://blog.csdn.net/jay100500/article/details/52955232

Overall, hardware encoding is fairly simple and fast. If Android device compatibility isn't a concern, it has real advantages.

With that, the audio part of Chapter 6 is finally done; next up is the video encoding part.

Source code