Chapter 6 covers a lot of ground; this time we study software and hardware encoding of audio and video.
Encoding with libfdk_aac
The book's sample code encodes through the FFmpeg API, though you could also build libfdk_aac separately and call it directly. First we initialize parameters such as the sample rate, channel count, and bit rate.
Then we set up the output file, letting FFmpeg probe the container format automatically from the file name.
int ret;
av_register_all();
avFormatContext = avformat_alloc_context();
LOGI("aacFilePath is %s ", aacFilePath);
// Alternative: probe the format first with av_guess_format,
// then assign it to avFormatContext:
// AVOutputFormat *fmt = av_guess_format(NULL, aacFilePath, NULL);
// avFormatContext->oformat = fmt;
// Or let FFmpeg infer the output format directly from the file name:
if ((ret = avformat_alloc_output_context2(&avFormatContext, nullptr, nullptr, aacFilePath)) != 0) {
    LOGI("avFormatContext alloc failed : %s", av_err2str(ret));
    return -1;
}
if ((ret = avio_open2(&avFormatContext->pb, aacFilePath, AVIO_FLAG_WRITE, nullptr, nullptr)) != 0) {
    LOGI("Could not avio open fail %s", av_err2str(ret));
    return -1;
}
Next we create an audio stream and obtain its codec context.
AVCodec *codec;
AVSampleFormat preferedSampleFMT = AV_SAMPLE_FMT_S16;
int preferedChannels = audioChannels;
int preferedSampleRate = audioSampleRate;
audioStream = avformat_new_stream(avFormatContext, nullptr);
audioStream->id = 1;
avCodecContext = audioStream->codec;
Configure the codec context. The main parameters to set are:
- the codec type
- the sample rate
- the bit rate
- the codec id, taken from the oformat probed earlier
- the sample format
- the channel layout and channel count
The AAC profiles are roughly the following:
- MPEG-2 AAC LC: Low Complexity
- MPEG-2 AAC Main
- MPEG-2 AAC SSR: Scalable Sample Rate
- MPEG-4 AAC LC: Low Complexity; the audio track inside the MP4 files common on today's phones is usually this profile
- MPEG-4 AAC Main
- MPEG-4 AAC SSR: Scalable Sample Rate
- MPEG-4 AAC LTP: Long Term Prediction
- MPEG-4 AAC LD: Low Delay
- MPEG-4 AAC HE: High Efficiency
avCodecContext->codec_type = AVMEDIA_TYPE_AUDIO;
avCodecContext->sample_rate = audioSampleRate;
if (publishBitRate > 0) {
    avCodecContext->bit_rate = publishBitRate;
} else {
    avCodecContext->bit_rate = PUBLISH_BITE_RATE;
}
avCodecContext->codec_id = avFormatContext->oformat->audio_codec;
avCodecContext->sample_fmt = preferedSampleFMT;
LOGI("audioChannels is %d", audioChannels);
avCodecContext->channel_layout =
        preferedChannels == 1 ? AV_CH_LAYOUT_MONO : AV_CH_LAYOUT_STEREO;
avCodecContext->channels = av_get_channel_layout_nb_channels(avCodecContext->channel_layout);
// select the AAC profile to encode
avCodecContext->profile = FF_PROFILE_AAC_LOW;
LOGI("avCodecContext->channels is %d", avCodecContext->channels);
if (avFormatContext->oformat->flags & AVFMT_GLOBALHEADER) {
    avCodecContext->flags |= CODEC_FLAG_GLOBAL_HEADER;
}
Next we find the matching encoder, query the sample formats and sample rates it supports, and filter accordingly. The book's code simply falls back to the first supported sample format if the preferred one is unavailable, and picks the closest supported sample rate.
// look the encoder up by the codec id probed earlier
codec = avcodec_find_encoder(avCodecContext->codec_id);
if (!codec) {
    LOGI("Couldn't find a valid audio codec");
    return -1;
}
if (codec->sample_fmts) {
    /* check if the prefered sample format for this codec is supported.
     * this is because, depending on the version of libav, and with the whole ffmpeg/libav fork situation,
     * you have various implementations around. float samples in particular are not always supported.
     */
    const enum AVSampleFormat *p = codec->sample_fmts;
    for (; *p != AV_SAMPLE_FMT_NONE; p++) {
        if (*p == audioStream->codec->sample_fmt)
            break;
    }
    if (*p == AV_SAMPLE_FMT_NONE) {
        LOGI("sample format incompatible with codec. Defaulting to a format known to work.........");
        avCodecContext->sample_fmt = codec->sample_fmts[0];
    }
}
// find the closest supported sample rate
if (codec->supported_samplerates) {
    const int *p = codec->supported_samplerates;
    int best = 0;
    int best_dist = INT_MAX;
    for (; *p; p++) {
        int dist = abs(audioStream->codec->sample_rate - *p);
        if (dist < best_dist) {
            best_dist = dist;
            best = *p;
        }
    }
    /* best is the closest supported sample rate (same as selected if best_dist == 0) */
    avCodecContext->sample_rate = best;
}
Next, if the PCM we feed in does not match a format the encoder supports, we have to resample: initialize a SwrContext, and finally open the codec.
// set up resampling
if (preferedChannels != avCodecContext->channels
    || preferedSampleRate != avCodecContext->sample_rate
    || preferedSampleFMT != avCodecContext->sample_fmt) {
    LOGI("channels is {%d, %d}", preferedChannels, audioStream->codec->channels);
    LOGI("sample_rate is {%d, %d}", preferedSampleRate, audioStream->codec->sample_rate);
    LOGI("sample_fmt is {%d, %d}", preferedSampleFMT, audioStream->codec->sample_fmt);
    LOGI("AV_SAMPLE_FMT_S16P is %d AV_SAMPLE_FMT_S16 is %d AV_SAMPLE_FMT_FLTP is %d",
         AV_SAMPLE_FMT_S16P, AV_SAMPLE_FMT_S16, AV_SAMPLE_FMT_FLTP);
    swrContext = swr_alloc_set_opts(NULL,
                                    av_get_default_channel_layout(avCodecContext->channels),
                                    (AVSampleFormat) avCodecContext->sample_fmt,
                                    avCodecContext->sample_rate,
                                    av_get_default_channel_layout(preferedChannels),
                                    preferedSampleFMT, preferedSampleRate,
                                    0, NULL);
    if (!swrContext || swr_init(swrContext)) {
        if (swrContext)
            swr_free(&swrContext);
        return -1;
    }
}
if (avcodec_open2(avCodecContext, codec, NULL) < 0) {
    LOGI("Couldn't open codec");
    return -1;
}
Next we write the stream header, then allocate the frame buffers.
if (avformat_write_header(avFormatContext, nullptr) != 0) {
    LOGI("Could not write header\n");
    return -1;
}
this->isWriteHeaderSuccess = true;
this->alloc_avframe();
Frame allocation covers the input frame and, when resampling is needed, the converted frame plus its data buffers. For an audio frame, the fields that must be set by hand are:
- the number of samples in the frame (per channel)
- the sample format
- the channel layout / channel count
- the sample rate
- the data buffer, which after allocation is bound to the frame with avcodec_fill_audio_frame
int AudioEncoder::alloc_avframe() {
    int ret = 0;
    AVSampleFormat preferedSampleFMT = AV_SAMPLE_FMT_S16;
    int preferedChannels = audioChannels;
    int preferedSampleRate = audioSampleRate;
    input_frame = av_frame_alloc();
    if (!input_frame) {
        LOGI("Could not allocate audio frame\n");
        return -1;
    }
    input_frame->nb_samples = avCodecContext->frame_size;
    input_frame->format = preferedSampleFMT;
    input_frame->channel_layout = preferedChannels == 1 ? AV_CH_LAYOUT_MONO : AV_CH_LAYOUT_STEREO;
    input_frame->sample_rate = preferedSampleRate;
    buffer_size = av_samples_get_buffer_size(NULL,
                                             av_get_channel_layout_nb_channels(input_frame->channel_layout),
                                             input_frame->nb_samples, preferedSampleFMT, 0);
    samples = static_cast<uint8_t *>(av_malloc(buffer_size));
    samplesCursor = 0;
    if (!samples) {
        LOGI("Could not allocate %d bytes for samples buffer\n", buffer_size);
        return -1;
    }
    LOGI("allocate %d bytes for samples buffer\n", buffer_size);
    /* setup the data pointers in the AVFrame */
    // bind the buffer to the AVFrame
    ret = avcodec_fill_audio_frame(input_frame,
                                   av_get_channel_layout_nb_channels(input_frame->channel_layout),
                                   preferedSampleFMT, samples, buffer_size, 0);
    if (ret < 0) {
        LOGI("Could not setup audio frame\n");
    }
    if (swrContext) {
        if (av_sample_fmt_is_planar(avCodecContext->sample_fmt)) {
            LOGI("Codec Context SampleFormat is Planar...");
        }
        /* allocate the conversion buffers */
        convert_data = (uint8_t **) calloc(avCodecContext->channels,
                                           sizeof(*convert_data));
        av_samples_alloc(convert_data, nullptr, avCodecContext->channels,
                         avCodecContext->frame_size,
                         avCodecContext->sample_fmt,
                         0);
        swrBufferSize = av_samples_get_buffer_size(NULL, avCodecContext->channels,
                                                   avCodecContext->frame_size,
                                                   avCodecContext->sample_fmt, 0);
        swrBuffer = (uint8_t *) av_malloc(swrBufferSize);
        LOGI("After av_malloc swrBuffer");
        swrFrame = av_frame_alloc();
        if (!swrFrame) {
            LOGI("Could not allocate swrFrame frame\n");
            return -1;
        }
        swrFrame->nb_samples = avCodecContext->frame_size;
        swrFrame->format = avCodecContext->sample_fmt;
        swrFrame->channel_layout =
                avCodecContext->channels == 1 ? AV_CH_LAYOUT_MONO : AV_CH_LAYOUT_STEREO;
        swrFrame->sample_rate = avCodecContext->sample_rate;
        ret = avcodec_fill_audio_frame(swrFrame, avCodecContext->channels,
                                       avCodecContext->sample_fmt, (const uint8_t *) swrBuffer,
                                       swrBufferSize, 0);
        LOGI("After avcodec_fill_audio_frame");
        if (ret < 0) {
            LOGI("avcodec_fill_audio_frame error ");
            return -1;
        }
    }
    return ret;
}
Now for the encoding itself. We read PCM data from the file, resample it first if the format requires it, encode it, and write the result to the output file. The flow starts by initializing an AVPacket, which holds the encoded data; the fields that need to be set by hand are:
- the stream index
- the data pointer
- the data size
Then we call avcodec_encode_audio2 to encode, and av_interleaved_write_frame to write the packet to the file.
void AudioEncoder::encodePacket() {
    int ret, got_output;
    AVPacket pkt;
    av_init_packet(&pkt);
    AVFrame *encode_frame;
    if (swrContext) {
        long long beginSWRTimeMills = getCurrentTime();
        swr_convert(swrContext, convert_data, avCodecContext->frame_size,
                    (const uint8_t **) input_frame->data, avCodecContext->frame_size);
        int length =
                avCodecContext->frame_size * av_get_bytes_per_sample(avCodecContext->sample_fmt);
        for (int k = 0; k < avCodecContext->channels; ++k) {
            for (int j = 0; j < length; ++j) {
                swrFrame->data[k][j] = convert_data[k][j];
            }
        }
        totalSWRTimeMills += (getCurrentTime() - beginSWRTimeMills);
        encode_frame = swrFrame;
    } else {
        encode_frame = input_frame;
    }
    encode_frame->pts = frameIndex++;
    pkt.stream_index = audioStream->index;
    // pkt.duration = (int) AV_NOPTS_VALUE;
    // pkt.pts = pkt.dts = 0;
    pkt.data = samples;
    pkt.size = buffer_size;
    ret = avcodec_encode_audio2(avCodecContext, &pkt, encode_frame, &got_output);
    if (ret < 0) {
        LOGI("Error encoding audio frame\n");
        return;
    }
    if (got_output) {
        if (avCodecContext->coded_frame && avCodecContext->coded_frame->pts != AV_NOPTS_VALUE) {
            pkt.pts = av_rescale_q(avCodecContext->coded_frame->pts, avCodecContext->time_base,
                                   audioStream->time_base);
        }
        // this packet contains a keyframe
        pkt.flags |= AV_PKT_FLAG_KEY;
        this->duration += (pkt.duration * av_q2d(audioStream->time_base));
        // av_interleaved_write_frame interleaves packets from the different streams
        // for us. If the caller can already guarantee correct interleaving across
        // streams, av_write_frame can be used instead for better performance.
        int writeCode = av_interleaved_write_frame(avFormatContext, &pkt);
    }
    av_free_packet(&pkt);
}
When encoding is finished, destory() has to call av_write_trailer(avFormatContext) to write the file trailer. With that, software encoding with FFmpeg is complete.
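The text only mentions destory() in passing, so here is a minimal sketch of what that teardown could look like. This is an assumption on my part, not the book's code: it reuses the member names from the snippets above and assumes the old AVStream-owned codec context API, where the trailer must be written before the IO context is closed.

```cpp
// Sketch only: member names (avFormatContext, swrContext, convert_data,
// samples, swrBuffer, input_frame, swrFrame, isWriteHeaderSuccess)
// mirror the allocation code above.
void AudioEncoder::destory() {
    if (isWriteHeaderSuccess) {
        av_write_trailer(avFormatContext);   // finalize the container
    }
    if (swrContext) {
        swr_free(&swrContext);
    }
    if (convert_data) {
        av_freep(&convert_data[0]);          // buffer from av_samples_alloc
        free(convert_data);
    }
    if (swrBuffer) {
        av_free(swrBuffer);                  // frames don't own these buffers
    }
    if (swrFrame) {
        av_frame_free(&swrFrame);
    }
    if (input_frame) {
        av_frame_free(&input_frame);
    }
    if (samples) {
        av_free(samples);
    }
    if (avCodecContext) {
        avcodec_close(avCodecContext);       // context is owned by the stream here
    }
    if (avFormatContext) {
        avio_close(avFormatContext->pb);
        avformat_free_context(avFormatContext);
    }
}
```

Because avcodec_fill_audio_frame only points the frames at externally allocated buffers, the buffers are freed separately from the frames.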
While running the encoder I noticed a couple of things; I am not sure whether they come from my own settings, and would appreciate any pointers.
1. The AVPacket pts I set is just an incrementing counter; if encode_frame's pts is not set, the AVPacket pts stays stuck at its initial value.
2. avCodecContext's time_base and frame_size do not need to be set manually (the book's code sets them anyway), but audioStream's time_base stays at the default 1/90000 after initialization and has to be set to 1/44100 by hand. Does this have to be done manually, or can some API derive it automatically?
Encoding AAC with MediaCodec
First we initialize the MediaCodec from the sample rate, bit rate, channel count, and so on. These parameters are all stored in a map, and the keys make their meaning fairly obvious. We then create a MediaCodec for the corresponding type and configure it with the format and the encoder flag.
init {
    val encodeFormat: MediaFormat = MediaFormat.createAudioFormat(MINE_TYPE, sampleRate, channels)
    encodeFormat.setInteger(MediaFormat.KEY_BIT_RATE, bitRate)
    encodeFormat.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC)
    // assumed value; size the input buffers to fit your PCM chunks
    encodeFormat.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, 100 * 1024)
    mediaCodec = MediaCodec.createEncoderByType(MINE_TYPE)
    mediaCodec?.run {
        configure(encodeFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
        start()
        // the label names the enclosing encoder class (name assumed here)
        this@AudioEncoder.inputBuffers = inputBuffers
        this@AudioEncoder.outputBuffers = outputBuffers
    }.ifNull {
        Log.e("problem", "create mediaEncode failed")
    }
}
The way MediaCodec works is shown in the figure.
In short, input and output use separate queues: we put PCM buffers into the input queue, and once encoding finishes we fetch the AAC data from the output queue. So it is the usual routine of continuously reading PCM from the file and feeding it into MediaCodec for encoding. The main thing to watch for is the ByteBuffer position bookkeeping; the rest is fairly simple.
fun fireAudio(data: ByteArray, len: Int) {
    mediaCodec?.run {
        val inputBufferIndex = dequeueInputBuffer(-1)
        if (inputBufferIndex >= 0) {
            val inputBuffer = this@AudioEncoder.inputBuffers!![inputBufferIndex]
            inputBuffer.clear()
            inputBuffer.put(data)
            queueInputBuffer(inputBufferIndex, 0, len, System.nanoTime(), 0)
        }
        val bufferInfo = MediaCodec.BufferInfo()
        var outputBufferIndex = dequeueOutputBuffer(bufferInfo, 0)
        while (outputBufferIndex >= 0) {
            val outputBuffer = this@AudioEncoder.outputBuffers!![outputBufferIndex]
            outputAACDelegate?.run {
                val outPacketSize = bufferInfo.size + 7 // reserve 7 bytes for the ADTS header
                outputBuffer.position(bufferInfo.offset)
                outputBuffer.limit(bufferInfo.offset + bufferInfo.size)
                val outData = ByteArray(outPacketSize)
                addADTStoPacket(outData, outPacketSize) // prepend ADTS; code shown below
                outputBuffer.get(outData, 7, bufferInfo.size)
                // reset the position after reading
                outputBuffer.position(bufferInfo.offset)
                // write to file
                outputAACDelegate?.outputAACPacket(outData)
            }
            releaseOutputBuffer(outputBufferIndex, false)
            outputBufferIndex = dequeueOutputBuffer(bufferInfo, 0)
        }
    }
}
One more thing to note: MediaCodec outputs raw AAC data, so we have to prepend an ADTS header so a player can play the file directly. We build the 7-byte header from the stream parameters and write it in front of each frame before writing to the file.
private fun addADTStoPacket(packet: ByteArray, packetLen: Int) {
    val profile = 2 // AAC LC
    val freqIdx = 4 // 44.1KHz
    val chanCfg = 2 // CPE
    packet[0] = 0xFF.toByte()
    packet[1] = 0xF9.toByte()
    packet[2] = (((profile - 1) shl 6) + (freqIdx shl 2) + (chanCfg shr 2)).toByte()
    packet[3] = (((chanCfg and 3) shl 6) + (packetLen shr 11)).toByte()
    packet[4] = ((packetLen and 0x7FF) shr 3).toByte()
    packet[5] = (((packetLen and 7) shl 5) + 0x1F).toByte()
    packet[6] = 0xFC.toByte()
}
For more detail on the ADTS header fields, see this article:
https://blog.csdn.net/jay100500/article/details/52955232
Overall, hardware encoding is fairly simple and fast; if Android compatibility is not a concern, it has real advantages.
That wraps up the audio part of chapter 6; next comes the video encoding part.
Source code