Original article: http://blog.csdn.net/dancing_night/article/details/45642493
1. Overview
These past few days I have been implementing simple audio transcoding with ffmpeg. While writing the code I noticed that audio is stored in an AVFrame much like video is, just somewhat more complex. My memory is not great, so I am writing this down here as a reference.
2. Two data members
AVFrame has two very important data members: data and linesize. data holds the raw, uncompressed samples (for both video and audio), and linesize holds the size in bytes of each line (or plane) of data. data is defined as follows:
/**
* pointer to the picture/channel planes.
* This might be different from the first allocated byte
*
* Some decoders access areas outside 0,0 - width,height, please
* see avcodec_align_dimensions2(). Some filters and swscale can read
* up to 16 bytes beyond the planes, if these filters are to be used,
* then 16 extra bytes must be allocated.
*/
uint8_t *data[AV_NUM_DATA_POINTERS];
linesize is defined as follows:
/**
* For video, size in bytes of each picture line.
* For audio, size in bytes of each plane.
*
* For audio, only linesize[0] may be set. For planar audio, each channel
* plane must be the same size.
*
* For video the linesizes should be multiples of the CPU's alignment
* preference, this is 16 or 32 for modern desktop CPUs.
* Some code requires such alignment other code can be slower without
* correct alignment, for yet other it makes no difference.
*
* @note The linesize may be larger than the size of usable data -- there
* may be extra padding present for performance reasons.
*/
int linesize[AV_NUM_DATA_POINTERS];
Note: for audio, only linesize[0] is meaningful, because all channel planes (e.g. left and right) must be the same size.
3. Storage layout
1. Video
Video is relatively simple. Taking YUV420 as an example, the image data is stored in the AVFrame like this:
data[0] holds the Y plane
data[1] holds the U plane
data[2] holds the V plane
and the corresponding sizes are:
linesize[0] is the size in bytes of one line of Y
linesize[1] is the size in bytes of one line of U
linesize[2] is the size in bytes of one line of V
2. Audio
Audio data is more complicated. Audio sample formats are divided into planar and packed (interleaved) types; the sample format definitions are:
/**
* Audio Sample Formats
*
* @par
* The data described by the sample format is always in native-endian order.
* Sample values can be expressed by native C types, hence the lack of a signed
* 24-bit sample format even though it is a common raw audio data format.
*
* @par
* The floating-point formats are based on full volume being in the range
* [-1.0, 1.0]. Any values outside this range are beyond full volume level.
*
* @par
* The data layout as used in av_samples_fill_arrays() and elsewhere in FFmpeg
* (such as AVFrame in libavcodec) is as follows:
*
* For planar sample formats, each audio channel is in a separate data plane,
* and linesize is the buffer size, in bytes, for a single plane. All data
* planes must be the same size. For packed sample formats, only the first data
* plane is used, and samples for each channel are interleaved. In this case,
* linesize is the buffer size, in bytes, for the 1 plane.
*/
enum AVSampleFormat {
AV_SAMPLE_FMT_NONE = -1,
AV_SAMPLE_FMT_U8, ///< unsigned 8 bits
AV_SAMPLE_FMT_S16, ///< signed 16 bits
AV_SAMPLE_FMT_S32, ///< signed 32 bits
AV_SAMPLE_FMT_FLT, ///< float
AV_SAMPLE_FMT_DBL, ///< double
AV_SAMPLE_FMT_U8P, ///< unsigned 8 bits, planar
AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar
AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar
AV_SAMPLE_FMT_FLTP, ///< float, planar
AV_SAMPLE_FMT_DBLP, ///< double, planar
AV_SAMPLE_FMT_NB ///< Number of sample formats. DO NOT USE if linking dynamically
};
The formats whose names end in P are the planar types; av_sample_fmt_is_planar() can be used to test whether a given format is planar.
Packed (non-planar) data first: for stereo (left/right) audio the samples are interleaved, e.g. LRLRLR... (for FFmpeg's default stereo channel layout the left channel comes first). All of the data lives in data[0], and its size is linesize[0].
Planar data is more like the video YUV case. For the same stereo PCM data, taking S16P as an example, the storage might be:
plane 0: LLLLLLLLLLLLLLLLLLLLLLLLLL...
plane 1: RRRRRRRRRRRRRRRRRRRR....
The corresponding storage is: data[0] holds plane 0 and data[1] holds plane 1.
Each plane has the same size, given by linesize[0]; the size of one plane can be computed as av_get_bytes_per_sample(out_stream->codec->sample_fmt) * out_frame->nb_samples.