Previous review
The first two articles, Android hard codec MediaCodec analysis - starting from the story of the pork restaurant (1) and Android hard codec MediaCodec analysis - starting from the story of the pork restaurant (2), took you from the story of the pork restaurant through the workflow and the concrete code of MediaCodec, the Android platform's hardware codec tool. That analysis, however, was static: we still haven't seen what the actual decoding process of MediaCodec looks like at runtime. So today let's get the code "moving" and gain a deeper grasp of the MediaCodec decoding process through logs and some auxiliary code.
If you haven't read the first two blog posts, I still recommend taking a look, because this article builds directly on them.
Log analysis of the running code
First click on the first item:
Enter this screen:
Take a look at the Log at this time:
The log is printed in com.android.grafika.PlayMovieActivity; note the line "SurfaceTexture ready (984x1384)":
@Override
public void onSurfaceTextureAvailable(SurfaceTexture st, int width, int height) {
    // There's a short delay between the start of the activity and the initialization
    // of the SurfaceTexture that backs the TextureView. We don't want to try to
    // send a video stream to the TextureView before it has initialized, so we disable
    // the "play" button until this callback fires.
    Log.d(TAG, "SurfaceTexture ready (" + width + "x" + height + ")");
    mSurfaceTextureReady = true;
    updateControls();
}
Recall the overall flowchart drawn in Android hard codec MediaCodec analysis - starting from the story of the pork restaurant (2):
The callback method onSurfaceTextureAvailable tells us that the TextureView's SurfaceTexture has been initialized and rendering can start; only then is the play button made clickable. The "(984x1384)" in the log "SurfaceTexture ready (984x1384)" is the size of the TextureView.
Now gently tap the play button and the video starts to play:
The first log output is:
D/fuyao-Grafika: Extractor selected track 0 (video/avc): {track-id=1, level=32, mime=video/avc, profile=1, language=``` , color-standard=4, display-width=320, csd-1=java.nio.HeapByteBuffer[pos=0 lim=8 cap=8], color-transfer=3, durationUs=2033333, display-height=240, width=320, color-range=2, max-input-size=383, frame-rate=16, height=240, csd-0=java.nio.HeapByteBuffer[pos=0 lim=38 cap=38]}
It is printed when MediaExtractor selects the media track, and it shows the format information of the current video track:
/**
 * Selects the video track, if any.
 *
 * @return the track index, or -1 if no video track is found.
 */
private static int selectTrack(MediaExtractor extractor) {
    // Select the first video track we find, ignore the rest.
    // How many tracks the current media file has (video track, audio track, subtitle track, etc.)
    int numTracks = extractor.getTrackCount();
    for (int i = 0; i < numTracks; i++) {
        // MediaFormat of the i-th track
        MediaFormat format = extractor.getTrackFormat(i);
        // MIME type of this format
        String mime = format.getString(MediaFormat.KEY_MIME);
        // Find the index of the video track
        if (mime.startsWith("video/")) {
            if (VERBOSE) {
                // Note this log line
                Log.d(TAG, "Extractor selected track " + i + " (" + mime + "): " + format);
            }
            return i;
        }
    }
    return -1;
}
Let's briefly explain a few key parameters in the log (a small sketch of reading them programmatically follows after this list):
1. level and profile in the log refer to the H.264 quality level; the explanation below is adapted from "H264 encoding profile & level control".
H.264 has four quality profiles, namely baseline, extended, main and high:
- Baseline profile: basic image quality. Supports I/P frames, only progressive scan and CAVLC.
- Extended profile: advanced image quality. Supports I/P/B/SP/SI frames, only progressive scan and CAVLC (rarely used).
- Main profile: mainstream image quality. Provides I/P/B frames, supports both progressive and interlaced scan, and supports both CAVLC and CABAC.
- High profile: high-end image quality. Adds 8x8 intra prediction, custom quantization, lossless video coding and more YUV formats on top of the main profile.

The H.264 Baseline, Extended and Main profiles all target video sequences with 8-bit samples in 4:2:0 (YUV) format. Under the same configuration, the High profile (HP) can reduce the bitrate by about 10% compared with the Main profile (MP). By application area, the Baseline profile is mostly used for real-time communication, the Main profile for streaming media, and the High profile for broadcasting and storage.
2. mime is video/avc; as the previous article already mentioned, video/avc means H.264.
3. color-standard: the color standard of the video:
/**
* An optional key describing the color primaries, white point and
* luminance factors for video content.
*
* The associated value is an integer: 0 if unspecified, or one of the
* COLOR_STANDARD_ values.
*/
public static final String KEY_COLOR_STANDARD = "color-standard";
/** BT.709 color chromacity coordinates with KR = 0.2126, KB = 0.0722. */
public static final int COLOR_STANDARD_BT709 = 1;
/** BT.601 625 color chromacity coordinates with KR = 0.299, KB = 0.114. */
public static final int COLOR_STANDARD_BT601_PAL = 2;
/** BT.601 525 color chromacity coordinates with KR = 0.299, KB = 0.114. */
public static final int COLOR_STANDARD_BT601_NTSC = 4;
/** BT.2020 color chromacity coordinates with KR = 0.2627, KB = 0.0593. */
public static final int COLOR_STANDARD_BT2020 = 6;
Recall from "Audio and Video Development Basics: YUV Color Coding" that RGB-to-YUV conversion has several different standards:
At present, decoded video is generally in YUV format, while graphics cards generally render RGB, so the YUV data has to be converted to RGB.
This brings us to the concept of Color Range. There are two types of Color Range, one is Full Range and the other is Limited Range. The R, G, and B values of Full Range are all in the range of 0~255. The value range of R, G, and B in Limited Range is 16~235.
For each Color Range, there are different conversion standards, and the common standards are mainly BT601 and BT709 (BT601 is the standard of SD, while BT709 is the standard of HD).
Here the color-standard of the video is 4, i.e. COLOR_STANDARD_BT601_NTSC, so the conversion standard is BT.601 525.
4. color-range: as mentioned in the reference above, the current color-range is 2; see the constant definitions in the Android documentation:
/** Limited range. Y component values range from 16 to 235 for 8-bit content.
* Cr, Cy values range from 16 to 240 for 8-bit content.
* This is the default for video content. */
public static final int COLOR_RANGE_LIMITED = 2;
/** Full range. Y, Cr and Cb component values range from 0 to 255 for 8-bit content. */
public static final int COLOR_RANGE_FULL = 1;
So the color-range of the current video is limited range.
There are many other parameters, but most of them are self-explanatory, so they will not be explained one by one.
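Purely as an illustration, here is a small sketch of how these parameters could be read programmatically from the selected track's MediaFormat. The constants come from the platform's MediaFormat and MediaCodecInfo.CodecProfileLevel classes; the helper method dumpTrackInfo is hypothetical (not part of the Grafika demo), and some keys (profile, level, color-standard, color-range) only exist on newer API levels and in some files, hence the containsKey checks:

// Illustrative helper: inspect a few fields of the selected video track's format.
// Needs android.media.MediaFormat, android.media.MediaCodecInfo, android.util.Log.
// "format" would be the result of extractor.getTrackFormat(selectTrack(extractor)).
private static void dumpTrackInfo(MediaFormat format) {
    String mime = format.getString(MediaFormat.KEY_MIME);   // "video/avc" means H.264
    int width = format.getInteger(MediaFormat.KEY_WIDTH);   // 320 in our log
    int height = format.getInteger(MediaFormat.KEY_HEIGHT); // 240 in our log
    if (format.containsKey(MediaFormat.KEY_PROFILE)) {
        // profile=1 corresponds to MediaCodecInfo.CodecProfileLevel.AVCProfileBaseline
        int profile = format.getInteger(MediaFormat.KEY_PROFILE);
        boolean baseline = profile == MediaCodecInfo.CodecProfileLevel.AVCProfileBaseline;
        Log.d(TAG, "profile=" + profile + " baseline=" + baseline);
    }
    if (format.containsKey(MediaFormat.KEY_LEVEL)) {
        // level=32 corresponds to MediaCodecInfo.CodecProfileLevel.AVCLevel2
        Log.d(TAG, "level=" + format.getInteger(MediaFormat.KEY_LEVEL));
    }
    if (format.containsKey(MediaFormat.KEY_COLOR_STANDARD)) {
        // 4 == MediaFormat.COLOR_STANDARD_BT601_NTSC (BT.601 525)
        Log.d(TAG, "color-standard=" + format.getInteger(MediaFormat.KEY_COLOR_STANDARD));
    }
    if (format.containsKey(MediaFormat.KEY_COLOR_RANGE)) {
        // 2 == MediaFormat.COLOR_RANGE_LIMITED (limited range)
        Log.d(TAG, "color-range=" + format.getInteger(MediaFormat.KEY_COLOR_RANGE));
    }
    Log.d(TAG, "track " + mime + " " + width + "x" + height);
}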
Look at the next log:
The first line is printed here:
// Get the index of an available input ByteBuffer
int inputBufIndex = decoder.dequeueInputBuffer(TIMEOUT_USEC);
// Get the corresponding input ByteBuffer by its index
ByteBuffer inputBuf = decoderInputBuffers[inputBufIndex];
Log.d(TAG, "decoderInputBuffers inputBuf:" + inputBuf + ",inputBufIndex:" + inputBufIndex);
This prints the state of the input buffer. As described in the previous article, it is like the raw pork buyer asking the chef whether an empty basket is available: within TIMEOUT_USEC microseconds the chef tells the buyer the basket's number, and the buyer then locates the empty basket by that number.
According to the log, it can be seen:
decoderInputBuffers inputBuf:java.nio.DirectByteBuffer[pos=0 lim=6291456 cap=6291456],inputBufIndex:2
The size of this empty buffer is 6291456 bytes (pos is the position of the current read/write pointer, lim is the limit, i.e. how many bytes can currently be read or written, and cap is its capacity), and inputBufIndex is 2, meaning this buffer sits at position 2 of MediaCodec's input buffer array.
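To make pos, lim and cap concrete, here is a tiny java.nio.ByteBuffer example (plain Java behaviour, unrelated to MediaCodec itself):

ByteBuffer buf = ByteBuffer.allocateDirect(8); // pos=0 lim=8 cap=8
buf.put(new byte[]{1, 2, 3});                  // pos=3 lim=8 cap=8: three bytes written
buf.flip();                                    // pos=0 lim=3: ready to read back what was written
int readable = buf.remaining();                // 3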
submitted frame 0 to dec, size=339
"frame 0" in this log is the first chunk of data read by MediaExtractor's readSampleData, i.e. the first frame; size=339 means the frame is 339 bytes, which is of course the compressed size.
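Putting the input side together, the feeding logic of the synchronous API looks roughly like the sketch below. It uses the same calls as the Grafika demo (dequeueInputBuffer, readSampleData, queueInputBuffer, advance), but it is a simplified illustration: in the real demo, input feeding and output draining are interleaved in a single loop, and older code indexes the decoderInputBuffers[] array instead of calling getInputBuffer():

boolean inputDone = false;
while (!inputDone) {
    int inputBufIndex = decoder.dequeueInputBuffer(TIMEOUT_USEC);
    if (inputBufIndex < 0) {
        continue; // no empty "basket" right now, ask again
    }
    ByteBuffer inputBuf = decoder.getInputBuffer(inputBufIndex); // API 21+
    int chunkSize = extractor.readSampleData(inputBuf, 0);       // one compressed frame, e.g. 339 bytes
    if (chunkSize < 0) {
        // No more samples: queue an empty buffer carrying the end-of-stream flag.
        decoder.queueInputBuffer(inputBufIndex, 0, 0, 0L,
                MediaCodec.BUFFER_FLAG_END_OF_STREAM);
        inputDone = true;
    } else {
        long presentationTimeUs = extractor.getSampleTime();
        decoder.queueInputBuffer(inputBufIndex, 0, chunkSize, presentationTimeUs, 0);
        extractor.advance(); // move the extractor to the next sample
    }
}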
The next log is output when fetching data, i.e. the customer asks the chef whether the pork is done:
D/fuyao-Graphics: dequeueOutputBuffer decoderBufferIndex:-1,mBufferInfo:android.media.MediaCodec$BufferInfo@fcbc6e2
D/fuyao-Grafika: no output from decoder available
Source of this log:
int outputBufferIndex = decoder.dequeueOutputBuffer(mBufferInfo, TIMEOUT_USEC);
Log.d(TAG, "dequeueOutputBuffer decoderBufferIndex:" + outputBufferIndex + ",mBufferInfo:" + mBufferInfo);
A decoderBufferIndex of -1 equals MediaCodec.INFO_TRY_AGAIN_LATER, i.e. there is no data available at the output yet; in other words, the chef tells the customer that the pork is not ready.
If you have read my earlier posts Parsing H264 video coding principles - starting from Son Ye-jin's movie (1) and Parsing H264 video coding principles - starting from Son Ye-jin's movie (2), you know that video encoding is a very complex process involving a large number of mathematical algorithms, so decoding is not simple either: you generally cannot put one frame of data into the input side and immediately get decoded data out of the output side.
As the later logs show, after many cycles of feeding data into the input side and trying to fetch data from the output side, the output side finally produced data 77 ms after data was first put into the input side:
Startup lag is a statistic the official demo already tracks: the time from when data is first queued on the input side to when data is first obtained from the output side.
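A minimal sketch of how such a statistic can be measured in the synchronous loop (the variable name firstInputTimeNsec is illustrative; the demo's own bookkeeping may differ):

long firstInputTimeNsec = -1;
// ... right after queueing the very first input buffer:
if (firstInputTimeNsec == -1) {
    firstInputTimeNsec = System.nanoTime();
}
// ... later, the first time dequeueOutputBuffer returns a valid index (>= 0):
if (firstInputTimeNsec != -1) {
    long startupLagMs = (System.nanoTime() - firstInputTimeNsec) / 1000000L;
    Log.d(TAG, "startup lag " + startupLagMs + " ms");
    firstInputTimeNsec = -1; // report it only once
}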
Next comes the log for actually retrieving the data:
decoderBufferIndex is 0, i.e. the decoded data is in buffer 0 of the output buffer array.
"ecoderOutputBuffers.length: 8" is a log I added myself to print the number of output buffers:
ByteBuffer[] decoderOutputBuffers = decoder.getOutputBuffers();
Log.d(TAG, "ecoderOutputBuffers.length:" + decoderOutputBuffers.length);
So the output buffer array has a size of 8 (in practice this value turns out not to be fixed).
outputBuffer: java.nio.DirectByteBuffer[pos=0 lim=115200 cap=115200] means the buffer's available data and capacity are both 115200 bytes. Every frame decoded afterwards has this same size, because the decoded data is the YUV data of one frame: the picture resolution is fixed and the YUV format is fixed, so the size is naturally the same (for 4:2:0 data, 320 x 240 x 1.5 = 115200 bytes).
The last dequeue log before the output side actually produces data deserves attention:
D/fuyao-Graphics:dequeueOutputBuffer decoderBufferIndex:-2,mBufferInfo:android.media.MediaCodec$BufferInfo@9bec00c
D/fuyao-Grafika: decoder output format changed: {crop-right=319, color-format=21, slice-height=240, image-data=java.nio.HeapByteBuffer[pos=0 lim=104 cap=104], mime=video/raw, stride=320, color-standard=4, color-transfer=3, crop-bottom=239, crop-left=0, width=320, color-range=2, crop-top=0, height=240}
decoderBufferIndex is -2, which is MediaCodec.INFO_OUTPUT_FORMAT_CHANGED. Before the decoded data becomes available, the codec notifies us that the output format has changed, and this is where we can obtain the format of the output data.
1. crop-left=0, crop-right=319, crop-top=0, crop-bottom=239 are the coordinates of the four corners of the real video area within the entire video frame.
Some readers may ask: doesn't the video fill the entire frame? Not necessarily; take a look at the explanation on the official site developer.android.google.cn/reference/a...:
The MediaFormat#KEY_WIDTH and MediaFormat#KEY_HEIGHT keys specify the size of the video frames; however, for most encodings the video (picture) only occupies a portion of the video frame. This is represented by the 'crop rectangle'.
You need to use the following keys to get the crop rectangle of raw output images from the output format. If these keys are not present, the video occupies the entire video frame. The crop rectangle is understood in the context of the output frame before applying any rotation.
The meaning of each key:
| Format Key | Type | Description |
| --- | --- | --- |
| MediaFormat#KEY_CROP_LEFT | Integer | The left-coordinate (x) of the crop rectangle |
| MediaFormat#KEY_CROP_TOP | Integer | The top-coordinate (y) of the crop rectangle |
| MediaFormat#KEY_CROP_RIGHT | Integer | The right-coordinate (x) MINUS 1 of the crop rectangle |
| MediaFormat#KEY_CROP_BOTTOM | Integer | The bottom-coordinate (y) MINUS 1 of the crop rectangle |
The official site also gives code for calculating the valid video area from these four values:
MediaFormat format = decoder.getOutputFormat(…);
int width = format.getInteger(MediaFormat.KEY_WIDTH);
if (format.containsKey(MediaFormat.KEY_CROP_LEFT)
        && format.containsKey(MediaFormat.KEY_CROP_RIGHT)) {
    width = format.getInteger(MediaFormat.KEY_CROP_RIGHT) + 1
            - format.getInteger(MediaFormat.KEY_CROP_LEFT);
}
int height = format.getInteger(MediaFormat.KEY_HEIGHT);
if (format.containsKey(MediaFormat.KEY_CROP_TOP)
        && format.containsKey(MediaFormat.KEY_CROP_BOTTOM)) {
    height = format.getInteger(MediaFormat.KEY_CROP_BOTTOM) + 1
            - format.getInteger(MediaFormat.KEY_CROP_TOP);
}
2. color-format: the color encoding format. 21 is COLOR_FormatYUV420SemiPlanar, often called NV21. The YUV formats were described in "Audio and Video Development Basics: YUV Color Coding", but that article did not specifically cover NV21. NV21 is a semi-planar format: the Y samples are stored in one plane on their own, and the chroma samples are stored in a second plane, interleaved with V first and then U (picture from: Brief analysis of YUV color space).
For example, for a 4x4 picture the layout is as follows:
- Y Y Y Y
- Y Y Y Y
- Y Y Y Y
- Y Y Y Y
- V U V U
- V U V U
3. slice-height: the height of a frame in rows. This row count may be padded for memory alignment; sometimes, to speed up access, the frame height is padded upwards (for example to a multiple of 16), so slice-height can be larger than the visible height.
4. stride: the span, a concept in image storage. It is the amount of memory occupied by each row of pixels when the image is stored. This value is also memory-aligned, so it is greater than or equal to the number of pixels per row of the original video. Ignoring the stride is the root cause of many "garbled picture" problems (see the sketch below).
Other parameters have already been mentioned above, so I will not repeat them.
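Since stride and slice-height are the usual culprits behind garbled pictures, here is a small sketch (an illustrative helper, not part of the demo) of how a tightly packed semi-planar 4:2:0 frame could be copied out of a stride-aligned output buffer. It assumes the layout described above, that the buffer starts at offset 0, and that stride and sliceHeight come from the decoder's output format:

// Copy a stride-aligned semi-planar YUV 4:2:0 frame into a tightly packed byte array.
// width/height: visible picture size; stride/sliceHeight: aligned values from the output format.
private static byte[] packFrame(ByteBuffer src, int width, int height, int stride, int sliceHeight) {
    byte[] packed = new byte[width * height * 3 / 2]; // e.g. 320 * 240 * 1.5 = 115200 bytes
    int dst = 0;
    // Luma plane: each row occupies 'stride' bytes, but only 'width' of them are real pixels.
    for (int row = 0; row < height; row++) {
        src.position(row * stride);
        src.get(packed, dst, width);
        dst += width;
    }
    // Interleaved chroma plane starts after sliceHeight full luma rows.
    int chromaStart = stride * sliceHeight;
    for (int row = 0; row < height / 2; row++) {
        src.position(chromaStart + row * stride);
        src.get(packed, dst, width);
        dst += width;
    }
    return packed;
}

If stride equals width and sliceHeight equals height, this degenerates into a plain bulk copy. Our 320x240 example happens to satisfy that (stride=320, slice-height=240), which is why the simple outputBuffer.get(ba) used later in this article works here.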
When the decoded output data is to be displayed, it is rendered to the Surface via releaseOutputBuffer:
// Render the outputBufferIndex-th buffer of the output buffer array to the surface.
// If doRender is true, it is drawn to the configured surface.
decoder.releaseOutputBuffer(outputBufferIndex, doRender);
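For reference, the output-side handling of the synchronous loop can be sketched like this; it mirrors the return values we saw in the logs (-1, -2, -3 and indices >= 0). Again this is a simplified illustration: in the real demo it is interleaved with the input feeding shown earlier, and outputDone is the flag that ends the big loop:

int outputBufferIndex = decoder.dequeueOutputBuffer(mBufferInfo, TIMEOUT_USEC);
if (outputBufferIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
    // -1: nothing decoded yet ("the pork is not done"); go back and feed more input.
} else if (outputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
    // -2: the output format is now known or has changed.
    MediaFormat newFormat = decoder.getOutputFormat();
    Log.d(TAG, "decoder output format changed: " + newFormat);
} else if (outputBufferIndex == MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED) {
    // -3: only relevant for the deprecated getOutputBuffers() path.
    decoderOutputBuffers = decoder.getOutputBuffers();
} else if (outputBufferIndex >= 0) {
    if ((mBufferInfo.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
        // The EOS flag queued with the last input frame has come back out: decoding is finished.
        outputDone = true;
    }
    boolean doRender = mBufferInfo.size != 0;
    // Rendering (or discarding) the buffer returns it to the codec.
    decoder.releaseOutputBuffer(outputBufferIndex, doRender);
}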
We can see that the log for the last frame of output is:
output EOS
When calling:
int outputBufferIndex = decoder.dequeueOutputBuffer(mBufferInfo, TIMEOUT_USEC);
If the returned mBufferInfo.flags contains MediaCodec.BUFFER_FLAG_END_OF_STREAM (the flag the input side queued together with the last frame of the video), this frame is the last frame of the video; we then break out of the big decoding loop and prepare to release resources:
finally {
    // release everything we grabbed
    if (decoder != null) {
        // Call stop() to return the codec to the Uninitialized state, whereupon it may be configured again.
        decoder.stop();
        decoder.release();
        decoder = null;
    }
    if (extractor != null) {
        extractor.release();
        extractor = null;
    }
}
Recall the MediaCodec state machine mentioned in Android hard codec MediaCodec analysis - starting from the story of the pork restaurant (1):
First call stop(), which returns the codec to the Uninitialized state, i.e. the pork restaurant cleans up the tables and chairs; after that, call release() to free the resources, i.e. the pork restaurant closes down.
Save the decoded output data
Next comes something interesting: saving each decoded output frame as a picture.
Create a method that receives one frame of output data, converts the YUV data to JPEG via the system-provided YuvImage, decodes the JPEG into a Bitmap with BitmapFactory.decodeByteArray, and then saves it to a local folder.
private void outputFrameAsPic(byte[] ba, int i) {
    Log.d(TAG, "outputBuffer i:" + i);
    YuvImage yuvImage = new YuvImage(ba, ImageFormat.NV21, mVideoWidth, mVideoHeight, null);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    // Convert the YUV frame to JPEG
    yuvImage.compressToJpeg(new Rect(0, 0, mVideoWidth, mVideoHeight), 100, baos);
    byte[] jdata = baos.toByteArray(); // JPEG-encoded bytes
    Bitmap bmp = BitmapFactory.decodeByteArray(jdata, 0, jdata.length);
    if (bmp != null) {
        try {
            File parent = new File(Environment.getExternalStorageDirectory().getAbsolutePath() + "/moviePlayer/");
            if (!parent.exists()) {
                parent.mkdirs();
            }
            // Note: the data written below is JPEG even though the file name ends with .png
            File myCaptureFile = new File(parent.getAbsolutePath(), String.format("img%s.png", i));
            if (!myCaptureFile.exists()) {
                myCaptureFile.createNewFile();
            }
            BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(myCaptureFile));
            bmp.compress(Bitmap.CompressFormat.JPEG, 80, bos);
            Log.d(TAG, "bmp.compress myCaptureFile:" + myCaptureFile.getAbsolutePath());
            bos.flush();
            bos.close();
        } catch (Exception e) {
            e.printStackTrace();
            Log.d(TAG, "outputFrameAsPic Exception:" + e);
        }
    }
}
Then call this method every time an output buffer is obtained:
ByteBuffer outputBuffer = decoderOutputBuffers[outputBufferIndex];
Log.d(TAG, "outputBuffer:" + outputBuffer);
outputBuffer.position(mBufferInfo.offset);
outputBuffer.limit(mBufferInfo.offset + mBufferInfo.size);
byte[] ba = new byte[outputBuffer.remaining()];
// Copy the ByteBuffer's data into ba
outputBuffer.get(ba);
// Save the output frame as a local picture
outputFrameAsPic(ba, decodeFrameIndex);
Then run the program and get the following picture:
As you can see, every frame is successfully captured and saved locally~
Synchronous and asynchronous modes
Finally, MediaCodec can work in either synchronous or asynchronous mode (asynchronous mode has been supported since Android 5.0). The decoding code involving the raw pork buyer and the customer, analyzed in Android hard codec MediaCodec analysis - starting from the story of the pork restaurant (2), is synchronous. Synchronous is, of course, relative to asynchronous: the biggest difference between the two, in my view, is that in synchronous mode we have to actively ask MediaCodec whether a buffer is available, while in asynchronous mode MediaCodec notifies us when a buffer becomes available. It is as if the pork buyer used to actively ask the chef whether an empty basket was available, whereas now the chef sends the buyer a WeChat message saying an empty basket is ready.
In asynchronous mode, MediaCodec's workflow differs slightly from the synchronous one:
In asynchronous mode, the codec goes from Configured directly into the Running state, and we then wait for MediaCodec's callbacks before processing data. The following is the official code template:
MediaCodec codec = MediaCodec.createByCodecName(name);
MediaFormat mOutputFormat; // member variable
codec.setCallback(new MediaCodec.Callback() {
    @Override
    void onInputBufferAvailable(MediaCodec mc, int inputBufferId) {
        ByteBuffer inputBuffer = codec.getInputBuffer(inputBufferId);
        // fill inputBuffer with valid data
        …
        codec.queueInputBuffer(inputBufferId, …);
    }

    @Override
    void onOutputBufferAvailable(MediaCodec mc, int outputBufferId, …) {
        ByteBuffer outputBuffer = codec.getOutputBuffer(outputBufferId);
        MediaFormat bufferFormat = codec.getOutputFormat(outputBufferId); // option A
        // bufferFormat is equivalent to mOutputFormat
        // outputBuffer is ready to be processed or rendered.
        …
        codec.releaseOutputBuffer(outputBufferId, …);
    }

    @Override
    void onOutputFormatChanged(MediaCodec mc, MediaFormat format) {
        // Subsequent data will conform to new format.
        // Can ignore if using getOutputFormat(outputBufferId)
        mOutputFormat = format; // option B
    }

    @Override
    void onError(…) {
        …
    }
});
codec.configure(format, …);
mOutputFormat = codec.getOutputFormat(); // option B
codec.start();
// wait for processing to complete
codec.stop();
codec.release();
Summary
This article ran the code analyzed in the previous articles and walked through the details of the decoding process via the logs, so that you can have a clearer understanding of the decoding flow. Each decoded frame was also saved locally as an image, which verifies that the data obtained at the output side of video decoding does indeed represent one frame.
Finally, I briefly covered MediaCodec's asynchronous mode.
Good times always pass quickly: before I knew it, I had already spent three blog posts on MediaCodec, and I can't wait to move on to the next series, the OpenGL series.
After successful decoding, the frames have to be rendered to the screen, and the most mainstream rendering tool on the Android platform today is OpenGL.
Author: Cat in the Peninsula Tin Box. Link: https://juejin.cn/post/7113767096512675870. Source: Rare Earth Nuggets (Juejin). The copyright belongs to the author. For commercial reproduction, please contact the author for authorization; for non-commercial reproduction, please indicate the source.