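Preface
In the previous post I used AudioToolbox to encode audio data into AAC. This time the task is to encode video frames as H.264. Before iOS 8.0, the only way to hardware-encode H.264 on iOS was the AVAssetWriter workaround: let the system hardware encoder write the video frames into a local mp4 file, then write your own logic to read the SPS, PPS, and NALU data back out of the mp4 boxes, which meant frequent file reads and writes. In iOS 8.0 Apple introduced VideoToolbox with support for hardware encoding and decoding. It greatly improves development efficiency and gives developers direct access to the encoded data, so our video encoder is built on VideoToolbox.
Approach
Working with VideoToolbox usually centers on the VTCompressionSessionRef type, and the workflow is fairly simple. First, call VTCompressionSessionCreate to create an encoder instance: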
```objc
VT_EXPORT OSStatus
VTCompressionSessionCreate(
    CM_NULLABLE CFAllocatorRef allocator,
    int32_t width,
    int32_t height,
    CMVideoCodecType codecType,
    CM_NULLABLE CFDictionaryRef encoderSpecification,
    CM_NULLABLE CFDictionaryRef sourceImageBufferAttributes,
    CM_NULLABLE CFAllocatorRef compressedDataAllocator,
    CM_NULLABLE VTCompressionOutputCallback outputCallback,
    void * CM_NULLABLE outputCallbackRefCon,
    CM_RETURNS_RETAINED_PARAMETER CM_NULLABLE VTCompressionSessionRef * CM_NONNULL compressionSessionOut)
    API_AVAILABLE(macosx(10.8), ios(8.0), tvos(10.2));
```
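Once the instance exists, you can configure the encoder's output parameters. Commonly used properties include the frame rate, bitrate, GOP size, and profile level. Note that propertyValue is a Core Foundation type, so any value you create for it must be released manually: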
```objc
VT_EXPORT OSStatus
VTSessionSetProperty(
    CM_NONNULL VTSessionRef session,
    CM_NONNULL CFStringRef propertyKey,
    CM_NULLABLE CFTypeRef propertyValue)
    API_AVAILABLE(macosx(10.8), ios(8.0), tvos(10.2));
```
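After the properties are set, you can optionally call the following function so the encoder allocates the resources it needs before encoding starts: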
```objc
VT_EXPORT OSStatus
VTCompressionSessionPrepareToEncodeFrames(
    CM_NONNULL VTCompressionSessionRef session)
    API_AVAILABLE(macosx(10.9), ios(8.0), tvos(10.2));
```
With the preparation done, captured video frames can be fed to the encoder:
```objc
VT_EXPORT OSStatus
VTCompressionSessionEncodeFrame(
    CM_NONNULL VTCompressionSessionRef session,
    CM_NONNULL CVImageBufferRef imageBuffer,
    CMTime presentationTimeStamp,
    CMTime duration,
    CM_NULLABLE CFDictionaryRef frameProperties,
    void * CM_NULLABLE sourceFrameRefcon,
    VTEncodeInfoFlags * CM_NULLABLE infoFlagsOut)
    API_AVAILABLE(macosx(10.8), ios(8.0), tvos(10.2));
```
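After encoding a frame, the encoder invokes the outputCallback function that was passed to VTCompressionSessionCreate, handing back the encoded data together with the context pointers: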
```objc
typedef void (*VTCompressionOutputCallback)(
    void * CM_NULLABLE outputCallbackRefCon,
    void * CM_NULLABLE sourceFrameRefCon,
    OSStatus status,
    VTEncodeInfoFlags infoFlags,
    CM_NULLABLE CMSampleBufferRef sampleBuffer);
```
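From the sampleBuffer you can then extract the encoded frame data along with some additional information.
Implementation
First, create the encoder instance and configure its properties: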
```objc
- (void)setupCompressionSession {
    [_lock lock];

    XPVideoEncodeConfig *configuration = _videoEncodeConfig;
    OSStatus err = noErr;
    VTCompressionSessionRef session = NULL;

    // Attributes of the pixel buffers that will be fed to the encoder.
    NSDictionary *pixelBufferOptions = @{
        (NSString *)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA),
        (NSString *)kCVPixelBufferWidthKey : @([configuration videoSize].width),
        (NSString *)kCVPixelBufferHeightKey : @([configuration videoSize].height),
        (NSString *)kCVPixelBufferOpenGLESCompatibilityKey : @YES
    };

    err = VTCompressionSessionCreate(kCFAllocatorDefault,
                                     (int32_t)[configuration videoSize].width,
                                     (int32_t)[configuration videoSize].height,
                                     kCMVideoCodecType_H264,
                                     NULL,
                                     (__bridge CFDictionaryRef)pixelBufferOptions,
                                     kCFAllocatorDefault,
                                     &vtCallback,
                                     (__bridge void *)self,
                                     &session);
    if (err != noErr) {
        NSLog(@"error: failed to create VTCompressionSession. %d", (int)err);
    }
    _compressionSession = session;

    if (err == noErr) {
        const int32_t interval = (int32_t)[configuration videoMaxKeyFrameInterval];
        const int32_t frameRate = (int32_t)[configuration expectedSourceVideoFrameRate];
        // Keyframe interval in whole seconds, rounded up. Plain integer division
        // would truncate (possibly to 0) and crash on a zero frame rate.
        const int32_t duration = frameRate > 0 ? (interval + frameRate - 1) / frameRate : 0;

        // Stop at the first failure so a later success cannot mask an earlier error.
        err = SetVTSessionIntProperty(session, kVTCompressionPropertyKey_MaxKeyFrameInterval, interval);
        if (err == noErr) {
            err = SetVTSessionIntProperty(session, kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration, duration);
        }
        if (err == noErr) {
            err = SetVTSessionIntProperty(session, kVTCompressionPropertyKey_ExpectedFrameRate, frameRate);
        }
    }

    if (err == noErr) {
        // Disable B-frames so frames come out in presentation order.
        err = SetVTSessionBoolProperty(session, kVTCompressionPropertyKey_AllowFrameReordering, false);
    }

    if (err == noErr) {
        err = [self setExpectedBitrate:[configuration averageVideoBitrate]];
    }

    if (err == noErr) {
        err = SetVTSessionBoolProperty(session, kVTCompressionPropertyKey_RealTime, true);
    }

    if (err == noErr) {
        err = SetVTSessionStringProperty(session,
                                         kVTCompressionPropertyKey_ProfileLevel,
                                         (__bridge CFStringRef)[configuration videoProfileLevel]);
    }

    if (err == noErr) {
        err = VTCompressionSessionPrepareToEncodeFrames(session);
    }

    if (err != noErr) {
        NSLog(@"error: failed to setup VTCompressionSession. %d", (int)err);
        [_lock unlock];
        @throw [NSException exceptionWithName:kXPH264VTEncoderErrorInit
                                       reason:@"failed to setup VTCompressionSession"
                                     userInfo:nil];
    }

    [_lock unlock];
}
```
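One detail of the setup code deserves a closer look: the value for kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration is derived from the keyframe interval (in frames) and the frame rate. With truncating integer division, an interval shorter than one second of frames yields 0, which VideoToolbox interprets as "no duration limit". Below is a minimal sketch of a rounding-up computation in plain C (it also compiles inside an Objective-C file). The function name `gop_duration_seconds` is hypothetical and not part of the original project.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original project): number of whole
 * seconds covered by `interval_frames` frames at `frame_rate` fps, rounded up.
 * Returns 0 when either input is not positive. */
static int32_t gop_duration_seconds(int32_t interval_frames, int32_t frame_rate) {
    if (interval_frames <= 0 || frame_rate <= 0) {
        return 0;
    }
    /* Ceiling division without floating point; subtracting 1 first
     * avoids overflow for very large intervals. */
    return (interval_frames - 1) / frame_rate + 1;
}
```

For example, 60 frames at 30 fps gives 2 seconds, and 15 frames at 30 fps gives 1 second instead of the 0 that truncation would produce.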
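The SetVTSession***Property functions used in the setup method are custom wrappers that keep the creation and release of Core Foundation objects from being scattered throughout the code: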
```objc
static OSStatus SetVTSessionIntProperty(VTSessionRef session, CFStringRef key, int32_t value) {
    CFNumberRef cfNum = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &value);
    OSStatus status = VTSessionSetProperty(session, key, cfNum);
    CFRelease(cfNum);
    if (status != noErr) {
        NSLog(@"VTSessionSetProperty failed to set key: %@ with value: %d", (__bridge NSString *)key, value);
    }
    return status;
}

static OSStatus SetVTSessionBoolProperty(VTSessionRef session, CFStringRef key, bool value) {
    // kCFBooleanTrue / kCFBooleanFalse are constants and must not be released.
    CFBooleanRef cf_bool = (value) ? kCFBooleanTrue : kCFBooleanFalse;
    OSStatus status = VTSessionSetProperty(session, key, cf_bool);
    if (status != noErr) {
        NSLog(@"VTSessionSetProperty failed to set key: %@ with value: %@", (__bridge NSString *)key, value ? @"YES" : @"NO");
    }
    return status;
}

static OSStatus SetVTSessionStringProperty(VTSessionRef session, CFStringRef key, CFStringRef value) {
    OSStatus status = VTSessionSetProperty(session, key, value);
    if (status != noErr) {
        NSLog(@"VTSessionSetProperty failed to set key: %@ with value: %@", (__bridge NSString *)key, (__bridge NSString *)value);
    }
    return status;
}
```
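Setting the bitrate is a little more involved. kVTCompressionPropertyKey_AverageBitRate sets the average bitrate, but the actual bitrate fluctuates around that average, so the allowed range of fluctuation needs to be set as well: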
```objc
- (OSStatus)setExpectedBitrate:(NSUInteger)averageBitrate {
    if (!_compressionSession) {
        // Report the missing session rather than returning 0, which equals noErr.
        return kVTInvalidSessionErr;
    }
    const int32_t bitrate = (int32_t)averageBitrate;

    // Average bitrate, in bits per second.
    OSStatus status = SetVTSessionIntProperty(_compressionSession,
                                              kVTCompressionPropertyKey_AverageBitRate,
                                              bitrate);
    if (status != noErr) {
        return status;
    }

    // Hard limit, expressed as "at most N bytes per 1 second".
    // kLimitToAverageBitRateFactor is defined elsewhere in the project.
    int64_t dataLimitBytesPerSecond = (int64_t)(bitrate * kLimitToAverageBitRateFactor / 8);
    CFNumberRef bytesPerSecondRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt64Type, &dataLimitBytesPerSecond);
    int64_t aSecond = 1;
    CFNumberRef aSecondRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt64Type, &aSecond);

    const void *nums[2] = { bytesPerSecondRef, aSecondRef };
    CFArrayRef dataRateLimitsRef = CFArrayCreate(NULL, nums, 2, &kCFTypeArrayCallBacks);
    status = VTSessionSetProperty(_compressionSession,
                                  kVTCompressionPropertyKey_DataRateLimits,
                                  dataRateLimitsRef);

    // Everything created above follows the Create rule and must be released.
    if (bytesPerSecondRef) {
        CFRelease(bytesPerSecondRef);
    }
    if (aSecondRef) {
        CFRelease(aSecondRef);
    }
    if (dataRateLimitsRef) {
        CFRelease(dataRateLimitsRef);
    }
    return status;
}
```
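The unit change is easy to get wrong: the average bitrate is given in bits per second, while the data rate limit is a pair of (bytes, seconds). The sketch below isolates that arithmetic in plain C. The original never shows how kLimitToAverageBitRateFactor is defined, so the factor of 1.5 used here is purely an assumption for illustration, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumed value for illustration only; the original project defines its own
 * kLimitToAverageBitRateFactor, whose value is not shown in the article. */
#define LIMIT_TO_AVERAGE_BITRATE_FACTOR 1.5

/* Hypothetical helper: converts an average bitrate in bits per second into
 * a hard cap in bytes per second. Returns 0 for non-positive input. */
static int64_t data_rate_limit_bytes_per_second(int32_t average_bits_per_second) {
    if (average_bits_per_second <= 0) {
        return 0;
    }
    /* Scale by the headroom factor, then divide by 8 to go from bits to bytes. */
    return (int64_t)((double)average_bits_per_second
                     * LIMIT_TO_AVERAGE_BITRATE_FACTOR / 8.0);
}
```

Under that assumed factor, an average of 800,000 bit/s maps to a cap of 150,000 bytes per second.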
Once everything is set up, each captured video frame can be pushed into the encoder:
```objc
- (void)pushBuffer:(CVPixelBufferRef)pixelBuffer metaData:(XPMetaData *)metaData {
    if (pixelBuffer == NULL) {
        NSLog(@"error: pixel buffer is NULL");
        return;
    }

    size_t width = CVPixelBufferGetWidth(pixelBuffer);
    size_t height = CVPixelBufferGetHeight(pixelBuffer);
    if (width == 0 || height == 0) {
        return;
    }

    // The timescale of 1000 means metaData.pts is expressed in milliseconds.
    CMTime presentationTime = CMTimeMake(metaData.pts, 1000);

    [_lock lock];
    // Check the session under the lock so teardown cannot race with encoding.
    if (_compressionSession != NULL) {
        VTEncodeInfoFlags flags = 0;
        OSStatus status = VTCompressionSessionEncodeFrame(_compressionSession,
                                                          pixelBuffer,
                                                          presentationTime,
                                                          kCMTimeInvalid,
                                                          NULL,
                                                          NULL,
                                                          &flags);
        if (status != noErr) {
            NSLog(@"error: failed to encode frame. %d", (int)status);
        }
    }
    [_lock unlock];
}
```
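The presentation timestamp goes into the encoder as a value with a timescale of 1000 and comes back in the output callback as a CMTime. If it later has to be turned back into milliseconds (FLV tag timestamps, for instance, are in milliseconds), the conversion is a rescale of the value/timescale pair. The sketch below shows the arithmetic in plain C using a stand-in struct. `RationalTime` and both function names are hypothetical and only mimic the two CMTime fields involved; real code would read `value` and `timescale` from the CMTime itself.

```c
#include <stdint.h>

/* Stand-in for the value/timescale pair of a CMTime (not the real type). */
typedef struct {
    int64_t value;      /* number of ticks */
    int32_t timescale;  /* ticks per second */
} RationalTime;

/* Builds a time from a millisecond timestamp, as pushBuffer does. */
static RationalTime time_from_milliseconds(int64_t ms) {
    RationalTime t = { ms, 1000 };
    return t;
}

/* Converts a non-negative time to milliseconds, rounding to nearest.
 * Returns -1 for a negative value or a non-positive timescale. */
static int64_t time_to_milliseconds(RationalTime t) {
    if (t.timescale <= 0 || t.value < 0) {
        return -1;
    }
    return (t.value * 1000 + t.timescale / 2) / t.timescale;
}
```

With a timescale of 1000 the round trip is exact, which is one reason to feed millisecond timestamps to the encoder when the output is destined for FLV.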
When VideoToolbox finishes encoding a frame, it invokes the callback function defined earlier:
```objc
static void vtCallback(void * CM_NULLABLE outputCallbackRefCon,
                       void * CM_NULLABLE sourceFrameRefCon,
                       OSStatus status,
                       VTEncodeInfoFlags infoFlags,
                       CM_NULLABLE CMSampleBufferRef sampleBuffer) {
    XPVTH264Encoder *encoder = (__bridge XPVTH264Encoder *)outputCallbackRefCon;
    if (!encoder) {
        return;
    }
    // If encoding failed or the frame was dropped there is no data to read.
    if (status != noErr || sampleBuffer == NULL) {
        NSLog(@"error: video toolbox encoder error: %d", (int)status);
        return;
    }

    CMBlockBufferRef block = CMSampleBufferGetDataBuffer(sampleBuffer);
    CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, false);
    CMTime pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);

    // A frame that does not depend on other frames is a keyframe.
    bool isKeyframe = false;
    if (attachments != NULL && CFArrayGetCount(attachments) > 0) {
        CFDictionaryRef attachment = (CFDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
        CFBooleanRef dependsOnOthers =
            (CFBooleanRef)CFDictionaryGetValue(attachment, kCMSampleAttachmentKey_DependsOnOthers);
        isKeyframe = (dependsOnOthers == kCFBooleanFalse);
    }

    if (isKeyframe && !encoder.isConfigSent) {
        size_t spsSize = 0, ppsSize = 0;
        const uint8_t *sps = NULL, *pps = NULL;
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
        size_t paramCount = 0;
        // Index 0 holds the SPS, index 1 the PPS.
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, &sps, &spsSize, &paramCount, NULL);
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 1, &pps, &ppsSize, &paramCount, NULL);

        // (The step that packages and sends the SPS/PPS is omitted in the original.)
        encoder.configSent = YES;
    }

    char *bufferData = NULL;
    size_t size = 0;
    status = CMBlockBufferGetDataPointer(block, 0, NULL, &size, &bufferData);
    if (status == noErr) {
        // Copy the data out: the block buffer is only valid during the callback.
        uint8_t *naluData = (uint8_t *)malloc(size);
        if (naluData != NULL) {
            memcpy(naluData, (uint8_t *)bufferData, size);
            // (What happens to naluData next is omitted in the original;
            // whoever takes ownership of the copy must free() it.)
        }
    } else {
        NSLog(@"error: video toolbox encoder error: %d", (int)status);
    }
}
```
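The callback obtains the SPS and PPS but does not show how they are packaged. One common packaging is the AVCDecoderConfigurationRecord (the AVCC "extradata"). The following is a minimal sketch in plain C under several assumptions: exactly one SPS and one PPS, 4-byte NALU length prefixes, and no High-profile extension fields. The function name `build_avcc_extradata` is hypothetical and not from the original project.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes an AVCDecoderConfigurationRecord containing one
 * SPS and one PPS into `out`. `sps` and `pps` are raw NAL units including the
 * one-byte NAL header, without start code or length prefix.
 * Returns the number of bytes written, or 0 on invalid input or if
 * `out_capacity` is too small. */
static size_t build_avcc_extradata(const uint8_t *sps, size_t sps_size,
                                   const uint8_t *pps, size_t pps_size,
                                   uint8_t *out, size_t out_capacity) {
    if (sps == NULL || pps == NULL || out == NULL) {
        return 0;
    }
    /* The SPS must at least hold header, profile, constraint flags and level;
     * both sizes must fit the record's 16-bit length fields. */
    if (sps_size < 4 || sps_size > 0xFFFF || pps_size == 0 || pps_size > 0xFFFF) {
        return 0;
    }
    const size_t total = 11 + sps_size + pps_size;
    if (out_capacity < total) {
        return 0;
    }

    size_t i = 0;
    out[i++] = 0x01;    /* configurationVersion */
    out[i++] = sps[1];  /* AVCProfileIndication (profile_idc) */
    out[i++] = sps[2];  /* profile_compatibility (constraint flags) */
    out[i++] = sps[3];  /* AVCLevelIndication (level_idc) */
    out[i++] = 0xFF;    /* 6 reserved bits + lengthSizeMinusOne = 3 */
    out[i++] = 0xE1;    /* 3 reserved bits + numOfSequenceParameterSets = 1 */
    out[i++] = (uint8_t)(sps_size >> 8);    /* SPS length, big-endian */
    out[i++] = (uint8_t)(sps_size & 0xFF);
    memcpy(out + i, sps, sps_size);
    i += sps_size;
    out[i++] = 0x01;    /* numOfPictureParameterSets */
    out[i++] = (uint8_t)(pps_size >> 8);    /* PPS length, big-endian */
    out[i++] = (uint8_t)(pps_size & 0xFF);
    memcpy(out + i, pps, pps_size);
    i += pps_size;
    return i;
}
```

In the callback, `sps`, `spsSize`, `pps`, and `ppsSize` would be passed in directly, and the resulting bytes would become the payload of the first video tag.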
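When the first encoded keyframe arrives, first extract the SPS and PPS from the sample buffer, wrap them into AVCC extradata, pack that into an FLV tag, and send it to the server as the first video tag. The encoded data block obtained through CMBlockBufferGetDataPointer may contain more than one NALU: Apple sometimes places an SEI and an IDR frame in the same CMBlockBuffer. To handle each NALU separately, read the 4-byte length that precedes each NALU and extract them one by one.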
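Splitting a data block along its length prefixes can be sketched as follows in plain C. It assumes every NALU is preceded by a 4-byte big-endian length, as described above. `next_avcc_nalu` and `h264_nalu_type` are hypothetical names, not functions from VideoToolbox or the original project.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads the NALU that starts at *offset in a buffer of
 * length-prefixed NALUs. On success stores a pointer to the NALU payload
 * (after the 4-byte length) and its size, advances *offset past it, and
 * returns true. Returns false at the end of the buffer or on malformed data. */
static bool next_avcc_nalu(const uint8_t *data, size_t size, size_t *offset,
                           const uint8_t **nalu, size_t *nalu_size) {
    if (*offset > size || size - *offset < 4) {
        return false;
    }
    const uint8_t *p = data + *offset;
    /* The length prefix is big-endian. */
    const uint32_t length = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                            ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    const size_t remaining = size - *offset - 4;
    if (length == 0 || length > remaining) {
        return false;
    }
    *nalu = p + 4;
    *nalu_size = length;
    *offset += 4 + (size_t)length;
    return true;
}

/* The NAL unit type is the low 5 bits of the first payload byte,
 * e.g. 5 = IDR slice, 6 = SEI, 7 = SPS, 8 = PPS. */
static int h264_nalu_type(const uint8_t *nalu) {
    return nalu[0] & 0x1F;
}
```

A caller would start with `offset = 0` and call `next_avcc_nalu` in a `while` loop, checking `h264_nalu_type` on each NALU, for example to skip an SEI that precedes the IDR slice.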
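Hardware H.264 encoding on iOS always produces AVCC rather than Annex B, so depending on how the encoded data is used, the NALUs may need to be converted. RTMP uses the FLV container format, and FLV happens to use AVCC as well, which makes muxing straightforward.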
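For consumers that expect Annex B instead, the conversion amounts to swapping each length prefix for a start code. Because a 4-byte length and the 4-byte start code 00 00 00 01 have the same size, it can be done in place. The sketch below is plain C and assumes 4-byte length prefixes; the function names are hypothetical. It converts the frame data only: an Annex B consumer typically also needs the SPS and PPS, each with its own start code, ahead of the keyframes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a 32-bit big-endian integer. */
static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: rewrites a buffer of NALUs with 4-byte length prefixes
 * (AVCC) into Annex B by replacing every prefix with 00 00 00 01.
 * Returns false and leaves the buffer untouched if the data is malformed. */
static bool avcc_to_annexb_in_place(uint8_t *data, size_t size) {
    /* Pass 1: validate all lengths before modifying anything. */
    size_t offset = 0;
    while (offset < size) {
        if (size - offset < 4) {
            return false;
        }
        const uint32_t length = read_be32(data + offset);
        if (length == 0 || length > size - offset - 4) {
            return false;
        }
        offset += 4 + (size_t)length;
    }

    /* Pass 2: overwrite each length prefix with a start code. */
    offset = 0;
    while (offset < size) {
        const uint32_t length = read_be32(data + offset);
        data[offset] = 0x00;
        data[offset + 1] = 0x00;
        data[offset + 2] = 0x00;
        data[offset + 3] = 0x01;
        offset += 4 + (size_t)length;
    }
    return true;
}
```

The payload bytes are left as they are; only the four prefix bytes of each NALU change.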
When you are done, don't forget to release the encoder instance:
```objc
- (void)teardownCompressionSession {
    [_lock lock];
    if (_compressionSession) {
        VTCompressionSessionInvalidate(_compressionSession);
        CFRelease(_compressionSession);
        _compressionSession = NULL;
    }
    [_lock unlock];
}
```
Afterword
Next up: FLV muxing!