Last updated: April 9, 2025 (PM)
Preface
The previous post used AudioToolbox to encode audio data into AAC; this time the requirement is to encode video frames into H.264. Before iOS 8.0, the only way to hardware-encode H.264 on iOS was the AVAssetWriter workaround: let the system hardware encoder write the video frames into a local mp4 file, then write your own logic to dig the sps, pps, and NALU data back out of the mp4 boxes, which means frequent file I/O. With iOS 8.0, Apple exposed VideoToolbox for hardware encoding and decoding, which greatly improves development efficiency by letting developers receive the encoded data directly, so our video encoder is built on VideoToolbox.
Approach
VideoToolbox encoding revolves around the VTCompressionSessionRef type, and it is fairly simple to use. First, call VTCompressionSessionCreate to create an encoder instance:
```objc
VT_EXPORT OSStatus
VTCompressionSessionCreate(
    CM_NULLABLE CFAllocatorRef              allocator,
    int32_t                                 width,
    int32_t                                 height,
    CMVideoCodecType                        codecType,
    CM_NULLABLE CFDictionaryRef             encoderSpecification,
    CM_NULLABLE CFDictionaryRef             sourceImageBufferAttributes,
    CM_NULLABLE CFAllocatorRef              compressedDataAllocator,
    CM_NULLABLE VTCompressionOutputCallback outputCallback,
    void * CM_NULLABLE                      outputCallbackRefCon,
    CM_RETURNS_RETAINED_PARAMETER CM_NULLABLE VTCompressionSessionRef * CM_NONNULL compressionSessionOut)
    API_AVAILABLE(macosx(10.8), ios(8.0), tvos(10.2));
```
Once the instance is created, you can configure the encoder's output parameters. The commonly used properties are frame rate, bitrate, GOP size, and profile level. Note that propertyValue is a Core Foundation type, so you must manage its release manually:
```objc
VT_EXPORT OSStatus
VTSessionSetProperty(
    CM_NONNULL VTSessionRef session,
    CM_NONNULL CFStringRef  propertyKey,
    CM_NULLABLE CFTypeRef   propertyValue)
    API_AVAILABLE(macosx(10.8), ios(8.0), tvos(10.2));
```
With the properties configured, you can (optionally) call this method to have the encoder allocate the memory it needs and get ready to encode:
```objc
VT_EXPORT OSStatus
VTCompressionSessionPrepareToEncodeFrames(
    CM_NONNULL VTCompressionSessionRef session)
    API_AVAILABLE(macosx(10.9), ios(8.0), tvos(10.2));
```
With the preparation done, captured video frames can be fed into the encoder:
```objc
VT_EXPORT OSStatus
VTCompressionSessionEncodeFrame(
    CM_NONNULL VTCompressionSessionRef session,
    CM_NONNULL CVImageBufferRef        imageBuffer,
    CMTime                             presentationTimeStamp,
    CMTime                             duration,
    CM_NULLABLE CFDictionaryRef        frameProperties,
    void * CM_NULLABLE                 sourceFrameRefcon,
    VTEncodeInfoFlags * CM_NULLABLE    infoFlagsOut)
    API_AVAILABLE(macosx(10.8), ios(8.0), tvos(10.2));
```
After encoding a frame, the encoder invokes the outputCallback function passed to VTCompressionSessionCreate, handing back the encoded data along with the context pointer:
```objc
typedef void (*VTCompressionOutputCallback)(
    void * CM_NULLABLE            outputCallbackRefCon,
    void * CM_NULLABLE            sourceFrameRefCon,
    OSStatus                      status,
    VTEncodeInfoFlags             infoFlags,
    CM_NULLABLE CMSampleBufferRef sampleBuffer);
```
From the sampleBuffer you can parse out the encoded frame data along with other information about the frame.
Implementation
First, create the encoder instance and configure its properties:
```objc
- (void)setupCompressionSession {
    [_lock lock];
    XPVideoEncodeConfig *configuration = _videoEncodeConfig;
    OSStatus err = noErr;
    VTCompressionSessionRef session = NULL;
    NSDictionary *pixelBufferOptions = @{
        (NSString *)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA),
        (NSString *)kCVPixelBufferWidthKey : @([configuration videoSize].width),
        (NSString *)kCVPixelBufferHeightKey : @([configuration videoSize].height),
        (NSString *)kCVPixelBufferOpenGLESCompatibilityKey : @YES
    };
    err = VTCompressionSessionCreate(kCFAllocatorDefault,
                                     [configuration videoSize].width,
                                     [configuration videoSize].height,
                                     kCMVideoCodecType_H264,
                                     NULL,
                                     (__bridge CFDictionaryRef)pixelBufferOptions,
                                     kCFAllocatorDefault,
                                     &vtCallback,
                                     (__bridge void *)self,
                                     &session);
    if (err != noErr) {
        NSLog(@"error: failed to setup VTCompressionSession. %d", err);
    }
    _compressionSession = session;

    if (err == noErr) {
        const int32_t interval = (int32_t)[configuration videoMaxKeyFrameInterval];
        const int32_t frameRate = (int32_t)[configuration expectedSourceVideoFrameRate];
        int32_t duration = (int32_t)(interval / frameRate);
        err = SetVTSessionIntProperty(session, kVTCompressionPropertyKey_MaxKeyFrameInterval, interval);
        if (err != noErr) {
            NSLog(@"error: failed to setup VTCompressionSession. %d", err);
        }
        err = SetVTSessionIntProperty(session, kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration, duration);
        if (err != noErr) {
            NSLog(@"error: failed to setup VTCompressionSession. %d", err);
        }
        err = SetVTSessionIntProperty(session, kVTCompressionPropertyKey_ExpectedFrameRate, frameRate);
        if (err != noErr) {
            NSLog(@"error: failed to setup VTCompressionSession. %d", err);
        }
    }
    if (err == noErr) {
        err = SetVTSessionBoolProperty(session, kVTCompressionPropertyKey_AllowFrameReordering, false);
        if (err != noErr) {
            NSLog(@"error: failed to setup VTCompressionSession. %d", err);
        }
    }
    if (err == noErr) {
        err = [self setExpectedBitrate:[configuration averageVideoBitrate]];
        if (err != noErr) {
            NSLog(@"error: failed to setup VTCompressionSession. %d", err);
        }
    }
    if (err == noErr) {
        err = SetVTSessionBoolProperty(session, kVTCompressionPropertyKey_RealTime, true);
        if (err != noErr) {
            NSLog(@"error: failed to setup VTCompressionSession. %d", err);
        }
    }
    if (err == noErr) {
        err = SetVTSessionStringProperty(session, kVTCompressionPropertyKey_ProfileLevel,
                                         (__bridge CFStringRef)[configuration videoProfileLevel]);
        if (err != noErr) {
            NSLog(@"error: failed to setup VTCompressionSession. %d", err);
        }
    }
    if (err == noErr) {
        err = VTCompressionSessionPrepareToEncodeFrames(session);
        if (err != noErr) {
            NSLog(@"error: failed to setup VTCompressionSession. %d", err);
        }
    }
    if (err != noErr) {
        NSLog(@"error: failed to setup VTCompressionSession. %d", err);
        [_lock unlock];
        @throw [NSException exceptionWithName:kXPH264VTEncoderErrorInit
                                       reason:@"failed to setup VTCompressionSession"
                                     userInfo:nil];
        return;
    }
    [_lock unlock];
}
```
The SetVTSession***Property functions used in the initializer are custom wrappers that keep the repeated creation and release of Core Foundation objects out of the main code path:
```objc
static OSStatus SetVTSessionIntProperty(VTSessionRef session, CFStringRef key, int32_t value) {
    CFNumberRef cfNum = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &value);
    OSStatus status = VTSessionSetProperty(session, key, cfNum);
    CFRelease(cfNum);
    if (status != noErr) {
        NSLog(@"VTSessionSetProperty failed to set key: %@ with value: %d",
              (__bridge NSString *)key, value);
    }
    return status;
}

static OSStatus SetVTSessionBoolProperty(VTSessionRef session, CFStringRef key, bool value) {
    CFBooleanRef cfBool = value ? kCFBooleanTrue : kCFBooleanFalse;
    OSStatus status = VTSessionSetProperty(session, key, cfBool);
    if (status != noErr) {
        NSLog(@"VTSessionSetProperty failed to set key: %@ with value: %@",
              (__bridge NSString *)key, value ? @"YES" : @"NO");
    }
    return status;
}

static OSStatus SetVTSessionStringProperty(VTSessionRef session, CFStringRef key, CFStringRef value) {
    OSStatus status = VTSessionSetProperty(session, key, value);
    if (status != noErr) {
        NSLog(@"VTSessionSetProperty failed to set key: %@ with value: %@",
              (__bridge NSString *)key, (__bridge NSString *)value);
    }
    return status;
}
```
Setting the bitrate is a bit more involved. kVTCompressionPropertyKey_AverageBitRate sets the average bitrate, but the actual bitrate fluctuates around that average, so you also need to cap the fluctuation with kVTCompressionPropertyKey_DataRateLimits, which takes a byte-count/duration pair:
```objc
- (OSStatus)setExpectedBitrate:(NSUInteger)averageBitrate {
    if (!_compressionSession) {
        return kNilOptions;
    }
    int bitrate = (int)averageBitrate;
    OSStatus status = SetVTSessionIntProperty(_compressionSession,
                                              kVTCompressionPropertyKey_AverageBitRate,
                                              (int32_t)bitrate);
    if (status != noErr) {
        return status;
    }
    // Hard cap: at most bitrate * kLimitToAverageBitRateFactor bits
    // (converted to bytes) within any one-second window.
    int64_t dataLimitBytesPerSecond = (int64_t)(bitrate * kLimitToAverageBitRateFactor / 8);
    CFNumberRef bytesPerSecondRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt64Type, &dataLimitBytesPerSecond);
    int64_t aSecond = 1;
    CFNumberRef aSecondRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt64Type, &aSecond);
    const void *nums[2] = { bytesPerSecondRef, aSecondRef };
    CFArrayRef dataRateLimitsRef = CFArrayCreate(NULL, nums, 2, &kCFTypeArrayCallBacks);
    status = VTSessionSetProperty(_compressionSession,
                                  kVTCompressionPropertyKey_DataRateLimits,
                                  dataRateLimitsRef);
    if (bytesPerSecondRef) {
        CFRelease(bytesPerSecondRef);
    }
    if (aSecondRef) {
        CFRelease(aSecondRef);
    }
    if (dataRateLimitsRef) {
        CFRelease(dataRateLimitsRef);
    }
    return status;
}
```
With the setup complete, captured video frames can be pushed into the encoder:
```objc
- (void)pushBuffer:(CVPixelBufferRef)pixelBuffer metaData:(XPMetaData *)metaData {
    if (_compressionSession == NULL) {
        return;
    }
    if (pixelBuffer == NULL) {
        NSLog(@"error: pixel buffer is NULL");
        return;
    }
    size_t width = CVPixelBufferGetWidth(pixelBuffer);
    size_t height = CVPixelBufferGetHeight(pixelBuffer);
    if (width == 0 || height == 0) {
        return;
    }
    // metaData.pts is in milliseconds, hence the timescale of 1000.
    CMTime presentationTime = {0};
    presentationTime.timescale = 1000;
    presentationTime.value = metaData.pts;
    presentationTime.flags = kCMTimeFlags_Valid;
    [_lock lock];
    VTEncodeInfoFlags flags;
    VTCompressionSessionEncodeFrame(_compressionSession,
                                    pixelBuffer,
                                    presentationTime,
                                    kCMTimeInvalid,
                                    NULL,
                                    NULL,
                                    &flags);
    [_lock unlock];
}
```
When VideoToolbox finishes encoding a frame, it invokes the callback defined earlier:
```objc
static void vtCallback(void * CM_NULLABLE outputCallbackRefCon,
                       void * CM_NULLABLE sourceFrameRefCon,
                       OSStatus status,
                       VTEncodeInfoFlags infoFlags,
                       CM_NULLABLE CMSampleBufferRef sampleBuffer) {
    XPVTH264Encoder *encoder = (__bridge XPVTH264Encoder *)outputCallbackRefCon;
    if (!encoder) {
        return;
    }
    if (status != noErr || sampleBuffer == NULL) {
        NSLog(@"error: video toolbox encoder callback error: %d", status);
        return;
    }
    CMBlockBufferRef block = CMSampleBufferGetDataBuffer(sampleBuffer);
    CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, false);
    CMTime pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);

    // A sample is a keyframe when it does not depend on other frames.
    bool isKeyframe = false;
    if (attachments != NULL) {
        CFDictionaryRef attachment = (CFDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
        CFBooleanRef dependsOnOthers =
            (CFBooleanRef)CFDictionaryGetValue(attachment, kCMSampleAttachmentKey_DependsOnOthers);
        isKeyframe = (dependsOnOthers == kCFBooleanFalse);
    }

    // On the first keyframe, pull the sps and pps out of the format description.
    if (isKeyframe && !encoder.isConfigSent) {
        size_t spsSize = 0, ppsSize = 0;
        const uint8_t *sps = NULL, *pps = NULL;
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
        size_t paramCount;
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, &sps, &spsSize, &paramCount, NULL);
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 1, &pps, &ppsSize, &paramCount, NULL);
        encoder.configSent = YES;
    }

    char *bufferData;
    size_t size;
    status = CMBlockBufferGetDataPointer(block, 0, NULL, &size, &bufferData);
    if (status == noErr) {
        uint8_t *naluData = (uint8_t *)malloc(size);
        memcpy(naluData, (uint8_t *)bufferData, size);
        // ... hand naluData / sps / pps downstream, then free(naluData)
    } else {
        NSLog(@"error: video toolbox encoder error: %d", status);
    }
}
```
When the first encoded keyframe arrives, parse the sps and pps out of the sample buffer, wrap them into AVCC extradata, package that as an flv tag, and send it to the server as the first video tag. Note that the data block returned by CMBlockBufferGetDataPointer may contain more than one NALU: Apple sometimes puts an SEI NALU and the IDR frame into the same CMBlockBuffer. To split them apart, read the 4-byte length prefix in front of each NALU and peel them off one at a time.
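The NALU walk described above can be sketched in plain C. The function name and visitor type are illustrative; the layout it assumes (a sequence of [4-byte big-endian length][NALU payload] records) is the AVCC framing VideoToolbox produces:

```c
#include <stdint.h>
#include <stddef.h>

typedef void (*NALUVisitor)(const uint8_t *nalu, size_t size);

/* Walk an AVCC-framed block buffer, calling `visit` for each NALU.
 * Returns the number of NALUs found; stops on a truncated record. */
static size_t ForEachAVCCNALU(const uint8_t *data, size_t len, NALUVisitor visit) {
    size_t offset = 0, count = 0;
    while (offset + 4 <= len) {
        uint32_t naluLen = ((uint32_t)data[offset]     << 24) |
                           ((uint32_t)data[offset + 1] << 16) |
                           ((uint32_t)data[offset + 2] << 8)  |
                            (uint32_t)data[offset + 3];
        if (offset + 4 + naluLen > len) break;   /* truncated buffer */
        if (visit) visit(data + offset + 4, naluLen);
        offset += 4 + naluLen;
        count++;
    }
    return count;
}
```

Called on the block from the callback above, the visitor would see the SEI NALU first and the IDR NALU second when Apple packs both into one CMBlockBuffer.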
The H.264 hardware encoder on iOS always produces AVCC framing rather than Annex B, so depending on where the encoded data goes you may need to convert between the two NALU formats. RTMP uses the flv container, and flv happens to use AVCC as well, so muxing is straightforward.
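For targets that do want Annex B (a raw .h264 dump, some decoders), the conversion is cheap because a 4-byte length prefix has exactly the size of a 4-byte start code. A minimal in-place sketch, assuming 4-byte prefixes throughout:

```c
#include <stdint.h>
#include <stddef.h>

/* Rewrite an AVCC buffer to Annex B in place: each 4-byte big-endian
 * length prefix becomes the start code 00 00 00 01.
 * Returns 0 on success, -1 if the buffer is malformed. */
static int AVCCToAnnexB(uint8_t *data, size_t len) {
    size_t offset = 0;
    while (offset + 4 <= len) {
        uint32_t naluLen = ((uint32_t)data[offset]     << 24) |
                           ((uint32_t)data[offset + 1] << 16) |
                           ((uint32_t)data[offset + 2] << 8)  |
                            (uint32_t)data[offset + 3];
        if (offset + 4 + naluLen > len) return -1;
        data[offset]     = 0;
        data[offset + 1] = 0;
        data[offset + 2] = 0;
        data[offset + 3] = 1;
        offset += 4 + naluLen;
    }
    return offset == len ? 0 : -1;
}
```

Going the other way (Annex B to AVCC) is not in-place, since start codes can be 3 or 4 bytes, but the idea is symmetrical.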
When you are done, don't forget to tear down the encoder instance:
```objc
- (void)teardownCompressionSession {
    [_lock lock];
    if (_compressionSession) {
        // Flush any frames still in flight before invalidating.
        VTCompressionSessionCompleteFrames(_compressionSession, kCMTimeInvalid);
        VTCompressionSessionInvalidate(_compressionSession);
        CFRelease(_compressionSession);
        _compressionSession = NULL;
    }
    [_lock unlock];
}
```
Afterword
Next up: flv muxing!