ML Kit provides an optimized SDK for selfie segmentation. The Selfie Segmenter assets are statically linked to your app at build time. This increases your app size by up to 24MB. API latency varies from about 7ms to about 12ms depending on the input image size, as measured on iPhone X.
Try it out
- Play around with the sample app to see an example usage of this API.
Before you begin
Include the following ML Kit libraries in your Podfile:
pod 'GoogleMLKit/SegmentationSelfie', '7.0.0'
After you install or update your project's Pods, open your Xcode project using its .xcworkspace. ML Kit is supported in Xcode version 13.2.1 or higher.
1. Create an instance of Segmenter
To perform segmentation on a selfie image, first create an instance of Segmenter with SelfieSegmenterOptions and optionally specify the segmentation settings.
Segmenter options
Segmenter Mode
The Segmenter operates in two modes. Be sure you choose the one that matches your use case.
STREAM_MODE (default)
This mode is designed for streaming frames from video or camera. In this mode, the segmenter will leverage results from previous frames to return smoother segmentation results.
SINGLE_IMAGE_MODE
This mode is designed for single images that are not related. In this mode, the segmenter will process each image independently, with no smoothing over frames.
Enable raw size mask
Asks the segmenter to return the raw size mask which matches the model output size.
The raw mask size (e.g. 256x256) is usually smaller than the input image size.
Without specifying this option, the segmenter will rescale the raw mask to match the input image size. Consider using this option if you want to apply customized rescaling logic or rescaling is not needed for your use case.
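As one illustration of customized rescaling logic, the sketch below upsamples a raw mask to a target size with nearest-neighbor sampling. It assumes the mask confidences have already been copied into a plain row-major array of Float values. The function name rescaleMaskNearest and its parameters are hypothetical and not part of ML Kit.

```swift
/// Nearest-neighbor rescale of a row-major confidence mask.
/// Hypothetical helper for illustration; not an ML Kit API.
func rescaleMaskNearest(
  _ mask: [Float], width: Int, height: Int,
  toWidth newWidth: Int, toHeight newHeight: Int
) -> [Float] {
  precondition(mask.count == width * height, "mask must hold width * height values")
  precondition(width > 0 && height > 0 && newWidth > 0 && newHeight > 0)
  var output = [Float](repeating: 0, count: newWidth * newHeight)
  for y in 0..<newHeight {
    // Map the destination row back to the closest source row.
    let srcY = min(height - 1, y * height / newHeight)
    for x in 0..<newWidth {
      // Map the destination column back to the closest source column.
      let srcX = min(width - 1, x * width / newWidth)
      output[y * newWidth + x] = mask[srcY * width + srcX]
    }
  }
  return output
}

// Example: a 2x2 raw mask stretched to 4x4.
let raw: [Float] = [0.0, 1.0,
                    0.5, 0.25]
let scaled = rescaleMaskNearest(raw, width: 2, height: 2, toWidth: 4, toHeight: 4)
```

Nearest-neighbor sampling is the simplest choice and keeps confidences unchanged. A bilinear filter would give softer edges at the cost of more arithmetic per pixel.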
Specify the segmenter options:
Swift
let options = SelfieSegmenterOptions()
options.segmenterMode = .singleImage
options.shouldEnableRawSizeMask = true
Objective-C
MLKSelfieSegmenterOptions *options = [[MLKSelfieSegmenterOptions alloc] init];
options.segmenterMode = MLKSegmenterModeSingleImage;
options.shouldEnableRawSizeMask = YES;
Finally, get an instance of Segmenter. Pass the options you specified:
Swift
let segmenter = Segmenter.segmenter(options: options)
Objective-C
MLKSegmenter *segmenter = [MLKSegmenter segmenterWithOptions:options];
2. Prepare the input image
To segment selfies, do the following for each image or frame of video.
If you enabled stream mode, you must create VisionImage objects from CMSampleBuffers.
Create a VisionImage object using a UIImage or a CMSampleBuffer.
If you use a UIImage, follow these steps:
- Create a VisionImage object with the UIImage. Make sure to specify the correct .orientation.
Swift
// `image` is the UIImage you want to segment.
let visionImage = VisionImage(image: image)
visionImage.orientation = image.imageOrientation
Objective-C
MLKVisionImage *visionImage = [[MLKVisionImage alloc] initWithImage:image];
visionImage.orientation = image.imageOrientation;
If you use a CMSampleBuffer, follow these steps:
- Specify the orientation of the image data contained in the CMSampleBuffer.
To get the image orientation:
Swift
func imageOrientation(
  deviceOrientation: UIDeviceOrientation,
  cameraPosition: AVCaptureDevice.Position
) -> UIImage.Orientation {
  switch deviceOrientation {
  case .portrait:
    return cameraPosition == .front ? .leftMirrored : .right
  case .landscapeLeft:
    return cameraPosition == .front ? .downMirrored : .up
  case .portraitUpsideDown:
    return cameraPosition == .front ? .rightMirrored : .left
  case .landscapeRight:
    return cameraPosition == .front ? .upMirrored : .down
  case .faceDown, .faceUp, .unknown:
    return .up
  }
}
Objective-C
- (UIImageOrientation)
  imageOrientationFromDeviceOrientation:(UIDeviceOrientation)deviceOrientation
                         cameraPosition:(AVCaptureDevicePosition)cameraPosition {
  switch (deviceOrientation) {
    case UIDeviceOrientationPortrait:
      return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationLeftMirrored
                                                            : UIImageOrientationRight;
    case UIDeviceOrientationLandscapeLeft:
      return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationDownMirrored
                                                            : UIImageOrientationUp;
    case UIDeviceOrientationPortraitUpsideDown:
      return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationRightMirrored
                                                            : UIImageOrientationLeft;
    case UIDeviceOrientationLandscapeRight:
      return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationUpMirrored
                                                            : UIImageOrientationDown;
    case UIDeviceOrientationUnknown:
    case UIDeviceOrientationFaceUp:
    case UIDeviceOrientationFaceDown:
      return UIImageOrientationUp;
  }
}
- Create a VisionImage object using the CMSampleBuffer object and orientation:
Swift
let image = VisionImage(buffer: sampleBuffer)
image.orientation = imageOrientation(
  deviceOrientation: UIDevice.current.orientation,
  cameraPosition: cameraPosition)
Objective-C
MLKVisionImage *image = [[MLKVisionImage alloc] initWithBuffer:sampleBuffer];
image.orientation =
    [self imageOrientationFromDeviceOrientation:UIDevice.currentDevice.orientation
                                 cameraPosition:cameraPosition];
3. Process the image
Pass the VisionImage object to one of the Segmenter's image processing methods. You can either use the asynchronous process(image:) method or the synchronous results(in:) method.
To perform segmentation on a selfie image synchronously:
Swift
// results(in:) returns a single SegmentationMask, matching the Objective-C sample.
let mask: SegmentationMask
do {
  mask = try segmenter.results(in: image)
} catch let error {
  print("Failed to perform segmentation with error: \(error.localizedDescription).")
  return
}
// Success. Get a segmentation mask here.
Objective-C
NSError *error;
MLKSegmentationMask *mask =
    [segmenter resultsInImage:image error:&error];
if (error != nil) {
  // Error.
  return;
}
// Success. Get a segmentation mask here.
To perform segmentation on a selfie image asynchronously:
Swift
segmenter.process(image) { mask, error in
  guard error == nil else {
    // Error.
    return
  }
  // Success. Get a segmentation mask here.
}
Objective-C
[segmenter processImage:image
             completion:^(MLKSegmentationMask * _Nullable mask,
                          NSError * _Nullable error) {
  if (error != nil) {
    // Error.
    return;
  }
  // Success. Get a segmentation mask here.
}];
4. Get the segmentation mask
You can get the segmentation result as follows:
Swift
let maskWidth = CVPixelBufferGetWidth(mask.buffer)
let maskHeight = CVPixelBufferGetHeight(mask.buffer)

CVPixelBufferLockBaseAddress(mask.buffer, CVPixelBufferLockFlags.readOnly)
let maskBytesPerRow = CVPixelBufferGetBytesPerRow(mask.buffer)
let floatsPerRow = maskBytesPerRow / MemoryLayout<Float32>.size
// Capacity is a count of Float32 values, not bytes.
var maskAddress = CVPixelBufferGetBaseAddress(mask.buffer)!.bindMemory(
  to: Float32.self, capacity: floatsPerRow * maskHeight)

for _ in 0..<maskHeight {
  for col in 0..<maskWidth {
    // Gets the confidence of the pixel in the mask being in the foreground.
    let foregroundConfidence: Float32 = maskAddress[col]
  }
  // Advance to the next row; rows may be padded beyond maskWidth values.
  maskAddress += floatsPerRow
}
// Release the lock once the mask has been read.
CVPixelBufferUnlockBaseAddress(mask.buffer, CVPixelBufferLockFlags.readOnly)
Objective-C
size_t width = CVPixelBufferGetWidth(mask.buffer);
size_t height = CVPixelBufferGetHeight(mask.buffer);

CVPixelBufferLockBaseAddress(mask.buffer, kCVPixelBufferLock_ReadOnly);
size_t maskBytesPerRow = CVPixelBufferGetBytesPerRow(mask.buffer);
float *maskAddress = (float *)CVPixelBufferGetBaseAddress(mask.buffer);

for (size_t row = 0; row < height; ++row) {
  for (size_t col = 0; col < width; ++col) {
    // Gets the confidence of the pixel in the mask being in the foreground.
    float foregroundConfidence = maskAddress[col];
  }
  // Advance to the next row; rows may be padded beyond width values.
  maskAddress += maskBytesPerRow / sizeof(float);
}
// Release the lock once the mask has been read.
CVPixelBufferUnlockBaseAddress(mask.buffer, kCVPixelBufferLock_ReadOnly);
For a full example of how to use the segmentation results, please see the ML Kit quickstart sample.
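One common next step is to turn the per-pixel confidences into a foreground/background decision. The sketch below is a minimal illustration under stated assumptions: the confidences have already been copied out of the pixel buffer into a plain Float array, rows may be padded (so the row stride can exceed the width), and a threshold of 0.5 is an arbitrary choice, not a value recommended by ML Kit. The name binaryMask and its parameters are hypothetical.

```swift
/// Converts per-pixel foreground confidences into a binary mask.
/// `rowStride` is the number of Float values per row, which may exceed
/// `width` when rows are padded. Hypothetical helper; not an ML Kit API.
func binaryMask(
  confidences: [Float], width: Int, height: Int,
  rowStride: Int, threshold: Float = 0.5
) -> [Bool] {
  precondition(rowStride >= width, "rowStride must cover at least one row of pixels")
  precondition(confidences.count >= rowStride * height, "buffer is too small")
  var result = [Bool]()
  result.reserveCapacity(width * height)
  for row in 0..<height {
    // Skip any padding by jumping a full stride per row.
    let rowStart = row * rowStride
    for col in 0..<width {
      result.append(confidences[rowStart + col] >= threshold)
    }
  }
  return result
}

// Example: a 2x2 mask whose rows are padded to 3 floats each.
let padded: [Float] = [0.9, 0.1, 0.0,
                       0.4, 0.7, 0.0]
let foreground = binaryMask(confidences: padded, width: 2, height: 2, rowStride: 3)
```

Keeping the raw confidences instead of thresholding them lets you blend the foreground softly at its edges, which usually looks better for background replacement.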
Tips to improve performance
The quality of your results depends on the quality of the input image:
- For ML Kit to get an accurate segmentation result, the image should be at least 256x256 pixels.
- If you perform selfie segmentation in a real-time application, you might also want to consider the overall dimensions of the input images. Smaller images can be processed faster, so to reduce latency, capture images at lower resolutions, but keep in mind the above resolution requirements and ensure that the subject occupies as much of the image as possible.
- Poor image focus can also impact accuracy. If you don't get acceptable results, ask the user to recapture the image.
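The trade-off between latency and the 256x256 minimum can be sketched as a small size calculation. The helper below is hypothetical (downscaledSize and its parameters are not part of ML Kit). It scales an image down toward a requested short side, never goes below the minimum stated above, and never upscales.

```swift
/// Picks a reduced processing size while keeping the shorter side at or
/// above `minSide`. Hypothetical helper; not an ML Kit API.
func downscaledSize(
  width: Int, height: Int, targetShortSide: Int, minSide: Int = 256
) -> (width: Int, height: Int) {
  precondition(width > 0 && height > 0)
  let shortSide = min(width, height)
  // Clamp the request so the result never drops below the minimum.
  let goal = max(targetShortSide, minSide)
  // Never upscale: inputs that are already small are returned unchanged.
  guard shortSide > goal else { return (width, height) }
  let scale = Double(goal) / Double(shortSide)
  return (Int((Double(width) * scale).rounded()),
          Int((Double(height) * scale).rounded()))
}

// Example: a 1920x1080 frame reduced so its short side is 360 pixels.
let reduced = downscaledSize(width: 1920, height: 1080, targetShortSide: 360)
// A request below the minimum is clamped to a 256-pixel short side.
let clamped = downscaledSize(width: 1920, height: 1080, targetShortSide: 128)
```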
If you want to use segmentation in a real-time application, follow these guidelines to achieve the best frame rates:
- Use the stream segmenter mode.
- Consider capturing images at a lower resolution. However, also keep in mind this API's image dimension requirements.
- For processing video frames, use the results(in:) synchronous API of the segmenter. Call this method from the AVCaptureVideoDataOutputSampleBufferDelegate's captureOutput(_, didOutput:from:) function to synchronously get results from the given video frame. Keep AVCaptureVideoDataOutput's alwaysDiscardsLateVideoFrames as true to throttle calls to the segmenter. If a new video frame becomes available while the segmenter is running, it will be dropped.
- If you use the output of the segmenter to overlay graphics on the input image, first get the result from ML Kit, then render the image and overlay in a single step. By doing so, you render to the display surface only once for each processed input frame. See the previewOverlayView and CameraViewController classes in the ML Kit quickstart sample for an example.
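The frame-dropping behavior described above is provided by AVCaptureVideoDataOutput itself when alwaysDiscardsLateVideoFrames is true. Purely to illustrate the idea, the sketch below implements the same "drop while busy" policy in plain Swift. The FrameGate class is hypothetical and is not part of ML Kit or AVFoundation. It could be useful if you hand frames to your own queue instead of processing them synchronously in the delegate callback.

```swift
import Foundation

/// Minimal "drop frames while busy" gate.
/// Hypothetical helper for illustration; not an ML Kit or AVFoundation API.
final class FrameGate {
  private let lock = NSLock()
  private var isBusy = false
  private var dropped = 0

  /// Number of frames skipped because another frame was still in flight.
  var droppedFrames: Int {
    lock.lock()
    defer { lock.unlock() }
    return dropped
  }

  /// Runs `work` unless a previous frame is still being processed.
  /// Returns true if `work` ran and false if the frame was dropped.
  @discardableResult
  func processIfIdle(_ work: () -> Void) -> Bool {
    lock.lock()
    if isBusy {
      dropped += 1
      lock.unlock()
      return false
    }
    isBusy = true
    lock.unlock()

    // The lock is not held while processing, so other callers return quickly.
    work()

    lock.lock()
    isBusy = false
    lock.unlock()
    return true
  }
}

// Example: a frame that arrives while another is in flight is dropped.
let gate = FrameGate()
var processedFrames = 0
gate.processIfIdle {
  processedFrames += 1
  gate.processIfIdle { processedFrames += 1 }  // dropped: gate is busy
}
gate.processIfIdle { processedFrames += 1 }    // runs: gate is idle again
```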
Last updated 2025-01-09 UTC.