ML Kit provides two optimized SDKs for pose detection.
| SDK name | PoseDetection | PoseDetectionAccurate |
|---|---|---|
| Implementation | Assets for the base detector are statically linked to your app at build time. | Assets for the accurate detector are statically linked to your app at build time. |
| App size | Up to 29.6 MB | Up to 33.2 MB |
| Performance | iPhone X: ~45 FPS | iPhone X: ~29 FPS |
Try it out
- Play around with the sample app to see an example usage of this API.
Before you begin
Include the following ML Kit pods in your Podfile:
```ruby
# If you want to use the base implementation:
pod 'GoogleMLKit/PoseDetection', '15.5.0'

# If you want to use the accurate implementation:
pod 'GoogleMLKit/PoseDetectionAccurate', '15.5.0'
```
After you install or update your project's pods, open your Xcode project using its `.xcworkspace` file. ML Kit is supported in Xcode version 13.2.1 or higher.
1. Create an instance of PoseDetector
To detect a pose in an image, first create an instance of `PoseDetector` and optionally specify the detector settings.
PoseDetector options
Detection Mode
The `PoseDetector` operates in two detection modes. Be sure you choose the one that matches your use case.
- `stream` (default): The pose detector will first detect the most prominent person in the image and then run pose detection. In subsequent frames, the person-detection step will not be conducted unless the person becomes obscured or is no longer detected with high confidence. The pose detector will attempt to track the most prominent person and return their pose in each inference. This reduces latency and smooths detection. Use this mode when you want to detect pose in a video stream.
- `singleImage`: The pose detector will detect a person and then run pose detection. The person-detection step will run for every image, so latency will be higher, and there is no person tracking. Use this mode when using pose detection on static images or where tracking is not desired.
Specify the pose detector options:
Swift
```swift
// Base pose detector with streaming, when depending on the PoseDetection SDK
let options = PoseDetectorOptions()
options.detectorMode = .stream

// Accurate pose detector on static images, when depending on the
// PoseDetectionAccurate SDK
let options = AccuratePoseDetectorOptions()
options.detectorMode = .singleImage
```
Objective-C
```objectivec
// Base pose detector with streaming, when depending on the PoseDetection SDK
MLKPoseDetectorOptions *options = [[MLKPoseDetectorOptions alloc] init];
options.detectorMode = MLKPoseDetectorModeStream;

// Accurate pose detector on static images, when depending on the
// PoseDetectionAccurate SDK
MLKAccuratePoseDetectorOptions *options = [[MLKAccuratePoseDetectorOptions alloc] init];
options.detectorMode = MLKPoseDetectorModeSingleImage;
```
Finally, get an instance of `PoseDetector`, passing the options you specified:
Swift
```swift
let poseDetector = PoseDetector.poseDetector(options: options)
```
Objective-C
```objectivec
MLKPoseDetector *poseDetector = [MLKPoseDetector poseDetectorWithOptions:options];
```
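In a streaming use case you will typically create the detector once and reuse it for every frame, since `stream` mode tracks the same person across consecutive inferences. A minimal sketch, assuming the detector lives on an object that outlives individual frames; the `PoseFrameProcessor` class name and the import are placeholders for your own setup:

```swift
import MLKit  // Assumed umbrella module from the GoogleMLKit pods; adjust to your pod setup.

// Sketch: create the PoseDetector once and reuse it for every incoming frame,
// so that stream mode can track the same person across inferences.
final class PoseFrameProcessor {
  private let poseDetector: PoseDetector

  init() {
    let options = PoseDetectorOptions()
    options.detectorMode = .stream
    poseDetector = PoseDetector.poseDetector(options: options)
  }
}
```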
2. Prepare the input image
To detect poses, do the following for each image or frame of video.
If you enabled stream mode, you must create `VisionImage` objects from `CMSampleBuffer`s.
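For context, here is a minimal sketch of where those `CMSampleBuffer`s typically come from, assuming you capture frames with an `AVCaptureSession`; the `CameraFeed` class name and queue label are placeholders:

```swift
import AVFoundation

// A hypothetical helper that delivers CMSampleBuffers from the back camera.
final class CameraFeed: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
  private let session = AVCaptureSession()

  func start() {
    guard let camera = AVCaptureDevice.default(.builtInWideAngleCamera,
                                               for: .video,
                                               position: .back),
          let input = try? AVCaptureDeviceInput(device: camera) else { return }
    session.addInput(input)

    let output = AVCaptureVideoDataOutput()
    // Drop frames that arrive while a previous frame is still being processed.
    output.alwaysDiscardsLateVideoFrames = true
    output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "camera.frames"))
    session.addOutput(output)
    session.startRunning()
  }

  func captureOutput(_ output: AVCaptureOutput,
                     didOutput sampleBuffer: CMSampleBuffer,
                     from connection: AVCaptureConnection) {
    // Wrap each CMSampleBuffer in a VisionImage as shown in the steps below.
  }
}
```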
Create a `VisionImage` object using a `UIImage` or a `CMSampleBuffer`.

If you use a `UIImage`, follow these steps:
- Create a `VisionImage` object with the `UIImage`. Make sure to specify the correct `.orientation`.

Swift

```swift
let visionImage = VisionImage(image: image)
visionImage.orientation = image.imageOrientation
```
Objective-C
```objectivec
MLKVisionImage *visionImage = [[MLKVisionImage alloc] initWithImage:image];
visionImage.orientation = image.imageOrientation;
```
If you use a `CMSampleBuffer`, follow these steps:

- Specify the orientation of the image data contained in the `CMSampleBuffer`. To get the image orientation:

Swift

```swift
func imageOrientation(
  deviceOrientation: UIDeviceOrientation,
  cameraPosition: AVCaptureDevice.Position
) -> UIImage.Orientation {
  switch deviceOrientation {
  case .portrait:
    return cameraPosition == .front ? .leftMirrored : .right
  case .landscapeLeft:
    return cameraPosition == .front ? .downMirrored : .up
  case .portraitUpsideDown:
    return cameraPosition == .front ? .rightMirrored : .left
  case .landscapeRight:
    return cameraPosition == .front ? .upMirrored : .down
  case .faceDown, .faceUp, .unknown:
    return .up
  }
}
```
Objective-C
```objectivec
- (UIImageOrientation)
    imageOrientationFromDeviceOrientation:(UIDeviceOrientation)deviceOrientation
                           cameraPosition:(AVCaptureDevicePosition)cameraPosition {
  switch (deviceOrientation) {
    case UIDeviceOrientationPortrait:
      return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationLeftMirrored
                                                            : UIImageOrientationRight;
    case UIDeviceOrientationLandscapeLeft:
      return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationDownMirrored
                                                            : UIImageOrientationUp;
    case UIDeviceOrientationPortraitUpsideDown:
      return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationRightMirrored
                                                            : UIImageOrientationLeft;
    case UIDeviceOrientationLandscapeRight:
      return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationUpMirrored
                                                            : UIImageOrientationDown;
    case UIDeviceOrientationUnknown:
    case UIDeviceOrientationFaceUp:
    case UIDeviceOrientationFaceDown:
      return UIImageOrientationUp;
  }
}
```
- Create a `VisionImage` object using the `CMSampleBuffer` object and orientation:

Swift

```swift
let image = VisionImage(buffer: sampleBuffer)
image.orientation = imageOrientation(
  deviceOrientation: UIDevice.current.orientation,
  cameraPosition: cameraPosition)
```
Objective-C
```objectivec
MLKVisionImage *image = [[MLKVisionImage alloc] initWithBuffer:sampleBuffer];
image.orientation =
    [self imageOrientationFromDeviceOrientation:UIDevice.currentDevice.orientation
                                 cameraPosition:cameraPosition];
```
3. Process the image
Pass the `VisionImage` to one of the pose detector's image processing methods. You can either use the asynchronous `process(image:)` method or the synchronous `results(in:)` method.

To detect poses synchronously:
Swift
```swift
var detectedPoses: [Pose]
do {
  detectedPoses = try poseDetector.results(in: image)
} catch let error {
  print("Failed to detect pose with error: \(error.localizedDescription).")
  return
}
guard !detectedPoses.isEmpty else {
  print("Pose detector returned no results.")
  return
}

// Success. Get pose landmarks here.
```
Objective-C
```objectivec
NSError *error;
NSArray<MLKPose *> *poses = [poseDetector resultsInImage:image error:&error];
if (error != nil) {
  // Error.
  return;
}
if (poses.count == 0) {
  // No pose detected.
  return;
}

// Success. Get pose landmarks here.
```

To detect poses asynchronously:
Swift
```swift
poseDetector.process(image) { detectedPoses, error in
  guard error == nil else {
    // Error.
    return
  }
  guard let detectedPoses = detectedPoses, !detectedPoses.isEmpty else {
    // No pose detected.
    return
  }

  // Success. Get pose landmarks here.
}
```
Objective-C
```objectivec
[poseDetector processImage:image
                completion:^(NSArray<MLKPose *> * _Nullable poses, NSError * _Nullable error) {
  if (error != nil) {
    // Error.
    return;
  }
  if (poses.count == 0) {
    // No pose detected.
    return;
  }

  // Success. Get pose landmarks here.
}];
```

4. Get information about the detected pose
If a person is detected in the image, the pose detection API either passes an array of `Pose` objects to the completion handler or returns the array, depending on whether you called the asynchronous or synchronous method.

If the person was not completely inside the image, the model assigns the missing landmarks coordinates outside the frame and gives them low in-frame confidence (`inFrameLikelihood`) values.

If no person was detected, the array is empty.
Swift
```swift
for pose in detectedPoses {
  let leftAnkleLandmark = pose.landmark(ofType: .leftAnkle)
  if leftAnkleLandmark.inFrameLikelihood > 0.5 {
    let position = leftAnkleLandmark.position
  }
}
```
Objective-C
```objectivec
for (MLKPose *pose in detectedPoses) {
  MLKPoseLandmark *leftAnkleLandmark = [pose landmarkOfType:MLKPoseLandmarkTypeLeftAnkle];
  if (leftAnkleLandmark.inFrameLikelihood > 0.5) {
    MLKVision3DPoint *position = leftAnkleLandmark.position;
  }
}
```
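As a starting point for working with landmark positions, here is a small sketch that computes the angle at a joint from three landmarks, for example the angle at the right knee formed by the right hip, knee, and ankle. It assumes a `pose` obtained from the previous step and that landmark positions expose `x` and `y` coordinates as shown above; the `angle` helper is illustrative, not part of the SDK:

```swift
import CoreGraphics

// Sketch: the inner angle (in degrees) at `midLandmark`, formed by the
// segments toward `firstLandmark` and `lastLandmark`.
func angle(firstLandmark: PoseLandmark,
           midLandmark: PoseLandmark,
           lastLandmark: PoseLandmark) -> CGFloat {
  let radians: CGFloat =
    atan2(lastLandmark.position.y - midLandmark.position.y,
          lastLandmark.position.x - midLandmark.position.x) -
    atan2(firstLandmark.position.y - midLandmark.position.y,
          firstLandmark.position.x - midLandmark.position.x)
  var degrees = radians * 180.0 / .pi
  degrees = abs(degrees)       // Angles are reported as non-negative values.
  if degrees > 180.0 {
    degrees = 360.0 - degrees  // Always report the inner angle.
  }
  return degrees
}

// For example, the angle at the right knee:
let rightKneeAngle = angle(
  firstLandmark: pose.landmark(ofType: .rightHip),
  midLandmark: pose.landmark(ofType: .rightKnee),
  lastLandmark: pose.landmark(ofType: .rightAnkle))
```

Joint angles like this are a common input for the rule-based pose classification discussed in the Next steps section.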
Tips to improve performance
The quality of your results depends on the quality of the input image:
- For ML Kit to accurately detect pose, the person in the image should be represented by sufficient pixel data; for best performance, the subject should be at least 256x256 pixels.
- If you detect pose in a real-time application, you might also want to consider the overall dimensions of the input images. Smaller images can be processed faster, so to reduce latency, capture images at lower resolutions, but keep in mind the above resolution requirements and ensure that the subject occupies as much of the image as possible.
- Poor image focus can also impact accuracy. If you don't get acceptable results, ask the user to recapture the image.
If you want to use pose detection in a real-time application, follow these guidelines to achieve the best framerates:
- Use the base PoseDetection SDK and `stream` detection mode.
- Consider capturing images at a lower resolution. However, also keep in mind this API's image dimension requirements.
- For processing video frames, use the `results(in:)` synchronous API of the detector. Call this method from the `AVCaptureVideoDataOutputSampleBufferDelegate`'s `captureOutput(_, didOutput:from:)` function to synchronously get results from the given video frame. Keep `AVCaptureVideoDataOutput`'s `alwaysDiscardsLateVideoFrames` as `true` to throttle calls to the detector. If a new video frame becomes available while the detector is running, it will be dropped. (A sketch of this callback follows this list.)
- If you use the output of the detector to overlay graphics on the input image, first get the result from ML Kit, then render the image and overlay in a single step. By doing so, you render to the display surface only once for each processed input frame. See the previewOverlayView and MLKDetectionOverlayView classes in the showcase sample app for an example.
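Putting those guidelines together, a minimal sketch of such a delegate callback might look like the following. It assumes the `poseDetector`, the `imageOrientation(deviceOrientation:cameraPosition:)` helper, and `cameraPosition` from the earlier steps, plus a hypothetical `overlayView` that knows how to draw landmarks:

```swift
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
  let image = VisionImage(buffer: sampleBuffer)
  image.orientation = imageOrientation(
    deviceOrientation: UIDevice.current.orientation,
    cameraPosition: cameraPosition)

  // Synchronous call on the capture queue; late frames are dropped because
  // alwaysDiscardsLateVideoFrames is left at true on the video output.
  guard let poses = try? poseDetector.results(in: image), !poses.isEmpty else {
    return
  }

  DispatchQueue.main.sync {
    // Render the camera frame and the pose overlay in a single pass.
    // `overlayView` is a hypothetical view that draws the detected landmarks.
    overlayView.update(with: poses)
  }
}
```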
Next steps
- To learn how to use pose landmarks to classify poses, see Pose Classification Tips.
- See the ML Kit quickstart sample on GitHub for an example of this API in use.