Google MediaPipe Solutions

MediaPipe is an open-source project by Google that offers cross-platform machine learning (ML) and AI solutions for live and streaming media. You can also customize the solution code to meet your application's needs.

MediaPipe offers two main components: MediaPipe Solutions and MediaPipe Framework.

MediaPipe Solutions provides a suite of libraries and tools for you to quickly apply artificial intelligence (AI) and machine learning (ML) techniques in your applications.

MediaPipe Framework is a low-level component for building machine learning pipelines.

The hand landmark model bundle localizes 21 hand-knuckle keypoints within each detected hand region.
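As a rough illustration of working with the 21 keypoints, the sketch below converts landmark coordinates to pixel positions. It assumes the landmarks arrive as (x, y) pairs normalized to the range 0 to 1 relative to the image size; the function name `to_pixels` and the sample data are hypothetical and not part of MediaPipe.

```python
from typing import List, Tuple

NUM_HAND_LANDMARKS = 21  # keypoints per detected hand, as described above


def to_pixels(
    landmarks: List[Tuple[float, float]], width: int, height: int
) -> List[Tuple[int, int]]:
    """Convert normalized (x, y) landmarks to integer pixel coordinates.

    Assumption: x and y are normalized to [0, 1] by image width and height.
    Values are clamped so that the result always lies inside the image.
    """
    if len(landmarks) != NUM_HAND_LANDMARKS:
        raise ValueError(f"expected {NUM_HAND_LANDMARKS} landmarks, got {len(landmarks)}")
    pixels = []
    for x, y in landmarks:
        px = max(0, min(int(x * width), width - 1))
        py = max(0, min(int(y * height), height - 1))
        pixels.append((px, py))
    return pixels


# Hypothetical data: 21 points spread along the image diagonal.
fake_hand = [(i / 20, i / 20) for i in range(NUM_HAND_LANDMARKS)]
pixels = to_pixels(fake_hand, width=640, height=480)
print(pixels[0], pixels[10], pixels[-1])
```

Clamping keeps a landmark that sits exactly on the right or bottom edge from producing an out-of-range pixel index.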

The Face Landmarker uses a series of models to predict face landmarks. The first model detects faces, a second model locates landmarks on the detected faces, and a third model uses those landmarks to identify facial features and expressions.

The following models are packaged together into a downloadable model bundle:

Face detection model: detects the presence of faces with a few key facial landmarks.

Face mesh model: adds a complete mapping of the face. The model outputs an estimate of 478 3-dimensional face landmarks.

Blendshape prediction model: receives output from the face mesh model and predicts 52 blendshape scores, which are coefficients representing different facial expressions.
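One way to use the blendshape scores is to pick the strongest coefficients and treat them as the active expression. The sketch below assumes the scores are available as a name-to-value mapping with values between 0 and 1; the helper `top_blendshapes`, the threshold, and the sample values are hypothetical and shown only for illustration.

```python
from typing import Dict, List, Tuple


def top_blendshapes(
    scores: Dict[str, float], k: int = 3, threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return up to k blendshapes whose score is at least `threshold`,
    ordered from strongest to weakest."""
    active = [(name, score) for name, score in scores.items() if score >= threshold]
    return sorted(active, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical subset of the 52 scores (illustrative values only).
sample_scores = {
    "jawOpen": 0.82,
    "eyeBlinkLeft": 0.10,
    "mouthSmileLeft": 0.67,
    "mouthSmileRight": 0.64,
    "browInnerUp": 0.05,
}
print(top_blendshapes(sample_scores, k=2))
```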

The Pose Landmarker uses a series of models to predict pose landmarks. The first model detects the presence of human bodies within an image frame, and the second model locates landmarks on the bodies. The pose landmarker model tracks 33 body landmark locations, representing the approximate location of the following body parts: 

0 - nose

1 - left eye (inner)

2 - left eye

3 - left eye (outer)

4 - right eye (inner)

5 - right eye

6 - right eye (outer)

7 - left ear

8 - right ear

9 - mouth (left)

10 - mouth (right)

11 - left shoulder

12 - right shoulder

13 - left elbow

14 - right elbow

15 - left wrist

16 - right wrist

17 - left pinky

18 - right pinky

19 - left index

20 - right index

21 - left thumb

22 - right thumb

23 - left hip

24 - right hip

25 - left knee

26 - right knee

27 - left ankle

28 - right ankle

29 - left heel

30 - right heel

31 - left foot index

32 - right foot index
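The index list above can be turned into named constants, which makes downstream geometry easier to read. The following sketch computes the angle at the left elbow from three of the landmarks. It assumes each landmark is an (x, y) pair; the `PoseLandmark` enum here is a local, partial helper defined for this example, and the coordinates are made up.

```python
import math
from enum import IntEnum
from typing import Sequence, Tuple


class PoseLandmark(IntEnum):
    """Subset of the 33 pose landmark indices listed above."""
    NOSE = 0
    LEFT_SHOULDER = 11
    RIGHT_SHOULDER = 12
    LEFT_ELBOW = 13
    RIGHT_ELBOW = 14
    LEFT_WRIST = 15
    RIGHT_WRIST = 16


Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("points must be distinct to define an angle")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against tiny floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def left_elbow_angle(pose: Sequence[Point]) -> float:
    """Elbow angle from a sequence of 33 (x, y) pose landmarks."""
    return joint_angle(
        pose[PoseLandmark.LEFT_SHOULDER],
        pose[PoseLandmark.LEFT_ELBOW],
        pose[PoseLandmark.LEFT_WRIST],
    )


# Made-up pose: upper arm vertical, forearm horizontal.
pose = [(0.0, 0.0)] * 33
pose[PoseLandmark.LEFT_SHOULDER] = (0.5, 0.2)
pose[PoseLandmark.LEFT_ELBOW] = (0.5, 0.4)
pose[PoseLandmark.LEFT_WRIST] = (0.7, 0.4)
print(round(left_elbow_angle(pose), 1))
```

The same `joint_angle` helper works for knees or hips by passing the corresponding indices, for example 23, 25, and 27 for the left leg.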

The MediaPipe Holistic Landmarker task lets you combine components of the pose, face, and hand landmarkers to create a complete landmarker for the human body. You can use this task to analyze full-body gestures, poses, and actions. The task outputs a total of 543 landmarks in real time: 33 pose landmarks, 468 face landmarks, and 21 hand landmarks per hand (33 + 468 + 2 × 21 = 543).
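The landmark total can be checked with a few lines of arithmetic. The sketch below also shows one possible way to lay the groups out in a single flat list; the ordering (pose, face, left hand, right hand) and the `flat_offsets` helper are assumptions made for this example, not a layout defined by MediaPipe.

```python
# Landmark counts taken from the description above.
POSE_LANDMARKS = 33
FACE_LANDMARKS = 468
HAND_LANDMARKS = 21  # per hand


def flat_offsets():
    """Return {group: (start, end)} index ranges (end exclusive) for a
    hypothetical flat list holding all holistic landmarks."""
    sizes = [
        ("pose", POSE_LANDMARKS),
        ("face", FACE_LANDMARKS),
        ("left_hand", HAND_LANDMARKS),
        ("right_hand", HAND_LANDMARKS),
    ]
    offsets = {}
    start = 0
    for name, size in sizes:
        offsets[name] = (start, start + size)
        start += size
    return offsets


offsets = flat_offsets()
total = offsets["right_hand"][1]  # end of the last group equals the total count
print(total)  # 543
```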

The Selfie Segmentation model predicts a binary segmentation mask of the foreground containing humans. Variations of the model power background replacement in Google Meet, and a more general model is now available in TensorFlow.js and MediaPipe.
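Background replacement with a binary mask amounts to choosing, per pixel, between the camera frame and a replacement image. The sketch below illustrates the idea on tiny grayscale grids in plain Python. It assumes the mask is derived by thresholding a per-pixel confidence map; the function names, the 0.5 threshold, and the sample values are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def binarize(confidence: Grid, threshold: float = 0.5) -> List[List[int]]:
    """Turn a per-pixel foreground confidence map into a 0/1 mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in confidence]


def replace_background(
    frame: List[List[int]], mask: List[List[int]], background: List[List[int]]
) -> List[List[int]]:
    """Keep frame pixels where mask is 1, use background pixels elsewhere."""
    return [
        [f if m else b for f, m, b in zip(frame_row, mask_row, bg_row)]
        for frame_row, mask_row, bg_row in zip(frame, mask, background)
    ]


# Made-up 2x3 example: person pixels are 255, the new background is 0.
confidence = [[0.9, 0.2, 0.7], [0.1, 0.6, 0.4]]
frame = [[255, 255, 255], [255, 255, 255]]
background = [[0, 0, 0], [0, 0, 0]]

mask = binarize(confidence)
result = replace_background(frame, mask, background)
print(mask)
print(result)
```

With image arrays, the same per-pixel selection is usually done in one vectorized step rather than nested loops.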

MediaPipe Model Maker is a Python library for customizing models for your use case. It relies on transfer learning, a popular machine learning technique that retrains an existing model on new data, so the customized model keeps the same behavior and stays in the same domain as the original.

Related - Body Segmentation with MediaPipe and TensorFlow.js
