A More Stable Gesture Recognition Method: Hand Skeleton and Keypoint Detection

Overview

This article introduces and demonstrates MediaPipe-based hand skeleton and keypoint extraction, and shows how to build gesture recognition on top of it.

Introduction

MediaPipe was covered in an earlier article; see the link below:

Google's Open-Source Gesture Recognition Based on TF Lite/MediaPipe

What can it do, and which languages and platforms does it support? See the two images below:

[Images: overview of MediaPipe solutions, and supported languages and platforms]

This article focuses on hand skeleton and keypoint extraction; other features are left for interested readers to explore. GitHub: https://github.com/google/mediapipe

Demo Results

Hand skeleton extraction and keypoint annotation:

Gesture recognition for 0~6:

Implementation Steps

For details, see the link below:

https://google.github.io/mediapipe/solutions/hands

(1) Install MediaPipe: run pip install mediapipe
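To confirm the installation succeeded, a minimal sanity check like the one below should import the package and print the hands solution module (a quick sketch; nothing is assumed beyond the import itself):

import mediapipe as mp

# If the install worked, this prints the hands solution module object.
print(mp.solutions.hands)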


(2) Download the hand detection and skeleton extraction models from:

https://github.com/google/mediapipe/tree/master/mediapipe/modules/hand_landmark


(3) Test the code (real-time webcam test):

import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

# For webcam input: video mode with tracking between frames.
hands = mp_hands.Hands(
    min_detection_confidence=0.5, min_tracking_confidence=0.5)
cap = cv2.VideoCapture(0)
while cap.isOpened():
  success, image = cap.read()
  if not success:
    print("Ignoring empty camera frame.")
    continue

  # Flip the image horizontally for a selfie view, and convert BGR to RGB
  # because MediaPipe expects RGB input.
  image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
  # Mark the image as not writeable to pass by reference and improve performance.
  image.flags.writeable = False
  results = hands.process(image)

  # Draw the hand annotations on the image.
  image.flags.writeable = True
  image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
  if results.multi_hand_landmarks:
    for hand_landmarks in results.multi_hand_landmarks:
      mp_drawing.draw_landmarks(
          image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
  cv2.imshow('result', image)
  if cv2.waitKey(5) & 0xFF == 27:  # press Esc to exit
    break
cv2.destroyAllWindows()
hands.close()
cap.release()

Output and results:

[Images: real-time hand skeleton and keypoint detection results]

Image detection (multiple hands supported):

import cv2
import mediapipe as mp
from os import listdir

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

# For static images: detection runs on every image (no tracking), up to 5 hands.
hands = mp_hands.Hands(
    static_image_mode=True,
    max_num_hands=5,
    min_detection_confidence=0.2)
img_path = './multi_hands/'
save_path = './'
index = 0
file_list = listdir(img_path)
for filename in file_list:
  index += 1
  file_path = img_path + filename
  # Read an image and flip it around the y-axis for correct handedness output.
  image = cv2.flip(cv2.imread(file_path), 1)
  # Convert the BGR image to RGB before processing.
  results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

  # Print handedness and draw hand landmarks on the image.
  print('Handedness:', results.multi_handedness)
  if not results.multi_hand_landmarks:
    continue
  image_height, image_width, _ = image.shape
  annotated_image = image.copy()
  for hand_landmarks in results.multi_hand_landmarks:
    print('hand_landmarks:', hand_landmarks)
    print(
        f'Index finger tip coordinates: (',
        f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x * image_width}, '
        f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y * image_height})'
    )
    mp_drawing.draw_landmarks(
        annotated_image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
  # Flip back before saving so the output matches the original orientation.
  cv2.imwrite(
      save_path + str(index) + '.png', cv2.flip(annotated_image, 1))
hands.close()


[Images: multi-hand detection results]

Summary and Additional Notes

Summary: Compared with traditional methods, MediaPipe's hand detection and skeleton extraction is more stable, and it provides 3D coordinates for the finger joints, which helps greatly with gesture recognition and further gesture-driven development.

Additional notes:

(1) The hand landmark indices and their ordering are defined as in the figure below:

[Image: MediaPipe hand landmark map, 21 keypoints: 0 = WRIST, 1-4 = thumb (CMC/MCP/IP/TIP), 5-8 = index finger, 9-12 = middle finger, 13-16 = ring finger, 17-20 = pinky (each MCP/PIP/DIP/TIP)]
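If the figure is not at hand, the same numbering can be listed directly from the mp_hands.HandLandmark enum; a quick sketch:

import mediapipe as mp

mp_hands = mp.solutions.hands

# Print each landmark index and its name (0 = WRIST ... 20 = PINKY_TIP).
for landmark in mp_hands.HandLandmark:
  print(landmark.value, landmark.name)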

(2) The hand landmark coordinates are normalized: x and y lie in [0, 1] relative to the image width and height, so they must be mapped back to pixel coordinates before being drawn on the image (you can jump to the definition of draw_landmarks in the source to see how this is done). The z value is a depth with the wrist as origin; per the MediaPipe documentation, the smaller (more negative) the value, the closer the landmark is to the camera. Demo code:

def Normalize_landmarks(image, hand_landmarks):
  new_landmarks = []
  for i in range(0, len(hand_landmarks.landmark)):
    float_x = hand_landmarks.landmark[i].x
    float_y = hand_landmarks.landmark[i].y
    # z is depth with the wrist as origin; smaller (more negative) values
    # mean the landmark is closer to the camera.
    float_z = hand_landmarks.landmark[i].z
    print(float_z)
    width = image.shape[1]
    height = image.shape[0]

    # Map normalized [0, 1] coordinates to pixel coordinates; this helper
    # returns None for points that fall outside the image.
    pt = mp_drawing._normalized_to_pixel_coordinates(float_x, float_y, width, height)
    new_landmarks.append(pt)
  return new_landmarks
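A possible usage sketch, assuming image and results come from the webcam loop above (note that _normalized_to_pixel_coordinates is a private MediaPipe helper, so entries may be None for points outside the frame):

# Usage sketch: label each landmark with its index at its pixel position.
if results.multi_hand_landmarks:
  for hand_landmarks in results.multi_hand_landmarks:
    points = Normalize_landmarks(image, hand_landmarks)
    for idx, pt in enumerate(points):
      if pt is not None:  # points outside the image come back as None
        cv2.putText(image, str(idx), pt,
                    cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 255, 0), 1)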

(3) Based on this, you can build a simple gesture recognizer, or a small program that reacts to the hand approaching or moving away from the screen. Of course, the raw joint coordinates alone are not enough; you may also need to compute joint angles and track previous states, for example like the sketch below:
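As a concrete illustration in the spirit of the 0~6 demo above, here is a minimal finger-counting heuristic. It is a sketch of my own, not part of MediaPipe: count_fingers is a hypothetical helper that assumes a roughly upright hand in the mirrored webcam view, compares each fingertip with its PIP joint (two indices earlier in the landmark map), and uses an x comparison for the thumb. A robust recognizer would also use joint angles and temporal state, as noted above.

# A minimal finger-counting heuristic (an illustrative sketch, not a MediaPipe API).
# Assumes a roughly upright hand; image y grows downward, so an extended
# fingertip sits above (has a smaller y than) its PIP joint.
def count_fingers(hand_landmarks):
  lm = hand_landmarks.landmark
  finger_tips = [mp_hands.HandLandmark.INDEX_FINGER_TIP,
                 mp_hands.HandLandmark.MIDDLE_FINGER_TIP,
                 mp_hands.HandLandmark.RING_FINGER_TIP,
                 mp_hands.HandLandmark.PINKY_TIP]
  count = 0
  for tip in finger_tips:
    if lm[tip].y < lm[tip - 2].y:  # tip above PIP -> finger extended
      count += 1
  # Thumb heuristic: compare x of tip and IP joint (depends on handedness
  # and on the horizontal flip applied to the frame).
  if lm[mp_hands.HandLandmark.THUMB_TIP].x < lm[mp_hands.HandLandmark.THUMB_IP].x:
    count += 1
  return count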