Articulated Body Pose Estimation: Unlocking Human Motion in Computer Vision
Ebook · 114 pages · 1 hour


About this ebook

What is Articulated Body Pose Estimation?


In the field of computer vision, articulated body pose estimation is the study of techniques and systems that recover the pose of an articulated body, one composed of joints and rigid parts, from image-based observations. It is one of the longest-standing challenges in computer vision, both because of the complexity of the models that relate observation to pose and because of the range of scenarios in which it would be useful.


How you will benefit


(I) Insights and validations about the following topics:


Chapter 1: Articulated body pose estimation


Chapter 2: Image segmentation


Chapter 3: Simultaneous localization and mapping


Chapter 4: Gesture recognition


Chapter 5: Video tracking


Chapter 6: Fundamental matrix (computer vision)


Chapter 7: Structure from motion


Chapter 8: Bag-of-words model in computer vision


Chapter 9: Point-set registration


Chapter 10: Michael J. Black


(II) Answers to the public's top questions about articulated body pose estimation.


(III) Real-world examples of the use of articulated body pose estimation in many fields.


Who this book is for


Professionals, undergraduate and graduate students, enthusiasts, hobbyists, and anyone who wants to go beyond a basic knowledge of articulated body pose estimation.

Language: English
Release date: Apr 29, 2024

    Book preview

    Articulated Body Pose Estimation - Fouad Sabry

    Chapter 1: Articulated body pose estimation

    The field of computer vision known as articulated body pose estimation focuses on techniques and systems that recover the pose of an articulated body, one composed of joints and rigid parts, from image-based observations. The complexity of the models that relate observation to pose, together with the wide range of applications, has made this an enduring challenge in computer vision.

    Robots need the ability to detect and understand the presence of people in their immediate vicinity. If a human uses gestures to point to a specific object, the interactive machine should comprehend the real-world context of the scenario. Because of its significance and difficulty, many methods have been developed and implemented over the past two decades to address pose estimation in computer vision. Training complicated models on enormous data sets is a common approach.

    With 244 degrees of freedom across 230 joints, the human body makes pose estimation a challenging problem and an active area of study. Simplified models describe the body with roughly 10 major parts and 20 degrees of freedom, yet not all movements between joints are visible. Algorithms must also account for large variation in appearance, including differences in clothing, body shape, size, and hairstyle. In addition, occlusions, whether self-occlusions such as a person covering their face with a hand or external occlusions, can make the results ambiguous. Finally, most algorithms estimate pose from monocular (two-dimensional) images taken with a standard camera, where inconsistent camera and illumination conditions compound the problem and additional performance requirements add further complexity. Because such images lack the depth information of a real body posture, they leave much room for interpretation errors. Recent efforts in this direction make use of the color and depth information captured by RGBD cameras.

    Most articulated body pose estimation systems use a model-based technique, in which the estimated pose is the one that maximizes the similarity (or minimizes the difference) between an observation (input) and a template model. Various sensors have been considered for making the observation, including:

    Imaging at visible wavelengths, long-wave infrared imaging, time-of-flight imaging, and laser range-finder imaging.

    The model makes direct use of the intermediate representations produced by these sensors. Such representations include:

    Image appearance, voxel (volume element) reconstructions, three-dimensional Gaussian kernel models, and three-dimensional surface meshes.

    The concept of a part-based model first emerged from the human skeleton. An object capable of articulation can be decomposed into component parts that can be rearranged into a variety of configurations, with the scales and orientations of the parts articulated to the scale and orientation of the primary object. The model can be described mathematically as components connected by springs, which is why it is also called a spring model. The compression and expansion of the springs account for the relative proximity of the various components, and geometry limits the spring orientations: the legs, for instance, cannot rotate through a full circle the way arms can, so components cannot take on arbitrary orientations, which reduces the number of viable configurations.

    In the spring model, nodes (V) represent the components, while edges (E) represent the springs that connect them.

    Each location in the image is identified by the x and y coordinates of the pixel.

    Let $\mathbf{p}_i(x, y)$ be the point at the $i$-th location.

    Then the cost associated with joining the spring between the $i$-th and the $j$-th points is given by $S(\mathbf{p}_i, \mathbf{p}_j) = S(\mathbf{p}_i - \mathbf{p}_j)$.

    Hence the total cost associated with placing $l$ components at locations $\mathbf{P}_l$ is given by

    $$S(\mathbf{P}_l) = \sum_{i=1}^{l} \sum_{j=1}^{i} s_{ij}(\mathbf{p}_i, \mathbf{p}_j)$$

    The equation above is a simplification of the spring model commonly used to describe body pose. Pose is estimated from images by minimizing a cost or energy function with two terms: the first measures how well each part matches the image data, while the second measures how well the oriented (deformed) parts match one another, so that both articulation and object detection are taken into account.
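The pairwise spring cost above can be sketched in code. This is a minimal illustration, assuming a quadratic deformation penalty around a rest offset; the function names and the specific cost form are illustrative choices, not the book's notation.

```python
import numpy as np

def spring_cost(p_i, p_j, rest_offset, stiffness=1.0):
    # Quadratic deformation cost of the spring joining parts i and j:
    # penalize how far (p_i - p_j) deviates from the ideal rest offset.
    d = (np.asarray(p_i, dtype=float) - np.asarray(p_j, dtype=float)
         - np.asarray(rest_offset, dtype=float))
    return stiffness * float(d @ d)

def total_cost(locations, springs):
    # locations: list of (x, y) part positions, i.e. P_l.
    # springs: dict mapping a part pair (i, j) to its rest offset,
    # mirroring the double sum over s_ij(p_i, p_j) above.
    return sum(spring_cost(locations[i], locations[j], rest)
               for (i, j), rest in springs.items())
```

For example, a two-part model with a head expected 10 pixels above the torso has zero cost when the parts sit exactly at their rest offset, and the cost grows quadratically as the spring stretches.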

    A hierarchical chain is used to build the kinematic skeleton.

    Each rigid body segment has its own local coordinate system, which can be transformed to the world coordinate system via a 4×4 transformation matrix $T_l$:

    $$T_l = T_{\operatorname{par}(l)} R_l,$$

    where $R_l$ denotes the local transformation from body segment $S_l$ to its parent $\operatorname{par}(S_l)$.

    There are three degrees of freedom (DoF) of movement at each human joint.

    Given a transformation matrix $T_l$, the T-pose joint position can be transformed to the world coordinate system.

    In numerous works, the 3D joint rotation is expressed as a normalized quaternion $[x, y, z, w]$ because its continuity facilitates gradient-based optimization in parameter estimation.
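The composition of transforms down the hierarchical chain, $T_l = T_{\operatorname{par}(l)} R_l$, can be sketched as follows. This is a minimal sketch that, for brevity, gives each joint a single rotational degree of freedom (rotation about the z-axis) rather than the full three; the function names are illustrative.

```python
import numpy as np

def rot_z(theta):
    # Homogeneous 4x4 rotation about the z-axis (one DoF for brevity;
    # a real human joint carries three rotational DoF).
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0, 0.0],
                     [s,  c, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def translate(x, y, z):
    # Homogeneous 4x4 translation (e.g. a fixed bone offset to the parent joint).
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def world_transforms(local_transforms, parent):
    # Compose T_l = T_par(l) @ R_l down the hierarchical chain.
    # parent[l] is the parent segment's index, or -1 for the root.
    T = [None] * len(local_transforms)
    for l, R in enumerate(local_transforms):
        T[l] = R.copy() if parent[l] == -1 else T[parent[l]] @ R
    return T
```

Rotating the root segment by 90 degrees carries a child segment's unit offset along with it, which is exactly the behavior the hierarchical composition is meant to capture.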

    Since around 2016, deep learning has become the standard technique for estimating the poses of articulated bodies. Rather than building an explicit model of the parts as above, the appearance of the joints and the relationships between them are learned from vast training sets. Models typically focus on extracting 2D joint positions (keypoints), 3D joint positions, or 3D body shape from one or several photos.

    The initial deep learning models were primarily concerned with determining the 2D locations of human joints in a given image. To detect joints, these models feed an input image through a convolutional neural network, which produces a set of heatmaps (one per joint) with high values where the joint is likely to be.
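The heatmap decoding step can be sketched as follows, assuming the network's output is already available as a NumPy array of shape (num_joints, H, W). Taking the per-heatmap argmax is a common decoding choice, though real systems often refine it to sub-pixel precision.

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps):
    # heatmaps: array of shape (num_joints, H, W), e.g. a CNN's output.
    # Decode each joint's 2D location as the pixel with the highest score.
    keypoints = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        keypoints.append((int(x), int(y)))
    return keypoints
```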

    With the proliferation of datasets containing human pose annotations from various angles, and alongside the research above, scientists have also tried to reconstruct the 3D form of a person or animal from collections of 2D photographs. Estimating the correct pose of the skinned multi-person linear (SMPL) model is a main focus. For each person or animal in the image, keypoints and a silhouette are typically detected; once found, the parameters of a 3D shape model are adapted to correspond with their locations.
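The "adapt model parameters to match detected keypoints" step can be illustrated with a deliberately tiny stand-in: instead of the full SMPL parameter set, fit only a scale and translation that best align a template's keypoints with the detected ones, which admits a closed-form least-squares solution. This is an assumption-laden toy, not the actual SMPL fitting pipeline.

```python
import numpy as np

def fit_scale_translation(template, detected):
    # template, detected: (N, 2) arrays of corresponding 2D keypoints.
    # Closed-form least-squares solution of
    #   min_{s, t}  sum_k || s * template_k + t - detected_k ||^2,
    # a toy stand-in for fitting a shape model's parameters to keypoints.
    tm, dm = template.mean(axis=0), detected.mean(axis=0)
    tc, dc = template - tm, detected - dm
    s = float((tc * dc).sum() / (tc * tc).sum())
    t = dm - s * tm
    return s, t
```

The real pipeline optimizes pose and shape coefficients against keypoint and silhouette terms, but the underlying idea is the same: choose the parameters that minimize the discrepancy between the model's projected landmarks and the detections.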

    Annotated photos are essential for the algorithms above, but their creation can be laborious. To address this, computer vision researchers have created algorithms that can recognize keypoints in videos without any annotations, or learn 3D keypoints given only annotated 2D images from a single view.

    In the not-too-distant future, assisted living facilities may make use of personal care robots. In order for these robots
