Action recognition models such as PoseClassificationNet have been around for some time, helping systems identify and classify human actions like walking, waving, or picking up objects. While the concept is well-established, the challenge lies in building a robust computer vision model that can accurately recognize the range of actions across different scenarios that are domain- or use case–specific.
One of the key hurdles is acquiring enough training data, adding the classes needed for a unique use case, and training such a model effectively. Instead of relying solely on real-world data, which can be time-consuming and expensive to collect, synthetic data generation (SDG) is quickly becoming an effective and practical solution.
SDG is the process of creating artificial data from physically accurate 3D simulations that mimic real-world data. The model training process is iterative and often requires more data to cover specific scenes, add new classes, and increase scene diversity, ensuring that the model evolves efficiently.
This post discusses the steps used to create synthetic data using NVIDIA Isaac Sim, a reference application built on NVIDIA Omniverse for simulating and validating robots, for multiple domains: Retail, Sports, Warehouse, and Hospital.
We customized the PoseClassificationNet action recognition model with NVIDIA TAO fine-tuning capabilities and tested it on real data. The outlined steps also apply to creating synthetic data for many other domains and use cases.
Creating a human action recognition video dataset with Isaac Sim
To begin using Isaac Sim, start with the Isaac Sim Hello World video. To create the action recognition model, you need actions such as picking up an apple. From these actions, you can extract key points, which serve as the input for the action recognition model. You can obtain action animations from any third-party vendor or create these animations from real videos.
Omni.Replicator.Agent (ORA), an Isaac Sim extension, is designed to generate synthetic data on human characters and robots across a variety of 3D environments. The ORA extension offers the following features:
- Multi-camera consistency
- Multi-sensor logging
- Custom DataWriter support (skeletal data, 2D position, and segmentation)
- Position and orientation randomization for characters, agents, and objects
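If you script your workflow rather than enabling the extension in the UI, you can turn it on through the Kit extension manager. The following is a minimal sketch; the extension ID shown matches the version referenced later in this post and may differ in other releases.

import omni.kit.app

# Enable the ORA core extension from a script (the extension ID is an assumption
# based on the version referenced later in this post).
ext_manager = omni.kit.app.get_app().get_extension_manager()
ext_manager.set_extension_enabled_immediate("omni.replicator.agent.core", True)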
Select SimReady assets and environments in Omniverse
Isaac Sim has more than a thousand SimReady assets that can be used for your 3D simulation. Pre-built environments such as hospital scenes, warehouse digital twins, and retail stores are available, with a selection of over 10K usable assets. To display the available selection, choose Windows > Browsers > Assets.
You can also create your customized assets and environments in Omniverse. For more information, see Environmental setup.
Figure 3 shows some SimReady assets.
Set the ORA Core extension configuration file
Every job uses a config file to specify the path to the scene and the character assets that must be added to the scene. Additional properties can be added and accessed when the agent core SDG extension loads job information.
omni.replicator.agent:
  character:
    asset_path: http://omniverse-content-production.s3-us-west-2.amazonaws.com/Assets/Isaac/4.2/Isaac/People/Characters/
    num: 1
  global:
    camera_num: 4
    seed: 1777061627
  replicator:
    parameters:
      bbox: true
      output_dir: /media/scratch.metropolis2/sdg_data_action_recognition/sdg_warehouse/warehouse_aisle_walking_f_0
      rgb: true
      video: true
      writer: ActionWriter
  scene:
    asset_path: http://omniverse-content-production.s3-us-west-2.amazonaws.com/Assets/Isaac/4.2/Isaac/Environments/Simple_Warehouse/full_warehouse.usd
  version: 0.1.0
You can specify character assets, generation settings, output configurations, and scene environments in the replicator configuration, with options for custom data recording and various output modes.
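As a brief illustration of working with this configuration outside the UI, the following sketch loads the YAML shown above, adjusts the output directory and camera count for a new job, and writes a per-job copy. The file names and paths are examples, not part of the extension.

import yaml

# Load the ORA config shown above (file names are examples).
with open("warehouse_config.yaml") as f:
    cfg = yaml.safe_load(f)

# Point this job at its own output directory and set the number of camera views.
cfg["omni.replicator.agent"]["replicator"]["parameters"]["output_dir"] = "/tmp/sdg_run_001"
cfg["omni.replicator.agent"]["global"]["camera_num"] = 4

with open("warehouse_config_job_001.yaml", "w") as f:
    yaml.safe_dump(cfg, f)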
Configure and place cameras
Use the extension's multi-view camera consistency by setting the camera count (the camera_num property in the configuration file) to the desired number of views and manually placing the cameras in the scene.
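If you prefer to place the cameras from a script instead of dragging them in the viewport, a minimal sketch using the USD API might look like the following; the camera paths and positions are illustrative only.

import omni.usd
from pxr import UsdGeom, Gf

# Define four cameras (matching camera_num: 4) and place them around the scene.
stage = omni.usd.get_context().get_stage()
positions = [(6.0, 6.0, 2.5), (-6.0, 6.0, 2.5), (6.0, -6.0, 2.5), (-6.0, -6.0, 2.5)]
for i, pos in enumerate(positions):
    cam = UsdGeom.Camera.Define(stage, f"/World/Cameras/Camera_{i:02d}")
    UsdGeom.XformCommonAPI(cam.GetPrim()).SetTranslate(Gf.Vec3d(*pos))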
Customize the ORA extension (advanced)
We provide the source code for the ORA extension, as visualized in the following directory tree, under the file path /isaac-sim-4.0.0-rc.20/extscache/omni.replicator.agent.core-0.2.3.
|-- data_generation.py
|-- randomization
| |-- camera_randomizer.py
| |-- character_randomizer.py
|-- simulation.py
The ORA extension is written in Python. Here is a breakdown of the main modifiable files and folders in the ORA extension:
- simulation.py: Contains code (the SimulationManager class) to open the context stage and refresh the job scene to start different counts of jobs.
- data_generation.py: Loads a config file when prompted by SimulationManager and starts recording simulation data asynchronously.
- /randomization: Folder where camera and character spawning properties (rotation and position ranges) can be logically rewritten.
- /writers: Location to add custom writers to record different types of data and store them in the output_dir folder (segmentation maps, custom skeleton data, and so on); see the sketch after this list.
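For reference, the sketch below shows the general shape of a custom Omniverse Replicator writer. The class name and output handling here are hypothetical; ORA's own ActionWriter in the /writers folder is the authoritative example and attaches additional annotators for skeleton data.

import omni.replicator.core as rep

class MySkeletonWriter(rep.Writer):
    """Hypothetical custom writer; ORA's ActionWriter is the authoritative example."""
    def __init__(self, output_dir):
        self._output_dir = output_dir
        self._frame_id = 0
        # Request the annotators this writer consumes; ORA's ActionWriter also
        # attaches skeleton and 2D-position annotators here.
        self.annotators = [rep.AnnotatorRegistry.get_annotator("rgb")]

    def write(self, data):
        # Called once per captured frame with the annotator payloads.
        print(self._frame_id, list(data.keys()))
        self._frame_id += 1

# Register the writer so it can be referenced by name (like writer: ActionWriter in the config).
rep.WriterRegistry.register(MySkeletonWriter)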
The refresh_auto_job_anim function in simulation.py contains the callbacks to data generation functions for initiating simulations.
An example of programming a new action for a subject might be specifying that the character walks on one run and sits on the next. For this, you can add custom logic to refresh_auto_job_anim, which rewrites the relationships between animation and character primitive (Prim) USD objects before every re-render of a stage. For more information, see Prim.
The following code shows an example implementation, where a new character is spawned and a new action is programmed into their animation sequence on every refresh of the scene.
<simulation.py>
def refresh_auto_job_anim(self, num):
    # ...
    stage = omni.usd.get_context().get_stage()
    # Get the animation graph for the current character in the scene
    anim_prims = stage.GetPrimAtPath("/World/Characters/Biped_Setup/Animations").GetAllChildren()
    # Get the new animation to attach to the character in the scene
    curr_paths = []
    for i in pick_index_list:
        curr_paths.append(anim_prims[i])
    if self.yaml_data["character"]["animation_name"] == '':
        pick_prim = anim_prims[self.yaml_data["character"]["animation_num"]]
        pick_name = str(pick_prim.GetPrimPath()).split('/')[-1]
        new_anim_graph_node = "/World/Characters/Biped_Setup/AnimationGraph/" + pick_name
    else:
        pick_prim = None
        for prim in anim_prims:
            if self.yaml_data["character"]["animation_name"] in str(prim.GetPrimPath()).split('/')[-1]:
                pick_prim = prim
                new_anim_graph_node = "/World/Characters/Biped_Setup/AnimationGraph/" + str(prim.GetPrimPath()).split('/')[-1]
    # Code to attach the new animation Prim to the character in the scene
    omni.kit.commands.execute("CreatePrimCommand",
        prim_type="AnimationClip",
        prim_path=new_anim_graph_node,
        select_new_prim=True)
    omni.kit.commands.execute("AnimGraphUISetNodePositionCommand",
        prim=stage.GetPrimAtPath(new_anim_graph_node),
        position_attribute_name="ui:position",
        value=(-331, 57))
    omni.kit.commands.execute("AnimGraphUISetRelationshipTargetsCommand",
        relationship=stage.GetPrimAtPath(new_anim_graph_node).GetRelationship("inputs:animationSource"),
        targets=[pick_prim.GetPrimPath()])
    omni.kit.commands.execute("AddRelationshipTargetCommand",
You can add custom functions to the pipeline to aid with randomization. For instance, you can retrieve and decode the USD format of animations from Reallusion to quickly determine the shortest loop period, enabling you to customize the simulation length for the data generated on each run. The following code example is an implementation of this.
<simulation.py>
def find_times(self):
    stage = omni.usd.get_context().get_stage()
    anim_prims = stage.GetPrimAtPath("/World/Characters/Biped_Setup/Animations").GetAllChildren()
    times = {}
    for anim in anim_prims:
        counter = 0
        attr = anim.GetAttribute("translations")
        prev_val = None
        curr_val = attr.Get(counter)
        # Step through time samples until the values stop changing (end of the loop)
        while prev_val != curr_val:
            counter += 1
            prev_val = curr_val
            curr_val = attr.Get(counter)
        if attr.Get(counter + 1) == attr.Get(counter + 2):
            times[anim] = counter
    return times
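A minimal usage sketch follows; the simulation_length field and the 30 fps figure are assumptions for illustration, not part of the extension API.

# Hypothetical usage: bound each run by the shortest animation loop found above.
times = self.find_times()
shortest_loop = min(times.values())            # frames in the shortest loop
self.simulation_length = shortest_loop / 30.0  # seconds, assuming 30 frames per second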
More settings for camera randomization can be adjusted in the settings.py file within the ORA extension. Properties such as character_focus can be set to True, and parameters such as character_radius can be modified to spawn cameras relative to character positions, ensuring that characters remain in view and are not occluded during data generation.
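As a rough illustration only (the exact names come from the extension's settings.py, and the values here are placeholders), those properties might be adjusted like this:

# Placeholder values; consult settings.py in the ORA extension for the real defaults.
character_focus = True    # keep randomized cameras aimed at a character
character_radius = 2.0    # spawn cameras within this distance of the character (meters)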
Isaac Sim can be executed headlessly in a container after configurations are set up and optional modifications to the extension code have been made.
The entire Isaac Sim application instance can be containerized by pulling the desired Isaac Sim container from NGC and migrating extension modifications to the /extscache subfolder in the container, as described earlier. The following command launches a data generation job with the scheduler script and a mounted configuration file:
./python.sh tools/isaac_people/sdg_scheduler.py -c /isaac-sim/curr_sdg_data/mount_sports_config_files/{filename} -n 1
Enable large-scale data generation
To help scale and orchestrate the data generation process, you can use NVIDIA OSMO, a cloud-native orchestration platform for scaling complex, multi-stage, and multi-container robotics workloads across hybrid infrastructure. With OSMO, we accelerated data generation by 10x on 10 NVIDIA A40 GPUs.
With these steps, we created 25,880 samples with 84 action animations and 4-5 camera angles for 40 different characters:
- 8,400 warehouse samples
- 6,600 hospital samples
- 4,800 retail samples
- 7,600 sports samples
Train an action recognition model with synthetically generated data
Now, you can use the synthetic data to expand the capabilities of a spatial-temporal graph convolutional network (ST-GCN) model, a machine learning model that detects human actions based on skeletal information.
In this example, we trained the PoseClassificationNet model (ST-GCN architecture) on top of the 3D skeleton data produced by Isaac Sim with NVIDIA TAO, a framework for efficiently training and fine-tuning ML models.
The skeleton data from Isaac Sim is first converted into key points. A key point is either represented directly by a joint in the character skeleton or calculated when no corresponding joint can be found. The character skeleton is defined in Renderpeople rigged assets. For an example, see Bundle Casual Rigged 002 from Renderpeople.
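To make the second case concrete, here is a minimal sketch of deriving a key point that has no direct joint in the rig; the joint names and coordinates are hypothetical, and the real conversion follows the Renderpeople skeleton definition.

import numpy as np

# Hypothetical joints reported by the skeleton writer (x, y, z in meters).
joints = {
    "left_shoulder": np.array([0.15, 1.45, 0.02]),
    "right_shoulder": np.array([-0.15, 1.45, 0.02]),
}

# A "neck" key point with no matching joint can be approximated as the
# midpoint of the two shoulder joints.
neck = (joints["left_shoulder"] + joints["right_shoulder"]) / 2.0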
While training the model, we developed different splits of data by varying the number of characters and frames of actions used. We found the best performance when truncating or padding all animation sequences to 650 frames and training with 35 characters, plus data for five additional characters exhibiting random jitters and rotations.
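The truncate-or-pad step can be expressed as a short helper. This is a sketch that assumes skeleton sequences stored as (frames, joints, 3) arrays, which is an assumption about the data layout rather than the exact preprocessing used.

import numpy as np

def fix_length(seq: np.ndarray, target: int = 650) -> np.ndarray:
    """Truncate or zero-pad a (frames, joints, 3) sequence to a fixed length."""
    if seq.shape[0] >= target:
        return seq[:target]
    pad = np.zeros((target - seq.shape[0],) + seq.shape[1:], dtype=seq.dtype)
    return np.concatenate([seq, pad], axis=0)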
After training the ST-GCN model with TAO, we obtained an average of 97% test accuracy across 85 action classes. To further test the robustness of the model against real data, we used the NTU RGB+D dataset's 25-keypoint skeleton data for action classes that mapped well between the NTU dataset and our custom SDG dataset.
| NTU action | Number of samples | Model trained on SDG, tested on NTU (Top 5) | Model trained on NTU, tested on NTU (Top 5) |
| --- | --- | --- | --- |
| Drinking water | 948 | 89.14% | 92.347% |
| Sitting down from standing | 948 | 98.73% | 100% |
| Standing up from sitting | 948 | 99.37% | 100% |
| Falling | 948 | 82.17% | 95.82% |
| Walking apart | 948 | 87.45% | 94.68% |
| Make victory sign | 948 | 99.46% | 100% |
Compared to state-of-the-art performance, the customized model performs well, considering that it was trained only on synthetic data and evaluated with zero-shot inference on NTU data it had never seen, covering significantly different action classes.
The training was iterative. Initially, the model performed well on some classes but poorly on others. To balance the dataset, we added more assets and variations within classes, such as the sitting class.
This refinement improved accuracy across all classes, and SDG made it easy to scale.
Try it today
Synthetic data generation (SDG) accelerates model training by creating high-fidelity, artificial data when real-world data is limited. This helps to improve data diversity and generalize the model for a multitude of use cases and scenarios. SDG can improve model accuracy and performance.
Open-source frameworks such as SynthDa can also be used in conjunction with Isaac Sim to add more ways to generate synthetic data from real-world data.
Get started today with Isaac Sim:
- Download and install NVIDIA Omniverse and then tune the PoseClassificationNet model with NVIDIA TAO.
- Deploy in the cloud as a container:
- Isaac Sim cloud deployment
- NVIDIA OSMO with location-agnostic deployment
- Running TAO in the cloud
- Deploy the NVIDIA Omniverse GPU-Optimized AMI (AWS Marketplace)
Acknowledgments
Thanks to Jiajun Li, Haoquan Liang, Anil Ubale, and Aik Beng Ng for their contributions to this post and project.