Dataset Manual

All of our data ships in a standard format (extended COCO). This manual explains the additional metadata included, and features code snippets to get you started quickly.

Dataset Format

The annotated dataset is represented with a single JSON file (COCO_SBX_2021_03_19.json) in the dataset folder.
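
As a quick orientation, the sketch below reads a COCO-style file with Python's standard json module and counts the entries in each top-level section. The helper names load_coco and summarize_coco, and the tiny in-memory sample, are illustrative and not part of the dataset tooling. With the real data you would pass the path of the JSON file in the dataset folder to load_coco.

```python
import json


def load_coco(path):
    """Read a COCO-style annotation file from disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize_coco(coco):
    """Return the number of entries in each list-valued top-level section."""
    return {key: len(value) for key, value in coco.items() if isinstance(value, list)}


# Tiny stand-in for the real file so the example runs without the dataset.
sample = json.loads("""
{
  "images": [{"id": 3, "file_name": "SBXSensor_3_00000000.png"}],
  "annotations": [{"id": 0, "image_id": 3, "category_id": 3}],
  "categories": [{"id": 3, "name": "sample_item1", "supercategory": ""}],
  "info": {}
}
""")

summary = summarize_coco(sample)
print(summary)
```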

# high level coco json structure
{
   "images": [...],
   "annotations": [...],
   "categories": [...],
   "info": {...}
}


The images section has one entry for every RGB and depth frame in the dataset.  Some important fields:

# image data
    'file_name': 'SBXSensor_3_00000000.png',
    'height': 480,
    'width': 640,
    'id': 3,
    'scene_id': 0,
    'channel': 'rgb'


The annotations section includes one entry for every annotated item across all images. Our conventions:

# annotation data
{
   "segmentation": {        # RLE of segmentation mask
       "counts": [....],
       "size": [1080, 1920]   # [height, width]
   },
   "area": 96852.0,
   "iscrowd": 0,
   "image_id": 1,
   "bbox": [825.0, 580.0, 241.0, 499.0],   # [x, y, width, height]
   "category_id": 3,
   "id": 0
}


The categories section enumerates the types of annotated objects in the scenes.

# category data
[
   {'id': 0, 'name': 'sample_item1', 'supercategory': ''},
   {'id': 1, 'name': 'sample_item2', 'supercategory': ''},
]

Camera Model

We use the OpenCV pinhole camera model.  The positive Z axis points into the scene, the positive X axis points to the right, and the positive Y axis points towards the bottom of the image plane.

Camera Intrinsics (K)

Each RGB image has its own camera intrinsics matrix K.

# example K matrix
array([[1.29900928e+03, 0.00000000e+00, 9.60000000e+02],
      [0.00000000e+00, 1.29900928e+03, 5.40000000e+02],
      [0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])
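
To make the camera convention concrete, here is a minimal sketch of projecting camera-frame points into pixel coordinates with the example K matrix above. The function name project_points and the two test points are illustrative. The code assumes an ideal pinhole with no lens distortion.

```python
import numpy as np

# Example intrinsics from above: fx = fy = 1299.00928, cx = 960, cy = 540.
K = np.array([[1299.00928, 0.0, 960.0],
              [0.0, 1299.00928, 540.0],
              [0.0, 0.0, 1.0]])


def project_points(points_cam, K):
    """Project Nx3 camera-frame points (meters) to Nx2 pixel coordinates."""
    points_cam = np.asarray(points_cam, dtype=float)
    uvw = points_cam @ K.T            # homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]   # divide by the depth Z


# A point on the optical axis and a point 10 cm to its right, both 1 m away.
pixels = project_points([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0]], K)
print(pixels)
```

The first point lands on the principal point (960, 540), and the second lands to its right, which matches the positive X direction described above.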

Camera Relative Pose (RT)

Every item has an associated camera-relative pose matrix that describes the translation and orientation of the item relative to the camera.  Translations are in meters (m).

RT = np.matmul(world_to_camera_pose_RT, object_to_world_pose_RT)

# example RT matrix
# RT[:3, :3] is rotation, RT[:, 3] is translation
array([[-0.51877965,  0.81398015, -0.26135031, -0.34774154],
      [-0.84177769, -0.53973034, -0.01007326,  0.18114404],
      [-0.14925813,  0.21477306,  0.96519145,  1.06569358]])
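
The following sketch shows how a 3x4 RT matrix can be applied to points given in the item's coordinate system, and how two poses can be chained as in the formula above. The helper names to_homogeneous and transform_points are illustrative, and the identity world-to-camera pose is a placeholder chosen only so the composition is easy to verify.

```python
import numpy as np

# Example RT matrix from above (3x4: rotation | translation).
RT = np.array([[-0.51877965, 0.81398015, -0.26135031, -0.34774154],
               [-0.84177769, -0.53973034, -0.01007326, 0.18114404],
               [-0.14925813, 0.21477306, 0.96519145, 1.06569358]])


def to_homogeneous(pose_3x4):
    """Append the row [0, 0, 0, 1] so that poses can be chained with matmul."""
    return np.vstack([pose_3x4, [0.0, 0.0, 0.0, 1.0]])


def transform_points(points_obj, RT):
    """Map Nx3 item-frame points into the camera frame."""
    points_obj = np.asarray(points_obj, dtype=float)
    return points_obj @ RT[:3, :3].T + RT[:3, 3]


# The item origin maps to the translation column of RT.
origin_cam = transform_points([[0.0, 0.0, 0.0]], RT)

# Chaining poses: placeholder world-to-camera (identity) times object-to-world.
world_to_camera_pose_RT = np.eye(4)
object_to_world_pose_RT = to_homogeneous(RT)
composed = np.matmul(world_to_camera_pose_RT, object_to_world_pose_RT)[:3]
print(origin_cam, composed.shape)
```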

Occlusion Metric

The visible percentage is a measure of the item’s occlusion in the scene.  

We project the item's perspective-transformed bounding box into the image and compute the ratio of visible pixels to total pixels within it.  Despite the name, the value is a fraction: 0.0 means no part of the item is visible, and 1.0 means the item is fully visible from the given perspective.

In the example below the visible percentage metric would be 0.1515.  
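
The ratio itself can be sketched as follows. This is only an illustration of the visible / total arithmetic, not the pipeline used to generate the dataset: it assumes you already have two boolean masks of the same shape, one marking the pixels counted as "total" (full_mask) and one marking the pixels that remain visible (visible_mask). Both names and the toy masks are hypothetical.

```python
import numpy as np


def visible_fraction(visible_mask, full_mask):
    """Return visible pixels / total pixels, or 0.0 when full_mask is empty."""
    visible_mask = np.asarray(visible_mask, dtype=bool)
    full_mask = np.asarray(full_mask, dtype=bool)
    total = np.count_nonzero(full_mask)
    if total == 0:
        return 0.0
    # Only count visible pixels that fall inside the "total" region.
    return np.count_nonzero(visible_mask & full_mask) / total


# Toy example: a 4x5 region (20 pixels) of which one row (5 pixels) is visible.
full = np.zeros((6, 8), dtype=bool)
full[1:5, 2:7] = True
visible = np.zeros((6, 8), dtype=bool)
visible[1:2, 2:7] = True

fraction = visible_fraction(visible, full)
print(fraction)
```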

Code Samples

Download sample dataset

Please provide your name and email address for a link to download the SBX sample dataset.


#1: project 2D bounding box

Project the 2D bounding box field of an annotation onto its RGB image.
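
A minimal sketch of this step, assuming the standard COCO bbox convention of [x, y, width, height] in pixels. To stay self-contained it draws the outline directly into a NumPy array rather than loading the real image. The blank 1080x1920 canvas stands in for the RGB frame, and the helper names bbox_to_corners and draw_bbox are illustrative.

```python
import numpy as np


def bbox_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (x0, y0, x1, y1)."""
    x, y, w, h = bbox
    return x, y, x + w, y + h


def draw_bbox(image, bbox, color=(255, 0, 0)):
    """Draw a 1-pixel rectangle outline on an HxWx3 uint8 image, in place."""
    height, width = image.shape[:2]
    x0, y0, x1, y1 = (int(round(v)) for v in bbox_to_corners(bbox))
    # Clip to the image so boxes touching the border do not index out of range.
    x0, x1 = max(x0, 0), min(x1, width - 1)
    y0, y1 = max(y0, 0), min(y1, height - 1)
    image[y0, x0:x1 + 1] = color   # top edge
    image[y1, x0:x1 + 1] = color   # bottom edge
    image[y0:y1 + 1, x0] = color   # left edge
    image[y0:y1 + 1, x1] = color   # right edge
    return image


image = np.zeros((1080, 1920, 3), dtype=np.uint8)   # stand-in for the RGB frame
bbox = [825.0, 580.0, 241.0, 499.0]                 # bbox from the sample annotation
corners = bbox_to_corners(bbox)
draw_bbox(image, bbox)
print(corners)
```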

#2: display segmentation mask

Load the dataset in Python 3 using pycocotools, and display the annotation mask for a particular sample.
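
With pycocotools installed, COCO.annToMask is the usual way to turn an annotation into a binary mask. As a dependency-free stand-in, the sketch below decodes an uncompressed COCO RLE by hand. It assumes "counts" is a list of integers, as the annotation sample suggests. In that encoding the runs alternate between 0s and 1s, starting with 0s, and pixels are ordered column by column. The function name and the toy RLE are illustrative.

```python
import numpy as np


def decode_uncompressed_rle(rle):
    """Decode an uncompressed COCO RLE dict into an HxW uint8 mask."""
    height, width = rle["size"]
    counts = np.asarray(rle["counts"], dtype=np.int64)
    # Runs alternate 0, 1, 0, 1, ... starting with a run of zeros.
    values = np.arange(len(counts)) % 2
    flat = np.repeat(values, counts).astype(np.uint8)
    if flat.size != height * width:
        raise ValueError("RLE counts do not add up to height * width")
    # COCO stores masks in column-major (Fortran) order.
    return flat.reshape((height, width), order="F")


# Toy 3x4 mask: 2 zeros, 3 ones, 7 zeros, read down the columns.
toy_rle = {"counts": [2, 3, 7], "size": [3, 4]}
mask = decode_uncompressed_rle(toy_rle)
print(mask)
```

The resulting array can be handed to any image viewer, for example overlaid on the RGB frame with a plotting library of your choice.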

#3: iterate through all masks

Iterate through all the images, showing the masks projected onto each frame alongside the raw masks.
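
The iteration can be sketched without any extra dependencies by indexing annotations by image_id. The helper names are illustrative, the sample dictionary is a stand-in for the loaded JSON, and filtering on the channel field assumes that annotations are attached to the RGB entries. Drop the filter if that does not hold for your data.

```python
from collections import defaultdict


def annotations_by_image(coco):
    """Build a lookup from image id to the list of its annotations."""
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[ann["image_id"]].append(ann)
    return grouped


def iter_frames(coco, channel="rgb"):
    """Yield (image_entry, annotations) for every image of the given channel."""
    grouped = annotations_by_image(coco)
    for image in coco["images"]:
        if image.get("channel") == channel:
            yield image, grouped.get(image["id"], [])


# Stand-in for the loaded dataset.
sample = {
    "images": [
        {"id": 3, "scene_id": 0, "channel": "rgb"},
        {"id": 103, "scene_id": 0, "channel": "depth"},
    ],
    "annotations": [
        {"id": 0, "image_id": 3, "category_id": 0},
        {"id": 1, "image_id": 3, "category_id": 1},
    ],
}

frames = list(iter_frames(sample))
for image, anns in frames:
    print(image["id"], [ann["id"] for ann in anns])
```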

#4: display depth frame

Load corresponding RGB and Depth frames for a given scene.

The depth images are scaled by depth_factor, which converts the uint16 PNG depth frame into metric distances (m).

# RGB and depth entries for the same scene share a scene_id
'id': 3,
'scene_id': 0,
'channel': 'rgb'

'id': 103,
'scene_id': 0,
'channel': 'depth'
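
A minimal sketch of matching the two entries and converting raw depth values follows. Two assumptions are made here and should be checked against your dataset: each scene_id has one RGB and one depth entry, and metric depth is obtained by dividing the raw uint16 value by depth_factor (if your data uses a multiplicative factor, flip the operation). The value 1000.0 and the small array are placeholders, not values taken from the dataset.

```python
import numpy as np


def pair_frames(images):
    """Group image entries by scene_id, keyed by their channel name."""
    scenes = {}
    for image in images:
        scenes.setdefault(image["scene_id"], {})[image["channel"]] = image
    return scenes


def depth_to_meters(depth_raw, depth_factor):
    """ASSUMPTION: metric depth (m) = raw uint16 value / depth_factor."""
    return depth_raw.astype(np.float32) / depth_factor


images = [
    {"id": 3, "scene_id": 0, "channel": "rgb"},
    {"id": 103, "scene_id": 0, "channel": "depth"},
]
scenes = pair_frames(images)

# Placeholder for a decoded uint16 PNG depth frame.
depth_raw = np.array([[0, 1000], [2500, 65535]], dtype=np.uint16)
depth_m = depth_to_meters(depth_raw, depth_factor=1000.0)
print(scenes[0]["rgb"]["id"], scenes[0]["depth"]["id"], depth_m)
```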

#5: plotting a 3D bounding box (bbox_3D)

Each annotation has a corresponding bbox_3D entry that describes the geometry of the bounding box as an 8x3 matrix.  The 8 points represent the 8 corners of the item in the item’s coordinate system.  All units are in meters (m).

The 3D bounding box points can be transformed into the camera frame with the relative pose matrix RT and projected into the image with the intrinsics matrix K, as shown in the example below.

The example code will plot a red mask over the item with the bounding box corners drawn as yellow circles.
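
The projection step can be sketched as below, using the example K and RT matrices from the Camera Model section. The 10 cm cube is a hypothetical stand-in for a real bbox_3D entry, and the function name project_bbox_3d is illustrative. Lens distortion is ignored.

```python
import numpy as np

K = np.array([[1299.00928, 0.0, 960.0],
              [0.0, 1299.00928, 540.0],
              [0.0, 0.0, 1.0]])

RT = np.array([[-0.51877965, 0.81398015, -0.26135031, -0.34774154],
               [-0.84177769, -0.53973034, -0.01007326, 0.18114404],
               [-0.14925813, 0.21477306, 0.96519145, 1.06569358]])


def project_bbox_3d(corners_obj, RT, K):
    """Project 8x3 item-frame corners to 8x2 pixels; also return their depths."""
    corners_obj = np.asarray(corners_obj, dtype=float)
    corners_cam = corners_obj @ RT[:3, :3].T + RT[:3, 3]   # item frame -> camera frame
    uvw = corners_cam @ K.T                                # camera frame -> image plane
    return uvw[:, :2] / uvw[:, 2:3], corners_cam[:, 2]


# Hypothetical bbox_3D: a 10 cm cube centered on the item origin.
half = 0.05
corners = np.array([[x, y, z]
                    for x in (-half, half)
                    for y in (-half, half)
                    for z in (-half, half)])

pixels, depths = project_bbox_3d(corners, RT, K)
print(np.round(pixels, 1))
```

The returned pixel coordinates can then be drawn over the RGB frame, for instance as circles at each corner.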

#6: plotting mesh overlay + all annotations

For datasets concerned with 6D pose, the meshes for items are included in the meshes sub-directory.

This example will produce a 4-up display for each annotation, including the 2D bounding box, the 3D bounding box, and a sample of mesh vertices projected onto both the RGB and depth frames.