Why You Should Use Hybrid Data

A hybrid synthetic data approach works best when some real data already exists. In a hybrid dataset, part of the data is generated synthetically from modeled distributions and the rest comes from real-world collection. This best-of-both-worlds approach lets your business build high-quality training datasets rapidly.

10x Faster

Combining synthetic data with a small amount of real data can be 10x faster than collecting and annotating thousands of real images. See model improvements in weeks, not months.

95-99% Synthetic Training Data

Datasets that are 95-99% synthetic, with a small fraction of real data, can outperform training on real data alone.

Start With Just 100 Images

Starting with just 100 real images is often enough to reach strong performance when they are combined with tens of thousands of synthetic training images.

Witness the Possibilities of Synthetic Data Through Our Case Studies

Curious to see what synthetic data can do for you? Our case studies cover applications from object detection to item manipulation and identification. View them today to see how your business problem can become a data-driven solution.

Hybrid Data FAQ

What do I need to get started?

Sharing a small amount of real data with SBX is the fastest way to get started.

If you have pre-annotated training or validation sets, SBX can use those; if not, we are happy to help annotate real images.

The best validation data to share is both:

  i) diverse: representative of real data
  ii) challenging: rare images, or images with known performance issues on existing vision models.

Will SBX annotate real image data?

Yes, SBX can assist with annotation of small batches of real data.

How does it work? How does SBX combine real + synthetic data?

The simplest hybrid training technique is to train on a mixture of real and synthetic images. SBX runs internal hybrid benchmarking to ensure that the synthetic data it produces complements the real training data and provides a performance lift on the validation data.
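As a minimal sketch of the mixing idea, the Python below keeps every real image and adds enough randomly sampled synthetic images to reach a target synthetic fraction, then shuffles the result. The function name `build_hybrid_set`, its parameters, and the single-fraction mixing rule are illustrative assumptions, not SBX's actual pipeline; the list items simply stand in for image/annotation pairs.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_hybrid_set(real: Sequence[T], synthetic: Sequence[T],
                     synthetic_fraction: float, seed: int = 0) -> List[T]:
    """Hypothetical helper: mix all real samples with enough synthetic
    samples that roughly `synthetic_fraction` of the result is synthetic."""
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    n_real = len(real)
    # Keep every real sample; solve n_syn / (n_real + n_syn) = f for n_syn.
    n_syn = round(n_real * synthetic_fraction / (1.0 - synthetic_fraction))
    # Never request more synthetic samples than are available.
    n_syn = min(n_syn, len(synthetic))
    rng = random.Random(seed)
    mixed = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)  # interleave real and synthetic samples
    return mixed


if __name__ == "__main__":
    real = [("real", i) for i in range(100)]
    synthetic = [("syn", i) for i in range(10_000)]
    mixed = build_hybrid_set(real, synthetic, synthetic_fraction=0.99)
    print(len(mixed), sum(1 for kind, _ in mixed if kind == "real"))
```

For example, 100 real images at a 0.99 synthetic fraction adds 9,900 synthetic images, giving a 10,000-image set that is 99% synthetic. Real pipelines may instead weight or oversample the real images during training; a fixed mixture is only the simplest option.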

How much real data do I need to provide?

The number of images needed varies with the application and item diversity; however, 100-200 real images is a reasonable range.

Can SBX match my sensor intrinsics / calibration?

Yes. If you have calibrated your image sensor, sharing that calibration information with SBX is very helpful.

What is Hybrid Data?

Hybrid data combines data generated by mathematical algorithms (synthetic data) with data captured from real-world events (real data). This makes it an efficient and reliable alternative to relying on real data alone.

Benefits of Hybrid Data

  • Producing hybrid-synthetic data takes less time and is highly cost-effective
  • Reduces the constraints on obtaining difficult-to-retrieve or tightly regulated data
  • Can be shared and used more quickly across industries or with colleagues
  • Can be used to train and pre-train machine learning models on large data repositories


Each SBX dataset is the product of iterative testing and optimization to achieve the best performance on real-world data. This way, you can skip costly hardware setup, time-consuming data collection, data annotation, and data cleaning.

Try out our platform; it’s easy!

Share 25 images from your vision system, and we will generate an optimized training set of 25,000 annotated synthetic images.