Unreasonably Effective Hybrid Data
By combining a small amount of real training data with a large synthetic set, models reach peak performance. Rapidly iterate on datasets to solve real-world vision tasks.
Why You Should Utilize Hybrid Data
A hybrid synthetic data approach works best when some real data already exists. Part of the dataset is generated from assumed distributions, and the rest comes from actual data. This best-of-both-worlds approach lets you build effective datasets for your business quickly.

10x Faster
Using synthetic data with a small amount of real data can be 10x faster than collecting and annotating thousands of real images. See model improvements in weeks, not months.
95-99% Rapid Training Data
Datasets that are 95-99% synthetic, with a small fraction of real data, can outperform training on real data alone.
Start With Just 100 Images
Starting with just 100 real images is often enough to reach strong performance when combined with tens of thousands of synthetic training images.
Remove Data as a Blocker & Get Back to Building

Witness the Possibilities of Synthetic Data Through Our Case Studies
Curious to see what synthetic data can do for you? From object detection to item manipulation and identification, view our case studies today to turn your business problem into a data-driven solution.
Hybrid Data FAQ
What do I need to get started?
Sharing a small amount of real data with SBX is the fastest way to get started.
If you have pre-annotated training or validation sets, SBX can use those, but we are happy to assist in annotating real images.
The best validation data to share is both:
i) diverse: representative of real data
ii) challenging: rare images or those with known performance issues on existing vision models.
Will SBX annotate real image data?
Yes, SBX can assist with annotation of small batches of real data.
How does it work? How does SBX combine real + synthetic data?
The simplest hybrid training technique is to train on a mixture of real and synthetic images. SBX runs internal hybrid benchmarking to ensure that the synthetic data it produces complements the real training data and provides a performance lift on the validation data.
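As a rough illustration of what a real + synthetic mixture can look like, here is a minimal Python sketch. It is not SBX's actual pipeline: the function name `build_hybrid_mix`, the `synthetic_fraction` parameter, and the file names are all hypothetical, chosen only to show how a small real set might be combined with a much larger synthetic set at a chosen ratio.

```python
import random


def build_hybrid_mix(real_items, synthetic_items, synthetic_fraction=0.95, seed=0):
    """Hypothetical helper: keep every real item and add enough synthetic
    items so that roughly `synthetic_fraction` of the mix is synthetic."""
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")

    n_real = len(real_items)
    # Solve s / (s + n_real) = synthetic_fraction for the synthetic count s.
    n_synth = round(n_real * synthetic_fraction / (1.0 - synthetic_fraction))
    # Never ask for more synthetic items than are available.
    n_synth = min(n_synth, len(synthetic_items))

    rng = random.Random(seed)
    chosen = rng.sample(list(synthetic_items), n_synth)

    # Tag each item with its source so the ratio can be checked later.
    mix = [(item, "real") for item in real_items]
    mix += [(item, "synthetic") for item in chosen]
    rng.shuffle(mix)
    return mix


# Assumed example data: 100 real images and 10,000 synthetic images.
real = [f"real_{i:04d}.png" for i in range(100)]
synthetic = [f"synth_{i:05d}.png" for i in range(10_000)]

mix = build_hybrid_mix(real, synthetic, synthetic_fraction=0.95)
n_synth = sum(1 for _, source in mix if source == "synthetic")
print(len(mix), n_synth / len(mix))
```

In practice the ratio would be tuned by benchmarking against the validation data instead of being fixed up front. The sketch only shows the mixing step.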
How much real data do I need to provide?
The number of images varies with the application and item diversity. However, a reasonable range is 100-200 real images.
Can SBX match my sensor intrinsics / calibration?
Yes. If you have calibrated your image sensor, this information is very helpful to share with SBX.
What is Hybrid Data?
Hybrid data combines data generated by mathematical algorithms (synthetic data) with data captured from real-world events (manual data). This makes it an efficient and reliable alternative to standard data.
Benefits of Hybrid Data
- Takes less time to produce and is highly cost-effective
- Reduces the constraints on obtaining difficult-to-retrieve or tightly regulated data
- Can be shared and used across industries or with colleagues faster
- Can be used to train and pre-train machine learning models with large data repositories
Why SBX
Each SBX dataset is the product of iterative testing and optimization to achieve the best performance on real-world data. This way, you can skip costly hardware setup, time-consuming data collection, data annotation, and data cleaning.
Try out our platform. It’s easy!
Share 25 images from your vision system, and we will generate an optimized training set of 25,000 annotated synthetic images.