Gretel alternative for synthetic document data

Gretel built a genuinely good developer platform for privacy-preserving tabular and text synthesis — and its technology now lives inside NVIDIA NeMo. If you arrived here looking for the part Gretel never built — filled, labeled synthetic documents — this page maps the landscape honestly.

For years, “synthetic data API for developers” meant Gretel: a self-serve console, a Python client, open-source libraries, and serious differential-privacy work on tabular and text data. That era ended with the NVIDIA acquisition. If you’re an ex-Gretel user — or you searched for Gretel and found a redirect — your next step depends entirely on what shape of data you need. This page covers both paths, including the one that doesn’t lead to us.

What happened to Gretel

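In March 2025, Wired and TechCrunch reported that NVIDIA had acquired Gretel for a price exceeding its most recent $320M valuation, with the team folded into NVIDIA’s generative AI services. The practical effects, as of June 2026:

  • gretel.ai redirects to NVIDIA’s synthetic data generation pages.
  • The Gretel console, docs, and self-serve pricing are offline.
  • Gretel’s open-source repositories (gretel-synthetics, gretel-python-client) were archived in February 2026.
  • The technology continues inside NVIDIA’s NeMo platform, most visibly in NeMo Data Designer and NeMo Safe Synthesizer.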

What the Gretel-to-NeMo line is genuinely good at

  • Privacy with formal guarantees. NeMo Safe Synthesizer detects and replaces PII, generates synthetic data preserving statistical properties, and applies record-level differential privacy via DP-SGD. If your mandate is “share a provably private version of this sensitive table,” that is the right category of tool.
  • Tabular, text, code, and JSON at scale. Data Designer’s documented column types span categorical and numerical distributions, temporal data, person entities, and LLM-generated text, code, and structured JSON — a flexible framework for production-grade synthetic datasets.
  • Open source plus NVIDIA backing. Data Designer is Apache 2.0, works with multiple LLM endpoints, and plugs into the broader NeMo enterprise stack.

If you were using Gretel for tabular synthesis, NeMo Data Designer is the lineal successor and the first thing you should evaluate. No spin from us on that.
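For readers who have not met DP-SGD before, the core step is small enough to sketch. The block below is the generic textbook recipe (clip each example's gradient, sum, add Gaussian noise, average), written in plain Python for illustration. It is not NeMo Safe Synthesizer's implementation; the function names and toy gradients are our own assumptions, and it omits the privacy accounting a real system needs.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng=None):
    """Return a noisy average gradient in the style of DP-SGD.

    1. Clip each per-example gradient to L2 norm <= max_norm.
    2. Sum the clipped gradients.
    3. Add Gaussian noise with std = noise_multiplier * max_norm per coordinate.
    4. Divide by the batch size.
    """
    rng = rng or random.Random()
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [n / len(clipped) for n in noisy]


# Toy per-example gradients (hypothetical values, two parameters each).
grads = [[3.0, 4.0], [0.3, 0.4]]
print(dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)))
```

Clipping bounds how much any single record can move the model, and the noise hides what remains. That is why the guarantee is per record, and why it only matters when real records are in the training set.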

The document-shaped gap

Here is the boundary that matters for document AI teams. A document extraction pipeline trains on pages: rendered, filled forms paired with per-word bounding boxes, field IDs, and entity labels. Measured against that need:

  • Synthetic documents, but no layout labels. NVIDIA does market synthetic document datasets — its synthetic data generation page cites tax forms, legal, and mortgage documents as NeMo Data Designer use cases. What the Data Designer documentation does not describe is the shape a document extraction pipeline trains on: rendered page images paired with word-level bounding boxes and layout labels (FUNSD-style ground truth).
  • The self-serve evaluation path changed. Gretel’s public pricing with free monthly credits is gone. NVIDIA’s published model is open source for development, with NVIDIA AI Enterprise for supported production — an enterprise licensing motion without a published price list.
  • Statistical fidelity is not layout fidelity. Even a perfectly distributed synthetic table doesn’t teach a model where the wages box sits on a W-2, what a handwritten date looks like after a fax pass, or how a 1040’s line items reconcile with the W-2 stapled behind it. Document models learn from documents.
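To make "FUNSD-style ground truth" concrete, here is a minimal sketch of one annotated page in the layout used by the public FUNSD dataset, plus two helpers that flatten it into the word-level rows an extraction model trains on. The record is hand-written and its values are illustrative. Whether any particular vendor's export matches this layout field for field is an assumption you should check against a real bundle.

```python
# One page in the public FUNSD annotation layout: entities with a label,
# a box, their words (each with its own box), and question/answer links.
funsd_page = {
    "form": [
        {
            "id": 0,
            "label": "question",
            "text": "Wages, tips, other compensation",
            "box": [310, 120, 520, 134],
            "words": [
                {"text": "Wages,", "box": [310, 120, 352, 134]},
                {"text": "tips,", "box": [356, 120, 384, 134]},
                {"text": "other", "box": [388, 120, 422, 134]},
                {"text": "compensation", "box": [426, 120, 520, 134]},
            ],
            "linking": [[0, 1]],
        },
        {
            "id": 1,
            "label": "answer",
            "text": "52,340.00",
            "box": [310, 138, 372, 152],
            "words": [{"text": "52,340.00", "box": [310, 138, 372, 152]}],
            "linking": [[0, 1]],
        },
    ]
}


def word_level_labels(page):
    """Flatten entities into (word, box, entity_label) rows."""
    rows = []
    for entity in page["form"]:
        for word in entity["words"]:
            rows.append((word["text"], tuple(word["box"]), entity["label"]))
    return rows


def linked_pairs(page):
    """Return sorted (question_text, answer_text) pairs from the links."""
    by_id = {entity["id"]: entity for entity in page["form"]}
    pairs = set()
    for entity in page["form"]:
        for src, dst in entity["linking"]:
            pairs.add((by_id[src]["text"], by_id[dst]["text"]))
    return sorted(pairs)


for row in word_level_labels(funsd_page):
    print(row)
print(linked_pairs(funsd_page))
```

None of this information exists in a synthetic table row: the row can hold the value 52,340.00, but not where it sits on the page or which printed label it answers.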

Dimension by dimension

| Dimension | Gretel → NVIDIA NeMo | SymageDocs |
| --- | --- | --- |
| Data shape | Tabular, text, code, structured JSON (documented column types); synthetic document datasets cited in NVIDIA marketing | Filled document images: PDFs + PNGs, typed and handwritten |
| Layout-labeled training data | Not described in product docs | Per-word bboxes + entity labels; FUNSD-format ground truth in every bundle |
| Privacy model | Differential privacy (DP-SGD) over real source data | No real source data; identities are simulated, zero PII by construction |
| Identity coherence | Statistical fidelity to a source distribution | Population model: one identity fills multiple linked forms that reconcile |
| Open source | Data Designer is Apache 2.0 | Commercial platform with a Python SDK |
| Pricing | OSS for dev; NVIDIA AI Enterprise for production (no public price list) | All tiers public; 250 free credits, from $79/mo billed annually |
| Best for | Privacy-safe synthetic tables and LLM training text | Training and testing document AI / OCR pipelines |

Gretel status and NeMo capabilities verified against NVIDIA documentation, GitHub, and press coverage, June 2026. Sources: NeMo Data Designer docs, TechCrunch, gretel-synthetics (archived).
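The "identity coherence" row is easiest to see as a check you could run on a generated set. The sketch below assumes each form has already been parsed into a dict; every field name in it is hypothetical and chosen for readability, not taken from any SDK. The tax logic it relies on is the standard one on recent versions of Form 1040: W-2 box 1 wages total into line 1a, and W-2 box 2 federal withholding totals into line 25a.

```python
from decimal import Decimal


def reconcile(w2_forms, form_1040):
    """Return a list of mismatches between a filer's W-2s and their 1040."""
    problems = []

    # Every linked form should carry the same simulated identity.
    if any(w2["employee_ssn"] != form_1040["taxpayer_ssn"] for w2 in w2_forms):
        problems.append("SSN differs between a W-2 and the 1040")

    # W-2 box 1 (wages) sums into 1040 line 1a.
    wages = sum((Decimal(w2["box_1_wages"]) for w2 in w2_forms), Decimal("0"))
    if wages != Decimal(form_1040["line_1a_wages"]):
        problems.append(
            f"wages: W-2 total {wages} vs 1040 line 1a {form_1040['line_1a_wages']}"
        )

    # W-2 box 2 (federal income tax withheld) sums into 1040 line 25a.
    withheld = sum((Decimal(w2["box_2_withheld"]) for w2 in w2_forms), Decimal("0"))
    if withheld != Decimal(form_1040["line_25a_withheld"]):
        problems.append(
            f"withholding: W-2 total {withheld} vs 1040 line 25a "
            f"{form_1040['line_25a_withheld']}"
        )
    return problems


# Hypothetical parsed values for one simulated filer with two employers.
w2s = [
    {"employee_ssn": "000-00-0001", "box_1_wages": "41200.00", "box_2_withheld": "4310.00"},
    {"employee_ssn": "000-00-0001", "box_1_wages": "11140.00", "box_2_withheld": "980.00"},
]
return_1040 = {
    "taxpayer_ssn": "000-00-0001",
    "line_1a_wages": "52340.00",
    "line_25a_withheld": "5290.00",
}
print(reconcile(w2s, return_1040))  # an empty list means the set reconciles
```

Rows sampled independently from a statistical model have no reason to pass a check like this, which is the practical difference between fidelity to a distribution and forms that corroborate each other.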

Document-shaped synthetic data, in code

If Gretel’s developer experience is what you’ll miss most, this should feel familiar: pip install symagedocs, one client, seeded jobs, typed results. The difference is what comes out the other end.

From API call to labeled document dataset

Form: IRS W-2 (2026). FUNSD-format ground-truth annotations ride alongside the rendered pages in every bundle.

```python
from symagedocs import Client

client = Client(api_key="sk_live_...")

# Document-shaped output: rendered, filled forms plus the
# layout labels a document AI pipeline trains on. FUNSD-style
# ground truth is always included in the bundle; identities
# are simulated, so there's no source dataset and no PII.
job = client.generate.create(
    form_id="irs_w2_2026",
    quantity=1_000,
    seed=42,
    output_formats=["pdf_typed", "png_typed"],
)

client.generate.wait(job.job_id)

# One zip bundle: rendered PDFs, page images, and annotations
# with per-word bounding boxes, field IDs, and entity types.
client.generate.download(job.job_id, format="dataset", path="./w2_dataset.zip")
```
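Once a bundle is on disk, a first sanity check needs nothing beyond the standard library. The sketch below counts bundle members by file type and loads every JSON annotation. The internal layout of the zip is an assumption here (we only assume annotations are `.json` members somewhere in the archive), and the helper names are hypothetical, so the demo runs against a stand-in bundle built in memory rather than a real download.

```python
import io
import json
import zipfile


def summarize(bundle):
    """Count zip members by file extension, e.g. {'json': 1, 'png': 1}."""
    counts = {}
    with zipfile.ZipFile(bundle) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
            counts[ext] = counts.get(ext, 0) + 1
    return counts


def load_annotations(bundle):
    """Read every .json member into a {member_name: parsed_json} dict.

    `bundle` may be a filesystem path or a binary file-like object.
    """
    annotations = {}
    with zipfile.ZipFile(bundle) as zf:
        for name in sorted(zf.namelist()):
            if name.lower().endswith(".json"):
                with zf.open(name) as fh:
                    annotations[name] = json.load(fh)
    return annotations


# Stand-in bundle (assumed layout) so the sketch runs without an API key.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("pages/0001.png", b"not a real image")
    zf.writestr("annotations/0001.json", json.dumps({"form": []}))

print(summarize(demo))
print(list(load_annotations(demo)))
```

With a real download, pass the path from the example above (`"./w2_dataset.zip"`) in place of `demo` and adjust the filters to whatever layout the bundle actually uses.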

For document AI training

Tabular synthesis vs document generation — training-data readiness

  • Gretel (sunset as standalone)

    Console, docs, and self-serve pricing offline as of June 2026; open-source repos archived. Strong legacy in private tabular/text synthesis.

  • NVIDIA NeMo Data Designer

    Capable successor for tabular, text, code, and JSON — but its docs describe no rendered document images or bounding-box-labeled training output.

  • SymageDocs

    Filled regulated forms rendered as realistic pages with per-word bounding boxes, entity labels, and ML-ready exports: self-serve, seeded, and generated from coherent simulated identities with no source data required.

When to use each

| Task | NeMo Data Designer / Safe Synthesizer | SymageDocs |
| --- | --- | --- |
| Privacy-safe copy of a sensitive table | Use it: DP-SGD is the real thing. | Not what we do. |
| Synthetic text / code datasets for LLM work | Built for this. | Not the right tool. |
| Train a form-extraction model (LayoutLM, Donut, custom) | No documented document-image output. | Purpose-built for this. |
| KYC / fraud test sets with corroborating documents | Rows don’t corroborate across forms. | One identity fills linked W-2s, 1040s, and claims. |
| Evaluate self-serve this afternoon | OSS install; production via enterprise license. | 250 free credits at signup. |

Like every comparison on this site, the honest answer can be “both”: NeMo for your tabular and text needs, SymageDocs for the document corpus. If your pipeline starts at a W-2, a 1040, or a CMS-1500, the form pages show real generated output. For the model-by-model training story, see synthetic training data for document AI.
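For the form-extraction row in the table, one preprocessing detail is worth showing. LayoutLM-family models, as described in the Hugging Face documentation, expect each word box scaled to a 0 to 1000 grid relative to the page size rather than raw pixel coordinates. The helper below is a minimal sketch of that conversion; the function name and the page dimensions are illustrative assumptions, and whether a given export is already normalized is something to check before applying it.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel-space [x0, y0, x1, y1] box to the 0-1000 grid."""
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]
    # Clamp so boxes that touch or slightly overshoot the page edge stay valid.
    return [max(0, min(1000, value)) for value in scaled]


# Hypothetical word box on an 850 x 1100 pixel page image.
print(normalize_box([85, 110, 425, 220], page_width=850, page_height=1100))
```

The same word text with a different box is a different training example for these models, which is why the layout labels need to ship with the pages.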

Frequently asked questions

What happened to Gretel?

Wired and TechCrunch reported NVIDIA’s acquisition of Gretel in March 2025, in a deal said to exceed Gretel’s most recent $320M valuation. As of June 2026, gretel.ai redirects to NVIDIA’s synthetic data generation pages, the Gretel console and docs are offline, and Gretel’s open-source repositories (gretel-synthetics, gretel-python-client) were archived in February 2026. The technology continues inside NVIDIA’s NeMo platform, most visibly in NeMo Data Designer and NeMo Safe Synthesizer.

Is SymageDocs a drop-in replacement for Gretel?

Only if your Gretel use case was document-shaped, and most weren’t. Gretel’s strength was privacy-preserving synthetic structured and text data, and NVIDIA NeMo continues that line with formal differential privacy. SymageDocs is a document data platform: filled regulated forms rendered as labeled pages. If you used Gretel to synthesize database tables, NeMo Data Designer is the natural successor. If you reached for Gretel hoping for labeled synthetic documents, that’s the gap SymageDocs fills.

Doesn’t NeMo Data Designer generate any kind of data?

NVIDIA’s marketing does cite synthetic document datasets (tax forms, legal, and mortgage documents) as Data Designer use cases. Its documented generation surface is sampler columns (categorical, numerical distributions, temporal, person entities) and LLM-generated text, code, and structured JSON. The documentation does not describe rendered page images, filled PDF forms, or word-level bounding-box/layout-annotated training data (FUNSD-style ground truth). That isn’t a criticism; it’s a different product scope.

Can I still get a self-serve, free-tier evaluation like Gretel used to offer?

Not from Gretel: its old self-serve pricing page is gone. NVIDIA’s model is open source for development (NeMo Data Designer is Apache 2.0) with NVIDIA AI Enterprise licensing for supported production use, which is an enterprise motion without a published price list. SymageDocs is self-serve end to end: 250 free credits at signup, published plans from $79/month (billed annually), no sales call at any self-serve tier.

Does SymageDocs offer differential privacy like Gretel did?

No, and we don’t need to make that claim: differential privacy protects real people whose data trained a generator. SymageDocs documents are filled from simulated identities, so there is no real-person training record to leak and the privacy property is structural rather than statistical. If your mandate is specifically to share a privacy-preserving version of an existing sensitive dataset, a DP-based tool like NeMo Safe Synthesizer is the right category; if your mandate is training data for document models, generation from simulated identities sidesteps the problem entirely.

Need documents, not rows?

Generate labeled synthetic W-2s, 1040s, and healthcare claims from coherent identities. Start with 250 free credits — no credit card required.
