Gretel Alternative for Synthetic Document Data

For years, “synthetic data API for developers” meant Gretel: a self-serve console, a Python client, open-source libraries, and serious differential-privacy work on tabular and text data. That era ended with the NVIDIA acquisition. If you’re an ex-Gretel user — or you searched for Gretel and found a redirect — your next step depends entirely on what shape of data you need. This page covers both paths, including the one that doesn’t lead to us.

What happened to Gretel

In March 2025, Wired and TechCrunch reported that NVIDIA had acquired Gretel for a price exceeding its most recent $320M valuation, with the team folded into NVIDIA’s generative AI services. The practical effects, as of June 2026:

gretel.ai redirects to NVIDIA. The domain now 301-redirects to NVIDIA’s synthetic data generation page; the Gretel console, docs, and pricing pages are offline.
The open-source repos are archived. gretel-synthetics and gretel-python-client were archived on GitHub in February 2026 and are now read-only.
The technology continues inside NVIDIA NeMo. NeMo Data Designer (open source, Apache 2.0) generates synthetic datasets from scratch or from seed data, and NeMo Safe Synthesizer carries the privacy line forward — including formal differential privacy via DP-SGD.

What the Gretel-to-NeMo line is genuinely good at

Privacy with formal guarantees. NeMo Safe Synthesizer detects and replaces PII, generates synthetic data preserving statistical properties, and applies record-level differential privacy via DP-SGD. If your mandate is “share a provably private version of this sensitive table,” that is the right category of tool.
Tabular, text, code, and JSON at scale. Data Designer’s documented column types span categorical and numerical distributions, temporal data, person entities, and LLM-generated text, code, and structured JSON — a flexible framework for production-grade synthetic datasets.
Open source plus NVIDIA backing. Data Designer is Apache 2.0, works with multiple LLM endpoints, and plugs into the broader NeMo enterprise stack.

If you were using Gretel for tabular synthesis, NeMo Data Designer is the lineal successor and the first thing you should evaluate. No spin from us on that.
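For readers who have not met DP-SGD, the mechanism is small enough to sketch. The example below is a generic illustration of one DP-SGD update (clip each record's gradient, add Gaussian noise, then average), written in plain Python. It is not NeMo Safe Synthesizer's implementation, and the function name, parameters, and toy gradients are all assumptions made for illustration.

```python
import math
import random


def dp_sgd_step(weights, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per-example gradients, sum, add noise, average."""
    clipped_sum = [0.0] * len(weights)
    for grad in per_example_grads:
        # Bound each record's influence: rescale its gradient so the
        # L2 norm is at most clip_norm.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            clipped_sum[i] += g * scale

    # Gaussian noise scaled to the clipping bound masks any single record.
    sigma = noise_multiplier * clip_norm
    batch_size = len(per_example_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in clipped_sum]

    # Ordinary gradient descent on the noisy, clipped average.
    return [w - lr * g for w, g in zip(weights, noisy_mean)]


# Toy per-example gradients for a two-parameter model (made-up values).
toy_grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
updated = dp_sgd_step(
    [0.0, 0.0], toy_grads,
    lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=random.Random(0),
)
print(updated)
```

The point of the sketch is the dependency it makes visible: the guarantee is about real records in a training batch. That is why it matters for sharing a sensitive table and has nothing to protect when no source dataset exists.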

The document-shaped gap

Here is the boundary that matters for document AI teams. A document extraction pipeline trains on pages: rendered, filled forms paired with per-word bounding boxes, field IDs, and entity labels. Measured against that need:

Synthetic documents, but no layout labels. NVIDIA does market synthetic document datasets — its synthetic data generation page cites tax forms, legal, and mortgage documents as NeMo Data Designer use cases. What the Data Designer documentation does not describe is the shape a document extraction pipeline trains on: rendered page images paired with word-level bounding boxes and layout labels (FUNSD-style ground truth).
The self-serve evaluation path changed. Gretel’s public pricing with free monthly credits is gone. NVIDIA’s published model is open source for development, with NVIDIA AI Enterprise for supported production — an enterprise licensing motion without a published price list.
Statistical fidelity is not layout fidelity. Even a perfectly distributed synthetic table doesn’t teach a model where the wages box sits on a W-2, what a handwritten date looks like after a fax pass, or how a 1040’s line items reconcile with the W-2 stapled behind it. Document models learn from documents.
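To make "FUNSD-style ground truth" concrete, here is a minimal sketch of what such an annotation looks like and how a pipeline reads it. The keys (form, id, label, text, box, words, linking) follow the public FUNSD dataset; the field text, coordinates, and helper names are made up for illustration and are not taken from an actual SymageDocs bundle.

```python
# One page in FUNSD-style form: entities with a label, a box
# [x0, y0, x1, y1], per-word boxes, and links between entity ids.
funsd_page = {
    "form": [
        {
            "id": 0,
            "label": "question",
            "text": "Wages, tips",
            "box": [310, 120, 392, 134],
            "words": [
                {"text": "Wages,", "box": [310, 120, 356, 134]},
                {"text": "tips", "box": [361, 120, 392, 134]},
            ],
            "linking": [[0, 1]],
        },
        {
            "id": 1,
            "label": "answer",
            "text": "52340.00",
            "box": [310, 138, 380, 152],
            "words": [{"text": "52340.00", "box": [310, 138, 380, 152]}],
            "linking": [[0, 1]],
        },
    ]
}


def linked_pairs(page):
    """Return (source text, target text) pairs from the linking lists."""
    by_id = {entity["id"]: entity for entity in page["form"]}
    pairs = set()
    for entity in page["form"]:
        for src, dst in entity["linking"]:
            pairs.add((by_id[src]["text"], by_id[dst]["text"]))
    return sorted(pairs)


def word_boxes(page):
    """Flatten to (word, box, entity label) triples for a layout model."""
    return [
        (word["text"], word["box"], entity["label"])
        for entity in page["form"]
        for word in entity["words"]
    ]


print(linked_pairs(funsd_page))
print(word_boxes(funsd_page)[0])
```

A synthetic table can supply the value 52340.00. It cannot supply the box that says where on the page that value was printed, which is the part a layout-aware model learns from.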

Dimension by dimension

Dimension	Gretel → NVIDIA NeMo	SymageDocs
Data shape	Tabular, text, code, structured JSON (documented column types); synthetic document datasets cited in NVIDIA marketing	Filled document images: PDFs + PNGs, typed and handwritten
Layout-labeled training data	Not described in product docs	Per-word bboxes + entity labels; FUNSD-format ground truth in every bundle
Privacy model	Differential privacy (DP-SGD) over real source data	No real source data — identities are simulated, zero PII by construction
Identity coherence	Statistical fidelity to a source distribution	Population model: one identity fills multiple linked forms that reconcile
Open source	Data Designer is Apache 2.0	Commercial platform with a Python SDK
Pricing	OSS for dev; NVIDIA AI Enterprise for production (no public price list)	All tiers public; 500 free credits, from $79/mo billed annually
Best for	Privacy-safe synthetic tables and LLM training text	Training and testing document AI / OCR pipelines

Gretel status and NeMo capabilities verified against NVIDIA documentation, GitHub, and press coverage, June 2026. Sources: NeMo Data Designer docs, TechCrunch, gretel-synthetics (archived).

Document-shaped synthetic data, in code

If Gretel’s developer experience is what you’ll miss most, this should feel familiar: pip install symagedocs, one client, seeded jobs, typed results. The difference is what comes out the other end.

From API call to labeled document dataset

Form: IRS W-2 (2026). FUNSD-format ground-truth annotations ride alongside the rendered pages in every bundle.

```python
from symagedocs import Client

client = Client(api_key="sk_live_...")

# Document-shaped output: rendered, filled forms plus the
# layout labels a document AI pipeline trains on. FUNSD-style
# ground truth is always included in the bundle; identities
# are simulated, so there's no source dataset and no PII.
job = client.generate.create(
    form_id="irs_w2_single_page_2026",
    quantity=1_000,
    seed=42,
    output_formats=["pdf_typed", "png_typed"],
)

client.generate.wait(job.job_id)

# One zip bundle: rendered PDFs, page images, and annotations
# with per-word bounding boxes, field IDs, and entity types.
client.generate.download(job.job_id, format="dataset", path="./w2_dataset.zip")
```
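Once the bundle is on disk, a first sanity check might look like the sketch below. It uses only the Python standard library and assumes the annotations are stored as JSON files inside the zip. That is an assumption, as are the member names in the stand-in archive; check the actual bundle layout before relying on either.

```python
import io
import json
import zipfile


def load_annotations(bundle):
    """Return {member name: parsed JSON} for every .json file in the zip."""
    annotations = {}
    with zipfile.ZipFile(bundle) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".json"):
                with archive.open(name) as handle:
                    annotations[name] = json.load(handle)
    return annotations


# Stand-in bundle built in memory so the sketch runs without an API key.
# These member names are hypothetical, not the documented layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("annotations/doc_0001.json", json.dumps({"form": []}))
    archive.writestr("images/doc_0001.png", b"placeholder bytes")
buffer.seek(0)

loaded = load_annotations(buffer)
print(sorted(loaded))
```

With a real download, pass the path instead of the in-memory buffer: load_annotations("./w2_dataset.zip").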

For document AI training

Tabular synthesis vs document generation — training-data readiness

Gretel (sunset as standalone)
Console, docs, and self-serve pricing offline as of June 2026; open-source repos archived. Strong legacy in private tabular/text synthesis.
NVIDIA NeMo Data Designer
Capable successor for tabular, text, code, and JSON, but its docs describe no rendered document images or bounding-box-labeled training output.
SymageDocs
Filled regulated forms rendered as realistic pages with per-word bounding boxes, entity labels, and ML-ready exports. Self-serve, seeded, and generated from coherent simulated identities with no source data required.
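As one example of what per-word bounding boxes are used for: LayoutLM-family models in Hugging Face Transformers expect each box scaled to a 0-1000 grid relative to the page size. The sketch below shows that conversion for pixel-coordinate boxes. The page dimensions and sample box are made up, and whether a given export is already normalized is something to check against the bundle itself.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel box [x0, y0, x1, y1] to the 0-1000 grid."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Hypothetical word box on a hypothetical 850 x 1100 pixel page image.
print(normalize_box([310, 120, 356, 134], page_width=850, page_height=1100))
```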

When to use each

Task	NeMo Data Designer / Safe Synthesizer	SymageDocs
Privacy-safe copy of a sensitive table	Use it — DP-SGD is the real thing.	Not what we do.
Synthetic text / code datasets for LLM work	Built for this.	Not the right tool.
Train a form-extraction model (LayoutLM, Donut, custom)	No documented document-image output.	Purpose-built for this.
KYC / fraud test sets with corroborating documents	Rows don’t corroborate across forms.	One identity fills linked W-2s, 1040s, and claims.
Evaluate self-serve this afternoon	OSS install; production via enterprise license.	500 free credits at signup.
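The cross-form corroboration in the KYC row can be checked mechanically. On recent versions of Form 1040, line 1a repeats the total of W-2 box 1 (wages) and line 25a repeats the federal income tax withheld from W-2 box 2. The sketch below tests both for a set of documents that should belong to one identity. The dictionary keys and dollar amounts are hypothetical, not SymageDocs field IDs.

```python
from decimal import Decimal


def reconcile(w2_forms, form_1040):
    """Compare W-2 totals with the 1040 lines that should repeat them."""
    total_wages = sum(
        (Decimal(w2["box_1_wages"]) for w2 in w2_forms), Decimal("0")
    )
    total_withheld = sum(
        (Decimal(w2["box_2_federal_withheld"]) for w2 in w2_forms), Decimal("0")
    )
    return {
        "wages_match": total_wages == Decimal(form_1040["line_1a_w2_wages"]),
        "withholding_match": total_withheld
        == Decimal(form_1040["line_25a_w2_withholding"]),
    }


# Two W-2s and one 1040 for the same simulated filer (made-up amounts).
w2s = [
    {"box_1_wages": "41250.00", "box_2_federal_withheld": "4310.00"},
    {"box_1_wages": "11090.00", "box_2_federal_withheld": "980.00"},
]
return_1040 = {
    "line_1a_w2_wages": "52340.00",
    "line_25a_w2_withholding": "5290.00",
}
print(reconcile(w2s, return_1040))
```

Rows sampled independently per form will generally fail a check like this, which is why a fraud or KYC test set needs every form filled from the same identity.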

Like every comparison on this site, the honest answer can be “both”: NeMo for your tabular and text needs, SymageDocs for the document corpus. If your pipeline starts at a W-2, a 1040, or a CMS-1500, the form pages show real generated output. For the model-by-model training story, see synthetic training data for document AI.

Frequently asked questions

What happened to Gretel?

NVIDIA's acquisition of Gretel was reported by Wired and TechCrunch in March 2025, in a deal reported to exceed Gretel's $320M valuation. As of June 2026, gretel.ai redirects to NVIDIA's synthetic data generation pages, the Gretel console and docs are offline, and Gretel's open-source repositories (gretel-synthetics, gretel-python-client) were archived in February 2026. The technology continues inside NVIDIA's NeMo platform, most visibly NeMo Data Designer and NeMo Safe Synthesizer.

Is SymageDocs a drop-in replacement for Gretel?

Only if your Gretel use case was document-shaped, and most weren't. Gretel's strength was privacy-preserving synthetic structured and text data, and NVIDIA NeMo continues that line with formal differential privacy. SymageDocs is a document data platform: filled regulated forms rendered as labeled pages. If you used Gretel to synthesize database tables, NeMo Data Designer is the natural successor. If you reached for Gretel hoping for labeled synthetic documents, that's the gap SymageDocs fills.

Doesn't NeMo Data Designer generate any kind of data?

NVIDIA's marketing does cite synthetic document datasets (tax forms, legal, and mortgage documents) as Data Designer use cases. Its documented generation surface is sampler columns (categorical, numerical distributions, temporal, person entities) and LLM-generated text, code, and structured JSON. The documentation does not describe rendered page images, filled PDF forms, or word-level bounding-box/layout-annotated training data (FUNSD-style ground truth). That isn't a criticism; it's a different product scope.

Can I still get a self-serve, free-tier evaluation like Gretel used to offer?

Not from Gretel: its old self-serve pricing page is gone. NVIDIA's model is open source for development (NeMo Data Designer is Apache 2.0) with NVIDIA AI Enterprise licensing for supported production use, which is an enterprise motion without a published price list. SymageDocs is self-serve end to end: 500 free credits at signup, published plans from $79/month (billed annually), no sales call at any self-serve tier.

Does SymageDocs offer differential privacy like Gretel did?

No, and we don't need to make that claim: differential privacy protects real people whose data trained a generator. SymageDocs documents are filled from simulated identities, so there is no real-person training record to leak and the privacy property is structural rather than statistical. If your mandate is specifically to share a privacy-preserving version of an existing sensitive dataset, a DP-based tool like NeMo Safe Synthesizer is the right category; if your mandate is training data for document models, generation from simulated identities sidesteps the problem entirely.

Need documents, not rows?

Generate labeled synthetic W-2s, 1040s, and healthcare claims from coherent identities. Start with 500 free credits — no credit card required.

Start for free
