Synthetic Form 940 - Employer's Annual Federal Unemployment (FUTA) Tax Return Data

Synthetic training data — no real PII, fully coherent identities

tax, 2024

Generate synthetic Form 940 FUTA tax returns with realistic employer payroll data and state unemployment tax calculations. Train document AI models on payroll tax forms with multi-state wage allocation fields.

Fields per document: 90

Pages: 2

Category: tax

What this document is

Form 940 is the annual federal unemployment (FUTA) tax return filed by employers who pay wages of $1,500 or more in any calendar quarter. It calculates the employer's FUTA tax liability based on total wages, exempt payments, and state unemployment tax credits. The two-page form includes a multi-state credit reduction worksheet that adds tabular complexity.

Why generate synthetically

Form 940 is a core payroll tax document that appears in every employer's annual filing stack. Synthetic 940s enable training extraction models on payroll-specific fields like FUTA wages, state credit reductions, and deposit schedules without exposing real employer payroll data.

What makes synthetic data useful

Each synthetic Form 940 generates consistent payroll figures where FUTA-taxable wages align with total compensation minus exempt payments. Multi-state employers get realistic state-by-state wage allocations that sum to the correct total. EINs follow valid IRS formatting, and business names match realistic entity patterns.
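The consistency properties described above can be spot-checked on generated records. The following is a minimal sketch, assuming a hypothetical record layout (the keys `ein`, `total_futa_wages`, and `state_wages` are illustrative, not the generator's actual schema). It checks only the NN-NNNNNNN layout of the EIN and that state allocations sum to the reported total.

```python
import re
from decimal import Decimal

# EIN layout: two digits, a hyphen, seven digits
EIN_PATTERN = re.compile(r"\d{2}-\d{7}")


def is_ein_formatted(ein: str) -> bool:
    """Check only the NN-NNNNNNN layout, not whether the prefix is assigned."""
    return EIN_PATTERN.fullmatch(ein) is not None


def state_wages_reconcile(state_wages: dict, total: Decimal) -> bool:
    """True when the per-state allocations add up exactly to the reported total."""
    return sum(state_wages.values(), Decimal("0")) == total


# Hypothetical record layout, for illustration only
record = {
    "ein": "12-3456789",
    "total_futa_wages": Decimal("21000.00"),
    "state_wages": {"CA": Decimal("14000.00"), "NY": Decimal("7000.00")},
}

ein_ok = is_ein_formatted(record["ein"])
states_ok = state_wages_reconcile(record["state_wages"], record["total_futa_wages"])
```

Using `Decimal` rather than `float` keeps currency sums exact, so the equality check does not fail on rounding noise.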

Training challenges

Part 2 (Lines 3-8) contains a cascading wage calculation: Line 4 (exempt payments) and Line 5 (payments above $7,000 per employee) are totaled on Line 6 and subtracted from Line 3 total payments to give Line 7 taxable FUTA wages, and models must correctly associate each amount with its labeled category. The multi-state credit reduction table on the Schedule A attachment requires extracting state abbreviations paired with wage amounts in a dense two-column grid. Lines 12-15 settle the balance due or overpayment, and the Part 5 quarterly liability fields apply only when Line 12 exceeds $500, so checkbox and threshold values change which subsequent fields are relevant, creating conditional extraction paths.
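As a worked illustration of the Part 2 cascade, here is a minimal sketch using the $7,000 per-employee FUTA wage base and the 0.006 multiplier printed on Line 8. The function name `part2_lines` and the input shape are assumptions for illustration and say nothing about how the generator computes these lines internally.

```python
from decimal import Decimal, ROUND_HALF_UP

FUTA_WAGE_BASE = Decimal("7000")  # per-employee FUTA wage base
NET_FUTA_RATE = Decimal("0.006")  # Line 8 multiplier


def part2_lines(employees):
    """employees: list of (total_payments, exempt_payments) pairs, one per employee."""
    zero = Decimal("0")
    line3 = sum((paid for paid, _ in employees), zero)      # total payments
    line4 = sum((exempt for _, exempt in employees), zero)  # exempt payments
    # Line 5: for each employee, non-exempt payments above the wage base
    line5 = sum(
        (max(paid - exempt - FUTA_WAGE_BASE, zero) for paid, exempt in employees),
        zero,
    )
    line6 = line4 + line5  # subtotal
    line7 = line3 - line6  # total taxable FUTA wages
    line8 = (line7 * NET_FUTA_RATE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {"3": line3, "4": line4, "5": line5, "6": line6, "7": line7, "8": line8}


# Two hypothetical employees: one well above the wage base, one below it
lines = part2_lines([
    (Decimal("52000.00"), Decimal("2000.00")),
    (Decimal("6500.00"), Decimal("0.00")),
])
```

In this example Line 3 is 58,500.00, Line 6 is 45,000.00, Line 7 is 13,500.00, and Line 8 is 81.00, which gives an extractor's post-processing step a simple arithmetic check to run against predicted values.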

Who uses this data

Typical users include payroll-processor intelligent document processing (IDP) pipelines (ADP, Gusto, Rippling-class products), small-business accounting platforms that auto-file FUTA returns, and fraud-detection teams building models that flag wage misreporting across 940/941/W-2 triples. Any employer-side tax-automation stack can use synthetic 940s to train extractors on multi-state credit reduction tables without exposing real payroll data.

Document complexity profile

The 940 is a 90-field, 2-page form with 55 text fields, 19 currency boxes, and 16 checkboxes. Despite its modest size, it is structurally dense: 29 of its bindings use conditional (IF) logic, the third-highest conditional density among our indexable forms, driven by the multi-state credit reduction worksheet and the deposit-schedule selection that changes which downstream fields render. 79 function calls chain the wage and exempt-payment calculations through to the Line 12 total FUTA tax.

Key stats from our synthetic corpus

Quantitative characteristics of the Form 940 - Employer's Annual Federal Unemployment (FUTA) Tax Return documents our generator produces.

| Metric | Value | Detail |
| --- | --- | --- |
| Conditional binding count | 29 | 29 of the 90 fields on Form 940 use IF-conditional logic, the densest conditional layout of any employer payroll form in our corpus. Models must resolve checkbox state before mapping values to lines. |
| Employer eligibility rate | 65% | 65% of synthetic primary identities in our corpus have an employer attached. Form 940 is only issued for that subset, so training datasets should size 940 corpora against this base rate, not the full population. |
| Form function-call density | 79 | Form 940 bindings invoke 79 FORMAT/IF function calls to reconcile total wages, exempt payments, and state-by-state credit reductions, more than any W-2 or 1040 variant. |
| Dual-income employer households | 31% | 31% of synthetic Form 940 employer-side identities are in dual-income households, which is informative for stacks that must reconcile 940 wages against multiple W-2s on the spousal side. |
| Households with dependents | 30% | 30% of synthetic Form 940 employer identities claim dependents on their personal returns, which is useful context when training models that link 940 employers to the personal 1040s they file. |
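The sizing advice in the employer eligibility row comes down to simple arithmetic. The sketch below assumes the 65% rate stated above and uses integer math to avoid floating-point rounding. Both function names are hypothetical.

```python
EMPLOYER_RATE_PCT = 65  # employer eligibility rate reported for this corpus


def expected_940_count(identities: int, rate_pct: int = EMPLOYER_RATE_PCT) -> int:
    """Approximate number of Form 940s a corpus of this many identities yields."""
    return identities * rate_pct // 100


def identities_needed(target_940s: int, rate_pct: int = EMPLOYER_RATE_PCT) -> int:
    """Smallest identity count expected to yield at least target_940s forms."""
    return -(-target_940s * 100 // rate_pct)  # ceiling division on integers


forms_from_1000 = expected_940_count(1000)
identities_for_650 = identities_needed(650)
```

For the 1,000-identity corpus described on this page, this gives an expected 650 Form 940 documents.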

How this document co-occurs with others

Rates at which identities in our corpus that produce a Form 940 - Employer's Annual Federal Unemployment (FUTA) Tax Return also produce other documents.

| Correlation | Rate | Detail |
| --- | --- | --- |
| Pairs with quarterly 941 | 100% | Every synthetic Form 940 employer also files Form 941 quarterly. Training models on the 940 + 4x941 annual bundle is the canonical employer-tax longitudinal dataset. |
| Employer-side W-2 issuance | 100% | Every synthetic Form 940 employer issues W-2s to its employees. Wages reported on 940 Line 3 are consistent with aggregated W-2 Box 1 totals across the same identity seed. |
| Employer owner's 1040 | 100% | The sole-proprietor synthetic employer behind a Form 940 will also produce their own 1040. Useful for training KYC stacks that chain business filings to principal filings. |
| Classic invoice co-generation | 100% | 100% of synthetic Form 940 employers can co-generate a Classic Commercial Invoice, so fintech AP automation pipelines can train on 940 + invoice document pairs from one identity seed. |
| Six-figure-income principals | 19% | 19% of synthetic Form 940 employer principals report personal household income at or above $100K, exercising the higher-wage tier of FUTA calculations. |
| Single-filer owner-operators | 50% | ~50% of synthetic Form 940 employer principals file a personal 1040 as Single, the owner-operator small-business archetype the synthetic pipeline is tuned to reflect. |
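The 940-to-W-2 consistency noted in the table can be checked when training on document bundles. A minimal sketch follows, assuming the Line 3 value and the W-2 Box 1 amounts have already been extracted as `Decimal` values. The function name and the optional tolerance parameter are illustrative assumptions.

```python
from decimal import Decimal


def line3_matches_w2s(line3_total, w2_box1_amounts, tolerance=Decimal("0.00")):
    """Compare 940 Line 3 against the sum of W-2 Box 1 amounts for one employer."""
    box1_sum = sum(w2_box1_amounts, Decimal("0"))
    return abs(line3_total - box1_sum) <= tolerance


# Hypothetical bundle: one Form 940 and the two W-2s issued by the same employer
bundle_ok = line3_matches_w2s(
    Decimal("58500.00"),
    [Decimal("52000.00"), Decimal("6500.00")],
)
```

A non-zero tolerance can absorb per-document rounding if extracted amounts are rounded to whole dollars.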

All stats above are corpus-derived: they were computed on a local synthetic corpus of 1,000 generated identities produced by SymageDocs' World Simulation Engine. No real employer payroll or taxpayer data was used. Regenerate the corpus at any time with `make corpus-stats`.

Frequently asked questions

What data format do synthetic Form 940 documents include?
Each generated identity produces a filled PDF and a structured JSON annotation file containing bounding boxes and field values for all 90 fields across both pages.
Can I use this data commercially?
Yes. All synthetic data is generated from statistical models, contains no real PII, and is licensed for commercial use including ML model training and benchmarking.
How does the synthetic data differ from real Form 940s?
Synthetic 940s use fabricated employer identities with statistically realistic payroll figures. FUTA calculations follow IRS rules, but no data comes from real employer filings.
Does it support multi-state employers?
Yes. The generator can produce Form 940s with multi-state wage allocations, including the Schedule A credit reduction worksheet for employers operating across state lines.
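
A quick sanity check on the JSON annotation file might look like the sketch below. The schema shown (a top-level `fields` list whose entries carry `name`, `page`, `bbox`, and `value`) is an assumption for illustration only, since the actual annotation layout is not documented here.

```python
import json
from collections import Counter

EXPECTED_FIELDS = 90  # field count stated for Form 940 on this page


def fields_per_page(annotation: dict) -> Counter:
    """Count annotated fields on each page of one document."""
    return Counter(field["page"] for field in annotation["fields"])


# Hypothetical two-field excerpt; a full annotation would list all 90 fields
sample = json.loads("""
{"fields": [
  {"name": "line_3_total_payments", "page": 1, "bbox": [72, 410, 210, 428], "value": "58500.00"},
  {"name": "line_16a_q1_liability", "page": 2, "bbox": [72, 120, 210, 138], "value": "0.00"}
]}
""")

counts = fields_per_page(sample)
is_complete = sum(counts.values()) == EXPECTED_FIELDS
```

Here `is_complete` is false because the excerpt holds only two fields. Run against a full annotation file, the same check confirms that all 90 fields are present across both pages.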
