Benchmarks for AI Models and Agents on CAD Tasks
gNucleus-CAD-Bench is a comprehensive benchmark collection for evaluating CAD models and AI agents on CAD design and 3D modeling tasks.

Leaderboard
| Rank | Name | Geometry Accuracy | Spec Consistency | Overall |
|---|---|---|---|---|
| 1 | Claude Opus 4.7 | 89.2 | 87.0 | 87.6 |
| 2 | GPT-5.3-Codex | 86.8 | 83.5 | 85.0 |
| 3 | Claude Opus 4.5 | 82.5 | 79.8 | 80.9 |
| 4 | Claude Opus 4.6 | 82.1 | 79.6 | 80.8 |
| 5 | Gemini 3.1 Pro | 81.9 | 79.0 | 80.6 |
| 6 | MiniMax M2.5 | 81.0 | 79.3 | 80.2 |
| 7 | GPT-5.2 | 80.7 | 78.5 | 80.0 |
| 8 | Qwen3.6 Plus | 80.0 | 77.5 | 78.8 |
| 9 | GLM-5 | 78.8 | 76.9 | 77.8 |
| 10 | Muse Spark | 78.5 | 76.5 | 77.4 |
Coming Soon
Leaderboard scores will be published when the benchmark goes live.
Benchmark Tasks
3D Parametric Part Generation

Generate editable, parametric 3D CAD part models from natural-language prompts and reference inputs.
Assembly Generation

Generate multi-part assemblies with proper mates, constraints, and component hierarchy.
Complex CAD Workflow

Multi-step CAD workflows that generate, iteratively edit, and verify designs until the model meets the target spec.
Evaluation Methods
CAD evaluation should check not only visual similarity, but also whether the generated CAD is valid, accurate, rebuildable, and consistent with the design spec.
Each task is scored automatically in a sandboxed CAD environment, comparing the generated CAD against the design spec and reference CAD (ground truth) across the axes below. Scoring is deterministic: the same output always receives the same score.
Geometry Accuracy
Measures how closely the generated geometry matches the reference part.
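As an illustration only, one common way to compare two shapes is the symmetric Chamfer distance between points sampled from their surfaces. The sketch below assumes that metric and a hypothetical exponential mapping onto a 0-100 scale; the benchmark's actual geometry metric and score mapping are not stated here, and the function names are invented for this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(a: List[Point], b: List[Point]) -> float:
    """Symmetric Chamfer distance between two non-empty point sets.

    Sum of the mean nearest-neighbor distance from a to b and from b to a.
    Brute force, O(len(a) * len(b)); fine for small illustrative samples.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")

    def one_way(src: List[Point], dst: List[Point]) -> float:
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def geometry_score(a: List[Point], b: List[Point], scale: float) -> float:
    """Hypothetical mapping of Chamfer distance onto a 0-100 score.

    `scale` is an assumed length scale (e.g. the part's bounding-box size);
    a distance of zero maps to 100 and larger distances decay toward 0.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return 100.0 * math.exp(-chamfer_distance(a, b) / scale)


# Usage: corners of a unit square versus the same square shifted by 0.1 in x.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
generated = [(x + 0.1, y, z) for x, y, z in reference]
score = geometry_score(generated, reference, scale=1.0)
```

Because the computation has no random component once the sample points are fixed, the same pair of inputs always yields the same score.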
Constraint & Assembly Correctness
Checks constraint satisfaction, mating validity, and assembly stability.
Parametric Correctness
Verifies CAD model consistency with the spec parameters.
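A minimal sketch of what a spec-consistency check could look like, assuming the design spec and the generated model can both be reduced to flat dictionaries of named numeric parameters. The parameter names, tolerances, and the pass-fraction score are assumptions for illustration, not the benchmark's grader.

```python
import math
from typing import Dict


def check_parameters(
    spec: Dict[str, float],
    model: Dict[str, float],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-9,
) -> Dict[str, bool]:
    """Return a pass/fail flag for every parameter named in the spec.

    A parameter passes when the model defines it and its value matches the
    spec within the given tolerances. Missing parameters fail.
    """
    report = {}
    for name, expected in spec.items():
        actual = model.get(name)
        report[name] = actual is not None and math.isclose(
            actual, expected, rel_tol=rel_tol, abs_tol=abs_tol
        )
    return report


def parametric_score(report: Dict[str, bool]) -> float:
    """Hypothetical score: percentage of spec parameters that passed."""
    if not report:
        return 0.0
    return 100.0 * sum(report.values()) / len(report)


# Usage with made-up parameter names (millimeters assumed).
spec = {"length_mm": 40.0, "hole_diameter_mm": 6.0, "fillet_radius_mm": 2.0}
model = {"length_mm": 40.0, "hole_diameter_mm": 6.5}  # fillet is missing
report = check_parameters(spec, model)
```

Here `length_mm` passes, `hole_diameter_mm` fails on value, and `fillet_radius_mm` fails because the model does not define it.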
Topology / Structure
Evaluates topological validity and part structure correctness.
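To make "topological validity" concrete, the sketch below runs two standard checks on a triangle mesh: whether every edge is shared by exactly two faces (watertightness) and the Euler characteristic V - E + F, which is 2 for a closed surface without holes or handles. This assumes a tessellated mesh given as vertex-index triples; native CAD kernels work on B-rep data, and the checks the benchmark actually runs are not specified here.

```python
from collections import Counter
from typing import List, Tuple

Face = Tuple[int, int, int]


def edge_counts(faces: List[Face]) -> Counter:
    """Count how many faces use each undirected edge."""
    counts: Counter = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def is_watertight(faces: List[Face]) -> bool:
    """True when the mesh is non-empty and every edge borders exactly two faces."""
    return bool(faces) and all(n == 2 for n in edge_counts(faces).values())


def euler_characteristic(faces: List[Face]) -> int:
    """V - E + F, counting only vertices referenced by at least one face."""
    vertices = {v for face in faces for v in face}
    return len(vertices) - len(edge_counts(faces)) + len(faces)


# Usage: a tetrahedron is closed; dropping one face opens it.
tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
open_shell = tetrahedron[:3]
```

For the tetrahedron the checks report a watertight mesh with Euler characteristic 2; the open shell fails the watertightness check and has Euler characteristic 1.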
Agent Workflow Success
Assesses task completion rate and workflow correctness.
Efficiency
Measures token usage, execution time, and resource efficiency.
Latest Updates
May 10, 2026
A programmatic grader for evaluating AI-generated parametric FreeCAD parts. It checks geometry similarity to the ground truth and CAD/spec consistency.
May 08, 2026
Open-sourced cad-gen-freecad on Hugging Face: native FreeCAD parts with parametric feature history, design specs, and renderings, aimed at accurate, parametric CAD generation from text prompts.