Benchmarks for AI Models and Agents on CAD Tasks

gNucleus-CAD-Bench is a collection of benchmarks for evaluating AI models and agents on CAD design and 3D modeling tasks.

Gear box CAD assembly
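| Rank | Name | Geometry Accuracy | Spec Consistency | Overall |
|------|------|-------------------|------------------|---------|
| 1 | Claude Opus 4.7 | 89.2 | 87.0 | 87.6 |
| 2 | GPT-5.3-Codex | 86.8 | 83.5 | 85.0 |
| 3 | Claude Opus 4.5 | 82.5 | 79.8 | 80.9 |
| 4 | Claude Opus 4.6 | 82.1 | 79.6 | 80.8 |
| 5 | Gemini 3.1 Pro | 81.9 | 79.0 | 80.6 |
| 6 | MiniMax M2.5 | 81.0 | 79.3 | 80.2 |
| 7 | GPT-5.2 | 80.7 | 78.5 | 80.0 |
| 8 | Qwen3.6 Plus | 80.0 | 77.5 | 78.8 |
| 9 | GLM-5 | 78.8 | 76.9 | 77.8 |
| 10 | Muse Spark | 78.5 | 76.5 | 77.4 |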

Coming Soon

Leaderboard scores will be published when the benchmark goes live.

Benchmark Tasks

3D Parametric Part Generation

Generate editable, parametric 3D CAD part models from natural-language prompts and reference inputs.

Assembly Generation

Generate multi-part assemblies with proper mates, constraints, and component hierarchy.

Complex CAD Workflow

Run multi-step CAD workflows that generate, iteratively edit, and verify a design until the CAD model meets the target spec.
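The generate, edit, and verify cycle described above can be sketched as a small loop. This is only an illustration: `run_workflow`, `WorkflowResult`, and the three callables are hypothetical names, not part of gNucleus-CAD-Bench, and a design is represented here as a plain dictionary of parameters.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorkflowResult:
    design: dict       # final design parameters (hypothetical representation)
    iterations: int    # number of verify passes that were run in the loop
    passed: bool       # True if the last verification reported no issues


def run_workflow(
    generate: Callable[[], dict],
    verify: Callable[[dict], List[str]],
    edit: Callable[[dict, List[str]], dict],
    max_iterations: int = 5,
) -> WorkflowResult:
    """Generate a design, then edit it until verify() reports no issues."""
    design = generate()
    for iteration in range(1, max_iterations + 1):
        issues = verify(design)
        if not issues:
            return WorkflowResult(design, iteration, True)
        design = edit(design, issues)
    # Budget exhausted: report whether the last edit happened to fix things.
    return WorkflowResult(design, max_iterations, not verify(design))


# Toy usage: the "spec" asks for a 10 mm wide part, the first draft is 8 mm.
SPEC = {"width": 10.0}


def toy_generate() -> dict:
    return {"width": 8.0}


def toy_verify(design: dict) -> List[str]:
    return [k for k, v in SPEC.items() if abs(design.get(k, 0.0) - v) > 0.01]


def toy_edit(design: dict, issues: List[str]) -> dict:
    fixed = dict(design)
    for name in issues:
        fixed[name] = SPEC[name]
    return fixed


result = run_workflow(toy_generate, toy_verify, toy_edit)
```

Bounding the loop with `max_iterations` keeps a failing agent from editing forever, which also matters when efficiency is one of the scored axes.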

Evaluation Methods

CAD evaluation should check not only visual similarity, but also whether the generated CAD is valid, accurate, rebuildable, and consistent with the design spec.

Each task is scored automatically in a sandboxed CAD environment, which compares the generated CAD against the design spec and the reference CAD (ground truth) along the axes below. Scoring is deterministic: the same output always receives the same score.

Geometry Accuracy

Measures how closely the generated geometry matches the reference part.
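The page does not state which geometric metric the benchmark uses. As one common possibility, the sketch below computes a symmetric Chamfer distance between points sampled from the generated and reference shapes; the function names are illustrative, and the point sampling itself is assumed to happen elsewhere.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _mean_nearest(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Average distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance; 0.0 means the two point sets coincide."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return 0.5 * (_mean_nearest(a, b) + _mean_nearest(b, a))


# Toy usage: corners of a unit cube versus the same cube shifted 0.1 along x.
cube = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
shifted = [(x + 0.1, y, z) for (x, y, z) in cube]
distance = chamfer_distance(cube, shifted)
```

This brute-force version is quadratic in the number of points, which is fine for a sketch but would need a spatial index for dense samples.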

Constraint & Assembly Correctness

Checks constraint satisfaction, mating validity, and assembly stability.
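One simple ingredient of such a check is verifying that no component is left floating, that is, every part is connected to a fixed base part through a chain of mates. The sketch below assumes an assembly reduced to part names and pairwise mates; `floating_parts` and the gearbox part names are hypothetical, and a real grader would also have to test each mate's geometry.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def floating_parts(
    parts: Iterable[str],
    mates: Iterable[Tuple[str, str]],
    base: str,
) -> List[str]:
    """Return parts that are not reachable from `base` through any mate."""
    adjacency: Dict[str, Set[str]] = {p: set() for p in parts}
    if base not in adjacency:
        raise ValueError(f"unknown base part: {base!r}")
    for a, b in mates:
        if a not in adjacency or b not in adjacency:
            raise ValueError(f"mate references unknown part: {a!r}, {b!r}")
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Breadth-first search over the mate graph, starting at the fixed base.
    seen = {base}
    queue = deque([base])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(set(adjacency) - seen)


# Toy usage: the cover has no mate, so it is reported as floating.
parts = ["housing", "input_shaft", "gear_a", "cover"]
mates = [("housing", "input_shaft"), ("input_shaft", "gear_a")]
unattached = floating_parts(parts, mates, base="housing")
```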

Parametric Correctness

Verifies that the CAD model's parameters are consistent with the parameters given in the design spec.
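A minimal sketch of this kind of check, assuming both the spec and the generated model can be reduced to a flat mapping of parameter names to numbers. The function name, parameter names, and tolerances are illustrative assumptions, not the benchmark's actual interface.

```python
import math
from typing import Dict


def parameter_mismatches(
    spec: Dict[str, float],
    model: Dict[str, float],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-9,
) -> Dict[str, str]:
    """Map each spec parameter that the model misses or violates to a reason."""
    mismatches: Dict[str, str] = {}
    for name, expected in spec.items():
        actual = model.get(name)
        if actual is None:
            mismatches[name] = "missing"
        elif not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches[name] = f"expected {expected}, got {actual}"
    return mismatches


# Toy usage: the bore diameter in the generated gear deviates from the spec.
spec = {"module": 2.0, "teeth": 24, "bore_diameter": 10.0}
model = {"module": 2.0, "teeth": 24, "bore_diameter": 9.5}
report = parameter_mismatches(spec, model)
```

Comparing with a tolerance rather than `==` avoids penalising harmless floating-point noise from unit conversion or rebuilds.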

Topology / Structure

Evaluates topological validity and part structure correctness.
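As an illustration of a topological validity test, the sketch below works on a triangulated surface rather than the native CAD boundary representation, which is an assumption made to keep the example self-contained. It checks that every edge is shared by exactly two triangles (a closed surface) and computes the Euler characteristic V - E + F, which equals 2 for a closed surface without holes or handles.

```python
from collections import Counter
from typing import Sequence, Tuple

Triangle = Tuple[int, int, int]  # indices into a vertex list


def edge_counts(triangles: Sequence[Triangle]) -> Counter:
    """Count how many triangles use each undirected edge."""
    counts: Counter = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def is_watertight(triangles: Sequence[Triangle]) -> bool:
    """True if the mesh is non-empty and every edge borders exactly two faces."""
    counts = edge_counts(triangles)
    return bool(counts) and all(n == 2 for n in counts.values())


def euler_characteristic(triangles: Sequence[Triangle]) -> int:
    """V - E + F over the vertices, edges, and faces used by the mesh."""
    vertices = {v for tri in triangles for v in tri}
    return len(vertices) - len(edge_counts(triangles)) + len(triangles)


# Toy usage: a tetrahedron is closed; removing one face opens it.
tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
open_shell = tetrahedron[:-1]
```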

Agent Workflow Success

Assesses task completion rate and workflow correctness.

Efficiency

Measures token usage, execution time, and resource efficiency.

Latest Updates

May 10, 2026

Open-sourced freecad-validator on GitHub

A programmatic grader for AI-generated parametric FreeCAD parts. It checks geometric similarity to the ground truth and consistency between the CAD model and the design spec.


May 08, 2026

Published the cad-gen-freecad Dataset to Hugging Face

The cad-gen-freecad dataset is now open-sourced on Hugging Face. It contains native FreeCAD parts with parametric feature history, design specs, and renderings, and is aimed at accurate, parametric CAD generation from text prompts.

© 2025 gNucleus AI. All rights reserved.