
May 10, 2026

Open Source Release

Open-sourced freecad-validator on GitHub

Last week, we shared the gnucleus-ai/cad-gen-freecad dataset on Hugging Face. This week, we're open-sourcing freecad-validator — a deterministic, programmatic grader for evaluating AI-generated parametric FreeCAD parts. No LLM, no GPU.

GitHub Repo: https://github.com/gNucleus-AI/freecad-validator/


Why a programmatic grader

As AI agents start generating CAD models, evaluating their output becomes critical. For CAD generation, visual similarity is not enough: the generated part also has to match the design spec and rebuild correctly, and the grader has to score it consistently. A deterministic, reproducible grader gives the same candidate the same score every time, which is what benchmarks and RL reward signals need in order to be trustworthy.

What freecad-validator checks and why
  • Geometry similarity to the ground truth: does the generated part match the reference FreeCAD part in surface types, volume, surface area, and bounding box?
    Why: a CAD script can look plausible and still produce the wrong shape (wrong topology, swapped features, broken booleans). Geometry similarity catches outputs that don't rebuild into the intended part, however clean the underlying script reads.
  • CAD / spec consistency: does the generated FreeCAD part correctly reflect the design specification, including dimensions, features, and engineering intent?
    Why: two parts can have near-identical geometry while encoding very different engineering intent (wrong dimensions, hard-coded magic numbers, or features that happen to match by coincidence). Spec consistency confirms the model actually understood the design requirements, which is what makes the output editable, parametric, and useful downstream.
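The post does not spell out how freecad-validator computes either sub-score, so the following is only a hypothetical sketch of what checks along these two axes could look like. Everything in it is an assumption: the `ShapeSummary` container, the relative-difference similarity, the weighted-Jaccard overlap of face types, the equal weights, and the `rel_tol` tolerance are illustrative choices, not the repository's method. It also assumes the shape properties were already extracted; in FreeCAD they could plausibly come from a Part shape's `Volume`, `Area`, and `BoundBox` (`XLength`, `YLength`, `ZLength`) plus the surface class of each face. The spec check covers only numeric dimensions; features and engineering intent are beyond this sketch.

```python
# Hypothetical sketch of the two sub-scores; NOT freecad-validator's actual code.
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ShapeSummary:
    """Properties of a rebuilt part (assumed to be extracted from FreeCAD beforehand)."""
    volume: float
    area: float
    bbox: tuple[float, float, float]  # X, Y, Z extents
    surface_types: Counter = field(default_factory=Counter)  # e.g. {"Plane": 6}


def ratio_similarity(a: float, b: float) -> float:
    """1.0 for equal values, decreasing linearly with relative difference, floored at 0."""
    if a == b:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / max(abs(a), abs(b)))


def histogram_overlap(p: Counter, q: Counter) -> float:
    """Weighted Jaccard overlap of two face-type histograms (1.0 = same mix of surfaces)."""
    keys = set(p) | set(q)
    union = sum(max(p[k], q[k]) for k in keys)
    if union == 0:
        return 1.0  # neither shape has any faces counted
    return sum(min(p[k], q[k]) for k in keys) / union


def geometry_similarity(cand: ShapeSummary, ref: ShapeSummary) -> float:
    """Equal-weight average of four illustrative geometric comparisons, in [0, 1]."""
    bbox = sum(ratio_similarity(c, r) for c, r in zip(cand.bbox, ref.bbox)) / 3
    parts = [
        ratio_similarity(cand.volume, ref.volume),
        ratio_similarity(cand.area, ref.area),
        bbox,
        histogram_overlap(cand.surface_types, ref.surface_types),
    ]
    return sum(parts) / len(parts)


def cad_spec_consistency(measured: dict[str, float], spec: dict[str, float],
                         rel_tol: float = 1e-3) -> float:
    """Fraction of specified dimensions the candidate reproduces within rel_tol.

    A dimension missing from `measured` counts as a miss.
    """
    if not spec:
        return 1.0
    hits = sum(
        1
        for name, target in spec.items()
        if name in measured and math.isclose(measured[name], target, rel_tol=rel_tol)
    )
    return hits / len(spec)


# Usage: a 30 x 20 x 10 box as reference, and a candidate that is 2 units too tall.
ref = ShapeSummary(6000.0, 2200.0, (30.0, 20.0, 10.0), Counter({"Plane": 6}))
cand = ShapeSummary(7200.0, 2400.0, (30.0, 20.0, 12.0), Counter({"Plane": 6}))
spec = {"length": 30.0, "width": 20.0, "height": 10.0}
print(geometry_similarity(cand, ref))  # about 0.924
print(cad_spec_consistency({"length": 30.0, "width": 20.0, "height": 12.0}, spec))  # 2 of 3 dimensions match
```

The too-tall candidate still scores fairly high on geometry because most of its shape is right, while the spec check flags the wrong height outright. That difference is why the two axes are scored separately.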
How the two axes combine

The two sub-scores are combined into a single overall verdict via the harmonic mean, chosen so a strong score on one axis cannot rescue a weak score on the other:

$$
\mathrm{combined}(g, s) = \frac{2\,g\,s}{g + s} \qquad \text{(returns 0 when either } g \text{ or } s \text{ is 0)}
$$

where g = geometry_similarity and s = cad_spec_consistency. A part that looks right but ignores the spec — or matches the spec but renders a different shape — cannot quietly pass.
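The combination rule is small enough to state directly in code. The following is a minimal Python rendering of the harmonic-mean formula above, not code taken from the repository. It assumes both sub-scores lie in [0, 1]. The function name `combined` simply mirrors the formula, and `arithmetic` is a hypothetical helper included only for comparison.

```python
def combined(g: float, s: float) -> float:
    """Harmonic mean of the geometry (g) and spec (s) sub-scores.

    Scores are assumed to lie in [0, 1]. Returns 0.0 when either sub-score
    is 0, which also avoids dividing by zero when both are 0.
    """
    if g <= 0.0 or s <= 0.0:
        return 0.0
    return 2.0 * g * s / (g + s)


def arithmetic(g: float, s: float) -> float:
    """Plain average, shown only to contrast with the harmonic mean."""
    return (g + s) / 2.0


# Balanced, lopsided, and one-axis-zero cases.
for g, s in [(0.9, 0.9), (1.0, 0.2), (0.95, 0.0)]:
    print(f"g={g:.2f} s={s:.2f}  arithmetic={arithmetic(g, s):.3f}  harmonic={combined(g, s):.3f}")
```

With g = 1.0 and s = 0.2, plain averaging gives 0.6 while the harmonic mean gives about 0.333. The weak axis dominates the result, so a perfect geometry score cannot carry a part that ignores the spec.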

The goal

Make CAD generation evaluation more deterministic, transparent, and useful for training engineering AI agents. A consistent, reproducible grader is the foundation for benchmarking models and providing reliable reward signals during RL training.



© 2025 gNucleus AI. All rights reserved.