May 08, 2026 • Dataset Release
Published the cad-gen-freecad Dataset to Hugging Face
We are open-sourcing a FreeCAD dataset for CAD generation tasks: https://huggingface.co/datasets/gnucleus-ai/cad-gen-freecad
The goal is to help move AI CAD research beyond generating meshes such as STL, or dumb B-Rep files such as STEP, toward agents that can generate editable, parametric CAD models with feature history from text prompts.
What each sample contains
- Native FreeCAD file (ground truth): a .FCStd file with parametric feature history, step-by-step CAD operations, and associated 2D sketches — the reference geometry that generated outputs are evaluated against.
- Design specification: the part name, description, and key parameters needed to deterministically define the 3D geometry in FreeCAD.
- Image rendering: a PNG preview rendered from the CAD model for quick visual comparison.
- 3D visualization data: display data that can be loaded into a 3D viewer for interactive inspection of the part.
Parametric, not dumb geometry
The FreeCAD files are not dumb solids (STEP) or meshes (STL, GLB). Each part preserves the step-by-step CAD construction process, including features such as Pad, Pocket, Loft, Sweep, Pattern operations, and the associated 2D sketches.
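Because a .FCStd file is a ZIP archive with a Document.xml entry, the recorded objects can be listed without launching FreeCAD. The following is a minimal sketch, assuming the usual layout in which each object is declared as an `<Object type="..." name="..."/>` element under `<Objects>`; the helper name `list_features` is hypothetical and not part of the dataset or of FreeCAD.

```python
import zipfile
import xml.etree.ElementTree as ET


def list_features(fcstd_path):
    """Return (type, name) pairs for the objects declared in a .FCStd file.

    `fcstd_path` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(fcstd_path) as archive:
        with archive.open("Document.xml") as handle:
            root = ET.parse(handle).getroot()
    # Assumption: objects are declared as <Objects><Object type=... name=.../></Objects>.
    return [
        (obj.get("type"), obj.get("name"))
        for obj in root.findall("./Objects/Object")
    ]
```

For a Part Design part, the returned types would typically include entries such as `PartDesign::Body`, `Sketcher::SketchObject`, and `PartDesign::Pad`, in the order the objects are stored in the document.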
Structured design specs
The design spec describes the target part in a structured way, including the part name, natural-language description, and key parameters such as dimensions, positions, counts, radii, angles, and other values needed to define the model deterministically.
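To make "deterministically define" concrete, here is an illustrative sketch of what such a spec could look like in Python. The class `DesignSpec`, its field names, the parameter keys, and the example part are all hypothetical; the dataset's actual schema may differ.

```python
from dataclasses import dataclass, field


@dataclass
class DesignSpec:
    # Hypothetical field names; the dataset's actual schema may differ.
    name: str
    description: str
    parameters: dict = field(default_factory=dict)


# Made-up example part, not taken from the dataset.
spec = DesignSpec(
    name="mounting_plate",
    description="Rectangular plate with four corner through-holes.",
    parameters={
        "length_mm": 80.0,
        "width_mm": 50.0,
        "thickness_mm": 5.0,
        "hole_count": 4,
        "hole_diameter_mm": 6.0,
        "hole_edge_offset_mm": 8.0,
    },
)


def hole_centers(params):
    """Corner hole centers implied by the parameters, with the origin at one plate corner."""
    offset = params["hole_edge_offset_mm"]
    xs = (offset, params["length_mm"] - offset)
    ys = (offset, params["width_mm"] - offset)
    return [(x, y) for x in xs for y in ys]
```

The point of the sketch is that every geometric quantity, such as the hole positions, follows from the listed parameters, so two correct models built from the same spec should agree.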
Why accurate CAD generation from design specs matters
Accurate CAD generation matters because CAD is an engineering artifact, not just a visual 3D shape. The generated CAD model must correctly reflect the design specification, including dimensions, constraints, feature structure, and engineering intent. The goal is not only to create geometry that looks similar, but to generate valid, rebuildable, and editable CAD that matches the intended design.
This is especially important for training AI CAD agents, including with RL, because the agent needs consistent and meaningful feedback on whether its generated CAD satisfies the design spec. Evaluation should therefore measure not only visual similarity, but also geometric accuracy, rebuild validity, parametric structure, and consistency with the original design requirements.
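One way these criteria could be folded into a single RL reward is sketched below. This is an assumption for illustration, not the evaluator's actual scoring: the `EvalScores` fields, the weights, and the choice to gate on rebuild validity are all hypothetical, and each component score is assumed to be normalized to [0, 1] by checks that are not shown.

```python
from dataclasses import dataclass


@dataclass
class EvalScores:
    # Each float score is assumed to be normalized to [0, 1] by upstream checks.
    rebuild_ok: bool
    geometric_accuracy: float
    parametric_structure: float
    spec_consistency: float
    visual_similarity: float


# Illustrative weights only; they sum to 1.
WEIGHTS = {
    "geometric_accuracy": 0.4,
    "parametric_structure": 0.25,
    "spec_consistency": 0.25,
    "visual_similarity": 0.1,
}


def reward(scores):
    """Weighted sum of the component scores, gated on a successful rebuild."""
    if not scores.rebuild_ok:
        # A model that fails to rebuild earns nothing, however good it looks.
        return 0.0
    return sum(weight * getattr(scores, name) for name, weight in WEIGHTS.items())
```

Gating on the rebuild check keeps an agent from collecting reward for output that only resembles the target visually, and the low weight on visual similarity reflects the same priority.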
Why FreeCAD
FreeCAD is open source, fully scriptable from Python, and has a stable native format (.FCStd) that preserves the full feature history. Because it runs entirely offline, it is also straightforward to drop into a sandboxed container for automated evaluation. It is also a real CAD system that engineers use: its operations, constraints, and conventions match production workflows, unlike a research-only DSL layered on an open-source geometry kernel such as OpenCascade (OCCT).
Why Part Design (feature-based)
FreeCAD has two solid-modeling styles: the Part Workbench (CSG: booleans on primitives) and the Part Design Workbench (feature-based: sketches plus a parametric feature tree). We use Part Design because it captures engineering intent through parameter-driven operations (Pad, Pocket, Loft, Sweep, Pattern…) on top of sketches, and because it is how professional CAD systems (SolidWorks, Onshape, CATIA, NX, Creo) actually work. That structure gives a much richer evaluation signal (whether the right features were used, whether the parameters match, whether the part still rebuilds when a dimension changes), which CSG loses once the booleans are applied.
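The "right features used, parameters match" part of that signal can be sketched as a comparison of two feature lists. This is a simplified illustration, not the planned evaluator: the function `compare_features`, the `(feature_type, parameters)` representation, and the tolerance are assumptions, and the rebuild check is out of scope here because it needs FreeCAD itself.

```python
import math


def compare_features(reference, generated, rel_tol=1e-3):
    """Compare two feature lists given as (feature_type, {param: value}) tuples.

    Returns (types_match, param_score), where param_score is the fraction of
    reference parameters reproduced by the feature at the same position.
    """
    types_match = [t for t, _ in reference] == [t for t, _ in generated]
    matched = total = 0
    for index, (ref_type, ref_params) in enumerate(reference):
        # A missing generated feature counts all of its parameters as unmatched.
        gen_type, gen_params = generated[index] if index < len(generated) else (None, {})
        for key, ref_value in ref_params.items():
            total += 1
            gen_value = gen_params.get(key)
            if (
                gen_type == ref_type
                and gen_value is not None
                and math.isclose(ref_value, gen_value, rel_tol=rel_tol)
            ):
                matched += 1
    param_score = matched / total if total else 1.0
    return types_match, param_score


# Illustrative feature lists; the values are made up.
reference = [("Pad", {"Length": 5.0}), ("Pocket", {"Length": 2.0})]
generated = [("Pad", {"Length": 5.0}), ("Pocket", {"Length": 2.5})]
print(compare_features(reference, generated))  # (True, 0.5)
```

A CSG result offers nothing comparable to score, since only the final solid remains after the booleans are applied.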
What's next
We're also planning to open-source an evaluator and publish a CAD benchmark on top of this dataset.
Would love feedback from anyone working on AI CAD, CAD automation, engineering tasks for AI agents, or agent benchmarks.