Deterministic Evidence-Based BIM Generation from 2D Vector Structural Floor Plans

Kim, Si Uk
Department of Architecture Engineering
Graduate School, Dankook University
Advisor: Kim, Chee Kyeong

The construction industry operates on a vast asset base of 2D CAD drawings and demands a paradigm shift to Building Information Modeling (BIM) in order to improve productivity and strengthen data quality. Under current domestic policy, there is a trend toward making BIM mandatory for all future public works, and the demand for converting 2D drawing data into 3D BIM is continuously increasing. Despite efforts to introduce BIM, the digitalization of the construction sector in Korea remains at an early stage in terms of BIM adoption on site, and the level of subsequent utilization is low. The main reason is that building BIM models requires an excessive amount of time and manpower. When engineers interpret complex drawing information, highly skilled experts must perform repetitive manual work, which leads to inefficiencies in time, manpower, and cost. Existing studies tend to rely on parsing based on algorithmic tree structures, machine learning that uses drawing image recognition techniques, or, more recently, large language models. Through these methods, patterns and shapes that frequently appear in real-world drawings have been extracted and recognized to construct automatic modeling pipelines. However, these methods have not met the requirements for determinism, explicit evidence, and operational safety that real projects impose. This study proposes a vector-first hybrid pipeline that combines uncertainty-aware rule-based parsing and large language model orchestration to generate structural BIM from 2D vector drawings. Each run is logged into an evidence file (evidence.json), a structured JSON record that aggregates inputs, parameters, diagnostics, and per-element scores for reproducible auditing.
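As a rough illustration of what such an evidence record could look like, the following minimal sketch assembles the four groups named above (inputs, parameters, diagnostics, per-element scores) into one JSON document. The field names, the layer name S-WALL, and the element identifiers are assumptions made for this example; the abstract does not specify the actual evidence.json schema.

```python
import hashlib
import json


def build_evidence(dxf_bytes, parameters, diagnostics, element_scores):
    """Assemble one run's evidence record (all field names are illustrative)."""
    return {
        # Hash of the input drawing, so the record identifies what was parsed.
        "inputs": {"dxf_sha256": hashlib.sha256(dxf_bytes).hexdigest()},
        "parameters": parameters,
        "diagnostics": diagnostics,
        # Sort by element id so the list order does not depend on dict order.
        "elements": [
            {"id": element_id, "score": round(score, 4)}
            for element_id, score in sorted(element_scores.items())
        ],
    }


def dump_evidence(record):
    """Serialize with sorted keys and fixed separators for byte-stable text."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


record = build_evidence(
    b"0\nSECTION\n",  # stand-in bytes, not a real drawing
    {"gap_tolerance_mm": 2.0},  # hypothetical parameter
    ["closed 1 micro gap on layer S-WALL"],  # hypothetical diagnostic
    {"wall-002": 0.91, "col-001": 0.78},  # hypothetical per-element scores
)
text = dump_evidence(record)
```

Sorting both the keys and the element list is one simple way to make two records from identical inputs compare equal as plain text, which is what an audit or regression test would rely on.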
Previous research on drawing recognition for practical conversion from drawings to BIM has explored rule-based vector parsing, machine learning on rasterized plan images, and pipelines that heavily depend on large language models. These approaches learn recurring patterns and shapes in real-world drawings and can automatically construct BIM models. However, they do not cover the full range of design symbols and project-specific rules and remain sensitive to overlapping or inconsistent layer conventions. In addition, they rarely express uncertainty explicitly, and there are very few cases that provide machine-readable evidence and deterministic execution that enable safe rollback, regression testing, and traceable auditing. The core contribution of this study is a hybrid pipeline that automates BIM generation from structural drawings based on explicit intermediate evidence and contract-like data representations. First, vector DXF files are taken as input, and overlapping objects are separated using geometric algorithms and layer information. Geometric defects such as unclosed polygons, broken lines, and micro gaps are detected and corrected to stabilize subsequent modeling stages, and each candidate wall, column, beam, and slab segment is assigned confidence scores and evidence handles, which are recorded in evidence.json. Next, facts and actions are separated into two contract-based intermediate representations. BIM-JSON stores geometry, topology, and key parameters for the current interpretation as a single source of truth, and Action-JSON stores a small set of atomic operations, with explicit bounds and idempotency keys, for execution in Revit. For type inference and parameter extraction tasks that are difficult to express using rules alone, a large language model operates only on these structured packets and plays the role of a policy corrector within the action schema. 
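The idea of atomic operations with explicit bounds and idempotency keys can be sketched as follows. This is a minimal example under stated assumptions, not the Action-JSON schema of the study: the field names, the content-hash key scheme, and the Executor class (a stand-in for the Revit side that only records what it would create) are all hypothetical.

```python
import hashlib
import json


def idempotency_key(action):
    """Derive a stable key from the action's content (hypothetical scheme)."""
    payload = json.dumps(
        {k: v for k, v in action.items() if k != "idempotency_key"},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def make_action(op, element_id, params, bounds, evidence_ref):
    """Build one atomic action that carries its own bounds and evidence link."""
    action = {
        "op": op,
        "element_id": element_id,
        "params": params,
        "bounds": bounds,  # parameter name -> [low, high]
        "evidence": evidence_ref,
    }
    action["idempotency_key"] = idempotency_key(action)
    return action


def within_bounds(action):
    """Check every bounded parameter against its [low, high] interval."""
    return all(
        low <= action["params"][name] <= high
        for name, (low, high) in action["bounds"].items()
    )


class Executor:
    """Stand-in for the modeling side; it only records accepted actions."""

    def __init__(self):
        self.applied = {}

    def apply(self, action):
        key = action["idempotency_key"]
        if key in self.applied:
            return "skipped"  # replaying the same action changes nothing
        if not within_bounds(action):
            return "rejected"
        self.applied[key] = action
        return "applied"


wall = make_action(
    "create_wall", "wall-002",
    {"thickness_mm": 200}, {"thickness_mm": [100, 600]},
    "evidence.json#/elements/1",
)
executor = Executor()
print(executor.apply(wall))  # applied
print(executor.apply(wall))  # skipped
```

Because the key is derived from the action's content, re-running the same action set after a rollback would not create duplicate elements in this sketch.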
The language model reviews unmatched labels and ambiguous matches, proposes limited action adjustments based on calibrated confidence, and, instead of creating new facts, attaches links to the evidence that supports the adjustments. To avoid amplifying misrecognition when label or object information is missing, the pipeline adopts a vertical stepwise validation procedure. Three independently running LLM agents act as Planner, Validator, and Mapper and review matching consistency and potential errors; only then are BIM actions accepted. The Preflight module then performs rule-based checks on unit and coordinate consistency, clash and continuity, and schema conformity, and it rejects or quarantines any action set that violates the contract. Finally, automatic modeling in Revit is performed through Python scripts only for Action-JSON that has passed validation, and the logs and evidence collected at each stage are recorded for feedback and auditing. Transactions are executed atomically so that errors cause neither partial model creation nor unlogged side effects. Geometric information is exported in GeoJSON format together with overlay images so that experts can quickly cross-check the results against evidence.json at key stages. By making use of vector data, the proposed pipeline can directly use accurate coordinate and topological information without relying on image recognition, and as a result it can reduce dimensional and positional errors in the generated models. By structuring the conversion from 2D vector drawings to BIM as an end-to-end process that includes explicit intermediate representations, evidence, and validation, this approach reduces repetitive intervention by engineers while maintaining determinism and traceability.
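A rule-based gate of the kind described for the Preflight stage might look like the sketch below. It covers three of the listed checks in simplified form: schema conformity as a required-field test, unit consistency as a string comparison, and clash detection as an axis-aligned bounding-box overlap. The field names and the all-or-nothing acceptance rule are assumptions for illustration; the actual checks in the study are not given in this abstract.

```python
REQUIRED = ("op", "element_id", "unit", "bbox")  # hypothetical schema


def boxes_overlap(a, b):
    """True when two boxes (xmin, ymin, xmax, ymax) share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def preflight(action_set, expected_unit="mm"):
    """Check a whole action set; one violation blocks the entire set."""
    violations = []
    checked = []
    for index, action in enumerate(action_set):
        missing = [field for field in REQUIRED if field not in action]
        if missing:
            violations.append((index, "missing fields: " + ", ".join(missing)))
            continue  # cannot run further checks on an incomplete action
        if action["unit"] != expected_unit:
            violations.append((index, "unit mismatch: " + str(action["unit"])))
        for other in checked:
            if boxes_overlap(action["bbox"], other["bbox"]):
                violations.append((index, "clash with " + other["element_id"]))
        checked.append(action)
    return {"accepted": not violations, "violations": violations}


columns = [
    {"op": "create_column", "element_id": "col-001", "unit": "mm",
     "bbox": (0, 0, 400, 400)},
    {"op": "create_column", "element_id": "col-002", "unit": "mm",
     "bbox": (200, 200, 600, 600)},  # overlaps col-001
]
report = preflight(columns)
```

Returning the full list of violations rather than stopping at the first one keeps the report useful as audit evidence, while the single `accepted` flag ensures that a set with any violation is never partially executed.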
Unlike existing pipelines that rely only on rule-based algorithms or only on large language models, the proposed architecture restricts the role of the language model to a clearly bounded scope and maintains machine-readable evidence at points where engineering judgment is required. It also supports controlled rollback and re-execution and guarantees that repeated runs under the same conditions produce identical outputs that can be compared and audited. This study shows that BIM models can be generated from early-stage structural drawings while reducing modeling time by 41.3±4.8% compared with traditional manual modeling and keeping the manual modification rate at 8.1±2.5% of all elements. Experiments conducted on actual building projects in the domestic construction environment confirm that the proposed pipeline can be integrated into real workflows without changing drawing conventions. The results show that combining uncertainty-aware vector parsing with contract-like validation of actions allows large language models to make a practical contribution to the automation of drawing recognition and BIM modeling. In this configuration, the system also meets engineering requirements for accuracy, determinism, and auditability.
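The claim that repeated runs under the same conditions produce identical outputs can be checked mechanically by comparing digests of a canonical serialization. The sketch below assumes a toy `run_pipeline` function as a stand-in for the real pipeline, and the drawing and parameter structures are hypothetical; only the comparison technique is the point.

```python
import hashlib
import json


def canonical_digest(obj):
    """Hash a canonical JSON form so equal content gives an equal digest."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_pipeline(drawing, parameters):
    """Toy stand-in for the pipeline: a pure function of its inputs."""
    # Sorting removes any dependence on the order segments were read in.
    segments = sorted(drawing["segments"])
    return {
        "walls": [
            {"start": start, "end": end,
             "thickness": parameters["thickness_mm"]}
            for start, end in segments
        ]
    }


def is_reproducible(drawing, parameters, runs=3):
    """Run several times and require a single distinct output digest."""
    digests = {
        canonical_digest(run_pipeline(drawing, parameters))
        for _ in range(runs)
    }
    return len(digests) == 1


drawing = {"segments": [[[0, 0], [5000, 0]], [[0, 0], [0, 3000]]]}
parameters = {"thickness_mm": 200}
reproducible = is_reproducible(drawing, parameters)
```

A digest stored alongside each run would let a regression test flag any change in output without comparing full models.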