05.13.2026

Crossing the Virtual-Physical Boundary: Building Next-Generation Data Center Compliance Validation with NVIDIA Cosmos and OpenUSD

AI Research, Agaruda
Sean Huang, Liam Huang



When building and operating modern large-scale data centers or smart factories, compliance is often one of the most challenging issues for engineering teams. From early-stage spatial and piping design to ISO-based occupational safety requirements during operations, traditional compliance workflows rely heavily on manual drawing reviews and on-site inspections. These processes are time-consuming, and they cannot fully guard against human error.

To address this pain point, Agaruda recently introduced a dual-track automated validation workflow. On the operations side, we use NVIDIA Cosmos, a world foundation model built for Physical AI, to perform video-based reasoning. On the design side, we work directly with the underlying architecture of OpenUSD to develop automated spatial geometry inspection tools.

Below are three cases showing how we applied these technologies to production compliance work.

High-angle 3D rendering of an industrial cooling system with multiple fan units and piping on a rooftop.

1. Operational compliance automation: video reasoning with NVIDIA Cosmos Reason

In day-to-day operations, ensuring that on-site environments comply with ISO requirements is an extremely labor-intensive task. Traditional computer vision models are often limited to single-object detection and struggle to understand complex temporal sequences and physical-world context. To overcome this limitation, we introduced NVIDIA Cosmos-Reason2, a model with advanced spatiotemporal understanding capabilities.

Practical Use Case: Physical Separation Validation Between Server Racks and Liquid-Cooling Piping

Physical separation between coolant systems and IT equipment is one of the highest-priority requirements in data center safety. For example, one critical safety rule states that liquid-cooling pipes must never pass directly above server racks without a physical barrier, to prevent potential coolant leakage from dripping directly onto high-value computing equipment.

In the past, engineers had to manually inspect the ceiling area section by section. Now, we feed inspection camera footage into Cosmos-Reason2. The model can identify “liquid-cooling pipes” and “server racks,” but more importantly, it can understand the three-dimensional spatial relationship and physical occlusion within the scene.

Through video reasoning, the AI automatically generates structured compliance reports. For example, it can accurately determine that the liquid-cooling pipe in the video passes directly above the server rack, and that there is no physical ceiling or protective barrier between them. As a result, the system immediately triggers an alert, classifies the area as “Non-compliant,” and assigns the risk level as Critical.
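To make the idea of a structured compliance report concrete, here is a minimal sketch of how a model's answer could be turned into a status and a risk level. It assumes the model has been prompted to reply in JSON; the field names (`rule_id`, `pipe_above_rack`, `barrier_present`), the rule identifier, and the helper `parse_finding` are hypothetical and are not part of Cosmos-Reason2 or of our production schema.

```python
import json
from dataclasses import dataclass


@dataclass
class ComplianceFinding:
    """One rule check derived from a model response (hypothetical schema)."""
    rule_id: str
    pipe_above_rack: bool
    barrier_present: bool

    @property
    def status(self) -> str:
        # The rule is violated only when a pipe is overhead AND nothing shields the rack.
        if self.pipe_above_rack and not self.barrier_present:
            return "Non-compliant"
        return "Compliant"

    @property
    def risk_level(self) -> str:
        return "Critical" if self.status == "Non-compliant" else "None"


def parse_finding(raw: str) -> ComplianceFinding:
    """Parse a JSON string (assumed model output) into a finding."""
    data = json.loads(raw)
    return ComplianceFinding(
        rule_id=str(data["rule_id"]),
        pipe_above_rack=bool(data["pipe_above_rack"]),
        barrier_present=bool(data["barrier_present"]),
    )


# Illustrative response text, not real model output.
sample = '{"rule_id": "PIPE-OVER-RACK", "pipe_above_rack": true, "barrier_present": false}'
finding = parse_finding(sample)
print(finding.status, finding.risk_level)  # prints: Non-compliant Critical
```

Keeping the classification logic outside the model, as in this sketch, means the alert decision stays deterministic even if the wording of the model's explanation varies.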

In this workflow, the AI plays the role of a meticulous senior engineering auditor. It flags hidden physical safety risks that single-object detectors tend to miss, reducing both the false-negative rate of traditional video analytics and the cost of manual inspections.

Top-down 3D schematic of a modular data center showing server racks, cooling pipes, and a white structural frame.

2. Overcoming the data bottleneck: generative applications with Cosmos Predict and Transfer

Beyond passive inspection, we also began thinking about a deeper challenge: how can we train edge AI systems for rare but catastrophic edge cases?

We cannot, and should not, start a fire inside a real data center just to collect training data. This is where the predictive and controllable generation capabilities of Cosmos become our synthetic data engine. By combining different control conditions, we can generate diverse training scenarios while preserving accurate physical structures. The workflow can be divided into three stages:

  • Baseline Hazard Prediction with Cosmos Predict:
    First, we use the Cosmos Predict model to generate a physically plausible dynamic video. In our current use case, we directly generate a simulated video of “large amounts of smoke emerging inside a data center server room” as the baseline scenario, as shown in Video 1.
  • Structural Feature Extraction through Edge Extraction:
    To preserve the correct geometric layout of server racks, liquid-cooling pipes, and smoke diffusion contours when modifying the scene later, we extract precise edge features from the video, as shown in Video 2. This step keeps the spatial structure consistent across every variant generated afterward.
  • Style Transfer and Scene Generation with Edge + Text Prompt:
    Finally, we use the extracted edge map as a control condition and combine it with a specific text prompt. The model can then generate videos with entirely different lighting styles, equipment materials, or environmental conditions while preserving the original spatial outline of “smoke emerging from server racks,” as shown in Video 3.
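The edge extraction stage can be illustrated with a small, dependency-free sketch. It applies a 3x3 Sobel operator to a single grayscale frame and thresholds the gradient magnitude into a binary edge map. This is only an illustration of the general technique: the post does not specify which edge detector the pipeline uses, and the function name `sobel_edges`, the threshold value, and the toy frame are assumptions.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame, row-major, values 0-255


def sobel_edges(frame: Frame, threshold: int = 128) -> Frame:
    """Return a binary edge map (1 = edge) from 3x3 Sobel gradients.

    Border pixels are left as 0 because the kernel does not fit there.
    """
    h, w = len(frame), len(frame[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            # L1 approximation of the gradient magnitude.
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges


# Toy frame: dark left half, bright right half, so one vertical boundary.
frame = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edges(frame)
for row in edge_map:
    print("".join("#" if v else "." for v in row))
```

Run on every frame of the baseline video, a map like this carries only contours, which is why lighting and materials can be changed by the text prompt while rack and pipe outlines stay in place.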

Through this Edge + Text Prompt control technique, we can generate a large number of extreme hazard scenarios in a virtual environment. This allows us to expand the training dataset for edge AI at very low cost, so that anomaly detection systems can be trained to stay stable and reliable when facing real-world emergencies.


3. Design compliance defense: physical-space validation with OpenUSD

AI solves the “dynamic” challenges during operations, but for the “static” design of hardware infrastructure, what we need is highly accurate and deterministic validation.

In ISO-related data center construction requirements, one fundamental rule is clear: secondary liquid-cooling pipes must never pass directly above server racks. This prevents any potential coolant leakage from dripping onto critical IT equipment and causing short circuits. However, in large and complex 3D piping design models, manual visual inspection is inefficient and highly prone to omissions.

To address this, we built an automated spatial compliance validation workflow on top of OpenUSD. Instead of relying on a heavy rendering engine, we directly parse the structural data from the underlying USD layer:

  • Parsing USD Hierarchy and Semantics:
    The system can directly read USD files and traverse complex 3D assembly structures to accurately extract the attributes and classifications of each object. This allows the program to clearly distinguish the locations of “server racks” and “liquid-cooling pipes.”
  • Automated Spatial Relationship Computation:
    After acquiring object information, the algorithm analyzes their relative coordinates in 3D space. By using vertical projection from a top-down view and comparing elevation data, it can quickly identify areas where objects overlap vertically.
  • Precise Compliance Alerting:
    Once the system detects that the liquid-cooling piping path overlaps with the area above a server rack, it immediately triggers a compliance violation alert and automatically generates a visualization containing the abnormal coordinates.
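The vertical projection and elevation comparison described above can be sketched as follows. This is a simplified illustration rather than our production tool: it assumes each object has already been reduced to a world-space axis-aligned bounding box, that the stage is Z-up, and that units are meters. The `Box` type, the `kind` labels, and all coordinates are hypothetical; in a real pipeline the boxes would come from the parsed USD stage rather than being hardcoded.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """World-space axis-aligned bounding box (assumed Z-up, meters)."""
    name: str
    kind: str  # "rack" or "pipe" (hypothetical classification labels)
    min_xyz: Vec3
    max_xyz: Vec3


def footprints_overlap(a: Box, b: Box) -> bool:
    """True if the top-down (XY) projections share a positive area."""
    return all(
        a.min_xyz[i] < b.max_xyz[i] and b.min_xyz[i] < a.max_xyz[i]
        for i in (0, 1)
    )


def find_violations(boxes: List[Box]) -> List[Tuple[str, str]]:
    """Return (pipe, rack) pairs where a pipe runs above a rack."""
    racks = [b for b in boxes if b.kind == "rack"]
    pipes = [b for b in boxes if b.kind == "pipe"]
    return [
        (p.name, r.name)
        for p in pipes
        for r in racks
        # Footprints overlap AND the pipe's bottom is at or above the rack's top.
        if footprints_overlap(p, r) and p.min_xyz[2] >= r.max_xyz[2]
    ]


scene = [
    Box("rack_A", "rack", (0.0, 0.0, 0.0), (0.6, 1.2, 2.2)),
    Box("pipe_1", "pipe", (0.2, -5.0, 2.8), (0.3, 5.0, 2.9)),  # crosses over rack_A
    Box("pipe_2", "pipe", (1.0, -5.0, 2.8), (1.1, 5.0, 2.9)),  # runs beside rack_A
]
violations = find_violations(scene)
print(violations)  # prints: [('pipe_1', 'rack_A')]
```

The pairwise loop is quadratic in the number of objects; for very large assemblies a spatial index over the footprints would be a natural refinement.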

This approach enables us to eliminate physical design risks during the digital twin stage, before engineering procurement and construction work begins.

Elevation plot showing blue rectangular spatial objects with one overlapping pair highlighted by a red box and arrow.

Conclusion

Automated compliance requires a dual approach.

We use the scene-structure parsing capabilities of OpenUSD to ensure that physical space design meets a deterministic compliance baseline. At the same time, we use the video reasoning and world generation capabilities of the NVIDIA Cosmos model family to give the system the ability to understand dynamic reality and predict future scenarios.

By moving from passive “post-event auditing” toward proactive “prevention and automated reasoning,” Physical AI and digital twin technologies are reshaping how compliance gets done in the industrial sector.

