Abstract
Mesh models have become increasingly accessible for numerous cities; however, the lack of realistic textures restricts their application in virtual urban navigation and autonomous driving. To address this, this paper proposes MeSS (Mesh-based Scene Synthesis) for generating high-quality, style-consistent outdoor scenes with city mesh models serving as the geometric prior. While image and video diffusion models can leverage spatial layouts (such as depth maps or HD maps) as control conditions to generate street-level perspective views, they are not directly applicable to 3D scene generation. Video diffusion models excel at synthesizing consistent view sequences that depict scenes but often struggle to adhere to predefined camera paths or align accurately with rendered control videos. In contrast, image diffusion models, though unable to guarantee cross-view visual consistency, can produce more geometry-aligned results when combined with ControlNet. Building on this insight, our approach enhances image diffusion models by improving cross-view consistency. The pipeline comprises three key stages: first, we generate geometrically consistent sparse views using Cascaded Outpainting ControlNets; second, we propagate denser intermediate views via a component dubbed AGInpaint; and third, we globally eliminate visual inconsistencies (e.g., varying exposure) using the GCAlign module. Concurrently with generation, a 3D Gaussian Splatting (3DGS) scene is reconstructed by initializing Gaussian balls on the mesh surface. Our method outperforms existing approaches in both geometric alignment and generation quality. Once synthesized, the scene can be rendered in diverse styles through relighting and style transfer techniques.
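The abstract mentions that the 3DGS scene is reconstructed by initializing Gaussians directly on the mesh surface. The snippet below is a minimal sketch of how such an initialization could look for a triangle mesh, assuming uniform area-weighted surface sampling; it is an illustrative assumption, not the authors' released implementation, and the function and parameter names are hypothetical.

```python
import numpy as np

def init_gaussians_on_mesh(vertices, faces, n_points=100_000, rng=None):
    """Sample Gaussian centers uniformly on a triangle mesh surface.

    vertices: (V, 3) float array, faces: (F, 3) int array of vertex indices.
    Returns centers (N, 3), normals (N, 3), and isotropic scales (N,).
    Schematic initialization only; the paper's exact procedure may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    tri = vertices[faces]                       # (F, 3, 3) triangle corners
    e1, e2 = tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]
    face_normals = np.cross(e1, e2)             # unnormalized -> area-weighted
    areas = 0.5 * np.linalg.norm(face_normals, axis=1)

    # Sample faces proportionally to their area so coverage is uniform.
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())

    # Uniform barycentric sampling inside each chosen triangle.
    u, v = rng.random(n_points), rng.random(n_points)
    flip = u + v > 1.0
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    centers = (tri[face_idx, 0]
               + u[:, None] * e1[face_idx]
               + v[:, None] * e2[face_idx])

    normals = face_normals[face_idx]
    normals /= np.linalg.norm(normals, axis=1, keepdims=True) + 1e-12

    # Heuristic: tie the initial Gaussian footprint to local triangle size.
    scales = np.sqrt(areas[face_idx])
    return centers, normals, scales
```

Anchoring the initial centers and normals to the mesh surface gives the subsequent optimization a strong geometric prior, which is what lets the generated views stay aligned with the untextured city geometry.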
Method
The MeSS pipeline synthesizes viewpoints for reconstructing a Gaussian scene following a sparse-to-dense scheme. Given a 3D city map (i.e., a mesh model with semantic and instance labels but without texture), we specify a virtual camera path via a sequence of M views. In Stage I, we generate a subset of N key view images along the sequence via a warp-and-outpaint procedure: starting from the initial key frame generated by the geometry-conditioned ControlNet-S, each preceding key frame is warped into the next view and serves as an additional condition for outpainting the new key frame with ControlNet-N (Sec. 3.2). After obtaining all key frames, we use them to construct a Gaussian field by optimizing Gaussian surfels on the surface of the mesh model. In Stage II, we render from the Gaussian scene the intermediate views between each pair of consecutive key views. Artifacts such as silhouettes in the intermediate frames are filled in by Appearance-Guided Inpainting (Sec. 3.3). Lastly, Global Consistency Alignment (Sec. 3.4) further enhances the appearance consistency of the Gaussian surfels learned from different views.
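To make the Stage I warp-and-outpaint procedure concrete, the following Python-style sketch walks through the sparse key-view loop described above: the first key frame is generated purely from geometric conditions rendered from the mesh, and each subsequent key frame is outpainted from a warp of the previous one. All callables here (render_mesh_conditions, warp_to_view, controlnet_s, controlnet_n) are hypothetical placeholders for illustration, not the released API.

```python
import torch

@torch.no_grad()
def generate_key_views(mesh, key_cams, controlnet_s, controlnet_n,
                       render_mesh_conditions, warp_to_view, prompt):
    """Stage-I sketch: warp-and-outpaint propagation of sparse key frames.

    controlnet_s: geometry-conditioned model that synthesizes the first view.
    controlnet_n: outpainting model additionally conditioned on the warped
                  previous key frame. All callables are assumed interfaces.
    """
    key_frames = []

    # First key frame: generated from geometric conditions (e.g., depth and
    # semantic maps) rasterized from the untextured city mesh.
    cond0 = render_mesh_conditions(mesh, key_cams[0])
    key_frames.append(controlnet_s(prompt, cond0))

    for prev_cam, cam in zip(key_cams[:-1], key_cams[1:]):
        cond = render_mesh_conditions(mesh, cam)

        # Warp the previous key frame into the new camera using mesh geometry;
        # pixels without a source correspondence stay empty and define the
        # outpainting mask.
        warped, mask = warp_to_view(key_frames[-1], prev_cam, cam, mesh)

        # Outpaint the disoccluded regions while keeping the warped content
        # fixed, so the new key frame stays geometry-aligned and consistent
        # with its predecessor.
        key_frames.append(controlnet_n(prompt, cond, warped, mask))

    return key_frames
```

The returned key frames then supervise the optimization of the Gaussian surfels before Stage II densifies the sequence with intermediate views.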
Stylized Videos through Relighting or SDEdit
BibTeX
@article{chen2025mess,
title={MeSS: City mesh-guided outdoor scene generation with cross-view consistent diffusion},
author={Chen, Xuyang and Zhai, Zhijun and Zhou, Kaixuan and Wang, Zengmao and He, Jianan and Wang, Dong and Zhang, Yanfeng and Westermann, R{\"u}diger and Schindler, Konrad and Meng, Liqiu and others},
journal={arXiv preprint arXiv:2508.15169},
year={2025}
}