VGGT-Segmentor:
Geometry-Enhanced Cross-View Segmentation

CVPR 2026 Oral 🎉

Yulu Gao^*¹, Bohao Zhang^*², Zongheng Tang¹, Jitong Liao², Wenjun Wu², Si Liu^†²

1. Hangzhou International Innovation Institute of Beihang University 2. Beihang University

* denotes equal contribution; † denotes corresponding author

Exo → Ego Cross-View Segmentation

We propose VGGT-Segmentor, a geometry-aware framework for segmenting the same physical object across egocentric and exocentric views. Built on VGGT[1] and its powerful multi-view geometric representations, VGGT-Segmentor combines mask prompt fusion, point-guided prediction, and iterative refinement to achieve robust cross-view segmentation under extreme changes in viewpoint, scale, and occlusion.

Project video

Overview presentation

We hope you enjoy our overview video! It offers a fun and intuitive introduction to VGGT-Segmentor and our geometry-enhanced cross-view segmentation framework.

The Model

VGGT-Segmentor consists of a VGGT Encoder and a lightweight Union Segmentation Head. The Union Segmentation Head has three stages: Mask Prompt Fusion, Point-Guided Prediction, and Mask Refinement. Its design draws significant inspiration from Segment Anything Model 2[2].
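To make the three-stage data flow concrete, here is a toy sketch in plain Python. It is not the authors' implementation: every function body is a hypothetical stand-in (prototype matching by cosine similarity, a single peak point, and threshold re-estimation), the names are chosen only to mirror the stage names above, and the per-pixel features are assumed to come from some encoder that is not modeled here. Images are flattened to lists of pixels to keep the example short.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mask_prompt_fusion(src_feats, src_mask, tgt_feats):
    """Stage 1 stand-in: average the source features under the mask prompt
    into a prototype, then score every target pixel against it."""
    selected = [f for f, m in zip(src_feats, src_mask) if m]
    proto = [sum(c) / len(selected) for c in zip(*selected)]
    return [cosine(f, proto) for f in tgt_feats]


def point_guided_prediction(scores):
    """Stage 2 stand-in: take the best-scoring pixel as a point prompt and
    keep pixels whose score is at least half of that peak."""
    point = max(range(len(scores)), key=scores.__getitem__)
    mask = [s >= 0.5 * scores[point] for s in scores]
    return point, mask


def mask_refinement(scores, mask, steps=2):
    """Stage 3 stand-in: re-estimate the threshold from the current mask
    for a fixed number of iterations."""
    for _ in range(steps):
        inside = [s for s, m in zip(scores, mask) if m]
        if not inside:
            break
        threshold = 0.5 * sum(inside) / len(inside)
        mask = [s >= threshold for s in scores]
    return mask


def segment_target_view(src_feats, src_mask, tgt_feats):
    """Chain the three stand-in stages in the order named in the text."""
    scores = mask_prompt_fusion(src_feats, src_mask, tgt_feats)
    _point, coarse = point_guided_prediction(scores)
    return mask_refinement(scores, coarse)


# Toy example: 4 pixels per view, 2-dimensional features.
src_feats = [[1, 0], [1, 0], [0, 1], [0, 1]]
src_mask = [True, True, False, False]
tgt_feats = [[0, 1], [1, 0], [0, 1], [1, 0]]
tgt_mask = segment_target_view(src_feats, src_mask, tgt_feats)
```

In this toy input the object pixels share the feature `[1, 0]` in both views, so the sketch recovers them at their new positions in the target view. The real head is learned; the sketch only illustrates how a source-view mask prompt can be turned into a target-view mask through three successive stages.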

The Results

We evaluate our method on the Ego-Exo4D benchmark and report the results here. Our approach achieves 67.7% IoU on Ego→Exo and 68.0% IoU on Exo→Ego, surpassing the previous state-of-the-art method, DOMR, by 18.0 and 12.8 percentage points, respectively. It also outperforms the LLM-based ObjectRelator by 22.3 and 17.1 percentage points in the two directions. In the zero-shot learning (ZSL) setting, our model achieves 54.1% IoU on Ego→Exo and 58.4% IoU on Exo→Ego.
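For readers unfamiliar with the metric, the following is a minimal sketch of intersection-over-union (IoU) for a single pair of binary masks. The function name `mask_iou` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration. How the benchmark aggregates per-mask scores into the reported numbers is not specified here.

```python
def mask_iou(pred, gt):
    """IoU between two binary masks given as equally sized 2D lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p, g = bool(p), bool(g)
            intersection += p and g
            union += p or g
    if union == 0:
        return 1.0  # assumed convention: two empty masks count as a match
    return intersection / union


# Two 2x2 squares on a 4x4 grid, offset by one column.
pred = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
gt = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
iou = mask_iou(pred, gt)  # 2 shared pixels out of 6 covered pixels
```

Here the masks share 2 pixels and jointly cover 6, so the IoU is 1/3. A score such as 67.7% IoU therefore means that, on average, about two thirds of the area covered by either the predicted or the ground-truth mask is covered by both.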

Benchmarks

Results on Ego-Exo4D

Citation

If you use VGGT-Segmentor in your research, please use the following BibTeX entry.

@article{gao2026vggt,
  title={VGGT-Segmentor: Geometry-Enhanced Cross-View Segmentation},
  author={Gao, Yulu and Zhang, Bohao and Tang, Zongheng and Liao, Jitong and Wu, Wenjun and Liu, Si},
  journal={arXiv preprint arXiv:2604.13596},
  year={2026}
}