3 June 2026
1:30-1:35 PM SINT4CH Organizers
Welcome and Opening Remarks
1:35-2:15 PM Iro Armeni
“Unchaining the Parthenon: Generative Models for 3D Part Assembly in Cultural Heritage”
For nearly half a century, the anastylosis of the Parthenon has advanced marble block by marble block. What would it take to unchain it, to compress decades of manual fragment matching into days? As one step toward that goal, I will present Rectified Point Flow (RPF), a unified parameterization that casts pairwise point cloud registration and multi-part shape assembly as a single conditional generative problem. Given unposed point clouds, the model learns a continuous velocity field that transports noisy points to their target positions, recovering part poses while intrinsically discovering assembly symmetries without labels. RPF achieves state-of-the-art results on six benchmarks and shows that geometric priors transfer across tasks as different as furniture assembly and the reassembly of fractured artifacts. The harder puzzle remains open: discovering which weathered fragments belong together before any registration can begin. I will close with the challenges that remain before this task can be solved in practice.
2:15-2:55 PM Hadar Averbuch-Elor
“Dreaming From Fragments: Grounding Partial Reconstructions in the Wild”
Complete and richly detailed 3D reconstructions provide a powerful foundation for preserving, analyzing, and exploring cultural heritage sites. Yet for most historical landmarks, the available visual evidence is sparse, incomplete, and unevenly distributed, making such comprehensive reconstructions difficult to obtain. In this talk, I will present two recent works that explore how partial image observations can be grounded within larger spatial contexts, enabling fragmented reconstructions to be integrated into unified 3D models of historical sites. Together, these works highlight opportunities for moving beyond reconstruction alone, suggesting a path toward realizing the dream of complete digital reconstructions—not despite the fragments, but through them.
3:00-3:40 PM Qixing Huang
“A Retrospective Analysis of Computational Solutions to Fractured Object Reassembly”
This talk will review two decades of research on algorithmic fractured object reassembly, from early hand-crafted pipelines to more recent deep learning approaches. We will focus on the principles behind core tasks, such as fracture surface segmentation and matching and the use of templates, and discuss the pros and cons of end-to-end pose regression versus geometric matching. We will conclude the talk with open challenges in this domain.
3:40-5:00 PM Oral Session - Chairs: Jing Zhang, Emanuele Balloni
3:40-4:00 PM Zeyu Jiang, Sihang Li, Siqi Tan, Chenyang Xu, Juexiao Zhang, Julia Galway-Witham, Xue Wang, Scott A. Williams, Radu Iovita, Chen Feng, Jing Zhang
CRAG: Can 3D Generative Models Help 3D Assembly?
4:00-4:20 PM Shuyi Yin
Documentation Infrastructure and Ethical Challenges in Spatial AI for Cultural Heritage
4:20-4:40 PM Hairong He, Wanhua Li, William Freeman, Mark Hamilton, Yilun Du, Hanspeter Pfister, Eugene Wang, Chenchen Lu, Ruojin Cai
Murals in Motion: Reimagining Medieval Chinese Dance from Poses in Murals
4:40-5:00 PM Sharva Gogawale, Gal Grudka, Daria Shapira, Omer Ventura, Berat Kurar, Nachum Dershowitz
Bag of Bags: Adaptive Visual Vocabularies for Genizah Join Image Retrieval
5:00-5:40 PM Deblina Bhattacharjee
“Beyond the Canvas: Trustworthy Multimodal AI for Artworks, Comics, and Colonial Archives”
Cultural heritage, including paintings, illustrated archives, museum collections, and colonial-era photographs, is increasingly being mediated by computer vision systems. Foundation models such as CLIP, BLIP, GPT-4V, and Gemini are being deployed to caption, classify, retrieve, and reconstruct heritage imagery, while diffusion models are used to inpaint damaged murals and re-render lost monuments. Yet the field has inherited assumptions from natural-image vision that travel poorly to non-photorealistic, multi-style, and historically loaded content: object detectors trained on COCO collapse on comic panels; depth estimators fail on flattened pictorial spaces; face-recognition models calibrated on contemporary Western faces misclassify subjects in ethnographic archives; and text-to-image models trained on web-scraped art now sit at the centre of active copyright litigation. This keynote argues that trustworthiness in heritage vision is a dense-prediction problem in its own right: a multimodal coordination of segmentation, depth, saliency, style, retrieval, and provenance signals across domains where ground truth is sparse, annotators are non-neutral, and the cost of a confident wrong answer is cultural harm. Drawing from my prior work on the AI for Visual Arts benchmark, Multitasking Transformers, Vision Transformer Adapters for cross-domain multitask learning, DUNIT for unsupervised image-to-image translation, and monocular depth in non-photorealistic imagery, I trace a research trajectory that connects technical methods (self-supervision, domain adaptation, diffusion-based augmentation, multitask transformers) to provenance and safety scaffolding such as datasheets for heritage datasets, C2PA content credentials, and the CARE principles for Indigenous data governance. 
The talk closes with five open challenges for the computer vision community, from non-Western benchmarks and ethnographic-gaze auditing to verifiable generative restoration, and an invitation to treat heritage as one of vision's most demanding and most consequential frontiers.
5:40-5:45 PM Closing Remarks