A contrastive‑learning approach that evaluates NVS quality using nearby, non‑aligned reference views.
Evaluating the perceptual quality of novel view synthesis (NVS) is challenging when pixel‑aligned ground truth is unavailable. Full‑reference image quality assessment (IQA) methods break under misalignment, while no‑reference models often fail to generalize to NVS artifacts. We introduce the Non‑Aligned Reference (NAR) IQA setting for NVS and present NOVA, a LoRA‑enhanced DINOv2 model trained with supervised contrastive learning on localized synthetic distortions applied within motion‑aware Temporal Regions of Interest (TROI). NOVA robustly predicts human preferences using either aligned or non‑aligned reference views, achieving state‑of‑the‑art accuracy on a new NAR‑IQA benchmark for NVS and strong correlations on NVS‑QA.
The NOVA benchmark contains 1,035 curated triplets drawn from 17 NeRF/GS scenes (with four train/test splits), each triplet consisting of two distorted views and either an aligned or a nearby non‑aligned reference from the same scene. Triplets underwent expert review to retain only high‑agreement cases. We will release the dataset to the research community.
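Evaluation on such triplets reduces to a two-alternative forced choice (2AFC): the model agrees with humans when its quality scores rank the two distorted views the same way the annotators did. A minimal sketch of that accuracy computation follows; the function and argument names are assumptions for illustration, not the benchmark's released evaluation code.

```python
import numpy as np

def two_afc_accuracy(scores_a, scores_b, human_prefers_a):
    """Fraction of triplets where model and human preferences agree.

    scores_a, scores_b: per-triplet model quality scores for the two
    distorted views (higher = better); human_prefers_a: boolean votes.
    Hypothetical helper; the benchmark's own protocol may differ.
    """
    model_prefers_a = np.asarray(scores_a) > np.asarray(scores_b)
    return float(np.mean(model_prefers_a == np.asarray(human_prefers_a)))
```

For example, with scores (0.9, 0.1), (0.2, 0.8), (0.7, 0.6) and human votes (A, B, B), the model agrees on the first two triplets but not the third, giving an accuracy of 2/3.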
Dataset page (UT Austin): access and license details are provided on the dataset page.
Copyright (c) 2026 The University of Texas at Austin. All rights reserved. Use permitted with attribution; see dataset readme/license for terms.
@inproceedings{ghildyal2026nova,
  title     = {Non-Aligned Reference Image Quality Assessment for Novel View Synthesis},
  author    = {Ghildyal, Abhijay and Sureddi, Rajesh and Barman, Nabajeet and Zadtootaghaj, Saman and Bovik, Alan C.},
  booktitle = {Submitted to WACV},
  year      = {2026}
}