Abstract
Evaluating the perceptual quality of novel view synthesis (NVS) is challenging when pixel‑aligned ground truth is unavailable. Full‑reference IQA methods break under misalignment, while no‑reference models often fail to generalize to NVS artifacts. We introduce the Non‑Aligned Reference (NAR) IQA setting for NVS and present NOVA, a LoRA‑enhanced DINOv2 model trained with supervised contrastive learning on localized synthetic distortions applied within motion‑aware Temporal Regions of Interest (TROI). NOVA robustly predicts human preferences from either aligned or non‑aligned reference views, achieving state‑of‑the‑art accuracy on a new NAR‑IQA benchmark for NVS and strong correlations on NVS‑QA.
- First subjective study of NAR‑IQA in NVS.
- New benchmark across 17 scenes (NeRF and Gaussian Splatting) with 1,035 curated triplets.
- Contrastive framework with IQA‑model supervision (DISTS & DeepDC) + KL regularization toward DINOv2.
- Robust under spatial misalignment; small gap between aligned and non‑aligned performance.
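To make the training objective in the bullets concrete, here is a minimal sketch of a supervised contrastive loss combined with a KL regularizer that keeps adapted features close to a frozen backbone's output distribution. This is an illustrative NumPy reconstruction under assumptions, not the paper's implementation: the exact loss form, temperature, labels derived from DISTS/DeepDC scores, and the `supcon_loss`/`kl_to_frozen` names are all hypothetical.

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over embeddings z (N, D).
    Positives share a label (hypothetically, the same quality rank
    assigned by IQA-model supervision such as DISTS or DeepDC)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)        # L2-normalize
    sim = z @ z.T / tau                                     # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                          # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    # mean log-probability over positives, averaged over anchors that have any
    per_anchor = np.where(pos, log_prob, 0.0).sum(1) / np.maximum(pos.sum(1), 1)
    return -per_anchor[pos.any(1)].mean()

def kl_to_frozen(adapted_logits, frozen_logits):
    """KL(softmax(adapted) || softmax(frozen)), averaged over samples:
    a regularizer pulling the LoRA-adapted features toward the frozen
    DINOv2 distribution so the adapter does not drift from the backbone."""
    def softmax(x):
        e = np.exp(x - x.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    p, q = softmax(adapted_logits), softmax(frozen_logits)
    return (p * (np.log(p) - np.log(q))).sum(axis=1).mean()
```

A total loss would then combine the two terms, e.g. `supcon_loss(z, labels) + lam * kl_to_frozen(adapted, frozen)` with a weighting hyperparameter `lam` (assumed, not from the paper).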