NOVA

NOVA: Non‑Aligned Reference Image Quality Assessment for Novel View Synthesis

A contrastive‑learning approach that evaluates NVS quality using nearby, non‑aligned reference views.

Abhijay Ghildyal1 Rajesh Sureddi2 Nabajeet Barman1 Saman Zadtootaghaj1 Alan C. Bovik2
1 Sony Interactive Entertainment    2 University of Texas at Austin
Figure: An aligned reference, a nearby non‑aligned reference, and two distorted candidates (Candidate 1 and Candidate 2).
Human preference: Candidate 2 ✓. NOVA: Candidate 2 ✓. Baselines: LPIPS → 1 ✗, DISTS → 1 ✗, ST‑LPIPS → 1 ✗, CrossScore → 1 ✗.

Abstract

Evaluating the perceptual quality of novel view synthesis (NVS) is challenging when pixel‑aligned ground truth is unavailable. Full‑reference IQA methods break under misalignment, while no‑reference models often fail to generalize to NVS artifacts. We introduce the Non‑Aligned Reference (NAR) IQA setting for NVS and present NOVA, a LoRA‑enhanced DINOv2 model trained with supervised contrastive learning on localized synthetic distortions applied within motion‑aware Temporal Regions of Interest (TROI). NOVA robustly predicts human preferences using either aligned or non‑aligned reference views and achieves state‑of‑the‑art accuracy on a new NVS NAR‑IQA benchmark and strong correlations on NVS‑QA.

Method

Figure: TROI generation and synthetic distortions. Localized synthetic distortions are applied within motion‑based TROIs to mimic realistic NVS artifacts.
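As a rough illustration of what "localized distortion within a motion‑based region" could look like, the sketch below builds a mask by simple frame differencing and adds Gaussian noise only inside it. This is an assumption‑laden stand‑in: the page does not specify how TROIs are computed or which distortions are used, and the names `motion_mask` and `distort_in_region`, the threshold, and the noise level are hypothetical.

```python
import numpy as np

def motion_mask(frame_a, frame_b, threshold=0.1):
    """Boolean mask of pixels whose absolute change between two frames exceeds threshold.

    Assumption: plain frame differencing stands in for the motion-aware TROI.
    """
    diff = np.abs(frame_a.astype(np.float64) - frame_b.astype(np.float64))
    if diff.ndim == 3:  # average over color channels if present
        diff = diff.mean(axis=-1)
    return diff > threshold

def distort_in_region(frame, mask, noise_std=0.2, seed=0):
    """Add Gaussian noise to `frame` (values in [0, 1]) only where `mask` is True."""
    rng = np.random.default_rng(seed)
    frame = frame.astype(np.float64)
    noisy = np.clip(frame + rng.normal(0.0, noise_std, size=frame.shape), 0.0, 1.0)
    out = frame.copy()
    out[mask] = noisy[mask]  # pixels outside the mask are left untouched
    return out

# Toy example: two 4x4 grayscale frames that differ only in a central 2x2 patch.
frame_a = np.full((4, 4), 0.5)
frame_b = frame_a.copy()
frame_b[1:3, 1:3] = 1.0
mask = motion_mask(frame_a, frame_b)
distorted = distort_in_region(frame_a, mask)
```

Restricting the distortion to the mask leaves the rest of the view untouched, which is the property the caption describes; a realistic pipeline would presumably use distortions closer to actual NVS artifacts than additive noise.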
Figure: NOVA architecture. LoRA‑enhanced DINOv2 with dual outputs, trained using two cosine triplet losses and a KL prior toward the frozen backbone.
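For readers unfamiliar with the term, a cosine triplet loss can be sketched in a few lines. The version below is a generic formulation, not necessarily the exact one used in NOVA: the margin value, and the choice of the reference as anchor with the preferred candidate as positive and the other candidate as negative, are assumptions. It covers a single triplet term only, not the dual outputs or the KL prior.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def cosine_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that is zero once the positive is closer to the anchor
    than the negative by at least `margin` (in cosine distance).

    Assumed roles: anchor = reference embedding, positive = embedding of the
    candidate humans prefer, negative = embedding of the other candidate.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

With toy embeddings, `cosine_triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])` is zero because the positive already coincides with the anchor, while swapping positive and negative yields a positive loss.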

Training Overview

NOVA Dataset

The NOVA benchmark contains 1,035 curated triplets drawn from 17 NeRF/GS scenes (with four train/test splits), each triplet consisting of two distorted views and either an aligned or a nearby non‑aligned reference from the same scene. Triplets underwent expert review to retain only high‑agreement cases. We will release the dataset to the research community.
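A benchmark of this shape is typically scored by checking, for each triplet, whether a metric ranks the two candidates the same way humans did. The sketch below shows one way to compute that agreement; the dictionary keys (`reference`, `candidate_1`, `candidate_2`, `human_choice`) and the toy distance function are hypothetical and do not reflect the released dataset format.

```python
def predict_preference(distance, reference, candidate_1, candidate_2):
    """Return 1 or 2: the candidate the metric places closer to the reference.

    Ties are resolved in favor of candidate 1.
    """
    d1 = distance(reference, candidate_1)
    d2 = distance(reference, candidate_2)
    return 1 if d1 <= d2 else 2

def preference_accuracy(distance, triplets):
    """Fraction of triplets where the metric's choice matches the human choice."""
    correct = sum(
        predict_preference(
            distance, t["reference"], t["candidate_1"], t["candidate_2"]
        ) == t["human_choice"]
        for t in triplets
    )
    return correct / len(triplets)

def squared_l2(u, v):
    """Toy stand-in for a learned perceptual distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Toy feature vectors in place of images; human_choice is 1 or 2.
toy_triplets = [
    {"reference": [0.0, 0.0], "candidate_1": [1.0, 1.0],
     "candidate_2": [0.1, 0.0], "human_choice": 2},
    {"reference": [1.0, 0.0], "candidate_1": [0.9, 0.0],
     "candidate_2": [0.0, 1.0], "human_choice": 2},
]
print(preference_accuracy(squared_l2, toy_triplets))  # 0.5
```

In the second toy triplet the stand‑in distance picks Candidate 1 while the human label is Candidate 2, so it agrees with humans on only one of the two triplets.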

17 scenes
1,035 triplets
NeRF & Gaussian Splatting
Expert‑rated

Dataset Page (UT Austin): access and license details are provided on the dataset page.

Investigators

Copyright

Copyright (c) 2026 The University of Texas at Austin. All rights reserved. Use permitted with attribution; see dataset readme/license for terms.

BibTeX

@inproceedings{ghildyal2026nova,
  title     = {Non-Aligned Reference Image Quality Assessment for Novel View Synthesis},
  author    = {Ghildyal, Abhijay and Sureddi, Rajesh and Barman, Nabajeet and Zadtootaghaj, Saman and Bovik, Alan C.},
  booktitle = {Submitted to WACV},
  year      = {2026}
}