Abstract
Evaluating the perceptual quality of novel view synthesis (NVS) is challenging when pixel‑aligned ground truth is unavailable. Full‑reference IQA methods break under misalignment, while no‑reference models often fail to generalize to NVS artifacts. We introduce the Non‑Aligned Reference (NAR) IQA setting for NVS and present NOVA, a LoRA‑enhanced DINOv2 model trained with supervised contrastive learning on localized synthetic distortions applied within motion‑aware Temporal Regions of Interest (TROI). NOVA robustly predicts human preferences from either aligned or non‑aligned reference views, achieving state‑of‑the‑art accuracy on a new NAR‑IQA benchmark for NVS and strong correlations on NVS‑QA.
- First subjective study of NAR‑IQA in NVS.
- New benchmark across 17 scenes (NeRF and Gaussian Splatting) with 1,035 curated triplets.
- Contrastive framework with IQA‑model supervision (DISTS & DeepDC) + KL regularization toward DINOv2.
- Robust under spatial misalignment; small gap between aligned and non‑aligned performance.
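To make the training objective in the bullets concrete, here is a minimal sketch of a supervised contrastive loss combined with a KL regularizer that keeps adapted features close to a frozen backbone's output distribution. This is an illustrative NumPy reconstruction under assumptions, not the paper's implementation: the exact loss form, temperature, labels derived from DISTS/DeepDC scores, and the `supcon_loss`/`kl_to_frozen` names are all hypothetical.

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over embeddings z (N, D).
    Positives share a label (hypothetically, the same quality rank
    assigned by IQA-model supervision such as DISTS or DeepDC)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)        # L2-normalize
    sim = z @ z.T / tau                                     # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                          # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    # mean log-probability over positives, averaged over anchors that have any
    per_anchor = np.where(pos, log_prob, 0.0).sum(1) / np.maximum(pos.sum(1), 1)
    return -per_anchor[pos.any(1)].mean()

def kl_to_frozen(adapted_logits, frozen_logits):
    """KL(softmax(adapted) || softmax(frozen)), averaged over samples:
    a regularizer pulling the LoRA-adapted features toward the frozen
    DINOv2 distribution so the adapter does not drift from the backbone."""
    def softmax(x):
        e = np.exp(x - x.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    p, q = softmax(adapted_logits), softmax(frozen_logits)
    return (p * (np.log(p) - np.log(q))).sum(axis=1).mean()
```

A total loss would then combine the two terms, e.g. `supcon_loss(z, labels) + lam * kl_to_frozen(adapted, frozen)` with a weighting hyperparameter `lam` (assumed, not from the paper).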