NOVA: Non‑Aligned Reference Image Quality Assessment for Novel View Synthesis

WACV 2026

Perceptual quality assessment for novel view synthesis using non‑aligned references.

Affiliations: (1) Sony Interactive Entertainment, (2) University of Texas at Austin
[Figure: an aligned reference and a nearby, non‑aligned reference shown with two novel views. Novel View 1: LPIPS ✗, DISTS ✗, ST‑LPIPS ✗, CrossScore ✗. Novel View 2: human preference ✓, NOVA ✓.]

Abstract

Evaluating the perceptual quality of novel view synthesis (NVS) is challenging when pixel‑aligned ground truth is unavailable. Full‑reference IQA methods break under misalignment, while no‑reference models often fail to generalize to NVS artifacts. We introduce the Non‑Aligned Reference (NAR) IQA setting for NVS and present NOVA, a LoRA‑enhanced DINOv2 model trained with supervised contrastive learning on localized synthetic distortions applied within motion‑aware Temporal Regions of Interest (TROI). NOVA robustly predicts human preferences using either aligned or non‑aligned reference views and achieves state‑of‑the‑art accuracy on a new NVS NAR‑IQA benchmark and strong correlations on NVS‑QA.


Method

Figure: TROI generation and synthetic distortions. Localized synthetic distortions are applied within motion‑based TROIs to mimic realistic NVS artifacts.
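To make the idea of a localized distortion concrete, here is a minimal pure‑Python sketch that confines a distortion to a motion‑derived mask. It is an illustration under stated assumptions, not the procedure used for NOVA: this page does not say how TROIs are computed or which distortions are applied, so simple frame differencing and Gaussian noise stand in for both, and the names `motion_mask` and `distort_in_region` are hypothetical.

```python
import random

def motion_mask(frame_a, frame_b, threshold=0.1):
    """Mark pixels whose absolute change between two frames exceeds threshold.

    Assumption: frame differencing is only a stand-in for a motion-aware TROI.
    """
    return [
        [abs(a - b) > threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]

def distort_in_region(image, mask, noise_std=0.2, seed=0):
    """Add clipped Gaussian noise where mask is True; copy all other pixels.

    Assumption: Gaussian noise is a placeholder for an NVS-like artifact.
    """
    rng = random.Random(seed)
    out = []
    for row, mask_row in zip(image, mask):
        new_row = []
        for value, inside in zip(row, mask_row):
            if inside:
                value = min(1.0, max(0.0, value + rng.gauss(0.0, noise_std)))
            new_row.append(value)
        out.append(new_row)
    return out

# Toy 3x4 grayscale frames in [0, 1]; only the right two columns change.
frame_t0 = [[0.5, 0.5, 0.2, 0.2] for _ in range(3)]
frame_t1 = [[0.5, 0.5, 0.8, 0.8] for _ in range(3)]

troi = motion_mask(frame_t0, frame_t1)
distorted = distort_in_region(frame_t1, troi)
```

Pixels outside the mask are copied unchanged, so the distorted frame differs from the clean one only inside the region.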
Figure: NOVA architecture. A LoRA‑enhanced DINOv2 with dual outputs, trained using two cosine triplet losses and a KL prior toward the frozen backbone.
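For readers unfamiliar with LoRA, a "LoRA‑enhanced" backbone keeps the pretrained weights frozen and learns a low‑rank update on top of them. The sketch below shows that generic mechanism on a single linear layer in pure Python. It is not NOVA's implementation: the class name `LoRALinear`, the rank, the scaling `alpha`, and the toy weights are illustrative assumptions, this page does not state which DINOv2 layers are adapted, and the dual output heads are not modeled.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0):
        self.weight = weight  # frozen pretrained weight, shape (out, in)
        out_dim, in_dim = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A is usually initialized randomly; fixed small values keep this
        # example deterministic. Shape (rank, in).
        self.A = [[0.01 * (i + j + 1) for j in range(in_dim)] for i in range(rank)]
        # B starts at zero, so the adapted layer initially matches the frozen one.
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

layer = LoRALinear(weight=[[1.0, 0.0], [0.0, 1.0]], rank=1)
x = [1.0, 2.0]
before = layer.forward(x)  # identical to the frozen layer's output
layer.B[0][0] = 1.0        # pretend a training step updated B
after = layer.forward(x)   # first output now includes the low-rank update
```

Because only `A` and `B` are trained, the number of learned parameters scales with the rank rather than with the full weight matrix.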

Training Overview

  • Two triplet losses (cosine distance) with margins 0.3 and 0.1; KL divergence on embeddings with temperature annealing.
  • 63k high‑confidence triplets via IQA supervision filtering.
  • Input resolution 518×518; AdamW; 80 epochs on TROI‑annotated synthetic dataset.
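The loss terms listed above can be sketched in a few lines of pure Python. This is a hedged illustration of a cosine‑distance triplet loss and a temperature‑scaled KL term, not the training code: only the margins 0.3 and 0.1 come from this page. Which margin belongs to which output, the roles assigned to anchor, positive, and negative, the direction of the KL term, the softmax over embedding dimensions, the annealing schedule, and the loss weights are not stated here, so everything of that kind below is an assumption.

```python
import math

MARGINS = (0.3, 0.1)  # the two margins given in the training overview

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def triplet_loss(anchor, positive, negative, margin):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)

def softmax(values, temperature):
    """Numerically stable softmax of values / temperature."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_to_frozen(tuned, frozen, temperature):
    """KL(frozen || tuned) over softmax-normalized embeddings.

    Assumption: both the direction and the softmax normalization are guesses.
    """
    p = softmax(frozen, temperature)
    q = softmax(tuned, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy 2-D embeddings (made up): the anchor plays the role of the reference.
reference, better, worse = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]

losses_ok = [triplet_loss(reference, better, worse, m) for m in MARGINS]
losses_bad = [triplet_loss(reference, worse, better, m) for m in MARGINS]
prior = kl_to_frozen([2.0, 0.0], [0.0, 2.0], temperature=1.0)
no_drift = kl_to_frozen([1.0, 2.0], [1.0, 2.0], temperature=1.0)
```

When the ordering is already correct by more than the margin, the triplet term vanishes. The KL term is zero only when the tuned embedding matches the frozen one, which is what makes it act as a prior toward the backbone.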

NOVA Dataset

The NOVA benchmark contains 1,035 curated triplets drawn from 17 NeRF/GS scenes (with four train/test splits), each triplet consisting of two distorted views and either an aligned or a nearby non‑aligned reference from the same scene. Triplets underwent expert review to retain only high‑agreement cases. We will release the dataset to the research community.
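As a rough illustration of how a triplet benchmark of this shape could be scored, the sketch below computes preference accuracy from embeddings. It rests on an assumption that this page does not confirm: that the predicted preference is the distorted view whose embedding lies closer (in cosine distance) to the reference embedding. The function names and the 2‑D toy embeddings are hypothetical. A real evaluation would use model features.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def predict_preferred(ref_emb, emb_a, emb_b):
    """Return "A" if view A is at least as close to the reference as B, else "B"."""
    if cosine_distance(ref_emb, emb_a) <= cosine_distance(ref_emb, emb_b):
        return "A"
    return "B"

def preference_accuracy(triplets):
    """Fraction of triplets where the prediction matches the human choice.

    Each triplet is (ref_emb, emb_a, emb_b, human_choice), human_choice in {"A", "B"}.
    """
    triplets = list(triplets)
    correct = sum(
        predict_preferred(ref, a, b) == choice for ref, a, b, choice in triplets
    )
    return correct / len(triplets)

# Made-up embeddings and labels, for illustration only.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.1, 0.9], "A"),   # A is closer: prediction matches
    ([0.0, 1.0], [0.8, 0.2], [0.2, 0.8], "B"),   # B is closer: prediction matches
    ([1.0, 1.0], [1.0, 0.9], [1.0, -1.0], "B"),  # A is closer: prediction disagrees
]
accuracy = preference_accuracy(toy_triplets)
```

The same scoring applies whether the reference is aligned or a nearby non‑aligned view, since only embedding distances are compared.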

17 scenes
1,035 triplets
NeRF & Gaussian Splatting
Expert‑rated

Dataset Page (UT Austin)

Citation

If you find NOVA useful in your research, please cite the following paper:

@InProceedings{Ghildyal_2026_WACV,
    author    = {Ghildyal, Abhijay and Sureddi, Rajesh and Barman, Nabajeet and Zadtootaghaj, Saman and Bovik, Alan C},
    title     = {Non-Aligned Reference Image Quality Assessment for Novel View Synthesis},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2026},
    pages     = {6350-6359}
}