Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation

1University of Bonn 2University of Tuebingen 3MIT-IBM Watson AI Lab 4Lamarr Institute for Machine Learning and Artificial Intelligence

The first two images depict the same person, while the last image is different. The MSE of the InceptionV3 predictions is lower for the last two images, whereas the MSE of the wavelet packets shows that the first two images are similar.

Abstract

Modern metrics for generative learning, such as the Fréchet Inception Distance (FID) and the DINOv2-Fréchet Distance (FD-DINOv2), demonstrate impressive performance. However, they suffer from various shortcomings, such as a bias towards specific generators and datasets. To address this problem, we propose the Fréchet Wavelet Distance (FWD), a domain-agnostic metric based on the Wavelet Packet Transform (\(\mathcal{W}_p\)). FWD provides a view across a broad spectrum of image frequencies at high resolution, preserving both spatial and textural aspects. Specifically, we use \(\mathcal{W}_p\) to project generated and real images into the packet coefficient space and compute the Fréchet distance on the resulting coefficients to evaluate the quality of a generator. The metric is general-purpose and dataset-domain agnostic, as it does not rely on any pre-trained network, and it is more interpretable because the Fréchet distance can be computed per packet, enhancing transparency. An extensive evaluation of a wide variety of generators across various datasets shows that FWD generalizes better and is more robust to domain shifts and image corruptions than other metrics.

Method





We propose the Fréchet Wavelet Distance (FWD) primarily to tackle dataset-domain bias. FWD leverages the Wavelet Packet Transform (\(\mathcal{W}_p\)), which constructs a tree of sub-band images by recursively convolving the input with predefined filters. An example of \(\mathcal{W}_p\) in action is provided in the figure below; a small decomposition sketch follows.
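As a rough illustration, the following sketch decomposes an image into wavelet packets using the PyWavelets library. The wavelet ('haar'), boundary mode, and decomposition level are illustrative assumptions and need not match the exact configuration used in the paper.

```python
# Minimal sketch of a 2D wavelet packet decomposition with PyWavelets.
# Wavelet, boundary mode, and level are assumptions for illustration only.
import numpy as np
import pywt

image = np.random.rand(256, 256)  # stand-in for one grayscale image channel

# Recursively filter rows and columns to build the packet tree.
wp = pywt.WaveletPacket2D(data=image, wavelet="haar", mode="reflect", maxlevel=3)

# Gather the coefficients of all packets at the final level.
nodes = wp.get_level(3)                           # 4^3 = 64 packets
coeffs = np.stack([node.data for node in nodes])  # shape: (64, 32, 32)
print(coeffs.shape)
```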

The metric is computed in three steps. First, we compute the \(p^{th}\) packet mean over the set of \(N\) generated images, $$\mu_{g_p} = \frac{1}{N}\sum_{n=1}^{N} \mathcal{W}(I_n)_p,$$ where \(I_n\) denotes the \(n^{th}\) of the \(N\) images. Second, we compute the covariance matrix $$\Sigma_{g_p} = \frac{1}{N-1}\sum_{n=1}^{N}(\mathcal{W}(I_n)_p-\mu_{g_p})(\mathcal{W}(I_n)_p-\mu_{g_p})^T.$$ Analogously, we compute the mean and covariance of the real image set, denoted \(\mu_{r_p}\) and \(\Sigma_{r_p}\), respectively. Finally, we average the Fréchet distance over all \(P\) packets: $$ \text{FWD} = \frac{1}{P} \sum_{p=1}^{P} \left[\, ||\mu_{r_p}-\mu_{g_p}||_2^2 + \text{Trace}\!\left(\Sigma_{r_p}+\Sigma_{g_p}-2\sqrt{\Sigma_{r_p}\Sigma_{g_p}}\right) \right].$$
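A minimal sketch of this computation is given below, assuming the packet coefficients for the real and generated sets have already been extracted and flattened. The function names and array layout are illustrative and do not reflect the pytorchfwd API.

```python
# Sketch of the per-packet Fréchet distance, assuming inputs of shape
# (N images, P packets, D flattened coefficients per packet).
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    """Fréchet distance between two Gaussians parameterised by (mu, Sigma)."""
    covmean = sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):      # discard tiny imaginary parts from numerics
        covmean = covmean.real
    diff = mu_r - mu_g
    return diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean)

def fwd(real_packets, gen_packets):
    """Average the Fréchet distance over all P packets."""
    P = real_packets.shape[1]
    per_packet = []
    for p in range(P):
        mu_r = real_packets[:, p].mean(axis=0)
        mu_g = gen_packets[:, p].mean(axis=0)
        sigma_r = np.cov(real_packets[:, p], rowvar=False)  # 1/(N-1) normalisation
        sigma_g = np.cov(gen_packets[:, p], rowvar=False)
        per_packet.append(frechet_distance(mu_r, sigma_r, mu_g, sigma_g))
    return float(np.mean(per_packet)), np.asarray(per_packet)
```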

Results

Interpretability

Since FWD averages Fréchet distances across all frequency bands, the overall score can be broken down and interpreted per band. The figure below depicts the per-packet FWD for StyleGAN2 and DDGAN. We observe that the frequency characteristics of DDGAN images are closer to the original dataset than those of StyleGAN2 images.
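Building on the fwd() sketch above, one could rank frequency bands by their contribution to the overall score. The toy data below is random and only meant to show the shape of such an analysis.

```python
# Toy per-packet analysis: 200 images, 16 packets, 64 coefficients per packet.
import numpy as np

real_packets = np.random.randn(200, 16, 64)
gen_packets = np.random.randn(200, 16, 64)

score, per_packet = fwd(real_packets, gen_packets)  # fwd() from the sketch above
worst = np.argsort(per_packet)[::-1][:5]            # most divergent frequency bands
print(f"overall FWD: {score:.3f}")
for p in worst:
    print(f"packet {p}: Fréchet distance {per_packet[p]:.3f}")
```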


Domain bias detection


FID prefers Proj. FastGAN over DDGAN on all datasets, whereas FWD prefers DDGAN. FD-DINOv2 agrees with FWD on all datasets except DNDD.



Computational Complexity


Computational efficiency of FID, FD-DINOv2, and FWD.



Use FWD in your evaluation

pip install pytorchfwd

BibTeX

@inproceedings{
veeramacheneni2025fwd,
title={Fr\'echet Wavelet Distance: A Domain-Agnostic Metric for Image Generation},
author={Lokesh Veeramacheneni and Moritz Wolter and Hildegard Kuehne and Juergen Gall},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=QinkNNKZ3b}
}