Modern metrics for generative learning like Fréchet Inception Distance (FID) and DINOv2-Fréchet Distance (FD-DINOv2) demonstrate impressive performance. However, they suffer from various shortcomings, like a bias towards specific generators and datasets. To address this problem, we propose the Fréchet Wavelet Distance (FWD) as a domain-agnostic metric based on the Wavelet Packet Transform (\(\mathcal{W}_p\)). FWD gives a view across a broad spectrum of image frequencies at high resolution, while preserving both spatial and textural aspects. Specifically, we use \(\mathcal{W}_p\) to project generated and real images to the packet coefficient space. We then compute the Fréchet distance on the resulting coefficients to evaluate the quality of a generator. This metric is general-purpose and dataset-domain agnostic, as it does not rely on any pre-trained network. It is also more interpretable, because the Fréchet distance can be computed per packet, which enhances transparency. An extensive evaluation of a wide variety of generators across various datasets shows that the proposed FWD generalizes and is more robust to domain shifts and various corruptions than other metrics.
We propose the Fréchet Wavelet Distance (FWD) primarily to tackle dataset-domain bias. To do so, it leverages the Wavelet Packet Transform (\(\mathcal{W}_p\)). In a nutshell, \(\mathcal{W}_p\) constructs a tree of images by recursively convolving with predefined filters. An example of \(\mathcal{W}_p\) in action is shown in the figure below.
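As a rough illustration of the tree construction, the following sketch implements a wavelet packet transform with NumPy. It assumes Haar filters and even image sizes, and it works on non-overlapping 2x2 blocks, which is equivalent to a stride-2 convolution with 2x2 Haar filters. The function names `haar_step` and `wavelet_packets` are hypothetical and are not part of the `pytorchfwd` package; the wavelet and decomposition level used by FWD may differ.

```python
import numpy as np


def haar_step(x):
    """One 2D Haar analysis step on the last two axes (even height and width assumed)."""
    a = x[..., 0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0      # low-pass in both directions
    top_bottom = (a + b - c - d) / 2.0  # difference between top and bottom rows
    left_right = (a - b + c - d) / 2.0  # difference between left and right columns
    diagonal = (a - b - c + d) / 2.0    # diagonal difference
    return [approx, top_bottom, left_right, diagonal]


def wavelet_packets(image, level):
    """Apply haar_step recursively to every sub-band, not only the approximation.

    Returns an array of shape (4**level, H / 2**level, W / 2**level).
    """
    packets = [np.asarray(image, dtype=np.float64)]
    for _ in range(level):
        # Every node of the tree is split again into four children.
        packets = [child for node in packets for child in haar_step(node)]
    return np.stack(packets, axis=0)


image = np.arange(64, dtype=np.float64).reshape(8, 8)
packets = wavelet_packets(image, level=2)
print(packets.shape)  # 16 packets of size 2x2
```

Because the scaled Haar filters are orthonormal, the packets carry the same total energy as the input image, which is a convenient sanity check for an implementation like this one.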
The metric is computed in three steps. First, we compute the \(p^{th}\) packet mean for the set of \(N\) generated images $$\mu_{g_p} = \frac{1}{N}\sum_{n=1}^{N} \mathcal{W}(I_n)_p,$$ where \(I_n\) denotes the \(n^{th}\) of the \(N\) images and \(\mathcal{W}(I_n)_p\) its \(p^{th}\) packet. Second, we compute the covariance matrix as follows $$\Sigma_{g_p} = \frac{1}{N-1}\sum_{n=1}^{N}(\mathcal{W}(I_n)_p-\mu_{g_p})(\mathcal{W}(I_n)_p-\mu_{g_p})^T.$$ Similarly, we compute the mean and covariance for the set of real images and denote them by \(\mu_{r_p}\) and \(\Sigma_{r_p}\), respectively. Finally, we average the per-packet Fréchet distances over all \(P\) packets: $$ \text{FWD} = \frac{1}{P} \sum_{p=1}^{P} \left[ ||\mu_{r_p}-\mu_{g_p}||_2^2 + \text{Trace}\left(\Sigma_{r_p}+\Sigma_{g_p}-2\sqrt{\Sigma_{r_p}\Sigma_{g_p}}\right) \right].$$
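The three steps can be sketched in NumPy as follows. This is a minimal illustration rather than the `pytorchfwd` implementation: the helper names `frechet_distance` and `fwd_from_packets` are hypothetical, the inputs are assumed to be packet coefficients already flattened to shape `(N, P, D)`, and the trace of the matrix square root is obtained from the eigenvalues of \(\Sigma_{r_p}\Sigma_{g_p}\) instead of an explicit matrix square root.

```python
import numpy as np


def frechet_distance(x, y):
    """Fréchet distance between Gaussians fitted to the rows of x and y, each (N, D)."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    # np.cov uses the 1 / (N - 1) normalisation by default.
    sigma_x = np.atleast_2d(np.cov(x, rowvar=False))
    sigma_y = np.atleast_2d(np.cov(y, rowvar=False))
    # For positive semi-definite covariances, the product has real, non-negative
    # eigenvalues, and Trace(sqrt(A B)) is the sum of their square roots.
    eigvals = np.linalg.eigvals(sigma_x @ sigma_y)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(sigma_x) + np.trace(sigma_y) - 2.0 * trace_sqrt)


def fwd_from_packets(real, generated):
    """Average the per-packet Fréchet distances.

    real, generated: arrays of shape (N, P, D) holding D flattened
    coefficients for each of the P packets of every image.
    """
    num_packets = real.shape[1]
    per_packet = np.array(
        [frechet_distance(real[:, p], generated[:, p]) for p in range(num_packets)]
    )
    return float(per_packet.mean()), per_packet


rng = np.random.default_rng(0)
real = rng.normal(size=(50, 4, 3))  # synthetic stand-in for packet coefficients
shifted = real + 2.0                # same covariance, every mean shifted by 2
score, per_packet = fwd_from_packets(real, shifted)
print(score, per_packet)
```

Returning the per-packet distances next to their mean keeps the contribution of each frequency band available for inspection.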
Because FWD averages Fréchet distances across all frequency bands, the overall score can be broken down into per-packet contributions. The figure below depicts the per-packet FWD for both StyleGAN2 and DDGAN. We observe that the frequency characteristics of DDGAN images are closer to those of the original dataset than those of StyleGAN2 images.
pip install pytorchfwd
@inproceedings{
veeramacheneni2025fwd,
title={Fr\'echet Wavelet Distance: A Domain-Agnostic Metric for Image Generation},
author={Lokesh Veeramacheneni and Moritz Wolter and Hildegard Kuehne and Juergen Gall},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=QinkNNKZ3b}
}