Abstract: Most examinations of neural networks’ learned latent spaces typically employ dimensionality reduction techniques such as t-distributed stochastic neighbor embedding (t-SNE) or uniform ...
Abstract: Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). However, most LVDMs utilize 2D image ...