News

Benchmark environment for evaluating vision-language models (VLMs) on popular video games! - alexzhang13/videogamebench ...
In our recent paper, we propose VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech. Several recent end-to-end text-to-speech (TTS) models enabling single ...