Analysis-Paralysis
A comparative analysis of text-to-image models, setting the stage for working on text-to-scene generation.

Kaveesh Khattar
Dec 30, 2023

The Problem Statement
The goal was to identify the strengths and weaknesses of various text-to-image generation techniques and, by examining their architectural designs, to understand the mechanisms behind their image synthesis capabilities.
Datasets
The datasets used in the comparative analysis paper encompass a diverse range of applications in computer vision and multimedia research.
YFCC100M, a massive collection of 100 million Flickr images and videos, supports advancements in visual perception, while MS-COCO, with its rich annotations for object detection and segmentation, has fueled cutting-edge research in visual comprehension.
The CUB dataset focuses on fine-grained bird species recognition, and the Oxford-102 Flowers dataset aids in the fine-grained classification of floral species. Together, these datasets enable significant progress in object detection, classification, and attribute prediction tasks.
KTH Action Recognition: A benchmark for human action recognition featuring six activities, including walking, running, and boxing, with a focus on spatio-temporal analysis.
UCF Sports: A benchmark for sports action recognition in videos, with diverse action classes such as diving and golf swing, aiding sports video analysis.
Architectures
Our work emphasized deeper experimentation with GAN-based approaches, given their impact on image synthesis.
Metrics
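We explored several metrics used to evaluate AI-generated images. The Inception Score (IS) measures image quality and diversity by comparing each generated image's predicted class distribution, taken from a pretrained Inception classifier, with the average class distribution over all generated images; a higher IS is better. The Fréchet Inception Distance (FID) quantifies the similarity between the real and generated image distributions in Inception feature space, with lower FID indicating better quality.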
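To make the two definitions concrete, here is a minimal NumPy sketch of both metrics. It is not the evaluation code from our paper. It assumes you have already run a pretrained Inception network and have its softmax outputs (for IS) and pooled feature vectors (for FID) as arrays. The function names and array shapes are illustrative choices.

```python
import numpy as np


def inception_score(probs, eps=1e-12):
    """IS = exp(mean over images of KL(p(y|x) || p(y))).

    probs: (N, C) array of class probabilities p(y|x), one row per
    generated image (assumed to come from a pretrained classifier).
    """
    probs = np.asarray(probs, dtype=np.float64)
    marginal = probs.mean(axis=0, keepdims=True)  # p(y), averaged over images
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_real, feats_gen: (N, D) arrays of feature vectors
    (assumed to be Inception activations for real and generated images).
    """
    feats_real = np.asarray(feats_real, dtype=np.float64)
    feats_gen = np.asarray(feats_gen, dtype=np.float64)
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr(sqrt(cov_r @ cov_g)) equals the sum of square roots of the
    # eigenvalues of the product, which are real and non-negative for
    # positive semi-definite covariances (clipped to absorb round-off).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)
```

Two sanity checks follow from the definitions. If every image gets the same class distribution, IS is 1, its minimum. If two feature sets are identical, the Fréchet distance is 0 up to floating-point error.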
For subjective evaluation, the Mean Opinion Score (MOS) involved human ratings of image fidelity on a numerical scale, with higher scores reflecting greater realism. These metrics collectively ensured a balanced assessment of quality, diversity, and user perception.
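As a rough illustration of how a MOS can be summarized, the sketch below averages a list of ratings and reports a normal-approximation 95% confidence half-width. The 1-to-5 scale and the example ratings are placeholder assumptions for illustration only, not data or settings from our study.

```python
import math
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% confidence half-width) for a list of numeric ratings."""
    n = len(ratings)
    mos = mean(ratings)
    # Normal approximation; with a single rating there is no spread to estimate.
    half_width = 1.96 * stdev(ratings) / math.sqrt(n) if n > 1 else 0.0
    return mos, half_width


# Hypothetical ratings from five raters on an assumed 1-5 scale.
example_ratings = [4, 5, 3, 4, 4]
mos, half_width = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

Reporting the interval alongside the mean makes it easier to tell whether a difference between two models is larger than the disagreement among raters.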
The Final Result
We submitted our paper to ACI 2023, and it was successfully published!
