Data is a crucial component in training deep learning models, especially for computer vision tasks, where models rely on large, diverse, and well-annotated datasets to learn visual patterns and make accurate predictions. However, collecting and labeling real-world data is often expensive, time-consuming, and difficult to scale. Annotation requires significant human effort and expertise, particularly when precise labels such as segmentation masks or bounding boxes are needed. Moreover, capturing edge cases, such as rare scenarios or dangerous situations like traffic accidents and hazardous environments, is particularly challenging, either because they occur infrequently or because of ethical and safety concerns during data collection. In this context, frameworks that enable controllable synthetic image generation become highly valuable. They allow diverse, annotated datasets to be created at scale, including rare or risky scenarios, helping overcome many limitations of traditional data collection and significantly accelerating the development of robust computer vision models.