Generation of 3D Images by Editing the Background Using Only a Prompt
Abstract
This study explores an approach to creating 3D visuals by altering backdrops with textual cues, leveraging recent advances in AI, machine learning, and generative models. Traditionally, 3D modeling required expert knowledge and sophisticated tools, but AI-powered generative models now allow users to create realistic 3D scenes from simple text descriptions. The study aims to develop a flexible model capable of multimodal shape denoising, conditional synthesis, and shape interpolation, serving applications in virtual reality (VR), gaming, design, and entertainment. A key challenge in 3D shape generation is integrating global and local information; hierarchical latent spaces offer a solution by capturing both large-scale structure and fine-grained detail. Hierarchical denoising diffusion models (DDMs) are employed to produce high-quality 3D shapes through iterative refinement. Surface reconstruction then yields smooth, realistic models suitable for practical applications such as digital art and VR. Additionally, prompt-based 3D creation enables rapid prototyping, enhancing creative workflows in industries such as filmmaking and gaming. Ethical considerations, including copyright and ownership of AI-generated content, are also discussed.
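To make the "iterative refinement" of a denoising diffusion model concrete, the sketch below runs a toy DDM reverse process on a 1D latent vector standing in for a 3D shape code. This is an illustrative assumption, not the paper's implementation: the noise schedule is invented, and `predict_noise` is an oracle placeholder where a real system would use a trained network conditioned on the text prompt and hierarchical latents.

```python
import numpy as np

# Toy sketch of DDM iterative refinement (assumed setup, not the
# article's actual model). A real epsilon-predictor would be a neural
# network conditioned on the prompt; here it is an oracle placeholder.

rng = np.random.default_rng(0)
T = 50                                  # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.05, T)      # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

target = np.sin(np.linspace(0, 2 * np.pi, 64))  # toy "shape" latent

def predict_noise(x_t, t):
    # Placeholder: returns the noise implied by the closed-form forward
    # process toward `target`; a trained model estimates this instead.
    return (x_t - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

# Start from pure Gaussian noise and refine step by step (DDPM reverse update).
x = rng.standard_normal(64)
for t in reversed(range(T)):
    eps = predict_noise(x, t)
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:  # no noise is added at the final step
        x = x + np.sqrt(betas[t]) * rng.standard_normal(64)

mse = float(np.mean((x - target) ** 2))
print(mse)  # near zero: refinement recovers the target latent
```

Each pass removes a predicted portion of the noise, which is why diffusion sampling is described as iterative refinement: the sample moves from pure noise toward a coherent shape over many small steps.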