Representation Matters in AI-Generated Images
Results from artificial intelligence image generators can range from appropriate to downright offensive, particularly for cultures that aren't well represented in the internet's data.
An international team led from Carnegie Mellon University used the Bridges-2 system and input from several different cultures to develop an effective fine-tuning approach, Self-Contrastive Fine-Tuning (SCoFT), for retraining a popular image generator so that it can generate equitable images for underrepresented cultures.
A research team led by Jean Oh, associate research professor at Carnegie Mellon University's Robotics Institute, is working on how to make generative AI models aware of the diversity of people and cultures.
"We wanted to use visual representation as a universal way of communication between people around the world," said Oh. "We started generating images about Korea, China and Nigeria. We immediately observed that the popular foundation models are clueless about the world outside the U.S. If we redraw the world map based on what these models know, it will be pretty skewed."
Toward this goal, her team developed a novel fine-tuning approach and, thanks to an allocation from the NSF's ACCESS program, used PSC's Bridges-2 supercomputer to train new models and run sets of experiments verifying the performance of the proposed approach.
Bridges-2 enhances AI image generation
At one point, scientists developing the AI approaches underlying image generation thought that more available data would generate better results. Models trained on the internet, though, didn't quite turn out that way.
Deep-learning AIs learn by brute force, beginning by making random guesses on a training dataset in which humans have labeled the right answers. As the computer makes good or bad guesses, it uses these labels to correct itself, eventually becoming accurate enough to test on data for which it isn't given the answers. For the task of generating images from text prompts, an AI tool called Stable Diffusion is an example of the state of the art, having been trained on the 5.85-billion text-image-pair LAION dataset.
But ask Stable Diffusion for a picture of a modern street in Ibadan, Nigeria, and it creates something that looks more like a Westerner's negative stereotype. Other images may be less obviously offensive. In some ways that's worse, because subtler bias is harder to identify.
To improve on this, the RI team recruited people from five cultures to curate a small, culturally relevant dataset. Although this Cross-Cultural Understanding Benchmark (CCUB) dataset averaged only about 140 text-to-image pairs per culture, it allowed the team to retrain Stable Diffusion so that it generates images portraying each culture more accurately and with less stereotyping than the baseline model. The team also tested the same fine-tuning step with the popular GPT-3 AI.
Bridges-2 proved ideal for the work. PSC's flagship system offers powerful image- and pattern-recognition-friendly graphics processing units (GPUs), and an architecture designed to help large data move efficiently through the computer without logjams. This enabled the scientists to fine-tune the AI in progressive steps that significantly improved the impressions that 51 people from five recipient cultures had of the resulting images. Their SCoFT method improved the judges' perception of how well each image matched the text query and represented their culture, and reduced the images' offensiveness.
The team will present the work at the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) in June.
The Pittsburgh Supercomputing Center is a joint computational research center of Carnegie Mellon University and the University of Pittsburgh. PSC provides university, government and industrial researchers with access to several of the most powerful systems for high-performance computing, communications, and data storage available to scientists and engineers nationwide for unclassified research.