
Alibaba’s Qwen team released a new image generation artificial intelligence (AI) model last week. Dubbed Qwen VLo, it is a successor to the Qwen 2.5 vision language model and comes with several upgrades over the older models. The latest AI image model supports both text-to-image and image-to-image generation. It also accepts text input in multiple languages, including English and Chinese. Apart from image generation, the AI model can also make inline edits to generated images as well as input images.
Qwen VLo Accepts Prompts in Multiple Languages
In a post on X (formerly known as Twitter), the official handle of the Qwen team announced the release of the new model. The model’s technical name is Qwen3-235B-A22B, and it is available for free on the company’s chat interface. Users can also use the model without logging in.
Gadgets 360 staff members tested the AI model and found its image generation capability to be on par with Google’s Imagen 2. Its instruction following and image output quality are slightly lower than those of Imagen 3 and OpenAI’s GPT-4o-powered image generation feature. However, it generates images faster than both of them and has a higher rate limit.
On its GitHub page, the company said that Qwen VLo comes with improved image understanding, which allows it to make better inline edits without distorting the structural integrity of the input image. This also improves the overall quality of the output. The model also better understands vague and open-ended prompts, and can generate images that are aligned with user expectations.
Apart from image generation and editing, Qwen VLo can also perform image annotation-related tasks such as edge detection, segmentation, prediction mapping, and more. The company said a future version of the model will also be able to accept multiple input images and combine them based on user requests.
Text rendering has also been improved with the latest AI image generator. We were able to generate accurate text across different fonts in our testing of the model. Finally, Qwen VLo also supports input images with dynamic aspect ratios, including extreme ratios such as 4:1 and 1:3. The company plans to add the ability to generate images in different aspect ratios soon.