Alibaba Releases Two New AI Models for Image Interpretation
The rapid evolution of A.I. tools, like Alibaba's image-reading models, underscores the need to harness A.I.'s capabilities responsibly. By releasing these models as open source, Alibaba lets users customize the tools for app development or research.
Summary: Alibaba's New AI Models
- Alibaba, the Chinese tech giant, unveiled two new open-source A.I. models: Qwen-VL and Qwen-VL-Chat.
- Unlike text-based models such as ChatGPT and Google Bard, these are vision-language models that interpret images.
- Qwen-VL-Chat's capabilities:
  - Provides directions by analyzing street signs.
  - Solves math problems from a photo.
  - Constructs narratives from multiple images.
  - Translates signs from Mandarin to English.
  - Assists in captioning photos for news agencies.
- Qwen-VL is an enhanced version of Alibaba's existing image-reading chatbot, now supporting higher-resolution images.
- Alibaba's announcement was limited to the public release; the company offered no additional comment to Fortune.
Applications and Competition
- Alibaba's image-scanning tech can aid visually impaired individuals, e.g., scanning product labels and reading them aloud.
- The models will be accessible on Alibaba Cloud's ModelScope and on Hugging Face, a startup known for its library of A.I. models.
- A day earlier, Meta introduced an A.I. model for coding, built on its open-source Llama 2 model released in July.
- Alibaba has been racing to match Meta's A.I. advancements; it recently launched Qwen-7B and Qwen-7B-Chat, which serve as the foundation for the latest releases.
- In a collaboration, Meta's Llama 2 model became available to the Chinese market via Alibaba’s cloud division in July.