OpenAI and the robotics startup Figure AI have released a video demonstrating the capabilities of a new vision-language model.
Figure 01 uses "speech-to-speech" reasoning, powered by OpenAI's multimodal vision-language model (VLM), to understand both images and text.
The robot crafts its responses from an entire voice conversation, rather than from the written prompts used in other OpenAI technology, according to a release from the company.
In the video, the robot speaks with a natural-sounding voice, and Figure says its neural network will continue to refine the robot's behavior as it interacts with more users.