Very cool demos, OpenAI, and huge props for doing them live. What a great time to be in the arena. My prediction: We see all these capabilities in open source in 60 days or less. Most of the functionality already exists in bits and pieces...
Interestingly, they’re comparing against the “in-progress” scores of Llama 3 400B and are barely scoring beyond them.
Things are moving quickly
Looking over this link, it says:
- Uses static images of a video and describes them as a whole.
- The 128k context window allows the static images of a video to be described in a single API call.
- Not all frames need to be sent.
- TTS is generated from the frame descriptions plus a system prompt, returning an MP3.
… I feel like all of this tech already exists with LLaVA, a TTS engine, and regular Turbo 3.5 or any other open-source model. https://cookbook.openai.com/examples/gpt_with_vision_for_video_understanding
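The "sample frames, describe them all in one call" step from the cookbook can be sketched without any network calls. This is a minimal, hypothetical sketch (the function name, stride default, and model string are illustrative assumptions, not the cookbook's exact code): it samples every Nth frame and packs the survivors into a single chat-completion payload, which is the part the 128k context window makes possible.

```python
import base64


def build_video_payload(frames, stride=10, prompt="Describe this video."):
    """Build ONE chat-completion request body from sampled video frames.

    `frames` is a list of raw JPEG bytes (e.g. extracted with OpenCV).
    Only every `stride`-th frame is included -- per the cookbook, not all
    frames need to be sent for the model to describe the video as a whole.
    """
    sampled = frames[::stride]  # drop most frames; keep a sparse sample
    content = [{"type": "text", "text": prompt}]
    for jpg in sampled:
        b64 = base64.b64encode(jpg).decode("utf-8")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    # The whole sampled video fits in a single user message.
    return {
        "model": "gpt-4-turbo",  # illustrative; any vision-capable model
        "messages": [{"role": "user", "content": content}],
    }
```

The returned dict is what you would pass to a chat-completions endpoint; the resulting text description could then be fed to any TTS engine to get the MP3 step.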
Bits and pieces are exactly the problem in open source and early-stage tech.
Would have loved to see them do something to keep the competition away. You can decide to ignore competition, but consumers are not ignoring Perplexity, Claude, and now Llama via WhatsApp :)
I bet it’s less than 6 weeks ;)
Couldn't agree more. 60 days or less. Now let that sink in.
COO / President (Cloud, Sales/GTM, Supply Chain, and Operations) @ Groq
More cool demos here: https://www.youtube.com/@OpenAI/videos