Our CEO Mokshith Voodarla recently chatted with Steven Vonder Haar from IntelliVid Research about how Sieve is building a new set of primitives to understand, manipulate and create video. Today this is being applied by developers primarily in social and creative use cases, but we see these same primitives becoming key tools in powering the future of VR, gaming, and robotics as well. Watch the original video below or on YouTube: https://lnkd.in/egCH8X-t
Transcript
Steven: Hello and welcome to this edition of Intelligent Video Today. Joining us on today's episode: Mokshith Voodarla, CEO over at Sieve. Welcome, Mokshith.

Mokshith: Hey Steve, how are you doing?

Steven: Very good. We appreciate you joining the show today. Could you tell us a little bit about Sieve? How long have you guys been around, and what problems are you trying to solve in the AI video space?

Mokshith: Sure. So we're a little over two and a half years old as a company, and the best way to think about what we do is as a set of developer primitives for understanding, manipulating, and creating video. A lot of the time it's developers hitting our APIs for a variety of use cases, some of the popular ones being things like dubbing, background removal, or content moderation. But over time, a lot of people are also combining pipelines and running custom workloads on Sieve as well.

Steven: A lot of companies are in the business of developing video platforms, and these are the types of tools they'd be developing on their own. Why would they hire or work with somebody like you to do the work that they're supposed to be doing?

Mokshith: Yeah, sure. A lot of video platforms are focused on a specific thing, right? You might have content platforms, or some of these new AI video avatar platforms, whatever it is. But there are a lot of more table-stakes, lower-level components to building those that people might not want to build from scratch. For example, building really, really good background removal is a hard problem, and it's not something you might want to solve yourself, especially if it's not your core value. So it's very much about what your core value is, and which problems you want to build and solve internally versus work with a vendor on.

Steven: So tell us a little bit about the problems Sieve can help vendors address. What challenges can you help those vendors tackle on an ongoing basis?
Mokshith: Yeah, sure. A lot of the workloads we help people run are AI workloads, right? These are different ways people are creating, manipulating, or understanding video with AI models. What you'll notice working on these problems is that the valuable problem to solve is rarely solved by a single model. Let's take a use case like translating videos into different languages. Doing that really, really well involves using a great text-to-speech model, and getting the translations right too. But then there's also all this legwork in making sure the timings are exactly right. You might stylize it in different ways, and there's this speed-up and slow-down problem based on the length of the translation. So there's a lot of nuance that ends up existing when you're putting a lot of models and components together, which ends up being the case for a lot of the problems you're solving. Not to mention that running these workloads is also expensive, so optimizing them and running them at scale is another problem to solve.

Steven: Efficiency can really translate into cost savings for software developers in a big way. So are there other issues you've been seeing pop up that Sieve can play a role in?

Mokshith: Yes. Infrastructure cost, as I mentioned, is one, right? But one thing that's really unique about the AI market today is that there's a ton of new models and a ton of new vendors coming out every day with new things. Over the last year and a half or two, you've seen a lot of vendor sprawl, with companies having to work with dozens of vendors, which comes with contract negotiations and obviously data security concerns. So one problem we've actually helped a lot of people solve is consolidation, because we offer a lot of pipelines in a single experience.
People don't have to manage relationships with ten different vendors for every single new use case they're thinking about. It's one product and platform they can work with.

Steven: The thing that drew my attention to Sieve recently was that you announced a tool, or a pipeline, for handling green-screen backgrounds. Tell us a little bit about how that works.

Mokshith: Sure. A lot of the focus for us over the last few months has actually been on solving what we think of as more table-stakes use cases that a lot of AI video platforms know they need. There might be solutions for them out there already, but they're problems we think we can solve better. With background removal specifically, we noticed that there was no developer-friendly, super-high-quality solution when it came to video. There are a lot of new models that have come out recently that are great at segmentation, but they require you to pass in points or draw boxes that can then be used to remove backgrounds. We've built a layer on top that can auto-prompt these models to remove backgrounds automatically, in a much higher-quality manner, and obviously with the Sieve developer experience as well, so something that's really friendly for developers to use.

Steven: Some software vendors would come and look at Sieve and think of you as a new type of video infrastructure of a sort. Are you comfortable with that type of perception or description of what Sieve is doing?

Mokshith: Yeah. I think in some sense it is a new type of infrastructure that isn't exactly compute alone, right? Traditionally you might look at AWS and Google Cloud as infrastructure providers. In our own way, we're building a layer on top that's much more focused on video and visual as a data type, and giving developers these pipelines, which in their own sense are infrastructure.
Just a slightly higher level of infrastructure that they can use to solve problems.

Steven: So if we're taking a look at the marketplace from an infrastructure perspective, look into your crystal ball. What are some of the big themes we're going to be dealing with from a video infrastructure perspective, as it relates to AI video, over the next three to five years? Look over the horizon a little bit for me.

Mokshith: Sure. So actually, recently we've started working with a lot of companies that train their own models for certain applications, and a lot of what these companies are focused on is much more granular control over the generated or manipulated videos. For example, you see a lot of the new avatars that are coming out able to be emotive now, right? Because that's something people care a lot about. Or with complete text-to-video generation, there's much more interest in styles or specific themes. So I think there's this one theme around control that is super important, and I think that's a big problem when it comes to AI in these workloads. Same thing with text-to-speech, where the models have gotten really realistic. But let's say you want to control the exact emotion of how something sounds, or give it an exact accent. That's still kind of an unsolved problem. Some of the newer models are starting to do that, but it's not solved yet. So I think that sense of controllability is one big problem we need to solve in this space. I think another thing, and this actually comes from some of the people training models using our platform as well, to clean, curate, and collect datasets: when you're working with such large datasets, and video datasets tend to just be huge, right?
There are a lot of problems around how you clean and curate that data efficiently, depending on which models you're training. And this comes with a lot of people wanting to run the pipelines we have available on Sieve on entire datasets. How do you do that efficiently? How do you make sure you're not paying a bunch of egress? So there are a lot of these problems around the sheer scale at which you need to collect, clean, and curate data to train good video models.

Steven: Yeah, it's really the two ends of the spectrum, isn't it? AI video at scale and AI video with nuance. You kind of have to thread the needle in both of those instances, and applications or APIs like those offered by Sieve hopefully help vendors do that a little more effectively. I guess that's the overall objective for Sieve over the long haul, right?

Mokshith: Yep, exactly. I think today a lot of the applications we're seeing of our pipelines are in what I think of as social and creative video, but over time you're going to see that in self-driving, robotics, VR, gaming, right? There are all these other industries this is going to touch, but these problems stay the same regardless.

Steven: Some great times ahead for Sieve. I'm going to be watching your progress with great interest. Thanks so much for taking the time to chat with us today.

Mokshith: Thanks, Steve.

Steven: And we thank you for watching today's episode. If you want access to future interviews with industry thought leaders like Mokshith Voodarla of Sieve, just go to the YouTube link right below, subscribe to the Intelligent Video Today channel, and you'll get notifications of future interviews in the Intelligent Video Today series. For IntelliVid Research and Intelligent Video Today, I'm Steven Vonder Haar. Thanks for your time.
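The speed-up/slow-down problem Mokshith mentions in the dubbing discussion can be sketched in a few lines of plain Python: when a translated line runs longer or shorter than the original speech segment, the dubbed audio's playback rate has to be adjusted to keep timings aligned, within limits that keep the voice natural. This is an illustrative sketch, not Sieve's implementation; the function name and the clamp range are assumptions.

```python
def stretch_factor(original_dur: float, dubbed_dur: float,
                   min_rate: float = 0.8, max_rate: float = 1.25) -> float:
    """Playback-rate multiplier that fits dubbed audio into the original
    segment's duration, clamped so speech still sounds natural.

    A rate > 1 speeds the dubbed audio up; < 1 slows it down.
    """
    if original_dur <= 0:
        raise ValueError("original segment must have positive duration")
    rate = dubbed_dur / original_dur  # >1 means the translation ran long
    return max(min_rate, min(max_rate, rate))

# A 4-second line whose translation runs 5 seconds must play back
# 1.25x faster to land on the original segment boundaries.
print(stretch_factor(4.0, 5.0))  # 1.25
```

In practice a dubbing pipeline would apply this factor with a time-stretching audio filter (and fall back to re-timing subtitles or re-generating a shorter translation when the required rate hits the clamp).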
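The background-removal workflow described in the interview splits into two steps: a segmentation model (prompted automatically with points or boxes) produces a per-frame foreground mask, and that mask is then composited into the frame's alpha channel. The auto-prompting is the hard, model-specific part and isn't shown here; this minimal NumPy sketch only illustrates the final compositing step, assuming a mask has already been produced, and `remove_background` is a hypothetical helper name.

```python
import numpy as np

def remove_background(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Combine an H x W x 3 RGB frame with a boolean foreground mask
    into an H x W x 4 RGBA frame whose background pixels are transparent."""
    if frame.shape[:2] != mask.shape:
        raise ValueError("mask must match frame dimensions")
    # Foreground pixels get alpha 255; background pixels get alpha 0.
    alpha = (mask.astype(np.uint8) * 255)[..., None]
    return np.concatenate([frame, alpha], axis=-1)
```

A real video pipeline would run this per frame (ideally with temporally smoothed masks to avoid flicker) and encode the result to a format with an alpha channel.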