"Diffuse to Choose": a novel not so complex yet innovative and efficient double U-Net 2D perspective-aware diffusion model for a virtual try-on try-all. Great news: performs fast "inpainting" with fascinating level of detail on the cellphone-taken images - fast and on pair with the more complex models requiring the multiple images of the object! https://lnkd.in/emTZvix2
Leo Plese’s Post
More Relevant Posts
-
Engineer at AIMonk Labs || Crafting Stable AI Products and Enhancing Software Aesthetics || Enthusiastic about Robotics and Cutting-Edge AI Developments || Sharing the Hottest Trends in Artificial Intelligence.
🚨I3D 2024 Paper Alert 🚨
➡️Paper Title: FaceFolds: Meshed Radiance Manifolds for Efficient Volumetric Rendering of Dynamic Faces
🌟Few pointers from the paper
🗿In this paper the authors present a novel representation that enables high-quality volumetric rendering of an actor's dynamic facial performances with a minimal compute and memory footprint.
🗿It runs natively on commodity graphics software and hardware, and allows for a graceful trade-off between quality and efficiency.
🗿Their method utilizes recent advances in neural rendering, particularly learning discrete radiance manifolds to sparsely sample the scene and model volumetric effects.
🗿They achieve efficient modeling by learning a single set of manifolds for the entire dynamic sequence, while implicitly modeling appearance changes as a temporal canonical texture.
🗿They export a single layered mesh and a view-independent RGBA texture video that is compatible with legacy graphics renderers without additional ML integration.
🗿The authors demonstrate their method by rendering dynamic face captures of real actors in a game engine, at photorealism comparable to state-of-the-art neural rendering techniques and at previously unseen frame rates.
🏢Organization: Google, Massachusetts Institute of Technology, ETH Zürich
🧙Paper Authors: Safa C. Medin, Gengyan Li, Ruofei Du, Stephan Garbin, Philip Davidson, Gregory W. Wornell, Thabo Beeler, Abhimitra Meka
1️⃣Read the Full Paper here: https://lnkd.in/g7fy4Z2X
2️⃣Project Page: https://lnkd.in/g5bKrx_8
🎥 Be sure to watch the attached video (sound on 🔊🔊)
Find this Valuable 💎? ♻️REPOST and teach your network something new.
Follow me 👣, Naveen Manwani, for the latest updates on Tech and AI-related news, insightful research papers, and exciting announcements.
#I3D2024 #ar #vr #Ml #rendering #FaceModeling
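The exported layered mesh plus RGBA texture video is consumed by ordinary renderers as simple front-to-back alpha compositing of the manifold layers. As a conceptual illustration only (not the paper's code), here is what that compositing step looks like; the layer data is random placeholder values.

```python
# Conceptual front-to-back "over" compositing of layered RGBA samples,
# the operation a legacy renderer performs on FaceFolds-style layered output.
# Placeholder data only; not code from the paper.
import numpy as np

H, W, LAYERS = 64, 64, 8
rng = np.random.default_rng(0)
rgb = rng.random((LAYERS, H, W, 3))      # per-layer color, front layer first
alpha = rng.random((LAYERS, H, W, 1))    # per-layer opacity in [0, 1]

out = np.zeros((H, W, 3))
transmittance = np.ones((H, W, 1))       # how much light still passes through
for i in range(LAYERS):                  # accumulate from front to back
    out += transmittance * alpha[i] * rgb[i]
    transmittance *= (1.0 - alpha[i])

print(out.shape, float(out.min()), float(out.max()))
```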
-
Unleashing the Power of AI for User-friendly Login Experiences
Discover how AI can revolutionize the login process by understanding layout, graphics, and user actions. Learn how LLMs can reason about a page and determine the order of actions, such as clicking buttons and filling in fields, to enhance user experience.
#AIforUserExperience #RevolutionizingLoginProcess #UserFriendlyInterfaces #LayoutUnderstanding #LLMsforUserActions #EnhancingUserExperience #NextGenLogin #GraphicsRecognition #StreamliningLoginForm #OptimizingUserJourney
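As a hedged illustration of the idea (the model name, prompt, and layout schema are my own assumptions, not from the post): one way to do this is to ask an LLM to turn a description of the login form's layout into an ordered action plan that a UI driver could execute.

```python
# Hypothetical sketch: asking an LLM to order UI actions for a login form.
# Requires the `openai` package and an OPENAI_API_KEY; the model name is an assumption.
import json
from openai import OpenAI

layout = {
    "elements": [
        {"id": "user", "type": "text_field", "label": "Email"},
        {"id": "pass", "type": "password_field", "label": "Password"},
        {"id": "remember", "type": "checkbox", "label": "Remember me"},
        {"id": "submit", "type": "button", "label": "Sign in"},
    ]
}

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": "Return a JSON list of actions (fill/click) in execution order."},
        {"role": "user",
         "content": f"Log in with [email protected] / placeholder-password. Layout: {json.dumps(layout)}"},
    ],
)
plan = resp.choices[0].message.content  # e.g. fill user -> fill pass -> click submit
print(plan)
```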
-
Motion and face tracking produce vast amounts of data. In high-end productions, a single day's worth of mocap footage can generate terabytes of data, which is processed and analyzed by proprietary software.
Processing Power: Studios require massive server farms and specialized GPU-based hardware to process and render the vast amount of motion capture data captured during production. Films like Avatar and Avengers required supercomputers to render the high-fidelity visual effects.
Facial Markers: Face tracking often uses 150+ markers placed on key facial points (muscles around the mouth, eyes, forehead, etc.) to capture detailed expressions and emotions.
Dots for Performance Capture: In films like The Hobbit and Planet of the Apes, actors wore face-tracking rigs with around 60 to 80 reflective dots. In Avatar, actors like Sam Worthington had 150 facial markers.
Facial Capture Systems: Systems like ILM's Medusa, Weta Digital's FACETS, or Faceware Technologies capture highly detailed facial expressions using either marker-based or markerless setups. These technologies have been used to create characters like Thanos in Avengers and Caesar in Planet of the Apes.
Depth-Sensing Cameras: Some systems use depth-sensing technology (like Microsoft's Kinect) to track the 3D positions of an actor's face in real time without physical markers.
_
#designfacts #filmindustry #cgi #vfx #motiondesign #motioncapture #facetracking #designstudios #creativmedium #swiss #animationstudio #designinsights #designers #marketingagency #marketing #dailyfacts #unrealengine #iml #ai #facialcapture #supercomputers #datafarms #renderfarms
_
creativ medium is a multidisciplinary design studio and creative agency in Zug, Switzerland. www.creativ-medium.com Instagram: https://lnkd.in/dN4A7CfF
_
-
New day, new AI breakthrough. Over the last few months I have been surprised multiple times by the rate of improvement in image/video generation, both in what you can do and in the quality of the results. Exciting times! Big credits to the authors (the link contains more videos!): https://lnkd.in/e8YCXunv
Alibaba presents MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
Character video synthesis aims to produce realistic videos of animatable characters within lifelike scenes. As a fundamental problem in the computer vision and graphics community, 3D works typically require multi-view captures for per-case training, which severely limits their applicability of modeling arbitrary characters in a short time. Recent 2D methods break this limitation via pre-trained diffusion models, but they struggle for pose generality and scene interaction. To this end, we propose MIMO, a novel framework which can not only synthesize character videos with controllable attributes (i.e., character, motion and scene) provided by simple user inputs, but also simultaneously achieve advanced scalability to arbitrary characters, generality to novel 3D motions, and applicability to interactive real-world scenes in a unified framework. The core idea is to encode the 2D video to compact spatial codes, considering the inherent 3D nature of video occurrence. Concretely, we lift the 2D frame pixels into 3D using monocular depth estimators, and decompose the video clip to three spatial components (i.e., main human, underlying scene, and floating occlusion) in hierarchical layers based on the 3D depth. These components are further encoded to canonical identity code, structured motion code and full scene code, which are utilized as control signals of synthesis process. The design of spatial decomposed modeling enables flexible user control, complex motion expression, as well as 3D-aware synthesis for scene interactions. Experimental results demonstrate effectiveness and robustness of the proposed method.
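The decomposition step the abstract describes (lift pixels to 3D with monocular depth, then split a frame into occlusion / human / scene layers) can be pictured with a simple depth-and-mask thresholding sketch. This is my own conceptual illustration with placeholder inputs, not MIMO's actual pipeline, which learns these components and encodes them into latent codes.

```python
# Conceptual sketch of depth-based spatial decomposition (NOT MIMO's real code).
# Inputs are placeholders: `depth` would come from a monocular depth estimator
# and `human_mask` from a person-segmentation model.
import numpy as np

H, W = 120, 160
rng = np.random.default_rng(1)
frame = rng.random((H, W, 3))            # RGB frame in [0, 1]
depth = rng.random((H, W))               # relative depth, larger = farther
human_mask = np.zeros((H, W), dtype=bool)
human_mask[30:90, 60:100] = True         # pretend person detection

# Anything clearly in front of the person counts as floating occlusion.
person_depth = np.median(depth[human_mask])
occlusion_mask = (~human_mask) & (depth < person_depth - 0.2)
scene_mask = ~(human_mask | occlusion_mask)

layers = {
    "occlusion": frame * occlusion_mask[..., None],
    "human": frame * human_mask[..., None],
    "scene": frame * scene_mask[..., None],
}
for name, layer in layers.items():
    print(name, f"{layer.any(axis=-1).mean():.1%} of pixels")
```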
-
More video-based mocap solutions on the horizon make me wonder where they will be in five years in terms of fidelity and real-time capture/turnaround compared to multi-thousand-dollar suits.
Alibaba presents MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling (shared paper post; abstract quoted above).
-
Enabling digital services for Student Loan-related activities while maintaining the highest security standards, the most compliant personal data protection, and customer-centric, data-driven innovation.
🚀 Just published a fascinating blog post on "Choreographing the Digital Canvas: A Machine Learning Approach to Artistic Performance." The paper introduces a novel design tool for artistic performances, integrating a cutting-edge machine-learning (ML) model with an interactive interface to generate and visualize artistic movements. This groundbreaking approach utilizes a cyclic Attribute-Conditioned Variational Autoencoder (AC-VAE) model, specifically developed to capture and generate realistic 3D human body motions. The platform also offers a unique dataset and web-based interface, empowering artists with fine-grained control over motion attributes. Explore the future of technology in artistic expression: https://bit.ly/3W1d0t5 #ArtTech #MachineLearning #CreativeExpression
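For readers curious what "attribute-conditioned VAE" means in practice, here is a minimal PyTorch sketch of the general pattern (condition both encoder and decoder on an attribute vector). The dimensions and architecture are assumptions of mine for illustration, not the paper's actual cyclic AC-VAE.

```python
# Minimal attribute-conditioned VAE sketch (generic pattern, not the paper's model).
# `pose` is a flattened 3D joint vector; `attrs` could encode style/tempo attributes.
import torch
import torch.nn as nn

class AttrCondVAE(nn.Module):
    def __init__(self, pose_dim=72, attr_dim=8, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(pose_dim + attr_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + attr_dim, 256), nn.ReLU(), nn.Linear(256, pose_dim)
        )

    def forward(self, pose, attrs):
        h = self.enc(torch.cat([pose, attrs], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        recon = self.dec(torch.cat([z, attrs], dim=-1))
        return recon, mu, logvar

model = AttrCondVAE()
pose = torch.randn(4, 72)      # batch of flattened body poses
attrs = torch.randn(4, 8)      # batch of motion attributes
recon, mu, logvar = model(pose, attrs)
# Training would minimize reconstruction loss plus KL(mu, logvar) against N(0, I).
print(recon.shape)
```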
-
Meta-Species is an exercise in utilizing and responding to AI image synthesis tools. The intention was to explore how this new frontier of image-making technology results in the visually novel and how it can inform and direct abstract narrative structures.
Concept, Creative Direction, 3D: Tomorrow Bureau
Graphics and 2D Motion Design: Léo Imbert
Sound Design: Tomorrow Bureau + AI
-
HRDC Accredited Trainer | TVET | ACLP | AI Trainer | Helping Corporates & Business Digital Transformation through AI Upskilling, Data Analytics, Process Automation & Content Marketing
Generative AI "Video to Video" is picking up steam, and you know it's serious business when Alibaba is getting into the game with MIMO. Unlike Runway Gen-3, which works 2D to 2D, MIMO attempts to further enhance the output by introducing a 3D construct within the AI model itself. In separate news, director James Cameron is joining Stability AI's board of directors, signaling an interesting shift in how AI is perceived for creative video work. It shows there is something for everyone in the creative field, depending on the level of AI tools you choose to introduce into your workflow. #VideoToVideo #GenerativeAI #machinelearning
Alibaba presents MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling (shared paper post; abstract quoted above).
-
Is this a possible approach for storytelling in sports exhibitions?
Before, capturing human movement with such precision often required tedious 3D modelling and animation rigging, which was hardly ever this detailed and fluid, limiting how we could tell historical stories in sports exhibition content due to time, budget, or lack of specialized skills. But the possibilities are evolving now. As seen in a brilliant post by Ahsen Khaliq, complex athletic movements can be portrayed without intense production hassle, shifting our focus from the tech to the story itself.
When I engage with innovative tech, I always try to find a relatable bridge that can enhance exhibition experiences. How can we design to enrich our content and, in turn, enhance our audiences' experience? This innovative movement concept offers a powerful answer.
When I think back to my exhibition projects working together with the FIFA Museum or the German Football Museum, our storytelling techniques could have truly benefited from this in terms of animating legendary soccer players explaining their iconic goals, bringing history to life interactively and creating that special emotional connection to epic historical moments in time.
In my opinion, this technology has massive potential to transform storytelling in exhibitions, making history feel alive and more engaging for audiences. It also opens doors for smaller-budget exhibitions to adopt innovative storytelling methods that captivate museum or event audiences.
How could this apply in your field or transform the way you tell stories? I'd genuinely love to know your take on this and am very curious to witness how much this technology will grow in the coming months! Exciting times! :)
Alibaba presents MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling (shared paper post; abstract quoted above).
-
#FluxAI #Midjourney
FLUX AI, the Midjourney killer? That was the common theme echoed this week (Aug 9, 2024) on the thumbnails of several YouTube channels I follow. I decided to find out, and the answer depends on the type of AI images you create.
Though I'm fortunate to own a PC with a 4090 graphics card capable of running the Flux AI Dev model, I was reluctant to download the large files and struggle with installation, updating ComfyUI, etc. I discovered it's far easier, and very affordable, to run this model on replicate.com for only 3 cents per image. In my tests tonight, I generated 63 images in about 1-1/2 hours, for $1.89.
I realized Midjourney is currently superior for artistic images, including oil paintings/watercolors, surreal images, and beautiful architecture/buildings. Flux AI seems better for photographs of people, portraits, adhering to the prompt, and putting text on images; buildings look like mundane photos. However, with image prompts you can achieve results on par with or better than Midjourney. The final 2 images in my samples are examples where I think Flux was much better than Midjourney.
ControlNets and LoRAs will soon be available, which will give a much greater degree of control with FLUX than is possible with Midjourney.
I've attached some images for comparison. The first of each similar group of images is Midjourney; the later ones are Flux AI. Let me know your thoughts...
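For anyone who wants to try the same route, here's a minimal sketch of calling a FLUX model through Replicate's Python client. The model slug, input fields, and output format are assumptions on my part; check replicate.com for current model names and per-image pricing.

```python
# Minimal sketch of running a FLUX model on Replicate instead of locally.
# Requires `pip install replicate` and a REPLICATE_API_TOKEN environment variable;
# the model slug and input fields below are assumptions; verify them on replicate.com.
import replicate

output = replicate.run(
    "black-forest-labs/flux-dev",
    input={
        "prompt": "studio portrait of an elderly fisherman, dramatic rim lighting",
        "aspect_ratio": "1:1",
    },
)
print(output)  # typically one or more URLs/files for the generated image(s)
```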