Vaibhav Srivastav’s Post

GPU poor @ Hugging Face

6mo

Smol VLMs ftw! Microsoft just dropped Florence - SOTA 200M & 800M parameter vision foundation model! 🔥
> Best part MIT Licensed!
> 200M checkpoint beats Flamingo 80B (400x bigger model) by a huge margin
> Performs captioning, object detection and segmentation, OCR, phrase grounding and more
> Leverages FLD-5B dataset - 5.4 billion annotations across 126 million images
> Multi-task learning
> Finetuned model checkpoints beat the likes of PaLI, PaLI-X
Thanks, and kudos to Microsoft for choosing open-source! 🤗 https://lnkd.in/en3xKke5

Florence - a microsoft Collection

huggingface.co
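
A quick back-of-the-envelope check of the figures quoted in the post, as a minimal Python sketch. The parameter counts are the nominal rounded values from the post (200M and 80B), not exact checkpoint sizes, and the constant and function names are made up for illustration.

```python
# Figures as quoted in the post (nominal, rounded values).
FLAMINGO_PARAMS = 80_000_000_000       # "Flamingo 80B"
FLORENCE_SMALL_PARAMS = 200_000_000    # "200M checkpoint"
FLD5B_ANNOTATIONS = 5_400_000_000      # "5.4 billion annotations"
FLD5B_IMAGES = 126_000_000             # "126 million images"


def size_ratio(larger: int, smaller: int) -> float:
    """How many times bigger the larger model is, by parameter count."""
    return larger / smaller


def annotations_per_image(annotations: int, images: int) -> float:
    """Average number of annotations attached to each image."""
    return annotations / images


ratio = size_ratio(FLAMINGO_PARAMS, FLORENCE_SMALL_PARAMS)
density = annotations_per_image(FLD5B_ANNOTATIONS, FLD5B_IMAGES)

print(f"Flamingo 80B is {ratio:.0f}x the size of the 200M checkpoint")
print(f"FLD-5B averages about {density:.1f} annotations per image")
```

The "400x bigger" claim matches the rounded parameter counts, and the dataset figures work out to roughly 43 annotations per image.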

4 Comments

Harri Smått

Disguised sr. sw eng

6mo

There are so many models to try, something new almost every day, and here I am still stuck with StyleGAN from years ago 😂

1 Reaction

Allan M.

Javascript Developer, DeepRL, Prompt Engineering, Model Coercion

6mo

Microsoft is acing it by going open-source. Wonderful!

2 Reactions

Dibson Dibe Gondim

Associate Professor of Pathology at University of Louisville School of Medicine

6mo

V. K. Cody Bumgardner

1 Reaction


More Relevant Posts

Omar Sanseviero

Making Gemini and Gemma go brrr
6mo
Microsoft just silently dropped Florence 2
👀 Vision model that can tackle many vision tasks (captioning, long captioning, detection, region proposal, OCR)
🤏 Small models (200M and 800M) with quality close to models 100x larger
🎉 MIT licensed
Papers and models: https://lnkd.in/dTrA3vH4

Florence - a microsoft Collection

huggingface.co

3 Comments
Debananda Ghosh

Cloud Analytics Business Lead- APJ market | Author
8mo
Microsoft Research introduces VASA, a framework for generating lifelike talking faces of virtual characters with appealing visual affective skills (VAS), given a single static image and a speech audio clip. https://lnkd.in/gz3B8U66

VASA-1 - Microsoft Research

https://www.microsoft.com/en-us/research

1 Comment
Aifinite Learning

2,944 followers
7mo Edited
Learn more about #ComputerVision #DataScience #MachineLearning: PoseNet: Google's real-time machine learning model for human pose estimation, detecting key body parts in images or video frames. MS Transformer: Microsoft's extension of the Transformer architecture to computer vision tasks, leveraging self-attention for image classification, object detection, and semantic segmentation. Ours (Marepo): Without specific context, it's difficult to provide concise details, but "Ours (Marepo)" could refer to a proprietary model or project, potentially developed by a research group named Marepo, focusing on a specific area within machine learning or computer vision.
Venkata Sai Santosh Gajjela

Data Scientist @SpaceMultiMedia |EX-OLX AUTOS|Generative AI | ML Algorithms | Deep Learning | Computer Vision | NLP | Statistical Modeling | Python | TensorFlow | PyTorch | OpenCV
5mo Edited
I'm excited to share my new article on Medium diving deep into the impressive Microsoft Florence 2 model! This tiny titan in computer vision packs a powerful punch, excelling in tasks like image classification, object detection, and even generating captions. Let me know in the comments what you think about Florence 2's potential! #Microsoft #MicrosoftFlorence2 #AI #ComputerVision #MachineLearning

Microsoft Florence-2

link.medium.com
NVIDIA AI

1,150,771 followers
8mo
Technical Deep Dive: Explore how federated learning enabled by NVIDIA FLARE can address data management challenges with easy and scalable integration to enhance the accuracy and robustness of your LLMs. https://nvda.ws/3TxVZUb

Scalable Federated Learning with NVIDIA FLARE for Enhanced LLM Performance | NVIDIA Technical Blog

1 Comment
Muchiu (Henry) Chang, PhD. Cantab (Cambridge, UK)

Consultant in Patent Intelligence and Engineering Management
8mo
Without metadata, NO data can be found. Try this for real: our mind-blowing AI-IP story, based on fact. Are there any AI products you know of that can interpret and answer the following Chinese-English multilingual questions? "Who, in the Ontario province of Canada, has new US patents granted on the nearest Tuesday, when the USPTO releases the newly granted US patents on a weekly basis?" "Who, in the "江蘇" province of China, has new US patents granted on the nearest Tuesday, when the USPTO releases the newly granted US patents on a weekly basis?" With our intellectual property (IP), a copyrighted multilingual metadata, we can answer. Metadata is an enabler. Without metadata, NO data can be found/retrieved, even by the most advanced technologies, like AI, NVIDIA chips, supercomputers, etc. https://lnkd.in/g-aJFnXR Experiment results showed that, with our intellectual property (IP), a copyrighted multilingual metadata, we are doing what AI, like ChatGPT, can't do in data analytics. Our IP can also make your information service UNIQUE in the world. Do you or any of your contacts need our expertise and our IP? We're selling, NOT just talking. Thanks.

Sai Lalitha Chirravuri

Assistant professor in CSE at Vishnu institute of technology|Medium writer| Good knowledge in AR/VR|Innovation Ambassador
2mo Edited
Hello connections, I want to share one more article of mine on 🔉 medium.com. The article is about key metrics for measuring federated learning. Recently I started writing a survey research paper on plant leaf 🌿 disease identification using federated learning with CNNs; this topic is really interesting 🤔🧐. I have started to understand how FL works, and I want to explain it with a simple example. Nowadays everyone uses a smartphone, and every person has unique typing habits. FL predicts what the next typed word is going to be: initially each device has its own local model for prediction, and then all the local models are aggregated to build one global model. Google uses FL to predict the next word. 💫💫 https://lnkd.in/gR9z_9Jm

Performance Measures for Federated Learning Methods

medium.com
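
The aggregation step described in the post (many local models combined into one global model) can be sketched in a few lines of Python. This is a minimal illustration, not the method from the linked article: it assumes each device's model is a flat list of weights and that devices are weighted by how many training examples they hold (FedAvg-style weighting, which the post does not specify). The function name and sample numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    local_weights: Sequence[Sequence[float]],
    num_examples: Sequence[int],
) -> List[float]:
    """Combine per-device weight vectors into one global vector.

    Each device contributes in proportion to its number of training
    examples (an assumption made for this sketch).
    """
    if not local_weights or len(local_weights) != len(num_examples):
        raise ValueError("need one example count per local model")
    total = sum(num_examples)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    dim = len(local_weights[0])
    global_weights = [0.0] * dim
    for weights, n in zip(local_weights, num_examples):
        if len(weights) != dim:
            raise ValueError("all local models must have the same size")
        for j, value in enumerate(weights):
            global_weights[j] += value * n / total
    return global_weights


# Three hypothetical phones, each with a tiny two-parameter local model.
device_models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
device_examples = [10, 30, 60]

global_model = federated_average(device_models, device_examples)
print(global_model)
```

Only the weights leave each device in this scheme; the raw typing data stays local, which is the main appeal of the approach.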

6 Comments
Abdullah Nasrullah

UET CS'27🎓✨ HTML | CSS | Bootstrap | JavaScript | C++ | C# | Window forms Front-End Development | User Interface design University of Engineering and Technology, Lahore
6mo
"Over the moon to announce that I have completed the 'Introduction to Machine Learning' course with Great Learning! 🚀 Excited to explore the limitless opportunities of data science and machine learning. 🌟 #MachineLearning #DataScience #GreatLearningJourney"
Hesham Elkouha, MSc

Senior Engineer @ Microsoft | Azure Infra | GenAI | Windows 365 | Microsoft 365
5mo
#Microsoft introduced Florence-2, a cutting-edge vision foundation model that revolutionizes computer vision and vision-language tasks. Unlike traditional models, Florence-2 excels in understanding complex spatial hierarchies and semantic granularities with simple text prompts

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks - Microsoft Research

https://www.microsoft.com/en-us/research
Nakul Havaldar

Generative AI @Intent HQ| Ex - Abbvie Data Science | Graduate Researcher - AI in marketing @Kelley School of Business
5mo
Today I earned my "Fundamentals of Computer Vision" badge! I'm thrilled to share some exciting experiences with cloud technologies, particularly in building rapid, large-scale data applications. I've had the opportunity to explore Azure Vision Studio, creating some fascinating computer vision use cases: 🔘 Image Captioning: This innovative application aims to enhance customer experience in expansive retail environments. By providing descriptive captions for images, it offers valuable assistance to shoppers navigating large stores. 🔘 Object Detection and Summarization in Videos: Focused on improving efficiency in industrial settings, this application identifies and summarizes objects on factory floors. It's a game-changer for streamlining operations and enhancing safety protocols. These projects showcase the incredible potential of cloud-based AI services in solving real-world challenges. I'm continually amazed by how these technologies can transform various industries and improve everyday experiences. Have you had any similar experiences with cloud technologies or computer vision? I'd love to hear your thoughts or discuss any exciting projects you've encountered!

Fundamentals of Computer Vision

learn.microsoft.com
