On July 29, 2024, Meta announced the release of SAM 2, the next generation of its Segment Anything Model (SAM), supporting real-time object segmentation in both images and videos. The new model builds on the success of its predecessor and is being released under the Apache 2.0 license, promoting open science and accessibility.
Model highlights:
- Unified Model for Images and Videos: SAM 2 offers state-of-the-art, real-time, promptable object segmentation across both images and videos, achieving better accuracy than prior approaches while requiring fewer user interactions.
- Open Source Commitment: Meta is releasing the SAM 2 code and model weights under the Apache 2.0 license, alongside the SA-V dataset, which contains approximately 51,000 real-world videos and over 600,000 spatio-temporal masks.
- Versatile Applications: SAM 2’s zero-shot generalization allows it to segment unseen objects, enabling applications ranging from creative video effects and faster annotation tools to advancements in autonomous vehicles, robotics, and medical research.
- Innovative Use Cases: Examples include creating video effects, segmenting microscopic moving cells for scientific research, tracking animals in drone footage, and aiding medical procedures.
- Technical Innovations:
  - Memory Mechanism: Incorporates a memory encoder, memory bank, and memory attention module to store and recall information across video frames, keeping segmentation accurate as objects move.
  - Handling Ambiguity and Occlusions: SAM 2 generates multiple valid masks for ambiguous prompts and includes an occlusion head that predicts whether the object is visible in the current frame, enhancing robustness.
- SA-V Dataset: The SA-V dataset, the largest video segmentation dataset released to date, was created using an iterative model-in-the-loop approach that significantly accelerated annotation and improved data quality.
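The memory mechanism above can be illustrated with a minimal, self-contained sketch. This is plain NumPy and not the actual SAM 2 implementation: it only shows the idea that each processed frame's features are written to a bounded memory bank, and the current frame cross-attends over that bank before mask prediction. All names, shapes, and the capacity value are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MemoryBank:
    """Toy memory bank: stores flattened per-frame feature tokens."""
    def __init__(self, capacity=6):
        self.capacity = capacity   # keep only the most recent frames
        self.frames = []           # list of (tokens, dim) arrays

    def write(self, frame_tokens):
        self.frames.append(frame_tokens)
        if len(self.frames) > self.capacity:
            self.frames.pop(0)     # drop the oldest memory

    def read(self, query_tokens):
        """Cross-attend current-frame queries over all stored memory tokens."""
        if not self.frames:
            return query_tokens    # first frame: nothing to condition on
        memory = np.concatenate(self.frames, axis=0)          # (M, d)
        d = query_tokens.shape[-1]
        attn = softmax(query_tokens @ memory.T / np.sqrt(d))  # (Q, M)
        return query_tokens + attn @ memory                   # residual update

rng = np.random.default_rng(0)
bank = MemoryBank(capacity=3)
for t in range(5):                       # simulate 5 video frames
    feats = rng.normal(size=(16, 8))     # 16 tokens of dim 8 per frame
    conditioned = bank.read(feats)       # attend over past frames
    bank.write(feats)                    # store this frame for the future
print(len(bank.frames), conditioned.shape)  # 3 (16, 8)
```

The bounded, first-in-first-out bank is what lets a streaming model track objects across arbitrarily long videos at constant memory cost.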
Since its launch, SAM has made a significant impact across various fields. It has inspired new features in Meta’s apps, such as Instagram’s Backdrop and Cutouts, and found applications in science, medicine, and numerous industries. It has streamlined data annotation processes, saving millions of hours and supporting tasks from marine science to medical diagnostics. Mark Zuckerberg highlighted in an open letter that open-source AI can profoundly enhance productivity, creativity, and quality of life. SAM 2 builds on this foundation, promising even greater advancements.
Meta’s release of SAM 2 and the SA-V dataset marks a significant step forward for computer vision. The open-source release is expected to drive further innovation across fields ranging from the creative industries to scientific research.