Last year, Meta had a huge hit with Segment Anything, a machine learning model that could quickly and reliably identify and outline just about anything in an image. The sequel, which CEO Mark Zuckerberg debuted on stage at SIGGRAPH on Monday, takes the model to video, a sign of how quickly the field is moving.
A vision model looks at a picture and picks out its parts: “this is a dog, this is a tree behind the dog” (and hopefully not “this is a tree growing out of a dog”). The process is called segmentation, and while it has been done in one form or another for a long time, it has recently become far faster and more capable, with Segment Anything marking a big step forward.
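For a sense of what prompted segmentation looks like in practice, here is a minimal sketch using Meta’s open-source segment-anything package; the checkpoint filename, image path, and click coordinates are placeholders, not anything specified in the announcement.

```python
import numpy as np
import cv2
from segment_anything import SamPredictor, sam_model_registry

# Load the original Segment Anything model from a downloaded checkpoint
# (placeholder filename; weights are distributed via Meta's repository).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Read an image and hand it to the predictor, which computes an image embedding once.
image = cv2.cvtColor(cv2.imread("dog_and_tree.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with a single foreground click (x, y) on the object of interest;
# the model returns candidate masks outlining the object under the click.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[420, 310]]),
    point_labels=np.array([1]),  # 1 = foreground point
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]
```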
Segment Anything 2 (SA2) is a natural follow-up, since it applies natively to video as well as still images. You could, of course, run the first model on each frame of a video individually, as sketched below, but it’s not the most effective way to work.
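To see why, here is a rough sketch of that frame-by-frame workaround, reusing the predictor from the sketch above (the video path and prompt coordinates are again placeholders): every frame recomputes the image embedding from scratch and reuses the same static click, so nothing links the masks across time or follows the object as it moves. Native video support is what removes that limitation.

```python
import cv2
import numpy as np

# Naive video segmentation with the original single-image model:
# re-run the predictor independently on every frame.
cap = cv2.VideoCapture("reef_clip.mp4")  # placeholder path
per_frame_masks = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # The image embedding is recomputed from scratch for each frame.
    predictor.set_image(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    masks, _, _ = predictor.predict(
        point_coords=np.array([[420, 310]]),  # same prompt every frame; the object may drift away from it
        point_labels=np.array([1]),
        multimask_output=False,
    )
    per_frame_masks.append(masks[0])
cap.release()
```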
“Scientists use this to study things like coral reefs and natural habitats. But being able to do this in video, zero-shot, and tell it what you want is pretty cool,” Zuckerberg said in a conversation with Nvidia CEO Jensen Huang.
Processing video, however, demands significantly more computing power, and the fact that SA2 can run without melting a server farm is a testament to how much more efficient the field has become. It’s still a big model that needs serious hardware to run, but fast, flexible segmentation like this was practically impossible even a year ago.
Like the first model, SA2 will be open and free to use. There’s no word of a hosted version, something AI companies sometimes offer, but there is a free demo you can try.
As you might expect, a model like this takes an enormous amount of data to train, and Meta is also releasing a large, carefully annotated database of 50,000 videos it created for the purpose. The SA2 paper also describes a training dataset of more than 100,000 “internally available” videos; this one is not being made public, and I’ve asked Meta what it is and why it isn’t being released. (Our best guess is that it’s drawn from public Instagram and Facebook profiles.)
Meta has been vying to lead the “open” AI space for a few years now, though, as Zuckerberg noted in the conversation, it has been doing so for far longer with tools like PyTorch. More recently, LLaMa, Segment Anything, and the handful of other models it releases freely have become a relatively accessible bar for AI performance in those areas, though how “open” they really are remains a matter of debate.
Zuckerberg said the openness isn’t entirely out of the goodness of Meta’s heart, though that doesn’t mean its intentions are impure:
“You can’t just build this like a piece of software; you need an ecosystem around it. It almost wouldn’t work that well if we didn’t open source it, right? We’re not doing this out of pure altruism, even though I think it will be good for the ecosystem. We’re doing it because we believe it will make what we’re building the best.”