Post by Chinmay Keshava Lalgudi.
Drone imagery offers an efficient way to gather data on mobile animals. Drones are used for population surveys, creating 3D models of habitat, and even studying how animals move and behave in their environment. While collecting this data is relatively easy, manually annotating it is painstaking and slow: analysing drone imagery often means spending hours in front of a computer annotating footage. In fact, we found that just drawing bounding boxes around sharks can take hours per minute of video – valuable time that would be better spent thinking about interesting scientific questions.
Our team has focused on developing a technique to analyse sharks in Santa Elena Bay, Costa Rica – a critical habitat for the endangered Pacific nurse shark. Using drones has provided us the rare opportunity to study the movement ecology of this understudied species.

Why deep learning models fail in the wild
Many studies use deep learning approaches to analyse drone imagery of animals, but they typically rely on species-specific models trained on data from a single habitat. These models often perform well in the exact setting they were trained on, yet they can be highly sensitive to changes in lighting, contrast, distractors (such as new vegetation or tidal movement), and individual variation. For example, a model trained on nurse shark imagery collected around midday might struggle on imagery from the exact same location collected around sunset.
Indeed, when we first trained our own models, we found exactly this. While we could train models with strong in-domain performance (on data collected in the same area under the same conditions), these specialised models struggled badly out of domain.
These models are also challenging to work with because they must constantly be trained and retrained as conditions change. Every time you want to study a new species, work in a different location, or analyse footage collected under different weather conditions, you essentially need to start from scratch – collect new training data, manually annotate thousands of images, and spend days or weeks training and validating a new model. This creates a major barrier for scientists trying to focus on biological questions. Therefore, we set out to develop a generalizable solution for researchers – something that doesn’t require training and is easy for anyone to use!
The power of foundation models
Powerful systems like GPT (which works with language) and CLIP (which connects images and text) have completely changed how we solve problems using AI. Instead of needing to be retrained for every new task, these models learn from huge, diverse datasets and can often handle new challenges right away — a skill known as “zero-shot” learning.
The foundation model Segment Anything Model 2 (SAM 2) caught our attention because of its video understanding capabilities. Unlike traditional image segmentation models that analyse each frame one-by-one, SAM 2 can use information from earlier frames to keep track of objects as they move through a video.
This temporal awareness is extremely important for biological research – when a shark briefly disappears behind a wave or becomes obscured by water glare, SAM 2 can use its “memory” of where the animal was in previous frames to maintain the track rather than losing it entirely. We found that SAM 2 worked especially well in our challenging coastal marine environment, where animals are always moving through shifting water conditions — with changing reflections, backgrounds, shadows, and surface ripples.
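To build intuition for why this memory matters, here is a minimal, self-contained sketch – not SAM 2's actual mechanism, which attends over learned feature embeddings rather than boxes. It shows a toy tracker that bridges short detection gaps by remembering the last confident box; all names and thresholds are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter) if inter else 0.0

def track_with_memory(detections, max_gap=3, min_iou=0.3):
    """detections: one box (or None, for a missed frame) per frame.
    Returns the frame indices kept in a single continuous track,
    tolerating up to `max_gap` consecutive missed frames – e.g. a
    shark briefly hidden by a wave or surface glare."""
    track, last_box, gap = [], None, 0
    for t, box in enumerate(detections):
        if box is None:
            gap += 1
            if gap > max_gap:
                break  # memory expired: the track is lost
            continue
        if last_box is None or iou(last_box, box) >= min_iou:
            track.append(t)
            last_box, gap = box, 0
    return track
```

A frame-by-frame detector with no memory would end the track at the first missed frame; the memory lets the track survive a two-frame occlusion and reattach when the animal reappears nearby.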
FLAIR
Our new study offers just this: a video processing pipeline called Frame Level AlIgnment and tRacking (FLAIR). FLAIR combines SAM 2's segmentation capabilities with CLIP's ability to classify images. By passing language prompts (e.g. "a shark swimming in clear water") through CLIP, our pipeline guides SAM 2 to focus on the right objects. The key innovation is FLAIR's alignment strategy – if a candidate shark is identified by CLIP and tracked by SAM 2 at multiple timepoints, it is likely a real shark rather than a false positive, such as a shadow or piece of debris.
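The alignment idea can be sketched as a temporal consistency filter. In this toy version the scores are made up, standing in for real CLIP image-text similarities, and the threshold values are purely illustrative:

```python
def align_candidates(scores_per_frame, score_thresh=0.5, min_hits=3):
    """scores_per_frame: dict mapping a candidate object id to its
    per-frame prompt-similarity scores (here mocked; in practice these
    would come from CLIP). A candidate is kept only if it matches the
    prompt at >= min_hits timepoints, suppressing one-off false
    positives such as shadows or floating debris."""
    kept = []
    for cid, scores in scores_per_frame.items():
        hits = sum(s >= score_thresh for s in scores)
        if hits >= min_hits:
            kept.append(cid)
    return kept

candidates = {
    "shark_1": [0.8, 0.7, 0.9, 0.6],  # consistently shark-like
    "shadow":  [0.6, 0.1, 0.2, 0.1],  # one spurious high score
}
print(align_candidates(candidates))  # only "shark_1" survives
```

A single high score is not enough: agreement between the language model and the tracker over time is what separates a real animal from a transient distractor.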
FLAIR dramatically outperforms traditional approaches, especially in challenging real-world conditions. FLAIR also generalizes to several shark species, including white and blacktip reef sharks, as well as other animals entirely, such as zebras!

We conducted a user study to see how much FLAIR really speeds up annotation time. While labelling a typical 5-minute drone video would take more than 20 hours of manual effort, FLAIR completes the entire segmentation process in just under an hour of near-fully automated processing.
We also show it’s possible to calculate biometrics like body length (essential for population demographics) and tail beat frequency (which can reveal energy expenditure and swimming efficiency) from the accurate segmentation masks our system generates. This opens doors to research questions that would otherwise be extremely time-consuming to study.
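As a rough sketch of how such biometrics could be derived from masks – the principal-axis length estimate and FFT-peak frequency estimate below are our illustrative assumptions, not necessarily the paper's exact method:

```python
import numpy as np

def body_length_m(mask, gsd_m_per_px):
    """Estimate body length from a binary segmentation mask: the extent
    of the mask's pixels along their principal axis, scaled by the
    drone's ground sample distance (metres per pixel, assumed known
    from altitude and camera parameters)."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)
    # principal axis = top eigenvector of the pixel covariance matrix
    _, vecs = np.linalg.eigh(np.cov(pts.T))
    axis = vecs[:, -1]  # direction of largest variance
    proj = pts @ axis
    return (proj.max() - proj.min()) * gsd_m_per_px

def tailbeat_hz(lateral_disp, fps):
    """Dominant tail-beat frequency from the tail tip's lateral
    displacement over time, taken as the largest FFT peak (DC term
    excluded)."""
    sig = np.asarray(lateral_disp, float)
    sig -= sig.mean()
    spec = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    return freqs[1:][np.argmax(spec[1:])]
```

With per-frame masks from the pipeline, both quantities come essentially for free: length from any single good mask, and tail-beat frequency from how a landmark on the mask oscillates across frames.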
We built FLAIR to be accessible to everyone. To help researchers get started quickly, we provide two ways to use FLAIR: a Google Colab notebook and a Python workflow. With either option, you can import your video, enter a prompt, and start tracking!
To learn more, check out our paper and tool.
Post edited by swifenwe and Prayer Kanyile.

