-
ChimpVLM: Ethogram-Enhanced Chimpanzee Behaviour Recognition
Authors:
Otto Brookes,
Majid Mirmehdi,
Hjalmar Kuhl,
Tilo Burghardt
Abstract:
We show that chimpanzee behaviour understanding from camera traps can be enhanced by providing visual architectures with access to an embedding of text descriptions that detail species behaviours. In particular, we present a vision-language model which employs multi-modal decoding of visual features extracted directly from camera trap videos to process query tokens representing behaviours and output class predictions. Query tokens are initialised using a standardised ethogram of chimpanzee behaviour, rather than using random or name-based initialisations. In addition, the effect of initialising query tokens using a masked language model fine-tuned on a text corpus of known behavioural patterns is explored. We evaluate our system on the PanAf500 and PanAf20K datasets and demonstrate the performance benefits of our multi-modal decoding approach and query initialisation strategy on multi-class and multi-label recognition tasks, respectively. Results and ablations corroborate performance improvements. We achieve state-of-the-art performance over vision and vision-language models in top-1 accuracy (+6.34%) on PanAf500 and overall (+1.1%) and tail-class (+2.26%) mean average precision on PanAf20K. We share complete source code and network weights for full reproducibility of results and easy utilisation.
Submitted 13 April, 2024;
originally announced April 2024.
-
PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition
Authors:
Otto Brookes,
Majid Mirmehdi,
Colleen Stephens,
Samuel Angedakin,
Katherine Corogenes,
Dervla Dowd,
Paula Dieguez,
Thurston C. Hicks,
Sorrel Jones,
Kevin Lee,
Vera Leinert,
Juan Lapuente,
Maureen S. McCarthy,
Amelia Meier,
Mizuki Murai,
Emmanuelle Normand,
Virginie Vergnes,
Erin G. Wessling,
Roman M. Wittig,
Kevin Langergraber,
Nuria Maldonado,
Xinyu Yang,
Klaus Zuberbuhler,
Christophe Boesch,
Mimi Arandjelovic
, et al. (2 additional authors not shown)
Abstract:
We present the PanAf20K dataset, the largest and most diverse open-access annotated video dataset of great apes in their natural environment. It comprises more than 7 million frames across ~20,000 camera trap videos of chimpanzees and gorillas collected at 14 field sites in tropical Africa as part of the Pan African Programme: The Cultured Chimpanzee. The footage is accompanied by a rich set of annotations and benchmarks making it suitable for training and testing a variety of challenging and ecologically important computer vision tasks including ape detection and behaviour recognition. Furthering AI analysis of camera trap information is critical given the International Union for Conservation of Nature now lists all species in the great ape family as either Endangered or Critically Endangered. We hope the dataset can form a solid basis for engagement of the AI community to improve performance, efficiency, and result interpretation in order to support assessments of great ape presence, abundance, distribution, and behaviour and thereby aid conservation efforts.
Submitted 31 January, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
Automatic Individual Identification of Patterned Solitary Species Based on Unlabeled Video Data
Authors:
Vanessa Suessle,
Mimi Arandjelovic,
Ammie K. Kalan,
Anthony Agbor,
Christophe Boesch,
Gregory Brazzola,
Tobias Deschner,
Paula Dieguez,
Anne-Céline Granjon,
Hjalmar Kuehl,
Anja Landsmann,
Juan Lapuente,
Nuria Maldonado,
Amelia Meier,
Zuzana Rockaiova,
Erin G. Wessling,
Roman M. Wittig,
Colleen T. Downs,
Andreas Weinmann,
Elke Hergenroether
Abstract:
The manual processing and analysis of videos from camera traps is time-consuming and includes several steps, ranging from the filtering of falsely triggered footage to identifying and re-identifying individuals. In this study, we developed a pipeline to automatically analyze videos from camera traps to identify individuals without requiring manual interaction. This pipeline applies to animal species with uniquely identifiable fur patterns and solitary behavior, such as leopards (Panthera pardus). We assumed that the same individual was seen throughout one triggered video sequence. With this assumption, multiple images could be assigned to an individual for the initial database filling without pre-labeling. The pipeline was based on well-established components from computer vision and deep learning, particularly convolutional neural networks (CNNs) and scale-invariant feature transform (SIFT) features. We augmented this basis by implementing additional components to substitute otherwise required human interactions. Based on the similarity between frames from the video material, clusters were formed that represented individuals, bypassing the open-set problem of the unknown total population. The pipeline was tested on a dataset of leopard videos collected by the Pan African Programme: The Cultured Chimpanzee (PanAf) and achieved a success rate of over 83% for correct matches between previously unknown individuals. The proposed pipeline can become a valuable tool for future conservation projects based on camera trap data, reducing the work of manual analysis for individual identification when labeled data is unavailable.
Submitted 19 April, 2023;
originally announced April 2023.
-
Triple-stream Deep Metric Learning of Great Ape Behavioural Actions
Authors:
Otto Brookes,
Majid Mirmehdi,
Hjalmar Kühl,
Tilo Burghardt
Abstract:
We propose the first metric learning system for the recognition of great ape behavioural actions. Our proposed triple stream embedding architecture works on camera trap videos taken directly in the wild and demonstrates that the utilisation of an explicit DensePose-C chimpanzee body part segmentation stream effectively complements traditional RGB appearance and optical flow streams. We evaluate system variants with different feature fusion techniques and long-tail recognition approaches. Results and ablations show performance improvements of ~12% in top-1 accuracy over previous results achieved on the PanAf-500 dataset containing 180,000 manually annotated frames across nine behavioural actions. Furthermore, we provide a qualitative analysis of our findings and augment the metric learning system with long-tail recognition techniques showing that average per class accuracy -- critical in the domain -- can be improved by ~23% compared to the literature on that dataset. Finally, since our embedding spaces are constructed as metric, we provide first data-driven visualisations of the great ape behavioural action spaces revealing emerging geometry and topology. We hope that the work sparks further interest in this vital application area of computer vision for the benefit of endangered great apes.
Submitted 6 January, 2023;
originally announced January 2023.
-
SOCRATES: A Stereo Camera Trap for Monitoring of Biodiversity
Authors:
Timm Haucke,
Hjalmar S. Kühl,
Volker Steinhage
Abstract:
The development and application of modern technology is an essential basis for the efficient monitoring of species in natural habitats and landscapes to trace the development of ecosystems, species communities, and populations, and to analyze the reasons for changes. For estimating animal abundance using methods such as camera trap distance sampling, spatial information of natural habitats in terms of 3D (three-dimensional) measurements is crucial. Additionally, 3D information improves the accuracy of animal detection using camera trapping. This study presents a novel approach to 3D camera trapping featuring highly optimized hardware and software. This approach employs stereo vision to infer 3D information of natural habitats and is designated as StereO CameRA Trap for monitoring of biodivErSity (SOCRATES). A comprehensive evaluation of SOCRATES shows not only a $3.23\%$ improvement in animal detection (bounding box $\text{mAP}_{75}$) but also its superior applicability for estimating animal abundance using camera trap distance sampling. The software and documentation of SOCRATES are provided at https://github.com/timmh/socrates
Submitted 13 October, 2022; v1 submitted 19 September, 2022;
originally announced September 2022.
-
Compensating class imbalance for acoustic chimpanzee detection with convolutional recurrent neural networks
Authors:
Franz Anders,
Ammie K. Kalan,
Hjalmar S. Kühl,
Mirco Fuchs
Abstract:
Automatic detection systems are important in passive acoustic monitoring (PAM) systems, as these record large amounts of audio data which are infeasible for humans to evaluate manually. In this paper we evaluated methods for compensating class imbalance for deep-learning based automatic detection of acoustic chimpanzee calls. The prevalence of chimpanzee calls in natural habitats is very low, i.e. databases feature a heavy imbalance between background and target calls. Such imbalances can have negative effects on classifier performances. We employed a state-of-the-art detection approach based on convolutional recurrent neural networks (CRNNs). We extended the detection pipeline through various stages for compensating class imbalance. These included (1) spectrogram denoising, (2) alternative loss functions, and (3) resampling. Our key findings are: (1) spectrogram denoising operations significantly improved performance for both target classes, (2) standard binary cross entropy reached the highest performance, and (3) manipulating relative class imbalance through resampling either decreased or maintained performance depending on the target class. Finally, we reached detection performances of 33% for drumming and 5% for vocalization, which is a >7 fold increase compared to previously published results. We conclude that supporting the network to learn decoupling noise conditions from foreground classes is of primary importance for increasing performance.
Submitted 26 May, 2021;
originally announced May 2021.
-
Overcoming the Distance Estimation Bottleneck in Estimating Animal Abundance with Camera Traps
Authors:
Timm Haucke,
Hjalmar S. Kühl,
Jacqueline Hoyer,
Volker Steinhage
Abstract:
The biodiversity crisis is still accelerating, despite increasing efforts by the international community. Estimating animal abundance is of critical importance to assess, for example, the consequences of land-use change and invasive species on community composition, or the effectiveness of conservation interventions. Various approaches have been developed to estimate abundance of unmarked animal populations. Whereas these approaches differ in methodological details, they all require the estimation of the effective area surveyed in front of a camera trap. Until now, camera-to-animal distance measurements have been derived by laborious, manual and subjective estimation methods. To overcome this distance estimation bottleneck, this study proposes an automatized pipeline utilizing monocular depth estimation and depth image calibration methods. We are able to reduce the manual effort required by a factor greater than 21 and provide our system at https://timm.haucke.xyz/publications/distance-estimation-animal-abundance
Submitted 22 December, 2021; v1 submitted 10 May, 2021;
originally announced May 2021.