Detection of Bird Species Found in Bhutan Using Vision Transformer-based Transfer Learning
DOI:
https://doi.org/10.17102/zmv8.i1.008
Keywords:
Vision Transformer, Bhutanese bird recognition, Transfer learning, Fine-tuning, Deep learning
Abstract
Birdwatching is an emerging recreational activity in Bhutan, attracting both local enthusiasts and international tourists because of the country's rich avian biodiversity. This growing interest contributes to local tourism and economic development. However, accurate bird identification remains a challenge because species vary in size, shape, and coloration, and because English and Dzongkha nomenclature is inconsistent. Traditional identification methods, which rely on field guides and expert observation, are prone to errors and disagreements. To address this limitation, we developed a bird detection and recognition system using image processing and machine learning techniques. Bird images were collected from birdwatchers in Paro, Thimphu, and Trongsa, as well as from a Kaggle dataset. These images were preprocessed and augmented to construct a comprehensive dataset covering 23 bird species. The model was obtained by fine-tuning Google’s pre-trained Vision Transformer encoder for image recognition, which operates at a resolution of 224×224 with 16×16 patches. Training on the resulting dataset of 3,595 images reduced the training loss from 2.8491 to 0.0030 and the validation loss from 1.2231 to 0.0529. The results indicate that the proposed approach is effective for bird species identification and offers a valuable tool for birdwatchers and conservation efforts in Bhutan.
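To make the patch configuration concrete, the following minimal sketch computes how many tokens a Vision Transformer encoder processes per image. It assumes the 224×224 input size of Google's publicly released ViT checkpoints, non-overlapping 16×16 patches, and one prepended class token; the function name `vit_sequence_length` is hypothetical and is not taken from the paper.

```python
def vit_sequence_length(image_size: int, patch_size: int, class_token: bool = True) -> int:
    """Return the number of tokens a ViT encoder sees for a square image.

    Assumes non-overlapping square patches, so the image side must be an
    exact multiple of the patch side.
    """
    if image_size % patch_size != 0:
        raise ValueError(
            f"image size {image_size} is not divisible by patch size {patch_size}"
        )
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side * patches_per_side
    # One extra token if a learnable class token is prepended (assumed here).
    return num_patches + (1 if class_token else 0)


if __name__ == "__main__":
    # 224 / 16 = 14 patches per side, 14 * 14 = 196 patches, plus one class token.
    print(vit_sequence_length(224, 16))  # prints 197
```

Under these assumptions, each image is split into 196 patches, and the classification head for the 23 species would read from the class token of the resulting 197-token sequence.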