Position title
Master thesis »Deep Learning-Based Audio Analysis for Avian Population Estimation«
Description

The Fraunhofer Institute for Digital Media Technology IDMT is part of the Fraunhofer-Gesellschaft. Headquartered in Ilmenau, Germany, the institute is internationally recognized for its expertise in applied electroacoustics and audio engineering, AI-based signal analysis and machine learning, and data privacy and security. At its headquarters on the campus of Technische Universität Ilmenau, researchers work on technologies for robust, trustworthy AI-based analysis and classification of audio and video data. These technologies are used, among other things, to monitor industrial production processes, in traffic monitoring, and in the media context, for example for automatic metadata extraction and audio manipulation detection. Another focus is the development of algorithms for virtual product development, intelligent actuator-sensor systems, and audio for the automotive sector. Around 70 employees currently work at Fraunhofer IDMT in Ilmenau.

 

In the Semantic Music Technologies group at Fraunhofer IDMT, one of the main research focuses is extracting meaningful information, identifying patterns, and making sense of complex acoustic recordings. For this purpose, methods from audio signal processing and machine learning are often combined.

 

What you will do

This thesis aims to explore and develop audio analysis methods specifically designed for counting bird populations in outdoor environments. The study will primarily focus on developing deep learning-based audio analysis methods to count distinct bird calls in complex and noisy outdoor soundscapes. These calls may originate from individuals of the same species or from individuals of different species. The ability to detect and count bird calls within a short audio segment enables the derivation of long-term statistics about population density at specific acoustic sensor locations.
Existing bioacoustics research has mainly focused on the task of bird species classification. The most popular model, BirdNET [1], uses a convolutional neural network (CNN) architecture to classify up to 3,000 bird species. Some research exists on animal counting, which employs either computer vision-based methods [2, 3] or audio-based algorithms [4, 5]. In previous research at Fraunhofer IDMT, various approaches for polyphony estimation were developed to characterize audio signals. The term “polyphony” can refer to the number of simultaneous pitches [6], the ensemble size in a music recording [7], or the number of audible sound sources in a short environmental audio recording [8].

 

Objectives

(1) As part of the first objective, the student should investigate whether suitable bird song datasets with polyphony annotations exist. Alternatively, the student should examine the following approach to dataset creation: after defining a taxonomy of approximately 10 to 20 common bird species (with assistance from domain experts from an ongoing EU research project), corresponding audio recordings should be collected from the Xeno-Canto platform [9], which hosts numerous audio recordings of individual bird calls. These recordings can then be randomly mixed to generate a larger dataset of mixtures that covers different degrees of polyphony and allows for model training and evaluation.

(2) For the second objective, two deep learning-based approaches should be identified from the literature and re-implemented. Using the previously compiled dataset, models should be trained and evaluated for the task of bird counting. The student should compare two strategies: an explicit counting approach, in which the number of birds is predicted directly, and an implicit counting approach, in which the count is derived from the number of unique species previously classified by the BirdNET model.

(3) In the third objective, the robustness of the counting method should be tested with a real-world application scenario in mind: a passive acoustic monitoring (PAM) sensor. For this purpose, different data augmentation methods should be applied, such as simulating various microphone characteristics, room impulse responses, and different signal-to-noise ratios with background sounds, and their effect on the counting accuracy should be evaluated. Finally, the results should be documented in a written thesis.
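
To illustrate the mixing idea from objective (1) and the augmentation idea from objective (3), the following is a minimal sketch in Python with NumPy, not a prescribed implementation. All function names (mix_calls, add_noise_at_snr, apply_rir) are hypothetical, and the sine tones and random noise are only stand-ins for real Xeno-Canto recordings, background sounds, and measured room impulse responses.

```python
import numpy as np


def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def mix_calls(calls, segment_len, polyphony, rng):
    """Overlay `polyphony` distinct calls at random offsets.

    Returns the mixture and its polyphony label (number of overlaid calls).
    Assumes polyphony <= len(calls).
    """
    mixture = np.zeros(segment_len, dtype=np.float64)
    chosen = rng.choice(len(calls), size=polyphony, replace=False)
    for idx in chosen:
        call = calls[idx][:segment_len]  # truncate calls longer than the segment
        offset = rng.integers(0, segment_len - len(call) + 1)
        mixture[offset:offset + len(call)] += call
    return mixture, polyphony


def add_noise_at_snr(signal, noise, snr_db):
    """Add background noise scaled to a target signal-to-noise ratio in dB."""
    noise = np.resize(noise, signal.shape)  # repeat or cut noise to length
    gain = rms(signal) / (max(rms(noise), 1e-12) * 10 ** (snr_db / 20))
    return signal + gain * noise


def apply_rir(signal, rir):
    """Simulate a room by convolving with an impulse response."""
    return np.convolve(signal, rir)[: len(signal)]


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    sr = 16000  # assumed sample rate in Hz
    t = np.arange(sr // 2) / sr
    # Placeholder "calls": pure tones instead of real bird recordings.
    calls = [np.sin(2 * np.pi * f * t) for f in (2000, 3000, 4000, 5000)]

    mixture, label = mix_calls(calls, segment_len=3 * sr, polyphony=3, rng=rng)
    # Placeholder impulse response: exponentially decaying noise.
    rir = rng.standard_normal(800) * np.exp(-np.arange(800) / 100)
    reverberant = apply_rir(mixture, rir)
    noisy = add_noise_at_snr(reverberant, rng.standard_normal(sr), snr_db=10.0)
    print(label, noisy.shape)
```

Repeating this for different polyphony values and signal-to-noise ratios would yield labeled mixtures for training as well as degraded variants for robustness tests. Sampling calls without replacement is one possible design choice; whether several calls of the same species may overlap depends on the taxonomy defined in objective (1).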

References
[1] S. Kahl, C. M. Wood, M. Eibl, H. Klinck, BirdNET: A deep learning solution for avian diversity monitoring, Ecological Informatics, Volume 61, 2021, https://birdnet.cornell.edu/ 
[2] C. Arteta, V. Lempitsky, A. Zisserman: Counting in the wild. In Proceedings of the European Conference on Computer Vision (ECCV), 2016. https://www.robots.ox.ac.uk/~vgg/publications/2016/Arteta16/arteta16.pdf 
[3] V. A. Sindagi, V. M. Patel: A survey of recent advances in CNN-based single image crowd counting and density estimation. Pattern Recognition Letters, 107, 3-16, 2018. https://www.sciencedirect.com/science/article/pii/S0167865517302398
[4] E. Sebastián-González, R. J. Camp, et al.: Density estimation of sound-producing terrestrial animals using single automatic acoustic recorders and distance sampling, Avian Conservation and Ecology, vol. 13, no. 2, 2018. https://doi.org/10.5751/ACE-01224-130207
[5] D. Passilongo, L. Mattioli, E. Bassi, et al.: Visualizing sound: counting wolves by using a spectral view of the chorus howling. Front Zool 12, 22 (2015). https://doi.org/10.1186/s12983-015-0114-0
[6] M. Taenzer, S. Mimilakis, J. Abeßer: Informing Piano Multi-Pitch Estimation with Inferred Local Polyphony Based on Convolutional Neural Networks. Electronics 2021, 10, 851. https://doi.org/10.3390/electronics10070851 
[7] S. Grollmisch, E. Cano, F. M. Ángel, and G. López Gil: Ensemble Size Classification in Colombian Andean String Music Recordings. In Perception, Representations, Image, Sound, Music: 14th International Symposium, CMMR 2019, Marseille, France, October 14–18, 2019, Revised Selected Papers. Springer-Verlag, Berlin, Heidelberg, 60–74. https://doi.org/10.1007/978-3-030-70210-6_4
[8] J. Abeßer, A. Ullah, S. Ziegler and S. Grollmisch: Human and Machine Performance in Counting Sound Classes in Single-Channel Soundscapes, J. Audio Eng. Soc., vol. 71, no. 12, pp. 860–872, 2023, https://doi.org/10.17743/jaes.2022.0106 
[9] https://xeno-canto.org/ 

 

 

What you bring to the table

The prerequisites for this master's thesis topic are excellent skills in audio signal processing and deep learning, practical experience using Python and deep learning libraries such as TensorFlow or PyTorch, as well as a general interest in bioacoustic research topics.

 

What you can expect

  • exciting market-related topics with complex issues to be solved – you can be actively involved in shaping the future 
  • challenges at a high level – on top of that, we offer you excellent opportunities for professional and technical training
  • space to also implement your own ideas, such as in our quarterly open-topic idea contest 
  • an excellent technical infrastructure 
  • renowned partners and customers who work closely with you to develop the technologies of tomorrow 
  • a very good work-life balance thanks to flexible working hours, a parent-child office, the option of digital childcare in case of daycare shortages, and the possibility of mobile working, because family comes first – we know that
  • an open-minded and interested team, a tolerant and friendly atmosphere as well as regular team events
  • good transport connections and proximity to the state capital Erfurt 
  • attractive special offers as part of Fraunhofer corporate benefits with numerous enterprise partners 
  • new work and diversity are not just empty buzzwords, but an integral part of our corporate culture 

 

We value and promote the diversity of our employees' skills and therefore welcome all applications - regardless of age, gender, nationality, ethnic and social origin, religion, ideology, disability, sexual orientation and identity. Severely disabled persons are given preference in the event of equal suitability. 

With its focus on developing key technologies that are vital for the future and enabling the commercial utilization of this work by business and industry, Fraunhofer plays a central role in the innovation process. As a pioneer and catalyst for groundbreaking developments and scientific excellence, Fraunhofer helps shape society now and in the future. 

Interested? Apply online now. We look forward to getting to know you!

 

 

Professional queries:

Dr. Jakob Abeßer
jakob.abesser@idmt.fraunhofer.de
 

Questions about the application process:

Katrin Pursche
katrin.pursche@idmt.fraunhofer.de

Fraunhofer Institute for Digital Media Technology IDMT 

www.idmt.fraunhofer.de 

 

Requisition Number: 73723                Application Deadline:

 


 

Job Benefits

USD 30K - 56K *

Employment Type
Full-time
Beginning of employment
asap
Job Location
Ilmenau, DE, 98693
Working Hours
40
Base Salary
euro USD 30K - 56K *
Date posted
May 14, 2024