Neural Networks and Birdsongs
Modern Automation in Birdwatching
In order to sustainably protect animals such as birds, constant monitoring and surveillance are critical. However, there are limits to humans’ ability to continuously observe wild animals in their natural habitat. Generally, image and sound recorders are used to collect long-term data. But this can be both a blessing and a curse: the resulting data needs to be analyzed before it can be used. Stefan Kahl, doctoral candidate in the Endowed Professorship of Media Computing in the Department of Computer Science at the Chemnitz University of Technology, has taken on this challenge. Focusing on birds, he is investigating how to automatically recognize and identify various species from audio recordings of their vocalizations, or songs. The European Social Fund (ESF) is providing €57,600 in funding for the project over three years. Further support comes from the Professorship of Media Informatics (Prof. Dr. Maximilian Eibl) and the Endowed Professorship of Media Computing (Dr. Danny Kowerko, Interim Director) at TU Chemnitz as well as businesses in the region.
Using Artificial Neural Networks to Process Data
Kahl’s doctoral research topic arose from a cooperation dating back to 2015 between the TU Chemnitz Junior Professorship of Media Computing (formerly Prof. Dr. Marc Ritter, now Mittweida University of Applied Sciences) and the “Bioacoustic Research Program” at the Cornell Lab of Ornithology (Cornell University in Ithaca, New York), under the direction of Prof. Dr. Holger Klinck. The TU professorships and the scientific nonprofit organization seek to make ornithologists’ and bird-watchers’ work easier. To this end, Stefan Kahl is developing software to analyze audio data and automatically identify and classify bird species. However, the computer scientist has already encountered several challenges along the way: “Because of the length of the signals, processing raw audio data isn’t well suited for the detection and classification of audio events. For that, we need to find a representation that can simplify the signals,” explains Kahl.
The TU researcher’s specialty is image processing, and he has been able to put this knowledge to good use while processing audio data. “The software we developed is designed to convert audio signals into images, which are called spectrograms. These visual representations of audio signals have proven to be particularly useful and appropriate in detecting and classifying acoustic events,” explains Kahl. He takes the recordings, transformed into spectrograms, and uses them to train an artificial neural network to recognize bird calls. “In recent years, artificial neural networks have made extraordinary progress in the domain of object recognition and classification. Current research results demonstrate how effective this method is and support my approach,” says the TU doctoral candidate. However, artificial neural networks require special processors, like those primarily found on powerful (and therefore expensive) graphics cards. For this purpose alone, the Professorship of Media Computing spent around €30,000 on specialized hardware in 2017.
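To illustrate what converting an audio signal into a spectrogram can look like, here is a minimal sketch in Python using only NumPy. It is not the software developed at TU Chemnitz: the function name log_spectrogram, the frame length of 512 samples, the hop of 256 samples, the sample rate of 22,050 Hz, and the synthetic 3 kHz test tone standing in for a bird call are all assumptions chosen for illustration.

```python
import numpy as np


def log_spectrogram(signal, frame_len=512, hop=256):
    """Return a log-magnitude spectrogram with shape (freq_bins, frames).

    frame_len and hop are illustrative defaults, not values from the project.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One FFT per frame gives the frequency content over time.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels; the small offset avoids log(0).
    return 20.0 * np.log10(magnitude + 1e-10).T


# Hypothetical input: one second of a pure 3 kHz tone.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 3000 * t)

spec = log_spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 512
print(spec.shape, round(peak_hz))
```

The resulting two-dimensional array can be treated like a grayscale image, with time on one axis and frequency on the other, which is what allows image classification networks to be applied to sound.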
Bird Recognition as Foundation for Other Types of Animals – Support from US Institute
One advantage for Kahl’s research is that the birdwatching community is large and very involved when it comes to recording the animals with directional microphones and providing researchers with audio files. “Working with bird vocalizations is a great starting point for the automatic detection and classification of acoustic events. What’s more, it’s less complicated. You can basically get involved right from your own garden,” explains Stefan Kahl, adding: “Without the recordings we received from birdwatchers, some of whom are operating semi-professionally, this project would have been almost impossible.”
The project also receives a great deal of support from the Cornell Lab of Ornithology. The cooperation partner contributes by continually sending so-called “annotations”, which are one possibility for semi-automated programming, as well as audio recordings (soundscapes) and high-quality archive recordings. These files are used to create training data sets. The recordings are often captured using omnidirectional microphones, which are installed in large numbers around the Cornell Lab’s premises and record 24 hours per day. Kahl also receives help from researchers active at the university in New York State, as they allow him access to the code base of a similar project.
Making Mobile Use Possible for Everyone
“Since participating in the internationally recognized ‘ImageCLEF 2017’ scientific competition, we have again been able to make significant improvements to the system. For example, North American bird species can now be recognized with 85% accuracy using monophonic recordings,” comments Kahl on the current status of his research. The TU’s research group took second place in the ImageCLEF competition. Now, tens of thousands of soundscapes can be analyzed automatically.
A further major goal of the research project is the recognition and classification of bird species in real time, directly on the recorder. This is a particularly difficult endeavor, since the device needs to be equipped with high-performance hardware. Nevertheless, initial tests with mobile recorders have already been conducted successfully. In addition, an Android app is currently in development, which should make it possible for everyone to observe birds in their natural habitat and to identify them by their song or call.
Stefan Kahl is currently preparing to participate in ‘ImageCLEF’ again this year.
More information is available from Stefan Kahl (M.Sc.), phone 0371 531-32219, e-mail stefan.kahl@informatik.tu-chemnitz.de
(Translation: Sarah Wilson)
Matthias Fejes
05.03.2018