Learning to identify birdsong with Balbucam!
Balbucam now offers an original way to discover the other animal life around the nest. Thanks to the new audio system installed for Balbucam Season 3 and improved in Season 4, you will be able to hear a host of other birds such as the Greenfinch, the Blackcap, the Song Thrush, the Chaffinch, the Chiffchaff, the Nuthatch, the Wren and many others. You are used to hearing this birdsong, but most of the time you might not be able to identify the species in question.
The Balbucam team has developed a system that uses state-of-the-art technology to analyse the audio signal of the birdsong in real time and identify which species is singing. At the bottom of the live video page you will see the real-time list of birds that the system has identified. For each species a picture is displayed, along with the option to listen to a sample of its song, thanks to the xeno-canto.org website.
How the project came about
The idea of developing an audio recognition system came up very early, during Season 1, in response to the many questions from viewers about the sounds heard around the nest, which are produced by the local fauna, mostly birds.
How does it work?
We submitted sound samples of birdsong to machine learning software, which learned by itself to classify sound clips of birdsong. The classification algorithm used is Random Forest (described by Leo Breiman and Adele Cutler in 2001). We chose the Random Forest implementation provided by Weka. Weka is open source software written in Java and developed by the University of Waikato in New Zealand; it is also the name of a bird species endemic to New Zealand. We also use the AdaBoost meta-algorithm to improve the recognition rate of the Random Forest algorithm. The model has also been trained to recognize surrounding noise (wind, rain, aircraft, …) and thus reduce recognition errors.
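To give an idea of how a forest of voting trees classifies a clip, here is a minimal sketch in Python. It is not the Weka implementation used by Balbucam and it leaves out the AdaBoost step: each "tree" is reduced to a single split, and the two features (peak frequency and duration), the training values and the function names are all invented for illustration.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows, labels, features):
    """Best one-split tree over the given features (lowest weighted Gini)."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no split possible: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feats = rng.sample(range(n_features), k)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote of all trees."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Invented training clips: [peak frequency in kHz, duration in s] -> label.
# "wind" stands for the noise classes mentioned above.
TRAIN_X = [[6.0, 0.2], [5.5, 0.3], [6.5, 0.25], [0.4, 1.2], [0.6, 1.0], [0.5, 1.5]]
TRAIN_Y = ["Wren", "Wren", "Wren", "wind", "wind", "wind"]

forest = fit_forest(TRAIN_X, TRAIN_Y)
print(predict(forest, [6.2, 0.22]))
print(predict(forest, [0.45, 1.1]))
```

Training a noise class alongside the species gives the forest somewhere to put a gust of wind, instead of forcing it to pick a bird.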
Here are the operating principles of our system:
1- Sound recording on the nesting site
The sound used for bird recognition is captured by an external microphone. Its self-noise is very low and its sampling frequency is high (suitable for birdsong analysis). It was placed at the bottom of the nest (which is located 30 meters high) and records the forest concert. The principle is the same as that of the vinyl records of the 1970s: the singing is engraved on one audio track and the instruments on another, and together they create a stereo sound. In our device, the two microphones (the camera's and the external one) produce the stereo sound: with a headset, you hear the external microphone on the left and the camera's microphone on the right. Part of the noise is removed by software that applies a high-pass filter. The audio track is then re-encoded in stereo with the AAC (Advanced Audio Coding) codec and “injected” into the video stream, which uses the H.264 codec. Finally, the stream is sent to the “streaming” server (live stream) using the RTMP protocol.
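The effect of a high-pass filter can be illustrated with a short Python sketch. The filter type (a first-order RC filter), the 1000 Hz cutoff and the 44100 Hz sampling rate are assumptions chosen for the example; the software and settings actually used on site are not described here.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz):
    """First-order RC high-pass filter over a list of float samples."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # Each output keeps the change in the input and forgets its slow drift.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out

def rms(samples):
    """Root mean square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

RATE = 44100    # assumed sampling rate in Hz
CUTOFF = 1000   # assumed cutoff in Hz

times = [i / RATE for i in range(RATE // 10)]                   # 0.1 s of audio
rumble = [math.sin(2 * math.pi * 50 * t) for t in times]        # low wind-like hum
whistle = [math.sin(2 * math.pi * 4000 * t) for t in times]     # bird-like tone

filtered_rumble = high_pass(rumble, RATE, CUTOFF)
filtered_whistle = high_pass(whistle, RATE, CUTOFF)

print("rumble kept: %.0f%%" % (100 * rms(filtered_rumble) / rms(rumble)))
print("whistle kept: %.0f%%" % (100 * rms(filtered_whistle) / rms(whistle)))
```

The low hum is almost entirely removed while the high tone passes nearly untouched, which is why this kind of filter suits birdsong, whose energy sits mostly in the higher frequencies.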
2- Extraction of the audio track on the “streaming” server
When the video reaches the streaming server, the audio part, and more precisely the channel corresponding to the external microphone, is extracted and converted to WAV format, which can then be used by the signal processing system described below.
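Keeping only one channel of a stereo recording can be sketched with Python's standard wave module. The sketch assumes the audio has already been decoded to 16-bit stereo WAV (pulling it out of the RTMP stream is not shown) and that the external microphone is the left channel, index 0, as described in step 1. The function names are invented for the example.

```python
import io
import struct
import wave

def write_wav(samples, channels, rate):
    """Pack 16-bit samples (interleaved if stereo) into WAV bytes."""
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(2)  # 2 bytes = 16 bits per sample
        dst.setframerate(rate)
        dst.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return out.getvalue()

def extract_channel(stereo_wav_bytes, channel=0):
    """Return mono WAV bytes holding one channel of a 16-bit stereo WAV."""
    with wave.open(io.BytesIO(stereo_wav_bytes), "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    # Stereo samples are interleaved: left, right, left, right, ...
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    return write_wav(samples[channel::2], 1, rate)

def read_mono_samples(mono_wav_bytes):
    """Read back the samples of a 16-bit mono WAV as a tuple of ints."""
    with wave.open(io.BytesIO(mono_wav_bytes), "rb") as src:
        frames = src.readframes(src.getnframes())
    return struct.unpack("<%dh" % (len(frames) // 2), frames)

# Tiny made-up stereo clip: left = external microphone, right = camera.
left = [100, 200, 300]
right = [-1, -2, -3]
interleaved = [s for pair in zip(left, right) for s in pair]
stereo = write_wav(interleaved, 2, 44100)

external_only = extract_channel(stereo, channel=0)
print(read_mono_samples(external_only))
```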
3- Audio signal processing
Signal processing is applied to the audio stream because the raw format cannot be analyzed directly: the raw audio stream is transformed into frequency spectra using a software implementation of the Fourier transform (named after the famous French mathematician Joseph Fourier). Below is the frequency spectrum obtained from a recording of the song of a great tit:
Once the audio stream has been transformed into spectrograms, the sequences whose sound level is too low are discarded and the others are aggregated in order to improve the machine learning performance. The final step in processing the audio signal is to normalize the resulting aggregations.
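The chain described in this step (Fourier transform, removal of quiet sequences, aggregation, normalization) can be sketched in plain Python as follows. Several choices here are assumptions made for the example and are not taken from the Balbucam system: the frame size, the energy threshold, aggregation by averaging consecutive spectra, and normalization by the largest value. A real system would also use a fast Fourier transform rather than the slow textbook formula below.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of the first N/2 bins of the discrete Fourier transform."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def spectrogram(samples, frame_size):
    """Cut the signal into non-overlapping frames and transform each one."""
    return [magnitude_spectrum(samples[i:i + frame_size])
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def drop_quiet(frames, min_energy):
    """Keep only the spectra whose total energy reaches the threshold."""
    return [f for f in frames if sum(m * m for m in f) >= min_energy]

def aggregate(frames, group):
    """Average every `group` consecutive spectra into one feature vector."""
    out = []
    for i in range(0, len(frames) - group + 1, group):
        chunk = frames[i:i + group]
        out.append([sum(column) / group for column in zip(*chunk)])
    return out

def normalize(vector):
    """Scale a vector so that its largest value is 1 (unchanged if all zero)."""
    peak = max(vector)
    return [v / peak for v in vector] if peak > 0 else list(vector)

RATE = 8000       # assumed sampling rate in Hz, kept low so the demo runs fast
FRAME = 64        # assumed frame size in samples

# Made-up signal: a 1000 Hz whistle followed by silence.
whistle = [math.sin(2 * math.pi * 1000 * i / RATE) for i in range(4 * FRAME)]
silence = [0.0] * (4 * FRAME)

frames = spectrogram(whistle + silence, FRAME)
kept = drop_quiet(frames, min_energy=1.0)
features = [normalize(v) for v in aggregate(kept, group=2)]

loudest_bin = features[0].index(max(features[0]))
print(len(frames), len(kept), len(features))
print("loudest frequency: %.0f Hz" % (loudest_bin * RATE / FRAME))
```

Each resulting feature vector is what would be handed to the classifier in the next step.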
4- Classification of the received audio signal
Once the signal has been received and processed in real time, it is submitted to our machine learning software, which gives each bird a score. The higher the score, the greater the probability that the bird has been correctly identified. We then apply a final filtering step to these scores: birds whose scores are not high enough are eliminated, and only the two species with the highest scores are kept.
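This final filtering step is simple enough to sketch in a few lines of Python. The minimum score of 0.3 and the example scores are invented; only the rule itself (drop low scores, keep the two best) comes from the description above.

```python
def select_species(scores, min_score=0.3, keep=2):
    """Drop species below min_score, then keep the `keep` best-scoring ones."""
    confident = [(name, score) for name, score in scores.items()
                 if score >= min_score]
    confident.sort(key=lambda item: item[1], reverse=True)
    return confident[:keep]

# Invented classifier output for one stretch of audio.
scores = {"Chaffinch": 0.55, "Wren": 0.35, "Blackcap": 0.32, "Nuthatch": 0.05}

print(select_species(scores))
print(select_species({"Chaffinch": 0.1, "Wren": 0.2}))
```

When every score is too low the result is an empty list, so nothing is displayed rather than a doubtful guess.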
The birds identified in the audio stream are stored on a cache server. The graphical interface of BalbuCam, built with WordPress, continuously queries this server through a web service and displays information about the identified birds online (name, scientific name, image and sample of its song).
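The role of the cache can be pictured with the following Python sketch. Everything in it is hypothetical: the class name, the 30 second lifetime of an entry and the JSON layout are assumptions made for illustration, and the real cache server and web service are not described here.

```python
import json
import time

class DetectionCache:
    """In-memory stand-in for the cache server (hypothetical design)."""

    def __init__(self, ttl_seconds=30.0):
        self.ttl = ttl_seconds
        self._seen = {}  # species name -> (time of last detection, details)

    def put(self, name, details, now=None):
        """Record that a species has just been identified."""
        self._seen[name] = (time.time() if now is None else now, details)

    def current_json(self, now=None):
        """JSON list of the species detected within the last ttl seconds."""
        now = time.time() if now is None else now
        fresh = [dict(details, name=name)
                 for name, (seen_at, details) in self._seen.items()
                 if now - seen_at <= self.ttl]
        return json.dumps(sorted(fresh, key=lambda entry: entry["name"]))

cache = DetectionCache(ttl_seconds=30.0)
cache.put("Wren", {"scientific_name": "Troglodytes troglodytes"}, now=0.0)
cache.put("Chaffinch", {"scientific_name": "Fringilla coelebs"}, now=20.0)

# What a page polling the service 40 seconds in would receive:
# the Wren entry is older than 30 seconds and is no longer listed.
response = cache.current_json(now=40.0)
print(response)
```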
Which birds have been identified so far?
Here is the list of birds in the database so far that the system can recognize:
This list will be added to throughout the season and we hope to be able to identify at least 30 species by the end of the summer.
How reliable is the system?
The tests that we have run show an identification success rate of about 70%, which varies depending on the species in question. We should be able to improve on this during the season by adding new recordings to the database as they are made.
The efficiency of the system decreases as background noise increases (wind, aircraft, traffic).
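A success rate of this kind, overall and per species, can be computed as in the Python sketch below. The list of results is a toy example invented to show the calculation; it is not Balbucam test data and says nothing about the real figures.

```python
from collections import defaultdict

def success_rates(results):
    """results: (true species, predicted species) pairs from labeled clips.

    Returns the overall success rate and a dict of per-species rates.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for truth, predicted in results:
        totals[truth] += 1
        if predicted == truth:
            hits[truth] += 1
    per_species = {name: hits[name] / totals[name] for name in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_species

# Invented toy results, for illustration only.
toy_results = [
    ("Wren", "Wren"), ("Wren", "Wren"), ("Wren", "Wren"), ("Wren", "Chaffinch"),
    ("Blackcap", "Blackcap"), ("Blackcap", "Wren"),
]

overall, per_species = success_rates(toy_results)
print("overall: %.0f%%" % (100 * overall))
for name, rate in sorted(per_species.items()):
    print("%s: %.0f%%" % (name, 100 * rate))
```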
What else could this system be used for?
We could easily link the system to a statistical analysis tool and establish a biodiversity indicator based on the number of species identified. It could also be used to identify rare or uncommon species.
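One very simple indicator of this kind is the number of distinct species identified each day. The Python sketch below assumes the detections are available as (date, species) pairs; both this format and the sample data are invented for the example.

```python
from collections import defaultdict

def daily_richness(detections):
    """detections: (date string, species) pairs.

    Returns {date: number of distinct species identified that day}.
    """
    species_by_day = defaultdict(set)
    for day, species in detections:
        species_by_day[day].add(species)
    return {day: len(names) for day, names in sorted(species_by_day.items())}

# Invented detection log.
log = [
    ("2024-05-01", "Wren"), ("2024-05-01", "Chaffinch"), ("2024-05-01", "Wren"),
    ("2024-05-02", "Wren"), ("2024-05-02", "Blackcap"), ("2024-05-02", "Nuthatch"),
]

richness = daily_richness(log)
print(richness)
```

Repeated detections of the same species on the same day count once, so the indicator reflects variety rather than how talkative one bird was.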
Is the audio identification system “open source”?
The identification system is not yet published on GitHub, but it could become an open source project if other associations wish to use this technology.
Dan Stowell, Mark D. Plumbley:
Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning
Translation from French to English by Ian Stevenson (thanks a lot, Ian).