Chiro-Canto Machine Learning Dataset

To support the development of an automatic classification system, Chiro-Canto will publish a large crowd-sourced dataset of ultrasound recordings (exp_x10) with validated species identifications.

To build a machine learning model capable of classifying new bat calls, we need to collect a very large audio dataset, with more than a hundred diverse recordings for each species.

To do so, Chiro-Canto is calling on all of its users who have shared their recordings through this portal under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license (CC BY-NC-SA 4.0, the default).

To keep the dataset as clean as possible, every recording uploaded to the database must be correctly identified. If any doubt remains, the species entry must be set to 'unknown'.
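As an illustration of how these two rules could be checked on a list of recordings, here is a minimal Python sketch. It is not Chiro-Canto's actual code: the record layout (`file`, `species`), the helper names and the example entries are all hypothetical. The threshold of 100 is read from the "more than a hundred recordings per species" goal stated in this article.

```python
from collections import Counter

# Hypothetical constants: taken from the wording of this article,
# not from Chiro-Canto's code base.
UNKNOWN_LABEL = "unknown"
TARGET_PER_SPECIES = 100  # the goal is MORE than this many recordings


def species_counts(recordings):
    """Count identified recordings per species, skipping 'unknown' entries."""
    counts = Counter()
    for rec in recordings:
        # A missing or empty species is treated as 'unknown'.
        species = (rec.get("species") or UNKNOWN_LABEL).strip()
        if species.lower() != UNKNOWN_LABEL:
            counts[species] += 1
    return counts


def underrepresented(counts, target=TARGET_PER_SPECIES):
    """Return, sorted by name, the species that do not yet exceed the target."""
    return sorted(s for s, n in counts.items() if n <= target)


# Made-up example records; the field names are illustrative only.
recordings = [
    {"file": "rec_001.wav", "species": "Pipistrellus pipistrellus"},
    {"file": "rec_002.wav", "species": "unknown"},
    {"file": "rec_003.wav", "species": "Myotis daubentonii"},
    {"file": "rec_004.wav", "species": "Pipistrellus pipistrellus"},
]

counts = species_counts(recordings)
print(counts)
print(underrepresented(counts))
```

Recordings marked 'unknown' are simply left out of the per-species counts, so doubtful identifications never count toward the target for a species.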

The dataset (including all major versions) will be available to everyone on this website, currently at this address: UltraData datasets.

The source code of the classification model is already available, and will remain so, under the GNU GPL v3 license in the Chiro-Canto organisation's git repository.

Everybody is invited to try building their own classification program, inspired by Chiro-Canto's, and to share it with the whole community.

Published on 2 May 2021
