Initializing¶
- mirdata.initialize(dataset_name, data_home=None, version='default')[source]¶
Load a mirdata dataset by name
Example
orchset = mirdata.initialize('orchset')  # get the orchset dataset
orchset.download()  # download orchset
orchset.validate()  # validate orchset
track = orchset.choice_track()  # load a random track
print(track)  # see what data a track contains
orchset.track_ids()  # load all track ids
- Parameters
dataset_name (str) – the dataset’s name see mirdata.DATASETS for a complete list of possibilities
data_home (str or None) – path where the data lives. If None uses the default location.
version (str or None) – which version of the dataset to load. If None, the default version is loaded.
- Returns
Dataset – a mirdata.core.Dataset object
Dataset Loaders¶
acousticbrainz_genre¶
Acoustic Brainz Genre dataset
Dataset Info
The AcousticBrainz Genre Dataset consists of four datasets of genre annotations and music features extracted from audio suited for evaluation of hierarchical multi-label genre classification systems.
Description about the music features can be found here: https://essentia.upf.edu/streaming_extractor_music.html
The datasets are used within the MediaEval AcousticBrainz Genre Task. The task is focused on content-based music genre recognition using genre annotations from multiple sources and large-scale music features data available in the AcousticBrainz database. The goal of our task is to explore how the same music pieces can be annotated differently by different communities following different genre taxonomies, and how this should be addressed by content-based genre recognition systems.
We provide four datasets containing genre and subgenre annotations extracted from four different online metadata sources:
AllMusic and Discogs are based on editorial metadata databases maintained by music experts and enthusiasts. These sources contain explicit genre/subgenre annotations of music releases (albums) following a predefined genre namespace and taxonomy. We propagated release-level annotations to recordings (tracks) in AcousticBrainz to build the datasets.
Lastfm and Tagtraum are based on collaborative music tagging platforms with large amounts of genre labels provided by their users for music recordings (tracks). We have automatically inferred a genre/subgenre taxonomy and annotations from these labels.
For details on format and contents, please refer to the data webpage.
Note that the AllMusic ground-truth annotations are distributed separately at https://zenodo.org/record/2554044.
If you use the MediaEval AcousticBrainz Genre dataset or part of it, please cite our ISMIR 2019 overview paper:
Bogdanov, D., Porter A., Schreiber H., Urbano J., & Oramas S. (2019).
The AcousticBrainz Genre Dataset: Multi-Source, Multi-Level, Multi-Label, and Large-Scale.
20th International Society for Music Information Retrieval Conference (ISMIR 2019).
This work is partially supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688382 AudioCommons.
- class mirdata.datasets.acousticbrainz_genre.Dataset(data_home=None, version='default')[source]¶
The acousticbrainz genre dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that modify the downloaded data. When a checksum differs, a warning is printed instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
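The checksum comparison that allow_invalid_checksum relaxes can be sketched with hashlib (a simplified illustration, not mirdata's actual implementation; the helper name is hypothetical):

```python
import hashlib

def md5_matches(data: bytes, expected_md5: str) -> bool:
    """Return True if the MD5 hex digest of `data` equals the expected one."""
    return hashlib.md5(data).hexdigest() == expected_md5

# the MD5 of b"hello" is a well-known digest
md5_matches(b"hello", "5d41402abc4b2a76b9719d911017c592")  # -> True
```

With allow_invalid_checksum=False, a digest mismatch like this raises IOError; with True, it only triggers a warning.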
- filter_index(search_key)[source]¶
Load the indexes of the AcousticBrainz genre dataset that match search_key.
- Parameters
search_key (str) – regex to match with folds, mbid or genres
- Returns
dict – {track_id: track data}
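Conceptually, filter_index applies the regex to the index keys and keeps the matching entries; a simplified sketch (not the actual implementation, and the index shown is made up):

```python
import re

def filter_index(index, search_key):
    """Keep only the index entries whose track_id matches the regex."""
    pattern = re.compile(search_key)
    return {tid: data for tid, data in index.items() if pattern.search(tid)}

# hypothetical index keyed by "<fold>#<mbid>"
index = {
    "validation#001": {"genre": ["rock"]},
    "train#002": {"genre": ["jazz"]},
}
filter_index(index, "^train")  # -> {"train#002": {"genre": ["jazz"]}}
```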
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
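The behaviour of the random split methods can be sketched as a seeded shuffle followed by proportional slicing (a simplified illustration, not the library's exact code):

```python
import random

def random_splits(track_ids, fractions, seed=42, split_names=None):
    """Partition track_ids into len(fractions) disjoint groups."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ids = sorted(track_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    names = split_names or list(range(len(fractions)))
    splits, start = {}, 0
    for name, frac in zip(names, fractions):
        stop = start + round(frac * len(ids))
        splits[name] = ids[start:stop]
        start = stop
    splits[names[-1]].extend(ids[start:])  # rounding remainder -> last split
    return splits

splits = random_splits([f"t{i}" for i in range(10)], [0.8, 0.2],
                       split_names=["train", "test"])
# -> 8 train ids and 2 test ids, disjoint
```

Reusing the same seed reproduces the same partition, which is why the library exposes the seed parameter.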
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_all_train()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for training across the four datasets.
- Returns
dict – {track_id: track data}
- load_all_validation()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for validation across the four datasets.
- Returns
dict – {track_id: track data}
- load_allmusic_train()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for training in the allmusic dataset.
- Returns
dict – {track_id: track data}
- load_allmusic_validation()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for validation in the allmusic dataset.
- Returns
dict – {track_id: track data}
- load_discogs_train()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for training in the discogs dataset.
- Returns
dict – {track_id: track data}
- load_discogs_validation()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for validation in the discogs dataset.
- Returns
dict – {track_id: track data}
- load_extractor(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.acousticbrainz_genre.load_extractor
- load_lastfm_train()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for training in the lastfm dataset.
- Returns
dict – {track_id: track data}
- load_lastfm_validation()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for validation in the lastfm dataset.
- Returns
dict – {track_id: track data}
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_tagtraum_train()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for training in the tagtraum dataset.
- Returns
dict – {track_id: track data}
- load_tagtraum_validation()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for validation in the tagtraum dataset.
- Returns
dict – {track_id: track data}
- class mirdata.datasets.acousticbrainz_genre.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
AcousticBrainz Genre Dataset track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. If None, looks for the data in the default directory, ~/mir_datasets
- Variables
track_id (str) – track id
genre (list) – human-labeled genre and subgenres list
mbid (str) – musicbrainz id
mbid_group (str) – musicbrainz id group
artist (list) – the track’s artist/s
title (list) – the track’s title
date (list) – the track’s release date/s
filename (str) – the track’s filename
album (list) – the track’s album/s
track_number (list) – the track number/s
tonal (dict) – dictionary of acousticbrainz tonal features
low_level (dict) – dictionary of acousticbrainz low-level features
rhythm (dict) – dictionary of acousticbrainz rhythm features
- Other Parameters
acousticbrainz_metadata (dict) – dictionary of metadata provided by AcousticBrainz
- property album¶
metadata album annotation
- Returns
list – album
- property artist¶
metadata artist annotation
- Returns
list – artist
- property date¶
metadata date annotation
- Returns
list – date
- property file_name¶
metadata file_name annotation
- Returns
str – file name
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
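Schematically, get_path joins the dataset's local data_home with the relative path stored in the index; a simplified sketch (the index entry shown is hypothetical and its (path, checksum) layout is an assumption):

```python
import os

def get_path(data_home, index_entry, key):
    """Return the absolute path for `key`, or None if no path is stored."""
    relative_path = index_entry[key][0]  # assumed (path, checksum) pairs
    if relative_path is None:
        return None
    return os.path.join(data_home, relative_path)

entry = {"audio": ("audio/track01.wav", "abc123"), "beats": (None, None)}
get_path("/data/example", entry, "audio")  # -> "/data/example/audio/track01.wav"
get_path("/data/example", entry, "beats")  # -> None
```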
- property low_level¶
low_level track descriptors.
- Returns
dict –
‘average_loudness’: dynamic range descriptor. It rescales average loudness, computed on 2sec windows with 1sec overlap, into the [0,1] interval. The value 0 corresponds to signals with a large dynamic range, 1 to signals with little dynamic range. Algorithms: Loudness
‘dynamic_complexity’: dynamic complexity computed on 2sec windows with 1sec overlap. Algorithms: DynamicComplexity
‘silence_rate_20dB’, ‘silence_rate_30dB’, ‘silence_rate_60dB’: rate of silent frames in a signal for thresholds of 20, 30, and 60 dBs. Algorithms: SilenceRate
‘spectral_rms’: spectral RMS. Algorithms: RMS
‘spectral_flux’: spectral flux of a signal computed using L2-norm. Algorithms: Flux
‘spectral_centroid’, ‘spectral_kurtosis’, ‘spectral_spread’, ‘spectral_skewness’: centroid and central moments statistics describing the spectral shape. Algorithms: Centroid, CentralMoments
‘spectral_rolloff’: the roll-off frequency of a spectrum. Algorithms: RollOff
‘spectral_decrease’: spectral decrease. Algorithms: Decrease
‘hfc’: high frequency content descriptor as proposed by Masri. Algorithms: HFC
‘zerocrossingrate’: zero-crossing rate. Algorithms: ZeroCrossingRate
‘spectral_energy’: spectral energy. Algorithms: Energy
‘spectral_energyband_low’, ‘spectral_energyband_middle_low’, ‘spectral_energyband_middle_high’, ‘spectral_energyband_high’: spectral energy in frequency bands [20Hz, 150Hz], [150Hz, 800Hz], [800Hz, 4kHz], and [4kHz, 20kHz]. Algorithms: EnergyBand
‘barkbands’: spectral energy in 27 Bark bands. Algorithms: BarkBands
‘melbands’: spectral energy in 40 mel bands. Algorithms: MFCC
‘erbbands’: spectral energy in 40 ERB bands. Algorithms: ERBBands
‘mfcc’: the first 13 mel frequency cepstrum coefficients. Algorithms: MFCC
‘gfcc’: the first 13 gammatone feature cepstrum coefficients. Algorithms: GFCC
‘barkbands_crest’, ‘barkbands_flatness_db’: crest and flatness computed over energies in Bark bands. Algorithms: Crest, FlatnessDB
‘barkbands_kurtosis’, ‘barkbands_skewness’, ‘barkbands_spread’: central moments statistics over energies in Bark bands. Algorithms: CentralMoments
‘melbands_crest’, ‘melbands_flatness_db’: crest and flatness computed over energies in mel bands. Algorithms: Crest, FlatnessDB
‘melbands_kurtosis’, ‘melbands_skewness’, ‘melbands_spread’: central moments statistics over energies in mel bands. Algorithms: CentralMoments
‘erbbands_crest’, ‘erbbands_flatness_db’: crest and flatness computed over energies in ERB bands. Algorithms: Crest, FlatnessDB
‘erbbands_kurtosis’, ‘erbbands_skewness’, ‘erbbands_spread’: central moments statistics over energies in ERB bands. Algorithms: CentralMoments
‘dissonance’: sensory dissonance of a spectrum. Algorithms: Dissonance
‘spectral_entropy’: Shannon entropy of a spectrum. Algorithms: Entropy
‘pitch_salience’: pitch salience of a spectrum. Algorithms: PitchSalience
‘spectral_complexity’: spectral complexity. Algorithms: SpectralComplexity
‘spectral_contrast_coeffs’, ‘spectral_contrast_valleys’: spectral contrast features. Algorithms: SpectralContrast
- property rhythm¶
rhythm essentia extractor descriptors
- Returns
dict –
‘beats_position’: time positions [sec] of detected beats using the beat tracking algorithm by Degara et al., 2012. Algorithms: RhythmExtractor2013, BeatTrackerDegara
‘beats_count’: number of detected beats
‘bpm’: BPM value according to detected beats
‘bpm_histogram_first_peak_bpm’, ‘bpm_histogram_first_peak_spread’, ‘bpm_histogram_first_peak_weight’, ‘bpm_histogram_second_peak_bpm’, ‘bpm_histogram_second_peak_spread’, ‘bpm_histogram_second_peak_weight’: descriptors characterizing the highest and second highest peaks of the BPM histogram. Algorithms: BpmHistogramDescriptors
‘beats_loudness’, ‘beats_loudness_band_ratio’: spectral energy computed on beat segments of audio across the whole spectrum, and ratios of energy in 6 frequency bands. Algorithms: BeatsLoudness, SingleBeatLoudness
‘onset_rate’: number of detected onsets per second. Algorithms: OnsetRate
‘danceability’: danceability estimate. Algorithms: Danceability
- property title¶
metadata title annotation
- Returns
list – title
- to_jams()[source]¶
Get the track’s data in jams format
- Returns
jams.JAMS – return track data in jam format
- property tonal¶
tonal features
- Returns
dict –
‘tuning_frequency’: estimated tuning frequency [Hz]. Algorithms: TuningFrequency
‘tuning_nontempered_energy_ratio’ and ‘tuning_equal_tempered_deviation’
‘hpcp’, ‘thpcp’: 32-dimensional harmonic pitch class profile (HPCP) and its transposed version. Algorithms: HPCP
‘hpcp_entropy’: Shannon entropy of a HPCP vector. Algorithms: Entropy
‘key_key’, ‘key_scale’: global key feature. Algorithms: Key
‘chords_key’, ‘chords_scale’: global key extracted from chords detection.
‘chords_strength’, ‘chords_histogram’: strength of estimated chords and normalized histogram of their progression. Algorithms: ChordsDetection, ChordsDescriptors
‘chords_changes_rate’, ‘chords_number_rate’: chords change rate in the progression; ratio of different chords to the total number of chords in the progression. Algorithms: ChordsDetection, ChordsDescriptors
- property tracknumber¶
metadata tracknumber annotation
- Returns
list – tracknumber
- mirdata.datasets.acousticbrainz_genre.load_extractor(fhandle)[source]¶
Load an AcousticBrainz dataset JSON file with all the features and metadata.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a json file
- Returns
dict – the features and metadata stored in the json file
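The features file is plain JSON, so reading one reduces to json parsing; a sketch with a hypothetical, heavily truncated payload (the real files contain the full low_level/rhythm/tonal feature sets described above):

```python
import json

# hypothetical, heavily truncated example of an AcousticBrainz features file
raw = """{
  "metadata": {"tags": {"artist": ["Some Artist"], "title": ["Some Title"]}},
  "rhythm": {"bpm": 97.6},
  "tonal": {"key_key": "F", "key_scale": "major"}
}"""
features = json.loads(raw)
features["rhythm"]["bpm"]  # -> 97.6
```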
beatles¶
Beatles Dataset Loader
Dataset Info
The Beatles Dataset includes beat and metric position, chord, key, and segmentation annotations for 179 Beatles songs. Details can be found in http://matthiasmauch.net/_pdf/mauch_omp_2009.pdf and http://isophonics.net/content/reference-annotations-beatles.
- class mirdata.datasets.beatles.Dataset(data_home=None, version='default')[source]¶
The beatles dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that modify the downloaded data. When a checksum differs, a warning is printed instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatles.load_audio
- load_beats(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatles.load_beats
- load_chords(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatles.load_chords
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_sections(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatles.load_sections
- class mirdata.datasets.beatles.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Beatles track class
- Parameters
track_id (str) – track id of the track
data_home (str) – path where the data lives
- Variables
audio_path (str) – track audio path
beats_path (str) – beat annotation path
chords_path (str) – chord annotation path
keys_path (str) – key annotation path
sections_path (str) – sections annotation path
title (str) – title of the track
track_id (str) – track id
- Other Parameters
beats (BeatData) – human-labeled beat annotations
chords (ChordData) – human-labeled chord annotations
key (KeyData) – local key annotations
sections (SectionData) – section annotations
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.beatles.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load a Beatles audio file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.beatles.load_beats(fhandle: TextIO) mirdata.annotations.BeatData [source]¶
Load Beatles format beat data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a beat annotation file
- Returns
BeatData – loaded beat data
- mirdata.datasets.beatles.load_chords(fhandle: TextIO) mirdata.annotations.ChordData [source]¶
Load Beatles format chord data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a chord annotation file
- Returns
ChordData – loaded chord data
- mirdata.datasets.beatles.load_key(fhandle: TextIO) mirdata.annotations.KeyData [source]¶
Load Beatles format key data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a key annotation file
- Returns
KeyData – loaded key data
- mirdata.datasets.beatles.load_sections(fhandle: TextIO) mirdata.annotations.SectionData [source]¶
Load Beatles format section data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a section annotation file
- Returns
SectionData – loaded section data
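For illustration, the beat annotations these loaders read are line-oriented text; a minimal parser sketch, assuming a simple "<time> <beat_position>" layout per line (the real files may differ, so prefer load_beats in practice):

```python
def parse_beats(lines):
    """Parse (time, beat_position) pairs from annotation lines."""
    times, positions = [], []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        times.append(float(fields[0]))
        positions.append(int(fields[1]))
    return times, positions

times, positions = parse_beats(["0.486\t1", "0.982\t2", "1.478\t3"])
# -> times [0.486, 0.982, 1.478], positions [1, 2, 3]
```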
beatport_key¶
beatport_key Dataset Loader
Dataset Info
The Beatport EDM Key Dataset includes 1486 two-minute sound excerpts from various EDM subgenres, annotated with single-key labels, comments and confidence levels generously provided by Eduard Mas Marín, and thoroughly revised and expanded by Ángel Faraldo.
The original audio samples belong to online audio snippets from Beatport, an online music store for DJs and electronic dance music producers (http://www.beatport.com). If this dataset is used in further research, we would appreciate a citation of the current DOI (10.5281/zenodo.1101082) and the following doctoral dissertation, where a detailed description of the properties of this dataset can be found:
Ángel Faraldo (2017). Tonality Estimation in Electronic Dance Music: A Computational and Musically Informed
Examination. PhD Thesis. Universitat Pompeu Fabra, Barcelona.
This dataset is mainly intended to assess the performance of computational key estimation algorithms in electronic dance music subgenres.
Data License: Creative Commons Attribution Share Alike 4.0 International
- class mirdata.datasets.beatport_key.Dataset(data_home=None, version='default')[source]¶
The beatport_key dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]¶
Download the dataset
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_artist(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatport_key.load_artist
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatport_key.load_audio
- load_genre(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatport_key.load_genre
- load_key(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatport_key.load_key
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_tempo(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatport_key.load_tempo
- class mirdata.datasets.beatport_key.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
beatport_key track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored.
- Variables
audio_path (str) – track audio path
keys_path (str) – key annotation path
metadata_path (str) – metadata path
title (str) – title of the track
track_id (str) – track id
- Other Parameters
key (list) – list of annotated musical keys
artists (list) – artists involved in the track
genre (dict) – genres and subgenres
tempo (int) – tempo in beats per minute
- property audio¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.beatport_key.load_artist(fhandle)[source]¶
Load beatport_key artist data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to metadata file
- Returns
list – list of artists involved in the track.
- mirdata.datasets.beatport_key.load_audio(fpath)[source]¶
Load a beatport_key audio file.
- Parameters
fpath (str) – path to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.beatport_key.load_genre(fhandle)[source]¶
Load beatport_key genre data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to metadata file
- Returns
dict – a dictionary with the list of genres under ‘genres’ and the list of sub-genres under ‘sub_genres’
billboard¶
McGill Billboard Dataset Loader
Dataset Info
The McGill Billboard dataset includes annotations and audio features corresponding to 890 slots from a random sample of Billboard chart slots. It also includes metadata like Billboard chart date, peak rank, artist name, etc. Details can be found at https://ddmal.music.mcgill.ca/research/The_McGill_Billboard_Project_(Chord_Analysis_Dataset)
- class mirdata.datasets.billboard.Dataset(data_home=None, version='default')[source]¶
The McGill Billboard dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that modify the downloaded data. When a checksum differs, a warning is printed instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
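The behavior of get_random_track_splits can be pictured with a minimal sketch: shuffle the track ids deterministically, then carve the shuffled list into consecutive chunks whose sizes follow the requested proportions. The helper name random_splits and its internals are illustrative, not mirdata's actual implementation.

```python
import random

def random_splits(track_ids, splits, seed=42, split_names=None):
    # Partition track ids into chunks of a deterministically shuffled list.
    assert abs(sum(splits) - 1.0) < 1e-9, "split fractions must sum to 1"
    rng = random.Random(seed)
    ids = list(track_ids)
    rng.shuffle(ids)
    names = split_names or ["split_{}".format(i) for i in range(len(splits))]
    out, start = {}, 0
    for i, (name, frac) in enumerate(zip(names, splits)):
        # The last split absorbs any rounding leftover.
        end = len(ids) if i == len(splits) - 1 else start + round(frac * len(ids))
        out[name] = ids[start:end]
        start = end
    return out

parts = random_splits(["t{}".format(i) for i in range(10)], [0.8, 0.2],
                      split_names=["train", "test"])
```

Because the shuffle is seeded, calling the helper twice with the same seed yields the same partition, which is the point of the seed parameter above.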
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.billboard.load_audio
- load_chords(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.billboard.load_chords
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_named_sections(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.billboard.load_named_sections
- load_sections(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.billboard.load_sections
- class mirdata.datasets.billboard.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
McGill Billboard Dataset Track class
- Parameters
track_id (str) – track id of the track
- Variables
track_id (str) – the index for the sample entry
audio_path (str) – audio path of the track
chart date – the date of the chart for the entry
target rank – the desired rank on that chart
actual rank – the rank of the song actually annotated, which may be up to 2 ranks higher or lower than the target rank
title (str) – the title of the song annotated
artist (str) – the name of the artist performing the song annotated
peak rank – the highest rank the song annotated ever achieved on the Billboard Hot 100
weeks on chart – the number of weeks the song annotated spent on the Billboard Hot 100 chart in total
- Other Parameters
chords_full (ChordData) – HTK-style LAB files for the chord annotations (full)
chords_majmin7 (ChordData) – HTK-style LAB files for the chord annotations (majmin7)
chords_majmin7inv (ChordData) – HTK-style LAB files for the chord annotations (majmin7inv)
chords_majmin (ChordData) – HTK-style LAB files for the chord annotations (majmin)
chords_majmininv (ChordData) – HTK-style LAB files for the chord annotations (majmininv)
chroma (np.array) – Array containing the non-negative-least-squares chroma vectors
tuning (list) – List containing the tuning estimates
sections (SectionData) – Letter-annotated section data (A,B,A’)
named_sections (SectionData) – Name-annotated section data (intro, verse, chorus)
salami_metadata (dict) – Metadata of the Salami LAB file
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- chroma¶
Non-negative-least-squares (NNLS) chroma vectors from the Chordino Vamp plug-in
- Returns
np.ndarray - NNLS chroma vector
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- to_jams()[source]¶
Get the track’s data in jams format
- Returns
jams.JAMS – the track’s data in jams format
- tuning¶
Tuning estimates from the Chordino Vamp plug-in
- Returns
list – list of tuning estimates
- mirdata.datasets.billboard.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load a Billboard audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.billboard.load_chords(fhandle: TextIO)[source]¶
Load chords from a Salami LAB file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a Salami LAB file
- Returns
ChordData – chord data
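The HTK-style LAB format referenced throughout this loader is plain text with one labeled interval per line. A minimal sketch of a parser over that format follows; the tab-separated layout shown is an assumption for illustration, and the real load_chords returns a ChordData object rather than plain lists.

```python
import io

def parse_lab(fhandle):
    # Each line: start_time <TAB> end_time <TAB> chord_label
    intervals, labels = [], []
    for line in fhandle:
        line = line.strip()
        if not line:
            continue
        start, end, label = line.split("\t")
        intervals.append((float(start), float(end)))
        labels.append(label)
    return intervals, labels

example = io.StringIO("0.0\t2.5\tN\n2.5\t5.0\tC:maj\n")
intervals, labels = parse_lab(example)
```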
cante100¶
cante100 Loader
Dataset Info
The cante100 dataset contains 100 tracks taken from the COFLA corpus. We defined 10 style families, of which 10 tracks each are included. Apart from the style family, we manually annotated the sections of each track in which the vocals are present. In addition, we provide a number of low-level descriptors and the fundamental frequency corresponding to the predominant melody for each track. The meta-information includes editorial metadata and the MusicBrainz ID.
Total tracks: 100
cante100 audio is only available upon request. To download the audio, request access at this link: https://zenodo.org/record/1324183. Then unzip the audio into the general cante100 dataset folder, alongside the rest of the annotations and files.
Audio specifications:
Sampling frequency: 44.1 kHz
Bit-depth: 16 bit
Audio format: .mp3
The cante100 dataset also provides spectrograms, in CSV format. The spectrograms can be downloaded without requesting access, so in the first instance the cante100 loader uses the spectrograms of the tracks.
The available annotations are:
F0 (predominant melody)
Automatic transcription of notes (of singing voice)
CANTE100 LICENSE (COPIED FROM ZENODO PAGE)
The provided datasets are offered free of charge for internal non-commercial use.
We do not grant any rights for redistribution or modification. All data collections were gathered
by the COFLA team.
© COFLA 2015. All rights reserved.
For more details, please visit: http://www.cofla-project.com/?page_id=134
- class mirdata.datasets.cante100.Dataset(data_home=None, version='default')[source]¶
The cante100 dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind a proxy that inspects the downloaded data. When a checksum mismatch occurs, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.cante100.load_audio
- load_melody(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.cante100.load_melody
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_notes(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.cante100.load_notes
- load_spectrogram(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.cante100.load_spectrogram
- class mirdata.datasets.cante100.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
cante100 track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. If None, looks for the data in the default directory, ~/mir_datasets/cante100
- Variables
track_id (str) – track id
identifier (str) – musicbrainz id of the track
artist (str) – performing artists
title (str) – title of the track song
release (str) – release where the track can be found
duration (str) – duration in seconds of the track
- Other Parameters
melody (F0Data) – annotated melody
notes (NoteData) – annotated notes
- property audio: Tuple[numpy.ndarray, float]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- property spectrogram: Optional[numpy.ndarray]¶
Spectrogram of the track’s audio
- Returns
np.ndarray – spectrogram
- mirdata.datasets.cante100.load_audio(fpath: str) Tuple[numpy.ndarray, float] [source]¶
Load a cante100 audio file.
- Parameters
fpath (str) – path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.cante100.load_melody(fhandle: TextIO) Optional[mirdata.annotations.F0Data] [source]¶
Load cante100 f0 annotations
- Parameters
fhandle (str or file-like) – path or file-like object pointing to melody annotation file
- Returns
F0Data – predominant melody
- mirdata.datasets.cante100.load_notes(fhandle: TextIO) mirdata.annotations.NoteData [source]¶
Load note data from the annotation files
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a notes annotation file
- Returns
NoteData – note annotations
- mirdata.datasets.cante100.load_spectrogram(fhandle: TextIO) numpy.ndarray [source]¶
Load a cante100 dataset spectrogram file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a spectrogram file
- Returns
np.ndarray – spectrogram
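Since the spectrograms are distributed as plain CSV, a minimal sketch of loading one with NumPy might look like the following. The delimiter and the frame-per-row orientation are assumptions for illustration; the actual load_spectrogram handles the dataset's real file layout.

```python
import io
import numpy as np

def load_csv_spectrogram(fhandle, delimiter=","):
    # Assumed layout: one time frame per row, one frequency bin per column.
    return np.loadtxt(fhandle, delimiter=delimiter)

spec = load_csv_spectrogram(io.StringIO("0.1,0.2\n0.3,0.4\n"))
```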
compmusic_carnatic_rhythm¶
CompMusic Carnatic Rhythm Dataset Loader
Dataset Info
CompMusic Carnatic Rhythm Dataset is a rhythm annotated test corpus for automatic rhythm analysis tasks in Carnatic Music. The collection consists of audio excerpts from the CompMusic Carnatic research corpus, manually annotated time aligned markers indicating the progression through the taala cycle, and the associated taala related metadata. A brief description of the dataset is provided below. For a brief overview and audio examples of taalas in Carnatic music, please see: http://compmusic.upf.edu/examples-taala-carnatic
The dataset contains the following data:
AUDIO: The pieces are chosen from the CompMusic Carnatic music collection. The pieces were chosen in four popular taalas of Carnatic music, which encompass a majority of Carnatic music. The chosen pieces include a mix of vocal and instrumental recordings, new and old recordings, and span a wide variety of forms. All pieces have a percussion accompaniment, predominantly Mridangam. The excerpts are either full-length pieces or parts of full-length pieces. There are also several different pieces by the same artist (or release group), and multiple instances of the same composition rendered by different artists. Each piece is uniquely identified using the MBID of the recording. The pieces are stereo, 160 kbps, mp3 files sampled at 44.1 kHz.
SAMA AND BEATS: The primary annotations are audio synchronized time-stamps indicating the different metrical positions in the taala cycle. The annotations were created using Sonic Visualizer by tapping to music and manually correcting the taps. Each annotation has a time-stamp and an associated numeric label that indicates the position of the beat marker in the taala cycle. The marked positions in the taala cycle are shown with numbers, along with the corresponding label used. In each case, the sama (the start of the cycle, analogous to the downbeat) is indicated using the numeral 1.
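Because the sama always carries the numeral 1 in these annotations, cycle start times can be recovered from a beat annotation with a one-line filter. This is a sketch over plain lists, not the BeatData API; the times and positions below are made up.

```python
def sama_times(beat_times, beat_positions):
    # The sama (start of the taala cycle) carries the label 1.
    return [t for t, p in zip(beat_times, beat_positions) if p == 1]

# A hypothetical 4-beat cycle: positions wrap back to 1 at each sama.
times = sama_times([0.5, 1.0, 1.5, 2.0, 2.5], [1, 2, 3, 4, 1])
```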
METADATA: For each excerpt, the taala of the piece, edupu (offset of the start of the piece, relative to the sama, measured in aksharas) of the composition, and the kalai (the cycle length scaling factor) are recorded. Each excerpt can be uniquely identified and located with the MBID of the recording, and the relative start and end times of the excerpt within the whole recording. A separate 5 digit taala based unique ID is also provided for each excerpt as a double check. The artist, release, the lead instrument, and the raaga of the piece are additional editorial metadata obtained from the release. A flag indicates if the excerpt is a full piece or only a part of a full piece. There are optional comments on audio quality and annotation specifics.
Possible uses of the dataset: Possible tasks where the dataset can be used include taala, sama and beat tracking, tempo estimation and tracking, taala recognition, rhythm based segmentation of musical audio, structural segmentation, audio to score/lyrics alignment, and rhythmic pattern discovery.
Dataset organization: The dataset consists of audio, annotations, an accompanying spreadsheet providing additional metadata. For a detailed description of the organization, please see the README in the dataset.
Data Subset: A subset of this dataset, consisting of 118 two-minute excerpts of music, is also available. The content in the subset is equivalent and is distributed separately for quicker testing of algorithms and approaches.
The annotations files of this dataset are shared with the following license: Creative Commons Attribution Non Commercial Share Alike 4.0 International
- class mirdata.datasets.compmusic_carnatic_rhythm.Dataset(data_home=None, version='default')[source]¶
The compmusic_carnatic_rhythm dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind a proxy that inspects the downloaded data. When a checksum mismatch occurs, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.compmusic_carnatic_rhythm.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
CompMusic Carnatic Music Rhythm class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. If None (default), looks for the data in the default directory, ~/mir_datasets
- Variables
audio_path (str) – path to audio file
beats_path (str) – path to beats file
meter_path (str) – path to meter file
- Other Parameters
beats (BeatData) – beats annotation
meter (string) – meter annotation
mbid (string) – MusicBrainz ID
name (string) – name of the recording in the dataset
artist (string) – artist’s name
release (string) – release name
lead_instrument_code (string) – code for the lead instrument
taala (string) – taala annotation
raaga (string) – raaga annotation
num_of_beats (int) – number of beats in annotation
num_of_samas (int) – number of samas in annotation
- property audio¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.compmusic_carnatic_rhythm.load_audio(audio_path)[source]¶
Load an audio file.
- Parameters
audio_path (str) – path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
compmusic_hindustani_rhythm¶
CompMusic Hindustani Rhythm Dataset Loader
Dataset Info
CompMusic Hindustani Rhythm Dataset is a rhythm annotated test corpus for automatic rhythm analysis tasks in Hindustani Music. The collection consists of audio excerpts from the CompMusic Hindustani research corpus, manually annotated time aligned markers indicating the progression through the taal cycle, and the associated taal related metadata. A brief description of the dataset is provided below.
For a brief overview and audio examples of taals in Hindustani music, please see: http://compmusic.upf.edu/examples-taal-hindustani
The dataset contains the following data:
AUDIO: The pieces are chosen from the CompMusic Hindustani music collection. The pieces were chosen in four popular taals of Hindustani music, which encompass a majority of Hindustani khyal music. The chosen pieces include a mix of vocal and instrumental recordings, new and old recordings, and span three lays. For each taal, there are pieces in dhrut (fast), madhya (medium) and vilambit (slow) lays (tempo classes). All pieces have Tabla as the percussion accompaniment. The excerpts are two minutes long. Each piece is uniquely identified using the MBID of the recording. The pieces are stereo, 160 kbps, mp3 files sampled at 44.1 kHz. The audio is also available as wav files for experiments.
SAM, VIBHAAG AND THE MAATRAS: The primary annotations are audio synchronized time-stamps indicating the different metrical positions in the taal cycle. The sam and matras of the cycle are annotated. The annotations were created using Sonic Visualizer by tapping to music and manually correcting the taps. Each annotation has a time-stamp and an associated numeric label that indicates the position of the beat marker in the taal cycle. The annotations and the associated metadata have been verified for correctness and completeness by a professional Hindustani musician and musicologist. The long thick lines show vibhaag boundaries. The numerals indicate the matra number in the cycle. In each case, the sam (the start of the cycle, analogous to the downbeat) is indicated using the numeral 1.
METADATA: For each excerpt, the taal and the lay of the piece are recorded. Each excerpt can be uniquely identified and located with the MBID of the recording, and the relative start and end times of the excerpt within the whole recording. A separate 5 digit taal based unique ID is also provided for each excerpt as a double check. The artist, release, the lead instrument, and the raag of the piece are additional editorial metadata obtained from the release. There are optional comments on audio quality and annotation specifics.
The dataset consists of excerpts with a wide tempo range from 10 MPM (matras per minute) to 370 MPM. To study any effects of the tempo class, the full dataset (HMDf) is also divided into two other subsets - the long cycle subset (HMDl) consisting of vilambit (slow) pieces with a median tempo between 10-60 MPM, and the short cycle subset (HMDs) with madhyalay (medium, 60-150 MPM) and the drut lay (fast, 150+ MPM).
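The subset boundaries above can be expressed as a simple classifier over an excerpt's median tempo. The helper name tempo_subset is hypothetical; it only restates the MPM ranges given in the text.

```python
def tempo_subset(median_mpm):
    # HMDl: vilambit (slow) pieces, median tempo 10-60 MPM.
    # HMDs: madhyalay (60-150 MPM) plus drut lay (150+ MPM).
    # Together the two subsets cover the full dataset, HMDf (10-370 MPM).
    if not 10 <= median_mpm <= 370:
        raise ValueError("tempo outside the dataset's 10-370 MPM range")
    return "HMDl" if median_mpm < 60 else "HMDs"

subset = tempo_subset(45)
```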
Possible uses of the dataset: Possible tasks where the dataset can be used include taal, sama and beat tracking, tempo estimation and tracking, taal recognition, rhythm based segmentation of musical audio, audio to score/lyrics alignment, and rhythmic pattern discovery.
Dataset organization: The dataset consists of audio, annotations, an accompanying spreadsheet providing additional metadata, a MAT-file that has identical information as the spreadsheet, and a dataset description document.
The annotations files of this dataset are shared with the following license: Creative Commons Attribution Non Commercial Share Alike 4.0 International
- class mirdata.datasets.compmusic_hindustani_rhythm.Dataset(data_home=None, version='default')[source]¶
The compmusic_hindustani_rhythm dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind a proxy that inspects the downloaded data. When a checksum mismatch occurs, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.compmusic_hindustani_rhythm.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
CompMusic Hindustani Music Rhythm class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. If None (default), looks for the data in the default directory, ~/mir_datasets
- Variables
audio_path (str) – path to audio file
beats_path (str) – path to beats file
meter_path (str) – path to meter file
- Other Parameters
beats (BeatData) – beats annotation
meter (string) – meter annotation
mbid (string) – MusicBrainz ID
name (string) – name of the recording in the dataset
artist (string) – artist’s name
release (string) – release name
lead_instrument_code (string) – code for the lead instrument
taala (string) – taala annotation
raaga (string) – raaga annotation
laya (string) – laya annotation
num_of_beats (int) – number of beats in annotation
num_of_samas (int) – number of samas in annotation
median_matra_period (float) – median matra period
median_matras_per_min (float) – median matras per minute
median_ISI (float) – median ISI
median_avarts_per_min (float) – median avarts per minute
- property audio¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.compmusic_hindustani_rhythm.load_audio(audio_path)[source]¶
Load an audio file.
- Parameters
audio_path (str) – path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
compmusic_indian_tonic¶
Indian Art Music Tonic Loader
Dataset Info
This loader includes a combination of six different datasets for the task of Indian Art Music tonic identification.
These datasets comprise audio excerpts and manually created annotations of the tonic pitch of the lead artist for each audio excerpt. Each excerpt is accompanied by its associated editorial metadata. These datasets can be used to develop and evaluate computational approaches for automatic tonic identification in Indian art music. These datasets have been used in several articles mentioned below. A majority of these datasets come from the CompMusic corpora of Indian art music, where each recording is associated with an MBID. Through the MBID, other information can be obtained using the Dunya API.
These six datasets are used for the task of tonic identification in Indian Art Music, and can be used for a comparative evaluation. To the best of our knowledge, these are the largest datasets available for tonic identification in Indian art music. These datasets vary in terms of audio quality, recording period (decade), the number of recordings of Carnatic, Hindustani, male and female singers, and instrumental and vocal excerpts.
All the datasets (annotations) are version controlled. The audio files corresponding to these datasets are made available on request, for research purposes only. See DOWNLOAD_INFO of this loader.
The tonic annotations are available in both tsv and json formats. The loader uses the JSON formatted annotations.
'ID': {
'artist': <name of the lead artist if available>,
'filepath': <relative path to the audio file>,
'gender': <gender of the lead singer if available>,
'mbid': <musicbrainz id when available>,
'tonic': <tonic in Hz>,
'tradition': <Hindustani or Carnatic>,
'type': <vocal or instrumental>
}
where keys of the main dictionary are the filepaths to the audio files (feature path is exactly the same with a different extension of the file name).
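The JSON schema above can be consumed with the standard library alone. The entry below is illustrative: the artist, filepath, MBID, and tonic values are made up, only the field names follow the documented schema.

```python
import json

# A single hypothetical annotation entry following the schema above.
raw = """{
  "audio/excerpt_01.mp3": {
    "artist": "Example Artist",
    "filepath": "audio/excerpt_01.mp3",
    "gender": "male",
    "mbid": "00000000-0000-0000-0000-000000000000",
    "tonic": 146.83,
    "tradition": "Carnatic",
    "type": "vocal"
  }
}"""

annotations = json.loads(raw)
# Map each audio file path to its annotated tonic in Hz.
tonics = {path: entry["tonic"] for path, entry in annotations.items()}
```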
Although the features are not loaded by this dataloader, the dataset includes them, and they may be integrated into the loader in future releases. These features may also be easily computed following the instructions in the related paper. See BIBTEX.
There are a total of 2161 audio excerpts, and while the CM collection includes approximately 50% Carnatic and 50% Hindustani recordings, the IITM and IISc collections are 100% Carnatic music. The excerpts vary widely in duration. See this webpage for a detailed overview of the datasets: https://compmusic.upf.edu/iam-tonic-dataset
If you have any questions or comments about the dataset, please feel free to email: [sankalp (dot) gulati (at) gmail (dot) com], or [sankalp (dot) gulati (at) upf (dot) edu].
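As a quick illustration, the JSON annotations described above can be consumed with plain Python. The entry below is a made-up example following the documented schema; real keys are file paths within the dataset and real values come from the downloaded annotations:

```python
import json

# A hypothetical annotation entry following the documented schema.
# In the real dataset, the top-level keys are paths to the audio files.
raw = json.dumps({
    "indian_art_music/example_excerpt.wav": {
        "artist": "Example Artist",
        "filepath": "indian_art_music/example_excerpt.wav",
        "gender": "M",
        "mbid": "00000000-0000-0000-0000-000000000000",
        "tonic": 146.83,
        "tradition": "Carnatic",
        "type": "vocal",
    }
})

annotations = json.loads(raw)
for path, meta in annotations.items():
    print(path, meta["tonic"], meta["tradition"])
```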
- class mirdata.datasets.compmusic_indian_tonic.Dataset(data_home=None, version='default')[source]¶
The compmusic_indian_tonic dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that inspect the downloaded data. When a checksum does not match, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
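The semantics of get_random_track_splits can be sketched in plain Python. This is an illustration of the idea (shuffle the ids with a seeded generator, then slice by fraction), not mirdata's actual implementation:

```python
import random

def random_splits(ids, fractions, seed=42, names=None):
    """Partition `ids` into len(fractions) disjoint lists (illustrative sketch)."""
    assert abs(sum(fractions) - 1.0) < 1e-9, "fractions must sum to 1"
    rng = random.Random(seed)          # seeded for reproducibility, like mirdata's seed=42
    shuffled = list(ids)
    rng.shuffle(shuffled)
    names = names or [f"split_{i}" for i in range(len(fractions))]
    out, start = {}, 0
    for name, frac in zip(names, fractions):
        n = round(frac * len(shuffled))
        out[name] = shuffled[start:start + n]
        start += n
    # any rounding remainder goes into the last split
    out[names[-1]].extend(shuffled[start:])
    return out

splits = random_splits([f"track_{i}" for i in range(10)], [0.8, 0.2],
                       names=["train", "test"])
```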
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.compmusic_indian_tonic.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
CompMusic Tonic Dataset track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored.
- Variables
track_id (str) – track id
audio_path (str) – audio path
- Other Parameters
tonic (float) – tonic annotation
artist (str) – performing artist
gender (str) – gender of the recording artists
mbid (str) – MusicBrainz ID of the piece (if available)
type (str) – type of piece (vocal, instrumental, etc.)
tradition (str) – tradition of the piece (Carnatic or Hindustani)
- property audio¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
compmusic_jingju_acappella¶
Jingju A Cappella Singing Dataset Loader
Dataset Info
- Description:
This dataset is a collection of boundary annotations of a cappella singing performed by Beijing Opera (Jingju, 京剧) professional and amateur singers.
- Contents:
wav.zip: audio files in .wav format, mono or stereo.
pycode.zip: util code for parsing the .textgrid annotation
catalogue*.csv: recording metadata; source separation recordings are not included.
annotation_txt.zip: phrase, syllable and phoneme time boundaries (second) and labels in .txt format
- The annotation_txt.zip folder annotations are represented as follows:
phrase_char: phrase-level time boundaries, labeled in Mandarin characters
phrase: phrase-level time boundaries, labeled in Mandarin pinyin
syllable: syllable-level time boundaries, labeled in Mandarin pinyin
phoneme: phoneme-level time boundaries, labeled in X-SAMPA
- The boundaries (onset and offset) have been annotated hierarchically:
phrase (line)
syllable
phoneme
- Annotation details:
Singing units, in pinyin and X-SAMPA, have been annotated for a jingju a cappella singing audio dataset.
- Audio details:
The corresponding audio files are the a cappella singing aria recordings, which are stereo or mono, sampled at 44.1 kHz, and stored as .wav files. The .wav files were recorded by two institutions: file names ending in ‘qm’ were recorded by C4DM, Queen Mary University of London; file names ending in ‘upf’ or ‘lon’ were recorded by MTG-UPF. Additionally, another collection of 15 clean singing recordings is included in this dataset. They were extracted from commercial recordings that originally contained karaoke accompaniment and mixed versions.
- Additional details:
For the annotation format, units, parsing code and other information, please refer to: https://github.com/MTG/jingjuPhonemeAnnotation
- License information:
Textgrid annotations are licensed under Creative Commons Attribution-NonCommercial 4.0 International License. Wav audio ending with ‘upf’ or ‘lon’ is licensed under Creative Commons Attribution-NonCommercial 4.0 International. For the license of .wav audio ending with ‘qm’ from C4DM Queen Mary University of London, please refer to this page http://isophonics.org/SingingVoiceDataset
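The annotation_txt files contain time boundaries in seconds together with labels. As a hedged sketch, assuming whitespace-separated "start end label" rows (the exact column layout should be verified against https://github.com/MTG/jingjuPhonemeAnnotation), such a file could be parsed like this:

```python
# Assumed format: one event per line, "start end label", times in seconds.
def parse_boundaries(lines):
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        start, end, label = line.split(maxsplit=2)
        events.append((float(start), float(end), label))
    return events

# Made-up sample rows, not taken from the dataset
sample = ["0.00 1.25 yi", "1.25 2.80 lun", "2.80 4.10 ming"]
events = parse_boundaries(sample)
```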
- class mirdata.datasets.compmusic_jingju_acappella.Dataset(data_home=None, version='default')[source]¶
The compmusic_jingju_acappella dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that inspect the downloaded data. When a checksum does not match, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_phonemes(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.jingju_acapella.load_phonemes
- load_phrases(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.jingju_acapella.load_phrases
- load_syllable(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.jingju_acapella.load_syllable
- class mirdata.datasets.compmusic_jingju_acappella.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Jingju A Cappella Singing Track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. default=None If None, looks for the data in the default directory, ~/mir_datasets
- Variables
audio_path (str) – local path where the audio is stored
phoneme_path (str) – local path where the phoneme annotation is stored
phrase_char_path (str) – local path where the lyric phrase annotation in chinese is stored
phrase_path (str) – local path where the lyric phrase annotation in western characters is stored
syllable_path (str) – local path where the syllable annotation is stored
work (str) – string referring to the work where the track belongs
details (float) – string referring to additional details about the track
- Other Parameters
phoneme (EventData) – phoneme annotation
phrase_char (LyricsData) – lyric phrase annotation in chinese
phrase (LyricsData) – lyric phrase annotation in western characters
syllable (EventData) – syllable annotation
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.compmusic_jingju_acappella.load_audio(fhandle: BinaryIO) → Tuple[numpy.ndarray, float][source]¶
Load Jingju A Cappella Singing audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.compmusic_jingju_acappella.load_phonemes(fhandle: TextIO) → mirdata.annotations.LyricData[source]¶
Load phonemes
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a phoneme annotation file
- Returns
LyricData – phoneme annotation
- mirdata.datasets.compmusic_jingju_acappella.load_phrases(fhandle: TextIO) → mirdata.annotations.LyricData[source]¶
Load lyric phrases annotation
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a lyric annotation file
- Returns
LyricData – lyric phrase annotation
- mirdata.datasets.compmusic_jingju_acappella.load_syllable(fhandle: TextIO) → mirdata.annotations.LyricData[source]¶
Load syllable
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a syllable annotation file
- Returns
LyricData – syllable annotation
compmusic_otmm_makam¶
OTMM Makam Recognition Dataset Loader
Dataset Info
This dataset is designed to test makam recognition methodologies on Ottoman-Turkish makam music. It is composed of 50 recordings from each of the 20 most common makams in the CompMusic Project’s Dunya Ottoman-Turkish Makam Music collection. It is currently the largest makam recognition dataset.
The recordings are carefully selected from commercial recordings such that they cover diverse musical forms, vocal/instrumentation settings and recording qualities (e.g. historical vs. contemporary recordings). Each recording in the dataset is identified by a 16-character-long unique identifier called an MBID, hosted on MusicBrainz. The makam and the tonic of each recording are annotated in the file annotations.json.
The audio-related data in the test dataset is organized by makam in the folder data. Due to copyright reasons, we are unable to distribute the audio. Instead, we provide the predominant melody of each recording, computed by a state-of-the-art predominant melody extraction algorithm optimized for OTMM culture. These features are saved as single-column text files (at the paths data/[makam]/[mbid].pitch) containing the frequency values. The timestamps are removed to reduce the file sizes. The step size of the pitch track is 0.0029 seconds (a hop size of 128 samples for an mp3 with a 44100 Hz sample rate), from which one can recompute the timestamps of the samples.
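Since the .pitch files store only frequency values at a fixed step of 128/44100 ≈ 0.0029 seconds, the timestamps can be recovered by multiplying sample indices by the step size. A minimal numpy sketch (the frequency series below is made up; in practice it would be read from a data/[makam]/[mbid].pitch file, e.g. with np.loadtxt):

```python
import numpy as np

HOP_SECONDS = 128 / 44100  # ≈ 0.0029 s, as stated in the dataset description

# Made-up frequency values standing in for a real .pitch file
freqs = np.array([220.0, 221.5, 0.0, 219.8])

# Recompute the timestamp of each sample from its index
timestamps = np.arange(len(freqs)) * HOP_SECONDS
pitch_track = np.column_stack([timestamps, freqs])  # (time, frequency) pairs
```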
Moreover, the metadata of each recording is available in the repository, crawled from MusicBrainz using an open-source tool developed by us. The metadata files are saved as data/[makam]/[mbid].json.
For reproducibility purposes, we note the versions of all tools used to generate this dataset in the file algorithms.json (not integrated in the loader, but present in the downloaded dataset).
A complementary toolbox for this dataset is MORTY, a mode recognition and tonic identification toolbox. It can be used and optimized for any modal music culture. Further details are explained in the publication above.
- class mirdata.datasets.compmusic_otmm_makam.Dataset(data_home=None, version='default')[source]¶
The compmusic_otmm_makam dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that inspect the downloaded data. When a checksum does not match, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_mb_tags(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.compmusic_otmm_makam.load_mb_tags
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_pitch(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.compmusic_otmm_makam.load_pitch
- class mirdata.datasets.compmusic_otmm_makam.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
OTMM Makam Track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. default=None If None, looks for the data in the default directory, ~/mir_datasets
- Variables
pitch_path (str) – local path where the pitch annotation is stored
mb_tags_path (str) – local path where the MusicBrainz tags annotation is stored
makam (str) – string referring to the makam represented in the track
tonic (float) – tonic annotation
mbid (str) – MusicBrainz ID of the track
- Other Parameters
pitch (F0Data) – pitch annotation
mb_tags (dict) – dictionary containing the raw editorial track metadata from MusicBrainz
- mirdata.datasets.compmusic_otmm_makam.load_mb_tags(fhandle: TextIO) → dict[source]¶
Load track metadata
- Parameters
fhandle (str or file-like) – path or file-like object pointing to musicbrainz metadata file
- Returns
Dict – metadata of the track
- mirdata.datasets.compmusic_otmm_makam.load_pitch(fhandle: TextIO) → mirdata.annotations.F0Data[source]¶
Load pitch
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a pitch annotation file
- Returns
F0Data – pitch annotation
compmusic_raga¶
CompMusic Raga Dataset Loader
Dataset Info
The rāga datasets from CompMusic comprise two sizable datasets, one for each music tradition: Carnatic and Hindustani. These datasets comprise full-length audio recordings and their associated rāga labels, and can be used to develop and evaluate approaches for automatic rāga recognition in Indian art music.
These datasets are derived from the CompMusic corpora of Indian art music. They were compiled at the Music Technology Group by a group of researchers working on the computational analysis of Carnatic and Hindustani music within the framework of the ERC-funded CompMusic project.
Each recording is associated with an MBID. With the MBID, other information can be obtained using the Dunya API or pycompmusic.
The Carnatic subset comprises 124 hours of audio recordings and editorial metadata that includes carefully curated and verified rāga labels. It contains 480 recordings belonging to 40 rāgas with 12 recordings per rāga.
The Hindustani subset comprises 116 hours of audio recordings and editorial metadata that includes carefully curated and verified rāga labels. It contains 300 recordings belonging to 30 rāgas with 10 recordings per rāga.
The dataset also includes the following features for each file:
Tonic: float indicating the recording tonic
Tonic fine-tuned: float indicating the manually fine-tuned recording tonic
Predominant pitch: automatically-extracted predominant pitch time-series (timestamps and frequency values)
Post-processed pitch: automatically-extracted and post-processed predominant pitch time-series
Nyas segments: KNN-extracted segments of nyas (start and end times provided)
Tani segments: KNN-extracted segments of tanis (start and end times provided)
The dataset includes both txt and json files containing information about each audio recording: its mbid, the paths of the audio/feature files, and the associated rāga identifier. Each rāga is assigned a unique identifier by Dunya, similar in purpose to the mbid. A mapping of each rāga id to its transliterated name is also provided.
For more information about the dataset please refer to: https://compmusic.upf.edu/node/328
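A common first step when working with the tonic and predominant-pitch features above is to normalize pitch to cents relative to the tonic, via 1200·log2(f/tonic). The sketch below uses made-up values (in the real dataset, the tonic comes from the tonic feature and the frequencies from the predominant-pitch time-series); frames with frequency 0 are treated as unvoiced, which is an assumption about the feature format:

```python
import numpy as np

tonic_hz = 146.83  # hypothetical value; would come from the dataset's tonic feature
freqs_hz = np.array([146.83, 293.66, 220.0, 0.0])  # made-up pitch samples; 0.0 = unvoiced

# Convert voiced frames to cents above the tonic; leave unvoiced frames as NaN
voiced = freqs_hz > 0
cents = np.full_like(freqs_hz, np.nan)
cents[voiced] = 1200.0 * np.log2(freqs_hz[voiced] / tonic_hz)
```

The octave above the tonic maps to exactly 1200 cents, which makes melodic contours comparable across recordings with different tonics.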
- class mirdata.datasets.compmusic_raga.Dataset(data_home=None, version='default')[source]¶
The compmusic_raga dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that inspect the downloaded data. When a checksum does not match, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.compmusic_raga.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
CompMusic Raga Dataset class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. default=None If None, looks for the data in the default directory, ~/mir_datasets
- Variables
audio_path (str) – path to audio file
tonic_path (str) – path to tonic annotation
tonic_fine_tuned_path (str) – path to tonic fine-tuned annotation
pitch_path (str) – path to pitch annotation
pitch_post_processed_path (str) – path to processed pitch annotation
nyas_segments_path (str) – path to nyas segments annotation
tani_segments_path (str) – path to tani segments annotation
- Other Parameters
tonic (float) – tonic annotation
tonic_fine_tuned (float) – tonic fine-tuned annotation
pitch (F0Data) – pitch annotation
pitch_post_processed (F0Data) – processed pitch annotation
nyas_segments (EventData) – nyas segments annotation
tani_segments (EventData) – tani segments annotation
recording (str) – name of the recording
concert (str) – name of the concert
artist (str) – name of the artist
mbid (str) – mbid of the recording
raga (str) – raga in the recording
ragaid (str) – id of the raga in the recording
tradition (str) – tradition name (carnatic or hindustani)
- property audio¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.compmusic_raga.load_audio(audio_path)[source]¶
Load an audio file.
- Parameters
audio_path (str) – path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.compmusic_raga.load_nyas_segments(fhandle)[source]¶
Load nyas segments
- Parameters
fhandle (str or file-like) – Local path where the nyas segments annotation is stored.
- Returns
EventData – segment annotation
- mirdata.datasets.compmusic_raga.load_pitch(fhandle)[source]¶
Load pitch
- Parameters
fhandle (str or file-like) – Local path where the pitch annotation is stored.
- Returns
F0Data – pitch annotation
dagstuhl_choirset¶
Dagstuhl ChoirSet Dataset Loader
Dataset Info
Dagstuhl ChoirSet (DCS) is a multitrack dataset of a cappella choral music. The dataset includes recordings of an amateur vocal ensemble performing two choir pieces in full choir and quartet settings (total duration 55min 30sec). The audio data was recorded during an MIR seminar at Schloss Dagstuhl using different close-up microphones to capture the individual singers’ voices:
Larynx microphone (LRX): contact microphone attached to the singer’s throat.
Dynamic microphone (DYN): handheld dynamic microphone.
Headset microphone (HSM): microphone close to the singer’s mouth.
LRX, DYN and HSM recordings are provided on the Track level. All tracks in the dataset have an LRX recording, while only a subset has DYN and HSM recordings.
In addition to the close-up microphone tracks, the dataset also provides the following recordings:
Room microphone mixdown (STM): mixdown of the stereo room microphone.
Room microphone left (STL): left channel of the stereo microphone.
Room microphone right (STR): right channel of the stereo microphone.
Room microphone mixdown with reverb (StereoReverb_STM): STM signal with artificial reverb.
Piano left (SPL): left channel of the piano accompaniment.
Piano right (SPR): right channel of the piano accompaniment.
All room microphone and piano recordings are provided on the Multitrack level. All multitracks have room microphone signals, while only a subset has piano recordings.
For more details, we refer to: Sebastian Rosenzweig (1), Helena Cuesta (2), Christof Weiß (1), Frank Scherbaum (3), Emilia Gómez (2,4), and Meinard Müller (1): Dagstuhl ChoirSet: A Multitrack Dataset for MIR Research on Choral Singing. Transactions of the International Society for Music Information Retrieval, 3(1), pp. 98–110, 2020. DOI: https://doi.org/10.5334/tismir.48
International Audio Laboratories Erlangen, DE
Music Technology Group, Universitat Pompeu Fabra, Barcelona, ES
University of Potsdam, DE
Joint Research Centre, European Commission, Seville, ES
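Because the close-up microphone signals live on the Track level while the room mixes live on the Multitrack level, one simple way to build a custom mono mixdown from several singers' close-up signals is to average them. This is an illustrative sketch with synthetic sine tones standing in for real Track audio, not mirdata's own mixing code:

```python
import numpy as np

def mixdown(signals):
    """Average equal-length mono signals into one mono mix."""
    stacked = np.stack(signals)      # shape: (n_signals, n_samples)
    return stacked.mean(axis=0)      # averaging keeps the mix within [-1, 1]

# Synthetic stand-ins for two singers' close-up (e.g. LRX) signals
sr = 22050
t = np.arange(sr) / sr
soprano = np.sin(2 * np.pi * 440.0 * t)
alto = np.sin(2 * np.pi * 330.0 * t)

mix = mixdown([soprano, alto])
```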
- class mirdata.datasets.dagstuhl_choirset.Dataset(data_home=None, version='default')[source]¶
The Dagstuhl ChoirSet dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. If True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
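The partial_download key validation described above can be sketched as a standalone helper (a hypothetical illustration of the documented behavior, not mirdata's actual implementation):

```python
def check_partial_download(remotes, partial_download):
    """Return the list of remote keys to download, raising ValueError
    for keys that do not name an existing remote (as documented above)."""
    if partial_download is None:
        return list(remotes)  # no filter: download everything
    invalid = [key for key in partial_download if key not in remotes]
    if invalid:
        raise ValueError(f"invalid keys for partial_download: {invalid}")
    return list(partial_download)
```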
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
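The behavior documented for get_random_track_splits can be illustrated with a small standalone sketch (a hypothetical reimplementation of the documented behavior, not mirdata's code):

```python
import random

def random_splits(track_ids, splits, seed=42, split_names=None):
    """Shuffle track_ids deterministically and partition them into
    len(splits) groups sized by the given fractions (summing to 1)."""
    ids = list(track_ids)
    random.Random(seed).shuffle(ids)  # same seed -> same partition
    names = split_names or [f"split_{i}" for i in range(len(splits))]
    out, start = {}, 0
    for name, frac in zip(names, splits):
        n = round(frac * len(ids))
        out[name] = ids[start:start + n]
        start += n
    out[names[-1]].extend(ids[start:])  # leftovers go to the last split
    return out
```

With splits=[0.8, 0.2] and ten tracks, this yields an 8/2 partition that is reproducible for a fixed seed.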
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dagstuhl_choirset.load_audio
- load_beat(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dagstuhl_choirset.load_beat
- load_f0(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dagstuhl_choirset.load_f0
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_score(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dagstuhl_choirset.load_score
- class mirdata.datasets.dagstuhl_choirset.MultiTrack(mtrack_id, data_home, dataset_name, index, track_class, metadata)[source]¶
Dagstuhl ChoirSet multitrack class
- Parameters
mtrack_id (str) – multitrack id
data_home (str) – Local path where the dataset is stored. If None, looks for the data in the default directory, ~/mir_datasets/dagstuhl_choirset
- Variables
audio_stm_path (str) – path to room mic (mono mixdown) audio file
audio_str_path (str) – path to room mic (right channel) audio file
audio_stl_path (str) – path to room mic (left channel) audio file
audio_rev_path (str) – path to room mic with artificial reverb (mono mixdown) audio file
audio_spl_path (str) – path to piano accompaniment (left channel) audio file
audio_spr_path (str) – path to piano accompaniment (right channel) audio file
beat_path (str) – path to beat annotation file
- Other Parameters
beat (annotations.BeatData) – Beat annotation
notes (annotations.NoteData) – Note annotation
multif0 (annotations.MultiF0Data) – Aggregate of f0 annotations for tracks
- property audio_rev: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the room mic with artificial reverb (mono mixdown)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_spl: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the piano accompaniment DI (left channel)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_spr: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the piano accompaniment DI (right channel)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_stl: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the room mic (left channel)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_stm: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the room mic (mono mixdown)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_str: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the room mic (right channel)
- Returns
np.ndarray - audio signal
float - sample rate
- get_mix()[source]¶
Create a linear mixture given a subset of tracks.
- Parameters
track_keys (list) – list of track keys to mix together
- Returns
np.ndarray – mixture audio with shape (n_samples, n_channels)
- get_path(key)[source]¶
Get absolute path to multitrack audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- get_random_target(n_tracks=None, min_weight=0.3, max_weight=1.0)[source]¶
Get a random target by combining a random selection of tracks with random weights
- Parameters
n_tracks (int or None) – number of tracks to randomly mix. If None, uses all tracks
min_weight (float) – minimum possible weight when mixing
max_weight (float) – maximum possible weight when mixing
- Returns
np.ndarray - mixture audio with shape (n_samples, n_channels)
list - list of keys of included tracks
list - list of weights used to mix tracks
- get_target(track_keys, weights=None, average=True, enforce_length=True)[source]¶
Get target which is a linear mixture of tracks
- Parameters
track_keys (list) – list of track keys to mix together
weights (list or None) – list of positive scalars to be used in the average
average (bool) – if True, computes a weighted average of the tracks if False, computes a weighted sum of the tracks
enforce_length (bool) – If True, raises ValueError if the tracks are not the same length. If False, pads audio with zeros to match the length of the longest track
- Returns
np.ndarray – target audio with shape (n_channels, n_samples)
- Raises
ValueError – if sample rates of the tracks are not equal if enforce_length=True and lengths are not equal
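The weighted combination that get_target documents can be sketched in a few lines of plain Python (a hypothetical illustration of the documented options, not the library's code):

```python
def mix_tracks(signals, weights=None, average=True):
    """Weighted sum (or weighted average) of equal-length mono signals,
    mirroring the documented get_target options."""
    if weights is None:
        weights = [1.0] * len(signals)
    if len({len(s) for s in signals}) > 1:
        raise ValueError("tracks must be the same length")
    mixed = [sum(w * s[i] for w, s in zip(weights, signals))
             for i in range(len(signals[0]))]
    if average:
        total = sum(weights)
        mixed = [x / total for x in mixed]
    return mixed
```

Setting average=False gives the weighted-sum variant; the length check corresponds to enforce_length=True.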
- class mirdata.datasets.dagstuhl_choirset.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Dagstuhl ChoirSet Track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_dyn_path (str) – dynamic microphone audio path
audio_hsm_path (str) – headset microphone audio path
audio_lrx_path (str) – larynx microphone audio path
f0_crepe_dyn_path (str) – crepe f0 annotation for dynamic microphone path
f0_crepe_hsm_path (str) – crepe f0 annotation for headset microphone path
f0_crepe_lrx_path (str) – crepe f0 annotation for larynx microphone path
f0_pyin_dyn_path (str) – pyin f0 annotation for dynamic microphone path
f0_pyin_hsm_path (str) – pyin f0 annotation for headset microphone path
f0_pyin_lrx_path (str) – pyin f0 annotation for larynx microphone path
f0_manual_lrx_path (str) – manual f0 annotation for larynx microphone path
score_path (str) – score annotation path
- Other Parameters
f0_crepe_dyn (F0Data) – algorithm-labeled (crepe) f0 annotations for dynamic microphone
f0_crepe_hsm (F0Data) – algorithm-labeled (crepe) f0 annotations for headset microphone
f0_crepe_lrx (F0Data) – algorithm-labeled (crepe) f0 annotations for larynx microphone
f0_pyin_dyn (F0Data) – algorithm-labeled (pyin) f0 annotations for dynamic microphone
f0_pyin_hsm (F0Data) – algorithm-labeled (pyin) f0 annotations for headset microphone
f0_pyin_lrx (F0Data) – algorithm-labeled (pyin) f0 annotations for larynx microphone
f0_manual_lrx (F0Data) – manually labeled f0 annotations for larynx microphone
score (NoteData) – time-aligned score representation
- property audio_dyn: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the track’s dynamic microphone (if available)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_hsm: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the track’s headset microphone (if available)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_lrx: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the track’s larynx microphone (if available)
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.dagstuhl_choirset.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load a Dagstuhl ChoirSet audio file.
- Parameters
fhandle (str or file-like) – file-like object or path to an audio file
- Returns
np.ndarray - the audio signal
float - The sample rate of the audio file
- mirdata.datasets.dagstuhl_choirset.load_beat(fhandle: TextIO) mirdata.annotations.BeatData [source]¶
Load a Dagstuhl ChoirSet beat annotation.
- Parameters
fhandle (str or file-like) – File-like object or path to beat annotation file
- Returns
BeatData Object - the beat annotation
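A beat annotation file can also be parsed by hand; the sketch below assumes a simple two-column CSV layout of (time in seconds, beat position), which is an assumption for illustration rather than the exact on-disk format:

```python
import csv
import io

def parse_beats(text):
    """Parse a hypothetical two-column (time_sec, beat_position) CSV
    into parallel lists, the shape of data a BeatData object holds."""
    times, positions = [], []
    for row in csv.reader(io.StringIO(text)):
        times.append(float(row[0]))
        positions.append(int(row[1]))
    return times, positions
```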
- mirdata.datasets.dagstuhl_choirset.load_f0(fhandle: TextIO) mirdata.annotations.F0Data [source]¶
Load a Dagstuhl ChoirSet F0-trajectory.
- Parameters
fhandle (str or file-like) – File-like object or path to F0 file
- Returns
F0Data Object - the F0-trajectory
- mirdata.datasets.dagstuhl_choirset.load_score(fhandle: TextIO) mirdata.annotations.NoteData [source]¶
Load a Dagstuhl ChoirSet time-aligned score representation.
- Parameters
fhandle (str or file-like) – File-like object or path to score representation file
- Returns
NoteData Object - the time-aligned score representation
dali¶
DALI Dataset Loader
Dataset Info
DALI contains 5358 audio files with their time-aligned vocal melody. It also contains time-aligned lyrics at four levels of granularity: notes, words, lines, and paragraphs.
For each song, DALI also provides additional metadata: genre, language, musician, album covers, or links to video clips.
For more details, please visit: https://github.com/gabolsgabs/DALI
- class mirdata.datasets.dali.Dataset(data_home=None, version='default')[source]¶
The dali dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. If True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_annotations_class(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dali.load_annotations_class
- load_annotations_granularity(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dali.load_annotations_granularity
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dali.load_audio
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.dali.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
DALI melody Track class
- Parameters
track_id (str) – track id of the track
- Variables
album (str) – the track’s album
annotation_path (str) – path to the track’s annotation file
artist (str) – the track’s artist
audio_path (str) – path to the track’s audio file
audio_url (str) – youtube ID
dataset_version (int) – dataset annotation version
ground_truth (bool) – True if the annotation is verified
language (str) – sung language
release_date (str) – year the track was released
scores_manual (int) – manual score annotations
scores_ncc (float) – ncc score annotations
title (str) – the track’s title
track_id (str) – the unique track id
url_working (bool) – True if the youtube url was valid
- Other Parameters
notes (NoteData) – vocal notes
words (LyricData) – word-level lyrics
lines (LyricData) – line-level lyrics
paragraphs (LyricData) – paragraph-level lyrics
annotation-object (DALI.Annotations) – DALI annotation object
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.dali.load_annotations_class(annotations_path)[source]¶
Load full annotations into the DALI class object
- Parameters
annotations_path (str) – path to a DALI annotation file
- Returns
DALI.annotations – DALI annotations object
- mirdata.datasets.dali.load_annotations_granularity(annotations_path, granularity)[source]¶
Load annotations at the specified level of granularity
- Parameters
annotations_path (str) – path to a DALI annotation file
granularity (str) – one of ‘notes’, ‘words’, ‘lines’, ‘paragraphs’
- Returns
NoteData for granularity=’notes’ or LyricData otherwise
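The granularity dispatch documented above can be sketched as follows (a hypothetical helper; the return-type names follow the documentation):

```python
VALID_GRANULARITIES = ("notes", "words", "lines", "paragraphs")

def annotation_return_type(granularity):
    """Return the documented annotation type for a granularity level:
    NoteData for 'notes', LyricData for every other valid level."""
    if granularity not in VALID_GRANULARITIES:
        raise ValueError(f"granularity must be one of {VALID_GRANULARITIES}")
    return "NoteData" if granularity == "notes" else "LyricData"
```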
- mirdata.datasets.dali.load_audio(fhandle: BinaryIO) Optional[Tuple[numpy.ndarray, float]] [source]¶
Load a DALI audio file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
da_tacos¶
Da-TACOS Dataset Loader
Dataset Info
Da-TACOS: a dataset for cover song identification and understanding. It contains two subsets: the benchmark subset (for benchmarking cover song identification systems) and the cover analysis subset (for analyzing the links among cover songs), with pre-extracted features and metadata for 15,000 and 10,000 songs, respectively. The annotations included in the metadata were obtained with the API of SecondHandSongs.com. All audio files used for feature extraction are encoded in MP3 format with a sample rate of 44.1 kHz. Da-TACOS does not contain any audio files. For our analyses of modifiable musical characteristics using the cover analysis subset, and for our initial benchmarking of 7 state-of-the-art cover song identification algorithms on the benchmark subset, see our publication.
For organizing the data, we use the structure of SecondHandSongs where each song is called a ‘performance’, and each clique (cover group) is called a ‘work’. Based on this, the file names of the songs are their unique performance IDs (PID, e.g. P_22), and their labels with respect to their cliques are their work IDs (WID, e.g. W_14).
Metadata for each song includes:
performance title
performance artist
work title
work artist
release year
SecondHandSongs.com performance ID
SecondHandSongs.com work ID
whether the song is instrumental or not
In addition, we matched the original metadata with MusicBrainz to obtain MusicBrainz IDs (MBIDs), song lengths and genre/style tags. Note that MusicBrainz-related information is not available for all the songs in Da-TACOS, and since we used only our own metadata for matching, we include all possible MBIDs for a particular song.
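The performance/work naming scheme described above makes it straightforward to group recordings into cliques; a minimal sketch with hypothetical IDs:

```python
from collections import defaultdict

# Hypothetical (performance_id, work_id) pairs following the
# P_*/W_* naming scheme described above
labels = [("P_22", "W_14"), ("P_23", "W_14"), ("P_40", "W_7")]

cliques = defaultdict(list)
for pid, wid in labels:
    cliques[wid].append(pid)  # all covers of the same work share a WID
```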
To facilitate reproducibility in cover song identification (CSI) research, we propose a framework for feature extraction and benchmarking in our supplementary repository, acoss. The feature extraction component helps CSI researchers find the most commonly used features for CSI in a single place. The parameter values we used to extract the features in Da-TACOS are shared in the same repository. Moreover, the benchmarking component includes our implementations of 7 state-of-the-art CSI systems, along with the results of an initial benchmarking of those systems on the benchmark subset of Da-TACOS. We encourage other CSI researchers to contribute to acoss by implementing their favorite feature extraction algorithms and CSI systems, building up a knowledge base through which CSI research can reach larger audiences.
Pre-extracted features:
The list of features included in Da-TACOS can be seen below. All the features are extracted with the acoss repository, which uses open-source feature extraction libraries such as Essentia, LibROSA, and Madmom.
To facilitate the use of the dataset, we provide two options regarding the file structure.
1. In da-tacos_benchmark_subset_single_files and da-tacos_coveranalysis_subset_single_files folders, we organize the data based on their respective cliques, and one file contains all the features for that particular song.
{
"chroma_cens": numpy.ndarray,
"crema": numpy.ndarray,
"hpcp": numpy.ndarray,
"key_extractor": {
"key": numpy.str_,
"scale": numpy.str_,
"strength": numpy.float64
},
"madmom_features": {
"novfn": numpy.ndarray,
"onsets": numpy.ndarray,
"snovfn": numpy.ndarray,
"tempos": numpy.ndarray
},
"mfcc_htk": numpy.ndarray,
"tags": list of (numpy.str_, numpy.str_),
"label": numpy.str_,
"track_id": numpy.str_
}
2. In da-tacos_benchmark_subset_FEATURE and da-tacos_coveranalysis_subset_FEATURE folders, the data is organized by clique as well, but each of these folders contains only one feature per song. For instance, if you want to test a system that uses HPCP features, you can download da-tacos_benchmark_subset_hpcp to access the pre-computed HPCP features. An example of the contents of those files can be seen below:
{
"hpcp": numpy.ndarray,
"label": numpy.str_,
"track_id": numpy.str_
}
- class mirdata.datasets.da_tacos.Dataset(data_home=None, version='default')[source]¶
The Da-TACOS dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- benchmark_tracks()[source]¶
Load from Da-TACOS dataset the benchmark subset tracks.
- Returns
dict – {track_id: track data}
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- coveranalysis_tracks()[source]¶
Load from Da-TACOS dataset the coveranalysis subset tracks.
- Returns
dict – {track_id: track data}
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. If True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- filter_index(search_key)[source]¶
Load from Da-TACOS genre dataset the indexes that match with search_key.
- Parameters
search_key (str) – regex to match with folds, mbid or genres
- Returns
dict – {track_id: track data}
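The regex matching that filter_index performs can be sketched against a toy index (hypothetical track ids and key layout, for illustration only):

```python
import re

def filter_index(index, search_key):
    """Keep only the index entries whose key matches the regex."""
    return {tid: data for tid, data in index.items()
            if re.search(search_key, tid)}

# Hypothetical index keyed by subset#performance_id
index = {"benchmark#P_40": {}, "coveranalysis#P_22": {}}
```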
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_cens(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_cens
- load_crema(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_crema
- load_hpcp(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_hpcp
- load_key(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_key
- load_madmom(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_madmom
- load_mfcc(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_mfcc
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_tags(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_tags
- class mirdata.datasets.da_tacos.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
da_tacos track class
- Parameters
track_id (str) – track id of the track
- Variables
subset (str) – subset which the track belongs to
work_id (str) – id of work’s original track
label (str) – alias of work_id
performance_id (str) – id of cover track
cens_path (str) – cens annotation path
crema_path (str) – crema annotation path
hpcp_path (str) – hpcp annotation path
key_path (str) – key annotation path
madmom_path (str) – madmom annotation path
mfcc_path (str) – mfcc annotation path
tags_path (str) – tags annotation path
- Properties:
work_title (str): title of the work
work_artist (str): original artist of the work
performance_title (str): title of the performance
performance_artist (str): artist of the performance
release_year (str): release year
is_instrumental (bool): True if the track is instrumental
performance_artist_mbid (str): musicbrainz id of the performance artist
mb_performances (dict): musicbrainz ids of performances
- Other Parameters
cens (np.ndarray) – chroma-cens features
hpcp (np.ndarray) – hpcp features
key (dict) – key data, with keys ‘key’, ‘scale’, and ‘strength’
madmom (dict) – dictionary of madmom analysis features
mfcc (np.ndarray) – mfcc features
tags (list) – list of tags
- mirdata.datasets.da_tacos.load_cens(fhandle: BinaryIO)[source]¶
Load Da-TACOS cens features from a file
- Parameters
fhandle (str or file-like) – File-like object or path to chroma-cens file
- Returns
np.ndarray – cens features
- mirdata.datasets.da_tacos.load_crema(fhandle: BinaryIO)[source]¶
Load Da-TACOS crema features from a file
- Parameters
fhandle (str or file-like) – File-like object or path to crema file
- Returns
np.ndarray – crema features
- mirdata.datasets.da_tacos.load_hpcp(fhandle: BinaryIO)[source]¶
Load Da-TACOS hpcp features from a file
- Parameters
fhandle (str or file-like) – File-like object or path to hpcp file
- Returns
np.ndarray – hpcp features
- mirdata.datasets.da_tacos.load_key(fhandle: BinaryIO)[source]¶
Load Da-TACOS key features from a file.
- Parameters
fhandle (str or file-like) – File-like object or path to key file
- Returns
dict – key, mode and confidence
Examples
{'key': 'C', 'scale': 'major', 'strength': 0.8449875116348267}
- mirdata.datasets.da_tacos.load_madmom(fhandle: BinaryIO)[source]¶
Load Da-TACOS madmom features from a file
- Parameters
fhandle (str or file-like) – File-like object or path to madmom file
- Returns
dict – madmom features, with keys ‘novfn’, ‘onsets’, ‘snovfn’, ‘tempos’
egfxset¶
EGFxSet Dataset Loader
Dataset Info
EGFxSet (Electric Guitar Effects dataset) features recordings of all clean tones on a 22-fret Stratocaster, captured with 5 different pickup configurations and processed through 12 popular guitar effects. The dataset was recorded with real hardware, making it relevant for music information retrieval tasks on real music. Annotations for the parameter settings of each effect are also included.
EGFxSet comprises 8,970 audio files, each 5 seconds long, for a total duration of 12 hours and 28 minutes.
All 138 possible notes of a standard-tuning, 22-fret guitar were recorded with each of the 5 pickup configurations, giving a total of 690 clean-tone audio files (58 min).
The 690 clean audio files (58 min) were processed through 12 different audio effects using actual guitar gear (no VST emulations), yielding a total of 8,280 processed audio files (11 hours 30 min).
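The file counts above follow from simple arithmetic, which can be checked directly:

```python
strings = 6
frets = 22
notes = strings * (frets + 1)            # 138: 22 frets + open string per string
pickup_configs = 5
clean_files = notes * pickup_configs     # 690 clean-tone recordings
effects = 12
processed_files = clean_files * effects  # 8280 effect-processed recordings
total_files = clean_files + processed_files  # 8970 files in EGFxSet
```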
The effects are divided into four categories, with three effects per category. In some cases, more than one effect was taken from the same piece of guitar equipment.
Categories, Models and Effects:
- Distortion:
- Boss BD-2:
Blues Driver
- Ibanez Minitube Screamer:
Tube Screamer
- ProCo RAT2:
Distortion
- Modulation:
- Boss CE-3:
Chorus
- MXR Phase 45:
Phaser
- Mooer E-Lady:
Flanger
- Delays:
- Line6 DL-4:
Digital Delay, Tape Echo, Sweep Echo
- Reverb:
- Orange CR-60 Combo Amplifier:
Plate Reverb, Hall Reverb, Spring Reverb
Annotations are labeled by a trained electric guitar musician. For each tone, we provide:
Guitar string number
Fret number
Guitar pickup configuration
Effect name
Effect type
Hardware modes
Knob names
Knob types
Knob settings
The dataset website is: https://egfxset.github.io/
The data can be accessed here: https://zenodo.org/record/7044411#.YxKdSWzMKEI
An ISMIR extended abstract was presented in 2022: https://ismir2022.ismir.net/program/lbd/
This dataset was conceived during Iran Roman’s “Deep Learning for Music Information Retrieval” course, taught in the music technology postgraduate program at UNAM (Universidad Nacional Autónoma de México). The result is a combined effort between two UNAM postgraduate students (Hegel Pedroza and Gerardo Meza) and Iran Roman (NYU).
- class mirdata.datasets.egfxset.Dataset(data_home=None, version='default')[source]¶
The EGFxSet dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. If True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
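The seeded random splitting that get_random_track_splits describes can be sketched as follows. This is an illustrative reimplementation of the idea, not mirdata's internal code; the function name and the track ids are hypothetical:

```python
import random

def random_track_splits(track_ids, splits, seed=42, split_names=None):
    """Partition track_ids into len(splits) groups with the given proportions."""
    assert abs(sum(splits) - 1.0) < 1e-9, "split proportions should sum to 1"
    ids = list(track_ids)
    random.Random(seed).shuffle(ids)  # fixed seed -> reproducible partitions
    names = split_names or [f"split_{i}" for i in range(len(splits))]
    out, start = {}, 0
    for name, frac in zip(names, splits):
        end = start + round(frac * len(ids))
        out[name] = ids[start:end]
        start = end
    out[names[-1]].extend(ids[start:])  # rounding leftovers go to the last split
    return out
```

Calling it twice with the same seed returns identical partitions, which is the reproducibility property the seed parameter is for.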
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.egfxset.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
EGFxSet Track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – path to the track’s audio file
stringfret_tuple (list) – an array with the tuple of the note recorded
pickup_configuration (string) – the pickup used in the recording
effect (str) – the effect recorded
model (str) – the model of the hardware used
effect_type (str) – the type of effect used (distortion, modulation, delay or reverb)
knob_names (list) – an array with the knob names of the effect used or “None” when the recording is a clean effect sound
knob_type (list) – an array with the type of knobs of the effect used or “None” when the recording is a clean effect sound
setting (list) – the setting of the effect recorded or “None” when the recording is a clean effect sound
- Other Parameters
note_name (list) – a list with the note name annotation of the audio file (e.g. “Ab5”, “C6”, etc.)
midinote (NoteData) – the midinote annotation of the audio file
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
Solo guitar audio (mono)
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.egfxset.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load EGFxSet guitar audio
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - audio signal
float - sample rate
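mirdata's audio loaders typically rely on librosa; as a rough stdlib-only illustration of the "load audio, return (signal, sample rate)" contract that load_audio follows, one might write (assuming a 16-bit PCM mono WAV; the function name is hypothetical):

```python
import struct
import wave

def load_wav_mono(path):
    """Read a 16-bit PCM mono WAV; return (samples as floats in [-1, 1], sample rate)."""
    with wave.open(path, "rb") as f:
        assert f.getnchannels() == 1 and f.getsampwidth() == 2
        sr = f.getframerate()
        raw = f.readframes(f.getnframes())
    # '<h' = little-endian signed 16-bit; scale to the [-1, 1] float range
    samples = [s / 32768.0 for s in struct.unpack(f"<{len(raw) // 2}h", raw)]
    return samples, sr
```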
filosax¶
Filosax Dataset Loader
Dataset Info
The Filosax dataset was conceived, curated and compiled by Dave Foster (a PhD student on the AIM programme at QMUL) and his supervisor Simon Dixon (C4DM @ QMUL). The dataset is a collection of 48 multitrack jazz recordings, where each piece has 8 corresponding audio files:
The original Aebersold backing track (stereo)
Bass_Drums, a mono file of a mix of bass and drums
Piano_Drums, a mono file of a mix of piano and drums
Participant 1 Sax, a mono file of solo saxophone
Participant 2 Sax, a mono file of solo saxophone
Participant 3 Sax, a mono file of solo saxophone
Participant 4 Sax, a mono file of solo saxophone
Participant 5 Sax, a mono file of solo saxophone
Each piece is ~6 minutes long, so each of the 8 stems contains ~5 hours of audio
For each piece, there is a corresponding .jams file containing piece-level annotations:
Beat annotation for the start of each bar and any mid-bar chord change
Chord annotation for each bar, and mid-bar chord change
- Section annotation for when the solo changes between the 3 categories:
head (melody)
written solo (interpretation of transcribed solo)
improvised solo
For each Sax recording (5 per piece), there is a corresponding .json file containing note annotations (see Note object).
The Participant folders also contain MIDI files of the transcriptions (frame level and score level) as well as a PDF and MusicXML of the typeset solo.
The dataset comes in 2 flavours: full (all 48 tracks and 5 sax players) and lite (5 tracks and 2 sax players). Both flavours can be used with or without the backing tracks (which need to be purchased online). Hence, when opening the dataset, use one of 4 versions: ‘full’, ‘full_sax’, ‘lite’, ‘lite_sax’.
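Following the mirdata.initialize pattern shown at the top of this page, selecting a flavour might look like the sketch below. The mirdata calls are commented out because the backing tracks must be purchased separately; the helper function is hypothetical, while the version strings come from the description above:

```python
FILOSAX_VERSIONS = ("full", "full_sax", "lite", "lite_sax")

def check_filosax_version(version):
    """Validate a requested Filosax flavour before initializing the dataset."""
    if version not in FILOSAX_VERSIONS:
        raise ValueError(
            f"unknown Filosax version {version!r}; expected one of {FILOSAX_VERSIONS}"
        )
    return version

# Typical usage (requires mirdata to be installed):
# import mirdata
# filosax = mirdata.initialize("filosax", version=check_filosax_version("lite_sax"))
# filosax.download()
# track = filosax.choice_track()
```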
- class mirdata.datasets.filosax.Dataset(data_home=None, version='default')[source]¶
The Filosax dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when downloading behind proxies that inspect the data. When a checksum differs, a warning is printed instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.filosax.MultiTrack(mtrack_id, data_home, dataset_name, index, track_class, metadata)[source]¶
Filosax multitrack class
- Parameters
mtrack_id (str) – multitrack id
data_home (str) – Local path where the dataset is stored. If None, looks for the data in the default directory, ~/mir_datasets/Filosax
- Variables
mtrack_id (str) – track id
tracks (dict) – {track_id: Track}
track_audio_property (str) – the name of the attribute of Track which returns the audio to be mixed
name (str) – the name of the tune
duration (float) – the duration, in seconds
beats (list, Observation) – the time and beat numbers of bars and chord changes
chords (list, Observation) – the time of chord changes
segments (list, Observation) – the time of segment changes
bass_drums (Track) – the associated bass/drums track
piano_drums (Track) – the associated piano/drums track
sax (list, Track) – a list of associated sax tracks
- Other Parameters
annotation (jams.JAMS) – a .jams file containing the annotations
- annotation¶
.jams file
- Type
output type
- property bass_drums¶
The associated bass/drums track
- Returns
Track
- property beats¶
The times of downbeats and chord changes
- Returns
(SortedKeyList, Observation) - timestamp, duration (seconds), beat
- property chords¶
The times and values of chord changes
- Returns
(SortedKeyList, Observation) - timestamp, duration (seconds), chord symbol
- property duration¶
The track’s duration
- Returns
float - track duration (in seconds)
- get_mix()[source]¶
Create a linear mixture given a subset of tracks.
- Parameters
track_keys (list) – list of track keys to mix together
- Returns
np.ndarray – mixture audio with shape (n_samples, n_channels)
- get_path(key)[source]¶
Get absolute path to multitrack audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- get_random_target(n_tracks=None, min_weight=0.3, max_weight=1.0)[source]¶
Get a random target by combining a random selection of tracks with random weights
- Parameters
n_tracks (int or None) – number of tracks to randomly mix. If None, uses all tracks
min_weight (float) – minimum possible weight when mixing
max_weight (float) – maximum possible weight when mixing
- Returns
np.ndarray - mixture audio with shape (n_samples, n_channels)
list - list of keys of included tracks
list - list of weights used to mix tracks
- get_target(track_keys, weights=None, average=True, enforce_length=True)[source]¶
Get target which is a linear mixture of tracks
- Parameters
track_keys (list) – list of track keys to mix together
weights (list or None) – list of positive scalars to be used in the average
average (bool) – if True, computes a weighted average of the tracks if False, computes a weighted sum of the tracks
enforce_length (bool) – If True, raises ValueError if the tracks are not the same length. If False, pads audio with zeros to match the length of the longest track
- Returns
np.ndarray – target audio with shape (n_channels, n_samples)
- Raises
ValueError – if sample rates of the tracks are not equal if enforce_length=True and lengths are not equal
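The weighted mixing that get_target describes can be sketched with numpy. This is illustrative only; the real method operates on Track audio, checks sample rates, and handles the enforce_length option. The function name is hypothetical:

```python
import numpy as np

def mix_tracks(signals, weights=None, average=True):
    """Linearly combine equal-length mono signals with optional per-track weights."""
    signals = [np.asarray(s, dtype=float) for s in signals]
    if len({len(s) for s in signals}) != 1:
        raise ValueError("all tracks must have the same length")
    w = np.ones(len(signals)) if weights is None else np.asarray(weights, dtype=float)
    mix = sum(wi * si for wi, si in zip(w, signals))
    if average:
        mix = mix / w.sum()  # weighted average rather than weighted sum
    return mix
```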
- property name¶
The track’s name
- Returns
str - track name
- property piano_drums¶
The associated piano/drums track
- Returns
Track
- property sax¶
The associated sax tracks (1-5)
- Returns
(list, Track)
- property segments¶
The times of segment changes (values are ‘head’, ‘written solo’, ‘improvised solo’)
- Returns
(SortedKeyList, Observation) - timestamp, duration (seconds), beat
- class mirdata.datasets.filosax.Note(input_dict)[source]¶
Filosax Note class - dictionary wrapper to give dot properties
- Parameters
input_dict (dict) – dictionary of attributes
- Variables
a_start_time (float) – the time stamp of the note start, in seconds
a_end_time (float) – the time stamp of the note end, in seconds
a_duration (float) – the duration of the note, in seconds
a_onset_time (float) – the onset time (compared to a_start_time) (filosax_full only, 0.0 otherwise)
midi_pitch (int) – the quantised midi pitch
crochet_num (int) – the number of sub-divisions which define a crochet (always 24)
musician (int) – the participant ID
bar_num (int) – the bar number of the start of the note
s_start_time (float) – the time stamp of the score note start, in seconds
s_duration (float) – the duration of the score note, in seconds
s_end_time (float) – the time stamp of the score note end, in seconds
s_rhythmic_duration (int) – the duration of the score note (compared to crochet_num)
s_rhythmic_position (int) – the position in the bar of the score note start (compared to crochet_num)
tempo (float) – the tempo at the start of the note, in beats per minute
bar_type (int) – the section annotation where 0 = head, 1 = written solo, 2 = improvised solo
is_grace (bool) – is the note a grace note, associated with the following note
chord_changes (dict) – the chords, where the key is the rhythmic position of the chord (using crochet_num, relative to s_rhythmic_position) and the value a JAMS chord annotation (An additional chord is added in the case of a quaver at the end of the bar, followed by a rest on the downbeat)
num_chord_changes (int) – the number of chords which accompany the note (usually 1, sometimes >1 for long notes)
main_chord_num (int) – usually 0, sometimes 1 in the quaver case described above
scale_changes (list, int) – the degree of the chromatic scale when midi_pitch is compared to chord_root
loudness_max_val (float) – the value (db) of the maximum loudness
loudness_max_time (float) – the time (seconds) of the maximum loudness (compared to a_start_time)
loudness_curve (list, float) – the inter-note loudness values, 1 per millisecond
pitch_average_val (float) – the value (midi) of the average pitch
pitch_average_time (float) – the time (seconds) of the average pitch (compared to a_start_time)
pitch_curve (list, float) – the inter-note pitch values, 1 per millisecond
pitch_vib_freq (float) – the vibrato frequency (Hz), 0.0 if no vibrato detected
pitch_vib_ext (float) – the vibrato extent (midi), 0.0 if no vibrato detected
spec_cent (float) – the spectral centroid value at the time of the maximum loudness
spec_flux (float) – the spectral flux value at the time of the maximum loudness
spec_cent_curve (list, float) – the inter-note spectral centroid values, 1 per millisecond
spec_flux_curve (list, float) – the inter-note spectral flux values, 1 per millisecond
seq_len (int) – the length of the phrase in which the note falls (filosax_full only, -1 otherwise)
seq_num (int) – the note position in the phrase (filosax_full only, -1 otherwise)
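The "dictionary wrapper to give dot properties" idea behind the Note class can be sketched in a few lines. This is a generic illustration, not the actual Filosax implementation; the attribute names in the example are taken from the list above:

```python
class DotDict:
    """Wrap a dict so its keys are readable as attributes."""

    def __init__(self, input_dict):
        self._data = dict(input_dict)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)

note = DotDict({"midi_pitch": 61, "bar_num": 4, "is_grace": False})
```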
- class mirdata.datasets.filosax.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Filosax track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – path to audio file
annotation_path (str) – path to annotation file
midi_path (str) – path to MIDI file
musicXML_path (str) – path to musicXML file
pdf_path (str) – path to PDF file
- Other Parameters
notes (list, Note) – an ordered list of Note objects
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- notes¶
The track’s note list - only for Sax files
- Returns
[Note] - ordered list of Note objects (empty if Backing file)
- mirdata.datasets.filosax.load_annotation(fhandle: TextIO) List[mirdata.datasets.filosax.Note] [source]¶
Load a Filosax annotation file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to an audio file
- Returns
(list, Note) – an ordered list of Note objects
- mirdata.datasets.filosax.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load a Filosax audio file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to an audio file
- Returns
np.ndarray - the audio signal
float - The sample rate of the audio file
four_way_tabla¶
Four-Way Tabla Stroke Transcription and Classification Loader
Dataset Info
The Four-Way Tabla Dataset includes audio recordings of tabla solo with onset annotations for particular strokes types. This dataset was published in 2021 in the context of ISMIR2021 (Online), and may be used for tasks related to tabla analysis, including problems such as onset detection and stroke classification.
Total audio samples: there are 226 samples for training and 10 for testing. Each audio clip is approximately 1 minute long.
Audio specifications:
Sampling frequency: 44.1 kHz
Bit-depth: 16 bit
Audio format: .wav
Dataset usage: This dataset may be used for data-driven research on tabla stroke transcription and identification. Four characteristic tabla strokes are considered.
Dataset structure: The dataset is split in two subsets, containing training and testing samples. Within each subset, there is a folder containing the audios, and another folder containing the onset annotations. The onset annotations are organized in one folder per stroke type: b, d, rb, rt. Therefore, the paths to onsets look like:
train/onsets/<StrokeType>/<ID>.onsets
The dataset is made available by CompMusic under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) License.
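Reading one of the per-stroke annotation files above might look like the sketch below, assuming each .onsets file is a plain-text file with one onset time (in seconds) per line; that format, and the function name, are assumptions for illustration:

```python
def load_onsets(fhandle):
    """Parse onset times (seconds) from a text file handle, one value per line."""
    # Skip blank lines; each remaining line is assumed to be a float timestamp
    return [float(line) for line in fhandle if line.strip()]
```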
- class mirdata.datasets.four_way_tabla.Dataset(data_home=None, version='default')[source]¶
The Four-Way Tabla dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when downloading behind proxies that inspect the data. When a checksum differs, a warning is printed instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.four_way_tabla.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Four-Way Tabla track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored.
- Variables
track_id (str) – track id
audio_path (str) – audio path
onsets_b_path (str) – path to B onsets
onsets_d_path (str) – path to D onsets
onsets_rb_path (str) – path to RB onsets
onsets_rt_path (str) – path to RT onsets
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- property onsets_b: Optional[mirdata.annotations.BeatData]¶
Onsets for stroke B
- Returns
annotations.BeatData - onsets annotation
- property onsets_d: Optional[mirdata.annotations.BeatData]¶
Onsets for stroke D
- Returns
annotations.BeatData - onsets annotation
- property onsets_rb: Optional[mirdata.annotations.BeatData]¶
Onsets for stroke RB
- Returns
annotations.BeatData - onsets annotation
- property onsets_rt: Optional[mirdata.annotations.BeatData]¶
Onsets for stroke RT
- Returns
annotations.BeatData - onsets annotation
- mirdata.datasets.four_way_tabla.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load a Four-Way Tabla Dataset audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
freesound_one_shot_percussive_sounds¶
Freesound One-Shot Percussive Sounds Dataset Loader
Dataset Info
Introduction:
This dataset contains 10254 one-shot (single event) percussive sounds from freesound.org, a timbral analysis computed by two different extractors (FreesoundExtractor from Essentia and AudioCommons Extractor), and a list of tags. There is also metadata about each audio file, since the audio specifications are not the same across all dataset tracks. The analysis data was used to train the generative model for “Neural Percussive Synthesis Parameterised by High-Level Timbral Features”.
Dataset Construction:
To collect this dataset, the following steps were performed:
Freesound was queried with words associated with percussive instruments, such as “percussion”, “kick”, “wood” or “clave”, and only sounds with less than one second of effective duration were selected.
This stage retrieved some audio clips that contained multiple sound events or were of low quality, so all retrieved sounds were auditioned and such clips were manually discarded using the percussive-annotator (https://github.com/xavierfav/percussive-annotator), a tool for annotating datasets of percussive sounds.
The sounds were then cut or padded to 1-second length, normalized and downsampled to 16 kHz.
Finally, the sounds were analyzed with the AudioCommons Extractor to obtain the AudioCommons timbral descriptors.
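The cut-or-pad and normalization step described above can be sketched with numpy (illustrative only; the 1-second length and 16 kHz rate follow the description, the function name is hypothetical):

```python
import numpy as np

def to_one_second(signal, sr=16000):
    """Cut or zero-pad a signal to exactly 1 second at `sr`, then peak-normalize."""
    x = np.asarray(signal, dtype=float)[:sr]  # cut anything past 1 second
    x = np.pad(x, (0, sr - len(x)))           # zero-pad short clips
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x
```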
Authors and Contact:
This dataset was developed by António Ramires, Pritish Chadna, Xavier Favory, Emilia Gómez and Xavier Serra. For any questions related to this dataset, please contact: António Ramires (antonio.ramires@upf.edu / aframires@gmail.com)
Acknowledgements:
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068 (MIP-Frontiers). This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 770376 (TROMPA).
- class mirdata.datasets.freesound_one_shot_percussive_sounds.Dataset(data_home=None, version='default')[source]¶
The Freesound One-Shot Percussive Sounds dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when downloading behind proxies that inspect the data. When a checksum differs, a warning is printed instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.freesound_one_shot_percussive_sounds.load_audio
- load_file_metadata(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.freesound_one_shot_percussive_sounds.load_file_metadata
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.freesound_one_shot_percussive_sounds.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Freesound one-shot percussive sounds track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. If None, looks for the data in the default directory, ~/mir_datasets/freesound_one_shot_percussive_sounds
- Variables
file_metadata_path (str) – local path where the analysis file is stored and from where we get the file metadata
audio_path (str) – local path where audio file is stored
track_id (str) – track id
filename (str) – filename of the track
username (str) – username of the Freesound uploader of the track
license (str) – link to license of the track file
tags (list) – list of tags of the track
freesound_preview_urls (dict) – dict of Freesound previews urls of the track
freesound_analysis (str) – dict of analysis parameters computed in Freesound using Essentia extractor
audiocommons_analysis (str) – dict of analysis parameters computed using AudioCommons Extractor
- Other Parameters
file_metadata (dict) – metadata parameters of the track file in form of Python dictionary
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.freesound_one_shot_percussive_sounds.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load the track audio file.
- Parameters
fhandle (str) – path to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
giantsteps_key¶
giantsteps_key Dataset Loader
Dataset Info
The GiantSteps+ EDM Key Dataset includes 600 two-minute sound excerpts from various EDM subgenres, annotated with single-key labels, comments and confidence levels by Daniel G. Camhi, and revised and expanded by Ángel Faraldo at MTG UPF. Additionally, 500 tracks have been thoroughly analysed, containing pitch-class set descriptions, key changes, and additional modal changes. This dataset is a revision of the original GiantSteps Key Dataset, available on GitHub (<https://github.com/GiantSteps/giantsteps-key-dataset>) and initially described in:
Knees, P., Faraldo, Á., Herrera, P., Vogl, R., Böck, S., Hörschläger, F., Le Goff, M. (2015).
Two Datasets for Tempo Estimation and Key Detection in Electronic Dance Music Annotated from User Corrections.
In Proceedings of the 16th International Society for Music Information Retrieval Conference, 364–370. Málaga, Spain.
The original audio samples belong to online audio snippets from Beatport, an online music store for DJs and Electronic Dance Music producers (<http://www.beatport.com>). If this dataset is used in further research, we would appreciate citation of the current DOI (10.5281/zenodo.1101082) and the following doctoral dissertation, where a detailed description of the properties of this dataset can be found:
Ángel Faraldo (2017). Tonality Estimation in Electronic Dance Music: A Computational and Musically Informed Examination.
PhD Thesis. Universitat Pompeu Fabra, Barcelona.
This dataset is mainly intended to assess the performance of computational key estimation algorithms in electronic dance music subgenres.
All the data of this dataset is licensed with Creative Commons Attribution Share Alike 4.0 International.
- class mirdata.datasets.giantsteps_key.Dataset(data_home=None, version='default')[source]¶
The giantsteps_key dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. When True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
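A minimal sketch of the checksum policy described above: a mismatched checksum raises IOError unless allow_invalid_checksum is True, in which case it only warns. The helper below is illustrative, not mirdata's actual implementation:

```python
# Sketch of the documented allow_invalid_checksum behavior (not mirdata's
# real code): verify an MD5 checksum, warning instead of raising when the
# caller opts in.
import hashlib
import warnings

def check_md5(data: bytes, expected_md5: str, allow_invalid_checksum: bool = False) -> None:
    actual = hashlib.md5(data).hexdigest()
    if actual != expected_md5:
        msg = f"Checksum mismatch: expected {expected_md5}, got {actual}"
        if allow_invalid_checksum:
            warnings.warn(msg)  # tolerate, e.g. behind an inspecting proxy
        else:
            raise IOError(msg)

payload = b"some downloaded bytes"
check_md5(payload, hashlib.md5(payload).hexdigest())  # passes silently
```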
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
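An illustrative re-implementation of the split logic documented above: shuffle the track ids with a fixed seed, then cut the list according to the given fractions. This mirrors the documented behavior (fractions summing to 1, seeded reproducibility) but is not mirdata's exact code:

```python
# Sketch of seeded random splitting as documented above; not mirdata's
# actual implementation.
import random

def random_splits(track_ids, splits, seed=42, split_names=None):
    if abs(sum(splits) - 1.0) > 1e-8:
        raise ValueError("splits must sum to 1")
    names = split_names or [f"split_{i}" for i in range(len(splits))]
    ids = list(track_ids)
    random.Random(seed).shuffle(ids)  # fixed seed -> reproducible partitions
    out, start = {}, 0
    for name, frac in zip(names, splits):
        end = start + round(frac * len(ids))
        out[name] = ids[start:end]
        start = end
    out[names[-1]].extend(ids[start:])  # absorb any rounding remainder
    return out

parts = random_splits([f"t{i}" for i in range(10)], [0.8, 0.2],
                      split_names=["train", "test"])
print(len(parts["train"]), len(parts["test"]))
```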
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_artist(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_key.load_artist
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_key.load_audio
- load_genre(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_key.load_genre
- load_key(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_key.load_key
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_tempo(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_key.load_tempo
- class mirdata.datasets.giantsteps_key.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
giantsteps_key track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – track audio path
keys_path (str) – key annotation path
metadata_path (str) – sections annotation path
title (str) – title of the track
track_id (str) – track id
- Other Parameters
key (str) – musical key annotation
artists (list) – list of artists involved
genres (dict) – genres and subgenres
tempo (int) – crowdsourced tempo annotations in beats per minute
- property audio: Tuple[numpy.ndarray, float]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.giantsteps_key.load_artist(fhandle: TextIO) List[str] [source]¶
Load giantsteps_key artist data from a file
- Parameters
fhandle (str or file-like) – File-like object or path pointing to metadata annotation file
- Returns
list – list of artists involved in the track.
- mirdata.datasets.giantsteps_key.load_audio(fpath: str) Tuple[numpy.ndarray, float] [source]¶
Load a giantsteps_key audio file.
- Parameters
fpath (str) – str pointing to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.giantsteps_key.load_genre(fhandle: TextIO) Dict[str, List[str]] [source]¶
Load giantsteps_key genre data from a file
- Parameters
fhandle (str or file-like) – File-like object or path pointing to metadata annotation file
- Returns
dict – {‘genres’: […], ‘subgenres’: […]}
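The load_key loader above returns a musical key annotation as a string. A small sketch of splitting such a string into tonic and mode; the "<tonic> <mode>" one-line format (e.g. "d minor") is an assumption based on typical key-annotation files, not a guarantee of this loader:

```python
# Sketch: split a key annotation string such as "d minor" into tonic and
# mode. The file format is an assumption, not documented behavior.

def parse_key(key_string):
    parts = key_string.strip().split()
    tonic = parts[0].upper() if parts else None
    mode = parts[1].lower() if len(parts) > 1 else None
    return tonic, mode

print(parse_key("d minor"))
```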
giantsteps_tempo¶
giantsteps_tempo Dataset Loader
Dataset Info
GiantSteps tempo + genre is a collection of annotations for 664 two-minute audio previews from www.beatport.com, created by Richard Vogl <richard.vogl@tuwien.ac.at> and Peter Knees <peter.knees@tuwien.ac.at>.
references:
- giantsteps_tempo_cit_1
Peter Knees, Ángel Faraldo, Perfecto Herrera, Richard Vogl, Sebastian Böck, Florian Hörschläger, Mickael Le Goff: “Two data sets for tempo estimation and key detection in electronic dance music annotated from user corrections”, Proc. of the 16th Conference of the International Society for Music Information Retrieval (ISMIR’15), Oct. 2015, Malaga, Spain.
- giantsteps_tempo_cit_2
Hendrik Schreiber, Meinard Müller: “A Crowdsourced Experiment for Tempo Estimation of Electronic Dance Music”, Proc. of the 19th Conference of the International Society for Music Information Retrieval (ISMIR’18), Sept. 2018, Paris, France.
The audio files (664 files, ~1 GB) can be downloaded from http://www.beatport.com/ using the bash script:
https://github.com/GiantSteps/giantsteps-tempo-dataset/blob/master/audio_dl.sh
To download the files manually use links of the following form: http://geo-samples.beatport.com/lofi/<name of mp3 file> e.g.: http://geo-samples.beatport.com/lofi/5377710.LOFI.mp3
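The manual-download URL pattern quoted above can be built with plain string formatting; the helper name below is illustrative:

```python
# Sketch: construct the Beatport preview URL pattern quoted in the text
# from an mp3 filename.

def preview_url(mp3_name: str) -> str:
    return f"http://geo-samples.beatport.com/lofi/{mp3_name}"

print(preview_url("5377710.LOFI.mp3"))
# prints http://geo-samples.beatport.com/lofi/5377710.LOFI.mp3
```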
To convert the audio files to .wav use the script found at https://github.com/GiantSteps/giantsteps-tempo-dataset/blob/master/convert_audio.sh and run:
./convert_audio.sh
To retrieve the genre information, the JSON contained within the website was parsed. The tempo annotation was extracted from forum entries of people correcting the bpm values (i.e. manual annotation of tempo). For more information please refer to the publication [giantsteps_tempo_cit_1].
[giantsteps_tempo_cit_2] found some files without tempo. These are:
3041381.LOFI.mp3
3041383.LOFI.mp3
1327052.LOFI.mp3
Their v2 tempo is denoted as 0.0 in the tempo and mirex formats, and they have no annotation in the JAMS format.
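Since a v2 tempo of 0.0 means "no annotation", downstream code typically filters such tracks out. A minimal sketch of that convention (the (track_id, tempo) pairs below are made up for illustration):

```python
# Sketch: drop tracks whose annotated tempo is stored as 0.0, following the
# "0.0 means no annotation" convention described above.

def tracks_with_tempo(tempos):
    """Keep only entries whose annotated tempo is nonzero."""
    return {tid: bpm for tid, bpm in tempos.items() if bpm > 0.0}

tempos = {"3041381.LOFI": 0.0, "906760.LOFI": 128.0}
print(tracks_with_tempo(tempos))
```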
Most of the audio files are 120 seconds long. Exceptions are:
name length (sec)
906760.LOFI.mp3 62
1327052.LOFI.mp3 70
4416506.LOFI.mp3 80
1855660.LOFI.mp3 119
3419452.LOFI.mp3 119
3577631.LOFI.mp3 119
- class mirdata.datasets.giantsteps_tempo.Dataset(data_home=None, version='default')[source]¶
The giantsteps_tempo dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. When True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_tempo.load_audio
- load_genre(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_tempo.load_genre
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_tempo(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_tempo.load_tempo
- class mirdata.datasets.giantsteps_tempo.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
giantsteps_tempo track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – track audio path
title (str) – title of the track
track_id (str) – track id
annotation_v1_path (str) – track annotation v1 path
annotation_v2_path (str) – track annotation v2 path
- Other Parameters
genre (dict) – Human-labeled metadata annotation
tempo (list) – List of annotations.TempoData, ordered by confidence
tempo_v2 (list) – List of annotations.TempoData for version 2, ordered by confidence
- property audio: Tuple[numpy.ndarray, float]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- mirdata.datasets.giantsteps_tempo.load_audio(fhandle: str) Tuple[numpy.ndarray, float] [source]¶
Load a giantsteps_tempo audio file.
- Parameters
fhandle (str or file-like) – path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.giantsteps_tempo.load_genre(fhandle: TextIO) str [source]¶
Load genre data from a file
- Parameters
fhandle (str or file-like) – File-like object or path to metadata annotation file
- Returns
str – loaded genre data
- mirdata.datasets.giantsteps_tempo.load_tempo(fhandle: TextIO) mirdata.annotations.TempoData [source]¶
Load giantsteps_tempo tempo data from a file ordered by confidence
- Parameters
fhandle (str or file-like) – File-like object or path to tempo annotation file
- Returns
annotations.TempoData – Tempo data
good_sounds¶
Good-Sounds Dataset Loader
Dataset Info
The Good-Sounds dataset was born of the collaboration between the Music Technology Group and Korg. Good-Sounds was recorded as a training dataset of single-note excerpts, including six classes of sounds per studied instrument. Twelve different instruments were recorded. For each instrument, the complete range of playable semitones is captured several times with various tonal characteristics. There are two classes: good and bad sounds. Bad sounds are divided into five sub-classes, one for each musical dimension stated by the expert musicians, and consist of note recordings that are intentionally badly played. The good class includes note recordings that are considered to be well played.
This dataset was created in the context of the Pablo project, partially funded by KORG Inc. It contains monophonic recordings of two kinds of exercises: single notes and scales. The recordings were made in the Universitat Pompeu Fabra / Phonos recording studio by 15 different professional musicians, all of them holding a music degree and having some expertise in teaching. 12 different instruments were recorded using between one and four microphones (depending on the recording session). For all the instruments, the whole set of playable semitones on the instrument is recorded several times with different tonal characteristics. Each note is recorded into a separate mono .flac audio file at 48 kHz and 32 bits. The tonal characteristics are explained both in the following section and in the related publication. The database is meant to organize the sounds in a handy way and is organised in four different entities: sounds, takes, packs and ratings.
- class mirdata.datasets.good_sounds.Dataset(data_home=None, version='default')[source]¶
The GOOD-SOUNDS dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. When True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.good_sounds.load_audio
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.good_sounds.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
GOOD-SOUNDS Track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – Path to the audio file
- Other Parameters
ratings_info (dict) – A dictionary containing the entity Ratings. Some musicians self-rated their performance on a 0-10 goodness scale for the user evaluation of the first project prototype. Please read the paper for more detailed information. The fields are:
id
mark: the rate or score.
type: the klass of the sound. Related to the tags of the sound.
created_at
comments
sound_id
rater: the musician who rated the sound.
pack_info (dict) – A dictionary containing the entity Pack. A pack is a group of sounds from the same recording session. The audio files are organised in the sound_files directory in subfolders with the pack name to which they belong. The following metadata is associated with the entity Pack. - id - name - description
sound_info (dict) – A dictionary containing the entity Sound. A sound can have several takes, as some of them were recorded using different microphones at the same time. The following metadata is associated with the entity Sound. - id - instrument: flute, cello, clarinet, trumpet, violin, sax_alto, sax_tenor, sax_baritone, sax_soprano, oboe, piccolo, bass - note - octave - dynamics: for some sounds, the musical notation of the loudness level (p, mf, f, ...) - recorded_at: recording date and time - location: recording place - player: the musician who recorded. For detailed information about the musicians contact us. - bow_velocity: for some string instruments, the velocity of the bow (slow, medium, fast) - bridge_position: for some string instruments, the position of the bow (tasto, middle, ponticello) - string: for some string instruments, the number of the string on which the sound is played (1: lowest in pitch) - csv_file: used for creation of the DB - csv_id: used for creation of the DB - pack_filename: used for creation of the DB - pack_id: used for creation of the DB - attack: for single notes, manual annotation of the onset in samples - decay: for single notes, manual annotation of the decay in samples - sustain: for single notes, manual annotation of the beginning of the sustained part in samples - release: for single notes, manual annotation of the beginning of the release part in samples - offset: for single notes, manual annotation of the offset in samples - reference: 1 if the sound was used to create the models in the good-sounds project, 0 if not - Other tags regarding tonal characteristics are also available. - comments: if any - semitone: midi note - pitch_reference: the reference pitch - klass: user-generated tags of the tonal qualities of the sound. They also contain information about the exercise, which could be a single note or a scale.
* “good-sound”: good examples of single notes * “bad”: bad examples of one of the sound attributes defined in the project (please read the papers for a detailed explanation) * “scale-good”: good example of a one-octave scale going up and down (15 notes). If the scale is minor, a tag “minor” is also present. * “scale-bad”: bad example of a scale for one of the sound attributes defined in the project (15 notes up and down).
take_info (dict) – A dictionary containing the entity Take. A sound can have several takes, as some of them were recorded using different microphones at the same time. Each take has an associated audio file and its annotations. The following metadata is associated with the entity Take. - id - microphone - filename: the name of the associated audio file - original_filename - freesound_id: for some sounds uploaded to freesound.org - sound_id: the id of the sound in the DB - goodsound_id: for some of the sounds available in good-sounds.org
microphone (str) – the microphone used to record the take.
instrument (str) – the instrument recorded (flute, cello, clarinet, trumpet, violin, sax_alto, sax_tenor, sax_baritone, sax_soprano, oboe, piccolo, bass).
klass (str) – user-generated tags of the tonal qualities of the sound. They also contain information about the exercise, which could be a single note or a scale. * “good-sound”: good examples of single notes * “bad”: bad examples of one of the sound attributes defined in the project (please read the papers for a detailed explanation) * “scale-good”: good example of a one-octave scale going up and down (15 notes). If the scale is minor, a tag “minor” is also present. * “scale-bad”: bad example of a scale for one of the sound attributes defined in the project (15 notes up and down).
semitone (int) – midi note
pitch_reference (int) – the reference pitch
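The klass attribute above is a space-separated tag string. A small sketch of interpreting it; the tag vocabulary ("good-sound", "bad", "scale-good", "scale-bad", "minor") is taken from the text, while the helper itself is illustrative:

```python
# Sketch: interpret a Good-Sounds ``klass`` tag string as documented above.
# The tag names come from the text; this helper is not part of the loader.

def interpret_klass(klass: str):
    tags = klass.split()
    return {
        "scale": any(t.startswith("scale-") for t in tags),
        "good": "good-sound" in tags or "scale-good" in tags,
        "minor": "minor" in tags,
    }

print(interpret_klass("scale-good minor"))
```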
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.good_sounds.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load a GOOD-SOUNDS audio file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
groove_midi¶
Groove MIDI Loader
Dataset Info
The Groove MIDI Dataset (GMD) is composed of 13.6 hours of aligned MIDI and synthesized audio of human-performed, tempo-aligned expressive drumming. The dataset contains 1,150 MIDI files and over 22,000 measures of drumming.
To enable a wide range of experiments and encourage comparisons between methods on the same data, Gillick et al. created a new dataset of drum performances recorded in MIDI format. They hired professional drummers and asked them to perform in multiple styles to a click track on a Roland TD-11 electronic drum kit. They also recorded the aligned, high-quality synthesized audio from the TD-11 and include it in the release.
The Groove MIDI Dataset (GMD) has several attributes that distinguish it from existing ones:
The dataset contains about 13.6 hours, 1,150 MIDI files, and over 22,000 measures of drumming.
Each performance was played along with a metronome set at a specific tempo by the drummer.
The data includes performances by a total of 10 drummers, with more than 80% of duration coming from hired professionals. The professionals were able to improvise in a wide range of styles, resulting in a diverse dataset.
The drummers were instructed to play a mix of long sequences (several minutes of continuous playing) and short beats and fills.
Each performance is annotated with a genre (provided by the drummer), tempo, and anonymized drummer ID.
Most of the performances are in 4/4 time, with a few examples from other time signatures.
Four drummers were asked to record the same set of 10 beats in their own style. These are included in the test set split, labeled eval-session/groove1-10.
In addition to the MIDI recordings that are the primary source of data for the experiments in this work, the authors captured the synthesized audio outputs of the drum set and aligned them to within 2ms of the corresponding MIDI files.
A train/validation/test split configuration is provided for easier comparison of model accuracy on various tasks.
The dataset is made available by Google LLC under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.
For more details, please visit: http://magenta.tensorflow.org/datasets/groove
- class mirdata.datasets.groove_midi.Dataset(data_home=None, version='default')[source]¶
The groove_midi dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. When True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.groove_midi.load_audio
- load_beats(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.groove_midi.load_beats
- load_drum_events(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.groove_midi.load_drum_events
- load_midi(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.groove_midi.load_midi
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.groove_midi.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Groove MIDI Track class
- Parameters
track_id (str) – track id of the track
- Variables
drummer (str) – Drummer id of the track (ex. ‘drummer1’)
session (str) – Type of session (ex. ‘session1’, ‘eval_session’)
track_id (str) – track id of the track (ex. ‘drummer1/eval_session/1’)
style (str) – Style (genre, groove type) of the track (ex. ‘funk/groove1’)
tempo (int) – track tempo in beats per minute (ex. 138)
beat_type (str) – Whether the track is a beat or a fill (ex. ‘beat’)
time_signature (str) – Time signature of the track (ex. ‘4-4’, ‘6-8’)
midi_path (str) – Path to the midi file
audio_path (str) – Path to the audio file
duration (float) – Duration of the midi file in seconds
split (str) – Whether the track is for a train/valid/test set. One of ‘train’, ‘valid’ or ‘test’.
- Other Parameters
beats (BeatData) – Machine-generated beat annotations
drum_events (EventData) – Annotated drum kit events
midi (pretty_midi.PrettyMIDI) – object containing MIDI information
- property audio: Tuple[Optional[numpy.ndarray], Optional[float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.groove_midi.load_audio(path: str) → Tuple[Optional[numpy.ndarray], Optional[float]] [source]¶
Load a Groove MIDI audio file.
- Parameters
path – path to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.groove_midi.load_beats(midi_path, midi=None)[source]¶
Load beat data from the midi file.
- Parameters
midi_path (str) – path to midi file
midi (pretty_midi.PrettyMIDI) – pre-loaded midi object, or None. If None, the midi object is loaded using midi_path
- Returns
annotations.BeatData – machine generated beat data
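For a constant-tempo track like those in Groove MIDI, machine-generated beats are just evenly spaced timestamps. The helper below is an illustrative sketch of that idea, not mirdata's actual load_beats (which reads beat times from the MIDI via pretty_midi):

```python
def beat_times(tempo_bpm, duration, start=0.0):
    """Generate beat timestamps (seconds) for a constant-tempo track."""
    period = 60.0 / tempo_bpm  # seconds per beat
    times = []
    t = start
    while t < duration:
        times.append(round(t, 6))
        t += period
    return times

beats = beat_times(120, 2.0)  # 120 BPM over a 2-second excerpt
```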
- mirdata.datasets.groove_midi.load_drum_events(midi_path, midi=None)[source]¶
Load drum events from the midi file.
- Parameters
midi_path (str) – path to midi file
midi (pretty_midi.PrettyMIDI) – pre-loaded midi object, or None. If None, the midi object is loaded using midi_path
- Returns
annotations.EventData – drum event data
- mirdata.datasets.groove_midi.load_midi(fhandle: BinaryIO) → Optional[pretty_midi.PrettyMIDI] [source]¶
Load a Groove MIDI midi file.
- Parameters
fhandle (str or file-like) – File-like object or path to midi file
- Returns
midi_data (pretty_midi.PrettyMIDI) – pretty_midi object
gtzan_genre¶
GTZAN-Genre Dataset Loader
Dataset Info
This dataset was used for the well-known genre classification paper:
"Musical genre classification of audio signals" by G. Tzanetakis and
P. Cook in IEEE Transactions on Speech and Audio Processing, 2002.
The dataset consists of 1000 audio tracks each 30 seconds long. It contains 10 genres, each represented by 100 tracks. The tracks are all 22050 Hz mono 16-bit audio files in .wav format.
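The 22050 Hz mono 16-bit format can be checked on any downloaded file with the standard library's wave module; the tiny in-memory file below is only a stand-in for a real GTZAN track:

```python
import io
import wave

def wav_params(data: bytes):
    """Read (sample rate, channels, sample width in bytes) from WAV bytes."""
    with wave.open(io.BytesIO(data)) as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()

# Build a minimal WAV in GTZAN's format (22050 Hz, mono, 16-bit) to illustrate:
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(22050)
    w.writeframes(b"\x00\x00" * 100)  # 100 silent samples

rate, channels, width = wav_params(buf.getvalue())
```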
- class mirdata.datasets.gtzan_genre.Dataset(data_home=None, version='default')[source]¶
The gtzan_genre dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind proxies that inspect downloaded content; if a checksum differs, a warning is issued instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.gtzan_genre.load_audio
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.gtzan_genre.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
gtzan_genre Track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – path to the audio file
genre (str) – annotated genre
track_id (str) – track id
- Other Parameters
beats (BeatData) – human-labeled beat annotations
tempo (float) – global tempo annotations
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.gtzan_genre.load_audio(fhandle: BinaryIO) → Tuple[numpy.ndarray, float] [source]¶
Load a GTZAN audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.gtzan_genre.load_beats(fhandle: TextIO) → mirdata.annotations.BeatData [source]¶
Load GTZAN format beat data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a beat annotation file
- Returns
BeatData – loaded beat data
guitarset¶
GuitarSet Loader
Dataset Info
GuitarSet provides audio recordings of a variety of musical excerpts played on an acoustic guitar, along with time-aligned annotations including pitch contours, string and fret positions, chords, beats, downbeats, and keys.
GuitarSet contains 360 excerpts that are close to 30 seconds in length. The 360 excerpts are the result of the following combinations:
6 players
2 versions: comping (harmonic accompaniment) and soloing (melodic improvisation)
5 styles: Rock, Singer-Songwriter, Bossa Nova, Jazz, and Funk
3 Progressions: 12 Bar Blues, Autumn Leaves, and Pachelbel Canon.
2 Tempi: slow and fast.
The tonality (key) of each excerpt is sampled uniformly at random.
GuitarSet was recorded with the help of a hexaphonic pickup, which outputs signals for each string separately, allowing automated note-level annotation. Excerpts are recorded with both the hexaphonic pickup and a Neumann U-87 condenser microphone as reference. Four audio recordings are provided with each excerpt, with the following suffixes:
hex: original 6 channel wave file from hexaphonic pickup
hex_cln: hex wave files with interference removal applied
mic: monophonic recording from reference microphone
mix: monophonic mixture of original 6 channel file
Each of the 360 excerpts has an accompanying JAMS file which stores 16 annotations.
Pitch:
6 pitch_contour annotations (1 per string)
6 midi_note annotations (1 per string)
Beat and Tempo:
1 beat_position annotation
1 tempo annotation
Chords:
2 chord annotations: instructed and performed. The instructed chord annotation is a digital version of the lead sheet that’s provided to the player, and the performed chord annotations are inferred from note annotations, using segmentation and root from the digital lead sheet annotation.
For more details, please visit: http://github.com/marl/guitarset/
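JAMS files are JSON, so the 16 annotations can be grouped by namespace with the standard library. The structure below is heavily simplified (real GuitarSet JAMS files follow the full JAMS schema, with observation lists and metadata), and the helper is an illustrative assumption, not a mirdata API:

```python
import json

# A simplified stand-in for a GuitarSet JAMS file: six per-string
# pitch_contour annotations plus a tempo annotation.
jams_text = json.dumps({
    "annotations": [
        {"namespace": "pitch_contour", "annotation_metadata": {"data_source": str(s)}, "data": []}
        for s in range(6)
    ] + [{"namespace": "tempo", "data": [{"value": 120.0}]}]
})

def annotations_by_namespace(jams_str, namespace):
    """Collect all annotations of one namespace from a JAMS JSON string."""
    doc = json.loads(jams_str)
    return [a for a in doc["annotations"] if a["namespace"] == namespace]

contours = annotations_by_namespace(jams_text, "pitch_contour")
```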
- class mirdata.datasets.guitarset.Dataset(data_home=None, version='default')[source]¶
The guitarset dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind proxies that inspect downloaded content; if a checksum differs, a warning is issued instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_audio
- load_beats(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_beats
- load_chords(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_chords
- load_key_mode(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_key_mode
- load_multitrack_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_multitrack_audio
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_notes(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_notes
- load_pitch_contour(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_pitch_contour
- class mirdata.datasets.guitarset.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
guitarset Track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_hex_cln_path (str) – path to the debleeded hex wave file
audio_hex_path (str) – path to the original hex wave file
audio_mic_path (str) – path to the mono wave via microphone
audio_mix_path (str) – path to the mono wave via downmixing hex pickup
jams_path (str) – path to the jams file
mode (str) – one of [‘solo’, ‘comp’]. For each excerpt, players are asked to first play in ‘comp’ mode and later play a ‘solo’ version on top of the already recorded comp.
player_id (str) – ID of the different players. one of [‘00’, ‘01’, … , ‘05’]
style (str) – one of [‘Jazz’, ‘Bossa Nova’, ‘Rock’, ‘Singer-Songwriter’, ‘Funk’]
tempo (float) – BPM of the track
track_id (str) – track id
- Other Parameters
beats (BeatData) – beat positions
leadsheet_chords (ChordData) – chords as written in the leadsheet
inferred_chords (ChordData) – chords inferred from played transcription
key_mode (KeyData) – key and mode
pitch_contours (dict) – Pitch contours per string - ‘E’: F0Data(…) - ‘A’: F0Data(…) - ‘D’: F0Data(…) - ‘G’: F0Data(…) - ‘B’: F0Data(…) - ‘e’: F0Data(…)
multif0 (MultiF0Data) – all pitch contour data as one multif0 annotation
notes (dict) – Notes per string - ‘E’: NoteData(…) - ‘A’: NoteData(…) - ‘D’: NoteData(…) - ‘G’: NoteData(…) - ‘B’: NoteData(…) - ‘e’: NoteData(…)
notes_all (NoteData) – all note data as one note annotation
- property audio_hex: Optional[Tuple[numpy.ndarray, float]]¶
Hexaphonic audio (6-channels) with one channel per string
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_hex_cln: Optional[Tuple[numpy.ndarray, float]]¶
Hexaphonic audio (6-channels) with one channel per string, after bleed removal
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_mic: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_mix: Optional[Tuple[numpy.ndarray, float]]¶
Mixture audio (mono)
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.guitarset.load_audio(fhandle: BinaryIO) → Tuple[numpy.ndarray, float] [source]¶
Load a Guitarset audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.guitarset.load_beats(fhandle: TextIO) → mirdata.annotations.BeatData [source]¶
Load a Guitarset beats annotation.
- Parameters
fhandle (str or file-like) – File-like object or path of the jams annotation file
- Returns
BeatData – Beat data
- mirdata.datasets.guitarset.load_chords(jams_path, leadsheet_version)[source]¶
Load a guitarset chord annotation.
- Parameters
jams_path (str) – path to the jams annotation file
leadsheet_version (bool) – Whether to load the leadsheet version of the chord annotation. If False, the inferred version is loaded.
- Returns
ChordData – Chord data
- mirdata.datasets.guitarset.load_key_mode(fhandle: TextIO) → mirdata.annotations.KeyData [source]¶
Load a Guitarset key-mode annotation.
- Parameters
fhandle (str or file-like) – File-like object or path of the jams annotation file
- Returns
KeyData – Key data
- mirdata.datasets.guitarset.load_multitrack_audio(fhandle: BinaryIO) → Tuple[numpy.ndarray, float] [source]¶
Load a Guitarset multitrack audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.guitarset.load_notes(jams_path, string_num)[source]¶
Load a guitarset note annotation for a given string
- Parameters
jams_path (str) – path to the jams annotation file
string_num (int), in range(6) – Which string to load. 0 is the Low E string, 5 is the high e string.
- Returns
NoteData – Note data for the given string
- mirdata.datasets.guitarset.load_pitch_contour(jams_path, string_num)[source]¶
Load a guitarset pitch contour annotation for a given string
- Parameters
jams_path (str) – path to the jams annotation file
string_num (int), in range(6) – Which string to load. 0 is the Low E string, 5 is the high e string.
- Returns
F0Data – Pitch contour data for the given string
haydn_op20¶
haydn op20 Dataset Loader
Dataset Info
This dataset accompanies the Master's thesis by Nestor Napoles. It is a manually annotated corpus of harmonic analyses in harm syntax.
This dataset contains 30 pieces composed by Joseph Haydn in symbolic format, which have each been manually annotated with harmonic analyses.
- class mirdata.datasets.haydn_op20.Dataset(data_home=None, version='default')[source]¶
The haydn op20 dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind proxies that inspect downloaded content; if a checksum differs, a warning is issued instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_chords(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.load_chords
- load_chords_music21(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.load_chords_music21
- load_key(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.load_key
- load_key_music21(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.load_key_music21
- load_midi_path(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.convert_and_save_to_midi
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_roman_numerals(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.load_roman_numerals
- load_score(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.load_score
- class mirdata.datasets.haydn_op20.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
haydn op20 track class
- Parameters
track_id (str) – track id of the track
- Variables
title (str) – title of the track
track_id (str) – track id
humdrum_annotated_path (str) – path to humdrum annotated score
- Other Parameters
keys (KeyData) – annotated local keys.
keys_music21 (list) – annotated local keys.
roman_numerals (list) – annotated roman_numerals.
chords (ChordData) – annotated chords.
chords_music21 (list) – annotated chords.
duration (int) – relative duration
midi_path (str) – path to midi
score (music21.stream.Score) – music21 score
- mirdata.datasets.haydn_op20.convert_and_save_to_midi(fpath: TextIO)[source]¶
Convert to a MIDI file and return the MIDI path
- Parameters
fpath (str or file-like) – path to score file
- Returns
str – midi file path
Deprecated since version 0.3.4: convert_and_save_to_midi is deprecated and will be removed in a future version
- mirdata.datasets.haydn_op20.load_chords(fhandle: TextIO, resolution: int = 28)[source]¶
Load haydn op20 chords data from a file
- Parameters
fhandle (str or file-like) – path to chord annotations
resolution (int) – the number of pulses, or ticks, per quarter note (PPQ)
- Returns
ChordData – chord annotations
- mirdata.datasets.haydn_op20.load_chords_music21(fhandle: TextIO, resolution: int = 28)[source]¶
Load haydn op20 chords data from a file in music21 format
- Parameters
fhandle (str or file-like) – path to chord annotations
resolution (int) – the number of pulses, or ticks, per quarter note (PPQ)
- Returns
list – musical chords data and relative time (offset (Music21Object.offset) * resolution) [(time in PPQ, chord)]
- mirdata.datasets.haydn_op20.load_key(fhandle: TextIO, resolution=28)[source]¶
Load haydn op20 key data from a file
- Parameters
fhandle (str or file-like) – path to key annotations
resolution (int) – the number of pulses, or ticks, per quarter note (PPQ)
- Returns
KeyData – loaded key data
- mirdata.datasets.haydn_op20.load_key_music21(fhandle: TextIO, resolution=28)[source]¶
Load haydn op20 key data from a file in music21 format
- Parameters
fhandle (str or file-like) – path to key annotations
resolution (int) – the number of pulses, or ticks, per quarter note (PPQ)
- Returns
list – musical key data and relative time (offset (Music21Object.offset) * resolution) [(time in PPQ, local key)]
- mirdata.datasets.haydn_op20.load_roman_numerals(fhandle: TextIO, resolution=28)[source]¶
Load haydn op20 roman numerals data from a file
- Parameters
fhandle (str or file-like) – path to roman numeral annotations
resolution (int) – the number of pulses, or ticks, per quarter note (PPQ)
- Returns
list – musical roman numerals data and relative time (offset (Music21Object.offset) * resolution) [(time in PPQ, roman numerals)]
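The time values returned by the music21-format loaders above are computed as offset * resolution (quarter-note offsets converted to PPQ ticks, with a default resolution of 28). A minimal sketch of that conversion, with an illustrative helper name and example labels:

```python
def to_ticks(events, resolution=28):
    """Convert (offset_in_quarter_notes, label) pairs to (time_in_PPQ, label)."""
    return [(int(offset * resolution), label) for offset, label in events]

# e.g. a tonic at offset 0, a dominant at quarter-note offset 4.0,
# and a dominant seventh at offset 4.5:
ticks = to_ticks([(0.0, "I"), (4.0, "V"), (4.5, "V7")])
```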
ikala¶
iKala Dataset Loader
Dataset Info
The iKala dataset consists of 252 30-second excerpts sampled from 206 iKala songs (plus 100 hidden excerpts reserved for MIREX). The music accompaniment and the singing voice are recorded in the left and right channels respectively and can be found under the Wavfile directory. In addition, the human-labeled pitch contours and timestamped lyrics can be found under PitchLabel and Lyrics respectively.
For more details, please visit: http://mac.citi.sinica.edu.tw/ikala/
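mirdata's loaders (load_instrumental_audio, load_vocal_audio) handle the channel separation for you; the sketch below only illustrates the left/right convention described above. The helper is hypothetical, and real audio would be numpy arrays rather than lists of sample pairs:

```python
def split_ikala_channels(stereo_frames):
    """Separate an iKala stereo signal: accompaniment left, vocals right.

    stereo_frames: sequence of (left, right) sample pairs.
    """
    instrumental = [left for left, _ in stereo_frames]   # left channel: accompaniment
    vocals = [right for _, right in stereo_frames]       # right channel: singing voice
    return instrumental, vocals

frames = [(0.1, 0.0), (0.2, 0.5), (0.0, 0.4)]
inst, vox = split_ikala_channels(frames)
```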
- class mirdata.datasets.ikala.Dataset(data_home=None, version='default')[source]¶
The ikala dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind proxies that inspect downloaded content; if a checksum differs, a warning is issued instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_f0(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.ikala.load_f0
- load_instrumental_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.ikala.load_instrumental_audio
- load_lyrics(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.ikala.load_lyrics
- load_mix_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.ikala.load_mix_audio
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_notes(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.ikala.load_notes
- load_pronunciations(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.ikala.load_pronunciations
- load_tracks()[source]¶
Load all tracks in the dataset
- Returns
dict – {track_id: track data}
- Raises
NotImplementedError – If the dataset does not support Tracks
- class mirdata.datasets.ikala.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
ikala Track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – path to the track’s audio file
f0_path (str) – path to the track’s f0 annotation file
notes_pyin_path (str) – path to the note annotation file
lyrics_path (str) – path to the track’s lyric annotation file
section (str) – section. Either ‘verse’ or ‘chorus’
singer_id (str) – singer id
song_id (str) – song id of the track
track_id (str) – track id
- Other Parameters
f0 (F0Data) – human-annotated singing voice pitch
notes_pyin (NoteData) – notes estimated by the pyin algorithm
lyrics (LyricData) – human-annotated lyrics
pronunciations (LyricData) – human-annotated lyric pronunciations
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- property instrumental_audio: Optional[Tuple[numpy.ndarray, float]]¶
instrumental audio (mono)
- Returns
np.ndarray - audio signal
float - sample rate
- property mix_audio: Optional[Tuple[numpy.ndarray, float]]¶
mixture audio (mono)
- Returns
np.ndarray - audio signal
float - sample rate
- to_jams()[source]¶
Get the track’s data in jams format
- Returns
jams.JAMS – the track’s data in jams format
- property vocal_audio: Optional[Tuple[numpy.ndarray, float]]¶
solo vocal audio (mono)
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.ikala.load_f0(fhandle: TextIO) mirdata.annotations.F0Data [source]¶
Load an ikala f0 annotation
- Parameters
fhandle (str or file-like) – File-like object or path to f0 annotation file
- Raises
IOError – If f0_path does not exist
- Returns
F0Data – the f0 annotation data
- mirdata.datasets.ikala.load_instrumental_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load ikala instrumental audio
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.ikala.load_lyrics(fhandle: TextIO) mirdata.annotations.LyricData [source]¶
Load an ikala lyrics annotation
- Parameters
fhandle (str or file-like) – File-like object or path to lyric annotation file
- Raises
IOError – if lyrics_path does not exist
- Returns
LyricData – lyric annotation data
- mirdata.datasets.ikala.load_mix_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load an ikala mix.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.ikala.load_notes(fhandle: TextIO) Optional[mirdata.annotations.NoteData] [source]¶
Load a note annotation file
- Parameters
fhandle (str or file-like) – str or file-like to note annotation file
- Raises
IOError – if file doesn’t exist
- Returns
NoteData – note annotation
- mirdata.datasets.ikala.load_pronunciations(fhandle: TextIO) mirdata.annotations.LyricData [source]¶
Load an ikala pronunciation annotation
- Parameters
fhandle (str or file-like) – File-like object or path to lyric annotation file
- Raises
IOError – if lyrics_path does not exist
- Returns
LyricData – pronunciation annotation data
- mirdata.datasets.ikala.load_vocal_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load ikala vocal audio
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - audio signal
float - sample rate
irmas¶
IRMAS Loader
Dataset Info
IRMAS: a dataset for instrument recognition in musical audio signals
This dataset includes musical audio excerpts with annotations of the predominant instrument(s) present. It was used for the evaluation in the following article:
Bosch, J. J., Janer, J., Fuhrmann, F., & Herrera, P. “A Comparison of Sound Segregation Techniques for
Predominant Instrument Recognition in Musical Audio Signals”, in Proc. ISMIR (pp. 559-564), 2012.
IRMAS is intended to be used for training and testing methods for the automatic recognition of predominant instruments in musical audio. The instruments considered are: cello, clarinet, flute, acoustic guitar, electric guitar, organ, piano, saxophone, trumpet, violin, and human singing voice. This dataset is derived from the one compiled by Ferdinand Fuhrmann in his PhD thesis, with the differences that we provide audio data in stereo format, the annotations in the testing dataset are limited to specific pitched instruments, and the number and length of excerpts differ from the original dataset.
The dataset is split into training and test data.
Training data
Total audio samples: 6705. They are 3-second excerpts from more than 2000 distinct recordings.
Audio specifications
Sampling frequency: 44.1 kHz
Bit-depth: 16 bit
Audio format: .wav
IRMAS Dataset training samples are annotated by encoding each track's information in its filename.
Predominant instrument:
The annotation of the predominant instrument of each excerpt is both in the name of the containing folder, and in the file name: cello (cel), clarinet (cla), flute (flu), acoustic guitar (gac), electric guitar (gel), organ (org), piano (pia), saxophone (sax), trumpet (tru), violin (vio), and human singing voice (voi).
The number of files per instrument are: cel(388), cla(505), flu(451), gac(637), gel(760), org(682), pia(721), sax(626), tru(577), vio(580), voi(778).
Drum presence
Additionally, some of the files have an annotation in the filename indicating the presence ([dru]) or absence ([nod]) of drums.
The annotation of the musical genre:
country-folk ([cou_fol])
classical ([cla])
pop-rock ([pop_roc])
latin-soul ([lat_sou])
jazz-blues ([jaz_blu])
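The bracketed filename tags described above can be parsed with a simple regular expression. The filename below is hypothetical, built only to follow the stated convention; it is not taken from the dataset:

```python
import re

INSTRUMENTS = {"cel", "cla", "flu", "gac", "gel", "org", "pia", "sax", "tru", "vio", "voi"}
DRUM_TAGS = {"dru", "nod"}

def parse_training_filename(name):
    # Extract every bracketed tag from the filename
    tags = re.findall(r"\[([^\]]+)\]", name)
    return {
        "instrument": [t for t in tags if t in INSTRUMENTS],
        # True if [dru], False if [nod], None if not annotated
        "drums": next((t == "dru" for t in tags if t in DRUM_TAGS), None),
        # Genre codes contain an underscore; note that classical ([cla])
        # shares its code with clarinet and is not disambiguated in this sketch
        "genre": next((t for t in tags if "_" in t), None),
    }

parse_training_filename("[gel][dru][pop_roc]0123__1.wav")
# {'instrument': ['gel'], 'drums': True, 'genre': 'pop_roc'}
```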
Testing data
Total audio samples: 2874
Audio specifications
Sampling frequency: 44.1 kHz
Bit-depth: 16 bit
Audio format: .wav
IRMAS Dataset testing samples are annotated as follows:
Predominant instrument:
The annotations for an excerpt named “excerptName.wav” are given in “excerptName.txt”. More than one instrument may be annotated in each excerpt, one label per line. This part of the dataset contains excerpts from a diversity of western musical genres with varied instrumentations, and it is derived from the original testing dataset from Fuhrmann (http://www.dtic.upf.edu/~ffuhrmann/PhD/). Instrument nomenclature is the same as in the training dataset.
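A test annotation file as described, with one instrument label per line, can be read with a few lines of Python; the labels below are hypothetical:

```python
import io

def load_test_labels(fhandle):
    # One instrument label per line; skip blank lines and stray whitespace
    return [line.strip() for line in fhandle if line.strip()]

# Stand-in for opening a real "excerptName.txt"
load_test_labels(io.StringIO("sax\npia\n"))  # ['sax', 'pia']
```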
Dataset compiled by Juan J. Bosch, Ferdinand Fuhrmann, Perfecto Herrera, Music Technology Group - Universitat Pompeu Fabra (Barcelona).
The IRMAS dataset is offered free of charge for non-commercial use only. You cannot redistribute or modify it. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
For more details, please visit: https://www.upf.edu/web/mtg/irmas
- class mirdata.datasets.irmas.Dataset(data_home=None, version='default')[source]¶
The irmas dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This is sometimes useful behind proxies that inspect the downloaded data. When True, a checksum mismatch prompts a warning instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
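The random splitting described above can be sketched as follows; this is a minimal illustration of the documented behavior (fractions summing to 1, a fixed seed for reproducibility), not mirdata's internal implementation:

```python
import random

def random_track_splits(track_ids, splits, seed=42, split_names=None):
    if abs(sum(splits) - 1.0) > 1e-8:
        raise ValueError("splits must sum to 1")
    names = split_names or [f"split_{i}" for i in range(len(splits))]
    ids = list(track_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle for reproducibility
    out, start = {}, 0
    for name, frac in zip(names, splits):
        end = start + round(frac * len(ids))
        out[name] = ids[start:end]
        start = end
    out[names[-1]].extend(ids[start:])  # rounding remainder goes to the last split
    return out

partitions = random_track_splits([str(i) for i in range(10)], [0.8, 0.2],
                                 split_names=["train", "test"])
# len(partitions["train"]) == 8, len(partitions["test"]) == 2
```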
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.irmas.load_audio
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_pred_inst(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.irmas.load_pred_inst
- class mirdata.datasets.irmas.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
IRMAS track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. If None, looks for the data in the default directory, ~/mir_datasets/irmas
- Variables
track_id (str) – track id
predominant_instrument (list) – predominant instrument(s) of the track (training set only)
train (bool) – flag indicating whether the track is from the training or the testing dataset
genre (str) – code of the track's genre
drum (bool) – flag indicating whether the track contains drums
split (str) – data split (“train” or “test”)
- Other Parameters
instrument (list) – list of predominant instruments as str
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio signal
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.irmas.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load an IRMAS dataset audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
mtg_jamendo_autotagging_moodtheme¶
MTG jamendo autotagging moodtheme Dataset Loader
Dataset Info
The MTG Jamendo autotagging mood/theme Dataset is a new open dataset for music auto-tagging. It is built using music available at Jamendo under Creative Commons licenses and tags provided by content uploaders. The dataset contains 18,486 full audio tracks with 195 mood/theme tags. Five fixed data splits are provided for better and fairer replication. For more information please visit: https://github.com/MTG/mtg-jamendo-dataset .
The moodtheme tags are:
action, adventure, advertising, ambiental, background, ballad, calm, children, christmas, commercial, cool, corporate, dark, deep, documentary, drama, dramatic, dream, emotional, energetic, epic, fast, film, fun, funny, game, groovy, happy, heavy, holiday, hopeful, horror, inspiring, love, meditative, melancholic, mellow, melodic, motivational, movie, nature, party, positive, powerful, relaxing, retro, romantic, sad, sexy, slow, soft, soundscape, space, sport, summer, trailer, travel, upbeat, uplifting.
Emotion and theme recognition is a popular task in music information retrieval that is relevant for music search and recommendation systems.
This task involves the prediction of moods and themes conveyed by a music track, given the raw audio. Examples of moods and themes are: happy, dark, epic, melodic, love, film, and space. The full list is available at: https://github.com/mir-dataset-loaders/mirdata/pull/505 Each track is tagged with at least one tag that serves as ground truth.
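Multi-label ground truth like the tags above is commonly encoded as a multi-hot vector per track; a minimal sketch, using a small illustrative subset of the tag vocabulary:

```python
# The vocabulary below is a small subset of the 195 tags, chosen only
# for the example; a real system would use the full tag list.
TAGS = ["happy", "dark", "epic", "melodic", "love", "film", "space"]
TAG_INDEX = {tag: i for i, tag in enumerate(TAGS)}

def binarize(track_tags):
    # Each track has at least one tag; a tag outside the vocabulary
    # would raise a KeyError in this sketch
    vec = [0] * len(TAGS)
    for tag in track_tags:
        vec[TAG_INDEX[tag]] = 1
    return vec

binarize(["happy", "film"])  # [1, 0, 0, 0, 0, 1, 0]
```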
Acknowledgments
This work was funded by the predoctoral grant MDM-2015-0502-17-2 from the Spanish Ministry of Economy and Competitiveness linked to the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068 “MIP-Frontiers”.
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688382 “AudioCommons”.
- class mirdata.datasets.mtg_jamendo_autotagging_moodtheme.Dataset(data_home=None, version='default')[source]¶
The MTG jamendo autotagging moodtheme dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This is sometimes useful behind proxies that inspect the downloaded data. When True, a checksum mismatch prompts a warning instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]