Initializing¶
- mirdata.initialize(dataset_name, data_home=None, version='default')[source]¶
Load a mirdata dataset by name
Example
orchset = mirdata.initialize('orchset')  # get the orchset dataset
orchset.download()  # download orchset
orchset.validate()  # validate orchset
track = orchset.choice_track()  # load a random track
print(track)  # see what data a track contains
orchset.track_ids()  # load all track ids
- Parameters
dataset_name (str) – the dataset’s name see mirdata.DATASETS for a complete list of possibilities
data_home (str or None) – path where the data lives. If None uses the default location.
version (str or None) – which version of the dataset to load. If None, the default version is loaded.
- Returns
Dataset – a mirdata.core.Dataset object
Dataset Loaders¶
acousticbrainz_genre¶
Acoustic Brainz Genre dataset
Dataset Info
The AcousticBrainz Genre Dataset consists of four datasets of genre annotations and music features extracted from audio suited for evaluation of hierarchical multi-label genre classification systems.
Description about the music features can be found here: https://essentia.upf.edu/streaming_extractor_music.html
The datasets are used within the MediaEval AcousticBrainz Genre Task. The task is focused on content-based music genre recognition using genre annotations from multiple sources and large-scale music features data available in the AcousticBrainz database. The goal of our task is to explore how the same music pieces can be annotated differently by different communities following different genre taxonomies, and how this should be addressed by content-based genre recognition systems.
We provide four datasets containing genre and subgenre annotations extracted from four different online metadata sources:
AllMusic and Discogs are based on editorial metadata databases maintained by music experts and enthusiasts. These sources contain explicit genre/subgenre annotations of music releases (albums) following a predefined genre namespace and taxonomy. We propagated release-level annotations to recordings (tracks) in AcousticBrainz to build the datasets.
Lastfm and Tagtraum are based on collaborative music tagging platforms with large amounts of genre labels provided by their users for music recordings (tracks). We have automatically inferred a genre/subgenre taxonomy and annotations from these labels.
For details on format and contents, please refer to the data webpage.
Note that the AllMusic ground-truth annotations are distributed separately at https://zenodo.org/record/2554044.
If you use the MediaEval AcousticBrainz Genre dataset or part of it, please cite our ISMIR 2019 overview paper:
Bogdanov, D., Porter A., Schreiber H., Urbano J., & Oramas S. (2019).
The AcousticBrainz Genre Dataset: Multi-Source, Multi-Level, Multi-Label, and Large-Scale.
20th International Society for Music Information Retrieval Conference (ISMIR 2019).
This work is partially supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688382 AudioCommons.
- class mirdata.datasets.acousticbrainz_genre.Dataset(data_home=None, version='default')[source]¶
The acousticbrainz genre dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that modify the downloaded data. When a checksum differs, a warning is printed instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
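The checksum comparison that allow_invalid_checksum relaxes can be sketched with hashlib (a simplified illustration, not mirdata's actual implementation; the helper name is hypothetical):

```python
import hashlib

def md5_matches(data: bytes, expected_md5: str) -> bool:
    """Return True if the MD5 hex digest of `data` equals the expected one."""
    return hashlib.md5(data).hexdigest() == expected_md5

# the MD5 of b"hello" is a well-known digest
md5_matches(b"hello", "5d41402abc4b2a76b9719d911017c592")  # -> True
```

With allow_invalid_checksum=False, a digest mismatch like this raises IOError; with True, it only triggers a warning.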
- filter_index(search_key)[source]¶
Load the indexes of the AcousticBrainz genre dataset that match search_key.
- Parameters
search_key (str) – regex to match with folds, mbid or genres
- Returns
dict – {track_id: track data}
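Conceptually, filter_index applies the regex to the index keys and keeps the matching entries; a simplified sketch (not the actual implementation, and the index shown is made up):

```python
import re

def filter_index(index, search_key):
    """Keep only the index entries whose track_id matches the regex."""
    pattern = re.compile(search_key)
    return {tid: data for tid, data in index.items() if pattern.search(tid)}

# hypothetical index keyed by "<fold>#<mbid>"
index = {
    "validation#001": {"genre": ["rock"]},
    "train#002": {"genre": ["jazz"]},
}
filter_index(index, "^train")  # -> {"train#002": {"genre": ["jazz"]}}
```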
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
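The behaviour of the random split methods can be sketched as a seeded shuffle followed by proportional slicing (a simplified illustration, not the library's exact code):

```python
import random

def random_splits(track_ids, fractions, seed=42, split_names=None):
    """Partition track_ids into len(fractions) disjoint groups."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ids = sorted(track_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    names = split_names or list(range(len(fractions)))
    splits, start = {}, 0
    for name, frac in zip(names, fractions):
        stop = start + round(frac * len(ids))
        splits[name] = ids[start:stop]
        start = stop
    splits[names[-1]].extend(ids[start:])  # rounding remainder -> last split
    return splits

splits = random_splits([f"t{i}" for i in range(10)], [0.8, 0.2],
                       split_names=["train", "test"])
# -> 8 train ids and 2 test ids, disjoint
```

Reusing the same seed reproduces the same partition, which is why the library exposes the seed parameter.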
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_all_train()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for training across the four datasets.
- Returns
dict – {track_id: track data}
- load_all_validation()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for validation across the four datasets.
- Returns
dict – {track_id: track data}
- load_allmusic_train()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for training in the allmusic dataset.
- Returns
dict – {track_id: track data}
- load_allmusic_validation()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for validation in the allmusic dataset.
- Returns
dict – {track_id: track data}
- load_discogs_train()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for training in the discogs dataset.
- Returns
dict – {track_id: track data}
- load_discogs_validation()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for validation in the discogs dataset.
- Returns
dict – {track_id: track data}
- load_extractor(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.acousticbrainz_genre.load_extractor
- load_lastfm_train()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for training in the lastfm dataset.
- Returns
dict – {track_id: track data}
- load_lastfm_validation()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for validation in the lastfm dataset.
- Returns
dict – {track_id: track data}
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_tagtraum_train()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for training in the tagtraum dataset.
- Returns
dict – {track_id: track data}
- load_tagtraum_validation()[source]¶
Load the tracks of the AcousticBrainz genre dataset that are used for validation in the tagtraum dataset.
- Returns
dict – {track_id: track data}
- class mirdata.datasets.acousticbrainz_genre.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
AcousticBrainz Genre Dataset track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. If None, looks for the data in the default directory, ~/mir_datasets
- Variables
track_id (str) – track id
genre (list) – human-labeled genre and subgenres list
mbid (str) – musicbrainz id
mbid_group (str) – musicbrainz id group
artist (list) – the track’s artist/s
title (list) – the track’s title
date (list) – the track’s release date/s
filename (str) – the track’s filename
album (list) – the track’s album/s
track_number (list) – the track number/s
tonal (dict) – dictionary of acousticbrainz tonal features
low_level (dict) – dictionary of acousticbrainz low-level features
rhythm (dict) – dictionary of acousticbrainz rhythm features
- Other Parameters
acousticbrainz_metadata (dict) – dictionary of metadata provided by AcousticBrainz
- property album¶
metadata album annotation
- Returns
list – album
- property artist¶
metadata artist annotation
- Returns
list – artist
- property date¶
metadata date annotation
- Returns
list – date
- property file_name¶
metadata file_name annotation
- Returns
str – file name
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
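Schematically, get_path joins the dataset's local data_home with the relative path stored in the index; a simplified sketch (the index entry shown is hypothetical and its (path, checksum) layout is an assumption):

```python
import os

def get_path(data_home, index_entry, key):
    """Return the absolute path for `key`, or None if no path is stored."""
    relative_path = index_entry[key][0]  # assumed (path, checksum) pairs
    if relative_path is None:
        return None
    return os.path.join(data_home, relative_path)

entry = {"audio": ("audio/track01.wav", "abc123"), "beats": (None, None)}
get_path("/data/example", entry, "audio")  # -> "/data/example/audio/track01.wav"
get_path("/data/example", entry, "beats")  # -> None
```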
- property low_level¶
low_level track descriptors.
- Returns
dict –
‘average_loudness’: dynamic range descriptor. It rescales average loudness, computed on 2sec windows with 1sec overlap, into the [0,1] interval. The value 0 corresponds to signals with a large dynamic range, 1 to signals with little dynamic range. Algorithms: Loudness
‘dynamic_complexity’: dynamic complexity computed on 2sec windows with 1sec overlap. Algorithms: DynamicComplexity
‘silence_rate_20dB’, ‘silence_rate_30dB’, ‘silence_rate_60dB’: rate of silent frames in a signal for thresholds of 20, 30, and 60 dBs. Algorithms: SilenceRate
‘spectral_rms’: spectral RMS. Algorithms: RMS
‘spectral_flux’: spectral flux of a signal computed using L2-norm. Algorithms: Flux
‘spectral_centroid’, ‘spectral_kurtosis’, ‘spectral_spread’, ‘spectral_skewness’: centroid and central moments statistics describing the spectral shape. Algorithms: Centroid, CentralMoments
‘spectral_rolloff’: the roll-off frequency of a spectrum. Algorithms: RollOff
‘spectral_decrease’: spectral decrease. Algorithms: Decrease
‘hfc’: high frequency content descriptor as proposed by Masri. Algorithms: HFC
‘zerocrossingrate’: zero-crossing rate. Algorithms: ZeroCrossingRate
‘spectral_energy’: spectral energy. Algorithms: Energy
‘spectral_energyband_low’, ‘spectral_energyband_middle_low’, ‘spectral_energyband_middle_high’, ‘spectral_energyband_high’: spectral energy in frequency bands [20Hz, 150Hz], [150Hz, 800Hz], [800Hz, 4kHz], and [4kHz, 20kHz]. Algorithms: EnergyBand
‘barkbands’: spectral energy in 27 Bark bands. Algorithms: BarkBands
‘melbands’: spectral energy in 40 mel bands. Algorithms: MFCC
‘erbbands’: spectral energy in 40 ERB bands. Algorithms: ERBBands
‘mfcc’: the first 13 mel frequency cepstrum coefficients. Algorithms: MFCC
‘gfcc’: the first 13 gammatone feature cepstrum coefficients. Algorithms: GFCC
‘barkbands_crest’, ‘barkbands_flatness_db’: crest and flatness computed over energies in Bark bands. Algorithms: Crest, FlatnessDB
‘barkbands_kurtosis’, ‘barkbands_skewness’, ‘barkbands_spread’: central moments statistics over energies in Bark bands. Algorithms: CentralMoments
‘melbands_crest’, ‘melbands_flatness_db’: crest and flatness computed over energies in mel bands. Algorithms: Crest, FlatnessDB
‘melbands_kurtosis’, ‘melbands_skewness’, ‘melbands_spread’: central moments statistics over energies in mel bands. Algorithms: CentralMoments
‘erbbands_crest’, ‘erbbands_flatness_db’: crest and flatness computed over energies in ERB bands. Algorithms: Crest, FlatnessDB
‘erbbands_kurtosis’, ‘erbbands_skewness’, ‘erbbands_spread’: central moments statistics over energies in ERB bands. Algorithms: CentralMoments
‘dissonance’: sensory dissonance of a spectrum. Algorithms: Dissonance
‘spectral_entropy’: Shannon entropy of a spectrum. Algorithms: Entropy
‘pitch_salience’: pitch salience of a spectrum. Algorithms: PitchSalience
‘spectral_complexity’: spectral complexity. Algorithms: SpectralComplexity
‘spectral_contrast_coeffs’, ‘spectral_contrast_valleys’: spectral contrast features. Algorithms: SpectralContrast
- property rhythm¶
rhythm essentia extractor descriptors
- Returns
dict –
‘beats_position’: time positions [sec] of detected beats using the beat tracking algorithm by Degara et al., 2012. Algorithms: RhythmExtractor2013, BeatTrackerDegara
‘beats_count’: number of detected beats
‘bpm’: BPM value according to detected beats
‘bpm_histogram_first_peak_bpm’, ‘bpm_histogram_first_peak_spread’, ‘bpm_histogram_first_peak_weight’, ‘bpm_histogram_second_peak_bpm’, ‘bpm_histogram_second_peak_spread’, ‘bpm_histogram_second_peak_weight’: descriptors characterizing the highest and second highest peaks of the BPM histogram. Algorithms: BpmHistogramDescriptors
‘beats_loudness’, ‘beats_loudness_band_ratio’: spectral energy computed on beat segments of audio across the whole spectrum, and ratios of energy in 6 frequency bands. Algorithms: BeatsLoudness, SingleBeatLoudness
‘onset_rate’: number of detected onsets per second. Algorithms: OnsetRate
‘danceability’: danceability estimate. Algorithms: Danceability
- property title¶
metadata title annotation
- Returns
list – title
- to_jams()[source]¶
Get the track’s data in jams format
- Returns
jams.JAMS – return track data in jam format
- property tonal¶
tonal features
- Returns
dict –
‘tuning_frequency’: estimated tuning frequency [Hz]. Algorithms: TuningFrequency
‘tuning_nontempered_energy_ratio’ and ‘tuning_equal_tempered_deviation’
‘hpcp’, ‘thpcp’: 32-dimensional harmonic pitch class profile (HPCP) and its transposed version. Algorithms: HPCP
‘hpcp_entropy’: Shannon entropy of a HPCP vector. Algorithms: Entropy
‘key_key’, ‘key_scale’: global key feature. Algorithms: Key
‘chords_key’, ‘chords_scale’: global key extracted from chords detection.
‘chords_strength’, ‘chords_histogram’: strength of estimated chords and normalized histogram of their progression. Algorithms: ChordsDetection, ChordsDescriptors
‘chords_changes_rate’, ‘chords_number_rate’: chords change rate in the progression; ratio of different chords to the total number of chords in the progression. Algorithms: ChordsDetection, ChordsDescriptors
- property tracknumber¶
metadata tracknumber annotation
- Returns
list – tracknumber
- mirdata.datasets.acousticbrainz_genre.load_extractor(fhandle)[source]¶
Load an AcousticBrainz dataset JSON file with all the features and metadata.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a json file
- Returns
dict – the features and metadata stored in the json file
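The features file is plain JSON, so reading one reduces to json parsing; a sketch with a hypothetical, heavily truncated payload (the real files contain the full low_level/rhythm/tonal feature sets described above):

```python
import json

# hypothetical, heavily truncated example of an AcousticBrainz features file
raw = """{
  "metadata": {"tags": {"artist": ["Some Artist"], "title": ["Some Title"]}},
  "rhythm": {"bpm": 97.6},
  "tonal": {"key_key": "F", "key_scale": "major"}
}"""
features = json.loads(raw)
features["rhythm"]["bpm"]  # -> 97.6
```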
beatles¶
Beatles Dataset Loader
Dataset Info
The Beatles Dataset includes beat and metric position, chord, key, and segmentation annotations for 179 Beatles songs. Details can be found in http://matthiasmauch.net/_pdf/mauch_omp_2009.pdf and http://isophonics.net/content/reference-annotations-beatles.
- class mirdata.datasets.beatles.Dataset(data_home=None, version='default')[source]¶
The beatles dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that modify the downloaded data. When a checksum differs, a warning is printed instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatles.load_audio
- load_beats(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatles.load_beats
- load_chords(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatles.load_chords
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_sections(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatles.load_sections
- class mirdata.datasets.beatles.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Beatles track class
- Parameters
track_id (str) – track id of the track
data_home (str) – path where the data lives
- Variables
audio_path (str) – track audio path
beats_path (str) – beat annotation path
chords_path (str) – chord annotation path
keys_path (str) – key annotation path
sections_path (str) – sections annotation path
title (str) – title of the track
track_id (str) – track id
- Other Parameters
beats (BeatData) – human-labeled beat annotations
chords (ChordData) – human-labeled chord annotations
key (KeyData) – local key annotations
sections (SectionData) – section annotations
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.beatles.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load a Beatles audio file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.beatles.load_beats(fhandle: TextIO) mirdata.annotations.BeatData [source]¶
Load Beatles format beat data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a beat annotation file
- Returns
BeatData – loaded beat data
- mirdata.datasets.beatles.load_chords(fhandle: TextIO) mirdata.annotations.ChordData [source]¶
Load Beatles format chord data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a chord annotation file
- Returns
ChordData – loaded chord data
- mirdata.datasets.beatles.load_key(fhandle: TextIO) mirdata.annotations.KeyData [source]¶
Load Beatles format key data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a key annotation file
- Returns
KeyData – loaded key data
- mirdata.datasets.beatles.load_sections(fhandle: TextIO) mirdata.annotations.SectionData [source]¶
Load Beatles format section data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a section annotation file
- Returns
SectionData – loaded section data
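For illustration, the beat annotations these loaders read are line-oriented text; a minimal parser sketch, assuming a simple "<time> <beat_position>" layout per line (the real files may differ, so prefer load_beats in practice):

```python
def parse_beats(lines):
    """Parse (time, beat_position) pairs from annotation lines."""
    times, positions = [], []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        times.append(float(fields[0]))
        positions.append(int(fields[1]))
    return times, positions

times, positions = parse_beats(["0.486\t1", "0.982\t2", "1.478\t3"])
# -> times [0.486, 0.982, 1.478], positions [1, 2, 3]
```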
beatport_key¶
beatport_key Dataset Loader
Dataset Info
The Beatport EDM Key Dataset includes 1486 two-minute sound excerpts from various EDM subgenres, annotated with single-key labels, comments and confidence levels generously provided by Eduard Mas Marín, and thoroughly revised and expanded by Ángel Faraldo.
The original audio samples belong to online audio snippets from Beatport, an online music store for DJs and electronic dance music producers (http://www.beatport.com). If this dataset is used in further research, we would appreciate a citation of the current DOI (10.5281/zenodo.1101082) and the following doctoral dissertation, where a detailed description of the properties of this dataset can be found:
Ángel Faraldo (2017). Tonality Estimation in Electronic Dance Music: A Computational and Musically Informed
Examination. PhD Thesis. Universitat Pompeu Fabra, Barcelona.
This dataset is mainly intended to assess the performance of computational key estimation algorithms in electronic dance music subgenres.
Data License: Creative Commons Attribution Share Alike 4.0 International
- class mirdata.datasets.beatport_key.Dataset(data_home=None, version='default')[source]¶
The beatport_key dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False)[source]¶
Download the dataset
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_artist(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatport_key.load_artist
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatport_key.load_audio
- load_genre(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatport_key.load_genre
- load_key(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatport_key.load_key
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_tempo(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.beatport_key.load_tempo
- class mirdata.datasets.beatport_key.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
beatport_key track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored.
- Variables
audio_path (str) – track audio path
keys_path (str) – key annotation path
metadata_path (str) – metadata path
title (str) – title of the track
track_id (str) – track id
- Other Parameters
key (list) – list of annotated musical keys
artists (list) – artists involved in the track
genre (dict) – genres and subgenres
tempo (int) – tempo in beats per minute
- property audio¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.beatport_key.load_artist(fhandle)[source]¶
Load beatport_key artist data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to metadata file
- Returns
list – list of artists involved in the track.
- mirdata.datasets.beatport_key.load_audio(fpath)[source]¶
Load a beatport_key audio file.
- Parameters
fpath (str) – path to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.beatport_key.load_genre(fhandle)[source]¶
Load beatport_key genre data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to metadata file
- Returns
dict – a dictionary with the list of genres under ‘genres’ and the list of sub-genres under ‘sub_genres’
billboard¶
McGill Billboard Dataset Loader
Dataset Info
The McGill Billboard dataset includes annotations and audio features corresponding to 890 slots from a random sample of Billboard chart slots. It also includes metadata like Billboard chart date, peak rank, artist name, etc. Details can be found at https://ddmal.music.mcgill.ca/research/The_McGill_Billboard_Project_(Chord_Analysis_Dataset)
- class mirdata.datasets.billboard.Dataset(data_home=None, version='default')[source]¶
The McGill Billboard dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that modify the downloaded data. When a checksum differs, a warning is printed instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1; as many splits are returned as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
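The behavior of get_random_track_splits can be pictured with a minimal sketch: shuffle the track ids deterministically, then carve the shuffled list into consecutive chunks whose sizes follow the requested proportions. The helper name random_splits and its internals are illustrative, not mirdata's actual implementation.

```python
import random

def random_splits(track_ids, splits, seed=42, split_names=None):
    # Partition track ids into chunks of a deterministically shuffled list.
    assert abs(sum(splits) - 1.0) < 1e-9, "split fractions must sum to 1"
    rng = random.Random(seed)
    ids = list(track_ids)
    rng.shuffle(ids)
    names = split_names or ["split_{}".format(i) for i in range(len(splits))]
    out, start = {}, 0
    for i, (name, frac) in enumerate(zip(names, splits)):
        # The last split absorbs any rounding leftover.
        end = len(ids) if i == len(splits) - 1 else start + round(frac * len(ids))
        out[name] = ids[start:end]
        start = end
    return out

parts = random_splits(["t{}".format(i) for i in range(10)], [0.8, 0.2],
                      split_names=["train", "test"])
```

Because the shuffle is seeded, calling the helper twice with the same seed yields the same partition, which is the point of the seed parameter above.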
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.billboard.load_audio
- load_chords(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.billboard.load_chords
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_named_sections(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.billboard.load_named_sections
- load_sections(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.billboard.load_sections
- class mirdata.datasets.billboard.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
McGill Billboard Dataset Track class
- Parameters
track_id (str) – track id of the track
- Variables
track_id (str) – the index for the sample entry
audio_path (str) – audio path of the track
chart date – the date of the chart for the entry
target rank – the desired rank on that chart
actual rank – the rank of the song actually annotated, which may be up to 2 ranks higher or lower than the target rank
title (str) – the title of the song annotated
artist (str) – the name of the artist performing the song annotated
peak rank – the highest rank the song annotated ever achieved on the Billboard Hot 100
weeks on chart – the number of weeks the song annotated spent on the Billboard Hot 100 chart in total
- Other Parameters
chords_full (ChordData) – HTK-style LAB files for the chord annotations (full)
chords_majmin7 (ChordData) – HTK-style LAB files for the chord annotations (majmin7)
chords_majmin7inv (ChordData) – HTK-style LAB files for the chord annotations (majmin7inv)
chords_majmin (ChordData) – HTK-style LAB files for the chord annotations (majmin)
chords_majmininv (ChordData) – HTK-style LAB files for the chord annotations (majmininv)
chroma (np.array) – Array containing the non-negative-least-squares chroma vectors
tuning (list) – List containing the tuning estimates
sections (SectionData) – Letter-annotated section data (A,B,A’)
named_sections (SectionData) – Name-annotated section data (intro, verse, chorus)
salami_metadata (dict) – Metadata of the Salami LAB file
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- chroma¶
Non-negative-least-squares (NNLS) chroma vectors from the Chordino Vamp plug-in
- Returns
np.ndarray - NNLS chroma vector
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- to_jams()[source]¶
Get the track’s data in jams format
- Returns
jams.JAMS – the track’s data in jams format
- tuning¶
Tuning estimates from the Chordino Vamp plug-in
- Returns
list – list of tuning estimates
- mirdata.datasets.billboard.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load a Billboard audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.billboard.load_chords(fhandle: TextIO)[source]¶
Load chords from a Salami LAB file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a Salami LAB file
- Returns
ChordData – chord data
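The HTK-style LAB format referenced throughout this loader is plain text with one labeled interval per line. A minimal sketch of a parser over that format follows; the tab-separated layout shown is an assumption for illustration, and the real load_chords returns a ChordData object rather than plain lists.

```python
import io

def parse_lab(fhandle):
    # Each line: start_time <TAB> end_time <TAB> chord_label
    intervals, labels = [], []
    for line in fhandle:
        line = line.strip()
        if not line:
            continue
        start, end, label = line.split("\t")
        intervals.append((float(start), float(end)))
        labels.append(label)
    return intervals, labels

example = io.StringIO("0.0\t2.5\tN\n2.5\t5.0\tC:maj\n")
intervals, labels = parse_lab(example)
```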
cante100¶
cante100 Loader
Dataset Info
The cante100 dataset contains 100 tracks taken from the COFLA corpus. We defined 10 style families, of which 10 tracks each are included. Apart from the style family, we manually annotated the sections of each track in which the vocals are present. In addition, we provide a number of low-level descriptors and the fundamental frequency corresponding to the predominant melody for each track. The meta-information includes editorial metadata and the MusicBrainz ID.
Total tracks: 100
cante100 audio is only available upon request. To download the audio, request access at this link: https://zenodo.org/record/1324183. Then unzip the audio into the general cante100 dataset folder, alongside the rest of the annotations and files.
Audio specifications:
Sampling frequency: 44.1 kHz
Bit-depth: 16 bit
Audio format: .mp3
The cante100 dataset also provides spectrograms, in CSV format. The spectrograms can be downloaded without requesting access, so in the first instance the cante100 loader uses the spectrograms of the tracks.
The available annotations are:
F0 (predominant melody)
Automatic transcription of notes (of singing voice)
CANTE100 LICENSE (COPIED FROM ZENODO PAGE)
The provided datasets are offered free of charge for internal non-commercial use.
We do not grant any rights for redistribution or modification. All data collections were gathered
by the COFLA team.
© COFLA 2015. All rights reserved.
For more details, please visit: http://www.cofla-project.com/?page_id=134
- class mirdata.datasets.cante100.Dataset(data_home=None, version='default')[source]¶
The cante100 dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind a proxy that inspects the downloaded data. When a checksum mismatch occurs, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.cante100.load_audio
- load_melody(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.cante100.load_melody
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_notes(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.cante100.load_notes
- load_spectrogram(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.cante100.load_spectrogram
- class mirdata.datasets.cante100.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
cante100 track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. If None, looks for the data in the default directory, ~/mir_datasets/cante100
- Variables
track_id (str) – track id
identifier (str) – musicbrainz id of the track
artist (str) – performing artists
title (str) – title of the track song
release (str) – release where the track can be found
duration (str) – duration in seconds of the track
- Other Parameters
melody (F0Data) – annotated melody
notes (NoteData) – annotated notes
- property audio: Tuple[numpy.ndarray, float]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- property spectrogram: Optional[numpy.ndarray]¶
Spectrogram of the track’s audio
- Returns
np.ndarray – spectrogram
- mirdata.datasets.cante100.load_audio(fpath: str) Tuple[numpy.ndarray, float] [source]¶
Load a cante100 audio file.
- Parameters
fpath (str) – path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.cante100.load_melody(fhandle: TextIO) Optional[mirdata.annotations.F0Data] [source]¶
Load cante100 f0 annotations
- Parameters
fhandle (str or file-like) – path or file-like object pointing to melody annotation file
- Returns
F0Data – predominant melody
- mirdata.datasets.cante100.load_notes(fhandle: TextIO) mirdata.annotations.NoteData [source]¶
Load note data from the annotation files
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a notes annotation file
- Returns
NoteData – note annotations
- mirdata.datasets.cante100.load_spectrogram(fhandle: TextIO) numpy.ndarray [source]¶
Load a cante100 dataset spectrogram file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a spectrogram file
- Returns
np.ndarray – spectrogram
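Since the spectrograms are distributed as plain CSV, a minimal sketch of loading one with NumPy might look like the following. The delimiter and the frame-per-row orientation are assumptions for illustration; the actual load_spectrogram handles the dataset's real file layout.

```python
import io
import numpy as np

def load_csv_spectrogram(fhandle, delimiter=","):
    # Assumed layout: one time frame per row, one frequency bin per column.
    return np.loadtxt(fhandle, delimiter=delimiter)

spec = load_csv_spectrogram(io.StringIO("0.1,0.2\n0.3,0.4\n"))
```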
compmusic_carnatic_rhythm¶
CompMusic Carnatic Rhythm Dataset Loader
Dataset Info
CompMusic Carnatic Rhythm Dataset is a rhythm annotated test corpus for automatic rhythm analysis tasks in Carnatic Music. The collection consists of audio excerpts from the CompMusic Carnatic research corpus, manually annotated time aligned markers indicating the progression through the taala cycle, and the associated taala related metadata. A brief description of the dataset is provided below. For a brief overview and audio examples of taalas in Carnatic music, please see: http://compmusic.upf.edu/examples-taala-carnatic
The dataset contains the following data:
AUDIO: The pieces are chosen from the CompMusic Carnatic music collection. The pieces were chosen in four popular taalas of Carnatic music, which encompass a majority of Carnatic music. The chosen pieces include a mix of vocal and instrumental recordings, new and old recordings, and span a wide variety of forms. All pieces have a percussion accompaniment, predominantly Mridangam. The excerpts are either full-length pieces or parts of full-length pieces. There are also several different pieces by the same artist (or release group), and multiple instances of the same composition rendered by different artists. Each piece is uniquely identified using the MBID of the recording. The pieces are stereo, 160 kbps, mp3 files sampled at 44.1 kHz.
SAMA AND BEATS: The primary annotations are audio synchronized time-stamps indicating the different metrical positions in the taala cycle. The annotations were created using Sonic Visualizer by tapping to music and manually correcting the taps. Each annotation has a time-stamp and an associated numeric label that indicates the position of the beat marker in the taala cycle. The marked positions in the taala cycle are shown with numbers, along with the corresponding label used. In each case, the sama (the start of the cycle, analogous to the downbeat) is indicated using the numeral 1.
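Because the sama always carries the numeral 1 in these annotations, cycle start times can be recovered from a beat annotation with a one-line filter. This is a sketch over plain lists, not the BeatData API; the times and positions below are made up.

```python
def sama_times(beat_times, beat_positions):
    # The sama (start of the taala cycle) carries the label 1.
    return [t for t, p in zip(beat_times, beat_positions) if p == 1]

# A hypothetical 4-beat cycle: positions wrap back to 1 at each sama.
times = sama_times([0.5, 1.0, 1.5, 2.0, 2.5], [1, 2, 3, 4, 1])
```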
METADATA: For each excerpt, the taala of the piece, edupu (offset of the start of the piece, relative to the sama, measured in aksharas) of the composition, and the kalai (the cycle length scaling factor) are recorded. Each excerpt can be uniquely identified and located with the MBID of the recording, and the relative start and end times of the excerpt within the whole recording. A separate 5 digit taala based unique ID is also provided for each excerpt as a double check. The artist, release, the lead instrument, and the raaga of the piece are additional editorial metadata obtained from the release. A flag indicates if the excerpt is a full piece or only a part of a full piece. There are optional comments on audio quality and annotation specifics.
Possible uses of the dataset: Possible tasks where the dataset can be used include taala, sama and beat tracking, tempo estimation and tracking, taala recognition, rhythm based segmentation of musical audio, structural segmentation, audio to score/lyrics alignment, and rhythmic pattern discovery.
Dataset organization: The dataset consists of audio, annotations, an accompanying spreadsheet providing additional metadata. For a detailed description of the organization, please see the README in the dataset.
Data Subset: A subset of this dataset, consisting of 118 two-minute excerpts of music, is also available. The content in the subset is equivalent and is distributed separately for quicker testing of algorithms and approaches.
The annotations files of this dataset are shared with the following license: Creative Commons Attribution Non Commercial Share Alike 4.0 International
- class mirdata.datasets.compmusic_carnatic_rhythm.Dataset(data_home=None, version='default')[source]¶
The compmusic_carnatic_rhythm dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind a proxy that inspects the downloaded data. When a checksum mismatch occurs, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.compmusic_carnatic_rhythm.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
CompMusic Carnatic Music Rhythm class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. If None (default), looks for the data in the default directory, ~/mir_datasets
- Variables
audio_path (str) – path to audio file
beats_path (str) – path to beats file
meter_path (str) – path to meter file
- Other Parameters
beats (BeatData) – beats annotation
meter (string) – meter annotation
mbid (string) – MusicBrainz ID
name (string) – name of the recording in the dataset
artist (string) – artist’s name
release (string) – release name
lead_instrument_code (string) – code for the lead instrument
taala (string) – taala annotation
raaga (string) – raaga annotation
num_of_beats (int) – number of beats in annotation
num_of_samas (int) – number of samas in annotation
- property audio¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.compmusic_carnatic_rhythm.load_audio(audio_path)[source]¶
Load an audio file.
- Parameters
audio_path (str) – path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
compmusic_hindustani_rhythm¶
CompMusic Hindustani Rhythm Dataset Loader
Dataset Info
CompMusic Hindustani Rhythm Dataset is a rhythm annotated test corpus for automatic rhythm analysis tasks in Hindustani Music. The collection consists of audio excerpts from the CompMusic Hindustani research corpus, manually annotated time aligned markers indicating the progression through the taal cycle, and the associated taal related metadata. A brief description of the dataset is provided below.
For a brief overview and audio examples of taals in Hindustani music, please see: http://compmusic.upf.edu/examples-taal-hindustani
The dataset contains the following data:
AUDIO: The pieces are chosen from the CompMusic Hindustani music collection. The pieces were chosen in four popular taals of Hindustani music, which encompass a majority of Hindustani khyal music. The chosen pieces include a mix of vocal and instrumental recordings, new and old recordings, and span three lays. For each taal, there are pieces in dhrut (fast), madhya (medium) and vilambit (slow) lays (tempo classes). All pieces have Tabla as the percussion accompaniment. The excerpts are two minutes long. Each piece is uniquely identified using the MBID of the recording. The pieces are stereo, 160 kbps, mp3 files sampled at 44.1 kHz. The audio is also available as wav files for experiments.
SAM, VIBHAAG AND THE MAATRAS: The primary annotations are audio synchronized time-stamps indicating the different metrical positions in the taal cycle. The sam and matras of the cycle are annotated. The annotations were created using Sonic Visualizer by tapping to music and manually correcting the taps. Each annotation has a time-stamp and an associated numeric label that indicates the position of the beat marker in the taal cycle. The annotations and the associated metadata have been verified for correctness and completeness by a professional Hindustani musician and musicologist. The long thick lines show vibhaag boundaries. The numerals indicate the matra number in the cycle. In each case, the sam (the start of the cycle, analogous to the downbeat) is indicated using the numeral 1.
METADATA: For each excerpt, the taal and the lay of the piece are recorded. Each excerpt can be uniquely identified and located with the MBID of the recording, and the relative start and end times of the excerpt within the whole recording. A separate 5 digit taal based unique ID is also provided for each excerpt as a double check. The artist, release, the lead instrument, and the raag of the piece are additional editorial metadata obtained from the release. There are optional comments on audio quality and annotation specifics.
The dataset consists of excerpts with a wide tempo range from 10 MPM (matras per minute) to 370 MPM. To study any effects of the tempo class, the full dataset (HMDf) is also divided into two other subsets - the long cycle subset (HMDl) consisting of vilambit (slow) pieces with a median tempo between 10-60 MPM, and the short cycle subset (HMDs) with madhyalay (medium, 60-150 MPM) and the drut lay (fast, 150+ MPM).
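The subset boundaries above can be expressed as a simple classifier over an excerpt's median tempo. The helper name tempo_subset is hypothetical; it only restates the MPM ranges given in the text.

```python
def tempo_subset(median_mpm):
    # HMDl: vilambit (slow) pieces, median tempo 10-60 MPM.
    # HMDs: madhyalay (60-150 MPM) plus drut lay (150+ MPM).
    # Together the two subsets cover the full dataset, HMDf (10-370 MPM).
    if not 10 <= median_mpm <= 370:
        raise ValueError("tempo outside the dataset's 10-370 MPM range")
    return "HMDl" if median_mpm < 60 else "HMDs"

subset = tempo_subset(45)
```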
Possible uses of the dataset: Possible tasks where the dataset can be used include taal, sama and beat tracking, tempo estimation and tracking, taal recognition, rhythm based segmentation of musical audio, audio to score/lyrics alignment, and rhythmic pattern discovery.
Dataset organization: The dataset consists of audio, annotations, an accompanying spreadsheet providing additional metadata, a MAT-file that has identical information as the spreadsheet, and a dataset description document.
The annotations files of this dataset are shared with the following license: Creative Commons Attribution Non Commercial Share Alike 4.0 International
- class mirdata.datasets.compmusic_hindustani_rhythm.Dataset(data_home=None, version='default')[source]¶
The compmusic_hindustani_rhythm dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind a proxy that inspects the downloaded data. When a checksum mismatch occurs, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.compmusic_hindustani_rhythm.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
CompMusic Hindustani Music Rhythm class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. If None (default), looks for the data in the default directory, ~/mir_datasets
- Variables
audio_path (str) – path to audio file
beats_path (str) – path to beats file
meter_path (str) – path to meter file
- Other Parameters
beats (BeatData) – beats annotation
meter (string) – meter annotation
mbid (string) – MusicBrainz ID
name (string) – name of the recording in the dataset
artist (string) – artist’s name
release (string) – release name
lead_instrument_code (string) – code for the lead instrument
taala (string) – taala annotation
raaga (string) – raaga annotation
laya (string) – laya annotation
num_of_beats (int) – number of beats in annotation
num_of_samas (int) – number of samas in annotation
median_matra_period (float) – median matra period
median_matras_per_min (float) – median matras per minute
median_ISI (float) – median ISI
median_avarts_per_min (float) – median avarts per minute
- property audio¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.compmusic_hindustani_rhythm.load_audio(audio_path)[source]¶
Load an audio file.
- Parameters
audio_path (str) – path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
compmusic_indian_tonic¶
Indian Art Music Tonic Loader
Dataset Info
This loader includes a combination of six different datasets for the task of Indian Art Music tonic identification.
These datasets comprise audio excerpts and manually created annotations of the tonic pitch of the lead artist for each audio excerpt. Each excerpt is accompanied by its associated editorial metadata. These datasets can be used to develop and evaluate computational approaches for automatic tonic identification in Indian art music. These datasets have been used in several articles mentioned below. A majority of these datasets come from the CompMusic corpora of Indian art music, where each recording is associated with an MBID. Through the MBID, other information can be obtained using the Dunya API.
These six datasets are used for the task of tonic identification in Indian Art Music, and can be used for a comparative evaluation. To the best of our knowledge, these are the largest datasets available for tonic identification in Indian art music. These datasets vary in terms of audio quality, recording period (decade), the number of recordings of Carnatic, Hindustani, male and female singers, and instrumental and vocal excerpts.
All the datasets (annotations) are version controlled. The audio files corresponding to these datasets are made available on request, for research purposes only. See DOWNLOAD_INFO of this loader.
The tonic annotations are available in both tsv and json formats. The loader uses the JSON formatted annotations.
'ID': {
'artist': <name of the lead artist if available>,
'filepath': <relative path to the audio file>,
'gender': <gender of the lead singer if available>,
'mbid': <musicbrainz id when available>,
'tonic': <tonic in Hz>,
'tradition': <Hindustani or Carnatic>,
'type': <vocal or instrumental>
}
where keys of the main dictionary are the filepaths to the audio files (feature path is exactly the same with a different extension of the file name).
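The JSON schema above can be consumed with the standard library alone. The entry below is illustrative: the artist, filepath, MBID, and tonic values are made up, only the field names follow the documented schema.

```python
import json

# A single hypothetical annotation entry following the schema above.
raw = """{
  "audio/excerpt_01.mp3": {
    "artist": "Example Artist",
    "filepath": "audio/excerpt_01.mp3",
    "gender": "male",
    "mbid": "00000000-0000-0000-0000-000000000000",
    "tonic": 146.83,
    "tradition": "Carnatic",
    "type": "vocal"
  }
}"""

annotations = json.loads(raw)
# Map each audio file path to its annotated tonic in Hz.
tonics = {path: entry["tonic"] for path, entry in annotations.items()}
```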
Although the features are not loaded by this dataloader, the dataset includes them, and they may be integrated into the loader in future releases. These features may also be easily computed following the instructions in the related paper. See BIBTEX.
There are a total of 2161 audio excerpts, and while the CM collection includes approximately 50% Carnatic and 50% Hindustani recordings, the IITM and IISc collections are 100% Carnatic music. The excerpts vary widely in duration. See this webpage for a detailed overview of the datasets: https://compmusic.upf.edu/iam-tonic-dataset
If you have any questions or comments about the dataset, please feel free to email: [sankalp (dot) gulati (at) gmail (dot) com], or [sankalp (dot) gulati (at) upf (dot) edu].
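As a quick illustration, the JSON annotations described above can be consumed with plain Python. The entry below is a made-up example following the documented schema; real keys are file paths within the dataset and real values come from the downloaded annotations:

```python
import json

# A hypothetical annotation entry following the documented schema.
# In the real dataset, the top-level keys are paths to the audio files.
raw = json.dumps({
    "indian_art_music/example_excerpt.wav": {
        "artist": "Example Artist",
        "filepath": "indian_art_music/example_excerpt.wav",
        "gender": "M",
        "mbid": "00000000-0000-0000-0000-000000000000",
        "tonic": 146.83,
        "tradition": "Carnatic",
        "type": "vocal",
    }
})

annotations = json.loads(raw)
for path, meta in annotations.items():
    print(path, meta["tonic"], meta["tradition"])
```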
- class mirdata.datasets.compmusic_indian_tonic.Dataset(data_home=None, version='default')[source]¶
The compmusic_indian_tonic dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that inspect the downloaded data. When a checksum does not match, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
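The semantics of get_random_track_splits can be sketched in plain Python. This is an illustration of the idea (shuffle the ids with a seeded generator, then slice by fraction), not mirdata's actual implementation:

```python
import random

def random_splits(ids, fractions, seed=42, names=None):
    """Partition `ids` into len(fractions) disjoint lists (illustrative sketch)."""
    assert abs(sum(fractions) - 1.0) < 1e-9, "fractions must sum to 1"
    rng = random.Random(seed)          # seeded for reproducibility, like mirdata's seed=42
    shuffled = list(ids)
    rng.shuffle(shuffled)
    names = names or [f"split_{i}" for i in range(len(fractions))]
    out, start = {}, 0
    for name, frac in zip(names, fractions):
        n = round(frac * len(shuffled))
        out[name] = shuffled[start:start + n]
        start += n
    # any rounding remainder goes into the last split
    out[names[-1]].extend(shuffled[start:])
    return out

splits = random_splits([f"track_{i}" for i in range(10)], [0.8, 0.2],
                       names=["train", "test"])
```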
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.compmusic_indian_tonic.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
CompMusic Tonic Dataset track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored.
- Variables
track_id (str) – track id
audio_path (str) – audio path
- Other Parameters
tonic (float) – tonic annotation
artist (str) – performing artist
gender (str) – gender of the recording artists
mbid (str) – MusicBrainz ID of the piece (if available)
type (str) – type of piece (vocal, instrumental, etc.)
tradition (str) – tradition of the piece (Carnatic or Hindustani)
- property audio¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
compmusic_jingju_acappella¶
Jingju A Cappella Singing Dataset Loader
Dataset Info
- Description:
This dataset is a collection of boundary annotations of a cappella singing performed by Beijing Opera (Jingju, 京剧) professional and amateur singers.
- Contents:
wav.zip: audio files in .wav format, mono or stereo.
pycode.zip: util code for parsing the .textgrid annotation
catalogue*.csv: recording metadata; source separation recordings are not included.
annotation_txt.zip: phrase, syllable and phoneme time boundaries (second) and labels in .txt format
- The annotation_txt.zip folder annotations are represented as follows:
phrase_char: phrase-level time boundaries, labeled in Mandarin characters
phrase: phrase-level time boundaries, labeled in Mandarin pinyin
syllable: syllable-level time boundaries, labeled in Mandarin pinyin
phoneme: phoneme-level time boundaries, labeled in X-SAMPA
- The boundaries (onset and offset) have been annotated hierarchically:
phrase (line)
syllable
phoneme
- Annotation details:
Singing units, in pinyin and X-SAMPA, have been annotated for a jingju a cappella singing audio dataset.
- Audio details:
The corresponding audio files are the a cappella singing aria recordings, which are stereo or mono, sampled at 44.1 kHz, and stored as .wav files. The .wav files were recorded by two institutions: file names ending in ‘qm’ were recorded by C4DM, Queen Mary University of London; file names ending in ‘upf’ or ‘lon’ were recorded by MTG-UPF. Additionally, another collection of 15 clean singing recordings is included in this dataset. They were extracted from commercial recordings that originally contained karaoke accompaniment and mixed versions.
- Additional details:
For the annotation format, units, parsing code and other information, please refer to: https://github.com/MTG/jingjuPhonemeAnnotation
- License information:
Textgrid annotations are licensed under Creative Commons Attribution-NonCommercial 4.0 International License. Wav audio ending with ‘upf’ or ‘lon’ is licensed under Creative Commons Attribution-NonCommercial 4.0 International. For the license of .wav audio ending with ‘qm’ from C4DM Queen Mary University of London, please refer to this page http://isophonics.org/SingingVoiceDataset
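The annotation_txt files contain time boundaries in seconds together with labels. As a hedged sketch, assuming whitespace-separated "start end label" rows (the exact column layout should be verified against https://github.com/MTG/jingjuPhonemeAnnotation), such a file could be parsed like this:

```python
# Assumed format: one event per line, "start end label", times in seconds.
def parse_boundaries(lines):
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        start, end, label = line.split(maxsplit=2)
        events.append((float(start), float(end), label))
    return events

# Made-up sample rows, not taken from the dataset
sample = ["0.00 1.25 yi", "1.25 2.80 lun", "2.80 4.10 ming"]
events = parse_boundaries(sample)
```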
- class mirdata.datasets.compmusic_jingju_acappella.Dataset(data_home=None, version='default')[source]¶
The compmusic_jingju_acappella dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that inspect the downloaded data. When a checksum does not match, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_phonemes(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.jingju_acapella.load_phonemes
- load_phrases(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.jingju_acapella.load_phrases
- load_syllable(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.jingju_acapella.load_syllable
- class mirdata.datasets.compmusic_jingju_acappella.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Jingju A Cappella Singing Track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. default=None If None, looks for the data in the default directory, ~/mir_datasets
- Variables
audio_path (str) – local path where the audio is stored
phoneme_path (str) – local path where the phoneme annotation is stored
phrase_char_path (str) – local path where the lyric phrase annotation in chinese is stored
phrase_path (str) – local path where the lyric phrase annotation in western characters is stored
syllable_path (str) – local path where the syllable annotation is stored
work (str) – string referring to the work where the track belongs
details (float) – string referring to additional details about the track
- Other Parameters
phoneme (EventData) – phoneme annotation
phrase_char (LyricsData) – lyric phrase annotation in chinese
phrase (LyricsData) – lyric phrase annotation in western characters
syllable (EventData) – syllable annotation
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.compmusic_jingju_acappella.load_audio(fhandle: BinaryIO) → Tuple[numpy.ndarray, float][source]¶
Load Jingju A Cappella Singing audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.compmusic_jingju_acappella.load_phonemes(fhandle: TextIO) → mirdata.annotations.LyricData[source]¶
Load phonemes
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a phoneme annotation file
- Returns
LyricData – phoneme annotation
- mirdata.datasets.compmusic_jingju_acappella.load_phrases(fhandle: TextIO) → mirdata.annotations.LyricData[source]¶
Load lyric phrases annotation
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a lyric annotation file
- Returns
LyricData – lyric phrase annotation
- mirdata.datasets.compmusic_jingju_acappella.load_syllable(fhandle: TextIO) → mirdata.annotations.LyricData[source]¶
Load syllable
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a syllable annotation file
- Returns
LyricData – syllable annotation
compmusic_otmm_makam¶
OTMM Makam Recognition Dataset Loader
Dataset Info
This dataset is designed to test makam recognition methodologies on Ottoman-Turkish makam music. It is composed of 50 recordings from each of the 20 most common makams in the CompMusic Project’s Dunya Ottoman-Turkish Makam Music collection. It is currently the largest makam recognition dataset.
The recordings are carefully selected from commercial recordings such that they cover diverse musical forms, vocal/instrumentation settings and recording qualities (e.g. historical vs. contemporary recordings). Each recording in the dataset is identified by a 16-character-long unique identifier called an MBID, hosted on MusicBrainz. The makam and the tonic of each recording are annotated in the file annotations.json.
The audio-related data in the test dataset is organized by makam in the folder data. Due to copyright reasons, we are unable to distribute the audio. Instead, we provide the predominant melody of each recording, computed by a state-of-the-art predominant melody extraction algorithm optimized for OTMM culture. These features are saved as single-column text files (at the paths data/[makam]/[mbid].pitch) containing the frequency values. The timestamps are removed to reduce the file sizes. The step size of the pitch track is 0.0029 seconds (a hop size of 128 samples for an mp3 with a 44100 Hz sample rate), from which one can recompute the timestamps of the samples.
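Since the .pitch files store only frequency values at a fixed step of 128/44100 ≈ 0.0029 seconds, the timestamps can be recovered by multiplying sample indices by the step size. A minimal numpy sketch (the frequency series below is made up; in practice it would be read from a data/[makam]/[mbid].pitch file, e.g. with np.loadtxt):

```python
import numpy as np

HOP_SECONDS = 128 / 44100  # ≈ 0.0029 s, as stated in the dataset description

# Made-up frequency values standing in for a real .pitch file
freqs = np.array([220.0, 221.5, 0.0, 219.8])

# Recompute the timestamp of each sample from its index
timestamps = np.arange(len(freqs)) * HOP_SECONDS
pitch_track = np.column_stack([timestamps, freqs])  # (time, frequency) pairs
```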
Moreover, the metadata of each recording is available in the repository, crawled from MusicBrainz using an open-source tool developed by us. The metadata files are saved as data/[makam]/[mbid].json.
For reproducibility purposes, we note the versions of all tools used to generate this dataset in the file algorithms.json (not integrated in the loader, but present in the downloaded dataset).
A complementary toolbox for this dataset is MORTY, a mode recognition and tonic identification toolbox. It can be used and optimized for any modal music culture. Further details are explained in the publication above.
- class mirdata.datasets.compmusic_otmm_makam.Dataset(data_home=None, version='default')[source]¶
The compmusic_otmm_makam dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that inspect the downloaded data. When a checksum does not match, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_mb_tags(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.compmusic_otmm_makam.load_mb_tags
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_pitch(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.compmusic_otmm_makam.load_pitch
- class mirdata.datasets.compmusic_otmm_makam.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
OTMM Makam Track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. default=None If None, looks for the data in the default directory, ~/mir_datasets
- Variables
pitch_path (str) – local path where the pitch annotation is stored
mb_tags_path (str) – local path where the MusicBrainz tags annotation is stored
makam (str) – string referring to the makam represented in the track
tonic (float) – tonic annotation
mbid (str) – MusicBrainz ID of the track
- Other Parameters
pitch (F0Data) – pitch annotation
mb_tags (dict) – dictionary containing the raw editorial track metadata from MusicBrainz
- mirdata.datasets.compmusic_otmm_makam.load_mb_tags(fhandle: TextIO) → dict[source]¶
Load track metadata
- Parameters
fhandle (str or file-like) – path or file-like object pointing to musicbrainz metadata file
- Returns
Dict – metadata of the track
- mirdata.datasets.compmusic_otmm_makam.load_pitch(fhandle: TextIO) → mirdata.annotations.F0Data[source]¶
Load pitch
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a pitch annotation file
- Returns
F0Data – pitch annotation
compmusic_raga¶
CompMusic Raga Dataset Loader
Dataset Info
The rāga datasets from CompMusic comprise two sizable datasets, one for each music tradition: Carnatic and Hindustani. These datasets comprise full-length audio recordings and their associated rāga labels, and can be used to develop and evaluate approaches for automatic rāga recognition in Indian art music.
These datasets are derived from the CompMusic corpora of Indian art music. They were compiled at the Music Technology Group by a group of researchers working on the computational analysis of Carnatic and Hindustani music within the framework of the ERC-funded CompMusic project.
Each recording is associated with an MBID. With the MBID, other information can be obtained using the Dunya API or pycompmusic.
The Carnatic subset comprises 124 hours of audio recordings and editorial metadata that includes carefully curated and verified rāga labels. It contains 480 recordings belonging to 40 rāgas with 12 recordings per rāga.
The Hindustani subset comprises 116 hours of audio recordings and editorial metadata that includes carefully curated and verified rāga labels. It contains 300 recordings belonging to 30 rāgas with 10 recordings per rāga.
The dataset also includes the following features for each file:
Tonic: float indicating the recording tonic
Tonic fine-tuned: float indicating the manually fine-tuned recording tonic
Predominant pitch: automatically-extracted predominant pitch time-series (timestamps and frequency values)
Post-processed pitch: automatically-extracted and post-processed predominant pitch time-series
Nyas segments: KNN-extracted segments of nyas (start and end times provided)
Tani segments: KNN-extracted segments of tanis (start and end times provided)
The dataset includes both txt and json files containing information about each audio recording: its mbid, the paths of the audio/feature files, and the associated rāga identifier. Each rāga is assigned a unique identifier by Dunya, similar in purpose to the mbid. A mapping of each rāga id to its transliterated name is also provided.
For more information about the dataset please refer to: https://compmusic.upf.edu/node/328
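A common first step when working with the tonic and predominant-pitch features above is to normalize pitch to cents relative to the tonic, via 1200·log2(f/tonic). The sketch below uses made-up values (in the real dataset, the tonic comes from the tonic feature and the frequencies from the predominant-pitch time-series); frames with frequency 0 are treated as unvoiced, which is an assumption about the feature format:

```python
import numpy as np

tonic_hz = 146.83  # hypothetical value; would come from the dataset's tonic feature
freqs_hz = np.array([146.83, 293.66, 220.0, 0.0])  # made-up pitch samples; 0.0 = unvoiced

# Convert voiced frames to cents above the tonic; leave unvoiced frames as NaN
voiced = freqs_hz > 0
cents = np.full_like(freqs_hz, np.nan)
cents[voiced] = 1200.0 * np.log2(freqs_hz[voiced] / tonic_hz)
```

The octave above the tonic maps to exactly 1200 cents, which makes melodic contours comparable across recordings with different tonics.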
- class mirdata.datasets.compmusic_raga.Dataset(data_home=None, version='default')[source]¶
The compmusic_raga dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This can be useful behind proxies that inspect the downloaded data. When a checksum does not match, a warning is issued instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.compmusic_raga.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
CompMusic Raga Dataset class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. default=None If None, looks for the data in the default directory, ~/mir_datasets
- Variables
audio_path (str) – path to audio file
tonic_path (str) – path to tonic annotation
tonic_fine_tuned_path (str) – path to tonic fine-tuned annotation
pitch_path (str) – path to pitch annotation
pitch_post_processed_path (str) – path to processed pitch annotation
nyas_segments_path (str) – path to nyas segments annotation
tani_segments_path (str) – path to tani segments annotation
- Other Parameters
tonic (float) – tonic annotation
tonic_fine_tuned (float) – tonic fine-tuned annotation
pitch (F0Data) – pitch annotation
pitch_post_processed (F0Data) – processed pitch annotation
nyas_segments (EventData) – nyas segments annotation
tani_segments (EventData) – tani segments annotation
recording (str) – name of the recording
concert (str) – name of the concert
artist (str) – name of the artist
mbid (str) – mbid of the recording
raga (str) – raga in the recording
ragaid (str) – id of the raga in the recording
tradition (str) – tradition name (carnatic or hindustani)
- property audio¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.compmusic_raga.load_audio(audio_path)[source]¶
Load an audio file.
- Parameters
audio_path (str) – path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.compmusic_raga.load_nyas_segments(fhandle)[source]¶
Load nyas segments
- Parameters
fhandle (str or file-like) – Local path where the nyas segments annotation is stored.
- Returns
EventData – segment annotation
- mirdata.datasets.compmusic_raga.load_pitch(fhandle)[source]¶
Load pitch
- Parameters
fhandle (str or file-like) – Local path where the pitch annotation is stored.
- Returns
F0Data – pitch annotation
dagstuhl_choirset¶
Dagstuhl ChoirSet Dataset Loader
Dataset Info
Dagstuhl ChoirSet (DCS) is a multitrack dataset of a cappella choral music. The dataset includes recordings of an amateur vocal ensemble performing two choir pieces in full choir and quartet settings (total duration 55min 30sec). The audio data was recorded during an MIR seminar at Schloss Dagstuhl using different close-up microphones to capture the individual singers’ voices:
Larynx microphone (LRX): contact microphone attached to the singer’s throat.
Dynamic microphone (DYN): handheld dynamic microphone.
Headset microphone (HSM): microphone close to the singer’s mouth.
LRX, DYN and HSM recordings are provided on the Track level. All tracks in the dataset have an LRX recording, while only a subset has DYN and HSM recordings.
In addition to the close-up microphone tracks, the dataset also provides the following recordings:
Room microphone mixdown (STM): mixdown of the stereo room microphone.
Room microphone left (STL): left channel of the stereo microphone.
Room microphone right (STR): right channel of the stereo microphone.
Room microphone mixdown with reverb (StereoReverb_STM): STM signal with artificial reverb.
Piano left (SPL): left channel of the piano accompaniment.
Piano right (SPR): right channel of the piano accompaniment.
All room microphone and piano recordings are provided on the Multitrack level. All multitracks have room microphone signals, while only a subset has piano recordings.
For more details, we refer to: Sebastian Rosenzweig (1), Helena Cuesta (2), Christof Weiß (1), Frank Scherbaum (3), Emilia Gómez (2,4), and Meinard Müller (1): Dagstuhl ChoirSet: A Multitrack Dataset for MIR Research on Choral Singing. Transactions of the International Society for Music Information Retrieval, 3(1), pp. 98–110, 2020. DOI: https://doi.org/10.5334/tismir.48
International Audio Laboratories Erlangen, DE
Music Technology Group, Universitat Pompeu Fabra, Barcelona, ES
University of Potsdam, DE
Joint Research Centre, European Commission, Seville, ES
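Because the close-up microphone signals live on the Track level while the room mixes live on the Multitrack level, one simple way to build a custom mono mixdown from several singers' close-up signals is to average them. This is an illustrative sketch with synthetic sine tones standing in for real Track audio, not mirdata's own mixing code:

```python
import numpy as np

def mixdown(signals):
    """Average equal-length mono signals into one mono mix."""
    stacked = np.stack(signals)      # shape: (n_signals, n_samples)
    return stacked.mean(axis=0)      # averaging keeps the mix within [-1, 1]

# Synthetic stand-ins for two singers' close-up (e.g. LRX) signals
sr = 22050
t = np.arange(sr) / sr
soprano = np.sin(2 * np.pi * 440.0 * t)
alto = np.sin(2 * np.pi * 330.0 * t)

mix = mixdown([soprano, alto])
```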
- class mirdata.datasets.dagstuhl_choirset.Dataset(data_home=None, version='default')[source]¶
The Dagstuhl ChoirSet dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. If True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
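The partial_download key validation described above can be sketched as a standalone helper (a hypothetical illustration of the documented behavior, not mirdata's actual implementation):

```python
def check_partial_download(remotes, partial_download):
    """Return the list of remote keys to download, raising ValueError
    for keys that do not name an existing remote (as documented above)."""
    if partial_download is None:
        return list(remotes)  # no filter: download everything
    invalid = [key for key in partial_download if key not in remotes]
    if invalid:
        raise ValueError(f"invalid keys for partial_download: {invalid}")
    return list(partial_download)
```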
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
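The behavior documented for get_random_track_splits can be illustrated with a small standalone sketch (a hypothetical reimplementation of the documented behavior, not mirdata's code):

```python
import random

def random_splits(track_ids, splits, seed=42, split_names=None):
    """Shuffle track_ids deterministically and partition them into
    len(splits) groups sized by the given fractions (summing to 1)."""
    ids = list(track_ids)
    random.Random(seed).shuffle(ids)  # same seed -> same partition
    names = split_names or [f"split_{i}" for i in range(len(splits))]
    out, start = {}, 0
    for name, frac in zip(names, splits):
        n = round(frac * len(ids))
        out[name] = ids[start:start + n]
        start += n
    out[names[-1]].extend(ids[start:])  # leftovers go to the last split
    return out
```

With splits=[0.8, 0.2] and ten tracks, this yields an 8/2 partition that is reproducible for a fixed seed.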
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dagstuhl_choirset.load_audio
- load_beat(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dagstuhl_choirset.load_beat
- load_f0(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dagstuhl_choirset.load_f0
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_score(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dagstuhl_choirset.load_score
- class mirdata.datasets.dagstuhl_choirset.MultiTrack(mtrack_id, data_home, dataset_name, index, track_class, metadata)[source]¶
Dagstuhl ChoirSet multitrack class
- Parameters
mtrack_id (str) – multitrack id
data_home (str) – Local path where the dataset is stored. If None, looks for the data in the default directory, ~/mir_datasets/dagstuhl_choirset
- Variables
audio_stm_path (str) – path to room mic (mono mixdown) audio file
audio_str_path (str) – path to room mic (right channel) audio file
audio_stl_path (str) – path to room mic (left channel) audio file
audio_rev_path (str) – path to room mic with artificial reverb (mono mixdown) audio file
audio_spl_path (str) – path to piano accompaniment (left channel) audio file
audio_spr_path (str) – path to piano accompaniment (right channel) audio file
beat_path (str) – path to beat annotation file
- Other Parameters
beat (annotations.BeatData) – Beat annotation
notes (annotations.NoteData) – Note annotation
multif0 (annotations.MultiF0Data) – Aggregate of f0 annotations for tracks
- property audio_rev: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the room mic with artificial reverb (mono mixdown)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_spl: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the piano accompaniment DI (left channel)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_spr: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the piano accompaniment DI (right channel)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_stl: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the room mic (left channel)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_stm: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the room mic (mono mixdown)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_str: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the room mic (right channel)
- Returns
np.ndarray - audio signal
float - sample rate
- get_mix()[source]¶
Create a linear mixture given a subset of tracks.
- Parameters
track_keys (list) – list of track keys to mix together
- Returns
np.ndarray – mixture audio with shape (n_samples, n_channels)
- get_path(key)[source]¶
Get absolute path to multitrack audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- get_random_target(n_tracks=None, min_weight=0.3, max_weight=1.0)[source]¶
Get a random target by combining a random selection of tracks with random weights
- Parameters
n_tracks (int or None) – number of tracks to randomly mix. If None, uses all tracks
min_weight (float) – minimum possible weight when mixing
max_weight (float) – maximum possible weight when mixing
- Returns
np.ndarray - mixture audio with shape (n_samples, n_channels)
list - list of keys of included tracks
list - list of weights used to mix tracks
- get_target(track_keys, weights=None, average=True, enforce_length=True)[source]¶
Get target which is a linear mixture of tracks
- Parameters
track_keys (list) – list of track keys to mix together
weights (list or None) – list of positive scalars to be used in the average
average (bool) – if True, computes a weighted average of the tracks if False, computes a weighted sum of the tracks
enforce_length (bool) – If True, raises ValueError if the tracks are not the same length. If False, pads audio with zeros to match the length of the longest track
- Returns
np.ndarray – target audio with shape (n_channels, n_samples)
- Raises
ValueError – if sample rates of the tracks are not equal if enforce_length=True and lengths are not equal
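The weighted combination that get_target documents can be sketched in a few lines of plain Python (a hypothetical illustration of the documented options, not the library's code):

```python
def mix_tracks(signals, weights=None, average=True):
    """Weighted sum (or weighted average) of equal-length mono signals,
    mirroring the documented get_target options."""
    if weights is None:
        weights = [1.0] * len(signals)
    if len({len(s) for s in signals}) > 1:
        raise ValueError("tracks must be the same length")
    mixed = [sum(w * s[i] for w, s in zip(weights, signals))
             for i in range(len(signals[0]))]
    if average:
        total = sum(weights)
        mixed = [x / total for x in mixed]
    return mixed
```

Setting average=False gives the weighted-sum variant; the length check corresponds to enforce_length=True.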
- class mirdata.datasets.dagstuhl_choirset.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Dagstuhl ChoirSet Track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_dyn_path (str) – dynamic microphone audio path
audio_hsm_path (str) – headset microphone audio path
audio_lrx_path (str) – larynx microphone audio path
f0_crepe_dyn_path (str) – crepe f0 annotation for dynamic microphone path
f0_crepe_hsm_path (str) – crepe f0 annotation for headset microphone path
f0_crepe_lrx_path (str) – crepe f0 annotation for larynx microphone path
f0_pyin_dyn_path (str) – pyin f0 annotation for dynamic microphone path
f0_pyin_hsm_path (str) – pyin f0 annotation for headset microphone path
f0_pyin_lrx_path (str) – pyin f0 annotation for larynx microphone path
f0_manual_lrx_path (str) – manual f0 annotation for larynx microphone path
score_path (str) – score annotation path
- Other Parameters
f0_crepe_dyn (F0Data) – algorithm-labeled (crepe) f0 annotations for dynamic microphone
f0_crepe_hsm (F0Data) – algorithm-labeled (crepe) f0 annotations for headset microphone
f0_crepe_lrx (F0Data) – algorithm-labeled (crepe) f0 annotations for larynx microphone
f0_pyin_dyn (F0Data) – algorithm-labeled (pyin) f0 annotations for dynamic microphone
f0_pyin_hsm (F0Data) – algorithm-labeled (pyin) f0 annotations for headset microphone
f0_pyin_lrx (F0Data) – algorithm-labeled (pyin) f0 annotations for larynx microphone
f0_manual_lrx (F0Data) – manually labeled f0 annotations for larynx microphone
score (NoteData) – time-aligned score representation
- property audio_dyn: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the track’s dynamic microphone (if available)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_hsm: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the track’s headset microphone (if available)
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_lrx: Optional[Tuple[numpy.ndarray, float]]¶
The audio for the track’s larynx microphone (if available)
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.dagstuhl_choirset.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load a Dagstuhl ChoirSet audio file.
- Parameters
fhandle (str or file-like) – file-like object or path to an audio file
- Returns
np.ndarray - the audio signal
float - The sample rate of the audio file
- mirdata.datasets.dagstuhl_choirset.load_beat(fhandle: TextIO) mirdata.annotations.BeatData [source]¶
Load a Dagstuhl ChoirSet beat annotation.
- Parameters
fhandle (str or file-like) – File-like object or path to beat annotation file
- Returns
BeatData Object - the beat annotation
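A beat annotation file can also be parsed by hand; the sketch below assumes a simple two-column CSV layout of (time in seconds, beat position), which is an assumption for illustration rather than the exact on-disk format:

```python
import csv
import io

def parse_beats(text):
    """Parse a hypothetical two-column (time_sec, beat_position) CSV
    into parallel lists, the shape of data a BeatData object holds."""
    times, positions = [], []
    for row in csv.reader(io.StringIO(text)):
        times.append(float(row[0]))
        positions.append(int(row[1]))
    return times, positions
```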
- mirdata.datasets.dagstuhl_choirset.load_f0(fhandle: TextIO) mirdata.annotations.F0Data [source]¶
Load a Dagstuhl ChoirSet F0-trajectory.
- Parameters
fhandle (str or file-like) – File-like object or path to F0 file
- Returns
F0Data Object - the F0-trajectory
- mirdata.datasets.dagstuhl_choirset.load_score(fhandle: TextIO) mirdata.annotations.NoteData [source]¶
Load a Dagstuhl ChoirSet time-aligned score representation.
- Parameters
fhandle (str or file-like) – File-like object or path to score representation file
- Returns
NoteData Object - the time-aligned score representation
dali¶
DALI Dataset Loader
Dataset Info
DALI contains 5358 audio files with their time-aligned vocal melody. It also contains time-aligned lyrics at four levels of granularity: notes, words, lines, and paragraphs.
For each song, DALI also provides additional metadata: genre, language, musician, album covers, or links to video clips.
For more details, please visit: https://github.com/gabolsgabs/DALI
- class mirdata.datasets.dali.Dataset(data_home=None, version='default')[source]¶
The dali dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. If True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_annotations_class(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dali.load_annotations_class
- load_annotations_granularity(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dali.load_annotations_granularity
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.dali.load_audio
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.dali.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
DALI melody Track class
- Parameters
track_id (str) – track id of the track
- Variables
album (str) – the track’s album
annotation_path (str) – path to the track’s annotation file
artist (str) – the track’s artist
audio_path (str) – path to the track’s audio file
audio_url (str) – youtube ID
dataset_version (int) – dataset annotation version
ground_truth (bool) – True if the annotation is verified
language (str) – sung language
release_date (str) – year the track was released
scores_manual (int) – manual score annotations
scores_ncc (float) – ncc score annotations
title (str) – the track’s title
track_id (str) – the unique track id
url_working (bool) – True if the youtube url was valid
- Other Parameters
notes (NoteData) – vocal notes
words (LyricData) – word-level lyrics
lines (LyricData) – line-level lyrics
paragraphs (LyricData) – paragraph-level lyrics
annotation-object (DALI.Annotations) – DALI annotation object
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.dali.load_annotations_class(annotations_path)[source]¶
Load full annotations into the DALI class object
- Parameters
annotations_path (str) – path to a DALI annotation file
- Returns
DALI.annotations – DALI annotations object
- mirdata.datasets.dali.load_annotations_granularity(annotations_path, granularity)[source]¶
Load annotations at the specified level of granularity
- Parameters
annotations_path (str) – path to a DALI annotation file
granularity (str) – one of ‘notes’, ‘words’, ‘lines’, ‘paragraphs’
- Returns
NoteData for granularity=’notes’ or LyricData otherwise
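The granularity dispatch documented above can be sketched as follows (a hypothetical helper; the return-type names follow the documentation):

```python
VALID_GRANULARITIES = ("notes", "words", "lines", "paragraphs")

def annotation_return_type(granularity):
    """Return the documented annotation type for a granularity level:
    NoteData for 'notes', LyricData for every other valid level."""
    if granularity not in VALID_GRANULARITIES:
        raise ValueError(f"granularity must be one of {VALID_GRANULARITIES}")
    return "NoteData" if granularity == "notes" else "LyricData"
```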
- mirdata.datasets.dali.load_audio(fhandle: BinaryIO) Optional[Tuple[numpy.ndarray, float]] [source]¶
Load a DALI audio file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
da_tacos¶
Da-TACOS Dataset Loader
Dataset Info
Da-TACOS: a dataset for cover song identification and understanding. It contains two subsets: the benchmark subset (for benchmarking cover song identification systems) and the cover analysis subset (for analyzing the links among cover songs), with pre-extracted features and metadata for 15,000 and 10,000 songs, respectively. The annotations included in the metadata were obtained with the API of SecondHandSongs.com. All audio files used for feature extraction are encoded in MP3 format with a sample rate of 44.1 kHz. Da-TACOS does not contain any audio files. For our analyses of modifiable musical characteristics using the cover analysis subset, and for our initial benchmarking of 7 state-of-the-art cover song identification algorithms on the benchmark subset, see our publication.
For organizing the data, we use the structure of SecondHandSongs where each song is called a ‘performance’, and each clique (cover group) is called a ‘work’. Based on this, the file names of the songs are their unique performance IDs (PID, e.g. P_22), and their labels with respect to their cliques are their work IDs (WID, e.g. W_14).
Metadata for each song includes:
performance title
performance artist
work title
work artist
release year
SecondHandSongs.com performance ID
SecondHandSongs.com work ID
whether the song is instrumental or not
In addition, we matched the original metadata with MusicBrainz to obtain MusicBrainz IDs (MBIDs), song lengths and genre/style tags. Note that MusicBrainz-related information is not available for all the songs in Da-TACOS, and since we used only our own metadata for matching, we include all possible MBIDs for a particular song.
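The performance/work naming scheme described above makes it straightforward to group recordings into cliques; a minimal sketch with hypothetical IDs:

```python
from collections import defaultdict

# Hypothetical (performance_id, work_id) pairs following the
# P_*/W_* naming scheme described above
labels = [("P_22", "W_14"), ("P_23", "W_14"), ("P_40", "W_7")]

cliques = defaultdict(list)
for pid, wid in labels:
    cliques[wid].append(pid)  # all covers of the same work share a WID
```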
To facilitate reproducibility in cover song identification (CSI) research, we propose a framework for feature extraction and benchmarking in our supplementary repository, acoss. The feature extraction component helps CSI researchers find the most commonly used features for CSI in a single place. The parameter values we used to extract the features in Da-TACOS are shared in the same repository. Moreover, the benchmarking component includes our implementations of 7 state-of-the-art CSI systems, along with the results of an initial benchmarking of those systems on the benchmark subset of Da-TACOS. We encourage other CSI researchers to contribute to acoss by implementing their favorite feature extraction algorithms and CSI systems, building up a knowledge base through which CSI research can reach larger audiences.
Pre-extracted features:
The list of features included in Da-TACOS can be seen below. All the features are extracted with the acoss repository, which uses open-source feature extraction libraries such as Essentia, LibROSA, and Madmom.
To facilitate the use of the dataset, we provide two options regarding the file structure.
1. In da-tacos_benchmark_subset_single_files and da-tacos_coveranalysis_subset_single_files folders, we organize the data based on their respective cliques, and one file contains all the features for that particular song.
{
"chroma_cens": numpy.ndarray,
"crema": numpy.ndarray,
"hpcp": numpy.ndarray,
"key_extractor": {
"key": numpy.str_,
"scale": numpy.str_,
"strength": numpy.float64
},
"madmom_features": {
"novfn": numpy.ndarray,
"onsets": numpy.ndarray,
"snovfn": numpy.ndarray,
"tempos": numpy.ndarray
},
"mfcc_htk": numpy.ndarray,
"tags": list of (numpy.str_, numpy.str_),
"label": numpy.str_,
"track_id": numpy.str_
}
2. In da-tacos_benchmark_subset_FEATURE and da-tacos_coveranalysis_subset_FEATURE folders, the data is organized by clique as well, but each of these folders contains only one feature per song. For instance, if you want to test a system that uses HPCP features, you can download da-tacos_benchmark_subset_hpcp to access the pre-computed HPCP features. An example of the contents of those files can be seen below:
{
"hpcp": numpy.ndarray,
"label": numpy.str_,
"track_id": numpy.str_
}
- class mirdata.datasets.da_tacos.Dataset(data_home=None, version='default')[source]¶
The Da-TACOS dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- benchmark_tracks()[source]¶
Load from Da-TACOS dataset the benchmark subset tracks.
- Returns
dict – {track_id: track data}
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- coveranalysis_tracks()[source]¶
Load from Da-TACOS dataset the coveranalysis subset tracks.
- Returns
dict – {track_id: track data}
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. If True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- filter_index(search_key)[source]¶
Load from Da-TACOS genre dataset the indexes that match with search_key.
- Parameters
search_key (str) – regex to match with folds, mbid or genres
- Returns
dict – {track_id: track data}
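The regex matching that filter_index performs can be sketched against a toy index (hypothetical track ids and key layout, for illustration only):

```python
import re

def filter_index(index, search_key):
    """Keep only the index entries whose key matches the regex."""
    return {tid: data for tid, data in index.items()
            if re.search(search_key, tid)}

# Hypothetical index keyed by subset#performance_id
index = {"benchmark#P_40": {}, "coveranalysis#P_22": {}}
```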
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_cens(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_cens
- load_crema(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_crema
- load_hpcp(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_hpcp
- load_key(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_key
- load_madmom(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_madmom
- load_mfcc(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_mfcc
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_tags(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.da_tacos.load_tags
- class mirdata.datasets.da_tacos.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
da_tacos track class
- Parameters
track_id (str) – track id of the track
- Variables
subset (str) – subset which the track belongs to
work_id (str) – id of work’s original track
label (str) – alias of work_id
performance_id (str) – id of cover track
cens_path (str) – cens annotation path
crema_path (str) – crema annotation path
hpcp_path (str) – hpcp annotation path
key_path (str) – key annotation path
madmom_path (str) – madmom annotation path
mfcc_path (str) – mfcc annotation path
tags_path (str) – tags annotation path
- Properties:
work_title (str): title of the work
work_artist (str): original artist of the work
performance_title (str): title of the performance
performance_artist (str): artist of the performance
release_year (str): release year
is_instrumental (bool): True if the track is instrumental
performance_artist_mbid (str): musicbrainz id of the performance artist
mb_performances (dict): musicbrainz ids of performances
- Other Parameters
cens (np.ndarray) – chroma-cens features
hpcp (np.ndarray) – hpcp features
key (dict) – key data, with keys ‘key’, ‘scale’, and ‘strength’
madmom (dict) – dictionary of madmom analysis features
mfcc (np.ndarray) – mfcc features
tags (list) – list of tags
- mirdata.datasets.da_tacos.load_cens(fhandle: BinaryIO)[source]¶
Load Da-TACOS cens features from a file
- Parameters
fhandle (str or file-like) – File-like object or path to chroma-cens file
- Returns
np.ndarray – cens features
- mirdata.datasets.da_tacos.load_crema(fhandle: BinaryIO)[source]¶
Load Da-TACOS crema features from a file
- Parameters
fhandle (str or file-like) – File-like object or path to crema file
- Returns
np.ndarray – crema features
- mirdata.datasets.da_tacos.load_hpcp(fhandle: BinaryIO)[source]¶
Load Da-TACOS hpcp features from a file
- Parameters
fhandle (str or file-like) – File-like object or path to hpcp file
- Returns
np.ndarray – hpcp features
- mirdata.datasets.da_tacos.load_key(fhandle: BinaryIO)[source]¶
Load Da-TACOS key features from a file.
- Parameters
fhandle (str or file-like) – File-like object or path to key file
- Returns
dict – key, mode and confidence
Examples
{'key': 'C', 'scale': 'major', 'strength': 0.8449875116348267}
- mirdata.datasets.da_tacos.load_madmom(fhandle: BinaryIO)[source]¶
Load Da-TACOS madmom features from a file
- Parameters
fhandle (str or file-like) – File-like object or path to madmom file
- Returns
dict – madmom features, with keys ‘novfn’, ‘onsets’, ‘snovfn’, ‘tempos’
egfxset¶
EGFxSet Dataset Loader
Dataset Info
EGFxSet (Electric Guitar Effects dataset) features recordings of all clean tones on a 22-fret Stratocaster, captured with 5 different pickup configurations and processed through 12 popular guitar effects. The dataset was recorded with real hardware, making it relevant for music information retrieval tasks on real music. Annotations for the parameter settings of each effect are also included.
EGFxSet comprises 8,970 audio files, each 5 seconds long, for a total duration of 12 hours and 28 minutes.
All 138 possible notes of a standard-tuning, 22-fret guitar were recorded with each of the 5 pickup configurations, giving a total of 690 clean-tone audio files (58 min).
The 690 clean audio files (58 min) were processed through 12 different audio effects using actual guitar gear (no VST emulations), yielding a total of 8,280 processed audio files (11 hours 30 min).
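The file counts above follow from simple arithmetic, which can be checked directly:

```python
strings = 6
frets = 22
notes = strings * (frets + 1)            # 138: 22 frets + open string per string
pickup_configs = 5
clean_files = notes * pickup_configs     # 690 clean-tone recordings
effects = 12
processed_files = clean_files * effects  # 8280 effect-processed recordings
total_files = clean_files + processed_files  # 8970 files in EGFxSet
```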
The effects are divided into four categories, with three effects per category. In some cases, more than one effect was taken from the same piece of guitar equipment.
Categories, Models and Effects:
- Distortion:
- Boss BD-2:
Blues Driver
- Ibanez Minitube Screamer:
Tube Screamer
- ProCo RAT2:
Distortion
- Modulation:
- Boss CE-3:
Chorus
- MXR Phase 45:
Phaser
- Mooer E-Lady:
Flanger
- Delays:
- Line6 DL-4:
Digital Delay, Tape Echo, Sweep Echo
- Reverb:
- Orange CR-60 Combo Amplifier:
Plate Reverb, Hall Reverb, Spring Reverb
Annotations are labeled by a trained electric guitar musician. For each tone, we provide:
Guitar string number
Fret number
Guitar pickup configuration
Effect name
Effect type
Hardware modes
Knob names
Knob types
Knob settings
The dataset website is: https://egfxset.github.io/
The data can be accessed here: https://zenodo.org/record/7044411#.YxKdSWzMKEI
An ISMIR extended abstract was presented in 2022: https://ismir2022.ismir.net/program/lbd/
This dataset was conceived during Iran Roman’s “Deep Learning for Music Information Retrieval” course, taught in the music technology postgraduate program at UNAM (Universidad Nacional Autónoma de México). The result is a combined effort between two UNAM postgraduate students (Hegel Pedroza and Gerardo Meza) and Iran Roman (NYU).
- class mirdata.datasets.egfxset.Dataset(data_home=None, version='default')[source]¶
The EGFxSet dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. If True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
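The seeded random splitting that get_random_track_splits describes can be sketched as follows. This is an illustrative reimplementation of the idea, not mirdata's internal code; the function name and the track ids are hypothetical:

```python
import random

def random_track_splits(track_ids, splits, seed=42, split_names=None):
    """Partition track_ids into len(splits) groups with the given proportions."""
    assert abs(sum(splits) - 1.0) < 1e-9, "split proportions should sum to 1"
    ids = list(track_ids)
    random.Random(seed).shuffle(ids)  # fixed seed -> reproducible partitions
    names = split_names or [f"split_{i}" for i in range(len(splits))]
    out, start = {}, 0
    for name, frac in zip(names, splits):
        end = start + round(frac * len(ids))
        out[name] = ids[start:end]
        start = end
    out[names[-1]].extend(ids[start:])  # rounding leftovers go to the last split
    return out
```

Calling it twice with the same seed returns identical partitions, which is the reproducibility property the seed parameter is for.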
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.egfxset.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
EGFxSet Track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – path to the track’s audio file
stringfret_tuple (list) – an array with the tuple of the note recorded
pickup_configuration (string) – the pickup used in the recording
effect (str) – the effect recorded
model (str) – the model of the hardware used
effect_type (str) – the type of effect used (distortion, modulation, delay or reverb)
knob_names (list) – an array with the knob names of the effect used or “None” when the recording is a clean effect sound
knob_type (list) – an array with the type of knobs of the effect used or “None” when the recording is a clean effect sound
setting (list) – the setting of the effect recorded or “None” when the recording is a clean effect sound
- Other Parameters
note_name (list) – a list with the note name annotation of the audio file (e.g. “Ab5”, “C6”, etc.)
midinote (NoteData) – the midinote annotation of the audio file
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
Solo guitar audio (mono)
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.egfxset.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load EGFxSet guitar audio
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - audio signal
float - sample rate
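mirdata's audio loaders typically rely on librosa; as a rough stdlib-only illustration of the "load audio, return (signal, sample rate)" contract that load_audio follows, one might write (assuming a 16-bit PCM mono WAV; the function name is hypothetical):

```python
import struct
import wave

def load_wav_mono(path):
    """Read a 16-bit PCM mono WAV; return (samples as floats in [-1, 1], sample rate)."""
    with wave.open(path, "rb") as f:
        assert f.getnchannels() == 1 and f.getsampwidth() == 2
        sr = f.getframerate()
        raw = f.readframes(f.getnframes())
    # '<h' = little-endian signed 16-bit; scale to the [-1, 1] float range
    samples = [s / 32768.0 for s in struct.unpack(f"<{len(raw) // 2}h", raw)]
    return samples, sr
```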
filosax¶
Filosax Dataset Loader
Dataset Info
The Filosax dataset was conceived, curated and compiled by Dave Foster (a PhD student on the AIM programme at QMUL) and his supervisor Simon Dixon (C4DM @ QMUL). The dataset is a collection of 48 multitrack jazz recordings, where each piece has 8 corresponding audio files:
The original Aebersold backing track (stereo)
Bass_Drums, a mono file of a mix of bass and drums
Piano_Drums, a mono file of a mix of piano and drums
Participant 1 Sax, a mono file of solo saxophone
Participant 2 Sax, a mono file of solo saxophone
Participant 3 Sax, a mono file of solo saxophone
Participant 4 Sax, a mono file of solo saxophone
Participant 5 Sax, a mono file of solo saxophone
Each piece is ~6 minutes long, so each of the 8 stems contains ~5 hours of audio
For each piece, there is a corresponding .jams file containing piece-level annotations:
Beat annotation for the start of each bar and any mid-bar chord change
Chord annotation for each bar, and mid-bar chord change
- Section annotation for when the solo changes between the 3 categories:
head (melody)
written solo (interpretation of transcribed solo)
improvised solo
For each Sax recording (5 per piece), there is a corresponding .json file containing note annotations (see Note object).
The Participant folders also contain MIDI files of the transcriptions (frame level and score level) as well as a PDF and MusicXML of the typeset solo.
The dataset comes in 2 flavours: full (all 48 tracks and 5 sax players) and lite (5 tracks and 2 sax players). Both flavours can be used with or without the backing tracks (which need to be purchased online). Hence, when opening the dataset, use one of 4 versions: ‘full’, ‘full_sax’, ‘lite’, ‘lite_sax’.
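Following the mirdata.initialize pattern shown at the top of this page, selecting a flavour might look like the sketch below. The mirdata calls are commented out because the backing tracks must be purchased separately; the helper function is hypothetical, while the version strings come from the description above:

```python
FILOSAX_VERSIONS = ("full", "full_sax", "lite", "lite_sax")

def check_filosax_version(version):
    """Validate a requested Filosax flavour before initializing the dataset."""
    if version not in FILOSAX_VERSIONS:
        raise ValueError(
            f"unknown Filosax version {version!r}; expected one of {FILOSAX_VERSIONS}"
        )
    return version

# Typical usage (requires mirdata to be installed):
# import mirdata
# filosax = mirdata.initialize("filosax", version=check_filosax_version("lite_sax"))
# filosax.download()
# track = filosax.choice_track()
```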
- class mirdata.datasets.filosax.Dataset(data_home=None, version='default')[source]¶
The Filosax dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when downloading behind proxies that inspect the data. When a checksum differs, a warning is printed instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.filosax.MultiTrack(mtrack_id, data_home, dataset_name, index, track_class, metadata)[source]¶
Filosax multitrack class
- Parameters
mtrack_id (str) – multitrack id
data_home (str) – Local path where the dataset is stored. If None, looks for the data in the default directory, ~/mir_datasets/Filosax
- Variables
mtrack_id (str) – track id
tracks (dict) – {track_id: Track}
track_audio_property (str) – the name of the attribute of Track which returns the audio to be mixed
name (str) – the name of the tune
duration (float) – the duration, in seconds
beats (list, Observation) – the time and beat numbers of bars and chord changes
chords (list, Observation) – the time of chord changes
segments (list, Observation) – the time of segment changes
bass_drums (Track) – the associated bass/drums track
piano_drums (Track) – the associated piano/drums track
sax (list, Track) – a list of associated sax tracks
- Other Parameters
annotation (jams.JAMS) – a .jams file containing the annotations
- annotation¶
.jams file
- Type
output type
- property bass_drums¶
The associated bass/drums track
- Returns
Track
- property beats¶
The times of downbeats and chord changes
- Returns
(SortedKeyList, Observation) - timestamp, duration (seconds), beat
- property chords¶
The times and values of chord changes
- Returns
(SortedKeyList, Observation) - timestamp, duration (seconds), chord symbol
- property duration¶
The track’s duration
- Returns
float - track duration (in seconds)
- get_mix()[source]¶
Create a linear mixture given a subset of tracks.
- Parameters
track_keys (list) – list of track keys to mix together
- Returns
np.ndarray – mixture audio with shape (n_samples, n_channels)
- get_path(key)[source]¶
Get absolute path to multitrack audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- get_random_target(n_tracks=None, min_weight=0.3, max_weight=1.0)[source]¶
Get a random target by combining a random selection of tracks with random weights
- Parameters
n_tracks (int or None) – number of tracks to randomly mix. If None, uses all tracks
min_weight (float) – minimum possible weight when mixing
max_weight (float) – maximum possible weight when mixing
- Returns
np.ndarray - mixture audio with shape (n_samples, n_channels)
list - list of keys of included tracks
list - list of weights used to mix tracks
- get_target(track_keys, weights=None, average=True, enforce_length=True)[source]¶
Get target which is a linear mixture of tracks
- Parameters
track_keys (list) – list of track keys to mix together
weights (list or None) – list of positive scalars to be used in the average
average (bool) – if True, computes a weighted average of the tracks if False, computes a weighted sum of the tracks
enforce_length (bool) – If True, raises ValueError if the tracks are not the same length. If False, pads audio with zeros to match the length of the longest track
- Returns
np.ndarray – target audio with shape (n_channels, n_samples)
- Raises
ValueError – if sample rates of the tracks are not equal if enforce_length=True and lengths are not equal
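The weighted mixing that get_target describes can be sketched with numpy. This is illustrative only; the real method operates on Track audio, checks sample rates, and handles the enforce_length option. The function name is hypothetical:

```python
import numpy as np

def mix_tracks(signals, weights=None, average=True):
    """Linearly combine equal-length mono signals with optional per-track weights."""
    signals = [np.asarray(s, dtype=float) for s in signals]
    if len({len(s) for s in signals}) != 1:
        raise ValueError("all tracks must have the same length")
    w = np.ones(len(signals)) if weights is None else np.asarray(weights, dtype=float)
    mix = sum(wi * si for wi, si in zip(w, signals))
    if average:
        mix = mix / w.sum()  # weighted average rather than weighted sum
    return mix
```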
- property name¶
The track’s name
- Returns
str - track name
- property piano_drums¶
The associated piano/drums track
- Returns
Track
- property sax¶
The associated sax tracks (1-5)
- Returns
(list, Track)
- property segments¶
The times of segment changes (values are ‘head’, ‘written solo’, ‘improvised solo’)
- Returns
(SortedKeyList, Observation) - timestamp, duration (seconds), beat
- class mirdata.datasets.filosax.Note(input_dict)[source]¶
Filosax Note class - dictionary wrapper to give dot properties
- Parameters
input_dict (dict) – dictionary of attributes
- Variables
a_start_time (float) – the time stamp of the note start, in seconds
a_end_time (float) – the time stamp of the note end, in seconds
a_duration (float) – the duration of the note, in seconds
a_onset_time (float) – the onset time (compared to a_start_time) (filosax_full only, 0.0 otherwise)
midi_pitch (int) – the quantised midi pitch
crochet_num (int) – the number of sub-divisions which define a crochet (always 24)
musician (int) – the participant ID
bar_num (int) – the bar number of the start of the note
s_start_time (float) – the time stamp of the score note start, in seconds
s_duration (float) – the duration of the score note, in seconds
s_end_time (float) – the time stamp of the score note end, in seconds
s_rhythmic_duration (int) – the duration of the score note (compared to crochet_num)
s_rhythmic_position (int) – the position in the bar of the score note start (compared to crochet_num)
tempo (float) – the tempo at the start of the note, in beats per minute
bar_type (int) – the section annotation where 0 = head, 1 = written solo, 2 = improvised solo
is_grace (bool) – is the note a grace note, associated with the following note
chord_changes (dict) – the chords, where the key is the rhythmic position of the chord (using crochet_num, relative to s_rhythmic_position) and the value a JAMS chord annotation (An additional chord is added in the case of a quaver at the end of the bar, followed by a rest on the downbeat)
num_chord_changes (int) – the number of chords which accompany the note (usually 1, sometimes >1 for long notes)
main_chord_num (int) – usually 0, sometimes 1 in the quaver case described above
scale_changes (list, int) – the degree of the chromatic scale when midi_pitch is compared to chord_root
loudness_max_val (float) – the value (db) of the maximum loudness
loudness_max_time (float) – the time (seconds) of the maximum loudness (compared to a_start_time)
loudness_curve (list, float) – the inter-note loudness values, 1 per millisecond
pitch_average_val (float) – the value (midi) of the average pitch
pitch_average_time (float) – the time (seconds) of the average pitch (compared to a_start_time)
pitch_curve (list, float) – the inter-note pitch values, 1 per millisecond
pitch_vib_freq (float) – the vibrato frequency (Hz), 0.0 if no vibrato detected
pitch_vib_ext (float) – the vibrato extent (midi), 0.0 if no vibrato detected
spec_cent (float) – the spectral centroid value at the time of the maximum loudness
spec_flux (float) – the spectral flux value at the time of the maximum loudness
spec_cent_curve (list, float) – the inter-note spectral centroid values, 1 per millisecond
spec_flux_curve (list, float) – the inter-note spectral flux values, 1 per millisecond
seq_len (int) – the length of the phrase in which the note falls (filosax_full only, -1 otherwise)
seq_num (int) – the note position in the phrase (filosax_full only, -1 otherwise)
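The "dictionary wrapper to give dot properties" idea behind the Note class can be sketched in a few lines. This is a generic illustration, not the actual Filosax implementation; the attribute names in the example are taken from the list above:

```python
class DotDict:
    """Wrap a dict so its keys are readable as attributes."""

    def __init__(self, input_dict):
        self._data = dict(input_dict)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)

note = DotDict({"midi_pitch": 61, "bar_num": 4, "is_grace": False})
```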
- class mirdata.datasets.filosax.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Filosax track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – path to audio file
annotation_path (str) – path to annotation file
midi_path (str) – path to MIDI file
musicXML_path (str) – path to musicXML file
pdf_path (str) – path to PDF file
- Other Parameters
notes (list, Note) – an ordered list of Note objects
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- notes¶
The track’s note list - only for Sax files
- Returns
[Note] - ordered list of Note objects (empty if Backing file)
- mirdata.datasets.filosax.load_annotation(fhandle: TextIO) List[mirdata.datasets.filosax.Note] [source]¶
Load a Filosax annotation file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to an audio file
- Returns
(list, Note) – an ordered list of Note objects
- mirdata.datasets.filosax.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load a Filosax audio file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to an audio file
- Returns
np.ndarray - the audio signal
float - The sample rate of the audio file
four_way_tabla¶
Four-Way Tabla Stroke Transcription and Classification Loader
Dataset Info
The Four-Way Tabla Dataset includes audio recordings of tabla solo with onset annotations for particular strokes types. This dataset was published in 2021 in the context of ISMIR2021 (Online), and may be used for tasks related to tabla analysis, including problems such as onset detection and stroke classification.
Total audio samples: there are 226 samples for training and 10 for testing. Each audio clip is approximately 1 minute long.
Audio specifications:
Sampling frequency: 44.1 kHz
Bit-depth: 16 bit
Audio format: .wav
Dataset usage: This dataset may be used for data-driven research on tabla stroke transcription and identification. Four characteristic tabla strokes are considered.
Dataset structure: The dataset is split in two subsets, containing training and testing samples. Within each subset, there is a folder containing the audios, and another folder containing the onset annotations. The onset annotations are organized in one folder per stroke type: b, d, rb, rt. Therefore, the paths to onsets look like:
train/onsets/<StrokeType>/<ID>.onsets
The dataset is made available by CompMusic under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) License.
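Reading one of the per-stroke annotation files above might look like the sketch below, assuming each .onsets file is a plain-text file with one onset time (in seconds) per line; that format, and the function name, are assumptions for illustration:

```python
def load_onsets(fhandle):
    """Parse onset times (seconds) from a text file handle, one value per line."""
    # Skip blank lines; each remaining line is assumed to be a float timestamp
    return [float(line) for line in fhandle if line.strip()]
```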
- class mirdata.datasets.four_way_tabla.Dataset(data_home=None, version='default')[source]¶
The Four-Way Tabla dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when downloading behind proxies that inspect the data. When a checksum differs, a warning is printed instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.four_way_tabla.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Four-Way Tabla track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored.
- Variables
track_id (str) – track id
audio_path (str) – audio path
onsets_b_path (str) – path to B onsets
onsets_d_path (str) – path to D onsets
onsets_rb_path (str) – path to RB onsets
onsets_rt_path (str) – path to RT onsets
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- property onsets_b: Optional[mirdata.annotations.BeatData]¶
Onsets for stroke B
- Returns
annotations.BeatData - onsets annotation
- property onsets_d: Optional[mirdata.annotations.BeatData]¶
Onsets for stroke D
- Returns
annotations.BeatData - onsets annotation
- property onsets_rb: Optional[mirdata.annotations.BeatData]¶
Onsets for stroke RB
- Returns
annotations.BeatData - onsets annotation
- property onsets_rt: Optional[mirdata.annotations.BeatData]¶
Onsets for stroke RT
- Returns
annotations.BeatData - onsets annotation
- mirdata.datasets.four_way_tabla.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load a Four-Way Tabla Dataset audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
freesound_one_shot_percussive_sounds¶
Freesound One-Shot Percussive Sounds Dataset Loader
Dataset Info
Introduction:
This dataset contains 10254 one-shot (single event) percussive sounds from freesound.org, a timbral analysis computed by two different extractors (FreesoundExtractor from Essentia and AudioCommons Extractor), and a list of tags. There is also metadata about each audio file, since the audio specifications are not the same across all dataset tracks. The analysis data was used to train the generative model for “Neural Percussive Synthesis Parameterised by High-Level Timbral Features”.
Dataset Construction:
To collect this dataset, the following steps were performed:
Freesound was queried with words associated with percussive instruments, such as “percussion”, “kick”, “wood” or “clave”, and only sounds with less than one second of effective duration were selected.
This stage retrieved some audio clips that contained multiple sound events or were of low quality, so all retrieved sounds were auditioned and such clips were manually discarded using the percussive-annotator (https://github.com/xavierfav/percussive-annotator), a tool for annotating datasets of percussive sounds.
The sounds were then cut or padded to 1-second length, normalized and downsampled to 16 kHz.
Finally, the sounds were analyzed with the AudioCommons Extractor to obtain the AudioCommons timbral descriptors.
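The cut-or-pad and normalization step described above can be sketched with numpy (illustrative only; the 1-second length and 16 kHz rate follow the description, the function name is hypothetical):

```python
import numpy as np

def to_one_second(signal, sr=16000):
    """Cut or zero-pad a signal to exactly 1 second at `sr`, then peak-normalize."""
    x = np.asarray(signal, dtype=float)[:sr]  # cut anything past 1 second
    x = np.pad(x, (0, sr - len(x)))           # zero-pad short clips
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x
```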
Authors and Contact:
This dataset was developed by António Ramires, Pritish Chadna, Xavier Favory, Emilia Gómez and Xavier Serra. For any questions related to this dataset, please contact: António Ramires (antonio.ramires@upf.edu / aframires@gmail.com)
Acknowledgements:
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068 (MIP-Frontiers). This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 770376 (TROMPA).
- class mirdata.datasets.freesound_one_shot_percussive_sounds.Dataset(data_home=None, version='default')[source]¶
The Freesound One-Shot Percussive Sounds dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when downloading behind proxies that inspect the data. When a checksum differs, a warning is printed instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. As many splits as there are elements in the list will be returned
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.freesound_one_shot_percussive_sounds.load_audio
- load_file_metadata(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.freesound_one_shot_percussive_sounds.load_file_metadata
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.freesound_one_shot_percussive_sounds.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Freesound one-shot percussive sounds track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. If None, looks for the data in the default directory, ~/mir_datasets/freesound_one_shot_percussive_sounds
- Variables
file_metadata_path (str) – local path where the analysis file is stored and from where we get the file metadata
audio_path (str) – local path where audio file is stored
track_id (str) – track id
filename (str) – filename of the track
username (str) – username of the Freesound uploader of the track
license (str) – link to license of the track file
tags (list) – list of tags of the track
freesound_preview_urls (dict) – dict of Freesound previews urls of the track
freesound_analysis (str) – dict of analysis parameters computed in Freesound using Essentia extractor
audiocommons_analysis (str) – dict of analysis parameters computed using AudioCommons Extractor
- Other Parameters
file_metadata (dict) – metadata parameters of the track file in form of Python dictionary
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.freesound_one_shot_percussive_sounds.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load the track audio file.
- Parameters
fhandle (str) – path to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
giantsteps_key¶
giantsteps_key Dataset Loader
Dataset Info
The GiantSteps+ EDM Key Dataset includes 600 two-minute sound excerpts from various EDM subgenres, annotated with single-key labels, comments and confidence levels by Daniel G. Camhi, and revised and expanded by Ángel Faraldo at MTG UPF. Additionally, 500 tracks have been thoroughly analysed, containing pitch-class set descriptions, key changes, and additional modal changes. This dataset is a revision of the original GiantSteps Key Dataset, available on GitHub (<https://github.com/GiantSteps/giantsteps-key-dataset>) and initially described in:
Knees, P., Faraldo, Á., Herrera, P., Vogl, R., Böck, S., Hörschläger, F., Le Goff, M. (2015).
Two Datasets for Tempo Estimation and Key Detection in Electronic Dance Music Annotated from User Corrections.
In Proceedings of the 16th International Society for Music Information Retrieval Conference, 364–370. Málaga, Spain.
The original audio samples belong to online audio snippets from Beatport, an online music store for DJs and Electronic Dance Music producers (<http://www.beatport.com>). If this dataset is used in further research, we would appreciate citation of the current DOI (10.5281/zenodo.1101082) and the following doctoral dissertation, where a detailed description of the properties of this dataset can be found:
Ángel Faraldo (2017). Tonality Estimation in Electronic Dance Music: A Computational and Musically Informed Examination.
PhD Thesis. Universitat Pompeu Fabra, Barcelona.
This dataset is mainly intended to assess the performance of computational key estimation algorithms in electronic dance music subgenres.
All the data of this dataset is licensed with Creative Commons Attribution Share Alike 4.0 International.
- class mirdata.datasets.giantsteps_key.Dataset(data_home=None, version='default')[source]¶
The giantsteps_key dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. When True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
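A minimal sketch of the checksum policy described above: a mismatched checksum raises IOError unless allow_invalid_checksum is True, in which case it only warns. The helper below is illustrative, not mirdata's actual implementation:

```python
# Sketch of the documented allow_invalid_checksum behavior (not mirdata's
# real code): verify an MD5 checksum, warning instead of raising when the
# caller opts in.
import hashlib
import warnings

def check_md5(data: bytes, expected_md5: str, allow_invalid_checksum: bool = False) -> None:
    actual = hashlib.md5(data).hexdigest()
    if actual != expected_md5:
        msg = f"Checksum mismatch: expected {expected_md5}, got {actual}"
        if allow_invalid_checksum:
            warnings.warn(msg)  # tolerate, e.g. behind an inspecting proxy
        else:
            raise IOError(msg)

payload = b"some downloaded bytes"
check_md5(payload, hashlib.md5(payload).hexdigest())  # passes silently
```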
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
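An illustrative re-implementation of the split logic documented above: shuffle the track ids with a fixed seed, then cut the list according to the given fractions. This mirrors the documented behavior (fractions summing to 1, seeded reproducibility) but is not mirdata's exact code:

```python
# Sketch of seeded random splitting as documented above; not mirdata's
# actual implementation.
import random

def random_splits(track_ids, splits, seed=42, split_names=None):
    if abs(sum(splits) - 1.0) > 1e-8:
        raise ValueError("splits must sum to 1")
    names = split_names or [f"split_{i}" for i in range(len(splits))]
    ids = list(track_ids)
    random.Random(seed).shuffle(ids)  # fixed seed -> reproducible partitions
    out, start = {}, 0
    for name, frac in zip(names, splits):
        end = start + round(frac * len(ids))
        out[name] = ids[start:end]
        start = end
    out[names[-1]].extend(ids[start:])  # absorb any rounding remainder
    return out

parts = random_splits([f"t{i}" for i in range(10)], [0.8, 0.2],
                      split_names=["train", "test"])
print(len(parts["train"]), len(parts["test"]))
```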
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_artist(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_key.load_artist
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_key.load_audio
- load_genre(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_key.load_genre
- load_key(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_key.load_key
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_tempo(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_key.load_tempo
- class mirdata.datasets.giantsteps_key.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
giantsteps_key track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – track audio path
keys_path (str) – key annotation path
metadata_path (str) – sections annotation path
title (str) – title of the track
track_id (str) – track id
- Other Parameters
key (str) – musical key annotation
artists (list) – list of artists involved
genres (dict) – genres and subgenres
tempo (int) – crowdsourced tempo annotations in beats per minute
- property audio: Tuple[numpy.ndarray, float]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.giantsteps_key.load_artist(fhandle: TextIO) List[str] [source]¶
Load giantsteps_key artist data from a file
- Parameters
fhandle (str or file-like) – File-like object or path pointing to metadata annotation file
- Returns
list – list of artists involved in the track.
- mirdata.datasets.giantsteps_key.load_audio(fpath: str) Tuple[numpy.ndarray, float] [source]¶
Load a giantsteps_key audio file.
- Parameters
fpath (str) – str pointing to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.giantsteps_key.load_genre(fhandle: TextIO) Dict[str, List[str]] [source]¶
Load giantsteps_key genre data from a file
- Parameters
fhandle (str or file-like) – File-like object or path pointing to metadata annotation file
- Returns
dict – {‘genres’: […], ‘subgenres’: […]}
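The load_key loader above returns a musical key annotation as a string. A small sketch of splitting such a string into tonic and mode; the "<tonic> <mode>" one-line format (e.g. "d minor") is an assumption based on typical key-annotation files, not a guarantee of this loader:

```python
# Sketch: split a key annotation string such as "d minor" into tonic and
# mode. The file format is an assumption, not documented behavior.

def parse_key(key_string):
    parts = key_string.strip().split()
    tonic = parts[0].upper() if parts else None
    mode = parts[1].lower() if len(parts) > 1 else None
    return tonic, mode

print(parse_key("d minor"))
```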
giantsteps_tempo¶
giantsteps_tempo Dataset Loader
Dataset Info
GiantSteps tempo + genre is a collection of annotations for 664 two-minute audio previews from www.beatport.com, created by Richard Vogl <richard.vogl@tuwien.ac.at> and Peter Knees <peter.knees@tuwien.ac.at>.
references:
- giantsteps_tempo_cit_1
Peter Knees, Ángel Faraldo, Perfecto Herrera, Richard Vogl, Sebastian Böck, Florian Hörschläger, Mickael Le Goff: “Two data sets for tempo estimation and key detection in electronic dance music annotated from user corrections”, Proc. of the 16th Conference of the International Society for Music Information Retrieval (ISMIR’15), Oct. 2015, Malaga, Spain.
- giantsteps_tempo_cit_2
Hendrik Schreiber, Meinard Müller: “A Crowdsourced Experiment for Tempo Estimation of Electronic Dance Music”, Proc. of the 19th Conference of the International Society for Music Information Retrieval (ISMIR’18), Sept. 2018, Paris, France.
The audio files (664 files, ~1 GB) can be downloaded from http://www.beatport.com/ using the bash script:
https://github.com/GiantSteps/giantsteps-tempo-dataset/blob/master/audio_dl.sh
To download the files manually use links of the following form: http://geo-samples.beatport.com/lofi/<name of mp3 file> e.g.: http://geo-samples.beatport.com/lofi/5377710.LOFI.mp3
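The manual-download URL pattern quoted above can be built with plain string formatting; the helper name below is illustrative:

```python
# Sketch: construct the Beatport preview URL pattern quoted in the text
# from an mp3 filename.

def preview_url(mp3_name: str) -> str:
    return f"http://geo-samples.beatport.com/lofi/{mp3_name}"

print(preview_url("5377710.LOFI.mp3"))
# prints http://geo-samples.beatport.com/lofi/5377710.LOFI.mp3
```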
To convert the audio files to .wav use the script found at https://github.com/GiantSteps/giantsteps-tempo-dataset/blob/master/convert_audio.sh and run:
./convert_audio.sh
To retrieve the genre information, the JSON contained within the website was parsed. The tempo annotation was extracted from forum entries of people correcting the bpm values (i.e. manual annotation of tempo). For more information please refer to the publication [giantsteps_tempo_cit_1].
[giantsteps_tempo_cit_2] found some files without tempo. These are:
3041381.LOFI.mp3
3041383.LOFI.mp3
1327052.LOFI.mp3
Their v2 tempo is denoted as 0.0 in the tempo and mirex formats, and they have no annotation in the JAMS format.
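Since a v2 tempo of 0.0 means "no annotation", downstream code typically filters such tracks out. A minimal sketch of that convention (the (track_id, tempo) pairs below are made up for illustration):

```python
# Sketch: drop tracks whose annotated tempo is stored as 0.0, following the
# "0.0 means no annotation" convention described above.

def tracks_with_tempo(tempos):
    """Keep only entries whose annotated tempo is nonzero."""
    return {tid: bpm for tid, bpm in tempos.items() if bpm > 0.0}

tempos = {"3041381.LOFI": 0.0, "906760.LOFI": 128.0}
print(tracks_with_tempo(tempos))
```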
Most of the audio files are 120 seconds long. Exceptions are:
name length (sec)
906760.LOFI.mp3 62
1327052.LOFI.mp3 70
4416506.LOFI.mp3 80
1855660.LOFI.mp3 119
3419452.LOFI.mp3 119
3577631.LOFI.mp3 119
- class mirdata.datasets.giantsteps_tempo.Dataset(data_home=None, version='default')[source]¶
The giantsteps_tempo dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. When True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_tempo.load_audio
- load_genre(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_tempo.load_genre
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_tempo(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.giantsteps_tempo.load_tempo
- class mirdata.datasets.giantsteps_tempo.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
giantsteps_tempo track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – track audio path
title (str) – title of the track
track_id (str) – track id
annotation_v1_path (str) – track annotation v1 path
annotation_v2_path (str) – track annotation v2 path
- Other Parameters
genre (dict) – Human-labeled metadata annotation
tempo (list) – List of annotations.TempoData, ordered by confidence
tempo_v2 (list) – List of annotations.TempoData for version 2, ordered by confidence
- property audio: Tuple[numpy.ndarray, float]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- mirdata.datasets.giantsteps_tempo.load_audio(fhandle: str) Tuple[numpy.ndarray, float] [source]¶
Load a giantsteps_tempo audio file.
- Parameters
fhandle (str or file-like) – path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.giantsteps_tempo.load_genre(fhandle: TextIO) str [source]¶
Load genre data from a file
- Parameters
fhandle (str or file-like) – File-like object or path to metadata annotation file
- Returns
str – loaded genre data
- mirdata.datasets.giantsteps_tempo.load_tempo(fhandle: TextIO) mirdata.annotations.TempoData [source]¶
Load giantsteps_tempo tempo data from a file ordered by confidence
- Parameters
fhandle (str or file-like) – File-like object or path to tempo annotation file
- Returns
annotations.TempoData – Tempo data
good_sounds¶
Good-Sounds Dataset Loader
Dataset Info
The Good-Sounds dataset was born of the collaboration between the Music Technology Group and Korg. Good-Sounds was recorded as a training dataset of single-note excerpts, including six classes of sounds per studied instrument. Twelve different instruments were recorded. For each instrument, the complete range of playable semitones is captured several times with various tonal characteristics. There are two classes: good and bad sounds. Bad sounds are divided into five sub-classes, one for each musical dimension stated by the expert musicians, and consist of note recordings that are intentionally badly played. The good class includes note recordings that are considered to be well played.
This dataset was created in the context of the Pablo project, partially funded by KORG Inc. It contains monophonic recordings of two kinds of exercises: single notes and scales. The recordings were made in the Universitat Pompeu Fabra / Phonos recording studio by 15 different professional musicians, all of them holding a music degree and having some expertise in teaching. 12 different instruments were recorded using between one and four microphones (depending on the recording session). For all the instruments, the whole set of playable semitones on the instrument is recorded several times with different tonal characteristics. Each note is recorded into a separate mono .flac audio file at 48 kHz and 32 bits. The tonal characteristics are explained both in the following section and in the related publication. The database is meant to organize the sounds in a handy way and is organised in four different entities: sounds, takes, packs and ratings.
- class mirdata.datasets.good_sounds.Dataset(data_home=None, version='default')[source]¶
The GOOD-SOUNDS dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. When True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.good_sounds.load_audio
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.good_sounds.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
GOOD-SOUNDS Track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – Path to the audio file
- Other Parameters
ratings_info (dict) – A dictionary containing the entity Ratings. Some musicians self-rated their performance on a 0-10 goodness scale for the user evaluation of the first project prototype. Please read the paper for more detailed information. The fields are:
id
mark: the rate or score.
type: the klass of the sound. Related to the tags of the sound.
created_at
comments
sound_id
rater: the musician who rated the sound.
pack_info (dict) – A dictionary containing the entity Pack. A pack is a group of sounds from the same recording session. The audio files are organised in the sound_files directory in subfolders with the pack name to which they belong. The following metadata is associated with the entity Pack. - id - name - description
sound_info (dict) – A dictionary containing the entity Sound. A sound can have several takes, as some of them were recorded using different microphones at the same time. The following metadata is associated with the entity Sound. - id - instrument: flute, cello, clarinet, trumpet, violin, sax_alto, sax_tenor, sax_baritone, sax_soprano, oboe, piccolo, bass - note - octave - dynamics: for some sounds, the musical notation of the loudness level (p, mf, f, ...) - recorded_at: recording date and time - location: recording place - player: the musician who recorded. For detailed information about the musicians contact us. - bow_velocity: for some string instruments, the velocity of the bow (slow, medium, fast) - bridge_position: for some string instruments, the position of the bow (tasto, middle, ponticello) - string: for some string instruments, the number of the string on which the sound is played (1: lowest in pitch) - csv_file: used for creation of the DB - csv_id: used for creation of the DB - pack_filename: used for creation of the DB - pack_id: used for creation of the DB - attack: for single notes, manual annotation of the onset in samples - decay: for single notes, manual annotation of the decay in samples - sustain: for single notes, manual annotation of the beginning of the sustained part in samples - release: for single notes, manual annotation of the beginning of the release part in samples - offset: for single notes, manual annotation of the offset in samples - reference: 1 if the sound was used to create the models in the good-sounds project, 0 if not - Other tags regarding tonal characteristics are also available. - comments: if any - semitone: midi note - pitch_reference: the reference pitch - klass: user-generated tags of the tonal qualities of the sound. They also contain information about the exercise, which could be a single note or a scale.
* “good-sound”: good examples of single notes * “bad”: bad examples of one of the sound attributes defined in the project (please read the papers for a detailed explanation) * “scale-good”: good example of a one-octave scale going up and down (15 notes). If the scale is minor, a tag “minor” is also present. * “scale-bad”: bad example of a scale for one of the sound attributes defined in the project (15 notes up and down).
take_info (dict) – A dictionary containing the entity Take. A sound can have several takes, as some of them were recorded using different microphones at the same time. Each take has an associated audio file and its annotations. The following metadata is associated with the entity Take. - id - microphone - filename: the name of the associated audio file - original_filename - freesound_id: for some sounds uploaded to freesound.org - sound_id: the id of the sound in the DB - goodsound_id: for some of the sounds available in good-sounds.org
microphone (str) – the microphone used to record the take.
instrument (str) – the instrument recorded (flute, cello, clarinet, trumpet, violin, sax_alto, sax_tenor, sax_baritone, sax_soprano, oboe, piccolo, bass).
klass (str) – user-generated tags of the tonal qualities of the sound. They also contain information about the exercise, which could be a single note or a scale. * “good-sound”: good examples of single notes * “bad”: bad examples of one of the sound attributes defined in the project (please read the papers for a detailed explanation) * “scale-good”: good example of a one-octave scale going up and down (15 notes). If the scale is minor, a tag “minor” is also present. * “scale-bad”: bad example of a scale for one of the sound attributes defined in the project (15 notes up and down).
semitone (int) – midi note
pitch_reference (int) – the reference pitch
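The klass attribute above is a space-separated tag string. A small sketch of interpreting it; the tag vocabulary ("good-sound", "bad", "scale-good", "scale-bad", "minor") is taken from the text, while the helper itself is illustrative:

```python
# Sketch: interpret a Good-Sounds ``klass`` tag string as documented above.
# The tag names come from the text; this helper is not part of the loader.

def interpret_klass(klass: str):
    tags = klass.split()
    return {
        "scale": any(t.startswith("scale-") for t in tags),
        "good": "good-sound" in tags or "scale-good" in tags,
        "minor": "minor" in tags,
    }

print(interpret_klass("scale-good minor"))
```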
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.good_sounds.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load a GOOD-SOUNDS audio file.
- Parameters
fhandle (str or file-like) – path or file-like object pointing to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
groove_midi¶
Groove MIDI Loader
Dataset Info
The Groove MIDI Dataset (GMD) is composed of 13.6 hours of aligned MIDI and synthesized audio of human-performed, tempo-aligned expressive drumming. The dataset contains 1,150 MIDI files and over 22,000 measures of drumming.
To enable a wide range of experiments and encourage comparisons between methods on the same data, Gillick et al. created a new dataset of drum performances recorded in MIDI format. They hired professional drummers and asked them to perform in multiple styles to a click track on a Roland TD-11 electronic drum kit. They also recorded the aligned, high-quality synthesized audio from the TD-11 and include it in the release.
The Groove MIDI Dataset (GMD) has several attributes that distinguish it from existing ones:
The dataset contains about 13.6 hours, 1,150 MIDI files, and over 22,000 measures of drumming.
Each performance was played along with a metronome set at a specific tempo by the drummer.
The data includes performances by a total of 10 drummers, with more than 80% of duration coming from hired professionals. The professionals were able to improvise in a wide range of styles, resulting in a diverse dataset.
The drummers were instructed to play a mix of long sequences (several minutes of continuous playing) and short beats and fills.
Each performance is annotated with a genre (provided by the drummer), tempo, and anonymized drummer ID.
Most of the performances are in 4/4 time, with a few examples from other time signatures.
Four drummers were asked to record the same set of 10 beats in their own style. These are included in the test set split, labeled eval-session/groove1-10.
In addition to the MIDI recordings that are the primary source of data for the experiments in this work, the authors captured the synthesized audio outputs of the drum set and aligned them to within 2ms of the corresponding MIDI files.
A train/validation/test split configuration is provided for easier comparison of model accuracy on various tasks.
The dataset is made available by Google LLC under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.
For more details, please visit: http://magenta.tensorflow.org/datasets/groove
- class mirdata.datasets.groove_midi.Dataset(data_home=None, version='default')[source]¶
The groove_midi dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful behind proxies that inspect the downloaded data. When True, a checksum mismatch prompts a warning instead of raising an exception.
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.groove_midi.load_audio
- load_beats(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.groove_midi.load_beats
- load_drum_events(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.groove_midi.load_drum_events
- load_midi(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.groove_midi.load_midi
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.groove_midi.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
Groove MIDI Track class
- Parameters
track_id (str) – track id of the track
- Variables
drummer (str) – Drummer id of the track (ex. ‘drummer1’)
session (str) – Type of session (ex. ‘session1’, ‘eval_session’)
track_id (str) – track id of the track (ex. ‘drummer1/eval_session/1’)
style (str) – Style (genre, groove type) of the track (ex. ‘funk/groove1’)
tempo (int) – track tempo in beats per minute (ex. 138)
beat_type (str) – Whether the track is a beat or a fill (ex. ‘beat’)
time_signature (str) – Time signature of the track (ex. ‘4-4’, ‘6-8’)
midi_path (str) – Path to the midi file
audio_path (str) – Path to the audio file
duration (float) – Duration of the midi file in seconds
split (str) – Whether the track is for a train/valid/test set. One of ‘train’, ‘valid’ or ‘test’.
- Other Parameters
beats (BeatData) – Machine-generated beat annotations
drum_events (EventData) – Annotated drum kit events
midi (pretty_midi.PrettyMIDI) – object containing MIDI information
- property audio: Tuple[Optional[numpy.ndarray], Optional[float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.groove_midi.load_audio(path: str) → Tuple[Optional[numpy.ndarray], Optional[float]] [source]¶
Load a Groove MIDI audio file.
- Parameters
path – path to an audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.groove_midi.load_beats(midi_path, midi=None)[source]¶
Load beat data from the midi file.
- Parameters
midi_path (str) – path to midi file
midi (pretty_midi.PrettyMIDI) – pre-loaded midi object, or None. If None, the midi object is loaded using midi_path
- Returns
annotations.BeatData – machine generated beat data
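For a constant-tempo track like those in Groove MIDI, machine-generated beats are just evenly spaced timestamps. The helper below is an illustrative sketch of that idea, not mirdata's actual load_beats (which reads beat times from the MIDI via pretty_midi):

```python
def beat_times(tempo_bpm, duration, start=0.0):
    """Generate beat timestamps (seconds) for a constant-tempo track."""
    period = 60.0 / tempo_bpm  # seconds per beat
    times = []
    t = start
    while t < duration:
        times.append(round(t, 6))
        t += period
    return times

beats = beat_times(120, 2.0)  # 120 BPM over a 2-second excerpt
```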
- mirdata.datasets.groove_midi.load_drum_events(midi_path, midi=None)[source]¶
Load drum events from the midi file.
- Parameters
midi_path (str) – path to midi file
midi (pretty_midi.PrettyMIDI) – pre-loaded midi object, or None. If None, the midi object is loaded using midi_path
- Returns
annotations.EventData – drum event data
- mirdata.datasets.groove_midi.load_midi(fhandle: BinaryIO) → Optional[pretty_midi.PrettyMIDI] [source]¶
Load a Groove MIDI midi file.
- Parameters
fhandle (str or file-like) – File-like object or path to midi file
- Returns
midi_data (pretty_midi.PrettyMIDI) – pretty_midi object
gtzan_genre¶
GTZAN-Genre Dataset Loader
Dataset Info
This dataset was used for the well-known genre classification paper:
"Musical genre classification of audio signals" by G. Tzanetakis and
P. Cook in IEEE Transactions on Speech and Audio Processing, 2002.
The dataset consists of 1000 audio tracks each 30 seconds long. It contains 10 genres, each represented by 100 tracks. The tracks are all 22050 Hz mono 16-bit audio files in .wav format.
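The 22050 Hz mono 16-bit format can be checked on any downloaded file with the standard library's wave module; the tiny in-memory file below is only a stand-in for a real GTZAN track:

```python
import io
import wave

def wav_params(data: bytes):
    """Read (sample rate, channels, sample width in bytes) from WAV bytes."""
    with wave.open(io.BytesIO(data)) as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()

# Build a minimal WAV in GTZAN's format (22050 Hz, mono, 16-bit) to illustrate:
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(22050)
    w.writeframes(b"\x00\x00" * 100)  # 100 silent samples

rate, channels, width = wav_params(buf.getvalue())
```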
- class mirdata.datasets.gtzan_genre.Dataset(data_home=None, version='default')[source]¶
The gtzan_genre dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind proxies that inspect downloaded content; if a checksum differs, a warning is issued instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.gtzan_genre.load_audio
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- class mirdata.datasets.gtzan_genre.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
gtzan_genre Track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – path to the audio file
genre (str) – annotated genre
track_id (str) – track id
- Other Parameters
beats (BeatData) – human-labeled beat annotations
tempo (float) – global tempo annotations
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.gtzan_genre.load_audio(fhandle: BinaryIO) → Tuple[numpy.ndarray, float] [source]¶
Load a GTZAN audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.gtzan_genre.load_beats(fhandle: TextIO) → mirdata.annotations.BeatData [source]¶
Load GTZAN format beat data from a file
- Parameters
fhandle (str or file-like) – path or file-like object pointing to a beat annotation file
- Returns
BeatData – loaded beat data
guitarset¶
GuitarSet Loader
Dataset Info
GuitarSet provides audio recordings of a variety of musical excerpts played on an acoustic guitar, along with time-aligned annotations including pitch contours, string and fret positions, chords, beats, downbeats, and keys.
GuitarSet contains 360 excerpts that are close to 30 seconds in length. The 360 excerpts are the result of the following combinations:
6 players
2 versions: comping (harmonic accompaniment) and soloing (melodic improvisation)
5 styles: Rock, Singer-Songwriter, Bossa Nova, Jazz, and Funk
3 Progressions: 12 Bar Blues, Autumn Leaves, and Pachelbel Canon.
2 Tempi: slow and fast.
The tonality (key) of each excerpt is sampled uniformly at random.
GuitarSet was recorded with the help of a hexaphonic pickup, which outputs signals for each string separately, allowing automated note-level annotation. Excerpts are recorded with both the hexaphonic pickup and a Neumann U-87 condenser microphone as reference. Four audio recordings are provided with each excerpt, with the following suffixes:
hex: original 6 channel wave file from hexaphonic pickup
hex_cln: hex wave files with interference removal applied
mic: monophonic recording from reference microphone
mix: monophonic mixture of original 6 channel file
Each of the 360 excerpts has an accompanying JAMS file which stores 16 annotations.
Pitch:
6 pitch_contour annotations (1 per string)
6 midi_note annotations (1 per string)
Beat and Tempo:
1 beat_position annotation
1 tempo annotation
Chords:
2 chord annotations: instructed and performed. The instructed chord annotation is a digital version of the lead sheet that’s provided to the player, and the performed chord annotations are inferred from note annotations, using segmentation and root from the digital lead sheet annotation.
For more details, please visit: http://github.com/marl/guitarset/
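JAMS files are JSON, so the 16 annotations can be grouped by namespace with the standard library. The structure below is heavily simplified (real GuitarSet JAMS files follow the full JAMS schema, with observation lists and metadata), and the helper is an illustrative assumption, not a mirdata API:

```python
import json

# A simplified stand-in for a GuitarSet JAMS file: six per-string
# pitch_contour annotations plus a tempo annotation.
jams_text = json.dumps({
    "annotations": [
        {"namespace": "pitch_contour", "annotation_metadata": {"data_source": str(s)}, "data": []}
        for s in range(6)
    ] + [{"namespace": "tempo", "data": [{"value": 120.0}]}]
})

def annotations_by_namespace(jams_str, namespace):
    """Collect all annotations of one namespace from a JAMS JSON string."""
    doc = json.loads(jams_str)
    return [a for a in doc["annotations"] if a["namespace"] == namespace]

contours = annotations_by_namespace(jams_text, "pitch_contour")
```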
- class mirdata.datasets.guitarset.Dataset(data_home=None, version='default')[source]¶
The guitarset dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind proxies that inspect downloaded content; if a checksum differs, a warning is issued instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_audio
- load_beats(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_beats
- load_chords(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_chords
- load_key_mode(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_key_mode
- load_multitrack_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_multitrack_audio
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_notes(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_notes
- load_pitch_contour(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.guitarset.load_pitch_contour
- class mirdata.datasets.guitarset.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
guitarset Track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_hex_cln_path (str) – path to the debleeded hex wave file
audio_hex_path (str) – path to the original hex wave file
audio_mic_path (str) – path to the mono wave via microphone
audio_mix_path (str) – path to the mono wave via downmixing hex pickup
jams_path (str) – path to the jams file
mode (str) – one of [‘solo’, ‘comp’]. For each excerpt, players are asked to first play in ‘comp’ mode and later play a ‘solo’ version on top of the already recorded comp.
player_id (str) – ID of the different players. one of [‘00’, ‘01’, … , ‘05’]
style (str) – one of [‘Jazz’, ‘Bossa Nova’, ‘Rock’, ‘Singer-Songwriter’, ‘Funk’]
tempo (float) – BPM of the track
track_id (str) – track id
- Other Parameters
beats (BeatData) – beat positions
leadsheet_chords (ChordData) – chords as written in the leadsheet
inferred_chords (ChordData) – chords inferred from played transcription
key_mode (KeyData) – key and mode
pitch_contours (dict) – Pitch contours per string - ‘E’: F0Data(…) - ‘A’: F0Data(…) - ‘D’: F0Data(…) - ‘G’: F0Data(…) - ‘B’: F0Data(…) - ‘e’: F0Data(…)
multif0 (MultiF0Data) – all pitch contour data as one multif0 annotation
notes (dict) – Notes per string - ‘E’: NoteData(…) - ‘A’: NoteData(…) - ‘D’: NoteData(…) - ‘G’: NoteData(…) - ‘B’: NoteData(…) - ‘e’: NoteData(…)
notes_all (NoteData) – all note data as one note annotation
- property audio_hex: Optional[Tuple[numpy.ndarray, float]]¶
Hexaphonic audio (6-channels) with one channel per string
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_hex_cln: Optional[Tuple[numpy.ndarray, float]]¶
Hexaphonic audio (6-channels) with one channel per string, after bleed removal
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_mic: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio
- Returns
np.ndarray - audio signal
float - sample rate
- property audio_mix: Optional[Tuple[numpy.ndarray, float]]¶
Mixture audio (mono)
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.guitarset.load_audio(fhandle: BinaryIO) → Tuple[numpy.ndarray, float] [source]¶
Load a Guitarset audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.guitarset.load_beats(fhandle: TextIO) → mirdata.annotations.BeatData [source]¶
Load a Guitarset beats annotation.
- Parameters
fhandle (str or file-like) – File-like object or path of the jams annotation file
- Returns
BeatData – Beat data
- mirdata.datasets.guitarset.load_chords(jams_path, leadsheet_version)[source]¶
Load a guitarset chord annotation.
- Parameters
jams_path (str) – path to the jams annotation file
leadsheet_version (bool) – Whether to load the leadsheet version of the chord annotation. If False, the inferred version is loaded.
- Returns
ChordData – Chord data
- mirdata.datasets.guitarset.load_key_mode(fhandle: TextIO) → mirdata.annotations.KeyData [source]¶
Load a Guitarset key-mode annotation.
- Parameters
fhandle (str or file-like) – File-like object or path of the jams annotation file
- Returns
KeyData – Key data
- mirdata.datasets.guitarset.load_multitrack_audio(fhandle: BinaryIO) → Tuple[numpy.ndarray, float] [source]¶
Load a Guitarset multitrack audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.guitarset.load_notes(jams_path, string_num)[source]¶
Load a guitarset note annotation for a given string
- Parameters
jams_path (str) – path to the jams annotation file
string_num (int), in range(6) – Which string to load. 0 is the Low E string, 5 is the high e string.
- Returns
NoteData – Note data for the given string
- mirdata.datasets.guitarset.load_pitch_contour(jams_path, string_num)[source]¶
Load a guitarset pitch contour annotation for a given string
- Parameters
jams_path (str) – path to the jams annotation file
string_num (int), in range(6) – Which string to load. 0 is the Low E string, 5 is the high e string.
- Returns
F0Data – Pitch contour data for the given string
haydn_op20¶
haydn op20 Dataset Loader
Dataset Info
This dataset accompanies the Master's thesis by Nestor Napoles. It is a manually annotated corpus of harmonic analyses in harm syntax.
This dataset contains 30 pieces composed by Joseph Haydn in symbolic format, which have each been manually annotated with harmonic analyses.
- class mirdata.datasets.haydn_op20.Dataset(data_home=None, version='default')[source]¶
The haydn op20 dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind proxies that inspect downloaded content; if a checksum differs, a warning is issued instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_chords(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.load_chords
- load_chords_music21(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.load_chords_music21
- load_key(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.load_key
- load_key_music21(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.load_key_music21
- load_midi_path(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.convert_and_save_to_midi
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_roman_numerals(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.load_roman_numerals
- load_score(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.haydn_op20.load_score
- class mirdata.datasets.haydn_op20.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
haydn op20 track class
- Parameters
track_id (str) – track id of the track
- Variables
title (str) – title of the track
track_id (str) – track id
humdrum_annotated_path (str) – path to humdrum annotated score
- Other Parameters
keys (KeyData) – annotated local keys.
keys_music21 (list) – annotated local keys.
roman_numerals (list) – annotated roman_numerals.
chords (ChordData) – annotated chords.
chords_music21 (list) – annotated chords.
duration (int) – relative duration
midi_path (str) – path to midi
score (music21.stream.Score) – music21 score
- mirdata.datasets.haydn_op20.convert_and_save_to_midi(fpath: TextIO)[source]¶
Convert to a MIDI file and return the MIDI path
- Parameters
fpath (str or file-like) – path to score file
- Returns
str – midi file path
Deprecated since version 0.3.4: convert_and_save_to_midi is deprecated and will be removed in a future version
- mirdata.datasets.haydn_op20.load_chords(fhandle: TextIO, resolution: int = 28)[source]¶
Load haydn op20 chords data from a file
- Parameters
fhandle (str or file-like) – path to chord annotations
resolution (int) – the number of pulses, or ticks, per quarter note (PPQ)
- Returns
ChordData – chord annotations
- mirdata.datasets.haydn_op20.load_chords_music21(fhandle: TextIO, resolution: int = 28)[source]¶
Load haydn op20 chords data from a file in music21 format
- Parameters
fhandle (str or file-like) – path to chord annotations
resolution (int) – the number of pulses, or ticks, per quarter note (PPQ)
- Returns
list – musical chords data and relative time (offset (Music21Object.offset) * resolution) [(time in PPQ, chord)]
- mirdata.datasets.haydn_op20.load_key(fhandle: TextIO, resolution=28)[source]¶
Load haydn op20 key data from a file
- Parameters
fhandle (str or file-like) – path to key annotations
resolution (int) – the number of pulses, or ticks, per quarter note (PPQ)
- Returns
KeyData – loaded key data
- mirdata.datasets.haydn_op20.load_key_music21(fhandle: TextIO, resolution=28)[source]¶
Load haydn op20 key data from a file in music21 format
- Parameters
fhandle (str or file-like) – path to key annotations
resolution (int) – the number of pulses, or ticks, per quarter note (PPQ)
- Returns
list – musical key data and relative time (offset (Music21Object.offset) * resolution) [(time in PPQ, local key)]
- mirdata.datasets.haydn_op20.load_roman_numerals(fhandle: TextIO, resolution=28)[source]¶
Load haydn op20 roman numerals data from a file
- Parameters
fhandle (str or file-like) – path to roman numeral annotations
resolution (int) – the number of pulses, or ticks, per quarter note (PPQ)
- Returns
list – musical roman numerals data and relative time (offset (Music21Object.offset) * resolution) [(time in PPQ, roman numerals)]
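The time values returned by the music21-format loaders above are computed as offset * resolution (quarter-note offsets converted to PPQ ticks, with a default resolution of 28). A minimal sketch of that conversion, with an illustrative helper name and example labels:

```python
def to_ticks(events, resolution=28):
    """Convert (offset_in_quarter_notes, label) pairs to (time_in_PPQ, label)."""
    return [(int(offset * resolution), label) for offset, label in events]

# e.g. a tonic at offset 0, a dominant at quarter-note offset 4.0,
# and a dominant seventh at offset 4.5:
ticks = to_ticks([(0.0, "I"), (4.0, "V"), (4.5, "V7")])
```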
ikala¶
iKala Dataset Loader
Dataset Info
The iKala dataset consists of 252 30-second excerpts sampled from 206 iKala songs (plus 100 hidden excerpts reserved for MIREX). The music accompaniment and the singing voice are recorded in the left and right channels respectively and can be found under the Wavfile directory. In addition, the human-labeled pitch contours and timestamped lyrics can be found under PitchLabel and Lyrics respectively.
For more details, please visit: http://mac.citi.sinica.edu.tw/ikala/
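mirdata's loaders (load_instrumental_audio, load_vocal_audio) handle the channel separation for you; the sketch below only illustrates the left/right convention described above. The helper is hypothetical, and real audio would be numpy arrays rather than lists of sample pairs:

```python
def split_ikala_channels(stereo_frames):
    """Separate an iKala stereo signal: accompaniment left, vocals right.

    stereo_frames: sequence of (left, right) sample pairs.
    """
    instrumental = [left for left, _ in stereo_frames]   # left channel: accompaniment
    vocals = [right for _, right in stereo_frames]       # right channel: singing voice
    return instrumental, vocals

frames = [(0.1, 0.0), (0.2, 0.5), (0.0, 0.4)]
inst, vox = split_ikala_channels(frames)
```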
- class mirdata.datasets.ikala.Dataset(data_home=None, version='default')[source]¶
The ikala dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. Useful when behind proxies that inspect downloaded content; if a checksum differs, a warning is issued instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_f0(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.ikala.load_f0
- load_instrumental_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.ikala.load_instrumental_audio
- load_lyrics(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.ikala.load_lyrics
- load_mix_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.ikala.load_mix_audio
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_notes(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.ikala.load_notes
- load_pronunciations(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.ikala.load_pronunciations
- load_tracks()[source]¶
Load all tracks in the dataset
- Returns
dict – {track_id: track data}
- Raises
NotImplementedError – If the dataset does not support Tracks
- class mirdata.datasets.ikala.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
ikala Track class
- Parameters
track_id (str) – track id of the track
- Variables
audio_path (str) – path to the track’s audio file
f0_path (str) – path to the track’s f0 annotation file
notes_pyin_path (str) – path to the note annotation file
lyrics_path (str) – path to the track’s lyric annotation file
section (str) – section. Either ‘verse’ or ‘chorus’
singer_id (str) – singer id
song_id (str) – song id of the track
track_id (str) – track id
- Other Parameters
f0 (F0Data) – human-annotated singing voice pitch
notes_pyin (NoteData) – notes estimated by the pyin algorithm
lyrics (LyricData) – human-annotated lyrics
pronunciations (LyricData) – human-annotated lyric pronunciations
- get_path(key)[source]¶
Get absolute path to track audio and annotations. Returns None if the path in the index is None
- Parameters
key (string) – Index key of the audio or annotation type
- Returns
str or None – joined path string or None
- property instrumental_audio: Optional[Tuple[numpy.ndarray, float]]¶
instrumental audio (mono)
- Returns
np.ndarray - audio signal
float - sample rate
- property mix_audio: Optional[Tuple[numpy.ndarray, float]]¶
mixture audio (mono)
- Returns
np.ndarray - audio signal
float - sample rate
- to_jams()[source]¶
Get the track’s data in jams format
- Returns
jams.JAMS – the track’s data in jams format
- property vocal_audio: Optional[Tuple[numpy.ndarray, float]]¶
solo vocal audio (mono)
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.ikala.load_f0(fhandle: TextIO) mirdata.annotations.F0Data [source]¶
Load an ikala f0 annotation
- Parameters
fhandle (str or file-like) – File-like object or path to f0 annotation file
- Raises
IOError – If f0_path does not exist
- Returns
F0Data – the f0 annotation data
- mirdata.datasets.ikala.load_instrumental_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load ikala instrumental audio
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.ikala.load_lyrics(fhandle: TextIO) mirdata.annotations.LyricData [source]¶
Load an ikala lyrics annotation
- Parameters
fhandle (str or file-like) – File-like object or path to lyric annotation file
- Raises
IOError – if lyrics_path does not exist
- Returns
LyricData – lyric annotation data
- mirdata.datasets.ikala.load_mix_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load an ikala mix.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - audio signal
float - sample rate
- mirdata.datasets.ikala.load_notes(fhandle: TextIO) Optional[mirdata.annotations.NoteData] [source]¶
Load a note annotation file
- Parameters
fhandle (str or file-like) – str or file-like to note annotation file
- Raises
IOError – if file doesn’t exist
- Returns
NoteData – note annotation
- mirdata.datasets.ikala.load_pronunciations(fhandle: TextIO) mirdata.annotations.LyricData [source]¶
Load an ikala pronunciation annotation
- Parameters
fhandle (str or file-like) – File-like object or path to lyric annotation file
- Raises
IOError – if lyrics_path does not exist
- Returns
LyricData – pronunciation annotation data
- mirdata.datasets.ikala.load_vocal_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load ikala vocal audio
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - audio signal
float - sample rate
irmas¶
IRMAS Loader
Dataset Info
IRMAS: a dataset for instrument recognition in musical audio signals
This dataset includes musical audio excerpts with annotations of the predominant instrument(s) present. It was used for the evaluation in the following article:
Bosch, J. J., Janer, J., Fuhrmann, F., & Herrera, P. “A Comparison of Sound Segregation Techniques for
Predominant Instrument Recognition in Musical Audio Signals”, in Proc. ISMIR (pp. 559-564), 2012.
IRMAS is intended to be used for training and testing methods for the automatic recognition of predominant instruments in musical audio. The instruments considered are: cello, clarinet, flute, acoustic guitar, electric guitar, organ, piano, saxophone, trumpet, violin, and human singing voice. This dataset is derived from the one compiled by Ferdinand Fuhrmann in his PhD thesis, with the differences that we provide audio data in stereo format, the annotations in the testing dataset are limited to specific pitched instruments, and the number and length of excerpts differ from the original dataset.
The dataset is split into training and test data.
Training data
Total audio samples: 6705. They are 3-second excerpts from more than 2000 distinct recordings.
Audio specifications
Sampling frequency: 44.1 kHz
Bit-depth: 16 bit
Audio format: .wav
IRMAS Dataset training samples are annotated by encoding each track's information in its filename.
Predominant instrument:
The annotation of the predominant instrument of each excerpt is both in the name of the containing folder, and in the file name: cello (cel), clarinet (cla), flute (flu), acoustic guitar (gac), electric guitar (gel), organ (org), piano (pia), saxophone (sax), trumpet (tru), violin (vio), and human singing voice (voi).
The number of files per instrument are: cel(388), cla(505), flu(451), gac(637), gel(760), org(682), pia(721), sax(626), tru(577), vio(580), voi(778).
Drum presence
Additionally, some of the files have an annotation in the filename indicating the presence ([dru]) or absence ([nod]) of drums.
The annotation of the musical genre:
country-folk ([cou_fol])
classical ([cla])
pop-rock ([pop_roc])
latin-soul ([lat_sou])
jazz-blues ([jaz_blu])
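The bracketed filename tags described above can be parsed with a simple regular expression. The filename below is hypothetical, built only to follow the stated convention; it is not taken from the dataset:

```python
import re

INSTRUMENTS = {"cel", "cla", "flu", "gac", "gel", "org", "pia", "sax", "tru", "vio", "voi"}
DRUM_TAGS = {"dru", "nod"}

def parse_training_filename(name):
    # Extract every bracketed tag from the filename
    tags = re.findall(r"\[([^\]]+)\]", name)
    return {
        "instrument": [t for t in tags if t in INSTRUMENTS],
        # True if [dru], False if [nod], None if not annotated
        "drums": next((t == "dru" for t in tags if t in DRUM_TAGS), None),
        # Genre codes contain an underscore; note that classical ([cla])
        # shares its code with clarinet and is not disambiguated in this sketch
        "genre": next((t for t in tags if "_" in t), None),
    }

parse_training_filename("[gel][dru][pop_roc]0123__1.wav")
# {'instrument': ['gel'], 'drums': True, 'genre': 'pop_roc'}
```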
Testing data
Total audio samples: 2874
Audio specifications
Sampling frequency: 44.1 kHz
Bit-depth: 16 bit
Audio format: .wav
IRMAS Dataset testing samples are annotated as follows:
Predominant instrument:
The annotations for an excerpt named “excerptName.wav” are given in “excerptName.txt”. More than one instrument may be annotated in each excerpt, one label per line. This part of the dataset contains excerpts from a diversity of western musical genres with varied instrumentations, and it is derived from the original testing dataset from Fuhrmann (http://www.dtic.upf.edu/~ffuhrmann/PhD/). Instrument nomenclature is the same as in the training dataset.
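A test annotation file as described, with one instrument label per line, can be read with a few lines of Python; the labels below are hypothetical:

```python
import io

def load_test_labels(fhandle):
    # One instrument label per line; skip blank lines and stray whitespace
    return [line.strip() for line in fhandle if line.strip()]

# Stand-in for opening a real "excerptName.txt"
load_test_labels(io.StringIO("sax\npia\n"))  # ['sax', 'pia']
```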
Dataset compiled by Juan J. Bosch, Ferdinand Fuhrmann, Perfecto Herrera, Music Technology Group - Universitat Pompeu Fabra (Barcelona).
The IRMAS dataset is offered free of charge for non-commercial use only. You cannot redistribute or modify it. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
For more details, please visit: https://www.upf.edu/web/mtg/irmas
- class mirdata.datasets.irmas.Dataset(data_home=None, version='default')[source]¶
The irmas dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This is sometimes useful behind proxies that inspect the downloaded data. When True, a checksum mismatch prompts a warning instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]¶
Split the tracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
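The random splitting described above can be sketched as follows; this is a minimal illustration of the documented behavior (fractions summing to 1, a fixed seed for reproducibility), not mirdata's internal implementation:

```python
import random

def random_track_splits(track_ids, splits, seed=42, split_names=None):
    if abs(sum(splits) - 1.0) > 1e-8:
        raise ValueError("splits must sum to 1")
    names = split_names or [f"split_{i}" for i in range(len(splits))]
    ids = list(track_ids)
    random.Random(seed).shuffle(ids)  # seeded shuffle for reproducibility
    out, start = {}, 0
    for name, frac in zip(names, splits):
        end = start + round(frac * len(ids))
        out[name] = ids[start:end]
        start = end
    out[names[-1]].extend(ids[start:])  # rounding remainder goes to the last split
    return out

partitions = random_track_splits([str(i) for i in range(10)], [0.8, 0.2],
                                 split_names=["train", "test"])
# len(partitions["train"]) == 8, len(partitions["test"]) == 2
```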
- get_track_splits()[source]¶
Get predetermined track splits (e.g. train/test) released alongside this dataset
- Raises
AttributeError – If this dataset does not have tracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of track_ids
- load_audio(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.irmas.load_audio
- load_multitracks()[source]¶
Load all multitracks in the dataset
- Returns
dict – {mtrack_id: multitrack data}
- Raises
NotImplementedError – If the dataset does not support Multitracks
- load_pred_inst(*args, **kwargs)[source]¶
Deprecated since version 0.3.4: Use mirdata.datasets.irmas.load_pred_inst
- class mirdata.datasets.irmas.Track(track_id, data_home, dataset_name, index, metadata)[source]¶
IRMAS track class
- Parameters
track_id (str) – track id of the track
data_home (str) – Local path where the dataset is stored. If None, looks for the data in the default directory, ~/mir_datasets/irmas
- Variables
track_id (str) – track id
predominant_instrument (list) – predominant instrument(s) of the track (training set only)
train (bool) – flag indicating whether the track is from the training or the testing dataset
genre (str) – code of the track's genre
drum (bool) – flag indicating whether the track contains drums
split (str) – data split (“train” or “test”)
- Other Parameters
instrument (list) – list of predominant instruments as str
- property audio: Optional[Tuple[numpy.ndarray, float]]¶
The track’s audio signal
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
- mirdata.datasets.irmas.load_audio(fhandle: BinaryIO) Tuple[numpy.ndarray, float] [source]¶
Load an IRMAS dataset audio file.
- Parameters
fhandle (str or file-like) – File-like object or path to audio file
- Returns
np.ndarray - the mono audio signal
float - The sample rate of the audio file
mtg_jamendo_autotagging_moodtheme¶
MTG jamendo autotagging moodtheme Dataset Loader
Dataset Info
The MTG Jamendo autotagging mood/theme Dataset is a new open dataset for music auto-tagging. It is built using music available at Jamendo under Creative Commons licenses and tags provided by content uploaders. The dataset contains 18,486 full audio tracks with 195 mood/theme tags. Five fixed data splits are provided for better and fairer replication. For more information please visit: https://github.com/MTG/mtg-jamendo-dataset .
The moodtheme tags are:
action, adventure, advertising, ambiental, background, ballad, calm, children, christmas, commercial, cool, corporate, dark, deep, documentary, drama, dramatic, dream, emotional, energetic, epic, fast, film, fun, funny, game, groovy, happy, heavy, holiday, hopeful, horror, inspiring, love, meditative, melancholic, mellow, melodic, motivational, movie, nature, party, positive, powerful, relaxing, retro, romantic, sad, sexy, slow, soft, soundscape, space, sport, summer, trailer, travel, upbeat, uplifting.
Emotion and theme recognition is a popular task in music information retrieval that is relevant for music search and recommendation systems.
This task involves the prediction of moods and themes conveyed by a music track, given the raw audio. Examples of moods and themes are: happy, dark, epic, melodic, love, film, and space. The full list is available at: https://github.com/mir-dataset-loaders/mirdata/pull/505 Each track is tagged with at least one tag that serves as ground truth.
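Multi-label ground truth like the tags above is commonly encoded as a multi-hot vector per track; a minimal sketch, using a small illustrative subset of the tag vocabulary:

```python
# The vocabulary below is a small subset of the 195 tags, chosen only
# for the example; a real system would use the full tag list.
TAGS = ["happy", "dark", "epic", "melodic", "love", "film", "space"]
TAG_INDEX = {tag: i for i, tag in enumerate(TAGS)}

def binarize(track_tags):
    # Each track has at least one tag; a tag outside the vocabulary
    # would raise a KeyError in this sketch
    vec = [0] * len(TAGS)
    for tag in track_tags:
        vec[TAG_INDEX[tag]] = 1
    return vec

binarize(["happy", "film"])  # [1, 0, 0, 0, 0, 1, 0]
```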
Acknowledgments
This work was funded by the predoctoral grant MDM-2015-0502-17-2 from the Spanish Ministry of Economy and Competitiveness linked to the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068 “MIP-Frontiers”.
This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688382 “AudioCommons”.
- class mirdata.datasets.mtg_jamendo_autotagging_moodtheme.Dataset(data_home=None, version='default')[source]¶
The MTG jamendo autotagging moodtheme dataset
- Variables
data_home (str) – path where mirdata will look for the dataset
version (str) –
name (str) – the identifier of the dataset
bibtex (str or None) – dataset citation/s in bibtex format
indexes (dict or None) –
remotes (dict or None) – data to be downloaded
readme (str) – information about the dataset
track (function) – a function mapping a track_id to a mirdata.core.Track
multitrack (function) – a function mapping a mtrack_id to a mirdata.core.Multitrack
- choice_multitrack()[source]¶
Choose a random multitrack
- Returns
Multitrack – a Multitrack object instantiated by a random mtrack_id
- choice_track()[source]¶
Choose a random track
- Returns
Track – a Track object instantiated by a random track_id
- property default_path¶
Get the default path for the dataset
- Returns
str – Local path to the dataset
- download(partial_download=None, force_overwrite=False, cleanup=False, allow_invalid_checksum=False)[source]¶
Download data to save_dir and optionally print a message.
- Parameters
partial_download (list or None) – A list of keys of remotes to partially download. If None, all data is downloaded
force_overwrite (bool) – If True, existing files are overwritten by the downloaded files.
cleanup (bool) – Whether to delete any zip/tar files after extracting.
allow_invalid_checksum (bool) – Allow invalid checksums of the downloaded data. This is sometimes useful behind proxies that inspect the downloaded data. When True, a checksum mismatch prompts a warning instead of raising an exception
- Raises
ValueError – if invalid keys are passed to partial_download
IOError – if a downloaded file’s checksum is different from expected
- get_mtrack_splits()[source]¶
Get predetermined multitrack splits (e.g. train/test) released alongside this dataset.
- Raises
AttributeError – If this dataset does not have multitracks
NotImplementedError – If this dataset does not have predetermined splits
- Returns
dict – splits, keyed by split name and with values of lists of mtrack_ids
- get_random_mtrack_splits(splits, seed=42, split_names=None)[source]¶
Split the multitracks into partitions, e.g. training, validation, test
- Parameters
splits (list of float) – a list of floats that should sum to 1. It will return as many splits as there are elements in the list
seed (int) – the seed used for the random generator, in order to enhance reproducibility. Defaults to 42
split_names (list) – list of keys to use in the output dictionary
- Returns
dict – a dictionary containing the elements in each split
- get_random_track_splits(splits, seed=42, split_names=None)[source]