Supported Datasets and Annotations

⭐ Dataset Quick Reference ⭐

This table is provided as a guide for users to select appropriate datasets. The list of annotations omits some metadata for brevity, and we document the dataset’s primary annotations only. The number of tracks indicates the number of unique “tracks” in a dataset, but it may not reflect the actual size or diversity of a dataset, as tracks can vary greatly in length (from a few seconds to a few minutes), and may be homogeneous.

“Downloadable” possible values:

  • ✅ : Freely downloadable

  • 🔑 : Available upon request

  • 📺 : YouTube links only

  • ❌ : Not available

Find the API documentation for each of the datasets below in Initializing.

Dataset

Downloadable?

Annotation Types

Tracks

License

AcousticBrainz Genre

  • audio: ❌

  • annotations: ✅

  • features: ✅

>4M

Beatles

  • audio: ❌

  • annotations: ✅

180

Beatport EDM key

  • audio: ✅

  • annotations: ✅

1486

CC BY-SA 4.0

cante100

  • audio: 🔑

  • annotations: ✅

F0

100

Custom

DALI

  • audio: 📺

  • annotations: ✅

5358

CC BY-SA 4.0

Giantsteps key

  • audio: ✅

  • annotations: ✅

Global Key

500

CC BY-SA 4.0

Giantsteps tempo

  • audio: ❌

  • annotations: ✅

664

CC BY-SA 4.0

Groove MIDI

  • audio: ✅

  • midi: ✅

1150

CC BY-SA 4.0

Gtzan-Genre

  • audio: ✅

  • annotations: ✅

Global Genre

1000

Guitarset

  • audio: ✅

  • midi: ✅

360

MIT

Ikala

  • audio: ❌

  • annotations: ❌

252

Custom

IRMAS

  • audio: ✅

  • annotations: ✅

9579

CC BY-NC-SA 3.0

MAESTRO

  • audio: ✅

  • annotations: ✅

Piano Notes

1282

CC BY-NC-SA 4.0

Medley-solos-DB

  • audio: ✅

  • annotations: ✅

Instruments

21571

CC BY-SA 4.0

MedleyDB melody

  • audio: 🔑

  • annotations: ✅

Melody F0

108

CC BY-NC-SA 4.0

MedleyDB pitch

  • audio: 🔑

  • annotations: ✅

103

CC BY-NC-SA 4.0

Mridangam Stroke

  • audio: ✅

  • annotations: ✅

6977

CC BY 3.0

Orchset

  • audio: ✅

  • annotations: ✅

Melody F0

64

CC BY-NC-SA 4.0

RWC classical

  • audio: ❌

  • annotations: ✅

61

Custom

RWC jazz

  • audio: ❌

  • annotations: ✅

50

Custom

RWC popular

  • audio: ❌

  • annotations: ✅

100

Custom

Salami

  • audio: ❌

  • annotations: ✅

Sections

1359

CC0 1.0

Saraga Carnatic

  • audio: ✅

  • annotations: ✅

249

CC BY-NC-SA 4.0

Saraga Hindustani

  • audio: ✅

  • annotations: ✅

108

CC BY-NC-SA 4.0

Tinysol

  • audio: ✅

  • annotations: ✅

2913

CC BY 4.0

Tonality ClassicalDB

  • audio: ❌

  • annotations: ✅

Global Key

881

CC BY-NC-SA 4.0

Annotation Types

The table above provides annotation types as a guide for choosing appropriate datasets, but it is difficult to categorize annotation types generically, as they depend on varying definitions, and their meaning can change depending on the type of music they correspond to. Here we provide a rough guide to the types in this table, but we strongly recommend reading the dataset-specific documentation to ensure the data is as you expect. To see how these annotation types are implemented in mirdata, see Annotations.

Beats

Musical beats, typically encoded as a sequence of timestamps and corresponding beat positions. This implicitly includes downbeat information (the beginning of a musical measure).
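As a rough illustration of this encoding, the sketch below stores beats as two parallel lists and recovers the downbeats from the beat positions. The names `beat_times`, `beat_positions`, and `downbeats` are hypothetical and not part of mirdata, and the convention that position 1 marks the start of a measure is an assumption that may not hold for every dataset.

```python
# Hypothetical beat annotation: parallel lists of timestamps (in seconds)
# and beat positions within the measure (assumed to be 1-indexed).
beat_times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
beat_positions = [1, 2, 3, 4, 1, 2, 3, 4]


def downbeats(times, positions):
    """Return the timestamps whose beat position is 1 (assumed measure start)."""
    return [t for t, p in zip(times, positions) if p == 1]


print(downbeats(beat_times, beat_positions))
```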

Chords

Musical chords, e.g. as might be played on a guitar. Typically encoded as a sequence of labeled events, where each event has a start time, end time, and a label. The label taxonomy varies per dataset, but typically encodes a chord’s root and its quality, e.g. A:m7 for “A minor 7”.
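A minimal sketch of such labeled events follows, assuming labels use a colon to separate root and quality as in the A:m7 example. `ChordEvent` and `split_label` are hypothetical names for illustration only; real datasets may use a different label syntax.

```python
from typing import NamedTuple


class ChordEvent(NamedTuple):
    """Hypothetical container for one labeled chord event."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: str    # e.g. "A:m7"


def split_label(label):
    """Split a 'root:quality' label; a bare root yields an empty quality."""
    root, _, quality = label.partition(":")
    return root, quality


chords = [
    ChordEvent(0.0, 2.0, "A:m7"),
    ChordEvent(2.0, 4.0, "D:7"),
    ChordEvent(4.0, 6.0, "G"),
]

for chord in chords:
    print(chord.start, chord.end, split_label(chord.label))
```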

Drums

Transcription of the drums, typically encoded as a sequence of labeled events, where the labels indicate which drum instrument (e.g. cymbal, snare drum) is played. These events often overlap with one another, as multiple drums can be played at the same time.
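To make the overlap concrete, here is a small sketch that lists which drums are sounding at a given time. The event tuples and the `active_at` helper are hypothetical, and treating each event as a half-open interval (start inclusive, end exclusive) is an assumption made for this example.

```python
# Hypothetical drum transcription: (start, end, label) tuples in seconds.
drum_events = [
    (0.0, 0.1, "kick"),
    (0.0, 0.3, "hi-hat"),
    (0.5, 0.6, "snare"),
    (0.5, 0.8, "hi-hat"),
]


def active_at(events, t):
    """Return the sorted labels of all events whose interval contains time t."""
    return sorted(label for start, end, label in events if start <= t < end)


print(active_at(drum_events, 0.05))  # kick and hi-hat overlap here
print(active_at(drum_events, 0.4))   # nothing is sounding here
```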

F0

Musical pitch contours, typically encoded as time series indicating the musical pitch over time. The time series typically have evenly spaced timestamps, each with a corresponding pitch value which may be encoded in a number of formats/granularities, including MIDI note numbers and Hertz.
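Because pitch values may be given in either Hertz or MIDI note numbers, it can help to convert between the two. The sketch below uses the standard equal-temperament relation with A4 = 440 Hz = MIDI note 69; the hop size and the sample values are made up for illustration, and a given dataset may assume a different reference tuning.

```python
import math

A4_HZ = 440.0  # reference tuning, assumed here
A4_MIDI = 69   # MIDI note number of A4


def hz_to_midi(freq_hz):
    """Convert a frequency in Hertz to a (possibly fractional) MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)


def midi_to_hz(midi_note):
    """Convert a MIDI note number to a frequency in Hertz."""
    return A4_HZ * 2.0 ** ((midi_note - A4_MIDI) / 12.0)


# Hypothetical contour with evenly spaced timestamps (hop of 10 ms).
hop = 0.01
f0_hz = [220.0, 233.1, 246.9, 261.6]
times = [i * hop for i in range(len(f0_hz))]

for t, f in zip(times, f0_hz):
    print(f"{t:.2f} s  {f:6.1f} Hz  MIDI {hz_to_midi(f):.2f}")
```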

Genre

A typically global “tag”, indicating the genre of a recording. Note that the concept of genre is highly subjective and we refer those new to this task to this article.

Instruments

Labels indicating which instrument is present in a musical recording. This may refer to recordings of solo instruments, or to recordings with multiple instruments. The labels may be global to a recording, or they may vary over time, indicating the presence/absence of a particular instrument as a time series.

Key

Musical key. This can be defined globally for an audio file or as a sequence of events.

Lyrics

Lyrics corresponding to the singing voice of the audio. These may be raw text with no time information, or they may be time-aligned events. They may have varying levels of granularity (paragraph, line, word, phoneme, character) depending on the dataset.

Melody

The musical melody of a song. Melody has no universal definition and is typically defined per dataset. It is typically encoded as F0 or as Notes. Other types of annotations such as Vocal F0 or Vocal Notes can often be considered as melody annotations as well.

Notes

Musical note events, typically encoded as sequences of start time, end time, label. The label typically indicates a musical pitch, which may be in a number of formats/granularities, including MIDI note numbers, Hertz, or pitch class.
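As an example of moving between granularities, the following sketch reduces MIDI note numbers to pitch classes. The note list and the `pitch_class` helper are hypothetical, and spelling the pitch classes with sharps is an arbitrary choice made here.

```python
# Pitch class names indexed by MIDI note number modulo 12 (sharps chosen arbitrarily).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical note events: (start, end, MIDI note number).
notes = [(0.0, 0.5, 60), (0.5, 1.0, 64), (1.0, 2.0, 67)]


def pitch_class(midi_note):
    """Map a MIDI note number to its pitch class name, discarding the octave."""
    return PITCH_CLASSES[midi_note % 12]


for start, end, midi_note in notes:
    print(start, end, midi_note, pitch_class(midi_note))
```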

Phrases

Musical phrase events, typically encoded by a sequence of timestamps indicating the boundary times and defined by solfège symbols. These annotations are not intended to describe the complete melody but the musical phrases present in the track.

Sections

Musical sections, which may be “flat” or “hierarchical”, typically encoded by a sequence of timestamps indicating musical section boundary times. Section annotations sometimes also include labels for sections, which may indicate repetitions and/or the section type (e.g. Chorus, Verse).
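For a “flat” annotation, boundary times and optional labels can be combined into labeled intervals, as in the sketch below. The boundary values, labels, and the `to_intervals` helper are hypothetical; hierarchical annotations would need one such list per level.

```python
# Hypothetical flat section annotation: N + 1 boundary times for N labeled sections.
boundaries = [0.0, 15.2, 45.8, 76.4]
labels = ["Intro", "Verse", "Chorus"]


def to_intervals(boundaries, labels):
    """Pair consecutive boundaries into (start, end, label) section intervals."""
    return list(zip(boundaries[:-1], boundaries[1:], labels))


for start, end, label in to_intervals(boundaries, labels):
    print(f"{start:6.1f} - {end:6.1f}  {label}")
```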

Technique

The playing technique used by a particular instrument, for example “Pizzicato”. This label may be global for a given recording or encoded as a sequence of labeled events.

Tempo

The tempo of a song, typically in units of beats-per-minute (bpm). This is often indicated globally per track, but in practice tracks may have tempos that change, and some datasets encode tempo as a time-varying quantity. Additionally, there may be multiple reasonable tempos at any given time (for example, often 2x or 0.5x a tempo value will also be “correct”). For this reason, some datasets provide two or more different tempo values.
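One way to account for this ambiguity when comparing tempo values is to accept an estimate that is close to the reference or to half or double the reference. The sketch below does this; `tempo_matches` is a hypothetical helper, and the 4% tolerance and the set of factors are illustrative assumptions rather than values prescribed by any dataset.

```python
def tempo_matches(estimate, reference, tolerance=0.04, factors=(0.5, 1.0, 2.0)):
    """Return True if the estimate is within a relative tolerance of the
    reference tempo scaled by any of the given factors (all in bpm)."""
    return any(
        abs(estimate - factor * reference) <= tolerance * factor * reference
        for factor in factors
    )


print(tempo_matches(60.0, 120.0))   # half tempo is accepted
print(tempo_matches(121.0, 120.0))  # within tolerance of the reference
print(tempo_matches(90.0, 120.0))   # not close to 60, 120, or 240 bpm
```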

Vocal Activity

A time series or sequence of events indicating when singing voice is present in a recording. This type of annotation is implicitly available when Vocal F0 or Vocal Notes annotations are available.
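The sketch below shows one way vocal activity could be derived from a Vocal F0 time series. It assumes that non-positive F0 values mark frames without singing voice, which is a common convention but not one guaranteed by every dataset; `voiced_intervals` and the sample data are hypothetical.

```python
def voiced_intervals(times, f0_hz):
    """Group consecutive frames with F0 > 0 into (start, end) intervals.

    An interval ends at the timestamp of the first unvoiced frame after it,
    or at the last timestamp if the series ends while still voiced.
    """
    intervals = []
    start = None
    for t, f in zip(times, f0_hz):
        if f > 0 and start is None:
            start = t  # a voiced region begins
        elif f <= 0 and start is not None:
            intervals.append((start, t))  # the voiced region ends
            start = None
    if start is not None:
        intervals.append((start, times[-1]))
    return intervals


times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
vocal_f0 = [0.0, 220.0, 221.0, 0.0, 0.0, 330.0, 0.0]
print(voiced_intervals(times, vocal_f0))
```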

Stroke Name

An open “tag” to identify an instrument stroke name or type. Used for instruments that have specific stroke labels.

Tonic

The absolute tonic of a track. It may refer to the tonic of a single stroke, or the tonal center of a track.