⭐ Table of supported datasets ⭐

This table is provided as a guide for users to select appropriate datasets. The list of annotations omits some metadata for brevity, and we document the dataset’s primary annotations only. The number of tracks indicates the number of unique “tracks” in a dataset, but it may not reflect the actual size or diversity of a dataset, as tracks can vary greatly in length (from a few seconds to a few minutes), and may be homogeneous. For specific information about the contents of each dataset, click the link provided in the “Module” column.

“Downloadable” possible values:

  • ✅ : Freely downloadable
  • 🔑 : Available upon request
  • 📺 : YouTube links only
  • ❌ : Not available
Module | Name | Downloadable? | Annotation Types | Tracks
beatles | The Beatles Dataset | audio: ❌, annotations: ✅ | | 180
beatport_key | Beatport EDM key | audio: ✅, annotations: ✅ | | 1486
dali | DALI | audio: 📺, annotations: ✅ | | 5358
groove_midi | Groove MIDI Dataset | audio: ✅, midi: ✅ | | 1150
gtzan_genre | Gtzan-Genre | audio: ✅, annotations: ✅ | | 1000
giantsteps_tempo | Giantsteps EDM tempo Dataset | audio: ❌, annotations: ✅ | | 664
giantsteps_key | Giantsteps EDM key | audio: ✅, annotations: ✅ | | 500
guitarset | GuitarSet | audio: ✅, annotations: ✅ | | 360
ikala | iKala | audio: ❌, annotations: ❌ | | 252
maestro | MAESTRO | audio: ✅, annotations: ✅ | | 1282
medley_solos_db | Medley-solos-DB | audio: ✅, annotations: ✅ | | 21571
medleydb_melody | MedleyDB Melody Subset | audio: 🔑, annotations: ✅ | | 108
medleydb_pitch | MedleyDB Pitch Tracking Subset | audio: 🔑, annotations: ✅ | | 103
mridangam_stroke | Mridangam Stroke | audio: ✅, annotations: ✅ | | 6977
orchset | Orchset | audio: ✅, annotations: ✅ | | 64
rwc_classical | RWC Classical | audio: ❌, annotations: ✅ | | 50
rwc_jazz | RWC Jazz | audio: ❌, annotations: ✅ | | 50
rwc_popular | RWC Pop | audio: ❌, annotations: ✅ | | 100
salami | Salami | audio: ❌, annotations: ✅ | | 1359
tinysol | TinySOL | audio: ✅, annotations: ✅ | | 2913

Annotation Type Descriptions

The table above lists annotation types as a guide for choosing appropriate datasets, but annotation types are difficult to categorize generically: they depend on varying definitions, and their meaning can change with the type of music they describe. Here we provide a rough guide to the types in this table, but we strongly recommend reading the dataset-specific documentation to ensure the data is what you expect.

Beats

Musical beats, typically encoded as a sequence of timestamps and corresponding beat positions. This implicitly includes downbeat information (the beginning of a musical measure).
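As a rough illustration of how downbeats can be recovered from beat positions, the sketch below assumes beats are stored as two parallel lists and that position 1 marks the first beat of a measure. The function name and the example values are hypothetical, and individual datasets may use a different convention.

```python
# Hypothetical beat annotation: parallel lists of times (in seconds) and
# positions within the measure. Assumption: position 1 is the downbeat.
beat_times = [0.50, 1.00, 1.50, 2.00, 2.50, 3.00]
beat_positions = [1, 2, 3, 4, 1, 2]


def downbeat_times(times, positions):
    """Return the timestamps whose beat position is 1 (assumed downbeat)."""
    if len(times) != len(positions):
        raise ValueError("times and positions must have the same length")
    return [t for t, p in zip(times, positions) if p == 1]


downbeats = downbeat_times(beat_times, beat_positions)
```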

Chords

Musical chords, e.g. as might be played on a guitar. Typically encoded as a sequence of labeled events, where each event has a start time, end time, and a label. The label taxonomy varies per dataset, but typically encodes a chord’s root and its quality, e.g. A:m7 for “A minor 7”.
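A minimal sketch of a labeled chord event and of splitting a label such as A:m7 into root and quality follows. The `ChordEvent` type and `split_chord_label` helper are hypothetical names, and the "root:quality" layout is an assumption taken from the example above; a dataset with a different taxonomy would need a different parser.

```python
from typing import NamedTuple, Optional, Tuple


class ChordEvent(NamedTuple):
    """One labeled chord event (hypothetical structure for illustration)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: str    # e.g. "A:m7"


def split_chord_label(label: str) -> Tuple[str, Optional[str]]:
    """Split a 'root:quality' label; quality is None when there is no ':'."""
    root, sep, quality = label.partition(":")
    return root, (quality if sep else None)


event = ChordEvent(start=0.0, end=2.0, label="A:m7")
root, quality = split_chord_label(event.label)
```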

Drums

Transcription of the drums, typically encoded as a sequence of labeled events, where the labels indicate which drum instrument (e.g. cymbal, snare drum) is played. These events often overlap with one another, as multiple drums can be played at the same time.

F0

Musical pitch contours, typically encoded as time series indicating the musical pitch over time. The time series typically have evenly spaced timestamps, each with a corresponding pitch value, which may be encoded in a number of formats/granularities, including MIDI note numbers and Hertz.
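Because pitch values may arrive in either Hertz or MIDI note numbers, a conversion between the two is often the first step. The sketch below uses the standard equal-temperament relation with A4 = 440 Hz = MIDI note 69. Treating non-positive frequencies as "no pitch" is an assumption; check how each dataset marks unvoiced frames.

```python
import math


def hz_to_midi(freq_hz):
    """Convert Hz to a fractional MIDI note number (A4 = 440 Hz = 69).

    Returns None for non-positive input, assumed here to mean "no pitch".
    """
    if freq_hz <= 0:
        return None
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def midi_to_hz(midi_note):
    """Convert a (possibly fractional) MIDI note number to Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69.0) / 12.0)
```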

Genre

A typically global “tag”, indicating the genre of a recording. Note that the concept of genre is highly subjective and we refer those new to this task to this article.

Instruments

Labels indicating which instrument is present in a musical recording. This may refer to recordings of solo instruments, or to recordings with multiple instruments. The labels may be global to a recording, or they may vary over time, indicating the presence/absence of a particular instrument as a time series.

Key

Musical key. This can be defined globally for an audio file or as a sequence of events.

Lyrics

Lyrics corresponding to the singing voice of the audio. These may be raw text with no time information, or they may be time-aligned events. They may have varying levels of granularity (paragraph, line, word, phoneme, character) depending on the dataset.

Melody

The musical melody of a song. Melody has no universal definition and is typically defined per dataset. It is typically encoded as F0 or as Notes. Other types of annotations such as Vocal F0 or Vocal Notes can often be considered as melody annotations as well.

Notes

Musical note events, typically encoded as sequences of start time, end time, label. The label typically indicates a musical pitch, which may be in a number of formats/granularities, including MIDI note numbers, Hertz, or pitch class.
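To illustrate the difference in granularity between a MIDI note number and a pitch class, the sketch below reduces note labels to pitch classes. The (start, end, MIDI number) tuples, the sharp-based names, and the rounding to the nearest semitone are all assumptions made for the example.

```python
# Pitch-class names using sharps (an arbitrary choice for this sketch).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_pitch_class(midi_note):
    """Map a MIDI note number to a pitch-class name, discarding the octave."""
    return PITCH_CLASSES[int(round(midi_note)) % 12]


# Hypothetical note events: (start time, end time, MIDI note number).
notes = [(0.0, 0.5, 60), (0.5, 1.0, 64), (1.0, 2.0, 69)]
pitch_classes = [midi_to_pitch_class(pitch) for _, _, pitch in notes]
```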

Sections

Musical sections, which may be “flat” or “hierarchical”, typically encoded by a sequence of timestamps indicating musical section boundary times. Section annotations sometimes also include labels for sections, which may indicate repetitions and/or the section type (e.g. Chorus, Verse).
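For a "flat" annotation, consecutive boundary times can be paired into section intervals. This is a minimal sketch with a hypothetical helper name; it assumes the first and last boundaries mark the start and end of the annotated region, which may not hold for every dataset.

```python
def boundaries_to_intervals(boundaries):
    """Pair consecutive boundary times into (start, end) section intervals."""
    times = sorted(boundaries)
    return list(zip(times[:-1], times[1:]))


# Hypothetical boundary times in seconds.
sections = boundaries_to_intervals([0.0, 12.5, 30.0, 47.2])
```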

Technique

The playing technique used by a particular instrument, for example “Pizzicato”. This label may be global for a given recording or encoded as a sequence of labeled events.

Tempo

The tempo of a song, typically in units of beats per minute (bpm). This is often indicated globally per track, but in practice tracks may have tempos that change, and some datasets encode tempo as a time-varying quantity. Additionally, there may be multiple reasonable tempos at any given time (for example, often 2x or 0.5x a tempo value will also be “correct”). For this reason, some datasets provide two or more different tempo values.
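One way to account for the 2x and 0.5x ambiguity when comparing a tempo estimate to a reference is to accept a match against any of those multiples. The sketch below is illustrative only: the function name, the 4% relative tolerance, and the set of factors are assumptions, not values taken from any dataset.

```python
def tempo_matches(estimate_bpm, reference_bpm, tolerance=0.04, factors=(0.5, 1.0, 2.0)):
    """Return True if the estimate is within a relative tolerance of the
    reference tempo scaled by any of the given factors."""
    return any(
        abs(estimate_bpm - reference_bpm * f) <= tolerance * reference_bpm * f
        for f in factors
    )
```

With these defaults, an estimate of 120 bpm is accepted against a reference of 60 bpm, while 90 bpm is rejected against a reference of 120 bpm.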

Vocal Activity

A time series or sequence of events indicating when singing voice is present in a recording. This type of annotation is implicitly available when Vocal F0 or Vocal Notes annotations are available.
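The sketch below shows one way vocal activity could be derived from a Vocal F0 time series. It assumes that a value of 0 (or below) marks an unvoiced frame, and it closes each active interval at the timestamp of the first unvoiced frame that follows; both conventions are assumptions and may differ per dataset.

```python
def f0_to_activity(times, f0_hz):
    """Return (start, end) intervals during which f0 is positive.

    An interval ends at the timestamp of the first non-positive frame after
    it, or at the last timestamp if the series ends while still voiced.
    """
    intervals = []
    start = None
    for t, f in zip(times, f0_hz):
        if f > 0 and start is None:
            start = t  # voiced region begins
        elif f <= 0 and start is not None:
            intervals.append((start, t))  # voiced region ends
            start = None
    if start is not None:
        intervals.append((start, times[-1]))
    return intervals
```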

Stroke Name

An open “tag” to identify an instrument stroke name or type. Used for instruments that have specific stroke labels.

Tonic

The absolute tonic of a track. It may refer to the tonic of a single stroke, or the tonal center of a track.