The Ohio State University

School of Music

The iRb Corpus in **jazz format

Yuri Broze and Daniel Shanahan

Updated 26 Dec 2012


This is the online home of the iRb corpus, made freely available for music research. The corpus contains 1,186 individual files, each representing one page from the "Jazz 1200 Standards" collection from the iReal b forum.


A single .zip file of the 1,186 songs is available for download.

A helper script, jazzparser, is available here.

**jazz Specification

To accommodate the typical tools of the Humdrum Toolkit, the corpus was translated into **jazz, a newly specified Humdrum representation. The specification adheres broadly to the **kern representation, with certain modifications to better suit jazz harmonic records. Each individual .jazz file is a UTF-8 text file with reference records, indicators of form, key interpretations, and data tokens.

To give a taste, here is the shortest **jazz file in the corpus:

!!!OTL: Sweeping Up
!!!COM: Swallow, Steve
!!!ODT: 1975

Reference Records

The **jazz files in this corpus begin with metadata, presented in the form of **kern-style reference records as described in the User's Guide. The iRb files include at minimum the following three records:
  • !!!OTL: Title
  • !!!COM: Composer
  • !!!ODT: Year (date) of composition

If a lyricist is credited, the reference record used is:

  • !!!LYR: Lyricist

Some songs have multiple composer or lyricist credits. These are given using !!!COM1: and !!!COM2:, or !!!LYR1: and !!!LYR2:, as appropriate. For example:

!!!OTL: Party's Over, The
!!!COM: Styne, Jule
!!!LYR1: Comden, Betty
!!!LYR2: Green, Adolph
!!!ODT: 1956

To get a complete list of the composers represented in the iRb corpus, search for "!!!COM" rather than "!!!COM:", since the former also matches co-composer credits.

For example,

> grep '^!!!COM' *.jazz | sed 's/^.*: //' | sort | uniq | wc -l
557
> grep '^!!!COM:' *.jazz | sed 's/^.*: //' | sort | uniq | wc -l
366

This means that 557 unique individuals have composition credits for the songs in the corpus, but only 366 unique individuals have sole composer credits. See the Humdrum User Guide for more information about using UNIX tools in music research.
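
The same pipeline can be extended to rank composers by number of credits. The following is a minimal sketch, not part of the corpus distribution: the function name top_composers and its two arguments are hypothetical, and it assumes all .jazz files sit in a single directory.

```shell
# Hypothetical helper: list the most frequently credited composers.
# $1 = directory holding the .jazz files
# $2 = how many names to show (defaults to 10)
top_composers() {
    grep -h '^!!!COM' "$1"/*.jazz |    # -h suppresses the file name prefix
        sed 's/^[^:]*: *//' |          # strip the "!!!COM:" or "!!!COM1:" key
        sort | uniq -c |               # count identical names
        sort -rn | head -n "${2:-10}"  # most credits first
}
```

For instance, top_composers . 5 would print the five most credited names among the files in the current directory, each preceded by its count.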


Following the initial reference records, several Humdrum interpretations appear. These specify the representation used, the formal structure, the meter, and the apparent key (as judged by the authors). For example, here are the reference records and first interpretations for a **jazz file:

!!!OTL: Since I Fell For You
!!!COM: Johnson, Buddy
!!!ODT: 1945
**jazz
*>[A,N1,A,N2,B,A2]
*>A
*M4/4
*E-:

In order, these interpretations are:

  • **jazz -- Specifies the **jazz representation.
  • *>[A,N1,A,N2,B,A2] -- Specifies the formal structure.
  • *>A -- Declares that the following belongs to the A section.
  • *M4/4 -- Specifies the piece is in 4/4 time.
  • *E-: -- We interpreted the piece to be in E flat major.

The formal structure is compatible with the Humdrum thru command (or Craig Sapp's thrux). These structure guides represent a compromise between section labels and the machine-performance specifications in the iReal b originals, so they should be treated with caution.
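
Because these interpretations have fixed shapes, they can be pulled out of a file with standard UNIX tools. The sketch below is illustrative only: the function names jazz_form and jazz_key are hypothetical, and it assumes that the form appears on a single line of the shape *>[...] and that the key interpretation is a letter name with optional accidentals followed by a colon.

```shell
# Hypothetical helper: print the section labels of the form record
# (e.g. *>[A,N1,A,N2,B,A2]), one label per line.
jazz_form() {
    sed -n 's/^\*>\[\(.*\)]$/\1/p' "$1" | head -n 1 | tr ',' '\n'
}

# Hypothetical helper: print the first key interpretation without
# its leading "*" and trailing ":" (e.g. *E-: becomes E-).
jazz_key() {
    sed -n 's/^\*\([A-Ga-g][#-]*\):$/\1/p' "$1" | head -n 1
}
```

Neither function interprets the structure; expanding the sections into a through-composed version remains the job of thru or thrux.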

Data Records

**jazz data records are similar to those of **kern.

Barline tokens. Single barlines are represented as "=" and double barlines as "==". Bar numbers and barlines signifying repeats are not implemented.

Chord tokens. Chords are of the form [duration][root][extensions]. Durations are in Humdrum reciprocal form, and chord roots are represented like **kern pitches, using a single capital letter. Sharps are designated using "#", and flats by "-", in accordance with **kern usage. Chord qualities and extensions are given as they appear in written form. An optional ":" can be used to set qualities and extensions apart from the root in **jazz, to enhance comprehension.

Here is a brief example:


In addition, slash chords are designated with a slash, while suggested substitutions are provided in parentheses. Note that the **jazz representation diverges from the **kern standard in that its ordering of elements is strict.
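
The strict [duration][root][extensions] ordering makes chord tokens easy to split with a regular expression. The following is a rough sketch rather than the method used by jazzparser: the function name parse_chord is hypothetical, it simply discards slash basses and parenthesized substitutions, and it assumes that any "#" or "-" directly after the root letter belongs to the root.

```shell
# Hypothetical helper: split one chord token into duration, root, and
# quality/extension, printed separated by single spaces.
# Assumptions: a parenthesized substitution and a "/bass" suffix can be
# dropped, and accidentals right after the root letter are part of the root.
parse_chord() {
    printf '%s\n' "$1" |
        sed -e 's/(.*)//' \
            -e 's|/.*$||' \
            -e 's/^\([0-9][0-9]*\.*\)\([A-G][#-]*\):\{0,1\}\(.*\)$/\1 \2 \3/'
}
```

For example, parse_chord '2E-:min7/D-' prints "2 E- min7". Tokens that do not begin with a duration, such as barlines, pass through unchanged.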


Included is a bash script that performs preliminary parsing of the **jazz files, extracting useful information into several spines. Sample output is as follows:

**jazz		**kern	**exten	**solfa	**mint	**quals	**dur
*thru		*thru	*thru	*thru	*thru	*thru	*thru
*M4/4		*M4/4	*M4/4	*M4/4	*M4/4	*M4	*M4/4
*D-:		*D-:	*D-:	*D-:	*D-:	*D-:	*D-:
2E-:min7	E-	min7	re	[E-]	min7	2.0000
2B-7b13		B-	7b13	la	P5	dom	2.0000
=		=	=	=	=	=	=
2E-:min7	E-	min7	re	P4	min7	2.0000
2A-7		A-	7	so	P4	dom	2.0000
=		=	=	=	=	=	=
2D-:maj7	D-	maj7	do	P4	maj	2.0000
2G-7		G-	7	fa	P4	dom	2.0000
=		=	=	=	=	=	=
2F:min7		F	min7	mi	M7	min7	2.0000
2Eo7		E	o7	ri	M7	dim	2.0000
=		=	=	=	=	=	=
2E-:min7	E-	min7	re	d1	min7	2.0000
2E-:min7/D-	E-	min7	re	P1	min7	2.0000
=		=	=	=	=	=	=
2Ch7		C	h7	ti	M6	half	2.0000
2F7b9		F	7b9	mi	P4	dom	2.0000
=		=	=	=	=	=	=
2B-:min7	B-	min7	la	P4	min7	2.0000
4E-:min7	E-	min7	re	P4	min7	1.0000
4A-7		A-	7	so	P4	dom	1.0000
=		=	=	=	=	=	=
4D-6		D-	6	do	P4	maj	1.0000
4G-7		G-	7	fa	P4	dom	1.0000
4Fh		F	h	mi	M7	half	1.0000
4B-7		B-	7	la	P4	dom	1.0000
==		==	==	==	==	==	==
*-		*-	*-	*-	*-	*-	*-

Output from jazzparser can be processed further with the Humdrum extract command, for example to isolate individual spines.
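
With the Humdrum Toolkit installed, a spine can be selected by its interpretation with something like extract -i '**quals'. For readers without the toolkit, the sketch below approximates that step in awk. It is not part of the corpus tools: the function name pick_spine is hypothetical, and it assumes tab-separated input in which runs of tabs act as a single separator.

```shell
# Hypothetical helper: print one spine, chosen by its exclusive
# interpretation (e.g. **quals), from jazzparser-style output on stdin.
# Assumption: fields are separated by one or more tabs.
pick_spine() {
    awk -F '\t+' -v name="$1" '
        col == 0 { for (i = 1; i <= NF; i++) if ($i == name) col = i }
        col > 0  { print $col }
    '
}
```

Assuming the parser output has been saved to a file (here called parsed.txt for illustration), pick_spine '**quals' < parsed.txt | grep -v '^[*=!]' | sort | uniq -c would tally the chord qualities in that file.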