Pop*
Pop* (pronounced Pop-Star) is a system developed by Paul Bodily, starting in Winter 2016, to automate the composition of pop songs. A fundamental element of the project is the automated annotation of tablature data found in open-source databases such as ultimate-guitar.com. This venture is titled Tab-Complete. The purpose of this wiki is to document the research efforts for both the Pop* system and the Tab-Complete dataset.
Project Directory
Tab-Complete - Automated completion and annotation of pop music tabs
Potential Publications
Sequence alignment methods for identifying structure in music
Pairwise MSA for identifying consensus of several lyric sources
Lyrical sequence alignment for identifying musical content of tab sources
Aligning pairwise alignments of lyric lines for chorus identification (alignment of alignments)
Aligning pairwise alignments of chord lines for verse identification (alignment of alignments)
Phoneme sequence alignment for rhyme scheme detection
Pairwise alignment of intra-segmental chord lines for substructure identification
How do we get complete tabs? Compare MSA with pairwise Smith-Waterman techniques.
Using rhymes and a distribution metric to pull structure out of unstructured tabs
Upcoming Conferences
Assumptions Made
The number of chords per line doesn't equal the number of measures per line
All lines in a segment will have the same number of measures per line.
All Markov models (MMs) are first-order (single-order) right now
Lyric transition models are based on word→word transition frequencies, not syllable→syllable transition frequencies (which we may want to try)
We select the number of words (not syllables) per line, and find a way to make it work syllable-wise
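The first-order, word-level assumption above can be illustrated with a minimal sketch. This is not the Pop* implementation; the function names and the `<s>` / `</s>` line-boundary tokens are hypothetical, and probabilities are plain relative frequencies with no smoothing.

```python
from collections import Counter, defaultdict

# Hypothetical boundary tokens marking the start and end of a lyric line.
START, END = "<s>", "</s>"

def train_word_transitions(lines):
    """Count word->word transitions within each lyric line (first-order model)."""
    counts = defaultdict(Counter)
    for line in lines:
        tokens = [START] + line.lower().split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def transition_probability(counts, prev, nxt):
    """Relative-frequency estimate of P(nxt | prev); 0.0 if prev was never seen."""
    row = counts.get(prev)
    if not row:
        return 0.0
    return row[nxt] / sum(row.values())

counts = train_word_transitions(["let it be", "let it go", "let me be"])
print(transition_probability(counts, "let", "it"))
```

Switching to a syllable→syllable model would only change the tokenization step, assuming a syllabifier is available.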
Todo List
The LyricsNet scraper isn't scraping the Beatles
Include figures in manuscript
Get some gold-standard data to validate with and insert results into manuscript
Get a way for Ben to access verses/choruses on demand, etc.
Need better segmentation of songs to avoid a bunch of 1-line segments
Need better rhyme scheme analysis to get good rhyme constraints
Chords are being grouped only by root and minorness. Needs to be expanded to get more variety.
Need to check if rhyme constraints are working properly
Need to add constraints
on chorus to indicate which lines are fixed and which aren't (lower priority)
for subrhyme schemes
repetitive subsequences of chords
repetitive subsequences of lyrics
Start drafting paper on data process and alignment
Chorus must also match chords
Create a gold-standard labeled dataset to test various methods. Label:
Key
Rhyme scheme
Internal rhymes
Segment structure
Normalize keys
Use tab contents rather than lyric contents in finalized tab (use lyrics just for identification of song body and completeness)
When eliciting rhyme scheme, currently it's getting all pronunciations of the line and then taking the last few syllables of each (highly redundant). Fix it.
Compare different rhyme scoring algorithms:
Pat Patterson's rules
Hirjee Matrix simple scoring
Pat Patterson's rules w/ Hirjee Matrix
Alignment
With and without considering the penultimate syllable
Normalizing the Hirjee Matrix
Penalize for distance
Start parsing the scraped data
Get rid of blocks in tab with no chords && no matching lyrics
Explicitly set costs in Aligner before it is called.
Have TabComplete print out a tab-delimited file with columns holding values indicative of the quality at each step (don't filter initially, choose later)
Do thorough manual verification of pipeline up through tab parser
Check how complete the lyric consensuses are
Check that all the fields are being correctly populated
Check how many lyric sheets per song are being aligned
Create or find gold standard dataset
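The todo item about chords being grouped only by root and minorness can be made concrete with a small sketch. It is an illustration under assumptions, not the project's code: `chord_group` and `group_chords` are hypothetical names, and any suffix that does not start with "m" (other than "maj") is treated as non-minor, which lumps diminished, augmented, and suspended chords in with major ones.

```python
import re
from collections import defaultdict

# Root note: a letter A-G with an optional sharp or flat; the rest is the suffix.
_ROOT = re.compile(r"^([A-G][#b]?)(.*)$")

def chord_group(symbol):
    """Reduce a chord symbol to (root, is_minor), discarding all other detail."""
    m = _ROOT.match(symbol)
    if m is None:
        raise ValueError(f"not a chord symbol: {symbol!r}")
    root, suffix = m.groups()
    # "m" or "min" marks a minor chord; "maj" also starts with "m" but is major.
    is_minor = suffix.startswith("m") and not suffix.startswith("maj")
    return root, is_minor

def group_chords(symbols):
    """Bucket chord symbols by their (root, is_minor) group."""
    groups = defaultdict(list)
    for symbol in symbols:
        groups[chord_group(symbol)].append(symbol)
    return dict(groups)

print(group_chords(["Am", "Am7add11", "A", "F#m7b5", "Gmaj7"]))
```

Expanding the grouping key (for example, adding a seventh or suspended flag to the tuple) is one way to get more variety without changing the callers.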
Should-Do List
There are a number of known issues with the MSA consensus calling. It may be better to just use the text from the tab, rather than whatever mish-mash the consensus pulls out (e.g., mindeah).
Right now any alphabetic character will take precedence over a non-alphabetic character if there's a tie
If there are three different sequences with three different characters, one will get picked at random or may not get picked at all, depending on where gaps are inserted.
So much depends on the scoring costs
Find threshold for rhyming vs no rhyming
Try running on just good tabbers' tabs, on Pro tabs (requires subscription?), or on those with videos?
Implement and give graduated scoring for relative major chord matching in chord alignment
Find best cost matrices for MSA (i.e., match cost, mismatch cost, gap open/extend costs)
For speed,
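Several items above (explicit scoring costs, graduated scoring for relative major chord matching) can be sketched with a generic global aligner that takes the scoring function as a parameter. This is a minimal Needleman-Wunsch sketch, not the project's Aligner: it uses a single linear gap cost rather than separate gap open/extend costs, the cost values are placeholders, and representing a chord as a `(pitch_class, is_minor)` tuple is an assumption made for the example.

```python
def relative_of(chord):
    """Return the relative major of a minor chord, or the relative minor of a major chord."""
    pitch_class, is_minor = chord
    if is_minor:
        return ((pitch_class + 3) % 12, False)  # e.g. A minor -> C major
    return ((pitch_class - 3) % 12, True)       # e.g. C major -> A minor

def chord_score(a, b, match=1.0, relative=0.5, mismatch=-1.0):
    """Graduated score: full credit for equal chords, partial for relative major/minor."""
    if a == b:
        return match
    if relative_of(a) == b:
        return relative
    return mismatch

def align(xs, ys, score, gap=-1.0):
    """Globally align two sequences of any element type; returns (score, aligned pairs)."""
    n, m = len(xs), len(ys)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(xs[i - 1], ys[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the bottom-right cell; None marks a gap.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(xs[i - 1], ys[j - 1]):
            pairs.append((xs[i - 1], ys[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            pairs.append((xs[i - 1], None))
            i -= 1
        else:
            pairs.append((None, ys[j - 1]))
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs

C, G, Am, F = (0, False), (7, False), (9, True), (5, False)
print(align([C, G, Am, F], [C, G, C, F], chord_score))
```

Because `score` is just a callable, the same `align` function works on characters, phonemes, or chords, and a search over cost settings only needs to vary the arguments passed in.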
Completed
Use conditional probability distributions to model number of chords per line, and constraints relating to number of chords per line and repetition of chords in a line
We have a distribution of rhyme schemes conditioned on SegmentType
We need a distribution of subrhyme schemes conditioned on rhyme schemes (could also be conditioned on SegType)
We need a distribution of the number of chords per line conditioned on rhyme schemes (and possibly conditioned on SegType)
We need a distribution of repetitive subsequences of chords per line conditioned on rhyme scheme and SegType
We need a distribution of the number of words per line conditioned on subrhyme scheme
We need a distribution of repetitive subsequences of lyrics per line conditioned on rhyme scheme and SegType
We need a distribution of the variation in words per line between paired lines (as per rhyme scheme)
We need a distribution of chord transitions, including intersegmental chord transitions, starting chords, ending chords, etc.
Instead of keeping just counts, keep pointers to actual songs to be able to later reference source of decision
Test: Rhyme Structure sampler with actual constraints
Filter explicit songs
Complete tab alignment with assumption that chord indices (within line) are not correct and that blocks are not maintained
Find matching sheets to do MSA - we will start by only aligning those with matching names
Words not in the CMU dictionary?
Fix: Right now a one-line bridge between verses is screwing up distributions for lines per bridge, etc. Solution: treat such as interlude
Fix: When getting phonemes per line, need to let the phoneticizer (w/ rhyme stop words) decide what to get phonemes for (right now it just gets the last two words, even if they’re stop words)
Account for different pronunciations in the CMU dictionary
Discovered the refactoring takes the same time for both string alignment and MSA. I'm a little concerned the MSA is going to take a long time to complete.
Test refactored code to see if alignments are the same before and after refactoring and the change in speed from generalizing
Compare the two MSA alignment algorithms on simple inputs, comparing their scoring matrices and figuring out why they compute differently; specifically, consider a subsequence from a real case.
I discovered I'd never updated the fix for computing left costs for the old aligner AND simultaneously stumbled on a bug in the consensus caller that overestimated the count for a character that appeared in both upper and lower cases.
Verified that the refactored alignment code computes the same alignment for strings
Refactor sequence alignment code to easily allow alignment of non-char sequences (e.g., phonemes, chords)
Find out how unbanded and banded with minPercOverlap of 1.0 are different and fix it.
Can we grease up the banded?
n-dimensional MSA? or Pairwise?
Computer?
Fix issues with scrapers:
UG: raw_tab, url, title, difficulty, key, provider, contributor, type all look good
eChords: raw_tab, url, title, difficulty, key, provider, contributor, type all look good
SongLyrics: url, title, provider, artist all look good
Metrolyrics: url, title, provider, artist all look good
LyricsNet: url, title, provider, artist all look good
Map role values to actual roles
Parse untagged chords in tabs
Legitimate Chords:
“Intro:G” or “Bm…” or “C|” or “Gmaj7 –” or “~D7” (1, 2, 3, 4)
“C+” or “Bb5+”
“D9sus4”, “Bm7b5”, “F#m7b5”, “Am7add11” (1, 2)
Inferable Chords:
Repeat Information:
“x4” or “repeat” or “(2x)” (1, 2)
Salvageable?
Removable elements:
“(hold)” or anything in parens? (1)
Things to check:
Still seeing “=”?
Still seeing “de de da__ da da__ C” (1)?
In lyrics, replace runs of more than three consecutive identical letters with just two of them (solution)
Fix songlyrics.com scraper to not just get Billy Joel songs
Fix metrolyrics.com scraper to also get songs and artists with numbers in their name
Check tab scrapers and lyrics.net to see if they're getting all the artists
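The chord-parsing cases listed above (legitimate chords with decoration, repeat markers, removable elements, and stretched lyric letters) can be sketched with a few regular expressions. This is an assumption-laden illustration, not the Tab-Complete parser: the function names are hypothetical, the chord grammar covers only the example symbols on this page plus an optional slash bass note, and ordinary words such as "A" or "Am" would still be accepted as chords.

```python
import re

# Root, then any mix of quality words, (altered) numbers, and "+", then optional /bass.
_CHORD = re.compile(
    r"[A-G][#b]?(?:maj|min|m|dim|aug|sus|add|[#b]?\d+|\+)*(?:/[A-G][#b]?)?"
)
# Repeat markers such as "x4", "(2x)", or "repeat".
_REPEAT = re.compile(r"\(?(?:x\d+|\d+x|repeat)\)?", re.IGNORECASE)
# Four or more identical letters in a row ("more than three").
_LONG_RUN = re.compile(r"([A-Za-z])\1{3,}")

def extract_chord(token):
    """Strip labels and decoration from a token; return the chord or None."""
    core = token.split(":", 1)[-1]          # drop a label such as "Intro:"
    core = core.strip(" ~|.\u2026-\u2013")  # drop tildes, bars, dots, ellipses, dashes
    return core if _CHORD.fullmatch(core) else None

def is_repeat_marker(token):
    """True for tokens that carry repeat information rather than a chord."""
    return _REPEAT.fullmatch(token.strip()) is not None

def collapse_long_runs(text):
    """Shorten runs of more than three identical letters to two letters."""
    return _LONG_RUN.sub(r"\1\1", text)

for token in ["Intro:G", "~D7", "Bb5+", "(hold)", "(2x)"]:
    print(token, extract_chord(token), is_repeat_marker(token))
print(collapse_long_runs("yeaaaah"))
```

Tokens for which both `extract_chord` and `is_repeat_marker` fail, such as “(hold)”, are the candidates for removal or manual review.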
Answer questions: