A very nice paper by Meinard Müller on using local wavelet-style function fitting on spectral differences to improve beat tracking and tempo estimation. Predominant Local Pulse paper
The technique might have use in rhythmic grouping, which can be expressed as a form of local beat-division tracking.
Thursday, July 30th: Paris Smaragdis presents a talk entitled “Sound Mixtures: A musician’s delight, an engineer’s nightmare!”
2PM, Dartmouth College, Room: Wilson 219
Abstract:
Musicians have long embraced polyphony all the way back to the middle ages. Engineers, even today, are still struggling to come to terms with it. Traditional audio signal processing has primarily focused on monophonic signals, such as isolated speech, or solo instruments. But, with the increasing popularity of automated music analysis we are now facing a new level of complexity to signal processing, prompting for some radical rethinking of the fundamentals. In this talk I’ll present some of my work along these lines, and present some solutions to classic problems such as music transcription and source separation, but also explore some of the new things which are possible once we can deal with sound mixtures.
Paris Smaragdis is a senior research scientist at Adobe Systems and holds degrees from MIT, earned under the supervision of Barry Vercoe in the Machine Listening Group. His primary research interests revolve around making machines that can listen. He has published numerous papers on source separation and Music Information Retrieval (MIR), and is associated with renowned research labs such as MERL, Interval Research, and Starlab. Dr. Smaragdis is currently a visiting scientist at MIT’s McGovern Institute for Brain Research, and in 2006 was named one of the “top technology innovators” by MIT’s Technology Review.
Goldsmiths Ph.D. student Michaela Magas has been busy with user interaction design and publicity for AudioDB. First she made a web site showing the prototype interfaces she has developed: mHashup site.
Then there are a couple of nice news articles (including video) about audioDB’s mHashup front end:
BBC News article comparing Pandora, Melodis, Shazam and mHashup. And some BBC Click video.
Look out for iPhone apps and VST plugins of this technology in Fall 2009.
The new version of soundspotter for Max/MSP is finally working.
As an example, here is a 5 minute recording made from a short audio clip (10s) ‘remixed’ in real time using cyclic matching-without-replacement (8 frames) over a shingle size of 4 frames matched to a triangle wave at 272Hz. Parameters are varied in real time to alter the matching behaviour.
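The matching scheme described above can be sketched in a few lines. This is a minimal illustration, not the soundspotter implementation: the function names, the Euclidean distance, and the one-dimensional toy features are all assumptions. A shingle stacks several consecutive feature frames into one vector, and a fixed-length queue of recent picks keeps the same source segment from being chosen again until it has cycled out.

```python
from collections import deque
import math


def make_shingles(frames, size):
    """Stack `size` consecutive feature frames into one vector per position."""
    return [
        [x for frame in frames[i:i + size] for x in frame]
        for i in range(len(frames) - size + 1)
    ]


def match_without_replacement(targets, sources, queue_len):
    """For each target shingle, pick the nearest source shingle that is not
    among the last `queue_len` picks. Older picks become available again."""
    if queue_len >= len(sources):
        raise ValueError("queue_len must be smaller than the number of sources")
    recent = deque(maxlen=queue_len)  # hypothetical 'cyclic' exclusion queue
    picks = []
    for target in targets:
        candidates = [i for i in range(len(sources)) if i not in recent]
        best = min(candidates, key=lambda i: math.dist(target, sources[i]))
        recent.append(best)
        picks.append(best)
    return picks


# Toy data: six 1-D frames, shingle size 2, and a target that never changes.
source_shingles = make_shingles([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]], 2)
target_shingles = [[0.0, 1.0]] * 4
picks = match_without_replacement(target_shingles, source_shingles, queue_len=2)
```

With a static target, the exclusion queue forces the output to walk through neighbouring source segments before returning to the best match, which is one way a 10 s clip could sustain a much longer remix.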
Tutorial AM 1 (10:00-13:00): MIR at the Scale of the Web
by Malcolm Slaney (Yahoo! Research), and Michael Casey (Dartmouth College)
Abstract
In the last couple of years we have received access to music databases with millions of songs. This massive change in the amount of data available to researchers is changing the face of Music Information Retrieval. In many domains, most notably speech recognition, people have observed that the best way to improve their algorithm’s performance is to add more data. Starting with hidden Markov models (HMMs) and support-vector machines, people have applied ever greater amounts of data to their problems and been rewarded with new levels of performance.
What are the algorithms and ideas that are necessary to work with such large databases? How do we define the scope of a problem, and how do we apply modern clusters of processors to these problems? What does it take to collect, manage and deliver solutions with millions of songs and terabytes of data? In this tutorial we will talk about a range of algorithms and tools that make it easy/easier to scale our work to Internet-sized collections of music. The field is just developing so this tutorial will talk about a range of techniques that are in use today. Millions of songs fit into a small number of terabytes, which is just a few hundred dollars of disk space.
This tutorial will give attendees an overview and pointers to the tools that will allow them to scale their work to modern datasets. The tutorial will discuss the theoretical and practical problems with large data, applications where large amounts of data are important to consider, types of algorithms that are practical with such large datasets, and examples of implementation techniques that make these algorithms practical. The tutorial will be illustrated with many real-world examples and results.
Panel Discussion: The Musical Brain
Wednesday, January 28 5:30 pm
First Floor Commons, Fahey/McLane, Tuck Drive
Free and open to the public
Jazz pianist/composer Vijay Iyer and Dartmouth faculty members Thalia Wheatley (Psychological & Brain Sciences) and Michael Casey (Music) explore, in a non-technical discussion, music cognition and the way the brain makes sense of rhythm and meter. For more info, call Hop Outreach at 603.646.2010.
Presented in conjunction with a jazz double bill at the Hop featuring Vijay Iyer Trio and percussionist Dafnis Prieto’s Sextet at 7:00 pm on Thursday, January 29 in Spaulding Auditorium. For tickets and info, call 603.646.2422.
Abstract: Recently, a number of piano recordings by different artists were found in a classical music catalog that exhibited a striking resemblance to each other. Could this resemblance be purely coincidental? We set about building a system that could answer this question and others in large recorded collections of music. The AudioDB system listens to polyphonic music recordings and encodes important perceptual information about them at fine time scales. The information extracted corresponds to traditional music-theoretic concepts such as harmony, timbre, pitch, texture and rhythm, yielding a high-dimensional representation consisting of 300-1200 dimensions.
Our music databases have 10^4 to 10^7 recordings, each with thousands of vectors, so brute-force methods for similarity computation are not practical. Instead, we use locality-sensitive hashing (LSH), which searches in high dimensions with sub-linear time complexity. We propose a method for automatically estimating the radius threshold for efficient and accurate LSH retrieval. Our method employs statistical sampling of the background distance distribution and solving for the minimum distance distribution using order statistics.
Using these methods, we are able to quantify the “purely coincidental” resemblance in the piano recordings mentioned above, demonstrating that their similarity is not, in fact, coincidental. The newer recordings are altered copies of the older ones. Detecting fraudulent recordings with human hearing is difficult; even the music critics were fooled into highly commending these newer recordings. We conclude that scalable audio search systems, such as AudioDB, are required to address the emerging multimedia needs of the Internet, commercial music services and large multimedia archives.
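The radius estimate mentioned in the abstract can be illustrated with a small sketch. It is not the AudioDB estimator: it assumes Euclidean distance, an empirical (non-parametric) distribution, and independent comparisons, and the function names are hypothetical. If F is the distribution of distances between unrelated vectors, order statistics give the probability that the smallest of n such distances falls below r as 1 - (1 - F(r))^n, and the sketch solves this for r at a chosen chance-match probability p.

```python
import math
import random


def sample_background_distances(vectors, n_pairs, rng):
    """Sorted distances between randomly chosen pairs of distinct vectors."""
    dists = []
    for _ in range(n_pairs):
        a, b = rng.sample(vectors, 2)
        dists.append(math.dist(a, b))
    return sorted(dists)


def radius_from_min_distribution(sorted_dists, n_comparisons, p):
    """Radius r such that the minimum of `n_comparisons` independent
    background distances is below r with probability about p.

    Order statistics: P(min <= r) = 1 - (1 - F(r)) ** n,
    hence F(r) = 1 - (1 - p) ** (1 / n), read off the empirical sample.
    """
    target_cdf = 1.0 - (1.0 - p) ** (1.0 / n_comparisons)
    k = min(len(sorted_dists) - 1, int(target_cdf * len(sorted_dists)))
    return sorted_dists[k]


# Toy usage with random 8-dimensional vectors (assumed data, fixed seed).
rng = random.Random(0)
vectors = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
background = sample_background_distances(vectors, n_pairs=2000, rng=rng)
radius = radius_from_min_distribution(background, n_comparisons=100, p=0.05)
```

A match closer than `radius` would then be unlikely to arise by chance among 100 comparisons. The empirical quantile is only usable while the sample is large relative to the number of comparisons; at database scale a fitted model of the sampled distribution would be needed instead.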
Talk by Prof. Michael Casey
To be presented Friday December 5th, 2008
New York University, NYC
Abstract
Soundspotting is a new approach to creating musical streams by
selecting and concatenating source segments from a large audio
database using methods from music information retrieval. Sometimes
called plundermatics, audio mosaics or concatenative synthesis,
soundspotting computes a similarity score between a target audio
segment and all the available segments in the source database and
selects close-matching sources to concatenate to the audio-output
stream forming a real-time response to the target. Examples of target
signals are: acoustic instruments, signals generated by a synthesis
algorithm, or a previous output of the soundspotting process thus
yielding an audio information feedback circuit. Soundspotting enhances
the techniques of sampling, plunderphonics, re-mixing and mashups by
adding automatic audio organization and an external driving target
signal.
Soundspotting opens up new possibilities for audio-visual composition,
admitting control from live audio sources and organization via
deterministic ordering and selection criteria. In this talk, I will
discuss the techniques, technologies and musical possibilities for
soundspotting and present the work of several artists who have
appropriated the technique of soundspotting into their own work.
AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing.
Invited Talk Acoustical Society of America, 156th Meeting, Miami, FL
Presented on Thursday, November 13th, 2008. AudioDB ASA156 PowerPoint slides
This talk describes new approximate nearest-neighbor methods employed in a scalable audio-feature database system called “AudioDB.” This open-source system is designed to scale to storing and searching hundreds of millions of feature vectors on standard UNIX workstation platforms. A radius-bounded nearest-neighbor vector-sequence search algorithm, based on locality-sensitive hashing (LSH), achieves sublinear retrieval times at this scale. The performance of the LSH-based algorithm depends critically on the choice of radius bound supplied: the wrong value impacts retrieval accuracy or retrieval time. An optimal radius estimator is derived by modeling the minimum value distribution of a random sample of a data set’s pairwise distance distribution. When used with LSH this yields accurate search results with retrieval times several orders of magnitude faster than exhaustive search methods and space-partitioning methods. The same statistical sampling method is used to perform retrieval tasks at successively higher levels of specificity on labeled or unlabeled audio collections. The result is a system that a) unifies audio retrieval tasks across a range of specificities, using the statistical framework of background distance-distribution sampling and hypothesis testing, b) is as accurate as exhaustive search methods, and c) is three orders of magnitude faster than exhaustive search methods.
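To make the idea of a radius-bounded LSH query concrete, here is a toy sketch. It is an assumption-laden illustration rather than AudioDB's indexing code: it uses a single hash table built from random projections quantized into buckets, and the class and parameter names are hypothetical. Only vectors that share the query's bucket are compared, and those farther than the radius are discarded.

```python
import math
import random
from collections import defaultdict


class RadiusLSH:
    """Toy random-projection LSH index with a radius-bounded query."""

    def __init__(self, dim, n_hashes, bucket_width, seed=0):
        rng = random.Random(seed)
        self.width = bucket_width
        # Each hash is a random Gaussian direction plus a random offset.
        self.projections = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)],
             rng.uniform(0.0, bucket_width))
            for _ in range(n_hashes)
        ]
        self.buckets = defaultdict(list)
        self.vectors = []

    def _key(self, v):
        """Quantize each projection of v into a bucket number."""
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, v)) + offset) / self.width)
            for proj, offset in self.projections
        )

    def add(self, v):
        self.vectors.append(v)
        self.buckets[self._key(v)].append(len(self.vectors) - 1)

    def query(self, q, radius):
        """Indices of vectors in q's bucket that lie within `radius` of q."""
        candidates = self.buckets.get(self._key(q), [])
        return [i for i in candidates if math.dist(q, self.vectors[i]) <= radius]


index = RadiusLSH(dim=3, n_hashes=4, bucket_width=4.0)
for v in ([0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [50.0, 50.0, 50.0]):
    index.add(v)
hits = index.query([0.0, 0.0, 0.0], radius=1.0)
```

The bucket width plays the role that the radius bound plays in the text: too narrow and true neighbours land in different buckets (lower accuracy), too wide and each bucket holds many candidates (slower retrieval). Practical LSH schemes use several tables to reduce the chance of missing a near neighbour.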
Social Playlists and Bottleneck Measurements: Exploiting Musician Social Graphs Using Content-Based Dissimilarity and Pairwise Maximum Flow Values
Ben Fields, Kurt Jacobson, Christophe Rhodes and Michael Casey. International Conference on Music Information Retrieval (ISMIR), Philadelphia, Sep., 2008. Paper 209 [PDF]
We have sampled the artist social network of Myspace and applied to it the pairwise relational connectivity measure minimum cut/maximum flow. These values are then compared to a pairwise acoustic Earth Mover’s Distance measure and the relationship is discussed. Further, a means of constructing playlists using the maximum flow value to exploit both the social and acoustic distances is realized.
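For readers unfamiliar with the maximum flow measure, the following sketch computes it between two nodes of a small graph using the Edmonds-Karp algorithm. The toy graph, the unit capacities on friendship links, and the function name are assumptions for illustration. The paper's own graph construction and edge weights are not reproduced here.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    `capacity` maps each node u to a dict {v: capacity of the edge u -> v}.
    """
    if source == sink:
        raise ValueError("source and sink must differ")
    # Residual graph, with zero-capacity reverse edges where none exist.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual.get(u, {}).items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left
        # Find the bottleneck capacity along the path, then push flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical artist graph: each undirected link is two unit-capacity edges.
artists = {
    "a": {"b": 1, "c": 1},
    "b": {"a": 1, "c": 1, "d": 1},
    "c": {"a": 1, "b": 1, "d": 1},
    "d": {"b": 1, "c": 1},
}
flow_a_d = max_flow(artists, "a", "d")
```

By the max-flow/min-cut theorem, the returned value equals the capacity of the smallest set of links whose removal disconnects the two artists, which is why it serves as a pairwise connectivity (bottleneck) measure.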