Archive for the ‘Datasets’ Category

Music Recommendation Datasets from, by Oscar Celma

February 12, 2012

:: Music Recommendation Datasets ::.

One more dataset for MIR and Music Recommendation, compiled by Oscar Celma, and based around data and APIs.

And some more detailed info here.



MySQL has been pushed to its limits in large-scale web scenarios, where the load of millions of users concurrently accessing SQL relational databases starts to expose the large overheads and resulting limitations of this type of database. A recent move toward NoSQL databases followed, not without their own idiosyncrasies and issues, and more recently a new… er… NewSQL paradigm has been proposed as a scalable, efficient and reliable solution to the problem. Have a look at why Facebook is trapped in MySQL ‘fate worse than death’.

Open source music identification using Audio Fingerprints


The guys at Echonest just released Echoprint, their open source music identification service. Looks really neat. There’s even an iOS app example here.

Last.FM also recently provided an audio fingerprinting API. More about this here.

So now it’s really simple to integrate audio fingerprinting in open source apps. Looking forward to trying it out soon.


Music Ontology Specification

February 14, 2011

The Music Ontology Specification provides main concepts and properties for describing music (i.e. artists, albums, tracks, but also performances, arrangements, etc.) on the Semantic Web. This document contains a detailed description of the Music Ontology.

via Music Ontology Specification.
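
To give a rough idea of what such a description looks like, here is a minimal Python sketch that emits a few Turtle statements for an artist and a track. It assumes the `mo:MusicArtist` and `mo:Track` classes from the Music Ontology, plus `foaf:name`, `foaf:maker` and `dc:title` from FOAF and Dublin Core; the `describe_track` helper and the `example.org` identifiers are made up for illustration and are not part of the specification.

```python
# Namespaces used in the sketch (the mo: namespace is the Music Ontology's).
MO = "http://purl.org/ontology/mo/"
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"


def describe_track(artist_uri, artist_name, track_uri, track_title):
    """Return a small Turtle document linking a track to its artist.

    Hypothetical helper: plain string formatting, no escaping of quotes,
    so it is only suitable for simple illustrative values.
    """
    lines = [
        f"@prefix mo: <{MO}> .",
        f"@prefix foaf: <{FOAF}> .",
        f"@prefix dc: <{DC}> .",
        "",
        f"<{artist_uri}> a mo:MusicArtist ;",
        f'    foaf:name "{artist_name}" .',
        "",
        f"<{track_uri}> a mo:Track ;",
        f'    dc:title "{track_title}" ;',
        f"    foaf:maker <{artist_uri}> .",
    ]
    return "\n".join(lines)


# Illustrative identifiers only; example.org is a placeholder domain.
turtle = describe_track(
    "http://example.org/artist/1", "Some Artist",
    "http://example.org/track/1", "Some Track",
)
print(turtle)
```

A real application would more likely use an RDF library to build and serialize the graph; the string version above is only meant to show the shape of the data.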

Categories: Datasets

DBpedia: extract structured information from Wikipedia

February 14, 2011

“Wikipedia currently only supports keyword-based search and does not allow more expressive queries like “Give me all cities in New Jersey with more than 10,000 inhabitants” or “Give me all Italian musicians from the 18th century.” This lowers the overall utility of Wikipedia.

One major application domain for the DBpedia data set is to enable sophisticated queries against Wikipedia, which could revolutionize the access to this valuable knowledge source.”
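
As a rough illustration of the first example query in the quote, the Python sketch below builds a SPARQL query and a request URL without sending anything over the network. Several things here are assumptions rather than anything stated in the quote: the public endpoint at `https://dbpedia.org/sparql`, the `format` request parameter, and the DBpedia ontology terms `dbo:City`, `dbo:isPartOf` and `dbo:populationTotal`. The helper names `build_city_query` and `build_request_url` are made up for this example.

```python
from urllib.parse import urlencode

# Assumed public DBpedia SPARQL endpoint.
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def build_city_query(region_resource, min_population):
    """Build a SPARQL query for cities in a region above a population threshold.

    The dbo: terms below are assumptions about how DBpedia models cities.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?city ?population WHERE {{
  ?city a dbo:City ;
        dbo:isPartOf dbr:{region_resource} ;
        dbo:populationTotal ?population .
  FILTER (?population > {int(min_population)})
}}
ORDER BY DESC(?population)
""".strip()


def build_request_url(query):
    """Encode the query as a GET request URL for the assumed endpoint."""
    params = urlencode({
        "query": query,
        # Assumed parameter asking the endpoint for JSON results.
        "format": "application/sparql-results+json",
    })
    return f"{SPARQL_ENDPOINT}?{params}"


query = build_city_query("New_Jersey", 10000)
url = build_request_url(query)
print(query)
print(url)
```

The resulting URL could then be fetched with any HTTP client. Whether the query returns every city one would expect depends on how consistently the underlying Wikipedia infoboxes were filled in.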

Categories: Datasets

Million Song Dataset | scaling MIR research

February 10, 2011

An impressive feature data set extracted from music audio files by LabRosa using the Echonest API:

Million Song Dataset | scaling MIR research.

However, the feature set is (obviously) fixed and you have no access to the audio content of each music piece in the dataset (and there are some understandable reasons for that: check the FAQ). Nevertheless, a lot can already be done using this data (mainly by the machine learning, data mining and information retrieval folks), and this effort is a great contribution to the development of more advanced music recommendation systems.

Personally, I’m still very much into audio signal processing (mainly related to sound segregation, where I’m still trying to explore the basics of machine listening), so for now this dataset is not that useful to me…

Congratulations to LabRosa and Echonest for the effort and for making this public and available to the R&D community!

Categories: Datasets, MIR, Research

Data Visualization for Human Perception by Stephen Few

February 1, 2011
Categories: Datasets, Research