One more dataset for MIR and Music Recommendation, compiled by Oscar Celma, and based around Last.fm data and APIs.
And some more detailed info here.
MySQL has been pushed to its limits in large-scale web scenarios, where the load of millions of users concurrently accessing SQL relational databases starts to expose the huge overheads and consequent limitations of this type of database. A recent move toward NoSQL databases has emerged, not without its own idiosyncrasies and issues, and more recently a new… er… NewSQL paradigm has been proposed as a scalable, efficient, and reliable solution to the problem. Have a look at why Facebook is trapped in a MySQL ‘fate worse than death’.
Last.fm also recently provided an audio fingerprinting API. More about this here.
So now it’s really simple to integrate audio fingerprinting into open-source apps. Looking forward to trying it out soon.
The Music Ontology Specification provides the main concepts and properties for describing music (i.e. artists, albums, tracks, but also performances, arrangements, etc.) on the Semantic Web. This document contains a detailed description of the Music Ontology.
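To get a feel for how those concepts fit together, here is a minimal sketch of a Music Ontology description as plain RDF-style (subject, predicate, object) triples, using only Python tuples so it runs without an RDF library. The mo: and foaf: term URIs come from the published Music Ontology and FOAF vocabularies; the artist and track names and the example.org URIs are made-up placeholders.

```python
# Vocabulary namespaces (real URIs from the Music Ontology and FOAF specs).
MO = "http://purl.org/ontology/mo/"
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"

def describe_track(artist_uri, track_uri, artist_name, track_title):
    """Return RDF-style triples linking a mo:MusicArtist to a mo:Track."""
    return [
        (artist_uri, "rdf:type", MO + "MusicArtist"),
        (artist_uri, FOAF + "name", artist_name),
        (track_uri, "rdf:type", MO + "Track"),
        (track_uri, DC + "title", track_title),
        # In the Music Ontology, foaf:maker connects a work to its creator.
        (track_uri, FOAF + "maker", artist_uri),
    ]

# Placeholder URIs and names for illustration only.
triples = describe_track(
    "http://example.org/artist/1", "http://example.org/track/1",
    "Some Artist", "Some Track")

for s, p, o in triples:
    print(s, p, o)
```

In a real application these triples would be built and serialized with a library such as rdflib, but the shape of the data is the same: typed resources (artists, tracks, performances) linked by vocabulary properties.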
“Wikipedia currently only supports keyword-based search and does not allow more expressive queries like “Give me all cities in New Jersey with more than 10,000 inhabitants” or “Give me all Italian musicians from the 18th century.” This lowers the overall utility of Wikipedia.
One major application domain for the DBpedia data set is to enable sophisticated queries against Wikipedia, which could revolutionize the access to this valuable knowledge source.”
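The quoted “cities in New Jersey with more than 10,000 inhabitants” example maps naturally onto a SPARQL query against DBpedia. The sketch below only builds the query string (no network access), and the dbo:City class and dbo:isPartOf / dbo:populationTotal properties are my assumptions about the DBpedia ontology terms involved, not something taken from this post.

```python
def cities_query(region, min_population):
    """Build a SPARQL query for cities in a region above a population
    threshold, against the (assumed) DBpedia ontology terms."""
    return f"""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT ?city ?population WHERE {{
        ?city a dbo:City ;
              dbo:isPartOf dbr:{region} ;
              dbo:populationTotal ?population .
        FILTER (?population > {min_population})
    }}
    """

query = cities_query("New_Jersey", 10000)
print(query)
```

Such a query could then be sent to DBpedia’s public SPARQL endpoint, which is exactly the kind of expressive access to Wikipedia knowledge that keyword search cannot provide.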
However, the feature set is (obviously) fixed, and you have no access to the audio content of each music piece in the dataset (and there are some understandable reasons for that – check the FAQ). Nevertheless, a lot can already be done with this data (mainly by the machine learning, data mining, and Information Retrieval folks), and this effort is a great contribution to the development of more advanced music recommendation systems.
Personally, I’m still very much into audio signal processing (mainly related to sound segregation, where I’m still trying to explore the basics of machine listening), so for now this dataset is not that useful to me…
Congratulations to LabROSA and The Echo Nest for the effort and for making this public and available to the R&D community!