Music Recommendation Datasets from Last.fm, by Oscar Celma
:: Music Recommendation Datasets ::.
One more dataset for MIR and Music Recommendation, compiled by Oscar Celma, and based around Last.fm data and APIs.
And some more detailed info here.
MySQL has been pushed to its limits in large-scale web scenarios, where the load of millions of users concurrently accessing SQL relational databases starts to show the huge overheads and subsequent limitations of this type of database. A recent move towards NoSQL databases has appeared, not without its own idiosyncrasies and issues, and more recently a new… er… NewSQL paradigm has been proposed as a scalable, efficient and reliable solution to the problem. Have a look at why Facebook is trapped in MySQL ‘fate worse than death’.
The guys at Echonest just released Echoprint, an open source music identification service. Looks really neat. There’s even an iOS app example here.
Last.FM also recently provided an audio fingerprinting API. More about this here.
So now it’s really simple to integrate audio fingerprinting in open source apps. Looking forward to trying it out soon.
The Music Ontology Specification provides the main concepts and properties for describing music (i.e. artists, albums, tracks, but also performances, arrangements, etc.) on the Semantic Web. This document contains a detailed description of the Music Ontology.
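To get a feel for what such a description looks like, here is a minimal sketch that writes an artist and a track as N-Triples using plain Python strings. The class names `mo:MusicArtist` and `mo:Track` come from the Music Ontology; using `foaf:name`, `dc:title` and `foaf:maker` for the links is my assumption about a typical usage, and the `example.org` URIs and the helper names `triple` and `describe_track` are made up for illustration.

```python
# Namespaces: MO is the Music Ontology; FOAF and DC are commonly used with it.
MO = "http://purl.org/ontology/mo/"
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(subject, predicate, obj, literal=False):
    """Format one N-Triples line; `obj` is a URI unless `literal` is True."""
    if literal:
        # Escape backslashes and double quotes inside string literals.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
    else:
        rendered = f"<{obj}>"
    return f"<{subject}> <{predicate}> {rendered} ."


def describe_track(artist_uri, artist_name, track_uri, track_title):
    """Describe an artist and one of their tracks (hypothetical helper)."""
    return [
        triple(artist_uri, RDF_TYPE, MO + "MusicArtist"),
        triple(artist_uri, FOAF + "name", artist_name, literal=True),
        triple(track_uri, RDF_TYPE, MO + "Track"),
        triple(track_uri, DC + "title", track_title, literal=True),
        # Assumption: foaf:maker links the track back to its artist.
        triple(track_uri, FOAF + "maker", artist_uri),
    ]


lines = describe_track(
    "http://example.org/artist/1", "Example Artist",
    "http://example.org/track/1", "Example Track",
)
print("\n".join(lines))
```

For anything beyond a toy example, a proper RDF library would be a better choice than string formatting.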
“Wikipedia currently only supports keyword-based search and does not allow more expressive queries like “Give me all cities in New Jersey with more than 10,000 inhabitants” or “Give me all Italian musicians from the 18th century.” This lowers the overall utility of Wikipedia.
One major application domain for the DBpedia data set is to enable sophisticated queries against Wikipedia, which could revolutionize the access to this valuable knowledge source.”
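As a rough idea of how the "Italian musicians from the 18th century" question might be phrased, here is a small sketch that builds a SPARQL query and the URL to send it to the public DBpedia endpoint. The class and property names (`dbo:MusicalArtist`, `dbo:birthPlace`, `dbo:country`, `dbo:birthDate`) are my assumptions about how the data is modelled, so the results may well be incomplete; the function names are hypothetical too.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint (assumed to be reachable).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_musician_query(country, first_year, last_year, limit=50):
    """Build a SPARQL query for musicians born in `country` within a year range."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?musician ?birth WHERE {{
  ?musician a dbo:MusicalArtist ;
            dbo:birthPlace ?place ;
            dbo:birthDate ?birth .
  ?place dbo:country dbr:{country} .
  FILTER (?birth >= "{first_year}-01-01"^^xsd:date &&
          ?birth <= "{last_year}-12-31"^^xsd:date)
}}
LIMIT {limit}
""".strip()


def build_request_url(query):
    """Encode the query as a GET request URL asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


# The 18th century runs from 1701 to 1800.
query = build_musician_query("Italy", 1701, 1800)
url = build_request_url(query)
print(query)
```

The resulting `url` can be fetched with `urllib.request.urlopen` or pasted into a browser; I left the network call out so the sketch runs offline.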
An impressive feature data set extracted from music audio files by LabRosa using the Echonest API:
Million Song Dataset | scaling MIR research.
However, the feature set is (obviously) fixed and you have no access to the audio content of each music piece in the dataset (there are some understandable reasons for that; check the FAQ). Nevertheless, a lot can already be done using this data (mainly by the machine learning, data mining and Information Retrieval folks), and this effort is a great contribution to the development of more advanced music recommendation systems.
Personally, I’m still very much into audio signal processing (mainly related to sound segregation, where I’m still trying to explore the basics of machine listening), so for now this dataset is not that useful to me…
Congratulations to LabRosa and Echonest for the effort and for making this public and available to the R&D community!
From a chemistry blog, but useful for other areas too:
This is really cool news! The newest Android NDK (revision 5) now makes it possible to program full Android apps natively in C/C++!