Archive for the category: Data Mining

FLICKSTERS

– as in FLICKr cluSTERS.

During a Data Mining course at ITU I participated in a group project that aimed at identifying places of interest by clustering photographs from Flickr by their location and discovering frequent patterns in the places individual photographers visit.

Initially the idea was to cluster the photographers themselves by their common tags of interest. Later in the process we moved on to the concept of automatically finding the spots people flock to by looking at where they commonly use their cameras, as suggested by my wife, Edda Lára, when I was discussing the project with her. This could be useful for tourist offices recommending places to visit or avoid, or for city planners identifying load points where special emphasis should be placed on installing public facilities.

For clustering we used the k-Means method and frequency analysis was done with the Apriori algorithm.
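To give a rough idea of the two techniques, here is a minimal Python sketch of k-Means over photo coordinates and of Apriori over the sets of places each photographer visited. It is not the code from the project: the function names, the sample coordinates and the place labels are all made up for illustration, and plain Euclidean distance on latitude/longitude is a simplification that is only reasonable over small areas.

```python
import math
import random
from itertools import combinations


def kmeans(points, k, iterations=50, seed=0):
    """Cluster (lat, lon) tuples into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen photos
    labels = None
    for _ in range(iterations):
        # Assignment step: each photo goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # nothing moved, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: combine frequent itemsets into candidates one item larger.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every smaller subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Hypothetical photo locations: three near one city, three near another.
photos = [
    (55.676, 12.568), (55.677, 12.570), (55.675, 12.566),
    (64.146, -21.942), (64.147, -21.940), (64.145, -21.944),
]
centroids, labels = kmeans(photos, k=2)

# Hypothetical sets of places visited, one set per photographer.
visits = [
    {"harbour", "park", "castle"},
    {"harbour", "park"},
    {"harbour", "castle"},
    {"park"},
]
patterns = apriori(visits, min_support=2)
```

In this toy run the two groups of photos end up in separate clusters, and the pairs harbour+park and harbour+castle come out as frequent, while park+castle does not because only one photographer visited both.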

In this project I had the task of implementing the photo clustering and writing about it in the report, while teammates Julian Stengaard and Kasper Appeldorff implemented the data gathering, frequency analysis and visualization, along with writing the rest of the report, which can be read here:

http://itu.dk/people/bjrr/datamining/Group_8_-_Data_Mining_-_Report_-_Flicksters.pdf

The k-Means clustering implementation is here: https://github.com/bthj/Flicksters

Julian’s cluster visualisation can be seen at: http://itu.dk/~bjrr/datamining/ClusterVisualization/

(Embedded here: a quick presentation for an oral exam based on this project.)

Before proceeding to the group project, we had to complete an individual assignment. According to the project description, it involved using "the data set … created during the first lecture, through … filling in a questionnaire", and we were instructed to "formulate one or several questions to be answered based on this data, write the code and carry out the analysis necessary to answer them. … apply at least two different preprocessing methods, one frequent pattern mining method, one clustering method and one supervised learning method. (For example, normalization+missing value replacement+Apriori+k-means+ID3, though many other combinations are possible.)"
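As an illustration of the two preprocessing steps named in that example, here is a small Python sketch of missing value replacement followed by min-max normalization. The function names and the sample answers are hypothetical, and replacing missing values with the column mean is just one common choice, not necessarily what my report used.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values identical, nothing to spread out
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical questionnaire column with one unanswered entry.
ages = [20, None, 30, 40]
filled = fill_missing_with_mean(ages)      # [20, 30.0, 30, 40]
normalized = min_max_normalize(filled)     # [0.0, 0.5, 0.5, 1.0]
```

Normalizing after filling the gaps keeps the replaced values inside the same 0 to 1 range as the rest of the column, which matters for distance-based methods such as k-means.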

Here’s my report for the individual assignment: http://itu.dk/people/bjrr/datamining/Report__DataMining_IndividualAssignment__bjrr.pdf

And the code implemented for it:

https://github.com/bthj/DataMining__MDMI-Spring-2014