References

Machine Learning/Data Mining/etc.
- Mining of Massive Datasets by Anand Rajaraman, Jeffrey D. Ullman
- Trevor Hastie, Robert Tibshirani, and J. H. Friedman. The elements of statistical learning: data mining, inference, and prediction. 2nd Ed. New York: Springer-Verlag, 2009.
Running data analysis algorithms on Map/Reduce
- Map-Reduce for Machine Learning on Multicore. In NIPS (2006), pp. 281-288, by Cheng T. Chu, Sang K. Kim, Yi A. Lin, et al. Edited by Bernhard Schölkopf, John C. Platt, and Thomas Hofmann.
- Weizhong Zhao, Huifang Ma, and Qing He. 2009. Parallel K-Means Clustering Based on MapReduce. In Proceedings of the 1st International Conference on Cloud Computing (CloudCom '09), Martin Gilje Jaatun, Gansen Zhao, and Chunming Rong (Eds.). Springer-Verlag, Berlin, Heidelberg, 674-679.
Stream processing
- Graham Cormode and Marios Hadjieleftheriou. 2009. Finding the frequent items in streams of data. Commun. ACM 52, 10 (October 2009), 97-105.
- Brian Hayes. 2008. The Britney Spears Problem. American Scientist 96, 4 (July-August 2008), 274.
- Gurmeet Singh Manku and Rajeev Motwani. 2002. Approximate frequency counts over data streams. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB '02). VLDB Endowment, 346-357.
- Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. 2005. Efficient computation of frequent and top-k elements in data streams. In Proceedings of the 10th international conference on Database Theory (ICDT'05), Thomas Eiter and Leonid Libkin (Eds.). Springer-Verlag, Berlin, Heidelberg, 398-412.
