Distributed Machine Learning Recommender
Leveraging Spark and XGBoost to build a distributed recommender system for large-scale restaurant recommendation on Yelp.
Designing a distributed, batch-based clustering pipeline for large-scale datasets.
Building a Spark-based algorithm for large-scale community detection on social graphs.
Designing a distributed algorithm to identify items frequently bought together in purchase-history data.
Crime analytics using Frequent-Pattern Growth (FP-Growth) to mine association rules between locations, crimes and resolutions.
Building a scalable algorithm for estimating the number of unique active users in a data stream of fluctuating size.
Using Spark and MapReduce to analyse a dataset too large for single-machine tools and extract insights from it.
Tracking never-before-seen data points in a continuous data stream.
Designing a probability-adjusting data sampler for scalable unbiased sampling of data streams.
Exploring the Netflix dataset of 100M movie ratings, then building a recommender system based on vector similarity using sparse matrices.
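As a rough illustration of vector-similarity recommendation over sparse ratings, here is a minimal sketch. It assumes item-item cosine similarity, which is one common choice and not necessarily the one used in the project. Each movie is stored as a sparse vector (a dict of user to rating, so unrated entries cost nothing), standing in for a row of a sparse matrix. The ratings, IDs and function names (`build_item_vectors`, `cosine`, `most_similar`) are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy ratings: (user_id, movie_id, rating). Not real Netflix data.
RATINGS = [
    ("u1", "m1", 5), ("u1", "m2", 4),
    ("u2", "m1", 4), ("u2", "m2", 5), ("u2", "m3", 1),
    ("u3", "m2", 2), ("u3", "m3", 5),
]


def build_item_vectors(ratings):
    """Store each movie as a sparse vector {user_id: rating}; missing entries are implicit zeros."""
    vectors = defaultdict(dict)
    for user, movie, rating in ratings:
        vectors[movie][user] = rating
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity of two sparse vectors; only keys present in both contribute to the dot product."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(movie, vectors, k=2):
    """Rank the other movies by cosine similarity to `movie` and return the top k."""
    target = vectors[movie]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != movie]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    vectors = build_item_vectors(RATINGS)
    print(most_similar("m1", vectors))
```

With this toy data, "m2" ranks as the movie most similar to "m1", because both were rated highly by the same users. At 100M ratings, the same idea would more plausibly run on a library sparse-matrix type or a distributed engine such as Spark than on plain Python dicts.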