A playground where some machine learning problems from Kaggle are addressed. My Kaggle Kernels summary page gives the full list of public kernels I wrote.

Personalized Medicine: Redefining Cancer Treatment

To run the R scripts, first download the data sets from Kaggle’s dedicated page, and copy the four files training_text, training_variants, test_text and test_variants into the data folder (located in the working directory where the scripts are).

Naive Bayes vs. RF, GBM, GLM & DL

NCIt & Binary Encoding

New York City Taxi Trip Duration

To run the R scripts, first download the data sets from Kaggle’s dedicated page, and copy the two files train.csv and test.csv into the data folder (located in the working directory where the scripts are).

Comparing Methods Using the xkcd Theme

Deep Features

Using XGBoost

Web Traffic Time Series Forecasting

To run the R scripts, first download the data sets from Kaggle’s dedicated page, and copy the two files train_1.csv and key_1.csv into the data folder (located in the working directory where the scripts are).

HoltWinters & ARIMA

Mercedes-Benz Greener Manufacturing

To run the R scripts, first download the data sets from Kaggle’s dedicated page, and copy the two files train.csv and test.csv into the data folder (located in the working directory where the scripts are).

Principal Component Analysis for Dimensionality Reduction & Boosting

Forward Stepwise Selection of Features & Boosting

A Complete Solution in 10 Lines of Code

Titanic: Machine Learning from Disaster

Two different approaches to the Titanic problem are presented. The first one follows a typical supervised learning process to craft a binary classifier using a random forest algorithm: feature tinkering, feature selection and grid search. The second one falls in the semi-supervised learning realm: after preparing adequate features, an autoencoder—an unsupervised learning technique that aims at reconstructing its input signal—is trained on the entire data set (including the unlabeled test set); the autoencoder then serves as pre-trained layers for a neural network trained—using data from the training set only—to classify its input. Although the first approach seems to perform better, the second one yields a surprisingly good outcome, somewhat comparable to the first.

To run the R scripts, first download the data sets on Kaggle’s dedicated page, and copy the two files train.csv and test.csv in the folder data (located in the working directory where the scripts are).
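The semi-supervised workflow described above can be sketched as follows, assuming the h2o package is used for the deep learning part (an assumption; the actual scripts may differ) and using an illustrative subset of the Titanic columns rather than the engineered features of the kernel:

```r
# Sketch of the autoencoder pre-training approach with h2o (assumed library).
library(h2o)
h2o.init(nthreads = -1)

train <- h2o.importFile("data/train.csv")
test  <- h2o.importFile("data/test.csv")

# Illustrative numeric features; the real scripts craft richer ones.
features <- c("Pclass", "Age", "SibSp", "Parch", "Fare")

# 1. Train an autoencoder on the entire data set; labels are not used,
#    so the unlabeled test set can safely be included.
all_data <- h2o.rbind(train[, features], test[, features])
ae <- h2o.deeplearning(x = features,
                       training_frame = all_data,
                       autoencoder = TRUE,
                       activation = "Tanh",
                       hidden = c(8, 4, 8),
                       epochs = 50)

# 2. Reuse the autoencoder's weights as pre-trained layers of a
#    supervised classifier, trained on the labeled training set only.
#    The hidden topology and activation must match the autoencoder's.
train$Survived <- h2o.asfactor(train$Survived)
clf <- h2o.deeplearning(x = features, y = "Survived",
                        training_frame = train,
                        pretrained_autoencoder = ae@model_id,
                        activation = "Tanh",
                        hidden = c(8, 4, 8),
                        epochs = 20)
```

The key point is that step 1 never sees the labels, which is what makes it legitimate to include the test rows during pre-training.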

Random Forest

Autoencoder

License and Source Code

© 2017 Loic Merckel, Apache v2 licensed. The source code is available on GitHub.