Ben Chuanlong Du's Blog

It is never too late to learn.

Handling Categorical Variables in Machine Learning

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Categorical variables are very common in a machine learning project. On a high level, there are two ways to handle a categorical variable.

  1. Drop a categorical variable if a categorical variable …

Tips on LightGBM

  1. It is strongly suggested that you load data into a pandas DataFrame and handle categorical variables by specifying a dtype of "category" for those categorical variables.

    df.cat_var = df.cat_var.astype …
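As a minimal sketch of the tip above, the snippet below converts a column to the pandas "category" dtype. The DataFrame contents and the column name `num_var` are made up for illustration; only the `cat_var` name and the "category" dtype come from the note. To my understanding, LightGBM's default `categorical_feature="auto"` setting picks up such columns from a pandas DataFrame, but check the LightGBM documentation for your version.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "cat_var": ["a", "b", "a", "c"],
        "num_var": [1.0, 2.0, 3.0, 4.0],
    }
)

# Mark the column as categorical so that downstream libraries can
# recognize it from its dtype instead of treating it as free text.
df["cat_var"] = df["cat_var"].astype("category")

# pandas stores each category as an integer code under the hood.
codes = df["cat_var"].cat.codes
print(df.dtypes)
print(list(df["cat_var"].cat.categories), list(codes))
```

Keeping the categorical information in the dtype means the encoding travels with the DataFrame, so the same columns are treated consistently at training and prediction time as long as the categories match.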

Training Deep Neural Networks

Rules of Thumb for Training

https://arxiv.org/pdf/1206.5533.pdf

https://towardsdatascience.com/17-rules-of-thumb-for-building-a-neural-network-93356f9930af

https://hackernoon.com/rules-of-thumb-for-deep-learning-5a3b6d4b0138

https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw

Batch size affects both …

Learning to Rank

https://www.kaggle.com/c/home-credit-default-risk/discussion/61613

https://studylib.net/doc/18339870/yetirank--everybody-lies

http://proceedings.mlr.press/v14/gulin11a/gulin11a.pdf

Model Architecture Ranking Category SOTA Comments Paper
RankNet NN …

Effect of Duplicating Observations in Linear Models

Coefficients don't change, but their estimated variance becomes smaller. Use a formula to show it ...
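A quick numerical check of this claim, as a sketch rather than a proof: the code below fits ordinary least squares on simulated data and on the same data stacked `k` times. The data-generating model, the helper name `ols_fit`, and the choice `k = 2` are assumptions made for illustration. Since stacking gives X_k'X_k = k X'X and X_k'y_k = k X'y, the coefficients are identical, while the estimated standard errors shrink by a factor of sqrt((n - p) / (k n - p)).

```python
import numpy as np


def ols_fit(X, y):
    """Return OLS coefficients and their estimated standard errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Unbiased estimate of the noise variance.
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


# Simulated data (hypothetical model: y = 1 + 2 * x + noise).
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Duplicate every observation k times.
k = 2
X_dup = np.tile(X, (k, 1))
y_dup = np.tile(y, k)

beta, se = ols_fit(X, y)
beta_dup, se_dup = ols_fit(X_dup, y_dup)

print("coefficients equal:", np.allclose(beta, beta_dup))
print("standard error ratio:", se_dup / se)
print("expected ratio:", np.sqrt((n - 2) / (k * n - 2)))
```

The smaller standard errors are an artifact of the duplication: no new information was added, so confidence intervals computed from the duplicated data are too narrow.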

Complete Duplication of All Data Points

Complete Duplication of Some Data Points

Duplication with Noise

common in computer vision …