Ben Chuanlong Du's Blog

It is never too late to learn.

Handle Categorical Variables in LightGBM

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

LightGBM supports pandas columns of the category dtype. In fact, this is the recommended way to handle categorical columns in LightGBM.

data[feature] = pd.Series(data[feature], dtype="category")
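To see what the conversion above does, here is a minimal sketch on a made-up DataFrame (the column names and values are only for illustration). It converts a string column to the category dtype and inspects the categories and the integer codes that pandas assigns to them.

```python
import pandas as pd

# Hypothetical toy data; "color" is the categorical feature.
data = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": [1.0, 2.5, 3.0, 0.5],
})

# Convert each categorical feature to the pandas category dtype.
for feature in ["color"]:
    data[feature] = pd.Series(data[feature], dtype="category")

# pandas sorts the categories it infers and encodes each value as an integer code.
categories = list(data["color"].cat.categories)
codes = data["color"].cat.codes.tolist()
print(data.dtypes)
print(categories, codes)
```

Note that the codes depend on the set of categories present in the column, which is why the mapping has to be kept consistent between training and prediction.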

A LightGBM model (which is a Booster object) records the categories of each categorical feature seen during training. At prediction time, this information is used to set the categories of each categorical column, so that the mapping from category to integer code stays consistent with training even if the new data contains only a subset of the categories.

However, be careful when interpreting the importance of categorical features. The article "Beware of categorical features in LGBM!" argues that the Python library shap is better than the built-in methods for reporting feature importance, especially when you deal with categorical features. Please refer to "Interpreting a LightGBM model" for more details on how to use shap.

