Ben Chuanlong Du's Blog


Use LightGBM With Spark

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

https://github.com/Azure/mmlspark/blob/master/docs/lightgbm.md

MMLSpark seems to be the best option for training LightGBM models on a Spark cluster. Note that MMLSpark requires Scala 2.11, Spark 2.4+, and Python 3.5+. You can also use MMLSpark to run an existing LightGBM model on Spark: the method loadNativeModelFromFile (of a model class) loads a LightGBM model from a native LightGBM text file, so there is no need to convert the trained model to PMML or ONNX format.
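A minimal PySpark sketch of both uses described above, training on Spark and scoring with a model stored as a native LightGBM text file. It assumes MMLSpark 1.0.0-rc1 (the version of the API docs linked under References), a DataFrame with a vector column named "features" and a column named "label", and a model path that every executor can reach. The helper names train_and_save and load_and_score are made up for illustration, and the exact signatures of saveNativeModel and loadNativeModelFromFile have changed between MMLSpark releases, so check them against the docs for your version.

```python
def train_and_save(train_df, model_path):
    """Fit a LightGBM classifier on a Spark DataFrame and save it in native format.

    Assumes train_df has a vector column "features" and a column "label".
    """
    # Imported lazily so this file can be loaded on a machine without MMLSpark.
    from mmlspark.lightgbm import LightGBMClassifier

    classifier = LightGBMClassifier(
        featuresCol="features",
        labelCol="label",
        numIterations=100,  # illustrative value, not a recommendation
    )
    model = classifier.fit(train_df)
    # Writes the booster as a native LightGBM text file.
    # (Assumption: the 1.0.0-rc1 Python signature; older releases differ.)
    model.saveNativeModel(model_path)
    return model


def load_and_score(df, model_path):
    """Load a native LightGBM text model and score a Spark DataFrame with it."""
    from mmlspark.lightgbm import LightGBMClassificationModel

    model = LightGBMClassificationModel.loadNativeModelFromFile(model_path)
    # transform() appends the prediction columns to df.
    return model.transform(df)
```

For a regression model, the corresponding classes would be LightGBMRegressor and LightGBMRegressionModel.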

https://www.reddit.com/r/datascience/comments/9w2qn8/deploying_a_lightgbm_model_with_spark/

https://towardsdatascience.com/build-xgboost-lightgbm-models-on-large-datasets-what-are-the-possible-solutions-bf882da2c27d

Installation

https://mmlspark.blob.core.windows.net/website/index.html#install
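As a rough sketch of what the installation amounts to in PySpark, the MMLSpark package can be pulled in through the Spark session configuration. The Maven coordinate and repository URL below are assumptions based on the 1.0.0-rc1 release for Scala 2.11; verify both against the install page above before using them. The helper names mmlspark_session_conf and build_session are hypothetical.

```python
# Assumed values for MMLSpark 1.0.0-rc1 (Scala 2.11); check the install page.
MMLSPARK_COORDINATE = "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1"
MMLSPARK_REPOSITORY = "https://mmlspark.azureedge.net/maven"


def mmlspark_session_conf():
    """Return the Spark settings that make Spark download the MMLSpark package."""
    return {
        "spark.jars.packages": MMLSPARK_COORDINATE,
        "spark.jars.repositories": MMLSPARK_REPOSITORY,
    }


def build_session(app_name="lightgbm-on-spark"):
    """Create (or reuse) a SparkSession with MMLSpark on the classpath."""
    # Imported lazily so the configuration helper works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in mmlspark_session_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

The same two settings can also be passed on the command line, for example with the --packages and --repositories options of spark-submit.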

Tutorials and Examples

LightGBM - Quantile Regression for Drug Discovery.ipynb

https://github.com/Azure/mmlspark/tree/master/notebooks/samples

References

https://towardsdatascience.com/build-xgboost-lightgbm-models-on-large-datasets-what-are-the-possible-solutions-bf882da2c27d

https://mmlspark.blob.core.windows.net/docs/1.0.0-rc1/scala/index.html#com.microsoft.ml.spark.lightgbm.LightGBMClassifier

https://mmlspark.blob.core.windows.net/docs/1.0.0-rc1/scala/index.html#com.microsoft.ml.spark.lightgbm.LightGBMRegressor

https://github.com/Azure/mmlspark/blob/master/notebooks/samples/LightGBM%20-%20Quantile%20Regression%20for%20Drug%20Discovery.ipynb
