Distributed Training of Models on Spark

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

XGBoost

http://www.legendu.net/misc/blog/use-xgboost-with-spark/
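For quick reference, below is a minimal sketch of distributed XGBoost training through the Spark ML API. It assumes XGBoost >= 1.7 (which ships the xgboost.spark module) and a hypothetical Spark DataFrame train_df with numeric feature columns and a label column.

```python
from pyspark.ml.feature import VectorAssembler
from xgboost.spark import SparkXGBClassifier

# train_df is a hypothetical Spark DataFrame with numeric columns f1, f2, f3 and a label column.
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train = assembler.transform(train_df)

# num_workers controls how many Spark tasks (and thus XGBoost workers) train in parallel.
clf = SparkXGBClassifier(features_col="features", label_col="label", num_workers=4)
model = clf.fit(train)
pred = model.transform(train)
```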

LightGBM

http://www.legendu.net/misc/blog/use-lightgbm-with-spark/
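A minimal sketch of the same workflow with LightGBM, assuming the SynapseML (formerly MMLSpark) package is installed on the cluster and the same hypothetical train_df as above.

```python
from pyspark.ml.feature import VectorAssembler
from synapse.ml.lightgbm import LightGBMClassifier

# train_df is a hypothetical Spark DataFrame with numeric columns f1, f2, f3 and a label column.
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train = assembler.transform(train_df)

# LightGBMClassifier is a regular Spark ML estimator; training is distributed across executors.
lgbm = LightGBMClassifier(featuresCol="features", labelCol="label", numIterations=100, numLeaves=31)
model = lgbm.fit(train)
pred = model.transform(train)
```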

BigDL

MMLSpark

Ray

You can run Ray on top of Spark via analytics-zoo (RayOnSpark), which enables you to run any Python machine learning library in a distributed fashion. But I'm not sure whether this is a good idea. A rough sketch is below.
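The sketch shows roughly how RayOnSpark is started from analytics-zoo; the module and function names used here (zoo.init_spark_on_local, zoo.ray.RayContext) are assumptions based on the analytics-zoo documentation and may differ between releases.

```python
import ray
from zoo import init_spark_on_local
from zoo.ray import RayContext

# Start a local Spark context and launch a Ray cluster on top of it (RayOnSpark).
sc = init_spark_on_local(cores=4)
ray_ctx = RayContext(sc=sc, object_store_memory="2g")
ray_ctx.init()

# Any Ray workload (here a trivial remote task) now runs on the Spark cluster's resources.
@ray.remote
def square(x):
    return x * x

print(ray.get([square.remote(i) for i in range(4)]))

ray_ctx.stop()
sc.stop()
```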

yahoo/TensorFlowOnSpark

PyTorch

H2O

https://github.com/h2oai/sparkling-water
http://docs.h2o.ai/sparkling-water/2.2/latest-stable/doc/pysparkling.html
http://h2o-release.s3.amazonaws.com/h2o/master/4273/docs-website/h2o-docs/faq/sparkling-water.html
https://docs.databricks.com/_static/notebooks/h2o-sparkling-water-python.html
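A minimal PySparkling sketch, assuming the pysparkling package matching the cluster's Spark version is installed; note that H2OContext method names and signatures (e.g., getOrCreate, asH2OFrame) have changed across Sparkling Water releases, so treat this as an approximation rather than the exact API.

```python
from pyspark.sql import SparkSession
from pysparkling import H2OContext

spark = SparkSession.builder.appName("sparkling-water-demo").getOrCreate()

# Launch an H2O cluster inside the running Spark application.
hc = H2OContext.getOrCreate()

# Convert a Spark DataFrame into a distributed H2OFrame;
# H2O algorithms (GBM, GLM, etc.) can then be trained on it.
df = spark.createDataFrame([(1.0, 0), (2.0, 1), (3.0, 0)], ["x", "label"])
hf = hc.asH2OFrame(df)
print(hf.summary())
```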

SystemML

elephas

Distributed training with Keras and Spark.
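A minimal elephas sketch, assuming a SparkContext sc and NumPy arrays x_train/y_train already exist (all three are placeholders here); elephas wraps an ordinary Keras model and trains it data-parallel on an RDD.

```python
from tensorflow import keras
from elephas.spark_model import SparkModel
from elephas.utils.rdd_utils import to_simple_rdd

# Build an ordinary Keras model on the driver.
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Turn the training arrays into an RDD of (features, label) pairs.
rdd = to_simple_rdd(sc, x_train, y_train)  # sc, x_train, y_train are assumed to exist

# Wrap the model and train it across the Spark workers.
spark_model = SparkModel(model, frequency="epoch", mode="asynchronous")
spark_model.fit(rdd, epochs=5, batch_size=32, verbose=1, validation_split=0.1)
```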

References

https://towardsdatascience.com/deep-learning-with-apache-spark-part-1-6d397c16abd
