Data Profiling Tools

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

ydata-profiling
ydata-profiling (successor to pandas-profiling) is a tool for profiling pandas and Spark DataFrames. One possible way to work with large data is to do simple profiling on the full DataFrame, then draw a relatively small sample and profile that sample in detail with ydata-profiling.
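A minimal sketch of that two-step approach for a pandas DataFrame. The helper names (cheap_summary, sample_rows, profile_sample), the 10,000-row cap and the output file name are my own assumptions for illustration; only ProfileReport and its to_file method come from ydata-profiling.

```python
import pandas as pd


def cheap_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Inexpensive per-column statistics computed on the full DataFrame."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "non_null": df.notna().sum(),
            "null_fraction": df.isna().mean(),
            "n_unique": df.nunique(),
        }
    )


def sample_rows(df: pd.DataFrame, max_rows: int = 10_000, seed: int = 0) -> pd.DataFrame:
    """Return at most max_rows randomly chosen rows (hypothetical cap)."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)


def profile_sample(df: pd.DataFrame, output_file: str = "profile.html") -> None:
    """Write a detailed report for the sample only (needs ydata-profiling installed)."""
    # Imported lazily so the cheap steps above work without the optional dependency.
    from ydata_profiling import ProfileReport

    ProfileReport(df, title="Sample profile", minimal=True).to_file(output_file)
```

Typical use would be cheap_summary(df) on the full data, followed by profile_sample(sample_rows(df)). A uniform random sample can miss rare values, so the cheap full-data summary is still worth keeping next to the detailed report.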
great_expectations
great_expectations helps data teams eliminate pipeline debt through data testing, documentation, and profiling.
deequ
Optimus
Optimus is the one that is closest to what I want to achieve so far. It looks promising.
Apache Griffin
Apache Griffin supports data profiling but seems to be heavy and limited.

Other Adhoc Examples