
Get Size of Tables on HDFS

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

The HDFS Way

You can use `hdfs dfs -du /path/to/table` or `hdfs dfs -count -q -v -h /path/to/table` to get the size of an HDFS path (and thus of the table stored under it). However, this requires direct access to HDFS; if a Spark cluster exposes only JDBC/ODBC APIs, this method does not work.
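For example, to print a human-readable summary (the path is illustrative):

:::bash
# Total size of the directory backing the table,
# summarized (-s) and human-readable (-h).
hdfs dfs -du -s -h /path/to/table

# Quota and usage counts: directory/file counts plus space consumed.
hdfs dfs -count -q -v -h /path/to/table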

The SQL Query Way

Size of One Table

`show tblproperties` reports the size of the table among its statistics and can be used to grab just that value if needed.

:::sql
-- Full metadata, including size-related statistics.
describe formatted table_name;

-- All table properties.
show tblproperties table_name;

-- Or grab a single property directly.
show tblproperties table_name("rawDataSize");

Note that the reported sizes are in bytes. Also, this only works for non-partitioned tables on which statistics have been computed.
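Among the table properties, totalSize is the on-disk (compressed) size in bytes, while rawDataSize is the estimated uncompressed size. A minimal sketch, assuming a table named some_db.some_table:

:::sql
-- On-disk size of the table's files, in bytes.
show tblproperties some_db.some_table("totalSize");

-- Estimated size of the raw (uncompressed) data, in bytes.
show tblproperties some_db.some_table("rawDataSize");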

Size of Multiple Tables

The `show table extended` command prints file-system information (including totalFileSize) for every table matching a pattern:

:::sql
-- The like clause with a wildcard pattern is required.
show table extended in some_db like '*';
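The pattern accepts wildcards, so the listing can be restricted to a subset of tables (the pattern below is illustrative):

:::sql
show table extended in some_db like 'fact_*';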

The statistics are only as fresh as the last time they were computed. If they are missing or stale, compute them first with `ANALYZE TABLE`:

:::sql
-- General syntax; the noscan option gathers only file-level
-- statistics (number of files and total size in bytes)
-- without scanning the data.
ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan];

-- Example: gather file-level statistics for all partitions.
ANALYZE TABLE ops_bc_log PARTITION(day) COMPUTE STATISTICS noscan;
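To compute full statistics (including row counts) for a single partition, drop the noscan option (table and partition names below are hypothetical):

:::sql
ANALYZE TABLE some_db.some_table PARTITION(day='2021-01-01') COMPUTE STATISTICS;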
