Ben Chuanlong Du's Blog

It is never too late to learn.

Spark Issue: InvalidInputException for Some Hive Data Partitions

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!



15/12/29 17:22:27 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist.
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist.
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)


For some reason, the HDFS path registered in the Hive metastore for some partitions does not physically exist on HDFS. (The infra team should take care of this.)
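To see which partitions are affected, one could compare each partition's registered location against the filesystem. This is a minimal sketch with hypothetical helpers (`find_missing_partitions`, `hdfs_path_exists`), not part of Spark or Hive. It assumes you have already collected a mapping from partition spec to location (e.g., from the Location field of Hive's `DESCRIBE FORMATTED <table> PARTITION (...)` output) and that the `hdfs` CLI is on your PATH.

```python
import os
import subprocess
from typing import Callable, Dict, List


def hdfs_path_exists(path: str) -> bool:
    """Return True if `path` exists on HDFS.

    Shells out to `hdfs dfs -test -e`, which exits with status 0 when the
    path exists. Assumes the `hdfs` CLI is installed and on PATH.
    """
    result = subprocess.run(
        ["hdfs", "dfs", "-test", "-e", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_missing_partitions(
    locations: Dict[str, str],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return (sorted) partition specs whose registered location is missing.

    `locations` maps a partition spec such as "dt=2015-12-29" to the
    location recorded for it in the Hive metastore. `exists` defaults to a
    local-filesystem check; pass `hdfs_path_exists` for HDFS paths.
    """
    return sorted(spec for spec, loc in locations.items() if not exists(loc))
```

For example, `find_missing_partitions(locations, exists=hdfs_path_exists)` lists the partitions whose directories are gone. Taking the `exists` predicate as a parameter keeps the check testable without a cluster.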

Solution: add the following configuration when submitting Spark jobs so that Spark checks partition paths before reading and skips partitions whose paths do not exist.

--conf spark.sql.hive.verifyPartitionPath=true
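If jobs are submitted from a script, the flag can be assembled programmatically. A minimal sketch, assuming a hypothetical helper `spark_submit_args` and a hypothetical application file `my_job.py`; it only builds the argument list and does not run anything.

```python
import shlex
from typing import Dict, List, Sequence


def spark_submit_args(
    app: str,
    conf: Dict[str, str],
    app_args: Sequence[str] = (),
) -> List[str]:
    """Build a spark-submit command line with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in conf.items():
        # Each setting becomes a separate `--conf key=value` pair.
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    args.extend(app_args)
    return args


cmd = spark_submit_args(
    "my_job.py",  # hypothetical application file
    {"spark.sql.hive.verifyPartitionPath": "true"},
)
print(shlex.join(cmd))
# spark-submit --conf spark.sql.hive.verifyPartitionPath=true my_job.py
```

The resulting list can be passed to `subprocess.run`. In PySpark, the same key can instead be set when building the session, e.g. `SparkSession.builder.config("spark.sql.hive.verifyPartitionPath", "true")`.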
