Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
This post collects some typical pure Python errors in PySpark applications.
Symptom 1¶
object has no attribute
Solution 1¶
Fix the attribute name.
Symptom 2¶
No such file or directory
Solution 2¶
Correct the path to the file/directory, or upload the file using the --files option of the spark-submit command.
Symptom 3¶
error: the following arguments are required
Solution 3¶
Add the required arguments to the command invoking your Python script.
Symptom 4¶
error: unrecognized arguments
Solution 4¶
Correct the argument name or remove the nonexistent arguments from the command invoking your Python script.
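The argparse messages in Symptoms 3 to 5 can be reproduced locally before submitting a job. The sketch below assumes the script parses its command line with Python's argparse module; the program name job.py and the options --input and --partitions are hypothetical and only serve as an illustration.

```python
import argparse
import contextlib
import io


def build_parser():
    # Hypothetical arguments, for illustration only.
    parser = argparse.ArgumentParser(prog="job.py")
    parser.add_argument("--input", required=True, help="input path")
    parser.add_argument("--partitions", type=int, default=8)
    return parser


def try_parse(argv):
    """Return (namespace, error_text). argparse exits with status 2 on bad input."""
    stderr = io.StringIO()
    try:
        with contextlib.redirect_stderr(stderr):
            return build_parser().parse_args(argv), ""
    except SystemExit:
        return None, stderr.getvalue()


# Missing required argument (Symptom 3).
_, err_missing = try_parse([])
# Misspelled argument name (Symptom 4).
_, err_unknown = try_parse(["--input", "a.txt", "--inptu", "b.txt"])
# Value of the wrong type (Symptom 5).
_, err_value = try_parse(["--input", "a.txt", "--partitions", "many"])
# A valid command line.
args, _ = try_parse(["--input", "a.txt", "--partitions", "4"])
```

Running the same argument list through the parser on your own machine is usually faster than waiting for the Spark application to fail.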
Symptom 5¶
error: argument
Solution 5¶
Fix the value passed to the argument named in the error message (for example, a missing value, a value of the wrong type, or an invalid choice).
Symptom 6¶
ModuleNotFoundError: No module named
Solution 6¶
Fix the typo in the module name or install the missing module.
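A quick way to tell a typo from a missing installation is to ask the import system whether it can find the module. This is a minimal sketch; the helper name missing_modules and the module names in the list are made up for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "a_module_that_does_not_exist" stands in for a misspelled or uninstalled module.
required = ["json", "a_module_that_does_not_exist"]
missing = missing_modules(required)
print(missing)
```

The check only covers the interpreter it runs in. Python workers on the cluster may use a different environment, so a module that is present locally can still be missing there.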
Symptom 7¶
SyntaxError: invalid syntax
Solution 7¶
Fix the syntax error in your Python script.
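Syntax errors can be caught before the job is submitted by parsing the script without running it. The sketch below uses the standard ast module; the helper name syntax_problem and the sample sources are hypothetical. Since IndentationError is a subclass of SyntaxError, the same check covers both.

```python
import ast


def syntax_problem(source):
    """Return a short description of the first syntax error, or None if the source parses."""
    try:
        ast.parse(source)
    except SyntaxError as err:
        return f"{type(err).__name__}: {err.msg} (line {err.lineno})"
    return None


good = syntax_problem("x = 1\ny = 2\n")
bad_syntax = syntax_problem("x = = 1\n")
bad_indent = syntax_problem("x = 1\n  y = 2\n")
print(bad_syntax)
print(bad_indent)
```

For a script on disk, reading the file and passing its text to the helper works the same way.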
Symptom 8¶
NameError: name .* is not defined
Solution 8¶
Fix the typo in the variable/function name, or import/define the name before using it.
Symptom 9¶
RuntimeError: Result vector of pandas_udf was not the required length: expected 1, got 101456
Cause 9¶
The length of the result returned by the pandas UDF does not match the length of its input series. Note that if your pandas UDF parses the stdout of a command, extra output printed to stdout can break the parsing and produce the wrong number of results.
Solution 9¶
Fix the pandas UDF so that it returns exactly one result per input row.
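When the UDF parses the stdout of a command, one defensive option is to tag result lines and check their count before returning. The sketch below is plain Python and only illustrates the idea; the "RESULT" prefix convention, the function name parse_results and the sample output are assumptions, not something the original command is known to emit.

```python
def parse_results(stdout, expected, prefix="RESULT\t"):
    """Keep only lines tagged with prefix and check there is one per input row."""
    values = [
        line[len(prefix):]
        for line in stdout.splitlines()
        if line.startswith(prefix)
    ]
    if len(values) != expected:
        raise ValueError(f"expected {expected} results, got {len(values)}")
    return values


# Simulated stdout with an unrelated log line mixed in.
stdout = "loading model...\nRESULT\t0.1\nRESULT\t0.7\n"
scores = parse_results(stdout, expected=2)
print(scores)
```

Inside a pandas UDF, expected would be the length of the input series, and the returned list would be wrapped in a pandas Series. Failing with an explicit message here is easier to debug than the length mismatch reported by Spark.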
Symptom 10¶
Error: b"error: Found argument ‘--id1-path’ which wasn’t expected, or isn’t valid in this context ...
Cause 10¶
The argument --id1-path is not a valid argument to the command called by Python.
Solution 10¶
Fix the invalid argument of the command called by Python.
Symptom 11¶
subprocess.CalledProcessError: Command ‘./pineapple test --id1-path id1.txt’ returned non-zero exit status 1.
Cause 11¶
The command invoked by Python failed.
Solution 11¶
Figure out why the command invoked by Python failed and fix the issue.
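The exception message itself only carries the exit status, so capturing stderr is often the quickest way to see why the command failed. This is a minimal sketch using subprocess.run; the helper name describe_failure is hypothetical, and a child Python process stands in for the real command.

```python
import subprocess
import sys


def describe_failure(cmd):
    """Run cmd; return None on success, else a report with exit status and stderr."""
    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        return f"exit status {err.returncode}\nstderr: {err.stderr.strip()}"
    return None


# Stand-in for a failing command: writes to stderr and exits with status 1.
failing_cmd = [
    sys.executable,
    "-c",
    "import sys; print('bad flag', file=sys.stderr); sys.exit(1)",
]
report = describe_failure(failing_cmd)
print(report)
```

In a Spark job, the report would end up in the driver or executor log, depending on where the command was launched.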
Symptom 12¶
TypeError: object of type ‘generator’ has no len()
Cause 12¶
Calling the function len on a generator.
Solution 12¶
Assuming it is an iterator (a generator is a special case of an iterator), use sum(1 for _ in it) instead of len(it). Of course, you have to make sure that the iterator is finite.
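A small sketch of the counting idiom follows. The helper name count_items is made up. Note that counting consumes the iterator, so nothing is left to iterate over afterwards.

```python
def count_items(it):
    """Count the items of a finite iterator by consuming it."""
    return sum(1 for _ in it)


squares = (x * x for x in range(5))
n = count_items(squares)
leftover = list(squares)  # the generator is exhausted after counting
print(n, leftover)
```

If the items are needed after counting, materializing them first with list(it) and calling len on the list is the simpler choice, at the cost of holding everything in memory.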
Symptom 13¶
pyarrow.lib.ArrowInvalid: Could not convert ... with type function: tried to convert to int
Cause 13¶
The Python object (e.g., a function object) cannot be converted to int in PyArrow.
Solution 13¶
Fix the issue in the Python code. For example, did you reference a function without calling it (e.g., f instead of f())?
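One way to locate such a mistake is to scan the values for function objects before they are handed over for conversion. This is only an illustrative sketch in plain Python; find_callables and total are hypothetical names.

```python
def find_callables(values):
    """Return the indices of items that are callable objects rather than data."""
    return [i for i, v in enumerate(values) if callable(v)]


def total():
    # Hypothetical function that should have been called.
    return 42


bad = [1, 2, total]     # function object stored by mistake
good = [1, 2, total()]  # result of calling the function
print(find_callables(bad), find_callables(good))
```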
Symptom 14¶
pyarrow.lib.ArrowInvalid: Value 2147483651 too large to fit in C integer type
Cause 14¶
A Python integer that does not fit in 32 bits is cast to a 32-bit int in PyArrow.
Solution 14¶
Use a 64-bit integer type (long) instead of int as the return type of the pandas UDF.
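The value in the error message, 2147483651, is just above the largest 32-bit signed integer, 2147483647. The sketch below checks a batch of values against that range and picks a Spark type name accordingly; the helper name spark_integer_type is hypothetical, and "int" and "long" are used here as the Spark SQL names for 32-bit and 64-bit integers.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def spark_integer_type(values):
    """Return "int" if every value fits in 32 bits, else "long"."""
    fits = all(INT32_MIN <= v <= INT32_MAX for v in values)
    return "int" if fits else "long"


return_type = spark_integer_type([1, 2147483651])
print(return_type)
```

When the range of the data is not known in advance, declaring the return type as long from the start avoids the problem.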
Symptom 15¶
IndentationError: unexpected indent
Cause 15¶
A line in the Python code is indented more than its context allows (IndentationError is a kind of syntax error).
Solution 15¶
Fix the indentation of the offending line in the Python code.