Ben Chuanlong Du's Blog

It is never too late to learn.

Check Whether a File Is a Text File in Python

python-magic

python-magic, a Python wrapper around the libmagic library, is the recommended way to check whether a file is a text file, because it inspects the file's content rather than its name.

In [2]:
!pip3 install python-magic
Collecting python-magic
  Downloading python_magic-0.4.18-py2.py3-none-any.whl (8.6 kB)
Installing collected packages: python-magic
Successfully installed python-magic-0.4.18
In [3]:
import magic
In [4]:
f = magic.Magic(mime=True, uncompress=True)
In [2]:
!wget https://user-images.githubusercontent.com/824507/128439087-0c935d86-bb34-4c2c-8e69-6d78b3022833.png -O 4s.jpg
--2021-08-05 17:54:32--  https://user-images.githubusercontent.com/824507/128439087-0c935d86-bb34-4c2c-8e69-6d78b3022833.png
Resolving user-images.githubusercontent.com (user-images.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...
Connecting to user-images.githubusercontent.com (user-images.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4588 (4.5K) [image/png]
Saving to: ‘4s.jpg’

4s.jpg              100%[===================>]   4.48K  --.-KB/s    in 0.001s  

2021-08-05 17:54:32 (2.96 MB/s) - ‘4s.jpg’ saved [4588/4588]

In [5]:
f.from_file("4s.jpg")
Out[5]:
'image/jpeg'
In [1]:
!wget https://storage.googleapis.com/erwinh-public-data/bankingdata/bank-full.csv
--2021-08-05 17:53:53--  https://storage.googleapis.com/erwinh-public-data/bankingdata/bank-full.csv
Resolving storage.googleapis.com (storage.googleapis.com)... 2607:f8b0:400a:801::2010, 2607:f8b0:400a:803::2010, 2607:f8b0:400a:805::2010, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|2607:f8b0:400a:801::2010|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2021-08-05 17:53:54 ERROR 404: Not Found.

In [6]:
f.from_file("bank-full.csv")
Out[6]:
'text/plain'
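
The MIME type returned above can be turned into a yes/no answer with a small wrapper. The following is only a minimal sketch: `is_text_mime` and `is_text_file` are hypothetical helper names (not part of python-magic), and it assumes that a MIME type starting with `text/` is what you want to call a text file. Some textual formats may be reported under `application/...`, so adjust the rule if you need them.

```python
def is_text_mime(mime):
    """Return True if a MIME type string belongs to the text/* family."""
    # Assumption: only text/* counts as "text" here.
    return mime.startswith("text/")


def is_text_file(path):
    """Detect the MIME type of a file with python-magic and classify it."""
    # Imported lazily so that is_text_mime stays usable without libmagic.
    import magic

    detector = magic.Magic(mime=True, uncompress=True)
    return is_text_mime(detector.from_file(path))
```

With the files used above, `is_text_file("bank-full.csv")` would classify the reported `'text/plain'` as text, while an `'image/jpeg'` result would be classified as non-text.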

binaryornot

binaryornot is not actively maintained.

In [ ]:
from binaryornot.check import is_binary
In [ ]:
is_binary("README.rst")
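
If you would rather avoid an extra dependency, a rough content-based check can be written with the standard library only. This is a sketch of a simple heuristic, not binaryornot's actual algorithm: the hypothetical function `looks_binary` reads the first block of the file and treats it as binary if the block contains a NUL byte or is not valid UTF-8. The block size of 1024 bytes is an arbitrary assumption.

```python
import codecs


def looks_binary(path, blocksize=1024):
    """Heuristically decide whether a file is binary from its first bytes."""
    with open(path, "rb") as fh:
        chunk = fh.read(blocksize)
    if not chunk:
        # Assumption: an empty file is treated as text.
        return False
    if b"\x00" in chunk:
        # NUL bytes practically never occur in UTF-8 or ASCII text files.
        return True
    # Use an incremental decoder so that a multi-byte character cut off
    # at the end of the block is not reported as an error.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return True
    return False
```

Text files stored in encodings such as UTF-16 contain NUL bytes and would be reported as binary by this sketch, which is one reason to prefer python-magic when it is available.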

mimetypes

mimetypes does NOT work well! It guesses the type from the file name (its extension) only and never inspects the file's content.
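
A short sketch makes the limitation concrete. `guess_is_text` is a hypothetical helper built on the standard-library function `mimetypes.guess_type`; it never opens the file, so the file names below do not even need to exist.

```python
import mimetypes


def guess_is_text(path):
    """Guess from the file NAME only; return None if the extension is unknown."""
    mime, _encoding = mimetypes.guess_type(path)
    if mime is None:
        # No extension, or an extension that mimetypes does not know.
        return None
    return mime.startswith("text/")


# The answer depends purely on the extension:
print(guess_is_text("notes.txt"))
print(guess_is_text("4s.jpg"))
print(guess_is_text("README"))
```

A PNG image saved under the name `4s.jpg` is still guessed as JPEG, and a text file without an extension such as `README` cannot be classified at all.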
