Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Hands on pathlib.Path

In [3]:
from pathlib import Path, PureWindowsPath
import itertools
In [4]:
path = Path(".").resolve()
path
Out[4]:
PosixPath('/workdir/archives/blog/en/content/2020/10/python-pathlib.Path')

No Trailing Slashes

A path object always removes trailing slashes. It also collapses repeated slashes, so Path is not a safe tool for manipulating URLs: as the output below shows, the double slash after https: becomes a single slash.

In [6]:
Path("https://github.com/dclong/dsutil//")
Out[6]:
PosixPath('https:/github.com/dclong/dsutil')
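If you do need to tidy up slashes in a URL, one option (a sketch, not the only way) is to split the URL with urllib.parse.urlsplit and apply PurePosixPath only to its path component, leaving the scheme and host untouched. The helper name strip_url_path is hypothetical; _replace is the standard namedtuple method on the SplitResult returned by urlsplit.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def strip_url_path(url: str) -> str:
    """Normalize slashes in the path part of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    if not parts.path:
        # Nothing to normalize, e.g. "https://github.com".
        return url
    # PurePosixPath drops trailing slashes and collapses repeated slashes
    # (except a leading "//", which POSIX treats specially).
    path = str(PurePosixPath(parts.path))
    return parts._replace(path=path).geturl()


print(strip_url_path("https://github.com/dclong/dsutil//"))  # https://github.com/dclong/dsutil
```

Scheme, host, query and fragment are passed through unchanged; only the path component goes through PurePosixPath.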

Path.absolute

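Generally speaking, Path.resolve is preferred over Path.absolute: resolve makes the path absolute and also eliminates .. components and resolves symbolic links, whereas absolute only prepends the current working directory without normalizing the path.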

In [16]:
path.absolute()
Out[16]:
PosixPath('/app/archives/blog/misc/content')
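A quick check with a made-up relative path a/../b (it does not need to exist). The sketch assumes a POSIX system.

```python
from pathlib import Path

p = Path("a") / ".." / "b"  # hypothetical relative path; it does not need to exist

# absolute() prepends the current working directory but keeps ".." as-is.
print(p.absolute())
# resolve() makes the path absolute and also normalizes ".." (and follows symlinks).
print(p.resolve())
```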

Path.anchor

In [17]:
path.anchor
Out[17]:
'/'

Path.as_posix

In [18]:
path.as_posix()
Out[18]:
'/app/archives/blog/misc/content'

Path.as_uri

In [19]:
path.as_uri()
Out[19]:
'file:///app/archives/blog/misc/content'
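as_posix mostly matters for Windows paths, where it turns backslashes into forward slashes, and as_uri only works on absolute paths (percent-encoding characters such as spaces). The pure path classes let you try both on any OS; the file names below are made up.

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath(r"C:\Users\user\notes.txt")
print(win.as_posix())  # C:/Users/user/notes.txt

print(PurePosixPath("/tmp/my notes.txt").as_uri())  # file:///tmp/my%20notes.txt

try:
    PurePosixPath("notes.txt").as_uri()
except ValueError as err:
    # A relative path cannot be expressed as a file URI.
    print("ValueError:", err)
```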

Path.chmod(mode)

Unlike with Path.mkdir, the mode passed to Path.chmod is the final mode of the file: it is not affected by the current umask.

In [15]:
help(path.chmod)
Help on method chmod in module pathlib:

chmod(mode) method of pathlib.PosixPath instance
    Change the permissions of the path, like os.chmod().
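A small POSIX-only check (on Windows, chmod can only change the read-only flag), using a throwaway temporary file:

```python
import stat
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / "data.txt"
    file.touch()
    # chmod sets the permission bits exactly; the umask is not applied.
    file.chmod(0o640)
    mode = stat.S_IMODE(file.stat().st_mode)
    print(oct(mode), stat.filemode(file.stat().st_mode))  # 0o640 -rw-r-----
```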

Path.cwd

Path.cwd is a class method that returns the current working directory.

In [16]:
Path.cwd()
Out[16]:
PosixPath('/workdir/archives/docker')

Path.drive

In [20]:
path.drive
Out[20]:
''

Path.exists

In [21]:
path.exists()
Out[21]:
True

Path.expanduser

Path.home() is preferred to Path('~').expanduser().

In [23]:
Path("~").expanduser()
Out[23]:
PosixPath('/root')
In [24]:
Path("~/archives").expanduser()
Out[24]:
PosixPath('/root/archives')

Path.glob

  1. pathlib.Path.glob returns a lazy iterator (a generator in the version shown below) rather than a list.

Find all Jupyter/Lab notebooks in the current directory.

In [4]:
path = Path()
path.glob(pattern="*.ipynb")
Out[4]:
<generator object Path.glob at 0x123918c10>

Find all CSS files in the current directory.

In [5]:
list(path.glob(pattern="*.css"))
Out[5]:
[]

Find all Jupyter/Lab notebook files under the current directory and its sub-directories.

In [3]:
nbs = Path().glob("**/*.ipynb")
In [4]:
len(list(nbs))
Out[4]:
402
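Path.rglob(pattern) is documented as equivalent to glob() with "**/" prepended to the pattern. A sketch on a made-up directory layout comparing the three calls:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: three notebooks at different depths and one CSS file.
    (root / "sub" / "deeper").mkdir(parents=True)
    for name in ["a.ipynb", "sub/b.ipynb", "sub/deeper/c.ipynb", "style.css"]:
        (root / name).touch()

    top_level = sorted(p.relative_to(root).as_posix() for p in root.glob("*.ipynb"))
    recursive = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.ipynb"))
    via_rglob = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.ipynb"))

print(top_level)  # ['a.ipynb']
print(recursive)  # ['a.ipynb', 'sub/b.ipynb', 'sub/deeper/c.ipynb']
```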

Path.home

Both Path.home() (preferred) and Path('~').expanduser() return a new Path object representing the user's home directory, the same as os.path.expanduser('~'). However, Path('~').resolve() does not expand ~: it treats ~ as an ordinary file name relative to the current working directory.

In [37]:
Path.home()
Out[37]:
PosixPath('/root')
In [39]:
Path("~").expanduser()
Out[39]:
PosixPath('/root')
In [40]:
Path("~").resolve()
Out[40]:
PosixPath('/app/archives/blog/misc/content/~')

Path.iterdir

Iterate over the contents of a directory. The entries are yielded in arbitrary order, and the special entries . and .. are not included.

The code below shows the first 5 files/folders under path.

In [13]:
list(itertools.islice(path.iterdir(), 5))
Out[13]:
[PosixPath('2018-07-21-conda-build-issue.markdown'),
 PosixPath('2018-10-29-monitoring-and-alerting-tools.markdown'),
 PosixPath('2019-02-10-unit-testing-debugging-python.markdown'),
 PosixPath('2017-01-15-chinese-locale.markdown'),
 PosixPath('2012-05-17-java-difference-abstract-interface.markdown')]
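Since the order is not guaranteed and files and directories are mixed together, a common pattern is to sort the entries and filter them with is_file() or is_dir(). A sketch on a made-up layout:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "posts").mkdir()
    (root / "b.markdown").touch()
    (root / "a.markdown").touch()

    # Sort to get a deterministic order; split regular files from directories.
    files = sorted(p.name for p in root.iterdir() if p.is_file())
    dirs = sorted(p.name for p in root.iterdir() if p.is_dir())

print(files)  # ['a.markdown', 'b.markdown']
print(dirs)   # ['posts']
```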

Path.mkdir(mode=0o777, parents=False, exist_ok=False)

  1. The option parents=True creates missing parent directories as needed. The option exist_ok=True suppresses FileExistsError when the directory already exists (it is still raised if the path exists but is not a directory). path.mkdir(parents=True, exist_ok=True) is equivalent to the shell command mkdir -p path.

  2. By default, the mode option has the value 0o777. However, this doesn't mean that a created directory will have the permission 777 by default. The mode is combined with the process's umask (the effective permission is mode & ~umask), so with a umask of 022 a new directory gets 755. To give a created directory the permission 777, you can set the umask to 0 first.

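     :::python
     import os
     from pathlib import Path

     mask = os.umask(0)  # clear the umask so that mode=0o777 applies as-is
     try:
         Path("/opt/spark/warehouse").mkdir(parents=True, exist_ok=True)
     finally:
         os.umask(mask)  # restore the original umask even if mkdir fails

     Another way is to manually set the permission using the method Path.chmod (which is not affected by the current umask) after creating the directory.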
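The interplay between mode and the umask can be checked directly. A POSIX-only sketch; it temporarily sets the umask to 0o022 (an assumed typical value, set only for this demo) and restores it afterwards.

```python
import os
import stat
import tempfile
from pathlib import Path

old_mask = os.umask(0o022)  # assumed typical umask, set only for this demo
try:
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "warehouse"
        # mode=0o777 is masked by the umask: 0o777 & ~0o022 == 0o755
        target.mkdir(mode=0o777)
        after_mkdir = stat.S_IMODE(target.stat().st_mode) & 0o777
        # chmod is not affected by the umask, so the directory ends up 0o777.
        target.chmod(0o777)
        after_chmod = stat.S_IMODE(target.stat().st_mode) & 0o777
finally:
    os.umask(old_mask)

print(oct(after_mkdir), oct(after_chmod))  # 0o755 0o777
```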

Path.name

In [26]:
path.name
Out[26]:
'content'
In [28]:
Path("/root/abc.txt").name
Out[28]:
'abc.txt'

Path.parent

In [29]:
path.parent
Out[29]:
PosixPath('/app/archives/blog/misc')

Notice that the parent of a root directory (/, C:, etc.) is the root itself.

In [13]:
path = Path("/")
path
Out[13]:
PosixPath('/')
In [15]:
path.parent
Out[15]:
PosixPath('/')
In [16]:
path.parent is path
Out[16]:
True
In [3]:
PureWindowsPath("C:").parent
Out[3]:
PureWindowsPath('C:')
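Because the parent of a root is the root itself, a loop that walks up the tree with p = p.parent must stop once p.parent == p, or it never terminates. Iterating over Path.parents avoids this bookkeeping. The helper find_upward and the marker file name are hypothetical; pass an absolute path, since parents of a relative path stop at the first component.

```python
import tempfile
from pathlib import Path, PurePosixPath
from typing import Optional

# Path.parents lists every ancestor, nearest first, ending at the root.
print(list(PurePosixPath("/app/archives/blog").parents))
# [PurePosixPath('/app/archives'), PurePosixPath('/app'), PurePosixPath('/')]


def find_upward(start: Path, marker: str) -> Optional[Path]:
    """Return the closest directory at or above `start` containing `marker` (hypothetical helper)."""
    for directory in [start, *start.parents]:
        if (directory / marker).exists():
            return directory
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "pyproject.toml").touch()
    nested = root / "src" / "pkg"
    nested.mkdir(parents=True)
    found = find_upward(nested, "pyproject.toml")
    print(found == root)  # True
```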

Path.parts

In [31]:
path.parts
Out[31]:
('/', 'app', 'archives', 'blog', 'misc', 'content')

Path.resolve

In [34]:
path.resolve()
Out[34]:
PosixPath('/app/archives/blog/misc/content')

Path.relative_to

In [2]:
Path("/app/archives/blog").relative_to("/app")
Out[2]:
PosixPath('archives/blog')
In [3]:
Path("/app/archives/blog").relative_to(Path("/app"))
Out[3]:
PosixPath('archives/blog')
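relative_to raises ValueError when the path is not under the given directory. Since Python 3.9, is_relative_to offers a boolean check instead. A sketch using a pure path so it runs on any OS:

```python
from pathlib import PurePosixPath

blog = PurePosixPath("/app/archives/blog")

print(blog.is_relative_to("/app"))  # True (Python 3.9+)
print(blog.is_relative_to("/opt"))  # False

try:
    blog.relative_to("/opt")
except ValueError as err:
    # relative_to does not produce ".." components (Python 3.12 adds walk_up=True for that).
    print("ValueError:", err)
```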

Path.rename(target)

On Windows, if target exists, FileExistsError is raised. On Linux, Path.rename behaves as follows (assuming the user has the required permissions):

  • If target is an existing file (and the source is a file), it is overwritten silently.
  • If target is an existing empty directory (and the source is a directory), it is overwritten silently.
  • If target is an existing non-empty directory, an OSError (Errno 39, Directory not empty) is raised.

If you want to overwrite an existing target directory, you can use the function shutil.copytree(src, dst, dirs_exist_ok=True) (Python 3.8+) and then remove the source directory with shutil.rmtree(src). Note that this merges src into dst: files with the same name are overwritten, but files that exist only in dst are kept.

In [11]:
!rm -rf test1 && mkdir -p test1 && touch test1/1.txt && ls test1/
1.txt
In [12]:
!rm -rf test2 && mkdir -p test2 && touch test2/2.txt && ls test2/
2.txt
In [13]:
Path("test1").rename("test2")
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[13], line 1
----> 1 Path("test1").rename("test2")

File /usr/lib/python3.10/pathlib.py:1234, in Path.rename(self, target)
   1224 def rename(self, target):
   1225     """
   1226     Rename this path to the target path.
   1227 
   (...)
   1232     Returns the new Path instance pointing to the target path.
   1233     """
-> 1234     self._accessor.rename(self, target)
   1235     return self.__class__(target)

OSError: [Errno 39] Directory not empty: 'test1' -> 'test2'
In [8]:
!ls test2/
untitled.txt
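The copytree-then-remove approach can be wrapped in a small helper; merge_move is a hypothetical name, and dirs_exist_ok requires Python 3.8+. The demo rebuilds the test1/test2 setup from above in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def merge_move(src: Path, dst: Path) -> None:
    """Copy src into dst (overwriting same-named files), then delete src (hypothetical helper)."""
    shutil.copytree(src, dst, dirs_exist_ok=True)  # Python 3.8+
    shutil.rmtree(src)


with tempfile.TemporaryDirectory() as tmp:
    test1, test2 = Path(tmp) / "test1", Path(tmp) / "test2"
    test1.mkdir()
    test2.mkdir()
    (test1 / "1.txt").touch()
    (test2 / "2.txt").touch()

    merge_move(test1, test2)
    # test1 is gone; test2 holds both files because copytree merges rather than replaces.
    result = (test1.exists(), sorted(p.name for p in test2.iterdir()))

print(result)  # (False, ['1.txt', '2.txt'])
```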

Path.stem

In [32]:
path.stem
Out[32]:
'content'
In [33]:
Path("/root/abc.txt").stem
Out[33]:
'abc'
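stem and suffix only look at the last dot. For multi-part extensions such as .tar.gz, suffixes returns all of them. A sketch with a made-up file name:

```python
from pathlib import PurePosixPath

archive = PurePosixPath("/root/data.tar.gz")
print(archive.stem)      # data.tar
print(archive.suffix)    # .gz
print(archive.suffixes)  # ['.tar', '.gz']
```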

Path.symlink_to(target, target_is_directory=False)

Make this path a symbolic link to target. Under Windows, target_is_directory must be True (default False) if the link's target is a directory. Under POSIX, target_is_directory's value is ignored. It is suggested that you always set target_is_directory to True (regardless of the OS) if the link's target is a directory.

Notice that a FileExistsError is raised if the current path already exists. You can first remove it (using Path.unlink) and then create the symbolic link again using Path.symlink_to.

In [9]:
Path("/tmp/_12345").symlink_to(path, target_is_directory=True)
In [11]:
!ls /tmp/_12345 | head -n 5
2009-11-01-format-data-in-sas.markdown
2009-11-01-general-tips-for-sas.markdown
2009-11-01-macro-in-sas.markdown
2010-11-20-clustering-in-r.markdown
2010-11-20-general-tips-for-latex.markdown
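The unlink-then-link pattern can be wrapped in a helper; force_symlink is a hypothetical name. It only replaces an existing link or file, not a real directory, and creating symlinks on Windows may require extra privileges.

```python
import tempfile
from pathlib import Path


def force_symlink(link: Path, target: Path) -> None:
    """Point `link` at `target`, replacing an existing link or file (hypothetical helper)."""
    # symlink_to raises FileExistsError if `link` exists, so remove it first.
    # is_symlink() also catches dangling links, for which exists() returns False.
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(target, target_is_directory=target.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "old").mkdir()
    (root / "new").mkdir()
    link = root / "current"
    force_symlink(link, root / "old")
    force_symlink(link, root / "new")  # symlink_to alone would raise FileExistsError here
    points_to_new = link.resolve() == (root / "new").resolve()

print(points_to_new)  # True
```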

Path.__str__

In [35]:
path.__str__()
Out[35]:
'/app/archives/blog/misc/content'
In [36]:
str(path)
Out[36]:
'/app/archives/blog/misc/content'

Path.with_name

Return a new path with the name changed. If the original path doesn’t have a name, ValueError is raised.

In [3]:
path = Path("/root/abc.txt")
path
Out[3]:
PosixPath('/root/abc.txt')
In [4]:
path.with_name("ABC.txt")
Out[4]:
PosixPath('/root/ABC.txt')
In [5]:
path.with_name(path.name.replace("abc", "ABC"))
Out[5]:
PosixPath('/root/ABC.txt')

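Another way is to manipulate the underlying string of the path directly. Be careful: str.replace replaces every occurrence of the substring, including any in the parent directories.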

In [6]:
str(path).replace("abc", "ABC")
Out[6]:
'/root/ABC.txt'
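Python 3.9+ also provides with_stem, which changes the name while keeping the suffix. And as stated above, a path without a name (such as the root) makes with_name raise ValueError. A sketch with pure paths:

```python
from pathlib import PurePosixPath

file = PurePosixPath("/root/abc.txt")
print(file.with_stem("ABC"))  # /root/ABC.txt (Python 3.9+)

try:
    PurePosixPath("/").with_name("abc.txt")
except ValueError as err:
    # The root has an empty name, so there is nothing to replace.
    print("ValueError:", err)
```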

Path.with_suffix

Return a new path with the suffix changed. If the original path doesn’t have a suffix, the new suffix is appended instead. If the suffix is an empty string, the original suffix is removed.

In [2]:
path = Path("/root/abc.txt")
path
Out[2]:
PosixPath('/root/abc.txt')

Change the file extension to .pdf.

In [3]:
path.with_suffix(".pdf")
Out[3]:
PosixPath('/root/abc.pdf')

Remove the file extension.

In [4]:
path.with_suffix("")
Out[4]:
PosixPath('/root/abc')
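with_suffix replaces only the last suffix and requires the new suffix to start with a dot (or be empty); otherwise it raises ValueError. Removing every suffix therefore takes a loop. A sketch with a made-up archive name:

```python
from pathlib import PurePosixPath

archive = PurePosixPath("/root/data.tar.gz")
print(archive.with_suffix(".zip"))  # /root/data.tar.zip

# Strip every suffix by removing the last one until none is left.
bare = archive
while bare.suffix:
    bare = bare.with_suffix("")
print(bare)  # /root/data

try:
    archive.with_suffix("zip")  # missing leading dot
except ValueError as err:
    print("ValueError:", err)
```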

Examples of Using pathlib.Path

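Rename the .txt files in the current directory, replacing 1m with 100k in their names.

In [1]:
# Take a snapshot of the directory before renaming anything in it.
for path in list(Path(".").iterdir()):
    if path.suffix == ".txt" and "1m" in path.name:
        # with_name keeps the renamed file in the same directory as the original.
        path.rename(path.with_name(path.name.replace("1m", "100k")))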
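A slightly more general sketch that works on any directory, not just the current one; rename_files and its parameters are hypothetical. It relies on Path.rename returning the new path, which holds in Python 3.8+.

```python
import tempfile
from pathlib import Path
from typing import List


def rename_files(directory: Path, old: str, new: str, suffix: str = ".txt") -> List[Path]:
    """Replace `old` with `new` in the names of files ending in `suffix` (hypothetical helper)."""
    renamed = []
    # sorted() materializes the lazy glob result before anything is renamed.
    for file in sorted(directory.glob(f"*{suffix}")):
        if file.is_file() and old in file.name:
            # with_name keeps the file in `directory`, whatever the current working directory is.
            renamed.append(file.rename(file.with_name(file.name.replace(old, new))))
    return renamed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["sample_1m.txt", "sample_1m.csv", "notes.txt"]:
        (root / name).touch()
    rename_files(root, "1m", "100k")
    names = sorted(p.name for p in root.iterdir())

print(names)  # ['notes.txt', 'sample_100k.txt', 'sample_1m.csv']
```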
