Ben Chuanlong Du's Blog

It is never too late to learn.

The Best Way to Find Files and Manipulate Them

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

There are many cool (command-line) tools which can help you quickly find/locate files.

  1. find
  2. locate
  3. osquery
  4. fselect
  5. ripgrep

Those tools can be combined with the pipe operator | to do further filtering or manipulation. However, after trying all of these tools, I have to say that the best way for a Python user is to leverage the pathlib module in a Jupyter/Lab notebook (or in an IPython shell). Even though you might have to write slightly more code in Python than with command-line tools, Python code has the following advantages.

  1. Python code is much more intuitive to understand. You do not have to remember weird command-line options or handle corner cases.

  2. Python code is much more flexible, especially when used in a notebook or in an IPython shell. You can easily achieve more complicated operations in Python code. If an operation requires lots of Python code, you can also encapsulate it into a Python module. What's more, you can mix shell commands into your Python code if you are using a notebook or an IPython shell, which makes things much more convenient.

  3. You do not have to worry about side effects (e.g., spaces, special character escaping, etc.) caused by the shell.

Below are some examples.

In [1]:
from pathlib import Path 
  1. Find all files with the extension .out in the current directory and its subdirectories, and then make them executable.
In [11]:
!find . -type f -iname '*.out' -exec chmod +x '{}' \;
In [12]:
!find . -type f -iname '*.out' -print0 | xargs -0 chmod +x
In [ ]:
!fselect path from . where is_file = 1 and name like %.out | xargs chmod +x
In [2]:
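for p in Path().glob("**/*"):
    if p.is_file() and p.suffix.lower() == ".out":
        # add the execute bits (like chmod +x) without touching the other permissions
        p.chmod(p.stat().st_mode | 0o111)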
  1. Find files whose names contain "conflicted" and remove them.
In [13]:
!find . -iname '*conflicted*' -print0 | xargs -0 rm
rm: missing operand
Try 'rm --help' for more information.
In [14]:
!fselect path from . where is_file = 1 and name like %conflicted% | xargs rm 
/bin/bash: fselect: command not found
rm: missing operand
Try 'rm --help' for more information.
In [15]:
for path in Path().glob("**/*"):
    if path.is_file() and "conflicted" in path.name:
        path.unlink()
  1. Find files with 0 size.
In [19]:
!find . -size 0
./abc/a.out
In [20]:
!fselect path from . where size = 0
/bin/bash: fselect: command not found
In [21]:
for path in Path().glob("**/*"):
    if path.stat().st_size == 0:
        print(path)
abc/a.out
  1. Find empty directories.
In [23]:
!find . -type d -empty
./.ipynb_checkpoints
In [27]:
for path in Path().glob("**/*"):
    if path.is_dir() and not any(True for _ in path.iterdir()):
        print(path)
.ipynb_checkpoints

3. Find files greater than 1G.

        :::bash
        find . -xdev -type f -size +1G


4. Finding files first and then passing them to other commands is a very useful trick.
    For example, 
    you can use the following command to find all R scripts containing the word `paste`.

        :::bash
        find . -type f -iname '*.r' -print0 | xargs -0 grep --color=auto paste
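
The last two searches can also be written with pathlib. Below is a minimal sketch, not a drop-in replacement for `find`: the function names `find_large_files` and `files_containing` are made up for illustration, the size threshold is assumed to be 1 GiB, and nothing here mimics `-xdev` (the search is not restricted to one filesystem).

```python
from pathlib import Path


def find_large_files(root=".", min_bytes=1024**3):
    """Return regular files under root that are larger than min_bytes (default 1 GiB)."""
    return [
        path
        for path in Path(root).glob("**/*")
        # skip symlinks so that a linked file is not reported twice
        if path.is_file() and not path.is_symlink() and path.stat().st_size > min_bytes
    ]


def files_containing(root, word, suffix=".r"):
    """Return files under root with the given suffix (case-insensitive) whose text contains word."""
    matches = []
    for path in Path(root).glob("**/*"):
        if path.is_file() and path.suffix.lower() == suffix:
            # errors="ignore" keeps one badly encoded file from aborting the scan
            if word in path.read_text(encoding="utf-8", errors="ignore"):
                matches.append(path)
    return matches
```

Calling `find_large_files(".")` or `files_containing(".", "paste")` in a notebook returns a plain list of `Path` objects, which can then be filtered or manipulated further in Python.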


## Time Related Finding

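1. Find files whose status changed within the last 60 minutes.
    Note that `-cmin` and `-ctime` test the status-change time (ctime) rather than the creation time.

        :::bash
        find . -cmin -60

2. Find files whose status changed more than 30 days ago.
        
        :::bash
        find . -ctime +30

3. Find files whose status changed less than 30 days ago.

        :::bash
        find . -ctime -30

4. Find files whose status changed exactly 30 days ago.

        :::bash
        find . -ctime 30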

5. Find all files under the current directory that were modified on June 7, 2007.

        :::bash
        find . -type f -newermt 2007-06-07 ! -newermt 2007-06-08


6. Find all files under the current directory that were accessed on Sep 29, 2008.

        :::bash
        find . -type f -newerat 2008-09-29 ! -newerat 2008-09-30

7. Find files whose status (e.g., permissions) changed on Sep 29, 2008.

        :::bash
        find . -type f -newerct 2008-09-29 ! -newerct 2008-09-30
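
The time-based searches translate to Python by comparing the timestamps from `stat`. The following is a rough sketch under a few assumptions: `changed_within` and `modified_on` are hypothetical helper names, times are interpreted in the local time zone, and the boundary handling at midnight differs slightly from the `-newermt` pair above. On Unix `st_ctime` is the status-change time, matching `-cmin`/`-ctime`.

```python
import time
from datetime import date, datetime
from pathlib import Path


def changed_within(root, seconds, now=None):
    """Return paths under root whose ctime lies within the last `seconds` seconds."""
    cutoff = (time.time() if now is None else now) - seconds
    # lstat() does not follow symlinks, so broken links do not raise
    return [p for p in Path(root).glob("**/*") if p.lstat().st_ctime >= cutoff]


def modified_on(root, day):
    """Return regular files under root whose mtime falls on the local calendar day `day`."""
    return [
        p
        for p in Path(root).glob("**/*")
        if p.is_file() and datetime.fromtimestamp(p.stat().st_mtime).date() == day
    ]


# e.g. changed_within(".", 60 * 60) or modified_on(".", date(2007, 6, 7))
```

Swapping `st_mtime` for `st_atime` gives the access-time variant.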

## File Type Related Finding

1. Find broken symbolic links.

        :::bash
        find . -xtype l
        # or
        find -L . -type l

2. Find executable files in the current directory.
        
        :::bash
        find . -maxdepth 1 -type f -executable
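
Both file type searches have short pathlib counterparts. This is only a sketch with made-up function names (`broken_symlinks`, `executables_in`). Note that `is_file()` follows symlinks, so unlike `-type f` the second function also reports symlinks that point to executable files.

```python
import os
from pathlib import Path


def broken_symlinks(root="."):
    """Return symlinks under root whose targets do not exist."""
    return [
        path
        for path in Path(root).glob("**/*")
        # is_symlink() inspects the link itself; exists() follows it to the target
        if path.is_symlink() and not path.exists()
    ]


def executables_in(directory="."):
    """Return files directly inside directory that the current user may execute."""
    return [
        path
        for path in Path(directory).iterdir()
        if path.is_file() and os.access(path, os.X_OK)
    ]
```

`os.access(path, os.X_OK)` asks whether the current user may execute the file, which is the same idea as `-executable`.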

## User Related Finding

1. Find files that belong to a user but are writable by the group or by others.

        :::bash
        find /path/to/file -user user1 -perm /022

2. Check the file type of all files under the current directory.

        :::bash
        find . -type f -print0 | xargs -0 file

The `-perm` test comes in three forms.

- `-perm mode`: the file's permission bits are exactly `mode` (octal or symbolic).
- `-perm -mode`: all of the permission bits in `mode` are set for the file.
- `-perm /mode`: any of the permission bits in `mode` are set for the file.

The last two forms are a little tricky to understand. It helps to think in terms of individual permission bits (0/1).

The following command finds all files that are readable or writable by the group, or readable or writable by others (any one of the four bits is enough).

    :::bash
    find /path/to/file -user user1 -perm /066

The following command finds all files that are readable and writable by the group and also readable and writable by others (all four bits must be set).

    :::bash
    find /path/to/file -user user1 -perm -066

The following command finds all files that are readable or writable by the group and also readable or writable by others (at least one group bit and at least one others bit).

    :::bash
    find /path/to/file -user user1 -perm /060 -perm /006
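
The bit logic behind `-perm /mode` and `-perm -mode` is easy to see in Python. Here is a small sketch, where `perm_any` and `perm_all` are hypothetical names chosen for this note (they are not part of pathlib or the `stat` module); it only covers octal masks and ignores the `-user` test.

```python
import stat
from pathlib import Path


def perm_any(path, mask):
    """Mimic `-perm /mask`: True if at least one bit of mask is set on path."""
    mode = stat.S_IMODE(Path(path).stat().st_mode)
    return bool(mode & mask)


def perm_all(path, mask):
    """Mimic `-perm -mask`: True if every bit of mask is set on path."""
    mode = stat.S_IMODE(Path(path).stat().st_mode)
    return (mode & mask) == mask
```

For a file with mode 640, `perm_any(f, 0o066)` is true (the group can read), `perm_all(f, 0o066)` is false, and `perm_any(f, 0o060) and perm_any(f, 0o006)` is false because no bit is set for others.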


Find Python scripts in the current directory recursively
but ignore those under directories with the name `.ipynb_checkpoints`.

    :::bash
    find . -type f -iname '*.py' -not -path '*/.ipynb_checkpoints/*'
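
The same exclusion can be expressed with pathlib by looking at the directory components of each path. A minimal sketch, assuming a made-up helper name `python_scripts`:

```python
from pathlib import Path


def python_scripts(root=".", skip_dir=".ipynb_checkpoints"):
    """Return *.py files under root, ignoring any that sit below a directory named skip_dir."""
    return [
        path
        for path in Path(root).glob("**/*")
        if path.is_file()
        and path.suffix.lower() == ".py"
        # parts[:-1] are the directory components, so a file merely named skip_dir is kept
        and skip_dir not in path.parts[:-1]
    ]
```

Note that this still walks into the skipped directories and only filters the results afterwards, much like the `-not -path` test above.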