Ben Chuanlong Du's Blog

It is never too late to learn.

Hands on GitPython

Tips and Traps

  1. GitPython is a wrapper around the git command, so the git command must be on the search path for it to work. For simple tasks, it is sometimes easier to call the git command directly via subprocess.run instead of using GitPython.

  2. The git command (and thus GitPython) accepts URLs both with and without the trailing .git.
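A minimal sketch of the first tip, using only the Python standard library: it checks whether git is on the search path and, if so, calls it directly through subprocess.run. The helper name git_version is made up for this illustration.

```python
import shutil
import subprocess
from typing import Optional


def git_version() -> Optional[str]:
    """Return the output of `git --version`, or None if git is not on the search path."""
    git_path = shutil.which("git")
    if git_path is None:
        # GitPython would not work in this environment either.
        return None
    proc = subprocess.run(
        [git_path, "--version"], capture_output=True, text=True, check=True
    )
    return proc.stdout.strip()


print(git_version())
```

If this prints None, installing git (or fixing PATH) is the first thing to do before debugging any GitPython error.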

In [1]:
!pip3 install GitPython
Requirement already satisfied: GitPython in /usr/local/lib/python3.8/dist-packages (3.1.17)
Requirement already satisfied: gitdb<5,>=4.0.1 in /usr/local/lib/python3.8/dist-packages (from GitPython) (4.0.7)
Requirement already satisfied: smmap<5,>=3.0.1 in /usr/local/lib/python3.8/dist-packages (from gitdb<5,>=4.0.1->GitPython) (4.0.0)
In [4]:
import git
from git import Repo
In [5]:
url = "https://github.com/dclong/docker-ubuntu_b.git"
dir_local = "/tmp/test_gitpython"
In [6]:
url = "https://github.com/dclong/docker-ubuntu_b"
dir_local = "/tmp/test_gitpython"

Clone a Repository

In [7]:
!rm -rf {dir_local}
In [9]:
repo = git.Repo.clone_from(url, dir_local, branch="main")
repo
Out[9]:
<git.repo.base.Repo '/tmp/test_gitpython/.git'>
In [7]:
ls /tmp/test_gitpython/
build.sh*  Dockerfile  LICENSE  readme.md  scripts/

Verify that the GitHub repository has been cloned locally.

In [5]:
!ls {dir_local}
build.sh  readme.md

Clone the local repository to another location (which is not very useful as you can directly copy the directory to the new location).

In [13]:
repo2 = Repo(dir_local).clone(f"/tmp/{dir_local}")
repo2
Out[13]:
<git.repo.base.Repo '/tmp/test_gitpython/.git'>
In [14]:
!ls /tmp/{dir_local}
build.sh  readme.md

Information of the Local Repository

In [15]:
heads = repo.heads
heads
Out[15]:
[<git.Head "refs/heads/main">]
In [16]:
main = heads.main
main
Out[16]:
<git.Head "refs/heads/main">

Get the commit pointed to by the head named main.

In [17]:
main.commit
Out[17]:
<git.Commit "95ed236bd715a06320ee85d519fb79a0adffe072">
In [18]:
main.rename("main2")
Out[18]:
<git.Head "refs/heads/main2">

Verify that the main branch has been renamed to main2.

In [19]:
!cd {dir_local} && git branch
* main2

Get the Active Branch

In [5]:
repo.active_branch.name
Out[5]:
'main'

Get All Branches

In [10]:
repo.branches
Out[10]:
[<git.Head "refs/heads/main">]

Get the Remote Name

In [6]:
repo.remote().name
Out[6]:
'origin'

Get all Remotes

In [7]:
repo.remotes
Out[7]:
[<git.Remote "origin">]

Commits

Get the latest commit in a branch.

In [23]:
repo.commit("main")
Out[23]:
<git.Commit "53d99955a9762427f2f68dc04765471089055dc1">
In [28]:
repo.commit("main").diff(repo.commit("origin/dev"))
Out[28]:
[]
In [27]:
repo.commit("origin/dev")
Out[27]:
<git.Commit "8f9f426f13d70b21f573f7c50bbe01e8ce38f158">
In [26]:
repo.refs
Out[26]:
[<git.Head "refs/heads/main">,
 <git.RemoteReference "refs/remotes/origin/HEAD">,
 <git.RemoteReference "refs/remotes/origin/debian">,
 <git.RemoteReference "refs/remotes/origin/dev">,
 <git.RemoteReference "refs/remotes/origin/main">]

Changed Files

Update a file.

In [23]:
!echo "# add a line of comment" >> {dir_local}/build.sh
In [24]:
repo = Repo(dir_local)
files_changed = [item.a_path for item in repo.index.diff(None)]
files_changed
Out[24]:
['build.sh']

Staged Files

In [25]:
repo = Repo(dir_local)
index = repo.index
In [26]:
index.add("build.sh")
Out[26]:
[(100644, f1cb16a21febd1f69a7a638402dddeb7f1dc9771, 0, build.sh)]

The file build.sh is now staged.

In [27]:
files_stage = [item.a_path for item in repo.index.diff("HEAD")]
files_stage
Out[27]:
['build.sh']
In [28]:
files_changed = [item.a_path for item in repo.index.diff(None)]
files_changed
Out[28]:
[]

Commit the change.

In [29]:
index.commit("update build.sh")
Out[29]:
<git.Commit "bfea304786b7b77f7fe247c74040c0e23576fc41">
In [30]:
files_stage = [item.a_path for item in repo.index.diff("HEAD")]
files_stage
Out[30]:
[]
In [8]:
remote = repo.remote()
remote
Out[8]:
<git.Remote "origin">

Push the Commits

Push the local main2 branch to the remote main2 branch.

In [32]:
remote.push("main2")
Out[32]:
[<git.remote.PushInfo at 0x11fd596d0>]

The above is equivalent to the following more detailed specification.

In [62]:
remote.push("refs/heads/main2:refs/heads/main2")
Out[62]:
[<git.remote.PushInfo at 0x119903540>]

Push the local main2 branch to the remote main branch.

In [63]:
remote.push("refs/heads/main2:refs/heads/main")
Out[63]:
[<git.remote.PushInfo at 0x11992d9a0>]

Pull a Branch

In [6]:
repo.active_branch
Out[6]:
<git.Head "refs/heads/main">
In [11]:
remote.pull(repo.active_branch)
Out[11]:
[]
In [12]:
!ls {dir_local}
abc       build.sh  readme.md

git checkout

In [42]:
help(repo.refs[4].checkout)
Help on method checkout in module git.refs.head:

checkout(force=False, **kwargs) method of git.refs.remote.RemoteReference instance
    Checkout this head by setting the HEAD to this reference, by updating the index
    to reflect the tree we point to and by updating the working tree to reflect
    the latest index.
    
    The command will fail if changed working tree files would be overwritten.
    
    :param force:
        If True, changes to the index and the working tree will be discarded.
        If False, GitCommandError will be raised in that situation.
    
    :param kwargs:
        Additional keyword arguments to be passed to git checkout, i.e.
        b='new_branch' to create a new branch at the given spot.
    
    :return:
        The active branch after the checkout operation, usually self unless
        a new branch has been created.
        If there is no active branch, as the HEAD is now detached, the HEAD
        reference will be returned instead.
    
    :note:
        By default it is only allowed to checkout heads - everything else
        will leave the HEAD detached which is allowed and possible, but remains
        a special state that some tools might not be able to handle.

In [5]:
?repo.git.checkout
Signature: repo.git.checkout(*args, **kwargs)
Docstring: <no docstring>
File:      /usr/local/lib/python3.8/site-packages/git/cmd.py
Type:      function
In [6]:
repo.active_branch
Out[6]:
<git.Head "refs/heads/dev">

With force=True, any local changes are discarded, whether or not they would have blocked the branch switch.

In [12]:
repo.git.checkout("dev", force=True)
Out[12]:
'Your branch is ahead of \'origin/dev\' by 1 commit.\n  (use "git push" to publish your local commits)'
In [5]:
repo.git.checkout("main", force=True)
Out[5]:
"Your branch is up to date with 'origin/main'."
In [11]:
repo.active_branch
Out[11]:
<git.Head "refs/heads/dev">

git tag

List all tags.

In [13]:
repo.tags
Out[13]:
[]

Add a tag.

In [15]:
repo.create_tag("v1.0.0")
Out[15]:
<git.TagReference "refs/tags/v1.0.0">
In [17]:
repo.tags
Out[17]:
[<git.TagReference "refs/tags/v1.0.0">]
In [20]:
repo.tag("refs/tags/v1.0.0")
Out[20]:
<git.TagReference "refs/tags/v1.0.0">
In [23]:
tag2 = repo.tag("refs/tags/v2.0.0")
tag2
Out[23]:
<git.TagReference "refs/tags/v2.0.0">
In [24]:
repo.tags
Out[24]:
[<git.TagReference "refs/tags/v1.0.0">]

A GitCommandError is raised if the tag already exists.

In [25]:
repo.create_tag("v1.0.0")
---------------------------------------------------------------------------
GitCommandError                           Traceback (most recent call last)
<ipython-input-25-c9fa3ca924a1> in <module>
----> 1 repo.create_tag("v1.0.0")

/usr/local/lib/python3.8/site-packages/git/repo/base.py in create_tag(self, path, ref, message, force, **kwargs)
    397 
    398         :return: TagReference object """
--> 399         return TagReference.create(self, path, ref, message, force, **kwargs)
    400 
    401     def delete_tag(self, *tags):

/usr/local/lib/python3.8/site-packages/git/refs/tag.py in create(cls, repo, path, ref, message, force, **kwargs)
     81             kwargs['f'] = True
     82 
---> 83         repo.git.tag(*args, **kwargs)
     84         return TagReference(repo, "%s/%s" % (cls._common_path_default, path))
     85 

/usr/local/lib/python3.8/site-packages/git/cmd.py in <lambda>(*args, **kwargs)
    540         if name[0] == '_':
    541             return LazyMixin.__getattr__(self, name)
--> 542         return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
    543 
    544     def set_persistent_git_options(self, **kwargs):

/usr/local/lib/python3.8/site-packages/git/cmd.py in _call_process(self, method, *args, **kwargs)
   1003         call.extend(args)
   1004 
-> 1005         return self.execute(call, **exec_kwargs)
   1006 
   1007     def _parse_object_header(self, header_line):

/usr/local/lib/python3.8/site-packages/git/cmd.py in execute(self, command, istream, with_extended_output, with_exceptions, as_process, output_stream, stdout_as_string, kill_after_timeout, with_stdout, universal_newlines, shell, env, max_chunk_size, **subprocess_kwargs)
    820 
    821         if with_exceptions and status != 0:
--> 822             raise GitCommandError(command, status, stderr_value, stdout_value)
    823 
    824         if isinstance(stdout_value, bytes) and stdout_as_string:  # could also be output_stream

GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git tag v1.0.0 HEAD
  stderr: 'fatal: tag 'v1.0.0' already exists'
In [26]:
repo.remote().push("v1.0.0")
Out[26]:
[<git.remote.PushInfo at 0x12034acc0>]

git diff

In [16]:
help(repo.refs[4].commit.diff)
Help on method diff in module git.diff:

diff(other: Union[Type[git.diff.Diffable.Index], Type[ForwardRef('Tree')], object, NoneType, str] = <class 'git.diff.Diffable.Index'>, paths: Union[str, List[str], Tuple[str, ...], NoneType] = None, create_patch: bool = False, **kwargs: Any) -> 'DiffIndex' method of git.objects.commit.Commit instance
    Creates diffs between two items being trees, trees and index or an
    index and the working tree. It will detect renames automatically.
    
    :param other:
        Is the item to compare us with.
        If None, we will be compared to the working tree.
        If Treeish, it will be compared against the respective tree
        If Index ( type ), it will be compared against the index.
        If git.NULL_TREE, it will compare against the empty tree.
        It defaults to Index to assure the method will not by-default fail
        on bare repositories.
    
    :param paths:
        is a list of paths or a single path to limit the diff to.
        It will only include at least one of the given path or paths.
    
    :param create_patch:
        If True, the returned Diff contains a detailed patch that if applied
        makes the self to other. Patches are somewhat costly as blobs have to be read
        and diffed.
    
    :param kwargs:
        Additional arguments passed to git-diff, such as
        R=True to swap both sides of the diff.
    
    :return: git.DiffIndex
    
    :note:
        On a bare repository, 'other' needs to be provided as Index or as
        as Tree/Commit, or a git command error will occur

In [3]:
url = "https://github.com/dclong/docker-ubuntu_b.git"
dir_local = "/tmp/" + url[(url.rindex("/") + 1) :]
!rm -rf {dir_local}
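The cell above takes the last URL segment as the directory name, so a URL with the trailing .git produces a directory such as /tmp/docker-ubuntu_b.git. If the same directory is wanted for both URL forms, the suffix can be stripped first. A minimal sketch (the helper name repo_name_from_url is made up for this illustration):

```python
def repo_name_from_url(url: str) -> str:
    """Return the last URL segment without a trailing slash or .git suffix."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


for url in (
    "https://github.com/dclong/docker-ubuntu_b.git",
    "https://github.com/dclong/docker-ubuntu_b",
):
    print("/tmp/" + repo_name_from_url(url))
```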
In [4]:
repo = git.Repo.clone_from(url, dir_local, branch="main")
repo
Out[4]:
<git.repo.base.Repo '/tmp/docker-ubuntu_b.git/.git'>
In [25]:
repo.refs
Out[25]:
[<git.Head "refs/heads/debian">,
 <git.Head "refs/heads/dev">,
 <git.Head "refs/heads/main">,
 <git.RemoteReference "refs/remotes/origin/HEAD">,
 <git.RemoteReference "refs/remotes/origin/debian">,
 <git.RemoteReference "refs/remotes/origin/dev">,
 <git.RemoteReference "refs/remotes/origin/main">]
In [6]:
diffs = repo.refs[4].commit.diff(repo.refs[3].commit)
diffs
Out[6]:
[]
In [21]:
diffs = repo.refs[4].commit.diff(repo.refs[2].commit)
diffs
Out[21]:
[<git.diff.Diff at 0x7f0eb05d1a60>, <git.diff.Diff at 0x7f0eb05d1af0>]
In [13]:
str(diffs[0])
Out[13]:
'Dockerfile\n=======================================================\nlhs: 100644 | 8ae5c7650a8c031a8e176d896a3665bbe7e2aae8\nrhs: 100644 | 9f2304d9a97aa1279ad1938b3bb74790172c9d8b'
In [12]:
repo.refs[5].name
Out[12]:
'origin/main'
In [6]:
print(repo.git.status())
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean
In [6]:
repo.git.checkout("debian", force=True)
Out[6]:
"Your branch is up to date with 'origin/debian'."
In [8]:
repo.git.checkout(b="a_new_branch", force=True)
Out[8]:
''
In [ ]:
nima = repo.refs[4].checkout(force=True, b="nima")
nima
In [50]:
diffs = nima.commit.diff(repo.refs[-1].commit)
diffs[0].diff
Out[50]:
''

Diff the dev and main branches, which is equivalent to the Git command git diff dev..main.

In [30]:
repo.refs[2].commit.diff(repo.refs[1].commit)
Out[30]:
[]
In [32]:
diffs = repo.refs[2].commit.diff(repo.refs[0].commit)
diffs
Out[32]:
[<git.diff.Diff at 0x128ca3a60>]
In [33]:
diffs[0]
Out[33]:
<git.diff.Diff at 0x128ca3a60>
In [24]:
diffs = repo.refs[6].commit.diff(repo.refs[7].commit)
diffs
Out[24]:
[]
In [25]:
diffs = repo.refs[4].commit.diff(repo.refs[7].commit)
diffs
Out[25]:
[<git.diff.Diff at 0x128ca3790>]
In [26]:
diffs[0].diff
Out[26]:
''
In [27]:
diffs = repo.refs[7].commit.diff(repo.refs[4].commit)
diffs
Out[27]:
[<git.diff.Diff at 0x128ca3820>]
In [28]:
diffs[0].diff
Out[28]:
''
In [19]:
any(ele for ele in [""])
Out[19]:
False
In [23]:
repo.branches[0].name
Out[23]:
'dev'
In [ ]:
for branch in repo.branches:
    print(branch.name)
In [12]:
commit = repo.head.commit
commit
Out[12]:
<git.Commit "6716bb0d016bd63ba543f3d9c67a65dadecd152e">
In [15]:
type(repo.branches[0])
Out[15]:
git.refs.head.Head
In [17]:
repo.refs[4].commit.diff(repo.refs[2].commit)
Out[17]:
[]
In [9]:
repo.refs[4].commit.diff(repo.refs[3].commit)
Out[9]:
[<git.diff.Diff at 0x127370310>]
In [20]:
help(repo.git.branch)
Help on function <lambda> in module git.cmd:

<lambda> lambda *args, **kwargs

In [28]:
repo.heads
Out[28]:
[<git.Head "refs/heads/dev">, <git.Head "refs/heads/main">]

Diff the debian and main branches, but limit the diff to the specified paths (via the paths parameter).

In [24]:
diffs = repo.refs[4].commit.diff(repo.refs[2].commit, paths=["build.sh", "scripts"])
diffs
Out[24]:
[]