Things under legendu.net/outdated are outdated technologies that the author does not plan to update any more. Please look for better alternatives.

Tips and Traps

  1. GitPython is a wrapper around the git command. It requires the git command to be on the search path in order to work. Also, sometimes it is easier to call the git command via subprocess.run directly instead of using GitPython.

  2. The git command (and thus GitPython) accepts URLs both with and without the trailing .git.
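
Because GitPython depends on the git executable, it can help to check for it up front. The following is a minimal sketch using only the standard library. The helper names find_git and run_git are made up for illustration and are not part of GitPython.

```python
import shutil
import subprocess
from typing import List, Optional


def find_git(executable: str = "git") -> Optional[str]:
    """Return the full path of the git executable, or None if it is not on the search path."""
    return shutil.which(executable)


def run_git(args: List[str], cwd: Optional[str] = None) -> str:
    """Call the git command directly via subprocess.run and return its stdout."""
    proc = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return proc.stdout.strip()


if find_git() is None:
    print("git is not on the search path, so GitPython will not work")
else:
    # The same pattern works for any subcommand, e.g. run_git(["status"], cwd=some_dir).
    print(run_git(["--version"]))
```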

!pip3 install GitPython
Requirement already satisfied: GitPython in /usr/local/lib/python3.8/dist-packages (3.1.17)
Requirement already satisfied: gitdb<5,>=4.0.1 in /usr/local/lib/python3.8/dist-packages (from GitPython) (4.0.7)
Requirement already satisfied: smmap<5,>=3.0.1 in /usr/local/lib/python3.8/dist-packages (from gitdb<5,>=4.0.1->GitPython) (4.0.0)
import git
from git import Repo
url = "https://github.com/dclong/docker-ubuntu_b.git"
dir_local = "/tmp/test_gitpython"
url = "https://github.com/dclong/docker-ubuntu_b"
dir_local = "/tmp/test_gitpython"

Clone a Repository

!rm -rf {dir_local}
repo = git.Repo.clone_from(url, dir_local, branch="main")
repo
<git.repo.base.Repo '/tmp/test_gitpython/.git'>
!ls /tmp/test_gitpython/
build.sh*  Dockerfile  LICENSE  readme.md  scripts/

Verify that the GitHub repository has been cloned locally.

!ls {dir_local}
build.sh  readme.md

Clone the local repository to another location (which is not very useful as you can directly copy the directory to the new location).

repo2 = Repo(dir_local).clone(f"/tmp/{dir_local}")
repo2
<git.repo.base.Repo '/tmp/test_gitpython/.git'>
!ls /tmp/{dir_local}
build.sh  readme.md

Information of the Local Repository

heads = repo.heads
heads
[<git.Head "refs/heads/main">]
main = heads.main
main
<git.Head "refs/heads/main">

Get the commit pointed to by the head named main.

main.commit
<git.Commit "95ed236bd715a06320ee85d519fb79a0adffe072">
main.rename("main2")
<git.Head "refs/heads/main2">

Verify that the main branch has been renamed to main2.

!cd {dir_local} && git branch
* main2

Get the Active Branch

repo.active_branch.name
'main'

Get All Branches

repo.branches
[<git.Head "refs/heads/main">]

Get the Remote Name

repo.remote().name
'origin'

Get All Remotes

repo.remotes
[<git.Remote "origin">]

Commits

Get the latest commit in a branch.

repo.commit("main")
<git.Commit "53d99955a9762427f2f68dc04765471089055dc1">
repo.commit("main").diff(repo.commit("origin/dev"))
[]
repo.commit("origin/dev")
<git.Commit "8f9f426f13d70b21f573f7c50bbe01e8ce38f158">
repo.refs
[<git.Head "refs/heads/main">, <git.RemoteReference "refs/remotes/origin/HEAD">, <git.RemoteReference "refs/remotes/origin/debian">, <git.RemoteReference "refs/remotes/origin/dev">, <git.RemoteReference "refs/remotes/origin/main">]

Changed Files

Update a file.

!echo "# add a line of comment" >> {dir_local}/build.sh
repo = Repo(dir_local)
files_changed = [item.a_path for item in repo.index.diff(None)]
files_changed
['build.sh']

Staged Files

repo = Repo(dir_local)
index = repo.index
index.add("build.sh")
[(100644, f1cb16a21febd1f69a7a638402dddeb7f1dc9771, 0, build.sh)]

The file build.sh is now staged.

files_stage = [item.a_path for item in repo.index.diff("HEAD")]
files_stage
['build.sh']
files_changed = [item.a_path for item in repo.index.diff(None)]
files_changed
[]
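
The queries used above can be bundled into one helper. This is only a sketch: it assumes repo is a git.Repo, change_summary is a hypothetical name, and untracked_files is a Repo attribute that is not used elsewhere on this page.

```python
from typing import Dict, List


def change_summary(repo) -> Dict[str, List[str]]:
    """Collect unstaged, staged and untracked paths from a git.Repo-like object."""
    return {
        # index vs. working tree: modified but not yet staged
        "unstaged": [item.a_path for item in repo.index.diff(None)],
        # index vs. HEAD: staged but not yet committed
        "staged": [item.a_path for item in repo.index.diff("HEAD")],
        # files git does not track at all
        "untracked": list(repo.untracked_files),
    }


# Usage (assuming a repository at dir_local):
#   from git import Repo
#   print(change_summary(Repo(dir_local)))
```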

Commit the change.

index.commit("update build.sh")
<git.Commit "bfea304786b7b77f7fe247c74040c0e23576fc41">
files_stage = [item.a_path for item in repo.index.diff("HEAD")]
files_stage
[]
remote = repo.remote()
remote
<git.Remote "origin">

Push the Commits

Push the local main2 branch to the remote main2 branch.

remote.push("main2")
[<git.remote.PushInfo at 0x11fd596d0>]

The above is equivalent to the following more detailed specification.

remote.push("refs/heads/main2:refs/heads/main2")
[<git.remote.PushInfo at 0x119903540>]

Push the local main2 branch to the remote main branch.

remote.push("refs/heads/main2:refs/heads/main")
[<git.remote.PushInfo at 0x11992d9a0>]
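
The refspec strings above follow the pattern refs/heads/LOCAL:refs/heads/REMOTE. A tiny helper (branch_refspec is a hypothetical name, not a GitPython function) makes that explicit and avoids typos when building them by hand.

```python
from typing import Optional


def branch_refspec(local: str, remote: Optional[str] = None) -> str:
    """Build a branch-to-branch refspec; the remote branch defaults to the local name."""
    remote = local if remote is None else remote
    return f"refs/heads/{local}:refs/heads/{remote}"


print(branch_refspec("main2"))
print(branch_refspec("main2", "main"))

# Usage with the remote object from above:
#   remote.push(branch_refspec("main2", "main"))
```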

Pull a Branch

repo.active_branch
<git.Head "refs/heads/main">
remote.pull(repo.active_branch)
[]
!ls {dir_local}
abc       build.sh  readme.md

git checkout

help(repo.refs[4].checkout)
Help on method checkout in module git.refs.head:

checkout(force=False, **kwargs) method of git.refs.remote.RemoteReference instance
    Checkout this head by setting the HEAD to this reference, by updating the index
    to reflect the tree we point to and by updating the working tree to reflect
    the latest index.
    
    The command will fail if changed working tree files would be overwritten.
    
    :param force:
        If True, changes to the index and the working tree will be discarded.
        If False, GitCommandError will be raised in that situation.
    
    :param kwargs:
        Additional keyword arguments to be passed to git checkout, i.e.
        b='new_branch' to create a new branch at the given spot.
    
    :return:
        The active branch after the checkout operation, usually self unless
        a new branch has been created.
        If there is no active branch, as the HEAD is now detached, the HEAD
        reference will be returned instead.
    
    :note:
        By default it is only allowed to checkout heads - everything else
        will leave the HEAD detached which is allowed and possible, but remains
        a special state that some tools might not be able to handle.

?repo.git.checkout
Signature: repo.git.checkout(*args, **kwargs)
Docstring: <no docstring>
File:      /usr/local/lib/python3.8/site-packages/git/cmd.py
Type:      function
repo.active_branch
<git.Head "refs/heads/dev">

The force=True option discards any local changes, whether or not those changes would have blocked switching branches.

repo.git.checkout("dev", force=True)
'Your branch is ahead of \'origin/dev\' by 1 commit.\n (use "git push" to publish your local commits)'
repo.git.checkout("main", force=True)
"Your branch is up to date with 'origin/main'."
repo.active_branch
<git.Head "refs/heads/dev">
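
Since force=True silently throws away local changes, a more cautious variant is to refuse the checkout when the repository is dirty. The sketch below assumes repo is a git.Repo and relies on its is_dirty method. The name checkout_if_clean is made up for illustration.

```python
def checkout_if_clean(repo, branch: str) -> str:
    """Switch to `branch` only if there are no local changes or untracked files."""
    if repo.is_dirty(untracked_files=True):
        raise RuntimeError(
            f"refusing to check out {branch!r}: the working tree has local changes"
        )
    # Without force=True, git itself also refuses to overwrite changed files.
    return repo.git.checkout(branch)


# Usage:
#   checkout_if_clean(repo, "dev")
```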

git tag

List all tags.

repo.tags
[]

Add a tag.

repo.create_tag("v1.0.0")
<git.TagReference "refs/tags/v1.0.0">
repo.tags
[<git.TagReference "refs/tags/v1.0.0">]
repo.tag("refs/tags/v1.0.0")
<git.TagReference "refs/tags/v1.0.0">
tag2 = repo.tag("refs/tags/v2.0.0")
tag2
<git.TagReference "refs/tags/v2.0.0">
repo.tags
[<git.TagReference "refs/tags/v1.0.0">]

Note that repo.tag only constructs a reference object and does not create the tag, which is why v2.0.0 is absent from repo.tags.

A GitCommandError is raised when the tag already exists.

repo.create_tag("v1.0.0")
---------------------------------------------------------------------------
GitCommandError                           Traceback (most recent call last)
<ipython-input-25-c9fa3ca924a1> in <module>
----> 1 repo.create_tag("v1.0.0")

/usr/local/lib/python3.8/site-packages/git/repo/base.py in create_tag(self, path, ref, message, force, **kwargs)
    397 
    398         :return: TagReference object """
--> 399         return TagReference.create(self, path, ref, message, force, **kwargs)
    400 
    401     def delete_tag(self, *tags):

/usr/local/lib/python3.8/site-packages/git/refs/tag.py in create(cls, repo, path, ref, message, force, **kwargs)
     81             kwargs['f'] = True
     82 
---> 83         repo.git.tag(*args, **kwargs)
     84         return TagReference(repo, "%s/%s" % (cls._common_path_default, path))
     85 

/usr/local/lib/python3.8/site-packages/git/cmd.py in <lambda>(*args, **kwargs)
    540         if name[0] == '_':
    541             return LazyMixin.__getattr__(self, name)
--> 542         return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
    543 
    544     def set_persistent_git_options(self, **kwargs):

/usr/local/lib/python3.8/site-packages/git/cmd.py in _call_process(self, method, *args, **kwargs)
   1003         call.extend(args)
   1004 
-> 1005         return self.execute(call, **exec_kwargs)
   1006 
   1007     def _parse_object_header(self, header_line):

/usr/local/lib/python3.8/site-packages/git/cmd.py in execute(self, command, istream, with_extended_output, with_exceptions, as_process, output_stream, stdout_as_string, kill_after_timeout, with_stdout, universal_newlines, shell, env, max_chunk_size, **subprocess_kwargs)
    820 
    821         if with_exceptions and status != 0:
--> 822             raise GitCommandError(command, status, stderr_value, stdout_value)
    823 
    824         if isinstance(stdout_value, bytes) and stdout_as_string:  # could also be output_stream

GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git tag v1.0.0 HEAD
  stderr: 'fatal: tag 'v1.0.0' already exists'
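
One way to avoid the error above is to look the tag up in repo.tags before creating it. This is a sketch under the assumption that repo is a git.Repo. The name ensure_tag is hypothetical, and catching git.exc.GitCommandError around create_tag would be an alternative.

```python
def ensure_tag(repo, name: str):
    """Return the existing tag called `name`, creating it at HEAD only if it is missing."""
    for tag in repo.tags:
        # TagReference.name is the short name, e.g. "v1.0.0"
        if tag.name == name:
            return tag
    return repo.create_tag(name)


# Usage:
#   ensure_tag(repo, "v1.0.0")  # safe to call repeatedly
```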
repo.remote().push("v1.0.0")
[<git.remote.PushInfo at 0x12034acc0>]

git diff

help(repo.refs[4].commit.diff)
Help on method diff in module git.diff:

diff(other: Union[Type[git.diff.Diffable.Index], Type[ForwardRef('Tree')], object, NoneType, str] = <class 'git.diff.Diffable.Index'>, paths: Union[str, List[str], Tuple[str, ...], NoneType] = None, create_patch: bool = False, **kwargs: Any) -> 'DiffIndex' method of git.objects.commit.Commit instance
    Creates diffs between two items being trees, trees and index or an
    index and the working tree. It will detect renames automatically.
    
    :param other:
        Is the item to compare us with.
        If None, we will be compared to the working tree.
        If Treeish, it will be compared against the respective tree
        If Index ( type ), it will be compared against the index.
        If git.NULL_TREE, it will compare against the empty tree.
        It defaults to Index to assure the method will not by-default fail
        on bare repositories.
    
    :param paths:
        is a list of paths or a single path to limit the diff to.
        It will only include at least one of the given path or paths.
    
    :param create_patch:
        If True, the returned Diff contains a detailed patch that if applied
        makes the self to other. Patches are somewhat costly as blobs have to be read
        and diffed.
    
    :param kwargs:
        Additional arguments passed to git-diff, such as
        R=True to swap both sides of the diff.
    
    :return: git.DiffIndex
    
    :note:
        On a bare repository, 'other' needs to be provided as Index or as
        as Tree/Commit, or a git command error will occur
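
With the default create_patch=False, the diff attribute of each returned Diff is left empty. To get the actual patch text, pass create_patch=True as described in the docstring above. The helper below is a sketch: patch_texts is a made-up name, and it assumes both arguments are git Commit objects.

```python
from typing import Dict


def patch_texts(commit_a, commit_b, paths=None) -> Dict[str, str]:
    """Map each changed path to the patch text that turns commit_a into commit_b."""
    diffs = commit_a.diff(commit_b, paths=paths, create_patch=True)
    result = {}
    for item in diffs:
        text = item.diff
        # The patch is normally bytes; decode it for printing.
        if isinstance(text, bytes):
            text = text.decode("utf-8", errors="replace")
        result[item.b_path or item.a_path] = text
    return result


# Usage:
#   for path, patch in patch_texts(repo.commit("main"), repo.commit("origin/dev")).items():
#       print(path)
#       print(patch)
```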

url = "https://github.com/dclong/docker-ubuntu_b.git"
dir_local = "/tmp/" + url[(url.rindex("/") + 1) :]
!rm -rf {dir_local}
repo = git.Repo.clone_from(url, dir_local, branch="main")
repo
<git.repo.base.Repo '/tmp/docker-ubuntu_b.git/.git'>
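
Note that the directory name derived above keeps the .git suffix (/tmp/docker-ubuntu_b.git). Since URLs work both with and without the suffix, a small helper can produce the same directory for either form. This is a sketch, and local_dir_for is a hypothetical name.

```python
import posixpath


def local_dir_for(url: str, parent: str = "/tmp") -> str:
    """Derive a local clone directory from a repository URL, ignoring a trailing .git."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return posixpath.join(parent, name)


print(local_dir_for("https://github.com/dclong/docker-ubuntu_b.git"))
print(local_dir_for("https://github.com/dclong/docker-ubuntu_b"))
```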
repo.refs
[<git.Head "refs/heads/debian">, <git.Head "refs/heads/dev">, <git.Head "refs/heads/main">, <git.RemoteReference "refs/remotes/origin/HEAD">, <git.RemoteReference "refs/remotes/origin/debian">, <git.RemoteReference "refs/remotes/origin/dev">, <git.RemoteReference "refs/remotes/origin/main">]
diffs = repo.refs[4].commit.diff(repo.refs[3].commit)
diffs
[]
diffs = repo.refs[4].commit.diff(repo.refs[2].commit)
diffs
[<git.diff.Diff at 0x7f0eb05d1a60>, <git.diff.Diff at 0x7f0eb05d1af0>]
str(diffs[0])
'Dockerfile\n=======================================================\nlhs: 100644 | 8ae5c7650a8c031a8e176d896a3665bbe7e2aae8\nrhs: 100644 | 9f2304d9a97aa1279ad1938b3bb74790172c9d8b'
repo.refs[5].name
'origin/main'
print(repo.git.status())
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean
repo.git.checkout("debian", force=True)
"Your branch is up to date with 'origin/debian'."
repo.git.checkout(b="a_new_branch", force=True)
''
nima = repo.refs[4].checkout(force=True, b="nima")
nima
diffs = nima.commit.diff(repo.refs[-1].commit)
diffs[0].diff
''

Diff the dev and the main branch, which is equivalent to the Git command git diff dev..main.

repo.refs[2].commit.diff(repo.refs[1].commit)
[]
diffs = repo.refs[2].commit.diff(repo.refs[0].commit)
diffs
[<git.diff.Diff at 0x128ca3a60>]
diffs[0]
<git.diff.Diff at 0x128ca3a60>
diffs = repo.refs[6].commit.diff(repo.refs[7].commit)
diffs
[]
diffs = repo.refs[4].commit.diff(repo.refs[7].commit)
diffs
[<git.diff.Diff at 0x128ca3790>]
diffs[0].diff
''
diffs = repo.refs[7].commit.diff(repo.refs[4].commit)
diffs
[<git.diff.Diff at 0x128ca3820>]
diffs[0].diff
''
any(ele for ele in [""])
False
repo.branches[0].name
'dev'
commit = repo.head.commit
commit
<git.Commit "6716bb0d016bd63ba543f3d9c67a65dadecd152e">
type(repo.branches[0])
git.refs.head.Head
repo.refs[4].commit.diff(repo.refs[2].commit)
[]
repo.refs[4].commit.diff(repo.refs[3].commit)
[<git.diff.Diff at 0x127370310>]
help(repo.git.branch)
Help on function <lambda> in module git.cmd:

<lambda> lambda *args, **kwargs

repo.heads
[<git.Head "refs/heads/dev">, <git.Head "refs/heads/main">]

Diff the debian and the main branches, limiting the diff to the specified paths (via the paths parameter).

diffs = repo.refs[4].commit.diff(repo.refs[2].commit, paths=["build.sh", "scripts"])
diffs
[]