Quoting from Git Internals:
Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it.
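This walkthrough looks at three things:
1. what git commands do behind the scenes
2. the internal structure of the .git folder
3. what each file inside .git is for
To demonstrate point 1, we do not use any git command for write operations. Instead we use python to simulate git's behavior, and only use a few of git's read commands to verify the results.
In other words: Git performed as a mime act, with no props.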
from IPython.display import display, Code
import tempfile
from pathlib import PurePath
import os
import subprocess
import shlex
import hashlib
import zlib
import time
import shutil
from typing import Optional, List
from display_git import display_tree, display_general
Create a temporary folder; this is where the performance begins.
base = tempfile.mkdtemp(prefix='git-mock-')
print(f'explore git at {base}')
explore git at /tmp/git-mock-xma7sdir
"""
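Run a shell command and display its output.
Mainly used to verify the results of the simulation.
"""
def run_cmd(cmd):
    proc = subprocess.run(shlex.split(cmd), capture_output=True, encoding='utf-8')
    # Show stdout when there is any, otherwise fall back to stderr
    display(Code(f'>>> {shlex.join(proc.args)}\n{proc.stdout or proc.stderr}'))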
First, check the current git version.
Different versions of git differ in some file structures, for example the index file format.
run_cmd('git --version')
>>> git --version
git version 2.25.1
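Initialize a git repository by creating the minimal set of files:
- the .git folder
- the .git/objects and .git/refs/heads folders
- config, the local configuration file
- the HEAD file, pointing at the default branch master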
git_dir = PurePath(base).joinpath('.git')
os.mkdir(git_dir)
for d in ['refs/heads', 'objects']:
    os.makedirs(git_dir.joinpath(d), exist_ok=True)
with open(git_dir.joinpath('config'), 'wt', encoding='utf-8') as f:
    f.write('''\
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = false
''')
with open(git_dir.joinpath('HEAD'), 'wt', encoding='utf-8') as f:
    f.write('ref: refs/heads/master')
Look at the current directory structure: a minimal git repository is now initialized.
run_cmd(f'tree {base} -a')
>>> tree /tmp/git-mock-xma7sdir -a
/tmp/git-mock-xma7sdir
└── .git
    ├── config
    ├── HEAD
    ├── objects
    └── refs
        └── heads
4 directories, 2 files
We are on the master branch. There is no commit history yet, and the working tree is empty as well.
The stage is set.
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch master
No commits yet
nothing to commit (create/copy files and use "git add" to track)
py_v1_text = 'print("hello")\n'
py_file_name = 'hello.py'
with open(os.path.join(base, py_file_name), 'wt', encoding='utf-8') as f:
    f.write(py_v1_text)
The file is now in the directory.
run_cmd(f'tree {base} -a')
>>> tree /tmp/git-mock-xma7sdir -a
/tmp/git-mock-xma7sdir
├── .git
│ ├── config
│ ├── HEAD
│ ├── objects
│ └── refs
│ └── heads
└── hello.py
4 directories, 3 files
Check the current repository status.
There is one untracked file, hello.py.
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
hello.py
nothing added to commit but untracked files present (use "git add" to track)
We want to track hello.py, that is, simulate what git add does. How do we do that?
git add does two things:
- it writes a blob object into .git/objects/
- it updates .git/index
Git: content-addressable file system
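This file system is a simple key-value store.
Applying the hash function sha1 to a file's content yields a hash value. That value uniquely identifies the content and can also serve as a file name.
Hash functions in the sha family are designed to be collision resistant.
In other words, in everyday use there is no practical way to stumble on two files with different content but the same hash. (Deliberately constructed SHA-1 collisions have been published, but they do not arise by accident.)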
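To make the key-value idea concrete, here is a minimal in-memory sketch. The class name `ContentStore` and its methods are hypothetical, and unlike Git it hashes the raw content with no header and keeps everything in a dict rather than on disk.

```python
import hashlib


class ContentStore:
    """Toy key-value store whose keys are SHA-1 digests of the values."""

    def __init__(self):
        self._data = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        self._data[key] = content  # storing identical content again changes nothing
        return key

    def get(self, key: str) -> bytes:
        return self._data[key]


store = ContentStore()
key_a = store.put(b'print("hello")\n')
key_b = store.put(b'print("hello")\n')        # identical content
key_c = store.put(b'print("hello world")\n')  # different content

print(key_a == key_b)  # True: same content, same key
print(key_a == key_c)  # False: different content, different key
```

Because the key is derived from the content, storing the same file twice costs nothing extra, which is the property the rest of the walkthrough relies on.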
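Now let's look at what the value is.
git stores all of a repository's content under .git/objects/, known as the object database.
The contents of .git/objects/ come in several types:
- blob
- tree
- commit
- tag
All file content in the repository is stored as blob objects and tree objects.
An analogy: tree object -> UNIX directory structure, blob object -> inode or file content.
Every object is compressed once before it is written to disk.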
"""
Compress the object content with zlib and write it under `.git/objects/`.
The file is named as follows:
- the first two characters of the sha become the folder name
- the remaining characters of the sha become the file name
"""
def write_object(raw_content: bytes, sha1: str, git_dir: PurePath) -> None:
    compressed = zlib.compress(raw_content)
    object_dir = git_dir.joinpath('objects', sha1[:2])
    os.makedirs(object_dir, exist_ok=True)
    with open(object_dir.joinpath(sha1[2:]), 'wb') as f:
        f.write(compressed)
All object types share a common layout:
<ascii type without space> + <space> + <ascii decimal size> + <byte\0> + <binary object data>
"""
Write a blob object, simulating the first part of `git add`.
For a blob object:
- <ascii type> = blob
- <binary object data> = the content of the file being added
"""
def write_blob_object(file_content: str) -> str:
    data = file_content.encode('utf-8')
    # The size in the header is a byte count, not a character count
    raw_content = f'blob {len(data)}\0'.encode('utf-8') + data
    sha1 = hashlib.sha1(raw_content).hexdigest()
    write_object(raw_content, sha1, git_dir)
    return sha1
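The header layout can be checked in isolation, without touching the disk. The sketch below uses a hypothetical helper `git_object_id` that only computes the object name. The two expected ids are well-known values: the first is what `git hash-object` reports for the text `test content` followed by a newline, the second is the id of the empty blob.

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    """Hash '<type> <size>' + NUL + data, which is how an object gets its name."""
    header = f'{obj_type} {len(data)}'.encode('ascii') + b'\0'
    return hashlib.sha1(header + data).hexdigest()


print(git_object_id('blob', b'test content\n'))
# d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(git_object_id('blob', b''))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Note that the size is the length of the byte string, so multi-byte characters count once per byte.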
Simulate git add hello.py, starting with the first part: writing the blob object.
with open(os.path.join(base, py_file_name), 'rt', encoding='utf-8') as f:
    file_content = f.read()
py_v1_blob_sha = write_blob_object(file_content)
print(py_v1_blob_sha)
11b15b1a4584b08fa423a57964bdbf018b0da0d5
See what happened.
First, blob_sha really does correspond to the original text, so the original file can be fully recovered from the blob object.
run_cmd(f'git -C {base} cat-file -p {py_v1_blob_sha}')
>>> git -C /tmp/git-mock-xma7sdir cat-file -p 11b15b1a4584b08fa423a57964bdbf018b0da0d5
print("hello")
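What `git cat-file -p` does for a loose blob can be approximated in a few lines: decompress the file, split off the header, and return the payload. This is a sketch only. The function name `read_loose_object` and the `demo_objects` directory are made up for illustration, and packed objects are not handled.

```python
import hashlib
import os
import tempfile
import zlib


def read_loose_object(objects_dir: str, sha1: str):
    """Return (type, data) for the loose object named by sha1."""
    path = os.path.join(objects_dir, sha1[:2], sha1[2:])
    with open(path, 'rb') as f:
        raw = zlib.decompress(f.read())
    header, _, data = raw.partition(b'\0')
    obj_type, size = header.decode('ascii').split(' ')
    assert int(size) == len(data), 'size in header must match payload length'
    assert hashlib.sha1(raw).hexdigest() == sha1, 'content must hash to its name'
    return obj_type, data


# Demo in a throwaway directory: store one blob, then read it back
demo_objects = tempfile.mkdtemp(prefix='objects-demo-')
payload = b'print("hello")\n'
raw = b'blob ' + str(len(payload)).encode('ascii') + b'\0' + payload
sha = hashlib.sha1(raw).hexdigest()
os.makedirs(os.path.join(demo_objects, sha[:2]))
with open(os.path.join(demo_objects, sha[:2], sha[2:]), 'wb') as f:
    f.write(zlib.compress(raw))

obj_type, data = read_loose_object(demo_objects, sha)
print(obj_type, data)
```

The two assertions inside the reader double as an integrity check: a corrupted object no longer hashes to its own file name.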
.git/objects
now has one more file, corresponding to blob_sha.
run_cmd(f'tree {base} -a')
>>> tree /tmp/git-mock-xma7sdir -a
/tmp/git-mock-xma7sdir
├── .git
│ ├── config
│ ├── HEAD
│ ├── objects
│ │ └── 11
│ │ └── b15b1a4584b08fa423a57964bdbf018b0da0d5
│ └── refs
│ └── heads
└── hello.py
5 directories, 4 files
The repository status still shows hello.py as an untracked file.
That is because we have only done the first step of git add and have not yet updated the index (staging area).
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
hello.py
nothing added to commit but untracked files present (use "git add" to track)
git stores the staging-area information in the .git/index file.
The file is laid out as follows (ref: https://git-scm.com/docs/index-format/2.25.0)
  | 0           | 4            | 8           | C                    |
  |-------------|--------------|-------------|----------------------|
0 | DIRC        | Version      | entry count | ctime             ...| 0
  | ctime_ns    | mtime        | mtime_ns    | device               |
2 | inode       | mode         | UID         | GID                  | 2
  | file size   | blob sha     | flags       | variable path name ..|
4 | ...         | NULL padding | ... another entry ...           ...| 4
  | ...         | index sha1   |
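The fixed parts of this layout are easy to read back: a 12-byte header (signature, version, entry count) at the start and a 20-byte SHA-1 of everything before it at the end. The sketch below checks both. The helper name `parse_index_header` is hypothetical, and the entries between header and checksum are deliberately not parsed.

```python
import hashlib
import struct


def parse_index_header(index_bytes: bytes):
    """Return (version, entry_count) after checking signature and checksum."""
    signature, version, entry_count = struct.unpack('>4sII', index_bytes[:12])
    if signature != b'DIRC':
        raise ValueError('not an index file')
    body, checksum = index_bytes[:-20], index_bytes[-20:]
    if hashlib.sha1(body).digest() != checksum:
        raise ValueError('index checksum mismatch')
    return version, entry_count


# Smallest version-2 index: a header with zero entries plus its checksum
empty_body = b'DIRC' + struct.pack('>II', 2, 0)
empty_index = empty_body + hashlib.sha1(empty_body).digest()
print(parse_index_header(empty_index))  # (2, 0)
```

All integers are big-endian, which is why the format string starts with `>`.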
"""
ref: https://git-scm.com/docs/index-format/2.25.0#_index_entry
"""
class IndexEntry:
    def __init__(self, path: str, blob_sha: str, base_path: str):
        self.path = path
        self.blob_sha = blob_sha
        self.base_path = base_path
    def to_bytes(self):
        def u32(value) -> bytes:
            # Each stat field is stored as a 32-bit big-endian integer;
            # wider values (e.g. large inode numbers) are truncated to fit
            return (int(value) & 0xFFFFFFFF).to_bytes(4, byteorder='big')
        stat = os.stat(self.path)
        b = u32(stat.st_ctime)
        b += u32(stat.st_ctime_ns % 10**9)
        b += u32(stat.st_mtime)
        b += u32(stat.st_mtime_ns % 10**9)
        b += u32(stat.st_dev)
        b += u32(stat.st_ino)
        b += u32(0o100644)  # regular file with permissions 0644
        b += u32(stat.st_uid)
        b += u32(stat.st_gid)
        b += u32(stat.st_size)
        b += bytes.fromhex(self.blob_sha)
        relative_path_name = os.path.relpath(self.path, self.base_path).encode('utf-8')
        # flags: 1 bit assume-valid, 1 bit extended, 2 bits merge stage, 12 bits name length
        assume_valid_flag = 0 << 3
        extended_flag = 0 << 2
        merge_stage_flag = 0
        # The name length refers to the stored (relative) path, capped at 0xfff
        name_length = min(len(relative_path_name), 0xfff)
        flags = (
            ((assume_valid_flag | extended_flag | merge_stage_flag) << 12)
            | name_length
        ).to_bytes(2, byteorder='big')
        b += flags
        b += relative_path_name
        # Pad with 1 to 8 NUL bytes so the entry length is a multiple of 8
        padding_size = 8 - (len(b) % 8)
        b += (b'\0' * padding_size)
        return b
"""
Write the index file, simulating the second part of `git add`.
"""
def write_index_file(entries: List[IndexEntry]) -> None:
    signature = b'DIRC'
    version = (2).to_bytes(4, byteorder='big')
    entries_number = len(entries).to_bytes(4, byteorder='big')
    # Index entries are sorted in ascending order on the name field
    entries = sorted(entries, key=lambda e: e.path)
    raw_content = signature + version + entries_number \
        + b''.join([e.to_bytes() for e in entries])
    # The file ends with the SHA-1 of everything that precedes it
    sha1 = hashlib.sha1(raw_content).hexdigest()
    raw_content += bytes.fromhex(sha1)
    with open(git_dir.joinpath('index'), 'wb') as f:
        f.write(raw_content)
Add hello.py to the staging area by updating .git/index.
py_v1_index_entry = IndexEntry(
    path=os.path.join(base, py_file_name),
    blob_sha=py_v1_blob_sha,
    base_path=base)
write_index_file([py_v1_index_entry])
See what happened.
There is now an index file under .git.
run_cmd(f'tree {base} -a')
>>> tree /tmp/git-mock-xma7sdir -a
/tmp/git-mock-xma7sdir
├── .git
│ ├── config
│ ├── HEAD
│ ├── index
│ ├── objects
│ │ └── 11
│ │ └── b15b1a4584b08fa423a57964bdbf018b0da0d5
│ └── refs
│ └── heads
└── hello.py
5 directories, 5 files
The repository status has changed too: hello.py is staged and ready for the next phase, commit.
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: hello.py
"""
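The structure of one entry in a tree object.
`git cat-file -p` displays it as `<mode> <object_type> <sha>\t<name>`;
on disk it is stored as `<mode> <name>`, a NUL byte, then the 20-byte binary sha.
"""
class TreeEntry:
    def __init__(self, object_type: str, name: str, sha: str):
        assert(object_type in ('tree', 'blob', 'commit', 'tag'))
        self.object_type = object_type
        self.name = name
        self.sha = sha
        # Only regular files (blob) and directories (tree) are used in this walkthrough
        self.mode = '100644' if object_type == 'blob' else '40000'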
"""
Write a tree object, simulating the first part of `git commit`.
For a tree object:
- <ascii type> = tree
- <binary object data> = sorted tree entries
"""
def write_tree_object(entries: List[TreeEntry]) -> str:
    sorted_entries = sorted(entries, key=lambda e: e.name)
    entries_content = b''.join([
        f'{e.mode} {e.name}\0'.encode('utf-8') + bytes.fromhex(e.sha)
        for e in sorted_entries
    ])
    raw_content = f'tree {len(entries_content)}\0'.encode('utf-8') + entries_content
    sha1 = hashlib.sha1(raw_content).hexdigest()
    write_object(raw_content, sha1, git_dir)
    return sha1
The repository root currently contains only hello.py, so we create a tree object for that directory structure.
py_tree_entry = TreeEntry(object_type='blob', name=py_file_name, sha=py_v1_blob_sha)
first_tree_sha = write_tree_object([py_tree_entry])
print(first_tree_sha)
30ffe02680eefd02f7ada864196baaade119243b
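See what changed.
With tree_sha we can fully reconstruct the repository's root directory. Then, by following the sha of each entry, we can recursively rebuild the directory structure of the whole repository.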
run_cmd(f'git -C {base} cat-file -p {first_tree_sha}')
>>> git -C /tmp/git-mock-xma7sdir cat-file -p 30ffe02680eefd02f7ada864196baaade119243b
100644 blob 11b15b1a4584b08fa423a57964bdbf018b0da0d5 hello.py
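The listing above is a pretty-printed view. The stored payload is a run of `<mode> <name>`, a NUL byte, and a 20-byte binary sha per entry, and it can be walked as in the following sketch. The generator name `parse_tree_body` is hypothetical, and the one-entry payload is built by hand from the blob sha shown above.

```python
def parse_tree_body(body: bytes):
    """Yield (mode, name, sha_hex) for each entry in a tree object's payload."""
    pos = 0
    while pos < len(body):
        space = body.index(b' ', pos)   # end of the mode field
        nul = body.index(b'\0', space)  # end of the name field
        mode = body[pos:space].decode('ascii')
        name = body[space + 1:nul].decode('utf-8')
        sha_hex = body[nul + 1:nul + 21].hex()  # 20 raw bytes, shown as 40 hex digits
        yield mode, name, sha_hex
        pos = nul + 21


# Build a one-entry payload by hand, then parse it back
blob_sha = '11b15b1a4584b08fa423a57964bdbf018b0da0d5'
body = b'100644 hello.py\0' + bytes.fromhex(blob_sha)
print(list(parse_tree_body(body)))
# [('100644', 'hello.py', '11b15b1a4584b08fa423a57964bdbf018b0da0d5')]
```

The object type is not stored in the entry. Tools infer it from the mode, which is why it appears in the pretty-printed listing but not in the bytes.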
Meanwhile, .git/objects
also has a new object file, corresponding to tree_sha.
run_cmd(f'tree {base} -a')
>>> tree /tmp/git-mock-xma7sdir -a
/tmp/git-mock-xma7sdir
├── .git
│ ├── config
│ ├── HEAD
│ ├── index
│ ├── objects
│ │ ├── 11
│ │ │ └── b15b1a4584b08fa423a57964bdbf018b0da0d5
│ │ └── 30
│ │ └── ffe02680eefd02f7ada864196baaade119243b
│ └── refs
│ └── heads
└── hello.py
6 directories, 6 files
Look at a diagram of the first tree.
display_tree(tree_sha=first_tree_sha, base_path=base, width='200px')
The repository status still lists hello.py under "Changes to be committed", because we have not yet done the second part: writing the commit object.
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: hello.py
my_name = 'Soros Liu'
my_email = 'soros.liu1029@gmail.com'
"""
Write a commit object, simulating the second part of `git commit`.
For a commit object:
- <ascii type> = commit
- <binary object data> = see below
The binary object data has the following format
(the parent line is omitted for the first commit):
```
tree <tree_sha>
parent <parent_commmit_sha>
author <author_name> <author_email> <timestamp>
committer <committer_name> <committer_email> <timestamp>

<commit_message>
```
"""
def write_commit_object(tree_sha: str, parent_commmit_sha: Optional[str], msg: str) -> str:
    # Use one timestamp for both author and committer; the +0800 offset is hard-coded
    timestamp = int(time.time())
    commit = f'tree {tree_sha}\n' + \
        (f'parent {parent_commmit_sha}\n' if parent_commmit_sha else '') + \
        f'author {my_name} <{my_email}> {timestamp} +0800\n' + \
        f'committer {my_name} <{my_email}> {timestamp} +0800\n' + \
        '\n' + \
        msg + \
        '\n'
    commit_content = commit.encode('utf-8')
    raw_content = f'commit {len(commit_content)}\0'.encode('utf-8') + commit_content
    sha1 = hashlib.sha1(raw_content).hexdigest()
    write_object(raw_content, sha1, git_dir)
    return sha1
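Since this is the repository's first commit, there is no parent_commit_sha.
Write a commit message and commit the current snapshot of the repository.
The tree sha alone is enough: from it we can find the matching tree object and rebuild the entire repository content.
first_commit_sha = write_commit_object(
    tree_sha=first_tree_sha,
    parent_commmit_sha=None,
    msg='first commit')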
print(first_commit_sha)
5b28d3c9988aa1427fe3afe6515869672ef50884
Use commit_sha
to verify that the commit object was written successfully.
run_cmd(f'git -C {base} cat-file -p {first_commit_sha}')
>>> git -C /tmp/git-mock-xma7sdir cat-file -p 5b28d3c9988aa1427fe3afe6515869672ef50884
tree 30ffe02680eefd02f7ada864196baaade119243b
author Soros Liu <soros.liu1029@gmail.com> 1635143887 +0800
committer Soros Liu <soros.liu1029@gmail.com> 1635143887 +0800
first commit
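The commit payload is plain text: header lines, one blank line, then the message. A rough parser is sketched below under simplifying assumptions. The name `parse_commit` is hypothetical, the author identity in the sample is a placeholder, and multi-line headers such as signatures are not handled.

```python
def parse_commit(payload: str):
    """Split a commit payload into (headers, message)."""
    head, _, message = payload.partition('\n\n')
    headers = {}
    for line in head.split('\n'):
        key, _, value = line.partition(' ')
        headers.setdefault(key, []).append(value)  # a merge commit has several parent lines
    return headers, message


sample = (
    'tree 30ffe02680eefd02f7ada864196baaade119243b\n'
    'author A U Thor <author@example.com> 1635143887 +0800\n'
    'committer A U Thor <author@example.com> 1635143887 +0800\n'
    '\n'
    'first commit\n'
)
headers, message = parse_commit(sample)
print(headers['tree'])        # ['30ffe02680eefd02f7ada864196baaade119243b']
print(headers.get('parent'))  # None: a first commit has no parent line
print(repr(message))          # 'first commit\n'
```

The only link from a commit to the files is the single tree line, and the only link to history is the parent line.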
.git/objects
has gained yet another object, corresponding to commit_sha.
run_cmd(f'tree {base} -a')
>>> tree /tmp/git-mock-xma7sdir -a
/tmp/git-mock-xma7sdir
├── .git
│ ├── config
│ ├── HEAD
│ ├── index
│ ├── objects
│ │ ├── 11
│ │ │ └── b15b1a4584b08fa423a57964bdbf018b0da0d5
│ │ ├── 30
│ │ │ └── ffe02680eefd02f7ada864196baaade119243b
│ │ └── 5b
│ │ └── 28d3c9988aa1427fe3afe6515869672ef50884
│ └── refs
│ └── heads
└── hello.py
7 directories, 7 files
However, the repository status still shows "Changes to be committed".
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: hello.py
Look at git log: it fails. Why?
run_cmd(f'git -C {base} log')
>>> git -C /tmp/git-mock-xma7sdir log
fatal: your current branch 'master' does not have any commits yet
run_cmd(f'cat {base}/.git/HEAD')
>>> cat /tmp/git-mock-xma7sdir/.git/HEAD
ref: refs/heads/master
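HEAD points at refs/heads/master, but that file does not exist yet, so git cannot find any commit on the branch.
We write the sha of the first commit into the file that HEAD points to.
with open(git_dir.joinpath('refs', 'heads', 'master'), 'wt', encoding='utf-8') as f:
    f.write(first_commit_sha)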
Check the repository again.
The file that HEAD points to has now been written under .git/refs/heads.
run_cmd(f'tree {base} -a')
>>> tree /tmp/git-mock-xma7sdir -a
/tmp/git-mock-xma7sdir
├── .git
│ ├── config
│ ├── HEAD
│ ├── index
│ ├── objects
│ │ ├── 11
│ │ │ └── b15b1a4584b08fa423a57964bdbf018b0da0d5
│ │ ├── 30
│ │ │ └── ffe02680eefd02f7ada864196baaade119243b
│ │ └── 5b
│ │ └── 28d3c9988aa1427fe3afe6515869672ef50884
│ └── refs
│ └── heads
│ └── master
└── hello.py
7 directories, 8 files
The repository status is finally "nothing to commit, working tree clean".
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch master
nothing to commit, working tree clean
And we have our first git log entry!
run_cmd(f'git -C {base} log')
>>> git -C /tmp/git-mock-xma7sdir log
commit 5b28d3c9988aa1427fe3afe6515869672ef50884
Author: Soros Liu <soros.liu1029@gmail.com>
Date: Mon Oct 25 14:38:07 2021 +0800
first commit
Add a README file to simulate an everyday git workflow.
README.md
md_text = '## Explore Python\n'
md_file_name = 'README.md'
with open(os.path.join(base, md_file_name), 'wt', encoding='utf-8') as f:
    f.write(md_text)
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch master
Untracked files:
(use "git add <file>..." to include in what will be committed)
README.md
nothing added to commit but untracked files present (use "git add" to track)
git add README.md:
- add a blob object file under .git/objects/
- update the .git/index file
with open(os.path.join(base, md_file_name), 'rt', encoding='utf-8') as f:
    file_content = f.read()
md_blob_sha = write_blob_object(file_content)
print(md_blob_sha)
799edde33b434795e10848fbd25bbba1d102c44f
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch master
Untracked files:
(use "git add <file>..." to include in what will be committed)
README.md
nothing added to commit but untracked files present (use "git add" to track)
run_cmd(f'tree {base} -a')
>>> tree /tmp/git-mock-xma7sdir -a
/tmp/git-mock-xma7sdir
├── .git
│ ├── config
│ ├── HEAD
│ ├── index
│ ├── objects
│ │ ├── 11
│ │ │ └── b15b1a4584b08fa423a57964bdbf018b0da0d5
│ │ ├── 30
│ │ │ └── ffe02680eefd02f7ada864196baaade119243b
│ │ ├── 5b
│ │ │ └── 28d3c9988aa1427fe3afe6515869672ef50884
│ │ └── 79
│ │ └── 9edde33b434795e10848fbd25bbba1d102c44f
│ └── refs
│ └── heads
│ └── master
├── hello.py
└── README.md
8 directories, 10 files
md_index_entry = IndexEntry(
    path=os.path.join(base, md_file_name),
    blob_sha=md_blob_sha,
    base_path=base)
write_index_file([py_v1_index_entry, md_index_entry])
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch master
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: README.md
git commit -m "second commit":
- add a tree object file under .git/objects/
- add a commit object file under .git/objects/
- update the commit sha stored in .git/refs/heads/master, the file that .git/HEAD points to
md_tree_entry = TreeEntry(object_type='blob', name=md_file_name, sha=md_blob_sha)
second_tree_sha = write_tree_object([py_tree_entry, md_tree_entry])
print(second_tree_sha)
0ba40bfddcda73cf6ce598e82b4d3de9d0cc7065
run_cmd(f'git -C {base} cat-file -p {second_tree_sha}')
>>> git -C /tmp/git-mock-xma7sdir cat-file -p 0ba40bfddcda73cf6ce598e82b4d3de9d0cc7065
100644 blob 799edde33b434795e10848fbd25bbba1d102c44f README.md
100644 blob 11b15b1a4584b08fa423a57964bdbf018b0da0d5 hello.py
run_cmd(f'tree {base} -a')
>>> tree /tmp/git-mock-xma7sdir -a
/tmp/git-mock-xma7sdir
├── .git
│ ├── config
│ ├── HEAD
│ ├── index
│ ├── objects
│ │ ├── 0b
│ │ │ └── a40bfddcda73cf6ce598e82b4d3de9d0cc7065
│ │ ├── 11
│ │ │ └── b15b1a4584b08fa423a57964bdbf018b0da0d5
│ │ ├── 30
│ │ │ └── ffe02680eefd02f7ada864196baaade119243b
│ │ ├── 5b
│ │ │ └── 28d3c9988aa1427fe3afe6515869672ef50884
│ │ └── 79
│ │ └── 9edde33b434795e10848fbd25bbba1d102c44f
│ └── refs
│ └── heads
│ └── master
├── hello.py
└── README.md
9 directories, 11 files
Now look at a diagram of this tree.
display_tree(tree_sha=second_tree_sha, base_path=base, width='500px')
second_commit_sha = write_commit_object(
    tree_sha=second_tree_sha,
    parent_commmit_sha=first_commit_sha,
    msg='second commit')
print(second_commit_sha)
e9e5df04ab501bec4525967eac89d1bb29c425d3
run_cmd(f'git -C {base} cat-file -p {second_commit_sha}')
>>> git -C /tmp/git-mock-xma7sdir cat-file -p e9e5df04ab501bec4525967eac89d1bb29c425d3
tree 0ba40bfddcda73cf6ce598e82b4d3de9d0cc7065
parent 5b28d3c9988aa1427fe3afe6515869672ef50884
author Soros Liu <soros.liu1029@gmail.com> 1635143888 +0800
committer Soros Liu <soros.liu1029@gmail.com> 1635143888 +0800
second commit
run_cmd(f'tree {base} -a')
>>> tree /tmp/git-mock-xma7sdir -a
/tmp/git-mock-xma7sdir
├── .git
│ ├── config
│ ├── HEAD
│ ├── index
│ ├── objects
│ │ ├── 0b
│ │ │ └── a40bfddcda73cf6ce598e82b4d3de9d0cc7065
│ │ ├── 11
│ │ │ └── b15b1a4584b08fa423a57964bdbf018b0da0d5
│ │ ├── 30
│ │ │ └── ffe02680eefd02f7ada864196baaade119243b
│ │ ├── 5b
│ │ │ └── 28d3c9988aa1427fe3afe6515869672ef50884
│ │ ├── 79
│ │ │ └── 9edde33b434795e10848fbd25bbba1d102c44f
│ │ └── e9
│ │ └── e5df04ab501bec4525967eac89d1bb29c425d3
│ └── refs
│ └── heads
│ └── master
├── hello.py
└── README.md
10 directories, 12 files
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch master
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: README.md
with open(git_dir.joinpath('refs', 'heads', 'master'), 'wt', encoding='utf-8') as f:
    f.write(second_commit_sha)
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch master
nothing to commit, working tree clean
run_cmd(f'git -C {base} log')
>>> git -C /tmp/git-mock-xma7sdir log
commit e9e5df04ab501bec4525967eac89d1bb29c425d3
Author: Soros Liu <soros.liu1029@gmail.com>
Date: Mon Oct 25 14:38:08 2021 +0800
second commit
commit 5b28d3c9988aa1427fe3afe6515869672ef50884
Author: Soros Liu <soros.liu1029@gmail.com>
Date: Mon Oct 25 14:38:07 2021 +0800
first commit
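Simulate switching to a new branch: new-idea
The .git/HEAD
file stores the name of the current branch. Through this indirect reference we can find the commit sha of the current version.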
run_cmd(f'cat {base}/.git/HEAD')
run_cmd(f'git -C {base} branch --show-current')
run_cmd(f'git -C {base} rev-parse HEAD')
>>> cat /tmp/git-mock-xma7sdir/.git/HEAD
ref: refs/heads/master
>>> git -C /tmp/git-mock-xma7sdir branch --show-current
master
>>> git -C /tmp/git-mock-xma7sdir rev-parse HEAD
e9e5df04ab501bec4525967eac89d1bb29c425d3
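The indirection from HEAD to a branch file to a commit sha can be followed by hand. The sketch below uses a hypothetical helper `resolve_head` on a throwaway directory. It only looks at loose ref files, so refs that git has moved into a packed-refs file are not covered.

```python
import os
import tempfile


def resolve_head(git_dir: str):
    """Return (branch_ref or None, commit_sha or None) for a repository's HEAD."""
    with open(os.path.join(git_dir, 'HEAD'), 'rt', encoding='utf-8') as f:
        head = f.read().strip()
    if not head.startswith('ref: '):
        return None, head  # a detached HEAD stores a commit sha directly
    ref = head[len('ref: '):]
    ref_path = os.path.join(git_dir, *ref.split('/'))
    if not os.path.exists(ref_path):
        return ref, None  # the branch exists in name only: no commits yet
    with open(ref_path, 'rt', encoding='utf-8') as f:
        return ref, f.read().strip()


# Demo on a throwaway directory laid out like the repository above
demo_git_dir = tempfile.mkdtemp(prefix='head-demo-')
os.makedirs(os.path.join(demo_git_dir, 'refs', 'heads'))
with open(os.path.join(demo_git_dir, 'HEAD'), 'wt', encoding='utf-8') as f:
    f.write('ref: refs/heads/master')
print(resolve_head(demo_git_dir))  # ('refs/heads/master', None)

with open(os.path.join(demo_git_dir, 'refs', 'heads', 'master'), 'wt', encoding='utf-8') as f:
    f.write('e9e5df04ab501bec4525967eac89d1bb29c425d3')
print(resolve_head(demo_git_dir))
# ('refs/heads/master', 'e9e5df04ab501bec4525967eac89d1bb29c425d3')
```

The first call reproduces the earlier "does not have any commits yet" situation: HEAD names a branch whose ref file has not been written.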
Create the branch new-idea on top of the master branch.
It only takes two steps:
- copy the .git/refs/heads/master file to .git/refs/heads/new-idea
- change the content of the .git/HEAD file so that HEAD points to the new-idea branch
shutil.copy(
    git_dir.joinpath('refs', 'heads', 'master'),
    git_dir.joinpath('refs', 'heads', 'new-idea')
)
with open(git_dir.joinpath('HEAD'), 'wt', encoding='utf-8') as f:
    f.write('ref: refs/heads/new-idea')
Check the current branch with git commands: the switch succeeded, and HEAD points to the same commit as the tip of the master branch.
run_cmd(f'git -C {base} branch --show-current')
run_cmd(f'git -C {base} rev-parse HEAD')
run_cmd(f'git -C {base} log --oneline --decorate --graph')
>>> git -C /tmp/git-mock-xma7sdir branch --show-current
new-idea
>>> git -C /tmp/git-mock-xma7sdir rev-parse HEAD
e9e5df04ab501bec4525967eac89d1bb29c425d3
>>> git -C /tmp/git-mock-xma7sdir log --oneline --decorate --graph
* e9e5df0 (HEAD -> new-idea, master) second commit
* 5b28d3c first commit
We make some changes on the new branch new-idea, then merge them into the main branch master.
py_v2_text = 'print("hello world")\n'
with open(os.path.join(base, py_file_name), 'wt', encoding='utf-8') as f:
    f.write(py_v2_text)
Check the repository status again.
Unlike the earlier case of a newly created file, hello.py is already tracked by git, so after this modification its status is "Changes not staged for commit".
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch new-idea
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: hello.py
no changes added to commit (use "git add" and/or "git commit -a")
Now look at the git diff command. Here diff compares hello.py in the working directory with hello.py in the staging area.
run_cmd(f'git -C {base} diff')
>>> git -C /tmp/git-mock-xma7sdir diff
diff --git a/hello.py b/hello.py
index 11b15b1..8cde782 100644
--- a/hello.py
+++ b/hello.py
@@ -1 +1 @@
-print("hello")
+print("hello world")
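The hunk above can be reproduced with Python's standard library. This is only a stand-in to show what is being compared: git uses its own diff implementation, and the `diff --git` and `index` header lines are not generated here. The variable names are illustrative.

```python
import difflib

old_text = 'print("hello")\n'        # content recorded in the staging area
new_text = 'print("hello world")\n'  # content in the working directory

diff_lines = list(difflib.unified_diff(
    old_text.splitlines(keepends=True),
    new_text.splitlines(keepends=True),
    fromfile='a/hello.py',
    tofile='b/hello.py',
))
print(''.join(diff_lines), end='')
```

The `index 11b15b1..8cde782` line in git's output names the two blobs being compared: the staged one and the one git computes on the fly from the working-directory file.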
We are ready to commit the change made above.
First comes git add hello.py, which stages the working-directory hello.py into the staging area. It again takes two steps:
- add a blob object file under .git/objects/
- update the .git/index file, which stores the state of the staging area
with open(os.path.join(base, py_file_name), 'rt', encoding='utf-8') as f:
    file_content = f.read()
py_v2_blob_sha = write_blob_object(file_content)
print(py_v2_blob_sha)
8cde7829c178ede96040e03f17c416d15bdacd01
The staging area now holds two files:
- the README.md file
- the hello.py file
py_v2_index_entry = IndexEntry(
    path=os.path.join(base, py_file_name),
    blob_sha=py_v2_blob_sha,
    base_path=base)
write_index_file([py_v2_index_entry, md_index_entry])
Check the repository status again.
hello.py is now staged and ready to be committed.
run_cmd(f'git -C {base} status')
>>> git -C /tmp/git-mock-xma7sdir status
On branch new-idea
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: hello.py
Committing hello.py again takes three steps:
- add a tree object file under .git/objects/
- add a commit object file under .git/objects/
- update the commit sha stored in .git/refs/heads/new-idea, the file that .git/HEAD points to
py_tree_entry = TreeEntry(object_type='blob', name=py_file_name, sha=py_v2_blob_sha)
third_tree_sha = write_tree_object([py_tree_entry, md_tree_entry])
print('tree sha:', third_tree_sha)
third_commit_sha = write_commit_object(
    tree_sha=third_tree_sha,
    parent_commmit_sha=second_commit_sha,
    msg='third commit')
print('commit sha:', third_commit_sha)
with open(git_dir.joinpath('refs', 'heads', 'new-idea'), 'wt', encoding='utf-8') as f:
    f.write(third_commit_sha)
tree sha: b2c193ca98949e609761b7302867b33b81be3242
commit sha: e546bf5191299064c4005517ffa21da992f23401
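After the commit, the repository status is clean.
git log also shows that the new-idea branch has a new commit, while the original master branch stays at the second commit.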
run_cmd(f'git -C {base} status')
run_cmd(f'git -C {base} log --oneline --decorate --graph')
>>> git -C /tmp/git-mock-xma7sdir status
On branch new-idea
nothing to commit, working tree clean
>>> git -C /tmp/git-mock-xma7sdir log --oneline --decorate --graph
* e546bf5 (HEAD -> new-idea) third commit
* e9e5df0 (master) second commit
* 5b28d3c first commit
Look at the current structure of the repository.
display_general(base_path=base, width='100%')
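Next, we merge the changes on new-idea into the master branch.
The changes on new-idea are based on master, and master itself has no new commits since the branch was created.
On the commit graph shown by git log, this appears as a linear history.
In this situation git merge uses the fast-forward strategy, which is also the simplest one:
move the commit that master points to up to the commit that new-idea points to.
So all we need to do is copy the .git/refs/heads/new-idea file to the .git/refs/heads/master file.
Note that this is the reverse of the copy we did above when creating the branch with git checkout -b.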
shutil.copy(
    git_dir.joinpath('refs', 'heads', 'new-idea'),
    git_dir.joinpath('refs', 'heads', 'master')
)
Look at git log again: both the master branch and the new-idea branch are now on the latest commit.
run_cmd(f'git -C {base} log --oneline --decorate --graph')
>>> git -C /tmp/git-mock-xma7sdir log --oneline --decorate --graph
* e546bf5 (HEAD -> new-idea, master) third commit
* e9e5df0 second commit
* 5b28d3c first commit
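A fast-forward is only possible when the commit master pointed to is an ancestor of the commit new-idea points to, which is what a linear history means. That condition can be sketched as a walk over parent links. The function name `is_ancestor` and the hand-written `parents` map are illustrative; a real check would read the parent lines from the commit objects.

```python
def is_ancestor(parents: dict, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` by following parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return False


# Abbreviated shas from the log above; each commit maps to its list of parents
parents = {
    '5b28d3c': [],
    'e9e5df0': ['5b28d3c'],
    'e546bf5': ['e9e5df0'],
}
# Before the merge, master was at e9e5df0 and new-idea at e546bf5
print(is_ancestor(parents, 'e9e5df0', 'e546bf5'))  # True: master can fast-forward
print(is_ancestor(parents, 'e546bf5', 'e9e5df0'))  # False: the reverse is not a fast-forward
```

When the check fails in both directions, the two branches have diverged and moving a ref is no longer enough.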
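On the other hand, when the commit graph shown by git log forks, that is, when the two branches being merged each have their own independent commits, the strategy used is recursive.
We will not go into that here.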