2023-08-03

A Systematic Introduction to Git

Headnote :

Document initialised 2023-AUG-(03-to-08) : because I've been using Git on and off for twelve years without knowing how it works.

Some related references : 

This document introduces Git and its file storage structure; this is not a quick introduction to simply using Git. It touches upon both porcelain and plumbing; the focus is on the porcelain, with intermediate-student-level plumbing references to explain how the porcelain is implemented.


Map of Files versus [ Sections of this Document ]

                      <<Misc.>>   <<Refs>>    <<Heads>>

+ a_main_work_tree/   (2.)
  + .gitattributes    (8.)
  + .gitignore        (7.)
  + .git/ ........... (3.)
    |
    + branches/       (deprecated)
    + hooks/ ........ (6.)
    + info/           (9.)
    + objects/ ...... (3.1.)
    + rebase-apply/   (XXX)
    + refs/                       (3.2.)
    | |
    | + heads/ .............................. (3.2.3.)
    | + remotes/                  (3.2.2.)
    | + stash ................... (3.2.5.)
    | + tags/                     (3.2.1.)
    |
    + CHERRY_PICK_HEAD ...................... (3.2.3.b.1.)
    + COMMIT_EDITMSG  (4.)
    + config ........ (5.)
    + description     (3.4.)
    + FETCH_HEAD  ........................... (3.2.3.b.2.)
    + HEAD                                    (3.2.3.b.)
    + index ......... (3.3.)
    + logs/                       (3.2.4.)
    + MERGE_HEAD ............................ (3.2.3.b.3.)
    + MERGE_MSG       (4.1.)
    + ORIG_HEAD ............................. (3.2.3.b.4.)
    + SQUASH_MSG      (4.2.)
    + packed-refs ............... (3.2.a.)

0.

Git USERS ... that's some human, or machine.

1.

Git EXECUTABLES ... are OPERATOR scripts or binaries which take user commands as input, and perform operations as output; this is the git software which "does things"; besides taking commands from shells, or STDIN in general, a TCP server can be started with (git daemon).

2.

Git MAIN WORK TREES a.k.a. WORKING DIRECTORIES a.k.a. WORKING COPIES a.k.a. PROJECT DIRECTORIES ( informally, a.k.a. GIT ROOT ) ... typically located at "a_main_work_tree/", as the parent folders of REPOSITORIES ( "a_main_work_tree/.git/" ) ... are OPERANDS of git executables; users typically mess around here, and selectively include or exclude files to be STAGED a.k.a. INDEXED (see 3.3. below); a file is TRACKED once the INDEX knows about it.

2.1.

( A REPOSITORY can support only one MAIN WORK TREE, but it can also support MULTIPLE LINKED WORK TREES managed via (git worktree); this is an intermediate-student-level pattern. By convention, "the WORK TREE" refers only to the MAIN WORK TREE. )

3.

Git REPOSITORIES a.k.a. REPOS a.k.a. GIT DIRECTORIES ... typically named ".git" ... are OPERANDS of git executables; a repository stores its data within a single folder; typically, for convenience, that folder is kept in the MAIN WORK TREE as a hidden folder, but it is not itself a logical component of the MAIN WORK TREE; a repository is initialised with (git init), or cloned with (git clone).

3.1.

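The OBJECT DATABASE ... typically stored at ".git/objects/" ... contains four types of objects ( 3.1.1. to 3.1.4. ), each named by the hash of its content, and PACKFILES ( 3.1.5. ) which bundle them.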

  3.1.1. BLOBS ... are file contents, without file names or modes; like all loose objects, they are stored zlib-compressed.

  3.1.2. TREES ... are lists of named entries, each pointing to a BLOB or another TREE ( like directories ); file names and modes live here.

  3.1.3. COMMITS a.k.a. CHECK-INS a.k.a. SAVEPOINTS ... consist of :

    - a reference to the top-level TREE being controlled;

    - references to zero or more parent COMMITS;

    - author details ( name, email, timestamp );

    - committer details ( name, email, timestamp );

    - a log message.

  3.1.4. ANNOTATED TAGS a.k.a. TAG OBJECTS ... created via (git tag -a) ... being different from LIGHTWEIGHT TAGS ( see 3.2.1. below ) ... consist of :

    - a reference to another OBJECT;

    - metadata relating to the OBJECT : tag name, tagger name, tagger email, date, message;

    - optionally, a GPG signature; commonly used to sign releases.

  3.1.5. PACKFILES ... typically stored at ".git/objects/pack/" ... are delta-compressed bundles of other objects, created via (git gc) or (git repack).
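
The naming and storage of loose objects can be sketched in a few lines of Python. This is a minimal sketch, assuming a SHA-1 repository; the helper names ( encode_object, object_id, loose_path, tree_body, commit_body ), the file name "hello.txt" and the author string are all hypothetical, chosen only for illustration. Each object is a header "<type> <size>", a NUL byte, then the body; the hash of that whole byte string is the object's name.

```python
import hashlib
import zlib


def encode_object(kind: str, body: bytes) -> bytes:
    """Prefix the body with Git's header: '<type> <size>' then a NUL byte."""
    return f"{kind} {len(body)}".encode() + b"\x00" + body


def object_id(kind: str, body: bytes) -> str:
    """The SHA-1 of header plus body is the object's name."""
    return hashlib.sha1(encode_object(kind, body)).hexdigest()


def loose_path(oid: str) -> str:
    """Loose objects are filed under the first two hex digits of their name."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """entries are (mode, name, oid); each is stored as '<mode> <name>',
    a NUL byte, then the 20 raw bytes of the entry's object name.
    Simplification: only a flat list of files, sorted by plain name."""
    out = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
    return out


def commit_body(tree: str, parents: list[str], author: str,
                committer: str, message: str) -> bytes:
    """A commit is plain text: tree, parents, author, committer, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return ("\n".join(lines) + "\n").encode()


# Hypothetical content: one file, one tree, one root commit.
blob_oid = object_id("blob", b"hello world\n")
tree_oid = object_id("tree", tree_body([("100644", "hello.txt", blob_oid)]))
who = "A U Thor <author@example.com> 1691020800 +0000"
commit_oid = object_id("commit", commit_body(tree_oid, [], who, who, "Initial commit"))

# What would be written to disk for the blob: the zlib-compressed encoding.
stored = zlib.compress(encode_object("blob", b"hello world\n"))

print(blob_oid, loose_path(blob_oid))
print(tree_oid)
print(commit_oid)
```

Note that the BLOB knows nothing about "hello.txt"; the name and mode appear only in the TREE entry, and the COMMIT in turn names only the TREE.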

3.2.

REFS a.k.a. REFERENCES a.k.a. LABELS ... typically stored at ".git/refs/" ... are names which point to OBJECTS, almost always COMMITS.

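  3.2.a. PACKED REFS ... typically stored at ".git/packed-refs" ... is a single file holding many REFS, as a performance enhancement over keeping one file per REF under ".git/refs/" ... managed via (git pack-refs); where a REF exists in both places, the file under ".git/refs/" wins.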
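
A minimal sketch of reading ".git/packed-refs" follows, assuming the usual text layout of one "<object-id> <refname>" pair per line, a "#" header line, and "^<object-id>" lines giving the COMMIT behind the annotated tag on the preceding line. The function names and the object IDs in the sample are hypothetical.

```python
def parse_packed_refs(text: str) -> dict[str, str]:
    """Map each ref name to the object ID recorded for it."""
    refs = {}
    for line in text.splitlines():
        # Skip blanks, the '#' header, and '^' peeled-tag lines.
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        oid, name = line.split(" ", 1)
        refs[name] = oid
    return refs


def resolve(name: str, loose: dict[str, str], packed: dict[str, str]):
    """A loose ref file, if present, takes precedence over the packed entry."""
    return loose.get(name, packed.get(name))


# Hypothetical object IDs, for illustration only.
OID_MAIN, OID_TAG, OID_PEELED, OID_NEWER = "1" * 40, "2" * 40, "3" * 40, "4" * 40

SAMPLE = "\n".join([
    "# pack-refs with: peeled fully-peeled sorted",
    f"{OID_MAIN} refs/heads/main",
    f"{OID_TAG} refs/tags/v1.0",
    f"^{OID_PEELED}",
])

packed = parse_packed_refs(SAMPLE)
loose = {"refs/heads/main": OID_NEWER}  # main has moved since the last pack

print(resolve("refs/heads/main", loose, packed))
print(resolve("refs/tags/v1.0", loose, packed))
```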

3.2.1. 

LIGHTWEIGHT TAGS ... typically stored at ".git/refs/tags/" ... are like ANNOTATED TAGS ( see 3.1.4. above ), but are simply REFS to an object, without other metadata.

3.2.2.

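REMOTES ... are named OTHER REPOSITORIES, usually based at an Internet address, whose names and URLs are recorded in the REPOSITORY CONFIG ( see 5. below ); REMOTE-TRACKING BRANCHES ... typically stored at ".git/refs/remotes/" ... are REFS recording where each REMOTE BRANCH pointed when last contacted; (git push) is used to update REMOTE BRANCHES based on LOCAL BRANCHES :

    - conventionally, "origin" is the name given to the REMOTE from which the local repository was cloned;

    - conventionally, "upstream" is the name given to the original REMOTE from which "origin" was forked.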

3.2.3.

BRANCH NAMES a.k.a. NAMED BRANCH TIPS a.k.a. NAMED BRANCH "heads" ... are stored at ".git/refs/heads/".

3.2.3.a.

BRANCHES ... refer to the chain of ancestor COMMITS reachable from the COMMIT pointed to by a specific NAMED BRANCH TIP, where the NAMED BRANCH TIP is the NEWEST COMMIT ( youngest descendant ); chains of ancestor COMMITS reachable from COMMITS which are not NAMED BRANCH TIPS are not formally regarded as BRANCHES; informally ( practically ) everyone refers to BRANCHES by their respective BRANCH NAME.

3.2.3.b. 

HEAD ... typically stored at ".git/HEAD" ... identifies the COMMIT which will be the parent for any NEXT COMMIT executed by the USER; it does so either indirectly, by naming a BRANCH NAME ( file content like "ref: refs/heads/a_branch_name" ), or directly, by holding a bare COMMIT ID.

  • The CURRENT BRANCH refers to [ the BRANCH, whose BRANCH NAME is named by HEAD ]; there is no CURRENT BRANCH while HEAD holds a bare COMMIT ID.
  • a REPOSITORY is in ATTACHED HEAD STATE when HEAD names a BRANCH NAME; this is the most common state for a REPOSITORY; each NEW COMMIT then advances that BRANCH NAME.

            : ATTACHED HEAD STATE, can be arrived at via (git checkout EXISTING_BRANCH_NAME) a.k.a. (git switch EXISTING_BRANCH_NAME), which repoints HEAD to EXISTING_BRANCH_NAME, and updates the INDEX and MAIN WORK TREE to reflect its COMMIT.

            : (git reset) neither attaches nor detaches HEAD; in ATTACHED HEAD STATE, it repoints the CURRENT BRANCH NAME to another COMMIT.

  • a REPOSITORY is in DETACHED HEAD STATE when HEAD holds a bare COMMIT ID, regardless of whether some BRANCH NAME also points to that COMMIT; this is a normal, but uncommon, state for a REPOSITORY.

            : DETACHED HEAD STATE, can be arrived at via (git checkout a_commit_id) or (git checkout --detach a_branch_name), which repoints HEAD to the COMMIT, and updates the INDEX and MAIN WORK TREE to reflect that COMMIT.

            : in DETACHED HEAD STATE, running (git branch a_new_branch_name) will create a NEW BRANCH NAME pointing to this COMMIT, but HEAD stays detached; running (git checkout -b a_new_branch_name) a.k.a. (git switch -c a_new_branch_name) creates the BRANCH NAME and also attaches HEAD to it.
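
On disk, the distinction is visible in the content of ".git/HEAD" itself: either a line beginning "ref: " which names a BRANCH NAME, or a bare COMMIT ID. A minimal sketch of telling the two apart follows, assuming a SHA-1 repository ( 40 hex digits ); the function name describe_head and the sample values are hypothetical.

```python
import re


def describe_head(content: str) -> tuple[str, str]:
    """Classify the content of .git/HEAD as attached or detached."""
    text = content.strip()
    if text.startswith("ref: "):
        # Symbolic ref: HEAD follows whatever the named branch points to.
        return ("attached", text[len("ref: "):])
    if re.fullmatch(r"[0-9a-f]{40}", text):
        # Bare commit ID: no branch name moves when a new commit is made.
        return ("detached", text)
    raise ValueError(f"unrecognised HEAD content: {content!r}")


print(describe_head("ref: refs/heads/main\n"))
print(describe_head("3b18e512dba79e4c8300dd08aeb37f8e728b8dad\n"))
```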

3.2.3.b.1.

CHERRY_PICK_HEAD ... typically stored at ".git/CHERRY_PICK_HEAD" ... points to the child COMMIT which was cherry-picked, when a cherry-picking operation is paused, for the USER to intervene in the MAIN WORK TREE, due to CONFLICTS. 

    : When (git cherry-pick) is given a COMMIT_child, it first creates a PATCH encapsulating the changes between COMMIT_child and its COMMIT_parent, then applies that PATCH to the MAIN WORK TREE and INDEX, and then creates a NEW COMMIT; the new COMMIT becomes the new referent of HEAD, as usual. 

3.2.3.b.2.

FETCH_HEAD ... typically stored at ".git/FETCH_HEAD" ... records the tip COMMITS of the REFS from a REMOTE REPOSITORY which were most recently (git fetch)-ed.

3.2.3.b.3.

MERGE_HEAD ... typically stored at ".git/MERGE_HEAD" ... points to the COMMITS which were targeted for merging into HEAD ... when a TRUE MERGE operation is paused, for the USER to intervene in the MAIN WORK TREE, due to CONFLICTS.

    : When (git merge) is given a COMMIT_target to be merged with HEAD, it first executes pre-merge checks, then either [ executes a FAST-FORWARD MERGE ... meaning that HEAD is an ancestor of COMMIT_target, so that there is nothing to reconcile, and the CURRENT BRANCH will be simply repointed to COMMIT_target ], or [ attempts to execute a TRUE MERGE ]; if COMMIT_target is already an ancestor of HEAD, there is nothing to do.

    : If USER intervention is required, then CONFLICT STYLING is roughly (for details, RTFM) presented in the MAIN WORK TREE as follows, when "merge.conflictStyle" is set to "diff3"; the default style omits the ||||||| section ...

"unproblematic sources"
<<<<<<< [ name of source1, typically HEAD ]
"conflicting source from source1"
||||||| [ name of source3, the common ancestor of source1 and source2 ]
"conflicting source from source3"
=======
"conflicting source from source2"
>>>>>>> [ name of source2, typically COMMIT_target ]
"unproblematic sources"

    : Among other things, CONFLICT STYLING and MERGE STRATEGIES can be configured; this is a nicely illustrated post on MERGE STRATEGIES.

    : During USER intervention, (git mergetool) can provide various visual utilities ( I'm partial towards vimdiff ).  

    : (git merge) can, by default, only merge COMMITS with a common ancestor; to override this limitation, there is a special flag (--allow-unrelated-histories).
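
The choice between doing nothing, a FAST-FORWARD MERGE, a TRUE MERGE, and a refusal comes down to ancestry. A minimal sketch follows, on a toy graph in which each COMMIT maps to its list of parents; the graph, the single-letter commit names and the function names are hypothetical, and real (git merge) does considerably more.

```python
def ancestors(graph: dict[str, list[str]], commit: str) -> set[str]:
    """All commits reachable from commit via parent links, including itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def plan_merge(graph: dict[str, list[str]], head: str, target: str) -> str:
    """Decide what merging target into head would amount to."""
    if target in ancestors(graph, head):
        return "already up to date"   # head already contains target
    if head in ancestors(graph, target):
        return "fast-forward"         # just repoint the branch to target
    if ancestors(graph, head) & ancestors(graph, target):
        return "true merge"           # histories diverged from a shared commit
    return "refused: unrelated histories"


# A <- B <- C          (one line of work)
#       \<- D          (another line of work, diverging at B)
# X                    (a root commit with no shared history)
GRAPH = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "X": []}

print(plan_merge(GRAPH, "B", "C"))
print(plan_merge(GRAPH, "C", "B"))
print(plan_merge(GRAPH, "C", "D"))
print(plan_merge(GRAPH, "C", "X"))
```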

3.2.3.b.4.

ORIG_HEAD ... typically stored at ".git/ORIG_HEAD" ... verbatim : " ... is created by commands that move your HEAD in a drastic way (git am, git merge, git rebase, git reset), to record the position of the HEAD before their operation, so that you can easily change the tip of the branch back to the state before you ran them."

3.2.4. 

Git REFERENCE LOGS a.k.a. REFLOGS ... typically stored at ".git/logs/" ... accessed by (git reflog) NOT (git log) ... are histories of each REF; REFLOGS may be denoted via "REF@{N}", where N refers to the Nth previous referent COMMIT of REF; examples :

  • "HEAD@{0}" refers to [ "HEAD"'s current referent COMMIT ];
  • "HEAD@{1}" refers to [ the previous referent COMMIT of "HEAD" ];
  • "HEAD@{3}" refers to [ the third-previous referent COMMIT of "HEAD" ];
  • "abc-branch@{0}" refers to [ "abc-branch"'s current referent COMMIT ];
  • "stash@{0}" refers to [ the most recently created stash ];
  • "stash@{1}" refers to [ the second-most-recently created stash ];
etc.
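
The "REF@{N}" numbering can be sketched against the text format of a REFLOG file, assuming each line holds the old COMMIT ID, the new COMMIT ID, who made the change and when, then a tab and a message, with the newest entry last. The commit IDs, the identity string and the function names below are hypothetical.

```python
def reflog_entries(text: str) -> list[tuple[str, str, str]]:
    """Return (old, new, message) tuples, newest first,
    so that list index N corresponds to REF@{N}."""
    entries = []
    for line in text.splitlines():
        meta, _, message = line.partition("\t")
        old, new = meta.split(" ")[:2]
        entries.append((old, new, message))
    entries.reverse()
    return entries


def at(entries: list[tuple[str, str, str]], n: int) -> str:
    """The commit that the ref pointed to, N moves ago."""
    return entries[n][1]


# Hypothetical IDs and identity, for illustration only.
ZERO, A, B, C = "0" * 40, "a" * 40, "b" * 40, "c" * 40
WHO = "A U Thor <author@example.com> 1691020800 +0000"
SAMPLE = "\n".join([
    f"{ZERO} {A} {WHO}\tcommit (initial): first",
    f"{A} {B} {WHO}\tcommit: second",
    f"{B} {C} {WHO}\tcommit: third",
])

log = reflog_entries(SAMPLE)
print(at(log, 0))  # the current referent
print(at(log, 1))  # the previous referent
```
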
For intermediate-level students, there are many syntactical features for SPECIFYING REVISION PARAMETERS a.k.a. "<rev>"s :

  • <sha1> ;
  • <describeOutput> ;
  • <refname> ... see disambiguation ;
  • @ a.k.a. HEAD ;
  • <refname>@{<date>} ... see natural language approximations ;
  • @{<N>} ... where the omission of <refname> assumes the CURRENT BRANCH as <refname> ;
  • @{-<N>} ... referring to the Nth branch/commit checked out before the current one ;
  • <branchname>@{upstream} a.k.a. <branchname>@{u} ;
  • <branchname>@{push} ;
  • <refname>@{N} ... ( as mentioned in 3.2.4. above ) ;
  • <rev> a.k.a. <rev>^0 ;
  • <rev>^ a.k.a. <rev>^1 ;
  • <rev>^<N> ... which is <rev>'s <N>th (immediate) parent ;
  • <rev>^- a.k.a. <rev>^-1 ;
  • <rev>^-<N> a.k.a. <rev>^<N>..<rev> ... which includes <rev> but excludes <rev>^<N> ;
  • <rev>^@ ... which is all <rev>'s (immediate) parents ;
  • <rev>^! ... which includes <rev> but excludes <rev>^@ ;
  • <rev>~ a.k.a. <rev>~1 ... which is <rev>'s 1st parent, same as <rev>^ ;
  • <rev>~<N> ... which is <rev>'s <N>th generation ancestor, through each preceding 1st parent ;
  • <rev>^^ a.k.a. <rev>^1^1 ... because it is parsed as (<rev>^1)^1 ... a.k.a. <rev>~2 ;
  • <rev>^{<type>} ;
  • <rev>^{/<text>} ;
  • :/<text> ;
  • <rev>:<path> ;
  • :<N>:<path> ;
  • ^<rev1> ... exclusion notation ;
  • <rev1>..<rev2> a.k.a. ^<rev1> <rev2> ... range notation ;
  • <rev1>...<rev2> ... symmetric difference notation ;
... and more.
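
The difference between the "^" and "~" suffixes is easiest to see on a small graph. A minimal sketch of a resolver for just these two suffixes follows; the toy graph, its single-letter commit names, and the function resolve are hypothetical, and none of the other notations listed above are handled.

```python
import re

# Each commit maps to its ordered list of parents; M is a merge commit.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "M": ["C", "D"],
}


def resolve(rev: str) -> str:
    """Resolve a name followed by any chain of ^N and ~N suffixes."""
    match = re.fullmatch(r"(\w+)((?:[~^]\d*)*)", rev)
    if match is None:
        raise ValueError(f"unsupported revision syntax: {rev!r}")
    commit, suffixes = match.group(1), match.group(2)
    for op, digits in re.findall(r"([~^])(\d*)", suffixes):
        n = int(digits) if digits else 1  # a bare ^ or ~ means 1
        if op == "^":
            if n == 0:
                continue                  # ^0 is the commit itself
            commit = PARENTS[commit][n - 1]  # sideways: pick the Nth parent
        else:
            for _ in range(n):
                commit = PARENTS[commit][0]  # backwards: follow 1st parents
    return commit


for rev in ["M^", "M^2", "M~2", "M^^", "M^2~"]:
    print(rev, "->", resolve(rev))
```

In short, "^<N>" chooses among the parents of one COMMIT, while "~<N>" walks back <N> generations along first parents only.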

3.2.5.

Git STASHES ... typically located at ".git/refs/stash" ... save TRACKED but UNCOMMITTED changes in the MAIN WORK TREE and INDEX ... as COMMITS which sit outside any BRANCH ... and reset the MAIN WORK TREE to HEAD ... via (git stash); they are retrieved via (git stash pop) or (git stash apply).

3.3.

The INDEX a.k.a. STAGING AREA a.k.a. STAGE a.k.a. CACHE ... typically stored at ".git/index" ... is a list of paths, each paired with the BLOB holding that path's STAGED content; it is the snapshot which will be included in any NEXT COMMIT executed by the USER.

  - (git status) will report on both the MAIN WORK TREE and the INDEX; one of the OPERANDS of (git status) is "a_main_work_tree/.gitignore"* ( see section 7. below ); if you wish to configure specific paths to be treated specially by Git EXECUTABLES, then refer to "a_main_work_tree/.gitattributes" ( see section 8. below ).

* edited for clarity 2023-08-09

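  - MAIN WORK TREE files are added or removed from the INDEX via (git add) and ( (git restore --staged) a.k.a. (git reset -- a_path), (git rm --cached) ); beware that (git reset --hard) also overwrites the MAIN WORK TREE, discarding UNCOMMITTED changes; be careful, and RTFM first.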

  - Files in the INDEX which differ from the referent COMMIT of "HEAD" can become a new COMMIT via (git commit); (git commit) can be constructively undone ("soft deleted") by adding a new commit via (git revert COMMIT_ID).

  - (git diff) will show you the difference between the MAIN WORK TREE and the INDEX; passing other parameters to (git diff) will allow you to :

  • [ compare two paths on the filesystem ] ;
  • [ compare the INDEX versus any named COMMIT ] ;
  • [ compare the MAIN WORK TREE versus any named COMMIT ] ;
  • [ compare any two named COMMITS ] ;
  • [ compare a merge COMMIT versus its two parent COMMITS ] ;
  • [ given two COMMITS A and B, view the changes on B's BRANCH, between (the common ancestor of A and B) versus B ].
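
The three-way relationship between HEAD, the INDEX and the MAIN WORK TREE, which (git status) summarises, can be sketched with three dictionaries mapping each path to an identifier of its content. This is a simplified model; the function name status, the paths and the one-character content IDs are hypothetical.

```python
def status(head: dict[str, str], index: dict[str, str],
           worktree: dict[str, str]) -> tuple[list[str], list[str], list[str]]:
    """Return (staged, unstaged, untracked) path lists."""
    # INDEX versus HEAD: what the next commit would change ( git diff --cached ).
    staged = {p for p in head.keys() | index.keys() if head.get(p) != index.get(p)}
    # MAIN WORK TREE versus INDEX: edits or deletions not yet added ( git diff ).
    unstaged = {p for p in index if worktree.get(p) != index[p]}
    # In the MAIN WORK TREE but unknown to the INDEX.
    untracked = {p for p in worktree if p not in index}
    return sorted(staged), sorted(unstaged), sorted(untracked)


head = {"a.txt": "1", "b.txt": "2"}
index = {"a.txt": "1", "b.txt": "3", "c.txt": "4"}
worktree = {"a.txt": "9", "b.txt": "3", "c.txt": "4", "d.txt": "5"}

staged, unstaged, untracked = status(head, index, worktree)
print("staged:   ", staged)
print("unstaged: ", unstaged)
print("untracked:", untracked)
```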

3.4.

The REPOSITORY DESCRIPTION ... typically stored at ".git/description" ... is left to the USER's discretion*.

* typo fixed 2023-08-09

4.

COMMIT_EDITMSG ... typically stored at ".git/COMMIT_EDITMSG" ... is a temporary location for the USER's message when making a NEW COMMIT; perhaps related :
  • 4.1.
    MERGE_MSG ... typically stored at ".git/MERGE_MSG";
  • 4.2.
    SQUASH_MSG ... typically stored at ".git/SQUASH_MSG".

5.

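REPOSITORY CONFIG ... typically stored at ".git/config" ... being different from GLOBAL CONFIG ( typically "~/.gitconfig" ) ... is a configuration file whose settings apply to this REPOSITORY only, taking precedence over GLOBAL CONFIG ... managed via (git config).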

6.

Git HOOKS ... typically stored at ".git/hooks/" ... are programs that can be triggered at specific points during EXECUTION; the points :
  • ... regarding patches :
    • applypatch-msg
    • pre-applypatch
    • post-applypatch
  • ... regarding commits :
    • pre-commit
    • pre-merge-commit
    • prepare-commit-msg
    • commit-msg
    • post-commit
    • post-rewrite
  • ... regarding pushes :
    • pre-push 
    • pre-receive
    • update
    • post-receive
    • post-update
    • push-to-checkout
  • ... other :
    • pre-rebase
    • post-checkout
    • reference-transaction
    • post-merge
    • pre-auto-gc
    • sendemail-validate
    • fsmonitor-watchman
    • ... some hooks are p4/Perforce-specific
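
As an illustration of one of these points, a commit-msg hook receives the path of the file holding the proposed message as its only argument, and a non-zero exit status aborts the COMMIT. A minimal sketch follows; the 72-character limit, the function names and the wording of the complaints are hypothetical policy, not anything Git prescribes.

```python
import sys


def check_message(text: str) -> list[str]:
    """Return a list of complaints about a proposed commit message."""
    # Lines starting with '#' are editor comments, assuming the default comment character.
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    problems = []
    if not lines or not lines[0].strip():
        problems.append("subject line is empty")
    elif len(lines[0]) > 72:
        problems.append("subject line is longer than 72 characters")
    return problems


def main(argv: list[str]) -> int:
    """argv[1] is the path of the message file handed over by Git."""
    with open(argv[1], encoding="utf-8") as handle:
        problems = check_message(handle.read())
    for problem in problems:
        print("commit-msg:", problem, file=sys.stderr)
    return 1 if problems else 0


# Installed as an executable file at .git/hooks/commit-msg,
# the script would end with:  sys.exit(main(sys.argv))
print(check_message("Fix typo in headnote\n\nLonger explanation.\n"))
print(check_message("# only a comment\n"))
```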

7.

.GITIGNORE ... typically stored at "a_main_work_tree/.gitignore" ... indicates UNTRACKED files which should not be reported or added to the INDEX; files which are already TRACKED are unaffected.
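
A deliberately simplified sketch of how ignore patterns are evaluated follows: patterns are tried in order, a leading "!" re-includes, and the last matching pattern wins. It assumes patterns containing no slashes, matched against the file name only; real .gitignore rules ( directories, anchoring, "**" ) are richer. The function name is_ignored and the sample patterns are hypothetical.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Simplified: file-name-only patterns, '!' negation, last match wins."""
    name = path.rsplit("/", 1)[-1]
    ignored = False
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments carry no rule
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        if fnmatchcase(name, pattern):
            ignored = not negate
    return ignored


PATTERNS = ["# build residue", "*.log", "!keep.log", ""]

for path in ["build/out.log", "keep.log", "src/main.py"]:
    print(path, is_ignored(path, PATTERNS))
```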

8.

.GITATTRIBUTES ... typically stored at "a_main_work_tree/.gitattributes" ... indicates path-specific settings for such filters as :
  • which files are binary ( for efficient diffing, based on file format )
  • SMUDGE FILTERING ... for keyword transformation upon CHECKOUT ( INDEX to WORK TREE )
  • CLEAN FILTERING ... for keyword transformation upon STAGING ( WORK TREE to INDEX )
  • file exclusion and keyword transformation upon REPOSITORY EXPORT
  • per-path MERGE STRATEGIES

9.

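The INFO folder ... typically stored at ".git/info/" ... is an alternative location for the information found at .GITIGNORE ( as ".git/info/exclude" ) and .GITATTRIBUTES ( as ".git/info/attributes" ); these apply to this REPOSITORY only, and are not shared with clones ( see items 7., and 8. above ).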

10.

What goes on during basic usage. Er, just read this ... 
... it illustrates the common elementary usage patterns very well.

Try reading it in this order :
  • REVERT and RESET
    • REMOTE REPO - LOCAL REPO - LOCAL STAGE - LOCAL MAIN WORK TREE
    • RR -revert-> LR  -reset-> LS -checkout-> LMWT
    • RR <-push-   LR <-commit- LS <-add-      LMWT
  • SWITCH and CHECKOUT
    • checkout commit, checkout branch, checkout -- file
  • ORIGIN and UPSTREAM
    • fork, pull request, pull, clone, push
  • FETCH and PULL
    • merge
  • MERGE and REBASE
  • HEAD~ and HEAD^

11.

More on the same confusing terminology.

11.1.


11.2.

11.3.

11.4.

11.5.

11.6.

Walking the graph with REV-LIST : discussion

11.7.

11.8.

MERGE-BASE can help in finding good common ancestors for a merge

11.9.

SUBMODULES - nested REPOSITORIES

11.10.
