15. Git Data Structure

Git stores all its data in a content-addressable key-value store. The key is always computed from the content itself, using the SHA-1 hash function.
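The idea of a content-addressable store can be sketched in a few lines of Python. This is a toy illustration, not Git's actual storage code: the class name `ContentStore` and its methods are hypothetical, and it hashes the raw bytes only, whereas Git also hashes a small header in front of the content.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key is derived from the value."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # The caller never chooses the key; it is the SHA-1 of the content.
        key = hashlib.sha1(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = ContentStore()
key = store.put(b"hello\n")
same_key = store.put(b"hello\n")  # identical content maps to the same key
```

Because the key depends only on the content, storing the same bytes twice takes no extra space, and any change to the bytes produces a different key.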

15.1 Git Objects

A very light explanation of the Git objects.

Blob Object
Each file stored in a Git repository is called a blob object. The value of a blob object is the content of the file, and the key is the SHA-1 hash of that content prefixed with a short header (the object type and the content size in bytes).

sha-1(
    header  := "blob" + " " + content size in bytes + NUL byte
    content := whole file content
)
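A minimal Python sketch of the blob key computation follows. The function name `git_blob_key` is made up for this example. The sketch assumes the standard loose-object layout, in which Git hashes a `blob <size>` header and a NUL byte followed by the file content.

```python
import hashlib


def git_blob_key(content: bytes) -> str:
    """Return the SHA-1 key Git would assign to a blob with this content."""
    # Header: object type, a space, the size in bytes, then a NUL byte.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


empty_key = git_blob_key(b"")
text_key = git_blob_key(b"test content\n")
```

The result should match `git hash-object <file>` for a file with the same bytes. For example, the empty file hashes to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 in every repository.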

Tree Object
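A blob holds the file content but says nothing about the file's name or where it resides. That is the job of the tree object. A tree object describes one directory by listing its immediate files and sub-directories, so the tree of the root directory captures the entire directory structure of the repo.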

sha-1(
    **each** immediate file in that directory (not in sub-dirs): its name, the SHA-1 key of its blob, and its metadata (file mode, i.e. permissions)
    **each** immediate sub-dir: its name and the SHA-1 key of its tree object
)
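The tree key can be sketched in Python as well. The helper names `git_object_key` and `git_tree_key` are hypothetical. The sketch assumes the usual binary tree layout: one `<mode> <name>` entry per child, a NUL byte, then the child's 20-byte raw SHA-1, with entries sorted by name and directories compared as if their name ended in `/`.

```python
import hashlib


def git_object_key(kind: bytes, body: bytes) -> str:
    """Hash '<kind> <size>' + NUL + body, the layout used for Git objects."""
    header = kind + b" %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()


def git_tree_key(entries) -> str:
    """entries: (mode, name, hex_key) tuples for the immediate children only."""

    def sort_key(entry):
        mode, name, _ = entry
        # Sub-directories (mode 40000) sort as if their name ended with "/".
        return name + b"/" if mode == b"40000" else name

    body = b"".join(
        mode + b" " + name + b"\x00" + bytes.fromhex(hex_key)
        for mode, name, hex_key in sorted(entries, key=sort_key)
    )
    return git_object_key(b"tree", body)


empty_blob = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"  # key of an empty file
docs_tree = git_tree_key([(b"100644", b"index.txt", empty_blob)])
root_tree = git_tree_key([
    (b"100644", b"README", empty_blob),  # immediate file
    (b"40000", b"docs", docs_tree),      # immediate sub-directory
])
```

Note that the file name lives in the tree entry, not in the blob, so renaming a file changes the tree key but leaves the blob key untouched.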

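Git stores its data as a DAG (Directed Acyclic Graph). The key of each object (node) in the DAG is the SHA-1 hash of the object's own content, and that content includes the keys of the objects it points to (a tree's blobs and sub-trees, a commit's tree and parent commits). Changing any object therefore changes the key of every object that refers to it, directly or indirectly.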
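The way keys propagate through such a graph can be illustrated with a deliberately simplified sketch. The function `node_key` and the payload strings are invented for this example and do not follow Git's real object formats. The only point is that a node's key covers the keys of the nodes it references.

```python
import hashlib


def node_key(payload: bytes, referenced_keys: list) -> str:
    """Hash a node's own payload together with the keys it points to."""
    h = hashlib.sha1(payload)
    for key in referenced_keys:
        h.update(bytes.fromhex(key))
    return h.hexdigest()


leaf_v1 = node_key(b"file contents v1", [])
leaf_v2 = node_key(b"file contents v2", [])

# Same directory payload, but pointing at different leaves.
root_v1 = node_key(b"directory", [leaf_v1])
root_v2 = node_key(b"directory", [leaf_v2])
```

Here `root_v1` and `root_v2` differ even though the directory payload is identical, because the leaf they reference has changed.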

Commit Object
Every change we make to files needs to be stored in an identifiable manner. This is done using commit objects.
The key of a commit object is computed from the following values:

sha-1(
    tree_obj_key : key (SHA-1) of the tree object for the root directory.
    parents_key  : key (SHA-1) of the previous commit (or commits, if it is a merge of two or more branches).
    meta_data    : author and committer name, email, date, time etc., plus the commit message
)

The parent keys are how Git knows which commit to compare against. Git does not store a **diff**. It is computed on the fly by comparing the trees of the two commits!
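A commit key can be sketched the same way. The function name `git_commit_key` and the identity string below are hypothetical. The sketch assumes the plain-text commit layout (a `tree` line, zero or more `parent` lines, `author` and `committer` lines, a blank line, then the message) and ignores optional extras such as signatures.

```python
import hashlib


def git_commit_key(tree_key, parent_keys, author, committer, message) -> str:
    """Return the SHA-1 key for a commit built from these fields."""
    lines = ["tree " + tree_key]
    lines += ["parent " + p for p in parent_keys]  # none for a root commit
    lines += ["author " + author, "committer " + committer, "", message]
    body = "\n".join(lines).encode()
    header = b"commit %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical identity: name, email, Unix timestamp, timezone offset.
who = "A U Thor <author@example.com> 1700000000 +0000"
empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # key of an empty tree

first = git_commit_key(empty_tree, [], who, who, "Initial commit\n")
second = git_commit_key(empty_tree, [first], who, who, "Second commit\n")
```

Since `second` includes the key of `first` in its content, rewriting the first commit would change the key of every commit that descends from it.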

15.2 References

  • https://lwn.net/Articles/811068/
  • https://blog.thoughtram.io/git/2014/11/18/the-anatomy-of-a-git-commit.html