Git: A program to track your file changes and create a history of those changes. It creates a ‘container’ for a set of files called a repository.
GitHub: A website to host these repositories and allow you to sync local copies (on your computer) to the website. Lots of functionality is built on top of this.
The history and other Git info for a repository live in a hidden folder called .git (so if you wanted to get rid of the history and other Git info, you could delete that folder). You have a local repo and a remote repo (on GitHub).

Today I will cover the basic Git/GitHub skills (and info) that are all most people need for 95% of their work. I am using GitHub Desktop.¹ If you want to use Git from RStudio, go to set-up and scroll to the section on RStudio and Git.
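To see the .git folder mentioned above for yourself, here is a minimal shell sketch. It assumes Git is installed and works entirely inside a throwaway temporary directory; the folder name `demo-repo` is made up for illustration.

```shell
#!/bin/sh
set -e
# Work in a throwaway directory so nothing on your computer is touched.
workdir=$(mktemp -d)
mkdir "$workdir/demo-repo"
cd "$workdir/demo-repo"

# Turn the folder into a Git repository. This creates the hidden .git folder.
git init -q

# -a lists hidden files and folders too, so .git shows up here.
ls -a

# Deleting .git removes the history and all other Git info.
# What is left is an ordinary folder.
rm -rf .git
ls -a
```

The second `ls -a` no longer lists .git, which is all it takes for a folder to stop being a repository.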
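Simple Trunk-based Workflow:
We’ll do this (first figure: a simple trunk-based workflow), not this (second figure: a Gitflow-style workflow with lots of branching), i.e. what you would probably see if you Google “Git”.²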
I’ll be using GitHub Desktop for the lecture. The first time you use it, you will need to log in with your GitHub account.
Not using GitHub Desktop? Then you need to do this from the shell. RStudio users could also do this from the R command line.
Repository skills (using GitHub Desktop)
Skill 1: Create a blank repo on GitHub
Skill 2: Clone your GitHub repo onto your computer
Skill 0: Open your repository in your editing platform.
Skill 3: Make some changes and commit those local changes
Skill 4: Push local changes to GitHub
Skill 1b: Clone someone else’s GitHub repository
.gitignore
https://www.github.com/yourname/yourrepo
To push changes you committed in Skill #3
To pull changes on GitHub that are not on your local computer:
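For readers working in the shell instead of GitHub Desktop, the clone, commit, push, and pull cycle (Skills 2 to 4) might look roughly like the sketch below. To keep it runnable without a GitHub account, a local bare repository called `remote.git` stands in for GitHub; that name, the folder names `laptop` and `other`, and the demo identity are all assumptions made for illustration. With a real repo you would clone your GitHub URL instead.

```shell
#!/bin/sh
set -e
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the blank repo on GitHub (Skill 1).
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

# Skill 2: clone the repo onto your computer.
git clone -q remote.git laptop
cd laptop
git symbolic-ref HEAD refs/heads/main   # make sure the branch is called main

# Skill 3: make a change and commit it locally.
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"

# Skill 4: push the local commit to the remote.
git push -q origin main

# Simulate a change made somewhere else (on GitHub or by a collaborator).
git clone -q "$workdir/remote.git" "$workdir/other"
cd "$workdir/other"
echo "A line added elsewhere" >> README.md
git commit -q -am "Edit README elsewhere"
git push -q origin main

# Pull: bring changes that are on the remote down to your local copy.
# --ff-only accepts the update only when no merge is needed.
cd "$workdir/laptop"
git pull -q --ff-only origin main
cat README.md
```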
You can copy your own or other people’s repos.³ On GitHub, click the + in the top right and click import repository. Paste in the URL and give your repo a name.

Let’s say you want to make a copy of one of your GitHub repositories and use it as a template to make something brand new.
END OF MATERIAL COVERED IN WEEK 1 LECTURE
Below is supplemental information
Note there are some easier ways to do this but the above is how to do it with your Skills 1-4. Also in my experience the other ‘easier’ ways have a tendency of creating problems for beginners.
Forking is if you are contributing to someone else’s repository (the ‘upstream’ repository). Your fork is another separate origin repository on GitHub but it ‘knows’ the upstream repository and you can easily pull in changes from it and you can make a pull request to push your changes to the upstream repository. You don’t push directly to the upstream repository.
Cloning makes a local copy of a repository on GitHub. The clone is connected directly to the origin repository on GitHub and you push directly to the origin.
Importing/Copying is if you want your own copy of the repository because you want to make your own version of the code or use it as a starting point for your own project. The copy is not connected to the original repository.
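The difference between a connected clone and an unconnected copy can be seen from the shell. The sketch below is only an illustration: a small local repository named `original` stands in for a GitHub repo, and `my-clone` and `my-copy` are made-up names. The by-hand copy shown here also throws away the commit history, which may differ from what GitHub's import does.

```shell
#!/bin/sh
set -e
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"

# A small "original" repository standing in for one on GitHub.
git init -q original
cd original
git symbolic-ref HEAD refs/heads/main
echo 'print("hello")' > hello.R
git add hello.R
git commit -q -m "Add hello.R"
cd "$workdir"

# Cloning: the clone remembers where it came from (a remote called origin).
git clone -q original my-clone
git -C my-clone remote -v

# Copying by hand: take the files, drop the Git info, start a fresh repository.
git clone -q original my-copy
rm -rf my-copy/.git
git -C my-copy init -q
git -C my-copy remote -v    # lists no remotes: no connection to the original
```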
Merge conflicts happen when there are changes to a file on your origin repository (GitHub) but also changes to that same file on your local repository. Git doesn’t know how to resolve the conflicting changes and needs your help. GitHub Desktop will warn you and give you some helpful options to resolve these.
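To make this concrete, here is a small sketch that produces a merge conflict on purpose. It assumes Git is installed; the local bare repository `remote.git` stands in for GitHub, and the folders `mine` and `theirs` and the file contents are invented for the demo.

```shell
#!/bin/sh
set -e
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the GitHub repo, with one file already in it.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/main
echo 'x <- 1' > hello.R
git add hello.R
git commit -q -m "Add hello.R"
cd "$workdir"
git clone -q --bare seed remote.git

# Two local copies: yours and a collaborator's.
git clone -q remote.git mine
git clone -q remote.git theirs

# The collaborator changes the line and pushes.
cd "$workdir/theirs"
echo 'x <- 2' > hello.R
git commit -q -am "Set x to 2"
git push -q origin main

# You change the same line locally, then pull.
cd "$workdir/mine"
echo 'x <- 3' > hello.R
git commit -q -am "Set x to 3"
# The pull stops with a conflict; "|| true" keeps the script going.
git pull --no-rebase origin main || true

# Git marks the conflicting region with <<<<<<<, =======, and >>>>>>> lines.
cat hello.R
```

Fixing the conflict means editing hello.R so that only the lines you want remain, removing the marker lines, and committing the result.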
A branch is a copy of your repository that you can work on without changing the main branch of the repository. Once you are done, you incorporate the changes into the main branch. Beginners should steer clear of branches until they feel confident with Git and GitHub. I maintain many repositories and use branches on only 2 of them.
If you decide later on to start using branches, then read up on Gitflow versus trunk-based development styles: here and here. The development styles with lots of branching (and branches on branches) are known for merge conflict headaches.
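If you do get curious about branches, the basic cycle is short. The following is a minimal sketch in a throwaway repository, not a recommended workflow; the branch name `try-idea` and the file contents are made up.

```shell
#!/bin/sh
set -e
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main
echo 'x <- 1' > hello.R
git add hello.R
git commit -q -m "Add hello.R"

# Create a branch and switch to it. The main branch is left untouched.
git checkout -q -b try-idea
echo 'y <- 2' >> hello.R
git commit -q -am "Try an idea"

# Back on main, the file does not have the change yet.
git checkout -q main
cat hello.R

# Incorporate the branch into main, then delete the branch.
git merge -q try-idea
git branch -d try-idea
cat hello.R
```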
cd to the directory with the repository (the one that holds the .git folder).
Stay away from git reset --hard. Everything else you do can be recovered. git reset --hard actually deletes content forever: any changes you have not committed are gone.
Here’s an alternative.
git log
Look around at the state of your repo at different commits.
git reset <some commit hash #>
Reset to some past state.
git push -f origin main
Make the remote match your local state. Note -f is --force and is a bad thing to do if you are collaborating because you might wipe out the other person’s work.
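Here is how those three commands could play out in a throwaway setting. This is a sketch under assumptions: a local bare repository `remote.git` stands in for GitHub, the folder `laptop` and file `notes.txt` are made up, and `HEAD~1` is used as shorthand for "the commit before the latest one" in place of a pasted hash.

```shell
#!/bin/sh
set -e
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main
git clone -q remote.git laptop
cd laptop
git symbolic-ref HEAD refs/heads/main

echo 'good line' > notes.txt
git add notes.txt
git commit -q -m "Good commit"
echo 'bad line' >> notes.txt
git commit -q -am "Bad commit"
git push -q origin main

# Look around at the history. Each line starts with a commit hash.
git log --oneline

# Reset to the commit before the bad one.
# Without --hard your files are left alone; only the history moves back.
git reset -q HEAD~1
cat notes.txt     # still contains 'bad line', now as an uncommitted change

# Make the remote match the local history.
# Only do this on a repo that nobody else is using.
git push -q -f origin main
```

Because the reset was not `--hard`, nothing was deleted from the working files; you can edit notes.txt and commit again when ready.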
Say you made a change and you need to get rid of that. The temptation (for me) is to jump onto the Git command line and clobber my repository with reset and revert commands. Don’t do this.
Here are some strategies that will make this less prone to leaving your code a mess.
Have you committed the change?
No? Easy: right click on the file in GitHub Desktop and click ‘Discard Changes…’. Note this will take things all the way back to your last commit!! If you have been making a bunch of changes without committing those, then you are out of luck.
Yes? Go to History in the GitHub Desktop window, click on the commit and click ‘Revert’. This will get rid of all the changes that went with that commit. So if you changed multiple files, all those files will be reverted. If you have pushed the changes to GitHub, then you can push the revert and it’ll show up on GitHub too.
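For anyone who ends up at the command line anyway, rough shell counterparts of ‘Discard Changes…’ and ‘Revert’ are sketched below. This is an approximation in a throwaway repository, not a description of what GitHub Desktop runs internally; the file and commit messages are invented.

```shell
#!/bin/sh
set -e
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main
echo 'x <- 1' > hello.R
git add hello.R
git commit -q -m "Add hello.R"

# Case 1: the change is NOT committed.
# Discard it, which takes the file back to the last commit.
echo 'oops' >> hello.R
git checkout -- hello.R
cat hello.R

# Case 2: the change IS committed.
# Revert adds a new commit that undoes the chosen commit.
echo 'oops' >> hello.R
git commit -q -am "A commit I regret"
git revert --no-edit HEAD
cat hello.R

# The history keeps everything: original, regretted commit, and the revert.
git log --oneline
```

Note that revert does not erase the regretted commit; it records a third commit that cancels it out, which is why it is safe to push.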
GitHub Desktop makes resolving these pretty easy.
[…] hello.R and show where the conflicts are. You then edit hello.R in RStudio to fix the conflicts.
[…] hello.R are still there.
[…] hello.R and fix the conflict. Git won’t have marked it so it might be hard to find.

Let’s say you made a big multi-file commit and you want to revert one file.
You can do this at the Git command line, but I find that to be a huge time suck and in my early Git days, I sometimes left my repository with a horrible problem that I could not fix and had to completely rebuild my repo. Since I don’t need to be a Git wizard, this is what I do when I want to go ‘back in time’ for a single file.
Assuming you have already pushed the changes up to GitHub:
Use the < > button to browse your repo at the state in time where your file was ok.

If you have not pushed the changes up to GitHub:
Ok, here’s the Git command to get a single file back. This works whether or not you have pushed to GitHub. The problem with this and why I don’t do it is that I usually need to look at the file. So I am scrolling back through the status of my repo in the past until I find the status that I want. Then I stare a bit and think and think. Then get a coffee and think some more. Then I scroll back through the status of the repo in the past some more and THEN I do the copy and paste. It is rarely the case that I know exactly what commit that I need to get rid of—and even rarer that I want to go completely to a status in the past.
git checkout SHA~1 -- ./<file name>
For example
git checkout 1d0f8c2eb4e66db0a7123588ae2fad26a6338303~1 -- ./R/test.R
would reset test.R to its state one commit before that commit. This part, 1d0f8c2eb4e66db0a7123588ae2fad26a6338303, is the bad commit hash and this part, ~1, means what the file was like 1 commit before that.

If you accidentally leave off the file name and Git says you have a detached head, use git checkout master or git checkout main to reattach your head.
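The single-file checkout can be tried safely in a scratch repository first. The sketch below builds a two-commit history and then restores one file; the file contents and the variable name `bad` are made up, and the hash is looked up with `git rev-parse` instead of being pasted by hand.

```shell
#!/bin/sh
set -e
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main
mkdir R
echo 'good version' > R/test.R
echo 'notes v1' > notes.txt
git add .
git commit -q -m "Good commit"

# A multi-file commit where the change to R/test.R turns out to be bad.
echo 'bad version' > R/test.R
echo 'notes v2' > notes.txt
git commit -q -am "Big multi-file commit"
bad=$(git rev-parse HEAD)    # hash of the bad commit

# Bring back only R/test.R as it was one commit before the bad commit.
git checkout "${bad}~1" -- ./R/test.R

cat R/test.R          # the restored file
cat notes.txt         # the other file from the big commit is untouched
git status --short    # R/test.R shows as a staged change, ready to commit
```

The restored file is only staged, not committed, so you still get a chance to look at it before making the fix permanent.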
I tend to commit constantly as I make little changes to files. If you do that too, you might want to “squash” commits together into one commit that summarizes all the changes. This is easy to do in GitHub Desktop.
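Outside GitHub Desktop, one way to get a similar result from the shell is a soft reset followed by a fresh commit. This is a sketch of that one approach in a scratch repository, with made-up file contents and messages. It is best kept to commits you have not pushed yet, since rewriting pushed history means a force push.

```shell
#!/bin/sh
set -e
# Placeholder identity for the demo commits.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main
echo 'start' > notes.txt
git add notes.txt
git commit -q -m "Start notes"

# Three small commits in a row.
for n in 1 2 3; do
  echo "tweak $n" >> notes.txt
  git commit -q -am "Tweak $n"
done

# Move the history back three commits but keep all the changes staged...
git reset -q --soft HEAD~3
# ...then record them as a single commit.
git commit -q -m "Tweak notes (three small changes squashed)"

# The history now has the starting commit plus one squashed commit.
git log --oneline
```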
Time stamps are something that you probably use to see when a file was last changed. Time stamps lose meaning if you use git checkout for branches or tags. In fact, in the Git-verse, time doesn’t exist since Git workflow is not necessarily linear. I find this very confusing so I purposely work in a linear fashion with Git.
To fix time stamps when you use git checkout (i.e. switch branches), you can use a post-checkout file in the .git/hooks folder.
Take post-checkout.sample and save it as post-checkout, containing the script below.
#!/bin/sh -e
# Restore each unmodified tracked file's time stamp to the time of its last commit.
OS=${OS:-$(uname)}

# Convert a Unix time (seconds) to the format that touch -t expects.
# BSD date (macOS) and GNU date take the epoch value differently.
if [ "$OS" = 'Darwin' ]; then
  get_touch_time() {
    date -r "${unixtime}" '+%Y%m%d%H%M.%S'
  }
else
  get_touch_time() {
    date -d "@${unixtime}" '+%Y%m%d%H%M.%S'
  }
fi

# all git files
git ls-tree -r --name-only HEAD > .git_ls-tree_r_name-only_HEAD
# modified git files
git diff --name-only > .git_diff_name-only

# only restore files not modified
comm -2 -3 .git_ls-tree_r_name-only_HEAD .git_diff_name-only | while IFS= read -r filename; do
  unixtime=$(git log -1 --format="%at" -- "${filename}")
  touchtime=$(get_touch_time)
  # echo ${touchtime} "${filename}"
  touch -t "${touchtime}" "${filename}"
done

# clean up the two temporary lists
rm .git_ls-tree_r_name-only_HEAD .git_diff_name-only
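One detail that is easy to miss when installing a hook like this: on macOS and Linux, Git only runs hook files that are executable. The sketch below checks the wiring with a tiny stand-in hook that just writes a marker file. The marker name `hook-ran.txt` and the branch name `other` are invented for the demo, and the stand-in is not the time stamp script itself.

```shell
#!/bin/sh
set -e
# Placeholder identity for the demo commit.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main
echo 'x <- 1' > hello.R
git add hello.R
git commit -q -m "Add hello.R"

# A tiny stand-in hook: it only records that it ran.
mkdir -p .git/hooks
cat > .git/hooks/post-checkout <<'EOF'
#!/bin/sh
echo "post-checkout hook ran" > hook-ran.txt
EOF

# Make the hook executable, otherwise Git skips it.
chmod +x .git/hooks/post-checkout

# Switching branches triggers the hook.
git checkout -q -b other
cat hook-ran.txt
```

Once the stand-in works, the same two steps (save the file as .git/hooks/post-checkout, then `chmod +x`) apply to the real script.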
¹ Why GitHub Desktop? I teach using it because it is made by GitHub and all you have to do is download it and log in with your GitHub account. No hassles for students. I personally use it because my Git and GitHub work is simply much faster using it. I have many GitHub repos so speed and being able to track multiple repos is important to me. Also I use GitHub issues to plan and track my work. Being able to type ‘#’ in my commit message and see what issues are open on GitHub is critical for my workflow. Other people use RStudio’s Git GUI but it doesn’t have the functions I need. I also like GitKraken. Search around and find something that works for your purposes. In JupyterLab I use the jupyter-git extension.
² Note, if you decide later on to start using branches, then read up on Gitflow (the second figure) versus trunk-based (the first figure) development styles: here and here. The development styles with lots of branching (and branches on branches) are known for merge conflict headaches.
³ This is different from forking. There is no connection to the original repository.