Compartmentalized | Documented | Extendible | Reproducible | Robust |
This week I will cover the basic Git/GitHub skills (and info) that are all most NMFS scientists will need.
Repository skills (using GitHub Desktop)
Skill 1: Clone one of your GitHub repos onto your computer
Skill 2: Tell RStudio about a local repository
Skill 3: Commit local changes and push to GitHub
How to clone someone else’s GitHub repository
How to use a repository as a template for something brand new.
New definitions
Key GitHub.com skills.
Tips to be aware of:
Topics I won’t cover:
git push
first time you need to give RStudio your GitHub login).
This is not a bespoke website. It is a Jekyll site generated from GitHub. All the data driving the College Scorecard is available in the GitHub repository maintained by the Department of Education.
https://nmfs-fish-tools.github.io/
Here is an example of one of the tools: r4MAS
Again, this is not a bespoke website. In this case, r4MAS is an R package and a tool called pkgdown was used to create this site, which GitHub shows as a webpage in your browser.
### Create a blank repo on GitHub
https://www.github.com/yourname/yourrepo
New Project. Existing Directory and navigate to the directory where you just saved the repo.
In an RStudio project that is also a Git repository
You can clone your own or other people’s repos.
+ in the top right and click import repository. Paste in the URL and give your repo a name.
Optional Method 2.
If RStudio knows where Git is, you can use this method.
Step 1-4 are the same.
New Project. Then select Version Control and paste in the URL of your repository. For example, https://github.com/<youraccount>/Test
Let’s say you want to make a copy of one of your GitHub repositories and use it as a template to make something brand new.
Note there are some easier ways to do this but the above is how to do it with your skills 1-3.
In GitHub Desktop, click File > Add Local Repository… Then navigate to the folder and click that. GitHub Desktop will complain that there is no repository at that location but gives you the option to ‘Create new repository’. Click that and answer the questions. Then you’ll see ‘Publish to GitHub’ at the top. Click. Done! You can also do File > New Repository but getting it to work on an existing folder is finicky and prone to unintended consequences.
In RStudio, Tools > Project Options… > Git/SVN > use the dropdown to select Git. It’ll ask you to restart RStudio. Go to GitHub Desktop, click File > Add Local Repository… Then navigate to the folder and click that. Then you’ll see ‘Publish to GitHub’ at the top. Click. Done!
Forking is if you are contributing to someone else’s repository. In that case, you need to make ‘pull requests’. Pull request = ‘here is a suggested change’ request.
Cloning is if you want your own copy of the repository because you want to make your own version of the code or use it as a starting point for your own project. Or you need to clone a blank repository to get started on a project.
Merge conflicts happen when there are changes to a file on your remote repository (GitHub) but also changes to that same file on your local repository. Git doesn’t know how to resolve the conflicting changes and needs your help. GitHub Desktop will warn you and give you some helpful options to resolve these.
A branch is a copy of your repository that you can work on without changing the main repository. Once you are done, you incorporate the changes into the main repository. Most of you should steer clear of branches because they are incompatible with our common workflows!! I maintain 40+ repositories and use branches on only 2 of them.
When you switch to a branch, i.e. checkout a branch, it changes the files in that folder to the state of the branch. The info to restore the files to the main branch state is in the .git folder.
Let’s say you have a repository (folder) for all your common functions, common, and in that a folder called R with a function basicplot.R. You decide to create a branch to play around with some other options for basicplot.R. So you switch branches to branch temp and make some sandboxy changes to basicplot.R. Those changes are in your file system, not on some magic separate ‘branch’. Any call like this in your other code is reading that sandboxy basicplot.R.
source("~/Documents/common/R/basicplot.R")
Let’s say you have your hard drive on an automatic backup system. It will back up the branch temp. The main branch info is in the .git folder but this will be pretty confusing if someone looks at the files.
Switching branches also resets the time stamp of basicplot.R: when you switch back to the main branch, Git rewrites the file, so its modification time becomes the time of the switch.
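To see for yourself that a branch switch rewrites the one copy of the file on disk, here is a minimal sketch you can run in a terminal. It builds a throwaway repository in a temporary folder; the folder, the user name and email, and the contents of basicplot.R are all made up for the demo.

```shell
#!/bin/sh
set -e
# Throwaway repo in a temp folder (every name here is hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir R
echo 'basicplot <- function(x) plot(x)' > R/basicplot.R
git add R/basicplot.R
git commit -q -m "main version of basicplot.R"

# Make sandboxy changes on a branch called temp.
git checkout -q -b temp
echo 'basicplot <- function(x) plot(x, col = "red")' > R/basicplot.R
git commit -q -am "sandbox changes"

# The single file on disk now holds the temp version ...
on_temp=$(cat R/basicplot.R)
echo "on temp: $on_temp"

# ... and switching back rewrites that same file in place.
git checkout -q main
on_main=$(cat R/basicplot.R)
echo "on main: $on_main"
```

There is only ever one R/basicplot.R in the folder, so anything that reads the file by path (a source() call, a backup program) sees whichever branch happens to be checked out.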
Pro tip: check out the state of the repository at the time of a release. From the terminal:
git checkout v1.0
When done:
git switch -
Warning: git checkout ... will change the time stamps of every file it rewrites.
I use this to organize collections of repositories.
Say you made a change and you need to get rid of it. The temptation (for me) is to jump onto the Git command line and clobber my repository with reset and revert commands. Don’t do this. Here are some strategies that are less prone to leaving your code a mess.
Have you committed the change? No? Easy. Click on the file in GitHub Desktop, right click and click ‘Discard Changes…’. Note this will take things all the way back to your last commit!! If you have been making a bunch of changes without committing those, then you are out of luck.
Yes, already committed? Go to History in the GitHub Desktop window, click on the commit and click ‘Revert’. This will get rid of all the changes that went with that commit. So if you changed multiple files, all those files will be reverted. If you have pushed the changes to GitHub, then you can push the revert and it’ll show up on GitHub too.
GitHub Desktop makes resolving these pretty easy.
hello.R and show where the conflicts are. You then edit hello.R in RStudio to fix the conflicts.
hello.R are still there.
hello.R and fix the conflict. Git won’t have marked it so it might be hard to find.
Those using Git in RStudio: Merge conflicts are a bit of a disaster in RStudio, and RStudio gives no warning before it mucks up your files. So if you are pushing/pulling from RStudio, be sure to practice on some toy merge conflicts before you run into a real one.
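If you want a toy merge conflict to practice on, the sketch below makes one in a throwaway repository. It is only an approximation of the GitHub situation: a second branch (called remote-change here, a made-up name) stands in for the edit made on GitHub, and the repo folder, user name, and file contents are also invented for the demo.

```shell
#!/bin/sh
set -e
# Throwaway repo in a temp folder (every name here is hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'print("hello")' > hello.R
git add hello.R
git commit -q -m "first version"

# Stand-in for the change made on GitHub.
git checkout -q -b remote-change
echo 'print("hello from GitHub")' > hello.R
git commit -q -am "edit made on GitHub"

# Stand-in for your local change to the same line.
git checkout -q main
echo 'print("hello from my laptop")' > hello.R
git commit -q -am "edit made locally"

# The merge stops with a conflict; "|| true" lets the script carry on.
git merge remote-change > /dev/null 2>&1 || true

# Git writes conflict markers into hello.R around the clashing lines.
markers=$(grep -c '^<<<<<<<' hello.R)
cat hello.R
```

Open the resulting hello.R in RStudio and you will see the two competing versions between the `<<<<<<<`, `=======`, and `>>>>>>>` lines. Fixing the conflict means editing the file so only the text you want remains, then committing.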
Let’s say you made a big multi-file commit and you want to revert one file.
You can do this at the Git command line, but I find that to be a huge time suck and in my early Git days, I sometimes left my repository with a horrible problem that I could not fix and had to completely rebuild my repo. Since I don’t need to be a Git wizard, this is what I do when I want to go ‘back in time’ for a single file.
Assuming you have already pushed the changes up to GitHub
< > to browse your repo at the state in time where your file was ok.
If you have not pushed the changes up to GitHub.
Ok, here’s the Git command to get a single file back. This works whether or not you have pushed to GitHub. The problem with this and why I don’t do it is that I usually need to look at the file. So I am scrolling back through the status of my repo in the past until I find the status that I want. Then I stare a bit and think and think. Then get a coffee and think some more. Then I scroll back through the status of the repo in the past some more and THEN I do the copy and paste. It is rarely the case that I know exactly what commit that I need to get rid of—and even rarer that I want to go completely to a status in the past.
git log to find the commit hash (the long number)
git checkout 1d0f8c2eb4e66db0a7123588ae2fad26a6338303~1 -- ./R/test.R
would reset test.R to its state one commit before that commit. This part, 1d0f8c2eb4e66db0a7123588ae2fad26a6338303, is the bad commit hash and this part, ~1, means what the file was like 1 commit before that.
If you accidentally leave off the file name and Git says you have a detached head, use git checkout master (or git checkout main, if that is the name of your default branch) to reattach your head.
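Here is a small sketch of that single-file restore that you can try safely, since it runs in a throwaway repository. The file contents, the extra file notes.txt, and the user details are invented for the demo; the hash comes from `git rev-parse HEAD` rather than from reading `git log`, just so the script can run unattended.

```shell
#!/bin/sh
set -e
# Throwaway repo in a temp folder (every name here is hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir R
echo 'good <- TRUE' > R/test.R
echo 'notes v1' > notes.txt
git add .
git commit -q -m "good commit"

# A multi-file commit that breaks R/test.R.
echo 'good <- FALSE' > R/test.R
echo 'notes v2' > notes.txt
git commit -q -am "bad commit"
bad=$(git rev-parse HEAD)   # the hash that git log would show you

# Bring back only R/test.R as it was one commit before the bad one.
git checkout "${bad}~1" -- ./R/test.R

restored=$(cat R/test.R)
untouched=$(cat notes.txt)
git status --short
```

Only R/test.R goes back in time; notes.txt keeps its newer contents. The restored file shows up as a change waiting to be committed, so you still need to commit it to make the fix part of your history.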
Time stamps are something that you probably use to see when a file was last changed. Time stamps lose meaning if you use git checkout for branches or tags. In fact, in the Git-verse, time doesn’t exist since a Git workflow is not necessarily linear. I find this very confusing so I purposely work in a linear fashion with Git.
To fix time stamps when you use git checkout (i.e. switch branches), you can use a post-checkout file in the .git/hooks folder.
post-checkout.sample. Save as post-checkout
#!/bin/sh -e
# post-checkout hook: after a checkout, set each unmodified tracked file's
# time stamp to the time of the last commit that touched it.
OS=${OS:-`uname`}
# macOS (BSD) date and GNU date take a Unix time in different ways.
if [ "$OS" = 'Darwin' ]; then
  get_touch_time() {
    date -r "${unixtime}" '+%Y%m%d%H%M.%S'
  }
else
  get_touch_time() {
    date -d "@${unixtime}" '+%Y%m%d%H%M.%S'
  }
fi
# all git files
git ls-tree -r --name-only HEAD > .git_ls-tree_r_name-only_HEAD
# modified git files
git diff --name-only > .git_diff_name-only
# only restore files not modified
comm -2 -3 .git_ls-tree_r_name-only_HEAD .git_diff_name-only | while IFS= read -r filename; do
  unixtime=$(git log -1 --format="%at" -- "${filename}")
  touchtime=$(get_touch_time)
  # echo ${touchtime} "${filename}"
  touch -t "${touchtime}" "${filename}"
done
# remove the two temporary file lists
rm .git_ls-tree_r_name-only_HEAD .git_diff_name-only
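One detail that is easy to miss: Git only runs a hook if the file is executable. The sketch below checks that a post-checkout hook actually fires, using a throwaway repository and a tiny stand-in hook that just writes a marker file (the marker file name and the repo contents are made up). In a real repository you would save the time stamp script above as .git/hooks/post-checkout instead.

```shell
#!/bin/sh
set -e
# Throwaway repo in a temp folder (every name here is hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'x <- 1' > a.R
git add a.R
git commit -q -m "first commit"

# Stand-in hook: it only records that it ran.
mkdir -p .git/hooks
cat > .git/hooks/post-checkout <<'EOF'
#!/bin/sh
echo "post-checkout ran" > .git/hook-ran.txt
EOF

# Git skips hooks that are not marked executable.
chmod +x .git/hooks/post-checkout

# Any checkout now triggers the hook.
git checkout -q -b temp
hook_output=$(cat .git/hook-ran.txt)
echo "$hook_output"
```

If the marker file does not appear on your own machine, check that the hook file is executable and that it is named exactly post-checkout, with no file extension.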