21 Version Control with Git and GitHub
21.1 Introduction to Git and Version Control
Version control systems (VCS) are essential tools in modern programming, providing a framework to manage, track, and organize code changes over time. Git, a distributed version control system, allows users to create multiple versions of a project, facilitating collaboration and simplifying the process of managing historical changes. Created by Linus Torvalds in 2005, Git has become a cornerstone in software development, with GitHub providing a popular web-based platform for Git repositories, enabling sharing and collaborative coding.
For statisticians and mathematicians, Git and GitHub offer streamlined methods for managing code, facilitating collaboration, and ensuring reproducibility. Git enables tracking modifications and maintaining a clear history, while GitHub offers remote storage, a web interface, and collaboration features, making it ideal for both individual and team projects.
21.2 Setting Up Git and GitHub
Git and GitHub work together to streamline version control both locally and remotely. This section will guide you through setting up Git, creating a GitHub account, and connecting Git with GitHub, so you can start tracking and sharing your code.
21.2.1 Installing Git
Git must be installed on your computer to work with version control locally. Installation instructions vary based on operating system:
Windows:
Go to the official Git website at git-scm.com and click on the “Download for Windows” button.
Run the downloaded installer file. During installation, you will see multiple configuration options. For beginners, the default settings are generally appropriate. However, ensure the “Git Bash Here” option is enabled, as it provides a command-line interface for Git commands.
Once installed, open Git Bash by searching for it in your Start menu, and type the following command to verify the installation:
git --version
This should display the installed version of Git.
macOS:
Git is often pre-installed on macOS. You can check by opening Terminal and typing:
git --version
If it is not installed, Git can be installed using Homebrew, a package manager for macOS. If Homebrew is not already installed, visit brew.sh and follow the instructions.
Install Git by opening Terminal and typing:
brew install git
With Git installed, you are ready to initialize repositories and start tracking changes in your projects.
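The same installation check can be scripted, which is handy at the top of an analysis pipeline. The following is a minimal sketch using only the Python standard library; the function name git_version is a hypothetical helper introduced here, not part of Git or any library.

```python
import shutil
import subprocess


def git_version():
    """Return the output of `git --version`, or None if Git is not on PATH."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


version = git_version()
print(version if version else "Git was not found on PATH")
```

If the function returns None, revisit the installation steps for your operating system before continuing.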
21.2.2 Creating a GitHub Account
GitHub is an online platform where you can store and share your Git repositories. Creating an account is straightforward:
- Go to GitHub.com and click “Sign Up.”
- Follow the on-screen instructions to complete the account creation, including verifying your email and setting up a secure password.
- GitHub will also prompt you to create your first repository. You can skip this step for now if you want to create a repository directly from your local Git setup.
After signing up, you’ll want to generate a Personal Access Token (PAT) for secure access between Git and GitHub. This token replaces your password when using Git on the command line or in IDEs.
Generating a Personal Access Token (PAT):
- In GitHub, go to Settings > Developer settings > Personal access tokens.
- Click Generate new token, give it a descriptive name, and select appropriate permissions. For most uses, choose repo (for full control over repositories) and workflow (for CI/CD workflows).
- Copy and save the token somewhere secure, as it will only be displayed once. This token will be required for any remote pushes or fetches involving your GitHub repositories.
21.2.3 Configuring Git with GitHub
Before you interact with GitHub, configure Git with your name and email address. Git records this identity in every commit, and GitHub links commits to your profile when the commit email matches an address registered on your account.
Set Your Username and Email Address
The first time you set up Git, configure your username and email. This information will be attached to all your commits.
git config --global user.name "your_username"
git config --global user.email "your_email@example.com"
Verify the Configuration
To verify that your username and email have been correctly set, you can view the configuration with:
git config --list
This command will display your Git settings, confirming the username and email that will appear on all commits.
Caching GitHub Credentials
To avoid entering your PAT each time you push or pull from GitHub, you can cache your credentials. Git supports credential helpers that store your PAT in the operating system's secure credential store:
macOS:
git config --global credential.helper osxkeychain
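Windows (recent Git for Windows installers include Git Credential Manager, which replaces the older, deprecated wincred helper):
git config --global credential.helper manager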
Testing the Connection to GitHub
To confirm Git and GitHub are configured correctly, try cloning an empty repository from GitHub. First, create an empty repository in GitHub by clicking “New” on your GitHub dashboard. Copy the repository’s URL and use the following command:
git clone https://github.com/your_username/your_repository.git
If everything is configured correctly, this command should clone the repository to your local machine without errors.
21.3 Basic Git Commands in Python
Once you have Git and GitHub set up, you can start using basic Git commands to track, manage, and push code changes. These commands can be run in a terminal or integrated into your Python code using libraries like GitPython. This section covers essential Git operations, including initializing a repository, staging and committing changes, and viewing repository status.
21.3.1 Initializing a Repository
Creating a Git repository is the first step in tracking changes for a new project. This action initializes a .git folder in your project directory, which will store all Git data, including your commit history and configurations.
Command Line: Navigate to your project folder and use:
git init
This command will initialize a new Git repository in the current directory.
Python with GitPython:
from git import Repo
repo = Repo.init("/path/to/your/project")
This code initializes a repository in the specified directory, enabling you to start tracking changes from within Python.
21.3.2 Tracking Changes with add, commit, and status
Tracking changes in Git follows a specific workflow where files are staged, committed, and reviewed. Let’s explore these steps in detail.
Adding Files to the Staging Area
Before saving a snapshot of your changes, you must add files to the “staging area.” Staging allows you to control which files will be included in the next commit.
Command Line: Use git add to stage a file or directory. For example:
git add filename.py
To stage all changes in the repository, use:
git add .
Python with GitPython:
repo = Repo('/path/to/your/project')
repo.index.add(['filename.py'])
This line stages the specified file. You can also add multiple files by including them in the list, e.g., ['file1.py', 'file2.py'].
Committing Changes
A commit is a snapshot of your repository at a specific point in time. Each commit should have a message that describes the changes.
Command Line: After staging, use git commit to save your changes:
git commit -m "Initial commit"
The -m flag allows you to include a message that summarizes the purpose of the commit.
Python with GitPython:
repo.index.commit("Initial commit")
This line commits staged changes with the provided message.
Viewing the Status
It’s often helpful to review the current status of your repository to see which files are staged, unstaged, or untracked.
Command Line: To view the status of your repository, use:
git status
This command displays a summary of changes, showing which files are staged, unstaged, or untracked.
Python with GitPython:
repo = Repo('/path/to/your/project')
for item in repo.index.diff(None):
    print(item.a_path, item.change_type)
This Python code compares the index with the working tree, so it lists tracked files that have unstaged changes along with their change types (e.g., M for modified, D for deleted). Staged changes and untracked files are not included; repo.index.diff('HEAD') and repo.untracked_files report those.
21.3.3 Reviewing Commit History with log
Reviewing the commit history is an important part of version control. Git allows you to see past commits, which include messages, author information, and unique commit IDs.
Command Line: Use git log to view the commit history:
git log
This command shows a list of commits in reverse chronological order. Each entry includes a commit hash, author, date, and message.
Python with GitPython:
for commit in repo.iter_commits():
    print(f"Commit: {commit.hexsha}\nAuthor: {commit.author.name}\nDate: {commit.committed_datetime}\nMessage: {commit.message}\n")
This code iterates through the commit history, displaying details for each commit. The hexsha attribute provides the unique commit ID.
21.3.4 Undoing Changes with reset and checkout
Git provides several commands to undo changes or revert to a previous state, helping to recover from mistakes or unwanted changes.
Unstaging Files
If you accidentally added a file to the staging area, you can remove it without discarding changes.
Command Line: Use git reset to unstage:
git reset filename.py
Python with GitPython:
repo.git.reset('--', 'filename.py')
This passes the arguments through to git reset, so the file leaves the staging area while the working copy stays untouched.
Reverting to Previous Commits
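You can inspect your repository at an earlier state by checking out a previous commit. This updates your working directory to that point in history and leaves Git in a “detached HEAD” state; your branch and its later commits are unchanged, and checking out the branch name returns you to them.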
- Command Line: Use git checkout followed by the commit hash:
git checkout <commit_hash>
- Python with GitPython:
repo.git.checkout('commit_hash')
Replace 'commit_hash' with the actual hash of the commit you want to revert to.
21.3.5 Pushing and Pulling Changes with Remote Repositories
To collaborate with others or back up your work, you’ll need to push changes to a remote repository or pull updates from it. Let’s see how to work with remote repositories on GitHub.
Adding a Remote Repository
Linking your local Git repository to a GitHub repository allows you to sync changes.
- Command Line: Use git remote add origin with the repository URL:
git remote add origin https://github.com/your_username/your_repository.git
- Python with GitPython:
origin = repo.create_remote('origin', 'https://github.com/your_username/your_repository.git')
Pushing Changes
After committing, you can push changes to GitHub to make them available remotely.
- Command Line: Use git push to upload changes:
git push -u origin main
- Python with GitPython:
origin.push(refspec='main:main')
Pulling Changes
To retrieve the latest changes from the remote repository, use the pull command.
- Command Line:
git pull origin main
- Python with GitPython:
origin.pull('main')
These commands form the foundation for using Git effectively. With these basics, you can track, commit, and sync changes locally and remotely, supporting both individual and collaborative workflows.