21  Version Control with Git and GitHub

21.1 Introduction to Git and Version Control

Version control systems (VCS) are essential tools in modern programming, providing a framework to manage, track, and organize code changes over time. Git, a distributed version control system, allows users to create multiple versions of a project, facilitating collaboration and simplifying the process of managing historical changes. Created by Linus Torvalds in 2005, Git has become a cornerstone in software development, with GitHub providing a popular web-based platform for Git repositories, enabling sharing and collaborative coding.

For statisticians and mathematicians, Git and GitHub offer streamlined methods for managing code, facilitating collaboration, and ensuring reproducibility. Git enables tracking modifications and maintaining a clear history, while GitHub offers remote storage, a web interface, and collaboration features, making it ideal for both individual and team projects.

21.2 Setting Up Git and GitHub

Git and GitHub work together to streamline version control both locally and remotely. This section will guide you through setting up Git, creating a GitHub account, and connecting Git with GitHub, so you can start tracking and sharing your code.

21.2.1 Installing Git

Git must be installed on your computer to work with version control locally. Installation instructions vary based on operating system:

  • Windows:

    1. Go to the official Git website at git-scm.com and click on the “Download for Windows” button.

    2. Run the downloaded installer file. During installation, you will see multiple configuration options. For beginners, the default settings are generally appropriate. However, ensure the “Git Bash Here” option is enabled, as it provides a command-line interface for Git commands.

    3. Once installed, open Git Bash by searching for it in your Start menu, and type the following command to verify the installation:

      git --version

      This should display the installed version of Git.

  • macOS:

    1. Git is often pre-installed on macOS. You can check by opening Terminal and typing:

      git --version

    2. If it is not installed, Git can be installed using Homebrew, a package manager for macOS. If Homebrew is not already installed, visit brew.sh and follow the instructions.

    3. Install Git by opening Terminal and typing:

      brew install git

With Git installed, you are ready to initialize repositories and start tracking changes in your projects.
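The installation check can also be scripted, which is convenient at the top of an analysis script. The following is a minimal sketch that assumes Python 3 and uses only the standard library; the helper name installed_git_version is hypothetical and not part of Git.

```python
import shutil
import subprocess


def installed_git_version():
    """Return the output of `git --version`, or None if Git is not on PATH."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


version = installed_git_version()
print(version if version else "Git was not found on PATH")
```

Returning None instead of raising an error lets the calling script decide whether a missing Git installation is fatal.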

21.2.2 Creating a GitHub Account

GitHub is an online platform where you can store and share your Git repositories. Creating an account is straightforward:

  1. Go to GitHub.com and click “Sign Up.”
  2. Follow the on-screen instructions to complete the account creation, including verifying your email and setting up a secure password.
  3. GitHub will also prompt you to create your first repository. You can skip this step for now if you want to create a repository directly from your local Git setup.

After signing up, you’ll want to generate a Personal Access Token (PAT) for secure access between Git and GitHub. This token replaces your password when using Git on the command line or in IDEs.

Generating a Personal Access Token (PAT):

  1. In GitHub, go to Settings > Developer settings > Personal access tokens > Tokens (classic).
  2. Click Generate new token, give it a descriptive name, set an expiration date, and select only the permissions you need. For most uses, choose repo (full control over repositories); add workflow if you need to update GitHub Actions workflow files.
  3. Copy and save the token somewhere secure, as it will only be displayed once. This token will be required for any remote pushes or fetches involving your GitHub repositories.

21.2.3 Configuring Git with GitHub

Before you interact with GitHub, tell Git who you are. Git records this name and email address in every commit, and GitHub uses the email address to link commits to your profile, so use an address that is registered on your GitHub account.

  1. Set Your Username and Email Address

    The first time you set up Git, configure your username and email. This information will be attached to all your commits.

    git config --global user.name "your_username"
    git config --global user.email "your_email@example.com"

  2. Verify the Configuration

    To verify that your username and email have been correctly set, you can view the configuration with:

    git config --list

    This command will display your Git settings, confirming the username and email that will appear on all commits.

  3. Caching GitHub Credentials

    To avoid entering your PAT each time you push or pull from GitHub, you can cache your credentials. Git offers a credentials helper to store your PAT securely:

    • macOS:

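      git config --global credential.helper osxkeychain

    • Windows:

      git config --global credential.helper manager

      Recent Git for Windows releases bundle Git Credential Manager, which this setting selects. Older guides use the wincred helper, which has since been deprecated.

  4. Testing the Connection to GitHub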

    To confirm Git and GitHub are configured correctly, try cloning an empty repository from GitHub. First, create an empty repository in GitHub by clicking “New” on your GitHub dashboard. Copy the repository’s URL and use the following command:

    git clone https://github.com/your_username/your_repository.git

    If everything is configured correctly, this command should clone the repository to your local machine without errors.
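The verification in step 2 can be automated by reading the key=value lines that git config --list prints. The sketch below is an illustration only: parse_git_config and missing_identity are hypothetical helper names, and the sample text is made up rather than captured from a real machine.

```python
def parse_git_config(listing):
    """Turn `git config --list` output (one key=value per line) into a dict."""
    settings = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        settings[key] = value  # if a key repeats, the later line wins
    return settings


def missing_identity(settings):
    """Return the identity keys that are absent or empty."""
    return [key for key in ("user.name", "user.email") if not settings.get(key)]


# Hypothetical sample output, not taken from a real machine.
sample = (
    "user.name=your_username\n"
    "user.email=your_email@example.com\n"
    "credential.helper=osxkeychain\n"
)
settings = parse_git_config(sample)
print(missing_identity(settings))  # prints []
```

To run it against a real configuration, pass in the text captured from git config --list.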

21.3 Basic Git Commands in Python

Once you have Git and GitHub set up, you can start using basic Git commands to track, manage, and push code changes. These commands can be run in a terminal or integrated into your Python code using libraries like GitPython. This section covers essential Git operations, including initializing a repository, staging and committing changes, and viewing repository status.

21.3.1 Initializing a Repository

Creating a Git repository is the first step in tracking changes for a new project. This action initializes a .git folder in your project directory, which will store all Git data, including your commit history and configurations.

  • Command Line: Navigate to your project folder and use:

    git init

    This command will initialize a new Git repository in the current directory.

  • Python with GitPython:

    from git import Repo
    repo = Repo.init("/path/to/your/project")

    This code initializes a repository in the specified directory, enabling you to start tracking changes from within Python.
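To see the .git folder appear, the initialization can be tried in a throwaway directory. This is a minimal sketch that calls the git command line through Python's subprocess module instead of GitPython, so it needs only the standard library; init_repository is a hypothetical helper name.

```python
import os
import shutil
import subprocess
import tempfile


def init_repository(path):
    """Run `git init` in `path` and report whether a .git folder now exists."""
    subprocess.run(["git", "init", path], capture_output=True, text=True, check=True)
    return os.path.isdir(os.path.join(path, ".git"))


if shutil.which("git"):  # skip the demonstration when Git is not installed
    with tempfile.TemporaryDirectory() as project_dir:
        print(init_repository(project_dir))
```

The temporary directory is deleted when the with block ends, so the experiment leaves nothing behind.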

21.3.2 Tracking Changes with add, commit, and status

Tracking changes in Git follows a specific workflow where files are staged, committed, and reviewed. Let’s explore these steps in detail.

  1. Adding Files to the Staging Area

    Before saving a snapshot of your changes, you must add files to the “staging area.” Staging allows you to control which files will be included in the next commit.

    • Command Line: Use git add to stage a file or directory. For example:

      git add filename.py

      To stage all changes in the current directory and its subdirectories, use:

      git add .

    • Python with GitPython:

      repo = Repo('/path/to/your/project')
      repo.index.add(['filename.py'])

      This line stages the specified file. You can also add multiple files by including them in the list, e.g., ['file1.py', 'file2.py'].

  2. Committing Changes

    A commit is a snapshot of your repository at a specific point in time. Each commit should have a message that describes the changes.

    • Command Line: After staging, use git commit to save your changes:

      git commit -m "Initial commit"

      The -m flag allows you to include a message that summarizes the purpose of the commit.

    • Python with GitPython:

      repo.index.commit("Initial commit")

      This line commits staged changes with the provided message.

  3. Viewing the Status

    It’s often helpful to review the current status of your repository to see which files are staged, unstaged, or untracked.

    • Command Line: To view the status of your repository, use:

      git status

      This command displays a summary of changes, showing which files are staged, unstaged, or untracked.

    • Python with GitPython:

      repo = Repo('/path/to/your/project')
      for item in repo.index.diff(None):
          print(item.a_path, item.change_type)

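      This Python code lists files with unstaged changes (differences between the staging area and the working directory), each with a one-letter change type such as M (modified) or D (deleted). It does not cover everything git status reports: staged changes can be obtained from repo.head.commit.diff(), and untracked files from repo.untracked_files.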
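The three states described above (staged, unstaged, untracked) can be read from the machine-friendly output of git status --porcelain, where each line starts with two status characters: the first for the staging area, the second for the working directory. The following is a minimal sketch under that assumption; classify_status is a hypothetical helper, the sample text is invented, and renamed or conflicted files are not treated specially.

```python
def classify_status(porcelain_output):
    """Group paths from `git status --porcelain` (version 1) output by state."""
    groups = {"staged": [], "unstaged": [], "untracked": []}
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue  # too short to hold two status characters and a path
        index_state, worktree_state, path = line[0], line[1], line[3:]
        if index_state == "?" and worktree_state == "?":
            groups["untracked"].append(path)
            continue
        if index_state != " ":
            groups["staged"].append(path)
        if worktree_state != " ":
            groups["unstaged"].append(path)
    return groups


# Invented sample: one staged file, one unstaged, one with both, one untracked.
sample = "M  analysis.py\n M notes.txt\nMM model.py\n?? scratch.py\n"
for state, paths in classify_status(sample).items():
    print(state, paths)
```

A file such as model.py can appear in both the staged and unstaged groups when it was edited again after being staged.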

21.3.3 Reviewing Commit History with log

Reviewing the commit history is an important part of version control. Git allows you to see past commits, which include messages, author information, and unique commit IDs.

  • Command Line: Use git log to view the commit history:

    git log

    This command shows a list of commits in reverse chronological order. Each entry includes a commit hash, author, date, and message.

  • Python with GitPython:

    for commit in repo.iter_commits():
        print(f"Commit: {commit.hexsha}\nAuthor: {commit.author.name}\nDate: {commit.committed_datetime}\nMessage: {commit.message}\n")

    This code iterates through the commit history, displaying details for each commit. The hexsha attribute provides the unique commit ID.
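The reverse chronological order of the history can be checked in a throwaway repository. The sketch below uses the git command line through subprocess instead of GitPython and assumes Git is installed; the helper names, file name, and author identity are placeholders chosen for illustration.

```python
import os
import shutil
import subprocess
import tempfile

# Placeholder identity, passed per command so no global configuration is needed.
IDENTITY = ["-c", "user.name=Example Author", "-c", "user.email=author@example.com"]


def git(repo_dir, *args):
    """Run one Git command inside repo_dir and return its standard output."""
    result = subprocess.run(
        ["git", "-C", repo_dir, *IDENTITY, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def commit_file(repo_dir, name, text, message):
    """Write a file, stage it, and commit it with the given message."""
    with open(os.path.join(repo_dir, name), "w") as handle:
        handle.write(text)
    git(repo_dir, "add", name)
    git(repo_dir, "commit", "-m", message)


def read_history(repo_dir):
    """Return (hash, author, subject) tuples, newest commit first."""
    # %H = full hash, %an = author name, %s = subject line, %x09 = tab
    output = git(repo_dir, "log", "--pretty=format:%H%x09%an%x09%s")
    return [tuple(line.split("\t", 2)) for line in output.splitlines()]


def demo():
    """Create two commits in a temporary repository and return its history."""
    with tempfile.TemporaryDirectory() as repo_dir:
        git(repo_dir, "init")
        commit_file(repo_dir, "model.py", "x = 1\n", "Initial commit")
        commit_file(repo_dir, "model.py", "x = 20\n", "Update x")
        return read_history(repo_dir)


if shutil.which("git"):  # skip the demonstration when Git is not installed
    for sha, author, subject in demo():
        print(sha[:7], author, subject)
```

The custom --pretty format keeps each commit on one tab-separated line, which is easier to parse than the default multi-line output of git log.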

21.3.4 Undoing Changes with reset and checkout

Git provides several commands to undo changes or revert to a previous state, helping to recover from mistakes or unwanted changes.

  1. Unstaging Files

    If you accidentally added a file to the staging area, you can remove it without discarding changes.

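    • Command Line: Use git reset to unstage:

      git reset filename.py

    • Python with GitPython:

      repo.git.reset('--', 'filename.py')

      This runs the same git reset command through GitPython and leaves the file's contents untouched. Avoid repo.index.remove(['filename.py'], working_tree=True) for this purpose, because it also deletes the file from the working directory.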

  2. Reverting to Previous Commits

    You can inspect an earlier state of your project by checking out a previous commit. This sets your working directory to that point in history and leaves Git in a “detached HEAD” state. The commits on your branch are not deleted, and checking out the branch name (for example, git checkout main) returns you to the latest commit.

    • Command Line: Use git checkout followed by the commit hash:

      git checkout <commit_hash>

    • Python with GitPython:

      repo.git.checkout('commit_hash')

    Replace 'commit_hash' with the actual hash of the commit you want to revert to.
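Checking out an old commit does not discard later work, which a small experiment can confirm. The following sketch assumes Git is installed and drives it through subprocess instead of GitPython; the helper names, the file data.txt, and the author identity are placeholders.

```python
import os
import shutil
import subprocess
import tempfile

# Placeholder identity so the sketch does not rely on global Git configuration.
IDENTITY = ["-c", "user.name=Example Author", "-c", "user.email=author@example.com"]


def git(repo_dir, *args):
    """Run one Git command inside repo_dir and return its trimmed output."""
    result = subprocess.run(
        ["git", "-C", repo_dir, *IDENTITY, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def save_version(repo_dir, text, message):
    """Overwrite data.txt, commit it, and return the new commit hash."""
    with open(os.path.join(repo_dir, "data.txt"), "w") as handle:
        handle.write(text)
    git(repo_dir, "add", "data.txt")
    git(repo_dir, "commit", "-m", message)
    return git(repo_dir, "rev-parse", "HEAD")


def read_data(repo_dir):
    """Return the current contents of data.txt in the working directory."""
    with open(os.path.join(repo_dir, "data.txt")) as handle:
        return handle.read()


def visit_first_commit():
    """Return the file contents seen at the old commit and after returning."""
    with tempfile.TemporaryDirectory() as repo_dir:
        git(repo_dir, "init")
        first = save_version(repo_dir, "version 1\n", "First version")
        save_version(repo_dir, "second version\n", "Second version")
        branch = git(repo_dir, "symbolic-ref", "--short", "HEAD")  # current branch name
        git(repo_dir, "checkout", first)   # working directory now matches the first commit
        at_old_commit = read_data(repo_dir)
        git(repo_dir, "checkout", branch)  # return to the tip of the branch
        return at_old_commit, read_data(repo_dir)


if shutil.which("git"):  # skip the demonstration when Git is not installed
    print(visit_first_commit())
```

The branch name is read before the checkout because Git's default branch name differs between installations.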

21.3.5 Pushing and Pulling Changes with Remote Repositories

To collaborate with others or back up your work, you’ll need to push changes to a remote repository or pull updates from it. Let’s see how to work with remote repositories on GitHub.

  1. Adding a Remote Repository

    Linking your local Git repository to a GitHub repository allows you to sync changes.

    • Command Line: Use git remote add origin with the repository URL:

      git remote add origin https://github.com/your_username/your_repository.git

    • Python with GitPython:

      origin = repo.create_remote('origin', 'https://github.com/your_username/your_repository.git')

  2. Pushing Changes

    After committing, you can push changes to GitHub to make them available remotely.

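    • Command Line: Use git push to upload changes. The -u flag sets origin/main as the upstream of your local main branch, so later pushes can use git push alone:

      git push -u origin main

      If your local branch has a different name (older Git versions default to master), rename it first with git branch -M main.

    • Python with GitPython:

      origin.push(refspec='main:main')

      Naming the branch explicitly matters here, because a bare origin.push() can fail when no upstream branch has been configured yet.

  3. Pulling Changes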

    To retrieve the latest changes from the remote repository, use the pull command.

    • Command Line:

      git pull origin main

    • Python with GitPython:

      origin.pull('main')
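The remote workflow can be rehearsed without a network connection by using a bare repository on the local disk in place of GitHub. The following is a sketch under that assumption, driving Git through subprocess instead of GitPython; the directory names, helper names, and author identity are placeholders.

```python
import os
import shutil
import subprocess
import tempfile

# Placeholder identity so the sketch does not rely on global Git configuration.
IDENTITY = ["-c", "user.name=Example Author", "-c", "user.email=author@example.com"]


def git(cwd, *args):
    """Run one Git command with cwd as its working directory."""
    result = subprocess.run(
        ["git", "-C", cwd, *IDENTITY, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def push_to_local_remote(workspace):
    """Push one commit to a bare repository and return both commit hashes."""
    local = os.path.join(workspace, "project")
    remote = os.path.join(workspace, "remote.git")  # stands in for GitHub
    os.makedirs(local)
    git(workspace, "init", "--bare", remote)
    git(local, "init")
    with open(os.path.join(local, "README.md"), "w") as handle:
        handle.write("Demo project\n")
    git(local, "add", "README.md")
    git(local, "commit", "-m", "Initial commit")
    git(local, "branch", "-M", "main")             # ensure the branch is named main
    git(local, "remote", "add", "origin", remote)  # step 1: add the remote
    git(local, "push", "-u", "origin", "main")     # step 2: push
    git(local, "pull", "origin", "main")           # step 3: pull (nothing new here)
    local_head = git(local, "rev-parse", "HEAD")
    remote_head = git(remote, "rev-parse", "refs/heads/main")
    return local_head, remote_head


if shutil.which("git"):  # skip the demonstration when Git is not installed
    with tempfile.TemporaryDirectory() as workspace:
        local_head, remote_head = push_to_local_remote(workspace)
        print(local_head == remote_head)
```

Identical hashes on both sides indicate that the push transferred the commit. With a GitHub URL in place of the local path, the same commands apply, but Git will also ask for your credentials.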

These commands form the foundation for using Git effectively. With these basics, you can track, commit, and sync changes locally and remotely, supporting both individual and collaborative workflows.