GitHub Self-Service Automation: Introduction to Repoman and Release

Today, we are announcing the release of Repoman, a Python-based project for managing GitHub Repositories, Teams, and secrets, and performing an Organization backup to Azure Blob Storage. This initial blog post will cover the backup client in more detail. The full project is available on GitHub.

5 min · Apr 30, 2024

Having performed security assessments of numerous cloud environments over the past years, we’ve learned that misconfigurations are prevalent in source code repositories and deployment pipelines. Teams often know that an adversary can abuse the functionality used for deploying resources, yet the controls needed to prevent this are frequently not in place. At a recent engagement, I wanted to get this right from the beginning by automating the secure configuration and management of GitHub.

We hope this can contribute to securing the supply chain for cloud environments.

Background

The idea came from working with an organization that was building a new cloud platform. It didn’t have a common source code repository but wanted to use GitHub. We had the opportunity to build it from scratch and ensure we had a secure design from day one.

Having assessed safe-settings and policy-bot, we found that neither was a good fit for the organization.

Requirements

We found that safe-settings provided most of the features we required but had a few limitations.

We also found that our needs were relatively simple and that the REST API offered rich functionality for automating this, so we decided to build our own tool.

After a workshop, we boiled our requirements down to the following:

  • Create new repositories and assign Teams
  • Create environments in repositories
  • Enforce Branch Protection policies for Repositories and Environments
  • Create Teams and assign teams to IDP Groups (Entra ID or Okta Groups)
  • Add variables and secrets to repositories
  • Have a function that supports uploading encrypted variables on-demand
  • Perform backup of the entire organization to cloud storage
  • Be able to run it from anywhere

We required a tool that allowed simple management of GitHub for people with no prior experience with GitHub and who were relatively new to GitOps. The code was initially written with a customer. This release is a reimplementation with many improvements to error handling, logging, and modularity. The initial release includes the backup and new repository functionality, and we are continuing to refactor the other parts outlined in the requirements above.

Approach

After assessing the needs and learning the GitHub API, we decided to implement a desired state configuration (DSC) style project, written as Python functions that call the GitHub REST API. We kept the configuration in a JSON structure in a separate repository, which could fetch the package from GitHub Packages or PyPI.

Our implementation does not keep track of the state and does not offer delete functionality or the ability to update a repository's name. This was a deliberate design decision to keep it as simple as possible with these known limitations.
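To illustrate the idea, here is a minimal sketch of a stateless, create-only plan derived from a JSON configuration. The configuration schema, the `plan_repo_requests` helper, and the organization and repository names are assumptions made for this example and are not taken from Repoman itself. The only GitHub detail relied on is the `POST /orgs/{org}/repos` endpoint for creating a repository.

```python
import json

# Hypothetical desired-state document; the real Repoman schema may differ.
DESIRED_STATE = json.loads("""
{
  "repositories": [
    {"name": "platform-infra", "private": true, "teams": ["platform"]},
    {"name": "platform-docs", "private": false, "teams": ["platform", "docs"]}
  ]
}
""")


def plan_repo_requests(org, desired, existing_names):
    """Return the REST calls needed to create repositories that are missing.

    Nothing is stored between runs: the plan is derived only from the desired
    configuration and the repository names that exist right now. Repositories
    that exist but are absent from the configuration are left alone.
    """
    existing = {name.lower() for name in existing_names}
    calls = []
    for repo in desired["repositories"]:
        if repo["name"].lower() in existing:
            continue  # already present, nothing to create
        calls.append({
            "method": "POST",
            "path": f"/orgs/{org}/repos",
            "body": {"name": repo["name"], "private": repo["private"]},
        })
    return calls


# "legacy-app" exists but is not in the configuration, so it is ignored.
plan = plan_repo_requests("example-org", DESIRED_STATE, ["platform-infra", "legacy-app"])
for call in plan:
    print(call["method"], call["path"], call["body"]["name"])
```

Because the plan is recomputed from scratch on every run, running it twice is harmless: the second run finds the repository already present and plans nothing.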

For this first blog post, I want to cover some details on the GithubBackupClientAzure that we created.

Backing up GitHub

I’ve seen this requirement at numerous previous organizations, and the approach I’ve seen is usually to have all repositories cloned and then pushed to a central storage. I wanted this to be part of the solution, and ideally, we would use an API that fits the purpose. Luckily, there is the Organization Migration API that does exactly this.

A limitation of the API, however, is that it doesn’t support a fine-grained access token or authentication using a GitHub App, which the remaining functionality is built on.

Unfortunately, to overcome this limitation, a highly privileged personal access token is required, so keep this in mind. Luckily, this is only required for the backup function, and with the project's modularity, it can be run in a process separate from the remaining automation the tool offers.

A positive thing about the API is that you can run an asynchronous job where you provide a list of all the repositories you want backed up. It will include the organization metadata and the git history in a single compressed archive (a .tar.gz file).

The steps required to perform a full GitHub Organization Migration (that is, a backup) are:

  1. Get all existing repositories
  2. Make a POST request to the Organization Migrations API with the repositories object
  3. Wait for it to complete asynchronously
  4. Download the archive once done
  5. Upload to a cloud storage of your choice. In this case, I’ve created a class that supports an Azure Blob Storage container
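Steps 1 to 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not Repoman's implementation: `api` is a hypothetical callable that sends one authenticated request and returns the parsed response, and `FakeGitHub` is an in-memory stand-in so the flow can be run offline. The paths follow the GitHub REST API for organization repositories and migrations, and the state names ("pending", "exporting", "exported", "failed") are the ones that API reports. The upload in step 5 is left out.

```python
import time


def list_repositories(api, org):
    """Step 1: collect the full names of all repositories, page by page."""
    names, page = [], 1
    while True:
        batch = api("GET", f"/orgs/{org}/repos?per_page=100&page={page}")
        if not batch:
            return names
        names.extend(repo["full_name"] for repo in batch)
        page += 1


def backup_organization(api, org, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Steps 2-4: start a migration, wait for it, and return the archive bytes."""
    repositories = list_repositories(api, org)
    migration = api("POST", f"/orgs/{org}/migrations", {"repositories": repositories})
    status_path = f"/orgs/{org}/migrations/{migration['id']}"

    for _ in range(max_polls):
        state = api("GET", status_path)["state"]
        if state == "exported":
            return api("GET", status_path + "/archive")
        if state == "failed":
            raise RuntimeError(f"Migration {migration['id']} failed")
        sleep(poll_seconds)  # still "pending" or "exporting"
    raise TimeoutError(f"Migration {migration['id']} did not finish in time")


class FakeGitHub:
    """In-memory stand-in for the API (hypothetical, for demonstration only)."""

    def __init__(self):
        self.polls = 0
        self.requested = None

    def __call__(self, method, path, body=None):
        if path.startswith("/orgs/example-org/repos"):
            if path.endswith("page=1"):
                return [{"full_name": "example-org/app"},
                        {"full_name": "example-org/infra"}]
            return []  # no further pages
        if method == "POST":
            self.requested = body["repositories"]
            return {"id": 42, "state": "pending"}
        if path.endswith("/archive"):
            return b"archive-bytes"
        self.polls += 1  # status check; report "exported" on the third one
        return {"id": 42, "state": "exported" if self.polls >= 3 else "exporting"}


fake = FakeGitHub()
archive = backup_organization(fake, "example-org", sleep=lambda seconds: None)
print(len(archive), "bytes downloaded after", fake.polls, "status checks")
```

Passing `api` and `sleep` in as parameters keeps the orchestration independent of any particular HTTP client and makes the polling loop easy to test without waiting.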

For portability, it imports all parameters through environment variables, allowing it to run locally, in a GitHub Actions Workflow or in a container.

import logging
import os

from package.backupclient import GithubBackupClientAzure

# Configure logging
logging.basicConfig(level=logging.INFO)


def load_env_vars(var_names):
    """Read the named environment variables; unset ones map to None."""
    return {var: os.getenv(var) for var in var_names}


def main():
    env_vars = load_env_vars([
        'GITHUB_TOKEN',
        'ORG_OR_USER',
        'AZURE_STORAGE_ACCOUNT_NAME',
        'AZURE_STORAGE_CONTAINER_NAME'
    ])
    # Fail early with a clear message if any required variable is unset.
    missing_vars = [var for var, value in env_vars.items() if value is None]
    if missing_vars:
        error_message = f'Missing environment variables: {", ".join(missing_vars)}'
        logging.error(error_message)
        raise ValueError(error_message)
    try:
        logging.info("Starting backup process...")
        backup = GithubBackupClientAzure(
            env_vars['GITHUB_TOKEN'],
            env_vars['ORG_OR_USER'],
            env_vars['AZURE_STORAGE_ACCOUNT_NAME'],
            env_vars['AZURE_STORAGE_CONTAINER_NAME']
        )
        backup.create_gh_backup()
        logging.info("Backup process completed.")
    except Exception:
        # Log the full traceback and re-raise so the calling job fails visibly.
        logging.exception("An error occurred while creating backups")
        raise


if __name__ == "__main__":
    main()

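For authentication towards Azure, it defaults to EnvironmentCredential, which reads the service principal's details (such as AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET) from environment variables.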
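As a rough sketch of what such an upload could look like with the Azure SDK for Python, assuming the `azure-identity` and `azure-storage-blob` packages are installed: the helper names `build_blob_target` and `upload_archive`, the blob naming scheme, and the example account and organization names are assumptions for illustration, and Repoman's own class may be structured differently.

```python
from datetime import datetime, timezone


def build_blob_target(account_name, org, now=None):
    """Return the storage account URL and a timestamped blob name (hypothetical scheme)."""
    now = now or datetime.now(timezone.utc)
    account_url = f"https://{account_name}.blob.core.windows.net"
    blob_name = f"{org}/{now:%Y-%m-%dT%H%M%SZ}-migration-archive"
    return account_url, blob_name


def upload_archive(archive_bytes, account_name, container_name, org):
    """Upload the archive using credentials taken from the environment."""
    # Imported lazily so the naming helper above works without the Azure SDK.
    # Requires: pip install azure-identity azure-storage-blob
    from azure.identity import EnvironmentCredential
    from azure.storage.blob import BlobServiceClient

    account_url, blob_name = build_blob_target(account_name, org)
    # EnvironmentCredential picks up AZURE_TENANT_ID, AZURE_CLIENT_ID and
    # AZURE_CLIENT_SECRET from the process environment.
    service = BlobServiceClient(account_url=account_url,
                                credential=EnvironmentCredential())
    blob = service.get_blob_client(container=container_name, blob=blob_name)
    blob.upload_blob(archive_bytes, overwrite=True)
    return blob_name


# Only the naming helper runs here; the upload itself needs real credentials.
url, name = build_blob_target(
    "examplestorage", "example-org",
    datetime(2024, 4, 30, 12, 0, tzinfo=timezone.utc),
)
print(url, name)
```

Prefixing the blob name with the organization and a UTC timestamp keeps successive backups side by side in the same container instead of overwriting each other.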

Summary

This blog post announces the release of Repoman, a Python-based project for managing GitHub repositories, teams, and secrets and performing an organization backup to Azure Blob Storage. The post explains the background, requirements, and approach taken to build the tool, which allows simple management of GitHub for people with no prior experience with GitHub and who are relatively new to GitOps. The post covers the backup feature in detail and explains how Repoman uses the Organization Migration API to back up the entire organization to cloud storage.

You can find the project here: https://github.com/O3-Cyber/repoman

We will continue with blog posts on using Repoman to manage GitHub organizations securely.