Categories
Software Development Technology and Internet

Mirroring GitHub Repositories

Git by itself is a distributed version control system (a very popular one), but over the years organizations started to rely on some internet services to manage their repositories and those services eventually become the central/single source of truth for their code.

The most well known service out there is GitHub (now owned by Microsoft), which nowadays is synonymous of git for a huge amount of people. Many other services exist, such as Gitlab and BitBucked, but GitHub gained a notoriety above all others, specially for hosting small (and some large) open source projects.

These centralized services provide many more features that help managing, testing and deploying software. Functionality not directly related to the main purpose of git.

Relying on these central services is very useful but as everything in life, it is a trade-off. Many large open source organizations don’t rely on these companies (such as KDE, Gnome, Debian, etc), because the risks involved are not worth the convenience of letting these platforms host their code and other data.

Over time we have been witnessing some of these risks, such as your project (and all the related data) being taken down without you having any chance to defend yourself (Example 1 and Example 2). Very similar to what some content creators have been experiencing with Youtube (I really like this one).

When this happens, your or your organizations don’t lose the code itself since you almost certainly have copies on your own devices (thanks to git), but you lose everything else, issues, projects, automated actions, documentation and essentially the known used by URL of your project.

Since Github is just too convenient to collaborate with other people, we can’t just leave. In this post I explain an easy alternative to minimize the risks described above, that I implemented myself after reading many guides and tools made by others that also tried to address this problem before.

The main idea is to automatically mirror everything in a machine that I own and make it publicly available side by side with the GitHub URLs, the work will still be done in Github but can be easily switched over if something happens.

The software

To achieve the desired outcome I’ve researched a few tools and the one that seemed to fit all my requirements (work with git and be lightweight) was “Gitea“. Next I will describe the steps I took.

The Setup

This part was very simple, I just followed the instructions present on the documentation for a docker based install. Something like this:

version: "3"

networks:
  gitea:
    external: false

services:
  server:
    image: gitea/gitea:latest
    container_name: gitea
    environment:
      - USER_UID=1000
      - USER_GID=1000
    restart: always
    networks:
      - gitea
    volumes:
      - ./gitea:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "3000:3000"
      - "222:22"

If you are doing the same, don’t copy the snippet above. Take look here for updated instructions.

Since my website is not supposed to have much concurrent activity, using an SQLite database is more than enough. So after launching the container, I chose this database type and made sure I disabled the all the functionality I won’t need.

Part of Gitea's configuration page. Server and Third-Party Service Settings.
Part of the Gitea’s configuration page

After this step, you should be logged in as an admin. The next step is to create a new migration on the top right menu. We just need to choose the “Github” option and continue. You should see the below screen:

Screenshot of the page that lets the users create a new migration/mirror in Gitea.
Creating a new Github migration/mirror in Gitea.

If you choose This repository will be a mirror option, Gitea will keep your repository and wiki in sync with the original, but unfortunately it will not do the same for issues, labels, milestones and releases. So if you need that information, the best approach is to uncheck this field and do a normal migration. To keep that information updated you will have to repeat this process periodically.

Once migrated, do the same for your other repositories.

Conclusion

Having an alternative with a backup of the general Github data ended up being quite easy to set up. However the mirror feature would be much more valuable if it included the other items available on the standard migration.

During my research for solutions, I found Fossil, which looks very interesting and something that I would like to explore in the future, but at the moment all repositories are based on Git and for practical reasons that won’t change for the time being.

With this change, my public repositories can be found in:

Edit: Due to the described limitations of the mirror functionality in Gitea. My self-hosted mirror ended up being less useful than previously estimated. For that reason, it was shutdown on 12/07/2024.

By Gonçalo Valério

Software developer and owner of this blog. More in the "about" page.