GitLab is an awesome mess

I’ve been administering my company’s GitLab instance for a month or two, on both the frontend and the server side. It is a Premium instance, not that big (around 300 users and 1500-ish active projects), and is managed via the Omnibus package (the easiest installation method available). Here is my limited experience so far, along with the interesting things I have bumped into.

The good

Always start with the positives, to keep hopes up 😄!

Back in the day, I loved Gitea for its simplicity and deemed GitLab an unworthy dump of overcomplicated features. Yet, despite all my stupid reasoning, people kept self-hosting GitLab at scale, claiming it to be a necessary evil. And I finally know why. Apart from the sluggish frontend and the huge number of bloated features I’d rather implement myself or not use at all (e.g. Terraform managed state backend, K8s connection per project, AutoDevOps, SAST), GitLab actually does lots of things right.

Firstly, I have finally come to understand and like the micro-component architecture of GitLab. You have Gitaly as the middleman for Git storage, GitLab Shell specifically to manage SSH access, and so on. It allows each part to scale horizontally, independently of the others, which is always a plus. Also, Geo, for me personally after a month of experimenting, is Disaster Recovery done right. Sadly, it’s a locked-down feature and not available in the standard Community installation. In comparison, Gitea, at the time of writing, simply doesn’t scale at all. If High Availability is your concern, GitLab is the obvious choice.

Then there is the famous CI/CD. It’s so well integrated into GitLab that once GitLab is your code platform, no other CI/CD solution matters anymore. Setting aside a few misleading predefined CI/CD variables and the occasional weird YAML merging behavior, the .gitlab-ci.yml schema is pretty clean (without the deep keyword nesting of GitHub Actions), well-defined and flexible. My only complaint, as of now, is that it’s still a YAML document. default:, include: and !reference[] exist, but at the end of the day, no YAML anchors or fancy features inside the YAML engine can save you from a big, ugly CI/CD configuration. I’d be glad if they added support for Jsonnet [1].

The bad

You can’t avoid talking ill about something.

Shipping unfinished features in new releases

A great example of this is batched background migrations, enabled in version 13.12. You have to wait for all batched migrations to complete before moving on to the next upgrade. There wouldn’t be a story to tell if things went that smoothly. The CI/CD table conversion job fails easily on large instances (mine did, and you can find countless people complaining about it on the Internet). So how are you supposed to fix the failed batched migrations? There was no Retry button in the web UI until version 14.3, and the gitlab-rake command to manually rerun batched migrations didn’t exist until 14.1. The workaround, at least in my case:

# Upgrade to 14.1. It will fail miserably, but that's fine.

# Set postgresql['statement_timeout'] in /etc/gitlab/gitlab.rb to a value greater than
# the default 60s, then run `gitlab-ctl reconfigure` to apply it
# (it took me 2 days straight to find out this was the culprit behind the failed migration)

# Get the failed migration ID
$ gitlab-rake db:migrate:status | grep ' down '

# Temporarily mark it as finished
$ gitlab-rake gitlab:db:mark_migration_complete[<migration_id>]

# Run the rest
$ gitlab-rake db:migrate

# Unmark the failed migration
$ gitlab-psql
> DELETE FROM schema_migrations WHERE version IN ('<migration_id>');

# Manually start the batched migration as GitLab Docs tells you to

And if it still doesn’t work? Just go to the migration file list, find the specific Ruby migration code that failed in the batch, and replicate it with SQL commands directly inside the gitlab-psql console. Remember to mark the migration as finished afterward.

Another recent example is the introduction of the admin_mode scope for Personal Access Tokens in version 15.8. All admin tokens have this scope injected into them automatically, while there is no way to set or unset it through the GitLab API. This means that all my Pulumi automation code for managing admin access tokens is now broken: Pulumi tries to recreate the tokens every time it runs, because the scopes no longer match the ones in its state. How nice!
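
To make the mismatch concrete, here is a minimal Python sketch of the kind of comparison that goes wrong. It is not Pulumi's or GitLab's actual diff logic; the `scopes_drifted` helper and the `SERVER_INJECTED_SCOPES` set are names made up for illustration. The idea is simply to drop the server-injected scope from both sides before comparing.

```python
# Hypothetical illustration only: neither Pulumi nor GitLab exposes this helper.
SERVER_INJECTED_SCOPES = frozenset({"admin_mode"})


def scopes_drifted(declared, actual, ignored=SERVER_INJECTED_SCOPES):
    """Return True when the scope lists differ after dropping ignored scopes."""
    return set(declared) - ignored != set(actual) - ignored


# Scopes declared in the automation code vs. what the server reports back.
declared = ["api", "read_repository"]
actual = ["api", "read_repository", "admin_mode"]

naive_drift = set(declared) != set(actual)      # plain comparison sees a change
tolerant_drift = scopes_drifted(declared, actual)  # injected scope is ignored

print(f"naive: {naive_drift}, tolerant: {tolerant_drift}")
```

In Pulumi itself, the `ignore_changes` resource option applied to the token's scopes property may serve a similar purpose, at the cost of also hiding intentional scope changes.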


Since I mentioned tokens, just take a look at all the different ways your automation can access private GitLab resources [2].

  • CI_JOB_TOKEN is the unique one and is nicely implemented, security-wise, but its scopes are so limited that it turns out to be useless most of the time.
  • I wonder why Deploy Keys even exist. They don’t enforce an expiration date, so they quickly become stale and unmanageable. And I find people usually use them the same way as Deploy Tokens (the only access method I can actually bear).
  • Personal Access Tokens, Project Access Tokens and Group Access Tokens are a mess. They are powerful, but can quickly and easily be forgotten, as the usual case is to generate one and set it in a CI/CD variable. With the upcoming 16.0 release, all your old, poorly maintained pipelines might suddenly break out of nowhere as their forgotten tokens expire.
  • To make things worse, instance-level access tokens aren’t a thing [3], so you get stuck with bot-like accounts for automation that spans top-level groups, which occupy license seats and waste $19 USD/month each on your Premium plan [4].
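
One way to keep forgotten tokens from biting you is a periodic expiry check. Below is a rough sketch, assuming you have already fetched the token list as dictionaries with `name`, `revoked` and an ISO-formatted `expires_at` field. The sample data and the `expiring_tokens` helper are hypothetical, and fetching the list from your instance is left out on purpose.

```python
from datetime import date, timedelta


def expiring_tokens(tokens, today, within_days=30):
    """Return (expiry, name) pairs for active tokens that expire on or
    before `today + within_days`, soonest first."""
    cutoff = today + timedelta(days=within_days)
    soon = []
    for token in tokens:
        expires_at = token.get("expires_at")
        if token.get("revoked") or not expires_at:
            continue  # revoked already, or no expiry date set
        expiry = date.fromisoformat(expires_at)
        if expiry <= cutoff:
            soon.append((expiry, token["name"]))
    return sorted(soon)


# Made-up sample data standing in for a token listing.
sample = [
    {"name": "ci-deploy", "revoked": False, "expires_at": "2023-03-15"},
    {"name": "renovate", "revoked": False, "expires_at": "2024-01-01"},
    {"name": "legacy-sync", "revoked": False, "expires_at": None},
    {"name": "old-bot", "revoked": True, "expires_at": "2023-03-10"},
]

report = expiring_tokens(sample, today=date(2023, 3, 1))
for expiry, name in report:
    print(f"{name} expires on {expiry.isoformat()}")
```

Running something like this on a schedule and posting the report to a chat channel would at least turn a surprise outage into a to-do item.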


Besides technical knowledge, these 2 months of hands-on experience with GitLab also gave me some good memories. All those times pulling all-nighters with co-workers to upgrade GitLab, gossiping about random things while waiting for the pre-upgrade backup to finish, were really fun and memorable. So I appreciate GitLab for the mess that it is: it gives me tons of frustration, but it also introduces awesome features and brings me joy in maintaining it (who doesn’t like almighty power over all the company’s projects, to be honest 😄).

There isn’t a better way to end the post, so let me share one final story.

Just a week or two ago, the GitLab Docs site died (500) for roughly an hour while I was hotfixing a broken production pipeline. I panicked for a few minutes, unable to access the CI/CD YAML schema reference, before realizing we already self-hosted an instance of GitLab Docs for situations like this [5].

After that, we went to check the GitLab Docs issues and found their DevOps team “fighting” each other in the comment section over why the pipelines had failed. That’s when we learned that GitLab devs also like smashing the Retry button and praying to the gods for their pipelines to magically succeed, even though the logs keep saying otherwise.

That’s all for me. Thanks for reading until the end!

  1. Praise to DroneCI and Agola for doing this way better than GitLab CI/CD. ↩︎


  3. There is a 2-year-old open issue. ↩︎

  4. Even the GitLab organization itself hasn’t gotten away from @gitlab-bot yet, after all this time. ↩︎

  5. Each GitLab instance comes with a copy of GitLab Docs (check <your_instance_url>/help), but it looks less fancy with no search function or the left navbar. ↩︎