Airflow with ArgoCD, Kustomize, and Helm — Introducing CI/CD for Our Data Scientist Team

Deploy Airflow on ArgoCD using Airflow’s Helm chart, manage the deployment with Kustomize, and get a fully automated Airflow CI/CD pipeline.

[Figure: Airflow ArgoCD full architecture]

Prerequisites for following this blog post include:

  1. Basic knowledge of K8s and ArgoCD. I recommend this video to get an overview of what ArgoCD is.
  2. Basic knowledge of Kustomize.

Here is the repository for deploying Airflow on ArgoCD.

Before you start using the repository, you need to have:

  1. A running K8s cluster. If you want to get a lightweight K3s cluster up and running, you can check out this online tutorial.
  2. A customized Airflow image, if your use case requires one. Building such an image is outside the scope of this article.

The Benefits of Today’s Tools

  1. Why Kustomize?

Airflow comes with a nice Helm chart, and our team has been using it for a while to play with Airflow. However, as we push Airflow to production, we need a way to configure one Airflow cluster for development use and another for production use.

We want the two clusters to accept different configurations for ingress routes, passwords, worker replicas, etc. Using two values.yaml files doesn’t seem like the most elegant approach, and it is not as convenient when deploying with ArgoCD. Using different branches also creates problems and results in a pile of merge conflicts over time. Kustomize solves this deployment-configuration issue quite well, and it pushes us to keep up our infrastructure-as-code culture.
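To make this concrete, here is a minimal sketch of the kind of overlay we are describing. The directory layout, resource names, and values below are illustrative assumptions, not the exact contents of our repository:

```yaml
# overlays/develop/kustomization.yaml — hypothetical develop overlay
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                # shared manifests rendered from the Airflow Helm chart
patches:
  - target:
      kind: Deployment
      name: airflow-worker    # illustrative name for the chart's worker workload
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 1              # fewer worker replicas in develop than in production
```

A sibling overlays/production/kustomization.yaml would reference the same base and apply its own patches, so both environments stay in one branch of one repository.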

  2. Why use ArgoCD?

ArgoCD’s declarative GitOps model enables a K8s-native CD workflow. Because we no longer need to store K8s credentials in our pipeline server (GitHub or GitLab), credential management becomes a bit easier.
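As an illustration, an ArgoCD Application for the develop environment could look like the following. The repository URL and paths are placeholders, not our actual setup:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: airflow-develop
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/airflow-argocd.git  # placeholder repo URL
    targetRevision: HEAD
    path: overlays/develop        # Kustomize overlay for this environment
  destination:
    server: https://kubernetes.default.svc
    namespace: airflow
  syncPolicy:
    automated: {}                 # sync automatically when Git changes
```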

Architecture and Scenario

Scenario Walkthrough:

  1. When data scientists realize they need a new dependency in Airflow, they update the Airflow image spec, which triggers an automatic build that pushes a new image to Docker Hub (check out this post for details).
  2. Once they have the new image, they use it to continue developing the dags in their own dags repository and open a PR against the dags repository’s develop branch.
  3. Before the dag is deployed, data scientists update the Airflow ArgoCD repository to point the Airflow image at the new image version (see the sketch after this list).
  4. ArgoCD picks up the update automatically and updates the develop environment’s Airflow.
  5. Data scientists then merge their code into the develop branch of the dags repository.
  6. The develop environment’s Airflow picks up the dag change, and we can run tests in the develop environment.
  7. After all tests pass, we deploy Airflow to production and merge the dags repository’s develop branch into master.
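For step 3, pointing the cluster at the new image can be a one-line change in the environment’s overlay, for example via Kustomize’s images transformer. A minimal sketch with hypothetical image names and tags:

```yaml
# overlays/develop/kustomization.yaml (excerpt) — hypothetical image pin
images:
  - name: apache/airflow        # image referenced by the chart's manifests
    newName: ourteam/airflow    # hypothetical custom image on Docker Hub
    newTag: "2.2.4-deps-2"      # bumping this tag is what ArgoCD picks up in step 4
```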

Making Airflow Work on ArgoCD

  1. ArgoCD doesn’t natively support combining Kustomize with a Helm chart (i.e., kustomize build --enable-helm), so we need to install a plugin that generates the YAML files from the Helm chart. See the README for installation instructions. In the plugin, we essentially replace the manifest-generation command with kustomize build --enable-helm (see the first sketch after this list).
  2. The approach we took is to use Kustomize as the main configuration-management tool and translate the Helm chart into Kustomize; to do so, we installed the plugin in ArgoCD. If you know of a more elegant approach, please let me know; your suggestions would be much appreciated.
  3. The overlays folder is where we make Airflow’s gitSync pull from different branches depending on the deployment environment (see the second sketch after this list).
  4. base/kustomization is where we translate the Helm chart’s job-hook flags into ArgoCD hooks. Since we deploy with Kustomize rather than Helm, we need these flags so that ArgoCD knows when to run these jobs and how to handle the hook deletion policy (see the third sketch after this list).
  5. For detailed steps, please visit the repository’s README file.
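For item 1, the plugin is registered as an ArgoCD config management plugin whose generate step runs kustomize build --enable-helm. Below is a sketch of an argocd-cm-style registration; the plugin name is arbitrary, and the README has the actual installation steps:

```yaml
# argocd-cm ConfigMap (excerpt) — sketch of the plugin registration
data:
  configManagementPlugins: |
    - name: kustomize-enable-helm
      generate:
        command: ["sh", "-c"]
        args: ["kustomize build --enable-helm"]
```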
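For item 3, the per-environment branch selection comes down to overriding the chart’s gitSync values in each overlay. A sketch, assuming a placeholder dags repository URL:

```yaml
# overlays/develop values override (sketch) — gitSync tracks the develop branch
dags:
  gitSync:
    enabled: true
    repo: https://github.com/example/dags.git  # placeholder dags repository
    branch: develop                            # the production overlay points at master
```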
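For item 4, Helm’s job hooks map onto ArgoCD’s hook annotations. A sketch of the kind of annotations we patch onto the chart’s migration job (the job name is illustrative):

```yaml
# patched Job metadata (sketch) — translating Helm hooks into ArgoCD hooks
apiVersion: batch/v1
kind: Job
metadata:
  name: airflow-run-airflow-migrations  # illustrative job name from the chart
  annotations:
    argocd.argoproj.io/hook: Sync                         # run the job during each sync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded  # delete the Job once it succeeds
```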

And with that, you’ve cleared another level on your way to becoming a boss coder. GG! 👏

I hope you found this article instructional and informative. If you have any feedback or queries, please let me know in the comments below. And follow SelectFrom for more tutorials and guides on topics like Big Data, Spark, and data warehousing.



By schwannden