Kubernetes Migration

Kubernetes Banner

Summary

While working for a US-based iGaming organization, we turned our attention away from client-facing experiences and features and focused instead on improving the underlying infrastructure. The aim of this project was to improve platform reliability, application performance, and developer efficiency by migrating the entire web app, as well as the supporting iOS and Android apps, to a series of Kubernetes clusters. As one of the Product Managers, I was tasked with leading the migration initiative within our team and completing it without disruption to our day-to-day performance.

 Timeline

2020 - 2021

 Role

Product Owner

 Methodology

Agile - Kanban / Agile - Scrum

Challenges

This environment migration, like many others, was attempted during active development of new features and functionality, across many branches and between many teams. Coordinating a project of this magnitude was the first major challenge our teams faced. The migration needed to be completed without disrupting end-user environments or sessions, and cross-team dependencies had to be accounted for so that ongoing development could continue and deliverables could be completed on time.


Google Datacenter

The second major challenge facing this project was the sheer number of environments that needed to be maintained during the migration. The company builds both in-house and white-label SaaS solutions for clients, resulting in dozens of unique environments operating across several US states. This posed a significant challenge, both because of the time required to migrate every environment and because development occurring in the legacy environments had to remain usable without refactoring once the migration was completed.

The final major challenge was the mandate to continue development on other ongoing initiatives, which would be launched to consumers both prior to and in parallel with the environment migration. This mandate meant that developers were tasked with building functionality in the legacy environments while also maintaining that new code in the new cloud environment as the migration progressed.

Solution

After the initial requirements had been gathered and a project plan had been drafted, the Product Owners were briefed to ensure there was a unified understanding of the migration strategy. The migration was tackled in phases. First, proofs of concept (POCs) were completed for the initial Google Kubernetes Engine (GKE) environments, as well as for cloud functionality such as session affinity, which would be necessary to ensure reliability and compliance with any state-by-state regulations governing data retention or access. Next, the DevOps team completed initial MVP builds for our major environments and SaaS partners, followed by rigorous regression testing, which concluded the initiation phase of the project.
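Session affinity of the kind mentioned above can be expressed at the Kubernetes Service level. The following is a minimal sketch rather than the configuration used on this project: the service name, label, ports, and timeout are hypothetical, and it assumes client-IP affinity on a standard Service would be sufficient. The `sessionAffinity` and `sessionAffinityConfig` fields are part of the Kubernetes Service spec, and the printed JSON can be applied in the same way as a YAML manifest.

```python
import json


def build_sticky_service(name: str, app_label: str, port: int,
                         target_port: int, timeout_seconds: int = 10800) -> dict:
    """Return a Service manifest that pins each client IP to one pod.

    All argument values are illustrative; nothing here is taken from
    the project's real configuration.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # Pods carrying this label receive the traffic.
            "selector": {"app": app_label},
            "ports": [{"port": port, "targetPort": target_port}],
            # Route repeat requests from the same client IP to the same pod.
            "sessionAffinity": "ClientIP",
            "sessionAffinityConfig": {
                "clientIP": {"timeoutSeconds": timeout_seconds}
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and ports, for illustration only.
    manifest = build_sticky_service("web-app", "web-app", 80, 8080)
    print(json.dumps(manifest, indent=2))
```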

Once the initiation phase was finished, the migration was carried out in phases by environment (DEV, QA, PRE-PROD, PROD, etc.). Cross-team dependencies were mapped with the help of the Solution Architects and Product Owners, and each environment was migrated in sequence based on these dependency maps, effectively eliminating the risk of dropped dependencies during migration. Environments were also migrated in sequence based on our continuous integration/continuous delivery (CI/CD) pipeline, which allowed ongoing development to continue during the migration, as updated code would be merged into the upper environments prior to their cloud migration. In select situations, teams were tasked with completing a migration 'snapshot', where the entire migration was completed at once, to allow for the development of features that would only be available to end users after the launch of the cloud environment. In these cases, the dependencies were evaluated and mitigated prior to the migration, and development of these new features was completed in the new cloud environment instead of the legacy stack.
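Sequencing work from a dependency map is, in effect, a topological sort. The sketch below illustrates the idea with Python's standard-library `graphlib`. The component names and their dependencies are hypothetical and are not drawn from the actual project plan, which was managed through planning artifacts rather than code.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def migration_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every item follows the items it depends on.

    `dependencies` maps each item to the set of items that must be
    migrated before it. graphlib raises CycleError if the map contains
    a circular dependency, which would need resolving before planning.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical components; each value lists what must move first.
    deps = {
        "auth-service": set(),
        "wallet-service": {"auth-service"},
        "web-app": {"auth-service", "wallet-service"},
        "ios-api-gateway": {"web-app"},
    }
    print(migration_order(deps))
```

Because a circular dependency raises an error instead of producing a silent ordering, this kind of check can surface conflicts in a dependency map before any migration work is scheduled.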

Flow Diagram

In the final steps, the production environments were migrated and tested thoroughly to ensure performance was maintained once live in the cloud. Final quality assurance (QA) testing and preparations were completed to ensure that the production cutover to the GKE environments would not result in disruption or downtime for any systems or applications. Once all testing had passed, a Go/No-Go decision was made and the production applications were switched over to the new cloud infrastructure.

Analysis

The Kubernetes Environment Migration project was a massive undertaking that required the coordination and cooperation of nearly the entire technical team within our organization. The intricate web of cross-dependencies and interrelated systems created the need for a detailed and thorough project plan. This was one of the few software-related projects in my career that was better suited to a traditional waterfall project management methodology than to the agile methodologies that are typically used.

Falling back on my training as a Project Manager made it easier to digest the nearly 1,000-item Gantt chart governing this project, and gave me a perspective that allowed me to adjust the operations of my development team to better align with the more sequential goals and milestones of the environment migration. This was extremely important to the project overall. Had the independently managed development teams attempted the migration in the self-governing manner prescribed by the principles of agile, platform or product interdependencies might have been missed or not considered as readily, and the migration could have experienced significant delays. These delays would not only have impacted the migration project, but could also have delayed product feature development and deployment, as resources would have been pulled off other projects to help mitigate these risks.

Ultimately, this project was successful due to the precise solutioning and planning completed during the project initiation phase. Already an essential step in traditional project management, this front-loaded planning would typically have been avoided in an agile process, which instead relies on strongly defined requirements and in-depth refinement by the individual development teams. In this case, however, the waterfall project management methodology allowed for minimized risk, maximized collaboration, and the successful completion of the Kubernetes Migration.