AWS Migration - The Architect's Handbook
The How & Why of migrating to AWS Cloud
Cloud is the way to go. And AWS is the most popular cloud service. So it is natural that everyone wants to migrate to AWS. But often that is not as simple as it sounds. The bigger and older your legacy system, the harder it is to migrate.
Amazon suggests a step-by-step process for performing such a migration. Here are my two cents on that.
Why Migrate?
The benefits of being on the cloud have been advertised over and over in the technical community, so I won't repeat them here. Instead, let's look at some of the reasons why people hesitate to migrate.
Do we really need to migrate?
This question has held back many of us, and a lot of money is wasted every year because organizations cannot reach a conclusion on it. The answer is: yes. Even the smallest application deserves a place in the cloud. Maybe yours is a simple desktop application that is doing perfectly well within its own four walls. Even then, connecting it to AWS can add value that is hard to imagine without the cloud.
Cloud Lock-in
That's another doubt bubbling in our minds. Well, come to think of it, if we are not on the cloud, we are locked into our own premises today! And that lock-in is a lot more wasteful.
Jokes aside, cloud lock-in is not a major problem anymore. In the early days, people had serious doubts about whether the cloud craze would sustain itself, so naturally they were cautious. Things have changed since then. Nothing is permanent, but going by the numbers, our on-premise systems are going to break down long before Amazon does. AWS is not going anywhere, and we can bank on it while planning our roadmap. The strong competition among cloud providers will also make sure we keep getting a good price.
Security
Some organizations worry whether their applications and data are secure on a public cloud. The fact is that nothing is entirely safe: if an application is connected to the internet, it is only a matter of time before hackers probe it. Keeping it safe requires a strong team of security experts, and unless information security is our core expertise, building such a team is very difficult. So data on our own premises is not automatically secure either.
Amazon, on the other hand, has a huge army of experts who can sustain the war against these hackers. If we follow the security guidelines they define for us, our data on the cloud is a lot more secure than data on premise. Nothing is free from hackers; Amazon, Google and all the giants have been attacked and live under the threat of more attacks. Yet they are a lot safer than our own premises, because they understand security much better than we do.
How to Migrate
Now that we have chosen the path, we are all set to get started. To help us migrate fast, Amazon provides a roadmap and a good set of utilities and services. Let us understand them in detail.
Commonly referred to as the two models - the five phases and the six R's - this jargon is heard all over the industry. There is no one-size-fits-all strategy for application migration, but following these best practices can help us discover the right strategy for our application and organization.
Amazon recommends these with a disclaimer: every organization is different in culture and targets, so the actual roadmap has to be chosen by its architects. These best practices provide a good starting point - a baseline template that we can alter as per our requirements.
Five Phase Migration Process
With the five-phase migration process, you start with the least complex application to learn how to migrate while learning more about the target platform, and then build toward the more complex applications.
This is not a waterfall; it is an iterative process that lets us plan the migration and carry it out step by step:
Phase 1 - Migration Preparation and Business Planning
If you don’t have a plan, you are planning to fail.
Migrating an enterprise - a live system - is a significant effort, and a concrete plan is a must for any business endeavor of this size. That requires a good understanding of the baseline architecture, and it needs engaged leadership that understands the costs and benefits and is serious about the project.
This enables us to define a roadmap that is aggressive in its goals and realistic in its timelines - a roadmap that provides fast, incremental returns and keeps the leadership engaged and interested in the process.
Phase 2 - Portfolio Discovery & Planning
Most legacy systems were beautifully architected once upon a time. Over the years, however, they degenerate into a complex tangle of interconnected components that violates every possible architecture, design and coding guideline.
It is important to take stock of the current state of the enterprise and identify the complexity of, and dependencies among, individual components, so that we can segregate them into building blocks that can be migrated one by one.
Based on this exercise, we can identify parts that are simpler and less critical than others. Naturally, we move these first. This gives us confidence in the process and trains the developers, who learn from the problems encountered along the way.
But even before we do this, we should work on a POC. Identify the typical interfaces and flows of each system and, based on these, implement a simple "hello world" on the cloud that connects to similar databases and interfaces within the cloud. This helps us understand what is feasible.
A friend once told me about his experience with a migration. His team had migrated most of the estate but were stuck on a few web applications. They had a MEAN stack that they wanted to run on Elastic Beanstalk, and they struggled with it until a few days before the deployment - only to discover that MongoDB does not run well on Elastic Beanstalk. Since it was AWS, it did not take them long to spin up an independent DocumentDB instance and get the whole thing working in a day. But such problems are better identified in a POC rather than a day before go-live.
Some points to note in this phase:
A major pitfall at this stage is using the AWS Console to create resources and deployments on the cloud. This is very tempting - the UI is intuitive and offers helpful tips - but it can land us in a deep mess. It is almost certain that we will not be able to replicate such a deployment without error, so we always live with the risk that the tested application will not work on the production setup. It is therefore very important to have a CloudFormation or Terraform script ready to run on the test and production environments. For those already in a mess with this, AWS CloudFormation now lets us generate a template out of resources that were built using the console. But it is best not to depend on such saviors and to get into the habit of using Infrastructure as Code from the start.
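As a minimal sketch of that habit, here is how a versioned template could be deployed with boto3 instead of console clicks. The template file, stack name and parameter are hypothetical placeholders; the point is that the same script runs identically against test and production.

```python
# Deploy a CloudFormation stack from a versioned template file (boto3 sketch).
import boto3

cloudformation = boto3.client("cloudformation", region_name="us-east-1")

# "webapp-stack.yaml" is a placeholder for a template kept under version control.
with open("webapp-stack.yaml") as f:
    template_body = f.read()

response = cloudformation.create_stack(
    StackName="legacy-webapp-test",            # hypothetical stack name
    TemplateBody=template_body,
    Parameters=[
        {"ParameterKey": "Environment", "ParameterValue": "test"},
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],
)
print("Stack creation started:", response["StackId"])

# Wait until the stack is fully created before running any smoke tests.
waiter = cloudformation.get_waiter("stack_create_complete")
waiter.wait(StackName="legacy-webapp-test")
```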
Use CodeCommit, and switch over to the native AWS CI/CD services such as CodePipeline, CodeBuild and CodeDeploy. This should be part of the POC and of the migration checklist: an application migration is marked complete only after its pipeline is set up in AWS. People often miss this step when migrating to the cloud and then struggle with transporting binaries built on premise. Don't waste an EC2 instance just for Jenkins. These services have a very small learning curve and should be learnt and put into practice from day one.
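A minimal sketch of the first step of that checklist, with a hypothetical repository name: create the CodeCommit repository that the rest of the pipeline will hang off, rather than keeping source and Jenkins on premise.

```python
# Create a CodeCommit repository to host the migrated application's source.
import boto3

codecommit = boto3.client("codecommit", region_name="us-east-1")

repo = codecommit.create_repository(
    repositoryName="legacy-webapp",   # hypothetical repository name
    repositoryDescription="Source for the migrated web application",
)

# Point the existing git remote at this URL; CodePipeline/CodeBuild can then
# build and deploy entirely inside AWS.
print("Clone URL:", repo["repositoryMetadata"]["cloneUrlHttp"])
```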
Use CloudWatch for monitoring and for managing logs. Don't waste an EC2 instance on collecting and processing system-wide logs. Native CloudWatch has many advantages over custom, home-grown log processors - don't forgo them in the hurry to complete the POC.
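To make the idea concrete, here is a minimal sketch of pushing application log lines straight into a CloudWatch log group; the group and stream names are assumptions. In practice the CloudWatch agent or a logging library does this for us - the point is simply that logs land in CloudWatch, not on a local disk.

```python
# Write application log events to a CloudWatch Logs group (boto3 sketch).
import time
import boto3

logs = boto3.client("logs", region_name="us-east-1")

group, stream = "/migrated/legacy-webapp", "app-server-1"   # hypothetical names

for create, kwargs in [
    (logs.create_log_group, {"logGroupName": group}),
    (logs.create_log_stream, {"logGroupName": group, "logStreamName": stream}),
]:
    try:
        create(**kwargs)
    except logs.exceptions.ResourceAlreadyExistsException:
        pass  # already created on a previous run

logs.put_log_events(
    logGroupName=group,
    logStreamName=stream,
    logEvents=[{"timestamp": int(time.time() * 1000),
                "message": "order-service started"}],
)
```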
URLs are going to consume a huge amount of your time. Everyone knows the best practices, yet in every legacy application you will find URLs hardcoded in the most awkward parts of the code. Some of them are constructed dynamically (some brilliant mind worked on that), making them very difficult to identify, and a lot of this code is invoked only in rare scenarios that testers will certainly miss. When we migrate to the cloud, all these URLs have to be found and changed to their corresponding URLs on the cloud. This can be a tedious activity if we do not plan for it, so we should invest in a code scanner that can identify such URLs hidden in the legacy code. This also helps us generate a better and more accurate network model of the enterprise.
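A minimal sketch of the kind of scanner meant here: walk a legacy source tree and report every literal URL, along with its file and line, so it can be mapped to its cloud equivalent. The directory and file extensions are assumptions, and dynamically constructed URLs will still need a manual hunt.

```python
# Scan a source tree for hardcoded URLs hidden in legacy code.
import re
from pathlib import Path

URL_PATTERN = re.compile(r"""https?://[^\s'"<>)]+""")

def find_hardcoded_urls(root, extensions=(".java", ".js", ".xml", ".properties")):
    """Yield (file, line number, url) for every literal URL found."""
    for path in Path(root).rglob("*"):
        if path.suffix not in extensions or not path.is_file():
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            for url in URL_PATTERN.findall(line):
                yield path, lineno, url

# "./legacy-app" is a placeholder for the checked-out legacy code base.
for path, lineno, url in find_hardcoded_urls("./legacy-app"):
    print(f"{path}:{lineno}: {url}")
```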
Phase 3/4 - Design, Migrate, Validate
This has to be an iterative, agile process. We cannot just switch off the on-premise data center and have everything ready on the cloud in a single shot; we need a hybrid architecture for the few years the migration takes. We pick applications one by one and migrate them to the cloud.
Based on the learnings from Phase 2, we make the necessary changes to each application and move it to the cloud, one application at a time. We can take bigger steps as we gain confidence.
A few important points to note in this phase:
Don't try to redesign at this stage. Take small steps: first migrate everything to the cloud, onto EC2 instances. When we migrate individual components, though, we should try to place them on separate EC2 instances. Since we are splitting them anyway, this is not an additional effort, but it will pay off in the later steps. Enable CloudWatch logs and metrics to help us identify the usage and loading patterns of each EC2 instance.
We should take the effort to identify the right EC2 instance type for each component, and reserve it. There is a huge difference between the cost of a reserved and an on-demand EC2 instance. We can start with an on-demand instance and evaluate our requirement, but we should identify and reserve the right instance as early as possible. Reserve for a short duration, so that we can change it as we learn.
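A minimal sketch of how that evaluation could look, using the usage data already flowing into CloudWatch; the instance ID is a placeholder and the two-week window is an assumption.

```python
# Pull recent CPU utilisation for an instance to help choose what to reserve.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    StartTime=datetime.now(timezone.utc) - timedelta(days=14),
    EndTime=datetime.now(timezone.utc),
    Period=3600,                      # hourly data points
    Statistics=["Average", "Maximum"],
)

datapoints = sorted(stats["Datapoints"], key=lambda d: d["Timestamp"])
peak = max((d["Maximum"] for d in datapoints), default=0)
print(f"Hourly peak CPU over two weeks: {peak:.1f}%")
# A consistently low peak suggests a smaller instance type can be reserved.
```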
Plan out a good VPC structure and follow the principle of least privilege. Even if all the instances are within our purview, if they are not required to connect with each other, they should not be able to connect. This hardens the security of our system. It is easy and tempting to postpone such an effort until everyone forgets about it, but it pays off in the long run if we do it right now.
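A minimal sketch of least privilege between two tiers, with placeholder VPC and group names: the database security group accepts traffic only from the application tier's security group, on one port, and nothing else can connect.

```python
# Least-privilege security groups: app tier may reach the DB tier, nobody else.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

app_sg = ec2.create_security_group(
    GroupName="app-tier", Description="Application servers", VpcId="vpc-0abc1234"
)["GroupId"]
db_sg = ec2.create_security_group(
    GroupName="db-tier", Description="Database servers", VpcId="vpc-0abc1234"
)["GroupId"]

# Only the app tier may reach the database, and only on the PostgreSQL port.
ec2.authorize_security_group_ingress(
    GroupId=db_sg,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "UserIdGroupPairs": [{"GroupId": app_sg}],
    }],
)
```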
All through the migration process we will have data flowing between our on-premise deployment and the cloud, and this can be a security risk. People try to cover it by restricting access based on IP addresses and the like, but such protection is quite easy to fool. AWS provides a range of services for implementing a hybrid architecture properly - Outposts, Local Zones and others. They are not that costly, and certainly worth it when we consider the security risk they mitigate.
Migrating compute is relatively simple, but data migration is a major challenge, and we have two problems. The first is that the huge amount of legacy data just cannot be migrated over the wire. Services like Snowball, Snowmobile and Snowcone and their variants - collectively called the Snow Family - provide great options for doing this.
The second, bigger challenge is keeping the on-premise data in sync with the data on the cloud throughout the migration process. AWS Storage Gateway, AWS Backup and AWS DataSync help us do this very efficiently.
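As a minimal sketch of the DataSync option, with placeholder location ARNs: a task that re-syncs an on-premise share to an S3 bucket on a schedule, so both sides stay aligned during the hybrid period.

```python
# Create a scheduled DataSync task between two pre-registered locations.
import boto3

datasync = boto3.client("datasync", region_name="us-east-1")

task = datasync.create_task(
    # Both location ARNs below are placeholders for locations created earlier
    # (an on-premise NFS share and an S3 bucket).
    SourceLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-onprem-nfs",
    DestinationLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-s3-bucket",
    Name="legacy-share-to-s3",
    # Re-run the sync every hour while the hybrid setup is in place.
    Schedule={"ScheduleExpression": "rate(1 hour)"},
)
print("DataSync task:", task["TaskArn"])
```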
Phase 5 - Modern Operating Model
Now that the application is migrated, we have to focus on making it cloud-worthy.
If we ignore this step, all the effort above will be wasted and we might even end up increasing costs. So we have to go through the effort of identifying areas where we can optimize.
By now we have a good fleet of EC2 instances that have been running for several months, and we have good data on their usage and loading patterns. Based on this, we can revisit our choice of EC2 instances. Servers that are not doing anything time-critical can be changed to Spot Instances. If the load varies with time, we can use Auto Scaling with a Load Balancer. Auto Scaling can be self-defeating if we lose the advantage of reserved instances, but if we know the load patterns we can also reserve instances for specific time slots and get the best of both worlds. Even if the load does not fluctuate, it is a good idea to put the EC2 instances behind an ELB with Auto Scaling: the cost is insignificant, and it can save the day at some point in the future. Such horizontal scaling should be part of any design on the cloud - instead of a few large EC2 instances, it makes a lot of sense to use smaller instances behind an ELB.
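One concrete way to exploit a known load pattern is a scheduled Auto Scaling action. A minimal sketch, assuming a hypothetical Auto Scaling group and sizes: scale out for business hours, scale back in at night.

```python
# Time-slot based scaling for a group with a predictable daily load pattern.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Scale out at 08:00 UTC on weekdays...
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="webapp-asg",            # hypothetical group name
    ScheduledActionName="business-hours-scale-out",
    Recurrence="0 8 * * 1-5",
    MinSize=4, MaxSize=10, DesiredCapacity=6,
)
# ...and back in after hours, when a couple of instances are enough.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="webapp-asg",
    ScheduledActionName="after-hours-scale-in",
    Recurrence="0 20 * * 1-5",
    MinSize=2, MaxSize=4, DesiredCapacity=2,
)
```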
Slowly dockerize the application. Since we have already split the enterprise across multiple EC2 instances, it is now easy to identify components that can be modified without affecting the others. Identify potential Lambda functions - there are a lot more of them than we might imagine.
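A minimal sketch of the kind of small, stateless piece that makes a good Lambda candidate: a handler triggered by S3 object-created events that copies each new object to an archive bucket. The bucket name is hypothetical; the event shape is the standard S3 trigger.

```python
# Lambda handler for S3 object-created events (illustrative candidate workload).
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Copy every newly created object to an archive bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        s3.copy_object(
            Bucket="legacy-webapp-archive",        # hypothetical archive bucket
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
        )
    return {"processed": len(event["Records"])}
```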
Over time, we should have only containers and Lambda functions in our compute. While doing this, we should also discard the EC2 instances one by one and move onto Fargate. Drop Kafka or other messaging services and move to Kinesis, EventBridge, SNS or SQS, as per the requirement. Migrate to cloud-native databases like Aurora, DocumentDB and DynamoDB, and to purpose-built ones like Timestream, Keyspaces, Neptune, Redshift, and ElastiCache for Redis or Memcached.
As much as possible, move from EC2 to containers and serverless. Move storage to S3, and use S3 tiering to push rarely used data into Infrequent Access or Glacier. Use rule-based lifecycle policies when there are definite patterns in your data access, and Intelligent-Tiering when there are not.
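A minimal sketch of rule-based tiering, with a hypothetical bucket and prefix: objects move to Infrequent Access after 30 days, to Glacier after 90, and are deleted after a year.

```python
# Lifecycle rule that tiers and eventually expires old log objects in S3.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="legacy-webapp-data",                  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-old-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }]
    },
)
```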
Such optimizations go a long way in reducing costs and improving the resilience and scalability of the application. And again, don't worry about cloud lock-in. Amazon is here to stay, and locking into AWS is a lot better than locking into your on-premise equipment.
The Six R's
The principles we discussed above can be summarized in the six migration strategies - also called the 6 R's.
Rehost
This is the simple lift-and-shift approach: move the application from the premises into the cloud with very few changes. A lot of this effort can be automated using the AWS Server Migration Service.
Replatform
This is often called "lift, tinker and shift": making basic cloud optimizations to the application, such as using RDS instead of running a database on an EC2 instance.
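A minimal sketch of that RDS example, with placeholder identifiers and credentials: replace the self-managed MySQL running on an EC2 instance with a managed instance.

```python
# Provision a managed MySQL instance in place of a self-managed DB on EC2.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="legacy-webapp-db",       # hypothetical identifier
    Engine="mysql",
    DBInstanceClass="db.t3.medium",
    AllocatedStorage=100,                          # GiB
    MasterUsername="admin",
    MasterUserPassword="change-me-immediately",    # use Secrets Manager in practice
    MultiAZ=True,                                  # managed failover comes built in
    BackupRetentionPeriod=7,                       # automated backups, one chore fewer
)
```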
Repurchase
This is also called "drop and shop". Here we identify products and services that are better replaced than migrated as-is: we drop the existing one and shop for a better alternative, typically a SaaS or managed offering, instead of carrying our own version into the cloud.
Refactor / Rearchitect
Here we change the way the application is architected and developed, usually by employing cloud-native features and services. This is where we apply our understanding of AWS, and the real benefits of migration become visible after this phase. The more effort we apply here, the better the outcome. The value we generate depends on the accuracy of the previous steps - especially the replatforming.
Retire
Once we have the refactored and rearchitected system in place, we can gradually retire the older infrastructure.
Retain
Despite all the effort, some parts of the system will refuse to leave our premises. We have to look for a long-term plan to migrate them, but until then we should enable ways to retain small parts of the on-premise infrastructure.
Conclusion
These are the basic points that can help simplify a migration to the cloud. Of course, each enterprise is different and its architects have to identify the right path for the migration. The textbook can only provide the ideal path, and real life is quite different; still, we should know that path and make an attempt to align with it.
Wish you all the best for your migration. Do share your experiences from your AWS adventure in the comments.