by Matthieu Lienart
Cloud Engineer, from Ostermundigen

#knowledgesharing #level100

Comparing AWS Lambda on ARM vs x86 architectures

In September 2021, AWS announced AWS Lambda functions powered by the AWS Graviton2 ARM processor and stated that:

“Lambda functions powered by Graviton2 are designed to deliver up to 19 percent better performance at 20 percent lower cost”

In this article, we investigate this claim and its gotchas.

The benchmark

To test this claim, we used Python code with the NumPy library to perform complex matrix calculations, simulating CPU-intensive operations for the benchmark.

Using 300x300 float matrices, the benchmark code repeatedly performs the following operations (a minimal sketch follows the list):

1. Matrix multiplications

2. Vector multiplications

3. Matrix singular value decomposition

4. Matrix Cholesky decomposition

5. Matrix eigenvector decomposition
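
As an illustration, here is a minimal sketch of what such a benchmark handler can look like. The iteration count and random seed are illustrative choices, not the exact values from our runs:

```python
# Minimal sketch of the CPU-intensive benchmark described above.
# The iteration count and random seed are illustrative choices.
import time

import numpy as np

N = 300          # matrix size used in the benchmark
ITERATIONS = 10  # illustrative repetition count

def handler(event, context):
    rng = np.random.default_rng(42)
    a = rng.random((N, N))
    b = rng.random((N, N))
    v = rng.random(N)
    # Symmetric positive-definite matrix, required for Cholesky.
    spd = a @ a.T + N * np.eye(N)

    start = time.perf_counter()
    for _ in range(ITERATIONS):
        a @ b                    # 1. matrix multiplication
        a @ v                    # 2. vector multiplication
        np.linalg.svd(a)         # 3. singular value decomposition
        np.linalg.cholesky(spd)  # 4. Cholesky decomposition
        np.linalg.eig(a)         # 5. eigenvector decomposition
    elapsed = time.perf_counter() - start

    return {"duration_seconds": elapsed}
```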

The AWS Lambda function was packaged and deployed for both the ARM and x86 architectures. We then used the AWS Lambda Power Tuning tool to execute both functions at different memory sizes between 128 MB and 5,120 MB.
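
For reference, here is a minimal sketch of how such a tuning run can be started with boto3, assuming the Power Tuning state machine is already deployed. The ARNs, memory values, and invocation count are placeholder assumptions:

```python
# Minimal sketch: starting an AWS Lambda Power Tuning run via boto3.
# The ARNs below are placeholders; the input format follows the
# aws-lambda-power-tuning documentation.
import json

import boto3

sfn = boto3.client("stepfunctions")

tuning_input = {
    # ARN of the Lambda function to benchmark (placeholder).
    "lambdaARN": "arn:aws:lambda:eu-central-1:123456789012:function:benchmark",
    # Memory sizes (in MB) to test, multiples of 128 as recommended below.
    "powerValues": [128, 256, 512, 1024, 2048, 3072, 4096, 5120],
    "num": 10,            # invocations per memory size (assumed count)
    "payload": {},        # event passed to each invocation
    "strategy": "speed",  # optimize for execution time rather than cost
}

sfn.start_execution(
    # ARN of the deployed Power Tuning state machine (placeholder).
    stateMachineArn="arn:aws:states:eu-central-1:123456789012:stateMachine:powerTuningStateMachine",
    input=json.dumps(tuning_input),
)
```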

It should be noted that, when configuring your AWS Lambda functions, you should stick to memory sizes that are multiples of 128 MB (e.g., 128, 256, 384, 512, 1024…). Doing otherwise generates memory management overhead that reduces the function's performance.
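For example, the memory size can be set with boto3 like this (the function name is a placeholder):

```python
# Setting a Lambda function's memory size to a multiple of 128 MB.
import boto3

boto3.client("lambda").update_function_configuration(
    FunctionName="benchmark",  # placeholder function name
    MemorySize=1024,           # in MB; a multiple of 128
)
```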

The results

The results at all memory levels are as follows:

[Chart: benchmark results across all memory sizes, ARM vs x86]
If we zoom in on memory sizes above 512 MB for clarity, we get the following results:

[Chart: benchmark results for memory sizes above 512 MB, ARM vs x86]
We can clearly see two patterns for the CPU-intensive operations performed by this benchmark:

1. For memory sizes of 512 MB and below, the AWS Lambda function on the x86 architecture is between 36% and 99% faster than on the ARM architecture.

2. Although the performance gap between the ARM and x86 functions narrows as memory increases, the x86 function remains about 16% faster.

ARM vs x86 in the real world

While working on a customer project where we were trying to minimize the response time of an AWS Step Functions workflow orchestrating multiple AWS Lambda functions, we performed a side-by-side comparison. We deployed the entire process twice in parallel: once with all AWS Lambda functions configured for the x86 architecture, and once with all of them configured for the ARM architecture. We observed that, in general:

• Compute intensive functions were faster on x86 architectures.

• Other functions were faster on ARM architectures.

Migration effort - is it that simple?

In the same blog announcement, AWS states:

“The change in architecture doesn’t change the way the function is invoked or communicates its response back.”

This suggests that the migration is as simple as changing the AWS Lambda function's architecture parameter from x86 to ARM.

Although that might be true in some cases, it is unfortunately not the full picture. If your function uses libraries with natively compiled code that are not part of the default AWS Lambda runtime (e.g., the benchmark function imports the Python NumPy library, which relies on compiled C libraries), switching the architecture from x86 to ARM will break the function, because those libraries have to be recompiled for the architecture they run on.

In such cases, changing the AWS Lambda function's architecture parameter alone is not enough. You also need to repackage your libraries, which, depending on the language and on how you package and deploy them, can be more or less complicated.
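
As an illustration, here is a minimal sketch of such a switch for a Python function deployed as a zip archive. The function name, file names, and pip options reflect assumptions about the packaging setup, not the exact commands from our project:

```python
# Sketch: switching a zip-packaged Python function from x86_64 to arm64.
# The function name and file names are placeholders.
#
# Step 1: repackage dependencies with wheels built for arm64, e.g.:
#   pip install numpy --target ./package \
#       --platform manylinux2014_aarch64 --only-binary=:all:
#   then zip ./package together with the handler into function.zip.
#
# Step 2: upload the new package and switch the architecture in one call.
import boto3

lambda_client = boto3.client("lambda")

with open("function.zip", "rb") as f:
    lambda_client.update_function_code(
        FunctionName="benchmark",  # placeholder function name
        ZipFile=f.read(),
        Architectures=["arm64"],   # switch from x86_64 to arm64
    )
```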

Conclusion

The answer to whether an AWS Lambda function executes faster on the ARM or the x86 architecture is, as so often: "it depends".

From the benchmarks we performed at Axians Amanox, the rule of thumb seems to be that, for now, compute-intensive functions run faster on the x86 architecture, while other functions run faster on the ARM architecture.

The bottom line is that you need to test your particular applications, and test each function individually depending on the operations it performs. The results will probably show that you need to run a mix of both architectures.
