Deploying a Pretrained Stable Diffusion Model in AWS Lambda | by Paolorechia

A setup guide for deploying your AI-Art generator model

Photo by Wicked Monday on Unsplash

In this article, I will describe how to deploy a Stable Diffusion (neural network) model on AWS Lambda, using a pretrained model as a base, that is, with the weights and inference code already available.

The base code uses OpenVINO to produce a highly CPU-optimized version of Stable Diffusion.

This framework is especially useful for edge and Internet of Things use cases, but here we'll use it for something completely different: we'll deploy a really large model (~3 GB) and execute it successfully on AWS Lambda.

I won't teach anything about OpenVINO itself, as I'm not familiar with it. In fact, I just used an open-source repository as a base.

Instead, I'll focus on the glue needed to deploy it to AWS Lambda.

The described approach should work regardless of the underlying framework (Hugging Face, PyTorch, etc.), so it's very useful to know if you want to serve a model behind a serverless HTTP endpoint.

This story is divided into three short parts:

  1. A brief background
  2. A step-by-step guide to deploying it on AWS Lambda
  3. My silly mistakes before I fixed them

But before we dive in, some context about what we're working with. You've probably heard about the recent boom in AI-generated images and Stable Diffusion. If not, and if you're interested in the topic, you might want to check out this excellent blog post from Hugging Face: https://huggingface.co/blog/stable_diffusion.

Here is a sample picture of an astronaut cat that I generated with Stable Diffusion:

Generating images with this neural network requires a lot of GPU power; however, the model is being optimized over time, so it should become more tractable for any end user with a modern desktop or laptop.

Because the Stable Diffusion model is open source, different people are also working on offering optimized alternatives: optimizing it for MacBook M1 chips, optimizing it for Intel chips, and so on.

Usually, the time it takes largely depends on the actual hardware. I compiled the following rough comparison for a variety of processors (tested locally with an RTX 3090 and an i9; the M1 numbers are only from what I read online):

As you can see, when executing on an i9, OpenVINO is still very slow compared to the alternative solutions.

An alternative is to use the ONNX version provided by Hugging Face (which has similar compute times after optimizing it with the ONNX simplifier: https://github.com/daquexian/onnx-simplifier).

Note: the simplifier requires approximately 27 GB of RAM to process this model. I suspect the end result would have been the same if I had used the ONNX version.

Despite the slowness of CPU inference, it's interesting to see that it can execute on AWS Lambda, which means it's usable for a free trial/demo thanks to the generous AWS Lambda free tier. I'm building something along those lines, for example: a free toy product that I published at https://app.openimagegenius.com.

If you want to experiment with it, be patient and wait up to five minutes for the image to be generated. The Lambda uses only 12 inference steps to speed things up (execution takes about 60 seconds once the Lambda is warmed up).

Creating golden robotic cats for my 5-year-old son

The source code of this example can be found below. Feel free to use it however you like without asking my permission (it's MIT licensed). Please take into account the license of the models, though.

https://github.com/paolorechia/openimagegenius/tree/main/serverless/stable-diffusion-open-vino-engine

Enough chat. Let’s jump into the solution.

(Working version: Container-based Lambda with EFS🎉)

Unfortunately, there are a lot of manual steps here. While one could automate a good portion of this, I didn’t think it made sense for me to invest that much time. Someone skilled in CloudFormation or Terraform can probably automate most (if not all) of these steps.

If you try to follow these steps and get stuck, don’t hesitate to contact me, I’ll be happy to help.

Let’s start with VPC and EC2

  1. Create a VPC or use the default one. Either way is fine.
  2. Create an EC2 instance attached to this VPC and preferably deploy it on a public subnet. You'll need to connect to the instance via SSH, which becomes a lot easier if you're using a public subnet.
  3. Create a new security group or modify the one you're using. You'll need ports 22 (SSH) and 2049 (NFS) open; a rough sketch of how this could look with boto3 follows this list.
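As a reference, here is a minimal boto3 sketch (my own, not from the original article) of opening those two ports on a security group; the group ID and CIDR ranges are placeholders you would replace with your own values:

import boto3

ec2 = boto3.client("ec2", region_name="eu-central-1")

# Open SSH (22) and NFS (2049) on a placeholder security group.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # placeholder security group ID
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "SSH"}]},
        {"IpProtocol": "tcp", "FromPort": 2049, "ToPort": 2049,
         "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "NFS for EFS"}]},
    ],
)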

Now EFS

  1. Create an EFS filesystem. Make sure it is available in the same subnet as the EC2 instance from the setup above, and that it uses the same security group you defined.
  2. SSH into your EC2 instance.
  3. Follow the AWS guide on how to mount EFS in your EC2: https://docs.aws.amazon.com/efs/latest/ug/wt1-test.html
  4. In short, you execute the following commands, replacing mount-target-DNS with the actual value from the AWS console/CLI (you can find the DNS on the EFS filesystem UI screen) and adjusting the mount folder as needed:
  5. mkdir mnt-folder
  6. sudo mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport mount-target-DNS:/ ~/mnt-folder
  7. Save the OpenVINO model files to EFS. In my case, I had downloaded them manually using this code (https://github.com/bes-dev/stable_diffusion.openvino/blob/master/stable_diffusion_engine.py) and previously uploaded them to an S3 bucket. Then, on my EC2 instance, I downloaded them from the S3 bucket to EFS.

(Note: to do this, you may need to assign a role to your EC2 instance, as shown in the screenshot below.)

The AWS CLI should work once you've configured the role correctly; for example, you can execute commands like aws s3 sync s3://your-bucket ~/mnt-folder.
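If you prefer Python over the CLI, here is a rough sketch (my own, not from the article) that copies the model files from an S3 bucket into the mounted EFS folder with boto3; the bucket name and prefix are placeholders:

import os
import boto3

s3 = boto3.client("s3")
bucket = "your-bucket"   # placeholder bucket holding the model files
prefix = "models/"       # placeholder key prefix
dest_root = os.path.expanduser("~/mnt-folder")  # the EFS mount from the previous step

# Walk the bucket and mirror every object into the EFS mount.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):
            continue  # skip "directory" placeholder keys
        dest = os.path.join(dest_root, key)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        s3.download_file(bucket, key, dest)
        print("Downloaded", key, "->", dest)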

Then, create an access point for the Elastic File System

Create an EFS access point for the Elastic File System you configured.

Here are a few things to note:

File system user permissions: if they're too restrictive, you'll get a PermissionError when accessing EFS files from your Lambda. In my case, this EFS was dedicated to this Lambda, so I didn't care about granularity and just gave wide-open access (I'll do the same in the serverless file later):

Also, avoid assigning / as the root directory path for the access point; this can cause problems when mounting. Be sure to note the value you selected, as you'll need to use it inside your Lambda function. I personally used /mnt/fs, as described in another guide.
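For reference, this is roughly what the access point creation looks like with boto3 (a sketch of my own under the same wide-open permissions, not the exact configuration from the article; the filesystem ID and path are placeholders):

import boto3

efs = boto3.client("efs", region_name="eu-central-1")

response = efs.create_access_point(
    FileSystemId="fs-0123456789abcdef0",   # placeholder filesystem ID
    PosixUser={"Uid": 1000, "Gid": 1000},
    RootDirectory={
        "Path": "/lambda",                 # avoid "/" as the root directory path
        "CreationInfo": {
            "OwnerUid": 1000,
            "OwnerGid": 1000,
            "Permissions": "777",          # wide-open, since the EFS is dedicated to this Lambda
        },
    },
)
print("Access point ID:", response["AccessPointId"])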

OK, we're done with creating resources manually.

Most of the heavy lifting in the serverless template with regards to the EFS parts I gathered from https://medium.com/swlh/mount-your-aws-efs-volume-into-aws-lambda-with-the-serverless-framework-470b1c6b1b2D.

Here's the full blueprint. You pretty much only have to change the resource IDs to your own.

service: stable-diffusion-open-vino
frameworkVersion: "3"

provider:
  name: aws
  runtime: python3.8
  stage: ${opt:stage}
  region: eu-central-1
  memorySize: 10240
  ecr:
    images:
      appimage:
        path: ./
  iam:
    role:
      statements:
        - Effect: Allow
          Action:
            - "elasticfilesystem:*"
          Resource:
            - "arn:aws:elasticfilesystem:${aws:region}:${aws:accountId}:file-system/${self:custom.fileSystemId}"
            - "arn:aws:elasticfilesystem:${aws:region}:${aws:accountId}:access-point/${self:custom.efsAccessPoint}"

functions:
  textToImg:
    url: true
    image:
      name: appimage
    timeout: 300
    environment:
      MNT_DIR: ${self:custom.LocalMountPath}
    vpc:
      securityGroupIds:
        - ${self:custom.securityGroup}
      subnetIds:
        - ${self:custom.subnetsId.subnet0}

custom:
  efsAccessPoint: YOUR_ACCESS_POINT_ID
  fileSystemId: YOUR_FS_ID
  LocalMountPath: /mnt/fs
  subnetsId:
    subnet0: YOUR_SUBNET_ID
  securityGroup: YOUR_SECURITY_GROUP

resources:
  extensions:
    TextToImgLambdaFunction:
      Properties:
        FileSystemConfigs:
          - Arn: "arn:aws:elasticfilesystem:${self:provider.region}:${aws:accountId}:access-point/${self:custom.efsAccessPoint}"
            LocalMountPath: "${self:custom.LocalMountPath}"

Some excerpts worth mentioning:

Memory size: you will not have access to 10 GB of memory by default. You need to open a ticket with AWS to support this use case. Note that there is no specific category for this kind of request; I requested an increase in Lambda storage and explained in the ticket that I actually needed more memory. It took a few days for AWS to accept it.

memorySize: 10240

Function URL: the line url: true enables a public URL to invoke your function, which is meant mostly for development/debugging purposes.

Docker container build mode

provider:
  ...
  ecr:
    images:
      appimage:
        path: ./
  ...

functions:
  textToImg:
    url: true
    image:
      name: appimage
    timeout: 300

The Serverless Framework does a lot for you here: these lines alone will:

  1. Create a private ECR repository
  2. Use a local Dockerfile to build your container image
  3. Tag the image
  4. Push it to the private ECR repository
  5. Create a Lambda function that uses the Docker image you just built

That said, be prepared: the build/deployment will take much longer than with a native AWS Lambda package.

Here is the Dockerfile that I used:

FROM python:3.9.9-bullseye

WORKDIR /src

RUN apt-get update && \
    apt-get install -y \
    libgl1 libglib2.0-0 \
    g++ \
    make \
    cmake \
    unzip \
    libcurl4-openssl-dev

COPY requirements.txt /src/

RUN pip3 install -r requirements.txt --target /src/

COPY handler.py stable_diffusion_engine.py /src/

ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
CMD [ "handler.handler" ]

This installs the AWS Lambda Runtime Interface Client (awslambdaric) and the dependencies we need to execute Stable Diffusion (the OpenVINO version).
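If you want to test the container locally before deploying, here is a small sketch (my own, not from the original article) that invokes the image through the Lambda Runtime Interface Emulator, assuming the container runs with port 9000 mapped to the emulator's port 8080:

import json
import requests

# Standard invocation path exposed by the Lambda Runtime Interface Emulator.
url = "http://localhost:9000/2015-03-31/functions/function/invocations"

# The handler expects the prompt inside a JSON-encoded "body" field.
event = {"body": json.dumps({"prompt": "tree", "num_inference_steps": 1})}
response = requests.post(url, json=event)
print(response.json())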

Again, credit goes to the original author of the OpenVINO solution: https://github.com/bes-dev/stable_diffusion.openvino.

Before building the Docker image, you will need a customized stable_diffusion_engine module and a Lambda handler. You can either pull them from my repository or adapt the originals from the repository linked above.

The main changes required to the engine are related to the way the model is loaded:

self.tokenizer = CLIPTokenizer.from_pretrained(
    "/mnt/fs/models/clip")
(...)
self._text_encoder = self.core.read_model(
    "/mnt/fs/models/text_encoder/text_encoder.xml")
(...)
self._unet = self.core.read_model(
    "/mnt/fs/models/unet/unet.xml")
(...)
self._vae_decoder = self.core.read_model(
    "/mnt/fs/models/vae_decoder/vae_decoder.xml")
(...)
self._vae_encoder = self.core.read_model(
    "/mnt/fs/models/vae_encoder/vae_encoder.xml")

Then you can use the module in my handler (which I adapted from the demo.py file of https://github.com/bes-dev/stable_diffusion.openvino):

# -*- coding: utf-8 -*-
print("Starting container code...")
from dataclasses import dataclass
import numpy as np
import cv2
from diffusers import LMSDiscreteScheduler, PNDMScheduler
from stable_diffusion_engine import StableDiffusionEngine
import json
import os


@dataclass
class StableDiffusionArguments:
    prompt: str
    num_inference_steps: int
    guidance_scale: float
    models_dir: str
    seed: int = None
    init_image: str = None
    beta_start: float = 0.00085
    beta_end: float = 0.012
    beta_schedule: str = "scaled_linear"
    model: str = "bes-dev/stable-diffusion-v1-4-openvino"
    mask: str = None
    strength: float = 0.5
    eta: float = 0.0
    tokenizer: str = "openai/clip-vit-large-patch14"


def run_sd(args: StableDiffusionArguments):
    if args.seed is not None:
        np.random.seed(args.seed)
    if args.init_image is None:
        scheduler = LMSDiscreteScheduler(
            beta_start=args.beta_start,
            beta_end=args.beta_end,
            beta_schedule=args.beta_schedule,
            tensor_format="np",
        )
    else:
        scheduler = PNDMScheduler(
            beta_start=args.beta_start,
            beta_end=args.beta_end,
            beta_schedule=args.beta_schedule,
            skip_prk_steps=True,
            tensor_format="np",
        )
    engine = StableDiffusionEngine(
        model=args.model,
        scheduler=scheduler,
        tokenizer=args.tokenizer,
        models_dir=args.models_dir,
    )
    image = engine(
        prompt=args.prompt,
        init_image=None if args.init_image is None else cv2.imread(args.init_image),
        mask=None if args.mask is None else cv2.imread(args.mask, 0),
        strength=args.strength,
        num_inference_steps=args.num_inference_steps,
        guidance_scale=args.guidance_scale,
        eta=args.eta,
    )
    is_success, im_buf_arr = cv2.imencode(".jpg", image)
    if not is_success:
        raise ValueError("Failed to encode image as JPG")
    byte_im = im_buf_arr.tobytes()
    return byte_im


def handler(event, context, models_dir=None):
    print("Getting into handler, event: ", event)
    print("Working dir at handler...")
    current_dir = os.getcwd()
    print(current_dir)
    print(os.listdir(current_dir))
    print("Listing root")
    print(os.listdir("/"))
    # Get args
    # randomizer params
    body = json.loads(event.get("body"))
    prompt = body["prompt"]
    seed = body.get("seed")
    num_inference_steps: int = int(body.get("num_inference_steps", 32))
    guidance_scale: float = float(body.get("guidance_scale", 7.5))
    args = StableDiffusionArguments(
        prompt=prompt,
        seed=seed,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        models_dir=models_dir,
    )
    print("Parsed args:", args)
    image = run_sd(args)
    print("Image generated")
    body = json.dumps(
        {"message": "wow, no way", "image": image.decode("latin1")})
    return {"statusCode": 200, "body": body}

When you’ve finished deploying, the serverless framework should give you a URL, which you can call like this:

curl -X POST \
  <your-function-url> \
  -H 'content-type: application/json' \
  -d '{"prompt": "tree"}'

If everything worked, you should see in the CloudWatch logs that it is generating an image, like the following:

When I tested it, the main loop took about three minutes, and the entire Lambda execution took 238 seconds (roughly four minutes).

The curl above will give you an unreadable string with the image encoded in latin1. If you're planning on actually using your Lambda, you probably want something like this instead (I used this to test my container locally; replace the URL):

import requests
import json

headers = {"content-type": "application/json"}
url = ""
body = json.dumps({"prompt": "beautiful tree", "num_inference_steps": 1})
response = requests.post(url, json={"body": body}, headers=headers)
response.raise_for_status()
j = response.json()
body = json.loads(j["body"])
bytes_img = body["image"].encode("latin1")
with open("test_result.png", "w+b") as fp:
    fp.write(bytes_img)

Whew! That was a lot of steps! You can now deploy your Stable Diffusion model to AWS Lambda. I hope you enjoyed reading this short tutorial. I'll leave you with a few things I tried that didn't quite work, so maybe I can persuade you not to try them.

Are you wondering how much trial and error this cost me? Well, I don't mind sharing; to fail is to learn.

So, I had read many times that to deploy a large model on Lambda you should use AWS Elastic File System. And so I did.

I configured a regular AWS Lambda and connected it to EFS. However, when I executed the code, I ran into an error when importing the OpenVINO runtime: libm.so.6 not found.

After some head-scratching and research, I learned that AWS Lambda runs on Amazon Linux and I should probably be building my library dependencies directly inside the EC2 instance.

Except, when I tried it, I found that the OpenVINO runtime version 2022 is not available for Amazon Linux (https://pypi.org/project/openvino/). Uh-oh, dead end.

A few days later, after discussing with a friend how much I missed using Docker compared to using serverless technologies, a light bulb went on: what if I deployed the Lambda using a Docker image?

It turns out that the container image size limit is 10 GB, which is quite generous (https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/). It had to work!

OK, not so fast. When it seemed like things were getting close to working, I ran into a few problems.

[ERROR] RuntimeError: Model file /src/models/unet/unet.xml cannot be opened!
Traceback (most recent call last):
File "/src/handler.py", line 129, in handler
image = run_sd(args)
File "/src/handler.py", line 76, in run_sd
engine = StableDiffusionEngine(
File "/src/stable_diffusion_engine.py", line 59, in __init__
self._unet = self.core.read_model(unet_xml, unet_bin)

Huh? I looked at this error for a few hours, debugging my environment, making sure the file was available, and so on. Since, according to the OpenVINO API reference, core.read_model can accept binary data directly, I changed my code a bit and tried loading the models into a dictionary of binary buffers ahead of time.

models = {}
for model in ["text_encoder", "unet", "vae_decoder", "vae_encoder"]:
    with open(f"./models/{model}/{model}.xml", "r+b") as fp:
        models[f"{model}-xml"] = fp.read()
    with open(f"./models/{model}/{model}.bin", "r+b") as fp:
        models[f"{model}-bin"] = fp.read()

Except, I still got errors, but they were more meaningful this time.

[ERROR] OSError: [Errno 30] Read-only file system: './models/text_encoder/text_encoder.xml'
Traceback (most recent call last):
File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/src/handler.py", line 23, in <module>
from stable_diffusion_engine import StableDiffusionEngine
File "/src/stable_diffusion_engine.py", line 30, in <module>
with open(f"./models/{model}/{model}.xml", "r+b") as fp:

I double-checked the Python documentation and realized that r+b actually means "open for update (read and write)". Perhaps the filesystem is read-only. Let's try again, just using rb instead:

[ERROR] PermissionError: [Errno 13] Permission denied: './models/text_encoder/text_encoder.xml'
Traceback (most recent call last):
File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/src/handler.py", line 23, in <module>
from stable_diffusion_engine import StableDiffusionEngine
File "/src/stable_diffusion_engine.py", line 30, in <module>
with open(f"./models/{model}/{model}.xml", "rb") as fp:

OK, maybe I need to copy the files to /tmp first? No, that gave me the same error. I couldn't make sense of it: the same code worked perfectly locally, and I had also tested it with the Lambda Runtime Interface Emulator. It had to be something to do with the environment.

Apparently, AWS blocks binary reads from the container image's filesystem, presumably for security reasons; I never found out exactly why. Switching to a hybrid approach, where the models are stored in EFS and the code dependencies/libraries live in the Docker image, worked smoothly.
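If you go down the same hybrid route, a small sanity check like the following sketch (my own addition, not from the article) can save some debugging time; it verifies that the EFS mount is visible before the models are loaded, using the MNT_DIR environment variable defined in the serverless.yml above:

import os

def assert_models_mounted():
    # MNT_DIR is set in serverless.yml; /mnt/fs is the fallback used in this article.
    mnt_dir = os.environ.get("MNT_DIR", "/mnt/fs")
    models_dir = os.path.join(mnt_dir, "models")
    if not os.path.isdir(models_dir):
        raise RuntimeError(
            f"Expected model directory {models_dir} not found; "
            "is the EFS access point mounted correctly?"
        )
    print("Models available on EFS:", os.listdir(models_dir))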

OK, that's all for today!

I hope you enjoyed reading it. Cheers!
