Amazon SageMaker is a managed machine learning service that provides a variety of features for building, training, and deploying machine learning models. One of these features is the ability to create endpoints, which are hosted instances of machine learning models that can be used to make predictions.
When you want to host multiple machine learning models behind a single SageMaker endpoint, you have two options: multi-model endpoints and multi-container endpoints.
Multi-model endpoints (MME) allow you to deploy multiple models in a single container and share the same endpoint. This is useful if you have multiple models that come from the same ML framework, use the same algorithm, and perform the same task (e.g., image classification). Referring to this example, we just need to specify the S3 prefix that contains all the model artifacts, and SageMaker will load models from there and serve predictions. Once the S3 prefix we specified when setting up MultiDataModel contains multiple model artifacts, the endpoint can serve inference requests for each of these models. For instance:
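Here is a minimal sketch using the SageMaker Python SDK's `MultiDataModel` class; the bucket, prefix, role ARN, region, and XGBoost image version are assumptions for illustration:

```python
from sagemaker import image_uris
from sagemaker.model import Model
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.serializers import CSVSerializer

# All model artifacts (model-a.tar.gz, model-b.tar.gz, ...) live under one prefix.
model_data_prefix = "s3://my-bucket/mme-artifacts/"  # hypothetical bucket/prefix

# One shared container image serves every model behind the endpoint.
image = image_uris.retrieve("xgboost", region="us-east-1", version="1.5-1")
model = Model(
    image_uri=image,
    role="arn:aws:iam::123456789012:role/my-sagemaker-role",  # hypothetical role
)

mme = MultiDataModel(
    name="my-xgboost-mme",
    model_data_prefix=model_data_prefix,
    model=model,
)

predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    serializer=CSVSerializer(),
)

# list_models() reflects whatever artifacts currently sit under the prefix.
print(list(mme.list_models()))

# target_model picks the artifact to load (lazily, on first use) and serve.
prediction = predictor.predict(data=[1.0, 2.0, 3.0], target_model="model-a.tar.gz")
```

Note that models are loaded lazily: the first request naming an artifact pays the loading cost, and subsequent requests hit the cached copy.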
However, it is worth mentioning that it is possible to deploy models from different framework backends to a SageMaker MME if they can use the same container image, as demonstrated in this example. Notice that they create one container and set `Mode` to `MultiModel`, and similarly use `TargetModel` to specify which model to use.
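At the boto3 level, that pattern looks roughly like the sketch below; the image URI, role ARN, S3 prefix, and resource names are placeholders, and the endpoint-config/endpoint creation steps are elided:

```python
import boto3

sm = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")

# One container definition with Mode="MultiModel"; ModelDataUrl points at the
# S3 prefix that holds every model artifact.
sm.create_model(
    ModelName="shared-container-mme",                                     # placeholder
    ExecutionRoleArn="arn:aws:iam::123456789012:role/my-sagemaker-role",  # placeholder
    Containers=[
        {
            "Image": "<inference-image-uri>",                             # placeholder
            "ModelDataUrl": "s3://my-bucket/mme-artifacts/",              # placeholder
            "Mode": "MultiModel",
        }
    ],
)

# ... create_endpoint_config / create_endpoint as usual, then route each
# request to a specific artifact with TargetModel:
response = runtime.invoke_endpoint(
    EndpointName="shared-container-mme-endpoint",                         # placeholder
    TargetModel="model-b.tar.gz",
    ContentType="application/json",
    Body=b'{"inputs": [1, 2, 3]}',
)
```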
Multi-container endpoints allow you to deploy multiple models in separate containers that share the same endpoint. This is useful if your models differ from each other, such as an image classification model and a natural language processing model. Another common use case is deploying multiple Hugging Face models into different containers, each with its own environment variables. For instance:
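Here is a hedged boto3 sketch of that setup; the Hugging Face image URI, role ARN, model IDs, and resource names are assumptions, and the endpoint-config/endpoint creation steps are again elided:

```python
import boto3

sm = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")

sm.create_model(
    ModelName="hf-multi-container",                                       # placeholder
    ExecutionRoleArn="arn:aws:iam::123456789012:role/my-sagemaker-role",  # placeholder
    InferenceExecutionConfig={"Mode": "Direct"},  # invoke each container directly
    Containers=[
        {
            "ContainerHostname": "sentiment",
            "Image": "<huggingface-inference-image-uri>",                 # placeholder
            "Environment": {
                "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
                "HF_TASK": "text-classification",
            },
        },
        {
            "ContainerHostname": "summarizer",
            "Image": "<huggingface-inference-image-uri>",                 # placeholder
            "Environment": {
                "HF_MODEL_ID": "sshleifer/distilbart-cnn-12-6",
                "HF_TASK": "summarization",
            },
        },
    ],
)

# ... create_endpoint_config / create_endpoint as usual, then pick the
# container per request by its hostname:
response = runtime.invoke_endpoint(
    EndpointName="hf-multi-container-endpoint",                           # placeholder
    TargetContainerHostname="sentiment",
    ContentType="application/json",
    Body=b'{"inputs": "I love this!"}',
)
```

Setting `InferenceExecutionConfig` to `Direct` is what lets each request target one container; the alternative, `Serial`, chains the containers as an inference pipeline.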
Additional note on zipping the model:
If you search online, you will find Python code that loads the model and tokenizer from the Hugging Face model hub and then packages them with the tarfile library. However, I suggest handling the model cloning and archiving with terminal commands to ensure the tar file is created with the correct layout; doing it with Python code was nasty and wasted a few hours of my time.
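For reference, a minimal sketch of that terminal approach (the model ID, bucket, and paths are just examples). Archiving from inside the model directory keeps the files at the root of the tarball, which is the layout the Hugging Face inference container expects:

```bash
# Clone the model repo from the Hugging Face hub (LFS pulls the actual weights).
git lfs install
git clone https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english model

# Create the archive from inside the directory so files sit at the tar root.
cd model
tar -czvf ../model.tar.gz *
cd ..

# Upload the artifact to S3 for SageMaker to use.
aws s3 cp model.tar.gz s3://my-bucket/models/model.tar.gz  # example bucket
```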
Please read the docs listed below for a better understanding:
Related:
- Multi-model: Amazon SageMaker Multi-Model Endpoints using XGBoost
- Multi-model: Run multiple deep learning models on GPUs with Amazon SageMaker Multi-model endpoints (MME)
- Difference between Multi-model and multi-container
- Multi-Container Endpoints with Hugging Face Transformers and Amazon SageMaker
- Available images
- Hugging Face DLC