# Amazon SageMaker

EasyLLM provides a client for interfacing with Amazon SageMaker models.

- `sagemaker.ChatCompletion` - a client for interfacing with SageMaker models that are compatible with the OpenAI ChatCompletion API.
- `sagemaker.Completion` - a client for interfacing with SageMaker models that are compatible with the OpenAI Completion API.
- `sagemaker.Embedding` - a client for interfacing with SageMaker models that are compatible with the OpenAI Embedding API.
## sagemaker.ChatCompletion

The `sagemaker.ChatCompletion` client is used to interface with SageMaker models running on Text Generation Inference that are compatible with the OpenAI ChatCompletion API. Check out the Examples for more details.
```python
import os

from easyllm.clients import sagemaker

# set env for prompt builder
os.environ["HUGGINGFACE_PROMPT"] = "llama2"  # vicuna, wizardlm, stablebeluga, open_assistant
os.environ["AWS_REGION"] = "us-east-1"  # change to your region
# os.environ["AWS_ACCESS_KEY_ID"] = "XXX"  # needed if not using boto3 session
# os.environ["AWS_SECRET_ACCESS_KEY"] = "XXX"  # needed if not using boto3 session

response = sagemaker.ChatCompletion.create(
    model="huggingface-pytorch-tgi-inference-2023-08-08-14-15-52-703",
    messages=[
        {"role": "system", "content": "\nYou are a helpful, respectful and honest assistant."},
        {"role": "user", "content": "Knock knock."},
    ],
    temperature=0.9,
    top_p=0.6,
    max_tokens=1024,
)
```
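The response mirrors the OpenAI ChatCompletion schema, so the generated reply can be read from the first choice (a minimal sketch; the endpoint name above is an example, and the dict layout assumes the OpenAI-compatible response format):

```python
# Assumes the OpenAI-compatible response layout:
# the assistant reply lives in the first choice's message.
print(response["choices"][0]["message"]["content"])
```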
Supported parameters are:

- `model` - The model to use for the completion. If not provided, defaults to the base url.
- `messages` - `List[ChatMessage]` to use for the completion.
- `temperature` - The temperature to use for the completion. Defaults to 0.9.
- `top_p` - The top_p to use for the completion. Defaults to 0.6.
- `top_k` - The top_k to use for the completion. Defaults to 10.
- `n` - The number of completions to generate. Defaults to 1.
- `max_tokens` - The maximum number of tokens to generate. Defaults to 1024.
- `stop` - The stop sequence(s) to use for the completion. Defaults to None.
- `stream` - Whether to stream the completion. Defaults to False.
- `frequency_penalty` - The frequency penalty to use for the completion. Defaults to 1.0.
- `debug` - Whether to enable debug logging. Defaults to False.
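Because `stream` is supported, the completion can also be consumed incrementally. A minimal sketch, assuming the endpoint from the example above and OpenAI-style streaming chunks with a `delta` field:

```python
# Stream the completion chunk by chunk (OpenAI-style delta events).
for chunk in sagemaker.ChatCompletion.create(
    model="huggingface-pytorch-tgi-inference-2023-08-08-14-15-52-703",
    messages=[{"role": "user", "content": "Tell me a short joke."}],
    max_tokens=256,
    stream=True,
):
    # Not every delta carries content (e.g. the initial role event).
    print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```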
## sagemaker.Completion

The `sagemaker.Completion` client is used to interface with SageMaker models running on Text Generation Inference that are compatible with the OpenAI Completion API. Check out the Examples for more details.
```python
import os

from easyllm.clients import sagemaker

# set env for prompt builder
os.environ["HUGGINGFACE_PROMPT"] = "llama2"  # vicuna, wizardlm, stablebeluga, open_assistant
os.environ["AWS_REGION"] = "us-east-1"  # change to your region
# os.environ["AWS_ACCESS_KEY_ID"] = "XXX"  # needed if not using boto3 session
# os.environ["AWS_SECRET_ACCESS_KEY"] = "XXX"  # needed if not using boto3 session

response = sagemaker.Completion.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    prompt="What is the meaning of life?",
    temperature=0.9,
    top_p=0.6,
    max_tokens=1024,
)
```
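As with `ChatCompletion`, the result follows the OpenAI Completion schema, so the generated text sits in the first choice (a minimal sketch under that assumption):

```python
# The generated text is in the first choice of the OpenAI-style response.
print(response["choices"][0]["text"])
```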
Supported parameters are:

- `model` - The model to use for the completion. If not provided, defaults to the base url.
- `prompt` - Text to use for the completion. If `prompt_builder` is set, the prompt will be formatted with the prompt builder.
- `temperature` - The temperature to use for the completion. Defaults to 0.9.
- `top_p` - The top_p to use for the completion. Defaults to 0.6.
- `top_k` - The top_k to use for the completion. Defaults to 10.
- `n` - The number of completions to generate. Defaults to 1.
- `max_tokens` - The maximum number of tokens to generate. Defaults to 1024.
- `stop` - The stop sequence(s) to use for the completion. Defaults to None.
- `stream` - Whether to stream the completion. Defaults to False.
- `frequency_penalty` - The frequency penalty to use for the completion. Defaults to 1.0.
- `debug` - Whether to enable debug logging. Defaults to False.
- `echo` - Whether to echo the prompt. Defaults to False.
- `logprobs` - Whether to return logprobs. Defaults to None.
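For example, `stop` and `echo` can be combined to cut generation at a line break and return the prompt together with the completion. A sketch, reusing the model name from the example above:

```python
# Stop at the first newline and echo the prompt back with the completion.
response = sagemaker.Completion.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    prompt="What is the meaning of life?",
    max_tokens=256,
    stop=["\n"],
    echo=True,
)
print(response["choices"][0]["text"])
```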
## sagemaker.Embedding

The `sagemaker.Embedding` client is used to interface with SageMaker models running as an API that are compatible with the OpenAI Embedding API. Check out the Examples for more details.
```python
import os

# set env for prompt builder
os.environ["HUGGINGFACE_PROMPT"] = "llama2"  # vicuna, wizardlm, stablebeluga, open_assistant
os.environ["AWS_REGION"] = "us-east-1"  # change to your region
# os.environ["AWS_ACCESS_KEY_ID"] = "XXX"  # needed if not using boto3 session
# os.environ["AWS_SECRET_ACCESS_KEY"] = "XXX"  # needed if not using boto3 session

from easyllm.clients import sagemaker

embedding = sagemaker.Embedding.create(
    model="SageMakerModelEmbeddingEndpoint24E49D09-64prhjuiWUtE",
    input="That's a nice car.",
)

len(embedding["data"][0]["embedding"])
```
Supported parameters are:

- `model` - The model to use to create the embedding. If not provided, defaults to the base url.
- `input` - `Union[str, List[str]]` document(s) to embed.
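Since `input` also accepts a list of strings, several documents can be embedded in a single call (a sketch, reusing the example endpoint from above; this assumes one embedding is returned per input, in order):

```python
# Embed several documents in one request.
batch = sagemaker.Embedding.create(
    model="SageMakerModelEmbeddingEndpoint24E49D09-64prhjuiWUtE",
    input=["That's a nice car.", "That's an ugly car."],
)
print(len(batch["data"]))  # one entry per input document
```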
## Environment Configuration

You can configure the `sagemaker` client by setting environment variables or overwriting the default values. See below for how to adjust the AWS credentials and the prompt builder.
### Setting Credentials

By default, the `sagemaker` client will try to read the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. If these are not set, it will fall back to `boto3`.

Alternatively, you can set the credentials manually via the `sagemaker.*` module attributes.

Manually setting the credentials:
```python
from easyllm.clients import sagemaker

# placeholder values; use your real AWS credentials
sagemaker.api_aws_access_key = "xxx"
sagemaker.api_aws_secret_key = "xxx"

res = sagemaker.ChatCompletion.create(...)
```
Using environment variables:

```python
# can happen elsewhere
import os

os.environ["AWS_ACCESS_KEY_ID"] = "xxx"
os.environ["AWS_SECRET_ACCESS_KEY"] = "xxx"

from easyllm.clients import sagemaker
```
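If neither variable is set, the client falls back to the standard `boto3` credential chain, so a named profile works as well. A sketch; `AWS_PROFILE` is the standard AWS SDK/CLI variable, not an EasyLLM-specific setting, and the profile name here is hypothetical:

```python
import os

# Let boto3 resolve credentials from a named profile in ~/.aws/credentials.
os.environ["AWS_PROFILE"] = "my-sagemaker-profile"  # hypothetical profile name

from easyllm.clients import sagemaker
```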
### Build Prompt

By default, the `sagemaker` client will try to read the `HUGGINGFACE_PROMPT` environment variable and map its value via the `PROMPT_MAPPING` dictionary. If it is not set, the default prompt builder is used. You can also set it manually.

Check out the Prompt Utils for more details.

Manually setting the prompt builder:
```python
from easyllm.clients import sagemaker

sagemaker.prompt_builder = "llama2"

res = sagemaker.ChatCompletion.create(...)
```
Using environment variables:
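```python
# can happen elsewhere
import os

os.environ["HUGGINGFACE_PROMPT"] = "llama2"

from easyllm.clients import sagemaker
```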