# How to use Chat Completion clients with Amazon Bedrock
EasyLLM can be used as an abstraction layer to replace `gpt-3.5-turbo` and `gpt-4` with Amazon Bedrock models. You can migrate your existing applications away from the OpenAI API by simply changing the client.
Chat models take a series of messages as input, and return an AI-written message as output.
This guide illustrates the chat format with a few example API calls.
## 0. Setup
Before you can use `easyllm` with Amazon Bedrock, you need to set up the required AWS permissions and request access to the models, which you can do in the Amazon Bedrock console under "Model access".
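Once access is granted, you can sanity-check that your credentials and region can reach Bedrock by listing the available foundation models with boto3 (a minimal sketch; it assumes boto3 is installed and your AWS credentials are configured):

```python
import boto3

# a sketch: verify that your credentials and region can reach Amazon Bedrock
# (assumes model access has already been granted in the Bedrock console)
session = boto3.Session(region_name="us-east-1")  # change to your region
client = session.client("bedrock")

models = client.list_foundation_models()
print([m["modelId"] for m in models["modelSummaries"]])
```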
## 1. Import the easyllm library
```python
# if needed, install and/or upgrade to the latest version of the EasyLLM Python library
%pip install --upgrade easyllm[bedrock]
```

```python
# import the EasyLLM Python library for calling the EasyLLM API
import easyllm
```
## 2. An example chat API call
A chat API call has two required inputs:

- `model`: the name of the model you want to use (e.g., `anthropic.claude-v2`)
- `messages`: a list of message objects, where each object has two required fields:
    - `role`: the role of the messenger (either `system`, `user`, or `assistant`)
    - `content`: the content of the message (e.g., `Write me a beautiful poem`)
Compared to the OpenAI API, the `bedrock` module also exposes a `prompt_builder` and a `stop_sequences` parameter that you can use to customize the prompt and the stop sequences. The EasyLLM package comes with prompt builder utilities.
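For example, a minimal sketch of passing custom stop sequences (this assumes, per the description above, that `stop_sequences` accepts a list of strings):

```python
from easyllm.clients import bedrock

# a sketch: halt generation as soon as the model starts a new "Human:" turn
# (assumes stop_sequences accepts a list of strings; env setup as in the next example)
response = bedrock.ChatCompletion.create(
    model="anthropic.claude-v2",
    messages=[{"role": "user", "content": "Count from 1 to 10."}],
    stop_sequences=["\n\nHuman:"],
)
```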
Let's look at an example chat API call to see how the chat format works in practice.
```python
import os

# set env for prompt builder
os.environ["BEDROCK_PROMPT"] = "anthropic"  # vicuna, wizardlm, stablebeluga, open_assistant
os.environ["AWS_REGION"] = "us-east-1"  # change to your region
# os.environ["AWS_ACCESS_KEY_ID"] = "XXX"  # needed if not using boto3 session
# os.environ["AWS_SECRET_ACCESS_KEY"] = "XXX"  # needed if not using boto3 session

from easyllm.clients import bedrock

response = bedrock.ChatCompletion.create(
    model="anthropic.claude-v2",
    messages=[
        {"role": "user", "content": "What is 2 + 2?"},
    ],
    temperature=0.9,
    top_p=0.6,
    max_tokens=1024,
    debug=False,
)
response
```
```
{'id': 'hf-Mf7UqliZQP',
 'object': 'chat.completion',
 'created': 1698333425,
 'model': 'anthropic.claude-v2',
 'choices': [{'index': 0,
   'message': {'role': 'assistant', 'content': '2 + 2 = 4'},
   'finish_reason': 'stop_sequence'}],
 'usage': {'prompt_tokens': 9, 'completion_tokens': 9, 'total_tokens': 18}}
```
As you can see, the response object has a few fields:

- `id`: the ID of the request
- `object`: the type of object returned (e.g., `chat.completion`)
- `created`: the timestamp of the request
- `model`: the full name of the model used to generate the response
- `usage`: the number of tokens used to generate the replies, counting prompt, completion, and total
- `choices`: a list of completion objects (only one, unless you set `n` greater than 1)
    - `message`: the message object generated by the model, with `role` and `content`
    - `finish_reason`: the reason the model stopped generating text (either `stop`, or `length` if the `max_tokens` limit was reached)
    - `index`: the index of the completion in the list of choices
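For example, a quick sketch reading a couple of those fields off the response above:

```python
# inspect why generation stopped and how many tokens were consumed
choice = response["choices"][0]
print(choice["finish_reason"])            # e.g. 'stop_sequence'
print(response["usage"]["total_tokens"])  # e.g. 18
```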
Extract just the reply with:

```python
print(response['choices'][0]['message']['content'])
```

```
2 + 2 = 4
```
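If you make this kind of call often, a small wrapper keeps it tidy (a sketch; `ask` is a hypothetical helper, not part of EasyLLM):

```python
def ask(prompt: str, model: str = "anthropic.claude-v2") -> str:
    """Send a single user message and return only the assistant's reply text."""
    response = bedrock.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

print(ask("What is 2 + 2?"))
```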
Even non-conversation-based tasks can fit into the chat format by placing the instruction in the first user message.
For example, to ask the model to explain asynchronous programming in the style of the pirate Blackbeard, we can structure the conversation as follows:
```python
# example with a system message
response = bedrock.ChatCompletion.create(
    model="anthropic.claude-v2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain asynchronous programming in the style of a math teacher."},
    ],
)

print(response['choices'][0]['message']['content'])
```
````
Okay class, today we're going to learn about asynchronous programming. Asynchronous means things happening at different times, not necessarily in order. It's like when you're cooking dinner - you might put the pasta on to boil, then start chopping vegetables while the pasta cooks. You don't have to wait for the pasta to finish boiling before you can start on the vegetables. The two tasks are happening asynchronously.

In programming, asynchronous functions allow the code to execute other operations while waiting for a long-running task to complete. Let's look at an example:

```js
function cookPasta() {
  console.log("Putting pasta on to boil...");
  // Simulate a long task
  setTimeout(() => {
    console.log("Pasta done!");
  }, 5000);
}

function chopVegetables() {
  console.log("Chopping vegetables...");
}

cookPasta();
chopVegetables();
```

When we call `cookPasta()`, it starts the timer but doesn't wait 5 seconds - it immediately moves on to calling `chopVegetables()`. So the two functions run asynchronously.

The key is that `cookPasta()` is non-blocking - it doesn't stop the rest of the code from running while it completes. This allows us to maximize efficiency and not waste time waiting.

So in summary, asynchronous programming allows multiple operations to happen independently of each other, like cooking a meal. We avoid blocking code execution by using asynchronous functions. Any questions on this?
````
```python
# example without a system message:
response = bedrock.ChatCompletion.create(
    model="anthropic.claude-v2",
    messages=[
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ],
)

print(response['choices'][0]['message']['content'])
```
```
Aye matey! Asynchronous programming be when ye fire yer cannons without waiting fer each shot to hit. Ye keep loadin' and shootin' while the cannonballs sail through the air. Ye don't know exactly when they'll strike the target, but ye keep sendin' 'em off.

The ship keeps movin' forward, not stalled waiting fer each blast. Other pirates keep swabbin' the decks and hoistin' the sails so we make progress while the cannons thunder. We tie callbacks to the cannons to handle the boom when they finally hit.

Arrr! Asynchronous programmin' means ye do lots o' tasks at once, not blocked by waitin' fer each one to finish. Ye move ahead and let functions handle the results when ready. It be faster than linear code that stops at each step. Thar be treasures ahead, lads! Keep those cannons roarin'!
```
## 3. Few-shot prompting
In some cases, it's easier to show the model what you want rather than tell the model what you want.
One way to show the model what you want is with faked example messages.
For example:
```python
# An example of a faked few-shot conversation to prime the model into translating business jargon to simpler speech
response = bedrock.ChatCompletion.create(
    model="anthropic.claude-v2",
    messages=[
        {"role": "system", "content": "You are a helpful, pattern-following assistant."},
        {"role": "user", "content": "Help me translate the following corporate jargon into plain English."},
        {"role": "assistant", "content": "Sure, I'd be happy to!"},
        {"role": "user", "content": "New synergies will help drive top-line growth."},
        {"role": "assistant", "content": "Things working well together will increase revenue."},
        {"role": "user", "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage."},
        {"role": "assistant", "content": "Let's talk later when we're less busy about how to do better."},
        {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
    ],
)

print(response["choices"][0]["message"]["content"])
```
```
Changing direction at the last minute means we don't have time to do an exhaustive analysis for what we're providing to the client.
```
Not every attempt at engineering conversations will succeed at first.
If your first attempts fail, don't be afraid to experiment with different ways of priming or conditioning the model.
As an example, one developer discovered an increase in accuracy when they inserted a user message that said "Great job so far, these have been perfect" to help condition the model into providing higher quality responses.
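A sketch of that trick applied to the jargon translator above (the priming exchange is illustrative, not a tested prompt):

```python
response = bedrock.ChatCompletion.create(
    model="anthropic.claude-v2",
    messages=[
        {"role": "system", "content": "You are a helpful, pattern-following assistant."},
        {"role": "user", "content": "Help me translate the following corporate jargon into plain English."},
        {"role": "assistant", "content": "Sure, I'd be happy to!"},
        {"role": "user", "content": "New synergies will help drive top-line growth."},
        {"role": "assistant", "content": "Things working well together will increase revenue."},
        # hypothetical priming message to condition the model toward higher-quality replies
        {"role": "user", "content": "Great job so far, these have been perfect."},
        {"role": "assistant", "content": "Thank you! Happy to keep going."},
        {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
    ],
)
```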
For more ideas on how to lift the reliability of the models, consider reading our guide on techniques to increase reliability. It was written for non-chat models, but many of its principles still apply.