How to use Chat Completion clients

EasyLLM can be used as an abstraction layer to replace gpt-3.5-turbo and gpt-4 with open source models.

You can migrate your own applications away from the OpenAI API by simply changing the client.

Chat models take a series of messages as input, and return an AI-written message as output.

This guide illustrates the chat format with a few example API calls.

1. Import the easyllm library

In [ ]:

# if needed, install and/or upgrade to the latest version of the EasyLLM Python library
%pip install --upgrade easyllm

In [4]:

# import the EasyLLM Python library for calling the EasyLLM API
import easyllm

2. An example chat API call

A chat API call has two required inputs:

model: the name of the model you want to use (e.g., meta-llama/Llama-2-70b-chat-hf), or leave it empty to just call the API
messages: a list of message objects, where each object has two required fields:
- role: the role of the messenger (either system, user, or assistant)
- content: the content of the message (e.g., Write me a beautiful poem)
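
The message shape described above can be checked before a request is sent. The following is a minimal sketch, not part of EasyLLM: `validate_messages` and `ALLOWED_ROLES` are hypothetical names introduced here for illustration only.

```python
# Hypothetical helper, not part of EasyLLM: checks the message shape described above.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages(messages):
    """Raise ValueError if any message lacks a valid role or string content."""
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for i, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unexpected role {role!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {i}: content must be a string")
    return messages


# A single user message is already a valid conversation.
messages = validate_messages([
    {"role": "user", "content": "Write me a beautiful poem"},
])
```

Running a check like this locally catches a misspelled role or a missing content field before any tokens are spent on a request.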

Compared to the OpenAI API, the huggingface module also exposes prompt_builder and stop_sequences parameters, which you can use to customize the prompt and the stop sequences. The EasyLLM package comes with prompt builder utilities.
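
To make the idea of a prompt builder concrete, here is a rough sketch of what such a function does conceptually: it flattens the list of messages into the single text prompt an open source model consumes. The "Role: content" layout and the name `build_simple_prompt` are assumptions made for this illustration; they are not claimed to match the templates that EasyLLM ships.

```python
# Illustrative only: a simplified "Role: content" layout, assumed for this sketch.
# It is not claimed to match the template EasyLLM uses for any model.
def build_simple_prompt(messages):
    """Flatten chat messages into one prompt string ending with an open assistant turn."""
    labels = {"system": "System", "user": "User", "assistant": "Assistant"}
    lines = [f"{labels[m['role']]}: {m['content'].strip()}" for m in messages]
    lines.append("Assistant:")  # cue the model to write the next assistant message
    return "\n".join(lines)


prompt = build_simple_prompt([
    {"role": "user", "content": "Knock knock."},
    {"role": "assistant", "content": "Who's there?"},
    {"role": "user", "content": "Cat."},
])
print(prompt)
```

With a layout like this one, a string such as "User:" is a natural stop sequence: without it, the model may keep going and start writing the next user turn itself.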

Let's look at an example chat API call to see how the chat format works in practice.

In [1]:

import os 
# set env for prompt builder
os.environ["HUGGINGFACE_PROMPT"] = "falcon" # vicuna, wizardlm, stablebeluga, open_assistant
# os.environ["HUGGINGFACE_TOKEN"] = "hf_xxx" 

from easyllm.clients import huggingface
from easyllm.prompt_utils.falcon import falcon_stop_sequences

MODEL="tiiuae/falcon-180B-chat"

response = huggingface.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."},
        {"role": "user", "content": "Knock knock."},
        {"role": "assistant", "content": "Who's there?"},
        {"role": "user", "content": "Cat."},
    ],
    temperature=0.9,
    top_p=0.6,
    max_tokens=1024,
    stop=falcon_stop_sequences,
)
response

Out[1]:

{'id': 'hf-ceVG8KGm04',
 'object': 'chat.completion',
 'created': 1695106309,
 'model': 'tiiuae/falcon-180B-chat',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': "*Knock knock* Who's there? Cat. Cat who? Cat got your tongue?\nUser:"},
   'finish_reason': 'stop_sequence'}],
 'usage': {'prompt_tokens': 144, 'completion_tokens': 23, 'total_tokens': 167}}

As you can see, the response object has a few fields:

id: the ID of the request
object: the type of object returned (e.g., chat.completion)
created: the timestamp of the request
model: the full name of the model used to generate the response
usage: the number of tokens used to generate the replies, counting prompt, completion, and total
choices: a list of completion objects (only one, unless you set n greater than 1)
- message: the message object generated by the model, with role and content
- finish_reason: the reason the model stopped generating text (e.g., stop_sequence, as in the output above, or length if the max_tokens limit was reached)
- index: the index of the completion in the list of choices
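
As a small sketch of how these fields can be used together, the snippet below reads the finish reason and token usage from a response dict. The dict is copied from the output shown earlier so that the example runs without calling the API; `summarize` is a hypothetical helper, not an EasyLLM function.

```python
# Sample response copied from the output shown earlier, so no API call is needed.
response = {
    "id": "hf-ceVG8KGm04",
    "object": "chat.completion",
    "created": 1695106309,
    "model": "tiiuae/falcon-180B-chat",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "*Knock knock* Who's there? Cat. Cat who? Cat got your tongue?\nUser:",
            },
            "finish_reason": "stop_sequence",
        }
    ],
    "usage": {"prompt_tokens": 144, "completion_tokens": 23, "total_tokens": 167},
}


def summarize(response):
    """Hypothetical helper: report why generation stopped and how many tokens were used."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "finish_reason": choice["finish_reason"],
        # "length" means the max_tokens limit cut the reply short
        "truncated": choice["finish_reason"] == "length",
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }


summary = summarize(response)
print(summary)
```

Checking for a truncated reply is useful in practice: if the finish reason is length, raising max_tokens or shortening the prompt is usually the next step.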

Extract just the reply with:

In [2]:

print(response['choices'][0]['message']['content'])

*Knock knock* Who's there? Cat. Cat who? Cat got your tongue?
User:
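
Note that the reply above ends with a dangling "User:", which is the stop sequence itself rather than part of the answer. One way to tidy this up after the fact is sketched below. `strip_trailing_stop` is a hypothetical helper, and the `stops` list is an assumption made for this example; the actual contents of falcon_stop_sequences may differ.

```python
def strip_trailing_stop(text, stop_sequences):
    """Remove one trailing stop sequence (and surrounding whitespace) from text, if present."""
    cleaned = text.rstrip()
    for stop in stop_sequences:
        marker = stop.strip()
        if marker and cleaned.endswith(marker):
            return cleaned[: -len(marker)].rstrip()
    return cleaned


# Hypothetical stop list for this sketch; the real falcon_stop_sequences may differ.
stops = ["\nUser:", "<|endoftext|>"]
reply = "*Knock knock* Who's there? Cat. Cat who? Cat got your tongue?\nUser:"
print(strip_trailing_stop(reply, stops))
```

Text that does not end with any of the listed stop sequences is returned unchanged apart from trailing whitespace.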

Even non-conversation-based tasks can fit into the chat format, by placing the instruction in the first user message.

For example, to ask the model to explain asynchronous programming in the style of a math teacher, we can structure the conversation as follows:

In [3]:

# example with a system message
response = huggingface.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain asynchronous programming in the style of math teacher."},
    ],
    stop=falcon_stop_sequences,
)

print(response['choices'][0]['message']['content'])

Asynchronous Programming: A Mathematical Approach

Good day, class! Today we're going to discuss a fascinating topic in the world of programming - asynchronous programming. Now, you might be wondering what this has to do with math. Well, just like how mathematical operations can sometimes be synchronous or asynchronous, so too can computer programs.

Let's start by defining our terms. Synchronous processes are those that happen one after another, in a predictable sequence. For example, if you were to add two numbers together, then multiply the result by another number, these operations would typically happen synchronously – the addition occurs first, followed by the multiplication.

Asynchronous processes, on the other hand, don't necessarily follow such a strict order. They're more like parallel lines in geometry – they can run alongside each other independently, without waiting for one another to finish. In programming, this means that multiple tasks can be performed at the same time, without one task blocking another from starting.

So why is this useful? Well, imagine you're working on a complex mathematical problem that requires several calculations. If you were to perform these calculations synchronously, you'd have to wait for each calculation to finish before starting the next one. This could take quite some time, especially if your calculations are dependent on external factors such as user input or network latency.

With asynchronous programming, however, you can perform multiple calculations simultaneously. This means that while one calculation is waiting for user input, another can continue processing data from a different source. As a result, your overall computation time is reduced, making your program more efficient and responsive.

Of course, there are challenges involved in asynchronous programming, much like solving an intricate mathematical puzzle. One major issue is ensuring that all asynchronous tasks complete successfully, even if they encounter errors along the way. This requires careful planning and error handling, similar to how you would approach solving a complex equation.

In conclusion, asynchronous programming is a powerful tool in the programmer's toolkit, much like advanced mathematical concepts are essential for solving complex problems. By understanding the principles behind asynchronous processes, you can create more efficient and responsive programs, ready to tackle any challenge that comes their way.

Now, let's put this knowledge into practice with some coding exercises, shall we?