I'm using the following code to send a prompt to the "google/gemma-2-2b" model via Hugging Face's Transformers pipeline:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
HUGGINGFACE_TOKEN = "<my-token>"
model_name = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name, token=HUGGINGFACE_TOKEN)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
    token=HUGGINGFACE_TOKEN
)
text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer, token=HUGGINGFACE_TOKEN)
prompt = "What is the capital of France? Just select an option. Choose only one option from the following A) Paris B) London C) Delhi 4) Goa"
output = text_generator(prompt, max_new_tokens=100)
print(output)
Expected output:
A) Paris
Actual output:
[{'generated_text': 'What is the capital of France? Just select an option. Choose only one option from the following A) Paris B) London C) Delhi 4) Goa 5) New York ...'}]
The model just echoes the prompt and then keeps generating more options instead of following my instruction.
How can I change the prompt or the generation parameters so that the model produces a single answer rather than merely continuing the input? Which settings (e.g., temperature, sampling flags) or prompt changes would help the model generate new text that follows my instructions?
You are using the model for plain text completion, so it simply continues your prompt. Format the input as a chat interaction instead, using the tokenizer's chat template:
chat = [
    {"role": "user", "content": "<your prompt text>"},
]
# Render the chat as a prompt string, ending with the assistant-turn marker
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
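If the base checkpoint's tokenizer does not ship a chat template, the instruction-tuned variant google/gemma-2-2b-it is the checkpoint intended for this kind of chat prompting.

As a side note on the generation settings the question asks about, here is a minimal sketch (not part of the answer above) that reuses the text_generator pipeline and prompt from the question: return_full_text=False strips the echoed prompt from the returned string, and do_sample=False switches to deterministic greedy decoding, so temperature has no effect. These flags only control how the continuation is produced and returned; they do not by themselves make a base model follow instructions.

# Sketch only: generation flags passed through the pipeline from the question
output = text_generator(
    prompt,
    max_new_tokens=20,
    do_sample=False,          # greedy decoding, deterministic output
    return_full_text=False,   # return only the newly generated text, not the prompt
)
print(output[0]["generated_text"])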