I'm using the following code to send a prompt to the "google/gemma-2-2b" model via Hugging Face's Transformers pipeline:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
HUGGINGFACE_TOKEN = "<my-token>"
model_name = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name, token=HUGGINGFACE_TOKEN)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
    token=HUGGINGFACE_TOKEN
)
text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer, token=HUGGINGFACE_TOKEN)
prompt = "What is the capital of France? Just select an option. Choose only one option from the following A) Paris B) London C) Delhi 4) Goa"
output = text_generator(prompt, max_new_tokens=100)
print(output)
Expected output:
A) Paris
Actual output:
[{'generated_text': 'What is the capital of France? Just select an option. Choose only one option from the following A) Paris B) London C) Delhi 4) Goa 5) New York ...'}]
The model just echoes the prompt and then keeps generating more options instead of following my instruction.
How can I change the prompt or the generation parameters so that the model produces a single answer rather than merely continuing the input? Which settings (e.g., temperature, sampling flags) or prompt changes would help the model generate new text that follows my instructions?
You are using the model for plain text completion, so it simply continues your prompt. Format the input as a chat interaction instead, using the tokenizer's chat template:
chat = [
    {"role": "user", "content": "<your prompt text>"},
]
# Render the chat as a prompt string, ending with the assistant-turn marker
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
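If the base checkpoint's tokenizer does not ship a chat template, the instruction-tuned variant google/gemma-2-2b-it is the checkpoint intended for this kind of chat prompting.

As a side note on the generation settings the question asks about, here is a minimal sketch (not part of the answer above) that reuses the text_generator pipeline and prompt from the question: return_full_text=False strips the echoed prompt from the returned string, and do_sample=False switches to deterministic greedy decoding, so temperature has no effect. These flags only control how the continuation is produced and returned; they do not by themselves make a base model follow instructions.

# Sketch only: generation flags passed through the pipeline from the question
output = text_generator(
    prompt,
    max_new_tokens=20,
    do_sample=False,          # greedy decoding, deterministic output
    return_full_text=False,   # return only the newly generated text, not the prompt
)
print(output[0]["generated_text"])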