Common LLM parameters

This post is related to:

  1. BART configuration parameters overview
  2. BERT configuration parameters overview
  3. Exploring OpenAI models locally without APIs (DRAFT-GUIDE)

| Parameter | Description | Location in Documentation |
|---|---|---|
| `pad_token_id` | The token ID used for padding sequences, telling the model which positions are padding. | Check the tokenizer documentation: Hugging Face Tokenizers |
| `eos_token_id` | The token ID that marks the end of a sequence (End Of Sequence). | Check model documentation for specifics: Hugging Face Models |
| `attention_mask` | A mask indicating which tokens the model should attend to (1 for real tokens, 0 for padding). | Refer to model-specific usage in the Hugging Face library: Transformers Usage |
| `input_ids` | The token IDs representing the input text sequence. | General model input specifications: Hugging Face Model Input |
| `output_attentions` | A boolean flag indicating whether the model should return attention weights in addition to its outputs. | Detailed in specific model API docs: Transformers API |
| `max_length` | The maximum sequence length for generation tasks, limiting how long the output can be. | Check the generation section in model docs: Text Generation |
| `num_return_sequences` | The number of independently computed sequences to return for each element in the batch, useful for generating multiple candidate responses. | Refer to the text generation configuration: Text Generation |
| `temperature` | Strictly positive float used to scale the logits before sampling. Values below 1 sharpen the distribution (less randomness), values above 1 flatten it (more randomness); as the value approaches 0, sampling approaches greedy decoding, i.e. all probability mass shifts to the most likely token. | Discussed in generation options: Generation Parameters |
| `top_k` | The number of highest-probability vocabulary tokens to keep for sampling. | Detailed in sampling strategies: Sampling Methods |
| `top_p` | The cumulative probability threshold for nucleus sampling: only the smallest set of tokens whose probabilities sum to at least `top_p` is kept. | Explained in sampling calculations: Nucleus Sampling |
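To make the sampling parameters concrete, here is a minimal sketch (not the Hugging Face implementation, just the underlying idea) of how `temperature`, `top_k`, and `top_p` transform a vector of logits into a filtered probability distribution. The function name `filter_logits` is invented for illustration:

```python
import math

def filter_logits(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Toy sketch of temperature scaling, top-k, and top-p (nucleus) filtering.

    temperature must be > 0; top_k=0 means "no top-k filtering";
    top_p=1.0 means "no nucleus filtering".
    Returns {token_index: probability} over the surviving tokens.
    """
    # Temperature scaling: divide logits by the temperature.
    # T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [l / temperature for l in logits]

    # Softmax (subtracting the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # top-k: keep only the k highest-probability tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # top-p: keep the smallest prefix of the ranked tokens whose
    # cumulative probability reaches at least top_p.
    kept, cum = [], 0.0
    for i in ranked:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the surviving tokens before sampling.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# With top_k=2, only the two most likely tokens survive and are renormalized:
print(filter_logits([2.0, 1.0, 0.1], temperature=1.0, top_k=2))
```

In a real `model.generate(...)` call these parameters do the same job, but the filtering is applied at every decoding step over the full vocabulary.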