Adjusting LLM Parameters


Dr. Amir Mohammadi

Generative AI Instructor

Understanding LLM Parameters

To unlock the full potential of LLMs, we need to understand the parameters that allow us to customize and fine-tune the output. These settings act as dials that you can adjust to control the style, creativity, length, and coherence of the text that the model generates. In this section, we’ll go over the key parameters that influence how your language model behaves.

Key Parameters for Customizing LLM Output

  1. Temperature

    The temperature parameter controls the randomness or creativity of the text generated by the model. Essentially, it decides how predictable or novel the responses should be.

    • Low Temperature (0.1 - 0.3): Produces more deterministic, factual, and coherent responses. It’s useful for situations where you need consistency or precision (like code generation or technical writing).

    • High Temperature (0.7 - 1.0): Results in more creative and diverse responses. This is ideal when you're looking for originality, brainstorming, or generating imaginative content like stories or poems.

    When to use low temperature: If you need accuracy, like generating technical content, fact-based answers, or instructions.

    When to use high temperature: If you need variety, like creative writing, brainstorming, or exploring multiple possibilities.
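The effect of temperature can be seen in how it rescales the model's next-token scores before sampling. Here is a minimal sketch in Python using a toy three-token vocabulary; the logit values are illustrative, not taken from any real model:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize into probabilities.
    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more random)."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy scores for three candidate next tokens

cold = softmax_with_temperature(logits, 0.2)  # near-deterministic
hot = softmax_with_temperature(logits, 1.0)   # more even spread
```

At temperature 0.2 the top token takes almost all of the probability mass, while at 1.0 the other candidates keep a realistic chance of being sampled.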

  2. Top-p (Nucleus Sampling)

    Top-p is a parameter that controls the diversity of the output by narrowing down the pool of potential words that the model can choose from. It’s also known as nucleus sampling. By adjusting top-p, you can limit the model to a subset of words that together represent a specific percentage of the probability mass.

    • Low p (0.1 - 0.3): Limits the selection to the most probable words, resulting in more predictable, conservative text.

    • High p (0.7 - 1.0): Expands the selection to a wider range of words, producing more diverse and sometimes surprising outputs.

    When to use low top-p: If you need more controlled, relevant responses where coherence is important (e.g., technical answers or instructions).

    When to use high top-p: If you want creative, less predictable output that introduces novelty and diversity (e.g., creative writing, exploring multiple ideas).
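Mechanically, nucleus sampling sorts the candidate tokens by probability and keeps only the smallest set whose cumulative probability reaches p, then renormalizes. A minimal sketch with a made-up four-token distribution:

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p, then renormalize so the kept ones sum to 1.
    probs: dict mapping token -> probability."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

probs = {"the": 0.5, "a": 0.3, "one": 0.15, "zebra": 0.05}

narrow = nucleus_filter(probs, 0.5)   # only the single most likely token
wide = nucleus_filter(probs, 0.95)    # three tokens make the cut
```

With a low p the model can only pick the most probable word; raising p lets lower-probability (and occasionally surprising) words like "zebra" back into contention.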

  3. Max Tokens

    The max tokens parameter caps the length of the generated response, measured in tokens. Tokens are the units of text a model reads and writes; a token can be as short as a single character or as long as a whole word. This parameter helps manage how long or short the model's output should be.

    • Short Outputs (50-150 tokens): Ideal for quick responses, summaries, or specific answers.

    • Long Outputs (200+ tokens): Suitable for detailed explanations, essays, or creative content that requires more elaboration.

    When to use max tokens: Set the token limit based on the amount of content you need. If you require a concise answer, use a smaller max tokens value. For more in-depth explanations, increase the limit.
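Conceptually, max tokens is a hard cap on the generation loop: the model may stop earlier on its own, but it never runs past the budget. A toy sketch (the `generate` helper and `<eos>` stop token here are illustrative, not a real model API):

```python
def generate(next_token_fn, max_tokens):
    """Append tokens until the model emits an end-of-sequence token
    or the max_tokens budget runs out, whichever comes first."""
    tokens = []
    for _ in range(max_tokens):
        token = next_token_fn(tokens)
        if token == "<eos>":  # the model chose to stop on its own
            break
        tokens.append(token)
    return tokens

# A toy "model" that would happily generate forever gets cut off
# at exactly the budget:
capped = generate(lambda history: "word", max_tokens=50)
```

Note that the cap truncates rather than summarizes: if the budget is too small for the content, the response simply stops mid-thought.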

Why Adjust LLM Parameters?

Now that we know what each parameter does, let's discuss why it’s important to adjust them. Customizing these parameters allows you to:

  • Enhance Quality: Fine-tuning helps you control the coherence, relevance, and factual accuracy of the output.

  • Increase Creativity and Diversity: By adjusting temperature and top-p, you can unlock more varied and imaginative responses, which is valuable for brainstorming or content creation.

  • Ensure Relevance: Lower temperature and top-p values keep the output focused on the specific task at hand, preventing the model from straying off-topic.

Real-World Applications

The ability to adjust parameters is not just a theoretical concept—it has real-world applications. Below are a few examples of how you might adjust parameters for different scenarios:

  • Customer Support: For generating precise answers, you’ll want a low temperature and low top-p. This ensures that the responses are focused, clear, and accurate.

  • Creative Writing: When writing a poem or a story, or brainstorming ideas, you would likely use a higher temperature and top-p, allowing the model to generate more creative and diverse content.

  • Technical Writing: For documentation, product descriptions, or code generation, you'd use lower temperature and top-p settings to ensure clarity, precision, and conciseness.
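One practical way to capture these recommendations is a small table of presets that your application selects from by scenario. The exact values below are illustrative assumptions following the ranges discussed earlier, not official settings for any particular model:

```python
# Hypothetical parameter presets for the three scenarios above.
# Values follow the low/high ranges discussed in this lesson.
PRESETS = {
    "customer_support":  {"temperature": 0.2, "top_p": 0.3, "max_tokens": 150},
    "creative_writing":  {"temperature": 0.9, "top_p": 0.9, "max_tokens": 400},
    "technical_writing": {"temperature": 0.3, "top_p": 0.4, "max_tokens": 300},
}

def preset_for(scenario):
    """Look up a scenario's settings, defaulting to the conservative
    technical_writing preset if the scenario is unknown."""
    return PRESETS.get(scenario, PRESETS["technical_writing"])
```

Keeping the settings in one place like this makes it easy to tune a scenario later without hunting through your generation code.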

Activity 1: Experimenting with Parameters

In this activity, you will experiment with different parameter settings to see how they affect the output. Use the following prompt:

Prompt: "Write a 100-word description of a futuristic city in the year 2050."

  1. Set temperature to 0.2, top-p to 0.5, and max tokens to 100: Analyze the output. Is it clear, focused, and factual?

  2. Set temperature to 0.9, top-p to 0.9, and max tokens to 200: Compare this output with the first one. Is it more creative and diverse?

  3. Set temperature to 0.7, top-p to 0.8, and max tokens to 150: Observe the balance between creativity and coherence.

Activity 2: Parameter Adjustments for Different Scenarios

Now, apply your understanding of the parameters to the following use cases:

  1. Customer Inquiry: "How do I reset my password?" – Adjust temperature, top-p, and max tokens for a concise, helpful response.

  2. Creative Poem: Generate a short poem about spring with a focus on creativity and diversity.

  3. Code Generation: Generate a code snippet that solves a simple math problem; adjust the parameters so the result is precise and functional.