Differences in Prompting Techniques: Claude vs. GPT

Introduction

When working with language models (LLMs) like Claude and GPT, the effectiveness of prompts can vary significantly based on the model’s architecture and capabilities. This article explores key differences in prompting strategies for Claude and GPT, using advanced techniques such as meta-prompting, XML tagging, and the Chain of Thoughts (CoT) method. By understanding these differences, users can optimize their prompts to enhance the accuracy and reliability of LLM outputs.

Context Window Size and Information Processing

One of the most significant differences between Claude and GPT lies in their context window size—the amount of text they can handle in a single prompt.

  • Claude (anthropic/claude-3.5-sonnet) can process up to 200,000 tokens making it ideal for tasks requiring the analysis of large documents or aggregating data from multiple sources. This feature is especially valuable in fields like business and academic research, where Claude can process entire reports or research papers in one go. Because Claude has such a large capacity, it’s essential to be clear and explicit in prompts to help the model focus on the relevant parts of the input.
  • GPT-4 (gpt-4o-2024-08-06) supports up to 128,000 tokens, which, while smaller than Claude’s window, still represents a significant improvement from previous models. This makes GPT ideal for tasks involving moderate-length documents or specific, well-defined queries.

Our Claude prompts in average have 9000 tokens per input and 600 tokens per output. For GPT it’s 9000 tokens per input and 650 tokens per out, because of different token embeddings 

In practice, Claude’s larger context window means prompt engineering requires attention to chunking and summarizing within the input to make the most of its potential, while GPT might be better suited for more focused and precise tasks.

Role of Examples and Instructions

Both models work better when provided with examples and clear instructions:

  • In Claude, examples play a central role in shaping the model’s response. For complex tasks, offering several examples that showcase the expected response pattern can significantly improve the output. It’s important to ensure that these examples are concise and relevant, avoiding unnecessary complexity. Importantly, examples have to be diverse in order not to tend LLM to the basis.

Example prompt with One Shot. We provided our prompt with an example of correctly resolving our Testcase-2 (1100-C). 

Prompt:

You are a Casework expert tasked with reviewing a specification and selecting the correct Series option that represents cabinet materials and thickness. Follow these instructions carefully:

1. Review the available Series options:

<options>

{OPTIONS}

</options>

2. Carefully read and understand the following instructions for selection:

{INSTRUCTION}

3. Examine the provided specification:

<specification>

SPECIFICATION_USER_INPUT

</specification>

4. Think through your decision-making process step by step. Consider the following:

   a. Identify key information in the specification related to cabinet materials and thickness.

   b. Compare this information to the criteria outlined in the instruction.

   c. Match the identified information with the available Series options.

   d. Eliminate options that do not meet the criteria.

   e. Select the most appropriate Series option based on your analysis.

5. After your analysis, provide your reasoning and selection in the following format:

   <thinking>

   [Include your step-by-step reasoning here]

   </thinking>

   <result>

   RESULT: [Insert selected Series option here]

   </result>

<example>

Specification:
PART 2 – PRODUCTS

2.2 MATERIALS

General: Provide materials that comply with requirements of the AWI quality standard for each type of woodwork and quality grade specified, unless otherwise indicated.

Wood Products: Comply with the following:

Particleboard: ANSI A208.1, Grade M-2 M-2-Exterior Glue (at wet sink areas).

High-Pressure Decorative Laminate: NEMA LD 3, grades as indicated, or if not indicated, as required by woodwork quality standard.

Available Manufacturers: Subject to compliance with requirements, manufacturers offering high-pressure decorative laminates that may be incorporated into the Work include, but are not limited to, the following:

Formica Corporation.

Nevamar.

Wilsonart International; Div. of Premark International, Inc.

Adhesive for Bonding Plastic Laminate: PVA.

Adhesive for Bonding Edges: Adhesive specified above for faces.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2.8 LAMINATE COUNTERTOPS

Quality Standard: Comply with AWI Section 400 requirements for high-pressure decorative laminate countertops.

Grade: Premium.

High-Pressure Decorative Laminate Grade: HGS.

Colors, Patterns, and Finishes: Provide materials and products that result in colors and textures of exposed laminate surfaces complying with the following requirements:

Provide Architect’s selections from manufacturer’s full range of colors and finishes in the following categories:

Solid colors.

Patterns.

Wood grains.

Grain Direction: Parallel to cabinet fronts.

Edge Treatment: 3mm PVC. Color to match laminate.

Core Material: One inch Particleboard

Core Material at Sinks: Particleboard made with exterior glue, or exterior-grade plywood.


Answer:

<thinking>

Let’s analyze the specification step-by-step:

1. Core Material Selection:

– The specification mentions “Particleboard: ANSI A208.1, Grade M-2 M-2-Exterior Glue (at wet sink areas).”

– There’s no mention of NAUF or urea formaldehyde restrictions.

– Default to Industrial Particleboard CARB Compliant.

2. Fronts Material Selection:

  – High-Pressure Decorative Laminate (HPL) is specified for exposed surfaces. This indicates HPL for fronts.

3. Fronts Thickness Determination:

– The specification mentions “Vertical Surfaces: GP-50 (0.050-inch nominal thickness)”

– This corresponds to the standard thickness for HPL.

4. Ends Material Selection:

– Not explicitly specified, but we can assume it’s the same as fronts (HPL).

5. Fronts Edging Selection:

– The specification states “Provide minimum .020″ thick solid PVC edging on all doors and drawer fronts.”

– This indicates Square Flat Edge.

6. Cabinet Edgebanding Selection:

– Not explicitly specified, default to Flat (.020″) PVC Cabinet Edgebanding.

Based on these selections, we can narrow down our options to series that have:

– Industrial Particleboard CARB Compliant core

– HPL fronts and ends

– Square Flat Edge

The series that best matches these criteria is:

</thinking>

<result>

RESULT: 1100-C – Sq Flat Edge HPL Frts & HPL Ends IPB CARB Compliant Core

</result>

</example>

Remember to base your decision solely on the information provided in the specification, instruction, and available options. Do not make assumptions or introduce external 

information.

OUTPUT:

Result: LLM chose IPB Core Compliant and Sq Flat Edge, although previously she easily solved this test case (1200-NAUF), which means that one test case can form the basis to which the LLM will tend when making its answer. Therefore, according to the Claude documentation, examples should be diverse.

Example prompt with Few Shots. We provided our prompt with an example of correctly resolving our second test case (1100-C) and first (1200-N). 

OUTPUT:

We asked Claude to evaluate our examples on relevance, diversity and clarity.

OUTPUT:

  • GPT performs well when given examples to demonstrate expected output, especially in creative or technical contexts. Its ability to generalize from a few examples (i.e., few-shot learning) allows it to produce accurate outputs even from a limited context.

Conclusion:

We can add as many solved test cases to the prompt as long as it all fits into the context window of LLM, and as research shows it will increase accuracy, but unfortunately it is not practical. Although incorporating nine chain-of-thought resolved results into a prompt can improve accuracy in solving tasks, the cost of 35 cents per call makes this method impractical for production use. These nine test cases, comprising around 100,000 tokens, would fit into the context windows of both GPT (128,000 tokens) and Claude (200,000 tokens). An alternative approach to this problem could be implementing Retrieval-Augmented Generation (RAG) with a few-shot chain of thought. This would allow us to insert only the three most relevant resolved test cases into the prompt based on vector search matching the input specification, optimizing both accuracy and relevance. Unfortunately, implementing RAG would require at least 25 test cases to function effectively, which we currently lack. Additionally, RAG would be time-consuming to implement and challenging to test, despite its potential accuracy benefits.

Meta-Prompting

Meta-prompting is a technique used to refine and enhance the performance of a prompt. For Claude LLM, meta-prompting can significantly improve prompt effectiveness. The Claude documentation recommends using the  “Generate a Prompt” feature that allows users to quickly create the first prompt or enhance their existing prompts. This functionality supports iterative improvements, helping to fine-tune prompts based on specific needs.

In contrast, meta-prompting for GPT has not shown substantial improvements in performance in our tests. Although this may be case-specific, it highlights a fundamental difference in how these models handle prompt enhancements.

Claude playground

Formatting Prompts

The way prompts are formatted and segmented can affect the performance of LLMs. Claude’s documentation recommends using XML tags to separate parts of the prompt template or large prompts. This method has proven effective in practice, improving the results significantly.
For GPT, it is advised to specify the delimiter being used, such as “You will be provided with a pair of articles (delimited with XML tags).” This indicates a fundamental difference in how Claude and GPT handle prompt formatting.м

Results of correct formatting for Claude are described here, where meta-prompting made correct formatting.

Example for Claude:

Example for GPT:

OR

System and User Input Separation

GPT and Claude’s performance benefits from separating system prompts and user inputs, allowing the model to process information more effectively. For instance, the system prompt can be set up with specific instructions, while the user input follows separately.

In contrast, separating system and user inputs for GPT often results in diminished performance. GPT tends to handle tasks more efficiently when all instructions are provided in a single system message.

Example for Claude:

Additionally, it was observed that when XML tags were used to separate user input for Claude, it actually led to worse results. Specifically, tagging inputs separately (e.g., encapsulating user input within XML tags) made it harder for Claude to generate meaningful responses, likely due to how the model parses structured inputs compared to plain text. This finding highlights the importance of avoiding unnecessary formatting structures like XML tags when separating inputs for Claude.

Example Prompt for Claude:

You are a Casework expert tasked with reviewing a specification and selecting the correct Series option that represents cabinet materials and thickness. Follow these instructions carefully:

1. Review the available Series options:

<options>

{OPTIONS}

</options>

2. Carefully read and understand the following instructions for selection:

{INSTRUCTION}

3. Examine the provided specification:

<specification>

SPECIFICATION_USER_INPUT

</specification>

4. Think through your decision-making process step by step. Consider the following:

   a. Identify key information in the specification related to cabinet materials and thickness.

   b. Compare this information to the criteria outlined in the instruction.

   c. Match the identified information with the available Series options.

   d. Eliminate options that do not meet the criteria.

   e. Select the most appropriate Series option based on your analysis.

5. After your analysis, provide your reasoning and selection in the following format:

   <thinking>

   [Include your step-by-step reasoning here]

   </thinking>

   <result>

   RESULT: [Insert selected Series option here]

   </result>
Remember to base your decision solely on the information provided in the specification, instruction, and available options.
Do not make assumptions or introduce external information.

Worse results:

messages=[
    {
        "role": "system",
        "content": prompt_template.format(
            OPTIONS=options, INSTRUCTION=instruction,
        ),
    },
    {
        "role": "user",
        "content": f"<specification>{input_text}<specification/>",
    },
],

Better results:

messages=[
    {
        "role": "system",
        "content": prompt_template.format(
            OPTIONS=options, INSTRUCTION=instruction,
        ),
    },
    {
        "role": "user",
        "content": input_text,
    },
],

Chain of Thoughts (CoT) Technique

The Chain of Thoughts (CoT) technique is essential for guiding LLMs through complex instructions by encouraging them to think step-by-step. For Claude, it is recommended to use XML tags to designate where the model should include its reasoning. This structured approach helps Claude organize its thoughts and produce coherent outputs.
For GPT, a simpler approach of instructing the model to “think step by step” or “take your time” is often sufficient. This technique encourages GPT to consider each step in the process but does not require the specific tagging used in Claude.

Example for Claude:

<thinking>

   [Include your step-by-step reasoning here]

</thinking>

Output:
<thinking>

Let’s analyze the specification step-by-step:

1. Core Material Selection:

   – The specification mentions “Particleboard: ANSI A208.1, Grade M-2 M-2-Exterior Glue (at wet sink areas).”

   – There’s no mention of NAUF or urea formaldehyde restrictions.

   – Default to Industrial Particleboard CARB Compliant.

– – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – 

Based on these selections, we can narrow down our options to series that have:

– Industrial Particleboard CARB Compliant core

– HPL fronts and ends

– Square Flat Edge

The series that best matches these criteria is:

</thinking>

Example for GPT:

You think step by step. Take your time to answer.

Output:

### Step-by-Step Analysis:

#### 1. Material Requirements:

– **Core Material**: The specification mentions that adhesives should not contain urea formaldehyde. Therefore, we should select NAUF (No Added Urea Formaldehyde) Particleboard.

– **Fronts Material**: The specification requires high-pressure decorative laminate (HPL) for exposed surfaces.

– **Fronts Thickness**: The specification mentions GP-50 (0.050-inch nominal thickness) for horizontal surfaces and GP-28 (0.028-inch nominal thickness) for vertical surfaces. However, for the fronts, we will consider the standard thickness options provided in the series.

– **Ends Material**: The ends should match the fronts, which is HPL.

– **Fronts Edging**: The specification does not allow PVC edging, so we should look for options without PVC edging.

– **Cabinet Edgebanding**: The specification does not allow PVC edging, so we should look for options without PVC edging.

 – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – – 

#### 7. Cabinet Edgebanding Selection:

– **Selected Cabinet Edgebanding**: Since PVC edging is not allowed, we should look for options without PVC edging.

### Final Review:

– **Core Material**: NAUF PB Core.

– **Fronts Material**: 3mm High Pressure Laminate (HPL).

– **Ends Material**: 3mm High Pressure Laminate (HPL).

– **Fronts Edging**: 3mm Radius Edge.

– **Cabinet Edgebanding**: Options without PVC edging.

One of the key recommendations in the GPT documentation is to decompose complex tasks into simpler, more manageable steps. This method significantly enhances performance and reduces the likelihood of the model making mistakes by handling more digestible components of a task one by one. GPT excels when provided with specific instructions in a sequence, allowing it to focus on a single objective at a time.

For example, when working on determining the correct Series option that represents cabinet materials and thickness from a PDF document, a sequence of chain-of-calls (or sequential prompts) could simplify the task. This approach reduces cognitive overload for the model, ensuring more accurate and reliable outputs.

Here’s how this decomposition would look:

  • First prompt: Extract all relevant information about cabinet materials and thickness from the provided document.

This step isolates data extraction from interpretation, ensuring GPT focuses solely on identifying and retrieving the relevant details.

  • Second prompt: Based on the extracted information, determine the correct Series option that represents the cabinet materials and thickness using the provided instructions.

Here, the model is now tasked with processing the extracted data, interpreting it according to specific criteria, and generating a decision.

By breaking down the task into these two steps, we reduce the complexity for the model, minimizing the chances of misinterpretation or errors. This method also makes it easier to troubleshoot and refine individual steps, should the need arise.

Example on Claude:

Example GPT:

In general, this strategy of decomposing tasks into smaller, sequential prompts is especially useful in GPT’s chain of thought framework, where the model benefits from clear and distinct steps. When a PDF file is loaded into GPT for processing, the pricing is calculated based on the total word count of the entire PDF, which can become expensive, especially with large documents. But in my opinion, this method is one of the best ways to minimize hallucinations, and make the model’s responses more deterministic by extracting relevant sections before submitting them to another prompt to GPT. This strategy allows for a focused query, ensuring the model processes only the necessary information, leading to more reliable answers. However, in our specific test case, it might be more cost-efficient to develop an algorithm that automatically identifies and extracts only “Part 2 products,” instead of processing the entire document with GPT, making it a cheaper and more targeted solution.

Roleplay Models vs. GPT: Following Instructions and Logical Compliance

Roleplay models like Claude are more rigid in following instructions, making them ideal for tasks requiring strict adherence to step-by-step guidelines. In contrast, GPT models often rely on their own internal logic, which can lead to creative and effective problem-solving but may deviate from precise instructions. This flexibility is beneficial for open-ended tasks but requires additional review when strict accuracy is needed.

From our experience with Claude and GPT on similar prompts, we observe that Claude tends to follow a more structured, step-by-step approach to instructions, while GPT’s outputs often involve more improvisation, making it necessary to monitor tasks for precise compliance.

TESTCASE-3 (2800-C 3mm Frts & TF Color Throughout IPB CARB Compliant Core)

GPT OUTPUT:

Let’s follow the steps outlined in the instructions to determine the correct series option.

### Step 1: Material Requirements

– The specification mentions that the basis for the specification is Stevens Advantage 2800 Designer Series.

### Step 2: Core Material Selection

– The core material should be Industrial Particleboard CARB Compliant unless adhesives with urea formaldehyde are prohibited.

– There is no mention of prohibiting adhesives with urea formaldehyde, so we will use Industrial Particleboard CARB Compliant.

### Step 3: Fronts Material Selection

– The specification states that door and drawer fronts must be matching high pressure laminate (HPL) on both sides.

– Therefore, the fronts material should be High Pressure Laminate (HPL).

### Step 4: Fronts Thickness Determination

– The specification states that door and drawer fronts shall be 3/4″ thick.

– The standard thickness for HPL fronts is 3mm or Square Flat Edge.

### Step 5: Ends Material Selection

– The specification allows for either color matching thermofused or matching HPL for the rest of the exposed/semi-exposed cabinet components.

– Since the basis for the specification is Stevens Advantage 2800 Designer Series, which uses Thermally Fused Laminate (TFL) for color matching, we will select TFL for the ends.

### Step 6: Fronts Edging Selection

– The specification states that door and drawer fronts shall have 3mm radius extrusion banding.

– Therefore, the edging style should be 3mm Radius Edge.

### Step 7: Cabinet Edgebanding Selection

– The specification states that cabinet sides, top, bottom, adjustable shelves, and other interior components shall be edged with (.020″) flat edge extrusion.

– Therefore, the cabinet edgebanding style should be Flat (.020″) PVC Cabinet Edgebanding.

### Step 8: Final Review

– Ensure all selections are compatible with each other.

– Verify that all requirements from the original specification have been addressed.

Based on the above steps, the correct series option is:

**RESULT: 2800-C – 3mm Frts & TF Color Throughout IPB CARB Compliant Core**

Claude OUTPUT: (link here)

Let’s go through this step-by-step:

1. Material Requirements:

   The specification mentions color matching thermofused or high pressure laminate for exposed/semi-exposed surfaces. Door and drawer fronts must be high pressure laminate on both sides.

2. Core Material Selection:

   The specification mentions industrial grade particleboard, but doesn’t explicitly state NAUF. Therefore, we’ll default to Industrial Particleboard CARB Compliant.

3. Fronts Material Selection:

   Door and drawer fronts are specified as high pressure laminate (HPL) on both sides.

4. Fronts Thickness Determination:

   The specification states that door and drawer fronts shall be 3/4″ thick.

5. Ends Material Selection:

   Exposed/semi-exposed surfaces can be either color matching thermofused or high pressure laminate.

6. Fronts Edging Selection:

   The specification states “Edges shall have 3mm radius extrusion banding.”

7. Cabinet Edgebanding Selection:

   Cabinet edges are specified as (.020″) flat edge extrusion.

8. Interior Material Selection:

   Non-exposed interior surfaces shall be thermofused melamine laminate.

9. Final Review:

   Based on the above, we need a series with:

   – Industrial Particleboard CARB Compliant core

   – HPL fronts

   – 3mm radius edge on fronts

   – Either HPL or TFL ends (TFL is more cost-effective)

   – 3/4″ thickness

The series that best matches these requirements is:

RESULT: 1200-C – 3mm HPL Frts & HPL Ends IPB CARB Compliant Core

This series provides:

– High Pressure Laminate fronts and ends

– Industrial Particleboard CARB Compliant core

– 3mm Radius Edge style

– Flat (.020″) PVC Cabinet Edgebanding

While it doesn’t explicitly mention the 3/4″ thickness, this is standard for this type of construction. The HPL ends meet the requirement, though TFL ends would also have been acceptable if available in a matching series.

Here GPT’s outputs involved more improvisation, looking through documentations, it incorrectly determined the HPL for fronts material, but chose 2800-C, because found the sentence “the basis for the specification is Stevens Advantage 2800 Designer Series”. Claude adheres to the instructions more strictly and rarely uses such tricks. It determined that fronts is HPL and draws conclusions based solely on the instructions we gave it.

Conclusions

The differences between prompt engineering for Claude and GPT are nuanced and center around formatting and step-by-step guidance, structuring, rather than the overall content of the prompts. Claude, with its ability to handle larger context windows, excels in tasks requiring the processing of extensive documents. It benefits from the use of separated system and user inputs, as well as XML-based formatting, which can help structure the conversation effectively.

GPT, on the other hand, is more streamlined and performs optimally with a single, well-structured prompt, without the need for separated formatting. Even though GPT and Claude have some structural differences, we found that the same core instructions worked for both models with only minor adjustments. What initially seemed like a significant difference in prompt construction eventually converged, as identical instructions worked effectively for both.

In conclusion, while the formatting and handling of step-by-step instructions may vary between Claude and GPT, the fundamental principles of prompt engineering remain largely consistent across both models. By understanding and leveraging each model’s unique strengths, it is possible to achieve high-quality results regardless of the choice between Claude and GPT.

References

Claude Prompting Guide

GPT Prompting Guide

//Tags

Related articles

//Prompting

Minimizing Randomness in GPT Responses: A Guide to Using Seeds, Top_p, and Monitoring System Fingerprints

//Prompting

Experiments with different LMMs

//Prompting

Prompt Engineering through Structured Instructions and Advanced Techniques