
I have the following compass images:

Image 1: [compass, unrotated]
Image 2: [compass, rotated]

For a human, it is straightforward to infer the directions from these images. For example:

In Image 1, the directions are:

- Top → North
- Bottom → South
- Left → West
- Right → East

In Image 2, the compass is rotated, so the directions change accordingly:

- Top → South
- (The rest of the directions follow based on the rotation.)

The Problem

I am using Google's Gemini multimodal model, but it struggles to correctly interpret the compass direction from these images. The model's responses are inconsistent and often incorrect.
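For reference, this is roughly how I'm querying the model — a minimal sketch assuming the `google-generativeai` Python SDK; the API key, model name, and file path are placeholders:

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")            # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model name

# A naive single-question prompt plus the raw compass image.
image = Image.open("compass_1.png")                # placeholder path
response = model.generate_content(
    ["Which cardinal direction does the top of this compass image point to?", image]
)
print(response.text)  # the answer varies from run to run
```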

My Goal

I want to achieve accurate direction recognition solely through prompt engineering (without additional training or external processing). I believe this should be possible with well-structured prompts.

Questions

1. How can I craft an effective prompt to guide the model in correctly reading the compass orientation?
2. Are there prompt engineering techniques that could improve accuracy (e.g., step-by-step reasoning, few-shot examples, role-based prompting)? See the prompt sketch below for the kind of thing I mean.
3. If prompt engineering alone is insufficient, what alternative strategies can I try (e.g., pre-processing the image, or extracting features programmatically before passing them to the model)? A rough sketch of one such idea also follows below.

I would appreciate any insights or suggestions!
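To make question 2 concrete, this is the kind of structured prompt I have been considering: role prompting plus explicit step-by-step instructions and one worked (few-shot) example. The wording is purely illustrative; I have not verified that it fixes the inconsistency:

```python
# Illustrative prompt combining role, step-by-step reasoning, and a
# worked example; passed to the model alongside the image, e.g.
# model.generate_content([PROMPT, image]).
PROMPT = """You are an expert at reading analog compass dials.
Follow these steps and show your work:
1. Find the letter 'N' on the dial and say where it sits in the image
   (top, bottom, left, right, or an angle in between).
2. The dial's rotation is the clockwise angle from the image's top to 'N'.
3. Using that rotation, state which cardinal direction lies at the
   top, bottom, left, and right of the image.

Worked example: if 'N' is printed at the bottom of the image, the dial
is rotated 180 degrees, so Top → South, Bottom → North, Left → East,
Right → West.

Now answer for the attached image. End with exactly one line:
ANSWER: Top=<dir>, Bottom=<dir>, Left=<dir>, Right=<dir>"""
```

And for question 3, one pre-processing idea would be to estimate the dial's rotation programmatically and pass the result to the model as plain text (or skip the model entirely). A rough OpenCV sketch, assuming a cropped image of the 'N' label as a template and a dial centred in the frame; all paths and thresholds are placeholders:

```python
import cv2
import numpy as np

def dial_rotation(image_path: str, n_template_path: str) -> float:
    """Estimate how far the dial is rotated, in degrees clockwise,
    by locating the 'N' label with template matching."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    tmpl = cv2.imread(n_template_path, cv2.IMREAD_GRAYSCALE)
    res = cv2.matchTemplate(img, tmpl, cv2.TM_CCOEFF_NORMED)
    _, _, _, (x, y) = cv2.minMaxLoc(res)            # best-match top-left corner
    nx = x + tmpl.shape[1] / 2                      # centre of the matched 'N'
    ny = y + tmpl.shape[0] / 2
    cx, cy = img.shape[1] / 2, img.shape[0] / 2     # assume dial centre = image centre
    # Clockwise angle of 'N' from straight up; 0 means the dial is unrotated.
    return float(np.degrees(np.arctan2(nx - cx, cy - ny)) % 360)

directions = ["North", "East", "South", "West"]          # clockwise dial order
rot = dial_rotation("compass_2.png", "n_template.png")   # placeholder paths
top = directions[(4 - round(rot / 90)) % 4]              # label now at the image's top
print(f"Top → {top}")
```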
