I have the following compass images:
Image 1: [compass image]

Image 2: [rotated compass image]
For a human, it is straightforward to infer the directions from these images. For example:
In Image 1, the directions are:
Top → North
Bottom → South
Left → West
Right → East

In Image 2, the compass is rotated, so the directions change accordingly:

Top → South
(the remaining directions follow from the rotation)

The Problem

I am using Google's Gemini multimodal model, but it struggles to correctly interpret the compass directions from these images. Its responses are inconsistent and often incorrect.
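For context, I am calling the model roughly like this (a simplified sketch; it assumes the google-generativeai Python SDK, and the API key, model name, file name, and prompt text are placeholders rather than my exact values):

```python
import google.generativeai as genai
from PIL import Image

# Assumed setup: google-generativeai Python SDK.
# API key, model name, file name, and prompt below are placeholders.
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

compass = Image.open("compass_image_1.png")
prompt = (
    "Look at this compass image. For the top, bottom, left, and right "
    "edges of the image, state which cardinal direction each one faces."
)

# Multimodal request: text prompt plus the image in a single call.
response = model.generate_content([prompt, compass])
print(response.text)
```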
My Goal

I want to achieve accurate direction recognition solely through prompt engineering (without additional training or external processing). I believe this should be possible with well-structured prompts.
Questions

1. How can I craft an effective prompt that guides the model to read the compass orientation correctly?
2. Are there prompt engineering techniques that could improve accuracy (e.g., step-by-step reasoning, few-shot examples, role-based prompting)?
3. If prompt engineering alone is insufficient, what alternative strategies can I try (e.g., pre-processing the image, or extracting features programmatically before passing them to the model)? A rough sketch of what I mean by this is at the end of the post.

I would appreciate any insights or suggestions!
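For question 3, this is the kind of pre-processing I have in mind: estimate the needle (or "N" marking) angle with classic computer vision first, and then pass that angle to the model as plain text instead of asking it to read the dial from pixels. The sketch below is untested and makes strong assumptions (the needle is the longest straight segment in the image, the dial is roughly centred); the function name and thresholds are only illustrative.

```python
import cv2
import numpy as np

def needle_angle_degrees(path: str):
    """Rough estimate of the needle angle as a compass bearing.

    Assumptions (illustrative only): the needle is the longest straight
    segment found by a Hough transform, and the dial is centred in the
    image. Returns degrees clockwise from "up", or None if nothing is found.
    """
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    h, w = img.shape
    cx, cy = w / 2, h / 2

    # Edge map, then probabilistic Hough transform for straight segments.
    edges = cv2.Canny(img, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                            minLineLength=min(h, w) // 4, maxLineGap=10)
    if lines is None:
        return None

    # Take the longest segment as the needle candidate.
    x1, y1, x2, y2 = max(lines[:, 0, :],
                         key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))

    # Orient the segment so it points away from the dial centre.
    if np.hypot(x1 - cx, y1 - cy) > np.hypot(x2 - cx, y2 - cy):
        x1, y1, x2, y2 = x2, y2, x1, y1

    # Convert image coordinates (y grows downward) to a bearing where
    # 0 degrees = top of the image and angles increase clockwise.
    dx, dy = x2 - x1, y2 - y1
    return float(np.degrees(np.arctan2(dx, -dy)) % 360)
```

The resulting angle could then be handed to Gemini as text (or mapped directly to "Top → South" etc. without the model at all), but I would prefer a pure prompt-engineering solution if one exists.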