I have the following compass images:
Image 1: [compass image]

Image 2: [rotated compass image]
For a human, it is straightforward to infer the directions from these images. For example:
In Image 1, the directions are:
Top → North
Bottom → South
Left → West
Right → East

In Image 2, the compass is rotated, so the directions change accordingly:

Top → South
(the remaining directions follow from the rotation)

The Problem

I am using Google's Gemini multimodal model, but it struggles to correctly interpret the compass directions from these images. Its responses are inconsistent and often incorrect.
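For context, I am calling the model roughly like this (a simplified sketch; it assumes the google-generativeai Python SDK, and the API key, model name, file name, and prompt text are placeholders rather than my exact values):

```python
import google.generativeai as genai
from PIL import Image

# Assumed setup: google-generativeai Python SDK.
# API key, model name, file name, and prompt below are placeholders.
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

compass = Image.open("compass_image_1.png")
prompt = (
    "Look at this compass image. For the top, bottom, left, and right "
    "edges of the image, state which cardinal direction each one faces."
)

# Multimodal request: text prompt plus the image in a single call.
response = model.generate_content([prompt, compass])
print(response.text)
```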
My Goal

I want to achieve accurate direction recognition solely through prompt engineering (without additional training or external processing). I believe this should be possible with well-structured prompts.
Questions

1. How can I craft an effective prompt that guides the model to read the compass orientation correctly?
2. Are there prompt engineering techniques that could improve accuracy (e.g., step-by-step reasoning, few-shot examples, role-based prompting)?
3. If prompt engineering alone is insufficient, what alternative strategies can I try (e.g., pre-processing the image, or extracting features programmatically before passing them to the model)? A rough sketch of what I mean by this is at the end of the post.

I would appreciate any insights or suggestions!
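For question 3, this is the kind of pre-processing I have in mind: estimate the needle (or "N" marking) angle with classic computer vision first, and then pass that angle to the model as plain text instead of asking it to read the dial from pixels. The sketch below is untested and makes strong assumptions (the needle is the longest straight segment in the image, the dial is roughly centred); the function name and thresholds are only illustrative.

```python
import cv2
import numpy as np

def needle_angle_degrees(path: str):
    """Rough estimate of the needle angle as a compass bearing.

    Assumptions (illustrative only): the needle is the longest straight
    segment found by a Hough transform, and the dial is centred in the
    image. Returns degrees clockwise from "up", or None if nothing is found.
    """
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    h, w = img.shape
    cx, cy = w / 2, h / 2

    # Edge map, then probabilistic Hough transform for straight segments.
    edges = cv2.Canny(img, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                            minLineLength=min(h, w) // 4, maxLineGap=10)
    if lines is None:
        return None

    # Take the longest segment as the needle candidate.
    x1, y1, x2, y2 = max(lines[:, 0, :],
                         key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))

    # Orient the segment so it points away from the dial centre.
    if np.hypot(x1 - cx, y1 - cy) > np.hypot(x2 - cx, y2 - cy):
        x1, y1, x2, y2 = x2, y2, x1, y1

    # Convert image coordinates (y grows downward) to a bearing where
    # 0 degrees = top of the image and angles increase clockwise.
    dx, dy = x2 - x1, y2 - y1
    return float(np.degrees(np.arctan2(dx, -dy)) % 360)
```

The resulting angle could then be handed to Gemini as text (or mapped directly to "Top → South" etc. without the model at all), but I would prefer a pure prompt-engineering solution if one exists.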