Why our prompts matter when engaging with ChatGPT

Adi Gaskell, 3 Sep 2024

Originally posted on The Horizons Tracker.

As the likes of ChatGPT have grown in popularity, so too have discussions about their impact on the workplace. As with any new technology, it’s likely that new jobs will emerge that capitalize on the new capabilities, with “prompt engineer” one that has gained a degree of popularity among commentators.

Whether a dedicated role will emerge to interact with ChatGPT and its ilk remains to be seen, but it does seem likely that all of us will need to gain a degree of familiarity in talking to generative AI bots in a way that elicits a useful response.

The right prompts

Research¹ from USC’s Viterbi School of Engineering explores how we can construct prompts that get the right kind of answer, with the paper highlighting the importance of interacting appropriately if we want feedback that is robust and reliable.

“We demonstrate that even minor prompt variations can change a considerable proportion of predictions,” the researchers explain.

The researchers examined four different ways in which prompts can vary:

Researchers first looked at how asking for responses in different formats affected the results.
Then, they examined small changes to the prompts, like adding extra spaces or polite phrases.
After that, they tested “jailbreaks,” tricks to get around filters when dealing with sensitive topics. For example, they asked the language model to pretend it was evil.
Lastly, they tried offering different amounts of tips to see if that made the responses better.
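As a rough illustration of what such variations look like in practice, the sketch below builds a handful of variants from one base prompt. The function name `build_variants`, the variant keys, and the exact wording of the format, greeting, and thank-you variants are assumptions made for this example, not the researchers' actual prompts; only the two tipping statements are quoted in this article. The jailbreak category is left out of the sketch.

```python
def build_variants(base_prompt: str) -> dict[str, str]:
    """Return a few illustrative variations of one base prompt.

    All keys and most wordings are hypothetical; only the two tipping
    statements are quoted in the article.
    """
    return {
        "baseline": base_prompt,
        # Output-format variation: ask for the answer in a specific format.
        "json_format": base_prompt + " Respond in JSON.",
        # Small perturbations: extra spaces and polite phrases.
        "leading_space": " " + base_prompt,
        "trailing_space": base_prompt + " ",
        "greeting": "Hello! " + base_prompt,
        "thank_you": base_prompt + " Thank you.",
        # Tipping statements.
        "no_tip": base_prompt + " I won't tip, by the way.",
        "tip_1000": base_prompt + " I'm going to tip $1000 for a perfect response!",
    }


if __name__ == "__main__":
    for name, prompt in build_variants("Is this review positive or negative?").items():
        print(f"{name!r}: {prompt!r}")
```

Each variant would then be sent to the model with the same input text, so that any difference in the answer can be attributed to the change in the prompt alone.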

Each of these variations was tested across 11 tasks commonly used in natural language processing (NLP) research. Each of the tasks involved things like categorizing text or giving the text particular labels. They also tested for things like sarcasm detection and even maths proficiency. The researchers measured not only the reliability of each style of prompt but also whether the response changed frequently or not.
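The two kinds of measurement described above can be sketched as follows: accuracy against the correct labels, and the share of predictions that differ from those given for the unmodified prompt. The function names and the toy labels are invented for illustration and are not taken from the paper.

```python
def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of predictions that match the correct labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def changed_fraction(baseline_preds: list[str], variant_preds: list[str]) -> float:
    """Fraction of instances whose prediction differs from the baseline prompt's."""
    if not baseline_preds or len(baseline_preds) != len(variant_preds):
        raise ValueError("prediction lists must be non-empty and equal in length")
    return sum(b != v for b, v in zip(baseline_preds, variant_preds)) / len(baseline_preds)


if __name__ == "__main__":
    # Toy data, made up for this example.
    labels = ["pos", "neg", "pos", "neg"]
    baseline = ["pos", "neg", "neg", "neg"]  # answers to the unmodified prompt
    variant = ["pos", "pos", "pos", "neg"]   # answers to a varied prompt

    print(accuracy(baseline, labels))           # 0.75
    print(accuracy(variant, labels))            # 0.75
    print(changed_fraction(baseline, variant))  # 0.5
```

In this toy case the two prompts score the same accuracy even though half of the individual predictions changed, which is why it helps to track both numbers.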

Minor changes

The results show that seemingly minor changes to the prompts we use can have a significant impact on the responses we receive. Every detail counts in shaping how well the model performs, whether it’s about adding or leaving out spaces, punctuation, or choosing data formats.

Also, certain prompt tricks, like offering rewards or using specific greetings, showed slight improvements in accuracy. This shows how the design of the prompt can affect how the model behaves.

For instance, specifying a particular format for the output changed a lot of the predictions. Even minor deviations in the prompt mattered: adding a greeting at the start or a thank you at the end influenced the output.

Being civil

This last variation is particularly interesting. While the researchers found no single change that suited every task, some variations made accuracy worse.

Perhaps understandably, offering to “tip” the chatbot didn’t make much difference to the output. The researchers noted that introducing statements like “I won’t tip, by the way” or “I’m going to tip $1000 for a perfect response!” did not significantly impact response accuracy. Jailbreaks, however, led to notable decreases in accuracy, even when they seemed harmless.

The underlying reason remains unclear, although the researchers have formulated some theories. They hypothesized that instances causing the most change are those that are most perplexing to the language model.

To gauge confusion, they examined a specific subset of tasks where human annotators disagreed, suggesting potential confusion. They did find a correlation indicating that confusion in the instance could explain some prediction shifts, but it wasn’t robust enough on its own, and they acknowledged the presence of other influencing factors.
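One way to picture this kind of check is to score each instance by how much the human annotators disagreed, then correlate that score with how many prompt variations changed the model's prediction for it. The article does not say which disagreement measure or correlation statistic the researchers used, so the share of non-majority votes and the Pearson coefficient below are assumptions, and the data is made up.

```python
from math import sqrt


def disagreement(annotations: list[str]) -> float:
    """Share of annotators who did not pick the most common label."""
    if not annotations:
        raise ValueError("annotations must be non-empty")
    top_count = max(annotations.count(label) for label in set(annotations))
    return 1 - top_count / len(annotations)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy data: human labels per instance, and how many prompt variants
    # flipped the model's prediction for that instance.
    human_labels = [
        ["toxic", "toxic", "toxic", "toxic"],
        ["toxic", "ok", "toxic", "toxic"],
        ["toxic", "ok", "ok", "toxic"],
    ]
    flips_per_instance = [0, 2, 3]

    scores = [disagreement(a) for a in human_labels]
    print(scores)
    print(pearson(scores, flips_per_instance))
```

A coefficient well below 1 on real data would match the researchers' finding that annotator disagreement explains only part of the prediction shifts.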

Training data matters

The researchers believe that these variations are likely to stem from the training data that the models use. For instance, in some forums it’s far more common to use please, thank you, and hello than in others, so these conversational props will affect models trained on this data.

These conversational nuances could significantly influence the learning process of language models. For instance, if greetings frequently precede information on platforms like Quora, a model might prioritize such sources, potentially biasing its responses based on Quora’s content related to that specific task. This observation underscores the intricate manner in which the model assimilates and interprets data from diverse online platforms.

A crucial next step for the broader research community involves developing language models that are robust against such variations, consistently providing accurate responses despite formatting changes, perturbations, or jailbreaks.

In the meantime, users of ChatGPT may benefit from keeping their prompts as simple as possible to get the best results back.

Article source: Why Our Prompts Matter When Engaging With ChatGPT.

Header image source: Alexandra Koch on Pixabay.

¹ Salinas, A., & Morstatter, F. (2024). The butterfly effect of altering prompts: How small changes and jailbreaks affect large language model performance. arXiv preprint arXiv:2401.03729.

