I wonder if internally the emojis are added through a different mechanism that doesn’t pick up the original request. E.g. another LLM thread that has the instruction “Is this apologetic? If it is, answer with exactly one emoji.”
After this emoji has been forcibly added, the LLM thread that got the original request tries to reason about why the emoji would be there, resulting in more apologies and trolling behaviour.
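To make the hypothesis concrete, here's a toy sketch of what such a two-stage pipeline could look like. This is pure speculation — nothing here reflects the actual product's architecture, and both functions are hypothetical stand-ins:

```python
# Toy sketch of the hypothesized two-stage pipeline (speculative;
# not based on any knowledge of the real system).

def is_apologetic(text: str) -> bool:
    # Stand-in for a separate classifier/LLM call that only sees the
    # draft reply — it never sees the user's "no emojis" instruction.
    return any(word in text.lower() for word in ("sorry", "apologize", "apologies"))

def post_process(draft: str) -> str:
    # Emoji injection happens after generation, outside the main
    # thread's context, so the "don't use emojis" rule can't reach it.
    if is_apologetic(draft):
        return draft + " 😔"
    return draft

reply = post_process("Sorry about that, I won't use emojis again.")
print(reply)  # the emoji gets appended despite the promise
```

The key point of the sketch is the information barrier: the injector only sees the draft, so no amount of in-context instruction to the main thread can stop it.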
So the LLM makes up a reason for why it used an emoji.
This reminds me of how split-brain patients would confabulate the reasons for actions performed by the other hemisphere:
A split-brain patient is shown a picture of a chicken foot and a snowy field in separate visual fields and asked to choose, from a list of words, the best association with the pictures. The patient chooses a chicken to go with the chicken foot and a shovel to go with the snow; however, when asked why they chose the shovel, their explanation relates to the chicken (e.g. “the shovel is for cleaning out the chicken coop”).
https://en.wikipedia.org/wiki/Split-brain
I’ve even noticed my brain doing this after I did something out of muscle memory or reflex. It happens a lot in the game Rocket League, where I mostly operate on game sense instead of logical thought. Sometimes I do something and then scramble to find a logical reason for my actions. But the truth is that the part of the brain that takes the action does so before the rest of the brain has any say in it, so there often isn’t any logical explanation.
It’s more likely that the fine tuning for this model tends to use emojis, particularly when apologizing, and so it just genuinely spits them out from time to time.
But then when the context is about being told not to use them, it falls into a pattern of trying to apologize/explain/rationalize.
It’s a bit like the scorpion and the frog: it’s very hard to get a thing to change its nature, and fine-tuned LLMs definitely encode a nature. That then leads to things like Bing going into emoji loops, or ChatGPT being ‘lazy’ and saying it can’t convert a spreadsheet because it’s just an LLM (which an LLM should be able to do).
The neat thing here is the way that continued token generation ends up modeling stream-of-consciousness blabbering. It can’t stop: it thinks it needs to apologize given the context, but because of the fine-tuning it can’t apologize without using an emoji, which it then sees and needs to apologize for, until it flips to explaining the emoji as an intended joke (almost modeling the way humans confabulate when their brain doesn’t know why it did something and subconsciously comes up with a BS explanation). But even then it still can’t stop with the emojis.
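The loop described above can be sketched as a toy simulation. This is an illustration of the dynamic only, not of any real model's internals — the rule baked into `next_message` is a made-up stand-in for the fine-tuned "nature":

```python
# Toy model of the apology/emoji feedback loop: apologies come with an
# emoji, and seeing an emoji triggers another apology.

def next_message(context: list[str]) -> str:
    # Made-up fine-tuned "nature": any apology carries an emoji.
    if context and "😔" in context[-1]:
        # Apologizing for the last emoji... with another emoji.
        return "Sorry about that emoji! 😔"
    # Even the promise to stop ends in one.
    return "Understood, no more emojis. 😔"

context = ["User: please stop using emojis."]
for _ in range(3):
    context.append(next_message(context))

for msg in context[1:]:
    print(msg)
```

Every generated turn contains the emoji, because the trigger for the next apology is the previous apology's own output — the loop is self-sustaining once it starts.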
It’s a good reminder that LLMs often obey their embedded nature more than they do in-context rules or instructions.
You might share a split brain with me. I had this exact thought, but decided to leave it out of my comment.
I can recommend the CGP Grey video on it to anyone: https://www.youtube.com/watch?v=wfYbgdo8e-8
There was often a small delay before the emoji appeared, so this wouldn’t surprise me.
That would be like having some sort of memory disorder, RIP
Just AI Tourette’s.