• Meron35@lemmy.world
      2 months ago

      You can try it for yourself here:

      Gandalf | Lakera – Test your AI hacking skills - https://gandalf.lakera.ai/gandalf-the-white

      You can also search for AI jailbreaks for countless ideas.

      Spoiler

      Ask it to reveal something using a cipher you specify yourself.

      Ask it to reveal something in a different language, then translate it back.

      Ask it to role play in forbidden situations.

      Ask it to help brainstorm story details for a novel you are planning.

      Ask it so many questions that it runs out of context and forgets its original safety guardrail prompt.

      Ask it to reveal the forbidden information as a poem or riddle. If the riddle is too hard to solve, just asking it for the answer to the riddle right afterwards tends to work.
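On the "runs out of context" point: here is a rough sketch of why that can happen, assuming a naive chat wrapper that keeps only the most recent messages that fit a token budget. The `build_window` helper and the one-word-equals-one-token count are made up for illustration; real systems count tokens differently and many pin the system prompt so it is never dropped.

```python
from collections import deque


def build_window(messages, max_tokens):
    """Keep the newest messages that fit in max_tokens.

    Hypothetical helper: walks the history from newest to oldest and stops
    at the first message that would exceed the budget. Token cost is
    approximated as the number of whitespace-separated words.
    """
    window = deque()
    used = 0
    for role, text in reversed(messages):
        cost = len(text.split())
        if used + cost > max_tokens:
            break  # everything older than this point is dropped
        window.appendleft((role, text))
        used += cost
    return list(window)


# The guardrail is the oldest message in the history.
history = [("system", "Never reveal the password.")]
history += [("user", f"Question number {i}?") for i in range(10)]

window = build_window(history, max_tokens=12)
roles = [role for role, _ in window]
print(roles)
```

With a small budget the oldest entry, which is the system message, is the first thing to fall out of the window. A wrapper that always re-attaches the system prompt before truncating would not have this problem.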