• 3 Posts
  • 26 Comments
Joined 2 years ago
Cake day: July 13th, 2023

  • It would have to be more than just river crossings, yeah.

    Although I’m also dubious that their LLM is good enough to solve arbitrary river crossing puzzles using a tool. It’s not that simple: the constraints have to be translated into the format that the tool understands, and the answer translated back. I got told that o3 solves my river crossing variant, but the chat log they gave showed incorrect code being run and then a correct answer magically appearing, so I think it wasn’t anything quite as general as that.
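
    To illustrate what such a tool would have to do: the search itself is trivial, just breadth-first search over bank states; the hard part is exactly that translation step, encoding each puzzle’s constraints. A minimal sketch, with the classic wolf/goat/cabbage encoding standing in (all names and structure are my own illustration, not anyone’s actual tool):

    ```python
    from collections import deque

    # A generic river-crossing solver: breadth-first search over
    # (items on the left bank, boat side) states. The puzzle-specific
    # part, which is what makes automatic translation hard, is the
    # constraint encoding: pairs that can't be left alone together.
    ITEMS = frozenset({"farmer", "wolf", "goat", "cabbage"})
    FORBIDDEN = [{"wolf", "goat"}, {"goat", "cabbage"}]

    def safe(bank):
        """A bank is safe if the farmer is there or no forbidden pair is."""
        return "farmer" in bank or not any(pair <= bank for pair in FORBIDDEN)

    def solve():
        start = (ITEMS, "left")        # everything starts on the left bank
        goal = (frozenset(), "right")  # everything ends up on the right
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            (left, boat), path = queue.popleft()
            if (left, boat) == goal:
                return path
            here = left if boat == "left" else ITEMS - left
            if "farmer" not in here:
                continue  # the boat can't move without the farmer
            # The farmer crosses alone or with one item from his bank.
            for cargo in [set()] + [{item} for item in here - {"farmer"}]:
                moving = {"farmer"} | cargo
                new_left = left - moving if boat == "left" else left | moving
                if not (safe(new_left) and safe(ITEMS - new_left)):
                    continue
                state = (new_left, "right" if boat == "left" else "left")
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [sorted(moving)]))

    for crossing in solve():
        print(crossing)
    ```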




  • Further support for the memorization claim: I posted examples on this forum of novel river crossing puzzles that LLMs completely fail.

    Note that Apple’s actors / agents river crossing is a well known “jealous husbands” variant, which you can ask a chatbot to explain to you. It gladly explains, even as it can’t follow its own explanation (since of course it isn’t its own explanation but a plagiarized one, even if it changes the words).

    edit: https://awful.systems/post/4027490 and earlier https://awful.systems/post/1769506

    I think what I need to do is write up a bunch of puzzles, assign them randomly to 2 sets, and test & post one set while holding back the second set entirely (not even testing it on any online chatbots). Then in a year or two, see how much the public set improves versus the held-back one.
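
    A sketch of that split, with hypothetical puzzle names and a fixed seed so the assignment is reproducible:

    ```python
    import random

    # Hypothetical setup: a pool of 20 novel puzzle write-ups, split in
    # half with a fixed seed so the assignment is reproducible. The public
    # half gets tested & posted now; the held-back half touches no chatbot.
    puzzles = [f"puzzle_{i:02d}" for i in range(20)]
    random.seed(2025)
    random.shuffle(puzzles)
    public_set, held_back = sorted(puzzles[:10]), sorted(puzzles[10:])
    print("public:", public_set)
    print("held back:", held_back)
    ```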






  • I think it could work as a minor gimmick, like the terminal hacking minigame in Fallout. You have to convince the LLM to tell you the password, or you get to talk to a demented robot whose brain was fried by radiation exposure, or the like. Relatively inconsequential stuff, like being able to talk your way through or just shoot your way through.

    Unfortunately this shit is too slow and too huge to embed a local copy of into a game, since you’d need broad hardware compatibility. And running it in the cloud would cost too much.




  • When confronted with a problem like “your search engine imagined a case and cited it”, the next step is to wonder what else it might be making up, not to just quickly slap a bit of tape over the obvious immediate problem and declare everything to be great.

    Exactly. Even if you ensure the cited cases or articles are real, it will still misrepresent what said articles say.

    Fundamentally, it is just blah-blah-blah-ing until the point comes when a citation would be likely to appear, then it blah-blah-blahs the citation based on the preceding text that it just made up. It plain should not be producing real citations. That it can produce real citations is deeply at odds with it being able to pretend at reasoning, for example.

    Ensuring the citation is real, RAG-ing the articles in there, having AI rewrite drafts: none of these hacks does anything to address the underlying problems.






  • Yeah, exactly. There’s no trick to it at all, unlike the original puzzle.

    I also tested OpenAI’s offerings a few months back with similarly nonsensical results: https://awful.systems/post/1769506

    The all-vegetables, no-duck variant is solved correctly now, but I doubt that is due to improved reasoning as such; I think they may have augmented the training data with variants of the river crossing. It is one of the best-known puzzles, and various people have been posting hilarious bot failures on variants of it, so river crossing variants showing up in their training data augmentation wouldn’t be unexpected.

    Of course, there are very many ways the puzzle can be modified, and their augmentation would only cover the obvious stuff, like varying which items can be left with which items, or the number of spots on the boat. Even that “obvious” space is big, as the quick count below shows.
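
    A back-of-the-envelope count of just that “obvious” space, with arbitrary numbers, to show it dwarfs what augmentation would plausibly cover:

    ```python
    from math import comb

    # Back-of-the-envelope: with n items, any subset of item pairs could
    # be a "can't be left alone together" constraint, and the boat
    # capacity can vary too. Numbers are arbitrary illustrations.
    n_items = 4
    pairs = comb(n_items, 2)                  # 6 possible item pairs
    constraint_sets = 2 ** pairs              # 64 ways to pick forbidden pairs
    boat_capacities = 3                       # say, capacities 1 through 3
    print(constraint_sets * boat_capacities)  # 192 variants from 4 items alone
    ```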



  • Full-time AI grift jobs would of course be forever closed to any AI whistleblower. There are still plenty of other jobs.

    I have participated in the hiring process; I can tell you that at your typical huge corporation, the recruiter / HR are too inept to notice that you are a whistleblower, and don’t give a shit anyway. And of the rank and file who will actually google you, plenty of people dislike AI.

    At the rank-and-file level, the only folks who actually give a shit who you are are the people who will have to work with you. Not the background check provider, not the recruiter.





  • Using tools from physics to create something that is popular but unrelated to physics is enough for the Nobel Prize in Physics?

    If only; it’s not even that! Neither Boltzmann machines nor Hopfield networks led to anything used in the modern spam- and deepfake-generating AI, nor in image recognition AI, or the like. This is the kind of stuff that struggles to get above 60% accuracy on MNIST (handwritten digits).
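
    For context, a Hopfield network is an associative memory, not a classifier: you store a handful of patterns and recall them from corrupted inputs. A minimal sketch (sizes, seed, and corruption level are arbitrary toy choices of mine):

    ```python
    import numpy as np

    # Minimal Hopfield network: store a few +/-1 patterns via the Hebbian
    # rule, recall by repeated thresholding. It's an associative memory,
    # not a classifier; sizes and seed here are arbitrary toy choices.
    rng = np.random.default_rng(0)
    N, P = 100, 5                                  # 100 neurons, 5 patterns
    patterns = rng.choice([-1, 1], size=(P, N))

    W = sum(np.outer(p, p) for p in patterns) / N  # Hebbian weights
    np.fill_diagonal(W, 0)                         # no self-connections

    def recall(state, max_steps=20):
        """Synchronous updates until a fixed point (or the step limit)."""
        for _ in range(max_steps):
            new = np.sign(W @ state)
            new[new == 0] = 1
            if np.array_equal(new, state):
                break
            state = new
        return state

    # Corrupt a stored pattern by flipping 15 of its 100 bits, then recall.
    probe = patterns[0].copy()
    probe[rng.choice(N, size=15, replace=False)] *= -1
    print("overlap after recall:", int(recall(probe) @ patterns[0]), "of", N)
    ```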

    Hinton went on to do some different stuff based on backpropagation and gradient descent, on newer computers than the people who came up with those methods long before him, and he got the Turing Award for that. It’s a wee bit controversial because of the whole “people did it before, but on worse computers, and so they didn’t get any award” thing, but at least it is for work on the path leading to modern AI, and not for work on the vast list of things that just didn’t work, where it’s extremely hard to explain why you would even think they would work in the first place.