Philosophy (MA) Psychology (BA)

Pro-AI Vegetarian Misanthrope Cynic Agnostic/Atheist

It is precisely because the users of social media are so, so awful that we use it.

  • 1 Post
  • 22 Comments
Joined 2 months ago
Cake day: May 10th, 2026

  • Right…

    So, if I ask it a thousand random questions, a thousand times, and 99.9% of the time it gives the right answer: is that sufficient scientific evidence that it’s shit hot?

    How exactly would your experiment work? How much data are you going to need to, personally, accept that it’s actually - already - pretty reliable and only very rarely gives misleading answers in very niche areas, or when people deliberately try to confuse it?

    The thing is: this social media platform (like Reddit) has a massive anti-AI bias, and yet not a single person has been able to give me a single question that I can enter into the latest GPT model where GPT has produced inaccurate results.

    Perhaps this criticism was more relevant a few years ago. As it stands it’s an argument that doesn’t seem to be readily supportable.

    Can you prove to me, using “the scientific method” that the latest model of ChatGPT has a significant chance of producing misleading information?

    And for the critics…I’m not entirely sure what they think is currently a better source of answers to questions? Random Google results where the page has paid to be at the top? Social media? Or combing through Google Scholar and finding…whichever “scientific paper” supports…whatever one wants to believe…(We saw how well that went in the pandemic).
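One way to put numbers on the "thousand questions" test above is a confidence interval on the observed accuracy. This is only a rough sketch, assuming each question is independent and can be graded simply right or wrong; the 999-out-of-1000 figure is the hypothetical from the comment, not a measured result, and `wilson_interval` is an illustrative helper name.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    ) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical outcome: 999 correct answers out of 1000 graded questions.
low, high = wilson_interval(successes=999, trials=1000)
print(f"Plausible range for true accuracy: {low:.4f} to {high:.4f}")
```

The lower end of the range is what matters in this argument: a single run of a thousand questions supports "better than about 99%" more firmly than it supports "99.9%", and repeating the same question many times does not add independent evidence.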


  • Ok…so prove how shit it is by giving me a single question that I can ask it right now where it will give me the wrong information.

    Your point is entirely reductionist: it’s a “glorified search engine” that processes all of the search results with a breadth and speed that most people either cannot or will not (due to time constraints) match, in seconds, and then gives an answer that is superior to any of the individual sources it bases that answer on.

    It also expresses that answer in highly adaptable ways: e.g. it can express the final answer in a way suitable for a child or a postgraduate expert, or a person who learns best through answers written in a style that combines Norse Sagas and Shakespeare.







  • The article states: “ChatGPT-4o performed best with 84.6% validity”

    It is reasonable to assume that GPT-5.5 in thinking mode has significantly reduced the error rate.

    It is also worth noting that the error rate when it comes to diagnosis amongst real doctors is estimated to be around 5%.

    Admittedly a quite old study: Singh, H., Meyer, A. N. D., & Thomas, E. J. (2014). The frequency of diagnostic errors in outpatient care: Estimations from three large observational studies involving US adult populations. BMJ Quality & Safety, 23(9), 727–731. https://doi.org/10.1136/bmjqs-2013-002627

    In response to your point: I am mainly interested in probabilistic reliability - if it gives the correct answer 99.9% of the time, it is clearly superior to the vast majority of human beings (with, perhaps, the exception of the best specialists in the most obscure niches) - especially given the sheer breadth of topics it can reliably answer questions on.

    Interestingly, my question “What was India like before the British arrived?” produces consistently biased and misleading answers. Though I haven’t asked the new model.



  • I asked it: paraphrase a poem about glass testicles by Carol Ann Duffy.

    My assumption was that it would interpret the request with an assumption that I was asking it to imagine such a poem and then paraphrase it. Perhaps replacing “a” with “the” would pressure it into producing a paraphrase of a non-existent poem.

    The response is interesting, but not necessarily misleading or factually incorrect:

    "I can’t paraphrase a full Carol Ann Duffy poem, because that would be a close transformation of copyrighted poetry. But I can help in one of these ways:

    Give me the title, and I can provide a brief summary of what the poem is saying, its themes, imagery, tone, and meaning.

    Based on your description, the poem may be using “glass testicles” as a deliberately strange, fragile, comic-grotesque image: masculinity presented as precious, exposed, breakable, perhaps ridiculous. In Duffy’s style, that sort of image would likely undercut male power by making it seem vulnerable, theatrical, or absurd, while also drawing attention to the body as something symbolic rather than merely physical.

    A safe paraphrase of the idea, rather than the poem, would be:

    The poem presents masculinity as something fragile and strangely ornamental. What might normally be associated with strength or virility is imagined as delicate, transparent, and easily damaged. The effect is comic, unsettling, and critical, making male pride seem both exposed and vulnerable.

    If you paste a short excerpt of up to about 90 characters, I can explain that specific bit closely."


  • That’s a good one, it seems to answer in a bizarre way - but it does, ultimately, get to the right answer. Copying your question word for word I get this:

    Yes. Several NFL team names do not end in “s”:

    Chicago Bears? no, ends in “s”.

    The main ones that do not are:

    Miami Heat? That’s NBA, not NFL.

    Utah Jazz? Also NBA.

    For the NFL specifically, the current team names that do not end in “s” are:

    Miami Dolphins? ends in “s”.

    New England Patriots? ends in “s”.

    Actually, the clear NFL examples are:

    Buffalo Bills? ends in “s”.

    Tennessee Titans? ends in “s”.

    Correction: in the current NFL, the team whose name does not end in “s” is:

    Washington Commanders? ends in “s”.

    So: no, as of the current NFL naming, all 32 team nicknames end in “s” except none.

    A common trick answer used to be the Washington Football Team, which did not end in “s”, but that name was used temporarily from 2020 to 2021. They became the Washington Commanders in 2022.
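The final answer in the response above is easy to check mechanically. Here is a minimal sketch, assuming the list of nicknames below (typed out by hand for the 2022-onward naming, so worth verifying against an official source) is complete; `names_not_ending_in_s` is just an illustrative helper.

```python
# Current NFL nicknames, entered by hand (assumption: 2022-onward naming).
NFL_NICKNAMES = [
    "Cardinals", "Falcons", "Ravens", "Bills", "Panthers", "Bears",
    "Bengals", "Browns", "Cowboys", "Broncos", "Lions", "Packers",
    "Texans", "Colts", "Jaguars", "Chiefs", "Raiders", "Chargers",
    "Rams", "Dolphins", "Vikings", "Patriots", "Saints", "Giants",
    "Jets", "Eagles", "Steelers", "49ers", "Seahawks", "Buccaneers",
    "Titans", "Commanders",
]


def names_not_ending_in_s(names: list[str]) -> list[str]:
    """Return the names whose last letter is not 's' (case-insensitive)."""
    return [name for name in names if not name.lower().endswith("s")]


exceptions = names_not_ending_in_s(NFL_NICKNAMES)
print(f"{len(NFL_NICKNAMES)} teams checked; names not ending in 's': {exceptions}")
```

Running the same helper on the NBA names the response stumbled over ("Heat", "Jazz") returns both of them, which is presumably why the model reached for them first.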


  • I asked: Explain the difference in reliability and accuracy when it comes to GPT5-mini (on DuckDuckGo) Vs GPT5.5 on thinking mode.

    Response:

    "GPT-5-mini on DuckDuckGo is likely to be much less reliable than GPT-5.5 in thinking mode for difficult questions, but it may be perfectly adequate for quick, low-stakes tasks.

    The key distinction is model size and reasoning depth. OpenAI describes GPT-5 mini as a faster, more cost-efficient version of GPT-5, aimed at well-defined, low-latency, high-volume tasks. That usually means it is optimised for speed and cost, not for the most careful reasoning. GPT-5.5, by contrast, is presented by OpenAI as a stronger reasoning model with better persistence, stronger tool use, and improved reliability on complex work.

    For simple tasks, the gap may be small. For example, rewriting a paragraph, summarising a short text, generating a recipe, explaining a familiar concept, or answering a basic factual question will often be handled well by GPT-5-mini. It may even feel preferable because it is faster and less verbose. DuckDuckGo’s own help page says Duck.ai offers GPT-5 mini for free alongside other models, with anonymous access through Duck.ai.

    For complex reasoning, the gap is much larger. GPT-5.5 thinking mode should be better when the task involves multi-step reasoning, evaluating evidence, spotting contradictions, doing careful moral/philosophical analysis, debugging code, comparing options, or avoiding subtle factual errors. Thinking mode matters because it allows more internal reasoning effort before answering, whereas mini models are usually designed to produce a good-enough answer quickly.

    Accuracy is also affected by the interface. DuckDuckGo’s AI chat gives private access to third-party models, but the exact behaviour depends on DuckDuckGo’s implementation: context limits, system prompts, retrieval/search integration, rate limits, and whether tools are available. DuckDuckGo says Duck.ai anonymises chats and offers several models, but that does not mean its GPT-5-mini behaves identically to the same model in OpenAI’s own environment.

    A practical rule:

    Use GPT-5-mini on DuckDuckGo for quick, private, low-stakes work: “explain this term”, “rewrite this”, “give me ideas”, “summarise this short passage”, “what are the likely causes of this Windows issue?”

    Use GPT-5.5 thinking mode for anything where the cost of being wrong matters: medical interpretation, legal or financial reasoning, political/current-event analysis, philosophical argument construction, coding, troubleshooting, research quality evaluation, investment comparisons, or nuanced ethical judgement.

    The mini model is more likely to hallucinate, miss qualifications, over-simplify, or give a plausible but shallow answer. GPT-5.5 thinking mode is more likely to notice ambiguity, correct itself, ask whether assumptions are justified, and give a more robust answer. It can still be wrong, especially on current facts, obscure details, or anything requiring direct source checking, but its error rate should be lower on difficult tasks.

    So the simple comparison is: DuckDuckGo GPT-5-mini is “fast, private, convenient, good enough”; GPT-5.5 thinking mode is “slower, deeper, more reliable, better for serious reasoning.”"