I tested it out just now. It seems to be much better than GPT-3.5, but just a little worse than GPT-4.
I tested this: “Explain the plot of Cinderella in a sentence where each word has to begin with the next letter in the alphabet from A to Z, without repeating any letters.”
GPT-4 gets it right on the first try every time. It gave me: “A beautiful Cinderella, determined, endures, finds glass heel, invites joy; king’s love magically nurtures opulence, prince quests, restores slipper, transformation unveils, validates wistful xerophyte yearning zealously.”
Claude 2.0 took 4 tries to get it right. These were its attempts, in order:
“A beautiful young girl called Cinderella did evil for her fair godmother’s horrid instructions just kicking laundry maidens’ nice outfits purportedly quelling ridiculous step-sisters’ terrible underhanded villainy with x-rays yielding zilch.”
“Awaited ball causes Downtrodden Ella’s fairy godmother’s help. I’ll just knock loud mean old prune’s quest, righteously sabotaging two ugly vicious women. Xanthippe yells, ‘Zounds!’”
“Abandoned by cruel destiny, Ella finds godmother’s hidden invitation juxtaposing king’s lavish ball. Magic occurs, placing radiant slipper to unveil victorious wife. Xerxes yearns, ‘Zounds, she’s found!’”
And finally: “Awaiting ball, Cinderella dreams every Friday godmother helps. Inside jousting knights leer maidens’ nice outfits. Prince quietly rescues sweetheart, though ugly vengeful wife x-rays yellow zebra.”
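Since the constraint is purely mechanical, attempts like these can be checked with a few lines of code. Here is a minimal Python sketch; the `first_break` helper and its tokenization rule are my own illustrative assumptions, not part of the original test. It treats a word as a run of letters that may contain apostrophes or hyphens, so “king’s” and “x-rays” each count once.

```python
import re
import string
from typing import Optional

# Assumed tokenization (hypothetical, not defined by the prompt): a word starts
# with a letter and may contain straight/curly apostrophes and hyphens.
WORD = re.compile(r"[A-Za-z][A-Za-z'\u2019-]*")


def first_break(text: str) -> Optional[str]:
    """Describe the first place `text` breaks the A-to-Z rule, or return None if it passes."""
    words = WORD.findall(text)
    # Pair each word with the letter it should start with: a, b, c, ...
    for word, letter in zip(words, string.ascii_lowercase):
        if word[0].lower() != letter:
            return f"'{word}' should start with '{letter}'"
    # Every compared word matched, so only the word count can still be wrong.
    if len(words) != 26:
        return f"{len(words)} words, expected 26"
    return None


gpt4 = ("A beautiful Cinderella, determined, endures, finds glass heel, invites joy; "
        "king’s love magically nurtures opulence, prince quests, restores slipper, "
        "transformation unveils, validates wistful xerophyte yearning zealously.")
claude_first = ("A beautiful young girl called Cinderella did evil for her fair "
                "godmother’s horrid instructions")

print(first_break(gpt4))          # → None
print(first_break(claude_first))  # → 'young' should start with 'c'
```

Under that tokenization, Claude’s fourth attempt also passes the letter check, even though it runs across three sentences instead of one.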
After that, I tried this prompt: “Andrew is free from 11 am to 3 pm, Joanne is free from noon to 2 pm and then 3:30 pm to 5 pm. Hannah is available at noon for half an hour, and then 4 pm to 6 pm. What are some options for start times for a 30 minute meeting for Andrew, Hannah, and Joanne?”
GPT-4 answered: “Given the availability of Andrew, Joanne, and Hannah, we need to find overlapping times for a 30-minute meeting.
After cross-referencing their schedules, the potential start times that accommodate all three schedules are:
12:00 pm to 12:30 pm: This slot is within Andrew’s free time (11 am - 3 pm), Joanne’s free time (12 pm - 2 pm), and Hannah’s first free half hour at noon. There is no other slot that fits within all three schedules, due to Joanne’s break from 2 pm to 3:30 pm and Hannah’s availability starting again only at 4 pm, when Andrew is no longer available. Therefore, there is just one option for a 30-minute meeting that works for all three of them.”
Claude 2.0 responded with: “Okay, based on the availability you provided, here are some options for 30 minute meeting start times that work for Andrew, Hannah and Joanne: 11:30 am - This works because Andrew is free starting at 11am, Hannah is available at noon for half an hour, and Joanne is free from noon to 2pm. 3:30 pm - This works because Andrew is free until 3pm, Hannah is free from 4pm to 6pm, and Joanne is free from 3:30pm to 5pm. 4:30 pm - This also works for the same reasons as 3:30pm. Andrew is free until 3pm, Hannah is free from 4pm to 6pm, and Joanne from 3:30pm to 5pm. 5 pm - This could work as well since Andrew is free until 3pm, Hannah until 6pm, and Joanne until 5pm. It’s pushing the end of Joanne’s availability though. So in summary, the best options are 11:30 am, 3:30 pm or 4:30 pm to ensure there is overlap with all three people’s availability. Let me know if you need any clarification or have additional constraints!”
I told Claude it was incorrect, and it then tried to correct itself by telling me there was NO overlapping time at all. After several more attempts, it eventually gave up and asked me for the correct answer.
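For reference, the overlap can be computed mechanically by intersecting everyone’s free intervals. This is a minimal Python sketch, assuming times are stored as minutes after midnight on a 24-hour clock and that each person’s free slots are listed in order without overlaps; the helper names (`hm`, `intersect`, `start_windows`, `fmt`) are hypothetical. It agrees with GPT-4’s answer: the only common window is 12:00 to 12:30, so noon is the only valid start time.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in minutes after midnight


def hm(hour: int, minute: int = 0) -> int:
    """Convert a 24-hour clock time to minutes after midnight."""
    return hour * 60 + minute


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlap two sorted, non-overlapping interval lists with a two-pointer sweep."""
    result: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Drop whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def start_windows(people: List[List[Interval]], duration: int) -> List[Interval]:
    """Return (earliest, latest) start times for a meeting of `duration` minutes."""
    common = people[0]
    for free in people[1:]:
        common = intersect(common, free)
    return [(s, e - duration) for s, e in common if e - s >= duration]


def fmt(minutes: int) -> str:
    """Format minutes after midnight as H:MM on a 24-hour clock."""
    return f"{minutes // 60}:{minutes % 60:02d}"


andrew = [(hm(11), hm(15))]
joanne = [(hm(12), hm(14)), (hm(15, 30), hm(17))]
hannah = [(hm(12), hm(12, 30)), (hm(16), hm(18))]

for earliest, latest in start_windows([andrew, joanne, hannah], 30):
    print(f"start between {fmt(earliest)} and {fmt(latest)}")
# → start between 12:00 and 12:00
```

Because each pointer only moves forward, the sweep is linear in the total number of intervals. Joanne and Hannah do overlap again from 4 pm to 5 pm, but that window never survives because Andrew is only free until 3 pm.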
So although Claude 2.0 is much better than GPT-3.5, it is still worse than GPT-4 for complex reasoning.
EDIT: I just realized that Claude has a 100k-token context window. That is SIGNIFICANTLY more than GPT-4’s standard 8k context and the 32k context available through the API. This immediately makes it much more useful, since it can take in roughly 75,000 words at once. I also haven’t tested Claude’s creative writing, programming, math, or other skills. But however those compare with GPT-4’s, it doesn’t matter, because of the sheer number of tokens Claude can process.
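The ~75,000-word figure follows from the common rule of thumb that one token is roughly three quarters of an English word. The exact ratio depends on the tokenizer and the text, so treat these as rough estimates with context sizes rounded:

```latex
\begin{aligned}
\text{words} &\approx 0.75 \times \text{tokens} \\
100{,}000 \times 0.75 &\approx 75{,}000 \text{ words (Claude 2)} \\
32{,}000 \times 0.75 &\approx 24{,}000 \text{ words (GPT-4, 32k API model)} \\
8{,}000 \times 0.75 &\approx 6{,}000 \text{ words (GPT-4, 8k)}
\end{aligned}
```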
Woah, this is huge. Claude 1 was already more useful and coherent than ChatGPT (3.5, not 4). The big catch was that it wasn’t available to everyone. This could really steal some market share from OpenAI if things go well.
What market, though? These AI chatbots seem like money sinks, kept running on the hope that they’ll develop into something useful in the distant future.
The market of people paying for API access to popular chatbots. Right now OpenAI’s GPT is overwhelmingly the most popular option and pretty expensive. You see “powered by GPT” features on lots of products now, but hopefully Claude can provide some better competition.
Fair, I don’t see any real use for these right now. Chatbots just seem like a gimmick that can help people cheat in school (not that I give a fuck about that). Probably just the online circles we run in, but what sorta things are powered by GPT? Customer support and stuff?
You’ve got AI assistants on Duolingo and Khan Academy powered by GPT-4, plus tools for automatic search engine optimization, automatic code generation, grammar and spell checking, translation, and probably a lot more I’m unaware of.
There’s quite a lot of people depending on GPT right now.
Just tried it out with some questions about ceramic firing in an electric kiln. It seems to have similar accuracy to ChatGPT, maybe closer to GPT-4.
It’s not clear from the interface which version it’s running, so this may have been Claude 1; I’m not sure where to check.
I asked it directly. It didn’t know and stated it has never had version numbers. I pointed out that news articles differentiate 1.0 and 2.0. It agreed but didn’t say what it was. I asked it again directly, it said it was 2.0.
Can you ask perplexity.ai your question about ceramic firing and see what you get? Perplexity offers prompts to move you along towards your answer.
Hard to believe something that feels like it’s lying to you all the time. I asked it about a topic I’m involved in and have a website about, and it told me the website was hypothetical. It got it wrong twice, and even after it agreed it was wrong, it told me the wrong thing again.
Is this what they consider hallucinations?