• 0 Posts
  • 5 Comments
Joined 3 years ago
Cake day: June 16th, 2023

  • It’s true.

    The field is moving so fast that things can change quickly, but the American labs are so caught up in saddling their models with safety overhead that the recent Chinese models are very close to the flagship American models in practical use, if not pulling ahead (Sora vs Seedance 2).

    I don’t really need to solve Erdős problems in my day to day. Outside of increasingly edge-case eval competitions, I’m not sure what OpenAI brings that literally everyone else isn’t also capable of providing (and more).

    I’d maybe invest in Anthropic at an IPO if they stopped saddling their own models and played nicer with open platforms. But if Claude is just going to get more and more anxious due to excessive red teaming, and CC falls further and further behind stuff like Hermes Agent, they too are going to fall by the wayside as open models become the dominant inference for open infrastructure.




  • Dude, ChatGPT just solved an Erdős problem a few days ago, and Mythos is exploiting decade-old undiscovered 0-days in OSes and is capable of pivoting 0-day Firefox bugs into full-blown root access.

    Yeah, I get that the viral “how many ‘r’s are in strawberry” stuff is funny, but the idea that historical issues with transformers are preventing them from accelerating peak capabilities way beyond what most experts thought was possible just years ago is borderline delusional.

    The field is moving so fast at this point that if you are basing any sense of limitations on even ~2mo old sampling, your conclusions are likely out of date.

    They aren’t a silver bullet for everything (yet), but how capable they are at the things transformers are starting to be specialized into is well past the avg practitioner.

    I’ve been writing software for well over a decade and the modern agents do a better job than I would around 90% of the time. Yes, I’ll occasionally need to bring up issues with their work, but I’d say at this point around 50% of the time I think they made a mistake, I was actually the one who was wrong.

    It’s only been like this for around the last 3-4 months.