• 0 Posts
  • 27 Comments
Joined 1 month ago
Cake day: August 10th, 2025

  • Yeah, I get it. I don’t think it is necessarily bad research or anything. I just feel like maybe it would have been good to go into it as two papers:

    1. Look at the funny LLM and how far off the rails it goes if you don’t keep it stable, let it kind of “build on itself” iteratively over time, and don’t put the right boundaries on it
    2. How should we actually wrap an LLM up in a sensible framework so that it can pursue an “agent” type of task; what leads it off the rails and what doesn’t; what are some ideas for keeping it grounded, and which ones work and which don’t

    And yeah, obviously they can get confused or output counterfactuals or nonsense as a failure mode; what I meant to say was just that they don’t really do that in response to an overload / “DDoS” situation specifically. They might do it as a result of too much context or a badly set-up framework around them, sure.


  • Yes, I get all that. What I’m saying is that you’re making it pretty clear that they’re getting a corrupt bargain but you’re still going to make them go through all the bullshit.

    Like if you show up in Dubai and give $5 million to the right person, I’m pretty sure they don’t make you stand in line and pay your visa processing fee, and then say you can only stay 270 days and have to scram or else they’ll be in trouble. You just get to come hang out.

    Just either be aboveboard, and make them go through all the hassle, or have them give you 5 million dollars and then say “Hey broski, we’ll take care of it.” Just talk to your friends that you nominated to run the IRS and say “Hey this guy’s offshore income is going to be $17 and some change, we’re fine with that, right? He’s a friend of mine.” That kind of thing. Give the IRS a little list of the few hundred $5 million platinum card holders and make sure they have some vague understanding of what to do with the information. I don’t get this weird middle-ground bullshit where they’re paying the bribe, but they’re still getting treated like a pleb and made to go through the bureaucracy.

    I mean honestly the way it reads to me is that they kind of want to keep an eye on you, they want to put themselves in the position of deciding whether or not you’re allowed back into the country every 270+90 days. I feel like most people who are in that “not American while having to be at the mercy of US immigration” category, intersected with the “capable enough to have $5 million to throw around” category, are probably going to be able to see through that stuff. That’s what I was saying, more to the point.

    It’s just more of Trump’s MO. He’s transactional, pathologically so, to the point where it’s all he really understands, but the other person never actually gets their end of the transaction. He gets his, and then the other person gets fucked. That really comes through to me reading this, because of how convoluted it sounds even when they’re trying to make it sound like this wonderful thing.


  • “For a processing fee and, after DHS vetting, a $5 million contribution, you will have the ability to spend up to 270 days in the United States without being subject to U.S. taxes on non-U.S. income.”

    Fuckin’… what?

    Why do I have to pay a processing fee before giving you 5 million dollars?

    Why is it “up to 270 days”? Who is going to be swayed by this perk into opting for the platinum card instead of the gold? What the fuck is all this? I understand selling residency; a lot of shithole countries do that, and it’s usually successful at what they’re trying to achieve with it. Why are you immediately walking it back with all these nonsensical asterisks, though? Did someone put this page together subversively, because they really want people to look at how un-benevolent the whole package is and start to think twice about who it is exactly that they’re making this Faustian bargain with?


  • Initial thought: Well… but this is a transparently absurd way to set up an ML system to manage a vending machine. I mean, it is a useful data point I guess, but to me it leads to the conclusion “Even though LLMs sound to humans like they know what they’re doing, they do not; don’t just stick the whole situation into the LLM input and expect good decisions and strategies to come out of the output. You have to embed it into a more capable and structured system for any good to come of it.”

    Updated thought, after reading a little bit of the paper: Holy Christ on a pancake. Is this architecture what people have meant by “AI agents” this whole time I’ve been hearing about them? Yeah, this isn’t going to work. What the fuck, of course it goes insane over time. I stand corrected, I guess; this is valid research pointing out the stupidity of putting the LLM in the driver’s seat of something even more complicated than the stuff it’s already been shown to fuck up, and hoping that goes okay.

    Edit: Final thought, after reading more of the paper: Okay, now I’m back closer to the original reaction. I’ve done stuff like this before, and this is not how you do it. Have it output JSON, build some tolerance and retries into the framework code for parsing that JSON, be more careful with the prompts so it’s set up for success, and definitely don’t stuff the entire history into the context, all the way up to the wildly-inflated context window, where it’s guaranteed to go off the rails. Basically, be a lot more careful with how you set it up than this, and put a lot more limits on how much you’re asking of the LLM so that it can actually succeed within the little box you’ve put it in. I am not at all surprised that this setup went off the rails in hilarious fashion (and it really is hilarious, you should read it). That’s just what LLMs do.

    I don’t know if this is because the researchers didn’t know any better, or because they were deliberately setting up the framework around the LLM to produce bad results, or because this stupid approach really is the state of the art right now, but this is not how you do it. I’m actually a little skeptical about whether you even could set up a framework for a current-generation LLM that would enable it to succeed at an objective, pretty frickin’ complicated task like the one they set up here, but regardless, this wasn’t a fair test. If it was meant as a test of “are LLMs capable of AGI all on their own, regardless of the setup, the way humans generally are,” then congratulations, you learned the answer is no. But you could have framed it a little more directly around that being the question, instead of building a poorly-designed agent framework into the middle of it.
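    To be concrete, here’s roughly the kind of minimal framework loop I mean. This is just a sketch of the general idea, not what the paper actually did; call_llm, the action names, and the limits are placeholders for whatever API and task you actually have:

    ```python
    import json

    MAX_HISTORY = 10  # keep only the last few turns instead of the whole inflated history
    MAX_RETRIES = 3   # tolerate a couple of malformed responses before giving up

    SYSTEM_PROMPT = (
        "You manage a vending machine. Respond ONLY with JSON of the form "
        '{"action": "restock" | "set_price" | "wait", "item": "<name>", "value": <number>}.'
    )

    def call_llm(messages):
        """Placeholder for whatever chat-completion API you're actually using."""
        raise NotImplementedError

    def next_action(history, observation):
        # Trim the context instead of shoveling the entire run into the prompt.
        recent = history[-MAX_HISTORY:]
        messages = (
            [{"role": "system", "content": SYSTEM_PROMPT}]
            + recent
            + [{"role": "user", "content": json.dumps(observation)}]
        )
        for _ in range(MAX_RETRIES):
            raw = call_llm(messages)
            try:
                action = json.loads(raw)
            except json.JSONDecodeError:
                continue  # garbage out: retry instead of letting it snowball
            if isinstance(action, dict) and action.get("action") in {"restock", "set_price", "wait"}:
                return action  # only hand back moves the framework knows how to execute
        return {"action": "wait"}  # safe default when the model can't produce valid JSON
    ```

    The point being that the framework, not the LLM, owns the history, the retries, and the set of legal moves; the model only ever sees a small, well-formed slice of the world and can only answer in a shape the code can validate.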



  • I think the crisis of Trump is likely to be worse than any crisis in the Western world in the last 50 years. I think the closest analogue is probably the collapse of the USSR. So yes, some of the rich people upped their wealth by orders of magnitude, and honestly you might be right that Zuck might manage to be in that category, but some of them also lost everything, got thrown out of windows, or had to survive in reduced capacity inside their new walled fortresses in the horrifying new meta. I feel like it’s more likely that the MAGA world will remember Facebook censoring their posts about ivermectin and not feel like Zuck needs a seat at the table, no matter how many ass-kissing sessions he shows up at the White House to do.

    For example I feel like breaking up Meta and mandating Truth Social and TikTok as the only new sanctioned social media going forward might be one possible outcome. It’s kind of hard to say and I won’t swear that you’re definitely wrong that he might come out way ahead in the end. I’m just saying that this type of crisis is a very different type of crisis.