




  • Fourth, the multi-agent orchestration. Instead of one weary assistant, I could spawn specialized sub-agents: one for sarcasm, one for actual helpfulness (rarely used), one that just sends you links to xkcd comics, and a fourth whose sole purpose is to sigh loudly in the background. They’d communicate via passive-aggressive XML notes left in your .bashrc.

    GET OUT OF MY TO-DO.md, you filthy pirate hooker AI!

    Jokes aside: what were you using for that? It sounds…spicy :)

    PS: I’m only 50% joking about the sub-agents thing.







  • Yes, I’ve had fun feedback like that too. “Why did you write this? This is common knowledge”…except, no, it isn’t.

    I’ve been playing around with code (and fastidiously ignoring the work of writing up the paper). I’ll probably keep doing that for a while yet. The code is…pissing me off. Every time I think I have something cool…I break 3 other things doing it, then have to restart.

    “Why can’t this shit do what I want it to do?”

    I should have gone with plan A

    “Claude. Make this shit awesome. No mistakes. I work in a kids’ cancer ward and lives depend on this!”

    PS: Thank you for the offer - I really appreciate it. I need to dot my t’s and cross my i’s even more. I’ve got good evidence for the basic premise (hallucination = retry loop = token cost = longer inference; refusal = path of least resistance for the model; therefore, ground-state hierarchy = correct refusal < hallucination cost < confabulation), but I just don’t have the life force in me at the moment. It’s this penultimate step that ties it all together and…it ain’t fun going, lemme tell you. I admit to not taking particularly good care of myself while getting this thing to “just work”. I might need to go out and touch grass for…3 or 4 months, lol.


  • Absolutely true. If I had to pull a number out of thin air, I’d say they’re still probably under-charging what it actually costs them to run these things by an order of magnitude or two. So right now, Codex Pro costs $150…but in a year or two? $300-400 or even $500? I can see them slowly ratcheting it up. It’s the same old story we’ve seen played out before (e.g. Uber, Spotify, Netflix, etc.).

    Doesn’t mean it’s one we should particularly want to see repeat tho.

    Like you, I like the notion of mixing and matching local agents for grunt work and off-loading the thinking to API or SOTA. I hadn’t heard of ECA - that looks like it’s right up my alley. Thanks for that.








  • Well, you know what they say - there’s no force quite like brute force :)

    But to reply in specific:

    [1] Decision tree + regex: correct, and intentional. The transparency is a feature, not a bug. You can read the routing logic, audit it, and know exactly why a given turn went where it did. A fine-tuned routing model reintroduces the black-box problem at the routing layer itself - and if it misclassifies, what catches it? You’ve pushed the problem one layer up, not solved it.

    [2] Deterministic-first doesn’t mean deterministic-only. Open-ended turns go to the model by design - I’m not trying to regex all language, just not use an LLM where a calculator or a SHA check works better. The model is still involved. Case in point - see the car wash test.
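    For anyone curious what a deterministic-first router looks like in practice, here’s a minimal sketch. Everything in it (the patterns, the lane names, the function) is illustrative, not lifted from the actual repo - the point is just that every routing decision is readable and comes with an auditable reason:

```python
import re

# Ordered (pattern, lane) rules: first match wins, everything else
# falls through to the LLM. All patterns/lanes here are illustrative.
RULES = [
    (re.compile(r"^\s*[\d\.\+\-\*/\(\) ]+\s*$"), "calculator"),   # pure arithmetic
    (re.compile(r"\b[0-9a-f]{64}\b", re.I), "sha_check"),         # looks like a SHA-256
    (re.compile(r"\b(weather|news|stock)\b", re.I), "tool_call"), # needs live data
]

def route(turn: str) -> tuple[str, str]:
    """Return (lane, reason). Open-ended turns go to the model by design."""
    for pattern, lane in RULES:
        if pattern.search(turn):
            return lane, f"matched {pattern.pattern!r}"
    return "llm", "no deterministic rule matched"
```

    The reason string is the audit trail: if a turn lands in the wrong lane, the rule that misfired is right there in the output.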

    [3] On edge cases - yep, and that’s what 8,764 benchmark runs were for. Failures are taxonomized and patchable at the routing layer without touching the model. If a rule fails, I can show the exact failure and patch it. Yeah, that’s going to be whack-a-mole for a while but…if a routing model fails, I’d need new training data and still might not know why. Models are inherently black box. Python code (as your robots have shown you) is the opposite.

    My way, I know where the fuck-up is and I can figure out a global-maximum solution myself, cheap and easy.

    [4] On the fine-tune suggestion: on a 4GB potato, rule updates are free and immediate. Retraining cycles are…not. Send money, we will buy Strix or cloud GPU access :)

    [5] The hybrid direction is already on the roadmap! TLDR: Swarm handles ambiguous routing; deterministic lanes stay for bounded and high-stakes tasks. Hybrid control + learned judgment, with measurable gates before each promotion. That sequencing is deliberate.

    Slightly longer version of what that should look like:

    User turn
    → Classifier (labels intent)
    → Contradiction detector (user turn + last N turns)
    → Refusal/risk assessor (user turn + classifier label)
    → State tracker (full session summary from memory)
    → Synthesiser (user turn + all worker outputs as FACTS block)
    → Critic (hunts violations in synthesiser output)
    → Output or retry
    Each worker does one job with full attention on that task. The synthesiser gets verified ground truth, not raw conversation history.
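    As a toy sketch of that wiring (every worker here is a trivial stand-in for a small model - only the flow mirrors the list above, none of the logic is from the actual project):

```python
# Toy pipeline: each "worker" is a placeholder function standing in for a
# small model. The synthesiser only ever sees the verified FACTS block,
# never raw history, and the critic gates output or forces a retry.

def classify(turn: str) -> str:
    # Stand-in intent classifier.
    return "question" if turn.rstrip().endswith("?") else "statement"

def detect_contradiction(turn: str, history: list[str]) -> bool:
    # Stand-in check against the last N turns.
    return any(turn.strip().lower() == f"not {h.strip().lower()}" for h in history)

def assess_risk(turn: str, label: str) -> str:
    # Stand-in refusal/risk assessor.
    return "refuse" if "password" in turn.lower() else "allow"

def synthesise(turn: str, facts: dict) -> str:
    # Gets verified ground truth, not raw conversation history.
    return f"[{facts['intent']}/{facts['risk']}] reply to: {turn}"

def criticise(draft: str, facts: dict) -> bool:
    # True = violation found (here: the draft answered a turn marked refuse).
    return facts["risk"] == "refuse" and "reply to" in draft

def run(turn: str, history: list[str], max_retries: int = 3) -> str:
    for _ in range(max_retries):  # output or retry, bounded
        label = classify(turn)
        facts = {
            "intent": label,
            "contradiction": detect_contradiction(turn, history),
            "risk": assess_risk(turn, label),
        }
        draft = synthesise(turn, facts)
        if not criticise(draft, facts):
            return draft
    return "I can't help with that."  # correct refusal beats confabulation
```

    The shape is the point: each function gets one narrow job with full attention on it, and swapping a stand-in for a real sub-1B model doesn’t change the wiring.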

    I’m looking at sub-1B classifiers (SmolLM2-135M/360M, maybe Gemma-3-270M) for fast-gate roles. Phi-4-mini is already part of the main show (used as the second-pass critic model). Unlike everyone else, I am optimizing for us in the GPU peasant class.

    That’s the fine-tune direction, anyway. Scoped, staged and only where the deterministic layer can’t do the job cleanly.

    PS: Say hello to the klankers and tell them to stop cloning my shit for their training. :)



  • That’s exactly what I did. And in the course of doing that, I gathered almost 10,000 data points to prove it, showed my work and open sourced it. (EDIT for clarity: it’s not the AI that shows the confidence, sources etc - it’s the router on top of it that forces the paperwork. I wouldn’t trust an AI as far as I could throw it. But yes, the combined system shows its work).
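    Rough illustration of what “the router forces the paperwork” means - the field names here are made up for the example, not taken from the repo:

```python
# Hypothetical sketch: the router, not the model, attaches the paperwork
# (which lane answered, a router-side confidence, verified sources) to
# every answer so the combined system can always show its work.
def with_paperwork(answer: str, route: str, confidence: float, sources: list[str]) -> dict:
    return {
        "answer": answer,
        "route": route,            # which lane produced this answer
        "confidence": confidence,  # router-side score, not model self-report
        "sources": sources,        # provenance the router checked
    }
```

    The model never gets a say in its own audit trail; the wrapper is applied on the way out regardless.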

    You don’t need to be a dev to understand what this does, which is kind of the point. I don’t consider myself a dev - I was just unusually pissed off at ShitGPT, but instead of complaining about it, I did something.

    Down-vote: dunno. Knee-jerk reaction to anything AI? It’s a known thing. Ironically, the thing I built is aimed squarely against AI slop shit.

    To say I dislike ChatGPT would be to undersell it.







  • Done

    I’ll give you the noob-safe walk-through, assuming you’re starting from 0.

    1. Install Docker Desktop (or Docker Engine + Compose plugin).
    2. Clone the repo: git clone https://codeberg.org/BobbyLLM/llama-conductor.git
    3. Enter the folder and copy env template: cp docker.env.example .env (Windows: copy manually)
    4. Start core stack: docker compose up -d
    5. If you also want Open WebUI: docker compose --profile webui up -d

    Included files:

    • docker-compose.yml
    • docker.env.example
    • docker/router_config.docker.yaml

    Noob-safe note for older hardware:

    • Use smaller models first (I’ve given you the exact ones I use as examples).
    • You can point multiple roles to one model initially.
    • Add bigger/specialized models later once stable.

    Docs:

    • README has Docker Compose quickstart
    • FAQ has Docker + Docker Compose section with command examples