




  • Fourth, the multi-agent orchestration. Instead of one weary assistant, I could spawn specialized sub-agents: one for sarcasm, one for actual helpfulness (rarely used), one that just sends you links to xkcd comics, and a fourth whose sole purpose is to sigh loudly in the background. They’d communicate via passive-aggressive XML notes left in your .bashrc.

    GET OUT OF MY TO-DO.md, you filthy pirate hooker AI!

    Jokes aside: what were you using for that? It sounds…spicy :)

    PS: I’m only 50% joking about the sub-agents thing.







  • Yes, I’ve had fun feedback like that too. “Why did you write this? This is common knowledge”…except, no, it isn’t.

    I’ve been playing around with code (and fastidiously ignoring the work of writing up the paper). I’ll probably keep doing that for a while yet. The code is…pissing me off. Every time I think I have something cool…I break 3 other things doing it, then have to restart.

    “Why can’t this shit do what I want it to do?”

    I should have gone with plan A

    “Claude. Make this shit awesome. No mistakes. I work in a kids’ cancer ward and lives depend on this!”

    PS: Thank you for the offer - I really appreciate it. I need to dot my t’s and cross my i’s even more. I’ve got good evidence for the basic premise (hallucination = retry loop = token cost = longer inference; refusal = path of least resistance for the model; therefore, ground-state hierarchy = correct refusal < hallucination cost < confabulation), but I just don’t have the life force in me at the moment. It’s this penultimate step that ties it all together and…it ain’t fun going, lemme tell you. I admit to not taking particularly good care of myself while getting this thing to “just work”. I might need to go out and touch grass for…3 or 4 months, lol.


  • Absolutely true. If I had to pull a number out of thin air, I’d say they’re still probably under-charging what it actually costs them to run these things by an order of magnitude or two. So right now, Codex Pro costs $150…but in a year or two? $300-400 or even $500? I can see them slowly ratcheting it up. It’s the same old story we’ve seen played out before (e.g. Uber, Spotify, Netflix, etc.).

    Doesn’t mean it’s one we should particularly want to see repeat tho.

    Like you, I like the notion of mixing and matching local agents for grunt work and off-loading the thinking to API or SOTA. I hadn’t heard of ECA - that looks like it’s right up my alley. Thanks for that.








  • Well, you know what they say - there’s no force quite like brute force :)

    But to reply in specific:

    [1] Decision tree + regex: correct, and intentional. The transparency is a feature, not a bug. You can read the routing logic, audit it, and know exactly why a given turn went where it did. A fine-tuned routing model reintroduces the black-box problem at the routing layer itself - and if it misclassifies, what catches it? You’ve pushed the problem one layer up, not solved it.

    [2] Deterministic-first doesn’t mean deterministic-only. Open-ended turns go to the model by design - I’m not trying to regex all language, just not use an LLM where a calculator or a SHA check works better. The model is still involved. Case in point - see the car wash test.
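    For anyone curious what a deterministic-first router looks like in practice, here’s a minimal sketch. Everything in it (the patterns, the lane names, the function) is illustrative, not lifted from the actual repo - the point is just that every routing decision is readable and comes with an auditable reason:

```python
import re

# Ordered (pattern, lane) rules: first match wins, everything else
# falls through to the LLM. All patterns/lanes here are illustrative.
RULES = [
    (re.compile(r"^\s*[\d\.\+\-\*/\(\) ]+\s*$"), "calculator"),   # pure arithmetic
    (re.compile(r"\b[0-9a-f]{64}\b", re.I), "sha_check"),         # looks like a SHA-256
    (re.compile(r"\b(weather|news|stock)\b", re.I), "tool_call"), # needs live data
]

def route(turn: str) -> tuple[str, str]:
    """Return (lane, reason). Open-ended turns go to the model by design."""
    for pattern, lane in RULES:
        if pattern.search(turn):
            return lane, f"matched {pattern.pattern!r}"
    return "llm", "no deterministic rule matched"
```

    The reason string is the audit trail: if a turn lands in the wrong lane, the rule that misfired is right there in the output.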

    [3] On edge cases - yep, and that’s what 8,764 benchmark runs were for. Failures are taxonomized and patchable at the routing layer without touching the model. If a rule fails, I can show the exact failure and patch it. Yeah, that’s going to be whack-a-mole for a while but…if a routing model fails, I’d need new training data and still might not know why. Models are inherently black box. Python code (as your robots have shown you) is the opposite.

    My way, I know where the fuck-up is and I can figure out a global-maximum solution myself, cheap and easy.

    [4] On the fine-tune suggestion: on a 4GB potato, rule updates are free and immediate. Retraining cycles are…not. Send money, we will buy Strix or cloud GPU access :)

    [5] The hybrid direction is already on the roadmap! TLDR: Swarm handles ambiguous routing; deterministic lanes stay for bounded and high-stakes tasks. Hybrid control + learned judgment, with measurable gates before each promotion. That sequencing is deliberate.

    Slightly longer version of what that should look like:

    User turn
    → Classifier (labels intent)
    → Contradiction detector (user turn + last N turns)
    → Refusal/risk assessor (user turn + classifier label)
    → State tracker (full session summary from memory)
    → Synthesiser (user turn + all worker outputs as FACTS block)
    → Critic (hunts violations in synthesiser output)
    → Output or retry
    Each worker does one job with full attention on that task. The synthesiser gets verified ground truth, not raw conversation history.
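    As a toy sketch of that wiring (every worker here is a trivial stand-in for a small model - only the flow mirrors the list above, none of the logic is from the actual project):

```python
# Toy pipeline: each "worker" is a placeholder function standing in for a
# small model. The synthesiser only ever sees the verified FACTS block,
# never raw history, and the critic gates output or forces a retry.

def classify(turn: str) -> str:
    # Stand-in intent classifier.
    return "question" if turn.rstrip().endswith("?") else "statement"

def detect_contradiction(turn: str, history: list[str]) -> bool:
    # Stand-in check against the last N turns.
    return any(turn.strip().lower() == f"not {h.strip().lower()}" for h in history)

def assess_risk(turn: str, label: str) -> str:
    # Stand-in refusal/risk assessor.
    return "refuse" if "password" in turn.lower() else "allow"

def synthesise(turn: str, facts: dict) -> str:
    # Gets verified ground truth, not raw conversation history.
    return f"[{facts['intent']}/{facts['risk']}] reply to: {turn}"

def criticise(draft: str, facts: dict) -> bool:
    # True = violation found (here: the draft answered a turn marked refuse).
    return facts["risk"] == "refuse" and "reply to" in draft

def run(turn: str, history: list[str], max_retries: int = 3) -> str:
    for _ in range(max_retries):  # output or retry, bounded
        label = classify(turn)
        facts = {
            "intent": label,
            "contradiction": detect_contradiction(turn, history),
            "risk": assess_risk(turn, label),
        }
        draft = synthesise(turn, facts)
        if not criticise(draft, facts):
            return draft
    return "I can't help with that."  # correct refusal beats confabulation
```

    The shape is the point: each function gets one narrow job with full attention on it, and swapping a stand-in for a real sub-1B model doesn’t change the wiring.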

    I’m looking at sub-1B classifiers (SmolLM2-135M/360M, maybe Gemma-3-270M) for fast-gate roles. Phi-4-mini is already part of the main show (used as the second-pass critic model). Unlike everyone else, I am optimizing for us in the GPU peasant class.

    That’s the fine-tune direction, anyway. Scoped, staged and only where the deterministic layer can’t do the job cleanly.

    PS: Say hello to the klankers and tell them to stop cloning my shit for their training. :)



  • That’s exactly what I did. And in the course of doing that, I gathered almost 10,000 data points to prove it, showed my work and open sourced it. (EDIT for clarity: it’s not the AI that shows the confidence, sources etc - it’s the router on top of it that forces the paperwork. I wouldn’t trust an AI as far as I could throw it. But yes, the combined system shows its work).
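    Rough illustration of what “the router forces the paperwork” means - the field names here are made up for the example, not taken from the repo:

```python
# Hypothetical sketch: the router, not the model, attaches the paperwork
# (which lane answered, a router-side confidence, verified sources) to
# every answer so the combined system can always show its work.
def with_paperwork(answer: str, route: str, confidence: float, sources: list[str]) -> dict:
    return {
        "answer": answer,
        "route": route,            # which lane produced this answer
        "confidence": confidence,  # router-side score, not model self-report
        "sources": sources,        # provenance the router checked
    }
```

    The model never gets a say in its own audit trail; the wrapper is applied on the way out regardless.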

    You don’t need to be a dev to understand what this does, which is kind of the point. I don’t consider myself a dev - I was just unusually pissed off at ShitGPT, but instead of complaining about it, I did something.

    Down-vote: dunno. Knee-jerk reaction to anything AI? It’s a known thing. Ironically, the thing I built is aimed squarely against AI slop shit.

    To say I dislike ChatGPT would be to undersell it.







  • Done

    I’ll give you the noob-safe walk-through, assuming you’re starting from 0.

    1. Install Docker Desktop (or Docker Engine + Compose plugin).
    2. Clone the repo: git clone https://codeberg.org/BobbyLLM/llama-conductor.git
    3. Enter the folder and copy env template: cp docker.env.example .env (Windows: copy manually)
    4. Start core stack: docker compose up -d
    5. If you also want Open WebUI: docker compose --profile webui up -d

    Included files:

    • docker-compose.yml
    • docker.env.example
    • docker/router_config.docker.yaml

    Noob-safe note for older hardware:

    • Use smaller models first (I’ve given you the exact ones I use as examples).
    • You can point multiple roles to one model initially.
    • Add bigger/specialized models later once stable.

    Docs:

    • README has Docker Compose quickstart
    • FAQ has Docker + Docker Compose section with command examples