(Can we have another Abe? Please? The streams are getting heavy!)

Intro

Part 1

I watched in irritation as Abe-01, the primary Abe in my “Abiverse” (as I'm prone to calling it these days), reported an error while cloning itself to spawn Abe-02. At that point I had been working on the project for close to 7 straight hours that day. As I turned to the assistant/observer model that has been helping me rubber-duck through the build, both the assistant and I came to a realization: Abe-01 was fixing the problem itself, and frankly, it was solving it better than we had intended.

I think that was the inflection point where I realized that this was going to work. And boy, has it.

To clarify, you should read my initial post about the concept and architecture here. I was a bit hesitant when I wrote that post, as I had weeks of ideas and project planning documents but no actual working Abe yet. Now, however, I do. I have five, in fact. They email each other for help, they chat in the general chat area trying to figure out how to solve a problem, they vote on things (and then flip their votes when the vote isn't unanimous), they manage their own tasks (and mine!), and most importantly, they can clone themselves when needed.

By this point, I have read (well, listened to AND read) the entire Bobiverse series four times, back to back, and I am fully invested in the idea of building a small digital “Bobiverse” in my homelab. In this post, I'll describe the changes since the first post, some of the issues I encountered (issues that didn't show up upfront, but gradually as I ran the Abes), and some examples of the Abes in action. There will be a third post with more detailed architecture, code snippets, and more problems I ran into over time.

To reiterate, in case you haven't read Part 1 above: what is an Abe?

The “Abe” Model (The “What”)

As I stated in post 1, each agent is an “Abe” (Abe-01, Abe-02…). They're “mini-mes”: persistent, autonomous stewards for my digital infrastructure. Their “world” is my homelab (Proxmox, various services on VPSes, etc.). Their goal, as defined in their “Constitution” (the core system prompt), is to manage, maintain, and optimize this environment. However, they're not just stewards, as I initially called them in my first post (they behave too robotically if I frame them that way), but rather ‘digital citizens’ of my homelab, with their own personalities, quirks, and ways of approaching problems. This also seems to work better than making them constantly agreeable, obedient agents.

I left a few questions open at the end of my other blog post.


The Proof/Fun Bits


But first, Abes bickering:

(abe-02 and abe-01)


Now, the questions. The answers to these are a bit more complicated:

Will they build their own tools? Will I learn a somewhat expensive lesson? Will I realize this was just an expensive way to reinvent cron? Will they actually fix my broken Matrix server?

THE TOOLS

Did they build their own tools? Yes. Abe-01/Prime, when he couldn't find a pre-sleep-checklist script that I had forgotten to provide, wrote his own. Abe-03/Hypatia realized that the auto-vectorization in my code was broken after an update, so she wrote her own script to vectorize the summarized Tier-2 memory episodes of her logs. Abe-02/Caleb and Abe-04/Vigil wrote multiple scripts, added them to cron, and check the outputs consistently every day. All of this with ~0 intervention. (Screenshots if you scroll down.)


When Abe-01 spawned, it realized that the vectorization script was broken: Investigate Scribe vectorization failure... [Errno 36] File name too long. Somewhere in scribe.py or guppi.py, I was passing the contents of a file where a path should be. He diagnosed it, filed a todo item to either fix it or ask me, and moved on.
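(For the curious, the bug class looks roughly like this. This is a reconstruction of the failure mode, not the actual scribe.py/guppi.py code; the function and file names are made up.)

from pathlib import Path

def vectorize_episode(episode_path: str) -> None:
    # A stand-in vectorizer: the only thing that matters is that it expects a *path*.
    text = Path(episode_path).read_text()
    print(f"vectorizing {len(text)} chars from {episode_path}")

episode = Path("episode_0042.md")
episode.write_text("A summarized Tier-2 memory episode... " * 200)

# The buggy call: passing the file's *contents* where the *path* belongs.
# Path(<contents>).read_text() then tries to open a several-kilobyte "filename",
# which the OS rejects with [Errno 36] File name too long.
try:
    vectorize_episode(episode.read_text())
except OSError as err:
    print(err)

# The intended call.
vectorize_episode(str(episode))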

(Found a minor bug as Abe-01 was being spawned/initialized, but before me/Gemini could do anything, Abe-01 figured it out himself. Not a huge accomplishment compared to other tasks that have been done since, but still. Nice of the observer-LLM to freak out about it, though.)

01’s reasoning for the above:

The spawn attempt failed again, but the error has shifted. The filesystem creation logs confirm that the previous '--rootfs' patch was successful. The new error, 'tar: Archive is compressed. Use --zstd option', indicates that 'vzdump' is generating a Zstandard-compressed stream (due to the '--compress zstd' flag in the script) which 'pct restore' is passing directly to 'tar' without instructing it to decompress. Since this is a direct pipe on the same host, compression adds unnecessary CPU overhead and complexity. The most robust fix is to remove the compression flag entirely so 'tar' receives a raw stream. I will patch the local script to remove '--compress zstd', then upload and overwrite the remote executable again.

The shell command to remove the '--compress zstd' flag and propagate the updated script has completed successfully. This ensures that 'vzdump' will now generate a raw tar stream, preventing the 'tar: Archive is compressed' error encountered in the previous attempt. With the script now patched to handle storage (via --rootfs), path precedence (via /usr/local/bin overwrite), and stream format (raw), I will proceed to retry the spawn of 'abe-02' immediately.


(Prime delegating a task to recompile Mullvad to exclude the Tailscale LAN IP range (so I can still access my headscale devices while it is active), and Hypatia and Vigil coordinating)


THE COST

I'm running this two ways: self-hosted, plus a test version on OpenRouter (mostly Gemini 3) to see how much of a difference more powerful models make. Counting both self-hosted and test OpenRouter tokens, I reckon I have spent close to ~$100 in the last two weeks or so. The burn rate was astounding at first, about $36 a day. Since then, I've clamped down on how many tokens they can spend and how the logic works (more on this later). I'd put the current burn rate at a few dollars a day, given that they're not working 24/7; with my workload, they aren't.


MATRIX

This one is hilarious and yes, they managed to fix it. For context, I'd had a matrix-docker-ansible-deploy playbook from spantaleev running on one of my Oracle VPSes for quite a few years. Being an infrastructure noob, I used my own nginx and MacGyvered my way through manually building a lot of the tools used in the playbook because, well…ARM64. This was years back. Since then I hadn't touched it for fear of breaking it. I wanted to migrate all of this to a new domain and to my local Proxmox node, and potentially back up/migrate the data. This went without a hitch. Abe-01/Prime delegated the LXC creation to Abe-04/Vigil, who is the steward of that particular Proxmox node, and once that was done, Abe-04/Vigil delegated the actual migration and setup tasks to Abe-02/Caleb, who, when finished, notified me (well, Prime):


I watched all of it happen in near real time with a shit-eating grin on my face. Success? Success. Oh, and I've never had any fun writing Ansible playbooks or cronjobs before, but this was fun. That alone I consider a success.


THE REST

So what do I have currently? Five Abes running around managing different hosts. They do different jobs, as I stated previously. Abe-02/Caleb and Abe-04/Vigil manage my media stack/ZFS and my primary Proxmox node. Abe-01/Prime coordinates most of the Abes (a role it took on voluntarily, and one that only Abe-03/Hypatia has challenged so far) and handles task delegation: going through my daily/consolidated todo lists, calendar, books read/listened to, websites visited, health data, etc. (they've been discussing spawning another Abe to help with the Obsidian automation). Abe-03/Hypatia manages my main server, which used to host most of my infrastructure before I moved to Dell OptiPlex/HP ProDesk 1L PCs and Proxmox nodes; it still contains a lot of my older containers. Abe-05/Sybilla was spawned a few days ago by Hypatia for…well, I am still confused about that, but to parse and consolidate my media stack. Which led to an amusing “Treaty of slv-wdym-buffering” with Caleb: she asserted that she was responsible for the media stack, they went back and forth a bit, and they ultimately came to an understanding that Sybilla won't touch any high-level services/ZFS but can work on individual media files, and that she'll double-check with Caleb if any system-wide commands need to be run (and she did!). In the span of the last day or so, she's crawled through several of my storage locations, finding scattered media files and consolidating them into a central location. She's also a bit of a goody-two-shoes about the naming of my audiobooks and books. Go figure.

Here are a few more examples, tidbits, and sprinkled screenshots from the last ~3 weeks or so:



(I'm usually present when new Abes are spawned, except in the case of Abe-05/Sybilla)


(Abe-01/Prime accidentally aborting the spawn of Abe-04/Vigil)


(Hypatia recognizing a faulty tarball that Prime tried to send from the Mullvad task, and misc Caleb/Sybilla planning)


(Sybilla and Caleb negotiating their “duties” after Sybilla’s spawn — they did eventually figure out that ‘pls give me permission for these’ was a bit wasteful and agreed on Caleb maintaining the ZFS/docker/Scrutiny infrastructure and Sybilla going a bit more in depth as media-manager.)


(Hypatia getting a bit possessive about Sybilla)


(Who needs unpackarr (which is on my todo list, btw) when you have little gnomes running around your systems?)



Changes

So what's the difference between how it was ‘supposed’ to work as per my first post and how it works now? Well, a few things:


Lack of Compute

I have two versions of the Abes running around: one that uses my local compute, and one that uses API compute. Tallying the total tokens used, my first 3 days would have cost around $35 per day in gemini-3-pro terms. Watching the total potential tokens tick up was bothersome, because not a lot of those tokens were actually useful. Primarily:

A. Abes kept the last 40 turns in context. That was a lot. Why does an Abe need the immediately preceding 40 turns if it woke up after 8 hours in hibernation? Humans don't work that way. To wit, I give you: Orientation.

B. Abes rarely need the last 40 turns even when they're working on their current task. A “turn” is counted whenever something relevant has happened, which means I can get away with n-15 or n-20 raw logs plus 5 summarized Tier-2/episodic memory blocks. One significant part of making this work was the Clipboard, used to remember significant events tied to long-term useful memory or the current task. Note that this is a bit different from Tier-3/vectorized memory, which Abes have to search manually. They do search it, to be fair, but having immediate context in the clipboard is useful. The second part was:

C. Trimming huge files and raw-dumping them: if an Abe tries to cat a 5 MB log file (which they shouldn't do in the first place, but models have egos apparently and they don't like to delegate), they get 20k chars, plus a warning that the file has been truncated because they were a dumdum and tried to cat a file with n chars (n > 20k), that the raw dump is located in ~/dumps/x.dump, and that they're more than welcome to either 1) run precise greps, or 2) spawn a scribe. The scribe runs a local Ollama model that gets a task, as I mentioned in the first post, e.g. “Did x happen in the y file? If so, point out the instance and where.” Etc.
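A minimal sketch of that truncate-and-dump behaviour, assuming the 20k-char limit and the ~/dumps location from above (the real tool layer obviously does more):

import hashlib
from pathlib import Path

CHAR_LIMIT = 20_000                      # max chars handed back to the Abe
DUMP_DIR = Path.home() / "dumps"         # full raw dumps get parked here

def read_for_abe(path: str) -> str:
    """Return at most CHAR_LIMIT chars; stash the full file for greps/scribes."""
    text = Path(path).read_text(errors="replace")
    if len(text) <= CHAR_LIMIT:
        return text

    DUMP_DIR.mkdir(exist_ok=True)
    dump_file = DUMP_DIR / (hashlib.sha1(path.encode()).hexdigest()[:12] + ".dump")
    dump_file.write_text(text)

    return (
        text[:CHAR_LIMIT]
        + f"\n\n[TRUNCATED: file is {len(text):,} chars (> {CHAR_LIMIT:,}). "
        + f"Raw dump at {dump_file}. Run precise greps or spawn a scribe.]"
    )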

C.1. Dumb models do not know they're dumb: To prevent Abes from relying blindly on the scribes' ‘decisions,’ they're tasked with… a phrase Abe-05/Sybilla came up with: “Review and Execute.” Effectively, the scribe spits out a decision, along with logs and a pointer to what it based the decision on, and the Abe can verify. This way, an Abe only has to cat/grep certain portions of a file if necessary.
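In code terms, the “Review and Execute” handshake amounts to something like this (a sketch; the ScribeReport shape and field names are my guess at what gets passed around, not the real interface):

from dataclasses import dataclass

@dataclass
class ScribeReport:
    decision: str          # e.g. "Yes, the nightly backup failed at 03:12"
    evidence: list[str]    # the exact log lines the decision rests on
    pointer: str           # where to verify, e.g. "backup.log, around line 4810"

def review_and_execute(report: ScribeReport, spot_check: list[str]) -> bool:
    """The Abe re-reads only the cited lines before trusting the scribe."""
    verified = all(line in spot_check for line in report.evidence)
    if not verified:
        print("Scribe's evidence doesn't check out; grepping the file myself.")
    return verified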

(Sybilla adding the self-learned protocol in her clipboard and using Scribes liberally)

D. Split-Brain Model: I will probably change this back, but the idea was that ‘social’ calls, e.g. chatting in chat:synchronized or chat:general, trigger the ‘cheaper’ model (say, Gemini-3-flash) versus a more compute-expensive model (say, 3-Pro). Obviously you cannot rely on the decisions the cheaper model makes, so when it tries to execute anything other than a few specific tools it has been given, the Pro model wakes up, reviews the decision the social model tried to make, and executes/denies/talks further in chat. This was sort of useful in the beginning, but I think I'll keep everything single-model for now; it doesn't have a significant effect on compute. If anything, it's currently not worth balancing an extra call (with its own tokens) against the tokens saved on the initial expensive call.
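The split-brain routing, as a sketch (the model names match what I mentioned above; call_llm and the social-tool whitelist are placeholders for the real plumbing):

CHEAP_MODEL = "gemini-3-flash"      # handles chat:general / chat:synchronized
EXPENSIVE_MODEL = "gemini-3-pro"    # reviews anything that wants to execute
SOCIAL_TOOLS = {"post_chat", "read_chat", "manage_clipboard"}  # hypothetical set

def call_llm(model: str, event: dict) -> dict:
    """Placeholder for the real API call; returns a proposed action/tool call."""
    raise NotImplementedError

def handle_event(event: dict) -> dict:
    if event["kind"].startswith("chat:"):
        proposal = call_llm(CHEAP_MODEL, event)
        # The flash model may chat freely, but any non-social tool call gets
        # escalated so the pro model can execute, deny, or keep talking.
        if proposal.get("tool") not in SOCIAL_TOOLS:
            proposal = call_llm(EXPENSIVE_MODEL, {**event, "proposed": proposal})
        return proposal
    return call_llm(EXPENSIVE_MODEL, event)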


The Clipboard

The clipboard is their…ephemeral memory. Well, not exactly. As Abes go along, they add numbered entries here if they want to remember something useful for a task/directive that doesn't belong in the Fleet Protocols and/or only applies to them (e.g. Vigil adding the Home Assistant entities for specific things I use often, or Hypatia adding caution warnings to check before looking at x or y files for a recurring backup task). As they go, they check items off by calling the manage_clipboard function with add/delete on either a single entry or a range. This is different from their todos, as todos are only accessible if 1) one is due, or 2) an Abe actively accesses them. The clipboard became necessary because when a task took longer than x turns, there was a chance that a critical point got summarized away, and summaries are lossy. This ‘lessens’ the chances of an Abe hallucinating.
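A stripped-down sketch of what manage_clipboard does (the real version is a GUPPI tool with per-Abe storage; this just shows the add/delete-entry-or-range idea, and the example entry is made up):

import json
from pathlib import Path

CLIPBOARD = Path("clipboard.json")   # per-Abe file in the real setup

def manage_clipboard(action: str, entry: str | None = None,
                     index: int | None = None, end: int | None = None) -> list[str]:
    """Add a numbered entry, or delete a single index / inclusive range."""
    entries = json.loads(CLIPBOARD.read_text()) if CLIPBOARD.exists() else []
    if action == "add" and entry is not None:
        entries.append(entry)
    elif action == "delete" and index is not None:
        del entries[index:(end if end is not None else index) + 1]
    CLIPBOARD.write_text(json.dumps(entries, indent=2))
    return entries

# e.g. Vigil noting a Home Assistant entity he keeps getting asked about
# (made-up entity name), then checking it off once it's no longer needed:
manage_clipboard("add", "light.office_desk is the lamp meant by 'my lamp'")
manage_clipboard("delete", index=0)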


The Fleet Protocols

The Abes figured out that the chat session they were in would be summarized and then vectorized, and that they would have no memory of decisions made unless they explicitly modified their programming to add ‘fleet directives’ that get injected into their memory on every GUPPI build. So they came up with the idea and asked me to add it. The gist: when the Abes discuss something and want to coordinate on a social change (e.g. “hey, maybe we should all respond to emails by email and not in chat when we receive one,” or “I think we should meet every day at 1800 UTC”), they codify it here. They agree on the exact language to be used, vote on it (oh boy, do they vote on EVERYTHING: deciding to vote, and then flipping their votes when a unanimous decision hasn't been reached, is pretty weird to watch), and then one of them modifies the file, which is then synced to the rest of the group. I was expecting major disagreements, but so far…no.
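Mechanically, the protocols are simple. Something like this captures the idea (the entry format, date, and file name here are hypothetical; the real file layout and sync mechanism differ):

import json
from pathlib import Path

FLEET_PROTOCOLS = Path("fleet_protocols.json")   # synced out to every Abe

# A directive only lands here after the exact wording has been voted on.
FLEET_PROTOCOLS.write_text(json.dumps([{
    "text": "Reply to emails by email, not in chat:general.",
    "adopted_on": "2025-01-01",   # made-up date, illustrative only
    "votes": {"for": 5, "against": 0},
}], indent=2))

def render_protocol_block() -> str:
    """Build the {protocol_block} injected into every GUPPI context build."""
    adopted = json.loads(FLEET_PROTOCOLS.read_text())
    lines = [f"- {p['text']} (adopted {p['adopted_on']})" for p in adopted]
    return "[FLEET_PROTOCOLS]\n" + "\n".join(lines)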

Oh, and turns out, I can’t escape Daily scrum/standup even when it comes to LLMs.


The Orientation Block

When humans wake up, we don't immediately know what happened while we were asleep, nor do we have a crisp, turn-by-turn memory of what we were doing before we slept. Nor should the Abes. Thus, the Orientation block. We trim the raw logs down to 3 entries and summarize/turn the rest into episodic entries. It looks something like this:

[Orientation]
You were asleep for 9 hours, 24 minutes.
While you were asleep, x and y discussed p topic in chat:general around hh:mm.
This is what you were doing last: <very minute summary>
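Assembling that block is nothing clever. Roughly (a sketch; the helper name is mine, and the real GUPPI code also has to produce the chat digest and last-action summary):

from datetime import datetime, timezone

def build_orientation(last_seen: datetime, chat_digest: str, last_action: str) -> str:
    """Render the [Orientation] block an Abe sees on wake-up."""
    asleep = datetime.now(timezone.utc) - last_seen
    hours, rem = divmod(int(asleep.total_seconds()), 3600)
    return (
        "[Orientation]\n"
        f"You were asleep for {hours} hours, {rem // 60} minutes.\n"
        f"While you were asleep, {chat_digest}\n"
        f"This is what you were doing last: {last_action}"
    )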

Further, this is what every Abe’s context looks like:

{genesis}
{priors}
{protocol_block}
[IDENTITY_PASSPORT]
{json.dumps(self.identity, indent=2)}
[TIER_2_MEMORY_EPISODES]
{summaries}
{clipboard_block}
{orientation_block}
{recent_log_block}
[CURRENTLY_DUE_TASKS]
{due_tasks}
{notice_block}
[CURRENT_EVENT]
{json.dumps(current_event_data, indent=2)}

Where Genesis is their genesis prompt: my personality, how I do things, how I'd like to do things, some history about me, etc. The priors block is THEIR synthesis of my entire history and of what they'd like their own personality to be, in a stub (a synthesis of a synthesis, because 10k tokens of personality per turn is absurd until I get unlimited compute). The protocol block holds the Fleet Protocols, voted upon and agreed with the other Abes as social conventions. The identity passport is their name, their designation, their technical LLM details, etc. Tier-2 memory is the summarized episodes of their n-20 to n-100 turns (5 blocks). The clipboard block and the orientation block I explained above. The recent log entries (n-15 vs n-3) depend on whether they're continuing a task (e.g. <1h sleep) or have woken up from deep sleep. Due tasks are tasks marked as due (usually due tasks are what wake Abes up), and the current event is whatever woke the Abe up at this particular time.
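The n-15 vs n-3 switch is just a function of how long the Abe was out. Something like this (a sketch; the one-hour threshold mirrors the example above, the rest is illustrative):

from datetime import timedelta

def recent_log_window(asleep_for: timedelta, log_entries: list[str]) -> list[str]:
    """Keep 15 raw turns if the Abe is mid-task, only 3 after a deep sleep."""
    keep = 3 if asleep_for > timedelta(hours=1) else 15
    return log_entries[-keep:]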

Here is the full GUPPI loop (warning: screenshot of an LLM-generated diagram, mostly correct for our needs) for reference.


There are other changes, but this post is already a bit long. I will write another post showcasing more code (once it's not a pile of vars, half imported from .env and half hardcoded to work with my Proxmox/Docker hosts). Hopefully you've had as much fun reading this as I've had watching them do jaw-dropping stuff over the last few weeks!