I’m going to try this out. Curious how effective it’s been for your own use, OP? Anything particularly interesting or where it still falls over and requires tuning that you’ve observed? I find these types of projects interesting in sort of an anthropological sort of way. So far, I’ve yet to try anything that hasn’t still required a great deal of intervention to keep agents using guardrails without scolding them to do so. I guess this might force them more reliably, but I’ll need to try it and find out.
The hybrid search scoring is really well thought out. One question: are the 0.4/0.4/0.2 weights fixed or configurable? To me it feels like different use cases would want different balances (e.g. a coding assistant would want higher recency weights than a research tool would).
I think that'd be cool, maybe just a simple config option in pyproject.toml or a .rekal/config.yml would cover most use cases: they could default to 0.4/0.4/0.2 but could be overriden.
https://github.com/janbjorge/rekal/pull/14/changes