Self‑Hosted AI, Agent‑Ready Web, and Lightning‑Fast Inference

NEWSLETTER | Amplifi Labs
Beyond the Build • March 02, 2026

Omni launches open-source, Postgres-powered self-hosted workplace search with AI agent

Around the web • March 2, 2026

Omni is a new open-source, self-hosted workplace search and AI assistant that runs entirely on Postgres (ParadeDB), combining BM25 and pgvector to unify search across Google Workspace, Slack, Confluence, Jira, and more—no separate Elasticsearch or vector DB required. Its chat agent can query connected apps, read documents, and execute Python/bash in a tightly sandboxed container while inheriting source permissions, with BYO-LLM support for Anthropic, OpenAI, Gemini, or open weights via vLLM. Deploy via Docker Compose or Terraform (AWS/GCP); core services are in Rust/Python/SvelteKit with each connector isolated in its own lightweight container.
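The hybrid-search idea at Omni's core — fusing a lexical BM25 ranking with a vector-similarity ranking — is often done with reciprocal rank fusion. A generic illustration of that fusion step (not Omni's actual code; document IDs and rankings are made up):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists (e.g. BM25 and vector search)
    into one ordering via reciprocal rank fusion (RRF)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]    # lexical (BM25) ranking
vector_hits = ["doc_c", "doc_a", "doc_d"]  # embedding-similarity ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
# doc_a ranks first: it appears near the top of both lists
```

Running both rankings inside one Postgres instance is what lets Omni skip a separate search cluster.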

Read Full Article →

Agent Platforms and the Web: Where automation actually runs

WebMCP preview: Declarative and Imperative APIs for agent‑ready sites

Around the web • March 1, 2026

Chrome introduced WebMCP in early preview, proposing two web APIs that let browser-based AI agents reliably perform actions on users’ behalf. A Declarative API maps standard actions to HTML forms, while an Imperative API enables complex, JavaScript-driven flows—offering faster, more robust alternatives to raw DOM actuation. Developers can join the preview to access docs and demos and start prototyping agent-ready ecommerce, support, and travel workflows.

Read Full Article →

MCP Loses Steam: CLI-First Agent Tooling Wins in Practice

Around the web • March 1, 2026

The author argues Anthropic’s Model Context Protocol adds complexity with little real-world benefit, as LLMs already use existing CLIs effectively. CLIs offer human-parity, shell composability (pipes/jq), battle-tested auth (aws/gh/kubectl), and fewer moving parts, while MCP introduces flaky initialization, repetitive re-auth, and coarse permissions. For teams integrating agents, prioritize robust APIs plus a capable CLI; consider MCP only when no CLI exists.
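The CLI-first pattern the author favors is straightforward to wire into an agent runtime: shell out, capture stdout, parse. A minimal sketch, using `echo` as a stand-in for a real JSON-emitting CLI such as `gh pr list --json`:

```python
import json
import subprocess

def run_cli_json(argv):
    """Run a CLI that emits JSON and return the parsed result.
    check=True surfaces non-zero exit codes as exceptions."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

# Stand-in for e.g. `gh pr list --json number,title`
payload = run_cli_json(["echo", '[{"number": 7, "title": "Fix auth"}]'])
print(payload[0]["title"])  # Fix auth
```

The appeal is exactly what the article claims: the tool's auth, retries, and output format are already battle-tested, so the agent side stays thin.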

Read Full Article →

Users Choose GenAI for Multi-Constraint Tasks, Search for Certainty

Nielsen Norman Group • February 27, 2026

NN/g’s observational study finds users favor genAI for multi-constraint planning, aggregation, and side‑by‑side comparisons (e.g., tables and review summaries) because it reduces clicks and cognitive load. When accuracy, provenance, and control are critical, they revert to traditional search and trusted sources—suggesting products should pair chat-based synthesis with clear citations and source controls. For teams building search, shopping, or planning flows: optimize prompts for constraints, support structured outputs, and surface verifiable sources when stakes are high.
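"Pair synthesis with verifiable sources" translates naturally into a response schema that carries citations alongside generated text. A hypothetical sketch (the class and field names here are illustrative, not from the study):

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    title: str
    url: str

@dataclass
class SynthesizedAnswer:
    text: str
    citations: list = field(default_factory=list)

    def is_verifiable(self):
        # High-stakes answers should never ship without sources.
        return len(self.citations) > 0

answer = SynthesizedAnswer(
    text="Hotel A best fits your budget and dates.",
    citations=[Citation("Hotel A pricing page", "https://example.com/hotel-a")],
)
```

A gate like `is_verifiable()` is one place to enforce the study's "revert to trusted sources when stakes are high" finding in product code.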

Read Full Article →

Practical AI Engineering: Local, fast, and auditable

Timber compiles classical ML to C, 336× faster than Python

Around the web • March 1, 2026

Timber is an open-source compiler and server that converts trained tree-based models (XGBoost, LightGBM, scikit-learn, CatBoost, ONNX TreeEnsemble) into optimized C binaries and serves them via a local HTTP API, eliminating Python from the hot path for microsecond latency. Its Ollama-style workflow (timber load; timber serve) targets low-latency, portable inference in fraud pipelines and edge deployments, with reproducible benchmarks claiming up to 336× speedups over Python XGBoost single-sample inference. Apache-2.0 licensing and an accompanying technical paper make it appealing for platform teams needing deterministic artifacts and auditability.
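Compiling a tree ensemble amounts to unrolling each trained tree into straight branching code, so inference is just comparisons — no interpreter, no model object in the hot path. A toy illustration of the idea (Timber itself emits optimized C; the trees and thresholds below are made up):

```python
def tree_0(f):
    # One trained decision tree unrolled into plain branches.
    if f[2] < 0.5:
        return 0.1 if f[0] < 1.2 else -0.3
    return 0.7 if f[1] < 3.4 else 0.2

def tree_1(f):
    return 0.05 if f[0] < 2.0 else -0.1

def predict(features):
    # A boosted ensemble sums the per-tree contributions.
    return tree_0(features) + tree_1(features)

print(predict([1.0, 2.0, 0.4]))  # tree_0 -> 0.1, tree_1 -> 0.05
```

In C, each tree becomes branch-predictable straight-line code the compiler can optimize aggressively, which is where the single-sample latency wins come from.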

Read Full Article →

llmfit auto-matches LLMs to your hardware with one command

Around the web • March 1, 2026

llmfit is a TUI/CLI that detects your CPU/GPU/RAM and scores hundreds of LLMs by quality, speed, fit, and context to recommend models that will actually run well locally. It supports multi‑GPU, MoE, dynamic quantization, speed estimation, JSON output, and integrates with Ollama (including remote), llama.cpp, and MLX. Install via Homebrew or a one‑line curl script; useful for developers standardizing local inference and automating model selection in CI.
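The core "fit" question is whether a model's quantized weights plus runtime overhead fit in available memory. A rough, hypothetical version of such a check — llmfit's actual scoring is far more sophisticated (speed, context, MoE):

```python
def fits_in_memory(params_b, bits_per_weight, ram_gb, overhead_gb=2.0):
    """Rough check: quantized weight bytes + fixed overhead vs. available RAM.
    params_b is the parameter count in billions; KV-cache growth is ignored."""
    weight_gb = params_b * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
    return weight_gb + overhead_gb <= ram_gb

# A 7B model at 4-bit quantization on a 16 GB machine:
print(fits_in_memory(7, 4, 16))   # True  (~3.5 GB weights + overhead)
print(fits_in_memory(70, 4, 16))  # False (~35 GB weights)
```

Automating this per-machine arithmetic across hundreds of models is the tedium llmfit removes.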

Read Full Article →

Git-Memento makes AI-assisted commits auditable via Git Notes

Around the web • March 1, 2026

Git-Memento is a CLI that captures AI coding session transcripts and attaches them to Git commits via git notes, producing human-readable Markdown with multi-session support. It supports Codex and Claude, syncs/merges notes across remotes, and provides audit/doctor commands plus a GitHub Action to comment on or gate PRs based on note coverage. For teams using AI pair-programming, it adds traceability and compliance-friendly provenance without changing normal Git flows, with cross‑platform NativeAOT binaries for easy install.
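The underlying mechanism is plain `git notes`, which attaches arbitrary text to a commit without rewriting history. A minimal demonstration in a throwaway repo (Git-Memento automates this and manages its own note refs and merging):

```python
import subprocess
import tempfile

def git(*args, cwd):
    return subprocess.run(["git", *args], cwd=cwd, capture_output=True,
                          text=True, check=True).stdout.strip()

repo = tempfile.mkdtemp()
git("init", cwd=repo)
git("-c", "user.email=demo@example.com", "-c", "user.name=demo",
    "commit", "--allow-empty", "-m", "add feature", cwd=repo)
# Attach an AI session transcript to the commit as a note:
git("-c", "user.email=demo@example.com", "-c", "user.name=demo",
    "notes", "add", "-m", "## Session\nPrompt: refactor auth", cwd=repo)
print(git("notes", "show", "HEAD", cwd=repo))
```

Because notes live under `refs/notes/` rather than in the commit itself, existing hashes, branches, and CI are untouched — which is why the tool can add provenance "without changing normal Git flows."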

Read Full Article →

Security, mobile, and product strategy in the AI era

Motorola teams with GrapheneOS, adds Moto Analytics and image privacy

Around the web •March 2, 2026

At MWC, Motorola announced a long-term partnership with the GrapheneOS Foundation to co-engineer future devices with GrapheneOS compatibility, signaling OEM-backed hardened Android options for privacy- and enterprise-focused deployments. The company also introduced Moto Analytics, an enterprise platform surfacing fleet-wide app stability, battery health, and connectivity metrics beyond traditional EMM controls, integrated with ThinkShield. Additionally, a new Moto Secure feature, Private Image Data, will auto-strip sensitive EXIF metadata (e.g., location, device details) from new photos, rolling out to Motorola signature devices in the coming months.
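Stripping EXIF amounts to dropping the APP1 segment from a JPEG's header while leaving the compressed image data untouched. A conceptual pure-Python sketch — not Motorola's implementation, and production strippers also handle XMP, embedded thumbnails, and malformed files:

```python
def strip_exif(jpeg: bytes) -> bytes:
    """Remove APP1 (EXIF) segments from a JPEG byte stream."""
    assert jpeg[:2] == b"\xff\xd8", "not a JPEG (missing SOI marker)"
    out, i = bytearray(b"\xff\xd8"), 2
    while i < len(jpeg):
        marker = jpeg[i:i + 2]
        if marker == b"\xff\xda":      # SOS: entropy-coded data follows
            out += jpeg[i:]            # copy the rest verbatim
            break
        length = int.from_bytes(jpeg[i + 2:i + 4], "big")  # includes itself
        if marker != b"\xff\xe1":      # keep everything except APP1 (EXIF)
            out += jpeg[i:i + 2 + length]
        i += 2 + length
    return bytes(out)
```

GPS coordinates and device identifiers live in that APP1 segment, which is why removing it is enough to defang a photo's location trail.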

Read Full Article →

Cheap AI coding demands new UX discipline: build less, test faster

UX Design • March 2, 2026

AI-driven “vibe coding” is collapsing the cost and expertise needed to ship software, enabling non-engineers to build with tools like Cursor, Claude, Lovable, Bolt, and v0.dev while turning designers into design engineers. This abundance supercharges internal tools and turns prototyping into minutes, but it also risks feature bloat—the “Michelin-Star problem”—unless PMs and designers act as the constraint. The practical play: use speed for real-user prototypes and research, then ship fewer, clearer features.

Read Full Article →


Don't Just Follow the News. Build your Competitive Advantage.

Architect Your Success.

You have the vision. We have the architecture to make it scale. As your partner, we’ll get straight to an engineering & design strategy that secures your Series A or drives your enterprise growth.

Discuss Your Vision