Ever since I found out about Pi and gave it a shot, I’ve loved it: it “just works” and the UX is very well thought out. Some folks have even been calling it “vim for agentic coding,” and I’m starting to agree.

I think I first came across it before the “OpenClaw boom”, when someone shared this post by its author on some social media feed. Last week I put some effort into making it a first-class citizen in my stack.

If you just want the setup instructions, skip to Setup.

Getting off the “Claude train” for a bit

I’ve been exclusively a Claude Code user for a while now. I still think Opus is state of the art among models for coding agents, though that’s more of a guess based on what others have written about it. For that reason, my interactions with Pi were powered by Opus through Claude’s OAuth creds, which has been a grey area for a while and recently stopped working with Claude subscriptions.

The risk of being blocked by Anthropic made me look around for alternatives last month. Over time I’ve come to think we need to “embrace AI diversity”: experience other models and harnesses to understand how they behave, so we’re better informed when the primary tool goes down or gets too expensive.

Thanks to the “OpenClaw boom” (and a two-hour failed attempt to set it up when it first came out), I found Venice.ai in its docs. Its privacy focus caught my eye and made me put in a $20 credit, which I used to play with Pi a bit and to run some experiments on my very personal data.

Like many other providers, Venice.ai prices by usage, which can get expensive for coding. While looking for alternatives, I found out about Ollama’s basic free tier and gave it a shot with Pi. (If you want to try out Venice.ai regardless, use this link to sign up so we both get a $10 bonus.)

The undocumented approach

The Pi integration docs assume you’re running Ollama locally and point Pi at http://localhost:11434/v1 with key-based auth. My understanding is that the local web server acts as middleware between your machine and the cloud models and provides the authentication mechanism.

That approach is fine on your host machine, especially if you use Ollama for other things, but inside containers it means running a separate Ollama process, which can get hairy given that containers usually have no init system to keep the daemon alive. You could wire it up with docker-compose and run Ollama as a sidecar, but now you’ve got more moving parts just to run a coding agent.
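To make the “more moving parts” point concrete, here’s a rough sketch of what that sidecar wiring could look like. This is purely illustrative: the `my-pi-image` service and the commented baseUrl are assumptions, not anything from the Pi or Ollama docs.

```shell
# Hypothetical docker-compose sketch of the sidecar approach;
# the agent image name below is made up for illustration.
cat > compose.yaml <<'EOF'
services:
  ollama:
    image: ollama/ollama        # local server acting as auth middleware
    ports:
      - "11434:11434"
  agent:
    image: my-pi-image          # hypothetical container with Pi installed
    depends_on:
      - ollama
    # Pi's models.json baseUrl would then point at the sidecar:
    # http://ollama:11434/v1
EOF
```

Two services, a port mapping, and startup ordering, all just so the agent can reach a model that ultimately runs in the cloud anyway.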

Turns out Ollama’s cloud models can also be accessed with an API key over the same OpenAI-compatible interface, no local server needed. The Pi integration docs only cover the localhost + local-server approach, and it took some digging to find that the OpenAI-compatible endpoint also accepts direct API-key auth.

The URL gotcha

The Pi integration docs and Ollama API auth docs only show examples against localhost or https://ollama.com/api. So I tried https://ollama.com/api/v1 as the baseUrl in Pi’s models.json, which seemed logical, but I got 404s and the debugging saga started:

curl https://ollama.com/api/v1/models -H "Authorization: Bearer $OLLAMA_API_KEY"

# {"error":"path \"/api/v1/models\" not found"}

I was debugging this with Claude (the irony 🙈), and at one point it confidently told me the OpenAI-compatible /v1 path “simply doesn’t exist on Ollama cloud” and suggested I either run a local Ollama server as a proxy, like the docs suggest, or switch to OpenRouter entirely. I almost gave up on the whole thing.

Later in the day I resumed the chat and pushed harder; it walked back and said “actually… let’s also try without the /api prefix.” That was it:

curl https://ollama.com/v1/models -H "Authorization: Bearer $OLLAMA_API_KEY"
# works

Turns out I should’ve just RTFM: the OpenAI-compatible layer lives at https://ollama.com/v1, not https://ollama.com/api/v1. The /api prefix is for native Ollama endpoints like ps and others.

Setup

Here’s what you need to run the Pi + Ollama API combo without a local Ollama server. First, (obviously) create an account, then:

  1. Create an API key at ollama.com/settings/keys (free tier works for light usage, see pricing for plan details and limits)
  2. Install Pi: npm install -g @mariozechner/pi-coding-agent
  3. Add this to your ~/.pi/agent/models.json:
{
  // Pi docs use "ollama" with localhost, I chose "ollama-api-key" to make things clear
  "ollama-api-key": {
    "api": "openai-completions",
    // TODO: replace with your key from step 1
    "apiKey": "YOUR_API_KEY_HERE",
    "baseUrl": "https://ollama.com/v1",
    "models": [
      {
        "id": "glm-5:cloud",
        "contextWindow": 202752, // 198K
        "input": ["text"],
        "reasoning": true
      }
    ]
  }
}
  4. Set the default provider/model in ~/.pi/agent/settings.json:
{
  "defaultModel": "glm-5:cloud",
  "defaultProvider": "ollama-api-key"
}

The :cloud suffix tells Ollama to run the model on their infrastructure rather than locally; that’s what makes this whole setup work without a local daemon. Only models on the cloud model list support the suffix, so pick from there and swap glm-5:cloud for whatever you want to try.
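Before launching Pi, it’s worth sanity-checking the key and endpoint with a quick curl. A minimal sketch, assuming OLLAMA_API_KEY is exported in your shell; the `|| echo` fallbacks are just there so a missing key or network hiccup fails loudly:

```shell
# No /api prefix: the OpenAI-compatible layer lives at /v1.
BASE_URL="https://ollama.com/v1"

# List the models your key can reach...
curl -s "$BASE_URL/models" \
  -H "Authorization: Bearer $OLLAMA_API_KEY" || echo "models request failed"

# ...then fire one tiny completion against the model from models.json.
curl -s "$BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "glm-5:cloud", "messages": [{"role": "user", "content": "say hi"}]}' \
  || echo "chat request failed"
```

If both calls return JSON instead of a 404 or auth error, Pi’s config should work as-is.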

A few days in

I started on the free tier and bumped to the $20 Pro plan after a couple of days of seeing the setup work. Ollama uses subscription tiers with usage limits rather than pay-per-token, so no surprise bills, but you can hit your limit and get cut off until the reset window (5h / 7d). Pro gives 50x the free usage and up to 3 concurrent models, which is enough for my Pi workflow.

I’ve been mostly using glm-5:cloud with thinking set to medium or high; it’s been my daily driver when using Pi for OSS work and feels OK so far. I still need to spend some time with other models to compare, but it’s somewhat refreshing to use something other than Claude Code + Opus. A nice side effect: less capable models force me to write clearer prompts and better agent instructions, and that feeds back into any model I use. I hope it improves my setup as a whole; time will tell.

A couple of things took some getting used to coming from Claude Code though.

First, Pi is YOLO by design: no plan mode, no permission prompts, no built-in TODO tracking. It just gives the model four tools (read, write, edit, bash) and gets out of the way. Second, glm-5:cloud doesn’t feel as sharp as Opus when it comes to judgment calls, so when it does go YOLOing it might head in the wrong direction.

As a concrete example, earlier today I used Pi + glm-5 to wrap up a PR I’d been “ping-ponging” on, with Claude Code for implementation and GitHub Copilot for review. I ended up sending 20+ messages in under 20 minutes, and most of them were about keeping the model on track. Per Claude Opus (my data analyst), the highlights include:

  • “show me your plan before implementation” - when model went too far without explaining
  • “can we abstract this logic into one of the existing components?” - when it did not keep code DRY
  • “hang on, what’s the state of things?” - before it asked me if I was ready to push
  • “do not use | tail” - it ran go test ./... | tail -n ... and the tool output was blank without any feedback (side note: Claude does this at times too, dunno why)
  • “why the fuck do your test commands never finish?” - turns out tailing was the culprit
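The tail behavior makes sense in hindsight: tail can’t know which lines are the last N until the stream ends, so it prints nothing while the producer is still running. A tiny stand-in demo (the loop plays the role of go test ./...):

```shell
# tail -n 2 can only decide what the "last 2 lines" are once the
# stream ends, so there is zero feedback until the loop finishes.
(for i in 1 2 3 4 5; do echo "test $i"; sleep 0.1; done) | tail -n 2
# after ~0.5s of silence, prints only:
# test 4
# test 5
```

That silence is exactly what the agent (and I) mistook for a hung test run.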

I also bumped the thinking level from medium to high mid-session and loaded my golang-pro skill to give it more guidance.

It did get the job done in the end, but I’ll probably need to fine-tune my AGENTS.md to push back on the YOLOing, and try other models for comparison. That’s a topic for another post; for now I just want to put this to the test and see how good this setup can be as an alternative to Claude Code.

Speaking of which: while wrapping up this post I noticed glm-5.1:cloud dropped 5 days ago, and it’s supposedly better at sustaining performance “over hundreds of rounds and thousands of tool calls”; they also claim that “the longer it runs, the better the result.” About to try it out throughout the week. 🚀