Hacker News — vinext + Cloudflare Workers


▲Show HN: Run 500B+ Parameter LLMs Locally on a Mac Mini (github.com)

6 points by fatihturker 13 hours ago | 3 comments

deflator 7 minutes ago [-]

Fascinating. I don't understand the technical terms, but running a big coding agent locally is a dream of mine, so I thank you for your efforts!

ryanholtdev 4 hours ago [-]

Running a Mac Mini M4 as a home server for a bunch of automation scripts right now. The mmap-based layer streaming is the part I'm most curious about -- how does latency look when you're streaming layers from disk mid-inference? I'd expect throughput to degrade sharply once you exceed unified memory, but maybe the Top-K sparsity masks enough of the weight accesses that it's not as bad as sequential streaming would be. What's the actual tokens/sec at 140B scale on the base Mac Mini config?
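For readers unfamiliar with the idea behind the question above, here is a minimal sketch of why Top-K sparsity could reduce disk reads when weights are memory-mapped. It is not Graviton's implementation: the file layout, the shape constants, and the helper names (`write_layer`, `top_k_indices`, `sparse_matvec`, `demo`) are all hypothetical, and it uses only the Python standard library. Only the rows selected by the Top-K mask are read from the mapping, so only the pages backing those rows need to be faulted in.

```python
import mmap
import os
import struct
import tempfile

ROWS, COLS = 64, 32      # hypothetical toy layer shape
ROW_BYTES = COLS * 4     # each row is COLS little-endian float32 values


def write_layer(path):
    """Write a toy weight matrix to disk, with W[r][c] = r + c / 100."""
    with open(path, "wb") as f:
        for r in range(ROWS):
            f.write(struct.pack(f"<{COLS}f", *(r + c / 100 for c in range(COLS))))


def top_k_indices(activations, k):
    """Return the indices of the k largest-magnitude activations."""
    order = sorted(range(len(activations)), key=lambda i: abs(activations[i]), reverse=True)
    return order[:k]


def sparse_matvec(mm, activations, k):
    """Compute y = sum of a[i] * W[i, :] over the top-k i, reading only those rows."""
    out = [0.0] * COLS
    rows = top_k_indices(activations, k)
    for i in rows:
        # unpack_from reads straight out of the mapping; the OS faults in
        # only the pages that back this row.
        row = struct.unpack_from(f"<{COLS}f", mm, i * ROW_BYTES)
        a = activations[i]
        for c in range(COLS):
            out[c] += a * row[c]
    return out, rows


def demo():
    path = os.path.join(tempfile.mkdtemp(), "layer.bin")
    write_layer(path)
    acts = [0.0] * ROWS
    acts[3], acts[10] = 2.0, -1.0   # only two non-zero activations
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        y, rows = sparse_matvec(mm, acts, k=2)
        total_bytes = mm.size()
    return y, rows, len(rows) * ROW_BYTES, total_bytes
```

Calling `demo()` returns the output vector, the rows that were touched, the bytes read for those rows, and the total file size. The OS pages data in whole pages, so the real saving depends on how row size compares with page size and on how scattered the selected rows are; this sketch says nothing about actual tokens/sec.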

anentropic 3 hours ago [-]

Yeah...

https://github.com/opengraviton/graviton?tab=readme-ov-file#...

the benchmarks don't show any results for using these larger-than-memory models, only the size difference

it all smells quite sloppy
