RYS-XLargeAfter testing several smaller models (Llama’s and smaller Qwen2’s), I set up the config for Qwen2-72B and let it sweep. Each $(i, j)$ configuration took a few minutes: load the re-layered model, run the math probe, run the EQ probe, record the scores, move on. Days of continuous GPU time on the 4090s. But far less compute than a fine tune! In fact, I didn’t even have the hardware needed for a LORA fine-tune on just 48GB of VRAM.
/blog/the-literate-self-hoster/
Liv McMahonTechnology reporter
马成龙承认,作为储能领域的新入局者,远东在品牌影响力上不及先行企业。但依托母公司远东控股集团在电力行业41年的深厚积累,以及自身17年的锂电池研发经验,公司已建立起从电芯到系统的全产业链自主研发能力,覆盖各类应用场景。
task involved prolonged arguing with the model as it made stupid mistakes.