November 28, 2024, 15:30
```
% ollama create llama-q-and-a
transferring model data 100%
converting model
Error: vocabulary is larger than expected '128258' instead of '128256'
```

I can trick it by editing the downloaded config.json, changing `"vocab_size": 128256` to `"vocab_size": 128258`.
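(A throwaway sketch of that hack; the path is a placeholder for wherever the downloaded model lives:)

```python
import json
from pathlib import Path

config_path = Path("./llama-q-and-a/config.json")  # placeholder path

config = json.loads(config_path.read_text())
config["vocab_size"] = 128258  # was 128256; matches what the converter reports
config_path.write_text(json.dumps(config, indent=2))
```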
With that change, `ollama create` succeeds, but running the model then breaks because the architecture is out by two:

```
% ollama create llama-q-and-a
transferring model data 100%
converting model
creating new layer sha256:27cc8e47a5b0677b27796952267dc8a821d478de44482bee52a2860f01a2d380
creating new layer sha256:e4e2d5fb1c3129b5ccc8fc5c19d1c06f6e8421f28d7dcfc3e80a081e34ecffdf
writing manifest
success
% ollama run llama-q-and-a
Error: llama runner process has terminated: error loading model: check_tensor_dims: tensor 'token_embd.weight' has wrong shape; expected 2048, 128258, got 2048, 128256, 1, 1
```

I've tried various ways of converting the model to GGUF and ONNX, with a spot of Python first, but none have worked so far. Any advice greatly appreciated. Ultimately I want to be able to use Ollama + my model on a Raspberry Pi 5 8GB. Thanks 🙂

PS For reference, when I load and run the model with HF transformers in Python it's fine and I can run inferences fine; it's just that transformers is too meaty for my needs, whereas Ollama is optimised for inference only.
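In case it helps to see what "fine" means here, a minimal sketch of how I load and check it (the model directory path is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./llama-q-and-a"  # placeholder: wherever the fine-tuned model lives

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

# These two disagree by two in my case, which is presumably what the
# GGUF conversion trips over: 128258 tokens vs a 128256-row embedding.
print("tokenizer vocab:", len(tokenizer))
print("embedding rows: ", model.get_input_embeddings().weight.shape[0])

# Inference itself works without complaint:
inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```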