Did you try `/set parameter num_ctx #`
and `/set parameter num_predict #`?
And are you using a model that actually supports the context length you want?
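For reference, inside an interactive `ollama run` session the commands look like this (the numeric values are just placeholders for illustration; substitute limits your model actually supports):

```
/set parameter num_ctx 8192
/set parameter num_predict 2048
```

`num_ctx` sets the size of the context window, while `num_predict` caps how many tokens the model will generate in a single response.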
A model's layers are computed sequentially (the output of each layer is the input to the next layer), so for a single request, splitting a model across more GPUs does not speed up inference — at best it lets you fit a model that is too large for one GPU's memory.
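A minimal sketch (not Ollama's actual code) of why layer-split inference is inherently serial: each layer's output is the next layer's input, so the layers cannot run at the same time no matter how many devices hold them.

```python
def make_layer(weight):
    # A toy "layer": multiply every element by a weight.
    return lambda xs: [weight * x for x in xs]

# Imagine layers 0-1 live on GPU 0 and layers 2-3 on GPU 1;
# the split only changes where each step runs, not how many
# steps can run at once.
layers = [make_layer(w) for w in (2, 3, 5, 7)]

def forward(xs):
    for layer in layers:   # strictly one layer at a time
        xs = layer(xs)     # this output feeds the next layer
    return xs

print(forward([1, 1]))  # [210, 210], since 2 * 3 * 5 * 7 = 210
```

Each pass through the loop must wait for the previous layer to finish, which is why adding GPUs moves the work around rather than parallelizing it.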