A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
An improved version of Mistral 7B Instruct, with the following changes:
- 32k context window (vs 8k context in v0.1)
- Rope-theta = 1e6
- No Sliding-Window Attention