TextEmbed - Embedding Inference Server
TextEmbed is a high-throughput, low-latency REST API designed for serving vector embeddings. It supports a wide range of sentence-transformer models and frameworks, making it suitable for various applications in natural language processing.
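As a rough illustration of how a client might call such a REST API, the sketch below assumes the server accepts OpenAI-style embedding requests: a POST to `/v1/embeddings` with `model` and `input` fields, returning a `data` list of `{index, embedding}` items. The host, port, path, and model name are assumptions for illustration and are not taken from the TextEmbed documentation; adjust them to match your deployment.

```python
import json
import urllib.request

# Assumptions (not confirmed by the TextEmbed docs): host, port, and path.
BASE_URL = "http://localhost:8000"
EMBEDDINGS_PATH = "/v1/embeddings"  # path used by the OpenAI embeddings API


def build_request(texts, model, base_url=BASE_URL):
    """Build a POST request that carries a batch of texts in one call."""
    payload = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        base_url + EMBEDDINGS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(response_body):
    """Return vectors in input order from an OpenAI-style response dict."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, model, base_url=BASE_URL, timeout=30):
    """Send the batch to the server and return one vector per input text."""
    request = build_request(texts, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embeddings(json.load(response))
```

With a server running, a call such as `embed(["first text", "second text"], model="your-model-name")` would return two vectors; the model name here is a placeholder.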
Features
- High Throughput & Low Latency: Designed to handle a large number of requests efficiently.
- Flexible Model Support: Works with various sentence-transformer models.
- Scalable: Easily integrates into larger systems and scales with demand.
- Batch Processing: Processes multiple inputs together to speed up inference.
- OpenAI-Compatible REST API Endpoint: Exposes a REST endpoint that follows the OpenAI API format.
- Single-Line Command Deployment: Deploy multiple models with a single command.
- Support for Embedding Formats: Supports binary, float16, and float32 embedding formats for faster retrieval.
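
To give a sense of why the embedding format matters for retrieval, the following sketch encodes the same vector as float32, float16, and binary and compares the sizes. It is an illustration only, not TextEmbed's implementation: the helper names are hypothetical, and the binary scheme shown (one bit per dimension, set when the value is positive) is a common quantization approach assumed here rather than confirmed by the project.

```python
import struct


def to_float32(vector):
    """Pack each value as a 4-byte little-endian float."""
    return struct.pack(f"<{len(vector)}f", *vector)


def to_float16(vector):
    """Pack each value as a 2-byte little-endian half-precision float."""
    return struct.pack(f"<{len(vector)}e", *vector)


def to_binary(vector):
    """Pack one bit per dimension (1 if the value is positive), MSB first.

    Assumed sign-based quantization; the real scheme may differ.
    """
    packed = bytearray((len(vector) + 7) // 8)
    for i, value in enumerate(vector):
        if value > 0:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


vector = [0.5, -0.25, 1.0, -1.0, 0.0, 2.0, -3.0, 4.0]
sizes = {
    "float32": len(to_float32(vector)),
    "float16": len(to_float16(vector)),
    "binary": len(to_binary(vector)),
}
print(sizes)
```

Smaller encodings reduce storage and memory traffic during similarity search, typically at some cost in precision, which is the usual trade-off when choosing between these formats.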