Overview

This document provides installation and configuration notes for deploying the Matrix42 Large Language Model (LLM) on-premise.

Prerequisites

To configure the Matrix42 LLM, you must meet the following requirements:

  • Have Docker installed on your system.
  • For GPU acceleration, have an NVIDIA GPU with a compatible driver and the NVIDIA Container Toolkit installed (required for docker run --gpus all).

Installation

The container image is distributed as a split tar archive. To install, do the following:

```bash
# 1. Combine and extract the split archive
cat localgenai_1.0.2.000.tar localgenai_1.0.2.001.tar \
    localgenai_1.0.2.002.tar localgenai_1.0.2.003.tar > localgenai_1.0.2.tar
# 2. Load the image into Docker
docker load -i localgenai_1.0.2.tar
```
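The cat step simply concatenates the parts in order, so the result must be byte-identical to the original archive. A minimal sketch of how this reassembly works, using a small synthetic file instead of the real archive:

```shell
# Create a synthetic "archive", split it into numbered parts, and reassemble it
printf 'demo-archive-contents' > demo.tar
split -b 8 -d demo.tar demo.part.        # produces demo.part.00, demo.part.01, ...
cat demo.part.* > demo_reassembled.tar   # the shell glob sorts parts lexically, i.e. in order
cmp demo.tar demo_reassembled.tar && echo "parts reassembled correctly"
```

The same property lets you verify the real reassembled archive against a published checksum before running docker load.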

Run the Server

GPU Version

```bash
# Start with GPU acceleration
docker run --gpus all -p 8010:8010 -p 8011:8011 -e API_KEY=your-secret-key localgenai:1.0.2
```

The server will be available at:

  • http://localhost:8010 - Legacy API (AiCore 1.0.x compatibility)
  • http://localhost:8011 - New API (AiCore 1.1.x)

API Endpoints

All API requests require authentication via the Authorization: Bearer <api-key> header, except the health and status endpoints listed below.
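Every authenticated request sends the key in the same header. A quick way to sanity-check the header string before wiring it into curl:

```shell
API_KEY="your-secret-key"                       # replace with your real key
AUTH_HEADER="Authorization: Bearer ${API_KEY}"  # exact header format the server expects
echo "$AUTH_HEADER"
```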

Port 8011 - New API (AiCore 1.1.x)

Endpoints are organized by service prefix: /llm, /embeddings, /reranker.

| Endpoint                  | Description                          | Auth. Required |
|---------------------------|--------------------------------------|----------------|
| /llm/v1/chat/completions  | Chat completions (OpenAI-compatible) | Yes            |
| /llm/v1/models            | List LLM models                      | Yes            |
| /llm/tokenize             | Tokenize text                        | Yes            |
| /llm/detokenize           | Detokenize tokens                    | Yes            |
| /llm/health               | LLM service health                   | No             |
| /embeddings/v1/embeddings | Generate text embeddings             | Yes            |
| /embeddings/v1/models     | List embedding models                | Yes            |
| /embeddings/tokenize      | Tokenize text                        | Yes            |
| /embeddings/detokenize    | Detokenize tokens                    | Yes            |
| /embeddings/health        | Embeddings service health            | No             |
| /reranker/v1/reranking    | Rerank documents by relevance        | Yes            |
| /reranker/v1/models       | List reranker models                 | Yes            |
| /reranker/tokenize        | Tokenize text                        | Yes            |
| /reranker/detokenize      | Detokenize tokens                    | Yes            |
| /reranker/health          | Reranker service health              | No             |
| /status                   | Global service health status         | No             |
| /health                   | Simple health check                  | No             |

Port 8010 - Legacy API (AiCore 1.0.x)

For backward compatibility with AiCore 1.0.x clients.

| Endpoint              | Description                          | Auth. Required |
|-----------------------|--------------------------------------|----------------|
| /v1/chat/completions  | Chat completions (OpenAI-compatible) | Yes            |
| /v1/models            | List available models                | Yes            |
| /v1/embeddings        | Generate text embeddings             | Yes            |
| /v1/reranking         | Rerank documents by relevance        | Yes            |
| /tokenize             | Tokenize text (completions model)    | Yes            |
| /tokenize-embeddings  | Tokenize text (embeddings model)     | Yes            |
| /detokenize-embeddings| Detokenize text (embeddings model)   | Yes            |
| /status               | Service health status                | No             |
| /health               | Simple health check                  | No             |

Usage Examples (New API - Port 8011)

Chat Completions

```bash
curl http://localhost:8011/llm/v1/chat/completions \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer your-secret-key" \
 -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```
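Embedding user text directly into a single-quoted -d string breaks as soon as the text itself contains quotes. One way to build the body safely, assuming jq is installed:

```shell
# Build the request body with jq so quotes in the message cannot break the JSON,
# then pass it to curl with:  -d "$body"
body=$(jq -nc --arg msg "Hello!" '{messages: [{role: "user", content: $msg}]}')
echo "$body"
```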

List LLM Models

```bash
curl http://localhost:8011/llm/v1/models \
 -H "Authorization: Bearer your-secret-key"
```

Embeddings

```bash
curl http://localhost:8011/embeddings/v1/embeddings \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer your-secret-key" \
 -d '{"input": "Your text here"}'
```
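Assuming the response follows the OpenAI embeddings shape, with the vector nested under data[].embedding (an assumption here, not confirmed by this document), the vector can be extracted with jq:

```shell
# Hypothetical response body, assuming the OpenAI-style embeddings shape
response='{"data":[{"embedding":[0.1,0.2,0.3],"index":0}]}'
echo "$response" | jq -c '.data[0].embedding'
```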

List Embedding Models

```bash
curl http://localhost:8011/embeddings/v1/models \
 -H "Authorization: Bearer your-secret-key"
```

Reranking

```bash
curl http://localhost:8011/reranker/v1/reranking \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer your-secret-key" \
 -d '{"query": "search query", "documents": ["doc1", "doc2"]}'
```
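A reranker response typically pairs each input document index with a relevance score; the exact field names below are assumptions, not taken from this API's specification. Sorting documents by score with jq:

```shell
# Hypothetical response; field names are assumed, check the actual API output
response='{"results":[{"index":0,"relevance_score":0.12},{"index":1,"relevance_score":0.87}]}'
echo "$response" | jq -c '[.results | sort_by(-.relevance_score) | .[].index]'
```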

List Reranker Models

```bash
curl http://localhost:8011/reranker/v1/models \
 -H "Authorization: Bearer your-secret-key"
```

Tokenize (LLM)

```bash
curl http://localhost:8011/llm/tokenize \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-key" \
  -d '{"content": "Text to tokenize"}'
```

Detokenize (LLM)

```bash
curl http://localhost:8011/llm/detokenize \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-key" \
  -d '{"tokens": [1874,311,1464,4476,553]}'
```

Tokenize (Embeddings)

```bash
curl http://localhost:8011/embeddings/tokenize \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-key" \
  -d '{"content": "Text to tokenize"}'
```

Detokenize (Embeddings)

```bash
curl http://localhost:8011/embeddings/detokenize \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-key" \
  -d '{"tokens": [24129,47,47,1098,20650]}'
```

Tokenize (Reranker)

```bash
curl http://localhost:8011/reranker/tokenize \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-key" \
  -d '{"content": "Text to tokenize"}'
```

Detokenize (Reranker)

```bash
curl http://localhost:8011/reranker/detokenize \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-key" \
  -d '{"tokens": [24129,47,47,1098,20650]}'
```

Usage Examples (Legacy API - Port 8010)

Chat Completions

```bash
curl http://localhost:8010/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-key" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```

List Models

```bash
curl http://localhost:8010/v1/models \
  -H "Authorization: Bearer your-secret-key"
```

Embeddings

```bash
curl http://localhost:8010/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-key" \
  -d '{"input": "Your text here"}'
```

Reranking

```bash
curl http://localhost:8010/v1/reranking \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-key" \
  -d '{"query": "search query", "documents": ["doc1", "doc2"]}'
```

Tokenize (Completions Model)

```bash
curl http://localhost:8010/tokenize \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-key" \
  -d '{"content": "Text to tokenize"}'
```

Tokenize (Embeddings Model)

```bash
curl http://localhost:8010/tokenize-embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-key" \
  -d '{"content": "Text to tokenize"}'
```

Detokenize (Embeddings Model)

```bash
curl http://localhost:8010/detokenize-embeddings \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer your-secret-key" \
 -d '{"tokens": [24129,47,47,1098,20650]}'
```

Checking the Server Status

Monitor the health of all services (no authentication required):

```bash
# New API (port 8011)
curl http://localhost:8011/status
# Legacy API (port 8010)
curl http://localhost:8010/status
```

Returns 200 OK when all services are healthy, or 503 Service Unavailable if any service is down.
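Capturing just the HTTP code with curl -s -o /dev/null -w '%{http_code}' makes the status endpoint easy to script. A sketch of the interpretation logic, shown with hard-coded codes so it runs without a live server:

```shell
# Map the status endpoint's HTTP code to a human-readable result
check() {
  case "$1" in
    200) echo "all services healthy" ;;
    503) echo "one or more services down" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}
check 200
check 503
```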

Simple health check:

```bash
curl http://localhost:8011/health
```

Security Notes

  • Always set a custom API key using -e API_KEY=your-secret-key.
  • The default API key (m42llm) is not secure for production use.
  • Generate a secure key: openssl rand -hex 32
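The openssl command above produces 32 random bytes encoded as 64 hex characters; pass the result to the container via -e API_KEY=...:

```shell
# Generate a 64-character hex key and verify its length
API_KEY=$(openssl rand -hex 32)
echo "${#API_KEY}"   # prints 64
```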

Common Issues

Container Does Not Start

Check logs: docker logs <container-id>

Services Not Responding

Check status: docker exec <container-id> supervisorctl status
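When the container starts but a service misbehaves, grepping the captured logs for error markers is usually the fastest triage step. Illustrated here with synthetic log text, since the real messages depend on the image:

```shell
# Synthetic log text standing in for `docker logs <container-id>` output
logs='llm service: model loaded
embeddings service: ERROR: failed to bind port 8011'
echo "$logs" | grep -i 'error'
```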