Thank you for sending your enquiry! One of our team members will contact you shortly.
Thank you for sending your booking! One of our team members will contact you shortly.
Course Outline
AI Sovereignty and Local LLM Deployment
- Risks associated with cloud LLMs: data retention, potential training on user inputs, and foreign jurisdictional issues.
- Ollama architecture: model server, registry, and OpenAI-compatible API integration.
- Comparison with vLLM, llama.cpp, and Text Generation Inference tools.
- Model licensing specifics for Llama, Mistral, Qwen, and Gemma.
Installation and Hardware Configuration
- Installing Ollama on Linux with CUDA and ROCm support.
- CPU-only fallback options and AVX/AVX2 optimization techniques.
- Docker deployment strategies and persistent volume mapping.
- Multi-GPU setup procedures and VRAM allocation strategies.
Model Management
- Retrieving models from the Ollama registry using commands like 'ollama pull llama3'.
- Importing GGUF models from HuggingFace and TheBloke repositories.
- Understanding quantization levels: tradeoffs between Q4_K_M, Q5_K_M, and Q8_0.
- Switching models and managing limits for concurrent model loading.
Custom Modelfiles
- Writing Modelfile syntax including FROM, PARAMETER, SYSTEM, and TEMPLATE directives.
- Tuning temperature, top_p, and repeat_penalty parameters.
- Engineering system prompts for specific role-based behaviors.
- Creating and publishing custom models to the local registry.
API Integration
- Utilizing the OpenAI-compatible /v1/chat/completions endpoint.
- Implementing streaming responses and JSON mode.
- Integrating with LangChain, LlamaIndex, and custom applications.
- Managing authentication and rate limiting via reverse proxy configurations.
Performance Optimization
- Configuring context window size and managing KV cache.
- Handling batch inference and parallel requests.
- Allocating CPU threads and ensuring NUMA awareness.
- Monitoring GPU utilization and memory pressure levels.
Security and Compliance
- Establishing network isolation for model serving endpoints.
- Implementing input filtering and output moderation pipelines.
- Maintaining audit logs of prompts and generated completions.
- Verifying model provenance and hash integrity.
Requirements
- Intermediate proficiency in Linux and container administration.
- High-level understanding of machine learning concepts and transformer models.
- Familiarity with REST APIs and JSON formats.
Target Audience
- AI engineers and developers transitioning from cloud LLM APIs to local solutions.
- Organizations bound by data sensitivity policies that prohibit the use of cloud models.
- Government and defense teams requiring air-gapped language model deployments.
14 Hours