Get in Touch

Course Outline

AI Sovereignty and Local LLM Deployment

  • Risks associated with cloud LLMs: data retention, potential training on user inputs, and foreign jurisdictional issues.
  • Ollama architecture: model server, registry, and OpenAI-compatible API integration.
  • Comparison with vLLM, llama.cpp, and Text Generation Inference tools.
  • Model licensing specifics for Llama, Mistral, Qwen, and Gemma.

Installation and Hardware Configuration

  • Installing Ollama on Linux with CUDA and ROCm support.
  • CPU-only fallback options and AVX/AVX2 optimization techniques.
  • Docker deployment strategies and persistent volume mapping.
  • Multi-GPU setup procedures and VRAM allocation strategies.

Model Management

  • Retrieving models from the Ollama registry using commands like 'ollama pull llama3'.
  • Importing GGUF models from HuggingFace and TheBloke repositories.
  • Understanding quantization levels: tradeoffs between Q4_K_M, Q5_K_M, and Q8_0.
  • Switching models and managing limits for concurrent model loading.

Custom Modelfiles

  • Writing Modelfile syntax including FROM, PARAMETER, SYSTEM, and TEMPLATE directives.
  • Tuning temperature, top_p, and repeat_penalty parameters.
  • Engineering system prompts for specific role-based behaviors.
  • Creating and publishing custom models to the local registry.

API Integration

  • Utilizing the OpenAI-compatible /v1/chat/completions endpoint.
  • Implementing streaming responses and JSON mode.
  • Integrating with LangChain, LlamaIndex, and custom applications.
  • Managing authentication and rate limiting via reverse proxy configurations.

Performance Optimization

  • Configuring context window size and managing KV cache.
  • Handling batch inference and parallel requests.
  • Allocating CPU threads and ensuring NUMA awareness.
  • Monitoring GPU utilization and memory pressure levels.

Security and Compliance

  • Establishing network isolation for model serving endpoints.
  • Implementing input filtering and output moderation pipelines.
  • Maintaining audit logs of prompts and generated completions.
  • Verifying model provenance and hash integrity.

Requirements

  • Intermediate proficiency in Linux and container administration.
  • High-level understanding of machine learning concepts and transformer models.
  • Familiarity with REST APIs and JSON formats.

Target Audience

  • AI engineers and developers transitioning from cloud LLM APIs to local solutions.
  • Organizations bound by data sensitivity policies that prohibit the use of cloud models.
  • Government and defense teams requiring air-gapped language model deployments.
 14 Hours

Number of participants


Price per participant

Upcoming Courses

Related Categories