Arun Pandian M

Android Dev | Full-Stack & AI Learner

Written by: Arun Pandian M
Published on: Jun 5, 2026

Understanding LLMs, Ollama, and Inference

Before building AI applications, we need to understand three fundamental concepts:

LLM
↓
Ollama
↓
Inference

What is an LLM?

LLM stands for Large Language Model.

Examples:

  • Llama
  • Phi
  • Mistral

A language model predicts the next piece of text based on the text that comes before it.

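Example:

Input:

The capital of France is

Prediction:

Paris

Every response from an LLM is generated one token at a time. A token is a short chunk of text, often a whole word or part of one. The model predicts one token, appends it to the text, and repeats until the answer is complete.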
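
The loop below is a toy sketch of that idea. Everything in it is an assumption for illustration: `NEXT_TOKEN_PROBS` is a made-up probability table, the "tokenizer" just splits on spaces, and the model always picks the most likely token (greedy decoding). A real LLM computes these probabilities with a neural network that looks at the entire input, not just the last word.

```python
# Toy sketch of token-by-token (autoregressive) generation.
# NEXT_TOKEN_PROBS is a hypothetical, hand-written table standing in for a real
# model. It looks only at the last token; a real LLM uses the whole context.
NEXT_TOKEN_PROBS: dict[str, dict[str, float]] = {
    "is": {"Paris": 0.92, "Lyon": 0.05, "big": 0.03},
    "Paris": {".": 0.80, ",": 0.20},
    ".": {"<end>": 1.0},
}


def predict_next(tokens: list[str]) -> str:
    """Greedy choice: return the most probable next token."""
    last = tokens[-1] if tokens else ""
    candidates = NEXT_TOKEN_PROBS.get(last, {"<end>": 1.0})
    return max(candidates, key=candidates.get)


def generate(prompt: str, max_new_tokens: int = 10) -> str:
    tokens = prompt.split()  # naive whitespace "tokenizer", for illustration only
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)
        if next_token == "<end>":  # the toy model signals the answer is complete
            break
        tokens.append(next_token)  # feed the prediction back in as input
    return " ".join(tokens)


print(generate("The capital of France is"))  # The capital of France is Paris .
```

Each pass through the loop produces exactly one new token, which is why longer answers take longer to generate.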

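Training vs Inference

Two terms you’ll hear frequently:

Training

The model learns patterns from large amounts of text.

Books
Code
Articles
↓
Training
↓
Model

Inference

The trained model answers questions. Inference does not change the model; it only uses what the model learned during training to generate a response.

Question
↓
Model
↓
Answer

As AI application engineers, we mostly perform inference: we use models that someone else has already trained.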
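
The difference can be sketched with a tiny next-word counter, a deliberately simplified stand-in for a neural network. "Training" counts which word follows which in a small corpus; "inference" only reads those counts to make a prediction. The corpus and the `train`/`infer` helpers are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional


def train(corpus: list[str]) -> dict[str, Counter]:
    """Training: learn a pattern (which word follows which) from example text."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    # Return a plain dict so looking up unknown words never adds new entries.
    return dict(counts)


def infer(model: dict[str, Counter], word: str) -> Optional[str]:
    """Inference: read the learned counts to predict; the model is not changed."""
    followers = model.get(word)
    if not followers:
        return None  # nothing was learned about this word
    return followers.most_common(1)[0][0]


# Hypothetical mini-corpus standing in for "books, code, articles".
corpus = [
    "Kotlin is a language",
    "Kotlin is concise",
    "Python is a language",
]
model = train(corpus)      # training: done once, up front
print(infer(model, "is"))  # inference: done every time we ask -> a
```

Real training is far more expensive than counting words, which is exactly why application engineers usually download a trained model and spend their time on inference.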

What is Ollama?

Think of Ollama as a runtime.

Java
↓
JVM

Python
↓
Interpreter

LLM
↓
Ollama

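Ollama downloads, loads, and runs open models on your own machine. It provides a command-line tool and a local HTTP API that applications can call.

Example:

ollama run phi3:mini

The first run downloads the model; after that, the command opens an interactive chat prompt in the terminal.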
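
Besides the interactive CLI, the Ollama server listens for HTTP requests on the local machine, by default at http://localhost:11434. As a quick sanity check, the sketch below asks that server which models have already been pulled, using its GET /api/tags endpoint and only the Python standard library. The default address and the helper names `parse_model_names` and `list_local_models` are assumptions for this example.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434"


def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Ask the local Ollama server which models are available."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_model_names(json.load(resp))


try:
    print(list_local_models())  # names of models that have already been pulled
except OSError as err:  # URLError is an OSError; raised when the server is down
    print(f"Could not reach Ollama: {err}")
```

If phi3:mini is missing from the list, ollama pull phi3:mini downloads it without opening a chat prompt.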

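Calling a Model

Once Ollama is running and the ollama Python package is installed (pip install ollama):

import ollama

# Send a single user message to the local phi3:mini model.
response = ollama.chat(
    model="phi3:mini",
    messages=[
        {
            "role": "user",               # who is speaking: the user
            "content": "What is Kotlin?"  # the prompt text
        }
    ]
)

# The generated reply is stored under message -> content.
print(response["message"]["content"])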

Flow:

Python
↓
Ollama
↓
Model
↓
Response
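
The Python → Ollama step in this flow is an HTTP request to the local Ollama server. The sketch below sends the same "What is Kotlin?" chat without the ollama package, using the standard library and Ollama's POST /api/chat endpoint. Setting "stream" to false asks for the whole reply as a single JSON object instead of a stream of chunks. It assumes the server is on its default address (localhost:11434) and that phi3:mini has been pulled; `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed default address


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming /api/chat request with a single user message."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of streamed chunks
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=120) as resp:
        reply = json.load(resp)
    return reply["message"]["content"]  # same path the Python example reads


try:
    print(chat("phi3:mini", "What is Kotlin?"))
except OSError as err:  # URLError (server down, model missing) is an OSError
    print(f"Ollama request failed: {err}")
```

Seeing the raw request makes the flow concrete: the Python code only builds a message, Ollama runs the model, and the response comes back as JSON.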

Experiment

Try:

What is Android?

Then:

Explain Android to a beginner.

Notice how the model changes its answer based on the input.
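
Sampling adds some randomness, so the same prompt can produce different answers on different runs. To help make sure the difference in this experiment comes from the prompt itself, the sketch below sends both prompts through Ollama's /api/chat endpoint with a fixed temperature and seed. The option values are illustrative choices, the default address is assumed, and `chat_payload` and `ask` are hypothetical helper names. With the ollama package, ollama.chat() also accepts an options argument.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumed default address
PROMPTS = ["What is Android?", "Explain Android to a beginner."]


def chat_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming chat body with sampling options pinned."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        # Reduce sampling randomness so differences mostly come from the prompt.
        "options": {"temperature": 0, "seed": 42},
    }


def ask(model: str, prompt: str) -> str:
    data = json.dumps(chat_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(  # a Request with data is sent as POST
        OLLAMA_CHAT_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)["message"]["content"]


for prompt in PROMPTS:
    try:
        answer = ask("phi3:mini", prompt)
    except OSError as err:  # server not running or model not pulled
        answer = f"(request failed: {err})"
    print(f"--- {prompt}\n{answer}\n")
```

Printing the two answers one after the other makes it easy to compare their length, vocabulary, and level of detail.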

LAMBDA BRICKS