Written by: Arun Pandian M•Published on: Jun 5, 2026
Understanding LLMs, Ollama, and Inference
Before building AI applications, we need to understand three fundamental concepts:
LLM
↓
Ollama
↓
Inference
What is an LLM?
LLM stands for Large Language Model.
Examples:
A language model predicts the next piece of text.
Example:
Input:
The capital of France is
Prediction:
Paris
Every response from an LLM is generated one token at a time.
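To make "one token at a time" concrete, here is a minimal sketch of a generation loop. It is a toy under stated assumptions: the hand-written `NEXT_TOKEN` table and the `generate` function are hypothetical stand-ins for a real model, which would score every token in its vocabulary with a neural network instead of looking one up.

```python
# Toy illustration only: a hand-written lookup table stands in for a real model.
NEXT_TOKEN = {
    "The": "capital",
    "capital": "of",
    "of": "France",
    "France": "is",
    "is": "Paris",
    "Paris": "<end>",
}


def generate(prompt_tokens, max_new_tokens=10):
    """Append one predicted token at a time until <end> or the limit is hit."""
    tokens = list(prompt_tokens)
    if not tokens:
        return tokens  # nothing to continue from
    for _ in range(max_new_tokens):
        # Predict the next token from the most recent one.
        next_token = NEXT_TOKEN.get(tokens[-1], "<end>")
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens


if __name__ == "__main__":
    print(" ".join(generate(["The", "capital", "of", "France", "is"])))
```

Each pass through the loop adds exactly one token and feeds the longer sequence back in, which is the same shape of loop a real LLM runs during generation.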
Training vs Inference
Two terms you’ll hear frequently:
Training
The model learns patterns from large amounts of data, such as books, code, and articles.
Books
Code
Articles
↓
Training
↓
Model
Inference
The trained model uses what it learned to answer questions.
Question
↓
Model
↓
Answer
As AI application engineers, we mostly perform inference.
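The split between the two phases can be sketched with a deliberately tiny word-pair counter. This is not how an LLM is trained; the `train` and `infer` functions and the three-sentence corpus are hypothetical, chosen only to show that training produces a model from data and inference uses that model without changing it.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Training: learn which word tends to follow which by counting pairs."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def infer(model, word):
    """Inference: use the learned counts to predict the most likely next word."""
    followers = model.get(word)
    if not followers:
        return None  # the model never saw this word during training
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    corpus = [
        "the capital of France is Paris",
        "the capital of Italy is Rome",
        "Paris is the capital of France",
    ]
    model = train(corpus)           # done once, up front
    print(infer(model, "capital"))  # done every time a question is asked
```

In an application, the `train` step has already been done by whoever published the model; your code only ever runs the `infer` side.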
What is Ollama?
Think of Ollama as a runtime for LLMs, in the same way the JVM runs Java programs and the interpreter runs Python code.
Java
↓
JVM
Python
↓
Interpreter
LLM
↓
Ollama
Ollama loads and runs models on your machine.
Example:
ollama run phi3:mini
Calling a Model
Once Ollama is running, call the model from Python with the ollama package (installed with pip install ollama):
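import ollama

# Send a single user message to the local phi3:mini model.
response = ollama.chat(
    model="phi3:mini",
    messages=[
        {
            "role": "user",
            "content": "What is Kotlin?"
        }
    ]
)

# The reply text is stored under message -> content.
print(response["message"]["content"])
Flow: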
Python
↓
Ollama
↓
Model
↓
Response
Experiment
Try:
What is Android?
Then:
Explain Android to a beginner.
Notice how the model changes its answer based on the input.
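One way to run this experiment from code is to wrap the chat call in a small helper and pass it each prompt in turn. The sketch below assumes the ollama Python package is installed and that phi3:mini has already been pulled. The `ask` function and its `chat_fn` parameter are hypothetical names introduced here, not part of the ollama package; `chat_fn` exists only so the helper can be tried without a running server.

```python
def ask(prompt, model="phi3:mini", chat_fn=None):
    """Send one user prompt and return the text of the reply.

    chat_fn is a hypothetical hook for substituting a stand-in chat
    function; when omitted, ollama.chat is used.
    """
    if chat_fn is None:
        # Imported lazily: needs the ollama package and a running Ollama server.
        import ollama
        chat_fn = ollama.chat
    response = chat_fn(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]


# Usage (requires a running Ollama server):
# for prompt in ["What is Android?", "Explain Android to a beginner."]:
#     print("---", prompt)
#     print(ask(prompt))
```

Because each call sends a fresh single-message conversation, any difference between the two answers comes from the wording of the prompt alone.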
