Young llama, Pyrénées (France), by Luc Viatour, under CC BY-SA 3.0

In this article series, I’ll be guiding you through setting up an in-house AI stack, with a web interface, a tool API endpoint and all the fun stuff. You’ll be able to chat with your AI assistant, let it look at images, generate images, read documents, do web searches and overall get an experience similar to ChatGPT (we can’t rival ChatGPT on a single computer, but we can at least get a decent experience, especially if you have a beefy gaming PC or even a purpose-built device available). We’ll also take a look at DeepSeek’s distilled reasoning model!

In this first part, we’ll take a look at Ollama: what it is, how to set it up, how to run your first few models, how to use them, and finally how to create a secure API endpoint.

Ollama (installation)

The first thing we’ll cover is how to run your own LLM (Large Language Model) and LVLM (Large Vision-Language Model), which will let you chat with an AI assistant and even feed it images so it can describe them, or do things like OCR and translate the contents.

To be able to do this, we’ll be using Ollama, which you’ll have to install from https://ollama.com/download

Follow the instructions for your desired OS. If you’re on Linux (e.g. Ubuntu Server) it’s as easy as running:

curl -fsSL https://ollama.com/install.sh | sh

Note for Arch users: you should instead install the ollama package for CPU-based operation, ollama-cuda if you have an NVIDIA graphics card, or ollama-rocm if you have a recent AMD graphics card, e.g.:

sudo pacman -Syu ollama-rocm
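On Arch the package ships a systemd service that usually isn’t enabled out of the box (the install script used on other distros takes care of this for you), so you may need to enable and start it yourself:

sudo systemctl enable --now ollama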

Ollama (first run)

Great! Now that you have Ollama installed, we can open a terminal and install our first model:

ollama pull llama3.2:latest

This will install the latest version of llama3.2, which is a small (roughly 2 GB) LLM optimized to run on limited hardware. If you have no supported GPU but a decent CPU, it should still run reasonably well.

After the pull is complete, you can run it with:

ollama run llama3.2

Wait for it to load, you should see a spinny thing, and once it’s loaded you’ll be dropped into an interactive chat prompt.

Now you can type a message, e.g. Hello, how's it going? and wait for it to respond.

In my case it responded with:

I'm just a language model, so I don't have emotions or feelings like humans do. However, I'm here and ready to help you with any questions or tasks you have! How can I assist you today?

Note that the same input will generate different outputs! Generation is sampled with some randomness by default, so you will practically never get the exact same output twice. Also, while in a “chat” session, Ollama keeps the context, so you can keep the conversation flowing. However, after exiting and re-running, it starts with a fresh context.
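If you need (mostly) reproducible output, e.g. for testing scripts, the interactive session lets you pin the sampling parameters. As a quick sketch (the values here are arbitrary), inside the chat you can run:

/set parameter seed 42
/set parameter temperature 0

With a fixed seed and a temperature of 0, repeated runs of the same prompt should give much more stable answers.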

To exit the chat session, you can press ^D (Ctrl+D) or type /bye and hit Enter.

Ollama (single request and output)

In some cases, you might want to spawn Ollama to handle one single request, get the output and exit again. You can do so by appending the prompt to the run command, like so:

ollama run llama3.2 "How's it going?"

Which will output something like:

I'm doing well, thanks for asking! I'm a large language model, so I don't have emotions or personal experiences, but I'm always ready to help and provide information on a wide range of topics. How about you? How can I assist you today?

And then immediately exit.

We can also use echo to pipe something into it, like this:

echo "How's it going?" | ollama run llama3.2

This can be useful if we want to use it as a tool.
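For example, you can hand it the output of another command (just an illustration; any command whose output you want summarised works):

echo "Summarize this disk usage report in one sentence: $(df -h)" | ollama run llama3.2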

Ollama (as a tool for scripting)

Now that we can pipe things into it, let’s try to use Ollama as a tool. Let’s create a file named lorem.txt with the following contents:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse non faucibus arcu. Praesent accumsan tincidunt porttitor. Nunc massa est, egestas in iaculis eu, blandit id arcu. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Proin vel massa eu tellus volutpat venenatis. Aenean velit est, sagittis id porta sed, gravida id enim. Aliquam ornare nisl eu cursus sagittis. Nulla ut sollicitudin purus. Nullam vestibulum sem at tempus tristique. Phasellus pharetra luctus accumsan. In quis lorem vitae libero porta dapibus. Fusce tristique, metus eget dictum semper, ipsum arcu rutrum lacus, a maximus ligula risus sit amet magna. Donec a felis imperdiet, iaculis felis sit amet, dignissim neque.

Now we can have it describe lorem.txt like this:

filename="lorem.txt" echo "Here's a file named ${filename}, please describe the file. Its contents are as follows: $(cat ${filename})" | ollama run llama3.2

Which will output:

The file `lorem.txt` appears to be a text file containing a block of Latin text that is often used as placeholder or filler text in graphic design, publishing, and other forms of media. The content of the file reads like a passage of writing, but it doesn't appear to be a coherent or meaningful sentence.

In fact, this type of text is called "Lorem Ipsum" or "placeholder text", and its purpose is to simulate the look and feel of real text while a design or layout is being developed. It's often used in web development, printing, and other applications where content needs to be displayed, but actual text isn't available.

The text has a classic, elegant style that evokes a sense of professionalism and sophistication, which may be why it's commonly used as placeholder text

You can essentially give it requests in natural language while telling it about a file, passing its contents along, and so on. I’ll give you one more example!

Let’s say we have this social media post:

Woohoo, I love computers! W00t 1337! I use Arch btw.

Now we want to automatically categorize social media posts and filter technical posts. We can do so like this:

IS_TECH=$(echo 'We have this social media post: "Woohoo, I love computers! W00t 1337! I use Arch btw."; if it is related to tech, respond with Y, otherwise respond with N' | ollama run llama3.2)

Now we can run:

echo $IS_TECH

Which should simply return:

Y

That’s an easy way to start writing scripts using AI. I’ll leave the creative thinking to you, the reader. (:
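To make this reusable, you could wrap it in a small script. Here’s a rough sketch (the prompt wording, the whitespace trimming and the model choice are my own assumptions; the model may occasionally answer with more than a single letter, so treat this as a starting point):

#!/bin/sh
# Classify a social media post as tech-related (Y) or not (N) using llama3.2.
is_tech() {
  echo "We have this social media post: \"$1\"; if it is related to tech, respond with only Y, otherwise respond with only N" \
    | ollama run llama3.2 | tr -d '[:space:]'
}

if [ "$(is_tech 'Woohoo, I love computers! W00t 1337! I use Arch btw.')" = "Y" ]; then
  echo "tech post"
else
  echo "not a tech post"
fi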

Getting more Models

You can list your currently installed models with:

ollama list

Now, you already know about llama3.2, but I’d like to share some more useful models with you:

  • deepseek-r1:8b – DeepSeek’s distilled “reasoning” model based on Llama, it’s the latest trend! :D
  • mistral:latest – a nice all-purpose model, especially good as a general chat assistant
  • llama3.2-vision:latest – It can view images (describe them, do OCR, etc.)
  • codellama:latest – A model optimized for coding assistance

With DeepSeek you can “reason”, and it will output its emulated reasoning within <think> tags. I recommend the distilled 8b (8 billion parameter) model, since it’s small enough to run on a halfway decent GPU, and it happens to be based on Llama, so you won’t get as much censorship. Ask it about Tiananmen! :D
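If you only want the final answer in a script, you can strip the emulated reasoning out again. A rough sketch using sed (it assumes the reasoning comes first, wrapped in a single <think>…</think> block that starts and ends on its own lines):

echo "Why is the sky blue?" | ollama run deepseek-r1:8b | sed '/<think>/,/<\/think>/d'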

Mistral is a model by the French company Mistral AI; it’s my favourite for general-purpose chat and advice.

Llama3.2-vision is the counterpart to the already demonstrated llama3.2, except that it is a multi-modal model, with a vision component built in on top of the LLM.

Codellama is pretty self-explanatory: a coding assistant.
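If you’d like to grab all of these in one go, a simple loop does the trick (just a convenience sketch; keep in mind the downloads add up to quite a few gigabytes):

for model in deepseek-r1:8b mistral:latest llama3.2-vision:latest codellama:latest; do
  ollama pull "$model"
done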

Describing images

Using llama3.2-vision we can describe images.

echo 'Describe the file <full path to image file>' | ollama run llama3.2-vision

You should now see “Added image '<path>'” and then it will spin as it processes the image.

Instead of doing it programmatically, you can also simply chat with the LLM and then mention the full file path in the chat while asking it to describe the file.
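This also lends itself to scripting. As a rough sketch (the directory and the file extension are placeholders for wherever your images live), you could write a description next to every JPEG in a folder:

for img in "$HOME"/Pictures/*.jpg; do
  echo "Describe the file ${img}" | ollama run llama3.2-vision > "${img%.jpg}.txt"
done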

Conclusion

In this article I showed you how to install Ollama, how to obtain models (along with my personal model recommendations), how to interact with them, how to script with them, and how to use them to describe images.

So far, we’ve been running everything locally in a terminal. In my next article, we’ll elevate things to the web (still self-hosted)!

I run this blog in my spare time. If I helped you out, consider donating a cup of coffee. Make sure to follow my blog on fedi or the RSS feed to be notified when my next article comes out!