OpenWebUI & Ollama: Experience AI on Your Terms with Local Hosting
In an era where AI capabilities are locked behind subscriptions and cloud-based services, OpenWebUI and Ollama provide a powerful alternative that prioritizes privacy, security, and cost efficiency. These open-source tools are changing how organizations and individuals harness AI while maintaining complete control over the models and data they use.
Why use local LLMs? #1 Uncensored Models
One significant advantage of local deployment through Ollama is the ability to run a model of your choosing, including unrestricted LLMs. While cloud-based AI services often apply limitations and filters to their models to maintain content control and reduce liability, locally hosted models can be used without these restrictions. This provides several benefits:
- Complete control over model behavior and outputs
- Ability to fine-tune models for specific use cases without limitations
- Access to open-source models with different training approaches
- Freedom to experiment with model parameters and configurations
- No artificial constraints on content generation or topic exploration
This flexibility is particularly valuable for research, creative applications, and specialized industry use cases where standard content filters might interfere with legitimate work.
Here’s an amazing article from Eric Hartford on the topic: Uncensored Models

Why use local LLMs? #2 Privacy
When running AI models locally through Ollama and OpenWebUI, all data processing occurs on your own infrastructure. This means:
- Sensitive data never leaves your network perimeter
- No third-party access to your queries or responses
- Complete control over data retention and deletion policies
- Compliance with data sovereignty requirements
- Protection from cloud provider data breaches
Implementation
Requirements:
- Docker
- NVIDIA Container Toolkit (Optional but Recommended)
- GPU + NVIDIA CUDA Installation (Optional but Recommended)
Step 1: Install Ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:latest
Step 2: Launch Open WebUI with the new features
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
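After both containers start, a quick check confirms Ollama and Open WebUI are reachable on their published ports. Below is a minimal sketch using Python's requests library, assuming the default ports from the commands above:

import requests

# Ollama answers on the port published above (11434); the root path returns a liveness message.
print(requests.get("http://localhost:11434/").text)          # expected: "Ollama is running"

# Open WebUI is published on port 3000; a 200 status code means the UI is up.
print(requests.get("http://localhost:3000/").status_code)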
Need help setting up Docker and the NVIDIA Container Toolkit?
Open WebUI and Ollama
OpenWebUI provides a sophisticated interface for interacting with locally hosted models while maintaining all the security benefits of local deployment. Key features include:
- Intuitive chat interface similar to popular cloud-based AI services
- Support for multiple concurrent model instances
- Built-in prompt templates and history management
- Customizable UI themes and layouts
- API integration capabilities for internal applications
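Because Ollama exposes a local REST API, the same models behind Open WebUI can be called from your own applications. Here is a minimal sketch, assuming the default port 11434 and a model such as llama3 that has already been pulled:

import requests

# Non-streaming generation request against the local Ollama API.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",               # any model you have pulled locally
        "prompt": "Explain why local LLM hosting helps with data sovereignty.",
        "stream": False,                 # return a single JSON object instead of a stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])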
Ollama simplifies the process of running AI models locally while providing robust security features:
- Easy model installation and version management
- Efficient resource utilization through optimized inference
- Support for custom model configurations
- Built-in model verification and integrity checking
- Container-friendly architecture for isolated deployments
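Model installation and version management can also be scripted against the same API. A small sketch that lists locally installed models and pulls a new one, assuming the default endpoint:

import requests

BASE = "http://localhost:11434"

# List the models currently installed on this host.
for m in requests.get(f"{BASE}/api/tags", timeout=30).json().get("models", []):
    print(m["name"], m.get("size"))

# Pull a model by name; Ollama streams progress lines while downloading.
with requests.post(f"{BASE}/api/pull", json={"name": "llama3"}, stream=True, timeout=None) as r:
    for line in r.iter_lines():
        if line:
            print(line.decode())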
Deploying Azure Functions with Azure DevOps: 3 Must-Dos! Code Security Included
Azure Functions is a serverless compute service that allows you to run your code in response to various events, without the need to manage any infrastructure. Azure DevOps, on the other hand, is a set of tools and services that help you build, test, and deploy your applications more efficiently. Combining these two powerful tools can streamline your Azure Functions deployment process and ensure a smooth, automated workflow.
In this blog post, we’ll explore three essential steps to consider when deploying Azure Functions using Azure DevOps.
1. Ensure Consistent Python Versions
When working with Azure Functions, it’s crucial to ensure that the Python version used in your build pipeline matches the Python version configured in your Azure Function. Mismatched versions can lead to unexpected runtime errors and deployment failures.
To ensure consistency, follow these steps:
- Determine the Python version required by your Azure Function. You can find this information in the requirements.txt file or the host.json file in your Azure Functions project.
- In your Azure DevOps pipeline, use the UsePythonVersion task to set the Python version to match the one required by your Azure Function.

  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.9'
      addToPath: true

- Verify the Python version in your pipeline by running python --version and ensuring it matches the version specified in the previous step.
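If you want an explicit guard rather than a manual check, a tiny script can fail the build early when the interpreter doesn't match. This is only a sketch; the expected version (3.9) mirrors the example above.

# check_python_version.py - fail fast if the pipeline's interpreter doesn't match
import sys

EXPECTED = (3, 9)  # keep in sync with versionSpec in the pipeline

if sys.version_info[:2] != EXPECTED:
    raise SystemExit(
        f"Expected Python {EXPECTED[0]}.{EXPECTED[1]}, "
        f"got {sys.version_info.major}.{sys.version_info.minor}"
    )
print("Python version OK:", sys.version)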
2. Manage Environment Variables Securely
Azure Functions often require access to various environment variables, such as database connection strings, API keys, or other sensitive information. When deploying your Azure Functions using Azure DevOps, it’s essential to handle these environment variables securely.
Here’s how you can approach this:
- Store your environment variables as secret variables in an Azure DevOps variable group or as Azure Key Vault secrets.
- In your Azure DevOps pipeline, use the appropriate task to retrieve and set the environment variables. For example, you can use the AzureKeyVault task to fetch secrets from Azure Key Vault.

  - task: AzureKeyVault@1
    inputs:
      azureSubscription: 'Your_Azure_Subscription_Connection'
      KeyVaultName: 'your-keyvault-name'
      SecretsFilter: '*'
      RunAsPreJob: false
- Ensure that your pipeline has the necessary permissions to access the Azure Key Vault or Service Connections.
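At runtime, Azure Functions surfaces application settings as environment variables, so your function code reads the secret like any other setting. A minimal sketch, using a hypothetical setting name SQL_CONNECTION_STRING:

import os

# "SQL_CONNECTION_STRING" is a hypothetical application setting name used for illustration.
connection_string = os.environ.get("SQL_CONNECTION_STRING")
if connection_string is None:
    raise RuntimeError("SQL_CONNECTION_STRING is not configured for this Function App")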
3. Implement Continuous Integration and Continuous Deployment (CI/CD)
To streamline the deployment process, it’s recommended to set up a CI/CD pipeline in Azure DevOps. This will automatically build, test, and deploy your Azure Functions whenever changes are made to your codebase.
Here’s how you can set up a CI/CD pipeline:
- Create an Azure DevOps Pipeline and configure it to trigger on specific events, such as a push to your repository or a pull request.
- In the pipeline, include steps to build, test, and package your Azure Functions project.
- Add a deployment task to the pipeline to deploy your packaged Azure Functions to the target Azure environment.
# CI/CD pipeline
trigger:
- main

pool:
  vmImage: 'ubuntu-latest'

steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.9'
    addToPath: true

- script: |
    pip install -r requirements.txt
  displayName: 'Install dependencies'

- task: AzureFunctionApp@1
  inputs:
    azureSubscription: 'Your_Azure_Subscription_Connection'
    appName: 'your-function-app-name'
    appType: 'functionApp'
    package: '$(System.DefaultWorkingDirectory)'
    deployToSlotOrASE: true
    resourceGroupName: 'your-resource-group-name'
    slotName: 'production'
By following these three essential steps, you can ensure a smooth and reliable deployment of your Azure Functions using Azure DevOps, maintaining consistency, security, and automation throughout the process.
Bonus: Embrace DevSecOps with Code Security Checks
As part of your Azure DevOps pipeline, it’s crucial to incorporate security checks to ensure the integrity and safety of your code. This is where the principles of DevSecOps come into play, where security is integrated throughout the software development lifecycle.
Here’s how you can implement code security checks in your Azure DevOps pipeline:
- Use Bandit for Python Code Security: Bandit is a popular open-source tool that analyzes Python code for common security issues. You can integrate Bandit into your Azure DevOps pipeline to automatically scan your Azure Functions code for potential vulnerabilities.
- script: |
    pip install bandit
    bandit -r your-functions-directory -f json -o bandit_report.json
  displayName: 'Run Bandit Security Scan'

- task: PublishBuildArtifacts@1
  inputs:
    PathtoPublish: 'bandit_report.json'
    ArtifactName: 'bandit-report'
    publishLocation: 'Container'
- Leverage the Safety Tool for Dependency Scanning: Safety is another security tool that checks your Python dependencies for known vulnerabilities. Integrate this tool into your Azure DevOps pipeline to ensure that your Azure Functions are using secure dependencies.
- script: |
    pip install safety
    safety check --full-report
  displayName: 'Run Safety Dependency Scan'
- Review Security Scan Results: After running the Bandit and Safety scans, review the generated reports and address any identified security issues before deploying your Azure Functions. You can publish the reports as build artifacts in Azure DevOps for easy access and further investigation.
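To make that review step enforceable, the pipeline can parse the Bandit JSON report and fail the build when high-severity findings appear. A minimal sketch, assuming the bandit_report.json produced above:

import json
import sys

# Parse the Bandit JSON report produced earlier in the pipeline.
with open("bandit_report.json") as f:
    report = json.load(f)

high_findings = [
    r for r in report.get("results", [])
    if r.get("issue_severity") == "HIGH"
]

for finding in high_findings:
    print(f"{finding['filename']}:{finding['line_number']} - {finding['issue_text']}")

if high_findings:
    sys.exit(f"{len(high_findings)} high-severity Bandit finding(s); failing the build.")
print("No high-severity Bandit findings.")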
By incorporating these DevSecOps practices into your Azure DevOps pipeline, you can ensure that your Azure Functions are not only deployed efficiently but also secure and compliant with industry best practices.
The AI-Driven Evolution of Databases
The hype around Artificial Intelligence (AI) and Retrieval-Augmented Generation (RAG) is reshaping databases and how they are architected. Traditional database management systems (DBMS) are being redefined to harness the capabilities of AI, transforming how data is stored, retrieved, and utilized. In this article I share some of the shifts happening right now as databases evolve to work well with AI.
1. Vectorization and Embedding Integration
Traditional databases store data in structured formats, typically as rows and columns in tables. However, with the rise of AI, there is a need to store and query high-dimensional data such as vectors (embeddings), which represent complex data types like images, audio, and natural language.
- Embedding Vectors: When new data is inserted into the database, it can be vectorized using machine learning models, converting the data into embedding vectors. This allows for efficient similarity searches and comparisons. For example, inserting a new product description could automatically generate an embedding that captures its semantic meaning.
- Vector Databases: Specialized vector databases like Pinecone, Weaviate, FAISS (Facebook AI Similarity Search) and Azure AI Search are designed to handle and index vectorized data, enabling fast and accurate similarity searches and nearest neighbor queries.
A great example is PostgreSQL, which can be extended to handle high-dimensional vector data efficiently using the pgvector extension. This capability is particularly useful for applications involving machine learning, natural language processing, and other AI-driven tasks that rely on vector representations of data.
What is pgvector?
pgvector is an extension for PostgreSQL that enables the storage, indexing, and querying of vector data. Vectors are often used to represent data in a high-dimensional space, such as word embeddings in NLP, feature vectors in machine learning, and image embeddings in computer vision.
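As a quick illustration, here is a minimal sketch of pgvector in use from Python with the psycopg2 driver, assuming a local PostgreSQL instance with the extension available; the connection string, table, and column names are made up for the example.

import psycopg2

conn = psycopg2.connect("dbname=demo user=postgres password=postgres host=localhost")
cur = conn.cursor()

# Enable the extension and create a table with a 3-dimensional vector column
# (real embeddings typically have hundreds or thousands of dimensions).
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3))")

# Insert a few embeddings as vector literals.
cur.execute("INSERT INTO items (embedding) VALUES ('[1,1,1]'), ('[2,2,2]'), ('[1,1,2]')")

# Nearest-neighbor search: <-> is pgvector's Euclidean (L2) distance operator.
cur.execute("SELECT id, embedding FROM items ORDER BY embedding <-> '[1,1,1.5]' LIMIT 2")
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()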
2. Enhanced Indexing Techniques
One of the main changes needed to support AI is that your index must now support approximate nearest neighbor (ANN) queries against vector data. A typical query is "find me the top N vectors that are most similar to this one." Each vector may have hundreds or thousands of dimensions, and similarity is based on overall distance across all of those dimensions. A regular B-tree or hash index is useless for this kind of query, so new index types are provided as part of pgvector on PostgreSQL, or you can use Pinecone, Milvus, and the many other solutions being developed as AI keeps demanding data; these are more specialized for such workloads.
Databases are adopting hybrid indexing techniques that combine traditional indexing methods (B-trees, hash indexes) with AI-driven indexes such as neural hashes and inverted indexes for text and multimedia data.
- AI-Driven Indexing: Machine learning algorithms can optimize index structures by predicting access patterns and preemptively loading relevant data into memory, reducing query response times.
What is an Approximate Nearest Neighbor (ANN) search? It's an algorithm that finds a data point in a data set that's very close to the given query point, but not necessarily the absolute closest one. An exact nearest-neighbor (NN) algorithm searches exhaustively through all the data to find the perfect match, whereas an ANN algorithm settles for a match that's close enough.
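The difference is easy to see with FAISS (mentioned above): a flat index scans every vector exhaustively, while an HNSW index answers the same query approximately but much faster at scale. A minimal sketch, assuming the faiss-cpu and numpy packages are installed:

import numpy as np
import faiss

d = 128                                                  # vector dimensionality
db = np.random.random((10_000, d)).astype("float32")     # the "stored" vectors
query = np.random.random((1, d)).astype("float32")

# Exact nearest-neighbor search: brute-force scan over all vectors.
flat = faiss.IndexFlatL2(d)
flat.add(db)
exact_dist, exact_ids = flat.search(query, 5)

# Approximate nearest-neighbor search: HNSW graph index.
hnsw = faiss.IndexHNSWFlat(d, 32)                        # 32 = graph neighbors per node
hnsw.add(db)
approx_dist, approx_ids = hnsw.search(query, 5)

print("exact top-5:", exact_ids[0])
print("approx top-5:", approx_ids[0])                    # usually overlaps heavily, not guaranteed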
3. Automated Data Management and Maintenance
AI-driven databases can automatically adjust configurations and optimize performance based on workload analysis. This includes automatic indexing, query optimization, and resource allocation.
- Adaptive Query Optimization: AI models predict the best execution plans for queries by learning from historical data, continuously improving query performance over time.
- Predictive Maintenance: Machine learning models can predict hardware failures and performance degradation, allowing for proactive maintenance and minimizing downtime.
Some examples:
- Azure SQL Database offers built-in AI features such as automatic tuning, which includes automatic indexing and query performance optimization. Azure DBs also provide insights with machine learning to analyze database performance and recommend optimizations.
- Google BigQuery incorporates machine learning to optimize query execution and manage resources efficiently and allows users to create and execute machine learning models directly within the database.
- Amazon Aurora utilizes machine learning to optimize queries, predict database performance issues, and automate database management tasks such as indexing and resource allocation. They also integrate machine learning capabilities directly into the database, allowing for real-time predictions and automated adjustments.
Wrap-Up
The landscape of database technology is rapidly evolving, driven by the need to handle more complex data types, improve performance, and integrate seamlessly with machine learning workflows. Innovations like vectorization during inserts, enhanced indexing techniques, and automated data management are at the forefront of this transformation. As these technologies continue to mature, databases will become even more powerful, enabling new levels of efficiency, intelligence, and security in data management.
Understanding GPU Evolution and Deployment in Datacenter or Cloud

In the current hype around Artificial Intelligence we find ourselves working a lot more with GPUs, which is very fun! GPU workloads can run on physical servers, VMs, or containers. Each approach has its own advantages and disadvantages.
Physical servers offer the highest performance and flexibility, but they are also the most expensive and complex to manage. VMs provide a good balance between performance, flexibility, resource utilization, and cost. Containers offer the best resource utilization and cost savings, but they may sacrifice some performance and isolation.
For cloud we often recommend containers, since there are services where you can spin up, run your jobs, and spin down. A VM running 24/7 on cloud services can be expensive. The performance trade-off is often addressed by distributing the load, or it simply matters less for development workloads.
The best deployment option for your GPU workloads will depend on your specific needs and requirements. If you need the highest possible performance and flexibility, physical servers are the way to go. If you are looking to improve resource utilization and cost savings, VMs or containers may be a better option.
Deployment | Description | Benefits | Disadvantages |
---|---|---|---|
Physical servers | Each GPU is directly dedicated to a single workload. | Highest performance and flexibility. | Highest cost and complexity. |
VMs | GPUs can be shared among multiple workloads. | Improved resource utilization and cost savings. | Reduced performance and flexibility. |
Containers | GPUs can be shared among multiple workloads running in the same container host. | Improved resource utilization and cost savings. | Reduced performance and isolation. |
Additional considerations:
- GPU virtualization: GPU virtualization technologies such as NVIDIA vGPU and AMD SR-IOV allow you to share a single physical GPU among multiple VMs or containers. This can improve resource utilization and reduce costs, but it may also reduce performance.
- Workload type: Some GPU workloads are more sensitive to performance than others. For example, machine learning and artificial intelligence workloads often require the highest possible performance. Other workloads, such as video encoding and decoding, may be more tolerant of some performance degradation.
- Licensing: Some GPU software applications are only licensed for use on physical servers. If you need to use this type of software, you will need to deploy your GPU workloads on physical servers.
Overall, the best way to choose the right deployment option for your GPU workloads is to carefully consider your specific needs and requirements.
Here is a table summarizing the key differences and benefits of each deployment option:
Characteristic | Physical server | VM | Container |
---|---|---|---|
Performance | Highest | High | Medium |
Flexibility | Highest | High | Medium |
Resource utilization | Medium | High | Highest |
Cost | Highest | Medium | Lowest |
Complexity | Highest | Medium | Lowest |
Isolation | Highest | High | Medium |
Containers for Data Scientists on top of Azure Container Apps
The Azure Data Science VMs are good for dev and testing, and even though you could use a virtual machine scale set, that is a heavy and costly solution.
When thinking about scaling, one good solution is to containerize the Anaconda / Python virtual environments and deploy them to Azure Kubernetes Service or better yet, Azure Container Apps, the new abstraction layer for Kubernetes that Azure provides.
Here is a quick way to create a container with Miniconda 3, Pandas, and Jupyter Notebooks to interface with the environment. I also show how to deploy this single test container to Azure Container Apps.
The result:
A Jupyter Notebook with Pandas Running on Azure Container Apps.

Container Build
If you know the libraries you need, it makes sense to start with the lightest base image, which is Miniconda3. You can also deploy the Anaconda3 container, but that one may include libraries you will never use, which creates unnecessary vulnerabilities to remediate.
Miniconda 3: https://hub.docker.com/r/continuumio/miniconda3
Anaconda 3: https://hub.docker.com/r/continuumio/anaconda3
Below is a simple Dockerfile to build a container with the pandas, openai, and tensorflow libraries.

FROM continuumio/miniconda3

RUN conda install jupyter -y --quiet && \
    mkdir -p /opt/notebooks

WORKDIR /opt/notebooks

RUN pip install pandas openai tensorflow

CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
Build and Push the Container
Now that you have the container built, push it to your registry and deploy it on Azure Container Apps. I use Azure DevOps to get the job done.

Here’s the pipeline task:
- task: Docker@2
  inputs:
    containerRegistry: 'dockerRepo'
    repository: 'm05tr0/jupycondaoai'
    command: 'buildAndPush'
    Dockerfile: 'dockerfile'
    tags: |
      $(Build.BuildId)
      latest
Deploy to Azure Container Apps
Deploying to Azure Container Apps was painless once I understood the Azure DevOps task, since I can include my ingress configuration in the same step as the container. The only extra step was configuring DNS in my environment. The DevOps task is well documented, but here are links to the official docs.
Architecture / DNS: https://learn.microsoft.com/en-us/azure/container-apps/networking?tabs=azure-cli
Azure Container Apps Deploy Task : https://github.com/microsoft/azure-pipelines-tasks/blob/master/Tasks/AzureContainerAppsV1/README.md

A few things I'd like to point out: you don't have to provide a username and password for the container registry; the task gets a token from az login. The resource group has to be the one where the Azure Container Apps environment lives; if not, a new one will be created. The target port is where the container listens; see the container build above, where the Jupyter Notebook points to port 8888. If you are using a Container Apps environment with a private VNET, setting the ingress to external means the VNET can reach it, not outside traffic from the internet. Lastly, I disable telemetry to stop reporting.
- task: AzureContainerApps@1
  inputs:
    azureSubscription: 'IngDevOps(XXXXXXXXXXXXXXXXXXXX)'
    acrName: 'idocr'
    dockerfilePath: 'dockerfile'
    imageToBuild: 'idocr.azurecr.io/m05tr0/jupycondaoai'
    imageToDeploy: 'idocr.azurecr.io/m05tr0/jupycondaoai'
    containerAppName: 'datasci'
    resourceGroup: 'IDO-DataScience-Containers'
    containerAppEnvironment: 'idoazconapps'
    targetPort: '8888'
    location: 'East US'
    ingress: 'external'
    disableTelemetry: true
After deployment I had to get the token which was easy with the Log Stream feature under Monitoring. For a deployment of multiple Jupyter Notebooks it makes sense to use JupyterHub.

Azure OpenAI: Private and Secure "ChatGPT-like" Experience for Enterprises

Azure provides the Azure OpenAI service to address the concerns of companies and government agencies that have strong security regulations but want to leverage the power of AI as well.
Most likely you've used one of the many AI offerings out there. OpenAI's ChatGPT, Google Bard, Google PaLM with MakerSuite, Perplexity AI, HuggingChat, and many more have been at the center of the latest hype, and software companies are racing to integrate them into their products. The main route is to buy a subscription and connect to the APIs offered over the internet, but as a DevSecOps engineer, here's where the fun starts.
A lot of companies following good security practices block traffic to and from the internet, so the first task is opening the firewall. Next you must protect the credentials of the API user so they don't get compromised, since access to them would reveal what you are working on. Then you have to trust that OpenAI is not using your data to train their models and that they are keeping your company's data safe.
It could take a ton of time to plan, design and deploy a secured infrastructure for using large language models and unless you have a very specific use case it might be overkill to build your own.
Here’s a breakdown of a few infrastructure highlights about this service.
3 Main Features
Privacy and Security
Your ChatGPT-like interface, called Azure AI Studio, runs in your private subscription. It can be linked to one of your VNETs so that you can use internal routing, and you can also add private endpoints so that you don't even have to use it over the internet.

Even if you have to use it over the internet, you can lock it down to allow only your public IPs, and your developers will also need a token for authentication, which can be scripted to rotate every month.
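For reference, here is a minimal sketch of how an application would call a private Azure OpenAI deployment from Python with the openai package; the endpoint, deployment name, and API version are placeholders for illustration.

import os
from openai import AzureOpenAI

# Endpoint and deployment name are placeholders; the key should come from a secret store.
client = AzureOpenAI(
    azure_endpoint="https://your-resource-name.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="your-gpt4-deployment",   # the deployment name created in Azure AI Studio
    messages=[{"role": "user", "content": "Summarize our data retention policy in two sentences."}],
)
print(response.choices[0].message.content)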

Pricing

Common Models
- GPT-4 Series: The GPT-4 models are like super-smart computers that can understand and generate human-like text. They can help with things like understanding what people are saying, writing stories or articles, and even translating languages.
Key Differences from GPT-3:
- Model Size: GPT-4 models tend to be larger in terms of parameters compared to GPT-3. Larger models often have more capacity to understand and generate complex text, potentially resulting in improved performance.
- Training Data: GPT-4 models might have been trained on a more extensive and diverse dataset, potentially covering a broader range of topics and languages. This expanded training data can enhance the model’s knowledge and understanding of different subjects.
- Improved Performance: GPT-4 models are likely to demonstrate enhanced performance across various natural language processing tasks. This improvement can include better language comprehension, generating more accurate and coherent text, and understanding context more effectively.
- Fine-tuning Capabilities: GPT-4 might introduce new features or techniques that allow for more efficient fine-tuning of the model. Fine-tuning refers to the process of training a pre-trained model on a specific dataset or task to make it more specialized for that particular use case.
- Contextual Understanding: GPT-4 models might have an improved ability to understand context in a more sophisticated manner. This could allow for a deeper understanding of long passages of text, leading to more accurate responses and better contextual awareness in conversation.
- GPT-3 Base Series: These models are also really smart and can do similar things as GPT-4. They can generate text for writing, help translate languages, complete sentences, and understand how people feel based on what they write.
- Codex Series: The Codex models are designed for programming tasks. They can understand and generate computer code. This helps programmers write code faster, get suggestions for completing code, and even understand and improve existing code.
- Embeddings Series: The Embeddings models are like special tools for understanding text. They can turn words and sentences into numbers that computers can understand. These numbers can be used to do things like classify text into different categories, find information that is similar to what you’re looking for, and even figure out how people feel based on what they write.
Getting Access to it!
Although the service is Generally Available (GA), it is only available in East US and West Europe. You also have to submit an application so that Microsoft can review your company and use case and approve or deny your request. This could be due to capacity and to let Microsoft gather information on how companies will be using the service.
The application is here: https://aka.ms/oai/access
Based on research and experience getting this for my clients, I always recommend requesting only what you initially need and not getting too greedy. It would also be wise to speak with your MS rep and take them out for a beer! For example, if you just need code generation, then just select the Codex option.
Lately the service has been easier to get; hopefully soon we won't need the form-and-approval dance.

Easiest Way to Deploy Ubuntu 20.04 with NVIDIA Drivers and the Latest CUDA toolkit via Packer.

I am building an analytics system that deploys containers on top of the Azure NCasT4_v3-series virtual machines, which are powered by NVIDIA Tesla T4 GPUs and AMD EPYC 7V12 (Rome) CPUs. I am deploying the VM from an Azure DevOps pipeline using HashiCorp Packer, and after trying a few approaches I found a very easy way to deploy the VM, driver, and CUDA toolkit, which I will share in this article.