Django for AI: My DjangoCon US 2025 Conference Talk
I recently presented my talk at DjangoCon US: Django for AI: Deploying Machine Learning Models with Django. Here is a text-based version of the talk.
If you want to jump to the code, here are the related repos:
- iris_ml - Jupyter notebook of trained Iris model
- django_irisml - Deployed Django site with Iris model available at DjangoForDataScience.com
- djangoforai - Django + local LLM + server side events + HTMX demo
The video will be posted shortly on the DjangoCon US YouTube channel; I will add it here when it's live.

AI is all around us. My entire trip here was powered by it: from the mapping software that guided my taxi, to the search engine that found my hotel, the songs recommended on my playlist… all of that is AI and, more specifically, Machine Learning, a branch of AI where we provide data, add algorithms, and the computer on its own comes up with connections. No explicit human code is needed.
Today, an even newer flavor of AI and Machine Learning is transforming our industry, LLMs (Large Language Models) like ChatGPT, which can write essays and increasingly code.
Right now the hype is as high as it’s ever been. I want to talk to you about this new AI future and where Django fits in.

So who am I? My name is Will Vincent and I’ve been involved in the Django community for a long time. Currently, I work as a Developer Advocate at JetBrains focused on the PyCharm IDE (so come up to me if you have compliments or complaints).
For the past six years, I’ve done the Django Chat podcast along with Carlton Gibson, the Django News newsletter with Jeff Triplett, and run a dedicated website devoted to teaching Django, Learndjango.com. Between 2020 and 2022, I served on the Django Board as Treasurer.
Now I know what some of you are thinking: I don’t personally use AI and I never will. Good for you. But you’re in the minority.

The most recent Django Developers Survey, run by the Django Software Foundation in conjunction with JetBrains, shows the following stats on AI usage.
17% of developers said they don't use AI, but most do: almost 70% use ChatGPT, and around a third use Copilot, Claude, JetBrains AI Assistant, or other tools.
I suspect these percentages will only rise in next year’s results.

How are Django developers learning these days? This chart is again from the 2024 Django Survey. Thankfully, the excellent Django docs reign supreme at 79%. But second place is a tie between StackOverflow, YouTube, and AI Tools at around 38%. Blogs are close behind at 33%, and way down at 22% are books.

By the way, if you are looking for recent books, Jeff Triplett and I run the website djangobook.com that has a list of current in-print titles. For those of you old enough to remember, this was the initial domain for the original Django book written by Jacob Kaplan-Moss and Adrian Holovaty way back in the beginning.

I’ve been to a few Python conferences this year, including PyCon US and EuroPython. Guess what everyone was discussing? It wasn’t Django. It was AI. And if they talked about the web at all it was in relation to FastAPI.

The most recent Python Developers Survey run by the PSF and JetBrains shows more usage of FastAPI than Django or Flask.

If you look at GitHub stars, an admittedly imperfect measure, you can see that FastAPI is ascendant. We in the Django community can tell ourselves an updated version of the story we told about Flask over the last 10 years: GitHub stars and PyPI downloads are imperfect measures that don't capture real-world usage, where we feel Django is clearly the top dog. And these measures don't account for all the things that make Django special: the community, the conferences, the docs, and so on. We win out, for now, on those fronts.

But we shouldn’t just do this: stick our collective heads in the sand. Clearly something external to the traditional web framework race is happening. We should acknowledge it and try to learn from it.

So why not Django? When I ask this question to Python developers, I hear three things again and again.
- Slow: Django is perceived as slow. FastAPI has fast in the name and it is async by default, which every website needs, right? (Hint: no, but that’s what people think.)
- Old: Yeah, Django is old. We celebrate this fact, but a new user goes to the homepage and it feels old. They don’t look at 20 years of success and go, wow, boring tech that just works. Instead they think that’s not a modern tool.
- Big/Hard to learn: Apparently AI is easier to learn than web development. But this is a version of the familiar Flask versus Django debate: microframeworks versus batteries-included frameworks.

This is what we need to get across to non-web developers. There’s more to the web than an endpoint.
An entire generation of Python developers, focused on AI, is being raised to think this way. Train your fancy machine learning model and then use FastAPI, which is built into many AI tools, for the web part. Done.
Don’t worry about an ORM, auth, forms, databases, security, deployment. In a way, this is a rehash of the Single Page Application (SPA) wars of not too long ago, when React and Angular were required because that’s what Facebook and Google used, and surely their engineering concerns are ours. Right?
Thankfully, Django withstood the pressure to overhaul its front-end offerings, which means we can now adapt and incorporate a new technology, like HTMX, smoothly. But newcomers think every website needs to be like what they read on Hacker News from OpenAI or whatever large corporation is writing popular blog posts.

My second big point is that the web itself has never been more important. These fancy models are useless without a way to connect them to paying users. How do they do that? With the web. But there’s a catch. Serving these models via inference is radically different than the database-driven paradigm we’re all used to with Django.
But don't just take it from me. Here is a post from Yann LeCun, one of the "godfathers of AI," pointing out that the real cost is not training but inference. This is from January of this year (way back then), when DeepSeek, a model from China, came out and matched the leading frontier models despite claiming to cost only $6M to train. It wiped $1T from the stock market in just a few days.
While not every new company is going to be an AI chatbot, many are going to be integrating with them, and it’s important to understand their cost structure in this new world of web hosting. The web has never been more important but it is different now.

Hopefully, I've got your attention now that change is afoot and Django must be part of it. So what's the game plan? Broadly, there are two sections to this talk.
- I'm going to talk about classic Machine Learning: how to train models from scratch and deploy them with Django. We never really "owned" this niche, but we should. Django is great for small and medium ML models. I'll show you why.
- We'll dive into LLMs and how they differ from traditional ML in training and deployment. Django does have a role to play here (I have a demo for it, too) but it's quite different.

So let’s talk about classic machine learning.

First, it’s important to acknowledge that “AI” is a problematic term because, from the beginning, it’s been intended as marketing and hype. There are lots of areas but Machine Learning is the most high profile at the moment. We will fill this chart out later when we get to LLMs.
Simply stated, ML means we as programmers don’t explicitly program. Instead, we add data, apply an algorithm, and the computer on its own figures out the connections.
An example is image recognition: if we loaded one million pictures of dogs and cats and then added an ML algorithm that computed for a while, the computer would figure out ways to identify them. It likely could not explain to us, as humans, how it was doing it. But we can run benchmarks for accuracy. And then say, this model seems to work.
The classic workflow is you have a training set, make a model, then expose it to the real-world with humans for feedback, then retrain. That’s the simplified loop.

When you start in Machine Learning there are two prominent datasets to use for classification problems: Titanic (who lived/died) and Iris Species (what type of flower). These are the “Hello, World” datasets for data science.
Titanic is usually the preferred choice because it’s larger and the data is a little messy, which mimics the real world. Iris is actual real-world data, but very clean. That’s what we’ll use today since I don’t want to focus on data cleaning, even though that’s a large part of what data scientists do!

This is what Iris flowers look like, by the way. There are 3 species: setosa, versicolor, and virginica. The goal is to train an ML model so that when we feed in petal and sepal measurements, it accurately predicts the flower species for us.

The first step is to create a new Jupyter notebook. There are several ways to do this:
- Web version on jupyter.org (basic, no install required).
- Anaconda is a popular option that has its own Python version installed.
- An IDE like PyCharm that has built-in support and lots of additional features.

Here is the Iris data: 3 species with 50 samples each, 150 in total. You can download a CSV file that has 5 columns: four measurements plus the species.
Iris is so common it is included by default in many ML libraries including scikit-learn, R, and MATLAB because it is easy to experiment with.

This is another way to look at the data using the seaborn library to visualize it in different ways. The blue dots (setosa) are clearly distinct but the orange and green (versicolor and virginica) have overlap. So our model has to do some work here to obtain accurate predictions.

A few more default pairplot visualizations of the data.
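For the curious, a pairplot like these takes only a couple of lines. Here's a minimal sketch using seaborn's bundled copy of the Iris data (not the exact code from the notebook):

```python
import seaborn as sns

# seaborn ships the Iris data as a built-in example dataset
df = sns.load_dataset("iris")

# one scatter plot per pair of measurements, colored by species
sns.pairplot(df, hue="species")
```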

Here is what we need to do and it can be done in around 20 lines of code in our Jupyter notebook. First, we use pandas to load and manipulate the dataset. scikit-learn is a Swiss Army Knife for Machine learning that is built on top of NumPy, SciPy, and Matplotlib. It gives one interface for various algorithms and tools.
Second, load the file and then split it into training and testing sets. We use most of the data to train, but reserve some to check whether our predictions are accurate. Then choose our algorithm, in this case a Support Vector Machine (SVM) classifier. Choosing the right algorithm requires a lot of judgment and trial and error, but trust me that an SVM works pretty well here.
Then we’ll add a Jupyter cell to make predictions and predict results. Finally, save the model as a joblib file. This is a serialized Python object for saving and loading trained ML models.

The training part of the model, by the way, is two lines here.
# Train the model
model = SVC(gamma="auto")
model.fit(X_train, y_train)
That’s it. The rest is setting it up and then evaluating the model.
I’m going a little fast here because I don’t want to get bogged down in code. You can see it all in the iris_ml repo.
What I do want to show you is that we can do all this in around 20 lines of code, including all the imports.
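Here is a condensed sketch of the whole notebook. The file and column names are illustrative, so check the iris_ml repo for the real version:

```python
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the dataset: 4 measurement columns plus the species label
df = pd.read_csv("iris.csv")
X = df.drop(columns=["species"])
y = df["species"]

# Reserve 20% of the rows to check our predictions later
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the model
model = SVC(gamma="auto")
model.fit(X_train, y_train)

# Evaluate on the held-out test set
predictions = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions):.2f}")

# Serialize the trained model for later use in Django
joblib.dump(model, "iris.joblib")
```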
If we run it locally this is what we see. We can enter inputs and see results for our model within the Jupyter notebook. Predicted species and accuracy.

The trained model is pretty lonely on its own. It needs a web interface to connect with the outside world. Enter Django.
What do we need our web app to do? Forms for user input, interact with the model, and perhaps store model predictions for retraining. Django is perfect for this.

Here is the game plan:
- Create a new Django project
- Load joblib file
- Forms for users to enter predictions -> see results
- Store user info in the database
- Deployment

This is the project structure:
- a project called django_project
- an app called predict
- the iris.joblib file included in the base directory
We could add a models directory if we wanted to work with multiple models–that’s more common in the real world–but this is a simplified example.
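Roughly, the layout looks like this (a sketch, not the exact repo contents):

```
django_irisml/
├── manage.py
├── iris.joblib          # the trained model from the notebook
├── django_project/      # settings, urls, wsgi/asgi
└── predict/             # views, models, templates
```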

This is the predict/views.py file. A bit of text but not that scary.
The key points are:
- Load the joblib model at the top
- Create a function-based view called predict that accepts form values from the user and then makes a prediction using the model
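Here's a simplified sketch of what that view looks like; the form field and template names are illustrative rather than the exact ones from the repo:

```python
# predict/views.py (simplified sketch)
import joblib
from django.shortcuts import render

# Load the trained model once, when the module is first imported
model = joblib.load("iris.joblib")


def predict(request):
    prediction = None
    if request.method == "POST":
        # Pull the four measurements out of the submitted form
        # (real code would validate these with a Django Form)
        features = [[
            float(request.POST["sepal_length"]),
            float(request.POST["sepal_width"]),
            float(request.POST["petal_length"]),
            float(request.POST["petal_width"]),
        ]]
        # Ask the model for a species prediction
        prediction = model.predict(features)[0]
    return render(request, "predict.html", {"prediction": prediction})
```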

We can also add a model to store user predictions. This data can be used later to retrain the model, which is part of most data pipelines.
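A sketch of what such a model might look like (field names are illustrative):

```python
# predict/models.py (simplified sketch)
from django.db import models


class Prediction(models.Model):
    sepal_length = models.FloatField()
    sepal_width = models.FloatField()
    petal_length = models.FloatField()
    petal_width = models.FloatField()
    predicted_species = models.CharField(max_length=20)
    created_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return f"{self.predicted_species} ({self.created_at:%Y-%m-%d})"
```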

You know what, why don’t you try this out yourself? Go to DjangoforDataScience.com and enter your own predictions.
While you’re doing that, this is a short video of what you’ll see. Enter some values and get a prediction.

We can use the Django admin to interact with the stored user data and predictions as needed. If I went to the live website, I could log in and see all your responses in real time.

I could spend an entire keynote talking about how to take a Django site and make it live in production. I have two books that cover this step-by-step: Django for Beginners and Django for Professionals.
The code is in the repo: django_irisml.
The important point is the end result is not necessarily an enormous ML file that was computationally hard to serve. You can get away with simple setups in many cases.
LLMs are not like this by the way, as we will see in a moment.

What are the actual deployment steps? I'm using a Platform as a Service here, but I was able to go through them and deploy the site on a custom domain in less than 15 minutes.
Once you’ve done it a few times, it’s just a checklist to run through. This is how I do it. Again, I don’t want to just read out instructions to you. These slides will be available after. But it’s not too bad for a relatively secure site that is scalable.
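I won't rehash the full checklist here, but as a very rough sketch, the code-side changes mostly amount to a handful of production settings plus a production web server like Gunicorn. The values and environment variable names below are illustrative, not the exact ones from the repo:

```python
# django_project/settings.py (production-related tweaks, simplified sketch)
import os

SECRET_KEY = os.environ["DJANGO_SECRET_KEY"]  # read from the environment, never hard-coded
DEBUG = os.environ.get("DJANGO_DEBUG", "") == "True"  # off by default in production
ALLOWED_HOSTS = ["djangofordatascience.com", "localhost", "127.0.0.1"]
```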

Where do we go from here? We could add user authentication, security, an API. Most data scientists could clone this repo and vibe code it to work with their data.

Now let's talk about why LLMs like ChatGPT are different. At a high level, they are next-token prediction machines, developed for language tasks, that turn out to have other uses.

Within the AI hierarchy, there is first ML and then neural networks. If you stack several networks together, you have Deep Learning.

This image is a neural network. You have multiple inputs that are computed by the model, a hidden layer, and then outputs. If you stack many of these networks together (frontier models typically have 100+ layers at this point), you get Deep Learning.
These techniques had been around for a long time–they were first proposed in the 1940s–but there were technical challenges around scaling them.
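If you want to see the shape of that computation, here is a toy sketch of a single forward pass through one hidden layer, using only NumPy. The weights here are random; in a real network they are learned during training:

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([5.1, 3.5, 1.4, 0.2])  # 4 inputs (e.g. iris measurements)
W1 = rng.normal(size=(4, 8))         # input -> hidden layer weights
W2 = rng.normal(size=(8, 3))         # hidden -> output weights

hidden = np.maximum(0, x @ W1)       # ReLU activation on the hidden layer
logits = hidden @ W2                 # 3 raw output scores
probs = np.exp(logits) / np.exp(logits).sum()  # softmax: scores -> probabilities
print(probs)
```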

So what’s actually happening inside the LLMs? In one sense, we don’t really know. Simon Willison, Django co-creator and now general man about town when it comes to AI, had a post in 2023 where he talked about LLMs as equivalent to an alien technology. I’m shamelessly stealing both his quote and his image here.
He said, “One way to think about it is that about 3 years ago, aliens landed on Earth. They handed over a USB stick and then disappeared. Since then, we’ve been poking the thing they gave us with a stick, trying to figure out what it does and how it works.”
Two years later we have a better sense of what’s happening but, because they are ML models, we can’t fully describe it. They just seem to work.

In 2017, researchers at Google released a famous paper, “Attention is all you need,” that introduced Transformers. That’s the “t” in ChatGPT by the way. Previous architectures were sequential, one token at a time. With transformers and “attention” you could look at all the tokens at once. This meant you could throw big datasets and compute time against these models, especially using GPUs (Graphics Processing Units), which are designed for doing linear algebra (matrix multiplications) across billions of parameters.

In January of 2020, OpenAI published this famous paper, “Scaling Laws for Neural Language Models”. It showed that the bigger your LLM model, the bigger the dataset, and the more compute you used, the better the results.
This was an unexpected result. In the past, neural networks hit a ceiling where it was hard to improve them. It turned out we had the right approach all along, we just lacked Transformers and millions of dollars to blast them with big datasets and compute.
If you are a big company, this discovery is great. It means money and data is all you need. Hence the current arms race in AI.

A quick note on different models. We don’t know exactly how big frontier models like ChatGPT-5 are–probably enormous, think terabytes of parameters. But there are smaller open models you can use through services like Ollama seen here.
For example, Gemma is a class of models from Google. You can see there are multiple options available. The "b" stands for billions of parameters in the model.
Parameters are the adjustable numbers inside a model, tuned during training into weights that are then frozen when the model is done. Broadly speaking, more parameters means a more powerful model, but they aren't a free lunch: there are technical reasons why just adding an infinite number of parameters isn't a guarantee of success.

Ollama makes it very easy to consume open-source models. Download the app with one click, select a model, and it will download it for you if it's not already on your machine.
Here are the models on my computer. You can see gpt-oss:20b, which is 20 billion parameters from OpenAI, and then three different Gemma models. The more parameters in a model, the larger its size.
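To give a feel for how easy these are to consume programmatically, here is a minimal sketch that asks the local Ollama server (it listens on port 11434 by default) for a completion. The model name assumes you have already pulled gemma3:4b:

```python
import requests

# Ollama exposes a local REST API once the app is running
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma3:4b", "prompt": "What is Django?", "stream": False},
)
print(resp.json()["response"])
```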

Any fans of Luigi's Mansion here? It's a game where you use a vacuum to suck up everything around you.
There are two stages to building an LLM: training and inference. Training means, let's vacuum up the entire internet, all written content: websites, books, Wikipedia, Reddit, you name it. All the large LLM companies have private versions of this, but you can also use public options like Common Crawl, a non-profit that crawls the web and archives it each month. It covers tens of billions of web pages and the compressed result is roughly 45 TB in size.
But there's more to this stage than just copying. They also have to do a lot of data cleaning: removing duplicates, filtering offensive content, and so on.

Not all data is treated equally. Just like in search, companies weight certain sources more highly than others. What’s the highest rated? Semrush shows the top domains cited by ChatGPT and Perplexity: it’s not Wikipedia but Reddit. Then YouTube transcriptions.
Clearly lots of cleaning and data prep is required to obtain good results from these sources.

Here’s another graphic from Semrush showing the most cited domains in Google AI overviews. That’s what Google gives you instead of actual search results these days. Quora and Reddit are at the top followed by LinkedIn. So similar to search, the LLMs provide weightings to different sources based on authority.

Tokenization is a key part of the training process. It means transforming words (or subwords) into numerical IDs. This image is from OpenAI's Tokenizer website. You can enter any text and see how it would be converted into numbers.
Note that punctuation receives its own token, and longer words are often split into several tokens. A general rule of thumb for English is 1 token is roughly 3/4 of a word. It's worth remembering these LLMs are not thinking machines; they are doing a lot of maths, statistically guessing which text should come next in a response.
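You can reproduce this locally with OpenAI's tiktoken library. A minimal sketch (the encoding name is one of the publicly documented ones, not necessarily the one the Tokenizer site uses):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # an encoding used by several OpenAI models

tokens = enc.encode("Django makes it easier to build better web apps.")
print(tokens)              # a list of integer token IDs
print(len(tokens))         # roughly 3/4 of the word count for English text
print(enc.decode(tokens))  # back to the original string
```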

Speaking of tokens, how many are there to train on? Something like 10^15, one quadrillion. It's a big number, but it's not enough. For newer models, companies are using AI to create additional training data, called synthetic data.

Compute times and cost also keep increasing. Here is a chart from Epoch AI of training compute over the last 15 years. It’s clearly up and to the right, broadly increasing 4x a year.
The Y axis is logarithmic, by the way. A FLOP (Floating Point Operation) is basically one arithmetic step (like 4.1 x 3 + 2.5) done on floating-point numbers.
We have gone from 10^14 FLOPs in 2010 to 10^26 and likely more by now. That's 1 trillion times bigger. The end result is that frontier-level models are terabytes in size, more than we could store and serve with Django as in the previous example.
But even if we could (even if the models were much smaller), there are other reasons why Django is a poor choice to serve them!

And that's because of something called inference, the process by which inputs are fed into an LLM, computed, and streamed back as output. It doesn't have to be just text, as this image shows, but for the moment, text works best.
The key point is this IS NOT like querying a database. For every input, a GPU has to run the model, which is expensive.

Here’s a more detailed guide. I send a prompt into the GPU box, which is empty for now. The model weights are frozen, input text is passed through all the layers, multiplied and transformed by the weights. The result is a probability distribution over the next token.
Responses are generated one token at a time, staggered across time steps. When the model generates its first token ("Django"), that output token is pushed back onto the prompt. "Django" is passed back in, then the next output ("is") comes out and is sent back in, and so on. That's the loop, at a high level.
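The shape of that loop is easy to show in a few lines. Here's a toy sketch where a dictionary stands in for the model; a real LLM replaces the lookup with billions of frozen weights running on a GPU and samples from a probability distribution rather than picking a single fixed answer:

```python
# A toy "model": given the last word, guess the next one.
next_word = {
    "Django": "is",
    "is": "a",
    "a": "web",
    "web": "framework",
}

prompt = ["Django"]
while prompt[-1] in next_word:
    token = next_word[prompt[-1]]   # "run inference" on the current sequence
    prompt.append(token)            # feed the output back in as new input
    print(token, end=" ")           # stream the token to the user as it appears
```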

Further down a level, what’s happening is our human text is tokenized, run through the GPU, and then de-tokenized back into text. The tokenizers turn text into vectors that the models can do math on, lots of matrix multiplication and linear algebra.

This slide here is the payoff for all the LLM internals I just discussed. It is what I really want to impart on those of you who are relatively new to LLMs.
Traditional web requests go to the database and send a response. Simple. This is fast, not expensive, and we can add things like CDNs, caching, and database indexes to speed it up. LLM Inference is completely different. Each prompt is unique, triggers a GPU cluster to run, and then returns generated tokens. We can’t speed this process up other than throwing more GPUs (aka money) at it.
We often talk about “internet scale” companies like Google or Facebook and the immense scaling challenges they face. But consider doing that for LLM inference and the costs are astronomical. This is why OpenAI can’t buy enough chips or data centers. Why nuclear power plants are being restarted to power the energy grid. Yes part of that is training, but most of it is inference, aka serving these models.
It’s wild.

What is the web piece for LLM companies? Something that is fast, lightweight, and can stream tokens asynchronously as they are produced. Enter FastAPI, which is built on top of Starlette, the ASGI framework providing the request/response cycle and routing.
FastAPI adds request parsing and validation via Pydantic and extra features for API development. So for web handling LLMs you have an inference engine like vLLM and then FastAPI endpoints in front of that. But of course this requires a JavaScript frontend and also lots of other things like authentication, security, an ORM, and so on to build a normal web app.

So where does Django fit in? It’s not going to be on the “hot path” of inference… but only LLM providers need that. Most websites will be consuming LLM APIs. Not dissimilar to what websites do now.

What if I told you Django can do more than you think? Let’s say we wanted to reproduce an LLM chatbot. We can do it!
WebSockets, two-way connections, are overkill here, even though Django supports them with Channels, Daphne, and channels_redis. These are all now maintained by Carlton Gibson, by the way.
What can we use instead? (dramatic pause…)

Server-Sent Events. The boring old web. Send one HTTP request and receive a streaming response. Django has a StreamingHttpResponse class for exactly this.
Can anyone guess when this was added? 2013! In Django 1.5.
If you wanted to get really fancy you could use HTML Streaming as well. Let me show you a demo…
I've wired up a local Django project to an Ollama model (Gemma 3:4B here) that streams tokens to an API endpoint for us. No hand-written JavaScript here, just HTMX in two templates. We use a Python generator (yield) to send the tokens one at a time and then render them. Regular synchronous views. The results are stored in the database and accessible via the admin.
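Here is roughly what the core of that view looks like; a simplified sketch, not the exact code from the repo (the endpoint URL and model name assume a default local Ollama install, and the HTMX template that consumes the stream is not shown):

```python
# views.py (simplified sketch): stream tokens from a local Ollama model
import json

import requests
from django.http import StreamingHttpResponse


def stream_chat(request):
    prompt = request.GET.get("prompt", "")

    def token_stream():
        # Ask Ollama for a streamed completion; it returns one JSON object per line
        with requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "gemma3:4b", "prompt": prompt, "stream": True},
            stream=True,
        ) as resp:
            for line in resp.iter_lines():
                if not line:
                    continue
                chunk = json.loads(line)
                # Each chunk carries the next bit of generated text
                yield f"data: {chunk.get('response', '')}\n\n"
        yield "event: done\ndata: \n\n"

    # Server-Sent Events: one long-lived HTTP response, tokens pushed as they arrive
    return StreamingHttpResponse(token_stream(), content_type="text/event-stream")
```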
This is just a prototype I whipped up, but I think it’s pretty cool. I have a repo available with the code.

And that’s it! I covered a lot of ground here today. I hope you have a better sense of the AI landscape, how the web is changing, and how Django fits in.
Personally, I think Django is perfectly suited to this new world; we as a community just need to do a better job of telling all the younger AI people that we're here when they are ready to share their models, whether it's classic ML like Iris or fancy LLMs.
Django has been here for the past twenty years, and hopefully it will be here for the next twenty as well.
Thank you for your time.