# First LLM Classifier UMD

Learn how journalists use large language models to organize and analyze massive datasets.

## What you will learn

This class will give you hands-on experience creating a machine-learning model that can read and categorize the text recorded in newsworthy datasets.

It will teach you how to:

* Submit large language model prompts with the Python programming language
* Write structured prompts that can classify text into predefined categories
* Submit dozens of prompts at once as part of an automated routine
* Evaluate results using a rigorous, scientific approach
* Improve results by training the model with rules and examples

By the end, you will understand how LLM classifiers can outperform traditional machine-learning methods with significantly less code. And you will be ready to write a classifier on your own.

## Who can take it

This course is free. Anyone who has dabbled with code and AI is qualified to work through the materials. A curious mind and good attitude are all that’s required, but a familiarity with Python will certainly come in handy.

## Table of contents

```{toctree}
:maxdepth: 1
:name: mastertoc
:numbered:

our-mission
llm-wtf
groq
prompting-with-python
structured-responses
bulk-prompts
evaluate
improve
about
```

## About this class

[Ben Welsh](https://palewi.re/who-is-ben-welsh/) and [Derek Willis](https://thescoop.org/about/) prepared this guide for [a training session](https://schedules.ire.org/nicar-2025/index.html#2045) at the National Institute for Computer-Assisted Reporting’s 2025 conference in Minneapolis. Some of the copy was written with the assistance of GitHub's Copilot, an AI-powered text generator. The materials are free and [open source on GitHub](https://github.com/NewsAppsUMD/first-llm-classifier-umd). The project has been adapted to [run on Hugging Face](https://huggingface.co/spaces/JournalistsonHF/first-llm-classifier) by [Florent Daudens](https://www.linkedin.com/in/fdaudens/).