Build measurable fluency with AI tools

Building AI Skills

Develop AI competencies through structured practice, real evaluation frameworks, and deliberate technique. Move from casual use to confident, critical mastery.

Three Core Competencies

The skills that separate confident users from casual ones

Everyone can type a question into an AI. Mastery is the repeatable ability to shape, evaluate, and reason with its output. These three skills compound.

01

Prompt Engineering

The craft of designing instructions that reliably produce the output you need. Structure, context, examples, constraints, and iteration, not magic words.

What mastery looks like
You can predict how a prompt will perform before sending it, and diagnose why it failed when it doesn't.
02

Output Evaluation

A systematic way to judge AI responses for accuracy, bias, completeness, and appropriateness to your task. Trust is earned per output, never granted by default.

What mastery looks like
You catch hallucinations and subtle errors without reading every word twice, because you know where to look.
03

Critical Thinking with AI

Using AI to sharpen your reasoning without outsourcing it. Knowing when to accept, when to push back, and when to set the tool aside entirely.

What mastery looks like
Your thinking is better with AI than without it, and you can tell the difference between agreement and insight.
Where are you?

A seven-question self-check

No tracking, no accounts. Takes two minutes and gives you a realistic read on your current level and where to focus next.

Skill Pathway

Three levels, three different questions to ask yourself

Filter by your role above to see examples tuned to your work. Filter by level to jump to where you are.

Level 01 · Foundational

Can I get a useful response on my first try?

You've used AI before, but results feel inconsistent. The goal at this level is to reliably get a response that's in the right ballpark without starting over five times.

Est. time: 3–5 hours
Focus: Clarity & context

At this level, you can

  • Write prompts that include a role, a task, and a format so the output is structured and usable.
  • Give the model relevant context instead of assuming it knows your situation.
  • Recognize when an answer is generic or off-topic and ask a sharper follow-up.
  • Tell the difference between a factual question and a generative one.
Level 02 · Developing

Can I shape output to exactly what I need?

You get decent responses, but complex tasks still take many rounds. The goal here is technique: you learn specific moves that consistently improve results.

Est. time: 8–12 hours
Focus: Technique & iteration

At this level, you can

  • Apply few-shot examples and chain-of-thought to get reliable results on reasoning tasks.
  • Break a complex request into steps and run them as a sequence rather than one mega-prompt.
  • Set explicit constraints (length, tone, format, what to avoid) and have the model respect them.
  • Evaluate an output against your own criteria, not just "looks right".
  • Choose between tools based on the task's actual shape, not habit.
Level 03 · Advanced

Can I integrate AI into my real workflow without it integrating me?

You use AI fluently; now you need discipline. The goal at this level is judgment: knowing when to lean in, when to pull back, and how to keep your own voice.

Est. time: Ongoing
Focus: Judgment & integration

At this level, you can

  • Design multi-step workflows that combine AI generation, your own edits, and independent verification.
  • Identify when a task is genuinely improved by AI and when using AI will degrade your thinking.
  • Evaluate two model outputs side-by-side using explicit criteria, not gut feel.
  • Build reusable prompt patterns you refine over time rather than writing from scratch each session.
  • Hold a clear, honest view of where your work ends and the model's contribution begins.
The Prompt Lab

Seven techniques with before/after examples

Each technique is a specific, named move. See it fail, see it work, and understand why.

Be clear and direct

Say exactly what you want, to whom, in what form.

Most "bad AI output" is a response to an ambiguous prompt. Models pattern-match to the most likely interpretation of vague input. Specifying your task, audience, and desired format resolves that ambiguity up front.

Before
tell me about photosynthesis
After
Explain photosynthesis to a high school biology student who already knows about cell structure. In 150 words, cover: what enters, what exits, and why it matters for the carbon cycle. Use one concrete analogy.
Why it works

The second prompt fixes four things: the audience (so the vocabulary is right), the length (so the model doesn't over-deliver), the structure (so key points aren't missed), and the style (so it's concrete instead of abstract).
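If you build prompts in a script rather than typing them by hand, the same four levers can be made explicit as parameters. A minimal sketch in Python; the build_prompt helper and its field names are illustrative, not any particular tool's API.

```python
# Illustrative only: a tiny helper that makes the four levers explicit.
# The function and field names are hypothetical, not any tool's API.
def build_prompt(task: str, audience: str, length: str,
                 cover: str, style: str = "") -> str:
    """Assemble a clear, direct prompt from task, audience, length,
    required coverage, and optional style."""
    lines = [
        task,
        f"Audience: {audience}",
        f"Length: {length}",
        f"Cover: {cover}",
    ]
    if style:
        lines.append(f"Style: {style}")
    return "\n".join(lines)

print(build_prompt(
    task="Explain photosynthesis.",
    audience="a high school biology student who already knows cell structure",
    length="150 words",
    cover="what enters, what exits, and why it matters for the carbon cycle",
    style="use one concrete analogy",
))
```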

Role prompting

Give the model a perspective before giving it a task.

Assigning a role shapes vocabulary, default assumptions, and depth of explanation. It's not about trickery; it's about specifying the voice and expertise level you want in the response.

Before
review this paragraph for problems
After
You are an academic writing tutor for undergraduate students. Review the paragraph below for three things: clarity of the main claim, evidence-to-claim fit, and topic sentence strength. Suggest one specific revision for each. Paragraph: [...]
Why it works

The role (academic writing tutor) narrows the feedback to what matters for the writer. The three-point structure prevents a generic "here are ten things" list and forces prioritization.

Few-shot examples

Show two or three examples of the output you want. Then ask for another.

Describing a style or format in words is imprecise. Showing examples is dramatically more precise. Few-shot prompting is the single biggest quality upgrade for any task with a consistent structure (summaries, rewrites, classifications, formatting).

Before
summarize these meeting notes in a useful way
After
Summarize meeting notes in this format:

Example 1:
Decisions: [comma-separated]
Open questions: [as questions]
Next steps: [who · what · by when]

Example 2:
Decisions: Moving to Monday standups; dropping the Friday retro
Open questions: Who owns the analytics dashboard?
Next steps: Mohamad · draft outline · Thursday

Now summarize these notes: [...]
Why it works

The examples teach the model the exact shape, tone, and granularity you want. Patterns communicate what instructions can't.
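If you keep your worked examples in one place, assembling the few-shot prompt can be automated. A minimal sketch in Python, assuming a small examples list; few_shot_prompt is a hypothetical helper, and you would send its output to whatever tool you use.

```python
# Illustrative only: keep your worked examples in one place and reuse them.
examples = [
    {
        "notes": "[raw notes for example 1]",
        "summary": (
            "Decisions: Moving to Monday standups; dropping the Friday retro\n"
            "Open questions: Who owns the analytics dashboard?\n"
            "Next steps: Mohamad · draft outline · Thursday"
        ),
    },
    # Two or three pairs are usually enough for a consistent format.
]

def few_shot_prompt(new_notes: str) -> str:
    """Build a few-shot prompt: the format, worked examples, then the new input."""
    blocks = ["Summarize meeting notes in this format."]
    for i, ex in enumerate(examples, start=1):
        blocks.append(f"Example {i}:\nNotes: {ex['notes']}\nSummary:\n{ex['summary']}")
    blocks.append(f"Now summarize these notes:\n{new_notes}")
    return "\n\n".join(blocks)

print(few_shot_prompt("[paste new meeting notes here]"))
```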

Chain-of-thought

Ask the model to think before it answers.

On reasoning tasks (math, logic, multi-step analysis), asking the model to explain its thinking step-by-step before giving a final answer measurably improves accuracy. You also get a reasoning trace you can check for errors.

Before
A lab has 3 researchers. Each runs 4 experiments per week, each experiment needs 2 runs, and 15% of runs fail and need redoing. How many runs per week?
After
A lab has 3 researchers. Each runs 4 experiments per week, each experiment needs 2 runs, and 15% of runs fail and need redoing. Think step by step:
1. Compute total experiments.
2. Compute baseline runs.
3. Compute additional runs from 15% failure rate.
4. Sum for weekly total.
Show each step, then give the final number on its own line.
Why it works

Forcing the model to externalize intermediate steps reduces the chance of skipping a computation, and lets you catch an error at step 2 before it contaminates step 4.
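If you want to check the model's trace against the actual arithmetic, a few lines of Python are enough. This sketch assumes each failed run is redone exactly once and reruns don't fail again; under that reading, the weekly total comes to 27.6, or about 28 runs.

```python
# Sanity-check the lab-runs arithmetic yourself before trusting the trace.
# Assumes each failed run is redone exactly once (reruns don't fail again).
researchers = 3
experiments_per_researcher = 4   # per week
runs_per_experiment = 2
failure_rate = 0.15

experiments = researchers * experiments_per_researcher    # 12
baseline_runs = experiments * runs_per_experiment         # 24
redo_runs = baseline_runs * failure_rate                  # 3.6
total_runs = baseline_runs + redo_runs                    # 27.6

print(f"Experiments per week: {experiments}")
print(f"Baseline runs: {baseline_runs}")
print(f"Redo runs: {redo_runs}")
print(f"Total runs per week: {total_runs} (about {round(total_runs)})")
```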

Decomposition

Break big tasks into a sequence of smaller prompts.

A single mega-prompt asking for analysis + writing + formatting + citations will produce mediocre output in all four dimensions. Running each step as its own prompt, with the previous output as input to the next, produces noticeably better results with clearer points to verify.

Before
Read this article and write me a 500-word literature review paragraph with citations comparing it to two other relevant sources.
After
Step 1: Summarize the article's main argument and method in 80 words. (verify → next)
Step 2: Based on the summary, list three specific claims this article makes that I could compare to other sources. (verify → next)
Step 3: For each claim, suggest the type of source that would support or challenge it. (verify → next)
Step 4: Draft a 500-word paragraph comparing the article to [sources I provide after step 3].
Why it works

You verify at each step. Errors can't compound silently. And each sub-task is simple enough that the model performs well on it.
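If you run prompts from a script rather than a chat window, the same decomposition looks roughly like the sketch below. ask() and verify() are hypothetical helpers, not a specific library's API; the point is the structure, where each step consumes the checked output of the one before it.

```python
# Illustrative only: ask() stands in for whatever AI client you actually use.
def ask(prompt: str) -> str:
    """Hypothetical stand-in for a real API call; here it returns a stub."""
    return f"[model output for: {prompt[:60]}...]"

def verify(label: str, text: str) -> str:
    """Show each intermediate output so you can sanity-check it
    before it feeds the next step; in practice, pause and edit here."""
    print(f"--- {label} ---\n{text}\n")
    return text

article = "[paste or load the article text here]"

summary = verify("Step 1: summary", ask(
    f"Summarize the article's main argument and method in 80 words:\n{article}"))

claims = verify("Step 2: claims", ask(
    f"Based on this summary, list three specific claims I could compare "
    f"to other sources:\n{summary}"))

source_types = verify("Step 3: source types", ask(
    f"For each claim, suggest the type of source that would support or "
    f"challenge it:\n{claims}"))

# Step 4 (the 500-word comparison) waits for sources you pick yourself,
# so it isn't automated here.
```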

Constraints & format

Tell the model what NOT to do, what length, what shape.

Negative constraints ("don't use jargon", "no introduction", "no more than 3 bullets") are often more useful than positive ones. Format constraints (exact length, specific structure, required sections) keep the output usable instead of a wall of prose you then have to re-shape.

Before
write me a short email asking my professor for an extension
After
Write a short email to my professor asking for a 48-hour extension on the midterm paper.
Constraints:
- Under 120 words
- No apologizing more than once
- Do not mention reasons I haven't specified
- Plain text, no markdown
Context: I've been managing a family situation I'd rather not detail.
Why it works

Constraints prevent the model's default tendencies (over-apologizing, over-explaining, generic sympathy) and protect your voice. The "do not mention reasons I haven't specified" line alone rescues most of these emails.

Iterative refinement

Treat the first response as a draft, not the deliverable.

Novice users accept the first output or start over from scratch. Skilled users refine. Ask what's weak. Ask for the strongest version of a specific point. Ask for three alternatives to a passage you don't like. The best prompt is almost always the third one.

Before
(get mediocre first response) make it better
After
(get mediocre first response)
Tighten the second paragraph: it's vague where it needs specifics. Replace 'many studies' with either a number or a specific example.
Then, rewrite the opening sentence three different ways: one factual, one with a question, one starting from a specific observation. Show all three; I'll pick.
Why it works

"Make it better" has no target. Specific, localized revision requests with concrete replacement instructions give the model something to aim at. And asking for alternatives lets you choose instead of edit.

The S.C.A.N.S. Framework

An evaluation checklist for every AI output

Use this the first few times you review AI work. After a month it becomes automatic, a habit instead of a checklist.

S

Sources

Where does this claim actually come from?
  • Every specific fact, statistic, or quote can be traced to a real source I can check.
  • Citations are real publications, authors, and dates, not plausible-sounding inventions.
  • For domain-specific claims, the source is appropriate (peer-reviewed, authoritative, recent enough).
C

Claims

Is the reasoning actually valid?
  • The conclusion follows from the evidence presented, not from fluent language that feels right.
  • Claims presented as settled aren't actually contested in the field; where real disagreement exists, the output acknowledges it.
  • Specific numbers, dates, and names are verifiable against at least one independent source.
  • Generalizations are accurate at the level they're stated (not overreaching from a single study).
A

Assumptions

What is this output quietly taking for granted?
  • The output doesn't assume a Western, English-speaking, or US-centric context when mine is different.
  • Framing choices (which side of a debate is "standard", whose perspective is centered) are acknowledged, not smuggled in.
  • Gendered, cultural, or disciplinary defaults match my actual context.
N

Noise

What's missing or padded?
  • The response covers what I actually asked, not a generic version of a similar question.
  • Important caveats or limitations aren't omitted because they'd complicate the answer.
  • The output isn't padded with preamble, restated questions, or hedges that add no content.
S

Self-check

Would I stand behind this if asked?
  • I've read the full output, not skimmed it, and I understand every claim it makes.
  • If a professor, peer, or editor asked me where a specific line came from, I could answer honestly.
  • My own contribution is clearly more than light editing, or I've represented the collaboration accurately.
  • The final version reflects my thinking, not just fluent language I agree with after the fact.
Deliberate Practice

Challenges you can run in 20 minutes each

Self-guided, no setup. Three challenges per level, each mapped to a core competency. Use the tabs below to switch levels.

Prompt Engineering · Foundational

Spot the weak prompt

Take a prompt you wrote in the past week. Rewrite it three times: once adding context, once adding format, once adding constraints.

Task
  1. Find a real prompt you sent to an AI tool recently.
  2. Rewrite it three times (context / format / constraints).
  3. Run all four versions on the same model.
  4. Rank the outputs and note what changed.
Reflection

Which version produced the biggest quality jump? Why? Which rewrite would you always apply from now on?

Evaluation · Developing

Hunt the hallucination

Ask an AI for five specific facts in a niche area of your major. Verify each one independently. Track how it fails.

Task
  1. Pick a narrow topic you know well (a subfield, a specific methodology, a niche event).
  2. Ask for five specific factual claims with citations.
  3. Verify each claim and each citation using a search engine and at least one scholarly source.
  4. Categorize any errors: fabricated citation, wrong date, wrong attribution, correct but misleading, etc.
Reflection

Which kinds of errors were easiest to spot? Which were most convincing? What heuristic would you use next time to catch similar errors faster?

Critical Thinking · Foundational

Argue the other side

Take a position you actually hold. Ask the AI for the three strongest counterarguments. Actually engage with them.

Task
  1. State a belief you hold about your field (methodological, ethical, interpretive).
  2. Ask the AI for the three strongest challenges to that belief, steel-manned.
  3. Write a response to each that doesn't dismiss it.
  4. Notice which ones you couldn't answer well.
Reflection

Did the AI generate challenges you hadn't considered, or mostly predictable ones? Where did it help you think, and where did it feel like productivity theater?

Prompt Engineering · Developing

Decompose and verify

Take a complex task you'd normally write one prompt for. Break it into 5 sequential steps. Verify at each handoff.

Task
  1. Pick a multi-part task from your real work (analysis, outline + draft, summary + comparison).
  2. Write it out as 5 smaller prompts, each consuming the previous output.
  3. Run them in sequence, reading each result before proceeding.
  4. Compare end result to a one-shot version of the same task.
Reflection

Where in the chain did you catch something you would have missed? Was the extra time worth it? For which task types will you use this pattern by default?

Evaluation · Advanced

Two tools, one task, written rubric

Run the same non-trivial task on two different AI tools. Evaluate against a rubric YOU write before you see either output.

Task
  1. Write a 5-criterion rubric for your task before generating anything (see the sketch after this list).
  2. Run the same prompt on two different AI tools.
  3. Score both outputs against your rubric. No gut feel.
  4. Write one paragraph explaining the pattern of differences.
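If it helps to keep the discipline honest, the rubric can be written down as data before either output exists. A minimal sketch in Python; the five criteria and the scores are illustrative placeholders, not a prescribed rubric.

```python
# Illustrative only: the criteria and scores below are placeholders.
rubric = [
    "Answers the question actually asked",
    "Specific claims are verifiable",
    "Follows the requested structure and length",
    "Tone fits the intended audience",
    "No padding or restated question",
]

# Fill these in only after both outputs exist (one 1-5 score per criterion).
scores = {
    "Tool A": [4, 3, 5, 4, 3],
    "Tool B": [5, 2, 4, 4, 5],
}

for i, criterion in enumerate(rubric):
    a, b = scores["Tool A"][i], scores["Tool B"][i]
    winner = "tie" if a == b else ("A" if a > b else "B")
    print(f"{criterion}: A={a}  B={b}  ({winner})")
```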
Reflection

Did your rubric capture what actually mattered, or did you find yourself wanting criteria you hadn't written down? What does that tell you about your implicit standards?

Critical Thinking · Advanced

The AI-off week

For seven days, refuse to use AI for one specific type of task you normally rely on it for. Journal the change.

Task
  1. Pick a task you use AI for several times a week (brainstorming, drafting, summarizing, explaining).
  2. Commit to a seven-day AI-off window for that task only.
  3. Keep a short note each day: what was harder, what was slower, what surprised you.
  4. At the end, decide what you'll resume using AI for, and what you'll keep doing unassisted.
Reflection

What did you lose? What did you find? Where had AI been genuinely helping, and where had it been doing the thinking that was supposed to be yours?

Evaluation · Foundational

Summarize something you already know

Have AI summarize a topic, paper, or book you know deeply. Compare line by line to what you actually know and catch every distortion.

Task
  1. Pick content you know deeply (a paper you wrote, a topic you teach, a book you've read carefully, an event you lived through).
  2. Ask AI to summarize it in 300 words.
  3. Compare the summary line by line to what you actually know.
  4. Note every glossed nuance, missing distinction, and outright error.
Reflection

Where did AI confidently say something wrong? Where did it capture the surface but miss what actually mattered? What does that pattern tell you about trusting AI summaries on topics you don't already know well?

Critical Thinking · Developing

Your position first, then AI's

Write your own analysis before showing AI. Then get AI's pushback. Compare to the reverse order on a separate question.

Task
  1. Pick a question or decision that genuinely matters (a research direction, an argument you're building, a choice between approaches).
  2. Write your own analysis in at least 200 words with no AI assistance.
  3. Show your analysis to AI and ask for gaps, weaknesses, and counterpoints.
  4. On a separate question, reverse the order: ask AI first, then form your own position.
Reflection

Which order produced clearer thinking? Which produced better final output? Where did AI's pushback genuinely sharpen your view, and where did it just add noise you ended up dismissing? When should you use each pattern?

Prompt Engineering · Advanced

Build and test a reusable template

Create a prompt template for a task you do repeatedly. Stress-test it against 5 genuinely different inputs. Iterate until it holds.

Task
  1. Identify a task you do with AI at least weekly (summarizing, feedback, outlining, classification, translation).
  2. Write a prompt template with explicit variable slots (a sketch follows this list).
  3. Run it against 5 genuinely different inputs (not minor variations of the same thing).
  4. When it fails on an input, revise the template and re-test against all 5.
  5. Document the final version AND the conditions under which it still breaks.
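If your recurring task lives in a script, the template itself can be as simple as a format string with named slots. A minimal sketch in Python; the slot names and the example values (borrowed from the role-prompting example above) are illustrative.

```python
# Illustrative only: the slot names and example values are hypothetical.
TEMPLATE = """You are {role}.
Review the {artifact_type} below for: {criteria}.
Suggest one specific revision for each point.
Constraints: {constraints}

{artifact}
"""

def fill(**slots: str) -> str:
    """Fill the template; raises KeyError if a slot is missing,
    which is exactly the failure you want to catch while testing."""
    return TEMPLATE.format(**slots)

print(fill(
    role="an academic writing tutor for undergraduate students",
    artifact_type="paragraph",
    criteria="clarity of the main claim, evidence-to-claim fit, topic sentence strength",
    constraints="under 200 words, no bullet lists",
    artifact="[paste the paragraph here]",
))
```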
Reflection

What kinds of inputs caused the most revision? What was the template unable to handle even after iteration? At what point did you stop improving it, and why? Is this template now worth keeping permanently, or did testing reveal that the task isn't actually suited for templating?

Curated Resources

Where to go deeper

Hand-picked: no course aggregators or marketing pages. Every link opens to a real resource, verified for this page.

Common Questions

Frequently asked

Is this page for students or for faculty?
Both. The skills covered here (prompt engineering, evaluation, critical thinking with AI) apply equally to writing a paper and designing a syllabus. Use the role filter at the top to see practice examples tuned to your work, and feel free to read across: the faculty examples are often useful to students and vice versa.

How is this different from New to AI?
New to AI is orientation: what AI is, how it works, how to start using it responsibly. Building AI Skills is craft: how to get measurably better once you're already using these tools. If you're still deciding what ChatGPT or Claude actually does, start with New to AI. If you're using them regularly but feel your output is inconsistent, start here.

How does this relate to the teaching and research pages?
Those pages cover applications: using AI for pedagogy, using AI for scholarship. Building AI Skills covers the underlying craft that makes both of those applications work better. Once you've built fluency here, the techniques transfer to any context, including teaching and research.

Which AI tool should I practice with?
Any major frontier model works for the techniques covered here. Claude, ChatGPT, and Gemini all respond well to clear prompts, few-shot examples, chain-of-thought, and iterative refinement. If you want to feel the difference between tools, try the same prompt on two of them (see LM Arena in the Sandboxes tab). The skills transfer.

How long does it take to move up a level?
Foundational to Developing: roughly 8–12 hours of deliberate practice over a few weeks. Developing to Advanced is less about time and more about reps across varied tasks: expect months of ongoing use before the judgment in the Advanced level feels natural. The time estimates on the pathway cards are rough averages, not targets.

Am I allowed to use AI in my coursework?
It depends on your instructor's policy for the specific course, which should be stated in your syllabus. When in doubt, ask. This page teaches skills that are valuable regardless of the policy, but how you apply them to graded work is governed by each course's rules.

How often is this page updated?
Resources and examples are reviewed each term. AI tools change quickly, but the underlying skills on this page (clear prompting, structured evaluation, deliberate practice) have stayed durable across model generations. We refresh links and course pointers as they shift, and we keep the framework stable.