Build measurable fluency with AI tools

Building AI Skills

Develop AI competencies through structured practice, real evaluation frameworks, and deliberate technique. Move from casual use to confident, critical mastery.

Three Core Competencies

The skills that separate confident users from casual ones

Everyone can type a question into an AI. Mastery is the repeatable ability to shape, evaluate, and reason with its output. These three skills compound.

01

Prompt Engineering

The craft of designing instructions that reliably produce the output you need. Structure, context, examples, constraints, and iteration, not magic words.

What mastery looks like
You can predict how a prompt will perform before sending it, and diagnose why it failed when it doesn't.
02

Output Evaluation

A systematic way to judge AI responses for accuracy, bias, completeness, and appropriateness to your task. Trust is earned per output, never granted by default.

What mastery looks like
You catch hallucinations and subtle errors without reading every word twice, because you know where to look.
03

Critical Thinking with AI

Using AI to sharpen your reasoning without outsourcing it. Knowing when to accept, when to push back, and when to set the tool aside entirely.

What mastery looks like
Your thinking is better with AI than without it, and you can tell the difference between agreement and insight.
Where are you?

A seven-question self-check

No tracking, no accounts. Takes two minutes and gives you a realistic read on your current level and where to focus next.

Skill Pathway

Three levels, three different questions to ask yourself

Filter by your role above to see examples tuned to your work. Filter by level to jump to where you are.

Level 01 · Foundational

Can I get a useful response on my first try?

You've used AI before, but results feel inconsistent. The goal at this level is to reliably get a response that's in the right ballpark without starting over five times.

Est. time: 3–5 hours
Focus: Clarity & context

At this level, you can

  • Write prompts that include a role, a task, and a format so the output is structured and usable.
  • Give the model relevant context instead of assuming it knows your situation.
  • Recognize when an answer is generic or off-topic and ask a sharper follow-up.
  • Tell the difference between a factual question and a generative one.
Level 02 · Developing

Can I shape output to exactly what I need?

You get decent responses, but complex tasks still take many rounds. The goal here is technique: you learn specific moves that consistently improve results.

Est. time: 8–12 hours
Focus: Technique & iteration

At this level, you can

  • Apply few-shot examples and chain-of-thought to get reliable results on reasoning tasks.
  • Break a complex request into steps and run them as a sequence rather than one mega-prompt.
  • Set explicit constraints (length, tone, format, what to avoid) and have the model respect them.
  • Evaluate an output against your own criteria, not just "looks right".
  • Choose between tools based on the task's actual shape, not habit.
Level 03 · Advanced

Can I integrate AI into my real workflow without it integrating me?

You use AI fluently; now you need discipline. The goal at this level is judgment: knowing when to lean in, when to pull back, and how to keep your own voice.

Est. time: Ongoing
Focus: Judgment & integration

At this level, you can

  • Design multi-step workflows that combine AI generation, your own edits, and independent verification.
  • Identify when a task is genuinely improved by AI and when using AI will degrade your thinking.
  • Evaluate two model outputs side-by-side using explicit criteria, not gut feel.
  • Build reusable prompt patterns you refine over time rather than writing from scratch each session.
  • Hold a clear, honest view of where your work ends and the model's contribution begins.
The Prompt Lab

Seven techniques with before/after examples

Each technique is a specific, named move. See it fail, see it work, and understand why.

Be clear and direct

Say exactly what you want, to whom, in what form.

Most "bad AI output" is a response to an ambiguous prompt. Models pattern-match to the most likely interpretation of vague input. Specifying your task, audience, and desired format resolves that ambiguity up front.

Before
tell me about photosynthesis
After
Explain photosynthesis to a high school biology student who already knows about cell structure. In 150 words, cover: what enters, what exits, and why it matters for the carbon cycle. Use one concrete analogy.
Why it works

The second prompt fixes four things: the audience (so the vocabulary is right), the length (so the model doesn't over-deliver), the structure (so key points aren't missed), and the style (so it's concrete instead of abstract).
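If you build prompts in a script rather than typing them by hand, the same four levers can be made explicit as parameters. A minimal sketch in Python; the build_prompt helper and its field names are illustrative, not any particular tool's API.

```python
# Illustrative only: a tiny helper that makes the four levers explicit.
# The function and field names are hypothetical, not any tool's API.
def build_prompt(task: str, audience: str, length: str,
                 cover: str, style: str = "") -> str:
    """Assemble a clear, direct prompt from task, audience, length,
    required coverage, and optional style."""
    lines = [
        task,
        f"Audience: {audience}",
        f"Length: {length}",
        f"Cover: {cover}",
    ]
    if style:
        lines.append(f"Style: {style}")
    return "\n".join(lines)

print(build_prompt(
    task="Explain photosynthesis.",
    audience="a high school biology student who already knows cell structure",
    length="150 words",
    cover="what enters, what exits, and why it matters for the carbon cycle",
    style="use one concrete analogy",
))
```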

Role prompting

Give the model a perspective before giving it a task.

Assigning a role shapes vocabulary, default assumptions, and depth of explanation. It's not about trickery; it's about specifying the voice and expertise level you want in the response.

Before
review this paragraph for problems
After
You are an academic writing tutor for undergraduate students. Review the paragraph below for three things: clarity of the main claim, evidence-to-claim fit, and topic sentence strength. Suggest one specific revision for each. Paragraph: [...]
Why it works

The role (academic writing tutor) narrows the feedback to what matters for the writer. The three-point structure prevents a generic "here are ten things" list and forces prioritization.

Few-shot examples

Show two or three examples of the output you want. Then ask for another.

Describing a style or format in words is imprecise. Showing examples is dramatically more precise. Few-shot prompting is the single biggest quality upgrade for any task with a consistent structure (summaries, rewrites, classifications, formatting).

Before
summarize these meeting notes in a useful way
After
Summarize meeting notes in this format:

Example 1:
Decisions: [comma-separated]
Open questions: [as questions]
Next steps: [who · what · by when]

Example 2:
Decisions: Moving to Monday standups; dropping the Friday retro
Open questions: Who owns the analytics dashboard?
Next steps: Mohamad · draft outline · Thursday

Now summarize these notes: [...]
Why it works

The examples teach the model the exact shape, tone, and granularity you want. Patterns communicate what instructions can't.
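If you keep your worked examples in one place, assembling the few-shot prompt can be automated. A minimal sketch in Python, assuming a small examples list; few_shot_prompt is a hypothetical helper, and you would send its output to whatever tool you use.

```python
# Illustrative only: keep your worked examples in one place and reuse them.
examples = [
    {
        "notes": "[raw notes for example 1]",
        "summary": (
            "Decisions: Moving to Monday standups; dropping the Friday retro\n"
            "Open questions: Who owns the analytics dashboard?\n"
            "Next steps: Mohamad · draft outline · Thursday"
        ),
    },
    # Two or three pairs are usually enough for a consistent format.
]

def few_shot_prompt(new_notes: str) -> str:
    """Build a few-shot prompt: the format, worked examples, then the new input."""
    blocks = ["Summarize meeting notes in this format."]
    for i, ex in enumerate(examples, start=1):
        blocks.append(f"Example {i}:\nNotes: {ex['notes']}\nSummary:\n{ex['summary']}")
    blocks.append(f"Now summarize these notes:\n{new_notes}")
    return "\n\n".join(blocks)

print(few_shot_prompt("[paste new meeting notes here]"))
```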

Chain-of-thought

Ask the model to think before it answers.

On reasoning tasks (math, logic, multi-step analysis), asking the model to explain its thinking step-by-step before giving a final answer measurably improves accuracy. You also get a reasoning trace you can check for errors.

Before
A lab has 3 researchers. Each runs 4 experiments per week, each experiment needs 2 runs, and 15% of runs fail and need redoing. How many runs per week?
After
A lab has 3 researchers. Each runs 4 experiments per week, each experiment needs 2 runs, and 15% of runs fail and need redoing. Think step by step:
1. Compute total experiments.
2. Compute baseline runs.
3. Compute additional runs from 15% failure rate.
4. Sum for weekly total.
Show each step, then give the final number on its own line.
Why it works

Forcing the model to externalize intermediate steps reduces the chance of skipping a computation, and lets you catch an error at step 2 before it contaminates step 4.
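If you want to check the model's trace against the actual arithmetic, a few lines of Python are enough. This sketch assumes each failed run is redone exactly once and reruns don't fail again; under that reading, the weekly total comes to 27.6, or about 28 runs.

```python
# Sanity-check the lab-runs arithmetic yourself before trusting the trace.
# Assumes each failed run is redone exactly once (reruns don't fail again).
researchers = 3
experiments_per_researcher = 4   # per week
runs_per_experiment = 2
failure_rate = 0.15

experiments = researchers * experiments_per_researcher    # 12
baseline_runs = experiments * runs_per_experiment         # 24
redo_runs = baseline_runs * failure_rate                  # 3.6
total_runs = baseline_runs + redo_runs                    # 27.6

print(f"Experiments per week: {experiments}")
print(f"Baseline runs: {baseline_runs}")
print(f"Redo runs: {redo_runs}")
print(f"Total runs per week: {total_runs} (about {round(total_runs)})")
```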

Decomposition

Break big tasks into a sequence of smaller prompts.

A single mega-prompt asking for analysis + writing + formatting + citations will produce mediocre output in all four dimensions. Running each step as its own prompt, with the previous output as input to the next, produces noticeably better results with clearer points to verify.

Before
Read this article and write me a 500-word literature review paragraph with citations comparing it to two other relevant sources.
After
Step 1: Summarize the article's main argument and method in 80 words. (verify → next)
Step 2: Based on the summary, list three specific claims this article makes that I could compare to other sources. (verify → next)
Step 3: For each claim, suggest the type of source that would support or challenge it. (verify → next)
Step 4: Draft a 500-word paragraph comparing the article to [sources I provide after step 3].
Why it works

You verify at each step. Errors can't compound silently. And each sub-task is simple enough that the model performs well on it.
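If you run prompts from a script rather than a chat window, the same decomposition looks roughly like the sketch below. ask() and verify() are hypothetical helpers, not a specific library's API; the point is the structure, where each step consumes the checked output of the one before it.

```python
# Illustrative only: ask() stands in for whatever AI client you actually use.
def ask(prompt: str) -> str:
    """Hypothetical stand-in for a real API call; here it returns a stub."""
    return f"[model output for: {prompt[:60]}...]"

def verify(label: str, text: str) -> str:
    """Show each intermediate output so you can sanity-check it
    before it feeds the next step; in practice, pause and edit here."""
    print(f"--- {label} ---\n{text}\n")
    return text

article = "[paste or load the article text here]"

summary = verify("Step 1: summary", ask(
    f"Summarize the article's main argument and method in 80 words:\n{article}"))

claims = verify("Step 2: claims", ask(
    f"Based on this summary, list three specific claims I could compare "
    f"to other sources:\n{summary}"))

source_types = verify("Step 3: source types", ask(
    f"For each claim, suggest the type of source that would support or "
    f"challenge it:\n{claims}"))

# Step 4 (the 500-word comparison) waits for sources you pick yourself,
# so it isn't automated here.
```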

Constraints & format

Tell the model what NOT to do, what length, what shape.

Negative constraints ("don't use jargon", "no introduction", "no more than 3 bullets") are often more useful than positive ones. Format constraints (exact length, specific structure, required sections) keep the output usable instead of a wall of prose you then have to re-shape.

Before
write me a short email asking my professor for an extension
After
Write a short email to my professor asking for a 48-hour extension on the midterm paper.
Constraints:
- Under 120 words
- No apologizing more than once
- Do not mention reasons I haven't specified
- Plain text, no markdown
Context: I've been managing a family situation I'd rather not detail.
Why it works

Constraints prevent the model's default tendencies (over-apologizing, over-explaining, generic sympathy) and protect your voice. The "do not mention reasons I haven't specified" line alone rescues most of these emails.

Iterative refinement

Treat the first response as a draft, not the deliverable.

Novice users accept the first output or start over from scratch. Skilled users refine. Ask what's weak. Ask for the strongest version of a specific point. Ask for three alternatives to a passage you don't like. The best prompt is almost always the third one.

Before
(get mediocre first response) make it better
After
(get mediocre first response)
Tighten the second paragraph: it's vague where it needs specifics. Replace 'many studies' with either a number or a specific example.
Then, rewrite the opening sentence three different ways: one factual, one with a question, one starting from a specific observation. Show all three; I'll pick.
Why it works

"Make it better" has no target. Specific, localized revision requests with concrete replacement instructions give the model something to aim at. And asking for alternatives lets you choose instead of edit.

The S.C.A.N.S. Framework

An evaluation checklist for every AI output

Use this the first few times you review AI work. After a month it becomes automatic, a habit instead of a checklist.

S

Sources

Where does this claim actually come from?
  • Every specific fact, statistic, or quote can be traced to a real source I can check.
  • Citations are real publications, authors, and dates, not plausible-sounding inventions.
  • For domain-specific claims, the source is appropriate (peer-reviewed, authoritative, recent enough).
C

Claims

Is the reasoning actually valid?
  • The conclusion follows from the evidence presented, not from fluent language that feels right.
  • Claims presented as settled aren't actually contested in the field; where real disagreement exists, the output acknowledges it.
  • Specific numbers, dates, and names are verifiable against at least one independent source.
  • Generalizations are accurate at the level they're stated (not overreaching from a single study).
A

Assumptions

What is this output quietly taking for granted?
  • The output doesn't assume a Western, English-speaking, or US-centric context when mine is different.
  • Framing choices (which side of a debate is "standard", whose perspective is centered) are acknowledged, not smuggled in.
  • Gendered, cultural, or disciplinary defaults match my actual context.
N

Noise

What's missing or padded?
  • The response covers what I actually asked, not a generic version of a similar question.
  • Important caveats or limitations aren't omitted because they'd complicate the answer.
  • The output isn't padded with preamble, restated questions, or hedges that add no content.
S

Self-check

Would I stand behind this if asked?
  • I've read the full output, not skimmed it, and I understand every claim it makes.
  • If a professor, peer, or editor asked me where a specific line came from, I could answer honestly.
  • My own contribution is clearly more than light editing, or I've represented the collaboration accurately.
  • The final version reflects my thinking, not just fluent language I agree with after the fact.
Deliberate Practice

Challenges you can run in 20 minutes each

Self-guided, no setup. Three challenges per level, each mapped to a core competency. Use the tabs below to switch levels.

Prompt Engineering · Foundational

Spot the weak prompt

Take a prompt you wrote in the past week. Rewrite it three times: once adding context, once adding format, once adding constraints.

Task
  1. Find a real prompt you sent to an AI tool recently.
  2. Rewrite it three times (context / format / constraints).
  3. Run all four versions on the same model.
  4. Rank the outputs and note what changed.
Reflection

Which version produced the biggest quality jump? Why? Which rewrite would you always apply from now on?

Evaluation · Developing

Hunt the hallucination

Ask an AI for five specific facts in a niche area of your major. Verify each one independently. Track how it fails.

Task
  1. Pick a narrow topic you know well (a subfield, a specific methodology, a niche event).
  2. Ask for five specific factual claims with citations.
  3. Verify each claim and each citation using a search engine and at least one scholarly source.
  4. Categorize any errors: fabricated citation, wrong date, wrong attribution, correct but misleading, etc.
Reflection

Which kinds of errors were easiest to spot? Which were most convincing? What heuristic would you use next time to catch similar errors faster?

Critical Thinking · Foundational

Argue the other side

Take a position you actually hold. Ask the AI for the three strongest counterarguments. Actually engage with them.

Task
  1. State a belief you hold about your field (methodological, ethical, interpretive).
  2. Ask the AI for the three strongest challenges to that belief, steel-manned.
  3. Write a response to each that doesn't dismiss it.
  4. Notice which ones you couldn't answer well.
Reflection

Did the AI generate challenges you hadn't considered, or mostly predictable ones? Where did it help you think, and where did it feel like productivity theater?

Prompt Engineering · Developing

Decompose and verify

Take a complex task you'd normally write one prompt for. Break it into 5 sequential steps. Verify at each handoff.

Task
  1. Pick a multi-part task from your real work (analysis, outline + draft, summary + comparison).
  2. Write it out as 5 smaller prompts, each consuming the previous output.
  3. Run them in sequence, reading each result before proceeding.
  4. Compare end result to a one-shot version of the same task.
Reflection

Where in the chain did you catch something you would have missed? Was the extra time worth it? For which task types will you use this pattern by default?

Evaluation · Advanced

Two tools, one task, written rubric

Run the same non-trivial task on two different AI tools. Evaluate against a rubric YOU write before you see either output.

Task
  1. Write a 5-criterion rubric for your task before generating anything (see the sketch after this list).
  2. Run the same prompt on two different AI tools.
  3. Score both outputs against your rubric. No gut feel.
  4. Write one paragraph explaining the pattern of differences.
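If it helps to keep the discipline honest, the rubric can be written down as data before either output exists. A minimal sketch in Python; the five criteria and the scores are illustrative placeholders, not a prescribed rubric.

```python
# Illustrative only: the criteria and scores below are placeholders.
rubric = [
    "Answers the question actually asked",
    "Specific claims are verifiable",
    "Follows the requested structure and length",
    "Tone fits the intended audience",
    "No padding or restated question",
]

# Fill these in only after both outputs exist (one 1-5 score per criterion).
scores = {
    "Tool A": [4, 3, 5, 4, 3],
    "Tool B": [5, 2, 4, 4, 5],
}

for i, criterion in enumerate(rubric):
    a, b = scores["Tool A"][i], scores["Tool B"][i]
    winner = "tie" if a == b else ("A" if a > b else "B")
    print(f"{criterion}: A={a}  B={b}  ({winner})")
```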
Reflection

Did your rubric capture what actually mattered, or did you find yourself wanting criteria you hadn't written down? What does that tell you about your implicit standards?

Critical Thinking · Advanced

The AI-off week

For seven days, refuse to use AI for one specific type of task you normally rely on it for. Journal the change.

Task
  1. Pick a task you use AI for several times a week (brainstorming, drafting, summarizing, explaining).
  2. Commit to a seven-day AI-off window for that task only.
  3. Keep a short note each day: what was harder, what was slower, what surprised you.
  4. At the end, decide what you'll resume using AI for, and what you'll keep doing unassisted.
Reflection

What did you lose? What did you find? Where had AI been genuinely helping, and where had it been doing the thinking that was supposed to be yours?

Evaluation · Foundational

Summarize something you already know

Have AI summarize a topic, paper, or book you know deeply. Compare line by line to what you actually know and catch every distortion.

Task
  1. Pick content you know deeply (a paper you wrote, a topic you teach, a book you've read carefully, an event you lived through).
  2. Ask AI to summarize it in 300 words.
  3. Compare the summary line by line to what you actually know.
  4. Note every glossed nuance, missing distinction, and outright error.
Reflection

Where did AI confidently say something wrong? Where did it capture the surface but miss what actually mattered? What does that pattern tell you about trusting AI summaries on topics you don't already know well?

Critical Thinking · Developing

Your position first, then AI's

Write your own analysis before showing AI. Then get AI's pushback. Compare to the reverse order on a separate question.

Task
  1. Pick a question or decision that genuinely matters (a research direction, an argument you're building, a choice between approaches).
  2. Write your own analysis in at least 200 words with no AI assistance.
  3. Show your analysis to AI and ask for gaps, weaknesses, and counterpoints.
  4. On a separate question, reverse the order: ask AI first, then form your own position.
Reflection

Which order produced clearer thinking? Which produced better final output? Where did AI's pushback genuinely sharpen your view, and where did it just add noise you ended up dismissing? When should you use each pattern?

Prompt Engineering · Advanced

Build and test a reusable template

Create a prompt template for a task you do repeatedly. Stress-test it against 5 genuinely different inputs. Iterate until it holds.

Task
  1. Identify a task you do with AI at least weekly (summarizing, feedback, outlining, classification, translation).
  2. Write a prompt template with explicit variable slots (a sketch follows this list).
  3. Run it against 5 genuinely different inputs (not minor variations of the same thing).
  4. When it fails on an input, revise the template and re-test against all 5.
  5. Document the final version AND the conditions under which it still breaks.
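If your recurring task lives in a script, the template itself can be as simple as a format string with named slots. A minimal sketch in Python; the slot names and the example values (borrowed from the role-prompting example above) are illustrative.

```python
# Illustrative only: the slot names and example values are hypothetical.
TEMPLATE = """You are {role}.
Review the {artifact_type} below for: {criteria}.
Suggest one specific revision for each point.
Constraints: {constraints}

{artifact}
"""

def fill(**slots: str) -> str:
    """Fill the template; raises KeyError if a slot is missing,
    which is exactly the failure you want to catch while testing."""
    return TEMPLATE.format(**slots)

print(fill(
    role="an academic writing tutor for undergraduate students",
    artifact_type="paragraph",
    criteria="clarity of the main claim, evidence-to-claim fit, topic sentence strength",
    constraints="under 200 words, no bullet lists",
    artifact="[paste the paragraph here]",
))
```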
Reflection

What kinds of inputs caused the most revision? What was the template unable to handle even after iteration? At what point did you stop improving it, and why? Is this template now worth keeping permanently, or did testing reveal that the task isn't actually suited for templating?

Curated Resources

Where to go deeper

Hand-picked: no course aggregators or marketing pages. Every link opens to a real resource, verified for this page.

Common Questions

Frequently asked

Is this page for students or for faculty?
Both. The skills covered here (prompt engineering, evaluation, critical thinking with AI) apply equally to writing a paper and designing a syllabus. Use the role filter at the top to see practice examples tuned to your work, and feel free to read across: the faculty examples are often useful to students and vice versa.

How is this different from New to AI?
New to AI is orientation: what AI is, how it works, how to start using it responsibly. Building AI Skills is craft: how to get measurably better once you're already using these tools. If you're still deciding what ChatGPT or Claude actually does, start with New to AI. If you're using them regularly but feel your output is inconsistent, start here.

How does this relate to the teaching and research pages?
Those pages cover applications: using AI for pedagogy, using AI for scholarship. Building AI Skills covers the underlying craft that makes both of those applications work better. Once you've built fluency here, the techniques transfer to any context, including teaching and research.

Which AI tool should I practice with?
Any major frontier model works for the techniques covered here. Claude, ChatGPT, and Gemini all respond well to clear prompts, few-shot examples, chain-of-thought, and iterative refinement. If you want to feel the difference between tools, try the same prompt on two of them (see LM Arena in the Sandboxes tab). The skills transfer.

How long does it take to move up a level?
Foundational to Developing: roughly 8–12 hours of deliberate practice over a few weeks. Developing to Advanced is less about time and more about reps across varied tasks: expect months of ongoing use before the judgment in the Advanced level feels natural. The time estimates on the pathway cards are rough averages, not targets.

Am I allowed to use AI in my coursework?
It depends on your instructor's policy for the specific course, which should be stated in your syllabus. When in doubt, ask. This page teaches skills that are valuable regardless of the policy, but how you apply them to graded work is governed by each course's rules.

How often is this page updated?
Resources and examples are reviewed each term. AI tools change quickly, but the underlying skills on this page (clear prompting, structured evaluation, deliberate practice) have stayed durable across model generations. We refresh links and course pointers as they shift, and we keep the framework stable.