How I Made AI 100% Reliable Using Meta-Questions in Prompts
One of my favorite recent projects has been a data structures learning system that generates practice problems in both Go and TypeScript. The whole thing is driven by prompt engineering: markdown files that tell Claude how to create problem directories, write tests, and maintain consistency across 33 different data structures.
When you run the create-problem script, Claude reads a 15-step workflow document and builds out a complete, ready-to-use practice problem with Jest test cases, starter files, and all the configuration needed to just start coding.
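To make that concrete, here is roughly what one of the generated Jest files looks like. The problem name, function, and cases below are hypothetical stand-ins, not copied from the repo:

```typescript
// reverse-list.test.ts - hypothetical sketch of a generated test file.
// The matching starter file exports reverseList as an unimplemented stub.
import { reverseList } from "./reverse-list";

describe("reverseList", () => {
  it("reverses a list of numbers", () => {
    expect(reverseList([1, 2, 3])).toEqual([3, 2, 1]);
  });

  it("handles an empty list", () => {
    expect(reverseList([])).toEqual([]);
  });
});
```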
This project demonstrates prompt engineering (multi-step workflow automation), Go and TypeScript proficiency, test automation (Jest, automated problem generation), and systems thinking (designing reliable AI workflows for 100% consistency).
The Problem
Here's what kept happening. Claude would run through all 15 steps, create the problem directory, write the test files, generate the package.json with Jest and TypeScript dependencies, and report success.
I'd open the new problem and run npm test. sh: jest: command not found. No node_modules directory. Claude had skipped step 7.1b, the npm install command.
I wanted Claude to run npm install. It wasn't running npm install.
This happened in roughly 30% of runs before I fixed it. Unreliable automation is worse than no automation.
Why Claude Skips It
I don't know for sure why Claude consistently skipped this step, but I have a theory.
Models are probably trained to be cautious about npm install. In most contexts, auto-installing packages from a generated package.json is bad UX. The model might hallucinate package names, add unnecessary dependencies, or install things the user didn't intend to use. Users should review what's being installed before running it.
So Claude learned not to automatically run npm install after generating a package.json. That's sensible default behavior.
But my scenario is different. The package.json is templated - same dependencies every time (Jest, TypeScript, and their required tooling). I want the automation. The whole point is that Claude handles the setup completely.
The caution that makes sense elsewhere became a problem here.
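For reference, the templated package.json looks roughly like this; version numbers are illustrative, not taken from the repo:

```json
{
  "name": "practice-problem",
  "private": true,
  "scripts": {
    "test": "jest"
  },
  "devDependencies": {
    "@types/jest": "^29.0.0",
    "jest": "^29.0.0",
    "ts-jest": "^29.0.0",
    "typescript": "^5.0.0"
  }
}
```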
The Solution: Intentional Redundancy
I solved this by repeating the step three times. Here's what the actual steps.md file looks like:
## Step 7: Create TypeScript Implementation
7.1a: create package.json
- Add Jest, TypeScript, @types/jest, ts-jest as devDependencies
- Include test script: "jest"
7.1b: install packages
- Run: npm install
7.1b repeat: check if you skipped step 7.1b
- Did you actually run npm install in the previous step?
- If not, run it now
...
## Step 15: Final Verification
Did you skip step 7.1b?
- Final check: Does the TypeScript directory have node_modules?
- If not, you skipped the npm install step
The first time it's an instruction. The second time it's a verification question immediately after. The third time (at step 15, after everything else) it's a final catch.
This is meta-testing. The prompt tests itself by asking Claude to confirm it followed the earlier instruction.
The Core Pattern: Meta-Questions
The meta-question is what makes this work. Not just "do this step" but "confirm you did this step."
This forces Claude to acknowledge execution instead of just inferring intent. You can't answer "Did you skip step 7.1b?" without actually checking what happened in step 7.1b. It requires a check-yourself moment.
This pattern overrides cautious behavior. The repetition and explicit verification make it harder to skip than to execute. After testing across 200+ problem generations, this approach now achieves 100% reliability.
Why This Works: Breaking Autopilot
When Claude hits "7.1b repeat: check if you skipped step 7.1b" immediately after the instruction, the workflow breaks autopilot. Instead of just reading through and inferring intent, Claude has to confirm execution.
Then step 15 catches it if the earlier checkpoints failed. By that point, 13 other steps have been completed. The cost of skipping npm install is now higher. You'd have to backtrack through the entire workflow to fix it. Easier to just do the step.
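One way to make that final catch even harder to skip is to phrase the step-15 check as a command Claude can run rather than a question it can answer from memory. A minimal sketch, assuming the generated problem keeps its TypeScript code in a typescript/ subdirectory (the actual layout in the repo may differ):

```bash
# Sketch of a mechanical step-15 check (hypothetical path):
# if node_modules is missing, step 7.1b was skipped - run the install now.
if [ ! -d typescript/node_modules ]; then
  (cd typescript && npm install)
fi
```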
Other Techniques: Suppressing Optimization and Forcing Context
The repeated step is the main pattern, but there are other enforcement mechanisms in the workflow.
Suppressing the Todo List
The create-problem.sh script includes a specific instruction: "Do not create a todo list, use steps.md verbatim."
This is more important than it sounds. When Claude creates a todo list, it's optimizing. It reads your 15 steps, understands the goal, and designs what it thinks are the most efficient steps to achieve that goal. Claude is smart about this. It will reorganize, combine steps, skip redundant-seeming tasks, and generally try to improve your workflow.
That's exactly what you don't want.
When you know exactly what you want Claude to do, and you want it done a specific way every single time, you have to suppress that optimization. Todo lists give Claude decision-making authority. It chooses how to interpret your instructions, which steps to prioritize, what seems important versus what seems like boilerplate.
But this process is supposed to produce the same output every time. Same directory structure, same file names, same test format, same everything. Consistency matters more than optimization. You need to remove Claude's ability to redesign your workflow, because Claude might choose to do something entirely different. Something smart, something reasonable, but not what you specified.
The numbered steps in steps.md aren't suggestions. They're the exact procedure. "Use steps.md verbatim" means follow the script, don't improvise.
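For context, the shape of the script is roughly this. It's a sketch that assumes the Claude Code CLI's headless -p (print) mode; the actual create-problem.sh in the repo may be wired differently:

```bash
#!/usr/bin/env bash
# Sketch of create-problem.sh (assumed wiring, not copied from the repo).
# The important part is the instruction that suppresses Claude's own todo list.
claude -p "Read steps.md and execute every step in order, exactly as written.
Do not create a todo list, use steps.md verbatim."
```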
Forcing Context Awareness
Step 5 in the workflow says this:
### 5. Review Difficulty requirements
- Read levels.md and read requirements under specific level
- Memorize requirements and review actions against requirements
The whole second line is critical. "Memorize requirements and review actions against requirements."
"Memorize" means spend the tokens to load this into context. Don't optimize. Don't skim. Don't read just the first 30 lines to save tokens. Load the entire thing.
This guarantees Claude takes a specific action and gets a predictable result. It doesn't get to decide that reading part of levels.md is sufficient. Sure, levels.md isn't that long, so that optimization probably wouldn't happen anyway, but why leave it up to Claude?
We want a repeatable process. Not 80% right 80% of the time. We want 100% accuracy 100% of the time. That means validation and ensuring context isn't sacrificed in the name of efficiency.
By step 3, the problem already exists in Claude's head. Steps 1 and 2 involved reviewing existing problems and checking for uniqueness. Claude has been thinking about what problem to create. It just hasn't written anything down yet.
Step 5 forces a pre-validation checkpoint. Before Claude writes any files, it has to explicitly check the problem it's conceptualized against the actual constraints in levels.md.
Here's what levels.md actually specifies for an intermediate problem:
## Intermediate
The problem should be solvable by replicating the subject matter with
additional use of a single one of the previous data structures, using
the prefix integer as a reference.
Example: A problem in directory 6-* could use an additional data structure
from directories 1-*,2-*,3-*,4-*,or 5-*.
Moderate problem solving in addition to the subject matter should be
needed to reach a solution.
Test cases should include cases that could trip up a less robust solution.
Compare that to beginner:
## Beginner
The problem should be solvable by directly replicating the subject being tested.
Very little additional problem solving should be needed to reach a solution.
Test cases should be straight forward and shouldn't deliberately try to trip up the solution.
Step 5 catches constraint violations before anything gets written. Using Hash Tables (directory 03) in a Linked Lists problem (directory 02) breaks the rule, because an intermediate problem can only draw on data structures from earlier directories. Test cases that are too simple, or problems that don't demand moderate problem solving, get caught here too.
Claude might have hallucinated constraints or misunderstood what "intermediate" means. The "memorize and review" instruction forces Claude to compare what it's planning against what's actually required, and adjust before committing to disk.
When This Pattern Matters
This level of verification is overkill for conversational AI use. Most of the time, you're working with Claude iteratively - discussing what needs to be done, adjusting based on feedback, monitoring the output.
But when you need 100% reliability - same output every time, zero tolerance for skipped steps - meta-questions and verification checkpoints are essential.
For this project, I needed identical directory structure, file format, and test configuration across 33 data structures and 200+ practice problems. The meta-question pattern delivered that. The system now generates problems with complete reliability, saving hours of manual setup work.
Applying This Pattern
Step 7.1b isn't special because it installs npm packages. It's special because it demonstrates a pattern you can apply to any critical step in a multi-step LLM workflow.
Find the step that keeps getting skipped, or the step that can't be skipped, and add verification questions. Before the step, immediately after, at the very end - experiment to find what works.
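As a starting point, the generic shape of the pattern in a steps file looks like this; adapt the step numbers, wording, and the observable side effect to your own workflow:

```markdown
## Step N: <the step that keeps getting skipped>
N.1: <the instruction itself>
N.1 repeat: check if you skipped step N.1
- Did you actually do it in the previous step?
- If not, do it now

## Final Step: Verification
Did you skip step N.1?
- Check for an observable side effect of step N.1 (a file, a directory, an output)
- If it's missing, go back and run step N.1 now
```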
AI models are black boxes. The thing that worked for problem A might not work for problem B when you reapply it. So experiment: run the workflow multiple times, check the output, adjust, run again. Complete a good number of test runs before declaring 100% reliability.
The conversational approach lets Claude improvise. The verification approach forces Claude to execute exactly what you specified, every time.
What I Learned
Building this system taught me that AI reliability requires treating the model like a junior developer - you need verification steps, explicit checkpoints, and intentional redundancy. The meta-question pattern (asking Claude to confirm it did the step) was a breakthrough that took the success rate from 70% to 100%.
I also learned that suppressing optimization is sometimes necessary. Claude's tendency to improve workflows is usually helpful, but when you need identical output every time, you have to explicitly disable that behavior with instructions like "use steps.md verbatim."
If I were doing this again, I'd test the verification pattern earlier instead of fighting the skipped npm install step for weeks. But the core lesson - that you can engineer prompts for 100% reliability if you design the right checkpoints - fundamentally changed how I approach AI automation.
The "memorize and review" pattern for forcing context awareness is now something I use in other projects too. When the model needs to check its work against specific requirements, making that explicit in the prompt ensures it actually happens.
Check Out the Project
Repository: github.com/hegner123/learn-datastructures
Try it: Clone the repo and run ./create-problem.sh to generate a practice problem
Scale: 33 data structures, 200+ generated problems, 100% reliability
I practice AI-augmented development, building real solutions to real problems. Workflows, orchestrators, content generation systems, AI co-working tools, and developer tooling are what I enjoy building. Currently exploring AI engineering roles where I can identify problems AI can solve and build the systems to solve them. If you found this interesting, let's connect on LinkedIn or reach out at hegner123@gmail.com.