Homework 2: Prompt Engineering

Released: Tuesday, February 3

Due: Tuesday, February 17 @ 11:59 PM ET (submit on Gradescope)

Notebook (70 pts)

HW 2 focuses on prompt craftsmanship, reasoning scaffolds, and conversational memory. You will:

Part 1: Configure a reusable Mistral-based agent with a clean API wrapper.
Part 2: Iterate from a bad transcript prompt to schema-driven extraction, ambiguity handling, and meta-prompting.
Part 3: Practice reasoning-first patterns (Chain-of-Thought and reflection personas) to stabilize answers.
Part 4: Build a three-layer memory stack (buffer, structured store, memory-aware agent) for a multi-turn travel concierge.
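
As a rough illustration of the three-layer memory stack named in Part 4, the sketch below wires a turn buffer, a structured store, and a memory-aware prompt builder together. It is not the notebook's starter code: the class names (ConversationBuffer, MemoryStore), the build_prompt helper, and the prompt layout are all hypothetical, and no model call is made.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ConversationBuffer:
    """Layer 1 (hypothetical): keeps only the most recent turns verbatim."""
    max_turns: int = 4
    turns: deque = field(default_factory=deque)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))
        # Drop the oldest turns once the buffer exceeds its limit.
        while len(self.turns) > self.max_turns:
            self.turns.popleft()


@dataclass
class MemoryStore:
    """Layer 2 (hypothetical): structured facts that survive truncation."""
    facts: dict = field(default_factory=dict)

    def update(self, key: str, value: str) -> None:
        # A later statement about the same key overwrites the earlier one.
        self.facts[key] = value


def build_prompt(buffer: ConversationBuffer, store: MemoryStore, user_msg: str) -> str:
    """Layer 3 (hypothetical): assemble a memory-aware prompt string."""
    memory_lines = [f"- {k}: {v}" for k, v in sorted(store.facts.items())]
    history_lines = [f"{role}: {text}" for role, text in buffer.turns]
    return "\n".join(
        ["Known traveler constraints:", *memory_lines,
         "Recent conversation:", *history_lines,
         f"user: {user_msg}"]
    )


# Example session: the first turn falls out of the buffer,
# but the budget constraint is still carried by the store.
buffer = ConversationBuffer(max_turns=2)
store = MemoryStore()
buffer.add("user", "I want to visit Lisbon in May.")
buffer.add("assistant", "Great, any budget?")
buffer.add("user", "Under $2000 total.")
store.update("budget", "under $2000")
prompt = build_prompt(buffer, store, "Suggest a hotel.")
```

The point of the split is that the buffer can forget old turns freely, because anything the traveler committed to lives in the store and is re-injected on every call.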

Notebook

Includes instructions, starter code, and autograder checks for the assignment.

Open HW2.ipynb in Colab

Short Answer Questions (30 pts)

Question 1 (Schema & Ambiguity)
You explicitly defined the desired JSON schema and set rules to handle ambiguity. Imagine you are building an application that needs to process 1,000 different meeting transcripts every day. Why is a guaranteed, consistent JSON output essential for this application to work at scale? Then, describe a potential failure scenario for a real-world agent if its prompt did not include these rules for handling ambiguity.
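
To make the "consistent JSON" requirement concrete, here is a minimal sketch of the kind of check a downstream pipeline might run on each model response. The field names (attendees, decisions, action_items) and the parse_extraction helper are assumptions chosen for illustration, not the schema from the notebook.

```python
import json

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"attendees": list, "decisions": list, "action_items": list}


def parse_extraction(raw: str) -> dict:
    """Parse model output and reject anything that breaks the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected):
            raise ValueError(f"field {name!r} must be a {expected.__name__}")
    return data


good = '{"attendees": ["Ana"], "decisions": [], "action_items": []}'
record = parse_extraction(good)
```

A response that wraps the JSON in prose, omits a field, or returns a string where a list is expected raises ValueError here, so it can be retried or logged instead of silently corrupting later steps.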
Question 2 (Meta-Prompting)
You built a meta-prompt that teaches another agent how to craft the final instructions. Extend that idea to a multi-tool travel assistant that books flights, hotels, and restaurants from a vague request like "Plan me a trip to Paris." Explain how meta-prompting could guide tool selection and output formatting.
Question 3 (Chain-of-Thought)
Beyond reaching the correct wizard/pet assignment, why should developers force the model to show its reasoning trace? If the puzzle output was wrong, explain how the narrated deductions would speed up debugging compared with a bare, incorrect JSON answer.
Question 4 (Multi-Persona Reflection)
The startup exercise makes the model pitch, critique, and then revise the idea. Why does this adversarial, multi-persona flow generate a stronger final concept than simply prompting "Give me a good startup idea"? Explain how forcing the model to argue with itself makes the final idea more robust and better vetted.
Question 5 (Buffer Limits & Recovery)
Suppose the travel concierge must sustain 12-turn conversations but the token budget forces you to drop older turns. Design a buffering policy that preserves the right commitments when truncation happens. Describe (a) the criteria you would use to score each turn's importance, (b) how you would summarize or compress low-priority turns without losing critical constraints, and (c) how you would validate that the final itinerary still cites every constraint shared earlier. Justify your trade-offs.
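
One possible shape for such a policy, shown only as a starting point: score each turn by recency plus a bonus for constraint-like wording, then keep the highest-scoring turns that fit the budget. Everything here is an assumption made for illustration, including the keyword list, the scoring weights, and the use of whitespace word counts as a stand-in for real token counts. It does not cover summarization (b) or validation (c).

```python
# Hypothetical markers of a hard constraint in a traveler's message.
CONSTRAINT_KEYWORDS = ("budget", "allergic", "must", "cannot", "no later than")


def score_turn(index: int, text: str, total: int) -> float:
    """Recency in (0, 1] plus a fixed bonus for constraint-like wording."""
    recency = (index + 1) / total
    bonus = 1.0 if any(k in text.lower() for k in CONSTRAINT_KEYWORDS) else 0.0
    return recency + bonus


def select_turns(turns: list[str], budget: int) -> list[str]:
    """Keep the highest-scoring turns that fit the budget, in original order."""
    ranked = sorted(
        range(len(turns)),
        key=lambda i: score_turn(i, turns[i], len(turns)),
        reverse=True,
    )
    kept, used = [], 0
    for i in ranked:
        cost = len(turns[i].split())  # crude proxy for a token count
        if used + cost <= budget:
            kept.append(i)
            used += cost
    return [turns[i] for i in sorted(kept)]


turns = [
    "I am allergic to shellfish.",
    "What is the weather like?",
    "Sounds nice, thanks.",
    "Book dinner for Friday.",
]
kept = select_turns(turns, budget=9)
```

With this budget the oldest turn survives because it states an allergy, while two newer small-talk turns are dropped. A keyword list is brittle, which is one of the trade-offs a full answer should weigh.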
Question 6 (Memory Store & Memory-Aware Agent)
After implementing memory extraction, store updates, and the memory-aware agent, analyze how each layer prevents regressions in a multi-turn travel planning session. Give an example of a real failure that would occur if (a) the extractor mislabeled a fact, or (b) the memory-aware agent forgot to cite the stored constraints in its final itinerary.

Submission Instructions

Where: Submit on Gradescope under “HW 2”.
Due Date: Tuesday, February 17 @ 11:59 PM ET.
Files to Upload:
- HW2.ipynb — your notebook with all coding responses and autograder outputs.
- HW2_short_answers.pdf — a PDF with the six short-answer responses.
Late policy: Late days are applied automatically; otherwise, late work is not accepted.