Homework 1: Data Processing in Python + Intro to LLMs
Released: Tuesday, January 20
Due: Tuesday, February 3 @ 11:59 PM ET (submit on Gradescope)
Notebook (80 pts)
HW 1 introduces the core skills you'll need to work with LLMs for the rest of the course. You will:
- Part 1: Practice text-cleaning and data processing to transform raw text into cleaned model inputs.
- Part 2: Use Hugging Face to interact with a chat model, and implement each step of the pipeline from tokenization to generation.
- Part 3: Compare different LLM checkpoints using an evaluation framework to reason about accuracy and latency trade-offs.
Notebook
Open HW1.ipynb in Colab
Includes instructions and starter code for the assignment.
Short Answer Questions (20 pts)
Question 1 (Data Cleaning)
Explain why replacing sensitive tokens (URLs, emails, phone numbers, etc.) with semantic placeholders (e.g., [url], [email]) is often better than deleting them outright. Reference the data cleaning goals from Task 1.2 and describe at least two downstream analyses that still benefit from knowing a placeholder existed.

Experiment Setup for Questions 2 & 3

Run the following cell to call your generate_news_headline function on both a positive and a negative review across a range of temperature values. Use the outputs to answer the next two questions.

```python
positive_review = reviews_df['text_clean'].iloc[0]
negative_review = reviews_df['text_clean'].iloc[2]
temperature_range = [0.1, 0.2, 0.5, 1.0, 1.5, 2.0, 2.5]

print("--- Positive Review Headlines ---")
print(f"Original: {positive_review}\n")
for temp in temperature_range:
    headline = generate_news_headline(positive_review, temperature=temp)
    print(f"Temp {temp:.1f}: {headline}")

print("\n" + "-" * 50 + "\n")

print("--- Negative Review Headlines ---")
print(f"Original: {negative_review}\n")
for temp in temperature_range:
    headline = generate_news_headline(negative_review, temperature=temp)
    print(f"Temp {temp:.1f}: {headline}")
```
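For intuition on Question 1, placeholder substitution can be sketched with regex replacement. The patterns below are simplified illustrations, not the exact patterns from Task 1.2 (which are defined in the notebook):

```python
import re

# Illustrative patterns -- the real Task 1.2 patterns live in the notebook.
PATTERNS = [
    (re.compile(r"https?://\S+"), "[url]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[email]"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[phone]"),
]

def replace_sensitive_tokens(text):
    # Substitute rather than delete, so downstream analyses can still
    # count and locate each token type after cleaning.
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(replace_sensitive_tokens("Email me at a@b.com or visit https://x.co"))
# -> Email me at [email] or visit [url]
```

Because the placeholders survive cleaning, analyses such as "how often do reviews contain contact info?" remain possible, which is one direction your answer might explore.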
Question 2 (Sampling Effects)
Based on the outputs you generated above, describe how the headlines changed as the temperature increased. Discuss word choice, creativity, factual grounding, and what happened at the highest temperatures. Provide at least two concrete examples pulled from your generations.
Question 3 (Temperature Selection)
Given your experiment, propose an "ideal" temperature range for creative-but-accurate headline generation. Cite three specific LLM tasks that demand low temperatures and three that benefit from higher temperatures, explaining why temperature matters for each case.
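As background for Questions 2 and 3: temperature divides the model's logits before the softmax, so low values sharpen the distribution toward the top token and high values flatten it toward uniform. A minimal sketch in plain Python (not the notebook's generation code):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Scale logits by 1/T before softmax: T < 1 sharpens the
    # distribution, T > 1 flattens it toward uniform sampling.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy next-token scores
for t in [0.1, 1.0, 2.5]:
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

At T=0.1 nearly all probability mass sits on the top token (near-deterministic headlines), while at T=2.5 the mass spreads out, which is why very high temperatures produce erratic text.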
Question 4 (Comparing LLMs)
Using the evaluation framework from Part 3, compare and contrast at least two Hugging Face checkpoints you profiled. Reference quantitative metrics (accuracy, latency, or other observations from evaluate_model) and qualitative behaviors. Discuss when you would choose one model over the other.
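As a point of reference for Question 4, a generic accuracy/latency profiling loop might look like the sketch below. Here profile_model is a hypothetical stand-in for the notebook's evaluate_model, and the toy predictor stands in for a real checkpoint:

```python
import time

def profile_model(predict_fn, examples):
    # Hypothetical helper: measures accuracy and mean per-example
    # latency for any callable that maps text -> predicted label.
    correct, latencies = 0, []
    for text, label in examples:
        start = time.perf_counter()
        pred = predict_fn(text)
        latencies.append(time.perf_counter() - start)
        correct += int(pred == label)
    return {
        "accuracy": correct / len(examples),
        "mean_latency_s": sum(latencies) / len(latencies),
    }

# Toy predictor standing in for a Hugging Face checkpoint.
toy = lambda text: "positive" if "great" in text else "negative"
print(profile_model(toy, [("great movie", "positive"), ("awful plot", "negative")]))
```

Running this over two checkpoints on the same examples gives directly comparable numbers, which is the kind of evidence your answer should cite alongside qualitative observations.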
Submission Instructions
- Where: Submit on Gradescope under “HW 1”.
- Due Date: Tuesday, February 3 @ 11:59 PM Eastern Time.
- Files to Upload:
  - HW1.ipynb: your notebook with all coding responses. Ensure it runs top-to-bottom without errors.
  - HW1_short_answers.pdf: a PDF of your responses to the short-answer questions.
- Late policy: You may apply your remaining late days; otherwise late work is not accepted.