CS410/1411 Homework 10: LLMs

# CS410/1411 Homework 10: LLMs & PDDL

==**Due Date: 4/20/2026 at 11:59pm**==

**Need help?** Remember to check out [Edstem](https://edstem.org/us/courses/93617) and our website for TA assistance.

:::danger
**⚠️ Battle Alert 🏆 ⚠️**

You must activate the CS410 virtual environment every time you work on a CS410 assignment! You can find the activation instructions [here](https://hackmd.io/@cs410/BJvhqHXuR). If you forget to activate this virtual environment, you will almost certainly encounter import errors, and your code will not run.
:::

![Hog_Mountain](https://hackmd.io/_uploads/S1ki8GOHbe.jpg)

## Assignment Overview

In this assignment, you will use PDDL (Planning Domain Definition Language) and first-order logic to represent a sliding tile puzzle game. Then, you will use a PDDL solver to find solutions to planning problems.

This assignment also provides an introduction to LLMs and their use cases through two separate notebooks. The first notebook introduces how LLMs work, including how the model understands and generates text. The second notebook investigates how LLMs can be used for planning. Each notebook provides a more in-depth description of its content.

## Learning Objectives

What you will know:
* The advantages (and disadvantages) of using formal planning languages to formulate and solve planning problems.
* How LLMs generate output.

What you will be able to do:
* Formalize planning problems into PDDL, a formal planning language.
* Create structured output with LLMs (i.e., output PDDL code), and explain why naive approaches often fail.

## The Super Sliding Puzzle and PDDL

In Homework 2, you worked with the swapping tile game, where tiles were arranged in a grid and you swapped adjacent tiles until the goal state was reached. That game is based on the 15-tile puzzle, which has one open cell that tiles can slide into.
In this assignment, you will work on an extension of this puzzle, which goes by many names (Klotski, super-slider puzzle, Huarong Dao, and many more).

![Screenshot 2025-08-19 at 10.05.46 PM](https://hackmd.io/_uploads/rynFTjfYeg.png)

### Rules of the game

The game takes place on a 4x5 board of square cells. Blocks are placed on the board at various positions. Blocks can slide up, down, left, and right **only if the cells they will move into are unoccupied by other blocks**. The goal is typically to move a 2x2 block to the bottom-center of the board.

This [video](https://www.youtube.com/watch?v=YGLNyHd2w10) provides insight into the structure of the problem (it discusses our version of the puzzle at 5:20).

## Writing PDDL

There are two options for writing PDDL and running solvers in this assignment: the online editor that was demonstrated in class, or a PDDL extension for VSCode that adds syntax highlighting and solvers for PDDL. In both cases, you will use [Planning-as-a-service](https://github.com/AI-Planning/planning-as-a-service), which provides an API for interacting with PDDL solvers hosted in the cloud.

### Option 1: Planning Domains Editor

[editor.planning.domains](https://editor.planning.domains) provides an online editor and access to planners. It offers an easy way to write PDDL files and run popular solvers. You can load existing files (i.e., the files provided in the stencil) by clicking on the File menu and the load files button.

To run a solver on a PDDL problem, click the solver button on the toolbar. You must always specify the correct domain file, problem file, and solver.

Different solvers work with different variations of PDDL. For example, the Temporal Fast Downward solver works with durative actions, i.e., actions that take some amount of time to run. Some solvers guarantee optimality; others only find satisficing plans (i.e., plans that satisfy the goal condition but may be suboptimal).
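To make the difference between optimal and satisficing plans concrete, here is a minimal Python sketch on a toy state graph. It is only an illustration: the `GRAPH` dictionary and the `optimal_plan` and `satisficing_plan` helpers are hypothetical names invented for this example, and this is not how BFWS, LAMA, or any other real planner works internally.

```python
from collections import deque

# Hypothetical toy state graph (an assumption for illustration only):
# each key is a state, each value lists the states reachable in one action.
GRAPH = {
    "start": ["a", "b"],
    "a": ["c"],
    "b": ["goal"],
    "c": ["d"],
    "d": ["goal"],
    "goal": [],
}


def optimal_plan(graph, start, goal):
    """Breadth-first search: returns a path with the fewest steps."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None  # no plan exists


def satisficing_plan(graph, start, goal):
    """Depth-first search: returns the first path found, which may be longer."""
    stack = [[start]]
    visited = set()
    while stack:
        path = stack.pop()
        state = path[-1]
        if state == goal:
            return path
        if state in visited:
            continue
        visited.add(state)
        # Push successors in reverse so the first-listed one is explored first.
        for nxt in reversed(graph[state]):
            stack.append(path + [nxt])
    return None  # no plan exists


if __name__ == "__main__":
    print(optimal_plan(GRAPH, "start", "goal"))
    print(satisficing_plan(GRAPH, "start", "goal"))
```

On this graph, `optimal_plan` returns the two-step path through `b`, while `satisficing_plan` returns the four-step path through `a`. Both reach the goal, but only the first is guaranteed to be shortest.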
For this assignment, we recommend using **BFWS: Best-First Width Search** on editor.planning.domains. It supports STRIPS and all of the PDDL features you will need. To test your environment, run BFWS on `hello_world_problem.pddl` and `hello_world_domain.pddl`. If you run into problems with BFWS, feel free to use **LAMA** instead.

### Option 2: PDDL VSCode Plugin

Another option is to work locally in VSCode.

1. Install the PDDL extension for VSCode, which provides syntax highlighting and other useful tools for PDDL. You can install it through the extension menu in VSCode or through the [online portal](https://marketplace.visualstudio.com/items?itemName=jan-dolejsi.pddl).
2. Navigate to `hello_world_problem.pddl`. We will be using external planners for this assignment. To run a planner, press Alt + P (or Option + P on a Mac), or right-click anywhere on the text and look for the "PDDL: Run the planner" option. This will bring up a list of possible solvers. In VSCode, we recommend the first option, **LAMA**, for this assignment. (BFWS, the recommended solver for the online editor, does not have an endpoint set up for the VSCode plugin.)
3. Running a solver should produce a chart of actions (two hello world actions, to be specific). You can export the plan as plain text by clicking the three horizontal bars in the top right of the pane and choosing to export to a `.plan` file.

### PDDL Tips

For more information and guidance on PDDL, you can refer to the course notes, [the planning wiki](https://planning.wiki/ref/pddl/domain), or watch a [video](https://www.youtube.com/watch?v=_NOVa4i7Us8&list=PL1Q0jeuU6XppS_r2Sa9fzVanpbXKqLsYS) guide from the author of the VSCode extension.

## Klotski PDDL

The purpose of this assignment is to practice modeling problems. There are many ways to represent this problem. In each of the problem and domain files you will write, we have provided starter code that specifies the layout of the grid.
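Before writing any PDDL, it can help to pin down the sliding rule from "Rules of the game" in ordinary code. The following is only a Python sketch of that rule, not the representation you are expected to use in PDDL. Everything in it is an assumption made for illustration: the 1-indexed `(column, row)` coordinates with row 1 at the top, the `blocks` dictionary, and the `cells_of`, `can_slide`, and `slide` helper names.

```python
# Illustrative sketch only: the names and the representation are assumptions,
# not part of the stencil.
WIDTH, HEIGHT = 4, 5  # the board is 4 cells wide and 5 cells tall

# Assumed convention: (column, row), 1-indexed, with row 1 at the top.
DIRECTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def cells_of(top_left, size):
    """Return the set of cells covered by a size x size block."""
    col, row = top_left
    return {(col + dc, row + dr) for dc in range(size) for dr in range(size)}


def can_slide(blocks, name, direction):
    """Check whether block `name` may slide one cell in `direction`.

    `blocks` maps a block name to a (top_left, size) pair.
    """
    (col, row), size = blocks[name]
    dc, dr = DIRECTIONS[direction]
    target = cells_of((col + dc, row + dr), size)

    # Every target cell must stay on the board.
    if any(not (1 <= c <= WIDTH and 1 <= r <= HEIGHT) for c, r in target):
        return False

    # Every target cell must be unoccupied by *other* blocks.
    occupied = set()
    for other, (pos, other_size) in blocks.items():
        if other != name:
            occupied |= cells_of(pos, other_size)
    return target.isdisjoint(occupied)


def slide(blocks, name, direction):
    """Return a new layout after a legal one-cell slide of block `name`."""
    if not can_slide(blocks, name, direction):
        raise ValueError(f"{name} cannot slide {direction}")
    (col, row), size = blocks[name]
    dc, dr = DIRECTIONS[direction]
    updated = dict(blocks)
    updated[name] = ((col + dc, row + dr), size)
    return updated


if __name__ == "__main__":
    # A 2x2 block in the top left corner with a 1x1 block directly to its right.
    layout = {"big": ((1, 1), 2), "small": ((3, 1), 1)}
    print(can_slide(layout, "big", "right"))  # False: "small" is in the way
    print(can_slide(layout, "big", "down"))   # True: both target cells are free
```

Notice that a 2x2 block needs every cell it would cover after the move to be free of other blocks, while a 1x1 block only needs one. Your PDDL preconditions will have to capture the same idea in whatever representation you choose.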
:::info
**Task 1.1**: Complete `domain1x1.pddl`, which should work for sliding puzzle games with only 1x1 blocks.
1. Specify the necessary types. What types of objects do you need to keep track of?
2. Specify the necessary predicates. What do you need to know about each type of object?
3. Specify the actions. Each action should slide a block to an adjacent cell.
:::

:::info
**Optional AI Task (Not Graded)**

The goal of the PDDL portion of this assignment is for you to practice modeling. It is challenging for us to provide a tool for visualizing the Klotski game when we don't know a priori what your actions, objects, or predicates will be. Use an LLM tool of your choice to produce a visualization tool for your Klotski game.

:::spoiler Tips for coding with LLMs
1. Provide all of the necessary information: The LLM will need the context (that you are solving Klotski) and the relevant files.
2. Generate a plan of action: Before asking your LLM to solve your problem, ask it to produce a plan. Make sure you understand and approve that plan before telling the LLM to execute it. This typically reduces the number of hallucinations and aligns the model with what you actually want to accomplish.
3. Be specific in your prompts: General prompts cause the LLM to make assumptions about what you want, which may or may not be correct. Be as specific as possible when prompting LLMs. If you know what you want your visualization to look like, specify it. Do you want to visualize it with ASCII on the command line? Or with a matplotlib animation? Or do you want to make a web app? Be specific.
4. Give detailed feedback: An LLM likely will not be able to create the visualization in one shot and will produce errors or poor visualizations. **Don't just paste the error messages** into the LLM and ask it to fix them. Read the LLM's code and try to understand where it went wrong, then prompt the LLM again with specific guidelines for fixing the bug you found.
If you don't understand where the bug is, ask the LLM to help explain it to you based on the error message you are receiving.
:::

If you create a visualization with an LLM, remember to attach a link to your chat transcript in your README.md. The major LLM providers offer a way to share a link to a single chat (i.e., you don't have to copy your LLM output into the README as plain text).
:::

![Screenshot 2025-09-30 at 11.24.24 PM](https://hackmd.io/_uploads/BJa8M7c3xg.png)

:::info
**Task 1.2**: Complete `problem1x1.pddl` using the diagram above. The objective is to move the red tile from the top left corner (1,1) to the top right corner (4,5). Four purple tiles start in the middle row, but can be anywhere at the end of the puzzle.
1. Specify the objects in the problem.
2. Specify the initial conditions of the problem.
3. Specify the goal condition of the problem.

Use a solver to solve the PDDL problem (see above for instructions based on your platform). Save the `.plan` file created by the solver as `1x1.plan`.
:::

![Screenshot 2025-09-29 at 1.13.59 PM](https://hackmd.io/_uploads/r1_DR4_2le.png)

:::info
**Task 1.3**: Complete `domain2x2.pddl`, which should work for sliding tile games with both 1x1 blocks **and** 2x2 blocks. This domain file should add new types and/or predicates and/or actions on top of what you already completed for `domain1x1.pddl`.

Complete `problem2x2.pddl` based on the diagram above. A 2x2 block begins in the top left corner and should end in the green region (the center of the bottom two rows).

Use a solver to solve the problem. Save the output as `2x2.plan`.
:::

![Screenshot 2025-09-30 at 11.11.12 PM](https://hackmd.io/_uploads/By5g2fqhel.png)

:::info
**Task 1.4**: In your README.md, answer the following questions:
1. There are other ways of solving these problems. Notably, we could write a search problem implementation like the ones from Homeworks 1 and 2.
What are two potential advantages that PDDL provides over creating a search problem directly in Python (as you did in Homeworks 1 and 2)? What are some potential disadvantages of using PDDL?
2. We don't expect anyone to have gotten their PDDL correct on their first try. Name one bug you encountered while writing your PDDL files. How did you determine it was a problem, and how did you fix it?
:::

## LLMs: Colab Notebooks

The LLM portion of this assignment consists of two separate Google Colab notebooks. Each notebook has a specific setup process that you must follow. Because these notebooks run on Google Colab and are about LLMs, you may use the built-in AI tools in Colab without any additional citation. If you run into syntax questions, you can ask the built-in Gemini AI tool.

Make a copy of these two notebooks:
1. [Introduction to LLMs](https://colab.research.google.com/drive/1SAHPKB32G3lZlBHjis9jviCyNNSRlDAO?usp=sharing)
2. [Planning with LLMs](https://colab.research.google.com/drive/1fjFIb27EnoPVNwj8oVsxgiKOmEHJPcwR?usp=sharing)

## Submission

### PDDL Assignment Download

Please click [here](https://classroom.github.com/a/Q5uJvBib) to access the PDDL portion of the assignment.

When you are done with the Colab notebooks, download each notebook from Colab as a `.ipynb` file. Add them to the folder with your PDDL code and README.

### Gradescope

Submit your assignment via Gradescope.

To submit through GitHub, follow this sequence of commands:

1. `git add -A`
2. `git commit -m "commit message"`
3. `git push`

Now, you are ready to upload your repo to Gradescope.

:::danger
**⚠️WARNING⚠️** Make sure you have the correct assignment repo and branch selected before submitting.
:::

*Tip*: If you are having difficulties submitting through GitHub, you may submit by zipping up your hw folder.
### Handin

Your handin should contain the following:
- all files (make sure you have included the notebooks!), including comments describing the logic of your implementations
- a README containing:
  - your responses to any conceptual questions
  - known problems in your code
  - anyone you worked with
  - any outside resources used for the PDDL portion of the assignment (e.g., Stack Overflow, ChatGPT)

## Rubric

The Jupyter Notebook portion of this assignment will be graded for completion. For PDDL, because we do not specify the actions you must implement, we cannot reliably autograde your submissions and will rely more heavily on manual grading.

| Component | Points | Notes |
|-------------------|------|--------------------------------|
| Klotski 1x1 | 15 | Points awarded for correctness of submitted problem and domain files, as well as plan output. |
| Klotski 2x2 | 15 | Points awarded for correctness of submitted problem and domain files, as well as plan output. |
| PDDL Conceptual Questions | 20 | Points awarded for including answers in README. |
| Intro to LLM Programming | 10 | Graded for completion. |
| Intro to LLM Readme Questions | 15 | Points awarded for answering the questions in the README. |
| LLMs + Planning Programming | 10 | Graded for completion. |
| LLMs + Planning Readme Questions | 15 | Points awarded for answering the questions in the README. |

:::success
Congrats on submitting your final homework. We are proud of you!!

<p style="text-align: center;">
  <img src="https://hackmd.io/_uploads/rkDcBp8Sbg.gif" alt="knight gif" width="400" />
</p>
:::