# AI Open Source Capstone class
## Unit 8: Testing
Automated testing is not the most glamorous topic, but it is one of the most differentiating skills you can build right now. Virtually every company expects its engineers to understand automated testing, yet most computer science students graduate without ever having written an automated test.
It's useful to have a map of skills that companies are looking for, both for the entry level position, as well as for the promotion track. Luckily, many companies transparently publish the skills required for entry level and senior roles.
- [GitHub repo of career ladders](https://github.com/bmoeskau/engineering-ladders)
- [Another annotated list of career ladders](https://www.swyx.io/career-ladders)
As you can see, many of them have an expectation of automated testing.
In this unit, we will review the 4 levels of automated testing. Each serves a different purpose and comes with tradeoffs (e.g., more thorough, but slower to run).
The testing levels are:
1. Linting and static analysis: Automated tools that check code quality, style consistency, and catch potential bugs before code execution.
2. Unit testing: Tests individual functions, methods, or components in isolation to verify they work correctly on their own.
3. Integration testing: Tests how multiple components, modules, or services work together to ensure they integrate properly.
4. End-to-end testing: Tests the complete application flow from a user's perspective, simulating real-world usage scenarios.
Scalar (https://github.com/scalar/scalar) is a great example of a project that uses all 4 levels, so we'll explore each one in that context.
## 1. Linting and Static Analysis
Linting and static analysis tools examine your code without executing it, catching errors, enforcing style consistency, and identifying potential bugs early in the development process. These tools run fast (typically seconds) and can be integrated into your editor, pre-commit hooks, and CI/CD pipelines.
### Major Tools and How They Differ
The landscape of linting and static analysis tools varies by programming language and can be categorized into several types:
#### **Linters** (Code Quality & Bug Detection)
Linters analyze code structure to catch logic errors, enforce coding standards, and prevent common mistakes. Most use ASTs (Abstract Syntax Trees) to understand code structure.
**JavaScript/TypeScript:**
- **ESLint** - The most popular JavaScript/TypeScript linter with a massive ecosystem of plugins. Highly configurable and extensible.
- **Biome** - A newer, faster all-in-one tool written in Rust that combines linting, formatting, and import organization. Prioritizes speed and simplicity.
**Python:**
- **Pylint** - Comprehensive Python linter that checks for errors, enforces coding standards, and suggests refactorings.
- **Flake8** - Combines PyFlakes (error detection), pycodestyle (style checking), and McCabe (complexity checking).
**Ruby:**
- **RuboCop** - The standard Ruby linter and formatter, highly configurable with many community-maintained style guides.
**Go:**
- **golangci-lint** - Fast Go linter that runs multiple linters in parallel, catching common Go mistakes and style issues.
**Key Difference**: Some tools (like ESLint, Pylint) are highly configurable with large plugin ecosystems, while others (like Biome, golangci-lint) prioritize speed and come with sensible defaults.
**Try it out**
- Clone Scalar (https://github.com/scalar/scalar)
- Install the recommended VS Code extensions: open the command palette (cmd-shift-p), run "Show recommended extensions", and find the button to install all of them.
- Generate a file to understand what a linter does.
```
I want to try out the Biome linter and understand more about the rules that it enforces.
Can you enumerate the top three or four common things that the linter might be set up to catch? Generate a TypeScript file that demonstrates those issues. Some should trigger errors, some should trigger warnings.
```
- Run the Biome linter
```
How do I run the Biome linter?
```
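Your generated file might resemble this sketch. The rule names come from Biome's rule set, but whether each is enabled, and at what severity, depends on the project's `biome.json`:
```ts
// Sketch of violations Biome commonly flags; severity depends on configuration.

// correctness/noUnusedVariables: declared but never read.
const unused = 42

// suspicious/noDoubleEquals: `==` coerces types; `===` is almost always intended.
export function isTen(value: number) {
  return value == 10
}

// suspicious/noExplicitAny: `any` opts out of type checking entirely.
export function identity(input: any) {
  return input
}
```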
#### **Type Checkers** (Type Safety)
Type checkers analyze code to ensure type correctness, catching type mismatches and missing properties before runtime.
**JavaScript/TypeScript:**
- **TypeScript Compiler (tsc)** - Analyzes TypeScript code for type correctness. Type checking happens at compile time.
- **Flow** - Facebook's static type checker for JavaScript (less common now; TypeScript has largely won).
**Python:**
- **mypy** - Static type checker for Python that enforces type hints.
- **Pyre** - Facebook's type checker for Python (alternative to mypy).
**Key Difference**: Type checkers focus exclusively on type safety, while linters focus on code quality and style. They complement each other - you'll often use both.
- Generate a file to understand what a typechecker does.
```
I want to try out the tsc type checker and understand more about the rules that it enforces.
Generate a typescript file that has a variety of type checker errors.
```
- Run the type checker
```
How do I run the typechecker?
```
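For comparison, here's a sketch of the kind of file this prompt might produce; each annotated line fails `tsc`:
```ts
// Sketch of deliberate type errors for tsc to catch.

// Error: Type 'string' is not assignable to type 'number'.
const count: number = 'three'

interface User {
  id: number
  name: string
}

// Error: Property 'name' is missing in type '{ id: number; }'.
const user: User = { id: 1 }

function add(a: number, b: number) {
  return a + b
}

// Error: Expected 2 arguments, but got 1.
const sum = add(1)
```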
#### **Formatters** (Code Style)
Formatters automatically enforce consistent code style (indentation, spacing, line breaks) across your codebase.
**Multi-language:**
- **Prettier** - Opinionated formatter for JavaScript, TypeScript, CSS, JSON, Markdown, and more. "Batteries included" - minimal configuration needed.
**Language-specific:**
- **Black** (Python) - Uncompromising Python formatter with minimal configuration.
- **gofmt** (Go) - Built into Go toolchain, automatically formats Go code.
- **rustfmt** (Rust) - Official Rust formatter, part of the Rust toolchain.
- **Biome Formatter** - Fast formatter built into Biome for JavaScript/TypeScript.
**Key Difference**: Formatters change how code *looks*, while linters change how code *works*. Formatters are usually safe to auto-apply, while lint fixes may require code review.
**Try it out**
- Generate a file to understand what a formatter does.
```
I want to try out the Biome formatter and understand more about the formatting rules that it enforces.
Generate a TypeScript file that has inconsistent formatting (e.g., inconsistent indentation, spacing, quote styles, trailing commas, semicolons) that the formatter would fix.
```
- Run the Biome formatter
```
How do I run the Biome formatter to check for formatting issues? How do I apply the formatting fixes?
```
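As a sketch, a file like this one would be rewritten by the formatter (mixed quotes, stray whitespace, inconsistent semicolons) without any change in behavior:
```ts
// Deliberately messy formatting for the formatter to normalize.
const  greeting = "hello"
export function shout( name:string ){
      return  greeting + ', '+ name+"!";
}
```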
#### **Advanced Static Analysis**
These tools perform deeper analysis, finding security vulnerabilities, performance issues, and complex bugs.
- **SonarQube** - Enterprise-grade platform that analyzes code quality, security vulnerabilities, and technical debt across multiple languages.
- **CodeQL** (GitHub) - Semantic code analysis engine that finds security vulnerabilities by modeling code as data.
- **Semgrep** - Fast, lightweight static analysis that uses pattern matching to find bugs and security issues across many languages.
- **Bandit** (Python) - Security-focused linter that finds common security issues in Python code.
#### **Specialized Tools**
- **Markdown linters** (Remark, markdownlint) - Validate markdown files, check for broken links, enforce documentation standards.
- **Dockerfile linters** (hadolint) - Check Dockerfile best practices and security issues.
- **YAML/JSON validators** - Ensure configuration files are valid and follow schemas.
### How They Work Together
Most projects use multiple tools together, each serving a specific purpose. For example, a JavaScript/TypeScript project might use:
- A **linter** (ESLint or Biome) for code quality
- A **type checker** (TypeScript) for type safety
- A **formatter** (Prettier or Biome) for consistent style
- **Static analysis** tools (Semgrep, CodeQL) for security scanning
**Real-world example**: The Scalar project uses Biome for most TypeScript/JavaScript files, ESLint specifically for Vue components, TypeScript compiler for type checking, and Remark for markdown validation. This multi-tool approach leverages each tool's strengths.
**When choosing tools**: Consider your language ecosystem, team preferences, performance requirements, and whether you want an all-in-one solution (like Biome) or prefer specialized tools (like ESLint + Prettier + TypeScript).
Because the linters, type checkers, and formatters above are so fast, they generally run at three points:
1. Continuously in VS Code (though you can choose to ignore the squiggles)
2. On pre-commit (aborting the commit on error)
3. On pull request (blocking the merge on error)
## 2. Unit Tests
Unit tests verify that individual functions, methods, or components work correctly in isolation. They're fast to run (typically milliseconds per test), easy to debug when they fail, and help catch bugs early in development.
**Characteristics:**
- **Speed**: Very fast (milliseconds to seconds for entire suites)
- **Scope**: Tests one function/component at a time
- **Isolation**: Uses mocks/stubs to isolate the code under test
- **Purpose**: Verify logic correctness, edge cases, and error handling
**When to use**: Write unit tests for pure functions, utility functions, data transformations, and individual component logic. They form the foundation of your test suite - you'll write many more unit tests than integration or E2E tests.
**Tools in Scalar**: The project uses [Vitest](https://vitest.dev/) for unit testing, which is fast, compatible with Jest APIs, and has excellent TypeScript support.
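Before touching Scalar's tests, here's a minimal, self-contained sketch of what a Vitest test looks like (illustrative, not from the codebase):
```ts
import { expect, it } from 'vitest'

// A pure function and a single assertion: the smallest useful unit test.
const double = (n: number) => n * 2

it('doubles a number', () => {
  expect(double(21)).toBe(42)
})
```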
### Build a Feature: Auto-extract Query Parameters from URL
Before we dive into testing, let's build a feature together. This will give us real code to test at all three levels.
**Note**: We're working on the **API Client**, which is one of Scalar's main applications. You can launch it and see it in action by running:
```bash
pnpm dev:client:web
```
Then navigate to http://localhost:5065 in your browser. This will show you the API Client interface where users can make HTTP requests, similar to Postman or Insomnia.
**Feature Description**: When a user types a URL with query parameters in the address bar (e.g., `https://api.github.com/users/scalar?page=1&per_page=10`), automatically extract those parameters and populate the Query Parameters section in the UI.
**Prompt to paste into your AI assistant:**
```
I'm working on the Scalar API Client (packages/api-client). I want to implement a feature that automatically extracts query parameters from URLs typed in the address bar.
Here's what I need:
1. Create a utility function `extractQueryParams(url: string)` that:
- Takes a full URL string (may include protocol, domain, path, and query string)
- Returns an array of objects with `key` and `value` properties: `Array<{ key: string; value: string }>`
- Handles edge cases: URLs without query params, encoded values, duplicate keys, empty values
- Should be placed in `packages/api-client/src/libs/parse-url-query-params.ts`
2. Update the AddressBar component (`packages/api-client/src/components/AddressBar/AddressBar.vue`) to:
- Call `extractQueryParams` when the URL changes in the `updateRequestPath` function
- Use the `requestExampleMutators` to update the query parameters in the request example
- Only extract params from the URL if the URL actually contains a query string
- Don't overwrite existing query params that the user has manually added - merge intelligently
3. The function should handle:
- URLs with query strings: `https://api.example.com/users?page=1&limit=10`
- URLs without query strings: `https://api.example.com/users`
- Encoded values: `https://api.example.com/search?q=hello%20world`
- Multiple values for same key: `https://api.example.com/filter?tag=js&tag=ts`
- Empty values: `https://api.example.com/search?q=`
- Relative URLs: `/api/users?page=1`
Look at how `parseQueryParameters` works in `packages/api-client/src/libs/parse-curl.ts` for reference on the expected return format.
The query parameters should be added to `example.parameters.query` using the `requestExampleMutators.edit` method. Check `packages/api-client/src/views/Request/RequestSection/RequestParams.vue` to see how query parameters are structured.
```
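It's worth having a rough mental model of what the assistant should produce. Here's a minimal sketch of `extractQueryParams`, assuming the built-in `URL` and `URLSearchParams` APIs do the heavy lifting (the merge logic in the component is the harder part and isn't shown):
```ts
/**
 * Minimal sketch, not the generated implementation.
 * URLSearchParams decodes values and preserves duplicate keys for us.
 */
export function extractQueryParams(url: string): Array<{ key: string; value: string }> {
  try {
    // A placeholder base lets relative URLs like `/api/users?page=1` parse too.
    const parsed = new URL(url, 'http://localhost')
    return Array.from(parsed.searchParams.entries()).map(([key, value]) => ({ key, value }))
  } catch {
    // Malformed input: fail gracefully with an empty array instead of throwing.
    return []
  }
}
```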
**Manual Testing Steps:**
After implementing the feature, test it manually:
1. **Launch the API client**: Run `pnpm dev:client:web` from the project root
2. **Open the app**: Navigate to http://localhost:5065
3. **Test basic extraction**:
- Type a URL with query params: `https://api.github.com/users/scalar?page=1&per_page=10`
- Press Enter or click outside the input
- Check that the Query Parameters section shows `page=1` and `per_page=10`
4. **Test URL without query params**:
- Type: `https://api.github.com/users/scalar`
- Verify no query parameters are added
5. **Test encoded values**:
- Type: `https://api.example.com/search?q=hello%20world`
- Verify the value is decoded: `hello world`
6. **Test multiple values**:
- Type: `https://api.example.com/filter?tag=js&tag=ts`
- Verify both values appear (may need to check how the UI handles duplicates)
7. **Test edge cases**:
- Empty query value: `https://api.example.com/search?q=`
- Relative URL: `/api/users?page=1`
**Try it out**
Now let's run some existing unit tests to see how they work:
1. **Run all unit tests**:
```bash
pnpm test
```
This runs Vitest in watch mode. You'll see all tests run and the results.
2. **Run tests for a specific package**:
```bash
pnpm test packages/api-client
```
This filters tests to only the API client package.
3. **Run a specific test file**:
```bash
pnpm test create-fetch-query-params
```
This runs tests matching the pattern "create-fetch-query-params".
4. **Run tests in CI mode** (non-interactive, exits after completion):
```bash
CI=1 pnpm test packages/api-client --run
```
5. **Examine a test file**: Look at `packages/api-client/src/libs/send-request/create-fetch-query-params.test.ts` to see:
- How tests are structured with `describe` and `it` blocks
- How assertions are made with `expect`
- How edge cases are tested (empty params, arrays, etc.)
## 3. Integration Tests
Integration tests verify that multiple components, modules, or services work together correctly. They test the interactions between different parts of your system, ensuring that components integrate properly.
**Characteristics:**
- **Speed**: Moderate (seconds to minutes)
- **Scope**: Tests multiple components/services together
- **Isolation**: May use real dependencies or test doubles for external services
- **Purpose**: Verify component interactions, API contracts, data flow between modules
**When to use**: Write integration tests for API endpoints, database interactions, component integration, and workflows that span multiple modules. They catch issues that unit tests miss because they test real interactions.
**Tools in Scalar**: Scalar also uses Vitest for integration tests. The project distinguishes unit tests (fast, isolated) from integration tests (which may require test servers, databases, or external services).
**Try it out**
Let's run some integration tests:
1. **Run integration tests** (these may require test servers):
```bash
pnpm test:integrations:ci
```
Note: Some integration tests require `@scalar/proxy-server` and `@scalar/void-server` to be running. If tests fail, start them:
```bash
pnpm dev:void-server
pnpm dev:proxy-server
```
Then run the tests again in another terminal.
2. **Examine an integration test**: Look at `packages/api-client/src/libs/send-request/create-request-operation.test.ts`:
- Notice how it tests the full request flow (creating request → sending → receiving response)
- See how it uses real HTTP requests to a test server (`void-server`)
- Observe how it tests multiple components working together (request creation, query params, headers, body)
3. **Run a specific integration test**:
```bash
pnpm test create-request-operation
```
4. **See test structure**: Integration tests often have `beforeAll` hooks to set up test servers or `beforeEach` hooks to reset state between tests.
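Here's a structure-only sketch of that pattern; the tiny echo server is a hypothetical stand-in for something like `@scalar/void-server`:
```ts
import { createServer, type Server } from 'node:http'
import { afterAll, beforeAll, beforeEach, describe, expect, it } from 'vitest'

let server: Server

describe('request round-trip', () => {
  beforeAll(async () => {
    // Start a tiny echo server once, before any test in this block runs.
    server = createServer((request, response) => response.end(request.url))
    await new Promise<void>((resolve) => server.listen(8123, () => resolve()))
  })

  beforeEach(() => {
    // Reset any shared state between tests here.
  })

  afterAll(() => {
    // Tear the server down so the test process can exit cleanly.
    server.close()
  })

  it('round-trips query parameters', async () => {
    const response = await fetch('http://localhost:8123/users?page=1')
    expect(await response.text()).toBe('/users?page=1')
  })
})
```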
## 4. End-to-end Tests
End-to-end (E2E) tests verify that the complete application works correctly from a user's perspective. They simulate real user interactions by controlling a browser and testing the full user workflow.
**Characteristics:**
- **Speed**: Slow (seconds to minutes per test)
- **Scope**: Tests the entire application stack
- **Isolation**: Uses real browsers, real servers, real databases (or close approximations)
- **Purpose**: Verify user workflows, UI interactions, and complete feature functionality
**When to use**: Write E2E tests for critical user journeys, UI workflows, and features that span multiple pages/components. They're expensive to run, so use them sparingly - focus on high-value user paths.
**Tools in Scalar**: Uses [Playwright](https://playwright.dev/) for E2E testing, which can control multiple browsers (Chromium, Firefox, WebKit) and provides excellent debugging tools.
**Try it out**
Let's run some E2E tests:
1. **Install Playwright browsers** (first time only):
```bash
pnpm playwright:install
```
This downloads browser binaries needed for testing.
2. **Start the API reference server** (required for e2e tests):
```bash
pnpm --filter cdn-api-reference dev
```
This starts the CDN API reference example server on port 3173, which serves the HTML files that the e2e tests navigate to. Keep this running in a terminal.
3. **Run E2E tests** (in another terminal):
```bash
pnpm test:e2e:local
```
This runs Playwright tests against the local server. The tests will navigate to `http://localhost:3173` to test the API reference pages.
4. **Run E2E tests with UI** (interactive mode):
```bash
pnpm test:e2e:ui
```
This opens Playwright's test UI where you can:
- See tests running in real-time
- Watch the browser execute tests
- Debug failed tests step-by-step
- Time travel through test execution
5. **Examine an E2E test**: Look at `playwright/tests/local.spec.ts`:
- See how it uses `test()` blocks from Playwright
- Notice `page.goto()` to navigate to URLs
- Observe `expect()` assertions that check UI elements
- See helper functions like `testApiReference()` that encapsulate common test patterns
6. **Run a specific E2E test**:
```bash
pnpm test:e2e:local --grep "local build"
```
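Putting the pieces from step 5 together, the smallest useful Playwright test has roughly this shape (a sketch; the heading assertion is illustrative, not copied from `local.spec.ts`):
```ts
import { expect, test } from '@playwright/test'

test('the reference page renders', async ({ page }) => {
  // Navigate to the local server started in step 2.
  await page.goto('http://localhost:3173')
  // Assert against what the user can see, not implementation details.
  await expect(page.getByRole('heading').first()).toBeVisible()
})
```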
## 5. Designing a Test Plan
Before writing tests, it's crucial to design a test plan. A good test plan helps you:
- Identify what to test
- Prioritize test cases
- Ensure comprehensive coverage
- Avoid redundant tests
### Test Plan Template
For our query parameter extraction feature, here's how we'd structure a test plan:
#### **Unit Tests** (Test the `extractQueryParams` function)
**Happy Path:**
- ✅ Extract single query parameter: `?page=1` → `[{key: 'page', value: '1'}]`
- ✅ Extract multiple query parameters: `?page=1&limit=10` → both params extracted
- ✅ Handle encoded values: `?q=hello%20world` → decoded to `hello world`
**Edge Cases:**
- ✅ URL without query string: `https://api.example.com/users` → empty array
- ✅ Empty query value: `?q=` → `[{key: 'q', value: ''}]`
- ✅ Multiple values for same key: `?tag=js&tag=ts` → both values extracted
- ✅ Special characters in values: `?q=hello&world` → properly parsed
- ✅ Relative URLs: `/api/users?page=1` → params extracted
- ✅ Malformed URLs: handle gracefully without crashing
**Error Cases:**
- ✅ Invalid URL format: return empty array or handle error
- ✅ Very long URLs: performance test
#### **Integration Tests** (Test AddressBar + Query Params UI)
**Happy Path:**
- ✅ User types URL with query params → params appear in Query Parameters section
- ✅ User types URL without query params → no params added
- ✅ User manually adds query param → doesn't get overwritten by URL extraction
**Edge Cases:**
- ✅ User types URL, then edits query params manually → URL updates accordingly
- ✅ User clears query params → URL query string removed
- ✅ Multiple requests → each maintains its own query params
#### **E2E Tests** (Test complete user workflow)
**Critical User Journeys:**
- ✅ User types URL with query params → sees params in UI → sends request → params included
- ✅ User imports cURL with query params → params extracted correctly
- ✅ User copies URL with query params → can paste and params are preserved
### Prioritization
**Must Have** (P0):
- Basic extraction works
- Params appear in UI
- Request includes params
**Should Have** (P1):
- Edge cases handled
- Bidirectional sync (URL ↔ Params)
- Doesn't break existing functionality
**Nice to Have** (P2):
- Performance optimizations
- Advanced URL parsing
- Visual indicators
## 6. Implementing Tests: A Walkthrough
Now let's walk through implementing tests at all three levels for our query parameter extraction feature. We'll use prompts that you can paste into Cursor to generate the test code.
### Step 1: Write Unit Tests
**Prompt to paste into Cursor:**
```
I need to write unit tests for the `extractQueryParams` function in `packages/api-client/src/libs/parse-url-query-params.ts`.
Create a test file `packages/api-client/src/libs/parse-url-query-params.test.ts` using Vitest.
The tests should cover:
1. Extracting a single query parameter: `?page=1` → `[{key: 'page', value: '1'}]`
2. Extracting multiple query parameters: `?page=1&limit=10` → both params extracted
3. URL without query string: `https://api.example.com/users` → empty array
4. URL-encoded values: `?q=hello%20world` → decoded to `hello world`
5. Empty query values: `?q=` → `[{key: 'q', value: ''}]`
6. Relative URLs: `/api/users?page=1` → params extracted
7. Multiple values for same key: `?tag=js&tag=ts` → both values extracted
Follow the testing patterns used in `packages/api-client/src/libs/send-request/create-fetch-query-params.test.ts`:
- Use `describe` and `it` blocks from vitest
- Use clear, descriptive test names (no "should" prefix)
- Use `expect().toEqual()` for assertions
- Each test should be independent and test one thing
```
**After generating the tests, run them:**
```bash
pnpm test parse-url-query-params
```
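The generated file will likely look something like this sketch (the import path is taken from the prompt above; the exact cases and names may differ):
```ts
import { describe, expect, it } from 'vitest'
import { extractQueryParams } from './parse-url-query-params'

describe('extractQueryParams', () => {
  it('extracts a single query parameter', () => {
    expect(extractQueryParams('https://api.example.com/users?page=1')).toEqual([
      { key: 'page', value: '1' },
    ])
  })

  it('returns an empty array for a URL without a query string', () => {
    expect(extractQueryParams('https://api.example.com/users')).toEqual([])
  })

  it('decodes URL-encoded values', () => {
    expect(extractQueryParams('/search?q=hello%20world')).toEqual([
      { key: 'q', value: 'hello world' },
    ])
  })
})
```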
### Step 2: Write Integration Tests
**Prompt to paste into Cursor:**
```
I need to write integration tests for the AddressBar component's query parameter extraction feature.
Create a test file `packages/api-client/src/components/AddressBar/AddressBar.test.ts` using Vitest and Vue Test Utils.
The tests should verify:
1. When a user types a URL with query parameters in the address bar, those parameters are automatically extracted and added to `example.parameters.query`
2. When a user has manually added query parameters, typing a URL with different query parameters doesn't overwrite the manual ones - they should be merged intelligently
Look at existing Vue component tests in the codebase for patterns on how to:
- Set up Vue Test Utils with the component
- Mock the workspace store and request mutators
- Simulate user interactions (typing in the address bar)
- Assert that the request example was updated correctly
The AddressBar component is at `packages/api-client/src/components/AddressBar/AddressBar.vue`. It uses `requestMutators` from `useWorkspace()` and calls `updateRequestPath` when the URL changes.
```
**Note**: Integration tests for Vue components require more setup than unit tests. Examine how other components in the codebase are tested to understand the setup patterns (mounting, store mocking, event simulation).
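As a structure-only sketch, here's the Vue Test Utils pattern with a trivial inline component standing in for AddressBar (mounting the real component would additionally require mocking the workspace store and mutators):
```ts
import { mount } from '@vue/test-utils'
import { describe, expect, it } from 'vitest'
import { defineComponent, h } from 'vue'

// Hypothetical stand-in component: renders an input and emits the typed value.
const UrlInput = defineComponent({
  props: { modelValue: { type: String, default: '' } },
  emits: ['update:modelValue'],
  setup(props, { emit }) {
    return () =>
      h('input', {
        value: props.modelValue,
        onInput: (event: Event) =>
          emit('update:modelValue', (event.target as HTMLInputElement).value),
      })
  },
})

describe('UrlInput', () => {
  it('emits the typed URL', async () => {
    const wrapper = mount(UrlInput)
    // Simulate the user typing a URL into the input.
    await wrapper.find('input').setValue('https://api.example.com/users?page=1')
    // Assert the component emitted the new value.
    expect(wrapper.emitted('update:modelValue')?.at(-1)).toEqual([
      'https://api.example.com/users?page=1',
    ])
  })
})
```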
### Step 3: Write E2E Tests
**Prompt to paste into Cursor:**
```
I need to write an end-to-end test using Playwright for the query parameter extraction feature.
Add a test to `playwright/tests/local.spec.ts` (or create a new test file if preferred) that:
1. Navigates to the API client at `http://localhost:5065`
2. Finds the address bar input (it has aria-label "Path")
3. Types a URL with query parameters: `https://api.github.com/users/scalar?page=1&per_page=10`
4. Presses Enter or triggers the URL update
5. Verifies that the Query Parameters section becomes visible
6. Verifies that the query parameters `page=1` and `per_page=10` are displayed in the UI
Follow the patterns used in `playwright/tests/local.spec.ts`:
- Use `test()` from `@playwright/test`
- Use `page.goto()` to navigate
- Use `page.getByLabel()` or `page.getByText()` to find elements
- Use `await expect().toBeVisible()` for assertions
- Use descriptive test names
The test should be named something like "query parameters are extracted from URL when typing in address bar"
```
**After generating the test, run it:**
```bash
# Make sure the API client is running first:
pnpm dev:client:web
# Then in another terminal, run the E2E test:
pnpm test:e2e:local --grep "query parameters"
```
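The generated test will likely look something like this sketch. The `Path` aria-label comes from the prompt above; the final assertion is a guess at how the UI renders the extracted parameters:
```ts
import { expect, test } from '@playwright/test'

test('query parameters are extracted from URL when typing in address bar', async ({ page }) => {
  await page.goto('http://localhost:5065')
  // The address bar input is labeled "Path" (per the prompt above).
  const addressBar = page.getByLabel('Path')
  await addressBar.fill('https://api.github.com/users/scalar?page=1&per_page=10')
  await addressBar.press('Enter')
  // Hypothetical assertion: the extracted key should appear somewhere in the UI.
  await expect(page.getByText('per_page')).toBeVisible()
})
```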
### Test Implementation Checklist
- [ ] Unit tests cover all edge cases
- [ ] Integration tests verify component interactions
- [ ] E2E tests cover critical user journeys
- [ ] All tests pass locally
- [ ] Tests are fast enough (unit: <1s, integration: <10s, E2E: <30s per test)
- [ ] Tests are maintainable (clear names, good structure)
- [ ] Tests fail for the right reasons (test the feature, not implementation details)
## Summary
You've now learned about the four levels of automated testing:
1. **Linting and Static Analysis**: Near-instant checks that catch errors and enforce style without running the code
2. **Unit Tests**: Fast, isolated tests of individual functions/components
3. **Integration Tests**: Moderate speed, test component interactions
4. **E2E Tests**: Slower, test complete user workflows
Each level serves a different purpose and has different tradeoffs. A well-tested codebase runs linting everywhere and uses the three test levels strategically:
- **Many unit tests** (fast feedback, catch bugs early)
- **Some integration tests** (verify components work together)
- **Few E2E tests** (verify critical user journeys)
The testing pyramid: wide base of unit tests, smaller middle of integration tests, narrow top of E2E tests.
**Next Steps:**
1. Implement the query parameter extraction feature
2. Write tests following the test plan
3. Run all tests to ensure they pass
4. Refactor code if needed (tests give you confidence to refactor!)
5. Commit your changes with tests included