
Software Engineering FAQ & Answers

842 expert Software Engineering answers researched from official documentation. Every answer cites authoritative sources you can verify.

Common Bug Patterns > Resource cleanup and leaks

142 questions
A

Unix domain sockets created in the filesystem (socket path) must be explicitly removed with unlink() or rm after close() to clean up the socket file. Bound sockets in the abstract namespace (starting with null byte) are automatically cleaned up when closed.

95% confidence
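
A minimal Python sketch of this cleanup pattern (Unix only; the path /tmp/demo.sock is an arbitrary example):

```python
import os
import socket

SOCK_PATH = "/tmp/demo.sock"  # arbitrary example path

# Remove a stale socket file left behind by an earlier run.
if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCK_PATH)  # creates the socket file in the filesystem
server.listen(1)
try:
    pass  # accept and handle connections here
finally:
    server.close()        # releases the descriptor...
    os.unlink(SOCK_PATH)  # ...but the file itself must be removed explicitly
```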
A

The default pipe buffer size is 65536 bytes (64 KB) on Linux, configurable via fcntl() with F_SETPIPE_SZ up to the maximum /proc/sys/fs/pipe-max-size (typically 1,048,576 bytes or 1 MB).

95% confidence
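
For illustration, the same numbers can be inspected from Python's fcntl module (Linux only; the F_GETPIPE_SZ and F_SETPIPE_SZ constants require Python 3.10+):

```python
import fcntl
import os

r, w = os.pipe()

# Query the current capacity; typically 65536 bytes on Linux.
print(fcntl.fcntl(w, fcntl.F_GETPIPE_SZ))

# Request a 1 MB buffer. The kernel may round the size up, and an
# unprivileged process cannot exceed /proc/sys/fs/pipe-max-size.
print(fcntl.fcntl(w, fcntl.F_SETPIPE_SZ, 1 << 20))
```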
A

RAII (Resource Acquisition Is Initialization) is a C++ programming idiom where resource acquisition is tied to object initialization, and resource release is automatically performed when the object goes out of scope through its destructor. This ensures exception-safe resource cleanup without explicit release code.

95% confidence
A

A resource leak occurs when a program acquires a resource (such as memory, file handles, database connections, network sockets, or file descriptors) but fails to release it back to the system when it's no longer needed. This eventually leads to resource exhaustion, causing the program or system to crash or become unresponsive.

95% confidence
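
A small Python illustration of the leak and its fix:

```python
# Leaky: each handle stays open until the garbage collector happens to
# reclaim it; under load the process can exhaust its descriptor limit
# ("Too many open files").
def read_all_leaky(paths):
    return [open(p).read() for p in paths]

# Fixed: the with-block releases each handle deterministically, even
# if an exception is raised mid-read.
def read_all(paths):
    contents = []
    for p in paths:
        with open(p) as f:
            contents.append(f.read())
    return contents
```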

Fix Verification > Pre-commit verification checklist

79 questions
A

A framework for managing and maintaining multi-language pre-commit hooks: you specify a list of hooks you want, and it manages the installation and execution of any hook written in any language before every commit.

95% confidence
A

Runs during git push after remote refs have been updated but before any objects are transferred, receiving the remote name and location plus a list of to-be-updated refs through stdin.

95% confidence
A

The pre-commit hook is run first, before typing in a commit message, used to inspect the snapshot that's about to be committed. Exiting non-zero aborts the commit.

95% confidence
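
Hooks can be written in any language as long as the file is executable. A hypothetical .git/hooks/pre-commit in Python, enforcing an invented policy purely for illustration:

```python
#!/usr/bin/env python3
import subprocess
import sys

# List the files staged for this commit.
staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only"],
    capture_output=True, text=True, check=True,
).stdout.split()

# Example policy: refuse to commit editor swap files.
offenders = [f for f in staged if f.endswith(".swp")]
if offenders:
    print(f"pre-commit: blocked swap files: {offenders}")
    sys.exit(1)  # non-zero exit aborts the commit

sys.exit(0)  # zero exit lets the commit proceed
```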
A

Run by commands that replace commits such as git commit --amend and git rebase, receiving which command triggered the rewrite as an argument and a list of rewrites on stdin.

95% confidence
A

Runs before rebasing anything and can halt the process by exiting non-zero, commonly used to disallow rebasing commits that have already been pushed.

95% confidence
A

Takes the path to a temporary file containing the commit message as a parameter; exiting non-zero aborts the commit process, used to validate commit messages.

95% confidence

Test Failure Analysis > Tracing test inputs to assertions

57 questions
A

Assertion introspection is pytest's ability to automatically display the values of subexpressions when an assertion fails, showing function call returns, attribute accesses, comparisons, and operators without requiring boilerplate code. For example, when assert f() == 4 fails because f() returns 3, pytest displays "assert 3 == 4" together with "+ where 3 = f()" to reveal the intermediate value.

95% confidence
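
A minimal reproduction of that example:

```python
# test_example.py -- run with `pytest test_example.py`
def f():
    return 3

def test_function():
    assert f() == 4

# The failure report includes the introspected values:
#   E       assert 3 == 4
#   E        +  where 3 = f()
```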
A

Pytest shows the failed assertion with the actual values displayed, including a "where" clause that traces the source of intermediate values. For the failing assertion assert f() == 4, pytest shows:

E       assert 3 == 4
E        +  where 3 = f()

95% confidence

Debugging Methodology > Reading and interpreting error messages

55 questions
A

The "cause" property was added to JavaScript errors in ES2022 and indicates the reason why the current error was thrown—usually another caught error. When creating a new Error, developers can pass { cause: originalError } as the second argument to the constructor. This enables error chaining and ensures original error information is preserved for debugging.

95% confidence
A

The print and log debugging strategy involves adding print statements or "logs" to the code to display values of variables, call stacks, the flow of execution, and other relevant information. This approach is especially useful for debugging concurrent or distributed systems where order of execution can impact the program's behavior.

95% confidence
A

HTTP 404 Not Found indicates the server cannot find the requested resource, which may be temporary or permanent. HTTP 410 Gone indicates the resource is permanently removed and clients should delete any links to it. 410 is intended for permanent removal situations, whereas 404 is used when the status of the resource is unknown.

95% confidence
A

Cause elimination is a hypothesis-driven debugging technique where teams speculate about the causes of the error and test each possibility independently. This approach works best when the team is familiar with the code and the circumstances surrounding the bug.

95% confidence
A

A Python IndexError is raised when a sequence subscript (index) is out of range. This occurs when trying to access a list, tuple, or string element at an index that doesn't exist (a positive index greater than or equal to the sequence length, or a negative index smaller than -len(sequence)).

95% confidence
A

HTTP 500 Internal Server Error indicates a generic server-side error. HTTP 502 Bad Gateway indicates that a server acting as a gateway or proxy received an invalid response from an upstream server. HTTP 503 Service Unavailable indicates the server is currently unable to handle the request due to temporary overload or maintenance.

95% confidence
A

A JavaScript URIError is thrown when URI handling functions encounter malformed URI strings, such as when decodeURI(), decodeURIComponent(), encodeURI(), or encodeURIComponent() are called with invalid input containing malformed percent-encoding sequences, incomplete encoding sequences (like "%2"), or invalid UTF-8 sequences.

95% confidence
A

Error messages should safeguard against likely mistakes by detecting and warning about common errors before they cause problems, preserve the user's input (let users correct errors by editing their original action instead of starting over), reduce error-correction effort (guess the correct action and let users pick it from a small list of fixes), and concisely educate on how the system works to help users avoid the problem in the future.

95% confidence
A

The standard properties of a JavaScript Error object include: message (a human-readable description of the error), name (the error type name such as "Error", "TypeError", etc.), stack (a non-standard but widely-supported stack trace showing the call path), and cause (added in ES2022, indicating the original error that caused this error for error chaining).

95% confidence
A

In Python, all exceptions must be instances of a class that derives from BaseException. The base class for all built-in, non-system-exiting exceptions is Exception. Other base classes include ArithmeticError (for arithmetic errors like OverflowError, ZeroDivisionError), LookupError (for lookup errors like IndexError, KeyError), and OSError (for operating system-related errors). User-defined exceptions should derive from Exception.

95% confidence
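
A short sketch showing how the shared base classes behave in handlers:

```python
# LookupError is the common base of IndexError and KeyError, so a
# single handler catches both kinds of failed lookup.
for op in (lambda: [1, 2, 3][10], lambda: {"a": 1}["missing"]):
    try:
        op()
    except LookupError as exc:
        print(type(exc).__name__, exc)

# User-defined exceptions should derive from Exception.
class ConfigError(Exception):
    pass
```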
A

Python exceptions have three context attributes: __context__ (automatically set when a new exception is raised while handling another), __cause__ (explicitly set using "raise new_exc from original_exc"), and __suppress_context__ (automatically set to True when __cause__ is set, determining whether __context__ is displayed).

95% confidence
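
A runnable illustration of all three attributes:

```python
try:
    try:
        {}["missing"]
    except KeyError as exc:
        raise ValueError("bad configuration") from exc
except ValueError as exc:
    print(repr(exc.__cause__))       # KeyError('missing'), set by `from`
    print(repr(exc.__context__))     # the same KeyError, set automatically
    print(exc.__suppress_context__)  # True, because __cause__ was set
```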
A

A Python KeyError is raised when a mapping (dictionary) key is not found in the set of existing keys. This occurs when trying to access or delete a dictionary key that doesn't exist using bracket notation (dict['missing_key']).

95% confidence
A

A Python TypeError is raised when an operation or function is applied to an object of inappropriate type. For example, trying to add a string and an integer, calling a non-callable object, or iterating over a non-iterable object will raise a TypeError.

95% confidence
A

The server cannot find the requested resource. In a browser, this means the URL is not recognized. In an API, this can also mean that the endpoint is valid but the resource itself does not exist. Servers may also send this response instead of 403 Forbidden to hide the existence of a resource from an unauthorized client.

95% confidence
A

The server understands the content type of the request entity and the syntax is correct, but was unable to process the contained instructions. This is commonly used in WebDAV scenarios and REST APIs when semantic errors prevent processing.

95% confidence
A

The debugging process typically involves: (1) Reproduce the conditions to observe the error firsthand, (2) Find the bug by pinpointing its source, (3) Determine the root cause by examining code logic and flow, (4) Fix the bug by revising the code, (5) Test to validate the fix with unit, integration, system, and regression tests, and (6) Document the process including what caused the bug and how it was fixed.

95% confidence
A

Backtracking is a debugging approach where developers work backward from the point the error was detected to find the origin of the bug. Developers retrace the steps the program took with the problematic source code to see where things went wrong. This can be effective when used alongside a debugger tool.

95% confidence
A

A JavaScript RangeError is thrown when a numeric value is outside the valid range for its intended use, such as creating an array with a negative length, calling toFixed() with a precision value outside the allowed range (0-100), or providing invalid values to numeric methods that expect specific ranges.

95% confidence
A

A Python ValueError is raised when an operation or function receives an argument that has the right type but an inappropriate value. For example, trying to convert a non-numeric string to an integer using int("abc") or finding the square root of a negative number will raise a ValueError.

95% confidence
A

A stack trace is a report of the active stack frames at a certain point in time during program execution. It shows the sequence of function calls that led to an error, including function names, file names, line numbers, and column numbers. The top line typically shows the error type and message, followed by the call hierarchy from the error location back to the entry point.

95% confidence
A

Error chaining in Python allows developers to explicitly chain exceptions using the "from" keyword: "raise new_exc from original_exc". This sets the __cause__ attribute on the new exception and preserves the original exception for debugging. The default traceback display shows both chained exceptions, with __cause__ always shown and __context__ shown only when __cause__ is None.

95% confidence
A

Error messages should use human-readable language (avoid technical jargon), concisely and precisely describe the issue (avoiding generic messages like "An error occurred"), offer constructive advice or remedies, take a positive tone and not blame the user (avoid words like "invalid," "illegal," or "incorrect"), and avoid humor since it can become stale if users encounter the error frequently.

95% confidence
A

There is no content to send for this request, but the headers are useful. The user agent may update its cached headers for this resource with the new ones. This is commonly used for DELETE operations or successful PUT/POST requests that don't return data.

95% confidence
A

The add_note() method was added in Python 3.11 and allows adding string notes to exceptions that appear in the standard traceback after the exception string. It takes a single string argument and raises TypeError if the note is not a string. Notes are stored in the __notes__ attribute as a list.

95% confidence
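
A short example (Python 3.11+); the note text is arbitrary:

```python
try:
    try:
        int("abc")
    except ValueError as exc:
        exc.add_note("while parsing field 'age' of record 17")
        raise
except ValueError as exc:
    print(exc.__notes__)  # ["while parsing field 'age' of record 17"]
```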
A

When debugging large codebases, teams divide lines of code into segments—functions, modules, class methods, or other testable logical divisions—and test each one separately to locate the error. When the problem segment is identified, it can be divided further and tested until the source of the bug is identified.

95% confidence
A

Error messages must present themselves noticeably and recognizably to users by: displaying the error message close to the error's source, using noticeable, redundant, and accessible indicators (bold, high-contrast, red text, icons), designing errors based on their impact (differentiating between warnings and barriers), and avoiding prematurely displaying errors before users complete their input.

95% confidence
A

A Python AttributeError is raised when an attribute reference or assignment fails, such as when trying to access an attribute that doesn't exist on an object. This occurs when an object does not support attribute references or when attempting to access a non-existent attribute.

95% confidence
A

A JavaScript ReferenceError is thrown when the engine cannot resolve a reference to a variable during runtime: accessing an undeclared variable, using a let or const variable before its declaration (the temporal dead zone), or assigning to an undeclared variable in strict mode. Note that accessing a property of null or undefined throws a TypeError, not a ReferenceError.

95% confidence
A

A JavaScript TypeError is thrown when a value is not of the expected type, such as calling a non-callable value as a function, accessing properties on null or undefined, passing wrong argument types to functions, or performing operations on incompatible data types. For example, attempting to call a number as a function (5()) will throw a TypeError.

95% confidence
A

The server, while acting as a gateway or proxy, did not receive a timely response from an upstream server it needed to access to complete the request. This differs from 503 Service Unavailable in that 504 indicates a timeout with an upstream server, not server overload.

95% confidence
A

Rubber duck debugging is an approach where developers "explain or talk out" the code, line by line, to any inanimate object. The idea is that by trying to explain the code out loud, developers can better understand its logic (or lack thereof) and spot bugs more easily.

95% confidence
A

The four main types of coding errors are: Semantic errors (code that is syntactically valid but violates the language's meaning rules and won't produce meaningful output), Syntax errors (missing elements like parentheses, commas, or other typographical errors), Logical errors (syntax is correct but the instructions cause undesired output), and Runtime errors (errors that happen when an application is running or starting up).

95% confidence
A

A JavaScript SyntaxError is thrown when the JavaScript engine encounters code that violates the language's grammar rules during parsing, before code execution begins. Common causes include missing parentheses, brackets, or braces; invalid variable declarations; incorrect use of operators; malformed string literals; invalid escape sequences; and duplicate parameter names in functions.

95% confidence
A

A Python ImportError is raised when the import statement has trouble trying to load a module, or when the "from list" in a "from ... import" statement has a name that cannot be found. This can occur due to missing modules, circular imports, or incorrect module paths.

95% confidence

Code Search Techniques > Finding all usages of a function or variable

53 questions
A

Use the "--no-index" flag to search files in the current directory that are not managed by Git, or to ignore that the current directory is managed by Git. This is similar to running "grep -r" but with additional benefits like using pathspec patterns.

95% confidence
A

Ripgrep supports searching files in UTF-8, UTF-16, latin-1, GBK, EUC-JP, Shift_JIS, and more. UTF-16 has some automatic detection support, while other encodings must be specified with the "-E/--encoding" flag.

95% confidence

Test Failure Analysis > Understanding test output and stack traces

46 questions
A

You can control the number of stack frames by setting the Error.stackTraceLimit variable. Setting it to 0 disables stack trace collection, any finite integer sets the maximum number of frames to collect, and setting it to Infinity means all frames get collected. This variable only affects the current context and must be set explicitly for each context that needs a different value.

95% confidence
A

The -v flag enables verbose output that lists all of the tests and their results, showing each test with === RUN as it starts and --- PASS or --- FAIL (with its elapsed time) as it finishes.

95% confidence
A

The --full-trace option causes very long traces to be printed on error (longer than --tb=long). It also ensures that a stack trace is printed on KeyboardInterrupt (Ctrl+C), which is useful for finding where tests are hanging when interrupted.

95% confidence
A

The available options are: --tb=auto (default, 'long' for first and last entry, 'short' for others), --tb=long (exhaustive, informative), --tb=short (shorter format), --tb=line (only one line per failure), --tb=native (Python standard library format), and --tb=no (no traceback at all).

95% confidence
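
The same options can also be passed programmatically through pytest.main; the tests/ path here is a placeholder:

```python
import pytest

# Equivalent to running `pytest --tb=short tests/` from the shell.
exit_code = pytest.main(["--tb=short", "tests/"])
print(exit_code)  # 0 (ExitCode.OK) when all collected tests pass
```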
A

With the -c or --catch option, Control-C during the test run waits for the current test to end and then reports all the results so far. A second Control-C raises the normal KeyboardInterrupt exception. This was added in Python 3.2.

95% confidence
A

The available flags are: --quiet or -q (less verbose mode), -v (increase verbosity, display individual test names), -vv (more verbose, display more details from test output), and -vvv (not a standard flag but may be used for even more detail in certain setups).

95% confidence
A

The -k option only runs test methods and classes that match the pattern or substring. Patterns containing wildcards (*) are matched using fnmatch.fnmatchcase(); otherwise simple case-sensitive substring matching is used. This option may be used multiple times. This was added in Python 3.7.

95% confidence
A

The -b or --buffer option buffers the standard output and standard error streams during the test run. Output during a passing test is discarded. Output is echoed normally on test fail or error and is added to the failure messages. This was added in Python 3.2.

95% confidence
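
For unittest, these command-line options have keyword equivalents on unittest.main, sketched here:

```python
import unittest

class TestMath(unittest.TestCase):
    def test_add(self):
        print("noisy diagnostic output")  # discarded when the test passes
        self.assertEqual(1 + 1, 2)

if __name__ == "__main__":
    # verbosity=2 corresponds to -v, buffer=True to -b, catchbreak=True to -c.
    unittest.main(verbosity=2, buffer=True, catchbreak=True)
```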
A

The format is typically: "ClassName.methodName(FileName.java:lineNumber)" - where ClassName is the fully-qualified name, methodName is the method name, FileName.java is the source file, and lineNumber is the line number. Variations exist when line number or file name are unavailable, such as "ClassName.methodName(Unknown Source)" or "ClassName.methodName(Native Method)".

95% confidence

Fix Verification > Regression testing basics

45 questions
A

"Suite hygiene" sessions involve scheduling regular maintenance to eliminate obsolete tests, update existing ones, and add new tests to align with newly added functions. Version control integrations are essential to maintaining traceability and clean test suites.

95% confidence
A

Test cases should be selected using a combination of techniques: retest all (run every test, time-consuming but highest assurance), regression test selection (run only relevant test cases to save time while maintaining targeted coverage), test case prioritization (prioritize by risk level, business value, and historical failure patterns), and automation (automate stable, repeatable tests).

95% confidence
A

The primary purpose of regression testing is to catch bugs that may have been accidentally introduced into a new build or release candidate and to ensure that previously eradicated bugs continue to stay dead. It verifies that code modifications haven't broken existing functionality or introduced new bugs.

95% confidence
A

Best practices include prioritizing test cases by risk and business value, automating repetitive and stable test cases, regularly reviewing and maintaining the regression suite, integrating regression tests with CI/CD pipelines, and using a hybrid approach of automated and manual testing.

95% confidence
A

The "retest all" technique checks all test cases on the current program to verify its integrity. Though expensive as it needs to re-run all cases, it ensures there are no errors because of the modified code. This approach is exhaustive and offers maximum test coverage but is the most time and resource-intensive technique.

95% confidence
A

Progressive regression testing is a mixed approach that evaluates both new features and existing features to detect bugs introduced through new functionality. It's typically used when releasing a new update to an existing software product.

95% confidence
A

Regression testing is an integral part of the extreme programming software development method. In this method, design documents are replaced by extensive, repeatable, and automated testing of the entire software package throughout each stage of the software development process.

95% confidence
A

The two types of test case prioritization are: 1) General prioritization - prioritizing test cases that will be beneficial on subsequent versions, and 2) Version-specific prioritization - prioritizing test cases with respect to a particular version of the software.

95% confidence
A

Corrective regression testing ensures consistency by re-running existing test cases to verify that the same results occur. It is often conducted when no changes have been made to the product's specification, such as when code is refactored, to ensure the refactoring isn't introducing errors.

95% confidence
A

Regression test selection involves running only a part of the test suite if the cost of selecting the subset of tests is less than the retest all technique. It runs tests covering areas most likely to be affected by code changes and requires testers to prune obsolete test cases and narrow down relevant ones to be reused.

95% confidence
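
A toy sketch of the selection idea; the file-to-test mapping is invented for illustration and would come from coverage or dependency data in practice:

```python
# Hypothetical mapping from changed source files to the tests that
# exercise them.
TEST_MAP = {
    "billing.py": ["tests/test_billing.py"],
    "auth.py": ["tests/test_auth.py", "tests/test_login_flow.py"],
}

def select_tests(changed_files):
    selected = set()
    for path in changed_files:
        selected.update(TEST_MAP.get(path, []))
    return sorted(selected)

print(select_tests(["auth.py"]))
# ['tests/test_auth.py', 'tests/test_login_flow.py']
```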
A

Partial regression testing is used when the goal is to find out whether recent changes have impacted only a subset of the updated system. It detects that subset and performs suitable diagnostics, such as when integrating a new payment gateway and evaluating only that portion of functionality.

95% confidence
A

Test case prioritization involves scheduling test cases so that higher priority tests are executed before lower priority ones. Tests are prioritized by business value, significance, and historical failure rate. Critical functions, features that often fail, and high-risk modules are tested first and most frequently.

95% confidence
A

Developer testing compels a developer to focus on unit testing and to include both positive and negative test cases. Unlike traditional tests that verify only intended outcomes, developer testing helps catch regressions earlier by having developers write comprehensive test cases as part of the development cycle.

95% confidence
A

The "minefield problem" refers to when automated regression testing becomes too static and rote. Developers may learn how to pass a fixed library of tests, causing standard regression tests to stop testing effectively. This results in clearing a single safe path while ignoring potential bugs in other areas of the application.

95% confidence
A

Regression tests can be done at any level, from unit through to system integration. Functional tests exercise the complete program with various inputs, while non-functional tests assess aspects such as performance, security, or reliability.

95% confidence
A

Selective regression testing introduces a predictive element where test cases from the test suite are selected based on the testers' belief that those areas are going to receive impacts from code changes. For example, developers updating a mobile application's user interface might use selective regression testing to ensure ongoing stability.

95% confidence
A

Regression testing uses a more precise scope to focus on changes recently made, while QA (Quality Assurance) evaluates the entire system and its workings. Both share similar missions to optimize user experience and deliver high-quality software, but they look at different scopes of the system.

95% confidence
A

Change impact analysis is performed to determine an appropriate subset of tests (non-regression analysis). It involves understanding the scope of changes by reviewing commit logs, feature tickets, bug fixes, and pull requests, and tracing dependencies to find interconnected components likely to be impacted by changes.

95% confidence
A

The three execution modes are: 1) Manual testing (ideal for exploratory tests or UI-focused tests needing human judgment), 2) Automated testing (ideal for sanity checks, API validations, and high-frequency test runs within CI/CD workflows), and 3) Hybrid (combining automated high-volume regression runs with manual testing of edge cases).

95% confidence
A

Retest-all regression testing is considered post-final testing and involves running tests on all regression test cases that have already been cleared to ensure everything works together harmoniously. It's often used for checking changes accompanying major architectural shifts.

95% confidence
A

Change impact analysis mapping involves understanding the scope of changes by reviewing commit logs, feature tickets, bug fixes, and pull requests to verify which functions and modules have changed, then studying architecture diagrams, code ownership, and dynamic analysis to trace dependencies.

95% confidence
A

Regression testing is re-running functional and non-functional tests to ensure that previously developed and tested software still performs as expected after a change. This includes checking whether bug fixes, software enhancements, configuration changes, or even hardware substitutions have introduced new faults or caused previously fixed bugs to re-emerge.

95% confidence
A

Complete regression testing involves retesting the whole system or application and is used when more comprehensive testing is needed, such as following major code changes. It ensures ongoing functionality after significant updates like adding a product gallery to a website.

95% confidence
A

Unit regression testing concentrates on the components or modules (units) that make up a system and checks whether errors have been introduced into individual units. For example, when adding a "Forgot password" feature to a website, unit regression testing would verify that the original login mechanism continues to work as intended.

95% confidence
A

Regression testing should be performed after bug fixes, before all major releases, after adding new features, after changes to performance metrics or test environments (such as OS and database upgrades), and after each commit on a CI/CD pipeline.

95% confidence
A

Common challenges include maintaining the test suite (accumulation of outdated or redundant tests), time and resource constraints (complete regression suites are slow and effort-intensive), selecting the right tests (balancing thoroughness with practicality), complexity in large applications (cross-module issues), and integrating with automation effectively (brittle, flaky test scripts).

95% confidence

Test Failure Analysis > Analyzing why tests fail after a fix

44 questions
A

Run node --inspect-brk node_modules/.bin/jest --runInBand on Unix or node --inspect-brk ./node_modules/jest/bin/jest.js --runInBand on Windows, then connect via Chrome DevTools at chrome://inspect or use VS Code's debugger.

95% confidence
A

The -k flag only runs test methods and classes that match the pattern or substring, supporting wildcards with * for fnmatch-style matching or simple case-sensitive substring matching.

95% confidence
A

The default timeout is controlled by jasmine.DEFAULT_TIMEOUT_INTERVAL, and if a promise doesn't resolve within this timeout, Jest throws an error: "Timeout - Async callback was not invoked within timeout specified by jasmine.DEFAULT_TIMEOUT_INTERVAL."

95% confidence
A

Allure Report is an open-source framework-agnostic test result visualization tool that transforms test execution data into clear, interactive HTML reports, working with 30+ testing frameworks across JavaScript, Python, Java, C#, PHP, and Ruby.

95% confidence
A

The Test Pyramid states that you should have many more low-level unit tests than high-level broad-stack tests running through a GUI, as UI tests are brittle, expensive to write, and time-consuming to run.

95% confidence

Common Bug Patterns > Null and undefined handling

37 questions
A

The no-unsafe-optional-chaining rule disallows use of optional chaining in contexts where the undefined value is not allowed. It detects cases where optional chaining expressions are used in positions where short-circuiting to undefined causes a TypeError, such as (obj?.foo)() or (obj?.foo).bar.

95% confidence
A

Type guards support non-null and non-undefined checks using ==, !=, ===, or !== operators to compare to null or undefined. The effects on subject variable types accurately reflect JavaScript semantics (double-equals checks for both values, triple-equals only checks for the specified value).

95% confidence
A

The strictNullChecks compiler option switches TypeScript to a strict null checking mode where null and undefined values are not in the domain of every type and are only assignable to themselves and any (with undefined also assignable to void). This enables detection of erroneous use of null/undefined values.

95% confidence
A

The non-null assertion operator is a new ! postfix expression operator that asserts its operand is non-null and non-undefined. The operation x! produces a value of the type of x with null and undefined excluded. It is removed in the emitted JavaScript code.

95% confidence
A

Type guards support checking "dotted names" consisting of a variable or parameter name followed by one or more property accesses (e.g., options.location.x). A type guard for a dotted name has no effect following an assignment to any part of the dotted name.

95% confidence
A

None is an object frequently used to represent the absence of a value, as when default arguments are not passed to a function. None is the sole instance of the NoneType type, and assignments to None are illegal and raise a SyntaxError.

95% confidence
A

The rule flags optional chaining expressions when used as: function calls (obj?.foo()), property access on the result ((obj?.foo).bar), spread operators [...obj?.foo], in operator (1 in obj?.foo), instanceof operator (bar instanceof obj?.foo), for...of loops, destructuring, with statements, and class extends clauses.

95% confidence
A

The Optional.ofNullable(T value) method returns an Optional describing the specified value if non-null, otherwise returns an empty Optional. The Optional.of(T value) method requires a non-null value and throws NullPointerException if the value is null.

95% confidence

Debugging Methodology > Binary search debugging

37 questions
A

Git bisect is a Git command that uses binary search to find which commit in a project's history introduced a bug or changed any property of the project. It allows you to perform binary search across commit history rather than through code.

95% confidence
A

The Wolf Fence Algorithm is another name for binary search debugging, named after a hypothetical scenario where you need to find a lone wolf in Alaska by fencing the area in half, waiting for the wolf to howl to determine which half it's in, and repeating the process until you find the wolf.

95% confidence
A

Binary search debugging is a methodical debugging process that systematically narrows down the location of a bug by repeatedly dividing the code in half and testing each half. At each step, you eliminate approximately half of the remaining code from consideration until you isolate the specific line or commit responsible for the bug.

95% confidence
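
The same halving logic, sketched in Python over an ordered list of revisions (an illustration of the idea, not git bisect's actual implementation):

```python
def find_first_bad(revisions, is_good):
    """Assumes revisions[0] is known good, revisions[-1] is known bad,
    and is_good() is a reproducible check for a single revision."""
    lo, hi = 0, len(revisions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revisions[mid]):
            lo = mid  # the bug was introduced after mid
        else:
            hi = mid  # the bug was introduced at or before mid
    return revisions[hi]  # the first bad revision

# Roughly log2(n) checks instead of n.
print(find_first_bad(list(range(100)), lambda rev: rev < 42))  # 42
```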
A

Binary search debugging is most effective for bugs that are easily reproducible with clear input/output. It may not be effective for intermittent bugs, bugs related to external factors, or bugs that occur in code executed multiple times (like loops). It also requires familiarity with the codebase to identify which sections to isolate.

95% confidence
A

After the Mars Pathfinder landed in July 1997, the spacecraft's computers tended to reset once a day. The engineers had seen this problem during pre-launch tests but had ignored it while working on unrelated problems. They were forced to deal with the problem when the machine was tens of millions of miles away, demonstrating the importance of debugging issues immediately rather than later.

95% confidence
A
1. Identify the problem and the exact behavior not working as intended
2. Choose a breakpoint at the middle of the suspected code section
3. Run the code with the breakpoint
4. Determine if the code behaves as expected at this breakpoint
5. Choose a new breakpoint midway between the previous breakpoint and the start or end, depending on the result
6. Repeat steps 3-5 until narrowed down to a single line
95% confidence

Test Failure Analysis > Debugging flaky tests

33 questions
A

The five primary causes are: 1) Lack of isolation (tests that create data and leave it behind), 2) Asynchronous behavior (tests that check before async operations complete), 3) Remote services (tests depending on external services that may be slow or down), 4) Time (tests that depend on specific time values or measure time intervals), and 5) Resource leaks (tests not properly releasing file handles, database connections, or memory).

95% confidence
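
A small Python illustration of cause 4 (time) and its usual fix, injecting the clock:

```python
import datetime

# Flaky: the result depends on when the test suite happens to run.
def is_weekend_today():
    return datetime.date.today().weekday() >= 5

# Deterministic: the date is a parameter, so tests control it.
def is_weekend(day):
    return day.weekday() >= 5

def test_is_weekend():
    assert is_weekend(datetime.date(2024, 1, 6))      # a Saturday
    assert not is_weekend(datetime.date(2024, 1, 8))  # a Monday
```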
A

Testcontainers is a Java library that provides lightweight, throwaway instances of common databases, Selenium web browsers, or anything that can run in a Docker container. This eliminates dependency on shared database instances, giving each test a fresh container and helping avoid isolation issues and shared state pollution that cause flakiness.

95% confidence
A

The tool requires four parameters: --run-tests (the command to execute tests), --test-output-file (the file for test runner output, currently only JUnit format supported), --test-output-format (either "junit" or "cucumberJson"), and --repeat (the number of times to execute the tests). Example: flaky-test-detector --run-tests "npm run test" --test-output-file=./test-results.xml --test-output-format=junit --repeat=5

95% confidence
A

The test pyramid approach emphasizes writing fewer UI system tests, which should be rare. Instead, write most tests at lower levels (unit tests, integration tests) where there are more possibilities to test and the proportion of flaky tests drops significantly. One commentator stated that if developers write all the tests and follow the pyramid, the flaky test proportion would drop to 1:1000.

95% confidence
A
1. Numeric limit: only allow a fixed number (e.g., 8) of tests in quarantine; once the limit is reached, developers must clear all tests out.
2. Time limit: no test should remain in quarantine longer than a week.
3. Aggressive approach: put the quarantine suite into the deployment pipeline one stage after healthy tests, so flaky tests still run but don't block the critical path.
95% confidence
A

Mocha has a documented "Retrying tests" feature that allows failed tests to be automatically re-run a specified number of times. Mocha is designed for asynchronous testing with tests running serially, allowing for flexible and accurate reporting while mapping uncaught exceptions to the correct test cases.

95% confidence
A

The command is avocado diff 7025aaba 384b949c (comparing two job IDs). This command allows you to easily compare several aspects of two given jobs, including system information, test results, and environmental differences that might explain why a test failed in one run but not another.

95% confidence
A

Martin Fowler stated that "non-deterministic tests are useless" and "they are a virulent infection that can completely ruin your entire test suite." They must be dealt with as soon as possible before the entire deployment pipeline is compromised. Once developers lose the discipline of taking all test failures seriously, they'll start ignoring failures in healthy tests too, at which point "you've lost the whole game and might as well get rid of all the tests."

95% confidence
A

The core argument is that all temporary solutions tend to become permanent, and brute-forcing flaky tests through retries masks real problems. The race condition or timeout causing flakiness might be in production code, not just the test, potentially affecting customers. The long-term solution is to either fix or replace the flaky tests, or delete and rewrite them from scratch if they cannot be fixed.

95% confidence
A

Google runs tests both before submission (pre-submit) and after submission (post-submit). Pre-submit testing gates the submission, preventing code from being committed if tests fail. Post-submit testing decides whether the project is ready for release. Flaky tests cause extra repetitive work in both phases to determine whether a failure is flaky or legitimate.

95% confidence
A

The primary methods are mock() and @Mock (to create mock objects), when() and given() (to specify mock behavior), spy() and @Spy (for partial mocking), and @InjectMocks (to automatically inject mocks/spies into the class under test). These help isolate tests from external dependencies that could cause flakiness.

95% confidence
A

A flaky test is defined as "a test that exhibits both a passing and a failing result with the same code." This means that running the same test multiple times without any code changes can produce different results - sometimes passing and sometimes failing - making it non-deterministic.

95% confidence
A

Semaphore CI explicitly chooses not to support rerunning failed tests because "this approach is harmful much more often than it is useful." They describe it as "poisonous" because it legitimizes and encourages entropy, rots the test suite in the long run, and defeats the purpose of testing.

95% confidence
A

Pytest provides modular fixtures for managing test resources (helping with shared state issues), has over 1300+ external plugins in its ecosystem including flaky test handling plugins, and supports re-running failed tests while maintaining state between test runs. The fixture system is particularly important for proper resource management and test isolation.

95% confidence
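
A sketch of a yield fixture giving each test an isolated, cleaned-up resource:

```python
import sqlite3
import pytest

@pytest.fixture
def db():
    conn = sqlite3.connect(":memory:")  # fresh database per test, no shared state
    conn.execute("CREATE TABLE users (name TEXT)")
    yield conn
    conn.close()  # teardown runs even if the test fails

def test_insert(db):
    db.execute("INSERT INTO users VALUES ('alice')")
    assert db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1
```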
A

JUnit 5 supports test execution order configuration, which can help isolate order-dependent flaky tests. If tests pass when run in one order but fail in another, it indicates lack of isolation and shared state pollution between tests.

95% confidence
A

It is human nature to ignore alarms when there is a history of false signals coming from a system. Developers become conditioned to treat test failures as false positives from flaky tests, leading them to dismiss legitimate failures as flaky, only to later realize that it was a real problem. This is analogous to airline pilots ignoring alarms due to false signals.

95% confidence
A

Google uses a tool that monitors the flakiness of tests and automatically quarantines tests with flakiness that is too high. The tool removes the test from the critical path and files a bug. Another tool detects changes in flakiness levels and works to identify the code change that caused the test to change its flakiness level.

95% confidence
A

Avocado's sysinfo collector automatically gathers system information per job or even between tests, including cpuinfo, meminfo, mounts, network configuration, installed packages, and other system state. This information is stored in $HOME/avocado/job-results/latest/sysinfo/ and helps identify environment-specific flakiness, resource exhaustion, or configuration differences between runs.

95% confidence
A

The quarantine strategy involves placing non-deterministic tests in a separate test suite away from healthy tests. This prevents them from blocking deployments while maintaining awareness that they need fixing. Critical warning: Tests in quarantine must be fixed quickly or they will be forgotten, eroding the bug detection system.

95% confidence

Debugging Methodology > Root cause analysis techniques

32 questions
A

Whiteboards & Sticky Notes: limited visibility once the session ends; difficult to archive or share remotely; no easy way to track follow-up actions.

Excel & Spreadsheets: aren't built for visual methods like Fishbone or Fault Trees; can quickly become cluttered and hard to navigate; lack collaboration features.

Visio, PowerPoint, and Diagramming Tools: useful for creating visuals, but they're static, not dynamic; updating requires manual rework; they don't connect findings to corrective actions or task tracking.

95% confidence
A

Brainstorming sessions should be performed to identify root causes. It is a technique by which various efforts are made to define a specific problem or defect. There might be more than one root cause of a defect, so one needs to identify as many causes as possible. Brainstorming helps generate multiple potential causes that can then be systematically analyzed.

95% confidence
A

Fault Tree Analysis (FTA) is a type of failure analysis in which an undesired state of a system is examined. This analysis method is mainly used in safety engineering and reliability engineering to understand how systems can fail, to identify the best ways to reduce risk and to determine event rates of a safety accident or a particular system level failure. FTA is also used in software engineering for debugging purposes and is closely related to cause-elimination technique used to detect bugs.

95% confidence
A

Defining the defect means to identify or determine if a defect is present in a system. It includes understanding what exactly is happening, what are particular symptoms, what issues you observe, its severity, etc. This is the critical first step that ensures the analysis is focused on the actual problem rather than symptoms.

95% confidence
A

Fault Tree Analysis was originally developed in 1962 at Bell Laboratories by H.A. Watson, under a U.S. Air Force Ballistics Systems Division contract to evaluate the Minuteman I Intercontinental Ballistic Missile (ICBM) Launch Control System. Following the first published use in the 1962 Minuteman I Launch Control Safety Study, Boeing and AVCO expanded use of FTA to the entire Minuteman II system in 1963-1964.

95% confidence
A

In Fault Tree Analysis, the undesired outcome is taken as the root ('top event') of a tree of logic. The analysis works backward from this top event to determine how it could occur, mapping the relationship between faults, subsystems, and redundant safety design elements by creating a logic diagram of the overall system.

95% confidence
A

The Five Whys technique has been criticized for the following reasons:
- Tendency for investigators to stop at symptoms rather than going on to lower-level root causes
- Inability to go beyond the investigator's current knowledge
- Lack of support to help the investigator provide the right answer to "why" questions
- Results are not repeatable: different people using five whys come up with different causes for the same problem
- Tendency to isolate a single root cause, whereas each question could elicit many different root causes
- The arbitrary depth of the fifth why is unlikely to correlate with the root cause

95% confidence
A

FTA can be used to:
- Understand the logic leading to the top event/undesired state
- Show compliance with system safety/reliability requirements
- Prioritize the contributors leading to the top event
- Monitor and control the safety performance of complex systems
- Minimize and optimize resources
- Assist in designing a system by helping create requirements
- Function as a diagnostic tool to identify and correct causes of the top event

95% confidence
A

Root Cause Analysis (RCA) is a structured method used to identify the underlying reason a defect or failure occurs in a system. Unlike simple debugging or patching, which addresses immediate symptoms, RCA goes deeper to uncover systemic issues that allowed the defect to happen in the first place, whether they originate in design, requirements, process, tools, or human factors.

95% confidence
A

The Ishikawa Fishbone Diagram is a visual root cause analysis tool that organizes potential causes of a problem into categories. Shaped like a fishbone, it helps teams brainstorm systematically by grouping causes under headings like Methods, Materials, Machines, People, and Environment. It is best for complex problems with multiple potential causes.

95% confidence
A

Based on the 80/20 Principle, the Pareto Chart helps prioritize the most significant causes of a problem. In most cases, 80% of failures come from just 20% of causes. By charting causes by frequency or impact, teams can focus on the biggest drivers of failure first. It is best for identifying which issues deliver the highest ROI when solved.

95% confidence
A

The Five Whys (or 5 Whys) is an iterative interrogative technique used to explore the cause-and-effect relationships underlying a particular problem. The primary goal is to determine the root cause of a defect or problem by repeating the question "why?" five times, each time directing the current "why" to the answer of the previous "why." The number of whys may be higher or lower depending on the complexity of the analysis and problem.

95% confidence
A

The modern Five Whys technique was originally developed by Sakichi Toyoda and was used within the Toyota Motor Corporation during the evolution of its manufacturing methodologies. Taiichi Ohno, the architect of the Toyota Production System, described the five whys method as "the basis of Toyota's scientific approach by repeating why five times the nature of the problem as well as its solution becomes clear."

95% confidence
A

FTA methodology is described in several industry and government standards, including:
- NRC NUREG–0492 for the nuclear power industry
- An aerospace-oriented revision to NUREG–0492 for use by NASA
- SAE ARP4761 for civil aerospace
- MIL–HDBK–338 for military systems
- IEC standard IEC 61025 for cross-industry use (adopted as European Norm EN 61025)

95% confidence
A

Following process industry disasters such as the 1984 Bhopal disaster and 1988 Piper Alpha explosion, in 1992 the United States Department of Labor Occupational Safety and Health Administration (OSHA) published in the Federal Register at 57 FR 6356 (1992-02-24) its Process Safety Management (PSM) standard in 29 CFR 1910.119. OSHA PSM recognizes FTA as an acceptable method for process hazard analysis (PHA).

95% confidence
A

When a specific event is found to have more than one effect event, meaning it has impact on several subsystems, it is called a common cause or common mode. Graphically, this means this event will appear at several locations in the fault tree.

95% confidence
A

Root Cause Corrective Action (RCCA) involves taking measures and actions to resolve or eliminate the current defect, with main focus on eliminating the root cause so it does not occur in future. Root Cause Prevention Action (RCPA) involves creating plans regarding defect reoccurrence, including improving skills, performing tasks properly, and following proper documentation of preventive actions to ensure the defect does not reoccur.

95% confidence
A

Early in the Apollo program, NASA initially decided to rely on the use of failure modes and effects analysis (FMEA) and other qualitative methods for system safety assessments. After the Challenger accident in 1986, the importance of probabilistic risk assessment (PRA) and FTA in systems risk and reliability analysis was realized and its use at NASA began to grow. Now FTA is considered as one of the most important system reliability and safety analysis techniques at NASA.

95% confidence
A

FTA is used in aerospace, nuclear power, chemical and process, pharmaceutical, petrochemical and other high-hazard industries. It is also used in fields as diverse as risk factor identification relating to social service system failure. In 1976, the U.S. Army Materiel Command incorporated FTA into an Engineering Design Handbook on Design for Reliability.

95% confidence
A

The Affinity Diagram organizes large amounts of qualitative data into clusters based on natural relationships. It's especially useful for turning brainstorming notes, survey feedback, or stakeholder input into structured insights. It is best for finding themes and connections in complex or ambiguous problems.

95% confidence
A

FMEA is a proactive root cause analysis tool that anticipates where and how a process might fail. Teams rank risks by three criteria (severity, occurrence, and detection), allowing them to prioritize corrective actions before problems escalate. It is best for high-risk industries where prevention is critical.

95% confidence
A

According to research by the Consortium for IT Software Quality (CISQ), software failures cost U.S. businesses alone over $2.4 trillion annually, from operational outages, lost productivity, customer churn, and reputational damage. Implementing RCA can provide up to 100× savings by addressing root causes rather than repeatedly fixing symptoms.

95% confidence
A

When collecting data regarding a defect, you should gather:
- Impact of the defect
- Proof that the defect exists
- How long the defect has existed
- Whether it is a reoccurring defect
- Communication with customers or employees who experienced or observed the issue

Before identifying the root cause, one needs to analyze the defect or problem completely and gather all required information or evidence.

95% confidence
A

The six steps to perform RCA are:
1. Define Problem or Defect: identify what exactly is happening, what the particular symptoms are, and its severity
2. Collect Data regarding defect: gather all information including impact, proof of existence, duration, and whether it's reoccurring
3. Identify Root Cause of defect: identify the main cause causing the defect to arise, potentially using tools and brainstorming sessions
4. Implement Root Cause Corrective Action (RCCA): take measures to resolve or eliminate the defect, focusing on eliminating the root cause
5. Implement Root Cause Prevention Action (RCPA): create plans to prevent defect reoccurrence through improved skills, proper task execution, and documentation
6. Monitor and Validate: verify the fix is effective and prevents reoccurrence

95% confidence
A

Fault Tree Analysis is a top-down, deductive RCA tool that maps events in a tree structure to show how multiple smaller issues combine into major system failures. It is best for high-risk, high-consequence problems that require exhaustive prevention. In contrast, the Five Whys is a simple, fast iterative technique that digs deeper by repeatedly asking "why?" and is best for quick investigations of recurring, surface-level issues.

95% confidence
A

Within the nuclear power industry, the U.S. Nuclear Regulatory Commission began using PRA methods including FTA in 1975, and significantly expanded PRA research following the 1979 incident at Three Mile Island. This eventually led to the 1981 publication of the NRC Fault Tree Handbook NUREG–0492.

95% confidence
A

The key characteristics of RCA include:
- Systematic: it follows a logical, repeatable process rather than relying on guesswork
- Fact-based: it's grounded in data and evidence collected from the defect and its context
- Action-oriented: the goal is not just to identify what went wrong but to fix the root cause and verify the fix
- Collaborative: it often involves cross-functional teams to investigate issues

95% confidence
A

The PROACT RCA Method is a structured, evidence-driven approach developed by Reliability Center Inc. It is designed to tackle chronic, recurring failures that traditional methods often miss. The steps include:
- Preserve Evidence & Acquire Data (using the 5 Ps: Parts, Position, People, Paper, Paradigms)
- Order Your Team & Assign Resources
- Analyze the Event using logic trees
- Communicate Findings & Recommendations
- Track & Measure Bottom-Line Results

It is best for organizations seeking measurable ROI from RCA programs.

95% confidence

Debugging Methodology > Rubber duck debugging

32 questions
A

Pair programming is a type of teamwork where two software developers sit at the same computer and work on a programming problem together, with one person typing while the other reviews. This process is similar to rubber duck debugging: as the "driver" writes code, they explain what the program needs to do and how new additions will achieve that.

95% confidence
A

While AI can potentially take on the duck's role, there's an important distinction: the AI tool gives feedback, which can produce volumes of unrelated information that might distract the user and obscure their original thought process. LLMs may inhibit metacognition by offering an attractive escape from effortful practice. One of the most important benefits of rubber duck debugging is that all answers ultimately come from the programmer through methodical inspection.

95% confidence
A
1. Improved debugging efficiency: allows comprehensive examination of the codebase, scrutinizing each line, decision, and assumption.
2. Improved communication and collaboration: helps programmers verbalize code clearly and identify actual problem areas.
3. Enhanced problem-solving: forces articulation of thoughts and explanation of code step by step, helping identify mistakes and uncover logical errors.
4. Better memory retention: hearing the sound of your voice enhances how effectively you learn concepts.
5. Integrating new knowledge with existing knowledge: helps learners update and refine existing mental models.
95% confidence
A

University professor and author Michelene T.H. Chi has explored the benefits of self-explanation in learning and problem solving. Additionally, US scholars Logan Fiorella and Richard Mayer have examined how learning can be enhanced through teaching others, finding that when students learn content as though they are going to teach it to others, they "develop a deeper and more persistent understanding of the material."

95% confidence
A

The self-explanation effect is a cognitive phenomenon where explaining concepts or problems in one's own words enhances understanding and retention of material. It encourages deeper cognitive processing and helps identify gaps in comprehension. Self-explanation tends to produce better results than merely thinking aloud without an audience.

95% confidence
A

By explaining code line by line to an inanimate object, the programmer is forced to engage with every line of code and take no line for granted. This forces them to slow down and explain in detail the logic of a program, which exposes details, assumptions, or errors that they had previously overlooked.

95% confidence
A
1. Solo activity: doesn't require involving another person, avoiding wasting their time or letting feedback distract from understanding your own thought process.
2. Don't expose your mistakes: allows fixing code without exposing simple issues to co-workers.
3. Find solutions: helps solve problems in code.
4. Gain other code insights besides the solution: helps programmers glean a better understanding of their overall thought process and avoid similar pitfalls in the future.
95% confidence
A

Rubber duck debugging (or rubberducking) is a debugging technique in software engineering wherein a programmer explains their code, step by step, in natural language—either aloud or in writing—to reveal mistakes and misunderstandings. The name is a reference to a story in the book The Pragmatic Programmer.

95% confidence
A

Jeff Atwood, co-founder of Stack Overflow, wrote that Stack Exchange insists people put effort into their questions partly to teach them "Rubber Duck problem solving." He noted that he received tons of feedback over the years from people who, in the process of writing up their thorough, detailed question for Stack Overflow, figured out the answer to their own problem.

95% confidence
A
1. Can be used as an alternative to seeking real feedback or avoiding criticism: you might turn to it when feedback would actually be beneficial.
2. Doesn't work if the intention isn't clear: requires knowing what you want the problem code to do.
3. Not good for "big issues": best when you already know the answer and just need to think it over, not for problems you simply don't know how to solve.
4. Working on other people's code: you might be just as in the dark as the duck regarding the original author's intention or rationale.
95% confidence

Fix Verification > Finding all instances of a bug pattern

32 questions
A

Search rules detect matches based on patterns described by a rule and perform semantic analyses like constant propagation and type inference. Taint rules make use of Semgrep's taint analysis in addition to default search functionalities, and can specify sources, sinks, and propagators of data as well as sanitizers.

95% confidence
A

An error matrix is a 2x2 table that visualizes the findings of a Semgrep rule in relation to the vulnerable lines of code it does or doesn't detect. It has two axes: Positive/Negative and True/False, yielding four combinations: true positive, false positive, true negative, and false negative.

95% confidence
A

Search rules perform several semantic analyses including: interpreting syntactically different code as semantically equivalent, constant propagation, matching a fully qualified name to its reference in the code even when not fully qualified, and type inference (particularly when using typed metavariables).

95% confidence
A

A rule is a specification of the patterns that Semgrep must match to the code to generate a finding. Rules are written in YAML. Without a rule, the engine has no instructions on how to match code. Rules can be run on either Semgrep or its OSS Engine. Only proprietary Semgrep can perform interfile analysis.

95% confidence
A

A sanitizer is any piece of code, such as a function or a cast, that can clean untrusted or tainted data. Data from untrusted sources may be tainted with unsafe characters, and sanitizers ensure that unsafe characters are removed or stripped from the input.

95% confidence
A

A propagator is any code that alters a piece of data as the data moves across the program. This includes functions, reassignments, and so on. When writing rules that perform taint analysis, propagators are pieces of code specified through the pattern-propagator key as code that always passes tainted data.

95% confidence
A

An l-value (left-value, or location-value) is an expression that denotes an object in memory; a memory location that can be used in the left-hand side (LHS) of an assignment. For example, x and array[2] are l-values, but 2+2 is not.

95% confidence
A

Taint analysis tracks and traces the flow of untrusted or unsafe data. Data coming from sources such as user inputs could be unsafe and used as an attack vector if these inputs are not sanitized. Taint analysis provides a means of tracing that data as it moves through the program from untrusted sources to vulnerable functions.
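
As a rough illustration (not Semgrep's own syntax), the Python snippet below shows the kind of source-to-sink flow taint analysis is designed to catch; the handler and request object are hypothetical:

    import subprocess

    def handler(request):                      # hypothetical web handler
        filename = request.args["name"]        # source: untrusted user input
        # The tainted value flows, unsanitized, into a shell command (sink):
        subprocess.run("cat " + filename, shell=True)  # command injection risk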

95% confidence
A

Per-file analysis (also known as intrafile analysis) means information can only be traced or tracked within a single file. It cannot be traced if it flows to another file. Per-file analysis can include cross-function analysis, aka tracing the flow of information between functions.

95% confidence
A

Per-function analysis (also known as intraprocedural analysis) means information can only be traced or tracked within a single function.

95% confidence
A

Constant propagation is a type of analysis where values known to be constant are substituted in later uses, allowing the value to be used to detect matches. Semgrep can perform constant propagation across files, unless running Semgrep Community Edition (CE), which can only propagate within a file.
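
For illustration, a minimal Python sketch of what constant propagation enables; dangerous() is a hypothetical sink a rule might target:

    def dangerous(n):   # hypothetical function a rule might look for
        print("called with", n)

    x = 5               # x is known to be the constant 5
    dangerous(x)        # constant propagation lets a rule written for
                        # dangerous(5) match this call as well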

95% confidence
A

Cross-file analysis (also known as interfile analysis) takes into account how information flows between files. It includes cross-file taint analysis, which tracks unsanitized variables flowing from a source to a sink through arbitrarily many files. Other analyses performed across files include constant propagation and type inference.

95% confidence
A

Cross-function analysis means that interactions between functions are taken into account. It improves taint analysis by tracking unsanitized variables flowing from a source to a sink through arbitrarily many functions. Within Semgrep documentation, cross-function analysis implies intrafile or per-file analysis, where each file is analyzed as a standalone block but takes into account information flows between functions within that file.

95% confidence
A

A metavariable is an abstraction that lets you match something even when you don't know exactly what it is you want to match. It is similar to capture groups in regular expressions. All metavariables begin with a $ and can only contain uppercase characters, digits, and underscores.

95% confidence
A

A finding is the core result of Semgrep's analysis. Findings are generated when a Semgrep rule matches a piece of code. Findings can be security issues, bugs, or code that doesn't follow coding conventions.

95% confidence
A

A fully qualified name refers to a name which uniquely identifies a class, method, type, or module. Languages such as C# and Ruby use :: to distinguish between fully qualified names and regular names.

95% confidence

Debugging Methodology > Systematic debugging process

31 questions
A

Log analysis is used when working on large-scale applications where you might not always be able to recreate every issue locally. Logs record everything the application is doing, including performance issues like resource leaks or incorrect values. Tools like the ELK Stack (Elasticsearch, Logstash, Kibana), Blackfire, and Graylog help dig through logs to find performance issues or bugs.

Sources
95% confidence
A

The drawbacks include: 1) It's limited to specific types of bugs that are easily reproducible and have clear input/output, 2) It may not be effective for intermittent bugs related to external factors, 3) It may not work well for bugs in code executed multiple times (like loops), and 4) It requires familiarity with the codebase to identify which sections to isolate.

95% confidence
A

The types of tests include: 1) Unit tests (test individual code segments changed), 2) Integration tests (test the whole module containing the fix), 3) System tests (test the entire system), and 4) Regression tests (ensure the fixed code doesn't impact application performance).

Sources
95% confidence
A

Print and log debugging involves adding print statements or "logs" to the code to display values of variables, call stacks, the flow of execution, and other relevant information. This approach is especially useful for debugging concurrent or distributed systems where the order of execution can impact the program's behavior.

Sources
95% confidence
A

Clustering bugs is a technique where bugs are grouped by their symptoms to make debugging easier. Bugs often share a common root cause, and fixing one can resolve several related ones. When developers cluster bugs based on similar behavior or areas they affect, they can focus on these clusters to pinpoint the root cause faster.

Sources
95% confidence
A

The Wolf Fence algorithm is another name for binary search debugging, named after a hypothetical scenario in which you must find a lone wolf in Alaska. You build a fence across the middle of the state and wait for the wolf to howl to learn which side it is on, then repeat the process, halving the remaining area each time, until you find the wolf.

95% confidence
A

You know you've reached the root when you have a chain of evidence that starts with a plausible hypothesis connecting all the way through to an expected outcome. Sometimes it's necessary to settle for weaker outcomes when all avenues have been exhausted or further effort is unjustified, depending on the severity of the failure and cost you're willing to invest.

95% confidence
A

The benefits include: 1) It helps find bugs faster, 2) You're less likely to miss something since you're being methodical, 3) It breaks the debugging process into smaller, easier-to-manage chunks, and 4) It's easy to use without needing fancy tools or software.

95% confidence
A

TRAFFIC is an acronym standing for: Track the problem, Reproduce, Automate, Find Origins, Focus, Isolate, and Correct. This principle was outlined by Andreas Zeller, author of "Why Programs Fail."

95% confidence
A

You should change only one variable at a time. This is obvious but critical—when you start with a reasonable hypothesis about what will happen when you make a change and vary one thing at a time, you can be reasonably confident that you don't misinterpret a positive result.

95% confidence
A

Reproducing a bug is necessary because if you cannot reproduce the bug, you cannot confirm whether it's fixed or not. Each struggle to reproduce the bug tells you more about the bug itself, helping you identify pieces that are essential for reproducing it versus those that are incidental.

Sources
95% confidence
A

Git bisect uses the binary search algorithm to determine which commit introduced a particular bug. It automates the process of testing commits between the current broken version and a known stable version, helping isolate the root cause efficiently.

95% confidence
A

Time-travel debugging (available through tools like rr or UDB) allows developers to step forward or backward in the code to see exactly where things broke. This is particularly useful in bigger systems like Java apps or multi-threaded environments where many things happen at once.

Sources
95% confidence
A

Establishing a baseline and building controls into experiments is important because experimental evidence in debugging is subject to the same pitfalls as traditional scientific experiments. For example, if your evidence consists of debug statements, you should do a control run before changing any variables and save the control output to compare with subsequent experimental runs.

95% confidence
A

Rubber duck debugging is a technique where developers explain or talk out their code line by line to any inanimate object. The idea is that by trying to explain the code out loud, developers can better understand its logic (or lack thereof) and spot bugs more easily.

Sources
95% confidence
A

The four categories are: 1) Semantic errors (syntactically valid code that violates the language's rules of meaning, such as a type mismatch), 2) Syntax errors (violations of the language's grammar, such as missing parentheses or commas), 3) Logical errors (syntactically correct code whose logic sends the program in the wrong direction), and 4) Runtime errors (errors occurring while an application is running or starting up).

Sources
95% confidence
A

Brute force debugging involves going through the entire codebase line by line to identify the source of the problem. This time-consuming approach is typically deployed when other methods have failed, but can also be useful for debugging small programs when the engineer isn't familiar with the codebase.

Sources
95% confidence
A

Binary search debugging is a methodical process that narrows down the cause of a bug by systematically testing different parts of code. At each step, you divide the suspected code section in the middle (using a breakpoint or print statement), evaluate its behavior, and choose which half to investigate next, repeating until you pinpoint the single line of code responsible.
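
A minimal Python sketch of the idea, assuming a pipeline of processing steps whose intermediate output can be checked; all names here are illustrative:

    def find_bad_step(steps, data, is_valid):
        """Binary-search a pipeline for the first step producing bad output."""
        lo, hi = 0, len(steps)
        while lo < hi:
            mid = (lo + hi) // 2
            result = data
            for step in steps[:mid + 1]:   # run the pipeline up to mid
                result = step(result)
            if is_valid(result):
                lo = mid + 1               # bug lies in the later half
            else:
                hi = mid                   # bug is at mid or earlier
        return lo

    steps = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 100]
    print(find_bad_step(steps, 10, lambda v: v >= 0))  # 2: third step goes bad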

95% confidence
A

Automated debugging relies on analytics, artificial intelligence (AI) and machine learning algorithms to automate one or more steps of the debugging process. AI-powered debugging tools can search through large sets of code more quickly to identify errors or narrow down sections of code for more thorough examination.

Sources
95% confidence
A

The divide and conquer technique involves dividing lines of code into segments—functions, modules, class methods, or other testable logical divisions—and testing each one separately to locate the error. When the problem segment is identified, it can be divided further and tested until the source of the bug is identified.

Sources
95% confidence
A

The four types are: 1) Backtracking (working backward from the error detection point), 2) Cause elimination (hypothesis-driven testing of possible causes), 3) Divide and conquer (testing code segments separately), and 4) Print and log debugging (adding statements to display values).

Sources
95% confidence
A

Backtracking is an approach where developers work backward from the point the error was detected to find the origin of the bug. They retrace the steps the program took with the problematic source code to see where things went wrong.

Sources
95% confidence
A

Cause elimination is a hypothesis-driven debugging technique where the team speculates about the causes of the error and tests each possibility independently. This approach works best when the team is familiar with the code and the circumstances surrounding the bug.

Sources
95% confidence

Common Bug Patterns > Case sensitivity bugs

31 questions
A

In Ruby, identifiers that begin with uppercase letters ([A-Z]) are constants. Constants are case-sensitive and are meant to be assigned only once: reassigning a constant generates a warning, while accessing an uninitialized constant raises a NameError exception.

95% confidence
A

Rust is case-sensitive. Identifiers follow the Unicode Standard Annex #31 specification. The language uses snake case as the conventional style for function and variable names, where all letters are lowercase and underscores separate words.

95% confidence
A

PHP doesn't support Unicode variable names. However, some character encodings (such as UTF-8) encode characters in such a way that all bytes of a multi-byte character fall within the allowed range, thus making it a valid variable name.

Sources
95% confidence
A

Go is case-sensitive. Each code point is distinct; for instance, uppercase and lowercase letters are different characters. Identifiers name program entities such as variables and types, and must begin with a letter.

Sources
95% confidence
A

The older MS-DOS FAT file system supports a maximum of 8 characters for the base file name and 3 characters for the extension, for a total of 12 characters including the dot separator. This is commonly known as an 8.3 file name. Windows FAT and NTFS file systems support long file names but still maintain 8.3 aliases.

95% confidence

Common Bug Patterns > Off-by-one errors

28 questions
A

The "off-by-five" error is a variant of off-by-one error that was reported in sudo (CVE-2002-0184) in 2002. It's described as more of a "length calculation" error than a true off-by-one error. The term illustrates that off-by-one errors can manifest as calculation mistakes of various magnitudes, not just being off by one.

95% confidence
A

The safe pattern for avoiding off-by-one errors when copying strings is to allocate buffer size of strlen(source) + 1 to account for the null terminator, or when using functions like strncpy(), ensure the destination buffer size is at least count + 1 bytes and manually null-terminate if necessary. Always check if strlen(source) >= buffer_size before copying to detect truncation.

95% confidence
A

In Go, arrays are zero-indexed and an array of length n has valid indices from 0 to n-1. For example, var buffer [256]byte has 256 elements accessible via indices 0 through 255. Attempting to access buffer[256] will cause a runtime panic. The built-in len() function returns the array length, which is a fixed value determined at compile time.

Sources
95% confidence
A

Off-by-one errors are a common cause of buffer overflow vulnerabilities. When a program writes one byte past the end of a buffer due to an off-by-one calculation error, it can corrupt adjacent memory, overwrite return addresses, or create conditions that attackers can exploit to execute arbitrary code. This is classified as CWE-787 (Out-of-bounds Write) and CWE-121 (Stack-based Buffer Overflow).

95% confidence
A

Python's range() function is designed to prevent off-by-one errors by excluding the end point from the generated sequence. For example, range(5) generates the values 0, 1, 2, 3, 4 (not 5), and range(5, 10) generates 5, 6, 7, 8, 9 (not 10). The given end point is never part of the generated sequence.
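
A quick interpreter check of the exclusive end point:

    list(range(5))       # [0, 1, 2, 3, 4] - 5 itself is excluded
    list(range(5, 10))   # [5, 6, 7, 8, 9]
    len(range(5, 10))    # 5, i.e. 10 - 5: lengths compose with no off-by-one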

95% confidence
A

The fencepost error (also known as the fencepost problem) is a classic off-by-one error that occurs when counting intervals or divisions. A fence with 10 sections needs 11 posts in total (one at every boundary, including both ends), yet the off-by-one error leads people to count 10 posts, one per section. The analogy illustrates the confusion between counting items and counting the boundaries between them.

95% confidence
A

A common off-by-one error occurs when allocating exactly enough space for n pointers but then writing a NULL sentinel at position n (one past the end). For example: Widget **list = malloc(n * sizeof(Widget*)); list[n] = NULL; writes one element past the allocated buffer. You must allocate space for n+1 pointers if you need to store n items plus a NULL terminator.

95% confidence
A

When strncpy() is called with a count equal to or greater than the source string length, it copies the entire source string including the null terminator, then pads the destination with additional null characters up to count bytes. However, if count is exactly the destination buffer size and the source string is that long or longer, the destination will not be null-terminated, creating a potential off-by-one vulnerability.

95% confidence
A

In Go, attempting to access an array or slice with an out-of-bounds index causes a runtime panic. For example, if you have an array with 256 bytes indexed from 0 through 255, attempting to access index 256 or higher will crash the program. Go does not allow out-of-bounds access to succeed silently.

Sources
95% confidence
A

A common off-by-one error with strncpy() occurs when the count parameter equals the destination buffer size. In this case, if the source string length is greater than or equal to count, strncpy() will not null-terminate the destination string. To ensure null termination, you must either allocate space for count+1 bytes or manually add a null terminator after the copy operation.

95% confidence
A

To iterate from start to end inclusive and avoid off-by-one errors, the number of iterations should be (end - start + 1), or the loop condition should be i <= end when starting at i=start. For example, to iterate from 1 to 10 inclusive, you need 10 iterations (10 - 1 + 1 = 10), not 9. This is a common source of confusion when counting versus indexing.
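
In Python terms, this is why an inclusive loop adds one to the stop value; a small sanity-check sketch:

    start, end = 1, 10
    count = sum(1 for i in range(start, end + 1))  # range excludes its stop
    assert count == end - start + 1                # 10 iterations, not 9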

95% confidence
A

An off-by-one error is a type of bug where a program calculates or uses an incorrect maximum or minimum value that is exactly one more or one less than the correct value. This commonly occurs in array indexing, loop boundaries, and buffer operations.

95% confidence
A

Off-by-one errors can lead to several serious consequences, including: crashes and program termination (DoS), memory corruption, infinite loops caused by loop conditions that are never satisfied, buffer overflows that may allow arbitrary code execution, undefined behavior, and data corruption. In security contexts, these errors can enable attackers to bypass protection mechanisms or execute unauthorized commands.

95% confidence
A

Ruby's Array#fetch method raises an IndexError when accessing an out-of-bounds index, rather than returning nil like the regular [] operator. You can also provide a default value as a second argument to fetch, which will be returned instead of raising an error if the index is out of bounds. This makes fetch useful for catching potential off-by-one errors during development.

95% confidence
A

In Ruby, negative indices count backwards from the end of the array. Index -1 refers to the last element, -2 refers to the second-to-last element, and so on. A negative index is valid if its absolute value is not larger than the array size. For a 3-element array, valid negative indices are -1 through -3, while -4 is out of range.

95% confidence
A

The fencepost analogy illustrates off-by-one errors: if you want to build a straight fence 100 feet long with fenceposts every 10 feet, you need 11 fenceposts (at positions 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100), not 10. The confusion arises from whether you're counting the intervals between posts or the posts themselves. This is why it's called a "fencepost error."
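
The arithmetic is easy to verify in Python:

    posts = list(range(0, 100 + 10, 10))  # a post every 10 ft, both ends included
    assert posts == [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    assert len(posts) == 100 // 10 + 1    # 10 intervals -> 11 posts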

95% confidence
A

To safely access array elements when the index might be out of bounds, you should: 1) Check that the index is greater than or equal to 0 and less than the array length before accessing, 2) Use bounds-checking access methods like C++ std::array::at(), Java's built-in bounds checking, or Ruby's fetch method, or 3) Use exception handling to catch out-of-bounds accesses.

95% confidence

Common Bug Patterns > String literal vs regex matching

27 questions
A

Java distinguishes clearly between literal and regex methods:

  - Literal: String.contains(), String.startsWith(), String.endsWith(), String.equals(), String.indexOf()
  - Regex: String.matches(), Pattern.matches(), Matcher.find()

Using String.matches("literal") is inefficient compared to String.contains("literal") because matches() compiles the argument as a regex pattern and requires the entire string to match (implicitly adding ^ and $ anchors).

95% confidence
A

R provides separate functions in base R:

  - Literal: grepl() and grep() with the fixed = TRUE parameter
  - Regex: grepl() and grep() with fixed = FALSE (the default)

Always use fixed = TRUE for literal matching to avoid regex interpretation. For example, grepl("pattern", text, fixed = TRUE) is faster and safer than grepl("pattern", text).

95% confidence
A

Go's standard library clearly separates literal and regex operations:

  - Literal: strings.Contains(), strings.HasPrefix(), strings.HasSuffix(), strings.EqualFold()
  - Regex: regexp.MustCompile(), regexp.MatchString(), regexp.Match()

The strings package functions are O(n) and have no allocation overhead, while regexp functions require compiling the pattern (which can be cached with MustCompile) and have significantly higher constant factors.

95% confidence
A

JavaScript's String.prototype.startsWith() checks if a string begins with a specified literal substring. The regex alternative would be /^pattern/.test(str). The string method is:

  - Faster (no regex compilation or state machine)
  - Clearer in intent
  - Free of escaping concerns for special characters
  - Supported in all modern browsers (ES6)

For prefix checking, always prefer startsWith() over regex.

95% confidence
A

str.replace(old, new) replaces literal substring occurrences. re.sub(pattern, repl, string) replaces regex pattern matches with a replacement string. Key differences:

  - str.replace() is ~10-100x faster for literal replacements
  - str.replace() doesn't interpret special characters
  - re.sub() supports capture groups, backreferences, and callbacks in replacements
  - re.sub() can behave unexpectedly if the pattern contains regex metacharacters
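
A short demonstration of the difference; note how the regex version silently misfires on a literal '+':

    import re

    s = "cost: 1+1 dollars"
    print(s.replace("1+1", "2"))   # 'cost: 2 dollars' - '+' is literal
    print(re.sub("1+1", "2", s))   # unchanged: as a regex, '1+1' means
                                   # one-or-more '1' then '1', i.e. '11'
    print(re.sub(r"(\d)\+(\d)", r"\1 plus \2", s))  # capture groups:
                                   # 'cost: 1 plus 1 dollars'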

95% confidence
A

C# and .NET provide separate methods:

  - Literal: String.Contains(), String.StartsWith(), String.EndsWith(), String.IndexOf()
  - Regex: Regex.IsMatch(), Regex.Match(), Regex.Matches()

The String.Contains() method (available in .NET Core 2.1+ and .NET Standard 2.1+) or String.IndexOf() != -1 should be used for literal checks. Using Regex.IsMatch(input, "literal") is inefficient and can cause issues if the literal contains special regex characters.

95% confidence
A

C++11 introduced the <regex> library with std::regex_match(), std::regex_search(), and std::regex_replace(). For literal checks:

  - Use std::string::find() != std::string::npos for substring checking
  - Use std::string::compare() for equality/prefix/suffix checks
  - Use std::string::starts_with() / ends_with() in C++20

Regex functions compile patterns (std::regex construction), which is expensive and should be cached if the pattern is reused.

95% confidence
A

Use re.escape() to properly escape special characters: re.match(re.escape(user_pattern), text). This function automatically escapes all special regex metacharacters (. becomes \., * becomes \*, etc.) so the string is treated as a literal. However, if you're doing literal matching, it's still better to use str.startswith(), str.endswith(), or the in operator instead.
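
For example:

    import re

    name = "file.txt"
    print(re.escape(name))                        # file\.txt - dot is now literal
    print(re.match(re.escape(name), "file.txt"))  # a match object
    print(re.match(re.escape(name), "fileXtxt"))  # None - no false positive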

95% confidence
A

When using regex functions with user input or literal strings, special regex metacharacters can cause unexpected behavior: . ^ $ * + ? { } [ ] \ | ( ). For example, re.match("file.txt", filename) would match "fileXtxt" because . matches any character. Similarly, re.match("a+b", "aab") would match because + is a quantifier, not a literal plus sign.
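
Both pitfalls are easy to reproduce:

    import re

    # '.' matches any character, so this "literal" check is too permissive:
    print(bool(re.match("file.txt", "fileXtxt")))  # True - unintended match
    # '+' is a quantifier, not a literal plus sign:
    print(bool(re.match("a+b", "aab")))            # True - 'a+' consumed 'aa'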

95% confidence
A

Rust's standard library and ecosystem clearly separate concerns:

  - Literal: str.contains(), str.starts_with(), str.ends_with() (in std)
  - Regex: regex::Regex::is_match(), regex::Regex::find() (in the regex crate)

The str.contains() method is significantly faster and requires no external dependencies. The regex crate must be added to Cargo.toml, and patterns must be compiled with Regex::new(), which can fail at runtime.

95% confidence
A

Swift provides:

  - Literal: String.contains(), String.hasPrefix(), String.hasSuffix(), String.firstIndex(of:)
  - Regex: String.range(of:options:range:) with the .regularExpression option, or the new Regex type (Swift 5.7+)

Using range(of: "literal", options: .regularExpression) is inefficient. The contains() method should be used for simple substring checks. Swift 5.7 introduced a first-class Regex type with compile-time safety, but literal methods are still preferred for simple cases.

95% confidence
A

To match a literal dot character in regex, use \. (backslash-dot). However, in string literals the backslash itself may need escaping:

  - Python: r"\." or "\\."
  - Java: "\\."
  - JavaScript: /\./ or new RegExp("\\.")
  - C#: "\\."

Using an unescaped dot "." matches any character except newline, which is a common bug when developers intend to match a literal period (e.g., in file extensions).

95% confidence
A

In Ruby, "text" =~ /pattern/ is the idiomatic regex matching operator. However, using /literal/.match(text) or "text" =~ /literal/ for simple string equality is inefficient. Ruby provides String#include?, String#start_with?, String#end_with?, and == for literal comparisons. The =~ operator also sets special global variables ($~, $1, $2, etc.) which adds overhead.

95% confidence
A

PHP provides distinct function families:

  - Literal: str_contains() (PHP 8+), str_starts_with() (PHP 8+), str_ends_with() (PHP 8+), strpos(), strstr()
  - Regex: preg_match(), preg_match_all(), preg_replace()

Using preg_match('/literal/', $string) is inefficient compared to str_contains($string, 'literal'). The PCRE engine used by the preg_* functions has significant overhead, and special characters in the pattern need proper escaping.

95% confidence
A

String.matches() in Java treats the entire string as a regex pattern and requires the whole input to match (it wraps the pattern in ^ and $). String.contains() checks if a literal substring exists anywhere. For example:

  - "hello world".matches("world") returns false (the entire string doesn't match)
  - "hello world".contains("world") returns true
  - "hello world".matches(".*world.*") returns true (regex needed for a partial match)

95% confidence
A

When using regex functions with user-provided strings as patterns:

  - Users can inject malicious regex patterns causing ReDoS (Regular Expression Denial of Service)
  - Special characters can cause unexpected matches
  - Unintended pattern interpretation may bypass validation

Example: If code does re.match(user_input, text) and a user provides (a+)+b, it can cause catastrophic backtracking. Always use re.escape() or prefer literal string methods when dealing with user input.
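
A small, deliberately capped Python demonstration of the exponential blow-up; keep n small, since each extra character roughly doubles the running time:

    import re
    import time

    evil = "(a+)+b"                # pattern as if supplied by a user
    for n in (14, 16, 18):         # kept small on purpose
        t0 = time.perf_counter()
        re.match(evil, "a" * n)    # no 'b' at the end forces backtracking
        print(n, round(time.perf_counter() - t0, 4), "s")
    # Mitigation: re.escape() the input, or avoid regex entirely for literals.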

95% confidence
A

This bug occurs when developers use regular expressions with string literal patterns (e.g., re.match("literal", text)) instead of using direct string comparison methods (e.g., text == "literal" or str.contains("literal")). This pattern is inefficient because regex engines incur significant overhead even for simple literal matching, and can lead to unexpected behavior with special regex characters.
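
In Python, the literal alternatives are both clearer and faster; a side-by-side sketch:

    import re

    text = "status=ok"
    # Literal checks - clear intent, no metacharacter surprises:
    print(text == "status=ok")        # equality
    print("ok" in text)               # substring
    print(text.startswith("status"))  # prefix
    # Regex equivalent pays pattern-compilation overhead for no benefit:
    print(re.match(r"status=ok", text) is not None)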

95% confidence
A

Databases provide different operators:

  - SQL: LIKE or = for literal, REGEXP/SIMILAR TO for regex (PostgreSQL, MySQL)
  - MongoDB: $eq for literal, $regex for pattern matching
  - Elasticsearch: term query for literal, regexp query for patterns

Literal matching is always significantly faster and can use indexes. Regex matching typically cannot use indexes (except prefix matching) and requires full document scans.

95% confidence
A

String.prototype.match() accepts a regex and returns an array with match details or null, while String.prototype.includes() accepts a string and returns a boolean. Using match() for literal strings (e.g., "hello".match("ell")) is inefficient because it internally creates a RegExp object. The includes() method (ES6) or indexOf() (pre-ES6) should be used for literal substring checks.

95% confidence
A

Most languages provide both literal and regex-based splitting:

  - Python: str.split() (literal) vs re.split() (regex)
  - JavaScript: String.split() (literal or regex depending on the argument)
  - Java: String.split() (regex) vs StringTokenizer (literal)

Using regex split with literal patterns is inefficient. For example, in Java, text.split(",") compiles a regex pattern. Python's str.split() is faster and doesn't interpret special characters.
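
A Python comparison showing when each is appropriate:

    import re

    line = "a, b,c"
    print(line.split(","))           # ['a', ' b', 'c'] - literal comma only
    print(re.split(r",\s*", line))   # ['a', 'b', 'c'] - regex earns its cost
    # For a plain single-character delimiter, str.split() is the faster choice.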

95% confidence
A

Python's in operator performs substring checking and returns a boolean. It's semantically clear and highly performant. re.search() or re.match() with a string literal pattern is 50-200x slower and can behave unexpectedly with special characters. The in operator is the idiomatic Python approach for literal substring checks.
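
The gap is easy to measure with timeit; exact ratios vary by input size and Python version:

    import re
    import timeit

    haystack = "abcdefghij" * 100
    needle = "defg"

    t_in = timeit.timeit(lambda: needle in haystack, number=100_000)
    t_re = timeit.timeit(lambda: re.search(needle, haystack), number=100_000)
    print(f"'in': {t_in:.3f}s   re.search: {t_re:.3f}s")  # 'in' is far faster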

95% confidence
A

String.equals() performs exact character-by-character comparison (O(n), where n is the string length). String.matches() interprets the argument as a regex and requires a full string match. For equality checks:

  - str.equals("literal") - correct and fast
  - str.matches("literal") - inefficient, and incorrect if "literal" contains special regex characters

95% confidence

Troubleshooting

1 question
A

Based on the log files I examined, this is actually a false positive error pattern detection issue, not a real DNS problem.

The Issue

The command curl --version is executing, but the TerminalBench agent error pattern matcher flags the output as a DNS resolution error because it sees the text "Could not resolve host: --version". There is no real DNS failure here: no hostname lookup was ever intended; curl simply treated --version as a host to resolve.

Looking at the actual agent log:
curl --version; tmux wait -S done
curl: (6) Could not resolve host: --version

This shows that curl is interpreting --version as a hostname to fetch, not as a flag. The curl --version command should display version info directly, so something in the environment is preventing curl from recognizing the option.

Root Cause

This happens when:

  1. The option text is malformed - for example, the dashes are Unicode en dashes from copy-paste, so curl parses --version as a URL rather than a flag
  2. Shell aliasing issues - Something has aliased curl to append arguments
  3. PATH issues - A different curl binary is being executed

Real Solutions

If --version is being treated as a hostname:

  1. Use absolute path to curl:
    /usr/bin/curl --version

  2. Check for curl aliases:
    type curl
    alias curl
    which curl

  3. Retype the command by hand to rule out non-ASCII dashes or hidden characters introduced by copy-paste:
    curl --version

  4. Verify curl is actually installed:
    dpkg -l curl # Debian/Ubuntu
    rpm -qa curl # RHEL/CentOS

Note: If this is from TerminalBench automation, the issue is likely in how the agent sends commands to the container - it may need to quote or escape flags properly.

95% confidence