Unreachable code

In computer programming, unreachable code is part of the source code of a program which can never be executed because there exists no control flow path to the code from the rest of the program.^[1]

Unreachable code is sometimes also called dead code,^[2]^[3] although dead code may also refer to code that is executed but has no effect on the output of a program.^[4]

Unreachable code is generally considered undesirable for several reasons:

- It uses memory unnecessarily.
- It can cause unnecessary use of the CPU's instruction cache.
  - This can also decrease data locality.
- Time and effort may be spent testing, maintaining and documenting code which is never used.
  - Sometimes an automated test is the only thing using the code.

Unreachable code can have some legitimate uses, like providing a library of functions for calling or jumping to manually via a debugger while the program is halted after a breakpoint. This is particularly useful for examining and pretty-printing the internal state of the program. It may make sense to have such code in the shipped product, so that a developer can attach a debugger to a client's running instance.
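As a rough illustration of such a debugger-only helper, the sketch below defines a pretty-printer that nothing in the program calls. The `queue` structure and the `dump_queue` name are hypothetical and exist only for this example. Whether such a function survives into the shipped binary depends on build settings that are not shown here.

```c
#include <stdio.h>

/* Hypothetical application state, used only for this illustration. */
struct queue {
    int items[8];
    int count;
};

/*
 * Debugger-only helper: nothing in the program calls it, so it is
 * unreachable during normal execution. A developer could invoke it by
 * hand while stopped at a breakpoint, for example in GDB:
 *     (gdb) call dump_queue(&q)
 * Returns the number of items printed.
 */
int dump_queue(const struct queue *q)
{
    int i;

    fprintf(stderr, "queue(count=%d):", q->count);
    for (i = 0; i < q->count; i++)
        fprintf(stderr, " %d", q->items[i]);
    fputc('\n', stderr);
    return q->count;
}
```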

Causes

Unreachable code can exist for many reasons, such as:

- Programming errors in complex conditional branches
- A consequence of the internal transformations performed by an optimizing compiler
- Incomplete testing of new or modified code
- Legacy code
  - Code superseded by another implementation
  - Unreachable code that a programmer decided not to delete because it is mingled with reachable code
  - Potentially reachable code that current use cases never need
  - Dormant code that is kept intentionally in case it is needed later
- Code used only for debugging
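The first cause in the list, an error in a conditional branch, might look like the following sketch. The `classify` function and its thresholds are made up for illustration. The second test is subsumed by the first, so its branch can never run.

```c
/* Classifies a temperature reading; the thresholds are illustrative only. */
const char *classify(int celsius)
{
    if (celsius > 30)
        return "hot";
    else if (celsius > 40)
        return "extreme";   /* unreachable: any value > 40 is also > 30 */
    else
        return "normal";
}
```

Testing the stricter condition first (`celsius > 40` before `celsius > 30`) would make all three branches reachable.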

Legacy code is code that was once useful but is no longer used or required. However, unreachable code may also be part of a complex library, module or routine, where it is useful to other callers or under conditions that are not met in a particular scenario.

An example of such conditionally unreachable code is the implementation of a general string formatting function in a compiler's runtime library, which contains complex code to process all possible arguments, of which only a small subset is actually used. Compilers will typically not be able to remove the unused code sections at compile time, as the behavior is largely determined by the values of arguments at run time.
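A much smaller version of the same situation can be sketched as follows. The `format_long` and `render` functions are hypothetical. Even if a given program only ever requests decimal output, the specifier is an ordinary run-time value, so a compiler generally has to keep the hexadecimal and octal branches.

```c
#include <stdio.h>

/*
 * Hypothetical mini formatter: writes v into buf according to spec
 * ('d' decimal, 'x' hexadecimal, 'o' octal). Returns the snprintf
 * result, or -1 for an unknown specifier.
 */
int format_long(char *buf, size_t size, char spec, long v)
{
    switch (spec) {
    case 'd': return snprintf(buf, size, "%ld", v);
    case 'x': return snprintf(buf, size, "%lx", (unsigned long)v);
    case 'o': return snprintf(buf, size, "%lo", (unsigned long)v);
    default:  return -1;
    }
}

/*
 * The specifier arrives at run time (for example from user input), so
 * the branches that this particular program never exercises cannot be
 * removed at compile time.
 */
int render(char *buf, size_t size, char spec, long v)
{
    return format_long(buf, size, spec, v);
}
```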

Examples

In this fragment of C code:

int foo (int X, int Y)
{
    return X + Y;
    int Z = X * Y;
}

the definition int Z = X * Y; is never reached, as the function always returns before it. Therefore, Z need be neither allocated storage nor initialized.

goto fail bug

Apple's SSL/TLS implementation, as of February 2014, contained a major security flaw known formally as CVE-2014-1266 and informally as the "goto fail bug".^[5]^[6] The relevant code fragment^[7] is:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
    OSStatus        err;
    ...
 
    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...
 
fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

Here, there are two successive goto fail statements. Under the syntax of the C language, only the first belongs to the if statement; the second is unconditional, regardless of its indentation, and hence always skips the call to SSLHashSHA1.final. As a consequence, err will hold the status of the SHA1 update operation, and signature verification will never fail.^[5]

The unreachable code is the call to the final function.^[6] Compiling with Clang and the option -Weverything enables unreachable code analysis, which would raise a warning for this code.^[6]
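The control flow of the bug can be reproduced in a few lines. In the sketch below, `update_ok` and `final_fails` are hypothetical stand-ins for the hash steps and are not part of Apple's code. The second function is only an illustration of how braces would confine a stray duplicate statement; it is not a description of the actual fix.

```c
/* Hypothetical stand-ins for the hash steps; 0 means success. */
static int update_ok(void)   { return 0; }
static int final_fails(void) { return -1; }

/* Same shape as the bug: the duplicated goto is not part of the if. */
int verify_buggy(void)
{
    int err;

    if ((err = update_ok()) != 0)
        goto fail;
        goto fail;                   /* always taken */
    if ((err = final_fails()) != 0)  /* unreachable */
        goto fail;

fail:
    return err;
}

/* With braces, a duplicated goto stays inside the if body. */
int verify_braced(void)
{
    int err;

    if ((err = update_ok()) != 0) {
        goto fail;
        goto fail;                   /* unreachable, but harmless */
    }
    if ((err = final_fails()) != 0) {
        goto fail;
    }

fail:
    return err;
}
```

In this sketch `verify_buggy` reports success even though the final step would have failed, while `verify_braced` propagates the failure.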

C++

In C++, some constructs are specified to have undefined behavior. A compiler is free to implement any behavior or none, and typically an optimizing compiler will assume the code is unreachable.^[8]
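The same reasoning applies in C, which is used here to match the other examples. GCC and Clang provide the `__builtin_unreachable()` extension, whose only effect is to make reaching that point undefined behavior, so the optimizer is allowed to treat the point as unreachable. The `level` enumeration and `weight` function below are hypothetical, and the sketch assumes one of those compilers.

```c
enum level { LOW, MID, HIGH };

/* Maps a level to a weight; the values are made up for illustration. */
int weight(enum level l)
{
    switch (l) {
    case LOW:  return 1;
    case MID:  return 5;
    case HIGH: return 9;
    }
    /*
     * GCC/Clang extension: reaching this point is undefined behavior,
     * so the optimizer may assume it never happens and emit no code
     * for it. Passing a value outside the enumeration is a bug.
     */
    __builtin_unreachable();
}
```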

Analysis

Detection of unreachable code is a form of control flow analysis to find code that can never be reached in any possible program state. In some languages (e.g. Java^[9]) some forms of unreachable code are explicitly disallowed. The optimization that removes unreachable code is known as dead code elimination.
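The simplest form of this analysis is a graph traversal over the control flow graph: any basic block that cannot be reached from the entry block is unreachable. The sketch below assumes a toy representation (the `cfg` structure, with at most two successors per block) that is invented for this example. It looks only at the shape of the graph and does not reason about branch conditions.

```c
#include <stdbool.h>

#define MAX_BLOCKS 8
#define NO_SUCC    (-1)

/* Each basic block has up to two successors (fall-through and branch). */
struct cfg {
    int nblocks;
    int succ[MAX_BLOCKS][2];
};

/* Marks every block reachable from `entry`, using an explicit worklist. */
void mark_reachable(const struct cfg *g, int entry, bool reached[MAX_BLOCKS])
{
    int stack[MAX_BLOCKS];
    int top = 0;
    int i;

    for (i = 0; i < MAX_BLOCKS; i++)
        reached[i] = false;

    reached[entry] = true;
    stack[top++] = entry;

    while (top > 0) {
        int b = stack[--top];
        for (i = 0; i < 2; i++) {
            int s = g->succ[b][i];
            /* A block is pushed at most once, so the stack cannot overflow. */
            if (s != NO_SUCC && !reached[s]) {
                reached[s] = true;
                stack[top++] = s;
            }
        }
    }
}
```

Blocks left unmarked after the traversal have no path from the entry block and are candidates for removal.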

Code may become unreachable as a consequence of transformations performed by an optimizing compiler (e.g., common subexpression elimination).

In practice, the sophistication of the analysis has a significant impact on the amount of unreachable code that is detected. For example, constant folding and simple flow analysis show that the inside of the if-statement in the following code is unreachable:

int N = 2 + 1;

if (N == 4)
{
    /* unreachable */
}

However, a great deal more sophistication is needed to work out that the corresponding block is unreachable in the following code:

double X = sqrt(2);

if (X > 5)
{
    /* unreachable */
}

Unreachable code elimination is in the same class of optimizations as dead code elimination and redundant code elimination.

Unreachability vs. profiling

In some cases, a practical approach may be a combination of simple unreachability criteria and use of a profiler to handle the more complex cases. Profiling in general cannot prove anything about the unreachability of a piece of code, but may be a good heuristic for finding potentially unreachable code. Once a suspect piece of code is found, other methods, such as a more powerful code analysis tool, or even analysis by hand, could be used to decide whether the code is truly unreachable.
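A minimal sketch of the profiling side, assuming hand-inserted counters rather than a real profiler, is shown below. The `PROBE` macro, the `hits` array and the `clamp_percent` function are hypothetical. A counter that stays at zero only marks a candidate: here the third probe is perfectly reachable, but only for inputs above 100.

```c
#define NUM_PROBES 3

static unsigned long hits[NUM_PROBES];  /* one counter per probe */
#define PROBE(id) (hits[(id)]++)

/* Clamps v to the range 0..100, counting which paths are taken. */
int clamp_percent(int v)
{
    PROBE(0);
    if (v < 0) {
        PROBE(1);
        return 0;
    }
    if (v > 100) {
        PROBE(2);   /* reachable, but only for inputs above 100 */
        return 100;
    }
    return v;
}

/* Counts probes that never fired: candidates only, not proof. */
int count_cold_probes(void)
{
    int i, cold = 0;

    for (i = 0; i < NUM_PROBES; i++)
        if (hits[i] == 0)
            cold++;
    return cold;
}
```

After a run that only supplies values up to 100, one probe is reported as cold, and a single later call with a larger value shows that the code behind it was reachable all along.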

References

[1] Debray, Saumya K.; Evans, William; Muth, Robert; De Sutter, Bjorn (1 March 2000). "Compiler techniques for code compaction". ACM Transactions on Programming Languages and Systems. 22 (2): 378–415. CiteSeerX 10.1.1.43.7215. doi:10.1145/349214.349233. S2CID 6129772.
[2] RTCA/DO-178C Software Considerations in Airborne Systems and Equipment Certification. RTCA, Inc. 2011. p. 112. Retrieved 2019-06-11. Dead code – Executable Object Code (or data) which exists as a result of a software development error but cannot be executed (code) or used (data) in any operational configuration of the target computer environment. It is not traceable to a system or software requirement. The following exceptions are often mistakenly categorized as dead code but are necessary for implementation of the requirements/design: embedded identifiers, defensive programming structures to improve robustness, and deactivated code such as unused library functions. [Since requirements-based review should identified such code as untraceable to functional requirements, static code analysis should identify such code as unreachable, and structural coverage analysis of requirements-based testing results should identify such code as unreachable, presence of unjustified dead code in a project should raise consideration of the effectiveness of the organization's development and verification processes.]
[3] Jay Thomas (24 January 2017). "Requirements Traceability Forms the Foundation for Thorough Software Testing". Retrieved 2019-06-11. The combination of requirements traceability with coverage analysis can also turn up areas of "dead code," or code that's never executed. This code can mostly be an inconvenience, but it can also be a security threat if a hacker can gain access and from there gain control. It's code that can't be traced and should therefore be eliminated.
[4] MISRA Consortium (March 2013). MISRA C:2012 Guidelines for the use of the C language in critical systems. MIRA Limited. p. 41. Retrieved 2019-06-11. Rule 2.2 there shall be no dead code. Any operation that is executed but whose removal would not affect program behavior constitutes dead code.
[5] Adam Langley (2014). "Apple's SSL/TLS bug".
[6] Arie van Deursen (2014). "Learning from Apple's #gotofail Security Bug".
[7] "sslKeyExchange.c - Source code for support for key exchange and server key exchange".
[8] "MSC15-C. Do not depend on undefined behavior". Carnegie Mellon University. 2020. Retrieved 28 September 2020. Because compilers are not obligated to generate code for undefined behavior, these behaviors are candidates for optimization.
[9] "Java Language Specification".

Appel, A. W. 1998 Modern Compiler Implementation in Java. Cambridge University Press.
Muchnick S. S. 1997 Advanced Compiler Design and Implementation. Morgan Kaufmann.
