Translation unit (programming)

In C and C++ programming language terminology, a translation unit (or more casually a compilation unit) is the ultimate input to a C or C++ compiler from which an object file is generated.[1] A translation unit roughly consists of a source file after it has been processed by the C preprocessor, meaning that header files listed in #include directives are literally included, sections of code within #ifndef may be included, and macros have been expanded.

Context

A C program consists of units called source files (or preprocessing files), which, in addition to source code, includes directives for the C preprocessor. A translation unit is the output of the C preprocessor – a source file after it has been preprocessed.

Preprocessing notably consists of expanding a source file to recursively replace all #include directives with the literal file declared in the directive (usually header files, but possibly other source files); the result of this step is a preprocessing translation unit. Further steps include macro expansion of #define directives, and conditional compilation of #ifdef directives, among others; this translates the preprocessing translation unit into a translation unit. From a translation unit, the compiler generates an object file, which can be further processed and linked (possibly with other object files) to form an executable program.

Note that the preprocessor is in principle language agnostic, and is a lexical preprocessor, working at the lexical analysis level – it does not do parsing, and thus is unable to do any processing specific to C syntax. The input to the compiler is the translation unit, and thus it does not see any preprocessor directives, which have all been processed before compiling starts. While a given translation unit is fundamentally based on a file, the actual source code fed into the compiler may appear substantially different than the source file that the programmer views, particularly due to the recursive inclusion of headers.

Scope

Translation units define a scope, roughly file scope, and functioning similarly to module scope; in C terminology this is referred to as internal linkage, which is one of the two forms of linkage in C. Names (functions and variables) declared outside of a function block may be visible either only within a given translation unit, in which case they are said to have internal linkage – they are not visible to the linker – or may be visible to other object files, in which case they are said to have external linkage, and are visible to the linker.

C does not have a notion of modules. However, separate object files (and hence also the translation units used to produce object files) function similarly to separate modules, and if a source file does not include other source files, internal linkage (translation unit scope) may be thought of as "file scope, including all header files".

Code organization

The bulk of a project's code is typically held in files with a .c suffix (or .cpp, .cxx or .cc for C++, of which .cpp is used most conventionally). Files intended to be included typically have a .h suffix ( .hpp or .hh are also used for C++, but .h is the most common even for C++), and generally do not contain function or variable definitions to avoid name conflicts when headers are included in multiple source files, as is often the case. Header files can be, and often are, included in other header files. It is standard practice for all .c files in a project to include at least one .h file.

See also

References