Architecture notes
If you're interested in contributing, here are some general architectural notes that will hopefully help you find your way around the code base. (By the way, if you'd like to see more detail about specific aspects or topics, please let us know by creating a GitHub issue in the rushstack-websites documentation monorepo.)
Project anatomy
API Extractor's code is separated into source code folders that reflect subsystems that can be arranged into a rough overall operational flow.
src/cli - the command-line interface (CLI) that gets things started
src/api - this folder contains the public API, such as Extractor and ExtractorConfig. The CLI invokes these APIs the same way that an external consumer would; it doesn't use any special internals. The TypeScript compiler gets configured in this stage, producing the ts.Program object that will be used below.
src/collector - the Collector acts as a central orchestrator that runs many of the stages below. Conceptually it is "collecting" all the API information in a central place, primarily CollectorEntity objects. This folder also has the MessageRouter class, which routes errors and warnings based on the "messages" table from api-extractor.json.
src/analyzer - the core analyzer, which traverses the TypeScript compiler's abstract syntax tree (AST) and produces the higher-level representations used by API Extractor. There are four major pieces of tech here:
- The AstSymbol and AstDeclaration classes, which mirror the compiler's ts.Symbol and ts.Declaration classes. The difference is that an AstDeclaration node is only generated for a subset of interesting nodes (e.g. classes, enums, interfaces) that will become "API items" in the documentation website and its api-extractor-model representation. This condensed tree omits all the intermediary ts.Declaration nodes (e.g. extends clauses, : tokens, and so forth). The code comments in AstSymbol.ts provide more detail about this very important data structure.
- The ExportAnalyzer, which traverses chains of TypeScript import statements, eliminating the intermediary symbol aliases to build the flattened view seen in the .d.ts rollup. The problem is that the compiler's API makes it difficult to detect when this traversal leaves the working package (e.g. hops into the node_modules folder or the compiler's runtime library). That's why this file has special handling for each kind of import syntax. The export * from construct is by far the most complicated form.
- The Span class, which is a fairly lame but fairly effective utility for rewriting TypeScript source code while ignoring most of its meaning, except for specific node types that we recognize. API Extractor does not use the compiler's emitter to write .d.ts files, partly because those APIs were not public when we started, but also because this approach more faithfully preserves the original .d.ts inputs. The DtsRollupGenerator._modifySpan() function is a good illustration of how Span is used.
- The AstReferenceResolver: given a TSDoc declaration reference, this walks the AstSymbolTable to find whatever it refers to.
src/enhancers - after the Collector has collected all the API objects and their metadata, we run a series of additional postprocessing stages called enhancers. The current ones are ValidationEnhancer (which applies some API validation rules) and DocCommentEnhancer (which tunes up the TSDoc comments, for example by expanding @inheritDoc references).
src/generators - this folder implements API Extractor's three famous output types: ApiReportGenerator, DtsRollupGenerator, and ApiModelGenerator.
src/schemas - this folder contains the api-extractor init template file, the JSON schema for api-extractor.json, and api-extractor-defaults.json, which represents the default values for api-extractor.json settings.
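The Span approach listed under src/analyzer can be illustrated with a much smaller model. The sketch below is hypothetical and is not the real Span class: MiniSpan, its kind/prefix/children/suffix fields, and the SpanKind labels are assumptions made for illustration. It shows the general idea: source text is split into a tree of spans, only the node kinds we recognize get modified, and everything else is emitted verbatim.

```typescript
// Hypothetical, simplified model of span-based rewriting.
// Not the real API Extractor Span class; all names are illustrative only.

type SpanKind = 'SourceFile' | 'ExportKeyword' | 'Other';

class MiniSpan {
  public readonly kind: SpanKind;
  public prefix: string;
  public readonly children: MiniSpan[];
  public suffix: string;

  public constructor(kind: SpanKind, prefix: string, children: MiniSpan[] = [], suffix: string = '') {
    this.kind = kind;
    this.prefix = prefix;
    this.children = children;
    this.suffix = suffix;
  }

  // Emit the text: prefix, then each child in order, then suffix.
  public getModifiedText(): string {
    return this.prefix + this.children.map((child) => child.getModifiedText()).join('') + this.suffix;
  }

  // Visit every span depth-first so a caller can edit only the kinds it recognizes.
  public forEach(callback: (span: MiniSpan) => void): void {
    callback(this);
    for (const child of this.children) {
      child.forEach(callback);
    }
  }
}

// A hand-built span tree for: "export declare function add(a: number): void;"
const root = new MiniSpan('SourceFile', '', [
  new MiniSpan('ExportKeyword', 'export '),
  new MiniSpan('Other', 'declare function add(a: number): void;')
]);

// Example rewrite: drop the "export" keyword (as a rollup might for a local declaration),
// leaving all unrecognized text untouched.
root.forEach((span) => {
  if (span.kind === 'ExportKeyword') {
    span.prefix = '';
  }
});

console.log(root.getModifiedText()); // prints "declare function add(a: number): void;"
```

A real implementation would build the span tree from the compiler's AST rather than by hand; the hand-built tree here only keeps the example self-contained.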
Data flow
Another useful way to understand API Extractor is by examining what happens to a declaration as it
gets transformed by each stage. Consider a simple function declaration that has two overloads:
import { Report } from 'reporting-package';
/** Declaration 1 */
export declare function add(report: Report, amount: number): void;
/** Declaration 2 */
export declare function add(report: Report, title: string): void;
Here's how it gets processed:
Compiler stage: The TypeScript compiler engine parses the .d.ts file into two ts.Declaration objects (one for each overload) representing the parsed syntax. The compiler's analyzer then makes an associated ts.Symbol, which represents the function's type. Each TypeScript type always becomes exactly one symbol, in this case with two associated declarations (the two overloads). There will also be many "aliases" for this symbol. For example, if we write import { add } from "./math", the word add there becomes a symbol alias whose declaration is that import statement. If we follow the chain of symbol aliases (perhaps through many imports and exports), we will always reach the unique "followed symbol" corresponding to the original real definition of add().
Analyzer stage: API Extractor starts from your API entry point and follows each export to find its "followed symbol". Then we make one AstSymbol and two AstDeclaration objects for add(). The analyzer also walks up and down the AST to fill out the context. For example, if the AstSymbol is a class, then we'll create a child AstSymbol for each of its members. And if the class belongs to a namespace, then a parent AstSymbol is added to represent the namespace.
While following import statements, if we reach an external NPM package, the analysis stops there and produces an AstImport instead of a regular AstSymbol. This is because API Extractor understands package boundaries, and in fact is designed to be invoked separately on each project. Thus, in the above example, Report would become an AstImport instead of an AstSymbol. The analyzer's overall job is to pick through the extremely detailed compiler data structures and produce a simplified tree of AstSymbol objects. This algorithm is the most complex stage of API Extractor, so we try to keep it isolated and single-purpose.
Collector stage: The collector builds the inventory of things that will end up as top-level items in the .d.ts rollup. We call these CollectorEntity objects: there is one for our add() function, and another one for the Report import. So both AstSymbol and AstImport can become a CollectorEntity. Note, however, that AstDeclaration cannot, nor can AstModule (the analyzer's representation of a .d.ts source file). To keep this straight, the analyzer's objects inherit from the AstEntity base class if and only if they can become a CollectorEntity. The CollectorEntity wraps an AstEntity and appends some additional collector-stage information:
- Whether the entity is an export of your .d.ts rollup or just a local declaration.
- The local name in the .d.ts rollup, since local declarations may need to be renamed by DtsRollupGenerator._makeUniqueNames() to avoid naming conflicts.
- The export name(s), which can differ from the local name. For example: export { A as B, A as C }.
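The alias-following and collection steps described above can be illustrated with a toy model. This is a hypothetical sketch, not API Extractor's real code: the FakeSymbol shape, its packageName field, the followAliases() and collect() helpers, the package name "my-package", and the "_2" renaming scheme are all assumptions. It follows an alias chain to the "followed symbol", stops with an import-like record when the chain leaves the working package, and then wraps each result in an entity with a unique local name.

```typescript
// Hypothetical toy model of the analyzer and collector stages.
// None of these types or functions are API Extractor's real classes.

interface FakeSymbol {
  name: string;
  packageName: string; // the NPM package whose source declares this symbol
  aliasTarget?: FakeSymbol; // set when this symbol is an import/export alias
  declarationCount: number; // e.g. 2 for a function with two overloads
}

type AstEntitySketch =
  | { kind: 'AstSymbol'; name: string; declarationCount: number }
  | { kind: 'AstImport'; name: string; modulePath: string };

// Follow the alias chain to the "followed symbol", but stop as soon as the
// chain leaves the working package and record an import instead.
function followAliases(symbol: FakeSymbol, workingPackage: string): AstEntitySketch {
  let current: FakeSymbol = symbol;
  while (true) {
    if (current.packageName !== workingPackage) {
      return { kind: 'AstImport', name: current.name, modulePath: current.packageName };
    }
    if (current.aliasTarget === undefined) {
      return { kind: 'AstSymbol', name: current.name, declarationCount: current.declarationCount };
    }
    current = current.aliasTarget;
  }
}

interface CollectorEntitySketch {
  astEntity: AstEntitySketch;
  exported: boolean;
  localName: string;
  exportNames: string[];
}

// Wrap each entity and give it a local name that is unique within the rollup
// (the "_2", "_3", ... suffix scheme is an assumption for illustration).
function collect(entries: { entity: AstEntitySketch; exportNames: string[] }[]): CollectorEntitySketch[] {
  const usedNames = new Set<string>();
  return entries.map(({ entity, exportNames }) => {
    let localName = entity.name;
    for (let i = 2; usedNames.has(localName); i++) {
      localName = `${entity.name}_${i}`;
    }
    usedNames.add(localName);
    return { astEntity: entity, exported: exportNames.length > 0, localName, exportNames };
  });
}

// The add() example: "add" is re-exported from the entry point, "Report" is imported
// from another package, and a hypothetical local "Report" collides with it.
const PKG = 'my-package';
const reportDef: FakeSymbol = { name: 'Report', packageName: 'reporting-package', declarationCount: 1 };
const reportImport: FakeSymbol = { name: 'Report', packageName: PKG, aliasTarget: reportDef, declarationCount: 1 };
const addDef: FakeSymbol = { name: 'add', packageName: PKG, declarationCount: 2 };
const addExport: FakeSymbol = { name: 'add', packageName: PKG, aliasTarget: addDef, declarationCount: 1 };
const localReport: FakeSymbol = { name: 'Report', packageName: PKG, declarationCount: 1 };

const entities = collect([
  { entity: followAliases(addExport, PKG), exportNames: ['add'] },
  { entity: followAliases(reportImport, PKG), exportNames: [] },
  { entity: followAliases(localReport, PKG), exportNames: [] }
]);

console.log(entities.map((e) => `${e.astEntity.kind} ${e.localName}`).join(', '));
// prints "AstSymbol add, AstImport Report, AstSymbol Report_2"
```

Keeping the package-boundary check inside the alias loop mirrors the point made above: the traversal must notice when it leaves the working package before it tries to analyze anything further.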
Enhancers stage: The enhancers mostly work with the DeclarationMetadata, ApiItemMetadata, and SymbolMetadata objects. These objects are stored on AstSymbol and AstDeclaration, but they are entirely owned by the collector stage.
ApiReportGenerator and DtsRollupGenerator: These generators essentially just dump the CollectorEntity items into a big text file, but with different formatting. Other than trimming items according to release type, they don't do much processing.
api-extractor-model stage: The @microsoft/api-extractor-model package is completely independent and does not rely on any of the other API Extractor types described above. It defines the portable .api.json file format. It has its own rich hierarchy inheriting from the ApiItem base class (using mixin inheritance, actually): ApiClass, ApiNamespace, ApiParameter, etc. In our example, the add() function will become an ApiFunction item in this representation. This model is designed to make it easy for third parties to generate documentation without having to understand the thorny compiler data structures. Thus the ApiModelGenerator takes our CollectorEntity for add() and converts it into an ApiFunction that will get serialized into .api.json.
Recall that the analyzer internally used the AstReferenceResolver helper to look up TSDoc declaration references and find the target AstDeclaration. For the .api.json files, @microsoft/api-extractor-model provides an analogous ModelReferenceResolver helper that looks up ApiItem targets.
API Documenter stage: Okay, one final transformation happens here. It's the last one! :-) When API Documenter loads up the .api.json file, it does not render it directly to .md files. First it converts the ApiFunction for our add() example function into a tree of TSDoc DocNode elements. Normally DocNode is used to represent doc comments, but it happens to be a full DOM-like structure that can represent rich text. Since the TSDoc comment for add() is already this kind of rich text, API Documenter cleverly reuses this representation to model an entire web page. This intermediate representation enables the markdown emitter to be decoupled from the documentation engine, and makes it easy in the future to output other formats such as HTML or React.
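The decoupling that the API Documenter stage relies on can be sketched with a tiny DOM-like tree and two independent emitters. This is a hypothetical illustration: the MiniDocNode type and the buildFunctionPage(), emitMarkdown(), and emitHtml() functions are assumptions, and they are not the real @microsoft/tsdoc DocNode classes or API Documenter's emitters. The point is that the page is described once as a tree, and each output format only needs its own small emitter.

```typescript
// Hypothetical, simplified stand-in for a TSDoc DocNode tree.
// Not the real @microsoft/tsdoc classes or API Documenter's emitters.
type MiniDocNode =
  | { kind: 'Section'; children: MiniDocNode[] }
  | { kind: 'Heading'; title: string }
  | { kind: 'Paragraph'; children: MiniDocNode[] }
  | { kind: 'PlainText'; text: string }
  | { kind: 'CodeSpan'; code: string };

// The documentation engine describes the page as a tree, with no knowledge of the output format.
function buildFunctionPage(name: string, summary: string, signature: string): MiniDocNode {
  return {
    kind: 'Section',
    children: [
      { kind: 'Heading', title: `${name}() function` },
      { kind: 'Paragraph', children: [{ kind: 'PlainText', text: summary }] },
      { kind: 'Paragraph', children: [{ kind: 'CodeSpan', code: signature }] }
    ]
  };
}

// Markdown emitter: it only needs to understand the node kinds.
function emitMarkdown(node: MiniDocNode): string {
  switch (node.kind) {
    case 'Section':
      return node.children.map(emitMarkdown).join('');
    case 'Heading':
      return `## ${node.title}\n\n`;
    case 'Paragraph':
      return node.children.map(emitMarkdown).join('') + '\n\n';
    case 'PlainText':
      return node.text;
    case 'CodeSpan':
      return '`' + node.code + '`';
  }
}

function escapeHtml(text: string): string {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// A second emitter for HTML reuses the same tree unchanged.
function emitHtml(node: MiniDocNode): string {
  switch (node.kind) {
    case 'Section':
      return `<section>${node.children.map(emitHtml).join('')}</section>`;
    case 'Heading':
      return `<h2>${escapeHtml(node.title)}</h2>`;
    case 'Paragraph':
      return `<p>${node.children.map(emitHtml).join('')}</p>`;
    case 'PlainText':
      return escapeHtml(node.text);
    case 'CodeSpan':
      return `<code>${escapeHtml(node.code)}</code>`;
  }
}

const page = buildFunctionPage('add', 'Declaration 1', 'add(report: Report, amount: number): void');
console.log(emitMarkdown(page));
console.log(emitHtml(page));
```

Adding another format in this model means writing one more emitter over MiniDocNode; buildFunctionPage() does not change.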
To summarize, for the humble add() function this pipeline produced a number of different representations:
- AstDeclaration for the overload declarations
- AstSymbol for the TypeScript type
- CollectorEntity for the entry in the .d.ts file
- DeclarationMetadata, ApiItemMetadata, and SymbolMetadata to annotate the symbol and declaration with more info
- ApiFunction for the .api.json file
- DocNode subtree for the documentation website
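The ApiReportGenerator and DtsRollupGenerator stage described above does little processing beyond trimming items according to release type. As a rough illustration of what trimming means (for example, when producing a public-only .d.ts rollup), the sketch below treats release tags as an ordered scale and filters against a target. It is not DtsRollupGenerator's actual logic. ReleaseTagSketch, RollupItem, trimForRelease(), and the "subtract" and "debugDump" items are hypothetical. Only the relative ordering Internal < Alpha < Beta < Public is modeled on the ReleaseTag enum in @microsoft/api-extractor-model.

```typescript
// Hypothetical sketch of release-type trimming for .d.ts rollup variants.
// The ordering (Internal < Alpha < Beta < Public) is modeled on ReleaseTag in
// @microsoft/api-extractor-model; the names and numbers here are illustrative.
const ReleaseTagSketch = {
  Internal: 1,
  Alpha: 2,
  Beta: 3,
  Public: 4
} as const;
type ReleaseTagSketch = (typeof ReleaseTagSketch)[keyof typeof ReleaseTagSketch];

interface RollupItem {
  name: string;
  releaseTag: ReleaseTagSketch;
}

// Keep only items whose release tag is at least as stable as the target release.
function trimForRelease(items: RollupItem[], target: ReleaseTagSketch): RollupItem[] {
  return items.filter((item) => item.releaseTag >= target);
}

// "subtract" and "debugDump" are hypothetical neighbors of add() with other release tags.
const rollupItems: RollupItem[] = [
  { name: 'add', releaseTag: ReleaseTagSketch.Public },
  { name: 'subtract', releaseTag: ReleaseTagSketch.Beta },
  { name: 'debugDump', releaseTag: ReleaseTagSketch.Internal }
];

console.log(trimForRelease(rollupItems, ReleaseTagSketch.Public).map((i) => i.name).join(', ')); // prints "add"
console.log(trimForRelease(rollupItems, ReleaseTagSketch.Beta).map((i) => i.name).join(', ')); // prints "add, subtract"
```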