Code-Driven Documentation: How to Eliminate Errors in Configuration Documentation
This document shows how to use code to improve the accuracy of configuration documentation. It is written especially for technical writers who might not be well-versed in coding but are responsible for creating or reviewing documentation of configuration files. By following the examples in this document, writers can verify that their configuration documentation is correct, consistent with the code, and easy to understand, while reducing the risk of errors.
Documentation-driven development emphasizes creating documentation that meets the needs of users. Code-driven documentation, as described here, instead prioritizes accuracy and consistency by relying on the source of truth: the code itself. By using code to verify documentation, technical writers can ensure that the documentation stays up to date and correctly reflects the behavior of the system, which reduces errors and improves overall quality.
The following sections use several configuration files as examples to illustrate how to use code to verify configuration documentation:
Product | Configuration file | Language | Example |
---|---|---|---|
TiDB | config.toml | Go | Example 1 |
TiKV | config.toml | Rust | Example 2 |
TiFlash | tiflash.toml | C++ | Example 3 |
PD | config.toml | Go | Example 4 |
Example 1: TiDB configuration file & config.go
The TiDB configuration file is written in the TOML format, and its documentation is the TiDB configuration file document.
Steps
The following takes `log.level` as an example. The documentation describes it as follows:

```markdown
## Log

Configuration items related to log.

### `level`

+ Specifies the log output level.
+ Value options: `debug`, `info`, `warn`, `error`, and `fatal`.
+ Default value: `info`
```
1. Create a shallow clone of the TiDB repository:

   ```shell
   git clone https://github.com/pingcap/tidb.git --depth=1 tidb
   ```
2. Search for `"level"` or `toml:"level"` in the `tidb` folder. The following uses the `find` and `grep` commands to search for and list all files that contain the `"level"` keyword:

   Command:

   ```shell
   cd tidb
   find . | grep -l -r '"level"'
   ```

   Output:

   ```
   ./config/config.go
   ./planner/core/memtable_predicate_extractor.go
   ./dumpling/log/log.go
   ./parser/parser_test.go
   ./parser/parser.go
   ./infoschema/metric_table_def.go
   ./br/pkg/lightning/tikv/tikv.go
   ./br/pkg/lightning/lightning.go
   ./br/pkg/lightning/log/log.go
   ./br/pkg/lightning/restore/precheck_impl_test.go
   ./sessionctx/stmtctx/stmtctx.go
   ```

   From the preceding output, you can skip the `./parser/parser_test.go` and `./br/pkg/lightning/restore/precheck_impl_test.go` files because they are test files. Then, search for the `"level"` keyword again, excluding the test files:

   Command:

   ```shell
   find . | grep -r '"level"' --exclude "*_test.go"
   ```

   Output:

   ```
   ./config/config.go: Level string `toml:"level" json:"level"`
   ./planner/core/memtable_predicate_extractor.go: remained, levlSkipRequest, logLevels := e.extractCol(schema, names, remained, "level", true)
   ./dumpling/log/log.go: Level string `toml:"level" json:"level"`
   ./parser/parser.go: "level",
   ./infoschema/metric_table_def.go: Labels: []string{"instance", "level", "db"},
   ./infoschema/metric_table_def.go: Labels: []string{"instance", "cf", "level", "db"},
   ./br/pkg/lightning/tikv/tikv.go: task := log.With(zap.Int32("level", level), zap.String("tikv", tikvAddr)).Begin(zap.InfoLevel, "compact cluster")
   ./br/pkg/lightning/lightning.go: Level zapcore.Level `json:"level"`
   ./br/pkg/lightning/log/log.go: Level string `toml:"level" json:"level"`
   ./sessionctx/stmtctx/stmtctx.go: Level string `json:"level"`
   ```

   Then, you can see the context of the `"level"` keyword in each of these files:
   File: `config/config.go`

   ```go
   type Log struct {
       // Log level.
       Level string `toml:"level" json:"level"`
       // ...
   }
   ```

   File: `planner/core/memtable_predicate_extractor.go`

   ```go
   func (e *ClusterLogTableExtractor) Extract(
       ctx sessionctx.Context,
       schema *expression.Schema,
       names []*types.FieldName,
       predicates []expression.Expression,
   ) []expression.Expression {
       // ...
       remained, levlSkipRequest, logLevels := e.extractCol(schema, names, remained, "level", true)
       e.SkipRequest = typeSkipRequest || addrSkipRequest || levlSkipRequest
       // ...
   }
   ```

   File: `dumpling/log/log.go`

   ```go
   // Config serializes log related config in toml/json.
   type Config struct {
       // Log level.
       // One of "debug", "info", "warn", "error", "dpanic", "panic", and "fatal".
       Level string `toml:"level" json:"level"`
       // ...
   }
   ```

   File: `parser/parser.go`

   ```go
   yySymNames = []string{
       // ...
       "language",
       "level",
       "list",
       // ...
   }
   ```

   File: `infoschema/metric_table_def.go`

   ```go
   var MetricTableMap = map[string]MetricTableDef{
       // ...
       "tikv_compression_ratio": {
           PromQL:  `avg(tikv_engine_compression_ratio{$LABEL_CONDITIONS}) by (level,instance,db)`,
           Labels:  []string{"instance", "level", "db"},
           Comment: "The compression ratio of each level",
       },
       // ...
       "tikv_number_files_at_each_level": {
           PromQL:  `avg(tikv_engine_num_files_at_level{$LABEL_CONDITIONS}) by (cf, level,db,instance)`,
           Labels:  []string{"instance", "cf", "level", "db"},
           Comment: "The number of SST files for different column families in each level",
       },
       // ...
   }
   ```

   File: `br/pkg/lightning/tikv/tikv.go`

   ```go
   // Compact performs a leveled compaction with the given minimum level.
   func Compact(ctx context.Context, tls *common.TLS, tikvAddr string, level int32) error {
       task := log.With(zap.Int32("level", level), zap.String("tikv", tikvAddr)).Begin(zap.InfoLevel, "compact cluster")
       // ...
   }
   ```

   File: `br/pkg/lightning/lightning.go`

   ```go
   func handleLogLevel(w http.ResponseWriter, req *http.Request) {
       w.Header().Set("Content-Type", "application/json")
       var logLevel struct {
           Level zapcore.Level `json:"level"`
       }
       // ...
   }
   ```

   File: `br/pkg/lightning/log/log.go`

   ```go
   type Config struct {
       // Log level.
       Level string `toml:"level" json:"level"`
       // Log filename, leave empty to disable file log.
       File string `toml:"file" json:"file"`
       // ...
   }
   ```

   File: `sessionctx/stmtctx/stmtctx.go`

   ```go
   type jsonSQLWarn struct {
       Level  string        `json:"level"`
       SQLErr *terror.Error `json:"err,omitempty"`
       Msg    string        `json:"msg,omitempty"`
   }
   ```
3. Verify the data type:

   File: `config/config.go`

   ```go
   type Log struct {
       // Log level.
       Level string `toml:"level" json:"level"`
       // ...
   }
   ```

   The `level` item is defined in the `Log` struct, the variable name is `Level`, and the type is `string`. Then, you can verify whether the type of `log.level` in the document is consistent with the type of `Level` in the code.
4. To verify the default value, search for `Level` in the `config/config.go` file. You can find that the default value of `Level` is `"info"`:

   File: `config/config.go`

   ```go
   var defaultConf = Config{
       Host:             DefHost,
       AdvertiseAddress: "",
       Port:             DefPort,
       // ...
       Log: Log{
           Level:  "info",
           Format: "text",
           // ...
       },
       // ...
   }
   ```
5. To verify whether `level` is in the `log` table or not, search for `"log"` in the `config.go` file. You can find the following:

   File: `config/config.go`

   ```go
   type Config struct {
       Host string `toml:"host" json:"host"`
       // ...
       Log Log `toml:"log" json:"log"`
       // ...
   }

   // ...

   type Log struct {
       // Log level.
       Level string `toml:"level" json:"level"`
       // ...
   }
   ```

   The `level` item is defined in the `Log` struct; that is, `level` is in the `log` table.
Conclusion
In the `config/config.go` file, you can verify the following information of `CONFIG-NAME` by searching for `"CONFIG-NAME"`:
- The type of a configuration item.
- The default value of a configuration item.
- The table that a configuration item belongs to.
Example 2: TiKV configuration file & config.rs
The TiKV configuration file is written in the TOML format, and its documentation is the TiKV configuration file document.
Steps
The following takes `raftstore.right-derive-when-split` as an example:
1. Create a shallow clone of the TiKV repository:

   ```shell
   git clone https://github.com/tikv/tikv.git --depth=1 tikv
   ```
2. Search for the `right(.*)derive(.*)when(.*)split` regular expression in the `tikv` folder.
3. Verify the data type in the `components/raftstore/src/store/config.rs` file. The configuration item is defined as `pub right_derive_when_split: bool` in the `Config` struct, so the type is `bool`. Then, you can verify whether the type of `raftstore.right-derive-when-split` in the document is consistent with the type in the code.

   File: `components/raftstore/src/store/config.rs`

   ```rust
   struct Config {
       // Right region derive origin region id when split.
       #[online_config(hidden)]
       pub right_derive_when_split: bool,
       // ...
   }
   ```
4. Verify the default value in the `components/raftstore/src/store/config.rs` file. The default value is `true`:

   File: `components/raftstore/src/store/config.rs`

   ```rust
   impl Default for Config {
       fn default() -> Config {
           Config {
               prevote: true,
               raftdb_path: String::new(),
               // ...
               right_derive_when_split: true,
               // ...
           }
       }
   }
   ```
Conclusion
In the `components/.../config.rs` file, you can verify the following information of `CONFIG-NAME` by searching for `CONFIG(.*)NAME` or `CONFIG_NAME`:
- The type of a configuration item.
- The default value of a configuration item.
- The value range of a configuration item.
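The `CONFIG(.*)NAME` pattern can be generated mechanically from a documented item name. The following Go sketch is an illustration, not part of TiKV: it converts a kebab-case name taken from the documentation into a regular expression that matches both the TOML key (with dashes) and the snake_case Rust field (with underscores).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// patternFor turns a documented kebab-case item name such as
// "right-derive-when-split" into a regex that matches either the
// TOML key or the corresponding Rust identifier.
func patternFor(name string) *regexp.Regexp {
	parts := strings.Split(name, "-")
	for i, p := range parts {
		parts[i] = regexp.QuoteMeta(p)
	}
	// "[-_]" accepts either separator between the words.
	return regexp.MustCompile(strings.Join(parts, "[-_]"))
}

func main() {
	re := patternFor("right-derive-when-split")
	fmt.Println(re.MatchString("pub right_derive_when_split: bool,")) // the Rust field
	fmt.Println(re.MatchString("right-derive-when-split = true"))     // the TOML key
}
```

Both calls print `true`, so one grep with this pattern covers the documentation spelling and the source spelling at the same time.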
Example 3: TiFlash configuration file
Steps
The TiFlash configuration file is written in the TOML format, and its documentation is the TiFlash configuration file document. The following takes `storage.format_version` as an example.
1. Create a shallow clone of the TiFlash repository:

   ```shell
   git clone https://github.com/pingcap/tiflash.git --depth=1 tiflash
   ```
2. Search for `"format_version"` in the `tiflash` folder. You can find the following output:

   File: `dbms/src/Server/StorageConfigParser.cpp`

   ```cpp
   void TiFlashStorageConfig::parseMisc(const String & storage_section, Poco::Logger * log)
   {
       std::istringstream ss(storage_section);
       cpptoml::parser p(ss);
       auto table = p.parse();
       // ...
       if (auto version = table->get_qualified_as<UInt64>("format_version"); version)
       {
           format_version = *version;
       }
       // ...
   }
   ```
3. Then, search for the file that references both `format_version` and `TiFlashStorageConfig`. You can use the `TiFlashStorageConfig(?:.|\n)*format_version|format_version(?:.|\n)*TiFlashStorageConfig` regular expression and find the following output:

   File: `dbms/src/Server/Server.cpp`

   ```cpp
   size_t global_capacity_quota = 0;
   TiFlashStorageConfig storage_config;
   std::tie(global_capacity_quota, storage_config) = TiFlashStorageConfig::parseSettings(config(), log);

   if (storage_config.format_version)
   {
       setStorageFormat(storage_config.format_version);
       LOG_FMT_INFO(log, "Using format_version={} (explicit stable storage format detected).", storage_config.format_version);
   }
   else
   {
       LOG_FMT_INFO(log, "Using format_version={} (default settings).", STORAGE_FORMAT_CURRENT.identifier);
   }
   ```

   In the preceding `Server.cpp` file, `storage_config.format_version` is used to get the value of `format_version`, and `setStorageFormat()` is used to set it.
4. Search for `setStorageFormat` and you can find the following output:

   File: `dbms/src/Storages/FormatVersion.h`

   ```cpp
   inline void setStorageFormat(UInt64 setting)
   {
       STORAGE_FORMAT_CURRENT = toStorageFormat(setting);
   }

   inline void setStorageFormat(const StorageFormatVersion & version)
   {
       STORAGE_FORMAT_CURRENT = version;
   }
   ```

   If `storage_config.format_version` is of the `UInt64` type, then `toStorageFormat(setting)` is used to convert its value to the `StorageFormatVersion` type:

   File: `dbms/src/Storages/FormatVersion.h`

   ```cpp
   inline const StorageFormatVersion & toStorageFormat(UInt64 setting)
   {
       switch (setting)
       {
       case 1:
           return STORAGE_FORMAT_V1;
       case 2:
           return STORAGE_FORMAT_V2;
       case 3:
           return STORAGE_FORMAT_V3;
       case 4:
           return STORAGE_FORMAT_V4;
       default:
           throw Exception("Illegal setting value: " + DB::toString(setting));
       }
   }
   ```

   In the preceding `toStorageFormat()` function, if `setting` is `1`, then `STORAGE_FORMAT_V1` is returned. If `setting` is not `1`, `2`, `3`, or `4`, an exception is thrown.
5. Search for `STORAGE_FORMAT_V1` and you can find the following output:

   File: `dbms/src/Storages/FormatVersion.h`

   ```cpp
   inline static const StorageFormatVersion STORAGE_FORMAT_V1 = StorageFormatVersion{
       .segment = SegmentFormat::V2,
       .dm_file = DMFileFormat::V1,
       .stable = StableFormat::V1,
       .delta = DeltaFormat::V2,
       .page = PageFormat::V2,
       .identifier = 1,
   };
   ```
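The `switch` in `toStorageFormat()` is effectively the list of valid value options for `storage.format_version`, so the documented options can be checked against it. The following Go sketch (Go rather than C++, for consistency with the other sketches in this document) compares two hand-copied lists; both lists here are assumptions for illustration, not values extracted from the code automatically.

```go
package main

import "fmt"

// documentedOptions is what the documentation is assumed to list for
// storage.format_version; codeOptions mirrors the cases of the
// toStorageFormat() switch. Both are hand-copied for this sketch.
var (
	documentedOptions = []uint64{1, 2, 3, 4}
	codeOptions       = []uint64{1, 2, 3, 4}
)

// drift returns the values present in a but missing from b.
func drift(a, b []uint64) []uint64 {
	seen := map[uint64]bool{}
	for _, v := range b {
		seen[v] = true
	}
	var missing []uint64
	for _, v := range a {
		if !seen[v] {
			missing = append(missing, v)
		}
	}
	return missing
}

func main() {
	fmt.Println("documented but not in code:", drift(documentedOptions, codeOptions))
	fmt.Println("in code but undocumented:", drift(codeOptions, documentedOptions))
}
```

When a new `STORAGE_FORMAT_Vn` case is added to the code, such a comparison reports it as "in code but undocumented" until the documentation catches up.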
Example 4: PD configuration file & config.go
The PD configuration file is written in the TOML format, and its documentation is the PD configuration file document.
Steps
The following takes `pd-server.flow-round-by-digit` as an example.
1. Create a shallow clone of the PD repository:

   ```shell
   git clone https://github.com/tikv/pd.git --depth=1 pd
   ```
2. Search for `"flow-round-by-digit"` in the `pd` folder. You can find the following output:

   File: `server/config/config.go`

   ```go
   type PDServerConfig struct {
       // ...
       // FlowRoundByDigit used to discretization processing flow information.
       FlowRoundByDigit int `toml:"flow-round-by-digit" json:"flow-round-by-digit"`
   }
   ```

   The `flow-round-by-digit` item is defined in the `PDServerConfig` struct, and its type is `int`.
3. Search for `FlowRoundByDigit` in `config.go` and you can find the following output:

   File: `server/config/config.go`

   ```go
   const (
       defaultFlowRoundByDigit = 3 // KB
   )
   ```

   The default value of `flow-round-by-digit` is `3`.
4. To verify whether `flow-round-by-digit` is in the `pd-server` table or not, search for `"pd-server"` in the `config.go` file. You can find the following output:

   File: `server/config/config.go`

   ```go
   type Config struct {
       // ...
       PDServerCfg PDServerConfig `toml:"pd-server" json:"pd-server"`
       // ...
   }
   ```

   Then, search for `PDServerConfig` and you can find the following output:

   File: `server/config/config.go`

   ```go
   type PDServerConfig struct {
       // ...
       // MetricStorage is the cluster metric storage.
       // Currently we use prometheus as metric storage, we may use PD/TiKV as metric storage later.
       MetricStorage string `toml:"metric-storage" json:"metric-storage"`
       // There are some values supported: "auto", "none", or a specific address, default: "auto"
       DashboardAddress string `toml:"dashboard-address" json:"dashboard-address"`
       // TraceRegionFlow the option to update flow information of regions.
       // WARN: TraceRegionFlow is deprecated.
       TraceRegionFlow bool `toml:"trace-region-flow" json:"trace-region-flow,string,omitempty"`
       // FlowRoundByDigit used to discretization processing flow information.
       FlowRoundByDigit int `toml:"flow-round-by-digit" json:"flow-round-by-digit"`
       // MinResolvedTSPersistenceInterval is the interval to save the min resolved ts.
       MinResolvedTSPersistenceInterval typeutil.Duration `toml:"min-resolved-ts-persistence-interval" json:"min-resolved-ts-persistence-interval"`
   }
   ```

   The `FlowRoundByDigit` field is defined in the `PDServerConfig` struct; that is, `flow-round-by-digit` is in the `pd-server` table.
Conclusion
In the `server/config/config.go` file, you can verify the following information of `CONFIG-NAME` by searching for `"CONFIG-NAME"`:
- The type of a configuration item.
- The default value of a configuration item.
- The table that a configuration item belongs to.
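The default-value check from step 3 can likewise be written as a small program. In the following Go sketch, the constant is copied from `server/config/config.go`, while the `documentedDefaults` map is a hypothetical table that a writer would maintain from the documentation:

```go
package main

import "fmt"

// Copied from server/config/config.go.
const defaultFlowRoundByDigit = 3 // KB

// documentedDefaults is a hypothetical table of defaults as stated in
// the PD configuration file documentation.
var documentedDefaults = map[string]int{
	"pd-server.flow-round-by-digit": 3,
}

func main() {
	want := documentedDefaults["pd-server.flow-round-by-digit"]
	if want == defaultFlowRoundByDigit {
		fmt.Println("pd-server.flow-round-by-digit: documentation matches code")
	} else {
		fmt.Printf("mismatch: doc says %d, code says %d\n", want, defaultFlowRoundByDigit)
	}
}
```

If the code's default ever changes, rerunning the check reports the mismatch instead of relying on a reviewer to notice it.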
What's next?
- How to verify configuration files using code automatically?
- How to generate configuration files from code automatically?
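As a first step toward automatic verification, an example configuration from the documentation can be decoded into the code's config struct with unknown keys rejected, so a renamed or removed item fails loudly. The sketch below uses Go's `encoding/json` with `DisallowUnknownFields` to keep the example dependency-free; in practice a TOML library with a similar strict mode would be used, and the `Config` struct here is a simplified stand-in, not a real product struct.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Simplified stand-in for a product's config struct.
type Config struct {
	Host string `json:"host"`
	Log  struct {
		Level string `json:"level"`
	} `json:"log"`
}

// checkExample decodes a documentation example against the struct and
// returns an error for any key the code does not know about.
func checkExample(doc []byte) error {
	dec := json.NewDecoder(bytes.NewReader(doc))
	dec.DisallowUnknownFields()
	var c Config
	return dec.Decode(&c)
}

func main() {
	good := []byte(`{"host": "0.0.0.0", "log": {"level": "info"}}`)
	bad := []byte(`{"host": "0.0.0.0", "log": {"leval": "info"}}`) // typo in the docs

	fmt.Println("good example:", checkExample(good))
	fmt.Println("bad example:", checkExample(bad))
}
```

The first example decodes cleanly, while the misspelled `leval` key is rejected, turning a silent documentation typo into a visible failure.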