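Overview

Tail sampling waits until the spans of a trace have been collected and then applies policies to decide whether to keep or drop the whole trace, which suits high-traffic systems that need to control cost. Combined with batching and a retry queue, it reduces export pressure while still keeping the traces for critical errors and hot services.

Key practices and parameters

- Sampling decision: `tail_sampling` with `decision_wait` of 5–10s
- Policy combination: errors first, hot services next, then a default rate limit
- Export and retry: `batch`, `memory_limiter`, `retry_on_failure`
- Protocol and endpoint: `otlp`/`otlphttp` pointing at the backend (Tempo/Jaeger)

Example configuration

    receivers:
      otlp:
        protocols:
          http:
          grpc:

    processors:
      batch:
        timeout: 5s
        send_batch_size: 1024
      memory_limiter:
        check_interval: 1s   # must be a positive value, otherwise the collector rejects the config
        limit_mib: 512
        spike_limit_mib: 128
      tail_sampling:
        decision_wait: 5s
        policies:
          # Keep every trace that contains a span with ERROR status
          - name: error-traces
            type: status_code
            status_code:
              status_codes: [ERROR]
          # Keep traces from the hot services
          - name: important-service
            type: string_attribute
            string_attribute:
              key: service.name
              values: ["checkout", "payment"]
          # Baseline rate limit for everything else
          - name: default-rate
            type: rate_limiting
            rate_limiting:
              spans_per_second: 200

    exporters:
      otlphttp:
        endpoint: http://tempo:4318

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, tail_sampling, batch]
          exporters: [otlphttp]

Verification

- Error retention: generate traces with `ERROR` status and confirm they are still exported after sampling
- Hot-service priority: high-volume traces with `service.name=checkout` are kept at a higher ratio than other traffic
- Rate limiting: at peak load the export rate stays bounded and the queue does not back up
- Latency and resources: record export latency and memory usage and confirm they stay within thresholds

Notes

- `decision_wait` affects both end-to-end latency and retention quality, so the value is a trade-off
- Policy order and combination affect the result; match high-priority policies first
- Govern traces together with the log and metric pipelines, and plan backend capacity as a whole
- Watch backend storage cost and query performance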
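
The key practices list `retry_on_failure`, but the example configuration leaves the exporter's retry and queue behavior at its defaults. A minimal sketch of setting both explicitly on the `otlphttp` exporter is shown here. The numeric values are assumptions for illustration (they mirror what are believed to be the exporter defaults) and should be tuned against the backend's real capacity.

```yaml
exporters:
  otlphttp:
    endpoint: http://tempo:4318
    # Retry failed exports with exponential backoff
    retry_on_failure:
      enabled: true
      initial_interval: 5s     # wait before the first retry (assumed value)
      max_interval: 30s        # upper bound on the backoff (assumed value)
      max_elapsed_time: 300s   # give up on a batch after this long (assumed value)
    # Buffer batches in memory while the backend is slow or unavailable
    sending_queue:
      enabled: true
      num_consumers: 10        # parallel senders draining the queue (assumed value)
      queue_size: 1000         # maximum queued batches (assumed value)
```

When the queue is full, new batches are dropped, so queue depth and export failures are worth watching during the rate-limiting and latency checks in the verification step.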

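On the note about policy order: top-level `tail_sampling` policies appear to be evaluated independently, with a trace kept when any of them matches. If strict ordering and a span budget per policy are wanted, the `composite` policy type is one option. The sketch below reuses the error and hot-service conditions from the example; the sub-policy name `everything-else`, the total budget, and the percentages are hypothetical values chosen for illustration.

```yaml
processors:
  tail_sampling:
    decision_wait: 5s
    policies:
      - name: prioritized
        type: composite
        composite:
          # Overall span budget (assumed value, not from the original config)
          max_total_spans_per_second: 1000
          # Sub-policies are tried in this order
          policy_order: [error-traces, important-service, everything-else]
          composite_sub_policy:
            - name: error-traces
              type: status_code
              status_code:
                status_codes: [ERROR]
            - name: important-service
              type: string_attribute
              string_attribute:
                key: service.name
                values: ["checkout", "payment"]
            - name: everything-else
              type: always_sample
          # Share of the span budget reserved per sub-policy (illustrative)
          rate_allocation:
            - policy: error-traces
              percent: 50
            - policy: important-service
              percent: 30
```

With this layout the remaining budget goes to `everything-else`, which plays the role of the default rate limit.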