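Overview

Tail sampling waits until a trace has been collected in full and then decides by policy whether to keep or drop it, which suits high-traffic systems that need to control cost. Combined with batching and a retry queue, it reduces export pressure while still retaining traces for key errors and hot services.

Key practices and parameters

- Sampling decision: `tail_sampling` with `decision_wait=5–10s`
- Policy mix: errors first, hot services first, rate limiting as the default
- Export and retry: `batch`, `memory_limiter`, `retry_on_failure`
- Protocol and endpoint: `otlp`/`otlphttp` pointing at the backend (Tempo/Jaeger)

Example / configuration / implementation

receivers: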
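  otlp:
    protocols:
      http:
      grpc:
processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
  memory_limiter:
    # check_interval must be a positive duration or the processor fails validation
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  tail_sampling:
    decision_wait: 5s
    policies:
      # keep traces that contain a span with ERROR status
      - name: error-traces
        type: status_code
        status_code:
          status_codes: [ERROR]
      # keep traces from the hot services
      - name: important-service
        type: string_attribute
        string_attribute:
          key: service.name
          values: ["checkout", "payment"]
      # everything else is capped by a span-rate budget
      - name: default-rate
        type: rate_limiting
        rate_limiting:
          spans_per_second: 200
exporters:
  otlphttp:
    endpoint: http://tempo:4318
service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter runs first, batch runs last after the sampling decision
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlphttp]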
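
To make the policy mix above easier to reason about, here is a minimal Python sketch of the decision logic. It is a simplified model, not the Collector's implementation: the `Trace` and `RateLimiter` classes and the `keep` function are hypothetical names, the rate limiter is assumed to be a plain per-second span budget, and every policy is assumed to be evaluated for every trace with the results combined by "keep if any policy says keep".

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical stand-in for a fully collected trace."""
    service: str
    span_count: int
    has_error: bool


@dataclass
class RateLimiter:
    """Simplified per-second span budget (an assumption, not Collector code)."""
    spans_per_second: int
    _second: int = -1
    _used: int = 0

    def allow(self, span_count: int, now_second: int) -> bool:
        # Reset the budget whenever the wall-clock second changes.
        if now_second != self._second:
            self._second, self._used = now_second, 0
        if self._used + span_count > self.spans_per_second:
            return False
        self._used += span_count
        return True


# Mirrors the values of the important-service policy in the config.
IMPORTANT_SERVICES = {"checkout", "payment"}


def keep(trace: Trace, limiter: RateLimiter, now_second: int) -> bool:
    """Return True if any of the three policies votes to keep the trace."""
    decisions = [
        trace.has_error,                                # error-traces
        trace.service in IMPORTANT_SERVICES,            # important-service
        limiter.allow(trace.span_count, now_second),    # default-rate
    ]
    return any(decisions)
```

For example, with `RateLimiter(spans_per_second=200)`, a 150-span trace from an ordinary service is kept, a following 100-span trace in the same second is dropped, and an error trace or a `checkout` trace is kept regardless of the remaining budget.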
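Verification

- Error retention: generate a trace with `ERROR` status and confirm it is still exported after sampling.
- Hot-service priority: high-traffic traces with `service.name=checkout` are retained at a higher ratio.
- Rate limiting: at peak load the export rate stays bounded and the queue does not back up.
- Latency and resources: record export latency and memory usage and confirm they stay within their thresholds.

Notes

- The decision wait time affects both end-to-end latency and retention quality, so it has to be traded off.
- Policy order and combination affect the result; match high-priority policies first.
- Govern this pipeline together with the logs and metrics pipelines, and plan backend capacity as a whole.
- Keep an eye on backend storage cost and query performance.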
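
For the error-retention check, one way to produce an `ERROR` trace by hand is to post an OTLP/JSON payload to the Collector's HTTP receiver. The sketch below uses only the Python standard library. The URL assumes the receiver listens on the default OTLP/HTTP port on localhost, and the service, scope, and span names are made up for illustration.

```python
import json
import secrets
import time
import urllib.request

# Assumption: the Collector's OTLP/HTTP receiver is reachable on its default port.
COLLECTOR_URL = "http://localhost:4318/v1/traces"


def build_error_trace(service_name: str = "verify-tail-sampling") -> dict:
    """Build a single-span OTLP/JSON trace whose span status is ERROR."""
    now_ns = time.time_ns()
    span = {
        "traceId": secrets.token_hex(16),  # 32 hex characters
        "spanId": secrets.token_hex(8),    # 16 hex characters
        "name": "synthetic-error",         # hypothetical span name
        "kind": 1,                         # SPAN_KIND_INTERNAL
        "startTimeUnixNano": str(now_ns - 5_000_000),
        "endTimeUnixNano": str(now_ns),
        "status": {"code": 2, "message": "synthetic failure"},  # STATUS_CODE_ERROR
    }
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{"scope": {"name": "manual-check"}, "spans": [span]}],
        }]
    }


def send_trace(payload: dict, url: str = COLLECTOR_URL) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling `send_trace(build_error_trace())` against a running Collector and then searching the backend for the generated `traceId` after `decision_wait` has elapsed should show whether the error policy retained the trace. Passing `"checkout"` as the service name exercises the hot-service policy in the same way.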
