785 changes: 785 additions & 0 deletions IMPLEMENTATION_PLAN.md

Large diffs are not rendered by default.

466 changes: 466 additions & 0 deletions PROMETHEUS_METRICS.md

Large diffs are not rendered by default.

30 changes: 30 additions & 0 deletions config/config.go
@@ -59,6 +59,15 @@
		Enabled:     false,
		AllowUnsafe: []string{},
	},
	Prometheus: PrometheusConfig{
		Enabled:                false,

Check warning on line 63 in config/config.go
probelabs / Visor: security (security issue)

The default listen address for the Prometheus endpoint is `:9090`, which binds to all network interfaces. If the host has a public IP address and is not explicitly firewalled, this can unintentionally expose internal gateway metrics to external networks.

Suggested fix: change the default `ListenAddress` to `localhost:9090` so that the metrics endpoint is reachable only from the local machine by default. Users who need remote access can then explicitly set it to `0.0.0.0:9090` or a specific interface IP.

		ListenAddress:          ":9090",
		Path:                   "/metrics",
		MetricPrefix:           "tyk_gateway",
		EnableGoCollector:      true,
		EnableProcessCollector: true,
		EnablePerAPIMetrics:    false,
	},
	PIDFileLocation: "/var/run/tyk/tyk-gateway.pid",
	Security: SecurityConfig{
		CertificateExpiryMonitor: CertificateExpiryMonitorConfig{
@@ -772,6 +781,24 @@
	AllowUnsafe []string `json:"allow_unsafe"`
}

// PrometheusConfig holds configuration for Prometheus metrics exposure
type PrometheusConfig struct {
	// Enabled activates Prometheus metrics endpoint
	Enabled bool `json:"enabled"`
	// ListenAddress is the address to expose metrics (e.g., ":9090")
	ListenAddress string `json:"listen_address"`
	// Path is the HTTP path for metrics endpoint (default: "/metrics")
	Path string `json:"path"`
	// MetricPrefix is the prefix for all Tyk metrics (default: "tyk_gateway")
	MetricPrefix string `json:"metric_prefix"`
	// EnableGoCollector enables Go runtime metrics
	EnableGoCollector bool `json:"enable_go_collector"`
	// EnableProcessCollector enables process metrics
	EnableProcessCollector bool `json:"enable_process_collector"`
	// EnablePerAPIMetrics enables per-API metrics with api_id label (can increase cardinality)
	EnablePerAPIMetrics bool `json:"enable_per_api_metrics"`
}
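
To make the struct and the `:9090` review note concrete, here is a minimal sketch of how a JSON fragment could be overlaid on the defaults from this diff, plus a check for wildcard listen addresses. It assumes the gateway applies JSON on top of its default struct, which this diff does not show. `defaultPrometheusConfig`, `loadPrometheusConfig`, and `bindsAllInterfaces` are hypothetical helpers, not part of the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// PrometheusConfig mirrors the struct added in this diff.
type PrometheusConfig struct {
	Enabled                bool   `json:"enabled"`
	ListenAddress          string `json:"listen_address"`
	Path                   string `json:"path"`
	MetricPrefix           string `json:"metric_prefix"`
	EnableGoCollector      bool   `json:"enable_go_collector"`
	EnableProcessCollector bool   `json:"enable_process_collector"`
	EnablePerAPIMetrics    bool   `json:"enable_per_api_metrics"`
}

// defaultPrometheusConfig (hypothetical) copies the default values from the diff.
func defaultPrometheusConfig() PrometheusConfig {
	return PrometheusConfig{
		Enabled:                false,
		ListenAddress:          ":9090",
		Path:                   "/metrics",
		MetricPrefix:           "tyk_gateway",
		EnableGoCollector:      true,
		EnableProcessCollector: true,
		EnablePerAPIMetrics:    false,
	}
}

// loadPrometheusConfig (hypothetical) overlays a JSON fragment on the defaults.
// json.Unmarshal only overwrites fields that are present in the input.
func loadPrometheusConfig(raw []byte) (PrometheusConfig, error) {
	cfg := defaultPrometheusConfig()
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return PrometheusConfig{}, err
	}
	return cfg, nil
}

// bindsAllInterfaces (hypothetical) reports whether addr has an empty or
// unspecified host, such as ":9090", "0.0.0.0:9090", or "[::]:9090".
func bindsAllInterfaces(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false // malformed address; leave it to the listener to report
	}
	if host == "" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsUnspecified()
}

func main() {
	raw := []byte(`{"enabled": true, "listen_address": "127.0.0.1:9090"}`)
	cfg, err := loadPrometheusConfig(raw)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
	if bindsAllInterfaces(cfg.ListenAddress) {
		fmt.Println("warning: metrics endpoint listens on all interfaces")
	}
}
```

In the gateway configuration file this fragment would sit under the `prometheus` key added to `Config` in this diff. A startup warning based on a check like `bindsAllInterfaces` is one possible middle ground if the `:9090` default is kept.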

// Config is the configuration object used by Tyk to set up various parameters.
type Config struct {
	// Force your Gateway to work only on a specific domain name. Can be overridden by API custom domain.
@@ -1186,6 +1213,9 @@
	// StatsD prefix
	StatsdPrefix string `json:"statsd_prefix"`

	// Prometheus metrics configuration
	Prometheus PrometheusConfig `json:"prometheus"`

	// Event System
	EventHandlers apidef.EventHandlerMetaConfig         `json:"event_handlers"`
	EventTriggers map[apidef.TykEvent][]TykEventHandler `json:"event_trigers_defunct"` // Deprecated: Config.GetEventTriggers instead.
14 changes: 14 additions & 0 deletions gateway/handler_success.go
@@ -182,8 +182,22 @@ func (s *SuccessHandler) addTraceIDTag(reqCtx context.Context, tags []string) []
}

func (s *SuccessHandler) RecordHit(r *http.Request, timing analytics.Latency, code int, responseCopy *http.Response, cached bool) {
	// Record Prometheus metrics (independent of analytics)
	if s.Gw.PrometheusMetrics != nil {
		s.Gw.PrometheusMetrics.RecordRequest(
			s.Spec.APIID,
			s.Spec.Name,
			r.Method,
			code,
			timing.Total,
			timing.Upstream,
		)
	} else {
		log.Debug("PrometheusMetrics is nil, skipping metrics recording")
	}

	if s.Spec.DoNotTrack || ctxGetDoNotTrack(r) {
		log.Debug("Skipping RecordHit: DoNotTrack enabled")
		return
	}
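
The hunk above guards the call with a nil check and emits a debug log line for every request when Prometheus is disabled. One alternative, offered only as a sketch, is to make the recorder's methods no-ops on a nil receiver so call sites need neither the branch nor the log line. `requestRecorder`, `newRequestRecorder`, `seriesKey`, and `Hits` are hypothetical names, and the map-based counter stands in for a real Prometheus client. The PR's `RecordRequest` also takes the API name and the two timing values, which are omitted here.

```go
package main

import (
	"fmt"
	"sync"
)

// requestRecorder is a hypothetical stand-in for the gateway's PrometheusMetrics
// type. It counts requests in a map instead of using a Prometheus client.
type requestRecorder struct {
	mu   sync.Mutex
	hits map[string]int
}

func newRequestRecorder() *requestRecorder {
	return &requestRecorder{hits: make(map[string]int)}
}

// seriesKey builds one key per (api_id, method, status code) combination.
func seriesKey(apiID, method string, code int) string {
	return fmt.Sprintf("%s|%s|%d", apiID, method, code)
}

// RecordRequest is safe to call on a nil receiver, in which case it does nothing.
func (r *requestRecorder) RecordRequest(apiID, method string, code int) {
	if r == nil {
		return
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.hits[seriesKey(apiID, method, code)]++
}

// Hits returns the count for one series, or 0 on a nil receiver.
func (r *requestRecorder) Hits(apiID, method string, code int) int {
	if r == nil {
		return 0
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.hits[seriesKey(apiID, method, code)]
}

func main() {
	var disabled *requestRecorder // metrics turned off: the pointer stays nil
	disabled.RecordRequest("api-1", "GET", 200)

	enabled := newRequestRecorder()
	enabled.RecordRequest("api-1", "GET", 200)
	enabled.RecordRequest("api-1", "GET", 200)

	fmt.Println(disabled.Hits("api-1", "GET", 200), enabled.Hits("api-1", "GET", 200))
}
```

The trade-off is that a nil receiver hides misconfiguration at the call site, so a single log line at startup stating whether metrics are enabled would still be useful.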

68 changes: 68 additions & 0 deletions gateway/instrumentation_handlers.go
@@ -8,6 +8,7 @@
	"time"

	"github.com/gocraft/health"
	"github.com/sirupsen/logrus"

	"github.com/TykTechnologies/tyk/cli"
	"github.com/TykTechnologies/tyk/request"
@@ -48,6 +49,73 @@
	gw.MonitorApplicationInstrumentation()
}

// setupPrometheusInstrumentation initializes Prometheus metrics collection and HTTP endpoint
func (gw *Gateway) setupPrometheusInstrumentation() {
	gwConfig := gw.GetConfig()

	if !gwConfig.Prometheus.Enabled {
		return
	}

	log.WithFields(logrus.Fields{
		"per_api_metrics": gwConfig.Prometheus.EnablePerAPIMetrics,
	}).Info("Initializing Prometheus metrics...")

	gw.PrometheusMetrics = NewPrometheusMetrics(gw, gwConfig.Prometheus.MetricPrefix, gwConfig.Prometheus.EnablePerAPIMetrics)

	// Register optional Go and process collectors
	gw.PrometheusMetrics.RegisterGoCollectors(
		gwConfig.Prometheus.EnableGoCollector,
		gwConfig.Prometheus.EnableProcessCollector,
	)

Check warning on line 70 in gateway/instrumentation_handlers.go
probelabs / Visor: security (security issue)

The application calls `log.Fatal` if the Prometheus metrics server fails to start, for example because of a port conflict. This terminates the entire Tyk gateway process, creating a denial-of-service vulnerability that a local user or process can trigger by occupying the metrics port. The failure of an auxiliary service such as metrics should not crash the main application.

Suggested fix: replace `log.WithError(err).Fatal(...)` with a non-fatal error log, such as `log.WithError(err).Error(...)`. This records the failure to start the metrics server while allowing the gateway to continue its primary function of proxying API traffic.

Check failure on line 70 in gateway/instrumentation_handlers.go
probelabs / Visor: quality (reliability issue)

The use of `log.WithError(err).Fatal()` in `startPrometheusServer` causes an immediate, non-graceful termination of the entire gateway process if the Prometheus server fails to start (e.g., due to a port conflict). This bypasses the gateway's graceful shutdown procedures for other components, which could lead to lost analytics data, inconsistent state, or other side effects.

Suggested fix: refactor `startPrometheusServer` and `setupPrometheusInstrumentation` to return an error on failure instead of calling `log.Fatal`. The calling function (`initSystem` in `gateway/server.go`) should then handle this error by initiating a proper graceful shutdown of the entire gateway.

	// Add Prometheus sink to instrument stream
	prometheusSink := NewPrometheusSink(gw.PrometheusMetrics)
	instrument.AddSink(prometheusSink)

	// Start metrics collection
	gw.PrometheusMetrics.StartMetricsCollection(gw.ctx)

	// Start Prometheus HTTP server
	gw.startPrometheusServer()

	log.WithFields(logrus.Fields{
		"listen_address": gwConfig.Prometheus.ListenAddress,
		"path":           gwConfig.Prometheus.Path,
		"prefix":         gwConfig.Prometheus.MetricPrefix,
	}).Info("Prometheus metrics endpoint started")
}

Check failure on line 88 in gateway/instrumentation_handlers.go
probelabs / Visor: architecture (architecture issue)

The gateway exits with a fatal error if the Prometheus metrics server fails to start (e.g., because the port is in use). An auxiliary component such as the metrics server should not be able to crash the entire gateway, which is a critical process.

Suggested fix: replace `log.WithError(err).Fatal(...)` with `log.WithError(err).Error(...)`. This logs the error and allows the gateway to continue running without the Prometheus metrics endpoint, improving the gateway's resilience.

// startPrometheusServer starts the HTTP server for Prometheus metrics endpoint
func (gw *Gateway) startPrometheusServer() {
	gwConfig := gw.GetConfig()

	mux := http.NewServeMux()
	mux.Handle(gwConfig.Prometheus.Path, gw.PrometheusMetrics.Handler())

	server := &http.Server{
		Addr:         gwConfig.Prometheus.ListenAddress,
		Handler:      mux,
		ReadTimeout:  10 * time.Second,
		WriteTimeout: 10 * time.Second,
		IdleTimeout:  120 * time.Second,
	}

	gw.prometheusServerMu.Lock()
	gw.prometheusServer = server
	gw.prometheusServerMu.Unlock()

	go func() {
		log.WithFields(logrus.Fields{
			"address": gwConfig.Prometheus.ListenAddress,
		}).Info("Starting Prometheus metrics server...")

		if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.WithError(err).Fatal("Prometheus metrics server failed to start")
		}
	}()
}
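
The check annotations on this file object to `log.Fatal` inside the server goroutine. One way to act on them, sketched here rather than taken from the PR, is to bind the listener synchronously with `net.Listen` so that a port conflict is returned to the caller, and to serve in the background only after the bind has succeeded. `startMetricsServer`, `placeholderHandler`, and the `logErr` callback are hypothetical names. The timeouts are copied from the diff.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// placeholderHandler (hypothetical) stands in for PrometheusMetrics.Handler().
func placeholderHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "# placeholder metrics output")
	})
}

// startMetricsServer (hypothetical) binds addr before returning, so bind
// failures surface as an error instead of terminating the process.
func startMetricsServer(addr, path string, h http.Handler, logErr func(error)) (*http.Server, net.Addr, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, nil, fmt.Errorf("metrics listener on %q: %w", addr, err)
	}

	mux := http.NewServeMux()
	mux.Handle(path, h)

	srv := &http.Server{
		Handler:      mux,
		ReadTimeout:  10 * time.Second,
		WriteTimeout: 10 * time.Second,
		IdleTimeout:  120 * time.Second,
	}

	go func() {
		// Serve returns http.ErrServerClosed after Close or Shutdown;
		// any other error is reported through logErr, never via Fatal.
		if err := srv.Serve(ln); err != nil && !errors.Is(err, http.ErrServerClosed) {
			logErr(err)
		}
	}()

	return srv, ln.Addr(), nil
}

func main() {
	logErr := func(err error) { fmt.Println("metrics server stopped:", err) }

	// Port 0 asks the OS for a free port; the gateway would pass ListenAddress.
	srv, addr, err := startMetricsServer("127.0.0.1:0", "/metrics", placeholderHandler(), logErr)
	if err != nil {
		fmt.Println("metrics disabled:", err)
		return
	}
	defer srv.Close()
	fmt.Println("metrics listening on", addr)

	// A second bind on the same address fails, and the process keeps running.
	if _, _, err := startMetricsServer(addr.String(), "/metrics", placeholderHandler(), logErr); err != nil {
		fmt.Println("second start failed without exiting:", err)
	}
}
```

Binding first also means a "Prometheus metrics endpoint started" message would only be logged once the port is actually held. In the diff, that message is emitted as soon as the goroutine is launched, before `ListenAndServe` has had a chance to fail. Whether the caller should then continue without metrics or begin a graceful shutdown is the choice the annotations leave open.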

// InstrumentationMW will set basic instrumentation events, variables and timers on API jobs
func InstrumentationMW(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {