5.7.0
5.7.0 / 03-07-2016
Linux, Mac OS and Source Install only
Details
New integrations
- Ceph
- DNS
- HDFS
- MapReduce
- StatsD
- TCP RTT (go-metro)
- YARN
Updated integrations
- Apache
- AWS
- Cassandra
- Consul
- Directory
- Docker
- Elasticsearch
- Go expvar
- Gunicorn
- HAProxy
- HTTP
- IIS
- Kafka
- Mesos
- MongoDB
- MySQL
- PgBouncer
- Postgres
- Process
- Redis
- SNMP
- SSH
- TeamCity
- Tomcat
- vSphere
- Windows Service
- Windows Event Log
- WMI
- Zookeeper
Hadoop integrations (HDFS, MapReduce and YARN checks)
The Agent now includes 4 new checks to monitor Hadoop clusters:
- 2 HDFS checks (hdfs_namenode and hdfs_datanode) that collect metrics from namenodes and datanodes, respectively, using the JMX-HTTP API
- a MapReduce check that collects metrics on the running MapReduce jobs from the Application Master's REST API
- a YARN check that collects metrics from YARN's ResourceManager REST API
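As a rough illustration of what collecting from a JMX-HTTP API involves, the sketch below picks attributes out of a JSON payload shaped like the "beans" document that Hadoop's JMX servlet returns. It is not the Agent's implementation: the function name, the sample bean, and the metric names in the mapping are all hypothetical, and the payload is built in memory instead of being fetched from a namenode.

```python
import json


def extract_jmx_metrics(jmx_payload, wanted):
    """Pick selected bean attributes out of a JMX-HTTP JSON payload.

    `wanted` maps a bean attribute name to the metric name to report.
    Both the mapping and the metric names are illustrative assumptions.
    """
    beans = json.loads(jmx_payload).get("beans", [])
    metrics = {}
    for bean in beans:
        for attribute, metric_name in wanted.items():
            if attribute in bean:
                metrics[metric_name] = bean[attribute]
    return metrics


# Hypothetical payload standing in for a namenode's JMX response.
sample_payload = json.dumps({
    "beans": [
        {
            "name": "Hadoop:service=NameNode,name=FSNamesystemState",
            "CapacityTotal": 1000,
            "CapacityUsed": 250,
        }
    ]
})

# Hypothetical attribute-to-metric mapping.
wanted = {
    "CapacityTotal": "example.namenode.capacity_total",
    "CapacityUsed": "example.namenode.capacity_used",
}

print(extract_jmx_metrics(sample_payload, wanted))
```

In a real check the payload would come from an HTTP request to the node, and the set of collected attributes is defined by the check itself.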
The existing hdfs check is deprecated and will be removed in a future version of the Agent. Its metric scope is entirely covered by the new hdfs_namenode check.
TCP RTT measurement with go-metro
This new feature is in beta
The Datadog Agent on 64-bit Linux is now bundled with a new component (go-metro) that passively calculates TCP RTT metrics between the Agent's host and external hosts, and reports them as system.net.tcp.rtt.avg, system.net.tcp.rtt.jitter and system.net.tcp.rtt through StatsD.
go-metro follows the TCP streams that were active within a certain period of time and estimates the RTT between each outgoing packet carrying data and its corresponding TCP acknowledgement.
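The matching of outgoing data packets to their acknowledgements can be sketched as follows. This is a simplified illustration of passive RTT estimation in general, not go-metro's code. The packet tuple format and both function names are hypothetical, cumulative acknowledgements covering several segments are ignored, and computing jitter as the mean absolute difference between consecutive samples is an assumption made here for the example.

```python
from statistics import mean


def estimate_rtts(packets):
    """Return RTT samples for a single TCP stream.

    `packets` is an iterable of (timestamp, direction, seq_end, ack) tuples,
    a hypothetical simplified capture format:
      - "out": outgoing segment with data; seq_end is the sequence number
        that follows its last byte (the ack number expected in return)
      - "in": incoming segment; ack is its acknowledgement number
    """
    pending = {}  # expected ack number -> send time (None if retransmitted)
    samples = []
    for timestamp, direction, seq_end, ack in packets:
        if direction == "out":
            if seq_end in pending:
                # Retransmission: the ack could match either send, so the
                # sample would be ambiguous and is discarded.
                pending[seq_end] = None
            else:
                pending[seq_end] = timestamp
        elif direction == "in":
            sent_at = pending.pop(ack, None)
            if sent_at is not None:
                samples.append(timestamp - sent_at)
    return samples


def summarize(samples):
    """Average RTT and a simple jitter figure for a list of samples."""
    if not samples:
        return None
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    return {"avg": mean(samples), "jitter": mean(diffs) if diffs else 0}


# Timestamps in milliseconds.
stream = [
    (0, "out", 100, None),
    (30, "in", None, 100),
    (40, "out", 200, None),
    (90, "in", None, 200),
]
print(summarize(estimate_rtts(stream)))
```

Because the measurement is passive, only traffic that already flows through the host contributes samples; idle connections produce none.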
go-metro runs in its own process. It's disabled by default and can be enabled like a regular check by configuring an /etc/dd-agent/conf.d/go-metro.yaml file and restarting the Agent.
For more details on go-metro, check out the project's GitHub page.
Ceph check
The Ceph check retrieves metrics from Ceph's Administration Tool command (ceph).
The check collects metrics from mon_status, status, df detail, osd pool stats and osd perf, and sends a service check reflecting the overall health of the cluster.
See #2264
MySQL
Multiple community-contributed additions to the MySQL check have been consolidated and merged, including:
- metrics from the performance_schema table on MySQL >= 5.6 (thanks to @ovaistariq)
- extra metrics on the InnoDB and MyISAM engines, from the Binlog, and from the SHOW STATUS query (thanks to @ovaistariq)
- several schema-specific metrics, including schema size, schema average query runtime and 95th percentile query execution time (thanks again to @ovaistariq)
- metrics on the Handler (thanks to @polynomial)
- Galera-specific performance stats (thanks to @zdannar)
- Query Cache metrics (thanks to @leucos)
- a mysql.replication.slave_running service check reflecting the state of the slaves (thanks to @c960657)
Most of these additional metrics are not collected by default but can be enabled in the check's YAML file. See the YAML conf example file for details.
Various bug fixes and improvements have also been implemented:
- the Agent's connections to MySQL are handled properly to prevent stale connections
- the replication status is implemented on both the master and the slaves. On the master this status is determined by the Binlog status and the number of slaves.
- the system metrics of MySQL are retrieved without errors on non-Linux platforms by using the psutil library
- the parsing of the MySQL server version is improved
Huge thanks to all our contributors for all these improvements!
Potential backward incompatibilities
Docker
The dockerized Agent now uses the docker hostname (provided by the Name param from docker info) as its own hostname when available. This means that, for hosts running the dockerized Agent, the reported hostname may change to this docker-provided hostname.
For reference, the rules followed by the Agent for its hostname resolution are described on this wiki page.
MongoDB
The collect_tcmalloc_metrics parameter in the YAML conf is replaced with the tcmalloc option under additional_metrics.
Please refer to the example YAML conf file for more info on the usage of the additional_metrics option.
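For configurations generated by scripts, the rename can be applied mechanically. The helper below is a minimal sketch, assuming each check instance has been loaded from YAML into a dict and that additional_metrics is a list of option names. The function name and the server URL are hypothetical, and the example YAML conf file remains the reference for the actual format.

```python
def migrate_mongo_instance(instance):
    """Return a copy of a MongoDB check instance using additional_metrics.

    Assumes `instance` is a dict parsed from the check's YAML conf and that
    additional_metrics holds a list of option names (an assumption here).
    """
    migrated = dict(instance)
    legacy_enabled = migrated.pop("collect_tcmalloc_metrics", False)
    extra = list(migrated.get("additional_metrics", []))
    if legacy_enabled and "tcmalloc" not in extra:
        extra.append("tcmalloc")
    if extra:
        migrated["additional_metrics"] = extra
    return migrated


old_instance = {
    "server": "mongodb://localhost:27017",  # placeholder URL
    "collect_tcmalloc_metrics": True,
}
print(migrate_mongo_instance(old_instance))
```

The input dict is left untouched, so the old and new forms can be compared before a conf file is rewritten.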
vSphere
Instead of sending all metrics as gauges, the vSphere integration now checks the type of each metric as reported by the VMware module, and sends metrics as rates when applicable.
If you haven't enabled the all_metrics option on the check, the only affected metrics are cpu.usage, cpu.usagemhz, network.received and network.transmitted.
If the option is enabled, the additional affected metrics are listed here. The change will affect the values of these metrics.
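To see why the reported values change, it helps to compare the two submission types on the same samples. The sketch below illustrates the general difference between a gauge (the sampled value is reported as-is) and a rate (the change per second between consecutive samples). It is not the vSphere check's code; the function names and sample values are made up for the example.

```python
def as_gauge(samples):
    """A gauge reports each sampled value unchanged."""
    return [(timestamp, value) for timestamp, value in samples]


def as_rate(samples):
    """A rate reports the per-second change between consecutive samples."""
    points = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 > t0:  # skip samples that do not advance in time
            points.append((t1, (v1 - v0) / (t1 - t0)))
    return points


# (timestamp in seconds, raw value) pairs, invented for illustration.
samples = [(0, 100), (20, 300), (40, 700)]
print(as_gauge(samples))
print(as_rate(samples))
```

Note that a rate needs two samples before it can produce a point, so graphs and monitors built on the old gauge values may need to be reviewed.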
WMI check
The wmi_check now only supports % as the wildcard character in the filters. Support for * as a wildcard character, which was undocumented, has been dropped.
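Existing filters that relied on * can be converted with a small helper. This is a sketch under the assumption that filters are loaded from the YAML conf as a list of dicts mapping a property name to a value. The function name and the sample filters are hypothetical.

```python
def migrate_wmi_filters(filters):
    """Replace the dropped * wildcard with % in string filter values.

    Assumes `filters` is a list of {property: value} dicts, as loaded from
    the check's YAML conf. Non-string values are left unchanged.
    """
    return [
        {
            prop: value.replace("*", "%") if isinstance(value, str) else value
            for prop, value in flt.items()
        }
        for flt in filters
    ]


old_filters = [{"Name": "chrome*"}, {"ProcessId": 4}]
print(migrate_wmi_filters(old_filters))
```

The replacement is purely textual, so a value that contains a literal * that was never meant as a wildcard should be reviewed by hand.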
Changes
- [FEATURE] Ceph: New check collecting metrics from Ceph clusters. #2264
- [FEATURE] Consul: Add SSL support. See #2034 (Thanks @diogokiss)
- [FEATURE] DNS: New check that sends a service check reflecting the status of a hostname's resolution on a nameserver. See #2249 and #2289
- [FEATURE] Elasticsearch: Report additional metrics related to fs, indices.segments and indices.translog. See #2143 (Thanks @bdharrington7)
- [FEATURE] HDFS: 2 new checks (see description above). See #2235, #2260, #2274 and #2287
- [FEATURE] Go-metro: New component that measures TCP RTT (in beta, see description above). See #2208
- [FEATURE] Linux: Add memory metrics (slab, page tables and cached swap). See #2100 (Thanks @gphat)
- [FEATURE] Linux: New linux_proc_extras check collecting system-wide metrics on interrupts, context switches and processes. See #2202 (Thanks @gphat)
- [FEATURE] MapReduce: New check (see description above). See #2236
- [FEATURE] MongoDB: Collect optional additional metrics, grouped by topic. These can be enabled with the new additional_metrics option in the check's YAML conf. Also, the underlying pymongo library has been upgraded from 2.8 to 3.2. See #2161, #2166, #2140 and #2160 (Thanks @scottbessler and @benmccann)
- [FEATURE] MySQL: Add tag parameter for custom MySQL queries. See #2229
- [FEATURE] MySQL: Enhance the catalog of metrics reported, and add a service check on the replication state. See #2116, #2242 and #2288 (Thanks @ovaistariq, @zdannar, @polynomial, @leucos, @Zenexer, @c960657, @nfo, @patricknelson and @scottgeary)
- [FEATURE] Postgres: Measure user functions. See #2164
- [FEATURE] Process: Allow configuring the path to procfs (useful when the Agent is run in a container), with a newer version of psutil. See #2163 and #2134 (Thanks @sethp-jive)
- [FEATURE] Redis: Optionally report metrics from INFO COMMANDSTATS as calls, usec and usec_per_call (prefixed with redis.command.). See #2109
- [FEATURE] SNMP: Add support for forced SNMP data types to help with buggy devices. See #2165 (Thanks @chrissnell)
- [FEATURE] SSH: Add Windows support. See #2072
- [FEATURE] StatsD: New check collecting metrics and service checks using StatsD's admin interface. See #1978 and #2162 (Thanks @gphat)
- [FEATURE] vSphere: Add SSL config options for certs. See #2180
- [FEATURE] YARN: New check (see description above). See #2147 and #2207
- [FEATURE] Zookeeper: Gather stats from mntr command and report zookeeper.instances.<mode> metrics as 0/1 gauge. See #2156 (Thanks @jpittis)
- [IMPROVEMENT] Apache: Allow disabling ssl validation. See #2169
- [IMPROVEMENT] AWS: Incorporate security-groups into tags collected from EC2. See #1951
- [IMPROVEMENT] Cassandra: Add YAML conf for Cassandra version > 2.2. See #2142 and #2271
- [IMPROVEMENT] Directory: Show check on Windows. See #2184 (Thanks @xkrt)
- [IMPROVEMENT] Docker: Pass tags to events as well. See #2182
- [IMPROVEMENT] Docker: Use the docker hostname as the agent's hostname when available. See #2145
- [IMPROVEMENT] Elasticsearch: Apply custom tags to service checks too. See #2148
- [IMPROVEMENT] Go expvar: Add configuration option for custom metric namespace. See #2022 (Thanks @theckman)
- [IMPROVEMENT] Go expvar: Add counter support. See #2133 (Thanks @gphat)
- [IMPROVEMENT] Gohai: Count number of logical processors. See gohai-22
- [IMPROVEMENT] HAProxy: Add option to count statuses by service. See #2304 and #2314
- [IMPROVEMENT] HTTP: Add a days_critical option to the SSL certificate expiration feature. See #2087
- [IMPROVEMENT] HTTP: Support unicode in content-matching. See #2092
- [IMPROVEMENT] Kafka: Compute instant rates and capture more metrics in example configuration. See #2079 (Thanks @dougbarth)
- [IMPROVEMENT] Linux install script: Add custom provided hostname to datadog.conf. See #2225 (Thanks @lowl4tency)
- [IMPROVEMENT] Mesos: Improve checks' performance by preventing requests from using chardet. See #2192 (Thanks @GregBowyer)
- [IMPROVEMENT] MongoDB: Tag mongo instances by replset state. See #2244 (Thanks @rhwlo)
- [IMPROVEMENT] SNMP: Improve performance by running instances of the check in parallel. See #2152
- [IMPROVEMENT] SNMP: Make MIB constraint enforcement optional and improve resilience. See #2268
- [IMPROVEMENT] TeamCity: Allow disabling ssl validation. See #2091 (Thanks @jslatts)
- [IMPROVEMENT] Unix: Revamp source install script. See #2198 and #2199
- [IMPROVEMENT] vSphere: Add network.received and network.transmitted to the basic metrics collected. See #1824
- [IMPROVEMENT] vSphere: Check metric type to determine how to report (rate or gauge). See #2115
- [IMPROVEMENT] Windows: Add uptime metric. See #2135, #2292 and #2299
- [IMPROVEMENT] Windows WMI-based checks (wmi_check, System check, IIS, Windows Service, Windows Event Log): gracefully time out WMI queries. See #2185, #2228 and #2278
- [IMPROVEMENT] Windows IIS, Service and Event Log checks: use the new WMI wrapper with increased performance. See #2136
- [IMPROVEMENT] Windows packaging: Tighten permissions on datadog.conf. See #2210
- [BUGFIX] AWS: Use proxy settings for EC2 tag collection. See #2201
- [BUGFIX] AWS: During EC2 tags collection, log a warning when the instance is not associated with an IAM role. See #2285
- [BUGFIX] Core: Do not log API keys. See #2146
- [BUGFIX] Core: Fix cases of low/no disk space causing the Agent to crash when calling subprocesses. See #2223
- [BUGFIX] Core: Make Dogstatsd recover gracefully from serialization errors. See #2176
- [BUGFIX] Core: Set agent pid file and path from constants. See #2128 (Thanks @urosgruber)
- [BUGFIX] Development: Fix test of platform in etcd CI setup script. See #2205 (Thanks @ojongerius)
- [BUGFIX] Docker: Avoid event collection failure if an event has no ID param. See #2308
- [BUGFIX] Docker: Catch exception when getting k8s labels fails. See #2200
- [BUGFIX] Docker: Don't warn if process finishes before measuring. See #2114 (Thanks @oeuftete)
- [BUGFIX] Docker: Remove misleading warning on excluded containers. See #2179 (Thanks @EdRow)
- [BUGFIX] Documentation: Update link to dogstatsd guide in datadog.conf. See #2181
- [BUGFIX] Elasticsearch: Optionally collect pending task stats. See #2250
- [BUGFIX] Flare: Use ssl and proxy settings from datadog.conf. See #2234 (Thanks @tebriel)
- [BUGFIX] Flare: Mention path to tar file in Windows UI. See #2084
- [BUGFIX] FreeBSD: Use correct log file for syslog. See #2171
- [BUGFIX] Go expvar: Add timeout for requests to get go expvar metrics. See #2183 (Thanks @gphat)
- [BUGFIX] Gohai: Log unexpected OSError exceptions instead of re-raising them. See #2309
- [BUGFIX] Gunicorn: Mention in YAML conf that the setproctitle module is required. See #2215
- [BUGFIX] HTTP: Add an option to disable warnings when ssl validation is disabled. See #2193
- [BUGFIX] HTTP: Improve log message when http code is incorrect. See #2203
- [BUGFIX] HTTP: Rename ssl_expire to check_certificate_expiration in YAML comment. See #2086 (Thanks @MiguelMoll)
- [BUGFIX] HTTP: Use proxy settings from datadog.conf. See #2112
- [BUGFIX] Kubernetes: Remove unused function. See #2157
- [BUGFIX] OpenStack: Improve docs in YAML conf. See #2094
- [BUGFIX] OpenStack: Remove recommendation for omitting trailing slashes in YAML conf. See #2081
- [BUGFIX] Mac OS X: Fix gohai call by passing correct PATH to supervisor. See #2206
- [BUGFIX] Mesos slave: Allow configuring mesos master port. See #2189
- [BUGFIX] MySQL: Fix buggy tagging in service_checks on instances configured with a unix socket. See #2216
- [BUGFIX] PgBouncer: Avoid raising error when there are no results for a query. See #2280 (Thanks @hjkatz)
- [BUGFIX] SNMP: Fix bug when the requested oid is prefixed by another requested oid. See #2246 (Thanks @xkrt)
- [BUGFIX] Tomcat: Fix bad attribute in YAML conf file. See #2153
- [BUGFIX] Unix: Fix URL of get-pip script in source install script. See #2220 (Thanks @mooney6023)
- [BUGFIX] Windows: Fix cases of collector getting wrongfully restarted by watchdog after one correct watchdog restart. See #2175
- [BUGFIX] WMI check: Remove unnecessary warnings on Name property. See #2291
- [BUGFIX] WMI check: Always add the tag_by parameter to the collected properties. See #2296
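As an illustration of the Redis entry in the list above, the sketch below turns the text returned by INFO COMMANDSTATS into metric tuples named with the redis.command. prefix. It is a hedged example, not the check's implementation: the function name is hypothetical, and tagging each point with the command name is an assumption made here.

```python
def parse_commandstats(info_text):
    """Convert INFO COMMANDSTATS output into (name, value, tags) tuples.

    Each relevant line looks like:
        cmdstat_get:calls=21,usec=175,usec_per_call=8.33
    The "command:<name>" tag is an assumption made for this example.
    """
    metrics = []
    for line in info_text.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip section headers and blank lines
        name, _, stats = line.partition(":")
        command = name[len("cmdstat_"):]
        for pair in stats.split(","):
            key, _, raw = pair.partition("=")
            if key in ("calls", "usec", "usec_per_call"):
                metrics.append(
                    ("redis.command." + key, float(raw), ["command:" + command])
                )
    return metrics


sample = "# Commandstats\ncmdstat_get:calls=21,usec=175,usec_per_call=8.33\n"
for metric in parse_commandstats(sample):
    print(metric)
```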