Releases: DataDog/dd-agent
5.8.0-rc.1
Merge pull request #2500 from stripe/cory-fix-mesos-key-errors [mesos_master] Fix key error when a metric is missing.
5.7.4
5.7.4 / 04-21-2016
All platforms
Details
Changes
- [FEATURE] Core: Add Python architecture to the `info` command. See #2413
- [FEATURE] MongoDB: Collect additional `WiredTiger` storage engine metrics. See #1825, #2423 (Thanks @benmccann)
- [FEATURE] MongoDB: Collect replication set information metrics. See #2237, #2429 (Thanks @rhwlo)
- [IMPROVEMENT] Cassandra: Exclude new `system_schema` keyspace in version 3.x. See #2339
- [IMPROVEMENT] Kafka: Update the default configuration file to support collection of more metrics. See #2371, #2425
- [IMPROVEMENT] MongoDB: Collect journaling, locks, `WiredTiger` storage engine metrics by default and deprecate the corresponding options. See #2423
- [IMPROVEMENT] MongoDB: Improve messaging and tags on replication set member state events. See #2409, #2432 (Thanks @antifuchs)
- [IMPROVEMENT] Windows WMI-based checks (`wmi_check`, System check, IIS, Windows Service, Windows Event Log): gracefully time out WMI queries. See #2185, #2228, #2278, #2366 and #2401.
- [BUGFIX] Agent Metrics: Flush service metadata to avoid memory leaks. See #2414
5.7.3
5.7.3 / 03-31-2016
All platforms
Details
Changes
- [IMPROVEMENT] Linux install script: Ignore `apt-get update` failures and use https for apt repo. See #2378
- [IMPROVEMENT] WMI check: Make configuration of the metric types case-insensitive. See #2392
- [BUGFIX] Consul: Enforce that `get_peers_in_cluster` returns a list. See #2381
- [BUGFIX] Core: Fix Japanese tzname encoding issue on Windows. See #2351
- [BUGFIX] Core: On RHEL, make the `stop` init command kill all the Agent processes properly. See #2349
- [BUGFIX] Core: Fix Watchdog timeout duration on the forwarder. See #2320
- [BUGFIX] IIS: Fix `CRITICAL` service check on the `_Total` site (e.g. when no `sites` are specified). See #2387
- [BUGFIX] IIS: Deal with BytesTransfered vs BytesTransferred 2008sp2 typo. See #2379
- [BUGFIX] JMXFetch: Take into account `bind_host`. See #2372 and jmxfetch-85
- [BUGFIX] JMXFetch: Handle IOException gracefully at the instance level. See jmxfetch-83
- [BUGFIX] Packaging: Fix the version of `requests` shipped with the packaged Linux Agent. See omnibus-software-45
- [BUGFIX] MySQL: Avoid check failure when InnoDB is not available or disabled. See #2385
- [BUGFIX] SNMP: Fix errors in multiple-instance configurations caused by thread-safety issues with pysnmp `cmd_generator`. See #2357
- [BUGFIX] SQLServer: Fix connection to DB when no username or password are specified. See #2311
- [BUGFIX] vSphere: Fix SSL config options feature by upgrading the packaged version of `pyvmomi`. See omnibus-software-44
- [BUGFIX] Windows Event Log: Fix check when `tag_event_id:true` #2397
5.7.2
5.7.2 / 03-17-2016
Windows only
Details
Changes
- [BUGFIX] WMI: Disable query timeout, cache and re-use connections to avoid memory leaks. See #2366
5.7.1
5.7.1 / 03-09-2016
All platforms
NB: For Windows, please also refer to the `5.7.0` section. `5.7.0` was not released on Windows but the `5.7.1` Windows installer includes all the changes listed in the `5.7.0` section.
Details
Changes
- [BUGFIX] Core: Avoid python segfault when the ctypes module is imported on SELinux-enabled environments. See omnibus-software-43
- [BUGFIX] MySQL: Fix check failure when no tag is provided. See #2329
- [BUGFIX] Packaging: Fix RPM package for Amazon Linux EMR. See omnibus-ruby-18
5.7.0
5.7.0 / 03-07-2016
Linux, Mac OS and Source Install only
Details
New integrations
- Ceph
- DNS
- HDFS
- MapReduce
- StatsD
- TCP RTT (`go-metro`)
- YARN
Updated integrations
- Apache
- AWS
- Cassandra
- Consul
- Directory
- Docker
- Elasticsearch
- Go expvar
- Gunicorn
- HAProxy
- HTTP
- IIS
- Kafka
- Mesos
- MongoDB
- MySQL
- PgBouncer
- Postgres
- Process
- Redis
- SNMP
- SSH
- TeamCity
- Tomcat
- vSphere
- Windows Service
- Windows Event Log
- WMI
- Zookeeper
Hadoop integrations (HDFS, MapReduce and YARN checks)
The Agent now includes 4 new checks to monitor Hadoop clusters:
- 2 HDFS checks (`hdfs_namenode` and `hdfs_datanode`) that collect metrics respectively from namenodes and datanodes using the JMX-HTTP API
- a MapReduce check that collects metrics on the running MapReduce jobs from the Application Master's REST API
- a YARN check that collects metrics from YARN's ResourceManager REST API
The existing `hdfs` check is deprecated and will be removed in a future version of the Agent. Its metric scope is entirely covered by the new `hdfs_namenode` check.
TCP RTT measurement with go-metro
This new feature is in beta
The Datadog Agent on 64-bit Linux is now bundled with a new component (`go-metro`) that passively calculates TCP RTT metrics between the agent's host and external hosts, and reports them as `system.net.tcp.rtt.avg`, `system.net.tcp.rtt.jitter` and `system.net.tcp.rtt` through StatsD.
`go-metro` follows TCP streams active within a certain period of time and estimates the RTT between any outgoing packet with data, and its corresponding TCP acknowledgement.
`go-metro` runs in its own process. It's disabled by default and can be enabled like a regular check by configuring an `/etc/dd-agent/conf.d/go-metro.yaml` file and restarting the agent.
For more details on `go-metro`, check out the project's GitHub page.
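As a rough illustration of the estimate described above (this is not `go-metro`'s actual implementation), the sketch below pairs each outgoing data packet with the acknowledgement that covers it and derives an average and a jitter value from the resulting samples. The function names, the input tuples and the jitter definition (mean absolute difference between consecutive samples) are assumptions made for this example.

```python
from statistics import mean


def estimate_rtts(sent, acks):
    """Pair outgoing data packets with their acknowledgements.

    sent: iterable of (end_seq, sent_ms) for outgoing packets carrying data,
          where end_seq is the sequence number the peer is expected to ACK.
    acks: iterable of (ack_no, received_ms) for incoming acknowledgements.
    Returns one RTT sample (in ms) per packet that was acknowledged exactly.
    """
    pending = {}
    for end_seq, sent_ms in sent:
        # Keep only the first transmission so retransmits do not skew the RTT.
        pending.setdefault(end_seq, sent_ms)

    samples = []
    for ack_no, received_ms in acks:
        sent_ms = pending.get(ack_no)
        if sent_ms is not None and received_ms >= sent_ms:
            samples.append(received_ms - sent_ms)
            del pending[ack_no]
    return samples


def summarize(samples):
    """Return (average, jitter) for a non-empty list of RTT samples."""
    avg = mean(samples)
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    # Assumed jitter definition: mean absolute change between samples.
    jitter = mean(diffs) if diffs else 0
    return avg, jitter


# Hypothetical capture: three data packets and their acknowledgements.
sent = [(100, 0), (200, 10), (300, 20)]
acks = [(100, 30), (200, 50), (300, 40)]
samples = estimate_rtts(sent, acks)
avg, jitter = summarize(samples)
```

Only exact matches between an expected sequence number and an acknowledgement number are counted here; a real passive monitor also has to deal with cumulative and delayed acknowledgements.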
Ceph check
The Ceph check retrieves metrics from Ceph's Administration Tool command (`ceph`).
The check collects metrics from `mon_status`, `status`, `df detail`, `osd pool stats` and `osd perf`, and sends a service check reflecting the overall health of the cluster.
See #2264
MySQL
Multiple community-contributed additions to the MySQL check have been consolidated and merged, including:
- metrics from the `performance_schema` table on MySQL >= 5.6 (thanks to @ovaistariq)
- extra metrics on the InnoDB and MyISAM engines, from the Binlog, and from the `SHOW STATUS` query (thanks to @ovaistariq)
- several schema-specific metrics, including schema size, schema average query runtime and 95th percentile query execution time (thanks again to @ovaistariq)
- metrics on the Handler (thanks to @polynomial)
- Galera-specific performance stats (thanks to @zdannar)
- Query Cache metrics (thanks to @leucos)
- a `mysql.replication.slave_running` service check reflecting the state of the slaves (thanks to @c960657)
Most of these additional metrics are not collected by default but can be enabled in the check's YAML file. See the YAML conf example file for details.
Various bug fixes and improvements have also been implemented:
- the Agent's connections to MySQL are handled properly to prevent stale connections
- the replication status is implemented on both the master and the slaves. On the master this status is determined by the Binlog status and the number of slaves.
- the system metrics of MySQL are retrieved without errors on non-Linux platforms by using the `psutil` library
- the parsing of the MySQL server version is improved
Huge thanks to all our contributors for all these improvements!
Potential backward incompatibilities
Docker
The dockerized Agent now uses the docker hostname (provided by the `Name` param from `docker info`) as its own hostname when available. This means that for hosts running the dockerized Agent the reported hostname may change to this docker-provided hostname.
For reference, the rules followed by the Agent for its hostname resolution are described on this wiki page.
MongoDB
The `collect_tcmalloc_metrics` parameter in the YAML conf is replaced with the `tcmalloc` option under `additional_metrics`.
Please refer to the example YAML conf file for more info on the usage of the `additional_metrics` option.
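To make the rename concrete, here is a minimal sketch of how an already-parsed instance configuration could be migrated. It assumes `additional_metrics` is a list of option names and that the legacy parameter is a boolean; `migrate_instance` is a hypothetical helper written for this example and is not part of the Agent.

```python
def migrate_instance(instance):
    """Return a copy of a MongoDB check instance using the new option layout."""
    migrated = dict(instance)
    # Drop the legacy flag and remember whether it was switched on.
    legacy_enabled = migrated.pop("collect_tcmalloc_metrics", False)

    # Assumption: additional_metrics is a list of option names.
    additional = list(migrated.get("additional_metrics", []))
    if legacy_enabled and "tcmalloc" not in additional:
        additional.append("tcmalloc")
    if additional:
        migrated["additional_metrics"] = additional
    return migrated


# Hypothetical instance as it might look after loading the YAML file.
old_instance = {
    "server": "mongodb://localhost:27017",
    "collect_tcmalloc_metrics": True,
}
new_instance = migrate_instance(old_instance)
```

The helper copies the instance instead of mutating it, so the original configuration stays available for comparison.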
vSphere
Instead of sending all metrics as `gauge`s, the vSphere integration now checks the types of the metrics as reported by the VMWare module, and sends metrics as `rate`s when applicable.
If you haven't enabled the `all_metrics` option on the check, the only affected metrics are `cpu.usage`, `cpu.usagemhz`, `network.received` and `network.transmitted`.
If the option is enabled, the additional affected metrics are listed here. The change will affect the values of these metrics.
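The effect on reported values can be seen with a small sketch. It assumes the usual meaning of the two submission types: a gauge reports the latest sampled value, while a rate reports the per-second change between two consecutive samples. The helper names and sample data are made up for this example.

```python
def as_gauge(samples):
    """A gauge reports the most recent raw value as-is."""
    return samples[-1][1]


def as_rate(samples):
    """A rate reports the per-second change between the last two samples."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be ordered by increasing timestamp")
    return (v1 - v0) / (t1 - t0)


# Hypothetical (timestamp_seconds, raw_value) samples for one metric.
samples = [(0, 1000), (20, 1600)]
gauge_value = as_gauge(samples)
rate_value = as_rate(samples)
```

With the same raw samples the gauge yields 1600 while the rate yields 30.0 per second, which is why dashboards and monitors built on the affected metrics may need their thresholds reviewed.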
WMI check
The `wmi_check` now only supports `%` as the wildcard character in the `filters`. The support of `*` as the wildcard character, which was undocumented, has been dropped.
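Configurations that relied on the undocumented `*` wildcard need to be updated. The sketch below shows one way to rewrite filters that have already been loaded from YAML; it assumes `filters` is a list of mappings from WMI property name to value, and `migrate_filters` is a hypothetical helper, not something shipped with the Agent.

```python
def migrate_filters(filters):
    """Replace the legacy '*' wildcard with '%' in string filter values."""
    migrated = []
    for wmi_filter in filters:
        migrated.append({
            prop: value.replace("*", "%") if isinstance(value, str) else value
            for prop, value in wmi_filter.items()
        })
    return migrated


# Hypothetical filters: one using the legacy wildcard, one numeric.
old_filters = [{"Name": "chrome*"}, {"IDProcess": 4}]
new_filters = migrate_filters(old_filters)
```

A blanket replacement like this is only safe if no filter value contains a literal `*`; review the result before deploying it.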
Changes
- [FEATURE] Ceph: New check collecting metrics from Ceph clusters. #2264
- [FEATURE] Consul: Add SSL support. See #2034 (Thanks @diogokiss)
- [FEATURE] DNS: New check that sends a service check reflecting the status of a hostname's resolution on a nameserver. See #2249 and #2289
- [FEATURE] Elasticsearch: Report additional metrics related to `fs`, `indices.segments` and `indices.translog`. See #2143 (Thanks @bdharrington7)
- [FEATURE] HDFS: 2 new checks (see description above). See #2235, #2260, #2274 and #2287
- [FEATURE] Go-metro: New component that measures TCP RTT (in beta, see description above). See #2208
- [FEATURE] Linux: Add memory metrics (slab, page tables and cached swap). See #2100 (Thanks @gphat)
- [FEATURE] Linux: New `linux_proc_extras` check collecting system-wide metrics on interrupts, context switches and processes. See #2202 (Thanks @gphat)
- [FEATURE] MapReduce: New check (see description above). See #2236
- [FEATURE] MongoDB: Collect optional additional metrics, grouped by topic. These can be enabled with the new `additional_metrics` option in the check's YAML conf. Also, the underlying `pymongo` library has been upgraded from `2.8` to `3.2`. See #2161, #2166, #2140 and #2160 (Thanks @scottbessler and @benmccann)
- [FEATURE] MySQL: Add tag parameter for custom MySQL queries. See #2229
- [FEATURE] MySQL: Enhance the catalog of metrics reported, and add a service check on the replication state. See #2116, #2242 and #2288 (Thanks @ovaistariq, @zdannar, @polynomial, @leucos, @Zenexer, @c960657, @nfo, @patricknelson and @scottgeary)
- [FEATURE] Postgres: Measure user functions. See #2164
- [FEATURE] Process: Allow configuring the path to procfs (useful when the agent is run in a container), with a newer version of `psutil`. See #2163 and #2134 (Thanks @sethp-jive)
- [FEATURE] Redis: Optionally report metrics from `INFO COMMANDSTATS` as `calls`, `usec` and `usec_per_call` (prefixed with `redis.command.`). See #2109
- [FEATURE] SNMP: Add support for forced SNMP data types to help w/ buggy devices. See #2165 (Thanks @chrissnell)
- [FEATURE] SSH: Add Windows support. See #2072
- [FEATURE] StatsD: New check collecting metrics and service checks using StatsD's admin interface. See #1978 and [#2162](https://github.com/...
5.6.3
5.6.3 / 12-10-2015
Details
Changes
- [FEATURE] Consul: More accurate nodes_* and services_* gauges (NB: `consul.check` service checks are now tagged by `consul_service_id` rather than `service-id`) See #2130 (Thanks @mtougeron)
- [FEATURE] Docker: Improve container name, image name and image tag extraction. See #2071
- [BUGFIX] Core: Catch and log exceptions from the resources checks. See #2029
- [BUGFIX] Core: Fix host tags sending when `create_dd_check_tags` is enabled. See #2088
- [BUGFIX] Docker: Add one more cgroup location. See #2139 (Thanks @bakins)
- [BUGFIX] Flare: Remove proxy credentials from collected datadog.conf. See #1942
- [BUGFIX] Marathon: Fix disk typo in metric name. See #2126 (Thanks @pidah)
- [BUGFIX] OS X: Fix memory metrics. See #2097
- [BUGFIX] Postgres: Fix metrics not reporting with multiple relations. See #2111
- [BUGFIX] Windows: Bundle default CA certs of `requests`. See #2098
5.6.2
5.6.2 / 11-16-2015
Details
Changes
5.6.1
5.6.1 / 11-09-2015
Details
Changes
- [BUGFIX] Consul: Add the main tags to service checks. See #2015 (Thanks @mtougeron)
- [BUGFIX] Docker: Remove spurious proc root container warnings. See #2055 (Thanks @oeuftete)
- [BUGFIX] Flare: Restore missing JMXFetch information. See #2062
- [BUGFIX] OpenStack: Fix false-critical on the network service check. See #2063
- [BUGFIX] Windows: Restore missing JMXFetch service logs. See #1852, #2065
- [OTHER] Upgrade `pymongo` dependency from `2.6.3` to `2.8` on Windows Datadog Agent 32-bit MSI Installer.
- [OTHER] Allow `supervisor.conf` to select Supervisor user. See #2064
5.6.0
5.6.0 / 11-05-2015
Linux, Mac OS and Source Install only
Details
New integration(s)
- Kubernetes
- OpenStack
Updated integrations
- ActiveMQ
- Cassandra
- Couchbase
- Docker
- Dogstream
- HAProxy
- HTTPCheck
- JMXFetch
- Memcached
- MongoDB
- Network
- Nginx
- Process
- Riak
- SNMP
- SQL Server
- Unix
- Windows
- Windows Event Viewer
- WMI
Kubernetes check
The Kubernetes check retrieves metrics from cAdvisor running under Kubelet.
See #2031
OpenStack check
The OpenStack check is intended to run alongside individual hypervisors. It can be scoped to any set of projects living on that host via instance-level config.
At the hypervisor-level it collects:
At the project-level it collects:
Additionally it sends service checks to register the UP/DOWN state of networks discovered by the agent, the locally running hypervisor, and the outward-facing (`public || internal`) API services of Nova, Neutron and Keystone.
Authentication
While the check will run without issues as an `admin` user, it is recommended to configure a read-only `datadog` user and configure the check with the corresponding user/password. Instructions on setting up the datadog user + role, as well as the changes required to the `policy.json` file can be found here
A note on compatibility
Authentication is performed via the password method, and requires Identity API v3
Nova API v2 and v2.1 are supported, with minor additional configuration necessary for v2.
A big thank you @mtougeron !
See #1864
New WMI module wrapper
Datadog Agent 5.6.0 ships a new built-in lightweight Python WMI module wrapper, built on top of `pywin32` and `win32com` extensions.
Specifications
- Based on top of the `pywin32` and `win32com` third party extensions only
- Compatible with `Raw`* and `Formatted` Performance Data classes
  - Dynamically resolve properties' counter types
  - Hold the previous/current `Raw` samples to compute/format new values*
- Fast and lightweight
- Avoid queries overhead
- Cache connections and qualifiers
- Use `wbemFlagForwardOnly` flag to improve enumeration/memory performance
\* `Raw` data formatting relies on the availability of the corresponding calculator. Please refer to `checks.lib.wmi.counter_type` for more information.
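To illustrate why two `Raw` samples are needed, the sketch below formats a counter of type `PERF_COUNTER_COUNTER` using the documented formula (N1 - N0) / ((D1 - D0) / F), where N is the raw value, D the timestamp in ticks and F the tick frequency. This is not the code from `checks.lib.wmi.counter_type`: the function name and the sample dictionaries are assumptions for the example, and the property names follow the standard `Timestamp_PerfTime` and `Frequency_PerfTime` fields of raw performance classes.

```python
def format_perf_counter_counter(previous, current, prop):
    """Compute a per-second value from two raw samples of one counter.

    previous, current: dicts holding the raw property value plus the
    Timestamp_PerfTime (ticks) and Frequency_PerfTime (ticks per second).
    Returns None when no time has elapsed between the samples.
    """
    elapsed_ticks = current["Timestamp_PerfTime"] - previous["Timestamp_PerfTime"]
    if elapsed_ticks <= 0:
        return None
    elapsed_seconds = elapsed_ticks / current["Frequency_PerfTime"]
    return (current[prop] - previous[prop]) / elapsed_seconds


# Hypothetical raw samples taken 5 seconds apart (500 ticks at 100 ticks/s).
previous = {"Timestamp_PerfTime": 1000, "Frequency_PerfTime": 100, "DiskReadsPersec": 50}
current = {"Timestamp_PerfTime": 1500, "Frequency_PerfTime": 100, "DiskReadsPersec": 150}
reads_per_second = format_perf_counter_counter(previous, current, "DiskReadsPersec")
```

The first sample of a counter cannot be formatted on its own, which is why a wrapper has to hold on to the previous sample between collections.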
Usage
The new WMI module wrapper is used by the following checks to improve performance:
- System
- WMI
Other checks relying on WMI collection will follow in future versions of Datadog Agent.
See #2011
Original discussion thread: #1952
Credits to @TheCloudlessSky (https://github.com/TheCloudlessSky)
[Warning] JMXFetch false-positive bean match & potential backward incompatibilities issues
JMXFetch was incorrectly matching some MBean attributes when the associated MBean had one of its parameters defined in an instance configuration.
The issue has been addressed. As a result, metrics related to false-positive bean matches are no longer reported.
Potential affected checks: ActiveMQ, Cassandra, JMX, Solr, Tomcat.
For more information, please get in touch with support@datadoghq.com
See #81
Changes
- [FEATURE] Cassandra: Support Cassandra > 2.2 metric name structure (CASSANDRA-4009). See #79, #2035
- [FEATURE] Core: Add service check count to the output of Dogstatsd 'info' section. See #1799
- [FEATURE] Docker: Add container names as tags for events. See #2026
- [FEATURE] HAProxy: Collect the number of available/unavailable backends. See #1915 (Thanks @a20012251)
- [FEATURE] JMXFetch: Option to add custom JARs to the classpath. See #1996
- [FEATURE] JMXFetch: Support `float` and `java.lang.Float` attribute types as simple JMX attributes. See #76
- [FEATURE] JMXFetch: Support Cassandra > 2.2 metric name structure (CASSANDRA-4009). See #79
- [FEATURE] JMXFetch: Support custom JMX Service URL to connect to, on a per-instance basis. See #80
- [FEATURE] Kubernetes: New check. See #1919, #2031, #2038, #2039
- [FEATURE] Memcached: Collect `listen_disabled_num` timeout counter. See #1995 (Thanks @alaz)
- [FEATURE] MongoDB: Collect TCMalloc memory allocator metrics. See #1979 (Thanks @benmccann)
- [FEATURE] MongoDB: Report `dbStats` metrics for all databases. See #1855, #1961 (Thanks @asiebert)
- [FEATURE] Network: Add UDP metrics from `/proc/net/snmp` in addition to the existing TCP metrics. See #1974, #1986 (Thanks @gphat)
- [FEATURE] OpenStack: New check. See #1864, #2040
- [FEATURE] Riak: Add custom tags to service checks' tags. See #1482, #1527, #1987
- [FEATURE] SNMP: Option to set the OID batch size. See #1990
- [FEATURE] Unix: Collect `/proc/meminfo` `MemAvailable` metric when available. See #1826, #1993 (Thanks @jraede)
- [FEATURE] Windows Event Viewer: Option to tag events by `event_id`. See #2009
- [IMPROVEMENT] Core: Deprecate 'use_dd' flag. See #1856, #1860 (Thanks @ssbarnea)
- [IMPROVEMENT] Core: Fix hanging `subprocess.Popen` calls caused by buffer limits. See #1892
- [IMPROVEMENT] Core: Remove the uses of list comprehensions as looping constructs. See #1939 (Thanks @jamesandariese)
- [IMPROVEMENT] Core: Run Supervisor as `dd-agent` user. See #1348, #1620, #1895
- [IMPROVEMENT] Core: Use user-defined NTP settings in 'info' command's status page. See #1985
- [IMPROVEMENT] Dogstream: Add DEBUG logging to event collection. See #1910
- [IMPROVEMENT] JMXFetch: Assign generic alias if not defined. See #78
- [IMPROVEMENT] Network: Use `ss` instead of `netstat` on Linux systems. See #1156, #1859 (Thanks @tliakos)
- [IMPROVEMENT] Nginx: Add logging on check exceptions. See #1813, #1914 (Thanks @clokep)
- [IMPROVEMENT] Process: Improve sampling of `system.processes.cpu.pct` metric. See #1660, #1928
- [IMPROVEMENT] Unix: Filter SunOS `memory_cap` `kstats` by module. See #1959 (Thanks @pfmooney)
- [IMPROVEMENT] Windows: New WMI module wrapper to improve speed performances. See #1952, #2011 (Thanks @TheCloudlessSky)
- [IMPROVEMENT] Windows: Switch to the built-in WMI core to improve system metric collection performances. See #1952, #2011 (Thanks @TheCloudlessSky)
- [IMPROVEMENT] WMI: Switch to the built-in WMI core to improve the check performances. See [#1952](https://github.com...