Releases: DataDog/dd-agent

5.8.0-rc.1

16 May 15:13
Pre-release
Merge pull request #2500 from stripe/cory-fix-mesos-key-errors

[mesos_master] Fix key error when a metric is missing.
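A minimal sketch of the kind of guard this fix implies, assuming the check reads metrics out of a parsed JSON dictionary. The function name, the payload keys and the metric names are hypothetical and are not taken from the actual mesos_master check.

```python
def collect_present_metrics(payload, wanted):
    """Return (metric_name, value) pairs for the wanted keys found in payload.

    `payload` is the parsed stats dictionary and `wanted` maps a payload key
    to the metric name to report (both hypothetical). Keys that are absent
    are skipped instead of raising KeyError.
    """
    found = []
    for key in sorted(wanted):
        value = payload.get(key)  # .get() returns None when the key is missing
        if value is None:
            continue  # this metric is not exposed, so skip it
        found.append((wanted[key], value))
    return found
```

Looking the key up with `payload[key]` would raise as soon as one metric is missing and abort the whole check run; `.get()` lets the remaining metrics still be reported.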

5.7.4

21 Apr 20:04

5.7.4 / 04-21-2016

All platforms

Details

5.7.3...5.7.4

Changes

  • [FEATURE] Core: Add Python architecture to the info command. See #2413
  • [FEATURE] MongoDB: Collect additional WiredTiger storage engine metrics. See #1825, #2423 (Thanks @benmccann)
  • [FEATURE] MongoDB: Collect replication set information metrics. See #2237, #2429 (Thanks @rhwlo)
  • [IMPROVEMENT] Cassandra: Exclude new system_schema keyspace in version 3.x. See #2339
  • [IMPROVEMENT] Kafka: Update the default configuration file to support collection of more metrics. See #2371, #2425
  • [IMPROVEMENT] MongoDB: Collect journaling, locks, WiredTiger storage engine metrics by default and deprecate the corresponding options. See #2423
  • [IMPROVEMENT] MongoDB: Improve messaging and tags on replication set member state events. See #2409, #2432 (Thanks @antifuchs)
  • [IMPROVEMENT] Windows WMI-based checks (wmi_check, System check, IIS, Windows Service, Windows Event Log): gracefully time out WMI queries. See #2185, #2228, #2278, #2366 and #2401.
  • [BUGFIX] Agent Metrics: Flush service metadata to avoid memory leaks. See #2414

5.7.3

31 Mar 17:49

5.7.3 / 03-31-2016

All platforms

Details

5.7.2...5.7.3

Changes

  • [IMPROVEMENT] Linux install script: Ignore apt-get update failures and use https for apt repo. See #2378
  • [IMPROVEMENT] WMI check: Make configuration of the metric types case-insensitive. See #2392
  • [BUGFIX] Consul: Enforce that get_peers_in_cluster returns a list. See #2381
  • [BUGFIX] Core: Fix Japanese tzname encoding issue on Windows. See #2351
  • [BUGFIX] Core: On RHEL, make the stop init command kill all the Agent processes properly. See #2349
  • [BUGFIX] Core: Fix Watchdog timeout duration on the forwarder. See #2320
  • [BUGFIX] IIS: Fix CRITICAL service check on the _Total site (e.g. when no sites are specified). See #2387
  • [BUGFIX] IIS: Deal with BytesTransfered vs BytesTransferred 2008sp2 typo. See #2379
  • [BUGFIX] JMXFetch: Take into account bind_host. See #2372 and jmxfetch-85
  • [BUGFIX] JMXFetch: Handle IOException gracefully at the instance level. See jmxfetch-83
  • [BUGFIX] Packaging: Fix the version of requests shipped with the packaged Linux Agent. See omnibus-software-45
  • [BUGFIX] MySQL: Avoid check failure when InnoDB is not available or disabled. See #2385
  • [BUGFIX] SNMP: Fix errors in multiple-instance configurations caused by thread-safety issues with pysnmp cmd_generator. See #2357
  • [BUGFIX] SQLServer: Fix connection to DB when no username or password are specified. See #2311
  • [BUGFIX] vSphere: Fix SSL config options feature by upgrading the packaged version of pyvmomi. See omnibus-software-44
  • [BUGFIX] Windows Event Log: Fix check when tag_event_id is set to true. See #2397

5.7.2

17 Mar 21:38

5.7.2 / 03-17-2016

Windows only

Details

5.7.1...5.7.2

Changes

  • [BUGFIX] WMI: Disable query timeout, cache and re-use connections to avoid memory leaks. See #2366

5.7.1

09 Mar 21:25

5.7.1 / 03-09-2016

All platforms

NB: For Windows, please also refer to the 5.7.0 section. 5.7.0 was not released on Windows but the 5.7.1 Windows installer includes all the changes listed in the 5.7.0 section.

Details

5.7.0...5.7.1

Changes

  • [BUGFIX] Core: Avoid python segfault when the ctypes module is imported on SELinux-enabled environments. See omnibus-software-43
  • [BUGFIX] MySQL: Fix check failure when no tag is provided. See #2329
  • [BUGFIX] Packaging: Fix RPM package for Amazon Linux EMR. See omnibus-ruby-18

5.7.0

07 Mar 19:47

5.7.0 / 03-07-2016

Linux, Mac OS and Source Install only

Details

5.6.3...5.7.0

New integrations

  • Ceph
  • DNS
  • HDFS
  • MapReduce
  • StatsD
  • TCP RTT (go-metro)
  • YARN

Updated integrations

  • Apache
  • AWS
  • Cassandra
  • Consul
  • Directory
  • Docker
  • Elasticsearch
  • Go expvar
  • Gunicorn
  • HAProxy
  • HTTP
  • IIS
  • Kafka
  • Mesos
  • MongoDB
  • MySQL
  • PgBouncer
  • Postgres
  • Process
  • Redis
  • SNMP
  • SSH
  • TeamCity
  • Tomcat
  • vSphere
  • Windows Service
  • Windows Event Log
  • WMI
  • Zookeeper

Hadoop integrations (HDFS, MapReduce and YARN checks)

The Agent now includes 4 new checks to monitor Hadoop clusters:

  • 2 HDFS checks (hdfs_namenode and hdfs_datanode) that collect metrics respectively from namenodes and datanodes using the JMX-HTTP API
  • a MapReduce check that collects metrics on the running MapReduce jobs from the Application Master's REST API
  • a YARN check that collects metrics from YARN's ResourceManager REST API

The existing hdfs check is deprecated and will be removed in a future version of the Agent. Its metric scope is entirely covered by the new hdfs_namenode check.

TCP RTT measurement with go-metro

This new feature is in beta.

The Datadog Agent on 64-bit Linux is now bundled with a new component (go-metro) that passively calculates TCP RTT metrics between the agent's host and external hosts, and reports them as system.net.tcp.rtt.avg, system.net.tcp.rtt.jitter and system.net.tcp.rtt through StatsD.

go-metro follows TCP streams active within a certain period of time and estimates the RTT between any outgoing packet with data, and its corresponding TCP acknowledgement.
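The estimation described above can be sketched as follows. This is an illustration of the idea only, not go-metro's implementation: the class and method names are hypothetical, it handles a single stream, and it ignores cumulative ACKs, SACK and stream expiry.

```python
class RttEstimator(object):
    """Pair outgoing data segments with the ACK that acknowledges them."""

    def __init__(self):
        self._pending = {}  # expected ACK number -> send timestamp (seconds)
        self.samples = []   # RTT samples in seconds

    def on_outgoing(self, seq, payload_len, timestamp):
        """Record an outgoing segment; segments without data are ignored."""
        if payload_len <= 0:
            return
        expected_ack = (seq + payload_len) % 2 ** 32  # sequence numbers wrap
        # Keep the first timestamp so a retransmission does not shorten the RTT.
        self._pending.setdefault(expected_ack, timestamp)

    def on_incoming_ack(self, ack, timestamp):
        """Return the RTT sample for this ACK, or None if nothing matches."""
        sent_at = self._pending.pop(ack, None)
        if sent_at is None:
            return None
        rtt = timestamp - sent_at
        self.samples.append(rtt)
        return rtt

    def average(self):
        """Mean of the collected samples, or None if there are none."""
        if not self.samples:
            return None
        return sum(self.samples) / float(len(self.samples))
```

Because the measurement is passive, no probe traffic is generated: a sample only exists when the host sends data and the peer acknowledges it.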

go-metro runs in its own process. It's disabled by default and can be enabled like a regular check by configuring an /etc/dd-agent/conf.d/go-metro.yaml file and restarting the agent.

For more details on go-metro, check out the project's GitHub page.

Ceph check

The Ceph check retrieves metrics from Ceph's Administration Tool command (ceph).

The check collects metrics from mon_status, status, df detail, osd pool stats and osd perf, and sends a service check reflecting the overall health of the cluster.

See #2264

MySQL

Multiple community-contributed additions to the MySQL check have been consolidated and merged, including:

  • metrics from the performance_schema table on MySQL >= 5.6 (thanks to @ovaistariq)
  • extra metrics on the InnoDB and MyISAM engines, from the Binlog, and from the SHOW STATUS query (thanks to @ovaistariq)
  • several schema-specific metrics, including schema size, schema average query runtime and 95th percentile query execution time (thanks again to @ovaistariq)
  • metrics on the Handler (thanks to @polynomial)
  • Galera-specific performance stats (thanks to @zdannar)
  • Query Cache metrics (thanks to @leucos)
  • a mysql.replication.slave_running service check reflecting the state of the slaves (thanks to @c960657)

Most of these additional metrics are not collected by default but can be enabled in the check's YAML file. See the YAML conf example file for details.

Various bug fixes and improvements have also been implemented:

  • the Agent's connections to MySQL are handled properly to prevent stale connections
  • the replication status is implemented on both the master and the slaves. On the master this status is determined by the Binlog status and the number of slaves.
  • the system metrics of MySQL are retrieved without errors on non-Linux platforms by using the psutil library
  • the parsing of the MySQL server version is improved

Huge thanks to all our contributors for all these improvements!

See #2116 and #2242

Potential backward incompatibilities

Docker

The dockerized Agent now uses the docker hostname (provided by the Name param from docker info) as its own hostname when available. This means that for hosts running the dockerized Agent the reported hostname may change to this docker-provided hostname.

For reference, the rules followed by the Agent for its hostname resolution are described on this wiki page.

MongoDB

The collect_tcmalloc_metrics parameter in the YAML conf is replaced with the tcmalloc option under additional_metrics.
Please refer to the example YAML conf file for more info on the usage of the additional_metrics option.
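As a rough illustration of the migration, the sketch below rewrites one instance configuration after it has been loaded into a Python dictionary. It assumes additional_metrics is a list of option names, which should be confirmed against the example YAML conf file; the function name is hypothetical.

```python
import copy


def migrate_mongo_instance(instance):
    """Return a copy of `instance` using the additional_metrics option.

    Assumption: additional_metrics is a list of strings. The input
    dictionary is left untouched.
    """
    migrated = copy.deepcopy(instance)
    legacy_enabled = migrated.pop("collect_tcmalloc_metrics", False)
    if legacy_enabled:
        extra = list(migrated.get("additional_metrics") or [])
        if "tcmalloc" not in extra:
            extra.append("tcmalloc")
        migrated["additional_metrics"] = extra
    return migrated
```

An instance that never set collect_tcmalloc_metrics comes back unchanged, so the helper can be applied to every instance in a file.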

vSphere

Instead of sending all metrics as gauges, the vSphere integration now checks the types of the metrics as reported by the VMware module, and sends metrics as rates when applicable.

If you haven't enabled the all_metrics option on the check, the only affected metrics are cpu.usage, cpu.usagemhz, network.received and network.transmitted.
If the option is enabled, the additional affected metrics are listed here. The change will affect the values of these metrics.
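To see why the values change, recall that a gauge reports the sampled value as-is, while a rate reports the change between two samples divided by the elapsed time. A minimal sketch with a hypothetical helper, not code from the check:

```python
def as_rate(previous, current):
    """Compute a per-second rate from two (timestamp_seconds, value) samples.

    Returns None when no time has elapsed between the samples.
    """
    (t0, v0), (t1, v1) = previous, current
    elapsed = t1 - t0
    if elapsed <= 0:
        return None
    return (v1 - v0) / float(elapsed)
```

For two samples of 100 and 400 taken 20 seconds apart, a gauge would report 400 while the rate is 15.0 per second, so dashboards and monitors built on the affected metrics may need their thresholds revisited.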

WMI check

The wmi_check now only supports % as the wildcard character in the filters. The support of * as the wildcard character, which was undocumented, has been dropped.
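Existing filters that relied on * can be converted mechanically. The sketch below assumes the filters have been loaded as a list of dictionaries mapping a property name to a value; the function name is hypothetical, and a value containing a literal * would also be rewritten, so review the result.

```python
def migrate_filters(filters):
    """Return a copy of `filters` with `*` replaced by `%` in string values."""
    migrated = []
    for clause in filters:
        new_clause = {}
        for prop, value in clause.items():
            if isinstance(value, str):
                value = value.replace("*", "%")  # % is the supported wildcard
            new_clause[prop] = value  # non-string values are kept as-is
        migrated.append(new_clause)
    return migrated
```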

Changes

  • [FEATURE] Ceph: New check collecting metrics from Ceph clusters. See #2264
  • [FEATURE] Consul: Add SSL support. See #2034 (Thanks @diogokiss)
  • [FEATURE] DNS: New check that sends a service check reflecting the status of a hostname's resolution on a nameserver. See #2249 and #2289
  • [FEATURE] Elasticsearch: Report additional metrics related to fs, indices.segments and indices.translog. See #2143 (Thanks @bdharrington7)
  • [FEATURE] HDFS: 2 new checks (see description above). See #2235, #2260, #2274 and #2287
  • [FEATURE] Go-metro: New component that measures TCP RTT (in beta, see description above). See #2208
  • [FEATURE] Linux: Add memory metrics (slab, page tables and cached swap). See #2100 (Thanks @gphat)
  • [FEATURE] Linux: New linux_proc_extras check collecting system-wide metrics on interrupts, context switches and processes. See #2202 (Thanks @gphat)
  • [FEATURE] MapReduce: New check (see description above). See #2236
  • [FEATURE] MongoDB: Collect optional additional metrics, grouped by topic. These can be enabled with the new additional_metrics option in the check's YAML conf. Also, the underlying pymongo library has been upgraded from 2.8 to 3.2. See #2161, #2166, #2140 and #2160 (Thanks @scottbessler and @benmccann)
  • [FEATURE] MySQL: Add tag parameter for custom MySQL queries. See #2229
  • [FEATURE] MySQL: Enhance the catalog of metrics reported, and add a service check on the replication state. See #2116, #2242 and #2288 (Thanks @ovaistariq, @zdannar, @polynomial, @leucos, @Zenexer, @c960657, @nfo, @patricknelson and @scottgeary)
  • [FEATURE] Postgres: Measure user functions. See #2164
  • [FEATURE] Process: Allow configuring the path to procfs (useful when the agent is run in a container), with a newer version of psutil. See #2163 and #2134 (Thanks @sethp-jive)
  • [FEATURE] Redis: Optionally report metrics from INFO COMMANDSTATS as calls, usec and usec_per_call (prefixed with redis.command.). See #2109
  • [FEATURE] SNMP: Add support for forced SNMP data types to help with buggy devices. See #2165 (Thanks @chrissnell)
  • [FEATURE] SSH: Add Windows support. See #2072
  • [FEATURE] StatsD: New check collecting metrics and service checks using StatsD's admin interface. See #1978 and [#2162](https://github.com/...

5.6.3

10 Dec 18:14

5.6.3 / 12-10-2015

Details

5.6.2...5.6.3

Changes

  • [FEATURE] Consul: More accurate nodes_* and services_* gauges (NB: consul.check service checks are now tagged by consul_service_id rather than service-id) See #2130 (Thanks @mtougeron)
  • [FEATURE] Docker: Improve container name, image name and image tag extraction. See #2071
  • [BUGFIX] Core: Catch and log exceptions from the resources checks. See #2029
  • [BUGFIX] Core: Fix host tags sending when create_dd_check_tags is enabled. See #2088
  • [BUGFIX] Docker: Add one more cgroup location. See #2139 (Thanks @bakins)
  • [BUGFIX] Flare: Remove proxy credentials from collected datadog.conf. See #1942
  • [BUGFIX] Marathon: Fix disk typo in metric name. See #2126 (Thanks @pidah)
  • [BUGFIX] OS X: Fix memory metrics. See #2097
  • [BUGFIX] Postgres: Fix metrics not reporting with multiple relations. See #2111
  • [BUGFIX] Windows: Bundle default CA certs of requests. See #2098

5.6.2

19 Nov 21:19

5.6.2 / 11-16-2015

Details

5.6.1...5.6.2

Changes

  • [FEATURE] Docker/Kubernetes: Collect Kubernetes labels as tags. See #2075, #2082
  • [FEATURE] HTTPCheck: Option to support weak SSL/TLS ciphers (RSA, RC4, MD5). See #1975, #2048
  • [BUGFIX] Core: Improve detection of agent process from PID to avoid false positives. See #2005

5.6.1

09 Nov 16:41

5.6.1 / 11-09-2015

Details

5.6.0...5.6.1

Changes

  • [BUGFIX] Consul: Add the main tags to service checks. See #2015 (Thanks @mtougeron)
  • [BUGFIX] Docker: Remove spurious proc root container warnings. See #2055 (Thanks @oeuftete)
  • [BUGFIX] Flare: Restore missing JMXFetch information. See #2062
  • [BUGFIX] OpenStack: Fix false-critical on the network service check. See #2063
  • [BUGFIX] Windows: Restore missing JMXFetch service logs. See #1852, #2065
  • [OTHER] Upgrade pymongo dependency from 2.6.3 to 2.8 on Windows Datadog Agent 32-bit MSI Installer.
  • [OTHER] Allow supervisor.conf to select Supervisor user. See #2064

5.6.0

05 Nov 18:00

5.6.0 / 11-05-2015

Linux, Mac OS and Source Install only

Details

5.5.2...5.6.0

New integration(s)

  • Kubernetes
  • OpenStack

Updated integrations

  • ActiveMQ
  • Cassandra
  • Couchbase
  • Docker
  • Dogstream
  • HAProxy
  • HTTPCheck
  • JMXFetch
  • Memcached
  • MongoDB
  • Network
  • Nginx
  • Process
  • Riak
  • SNMP
  • SQL Server
  • Unix
  • Windows
  • Windows Event Viewer
  • WMI

Kubernetes check

The Kubernetes check retrieves metrics from cAdvisor running under Kubelet.

See #2031

OpenStack check

The OpenStack check is intended to run alongside individual hypervisors. It can be scoped to any set of projects living on that host via instance-level config.

At the hypervisor-level it collects:

At the project-level it collects:

Additionally, it sends service checks to register the UP/DOWN state of the networks discovered by the agent, the locally running hypervisor, and the outward-facing (public or internal) API services of Nova, Neutron and Keystone.

Authentication

While the check will run without issues as an admin user, it is recommended to create a read-only datadog user and configure the check with the corresponding user/password. Instructions on setting up the datadog user and role, as well as the changes required to the policy.json file, can be found here.

A note on compatibility

Authentication is performed via the password method, and requires Identity API v3.
Nova API v2 and v2.1 are supported, with minor additional configuration necessary for v2.

A big thank you to @mtougeron!

See #1864

New WMI module wrapper

Datadog Agent 5.6.0 ships a new built-in lightweight Python WMI module wrapper, built on top of pywin32 and win32com extensions.

Specifications

  • Built only on top of the pywin32 and win32com third-party extensions
  • Compatible with Raw* and Formatted Performance Data classes
    • Dynamically resolve properties' counter types
    • Hold the previous/current Raw samples to compute/format new values*
  • Fast and lightweight
    • Avoid queries overhead
    • Cache connections and qualifiers
    • Use wbemFlagForwardOnly flag to improve enumeration/memory performance

* Raw data formatting relies on the availability of the corresponding calculator.
Please refer to checks.lib.wmi.counter_type for more information.
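To illustrate why two raw samples are needed, here is a minimal sketch of one such calculator, following the documented formula for the PERF_COUNTER_COUNTER type: (N1 - N0) / ((D1 - D0) / F), where N is the raw value, D the timestamp in ticks and F the tick frequency. The function name and the dictionary keys are hypothetical and do not reflect the actual checks.lib.wmi.counter_type API.

```python
def perf_counter_counter(previous, current):
    """Format a PERF_COUNTER_COUNTER value from two raw samples.

    Each sample is a dict with hypothetical keys:
      value     -- raw counter value (N)
      timestamp -- time of the sample in ticks (D)
      frequency -- ticks per second (F)
    Returns a per-second rate, or None if no time has elapsed.
    """
    ticks = current["timestamp"] - previous["timestamp"]
    if ticks <= 0 or current["frequency"] <= 0:
        return None
    elapsed_seconds = ticks / float(current["frequency"])
    return (current["value"] - previous["value"]) / elapsed_seconds
```

A single raw sample carries no meaning on its own for this counter type, which is why the wrapper has to hold the previous sample between collections.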

Usage

The new WMI module wrapper is used by the following checks to improve performance:

  • System
  • WMI

Other checks relying on WMI collection will follow in future versions of Datadog Agent.

See #2011

Original discussion thread: #1952

Credits to @TheCloudlessSky (https://github.com/TheCloudlessSky)

[Warning] JMXFetch false-positive bean match & potential backward incompatibilities issues

JMXFetch was incorrectly matching some MBean attributes when the associated MBean had one of its parameters defined in an instance configuration.

The issue has been fixed. As a result, please note that metrics related to false-positive bean matches are no longer reported.

Potential affected checks: ActiveMQ, Cassandra, JMX, Solr, Tomcat.

For more information, please get in touch with support@datadoghq.com

See #81

Changes

  • [FEATURE] Cassandra: Support Cassandra > 2.2 metric name structure (CASSANDRA-4009). See #79, #2035
  • [FEATURE] Core: Add service check count to the output of Dogstatsd 'info' section. See #1799
  • [FEATURE] Docker: Add container names as tags for events. See #2026
  • [FEATURE] HAProxy: Collect the number of available/unavailable backends. See #1915 (Thanks @a20012251)
  • [FEATURE] JMXFetch: Option to add custom JARs to the classpath. See #1996
  • [FEATURE] JMXFetch: Support float and java.lang.Float attribute types as simple JMX attributes. See #76
  • [FEATURE] JMXFetch: Support Cassandra > 2.2 metric name structure (CASSANDRA-4009). See #79
  • [FEATURE] JMXFetch: Support custom JMX Service URL to connect to, on a per-instance basis. See #80
  • [FEATURE] Kubernetes: New check. See #1919, #2031, #2038, #2039
  • [FEATURE] Memcached: Collect listen_disabled_num timeout counter. See #1995 (Thanks @alaz)
  • [FEATURE] MongoDB: Collect TCMalloc memory allocator metrics. See #1979 (Thanks @benmccann)
  • [FEATURE] MongoDB: Report dbStats metrics for all databases. See #1855, #1961 (Thanks @asiebert)
  • [FEATURE] Network: Add UDP metrics from /proc/net/snmp in addition to the existing TCP metrics. See #1974, #1986 (Thanks @gphat)
  • [FEATURE] OpenStack: New check. See #1864, #2040
  • [FEATURE] Riak: Add custom tags to service checks' tags. See #1482, #1527, #1987
  • [FEATURE] SNMP: Option to set the OID batch size. See #1990
  • [FEATURE] Unix: Collect /proc/meminfo MemAvailable metric when available. See #1826, #1993 (Thanks @jraede)
  • [FEATURE] Windows Event Viewer: Option to tag events by event_id. See #2009
  • [IMPROVEMENT] Core: Deprecate 'use_dd' flag. See #1856, #1860 (Thanks @ssbarnea)
  • [IMPROVEMENT] Core: Fix hanging subprocess.Popen calls caused by buffer limits. See #1892
  • [IMPROVEMENT] Core: Remove the uses of list comprehensions as looping constructs. See #1939 (Thanks @jamesandariese)
  • [IMPROVEMENT] Core: Run Supervisor as dd-agent user. See #1348, #1620, #1895
  • [IMPROVEMENT] Core: Use user-defined NTP settings in 'info' command's status page. See #1985
  • [IMPROVEMENT] Dogstream: Add DEBUG logging to event collection. See #1910
  • [IMPROVEMENT] JMXFetch: Assign generic alias if not defined. See #78
  • [IMPROVEMENT] Network: Use ss instead of netstat on Linux systems. See #1156, #1859 (Thanks @tliakos)
  • [IMPROVEMENT] Nginx: Add logging on check exceptions. See #1813, #1914 (Thanks @clokep)
  • [IMPROVEMENT] Process: Improve sampling of system.processes.cpu.pct metric. See #1660, #1928
  • [IMPROVEMENT] Unix: Filter SunOS memory_cap kstats by module. See #1959 (Thanks @pfmooney)
  • [IMPROVEMENT] Windows: New WMI module wrapper to improve performance. See #1952, #2011 (Thanks @TheCloudlessSky)
  • [IMPROVEMENT] Windows: Switch to the built-in WMI core to improve system metric collection performance. See #1952, #2011 (Thanks @TheCloudlessSky)
  • [IMPROVEMENT] WMI: Switch to the built-in WMI core to improve the check performances. See [#1952](https://github.com...