Become a sponsor to Daniel Lemire
Who is Daniel?
Daniel Lemire is a Computer Science Professor at the University of Quebec (TELUQ). Daniel Lemire ranks within the top 2% of scientists worldwide according to Stanford University's 2023 ranking. He is part of the 0.0006% most followed programmers on GitHub. He is also editor of the journal Software: Practice and Experience.
He is @lemire on Twitter/X, and he blogs weekly on software performance at https://lemire.me/blog
He is focused on software performance: fast indexes, fast parsing and serialization, fast compression, fast random-number generation and so forth. He writes code in many languages (Java, Go, C++, C, Swift, Python, JavaScript).
Why sponsor Daniel?
Sponsorship supports his work on industrial-quality software packages and contributions, as well as his regular blog posts.
Some of his work
- With Geoff Langdale, John Keiser and others, he is the author of the fastest JSON library in the world: simdjson. It was the first library able to parse gigabytes of JSON per second. It is used by many important systems such as Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus and StarRocks.
- With Yagiz Nizipli and others, he is the author of the Ada URL parser, the URL parser of Node.js and Cloudflare Workers. We believe that it is the fastest WHATWG-compliant URL parser in the world.
- With Robert Clausecker, Wojciech Muła, John Keiser and others, he wrote the simdutf library, the fastest Unicode transcoding and base64 library in the world. It accelerates two of the major JavaScript runtime systems (Node.js and Bun).
- He was instrumental in designing the fastest number parsing algorithm in the world. With collaborators, he wrote the fast_float library, which is part of GCC. This number parsing approach is part of the Go, C# and Rust runtime libraries. It is also in WebKit, the engine of Safari, Apple's web browser, and it was adopted by Chromium, the engine behind Google Chrome and Microsoft Edge. For the first time, it made it possible to parse numbers at over a gigabyte per second.
- He designed the Roaring bitmap format as an efficient bitmap index format. The format has become a standard. It is used by Apache Lucene and derivative systems such as Solr and Elasticsearch, Apache Druid, etc. The YouTube SQL engine, Google Procella, uses Roaring bitmaps for indexing. With engineers such as Richard Startin, he was instrumental in many of its implementations such as RoaringBitmap (Java), roaring (Go) and CRoaring (C and C++).
- With Nathan Kurz and Leonid Boytsov, he helped design many accelerated integer compression techniques that surpassed the state of the art by a wide margin. His FastPFor research library became a reference.
- With Thomas Mueller Graf, he designed and implemented binary fuse filters, faster and smaller alternatives to Bloom filters. They are available in Go, C, Python and many other languages.
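To make the container idea behind Roaring bitmaps a little more concrete, here is a toy Python sketch. It assumes the layout described in the Roaring papers: the high 16 bits of a 32-bit value select a container, and each container stores the low 16 bits either as a sorted array (at most 4096 values) or as a 65,536-bit bitmap. The class `ToyRoaring` and its methods are hypothetical; this is not the API of RoaringBitmap, roaring or CRoaring, and it omits run containers and all set operations.

```python
from bisect import bisect_left

ARRAY_MAX = 4096  # an array container holds at most 4096 sorted 16-bit values


class ToyRoaring:
    """Toy set of 32-bit unsigned integers using Roaring-style containers (hypothetical)."""

    def __init__(self):
        # high 16 bits -> container: sorted list (array) or 8 KiB bytearray (bitmap)
        self.containers = {}

    def add(self, x):
        hi, lo = x >> 16, x & 0xFFFF
        c = self.containers.get(hi)
        if c is None:
            self.containers[hi] = [lo]  # new array container
        elif isinstance(c, list):
            i = bisect_left(c, lo)
            if i < len(c) and c[i] == lo:
                return  # already present
            c.insert(i, lo)
            if len(c) > ARRAY_MAX:
                # too many values for an array: convert to a 65,536-bit bitmap
                bm = bytearray(8192)
                for v in c:
                    bm[v >> 3] |= 1 << (v & 7)
                self.containers[hi] = bm
        else:
            c[lo >> 3] |= 1 << (lo & 7)  # set the bit in the bitmap container

    def __contains__(self, x):
        hi, lo = x >> 16, x & 0xFFFF
        c = self.containers.get(hi)
        if c is None:
            return False
        if isinstance(c, list):
            i = bisect_left(c, lo)
            return i < len(c) and c[i] == lo
        return bool((c[lo >> 3] >> (lo & 7)) & 1)

    def __len__(self):
        # arrays report their length; bitmaps need a population count
        total = 0
        for c in self.containers.values():
            if isinstance(c, list):
                total += len(c)
            else:
                total += sum(bin(b).count("1") for b in c)
        return total
```

For example, adding 0 through 4999 turns the first container into a bitmap once it exceeds 4096 values, while an isolated value such as `1 << 20` stays in a one-element array container. Choosing the container per chunk of 65,536 values is what lets this kind of structure stay compact for both sparse and dense data.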
Selected Publications
- Robert Clausecker, Daniel Lemire, Transcoding Unicode Characters with AVX-512 Instructions, Software: Practice and Experience 53 (12), 2023.
- John Keiser, Daniel Lemire, On-Demand JSON: A Better Way to Parse Documents?, Software: Practice and Experience (to appear).
- Yagiz Nizipli, Daniel Lemire, Parsing Millions of URLs per Second, Software: Practice and Experience (to appear).
- Thomas Mueller Graf, Daniel Lemire, Binary Fuse Filters: Fast and Smaller Than Xor Filters, Journal of Experimental Algorithmics 27, 2022.
- Geoff Langdale, Daniel Lemire, Parsing Gigabytes of JSON per Second, VLDB Journal 28 (6), 2019.
- Daniel Lemire, Fast Random Integer Generation in an Interval, ACM Transactions on Modeling and Computer Simulation 29 (1), 2019.
- Samy Chambi, Daniel Lemire, Owen Kaser, Robert Godin, Better Bitmap Performance with Roaring Bitmaps, Software: Practice and Experience 46 (5), 709–719, 2016.
- Daniel Lemire, Leonid Boytsov, Decoding Billions of Integers per Second through Vectorization, Software: Practice and Experience 45 (1), 1–29, 2015.
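The problem studied in Fast Random Integer Generation in an Interval is drawing an unbiased integer in [0, s) from a source of uniform random words. Below is a minimal Python sketch of the multiply-and-reject idea as we understand it from the paper: multiply a random word by s, return the high bits, and compute a remainder (the rejection threshold) only in the rare case where the low bits fall below s. The function name `bounded_random` and its `next_word` and `bits` parameters are our own illustration, not a library API; production implementations work on fixed-width machine integers rather than Python integers.

```python
import random
from typing import Callable


def bounded_random(s: int, next_word: Callable[[], int], bits: int = 32) -> int:
    """Return an unbiased integer in [0, s), given next_word() yielding uniform `bits`-bit words."""
    if not 0 < s <= (1 << bits):
        raise ValueError("s must be in [1, 2**bits]")
    mask = (1 << bits) - 1
    m = next_word() * s      # full 2*bits-wide product
    low = m & mask           # the low bits decide whether to accept
    if low < s:              # rare slow path: only now pay for a remainder
        threshold = ((1 << bits) - s) % s  # equals 2**bits mod s
        while low < threshold:             # reject the biased products and redraw
            m = next_word() * s
            low = m & mask
    return m >> bits         # the high bits are the result


if __name__ == "__main__":
    rng = random.Random(42)
    print([bounded_random(6, lambda: rng.getrandbits(32)) for _ in range(10)])
```

The design point is that the common path costs one multiplication and one comparison; the division (the `%`) happens only when `low < s`, which occurs with probability s / 2**bits.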
Featured work
- bits-and-blooms/bitset: Go package implementing bitsets (Go, 1,377 stars)
- RoaringBitmap/RoaringBitmap: A better compressed bitset in Java, used by Apache Spark, Netflix Atlas, Apache Pinot, Tablesaw, and many others (Java, 3,596 stars)
- RoaringBitmap/roaring: Roaring bitmaps in Go (golang), used by InfluxDB, Bleve, DataDog (Go, 2,594 stars)
- simdjson/simdjson: Parsing gigabytes of JSON per second, used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks (C++, 19,590 stars)
- fastfloat/fast_float: Fast and exact implementation of the C++ from_chars functions for number types: 4x to 10x faster than strtod, part of GCC 12, Chromium, Redis and WebKit/Safari (C++, 1,679 stars)
- simdutf/simdutf: Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension, LoongArch64. Part of Node.js, WebKit/Safari, Ladybird, … (C++, 1,261 stars)