add cli and getting started docs (#104)
* add getting started docs
* add cli docs
* add sidebar
1 parent 1244ed1, commit 8f7115a
Showing 5 changed files with 382 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,162 @@ | ||
--- | ||
title: Command line interface | ||
description: How to use adaparse CLI | ||
--- | ||
|
||
The `adaparse` command-line tool takes URL strings (ASCII/UTF-8) and validates, normalizes, and queries them efficiently.
|
||
## Options | ||
|
||
- `-d`, `--diagram`: Print a diagram of the result | ||
- `-u`, `--url`: URL Parameter (required) | ||
- `-h`, `--help`: Print usage | ||
- `-g`, `--get`: Get a specific part of the URL (e.g., 'origin', 'host', etc., as shown in the examples below)
- `-b`, `--benchmark`: Run benchmark for piped file functions | ||
- `-p`, `--path`: Process all the URLs in a given file | ||
- `-o`, `--output`: Output the results of the parsing to a file | ||
|
||
## Usage/Examples | ||
|
||
### Well-formatted URL | ||
|
||
```bash | ||
adaparse "http://www.google.com" | ||
``` | ||
|
||
**Output:** | ||
|
||
```text | ||
http://www.google.com | ||
``` | ||
|
||
### Diagram | ||
|
||
```bash | ||
adaparse -d http://www.google.com/bal\?a\=\=11\#fddfds | ||
``` | ||
|
||
**Output:** | ||
|
||
```text | ||
http://www.google.com/bal?a==11#fddfds [38 bytes]
     | |             |   |     |
     | |             |   |     `------ hash_start
     | |             |   `------------ search_start 25
     | |             `---------------- pathname_start 21
     | |             `---------------- host_end 21
     | `------------------------------ host_start 7
     | `------------------------------ username_end 7
     `-------------------------------- protocol_end 5
``` | ||
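To make the offsets concrete, here is a minimal sketch that slices the example URL at the positions printed in the diagram. It does not call `adaparse`; the `slice` helper is hypothetical, and the value 31 for `hash_start` is an assumption derived from the position of `#` in the string, since the diagram does not print it.

```bash
#!/bin/sh
# Slice the example URL at the byte offsets reported by the diagram.
url='http://www.google.com/bal?a==11#fddfds'

# slice START END: print the bytes in the half-open range [START, END), 0-based.
# cut -c is 1-based and inclusive, hence START + 1.
slice() {
  printf '%s\n' "$url" | cut -c "$(( $1 + 1 ))-$2"
}

protocol=$(slice 0 5)     # start of string .. protocol_end
host=$(slice 7 21)        # host_start .. host_end
pathname=$(slice 21 25)   # pathname_start .. search_start
search=$(slice 25 31)     # search_start .. hash_start (31 is assumed, see above)
hash=$(slice 31 38)       # hash_start .. end of the 38-byte string

printf '%s\n' "$protocol" "$host" "$pathname" "$search" "$hash"
```

Each offset marks where a component begins or ends, so adjacent offsets delimit one component of the URL.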
|
||
### Pipe Operator | ||
|
||
Ada can process URLs from piped input, making it easy to integrate with other command-line tools
that produce ASCII or UTF-8 output. Given a list of URLs, one per line, you can query the
normalized URL string (`href`) and detect any malformed URLs:
|
||
```bash | ||
cat dragonball_url.txt | adaparse --get href | ||
``` | ||
|
||
**Output:** | ||
|
||
```text | ||
http://www.goku.com | ||
http://www.vegeta.com | ||
http://www.gohan.com | ||
``` | ||
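As a minimal sketch of preparing such an input, the script below writes a small list file and pipes it into `adaparse`. The file contents are an assumption made for illustration (they mirror the output shown above), and the call is skipped when `adaparse` is not on the `PATH`.

```bash
#!/bin/sh
# Hypothetical input file: one URL per line (contents assumed for illustration).
cat > dragonball_url.txt <<'EOF'
http://www.goku.com
http://www.vegeta.com
http://www.gohan.com
EOF

# Query the normalized href of every URL; skip if adaparse is not installed.
if command -v adaparse >/dev/null 2>&1; then
  cat dragonball_url.txt | adaparse --get href
else
  echo "adaparse not found; skipping" >&2
fi
```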
|
||
You can pass an argument that is applied to every URL in the file, so
that you can query the hash, the host, the protocol, the port,
the origin, the search, the password, the username, the pathname,
or the hostname:
|
||
```bash | ||
cat dragonball_url.txt | adaparse -g host | ||
``` | ||
|
||
**Output:** | ||
|
||
```text | ||
www.goku.com | ||
www.vegeta.com | ||
www.gohan.com | ||
``` | ||
|
||
If you omit `-g`, the tool only prints the invalid URLs. This is
useful when you want to quickly validate a list of URLs.
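One way to turn that report into a quick check is to count the reported lines. The sketch below assumes invalid entries arrive as lines prefixed with `Invalid URL: `, as in the benchmark output in this document, and that they are on the stream being read; the `count_invalid` helper and the sample text are hypothetical.

```bash
#!/bin/sh
# count_invalid: read tool output on stdin and print the number of lines
# that report an invalid URL. grep -c exits non-zero on zero matches,
# so "|| true" keeps the helper's exit status at 0.
count_invalid() {
  grep -c '^Invalid URL: ' || true
}

# Sample text standing in for real tool output (assumed format).
sample='Invalid URL: 1:1_scale
Invalid URL: Q4%3A57
read 5209265 bytes in 32819917 ns using 100000 lines, used 160 loads'

invalid=$(printf '%s\n' "$sample" | count_invalid)
echo "$invalid invalid URL(s)"
```

With a real run, the sample would be replaced by the output of a pipeline such as `cat urls.txt | adaparse`, where `urls.txt` is a placeholder name.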
|
||
### Benchmark Runner | ||
|
||
The benchmark flag can be used to output the time it takes to process piped input: | ||
|
||
```bash | ||
cat wikipedia_100k.txt | adaparse -b | ||
``` | ||
|
||
**Output:** | ||
|
||
```text | ||
Invalid URL: 1968:_Die_Kinder_der_Diktatur | ||
Invalid URL: 58957:_The_Bluegrass_Guitar_Collection | ||
Invalid URL: 650luc:_Gangsta_Grillz | ||
Invalid URL: Q4%3A57 | ||
Invalid URL: Q10%3A47 | ||
Invalid URL: Q5%3A45 | ||
Invalid URL: Q40%3A28 | ||
Invalid URL: 1:1_scale | ||
Invalid URL: 1893:_A_World's_Fair_Mystery | ||
Invalid URL: 12:51_(Krissy_%26_Ericka_song) | ||
Invalid URL: 111:_A_Nelson_Number | ||
Invalid URL: 7:00AM-8%3A00AM_(24_season_5) | ||
Invalid URL: Q53%3A31 | ||
read 5209265 bytes in 32819917 ns using 100000 lines, used 160 loads | ||
0.1587226744053009 GB/s | ||
``` | ||
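The throughput figure follows directly from the summary line: bytes divided by nanoseconds equals GB/s when 1 GB is taken as 10^9 bytes. The sketch below recomputes it with `awk` from the line shown above; the field positions are an assumption based on that single sample line.

```bash
#!/bin/sh
# Summary line as printed by the benchmark run shown above.
summary='read 5209265 bytes in 32819917 ns using 100000 lines, used 160 loads'

# Fields: $2 = bytes, $5 = nanoseconds, $8 = number of lines.
# bytes / ns is numerically equal to GB/s (with 1 GB = 10^9 bytes).
gbps=$(printf '%s\n' "$summary" | LC_ALL=C awk '{ printf "%.4f", $2 / $5 }')

# Average time spent per input line, in nanoseconds.
ns_per_url=$(printf '%s\n' "$summary" | LC_ALL=C awk '{ printf "%.0f", $5 / $8 }')

echo "$gbps GB/s, about $ns_per_url ns per URL"
```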
|
||
### Saving result to file system | ||
|
||
There is an option to output to a file on disk: | ||
|
||
```bash | ||
cat wikipedia_100k.txt | adaparse -o wiki_output.txt | ||
``` | ||
|
||
You can also read input directly from a file on disk, without going through `cat`:
|
||
```bash | ||
adaparse -p wikipedia_top_100.txt
``` | ||
|
||
#### Advanced Usage | ||
|
||
You may also combine different flags. For example, to extract only the host from the URLs stored in `wikipedia_top100.txt`, write the results to `test_write.txt`, and report benchmark timings:
|
||
```bash | ||
adaparse -p wikipedia_top100.txt -o test_write.txt -g host -b | ||
``` | ||
|
||
**Output:** | ||
|
||
```text | ||
read 5209265 bytes in 26737131 ns using 100000 lines, total_bytes is 5209265 used 160 loads | ||
0.19483260937757307 GB/s(base) | ||
``` | ||
|
||
Content of test_write.txt: | ||
|
||
```text | ||
(---snip---) | ||
en.wikipedia.org | ||
en.wikipedia.org | ||
en.wikipedia.org | ||
en.wikipedia.org | ||
en.wikipedia.org | ||
en.wikipedia.org | ||
en.wikipedia.org | ||
en.wikipedia.org | ||
en.wikipedia.org | ||
en.wikipedia.org | ||
(---snip---) | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
--- | ||
title: CLI Performance | ||
description: How is the performance of Ada and adaparse CLI? | ||
--- | ||
|
||
Our `adaparse` tool may outperform other popular alternatives. We offer a [collection of | ||
sets of URLs](https://github.com/ada-url/url-various-datasets) for benchmarking purposes. | ||
The following results are on a MacBook Air 2022 (M2 processor) using LLVM 14. We | ||
compare against [trurl](https://github.com/curl/trurl) version 0.6 (libcurl/7.87.0). | ||
|
||
### Benchmarks | ||
|
||
<details> | ||
<summary> | ||
Using the **wikipedia_100k dataset**, adaparse parses URLs **three times faster than trurl**.
</summary> | ||
|
||
```bash | ||
time cat url-various-datasets/wikipedia/wikipedia_100k.txt| trurl --url-file - &> /dev/null 1 | ||
cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,01s system 3% cpu 0,179 total | ||
trurl --url-file - &> /dev/null 0,14s user 0,03s system 98% cpu 0,180 total | ||
|
||
|
||
time cat url-various-datasets/wikipedia/wikipedia_100k.txt| ./build/tools/cli/adaparse -g href &> /dev/null | ||
cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,00s system 10% cpu 0,056 total | ||
./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 93% cpu 0,055 total | ||
``` | ||
</details> | ||
<details> | ||
<summary> | ||
Using the **top100 dataset**, adaparse is **nearly twice as fast as trurl**.
</summary> | ||
|
||
```bash | ||
time cat url-various-datasets/top100/top100.txt| trurl --url-file - &> /dev/null 1 | ||
cat url-various-datasets/top100/top100.txt 0,00s user 0,00s system 4% cpu 0,115 total | ||
trurl --url-file - &> /dev/null 0,09s user 0,02s system 97% cpu 0,113 total | ||
|
||
time cat url-various-datasets/top100/top100.txt| ./build/tools/cli/adaparse -g href &> /dev/null | ||
cat url-various-datasets/top100/top100.txt 0,00s user 0,01s system 11% cpu 0,062 total | ||
./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 94% cpu 0,061 total | ||
``` | ||
</details> | ||
|
||
### Results | ||
|
||
The results will vary depending on your system. We invite you to run your own benchmarks. | ||
|
||
#### Parsing 100,000 Wikipedia URLs | ||
|
||
```bash | ||
ada ▏ 55 ms ███████▋ | ||
trurl ▏ 180 ms █████████████████████████ | ||
``` | ||
|
||
#### Parsing 100,000 URLs from TOP 100 websites | ||
|
||
```bash | ||
ada ▏ 61 ms █████████████▍ | ||
trurl ▏ 113 ms █████████████████████████ | ||
``` |
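The speedup factors quoted in the benchmark summaries can be recomputed from the timings in the two charts. This is a small sketch of that arithmetic; the `speedup` helper is hypothetical and only divides the baseline time by the candidate time.

```bash
#!/bin/sh
# speedup BASELINE_MS CANDIDATE_MS: print how many times faster the
# candidate is than the baseline, rounded to two decimals.
speedup() {
  LC_ALL=C awk -v base="$1" -v cand="$2" 'BEGIN { printf "%.2f", base / cand }'
}

wikipedia=$(speedup 180 55)   # trurl vs ada on wikipedia_100k
top100=$(speedup 113 61)      # trurl vs ada on top100

echo "wikipedia_100k: ${wikipedia}x, top100: ${top100}x"
```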
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,99 @@ | ||
--- | ||
title: Installation | ||
description: How to install Ada and add as a dependency | ||
--- | ||
import { Steps } from '@astrojs/starlight/components'; | ||
|
||
:::note | ||
Ada is available in several programming languages, including **C++**, **Python**, **Go**, **Rust**, and **LuaJIT**.
::: | ||
|
||
## Homebrew | ||
|
||
Ada is available through [Homebrew](https://formulae.brew.sh/formula/ada-url#default). | ||
Homebrew is kept up to date with the recent releases of Ada. | ||
|
||
Run the following command to install Ada on your macOS computer:
|
||
```bash | ||
brew install ada-url | ||
``` | ||
|
||
## Build from source | ||
|
||
### Requirements | ||
|
||
The project is self-contained and has no dependencies.
You need a recent C++ compiler supporting C++17. We test with GCC 9 or later, LLVM 10 or later, and Microsoft Visual Studio 2022.
|
||
### Compiling | ||
|
||
:::note | ||
Ada uses **cmake** as its build system. It's recommended to have cmake available on your system.
::: | ||
|
||
Run the following commands to compile and build Ada locally. | ||
|
||
<Steps> | ||
|
||
1. **Prepare build** | ||
|
||
The following command prepares the build. | ||
|
||
```bash | ||
cmake -B build | ||
``` | ||
|
||
2. **Build** | ||
|
||
Run the following command to build Ada on your system. | ||
|
||
```bash | ||
cmake --build build | ||
``` | ||
|
||
3. **Run tests** | ||
|
||
Run the following `ctest` command to validate your build. | ||
|
||
```bash | ||
ctest --output-on-failure --test-dir build | ||
``` | ||
</Steps> | ||
|
||
### Windows | ||
|
||
Windows users need additional flags to specify the build configuration, e.g. `--config Release`. | ||
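As a rough sketch of what that looks like in practice, the helper below prints the three commands from the steps above with a build configuration added. The `build_commands` function is hypothetical, and passing `-C` to `ctest` is an assumption based on CMake's usual behavior with multi-configuration generators rather than something stated in this guide. The commands are printed instead of executed so they can be reviewed first.

```bash
#!/bin/sh
# build_commands CONFIG: print the configure, build, and test commands for a
# multi-configuration generator such as Visual Studio (default: Release).
build_commands() {
  config=${1:-Release}
  printf '%s\n' \
    "cmake -B build" \
    "cmake --build build --config $config" \
    "ctest --output-on-failure --test-dir build -C $config"
}

build_commands Release
```

To run the commands rather than print them, the output can be piped to a shell, for example `build_commands Release | sh`.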
|
||
### Docker | ||
|
||
The project can also be built with Docker, using the default Dockerfile in the repository, with the following commands.
|
||
<Steps> | ||
|
||
1. **Build** | ||
|
||
Build the Docker image.
|
||
```bash | ||
docker build -t ada-builder .
``` | ||
|
||
2. **Run** | ||
|
||
Run the tests | ||
|
||
```bash | ||
docker run --rm -it -v ${PWD}:/repo ada-builder | ||
``` | ||
|
||
</Steps> | ||
|
||
### Amalgamation | ||
|
||
You may amalgamate all source files into only two files (`ada.h` and `ada.cpp`) by executing the Python 3
script `singleheader/amalgamate.py`. By default, the files are created in the `singleheader` directory.
|
||
```bash | ||
./singleheader/amalgamate.py | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
--- | ||
title: Introduction | ||
description: WHATWG specification compliant URL parser | ||
--- | ||
|
||
Ada is a fast and spec-compliant URL parser written in C++. | ||
|
||
* It's extensively tested with both Web Platform Tests and Google OSS-Fuzz.
* It is **extremely fast**. | ||
* It's the default URL parser of Node.js since Node 18.16.0. | ||
* It supports Unicode Technical Standard. | ||
|
||
The Ada library passes the full range of tests from the specification, across a wide range of platforms (e.g., Windows, Linux, macOS). | ||
|
||
## FAQ | ||
|
||
<details> | ||
<summary>What is WHATWG?</summary> | ||
|
||
The term WHATWG stands for **Web Hypertext Application Technology Working Group**. | ||
|
||
It is a community-driven organization that focuses on developing and maintaining web standards. | ||
|
||
The WHATWG was initially formed in response to the divergence between the World Wide Web Consortium (W3C) and the browser vendors at the time, who felt that the W3C process was too slow to address the evolving needs of web developers. | ||
</details> | ||
<details> | ||
<summary>Who uses Ada? Is it battle-tested?</summary>
|
||
Ada has been adopted by Node.js and has been used by millions of developers since Node.js 18.16.0.
</details> | ||
<details> | ||
<summary>Can I use this in my project?</summary> | ||
|
||
Yes. Ada is free to use for personal and commercial projects. It is available under the MIT license and the Apache License 2.0.
</details> | ||
|
||
## License | ||
|
||
This code is made available under the Apache License 2.0 as well as the MIT license. | ||
|
||
Our tests include third-party code and data. The benchmarking code includes third-party code: it is provided for research purposes only and is not part of the library.