Merge pull request #21 from apainintheneck/add-search-index
Add search index
apainintheneck authored Dec 15, 2023
2 parents c9bf52d + dcd2a04 commit 3e2b1c2
Showing 32 changed files with 792 additions and 263 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/ci.yml
@@ -18,6 +18,9 @@ jobs:

- name: Outdated readme check
run: bundle exec rake readme:outdated

- name: Outdated cache check
run: bundle exec rake cache:outdated
tests:
needs: docs
strategy:
11 changes: 11 additions & 0 deletions .rubocop.yml
@@ -58,10 +58,21 @@ Minitest/MultipleAssertions:
Style/Documentation:
Enabled: false

Style/SingleArgumentDig:
Enabled: false

Style/StringLiterals:
Enabled: true
EnforcedStyle: double_quotes

Style/StringLiteralsInInterpolation:
Enabled: true
EnforcedStyle: double_quotes

Style/TrailingCommaInArrayLiteral:
Enabled: true
EnforcedStyleForMultiline: comma

Style/TrailingCommaInHashLiteral:
Enabled: true
EnforcedStyleForMultiline: comma
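For reference, here is a small Ruby snippet written to satisfy the style settings above, based on RuboCop's documented behaviour for these cops (the `comma` style requires a trailing comma only when each item of a multiline literal sits on its own line). The snippet is illustrative and not taken from the repository; the trailing comma it shows is the same one added to `atlasq.gemspec` in this commit.

```ruby
# Single-line literal: no trailing comma.
codes = ["ad", "ae"]

# Multiline literals with one item per line: trailing comma after the last item
# (Style/TrailingCommaInArrayLiteral / Style/TrailingCommaInHashLiteral,
# EnforcedStyleForMultiline: comma).
regions = [
  "melanesia",
  "micronesia",
]

metadata = {
  "homepage_uri" => "https://github.com/apainintheneck/atlasq/",
  "rubygems_mfa_required" => "true",
}

# Double quotes for string literals, including the ones inside an interpolation
# (Style/StringLiterals and Style/StringLiteralsInInterpolation).
label = "Codes: #{codes.join(", ")}"
puts label
```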
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,6 +1,7 @@
## [Unreleased]

- [PR 15](https://github.com/apainintheneck/atlasq/pull/15) Add partial matching support for countries and currencies
- [PR 21](https://github.com/apainintheneck/atlasq/pull/21) Add search index to speed up partial match searches by country name

## [0.1.0] - 2023-11-19

15 changes: 9 additions & 6 deletions README.md
@@ -147,8 +147,8 @@ $ atlasq --country honduras
```console
$ atlasq --region melanesia
*
* Subregion: Melanesia
* * * * * * * * * * * * *
* Region: Melanesia
* * * * * * * * * * *
(🇫🇯 | 242 | FJ | FJI | Fiji)
(🇳🇨 | 540 | NC | NCL | New Caledonia)
(🇵🇬 | 598 | PG | PNG | Papua New Guinea)
@@ -160,8 +160,8 @@ $ atlasq --region melanesia
```console
$ atlasq --region antarctica
*
* Continent: Antarctica
* * * * * * * * * * * * *
* Region: Antarctica
* * * * * * * * * * * *
(🇦🇶 | 010 | AQ | ATA | Antarctica)
(🇧🇻 | 074 | BV | BVT | Bouvet Island)
(🇬🇸 | 239 | GS | SGS | South Georgia and the South Sandwich Islands)
@@ -184,9 +184,10 @@ $ atlasq --money ANG
```console
$ atlasq --money \฿
*
* Currency: [THB] ฿ Thai Baht
* Currencies (Partial Match)
* * * * * * * * * * * * * * * *
(🇹🇭 | 764 | TH | THA | Thailand)
- [THB] ฿ Thai Baht
(🇹🇭 | 764 | TH | THA | Thailand)

```

@@ -217,6 +218,8 @@ To install this gem onto your local machine, run `bundle exec rake install`.

This file gets generated with the `rake readme:generate` command to make sure the example output is always up-to-date. We even check for this on CI with the `rake readme:outdated` command.

More information about cached files can be found in `cache/README.md`.

## Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/apainintheneck/atlasq.
16 changes: 14 additions & 2 deletions Rakefile
@@ -26,16 +26,28 @@ namespace "readme" do
desc "Check if the readme needs to be regenerated"
task :outdated do
Tempfile.open("readme") do |file|
sh "bin/generate_readme > #{file.path}"
sh "bundle exec ruby script/generate_readme.rb > #{file.path}"
sh "diff -q README.md #{file.path}"
end
end

desc "Regenerate the readme"
task :generate do
Tempfile.open("readme") do |file|
sh "bin/generate_readme > #{file.path}"
sh "bundle exec ruby script/generate_readme.rb > #{file.path}"
mv file.path, "README.md"
end
end
end

namespace "cache" do
desc "Check if the cache needs to be regenerated"
task :outdated do
sh "bundle exec ruby script/generate_search_index.rb outdated"
end

desc "Regenerate the cache"
task :generate do
sh "bundle exec ruby script/generate_search_index.rb generate"
end
end
4 changes: 2 additions & 2 deletions atlasq.gemspec
@@ -19,10 +19,10 @@ Gem::Specification.new do |spec|
spec.metadata = {
"homepage_uri" => "https://github.com/apainintheneck/atlasq/",
"changelog_uri" => "https://github.com/apainintheneck/atlasq/blob/main/CHANGELOG.md",
"rubygems_mfa_required" => "true"
"rubygems_mfa_required" => "true",
}

spec.files = Dir["{lib,exe}/**/*"]
spec.files = Dir["{lib,exe}/**/*", "{cache}/**/*.json"]
spec.bindir = "exe"
spec.executables = ["atlasq"]

69 changes: 69 additions & 0 deletions cache/README.md
@@ -0,0 +1,69 @@
# Cache Basics

This is essentially a bunch of static files (mostly JSON) that are pre-computed to speed up the program.

You can check if the cache is outdated with the `rake cache:outdated` command and re-generate all files with the `rake cache:generate` command. To make sure it's always up-to-date, we even check for this on CI.

There are a few scripts to generate the cache and they can all be found in the `script/` directory.

Note: Sample data can be found in the `README.md` file in each subdirectory.

## Scripts

### script/generate_search_index.rb

As the name suggests, this generates a bunch of search indexes. Each one is a flat JSON object keyed by search strings whose values are either a single code (the `direct_match_*` files) or an array of codes (the `partial_match_*` and `countries_by_*` files), with no deeper nesting. These are used primarily to speed up partial matches, though pre-computing things also means we don't have to pull in the internationalization libraries for 90+ languages, which makes a difference too.
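The two core index shapes can be sketched roughly like this. It is a hypothetical illustration, not code from `script/generate_search_index.rb`: `COUNTRY_NAMES`, `build_direct_index`, and `build_partial_index` are made-up names, and the real script also indexes country codes (e.g. `"and"`, `"020"`) and names in many languages from its own data sources.

```ruby
require "json"

# Hypothetical input: country code => names to index.
COUNTRY_NAMES = {
  "ad" => ["Andorra", "The Principality of Andorra"],
  "ae" => ["United Arab Emirates", "The United Arab Emirates"],
}.freeze

# Direct match: each lowercased full name maps straight to one country code.
def build_direct_index(names_by_code)
  names_by_code.each_with_object({}) do |(code, names), index|
    names.each { |name| index[name.downcase] = code }
  end
end

# Partial match: each lowercased word maps to the sorted codes whose names contain it.
def build_partial_index(names_by_code)
  index = Hash.new { |hash, key| hash[key] = [] }
  names_by_code.each do |code, names|
    names.flat_map { |name| name.downcase.split }.uniq.each do |word|
      index[word] << code
    end
  end
  index.transform_values(&:sort)
end

direct = build_direct_index(COUNTRY_NAMES)
partial = build_partial_index(COUNTRY_NAMES)
puts JSON.generate(direct)
puts JSON.generate(partial)
```

The partial index trades a larger file for constant-time word lookups at query time, which matches the motivation given above.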

## Reading

Reading from the cached files is quite easy. Just use the `Atlasq::Cache` module to load a file by its namespace and file name. Each file gets lazy-loaded the first time it's referenced and is then memoized. The module is defined in `lib/atlasq/cache.rb`.

```rb
string = Atlasq::Cache.get("space/text_file.txt")
```
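The lazy-load-then-memoize behaviour described above can be sketched as follows. `CacheSketch` is a hypothetical stand-in, not the real module in `lib/atlasq/cache.rb`, and parsing `.json` files on read is an assumption made for illustration.

```ruby
require "fileutils"
require "json"
require "tmpdir"

# Hypothetical stand-in for the cache reader module.
module CacheSketch
  @root = "cache"
  @loaded = {}

  class << self
    attr_accessor :root

    # Reads a file the first time it's requested and memoizes the result.
    # Assumption: .json files are parsed, anything else is returned as a String.
    def get(path)
      @loaded.fetch(path) do
        contents = File.read(File.join(root, path))
        @loaded[path] = path.end_with?(".json") ? JSON.parse(contents) : contents
      end
    end
  end
end

# Usage: point the sketch at a temporary directory holding one JSON file.
dir = Dir.mktmpdir
CacheSketch.root = dir
json_path = File.join(dir, "space", "json_file.json")
FileUtils.mkdir_p(File.dirname(json_path))
File.write(json_path, '["data","data","data"]')

first = CacheSketch.get("space/json_file.json")  # reads and parses the file
File.delete(json_path)
second = CacheSketch.get("space/json_file.json") # served from memory, no disk access
FileUtils.remove_entry(dir)
```

Deleting the file between the two calls shows that the second `get` never touches the disk.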

## Helpers

### script/shared/cache_generator.rb

This class provides a simple wrapper around the whole cache generation process. The namespace specifies which subdirectory of `cache/` holds the generated files.

```rb
cache = CacheGenerator.new(namespace: "space")
```

Add a new cached file with `#add`; the block's return value becomes the file's contents.

```rb
cache.add "cache_file" do
# build and return the file contents here
end
```

Text files (.txt) are created when the block returns a string. The following code adds a text file to the cache at `cache/space/text_file.txt`.

```rb
cache.add "text_file" do
"text"
end
```

JSON files (.json) are created when the block returns anything other than a string. The following code adds a JSON file to the cache at `cache/space/json_file.json`.

```rb
cache.add "json_file" do
%w[data data data]
end
```
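A minimal sketch of how `#add` and `#generate` might fit together, assuming the block is simply stored and called at generation time. `CacheGeneratorSketch`, its `root:` keyword, and `#render` are hypothetical and not the real `script/shared/cache_generator.rb`.

```ruby
require "fileutils"
require "json"
require "tmpdir"

# Hypothetical sketch of the behaviour described above.
class CacheGeneratorSketch
  def initialize(namespace:, root: "cache")
    @directory = File.join(root, namespace)
    @builders = {}
  end

  # Registers a block; its return value becomes the file's contents.
  def add(name, &block)
    @builders[name] = block
  end

  # Renders every file in memory: Strings become .txt, anything else .json.
  def render
    @builders.to_h do |name, builder|
      value = builder.call
      if value.is_a?(String)
        ["#{name}.txt", value]
      else
        ["#{name}.json", JSON.generate(value)]
      end
    end
  end

  # Writes every rendered file into <root>/<namespace>/.
  def generate
    FileUtils.mkdir_p(@directory)
    render.each do |file_name, contents|
      File.write(File.join(@directory, file_name), contents)
    end
  end
end

root = Dir.mktmpdir
cache = CacheGeneratorSketch.new(namespace: "space", root: root)
cache.add("text_file") { "text" }
cache.add("json_file") { %w[data data data] }
cache.generate

text = File.read(File.join(root, "space", "text_file.txt"))
json = File.read(File.join(root, "space", "json_file.json"))
FileUtils.remove_entry(root)
```

Keeping rendering separate from writing lets an outdated check reuse `#render` without touching the files on disk.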

Use the `#generate` method to re-generate all cache files in the namespace.

```rb
cache.generate
```

Use the `#outdated` method to check whether the cache files have changed since the last time they were generated. If any files are outdated, it prints their names and exits with a non-zero exit code. It generates the files internally and uses `diff` to determine which ones are outdated.

```rb
cache.outdated
```
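The outdated check can be approximated as below. This is a hypothetical sketch: `outdated_files` and `check_outdated!` are illustrative names, and unlike the description above it compares strings in Ruby instead of shelling out to `diff`.

```ruby
require "fileutils"
require "tmpdir"

# Returns the names of files that are missing on disk or whose contents
# differ from the freshly generated contents in `expected`.
def outdated_files(directory, expected)
  stale = expected.reject do |file_name, contents|
    path = File.join(directory, file_name)
    File.exist?(path) && File.read(path) == contents
  end
  stale.keys
end

# Prints the stale file names and exits non-zero, mirroring the CI behaviour.
def check_outdated!(directory, expected)
  stale = outdated_files(directory, expected)
  return if stale.empty?

  stale.each { |name| warn "Outdated: #{name}" }
  exit 1
end

dir = Dir.mktmpdir
File.write(File.join(dir, "text_file.txt"), "old text")
expected = {
  "text_file.txt" => "text",
  "json_file.json" => '["data","data","data"]',
}
stale = outdated_files(dir, expected) # one file differs, the other is missing

expected.each { |name, contents| File.write(File.join(dir, name), contents) }
fresh = outdated_files(dir, expected) # nothing left to regenerate
FileUtils.remove_entry(dir)
```

Only `check_outdated!` exits, so the comparison itself stays easy to test.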
177 changes: 177 additions & 0 deletions cache/search_index/README.md
@@ -0,0 +1,177 @@
# Cache: search_index

---

## Item: search_index/direct_match_country.json

### Content Sample
Sample of the first 20 pretty printed lines of the file.

```
{
"ad": "ad",
"and": "ad",
"020": "ad",
"andorra": "ad",
"the principality of andorra": "ad",
"andorre": "ad",
"アントラ": "ad",
"ጐን፦ሲ": "ad",
"اندورا": "ad",
"এণ্ডোরা": "ad",
"андора": "ad",
"অ্যান্ডোরা": "ad",
"andora": "ad",
"ཨེན་ཌོ་ར།": "ad",
"ανδορρα": "ad",
"andoro": "ad",
"એન્ડોરા": "ad",
"אנדורה": "ad",
"अण्डोरा": "ad",
...
```
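Looking up a query in this index boils down to a normalized hash lookup. The entries below are copied from the sample; `normalize` and `direct_match_country` are hypothetical names, and the normalization (trim, lowercase, collapse spaces) is an assumption, since the real search may normalize queries differently.

```ruby
# Excerpt copied from the sample above; the real file has many more entries.
DIRECT_MATCH_COUNTRY = {
  "ad" => "ad",
  "and" => "ad",
  "020" => "ad",
  "andorra" => "ad",
  "the principality of andorra" => "ad",
}.freeze

# Assumed normalization: trim, lowercase, and collapse repeated spaces.
def normalize(query)
  query.strip.downcase.squeeze(" ")
end

# Returns the matching country code, or nil when there's no exact match.
def direct_match_country(query)
  DIRECT_MATCH_COUNTRY[normalize(query)]
end
```

For example, `direct_match_country("  The Principality of  Andorra ")` and `direct_match_country("AND")` both resolve to `"ad"`.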

## Item: search_index/partial_match_country.json

### Content Sample
Sample of the first 20 pretty printed lines of the file.

```
{
"andorra": [
"ad"
],
"the": [
"ad",
"ae",
"af",
"al",
"am",
"ao",
"ar",
"as",
"at",
"au",
"az",
"bd",
"be",
"bg",
"bh",
...
```
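One plausible way to use this word index for multi-word queries is to intersect the code lists of every word, as sketched below. The AND semantics are an assumption about how partial matching could work, not a description of atlasq's actual search. The `"andorra"` entry and the truncated `"the"` list come from the sample; the `"principality"` entry is hypothetical.

```ruby
# Excerpt shaped like partial_match_country.json.
PARTIAL_MATCH_COUNTRY = {
  "andorra" => ["ad"],
  "the" => ["ad", "ae", "af", "al", "am"], # truncated from the sample
  "principality" => ["ad", "li", "mc"],    # hypothetical entry
}.freeze

# Assumed AND semantics: every word must match, so the per-word code lists
# are intersected. Unknown words contribute an empty list.
def partial_match_country(query)
  words = query.downcase.split
  return [] if words.empty?

  words.map { |word| PARTIAL_MATCH_COUNTRY.fetch(word, []) }.reduce(:&)
end
```

With this data, `partial_match_country("The Principality")` narrows five candidates for "the" down to `["ad"]`.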

## Item: search_index/countries_by_region.json

### Content Sample
Sample of the first 20 pretty printed lines of the file.

```
{
"europe": [
"ad",
"al",
"at",
"ax",
"ba",
"be",
"bg",
"by",
"ch",
"cz",
"de",
"dk",
"ee",
"es",
"fi",
"fo",
"fr",
"gb",
...
```
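This file mixes continents (`"europe"`), subregions (`"southern europe"`), and business regions (`"emea"`, `"apac"`, `"amer"`) under one set of keys, so a single lookup handles all of them; the `melanesia` key, for instance, lists the same countries as the `--region melanesia` example in the main README. The hypothetical sketch below uses an excerpt of the real data and also inverts the index to find every region a country belongs to; `regions_by_country` is an illustrative helper, not part of atlasq.

```ruby
# Excerpt copied from countries_by_region.json (key order preserved).
COUNTRIES_BY_REGION = {
  "antarctica" => ["aq", "bv", "gs", "hm"],
  "oceania" => [
    "as", "au", "cc", "ck", "cx", "fj", "fm", "gu", "ki", "mh", "mp", "nc", "nf", "nr",
    "nu", "nz", "pf", "pg", "pn", "pw", "sb", "tk", "to", "tv", "vu", "wf", "ws",
  ],
  "melanesia" => ["fj", "nc", "pg", "sb", "vu"],
}.freeze

# Inverts region => codes into code => regions, keeping the file's key order.
def regions_by_country(countries_by_region)
  index = {}
  countries_by_region.each do |region, codes|
    codes.each { |code| (index[code] ||= []) << region }
  end
  index
end

REGIONS_BY_COUNTRY = regions_by_country(COUNTRIES_BY_REGION)
```

With the full file, the inverted index would list every region key a country appears under.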

## Item: search_index/direct_match_currency.json

### Content Sample
Sample of the first 20 pretty printed lines of the file.

```
{
"978": "eur",
"eur": "eur",
"euro": "eur",
"784": "aed",
"aed": "aed",
"united arab emirates dirham": "aed",
"971": "afn",
"afn": "afn",
"afghan afghani": "afn",
"951": "xcd",
"xcd": "xcd",
"east caribbean dollar": "xcd",
"008": "all",
"all": "all",
"albanian lek": "all",
"051": "amd",
"amd": "amd",
"armenian dram": "amd",
"973": "aoa",
...
```

## Item: search_index/partial_match_currency.json

### Content Sample
Sample of the first 20 pretty printed lines of the file.

```
{
"978": [
"eur"
],
"eur": [
"eur"
],
"€": [
"eur"
],
"euro": [
"eur"
],
"784": [
"aed"
],
"aed": [
"aed"
],
"د.ا": [
...
```

## Item: search_index/countries_by_currency.json

### Content Sample
Sample of the first 20 pretty printed lines of the file.

```
{
"eur": [
"ad",
"at",
"ax",
"be",
"bl",
"cy",
"de",
"ee",
"es",
"fi",
"fr",
"gf",
"gp",
"gr",
"hr",
"ie",
"it",
"lt",
...
```
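The currency indexes compose naturally: a direct match resolves a numeric code, currency code, or name to a currency, and `countries_by_currency` then lists the countries using it. The data below is copied from the samples and the JSON file in this commit; chaining the two lookups this way is an assumption for illustration, and `countries_for_currency` is a hypothetical helper.

```ruby
# Excerpts copied from direct_match_currency.json and countries_by_currency.json.
DIRECT_MATCH_CURRENCY = {
  "978" => "eur",
  "eur" => "eur",
  "euro" => "eur",
  "784" => "aed",
  "aed" => "aed",
  "united arab emirates dirham" => "aed",
}.freeze

COUNTRIES_BY_CURRENCY = {
  "eur" => %w[
    ad at ax be bl cy de ee es fi fr gf gp gr hr ie it lt
    lu lv mc me mf mq mt nl pm pt re si sk sm tf va yt
  ],
  "aed" => %w[ae],
}.freeze

# Resolves the query to a currency code, then returns the countries using it.
def countries_for_currency(query)
  code = DIRECT_MATCH_CURRENCY[query.strip.downcase]
  return [] unless code

  COUNTRIES_BY_CURRENCY.fetch(code, [])
end
```

For example, `countries_for_currency("784")` returns `["ae"]`, and `countries_for_currency("Euro")` returns the eurozone list starting with `"ad"`.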
1 change: 1 addition & 0 deletions cache/search_index/countries_by_currency.json
@@ -0,0 +1 @@
{"eur":["ad","at","ax","be","bl","cy","de","ee","es","fi","fr","gf","gp","gr","hr","ie","it","lt","lu","lv","mc","me","mf","mq","mt","nl","pm","pt","re","si","sk","sm","tf","va","yt"],"aed":["ae"],"afn":["af"],"xcd":["ag","ai","dm","gd","kn","lc","ms","vc"],"all":["al"],"amd":["am"],"aoa":["ao"],"usd":["aq","as","bq","ec","fm","gu","io","mh","mp","pr","pw","sv","tc","um","us","vg","vi","zw"],"ars":["ar"],"aud":["au","cc","cx","hm","ki","nf","nr","tv"],"awg":["aw"],"azn":["az"],"bam":["ba"],"bbd":["bb"],"bdt":["bd"],"xof":["bf","bj","ci","gw","ml","ne","sn","tg"],"bgn":["bg"],"bhd":["bh"],"bif":["bi"],"bmd":["bm"],"bnd":["bn"],"bob":["bo"],"brl":["br"],"bsd":["bs"],"btn":["bt"],"nok":["bv","no","sj"],"bwp":["bw"],"byn":["by"],"bzd":["bz"],"cad":["ca"],"cdf":["cd"],"xaf":["cf","cg","cm","ga","gq","td"],"chf":["ch","li"],"nzd":["ck","nu","nz","pn","tk"],"clp":["cl"],"cny":["cn"],"cop":["co"],"crc":["cr"],"cup":["cu"],"cve":["cv"],"ang":["cw","sx"],"czk":["cz"],"djf":["dj"],"dkk":["dk","fo","gl"],"dop":["do"],"dzd":["dz"],"egp":["eg"],"mad":["eh","ma"],"etb":["er","et"],"fjd":["fj"],"fkp":["fk"],"gbp":["gb","gg","gs","im","je"],"gel":["ge"],"ghs":["gh"],"gip":["gi"],"gmd":["gm"],"gnf":["gn"],"gtq":["gt"],"gyd":["gy"],"hkd":["hk"],"hnl":["hn"],"htg":["ht"],"huf":["hu"],"idr":["id","tl"],"ils":["il","ps"],"inr":["in"],"iqd":["iq"],"irr":["ir"],"isk":["is"],"jmd":["jm"],"jod":["jo"],"jpy":["jp"],"kes":["ke"],"kgs":["kg"],"khr":["kh"],"kmf":["km"],"kpw":["kp"],"krw":["kr"],"kwd":["kw"],"kyd":["ky"],"kzt":["kz"],"lak":["la"],"lbp":["lb"],"lkr":["lk"],"lrd":["lr"],"lsl":["ls"],"lyd":["ly"],"mdl":["md"],"mga":["mg"],"mkd":["mk"],"mmk":["mm"],"mnt":["mn"],"mop":["mo"],"mru":["mr"],"mur":["mu"],"mvr":["mv"],"mwk":["mw"],"mxn":["mx"],"myr":["my"],"mzn":["mz"],"nad":["na"],"xpf":["nc","pf","wf"],"ngn":["ng"],"nio":["ni"],"npr":["np"],"omr":["om"],"pab":["pa"],"pen":["pe"],"pgk":["pg"],"php":["ph"],"pkr":["pk"],"pln":["pl"],"pyg":["py"],"qar":["qa"],"ron":["ro"],"rsd":["rs"],"rub":["ru"],"rwf":["rw"],"sar":["sa"],"sbd":["sb"],"scr":["sc"],"sdg":["sd"],"sek":["se"],"sgd":["sg"],"shp":["sh"],"sll":["sl"],"sos":["so"],"srd":["sr"],"ssp":["ss"],"std":["st"],"syp":["sy"],"szl":["sz"],"thb":["th"],"tjs":["tj"],"tmt":["tm"],"tnd":["tn"],"top":["to"],"try":["tr"],"ttd":["tt"],"twd":["tw"],"tzs":["tz"],"uah":["ua"],"ugx":["ug"],"uyu":["uy"],"uzs":["uz"],"ves":["ve"],"vnd":["vn"],"vuv":["vu"],"wst":["ws"],"yer":["ye"],"zar":["za"],"zmw":["zm"]}
1 change: 1 addition & 0 deletions cache/search_index/countries_by_region.json
@@ -0,0 +1 @@
{"europe":["ad","al","at","ax","ba","be","bg","by","ch","cz","de","dk","ee","es","fi","fo","fr","gb","gg","gi","gr","hr","hu","ie","im","is","it","je","li","lt","lu","lv","mc","md","me","mk","mt","nl","no","pl","pt","ro","rs","ru","se","si","sj","sk","sm","tr","ua","va"],"southern europe":["ad","al","ba","es","gi","gr","hr","it","me","mk","mt","pt","rs","si","sm","va"],"emea":["ad","ae","al","am","ao","at","ax","az","ba","be","bf","bg","bh","bi","bj","bw","by","cd","cf","cg","ch","ci","cm","cv","cy","cz","de","dj","dk","dz","ee","eg","eh","er","es","et","fi","fo","fr","ga","gb","ge","gg","gh","gi","gl","gm","gn","gq","gr","gw","hr","hu","ie","il","im","iq","ir","is","it","je","jo","ke","kg","km","kw","kz","lb","li","lr","ls","lt","lu","lv","ly","ma","mc","md","me","mg","mk","ml","mr","ms","mt","mu","mw","mz","na","ne","ng","nl","no","om","pl","ps","pt","qa","re","ro","rs","ru","rw","sa","sc","sd","se","si","sj","sk","sl","sm","sn","so","ss","st","sy","sz","td","tf","tg","tj","tm","tn","tr","tz","ua","ug","uz","va","ye","yt","za","zm","zw"],"asia":["ae","af","am","az","bd","bh","bn","bt","cc","cn","cx","cy","ge","hk","id","il","in","io","iq","ir","jo","jp","kg","kh","kp","kr","kw","kz","la","lb","lk","mm","mn","mo","mv","my","np","om","ph","pk","ps","qa","sa","sg","sy","th","tj","tl","tm","tr","tw","uz","vn","ye"],"western asia":["ae","am","az","bh","cy","ge","il","iq","jo","kw","lb","om","ps","qa","sa","sy","tr","ye"],"southern asia":["af","bd","bt","in","ir","lk","mv","np","pk"],"apac":["af","as","au","bd","bl","bn","bq","bt","bv","cc","ck","cn","cx","fj","fm","gu","hk","hm","id","in","io","jp","kh","ki","kp","kr","la","lk","mh","mm","mn","mo","mp","mv","my","nc","nf","np","nr","nu","nz","pf","pg","ph","pk","pn","pw","sb","sg","sh","tc","th","tk","tl","to","tv","tw","vn","vu","wf","ws"],"americas":["ag","ai","ar","aw","bb","bl","bm","bo","bq","br","bs","bz","ca","cl","co","cr","cu","cw","dm","do","ec","fk","gd","gf","gl","gp","gs","gt","gy","hn","ht","jm","kn","ky","lc","mf","mq","ms","mx","ni","pa","pe","pm","pr","py","sr","sv","sx","tc","tt","um","us","uy","vc","ve","vg","vi"],"caribbean":["ag","ai","aw","bb","bl","bq","bs","cu","cw","dm","do","gd","gp","ht","jm","kn","ky","lc","mf","mq","ms","pr","sx","tc","tt","vc","vg","vi"],"north america":["ag","ai","aw","bb","bl","bm","bq","bs","bz","ca","cr","cu","cw","dm","do","gd","gl","gp","gt","hn","ht","jm","kn","ky","lc","mf","mq","ms","mx","ni","pa","pm","pr","sv","sx","tc","tt","us","vc","vg","vi"],"amer":["ag","ai","aq","ar","aw","bb","bm","bo","br","bs","bz","ca","cl","co","cr","cu","cw","dm","do","ec","fk","gd","gf","gp","gs","gt","gy","hn","ht","jm","kn","ky","lc","mf","mq","mx","ni","pa","pe","pm","pr","py","sr","sv","sx","tt","um","us","uy","vc","ve","vg","vi"],"africa":["ao","bf","bi","bj","bw","cd","cf","cg","ci","cm","cv","dj","dz","eg","eh","er","et","ga","gh","gm","gn","gq","gw","io","ke","km","lr","ls","ly","ma","mg","ml","mr","mu","mw","mz","na","ne","ng","re","rw","sc","sd","sh","sl","sn","so","ss","st","sz","td","tf","tg","tn","tz","ug","yt","za","zm","zw"],"middle africa":["ao","cd","cf","cg","cm","ga","gq","st","td"],"antarctica":["aq","bv","gs","hm"],"south america":["ar","bo","br","cl","co","ec","fk","gf","gs","gy","pe","py","sr","uy","ve"],"oceania":["as","au","cc","ck","cx","fj","fm","gu","ki","mh","mp","nc","nf","nr","nu","nz","pf","pg","pn","pw","sb","tk","to","tv","vu","wf","ws"],"polynesia":["as","ck","nu","pf","pn","tk","to","tv","wf","ws"],"australia":["as","au","ck","fj","fm","gu","ki","mh","mp","nc","nf","nr","nu","nz","pf","pg","pn","pw","sb","tk","to","tv","um","vu","wf","ws"],"western europe":["at","be","ch","de","fr","li","lu","mc","nl"],"australia and new zealand":["au","cc","cx","nf","nz"],"northern europe":["ax","dk","ee","fi","fo","gb","gg","ie","im","is","je","lt","lv","no","se","sj"],"western africa":["bf","bj","ci","cv","gh","gm","gn","gw","lr","ml","mr","ne","ng","sh","sl","sn","tg"],"eastern europe":["bg","by","cz","hu","md","pl","ro","ru","sk","ua"],"eastern africa":["bi","dj","er","et","io","ke","km","mg","mu","mw","mz","re","rw","sc","so","tf","tz","ug","yt","zm","zw"],"northern america":["bm","ca","gl","pm","um","us"],"south-eastern asia":["bn","id","kh","la","mm","my","ph","sg","th","tl","vn"],"southern africa":["bw","ls","na","sz","za"],"central america":["bz","cr","gt","hn","mx","ni","pa","sv"],"eastern asia":["cn","hk","jp","kp","kr","mn","mo","tw"],"northern africa":["dz","eg","eh","ly","ma","sd","ss","tn"],"melanesia":["fj","nc","pg","sb","vu"],"micronesia":["fm","gu","ki","mh","mp","nr","pw"],"central asia":["kg","kz","tj","tm","uz"]}
1 change: 1 addition & 0 deletions cache/search_index/direct_match_country.json

Large diffs are not rendered by default.