Add Apache Spark to README
michaelmior committed Feb 27, 2024
1 parent ed764a2 commit faa4efe
Showing 1 changed file with 9 additions and 0 deletions.
9 changes: 9 additions & 0 deletions README.md
@@ -169,6 +169,15 @@ This will remove the use of `allOf` but produce a schema which should accept the
This is only useful for schemas not generated by JSONoid since JSONoid does not currently generate schemas with `allOf`.
Accordingly, there is no option for this transformer in the CLI, but it may be useful via the API.

## Apache Spark :sparkles:

JSONoid also supports distributed schema discovery via [Apache Spark](https://spark.apache.org/).
There are two options for running JSONoid on Spark.
The first is to use the `JsonoidSpark` class as your main class when running Spark.
In this case, you can pass a file path as input and the schema will be written to standard output.
Alternatively, you can use the `JsonoidRdd#fromString` method to convert an RDD of strings to an RDD of schemas that supports schema discovery via the `reduceSchemas` or `treeReduceSchemas` method.
The result of the reduction will be a `JsonSchema` object.
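The RDD-based flow described above might look roughly like the following sketch. The import path for `JsonoidRdd` and the exact method signatures are assumptions here; check the JSONoid sources for the actual package and API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Assumed import path for JsonoidRdd; verify against the JSONoid sources
import io.github.dataunitylab.jsonoid.discovery.spark.JsonoidRdd

object DiscoveryExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JSONoid discovery"))

    // One JSON document per line (JSON Lines input)
    val lines = sc.textFile("documents.jsonl")

    // Convert the RDD of strings into an RDD of schemas
    val schemas = JsonoidRdd.fromString(lines)

    // Reduce the per-document schemas into a single JsonSchema
    val schema = schemas.reduceSchemas()
    println(schema)

    sc.stop()
  }
}
```

For large inputs, `treeReduceSchemas` may be preferable to `reduceSchemas`, since a tree-shaped reduction spreads the merge work across executors rather than funneling all partial schemas to the driver.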

## Running tests

Tests use [ScalaTest](https://www.scalatest.org/) and can be run via `sbt test`.
