
Not able to collect data from an RDD on spark cluster #289

Open

deepakjangid123 opened this issue Apr 19, 2018 · 2 comments

deepakjangid123 commented Apr 19, 2018

Hi all,
This is my Spark configuration for the spark-context in Clojure. I am running Spark 2.2.1 with flambo 0.8.2. Flambo is a Clojure DSL for Apache Spark, and flambo 0.8.x is compatible with Spark 2.x.

(ns some-namespace
  (:require
     [flambo.api :as f]
     [flambo.conf :as conf]))

;; Spark-configurations
(def c
  (-> (conf/spark-conf)
      (conf/master "spark://abhi:7077")
      (conf/app-name "My-app")
      ;; ***To add all the jars used in the project***
      (conf/jars (map #(.getPath %) (.getURLs (java.lang.ClassLoader/getSystemClassLoader))))
      (conf/set "spark.cores.max" "4")
      (conf/set "spark.executor.memory" "2g")
      (conf/set "spark.home" (get (System/getenv) "SPARK_HOME")))

;; Initializing spark-context
(def sc (f/spark-context c))

;; Returns data in `org.apache.spark.api.java.JavaRDD` form from a file
(def data (f/text-file sc "Path-to-a-csv-file"))
;; Collecting data from RDD
(.collect data)
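
For reference, flambo also wraps these RDD methods, so collect can be called without raw Java interop; a minimal sketch, assuming flambo.api/collect and flambo.api/take are available in flambo 0.8.2 as in the flambo README:

;; Same collect via the flambo API instead of Java interop;
;; f/take is a cheaper smoke test if the file is large.
(f/collect data)
(f/take data 10)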

I am able to run this piece of code with the master set to local[*], but when I use the Spark cluster, I get the following error while collecting the JavaRDD.

18/04/19 05:50:39 ERROR util.Utils: Exception encountered
java.lang.ArrayStoreException: [Z
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:90)
	at org.apache.spark.util.collection.PrimitiveVector.$plus$eq(PrimitiveVector.scala:43)
	at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:30)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:216)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:792)
	at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1350)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:231)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:81)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

I am not sure what is going wrong. I have searched about this issue a lot and tried many different things, but I still get the very same error. I have also added all the jars in the Spark configuration.
Please let me know if you have any thoughts about this.

@vishnu2kmohan

@deepakjangid123 Please let us know if you continue to run into this issue with the latest Spark 2.5.0-2.2.1 release.

@anuragpan

Spark depends on memory APIs that were changed in JDK 9, so it does not work starting with JDK 9.
You have to set up Java 8 for Spark. For more info, see https://issues.apache.org/jira/browse/SPARK-24421
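
A quick way to confirm which JVM the driver and the executors actually run; a minimal sketch built on the flambo API from the original post, assuming f/parallelize, f/map, f/fn, and f/collect are available in flambo 0.8.2:

;; Spark 2.2.x requires Java 8, whose version string starts with "1.8".
;; Check the driver JVM first:
(println "driver java.version:" (System/getProperty "java.version"))

;; Then run a trivial one-element job so an executor reports its own JVM
;; (this assumes the cluster can still execute a minimal task):
(println "executor java.version:"
         (f/collect
           (f/map (f/parallelize sc [1])
                  (f/fn [_] (System/getProperty "java.version")))))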
