
Not able to collect data from an RDD on spark cluster #289

Open

deepakjangid123 opened this issue Apr 19, 2018 · 2 comments

deepakjangid123 commented Apr 19, 2018

Hi all,
This is my Spark configuration for the spark-context in Clojure. I am running Spark 2.2.1 with flambo 0.8.2. Flambo is a Clojure DSL for Apache Spark, and flambo 0.8.x is compatible with Spark 2.x.

(ns some-namespace
  (:require
     [flambo.api :as f]
     [flambo.conf :as conf]))

;; Spark-configurations
(def c
  (-> (conf/spark-conf)
      (conf/master "spark://abhi:7077")
      (conf/app-name "My-app")
      ;; ***To add all the jars used in the project***
      (conf/jars (map #(.getPath %) (.getURLs (java.lang.ClassLoader/getSystemClassLoader))))
      (conf/set "spark.cores.max" "4")
      (conf/set "spark.executor.memory" "2g")
      (conf/set "spark.home" (get (System/getenv) "SPARK_HOME")))

;; Initializing spark-context
(def sc (f/spark-context c))

;; Returns data in `org.apache.spark.api.java.JavaRDD` form from a file
(def data (f/text-file sc "Path-to-a-csv-file"))
;; Collecting data from RDD
(.collect data)
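
For reference, flambo also wraps these RDD methods, so collect can be called without raw Java interop; a minimal sketch, assuming flambo.api/collect and flambo.api/take are available in flambo 0.8.2 as in the flambo README:

;; Same collect via the flambo API instead of Java interop;
;; f/take is a cheaper smoke test if the file is large.
(f/collect data)
(f/take data 10)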

I am able to run this piece of code with the master set to local[*], but when I use the Spark cluster, I get the following error while collecting the JavaRDD.

18/04/19 05:50:39 ERROR util.Utils: Exception encountered
java.lang.ArrayStoreException: [Z
	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:90)
	at org.apache.spark.util.collection.PrimitiveVector.$plus$eq(PrimitiveVector.scala:43)
	at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:30)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:216)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1038)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:969)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:792)
	at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1350)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:231)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206)
	at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
	at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
	at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:81)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

I am not sure what is going wrong. I have searched about this issue a lot and tried many different things, but I still get the very same error. I have also added all the jars in the Spark configuration.
Please let me know if you have any thoughts about this.

@vishnu2kmohan

@deepakjangid123 Please let us know if you continue to run into this issue with the latest Spark 2.5.0-2.2.1 release.

@anuragpan

Spark depends on memory APIs that were changed in JDK 9, so it does not work starting with JDK 9.
You have to set up Java 8 for Spark. For more info, see https://issues.apache.org/jira/browse/SPARK-24421
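
A quick way to confirm which JVM the driver and the executors actually run; a minimal sketch built on the flambo API from the original post, assuming f/parallelize, f/map, f/fn, and f/collect are available in flambo 0.8.2:

;; Spark 2.2.x requires Java 8, whose version string starts with "1.8".
;; Check the driver JVM first:
(println "driver java.version:" (System/getProperty "java.version"))

;; Then run a trivial one-element job so an executor reports its own JVM
;; (this assumes the cluster can still execute a minimal task):
(println "executor java.version:"
         (f/collect
           (f/map (f/parallelize sc [1])
                  (f/fn [_] (System/getProperty "java.version")))))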
