The latest Spark Kafka setup and dev
Until today the latest versions for bigdata streaming ecosystem are follows.
Please note,
- Spark and Kafka most time just follow the version changes of Hadoop.
- Kafka supports Scala 2.13 well.
- For Spark we'd better choose Scala 2.12 for both Spark installation and client submit, b/c spark-submit has some issues with Scala 2.13.
How to setup and programming with them? Please reference my past blog.
Start spark3.3 kafka3.2 streaming from scratch
The main changes are in build.sbt, whose content becomes,
name := "spark-consumer"
version := "0.1"
scalaVersion := "2.12.17"
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.3.1" % Test
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.1" % "provided"
Go to mvnrepository.com for getting the latest dependencies.
And this is the submit script,
#!/bin/bash
spark-submit \
--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1 \
--class "KafkaConsumer" \
--master local[*] \
target/scala-2.12/spark-consumer_2.12-0.1.jar
This is the main program,
import org.apache.spark.sql.SparkSession
object KafkaConsumer {
def main(args:Array[String]):Unit = {
val spark = SparkSession.builder.appName("Mykafka").getOrCreate()
val df = spark
.readStream
.format("kafka")
.option("kafka.bootstrap.servers", "localhost:9092")
.option("subscribe", "mytest")
.load()
import spark.implicits._
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
.as[(String, String)]
val myCount = df.groupBy("key").count()
val query = myCount.writeStream
.outputMode("complete")
.format("console")
.start()
query.awaitTermination()
}
}
Go to compile and submit the job, you will see stream aggregation as follows.
The last info is the development environment.
$ lsb_release -cd
Description: Ubuntu 22.04.1 LTS
Codename: jammy
$ java -version
openjdk version "11.0.17" 2022-10-18
$ scala -version
Scala code runner version 2.12.17 -- Copyright 2002-2022, LAMP/EPFL and Lightbend, Inc.
$ ruby -v
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux-gnu]