编辑推荐

网易数帆开源API网关与容器云项目，让云原生生产落地“多快好

网易汪源：统一负载与多云环境的“开放姿态”，才是云原生

网易数帆如何用 Kubernetes“原语”搞定云原生中间件

快手打新挤爆券商系统，网易数帆推出券商稳定性保障方案

探索智慧校园新模式，网易有数在教育行业的实践分享

金融行业大数据治理之路——数据模型篇

Structure Streaming和spark streaming原生API访问HDFS文件数据对比

叁叁肆2018-10-31 13:30

此文已由作者岳猛授权网易云社区发布。

欢迎访问网易云社区，了解更多网易技术产品运营经验。

Structure Stream访问方式

code examples

import org.apache.spark.sql.streaming._
val df = spark.readStream.text("/home/testhdfs")
val ps = df.writeStream.format("console").outputMode(OutputMode.Append).start

结论

basedir = /home/testhdfs
支持：mv file to basedir（/home/testhdfs）
不支持：mv directory to basedir

如果往basedir里面添加文件夹会出现ERROR:

java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
        hdfs://172.17.1.180:9000/home/testhdfs/data1
        hdfs://172.17.1.180:9000/home/testhdfsIf provided paths are partition directories, please set "basePath" in the options of the data source to specify the root directory of the table. If there are multiple root directories, please load them separately and then union them.

spark streaming 访问方式

测试textFile接口使用

import org.apache.spark.streaming._
val ssc = StreamingContext.getActiveOrCreate(() => new StreamingContext(sc,                  Seconds(120)))
val ds1 = ssc.textFileStream("/home/testhdfs2")
ds1.print
ssc.start