
Without using a raw RDD, I need to transpose a Spark DataFrame (columns to rows). I have a JSON file in the format below:

{"sku-1":{"att-a":"att-a-7","att-b":"att-b-3","att-c":"att-c-10","att-d":"att-d-10","att-e":"att-e-15","att-f":"att-f-11","att-g":"att-g-2","att-h":"att-h-7","att-i":"att-i-5","att-j":"att-j-14"},"sku-2":{"att-a":"att-a-9","att-b":"att-b-7","att-c":"att-c-12","att-d":"att-d-4","att-e":"att-e-10","att-f":"att-f-4","att-g":"att-g-13","att-h":"att-h-4","att-i":"att-i-1","att-j":"att-j-13"},"sku-3":{"att-a":"att-a-10","att-b":"att-b-6","att-c":"att-c-1","att-d":"att-d-1","att-e":"att-e-13","att-f":"att-f-12","att-g":"att-g-9","att-h":"att-h-6","att-i":"att-i-7","att-j":"att-j-4"}} 

Sample of the above JSON

I need to read this into a Spark DataFrame with the new structure below:

+-----+--------+-------+--------+--------+--------+--------+--------+-------+-------+--------+ 
| sku| att-a| att-b| att-c| att-d| att-e| att-f| att-g| att-h| att-i| att-j| 
+-----+--------+-------+--------+--------+--------+--------+--------+-------+-------+--------+ 
|sku-1| att-a-7|att-b-3|att-c-10|att-d-10|att-e-15|att-f-11| att-g-2|att-h-7|att-i-5|att-j-14| 
|sku-2| att-a-9|att-b-7|att-c-12| att-d-4|att-e-10| att-f-4|att-g-13|att-h-4|att-i-1|att-j-13| 
|sku-3|att-a-10|att-b-6| att-c-1| att-d-1|att-e-13|att-f-12| att-g-9|att-h-6|att-i-7| att-j-4| 
+-----+--------+-------+--------+--------+--------+--------+--------+-------+-------+--------+ 

Sample of the output

I also tried reading it as below.

import org.apache.spark.sql.types.{StringType, StructType}

// Schema with a single top-level struct field named "SKUNAME".
val schema = (new StructType)
  .add("SKUNAME", (new StructType)
    .add("att-a", StringType)
    .add("att-b", StringType)
    .add("att-c", StringType)
    .add("att-d", StringType)
    .add("att-e", StringType)
    .add("att-f", StringType)
    .add("att-g", StringType)
    .add("att-h", StringType)
    .add("att-i", StringType)
    .add("att-j", StringType))

val recommendationInputDf = sparkSession.read.schema(schema).json(recommendationsPath) 

The output of my code above is as follows.

Schema:

root
 |-- SKUNAME: struct (nullable = true)
 |    |-- att-a: string (nullable = true)
 |    |-- att-b: string (nullable = true)
 |    |-- att-c: string (nullable = true)
 |    |-- att-d: string (nullable = true)
 |    |-- att-e: string (nullable = true)
 |    |-- att-f: string (nullable = true)
 |    |-- att-g: string (nullable = true)
 |    |-- att-h: string (nullable = true)
 |    |-- att-i: string (nullable = true)
 |    |-- att-j: string (nullable = true)

I also checked similar questions like (Spark: Transpose DataFrame Without Aggregating) and (transpose-dataframe-using-spark-scala-without-using-pivot-function), but recommendationInputDf.show still gives:

+-------+ 
|SKUNAME| 
+-------+ 
| null| 
+-------+ 

How can I get the desired output shown above from this data?
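
As far as I can tell, the null comes from the schema asking for a field literally named SKUNAME while the JSON's top-level keys are sku-1, sku-2 and sku-3, so the field matches nothing. A schema whose field names match the actual keys does populate, but only by hard-coding every SKU name (a sketch, reusing recommendationsPath from above):

import org.apache.spark.sql.types.{StringType, StructType}

// One struct covering the ten attributes, reused for every SKU key.
val attStruct = (new StructType)
  .add("att-a", StringType).add("att-b", StringType).add("att-c", StringType)
  .add("att-d", StringType).add("att-e", StringType).add("att-f", StringType)
  .add("att-g", StringType).add("att-h", StringType).add("att-i", StringType)
  .add("att-j", StringType)

// The top-level field names must match the JSON's actual keys.
val matchingSchema = (new StructType)
  .add("sku-1", attStruct)
  .add("sku-2", attStruct)
  .add("sku-3", attStruct)

sparkSession.read.schema(matchingSchema).json(recommendationsPath).printSchema()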

In the comments, the solution below was already suggested:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

def toLong(df: DataFrame, by: Seq[String]): DataFrame = {
  // Split columns into the id columns ("by") and the columns to melt.
  val (cols, types) = df.dtypes.filter { case (c, _) => !by.contains(c) }.unzip
  // All melted columns must share the same type.
  require(types.distinct.size == 1)

  // Build one (key, val) struct per melted column and explode them into rows.
  val kvs = explode(array(
    cols.map(c => struct(lit(c).alias("key"), col(c).alias("val"))): _*))

  val byExprs = by.map(col(_))
  import sparkSession.sqlContext.implicits._
  df
    .select(byExprs :+ kvs.alias("_kvs"): _*)
    .select(byExprs ++ Seq($"_kvs.key", $"_kvs.val"): _*)
}

toLong(recommendationInputDf, Seq("sku-1")).show(12, false) 

I tried it following the instructions in zero323's answer to Transpose column to row with Spark, but the output is as follows:

+--------------------------------------------------------------------------------------+-----+-------------------------------------------------------------------------------------+ 
|sku-1                     |key |val                     | 
+--------------------------------------------------------------------------------------+-----+-------------------------------------------------------------------------------------+ 
|[att-a-7,att-b-3,att-c-10,att-d-10,att-e-15,att-f-11,att-g-2,att-h-7,att-i-5,att-j-14]|sku-2|[att-a-9,att-b-7,att-c-12,att-d-4,att-e-10,att-f-4,att-g-13,att-h-4,att-i-1,att-j-13]| 
|[att-a-7,att-b-3,att-c-10,att-d-10,att-e-15,att-f-11,att-g-2,att-h-7,att-i-5,att-j-14]|sku-3|[att-a-10,att-b-6,att-c-1,att-d-1,att-e-13,att-f-12,att-g-9,att-h-6,att-i-7,att-j-4] | 
+--------------------------------------------------------------------------------------+-----+-------------------------------------------------------------------------------------+ 
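
If I read the code correctly, passing Seq("sku-1") keeps sku-1 as the id column and melts only the remaining columns (sku-2 and sku-3), which is why the sku-1 struct repeats on every row. A toy illustration of the melt, with made-up data:

import sparkSession.implicits._

// Hypothetical frame: "id" is the by-column, "c1" and "c2" get melted.
val toy = Seq((1, "a", "b")).toDF("id", "c1", "c2")

// Produces two rows: (id=1, key=c1, val=a) and (id=1, key=c2, val=b).
toLong(toy, Seq("id")).show()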

Answer

Read the sample JSON directly and call toLong with an empty "by" list, so that every top-level column (one struct per SKU) is melted into key/val rows, then flatten the val struct:

import spark.implicits._

val df = spark.read.json(spark.createDataset(Seq(
  """{"sku-1":{"att-a":"att-a-7","att-b":"att-b-3","att-c":"att-c-10","att-d":"att-d-10","att-e":"att-e-15","att-f":"att-f-11","att-g":"att-g-2","att-h":"att-h-7","att-i":"att-i-5","att-j":"att-j-14"},"sku-2":{"att-a":"att-a-9","att-b":"att-b-7","att-c":"att-c-12","att-d":"att-d-4","att-e":"att-e-10","att-f":"att-f-4","att-g":"att-g-13","att-h":"att-h-4","att-i":"att-i-1","att-j":"att-j-13"},"sku-3":{"att-a":"att-a-10","att-b":"att-b-6","att-c":"att-c-1","att-d":"att-d-1","att-e":"att-e-13","att-f":"att-f-12","att-g":"att-g-9","att-h":"att-h-6","att-i":"att-i-7","att-j":"att-j-4"}}"""
)))

toLong(df, Seq()).select($"key".alias("sku"), $"val.*").show 
+-----+--------+-------+--------+--------+--------+--------+--------+-------+-------+--------+ 
| sku| att-a| att-b| att-c| att-d| att-e| att-f| att-g| att-h| att-i| att-j| 
+-----+--------+-------+--------+--------+--------+--------+--------+-------+-------+--------+ 
|sku-1| att-a-7|att-b-3|att-c-10|att-d-10|att-e-15|att-f-11| att-g-2|att-h-7|att-i-5|att-j-14| 
|sku-2| att-a-9|att-b-7|att-c-12| att-d-4|att-e-10| att-f-4|att-g-13|att-h-4|att-i-1|att-j-13| 
|sku-3|att-a-10|att-b-6| att-c-1| att-d-1|att-e-13|att-f-12| att-g-9|att-h-6|att-i-7| att-j-4| 
+-----+--------+-------+--------+--------+--------+--------+--------+-------+-------+--------+
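
If the SKU names are not known up front, a variant that avoids hard-coding them is to parse the top level as a map and explode it. A minimal sketch, assuming Spark 2.2+ (where from_json accepts a MapType) and that each JSON document sits on a single line; recommendationsPath is the input path from the question:

import org.apache.spark.sql.functions.{explode, from_json}
import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}
import spark.implicits._

// Struct of the ten string attributes, att-a .. att-j.
val attSchema = StructType(('a' to 'j').map(c => StructField(s"att-$c", StringType)))

// Read each document as raw text, parse the top level as a map from
// SKU name to attribute struct, then explode one row per map entry.
spark.read.text(recommendationsPath)
  .select(from_json($"value", MapType(StringType, attSchema)).alias("m"))
  .select(explode($"m"))                   // yields columns: key, value
  .select($"key".alias("sku"), $"value.*") // flatten the attribute struct
  .show(false)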