I have data like this:
Segment.organizationId|^|Segment.segmentId|^|SegmentType|^|SegmentName|^|SegmentName.languageId|^|SegmentLocalLanguageLabel|^|SegmentLocalLanguageLabel.languageId|^|ValidFromPeriodEndDate|^|ValidToPeriodEndDate|^|SegmentInactivationReasonCode|^|SegmentOrganizationId|^|IsShariaCompliant|^|IsCorporate|^|IsElimination|^|IsOther|^|InactiveReasonOtherDescription|^|InactiveReasonOtherDescription.languageId|^|IsOperatingSegment|^|SegmentFundbDescription|^|SegmentFundbDescription.languageId|^|SegmentTypeId|^|SegmentInactiveReasonId|^|FFAction|!|
4295876080|^|7|^|B|^|Test ||^|505074|^|jtrsu|^|505126|^|2010-03-31T00:00:00Z|^||^||^||^|False|^|False|^|False|^|False|^||^|505074|^|False|^||^|505074|^|3013618|^||^|I|!|
My code to read it is:
val df = sqlContext.read.format("csv").option("header", "true").option("delimiter", "|").option("inferSchema","true").load("s3://trfsmallfffile/FinancialSegment/TEST")
However, this does not give me the correct output. Here is what I get:
+----------------------+-----------------+-----------+-----------+----------------------+-------------------------+------------------------------------+----------------------+--------------------+-----------------------------+---------------------+-----------------+-----------+-------------+-------+------------------------------+-----------------------------------------+------------------+-----------------------+----------------------------------+-------------+-----------------------+--------+-------------+
|Segment_organizationId|Segment_segmentId|SegmentType|SegmentName|SegmentName_languageId|SegmentLocalLanguageLabel|SegmentLocalLanguageLabel_languageId|ValidFromPeriodEndDate|ValidToPeriodEndDate|SegmentInactivationReasonCode|SegmentOrganizationId|IsShariaCompliant|IsCorporate|IsElimination|IsOther|InactiveReasonOtherDescription|InactiveReasonOtherDescription_languageId|IsOperatingSegment|SegmentFundbDescription|SegmentFundbDescription_languageId|SegmentTypeId|SegmentInactiveReasonId|FFAction|DataPartition|
+----------------------+-----------------+-----------+-----------+----------------------+-------------------------+------------------------------------+----------------------+--------------------+-----------------------------+---------------------+-----------------+-----------+-------------+-------+------------------------------+-----------------------------------------+------------------+-----------------------+----------------------------------+-------------+-----------------------+--------+-------------+
| 4295876080| 7| B| Test | ^| ^| ^| ^| ^| ^| ^| ^| ^| ^| ^| ^| ^| ^| ^| ^| ^| ^| ^| Japan|
+----------------------+-----------------+-----------+-----------+----------------------+-------------------------+------------------------------------+----------------------+--------------------+-----------------------------+---------------------+-----------------+-----------+-------------+-------+------------------------------+-----------------------------------------+------------------+-----------------------+----------------------------------+-------------+-----------------------+--------+-------------+
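I am getting this because the "|" character is used inside a record (the SegmentName value is "Test |"), so splitting on a single "|" breaks the row apart at the wrong places.
How can I handle this situation?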
My expected output should be the following:
...+----------------------+-----------------+-----------+-----------+----------------------+-------------------------+------------------------------------+----------------------+--------------------+-----------------------------+---------------------+-----------------+-----------+-------------+-------+------------------------------+-----------------------------------------+------------------+-----------------------+----------------------------------+-------------+-----------------------+-----------+
|Segment.organizationId|Segment.segmentId|SegmentType|SegmentName|SegmentName.languageId|SegmentLocalLanguageLabel|SegmentLocalLanguageLabel.languageId|ValidFromPeriodEndDate|ValidToPeriodEndDate|SegmentInactivationReasonCode|SegmentOrganizationId|IsShariaCompliant|IsCorporate|IsElimination|IsOther|InactiveReasonOtherDescription|InactiveReasonOtherDescription.languageId|IsOperatingSegment|SegmentFundbDescription|SegmentFundbDescription.languageId|SegmentTypeId|SegmentInactiveReasonId|FFAction|
+----------------------+-----------------+-----------+-----------+----------------------+-------------------------+------------------------------------+----------------------+--------------------+-----------------------------+---------------------+-----------------+-----------+-------------+-------+------------------------------+-----------------------------------------+------------------+-----------------------+----------------------------------+-------------+-----------------------+-----------+
|4295876080 |7 |B |Test | |505074 |jtrsu |505126 |2010-03-31T00:00:00Z | | | |False |False |False |False | |505074 |False | |505074 |3013618 | |I |
+----------------------+-----------------+-----------+-----------+----------------------+-------------------------+------------------------------------+----------------------+--------------------+-----------------------------+---------------------+-----------------+-----------+-------------+-------+------------------------------+-----------------------------------------+------------------+-----------------------+----------------------------------+-------------+-----------------------+-----------+
A multi-character delimiter is not supported by the option parameter in Spark SQL. –
@RameshMaharjan I have updated my question; please take a look. –
You would have to handle the multi-character delimiter with sparkContext and convert the RDD to a Dataset or DataFrame. –
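
The last comment's suggestion could be sketched roughly as below. This is only a minimal sketch under some assumptions, not a tested solution for this data set: it assumes Spark 2.x or later (for SparkSession), that every line ends with the "|!|" terminator shown in the sample, and that all columns can be read as strings. The names MultiCharDelimiter and parseLine are made up for illustration.

```scala
import java.util.regex.Pattern
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helper object; the name is illustrative only.
object MultiCharDelimiter {

  // Drop the trailing "|!|" terminator, then split on the literal "|^|".
  // Pattern.quote keeps "|" and "^" from being read as regex operators;
  // the -1 limit keeps trailing empty fields.
  def parseLine(line: String): Array[String] =
    line.stripSuffix("|!|").split(Pattern.quote("|^|"), -1)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("multi-char-delimiter").getOrCreate()

    // Read the file as plain text so Spark does no delimiter handling itself.
    val lines = spark.sparkContext.textFile("s3://trfsmallfffile/FinancialSegment/TEST")

    // Assumption: the first line is the header; every column is kept as a string.
    val header = lines.first()
    val schema = StructType(
      parseLine(header).map(name => StructField(name, StringType, nullable = true)))

    val rows = lines
      .filter(_ != header)                  // skip header lines
      .map(parseLine)
      .filter(_.length == schema.length)    // drop rows with an unexpected field count
      .map(fields => Row.fromSeq(fields.toSeq))

    val df = spark.createDataFrame(rows, schema)
    df.show(false)
  }
}
```

With this split, the "Test |" value stays in a single SegmentName field, because only the full three-character sequence "|^|" is treated as a separator. Note that the column names keep their dots (for example Segment.organizationId), so they need backticks when selected, or they can be renamed first. As far as I know, Spark 3.0 and later also accept a multi-character separator in the CSV reader, but the "|!|" terminator would still have to be stripped from the last column.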