Getting NoSuchElementException while exporting data from an LZO-compressed file using Sqoop

I am trying to export some data from HDFS to MySQL using Sqoop. Exporting the uncompressed file works fine, but when I try to export the same file compressed with LZO, the Sqoop job fails. I am trying this in the standard Cloudera CDH4 VM environment. The columns in the file are tab-separated and NULLs are represented as '\N'.
File contents:
[cloudera@localhost ~]$ cat dipayan-test.txt
dipayan koramangala 29
raju marathahalli 32
raju marathahalli 32
raju \N 32
raju marathahalli 32
raju \N 32
raju marathahalli 32
raju marathahalli \N
raju marathahalli \N
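The compressed copy that appears in HDFS below was produced from this same file, along these lines (lzop is assumed; it writes a .lzo file next to the original):

lzop dipayan-test.txt
# lzop keeps dipayan-test.txt and writes dipayan-test.txt.lzo
hadoop fs -put dipayan-test.txt.lzo /user/cloudera/dipayan-test/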
Description of the table in MySQL:
mysql> describe sqooptest;
+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| name    | varchar(100) | YES  |     | NULL    |       |
| address | varchar(100) | YES  |     | NULL    |       |
| age     | int(11)      | YES  |     | NULL    |       |
+---------+--------------+------+-----+---------+-------+
3 rows in set (0.01 sec)
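For reference, the table was created with something equivalent to the following (reconstructed from the describe output above, so the original DDL may differ):

-- reconstructed from `describe sqooptest`; not the original DDL
CREATE TABLE sqooptest (
  name    VARCHAR(100),
  address VARCHAR(100),
  age     INT
);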
The file in HDFS:
[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/dipayan-test
Found 1 items
-rw-r--r-- 3 cloudera cloudera 138 2014-02-16 23:18 /user/cloudera/dipayan-test/dipayan-test.txt.lzo
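As a sanity check on the codec side, something along these lines can be used; as far as I understand, hadoop fs -text decodes through the configured compression codecs, so it only prints the original tab-separated rows if an LZO codec is registered in io.compression.codecs:

# prints the plain rows only if an LZO codec is available to the cluster
hadoop fs -text /user/cloudera/dipayan-test/dipayan-test.txt.lzo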
Sqoop command:
sqoop export --connect "jdbc:mysql://localhost/bigdata" --username "root" --password "XXXXXX" --driver "com.mysql.jdbc.Driver" --table sqooptest --export-dir /user/cloudera/dipayan-test/ --input-fields-terminated-by '\t' -m 1 --input-null-string '\\N' --input-null-non-string '\\N'
Error:
[cloudera@localhost ~]$ sqoop export --connect "jdbc:mysql://localhost/bigdata" --username "root" --password "mysql" --driver "com.mysql.jdbc.Driver" --table sqooptest --export-dir /user/cloudera/dipayan-test/ --input-fields-terminated-by '\t' -m 1 --input-null-string '\\N' --input-null-non-string '\\N'
14/02/16 23:19:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/02/16 23:19:26 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
14/02/16 23:19:26 INFO manager.SqlManager: Using default fetchSize of 1000
14/02/16 23:19:26 INFO tool.CodeGenTool: Beginning code generation
14/02/16 23:19:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0
14/02/16 23:19:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0
14/02/16 23:19:27 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
14/02/16 23:19:27 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-0.20-mapreduce/hadoop-core.jar
Note: /tmp/sqoop-cloudera/compile/676bc185f1efffa3b0de0a924df4a02d/sqooptest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/02/16 23:19:29 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/676bc185f1efffa3b0de0a924df4a02d/sqooptest.jar
14/02/16 23:19:29 INFO mapreduce.ExportJobBase: Beginning export of sqooptest
14/02/16 23:19:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0
14/02/16 23:19:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/16 23:19:31 INFO input.FileInputFormat: Total input paths to process : 1
14/02/16 23:19:31 INFO input.FileInputFormat: Total input paths to process : 1
14/02/16 23:19:31 INFO mapred.JobClient: Running job: job_201402162201_0013
14/02/16 23:19:32 INFO mapred.JobClient: map 0% reduce 0%
14/02/16 23:19:41 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at sqooptest.__loadFromFields(sqooptest.java:225)
at sqooptest.parse(sqooptest.java:174)
at org.apach
14/02/16 23:19:48 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_1, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at sqooptest.__loadFromFields(sqooptest.java:225)
at sqooptest.parse(sqooptest.java:174)
at org.apach
14/02/16 23:19:55 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_2, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at sqooptest.__loadFromFields(sqooptest.java:225)
at sqooptest.parse(sqooptest.java:174)
at org.apach
14/02/16 23:20:04 INFO mapred.JobClient: Job complete: job_201402162201_0013
14/02/16 23:20:04 INFO mapred.JobClient: Counters: 7
14/02/16 23:20:04 INFO mapred.JobClient: Job Counters
14/02/16 23:20:04 INFO mapred.JobClient: Failed map tasks=1
14/02/16 23:20:04 INFO mapred.JobClient: Launched map tasks=4
14/02/16 23:20:04 INFO mapred.JobClient: Data-local map tasks=4
14/02/16 23:20:04 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=29679
14/02/16 23:20:04 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/02/16 23:20:04 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/02/16 23:20:04 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/16 23:20:04 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
14/02/16 23:20:04 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 33.5335 seconds (0 bytes/sec)
14/02/16 23:20:04 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
14/02/16 23:20:04 INFO mapreduce.ExportJobBase: Exported 0 records.
14/02/16 23:20:04 ERROR tool.ExportTool: Error during export: Export job failed!
If the file is uncompressed and I work directly with the dipayan-test.txt file, this works perfectly.

I need help resolving this issue, and I would like to know whether I am missing something when working with LZO files.
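In case it is relevant, a rough way to check whether an LZO codec is configured at all is shown below (the property name is standard Hadoop; the jar path is only a guess at the CDH4 package-install default):

# is an LZO codec listed among the registered codecs?
grep -A1 io.compression.codecs /etc/hadoop/conf/core-site.xml
# is a hadoop-lzo jar installed? (path is a guess for a CDH4 package install)
ls /usr/lib/hadoop/lib/hadoop-lzo*.jar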