Python 2.7: how to convert a string with characters above 128 (ord > 128) to unicode

I am trying to convert a standard string that contains characters with ord > 128 to unicode. For example:

a='en métro' 
b=u'en métro' 
c = whatToDoWith(a)

so that c is exactly equal to b, in both type and value. To investigate, I wrote this test script:

# -*- coding: utf-8 -*- 

c='en métro' 
print type(c) 
print c 
d=c.decode('utf8') 
print type(d) 
print d 
a='中文' 
print type(a) 
print a 
b=a.decode('utf8') 
print type(b) 
print b

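This test behaves as expected (its output is shown further down). In my real program, however, I have txt = 'en métro', and the following call fails: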

utxt = txt.decode('utf8') 
File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode 
return codecs.utf_8_decode(input, errors, True) 
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 3: invalid continuation byte

For comparison, the test script above prints the following, which is the result I expect:

<type 'str'> 
en mÃ©tro 
<type 'unicode'> 
en métro 
<type 'str'> 
ä¸æ–‡ 
<type 'unicode'> 
中文

I cannot see what differs between my real program and the test script. The real program also has the # -*- coding: utf-8 -*- line.

Can someone point out a possible cause?


2017-12-05 Charlie

str.decode() most certainly works in your case:

# coding=utf-8 

a = "en métro" 
b = u"en métro" 
c = a.decode("utf-8") 

print(type(a)) # <type 'str'> 
print(type(b)) # <type 'unicode'> 
print(type(c)) # <type 'unicode'> 

if b == c: 
    print("b equals c!") # hooray they are equal in value 

if type(b) == type(c): 
    print("b is the same type as c!") # hooray they are of equal type
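The same check can also be written with explicit byte escapes, so that it does not depend on how the script file itself is saved. This is only a sketch; the names utf8_bytes, expected and decoded are illustrative and do not come from the question.

```python
# -*- coding: utf-8 -*-
# Sketch only: utf8_bytes, expected and decoded are illustrative names.
# Escapes are used so the encoding of this source file does not matter.

utf8_bytes = b"en m\xc3\xa9tro"   # U+00E9 encoded as UTF-8 is the two bytes C3 A9
expected = u"en m\xe9tro"         # the same text as a unicode literal

decoded = utf8_bytes.decode("utf-8")

print(type(decoded) == type(expected))  # True: both are unicode on Python 2.7
print(decoded == expected)              # True: same value
print(len(utf8_bytes))                  # 9 bytes
print(len(decoded))                     # 8 characters
```

If decode("utf-8") fails on a string that looks identical on screen, the bytes behind it are most likely not UTF-8, which is worth checking with ord() or repr().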


2017-12-06 00:17:49 zwer

Thanks. I get the same result, but this does not answer my question. – Charlie

@Charlie - then your question is not clear to me. – zwer

I have added another answer below with more details. – Charlie

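Thanks for the answer above. I get the same result, but it does not answer my question of why this works in my test.py but not in my real program.

I investigated further and found that a string read from a file differs from the same string evaluated inline: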


# -*- coding: utf-8 -*- 
c='en métro' 
print "c:" 
print type(c) 
print len(c) 
for x in c: 
    print ord(x) 
file = open('test.txt','r') 

e = file.read() 
print "\n\ne:" 
print type(e) 
print len(e) 
for x in e: 
    print ord(x) 
file.close()

I got this result:

c: 
<type 'str'> 
9 
101 
110 
32 
109 
195 
169 
116 
114 
111 


e: 
<type 'str'> 
8 
101 
110 
32 
109 
233 
116 
114 
111

I think this difference is what caused the failure in my real program. Can someone explain the reason and a solution?


2017-12-06 23:48:46 Charlie
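The two byte dumps point to a likely explanation. Inline, the accented letter appears as bytes 195 169 (0xC3 0xA9), which is its UTF-8 form, because the script is saved as UTF-8. In the file it is the single byte 233 (0xE9). In UTF-8, 0xE9 can only start a multi-byte sequence, and the following 't' is not a continuation byte, hence "invalid continuation byte". Below is a minimal sketch, assuming test.txt was saved as Latin-1 or Windows-1252 (both map 0xE9 to é, and the dump alone cannot tell them apart). The name file_bytes is illustrative and stands in for what file.read() returned.

```python
# -*- coding: utf-8 -*-
# Sketch only: file_bytes stands in for the bytes read from test.txt,
# and "latin-1" is an assumed encoding (cp1252 maps 0xE9 the same way).

file_bytes = b"en m\xe9tro"       # ord values 101 110 32 109 233 116 114 111

try:
    file_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xE9 announces a multi-byte UTF-8 sequence, but 't' cannot continue it
    utf8_ok = False

# Decode with the codec the file was actually written in
text = file_bytes.decode("latin-1")

print(utf8_ok)                    # False
print(text == u"en m\xe9tro")     # True
print(len(text))                  # 8
```

If that assumption about the file holds, there are two ways out: decode with the file's real encoding, for example with io.open('test.txt', encoding='latin-1'), which returns unicode directly on Python 2.7, or re-save the file as UTF-8 so that decode('utf8') succeeds.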
