kevin-qingzhan
Invalid byte 1 of 1-byte UTF-8 sequence

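Yesterday, while parsing an XML document with SAX, I got an "Invalid byte 1 of 1-byte UTF-8 sequence" exception whenever the XML file contained Chinese characters. Debugging never revealed the cause, so I turned to the web, found the problem, and fixed it. Thanks to the wealth of resources online.

   The XML declaration specified UTF-8, but the file content was actually saved in a different encoding, so XML files containing Chinese characters could not be read. After changing the declared encoding to "GB2312", the file parsed correctly: <?xml version="1.0" encoding="GB2312"?>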
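The mismatch described above can be reproduced without any file on disk. The following is a minimal sketch that uses the JDK's built-in DOM parser rather than dom4j; the class name EncodingMismatchDemo and its helper methods are hypothetical, and it assumes the JDK in use provides the GBK charset. The exact wording of the parser's error depends on which bytes it meets first, so the sketch only reports whether parsing succeeded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical demo class: declared encoding versus actual byte encoding.
class EncodingMismatchDemo {

    // Builds a small XML document whose declaration names declaredEncoding
    // but whose bytes are produced with actualEncoding.
    static byte[] xmlBytes(String declaredEncoding, String actualEncoding) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>"
                + "<msg>\u4e2d\u6587</msg>";
        return xml.getBytes(Charset.forName(actualEncoding));
    }

    // Returns the root element's text, or null if the parser rejects the bytes.
    static String parseText(byte[] bytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException | IOException e) {
            // Malformed byte sequences surface here.
            return null;
        }  catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Declared UTF-8, actually GBK: the situation described in the post.
        System.out.println(parseText(xmlBytes("UTF-8", "GBK")) != null);
        // Declaration and bytes agree: both variants parse.
        System.out.println(parseText(xmlBytes("GBK", "GBK")) != null);
        System.out.println(parseText(xmlBytes("UTF-8", "UTF-8")) != null);
    }
}
```

Either fix works: change the declaration to match the bytes, as done here with GB2312, or re-save the file so the bytes match the declaration.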



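My own summary:
1. Analysis and resolution of the "org.dom4j.DocumentException: Invalid byte 1 of 1-byte UTF-8 sequence." exception:
Analysis:
The exception is thrown by the reader.read(file); statement below:
SAXReader reader = new SAXReader();
Document doc = reader.read(file);

The cause of this exception:
The XML file being read is actually encoded in GBK or some other encoding, yet its declaration <?xml version="1.0" encoding="utf-8"?> specifies utf-8, so the parser reports the exception.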

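Note: according to the article "Java/J2EE中文问题终极解决之道" (The Ultimate Solution to Chinese Character Problems in Java/J2EE) found online, the cause of the encoding problem is probably this: the operating system's encoding is GBK while the XML declares utf-8, and SAXReader uses the system default encoding GBK, so an encoding conversion is needed and garbled text naturally results. More generally, the platform default applies wherever text is converted to or from bytes without an explicit charset, for example in FileWriter, FileReader, or String.getBytes(). Resolution: make the file's encoding consistent with the encoding used by the Java API that operates on the file.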
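To check which side of the mismatch is wrong, it helps to test whether the file's bytes are valid UTF-8 at all and to print the platform default charset. The following is a minimal sketch using only the JDK; the class name Utf8Check and the method isValidUtf8 are hypothetical, and the GBK sample bytes assume the JDK provides that charset.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: strict UTF-8 validation of raw bytes.
class Utf8Check {

    // True if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // The charset used by FileWriter, getBytes() and similar calls
        // when no charset is given.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Check the file named on the command line, or a GBK sample if none.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "<msg>\u4e2d\u6587</msg>".getBytes(Charset.forName("GBK"));
        System.out.println("valid UTF-8: " + isValidUtf8(data));
    }
}
```

If the check returns false for a file that declares utf-8, the file was saved in some other encoding and either the bytes or the declaration must change.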

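Resolution:
Case 1: the XML file is generated by dom4j.

Solution: use
org.dom4j.io.XMLWriter xmlWriter = new org.dom4j.io.XMLWriter(
        new FileOutputStream(fileName));
instead of
xmlWriter = new XMLWriter(new FileWriter(fileName));
so that the XML file is written in utf-8. FileWriter encodes characters with the platform default encoding, whereas an XMLWriter built on an OutputStream encodes them itself using its output format's encoding, which defaults to UTF-8.

Detailed reference 1:
"Dom4j 编码问题彻底解决" (A Thorough Fix for dom4j Encoding Problems), by lonsen
http://www.5inet.net/Develop/Java/036579,Dom4j_BianMaWenDiCheDeJieJue.aspx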
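The same idea as Case 1 can be shown with JDK classes alone: hand the writer a byte stream and name the charset explicitly, so the bytes on disk do not depend on the platform default. This is a minimal sketch and not the dom4j API; the class name Utf8XmlFileWriter and its methods are hypothetical.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: write XML text as UTF-8 regardless of platform default.
class Utf8XmlFileWriter {

    // Writes xml to file as UTF-8 bytes.
    static void writeUtf8(Path file, String xml) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            out.write(xml);
        }
    }

    // Writes to a temporary file and reports whether the bytes on disk are
    // exactly the UTF-8 encoding of xml.
    static boolean writesUtf8(String xml) {
        try {
            Path tmp = Files.createTempFile("demo", ".xml");
            try {
                writeUtf8(tmp, xml);
                return Arrays.equals(Files.readAllBytes(tmp),
                        xml.getBytes(StandardCharsets.UTF_8));
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<msg>\u4e2d\u6587</msg>";
        System.out.println(writesUtf8(xml));
    }
}
```

Whatever writer is used, the encoding named in the XML declaration has to be the one actually used to produce the bytes.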

Case 2: reader.read() throws the exception when parsing XML description content that the user entered on a JSP page.

Solution:
Convert the XML content to utf-8 before calling read (use the method overloads that accept an encoding):

public void validate(FacesContext context, UIComponent component, Object obj)
        throws ValidatorException {
    String xmldescription = (String) obj;
    // Note: getBytes() with no argument encodes with the platform default charset.
    byte[] bytes = xmldescription.getBytes();
    RelationXmlParser.isXmlOK("E:\\jiangcm\\templateXMLSchema.xsd", bytes);
    ……
}

public static boolean isXmlOK(String xsdFile, byte[] tagetXml)
        throws SAXException, IOException, DocumentException {
    SAXReader reader = new SAXReader();
    ……
    InputStream in = new ByteArrayInputStream(tagetXml);
    // Decode the bytes explicitly as utf-8 instead of relying on the default.
    InputStreamReader utf8In = new InputStreamReader(in, "utf-8");
    ……
}



My own fix: String.getBytes("utf-8")

Returning the utf-8 bytes is all it takes: the bytes then match the "utf-8" decoding applied in isXmlOK.
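The fix can be sketched with the JDK parser in place of dom4j's SAXReader. The class name StringXmlParsing and its methods are hypothetical. The first method encodes the string explicitly as UTF-8, which is what String.getBytes("utf-8") does; the second is an alternative that skips the byte conversion entirely by giving the parser a character stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: parse XML that arrives as a Java String.
class StringXmlParsing {

    // Parses the source and returns the root element's text content.
    private static String rootText(InputSource source) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source)
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    // Encodes explicitly as UTF-8 so the bytes match a utf-8 declaration.
    static String parseFromUtf8Bytes(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return rootText(new InputSource(new ByteArrayInputStream(bytes)));
    }

    // Gives the parser characters directly; no byte decoding takes place.
    static String parseFromString(String xml) {
        return rootText(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<msg>\u4e2d\u6587</msg>";
        System.out.println(parseFromUtf8Bytes(xml).equals(parseFromString(xml)));
    }
}
```

StandardCharsets.UTF_8 is used instead of the string "utf-8" only because it avoids the checked UnsupportedEncodingException; the resulting bytes are the same.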