<!DOCTYPE html> <!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]--> <!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8"> <![endif]--> <!--[if IE 8]> <html class="no-js lt-ie9"> <![endif]--> <!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]--><head> <meta charset='utf-8'/><meta http-equiv='X-UA-Compatible' content='IE=edge'/><meta name='viewport' content='width=device-width, initial-scale=1'/><meta name='keywords' content='groovy, bom_chars, unicode, encoding'/><meta name='description' content='Handling Byte Order Mark (BOM) characters in Groovy'/><title>The Apache Groovy programming language - Blogs - Handling Byte-Order-Mark Characters in Groovy™</title><link href='../img/favicon.ico' type='image/x-ico' rel='icon'/><script src='../js/matomo.js'></script><link rel='stylesheet' type='text/css' href='../css/bootstrap.css'/><link rel='stylesheet' type='text/css' href='../css/font-awesome.min.css'/><link rel='stylesheet' type='text/css' href='../css/style.css'/><link rel='stylesheet' type='text/css' href='../css/../css/prettify.min.css'/> </head><body> <div id='fork-me'> <a href='https://github.com/apache/groovy'> <img style='position: fixed; top: 20px; right: -58px; border: 0; z-index: 100; transform: rotate(45deg);' src='../img/horizontal-github-ribbon.png'/> </a> </div><div id='st-container' class='st-container st-effect-9'> <nav class='st-menu st-effect-9' id='menu-12'> <h2 class='icon icon-lab'>Socialize</h2><ul> <li> <a href='https://groovy-lang.org/mailing-lists.html' class='icon'><span class='fa fa-envelope'></span> Discuss on the mailing-list</a> </li><li> <a href='https://twitter.com/ApacheGroovy' class='icon'><span class='fa fa-twitter'></span> Groovy on Twitter</a> </li><li> <a href='https://groovy-lang.org/events.html' class='icon'><span class='fa fa-calendar'></span> Events and conferences</a> </li><li> <a href='https://github.com/apache/groovy' class='icon'><span class='fa fa-github'></span> Source code on GitHub</a> </li><li> 
<a href='https://groovy-lang.org/reporting-issues.html' class='icon'><span class='fa fa-bug'></span> Report issues in Jira</a> </li><li> <a href='http://stackoverflow.com/questions/tagged/groovy' class='icon'><span class='fa fa-stack-overflow'></span> Stack Overflow questions</a> </li><li> <a href='http://www.groovycommunity.com/' class='icon'><span class='fa fa-slack'></span> Slack Community</a> </li> </ul> </nav><div class='st-pusher'> <div class='st-content'> <div class='st-content-inner'> <!--[if lt IE 7]> <p class="browsehappy">You are using an <strong>outdated</strong> browser. Please <a href="http://browsehappy.com/">upgrade your browser</a> to improve your experience.</p> <![endif]--><div><div class='navbar navbar-default navbar-static-top' role='navigation'> <div class='container'> <div class='navbar-header'> <button type='button' class='navbar-toggle' data-toggle='collapse' data-target='.navbar-collapse'> <span class='sr-only'></span><span class='icon-bar'></span><span class='icon-bar'></span><span class='icon-bar'></span> </button><a class='navbar-brand' href='../index.html'> <i class='fa fa-star'></i> Apache Groovy™ </a> </div><div class='navbar-collapse collapse'> <ul class='nav navbar-nav navbar-right'> <li class=''><a href='https://groovy-lang.org/learn.html'>Learn</a></li><li class=''><a href='https://groovy-lang.org/documentation.html'>Documentation</a></li><li class=''><a href='/download.html'>Download</a></li><li class=''><a href='https://groovy-lang.org/support.html'>Support</a></li><li class=''><a href='/'>Contribute</a></li><li class=''><a href='https://groovy-lang.org/ecosystem.html'>Ecosystem</a></li><li class=''><a href='/blog'>Blog posts</a></li><li class=''><a href='https://groovy.apache.org/events.html'></a></li><li> <a data-effect='st-effect-9' class='st-trigger' href='#'>Socialize</a> </li><li class=''> <a href='../search.html'> <i class='fa fa-search'></i> </a> </li> </ul> </div> </div> </div><div id='content' class='page-1'><div 
class='row'><div class='row-fluid'><div class='col-lg-3'><ul class='nav-sidebar'><li><a href='./'>Blog index</a></li><li class='active'><a href='#doc'>Handling Byte-Order-Mark Characters in Groovy™</a></li></ul></div><div class='col-lg-8 col-lg-pull-0'><a name='doc'></a><h1>Handling Byte-Order-Mark Characters in Groovy™</h1><p><div style='display:flex;padding:0.2ex'><span>Author:&nbsp;</span><i>Paul King</i></div><br/><span>Published: 2024-07-11 08:00PM</span></p><hr/><div class="paragraph"> <p>A <a href="https://www.javacodegeeks.com/remove-byte-order-mark-characters-from-file.html">recent article</a> showed how to process <a href="https://en.wikipedia.org/wiki/Byte_order_mark">Byte Order Mark (BOM)</a> characters within text files when coding in Java. In particular, those characters often need to be removed manually when processing text files. The article showed how to remove the BOM characters when using the <code>InputStream</code> and <code>Reader</code> classes as well as how to do it using <code>NIO</code> functionality. It also showed how the <code>BOMInputStream</code> class in <a href="https://commons.apache.org/proper/commons-io/">Apache Commons IO</a>, which automatically skips over the BOM characters, could be used.</p> </div> <div class="paragraph"> <p>Those examples can be run as-is in Groovy (after fixing a bug in the first example), but the (complete!) idiomatic solution in Groovy is:</p> </div> <div class="listingblock"> <div class="content"> <pre class="prettyprint highlight"><code data-lang="groovy">println new File('file.txt').text</code></pre> </div> </div> <div class="paragraph"> <p>That&#8217;s right, Groovy automatically detects the encoding and removes BOM characters when you use the <code>getText()</code> method, along with others like <code>eachLine</code>, <code>splitEachLine</code>, <code>readLines</code>, <code>withReader</code>, and <code>filterLine</code>. 
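</p> </div> <div class="paragraph"> <p>As a minimal sketch of that behavior, the following script writes a small file that starts with a UTF-8 BOM and reads it back with <code>getText()</code> and <code>readLines</code>. The temporary file name and its contents are made up for illustration and do not come from the article mentioned above; the sketch also assumes the file is encoded as UTF-8.</p> </div> <div class="listingblock"> <div class="content"> <pre class="prettyprint highlight"><code data-lang="groovy">// Hypothetical demo file: a UTF-8 BOM (bytes EF BB BF) followed by two lines of text.
def file = File.createTempFile('bom-demo', '.txt')
file.deleteOnExit()
file.bytes = '\uFEFFhello\nworld'.getBytes('UTF-8')

// On disk the BOM is still there: 3 BOM bytes + 11 bytes of text.
assert file.length() == 14

// The decoded text no longer contains the BOM character.
assert file.text == 'hello\nworld'
assert file.readLines() == ['hello', 'world']</code></pre> </div> </div> <div class="paragraph"> <p>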
The same functionality is also available through the <code>newReader</code> method on files and URLs.</p> </div> <div class="paragraph"> <p>There are also variants of these methods that let you declare the encoding explicitly. In that case, the BOM is not removed for you, so you&#8217;d need to handle the BOM characters manually.</p> </div> <div class="paragraph"> <p>Groovy&#8217;s methods like <code>getText</code> call an underlying <code>CharsetToolkit</code> class. You can also use that class directly should you wish to learn more about the encoding of a file.</p> </div></div></div></div></div><footer id='footer'> <div class='row'> <div class='colset-3-footer'> <div class='col-1'> <h1>Groovy</h1><ul> <li><a href='https://groovy-lang.org/learn.html'>Learn</a></li><li><a href='https://groovy-lang.org/documentation.html'>Documentation</a></li><li><a href='/download.html'>Download</a></li><li><a href='https://groovy-lang.org/support.html'>Support</a></li><li><a href='/'>Contribute</a></li><li><a href='https://groovy-lang.org/ecosystem.html'>Ecosystem</a></li><li><a href='/blog'>Blog posts</a></li><li><a href='https://groovy.apache.org/events.html'></a></li> </ul> </div><div class='col-2'> <h1>About</h1><ul> <li><a href='https://github.com/apache/groovy'>Source code</a></li><li><a href='https://groovy-lang.org/security.html'>Security</a></li><li><a href='https://groovy-lang.org/learn.html#books'>Books</a></li><li><a href='https://groovy-lang.org/thanks.html'>Thanks</a></li><li><a href='http://www.apache.org/foundation/sponsorship.html'>Sponsorship</a></li><li><a href='https://groovy-lang.org/faq.html'>FAQ</a></li><li><a href='https://groovy-lang.org/search.html'>Search</a></li> </ul> </div><div class='col-3'> <h1>Socialize</h1><ul> <li><a href='https://groovy-lang.org/mailing-lists.html'>Discuss on the mailing-list</a></li><li><a href='https://twitter.com/ApacheGroovy'>Groovy on Twitter</a></li><li><a href='https://groovy-lang.org/events.html'>Events and conferences</a></li><li><a 
href='https://github.com/apache/groovy'>Source code on GitHub</a></li><li><a href='https://groovy-lang.org/reporting-issues.html'>Report issues in Jira</a></li><li><a href='http://stackoverflow.com/questions/tagged/groovy'>Stack Overflow questions</a></li><li><a href='http://www.groovycommunity.com/'>Slack Community</a></li> </ul> </div><div class='col-right'> <p> The Groovy programming language is supported by the <a href='http://www.apache.org'>Apache Software Foundation</a> and the Groovy community. </p><div text-align='right'> <img src='../img/asf_logo.png' title='The Apache Software Foundation' alt='The Apache Software Foundation' style='width:60%'/> </div><p>Apache, Apache Groovy, Groovy, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation.</p> </div> </div><div class='clearfix'>&copy; 2003-2025 the Apache Groovy project &mdash; Groovy is Open Source: <a href='http://www.apache.org/licenses/LICENSE-2.0.html' alt='Apache 2 License'>license</a>, <a href='https://privacy.apache.org/policies/privacy-policy-public.html'>privacy policy</a>.</div> </div> </footer></div> </div> </div> </div> </div><script src='../js/vendor/jquery-1.10.2.min.js' defer></script><script src='../js/vendor/classie.js' defer></script><script src='../js/vendor/bootstrap.js' defer></script><script src='../js/vendor/sidebarEffects.js' defer></script><script src='../js/vendor/modernizr-2.6.2.min.js' defer></script><script src='../js/plugins.js' defer></script><script src='../js/vendor/prettify.min.js'></script><script>document.addEventListener('DOMContentLoaded',prettyPrint)</script> </body></html>