Menu
  • HOME
  • TAGS

Inserting unicode message using sqlalchemy and mysql

python,mysql,unicode,sqlalchemy

As well as setting the storage collation, you need to tell the MySQL DBAPI to use UTF-8 for the connection charset. Otherwise MySQL typically defaults to latin1_swedish_ci (ISO-8859-1). The connection charset is usually set using charset=utf8mb4 in the connection string but I imagine URL(query={'charset': 'utf8mb4'}) would probably work also....

What happens under the hood when bytes converted to String in Java?

java,string,unicode,utf-8,byte

Not all sequences of bytes are valid in UTF-8. UTF-8 is a smart scheme with a variable number of bytes per code point, the form of every byte indicating how many other bytes follow for the same code point. Refer to this table: Bytes 1 (hex 0x01, binary 00000001) and...

Maya Python object assigned as a listType, but won't convert to string?

python,string,list,unicode,maya

You need to convert each item in the list to string datatype. print list(map(str, cubeParent)) OR print [str(i) for i in cubeParent] ...

java convert a english letter to unicode [closed]

java,unicode

It doesn't work because .next() returns a String. Instead, read the first character of the string returned. Scanner input = new Scanner(System.in); String temp = input.nextLine(); char ch = temp.charAt(0); int a = (int) ch; System.out.println(a); ...

How to find UTF-8 reference of a composite unicode character

unicode,encoding,utf-8,character-encoding

UTF-8 is a byte encoding for a sequence of individual Unicode codepoints. There is no single Unicode codepoint defined for n̂, not even when a Unicode string is normalized in NFC or NFKC formats. As you have noted, n̂ consists of codepoint U+006E LATIN SMALL LETTER N followed by codepoint...

Inserting French character into Oracle gets converted into some junk characters

oracle,unicode,plsqldeveloper

The issue was due to different values in NLS_LANGUAGE at client and server. At server it was: AMERICAN use following query to read the parameters: SELECT * FROM nls_database_parameters At client it was: AMERICAN_AMERICA.WE8MSWIN1252 In PL/SQL Developer Help->About, click on Additional Info button and scroll down. What i observed other...

Convert unicode URL to ASCII

php,unicode,encoding,ascii

The following can be used for this transformation: function convertpath ($path) { $path1 = ''; $len = strlen ($path); for ($i = 0; $i < $len; $i++) { if (preg_match ('/^[A-Za-z0-9\/?=+%_.~-]$/', $path[$i])) { $path1 .= $path[$i]; } else { $path1 .= urlencode ($path[$i]); } } return $path1; } ...

Why/how does the browser decide ☃.net goes to xn--n3h.net

url,browser,unicode,character-encoding,iri

It uses an encoding scheme called Punycode (as you've already discovered from the Python testing you've done), capable of representing Unicode characters in ASCII-only format. Each label (delimited by dots, so get.me.a.coffee.com has five labels) that contains Unicode characters is encoded in Punycode and prefixed with the string xn--. The...

write a grammar rule name in unicode [ANTLR 4]

java,parsing,unicode,antlr,antlr4

When looking at the lexer grammar for ANTLR4, you can see that lexer and parser names support certain Unicode chars: /** Allow unicode rule/token names */ ID : NameStartChar NameChar*; fragment NameChar : NameStartChar | '0'..'9' | '_' | '\u00B7' | '\u0300'..'\u036F' | '\u203F'..'\u2040' ; fragment NameStartChar : 'A'..'Z' |...

Cyrllic characters in SVG font

javascript,svg,unicode,encoding,fonts

п would be &#1087; To get the code point of a unicode character in JavaScript you can use String.prototype.codePointAt method, in your case just type this into developer console: "п".codePointAt(0) // 1087 To convert the other way around: String.fromCodePoint(1087) // "п" The format in your example, &#x... is a number...

Why python 2.7 on Windows need a space before unicode character when print?

python,python-2.7,unicode

Short answer On Windows you can't print arbitrary strings using print. There are some workarounds, as shown here: How to make python 3 print() utf8. But, despite the title of that question, you can't use this to actually print UTF-8 using code page 65001, it will repeat the last few...

python 3, unicode conversion, two \u0000 as one character

string,python-3.x,unicode

If the string you receive from C++ is the following in Python: s = b'\u00D1\u0082\u00D0\u00B5\u00D1\u0081\u00D1\u0082 test' Then this will decode it: result = s.decode('unicode-escape').encode('latin1').decode('utf8') print(result) Output: тест test The first stage converts the byte string received into a Unicode string: >>> s1 = s.decode('unicode-escape') >>> s1 'Ñ\x82еÑ\x81Ñ\x82 test' Unfortunately, the...

How can I add an icon to select box choices?

html,css,unicode

Options are hard to style. The basic reason is that they are reaching into the operating system for generation rather than being generated solely by the browser like most other website HTML elements. This is why file upload form field 'submit' buttons don't follow the same rules as any other...

█ character string indexed in python

python,string,unicode,ascii,python-2.x

Try doing myString = u"███ ███ J ██". This will make it a Unicode string instead of the python 2.x default of an ASCII string. If you are reading it from a file or a file-like object, instead of doing file.read(), do file.read().encode('utf-8-sig')....

How to display Arabic unicode text in page that retrieved from database

java,unicode,utf-8,xhtml,arabic

This line: String bankName = "\u0627\u0644\u0628\u0646\u0643 \u0627\u0644\u0645\u062a\u062d\u062f"; Is completely equivalent to this: String bankName = "البنك المتحد"; Escaping (think, for example, about \n) isn't a mechanism in-built in Java strings. It's Java compiler that performs these replacements for you. Imagine to have a text file with these two characters: \...

How to pickle and unpickle to portable string in Python 3

python,python-3.x,serialization,unicode

pickle.dumps() produces a bytes object. Expecting these arbitrary bytes to be valid UTF-8 text (the assumption you are making by trying to decode it to a string from UTF-8) is pretty optimistic. It'd be a coincidence if it worked! One solution is to use the older pickling protocol that uses...

Create File In Linux With Unicode File Name

python,linux,windows,unicode

Finally, I found the problem why I can create the file successful in Python interpreter but failed in my Python script. I found that the LANG environment of Python interpreter is "en_US.UTF8" but "C" in my Python script. import os print os.environ['LANG'] I think it is the problem that when...

Replace unicode characters with characters (Javascript)

javascript,unicode,unicode-string

@adeneo posted an option using jQuery. Here's a relevant answer I found that doesn't use jQuery. From this answer: What's the right way to decode a string that has special HTML entities in it? function parseHtmlEnteties(str) { return str.replace(/&#([0-9]{1,4});/gi, function(match, numStr) { var num = parseInt(numStr, 10); // read num...

Text only getting correct UNICODE if i echo an arabic text before CURL

php,curl,unicode

Try header('Content-type: text/html; charset=utf-8'); instead of @header('charset=utf-8'); charset=utf-8 is not a valid HTTP response header so it is not having any effect on the encoding of the page....

Create unicode character with pack

perl,unicode

Why do I get ff and not c3bf when using pack ? This is because pack creates a character string, not a byte string. > perl -MDevel::Peek -e 'Dump(pack("U", 0xff));' SV = PV(0x13a6d18) at 0x13d2ce8 REFCNT = 1 FLAGS = (PADTMP,POK,READONLY,pPOK,UTF8) PV = 0xa6d298 "\303\277"\0 [UTF8 "\x{ff}"] CUR =...

How to detect homographic text, unicode spoofing with node.js

node.js,unicode,punycode

You could try using Unicode normalization with String.prototype.normalize which is available in Node.js v0.12, but I doubt that takes care of every possible attack vector. Use UCAPI — it’s made for your exact use case....

Why can't I add a Dog Face (u+1f436) field to my object without using a String? [duplicate]

javascript,unicode

Aa the ECMAScript standard defines, valid identifiers must start with a Unicode code point with the Unicode property ID_Start. This is not the case for the poor dog. :( You may use any of these code points as first character of your identifier: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:ID_Start=Yes:]...

Python array printing in unicode and csv issue

python,csv,unicode,unicode-string

You can remove u by encoding unicodes by passing 'unicode-escape' to unicode.encode() method . you can use a list comprehension : >>> l=[u"P&G's Pamela's Diner", u'Pittsburgh', 40.451723, -79.932833] >>> [i.encode('unicode-escape') if isinstance(i,unicode) else i for i in l] ["P&G's Pamela's Diner", 'Pittsburgh', 40.451723, -79.932833] And about your second question its...

Is there any way to check whether an ordinal position contains a character or is empty?

string,python-2.7,unicode

>>> import unicodedata >>> unicodedata.category(unichr(0x08C0)) 'Cn' Cn is the category returned for a code point that has not been assigned to any character. (However, there's no guarantee it won't be assigned in a future version of Unicode.)...

How to specify string variables as unicode strings for pattern and text in regex matching?

regex,python-2.7,unicode

simply use re.match(myregex.decode('utf-8'), mytext.decode('utf-8')) ...

Handle windows-1252 and unicode in java [closed]

java,unicode,utf-8,character-encoding,bytearray

It seems that your server confuses the ISO-Latin-1 encoding with the proprietary Windows-1252 code page and the encoded data are the result of this. The Windows-1252 code page differs only at a few places from ISO-Latin-1. You can fix the data by converting them back to the bytes the server...

Working with characters based on their UTF-8 hex codes

javascript,jquery,unicode,utf-8

I suggest preprocessing the data as you grab it from the webpage instead of extracting it from the string afterwards. You can then use decodeURIComponent() to decode the percent-encoded string: decodeURIComponent('%F0%9F%98%92') Combine that with jQuery to access the data-textvalue-attribute: decodeURIComponent($(element).data('textvalue')) I created a simple example on JSFiddle. For some reason...

unicode converting in RestTemplate in Spring

java,spring,unicode,utf-8

You can use the Apache Commons Lang. There's a method called StringEscapeUtils.unescapeJava(String s) That can do it. (From http://stackoverflow.com/a/14368185/1176061)...

Behaviour unicode string in python

python,unicode

Its just because of that if you don't specify any encoding for unicode function then : unicode() will mimic the behaviour of str() except that it returns Unicode strings instead of 8-bit strings. More precisely, if object is a Unicode string or subclass it will return that Unicode string without...

LXML to write in unicode?

python,unicode,lxml

Yes you can pass an encoding to etree.tostring method using the encoding parameter: etree.tostring(node, pretty_print=True, encoding='unicode') From the etree.tostring docs: You can also serialise to a Unicode string without declaration by passing the unicode function as encoding (or str in Py3), or the name 'unicode'. This changes the return value...

fonts - how does the text gets displayed when there is no font associated to it

unicode,fonts

Every UI that displays string data has to support fonts in some manner, whether the font is provided by the OS, or assigned by the application. So there is always a font associated with a UI that is displaying string data to a user. As long as that font supports...

Salesforce custom button returning 'Unexpected Token ILLEGAL?

javascript,unicode,salesforce

Seems like you have a special hidden character in following line contactToUpdate.Last_Advisor_Touch__c = new Da​ te(); in word Date If you just rewrite it it should work. Specifically you have the infamous ZERO WIDTH SPACE between Da and te probably comes from this. To eliminate such things you can use...

Python: difficulty converting ascii to unicode

python,unicode,encoding,utf-8

I'll assume your remote "source page" contains more than just ASCII otherwise your comparison will already work as is (ASCII is now a subset of UTF-8. I.e. A in ASCII is 0x41, which is the same as UTF-8). You may find Python Requests library easier as it will automatically decode...

Calculate bytes of the unicode character in python

python,unicode,byte

Suppose you are reading the unicode characters from file into a variable called byteString. Then you can do the following: unicode_string = byteString.decode("utf-8") print len(unicode_string) ...

how to put characters combination as constant command into iOS framework

ios,objective-c,unicode,frameworks,ascii

\u universal characters are restricted by ISO 10646 to exclude certain characters. Of particular interest to you is ESC. But you can encode this in octal: NSString *message2 = @"\033E1"; Note that you typically do not put these in the header file. You typically implement this this way: MYMessages.h //...

Formatting Errors with tail

linux,bash,unicode,cygwin,tail

The file is in UTF-16 format, which uses 2 8-bit bytes to represent most characters (and 4 8-bit bytes for some characters). Each of the 128 ASCII characters is represented as 2 bytes, a zero byte and a byte containing the actual character value. The \xff\xfe sequence at the start...

Python: Transform a unicode variable into a string variable

python,unicode,casting,web-crawler,unicode-string

Use a RegEx to extract the price from your Unicode string: import re def reducePrice(price): match = re.search(r'\d+', u' $500 ') price = match.group() # returns u"500" price = str(price) # convert "500" in unicode to single-byte characters. return price Even though this function converts Unicode to a "regular" string...

Characters and Strings in Swift

ios,string,swift,unicode,character

When a type isn't specified, Swift will create a String instance out of a string literal when creating a variable or constant, no matter the length. Since Strings are so prevalent in Swift and Cocoa/Foundation methods, you should just use that unless you have a specific need for a Character—otherwise...

PHP - length of string containing emojis/special chars

php,unicode,unicode-string

Your functions are all counting different things. Graphemes: 👍 🏿 ✌ 🏿️ @ m e n t i o n 13 ----------- ----------- -------- --------------------- ------ ------ ------ ------ ------ ------ ------ ------ ------ Code points: U+1F44D U+1F3FF U+270C U+1F3FF U+FE0F U+0020 U+0040 U+006D U+0065 U+006E U+0074 U+0069 U+006F U+006E...

Tabulating characters with diacritics in R

r,unicode,nlp,linguistics

Try library(stringi) table(stri_split_boundaries(word, type='character')) #a n n̥ #2 1 1 Or table(strsplit(word, '(?<=\\P{Ll}|\\w)(?=\\w)', perl=TRUE)) #a n n̥ #2 1 1 ...

Need code for removing all unicode characters in vb6

unicode,vb6,unicode-string

If this is UTF-16 text (as normal VB6 String values all are) and you can ignore the issue of surrogate pairs, then this is fairly quick and reasonably concise: Private Sub DeleteNonAscii(ByRef Text As String) Dim I As Long Dim J As Long Dim Char As String I = 1...

How does Python 2 represent Unicode internally?

python,unicode

This statement simply means that there is underlying C code that uses both these encodings and that depending on the circumstances, either variant is chosen. Those circumstances are typically user choice, compiler and operating system. Now, for the possible rationale for that, there are reasons not to use UTF-8: First...

How to make Python Interactive Shell print cyrillic symbols?

python,shell,unicode,character-encoding,cyrillic

You are seeing the repr representation of the unicode strings, if you loop over the list or index and print each string you will see the output you want. In [4]: terms Out[4]: [u'\u041f\u0430\u0432\u0435\u043b', u'\u0445\u043e\u0434\u0438\u0442', u'\u0434\u043e\u043c\u043e\u0439'] # repr In [5]: print terms[0] # str Павел In [6]: print terms[1] ходит...

How can I type an actual lamda character in Visual Studio?

visual-studio,unicode

A good question, and one that bugged me into trying to get this to work. I do second OP's comment that you can compile code with lambda characters for variables just fine. However, after an hour of trying various methods I knew of and found for typing special characters (using...

Testing for unicode escaped strings

python,unicode,python-2.x

What you have is a double-encoded string. It's already been decoded once to create Unicode, but you need to decode it a second time. To do this, we take advantage of the fact that Unicode takes its first 256 code points from the latin-1 character set. That lets us convert...

Punycode for Unicode query parameter

java,url,unicode,punycode

What you did wrong is use punycode. Punycode is used for domain names, including the domain-name part of a URL, only. Other parts of a URL, including the query-parameter part, use Percent Encoding also known as URL encoding or URI encoding, and that is what Chrome is doing; this encodes...

Java Unicode Variable names Devanagari

java,unicode

Save the file in Unicode encoding, and use javac -encoding Unicode program.java to compile it ...

BeautifulSoup parsing unicode giving variable results

python-2.7,unicode,beautifulsoup,ipython-notebook

Another case of RTFM, specifying the specific parser to use seems to solve the problem. soup = BeautifulSoup(content, 'html.parser') or soup = BeautifulSoup(content, 'xml') ...

Is executing C++ code in comments with certain Unicode characters allowed, like in Java?

c++,c++11,unicode,comments

If I read this translation phase reference correctly, then the sequence // \u000d some code here is mapped in phase 1 to itself, i.e. the parser does not translate or expand \u000d. Instead the translation of such sequences happens in phase 5, which is after the comments are replaced by...

How is Levenshtein Distance calculated on Simplified Chinese characters?

python,string,unicode,levenshtein-distance,edit-distance

According to its documentation, it supports unicode: It supports both normal and Unicode strings, but can't mix them, all arguments to a function (method) have to be of the same type (or its subclasses). You need to make sure the Chinese characters are in unicode though: In [1]: from Levenshtein...

Unicode in Android

android,unicode

I think its a font issue .. the font used by Android studio supports that character while the android device (propably Robot font family) doesn't include that glyph.. solution would be to use proper font. Here how to add custom typeface to your project : http://stackoverflow.com/a/27588966/2267723. Here is the list...

Need to convert Java String with è to \u00E8 using Java

java,unicode

Taking forward Tagir Valeev idea of picking up from java.util.Properties: package empty; public class CharsetEncode { public static void main(String[] args) { String s = "resumè"; System.out.println(decompose(s)); } public static String decompose(String s) { return saveConvert(s, true, true); } private static String saveConvert(String theString, boolean escapeSpace, boolean escapeUnicode) { int...

Python length of unicode string confusion

python,unicode

You have 5 codepoints. One of those codepoints is outside of the Basic Multilingual Plane which means the UTF-16 encoding for those codepoints has to use two code units for the character. In other words, the client is relying on an implementation detail, and is doing something wrong. They should...

Python 3 UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d

python,unicode

In Python 3, files are opened text (decoded to Unicode) for you; you don't need to tell BeautifulSoup what codec to decode from. If decoding of the data fails, that's because you didn't tell the open() call what codec to use when reading the file; add the correct codec with...

Using a context processor in conjunction with Jinja template variables

python,unicode,flask,jinja2

You could convert the number first: def format_price(amount): return u'{0:.0f}'.format(int(amount)) Or perhaps: def format_price(amount): return u'{0:.0f}'.format(100*int(amount)) ...

PHP - SQL query containing unicode is returning NULL for some reason

php,mysql,unicode,null

As far as I remember, mysqli_query($con, "SET NAMES 'utf8'"); was required, like this: function gettranslation($word){ return $this->query("SELECT trans FROM `dictionary` WHERE `word` LIKE '$word'"); } function query($query){ //$result=mysqli_query($this->conn, "set character_set_results='utf8'"); mysqli_query($con, "SET NAMES 'utf8'"); $result=mysqli_query($this->conn, $query); return $row = mysqli_fetch_row($result)[0]; } ...

ctrl+G in erl doesn't work

unicode,encoding,utf-8,erlang,docker

Fixed the problem, needed export TERM=linux.

How to map a arabic character to english string using python

python,unicode,arabic

Use unicodedata; (note: This is Python 3. In Python 2 use u'ع' instead) In [1]: import unicodedata In [2]: unicodedata.name('a') Out[2]: 'LATIN SMALL LETTER A' In [6]: unicodedata.name('ع') Out[6]: 'ARABIC LETTER AIN' In [7]: unicodedata.name('ع').split()[-1] Out[7]: 'AIN' The last line works fine with simple letters, but not with all Arabic...

How do I search for all rows containing a given unicode character in Postgres

sql,postgresql,unicode

ANSI SQL answer, may or may not work with Postgresql. SELECT * FROM myTable WHERE desc LIKE U&'%\2028%' ...

PHPExcel file “corrupt” when Unicode in data

excel,unicode,phpexcel

The solution I'm going to use (though not perfect): function PHPExcel_UnicodeFix($value) { //data seems to be UTF-8, despite internal encoding // $iconv = iconv_get_encoding(); // $it = $iconv['internal_encoding']; //such that $it == 'ISO-8859-1' //but iconv($it,"ASCII//TRANSLIT",$value) doesn't work (data is already UTF-8?) //Excel does not accept UTF-8? $value_fixed = iconv("UTF-8","ASCII//TRANSLIT",$value); return...

Formatting string with Regex in java, how do I convert captured group to special character?

java,regex,unicode,special-characters

Since you need to manipulate the matched text before replacement, you need to use the low-level API in Matcher class to perform matching and replacement manually. static String handleEscape(String input) { Pattern p = Pattern.compile("\\$'\\\\x(\\w\\w)'"); Matcher m = p.matcher(input); StringBuffer result = new StringBuffer(); while (m.find()) { m.appendReplacement(result, Character.toString((char) Integer.parseInt(m.group(1),...

Perl internal representation of unicode string

perl,unicode,encoding,mojolicious

update: There is nothing wrong with your program, you are getting été just like you wanted, its simply Dumped as the perl unicode string "\xE9t\xE9", they're the same thing, perl unicode strings aren't stored in memory as utf8, they're decoded from utf into unicode codepoints/ordinals, utf8 is just a way...

Any way to write X^T with just unicode?

unicode

Here you can find a list of supported Latin alphabet superscripts including T, which is %u1D40. Xᵀ For the vector y ⃗ you can use %u20D7 (combining right arrow above). Here and here you can see that those chars can be part of Java identifiers (though %u20D7 may not start...

how to print unicode character U-1F4A9 'pile of poo' in ruby

ruby,unicode,rubymine-7

Unicode code points with more than four hex digits must be enclosed in curly braces: puts "\u{1f4a9}" # => 💩 This is pretty poorly-documented, so don't feel bad about not figuring it out. A nice thing about the curly brace syntax is that you can embed multiple code points separated...

Displaying unicode characters in Python 3

python,list,unicode

In Python 3, this will work directly: >>> [u'\u0413', u'\0434', u'\043b'] ['Г', '#4', '#b'] In Python 2, you can use the print statement to print individual values: >>> for val in [u'\u0413', u'\0434', u'\043b']: ... print val ... Г #4 #b ...

selenium webdriver - xpath locator not working if element's text contains Unicode Characters

selenium,xpath,unicode,selenium-webdriver,webdriver

Am I missing something here? Yes, I think so: <div class="menuitem-content">Français</div> the "a" is missing driver.findelement(By.XPath("//div[text()='Françis']")); EDIT: At least in a Java environment Webdriver can handle Unicode. this works for me (driver in this case being an instance of FirefoxDriver): driver.get("https://fr.wikipedia.org/wiki/Mot%C3%B6rhead"); WebElement we = driver.findElement(By.xpath("//h1[contains(., Motörhead)]")); System.out.println(driver.getTitle() + "...

mysql() refuses to read characters in php

php,mysql,unicode

To convert this Unicode string to utf8 I discovered a php function utf8_decode which after you pass the string with the unicode characters, any weird character will be set to "?" So I can identify them easier.

Why Unicode characters are not displayed properly in terminal with GCC?

c,gcc,unicode

wprintf is a version of printf which takes a wide string as its format string, but otherwise behaves just the same: %c is still treated as char, not wchar_t. So instead you need to use %lc to format a wide character. And since your strings are ASCII you may as...

Parsing string containing Unicode character names

python,unicode

You can use unicode_escape encoding: In Python 2.x: >>> u'M\\N{AMPERSAND}M\\N{APOSTROPHE}s'.decode('unicode-escape') u"M&M's" In Python 3.x: >>> u'M\\N{AMPERSAND}M\\N{APOSTROPHE}s'.encode().decode('unicode-escape') "M&M's" ...

Search for unicode values in character string

r,unicode,grep,gsub

I'm assuming by "unicode character" you just mean non-ASCII characters. Character codes can mean different things depending on encodings. R represents values outside of the current encoding with a special \U sequence. Note that neither the slash nor the letter "U" actually appear in the real data. This is just...

Convert hex values to characters

perl,unicode

The code you posted is correct. The problem appears to be that you forgot to tell Perl to encode your output. This is normally done using use open ':std', ':encoding(UTF-8)'; ...

Open File with Non ASCII Characters

c++,visual-studio-2010,unicode

wchar_t is not portable across multiple platforms, as it is 2 bytes (UTF-16) on some platforms (Windows) but is 4 bytes (UTF-32) on other platforms (Linux, etc). That is what the site is warning you about. In your particular case, you are only focusing on Windows, so std::wstring is perfectly...

Font is right. Why can't I get this unicode character to display in this C# console app?

c#,.net,unicode,console-application

You need to set input encoding as well. Console.InputEncoding = System.Text.Encoding.Unicode; Updated! using System; class Program { static void Main(string[] args) { Console.OutputEncoding = System.Text.Encoding.Unicode; Console.InputEncoding = System.Text.Encoding.Unicode; string s; s = Console.ReadLine(); Console.WriteLine(s); } } ...

Haskell: quoteFile fails on text file with “invalid byte sequence” on unicode characters

linux,haskell,unicode,encoding,utf-8

Finally, I've found that my virtual locale was not properly set, e.g. locale command showed me that all LANG variables are set to POSIX. Exporting LANG variable to command is the quickest workaround (bash example): export LANG=en_US.uft8 cabal build However, likely you need to have en_US locale installed, Debian manual...

How do I match “i” with Turkish i in java?

java,unicode,normalization

If you print out the hex values of the characters you're seeing, the difference is clear: İ 0x130 - Normalized : İ 0x49 0x307 - Lower case: i̇ 0x69 0x307 - Lower case Normalized : i̇ 0x69 0x307 I 0x49 - Normalized : I 0x49 - Lower case: i 0x69...

How to create a Persian file.txt and then explode it?

php,string,unicode,character-encoding,explode

Make sure to save the text file in the UTF-8 encoding. (Use UTF-8 for your HTML output and database connection as well, to match.) If you save a file as the encoding that Microsoft misleadingly call “Unicode” you will actually get UTF-16LE, a two-byte, non-ASCII-compatible encoding that is generally a...

Export (Android/Java) string data in with extended characters for import into Excel

java,android,excel,unicode,utf-8

Excel assumes csv files are in an 8-bit code page. To get Excel to parse your csv as UTF-8, you need to add a UTF-8 Byte Order Mark to the start of the file. Edit: If you're in Western Europe or US, Excel will likely use Windows-1252 character set for...

java linked list works except in one instance

java,unicode,junit

I think I found the real problem, and it has to do with character encoding. I verified that your constructor is good and length = 55300 after the linked list is created. I then stepped into your overridden toString() method. The third iteration through the for loop fails.....curr.letter IS equal...

Converting Unicode codepoints to UTF-8 in C using iconv

c,unicode,encoding,utf-8,iconv

Two problems: Since you’re using UTF-32, you need to specify 4 bytes. The “lower-case lambda is a 16-bit character (0x3BB = 955)” comment isn’t true for a 4-byte fixed-width encoding; it’s 0x000003bb. Set size_t in_size = 4;. iconv doesn’t add null terminators for you; it adjusts the pointers it’s given....

Using sprintf with unicode characters

c,unicode

Universal character names (like \U0001F0A1) are resolved by the compiler. If you use one in a format string, printf will see the UTF-8 representation of the character; it has no idea how to handle backslash escapes. (The same is true of \n and \x2C; those are single characters resolved by...

Allow whitespace, unicode letters, digits, underscore, dash AND comma?

php,regex,unicode,preg-match

This is because - is used for ranges in square brackets([] -> character classes). And as from the manual: indicates character range, example: 0-9 or a-z. So as long as you put it at the end you're fine and don't have to escape it. In all other cases you have...

erlang os:cmd() command with UTF8 binary

unicode,encoding,erlang,utf

Its because Erlang reads your source files like latin1 by default, but on newer versions of erlang you can set your files to use unicode. %% coding: utf-8 -module(test). -compile(export_all). test() -> COMMAND = "touch ჟანიweł", os:cmd(COMMAND). and then compiling and executing the module works fine rorra-air:~ > erl Erlang/OTP...

JavaCC and Unicode issue. Why \u696d cannot be managed in JavaCC although it belong to the range “\u4e00”-“\u9fff”

java,unicode,compiler-construction,antlr,javacc

There is nothing wrong with your token fragment and nothing wrong with JavaCC. The problem lies elsewhere. Here is a JavaCC specification made by copying and pasting your problem code into JavaCC. options { static = true; debug_token_manager = true ; } PARSER_BEGIN(MyNewGrammar) package funnyunicode; import java.io.StringReader ; public class...

How do I compare each character of a String while accounting for characters with length > 1?

java,string,unicode,character-encoding,utf-16

int hanCodePoint = "𩸽".codePointAt(0); for (int i = 0; i < string.length();) { int currentCodePoint = string.codePointAt(i); if (currentCodePoint == hanCodePoint) { // do something here. } i += Character.charCount(currentCodePoint); } ...

Why does Python 3 output \xe3, an extra char?

python,python-3.x,unicode,utf-8

You need to use a console or terminal that supports all of the characters that you want to print. When printing in the interactive console, the characters are encoded to the correct codec for your console, with any character that is not supported using the backslashreplace error handler to keep...

Handling count of characters with diacritics in R

r,unicode,character-encoding,nlp,linguistics

Here is my solution. The idea is that phonetic alphabets can have an unicode representation and then: Use Unicode package; it provide the function Unicode_alphabetic_tokenizer that: Tokenization first replaces the elements of x by their Unicode character sequences. Then, the non- alphabetic characters (i.e., the ones which do not have...

Excel Function - Convert unicode to ascii

excel,unicode,excel-formula

This sound an awful lot like you failed to import the data correctly. If you still have the original data, I suggest you import it again and use 65001: Unicode (UTF-8). If you do not have the original data, you can still trick Excel into importing your data again. I...

Python writing to CSV… TypeError: coercing to Unicode: need string or buffer, file found

python,osx,csv,unicode

f2 is already an open file object; you called the open() function: f2 = open(os.path.expanduser("~/Documents/Test/blah/outputfile.csv")) You cannot then pass that to open(). I think you meant it to be just a filename: f2 = os.path.expanduser("~/Documents/Test/blah/outputfile.csv") with open(f2, 'w') as fp: ...

Change lowercase and uppercase of characters in java

java,unicode,mapping,uppercase,lowercase

This should do the trick: import java.util.HashMap; import java.util.Map; class MyString { String string; static final Map<Character, Character> toLowerCaseMap, toUpperCaseMap; static { toLowerCaseMap = new HashMap<>(); toLowerCaseMap.put('I', '|'); toUpperCaseMap = new HashMap<>(); toUpperCaseMap.put('b', 'P'); } MyString(String string) { this.string = string; } String toLowerCase() { char[] chars = string.toCharArray(); for...

Java Unicode character displaying box instead of Runic letter

java,string,unicode,graphics

The font that Java defaults to varies from platform to platform. To ensure that a unicode character is always displayed properly, you should set to a font that you are sure contains the glyph. You can set the font as such, before calling the drawString() method Font font = new...