Create unicode character with pack

Tag: perl,unicode

I am trying to understand how Perl handles Unicode.

use feature qw(say);
use strict;
use warnings;

use Encode qw(encode);

say unpack "H*", pack("U", 0xff);
say unpack "H*", encode( 'UTF-8', chr 0xff );

Output:

ff
c3bf

Why do I get ff and not c3bf when using pack?

Best How To :

Why do I get ff and not c3bf when using pack ?

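This is because pack("U", ...) creates a character string, not a byte string. The string holds the single character U+00FF, which Perl stores internally as the two UTF-8 bytes c3 bf with the UTF8 flag set: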

> perl -MDevel::Peek -e 'Dump(pack("U", 0xff));'
SV = PV(0x13a6d18) at 0x13d2ce8
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,pPOK,UTF8)
  PV = 0xa6d298 "\303\277"\0 [UTF8 "\x{ff}"]
  CUR = 2
  LEN = 32

Hence unpack("H*") does not look at the internal bytes of that string, but at its character values (each truncated to 8 bits), and the only character here is 0xff. If you encode the string to UTF-8 first:

say unpack "H*", encode("UTF-8", pack("U", 0xff));

Then you get the expected c3bf.
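
The difference between the two strings can be made visible side by side. This is a minimal sketch rather than anything from the original answer; the helper name hex_of and the variable names are made up for illustration, and it relies only on pack, unpack, length and Encode's encode as used above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: hex dump of a string as unpack "H*" sees it.
sub hex_of { return unpack "H*", $_[0] }

my $char_string = pack "U", 0xff;                  # one character, U+00FF
my $byte_string = encode('UTF-8', $char_string);   # its UTF-8 encoding

printf "%d character(s): %s\n", length($char_string), hex_of($char_string);
printf "%d byte(s):      %s\n", length($byte_string), hex_of($byte_string);
```

This prints "1 character(s): ff" followed by "2 byte(s):      c3bf": length and unpack both work on characters, so the UTF-8 bytes only show up once the string has been explicitly encoded.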

See also this thread.

Regex in Perl Uninitialized $1

regex,perl

$1 is the value captured by the first capture (()), but you have no captures in your pattern. Fix: /(?<=File `..\/)(.*)(?=')/ Simplified: m{File `../(.*)'} More robust: m{File `../([^']*)'} ...

unable to understand qr interpolation

regex,perl

The chapter of the Perl documentation that deals with this is called perlre. In the extended pattern matching section it explains this. Starting in Perl 5.14, a "^" (caret or circumflex accent) immediately after the "?" is a shorthand equivalent to d-imsx . Flags (except "d" ) may follow the...

-M Script start time minus file modification time, in days

perl,perldoc

I think this explains what you are seeing perl -E 'say "START TIME",$^T; qx(touch $_), sleep(5), say -M for "/tmp/file"; say "STAT ON FILE", (stat(_))[9]' output when I ran it START TIME1434460114 0 STAT ON FILE1434460114 1) script starts $^T is set to 1434460114 2) almost immediately the file "/tmp/file"...

Why can't I add a Dog Face (u+1f436) field to my object without using a String? [duplicate]

javascript,unicode

As the ECMAScript standard defines, valid identifiers must start with a Unicode code point with the Unicode property ID_Start. This is not the case for the poor dog. :( You may use any of these code points as the first character of your identifier: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:ID_Start=Yes:]...

Working with characters based on their UTF-8 hex codes

javascript,jquery,unicode,utf-8

I suggest preprocessing the data as you grab it from the webpage instead of extracting it from the string afterwards. You can then use decodeURIComponent() to decode the percent-encoded string: decodeURIComponent('%F0%9F%98%92') Combine that with jQuery to access the data-textvalue-attribute: decodeURIComponent($(element).data('textvalue')) I created a simple example on JSFiddle. For some reason...

selenium webdriver - xpath locator not working if element's text contains Unicode Characters

selenium,xpath,unicode,selenium-webdriver,webdriver

Am I missing something here? Yes, I think so: <div class="menuitem-content">Français</div> the "a" is missing driver.findelement(By.XPath("//div[text()='Françis']")); EDIT: At least in a Java environment Webdriver can handle Unicode. this works for me (driver in this case being an instance of FirefoxDriver): driver.get("https://fr.wikipedia.org/wiki/Mot%C3%B6rhead"); WebElement we = driver.findElement(By.xpath("//h1[contains(., Motörhead)]")); System.out.println(driver.getTitle() + "...

How to display Arabic unicode text in page that retrieved from database

java,unicode,utf-8,xhtml,arabic

This line: String bankName = "\u0627\u0644\u0628\u0646\u0643 \u0627\u0644\u0645\u062a\u062d\u062f"; Is completely equivalent to this: String bankName = "البنك المتحد"; Escaping (think, for example, about \n) isn't a mechanism in-built in Java strings. It's Java compiler that performs these replacements for you. Imagine to have a text file with these two characters: \...

Reading from DATA file handle

performance,perl

I did some benchmarking against three methods. I used an external file for reading (instead of __DATA__). The file consisted of 3 million lines of the exact data you were using. The methods are slurping the file, reading the file line-by line, and using Storable as Sobrique mentioned above. Each...

java convert a english letter to unicode [closed]

java,unicode

It doesn't work because .next() returns a String. Instead, read the first character of the string returned. Scanner input = new Scanner(System.in); String temp = input.nextLine(); char ch = temp.charAt(0); int a = (int) ch; System.out.println(a); ...

Can't locate module(s) using Mojo::DOM

perl,dom,mojolicious,mojo

It's very difficult to debug a single long chained statement like that, and you're much better off splitting it into individual steps Passing a parameter to the dom method is the same as calling find with that parameter on the DOM object. find returns a Mojo::Collection, which makes sense as...

Need to convert Java String with è to \u00E8 using Java

java,unicode

Taking forward Tagir Valeev idea of picking up from java.util.Properties: package empty; public class CharsetEncode { public static void main(String[] args) { String s = "resumè"; System.out.println(decompose(s)); } public static String decompose(String s) { return saveConvert(s, true, true); } private static String saveConvert(String theString, boolean escapeSpace, boolean escapeUnicode) { int...

How to pass a hash as optional argument to -M in command line

perl,hash,package,command-line-interface

Looking at perlrun use: perl -Mfeature=say "-Mconstant {c1 => 'foo', c2 => 'bar'}" -e"say c1,c2" ...

How to extract some text from an HTML doc using Web::Query

perl

The dot selector denotes class selections, which is not what you intend for the second div and h3. For these you want descendant. The correct syntax is; my $subject = $parts->find( 'div.subject > div > h3' )->text; # Which outputs # subjectfinal $VAR1 = '@if version_after macro is illogical'; For...

Why this exclusion not working for long sentences?

text-processing,perl

You're using the wrong syntax: [] is used to match a character class, while here you're trying to match a number of occurrences of ., which can be done using {}: perl -ne 'print unless /.{240,}/' input.txt > output.txt Also, as suggested by salva in the comments, the pattern can...

Creating a sequence of unique random digits

arrays,perl,foreach,unique

To generate a list of four unique non-zero decimal digits, use shuffle from List::Util and pick the first four, like this: use strict; use warnings; use 5.010; use List::Util 'shuffle'; my @unique = (shuffle 1 .. 9)[0..3]; say "@unique"; output 8 5 1 4 There's no need to seed the...

What does this horribly ugly (yet somehow secretly beautiful) Perl code do?

perl,formatting,deobfuscation

Passing the code to Google returns this page that explains qrpff is a Perl script created by Keith Winstein and Marc Horowitz of the MIT SIPB. It performs DeCSS in six or seven lines. The name itself is an encoding of "decss" in rot-13. See also qrpff explained....

Behaviour unicode string in python

python,unicode

This is because, if you don't specify an encoding for the unicode function, then: unicode() will mimic the behaviour of str() except that it returns Unicode strings instead of 8-bit strings. More precisely, if object is a Unicode string or subclass it will return that Unicode string without...

How to specify string variables as unicode strings for pattern and text in regex matching?

regex,python-2.7,unicode

simply use re.match(myregex.decode('utf-8'), mytext.decode('utf-8')) ...

Looping variables

perl,scripting

As per my comment, most of the functionality of your program is provided by the Math::Vector::Real module that you're already using It looks like you want the angle in degrees between successive pairs of 3D vectors in your file. This code creates vectors from each line in the file until...

Command line arguments in Perl

perl

$! and $_ are global variables. For more information you can read here $_ The default input and pattern-searching space $! If used in a numeric context, yields the current value of the errno variable, identifying the last system call error. If used in a string context, yields the corresponding...

Taking multiple header (rows matching condition) and convert into a column

bash,perl,command-line,awk,sed

In awk awk -F, 'NF==1{a=$0;next}{print a","$0}' file Checks if the number of fields is 1, if it is it sets a variable to that and skips the next block. For each line that doesn't have 1 field, it prints the saved variable and the line And in sed sed -n...

Plain text emails displayed as attachment on some email clients

perl,email,attachment,mime,plaintext

I think this is the closest thing you're going to get to an answer. The documentation for MIME::Lite says this: MIME::Lite is not recommended by its current maintainer. There are a number of alternatives, like Email::MIME or MIME::Entity and Email::Sender, which you should probably use instead. MIME::Lite continues to accrue...

Replace unicode characters with characters (Javascript)

javascript,unicode,unicode-string

@adeneo posted an option using jQuery. Here's a relevant answer I found that doesn't use jQuery. From this answer: What's the right way to decode a string that has special HTML entities in it? function parseHtmlEnteties(str) { return str.replace(/&#([0-9]{1,4});/gi, function(match, numStr) { var num = parseInt(numStr, 10); // read num...

Windows/Linux child process STDIN differences

linux,windows,perl,process,stdin

This isn't a Windows versus Linux thing. You simply picked two awful examples. type con reads from the console, not from STDIN. This can be seen using type con <nul. cat is extremely unusual. Buffering, on either system, is completely up to the individual application, but almost all applications work...

Perl : Display perl variable awk sed echo

perl

You can do that using a Perl regex pattern my $calculate; ($calculate = $1) =~ tr~,~/~ if $value =~ /SP=[^:]*:([^;]*)/; ...

Reducing code verbosity and efficiency

perl

There is a famous quote: "Premature optimisation is the root of all evil" - Donald Knuth It is almost invariably the case that making code more concise really doesn't make much difference to the efficiency, and can cause significant penalties to readability and maintainability. Algorithm is important, code layout ......

calling cgi script from other cgi script

perl,cgi

A CGI program normally expects to fetch the parameter values from the environment variable QUERY_STRING. Passing parameter values on the command line is a debugging facility, and works only when the program is run from the command prompt You could try something like this my $result = do { local...

Find numbers in a file and change their value with perl

regex,perl

You can just match the numbers that are less than one.. and replace with 1: perl -pe 's/\b0\.\d+/1/g' file See DEMO...

Deleting upto a line

bash,perl,shell,sed,scripting

Try this: grep -oE '[0-9.-]+$' file or awk '{print $NF}' file ...

Capture tee's argument inside piped Perl execution

perl,unix

The short answer is: you can't. tee is a separate process with its own arguments. There is no way to access these arguments from that process. (Well, I suppose you could run ps or something.) The point of tee is to take STDOUT write some of it to a...

Opening multiple files in perl array

arrays,perl

You just need to enclose the code that handles a single file in a loop that iterates over all of the log files You should also reconsider the amount of comments that you use. It is far better to write your code and choose identifiers so that the behaviour is...

problems copying shared hash in perl threads

multithreading,perl

Two threads are iterating over the same hash at the same time, so they are both changing its iterator. You need to make sure that no more than one thread uses the hash's iterator at a time. I'd remove all those :shared and use Thread::Queue::Any....

Perl Debugging Using Flags

perl,debugging,script-debugging

In perl, compile time is also run time. So there's really not a great deal of advantage in using #define type statements. My usual trick is: my $debug = 0; $debug += scalar grep { $_ eq "-d" } @ARGV; (GetOpt is probably honestly a better plan though) And then use: print...

Perl would I use fc over uc?

perl

fc is used for case-insensitive comparisons. uc($a) cmp uc($b) # XXX lc($a) cmp lc($b) # XXX fc($a) cmp fc($b) # ok An example where this makes a difference: lc uc fc -- -- -- -- ss ss SS ss SS ss SS ss ß ß SS ss ẞ ß ẞ...

Version-dependent fallback code

perl

Many recent features are not forward-compatible. As you've seen, you'll get compile-time errors using a feature that is too new for the version of perl you are running. Using block-eval won't help, because the contents of the block need to be valid for the current perl interpreter. You are on...

How to match and remove the content preceding it from a file in unix [closed]

mysql,perl,sed,solaris

with GNU sed sed -n '1,/-- Final view structure for view `view_oss_user`/p' this will print lines from 1 till pattern found, others will not be printed or if you want to exclude pattern line then sed -n '1,/-- Final view structure for view `view_oss_user`/p' | sed '$d' ...

Why Filter::Indent::HereDoc complain when blank line in middle of HereDoc

perl,heredoc

The documentation says: If there is no terminator string (so the here document stops at the first blank line), then enough whitespace will be stripped out so that the leftmost character of the document will be flush with the left margin, e.g. print <<; Hello, World! # This will print:...

Perl: Multiply loops, 1 hash and regex

arrays,regex,perl,hash,perl-data-structures

You need to append to the file when you output, meaning use ">>" instead of ">", which overwrites the file: system("chage -l $_ >> $pwdsett_dump"). As you are running it in a loop, you are overwriting the file each time the loop executes. Use: foreach (@Usernames) { system("chage -l $_ >> $pwdsett_dump") }...

Perl: Using Text::CSV to print AoH

arrays,perl,csv

Pretty fundamentally - CSV is an array based data structure - it's a vaguely enhanced version of join. But the thing you need for this job is print_hr from Text::CSV. First you need to set your header order: $csv->column_names (@names); # Set column names for getline_hr () Then you can...

Check for decimal point and add it at the end if its not there using awk/perl

regex,perl,shell,awk

In awk Just sub for those fields awk -F, -vOFS="," '{sub(/^[^\.]+$/,"&.",$6);sub(/^[^\.]+$/,"&.",$11)}1' file or sed sed 's/^\(\([^,]*,\)\{5\}[^.,]\+\),/\1./;s/^\(\([^,]*,\)\{10\}[^.,]\+\),/\1./' file ...

How to copy matches from an extremely large file if it contains no newlines?

python,linux,bash,perl,grep

#!/usr/bin/perl use strict; use warnings; use constant BLOCK_SIZE => 64*1024; my $buf = ""; my $searching = 1; while (1) { my $rv = read(\*STDIN, $buf, BLOCK_SIZE, length($buf)); die($!) if !defined($rv); last if !$rv while (1) { if ($searching) { my $len = $buf =~ m{\[(?:a|\z)} ? $-[0] : length($buf);...

Counting occurrences of a word in a string in Perl

regex,perl

I get the correct number - 6 - for the first string However your method is wrong, because if you count the number of pieces you get by splitting on the regex pattern it will give you different values depending on whether the word appears at the beginning of the...

Is executing C++ code in comments with certain Unicode characters allowed, like in Java?

c++,c++11,unicode,comments

If I read this translation phase reference correctly, then the sequence // \u000d some code here is mapped in phase 1 to itself, i.e. the parser does not translate or expand \u000d. Instead the translation of such sequences happens in phase 5, which is after the comments are replaced by...

Get ISO DateTime with only core modules in Perl?

perl

This is simple with the POSIX function strftime: use POSIX qw( strftime ); print strftime "%Y-%m-%d %H:%M:%S", localtime; POSIX has been core since 5.0....

Perl - an array content

arrays,perl

You can use grep to go through the array one at a time and see if any match: if ( grep $file =~ /\Q.$_\E\z/, @array ) { Or join the array elements into a single regex: my @array = qw(avi mp4 mov); my $selected_extensions = qr/^\.(?:@{[ join '|', map quotemeta,...

Cyrllic characters in SVG font

javascript,svg,unicode,encoding,fonts

п would be &#1087; To get the code point of a unicode character in JavaScript you can use String.prototype.codePointAt method, in your case just type this into developer console: "п".codePointAt(0) // 1087 To convert the other way around: String.fromCodePoint(1087) // "п" The format in your example, &#x... is a number...

Characters and Strings in Swift

ios,string,swift,unicode,character

When a type isn't specified, Swift will create a String instance out of a string literal when creating a variable or constant, no matter the length. Since Strings are so prevalent in Swift and Cocoa/Foundation methods, you should just use that unless you have a specific need for a Character—otherwise...

Perl & Regex within Windows CMD Line

regex,windows,perl

If I understand correctly, you want to pull out all of the matches from an entire file, and write those results to a separate file. This will work if the below results are what you're after. I don't have a Windows box to test on, but this should work (you...