
How Unicode Plays a Part in Your Software – Encoding & Character Sets

Introduction

There was a time when you could determine the size of a file by counting the number of characters it had. One character equates to one byte. Simple. In fact, it was how I found the office perpetrator who printed out a nasty letter for everyone to see. I went through all the print logs and counted the bytes.

 

In many cases, this is still true. However, for languages with thousands of characters, such as Chinese, 8 bits (2^8 = 256 values) is not enough. For this reason, a multitude of encoding standards (ISO-8859, Mac OS Roman, Big5, MS-Windows character sets, etc.) have been implemented, but it has been a headache to keep them consistent across applications and delivery systems. In some cases, using multiple encodings or character sets in one document would require yet another encoding standard or would simply be impossible. This applies not only to text documents, but to web pages and databases as well.

 

We needed a standard that encompasses it all. That standard is called Unicode.

 

What is Unicode?

Unicode is just a giant mapping table of numbers (code points) to characters. That’s about it. The kicker is that it includes every character imaginable on this planet. Basically it’s the superset of all character sets in existence today. It even includes ancient scripts like Egyptian Hieroglyphs, Cuneiform, and Gothic. The characters make up the code space.

 

Unicode encodings (e.g., UTF-8) specify how these numbers (the code points) are represented as bits.

 

The Unicode code space consists of 17 planes of 65,536 (= 2^16) code points each. That’s 1,114,112 code points in total, which is meant to be enough to map all past, present, and future characters created by mankind. The first plane, the Basic Multilingual Plane (BMP), contains the most commonly used characters.
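To make the arithmetic concrete, here is a minimal Python sketch. The helper name `plane_of` and the sample characters are my own choices for illustration; the only thing it relies on is Python's built-in `ord()`, which returns a character's code point.

```python
# Size of the Unicode code space: 17 planes of 2**16 code points each.
PLANES = 17
PLANE_SIZE = 2 ** 16
total_code_points = PLANES * PLANE_SIZE  # 1,114,112


def plane_of(char: str) -> int:
    """Return the plane number (0 = BMP) of a single character."""
    return ord(char) // PLANE_SIZE


# A Latin letter, a Chinese character, a snowman and an emoji.
for ch in ["A", "中", "☃", "😀"]:
    print(f"U+{ord(ch):04X} is in plane {plane_of(ch)}")
```

The first three samples land in plane 0 (the BMP), while the emoji lives in plane 1.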

 

What’s the difference between a character set and an encoding?

Character sets are technically just lists of distinct characters and symbols. They could be used by multiple languages (e.g., Latin-1 is used for the Americas and Western Europe).

 

Encoding is the way these characters are stored in memory. An encoding maps these characters to a binary representation.

 

Character sets that have encodings are called coded character sets. Unsurprisingly, this is a bit confusing because many systems use the terms interchangeably. For example, MySQL refers to characters and their encodings simply as a character set. What it really means is a coded character set (or code page).

 

Every encoding must have a character set associated with it but a character set could have multiple encodings. The most relevant example of this is the Unicode character set with multiple encodings (UTF-8, UTF-16 BE, UTF-16 LE, UTF-32, etc). The same character in one encoding could be represented by a larger/smaller number of bytes in another encoding.

This W3C article does a fine job explaining this.
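As a rough illustration of one character set with several encodings, the Python sketch below encodes a single character (“é”, code point U+00E9, picked arbitrarily) with a few of the codecs that ship with Python. The codec names are Python's own spellings.

```python
text = "é"  # a single character, code point U+00E9

# Encode the same character with several encodings.
encodings = ["latin-1", "utf-8", "utf-16-be", "utf-16-le", "utf-32-be"]
encoded = {name: text.encode(name) for name in encodings}

for name, raw in encoded.items():
    print(f"{name:10} {raw.hex(' ')}  ({len(raw)} bytes)")
```

Same character, same code point, but anywhere from one to four bytes depending on the encoding.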

What are code pages?

A code page is mostly a Microsoft Windows-specific term for an encoding that is based on a standard encoding with a few modifications. More generically, it can also mean any coded character set.

 

UTF-8 vs UTF-16 vs UTF-32

UTF-8

  • Variable-length encoding with 8-bit code units (1 to 4 bytes per code point)
  • Backward compatible with ASCII without having to deal with endianness or byte order marks (BOM). The first 128 characters correspond one-to-one with ASCII.
  • Characters vary in length, even commonly used ones, which can make indexing into a string and decoding a code point slower.

UTF-16

  • Variable-length encoding with 16-bit code units (2 or 4 bytes per code point)
  • Great if ASCII doesn’t dominate the document. e.g., most East Asian characters require 2 bytes in UTF-16, whereas in UTF-8 they require at least 3.
  • If using primarily US-ASCII strings, there will be lots of null bytes.

UTF-32

  • Fixed-length encoding with 32-bit code units (4 bytes per code point)
  • You don’t need to decode the code point as it’s given to you in its purest 32-bit format.
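The trade-offs between the three encodings can be checked with a short Python sketch. The function name `encoded_sizes` and the sample strings are assumptions made for this example. The little-endian codecs are used so that no byte order mark is added to the counts.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes the text takes in each encoding."""
    return {name: len(text.encode(name))
            for name in ("utf-8", "utf-16-le", "utf-32-le")}


ascii_text = "hello"    # five ASCII characters
cjk_text = "你好世界"    # four Chinese characters, all in the BMP
emoji_text = "😀"        # one character outside the BMP

for sample in (ascii_text, cjk_text, emoji_text):
    print(repr(sample), encoded_sizes(sample))

# Every ASCII character in UTF-16 carries one null byte.
null_bytes = ascii_text.encode("utf-16-le").count(0)
print("null bytes in UTF-16 'hello':", null_bytes)
```

ASCII text is smallest in UTF-8, the Chinese sample is smallest in UTF-16, and a character outside the BMP costs four bytes in all three.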

How do character sets and encodings relate to fonts?

A font defines the “glyphs”, usually for a single character set or a subset of a character set. If a character is undefined in the font, you’ll typically get a replacement character like a square box or question mark.

 

Basically, fonts are glyphs that are mapped to code points in a coded character set.

Conclusion

At this time, most systems are using UTF-8. It’s efficient as far as storage goes (as long as the text is mostly ASCII characters). It can represent any Unicode character, so there’s really no reason not to use it.

 

When you type on your keyboard, you’re using a certain encoding scheme. When you save that file and display the text again using the same encoding, you’ll get consistent results. The biggest problem we run into is seeing random-looking characters in our files. The usual explanation for this is that the encoding used to view the file is not the one it was saved with.
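Here is a small Python sketch of how those random-looking characters come about. The word “café” and the choice of Latin-1 as the wrong encoding are just examples; any mismatch between the encoding used to save and the one used to read behaves similarly.

```python
original = "café"

# Save the text as UTF-8 bytes.
stored = original.encode("utf-8")

# Read the bytes back with the wrong encoding: garbage appears.
wrong = stored.decode("latin-1")

# Read them back with the encoding used for saving: all is well.
right = stored.decode("utf-8")

print(wrong)
print(right)
```

The bytes on disk never changed. Only the interpretation did, which is why the garbled view can be fixed by picking the right encoding in the viewer.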

 

It’s important to note: conversion from one encoding to another is not for the faint of heart. You have to know what you’re doing or you’ll lose your original bits forever. Sometimes it’s not even possible to perform the conversion, because the target encoding has no representation for some of the characters.
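A minimal Python sketch of a lossy conversion, assuming we try to squeeze a snowman (U+2603) into Latin-1, which has no such character. The variable names are made up for this example.

```python
text = "snowman: ☃"

# A strict conversion refuses to continue: Latin-1 has no snowman.
try:
    text.encode("latin-1")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Forcing the conversion replaces the character with "?".
lossy = text.encode("latin-1", errors="replace")

# Decoding again cannot bring the snowman back.
round_trip = lossy.decode("latin-1")

print(strict_ok, lossy, round_trip)
```

Once the file has been saved in the lossy form, the original character is gone, so keep a copy of the original bytes before converting.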

 

From this point forward, a byte no longer equates to a character. Be wary of the encoding scheme used, especially if you start to see snowmen and cellphones in your CSV file.

 

Bottom line: Use UTF-8.

 

 

How to Setup Let’s Encrypt on Apache2 and Ubuntu 14.04 LTS

After years of having to manually renew certificates (I’ve used StartSSL in the past), Let’s Encrypt is finally live and will allow you to automate this process by installing an agent and a cron job.

Here I’m trying to install certificates on multiple blogs on the same server.


It’s stupidly easy to do:

  1. Go here: https://certbot.eff.org/
  2. Follow the on-screen instructions
  3. THAT’S IT!

Amazing right? Well I did run into a few errors but they were easily solved:

  • No valid IP addresses found for [website]
    • Make sure your DNS A and CNAME records are correct with the correct IP
  • Incorrect validation certificate for TLS-SNI-01 challenge
    • This I found was due to two issues I had:
      • I forgot the site was no longer hosted on my server so the DNS record was pointing to another host anyway
      • You need to have SSLEngine, SSLCertificateFile, and SSLCertificateKeyFile values set in your Apache configuration. Even if they point to empty files, it’ll work. Of course this is probably because I had values there earlier. I haven’t tested this but if only “SSLEngine on” is set, it should still work.
  • DNS problem: NXDOMAIN looking up A for [website]
    • Make sure you have a CNAME record for the subdomain (e.g., “www”)
  • Redirect HTTP traffic to HTTPS no longer works
    • All you have to do is adjust the 000-default configuration so that it redirects HTTP requests to HTTPS.

Also be sure to protect your sites from POODLE.  Analyze your site here: https://www.ssllabs.com/ssltest/analyze.html

That’s all folks. If you have any issues please let me know in the comments.

How to Revert/Undo Changes in Git

Generally there are three ways of reverting changes:

  1. checkout
  2. revert
  3. reset

Checkout

If you just need to revert specific files, you could run git checkout to retrieve an exact version. In the example below, I wanted to revert the “app.rb” file so that it only contains “Some app work.”

git-undo-checkout

Revert

Revert will create a new commit undoing the changes made during a specific commit. It does not remove the original commit from your project history. In this example, I’m going to undo the changes in the last commit. As you can see, the history of the revert is kept.

git-revert

 

Reset

Unlike revert, reset moves the branch back to an earlier commit and undoes all subsequent commits. Because it rewrites history, only use it to undo local changes. Most people use reset to unstage files to match the most recent commit and perhaps create more focused commits/snapshots. The working directory is unchanged unless the “--hard” option is set.
git-reset

You could also reset to a tag.
git-reset-tag

How to Setup a DigitalOcean Provider on Vagrant

If you haven’t done so, install Vagrant for your OS here.

Generate your key pairs. If using Windows, use puttygen. In Windows you’re going to have to use the OpenSSH key formats:

digitalocean-puttygen-private-key digitalocean-puttygen-public-key

Edit the Vagrantfile. Make sure the private key does not have an extension. The public key should have the extension “.pub” with the same file name as the private key (e.g., “do.pub”).

Create your API V2 token from your DO control panel:

digitalocean-generate-new-token

Most of these lines are self-explanatory. To get a list of images and regions you could click on “Create Droplet” from your web account or you could run the following:

Here’s a sample list of regions and images:

Regions:
  • nyc1
  • ams1
  • sfo1
  • nyc2
  • ams2
  • sgp1
  • lon1
  • nyc3
  • ams3
  • fra1

Images:
  • centos-5-8-x64
  • debian-6-0-x64
  • fedora-21-x64
  • ubuntu-12-04-x64
  • debian-7-0-x64
  • ruby-on-rails
  • wordpress

Now just change into the Vagrant project folder and run “vagrant up”.

How To Install OpenVPN Access Server on Ubuntu 14.04.1 LTS

OpenVPN Access Server’s free license provides two user accounts free of charge.

Open ports:

  • TCP:443
  • TCP:943
  • UDP:1194

Admin UI: https://YourIpAddress:943/admin
Client UI: https://YourIPAddress:943/

openvpn1

 

How to Setup gcloud Tool on Ubuntu 12.04 LTS

If you see:

CommandException: arg (XXXXX) does not name a directory, bucket, or bucket subdir.

Keep in mind rsync works on directories, cp works on files.

 

How To Setup EXIM with DKIM and SPF on Multiple WordPress Domains Hosted on a Single Ubuntu 14.04 Server

My goal is to host multiple WordPress blogs on a single Ubuntu server. This will be for outbound email only. Setting up SMTP for each of my blogs by selecting from a plethora of plugins isn’t a sound solution. With sendmail/postfix and default PHP mail() settings, it isn’t difficult to send mail. However, the problem is that seemingly valid email often gets marked as spam (e.g., Password Reset). Case in point:
wordpress_email_labelled_as_spam

Gmail’s spam filter is incredibly sophisticated and a bit more stringent than others. The reason is that the sender address can be easily spoofed. However, the originating server cannot be forged so easily. Even then, there are further checks to make sure the email isn’t spam.


From my experience, there are a few factors which determine whether an email is going to be marked as spam by Gmail.

  1. You need to have Sender Policy Framework (SPF) and DomainKeys Identified Mail (DKIM) records in your DNS.
    • You could run: dig [domain name] txt to find out any domain’s TXT DNS records.
    • The SPF record determines which servers are allowed to send email for the domain.
    • The DKIM record is used to validate the actual email itself. This ensures the message wasn’t tampered with even if it did come from a valid mail server.
    • If both are valid, the message headers will show SPF and DKIM passing (you can see this under “Show Original” from the message pull-down menu).
  2. The “From” email address and name. From what I’ve seen, only valid sender email addresses were able to avoid being marked as spam. So if you set up admin@geekbacon.com but that email address doesn’t actually exist, it will be marked as spam. Same goes for “no-reply” addresses, etc.
  3. The content itself. Even emails without subjects could be marked as valid but a suspicious “Subject” could cause the spam filter to trigger. From my experience, the content of the email has a greater weight in determining whether the email is spam or not. Play around with the email templates, fix formatting errors, broken links, etc.

Setting up a SPF DNS Record

This one is straightforward. Create a TXT record “v=spf1 ip4:[IP ADDRESS] ~all”

Here is an example from DigitalOcean:

Capture
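If you manage several servers, the TXT value can be assembled with a few lines of Python. This is only a sketch: the function name `spf_record` is made up, and the addresses are reserved documentation addresses, not real servers. It relies on the fact that SPF's address mechanisms are `ip4:` for IPv4 and `ip6:` for IPv6.

```python
import ipaddress


def spf_record(addresses, policy="~all"):
    """Build an SPF TXT value that authorizes the given server addresses."""
    mechanisms = []
    for addr in addresses:
        ip = ipaddress.ip_address(addr)  # raises ValueError on a bad address
        prefix = "ip4" if ip.version == 4 else "ip6"
        mechanisms.append(f"{prefix}:{ip}")
    return " ".join(["v=spf1", *mechanisms, policy])


# Placeholder addresses from the documentation ranges.
single = spf_record(["203.0.113.10"])
mixed = spf_record(["203.0.113.10", "2001:db8::1"])
print(single)
print(mixed)
```

The “~all” at the end is a soft fail: mail from any other server is treated as suspicious rather than rejected outright.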

How To Setup EXIM4 with DKIM

DKIM support is included in Exim 4.70 and later. I’m installing Exim version 4.82 on Ubuntu 14.04.1 LTS, Trusty Tahr.

You need to create a DKIM record: “v=DKIM1; k=rsa; p=[Your public key]”

Here’s another example from DigitalOcean:

Capture

You’ll replace the “p=” section with your own public key without any line breaks.
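Stripping the line breaks by hand is error-prone, so here is a rough Python sketch that does it. The function name `dkim_txt_value` is hypothetical and the key in the example is an obviously fake placeholder. It assumes your public key is in the usual PEM layout with “-----BEGIN …-----” and “-----END …-----” marker lines.

```python
def dkim_txt_value(public_key_pem: str) -> str:
    """Turn a PEM public key into a one-line DKIM TXT value."""
    lines = [line.strip() for line in public_key_pem.splitlines()]
    # Drop blank lines and the BEGIN/END marker lines, then join the rest.
    key_body = "".join(
        line for line in lines if line and not line.startswith("-----")
    )
    return f"v=DKIM1; k=rsa; p={key_body}"


# Fake key material, for illustration only.
example_pem = """-----BEGIN PUBLIC KEY-----
ABCD
EFGH
-----END PUBLIC KEY-----
"""
record = dkim_txt_value(example_pem)
print(record)
```

Paste the printed value into the TXT record as a single line.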

Now create a new file /etc/exim4/dkim_vhosts. Here you would list out all the virtual hosts and allowed sender addresses in your domains. For example:

Now edit /etc/exim4/conf.d/transport/30_exim4_config_remote_smtp. The entire file should look something like this:

Now restart exim4 (and Apache if you wish):

Now just send a test email from WordPress and it shouldn’t be marked as spam anymore! Lastly, I want to stress that the sender email should be valid, that includes “no-reply” addresses.

GPG signature verification failed when updating rvm

When updating rvm, you may get an error message like the following.

GPG signature verification failed for '/usr/local/rvm/archives/rvm-1.26.5.tgz' - 'https://github.com/wayneeseguin/rvm/releases/download/1.26.5/1.26.5.tar.gz.asc'!
try downloading the signatures:

Type the following in a terminal to fix it.
sudo gpg --keyserver hkp://keys.gnupg.net --recv-keys D39DC0E3

How to Setup a Basic Firewall on a Cisco ASA 5505

  • en
  • config t
  • write erase
  • config factory-default (space through all the pages)
  • reload (Don’t save current config)
  • Say no to interactive prompts
  • en (There’s no password)
  • config t
  • enable password [specify enable password]
  • hostname [Your Hostname]
  • interface vlan 1
    • description [VLAN 1 free-form description]
    • security-level 0
    • nameif outside
    • ip address [public ip] [mask] (If you’re using DHCP, replace with “ip address dhcp setroute”)
  • interface vlan 2
    • description [VLAN 2 free-form description]
    • security-level 100
    • ip address [internal ip] [mask]
    • nameif inside
  • interface ethernet0/0
    • description [Insert description]
    • switchport access vlan 1
    • no shutdown
  • interface ethernet0/1
    • switchport access vlan 2
    • no shutdown
  • interface ethernet0/2
    • switchport access vlan 2
    • no shutdown
  • interface ethernet0/3
    • switchport access vlan 2
    • no shutdown
  • interface ethernet0/4
    • switchport access vlan 2
    • no shutdown
  • interface ethernet0/5
    • switchport access vlan 2
    • no shutdown
  • interface ethernet0/6
    • switchport access vlan 2
    • no shutdown
  • interface ethernet0/7
    • switchport access vlan 2
    • no shutdown
  • show switch vlan
  • crypto key generate rsa modulus 1024 (type yes for confirmation)
  • ssh [network allowed to ssh] [mask] inside
  • ssh timeout 10
  • ssh version 2
  • username [specify username] password [specify password] privilege 15
  • aaa authentication ssh console LOCAL
  • show run ssh
  • route outside 0 0 [ISP Gateway] 1 (This sets up the default route)
  • global (outside) 1 interface
  • nat (inside) 1 [IP address/network for PAT] [mask]
  • http server enable (requires port if accessing from outside)
  • http [Allow IP Address(s)] [Mask]
  • policy-map global_policy
    • class inspection_default
    • inspect icmp
  • end
  • wr m
  • reload
  • show running-config (To check that everything is ok)
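The switchport assignments for ethernet0/1 through ethernet0/7 in the steps above are the same three commands repeated per port. If you’d rather paste them in one go, a short Python sketch can generate them. The function name `access_port_commands` is made up for this example, and it only prints text; it doesn’t talk to the ASA.

```python
def access_port_commands(first_port: int, last_port: int, vlan: int) -> list:
    """Return the commands that put a range of ports into one VLAN."""
    commands = []
    for port in range(first_port, last_port + 1):
        commands.append(f"interface ethernet0/{port}")
        commands.append(f"switchport access vlan {vlan}")
        commands.append("no shutdown")
    return commands


# Ports 1 to 7 go to the inside VLAN (vlan 2), as in the steps above.
inside_ports = access_port_commands(1, 7, 2)
print("\n".join(inside_ports))
```

Paste the output while in “config t” mode, then run “show switch vlan” to check the result.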