Ruby Classes, Modules, Mixins, and Inheritance

Classes

Classes are one of the building blocks of object-oriented programming. Think of them as blueprints for creating objects. When we just throw variables and methods inside a Ruby file, they’re only loosely associated. However, when you encapsulate them in a class, you combine the state (instance variables) and behavior (methods) into a cohesive unit from which you could create objects. A few things to keep in mind about Ruby classes:

  • Class names in Ruby are constants, which is why they start with a capital letter.
  • Instance variables live for the life of the object.
  • You could define your own “to_s” method to customize what gets displayed when you call “puts” on the object.
  • An object is responsible for managing its own internal state. That’s why choosing between attr_accessor and attr_reader is a design decision. You determine what state you want to expose.
  • Generally you want to keep your classes small so that they’re easier to read, reuse, and unit test. If a class has too much code, that usually means you could break part of it out into another class, module, etc.

The benefit of objects is that they encapsulate behavior and responsibilities. This way, objects could interact with each other and tell each other what to do. It’s important to keep in mind that variables and collections in Ruby hold references to objects rather than copies. So if you put the same object in two different arrays, both arrays point at the very same object.
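
To make the points above concrete, here is a minimal sketch. The Song class and its attributes are made up for illustration; it shows a custom to_s, a read-only attribute next to a writable one, and one object shared by two arrays.

```ruby
# Hypothetical Song class: state (instance variables) plus behavior (methods).
class Song
  attr_reader :title      # exposed read-only
  attr_accessor :rating   # may be read and changed from outside

  def initialize(title, rating = 0)
    @title = title
    @rating = rating
  end

  # puts calls to_s to decide what to print
  def to_s
    "#{title} (#{rating}/5)"
  end
end

song = Song.new("Blue in Green", 4)
puts song                                     # prints: Blue in Green (4/5)

# The same object placed in two arrays is still one object.
favorites = [song]
road_trip = [song]
road_trip.first.rating = 5
puts favorites.first.rating                   # prints: 5
puts favorites.first.equal?(road_trip.first)  # prints: true
```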

Modules

Modules are useful when your classes or methods carry too much responsibility and you want to offload some of it. You reach for a module instead of a class when the methods/tasks don’t need their own initial state or multiple instances. One good use of a module is a database connector where you only need a single instance. They’re basically just a collection of related methods which could be used across classes and within other methods. One example of this is the Enumerable module.
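
As a rough sketch of a module used as a stateless collection of related methods (the Temperature module and its method names are invented for this example):

```ruby
# Hypothetical module: related methods, no state, never instantiated.
module Temperature
  def self.to_fahrenheit(celsius)
    celsius * 9.0 / 5 + 32
  end

  def self.to_celsius(fahrenheit)
    (fahrenheit - 32) * 5.0 / 9
  end
end

puts Temperature.to_fahrenheit(100)   # prints: 212.0
puts Temperature.to_celsius(212)      # prints: 100.0
```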

Mixins

Modules provide the ability to mix in your methods across classes. The best way to think about it is in the context of “able” behaviors. For example, if you have a few classes that are “Playable” you could share methods like pause, play, stop, next track, shuffle, etc. across objects like Music or Video. If you find yourself duplicating methods like play in both the Music and Video classes, then it’s a sign you should mix them in from a module.

All you have to do to mix in a module is place an “include [module name]” inside your class, conventionally at the top. From there, all methods inside the module are available as instance methods of that class. When Ruby looks for a method, it starts in the class, then the module, then the superclass, etc. You could run [class].ancestors to see the chain.

One good practice is to use self.[attribute] inside the module instead of instance variables. That way you don’t require the class you’re mixing into to have the same instance variables. It’s better to depend on methods (which attr_accessor/reader/writer sets up real nice for us).
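
Here is one way the Playable idea could look. The method bodies and the title/state attributes are assumptions made for the sketch; note that the module only talks to the host class through methods, never through instance variables.

```ruby
# Hypothetical mixin: depends on the host's title and state= methods only.
module Playable
  def play
    self.state = :playing
    "Playing #{title}"
  end

  def pause
    self.state = :paused
    "Paused #{title}"
  end
end

class Music
  include Playable
  attr_reader :title
  attr_accessor :state

  def initialize(title)
    @title = title
    @state = :stopped
  end
end

class Video
  include Playable
  attr_reader :title
  attr_accessor :state

  def initialize(title)
    @title = title
    @state = :stopped
  end
end

puts Music.new("Kind of Blue").play      # prints: Playing Kind of Blue
puts Video.new("Ruby Tutorial").pause    # prints: Paused Ruby Tutorial

# Method lookup order: the class, then the module, then the superclass.
puts Music.ancestors.first(3).inspect    # prints: [Music, Playable, Object]
```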

Namespaces

Modules also help prevent naming collisions, especially when you’re going to bundle your library into a gem where it’ll be shared with the world. A module does this by nesting classes within the module declaration, which forces you to refer to the class via the module name. For example: HotelsApp::Rooms, EscapeRoom::Rooms, and Hospital::Rooms. There’s no confusion as to which class you’re calling. You could have classes with the same name in different modules and everyone’s happy.
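
A small sketch of two of the namespaces mentioned above. The describe method is made up so the classes have something to do.

```ruby
module HotelsApp
  class Rooms
    def describe
      "hotel rooms"
    end
  end
end

module Hospital
  class Rooms
    def describe
      "hospital rooms"
    end
  end
end

# Same class name, different namespaces, no collision.
puts HotelsApp::Rooms.new.describe          # prints: hotel rooms
puts Hospital::Rooms.new.describe           # prints: hospital rooms
puts HotelsApp::Rooms == Hospital::Rooms    # prints: false
```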

Inheritance

This concept is applied when a class “is a kind of” another class.

  • Dog is a kind of Animal
  • Van is a kind of Automobile
  • Automatic is a kind of Watch
  • Pinot Noir is a kind of Wine

The parent class holds the qualities and characteristics that are shared, so children inherit common code that could be reused. Children could also have unique traits and behaviors of their own and could override their parent’s methods (polymorphism). The super keyword lets a child call the parent’s version of the current method, which is especially handy inside initialize so the parent could set up the shared state.
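
A minimal sketch of the Dog/Animal relationship. The attributes and the speak method are invented for illustration.

```ruby
class Animal
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def speak
    "#{name} makes a sound."
  end
end

class Dog < Animal
  attr_reader :breed

  def initialize(name, breed)
    super(name)        # let Animal#initialize set up the shared state
    @breed = breed
  end

  # Overrides the parent's method but still reuses it through super.
  def speak
    super + " Woof!"
  end
end

rex = Dog.new("Rex", "Beagle")
puts rex.speak                # prints: Rex makes a sound. Woof!
puts rex.is_a?(Animal)        # prints: true
```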

Ruby: When to Use Symbol or String?

Symbols are just a way to name something. Typically they’re used to name attributes in a class or for options (colors, types of cuisine, types of shoe, etc). In essence, a Symbol is an immutable name that behaves like a constant string.

Use a String instead of a Symbol when you need string-like behavior (length, capitalize, reverse, etc). One thing to keep in mind is that a given Symbol is always the same object in memory, whereas two Strings, even with the exact same characters, are normally separate objects on the heap.
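
The memory difference could be checked with equal?, which compares object identity. This is only a sketch; String.new is used so that each string is guaranteed to be a freshly built object.

```ruby
first_symbol = :pending
second_symbol = :pending
puts first_symbol.equal?(second_symbol)    # prints: true  (one shared object)

first_string = String.new("pending")
second_string = String.new("pending")
puts first_string == second_string         # prints: true  (same characters)
puts first_string.equal?(second_string)    # prints: false (two objects)

# Strings are the right tool when you need to manipulate the text.
puts first_string.reverse                  # prints: gnidnep
```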

Ruby Methods Basics

Methods are great for DRYing up your code and organizing logic in a single place. They act a little bit like a black box. Here are some tidbits that might not be so obvious with Ruby methods.

All methods have to be called on an object

In Ruby, there has to be an object that receives a method call, even if you don’t see it. For example, the “puts” method or the multiplication operator:
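
A minimal sketch of that idea, with arbitrary values:

```ruby
puts "hello"            # no visible receiver: the call goes to self

product = 3 * 4         # operator syntax...
same = 3.*(4)           # ...is sugar for calling the * method on 3
puts product == same    # prints: true

puts self               # prints: main (when run at the top level of a script)
```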

So in words:

  • “puts” is a method
  • This method is called on an object.
  • That object is stored in the self variable.
  • That variable points to the object named main (akin to how a string variable points to a String object).

Before Ruby 2.7 you couldn’t call “self.puts” because Ruby didn’t allow explicit receivers for private methods; since 2.7 a literal self receiver is allowed. I won’t get into the details of self, but just know that its value could change depending on where you are in your program and that it is the default receiver for method calls.

Fun fact: Within a method, Ruby will check whether something is a local variable first. If it isn’t, Ruby treats it as a method call on the implicit receiver (self).

So we call methods on objects and sometimes we’ll pass a block and/or a method parameter. It’s good to keep in mind that Ruby passes references to objects, so a method could modify the very object the caller holds. For instance:
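
A short sketch with made-up method names: mutating the argument is visible to the caller, while reassigning the parameter is not.

```ruby
def shout!(text)
  text << "!"                 # mutates the object the caller also refers to
end

def reassign(text)
  text = "something else"     # rebinds only the method's local variable
  text
end

greeting = String.new("hello")
shout!(greeting)
puts greeting                 # prints: hello!

reassign(greeting)
puts greeting                 # prints: hello!  (unchanged by reassignment)
```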

Default parameters
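
As a minimal sketch (the method names are invented), a parameter could be given a default value in the method signature, and a default could even be computed from an earlier parameter:

```ruby
def greet(name, greeting = "Hello")
  "#{greeting}, #{name}!"
end

# The default for width is evaluated at call time and may use text.
def label(text, width = text.length + 4)
  text.center(width, "*")
end

puts greet("Ada")               # prints: Hello, Ada!
puts greet("Ada", "Welcome")    # prints: Welcome, Ada!
puts label("hi")                # prints: **hi**
puts label("hi", 4)             # prints: *hi*
```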

Ruby Variables Basics

Variables in Ruby are basically pointers to an object in memory (heap).
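
A small sketch with arbitrary values: two variables could point at one object, and reassigning one of them just moves that pointer.

```ruby
a = String.new("bacon")
b = a                   # b points at the same object as a
b << " bits"            # changing the object through b...
puts a                  # prints: bacon bits  (...is visible through a)
puts a.equal?(b)        # prints: true

b = "turkey"            # b now points at a different object
puts a                  # prints: bacon bits  (a is untouched)
puts a.equal?(b)        # prints: false
```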

Hopefully this illustrates how variables work in Ruby.

Ruby Blocks Tutorial

Sometimes we need to make our code more expressive and tidy but at the same time flexible and easier to use. Learning how to use Ruby blocks appropriately will level you up as a Ruby programmer.

Sometimes we like to “sandwich” boilerplate code around various tasks. Maybe we want to keep track of the value of a variable within a block of code. The simplest example is HTML tags:
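
A minimal sketch of the sandwich idea, using a made-up tag helper: the method supplies the opening and closing tags, and the block supplies what goes in between.

```ruby
def tag(name)
  "<#{name}>#{yield}</#{name}>"
end

puts tag(:h1) { "Breaking News" }
# prints: <h1>Breaking News</h1>

puts tag(:ul) { tag(:li) { "one" } + tag(:li) { "two" } }
# prints: <ul><li>one</li><li>two</li></ul>
```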

We might also duplicate code with similar if-else statements, especially checking boolean values. With blocks, we could separate the concerns a bit and even add the flexibility of performing additional tasks from our result.

So yield basically runs the code in the block and returns its value. That’s the basic concept. Now with this we could run code under a specific context.
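
For instance, the check could live in one method while the caller decides what to do with the result. The if_in_stock name and the messages are hypothetical.

```ruby
# The condition is written once; the block decides what happens on success.
def if_in_stock(quantity)
  return "out of stock" unless quantity.positive?

  yield(quantity)     # the block's value becomes the value of yield
end

puts if_in_stock(3) { |count| "ship #{count} items" }   # prints: ship 3 items
puts if_in_stock(0) { |count| "ship #{count} items" }   # prints: out of stock
```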

One example of this is the set of Rails environments: development, test, production. When we run code in the context of one of these environments, we want it to automatically switch us back to our default environment, whether an exception has occurred or not. Here are a few other examples:

  • Silence warnings, logs, etc.
  • Changing drivers temporarily
  • Changing defaults for testing (wait times, values, etc)
  • Changing locale, currency, etc.
  • Temporarily funneling results to a file

In my example we’ll use a broker’s trading platform. Most of them allow you to trade virtually or on a live interface:

Once you have to toggle between different contexts, that usually means you could simplify with blocks:
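
Here is a sketch of both approaches. The TradingPlatform class, its :virtual and :live environments, and the buy method are all assumptions made for illustration.

```ruby
class TradingPlatform
  attr_accessor :environment
  attr_reader :orders

  def initialize
    @environment = :virtual
    @orders = []
  end

  def buy(symbol, shares)
    raise ArgumentError, "shares must be positive" unless shares.positive?

    @orders << [environment, symbol, shares]
  end

  # Runs the block in the given environment, then always switches back.
  def in_environment(temporary)
    original = @environment
    @environment = temporary
    yield
  ensure
    @environment = original
  end
end

platform = TradingPlatform.new

# Manual toggling: the switch back is easy to forget, and it is skipped
# entirely if buy raises an exception.
platform.environment = :live
platform.buy("AAPL", 10)
platform.environment = :virtual

# Block pattern: the toggling lives in one place.
platform.in_environment(:live) { platform.buy("MSFT", 5) }
puts platform.environment       # prints: virtual

begin
  platform.in_environment(:live) { platform.buy("MSFT", -1) }
rescue ArgumentError
  puts platform.environment     # prints: virtual (restored despite the error)
end
```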

As you could see, the original method is prone to errors while the block pattern encapsulates the toggling in one place and ensures the environment is switched back to the original, even after an exception has occurred.

You get the sense you could do a bit more with this, like managing authentication, a database connection, opening an FTP connection, a URL, opening files, etc. With blocks you could have the code manage its own life cycle. In Ruby, a lot of these blocks are performed with class methods (as opposed to instance methods in the previous examples).

Imagine we want to call an API to make trades for us. A lot of third-party app developers would want this feature. Obviously I need to be authenticated and a connection has to be made. Then my orders will hopefully execute and that connection will close.
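
A rough sketch of that life cycle. The TradingApi class, the connect method, and the user name are hypothetical, and nothing here talks to a real service.

```ruby
class TradingApi
  attr_reader :user, :orders

  # Class method: opens a session, hands it to the block, always closes it.
  def self.connect(user)
    api = new(user)
    api.open
    yield api
    api
  ensure
    api&.close
  end

  def initialize(user)
    @user = user
    @orders = []
    @connected = false
  end

  def open
    @connected = true
  end

  def close
    @connected = false
  end

  def connected?
    @connected
  end

  def buy(symbol, shares)
    raise "not connected" unless connected?

    @orders << "BUY #{shares} #{symbol}"
  end
end

session = TradingApi.connect("demo_user") do |api|
  api.buy("AAPL", 10)
  api.buy("MSFT", 5)
end

puts session.orders.inspect   # prints: ["BUY 10 AAPL", "BUY 5 MSFT"]
puts session.connected?       # prints: false (closed once the block finished)
```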

Pay special attention to the class method. I no longer have to instantiate an object in order to perform actions on this service. Obviously you would include real authentication rather than a username but this should shed some light on the common Ruby idioms you’ll see.

Finally, you’ve probably noticed that some objects are instantiated with a block instead of passing in variables. We see this in ActiveRecord, Rake tasks, and Gemfiles. If you think about how yield works, you’ll get an idea of how that happens. When calling new, Ruby allocates the memory and calls initialize. In initialize we could actually pass the object to the block as a block parameter and instantiate the instance variables that way:
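
A minimal sketch of that pattern, using a made-up Order class:

```ruby
class Order
  attr_accessor :symbol, :shares, :side

  def initialize
    @side = :buy                  # a default the block may override
    yield self if block_given?    # hand the new object to the block
  end
end

order = Order.new do |o|
  o.symbol = "AAPL"
  o.shares = 10
end

puts order.symbol   # prints: AAPL
puts order.shares   # prints: 10
puts order.side     # prints: buy
```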

I hope this tutorial has helped demystify Ruby blocks in a way where you could apply this to your own code.

When to Define Your Own Custom Ruby Iterator?

The general rule of thumb is that if your class wraps a collection of objects, you should build custom iterators so that users could readily use it like they do other collections such as Array, Hash, File, etc.

The easiest way to do this is to mix in the Enumerable module. The only requirement is to define an each method in your class, which will provide Enumerable with your collection’s objects.

Let’s take a look at an example. Let’s say you’re a property owner with a few properties in various cities.

Now, what if we want to iterate through the addresses in LA?

So how do we implement an iterator, much like we’ve seen with other collections such as Arrays and Hashes? We define an each method in our PropertyList class.

Now what if we want to get the total square footage of all the properties in LA that have more than 3 rooms? Naturally we think of the select and reduce methods, but those belong to the Enumerable module.

Aha! As mentioned, all we have to do is define an each method in the host class to mix in Enumerable. Our final class looks like this:
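
A sketch of what such a class could look like. The Property struct, its fields, and the sample addresses are invented, with numbers chosen so the LA total comes out to 10000.

```ruby
# Hypothetical data model: address, city, room count, square footage.
Property = Struct.new(:address, :city, :rooms, :sqft)

class PropertyList
  include Enumerable      # select, reduce, map, etc. are built on each

  def initialize
    @properties = []
  end

  def add(property)
    @properties << property
    self
  end

  # The only method Enumerable requires from the host class.
  def each
    return enum_for(:each) unless block_given?

    @properties.each { |property| yield property }
  end
end

list = PropertyList.new
list.add(Property.new("12 Sunset Blvd", "LA", 4, 4000))
list.add(Property.new("98 Vine St", "LA", 5, 6000))
list.add(Property.new("7 Echo Park Ave", "LA", 2, 1500))
list.add(Property.new("3 Lombard St", "SF", 6, 7000))

large_la = list.select { |prop| prop.city == "LA" && prop.rooms > 3 }
total = large_la.reduce(0) { |sum, prop| sum + prop.sqft }
puts "Total SQ/FT for LA Properties: #{total}"
# prints: Total SQ/FT for LA Properties: 10000
```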

and we get an output of: “Total SQ/FT for LA Properties: 10000” which is correct.

Now let’s level up some more. How about we build our own Enumerable module. We want to get this to work:

All we have to do is define our own module and include it in the PropertyList class:
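
Which methods the custom module offered isn’t shown here, so the following is only a guess at the shape: a hypothetical EstateEnumerable that builds its own select and reduce on top of the host’s each. The EstateList host class is also made up to keep the sketch self-contained.

```ruby
module EstateEnumerable
  # Collects every item for which the block returns a truthy value.
  def select
    matches = []
    each { |item| matches << item if yield(item) }
    matches
  end

  # Folds the collection into one value, starting from initial.
  def reduce(initial)
    accumulator = initial
    each { |item| accumulator = yield(accumulator, item) }
    accumulator
  end
end

class EstateList
  include EstateEnumerable

  def initialize(items)
    @items = items
  end

  def each
    @items.each { |item| yield item }
  end
end

sizes = EstateList.new([4000, 6000, 1500])
puts sizes.select { |sqft| sqft > 3000 }.inspect     # prints: [4000, 6000]
puts sizes.reduce(0) { |sum, sqft| sum + sqft }      # prints: 11500
```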

With this we just “include EstateEnumerable” instead of “include Enumerable” and we’ll get this as the output:

This hopefully gives you a glimpse of what Ruby blocks are capable of.

How to Write Your Own Iterators in Ruby

If your class holds an array or hash, you often don’t want to expose the implementation of that collection outside of the object. Creating your own iterator will make your object easier to use.

I think another way to demonstrate this is to show how the “each” and “times” methods work in Ruby. There isn’t much magic involved. For times all you have to do is open up the Integer class and define your own with a while loop:
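
A minimal sketch, named my_times here (a made-up name) so it doesn’t replace the built-in method:

```ruby
class Integer
  # Calls the block once for each number from 0 up to self - 1.
  def my_times
    counter = 0
    while counter < self
      yield counter
      counter += 1
    end
    self
  end
end

3.my_times { |i| puts "Iteration #{i}" }
# prints:
# Iteration 0
# Iteration 1
# Iteration 2
```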

As for each it’s very similar:
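
Again as a sketch, with the hypothetical name my_each so the real Array#each stays intact:

```ruby
class Array
  # Yields every element in order, then returns the array itself.
  def my_each
    index = 0
    while index < size
      yield self[index]
      index += 1
    end
    self
  end
end

%w[bacon eggs toast].my_each { |item| puts item.capitalize }
# prints:
# Bacon
# Eggs
# Toast
```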

Hopefully this helps demystify these methods we use and love so much.

How Ruby Local Variables Interact with Blocks and Their Surrounding Scope

Sometimes the scope of a local variable is a bit confusing when working with blocks. For example, the following will result in an “undefined local variable” error for the variables “x” and “bacon”:
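
A sketch of that situation, with arbitrary values. The errors are rescued here only so the script keeps running; the exact wording of the message varies between Ruby versions.

```ruby
[1, 2, 3].each do |x|
  bacon = "Crispy #{x}"   # bacon is created inside the block
end

begin
  puts x                  # x only existed as a block parameter
rescue NameError => error
  puts error.message
end

begin
  puts bacon              # bacon never existed in this scope
rescue NameError => error
  puts error.message
end
```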

Both are variables defined within a block and will be local to the block. They’re not accessible outside of the block. However, if you were to put “bacon” outside of the block:
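
Roughly like this, assuming an initial value of "Crispy":

```ruby
bacon = "Crispy"

[1, 2, 3].each do |x|
  bacon = "Turkey"      # same variable as the one defined outside
end

puts bacon              # prints: Turkey
```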

The output would be “Turkey.” The block could change the value of a local variable in its surrounding scope. The variable “x” on the other hand is a block parameter and is always local to the block. So even if you define “x” in the surrounding scope, the block would not change that outer value:
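
For example, with an arbitrary starting value of 100:

```ruby
x = 100

[1, 2, 3].each do |x|
  # Inside the block, x is the block parameter (1, then 2, then 3).
end

puts x                  # prints: 100
```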

Now what if you want to use the variable “bacon” inside the block without having it affect the surrounding scope’s assignment? Fortunately you could tack on a semi-colon in the block parameters list and follow it with variables you want to reserve within the scope of the block:
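
A short sketch of a block-local variable, using the same example values:

```ruby
bacon = "Crispy"

[1, 2, 3].each do |x; bacon|
  bacon = "Turkey #{x}"   # this bacon belongs to the block only
end

puts bacon                # prints: Crispy
```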

The variables listed after the semi-colon are local to the block, so assigning to them leaves same-named variables in the surrounding scope untouched.

Hope this clears it up for people running into weird block issues.

Vim – File Formats, Line Feed, and Carriage Return

When given a comma- or tab-delimited file, we usually want to import it into some kind of database. The first thing you need to find out is what type of file it is, as that could make or break your import.

System           Line ending   ASCII        Control key
Mac OS (pre-X)   CR            13           ^M
Mac OS X / Unix  LF            10           ^J
Windows          CRLF          13 then 10   ^M$

This could easily be done at the command line with:

Doesn’t get much clearer than that. So this particular file is from a Windows system with CRLF terminators. Now, without doing this, it’s a bit vague what we’re importing from. For example, if we assumed the text file was from an OS X/Unix system and ran this:

You’ll get some funky carriage return characters at the beginning of each row (and they will count as characters and truncate your data). Now using vim editor, you won’t see much wrong with the file with the “:set list” command. You’ll see ^I where the tabs are and a $ for end-of-line. All looks well but that’s because vim auto-detected the file. You could see this with the vim command:

Now to see its true colors, let’s force it to read as unix/DOS.

You’ll start to see the infamous ^M in the file.

In conclusion, find out exactly what kind of file you’re importing, and let the importer know. In the case of MySQL, terminating lines with “\r\n” (LINES TERMINATED BY '\r\n') will provide a proper import. I hope this solves the mystery of imports that go wrong!
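
If you’d rather check from a script, the same question could be answered in Ruby. This is only a sketch: the line_ending helper is made up, and it simply counts which terminator appears most often in the raw data.

```ruby
# Returns "CRLF", "CR", "LF", or "none" for the given raw file contents.
def line_ending(data)
  crlf = data.scan(/\r\n/).size
  cr = data.count("\r") - crlf     # carriage returns not part of a CRLF
  lf = data.count("\n") - crlf     # line feeds not part of a CRLF
  counts = { "CRLF" => crlf, "CR" => cr, "LF" => lf }
  name, total = counts.max_by { |_, count| count }
  total.zero? ? "none" : name
end

puts line_ending("id\tname\r\n1\tAda\r\n")   # prints: CRLF
puts line_ending("id\tname\n1\tAda\n")       # prints: LF

# For a real file, read it in binary mode so nothing gets translated,
# e.g. line_ending(File.binread("import.txt")) with your own file name.
```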

How Unicode Plays a Part in Your Software – Encoding & Character Sets

Introduction

There was a time when you could determine the size of a file by counting the number of characters it had. One character equates to one byte. Simple. In fact, it was how I found the office perpetrator who printed out a nasty letter for everyone to see. I went through all the print logs and counted the bytes.

In many cases, this is still true. However, for languages such as Chinese, with thousands of characters, 8 bits (2^8 = 256 values) are not enough. For this reason a multitude of encoding standards (ISO-8859, Mac OS Roman, Big5, MS-Windows char sets, etc) have been implemented, but it has been a headache to keep them consistent across applications and delivery systems. In some cases, mixing multiple encodings or character sets in one document would require yet another encoding standard or would just be impossible. This applies not only to text documents, but to web pages and databases as well.

We needed a standard that encompasses it all. That standard is called Unicode.

What is Unicode?

Unicode is just a giant mapping table of numbers (code points) to characters. That’s about it. The kicker is that it includes every character imaginable on this planet. Basically it’s the superset of all character sets in existence today. It even includes ancient scripts like Egyptian Hieroglyphs, Cuneiform, and Gothic. The characters make up the code space.

Unicode encodings (e.g., UTF-8) specify how these code points are represented as bits.
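
To see the difference between a code point and its encoded bytes, here is a small Ruby sketch using the euro sign (U+20AC) as an arbitrary example:

```ruby
euro = "\u20AC"               # the euro sign as a UTF-8 string

puts euro.ord                 # prints: 8364  (the code point, in decimal)
puts euro.ord.to_s(16)        # prints: 20ac  (the same code point, in hex)
puts euro.bytes.inspect       # prints: [226, 130, 172]  (its UTF-8 bytes)

puts euro.length              # prints: 1  (one character)
puts euro.bytesize            # prints: 3  (three bytes in UTF-8)
```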

Unicode consists of 17 planes of 65,536 (= 2^16) code points each. That’s 17 × 65,536 = 1,114,112 code points, enough to map all past, present, and future characters created by mankind. The first plane, the Basic Multilingual Plane (BMP), contains the most commonly used characters.

What’s the difference between a character set and an encoding?

Character sets are technically just lists of distinct characters and symbols. They could be used by multiple languages (e.g., Latin-1 is used for the Americas and Western Europe).

Encoding is the way these characters are stored in memory. An encoding maps these characters to a binary representation.

Character sets that have encodings are called coded character sets. Unsurprisingly, this gets a bit confusing because many systems use the terms interchangeably. For example, MySQL refers to characters together with their encodings simply as a character set. What it really means is a coded character set (or code page).

Every encoding must have a character set associated with it but a character set could have multiple encodings. The most relevant example of this is the Unicode character set with multiple encodings (UTF-8, UTF-16 BE, UTF-16 LE, UTF-32, etc). The same character in one encoding could be represented by a larger/smaller number of bytes in another encoding.

This W3C article does a fine job explaining this.

What are code pages?

It’s mostly a Microsoft Windows-specific encoding that is based on standard encodings with a few modifications. It could also be generically a coded character set.

UTF-8 vs UTF-16 vs UTF-32

UTF-8

  • Variable-length encoding with 8-bit code units (1 to 4 bytes per code point)
  • Backward compatible with ASCII without having to deal with endianness or byte order marks (BOM). The first 128 characters correspond one-to-one with ASCII.
  • Characters could be various lengths, which could make indexing into a string and locating a code point slow.

UTF-16

  • Variable-length encoding with 16-bit code units (2 or 4 bytes per code point)
  • Great if ASCII doesn’t dominate the document. Most East Asian characters need 2 bytes in UTF-16, whereas in UTF-8 they need at least 3.
  • If using primarily US-ASCII strings, there will be lots of null bytes.

UTF-32

  • Fixed-length 32-bit code units (4 bytes per code point)
  • You don’t need to decode the code point as it’s given to you in its purest 32-bit format.
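
The size differences could be checked with a short Ruby sketch. The encoded_sizes helper and the three sample characters are chosen for illustration, and the little-endian variants are used so no byte order mark gets added to the count.

```ruby
ENCODINGS = %w[UTF-8 UTF-16LE UTF-32LE].freeze

# Returns a hash of encoding name => number of bytes needed for the text.
def encoded_sizes(text)
  ENCODINGS.to_h { |name| [name, text.encode(name).bytesize] }
end

samples = { "A" => "A", "U+4E2D" => "\u4E2D", "U+1F600" => "\u{1F600}" }

samples.each do |label, character|
  sizes = encoded_sizes(character)
  puts "#{label}: " + sizes.map { |name, bytes| "#{name}=#{bytes}" }.join(", ")
end
# prints:
# A: UTF-8=1, UTF-16LE=2, UTF-32LE=4
# U+4E2D: UTF-8=3, UTF-16LE=2, UTF-32LE=4
# U+1F600: UTF-8=4, UTF-16LE=4, UTF-32LE=4
```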

How do character sets and encodings relate to fonts?

A font defines the “glyphs” for usually a single character set or a subset of a character set. If there’s a character undefined in the font, you’ll typically get a replacement character like a square box or question mark.

Basically, fonts are glyphs that are mapped to code points in a coded character set.

Conclusion

At this time, most systems are using UTF-8. It’s efficient as far as storage (as long as it’s mostly ASCII characters). It has the possibility of mapping any character imaginable so there’s really no reason not to use it.

When you type on your keyboard, you’re using a certain encoding scheme. When you save that file and display the text again using the same encoding, you’ll get consistent results. The biggest problem we run into is seeing random-looking characters in our files. The usual explanation for this is that the encoding used to view the file is not the one it was saved with.
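
Here is a small Ruby sketch of how that happens, using “café” as an arbitrary example: the text is saved as UTF-8 and then read as if it were ISO-8859-1.

```ruby
original = "caf\u00E9"        # "café", stored as UTF-8

# Pretend the same bytes are ISO-8859-1, then convert them for display.
misread = original.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts misread                  # prints: cafÃ©

# Nothing was lost: reading the bytes with the right encoding restores it.
restored = misread.encode("ISO-8859-1").force_encoding("UTF-8")
puts restored == original     # prints: true
```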

It’s important to note: conversion from one encoding to another is not for the faint of heart. You have to know what you’re doing or you’ll lose your original bits forever. Sometimes it’s not even possible to perform the conversion.

From this point forward, a byte no longer equates to a character. Be wary of the encoding scheme used, especially if you start to see snowmen and cellphones in your CSV file.

Bottom line: Use UTF-8.