Author Archives: Lance

s_mach.concurrent: Futures utility library

Versions

  1. Scala 2.11.0
  2. s_mach.concurrent 0.1

Overview

s_mach.concurrent is an open-source Scala utility library that extends the standard scala.concurrent library. It adds new types and functions for easily controlling concurrent execution flow and helps to fix a fundamental flaw in the standard scala.concurrent.Future implementation.

Imports

All code examples assume the following imports:

Concurrently

When first using Future with a for-comprehension, it is natural to assume the following will produce concurrent operation:

Example 1: Incorrect Future concurrency

Sadly, this code will compile and run just fine, but it will not execute concurrently. To correctly implement concurrent operation, the following standard pattern is used:

Example 2: Correct Future concurrency:
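As a minimal sketch of the two patterns contrasted here, using only scala.concurrent: the slowInt helper and the object name are hypothetical and are not part of s_mach.concurrent.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureConcurrencySketch {
  // Hypothetical helper: a computation that takes a while to produce n.
  def slowInt(n: Int): Future[Int] = Future { Thread.sleep(100); n }

  // Example 1 style: each Future is created inside the for-comprehension,
  // so the second one is not started until the first has completed.
  def serial(): Future[Int] =
    for {
      a <- slowInt(1)
      b <- slowInt(2)
      c <- slowInt(3)
    } yield a + b + c

  // Example 2 style: all three futures are started first, then combined.
  def concurrent(): Future[Int] = {
    val f1 = slowInt(1)
    val f2 = slowInt(2)
    val f3 = slowInt(3)
    for {
      a <- f1
      b <- f2
      c <- f3
    } yield a + b + c
  }
}
```

Both methods produce the same value; the difference is only in when each Future is started.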

To get concurrent operation, all of the futures must be started before the for-comprehension. The for-comprehension is a monadic workflow. It captures commands that must take place in a specific sequential order. The pattern in example 2 is necessary because Scala lacks an applicative workflow: a workflow that captures commands that may be run in any order. s_mach.concurrent adds an applicative workflow method for futures: concurrently. This method can more concisely express the pattern above:

Example 3: New concurrently method

In the example above, all futures are started at the same time and fed to the concurrently method. The method returns a Future[(Int,Int,Int)] which completes once all supplied futures complete. After this returned Future completes, the tuple value results can be extracted using normal Scala idioms. The concurrently method also fixes problems with scala.concurrent exception handling (see the Under the hood: Merge section below).

Transforming and traversing collections serially and concurrently

A common task when working with futures is either transforming or traversing a collection by calling a method that returns a future. The standard idiom for performing this task only provides methods for concurrent operation and, with enough nesting, leads to difficult-to-read code:

Example 4: Transform and traverse collections, standard method
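A minimal sketch of the standard-library idiom, assuming two hypothetical asynchronous lookups (readName and readTags). It is meant only to show how the nesting grows, not to reproduce the original listing.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object StandardTraverseSketch {
  // Hypothetical asynchronous lookups.
  def readName(id: Int): Future[String] = Future(s"item-$id")
  def readTags(name: String): Future[List[String]] =
    Future(List(s"$name-a", s"$name-b"))

  // "map": ids.map starts every future at once; Future.sequence then turns
  // List[Future[String]] into Future[List[String]].
  def names(ids: List[Int]): Future[List[String]] =
    Future.sequence(ids.map(readName))

  // "flatMap": a second level of lookups needs a nested sequence and a flatten.
  def tags(ids: List[Int]): Future[List[String]] =
    Future.sequence(ids.map(readName)).flatMap { ns =>
      Future.sequence(ns.map(readTags)).map(_.flatten)
    }
}
```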

The same code, rewritten using s_mach.concurrent:

Example 5: Using s_mach.concurrent to serially or concurrently transform and traverse collections:

Transforming and traversing collections using workers

Example 6: Using s_mach.concurrent workers to transform and traverse collections:

Under the hood: Merge method

Powering both the general concurrently method and the collection .concurrently.map, .concurrently.flatMap and .concurrently.foreach methods are the merge and flatMerge methods. The merge method performs the same function as Future.sequence (it calls Future.sequence internally), but it ensures that the returned future completes immediately after an exception occurs in any of the futures. Because Future.sequence waits on all futures in left-to-right order before completing, an exception thrown at the beginning of the computation by a future at the far right will not be detected until after all other futures have completed. For long-running computations, this can mean a significant amount of time wasted waiting on futures whose results will be discarded. Also, while the Scala parallel collections correctly handle multiple concurrent exceptions, Future.sequence only returns the first exception encountered; all further exceptions are discarded. The merge and flatMerge methods fix these problems by throwing ConcurrentThrowable. ConcurrentThrowable has a member method to access both the first exception thrown and a future of all exceptions thrown during the computation.
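The fail-fast idea can be sketched with a Promise from the standard library. This is not the s_mach.concurrent implementation: failFastSequence is a hypothetical name, and the sketch only reports the first failure (it does not collect all exceptions the way ConcurrentThrowable is described as doing).

```scala
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object FailFastSketch {
  // Completes with all results, or fails as soon as ANY input future fails,
  // regardless of that future's position in the list.
  def failFastSequence[A](futures: List[Future[A]])(
      implicit ec: ExecutionContext): Future[List[A]] = {
    val promise = Promise[List[A]]()
    // Every input gets a callback that fails the promise on its first failure.
    futures.foreach { f =>
      f.onComplete {
        case Failure(t) => promise.tryFailure(t)
        case Success(_) => ()
      }
    }
    // The success path still relies on Future.sequence to gather the results.
    Future.sequence(futures).onComplete(promise.tryComplete)
    promise.future
  }
}
```

With a left-most future that never completes and a right-most future that has already failed, the returned future fails without waiting for the left one.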

Example 7: Future.sequence gets stuck waiting on longRead to complete and only returns the first exception:

Example 8: merge method fails immediately on the first exception and throws ConcurrentThrowable, which can retrieve all exceptions:

Concurrent Semaphore

TODO

Example 9: Semaphore

Concurrent Lock

TODO

Example 10: Lock

ConcurrentQueue

s_mach.concurrent provides a basic concurrent queue trait, ConcurrentQueue, that allows for asynchronous buffering operations, including operations on collections of items. Currently only one implementation, ConcurrentListQueue, is provided.

Example 11: ConcurrentListQueue

Utility methods

s_mach.concurrent provides a few utility methods for writing more concise and DRY code when working with Future:

Example 12: Utility methods

Sugar methods

s_mach.concurrent also provides a number of syntactic-sugar methods for writing more concise and DRY code when working with Future:

Example 13: Sugar methods

Scala: Type-classes

Versions

  1. Scala 2.11.0

What is a type-class?

A type-class is a type that is used to add new behaviors to a “primary” type, without having to extend or modify the primary type [1]. In Scala, there is no native support for type-classes. Instead, type-classes are implemented by following a design pattern [2]. The most common design pattern is to create a trait for the type-class that accepts one type parameter as the primary type (though more are supported [6]) and to define one or more abstract methods that add new behaviors to the primary type.

Example

  1. Printable accepts one primary type parameter A
  2. Printable adds a new print method to A

A type is added as a member of a type-class by implementing the type-class for that type:

Note
Simple type-class implementations can often be automatically created using macros or implicits.

In Scala, some “syntactic sugar” can be used to make using type-classes easier:
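The three steps described so far (the type-class trait, a member type, and the sugar) can be sketched together as follows. Printable and print come from the text; the String return type, the instance name and the PrintableOps wrapper are assumptions made for this sketch.

```scala
object PrintableSketch {
  // The type-class: one type parameter A (the primary type) and one new behavior.
  trait Printable[A] {
    def print(a: A): String
  }

  object Printable {
    // Int becomes a member of the type-class by implementing Printable[Int].
    implicit val printableInt: Printable[Int] = new Printable[Int] {
      def print(a: Int): String = s"Int($a)"
    }
  }

  // Sugar: lets callers write value.print for any A that has a Printable[A].
  implicit class PrintableOps[A](val self: A) extends AnyVal {
    def print(implicit p: Printable[A]): String = p.print(self)
  }
}
```

After `import PrintableSketch._`, calling print on an Int value compiles because an implicit Printable[Int] is found, while calling it on a type with no instance is rejected by the compiler.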

So what happens if we invoke print on a type that doesn’t belong to the Printable type-class?

Compiler error! Scala code that doesn’t properly implement a type-class will never compile.

Why use type-classes?

Type-classes are used to create “ad hoc” polymorphism. They allow adding types as members of a type-class at any time, including after the original definition of the type. Object Oriented Design (OOD) uses inheritance to create polymorphism, but this requires knowing all possible desired instances of polymorphism at design time (though the Visitor Pattern can be used to alleviate this problem somewhat). If further polymorphism is required, most likely many types will need to be refactored.

Type-classes are also highly modular. Users that aren’t interested in adding the new behavior can simply ignore the type-class. This can significantly reduce the complexity of the primary type. Often when using type-classes and case classes in Scala, the case class doesn’t need methods at all! Also, users of type-class libraries can select specific implementations of a type-class for the primary type that best suits their specific needs at that moment. In Scala, this is controlled by importing the desired type-class implementation:

  1. A wrapper object is used here to allow for better control of where the implicit is imported
  2. Defining a custom Printable for Int
Note
If you pasted the previous examples into the console, you will need to restart the console to remove the global implicit for Printable[Int].
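A minimal sketch of selecting an implementation by import. The default instance lives in the type-class companion object, and a hypothetical HexPrintable wrapper object holds the custom one; an imported implicit takes precedence over one found through the companion. All names other than Printable are illustrative.

```scala
object CustomPrintableSketch {
  trait Printable[A] {
    def print(a: A): String
  }

  object Printable {
    // Default instance, found through the companion when nothing is imported.
    implicit val defaultInt: Printable[Int] = new Printable[Int] {
      def print(a: Int): String = a.toString
    }
  }

  // Wrapper object: the custom instance is only visible where it is imported.
  object HexPrintable {
    implicit val hexInt: Printable[Int] = new Printable[Int] {
      def print(a: Int): String = Integer.toHexString(a)
    }
  }

  def show[A](a: A)(implicit p: Printable[A]): String = p.print(a)

  // No import: resolves to Printable.defaultInt.
  def plain: String = show(255)

  // Local import: resolves to HexPrintable.hexInt.
  def hex: String = {
    import HexPrintable._
    show(255)
  }
}
```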

An Extended Example

When I first read about type-classes, I found it difficult to understand what marginal value they added over classic OOD polymorphism. But over time, I’ve grown to love type-classes. For me it took encountering the many painful refactorings that ultimately result from overuse of classic OOD polymorphism. Refactoring gets old quickly. I’ve built this example to help illustrate this idea.

Once upon a time, I created a very basic inheritance structure for modeling the tools in my shed:

This classic OOD model suited my needs and got the job done for a long, long time.

But one day I realize I can’t find my hammer. I’m working on my new IKEA shelf and I just have some finishing nails that I need to hammer in to finish. I poke around my shed and realize my favorite screwdriver (“big bertha”) could probably get the job done! I awkwardly pound my finishing nails in using bertha, but my post-IKEA-assembly-bliss is cut short. I have a problem: Screwdrivers can pound! I’m in a hurry to get my new shelf into my house, so I quickly refactor my model:

This is far from ideal, but I’m in a hurry, so I commit my code and call it a day. Later that night, I’m restless in bed. I realize that if I were to loan my tools to a neighbor, he might assume that because my tool model has the pound method, he can pound things with any of my tools. That could break my rake. I made sure he can’t actually do that, but my model shouldn’t give him the idea at all. The next morning, I refactor again:

Much better! My neighbor will no longer assume he can use my rake to pound things. Though I’ve created a class that doesn’t really represent anything real. Also, the more I think about the stuff in my shed, the more I realize there is a ton of stuff in there that could pound things. I could have used some of my spare piping to pound things as well! If I want to represent this I will have to refactor again!

Luckily, I spend some time searching the web and discover the pattern to end all this nasty refactoring: type-classes. I refactor one final time:

Perfection! No refactoring needed ever again. As I find things around my shed that can pound, I simply add a new type-class implementation. Also, I can do the same for things that could turn screws or gather leaves. Super flexible!
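A minimal sketch of what such a final refactoring might look like. The tool classes, the CanPound name and the message strings are all assumptions made for illustration.

```scala
object ShedSketch {
  // Plain data: no inheritance hierarchy and no methods on the tools.
  case class Hammer(name: String)
  case class Screwdriver(name: String)
  case class Rake(name: String)

  // The type-class: any type with an instance can pound.
  trait CanPound[A] {
    def pound(tool: A, target: String): String
  }

  object CanPound {
    implicit val hammerCanPound: CanPound[Hammer] = new CanPound[Hammer] {
      def pound(tool: Hammer, target: String): String =
        s"${tool.name} drives the $target"
    }
    implicit val screwdriverCanPound: CanPound[Screwdriver] =
      new CanPound[Screwdriver] {
        def pound(tool: Screwdriver, target: String): String =
          s"${tool.name} awkwardly drives the $target"
      }
    // Deliberately no instance for Rake: pound(Rake(...), ...) does not compile.
  }

  def pound[A](tool: A, target: String)(implicit ev: CanPound[A]): String =
    ev.pound(tool, target)
}
```

Supporting spare piping later would mean adding one more case class and one more instance, without touching the existing types.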

When to use OOD polymorphism

Some folks might want you to think that you should always use type-classes. But in Scala they require significantly more boilerplate to implement. Also, because Scala doesn’t natively support type-classes, code readers must know the Scala type-class pattern to understand how they work.

I’ve found that the best time to use OOD polymorphism over type-classes is when all of the possible polymorphic methods are known up front and expansion to future use cases is unlikely. A great example of this is the Scala collections library. It is very unlikely that a new method will be added to IndexedSeq or that Traversable will suddenly need the ability to get a value by its index. On the flip side, implementing the collections library with only type-classes would introduce a ton of complexity. Each method on Traversable would need its own type-class. That’s at least 50 type-classes for 50 methods! (Though this number could be reduced significantly by grouping related methods into a few type-classes. See StringOps and StringLike for examples.)

When to use type-classes

In choosing to use type-classes, I’ve found that the clearest use case for them is when I might need to add a behavior to almost any type. The best example of this is serialization/marshalling/binding, etc. Converting to and from JSON, BSON, XML, etc. is something that is commonly needed for almost every type. Also, sometimes I like to swap out implementations based on what I’m doing. I might have a different JSON serializer depending on the recipient of the JSON.

In many cases, the choice of OOD inheritance or type-classes to achieve polymorphism can be somewhat arbitrary. Scala gives me a ton of flexibility and the downside of all of that choice is that many times, at least within the context of Scala, the question is simply one of what color to paint my shed.

Scala: Cardinality prefix naming convention

Naming Option variables

Many times while writing Scala, I find myself having to look up the type of a variable that I’m working with. Often it is because I’m unsure whether the variable is an Option. This means having to depend on my IDE shortcut to show me the type or having to navigate to the source file.

  1. Is stateCode an Option?

Other times, I’m transforming a given Option variable into its base type and I’m forced to make up an awkward name for the inner name of the value contained in the Option:

  1. Ugly! What to name this value?!?

While this example is trivial, it gets complicated fast once you have 15 or so fields to deal with at once:

  1. Without having to flip back to this source file, I don’t know which fields are Option!

I’ve started using a super-simple naming convention that explicitly declares the cardinality of the variable to both avoid the naming problem and to increase readability of the code. For Option, I simply add the prefix “opt” to the Option variable:

  1. Ahhhh so much better. I know optPhoneNumber is an Option AND I have an obvious (and highly readable) name for the inner variable.
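A minimal sketch of the “opt” prefix in use, assuming a hypothetical Contact record. Only the optPhoneNumber name comes from the text.

```scala
object OptPrefixSketch {
  // Hypothetical record: the prefix shows at a glance which field is an Option.
  case class Contact(name: String, optPhoneNumber: Option[String])

  def describe(contact: Contact): String =
    contact.optPhoneNumber match {
      // Dropping the prefix gives an obvious name for the inner value.
      case Some(phoneNumber) => s"${contact.name}: $phoneNumber"
      case None              => s"${contact.name}: no phone number"
    }
}
```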

Naming collection variables

There is also another problem that I encounter when naming things: what the heck do I name collection variables? For plural words in English, the convention is mostly to just append s. And this mostly works:

Except when it doesn’t work:

  1. Ack. Ugly!!
  2. Ugly and people think I don’t know basic English grammar. They won’t understand my need for regularity.

Also, there is a more serious problem here. If a collection has at least one item, it is always safe to call head. This condition can often be guaranteed at compile time, but how would anyone know? This leaves me with the same problem I had with null in Java: I must either defensively test everywhere for the condition OR I must depend on it being in the documentation.

  1. Maybe this is wrong? Hopefully, documentation guarantees that hotels is non-empty.
  2. Or I can be safe about this. But now I have an Option and my code grows ever more complex (for possibly no good reason if hotels is guaranteed to have at least one item).

In code I write now, I fix both of these problems by keeping the variable name in the English singular, and prefixing a plural variable prefix to the variable name:

  1. “zom”: Read as “zero or more”
  2. “oom”: Read as “one or more”
  3. “all”: Read as “all of them” (Most likely a very large collection, but technically equivalent to “zom”)
Example:

  1. Ugly. Also is head safe?
  2. I know head is not safe here
  3. I know there is at least one amenity. head is safe.
  4. I know there should be at least one amenity. head is probably safe. More importantly I should avoid excessive transforms on the collection, all those copies will take up a lot of memory!
  5. Bonus! Like Option above, I have an easy and readable decision for what to name the inner function parameter.
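A minimal sketch of the collection prefixes, assuming a hypothetical Hotel record. The require check is an addition for this sketch: the naming convention itself only documents the guarantee and does not enforce it.

```scala
object CardinalityPrefixSketch {
  case class Hotel(
    name: String,
    oomAmenity: List[String], // one or more: head is expected to be safe
    zomReview: List[String]   // zero or more: head is not safe
  ) {
    // Assumption for this sketch: enforce the "one or more" promise at runtime.
    require(oomAmenity.nonEmpty, "oomAmenity must contain at least one amenity")
  }

  def firstAmenity(hotel: Hotel): String = hotel.oomAmenity.head

  def firstReview(hotel: Hotel): Option[String] = hotel.zomReview.headOption

  // The singular base name is a natural name for the function parameter.
  def amenityLabels(hotel: Hotel): List[String] =
    hotel.oomAmenity.map(amenity => amenity.toUpperCase)
}
```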

Cardinality Prefix naming convention

Here is the full listing of the cardinality prefix naming convention:

  1. Name all variables in the English singular form
    • Ex: hotel, phoneNumber, amenity
  2. If the variable is an option, prefix with “opt”
    • Ex: optHotel, optPhoneNumber, optAmenity
  3. If the variable is a collection AND guarantees at least one member, prefix with “oom”
    • Ex: oomHotel, oomPhoneNumber, oomAmenity
  4. If the variable is a collection AND does not guarantee at least one member, prefix with “zom”
    • Ex: zomHotel, zomPhoneNumber, zomAmenity
  5. If the variable is a collection of all of the values (very large collection), prefix with “all”
    • Ex: allHotel, allPhoneNumber, allAmenity

Example

The large case class from above, after applying the naming convention:

  1. I don’t need to read the source to know optStateCode is an Option
  2. I know oomProviderMapping has at least one item and that head is safe to call

MongoDB: Get a random record

Get a random record in Mongo

A quick and simple function that picks a random record from a cursor:
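The original function is not shown here. As a stand-in, the following sketch picks a uniformly random element from any Iterator (used as a substitute for a database cursor) in a single pass, using size-one reservoir sampling. It does not call any MongoDB driver API, and the randomRecord name is hypothetical.

```scala
import scala.util.Random

object RandomRecordSketch {
  // Single pass over the cursor: the i-th record replaces the current choice
  // with probability 1/i, which leaves every record equally likely overall.
  def randomRecord[A](cursor: Iterator[A], rnd: Random = new Random): Option[A] = {
    var chosen: Option[A] = None
    var seen = 0
    while (cursor.hasNext) {
      val record = cursor.next()
      seen += 1
      if (rnd.nextInt(seen) == 0) chosen = Some(record)
    }
    chosen // None only when the cursor was empty
  }
}
```

A cursor that supports counting and skipping could instead skip a random number of records, at the cost of an extra count query.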

Example:

Scala: Collections of Futures

Version

  1. Scala 2.11.1

Overview

One of my favorite things about Scala is the amazing collections library. Scala’s collections library combines the best of standard functional idioms with an OOP call style. This makes for some downright beautiful code. However, after working with Scala for a bit over a year now, it has become very apparent to me that the Scala collections library was not written with asynchronous/reactive programming in mind.

Note
The Scala/Akka standard library Future is not lazy. Once a Future is constructed, it is “hot” and starts executing immediately. There are other libraries whose Future implementation is lazy (such as Scalaz [2]). Lazy futures allow building up an execution “plan” which is eventually run by the ultimate caller. This approach has many advantages, but sadly is not the direction Scala/Akka took. This article focuses exclusively on the Scala/Akka standard library Future [1].
Note

I’ve tried to make sure all of the code examples here can be pasted directly into the Scala console. Some boilerplate is required to make these work:

Make sure to paste this into the console before trying the code below.

Note
Scala has wonderful type inference which ordinarily makes it unnecessary to explicitly declare the type of vals. However, for maximum readability in my examples, I explicitly label the type of vals.

Calling a method that returns a Future N times

Once I decided to start using futures, they started to bleed into function return-type signatures everywhere. One place in particular that they started to show up was in the service layer:

Listing 1

Using a service layer like this was straightforward, until I needed to call it N times:

Listing 2

This was my first intuition about how to call the service N times. It seemed straightforward to me at the time and it even compiles and runs! But what this actually does is to immediately create a List[Future[Unit]] of 20 “hot” futures — all 20 futures have been submitted for execution! While this might be ok for 20 futures, it’s not ok for 1,000, 10,000 or 100,000+ futures.

Internally, the executor stores futures in a queue and executes as many futures as it has workers simultaneously. Dumping too many futures into the executor queue at once will starve other code that uses the same executor and can cause an out of memory error. Definitely not what I wanted.

Also, there is another problem here: each future returned by svc.doSomething is discarded by assignment to Unit. Not only am I not properly waiting on my futures to complete, but by assigning my Future to Unit, I’m throwing away any exception that might be thrown! Also not what I want.

Tip
Assigning a Future to Unit is always an error. Perhaps in the future the compiler will emit a warning about this, but for now the only way to discover it is by reviewing code, especially code from programmers who may be new to futures. Even worse, this code may not fail at runtime. For actions returned in futures that complete quickly and are 99% exception-free, this bug might go unnoticed for some time.

Never assign Future to Unit

So what can I do? I need to stop assigning Future to Unit. I can do this by using map instead of foreach. Also, I need a way to properly wait for all my futures to complete:

Listing 3

  1. Use map instead of foreach to ensure I don’t discard any futures
  2. Use Future.sequence to convert a List[Future[Unit]] to Future[List[Unit]]
  3. Properly wait on all my futures to complete. Also, now that I’ve waited on all my futures, I can safely discard my List[Unit] because no exceptions were thrown
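The three steps above can be sketched as follows. The Service class is a hypothetical stand-in for the service layer (only doSomething and someInts are named in the text), and the ten-second timeout is an arbitrary choice.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import java.util.concurrent.atomic.AtomicInteger

object WaitForAllSketch {
  // Hypothetical stand-in for the service layer; counts completed calls.
  class Service {
    val calls = new AtomicInteger(0)
    def doSomething(i: Int): Future[Unit] = Future { calls.incrementAndGet(); () }
  }

  def run(svc: Service, someInts: List[Int]): Unit = {
    // (1) map keeps every Future instead of discarding it
    val futures: List[Future[Unit]] = someInts.map(svc.doSomething)
    // (2) List[Future[Unit]] becomes Future[List[Unit]]
    val futResult: Future[List[Unit]] = Future.sequence(futures)
    // (3) block until all calls finish; a failed call rethrows its exception here
    Await.result(futResult, 10.seconds)
    ()
  }
}
```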

If any exceptions are thrown during svc.doSomething calls, the exception will percolate up through Await.result. But how can I stop dumping all futures into the executor at once?

Future.sequence and the functional “sequence” idiom

Most of this is standard Scala collections stuff, but what is Future.sequence? Future.sequence is a Future-specific version of the “sequence” functional idiom (not to be confused with the Scala collection type Seq). There is much more to know about “sequence”, but it is basically a way to invert the nesting of two monads, i.e. G[F[_]] to F[G[_]] (given certain properties of types G and F). Unfortunately, standard Scala requires some compiler magic [3] to generically implement sequence. Without resorting to this magic, I have to copy the sequence pattern for different monad types, such as has been done for Future.sequence.

In addition to making Future the outer monad, Future.sequence also takes care of waiting for all of our futures to complete. In Listing 3, futResult is of type Future[List[Unit]], which is completed once all of my inner futures complete.

Controlling the flow of Future execution

I’ve solved the problem of discarding futures, but I still need to somehow regulate the flow of how many futures go “hot” simultaneously.

Listing 4

  1. Group someInts into a group size that I want to execute simultaneously
  2. For each group create a Future[List[Unit]]
  3. Use Await.result inside the map to wait for each group to complete
  4. Because I divided someInts into groups, I need to flatten the results (Note: this isn’t strictly necessary since result is Unit in this example. I’m going to discard List[Unit] anyway, but if result wasn’t Unit it would be necessary to flatten)
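A minimal sketch of the grouping steps above. The Service class is a hypothetical stand-in that also records how many calls overlap, so the limit can be observed; the short sleep and the timeout are arbitrary.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import java.util.concurrent.atomic.AtomicInteger

object GroupedAwaitSketch {
  // Hypothetical service stand-in that tracks the peak number of running calls.
  class Service {
    private val inFlight = new AtomicInteger(0)
    private val lock = new Object
    private var max = 0
    def maxInFlight: Int = lock.synchronized(max)
    def doSomething(i: Int): Future[Unit] = Future {
      val now = inFlight.incrementAndGet()
      lock.synchronized { if (now > max) max = now }
      Thread.sleep(5)
      inFlight.decrementAndGet()
      ()
    }
  }

  def run(svc: Service, someInts: List[Int], groupSize: Int): List[Unit] =
    someInts
      .grouped(groupSize)                       // (1) lazily split into groups
      .map { group =>
        val futGroup: Future[List[Unit]] =      // (2) one Future per group
          Future.sequence(group.map(svc.doSomething))
        Await.result(futGroup, 10.seconds)      // (3) wait before the next group starts
      }
      .toList
      .flatten                                  // (4) List[List[Unit]] to List[Unit]
}
```

Because grouped returns an Iterator, the futures of a group are only created once the previous group has been awaited.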

Ok this works. I’ve ensured that no more than N svc.doSomething calls are happening at once and exceptions are never discarded. However, this pattern has a fatal flaw. It does not pass a future back as a result. For the purposes of writing example code, this kind of thing gets the job done. However, when writing code that will live in an asynchronous ecosystem, I must make my result a Future.

Tip
When writing a method that calls other functions or methods that return Future, I need to make sure to return a Future to callers of my method. This allows callers to use the Future of my method’s return value in the same way that I did when I called other methods that returned me a Future.

Returning a Future to callers

This has gotten complicated fast! But I feel like I’m almost there, so I will keep going. I’m modifying Listing 3 to ensure my result is a Future:

Pattern 1.0

  1. I’ve replaced map with foldLeft. This will ensure that each group is processed one at a time, from left to right and will accumulate the Future[List[Unit]] result after each group completes. The accumulator is initialized with already completed Future of an empty List[Unit].
  2. Future.flatMap is used here instead of Future.map to flatten the inner return type of Future[List[Unit]] over the entire collection (If Future.map had been used, it would return Future[Future[List[Unit]]]).
  3. After a group completes, the result accumulates
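A minimal sketch of the foldLeft pattern described above, again with a hypothetical Service stand-in that records the peak number of overlapping calls.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import java.util.concurrent.atomic.AtomicInteger

object FoldLeftFutureSketch {
  // Hypothetical service stand-in that tracks the peak number of running calls.
  class Service {
    private val inFlight = new AtomicInteger(0)
    private val lock = new Object
    private var max = 0
    def maxInFlight: Int = lock.synchronized(max)
    def doSomething(i: Int): Future[Unit] = Future {
      val now = inFlight.incrementAndGet()
      lock.synchronized { if (now > max) max = now }
      Thread.sleep(5)
      inFlight.decrementAndGet()
      ()
    }
  }

  def run(svc: Service, someInts: List[Int], groupSize: Int): Future[List[Unit]] =
    someInts
      .grouped(groupSize)
      // (1) the accumulator starts as an already completed Future of an empty list
      .foldLeft(Future.successful(List.empty[Unit])) { (futAcc, group) =>
        // (2) flatMap: the next group's futures are created only after the
        //     accumulated Future has completed
        futAcc.flatMap { acc =>
          Future.sequence(group.map(svc.doSomething))
            .map(results => acc ++ results) // (3) accumulate the group's results
        }
      }
}
```

No Await appears inside run; the caller receives a Future and decides whether and when to block.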

Ok this is much better. I’m ensuring that I don’t discard exceptions, I control the flow of futures AND now I return a Future to callers. But I call a Future returning method N times in many places. This is a pretty tedious pattern to have to repeat everywhere. Scala gives me some amazing utilities for cleaning up complexity like this.

Pimp My Future: Pattern 1.1

I’m going to clean up Pattern 1.0 using a for-comprehension [7] and the Pimp-My-Library pattern [5] with the Scala value class [6]. The pimp-my-library pattern allows creating an implicit wrapper class that can “add” a method to an existing class, essentially making an OOP-style call convention for the new method. The value class (added in Scala 2.10) makes the implicit wrapper class free: the compiler optimizes away the wrapper in emitted bytecode.

Listing 4

Pattern 1.1

  1. Replaced Future.flatMap and nested Future.map with a much cleaner, more readable for-comprehension [7]
  2. Replaced Future.sequence with sugar method
  3. Replaced Await.result with sugar method

I like the OOP style call convention, but this pattern is still tedious. Perhaps, I can make this even simpler?

Pimp My Future: Pattern 1.2

I’m going to further clean up Pattern 1.1 by creating another pimp-my-library method on a new value class Future_PimpMyTraversableOnce.

Listing 5

  1. Convert to List here for efficient accumulation of results (and no grouped method on TraversableOnce)
  2. Convert back to desired collection
Pattern 1.2
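A minimal sketch of what such an enrichment might look like. It is simplified compared to the description above: it wraps Iterable rather than TraversableOnce and always returns a List instead of converting back to the original collection type. The GroupedFutureOps and mapGrouped names are hypothetical.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object GroupedFutureSketch {
  // Value class: adds mapGrouped to any Iterable without a wrapper allocation.
  implicit class GroupedFutureOps[A](val self: Iterable[A]) extends AnyVal {
    // Runs f over the collection, at most groupSize calls at a time,
    // and returns a Future of all results in the original order.
    def mapGrouped[B](groupSize: Int)(f: A => Future[B])(
        implicit ec: ExecutionContext): Future[List[B]] =
      self.toList // convert to List for grouping and cheap accumulation
        .grouped(groupSize)
        .foldLeft(Future.successful(List.empty[B])) { (futAcc, group) =>
          for {
            acc     <- futAcc                          // wait for earlier groups
            results <- Future.sequence(group.map(f))   // then start this group
          } yield acc ++ results
        }
  }
}
```

With the enrichment imported, a call site reduces to something like `someInts.mapGrouped(4)(svc.doSomething)`.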

Much better! My code is now simple, readable, idiomatic, doesn’t discard exceptions, doesn’t flood the executor with futures and returns a future to the caller!

Further Exploration: Problems with Pattern 1

Pattern 1 solves the problem of discarding exceptions, regulating the flow of “hot” futures and returning a future to callers, but it isn’t the most efficient way of handling this problem. Because it has to wait for each group to complete, one of the svc.doSomething calls could take an extra long time. If it does, even though the other futures in its group have completed, I have to wait for that one long call to complete before moving on to the next group. Ideally, I should make it so that there are always N futures running simultaneously instead of grouping them. Work for another day!

CentOS 6: Install RPM Forge Repo


Versions

  1. CentOS 6.5 x86_64

Install

Note

Replace $VERSION with the latest release. As of this writing, it is 0.5.3-1.

To discover the latest major/minor version, visit: http://pkgs.repoforge.org/rpmforge-release/


Verify

Scala: Use Option instead of null to represent not set

In Java, null is commonly used to represent an Object value that is not-set:

While Java developers are forced to depend on method documentation to understand the semantics of null, Scala developers can instead code the semantics directly into the type system using Option:

This Scala snippet not only eliminates the need to explicitly declare the getter/setter but also removes the need to document nullability. From this declaration, it can be safely assumed that bar is never null and that bar may always be safely dereferenced. By using Option, users of the class are forced to deal with the possibility of bar being not set. In Java, documentation can easily be overlooked. The Java compiler will have no complaints about the following:

This snippet is problematic not just because it might dereference null, but because a NullPointerException doesn’t give the user, administrator or even later developers any indication of why the error has occurred. Only a later inspection of the source code and documentation of the getBar method would reveal that the null (not-set) value was not taken into account. If this code was buried in an edge case that is rarely or never exercised, the bug might not surface for years. Scala Option forces users of the Foo class to deal with the not-set state before accessing the value:
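A minimal sketch of both the declaration and a caller. Foo and bar are the names used in the text; the String element type and the barLength helper are assumptions made for illustration.

```scala
object OptionNotSetSketch {
  class Foo {
    // None means "not set"; the field itself is never null.
    var bar: Option[String] = None
  }

  // The caller cannot reach the String without handling the not-set case.
  def barLength(foo: Foo): Int =
    foo.bar match {
      case Some(value) => value.length
      case None        => 0
    }
}
```

The same logic can also be written as `foo.bar.map(_.length).getOrElse(0)`.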

JVM: Memory Settings

Versions

  1. Java Oracle JDK 1.6.X

Heap

On program startup, Java allocates a certain amount of memory for the heap. This memory is used to fulfill requests for memory allocation by the program or libraries it calls. The Java heap space also has an upper limit that if reached, results in the java.lang.OutOfMemoryError[3] being thrown. Generally once running, there is no action a program can take to resolve this error. Instead, either set the maximum heap space higher when running the program or use a Java memory profiler to analyze memory usage (there might be a memory leak). It is also possible to redesign the algorithm to use the disk as a temporary swap space, but this should be avoided unless absolutely necessary.

Java allows setting the minimum and maximum heap settings as follows:[1]

  1. -Xms$MIN_VALUE
  2. -Xmx$MAX_VALUE

Values are specified as multiples of 1024 bytes and must be greater than 1 MB. Append k for kilobytes, m for megabytes or g for gigabytes.

Examples:

  1. -Xms16k
  2. -Xmx32m
  3. -Xmx2g

If these values are not specified at program startup, Java will automatically determine the values to scale with the available system memory.[2]

PermGen

Java uses a separate pre-allocated memory region, called the PermGen, to store class files, string constants and other resources located in jar files. With the exception of very large programs, the default value for PermGen is sufficient. Issues with PermGen are more common when running an application server such as GlassFish or Tomcat that loads jar files on-demand. To fix issues with PermGen, simply raise the default value for the PermGen space. Note that it is possible to create a classloader leak[5] with on-demand jar file loading.

  1. -XX:MaxPermSize

Examples:

  1. -XX:MaxPermSize=128m

Application Server Tuning

It is recommended to run application servers such as GlassFish or Tomcat with the following two settings to optimize garbage collection of unused jar resources in PermGen when an application is undeployed.[7]

  1. -XX:+CMSClassUnloadingEnabled
  2. -XX:+CMSPermGenSweepingEnabled

Scala: Use Option or Either to replace null as a failure indicator

While certainly not a recommended practice in Java, null is occasionally used to represent the failure of a simple operation:

Java developers must depend on documentation to explain if null is a possible return value, and if it is, what the meaning of null is. In this case, null is used to indicate the operation of readInt failed. Scala developers have a couple of options here:

In this Scala snippet, it is explicit that the function may not return a value. However, it is still not clear that None indicates failure and this should still be documented. Generally, I like to have as much information as possible about failures. This final Scala snippet uses Either to remove any ambiguity:
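A minimal sketch of the two alternatives, assuming readInt parses an Int from a String (the original signature is not shown, and readIntOption is a hypothetical name for the Option variant).

```scala
object ReadIntSketch {
  // Option: the type says a value may be missing, but not why.
  def readIntOption(s: String): Option[Int] =
    try Some(s.trim.toInt)
    catch { case _: NumberFormatException => None }

  // Either: Left carries the reason for the failure, Right the parsed value.
  def readInt(s: String): Either[Exception, Int] =
    try Right(s.trim.toInt)
    catch { case e: NumberFormatException => Left(e) }
}
```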

In this snippet, no documentation is required since it is explicit that this function either returns an Exception or a value. Callers of this function are forced to deal with the possible failure of the operation before accessing the return value. If the operation did fail, callers are provided with a possibly informative Exception to explain the failure.