
Scala Basics for Python Developers

Python is a great language: its syntax, standard library and scientific computing stack (numpy, scipy, scikit-learn, matplotlib and many others) are excellent. Whenever I have a problem at hand, it is my go-to language because of its extensive library support and community. However, every programming language has its shortcomings, and Python is no exception. It is not well suited to concurrent programming and parallelism, it is slow (compared to JVM-based languages), and its support for functional programming is limited, mostly to map and the functools module. Also, with the rise of big data, both Hadoop and Spark favor JVM-based languages (Java and Scala) for large-scale data processing, which adds one more item to Python's long-standing data processing disadvantages. You could bridge the gap through Jython, but Jython's development is lagging, and because of version differences you may not get the full feature set of Python with Jython.

In this post, I will compare variables, functions and classes in both languages. Admittedly, these are very basic constructs, but my hope is that this will be a good start for Python developers who want to learn more about Scala.

Scala

If you have not heard of Scala, it is a modern, JVM-based programming language with a succinct syntax. It supports both object-oriented and functional programming styles, along with many more advanced features, which I will say a little about shortly.

Why

Mainly for big data, to be honest. Adoption of Scala by companies is also good, and since Scalding and Spark in particular use Scala, I thought I should give it a shot. I have also been playing with and using Clojure for some time (I open-sourced a K Nearest Neighbor classifier in Clojure, if you are interested) and am enjoying it so far. I found it very powerful to be able to combine a Java library with Clojure, as it is pretty seamless with Leiningen. What I like about this hybrid approach is that you get to use all of these mature libraries written in Java while you write Clojure (in this case, Scala) on top of them. I think using JVM under the hood and building on top of long-time

Advantages

  • It is JVM-based, so you can use almost all of the available Java libraries (similar to Clojure in this sense).
  • It is a hybrid language that combines the object-oriented and functional programming styles. If you want to use one style heavily and the other not so much, that works well, as the language does not compromise on either.
  • It has type inference, unlike Java, and you can pass functions around freely, unlike Java. All in all, it is superior to Java in almost all aspects without compromising on speed.
  • Great pattern matching support. Some patterns can also be used in place of conditional statements.
  • Scala programs tend to be more succinct and terse. Compare this with the verbosity of Java and you can see a decrease in development time and an increase in productivity. This is partly because functional programming makes common operations much easier to express, but also because the data structures (especially the collections) are simply a better version of Java's. Also, classes provide nice shortcuts for Java beans, so you can get away with not writing getters and setters for basic classes.
  • Immutable data structures make concurrent programs much easier to write (similar to Clojure as well).

Disadvantages

  • There are many ways to do one thing. This is the biggest one. It must be avoided if a language wants to be readable by many people. But Scala's coverage is overwhelmingly large and it supports so many things that this could not be avoided in the design of the language, I guess.
  • Tooling is not great. Even though the language has been around for some time (~11 years), I am surprised by sbt and by how inferior the available tooling is. Hopefully, this will improve in the future.
  • Some people complain about how slow it is to compile large projects. I have not used Scala for large projects, but I can see how this might be quite problematic in environments where you want to iterate many times on a large project.
  • Syntax. There are a ton of features in the language, and some of them can be written with syntactic sugar. This is not a disadvantage per se, but it still contributes to the problem of having many ways to do one thing.
  • There is a native XML data structure. The following code is valid in Scala:
val xmlRepresentation =  <p><a href="http://bugra.github.io/">Bugra Akyildiz</a></p>

Even if this is not a disadvantage per se, it is weird. I would understand JSON, but XML? Really, Scala? I thought you were superior to Java, but it turns out that you are Java in some aspects!

Installation on Mac OS X

Homebrew

If you do not have Homebrew on Mac OS X, use the following command to install the package manager first:

ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"

Scala

Then, you can install Scala via Homebrew:

brew install scala

After installing Scala, you can type scala at the command line to start the interactive REPL.

Sbt

In the meantime, you will want to install sbt (Scala's build tool), which is similar to Maven and Ant in Java, if you are familiar with either of those.

brew install sbt

IDE Support

Scala has official support for Eclipse, and one can use IntelliJ IDEA for Scala through a plugin. I am using IntelliJ, so I installed the plugin (Preferences -> Plugins -> Scala). Installation is quite straightforward.

Variables in Scala

There are two different ways to define variables in Scala. The first is immutable (which is a nice fit for the functional programming style) and the other is mutable (surprise!). An immutable variable cannot be reassigned after it is defined, whereas a mutable one can. To define immutable variables, use val; for mutable ones, use var. In Scala, any string or integer can be declared either way, whereas Python has a predefined set of data types, some immutable (string and tuple) and some mutable (list and dictionary).

// String
val firstName: String = "Bugra"
// We could leave out the type and let type inference take over
val firstName = "Bugra" // firstName is still a String
// Similarly, we could always remove the type in the declaration
// and expect it to be inferred from the value
// Integer
val firstPrimeNumber = 2
// Double 
val doubleNumber = 3.0
// Long
val longNumber = 5L
// Up to this point, if you remove the types and val, everything
// works similarly in Python as well

// Characters are written with single quotes, as in Java
val firstChar = firstName(0)
// Symbols are a string-like representation useful for string interning,
// which also makes it easier to compare two strings
// There is no symbol equivalent in Python
val symbol = 'ha
// null, same as null in Java, similar to None in Python
val non = null
// List (First get Range and then convert it into list)
val numbers = (1 to 100).toList
// This works too
val anotherNumbers = List(0, 1, 2, 3, 4, 5, 6, 7)
// Tuple (very similar to Python: immutable and can contain different data structures)
val tuple = (firstName, doubleNumber, non, numbers)
// Access to tuple elements uses an interesting syntax and
// starts at 1
tuple._1 // => returns firstName
tuple._2 // => returns doubleNumber and so on
// XML(!), needless to say, Python does not have an equivalent
// of this
val xmlRepresentation =  <p><a href="http://bugra.github.io/">Bugra Akyildiz</a></p>
// Unit type, which is almost the same as void in Java
val unitType : Unit = ()
val unitType = () // same as above, type inference
// Dictionary-like Map, which can hold keys and values of different types
val blueWhale = Map("name" -> "Blue Whale", "weight" -> 170, "height" -> 30)
// Looking up a key is similar to Python (with a tweak): get returns an Option
blueWhale.get("name") // Some(Blue Whale)
blueWhale.getOrElse("color", "blue") // returns blue

With type inference, Scala generally feels like a dynamic language when it comes to variables, even though it is a statically typed language.

// Immutable 
val ii = 1
// Following gives: 
//<console>:9: error: value += is not a member of Int
//              ii += 1
//                 ^
// Int has no += method, and because ii is a val the compiler
// cannot rewrite the statement as ii = ii + 1
ii += 1
// A val cannot be reassigned
// It gives the following error
// <console>:8: error: reassignment to val
//       ii = 3
//          ^
ii = 3 
// Instead use mutable with var
var ii = 0
ii += 1 // ii = 1

Variables in Python

In Python, unlike Scala, mutability is a property of the type: some data structures are immutable (tuple and string) and some are mutable (list and dictionary).

# string
first_name = "Bugra"
# integer
first_prime_number = 2
# double
double_number = 3.0
# long (Python 3 has a single int type; Python 2 wrote this as 5L)
long_number = 5
# There is no character type in Python, this is a string as well.
# Also note that Scala uses parentheses rather than square
# brackets to access elements of collections, strings and arrays.
first_char = first_name[0]
# List (workhorse of Python), very useful, can contain
# different data structures
numbers = list(range(10))
# Tuple
tup = (first_char, first_prime_number, numbers)
# Indexing tuples and lists is the same, with square brackets
print(tup[0], numbers[1])  # B 1
# Dictionary, JSON-like hash map of Python
blue_whale = {
    'name': 'Blue Whale',
    'weight': 170,
    'height': 30,
}
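
To make the mutable/immutable split concrete, here is a minimal sketch, assuming Python 3. The names point, tuple_error and color are only illustrative. It shows that a tuple rejects in-place changes, that lists and dictionaries accept them, and that dict.get with a default plays roughly the role of Scala's getOrElse.

```python
# Immutable built-ins: str and tuple reject in-place changes.
first_name = "Bugra"
point = (1, 2)
try:
    point[0] = 10
except TypeError as error:
    tuple_error = str(error)

# Mutable built-ins: list and dict can be changed in place.
numbers = list(range(3))
numbers.append(3)

blue_whale = {"name": "Blue Whale", "weight": 170}
blue_whale["height"] = 30

# dict.get with a default is roughly what getOrElse does in Scala.
color = blue_whale.get("color", "blue")

# Any name can be rebound, even to a value of another type;
# Python has no runtime counterpart of Scala's val.
first_name = 42
```

Note that rebinding a name and mutating a value are separate things in Python: the last line succeeds even though strings themselves are immutable.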

Functions

There are a lot of ways to define functions in Scala, whereas in Python there are two: named functions and anonymous functions. This "there are a lot of ways" theme is common in Scala from here on. We will see in the next section that there are a lot of ways to define classes as well.

Functions in Python

You could define a function with a name, or define an anonymous function using a lambda expression and then assign it to a name. Lambda expressions are limited to a single expression and cannot contain statements. So, in practice, anonymous functions are generally used ad hoc.

def adder(x, y):
    return x + y
adder(3, 4) # 7
adder = lambda x, y: x + y
adder(3, 4) # 7 again
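
As a small sketch of what a lambda can and cannot do (assuming Python 3; safe_div and results are names made up for this example), a lambda accepts default arguments like any other function, but its body has to be a single expression:

```python
# A lambda may take default and keyword arguments like any function.
adder = lambda x, y=1: x + y

# Its body is one expression, so a conditional expression is fine,
# but statements such as `return`, `try` or assignment are not allowed.
safe_div = lambda x, y: x / y if y != 0 else None

results = [adder(3, 4), adder(3), safe_div(3, 4), safe_div(3, 0)]
```

Anything that needs a try block or several steps has to go into a regular def.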

Functions in Scala

The same adder can be defined in Scala in the following way (other than the types, it is very similar to Python):

def adder1(x: Int, y: Int): Int = {
    return x + y
}
// We could leave out the return type and the return statement
// The last expression is returned automatically and the return type
// is inferred by the compiler
def adder2(x: Int, y: Int) = {
    x + y
}
// The same function could be defined using anonymous function
val adder3 = (x: Int, y: Int) => x + y
// Since cases return values as well, we could use them as functions
val adder4: (Int, Int) => Int = {
    case (x, y) => x + y
}
// Curried Functions are interesting
def adderFactory(x: Int)(y: Int) : Int = x + y
// We could create our own adders using currying
val adder10 = adderFactory(10) _ 
adder10(5) // returns 15
// Closures could be created using this currying concept in functions

// If we only want the side effects of a function, we could do this
// It returns `Unit` and is also called a 'procedure'
// It does not accept any parameters either
def printer = {
    println("I am printer")
}
printer // prints "I am printer"
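
Python has no curried function syntax, but the adderFactory idea above can be approximated. The following is a rough sketch, assuming Python 3; adder_factory, add_to_x and adder20 are illustrative names. It shows two common options: functools.partial and a plain closure.

```python
from functools import partial


def adder(x, y):
    return x + y


# Partial application: fix x = 10 and get a one-argument function back.
adder10 = partial(adder, 10)


def adder_factory(x):
    """Closure-based version, closer in shape to adderFactory(x)(y)."""
    def add_to_x(y):
        return x + y
    return add_to_x


adder20 = adder_factory(20)
```

Here adder10(5) gives 15, matching the Scala example, and adder_factory(1)(2) can be called with two argument lists in a row, much like the curried Scala function.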

Classes

Both Python and Scala support the object-oriented programming style. Unlike Java, though, they do not enforce it. In that respect, they are similar. But as with functions, there are many ways to create classes in Scala, and it also has quite a bit of syntactic sugar for commonly used constructs in classes.

Classes in Python

Python classes can be defined as in the following:

class Operation(object):
    """ Arithmetic Operations on two numbers
    """

    def __init__(self, x, y):
        """ Arguments:
                    x, y(number-like):
        """
        self._x = x
        self._y = y

    def add(self):
        """ Add two numbers
        """
        return self._x + self._y

    def sub(self):
        """ Subtract two numbers
        """
        return self._x - self._y

    def mul(self):
        """ Multiply two numbers
        """
        return self._x * self._y

    def div(self):
        """ Divide two numbers
        """
        result = None
        try:
            result = self._x / float(self._y)
        except ZeroDivisionError as e:
            print(e)
        return result

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        self._y = value

    @y.deleter
    def y(self):
        del self._y

operation = Operation(3,4)
operation.add() # 7
operation.sub() # -1
operation.mul() # 12
operation.div() # 0.75
operation.x = 5
operation.y = 10
operation.add() # 15

In Scala, we can take a shortcut for the properties, as in the following:

// Mutable, we want to change the values as in the following
// val operation = new Operation(3, 4)
// operation.x = 5 
class Operation(var x: Int, var y: Int) {

        def add() : Int = {
            this.x + this.y
        }

        def sub() : Int = {
            this.x - this.y
        }

        def mul() : Int = {
            this.x * this.y
        }

        def div() : Double = {
            val doubleX = this.x.toDouble
            val doubleY = this.y.toDouble
            doubleX / doubleY
        }
}

// Object creation is very similar to Java
// Compared to Python, there is an extra "new"
val operation = new Operation(3,4)
operation.add() // 7
operation.sub() // -1
operation.mul() // 12
operation.div() // 0.75
operation.x = 5
operation.y = 10
operation.add() // 15
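
For comparison, the closest Python shortcut I am aware of is the dataclasses module. This is a sketch assuming Python 3.7 or newer, not part of the original comparison, and it only covers add and div. The x and y fields become plain mutable attributes, much like var x and var y in the Scala class.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    """Arithmetic operations on two numbers."""
    x: int
    y: int

    def add(self):
        return self.x + self.y

    def div(self):
        # True division, so two ints give a float.
        return self.x / self.y


operation = Operation(3, 4)
```

The decorator generates __init__ and __repr__ from the annotated fields, so the hand-written properties from the earlier Python class are not needed for simple cases.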