Next-gen DataScience with Kotlin, Krangl and Graal Substrate VM AOT (Ahead Of Time)

cloud
data-science
javascript
react
spring
kotlin
graal
krangl
coding

Next-gen DataScience: AOT-compiled Java and (Micro-)Services - on the cutting edge of 2019


In the following software-tech article we use Graal Substrate VM for a Kotlin- and/or Java-based service: cloud-native and Ahead-Of-Time (AOT) compiled. Currently this is en vogue, you know… buzzword bingo.

The reality is that AOT compilation requires design changes and refactoring. Old code will never die!

For new developments we can use this tech to create a Kotlin based backend that’s intended for DataScience needs. With Graal Substrate VM our backend can

  • take advantage of Kotlin and Krangl; the latter is a modern DataScience library
    • a fine use-case for reactive PostgreSQL and JSON(B)
  • deploy slim Docker-images that can handle CPU and memory-intensive BigData workflows with little overhead
  • use Micronaut as a framework to organize our workflows with Dependency Injection / Inversion of Control paradigms
    • highlight some ideas where Micronaut resembles Spring(Boot)
  • provide the results via GraphQL
    • make a shiny frontend with React-vis and React-grid

The nature of this tech is that you’ll dive into full-stack development. To keep things smooth we’ll stick to tabular data integration and exclude Machine / Deep / Reinforcement Learning.

But first things first: let’s generate some prototype stubs with boilerplate code to understand this a little better.

Autogenerate a boilerplate Micronaut project with Kotlin and AOT support

Since Graal 19 it’s possible to create native images (builds) of Java applications. Chances are that if it works for Java, it will work for Kotlin.

λ ~/Source/foiliage/ brew info micronaut
micronaut: stable 1.1.3
Modern JVM-based framework for building modular microservices
https://micronaut.io/

And to get our code:

λ ~/Source/foiliage/ mn create-app spring-df-stubs --lang kotlin --features springloaded,graal-native-image
| Generating Kotlin project...
| Application created at /Users/marius.ciepluch/Source/foiliage/spring-df-stubs

Set up the build env

The following examples are based upon GraalVM 19 (OpenJDK / Java 8 compatibility, June 2019). Graal CE supports Linux and macOS, and there is experimental Windows support.

λ ~/ export JAVA_HOME=/Users/marius.ciepluch/Downloads/graalvm-ce-19.0.0/Contents/Home

For reference:

(base) λ ~/ java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-20190420112649.buildslave.jdk8u-src-tar--b03)
OpenJDK GraalVM CE 19.0.0 (build 25.212-b03-jvmci-19-b01, mixed mode)

We set our JDK / JRE to Graal.

(base) at the prompt in the following examples refers to Anaconda Python 3, which is not relevant in this context. master* or something like that refers to a Git branch. I just didn’t edit the snippets, and interactive prompts are standard anyway.

Run code-gen to start a project

In order to simplify a lot of things, we can use Micronaut’s[1] code generator (mn) with the following features:

(base) λ ~/Source/microKrangl/nativeMicroDf/ mn create-app test.becsec.native \
--features=kotlin,graal-native-image
| Generating Kotlin project...
| Application created at /Users/marius.ciepluch/Source/microKrangl/nativeMicroDf/native

Now open the Kotlin project with your IDE of choice.


Touch base with generating a DataFrame from arbitrary JSON

As pointed out in the intro we want to take a look at a DataScience workflow, and therefore we deviate from the general examples and integrate Krangl[2].

Add the Krangl project to build.gradle

Add the compile dependency for Krangl:

maven { url 'https://jitpack.io' }
...
compile "de.mpicbg.scicomp:krangl:0.11"

Krangl is a DataScience library, which is inspired by Pandas and others.

Some people might now point towards Spark RDD objects, which can practically do the same (from Java or Kotlin). More on that below.
Others might point to Morpheus[3] or tablesaw[4]. These two frameworks cannot load arbitrary JSON into a DataFrame object directly, and I don’t intend to deal with POJOs before I have filtered the data-set.
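To make that concrete, here is a minimal, hypothetical Krangl sketch. The URL and the column names userId, id and title come from the jsonplaceholder data-set used later; everything else is made up for illustration. The JSON is loaded straight into a DataFrame and filtered without any POJO mapping:

import krangl.*

fun main() {
    // Load arbitrary JSON from a URL straight into a tabular DataFrame
    val posts = DataFrame.fromJson("https://jsonplaceholder.typicode.com/posts")

    // Inspect the inferred columns and types
    posts.schema()

    // Filter and project directly on the DataFrame - no POJOs involved
    posts.filter { it["userId"] eq 1 }
        .select("id", "title")
        .print()
}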

Add javax.inject and form a BeanContext

Micronaut implements JSR-330 (javax.inject), the Dependency Injection for Java specification. Hence we can use the annotations provided by:

compile group: 'javax.inject', name: 'javax.inject', version: '1'

Micronaut takes inspiration from Spring, and in fact, the core developers of Micronaut are former SpringSource/Pivotal engineers now working for OCI.

Unlike Spring which relies exclusively on runtime reflection and proxies, Micronaut uses compile time data to implement dependency injection.

This is a similar approach taken by tools such as Google’s Dagger, which is designed primarily with Android in mind. Micronaut, on the other hand, is designed for building server-side microservices and provides many of the same tools and utilities as Spring but without using reflection or caching excessive amounts of reflection metadata.

(Source: https://docs.micronaut.io/latest/guide/index.html)

Result: we have DI, which greatly enhances testability, which in turn is essential for CI / CD. This might sound very basic, but given that this is deployed as a native image, it is not.
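As a small, purely hypothetical sketch of what that looks like (the class names are invented for illustration), JSR-330 constructor injection is all we need, and Micronaut wires it at compile time:

import javax.inject.Singleton

// A hypothetical collaborator that Micronaut manages as a singleton bean
@Singleton
class GreetingService {
    fun greet(name: String) = "Hello, $name!"
}

// Constructor injection: resolved at compile time, no runtime reflection needed
@Singleton
class GreetingConsumer(private val greetingService: GreetingService) {
    fun welcome() = greetingService.greet("DataScience")
}

Because the dependency arrives via the constructor, tests can construct the bean directly instead of bootstrapping a full context.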

Use the DataFrame object within a Micronaut controller

Create a super-short Kotlin service class with a one-liner for tabular integration of an arbitrary JSON data-set from a URL:

package backend.services

import javax.inject.Singleton
import com.google.gson.GsonBuilder
import krangl.DataFrame
import krangl.fromJson

interface GenericJSONDataRequestService {
    val apiResponse: String
}

@Singleton
class GenericJSONDataRequestImpl : GenericJSONDataRequestService {

    // Arbitrary JSON endpoint; the response is loaded straight into a Krangl DataFrame
    private val apiEndpointURL = "https://jsonplaceholder.typicode.com/posts"
    private val apiResponseDataFrame = DataFrame.fromJson(apiEndpointURL)
    // Gson is only used to serialize the DataFrame back to pretty-printed JSON
    private val outputMapper = GsonBuilder().setPrettyPrinting().create()

    override val apiResponse: String
        get() = outputMapper.toJson(apiResponseDataFrame)

}

This is where we want to be in general, from a purely technical perspective:

  • no POJOs before filtering - check
  • Singleton Bean for Inversion of Control / Dep. Injection - check
  • Serializing and de-serializing JSON - check

The apiResponseDataFrame object allows in-memory operations on a data-set. Spark RDDs are suitable for large data-sets that do not fit in memory.
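To illustrate what such in-memory operations could look like, here is a sketch that assumes Krangl’s dplyr-style verbs count and sortedByDescending; the userId column again comes from the jsonplaceholder data-set:

import krangl.*

// Hypothetical helper: number of posts per userId, computed entirely in memory
fun postsPerUser(posts: DataFrame): DataFrame =
    posts
        .count("userId")            // one row per userId with a count column "n"
        .sortedByDescending("n")    // busiest users first

A result like this could then be handed to the same Gson mapper as above.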

We can make this asynchronous later. Don’t worry about this here.

For the sake of completeness, here is the actual Controller class, which loads the BeanContext:

import backend.services.GenericJSONDataRequestImpl
import io.micronaut.context.BeanContext
import io.micronaut.http.MediaType
import io.micronaut.http.annotation.Controller
import io.micronaut.http.annotation.Get
import io.micronaut.http.annotation.Produces

@Controller("/req")
class RequestCtrl {

    @Get("/")
    @Produces(MediaType.TEXT_PLAIN)
    fun hello() : String  {

        // Start a BeanContext and resolve the singleton service for this request
        val req = BeanContext.run().getBean(GenericJSONDataRequestImpl::class.java)
        return req.apiResponse

    }
}
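Note that BeanContext.run() starts a fresh context on every request, which is fine for a stub. A more idiomatic sketch (my variation, not the generated code) would replace the class above and let Micronaut inject the service through the controller’s constructor:

import backend.services.GenericJSONDataRequestService
import io.micronaut.http.MediaType
import io.micronaut.http.annotation.Controller
import io.micronaut.http.annotation.Get
import io.micronaut.http.annotation.Produces

// Sketch: constructor injection instead of resolving the bean per request
@Controller("/req")
class RequestCtrl(private val service: GenericJSONDataRequestService) {

    @Get("/")
    @Produces(MediaType.TEXT_PLAIN)
    fun hello(): String = service.apiResponse
}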

Standard compilation via Gradle (for Dev)

Assuming that you don’t have any errors, this Gradle build passes:

λ ./gradlew assemble
λ ~/Source/microKrangl/nativeMicroDf/native/ master* java -jar build/libs/native-0.1.jar

AOT app compilation without Docker

The build times reflect a MacBook Pro 2017 (3.5 GHz Core i7). We can use the Gradle wrapper and then run the Graal AOT compiler:

λ ~/Source/microKrangl/nativeMicroDf/native/ ./gradlew assemble
λ ~/Downloads/graalvm-ce-19.0.0/Contents/Home/bin/native-image --no-server -cp build/libs/native-0.1.jar

Build and run the AOT-compiled DataScience server app with Docker

We need to add the library path because the native SSL / TLS libs are part of the Graal runtime.
The idea of AOT compilation is to ship slim containers that do not require the (entire) runtime. But we load JSON data from an arbitrary URL [5], so we have to add some libs.

If you see something like this, your Java library path is incorrect:

Java_sun_security_ec_ECDSASignature_verifySignedDigest

For the build you should also give the Docker VM approximately 4 GB of memory (the default for Docker on macOS is 2 GB). This is not a runtime requirement; it’s needed for the build only.

λ  time ./native \
-Djava.library.path=/Users/marius.ciepluch/Downloads/graalvm-ce-19.0.0/Contents/Home/jre/lib

Change the Dockerfile accordingly to include the libs.

FROM oracle/graalvm-ce:19.0.0 as graalvm
COPY . /home/app/native
WORKDIR /home/app/native
RUN gu install native-image
RUN native-image --no-server -cp build/libs/native-*.jar

FROM frolvlad/alpine-glibc
EXPOSE 8080
COPY --from=graalvm /home/app/native .
COPY --from=graalvm /opt/graalvm-ce-19.0.0/jre/lib/amd64 /opt/graalvm-ce-19.0.0/jre/lib/amd64
ENTRYPOINT ["./native", "-Djava.library.path=/usr/lib:/usr/local/lib:/opt/graalvm-ce-19.0.0/jre/lib/amd64"]

From the project directory you can use the auto-generated Docker build script, which should provide an output that looks partly familiar by now:

λ  ./docker-build.sh
Sending build context to Docker daemon  149.2MB
Step 1/9 : FROM oracle/graalvm-ce:19.0.0 as graalvm
19.0.0: Pulling from oracle/graalvm-ce
35defbf6c365: Pull complete
...
Status: Downloaded newer image for oracle/graalvm-ce:19.0.0
...
Downloading: Component catalog from www.graalvm.org
Processing component archive: Native Image
Downloading: Component native-image: Native Image  from github.com
Installing new component: Native Image licence files (org.graalvm.native-image, version 19.0.0)
Refreshed alternative links in /usr/bin/
Step 5/9 : RUN native-image --no-server -cp build/libs/native-*.jar
 ---> Running in dccebb135667
[native:7]    classlist:   7,845.98 ms
...
To run the docker container execute:
    $ docker run -p 8080:8080 native

The majority of the build time is taken by the native-image process. During this step Docker’s build workflow creates the AOT-compiled app. Afterwards it just gets copied from the Graal container into a slim Alpine Linux container.
Typically this build task will run on a beefy CI / CD server.

For this example we have the Docker image in our local registry. We can run it directly:

(base) λ ~/Source/microKrangl/nativeMicroDf/native/ docker run -p 8080:8080 native
10:48:35.380 [main] INFO  io.micronaut.runtime.Micronaut - Startup completed in 21ms. Server Running: http://dbc92503fa55:808

And we can use any browser to send a GET request to the server running within the Docker container:


And yes… this isn’t a full Business Intelligence reporting service controller. It’s a small stub example. Behind the scenes this does a lot.

Next steps: generate a React Component and use Micronaut's GraphQL support

This workflow naturally lends itself to React. We’ll just implement this with JSX and JavaScript.

Create a stub React project

We add the libs: react-grid-layout plus react-vis. Then check out Micronaut’s GraphQL support[6].

λ ~/Source/reactrnd master* npx create-react-app my-awesome-vis-app
λ ~/Source/reactrnd master* grep -A 5 depend package.json
  "dependencies": {
    "react": "^16.8.6",
    "react-dom": "^16.8.6",
    "react-grid-layout": "^0.16.6",
    "react-scripts": "3.0.1",
    "react-vis": "^1.11.7"

More on that… later… it’s summer 🙂


  1. https://micronaut.io ↩︎

  2. https://github.com/holgerbrandl/krangl ↩︎

  3. https://github.com/zavtech/morpheus-core ↩︎

  4. https://github.com/jtablesaw/tablesaw ↩︎

  5. https://e.printstacktrace.blog/graalvm-groovy-grape-creating-native-image-of-standalone-script/ ↩︎

  6. https://micronaut-projects.github.io/micronaut-graphql/latest/guide/ ↩︎