Tensorflow.JS - usage notes


This is a growing Wiki article about Tensorflow.JS that is part of a series on Machine Learning for Information Security domains.

If you are interested in the topic, it’s important to focus on the actual development rather than on the snake oil.

Our problems are man-made;
therefore they may be solved by man.
(John F. Kennedy in a speech, 1963)

Preliminary references:

Arvind Narayanan, "How to recognize AI snake oil" - PDF presentation, Princeton University

Mirror: MIT-STS-AI-snakeoil.pdf (1.1 MB)

We don’t need to add more Fear, Uncertainty, and Doubt (FUD) to the Information Security domains.

Real-world Information Security may provide the fidelity for Heuristics rather than for Machine Learning, in the sense that we can often only apply discovered practical methods to speed up the search for good-enough solutions[1].

It’s surprisingly difficult to build conclusive definitions of the involved terms, such as “Machine Learning” or “Heuristics”, or of the specific Information Security aspects, because a lot of groundwork is missing.


Tensorflow.JS - usage notes

The goal of Machine Learning is to train a model from input data.

The model can then be used to predict or infer output data. Tensorflow.JS is developed to deploy ML models in the browser (and/or in a JavaScript runtime such as Node.js).
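
For example, a model that was trained and exported elsewhere can be loaded in the browser and used for inference. This is only a minimal sketch: the model URL and the input shape are placeholders, not part of these notes.

import * as tf from '@tensorflow/tfjs';

// Load a pre-trained Layers-format model (the URL is a placeholder).
tf.loadLayersModel('https://example.com/model/model.json')
    .then((model) => {
        // Run inference on a single input sample (shape [1, 4] is illustrative).
        const input = tf.tensor2d([[1, 2, 3, 4]]);
        model.predict(input).print();
    });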

Short note on terminology and jargon


There are different schools of thought about the definitions of terms such as

  • Machine Learning and Data Science - “without being explicitly programmed”[2]
  • Deep Learning - supervised, semi-supervised and unsupervised
  • (Deep) Reinforcement Learning

Rather than reciting an opinion, why don’t you dive into the field using some of the MIT Deep Learning lectures[3]?

The first wave of approaches was pioneered in the 1940s[4] with the work of McCulloch and Pitts on the character and relations of nervous activity and neural events.

The second [wave] occurred in the 1960s with Rosenblatt’s perceptron convergence theorem and Minsky and Papert’s work showing the limitations of a simple perceptron.[5]

In the 1980s there was renewed interest in Artificial Neural Networks. In 1982, Werbos proposed back-propagation learning algorithms for multi-layer perceptrons.[6]

Today, the most rapidly progressing areas in Machine Learning are Natural Language Processing (NLP) and Deep Neural Networks.

Results
  • It’s important to understand the underlying concepts in relation to Tensorflow’s programming paradigm.[7] Otherwise, it’s not possible to use the results.
  • In a couple of years it’s going to be 100 years of Machine Learning and look what they have learned :stuck_out_tongue:

Technology preface: Node.JS

Here the Wiki sections are reduced to bare usage notes and code listings.

node --version
v12.10.0

ES 6 imports

» npm install --save esm
» grep start package.json
    "start": "node -r esm index.js"
» npm start

> [email protected] start /Users/marius.ciepluch/Source/tensorflow/firststeps
> node -r esm index.js

This way you get to use modern JS import syntax:

import * as tf from '@tensorflow/tfjs';

// Define a model for linear regression.
const model = tf.sequential();

Note: ignore for now that this import does not pull in the package with the Tensorflow C++ bindings[8].
The plain import is primarily useful to bridge the gap between pieces of Tensorflow.JS code that target different runtimes.
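
To make the snippet above do something end to end, here is a minimal sketch of the linear-regression example. The layer configuration and the training data (roughly y = 2x - 1) are illustrative assumptions, not part of the original listing.

import * as tf from '@tensorflow/tfjs';

// Define a model for linear regression: a single dense layer (one weight, one bias).
const model = tf.sequential();
model.add(tf.layers.dense({ units: 1, inputShape: [1] }));
model.compile({ loss: 'meanSquaredError', optimizer: 'sgd' });

// Made-up training data, roughly following y = 2x - 1.
const xs = tf.tensor2d([1, 2, 3, 4], [4, 1]);
const ys = tf.tensor2d([1, 3, 5, 7], [4, 1]);

// Train, then infer a value the model has not seen.
model.fit(xs, ys, { epochs: 250 }).then(() => {
    model.predict(tf.tensor2d([5], [1, 1])).print();
});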

Tabular integration with modern JavaScript

Data Science libraries consume either JSON- or CSV-formatted data, sometimes from files, sometimes from variables. Sometimes they invent custom formats. It’s messy.

Custom CSV loader with lodash - Shape and Dimension of a Tensor

Key Node.js library | Used version | Use case
Lodash              | 4.17         | data cleanup
ESM                 | 3.2          | ES 6 code

We may need custom loaders because tools and libs for statistical analysis (or analysts) have different styles. For example SPSS, Cognos, Tableau versus Excel, Power BI…

const FileSystem = require("fs");
const _ = require('lodash');

/**
 * Uses lodash for a custom CSV loader
 * @param filename - to read the CSV from
 * @param options
 */
function loadCSV(filename, options) {

    let data = FileSystem.readFileSync(filename, { encoding:"utf-8" });

    // split into rows, then split each row into columns (an array of string arrays)
    data = data.split("\n").map(
        row => row.split(",")
    );

    // drop empty rows (e.g. the trailing newline produces a row of [''])
    data = data.filter(row => row.length > 1 || row[0] !== '');

    // cleanup empty columns
    data = data.map(row => _.dropRightWhile(row, val => val === ''));

    console.log(data);

    return data;
}

The code does not use ES6 imports here. The synchronous file read into data is blocking (not very Node.JS-like), and the implementation is not complete. This is boilerplate code for a custom CSV loader.
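
As a sketch, the same loader could also be written in a non-blocking style with fs.promises (available in this Node.js version); the parsing steps stay the same. The sample output below is from the synchronous version above.

const { promises: fsPromises } = require("fs");
const _ = require('lodash');

/**
 * Non-blocking variant of the custom CSV loader (sketch)
 * @param filename - to read the CSV from
 */
async function loadCSVAsync(filename) {
    const raw = await fsPromises.readFile(filename, { encoding: "utf-8" });

    // split into rows, then split each row into columns
    let data = raw.split("\n").map(row => row.split(","));

    // drop empty rows (the trailing newline produces a row of [''])
    data = data.filter(row => row.length > 1 || row[0] !== '');

    // cleanup empty columns
    data = data.map(row => _.dropRightWhile(row, val => val === ''));

    return data;
}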

Load a CSV file
[
  [ 'sku', 'title', 'hardware', 'price' ],
  [ '12345', 'Hollow Knight', 'Nintendo Switch', '15.99' ],
  [ '35342', 'Dark Souls Remastered', 'Nintendo Switch', '39.99' ]
]

Thinking in terms of Tensors: this particular structure is an array of arrays. In Computer Science the usual mathematical terms are a vector (one dimension) or a matrix (two dimensions). All elements have the same data type.

  • We can see 2 dimensions (a nested array of arrays) - count the number of opening [s before the first element
  • The shape is [ 3, 4 ]: the number of elements along each dimension (3 rows, 4 columns)

This is important for understanding Tensor(flow) operations[9].
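
As a quick check, the same dimension and shape show up when a (numeric) array of arrays is turned into a tensor. The values below are made up for illustration; a real loader would first drop the header row and parse the numeric columns.

import * as tf from '@tensorflow/tfjs';

// A 2-dimensional tensor with shape [3, 4]: 3 rows, 4 columns.
const t = tf.tensor2d([
    [1,  2,  3,  4],
    [5,  6,  7,  8],
    [9, 10, 11, 12]
]);

console.log(t.rank);   // 2 - the number of dimensions
console.log(t.shape);  // [ 3, 4 ]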

A generic and simple CSV to JSON converter without hardcoded fields

This is rudimentary, but being able to exchange data is important: CSV is a tabular format and JSON is not.

Key Node.js library | Used version | Use case
csvtojson           | 2.0          | parser, converter
ESM                 | 3.2          | ES 6 code

/**
 * A custom converter CSV to JSON
 */

const csv = require("csvtojson");

function runCSVtoJSON() {

    // inline CSV sample; csvtojson trims whitespace around values by default,
    // so the template-literal indentation does not end up in the parsed values
    const myCSVdata =
        `col1, col2, col3
        val1, val2, val3`;

    csv({
        noheader: false,   // the first row contains the column names
        output: "json"
    })
        .fromString(myCSVdata)
        .then((csvRow) => {
            console.log(csvRow); 
        });
}

The result:

[ { col1: 'val1', col2: 'val2', col3: 'val3' } ]
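
csvtojson can also read directly from a file instead of a string, which fits the loader scenario above. A usage sketch; the file name is a placeholder.

const csv = require("csvtojson");

// Parse a CSV file into an array of JSON objects (the path is hypothetical).
csv({ noheader: false, output: "json" })
    .fromFile("./data/games.csv")
    .then((rows) => {
        console.log(rows);
    });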

A generic and simple JSON to CSV converter without hardcoded fields

This example excludes flattening of nested objects, etc.

Key Node.js library | Used version | Use case
jsonexport          | 2.4          | parser, converter
ESM                 | 3.2          | ES 6 code

const jsonexport = require('jsonexport');

/**
 * A custom generic converter for JSON data to a columnar CSV
 * @param jsonData - parsed JSON data to convert; falls back to an inline sample
 */
function runJSONtoCSV(jsonData) {

    // use the passed-in data, or an inline sample if none was given
    const myJSONdata = jsonData ||
        JSON.parse(`{ "col1" : "val1", "col2" : "val2", "col3" : "val3" }`);

    // verticalOutput: false -> header row plus value row (see result below)
    jsonexport(myJSONdata, { verticalOutput: false }, function (err, csv) {
        if (err) return console.log(err);
        console.log(csv);
    });

}

The result:

col1,col2,col3
val1,val2,val3

This way you can convert a JSON data-set (maybe from a REST endpoint) to other formats, such as Apache Arrow[10][11].
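
jsonexport also accepts an array of objects, which is exactly the shape csvtojson produces, so a CSV → JSON → CSV round trip is possible. A sketch using the values from the earlier example plus one made-up extra row:

const jsonexport = require('jsonexport');

// The array-of-objects shape that csvtojson returned above.
const rows = [
    { col1: 'val1', col2: 'val2', col3: 'val3' },
    { col1: 'val4', col2: 'val5', col3: 'val6' }
];

jsonexport(rows, function (err, csv) {
    if (err) return console.log(err);
    console.log(csv);
    // col1,col2,col3
    // val1,val2,val3
    // val4,val5,val6
});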

Results
  • basic code to ensure interoperability by converting data between formats
  • modern ES6 imports can be used in a Node.JS project (with npm or yarn)
  • an example of a Tensor, its shape, and its dimension

Tensorflow.JS - one step deeper


  1. https://stats.stackexchange.com/questions/300350/is-machine-learning-an-heuristic-method ↩︎

  2. https://ieeexplore.ieee.org/document/5391906 ↩︎

  3. https://deeplearning.mit.edu ↩︎

  4. https://www.cs.cmu.edu/~./epxing/Class/10715/reading/McCulloch.and.Pitts.pdf - McCulloch and Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity", Department of Psychiatry, University of Chicago, 1943 ↩︎

  5. IEEE theme feature - "Artificial Neural Networks: A Tutorial" - 1996 ↩︎

  6. P. Werbos, "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences" - 1974 ↩︎

  7. https://stackoverflow.com/questions/44210561/how-do-backpropagation-works-in-tensorflow ↩︎

  8. https://www.tensorflow.org/js/guide/nodejs - Vanilla CPU ↩︎

  9. https://www.tensorflow.org/js/guide/tensors_operations ↩︎

  10. https://github.com/trxcllnt/csv-to-arrow-js ↩︎

  11. https://arrow.apache.org ↩︎