CodeSnips

Sunday, September 30, 2012

OData with DataJS

If you've been rolling your own OData requests using jQuery, you might want to look at DataJS. It simplifies your code quite a bit and is really easy to use.

Here's an example of rolling your own request with jQuery's ajax method, followed by the DataJS equivalent:

// - Roll your own ajax (see OData version below)
$.ajax({
    type: "GET",
    url: "/OData.svc/PortfolioManagerGroups",
    headers: { "Accept" : "application/json",
               "MaxDataServiceVersion" : "2.0" },
    dataType: "json",
    success: listGroups,
    error: ajaxError
});


This is the DataJS version of the jQuery.ajax() example above:
OData.read("/OData.svc/PortfolioManagerGroups", 
            listGroups, odataError);


This example shows how to do a JSONP-formatted query in DataJS.

// I like to wrap this in a function
function readJsonp(qry, callback) {
    OData.defaultHttpClient.enableJsonpCallback = true;
    OData.read(qry, callback, odataError);
    OData.defaultHttpClient.enableJsonpCallback = false;
}

// A query would look like this. Callback should loop through data.results
var qry = "http://someplace.com/Accounts?$select=ACCT_CD,ACCT_NAME&$orderby=ACCT_CD";
readJsonp(qry, listAccounts);
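
For reference, the success callback gets the parsed response as its first argument, and for a feed the entries land in data.results. A hypothetical listAccounts for the query above might look something like this:

// Hypothetical callback for the query above - DataJS puts feed entries
// in data.results, with properties matching the $select fields.
function listAccounts(data) {
    $.each(data.results, function (i, account) {
        console.log(account.ACCT_CD + " - " + account.ACCT_NAME);
    });
}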

Wednesday, September 26, 2012

Personal BI Tool - PowerPivot for Excel

On the continuing odyssey to discover new ways to leverage OData services, I finally loaded up PowerPivot on the advice of one of my colleagues (ok - my boss), and I can only say wow.

There are some rough edges, mostly, I think, because I'm overtaxing my "workstation" (a VM running with 2GB of RAM and 1 CPU). However, even in this constrained environment I was able to run through the tutorial - which includes crunching numbers in a 2+ million row sales fact table.

The tools here are based on running SQL 2012 in an embedded mode within Excel (no need to install SQL tools - just Excel 2010 and this free add-in). So you get a lot of tooling to import data and manipulate it, as well as do calculations on it. This won't replace full-blown BI solutions, but it is pretty darn capable.

And, when you combine this with the ability to source data from databases, flat files, excel tables, AND OData (WCF Data Service) sources, this tool puts a lot of power in the hands of analysts.

PowerPivot download

PowerPivot Tutorial (highly recommended - takes about 2 hours to run through it)

Sourcing OData feeds into PowerPivot

Monday, September 24, 2012

WCF Data Service version 5 OData V3 Support

Finally, Entity Framework Code First DbContext is supported by WCF Data Services, along with additional features from the OData standard. Version 5.0 download here

There are a few steps you need to take to upgrade your WCF Data Service, though (details here).

Strangely, support for JSONP was left out of this new release. If you are using this workaround, as I am, it's a one-line fix: change the Accept header fix-up to application/json;odata=verbose

When you add a new WCF Data Service into a Visual Studio web project, you'll have to repeat the steps given above, unless you are willing to dig under the covers (I am not ;) and create an item template for the new version in Visual Studio.

A couple of other interesting links:
WCF Data Services Toolkit  - An OData framework for incorporating non-SQL (or more accurately non-Entity Framework) data sources into an OData service.

JayData - a LINQ-like JavaScript library that can use OData (it uses DataJS under the covers)

Saturday, January 7, 2012

Machine Learning - Supervised Learning

Just finished an online extension course on Machine Learning through the Stanford Engineering school. Excellent course.

Linear Regression

We learned about simple linear regression - fitting a line/curve to a 2D set of data points. This was then expanded to include 3 or more dimensions. This is useful for prediction and extrapolation where 2 or more variables (dimensions) represent correlated features (height and weight, for example). In a more complex scenario, we focused on predicting house prices based on various features such as square footage, number of rooms, number of floors, etc. One important lesson was the need to normalize/scale the features so they have similar ranges: Xs = (Xi - mean(X)) / range(X).
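
As a quick sketch of that scaling step (in JavaScript rather than the course's tooling, with made-up square footage values):

// Scale one feature so its values fall into a similar range as the others:
// Xs = (Xi - mean(X)) / range(X)
function scaleFeature(values) {
    var mean = values.reduce(function (a, b) { return a + b; }, 0) / values.length;
    var range = Math.max.apply(null, values) - Math.min.apply(null, values);
    return values.map(function (x) { return (x - mean) / range; });
}

// e.g. square footage is on a very different scale than "number of rooms"
var sqft = scaleFeature([2104, 1600, 2400, 1416, 3000]);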

Linear regression fits a straight line, but by applying increasing powers to each feature variable (x1, x2^2, x3^3, etc.) and passing the results into the gradient descent algorithm, you can fit a polynomial curve. Or take logarithms, roots, etc., to fit other types of curves.
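
For example, expanding a single feature into polynomial terms might look like this (a sketch; the feature scaling above matters even more here, since x^3 is on a much larger scale than x):

// Expand one feature x into [x, x^2, x^3] so the linear algorithm
// can fit a cubic curve to the data.
function polyFeatures(x) {
    return [x, Math.pow(x, 2), Math.pow(x, 3)];
}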

The algorithm which finds the best fit is called gradient descent. We form a hypothesis formula to predict our desired outcome (in this case price): h(X) = p1*x1 + p2*x2 + p3*x3...
The trick is to find the values of the vector P (p1,p2,p3,...) which give the best prediction. We determine how good/bad the prediction is by comparing the predictions h(X) against the known outcomes y from the training data in a cost function J = mean((h(X) - y)^2) - so the best fit equates to the lowest cost.

Gradient descent uses the derivative of the cost function, which gives the rate of change of the cost with respect to each parameter in P: p1 = p1 - a*derivative(J,p1)
'a' here is the learning rate, which decides how big a step along the cost curve to take. Each parameter is updated this way on every iteration. As it approaches the minimum point of the cost curve, the derivative essentially becomes 0 (zero slope), so it stops changing. At that point, the parameter value for that minimum has been found.

The algorithm then converges on the parameter values which, when fed into the hypothesis equation above, represent the best fit to the training data.
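
A bare-bones sketch of the whole loop in JavaScript (not the course's code - a single feature plus an intercept, no vectorization, and a made-up learning rate and data, just to show the shape of it):

// Hypothesis: h(x) = p0 + p1*x
// Cost:       J = mean((h(x) - y)^2)
// Update:     p = p - a * derivative(J, p)
function gradientDescent(xs, ys, a, iterations) {
    var p0 = 0, p1 = 0, m = xs.length;
    for (var iter = 0; iter < iterations; iter++) {
        var grad0 = 0, grad1 = 0;
        for (var i = 0; i < m; i++) {
            var err = (p0 + p1 * xs[i]) - ys[i];  // h(x) - y
            grad0 += err;                         // derivative w.r.t. p0
            grad1 += err * xs[i];                 // derivative w.r.t. p1
        }
        // constant factors from the derivative fold into the learning rate 'a'
        p0 = p0 - a * grad0 / m;
        p1 = p1 - a * grad1 / m;
    }
    return { p0: p0, p1: p1 };
}

// e.g. fit the scaled square footage values from the sketch above
// against made-up prices (in $1000s)
var fit = gradientDescent(sqft, [400, 330, 369, 232, 540], 0.1, 1000);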

Logistic Regression

Linear regression is good for predicting real-valued numbers, but in many cases we want to use a set of features to classify something as true or false - for example, whether a tumor is malignant or benign, or whether an email is spam or not. Linear regression does not work reliably in these cases because the output variable is discrete (0 or 1 here), not a continuous real value.

In this case we need a hypothesis function whose output is bounded between 0 and 1. We use a "sigmoid" function to bound the values: h(x) = 1 / (1 + e^(-theta*x)), where "theta" is a parameter vector similar to the one in linear regression. This function always produces values between 0 and 1 - essentially the probability that, given theta and x, the result y is true.

With this, we can create a cost function that compares the predicted values against the known y labels in the training data. We calculate the derivative of this function and again use gradient descent to find the optimal parameters (thetas).
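
A sketch of the pieces in JavaScript (single feature; the cost shown here is the log-loss form used for classification in the course rather than the squared error above):

// Sigmoid hypothesis: h(x) = 1 / (1 + e^(-z)), where z = theta * x
function sigmoid(z) {
    return 1 / (1 + Math.exp(-z));
}

// Log-loss cost comparing predictions hs (each between 0 and 1)
// against known labels ys (each 0 or 1)
function logisticCost(hs, ys) {
    var sum = 0;
    for (var i = 0; i < ys.length; i++) {
        sum += -ys[i] * Math.log(hs[i]) - (1 - ys[i]) * Math.log(1 - hs[i]);
    }
    return sum / ys.length;
}

// The gradient works out to the same shape as in linear regression, with the
// sigmoid hypothesis swapped in, so the same descent loop applies:
// theta = theta - a * mean((h(x) - y) * x)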

Neural Networks

Neural networks are a possibly much more efficient way to find the optimal parameters for a classification problem - such as the ones we solve with logistic regression - especially ones with a large number of classes. In logistic regression, our output classes are usually yes/no (k=2), or perhaps a small multi-class problem (k > 2 but less than 10), such as predicting the letter grades students will get based on certain habits they exhibit.

Neural networks extend to something like predicting the letters of the alphabet from the pixels (each one being a variable) in an image. That is k=26 classes, and for a 20x20 grayscale image, 400 input variables (each needing at least one parameter weight).

Logistic regression would take a lot of time and would examine many non-optimal parameter values as it searched. Neural networks have been found to converge more quickly to optimal parameters by calculating the parameter values in stages (or "layers"): outputs from an input layer feed one or more inner layers, which in turn feed an output layer.

We're still working with a hypothesis function and a cost function, and we use a numerical method to minimize the cost function. The cost function runs the neural network algorithm to determine a cost, along with an array of gradients (derivatives) for each parameter. These outputs are fed into the minimization function to calculate the optimal parameter weights for the neural network layers.

Once this "training" is completed, we have a set of weights for each layer of the network. The prediction calculation then boils down to a series of matrix multiplications for each layer to calculate the output values - which is very speedy.

In all the above cases, we use a training set of data with known inputs/outputs. The hypothesis function is written in a way that pairs each variable (feature) with a parameter that weights the variable in some way. The comparison between the known predictions and the hypothesized predictions is the "cost" function. In all cases, our goal is to minimize the difference or cost by choosing the best weights (parameters) to use in the hypothesis function.