Indexing the blockchain with Elastic Ethereum

Posted January 5, 2016 by Jonathan Brown

Originally published on jonathanpatrick.me. Retrieved from the Wayback Machine.

One of the things that can't really be done yet in a decentralized manner is search. In an Ethereum smart contract it is possible to maintain some elementary lookup tables, but more advanced features such as full-text search are generally not possible because of the excessive on-chain processing and storage they would require. Eventually it may be possible to use Ethereum to coordinate a network of search oracles that would profit financially if the network determines them to be operating correctly, but I am not currently aware of any such project. This sort of solution would be analogous to how Swarm is being proposed to work.

For example, imagine there was a dapp that was essentially a Yelp or TripAdvisor clone. Businesses could upload their information and customers could leave comments. Because it would be autonomous and transparent, it would avoid a lot of the criticisms levelled at these sites. Being able to search this information would be really important. Ideally the search would also be autonomous and transparent, but this is not yet possible.

In the meantime, there are some very mature centralized search daemons. Elasticsearch is generally regarded as the best. Elastic Ethereum is a Node.js program that I have created that waits for events on Ethereum contracts and then populates an Elasticsearch index accordingly. A dapp could connect to an external Elasticsearch daemon to provide search functionality, albeit centralized. Potentially Mist (the Ethereum browser) could even have Elasticsearch bundled with it to provide indexing locally.
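As a rough illustration of the search side, a dapp could send a full-text query like the one built below to the Elasticsearch daemon. This is only a sketch: the `buildMessageSearch` helper is hypothetical, and the `public-messages` index and `body` field are borrowed from the example configuration and callbacks later in this post. It only builds the request; sending it is left to whichever Elasticsearch client the dapp uses.

```javascript
// Hypothetical helper: builds a full-text search request for indexed messages.
// The index name and the "body" field are assumptions taken from the example
// configuration and callbacks in this post.
function buildMessageSearch(text, size) {
  return {
    index: 'public-messages',
    body: {
      size: size || 10, // default to 10 hits when no size is given
      query: {
        match: { body: text } // Elasticsearch "match" full-text query
      }
    }
  };
}

const request = buildMessageSearch('hello world', 5);
console.log(JSON.stringify(request, null, 2));
```

The `match` query analyzes the search text before matching it against the field, which is the kind of full-text behaviour that is impractical to implement on-chain.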

Elastic Ethereum could also be used for private analysis of contracts, although depending on your use case a different database system might be more appropriate.

Additionally, Elastic Ethereum can extend contract objects returned by web3 with custom methods that utilize the index.
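To give a feel for what such an extension might look like, here is a minimal sketch. It is not Elastic Ethereum's actual mechanism: `extendContract`, `searchMessages`, and the `search` function are all hypothetical names, and stand-in objects are used so the example runs without an Ethereum node or a search daemon.

```javascript
// Hypothetical sketch: attach an index-backed method to a contract object.
// `contract` stands in for an object returned by web3; `search` stands in for
// a function that queries the Elasticsearch index. Neither is the real API.
function extendContract(contract, search) {
  contract.searchMessages = function (text) {
    return search({ query: { match: { body: text } } });
  };
  return contract;
}

// Stand-ins so the sketch runs on its own.
const fakeContract = { address: '0x05a74ade0dcb9c8ca8140273e66a9f455be51294' };
const fakeSearch = function (query) {
  return { received: query };
};

const extended = extendContract(fakeContract, fakeSearch);
console.log(extended.searchMessages('hello'));
```

The point of the pattern is that dapp code can call one object for both on-chain reads and index-backed lookups.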

The README.md details how to configure it.

I created a contract to test the indexing: public-message.sol.

My production.json looks like this:

{
  "ethereum": {
    "provider": "http://localhost:8545"
  },
  "elasticsearch": {
    "host": "localhost:9200"
  },
  "contracts" : {
    "public-messages": {
      "address": "0x05a74ade0dcb9c8ca8140273e66a9f455be51294",
      "index": "public-messages"
    }
  }
}

And my public-messages.callbacks.js looks like this:

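// Nothing to set up for this contract.
var onInit = function() {
};

// Nothing to do here for this contract either.
var onCreate = function() {
};

// Return an empty object: no documents are removed for this contract.
var getDeletes = function(log) {
  return {};
};

// Build the documents to index from an event log.
var getDocuments = function(log) {
  // The log data carries the hash that identifies the message.
  var hash = log.data;
  // `contract` is not defined in this file; it is presumably supplied by
  // Elastic Ethereum when the callbacks run.
  var message = contract.getMessage(hash);
  var document = {};
  // The third value returned by getMessage() is used as the message body.
  document[hash] = {body: message[2]};

  // Documents are grouped by type ("message"), then keyed by ID (the hash).
  return {
    message: document
  };
};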

module.exports = {
  onInit: onInit,
  onCreate: onCreate,
  getDeletes: getDeletes,
  getDocuments: getDocuments
};
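The object returned by getDocuments() groups documents by type and then by ID. One way a daemon could hand such an object to Elasticsearch is to flatten it into the alternating action/source list that the bulk API expects. The sketch below shows that idea only; `toBulkBody` is a hypothetical helper and not necessarily how Elastic Ethereum does it. It also assumes mapping types (`_type`), which existed in Elasticsearch when this post was written but were removed in later versions.

```javascript
// Hypothetical helper: flatten { type: { id: source } } into a bulk body.
// Assumption: the input has the same shape as the getDocuments() return
// value in the callbacks above.
function toBulkBody(index, documentsByType) {
  const body = [];
  Object.keys(documentsByType).forEach(function (type) {
    const docs = documentsByType[type];
    Object.keys(docs).forEach(function (id) {
      // Action line, followed by the document source.
      body.push({ index: { _index: index, _type: type, _id: id } });
      body.push(docs[id]);
    });
  });
  return body;
}

// Made-up sample data for illustration.
const sample = { message: { '0xabc': { body: 'Hello, world' } } };
console.log(JSON.stringify(toBulkBody('public-messages', sample), null, 2));
```

Using the message hash as the document ID means that indexing the same message twice overwrites the earlier document instead of creating a duplicate.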

The daemon is invoked like this:

node elastic-ethereum.js public-messages

Every time someone executes the saveMessage() function, Elastic Ethereum indexes the message.