Google's New Blockchain Search Tools, Broken Down
Google is now in the blockchain search business. Less than a day after Forbes broke the story that the internet search giant would be launching a suite of tools built by, and for, open source developers, those tools are live.
In addition to loading datasets for all the transactions and metadata in eight cryptocurrencies, including bitcoin and ethereum, Google Cloud developer advocate Allen Day and his team of open source developers from around the world are launching a number of tools designed to do to blockchain what Google search did to the internet.
“I'm very interested to quantify what's happening so that we can see where the real legitimate use cases are for blockchain,” said Day, who manages the cloud portion of the project. "So people can acknowledge that and then we can move to the next use case and develop out what these technologies are really appropriate for.”
Last year, Day and lead developer Evgeny Medvedev discreetly loaded transaction data for the bitcoin and ethereum blockchains, along with some basic search tools, to Google's BigQuery big data analytics platform, and have been studying how developers are using the software. As of today, they're taking what they've learned and making datasets available for bitcoin cash, ethereum classic, litecoin, zcash, dogecoin and dash, along with an expanded suite of search tools.
Dubbed Blockchain ETL (extract, transform, load), the software, created by independent developer Medvedev with support from the rest of the team, now includes features such as integration with Google’s BigQuery ML (machine learning) tool, which launched in a test, or “beta,” version last year. By searching for patterns in transaction flows, the machine learning integration will automatically give the user basic information about how a cryptocurrency address is being used.
For example, the tool might be used to analyze transaction flows to determine that an address is holding funds for a cryptocurrency mining pool, in which users contribute unused computer power to audit blockchain transactions in exchange for cryptocurrency. In the future, the BigQuery ML integration could also identify cryptocurrency addresses owned by a single entity, for example, an exchange, and condense those addresses into a single data point, simplifying comparisons.
Also included in the launch, the blockchain datasets have been standardized into what Day calls a “unified schema,” meaning the data is structured in a similar, easy to access way. By ensuring this level of consistency across datasets, Day hopes to make it easier for data scientists, auditors, and investigators to make comparative statements about transactions in the supported blockchains. “And others going forward will use the same architecture,” Day adds.
Another new search feature is what Day calls a “double entry book view,” designed to simplify the way users search for the cumulative balance of an account over a particular time, accurate down to eight decimal places. The eighth decimal place corresponds to the smallest possible bitcoin denomination, called a satoshi, named after the cryptocurrency’s pseudonymous inventor.
Going forward, datasets that fall into what is called the “Satoshi family,” meaning they structurally resemble bitcoin, will be searchable by two criteria: blocks and transactions. Support for the ethereum and ethereum classic blockchains, with their more complicated smart contract functionality, now includes five additional tables designed to enable more sophisticated searches.
The first terabyte of queries against these and other datasets is free each month, with additional fees charged per byte, or flat $40,000 monthly rates for high-volume users. Not only did Amazon, Google’s biggest cloud computing competitor, enter blockchain last year in a big way, but fellow cloud leader Microsoft is now considered a seasoned veteran of the burgeoning space. As startups like Storj and Perlin aim to use cryptocurrency as a way to incentivize users to adopt their own decentralized version of cloud computing, Day says the industry, expected to reach $411 billion next year, is primed to experience a blockchain renaissance.
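To make the idea concrete, here is a minimal Python sketch of a double-entry style balance calculation. It is an illustration only, not the Blockchain ETL implementation: the row layout (timestamp, address, amount in satoshis), the function names and the sample ledger are all assumptions made up for this example. Credits are recorded as positive amounts and debits as negative ones, and amounts are kept as whole satoshis so that nothing is lost to rounding at the eighth decimal place.

```python
from decimal import Decimal

SATOSHIS_PER_BTC = 100_000_000  # 1 bitcoin = 10^8 satoshis


def running_balance(entries, address):
    """Return (timestamp, balance_in_satoshis) after each entry touching `address`.

    `entries` is a hypothetical list of (timestamp, address, amount_satoshis)
    rows: positive amounts are credits (received), negative amounts are debits (spent).
    """
    balance = 0
    history = []
    for timestamp, entry_address, amount in sorted(entries):
        if entry_address == address:
            balance += amount
            history.append((timestamp, balance))
    return history


def to_btc(satoshis):
    """Convert whole satoshis to a bitcoin amount shown with eight decimal places."""
    return (Decimal(satoshis) / SATOSHIS_PER_BTC).quantize(Decimal("0.00000001"))


# Made-up ledger rows, for illustration only.
ledger = [
    (1, "addr_a", 150_000_000),   # addr_a receives 1.5 BTC
    (2, "addr_b", 20_000_000),    # addr_b receives 0.2 BTC
    (3, "addr_a", -50_000_001),   # addr_a spends 0.50000001 BTC
]

for timestamp, balance in running_balance(ledger, "addr_a"):
    print(timestamp, to_btc(balance))
```

Summing signed entries in time order is what lets a single pass answer "what was the balance at this point?" for any moment in the account's history.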
“Some people are more theoretical, and the importance of their work becomes fully manifested decades after they’re dead,” says Day. “I guess I’m just more interested in seeing things play out in front of me, as opposed to doing anything deeply theoretical.”
To incentivize as much participation as possible, Medvedev and Day have partnered with the non-profit Ethereum Community Fund, which is in turn offering cryptocurrency rewards to developers who find and fix bugs in the code. “There are around ten core contributors that helped implement various components of the system,” says Medvedev, who leads the developers and was previously the lead data engineer at cryptocurrency intelligence firm Coinfi. “They are spread around the globe: some live in Russia, others in Singapore or China.”
Perhaps unsurprisingly, Day’s role as customer zero means his interest in helping create the blockchain search features goes beyond theory. Collectively, he believes the tools will enable more advanced econometric calculations including the Gini coefficient, which measures the distribution of wealth in a given system, and could eventually be used to understand which nations are actually using the cryptocurrency. While blockchain data doesn’t natively include information about where a transaction occurs, Day is personally exploring how BigQuery ML might be leveraged to reveal transaction locations.
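For readers unfamiliar with the measure, the Gini coefficient can be computed from nothing more than a list of balances. The Python sketch below uses the standard sorted-rank formula; it is not taken from Day's report, and the function name and sample balances are assumptions chosen for illustration. A result of 0 means every address holds the same amount, while values approaching 1 mean holdings are concentrated in very few addresses.

```python
def gini(balances):
    """Gini coefficient of non-negative balances (0 = perfectly equal).

    Uses the sorted-rank form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x_1 <= ... <= x_n and i runs from 1 to n.
    """
    values = sorted(balances)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one balance and a non-zero total")
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Made-up address balances, for illustration only.
print(gini([5, 5, 5, 5]))    # every address holds the same amount
print(gini([0, 0, 0, 10]))   # one address holds everything
```

With a finite number of addresses the maximum is (n - 1) / n rather than exactly 1, which is why the second example with four addresses tops out at 0.75.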
“This is not some kind of dependency on government agency reporting,” says Day. “We have all the data, and we can pull metrics and look at them and reason about them over time.”
To show how Blockchain ETL could result in improvements to the cryptocurrency economy, Day is also using the suite of tools to examine a number of cryptocurrencies, most notably, bitcoin cash and ethereum classic. While both the cryptocurrencies resulted from a dispute about how to enable smaller, cheaper transactions, Day found, according to the report published today, that the cryptocurrencies are being hoarded in much the same way as their predecessors.
From the report:
"Bitcoin Cash was purportedly created to increase transfer-of-value use cases through lower transaction fees, which should ultimately lead to a lower Gini coefficient of address balances. However, we see that the opposite is true—Bitcoin Cash holdings have actually accumulated since Bitcoin Cash forked from Bitcoin. Similarly, the Ethereum Classic currency was rapidly accumulated post-divergence and remains so."
And it’s not just Day who has been using the cryptocurrency datasets. So far, the largest group of users comes from within Google itself. In March 2017 Google purchased data science collaboration startup Kaggle for an undisclosed amount. Made up of a community of data scientists, including Day, Kaggle is now hosting more than 500 bitcoin projects and 16 ethereum projects, many of which are for educational purposes. Projects include Day’s own effort to track the bitcoin transactions of the 10,000 bitcoin pizza purchase widely believed to be the first ever use of bitcoin to buy goods, and some early work to calculate the Gini coefficient for ethereum.
“We saw a very warm reception from that community,” says Day.
Such successes are giving Day a cult following of sorts, even beyond the confines of Google and its subsidiaries. In December 2018 Day met Tomasz Kolinko, a computer scientist and creator of the Eveem software for analyzing smart contracts, code designed to transparently and immutably execute any number of tasks. The two were attending the EthSingapore hackathon when Kolinko expressed his frustration at having to wait hours to get results from some of his searches.
Within a month of the two meeting, Kolinko published the results of his analysis using BigQuery, showing the potential benefits and dangers of putting such tools in the hands of the public. Kolinko used the Google BigQuery ethereum dataset to look for a smart contract feature called a “self-destruct” designed to limit how long a contract can be used. In 23 seconds he was able to search 1.2 million smart contracts and found that almost 700 of them had left open a self-destruct feature that would let anyone instantly kill the smart contract, regardless of who might be using it. “The scary part is,” said Kolinko, “if there is a new vulnerability, in the past you couldn’t just easily check all the contracts that were using it.”
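A much simpler check than Kolinko's gives a feel for what searching contract code involves. The Python sketch below is not his method and would not reproduce his results: it only reports whether a contract's compiled bytecode contains the SELFDESTRUCT instruction (opcode 0xff in the Ethereum Virtual Machine) at all, not whether anyone can trigger it. It walks the bytecode one instruction at a time and skips the data bytes that follow PUSH1 through PUSH32 (opcodes 0x60 to 0x7f), so that a 0xff appearing as pushed data is not mistaken for an instruction. The function name and the sample bytecode strings are made up for illustration.

```python
SELFDESTRUCT = 0xFF          # EVM opcode for SELFDESTRUCT
PUSH1, PUSH32 = 0x60, 0x7F   # PUSHn opcodes carry n bytes of inline data


def contains_selfdestruct(bytecode_hex):
    """Return True if the bytecode has a SELFDESTRUCT opcode outside PUSH data.

    A naive linear scan: it can still report false positives, for example
    when non-code bytes are appended after the contract's instructions.
    """
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)

    i = 0
    while i < len(code):
        op = code[i]
        if op == SELFDESTRUCT:
            return True
        if PUSH1 <= op <= PUSH32:
            i += op - PUSH1 + 1  # skip the pushed data bytes
        i += 1
    return False


# Made-up bytecode fragments, for illustration only.
print(contains_selfdestruct("0x6000ff"))  # PUSH1 0x00, then SELFDESTRUCT
print(contains_selfdestruct("0x60ff00"))  # PUSH1 0xff (data only), then STOP
```

Deciding whether a self-destruct is actually reachable by an arbitrary caller requires analyzing the contract's control flow, which is the kind of work a dedicated tool does and this scan does not attempt.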
That same month Day reached out to engineer Will Price, whose work using Google BigQuery to classify the 40,000 richest ethereum addresses with 25 criteria he had seen online. Using the basic search tools previously made available, Price identified ten distinct patterns for how ethereum addresses are being used but was only able to classify three of them into what he called “archetypes”: exchanges, miners and initial coin offering (ICO) wallets. “The other archetypes are just as valid,” says Price, who is now listed as a member of the developer team. "But I don't have enough information to say what they are.”
Increasingly, it’s not just cryptocurrency datasets loaded by Day that are being used on Google BigQuery. In November 2018 independent Dutch developer Wietse Wind followed Day’s lead and uploaded his own dataset, and similarly gave it away to the open source community. Best known for building the XRP Tip Bot, which has 5,500 active users, Wind invested $20,000 to buy two of his own “bare metal machines” (meaning he's not using cloud for this work) and helps validate data about XRP transactions. He then loaded that data to Google BigQuery, and he updates it on a regular basis for public use.
In what is perhaps one of the most visually striking uses of Google BigQuery to analyze cryptocurrency data, graphic designer Thomas Silkjaer exported Wind’s data to a special graphical database called Neo4J designed to visually render data in ways that make patterns more apparent. By merging his skills as a graphic designer for bibles with Wind’s data, Silkjaer gives a glimpse of what is possible. His graphs show simple transactions between wallets but give what is perhaps one of the most memorable answers to the question: what is a blockchain?
“You now have public access to view all transactions on a payment network,” said Silkjaer, “We have never had that before with banks, because each bank is secretive.” Silkjaer is now working to classify the transaction-clusters into categories and visually paint a picture of which addresses are being used for trading, for making purchases, or for sending collateral to loan providers. Day sees Silkjaer’s work as an example of things to come. “That's what I'm actively working on right now,” he adds. “Getting the data available in graph data structures to enable those types of queries.”
While Day’s job as Google Cloud developer advocate puts him in a unique position to build bridges between the search giant and developers, he is not alone in his blockchain interest at the company. Going back at least to September 2016, Google has reportedly filed more than 20 patents for blockchain-related technology, including one in 2018 for using a “lattice” of interoperating blockchains to increase security. Among Google’s earliest forays into blockchain were a number of high-profile strategic investments in companies including Blockchain Inc., Ripple, and Veem.
Then, in July 2018, Google revealed it would be supporting development internally using the ethereum blockchain and Hyperledger Fabric, and that it had formally partnered with financial infrastructure provider, Digital Asset, which counts the Australian Securities Exchange (ASX) among its customers, and enterprise ethereum app developer BlockApps, which was an early partner with Microsoft, and recently started working with Amazon Web Services and Red Hat, now owned by IBM.
BlockApps CEO Kieren James-Lubin says that while Google was relatively late to publicly commit resources to blockchain, the company will benefit from watching from the sidelines as the cryptocurrency market collapsed in 2018. To help make up for that lost time, James-Lubin says his team is working “in the trenches” with Google to help their sales and pre-sales teams understand the value proposition of enterprise ethereum applications.
In the meantime, Google has amped up its presence in the global event space, hosting a number of private events that nonetheless attracted standing room only audiences. In August 2018 the president of the Ethereum Foundation, Aya Miyaguchi, joined Day and others on stage at Google’s Asia headquarters in Singapore, where she discussed how Day’s work might be used to help businesses make better-informed decisions about how customers are using, or not using, their crypto products.
“Allen's work helps by providing public datasets for businesses or products to make decisions for their implementations,” says Miyaguchi. In December, Google hosted its first blockchain on Google Cloud event in New York City, with startups on stage including partners BlockApps and Digital Asset as well as enterprise blockchain developer Blockdaemon and ethereum investor ConsenSys Ventures. At the next Google Cloud NEXT event in April 2019, partner Digital Asset plans to reveal a number of new developments related to the partnership.
As for Day, he’s working to put together a cash prize for a contest to use Google BigQuery to calculate cryptocurrency Gini coefficients around the world. He is also continuing his work using BigQuery ML to seek out new artificial intelligence in blockchain data, trying to identify what exactly those seemingly coordinated robots are actually up to.
“This is the general trend that you're going to be seeing going forward,” says Day, referring to the most sophisticated forms of search. “The community that I'm building around this is mostly machine learning people, and they're thinking about all kinds of other stuff, and it's gonna start coming out.”