chad_walters's comments

MySQL may work well for this small data set (200GB). Start working with 10s of TBs of data and you will begin to understand why NoSQL stores were built.


My thought exactly. This is a scale that can be handled either way; it's when you really can't fit your data on even a handful of machines with acceptable performance that Cassandra starts to shine.


People are running petabyte-sized (1000TB) databases on SQL. One example is from Nasdaq, https://customers.microsoft.com/Pages/CustomerStory.aspx?rec...

Meanwhile, NoSQL does not automatically mean "10s of TBs of data." Check this slideshow on the challenges of MongoDB (the poster child of NoSQL databases) "scaling to 100GB and beyond":

http://www.slideshare.net/mongodb/partner-webinar-the-scalin...


Did you look at Vitess [0]? It handles sharding/replication of MySQL up to PBs of storage and 10s of thousands of connections. Also, it implements caching at the proxy level, so you don't need to use memcached. If multiple requests for the same resource are sent to a vttablet (shard proxy) at the same time, only one is forwarded to the database and all of them receive the same result.

[0] http://youtu.be/midJ6b1LkA0
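
For the curious, that coalescing trick is essentially the "singleflight" pattern. Here is a minimal Go sketch of the idea -- an illustration of the general pattern, not Vitess's actual code; the key and function names are made up:

    package main

    import (
        "fmt"
        "sync"

        "golang.org/x/sync/singleflight"
    )

    var group singleflight.Group

    // fetchRow stands in for the single query that actually reaches MySQL.
    func fetchRow(key string) (string, error) {
        fmt.Println("querying database for", key) // printed once per burst of callers
        return "row-data-for-" + key, nil
    }

    func get(key string) (string, error) {
        // Do runs fetchRow at most once per key at a time; every caller
        // that arrives while it is in flight gets the same result.
        v, err, _ := group.Do(key, func() (interface{}, error) {
            return fetchRow(key)
        })
        if err != nil {
            return "", err
        }
        return v.(string), nil
    }

    func main() {
        var wg sync.WaitGroup
        for i := 0; i < 5; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                v, _ := get("user:42")
                fmt.Println(v)
            }()
        }
        wg.Wait()
    }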


The question then is what NoSQL is. Is MongoDB a NoSQL engine? Can you scale it to 10s of TBs? Can you really scale out MongoDB? I can ask the same about Redis and a whole line of other NoSQL engines that are not really scale-out solutions.

(Yes, you can shard both Mongo and Redis, as well as MySQL, and get to 10s of TBs.)


Likewise when they start heading to Silicon Valley in droves... Last time around was 2000 -- 'nuff said.


Please point out where in the article a solution is discussed at all.


This article is terrible -- it sets up a false dichotomy. The article actually refutes its own conclusion in the paragraphs about what people view as the ideal income distribution. We don't want total equality, but we do want less inequality than is actually present today in every first-world country. So we should care about the current (and growing!) levels of inequality -- even if we don't want a perfectly equal distribution, we want one that is more equal.


I previously worked on Thrift and have overseen the development of Bond with Adam as the lead developer.

Your characterization of Thrift is accurate, and Bond actually has some of the same architectural roots as Thrift. Those features of Thrift were ones that I wanted to preserve in Bond. But we also wanted to expand that pluggability to allow for even more flexibility than the core Thrift architecture permits -- for example, the ability to support Avro-like "untagged" protocols within the same framework. I believe that the core innovation is in how that gets implemented. Also, we believe that performance is a feature -- our testing has shown that Bond noticeably outperforms Thrift and Protocol Buffers in most cases.
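
To make the tagged-vs-untagged distinction concrete, here is a toy Go sketch. This is not Bond's actual API or wire format -- the struct, field ids, and byte layout are invented purely for illustration:

    package main

    import (
        "bytes"
        "encoding/binary"
        "fmt"
    )

    type Person struct {
        ID  uint32 // field 1 in the (made-up) schema
        Age uint32 // field 2
    }

    // taggedEncode prefixes each value with its field id (real tagged
    // protocols also carry a wire type), so a reader can skip unknown
    // fields and the writer can reorder or omit fields -- the
    // Thrift/Protocol Buffers style.
    func taggedEncode(p Person) []byte {
        var b bytes.Buffer
        binary.Write(&b, binary.LittleEndian, uint16(1)) // tag for field 1
        binary.Write(&b, binary.LittleEndian, p.ID)
        binary.Write(&b, binary.LittleEndian, uint16(2)) // tag for field 2
        binary.Write(&b, binary.LittleEndian, p.Age)
        return b.Bytes()
    }

    // untaggedEncode writes raw values in schema order -- smaller and
    // faster, but only decodable by a reader that already has the
    // writer's schema. This is the Avro style.
    func untaggedEncode(p Person) []byte {
        var b bytes.Buffer
        binary.Write(&b, binary.LittleEndian, p.ID)
        binary.Write(&b, binary.LittleEndian, p.Age)
        return b.Bytes()
    }

    func main() {
        p := Person{ID: 7, Age: 30}
        fmt.Println("tagged:  ", len(taggedEncode(p)), "bytes")   // 12
        fmt.Println("untagged:", len(untaggedEncode(p)), "bytes") // 8
    }

Supporting both styles behind one framework is exactly the kind of flexibility the core Thrift architecture doesn't accommodate, since its protocol interface assumes every field is tagged.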

There is no conspiracy or intent to "ignore languages" -- we will release additional languages as they are ready and as we can support them as first-class citizens. We also welcome community involvement.


"After extensive experience working with Bigtable and other eventually consistent systems..."

This is not accurate -- Bigtable is not eventually consistent. The scope of transactions supported by a system is a different set of considerations from the level of consistency it provides. Bigtable is consistent but only allows for transactionality at the row level.
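
To illustrate what row-level transactionality (as opposed to eventual consistency) means, here is a toy Go model -- emphatically not Bigtable's implementation -- where all mutations to one row commit under that row's lock, so readers never see a partially written row, but there is no way to lock two rows together:

    package main

    import (
        "fmt"
        "sync"
    )

    type row struct {
        mu    sync.Mutex
        cells map[string]string
    }

    type table struct {
        mu   sync.Mutex
        rows map[string]*row
    }

    func (t *table) getRow(key string) *row {
        t.mu.Lock()
        defer t.mu.Unlock()
        r, ok := t.rows[key]
        if !ok {
            r = &row{cells: make(map[string]string)}
            t.rows[key] = r
        }
        return r
    }

    // ApplyRowMutation commits several cell writes to one row atomically:
    // a concurrent reader sees either all of them or none of them. There
    // is no equivalent operation spanning two rows.
    func (t *table) ApplyRowMutation(key string, cells map[string]string) {
        r := t.getRow(key)
        r.mu.Lock()
        defer r.mu.Unlock()
        for col, val := range cells {
            r.cells[col] = val
        }
    }

    // ReadRow returns a consistent snapshot of a single row.
    func (t *table) ReadRow(key string) map[string]string {
        r := t.getRow(key)
        r.mu.Lock()
        defer r.mu.Unlock()
        out := make(map[string]string, len(r.cells))
        for c, v := range r.cells {
            out[c] = v
        }
        return out
    }

    func main() {
        t := &table{rows: make(map[string]*row)}
        t.ApplyRowMutation("user#42", map[string]string{"name": "Ada", "email": "ada@example.com"})
        fmt.Println(t.ReadRow("user#42"))
    }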

Optimistic concurrency control is nothing new, and Percolator layered transactions on top of Bigtable years back. Furthermore, TrueTime -- allowing for comparatively low-latency updates across a globally distributed set of DCs -- is the real innovation in Spanner, not the use of optimistic concurrency control.
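
For readers unfamiliar with the technique, here is a minimal Go sketch of optimistic concurrency control in general -- not Percolator's actual protocol, which stores its transaction metadata in Bigtable columns. A transaction records the version of everything it reads, buffers its writes, and validates the read set at commit time:

    package main

    import (
        "errors"
        "fmt"
        "sync"
    )

    type versioned struct {
        value   string
        version uint64
    }

    type store struct {
        mu   sync.Mutex
        data map[string]versioned
    }

    type txn struct {
        s      *store
        reads  map[string]uint64 // key -> version observed
        writes map[string]string // buffered writes
    }

    func (s *store) Begin() *txn {
        return &txn{s: s, reads: map[string]uint64{}, writes: map[string]string{}}
    }

    func (t *txn) Get(key string) string {
        t.s.mu.Lock()
        defer t.s.mu.Unlock()
        v := t.s.data[key]
        t.reads[key] = v.version
        return v.value
    }

    func (t *txn) Set(key, value string) { t.writes[key] = value }

    var ErrConflict = errors.New("write conflict: retry transaction")

    // Commit validates the read set and applies the write set atomically.
    func (t *txn) Commit() error {
        t.s.mu.Lock()
        defer t.s.mu.Unlock()
        for key, seen := range t.reads {
            if t.s.data[key].version != seen {
                return ErrConflict // someone wrote under us; caller retries
            }
        }
        for key, val := range t.writes {
            cur := t.s.data[key]
            t.s.data[key] = versioned{value: val, version: cur.version + 1}
        }
        return nil
    }

    func main() {
        s := &store{data: map[string]versioned{}}
        t1, t2 := s.Begin(), s.Begin()
        t1.Get("balance")
        t2.Get("balance")
        t1.Set("balance", "100")
        t2.Set("balance", "200")
        fmt.Println(t1.Commit()) // <nil> -- first writer wins
        fmt.Println(t2.Commit()) // write conflict: retry transaction
    }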

Honestly, I am not sure what this article is trying to claim, except perhaps that per-node performance has improved. AFAICT, most of this is due to the fact that RAM is cheaper than it was, SSDs have become commodities, and networks in the DC are faster than they used to be.


Your post mentions "single root IO virtualization" as a factor in maximizing network performance. I am wondering what the impact of this was in your sorting. Do you have data for runs where you didn't enable this?


It was part of the enhanced networking. Without enhanced networking, we were getting about 600MB/s, vs 1.1GB/s with.


The title is not merely misleading -- it is just plain wrong.

NUMA was 15% better for Gmail and 20% better for the Web search frontends, as indicated by the reductions (improvements) in CPI (cycles per instruction) for these workloads.

There were some workloads where NUMA did degrade performance, such as BigTable accesses (12% regression).
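
(For context on those numbers, using a made-up baseline: CPI is cycles per instruction, so lower is better. If Gmail ran at a CPI of 1.0 without NUMA-aware placement, a 15% reduction brings it to 0.85, i.e. the same instruction stream retires in 0.85x the cycles -- roughly a 1/0.85 ≈ 1.18x speedup at a fixed clock rate.)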


The amazing thing is that the "big bulge" that Evans referred to in the original graph is clearly the bulge on the left (low-income) side of the graph, while McArdle incorrectly focuses on the artifact on the right (high-income) side -- and then none of the follow-on posts seem to point this mistake out.



WTF, that project is awesome and doesn't even have a page on Wikipedia?!?!?

I don't get it... what's the point of creating awesome software if you don't even make the effort to put the links out to help people find it?


You should write a page for it. You aren't supposed to create pages for your own projects/products on Wikipedia; they should come from neutral parties.

