The mass migration of data online has created a virtual data pool with a nearly infinite amount of user information. What’s even more impressive is that this pool grows every day as millions of people the world over log onto the Internet to conduct all manner of business. Search engines compile the massive amounts of data that result from organic web searches. That data is like pure gold to marketing and advertising executives wanting to track the effectiveness of their online ad campaigns.
When it comes to online advertising data, Google’s data banks are overflowing. As the most popular search engine used for consumer purposes, Google has saved enough business-relevant data to give companies a virtual glimpse into the minds of potential customers. Among the data that Google stores related to advertising, one will find:
- Targeting criteria
- Click-through volumes
- Number of impressions
Google uses all of this information in a number of different ways, including analysis, reporting, forecasting, and the auditing of its own internal processes. Once the information has been thoroughly cleaned by Google, advertisers can then use it to gain valuable insight into the performance of their web campaigns.
The Rise of Mesa
The exponential growth of Google’s advertising data pool has necessitated the creation of an effective warehousing system to handle the storage of this data. That system was recently introduced to the public as Mesa, the latest and greatest feat in data solutions. Mesa was designed to meet the increasingly complex demands of both front end users and systems requirements. Not only does Mesa meet those requirements, it exceeds them in many areas.
User demand for more detail in their data has required that Google increase data size to tremendous proportions. Data of this size and scale presents its own unique challenges when it comes to storability and scalability. Mix those issues in with the wants and wishes of those looking to query such data, and the challenges facing Mesa may seem overwhelming. Among the issues it deals with are:
- Update throughput: Google processes advertising data in near real-time, requiring Mesa to constantly support a seemingly never-ending stream of updates. Those changes include the creation of new rows of data as well as updates to already-existing ones. Keep in mind that this data also needs to be ready to query within minutes.
- Querying reliability: Users look to Mesa to support queries with 99th percentile latency measured in mere milliseconds, as well as a query throughput spanning trillions of rows of updated business-critical data every day.
- Exploding updates: A single action by a single front-end user can trigger countless changes that cause updates to literally “explode” across different sets of dimensions and metrics. Mesa must apply those updates atomically, so that no query ever sees a partially applied change.
- Data and metadata changes: Existing data values and schemas need to be modified or, in some cases, transformed before they can be updated or new features can be applied to them. This process must also be done without affecting the system’s operations and query performance.
- Data consistency: Even with these constant updates, legal and business requirements demand that query outputs remain consistent and repeatable, even in those cases when multiple data sources are being queried.
- Data scalability: Mesa must be able to scale as its data volumes increase while also delivering the same level of query and support performance.
- System availability: Even while supporting such a complex set of data and performance requirements, Mesa is expected to be resilient to system failures. In practice, that means no downtime, whether due to planned maintenance or unplanned outages.
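The update and consistency requirements above can be illustrated with a toy sketch. This is not Mesa's actual design or API; the class and method names are all hypothetical. The idea it demonstrates is multi-versioned data: updates land as whole batches, each stamped with a version, and queries aggregate only fully committed batches, so results stay consistent and repeatable even while new updates stream in.

```python
class VersionedStore:
    """Toy sketch of versioned batch updates (all names hypothetical).

    Each update batch gets a version number and is appended atomically.
    Queries aggregate deltas only up to a committed version, so a query
    can never observe a half-applied batch.
    """

    def __init__(self):
        self.batches = []          # list of (version, {key: delta}) pairs
        self.committed_version = 0

    def apply_batch(self, deltas):
        """Atomically append one update batch and advance the committed version."""
        version = self.committed_version + 1
        self.batches.append((version, dict(deltas)))
        self.committed_version = version  # readers only ever see whole batches
        return version

    def query(self, key, at_version=None):
        """Sum all deltas for `key` up to a chosen (or the latest) version."""
        v = self.committed_version if at_version is None else at_version
        return sum(d.get(key, 0) for ver, d in self.batches if ver <= v)


store = VersionedStore()
store.apply_batch({"clicks:ad42": 3, "impressions:ad42": 100})
store.apply_batch({"clicks:ad42": 2})
print(store.query("clicks:ad42"))                # latest committed view: 5
print(store.query("clicks:ad42", at_version=1))  # repeatable historical view: 3
```

Pinning a query to a specific version is also what makes results repeatable: re-running the same query at the same version returns the same answer, no matter how many batches have landed since.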
Sounds easy enough, right? Well, not to worry; Mesa makes it look that way. The system itself is able to handle petabytes of data, processing billions of query requests every day that return with data rows numbered in the trillions. What’s more, Mesa is capable of supporting millions of updates to data rows every second, creating a virtually live database that returns near real-time data.
Yet that’s not even the most impressive of Mesa’s tricks. The system itself is geo-replicated across multiple data centers. This allows users to experience lower latency through access to the data center that’s nearest to them as opposed to one that’s hundreds or thousands of miles away. This also ensures consistent and repeatable query outputs even in the event that an entire data center on the system fails.
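A minimal sketch can show how those two geo-replication properties fit together. Again, this is a hypothetical illustration, not Mesa's actual replication protocol: queries are routed to the nearest replica for low latency, but reads are limited to the highest version every replica has applied, so any replica returns the same answer even if another data center is lagging.

```python
class Replica:
    """One data center's copy of the data (names hypothetical)."""

    def __init__(self, name, latency_ms):
        self.name = name
        self.latency_ms = latency_ms  # stand-in for geographic distance
        self.applied_version = 0      # how far this copy has caught up

    def apply_up_to(self, version):
        self.applied_version = version


def pick_replica(replicas):
    """Route the query to the nearest (lowest-latency) replica."""
    return min(replicas, key=lambda r: r.latency_ms)


def safe_read_version(replicas):
    """Read only data every replica has applied, so every data center
    serves an identical answer even while some copies lag behind."""
    return min(r.applied_version for r in replicas)


us = Replica("us-east", 12)
eu = Replica("eu-west", 85)
us.apply_up_to(7)  # US copy is ahead
eu.apply_up_to(5)  # EU copy is still catching up

print(pick_replica([us, eu]).name)  # "us-east"
print(safe_read_version([us, eu]))  # 5 — consistent across data centers
```

Capping reads at the slowest replica's version is one simple way to trade a little freshness for answers that are identical regardless of which data center serves the query.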
Lessons Learned from Mesa
Yes, Mesa’s performance résumé is quite impressive. Yet hidden within it is a message for web hosting services, as well: performance updates should be made in giant leaps, not baby steps. This is especially true when it comes to latency. Given the difficulties that other warehousing tools such as Hive and Presto have had in achieving low latency, Mesa’s developers specifically focused on this aspect to be able to offer a level of data querying and retrieval performance that’s unrivaled. Today’s web hosting services that continue to operate off of a single server or out of a single data center can’t guarantee their subscribers that same level of latency.
Thus, web hosts may want to take a page from Mesa’s playbook. By improving system performance and availability through geo-replication, Mesa essentially guarantees users that it’s immune from the issues that individual data centers face. Similar to cloud services, server redundancy allows Mesa’s performance to remain consistent. In fact, many industry insiders see Mesa as a tool that Google will use to launch new cloud services. They cite the ease with which Mesa could be deployed to data centers all over the world, thanks in large part to Google’s extensive reach. This could be seen as yet one more reason why web hosts should seriously consider the benefits of offering their clients services on virtual servers.
As more and more consumer-related data is stored online, even more advanced tools will be needed in order to manipulate it. Google’s Mesa is the latest of these super tools, and promises to provide virtually unlimited access to advertising data no matter the volatility of the environment in which it is hosted. If web hosting companies are willing to take a close look at all that Mesa has to offer, they’ll see just how much their clients value an efficient and dependable service provider who can promise the same operational efficiency that Mesa delivers.
Top image ©GL Stock Images