Keynote: The Polyglot Enterprise

Mitch Pirtle, CapitalOne

The modern enterprise has been thrust into a world of constantly changing and evolving technology stacks, which means an ever-changing roster of languages and platforms that access data (and even changes in the very ways these apps store and access data). Where do databases fit in this rapidly expanding picture of ‘one tool, one task’, especially at the scale of petabytes and zettabytes?

Ambry: LinkedIn's Scalable Geo-Distributed Object Store

Sivabalan Narayanan, LinkedIn

Ambry is an open-source, geo-distributed, highly available, and horizontally scalable object store built at LinkedIn. It is an active-active, immutable, eventually consistent key-value store that can be configured to provide different levels of consistency. At LinkedIn, Ambry runs on hundreds of nodes spanning multiple data centers and is the source of truth for all media and immutable content. We’ll look at the need for a scalable, geo-distributed, and highly available object store in a media-centric world. We will go over some of the unique challenges that an object store poses and how Ambry overcame them through smart design choices, which in turn helped Ambry scale for both large and small objects while maintaining low latency and high availability. The second part of the talk covers Ambry's architecture in more detail, including selected internals, and the talk ends with our roadmap for the future.

Replex: Rethinking Indexing in Data Stores

Amy Tai, Princeton

The need for scalable, high-performance datastores has led to the development of NoSQL databases, which achieve scalability by partitioning data over a single key. However, programmers often need to query data by other keys, which data stores support either by querying every partition, eliminating the benefits of partitioning, or by replicating additional indexes, wasting the benefits of data replication. With Replex, there is no need to trade scalability for functionality. Replex is a datastore that enables efficient querying on multiple keys by rethinking data placement during replication. Traditionally, a data store is first globally partitioned, and then each partition is replicated identically to multiple nodes. Instead, Replex relies on a novel replication unit, termed a replex, which partitions a full copy of the data by its own unique key. Replexes eliminate the additional overhead of maintaining indexes, at the cost of increased recovery complexity. To address this issue, we also introduce hybrid replexes, which enable a rich design space for trading off steady-state performance against faster recovery.
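To make the replex idea concrete, here is a toy in-memory model, assuming rows are flat string dictionaries and placement uses a CRC32 hash. Each write is replicated to one replex per key, and each replex partitions its full copy by that key, so a lookup on any key touches a single partition instead of scattering to all of them. The class names, `crc32` placement, and partition count are illustrative assumptions, not Replex's actual implementation.

```python
import zlib
from typing import Dict, List

Row = Dict[str, str]

class Replex:
    """One full copy of the data set, partitioned by a single key (toy model)."""

    def __init__(self, key: str, num_partitions: int) -> None:
        self.key = key
        self.partitions: List[List[Row]] = [[] for _ in range(num_partitions)]

    def _partition_for(self, value: str) -> int:
        # Stable hash: Python's built-in str hash is salted per process.
        return zlib.crc32(value.encode()) % len(self.partitions)

    def insert(self, row: Row) -> None:
        self.partitions[self._partition_for(row[self.key])].append(row)

    def lookup(self, value: str) -> List[Row]:
        # Only the single partition that owns `value` is consulted.
        part = self.partitions[self._partition_for(value)]
        return [r for r in part if r[self.key] == value]

class ReplexStore:
    """Replicates every write to one replex per queryable key (hypothetical API)."""

    def __init__(self, keys: List[str], num_partitions: int = 4) -> None:
        self.replexes = {k: Replex(k, num_partitions) for k in keys}

    def insert(self, row: Row) -> None:
        # Each replex is a full replica, so replication factor == number of keys.
        for replex in self.replexes.values():
            replex.insert(row)

    def query(self, key: str, value: str) -> List[Row]:
        # Route to the replex partitioned by the queried key: no scatter-gather.
        return self.replexes[key].lookup(value)

store = ReplexStore(keys=["user_id", "email"])
store.insert({"user_id": "u1", "email": "a@example.com"})
store.insert({"user_id": "u2", "email": "b@example.com"})
print(store.query("email", "b@example.com"))
# [{'user_id': 'u2', 'email': 'b@example.com'}]
```

Because the two replexes lay out the same rows differently, a lost partition in one replex cannot simply be copied from a peer with an identical layout; this toy model makes it easy to see why recovery gets harder, which is the trade-off the hybrid replexes above are meant to address.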

Storage Wars: The Art Genome Project

Daniel Doubrovkine, Artsy

In 2011 we started the Art Genome Project, a classification system and technological framework that powers Artsy. It maps the characteristics (we call them “genes”) that connect artists, artworks, architecture, and design objects across history. There are currently over 1,000 characteristics in The Art Genome Project, including art historical movements, subject matter, and formal qualities. This is the story of the evolution of the data layer and nearest neighbor search technology, and the lessons learned, from MongoDB and PostgreSQL through Elasticsearch.
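As a baseline for what "nearest neighbor search" over genes means, here is a brute-force sketch. It assumes each artwork is a sparse map from gene name to a weight (a 0 to 100 scale is assumed here) and ranks artworks by Euclidean distance, treating missing genes as 0. The gene names, weights, and functions are hypothetical; this is not Artsy's implementation, and a linear scan like this is exactly what stops scaling as the catalog grows.

```python
import math
from typing import Dict, List, Tuple

# Sparse gene profile: gene name -> weight (assumed 0-100 scale).
Genome = Dict[str, float]

def distance(a: Genome, b: Genome) -> float:
    # Euclidean distance over the union of genes; a missing gene counts as 0.
    genes = set(a) | set(b)
    return math.sqrt(sum((a.get(g, 0.0) - b.get(g, 0.0)) ** 2 for g in genes))

def nearest(query: Genome, catalog: Dict[str, Genome], k: int = 2) -> List[Tuple[str, float]]:
    # Score every artwork, then keep the k closest: O(n) work per query.
    scored = [(name, distance(query, genome)) for name, genome in catalog.items()]
    return sorted(scored, key=lambda item: item[1])[:k]

# Hypothetical catalog entries for illustration only.
catalog: Dict[str, Genome] = {
    "work_a": {"Impressionism": 90, "Landscapes": 70},
    "work_b": {"Impressionism": 80, "Portrait": 60},
    "work_c": {"Pop Art": 95, "Portrait": 50},
}

for name, dist in nearest({"Impressionism": 85, "Landscapes": 60}, catalog, k=3):
    print(f"{name}: {dist:.1f}")
```

With this query profile, `work_a` ranks first because it shares both genes at similar weights.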

Bootstrapping a Startup using Compose

Vikram Tiwari, Omni Labs, Inc

In the current age of open source, testing a new idea is not really hard: take an open-source project, customize it, add your own parameters, and you are ready to go. What you need is the right tool to help you focus on your product rather than on managing every operational nuance.

Scylla, the High-Performance Cassandra-Successor

Eyal Gutkind, Scylla

Scylla applies new systems programming techniques to horizontally scalable NoSQL data-store design, resulting in 10x performance improvements. Scylla is a drop-in replacement for Cassandra, providing flexible replication, multi-datacenter support, and Apache Spark integration. To simplify manageability, Scylla automatically tunes RAM, cache, and admin tasks such as repair and compaction, providing consistently predictable latency.

Porting Our Existing App to GraphQL

Jason Denizac, Zendesk

The real-life story of how we ported Zendesk Inbox from a REST API to GraphQL, exposing data backed by MySQL, Elasticsearch, and internal APIs through a common data model to enable more efficient data fetching and faster front-end development.

Partial Indexing for Improved Query Performance

Chris Erwin, Elemenio

Postgres supports partial indexes, which essentially let you add a WHERE clause to an index. In my experience, not many developers know about this incredibly powerful tool. This lightning talk will cover what partial indexes are, how to use them, and some practical real-world examples.
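A minimal sketch of the idea, runnable with Python's bundled SQLite, which accepts the same `CREATE INDEX ... WHERE ...` form as Postgres. The `orders` table, the `status = 'pending'` predicate, and the index name are hypothetical. The point is that only pending rows get index entries, and the planner can use the index only when the query's WHERE clause implies the index predicate. Postgres plans are inspected with `EXPLAIN` rather than SQLite's `EXPLAIN QUERY PLAN`, and the two planners differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        status      TEXT    NOT NULL
    );
    -- Only rows still awaiting fulfilment are indexed; shipped orders add no entries.
    CREATE INDEX idx_orders_pending ON orders (customer_id) WHERE status = 'pending';
""")

# Hypothetical data: most orders are shipped, a small fraction are pending.
conn.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(i % 100, "pending" if i % 50 == 0 else "shipped") for i in range(5000)],
)

def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN yields 4-column rows; the last column is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,))
    return " | ".join(row[3] for row in rows)

# The query repeats the index predicate, so the partial index is usable.
print(plan("SELECT id FROM orders WHERE customer_id = ? AND status = 'pending'"))
# Without the status filter the partial index cannot answer the query (it omits rows).
print(plan("SELECT id FROM orders WHERE customer_id = ?"))
```

The first plan should mention `idx_orders_pending`, and the second should fall back to a table scan. The index stays small because it covers only the minority of rows that the hot query actually touches.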

Managing (or not) the Data in Immutable Infrastructure

Adron Hall, Home Depot

The idea of immutable infrastructure is awesome. However, a major problem immediately erupts when we actually have to connect application infrastructure with data infrastructure. In this talk I'll aim to start conversations about what specifics we can aim for, now and in the future, to close this gap. I'll also show how I've built solutions in production that combine immutable infrastructure with data connectivity.

High Throughput, Low Latency at Scale: Boost the Performance of Your Distributed Database

Akbar Ahmed, DynamiteDB

The audience will learn how distributed DBs (Cassandra, MongoDB, RethinkDB, etc.) solve the problem of scaling persistent storage, but introduce latency as data size grows and they become I/O bound. In single-server DBs, we solve latency by introducing caching. In this talk, the audience will learn how to improve the performance of distributed DBs by using a distributed cache to move the data layer's performance limitation from I/O bound to network bound.
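The general pattern can be sketched as cache-aside reads over a consistent-hash ring: each key is owned by one cache node, a miss falls through to the slow store and populates the cache, and later reads never reach the database. This is a generic illustration, not DynamiteDB's design. The node names, the in-process dictionaries standing in for networked cache servers, the `load` callback standing in for the I/O-bound database read, and the 64 virtual nodes per server are all assumptions. Writes and invalidation are omitted.

```python
import bisect
import hashlib
from typing import Callable, Dict, List, Tuple

def _hash(value: str) -> int:
    # Stable 64-bit ring position (md5 is used only for spread, not security).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        # Several virtual points per node even out the key distribution.
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

class CachedStore:
    def __init__(self, nodes: List[str], load: Callable[[str], str]) -> None:
        self.ring = HashRing(nodes)
        # Stand-ins for remote cache servers (hypothetical; real ones are networked).
        self.caches: Dict[str, Dict[str, str]] = {n: {} for n in nodes}
        self.load = load      # the slow, I/O-bound database read
        self.db_reads = 0

    def get(self, key: str) -> str:
        cache = self.caches[self.ring.node_for(key)]
        if key in cache:
            return cache[key]     # hit: served from one cache node's memory
        self.db_reads += 1        # miss: fall through to the database
        value = self.load(key)
        cache[key] = value        # populate so later reads skip the database
        return value

cached = CachedStore(["cache-a", "cache-b", "cache-c"], load=lambda k: f"row:{k}")
for _ in range(3):
    for key in ("user:1", "user:2", "user:3"):
        cached.get(key)
print(cached.db_reads)  # 3: only the first read of each key reached the database
```

Consistent hashing is one common choice here because adding or removing a cache node remaps only the keys near that node's ring points, rather than reshuffling every key.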

GraphQL: Translating Backend Data to Frontend Needs

Sashko Stubailo, Meteor

Engineers working on backend data services are often focused on operational concerns like data consistency, reliability, uptime, and storage efficiency. Because each situation calls for a specific set of tradeoffs, one organization can end up with a diverse set of backend databases and services. For the people building the UI and frontend API layers, this diversity can quickly become an issue, especially if the same client needs to call into multiple backends or fetch related objects across different data sources. GraphQL is a language-agnostic API gateway technology designed precisely to solve this mismatch between backend and frontend requirements. It provides a highly structured, yet flexible API layer that lets the client specify all of its data requirements in one GraphQL query, without needing to know about the backend services being accessed. Better yet, because of the structured, strongly typed nature of both GraphQL queries and APIs, it's possible to quickly get critical information, such as which objects and fields are accessed by which frontends, which clients will be affected by specific changes to the backend, and more. In this talk, I'll explain what GraphQL is, what data management problems it can solve in an organization, and how you can try it today.
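The core mechanism, where a client sends one nested selection and per-field resolvers fetch each piece from whichever backend owns it, can be illustrated with a toy executor. This is not the GraphQL specification or any GraphQL library. The nested-dict selection format, the `User` and `Ticket` types, and the two in-memory "backends" are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict

# Hypothetical backends: a SQL-like user table and a search-like ticket index.
USERS = {"u1": {"id": "u1", "name": "Ada"}}
TICKETS_BY_USER = {"u1": [{"id": "t9", "subject": "Login fails"}]}

Resolver = Callable[[Dict[str, Any]], Any]

# Per-type field resolvers: the client never sees which backend answers a field.
RESOLVERS: Dict[str, Dict[str, Resolver]] = {
    "Query": {"user": lambda _parent: USERS["u1"]},
    "User": {
        "id": lambda u: u["id"],
        "name": lambda u: u["name"],
        "tickets": lambda u: TICKETS_BY_USER.get(u["id"], []),
    },
    "Ticket": {"id": lambda t: t["id"], "subject": lambda t: t["subject"]},
}
# Return type of each object-valued field, so nested selections resolve correctly.
FIELD_TYPES = {("Query", "user"): "User", ("User", "tickets"): "Ticket"}

def execute(type_name: str, parent: Dict[str, Any], selection: Dict[str, Any]) -> Dict[str, Any]:
    result: Dict[str, Any] = {}
    for field, sub in selection.items():
        value = RESOLVERS[type_name][field](parent)
        if sub is None:
            result[field] = value                    # leaf field: scalar value
        else:
            child = FIELD_TYPES[(type_name, field)]  # object field: recurse
            if isinstance(value, list):
                result[field] = [execute(child, v, sub) for v in value]
            else:
                result[field] = execute(child, value, sub)
    return result

# Roughly the shape of the GraphQL query: { user { name tickets { subject } } }
query = {"user": {"name": None, "tickets": {"subject": None}}}
print(execute("Query", {}, query))
# {'user': {'name': 'Ada', 'tickets': [{'subject': 'Login fails'}]}}
```

Because every requested field passes through a named resolver on a typed schema, a real server can also record which fields each client touches, which is the kind of usage information the abstract describes.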

Online Schema Migrations for MySQL Using gh-ost

Tom Kroper, GitHub

gh-ost is a new tool by GitHub which changes the paradigm of MySQL online schema changes, designed to overcome today's limitations and difficulties in online migrations. gh-ost is:

In this session we will: