Lead Software Architect, Database Engine (Java or C++)

December 3, 2022

Requirements

Software Architect, Lead • Distributed computing • Databases • High-load systems • Systems programming • Applied mathematics • Algorithms and data structures • Multithreading • Software development • C++ • Java

Location and employment type

• Full-time • Remote work possible

Company

CedrusData

A platform for analyzing all enterprise data, based on the open-source project Trino

Job description

Working conditions

At Querify Labs, we help technology companies create new databases and data management platforms. We are looking for a motivated team member who will lead the design and implementation of the core components of database engines, such as query optimizers, query engines, storage engines, and distributed protocols.

About the Company

At Querify Labs, we help technology companies build innovative data management products. We create query optimizers, query engines, storage engines, and distributed protocols.

Our customers are startups from the USA, Europe, and Israel, building new databases and data platforms to address emerging and unique niches in the data management market.

Before joining Querify Labs, we worked on the Apache Ignite, Hazelcast, Yandex ClickHouse, and Yandex Database (YDB, YQL) projects. Now we are scaling our expertise worldwide.

We are frequent speakers at Russian and international conferences (Highload++, Percona Live, ApacheCon), maintain a technical blog about databases, and actively contribute to open-source projects, such as Apache Calcite.

Our mission is to build a strong community of database experts in Russia who will drive innovation in the data management domain.

About the Role

The Database Engine Team at Querify Labs designs and implements core database components. The team researches existing and novel data processing approaches, prepares design documents, builds proofs of concept, and writes production code.

We write new databases in Java and C++. Our stack includes:

Apache Calcite for query optimization.
Apache Arrow for vectorized and columnar data processing.
RocksDB for storage-related tasks.
LLVM for compiled query engines.

In this role, you will work closely with a technical project lead and other teammates on the design and implementation of various database components.

You will work on challenging technical problems including but not limited to:

Query engines: relational operators, vectorized and compiled execution, resource scheduling.
Cost-based query optimizers: relational optimization rules, statistics gathering, join graph enumeration.
Storage engines: recovery, concurrent access, indexes, data spilling.
Distributed algorithms: transaction protocols, data replication, fault tolerance.

You Will

Lead the design and implementation of one or more database components.
Write production code in Java or C++ using Apache Calcite, Apache Arrow, RocksDB, LLVM, and other tools.
Prepare design documents and prototypes.
Analyze open-source products and academic papers in the area of data management.
Share your knowledge with the community through blog posts and conference talks.

You Have

The successful candidate has prior experience in one of the following areas:

Database internals: storage layer, query processing, query optimization, transactions, performance optimization, etc.
Distributed processing: data exchange, streaming, fault tolerance, replication, distributed transactions, scheduling, etc.
Low-level system design: compilers (front-end, optimizations, IR), SIMD and vectorized execution, acceleration with GPU and FPGA, etc.

Strong knowledge of Java or C++. Readiness to learn new languages and tooling.
Strong analytical skills. Ability to grasp complex technical concepts and tie the impact of trade-offs to product goals.
A thoughtful and empathetic mindset. A desire to partner with your teammates on challenging problems.
Ability to communicate in English (both written and spoken).
Experience with databases or distributed systems is a plus.
Experience with parallel algorithms and concurrency is a plus.
Experience with big data products is a plus (Apache Spark, Apache Flink, etc.).

Benefits

Working in a team of experts in the areas of data management and distributed systems.
Extraordinarily complex and interesting tasks.
High salary, definitely above the market average.
Remote work with flexible working hours.
Additional paid days off.