Data Scientist

Location and employment type

Moscow, full-time, remote work possible

Company

One of the leaders of the IT market, with more than 30 years of experience

Job description

Working conditions

Stack

Python (pytest, Jinja for pipeline templates), Spark (Databricks platform, PySpark for pipelines), Delta Lake, Azure Data Explorer, Azure Event Hubs (Kafka API), GitHub Actions, MLflow, Apache Superset.

What we do / will do

  • Create a unified data platform as a single source of truth for ML, data analytics, reporting, and dashboarding.
  • Develop and automate high-performance data processing pipelines (batch and/or streaming).
  • Design data models to meet critical product and business requirements with optimal storage.
  • Integrate data lineage tools.
  • Develop CI/CD pipelines to launch ML-based projects quickly.
  • Improve data quality by using and improving internal tools that automatically detect issues.
  • Provide consistent tooling for software developers to simplify integration with the data platform.

What do you know?

  • Data warehousing, data modeling, and data transformation.
  • How to write and optimize complex SQL.
  • Strong Python skills (Scala is a plus).
  • Experience building production data pipelines with Spark and Spark Streaming, with reliable monitoring and logging practices, and optimizing existing workflows to support new features.
  • MPP/Cloud data warehouse solutions (Snowflake, Redshift, BigQuery, Vertica, Teradata, Greenplum, Azure DWH, ClickHouse, etc.).
  • Experience with messaging systems (e.g., Apache Kafka or RabbitMQ).
  • Experience with data quality approaches and with maintaining consistent SLOs on data.
  • Architecture of data projects at scale.
  • Knowledge of distributed system design, such as how MapReduce and distributed data processing work at scale.
  • Strong familiarity with SQL and data modeling concepts, in either a relational or data warehousing context.