Data Analyst
Job description
About the company and the team
We are an R&D team that handles everything related to data and machine learning. The team consists of six people, including Data Analysts, ML Engineers, and Software Engineers: a cross-functional team spanning data, ML, and engineering. We own the complete cycle of our developments: we build the models ourselves, wrap them into services, deploy them to production, monitor them, and answer for the fault tolerance of the system.
Our team works on the following areas: the recommendation system, automated content moderation, anti-fraud, product and marketing analytics, and video generation.
We strive to grow everyone on the team into a full-stack specialist: we teach data scientists and data analysts to write production code, engineers learn to train models, and ML engineers come to understand how the product and product analytics work.
Our team has no project manager or systems analyst writing technical specifications for us. We work out business priorities ourselves and stay in sync with the business; we formulate, decompose, and distribute tasks within the team. Often we are the ones coming to the business with new solutions and ideas.
We work in two-week sprints: a stand-up every morning, a demo every Friday at the end of the sprint, and always a retrospective after the demo. Every 4 weeks each team member has a one-on-one with the team lead, and every 6 months everyone goes through a performance review. We use Jira, but without fanaticism.
Technologies:
We write services in Python (FastAPI, Faust), Go, and C++.
Everything runs in AWS. We deploy lightweight services to a k8s cluster and ML services to AWS SageMaker, with GitLab CI handling deployment. In Python we actively use mypy, pylint, flake8, isort, and bandit; a merge request will not get through the pipeline without passing the linters.
Backend services send user events to Kafka, and from Kafka we ingest everything into ClickHouse using the Kafka table engine (a sketch of this wiring follows at the end of this section). Every change to the DB schema is versioned through migrations. We build product metrics and dashboards in DataLens.
We also use self-hosted Redis Stack and PostgreSQL.
Monitoring: Prometheus, Grafana, Sentry, Kibana.
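To make the event pipeline concrete, here is a minimal sketch of the Kafka-to-ClickHouse wiring described above. The table names, column schema, topic, and broker address are illustrative assumptions, not our production setup; the clickhouse-connect client is just one way to run the DDL from Python.

```python
# Minimal sketch of Kafka -> ClickHouse ingestion via the Kafka table engine.
# Schema, topic, and broker address are illustrative assumptions.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")  # assumed local CH

ddl_statements = [
    # 1. A Kafka-engine table that consumes the topic (stores nothing itself).
    """
    CREATE TABLE IF NOT EXISTS events_kafka (
        user_id    UInt64,
        event_type LowCardinality(String),
        ts         DateTime
    ) ENGINE = Kafka
    SETTINGS kafka_broker_list = 'kafka:9092',
             kafka_topic_list  = 'user-events',
             kafka_group_name  = 'clickhouse-etl',
             kafka_format      = 'JSONEachRow'
    """,
    # 2. The real storage table.
    """
    CREATE TABLE IF NOT EXISTS events (
        user_id    UInt64,
        event_type LowCardinality(String),
        ts         DateTime
    ) ENGINE = MergeTree
    ORDER BY (event_type, ts)
    """,
    # 3. A materialized view that moves consumed rows into storage.
    """
    CREATE MATERIALIZED VIEW IF NOT EXISTS events_mv TO events AS
    SELECT user_id, event_type, ts FROM events_kafka
    """,
]

for ddl in ddl_statements:
    client.command(ddl)
```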
Growth:
If you wish, you can try your hand at different kinds of tasks: data analytics, engineering, machine learning. We do thorough code reviews of production code, prepare articles for publication, and plan to give talks at conferences. Regular one-on-ones focus on professional growth, so everyone works not only on what the business needs but also on what they find interesting and exciting.
What we expect from the candidate
Requirements:
- Strong knowledge of Python (numpy, pandas, sklearn);
- Strong knowledge of SQL;
- Ability to visualize data in a variety of ways;
- Confident command of statistics and probability theory;
- Experience in product analytics and computing product metrics (Retention, MAU, DAU, Sticky Factor, K-factor, ARPU, etc.);
- Understanding of classical algorithms and data structures;
- Ability to construct and verify statistical hypotheses and to conduct A/B tests (bootstrap, CUPED, stratification, MDE, etc.; see the sketch after this list);
- Practical experience with machine learning.
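To give a taste of the A/B-testing bullet above, here is a minimal percentile-bootstrap sketch for the difference in means between two experiment groups; the data and parameters are synthetic and illustrative.

```python
# Percentile bootstrap CI for the difference in means (illustrative sketch).
import numpy as np

def bootstrap_diff_ci(control, treatment, n_boot=10_000, alpha=0.05, seed=0):
    """Return a (1 - alpha) CI for mean(treatment) - mean(control)."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        c = rng.choice(control, size=control.size, replace=True)
        t = rng.choice(treatment, size=treatment.size, replace=True)
        diffs[i] = t.mean() - c.mean()
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(42)
control = rng.normal(10.0, 3.0, size=2_000)    # fake baseline metric
treatment = rng.normal(10.3, 3.0, size=2_000)  # fake uplifted metric
lo, hi = bootstrap_diff_ci(control, treatment)
print(f"95% CI for uplift: [{lo:.3f}, {hi:.3f}]")  # significant if 0 is outside
```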
Nice to have:
- Experience working with ClickHouse;
- Experience working with high-load systems;
- Experience in anomaly detection in data;
- Experience in financial and economic data analytics;
- Ability to write production code (Python 3.9+).
Tasks:
- Metrics and ad hoc analytics. The app is developing actively, and new features appear quickly, so we want to constantly measure what is happening to it from different points of view. You will come up with metrics, discuss them with product owners, and implement them as online metrics in SQL and Python; this requires a deep dive into the product and how it works (a metrics sketch follows this list).
- Building the ETL and DWH. You will need to understand the data and plan how to store it in ClickHouse with its later use in mind, then write ETL pipelines and schema migrations to load messages from Kafka into ClickHouse (see the ingestion sketch in the Technologies section above). We will build various metrics on top of this data.
- Building dashboards. We use Yandex DataLens to display metrics. To speed up metric computation you will design various AggregatingMergeTree tables, write SQL, build visualizations, and present them to the business (an AggregatingMergeTree sketch follows this list).
- Participation in A/B testing. We are building our own A/B testing system; the first version already uses approaches such as stratification, CUPED, linearization, bootstrap, and the delta method. You will expand the set of target metrics, participate in developing the A/B testing system, and take part in planning and analyzing A/B tests (a CUPED sketch follows this list).
- Data analysis and anomaly detection. The application has many game mechanics involving cryptocurrency. You will look at user behavior from different angles, find anomalies and potential fraudsters, interact with the finance department, and do economic analytics (an anomaly detection sketch follows this list).
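A minimal pandas sketch of the kind of metric computation meant in the first task: DAU, MAU, and Sticky Factor (average DAU divided by MAU) over a raw events table. The schema and data are illustrative assumptions; in practice this would be SQL over ClickHouse.

```python
# DAU / MAU / Sticky Factor from a raw events table (illustrative schema).
import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 2, 1, 3, 2, 1, 4],
    "ts": pd.to_datetime([
        "2024-05-01 09:00", "2024-05-01 10:00", "2024-05-02 09:30",
        "2024-05-02 11:00", "2024-05-15 12:00", "2024-06-01 08:00",
        "2024-06-02 09:00",
    ]),
})

events["day"] = events["ts"].dt.floor("D")
events["month"] = events["ts"].dt.to_period("M")

dau = events.groupby("day")["user_id"].nunique()    # unique users per day
mau = events.groupby("month")["user_id"].nunique()  # unique users per month

# Sticky Factor: average DAU within a month divided by that month's MAU.
sticky = dau.groupby(dau.index.to_period("M")).mean() / mau
print(sticky)
```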
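For the dashboards task, a sketch of the AggregatingMergeTree pattern: partial aggregate states are kept per day and merged at query time, which is what keeps dashboard queries fast. Table and column names are assumptions, and it presumes an `events` table like the one in the Technologies sketch.

```python
# AggregatingMergeTree rollup for fast dashboard queries (illustrative DDL).
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

client.command("""
    CREATE TABLE IF NOT EXISTS daily_stats (
        day        Date,
        event_type LowCardinality(String),
        users      AggregateFunction(uniq, UInt64)
    ) ENGINE = AggregatingMergeTree
    ORDER BY (event_type, day)
""")

# Materialized view keeps partial uniqState() aggregates updated on insert.
client.command("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS daily_stats_mv TO daily_stats AS
    SELECT toDate(ts) AS day, event_type, uniqState(user_id) AS users
    FROM events
    GROUP BY day, event_type
""")

# Dashboard query: merge the states instead of rescanning raw events.
dau = client.query("""
    SELECT day, uniqMerge(users) AS dau
    FROM daily_stats
    GROUP BY day
    ORDER BY day
""")
print(dau.result_rows)
```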
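For the A/B-testing task, a minimal numpy sketch of CUPED: the in-experiment metric is adjusted with a pre-experiment covariate, which preserves the mean while reducing variance (and hence the achievable MDE). The data here is synthetic; our real system layers stratification, linearization, and the delta method on top of this idea.

```python
# CUPED variance reduction (illustrative, synthetic data).
import numpy as np

def cuped_adjust(metric, covariate):
    """Adjust `metric` using a pre-experiment `covariate`: same mean, lower variance."""
    theta = np.cov(metric, covariate)[0, 1] / np.var(covariate, ddof=1)
    return metric - theta * (covariate - covariate.mean())

rng = np.random.default_rng(7)
pre = rng.normal(100.0, 20.0, size=5_000)              # pre-experiment metric
post = 0.8 * pre + rng.normal(20.0, 10.0, size=5_000)  # correlated live metric

adjusted = cuped_adjust(post, pre)
print(f"mean: {post.mean():.2f} -> {adjusted.mean():.2f}")  # unchanged
print(f"var:  {post.var():.1f} -> {adjusted.var():.1f}")    # much smaller
```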
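And for the anomaly detection task, one straightforward approach using sklearn (already assumed in the requirements): an Isolation Forest over per-user behavioral features. The features and contamination rate are illustrative; real fraud work would use domain features built from the finance data.

```python
# Flagging anomalous users with an Isolation Forest (illustrative features).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Fake per-user features: [transactions per day, average transaction amount].
normal_users = rng.normal(loc=(5.0, 20.0), scale=(2.0, 5.0), size=(500, 2))
odd_users = rng.normal(loc=(60.0, 400.0), scale=(10.0, 50.0), size=(5, 2))
X = np.vstack([normal_users, odd_users])

model = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = model.predict(X)  # -1 = anomaly, 1 = normal

print("flagged for manual review:", np.where(labels == -1)[0])
```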
Working conditions
Benefits and Perks:
- GPU/CPU servers in the cloud;
- Top-notch equipment and all necessary software;
- Office within walking distance of Dobryninskaya / Serpukhovskaya metro stations;
- Possibility of remote work;
- Option grant available;
- Flexible schedule.