Spark Cloud Data Engineer (AWS, remote)


From 300,000 ₽

Location and employment type

Full-time · Remote work available


Job description

### About the Company ###

ClearScale LLC (headquartered in San Francisco, CA, USA) is an AWS Premier Consulting Partner that has been offering a full range of professional cloud computing services for more than 10 years, including architecture design, DevOps automation, refactoring and cloud-native application development, integration, migration, and support using the best advanced technologies. Our team of 100+ certified architects, engineers, and developers has deep experience delivering successful projects. We help large Fortune 500 companies, midsize businesses, and Silicon Valley startups successfully implement ambitious and complex cloud projects to become industry leaders.

ClearScale is growing quickly and there is high demand for the services we provide. Clients come to us for our deep experience with Big Data, Containerization, Serverless Infrastructure, Microservices, IoT, Machine Learning, DevOps and more.

Due to the company's growing needs, we are looking for a Spark Cloud Data Engineer who likes to find solutions to complex problems. We work 100% remotely from various cities and countries. We trust each other: we do not take screenshots, watch you through a webcam, or track your typing, as many others do. The professional reputation of our engineers and developers is of the highest value in our community.

This is a unique opportunity for you to make a big impact, enjoy competitive compensation, work on a wide variety of projects for clients across multiple industries, and work alongside some of the best minds in the cloud.

### Job Overview ###

ClearScale is looking for an AWS Spark Cloud Data Engineer to help build cost-efficient, scalable data lakes for a wide variety of customers, from small startups to large enterprises. Our projects usually fit into one of the following categories (but are not limited to them):

  • Collect data from IoT edge locations, store it in a data lake, orchestrate ETL processes over that data, and slice it into various data marts, then feed those data marts into machine learning or BI pipelines
  • Optimize pre-existing Spark/Elasticsearch/Solr clusters: find the bottlenecks in the code and/or architecture, then plan and execute improvements
  • Build a data delivery pipeline that ingests high-volume real-time streams, detects anomalies, computes windowed analytics, and puts the results into Elasticsearch for dashboard consumption

### Responsibilities ###

  • Analyze, scope and estimate tasks, identify technology stack and tools
  • Design and implement optimal architecture and migration plan
  • Develop new solution modules and re-architect existing ones; redesign and refactor program code
  • Specify the infrastructure and assist DevOps engineers with provisioning
  • Examine performance and advise on necessary infrastructure changes
  • Communicate with clients on project-related issues
  • Collaborate with in-house and external development and analytics teams
  • Write ETL/ELT jobs
  • Design Data Lake House pipelines

### Candidate Requirements ###

ClearScale expects the successful candidate to have most of the following qualifications and skills (not all of them are required):

  • Hands-on experience designing efficient architectures for high-load enterprise-scale applications or ‘big data’ pipelines
  • Practical experience implementing microservices architectures (using a Java, Python, or Scala stack)
  • Hands-on experience with message queuing, stream processing and highly scalable ‘big data’ stores
  • Advanced knowledge of and experience working with SQL and NoSQL databases
  • Proven experience in redesigning and re-architecting large, complex business applications
  • Strong self-management and self-organizational skills
  • Successful candidates should have experience with any of the following software/tools (not all required at the same time):

    • Java/Scala/Python - strong knowledge
    • Big data tools: Kafka, Spark, Hadoop (HDFS 3, YARN 2, Tez, Hive, HBase)
    • Stream-processing systems: Kinesis Data Streams, Spark Streaming, Kafka Streams, Kinesis Data Analytics
    • NoSQL DBs: Cassandra, DynamoDB, MongoDB
    • AWS cloud services: EMR, RDS, MSK, Redshift, DocumentDB, Lambda
    • Graph database development and optimization (Neo4j, Neptune, Titan)
    • Message queue systems: ActiveMQ, RabbitMQ, AWS SQS
    • Federated identity services (SSO): Okta, AWS Cognito
  • We are looking for a candidate with 3+ years of experience in a Data, Cloud, or Software Engineering role who holds a degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field
  • Valid AWS certifications would be a great plus

You’ll be a great fit if:

  • You'd like to work remotely with a flexible schedule
  • You thrive in a small, dynamic, and agile team that encourages you to learn and grow
  • You desire to work with some of the world’s top brands
  • You enjoy finding solutions to interesting problems and figuring out how things work
  • You welcome having autonomy with complex tasks
  • You are passionate about using your experience and expertise to inspire the team

### We Offer ###

  • High compensation paid every two weeks in USD;
  • Completely remote work: we trust our employees to work from anywhere they please;
  • Very flexible schedule consisting of 40 hours per week, Slack for communication;
  • Bureaucracy-free environment;
  • We adore change-makers and creators, so you'll be able to influence our processes;
  • Highly skilled project teams consisting of senior-level professionals;
  • Sensible project rotation, including ML, Big Data, and IoT, across various industries (including but not limited to fintech and healthcare, ads and HR);
  • Paid AWS certifications.