Head of System Administration
Job description
We're looking for a lead sysadmin for a full-time remote position.
Skills required:
- high initiative
- make decisions fast. A lot happens in the sys team every day, so we can't afford to think about each issue for days; often we don't even have hours. The sys team lead should be able to:
* make the correct decision fast
* find a person who can solve the problem fast
* make sure the problem is solved
- as a consequence of the above, research and implementation of miscellaneous things should also be done fast. If a quick solution takes 1 hour and a good solution takes 10 hours, you definitely go with the quick one. A month later there's a chance the world will have changed, and the better solution will either not be needed at all or would have to be rewritten from scratch.
- perfectionism is not among the expected skills. Pragmatism very much is.
- ability to do things the lean way: if nobody needs it now, nobody does it now
- the quality of decisions, research, and implementations still matters. The lead sysadmin is the one who guarantees it.
- ability to convince a sysadmin that a job needs to be done, and done soon, even if they don't want to do it. Or do the job yourself.
- ability to speak English fluently
Scope of responsibilities:
- report to CTO/CEO
- providing high uptime (there will be a monthly achievement based on this)
- handling incoming requests from customers in a timely manner (e.g. any email received during working hours should be answered within an hour)
- handling manual and automated emergency alerts (emails, IRC messages from a bot, calls): a few incidents each month
- planning maintenance together with the admins (customers should know in advance if we expect any downtime or a higher rate of warnings)
- short-term planning: daily important things
- long-term planning: quarterly goals
- coordinate communication with technical partners
- coordinate successful work with development team
- configuring integration with existing and new service providers, e.g. phone.com / pingdom / twilio
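
To make the uptime responsibility above concrete, here is a minimal sketch of how a monthly uptime figure and the matching downtime budget could be computed. The function names, the 30-day month, and the 99.9% target in the usage note are illustrative assumptions; the posting does not state an actual target or how the monthly achievement is measured.

```python
def monthly_uptime_percent(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Return uptime as a percentage of the month, given total downtime in minutes."""
    total_minutes = days_in_month * 24 * 60
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must be between 0 and the length of the month")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def downtime_budget_minutes(target_percent: float, days_in_month: int = 30) -> float:
    """Return how many minutes of downtime a given uptime target allows per month."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target must be a percentage between 0 and 100")
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - target_percent) / 100.0
```

For example, under a hypothetical 99.9% target, downtime_budget_minutes(99.9) comes to roughly 43 minutes for a 30-day month, which shows how little room a few incidents per month leave.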
Possible long-term projects a lead sysadmin should be ready for:
- decrease the average number of alerts
- organize periodic testing of:
* HA
* backups
* emergency calls
- hire another sysadmin or two to organize 24/7 "follow the sun" support
- introduce configuration management (Ansible/Chef/Puppet)
What we have:
- 2 highly qualified sysadmins in the team (speaking Russian)
- a development team of 4 people
- English-speaking customers
- about 200 servers, mostly very powerful Dell servers of different models
- hundreds of terabytes of data in MySQL and Sphinx
- an API used by huge companies; it should never go down, since each minute of downtime costs a lot of money
- CentOS/RAID/DRAC/OMSA/PERC, Apache/Squid/nginx/pfSense/WatchGuard/PHP/MySQL/Sphinx/memcache/Redis/RabbitMQ/Kafka/Mercurial/Perl/Python
- Pingdom/customized Nagios/Zabbix/Twilio/phone.com