LosAngelesRecruiter Since 2001
the smart solution for Los Angeles jobs

Senior Site Reliability Engineer

Company: SHEIN Distribution Corporation
Location: Los Angeles
Posted on: August 7, 2022

Job Description:

Job Title: Senior Site Reliability Engineer
Reports to: Head of Technical Operations
Job Location: Los Angeles Based, Remote

About SHEIN Technology
SHEIN Technology is a U.S. technology company. Founded in 2012, SHEIN is a leading global online retailer with operations in Guangzhou, Los Angeles and Singapore, along with other key markets. SHEIN reaches consumers across more than 150 countries and regions around the world. We place a premium on choice, delivering more than 6,000 new fashion, beauty and lifestyle products daily with more than 600,000 items available. Our mission is to help people express their individuality through the latest trends that are accessible and affordable. To learn more about SHEIN, follow us at shein.com, instagram.com/sheinofficial and youtube.com/shein.

Position Summary
We are looking for an experienced Site Reliability Engineer to join our Technical Operations team. Site Reliability Engineers at SHEIN are hybrid software/systems engineers whose overarching goal is to ensure that Production Services are "Always On." They strive to build the most reliable and performant systems on the planet.

SREs work closely with cross-functional teams to ensure we have the right set of tools to generate, collect, analyze, visualize and alert on operational data, so we know exactly what happens across the ecosystem and can see problems before they occur and address them as quickly as possible.

They are also responsible for improving Operational Efficiency, Utilization and System Resiliency of the Platform. They own Critical Open-Source Software that our platform relies on and are core participants in every significant engineering effort underway in the platform.

They are also tasked with driving forward the operability of the platform to drive down the number of incidents while reducing MTTR. To accomplish this, the team combines software development, networking and systems engineering expertise, and a strong desire to be challenged by problems of scale and complexity to make our service better for our customers.

Responsibilities:

  • Participate in an on-call rotation to ensure 24/7/365 availability of SHEIN's production system
  • Responsible for architecture review, capacity planning, and cost optimization of Hadoop/Spark/Flink/Elasticsearch/Kafka/Druid and other systems
  • Responsible for user management, authority allocation, and resource allocation of the big data platform
  • Deeply understand the data platform architecture; discover and resolve hidden problems and performance bottlenecks
  • Triage Site Availability Incidents and proactively work towards reducing MTTR for customer-impacting incidents
  • Respond to production incidents and use your experience in software development, systems engineering, and networking to proactively prevent recurring issues
  • Provide relief and sustainable resolution to issues within our infrastructure
  • Drive initiatives to evolve our current platform to increase efficiency and keep it in line with current standards and best practices
  • Drive efficiencies through software improvement and root cause analysis, resulting in improved service delivery, maturity, and scalability
  • Develop and maintain technical documentation, network diagrams, runbooks, and procedures

Skills and Qualifications

  • Bachelor's degree in Computer Science, Information Systems, or an equivalent technical discipline
  • More than 4 years of experience operating and maintaining big data components (Hadoop/YARN/HBase/Hive/Spark, etc.)
  • Systematic problem-solving approach, combined with a sense of ownership and drive
  • Track record of monitoring and analyzing system performance, isolating issues or bottlenecks that could impact reliability, performance and scalability
  • Deep understanding of Linux systems and the ability to deploy open-source software independently
  • Proficiency in more than one scripting language (Shell/Perl/Python, etc.); familiarity with Python development is preferred
  • Understanding of and experience with SRE concepts and practices, including advocating for the elimination of toil and driving simple solutions
  • Good verbal and written communication skills, and the ability to work effectively with geographically remote teams
  • Experience with observability tools such as Grafana, Prometheus, Zabbix, etc.
  • Experience with Atlassian tools, such as Confluence and Jira

SHEIN Technology is an equal opportunity employer committed to a diverse workplace environment.

