Duties & Responsibilities
Core Responsibilities
• Infrastructure Automation: Implement and maintain Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, or Ansible
• CI/CD Pipeline Management: Design and maintain continuous integration and continuous deployment pipelines to automate release and deployment processes using Jenkins or GitHub Actions
• Cloud Infrastructure Management: Manage and optimize cloud infrastructure on platforms including AWS, Azure, and Google Cloud, ensuring high availability, security, and cost optimization
• Monitoring and Logging: Implement robust monitoring and logging systems using tools such as Prometheus, Grafana, ELK stack, or Datadog to ensure high performance and rapid issue resolution
• Containerization and Orchestration: Utilize Docker for containerization and Kubernetes for orchestration, managing both on-premises and cloud environments
• Security Implementation: Apply security best practices in infrastructure, ensuring systems are secure, compliant, and resilient against threats
• FinOps Management: Collaborate with finance and engineering teams to establish financial accountability for cloud usage, implement cost management practices, and provide forecasting and budgeting insights
• System Uptime Management: Maintain 99.9% uptime of online services to meet Service Level Agreements (SLAs)
• Disaster Recovery: Implement weekly disaster recovery backups for software and databases, stored securely both locally and online
• Security & Threat Prevention: Prevent and mitigate online attacks through proactive security measures
Employee Responsibilities
• Support the company's efforts in developing and refining the corporate vision
• Work collaboratively with teams to develop and apply strategies, both internal and external, that create long-term value as the company evolves
• Build alignment by working as a team to resolve challenges across the company with empathetic, practical, and future-focused problem-solving
• Inspire current and future HyphaMetrics employees by upholding and exercising the company's core values to help maintain a customer-first organization
Specific Responsibilities (Cloud Architecture)
Note: These responsibilities are secondary in priority to core DevOps duties.
• Architecture Design: Lead the design and architecture of scalable, maintainable, and high-performance cloud systems with ownership of key technical decisions
• Performance Optimization: Collaborate with teams to optimize cloud resource usage for performance and cost efficiency
• High Availability Design: Design and implement effective disaster recovery strategies and ensure high availability of services in cloud environments
• Technology Evaluation: Assess and recommend emerging cloud technologies and tools that align with business goals
• Cross-functional Collaboration: Work closely with product management, engineering, and operations teams to ensure cloud solutions meet business requirements and project timelines
Qualifications
Required Qualifications
• Experience: Minimum 5 years of professional experience in DevOps, cloud infrastructure, or related roles
• English Proficiency: Advanced level (approximately 90% proficiency) in reading, writing, and speaking
• Level: Senior / Expert level professional
Technical Skills
1. Linux System Administration
◦ File systems: permissions (chmod, chown), disk partitions, and volume mounting
◦ Process management: identifying zombie processes, killing hung applications, managing background services (systemd)
◦ SSH & Access: generating SSH keys, managing authorized_keys, secure tunneling to remote servers
2. Networking Fundamentals
◦ Protocols: HTTP/HTTPS (status codes), TCP/IP, and DNS
◦ Cloud Networking: VPCs, Subnets, Route Tables, and NAT Gateways
◦ Firewalls: configuring Security Groups or iptables to allow traffic on specific ports
3. Scripting & Automation
◦ Bash/Shell: writing scripts with loops, variables, and error handling
◦ Python/Go: writing complex scripts that interact with APIs
◦ API Interaction: RESTful APIs, JSON data formats, and authentication using tokens
4. Version Control
◦ Branching models: Feature Branching vs Trunk-Based Development
◦ Pull Requests: code review and merge conflict resolution
◦ Tagging/Releases: semantic versioning and release tagging for deployment
5. Infrastructure as Code (IaC)
◦ Declarative vs Imperative: understanding the difference between imperative scripting and declarative desired-state configuration
◦ State Files: understanding state management and the risks of manual modification
6. Security (DevSecOps)
◦ IAM: understanding Roles, Policies, and the Principle of Least Privilege
◦ Secrets Management: never committing passwords or API keys to Git; injecting them as environment variables at runtime instead
7. Troubleshooting & Debugging
◦ Log Analysis: ability to grep through massive log files to find the root cause
◦ Resource Analysis: identifying performance bottlenecks (CPU, RAM, Disk I/O, Network latency)
Technology Stack
From Day 1:
• GCP GKE (Google Kubernetes Engine)
• GitHub Repositories
• PAS (Platform Application Services)
• AWS (Amazon Web Services)
• MongoDB
Core Tools & Technologies:
• Foundation: Linux (Ubuntu/CentOS), Terminal/Bash, Git
• Build & Deploy: Docker, Kubernetes (K8s), Python
• CI/CD Automation: Jenkins, GitHub Actions
• Infrastructure: GCP, AWS, Terraform, Ansible
• Monitoring: Prometheus, Grafana, ELK Stack
Preferred Qualifications
• Experience with serverless architectures and event-driven systems
• Experience with microservices architecture
• Certifications in AWS, Azure, or GCP
• Familiarity with Agile methodologies (Scrum, Kanban)
Core Competencies
• Technical Excellence: Deep expertise in DevOps practices, cloud infrastructure, and automation, with the ability to make key technical decisions independently
• Problem-Solving: Strong troubleshooting and debugging skills, with the ability to identify root causes quickly and implement effective solutions
• Automation Focus: Passion for automating processes and optimizing workflows to improve efficiency and reduce manual intervention
• Security Mindset: Commitment to security best practices and compliance, with the ability to implement secure infrastructure and prevent threats
• Collaboration: Effective team player who works well with cross-functional teams, including development, operations, and business stakeholders
• Performance Under Pressure: Ability to work effectively under pressure, especially during critical incidents and system outages
• Proactive Approach: Self-starter who identifies and addresses potential issues before they become problems
• Continuous Learning: Commitment to staying current with emerging technologies and industry best practices
• Communication: Strong verbal and written communication skills in English (90% proficiency), including technical documentation
• Adaptability: Flexibility to work in different modalities (remote/hybrid/on-site) and adjust to changing priorities and requirements