When would you normalize a schema and when would you deliberately denormalize it?

Normalize for OLTP: reduces redundancy, guarantees integrity, and simplifies writes. Denormalize strategically for OLAP workloads or when read queries are frequent and the cost of joins is measurable in production. Denormalization is always a deliberate decision with its trade-offs documented: increased write complexity and a risk of inconsistency if not managed carefully.

What criteria would you use to decide which columns to index on a table?

Index columns that appear frequently in WHERE, JOIN, ORDER BY, or GROUP BY clauses in high-volume queries. Avoid indexing low-cardinality columns (few distinct values) because the index adds no meaningful selectivity. Consider composite indexes when queries filter on multiple columns together. Every index carries a cost in write performance and storage — never create speculative indexes without evidence they will be used.

How would you investigate a query that is taking much longer than expected in production?

Run EXPLAIN ANALYZE to see the actual execution plan: check whether it is doing a sequential scan instead of an index scan, whether there are expensive nested loops, or whether the planner's row estimates differ significantly from reality. Verify that table statistics are current with ANALYZE. Check whether query parameters are preventing index use — functions applied to indexed columns, or implicit type conversions. Compare the plan in staging against production if they diverge.

What is the difference between a logical backup and a physical backup, and when would you use each?

Logical backup (pg_dump, mysqldump): exports data as SQL or CSV. Portable across versions and engines, but slow to restore for large databases. Physical backup (pg_basebackup, volume snapshots): copies the engine's data files directly. Faster to restore and enables point-in-time recovery with WAL archiving. For production databases with low RPO requirements, physical backup with PITR is the standard approach.

What is a deadlock, and how would you detect and prevent one in a production database?

A deadlock occurs when two transactions each wait for the other to release a lock, blocking indefinitely. Database engines detect this automatically and abort one of the transactions. Prevention: always acquire resources in the same order across all transactions, keep transactions short, and use the minimum necessary isolation level. Detection: the engine logs deadlocks — PostgreSQL records them when deadlock_timeout is exceeded.

How would you manage a developer's access to the production database to investigate a critical bug?

Create a read-only user scoped to the specific tables needed, with a time-limited access window. Log the access in the audit system. Never grant access using the application user account or a superuser. If data modification is required, the change must be reviewed, approved, and executed by the DBA — not the developer directly. Revoke access immediately after the incident is resolved.

What considerations would you apply when designing the schema for a multi-tenant application?

Three main strategies: separate database per tenant (maximum isolation, highest cost), separate schema per tenant in the same database (good isolation, moderate scalability), or shared tables with a tenant_id column (greatest scale, requires row-level security to enforce isolation). The choice depends on the number of tenants, compliance and isolation requirements, and infrastructure cost. Row-level security in PostgreSQL is a key tool for the shared-table approach.

What steps would you take before executing a schema migration on a heavily loaded production database?

Run the migration against a recent copy of production, measuring the actual execution time and the locks it acquires. Verify that a tested rollback plan is in place. Communicate the maintenance window if user impact is expected. Execute during the lowest-traffic window. Actively monitor database metrics during the migration. Have the rollback command ready to execute within seconds if something fails. Never run a production migration that has not been tested beforehand.

How does a B-tree index work internally, and in what cases is it not the best option?

A B-tree maintains values in a sorted, self-balancing tree structure that supports searches, inserts, and deletes in O(log n). It is efficient for equality and range comparisons. It is not optimal for: full-text search (use GIN with tsvector), geospatial data (use GiST or SP-GiST), low-cardinality columns where a bitmap scan is more efficient, or arrays and JSONB where GIN indexes allow searching within values.

How would you implement replication with automatic failover in PostgreSQL, and what tools would you use?

Streaming replication with WAL shipping to one or more standby replicas. For automatic failover, Patroni is the current standard: it manages consensus between nodes using etcd or ZooKeeper, automatically promotes the most up-to-date replica when the primary fails, and updates the connection endpoint via HAProxy or a service discovery entry. The application connects to a virtual endpoint that always points to the current primary, with no configuration changes needed during failover.

What is MVCC and how does it impact the performance and maintenance of PostgreSQL?

Multi-Version Concurrency Control allows readers and writers to operate without blocking each other: each transaction sees a consistent snapshot of the data as it existed when the transaction began. The cost is that old row versions (dead tuples) accumulate until VACUUM reclaims them. Without sufficient vacuuming, table bloat degrades performance and can eventually cause transaction ID wraparound — a critical failure mode. Monitoring VACUUM lag and dead tuple ratios is part of routine DBA maintenance.

How would you design a partitioning strategy for an events table that grows by 50 million rows per month?

Range partitioning by date (monthly or weekly partitions) is the most appropriate approach for time-series data. It enables: dropping old partitions with DROP PARTITION instead of DELETE — instantaneous and without table bloat — confining recent queries to small partitions to improve performance, and implementing clean data retention policies. Create future partitions in advance. Verify that queries include the partition column in the WHERE clause so the planner can perform partition pruning.

How would you diagnose and resolve severe table bloat on a critical production table?

Confirm the bloat by comparing pg_relation_size against pg_total_relation_size, or using the pgstattuple extension. Common causes: autovacuum disabled or configured too conservatively, or long-running transactions preventing VACUUM from reclaiming dead tuples. Zero-downtime solution: VACUUM FULL requires an exclusive lock (not viable on an active production table); use pg_repack or pgcompacttable to reorganize the table online. Tune autovacuum parameters (autovacuum_vacuum_scale_factor, autovacuum_vacuum_cost_delay) to be more aggressive on high-churn tables.

What strategies would you use to scale reads on a PostgreSQL database that is being saturated by analytical queries?

Separate OLAP from OLTP load by adding dedicated read replicas for analytical queries. Route reporting queries to replicas via connection pooling (PgBouncer) or at the application layer. For very heavy queries, consider materializing results in aggregation tables updated on a schedule. If analytical workload continues to grow unboundedly, evaluate offloading it to a dedicated OLAP system (Redshift, BigQuery, ClickHouse) with synchronization from the transactional database.

How would you implement auditing of access to sensitive data without degrading database performance?

Apply audit triggers only to tables containing sensitive data, not to the entire database. Write audit records asynchronously to a separate table or an external system (Kafka, centralized log) to avoid adding latency to the write path. In PostgreSQL, pgaudit enables granular auditing by object and operation type with lower overhead than custom triggers. For strict compliance, send audit logs to an immutable external system where they cannot be altered.

What are the practical differences between using PgBouncer for connection pooling versus direct connections from the application?

PostgreSQL spawns a process per connection with a significant memory cost (5–10 MB per connection). With hundreds of application instances, direct connections will saturate the database. PgBouncer maintains a small pool of real database connections and multiplexes many application connections over them. In transaction pooling mode, the connection is returned to the pool at the end of each transaction. The trade-off: some PostgreSQL features that depend on session state — prepared statements, advisory locks, SET LOCAL — do not work correctly in transaction pooling mode.

How would you design the data strategy for a system that needs global high availability with users across multiple regions?

Evaluate active-passive replication (one primary region, read replicas in other regions) versus active-active (writes accepted in multiple regions with conflict resolution). Active-passive is simpler and avoids write conflicts; the trade-off is write latency for users far from the primary. Active-active with CRDTs or timestamp-based resolution is only viable for data models that tolerate conflicts. CockroachDB or Spanner are options for global consistency without manually managing the complexity.

How would you approach migrating a monolithic database to a distributed data model organized by business domain?

Identify bounded contexts and the data that belongs to each. Start with the most independent domains. The standard strategy is the strangler fig pattern: the new service writes to its own database and to the monolith in parallel during the transition. Once the new system is stable, writes to the monolith are disabled and the monolith reads from the new service via API. The biggest risk is cross-domain consistency: distributed transactions must be replaced with eventual consistency and domain events.

How would you manage a data corruption incident in production that affects millions of records?

First, contain the damage: disable the processes that are continuing to write corrupt data. Then, assess the exact scope using diagnostic queries. Restore from the most recent backup taken before the corruption, using PITR if available. If a full restore is not feasible, identify the affected records and correct them using scripts validated in staging first. Document the incident fully: root cause, scope, actions taken, and improvements to prevent recurrence. Stakeholder communication must be honest about the real extent of the impact.

How would you evaluate whether an existing relational database should be partially or fully migrated to a NoSQL model?

Migration is justified when the relational model is a real, measurable obstacle: schemas that change very frequently and generate costly migrations, data with highly variable structure across records, or access patterns that require horizontal scale beyond what PostgreSQL can achieve. Do not migrate for technological trend-following. In most cases, PostgreSQL with JSONB for variable fields and partitioning for volume resolves the problem without the cost of a full migration. If migration is genuinely justified, do it incrementally by business entity — never as a big bang.

How would you build a schema change review process that does not slow the development team but prevents production incidents?

Embed the review in the pipeline: tools like Squawk or sqlfluff automatically flag dangerous migrations (ALTER TABLE statements that acquire exclusive locks, adding a NOT NULL column without a default on a large table). The DBA reviews only the changes that the automated tools flag as high risk. Provide developers with a guide to safe migration patterns with concrete examples. The goal is for 80% of migrations to pass without manual review because the team already knows the correct patterns.

How would you design a data retention and archival strategy for a system with both legal retention obligations and right-to-erasure requirements?

Classify data by category: active operational data, historical data required for compliance retention, and data subject to right-to-erasure. Implement date-based partitioning to make archival and deletion of old data straightforward. Compliance-retained data is moved to cheaper storage (S3 Glacier) with metadata that allows retrieval if required. Data subject to erasure must have a verifiable deletion mechanism that includes backup coverage. Document the complete data flow for compliance audits.

Proposing indexes as the solution to any performance problem without analyzing the execution plan first

Indexes are not always the right answer and carry real costs in write performance and maintenance. A DBA who recommends adding an index without first running EXPLAIN ANALYZE and understanding why the planner is not using existing indexes is diagnosing superficially — and may make the problem worse.

Not mentioning a rollback plan when describing a production migration

Any change to a production database must have a tested rollback procedure in place before it is executed. A DBA who describes migrations without addressing how to revert if something goes wrong immediately loses credibility with interviewers who have dealt with real database incidents.

Conflating high availability with backups or treating them as equivalent strategies

Backups address data loss (RPO). High availability addresses downtime (RTO). They are complementary, not interchangeable. A system with excellent backups but no HA can still face hours of downtime to restore. A system with HA but no backups can lose data due to corruption or accidental deletion.

Never having tested a backup restore in a real environment

A backup that has never been restored is a hypothesis, not a guarantee. Interviewers at companies with critical data specifically ask when the last restore test was performed and how long it took. Not having an answer to that question is a serious operational red flag.

Discussing schema design without explaining the access patterns that drove the decisions

A schema is not good or bad in the abstract — it is good or bad for a specific set of access patterns. Describing a database design without explaining which queries it was optimized for, what write volume was anticipated, or which relationships were prioritized suggests the design was intuitive rather than evidence-based.

Not accounting for lock impact when proposing schema changes on tables with active traffic

Operations such as ADD COLUMN NOT NULL without a default, CREATE INDEX without CONCURRENTLY, or VACUUM FULL acquire exclusive locks that block both reads and writes. A DBA who proposes these changes without addressing their availability impact — or how to avoid it with online alternatives — demonstrates a lack of experience with production databases under real load.

DBA

Ensures that business data is available, consistent, and accessible at the performance level the product demands.

A Database Administrator (DBA) is responsible for designing, implementing, maintaining, and optimizing the database systems that underpin an organization's products and operations. Their work spans data modeling and database engine configuration to backup management, replication, high availability, and disaster recovery. They work closely with back-end developers, architects, and infrastructure teams to ensure that data systems can support business growth without compromising performance or data integrity.

PostgreSQLMySQLMongoDBRedisSQLReplication

Recruit the best DBA here

Start now

Main Responsibilities

•Design and maintain relational and non-relational database schemas optimized for the product's access patterns.
•Monitor and optimize query performance, indexes, and engine configuration to keep response times within SLAs.
•Implement and maintain backup, replication, and high availability strategies to ensure business continuity.
•Manage schema migrations in production using strategies that minimize downtime and the risk of data loss.
•Define and enforce data access security policies: roles, permissions, auditing, and encryption.
•Collaborate with development teams by reviewing queries and schema designs before they reach production.

Key Skills

Technical Skills

Advanced SQL: complex joins, window functions, CTEs, subqueries, and execution plan optimization
Administration of relational engines (PostgreSQL, MySQL) and non-relational (MongoDB, Redis), including configuration and performance tuning
Schema design with normalization, strategic denormalization, and data modeling for both OLTP and OLAP workloads
Implementation of replication, clustering, and high availability strategies (streaming replication, read replicas, automatic failover)
Backup management with point-in-time recovery strategies and regular, documented restore testing
Performance monitoring with tools such as pg_stat_statements, slow query log, EXPLAIN ANALYZE, and APM systems

Soft Skills

Methodical rigor when executing production changes: checklists, pre-change validations, and documented rollback plans
Technical communication to translate database performance issues into business impact that non-technical stakeholders can understand
Preventive thinking to anticipate bottlenecks before data volume makes them unavoidable
Ability to challenge schema or query designs in collaboration with developers without creating unnecessary friction
Operational discipline to keep documentation current: configurations, design decisions, and emergency procedures
Judgment to balance technical schema purity with the speed the business needs to evolve the product

Real use cases

Context

As data volume increases, queries that performed well at thousands of rows begin to degrade. Proactive optimization prevents the problem from surfacing in production.

Real examples

Slow query log analysis and identification of queries lacking appropriate indexes
Composite index design tailored to the product's specific access patterns
Table partitioning for historical data to maintain performance across years of records
Query rewriting for inefficient execution plans identified with EXPLAIN ANALYZE

Context

A well-designed schema from the start reduces future technical debt and makes the product easier to evolve. A poorly designed schema is expensive to correct once there is production data in it.

Real examples

Entity-relationship modeling with normalization appropriate to the use case
Decision between relational and document models based on access patterns
Multi-tenant schema design with per-organization data isolation
Definition of constraints, foreign keys, and check constraints that enforce integrity at the database level

Context

Databases are the most critical component in most systems. An outage affects the entire application. High availability and DR must be designed before the first incident occurs.

Real examples

Streaming replication with automatic failover configured in PostgreSQL
Read replica implementation to distribute read-heavy workloads
Backup strategy with periodic, documented restore tests
RTO and RPO targets defined and validated through scheduled DR drills

Context

Changing the schema of a database with millions of rows and active users is one of the riskiest production operations. It requires careful planning, a sound strategy, and thorough validation.

Real examples

Zero-downtime migrations using the expand-contract pattern
New column backfills on large tables using batches to avoid lock contention
Data integrity validation before and after every migration
Rollback procedures documented and tested before applying to production

Context

Data is an organization's most sensitive asset. Databases must have strict access controls, operation auditing, and encryption of sensitive data.

Real examples

Role and permission model following least-privilege principles per application and user
Access auditing on tables containing PII or financial data
Sensitive column encryption with key management integrated into a KMS
Dynamic data masking for development and staging environments

DBA

Recruit the best DBA here

Main Responsibilities

Key Skills

Technical Skills

Soft Skills

Real use cases

Basic questions

Technical questions

Advanced questions

Common interview mistakes

Other Roles

Related Frameworks

Salary Range LATAM

Solutions for recruiting and job searching

Top talent specialized in DBA