How would you ensure the reliability of your results before presenting an analysis to senior leadership?

Verify the source data quality: duplicate records, null values in critical fields, complete date ranges, and consistency with other known sources. Validate the analysis logic on a small subset where the expected result is known. Confirm that the calculated metrics match values reported by other systems (cross-check against the billing system, GA, or prior reports). Document the assumptions and limitations of the analysis alongside the results. Presenting leadership with analysis built on questionable data is more harmful than delaying delivery to verify it.
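
The first of these checks can be scripted. The following is a minimal sketch, assuming the rows are available as Python dictionaries; the `quality_report` helper and the field names (`order_id`, `amount`, `order_date`) are hypothetical and only illustrate the duplicate, null, and date-range checks.

```python
from collections import Counter
from datetime import date, timedelta


def quality_report(rows, key, required, date_field):
    """Summarize duplicate keys, nulls in critical fields, and missing dates."""
    key_counts = Counter(row[key] for row in rows)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)

    null_counts = {
        field: sum(1 for row in rows if row.get(field) is None)
        for field in required
    }

    # Walk the calendar between the first and last date and list the gaps.
    present = {row[date_field] for row in rows if row.get(date_field) is not None}
    missing_dates = []
    if present:
        day = min(present)
        while day <= max(present):
            if day not in present:
                missing_dates.append(day)
            day += timedelta(days=1)

    return {
        "duplicate_keys": duplicate_keys,
        "null_counts": null_counts,
        "missing_dates": missing_dates,
    }


# Hypothetical sample data
orders = [
    {"order_id": 1, "amount": 20.0, "order_date": date(2024, 1, 1)},
    {"order_id": 2, "amount": None, "order_date": date(2024, 1, 2)},
    {"order_id": 2, "amount": 35.0, "order_date": date(2024, 1, 2)},
    {"order_id": 3, "amount": 15.0, "order_date": date(2024, 1, 4)},
]
report = quality_report(
    orders, key="order_id", required=["amount", "order_date"], date_field="order_date"
)
print(report)
```

Cross-checking against other systems still has to be done separately; this sketch only covers checks that can run on the dataset itself.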

How would you distinguish between correlation and causality when communicating a finding to a non-technical stakeholder?

Explain with a concrete example from the business itself: 'Users who use this feature have 40% better retention — but that does not mean activating it for everyone will increase retention by 40%. The more engaged users may simply be the ones who naturally use it.' To claim causality, we need a randomized controlled experiment where the feature is assigned randomly. Correlation is a clue about where to look, not a guarantee that the intervention will work. This distinction is fundamental before the business makes investment decisions based on the analysis.

How would you prioritize which analyses to complete when you have more requests than you can handle?

Evaluate each request against three criteria: urgency (is there a decision waiting on this analysis?), impact (how much value can the decision this analysis enables generate?), and estimated effort (how much work does it require?). Communicate priorities to requesters transparently so they can re-assess urgency or simplify the scope. For urgent but low-impact requests, offer a quick, less exhaustive analysis that answers the essential question without full rigor. The analyst's time is scarce; transparent prioritization is more valuable than attempting to satisfy all requests superficially.

What would you do if the data does not support the conclusion the stakeholder expected?

Present the results honestly, regardless of expectations. First verify that the analysis is correct by reviewing the methodology and data with a colleague before presenting. When communicating the unexpected result, provide context: what explains the difference from the initial hypothesis, what other variables might be influencing the outcome, and what additional analysis could clarify the situation. An analyst who adjusts their conclusions to confirm stakeholder expectations destroys the value of analysis as a decision-making tool.

How would you design a dashboard to be useful — not just informative?

A useful dashboard answers specific questions for the people who will use it — it does not showcase all available data. Define the three to five decisions the user must be able to make with the dashboard before designing it. Prioritize metrics that signal whether something requires immediate action over those that are interesting but not actionable. Design with a clear visual hierarchy: overall status first, detail on drill-down. Include reference comparisons (versus prior period, versus target) so numbers have context. A number without context does not enable any decision.

What is a vanity metric and how would you identify one in a set of KPIs?

A vanity metric is one that can grow without the business genuinely improving: total site visits without considering quality, total signups without considering active users, or features launched without measuring adoption. Identify them by asking: can this metric grow while the business gets worse? Can the team manipulate it artificially without improving the real outcome? If the answer is yes to both, it is a vanity metric. Actionable metrics are those whose movement necessarily implies a real improvement in the value delivered to users or in business outcomes.

How would you interpret a sudden spike in a business metric to determine whether it is real or a data artifact?

Before reporting a spike as a real insight, verify: does it coincide with any known event (marketing campaign, tracking failure, change in metric definition)? Does the spike appear across all data sources or only one? Does it affect all user segments or only a specific one? Are there anomalies in the raw data such as duplicate values or out-of-range timestamps? A real spike should appear consistently across multiple sources and segments. A data artifact typically appears in a single source or a very specific segment with no business explanation.

What is the difference between the mean and the median, and when would you use each to describe a dataset?

The mean is the sum of all values divided by the number of observations, and it is sensitive to extreme values (outliers). The median is the central value when the data is sorted, and it is robust against outliers. For symmetric distributions without significant outliers, the two are similar and the mean is more informative. For skewed distributions or data with outliers (incomes, system response times, property prices), the median better represents the typical user or case. Example: if the average load time is 2 seconds but the median is 0.8 seconds, a small number of users with very slow connections are inflating the mean without representing the majority's experience.
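
The load-time example can be reproduced with a few lines. This is a small sketch with made-up values (the `load_times` list is hypothetical), chosen so that one slow session pulls the mean to about 2 seconds while the median stays at 0.8.

```python
from statistics import mean, median

# Hypothetical page load times in seconds; one very slow session at the end.
load_times = [0.6, 0.7, 0.8, 0.8, 0.9, 1.0, 9.2]

avg = mean(load_times)    # pulled upward by the 9.2 s session
mid = median(load_times)  # the middle value of the sorted list

print(f"mean={avg:.2f}s median={mid:.2f}s")
```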

How would you write a SQL query to calculate user retention rates by weekly cohort over the first 30 days?

Identify, for each user, the week of their first event (registration or first relevant action); that week defines the cohort. For each cohort, divide the number of users active in each subsequent week by the total cohort size. One CTE calculates each user's first-use week, another lists the distinct user-week combinations in which the user was active, and a final JOIN calculates the retention percentage by cohort and week. To cover the first 30 days, keep weeks 0 through 4 after the cohort week. Window functions such as FIRST_VALUE(...) or MIN(...) with OVER (PARTITION BY user_id) are useful for identifying each user's entry week. The result is a matrix of cohort vs. week with the retention rate in each cell.
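
One possible shape for this query is sketched below, run against an in-memory SQLite database so it is self-contained. The `events` table, its columns, and the sample rows are hypothetical. The week truncation uses SQLite's `date(x, 'weekday 0', '-6 days')` to get the Monday of each week; other engines have their own equivalent (for example `DATE_TRUNC('week', ...)` in PostgreSQL). This version uses GROUP BY with MIN rather than a window function to find the entry week.

```python
import sqlite3

# Hypothetical events table: one row per user per active day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("u1", "2024-01-01"), ("u1", "2024-01-09"), ("u1", "2024-01-16"),
        ("u2", "2024-01-03"), ("u2", "2024-01-10"),
        ("u3", "2024-01-08"), ("u3", "2024-01-15"),
        ("u4", "2024-01-11"),
    ],
)

RETENTION_SQL = """
WITH first_week AS (
    -- Monday of the week of each user's first event (the cohort)
    SELECT user_id,
           MIN(date(event_date, 'weekday 0', '-6 days')) AS cohort_week
    FROM events
    GROUP BY user_id
),
activity AS (
    -- One row per user and week in which the user was active
    SELECT DISTINCT
           e.user_id,
           f.cohort_week,
           CAST((julianday(date(e.event_date, 'weekday 0', '-6 days'))
                 - julianday(f.cohort_week)) / 7 AS INTEGER) AS week_number
    FROM events e
    JOIN first_week f ON f.user_id = e.user_id
),
cohort_size AS (
    SELECT cohort_week, COUNT(*) AS users
    FROM first_week
    GROUP BY cohort_week
)
SELECT a.cohort_week,
       a.week_number,
       ROUND(100.0 * COUNT(*) / c.users, 1) AS retention_pct
FROM activity a
JOIN cohort_size c ON c.cohort_week = a.cohort_week
WHERE a.week_number BETWEEN 0 AND 4   -- roughly the first 30 days
GROUP BY a.cohort_week, a.week_number, c.users
ORDER BY a.cohort_week, a.week_number
"""

retention = conn.execute(RETENTION_SQL).fetchall()
for cohort_week, week_number, retention_pct in retention:
    print(cohort_week, week_number, retention_pct)
```

Weeks with no active users produce no row in this sketch; pivoting into the cohort-by-week matrix and filling those cells with zero is left to the reporting layer.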

How would you detect and handle outliers in a dataset before using it for a trend analysis?

First identify them visually with boxplots and histograms showing the distribution of key variables. For statistical detection, use the IQR method (values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR) for moderately symmetric distributions, or the Z-score for approximately normal distributions. Before treating outliers, investigate their origin: are they data errors (a price of -100 euros), legitimate exceptional events (Black Friday sales), or simply the long tail of the natural distribution? Errors are corrected or removed. Exceptional events can be excluded from regular trend analysis but documented. Natural tail values are generally kept and their effect on interpretation is noted.
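
The IQR rule can be sketched with the standard library. The `iqr_outliers` helper and the `daily_orders` values are hypothetical, and the quartiles use the "inclusive" interpolation method of `statistics.quantiles`; other tools may compute quartiles slightly differently.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] and the fences used."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in values if v < low or v > high]
    return flagged, (low, high)


# Hypothetical daily order counts with one spike and one impossible value.
daily_orders = [102, 98, 110, 105, 97, 101, 99, 104, 430, -100]
outliers, fences = iqr_outliers(daily_orders)
print(outliers, fences)
```

The function only flags candidates. Deciding whether each one is an error, an exceptional event, or a natural tail value still requires looking at its origin.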

How would you evaluate the statistical validity of an A/B test before declaring a winner?

Verify that the pre-calculated sample size was reached before running the analysis: early stopping is one of the most frequent causes of false positives. Calculate the p-value with the appropriate test for the metric type (chi-squared for conversion rates, t-test for continuous metrics). Verify that statistical power was sufficient (80% or higher) to detect the minimum expected effect. Evaluate practical significance in addition to statistical significance: a p-value of 0.001 with a 0.1% lift may not justify the cost of implementing the change. Review guardrail metrics (metrics that should not move) to detect unintended effects of the treatment.
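
For a conversion-rate metric, the significance check can be sketched as follows. This uses a pooled two-proportion z-test, which for a two-variant test gives the same p-value as the chi-squared test without continuity correction. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical results: control converts at 5.0%, treatment at 5.5%.
lift, z, p_value = two_proportion_ztest(1600, 32000, 1760, 32000)
print(f"absolute lift={lift:.4f} z={z:.2f} p={p_value:.4f}")
```

The p-value alone does not settle the decision: the absolute lift printed next to it is what has to be weighed against the cost of implementing the change.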

How would you write a SQL query to calculate rolling 12-month revenue by month without using a union of 12 subqueries?

Use a window function with a ROWS or RANGE frame to calculate the rolling sum, or a CTE that generates the month series and then joins with the sales data. The cleanest approach: generate a date series with GENERATE_SERIES or a calendar table, join it against the sales events filtered to the 12 months prior to each date, and group by month. Another option is SUM with OVER (ORDER BY month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) for a sliding 12-month window. The ROWS frame counts rows, not months, so it requires exactly one row per month: aggregate to monthly totals first and fill months without sales with zero. Both approaches are more maintainable and efficient than 12 subqueries joined with UNION ALL.
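
The window-function variant can be sketched against an in-memory SQLite database (window functions need SQLite 3.25 or later). The `monthly_revenue` table and its values are hypothetical, and the table is assumed to already hold exactly one row per month with no gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_revenue (month TEXT PRIMARY KEY, revenue INTEGER)")

# Hypothetical data: 14 consecutive months, revenue growing by 10 each month.
months = [f"{2023 + i // 12}-{i % 12 + 1:02d}" for i in range(14)]
conn.executemany(
    "INSERT INTO monthly_revenue VALUES (?, ?)",
    [(m, 100 + 10 * i) for i, m in enumerate(months)],
)

ROLLING_SQL = """
SELECT month,
       revenue,
       SUM(revenue) OVER w AS rolling_12m,
       COUNT(*)     OVER w AS months_in_window  -- below 12 means a partial window
FROM monthly_revenue
WINDOW w AS (ORDER BY month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)
ORDER BY month
"""

rows = conn.execute(ROLLING_SQL).fetchall()
for month, revenue, rolling_12m, months_in_window in rows:
    print(month, revenue, rolling_12m, months_in_window)
```

The `months_in_window` column makes it easy to hide or flag the first eleven months, whose totals cover less than a full year.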

How would you design a customer segmentation analysis to identify groups with different behaviors?

Define segmentation variables relevant to the business: purchase frequency, monetary value, recency of last purchase (RFM framework), preferred product type, or acquisition channel. For descriptive segmentation, use quintiles per variable to create discrete, manageable groups. For more sophisticated segmentation, k-means clustering on the normalized variables. Validate that the segments have actionable differences: it is not enough for them to be statistically distinct — they must justify different marketing, product, or service strategies. Name segments in business terms (champions, at-risk, hibernating) to facilitate adoption by the teams that will act on them.
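
The quintile step of an RFM segmentation can be sketched as below. The `quintile_scores` helper and the five customers are hypothetical; ties are broken by input order here, which a production version would probably want to handle more deliberately.

```python
def quintile_scores(values, higher_is_better=True):
    """Assign each value a score from 1 (worst fifth) to 5 (best fifth) by rank."""
    order = sorted(
        range(len(values)), key=lambda i: values[i], reverse=not higher_is_better
    )
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = 1 + (rank * 5) // len(values)
    return scores


# Hypothetical customers (same position in each list = same customer).
recency_days = [3, 40, 200, 15, 90]     # days since last purchase: lower is better
frequency = [12, 4, 1, 8, 2]            # number of purchases
monetary = [900, 300, 50, 600, 120]     # total spend

rfm = list(
    zip(
        quintile_scores(recency_days, higher_is_better=False),
        quintile_scores(frequency),
        quintile_scores(monetary),
    )
)
print(rfm)
```

A customer scoring (5, 5, 5) would fall into the "champions" group and one scoring (1, 1, 1) into "hibernating", using the business names mentioned above.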

How would you identify the root cause of a 20% drop in conversion rate that occurred last week?

Use a systematic drill-down. First narrow the problem: does the drop affect all acquisition channels or only some? All devices or only mobile? All countries or specific regions? All user types or a specific segment? Each segmentation that reveals a concentrated drop in a subgroup is a clue to the cause. In parallel, review the change timeline: deployments, campaigns launched or paused, price or checkout flow changes, technical changes that may have broken part of the tracking. Cross-reference the timeline changes against the segments where the drop is concentrated to identify the most probable hypothesis before investigating further.
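
The segmentation step can be sketched as a small helper that compares conversion rates before and after the drop along one dimension at a time. The row layout, the `rate_change` function, and the numbers are hypothetical; the sample data is built so the overall rate falls 20% while the whole drop sits in mobile traffic.

```python
from collections import defaultdict


def rate_change(rows, dimension):
    """Relative change in conversion rate per segment, largest drop first.

    Assumes every segment has sessions in both the 'before' and 'after' period.
    """
    agg = defaultdict(lambda: {"before": [0, 0], "after": [0, 0]})
    for row in rows:
        cell = agg[row[dimension]][row["period"]]
        cell[0] += row["conversions"]
        cell[1] += row["sessions"]

    changes = {}
    for segment, periods in agg.items():
        before = periods["before"][0] / periods["before"][1]
        after = periods["after"][0] / periods["after"][1]
        changes[segment] = (after - before) / before
    return sorted(changes.items(), key=lambda item: item[1])


# Hypothetical weekly aggregates by device and channel.
rows = [
    {"period": "before", "device": "desktop", "channel": "paid", "sessions": 2500, "conversions": 125},
    {"period": "before", "device": "desktop", "channel": "organic", "sessions": 2500, "conversions": 125},
    {"period": "before", "device": "mobile", "channel": "paid", "sessions": 2500, "conversions": 100},
    {"period": "before", "device": "mobile", "channel": "organic", "sessions": 2500, "conversions": 100},
    {"period": "after", "device": "desktop", "channel": "paid", "sessions": 2500, "conversions": 125},
    {"period": "after", "device": "desktop", "channel": "organic", "sessions": 2500, "conversions": 125},
    {"period": "after", "device": "mobile", "channel": "paid", "sessions": 2500, "conversions": 55},
    {"period": "after", "device": "mobile", "channel": "organic", "sessions": 2500, "conversions": 55},
]

by_device = dict(rate_change(rows, "device"))
by_channel = dict(rate_change(rows, "channel"))
print(by_device)
print(by_channel)
```

In this example the channel view shows the same drop everywhere, while the device view concentrates it in one segment, which is the kind of clue to cross-reference against the change timeline.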

How would you present the results of a complex analysis to a senior leadership team in 10 minutes?

Structure the presentation in three parts: context in 2 minutes (what was the question and why it matters), the main finding in 5 minutes (what was found, with a clear visualization and the confidence level in the result), and the recommendation in 3 minutes (what the business should do and what the expected impact is). Lead with the conclusion, not the methodology: executives need to know what to do, not how the analysis was done. Prepare the methodological detail as an appendix for anyone who requests it. A well-chosen visualization communicates more in 30 seconds than three slides of data tables.

How would you build a simple sales forecasting model for the next three months without using machine learning?

For a simple but robust forecast, decompose the time series into trend, seasonality, and residual. Calculate the trend with a linear regression over historical data. Calculate seasonal indices as the ratio between each month's sales and the monthly average for that year, averaged over the last two or three years. The forecast is the trend projection multiplied by the seasonal index for the corresponding month. Add a confidence interval based on the historical variability of the residuals. This method is transparent, explainable to the business, and sufficiently accurate for most cases without the complexity of ML models that require more data and more maintenance.
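
A minimal sketch of this method follows, assuming monthly data that starts in January and covers whole years. The function names and the `history` values are hypothetical, and the confidence interval from the residuals is left out to keep the example short.

```python
def fit_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def seasonal_indices(values, period=12):
    """Average ratio of each month's sales to that year's monthly average."""
    years = [values[i:i + period] for i in range(0, len(values), period)]
    years = [year for year in years if len(year) == period]
    ratios = [[v / (sum(year) / period) for v in year] for year in years]
    return [sum(r[m] for r in ratios) / len(ratios) for m in range(period)]


def forecast(values, horizon=3, period=12):
    """Trend projection multiplied by the seasonal index of each future month."""
    slope, intercept = fit_trend(values)
    indices = seasonal_indices(values, period)
    n = len(values)
    return [
        (intercept + slope * (n + h)) * indices[(n + h) % period]
        for h in range(horizon)
    ]


# Hypothetical monthly sales for two full years, January first.
history = [
    80, 82, 91, 100, 103, 112, 115, 104, 106, 118, 130, 108,
    88, 90, 100, 110, 113, 123, 126, 115, 117, 129, 143, 119,
]
next_quarter = forecast(history, horizon=3)
print([round(v, 1) for v in next_quarter])
```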

How would you build a data culture in an organization where business teams make decisions primarily by intuition?

A data culture is built with small, visible wins — not with presentations about the importance of data. Identify a team with a receptive leader and a concrete business problem where data can generate a measurable improvement. Run the analysis, show the result, and follow up on the impact of the data-driven decision. Repeat with another team and another problem. Build self-service tools that allow teams to answer their own questions without waiting for the analyst. Data culture emerges when leaders see their peers making better decisions with data and want the same for their team.

How would you design the metrics system for an early-stage startup that needs to measure progress toward product-market fit?

In the early stage, avoid the trap of measuring everything. Focus on retention indicators that demonstrate the product solves a real problem: the cohort retention curve must stabilize and not drop to zero. Ask the Sean Ellis survey question periodically ('How would you feel if you could no longer use the product?') and track the share answering 'very disappointed' against the 40% threshold. Collect qualitative NPS feedback with the specific reasons given by promoters and detractors. Use usage frequency among the most active users as a proxy for perceived value. Measuring fewer metrics with high frequency and high fidelity is more useful at this stage than many metrics with low confidence.

How would you approach analyzing the impact of a business initiative when it was not possible to run a randomized controlled experiment?

Quasi-experimental methods allow for causal impact estimation when no RCT is available. Difference-in-differences: compare the evolution of a treated group against a comparable control group before and after the intervention. Regression discontinuity: if there is a threshold (e.g., users with more than X days of tenure received the intervention) you can compare users at the edges of that threshold. Propensity score matching: pair treated users with untreated users who are similar on observable variables and compare their outcomes. Each method requires assumptions that must be documented and honestly questioned. Results from these methods must be presented with explicit limitations — not as equivalents to an RCT.
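
The difference-in-differences estimate reduces to simple arithmetic, sketched below with hypothetical weekly order counts. The estimate is only meaningful under the parallel-trends assumption: without the intervention, the treated group would have evolved like the control group.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical average weekly orders per store.
effect = diff_in_diff(
    treated_before=[10, 12, 11],
    treated_after=[15, 16, 17],
    control_before=[9, 10, 11],
    control_after=[11, 12, 13],
)
print(effect)
```

Here the treated group rose by 5 and the control group by 2, so the estimated effect of the initiative is 3 orders per week, not the raw increase of 5.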

How would you structure the data analysis process in a team where analysts work in silos without sharing methodologies or results?

Implement a centralized analysis repository with version control (Notion + GitHub, or a platform like Hex or Mode) where every analysis is reproducible and searchable by other analysts. Establish a shared metrics library with standardized definitions to prevent the same KPI from being calculated differently by different analysts. Weekly analysis review sessions where analysts present their work to the team for methodological feedback. Analysis templates that standardize documentation structure: question, methodology, limitations, and findings. Silos produce inconsistent analyses that erode trust when stakeholders get different numbers for the same question.

How would you evaluate whether existing dashboards in an organization are genuinely serving decision-making or only reporting?

Interview dashboard users with two questions: what was the last decision you made based on this dashboard? What action would you take if the main metric dropped by 20%? If they cannot answer the first question with a concrete recent example, the dashboard is decorative. If they cannot answer the second, the metrics shown are not actionable. Also audit actual usage: how many users open the dashboard weekly? How long do they spend on it? A dashboard with 2 active users out of the 20 who supposedly need it is failing its purpose. The audit result should produce a simplification plan — not a plan to add more metrics.

How would you manage the situation in which the available data is insufficient to answer the business question with the required confidence?

Communicate the limitation proactively to the stakeholder before investing time in an analysis that will produce unreliable results. Present the available options: an analysis with existing data that produces an estimate with wide confidence intervals and explicit assumptions, identification of what additional data would reduce the uncertainty and how to obtain it, or the design of an experiment that would generate the needed data in the future. In some cases, the correct answer is 'we do not have the data to answer this question with sufficient confidence to make this decision.' That honesty is more valuable than an analysis that implies greater certainty than the data supports.

Presenting correlations as causalities when communicating findings without distinguishing between the two concepts

Saying 'users who use feature X have 40% better retention, therefore we should force its adoption' is a serious conceptual error that can lead to costly and ineffective product decisions. Interviewers at companies with mature analytical cultures ask direct questions about causality and expect the analyst to know the limits of their inference.

Not validating data quality before presenting analysis results

An analysis built on incorrect data produces incorrect conclusions with a veneer of rigor that makes them more dangerous. Interviewers specifically ask how the candidate verifies data quality before using it. Not mentioning this step signals limited experience with real production data, which frequently has quality issues.

Describing dashboards and reports as the primary output of the work without mentioning impact on business decisions

Building dashboards is a means, not an end. An analyst who describes their work primarily in terms of the dashboards they built, without being able to articulate what decisions those dashboards enabled, is measuring impact in outputs rather than outcomes. Interviewers at impact-oriented data teams ask what decisions the analysis changed.

Not being able to explain the analysis methodology or justify the statistical decisions made

An analyst who presents results without being able to explain why they chose that statistical technique, what assumptions it makes, and what the method's limitations are produces analyses that are neither reproducible nor open to challenge. Technical interviewers ask candidates to explain step by step how they would approach a specific analysis.

Ignoring the audience when communicating results and presenting analyses with the same level of technical detail to all stakeholders

An analysis that works well for a data scientist may be incomprehensible or irrelevant to a sales director. The ability to adapt communication to the technical level and interests of each audience is a core competency of the role. Interviewers specifically evaluate how the candidate communicates results to non-technical audiences.

Not considering the business context when interpreting metrics and reporting changes without evaluating whether they are meaningful in practical terms

Reporting that the conversion rate rose 0.1% with statistical significance without mentioning that this change equals three additional sales per month demonstrates a lack of business judgment. Numbers have context and scale: what is statistically significant is not always practically relevant to the business.

Data Analyst / BI

Turns data into concrete answers that allow the business to make better decisions with evidence instead of intuition.

A Data Analyst is responsible for extracting, cleaning, analyzing, and interpreting data to answer specific business questions and generate actionable insights. Their work ranges from building SQL queries and exploring datasets to creating visualizations and communicating findings to non-technical audiences. They do not just describe what happened: they investigate why it happened and what it implies for business decisions. They work closely with product managers, marketing teams, operations, and leadership to translate business questions into rigorous analyses and understandable results.

SQL, Python, Tableau, Power BI, Statistics, Google Analytics

Main Responsibilities

• Extract and transform data from multiple sources using SQL and analysis tools to answer specific business questions.
• Clean and validate data before analysis to ensure results are reliable and reproducible.
• Identify trends, patterns, and anomalies in data that generate actionable insights for business teams.
• Build and maintain dashboards and reports that allow teams to monitor their KPIs autonomously.
• Design and analyze A/B experiments in collaboration with product and marketing teams.
• Communicate analysis results with clarity and honesty about the limitations and assumptions used.

Key Skills

Technical Skills

Advanced SQL for data extraction, transformation, and analysis: window functions, CTEs, subqueries, and query optimization
Python or R for statistical analysis, data manipulation with pandas, and visualization with matplotlib or seaborn
Visualization and BI tools: Tableau, Power BI, Looker, or equivalent for building actionable dashboards
Applied statistics: distributions, hypothesis testing, confidence intervals, and correlation for rigorous analyses
Product analytics tools: Google Analytics 4, Mixpanel, Amplitude for user behavioral analysis
Data warehouse knowledge: Snowflake, BigQuery, or Redshift for working with large data volumes efficiently

Soft Skills

Analytical thinking to decompose complex business questions into manageable, rigorous analyses
Healthy skepticism to question data before drawing conclusions and verify quality before presenting results
Effective communication to present complex findings with clear visualizations and accessible narratives for non-technical audiences
Curiosity to explore data beyond the original question and uncover unanticipated insights
Intellectual honesty to report negative or ambiguous results with the same clarity as positive ones
Ability to prioritize when a simple analysis answers the question better than a complex, time-consuming one

Real use cases

Context

Understanding how users navigate the product and where they drop off allows the product team to prioritize improvements with the greatest impact on key metrics.

Real examples

Registration funnel analysis identifying the steps with the highest drop-off and their causes
User segmentation by behavior to identify patterns among the best-converting users
Retention cohort analysis to understand when and why users stop using the product
Identification of features most correlated with long-term retention

Context

Business teams need visibility into their key metrics to make operational decisions. The analyst builds the reporting systems that provide that visibility reliably.

Real examples

Weekly sales dashboards with comparison against prior periods and targets
Marketing campaign performance reports with channel attribution
Sales pipeline quality analysis by stage and by representative
Operational metric monitoring with automatic alerts on significant deviations

Context

Product and marketing decisions based on well-designed experiments are more likely to generate the expected impact than those based on opinions or correlational data.

Real examples

Sample size and minimum duration calculation for an A/B test before launching
Results analysis using appropriate statistical tests with multiple comparisons correction
Root cause diagnosis for why an experiment produced counterintuitive results
Practical significance evaluation in addition to statistical significance for business decisions
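
The sample size and minimum duration calculation from the first example can be sketched as follows. It uses the standard normal-approximation formula for comparing two proportions; the function names, the baseline and expected rates, and the daily traffic figure are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, expected, alpha=0.05, power=0.80):
    """Users needed in each variant to detect baseline -> expected (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = baseline * (1 - baseline) + expected * (1 - expected)
    return ceil((z_alpha + z_beta) ** 2 * variance / (expected - baseline) ** 2)


def minimum_days(n_per_arm, daily_visitors, arms=2):
    """Days of traffic needed if all visitors are split evenly across the arms."""
    return ceil(arms * n_per_arm / daily_visitors)


# Hypothetical test: detect a lift from 5.0% to 5.5% with 4,000 visitors per day.
n = sample_size_per_arm(baseline=0.05, expected=0.055)
days = minimum_days(n, daily_visitors=4000)
print(n, days)
```

Halving the detectable lift roughly quadruples the required sample, which is why the minimum effect worth detecting should be agreed with the business before launch.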

Context

Not all business questions are well-defined. Exploratory analysis uncovers patterns and opportunities that teams did not know existed.

Real examples

Highest-LTV customer analysis to identify shared characteristics that guide acquisition strategy
Exploration of products with the highest return rate to identify quality or expectation problems
Geographic sales analysis to identify under-penetrated regions with growth potential
Identification of user segments with anomalous behavior that merit qualitative investigation

Context

Strategic initiatives — market expansion, pricing changes, new product launches — require data analysis to evaluate the opportunity and measure post-implementation impact.

Real examples

New market expansion feasibility analysis with demand and competitive data
Impact evaluation of a pricing change on sales volume and revenue
Post-launch analysis of a new product comparing its adoption against previous products
Scenario modeling to evaluate the impact of different retention strategies