Talently
Data Analyst / BI

Turns data into concrete answers that allow the business to make better decisions with evidence instead of intuition.

A Data Analyst is responsible for extracting, cleaning, analyzing, and interpreting data to answer specific business questions and generate actionable insights. Their work ranges from building SQL queries and exploring datasets to creating visualizations and communicating findings to non-technical audiences. They do not just describe what happened: they investigate why it happened and what it implies for business decisions. They work closely with product managers, marketing teams, operations, and leadership to translate business questions into rigorous analyses and understandable results.

SQL, Python, Tableau, Power BI, Statistics, Google Analytics

Main Responsibilities

  • Extract and transform data from multiple sources using SQL and analysis tools to answer specific business questions.
  • Clean and validate data before analysis to ensure results are reliable and reproducible.
  • Identify trends, patterns, and anomalies in data that generate actionable insights for business teams.
  • Build and maintain dashboards and reports that allow teams to monitor their KPIs autonomously.
  • Design and analyze A/B experiments in collaboration with product and marketing teams.
  • Communicate analysis results with clarity and honesty about the limitations and assumptions used.

Key Skills

Technical Skills

  • Advanced SQL for data extraction, transformation, and analysis: window functions, CTEs, subqueries, and query optimization
  • Python or R for statistical analysis, data manipulation with pandas, and visualization with matplotlib or seaborn
  • Visualization and BI tools: Tableau, Power BI, Looker, or equivalent for building actionable dashboards
  • Applied statistics: distributions, hypothesis testing, confidence intervals, and correlation for rigorous analyses
  • Product analytics tools: Google Analytics 4, Mixpanel, Amplitude for user behavioral analysis
  • Data warehouse knowledge: Snowflake, BigQuery, or Redshift for working with large data volumes efficiently

Soft Skills

  • Analytical thinking to decompose complex business questions into manageable, rigorous analyses
  • Healthy skepticism to question data before drawing conclusions and verify quality before presenting results
  • Effective communication to present complex findings with clear visualizations and accessible narratives for non-technical audiences
  • Curiosity to explore data beyond the original question and uncover unanticipated insights
  • Intellectual honesty to report negative or ambiguous results with the same clarity as positive ones
  • Ability to prioritize when a simple analysis answers the question better than a complex, time-consuming one

Real use cases

Context

Understanding how users navigate the product and where they drop off allows the product team to prioritize improvements with the greatest impact on key metrics.

Real examples

  • Registration funnel analysis identifying the steps with the highest drop-off and their causes
  • User segmentation by behavior to identify patterns among the best-converting users
  • Retention cohort analysis to understand when and why users stop using the product
  • Identification of features most correlated with long-term retention

Context

Business teams need visibility into their key metrics to make operational decisions. The analyst builds the reporting systems that provide that visibility reliably.

Real examples

  • Weekly sales dashboards with comparison against prior periods and targets
  • Marketing campaign performance reports with channel attribution
  • Sales pipeline quality analysis by stage and by representative
  • Operational metric monitoring with automatic alerts on significant deviations

Context

Product and marketing decisions based on well-designed experiments are more likely to generate the expected impact than those based on opinions or correlational data.

Real examples

  • Sample size and minimum duration calculation for an A/B test before launching
  • Results analysis using appropriate statistical tests with multiple comparisons correction
  • Root cause diagnosis for why an experiment produced counterintuitive results
  • Practical significance evaluation in addition to statistical significance for business decisions
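The sample size and minimum duration calculation from the first example can be sketched with the standard normal-approximation formula for comparing two proportions. The baseline rate, target rate, daily traffic, and the helper names `sample_size_per_group` and `minimum_duration_days` are illustrative assumptions, not values from a real experiment.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


def minimum_duration_days(n_per_group, daily_users_per_group):
    """Days needed to reach the sample size, assuming constant daily traffic."""
    return math.ceil(n_per_group / daily_users_per_group)


# Assumed scenario: 10% baseline conversion, hoping to detect a move to 12%.
n = sample_size_per_group(0.10, 0.12)
print(n, minimum_duration_days(n, daily_users_per_group=500))
```

Smaller expected effects drive the required sample up quickly, which is why the minimum detectable effect should be agreed with the business before launch.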

Context

Not all business questions are well-defined. Exploratory analysis uncovers patterns and opportunities that teams did not know existed.

Real examples

  • Highest-LTV customer analysis to identify shared characteristics that guide acquisition strategy
  • Exploration of products with the highest return rate to identify quality or expectation problems
  • Geographic sales analysis to identify under-penetrated regions with growth potential
  • Identification of user segments with anomalous behavior that merit qualitative investigation

Context

Strategic initiatives — market expansion, pricing changes, new product launches — require data analysis to evaluate the opportunity and measure post-implementation impact.

Real examples

  • New market expansion feasibility analysis with demand and competitive data
  • Impact evaluation of a pricing change on sales volume and revenue
  • Post-launch analysis of a new product comparing its adoption against previous products
  • Scenario modeling to evaluate the impact of different retention strategies

Basic questions

Verify the source data quality: duplicate records, null values in critical fields, complete date ranges, and consistency with other known sources. Validate the analysis logic on a small subset where the expected result is known. Confirm that the calculated metrics match values reported by other systems (cross-check against the billing system, GA, or prior reports). Document the assumptions and limitations of the analysis alongside the results. Presenting leadership with analysis built on questionable data is more harmful than delaying delivery to verify it.
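The source checks listed above (duplicates, nulls in critical fields, complete date ranges) can be sketched as a small report function. The field names `order_id`, `amount`, and `order_date`, the helper `quality_report`, and the sample rows are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def quality_report(rows, key, required, date_field):
    """Summarize duplicate keys, nulls in required fields, and missing days."""
    key_counts = Counter(row[key] for row in rows)
    duplicates = sorted(k for k, count in key_counts.items() if count > 1)
    nulls = {f: sum(1 for row in rows if row.get(f) is None) for f in required}
    days = {row[date_field] for row in rows if row.get(date_field) is not None}
    missing_days = []
    if days:
        day = min(days)
        while day <= max(days):
            if day not in days:
                missing_days.append(day)  # gap inside the covered date range
            day += timedelta(days=1)
    return {"duplicates": duplicates, "nulls": nulls, "missing_days": missing_days}


# Made-up sample: one duplicated id, one null amount, one day with no data.
orders = [
    {"order_id": 1, "amount": 20.0, "order_date": date(2024, 1, 1)},
    {"order_id": 2, "amount": None, "order_date": date(2024, 1, 2)},
    {"order_id": 2, "amount": 35.0, "order_date": date(2024, 1, 2)},
    {"order_id": 3, "amount": 15.0, "order_date": date(2024, 1, 4)},
]
report = quality_report(orders, key="order_id", required=["amount"], date_field="order_date")
print(report)
```

A report like this does not replace the cross-check against other systems; it only catches problems visible inside the dataset itself.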
Explain with a concrete example from the business itself: 'Users who use this feature have 40% better retention — but that does not mean activating it for everyone will increase retention by 40%. The more engaged users may simply be the ones who naturally use it.' To claim causality, we need a randomized controlled experiment where the feature is assigned randomly. Correlation is a clue about where to look, not a guarantee that the intervention will work. This distinction is fundamental before the business makes investment decisions based on the analysis.
Evaluate each request against three criteria: urgency (is there a decision waiting on this analysis?), impact (how much value can the decision this analysis enables generate?), and estimated effort (how much work does it require?). Communicate priorities to requesters transparently so they can re-assess urgency or simplify the scope. For urgent but low-impact requests, offer a quick, less exhaustive analysis that answers the essential question without full rigor. The analyst's time is scarce; transparent prioritization is more valuable than attempting to satisfy all requests superficially.
Present the results honestly, regardless of expectations. First verify that the analysis is correct by reviewing the methodology and data with a colleague before presenting. When communicating the unexpected result, provide context: what explains the difference from the initial hypothesis, what other variables might be influencing the outcome, and what additional analysis could clarify the situation. An analyst who adjusts their conclusions to confirm stakeholder expectations destroys the value of analysis as a decision-making tool.
A useful dashboard answers specific questions for the people who will use it — it does not showcase all available data. Define the three to five decisions the user must be able to make with the dashboard before designing it. Prioritize metrics that signal whether something requires immediate action over those that are interesting but not actionable. Design with a clear visual hierarchy: overall status first, detail on drill-down. Include reference comparisons (versus prior period, versus target) so numbers have context. A number without context does not enable any decision.
A vanity metric is one that can grow without the business genuinely improving: total site visits without considering quality, total signups without considering active users, or features launched without measuring adoption. Identify them by asking: can this metric grow while the business gets worse? Can the team manipulate it artificially without improving the real outcome? If the answer to either question is yes, it is likely a vanity metric. Actionable metrics are those whose movement implies a real improvement in the value delivered to users or in business outcomes.
Before reporting a spike as a real insight, verify: does it coincide with any known event (marketing campaign, tracking failure, change in metric definition)? Does the spike appear across all data sources or only one? Does it affect all user segments or only a specific one? Are there anomalies in the raw data such as duplicate values or out-of-range timestamps? A real spike should appear consistently across multiple sources and segments. A data artifact typically appears in a single source or a very specific segment with no business explanation.
The mean is the sum of all values divided by the number of observations, which makes it sensitive to extreme values (outliers). The median is the central value when the data is sorted, which makes it robust against outliers. For symmetric distributions without significant outliers, the two are similar and the mean is more informative. For skewed distributions or data with outliers (incomes, system response times, property prices), the median better represents the typical user or case. Example: if the average load time is 2 seconds but the median is 0.8 seconds, a small number of users with very slow connections are inflating the mean without representing the majority's experience.
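The load-time example can be reproduced with a handful of made-up measurements. The values below are hypothetical and chosen so that two very slow loads pull the mean up to 2 seconds while the median stays at 0.8 seconds.

```python
from statistics import mean, median

# Hypothetical page load times in seconds: most are fast, two are very slow.
load_times = [0.6, 0.7, 0.7, 0.8, 0.8, 0.8, 0.9, 1.0, 6.0, 7.7]

print(f"mean   = {mean(load_times):.2f} s")
print(f"median = {median(load_times):.2f} s")
```

Eight of the ten loads finish in one second or less, so the median is the figure that describes the typical experience here.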

Technical questions

Create a cohort table that identifies, for each user, the week of their first event (registration or first relevant action). For each cohort, divide the number of users active in each subsequent week by the total cohort size. Use one CTE to calculate each user's first-use week, another to generate all user-week combinations where the user was active, and finally a JOIN to calculate the retention percentage by cohort and week. Window functions (FIRST_VALUE or MIN with OVER (PARTITION BY user_id)) are useful for identifying each user's entry week. The result is a matrix of cohort vs. week with the retention rate in each cell.
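A minimal sketch of the CTE structure described above, run against an in-memory SQLite database so it can be tried without a warehouse. The `events` table, its columns, and the sample rows are hypothetical. This version uses MIN with GROUP BY rather than a window function to find each user's first week, and the Monday-based week arithmetic (`'weekday 0', '-6 days'`) is SQLite-specific; other engines would typically use DATE_TRUNC.

```python
import sqlite3

RETENTION_SQL = """
WITH first_week AS (            -- each user's cohort: Monday of their first active week
    SELECT user_id,
           date(MIN(event_date), 'weekday 0', '-6 days') AS cohort_week
    FROM events
    GROUP BY user_id
),
activity AS (                   -- every distinct (user, week) with at least one event
    SELECT DISTINCT user_id,
           date(event_date, 'weekday 0', '-6 days') AS active_week
    FROM events
),
cohort_size AS (
    SELECT cohort_week, COUNT(*) AS users
    FROM first_week
    GROUP BY cohort_week
),
joined AS (                     -- weeks elapsed since the cohort week
    SELECT f.cohort_week, a.user_id,
           CAST((julianday(a.active_week) - julianday(f.cohort_week)) / 7 AS INTEGER)
               AS week_number
    FROM first_week f
    JOIN activity a ON a.user_id = f.user_id
)
SELECT j.cohort_week, j.week_number,
       ROUND(100.0 * COUNT(DISTINCT j.user_id) / s.users, 1) AS retention_pct
FROM joined j
JOIN cohort_size s ON s.cohort_week = j.cohort_week
GROUP BY j.cohort_week, j.week_number, s.users
ORDER BY j.cohort_week, j.week_number;
"""


def weekly_retention(conn):
    """Return (cohort_week, week_number, retention_pct) rows."""
    return conn.execute(RETENTION_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("u1", "2024-01-01"), ("u1", "2024-01-09"), ("u1", "2024-01-16"),
        ("u2", "2024-01-03"), ("u2", "2024-01-10"),
        ("u3", "2024-01-05"),
        ("u4", "2024-01-08"), ("u4", "2024-01-15"),
    ],
)
retention_rows = weekly_retention(conn)
for row in retention_rows:
    print(row)
```

Each output row is one cell of the cohort-by-week matrix; pivoting it into a grid is usually left to the BI tool.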
First identify them visually with boxplots and histograms showing the distribution of key variables. For statistical detection, use the IQR method (values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR) for moderately symmetric distributions, or the Z-score for approximately normal distributions. Before treating outliers, investigate their origin: are they data errors (a price of -100 euros), legitimate exceptional events (Black Friday sales), or simply the long tail of the natural distribution? Errors are corrected or removed. Exceptional events can be excluded from regular trend analysis but documented. Natural tail values are generally kept and their effect on interpretation is noted.
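The IQR rule can be sketched in a few lines with the standard library. The helper name `iqr_outliers` and the order values are hypothetical, and the quartiles use the "inclusive" interpolation method, which is one of several common conventions and can shift the fences slightly.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical order values in euros: one likely data error and one very large order.
order_values = [20, 22, 25, 27, 30, 31, 33, 35, 400, -100]
print(iqr_outliers(order_values))
```

The function only flags candidates. Whether -100 is an error and 400 a legitimate exceptional order still has to be decided by investigating each one.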
Verify that the pre-calculated sample size was reached before running the analysis: early stopping is one of the most frequent causes of false positives. Calculate the p-value with the appropriate test for the metric type (chi-squared for conversion rates, t-test for continuous metrics). Verify that statistical power was sufficient (80% or higher) to detect the minimum expected effect. Evaluate practical significance in addition to statistical significance: a p-value of 0.001 with a 0.1% lift may not justify the cost of implementing the change. Review guardrail metrics (metrics that should not move) to detect unintended effects of the treatment.
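For a conversion-rate metric, the chi-squared step can be sketched as a Pearson test on the 2x2 table of converted versus not converted users, without continuity correction. The function name `chi_squared_2x2` and the counts are hypothetical; in practice a statistics library would normally be used instead of this hand-rolled version.

```python
import math


def chi_squared_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-squared test (1 degree of freedom) for two conversion rates."""
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical results: control converts 400 of 4000, treatment 460 of 4000.
chi2, p = chi_squared_2x2(conv_a=400, n_a=4000, conv_b=460, n_b=4000)
lift = 460 / 4000 - 400 / 4000
print(f"chi2={chi2:.2f}, p={p:.4f}, absolute lift={lift:.2%}")
```

Printing the absolute lift next to the p-value keeps practical significance in view: the p-value says the difference is unlikely to be noise, while the lift says whether it is worth acting on.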
Use a window function with RANGE BETWEEN to calculate the rolling total, or a CTE that generates the month series and then joins with the sales data. The cleanest approach: generate a date series with GENERATE_SERIES or a calendar table, join it against the sales events filtered to the 12 months prior to each date, and group by month. Another option is using SUM with OVER (ORDER BY month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) for a sliding 12-month window; because ROWS counts rows rather than dates, this requires exactly one row per month with no gaps. Both approaches are more maintainable and efficient than 12 subqueries joined with UNION ALL.
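A sketch of the sliding-window option, run on an in-memory SQLite database (window functions need SQLite 3.25 or later). The `sales` table and its rows are hypothetical. Because SQLite has no built-in GENERATE_SERIES, a recursive CTE stands in for the calendar table; it also fills March 2023, which has no sales, so that the ROWS window really spans 12 calendar months.

```python
import sqlite3

ROLLING_SQL = """
WITH RECURSIVE bounds AS (
    SELECT date(MIN(sale_date), 'start of month') AS first_month,
           date(MAX(sale_date), 'start of month') AS last_month
    FROM sales
),
calendar(month_start) AS (      -- one row per month, including months with no sales
    SELECT first_month FROM bounds
    UNION ALL
    SELECT date(month_start, '+1 month') FROM calendar, bounds
    WHERE month_start < last_month
),
monthly AS (
    SELECT c.month_start, COALESCE(SUM(s.amount), 0) AS month_total
    FROM calendar c
    LEFT JOIN sales s ON date(s.sale_date, 'start of month') = c.month_start
    GROUP BY c.month_start
)
SELECT month_start,
       month_total,
       SUM(month_total) OVER (
           ORDER BY month_start
           ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
       ) AS rolling_12m
FROM monthly
ORDER BY month_start;
"""

# Made-up data: 500 in January 2023, 100 in every other month, nothing in March 2023.
rows = [("2023-01-15", 500.0)]
rows += [(f"2023-{m:02d}-10", 100.0) for m in range(2, 13) if m != 3]
rows += [("2024-01-10", 100.0), ("2024-02-10", 100.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
rolling = conn.execute(ROLLING_SQL).fetchall()
for row in rolling:
    print(row)
```

The January 2023 spike drops out of the total in January 2024, which is the behavior a rolling 12-month figure is meant to show.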
Define segmentation variables relevant to the business: purchase frequency, monetary value, recency of last purchase (RFM framework), preferred product type, or acquisition channel. For descriptive segmentation, use quintiles per variable to create discrete, manageable groups. For more sophisticated segmentation, apply k-means clustering to the normalized variables. Validate that the segments have actionable differences: it is not enough for them to be statistically distinct; they must justify different marketing, product, or service strategies. Name segments in business terms (champions, at-risk, hibernating) to facilitate adoption by the teams that will act on them.
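The quintile-based RFM approach can be sketched as rank-based 1 to 5 scores per variable followed by simple labeling rules. The customer data, the helpers `quintile_scores` and `label`, and the thresholds that map scores to "champions", "at-risk", and "hibernating" are assumptions for illustration; real cut-offs should be agreed with the teams that will act on the segments.

```python
def quintile_scores(values, higher_is_better=True):
    """Assign each value a 1-5 score by rank-based quintile (5 = best)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        scores[idx] = 1 + (rank * 5) // len(values)
    return scores


def label(r, f, m):
    """Hypothetical business rules on top of the R, F, M scores."""
    if r >= 4 and f >= 4 and m >= 4:
        return "champions"
    if r <= 2 and f >= 3:
        return "at-risk"
    if r <= 2 and f <= 2:
        return "hibernating"
    return "other"


# Made-up customers: (id, days since last purchase, purchases, total spend).
customers = [
    ("A", 3, 20, 2000), ("B", 10, 15, 1500), ("C", 200, 12, 1200),
    ("D", 300, 1, 50), ("E", 40, 5, 400), ("F", 60, 4, 300),
    ("G", 150, 14, 900), ("H", 250, 2, 80), ("I", 20, 8, 700),
    ("J", 90, 3, 150),
]
recency = quintile_scores([c[1] for c in customers], higher_is_better=False)
frequency = quintile_scores([c[2] for c in customers])
monetary = quintile_scores([c[3] for c in customers])

segments = {c[0]: label(r, f, m)
            for c, r, f, m in zip(customers, recency, frequency, monetary)}
print(segments)
```

Recency is scored in reverse because fewer days since the last purchase is better, while higher frequency and spend score higher directly.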
Use a systematic drill-down methodology. First narrow the problem: does the drop affect all acquisition channels or only some? All devices or only mobile? All countries or specific regions? All user types or a specific segment? Each segmentation that reveals a concentrated drop in a subgroup is a clue to the cause. In parallel, review the change timeline: deployments, campaigns launched or paused, price or checkout flow changes, technical changes that may have broken part of the tracking. Cross-reference the timeline changes against the segments where the drop is concentrated to identify the most probable hypothesis before investigating further.
Structure the presentation in three parts: context in 2 minutes (what was the question and why it matters), the main finding in 5 minutes (what was found, with a clear visualization and the confidence level in the result), and the recommendation in 3 minutes (what the business should do and what the expected impact is). Lead with the conclusion, not the methodology: executives need to know what to do, not how the analysis was done. Prepare the methodological detail as an appendix for anyone who requests it. A well-chosen visualization communicates more in 30 seconds than three slides of data tables.
For a simple but robust forecast, decompose the time series into trend, seasonality, and residual. Calculate the trend with a linear regression over historical data. Calculate seasonal indices as the ratio between each month's sales and the average monthly sales of its year, averaged over the last two or three years. The forecast is the trend projection multiplied by the seasonal index for the corresponding month. Add a confidence interval based on the historical variability of the residuals. This method is transparent, explainable to the business, and sufficiently accurate for most cases without the complexity of ML models that require more data and more maintenance.
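A minimal sketch of the trend-times-seasonal-index forecast, using only the standard library. The function names, the synthetic 24-month history, and the 1.96 multiplier (a rough 95% band that assumes approximately normal residuals) are assumptions. The series is assumed to start in January and to contain whole years.

```python
from statistics import mean, pstdev


def fit_linear_trend(series):
    """Ordinary least squares fit of the series against its index 0..n-1."""
    n = len(series)
    t_mean, y_mean = (n - 1) / 2, mean(series)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    return slope, y_mean - slope * t_mean


def seasonal_indices(series, period=12):
    """Per calendar month: average of (month value / mean of its own year)."""
    years = [series[i:i + period] for i in range(0, len(series) - period + 1, period)]
    return [mean(year[m] / mean(year) for year in years) for m in range(period)]


def forecast(series, horizon=12, period=12):
    """Trend projection multiplied by the seasonal index of each future month."""
    slope, intercept = fit_linear_trend(series)
    indices = seasonal_indices(series, period)
    n = len(series)
    return [(intercept + slope * (n + h)) * indices[(n + h) % period]
            for h in range(horizon)]


def residual_std(series, period=12):
    """Spread of (actual - fitted) over the history, used for the interval."""
    slope, intercept = fit_linear_trend(series)
    indices = seasonal_indices(series, period)
    fitted = [(intercept + slope * t) * indices[t % period] for t in range(len(series))]
    return pstdev([y - f for y, f in zip(series, fitted)])


# Synthetic history: upward trend with a December peak (made-up numbers).
pattern = [0.8, 0.8, 0.9, 1.0, 1.0, 1.0, 0.9, 0.9, 1.0, 1.1, 1.2, 1.4]
history = [round((100 + 2 * t) * pattern[t % 12]) for t in range(24)]

points = forecast(history)
margin = 1.96 * residual_std(history)  # rough band, assumes roughly normal residuals
for month, point in enumerate(points, start=1):
    print(f"month {month:2d}: {point:7.1f} (+/- {margin:.1f})")
```

Because the indices are computed on data that still contains the trend, they absorb a little of it; that is an accepted simplification of this method, not a refinement of it.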

Advanced questions

A data culture is built with small, visible wins — not with presentations about the importance of data. Identify a team with a receptive leader and a concrete business problem where data can generate a measurable improvement. Run the analysis, show the result, and follow up on the impact of the data-driven decision. Repeat with another team and another problem. Build self-service tools that allow teams to answer their own questions without waiting for the analyst. Data culture emerges when leaders see their peers making better decisions with data and want the same for their team.
In the early stage, avoid the trap of measuring everything. Focus on retention indicators that demonstrate the product solves a real problem: the cohort retention curve must stabilize and not drop to zero. Ask the Sean Ellis survey question periodically ('How disappointed would you be if you could no longer use the product?') and compare the share of 'very disappointed' answers against the 40% threshold. Collect qualitative NPS with the specific reasons given by promoters and detractors. Track usage frequency among the most active users as a proxy for perceived value. Measuring fewer metrics with high frequency and high fidelity is more useful at this stage than many metrics with low confidence.
Quasi-experimental methods allow for causal impact estimation when no randomized controlled trial (RCT) is available. Difference-in-differences: compare the evolution of a treated group against a comparable control group before and after the intervention. Regression discontinuity: if there is a threshold (e.g., users with more than X days of tenure received the intervention) you can compare users at the edges of that threshold. Propensity score matching: pair treated users with untreated users who are similar on observable variables and compare their outcomes. Each method requires assumptions that must be documented and honestly questioned. Results from these methods must be presented with explicit limitations, not as equivalents to an RCT.
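The difference-in-differences estimate reduces to simple arithmetic on group averages. The function name `diff_in_diff` and the weekly retention rates below are hypothetical, and the estimate is only meaningful under the parallel-trends assumption: without the intervention, both groups would have moved by the same amount.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up weekly retention rates before and after the intervention.
effect = diff_in_diff(
    treated_pre=[0.40, 0.42, 0.41],
    treated_post=[0.47, 0.48, 0.49],
    control_pre=[0.38, 0.39, 0.40],
    control_post=[0.41, 0.42, 0.43],
)
print(f"estimated effect: {effect:+.2%}")
```

The treated group improved by 7 points, but the control group improved by 3 on its own, so only the remaining 4 points are attributed to the intervention.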
Implement a centralized analysis repository with version control (Notion + GitHub, or a platform like Hex or Mode) where every analysis is reproducible and searchable by other analysts. Establish a shared metrics library with standardized definitions to prevent the same KPI from being calculated differently by different analysts. Hold weekly analysis review sessions where analysts present their work to the team for methodological feedback. Use analysis templates that standardize documentation structure: question, methodology, limitations, and findings. Silos produce inconsistent analyses that erode trust when stakeholders get different numbers for the same question.
Interview dashboard users with two questions: what was the last decision you made based on this dashboard? What action would you take if the main metric dropped by 20%? If they cannot answer the first question with a concrete recent example, the dashboard is decorative. If they cannot answer the second, the metrics shown are not actionable. Also audit actual usage: how many users open the dashboard weekly? How long do they spend on it? A dashboard with 2 active users out of the 20 who supposedly need it is failing its purpose. The audit result should produce a simplification plan — not a plan to add more metrics.
Communicate the limitation proactively to the stakeholder before investing time in an analysis that will produce unreliable results. Present the available options: an analysis with existing data that produces an estimate with wide confidence intervals and explicit assumptions, identification of what additional data would reduce the uncertainty and how to obtain it, or the design of an experiment that would generate the needed data in the future. In some cases, the correct answer is 'we do not have the data to answer this question with sufficient confidence to make this decision.' That honesty is more valuable than an analysis that implies greater certainty than the data supports.

Common interview mistakes

Saying 'users who use feature X have 40% better retention, therefore we should force its adoption' is a serious conceptual error that can lead to costly and ineffective product decisions. Interviewers at companies with mature analytical cultures ask direct questions about causality and expect the analyst to know the limits of their inference.
An analysis built on incorrect data produces incorrect conclusions with a veneer of rigor that makes them more dangerous. Interviewers specifically ask how the candidate verifies data quality before using it. Not mentioning this step signals limited experience with real production data, which frequently has quality issues.
Building dashboards is a means, not an end. An analyst who describes their work primarily in terms of the dashboards they built, without being able to articulate what decisions those dashboards enabled, is measuring impact in outputs rather than outcomes. Interviewers at impact-oriented data teams ask what decisions the analysis changed.
An analyst who presents results without being able to explain why they chose that statistical technique, what assumptions it makes, and what the method's limitations are produces analyses that are neither reproducible nor open to challenge. Technical interviewers ask candidates to explain step by step how they would approach a specific analysis.
An analysis that works well for a data scientist may be incomprehensible or irrelevant to a sales director. The ability to adapt communication to the technical level and interests of each audience is a core competency of the role. Interviewers specifically evaluate how the candidate communicates results to non-technical audiences.
Reporting that the conversion rate rose 0.1% with statistical significance without mentioning that this change equals three additional sales per month demonstrates a lack of business judgment. Numbers have context and scale: what is statistically significant is not always practically relevant to the business.