BLOG · LEARNING ROADMAP · 2026

From Business Analyst
to Data Analyst.

A six-stage roadmap I wish I had when I started — what to learn, in what order, and why it matters once you are inside an enterprise data team. Tap any topic to mark it done; your progress is saved on this device.

0 of 22 topics marked as learned

Stages

01 Data Basics

02 Analytics

03 Data Platform

04 Data Modeling

05 ETL & ELT

06 Data Governance

BA → Data Analyst

Learning Roadmap

01Data Basicsunderstand the landscape

✓

OLTP vs OLAP

✓

Structured (CSV, tables)

✓

Semi-structured (JSON, Parquet)

✓

Unstructured (docs, logs)

03Data Platformwhere you practice

✓

Snowflake

✓

Databricks

✓

BigQuery

05ETL & ELThow data moves

✓

dbt

✓

Azure Data Factory

✓

Airflow

02Analyticslearn this first

✓

SQL Fundamentals

✓

Window Functions / CTEs

✓

Python (Pandas)

✓

Statistical Thinking

✓

Power BI / Tableau

04Data Modelingdesign skills

✓

Normalized vs Denormalized

✓

Dimensional Modeling

✓

Star Schema

✓

Snowflake Schema

06Data Governancetrust & compliance

✓

MDM

✓

Data Security & Privacy

✓

Data Quality Framework

01 · DATA BASICS

Data Basics

Before you analyze data, you need to understand what shape it comes in and where it lives. These four concepts appear in nearly every data conversation — knowing them helps you ask the right questions from day one.

OLTP vs OLAP

OLTP (e.g., Salesforce) databases handle live operations — each write is a fixed type at a fixed position. OLAP (analytical) databases store historical data optimized for aggregated queries. Always ask which one you're querying before you optimize anything; the two behave very differently.

Structured (CSV, tables)

Structured data lives in rows and columns — spreadsheets, SQL tables, CSV files. Every field has a fixed type and position. The easiest format to search and work with, and the one analysts do most of their analysis work in.

Semi-structured (JSON, Parquet)

Semi-structured: flexible structure with key-value pairs. JSON (APIs, event logs), Parquet (the data-warehouse pivot format). Has fields and values, but the structure can vary between records — you cannot treat it like a CSV until you parse it.

Unstructured (docs, logs)

Raw: free text, logs, images, PDFs. No fixed schema. Requires significant pre-processing — usually NLP or ML — before it becomes usable for analysis.

↑ Back to map

02 · ANALYTICS

Analytics

This is the core toolkit. SQL is the language of data — nothing else matters as much in day-to-day work. You don't need to be a data engineer, but you do need to be able to answer questions yourself without depending on others for every query.

SQL Fundamentals

SELECT, FROM, WHERE, JOIN, GROUP BY, row aggregates — the core of analytics work. You cannot do this without it. Know it well and don't rely on someone else for every query.

Window Functions / CTEs

Window functions calculate rankings, running totals, and moving averages without collapsing your data. CTEs (WITH clause) break complex queries into readable, sequential steps. Together they replace most subquery spaghetti.

Python (Pandas)

Python specifically for loading and reshaping data. Not for ML pipelines. Pandas puts you one read_csv() away from a full DataFrame so you can quickly inspect it, then pass you to a much stronger position alongside familiar BI tooling.

Statistical Thinking

Mean, median, distribution, correlation. SQL tells you what the data says; statistics helps you understand whether that answer is meaningful. Understanding variation prevents you from drawing too much insight from noise.

Power BI / Tableau

Knowing how to build a dashboard and tell the underlying data story. The goal is decision-making findings, not just charts. Pick one tool and go deep — breadth here matters less than depth.

↑ Back to map

03 · DATA PLATFORM

Data Platform

The cloud warehouses are where you will spend most of your time querying and building. You don't need to administer them — but knowing their architecture helps you write better queries and ask smarter questions of the engineers around you.

Snowflake

A cloud data warehouse purpose-built for analytics. Separates storage and compute so teams can scale independently. The most common place BAs find themselves querying production data in enterprise environments.

Databricks

A Lakehouse platform built on Apache Spark, purpose-built for data engineering and ML pipelines. Increasingly the engine behind enterprise data transformations — if your company has a large data science team, you will encounter it.

BigQuery

Google's serverless analytics warehouse. Pay-per-query pricing and native integration with GCP services. Often the entry point for teams already invested in Google Workspace or using Google Analytics data.

↑ Back to map

04 · DATA MODELING

Data Modeling

Data modeling is how analysts and engineers agree on what the data means before it's built. Getting this wrong makes every downstream report inconsistent. Getting it right makes the warehouse easy to query for years.

Normalized vs Denormalized

Normalization removes data redundancy by splitting tables; denormalization trades storage for query speed. Knowing when to apply which is a core design decision that determines how easy — or painful — your data is to query downstream.

Dimensional Modeling

A technique for organizing data warehouses into fact tables (measurements) and dimension tables (context). It is the standard mental model for analytical databases — understanding this unlocks how real enterprise data is structured and why.

Star Schema

The simplest dimensional model: one fact table in the centre, dimension tables radiating outward. Easy to query, easy to explain to stakeholders, and what most BI tools are optimized to consume.

Snowflake Schema

An extension of the star schema where dimension tables are further normalized. More storage-efficient but more complex to query. Knowing when to choose one over the other separates junior data analysts from senior ones.

↑ Back to map

05 · ETL & ELT

ETL & ELT

Data doesn't arrive analysis-ready. Understanding how it moves — from source systems into the warehouse — helps you trace data quality issues, contribute to pipeline design conversations, and avoid requesting reports built on broken data.

dbt

The de-facto tool for transforming raw data into analytics-ready models, in SQL. Version-controlled data transformations with built-in testing and documentation. If your team isn't using dbt, someone is spending time solving a problem it has already solved.

Azure Data Factory

Microsoft's cloud ETL service for ingesting and moving data at scale. The standard connector between source systems and the warehouse in most enterprise Microsoft stacks. Knowing it means you can trace where data comes from without asking an engineer every time.

Airflow

An orchestration platform that schedules and monitors data pipelines as code. When pipelines grow complex enough that "run this after that finishes" needs to be reliable and auditable, Airflow is the standard choice.

↑ Back to map

06 · DATA GOVERNANCE

Data Governance

Trust is slow to build and fast to lose. Governance is the set of practices that keeps data trustworthy — who can access it, what it means, and how good it is. In enterprise environments, this is non-negotiable from day one.

MDM

Master Data Management — the practice of creating a single, authoritative record for key business entities (customers, products, accounts). Without MDM, "customer count" returns a different number in every system, and every report becomes a negotiation.

Data Security & Privacy

Defining who can see what, how long sensitive data is retained, and how it is classified. GDPR, CCPA, and internal compliance policies all fall here. Non-negotiable in any regulated industry — and increasingly standard everywhere else.

Data Quality Framework

The systematic process for measuring, monitoring, and improving data quality — completeness, accuracy, consistency, timeliness. Bad data is the most common reason analytics projects lose stakeholder trust long before the insight is ever acted on.

↑ Back to map

From Business Analystto Data Analyst.

Data Basics

Analytics

Data Platform

Data Modeling

ETL & ELT

Data Governance

From Business Analyst
to Data Analyst.