Top 42 SQL Interview Questions and Answers (2025) | TutorialsJet

SQL Interview Questions and Answers (2025)

Expanded, theory-first explanations with examples, performance notes, and gotchas — crafted for real interviews and search intent.

1)Basics & SQL Semantic

1) What is SQL and how does it differ from a database?

Theory: SQL (Structured Query Language) is a declarative language for expressing what data you want, not how to get it. The database engine (e.g., PostgreSQL, MySQL, SQL Server) parses your query, builds a logical plan, and chooses a physical execution plan using statistics and indexes. SQL is the interface; the DBMS is the software that stores data, enforces constraints, and executes SQL.

Why it matters: Interviewers want to see that you understand declarative vs. imperative paradigms and rely on the optimizer instead of micro-managing row-by-row logic.

SELECT name, email
FROM users
WHERE status = ‘active’;

2) DDL vs DML vs DCL vs TCL

Theory: SQL is grouped by intent: DDL (data definition) changes schema; DML (data manipulation) reads/writes data; DCL controls permissions; TCL manages transaction boundaries and recovery. Separating these concerns enables robust governance and performance tuning.

— DDL
CREATE TABLE accounts(id BIGINT PRIMARY KEY, balance NUMERIC NOT NULL);
— DML
UPDATE accounts SET balance = balance – 100 WHERE id = 1;
— DCL (engine-specific)
GRANT SELECT ON accounts TO analyst;
— TCL
COMMIT; — make changes durable

3) Primary key vs Unique key

Theory: A primary key uniquely identifies each row and is the official row identity; most systems enforce NOT NULL. A unique constraint enforces uniqueness without implying identity semantics and may allow a single NULL depending on the DB. Many designs use surrogate PKs and separate UNIQUE constraints for natural business keys.

Pitfall: Relying on a natural key that changes (e.g., email) causes cascading updates. Prefer stable surrogate keys plus a UNIQUE constraint on the natural key.

4) Foreign keys & referential integrity

Theory: Foreign keys guarantee that a child row references an existing parent row. This prevents orphan records and preserves data correctness across tables. Engines enforce FKs using indexes and constraint checks on INSERT/UPDATE/DELETE.

ALTER TABLE orders
ADD CONSTRAINT fk_orders_user
FOREIGN KEY (user_id) REFERENCES users(id);

5) NULL semantics

Theory: NULL represents unknown/absence. Comparisons with NULL yield UNKNOWN under three-valued logic. Use IS NULL/IS NOT NULL for predicates and COALESCE to replace NULLs when aggregating or presenting data.

SELECT COALESCE(middle_name,”) AS middle_name FROM people;

Pitfall: NOT IN (subquery) returns empty when the subquery yields any NULL; use NOT EXISTS instead.

6) Views vs Materialized Views

Theory: A view is a stored query evaluated on demand; it simplifies permissions and logic sharing. A materialized view stores the result set and must be refreshed. Use materialization for costly aggregations and report queries when data freshness requirements permit.

7) Normalization vs Denormalization

Theory: Normalization (1NF→3NF/BCNF) reduces redundancy and anomalies in OLTP schemas, improving write integrity and reducing duplication. Denormalization intentionally introduces redundancy (pre-joins, rollups) to accelerate reads in analytics or heavy read workloads. Many systems use normalized cores with denormalized read models (e.g., star schemas, caches).

2) Joins & Set Operations

8) INNER vs OUTER joins

Theory: An INNER JOIN returns only matching rows; LEFT/RIGHT OUTER JOIN preserves all rows from one side, filling unmatched columns with NULLs; FULL OUTER preserves all rows on both sides. Optimizers choose among nested-loop, hash, or merge joins based on available indexes and estimated row counts.

SELECT o.id, u.name
FROM orders o
LEFT JOIN users u ON u.id = o.user_id;

Pitfall: Filters on the right table after a LEFT JOIN can turn it effectively into an INNER JOIN. Move such filters into the ON clause if you need preservation.

9) UNION vs UNION ALL

Theory: UNION removes duplicates by sorting or hashing, which is costlier; UNION ALL concatenates results and is usually faster. Use UNION only when semantic uniqueness is required.

10) Anti-joins: NOT EXISTS vs NOT IN vs LEFT JOIN … IS NULL

Theory: NOT EXISTS is NULL-safe and stops at first match; NOT IN fails when the subquery yields NULL; LEFT JOIN ... IS NULL is workable but may miscount if the right side duplicates keys. For correctness and performance, prefer NOT EXISTS with proper indexing.

SELECT u.*
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

3) Aggregations & Grouping

11) COUNT(*) vs COUNT(column)

Theory: COUNT(*) counts rows regardless of NULLs; COUNT(column) excludes NULLs. Be explicit about intent when measuring sparsity or completeness.

12) DISTINCT vs GROUP BY

Theory: Both can deduplicate, but GROUP BY adds aggregates. Use DISTINCT for uniqueness checks; use GROUP BY when computing metrics per combination.

13) GROUPING SETS / ROLLUP / CUBE

Theory: These operators produce multiple aggregation levels in one pass (totals, subtotals). They reduce query count and often leverage the same scan with different grouping keys.

14) HAVING usage and pitfalls

Theory: HAVING filters after grouping. Don’t try to use aggregates in WHERE. Use HAVING for constraints on aggregate results (e.g., groups with total > 1000).

15) Top-N per group (window approach)

Theory: Window functions let you rank within a partition (per group) while preserving row granularity. Filter the ranked set for the top N. This avoids correlated subqueries and is widely supported.

WITH ranked AS (
SELECT c, p, score,
ROW_NUMBER() OVER (PARTITION BY c ORDER BY score DESC) rn
FROM items
)
SELECT * FROM ranked WHERE rn <= 3; — top 3 per c

4) Subqueries, CTEs, EXISTS

16) CTE vs subquery: readability vs materializationa

Theory: Most optimizers inline non-recursive CTEs like subqueries. Some engines may materialize CTEs (one-time temp results), which can either help (re-use) or hurt (extra I/O). Choose CTEs for clarity, reuse, and recursion; profile critical queries.

17) EXISTS vs INa

Theory: EXISTS tests existence and stops early; IN tests membership and is great for small lists. For large or correlated sets, EXISTS is usually more scalable.

18) Recursive CTEs for hierarchies

Theory: Recursion lets you walk parent/child graphs (org charts, folders). Safeguard with depth limits to prevent runaway recursion on cycles.

WITH RECURSIVE org AS (
SELECT id, manager_id, 0 AS depth FROM employees WHERE id = 1
UNION ALL
SELECT e.id, e.manager_id, o.depth+1
FROM employees e JOIN org o ON e.manager_id = o.id
)
SELECT * FROM org;

5) Window Functions

19) What are window functions?

Theory: Window functions compute analytics across a set of related rows without collapsing them (e.g., running totals, ranks, moving averages). The PARTITION BY defines the peer group; ORDER BY defines sequence; frames (e.g., ROWS BETWEEN) define which neighbors contribute.

20) ROW_NUMBER vs RANK vs DENSE_RANK

Theory: ROW_NUMBER assigns a unique ordinal; RANK gives ties the same rank and leaves gaps; DENSE_RANK gives ties the same rank with no gaps. Choose based on how you want to treat ties in leaderboards or dedup tasks.

21) Running totals and frames

Theory: Use cumulative frames for monotonic metrics. For time-series, prefer RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW or explicit ROWS frames depending on duplicates and sorting stability.

SELECT tdate, amount,
SUM(amount) OVER (
ORDER BY tdate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM tx;

6) Date/Time & Strings

22) Sargable date filters

Theory: To allow index seeks, compare the column to a constant boundary rather than wrapping the column in a function. Most engines cannot use an index if you apply a function to the indexed column in the predicate.

— PostgreSQL example
SELECT *
FROM orders
WHERE order_date >= CURRENT_DATE – INTERVAL ‘7 days’;

23) Time zones & storage strategy

Theory: Store timestamps in UTC; convert at the edge (application/UI). Use time zone-aware types where available to avoid ambiguous local times around DST transitions.

24) Full-text vs LIKE

Theory: Prefix wildcards (%term%) defeat B-tree indexes. Use built-in full-text search or trigram/GIN indexes for scalable text search and ranking.

7) Indexing & Sargability

25) How indexes work

Theory: Most relational engines use B-tree variants for general indexes. They maintain sorted keys pointing to rows/pages, enabling logarithmic lookups and ordered scans. Indexes speed reads but add write overhead (maintaining structures on INSERT/UPDATE/DELETE).

26) Clustered vs nonclustered (SQL Server)

Theory: A clustered index defines the physical order of table data; there can be only one. Nonclustered indexes store keys plus a pointer (RID or clustered key). Covering nonclustered indexes that include select-list columns can avoid base lookups.

27) Composite index order

Theory: Put highly selective, frequently filtered columns first and align with the query’s WHERE and ORDER BY. The leftmost prefix rule determines which predicates are sargable.

28) Sargability killers

Theory: Wrapping columns in functions, performing arithmetic on the column, data type mismatches (implicit casts), and leading wildcards prevent index seeks. Rewrite predicates to keep the indexed columns on the left side of comparisons.

29) Keyset vs OFFSET pagination

Theory: OFFSET ... LIMIT makes the engine count/skip rows repeatedly, degrading for deep pages. Keyset pagination uses the last seen key to seek the next page, staying fast and stable.

SELECT *
FROM posts
WHERE (created_at, id) > (:last_created_at, :last_id)
ORDER BY created_at, id
LIMIT 50;

8) Transactions & Concurrency

30) ACID guarantees

Theory: Atomicity ensures all-or-nothing; Consistency preserves invariants; Isolation makes concurrent transactions appear serial; Durability guarantees committed data survives crashes (via WAL/redo logs and checkpoints).

31) Isolation levels & anomalies

Theory: Read Uncommitted allows dirty reads; Read Committed prevents dirty reads; Repeatable Read prevents non-repeatable reads; Serializable prevents phantoms. MVCC-based engines implement snapshots to reduce blocking.

32) Deadlocks and avoidance

Theory: Deadlocks are cycles of waits. Engines detect them and abort one transaction. Avoid by acquiring resources in a consistent order, using narrower transactions, and ensuring selective indexes to reduce lock footprints.

9) Modeling & Normalization

33) 1NF, 2NF, 3NF in practice

Theory: 1NF: atomic columns and no repeating groups. 2NF: every non-key attribute depends on the whole key (no partial dependency on a composite key). 3NF: no transitive dependency (non-key attributes shouldn’t depend on other non-key attributes). This minimizes update anomalies.

34) Surrogate vs natural keys

Theory: Surrogates (IDENTITY/UUID) are stable identifiers; natural keys derive from business attributes (e.g., email) and may change. Use surrogates for relationships; enforce natural uniqueness with a UNIQUE constraint.

35) Cascading deletes & soft deletes

Theory: Cascades keep integrity but can lock many rows in deep graphs. Soft deletes (deleted_at) preserve history and simplify recovery, at the cost of filtering everywhere.

10) Advanced Scenarios

36) Gaps & islands

Theory: Identify consecutive sequences by comparing row numbers to a monotonic series. All rows with the same difference belong to one island. Useful for sessionization and streaks.

WITH s AS (
SELECT val, dt, ROW_NUMBER() OVER (ORDER BY dt) rn
FROM events
), g AS (
SELECT *, val – rn AS grp FROM s
)
SELECT MIN(dt) AS start_dt, MAX(dt) AS end_dt, COUNT(*) AS cnt
FROM g GROUP BY grp;

37) UPSERT patterns

Theory: Upserts ensure idempotent writes under concurrency. Prefer single-statement constructs (e.g., PostgreSQL ON CONFLICT, MySQL ON DUPLICATE KEY UPDATE), or guarded update-then-insert with proper isolation/locking.

38) Partitioning

Theory: Horizontal partitioning splits large tables by a key (often time). Benefits include partition pruning, faster maintenance (index rebuilds, vacuum), and lifecycle policies (cold storage). Choose keys aligned with common filters.

39) JSON in relational databases

Theory: Semi-structured JSON can live alongside normalized tables. Use JSON types and indexes (e.g., PostgreSQL GIN) for flexibility; keep critical constraints in relational columns for integrity and performance.

11) DB-specific Nuances

40) PostgreSQL DISTINCT ON vs window

Theory: DISTINCT ON (key) returns the first row per key according to ORDER BY. It’s concise but non-standard; the portable pattern is ROW_NUMBER() + filter.

41) MySQL InnoDB vs MyISAM

Theory: InnoDB is ACID-compliant with row-level locking and foreign keys; MyISAM lacks transactions and uses table locks. Prefer InnoDB for almost all modern workloads.

42) SQL Server NVARCHAR vs VARCHAR

Theory: NVARCHAR stores Unicode (UTF-16) supporting international text; VARCHAR stores non-Unicode. Choose NVARCHAR when you need multilingual support despite higher storage.

Crack Top Tech Interviews

Access 100+ Questions & Real-

World Diagrams

Interviews aren’t just about answers—they’re about

clarity, confidence, and the ability to map real-world problems.

Our Premium Tech Resources bundle gives you 100+ curated questions,

architecture diagrams, and a mentorship session tailored to your goals.

Crack Top Tech Interviews

Thanks for exploring TutorialsJet — where automation meets clarity.

TutorialsJet

Quick Links

Contact Us