MSSQL Interview Questions

MSSQL:

Q) What are new features of SQL Server 2012/2014/2016/2017/2019/2022?

A) SQL Server 2012 introduced Always On which includes availability groups (a clustered group of databases in a server), Contained Databases with their own separate usernames and passwords making it easier to migrate between servers, ColumnStore indexes for better read-only performance (they are like sybase iq DWH indexes), and also enhancements to the ShowPlan output

SQL Server 2014 introduced in memory OLTP where you can place suitable tables and stored procedures into memory, giving up to 30x better performance. It also has much more integration with Azure for backups and availability groups etc. ColumnStore indexes are now updateable. SSD Buffer Pool extensions can improve performance dramatically. Incremental statistics only update for new and changed rows then merge with existing stats, much faster. Delayed Durability offers a more relaxed logging option which speeds up transactions but at the cost of recoverability.

SQL Server 2016 introduced Query Store which maintains a history of query execution plans with their performance data. Polybase, a SQL Server connector to Hadoop. Stretch Database where some part of your tables (configurable or automated) will be moved into an Azure SQL Database in the cloud. Java Script Object Notation (JSON) support built in. Row Level Security in tables. Always Encrypted is new functionality through the use of an enhanced client library at the application so the data stays encrypted in transit, at rest and while it is alive in the database. Enhancements to In Memory functionality. Temporal tables, also named system-versioned tables, allow SQL Server to automatically keep history of the data in the table

SQL Server 2017 introduced the ability to run it on Linux and Docker containers in addition to Windows. It has Graph Database Capabilities which allows you to more efficiently store and query complex relationships between data. Phyton support built into the database engine. Resumable Online Index Rebuilds, useful for large index rebuilds where you have a small maint window. New CLR security model. Adaptive Query processing to improve Query processing over time by adjusting query plans where necessary. Scaling out Integration Services to allow you to run a package across multiple machines.

SQL Server 2019 introduced Intelligent Query Processing enhancements, this includes dynamic memory grants for rowstore tables, table variable deferred compilation, batch mode on rowstore and more. Accelerated Database Recovery dramatically decreases the time needed to bring a database online on startup/recovery. Enhancements to Always On Encryption with Secure Enclaves. Memory-Optimized Tempdb metadata which puts some key tempdb metadata into memory and optimizes it. Polybase expanded to support Oracle, Teradata, MongoDb and more. There are also multiple other Internal Performance Improvements like Index optimization, temp table recompilation, checkpoint scalability etc.

SQ Server 2022 introduces Azure Synapse Link which allows you to gain real-time insights without performance degradation of the SQL database. You can Link/Pair with Azure Managed Instances which gives the best of on-premise and Cloud Servers including simplified DR solutions. There is also Parameter Sensitive Plan Optimization (PSPO) that optimizes the performance of SQL queries with parameters. Microsoft Purview integration will helps you understand, discover and manage data from various sources. Ledger, when this feature is enabled an old version of your tables will be saved when changes are made, secured using Blockchain. Security Improvements; Microsoft Defender for Cloud integration, Microsoft Purview integration, Ledger, Azure Active Directory Authentication, Always Encrypted with secure enclaves, Access Control, Dynamic data masking, Support for PFX certificates & MS-TDS 8.0 Protocol


Q) If someone has a problem with a spid, how would you investigate it?

A) Use the activity monitor, and perhaps dbcc inputbuffer, dbcc opentran to show the longest running transaction. You can also start a trace on that spid directly from the activity monitor.


Q) How do you change the default location of a database?

A) You can specify the default location for data and log files in the server properties, database settings.


Q) What are the different types of SQL Authentication?

A) There is SQL Server Authentication and Windows Authentication. Can possibly also use other methods like LDAP via 3rd party tools and with Azure you have username and password, multi-factor authentication and federated authentication via Azure AD.


Q) What is a dbcc and why would you use it

A) DBCC stands for Database Console Commands and are grouped into Maintenance tasks, Informational, Miscellaneous and Validation operations.  The most familiar is dbcc checkdb to check consistency but there is also dbcc showcontig to check fragmentation of a table and dbcc show_statistics to show statistics amongst many others.


Q) What is the difference between dbcc reindex and dbcc index (defrag)?

A) dbcc reindex is an offline procedure, dbcc index (defrag) is performed online.


Q) What is the difference between rebuilding an index and reorging an index

A) Reorg only changes the leaf level of the index, much quicker than rebuilding the whole index?


Q) How would you check the statistics for a table?

A) Run dbcc show_statistics against it.


Q) What are statistics and why are they so important?

A) Statistics, Index and column, are used by the optimiser to choose the optimal execution plan for a query.


Q) What kind of information do the statistics contain?

A) Statistics contain information on the cardinality and data distribution of data in a table or index.


Q) What is the UPDATE_STATISTICS command for?

A) UPDATE STATISTICS updates the query optimization statistics on a table or indexed view. This should be handled automatically by the server but sometimes you need to run it manually to improve the query plans being produced, especially on busy tables with lots of changes to the data.


Q) What things would you configure on the analysis server, high level?

A) Increase minimum allocated memory, increase read-ahead buffer size. Increase Process buffer size, enable Proactive caching, maybe increase maximum number of threads.


Q) There are a few options which MSSQL server lets you set which should not be set on a production server?

A) You should NOT enable Auto Shrink or Auto Close and a production database should typically be set to full or bulk logged recovery model NOT simple.


Q) What services can be clustered in SQL Server 2008?

A) SQL Server, SQL Server agent, Analysis services and Reporting services.


Q) What resources do you set up against a SQL server resource in a cluster?

A) SQL IP Address, SQL Network Name, SQL Server, SQL Server Agent, Disk Resources and maybe Full Text and Analysis Services if required.


Q) How many IP addresses would a 2-node cluster with 1 SQL Server instance need?

A) For an Active/Passive cluster you would need 7:
1 IP for Windows Cluster, 1 IP for SQL Server Virtual Network Name, 1 IP for MSDTC (this is optional), 2 IP’s for public network (1 per node), 2 IP’s for heartbeat network (1 per node)


Q) The SQL Server service should be configured with which startup type if in a Cluster?

A) Manual


Q) What is the difference between an active/active and active/passive cluster?

A) In active/active both nodes have instances on them in active/passive one node is in an idle standby state.


Q) What resources should NOT have “affect group” set in the cluster?

A) The only one which should not have “affect group” set is SQL Server Agent.


Q) What is the difference between an offline group operation and a move group operation?

A) An offline group operation takes an online group offline whilst a move group operation moves a group and all its resources from one node to another.


Q) If you offline the group and reboot the node what happens to the group when the box comes back up again?

A) It will remain offline until it is explicitly put it online again.


Q) Explain the difference between ISAlive and LooksAlive and where can it be configured?

A) LooksAlive checks that the SQL Server service is up and running. If it fails then IsAlive is run which actually logs into the sql server, in case that fails the cluster will first try to reboot on the same node, if that doesn’t work it will fail the SQL server over to the next node. The polling intervals etc can be adjusted/configured in the Cluster service properties.


Q) How do you start a SQL Server from the command prompt in single user mode?

A) To start the default instance it’s sqlserver.exe –m  for a named server it is sqlserver.exe –m –s <servername>.


Q) What is the purpose of the SQL Agent

A) It controls Jobs and Alerts.


Q) How can you set the memory of the server to not go above 3GB?

A) In server properties set the maximum server memory to 3GB.


Q) How would you move the tempdb?

A) Execute the following 2 commands: alter database tempdb modify file (name=tempdev|templog, filename=’c:\Sqldata\tempdb.mdf|templog.ldf’), then restart the server and the tempdb will be created on the new devices in the new location, you can then remove the old devices.


Q) How would you optimise tempdb?

A) You should ensure there are half as many tempdb data files as you have cpu cores, this cuts down on Allocation contention. Another option is to place the tempdb onto an SSD. Can also disable auto update stats and auto create stats but need to be careful doing this.


Q) What is the default SQL Server port number?

A)  1433


Q) In SQL Server 2005 onwards what is port 1434 used for?

A) The SQL Server browser uses, by default, port 1434 to listen for incoming connections to named services which use dynamic ports. It is also used for the DAC connection.


Q) What is DAC and what port does it use?

A) It is a special Diagnostic connection for administrators when standard connections to the server are not possible. You can connect to it by prefixing admin to the instance name in SSMS or by using the –A switch for sqlcmd. DAC listens for connections on port 1434.


Q) Are there any restrictions on using DAC?

A) Yes, you can only have one DAC connection to a server at a time and you can’t execute backup and restore commands.


Q) How would you find the port that DAC is listening in on?

A) You can find this information in the SQL Server log, “Dedicated admin connection support was established for listening locally on port ….”


Q) How do you change the static port number for a named instance?

A) In SSCM You need to set the TCP Port under “IP All” to the required port value AND remove any entries for TCP Dynamic Ports.


Q) Where is logging information kept within sql server?

A) You can access the logs from the SQL Server management Studio, both SQL Server, System and SQL Server Agent logs can be viewed but the actual physical logs are on the file system.


Q) Describe what an index is and how it works?

A) Indexes speed up the querying process by providing swift access to rows in the data tables. A Database index works in much the same way as a Book’s Index. If you want to find a particular subject/word then you go to the back of the book, check the index for that word and then go to the page indicated. It’s exactly the same in a database, you would go to the index search for the item and then go to the data page indicated.  A clustered index goes a bit further than this as it forms part of the actual table, in terms of a book you can think of it as being the chapter, headings and subheadings of a book with the pages being the actual leaf level of the index.


Q) What are the differences between Clustered and Non-Clustered indexes?

A) A clustered index defines the order in which data is physically stored in a table, it forms and orders the table where the leaf level is the actual data page. Table data can be sorted in only one way, therefore, there can be only one clustered index per table.
A Non-clustered index is separate from the table and just points to the relevant pages within the table, therefor you can have multiple non-clustered indexes for one table.


Q) What types of indexes can you have?

A) The main ones are non-clustered, clustered, Unique and ColumnStore. Can also have index with included columns, index on computed columns, filtered, spatial, xml and Full-text. Of course you don’t have to have an index at all, if you don’t then the table is referred to as a heap table.


Q) Can you create indexes on views?

A) Yes


Q) Which is better an INDEX SEEK or an INDEX SCAN?

A) An Index Seek is better especially for larger tables, an Index Scan is like a full table scan, but only on the columns in the index so it’s a bit better than a full table scan.


Q) What is a covering index?

A) A Covering index is an index that contains all the columns you need for a query.


Q) In a query what would be the difference between columns which are part of a covering index and “included columns”?

A) If the column is not needed in the WHERE/JOIN/GROUP BY/ORDER BY, but only in the column list of the select statement then you can add it as an included column, this results in a smaller and more efficient index.


Q) What index is the best choice for a date range search?

A) Clustered Index.


Q) How would you display fragmentation information for a specific table?

A) Use dbcc showcontig command against the table or index.


Q) How would you display fragmentation information, other than with dbcc showcontig?

A) The newer method is to use the dmv sys.dm_db_index_physical_stats


Q) What is the difference between internal and external table fragmentation?

A) Internal fragmentation is unwanted free space on a page caused by deletes/updates/inserts. External fragmentation is when the logical order of index pages varies from their physical organization in the data file, caused by page splits with pages linking back and forth between extents.


Q) What is the difference between fragmentation in a traditional index compared to a columnstore index?

A) A traditional b-tree index is considered fragmented if the physical order of pages is out of sync with the key order or if the data or index pages are only partially filled. A columnstore index is considered fragmented if more than 10% of the rows are marked as deleted.


Q) What is segment elimination as related to columnstore indexes?

A) Each partition in a columnstore index is broken into one million row chunks called segments. Each segment has metadata that stores the minimum and maximum value of each column for the segment. The storage engine checks filter conditions against the metadata. If it can detect that no rows will qualify then it skips the entire segment without even reading it from disk which means much less IO required in searches.


Q) What command would you use to check the integrity of a database, what option to check the physical allocation only?

A) You would use dbcc checkdb and if you only want to check the physical allocation then you’d add the Physical_only option, i.e dbcc checkdb (‘dbname’) with physical_only.


Q) What is dbcc checkdb actually checking in terms of the data etc?

A) It runs the following:
DBCC CHECKALLOC Checks consistency of disk space allocation structures
DBCC CHECKTABLE on every table and view in the database – Checks the integrity of all the pages and structures that make up the table or indexed view
DBCC CHECKCATALOG on the database – Checks for system catalog (a set of tables that store metadata about objects in the database) consistency within the database.
Validates the contents of every indexed view in the database


Q) What is locking/blocking and how would you get information on locks?

A) Locking/Blocking is the process of ensuring data consistency in a database, so for example as one process is updating a row of data it will not allow another process to update it at the same time. You can get information on locks by using the sp_lock stored proc, querying the dm_tran_locks DMV and for more detail by running SQL Profiler.


Q) What is blocking and how would you find information on it?

A) Blocking is when a request tries to access pages/rows which are locked by another request, a good dmv for spotting this is sys.dm_os_waiting_tasks which returns information about the wait queue of tasks waiting on some resource.


Q) What are traceflags and when would you use them?

A) Trace flags are used to set specific server characteristics. For example it can be used to print out extra information about deadlocks with traceflags 1204 and 1222. They are either set at startup with the –T option or can be explicitly started with dbcc traceon(<traceflag number>).


Q) What are dynamic management views?

A) DMV’s were introduced in SQL Server 2005 and provide a simple way for assessing the state of a server and gathering diagnostic information without placing to much overhead on the SQL Server.


Q) Name 3 useful dmv’s

A) dm_exec_query_stats, dm_exec_session_wait_stats, dm_db_session_space_usage


Q) Which DMV can give you a query plan?

A) There are many different categories of Dynamic Management Views, they are;

              Dm_exec, dm_os, dm_tran, dm_io, dm_db, dm_repl, dm_broker, dm_fts.

These are then selected from using a cross apply with various table functions like dm_exec_sql_text to get for example a list of the costliest stored procs and transactions running on the server.

The specific dmv used to view query plans is dm_exec_query_stats joined with the table function dm_exec_query_plan.


Q) What is the new system database in SQL Server 2005 and what is it used for?

A) It is called resource and holds all the system objects deployed with MS SQL server. The system objects include stored procs and system views which logically appear in each database. These objects were moved to a central location to make any upgrade process easier.


Q) What impact does the resource database have in terms of DBA support?

A) It is crucial as all system views and procedures are stored in it.


Q) How would you restore the system resource database?

A) You would need to restore/copy the database files for the Resource database to the location of the master database files.


Q) What are SOX systems? (relating to big environments and very secure systems which need special regulation and control)

A) The Sarbanes-Oxley Act (SOX) of 2002 sets new requirements for the integrity of the source data relating to financial transactions and reporting. To prove the integrity of data, companies must extend audit processes to the financial information stored within corporate databases. To verify regulatory compliance, auditors look at multiple aspects of a database environment including: user management, authentication, separation of duties, access control, and audit trail.


Q) What is the Distributed Transaction Co-ordinator (DTC) used for.

A) The DTC manages transactions across a network of Windows based systems. MS DTC allows:

– Distributed transaction support for Microsoft SQL Server.

– Updated data residing on two or more Microsoft SQL Server systems.

– It’s also a requirement for clustering to work.


Q) The log file just keeps on growing, why is that?

A) That would either be because you have a run-away transaction that would need to be identified and killed or more likely the database is set to full recovery mode but there are no jobs or maintenance plans set up for backing up and truncating the transaction log.


Q) Steps to optimise a badly running SQL Server query?

A) You need to identify the query and extract the query plan for it with sp_showplan (spid) or by running a trace against the spid. You could also use dm_exec_query_stats together with the table function dm_exec_query_plan.

In most cases adding an extra index will help but sometimes you will need to completely rewrite the query to perhaps use temporary tables etc for sorting.


Q) What operators on a query plan would alert you to possible performance issue

A) Table scans, Sorts and hashes are the most expensive operators, also any big mismatch between expected and actual row counts would indicate poor and out of date statistics which can be fixed by running an update stats or even creating new statistics.


Q) You get a complaint that the database is slow today. What are the first things you do?

A) The first thing to do is to log into the SQL Server and then open Activity monitor to see what is currently running on the SQL Server. Specifically, it’s a good idea to check if there is any locking or blocking occurring with sp_lock. I would also look at the overall load on the SQL Server and examine the underlying Windows server for any issues. Another useful thing is to examine if any indexes have been dropped recently or if there have been any other changes to data structures or data volumes. Fragmentation could also be an issue.


Q) How would you find the longest running queries with SQL Server Profiler?

A) In SQL Profiler you would run a trace using the template SQLProfilerTSQL_Duration and look at the durations in the output. You could also use dmv’s, dm_exec_query_stats joined with the table functions  dm_exec_sql_text and dm_exec_query_plan if you also want to see the query plans being produced.  Or you can use the SSMS report on top queries.


Q) What is a primary key and a foreign key?

A) A Primary key uniquely identifies each record in a table, it must contain unique values and cannot contain NULL values. A Foreign Key in one table points to a Primary key in another table, it’s used as a constraint to protect referential integrity, in other words, only values that are supposed to appear in the database/tables are permitted.


Q) Can Unique keys store NULL values?

A) Yes they can which is one of the main ways they differ from primary keys.


Q) If you want to remove a table definition and its data, what command should you use?

A) DROP table


Q) What is the difference between the TRUNCATE and DELETE commands?

A) TRUNCATE is faster and uses fewer system and transaction log resources than DELETE. TRUNCATE removes the data by deallocating the data pages used to store the table’s data, and only the page deallocations are recorded in the transaction log. DELETE removes rows one at a time and records an entry in the transaction log for each deleted row.


Q) Which utility or tool is best at identifying long running transactions?

A) SQL Profiler or Extended Events.


Q) What are the pros and cons between Profiler and Extended Events?

A) The pro with profiler is the familiarity, but in all ways Extended events is better. Extended events places much less load on the server being examined and Extended Events offers more monitoring options, it has almost 8 times as many available events as Profiler.


Q) Which utility or tool is best for assessing disk queue length?

A) Performance Monitor (Perfmon).


Q) What counters would you use in perfmon to monitor disk performance?

A) The 2 best ones are Physical Disk-Avg Disc sec/Read and  Physical Disk-Avg Disc sec/Write which should both be below 0.01.


Q) What are the key metrics you would use for monitoring performance on SQL Server

A) The key metrics are Transact SQL (T-SQL) metrics, Batch requests per second, SQL compilations per second, Last elapsed time, Buffer cache metrics, Buffer cache hit ratio, Checkpoint pages per second and Page life expectancy. These together with Wait Stats will point you towards specific areas for further examination e.g. inadequate memory, slow Disks or insufficient CPU etc


Q) What is the Size of a SQL Server page and how many pages are in an extent?

A) Each page is 8KB, and each extent has 8 pages so 64KB in total.


Q) What percentage of the page is filled by the default server-wide fillfactor value?

A) 100%


Q) Why can a fill factor of 100% be bad, what happens with page splits etc?

A) If an existing row is updated and its length is increased, then it may not fit on to its original page and a page split occurs. In a page split, half the rows remain on the original page and the rest are moved to a new page. We now have two pages that are just over 50% full, with the result that we need additional I-O and memory to get the data from these pages. However, if you set it lower then you may still eventually get page splits after a lot of updates.
A fill factor of 100 is best for tables that have a cluster index based on a primary key that uses an Identity value. However, for tables based on other columns, especially tables with a primary key that is a GUID, then a different fill factor can be good.
In any case its important to check fragmentation of a table/index and perform reorg’s and index rebuilds on a regular basis to remove page splits.


Q) What trace flags should you activate to capture deadlock information in the SQL Server 2005 or later error log?

A) 1204, 1222


Q) How would you troubleshoot deadlocks in a SQL Server

A) Can use Traceflags 1204 and 1222, but the better approach is to use Extended events to capture the deadlocks which can then be analysed, and hopefully this will lead to the queries being optimised to avoid the issue going forwards.


Q) What is the difference between a deadlock and a livelock and where might a livelock occur in SQL Server?

A) A deadlock occurs when 2 or more tasks permanently block each other by each task having a lock on a resource which the other task is trying to lock. SQL Server will kill one of the tasks to free the other one(s). A Livelock is where a request for an exclusive lock is denied continuously because of a series of overlapping shared locks which keep on interfering with each other and changing state which stops them from finishing, SQL Server automatically detects the situation after four denials and refuses further shared locks. You might see Livelocks if you use partitioning.


Q) Which DMV (Dynamic Management View) returns aggregate performance statistics for cached query plans?

A) sys.dm_exec_query_stats


Q) Which system database stores SQL logins?

A) master


Q) Which system database stores SQL Server jobs?

A) msdb, BTW the msdb database is also used for the SQL server agent, service Broker, Database Mail, Log shipping and more.


Q) What system databases are in SQL Server and what do they do?

A) Master – Records all the system-level information for an instance of SQL Server incl Logins
Msdb – Used by the SQL Server Agent for scheduling alerts and jobs
Model – Template database for all user defined databases, incl tempdb
Resource – Is a read-only database that contains SQL Server system objects. System objects are physically in the Resource database, but logically appear in the sys schema of every database.
TempDB – Stores temporary tables, table variables, cursors, work tables etc, used for index creation.
Distribution – Used for SQL Server replication


Q) What are the steps to restore a master database?

A) You need to stop the SQL Server service and restart it in single user mode from a command prompt (sqlservr.exe -c -m -s <servername>). Next open up a second command prompt and connect to the sql server using SQLCMD. Then run the command to restore the master database from your backup using the REPLACE option. Once finished the Server will shut itself down, now simply boot it back up in normal mode and you should be good to go.


Q) Which stored procedure is used to find blocking processes?

A) sp_who2.    sp_WhoIsActive is an even better proc for getting info on active spid’s.


Q) What are the 3 main backup types and which one would you use to get to a point in time?

A) Full, Diff and Transaction log. To get to a point in time you’d first restore the full backup then diff (if used) and finally transactional log backups to the point in time required.


Q) How to take a backup without breaking the backup chain/sequence?

A) You can do a backup database with COPY_ONLY, this will allow you to carry on taking your Full, Diff and Log backups without any interruption.


Q) What is the command to do a point in time recovery?

A) Recover full, diff (if used), and log files up to and including the time required. For the last one you specify STOPAT and the time, e.g  RESTORE LOG AdventureWorks2008R2 FROM AdventureWorksBackups  WITH NORECOVERY, STOPAT = ‘Apr 15, 2020 12:00 AM’. Then restore the database with recovery to bring it online.


Q) How does SQL Server know what to back up in a differential backup?

A) It is done at the extent level whereby a full backup will reset the flags on all extents so that they are turned off. If that extent is later modified then the flag is set to on so that a diff backup knows to back it up. The flags only get set back to off at the next full backup so differential backups grow in size over time as more and more data is changed. Differential backups can be run on all recovery models; Full, Bulk-logged and Simple.


Q) In a data warehousing environment what effect might your backup strategy have on performance and what considerations might you take when deciding on a strategy?

A) In a typical Data Warehouse you will only have one single batch load of data per day and the majority of the data is static which means that you don’t need to worry so much about backing up the log, in fact the recommendation is to run the database in simple mode and perform the backup just before the ETL data load. For a relatively small DWH then you can just do full backups every night. For a medium size DWH perform weekly or monthly Full backups and Daily differential backups. For a Large DWH you need to start to think about partitioning and moving static tables and data into read-only partitions and then you do nightly full filegroup backups only on the read/write file groups.


Q) To restart a log backup chain in 2008 do you take a full backup?

A) No a differential backup will also restore the backup chain so that you can continue taking log backups.


Q) Which High Availability solution includes backup, copy and restore jobs?

A) Log Shipping


Q) What is Log Shipping?

A) Log Shipping is a way of automatically backing up a database and transaction log on a primary database and applying them to a standby database for recovery. It is a warm standby solution.


Q) Which version of SQL Server has log shipping?

A) Enterprise edition


Q) What are the main types of SQL Server replication?

A) Merge, Snapshot and Transactional.


Q) Which agents are used in transactional replication?

A) Snapshot, Log Reader and Distribution Agent


Q) How exactly are replicated transactions applied to the subscriber?

A) First the snapshot agent prepares a snapshot file containing schema and data of published tables and objects, stores the files in the snapshot folder and records sync jobs in the distribution database of the distributor. The Log Reader Agent monitors the transaction log for the replicated database and copies any relevant transaction to the distribution database. The Distribution Agent copies the initial snapshot files and subsequent transaction held in the distribution database to Subscribers. 


Q) Is it possible to insert records without merge/insert triggers executing?

A) Yes you can bulk load the data using either bcp or bulk insert.


Q) Name the SQL Server transaction isolation levels?

A) READ UNCOMMITTED “Dirty reads”, READ COMMITTED(Default), REPEATABLE READ, SNAPSHOT and SERIALIZABLE.


Q) When might you use a READ PAST table hint?

A) You might use this to alleviate locking issues as it permits the server to skip past write-locked rows.


Q) What are the good and bad things about a NOLOCK table hint?

A) The good is that it prevents shared locks from affecting rows returned in your queries, this increases performance and concurrency. The bad is that due to ignoring any locks some dirty/uncommitted data could potentially be read which can lead to inconsistent data.


Q) What is normalisation?

A) Database normalisation is the process of organising the fields and tables of a relational database to minimise redundancy and dependency, it usually involves dividing large tables into smaller tables and defining relationships between them.


Q) How would you give a user access to run a stored procedure?

A) You would grant execute on the stored proc to the user.


Q) What are the best practices if setting up security for a new database?

A) Install only the required components, try to only use Windows authentication mode, Disable the sa account and rename it, Enforce a strong password policy, Change the default SQL Server ports, Remove BUILDIN\Administrators group from SQL Server Logins, Enable logging of successful and failed login attempts, Hide the SQL Server instance or disable the SQL Server Browser service.


Q) How would you separate Windows administration and SQL Server Administration.

A) Deny access to “Builtin\Administrators”


Q) How many Administrative (server) roles are there in SQL Server?

A) Eight roles, dbcreator, diskadmin, bulkadmin, setupadmin, processadmin, securityadmin, serveradmin and sysadmin.


Q) What is Role-Based Access Control?

A) Role-Based Access Control (RBAC) is a security model where permissions are associated with roles, and users are assigned to these roles, thus gaining the role’s permissions. In SQL Server, there are server-level roles and database-level roles.


Q) Why would you use Statistics IO?

A) Statistics IO causes SQL Server to display statistical information regarding the amount of disk activity and CPU generated by a query, it’s a good tool for troubleshooting a query.


Q) How would you prevent a word from being included in a full text catalog?

A) You would add that word (stop word) to the existing Stoplist or create a new stoplist.


Q) How do you check what jobs are running on a server?

A) Jobs activity monitor is the easiest way.


Q) How do you access the instance activity monitor?

A) On the SSMS toolbar click Activity Monitor(Ctrl-Alt-A), in older version it was found under the Management tab of explorer.


Q) What is an allocation bottleneck?

A) Allocation bottlenecks refer to contention in the system pages that store allocation structures, PFS (Page Free Space), GAM (Global Allocation Map) and SGAM (Shared Global Allocation Map). The bottleneck will typically arise in tempdb and can be eased by using multiple data files in tempdb and maybe also by enabling trace flag T1118.


Q) What are the steps needed to connect to a SQL Server instance on a fixed port from SSMS?

A) You can specify this in advanced properties in the connection window. For sqlcmd you can specify the address as <IP address>,port.


Q) Is there anyway of simplifying the connection so that you don’t have to type it in each time?

A) Yes you can add an alias to it via SSCM SQL Client configuration. You could also just add it as a Registered server in SSMS?


Q) How do you change the port in an instance of Reporting services and where is this information stored?

A) The easiest way is to do this with Reporting Services Configuration Manager and you can change the port number in Web service configuration and Report Manager-Advanced.


Q) If you use a checksum on a backup does this remove the need to run a dbcc checkdb on the database?

A) No, a checksum only checks the integrity of the backed up data, although it would possibly give a bit of a heads up to any issues in the database.


Q) What is a merge statement?

A) A merge statement is a great way of simplifying SQL code by allowing you to do a conditional insert, delete and update in one statement as opposed to 3 separate statements.


Q) What are the different types of data compression in a sql server?

A) Row level and Page level compression, there’s also backup compression.


Q) Describe how database compression works?

A) Row-level data compression is essentially turning fixed length data types into variable length data types, freeing up empty space. It also has the ability to ignore zero and null values, it all means you can fit more rows in a data page. Page-level data compression is row-level compression plus prefix and dictionary compression to save on repeated values to give better total compression.


Q) In a datawarehouse environment what effect might you expect page and row compression to have on performance?

A) Much less disk space required, much less physical IO in queries so quicker queries, as long as you have enough CPU resources for compressing/decompressing the data.


Q) What permissions are required to display execution plans from the profiler?

A) Users must be granted ALTER TRACE permission.


Q) What permissions are required to display execution plans from dmv’s or stored proc?

A) They must have relevant permissions on the dmv or stored proc AND be granted SHOWPLAN permission on the database.


Q) What are the minimum rights in 2008 to run a truncate statement?

A) The minimum permission required is ALTER on the table_name.


Q) What is a filtered index?

A) It is an index with a where clause in it.


Q) How does sparse file initialisation improve database creation and expansion?

A) It reclaims used disk space without filling that space with zeros first so it’s much quicker.


Q) Which user operations are non logged?

A) Select into and bcp are the main ones.


Q) What dbcc command would you run post upgrade?

A) It’s a good idea to run DBCC UPDATEUSAGE(0) to update page and row counts. Probably also a good idea to run dbcc checkdb.


Q) What is table partitioning and what are the benefits?

A) It’s a way of splitting a table into more manageable chunks which usually also improves query performance.  It also makes it more scalable and reduces blocking.


Q) Explain the changes in index maintenance strategies since SQL 2000?

A) In later versions of SQL Server there is more information on duplicate and unused indexes in for example DMV’s. There is also now the ability to do resumable online index rebuilds?


Q) What options are available to monitor user activity in SQL Server?

A) Can use triggers or perhaps a server side profiler trace. Can also enable auditing of logins.


Q) What is SQL Injection and why is it a problem?

A) It is a way of attacking a SQL Server via the front-end application or webpage by inserting SQL commands to an improperly designed and secured form.


Q) How should you avoid SQL Injection?

A) Constrain and sanitize the input data by checking for good data by type, length, format and range. Avoid disclosing database error information. Use type-safe parameters for data access. Most importantly perhaps is to use an account that has restricted permissions in the database.


Q) What is the difference between set statistics profile on and set showplan xml on?

A) The main difference is that set showplan_xml on doesn’t actually execute the query, whilst the other one does. Both return the query plan info.


Q) What are certificates and what are they used for?

A) Certificates are used to help secure connections, in database mirroring, to sign packages and other objects or to encrypt data or connections.


Q) What considerations must be given to recovery planning if you encrypt data?

A) You must ensure that you also back up any associated certificates and keys by using the BACKUP CERTIFICATE command.


Q) What methods are available for encrypting data in SQL Server?

A) There is Cell-level encryption which includes certificates, symmetric and asymmetric keys and gives a lot of granularity. If you want to encrypt the whole database you can use Transparent Data Encryption TDE. There is also Windows File-Level encryption and Transport-Level encryption like SSL.


Q) How would you implement SQL Server data encryption so that privileged users can’t decrypt it?

A) You can do this using cell-level encryption with only the application layer knowing the key.


Q) What alternatives to backup/restore data recovery are there?

A) Failover clustering, Database Mirroring, Always On, Log shipping and Transactional replication.


Q) Always-On Availability Group read-only copies, how to set up routing and what to add in the connection string?

A) SQL Server 2016 supports up to eight replicas. By default, the secondary replicas do not allow any workload, but they can be configured as an active readable secondary to allow read-only access to all secondary databases. Setting the readable secondary as Read-Intent Only will allow the secondary server to serve the read-only workload only if the connection string of the application contains the Application Intent=Readonly parameter. Connections with Application Intent value equal to ReadOnly can be enforced to secondary replicas using a new Always On Availability Group feature called Read-Only Routing. In this way, the queries that will perform read-only processes that are directed to the Always On Availability Group listener will be redirected to the secondary replicas instead of the primary replica. In SQL Server 2012 the read-only workload redirection process will redirect the queries specifically to the first secondary replica defined in the routing list, but form 2016 onwards you can add a load_balance list so that there is proper load balancing and all the secondary replica’s can be used.
https://www.sqlshack.com/how-to-configure-read-only-routing-for-an-availability-group-in-sql-server-2016/


Q) What does set MaxDOP(2) do?

A) MAXDOP(n) actually controls the number of CPU cores that can be used by the query processor for parallel processing.


Q) What is most appropriate action to take when trying to resolve a bookmark lookup in a query plan? Create a covered index, force a recompile or rebuild the index?

A) Creating a Covered Index including the lookup column is the best option so as to avoid the costly bookmark lookup.


Q) What is the difference between a where clause and a having clause?

A) A having clause behaves like a where clause within a group by statement


Q) What is the difference between a subquery and a derived table query?

A) A subquery is a SELECT statement that is nested within another statement, it’s like having a query within another query and is used in the Where clause. A derived table is basically a subquery, except it is always in the FROM clause of a SQL statement.


Q) What is collation?

A) A collation encodes the rules governing the proper use of characters for either a language, such as Greek or Polish, or an alphabet, such as Latin1_General. It defines case sensitivity, sort order etc.


Q) What is the purpose of CTE?

A) A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. CTEs can be thought of as alternatives to derived tables (subquery), views, and inline user-defined functions.
The main use for a CTE is in recursive queries and where you need to perform an operation based on a few columns which you select in advance. It’s a substitute for creating a view.


Q) What is the purpose of an execution plan?

A) An execution plan is the result of the query optimizer’s attempt to calculate the most efficient way to implement the T-SQL query you submitted.


Q) What are the different types of temporary table?

A) There are 2 types. A local temporary table, #table_name, exists only for the duration of a user session or the procedure that created the temporary table, it can be accessed only by that user. A global temporary table, ##table_name, also exists for the duration of a user session or the procedure that created the table, however, multiple users can access a global temporary table.


Q) What’s the difference between a Temp Table and a Table Variable in SQL Server?

A) Temporary tables are physical tables created for a particular session; they are automatically deleted when the session terminates. Table variables are stored both in memory and the disk in the tempdb. A temporary tables structure can be altered after creation, a Table variable’s cannot. Table variables can be used in user-defined functions, Temporary tables can’t. Table variables do not allow creation of indexes on them, Temporary Table do. Table variables do not use a read lock, Temporary Tables do.


Q) What is a trigger?

A) A trigger is a special kind of a stored procedure that executes in response to certain action on the table like insert, delete or update of data.


Q) What different types of join can you use in TSQL?

A) The main types are inner join, outer join and cross join, but you can also have a Self-join.


Q) What are the Join operators available in SQL Server.

A) The main join operators are;
Merge It is the most effective of all the join operators. However, it requires all input data to be sorted by the join columns.
Adaptive The optimizer defines a threshold number of rows and then chooses between a Nested Loop or Hash Match join operator at runtime.
Hash Match Creates a hash table (in memory) for required columns for each row then creates a hash for the second table and finds matches on each row. It is very expensive and requires a lot of memory resources.Only really effective for joining very large unsorted (or at least sorted in different ways) datasets/tables.
Nested loop Performs a search on the inner (smaller) table for each row of the outer (larger) table. Less expensive than a Hash Match and ideal for small row inputs, it requires less I/O with fewer rows having to be compared.


Q) Which one of these, Nested Loop, merge join or hash join do you use in query execution?

A) All 3 are used based on the type of tables being joined, merge is the most efficient then nested loop join, both need sorted data. Hash join is good on unsorted tables. SQL Server chooses Nested Loop join when one input set is small and other one is fairly large and indexed on its join column. If two inputs are not small but sorted on the join columns; a merge join is the fastest operation. It requires both tables to be sorted on the merge column. The hash join can efficiently handle large, unsorted and nonindexed inputs. The optimiser chooses the optimal join to use based on a costing of number of io’s and cpu required and this is affected by the size of the tables and indexes etc.


Q) Name 3 things you should monitor on SQL Server.

A) You should monitor the error log for any errors. You should monitor the jobs to make sure they all run as expected/succeed. You should monitor the size of the databases. Another thing would be to monitor the overall load on the server, mostly in terms of CPU but also Memory and Network etc.


Q) SQL Server treats NULL as a value. What does this mean?

A) A value of NULL indicates that the value is unknown. No two NULL values are equal but despite this a Unique Index will only allow one Null value.


Q) How do you manage the definition of a database?

A) You can select and alter many options/definitions of a database, you would either do this via the SQL Server Management Studio but you can also use the alter database command and various other T-SQL commands. You can also set things like collation on the database level as well as recovery model, owner etc.


Q) What is the difference between Simple, Bulk-Logged and Full logging modes?

A) Simple mode does not allow log backups, it automatically reclaims log space. It can’t be used with Log Shipping, Always On, Database Mirroring or point in time recovery.   Full mode requires log backups and allows all the functions which Simple mode doesn’t including point in time recovery with no data loss.  Bulk mode also requires log backups, its similar to Full mode but permits high-performance bulk copy operations whilst reducing log space usage by using minimal logging for most bulk operations. Bulk mode does not allow point in time recovery but it’s useful if the database is subjected to periodic bulk operations, e.g a Data Warehouse environment.


Q) What can you do with a column NVARCHAR(450) that you can’t do with a column NVARCHAR(451)?

A) You can create an index on nvarchar(450) but not on nvarchar(451). This is due to the SQL Server limitation that a SQL Server Index can only be up to 900 bytes.


Q) Can DDL and DML statements be mixed in a transaction?

A) Yes, they frequently are mixed in stored procedures but it’s a good idea to do all the DDL operations (Declare, create table etc) first and then do the DML (Set, select, Insert etc), to avoid any costly recompiles. Not all DDL’s can be mixed though.


Q) What is a schema, for example dbo.MyTable – what is the “dbo” part?

A) In the above example dbo is the schema/owner. A schema is a distinct namespace to facilitate the separation, management, and ownership of database objects.


Q) What is the difference between Recovery Point Objective and Recovery Time Objective?

A) The main difference is in their purpose. Recovery Point Objective is focussed on data and a company’s loss tolerance in relation to the data.  Recovery Time Objective is the target time a company sets for recovery of their IT and Business activities after a disaster has struck.


Q) If the Average Page Life Expectancy is 300 seconds, and you have a machine with 128GB of RAM – what does this mean?

A) It means that you have adequate RAM in the system, as 300 seconds is the recommended minimum value. Specifically, it means that on average a page will stay in the buffer pool for 300 seconds without any references.


Q) What is the cxpacket wait type?

A) The SQL Server CXPACKET wait type is a result of parallel query execution and indicates that a session is waiting for the synchronization of threads involved in the parallel process to complete. If waiting is excessive and cannot be reduced by tuning the query (such as adding indexes), consider adjusting the cost threshold for parallelism or lowering the degree of parallelism.


Q) What is SOS Scheduler yield?

A) SQL Server has multiple threads, when any thread yields itself to another thread, it creates this wait type. If this wait time is very high then it indicates that the CPU is under pressure. You can fix this by giving more CPU to the system or by optimising poorly written cpu hungry queries.


Q) What is a resource semaphore wait type?

A) Simply put it is when a query is waiting to grab some memory to be able to run, it is caused primarily by not having  enough memory available for the sql server or by queries themselves being inefficient and requiring too much memory to run.


Q) What would high wait times in CXPacket indicate in a datawarehouse environment?

A) CXPacket waits represent the time needed for a parallel plan to be executed, as long as it is less than 50% of total waits it shouldn’t be considered as a problem but rather an indicator of parallelism working. High CXPacket values could be an indication that the Cost Threshold for Parallelism is set to low which causes almost all queries to be parallel which for small queries is inefficient. The default value of 5 is almost always too low and should be increased to at least 50 with further tuning advised to find the optimal value. It might also be worth looking at MAXDOP and setting that to less than the total number of cores on the underlying Server. High CXPacket Wait times can also be an indication of uneven data distributions through fragmented indexes or out of date Stats. DataWarehouses benefit a lot from using parallelism so the more cores the better but only for expensive queries.


Q) How would you use wait stats to troubleshoot a performance issue in SQL Server

A) Microsoft added a new Dynamic Management View (DMV), sys.dm_exec_session_wait_stats, in SQL Server 2016. It provides information about wait statistics for an active session. Extended events can also be used to gather wait stats. Wait stats will help pinpoint an issue, for example very high cxpacket waits would indicate badly tuned parallelism where perhaps too many queries run in parallel, whereas SOS_Scheduler_yield would indicate CPU pressure.


Q) What is the difference between a Page latch and a Page IO latch?

A) Page latches are held on the buffer memory, if they are high then it could be an indication of contention in modifying the same page. PageIOLatches specifically refer to IO latch waits, an increase in this value indicates an issue with the IO subsystem, i.e. it takes too long to load pages into memory from disk.


Q) What are the most common wait types you’ll see in SQL Server?

A) ASYNC_NETWORK_IO – Basically just SQL Server waiting for a response from the application, nothing you can do about it.
CXPACKET – Explained elsewhere, to do with parallelism
LCK* – One query is holding locks on an object, and another query is waiting to get locks on that same object. Can be tuned with better indexes or perhaps you have too many indexes.
PAGEIOLATCH_* – SQL Server is waiting to read data pages from storage. These pages weren’t cached in memory, so we have to get them from disk, can be tuned with better indexes or adding more memory or getting faster disks.
SOS_SCHEDULER_YIELD – The SQL Operating System is waiting for a CPU scheduler to yield more time. Can be related to inefficient parallelism setup.
WRITELOG – Users are doing delete/update/insert (DUI) operations, and SQL Server has to write those to the transaction log file before telling the end users that their transaction is committed, can be eased by using delayed durability if you can risk losing some data.


Q) A message in the error log saying that a significant amount of memory has been paged out, why is this happening and how do you prevent it?

A) This would commonly occur when the underlying OS or other applications on the Windows server are being starved of memory resources by the SQL Server, the solution is to ensure you always leave an adequate amount of Memory to the OS, e.g. on an 8GB Windows server leave at least 4 GB to OS.


Q) Why would you disable the write through cache in a RAID controller on a Database Server?

A) If you need to 100% ensure all data is written to disk when committed then you should disable write through caching otherwise if the machine is suddenly shut down you could end up with inconsistent data which the SQL Server thinks it has been written to disk but which in reality was only written to the cache memory.


Q) How would you allocate a mix of RAID 10 and RAID 5 on a SQL Server?

A) RAID 10 is far superior especially in write-heavy scenarios whereas if you have databases or portions of a database which are predominantly read-only then you can get away with using RAID5.


Q) What’s a “numbers table”, and why would you use it?

A) A numbers table serves as a static sequence, it will have a single column, with consecutive numbers from 0 or 1 up to a predefined upper boundary.  They are useful for string manipulation, populating test tables with lots of data, eliminating cursor logic and lots of other tasks which would be hard to do without them.


Q) Does SQL Server have native support for UTF8?

A) It does since SQL Server 2016, one work around in earlier versions was to convert the UTF8 to UTF16.


Q) What is lock pages in memory?

A) Lock Pages in Memory (LPIM) is a Windows policy, it locks the SQL memory in physical memory and prevents it being paged out to improve the system performance and can be very helpful for SQL Server performance. Less important now than it used to be in earlier versions.


Q) What is parameter sniffing?

A) SQL Server uses a process called parameter sniffing when it executes stored procedures that have parameters. When the procedure is compiled or recompiled, the value passed into the parameter is evaluated and used to create the execution plan. Subsequent queries then use this plan which may not fit the new parameters optimally.


Q) What are temporal tables in SQL Server

A) Temporal tables (also known as system-versioned temporal tables), are a type of user table designed to keep a full history of data changes, allowing easy point-in-time analysis. Every temporal table has two explicitly defined date columns, referred to as period columns, they are used exclusively by the system to record the period of validity for each row, whenever that row is modified.  It uses a current table and a (mirrored schema) historical table which automatically stores the previous version of the row each time it gets updated or deleted.

Temporal tables are useful for Auditing data changes, Reconstruction of data to a specific point in time, Calculating trends over time and Maintaining a slowly changing dimension for decision support apps.


Q) What is the difference between Change tracking and CDC (Change Data Capture) in SQL Server

A) Change tracking is light weight tracking used primarily for application synchronisation tasks where the app only needs to know that a change has occurred and what the most recent data is for each database or table, an advantage is that it is synchronous.
CDC maintains a history of row changes, including the actual data that was changed, it does this by asynchronously reading the database’s transaction log via SQL Agent jobs to detect when a change has occurred and storing that information in the tracked table’s associated change table.


Q) What are the benefits of table partitioning along with CDC in large data sets

A) Table partitioning makes good sense in large data sets where you can split the table data into partitions, the majority perhaps being read-only and only a small part needing to be read/write, it makes it easier to back up data and cuts down on contention etc. In terms of table partitioning together with CDC and replication, care must be taken with partition switching and you must ensure that all tables that are involved in partition switching exist at the Publisher and Subscriber, and make sure that the table and partition definitions are the same. Before executing a split or merge operation on a replicated or CDC enabled table you must ensure that the partition in question does not have any pending replicated commands. You should also ensure that no DML operations are executed on the partition during the split and merge operations.


Q) What is batch mode processing in SQL Server?

A) Batch mode execution used to only be available for Columnstore tables, but from SQL Server 2019 onwards there is also “batch mode on rowstore”. It’s a query processing method, and the advantage of this query processing method is to handle multiple rows at a time, chunks of approx. 900 rather than 1 by 1. This approach gains big performance enhancements when a query performs aggregation, sorts, and group-by operations on a large amount of data. It uses significantly less cpu and therefore execution time. It is chosen automatically by the optimiser if deemed beneficial.


Q) What is uniform extent versus mixed extent?

A) Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.
Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be owned by a different object.
Uniform extents are obviously better in terms of performance. Starting with SQL Server 2016, the default for most allocations in a user database, and tempdb, is to use uniform extents.


Q) What is a federated database?

A) A federation is a group of SQL Servers, typically in Azure, that cooperate to share the processing load of a system. Federated database servers let you scale out a set of servers to support the processing needs of large systems and websites.


Q) What is the difference between the file extensions .mdf and .ndf ?

A) .mdf signifies a primary database file whilst .ndf is a secondary database file.


Q) What is SET QUERY_GOVERNOR_COST_LIMIT used for?

A) For the current session it sets the maximum cost limit of time it will allow a query to execute. Any query cost greater than the maximum value will not be allowed to execute.


Q) What to do if a SQL Server patch fails and the SQL Server won’t come up?

A) The general first step is to reboot the Windows server and try to run the patch again, if there is still an issue then you need to look at the patch log files (%programfiles%\Microsoft SQL Server\nnn\Setup Bootstrap\Log) and take corrective action. You could also try to uninstall the patch. Worst case scenario is to rebuild the SQL Server and restore from backups, but usually the reboot and redeploy works. Another handy trick is to try to start the SQL Server using traceflag 902 then correct/remove whatever might be blocking the patch (e.g SSISDB), which can then be re-added later.




Scenario based questions:

Q) A User can’t connect to a database, what steps would you take to diagnose the issue?


Q) A User calls to inform that an application is running slow, steps to resolve?


Q) Users call again to say it is running slow after an overnight batch, what could be the reasons?


Q) How would you manage a release deployment?


Q) How would you manage the patching of lots of servers?




SSRS DevOps Questions

Q) What is a parameterised report (SSRS)?

A) A parameterized report uses input values to complete report or data processing. Parameters are used in dataset queries to select report data, to filter the result set that the query returns, or to set layout properties used for showing and hiding parts of a report.


Q) What’s the purpose of a dataset in a report?

A) A dataset is how you add data to a report, it defines the data source and what you want to retrieve form it. A dataset contains the information that is needed to retrieve a specific set of data from a data source.


Q) Where would you go to access a reporting services report?

A) You would go to Report Manager ?????


Q) If you wanted a custom function that for example took in a number and returned the name of a shape, how would you put that into a report?

A) You would create this in Report Builder, Report Properties and code.


Q) How would you deploy a reporting services report?

A) You would deploy it via Report Manager, (Native mode deployment). You can also deploy it from a Visual Studio project or you could publish the report to Sharepoint or within custom applications.


Q) How do you develop the report?

A) Report Builder or Visual Studio with BI data tools


Q) How would you create a conditional format is ssrs report?

A) In report Builder you would add an expression into the relevant field’s properties




SSAS DevOps Questions

Q) What is CSV file?

A) A CSV is a comma separated values file, which allows data to be saved in a table structured format.


Q) What is xml file?

A) XML is a file extension for an Extensible Markup Language (XMLfile format used to create common information formats and share both the format and the data on the World Wide Web, intranets, and elsewhere using standard ASCII text.


Q) What are fact and dimension table for and what is the difference between Fact and Dimension tables?

A) The fact table contains business facts (or measures), and foreign keys which refer to candidate keys (normally primary keys) in the dimension tables. Contrary to fact tablesdimension tables contain descriptive attributes (or fields) that are typically textual fields (or discrete numbers that behave like text).


Q) What is the difference in Primary Keys between a natural key and a surrogate key?

A) A natural key is a primary key designed around an existing data column(s). A surrogate key is made up, for example an identity column.


Q) In relation to Kimball methodology, explain the following types of Facts table, Transaction, Periodic snapshot and Accumulating snapshot?

A) These are all types of grain. Transaction grain corresponds to a measurement taken at a single instant eg a grocery shop beep, they can be huge and unpredictably sparse. Periodic snapshots correspond to a predefined span of time, often a financial reporting period. Can also be large but is more predictably dense. Accumulating snapshots correspond to a predictable process that has a well defined beginning and end. These tables are typically smaller as they tend to overwrite data as they process.


Q) What are the types of relation models in SSAS?

A) Tabular and Multidimensional. Under multidimensional you have ROLAP and MOLAP


Q) In SSAS what is XMLA?

A) XML for Analysis (XMLA) is a Simple Object Access Protocol (SOAP)-based XML protocol, designed specifically for universal data access to any standard multidimensional data source residing on the Web.


Q) What types of keys are used to design the facts and dimensions in SSAS cube?

A) Primary(Candidate) and Foreign Keys are the main types but also alternate keys.


Q) What is slowly changing dimension and explain the two types?

A) The term slowly changing dimensions is the variation in dimensional attributes over time. The two main types are overwriting history and preserving history. There is also another type called preserving a version of history.


Q) What is the purpose of audit tables?

A) Audit Tables are used to track transactions against a particular table or tables. Audit Tables are typically filled through the use of Database Triggers


Q) A question related Kimball methodology for datawarehouse design related ?????

A) The dimensional model is built on a star-like schema, with dimensions surrounding the fact table. To build the schema, the following design model is used: Choose the business process, Declare the grain, Identify the dimensions and finally Identify the fact.


Q) When do you need more than one measure?

A) ?? Depends what information you want to return you might want to know the sum and average of a selection of data. Also if you have multiple facts in the same model.