- Published on
Mysql
- Authors
- Name
- Rammy
Key Ideas to cover
- Scalability
- Availability
- Consistency
- Flexibility
- How to do MySQL query optimizations ?
- Analyze MySQL query performance ?
- How to analyze MySQL DB performance + Monitor performance + Reporting tools ?
- MySQL with Redis
- Can MySQL handle multi cluster / nodes ?
- Security
- Transactions
- Event Schedulers
- Views
- Procedures
- Functions
- Indexing
- Variables
- Cursors
- Flow Control
- Triggers
SQL terms
Term | Meaning |
---|---|
SQL Parsing | Parsing a query is the process by which the optimizer component determines the possible ways to run this query and choose the best optimal way. |
Transaction | A logical unit of work that contains one or more SQL statements. The effects of all SQL statements in a transaction can be either applied or rolled back |
Character-set and Collation-set | Character-set defines the character encoding to use in a database ex:latin Collation is a set of rules that define how to compare and sort character strings Collate changes based on caracter-set used |
Database seeding | Database seeding is the initial seeding of a database with data. Seeding a database is a process in which an initial set of data is provided to a database when it is being installed. |
Database Engine | A database engine is the underlying software component that a database management system uses to create, read, update and delete data from a database. |
MYSQL vs MSSQL
Attribute | MySQL | MsSQL |
---|---|---|
Native compatibility | runs smoothly on any OS | designed for windows but can work with other (limitations exist) |
Custom Engines | MyISAM, InnoDB etc | Does not allow other engines |
Cost | Free & open source | Costly and licensed |
Binary collections | Allows data manipulation while running Allow any process to access or manipulate binaries | Not allowed, so more secure |
Backup | Can only backup data by extracting as SQL queries Need to block DB while backup Data restoration is time consuming, since we need to execute many queries | Doesn't block DB Allows users to back up and restore with minimal effort |
Stop Query Execution | Doesn't let user kill or cancel once query starts | Can truncate query while it's running without having to kill the process. |
Query Optimizations :
- The developer must consider the future usage of the tables in terms of the number of records, the table size on disk, the different variations of queries to be performed, and must create the appropriate indexes while designing the table. The developer is responsible for the proper index creation, not the system administrator, the DBA, or any other consultant.
Optimization | |
---|---|
Small tables | - No need to create index - Small tables or less frequently updated tables like countries, clients list can be cached to in memory DB and updated at intervals from DB |
Primary key | - Use unsigned INT which gives you nearly (4.29 billion values) - Do not use UUID / GUID since they compare much slower than INT and consume more space |
Avoid redundant indexes | - Do not include primary key in custom indexes as its already indexed - Avoid multiple indexes indexing the same column, its useless --> Example : - CREATE INDEX idx1 ON users (email) - CREATE INDEX idx2 ON users (email, pass) |
Over indexing | - Do not create too many indexes to cover different cases for where conditionals. - Every index created will impact CRUD speeds as index trees needs to be updated and balanced. - Indexes are loaded into memory and therefore consume more memory. |
Index from start is better | - Once a table becomes huge, if we wanna introduce index on a table remember that indexing is done in a transaction and so table does not allow any changes while index is being created - So it could impact production, so go to maintenance mode before indexing big tables |
Column order matters in Compound index | - Compound Index : Means index with more than one column - Column order matters, choose wrong order could cause full table scan - Priorities the columns that sort the data best |
Never use * -> Get all fields | - Lazily using * even though all fields are not needed could cause more reads and nullify indexes advantage. |
Views :
Use | Info |
---|---|
Simplicity | Views can be used to hide complex queries. |
Security | View can hide some important information from end user by creating view on some selected columns |
Security | Secure table to change the structure of it by using VIEW. |
Redundancy | Reduce redundant code in every procedures/query by using a common view. |
Calculation | All the calculations can be done once in view query. |
Meaningful Name | Table may have name for id like tbl_org_emp_id which can alias like [Employee No] or some meaningful name. |
Roles :
- Role is a collection of privileges
- Roles with certain privileges are assigned to users based on their requirement.
- For example developers are given full privilege roles.
- We can control access to users, such as
- Read
- Write
- Delete
- Update
- Databases
- Tables
- Detailed SQL query on creating and assigning roles
Privilleges --> Roles --> Users
MySQL Architecture
-
Layer 1 : BASIC CONNECTION LAYER
- Basic Client / Server tools
- Connection handling
- Authentication
- Security etc
-
Layer 2 : SERVER LEVEL LAYER
- Query parsing
- Query Analysis
- Optimization
- Caching
- Built in function
- Procedure
- Triggers
- Views
-
Layer 3 : STORAGE ENGINE LAYER
- Storage engines
- Responsible for storing and retrieving all data store.
- Server communicates with them through storage engine API
- Example : InnoDB, MyISAM
Connection management
- Each connection is allocated a thread and all queries made by that connection executes with in that thread.
- Threads are cached / destroyed once connection ends.
Optimization & Execution
-
Parse query to a tree
-
Apply variety of optimization's
- Rewrite query
- Find the order in which to read the tables used in query
- Choose indexes to use
- Ask storage engines corresponding to those tables about cost of certain operations.
- Ask storage engines on statistics of table.
- Finally choose the best way to execute query
-
Query Cache
- Before even parsing the query sql first checks if similar SELECT query exist in cache
- If it does, it simply sends cached results back to client.
-
Analyze MySQL optimizer decisions
- SQL Query analyzer helps to analyze the query and understand what steps the SQL optimizer took to give the result.
- How many rows the MySQL has to inspect to get you the result.
- Did MySQL use any available indexes to filter rows and reduce the rows it has to through.
- We can also use EXPLAIN to understand which tables & columns are worth indexing.
- EXPLAIN Command output is perfectly explained
Concurrency
-
In Database systems there are many problems that need's to be handled to be able to concurrently run multiple queries that access the same data at the same time.
-
Examples :
- Multiple queries trying to write to same table at the same time.
- What happens when a user tries to delete/insert data when a read is happening --> User get inconsistent stale result.
-
LOCKING SYSTEM ( Solution to concurrency ) :
- Shared Locks
- Exclusive Locks or Read Locks
- Write Locks
Lock Type | Use |
---|---|
Read Lock | - Read locks on a resource are shared - Many clients may read from a resource at the same time and not interfere with each other |
Write Lock | - Write locks block both read locks and other write locks - Because the only safe policy is to have a single client writing to the resource at given time and to prevent all reads when a client is writing. |
- Lock Granularity :
- The more precise you are in locking exactly only the data that's being modified the more concurrency you can provide.
- EXAMPLE :
- Locking entire table blocks thousands of queries trying to access the same table.
- Locking a row alone is better for concurrency.
- EXAMPLE :
- Locking system is a comprise between **Lock Overhead ** and Data safety and Concurrency.
- The more precise you are in locking exactly only the data that's being modified the more concurrency you can provide.
Lock Granularity | OverHead | Concurrency | Engine |
---|---|---|---|
Table lock | Lowest | Lowest | MyISAM, Memory |
Row lock | High | High | NDB Cluster |
Row level with MVCC | Highest | Highest | InnoDB, SolidDB |
Transactions
- Let's say to get a job done, like a bank transaction. You need to update or insert around 3 tables.
- It requires around 3 queries.
- Now we need to be able to rollback if at least one of the query fails for any reason.
- This is why we need transactions.
- Query
Given a set of queries in a transaction, Its either implement ALL or NONE
-
All transaction processing system must be ACID compliant --> Atomicity, Consistency, Isolation, Durability
-
Example Storage engines : InnoDB, SolidDB
-
Dead Locks :
- A deadlock_ is when two or more transactions are mutually holding and requesting locks on the same resources, creating a cycle of dependencies.
- InnoDB can detect dead locks and avoid issues.
-
Mixing Engines in transactions:
- Avoid using different storage engines in a transaction.
- Example:
-
A transaction has 2 queries accessing 2 tables.
-
Table1 uses InnoDB , While Table2 uses MyISAM
-
When transaction fail and need rollback, InnoDB table could rollback but MyISAM couldn't.
-
Storage Engines
- MySQL server takes care of storing database schema in file system as shown above.
- The storage engine takes of storing data and their indexes.
- To check storage engine a table uses :
SHOW TABLE STATUS LIKE 'user' \G
Above command shows User table details
- MYISAM
Category | MyISAM |
---|---|
Features | Default MySQL Engine |
Compression | |
Spatial functions | |
Full text indexing | |
Locking | Table locking |
Allows concurrent writes while read queries are running | |
Repair | Automatic Repair |
Manual Repair (Command -> myisamchk ) | |
Data storage | Each table has 2 files, data file and index file |
- InnoDB
Category | InnoDB |
---|---|
Features | Designed for short lived transactions |
Most popular engine for transactional storage | |
Due to its performance and Auto recovery, people use this for non-transactional storage needs too. | |
Allow foreign key constrain | |
Clustered indexes ( so heavy index data is split and can fit into memory) | |
Storage | Stores data in one or more data files that are known as tablespace. |
Locking | Row Locking with MVCC |
Provides high concurrency | |
Creating Indexes | Index creation is slow to compared to MyISAM |
Changes to table structure will rebuild entire table and indexes | |
Does not allow index compression |
- Memory Engine
Category | Memory Engine |
---|---|
Features | Used for data that needs very fast access and doesn't change |
Orders of magnitude faster than MyISAM | |
Support HASH index, which are fast for lookups | |
Storage | Lives in memory so data will not persist after restart |
Since its lives in memory no I/O wait needed --> Fast | |
Cons | Uses table level which gives low write concurrency |
Does not allow variable sized rows, so cannot use varchar | |
Do not support TEXT or BLOB columns | |
Not suitable for general purpose disk based tables. | |
Use Cases | Good for lookup or mapping tables like postal codes, state names |
Caching results of periodically aggregated data | |
MySQL uses this engine for queries that need temporary table to hold intermediate results |
Selecting right engine
-
We can choose storage engines on a table by table basis.
-
It is important to think about which storage engine is appropriate for a given data at the design stage itself.
-
Considerations
Feature | Engine |
---|---|
Don't require transactions | MyISAM |
Need transactions | InnoDB |
Concurrency | MyISAM : if just for read and write and operations doesn't interfere with each other |
InnoDB : When row level locking is needed and transaction compliant | |
Backups | For offline backup any storage engine is fine |
For online backup with Db that has multiple engines the solution gets complicated | |
< Need to check on existing backup solutions > | |
Crash Recovery | InnoDB is best and takes less recovery time |
Primary keys vs Secondary keys | |
Number of i/o operations | |
Data size | |
Example use cases :
-
Logging every call details in real-time applications
-
Logging usually requires high write speeds and occasional read speeds to analyze logs.
-
MyISAM and Archive storage engines are perfect and can support thousands of write speeds per second.
-
What happens when you try to read logs and create summary.
- Issues :
- Depending on query used there's good chances that this could slow down the write speeds.
- **Solutions **:
- Using MySQL replication feature : Where master servers for inserting logs and you can use of the cloned slaves for reading.
- Merge Tables : Adjust the application to log to table based on year month data etc. This way we have multiple tables like log_2020_01 , log_2020_02. This way we avoid heavy reads on table being written.
- Issues :
-
-
Mostly only read data :
- MyISAM is the best choice if you don't mind what happens when MyISAM crashes.
- So better use MyISAM when you are sure of how to retrieve from a crash or data in those tables isn't that important like catalog data... list of countries, pin codes, zips.
-
Order processing :
- Order processing requires transactional storage.
- InnoDB is the best choice if you need support for foreign key constraints as well.
-
Stock quotes :
- MyISAM --> Only if using for personal stock quote analysis
- InnoDB --> When providing a web service with real time quote table where thousands of clients could be trying to read and write.
Status: #todo
Tags: #mysql
References:
Related: