Some MySQL tables may contain duplicate records. Sometimes duplicates are acceptable, but in other cases they need to be removed.
In this chapter, we will look at how to prevent duplicate data from appearing in a table and how to delete duplicates that already exist.
You can define a column, or a combination of columns, in a MySQL table as the PRIMARY KEY or give it a UNIQUE index to guarantee that its values are unique.
Let's try an example: In the table below, there are no indexes or primary keys, so the table allows multiple duplicate records.
CREATE TABLE person_tbl (
    first_name CHAR(20),
    last_name  CHAR(20),
    sex        CHAR(10)
);
If you want the combination of first_name and last_name to be unique, you can define a composite primary key on the two columns. Columns that make up a primary key cannot contain NULL, so declare them NOT NULL, as shown below:
CREATE TABLE person_tbl (
    first_name CHAR(20) NOT NULL,
    last_name  CHAR(20) NOT NULL,
    sex        CHAR(10),
    PRIMARY KEY (last_name, first_name)
);
With a primary key or unique index in place, a plain INSERT INTO statement that would create a duplicate row fails with an error instead of inserting the row.
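To make the failure concrete, here is a minimal sketch. It assumes person_tbl was created with PRIMARY KEY (last_name, first_name) as above and does not yet contain the row being inserted; the exact wording of the error message varies between MySQL versions, so only the error code is noted.

```sql
-- First insert: succeeds because no ('Jay', 'Thomas') row exists yet (assumption).
INSERT INTO person_tbl (last_name, first_name)
VALUES ('Jay', 'Thomas');

-- Second insert: same key values, so the statement is rejected
-- with error 1062 (ER_DUP_ENTRY) and the table is left unchanged.
INSERT INTO person_tbl (last_name, first_name)
VALUES ('Jay', 'Thomas');
```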
The difference between INSERT IGNORE INTO and INSERT INTO is that INSERT IGNORE INTO skips rows that would duplicate an existing primary key or unique value. If no matching row exists, the new row is inserted; if one does, the new row is discarded. Existing data is left untouched, and only the missing rows are added.
The following example uses INSERT IGNORE INTO, which will not cause an error after execution and will not insert duplicate data into the data table:
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES('Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES('Jay', 'Thomas');
Query OK, 0 rows affected (0.00 sec)
When a table has a primary key or unique index, INSERT IGNORE INTO reports a warning rather than an error for a duplicate row. REPLACE INTO, by contrast, first deletes any existing row with the same primary key or unique value and then inserts the new row.
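The contrast between the two statements can be sketched as follows. The example assumes person_tbl has PRIMARY KEY (last_name, first_name) and already holds a ('Jay', 'Thomas') row whose sex column is NULL, as left by the inserts above.

```sql
-- The duplicate is skipped; the existing row keeps sex = NULL.
INSERT IGNORE INTO person_tbl (last_name, first_name, sex)
VALUES ('Jay', 'Thomas', 'M');

-- The skipped duplicate shows up here as a warning, not an error.
SHOW WARNINGS;

-- The existing row is deleted and the new one inserted in its place,
-- so the stored row now has sex = 'M'.
REPLACE INTO person_tbl (last_name, first_name, sex)
VALUES ('Jay', 'Thomas', 'M');

SELECT last_name, first_name, sex
FROM person_tbl
WHERE last_name = 'Jay' AND first_name = 'Thomas';
```

Because REPLACE INTO is a delete followed by an insert, any column left out of the statement is reset to its default rather than carried over from the old row.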
Another way to set the uniqueness of data is to add a UNIQUE index, as shown below:
CREATE TABLE person_tbl (
    first_name CHAR(20) NOT NULL,
    last_name  CHAR(20) NOT NULL,
    sex        CHAR(10),
    UNIQUE (last_name, first_name)
);
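If the table already exists, the same constraint can be added afterwards. The sketch below is one way to do it; the index name uq_person_name is a hypothetical choice, not something the table above defines. Note that, unlike a primary key, a UNIQUE index on nullable columns still admits multiple rows containing NULL.

```sql
-- uq_person_name is an illustrative index name (assumption).
-- The statement fails with a duplicate-entry error if the table
-- already contains two rows with the same (last_name, first_name).
ALTER TABLE person_tbl
    ADD UNIQUE KEY uq_person_name (last_name, first_name);

-- Inspect the indexes now defined on the table.
SHOW INDEX FROM person_tbl;
```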
Below we will count the number of repeated records of first_name and last_name in the table:
mysql> SELECT COUNT(*) AS repetitions, last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name
    -> HAVING repetitions > 1;
The above query returns every combination of last_name and first_name that occurs more than once in person_tbl, together with the number of times it occurs. In general, to find repeated values, follow these steps:
Determine which columns may contain repeated values.
Add COUNT(*) to the column selection list, along with those columns.
List the same columns in the GROUP BY clause.
Add a HAVING clause that keeps only groups whose count is greater than 1.
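The steps above can be tried end to end on a small scratch table. The table name person_demo and the sample rows are hypothetical, chosen so that exactly one name is repeated.

```sql
-- Scratch table with no key, so duplicates are allowed.
CREATE TABLE person_demo (
    first_name CHAR(20),
    last_name  CHAR(20),
    sex        CHAR(10)
);

INSERT INTO person_demo (last_name, first_name, sex) VALUES
    ('Jay',   'Thomas', 'M'),
    ('Jay',   'Thomas', 'M'),
    ('Akbar', 'Ali',    'M');

-- Group by the columns under test and keep only groups seen more than once.
SELECT COUNT(*) AS repetitions, last_name, first_name
FROM person_demo
GROUP BY last_name, first_name
HAVING repetitions > 1;
-- Only the ('Jay', 'Thomas') group qualifies, with repetitions = 2.
```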
If you need to read only unique rows, use the DISTINCT keyword in the SELECT statement to filter out the duplicates.
mysql> SELECT DISTINCT last_name, first_name
    -> FROM person_tbl;
You can also use GROUP BY to read non-repeated data from a table:
mysql> SELECT last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name;
If you want to delete duplicate data from the table, you can use the following SQL statement:
mysql> CREATE TABLE tmp SELECT last_name, first_name, sex
    -> FROM person_tbl
    -> GROUP BY last_name, first_name, sex;
mysql> DROP TABLE person_tbl;
mysql> ALTER TABLE tmp RENAME TO person_tbl;
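Dropping the original table before checking the copy is risky, so a slightly more cautious variant is sketched here. It assumes no table named tmp or person_tbl_old exists yet; person_tbl_old is a hypothetical backup name. Keep in mind that CREATE TABLE ... SELECT does not copy indexes, so any primary key or unique index has to be added to the new table separately.

```sql
-- DISTINCT over all three columns keeps one copy of each identical row.
CREATE TABLE tmp AS
SELECT DISTINCT last_name, first_name, sex
FROM person_tbl;

-- Sanity check: compare row counts before swapping anything.
SELECT (SELECT COUNT(*) FROM person_tbl) AS before_rows,
       (SELECT COUNT(*) FROM tmp)        AS after_rows;

-- Swap both names in one statement and keep the old data as a backup.
RENAME TABLE person_tbl TO person_tbl_old,
             tmp        TO person_tbl;

-- Once the result has been verified:
-- DROP TABLE person_tbl_old;
```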
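You can also remove duplicate records more simply by adding an INDEX or a PRIMARY KEY to the table with ALTER IGNORE TABLE. Note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so this method only works on older servers. The method is as follows: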
mysql> ALTER IGNORE TABLE person_tbl
    -> ADD PRIMARY KEY (last_name, first_name);
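On servers that reject ALTER IGNORE TABLE, a similar result can be approximated by copying the rows into a keyed table with INSERT IGNORE INTO. This is a sketch under assumptions: person_tbl_dedup and person_tbl_old are hypothetical names, and person_tbl is assumed to contain no NULL values in last_name or first_name. Which of several duplicate rows survives is not defined, since the SELECT has no ORDER BY.

```sql
-- New table with the desired primary key already in place.
CREATE TABLE person_tbl_dedup (
    first_name CHAR(20) NOT NULL,
    last_name  CHAR(20) NOT NULL,
    sex        CHAR(10),
    PRIMARY KEY (last_name, first_name)
);

-- Rows whose (last_name, first_name) is already present are skipped
-- with a warning instead of aborting the statement.
INSERT IGNORE INTO person_tbl_dedup (first_name, last_name, sex)
SELECT first_name, last_name, sex
FROM person_tbl;

-- Put the de-duplicated table in place, keeping the original as a backup.
RENAME TABLE person_tbl       TO person_tbl_old,
             person_tbl_dedup TO person_tbl;
```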