表:reception_log,add_time为接待时间,customer_id为客户id
表:project_customer,flag为客户类型,id为客户id
查询 时间段(2024-11-20)接待的首访客户
2024-11-20 00:00:00 - 2024-11-20 23:59:59 转时间戳: 1732032000 - 1732118399
方法1:
首访:在时间段(2024-11-20)内来过的客户中 & 在 2024-11-20 23:59:59 之前来访过=1次
SELECT c.flag,count(*) as mcount
FROM project_customer c
INNER JOIN (
//在时间段(2024-11-20)来过
SELECT log1.customer_id
FROM reception_log AS log1
RIGHT JOIN (
//获取之前来过1次的客户id集合
SELECT customer_id
FROM reception_log
WHERE add_time<=1732118399 and project_id=1
GROUP BY customer_id
HAVING COUNT(*) = 1
)
AS log2
ON log1.customer_id = log2.customer_id
WHERE log1.add_time>=1732032000 and log1.add_time<=1732118399 and log1.project_id=1
)
AS filter_ids ON c.id = filter_ids.customer_id
group by c.flag limit 0,30
方法2:
首访:在时间段(2024-11-20)内来过的客户中来过1次 & 在2024-11-20 00:00:00之前无记录
SELECT c.flag,COUNT(*) AS mcount
FROM project_customer c
INNER JOIN (
//获取期间内来访1次 & 不在之前来过的集合中
SELECT log1.customer_id
FROM reception_log log1
WHERE log1.project_id = 1 AND log1.add_time BETWEEN 1732032000 AND 1732118399
GROUP BY log1.customer_id HAVING COUNT(*) = 1
AND log1.customer_id
NOT IN (
//获取之前来过客户id集合
SELECT DISTINCT log2.customer_id
FROM reception_log log2
WHERE log2.project_id = 1 AND log2.add_time < 1732032000
)
) AS filter_ids ON c.id = filter_ids.customer_id
GROUP BY c.flag LIMIT 0, 30;
NOT IN 会检查左表的每一行值是否出现在右表的子查询结果中。若出现,就返回 FALSE,若不出现,则返回 TRUE。然而,如果子查询结果中存在 NULL 值,NOT IN 会导致所有的记录都被排除,因为 NULL 和任何值的比较结果都是 UNKNOWN(不成立)。
优化 NOT IN
SELECT c.flag,COUNT(*) AS mcount
FROM project_customer c
INNER JOIN (
//获取期间内来访1次 & 不在之前来过的集合中
SELECT log1.customer_id
FROM reception_log log1
LEFT JOIN (
SELECT customer_id
FROM reception_log
WHERE project_id = 1 AND add_time < 1732032000
GROUP BY customer_id
) AS excluded ON log1.customer_id = excluded.customer_id
WHERE log1.project_id = 1 AND log1.add_time BETWEEN 1732032000 AND 1732118399 AND excluded.customer_id IS NULL
GROUP BY log1.customer_id
HAVING COUNT(*) = 1
) AS filter_ids ON c.id = filter_ids.customer_id
GROUP BY c.flag LIMIT 0, 30;
LEFT JOIN 执行时,MySQL 会先对两个表进行连接。如果右表有较大数据集,可能会导致较慢的执行,尤其是当没有合适的索引时。
优化 LEFT JOIN
SELECT c.flag,COUNT(*) AS mcount
FROM project_customer c
INNER JOIN (
//获取期间内来访1次 & 不在之前来过的集合中
SELECT log1.customer_id
FROM reception_log log1
WHERE log1.project_id = 1 AND log1.add_time BETWEEN 1732032000 AND 1732118399
AND NOT EXISTS (
SELECT 1
FROM reception_log excluded
WHERE excluded.customer_id = log1.customer_id
AND excluded.project_id = 1
AND excluded.add_time < 1732032000
GROUP BY customer_id
)
GROUP BY log1.customer_id
HAVING COUNT(*) = 1
) AS filter_ids ON c.id = filter_ids.customer_id
GROUP BY c.flag LIMIT 0, 30;
语句语法区别
- NOT EXISTS:用于避免某些值出现在另一表中,特别是当右表有合适索引时。
- LEFT JOIN … IS NULL:当需要返回左表所有记录并且确认右表是否匹配时,适用。
- NOT IN:适用于子查询返回小量数据且不包含 NULL 时,但其性能和可靠性不如 NOT EXISTS。
NOT EXISTS 在多数情况下性能最好,特别是在子查询只返回少量数据时,MySQL 会在找到第一个匹配的行后立即停止扫描。因此,对于需要排除某些记录的查询,NOT EXISTS 通常比 LEFT JOIN … IS NULL 和 NOT IN 更高效。