表:reception_log,add_time为接待时间,customer_id为客户id

表:project_customer,flag为客户类型,id为客户id

查询  时间段(2024-11-20)接待的首访客户

2024-11-20 00:00:00  -  2024-11-20 23:59:59
转时间戳:
1732032000  -  1732118399

方法1:

首访:在时间段(2024-11-20)内来过的客户中 & 在 2024-11-20 23:59:59 之前来访过=1次

SELECT c.flag,count(*) as mcount
FROM project_customer c
INNER JOIN (
    //在时间段(2024-11-20)来过
    SELECT log1.customer_id
    FROM reception_log AS log1
    RIGHT JOIN (
        //获取之前来过1次的客户id集合
        SELECT customer_id
        FROM reception_log
        WHERE add_time<=1732118399 and project_id=1
        GROUP BY customer_id
        HAVING COUNT(*) = 1
    )
    AS log2
    ON log1.customer_id = log2.customer_id
    WHERE log1.add_time>=1732032000 and log1.add_time<=1732118399 and log1.project_id=1
)
AS filter_ids ON c.id = filter_ids.customer_id
group by c.flag limit 0,30

方法2:
首访:在时间段(2024-11-20)内来过的客户中来过1次 & 在2024-11-20 00:00:00之前无记录

SELECT c.flag,COUNT(*) AS mcount
FROM project_customer c
INNER JOIN (
    //获取期间内来访1次 & 不在之前来过的集合中
    SELECT log1.customer_id
    FROM reception_log log1
    WHERE log1.project_id = 1 AND log1.add_time BETWEEN 1732032000 AND 1732118399
    GROUP BY log1.customer_id HAVING COUNT(*) = 1
    AND log1.customer_id
    NOT IN (
        //获取之前来过客户id集合
        SELECT DISTINCT log2.customer_id
        FROM reception_log log2
        WHERE log2.project_id = 1 AND log2.add_time < 1732032000
    )
) AS filter_ids ON c.id = filter_ids.customer_id
GROUP BY c.flag LIMIT 0, 30;

NOT IN 会检查左表的每一行值是否出现在右表的子查询结果中。若出现,就返回 FALSE,若不出现,则返回 TRUE。然而,如果子查询结果中存在 NULL 值,NOT IN 会导致所有的记录都被排除,因为 NULL 和任何值的比较结果都是 UNKNOWN(不成立)。

优化 NOT IN

SELECT c.flag,COUNT(*) AS mcount
FROM project_customer c
INNER JOIN (
    //获取期间内来访1次 & 不在之前来过的集合中
    SELECT log1.customer_id
    FROM reception_log log1
    LEFT JOIN (
        SELECT customer_id
        FROM reception_log
        WHERE project_id = 1 AND add_time < 1732032000
        GROUP BY customer_id
    ) AS excluded ON log1.customer_id = excluded.customer_id
    WHERE log1.project_id = 1 AND log1.add_time BETWEEN 1732032000 AND 1732118399 AND excluded.customer_id IS NULL
    GROUP BY log1.customer_id
    HAVING COUNT(*) = 1
) AS filter_ids ON c.id = filter_ids.customer_id
GROUP BY c.flag LIMIT 0, 30;

LEFT JOIN 执行时,MySQL 会先对两个表进行连接。如果右表有较大数据集,可能会导致较慢的执行,尤其是当没有合适的索引时。

优化 LEFT JOIN

SELECT c.flag,COUNT(*) AS mcount
FROM project_customer c
INNER JOIN (
    //获取期间内来访1次 & 不在之前来过的集合中
    SELECT log1.customer_id
    FROM reception_log log1
    WHERE log1.project_id = 1 AND log1.add_time BETWEEN 1732032000 AND 1732118399
    AND  NOT EXISTS (
        SELECT 1
        FROM reception_log excluded
        WHERE excluded.customer_id = log1.customer_id
        AND excluded.project_id = 1
        AND excluded.add_time < 1732032000
        GROUP BY customer_id
    )
    GROUP BY log1.customer_id
    HAVING COUNT(*) = 1
) AS filter_ids ON c.id = filter_ids.customer_id
GROUP BY c.flag LIMIT 0, 30;

语句语法区别

  • NOT EXISTS:用于避免某些值出现在另一表中,特别是当右表有合适索引时。
  • LEFT JOIN … IS NULL:当需要返回左表所有记录并且确认右表是否匹配时,适用。
  • NOT IN:适用于子查询返回小量数据且不包含 NULL 时,但其性能和可靠性不如 NOT EXISTS。

NOT EXISTS 在多数情况下性能最好,特别是在子查询只返回少量数据时,MySQL 会在找到第一个匹配的行后立即停止扫描。因此,对于需要排除某些记录的查询,NOT EXISTS 通常比 LEFT JOIN … IS NULL 和 NOT IN 更高效。

作者 admin

百度广告效果展示