LeetCode 182- Duplicate Emails (SQL & Python) Solutions

This article is going to provide the solution to LeetCode 182 Duplicate Emails. It’s an easy difficulty question.

Write a solution to report all the duplicate emails. Note that it’s guaranteed that the email field is not NULL.

Return the result table in any order.

The result format is in the following example.

LeetCode Question Link

If you want to skip the article, check out the full YouTube video where I code the answer live.

LeetCode 182 SQL Solution

with tot_emails as (
    Select email, count(*) as email_count
    from Person
    group by email
)
Select email
from tot_emails
where email_count > 1

Within the CTE, it selects the email column from the Person table and counts the occurrences of each unique email using the COUNT(*) function. The result is aliased as email_count. The GROUP BY email clause groups the results by the email column.

The main query selects the email column from the tot_emails CTE. The WHERE clause filters the results to only include rows where the email_count is greater than 1, meaning that the email appears more than once in the Person table.

LeetCode 182 Python Solution

import pandas as pd
def duplicate_emails(person: pd.DataFrame) -> pd.DataFrame:
    counts = person.groupby('email').size().reset_index()
    counts.columns = ['email', 'count']
    duplicated_emails = counts[counts['count'] > 1]['email']
    return pd.DataFrame({'email': duplicated_emails})

LeetCode gives us a function named duplicate_emails that takes a pandas DataFrame named person as input and returns another pandas DataFrame.

The first line in the function groups the DataFrame person by the email column and counts the occurrences of each unique email. The result is reset_index() so that the grouped column becomes a regular column in the resulting DataFrame.

Next rename the columns of the counts DataFrame to email and count for clarity.

Filter the counts DataFrame to include only rows where the count of emails is greater than 1, indicating duplicates. It then selects only the email column.

Finally return a dataframe with a column named email which is populated by the duplicated emails.

Leave a Reply

Your email address will not be published. Required fields are marked *