LeetCode 182- Duplicate Emails (SQL & Python) Solutions
This article is going to provide the solution to LeetCode 182 Duplicate Emails. It’s an easy difficulty question.
Write a solution to report all the duplicate emails. Note that it’s guaranteed that the email field is not NULL.
Return the result table in any order.
The result format is in the following example.
If you want to skip the article, check out the full YouTube video where I code the answer live.
LeetCode 182 SQL Solution
with tot_emails as ( Select email, count(*) as email_count from Person group by email)Select emailfrom tot_emailswhere email_count > 1
Within the CTE, it selects the email column from the Person table and counts the occurrences of each unique email using the COUNT(*) function. The result is aliased as email_count. The GROUP BY email clause groups the results by the email column.
The main query selects the email column from the tot_emails CTE. The WHERE clause filters the results to only include rows where the email_count is greater than 1, meaning that the email appears more than once in the Person table.
LeetCode 182 Python Solution
import pandas as pddef duplicate_emails(person: pd.DataFrame) -> pd.DataFrame: counts = person.groupby('email').size().reset_index() counts.columns = ['email', 'count'] duplicated_emails = counts[counts['count'] > 1]['email'] return pd.DataFrame({'email': duplicated_emails})
LeetCode gives us a function named duplicate_emails that takes a pandas DataFrame named person as input and returns another pandas DataFrame.
The first line in the function groups the DataFrame person by the email column and counts the occurrences of each unique email. The result is reset_index() so that the grouped column becomes a regular column in the resulting DataFrame.
Next rename the columns of the counts DataFrame to email and count for clarity.
Filter the counts DataFrame to include only rows where the count of emails is greater than 1, indicating duplicates. It then selects only the email column.
Finally return a dataframe with a column named email which is populated by the duplicated emails.