Project Overview: Keeping Our Customers Hooked!
This section sets the stage for our data-driven journey. It explains the core challenge of customer churn, the ambition behind this project, and the comprehensive scope of our analysis to build a robust retention strategy for StreamVerse.
About This Project: Keeping Our Customers Hooked!
Ever wondered why some people cancel their streaming subscriptions while others stay loyal for years? That's the core question I set out to answer! For any subscription business like StreamVerse, customer churn (when customers leave) is a major challenge, directly impacting revenue and growth. My goal here was to dig into customer behavior data, spot those early warning signs of churn, and come up with smart, proactive ways to retain our valuable customers. It's all about turning raw data into actionable insights to boost customer lifetime value!
Project Scope: Our Data Detective Work
This project is a full, end-to-end journey in data analytics, focusing on customer data from StreamVerse. Here’s a peek into the detective work involved:
- Generating Realistic Data: I crafted synthetic, yet realistic, data covering customer demographics, subscription history, usage patterns, and even support interactions. The key was baking in "clues" about churn right from the start!
- Database Design: Setting up a structured relational database (PostgreSQL) to efficiently store and manage all this rich information.
- SQL Power: Using powerful SQL queries to clean, transform, and analyze the data, unearthing the key indicators that signal churn.
- KPI Tracking: Calculating vital metrics like overall churn rate, average subscription duration, and feature usage to quantify our findings.
- Strategic Solutions: Developing concrete business recommendations and a retention strategy directly from our data-driven insights.
Dashboard Summary: At a Glance
This section provides a high-level overview of StreamVerse's customer churn landscape. It presents the overall churn rate and average subscription duration, giving us a quick pulse on customer retention.
Overall Churn Rate
26.6%
of customers have churned.
Churn vs. Active Customers
Average Subscription Duration (Churned Users)
182 Days
Average lifespan before cancellation.
Data Generation & Database Setup
This section dives into the technical foundation of the project: how I generated the realistic dataset using Python and designed the robust PostgreSQL database schema to house our information.
Data Generation: Bringing StreamVerse to Life with Python
To simulate real-world scenarios, I used Python's `pandas` for data handling and `Faker` to generate synthetic, interconnected datasets. The critical step here was to build in realistic variations and correlations from the outset. For example, I intentionally made customers on the 'Premium' plan more likely to churn and linked higher churn to unresolved support tickets. This approach ensures our analysis yields genuine, actionable insights instead of uniform results.
Key Data Points Generated:
- `CUSTOMERS`: Demographics (age, country, gender), join date, and crucial `is_churned`/`churn_date` flags, with churn probability directly linked to their plan.
- `SUBSCRIPTIONS`: Details on subscription plans (`Basic`, `Standard`, `Premium`), monthly cost, and subscription duration.
- `USAGE_HISTORY`: Insights into `minutes_watched`, `favorite genres`, and `devices` used, with lower usage intentionally correlating with churn.
- `SUPPORT_TICKETS`: Information on customer issues, `resolution times`, and `resolution status`, where unresolved/long-resolution tickets correlate with churn.
All this data is saved into CSV files, ready for seamless database import!
Database Setup: PostgreSQL Powerhouse
With our data ready, the next step was building a reliable home for it. I designed a relational database schema in PostgreSQL to meticulously track every aspect of a customer's journey, ensuring data integrity and efficient querying.
Table Schemas:
- CUSTOMERS: `customer_id` (PK, INT), `age` (INT), `country` (VARCHAR), `gender` (VARCHAR), `is_churned` (BOOLEAN), `join_date` (DATE), `churn_date` (DATE)
- SUBSCRIPTIONS: `subscription_id` (PK, VARCHAR), `customer_id` (FK to `CUSTOMERS`), `plan_type` (VARCHAR), `monthly_cost` (DECIMAL), `subscription_start_date` (DATE), `subscription_end_date` (DATE)
- USAGE_HISTORY: `usage_id` (PK, INT), `customer_id` (FK to `CUSTOMERS`), `minutes_watched` (INT), `genre` (VARCHAR), `device` (VARCHAR)
- SUPPORT_TICKETS: `ticket_id` (PK, INT), `customer_id` (FK to `CUSTOMERS`), `ticket_date` (DATE), `issue_type` (VARCHAR), `resolution_time_days` (INT), `is_resolved` (BOOLEAN)
To Replicate the Database Setup:
- Install PostgreSQL: Ensure you have PostgreSQL installed and running.
- Create Database: Connect to `psql` (or your preferred client) and run:
CREATE DATABASE project_1;
- Connect to Database:
\c project_1;
- Create Tables: Execute the `CREATE TABLE` statements (as detailed above in "Table Schemas") for each table.
- Import Data: Run the Python data generation script (from your project files) to create the CSVs. Then, use the `COPY` commands (remember to replace `/path/to/your/csv_files/` with your actual path, e.g., `C:/Users/YourUser/Desktop/project_folder/`):
TRUNCATE TABLE customers CASCADE; -- Clear existing data safely and all related foreign key tables COPY customers FROM 'C:/Users/Dell/SQL project/customers.csv' WITH (FORMAT CSV, HEADER TRUE, DELIMITER ',', ENCODING 'UTF8'); COPY subscriptions FROM 'C:/Users/Dell/SQL project/subscriptions.csv' WITH (FORMAT CSV, HEADER TRUE, DELIMITER ',', ENCODING 'UTF8'); COPY usage_history FROM 'C:/Users/Dell/SQL project/usage_history.csv' WITH (FORMAT CSV, HEADER TRUE, DELIMITER ',', ENCODING 'UTF8'); COPY support_tickets FROM 'C:/Users/Dell/SQL project/support_tickets.csv' WITH (FORMAT CSV, HEADER TRUE, DELIMITER ',', ENCODING 'UTF8');
Key Analyses & Insights: What the Data is Whispering
This section showcases the heart of the project: the critical SQL queries I used to extract actionable insights from the StreamVerse data. Each analysis directly answers a business question, providing the SQL code, its sample output, a clear chart, and the key takeaway.
2. Churn Rate by Subscription Plan
Purpose: To identify if certain subscription plans are more susceptible to churn, revealing potential value mismatches.
SELECT
s.plan_type,
COUNT(c.customer_id) AS total_customers,
SUM(CASE WHEN c.is_churned = TRUE THEN 1 ELSE 0 END) AS churned_customers,
CAST(SUM(CASE WHEN c.is_churned = TRUE THEN 1 ELSE 0 END) AS REAL) / COUNT(c.customer_id) AS churn_rate_by_plan
FROM
CUSTOMERS c
JOIN
SUBSCRIPTIONS s ON c.customer_id = s.customer_id
GROUP BY
s.plan_type
ORDER BY
churn_rate_by_plan DESC;
Sample Output:
Plan Type | Total Customers | Churned Customers | Churn Rate |
---|---|---|---|
Premium | 3350 | 1206 | 0.3600 |
Standard | 3300 | 825 | 0.2500 |
Basic | 3350 | 502 | 0.1500 |
Insight: Clearly, Premium plan users churn at a significantly higher rate (36%) compared to Basic (15%) and Standard (25%). This strongly suggests a major issue with the Premium plan's value proposition.
3. Average Minutes Watched for Churned vs. Active Customers
Purpose: To uncover behavioral patterns related to content engagement, highlighting disengagement as a churn predictor.
SELECT
c.is_churned,
AVG(uh.minutes_watched) AS avg_minutes_watched
FROM
CUSTOMERS c
JOIN
USAGE_HISTORY uh ON c.customer_id = uh.customer_id
GROUP BY
c.is_churned;
Sample Output:
Is Churned | Avg. Minutes Watched |
---|---|
FALSE | 350.12 |
TRUE | 125.89 |
Insight: Churned customers watch significantly fewer minutes (avg 125.89) than active customers (avg 350.12)strong>. This is a strong indicator that disengagement precedes churn.
4. Churn Rate by Customer Demographics (Age Group)
Purpose: To understand if certain age groups are more prone to churn, informing targeted retention efforts.
SELECT
CASE
WHEN age BETWEEN 18 AND 24 THEN '18-24'
WHEN age BETWEEN 25 AND 34 THEN '25-34'
WHEN age BETWEEN 35 AND 44 THEN '35-44'
WHEN age BETWEEN 45 AND 54 THEN '45-54'
ELSE '55+'
END AS age_group,
COUNT(customer_id) AS total_customers,
CAST(SUM(CASE WHEN is_churned = TRUE THEN 1 ELSE 0 END) AS REAL) / COUNT(customer_id) AS churn_rate
FROM
CUSTOMERS
WHERE age IS NOT NULL
GROUP BY
age_group
ORDER BY
churn_rate DESC;
Sample Output:
Age Group | Total Customers | Churn Rate |
---|---|---|
55+ | 1500 | 0.32 |
45-54 | 2000 | 0.27 |
35-44 | 2500 | 0.25 |
25-34 | 2000 | 0.23 |
18-24 | 1800 | 0.20 |
Insight: The 55+ age group exhibits a slightly higher churn rate, suggesting potential challenges in catering to older demographics (e.g., content preferences, ease of use).
5. Most Watched Genres by Churned Customers
Purpose: To identify if specific content preferences or gaps correlate with churn, particularly for those who left the service.
SELECT
uh.genre,
AVG(uh.minutes_watched) AS avg_minutes_watched,
COUNT(uh.customer_id) AS total_sessions_churned
FROM
USAGE_HISTORY uh
JOIN
CUSTOMERS c ON uh.customer_id = c.customer_id
WHERE
c.is_churned = TRUE
GROUP BY
uh.genre
ORDER BY
avg_minutes_watched DESC;
Sample Output (for Churned Customers):
Genre | Avg. Minutes Watched | Total Sessions Churned |
---|---|---|
Action | 135.2 | 2500 |
Comedy | 130.5 | 2200 |
Sci-Fi | 128.0 | 1800 |
Drama | 120.1 | 2000 |
Documentary | 95.0 | 500 |
Insight: While churned customers watch less overall, their engagement with Documentary content is notably lower compared to other genres. This could indicate a content gap or dissatisfaction within this specific genre for those who decide to leave.
6. Churn Based on Device Usage
Purpose: To determine if a particular device type is associated with higher churn rates, pointing to potential UX/performance issues.
SELECT
uh.device,
COUNT(uh.customer_id) AS total_sessions,
CAST(SUM(CASE WHEN c.is_churned = TRUE THEN 1 ELSE 0 END) AS REAL) / COUNT(DISTINCT uh.customer_id) AS churn_rate_by_device
FROM
USAGE_HISTORY uh
JOIN
CUSTOMERS c ON uh.customer_id = c.customer_id
GROUP BY
uh.device
ORDER BY
churn_rate_by_device DESC;
Sample Output:
Device | Total Sessions | Churn Rate |
---|---|---|
Tablet | 12000 | 0.35 |
Mobile | 15000 | 0.28 |
Laptop | 10000 | 0.24 |
Smart TV | 13000 | 0.20 |
Insight: Users primarily watching on Tablet devices have a noticeably higher churn rate. This strongly suggests a potentially suboptimal user experience or technical issues specifically on the tablet platform.
7. Churn Rate Based on Support Ticket Resolution Time
Purpose: To understand the impact of customer service efficiency on churn rates.
SELECT
CASE
WHEN st.resolution_time_days <= 2 THEN 'Quickly Resolved (<3 days)'
WHEN st.resolution_time_days > 2 AND st.resolution_time_days <= 7 THEN 'Average Resolution (3-7 days)'
ELSE 'Slow Resolution (>7 days)'
END AS resolution_category,
COUNT(c.customer_id) AS total_customers_with_tickets,
CAST(SUM(CASE WHEN c.is_churned = TRUE THEN 1 ELSE 0 END) AS REAL) / COUNT(c.customer_id) AS churn_rate
FROM
CUSTOMERS c
JOIN
SUPPORT_TICKETS st ON c.customer_id = st.customer_id
WHERE
st.is_resolved = TRUE
GROUP BY
resolution_category
ORDER BY
churn_rate DESC;
Sample Output:
Resolution Category | Total Customers with Tickets | Churn Rate |
---|---|---|
Slow Resolution (>7 days) | 1000 | 0.45 |
Average Resolution (3-7 days) | 2500 | 0.28 |
Quickly Resolved (<3 days) | 1500 | 0.18 |
Insight: Customers whose issues take longer to resolve (>7 days) have a significantly higher churn rate (45%)strong>. This directly links poor customer service response times to customer attrition.
Driving Insights & Actionable Solutions
This section distills the key takeaways from our analysis and presents concrete, actionable strategies for StreamVerse to implement to reduce customer churn effectively. It's about turning data into real-world impact!
Driving Insights: What Our Data is Truly Telling Us
Our deep dive into StreamVerse's customer data paints a clear picture: churn isn't a mystery; it's a direct response to specific pain points. The data reveals that our churn problem stems from a blend of interconnected factors:
- Value Perception Mismatch: Especially for our Premium users, the perceived value isn't matching the higher cost, leading to dissatisfaction.
- Disengagement: Customers who stop actively using the service, especially watching content, are highly likely to leave.
- Customer Service Failures: Slow or unresolved issues create immense frustration, pushing customers away.
- Platform & Content Gaps: Specific device experiences (Tablets) and certain content genres (`Documentary`) might be underperforming, impacting retention.
By understanding these interconnected factors, we can move from reactive churn management to proactive retention strategies.
1. Optimize Premium Plan Value & Pricing
Problem Identified: High churn among Premium users (36% churn rate) suggests the perceived value doesn't justify the cost.
Strategic Approach (Customer Value Optimization): Re-evaluate Premium tier offerings and pricing to align with customer expectations and reduce churn by providing clear, superior value.
Key Actions:
- Introduce "Premium Lite": Offer a mid-tier plan with some premium features at a lower price point to capture price-sensitive Premium users before they churn completely.
- Enhance Exclusive Content: Invest in acquiring or producing truly unique, high-quality content specifically for Premium subscribers.
- Targeted Re-onboarding: Launch campaigns for new and existing Premium users highlighting exclusive features, early access, and benefits they might be overlooking.
Target Metrics:
- Reduce Premium churn rate by 10-15% within 6 months.
- Increase Premium feature utilization by 20%.
Key Stakeholders:
- Product Management, Marketing, Content Acquisition, Pricing Strategy Team.
2. Implement Proactive Re-engagement & Personalization
Problem Identified: Churned customers watch significantly less (avg 125.89 minutes vs 350.12 for active users), indicating disengagement is a strong precursor to churn.
Strategic Approach (Customer Engagement Lifecycle): Proactively identify and re-engage disengaging customers with personalized content to reignite interest and prevent churn.
Key Actions:
- Automated "We Miss You" Campaigns: Trigger personalized email/in-app messages when a user's `minutes_watched` drops below a defined threshold (e.g., 50% reduction over 30 days).
- Smart Content Recommendations: Leverage viewing history and genre preferences to suggest highly relevant content through push notifications and dynamic homepage sections.
- Interactive Content Discovery: Introduce quizzes or personalized content paths to make content discovery more engaging.
Target Metrics:
- Increase average minutes watched for at-risk segments by 15%.
- Improve re-engagement campaign open rates by 25% and click-through rates by 10%.
Key Stakeholders:
- Marketing, Product Development, Data Science (for recommendation algorithms).
3. Supercharge Customer Support Efficiency
Problem Identified: Customers with slow ticket resolution (45% churn rate for >7 days) are significantly more likely to churn.
Strategic Approach (Customer Experience Improvement): Streamline support processes and empower agents to deliver faster, more effective resolutions, directly impacting customer satisfaction and retention.
Key Actions:
- Prioritize At-Risk Tickets: Implement a system to automatically flag and prioritize support tickets from high-value or churn-prone customers.
- Enhanced Agent Tools: Provide agents with a 360-degree customer view, including subscription details, usage history, and past interactions, to reduce resolution time.
- Automated Follow-ups: Implement automated surveys or check-ins a few days after ticket resolution to ensure satisfaction and identify lingering issues.
Target Metrics:
- Reduce average ticket resolution time by 30%.
- Increase first-contact resolution rate by 15%.
- Decrease churn rate for customers who opened a ticket by 5-10%.
Key Stakeholders:
- Customer Support Leadership, Operations Team, IT/Software Development.
4. Polish Platform Experience & Fill Content Voids
Problem Identified: High churn among Tablet users (35% churn rate) and low engagement with `Documentary` content (avg 95.0 minutes for churned users) indicate platform and content issues.
Strategic Approach (Product & Content Excellence): Address specific user experience pain points and strategically enhance content offerings to improve overall platform satisfaction and engagement.
Key Actions:
- Conduct Tablet UX Audit: Collaborate with Product and Engineering teams to identify and fix bugs, performance issues, or design flaws specific to the tablet application.
- Strategic Content Investment: Allocate budget to acquire or produce new, high-quality `Documentary` content, focusing on popular sub-genres or trending topics to boost engagement.
- User Feedback Integration: Implement in-app feedback mechanisms for specific devices or content categories to gather direct user insights.
Target Metrics:
- Reduce Tablet user churn by 8-12% within 6 months.
- Increase `Documentary` genre watch time by 20% for all users.
- Improve app store ratings for the tablet app by 0.5 stars.
Key Stakeholders:
- Product Management, Engineering, Content Acquisition, UX/UI Design.
Lessons Learned: My Growth Journey
This section reflects on the key skills and insights gained throughout the StreamVerse churn analysis project, highlighting the personal and professional growth achieved.
This project was an incredible learning experience that truly solidified my understanding of the entire data analytics journey. It helped me grow in so many ways, both technically and in how I think about business problems.
From a Technical Lens:
- Data Simulation with Purpose: I learned to go beyond just getting data. I actively simulated realistic datasets with specific conditions, like varying churn rates based on customer plans and behaviors. This taught me how to construct data that tells a story and reveals insights, even when real data isn't available.
- Mastering PostgreSQL & SQL: This project was a deep dive into PostgreSQL. I gained hands-on experience in creating and managing database tables, defining relationships, and efficiently copying data from external sources. More importantly, I honed my SQL querying skills to extract complex information, track key metrics, and identify patterns that directly informed our analysis.
- Data Handling & Transformation: Working with Python and `pandas` for data generation also strengthened my ability to handle, clean, and prepare data for analysis, understanding the nuances of data types and structures.
From a Business Perspective:
- Translating Data into Actionable Solutions: The most rewarding part was learning how to bridge the gap between raw numbers and concrete business solutions. I practiced taking complex data insights and simplifying them into clear, implementable strategies.
- Problem-Solving, Step-by-Step: I developed a systematic approach to problem-solving, starting from identifying a business challenge (churn), diving into data to identify root causes (like high Premium churn or low engagement), and then crafting targeted solutions. This taught me to think critically about the "why" behind the data.
- Impactful Reporting: I learned how to structure findings and recommendations in a way that's easy for business stakeholders to understand and act upon, making the analysis truly valuable.
Overall, this project reinforced my passion for leveraging data to solve real-world business challenges. It showed me how to use technical skills like SQL and data simulation to ultimately help companies grow by building stronger customer relationships and making smarter decisions.