Skip to content

Aayush-Basnet/31-Day-of-Data-Analytic-Project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

50 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

31-Day-of-Data-Analytic-Project


Day Project Tool Used
1 BlinkIt Sales Analysis SQL
2 IMDB Movie Analysis Python
3 Analysis of Highest Paying IT Jobs In India Python
4 Understanding of Excel Excel
5 World Mining Commodities SQL
6 Vrinda Store Sales Analysis Excel & SQL
7 Pizza Sales Analysis PowerBI & SQL
8 Customer Churn Analysis Python
9 BlinkIt Grocery Sales Analysis Excel
10 Electric Vehicle Data Analysis Tableau
11 Automobile Pricing Analysis Python(Data Wrangling, Model Development)
12 Diwali Sales Analysis Excel, SQL, Python, Power BI
13 Laptop Pricing Analysis Python(EDA, Model Development)
14 Model Evaluation and Refinement Python(Model Evaluation, Over-fitting, Under-fitting and Model Selection, Ridge Regression,Grid Search)
15 Heart Failure Prediction Machine Learning (Random Forest, Desicion Tree, SVM, K-nearest Neighbor, Naive Bayes)

Day 1: BlinkIt E-commerce Sales Analysis

Shop online for groceries and get your order delivered at your doorstep in minutes. Enjoy instant delivery with blinkit. Blinkit primarily delivers groceries, fresh fruits, vegetables, meat, stationery, bakery items, personal care, baby care and pet care products, snacks, flowers, etc.

alt text

Features

The data is available in 8 csv files:

  • customers.csv
  • orderdetails.csv
  • orders.csv

Objectives

This project focused on addressing key business challenges for Blinkit, including revenue optimization, customer retention, and operational efficiency. The main goals were:

  1. Revenue Insights: Pinpoint top-performing products and cities generating the highest revenue.
  2. Customer Analysis: Understand customer behavior by ranking customers based on spending and identifying inactive customers.
  3. Order Trends: Uncover patterns in order statuses (delivered, canceled) and customer behavior, like placing multiple orders in a day.
  4. Operational Insights: Optimize service delivery and improve operational efficiency by analyzing location-specific purchase data.

Dataset Schema

alt text


Day 2: IMDB Movie Analysis

IMDb (an initialism for Internet Movie Database) is an online database of information related to films, television series, podcasts, home videos, video games, and streaming content online – including cast, production crew and personal biographies, plot summaries, trivia, ratings, and fan and critical reviews.

alt text

Dataset

The dataset used in this analysis can be found in this repository files : imdb_movies.csv.

Objective

This project analyzes the IMDB Movie Dataset to uncover trends, popular genres, and factors influencing movie success. Using Python and libraries like Pandas, Numpy, Matplotlib and Seaborn, the analysis delivers actionable insights through comprehensive data exploration and visualization.

Task Covered

The task description of this analysis can be found in this repository files : Task Covered.

  1. Project Setup and Data Loading
  2. Data Overview and Cleaning
  3. Univariate and Bivariate Analysis
  4. Genre-Specific Analysis
  5. Yearly and Decadal Trends
  6. Insights and Summary
  7. Documentation

Day 3: Analyis of Highest Paying IT Jobs in India

This project anlayzed the highest paying IT jobs in India, focusing on job positions, location, salaries, education, and experience levels. It provides insights into salary trends, correltations and distributions, helping to understand the IT sector's job market and the factors influencing compensation.

Dataset

The dataset used in this analysis can be found in this repository file: position_salary.csv

Objective

This project provides key insights into the IT job market in India, analyzing factors like job positions, salary trends, and the impact of education and experience on compensation. It helps professionals understand what roles offer the highest pay and the essential skills required for those positions. The analysis will focus on job positions, locations, salaries, educational qualifications, and experience levels.

Task Covered

  1. Project Setup and Data Loading
  2. Exploratory Data Analysis
  3. Data Cleaning
  4. Solved Basic Level Quesitons and Interpretation
  5. Solved Intermediate Level Question and Interpretation

Day 4: Understanding of Excel

Microsoft Excel is a spreadsheet editor developed by Microsoft. alt text

This section content the understanding of Excel and its capability. (PS: I used my own created data for understanding and using of different excel formulas)

  • Excel Formula: SUM, SUMIF, SUMIFs, COUNT, COUNTIF, COUNTIFs, MAX, MIN, CONCATENATE, RIGHT, LEFT, LEN, NETWORKDAYS
  • Condition Formula: IF, IFs,Condition Formatting
  • Date Formattion; Date-to-Text, Text-to-Date
  • XLOOKUP & VLOOKUP
  • Pivot Table

Day 5: World Mining Commodities

This project is a comprehensive SQL-based analysis of the mining industry ideal for beginners and intermediate learners. This project is centered around analyzing them mining industry using real-world dataset from the World Mining Commodities dataset repository. It involoes exploring the mining companies, their operaitonal data, and country-level mining statistics to generate insights and develop SQL proficiency. alt text

Objective

To analyze global mining trends, understand commodity production levels, and evaluate company performance using SQL.

Dataset

The dataset used in this analysis can be found in this repository file: i. 116_world_mining_companies_clean.csv ii. world_mining_commodities_clean.csv


Day 6: Vrinda Store Sales Anlaysis (Excel & SQL)

I analyzed 30,000+ records from Vrinda Store, an online clothing retailer to develop a data-driven annual sales report. This project present insights from the analysis of Vrinda Store's 2022 data, integrating sales performance, customer demographics, and operational metrics. alt text

Dataset

The dataset used in this analysis can be found in this repository file: Vrinda Store Data Analysis Excel.xlsx

Objective

The objective is to create a comprehensive annual sales report to guide strategic decisions and enhance sales in 2023.

Steps and Method:

πŸ”ΉData Cleaning and Preprocessing: Removed duplicates, corrected errors, structured sales data for analysis πŸ”ΉData Analysis: Identified sales trends, top-performing products, and customer preference πŸ”ΉData Visualization & Reports: Created interactive dashboard using hashtag#Excel (pivot tables, charts, & advanced formulas) πŸ”ΉSQL (CTE, Group By, Aggregate Functions, Rank Functions) πŸ”ΉInsights & Documentation: Provided actionable insights for data-driven decision-making

Key Insight

  1. Adults are the largest consumer group, followed by young and the elderly ones. alt text
  2. Sales decline month over month, singaling a need for proactive stategies.
  3. Amazon tops the charts as the best-selling partner.
  4. Maharashtra emerges as the top-performing state with sales of β‚Ή2,990,221.
  5. March recorded the highest revenue of β‚Ή1,928,066 with 2,819 orders, indicating seasonal trends.
  6. Order Status: 28,641 orders delivered successfully, while only 844 were cancelled and 517 were refunded, showcasing high delivery efficiency alt text

Day 7: Pizza Sales Analysis (PowerBI & SQL)

This project involves leveraging SQL queries & building PowerBI Dashboard to analyze 𝐩𝐒𝐳𝐳𝐚 𝐬𝐚π₯𝐞𝐬 data, aiming to uncover valuable insights that drive strategic decision-making and optimize business operations.

alt text

Dataset

The dataset used in this analysis can be found in this repository file: pizza_sales.csv

πŽπ›π£πžπœπ­π’π―πž:-
The objective of this project is to analyze 𝐩𝐒𝐳𝐳𝐚 πŸ• 𝐬𝐚π₯𝐞𝐬 data to identify trends and provide actionable insights that can help to increase sales and aim to uncover key metrics and patterns within the sales data by leveraging SQL queries & by building PowerBI Dashboard

alt text

Key Highlights:

  • Total revenue: $817860.05
  • Expensive pizza: The Greek pizza
  • Top Category: Classic
  • Busiest Hour: 12:00
  • Average Order Value: $17
  • Highest Revenue Contribution: Classic Pizza

Insights:

  • Large pizzas are the most commonly ordered size, accounting for πŸπŸ–πŸ“πŸπŸ” orders.
  • The Classic Deluxe Pizza is the most ordered type, totaling πŸπŸ’πŸ“πŸ‘ orders.
  • Classic pizza dominates total revenue with a contribution of πŸπŸ”.πŸ—πŸ%.
  • The π“π‘πšπ’ π‚π‘π’πœπ€πžπ§ 𝐏𝐒𝐳𝐳𝐚 emerges as a top revenue generator.
  • π€π―πžπ«πšπ πž 𝐩𝐒𝐳𝐳𝐚 sales stand at πŸπŸ‘πŸ–.
  • The Thai Chicken Pizza ($43434.25), Barbecue Chicken Pizza ($42768) and California Chicken Pizza ($41409.5) generate the highest revenue.

Day 8: Techno Customer Churn Analysis(Python)

Objective:

The analysis explores customer churn patterns, focusing on various factors such as payment methods, contract types, tenure, and demographic attributes. The goal is to identify which factors are most strongly associated with higher churn rates to guide customer retention strategies.

Dataset

The dataset used in this analysis can be found in this repository file: Customer Churn.csv

Key Insights & Findings:

● Contract Type and Churn: - Customers on month-to-month contracts exhibit the highest churn rate, with 42% of such customers likely to churn. - In contrast, customers on one-year and two-year contracts have churn rates of 11% and 3%, respectively. - Implication: Longer contract periods serve as a strong retention tool, as customers with extended commitments are far less likely to leave. alt text

● Payment Methods and Churn:

  • Customers paying via electronic checks show the highest churn rate at 45%, while those using credit cards, bank transfers, or mailed checks have significantly lower churn rates, averaging around 15-18%.
  • Implication: The convenience, security, and trust issues related to electronic payments might be contributing factors. Encouraging customers to switch to more stable payment methods could reduce churn. alt text

● Churn by Tenure:

  • Customers with less than one year of tenure are the most likely to churn, with a 50% churn rate. Those with 1-3 years of tenure show a decreasing churn trend at 35%, while customers who have been with the company for more than three years have a churn rate of just 15%.
  • Implication: Engaging customers early in their journey, especially within the first year, is critical for retention.

● Churny Internet Service Type:

  • Customers using Fiber Optic services show a higher churn rate of 30%, compared to DSL customers with a churn rate of 20%.
  • Implication: This could be due to increased competition or dissatisfaction with service quality. Understanding customer satisfaction with service speed and reliability may help retain fiber optic users.

● Senior Citizens and Churn:

  • The analysis reveals that senior citizens (aged 65+) have a churn rate of 41%, compared to a 26% churn rate among non-senior citizens.
  • Implication: Special retention programs and targeted customer service for senior customers may help reduce churn in this demographic. alt text

Recommendations:

- PromoteLong-Term Contracts: Offer incentives for customers to commit to longer contracts to reduce churn. 
- AddressPayment Method Concerns: Implement campaigns encouraging customers to switch from electronic checks to more reliable payment methods. 
- CustomerEngagement in Early Tenure: Focus on improving the customer experience within the first year, as churn is highest in this period. 
- Special Senior Citizen Retention Programs: Create personalized offers or assistance programs to retain the senior demographic.

alt text


Day 9: Blinkit Grocery Store Sales Analysis (Excel)

Using Excel, I designed a dynamic and interactive dashboard providing a deep dive into Blinkit's Grocery sales data. Here's what the dashboard highlights: alt text

Excel Worksheet: BlinkIT Grocery Data Excel

Read Documentation: Blinkit Grocery Sales Analysis Documentation

Key Insights and Recommendations

  • Focus on High-Selling Categories: Increase marketing efforts for Fruits and Vegetables and Snack Foods, which together contribute nearly 30% of total sales.
  • Expand Regular Fat Content Offerings: As this category constitutes 64.6% of total sales, diversifying products within this segment can further boost revenue.
  • Tier 1 Locations and High-Size Outlets: Invest in these high-performing areas and outlet types to maximize returns.
  • Stabilize Seasonal Trends: Identify reasons behind sales fluctuations and implement strategies like discounts or promotional campaigns to sustain consistent revenue across all years.
  • Customer Retention: Maintain the high average customer rating of 4.0 by focusing on quality and timely delivery, ensuring continued customer satisfaction.

Day 10: Electric Vehicle Data Analysis

The is an in-depth analysis of electric vehicle (EV) adoption across different dimensions, including EV types, vehicle make, model, and state-wise distribution. With 149,771 total EVs(2011 - 2023), the dataset highlights significant trends and insights about the growing EV market.

alt text

Dataset

Download Dataset from kaggle: Dataset

Key Insights and Market Trends

  • Battery Electric Vehicles (BEVs) account for 116,474 units, making up 77.8% of the total EVs whereas Plug-in Hybrid Electric Vehicles (PHEVs) contribute 33,297 units or 22.2% of the total EVs.
  • EV adoption has grown significantly since 2011. Total vehicle registrations rose from 0.8K in 2011 to 37.1K in 2023, representing a 4,538% increase. alt text
  • Tesla dominates the market with 68,939 vehicles (57.6%) of the total EVs. alt text
  • 41.87% (62,711 vehicles) are CAFV eligible, and 11.77% (17,634 vehicles) are not CAFV eligible, indicating room for improvement in meeting CAFV standards.

Recommendations

  • Diversify Model Offerings:
    • Encourage other manufacturers to innovate and compete with Tesla to diversify market options.
    • Promote more affordable models for increased accessibility.
  • Range Optimization:
    • Invest in R&D to improve EV range and address consumer concerns about distance limitations.
    • Highlight vehicles with exceptional range to attract new customers.

Day 11: Automobiles Pricing Analysis (Python: Data Wrangling & Model Development)

Part 1: Data Wrangling

I used data wrangling to covert raw data from initial format to a format that may be better for analysis and future model development.

Objectives:

  • Handling missing values
  • Correct data formatting
  • Normalize data

Table of Contents

  • Identify missing values
    • Identify missing values
    • Deal with missing values
    • Correct data format
  • Data Normalization (centering/scaling)
  • Binning
  • Indicator Variable

Dataset: Dowload Dataset here

Part 2: Model Development

In this task, I developed several models that predicted the price of the car using the variables or features. This is just an estimate but should give us an objective idea of how much the car should cost. A model will help us understand the exact relationship between different variables and how these variables are used to predict the result.

Objectives:

  • Develop prediction models

Table of Contents

  • Linear Regression and Multiple Linear Regression
  • Model Evaluation using Visualization
  • Polynomial Regression
  • Pipeline
  • Measure for In-Sample Evaluation
  • Prediction and Decision Making

About

No description or website provided.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published