Day | Project | Tool Used |
---|---|---|
1 | BlinkIt Sales Analysis | SQL |
2 | IMDB Movie Analysis | Python |
3 | Analysis of Highest Paying IT Jobs In India | Python |
4 | Understanding of Excel | Excel |
5 | World Mining Commodities | SQL |
6 | Vrinda Store Sales Analysis | Excel & SQL |
7 | Pizza Sales Analysis | PowerBI & SQL |
8 | Customer Churn Analysis | Python |
9 | BlinkIt Grocery Sales Analysis | Excel |
10 | Electric Vehicle Data Analysis | Tableau |
11 | Automobile Pricing Analysis | Python(Data Wrangling, Model Development) |
12 | Diwali Sales Analysis | Excel, SQL, Python, Power BI |
13 | Laptop Pricing Analysis | Python(EDA, Model Development) |
14 | Model Evaluation and Refinement | Python(Model Evaluation, Over-fitting, Under-fitting and Model Selection, Ridge Regression,Grid Search) |
15 | Heart Failure Prediction | Machine Learning (Random Forest, Desicion Tree, SVM, K-nearest Neighbor, Naive Bayes) |
Shop online for groceries and get your order delivered at your doorstep in minutes. Enjoy instant delivery with blinkit. Blinkit primarily delivers groceries, fresh fruits, vegetables, meat, stationery, bakery items, personal care, baby care and pet care products, snacks, flowers, etc.
The data is available in 8 csv files:
- customers.csv
- orderdetails.csv
- orders.csv
This project focused on addressing key business challenges for Blinkit, including revenue optimization, customer retention, and operational efficiency. The main goals were:
- Revenue Insights: Pinpoint top-performing products and cities generating the highest revenue.
- Customer Analysis: Understand customer behavior by ranking customers based on spending and identifying inactive customers.
- Order Trends: Uncover patterns in order statuses (delivered, canceled) and customer behavior, like placing multiple orders in a day.
- Operational Insights: Optimize service delivery and improve operational efficiency by analyzing location-specific purchase data.
IMDb (an initialism for Internet Movie Database) is an online database of information related to films, television series, podcasts, home videos, video games, and streaming content online β including cast, production crew and personal biographies, plot summaries, trivia, ratings, and fan and critical reviews.
The dataset used in this analysis can be found in this repository files : imdb_movies.csv.
This project analyzes the IMDB Movie Dataset to uncover trends, popular genres, and factors influencing movie success. Using Python and libraries like Pandas, Numpy, Matplotlib and Seaborn, the analysis delivers actionable insights through comprehensive data exploration and visualization.
The task description of this analysis can be found in this repository files : Task Covered.
- Project Setup and Data Loading
- Data Overview and Cleaning
- Univariate and Bivariate Analysis
- Genre-Specific Analysis
- Yearly and Decadal Trends
- Insights and Summary
- Documentation
This project anlayzed the highest paying IT jobs in India, focusing on job positions, location, salaries, education, and experience levels. It provides insights into salary trends, correltations and distributions, helping to understand the IT sector's job market and the factors influencing compensation.
The dataset used in this analysis can be found in this repository file: position_salary.csv
This project provides key insights into the IT job market in India, analyzing factors like job positions, salary trends, and the impact of education and experience on compensation. It helps professionals understand what roles offer the highest pay and the essential skills required for those positions. The analysis will focus on job positions, locations, salaries, educational qualifications, and experience levels.
- Project Setup and Data Loading
- Exploratory Data Analysis
- Data Cleaning
- Solved Basic Level Quesitons and Interpretation
- Solved Intermediate Level Question and Interpretation
Microsoft Excel is a spreadsheet editor developed by Microsoft.
This section content the understanding of Excel and its capability. (PS: I used my own created data for understanding and using of different excel formulas)
- Excel Formula: SUM, SUMIF, SUMIFs, COUNT, COUNTIF, COUNTIFs, MAX, MIN, CONCATENATE, RIGHT, LEFT, LEN, NETWORKDAYS
- Condition Formula: IF, IFs,Condition Formatting
- Date Formattion; Date-to-Text, Text-to-Date
- XLOOKUP & VLOOKUP
- Pivot Table
This project is a comprehensive SQL-based analysis of the mining industry ideal for beginners and intermediate learners. This project is centered around analyzing them mining industry using real-world dataset from the World Mining Commodities dataset repository. It involoes exploring the mining companies, their operaitonal data, and country-level mining statistics to generate insights and develop SQL proficiency.
To analyze global mining trends, understand commodity production levels, and evaluate company performance using SQL.
The dataset used in this analysis can be found in this repository file: i. 116_world_mining_companies_clean.csv ii. world_mining_commodities_clean.csv
I analyzed 30,000+ records from Vrinda Store, an online clothing retailer to develop a data-driven annual sales report. This project present insights from the analysis of Vrinda Store's 2022 data, integrating sales performance, customer demographics, and operational metrics.
The dataset used in this analysis can be found in this repository file: Vrinda Store Data Analysis Excel.xlsx
The objective is to create a comprehensive annual sales report to guide strategic decisions and enhance sales in 2023.
πΉData Cleaning and Preprocessing: Removed duplicates, corrected errors, structured sales data for analysis πΉData Analysis: Identified sales trends, top-performing products, and customer preference πΉData Visualization & Reports: Created interactive dashboard using hashtag#Excel (pivot tables, charts, & advanced formulas) πΉSQL (CTE, Group By, Aggregate Functions, Rank Functions) πΉInsights & Documentation: Provided actionable insights for data-driven decision-making
- Adults are the largest consumer group, followed by young and the elderly ones.
- Sales decline month over month, singaling a need for proactive stategies.
- Amazon tops the charts as the best-selling partner.
- Maharashtra emerges as the top-performing state with sales of βΉ2,990,221.
- March recorded the highest revenue of βΉ1,928,066 with 2,819 orders, indicating seasonal trends.
- Order Status: 28,641 orders delivered successfully, while only 844 were cancelled and 517 were refunded, showcasing high delivery efficiency
This project involves leveraging SQL queries & building PowerBI Dashboard to analyze π©π’π³π³π π¬ππ₯ππ¬ data, aiming to uncover valuable insights that drive strategic decision-making and optimize business operations.
The dataset used in this analysis can be found in this repository file: pizza_sales.csv
πππ£ππππ’π―π:-
The objective of this project is to analyze π©π’π³π³π π π¬ππ₯ππ¬ data to identify trends and provide actionable insights that can help to increase sales and aim to uncover key metrics and patterns within the sales data by leveraging SQL queries & by building PowerBI Dashboard
- Total revenue: $817860.05
- Expensive pizza: The Greek pizza
- Top Category: Classic
- Busiest Hour: 12:00
- Average Order Value: $17
- Highest Revenue Contribution: Classic Pizza
- Large pizzas are the most commonly ordered size, accounting for πππππ orders.
- The Classic Deluxe Pizza is the most ordered type, totaling ππππ orders.
- Classic pizza dominates total revenue with a contribution of ππ.ππ%.
- The ππ‘ππ’ ππ‘π’ππ€ππ§ ππ’π³π³π emerges as a top revenue generator.
- ππ―ππ«ππ π π©π’π³π³π sales stand at πππ.
- The Thai Chicken Pizza ($43434.25), Barbecue Chicken Pizza ($42768) and California Chicken Pizza ($41409.5) generate the highest revenue.
The analysis explores customer churn patterns, focusing on various factors such as payment methods, contract types, tenure, and demographic attributes. The goal is to identify which factors are most strongly associated with higher churn rates to guide customer retention strategies.
The dataset used in this analysis can be found in this repository file: Customer Churn.csv
β Contract Type and Churn: - Customers on month-to-month contracts exhibit the highest churn rate, with 42% of such customers likely to churn. - In contrast, customers on one-year and two-year contracts have churn rates of 11% and 3%, respectively. - Implication: Longer contract periods serve as a strong retention tool, as customers with extended commitments are far less likely to leave.
β Payment Methods and Churn:
- Customers paying via electronic checks show the highest churn rate at 45%, while those using credit cards, bank transfers, or mailed checks have significantly lower churn rates, averaging around 15-18%.
- Implication: The convenience, security, and trust issues related to electronic payments might be contributing factors. Encouraging customers to switch to more stable payment methods could reduce churn.
β Churn by Tenure:
- Customers with less than one year of tenure are the most likely to churn, with a 50% churn rate. Those with 1-3 years of tenure show a decreasing churn trend at 35%, while customers who have been with the company for more than three years have a churn rate of just 15%.
- Implication: Engaging customers early in their journey, especially within the first year, is critical for retention.
β Churny Internet Service Type:
- Customers using Fiber Optic services show a higher churn rate of 30%, compared to DSL customers with a churn rate of 20%.
- Implication: This could be due to increased competition or dissatisfaction with service quality. Understanding customer satisfaction with service speed and reliability may help retain fiber optic users.
β Senior Citizens and Churn:
- The analysis reveals that senior citizens (aged 65+) have a churn rate of 41%, compared to a 26% churn rate among non-senior citizens.
- Implication: Special retention programs and targeted customer service for senior customers may help reduce churn in this demographic.
- PromoteLong-Term Contracts: Offer incentives for customers to commit to longer contracts to reduce churn.
- AddressPayment Method Concerns: Implement campaigns encouraging customers to switch from electronic checks to more reliable payment methods.
- CustomerEngagement in Early Tenure: Focus on improving the customer experience within the first year, as churn is highest in this period.
- Special Senior Citizen Retention Programs: Create personalized offers or assistance programs to retain the senior demographic.
Using Excel, I designed a dynamic and interactive dashboard providing a deep dive into Blinkit's Grocery sales data. Here's what the dashboard highlights:
Excel Worksheet: BlinkIT Grocery Data Excel
Read Documentation: Blinkit Grocery Sales Analysis Documentation
- Focus on High-Selling Categories: Increase marketing efforts for Fruits and Vegetables and Snack Foods, which together contribute nearly 30% of total sales.
- Expand Regular Fat Content Offerings: As this category constitutes 64.6% of total sales, diversifying products within this segment can further boost revenue.
- Tier 1 Locations and High-Size Outlets: Invest in these high-performing areas and outlet types to maximize returns.
- Stabilize Seasonal Trends: Identify reasons behind sales fluctuations and implement strategies like discounts or promotional campaigns to sustain consistent revenue across all years.
- Customer Retention: Maintain the high average customer rating of 4.0 by focusing on quality and timely delivery, ensuring continued customer satisfaction.
The is an in-depth analysis of electric vehicle (EV) adoption across different dimensions, including EV types, vehicle make, model, and state-wise distribution. With 149,771 total EVs(2011 - 2023), the dataset highlights significant trends and insights about the growing EV market.
Download Dataset from kaggle: Dataset
- Battery Electric Vehicles (BEVs) account for 116,474 units, making up 77.8% of the total EVs whereas Plug-in Hybrid Electric Vehicles (PHEVs) contribute 33,297 units or 22.2% of the total EVs.
- EV adoption has grown significantly since 2011. Total vehicle registrations rose from 0.8K in 2011 to 37.1K in 2023, representing a 4,538% increase.
- Tesla dominates the market with 68,939 vehicles (57.6%) of the total EVs.
- 41.87% (62,711 vehicles) are CAFV eligible, and 11.77% (17,634 vehicles) are not CAFV eligible, indicating room for improvement in meeting CAFV standards.
- Diversify Model Offerings:
- Encourage other manufacturers to innovate and compete with Tesla to diversify market options.
- Promote more affordable models for increased accessibility.
- Range Optimization:
- Invest in R&D to improve EV range and address consumer concerns about distance limitations.
- Highlight vehicles with exceptional range to attract new customers.
I used data wrangling to covert raw data from initial format to a format that may be better for analysis and future model development.
- Handling missing values
- Correct data formatting
- Normalize data
- Identify missing values
- Identify missing values
- Deal with missing values
- Correct data format
- Data Normalization (centering/scaling)
- Binning
- Indicator Variable
Dataset: Dowload Dataset here
In this task, I developed several models that predicted the price of the car using the variables or features. This is just an estimate but should give us an objective idea of how much the car should cost. A model will help us understand the exact relationship between different variables and how these variables are used to predict the result.
- Develop prediction models
- Linear Regression and Multiple Linear Regression
- Model Evaluation using Visualization
- Polynomial Regression
- Pipeline
- Measure for In-Sample Evaluation
- Prediction and Decision Making