Crowdfunding platforms like Kickstarter and Indiegogo have been growing in success and popularity since the late 2000s. From independent content creators to famous celebrities, more and more people are using crowdfunding to launch new products and generate buzz, but not every project has found success.
To receive funding, a project must meet or exceed its initial goal, so many organizations dedicate considerable resources to looking through old projects in an attempt to discover “the trick” to finding success. For this week's Challenge, you will organize and analyze a database of 1,000 sample projects to uncover any hidden trends.
A table contains a database of 1,000 sample crowdfunding projects.
Using the Excel workbook in your .zip file, modify and analyze the sample-project data and try to uncover market trends.
This dataset was generated by edX Boot Camps LLC and is intended for educational purposes only.
Use conditional formatting to fill each cell in the outcome column with a different color, depending on whether the associated campaign was successful, failed, canceled, or is currently live.
Create a new column called Percent Funded that uses a formula to find how much money a campaign made relative to its initial funding goal. Use conditional formatting to fill each cell in the Percent Funded column according to a three-color scale. The scale should start at 0 with a dark shade of red, and it should transition to green at 100 and blue at 200.
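As a rough cross-check of the Percent Funded formula (the pledged amount divided by the goal, multiplied by 100), here is a minimal Python sketch. The function names and the exact RGB values chosen for dark red, green, and blue are assumptions for illustration; Excel's color-scale dialog lets you pick your own shades. The sketch blends linearly between the three stops, which approximates how a three-color scale shades values between its minimum, midpoint, and maximum.

```python
def percent_funded(pledged: float, goal: float) -> float:
    """Return the pledged amount as a percentage of the funding goal."""
    if goal <= 0:
        raise ValueError("goal must be positive")
    return pledged / goal * 100


# Hypothetical RGB stops for the 0 / 100 / 200 three-color scale.
DARK_RED = (139, 0, 0)
GREEN = (0, 128, 0)
BLUE = (0, 0, 255)


def _blend(start, end, t):
    """Linearly interpolate between two RGB colors; t runs from 0.0 to 1.0."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def scale_color(pct: float) -> tuple:
    """Map a Percent Funded value onto the red -> green -> blue scale."""
    pct = max(0.0, min(pct, 200.0))  # clamp: values past either end get the end color
    if pct <= 100:
        return _blend(DARK_RED, GREEN, pct / 100)
    return _blend(GREEN, BLUE, (pct - 100) / 100)


print(percent_funded(15_000, 10_000))  # 150.0
print(scale_color(0), scale_color(100), scale_color(200))
```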
Create a new column called Average Donation that uses a formula to find how much each project backer paid on average.
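Average Donation is the pledged amount divided by the number of backers. A campaign with zero backers makes that division fail (Excel shows #DIV/0!), so one common approach is to wrap the formula in IFERROR. The Python sketch below mirrors that choice; returning 0 for campaigns without backers is an assumption, not a requirement of the assignment.

```python
def average_donation(pledged: float, backers: int) -> float:
    """Return the average amount each backer paid."""
    if backers == 0:
        # No backers: avoid dividing by zero, like IFERROR(pledged / backers, 0) in Excel.
        return 0.0
    return pledged / backers


print(average_donation(2_500, 20))  # 125.0
```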
Create two new columns, one called Parent Category and another called Sub-Category, that use formulas to split the combined Category and Sub-Category column into two new, separate columns.
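Splitting the combined column amounts to cutting each value at its separator. The sketch below assumes values that look like food/food trucks, with a forward slash between the parent category and the sub-category; check the workbook for the actual separator. In Excel the same split can be done with LEFT, RIGHT, LEN, and FIND, for example =LEFT(R2, FIND("/", R2) - 1) for Parent Category and =RIGHT(R2, LEN(R2) - FIND("/", R2)) for Sub-Category, where R2 is a hypothetical cell holding the combined value.

```python
def split_category(combined: str, sep: str = "/") -> tuple:
    """Split 'parent/sub' into (parent, sub); sub is empty if sep is missing."""
    parent, _, sub = combined.partition(sep)
    return parent.strip(), sub.strip()


print(split_category("food/food trucks"))  # ('food', 'food trucks')
```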
Create a new sheet with a pivot table that analyzes your initial worksheet to count how many campaigns were successful, failed, canceled, or are currently live per category. Create a stacked-column pivot chart that can be filtered by country based on the table that you created.
Create a new sheet with a pivot table that analyzes your initial sheet to count how many campaigns were successful, failed, canceled, or are currently live per sub-category. Create a stacked-column pivot chart that can be filtered by country and parent category based on the table that you created.
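Both pivot tables count outcomes per group, with optional filters on other columns. The Python sketch below reproduces that counting logic as a way to sanity-check the pivot results; the row layout (a list of dicts) and the field names outcome, parent_category, and country are hypothetical stand-ins for the workbook's columns.

```python
from collections import Counter, defaultdict


def pivot_counts(rows, group_by, **filters):
    """Count outcomes per value of `group_by`, like a pivot table with report filters.

    Only rows whose fields equal every keyword filter (e.g. country="US") are counted.
    """
    table = defaultdict(Counter)
    for row in rows:
        if all(row.get(field) == wanted for field, wanted in filters.items()):
            table[row[group_by]][row["outcome"]] += 1
    return dict(table)


rows = [
    {"outcome": "successful", "parent_category": "music", "country": "US"},
    {"outcome": "failed", "parent_category": "music", "country": "CA"},
    {"outcome": "live", "parent_category": "theater", "country": "US"},
]
print(pivot_counts(rows, "parent_category", country="US"))
```

Passing a different group_by (for example a sub-category field) and adding a parent_category filter covers the second pivot table.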
The dates in the deadline and launched_at columns use Unix timestamps. Use a formula to convert these timestamps to a normal date.
Create a new column named Date Created Conversion that converts the data in launched_at into Excel's date format. Create a new column named Date Ended Conversion that converts the data in deadline into Excel's date format.
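A Unix timestamp counts seconds since 1970-01-01 00:00 UTC, while an Excel date is a count of days in which 1970-01-01 is serial number 25569 (in the default 1900 date system). Dividing by 86,400 seconds per day and adding that offset gives the Excel date, which is what a formula such as =(K2/86400)+DATE(1970,1,1) does, with K2 as a hypothetical cell holding launched_at or deadline. The Python sketch below shows the same arithmetic; it treats timestamps as UTC, so dates near midnight may differ from local time.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400
EXCEL_SERIAL_1970_01_01 = 25_569  # Excel 1900 date system


def unix_to_excel_serial(ts: int) -> float:
    """Same arithmetic as =(ts/86400)+DATE(1970,1,1) in Excel."""
    return ts / SECONDS_PER_DAY + EXCEL_SERIAL_1970_01_01


def unix_to_date(ts: int):
    """Convert a Unix timestamp (seconds, UTC) to a calendar date."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date()


print(unix_to_excel_serial(86_400))      # 25570.0
print(unix_to_date(86_400).isoformat())  # 1970-01-02
```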
Create a new sheet with a pivot table that has a column of outcome, rows of Date Created Conversion, values based on the count of outcome, and filters based on parent category and Years. Now, create a pivot-chart line graph that visualizes this new table.
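When a date field is placed in Rows, Excel can group it by years and months, which is where the Years filter comes from. The Python sketch below assumes the line chart plots one point per month, with year and parent category applied as filters; the field names date_created, parent_category, and outcome are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def outcomes_by_month(rows, year=None, parent_category=None):
    """Count outcomes per creation month, optionally filtered by year and parent category."""
    table = defaultdict(Counter)
    for row in rows:
        created = row["date_created"]
        if year is not None and created.year != year:
            continue
        if parent_category is not None and row["parent_category"] != parent_category:
            continue
        table[created.month][row["outcome"]] += 1
    # Sort by month number so the line chart runs January to December.
    return {month: table[month] for month in sorted(table)}


rows = [
    {"date_created": date(2015, 1, 5), "parent_category": "music", "outcome": "successful"},
    {"date_created": date(2015, 1, 20), "parent_category": "music", "outcome": "failed"},
    {"date_created": date(2016, 3, 2), "parent_category": "theater", "outcome": "successful"},
]
print(outcomes_by_month(rows, year=2015))
```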
Create a report in Microsoft Word, and answer the following questions:
- What are three conclusions that we can draw about crowdfunding campaigns?
- What are some limitations of this dataset?
- What are some other possible tables and/or graphs that we could create, and what additional value would they provide?
Create a new sheet with the following columns:
- Goal
- Number Successful
- Number Failed
- Number Canceled
- Total Projects
- Percentage Successful
- Percentage Failed
- Percentage Canceled
In the Goal column, create one row for each of the following goal ranges:
- Less than 1000
- 1000 to 4999
- 5000 to 9999
- 10000 to 14999
- 15000 to 19999
- 20000 to 24999
- 25000 to 29999
- 30000 to 34999
- 35000 to 39999
- 40000 to 44999
- 45000 to 49999
- Greater than or equal to 50000
Use the COUNTIFS() formula to count how many successful, failed, and canceled projects were created with goals within the ranges listed above. Populate the Number Successful, Number Failed, and Number Canceled columns with these data points.
Add up each of the values in the Number Successful, Number Failed, and Number Canceled columns to populate the Total Projects column. Then, using a formula, find the percentage of projects that were successful, failed, or canceled per goal range.
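In Excel, each count is a COUNTIFS() call that pairs a lower and an upper bound on the goal with an outcome, for example =COUNTIFS(Crowdfunding!$D:$D, ">=1000", Crowdfunding!$D:$D, "<=4999", Crowdfunding!$F:$F, "successful"), where the sheet name and column letters are hypothetical. The Python sketch below performs the same bucketing and percentage arithmetic so the worksheet totals can be cross-checked. It assumes whole-dollar goals and, following the instructions above, leaves live campaigns out of Total Projects.

```python
from bisect import bisect_right
from collections import Counter

# Lower bound of each goal range, in the same order as the rows listed above.
BOUNDS = [0, 1000] + list(range(5000, 50001, 5000))
LABELS = (
    ["Less than 1000", "1000 to 4999"]
    + [f"{low} to {low + 4999}" for low in range(5000, 50000, 5000)]
    + ["Greater than or equal to 50000"]
)


def goal_range(goal: float) -> str:
    """Return the label of the range that contains `goal`."""
    if goal < 0:
        raise ValueError("goal cannot be negative")
    return LABELS[bisect_right(BOUNDS, goal) - 1]


def _percent(part: int, total: int) -> float:
    return part / total * 100 if total else 0.0


def goal_summary(campaigns):
    """Build one row per goal range from (goal, outcome) pairs."""
    counts = {label: Counter() for label in LABELS}
    for goal, outcome in campaigns:
        counts[goal_range(goal)][outcome] += 1

    summary = []
    for label in LABELS:
        c = counts[label]
        # Total Projects = successful + failed + canceled; live campaigns are excluded.
        total = c["successful"] + c["failed"] + c["canceled"]
        summary.append({
            "Goal": label,
            "Number Successful": c["successful"],
            "Number Failed": c["failed"],
            "Number Canceled": c["canceled"],
            "Total Projects": total,
            "Percentage Successful": _percent(c["successful"], total),
            "Percentage Failed": _percent(c["failed"], total),
            "Percentage Canceled": _percent(c["canceled"], total),
        })
    return summary
```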
Create a line chart that graphs the relationship between a goal amount and its chances of success, failure, or cancellation.
Most people would use the number of campaign backers to assess the success of a crowdfunding campaign. Creating a summary statistics table is one of the most efficient ways that data scientists can characterize quantitative metrics, such as the number of campaign backers.
To gain an in-depth understanding of campaign backers, evaluate the number of backers of successful and unsuccessful campaigns by creating your own summary statistics table.
Create a new worksheet in your workbook with two columns: one for the number of backers of successful campaigns and one for unsuccessful campaigns. Use Excel to evaluate the following values for successful campaigns, and then do the same for unsuccessful campaigns:
- Mean number of backers
- Median number of backers
- Minimum number of backers
- Maximum number of backers
- Variance of the number of backers
- Standard deviation of the number of backers
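Excel covers these with AVERAGE, MEDIAN, MIN, MAX, and either VAR.P/STDEV.P (treating the campaigns as the whole population) or VAR.S/STDEV.S (treating them as a sample). The assignment does not say which pair to use, so the Python sketch below makes it a parameter; defaulting to the population versions is an assumption. It uses only the standard-library statistics module, and the backer counts in the usage line are hypothetical.

```python
import statistics


def backer_summary(backers, population=True):
    """Summary statistics for a list of backer counts."""
    variance = statistics.pvariance if population else statistics.variance
    std_dev = statistics.pstdev if population else statistics.stdev
    return {
        "Mean": statistics.mean(backers),
        "Median": statistics.median(backers),
        "Minimum": min(backers),
        "Maximum": max(backers),
        "Variance": variance(backers),
        "Standard Deviation": std_dev(backers),
    }


# Hypothetical backer counts, not taken from the dataset.
print(backer_summary([2, 4, 4, 4, 5, 5, 7, 9]))
```

Run it once on the successful campaigns' backer counts and once on the unsuccessful campaigns' counts to fill the two columns.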
Use your data to determine whether the mean or the median better summarizes the data.
- The median better summarizes the data and represents the typical number of backers for both successful and failed campaigns. The mean can be pulled far upward by a few campaigns with exceptionally large numbers of backers, whereas the median is largely unaffected by those outliers.
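A small illustration of why the median is more robust: in the hypothetical backer counts below (not taken from the dataset), one campaign with a very large following pulls the mean far above what most campaigns achieved, while the median stays with the typical value.

```python
import statistics

# Hypothetical backer counts: four modest campaigns and one outlier.
backers = [10, 12, 15, 20, 1000]

print(statistics.mean(backers))    # 211.4
print(statistics.median(backers))  # 15
```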
Use your data to determine if there is more variability with successful or unsuccessful campaigns. Does this make sense? Why or why not?
- There is more variability with successful campaigns, because both the variance and the standard deviation of their backer counts are higher than those of unsuccessful campaigns. This makes sense: successful campaigns can attract anywhere from a modest number of backers to a very large one, which widens the spread of values. Failed campaigns tend to attract fewer backers, so their counts cluster more tightly and show less variability.
Apply conditional formatting to the following columns:
- Outcome
- Percent Funded
Create six new columns:
- Percent Funded
- Average Donation
- Parent Category
- Sub-Category
- Date Created (converted format)
- Date Ended (converted format)
Create a pivot table showing counts of campaigns by outcome:
- Outcomes include: successful, failed, canceled, and live
- Grouped by: Category
Develop a stacked column chart based on this pivot table, with filtering capability by:
- Country
Generate a pivot table with the following structure:
- Rows: Date Created (converted format)
- Columns: Outcome
- Values: Count of Outcome
- Filters: Parent Category and Year
Create a line graph based on this pivot table.
Provide a cohesive analysis including:
- Three conclusions drawn from the data
- Limitations of the dataset and suggestions for additional tables or graphs
Calculate the percentage of projects that were successful, failed, or canceled within each goal range.
Create a line chart showing the relationship between:
- Goal Amount
- Probability of success, failure, or cancellation
Using Excel formulas, calculate the following statistics:
- Mean, Median, Minimum, Maximum, Variance, and Standard Deviation
Provide a brief justification of whether the mean or median better summarizes the data.