Creating project base and adding data files

Noah L. Schrick 2023-03-07 09:58:13 -06:00
commit 07ca564632
3 changed files with 2077 additions and 0 deletions

RidingMowers.csv (new file, 25 lines)

@@ -0,0 +1,25 @@
Income,Lot_Size,Ownership
60,18.4,Owner
85.5,16.8,Owner
64.8,21.6,Owner
61.5,20.8,Owner
87,23.6,Owner
110.1,19.2,Owner
108,17.6,Owner
82.8,22.4,Owner
69,20,Owner
93,20.8,Owner
51,22,Owner
81,20,Owner
75,19.6,Nonowner
52.8,20.8,Nonowner
64.8,17.2,Nonowner
43.2,20.4,Nonowner
84,17.6,Nonowner
49.2,17.6,Nonowner
59.4,16,Nonowner
66,18.4,Nonowner
47.4,16.4,Nonowner
33,18.8,Nonowner
51,14,Nonowner
63,14.8,Nonowner
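
The rows above are small enough to summarize directly. The following is a minimal sketch, not part of the commit, that embeds the same 24 rows and computes the share of owners and the mean income per class using only the standard library. The names `RIDING_MOWERS_CSV` and `summarize` are hypothetical and introduced here for illustration.

```python
import csv
import io
from collections import defaultdict

# The 24 data rows from RidingMowers.csv, copied verbatim from the diff above.
RIDING_MOWERS_CSV = """Income,Lot_Size,Ownership
60,18.4,Owner
85.5,16.8,Owner
64.8,21.6,Owner
61.5,20.8,Owner
87,23.6,Owner
110.1,19.2,Owner
108,17.6,Owner
82.8,22.4,Owner
69,20,Owner
93,20.8,Owner
51,22,Owner
81,20,Owner
75,19.6,Nonowner
52.8,20.8,Nonowner
64.8,17.2,Nonowner
43.2,20.4,Nonowner
84,17.6,Nonowner
49.2,17.6,Nonowner
59.4,16,Nonowner
66,18.4,Nonowner
47.4,16.4,Nonowner
33,18.8,Nonowner
51,14,Nonowner
63,14.8,Nonowner
"""


def summarize(csv_text):
    """Return (share of owners, mean income per Ownership class)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    incomes = defaultdict(list)
    for row in rows:
        incomes[row["Ownership"]].append(float(row["Income"]))
    owner_share = len(incomes["Owner"]) / len(rows)
    mean_income = {label: sum(vals) / len(vals) for label, vals in incomes.items()}
    return owner_share, mean_income


owner_share, mean_income = summarize(RIDING_MOWERS_CSV)
print(f"Owners: {owner_share:.1%}")
for label, value in mean_income.items():
    print(f"Mean income ({label}): {value:.3f} ($1000s)")
```

In the notebook itself the file would more likely be read from disk (for example with pandas); the embedded string is used here only to keep the sketch self-contained.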


@@ -0,0 +1,79 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Learning Practice 6 for the University of Tulsa's QM-7063 Data Mining Course\n",
"# Logistic Regression for Classification\n",
"# Professor: Dr. Abdulrashid, Spring 2023\n",
"# Noah L. Schrick - 1492657\n",
"\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Problem 10.3\n",
"\n",
"A company that manufactures riding mowers wants to identify the best sales prospects for an intensive sales campaign. In particular, the manufacturer is interested in classifying households as prospective owners or nonowners on the basis of Income (in $1000s) and Lot Size (in 1000 ft²). The marketing expert looked at a random sample of 24 households, given in the file RidingMowers.csv. \n",
"\n",
"Use all the data to fit a logistic regression of ownership on the two predictors.\n",
"\n",
"a. What percentage of households in the study were owners of a riding mower? \n",
"b. Create a scatter plot of Income vs. Lot Size using color or symbol to distinguish owners from nonowners. From the scatter plot, which class seems to have a higher average income, owners or nonowners? \n",
"c. Among nonowners, what is the percentage of households classified correctly? \n",
"d. To increase the percentage of correctly classified nonowners, should the cutoff probability be increased or decreased? \n",
"e. What are the odds that a household with a $60K income and a lot size of 20,000 ft² is an owner? \n",
"f. What is the classification of a household with a $60K income and a lot size of 20,000 ft²? Use cutoff = 0.5. \n",
"g. What is the minimum income that a household with a 16,000 ft² lot size should have before it is classified as an owner? "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Problem 10.4\n",
"\n",
"The file eBayAuctions.csv contains information on 1972 auctions transacted on eBay.com during May-June 2004. The goal is to\n",
"use these data to build a model that will distinguish competitive auctions from non-competitive ones. A competitive auction is defined as an auction with at least two bids placed on the item being auctioned. The data include variables that describe the item (auction category), the seller (his or her eBay rating), and the auction terms that the seller selected (auction duration, opening price, currency, day of week of auction close). In addition, we have the price at which the auction closed. The goal is to predict whether or not an auction of interest will be competitive.\n",
"\n",
"Data preprocessing. Create dummy variables for the categorical predictors.\n",
"These include Category (18 categories), Currency (USD, GBP, Euro), EndDay\n",
"(Monday-Sunday), and Duration (1, 3, 5, 7, or 10 days).\n",
"\n",
"a. Create pivot tables for the mean of the binary outcome (Competitive?) as a function of the various categorical variables (use the original variables, not the dummies). Use the information in the tables to reduce the number of dummies that will be used in the model. For example, categories that appear most similar with respect to the distribution of competitive auctions could be combined. \n",
"b. Split the data into training (60%) and validation (40%) datasets. Run a logistic model with all predictors with a cutoff of 0.5. \n",
"c. If we want to predict at the start of an auction whether it will be competitive, we cannot use the information on the closing price. Run a logistic model with all predictors as above, excluding price. How does this model compare to the full model with respect to predictive accuracy? \n",
"d. Interpret the meaning of the coefficient for closing price. Does closing price have a practical significance? Is it statistically significant for predicting competitiveness of auctions? (Use a 10% significance level.) \n",
"e. Use stepwise regression as described in Section 6.4 to find the model with the best fit to the training data (highest accuracy). Which predictors are used? \n",
"f. Use stepwise regression to find the model with the highest accuracy on the validation data. Which predictors are used? \n",
"g. What is the danger of using the best predictive model that you found? \n",
"h. Explain how and why the best-fitting model and the best predictive models are the same or different. \n",
"i. Use regularized logistic regression with L1 penalty on the training data. Compare its selected predictors and classification performance to the best-fitting and best predictive models. \n",
"j. If the major objective is accurate classification, what cutoff value should be used? \n",
"k. Based on these data, what auction settings set by the seller (duration, opening price, ending day, currency) would you recommend as being most likely to lead to a competitive auction? "
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
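
Parts e through g of Problem 10.3 all follow from the fitted logit, so the relationship can be sketched independently of the fit. The example below is an illustration under stated assumptions, not the notebook's solution: the coefficients `B0`, `B_INCOME`, and `B_LOT` are made-up placeholders (not values fitted to RidingMowers.csv), and the function names are hypothetical. Income is in $1000s and lot size in 1000 ft², as in the problem statement.

```python
import math

# Placeholder coefficients for illustration only; NOT fitted to the data.
B0, B_INCOME, B_LOT = -25.0, 0.1, 1.0


def logit(income, lot_size):
    """Linear predictor: log-odds of being an owner."""
    return B0 + B_INCOME * income + B_LOT * lot_size


def odds(income, lot_size):
    """Odds of ownership: exp(logit)."""
    return math.exp(logit(income, lot_size))


def probability(income, lot_size):
    """Probability of ownership: odds / (1 + odds)."""
    return 1.0 / (1.0 + math.exp(-logit(income, lot_size)))


def classify(income, lot_size, cutoff=0.5):
    """Label a household using the given cutoff probability."""
    return "Owner" if probability(income, lot_size) >= cutoff else "Nonowner"


def min_income_for_owner(lot_size, cutoff=0.5):
    """Income at which the probability reaches the cutoff.

    Solves B0 + B_INCOME * income + B_LOT * lot_size = log(cutoff / (1 - cutoff))
    for income; assumes B_INCOME > 0 so that higher income raises the probability.
    """
    target = math.log(cutoff / (1.0 - cutoff))
    return (target - B0 - B_LOT * lot_size) / B_INCOME


household_odds = odds(60, 20)
household_prob = probability(60, 20)
household_label = classify(60, 20)
threshold_income = min_income_for_owner(16)
print(household_odds, household_prob, household_label, threshold_income)
```

With a cutoff of 0.5 the target log-odds is 0, so the threshold reduces to the income at which the logit changes sign. Raising the cutoff raises that threshold, which is why it moves more households into the nonowner class.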

eBayAuctions.csv (new file, 1973 lines)

File diff suppressed because it is too large.