Universal Bank is a relatively young bank growing rapidly in terms of overall cu

Universal Bank is a relatively young bank growing rapidly in terms of overall customer acquisition. The majority of these customers are liability customers (depositors) with varying sizes of relationship with the bank. The customer base of asset customers (borrowers) is quite small, and the bank is interested in expanding this base rapidly to bring in more loan business. In particular, it wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors).
A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9%. This has encouraged the retail marketing department to devise smarter campaigns with better target marketing. The goal is to use k‐NN to predict whether a new customer will accept a loan offer. This will serve as the basis for the design of a new campaign.
The dataset mlba::UniversalBank contains data on 5000 customers. The data include customer demographic information (age, income, etc.), the customer’s relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them in the earlier campaign.
Partition the data into training (60%) and holdout (40%) sets.
A. Consider the following customer: Age = 40, Experience = 10, Income = 84, Family = 2, CCAvg = 2, Education = 2, Mortgage = 0, Securities Account = 0, CD Account = 0, Online = 1, and Credit Card = 1. Perform a k‐NN classification with all predictors except ID and ZIP code. Remember to define categorical predictors with more than two categories as factors (so that k‐NN handles categorical predictors automatically). Create a k‐NN model with k = 1. How would this customer be classified?
Use set.seed(1) for training.
B. What is a choice of k that balances between overfitting and ignoring the predictor information? Use 5‐fold cross‐validation to find the best k.
Use set.seed(123) for cross‐validation.
The best k for the model is stored in model$bestTune.
C. Show the confusion matrices for the training and holdout data that result from using the best k. Comment on the differences and the reasons for them.
The code example below shows how to produce the confusion matrix for the training set:
cm <- confusionMatrix(predict(model, train.df), train.df$Personal.Loan)
D. Consider the following customer: Age = 40, Experience = 10, Income = 84, Family = 2, CCAvg = 2, Education = 2, Mortgage = 0, Securities Account = 0, CD Account = 0, Online = 1, and Credit Card = 1. Classify the customer using the best k.
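A minimal sketch of the workflow in parts A through D, under loudly stated assumptions: it uses class::knn (the class package ships with standard R installations) in place of the caret functions the assignment refers to (model$bestTune, confusionMatrix), and it runs on made-up toy data with two predictors rather than mlba::UniversalBank. The toy values, the acceptance rule, and the grid of k values are all hypothetical.

```r
# Sketch only: class::knn on made-up toy data, NOT mlba::UniversalBank.
library(class)

set.seed(1)
n <- 500
toy <- data.frame(
  Income = runif(n, 10, 200),  # hypothetical predictor values
  CCAvg  = runif(n, 0, 10)
)
# Hypothetical rule: higher income and spending make acceptance more likely
p <- plogis(-6 + 0.03 * toy$Income + 0.3 * toy$CCAvg)
toy$Personal.Loan <- factor(ifelse(runif(n) < p, "Yes", "No"),
                            levels = c("No", "Yes"))

# Partition: 60% training, 40% holdout
train.idx  <- sample(n, size = round(0.6 * n))
train.df   <- toy[train.idx, ]
holdout.df <- toy[-train.idx, ]

# Standardize predictors using training means and SDs only
predictors <- c("Income", "CCAvg")
ctr <- colMeans(train.df[, predictors])
scl <- apply(train.df[, predictors], 2, sd)
train.x   <- scale(train.df[, predictors], center = ctr, scale = scl)
holdout.x <- scale(holdout.df[, predictors], center = ctr, scale = scl)

# Part A: classify one new customer with k = 1
new.x   <- scale(data.frame(Income = 84, CCAvg = 2), center = ctr, scale = scl)
pred.k1 <- knn(train.x, new.x, cl = train.df$Personal.Loan, k = 1)

# Part B: 5-fold cross-validation over a hypothetical grid of odd k values
set.seed(123)
folds <- sample(rep(1:5, length.out = nrow(train.x)))
ks <- seq(1, 15, by = 2)
cv.acc <- sapply(ks, function(k) {
  mean(sapply(1:5, function(f) {
    pred <- knn(train.x[folds != f, , drop = FALSE],
                train.x[folds == f, , drop = FALSE],
                cl = train.df$Personal.Loan[folds != f], k = k)
    mean(pred == train.df$Personal.Loan[folds == f])
  }))
})
best.k <- ks[which.max(cv.acc)]

# Part C: confusion matrices for training and holdout data with the best k
train.cm <- table(
  Predicted = knn(train.x, train.x, cl = train.df$Personal.Loan, k = best.k),
  Actual    = train.df$Personal.Loan)
holdout.cm <- table(
  Predicted = knn(train.x, holdout.x, cl = train.df$Personal.Loan, k = best.k),
  Actual    = holdout.df$Personal.Loan)

# Part D: classify the same customer with the best k
pred.best <- knn(train.x, new.x, cl = train.df$Personal.Loan, k = best.k)
print(best.k); print(train.cm); print(holdout.cm)
```

With caret, the same cross-validation would typically be expressed through train() with method = "knn" and a trainControl(method = "cv", number = 5) object, which is where model$bestTune comes from.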

Posted in R

Continuing with the theme of hypothesis testing, this week, we turn our attentio

Continuing with the theme of hypothesis testing, this week, we turn our attention to conducting tests for one sample, two paired samples, and two independent samples. To further develop our understanding of these tests, this assignment will focus on the application of these statistical techniques. You will select a dataset, conduct the appropriate tests, and share your findings.
Assignment Requirements:
Dataset Selection: Choose a dataset that allows for one sample, two paired samples, and two independent sample tests. Briefly explain why you have chosen this dataset.
Hypothesis Formulation: Formulate hypotheses appropriate for one sample, two paired samples, and two independent sample tests. Describe the hypotheses for each test clearly.
Execution of Tests: Perform the tests using Python or R, and document the steps you have taken. Be sure to include your code in your initial post.
Results Interpretation: Interpret the results of your tests. What do the results tell you about your dataset and the hypotheses you formulated?
Conclusions and Applications: Summarize your findings and discuss potential real-world applications of your conclusions.
Submission Format: Your submission should be a maximum of 500-600 words (excluding Python/R code). Submit your assignment in APA format as a Word document or a PDF file. Include your written analysis and any tables or visualizations that support your findings. If you used any software for your calculations (like R, Python, Excel), please include your code or formulas as well. Include an APA-formatted reference list for any external resources used.
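As a hedged illustration of the three tests named above, the sketch below runs each with base R's t.test() on datasets that ship with R. The choice of mtcars and sleep, and the null value of 20 mpg, are assumptions made purely for illustration; the assignment leaves the dataset and hypotheses to the student.

```r
# Sketch only: built-in datasets stand in for the dataset the student chooses.

# One-sample t-test: H0 is that mean mpg equals 20 (a hypothetical null value)
one.sample <- t.test(mtcars$mpg, mu = 20)

# Paired t-test: the sleep data record the same 10 patients under two drugs,
# with rows ordered by patient ID within each group
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]
paired <- t.test(drug1, drug2, paired = TRUE)

# Two independent samples: mpg for automatic (am = 0) vs manual (am = 1) cars.
# t.test() uses the Welch unequal-variance version by default.
independent <- t.test(mpg ~ am, data = mtcars)

print(one.sample)
print(paired)
print(independent)
```

Each result is a list, so the pieces needed for the write-up can be pulled out directly, for example paired$p.value or independent$conf.int.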

Posted in R

Introduction: Provide a concise overview of the concepts of parametric tests, un

Introduction: Provide a concise overview of the concepts of parametric tests, univariate tests for normality, and hypothesis testing.
Dataset Selection: Identify and describe a dataset suitable for applying these tests. Explain your reasons for choosing it.
Parametric Test Application: Conduct a parametric test on your selected dataset. Include all steps and any Python or R code you used.
Univariate Test for Normality Application: Perform a univariate test for normality on your dataset. Again, include all steps and any Python or R code used.
Results and Conclusion: Summarize your test results. Were your hypotheses confirmed or rejected? What conclusions can you draw about the population from your sample?
Submission Format: Your submission should be a maximum of 500-600 words (excluding Python/R code). Submit your assignment in APA format as a Word document or a PDF file. Include your written analysis and any tables or visualizations that support your findings. If you used any software for your calculations (like R, Python, Excel), please include your code or formulas as well. Include an APA-formatted reference list for any external resources used.
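One way the normality check and the parametric test could fit together is sketched below, assuming the Shapiro-Wilk test as the univariate normality test and a one-sample t-test as the parametric test. The mtcars data, the 0.05 threshold, and the null value of 20 are illustrative assumptions, not part of the assignment.

```r
# Sketch only: mtcars$mpg stands in for the variable the student selects.
x <- mtcars$mpg

# Shapiro-Wilk test: H0 is that the sample comes from a normal distribution
sw <- shapiro.test(x)
print(sw)

# Interpret at a hypothetical 0.05 level before relying on a parametric test
if (sw$p.value > 0.05) {
  message("No evidence against normality at the 0.05 level.")
} else {
  message("Normality is rejected at the 0.05 level; consider a nonparametric test.")
}

# Parametric test: one-sample t-test against a hypothetical mean of 20
tt <- t.test(x, mu = 20)
print(tt)
```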

Posted in R

I have attached the instructions below and the rmd with the cvs file is in the l

I have attached the instructions below, and the Rmd with the CSV file is in the link below. Thank you for helping me.
The link to rmd file: https://drive.google.com/drive/folders/1A5-7HJgp0u…

Posted in R

I attached the instructions below and the rmd with the cvs file is in the link b

I attached the instructions below, and the Rmd with the CSV file is in the link below. Thank you for helping me.
The link to rmd file: https://drive.google.com/drive/folders/1A5-7HJgp0u…

Posted in R

I have attached the assignment that contain the questions long with the files wi

I have attached the assignment that contains the questions, along with the files with the data needed (users and sessions) to complete it. For question 13, he strongly suggested the following steps:
1. Filter the sessions file so it only includes action_type == "booking_request".
2. Count booking_request grouped by user_id.
3. Filter the output on count_booking_request == 1.
4. Filter the sessions table using the user_ids you found in #3.
5. Count the action_detail == "view_listing" grouped by user_id.
6. Use summary() to find the summary statistics.
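The steps above can be sketched in base R as follows. The sessions data frame here is a tiny made-up stand-in for the attached file; only the column names user_id, action_type, and action_detail and the values "booking_request" and "view_listing" come from the post, and every other value is hypothetical.

```r
# Sketch only: a made-up sessions table, not the attached data.
sessions <- data.frame(
  user_id       = c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3),
  action_type   = c("view", "view", "booking_request",
                    "view", "view", "view", "booking_request", "booking_request",
                    "view", "booking_request"),
  action_detail = c("view_listing", "view_listing", "submit",
                    "view_listing", "view_listing", "view_listing", "submit", "submit",
                    "view_listing", "submit")
)

# Step 1: keep only booking requests
booking <- sessions[sessions$action_type == "booking_request", ]

# Step 2: count booking requests per user
counts <- aggregate(list(count_booking_request = booking$action_type),
                    by = list(user_id = booking$user_id), FUN = length)

# Step 3: users with exactly one booking request
one.booking <- counts$user_id[counts$count_booking_request == 1]

# Step 4: all sessions belonging to those users
kept <- sessions[sessions$user_id %in% one.booking, ]

# Step 5: count view_listing actions per user (users with none count as 0)
views <- tapply(kept$action_detail == "view_listing", kept$user_id, sum)

# Step 6: summary statistics of the per-user view counts
print(summary(as.numeric(views)))
```

The same pipeline is often written with dplyr's filter(), group_by(), and summarise(); base R is used here only to keep the sketch free of extra packages.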

Posted in R

The Fat data The Fat data contains the age, weight, height, and ten body circumf

The Fat data
The Fat data contains the age, weight, height, and ten body circumference measurements for 252 men. Each man’s percentage of body fat was accurately estimated by an underwater weighing technique.
The data frame contains the following variables:
brozek: Percent body fat using Brozek’s equation, 457/Density - 414.2
siri: Percent body fat using Siri’s equation, 495/Density - 450
density: Density (g/cm^3)
age: Age (yrs)
weight: Weight (lbs)
height: Height (inches)
adipos: Adiposity index = Weight/Height^2 (kg/m^2)
free: Fat Free Weight = (1 - fraction of body fat) * Weight, using Brozek’s formula (lbs)
neck: Neck circumference (cm)
chest: Chest circumference (cm)
abdom: Abdomen circumference (cm) at the umbilicus and level with the iliac crest
hip: Hip circumference (cm)
thigh: Thigh circumference (cm)
knee: Knee circumference (cm)
ankle: Ankle circumference (cm)
biceps: Extended biceps circumference (cm)
forearm: Forearm circumference (cm)
wrist: Wrist circumference (cm) distal to the styloid processes
You can access the data using the following statement: data(fat, package = "faraway")
Question 1
Fit a regression model with the brozek variable (percent of body fat) as a response and the following six predictors: age, neck, abdom, thigh, forearm and wrist.
Show the summary. Which predictors are significant at the 0.05 level?
Question 2
Provide an interpretation of the coefficient of each significant predictor.
Hint: See Lesson 3, Slide 49 and Slide 58.
Question 3
Compute the median value of each of the six predictors. Store the medians in a variable named x0 and show the values.
Hint: See Lesson 4, Slide 18.
Question 4
Construct a confidence interval of the mean response based on the median values that you stored in x0.
Hint: See Lesson 4, Slide 20.
Question 5
Construct a prediction interval of the next response value based on the median values that you stored in x0.
Hint: See Lesson 4, Slide 20.
Question 6
Which of the two intervals is wider?
Question 7
Construct a confidence interval of the outcome variable for a person with the following characteristics:
Age: 49 years
Neck circumference: 40 cm
Abdomen circumference: 95 cm
Thigh circumference: 60 cm
Forearm circumference: 31 cm
Wrist circumference: 19.5 cm
Hints:
You can store the predictor values in a new variable named x1. Here is an example of such a variable:
x1 <- c("(Intercept)" = 1, age = 25, neck = 34, abdom = 84, forearm = 25, wrist = 25)
Note that the intercept should be 1, but you will need to update the values of the predictors. The example also omits thigh, which the model in Question 1 includes.
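A rough sketch of the mechanics behind Questions 1 and 3 through 6, using the built-in mtcars data as a stand-in so that it runs without the faraway package. The response mpg and the predictors wt, hp, and qsec are assumptions chosen only for illustration; for the assignment the model would instead regress brozek on age, neck, abdom, thigh, forearm, and wrist from the fat data.

```r
# Sketch only: mtcars stands in for faraway::fat; variable names are illustrative.
preds <- c("wt", "hp", "qsec")
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
print(summary(fit))  # the Pr(>|t|) column shows which predictors are significant

# Median of each predictor, stored in x0
x0 <- apply(mtcars[, preds], 2, median)
print(x0)

# predict() needs a data frame whose column names match the predictors
new0 <- data.frame(t(x0))

# 95% confidence interval for the mean response at x0
ci0 <- predict(fit, newdata = new0, interval = "confidence", level = 0.95)

# 95% prediction interval for a new response at x0
pi0 <- predict(fit, newdata = new0, interval = "prediction", level = 0.95)

# Compare the widths of the two intervals
ci.width <- ci0[, "upr"] - ci0[, "lwr"]
pi.width <- pi0[, "upr"] - pi0[, "lwr"]
print(c(confidence = unname(ci.width), prediction = unname(pi.width)))

# A vector with a leading 1 for the intercept reproduces the fitted value
x1 <- c("(Intercept)" = 1, x0)
manual.fit <- sum(x1 * coef(fit))
```

The prediction interval accounts for the error variance of a new observation in addition to the uncertainty in the estimated mean, which is why it comes out wider than the confidence interval at the same point.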

Posted in R