In India in the 4th century BC the earliest diamonds were found, while the youngest of these deposits was formed 900 million years ago. Most of these early stones were transported along the trade route network that linked India and China, commonly known as the Silk Road. Diamonds were admired at the time of their discovery for their strength and beauty, and for their ability to refract light and engrave metal. Diamonds were worn as ornamentation, used as cutting tools, acted as a talisman to defend against evil, and were thought to provide defense in war. Diamonds were also used as medicinal aid in the Dark Ages, and were believed when swallowed to cure disease and heal wounds.
My assignment consists of finding the price of a diamond given its properties. I will use diamonds data set to estimate price of any diamond.
library(tidyverse)
## -- Attaching packages ---------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## <U+221A> ggplot2 3.3.2 <U+221A> purrr 0.3.4
## <U+221A> tibble 3.0.3 <U+221A> dplyr 1.0.1
## <U+221A> tidyr 1.1.1 <U+221A> stringr 1.4.0
## <U+221A> readr 1.3.1 <U+221A> forcats 0.5.0
## -- Conflicts ------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
library(dplyr)
library(rpart) library(rpart.plot) library(rattle) library(FactoMineR) library(corrplot)
str(diamonds)
## tibble [53,940 x 10] (S3: tbl_df/tbl/data.frame)
## $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
head(diamonds)
## # A tibble: 6 x 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
summary(diamonds)
## carat cut color clarity depth
## Min. :0.2000 Fair : 1610 D: 6775 SI1 :13065 Min. :43.00
## 1st Qu.:0.4000 Good : 4906 E: 9797 VS2 :12258 1st Qu.:61.00
## Median :0.7000 Very Good:12082 F: 9542 SI2 : 9194 Median :61.80
## Mean :0.7979 Premium :13791 G:11292 VS1 : 8171 Mean :61.75
## 3rd Qu.:1.0400 Ideal :21551 H: 8304 VVS2 : 5066 3rd Qu.:62.50
## Max. :5.0100 I: 5422 VVS1 : 3655 Max. :79.00
## J: 2808 (Other): 2531
## table price x y
## Min. :43.00 Min. : 326 Min. : 0.000 Min. : 0.000
## 1st Qu.:56.00 1st Qu.: 950 1st Qu.: 4.710 1st Qu.: 4.720
## Median :57.00 Median : 2401 Median : 5.700 Median : 5.710
## Mean :57.46 Mean : 3933 Mean : 5.731 Mean : 5.735
## 3rd Qu.:59.00 3rd Qu.: 5324 3rd Qu.: 6.540 3rd Qu.: 6.540
## Max. :95.00 Max. :18823 Max. :10.740 Max. :58.900
##
## z
## Min. : 0.000
## 1st Qu.: 2.910
## Median : 3.530
## Mean : 3.539
## 3rd Qu.: 4.040
## Max. :31.800
##
Carat: Diamond weight, 200 mg equivalent (should be a good indicator)
Cut: Quality of the cut
Color: Diamond color from J to D (worst to best)
Clarity: Clearity of Diamond; I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)
Depth: Total depth percentage (relative to x and y). Will likely be collinear.
Table: Width of top of diamond relative to widest point (43–95)
Price: US dollars ($)
x, y, z: Dimensions of the diamond
ggplot(diamonds, aes(x=price)) + geom_histogram()+ggtitle("Quantity of Diamonds")+xlab("Quantity")+ylab("Price")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The graph above shows that how many diamonds in the dataset in terms of price.
diamonds %>%
ggplot(aes(x=(price))) +
geom_histogram(stat="bin",binwidth= 500) +
facet_wrap(~cut, scales = "free")+
ggtitle("Change in price(Carat and Cut)")+xlab("Price")+ylab("Quantity")
Graph above shows number of diamonds with their “Cut”s in term of price.
ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point()+
ggtitle("Change in price(Quantity and Cut)")+xlab("Price")+ylab("Quantity")
This helps us to examine the relationship of each of these variables, and how it affects the relationship between carat and price.
ggplot(diamonds, aes(x=color, y=price)) + geom_boxplot() + scale_y_log10()+
ggtitle("Change in price(Color)")+xlab("Color")+ylab("Price")
The graph above, unlike the other graphs, we anaylzed the change of the price according to colors using histogram. This histogram shows the collapses more clearly.
new %>% spread(cut,n2)
## # A tibble: 7 x 6
## color Fair Good `Very Good` Premium Ideal
## <ord> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 D 0.101 0.135 0.125 0.116 0.132
## 2 E 0.139 0.190 0.199 0.169 0.181
## 3 F 0.194 0.185 0.179 0.169 0.178
## 4 G 0.195 0.178 0.190 0.212 0.227
## 5 H 0.188 0.143 0.151 0.171 0.145
## 6 I 0.109 0.106 0.0997 0.104 0.0971
## 7 J 0.0739 0.0626 0.0561 0.0586 0.0416
ggplot(new,aes(x=color, y=cut)) +
geom_tile(aes(fill=n2*100), colour = "white") +
scale_fill_gradient(low="white",high="black") +
labs(fill = "Density")+
ggtitle("Density Graph(Color and Cut)")+xlab("Color")+ylab("Cut")