Diamond is the hardest known natural material and has long been recognized for its beauty as a gemstone. An estimated 142 million carats of diamonds were produced from mines worldwide in 2019. Major producing countries include Australia, Canada, the Democratic Republic of the Congo, Botswana, South Africa, and Russia. Worldwide reserves are estimated at some 1.2 billion carats. Russia has the largest reserves, estimated at some 650 million carats.
We are going to examine what determines the price of a diamond, using the diamonds dataset that ships with ggplot2.
library("tidyverse")
## -- Attaching packages ----------------------------------------------------------------- tidyverse 1.3.0 --
## √ ggplot2 3.3.2 √ purrr 0.3.4
## √ tibble 3.0.3 √ dplyr 1.0.2
## √ tidyr 1.1.2 √ stringr 1.4.0
## √ readr 1.3.1 √ forcats 0.5.0
## -- Conflicts -------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library("ggplot2")
data(diamonds)
diamonds %>% glimpse()
## Rows: 53,940
## Columns: 10
## $ carat <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23,...
## $ cut <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, ...
## $ color <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J,...
## $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS...
## $ depth <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4,...
## $ table <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62,...
## $ price <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340,...
## $ x <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00,...
## $ y <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05,...
## $ z <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39,...
The dataset has 53,940 rows and 10 columns. Of these columns, carat, cut, color and clarity are the main specifications of a diamond. We are going to look at each of them in more detail.
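Cut, color and clarity are stored as ordered factors (the `<ord>` type in the output above), so it can help to list their levels before plotting. The snippet below is a minimal sketch that only assumes ggplot2 is installed. Note that the stored order of the levels is not necessarily worst to best: for color, the ggplot2 documentation describes D as the best grade and J as the worst.

```r
library("ggplot2")  # provides the diamonds dataset

# Levels of each ordered factor, in their stored order
levels(diamonds$cut)
levels(diamonds$color)
levels(diamonds$clarity)

# Number of diamonds at each cut grade
cut_counts <- table(diamonds$cut)
cut_counts
```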
ggplot(diamonds, aes(x=price)) + geom_histogram(binwidth=200)
In the histogram above, we can see that diamonds priced at around 1,000 dollars are the most common in the dataset. Above 5,000 dollars, the number of diamonds drops off significantly. Let’s look into the specifications more closely to see what determines the price of a diamond.
As mentioned before, carat, clarity, cut and color are the main specifications of a diamond. Each plot below shows price against carat, with the points colored by one of the other three.
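The shape of the histogram can also be checked numerically. The sketch below computes the quartiles of price and the share of diamonds priced above 5,000 dollars; the name `price_summary` and the 5,000 dollar threshold are illustrative choices, not part of the original analysis.

```r
library("dplyr")
library("ggplot2")  # provides the diamonds dataset

# Quartiles of price and the fraction of diamonds above 5000 dollars
price_summary <- diamonds %>%
  summarise(
    q25 = quantile(price, 0.25, names = FALSE),
    median = median(price),
    q75 = quantile(price, 0.75, names = FALSE),
    share_above_5000 = mean(price > 5000)
  )
price_summary
```

A median well below the mean would be consistent with the long right tail visible in the histogram.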
Diamond clarity describes the presence and visual appearance of internal characteristics, called inclusions, and surface defects, called blemishes. Clarity is one of the four Cs of diamond grading, the others being carat, color, and cut.
ggplot(diamonds, aes(x=carat, y=price, color=clarity)) + geom_point()
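Because price rises steeply with carat, it can be easier to compare clarity grades among diamonds of similar weight. The following is a minimal sketch under that assumption: it keeps diamonds between 0.9 and 1.1 carats (an arbitrary band chosen for illustration) and reports the median price for each clarity grade. The name `one_carat_by_clarity` is hypothetical.

```r
library("dplyr")
library("ggplot2")  # provides the diamonds dataset

# Median price per clarity grade, restricted to roughly one-carat diamonds
one_carat_by_clarity <- diamonds %>%
  filter(carat >= 0.9, carat <= 1.1) %>%  # band edges are an arbitrary choice
  group_by(clarity) %>%
  summarise(n = n(), median_price = median(price))
one_carat_by_clarity
```

The same pattern should work for color or cut by changing the grouping variable.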
A chemically pure and structurally perfect diamond is perfectly transparent with no hue, or color. However, in reality almost no gem-sized natural diamonds are absolutely perfect. The color of a diamond may be affected by chemical impurities and/or structural defects in the crystal lattice.
ggplot(diamonds, aes(x=carat, y=price, color=color)) + geom_point()
A diamond cut is a style or design guide used when shaping a diamond for polishing, such as the brilliant cut. Cut does not refer to shape, but to the symmetry, proportioning and polish of a diamond. The cut greatly affects a diamond’s brilliance: a poorly cut diamond will be less luminous.
ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point()
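With almost 54,000 points on one panel, the colors in a scatter plot like this tend to overlap. One possible variation, sketched below, draws semi-transparent points and splits the plot into one panel per cut grade. The `alpha` value and the choice to facet by cut are illustrative assumptions rather than part of the original analysis.

```r
library("ggplot2")

# One panel per cut grade, with transparency to reduce overplotting
p_cut <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  facet_wrap(~ cut)
p_cut
```

Replacing `cut` with `color` or `clarity` in `facet_wrap()` gives the corresponding view for the other two plots.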