Introduction

Diamond is the hardest known natural material and has long been prized for its beauty as a gemstone. An estimated 142 million carats of diamonds were produced from mines worldwide in 2019. Major producing countries include Australia, Canada, the Democratic Republic of Congo, Botswana, South Africa, and Russia. Worldwide reserves are estimated at some 1.2 billion carats, of which Russia holds the largest share, an estimated 650 million carats.

We are going to examine what determines the price of a diamond, using the diamonds dataset that ships with ggplot2.

Loading Packages and Data

library("tidyverse")
## -- Attaching packages ----------------------------------------------------------------- tidyverse 1.3.0 --
## √ ggplot2 3.3.2     √ purrr   0.3.4
## √ tibble  3.0.3     √ dplyr   1.0.2
## √ tidyr   1.1.2     √ stringr 1.4.0
## √ readr   1.3.1     √ forcats 0.5.0
## -- Conflicts -------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library("ggplot2")
data(diamonds)

Examining the Data

diamonds %>% glimpse()
## Rows: 53,940
## Columns: 10
## $ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23,...
## $ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, ...
## $ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J,...
## $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS...
## $ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4,...
## $ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62,...
## $ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340,...
## $ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00,...
## $ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05,...
## $ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39,...

The dataset has 53,940 rows and 10 columns. Among them, carat, cut, color and clarity are the main specifications of a diamond, and we are going to look at each of them in more detail.
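Before plotting, it can help to check how the three graded specifications are encoded. The short sketch below lists the levels of the ordered factors; the variable names (cut_levels and so on) are only illustrative. According to the ggplot2 documentation for this dataset, cut runs from Fair to Ideal, color from D (best) to J (worst), and clarity from I1 (worst) to IF (best).

```r
library(ggplot2)  # provides the diamonds dataset

# cut, color and clarity are stored as ordered factors;
# levels() returns the grades in their stored order
cut_levels     <- levels(diamonds$cut)
color_levels   <- levels(diamonds$color)
clarity_levels <- levels(diamonds$clarity)

print(cut_levels)
print(color_levels)
print(clarity_levels)

# Number of diamonds in each cut grade
print(table(diamonds$cut))
```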

Distribution of diamond prices

ggplot(diamonds, aes(x=price)) + geom_histogram(binwidth=200)

In the histogram above, we can see that the most common prices in the dataset are around 1000 dollars. The counts fall off as the price rises, and comparatively few diamonds are priced above 5000 dollars. Note that the histogram counts the diamonds in this dataset, not worldwide production. Let’s look into the specifications more closely to see what determines the price of a diamond.
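The shape of the histogram can also be checked numerically. The following is a minimal sketch; the 1000 and 5000 dollar cut-offs are simply the thresholds discussed above, and the variable names are illustrative.

```r
library(ggplot2)  # provides the diamonds dataset

price <- diamonds$price

# Five-number summary plus the mean of all prices
print(summary(price))

# Share of diamonds priced below 1000 dollars and above 5000 dollars
share_below_1000 <- mean(price < 1000)
share_above_5000 <- mean(price > 5000)

print(share_below_1000)
print(share_above_5000)
```

Taking the mean of a logical vector gives the proportion of TRUE values, so both shares lie between 0 and 1.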

What determines the price of a diamond?

As mentioned above, carat, clarity, cut and color are the main specifications of a diamond. Each of the plots below shows price against carat, colored by one of the other three.

Clarity

Diamond clarity describes the presence and visual appearance of internal characteristics, called inclusions, and of surface defects, called blemishes. Clarity is one of the four Cs of diamond grading, the others being carat, color, and cut.

ggplot(diamonds, aes(x=carat, y=price, color=clarity)) + geom_point()
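Because price depends strongly on carat, one way to isolate clarity is to compare diamonds of similar weight. The sketch below does this for diamonds between 0.9 and 1.1 carats; that window is an arbitrary choice for illustration, not something fixed by the data.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Restrict to roughly one-carat diamonds so that weight is nearly constant,
# then summarise the price within each clarity grade
clarity_summary <- diamonds %>%
  filter(carat >= 0.9, carat <= 1.1) %>%
  group_by(clarity) %>%
  summarise(
    n            = n(),
    median_price = median(price),
    .groups      = "drop"
  )

print(clarity_summary)
```

The n column shows how many diamonds support each median, which matters because some grades may be rare in this weight range.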

Color

A chemically pure and structurally perfect diamond is perfectly transparent with no hue, or color. However, in reality almost no gem-sized natural diamonds are absolutely perfect. The color of a diamond may be affected by chemical impurities and/or structural defects in the crystal lattice.

ggplot(diamonds, aes(x=carat, y=price, color=color)) + geom_point()
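With 53,940 points, a plain scatter plot is heavily overplotted. A possible variant, sketched below, makes the points semi-transparent and puts both axes on a log scale; the alpha and size values and the axis labels are illustrative choices.

```r
library(ggplot2)

# Same mapping as the plot above, with transparency to reduce overplotting
# and log10 scales on both axes
p_color <- ggplot(diamonds, aes(x = carat, y = price, color = color)) +
  geom_point(alpha = 0.3, size = 0.8) +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Carat (log scale)", y = "Price in dollars (log scale)")

print(p_color)
```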

Cut

A diamond cut is a style or design guide used when shaping a diamond for polishing, such as the brilliant cut. Cut does not refer to the shape of the stone, but to its symmetry, proportioning and polish. The cut greatly affects a diamond’s brilliance: a poorly cut diamond is less luminous.

ggplot(diamonds, aes(x=carat, y=price, color=cut)) + geom_point()
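To get a rough sense of how much the graded specifications add beyond weight, one option is to compare two linear models on the log scale. This is only a sketch under the assumption that a log-log linear model is a reasonable approximation; it is not part of the original analysis, and the model names are illustrative.

```r
library(ggplot2)  # provides the diamonds dataset

# Model 1: price explained by carat alone
fit_carat <- lm(log(price) ~ log(carat), data = diamonds)

# Model 2: carat plus the three graded specifications
fit_full <- lm(log(price) ~ log(carat) + cut + color + clarity,
               data = diamonds)

# Share of variance in log(price) explained by each model
r2_carat <- summary(fit_carat)$r.squared
r2_full  <- summary(fit_full)$r.squared

print(r2_carat)
print(r2_full)
```

Since the first model is nested in the second, the second R-squared cannot be lower; the size of the gap indicates how much cut, color and clarity explain once carat is accounted for.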
