data.table
package is used for the data manupilation processes. It is accepted as the fastest package for this purpose. Even if R is consired as bad for high volume data, this package makes all works faster than any other package. When it is compared with dply or pandas package of Python, it is the winner all the time. Also, most of other packages in R are built on data.table package.
Also we can use fread()
for reading the data from internet or local. This function works like read.csv()
function but in a more efficient way.
As other data manupilation packages, data.table has its own data type which is data.table. We can create an example data.table object like this:
datatable = data.table(student_no = c(1, 2, 3, 4, 5),
names = c("A", "B", "C", "D", "E"),
city = c("Istanbul", "Adana", "Eskişehir", "Antalya", "Hakkari"))
datatable
## student_no names city
## 1: 1 A Istanbul
## 2: 2 B Adana
## 3: 3 C Eskişehir
## 4: 4 D Antalya
## 5: 5 E Hakkari
We can use all functions in base R with data.table object like:
## [1] 9044
## [1] 15
## Ä°lan Tarihi Marka Arac Tip Grubu Arac Tip
## Length:9044 Length:9044 Length:9044 Length:9044
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Model Yıl Yakıt Turu Vites CCM
## Min. :1959 Length:9044 Length:9044 Length:9044
## 1st Qu.:2010 Class :character Class :character Class :character
## Median :2014 Mode :character Mode :character Mode :character
## Mean :2012
## 3rd Qu.:2017
## Max. :2020
## Beygir Gucu Renk Kasa Tipi Kimden
## Length:9044 Length:9044 Length:9044 Length:9044
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Durum Km Fiyat
## Length:9044 Min. : 0 Min. : 5500
## Class :character 1st Qu.: 68000 1st Qu.: 60938
## Mode :character Median : 125000 Median : 95500
## Mean : 135813 Mean : 148070
## 3rd Qu.: 194130 3rd Qu.: 153562
## Max. :1850000 Max. :5086500
## Classes 'data.table' and 'data.frame': 9044 obs. of 15 variables:
## $ Ä°lan Tarihi : chr "27/05/2020" "16/06/2020" "14/06/2020" "11/06/2020" ...
## $ Marka : chr "Jaguar" "Acura" "Acura" "Acura" ...
## $ Arac Tip Grubu: chr "XF" "CL" "CL" "CL" ...
## $ Arac Tip : chr "2.0 D Prestige Plus" "-" "2.2" "-" ...
## $ Model Yıl : num 2017 2015 1994 2013 2010 ...
## $ Yakıt Turu : chr "Dizel" "Dizel" "Benzin/LPG" "Dizel" ...
## $ Vites : chr "Otomatik Vites" "Yarı Otomatik Vites" "Düz Vites" "Düz Vites" ...
## $ CCM : chr "1801-2000 cc" "1301-1600 cc" "1301-1600 cc" "1301-1600 cc" ...
## $ Beygir Gucu : chr "176-200 BG" "101-125 BG" "101-125 BG" "76-100 BG" ...
## $ Renk : chr "Lacivert" "Mavi" "Turkuaz" "Kahverengi" ...
## $ Kasa Tipi : chr "Hatchback 5 Kapı" "Sedan" "Sedan" "Sedan" ...
## $ Kimden : chr "Galeriden" "Sahibinden" "Sahibinden" "Sahibinden" ...
## $ Durum : chr "2. El" "2. El" "2. El" "2. El" ...
## $ Km : int 26100 127000 175000 325 207000 320000 183 79000 15662 31971 ...
## $ Fiyat : int 634500 151500 19750 52000 148750 42500 170000 82500 189000 189000 ...
## - attr(*, ".internal.selfref")=<externalptr>
## İlan Tarihi Marka Arac Tip Grubu Arac Tip Model Yıl
## 1: 27/05/2020 Jaguar XF 2.0 D Prestige Plus 2017
## 2: 16/06/2020 Acura CL - 2015
## 3: 14/06/2020 Acura CL 2.2 1994
## 4: 11/06/2020 Acura CL - 2013
## 5: 11/06/2020 Acura CL 2.2 2010
## 6: 04/06/2020 Acura CL 2.2 1999
## Yakıt Turu Vites CCM Beygir Gucu Renk
## 1: Dizel Otomatik Vites 1801-2000 cc 176-200 BG Lacivert
## 2: Dizel Yarı Otomatik Vites 1301-1600 cc 101-125 BG Mavi
## 3: Benzin/LPG Düz Vites 1301-1600 cc 101-125 BG Turkuaz
## 4: Dizel Düz Vites 1301-1600 cc 76-100 BG Kahverengi
## 5: Dizel Otomatik Vites 1801-2000 cc 151-175 BG Beyaz
## 6: Dizel Düz Vites 1801-2000 cc 101-125 BG Lacivert
## Kasa Tipi Kimden Durum Km Fiyat
## 1: Hatchback 5 Kapı Galeriden 2. El 26100 634500
## 2: Sedan Sahibinden 2. El 127000 151500
## 3: Sedan Sahibinden 2. El 175000 19750
## 4: Sedan Sahibinden 2. El 325 52000
## 5: Sedan Sahibinden 2. El 207000 148750
## 6: Roadster Galeriden 2. El 320000 42500
The most important consept of data.table package is the usage of data.table object. When we want to filter the data, we need to give that information in i
parameter. If we want to select some of columns (or all of them), we need to use the j
parameter. If we want to group the data with a column / columns, we need to indicate it in the by
parameter.
For example, if we want to get all records whose prices(Fiyat
column) are less than 50000 Turkish Liras, we can use this code:
## İlan Tarihi Marka Arac Tip Grubu Arac Tip Model Yıl Yakıt Turu
## 1: 14/06/2020 Acura CL 2.2 1994 Benzin/LPG
## 2: 04/06/2020 Acura CL 2.2 1999 Dizel
## 3: 24/05/2020 Acura CL - 2001 Benzin/LPG
## 4: 27/04/2020 Acura CL - 2004 Dizel
## 5: 27/04/2020 Acura CL - 2004 Benzin/LPG
## ---
## 1514: 06/06/2020 TofaÅŸ Å\236ahin 1.6 1994 Benzin/LPG
## 1515: 05/06/2020 TofaÅŸ Å\236ahin Å\236ahin 5 vites 1992 Benzin/LPG
## 1516: 05/06/2020 TofaÅŸ DoÄŸan SLX 1993 Benzin/LPG
## 1517: 04/06/2020 TofaÅŸ Kartal Kartal 5 Vites 1991 Benzin/LPG
## 1518: 04/06/2020 TofaÅŸ Kartal Kartal 5 Vites 1996 Benzin/LPG
## Vites CCM Beygir Gucu Renk Kasa Tipi
## 1: Düz Vites 1301-1600 cc 101-125 BG Turkuaz Sedan
## 2: Düz Vites 1801-2000 cc 101-125 BG Lacivert Roadster
## 3: Düz Vites 1301-1600 cc 101-125 BG Siyah Hatchback 5 Kapı
## 4: Düz Vites 1301-1600 cc 76-100 BG Beyaz Hatchback 5 Kapı
## 5: Düz Vites 1301-1600 cc 101-125 BG Lacivert Sedan
## ---
## 1514: Düz Vites 1301-1600 cc 51-75 BG Beyaz Sedan
## 1515: Düz Vites 1301-1600 cc 76-100 BG Bej Sedan
## 1516: Düz Vites 1301-1600 cc 76-100 BG Yeşil Sedan
## 1517: Düz Vites 1601-1800 cc 76-100 BG Kırmızı Hatchback 5 Kapı
## 1518: Düz Vites 1301-1600 cc 51-75 BG Beyaz Sedan
## Kimden Durum Km Fiyat
## 1: Sahibinden 2. El 175000 19750
## 2: Galeriden 2. El 320000 42500
## 3: Sahibinden 2. El 252000 36750
## 4: Galeriden 2. El 230000 32000
## 5: Sahibinden 2. El 18000 34500
## ---
## 1514: Galeriden 2. El 121212 12250
## 1515: Sahibinden 2. El 54000 16300
## 1516: Sahibinden 2. El 135000 19000
## 1517: Galeriden 2. El 250000 11000
## 1518: Galeriden 2. El 11 15750
Also we can give more conditions with &
or |
operators.
## İlan Tarihi Marka Arac Tip Grubu Arac Tip Model Yıl Yakıt Turu Vites
## 1: 14/06/2020 Acura CL 2.2 1994 Benzin/LPG Düz Vites
## 2: 04/06/2020 Acura CL 2.2 1999 Dizel Düz Vites
## 3: 24/05/2020 Acura CL - 2001 Benzin/LPG Düz Vites
## 4: 27/04/2020 Acura CL - 2004 Dizel Düz Vites
## 5: 27/04/2020 Acura CL - 2004 Benzin/LPG Düz Vites
## 6: 15/04/2020 Acura CL - 2009 Dizel Düz Vites
## 7: 15/04/2020 Acura CL - 2009 Dizel Düz Vites
## CCM Beygir Gucu Renk Kasa Tipi Kimden Durum
## 1: 1301-1600 cc 101-125 BG Turkuaz Sedan Sahibinden 2. El
## 2: 1801-2000 cc 101-125 BG Lacivert Roadster Galeriden 2. El
## 3: 1301-1600 cc 101-125 BG Siyah Hatchback 5 Kapı Sahibinden 2. El
## 4: 1301-1600 cc 76-100 BG Beyaz Hatchback 5 Kapı Galeriden 2. El
## 5: 1301-1600 cc 101-125 BG Lacivert Sedan Sahibinden 2. El
## 6: 1300 cc ve altı 76-100 BG Bordo Sedan Sahibinden 2. El
## 7: 1300 cc ve altı 76-100 BG Bordo Sedan Sahibinden 2. El
## Km Fiyat
## 1: 175000 19750
## 2: 320000 42500
## 3: 252000 36750
## 4: 230000 32000
## 5: 18000 34500
## 6: 150000 34000
## 7: 150000 34000
If we want to get all rows except this condition we can use !
sign.
## İlan Tarihi Marka Arac Tip Grubu Arac Tip Model Yıl
## 1: 27/05/2020 Jaguar XF 2.0 D Prestige Plus 2017
## 2: 16/06/2020 Acura CL - 2015
## 3: 11/06/2020 Acura CL - 2013
## 4: 11/06/2020 Acura CL 2.2 2010
## 5: 03/06/2020 Acura CL 2.2 2014
## ---
## 9033: 06/06/2020 TofaÅŸ Å\236ahin 1.6 1994
## 9034: 05/06/2020 TofaÅŸ Å\236ahin Å\236ahin 5 vites 1992
## 9035: 05/06/2020 TofaÅŸ DoÄŸan SLX 1993
## 9036: 04/06/2020 TofaÅŸ Kartal Kartal 5 Vites 1991
## 9037: 04/06/2020 TofaÅŸ Kartal Kartal 5 Vites 1996
## Yakıt Turu Vites CCM Beygir Gucu Renk
## 1: Dizel Otomatik Vites 1801-2000 cc 176-200 BG Lacivert
## 2: Dizel Yarı Otomatik Vites 1301-1600 cc 101-125 BG Mavi
## 3: Dizel Düz Vites 1301-1600 cc 76-100 BG Kahverengi
## 4: Dizel Otomatik Vites 1801-2000 cc 151-175 BG Beyaz
## 5: Dizel Otomatik Vites 1301-1600 cc 101-125 BG Kırmızı
## ---
## 9033: Benzin/LPG Düz Vites 1301-1600 cc 51-75 BG Beyaz
## 9034: Benzin/LPG Düz Vites 1301-1600 cc 76-100 BG Bej
## 9035: Benzin/LPG Düz Vites 1301-1600 cc 76-100 BG Yeşil
## 9036: Benzin/LPG Düz Vites 1601-1800 cc 76-100 BG Kırmızı
## 9037: Benzin/LPG Düz Vites 1301-1600 cc 51-75 BG Beyaz
## Kasa Tipi Kimden Durum Km Fiyat
## 1: Hatchback 5 Kapı Galeriden 2. El 26100 634500
## 2: Sedan Sahibinden 2. El 127000 151500
## 3: Sedan Sahibinden 2. El 325 52000
## 4: Sedan Sahibinden 2. El 207000 148750
## 5: Sedan Galeriden 2. El 183 170000
## ---
## 9033: Sedan Galeriden 2. El 121212 12250
## 9034: Sedan Sahibinden 2. El 54000 16300
## 9035: Sedan Sahibinden 2. El 135000 19000
## 9036: Hatchback 5 Kapı Galeriden 2. El 250000 11000
## 9037: Sedan Galeriden 2. El 11 15750
For example, we need to get the head of the colours (Renk
column). To do so, we can use these notations:
## [1] "Lacivert" "Mavi" "Turkuaz" "Kahverengi" "Beyaz"
## [6] "Lacivert"
## Renk
## 1: Lacivert
## 2: Mavi
## 3: Turkuaz
## 4: Kahverengi
## 5: Beyaz
## 6: Lacivert
## Renk
## 1: Lacivert
## 2: Mavi
## 3: Turkuaz
## 4: Kahverengi
## 5: Beyaz
## 6: Lacivert
## Renk
## 1: Lacivert
## 2: Mavi
## 3: Turkuaz
## 4: Kahverengi
## 5: Beyaz
## 6: Lacivert
## Renk
## 1: Lacivert
## 2: Mavi
## 3: Turkuaz
## 4: Kahverengi
## 5: Beyaz
## 6: Lacivert
## Renk
## 1: Lacivert
## 2: Mavi
## 3: Turkuaz
## 4: Kahverengi
## 5: Beyaz
## 6: Lacivert
## Renk
## 1: Lacivert
## 2: Mavi
## 3: Turkuaz
## 4: Kahverengi
## 5: Beyaz
## 6: Lacivert
We used the with
argument. If we assign FALSE to that argument, it uses the character as the column name. .()
and list()
is used for selecting the columns. As you can see, the first one returns a vector, not a data.table object. With that usage, we can not use data.table operations like:
## NULL
We can select multiple columns with these usages:
## Renk Fiyat
## 1: Lacivert 634500
## 2: Mavi 151500
## 3: Turkuaz 19750
## 4: Kahverengi 52000
## 5: Beyaz 148750
## 6: Lacivert 42500
## Renk Fiyat
## 1: Lacivert 634500
## 2: Mavi 151500
## 3: Turkuaz 19750
## 4: Kahverengi 52000
## 5: Beyaz 148750
## 6: Lacivert 42500
## Renk Fiyat
## 1: Lacivert 634500
## 2: Mavi 151500
## 3: Turkuaz 19750
## 4: Kahverengi 52000
## 5: Beyaz 148750
## 6: Lacivert 42500
## Renk Fiyat
## 1: Lacivert 634500
## 2: Mavi 151500
## 3: Turkuaz 19750
## 4: Kahverengi 52000
## 5: Beyaz 148750
## 6: Lacivert 42500
## Renk Fiyat
## 1: Lacivert 634500
## 2: Mavi 151500
## 3: Turkuaz 19750
## 4: Kahverengi 52000
## 5: Beyaz 148750
## 6: Lacivert 42500
## Renk Fiyat
## 1: Lacivert 634500
## 2: Mavi 151500
## 3: Turkuaz 19750
## 4: Kahverengi 52000
## 5: Beyaz 148750
## 6: Lacivert 42500
We can select multiple rows with using slicing or excluding the columns with !
sign. For example:
## Marka Arac Tip Grubu Arac Tip Model Yıl Yakıt Turu
## 1: Jaguar XF 2.0 D Prestige Plus 2017 Dizel
## 2: Acura CL - 2015 Dizel
## 3: Acura CL 2.2 1994 Benzin/LPG
## 4: Acura CL - 2013 Dizel
## 5: Acura CL 2.2 2010 Dizel
## 6: Acura CL 2.2 1999 Dizel
## Vites CCM Beygir Gucu Renk Kasa Tipi
## 1: Otomatik Vites 1801-2000 cc 176-200 BG Lacivert Hatchback 5 Kapı
## 2: Yarı Otomatik Vites 1301-1600 cc 101-125 BG Mavi Sedan
## 3: Düz Vites 1301-1600 cc 101-125 BG Turkuaz Sedan
## 4: Düz Vites 1301-1600 cc 76-100 BG Kahverengi Sedan
## 5: Otomatik Vites 1801-2000 cc 151-175 BG Beyaz Sedan
## 6: Düz Vites 1801-2000 cc 101-125 BG Lacivert Roadster
## Kimden Durum Km Fiyat
## 1: Galeriden 2. El 26100 634500
## 2: Sahibinden 2. El 127000 151500
## 3: Sahibinden 2. El 175000 19750
## 4: Sahibinden 2. El 325 52000
## 5: Sahibinden 2. El 207000 148750
## 6: Galeriden 2. El 320000 42500
## Marka Arac Tip Grubu Arac Tip Model Yıl Yakıt Turu
## 1: Jaguar XF 2.0 D Prestige Plus 2017 Dizel
## 2: Acura CL - 2015 Dizel
## 3: Acura CL 2.2 1994 Benzin/LPG
## 4: Acura CL - 2013 Dizel
## 5: Acura CL 2.2 2010 Dizel
## 6: Acura CL 2.2 1999 Dizel
## Vites CCM Beygir Gucu Renk Kasa Tipi
## 1: Otomatik Vites 1801-2000 cc 176-200 BG Lacivert Hatchback 5 Kapı
## 2: Yarı Otomatik Vites 1301-1600 cc 101-125 BG Mavi Sedan
## 3: Düz Vites 1301-1600 cc 101-125 BG Turkuaz Sedan
## 4: Düz Vites 1301-1600 cc 76-100 BG Kahverengi Sedan
## 5: Otomatik Vites 1801-2000 cc 151-175 BG Beyaz Sedan
## 6: Düz Vites 1801-2000 cc 101-125 BG Lacivert Roadster
## Kimden Durum Km Fiyat
## 1: Galeriden 2. El 26100 634500
## 2: Sahibinden 2. El 127000 151500
## 3: Sahibinden 2. El 175000 19750
## 4: Sahibinden 2. El 325 52000
## 5: Sahibinden 2. El 207000 148750
## 6: Galeriden 2. El 320000 42500
In selecting the columns we can use some operations like %like%
. The usage of this operator is like this:
## Arac Tip Grubu Arac Tip
## 1: XF 2.0 D Prestige Plus
## 2: CL -
## 3: CL 2.2
## 4: CL -
## 5: CL 2.2
## ---
## 9040: Å\236ahin 1.6
## 9041: Å\236ahin Å\236ahin 5 vites
## 9042: DoÄŸan SLX
## 9043: Kartal Kartal 5 Vites
## 9044: Kartal Kartal 5 Vites
We can rename the columns with setnames
function like:
setnames(carmarket, c("Ä°lan Tarihi"), c("Date"))
setnames(carmarket, c("Model Yıl", "Yakıt Turu"), c("Model_Year", "Fuel_Type"))
head(carmarket)
## Date Marka Arac Tip Grubu Arac Tip Model_Year Fuel_Type
## 1: 27/05/2020 Jaguar XF 2.0 D Prestige Plus 2017 Dizel
## 2: 16/06/2020 Acura CL - 2015 Dizel
## 3: 14/06/2020 Acura CL 2.2 1994 Benzin/LPG
## 4: 11/06/2020 Acura CL - 2013 Dizel
## 5: 11/06/2020 Acura CL 2.2 2010 Dizel
## 6: 04/06/2020 Acura CL 2.2 1999 Dizel
## Vites CCM Beygir Gucu Renk Kasa Tipi
## 1: Otomatik Vites 1801-2000 cc 176-200 BG Lacivert Hatchback 5 Kapı
## 2: Yarı Otomatik Vites 1301-1600 cc 101-125 BG Mavi Sedan
## 3: Düz Vites 1301-1600 cc 101-125 BG Turkuaz Sedan
## 4: Düz Vites 1301-1600 cc 76-100 BG Kahverengi Sedan
## 5: Otomatik Vites 1801-2000 cc 151-175 BG Beyaz Sedan
## 6: Düz Vites 1801-2000 cc 101-125 BG Lacivert Roadster
## Kimden Durum Km Fiyat
## 1: Galeriden 2. El 26100 634500
## 2: Sahibinden 2. El 127000 151500
## 3: Sahibinden 2. El 175000 19750
## 4: Sahibinden 2. El 325 52000
## 5: Sahibinden 2. El 207000 148750
## 6: Galeriden 2. El 320000 42500
We can sort the data with setorder
function like:
## Date Marka Arac Tip Grubu Arac Tip Model_Year
## 1: 01/04/2020 Acura CL DiÄŸer 2004
## 2: 01/04/2020 Audi A4 2.0 TDI 2014
## 3: 01/04/2020 Audi S Serisi S5 3.0 TFSI Quattro 2010
## 4: 01/04/2020 BMW 1 Serisi 1.18i 2008
## 5: 01/04/2020 BMW 3 Serisi 3.20i ED EfficientDynamics 2014
## 6: 01/04/2020 Chevrolet Captiva 2.0 D LT 2007
## Fuel_Type Vites CCM Beygir Gucu Renk
## 1: Benzin/LPG Düz Vites 1301-1600 cc 100 BG ve altı Gri
## 2: Dizel Otomatik Vites 1801-2000 cc 126-150 BG Beyaz
## 3: Benzin Yarı Otomatik Vites 2501-3000 cc 326-350 BG Siyah
## 4: Benzin/LPG Otomatik Vites 1801-2000 cc 126-150 BG Siyah
## 5: Benzin Otomatik Vites 1301-1600 cc 176-200 BG Siyah
## 6: Dizel Otomatik Vites 1801-2000 cc 126-150 BG Siyah
## Kasa Tipi Kimden Durum Km Fiyat
## 1: Sedan Sahibinden 2. El 170000 55000
## 2: Sedan Galeriden 2. El 160000 176500
## 3: Spor / Coupe Galeriden 2. El 160000 350000
## 4: Hatchback 5 Kapı Sahibinden 2. El 162000 82000
## 5: Sedan Galeriden 2. El 160000 192500
## 6: Arazi Aracı Sahibinden 2. El 192500 80000
It orders the data ascending by default. We can use -
sign for descending and can order the data with multiple columns like:
## Date Marka Arac Tip Grubu Arac Tip Model_Year
## 1: 01/04/2020 Skoda Rapid 1.2 Ambition 2015
## 2: 01/04/2020 Skoda Octavia 1.6 TDI Optimal 2016
## 3: 01/04/2020 Seat Ibiza 1.6\n 1999
## 4: 01/04/2020 Seat Ibiza 1.4 TSI Sport Coupe Cupra\n 2013
## 5: 01/04/2020 Peugeot 3008 1.6 BlueHDi Allure 2015
## 6: 01/04/2020 Opel Astra 1.6 16V 2004
## Fuel_Type Vites CCM Beygir Gucu Renk
## 1: Benzin/LPG Düz Vites 1300 cc ve altı 100 BG ve altı Beyaz
## 2: Dizel Yarı Otomatik Vites 1601-1800 cc 101-125 BG Beyaz
## 3: Benzin/LPG Otomatik Vites Bilmiyorum Bilmiyorum Sarı
## 4: Benzin Otomatik Vites 1301-1600 cc 176-200 BG Kırmızı
## 5: Dizel Otomatik Vites 1301-1600 cc 101-125 BG Beyaz
## 6: Benzin/LPG Düz Vites 1301-1600 cc 76-100 BG Mavi
## Kasa Tipi Kimden Durum Km Fiyat
## 1: Sedan Sahibinden 2. El 98000 92000
## 2: Hatchback 5 Kapı Sahibinden 2. El 77000 144900
## 3: Hatchback 5 Kapı Sahibinden 2. El 145000 32900
## 4: Hatchback 3 Kapı Galeriden 2. El 116000 109000
## 5: Crossover Sahibinden 2. El 97000 166000
## 6: Sedan Sahibinden 2. El 179 65000
We can group the data with using the by
parameter. We can get average prices for each brand like this:
## Marka mean_price
## 1: Skoda 128163.50
## 2: Seat 120630.11
## 3: Peugeot 107926.51
## 4: Opel 88531.83
## 5: Nissan 131742.66
## 6: Mercedes 316946.25
## 7: Kia 117616.85
## 8: Hyundai 98060.93
## 9: Ford 94689.59
## 10: Fiat 68076.59
## 11: Chevrolet 99175.19
## 12: BMW 294943.74
## 13: Audi 259749.37
## 14: Acura 123534.52
## 15: Renault 85985.26
## 16: Porsche 966249.82
## 17: Mini 150258.06
## 18: Land Rover 594363.74
## 19: Jeep 198877.36
## 20: Honda 106489.06
## 21: Dacia 81834.67
## 22: Citroen 79485.17
## 23: Mazda 64183.33
## 24: Alfa Romeo 110961.76
## 25: Mitsubishi 77779.69
## 26: Lada 22143.75
## 27: TofaÅŸ 17234.44
## 28: Chrysler 94126.92
## 29: Chery 25500.00
## 30: Rover 34125.00
## 31: Jaguar 531438.33
## 32: Infiniti 228390.00
## 33: Geely 43250.00
## 34: Volkswagen 123500.00
## 35: Maserati 526740.67
## 36: Isuzu 119788.89
## Marka mean_price
We can summarize the data for more than one column with respect to more than one column.
## Marka Arac Tip mean max
## 1: Skoda 1.2 Ambition 70500.0 92000
## 2: Skoda 1.6 TDI Optimal 144287.5 146500
## 3: Seat 1.6\n 32900.0 32900
## 4: Seat 1.4 TSI Sport Coupe Cupra\n 109000.0 109000
## 5: Peugeot 1.6 BlueHDi Allure 180150.0 252500
## ---
## 1902: BMW 320d Technology 200000.0 200000
## 1903: Fiat 1.3 M.jet EL 34500.0 34500
## 1904: Renault GTL 14000.0 14000
## 1905: Peugeot 1.6 HDi Premium Pack 112 bg A. 118000.0 118000
## 1906: Hyundai 1.6 CRDi Team 80000.0 80000
We can get total number of rows with .N
like:
## Marka N
## 1: Skoda 259
## 2: Seat 151
## 3: Peugeot 421
## 4: Opel 647
## 5: Nissan 229
## 6: Mercedes 613
## 7: Kia 166
## 8: Hyundai 639
## 9: Ford 602
## 10: Fiat 654
## 11: Chevrolet 127
## 12: BMW 616
## 13: Audi 430
## 14: Acura 42
## 15: Renault 2079
## 16: Porsche 56
## 17: Mini 31
## 18: Land Rover 182
## 19: Jeep 91
## 20: Honda 263
## 21: Dacia 323
## 22: Citroen 199
## 23: Mazda 24
## 24: Alfa Romeo 17
## 25: Mitsubishi 32
## 26: Lada 16
## 27: TofaÅŸ 45
## 28: Chrysler 13
## 29: Chery 1
## 30: Rover 4
## 31: Jaguar 30
## 32: Infiniti 10
## 33: Geely 2
## 34: Volkswagen 18
## 35: Maserati 3
## 36: Isuzu 9
## Marka N
If we use the index of the data, we can perform better processes with respect to time. To set a column to the index we can use setkey()
function like:
## [1] "Date"
## Date Marka Arac Tip Grubu Arac Tip Model_Year
## 1: 01/04/2020 Skoda Rapid 1.2 Ambition 2015
## 2: 01/04/2020 Skoda Octavia 1.6 TDI Optimal 2016
## 3: 01/04/2020 Seat Ibiza 1.6\n 1999
## 4: 01/04/2020 Seat Ibiza 1.4 TSI Sport Coupe Cupra\n 2013
## 5: 01/04/2020 Peugeot 3008 1.6 BlueHDi Allure 2015
## 6: 01/04/2020 Opel Astra 1.6 16V 2004
## Fuel_Type Vites CCM Beygir Gucu Renk
## 1: Benzin/LPG Düz Vites 1300 cc ve altı 100 BG ve altı Beyaz
## 2: Dizel Yarı Otomatik Vites 1601-1800 cc 101-125 BG Beyaz
## 3: Benzin/LPG Otomatik Vites Bilmiyorum Bilmiyorum Sarı
## 4: Benzin Otomatik Vites 1301-1600 cc 176-200 BG Kırmızı
## 5: Dizel Otomatik Vites 1301-1600 cc 101-125 BG Beyaz
## 6: Benzin/LPG Düz Vites 1301-1600 cc 76-100 BG Mavi
## Kasa Tipi Kimden Durum Km Fiyat
## 1: Sedan Sahibinden 2. El 98000 92000
## 2: Hatchback 5 Kapı Sahibinden 2. El 77000 144900
## 3: Hatchback 5 Kapı Sahibinden 2. El 145000 32900
## 4: Hatchback 3 Kapı Galeriden 2. El 116000 109000
## 5: Crossover Sahibinden 2. El 97000 166000
## 6: Sedan Sahibinden 2. El 179 65000
As you can see, the data is sorted after by key setting it. Alse we can give multiple index like:
## [1] "Date" "Marka"
We can group the data by the key(s) with bykey
argument like:
## Date mean_price
## 1: 01/04/2020 120929.03
## 2: 01/05/2020 224372.28
## 3: 01/06/2020 108768.84
## 4: 02/04/2020 88376.47
## 5: 02/05/2020 94832.72
## 6: 02/06/2020 91292.09
We can create columns in data.table package. To do so we need to use :=
operator like:
## Date Marka Arac Tip Grubu Arac Tip Model_Year
## 1: 01/04/2020 Acura CL DiÄŸer 2004
## 2: 01/04/2020 Audi A4 2.0 TDI 2014
## 3: 01/04/2020 Audi S Serisi S5 3.0 TFSI Quattro 2010
## 4: 01/04/2020 BMW 1 Serisi 1.18i 2008
## 5: 01/04/2020 BMW 3 Serisi 3.20i ED EfficientDynamics 2014
## 6: 01/04/2020 Chevrolet Captiva 2.0 D LT 2007
## Fuel_Type Vites CCM Beygir Gucu Renk
## 1: Benzin/LPG Düz Vites 1301-1600 cc 100 BG ve altı Gri
## 2: Dizel Otomatik Vites 1801-2000 cc 126-150 BG Beyaz
## 3: Benzin Yarı Otomatik Vites 2501-3000 cc 326-350 BG Siyah
## 4: Benzin/LPG Otomatik Vites 1801-2000 cc 126-150 BG Siyah
## 5: Benzin Otomatik Vites 1301-1600 cc 176-200 BG Siyah
## 6: Dizel Otomatik Vites 1801-2000 cc 126-150 BG Siyah
## Kasa Tipi Kimden Durum Km Fiyat fpk
## 1: Sedan Sahibinden 2. El 170000 55000 0.3235294
## 2: Sedan Galeriden 2. El 160000 176500 1.1031250
## 3: Spor / Coupe Galeriden 2. El 160000 350000 2.1875000
## 4: Hatchback 5 Kapı Sahibinden 2. El 162000 82000 0.5061728
## 5: Sedan Galeriden 2. El 160000 192500 1.2031250
## 6: Arazi Aracı Sahibinden 2. El 192500 80000 0.4155844
We can also create multiple columns.
head(carmarket[, c("kpf", "full_vehicle_type") := list(Km / Fiyat, paste(`Arac Tip Grubu`, `Arac Tip`))])
## Date Marka Arac Tip Grubu Arac Tip Model_Year
## 1: 01/04/2020 Acura CL DiÄŸer 2004
## 2: 01/04/2020 Audi A4 2.0 TDI 2014
## 3: 01/04/2020 Audi S Serisi S5 3.0 TFSI Quattro 2010
## 4: 01/04/2020 BMW 1 Serisi 1.18i 2008
## 5: 01/04/2020 BMW 3 Serisi 3.20i ED EfficientDynamics 2014
## 6: 01/04/2020 Chevrolet Captiva 2.0 D LT 2007
## Fuel_Type Vites CCM Beygir Gucu Renk
## 1: Benzin/LPG Düz Vites 1301-1600 cc 100 BG ve altı Gri
## 2: Dizel Otomatik Vites 1801-2000 cc 126-150 BG Beyaz
## 3: Benzin Yarı Otomatik Vites 2501-3000 cc 326-350 BG Siyah
## 4: Benzin/LPG Otomatik Vites 1801-2000 cc 126-150 BG Siyah
## 5: Benzin Otomatik Vites 1301-1600 cc 176-200 BG Siyah
## 6: Dizel Otomatik Vites 1801-2000 cc 126-150 BG Siyah
## Kasa Tipi Kimden Durum Km Fiyat fpk kpf
## 1: Sedan Sahibinden 2. El 170000 55000 0.3235294 3.0909091
## 2: Sedan Galeriden 2. El 160000 176500 1.1031250 0.9065156
## 3: Spor / Coupe Galeriden 2. El 160000 350000 2.1875000 0.4571429
## 4: Hatchback 5 Kapı Sahibinden 2. El 162000 82000 0.5061728 1.9756098
## 5: Sedan Galeriden 2. El 160000 192500 1.2031250 0.8311688
## 6: Arazi Aracı Sahibinden 2. El 192500 80000 0.4155844 2.4062500
## full_vehicle_type
## 1: CL DiÄŸer
## 2: A4 2.0 TDI
## 3: S Serisi S5 3.0 TFSI Quattro
## 4: 1 Serisi 1.18i
## 5: 3 Serisi 3.20i ED EfficientDynamics
## 6: Captiva 2.0 D LT
We can write sub queries like in SQL. We can show it from the code above:
carmarket[, c("kpf", "full_vehicle_type") := list(Km / Fiyat, paste(`Arac Tip Grubu`, `Arac Tip`))][, .(kpf, full_vehicle_type)]
## kpf full_vehicle_type
## 1: 3.0909091 CL DiÄŸer
## 2: 0.9065156 A4 2.0 TDI
## 3: 0.4571429 S Serisi S5 3.0 TFSI Quattro
## 4: 1.9756098 1 Serisi 1.18i
## 5: 0.8311688 3 Serisi 3.20i ED EfficientDynamics
## ---
## 9040: 1.6525424 5008 1.6 HDi Premium Pack 112 bg A.
## 9041: 20.7142857 5 GTL
## 9042: 2.5411765 Fluence 1.5 dCi Icon
## 9043: 0.7680000 Latitude 1.5 dCi Executive
## 9044: 0.3669725 Megane 1.6 Joy
For the creation of new columns, we can use aggregate functions like min, max etc.
carmarket[, .(mean = mean(Fiyat, na.rm = TRUE),
median = median(Fiyat, na.rm = TRUE),
min = min(Fiyat, na.rm = TRUE),
max = max(Fiyat, na.rm = TRUE))]
## mean median min max
## 1: 148069.6 95500 5500 5086500
We can summarize multiple columns like:
## V1 V2
## 1: 148069.6 135812.8
If we need to apply aggregation for more columns, we can use .SD
and .SDcols
. .SD means ‘Subset of Data’. If we don’t use .SDcols parameter, it includes all columns. If we use .SDcols, it applies that process for these columns.
## Fiyat Km
## 1: 148069.6 135812.8
Also, we can have multiple summarize function in the same line like:
## Date Marka Arac Tip Grubu Arac Tip Model_Year Fuel_Type Vites CCM
## mean NA NA NA NA 2011.824 NA NA NA
## median NA NA NA NA 2014.000 NA NA NA
## Beygir Gucu Renk Kasa Tipi Kimden Durum Km Fiyat fpk
## mean NA NA NA NA NA 135812.8 148069.6 Inf
## median NA NA NA NA NA 125000.0 95500.0 0.8744919
## kpf full_vehicle_type
## mean 2.237604 NA
## median 1.143522 NA